{"id":1441,"date":"2024-05-28T23:21:27","date_gmt":"2024-05-28T14:21:27","guid":{"rendered":"https:\/\/xn--k10aa.com\/?p=1441"},"modified":"2024-10-20T21:16:39","modified_gmt":"2024-10-20T12:16:39","slug":"cs3","status":"publish","type":"post","link":"https:\/\/remoooo.com\/en\/cs3\/","title":{"rendered":"Compute Shader Learning Notes (Part 3) Particle Effects and Cluster Behavior Simulation"},"content":{"rendered":"<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-316.png\" alt=\"img\" class=\"wp-image-1455 lazyload\"\/><noscript><img decoding=\"async\" width=\"1436\" height=\"768\" src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-316.png\" alt=\"img\" class=\"wp-image-1455 lazyload\" srcset=\"https:\/\/remoooo.com\/wp-content\/uploads\/image-316.png 1436w, https:\/\/remoooo.com\/wp-content\/uploads\/image-316-300x160.png 300w, https:\/\/remoooo.com\/wp-content\/uploads\/image-316-1024x548.png 1024w, https:\/\/remoooo.com\/wp-content\/uploads\/image-316-768x411.png 768w\" sizes=\"(max-width: 1436px) 100vw, 1436px\" \/><\/noscript><\/figure>\n\n\n\n<p>Following the previous article<\/p>\n\n\n\n<p><a href=\"https:\/\/xn--k10aa.com\/compute-shader%e5%ad%a6%e4%b9%a0%e7%ac%94%e8%ae%b0%ef%bc%88%e4%b8%80%ef%bc%89-2\/\" data-type=\"post\" data-id=\"1403\">remoooo: Compute Shader Learning Notes (II) Post-processing Effects<\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">L4 particle effects and crowd behavior simulation<\/h2>\n\n\n\n<p>This chapter uses Compute Shader to generate particles. 
Learn how to use DrawProcedural and DrawMeshInstancedIndirect, also known as GPU Instancing.<\/p>\n\n\n\n<p>Summary of knowledge points:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Compute Shader, Material, C# script and Shader work together<\/li>\n\n\n\n<li>Graphics.DrawProcedural<\/li>\n\n\n\n<li>material.SetBuffer()<\/li>\n\n\n\n<li>xorshift random algorithm<\/li>\n\n\n\n<li>Swarm Behavior Simulation<\/li>\n\n\n\n<li>Graphics.DrawMeshInstancedIndirect<\/li>\n\n\n\n<li>Rotation, translation, and scaling matrices, homogeneous coordinates<\/li>\n\n\n\n<li>Surface Shader<\/li>\n\n\n\n<li>ComputeBufferType.Default<\/li>\n\n\n\n<li>#pragma instancing_options procedural:setup<\/li>\n\n\n\n<li>unity_InstanceID<\/li>\n\n\n\n<li>Skinned Mesh Renderer<\/li>\n\n\n\n<li>Data alignment<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">1. Introduction and preparation<\/h3>\n\n\n\n<p>In addition to being able to process large amounts of data at the same time, Compute Shader also has a key advantage, which is that the Buffer is stored in the GPU. Therefore, the data processed by the Compute Shader can be directly passed to the Shader associated with the Material, that is, the Vertex\/Fragment Shader. 
The key here is that a material can call SetBuffer() just like a Compute Shader, so its shader reads data directly from the GPU&#039;s buffer!<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-303.png\" alt=\"img\" class=\"wp-image-1442 lazyload\"\/><noscript><img decoding=\"async\" width=\"1440\" height=\"359\" src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-303.png\" alt=\"img\" class=\"wp-image-1442 lazyload\" srcset=\"https:\/\/remoooo.com\/wp-content\/uploads\/image-303.png 1440w, https:\/\/remoooo.com\/wp-content\/uploads\/image-303-300x75.png 300w, https:\/\/remoooo.com\/wp-content\/uploads\/image-303-1024x255.png 1024w, https:\/\/remoooo.com\/wp-content\/uploads\/image-303-768x191.png 768w\" sizes=\"(max-width: 1440px) 100vw, 1440px\" \/><\/noscript><\/figure>\n\n\n\n<p>Building a particle system with a Compute Shader is a good way to show off its parallel processing power.<\/p>\n\n\n\n<p>During rendering, the Vertex Shader reads the position and other attributes of each particle from the Compute Buffer and converts them into vertices on the screen. The Fragment Shader generates pixels from the information carried by these vertices (such as position and color). With Graphics.DrawProcedural, Unity can <strong>render directly<\/strong> the vertices produced by the shader, with no pre-defined mesh structure and no Mesh Renderer, which is particularly effective for rendering a large number of particles.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2. Hello Particle<\/h3>\n\n\n\n<p>The steps are simple. Define the particle data (position, velocity and lifetime) in C#, initialize it and upload it to a buffer, then bind the buffer to both the Compute Shader and the Material. 
In the rendering stage, call Graphics.DrawProceduralNow in OnRenderObject() to achieve efficient particle rendering.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-306.png\" alt=\"img\" class=\"wp-image-1445 lazyload\"\/><noscript><img decoding=\"async\" width=\"1366\" height=\"216\" src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-306.png\" alt=\"img\" class=\"wp-image-1445 lazyload\" srcset=\"https:\/\/remoooo.com\/wp-content\/uploads\/image-306.png 1366w, https:\/\/remoooo.com\/wp-content\/uploads\/image-306-300x47.png 300w, https:\/\/remoooo.com\/wp-content\/uploads\/image-306-1024x162.png 1024w, https:\/\/remoooo.com\/wp-content\/uploads\/image-306-768x121.png 768w\" sizes=\"(max-width: 1366px) 100vw, 1366px\" \/><\/noscript><\/figure>\n\n\n\n<p>Create a new scene and create an effect: millions of particles follow the mouse and bloom into life, as follows:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-311.png\" alt=\"img\" class=\"wp-image-1450 lazyload\"\/><noscript><img decoding=\"async\" width=\"956\" height=\"374\" src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-311.png\" alt=\"img\" class=\"wp-image-1450 lazyload\" srcset=\"https:\/\/remoooo.com\/wp-content\/uploads\/image-311.png 956w, https:\/\/remoooo.com\/wp-content\/uploads\/image-311-300x117.png 300w, https:\/\/remoooo.com\/wp-content\/uploads\/image-311-768x300.png 768w\" sizes=\"(max-width: 956px) 100vw, 956px\" \/><\/noscript><\/figure>\n\n\n\n<p>Writing this makes me think a lot. The life cycle of a particle is very short, ignited in an instant like a spark, and disappearing like a meteor. 
Despite thousands of hardships, I am just a speck of dust among billions of others, ordinary and insignificant. These particles may float randomly in space (<strong>use the &quot;Xorshift&quot; algorithm to compute each particle&#039;s spawn position<\/strong>), may have unique colors, but they can&#039;t escape the fate of being programmed. Isn&#039;t this a portrayal of my life? I play my role step by step, unable to escape the invisible constraints.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>\u201cGod is dead! And how can we who have killed him not feel the greatest pain?\u201d \u2013 Friedrich Nietzsche<\/p>\n<\/blockquote>\n\n\n\n<p>Nietzsche not only announced the disappearance of religious beliefs, but also pointed out the sense of nothingness faced by modern people, that is, without the traditional moral and religious pillars, people feel unprecedented loneliness and lack of direction. Particles are defined and created in the C# script, move and die according to specific rules, which is quite similar to the state of modern people in the universe described by Nietzsche. Although everyone tries to find their own meaning, they are ultimately restricted by broader social and cosmic rules.<\/p>\n\n\n\n<p>Life is full of various inevitable pains, reflecting the inherent emptiness and loneliness of human existence (<strong>the particle death logic still has to be written<\/strong>). All of this confirms what Nietzsche said: nothing in life is permanent. The particles in the same buffer will inevitably disappear at some point in the future, which reflects the loneliness of modern people described by Nietzsche. 
Individuals may feel unprecedented isolation and helplessness, so everyone is a lonely warrior who must learn to face the inner tornado and the indifference of the outside world alone.<\/p>\n\n\n\n<p>But it doesn\u2019t matter, \u201cSummer will come again and again, and those who are meant to meet will meet again.\u201d The particles in this article will also be regenerated after the end, embracing their own Buffer in the best state.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Summer will come around again. People who meet will meet again.<\/p>\n<\/blockquote>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-305.png\" alt=\"img\" class=\"wp-image-1444 lazyload\"\/><noscript><img decoding=\"async\" width=\"280\" height=\"272\" src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-305.png\" alt=\"img\" class=\"wp-image-1444 lazyload\"\/><\/noscript><\/figure>\n\n\n\n<p>The current version of the code can be copied and run by yourself (all with comments):<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Compute Shader: https:\/\/github.com\/Remyuu\/Unity-Compute-Shader-Learn\/blob\/L4_First_Particle\/Assets\/Shaders\/ParticleFun.compute<\/li>\n\n\n\n<li>CPU: https:\/\/github.com\/Remyuu\/Unity-Compute-Shader-Learn\/blob\/L4_First_Particle\/Assets\/Scripts\/ParticleFun.cs<\/li>\n\n\n\n<li>Shader: https:\/\/github.com\/Remyuu\/Unity-Compute-Shader-Learn\/blob\/L4_First_Particle\/Assets\/Shaders\/Particle.shader<\/li>\n<\/ul>\n\n\n\n<p>Enough of the nonsense, let\u2019s first take a look at how the C# script is written.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" 
data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-304.png\" alt=\"img\" class=\"wp-image-1443 lazyload\"\/><noscript><img decoding=\"async\" width=\"962\" height=\"250\" src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-304.png\" alt=\"img\" class=\"wp-image-1443 lazyload\" srcset=\"https:\/\/remoooo.com\/wp-content\/uploads\/image-304.png 962w, https:\/\/remoooo.com\/wp-content\/uploads\/image-304-300x78.png 300w, https:\/\/remoooo.com\/wp-content\/uploads\/image-304-768x200.png 768w\" sizes=\"(max-width: 962px) 100vw, 962px\" \/><\/noscript><\/figure>\n\n\n\n<p>As usual, first define the particle structure, initialize the particle array, and then upload it to the GPU. <strong>The key lies in the last three lines, which bind the buffer to the Compute Shader and the material.<\/strong> There is not much to say about the code hidden behind the ellipses below. It is all routine, so it is only summarized in comments.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>struct Particle{\n    public Vector3 position; \/\/ particle position\n    public Vector3 velocity; \/\/ particle velocity\n    public float life;       \/\/ particle lifetime\n}\nComputeBuffer particleBuffer; \/\/ buffer on the GPU\n...\n\/\/ inside Init()\n    \/\/ initialize the particle array\n    Particle&#91;] particleArray = new Particle&#91;particleCount];\n\n    for (int i = 0; i &lt; particleCount; i++){\n        \/\/ generate a random position and normalize it\n        ...\n        \/\/ set the particle&#039;s initial position and velocity\n        ... 
 \n        \/\/ set the particle&#039;s lifetime\n        particleArray&#91;i].life = Random.value * 5.0f + 1.0f;\n    }\n    \/\/ create the Compute Buffer and upload the data\n    ...\n    \/\/ find the kernel ID in the Compute Shader\n    ...\n    \/\/ bind the Compute Buffer to the shaders\n    shader.SetBuffer(kernelID, \"particleBuffer\", particleBuffer);\n    material.SetBuffer(\"particleBuffer\", particleBuffer);\n    material.SetInt(\"_PointSize\", pointSize);<\/code><\/pre>\n\n\n\n<p>The key rendering stage is OnRenderObject(). material.SetPass(0) activates the first pass of the material. DrawProceduralNow draws geometry without a traditional mesh. MeshTopology.Points sets the topology to points, so the GPU treats each vertex as a point and does not form lines or faces between vertices. The second parameter, 1, is the vertex count per instance: each particle is a single vertex. The third parameter, particleCount, is the instance count, so the GPU draws one point per particle, and the vertex shader uses SV_InstanceID to look up that particle in the buffer.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>void OnRenderObject() { material.SetPass(0); Graphics.DrawProceduralNow(MeshTopology.Points, 1, particleCount); }<\/code><\/pre>\n\n\n\n<p>The current mouse position is read in OnGUI(). Note that this method may be called several times per frame. The z value is set to the camera&#039;s near clipping plane plus an offset. 
Here, 14 is added to get a world coordinate at a more suitable visual depth (you can adjust it yourself).<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>void OnGUI()\n{\n    Vector3 p = new Vector3();\n    Camera c = Camera.main;\n    Event e = Event.current;\n    Vector2 mousePos = new Vector2();\n\n    \/\/ Get the mouse position from Event.\n    \/\/ Note that the y position from Event is inverted.\n    mousePos.x = e.mousePosition.x;\n    mousePos.y = c.pixelHeight - e.mousePosition.y;\n\n    p = c.ScreenToWorldPoint(new Vector3(mousePos.x, mousePos.y, c.nearClipPlane + 14));\n\n    cursorPos.x = p.x;\n    cursorPos.y = p.y;\n}<\/code><\/pre>\n\n\n\n<p>ComputeBuffer particleBuffer has been passed to Compute Shader and Shader above.<\/p>\n\n\n\n<p>Let&#039;s first look at the data structure of the Compute Shader. Nothing special.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ particle data structure\nstruct Particle\n{\n    float3 position;  \/\/ particle position\n    float3 velocity;  \/\/ particle velocity\n    float life;       \/\/ remaining lifetime of the particle\n};\n\n\/\/ structured buffer that stores and updates particle data, readable and writable on the GPU\nRWStructuredBuffer&lt;Particle&gt; particleBuffer;\n\n\/\/ variables set from the CPU\nfloat deltaTime;       \/\/ time elapsed between the previous frame and the current one\nfloat2 mousePosition;  \/\/ current mouse position<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-307.png\" alt=\"img\" class=\"wp-image-1446 lazyload\"\/><noscript><img decoding=\"async\" width=\"1440\" height=\"944\" 
src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-307.png\" alt=\"img\" class=\"wp-image-1446 lazyload\" srcset=\"https:\/\/remoooo.com\/wp-content\/uploads\/image-307.png 1440w, https:\/\/remoooo.com\/wp-content\/uploads\/image-307-300x197.png 300w, https:\/\/remoooo.com\/wp-content\/uploads\/image-307-1024x671.png 1024w, https:\/\/remoooo.com\/wp-content\/uploads\/image-307-768x503.png 768w\" sizes=\"(max-width: 1440px) 100vw, 1440px\" \/><\/noscript><\/figure>\n\n\n\n<p>Here I will briefly talk about a particularly useful random number sequence generation method, the xorshift algorithm. It will be used to randomly control the movement direction of particles as shown above. The particles will move randomly in three-dimensional directions.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>For more information, please refer to: https:\/\/en.wikipedia.org\/wiki\/Xorshift<\/li>\n\n\n\n<li>Original paper link: https:\/\/www.jstatsoft.org\/article\/view\/v008i14<\/li>\n<\/ul>\n\n\n\n<p>This algorithm was proposed by George Marsaglia in 2003. Its advantages are that it is extremely fast and very space-efficient. Even the simplest Xorshift implementation has a very long pseudo-random number cycle.<\/p>\n\n\n\n<p>The basic operations are shift and XOR. Hence the name of the algorithm. 
Its core is to maintain a non-zero state variable and generate random numbers by performing a series of shift and XOR operations on this state variable.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ state variable used to generate random numbers\nuint rng_state;\n\nuint rand_xorshift() {\n    \/\/ Xorshift algorithm from George Marsaglia's paper\n    rng_state ^= (rng_state &lt;&lt; 13);  \/\/ shift the state left by 13 bits, then XOR with the original state\n    rng_state ^= (rng_state &gt;&gt; 17);  \/\/ shift the updated state right by 17 bits and XOR again\n    rng_state ^= (rng_state &lt;&lt; 5);   \/\/ finally, shift the state left by 5 bits and XOR one last time\n    return rng_state;                \/\/ return the updated state as the generated random number\n}<\/code><\/pre>\n\n\n\n<p>The core of the <strong>basic Xorshift<\/strong> algorithm has been explained above, but different shift combinations produce multiple variants. The original paper also describes the Xorshift128 variant, which keeps a 128-bit state (four 32-bit words) and updates it with shift and XOR operations. 
The code is as follows:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-310.png\" alt=\"img\" class=\"wp-image-1449 lazyload\"\/><noscript><img decoding=\"async\" width=\"1440\" height=\"289\" src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-310.png\" alt=\"img\" class=\"wp-image-1449 lazyload\" srcset=\"https:\/\/remoooo.com\/wp-content\/uploads\/image-310.png 1440w, https:\/\/remoooo.com\/wp-content\/uploads\/image-310-300x60.png 300w, https:\/\/remoooo.com\/wp-content\/uploads\/image-310-1024x206.png 1024w, https:\/\/remoooo.com\/wp-content\/uploads\/image-310-768x154.png 768w\" sizes=\"(max-width: 1440px) 100vw, 1440px\" \/><\/noscript><\/figure>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ C language version\nuint32_t xorshift128(void) {\n    static uint32_t x = 123456789;\n    static uint32_t y = 362436069;\n    static uint32_t z = 521288629;\n    static uint32_t w = 88675123;\n    uint32_t t = x ^ (x &lt;&lt; 11);\n    x = y; y = z; z = w;\n    w = w ^ (w &gt;&gt; 19) ^ (t ^ (t &gt;&gt; 8));\n    return w;\n}<\/code><\/pre>\n\n\n\n<p>This gives a longer period and better statistical quality. The period of this variant is 2^128 - 1, which is very impressive.<\/p>\n\n\n\n<p>In general, this algorithm is completely sufficient for game development, but it is not suitable for use in fields such as cryptography.<\/p>\n\n\n\n<p>When using this algorithm in a Compute Shader, note that Xorshift returns values across the full uint32 range, so one more mapping is needed ([0, 2^32-1] is mapped to [0, 1]):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>float tmp = (1.0 \/ 4294967296.0); \/\/ conversion factor\nfloat(rand_xorshift()) * tmp<\/code><\/pre>\n\n\n\n<p>A movement direction needs signed components, so we subtract 0.5 to center the range around zero. 
Random movement in three directions:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>float f0 = float(rand_xorshift()) * tmp - 0.5;\nfloat f1 = float(rand_xorshift()) * tmp - 0.5;\nfloat f2 = float(rand_xorshift()) * tmp - 0.5;\nfloat3 normalF3 = normalize(float3(f0, f1, f2)) * 0.8f; \/\/ Scaled the direction of movement<\/code><\/pre>\n\n\n\n<p>Each kernel invocation (one thread per particle) needs to do the following:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>First read the particle&#039;s data from the previous frame out of the buffer<\/li>\n\n\n\n<li>Update the particle (compute its velocity, update its position and remaining life) and write it back to the buffer<\/li>\n\n\n\n<li>If the remaining life is less than 0, respawn the particle<\/li>\n<\/ul>\n\n\n\n<p>Respawning a particle: use the random direction obtained from Xorshift just now to place it near the mouse position, then reset its life and velocity.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ Set the new position and life of the particle\nparticleBuffer[id].position = float3(normalF3.x + mousePosition.x, normalF3.y + mousePosition.y, normalF3.z + 3.0);\nparticleBuffer[id].life = 4; \/\/ Reset life\nparticleBuffer[id].velocity = float3(0,0,0); \/\/ Reset velocity<\/code><\/pre>\n\n\n\n<p>Finally, the basic data structure of Shader:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>struct Particle{\n    float3 position;\n    float3 velocity;\n    float life;\n};\n\nstruct v2f{\n    float4 position : SV_POSITION;\n    float4 color : COLOR;\n    float life : LIFE;\n    float size: PSIZE;\n};\n\/\/ particles' data\nStructuredBuffer&lt;Particle&gt; particleBuffer;<\/code><\/pre>\n\n\n\n<p>Then the vertex shader computes the particle&#039;s vertex color and clip-space position, and passes along the point size.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>v2f vert(uint vertex_id : SV_VertexID, uint instance_id : SV_InstanceID){\n    v2f o = (v2f)0;\n\n    \/\/ Color\n    float life = particleBuffer&#91;instance_id].life;\n    float 
lerpVal = life * 0.25f;\n    o.color = fixed4(1.0f - lerpVal+0.1, lerpVal+0.1, 1.0f, lerpVal);\n\n    \/\/ Position\n    o.position = UnityObjectToClipPos(float4(particleBuffer&#91;instance_id].position, 1.0f));\n    o.size = _PointSize;\n\n    return o;\n}<\/code><\/pre>\n\n\n\n<p>The fragment shader calculates the interpolated color.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>float4 frag(v2f i) : COLOR{ return i.color; }<\/code><\/pre>\n\n\n\n<p>At this point, you can get the above effect.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-312.png\" alt=\"img\" class=\"wp-image-1451 lazyload\"\/><noscript><img decoding=\"async\" width=\"1110\" height=\"424\" src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-312.png\" alt=\"img\" class=\"wp-image-1451 lazyload\" srcset=\"https:\/\/remoooo.com\/wp-content\/uploads\/image-312.png 1110w, https:\/\/remoooo.com\/wp-content\/uploads\/image-312-300x115.png 300w, https:\/\/remoooo.com\/wp-content\/uploads\/image-312-1024x391.png 1024w, https:\/\/remoooo.com\/wp-content\/uploads\/image-312-768x293.png 768w\" sizes=\"(max-width: 1110px) 100vw, 1110px\" \/><\/noscript><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">3. Quad particles<\/h3>\n\n\n\n<p>In the previous section, each particle only had one point, which was not interesting. Now let&#039;s turn a point into a Quad. In Unity, there is no Quad, only a fake Quad composed of two triangles.<\/p>\n\n\n\n<p>Let&#039;s start working on it, based on the code above. 
In C#, define the vertex structure and the size of a Quad.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ struct\nstruct Vertex {\n    public Vector3 position;\n    public Vector2 uv;\n    public float life;\n}\nconst int SIZE_VERTEX = 6 * sizeof(float); \/\/ 3 (position) + 2 (uv) + 1 (life) floats\npublic float quadSize = 0.1f; \/\/ Quad size<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-313.png\" alt=\"img\" class=\"wp-image-1452 lazyload\"\/><noscript><img decoding=\"async\" width=\"1440\" height=\"636\" src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-313.png\" alt=\"img\" class=\"wp-image-1452 lazyload\" srcset=\"https:\/\/remoooo.com\/wp-content\/uploads\/image-313.png 1440w, https:\/\/remoooo.com\/wp-content\/uploads\/image-313-300x133.png 300w, https:\/\/remoooo.com\/wp-content\/uploads\/image-313-1024x452.png 1024w, https:\/\/remoooo.com\/wp-content\/uploads\/image-313-768x339.png 768w\" sizes=\"(max-width: 1440px) 100vw, 1440px\" \/><\/noscript><\/figure>\n\n\n\n<p>For each particle, set the UV coordinates of its six vertices for use in the vertex shader, listing them in the vertex order Unity expects.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>index = i*6;\n\/\/ Triangle 1 - bottom-left, top-left, top-right\nvertexArray[index].uv.Set(0,0);\nvertexArray[index+1].uv.Set(0,1);\nvertexArray[index+2].uv.Set(1,1);\n\/\/ Triangle 2 - bottom-left, top-right, bottom-right\nvertexArray[index+3].uv.Set(0,0);\nvertexArray[index+4].uv.Set(1,1);\nvertexArray[index+5].uv.Set(1,0);<\/code><\/pre>\n\n\n\n<p>Finally, upload the vertex array to a buffer. 
The halfSize here is used to pass to Compute Shader to calculate the positions of each vertex of Quad.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>vertexBuffer = new ComputeBuffer(numVertices, SIZE_VERTEX);\nvertexBuffer.SetData(vertexArray);\nshader.SetBuffer(kernelID, \"vertexBuffer\", vertexBuffer);\nshader.SetFloat(\"halfSize\", quadSize*0.5f);\n\nmaterial.SetBuffer(\"vertexBuffer\", vertexBuffer);<\/code><\/pre>\n\n\n\n<p>During the rendering phase, the points are changed into triangles with six points.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>void OnRenderObject() { material.SetPass(0); Graphics.DrawProceduralNow(MeshTopology.Triangles, 6, numParticles); }<\/code><\/pre>\n\n\n\n<p>Change the settings in the Shader to receive vertex data and a texture for display. Alpha culling is required.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>_MainTex(\"Texture\", 2D) = \"white\" {}     \n...\nTags{ \"Queue\"=\"Transparent\" \"RenderType\"=\"Transparent\" \"IgnoreProjector\"=\"True\" }\nLOD 200\nBlend SrcAlpha OneMinusSrcAlpha\nZWrite Off\n...\n    struct Vertex{\n        float3 position;\n        float2 uv;\n        float life;\n    };\n    StructuredBuffer&lt;Vertex&gt; vertexBuffer;\n    sampler2D _MainTex;\n    v2f vert(uint vertex_id : SV_VertexID, uint instance_id : SV_InstanceID)\n    {\n        v2f o = (v2f)0;\n\n        int index = instance_id*6 + vertex_id;\n        float lerpVal = vertexBuffer&#91;index].life * 0.25f;\n        o.color = fixed4(1.0f - lerpVal+0.1, lerpVal+0.1, 1.0f, lerpVal);\n        o.position = UnityWorldToClipPos(float4(vertexBuffer&#91;index].position, 1.0f));\n        o.uv = vertexBuffer&#91;index].uv;\n\n        return o;\n    }\n\n    float4 frag(v2f i) : COLOR\n    {\n        fixed4 color = tex2D( _MainTex, i.uv ) * i.color;\n        return color;\n    }<\/code><\/pre>\n\n\n\n<p>In the Compute Shader, add receiving vertex data and halfSize.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>struct Vertex { float3 position; 
float2 uv; float life; };\nRWStructuredBuffer&lt;Vertex&gt; vertexBuffer;\nfloat halfSize;<\/code><\/pre>\n\n\n\n<p>Calculate the positions of the six vertices of each Quad.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-308.png\" alt=\"img\" class=\"wp-image-1447 lazyload\"\/><noscript><img decoding=\"async\" width=\"1440\" height=\"568\" src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-308.png\" alt=\"img\" class=\"wp-image-1447 lazyload\" srcset=\"https:\/\/remoooo.com\/wp-content\/uploads\/image-308.png 1440w, https:\/\/remoooo.com\/wp-content\/uploads\/image-308-300x118.png 300w, https:\/\/remoooo.com\/wp-content\/uploads\/image-308-1024x404.png 1024w, https:\/\/remoooo.com\/wp-content\/uploads\/image-308-768x303.png 768w\" sizes=\"(max-width: 1440px) 100vw, 1440px\" \/><\/noscript><\/figure>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ Set the vertex buffer\nint index = id.x * 6;\n\/\/ Triangle 1 - bottom-left, top-left, top-right\nvertexBuffer[index].position.x = p.position.x-halfSize;\nvertexBuffer[index].position.y = p.position.y-halfSize;\nvertexBuffer[index].position.z = p.position.z;\nvertexBuffer[index].life = p.life;\nvertexBuffer[index+1].position.x = p.position.x-halfSize;\nvertexBuffer[index+1].position.y = p.position.y+halfSize;\nvertexBuffer[index+1].position.z = p.position.z;\nvertexBuffer[index+1].life = p.life;\nvertexBuffer[index+2].position.x = p.position.x+halfSize;\nvertexBuffer[index+2].position.y = p.position.y+halfSize;\nvertexBuffer[index+2].position.z = p.position.z;\nvertexBuffer[index+2].life = p.life;\n\/\/ Triangle 2 - bottom-left, top-right, bottom-right\nvertexBuffer[index+3].position.x = p.position.x-halfSize;\nvertexBuffer[index+3].position.y = p.position.y-halfSize;\nvertexBuffer[index+3].position.z = p.position.z;\nvertexBuffer[index+3].life = p.life;\nvertexBuffer[index+4].position.x = p.position.x+halfSize;\nvertexBuffer[index+4].position.y = p.position.y+halfSize;\nvertexBuffer[index+4].position.z = p.position.z;\nvertexBuffer[index+4].life = p.life;\nvertexBuffer[index+5].position.x = p.position.x+halfSize;\nvertexBuffer[index+5].position.y = p.position.y-halfSize;\nvertexBuffer[index+5].position.z = p.position.z;\nvertexBuffer[index+5].life = p.life;<\/code><\/pre>\n\n\n\n<p>Mission accomplished.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-309.png\" alt=\"img\" class=\"wp-image-1448 lazyload\"\/><noscript><img decoding=\"async\" width=\"1070\" height=\"616\" src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-309.png\" alt=\"img\" class=\"wp-image-1448 lazyload\" srcset=\"https:\/\/remoooo.com\/wp-content\/uploads\/image-309.png 1070w, https:\/\/remoooo.com\/wp-content\/uploads\/image-309-300x173.png 300w, https:\/\/remoooo.com\/wp-content\/uploads\/image-309-1024x590.png 1024w, https:\/\/remoooo.com\/wp-content\/uploads\/image-309-768x442.png 768w\" sizes=\"(max-width: 1070px) 100vw, 1070px\" \/><\/noscript><\/figure>\n\n\n\n<p>Current version code:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Compute Shader: https:\/\/github.com\/Remyuu\/Unity-Compute-Shader-Learn\/blob\/L4_Quad\/Assets\/Shaders\/QuadParticles.compute<\/li>\n\n\n\n<li>CPU: https:\/\/github.com\/Remyuu\/Unity-Compute-Shader-Learn\/blob\/L4_Quad\/Assets\/Scripts\/QuadParticles.cs<\/li>\n\n\n\n<li>Shader: https:\/\/github.com\/Remyuu\/Unity-Compute-Shader-Learn\/blob\/L4_Quad\/Assets\/Shaders\/QuadParticle.shader<\/li>\n<\/ul>\n\n\n\n<p>In the next section, we will upgrade the Mesh to a prefab and try to simulate the flocking behavior of birds in flight.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">4. 
Flocking simulation<\/h3>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-317.png\" alt=\"img\" class=\"wp-image-1457 lazyload\"\/><noscript><img decoding=\"async\" width=\"1158\" height=\"604\" src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-317.png\" alt=\"img\" class=\"wp-image-1457 lazyload\" srcset=\"https:\/\/remoooo.com\/wp-content\/uploads\/image-317.png 1158w, https:\/\/remoooo.com\/wp-content\/uploads\/image-317-300x156.png 300w, https:\/\/remoooo.com\/wp-content\/uploads\/image-317-1024x534.png 1024w, https:\/\/remoooo.com\/wp-content\/uploads\/image-317-768x401.png 768w\" sizes=\"(max-width: 1158px) 100vw, 1158px\" \/><\/noscript><\/figure>\n\n\n\n<p>Flocking is an algorithm that simulates the collective movement of animals such as flocks of birds and schools of fish in nature. It is built on three basic behavioral rules proposed by Craig Reynolds at SIGGRAPH 1987, and is often referred to as the &quot;Boids&quot; algorithm:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Separation<\/strong>: Particles cannot be too close to each other, and there must be a sense of boundary. Specifically, each particle looks at the particles within a certain radius around it and computes a direction that steers away from them to avoid collision.<\/li>\n\n\n\n<li><strong>Alignment<\/strong>: The velocity of an individual tends toward the average velocity of the group, and there should be a sense of belonging. Specifically, the average velocity of the particles within the visual range is calculated (both speed and <strong>direction<\/strong>). This visual range is determined by the actual biological characteristics of the bird, which will be mentioned in the next section.<\/li>\n\n\n\n<li><strong>Cohesion<\/strong>: The position of the individual particles tends to the average position (the center of the group) to feel safe. 
Specifically, each particle finds the geometric center of its neighbors and calculates a movement vector toward it (the final result is an average <strong>position<\/strong>).<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-314.png\" alt=\"img\" class=\"wp-image-1453 lazyload\"\/><noscript><img decoding=\"async\" width=\"1176\" height=\"250\" src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-314.png\" alt=\"img\" class=\"wp-image-1453 lazyload\" srcset=\"https:\/\/remoooo.com\/wp-content\/uploads\/image-314.png 1176w, https:\/\/remoooo.com\/wp-content\/uploads\/image-314-300x64.png 300w, https:\/\/remoooo.com\/wp-content\/uploads\/image-314-1024x218.png 1024w, https:\/\/remoooo.com\/wp-content\/uploads\/image-314-768x163.png 768w\" sizes=\"(max-width: 1176px) 100vw, 1176px\" \/><\/noscript><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-318.png\" alt=\"img\" class=\"wp-image-1456 lazyload\"\/><noscript><img decoding=\"async\" width=\"1094\" height=\"350\" src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-318.png\" alt=\"img\" class=\"wp-image-1456 lazyload\" srcset=\"https:\/\/remoooo.com\/wp-content\/uploads\/image-318.png 1094w, https:\/\/remoooo.com\/wp-content\/uploads\/image-318-300x96.png 300w, https:\/\/remoooo.com\/wp-content\/uploads\/image-318-1024x328.png 1024w, https:\/\/remoooo.com\/wp-content\/uploads\/image-318-768x246.png 768w\" sizes=\"(max-width: 1094px) 100vw, 1094px\" \/><\/noscript><\/figure>\n\n\n\n<p>Think about it: which of the above three rules is the most difficult to implement?<\/p>\n\n\n\n<p>Answer: Separation. 
Handling proximity between objects is expensive. Because each individual has to compare its distance with every other individual, the time complexity of the algorithm approaches O(n^2), where n is the number of particles. For example, with 1,000 particles, nearly 500,000 pairwise distance calculations may be required in each iteration. In the original paper, the unoptimized O(n^2) algorithm took about 95 seconds to render one frame with 80 birds; at that rate a 300-frame animation takes roughly 8 hours.<\/p>\n\n\n\n<p>Generally speaking, a quadtree or spatial hashing can reduce this cost. You can also maintain a neighbor list that stores the individuals within a certain distance of each individual. Of course, you can also simply brute-force the calculation with a Compute Shader.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-315.png\" alt=\"img\" class=\"wp-image-1454 lazyload\"\/><noscript><img decoding=\"async\" width=\"274\" height=\"264\" src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-315.png\" alt=\"img\" class=\"wp-image-1454 lazyload\"\/><\/noscript><\/figure>\n\n\n\n<p>Without further ado, let\u2019s get started.<\/p>\n\n\n\n<p>First download the prepared project files (if not prepared in advance):<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Bird&#039;s Prefab: https:\/\/github.com\/Remyuu\/Unity-Compute-Shader-Learn\/blob\/main\/Assets\/Prefabs\/Boid.prefab<\/li>\n\n\n\n<li>Script: https:\/\/github.com\/Remyuu\/Unity-Compute-Shader-Learn\/blob\/main\/Assets\/Scripts\/SimpleFlocking.cs<\/li>\n\n\n\n<li>Compute Shader: https:\/\/github.com\/Remyuu\/Unity-Compute-Shader-Learn\/blob\/main\/Assets\/Shaders\/SimpleFlocking.compute<\/li>\n<\/ul>\n\n\n\n<p>Then 
add the script to an empty GameObject.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-321.png\" alt=\"img\" class=\"wp-image-1460 lazyload\"\/><noscript><img decoding=\"async\" width=\"1124\" height=\"458\" src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-321.png\" alt=\"img\" class=\"wp-image-1460 lazyload\" srcset=\"https:\/\/remoooo.com\/wp-content\/uploads\/image-321.png 1124w, https:\/\/remoooo.com\/wp-content\/uploads\/image-321-300x122.png 300w, https:\/\/remoooo.com\/wp-content\/uploads\/image-321-1024x417.png 1024w, https:\/\/remoooo.com\/wp-content\/uploads\/image-321-768x313.png 768w\" sizes=\"(max-width: 1124px) 100vw, 1124px\" \/><\/noscript><\/figure>\n\n\n\n<p>Start the project and you&#039;ll see a bunch of birds.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-319.png\" alt=\"img\" class=\"wp-image-1458 lazyload\"\/><noscript><img decoding=\"async\" width=\"1062\" height=\"504\" src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-319.png\" alt=\"img\" class=\"wp-image-1458 lazyload\" srcset=\"https:\/\/remoooo.com\/wp-content\/uploads\/image-319.png 1062w, https:\/\/remoooo.com\/wp-content\/uploads\/image-319-300x142.png 300w, https:\/\/remoooo.com\/wp-content\/uploads\/image-319-1024x486.png 1024w, https:\/\/remoooo.com\/wp-content\/uploads\/image-319-768x364.png 768w\" sizes=\"(max-width: 1062px) 100vw, 1062px\" \/><\/noscript><\/figure>\n\n\n\n<p>Below are some parameters for group behavior simulation.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ Define the parameters for the crowd behavior simulation.\npublic float rotationSpeed = 1f; \/\/ Rotation speed.\npublic float boidSpeed = 1f; \/\/ Boid speed.\npublic float neighbourDistance = 1f; \/\/ Neighboring distance.\npublic float boidSpeedVariation = 1f; \/\/ Speed variation.\npublic GameObject boidPrefab; \/\/ Prefab of Boid object.\npublic int boidsCount; \/\/ Number of Boids.\npublic float spawnRadius; \/\/ Radius of Boid spawn.\npublic Transform target; \/\/ The moving target of the crowd.<\/code><\/pre>\n\n\n\n<p>Except for the Boid prefab boidPrefab and the spawn radius spawnRadius, everything else needs to be passed to the GPU.<\/p>\n\n\n\n<p>For the sake of convenience, let\u2019s take a deliberately naive shortcut in this section: we only calculate each bird\u2019s position and direction on the GPU, then read the result back to the CPU for the following processing:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>...\nboidsBuffer.GetData(boidsArray);\n\/\/ Update the position and direction of each bird\nfor (int i = 0; i &lt; boidsArray.Length; i++){\n    boids[i].transform.localPosition = boidsArray[i].position;\n    if (!boidsArray[i].direction.Equals(Vector3.zero)){\n        boids[i].transform.rotation = Quaternion.LookRotation(boidsArray[i].direction);\n    }\n}<\/code><\/pre>\n\n\n\n<p>Quaternion.LookRotation() creates a rotation that makes an object face a specified direction.<\/p>\n\n\n\n<p>Calculate the position of each bird in the Compute Shader.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#pragma kernel CSMain\n#define GROUP_SIZE 256\n\nstruct Boid{\n    float3 position;\n    float3 direction;\n};\n\nRWStructuredBuffer&lt;Boid&gt; boidsBuffer;\n\nfloat time;\nfloat deltaTime;\nfloat rotationSpeed;\nfloat boidSpeed;\nfloat boidSpeedVariation;\nfloat3 flockPosition;\nfloat neighbourDistance;\nint boidsCount;\n\n[numthreads(GROUP_SIZE,1,1)]\nvoid CSMain (uint3 id : SV_DispatchThreadID)\n{\n    \u2026 \/\/ Continue below\n}<\/code><\/pre>\n\n\n\n<p>First write the alignment and cohesion logic, then write the resulting position and direction back to the Buffer.<\/p>\n\n\n\n<pre 
class=\"wp-block-code\"><code>Boid boid = boidsBuffer&#91;id.x];\n\n    float3 separation = 0; \/\/ Separation\n    float3 alignment = 0; \/\/ Alignment - direction\n    float3 cohesion = flockPosition; \/\/ Cohesion - position\n\n    uint nearbyCount = 1; \/\/ Count the boid itself as a nearby individual.\n\n    for (int i=0; i&lt;boidsCount; i++)\n    {\n        if(i!=(int)id.x) \/\/ Exclude itself\n        {\n            Boid temp = boidsBuffer&#91;i];\n            \/\/ Accumulate individuals within the neighbor range\n            if(distance(boid.position, temp.position)&lt; neighbourDistance){\n                alignment += temp.direction;\n                cohesion += temp.position;\n                nearbyCount++;\n            }\n        }\n    }\n    float avg = 1.0 \/ nearbyCount;\n    alignment *= avg;\n    cohesion *= avg;\n    cohesion = normalize(cohesion-boid.position);\n\n    \/\/ Combine into a single movement direction\n    float3 direction = alignment + separation + cohesion;\n    \/\/ Smooth the turn and update the position\n    boid.direction = lerp(direction, normalize(boid.direction), 0.94);\n    \/\/ deltaTime keeps the movement speed independent of the frame rate.\n    boid.position += boid.direction * boidSpeed * deltaTime;\n\n    boidsBuffer&#91;id.x] = boid;<\/code><\/pre>\n\n\n\n<p>This is the result without any sense of boundaries (no separation term): all individuals crowd together and overlap.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-320.png\" alt=\"img\" class=\"wp-image-1459 lazyload\"\/><noscript><img decoding=\"async\" width=\"922\" height=\"418\" 
src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-320.png\" alt=\"img\" class=\"wp-image-1459 lazyload\" srcset=\"https:\/\/remoooo.com\/wp-content\/uploads\/image-320.png 922w, https:\/\/remoooo.com\/wp-content\/uploads\/image-320-300x136.png 300w, https:\/\/remoooo.com\/wp-content\/uploads\/image-320-768x348.png 768w\" sizes=\"(max-width: 922px) 100vw, 922px\" \/><\/noscript><\/figure>\n\n\n\n<p>Add the following code for the separation term.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>if(distance(boid.position, temp.position) &lt; neighbourDistance)\n{\n    float3 offset = boid.position - temp.position;\n    float dist = length(offset);\n    if(dist &lt; neighbourDistance)\n    {\n        dist = max(dist, 0.000001); \/\/ Avoid division by zero\n        separation += offset * (1.0\/dist - 1.0\/neighbourDistance);\n    }\n    ...<\/code><\/pre>\n\n\n\n<p>The term 1.0\/dist grows as two Boids get closer, so the separation force becomes stronger. The term 1.0\/neighbourDistance is a constant determined by the configured neighbor distance. Their difference is the weight of the separation force as a function of distance; since offset itself has length dist, the resulting force has magnitude 1 - dist\/neighbourDistance. If the distance between two Boids is exactly neighbourDistance, the weight is zero (no separation force). If the distance is less than neighbourDistance, the weight is positive, and the smaller the distance, the larger the weight.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-324.png\" alt=\"img\" class=\"wp-image-1463 lazyload\"\/><noscript><img decoding=\"async\" width=\"1440\" height=\"729\" src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-324.png\" alt=\"img\" class=\"wp-image-1463 lazyload\" srcset=\"https:\/\/remoooo.com\/wp-content\/uploads\/image-324.png 1440w, https:\/\/remoooo.com\/wp-content\/uploads\/image-324-300x152.png 300w, https:\/\/remoooo.com\/wp-content\/uploads\/image-324-1024x518.png 1024w, https:\/\/remoooo.com\/wp-content\/uploads\/image-324-768x389.png 768w\" sizes=\"(max-width: 1440px) 100vw, 1440px\" \/><\/noscript><\/figure>\n\n\n\n<p>Current code: https:\/\/github.com\/Remyuu\/Unity-Compute-Shader-Learn\/blob\/L4_Flocking\/Assets\/Shaders\/SimpleFlocking.compute<\/p>\n\n\n\n<p>The next section will use Instanced Mesh to improve performance.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">5. GPU Instancing Optimization<\/h3>\n\n\n\n<p>First, let&#039;s review the content of this chapter. 
In both the &quot;Hello Particle&quot; and &quot;Quad Particle&quot; examples, we used instanced drawing (Graphics.DrawProceduralNow()) to pass the particle positions calculated by the Compute Shader directly to the vertex\/fragment shader.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-322.png\" alt=\"img\" class=\"wp-image-1461 lazyload\"\/><noscript><img decoding=\"async\" width=\"1440\" height=\"359\" src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-322.png\" alt=\"img\" class=\"wp-image-1461 lazyload\" srcset=\"https:\/\/remoooo.com\/wp-content\/uploads\/image-322.png 1440w, https:\/\/remoooo.com\/wp-content\/uploads\/image-322-300x75.png 300w, https:\/\/remoooo.com\/wp-content\/uploads\/image-322-1024x255.png 1024w, https:\/\/remoooo.com\/wp-content\/uploads\/image-322-768x191.png 768w\" sizes=\"(max-width: 1440px) 100vw, 1440px\" \/><\/noscript><\/figure>\n\n\n\n<p>DrawMeshInstancedIndirect, used in this section, draws a large number of instances of the same geometry whose positions, rotations or other parameters differ slightly. Compared with DrawProceduralNow, which regenerates the geometry and renders it every frame, DrawMeshInstancedIndirect only needs to set the instance information once, and then the GPU can render all instances at once based on this information. 
This function is well suited to rendering grass and groups of animals.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-326.png\" alt=\"img\" class=\"wp-image-1465 lazyload\"\/><noscript><img decoding=\"async\" width=\"1440\" height=\"197\" src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-326.png\" alt=\"img\" class=\"wp-image-1465 lazyload\" srcset=\"https:\/\/remoooo.com\/wp-content\/uploads\/image-326.png 1440w, https:\/\/remoooo.com\/wp-content\/uploads\/image-326-300x41.png 300w, https:\/\/remoooo.com\/wp-content\/uploads\/image-326-1024x140.png 1024w, https:\/\/remoooo.com\/wp-content\/uploads\/image-326-768x105.png 768w\" sizes=\"(max-width: 1440px) 100vw, 1440px\" \/><\/noscript><\/figure>\n\n\n\n<p>This function has many parameters, only some of which are used here.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-323.png\" alt=\"img\" class=\"wp-image-1462 lazyload\"\/><noscript><img decoding=\"async\" width=\"1386\" height=\"670\" src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-323.png\" alt=\"img\" class=\"wp-image-1462 lazyload\" srcset=\"https:\/\/remoooo.com\/wp-content\/uploads\/image-323.png 1386w, https:\/\/remoooo.com\/wp-content\/uploads\/image-323-300x145.png 300w, https:\/\/remoooo.com\/wp-content\/uploads\/image-323-1024x495.png 1024w, https:\/\/remoooo.com\/wp-content\/uploads\/image-323-768x371.png 768w\" sizes=\"(max-width: 1386px) 100vw, 1386px\" \/><\/noscript><\/figure>\n\n\n\n<pre class=\"wp-block-code\"><code>Graphics.DrawMeshInstancedIndirect(boidMesh, 0, boidMaterial, bounds, argsBuffer);<\/code><\/pre>\n\n\n\n<ol class=\"wp-block-list\">\n<li>boidMesh: The bird Mesh to draw.<\/li>\n\n\n\n<li>subMeshIndex: The submesh index to draw. Usually 0 if the mesh has only one submesh.<\/li>\n\n\n\n<li>boidMaterial: The material applied to the instanced object.<\/li>\n\n\n\n<li>bounds: The bounding volume surrounding the instances to be drawn. Unity uses it to cull the instances as a whole, which helps performance.<\/li>\n\n\n\n<li>argsBuffer: ComputeBuffer of parameters, including the number of indices of each instance&#039;s geometry and the number of instances.<\/li>\n<\/ol>\n\n\n\n<p>What is this argsBuffer? It tells the GPU how many indices the mesh has and how many instances of it to render. It is passed in as a special Buffer.<\/p>\n\n\n\n<p>When initializing the shader, a special Buffer is created with the type ComputeBufferType.IndirectArguments. This type of buffer exists specifically to hand draw arguments to the GPU so that indirect drawing commands can be executed there. The first parameter of new ComputeBuffer here is 1, meaning the buffer holds one args array (one array of 5 uints). Don&#039;t mix this up with the element count of the array.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>ComputeBuffer argsBuffer;\n...\nargsBuffer = new ComputeBuffer(1, 5 * sizeof(uint), ComputeBufferType.IndirectArguments);\nif (boidMesh != null)\n{\n    args[0] = (uint)boidMesh.GetIndexCount(0); \/\/ Index count per instance\n    args[1] = (uint)numOfBoids; \/\/ Instance count\n}\nargsBuffer.SetData(args);\n...\nGraphics.DrawMeshInstancedIndirect(boidMesh, 0, boidMaterial, bounds, argsBuffer);<\/code><\/pre>\n\n\n\n<p>Building on the previous section, a noise offset is added to each individual&#039;s data structure and used in the Compute Shader. In addition, the initial direction is interpolated using Slerp: 70% keeps the original direction, and 30% is random. 
The result of Slerp interpolation is a quaternion, which is converted to Euler angles through its eulerAngles property and then passed into the constructor.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>public float noise_offset;\n...\nQuaternion rot = Quaternion.Slerp(transform.rotation, Random.rotation, 0.3f);\nboidsArray[i] = new Boid(pos, rot.eulerAngles, offset);<\/code><\/pre>\n\n\n\n<p>After passing this new attribute noise_offset to the Compute Shader, a noise value in the range [-1, 1] is calculated and applied to the bird&#039;s speed.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>float noise = clamp(noise1(time \/ 100.0 + boid.noise_offset), -1, 1) * 2.0 - 1.0;\nfloat velocity = boidSpeed * (1.0 + noise * boidSpeedVariation);<\/code><\/pre>\n\n\n\n<p>Then the algorithm is tidied up a little. The Compute Shader is otherwise basically the same.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>if (distance(boid_pos, boidsBuffer&#91;i].position) &lt; neighbourDistance)\n{\n    float3 tempBoid_position = boidsBuffer&#91;i].position;\n\n    float3 offset = boid.position - tempBoid_position;\n    float dist = length(offset);\n    if (dist&lt;neighbourDistance){\n        dist = max(dist, 0.000001);\/\/Avoid division by zero\n        separation += offset * (1.0\/dist - 1.0\/neighbourDistance);\n    }\n    alignment += boidsBuffer&#91;i].direction;\n    cohesion += tempBoid_position;\n\n    nearbyCount += 1;\n}<\/code><\/pre>\n\n\n\n<p>The biggest difference is in the shader. This section uses a Surface Shader instead of a hand-written vertex\/fragment shader. A Surface Shader is essentially a packaged vertex and fragment shader in which Unity has already done much of the tedious work, such as lighting and shadows. You can still specify a custom vertex function.<\/p>\n\n\n\n<p>When writing the shader for the material, instanced objects need special handling. This is because the positions, rotations and other properties of ordinary rendered objects are static in Unity. 
For the instanced objects we are about to draw, positions, rotations and other parameters change constantly. Therefore, the rendering pipeline needs a special mechanism that dynamically sets the position and parameters of each instance. The method used here is procedural instancing, which renders all instances in a single batch instead of drawing them one by one.<\/p>\n\n\n\n<p>The shader uses procedural instancing. The setup function runs before vert, so each instance gets its own rotation, translation, and scaling matrix.<\/p>\n\n\n\n<p>Now we need to create a rotation matrix for each instance. From the Buffer, we get the basic information of the bird calculated by the Compute Shader (in the previous section, the data was sent back to the CPU; here it goes directly to the Shader for instancing):<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-325.png\" alt=\"img\" class=\"wp-image-1464 lazyload\"\/><noscript><img decoding=\"async\" width=\"1020\" height=\"244\" src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-325.png\" alt=\"img\" class=\"wp-image-1464 lazyload\" srcset=\"https:\/\/remoooo.com\/wp-content\/uploads\/image-325.png 1020w, https:\/\/remoooo.com\/wp-content\/uploads\/image-325-300x72.png 300w, https:\/\/remoooo.com\/wp-content\/uploads\/image-325-768x184.png 768w\" sizes=\"(max-width: 1020px) 100vw, 1020px\" \/><\/noscript><\/figure>\n\n\n\n<p>In the Shader, the data structure and the Buffer declaration are wrapped in the following macro.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ .shader\n#ifdef UNITY_PROCEDURAL_INSTANCING_ENABLED\nstruct Boid\n{\n    float3 position;\n    float3 direction;\n    float 
noise_offset;\n};\n\nStructuredBuffer&lt;Boid&gt; boidsBuffer; \n#endif<\/code><\/pre>\n\n\n\n<p>Since args[1] passed to DrawMeshInstancedIndirect in C# is the number of birds to instance (which is also the size of the Buffer), the Buffer can be indexed directly with unity_InstanceID.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#pragma instancing_options procedural:setup\n\nvoid setup()\n{\n    #ifdef UNITY_PROCEDURAL_INSTANCING_ENABLED\n        _BoidPosition = boidsBuffer&#91;unity_InstanceID].position;\n        _Matrix = create_matrix(boidsBuffer&#91;unity_InstanceID].position, boidsBuffer&#91;unity_InstanceID].direction, float3(0.0, 1.0, 0.0));\n    #endif\n}<\/code><\/pre>\n\n\n\n<p>Computing the transformation matrix here involves <strong>Homogeneous Coordinates<\/strong>; the GAMES101 course is a good refresher. A point is (x,y,z,1) and a vector (direction) is (x,y,z,0).<\/p>\n\n\n\n<p>If you apply the affine transformation as a separate rotation followed by a translation, the code is as follows:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>void setup()\n{\n    #ifdef UNITY_PROCEDURAL_INSTANCING_ENABLED\n    _BoidPosition = boidsBuffer[unity_InstanceID].position;\n    _LookAtMatrix = look_at_matrix(boidsBuffer[unity_InstanceID].direction, float3(0.0, 1.0, 0.0));\n    #endif\n}\n\nvoid vert(inout appdata_full v, out Input data)\n{\n    UNITY_INITIALIZE_OUTPUT(Input, data);\n\n    #ifdef UNITY_PROCEDURAL_INSTANCING_ENABLED\n    v.vertex = mul(_LookAtMatrix, v.vertex);\n    v.vertex.xyz += _BoidPosition;\n    #endif\n}<\/code><\/pre>\n\n\n\n<p>That is not elegant enough; we can simply use homogeneous coordinates. 
One matrix handles rotation, translation and scaling!<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>void setup()\n{\n    #ifdef UNITY_PROCEDURAL_INSTANCING_ENABLED\n    _BoidPosition = boidsBuffer&#91;unity_InstanceID].position;\n    _Matrix = create_matrix(boidsBuffer&#91;unity_InstanceID].position, boidsBuffer&#91;unity_InstanceID].direction, float3(0.0, 1.0, 0.0));\n    #endif\n}\n void vert(inout appdata_full v, out Input data)\n{\n    UNITY_INITIALIZE_OUTPUT(Input, data);\n\n    #ifdef UNITY_PROCEDURAL_INSTANCING_ENABLED\n    v.vertex = mul(_Matrix, v.vertex);\n    #endif\n}<\/code><\/pre>\n\n\n\n<p>Now, we are done! The current frame rate is nearly doubled compared to the previous section.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-329.png\" alt=\"img\" class=\"wp-image-1469 lazyload\"\/><noscript><img decoding=\"async\" width=\"1440\" height=\"872\" src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-329.png\" alt=\"img\" class=\"wp-image-1469 lazyload\" srcset=\"https:\/\/remoooo.com\/wp-content\/uploads\/image-329.png 1440w, https:\/\/remoooo.com\/wp-content\/uploads\/image-329-300x182.png 300w, https:\/\/remoooo.com\/wp-content\/uploads\/image-329-1024x620.png 1024w, https:\/\/remoooo.com\/wp-content\/uploads\/image-329-768x465.png 768w\" sizes=\"(max-width: 1440px) 100vw, 1440px\" \/><\/noscript><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-327.png\" alt=\"img\" class=\"wp-image-1466 lazyload\"\/><noscript><img decoding=\"async\" width=\"898\" height=\"454\" src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-327.png\" alt=\"img\" class=\"wp-image-1466 lazyload\" 
srcset=\"https:\/\/remoooo.com\/wp-content\/uploads\/image-327.png 898w, https:\/\/remoooo.com\/wp-content\/uploads\/image-327-300x152.png 300w, https:\/\/remoooo.com\/wp-content\/uploads\/image-327-768x388.png 768w\" sizes=\"(max-width: 898px) 100vw, 898px\" \/><\/noscript><\/figure>\n\n\n\n<p>Current version code:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Compute Shader: https:\/\/github.com\/Remyuu\/Unity-Compute-Shader-Learn\/blob\/L4_Instanced\/Assets\/Shaders\/InstancedFlocking.compute<\/li>\n\n\n\n<li>CPU: https:\/\/github.com\/Remyuu\/Unity-Compute-Shader-Learn\/blob\/L4_Instanced\/Assets\/Scripts\/InstancedFlocking.cs<\/li>\n\n\n\n<li>Shader: https:\/\/github.com\/Remyuu\/Unity-Compute-Shader-Learn\/blob\/L4_Instanced\/Assets\/Shaders\/InstancedFlocking.shader<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6. Applying skinned animation<\/h3>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-328.png\" alt=\"img\" class=\"wp-image-1467 lazyload\"\/><noscript><img decoding=\"async\" width=\"1212\" height=\"336\" src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-328.png\" alt=\"img\" class=\"wp-image-1467 lazyload\" srcset=\"https:\/\/remoooo.com\/wp-content\/uploads\/image-328.png 1212w, https:\/\/remoooo.com\/wp-content\/uploads\/image-328-300x83.png 300w, https:\/\/remoooo.com\/wp-content\/uploads\/image-328-1024x284.png 1024w, https:\/\/remoooo.com\/wp-content\/uploads\/image-328-768x213.png 768w\" sizes=\"(max-width: 1212px) 100vw, 1212px\" \/><\/noscript><\/figure>\n\n\n\n<p>In this section, we use the Animator component to bake the Mesh of each keyframe into a Buffer before instancing the objects. By selecting different indices, we get the Mesh in different poses. 
How to author the skeletal animation itself is beyond the scope of this article.<\/p>\n\n\n\n<p>You just need to modify the code from the previous section and add the Animator logic. The comments in the code below explain the details.<\/p>\n\n\n\n<p>The individual data structure is also updated:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>struct Boid{\n    float3 position;\n    float3 direction;\n    float noise_offset;\n    float speed; \/\/ not useful for now\n    float frame; \/\/ indicates the current frame index in the animation\n    float3 padding; \/\/ ensure data alignment\n};<\/code><\/pre>\n\n\n\n<p>Let&#039;s talk about alignment in detail. In a data structure, the size of the data should preferably be an integer multiple of 16 bytes.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>float3 position; (12 bytes)<\/li>\n\n\n\n<li>float3 direction; (12 bytes)<\/li>\n\n\n\n<li>float noise_offset; (4 bytes)<\/li>\n\n\n\n<li>float speed; (4 bytes)<\/li>\n\n\n\n<li>float frame; (4 bytes)<\/li>\n\n\n\n<li>float3 padding; (12 bytes)<\/li>\n<\/ul>\n\n\n\n<p>Without padding, the size is 36 bytes (12 + 12 + 4 + 4 + 4), which is not a multiple of 16. 
With padding, the size is 48 bytes, a multiple of 16. Perfect!<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>private SkinnedMeshRenderer boidSMR; \/\/ References the SkinnedMeshRenderer component that holds the skinned mesh.\nprivate Animator animator;\npublic AnimationClip animationClip; \/\/ The animation clip, used to compute animation-related parameters.\n\nprivate int numOfFrames; \/\/ Number of frames in the animation; determines how many frames of data are stored in the GPU buffer.\npublic float boidFrameSpeed = 10f; \/\/ Controls the animation playback speed.\nMaterialPropertyBlock props; \/\/ Passes parameters to the shader without creating a new material instance, so material properties (such as color or lighting coefficients) can be changed without affecting other objects that use the same material.\nMesh boidMesh; \/\/ Stores the mesh data baked from the SkinnedMeshRenderer.\n...\nvoid Start(){ \/\/ First initialize the Boid data, then call GenerateSkinnedAnimationForGPUBuffer to prepare the animation data, and finally call InitShader to set the Shader parameters needed for rendering.\n    ...\n    \/\/ This property block is used only for avoiding an instancing bug.\n    props = new MaterialPropertyBlock();\n    props.SetFloat(\"_UniqueID\", Random.value);\n    ...\n    InitBoids();\n    GenerateSkinnedAnimationForGPUBuffer();\n    InitShader();\n}\nvoid InitShader(){ \/\/ Configures the Shader and material properties so that animation playback displays correctly for each instance&#039;s current phase. Enabling or disabling frameInterpolation decides whether to interpolate between animation frames for smoother animation.\n    ...\n    if (boidMesh)\/\/Set by the GenerateSkinnedAnimationForGPUBuffer\n    ...\n    shader.SetFloat(\"boidFrameSpeed\", boidFrameSpeed);\n    shader.SetInt(\"numOfFrames\", numOfFrames);\n    boidMaterial.SetInt(\"numOfFrames\", numOfFrames);\n    if (frameInterpolation &amp;&amp; !boidMaterial.IsKeywordEnabled(\"FRAME_INTERPOLATION\"))\n        boidMaterial.EnableKeyword(\"FRAME_INTERPOLATION\");\n    if (!frameInterpolation &amp;&amp; boidMaterial.IsKeywordEnabled(\"FRAME_INTERPOLATION\"))\n        boidMaterial.DisableKeyword(\"FRAME_INTERPOLATION\");\n}\nvoid Update(){\n    ...\n    \/\/ The last two parameters:\n        \/\/ 1. 0: offset into the args buffer, specifying where to start reading the arguments.\n        \/\/ 2. props: the MaterialPropertyBlock created earlier, containing the properties shared by all instances.\n    Graphics.DrawMeshInstancedIndirect( boidMesh, 0, boidMaterial, bounds, argsBuffer, 0, props);\n}\nvoid OnDestroy(){ \n    ...\n    if (vertexAnimationBuffer != null) vertexAnimationBuffer.Release();\n}\nprivate void GenerateSkinnedAnimationForGPUBuffer()\n{\n    ... \/\/ Continued below\n}<\/code><\/pre>\n\n\n\n<p>To give the Shader a Mesh in a different pose at each point in time, the GenerateSkinnedAnimationForGPUBuffer() function extracts the mesh vertex data of every frame from the Animator and SkinnedMeshRenderer, then stores the data in a ComputeBuffer on the GPU for use in instanced rendering.<\/p>\n\n\n\n<p>GetCurrentAnimatorStateInfo obtains the state information of the current animation layer, which is used later to control animation playback precisely.<\/p>\n\n\n\n<p>numOfFrames is set to the power of two closest to the product of the animation length and the frame rate, which can optimize GPU memory access.<\/p>\n\n\n\n<p>Then create a ComputeBuffer, vertexAnimationBuffer, to store the vertex data for all frames.<\/p>\n\n\n\n<p>In the for loop, bake all animation frames. Specifically, play and update immediately at each sampleTime point, then bake the mesh of the current animation frame into bakedMesh. 
Then extract the newly baked Mesh vertices, write them into the vertexAnimationData array, and finally upload the array to the GPU.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ ...continued from above\nboidSMR = boidObject.GetComponentInChildren&lt;SkinnedMeshRenderer&gt;();\nboidMesh = boidSMR.sharedMesh;\nanimator = boidObject.GetComponentInChildren&lt;Animator&gt;();\nint iLayer = 0;\nAnimatorStateInfo aniStateInfo = animator.GetCurrentAnimatorStateInfo(iLayer);\n\nMesh bakedMesh = new Mesh();\nfloat sampleTime = 0;\nfloat perFrameTime = 0;\n\nnumOfFrames = Mathf.ClosestPowerOfTwo((int)(animationClip.frameRate * animationClip.length));\nperFrameTime = animationClip.length \/ numOfFrames;\n\nvar vertexCount = boidSMR.sharedMesh.vertexCount;\n\/\/ Stride of 16 bytes = one Vector4 (float4) per vertex per frame\nvertexAnimationBuffer = new ComputeBuffer(vertexCount * numOfFrames, 16);\nVector4&#91;] vertexAnimationData = new Vector4&#91;vertexCount * numOfFrames];\nfor (int i = 0; i &lt; numOfFrames; i++)\n{\n    \/\/ The third argument of Animator.Play is the normalized start time\n    animator.Play(aniStateInfo.shortNameHash, iLayer, sampleTime);\n    animator.Update(0f);\n\n    boidSMR.BakeMesh(bakedMesh);\n    \/\/ Mesh.vertices returns a copy on every access, so read it once per frame\n    Vector3&#91;] bakedVertices = bakedMesh.vertices;\n\n    for(int j = 0; j &lt; vertexCount; j++)\n    {\n        Vector4 vertex = bakedVertices&#91;j];\n        vertex.w = 1;\n        \/\/ Layout: all frames of vertex j are stored contiguously\n        vertexAnimationData&#91;(j * numOfFrames) +  i] = vertex;\n    }\n\n    sampleTime += perFrameTime;\n}\n\nvertexAnimationBuffer.SetData(vertexAnimationData);\nboidMaterial.SetBuffer(\"vertexAnimation\", vertexAnimationBuffer);\n\nboidObject.SetActive(false);<\/code><\/pre>\n\n\n\n<p>In the Compute Shader, advance the frame variable stored in the data structure of each boid, wrapping around once it passes the last frame.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>boid.frame = boid.frame + velocity * deltaTime * boidFrameSpeed;\nif (boid.frame &gt;= numOfFrames) boid.frame -= numOfFrames;<\/code><\/pre>\n\n\n\n<p>In the Shader, lerp between adjacent animation frames. The left side is without frame interpolation, and the right side is with interpolation. 
The difference is clearly visible.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-96.jpeg\" alt=\"Video cover\" class=\"wp-image-1468 lazyload\"\/><noscript><img decoding=\"async\" width=\"1254\" height=\"746\" src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-96.jpeg\" alt=\"Video cover\" class=\"wp-image-1468 lazyload\" srcset=\"https:\/\/remoooo.com\/wp-content\/uploads\/image-96.jpeg 1254w, https:\/\/remoooo.com\/wp-content\/uploads\/image-96-300x178.jpeg 300w, https:\/\/remoooo.com\/wp-content\/uploads\/image-96-1024x609.jpeg 1024w, https:\/\/remoooo.com\/wp-content\/uploads\/image-96-768x457.jpeg 768w\" sizes=\"(max-width: 1254px) 100vw, 1254px\" \/><\/noscript><\/figure>\n\n\n\n<pre class=\"wp-block-code\"><code>void vert(inout appdata_custom v)\n{\n    #ifdef UNITY_PROCEDURAL_INSTANCING_ENABLED\n        #ifdef FRAME_INTERPOLATION\n            v.vertex = lerp(vertexAnimation[v.id * numOfFrames + _CurrentFrame], vertexAnimation[v.id * numOfFrames + _NextFrame], _FrameInterpolation);\n        #else\n            v.vertex = vertexAnimation[v.id * numOfFrames + _CurrentFrame];\n        #endif\n        v.vertex = mul(_Matrix, v.vertex);\n    #endif\n}\n\nvoid setup()\n{\n    #ifdef UNITY_PROCEDURAL_INSTANCING_ENABLED\n        _Matrix = create_matrix(boidsBuffer[unity_InstanceID].position, boidsBuffer[unity_InstanceID].direction, float3(0.0, 1.0, 0.0));\n        _CurrentFrame = boidsBuffer[unity_InstanceID].frame;\n        #ifdef FRAME_INTERPOLATION\n            _NextFrame = _CurrentFrame + 1;\n            if (_NextFrame &gt;= numOfFrames) _NextFrame = 0;\n            _FrameInterpolation = frac(boidsBuffer[unity_InstanceID].frame);\n        #endif\n    #endif\n}<\/code><\/pre>\n\n\n\n<p>It was not easy, but it is finally complete.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" 
src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-332.png\" alt=\"img\" class=\"wp-image-1472 lazyload\"\/><noscript><img decoding=\"async\" width=\"1440\" height=\"821\" src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-332.png\" alt=\"img\" class=\"wp-image-1472 lazyload\" srcset=\"https:\/\/remoooo.com\/wp-content\/uploads\/image-332.png 1440w, https:\/\/remoooo.com\/wp-content\/uploads\/image-332-300x171.png 300w, https:\/\/remoooo.com\/wp-content\/uploads\/image-332-1024x584.png 1024w, https:\/\/remoooo.com\/wp-content\/uploads\/image-332-768x438.png 768w\" sizes=\"(max-width: 1440px) 100vw, 1440px\" \/><\/noscript><\/figure>\n\n\n\n<p>Complete project link: https:\/\/github.com\/Remyuu\/Unity-Compute-Shader-Learn\/tree\/L4_Skinned\/Assets\/Scripts<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">8. Summary\/Quiz<\/h3>\n\n\n\n<p>When rendering points which gives the best answer?<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-331.png\" alt=\"img\" class=\"wp-image-1471 lazyload\"\/><noscript><img decoding=\"async\" width=\"1440\" height=\"380\" src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-331.png\" alt=\"img\" class=\"wp-image-1471 lazyload\" srcset=\"https:\/\/remoooo.com\/wp-content\/uploads\/image-331.png 1440w, https:\/\/remoooo.com\/wp-content\/uploads\/image-331-300x79.png 300w, https:\/\/remoooo.com\/wp-content\/uploads\/image-331-1024x270.png 1024w, https:\/\/remoooo.com\/wp-content\/uploads\/image-331-768x203.png 768w\" sizes=\"(max-width: 1440px) 100vw, 1440px\" \/><\/noscript><\/figure>\n\n\n\n<p>What are the three key steps in flocking?<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" 
src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-330.png\" alt=\"img\" class=\"wp-image-1470 lazyload\"\/><noscript><img decoding=\"async\" width=\"1440\" height=\"368\" src=\"https:\/\/\u80a5\u80a5.com\/wp-content\/uploads\/image-330.png\" alt=\"img\" class=\"wp-image-1470 lazyload\" srcset=\"https:\/\/remoooo.com\/wp-content\/uploads\/image-330.png 1440w, https:\/\/remoooo.com\/wp-content\/uploads\/image-330-300x77.png 300w, https:\/\/remoooo.com\/wp-content\/uploads\/image-330-1024x262.png 1024w, https:\/\/remoooo.com\/wp-content\/uploads\/image-330-768x196.png 768w\" sizes=\"(max-width: 1440px) 100vw, 1440px\" \/><\/noscript><\/figure>\n\n\n\n<p>When creating an arguments buffer for DrawMeshInstancedIndirect, how many uints are required?<\/p>\n\n\n\n<p><img decoding=\"async\" class=\"lazyload\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/picx.zhimg.com\/80\/v2-93d1dc1419f1246f0ccbe8b33a2d8e35_1440w.png?source=d16d100b\" alt=\"img\"><noscript><img decoding=\"async\" class=\"lazyload\" src=\"https:\/\/picx.zhimg.com\/80\/v2-93d1dc1419f1246f0ccbe8b33a2d8e35_1440w.png?source=d16d100b\" alt=\"img\"><\/noscript><\/p>\n\n\n\n<p>We created the wing flapping by using a skinned mesh shader. 
True or False.<\/p>\n\n\n\n<p><img decoding=\"async\" class=\"lazyload\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/picx.zhimg.com\/80\/v2-7cb3d96798ee1a490d9e0d9641a8d120_1440w.png?source=d16d100b\" alt=\"img\"><noscript><img decoding=\"async\" class=\"lazyload\" src=\"https:\/\/picx.zhimg.com\/80\/v2-7cb3d96798ee1a490d9e0d9641a8d120_1440w.png?source=d16d100b\" alt=\"img\"><\/noscript><\/p>\n\n\n\n<p>In a shader used by DrawMeshInstancedIndirect, which variable name gives the correct index for the instance?<\/p>\n\n\n\n<p><img decoding=\"async\" class=\"lazyload\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/picx.zhimg.com\/80\/v2-d4793de8e95a27312d93da12486354cf_1440w.png?source=d16d100b\" alt=\"img\"><noscript><img decoding=\"async\" class=\"lazyload\" src=\"https:\/\/picx.zhimg.com\/80\/v2-d4793de8e95a27312d93da12486354cf_1440w.png?source=d16d100b\" alt=\"img\"><\/noscript><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">References<\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li>https:\/\/en.wikipedia.org\/wiki\/Boids<\/li>\n\n\n\n<li><a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/37401.37406\">Flocks, Herds, and Schools: A Distributed Behavioral Model<\/a><\/li>\n<\/ol>","protected":false},"excerpt":{"rendered":"<p>\u7d27\u63a5\u7740\u4e0a\u4e00\u7bc7\u6587\u7ae0 remoooo\uff1aCompute Shader\u5b66\u4e60\u7b14\u8bb0\uff08\u4e8c\uff09\u4e4b \u540e\u5904\u7406\u6548\u679c L4 \u7c92\u5b50\u6548\u679c\u4e0e\u7fa4 
[&hellip;]<\/p>","protected":false},"author":1,"featured_media":1455,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[53],"tags":[74,37],"class_list":["post-1441","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tech","tag-compute-shader","tag-unity"],"_links":{"self":[{"href":"https:\/\/remoooo.com\/en\/wp-json\/wp\/v2\/posts\/1441","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/remoooo.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/remoooo.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/remoooo.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/remoooo.com\/en\/wp-json\/wp\/v2\/comments?post=1441"}],"version-history":[{"count":2,"href":"https:\/\/remoooo.com\/en\/wp-json\/wp\/v2\/posts\/1441\/revisions"}],"predecessor-version":[{"id":1596,"href":"https:\/\/remoooo.com\/en\/wp-json\/wp\/v2\/posts\/1441\/revisions\/1596"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/remoooo.com\/en\/wp-json\/wp\/v2\/media\/1455"}],"wp:attachment":[{"href":"https:\/\/remoooo.com\/en\/wp-json\/wp\/v2\/media?parent=1441"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/remoooo.com\/en\/wp-json\/wp\/v2\/categories?post=1441"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/remoooo.com\/en\/wp-json\/wp\/v2\/tags?post=1441"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}