Tag: Shader

  • Unity Tessellation in Detail

    Tags: Getting Started/Shader/Tessellation Shader/Displacement Map/LOD/Smooth Outline/Early Culling

    The word tessellation refers to a broad category of design activities, usually involving the arrangement of tiles of various geometric shapes next to each other to form a pattern on a flat surface. Its purpose can be artistic or practical, and many examples date back thousands of years. — Tessellation, Wikipedia, accessed July 2020.


    This article mainly refers to:

    https://nedmakesgames.medium.com/mastering-tessellation-shaders-and-their-many-uses-in-unity-9caeb760150e

    Tessellation in game development is generally done on a flat triangle (or quad) patch, followed by vertex displacement using a displacement map, or using the Phong subdivision or PN-triangles subdivision implemented in this article.

    Phong subdivision does not need to know the adjacent topology information; it only uses interpolation, which makes it more efficient than PN triangles and similar algorithms. Loop and Schaefer (mentioned in GAMES101) approximate Catmull-Clark surfaces with low-degree quadrilateral patches; methods like these replace the input polygons with a polynomial surface. The Phong subdivision in this article requires no extra operation to correct additional geometry.

    1. Overview of the tessellation process

    This chapter introduces the process of surface subdivision in the rendering pipeline.

    The tessellation shader sits after the vertex shader, and tessellation is divided into three steps: Hull, Tessellator and Domain, of which the Tessellator is not programmable.

    The first step of tessellation is the Hull Stage (also known as the Tessellation Control Shader, TCS), which outputs control points and tessellation factors. This stage mainly consists of two functions that run in parallel: the Hull Function and the Patch Constant Function.

    Both functions receive patches, which are sets of vertex indices; a triangle, for example, is represented by three vertex indices. One patch forms one primitive: a triangle patch is composed of three vertex indices.

    The Hull Function is executed once per vertex, and the Patch Constant Function is executed once per patch. The former outputs the modified control-point data (usually the vertex position, plus possibly normals, texture coordinates and other attributes), while the latter outputs constant data for the entire patch, namely the tessellation factors. The tessellation factors tell the next stage (the tessellator) how finely to subdivide each patch.

    In general, the Hull Function modifies each control point, while the Patch Constant Function determines the level of subdivision based on the distance from the camera.

    Next comes the non-programmable stage, the tessellator. It receives the patch and the tessellation factors just computed, and generates barycentric coordinates for each new vertex.

    Next comes the last step, the Domain Stage (also known as the Tessellation Evaluation Shader, TES), which is programmable. It consists of the Domain Function, which is executed once per vertex. The function receives the barycentric coordinates along with the outputs of the Hull Function and the Patch Constant Function. Most of the logic is written here; crucially, this is the stage where you can reposition vertices, which is the most important part of tessellation.

    If there is a geometry shader, it will be executed after the Domain Stage. But if not, it will come to the rasterization stage.

    In summary: first the vertex shader runs. The Hull Stage accepts vertex data and decides how to subdivide the mesh, the tessellator generates the subdivided mesh, and finally the Domain Stage outputs the vertices consumed by the fragment shader.

    2. Surface subdivision analysis

    This chapter contains code analysis of Unity's surface subdivision, practical example effects display and an overview of the underlying principles.

    2.1 Key code analysis

    2.1.1 Basic settings of Unity tessellation

    First of all, the tessellation shader needs to use shader target 5.0.

    HLSLPROGRAM
    #pragma target 5.0 // 5.0 required for tessellation
    
    #pragma vertex Vertex
    #pragma hull Hull
    #pragma domain Domain
    #pragma fragment Fragment
    
    ENDHLSL

    2.1.2 Hull Stage Code 1 – Hull Function

    In the classic setup, the vertex shader converts the position and normal into world space, and the result is passed to the Hull Stage. Note that, unlike in the vertex shader, the vertex position in the hull shader uses the INTERNALTESSPOS semantic instead of POSITION. The reason is that the Hull Stage does not output these positions to the rest of the pipeline; they exist for its own tessellation algorithm, so they can live in whatever coordinate system suits tessellation best. The separate semantic also makes the distinction clearer to developers.

    struct Attributes {
        float3 positionOS : POSITION;
        float3 normalOS : NORMAL;
        UNITY_VERTEX_INPUT_INSTANCE_ID
    };
    
    struct TessellationControlPoint {
        float3 positionWS : INTERNALTESSPOS;
        float3 normalWS : NORMAL;
        UNITY_VERTEX_INPUT_INSTANCE_ID
    };
    
    TessellationControlPoint Vertex(Attributes input) {
        TessellationControlPoint output;
    
        UNITY_SETUP_INSTANCE_ID(input);
        UNITY_TRANSFER_INSTANCE_ID(input, output);
    
        VertexPositionInputs posnInputs = GetVertexPositionInputs(input.positionOS);
        VertexNormalInputs normalInputs = GetVertexNormalInputs(input.normalOS);
    
        output.positionWS = posnInputs.positionWS;
        output.normalWS = normalInputs.normalWS;
        return output;
    }

    Below are some setting parameters for the Hull Shader.

    The first line, domain, defines the domain type of the tessellation shader; here both the input and output are triangle primitives. Valid values include tri (triangle) and quad (quadrilateral).

    The second line, outputcontrolpoints, is the number of output control points; 3 corresponds to the three vertices of a triangle.

    The third line, outputtopology, is the topology of the primitives after subdivision. triangle_cw means the output triangle's vertices are wound clockwise; the correct winding ensures the surface faces outward. Options are triangle_cw (clockwise triangle), triangle_ccw (counterclockwise triangle) and line (line segment).

    The fourth line, patchconstantfunc, names the other function of the Hull Stage, which outputs per-patch constant data such as the tessellation factors. It is executed once per patch.

    The fifth line, partitioning, specifies how the extra vertices are distributed along the edges of the original patch primitive, which controls how smooth and uniform the subdivision is. Options are integer, fractional_even and fractional_odd.

    The sixth line, maxtessfactor, is the maximum tessellation factor; capping it keeps the rendering cost under control.

    [domain("tri")]
    [outputcontrolpoints(3)]
    [outputtopology("triangle_cw")]
    [patchconstantfunc("patchconstant")]
    [partitioning("fractional_even")]
    [maxtessfactor(64.0)]

    In the hull shader, each control point is processed by an independent invocation, so the function runs once per control point. To know which vertex is currently being processed, use the id parameter with the SV_OutputControlPointID semantic. The function is also passed a special structure that lets you access any control point in the patch like an array.

    TessellationControlPoint Hull(
        InputPatch<TessellationControlPoint, 3> patch, uint id : SV_OutputControlPointID) {
        TessellationControlPoint h;
        // Hull shader code here
    
        return patch[id];
    }

    2.1.3 Hull Stage Code 2 – Patch Constant Function

    In addition to the Hull Shader, there is another function in the Hull Stage that runs in parallel, the patch constant function. The signature of this function is relatively simple. It inputs a patch and outputs the calculated subdivision factor. The output structure contains the tessellation factor specified for each edge of the triangle. These factors are identified by the special system value semantics SV_TessFactor. Each tessellation factor defines how many small segments the corresponding edge should be subdivided into, thereby affecting the density and details of the resulting mesh. Let's take a closer look at what this factor specifically contains.

    struct TessellationFactors {
        float edge[3] : SV_TessFactor;
        float inside : SV_InsideTessFactor;
    };
    // The patch constant function runs once per triangle, or "patch"
    // It runs in parallel to the hull function
    TessellationFactors PatchConstantFunction(
        InputPatch<TessellationControlPoint, 3> patch) {
        UNITY_SETUP_INSTANCE_ID(patch[0]); // Set up instancing
        //Calculate tessellation factors
        TessellationFactors f;
        f.edge[0] = _FactorEdge1.x;
        f.edge[1] = _FactorEdge1.y;
        f.edge[2] = _FactorEdge1.z;
        f.inside = _FactorInside;
        return f;
    }

    First, the TessellationFactors structure contains an edge tessellation factor array edge[3], marked SV_TessFactor. When triangles are the basic tessellation primitive, each edge is indexed by the vertex opposite it. Specifically: edge 0 lies between vertex 1 and vertex 2, edge 1 between vertex 2 and vertex 0, and edge 2 between vertex 0 and vertex 1. Why? The intuition is that an edge's index equals the index of the one vertex it does not touch, which makes it easy to match edges to specific vertices when writing shader code.
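The opposite-vertex rule above can be stated as a one-liner. This is a small Python sketch (the helper name `edge_vertices` is made up for illustration):

```python
def edge_vertices(i):
    """Return the indices of the two vertices that triangle edge i connects.

    Edge i is the edge opposite vertex i, so it connects the other two vertices.
    """
    assert i in (0, 1, 2)
    return ((i + 1) % 3, (i + 2) % 3)
```

For example, `edge_vertices(0)` gives `(1, 2)`, matching "edge 0 lies between vertex 1 and vertex 2".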

    There is also a center tessellation factor, inside, marked SV_InsideTessFactor. It controls how densely the interior of the triangle is subdivided into smaller triangles, while the edge tessellation factors control how many segments each edge is split into. Together they determine the final tessellation pattern.

    Patch Constant Function can also output other useful data, but it must be labeled with the correct semantics. For example, BEZIERPOS semantics is very useful and can represent float3 data. This semantics will be used later to output the control points of the smoothing algorithm based on the Bezier curve.

    2.1.4 Domain Stage Code

    Next, we enter the Domain Stage. The Domain Function also carries a domain attribute, which must match the one on the Hull Function; in this example it is a triangle. The function takes the patch from the Hull Function, the output of the Patch Constant Function, and, most importantly, the vertex's barycentric coordinates. The output structure closely resembles a vertex shader's output: the clip-space position plus whatever lighting data the fragment shader needs.

    It doesn’t matter if you don’t know what it is for now. Just read Chapter 4 of this article and then come back to study it.

    Simply put, each new vertex that is subdivided will run this domain function.

    struct Interpolators {
        float3 normalWS                 : TEXCOORD0;
        float3 positionWS               : TEXCOORD1;
        float4 positionCS               : SV_POSITION;
    };
    
    // Call this macro to interpolate between a triangle patch, passing the field name
    #define BARYCENTRIC_INTERPOLATE(fieldName) \
            patch[0].fieldName * barycentricCoordinates.x + \
            patch[1].fieldName * barycentricCoordinates.y + \
            patch[2].fieldName * barycentricCoordinates.z
    
    // The domain function runs once per vertex in the final, tessellated mesh
    // Use it to reposition vertices and prepare for the fragment stage
    [domain("tri")] // Signal we're inputting triangles
    Interpolators Domain(
        TessellationFactors factors, //The output of the patch constant function
        OutputPatch<TessellationControlPoint, 3> patch, // The Input triangle
        float3 barycentricCoordinates : SV_DomainLocation) { // The barycentric coordinates of the vertex on the triangle
    
        Interpolators output;
    
        // Setup instancing and stereo support (for VR)
        UNITY_SETUP_INSTANCE_ID(patch[0]);
        UNITY_TRANSFER_INSTANCE_ID(patch[0], output);
        UNITY_INITIALIZE_VERTEX_OUTPUT_STEREO(output);
    
        float3 positionWS = BARYCENTRIC_INTERPOLATE(positionWS);
        float3 normalWS = BARYCENTRIC_INTERPOLATE(normalWS);
    
        output.positionCS = TransformWorldToHClip(positionWS);
        output.normalWS = normalWS;
        output.positionWS = positionWS;
    
        return output;
    }

    In this function, Unity gives us the tessellation factors, the three vertices of the patch, and the barycentric coordinates of the current new vertex. We can use this data for displacement and other processing.
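The BARYCENTRIC_INTERPOLATE macro above is just a weighted sum over the three control points. A minimal Python sketch of the same idea (the helper name `barycentric_interpolate` is made up for illustration):

```python
def barycentric_interpolate(patch, bary):
    """Interpolate a per-vertex attribute across a triangle patch.

    patch: three equal-length attribute tuples, one per control point
    bary:  barycentric coordinates (x, y, z) from the tessellator, summing to 1
    """
    return tuple(
        p0 * bary[0] + p1 * bary[1] + p2 * bary[2]
        for p0, p1, p2 in zip(*patch)
    )
```

With coordinates (1, 0, 0) the result is exactly control point 0, and with (1/3, 1/3, 1/3) it is the triangle's centroid, which is how each new vertex inherits blended positions and normals.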

    2.2 Detailed explanation of subdivision factors and division modes

    Copy the code from this link, then create the corresponding material and turn on wireframe mode. We have only drawn the mesh's vertices and have done no work in the fragment shader, so it looks transparent.

    If any component of the edge factors is set to zero or a negative value, the mesh disappears entirely. The figure below shows the result (with the Unity editor's object outline enabled). This property is very important.

    2.2.1 Overview of subdivision factors

    Bluntly put: once these factors (edge factors and the inside factor) are set in the Hull Stage, the tessellation stage simply and crudely bakes them into barycentric coordinates. (Assuming the tri domain throughout; for quad it is computed in UV space, which may be more complicated — I'm not sure.) This simple, crude stage is not programmable.

    For now, take the "integer (uniform) partitioning mode" as the example: [partitioning("integer")], with a triangle domain [domain("tri")], three output control points [outputcontrolpoints(3)], and clockwise triangle output topology [outputtopology("triangle_cw")].

    2.2.2 Preparatory work and potential parallel issues

    Modify the code to the following:

    // .shader
    _FactorEdge1("[Float3]Edge factors,[Float]Inside factor", Vector) = (1, 1, 1, 1) // -- Edited -- 
    
    // .hlsl
    float4 _FactorEdge1; // -- Edited -- 
    ...
    f.edge[0] = _FactorEdge1.x;
    f.edge[1] = _FactorEdge1.y; // -- Edited -- 
    f.edge[2] = _FactorEdge1.z; // -- Edited -- 
    f.inside = _FactorEdge1.w; // -- Edited --

    There may be a problem here. Sometimes the compiler splits the Patch Constant Function and computes each factor in parallel, which can cause some factors to be optimized away and inexplicably read as 0. The solution is to pack the factors into a single vector so the compiler never operates on an undefined quantity. Below is a simple reproduction of what can happen.

    Modify the Patch Constant Function as follows and add two new properties to the panel.

    The modified lines are marked with // -- Edited --.

    // The patch constant function runs once per triangle, or "patch"
    // It runs in parallel to the hull function
    TessellationFactors PatchConstantFunction(
        InputPatch<TessellationControlPoint, 3> patch) {
        UNITY_SETUP_INSTANCE_ID(patch[0]); // Set up instancing
        // Calculate tessellation factors
        TessellationFactors f;
        f.edge[0] = _FactorEdge1.x;
        f.edge[1] = _FactorEdge2; // -- Edited --
        f.edge[2] = _FactorEdge3; // -- Edited --
        f.inside = _FactorInside;
        return f;
    }
    
    _FactorEdge2("Edge 2 factor", Float) = 1 // -- Edited --
    _FactorEdge3("Edge 3 factor", Float) = 1 // -- Edited --

    2.2.3 Edge Factor – SV_TessFactor

    It can be seen that the edge factors correspond approximately to the number of times the corresponding edge is split, and the internal factor corresponds to the complexity of the center.

    The edge factors only affect the edges of the original triangle. The complex internal pattern is controlled by the Inside Factor and the partitioning mode.

    It should be noted that the surface subdivision in "integer cutting mode" is rounded up, for example, 2.1 is rounded up to 3.

    One picture says it all.

    2.2.4 Inside Factor – SV_InsideTessFactor

    Let's take the integer mode as an example. The inside factor only affects the complexity of the internal pattern; the specifics are described in detail below. To summarize: the edge factors control the triangulation between the outermost ring and the first inner ring, the inside factor controls how many rings there are, and the partitioning mode controls how each inner ring is subdivided.

    Assuming the edge factors are set to (2, 3, 4) and only the Inside Factor is modified, an interesting property can be observed: when the inside factor n is even, one vertex lands exactly at the centroid (1/3, 1/3, 1/3).

    Generally it is best to set all edge factors to the same value. Different values are used here, which makes the figure messier but exposes the most essential rules.

    It can further be observed that the number of vertices on any edge of the layer closest to the outermost triangle relates to the Inside Factor n as N_points = n − 1. That is, the number of vertices on such an edge is always the subdivision factor minus 1.

    The vertex count decreases by 2 with each layer inward. That is, the first layer (not counting the outermost layer, which is not subdivided) has n vertices, the next layer inward has n − 2, and so on.

    Combining the above three observations gives a conjectured formula (useless, but I worked it out with nothing better to do) for the total number of internal vertices, where the index k corresponds to the internal factor minus 1 (note the internal factor starts at 2): a_{2n} = 3n² and a_{2n−1} = 3n(n−1) + 1. These can be simplified and merged into a_k = −0.125·(−1)^k + 0.75k² + 0.125, or, in pure integer arithmetic: a_k = ⌊(−(−1)^k + 6k² + 1) / 8⌋.
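As a quick sanity check, a small Python sketch (with made-up helper names) confirms the closed form matches the piecewise definition:

```python
def internal_vertices_piecewise(k):
    """Piecewise count: a_{2n} = 3n^2, a_{2n-1} = 3n(n-1) + 1."""
    if k % 2 == 0:
        n = k // 2
        return 3 * n * n
    n = (k + 1) // 2
    return 3 * n * (n - 1) + 1

def internal_vertices_closed(k):
    """Closed form in integer arithmetic: a_k = floor((-(-1)^k + 6k^2 + 1) / 8)."""
    return (-((-1) ** k) + 6 * k * k + 1) // 8
```

Both give 3 at k = 2 and 7 at k = 3, and they agree for every k.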

    2.2.5 Partitioning Mode – [partitioning(“_”)]

    So far only the simplest mode has been described: integer, which subdivides uniformly by integer multiples. Now for the other modes. Simply put, Fractional Odd and Fractional Even are upgraded versions of Integer for odd and even factors respectively: the fractional part of the factor is used, so the segments are no longer all equal.

    Fractional Odd: the Inside Factor can be a fraction (no ceiling is applied), and the denominator is odd. The denominator here is the denominator of each vertex's barycentric coordinates. A division with an odd denominator always places one vertex on the triangle's centroid, while an even one never does.


    Fractional Even: Similar to fractional_odd, but with an even denominator. I'm not sure how to choose this.


    Pow2 (power of 2): This mode only allows the use of powers of 2 (such as 1, 2, 4, 8, etc.) as subdivision levels. Generally used for texture mapping or shadow calculations.
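The four modes differ in how a requested factor is clamped to an effective subdivision level. The following sketch follows the D3D11 tessellator's rounding rules as I understand them (a simplification — in the fractional modes the hardware also places the extra vertices at fractional positions, which this ignores); `effective_factor` is a made-up helper:

```python
import math

def effective_factor(f, mode):
    """Approximate number of segments per edge for a requested factor f."""
    if mode == "integer":
        return max(1, math.ceil(f))           # round up: 2.1 -> 3
    if mode == "fractional_even":
        return max(2, 2 * math.ceil(f / 2))   # smallest even number >= f (min 2)
    if mode == "fractional_odd":
        c = math.ceil(f)
        return c if c % 2 == 1 else c + 1     # smallest odd number >= f
    if mode == "pow2":
        return 2 ** math.ceil(math.log2(max(f, 1)))  # next power of two
    raise ValueError(mode)
```

This also explains the centroid observation above: fractional_odd always yields an odd segment count, so a centroid vertex always exists.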

    3. Segment Optimization

    3.1 View Frustum Culling

    Generating this many vertices hurts performance badly, so we need ways to improve rendering efficiency. Although vertices outside the frustum are culled before rasterization, culling unnecessary patches early in the TCS reduces the tessellation shader's workload.

    If the tessellation factor is set to 0 in the Patch Constant Function, the tessellation generator will ignore the patch, which means that the culling here is for the entire patch, rather than the vertex-by-vertex culling in the frustum culling.

    We test every point in the patch to see whether it is out of view. To do that, transform every point in the patch into clip space: compute each point's clip-space position in the vertex shader and pass it to the Hull Stage. GetVertexPositionInputs provides what we need.

    struct TessellationControlPoint {
        float4 positionCS : SV_POSITION; // -- Edited -- 
        ...
    };
    
    TessellationControlPoint Vertex(Attributes input) {
        TessellationControlPoint output;
        ...
        VertexPositionInputs posnInputs = GetVertexPositionInputs(input.positionOS);
        ...
        output.positionCS = posnInputs.positionCS; // -- Edited -- 
        ...
        return output;
    }

    Then write a test function above the Patch Constant Function to decide whether to cull the patch; for now it simply returns false. It takes the patch's three points in clip space.

    // Returns true if it should be clipped due to frustum or winding culling
    bool ShouldClipPatch(float4 p0PositionCS, float4 p1PositionCS, float4 p2PositionCS) {
        return false;
    }

    Then write the IsOutOfBounds function to test whether a point is outside the bounds. The bounds can also be specified, and this method can be used in another function to determine whether a point is outside the view frustum.

    // Returns true if the point is outside the bounds set by lower and higher
    bool IsOutOfBounds(float3 p, float3 lower, float3 higher) {
        return p.x < lower.x || p.x > higher.x || p.y < lower.y || p.y > higher.y || p.z < lower.z || p.z > higher.z;
    }
    
    // Returns true if the given vertex is outside the camera frustum and should be culled
    bool IsPointOutOfFrustum(float4 positionCS) {
        float3 culling = positionCS.xyz;
        float w = positionCS.w;
        // UNITY_RAW_FAR_CLIP_VALUE is either 0 or 1, depending on graphics API
        // Most use 0, however OpenGL uses 1
        float3 lowerBounds = float3(-w, -w, -w * UNITY_RAW_FAR_CLIP_VALUE);
        float3 higherBounds = float3(w, w, w);
        return IsOutOfBounds(culling, lowerBounds, higherBounds);
    }

    In clip space, the w component is the homogeneous coordinate that decides whether a point is inside the view frustum: if xyz falls outside the range [-w, w], the point is outside the frustum and will be culled. Different APIs handle depth differently, so we must be careful when using this component as a boundary. DirectX and Vulkan use the left-handed convention with a clip-depth range of [0, 1], so UNITY_RAW_FAR_CLIP_VALUE is 0; OpenGL is right-handed with a clip-depth range of [-1, 1], so UNITY_RAW_FAR_CLIP_VALUE is 1.
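The bounds test above is easy to sanity-check outside the shader. This Python sketch mirrors IsOutOfBounds/IsPointOutOfFrustum, assuming the D3D convention (far clip at 0, so UNITY_RAW_FAR_CLIP_VALUE = 0); the names are made up to mirror the HLSL:

```python
RAW_FAR_CLIP = 0.0  # D3D/Vulkan convention; OpenGL would use 1.0

def is_out_of_bounds(p, lower, higher):
    """True if any component of p leaves the [lower, higher] box."""
    return any(c < lo or c > hi for c, lo, hi in zip(p, lower, higher))

def is_point_out_of_frustum(position_cs):
    """Clip-space frustum test: xyz must lie within [-w, w] (z within [0, w])."""
    x, y, z, w = position_cs
    lower = (-w, -w, -w * RAW_FAR_CLIP)
    higher = (w, w, w)
    return is_out_of_bounds((x, y, z), lower, higher)
```

A point like (0, 0, 0.5, 1) passes, while (2, 0, 0.5, 1) fails on x and (0, 0, -0.1, 1) fails on depth.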

    After preparing these, you can determine whether a patch needs to be culled. Go back to the function at the beginning and determine whether all the points of a patch need to be culled.

    // Returns true if it should be clipped due to frustum or winding culling
    bool ShouldClipPatch(float4 p0PositionCS, float4 p1PositionCS, float4 p2PositionCS) {
        bool allOutside = IsPointOutOfFrustum(p0PositionCS) &&
            IsPointOutOfFrustum(p1PositionCS) &&
            IsPointOutOfFrustum(p2PositionCS); // -- Edited -- 
        return allOutside; // -- Edited -- 
    }

    3.2 Backface Culling

    In addition to frustum culling, patches can also undergo backface culling, using the normal vector to determine whether a patch needs to be culled.


    The normal vector is obtained by taking the cross product of two vectors. Since we are currently in Clip space, we need to do a perspective division to get NDC, which should be in the range of [-1,1]. The reason for converting to NDC is that the position in Clip space is nonlinear, which may cause the position of the vertex to be distorted. Converting to a linear space like NDC can more accurately determine the front and back relationship of the vertices.

    // Returns true if the points in this triangle are wound counter-clockwise
    bool ShouldBackFaceCull(float4 p0PositionCS, float4 p1PositionCS, float4 p2PositionCS) {
        float3 point0 = p0PositionCS.xyz / p0PositionCS.w;
        float3 point1 = p1PositionCS.xyz / p1PositionCS.w;
        float3 point2 = p2PositionCS.xyz / p2PositionCS.w;
        float3 normal = cross(point1 - point0, point2 - point0);
        return dot(normal, float3(0, 0, 1)) < 0;
    }

    The above code still has a cross-platform problem. The viewing direction is different in different APIs, so modify the code.

    // In clip space, the view direction is float3(0, 0, 1), so we can just test the z coord
    #if UNITY_REVERSED_Z
        return cross(point1 - point0, point2 - point0).z < 0;
    #else // In OpenGL, the test is reversed
        return cross(point1 - point0, point2 - point0).z > 0;
    #endif

    Finally, add the function you just wrote to ShouldClipPatch to determine backface culling.

    // Returns true if it should be clipped due to frustum or winding culling
    bool ShouldClipPatch(float4 p0PositionCS, float4 p1PositionCS, float4 p2PositionCS) {
        bool allOutside = IsPointOutOfFrustum(p0PositionCS) &&
            IsPointOutOfFrustum(p1PositionCS) &&
            IsPointOutOfFrustum(p2PositionCS);
        return allOutside || ShouldBackFaceCull(p0PositionCS, p1PositionCS, p2PositionCS); // -- Edited -- 
    }

    Then set the vertex factor of the patch to be culled to 0 in PatchConstantFunction.

    ...
    if (ShouldClipPatch(patch[0].positionCS, patch[1].positionCS, patch[2].positionCS)) {
            f.edge[0] = f.edge[1] = f.edge[2] = f.inside = 0; // Cull the patch
    }
    ...

    3.3 Increase Tolerance

    You may want to sanity-check the culling, or you may hit cases where patches are culled unexpectedly. In either case, adding a tolerance is a flexible fix.

    The first is the frustum culling tolerance. If the tolerance is positive, the culling boundaries will be expanded so that some objects near the edge of the frustum will not be culled even if they are partially out of bounds. This method can reduce the frequent changes in culling state due to small perspective changes or object dynamics.

    // Returns true if the given vertex is outside the camera frustum and should be culled
    bool IsPointOutOfFrustum(float4 positionCS, float tolerance) {
        float3 culling = positionCS.xyz;
        float w = positionCS.w;
        // UNITY_RAW_FAR_CLIP_VALUE is either 0 or 1, depending on graphics API
        // Most use 0, however OpenGL uses 1
        float3 lowerBounds = float3(-w - tolerance, -w - tolerance, -w * UNITY_RAW_FAR_CLIP_VALUE - tolerance);
        float3 higherBounds = float3(w + tolerance, w + tolerance, w + tolerance);
        return IsOutOfBounds(culling, lowerBounds, higherBounds);
    }

    Next, backface culling is adjusted. In practice, this is done by comparing to a tolerance instead of zero to avoid issues with numerical precision. If the dot product result is less than some small positive value (the tolerance) instead of being strictly less than zero, then the primitive is considered a backface. This approach provides an additional buffer, ensuring that only explicitly backface primitives are culled.

    // Returns true if the points in this triangle are wound counter-clockwise
    bool ShouldBackFaceCull(float4 p0PositionCS, float4 p1PositionCS, float4 p2PositionCS, float tolerance) {
        float3 point0 = p0PositionCS.xyz / p0PositionCS.w;
        float3 point1 = p1PositionCS.xyz / p1PositionCS.w;
        float3 point2 = p2PositionCS.xyz / p2PositionCS.w;
        // In clip space, the view direction is float3(0, 0, 1), so we can just test the z coord
    #if UNITY_REVERSED_Z
        return cross(point1 - point0, point2 - point0).z < -tolerance;
    #else // In OpenGL, the test is reversed
        return cross(point1 - point0, point2 - point0).z > tolerance;
    #endif
    }

    It is possible to expose a Range in the Material Panel.

    // .shader
    Properties{
        _tolerance("_tolerance",Range(-0.002,0.001)) = 0
        ...
    }
    // .hlsl
    float _tolerance;
    ...
    // Returns true if it should be clipped due to frustum or winding culling
    bool ShouldClipPatch(float4 p0PositionCS, float4 p1PositionCS, float4 p2PositionCS) {
        bool allOutside = IsPointOutOfFrustum(p0PositionCS, _tolerance) &&
            IsPointOutOfFrustum(p1PositionCS, _tolerance) &&
            IsPointOutOfFrustum(p2PositionCS, _tolerance); // -- Edited -- 
        return allOutside || ShouldBackFaceCull(p0PositionCS, p1PositionCS, p2PositionCS,_tolerance); // -- Edited -- 
    }

    3.4 Dynamic subdivision factor

    So far our algorithm has subdivided every face indiscriminately. But a complex mesh can mix large and small faces (uneven mesh area). Large faces are visually prominent because of their size and need more subdivision to keep the surface smooth and detailed; small faces can take a lower subdivision level with little visual impact. A common approach is to vary the factor with edge length: give faces with longer edges a higher tessellation factor.

    Besides the sizes of the faces themselves, the distance between the camera and the patch can also drive the factor: objects farther from the camera can use a lower tessellation factor because they cover fewer pixels on screen. The viewing angle and gaze direction work too: prioritize subdividing faces that face the camera, and reduce subdivision for faces angled away or to the sides.

    3.4.1 Fixed Segment Scaling

    Take the distance between two vertices: the larger the distance, the larger the tessellation factor. A scale in [0, 1] is exposed on the control panel; at scale 1 the factor is contributed directly by the distance between the two points, and the closer the scale gets to 0, the larger the factor. On top of that, an initial bias is added. Finally, the result is clamped to at least 1 so the patch is always rendered.

    //Calculate the tessellation factor for an edge
    float EdgeTessellationFactor(float scale, float bias, float3 p0PositionWS, float3 p1PositionWS) {
        float factor = distance(p0PositionWS, p1PositionWS) / scale;
    
        return max(1, factor + bias);
    }
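The behavior of EdgeTessellationFactor is easy to check numerically. A Python sketch of the same formula (the name `edge_tessellation_factor` mirrors the HLSL for illustration):

```python
import math

def edge_tessellation_factor(scale, bias, p0, p1):
    """Longer edges get larger factors; clamp to at least 1 so the patch survives."""
    factor = math.dist(p0, p1) / scale
    return max(1.0, factor + bias)
```

With scale 1 and bias 0, an edge of length 2 yields a factor of 2; halving the scale doubles the factor; and a very short edge clamps to 1 instead of vanishing (remember that a factor of 0 would cull the patch entirely).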

    Then modify the material panel and Patch Constant Function. Generally speaking, the average value of the edge subdivision factor is used as the internal subdivision factor, which will give a more consistent visual effect.

    // .shader
    Properties{
        ...
        _TessellationBias("_TessellationBias", Range(-1,5)) = 1
         _TessellationFactor("_TessellationFactor", Range(0,1)) = 0
    }
    
    // .hlsl
    
    f.edge[0] = EdgeTessellationFactor(_TessellationFactor, _TessellationBias, patch[1].positionWS, patch[2].positionWS);
    f.edge[1] = EdgeTessellationFactor(_TessellationFactor, _TessellationBias, patch[2].positionWS, patch[0].positionWS);
    f.edge[2] = EdgeTessellationFactor(_TessellationFactor, _TessellationBias, patch[0].positionWS, patch[1].positionWS);
    f.inside = (f.edge[0] + f.edge[1] + f.edge[2]) / 3.0;

    The subdivision level now changes dynamically with patch size; the effect is shown below.

    Incidentally, if the pattern produced by your inside factor looks strange, the compiler may be the cause. Try rewriting the inside-factor computation as follows.

    f.inside = ( // If the compiler doesn't play nice...
      EdgeTessellationFactor(_TessellationFactor, _TessellationBias, patch[1].positionWS, patch[2].positionWS) + 
      EdgeTessellationFactor(_TessellationFactor, _TessellationBias, patch[2].positionWS, patch[0].positionWS) + 
      EdgeTessellationFactor(_TessellationFactor, _TessellationBias, patch[0].positionWS, patch[1].positionWS)
      ) / 3.0;

    3.4.2 Screen Space Subdivision Scaling

    Next we bring in the camera distance. We can use the screen-space edge length directly to adjust the subdivision level, which handles both problems at once: large versus small faces, and distance from the camera.

    Since we already have the data in Clip space, and since screen space is very similar to NDC space, we only need to convert it to NDC, that is, do a perspective division.

    float EdgeTessellationFactor(float scale, float bias, float3 p0PositionWS, float4 p0PositionCS, float3 p1PositionWS, float4 p1PositionCS) {
        float factor = distance(p0PositionCS.xyz / p0PositionCS.w, p1PositionCS.xyz / p1PositionCS.w) / scale;
    
        return max(1, factor + bias);
    }
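To see why the perspective divide matters, here is a small CPU-side Python sketch (illustrative, not shader code): the same edge, pushed further from the camera, has a larger w, shrinks in NDC, and therefore receives a smaller factor.

```python
def ndc_edge_factor(scale, bias, p0_cs, p1_cs):
    """Perspective-divide two clip-space points, then measure their NDC distance."""
    a = [c / p0_cs[3] for c in p0_cs[:3]]
    b = [c / p1_cs[3] for c in p1_cs[:3]]
    dist = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return max(1.0, dist / scale + bias)

# The same unit-length edge at w=1 (near) versus w=4 (far from the camera).
near = ndc_edge_factor(0.1, 0.0, (0, 0, 0, 1), (1, 0, 0, 1))
far  = ndc_edge_factor(0.1, 0.0, (0, 0, 0, 4), (1, 0, 0, 4))
```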

    Next, pass the Clip space coordinates into the Patch Constant Function.

    f.edge[0] = EdgeTessellationFactor(_TessellationFactor, _TessellationBias, 
      patch[1].positionWS, patch[1].positionCS, patch[2].positionWS, patch[2].positionCS);
    f.edge[1] = EdgeTessellationFactor(_TessellationFactor, _TessellationBias, 
      patch[2].positionWS, patch[2].positionCS, patch[0].positionWS, patch[0].positionCS);
    f.edge[2] = EdgeTessellationFactor(_TessellationFactor, _TessellationBias, 
      patch[0].positionWS, patch[0].positionCS, patch[1].positionWS, patch[1].positionCS);
    f.inside = (f.edge[0] + f.edge[1] + f.edge[2]) / 3.0;

    The current effect is quite good, and the level of subdivision changes dynamically as the camera distance (screen space distance) changes. If you use a subdivision mode other than INTEGER, you will get a more consistent effect.

    There is still room for improvement, for example the unit of the scaling factor. We limited it to [0,1] above, which is awkward to tune. Multiplying by the screen height and widening the range to [0,1080] turns it into a ratio expressed in pixels, which is much easier to adjust. Then update the material panel property accordingly.

    // .hlsl
    float factor = distance(p0PositionCS.xyz / p0PositionCS.w, p1PositionCS.xyz / p1PositionCS.w) * _ScreenParams.y / scale;
    
    // .shader
    _TessellationFactor("_TessellationFactor",Range(0,1080)) = 320

    3.4.3 Camera distance subdivision scaling

    How do we scale by camera distance? It is simple: compare the edge length to the distance from the edge's midpoint to the camera. The larger that ratio, the more screen space the edge occupies, and the more subdivision it needs. (Note that the code below divides by the squared camera distance, which makes the factor fall off more steeply with distance.)

    // .hlsl
    float EdgeTessellationFactor(float scale, float bias, float3 p0PositionWS, float3 p1PositionWS) {
        float length = distance(p0PositionWS, p1PositionWS);
        float distanceToCamera = distance(GetCameraPositionWS(), (p0PositionWS + p1PositionWS) * 0.5);
        float factor = length / (scale * distanceToCamera * distanceToCamera);
        return max(1, factor + bias);
    }
    ...
            f.edge[0] = EdgeTessellationFactor(_TessellationFactor, _TessellationBias, patch[1].positionWS, patch[2].positionWS);
            f.edge[1] = EdgeTessellationFactor(_TessellationFactor, _TessellationBias, patch[2].positionWS, patch[0].positionWS);
            f.edge[2] = EdgeTessellationFactor(_TessellationFactor, _TessellationBias, patch[0].positionWS, patch[1].positionWS);
    
    // .shader
    _TessellationFactor("_TessellationFactor",Range(0, 1)) = 0.02

    Note that the scaling factor is no longer in pixels but back in the original [0,1] range: screen pixels are not meaningful in this method, so they are not used, and we work in world coordinates again.

    The results of screen space subdivision scaling and camera distance subdivision scaling are similar. Generally, a macro can be opened to switch the modes of the above dynamic factors. Here, it is left to the reader to complete.
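The falloff of the camera-distance method can be checked on the CPU. This Python sketch (illustrative only) mirrors the HLSL above, including the squared distance in the denominator:

```python
import math

def camera_distance_factor(scale, bias, cam, p0, p1):
    """Edge length over squared midpoint-to-camera distance, as in the HLSL version."""
    length = math.dist(p0, p1)
    mid = tuple((a + b) * 0.5 for a, b in zip(p0, p1))
    d = math.dist(cam, mid)
    return max(1.0, length / (scale * d * d) + bias)

cam = (0.0, 0.0, 0.0)
# The same unit-length edge, close to the camera versus far away.
near = camera_distance_factor(0.02, 0.0, cam, (0, 0, 1), (1, 0, 1))
far  = camera_distance_factor(0.02, 0.0, cam, (0, 0, 10), (1, 0, 10))
```

Far edges bottom out at the clamp value of 1, so distant geometry is left untessellated.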

    3.5 Specifying subdivision factors

    3.5.1 Vertex Storage Subdivision Factor

    In the previous section we used heuristics to guess suitable subdivision factors. If we know exactly how the mesh should be subdivided, we can bake per-vertex multipliers for these factors into the mesh itself. Since a multiplier is a single float, one vertex-color channel is enough. The following is pseudocode to convey the idea.

    float EdgeTessellationFactor(float scale, float bias, float3 p0PositionWS, float3 p1PositionWS, float multiplier) {
        ...
        return max(1, (factor + bias) * multiplier);
    }
    
    ...
    // PCF()
    float multipliers[3];
    [unroll] for (int i = 0; i < 3; i++) {
        multipliers[i] = patch[i].color.g;
    }
    // Scale each edge factor by the average multiplier of its two endpoints
    f.edge[0] = EdgeTessellationFactor(_TessellationFactor, _TessellationBias,
        patch[1].positionWS, patch[2].positionWS, (multipliers[1] + multipliers[2]) / 2);

    3.5.2 SDF Control Surface Subdivision Factor

    It is quite cool to combine the Signed Distance Field (SDF) to control the tessellation factor. Of course, this section does not involve the generation of SDF, assuming that it can be directly obtained through the ready-made function CalculateSDFDistance.

    For a given Mesh, use CalculateSDFDistance to calculate the distance from each vertex in each patch to the shape represented by the SDF (such as a sphere). After obtaining the distance, evaluate the subdivision requirements of the patch and perform subdivision.

    TessellationFactors PatchConstantFunction(
        InputPatch<TessellationControlPoint, 3> patch) {
        float multipliers[3];
    
        // Loop through each vertex
        [unroll] for (int i = 0; i < 3; i++) {
            // Calculate the distance from each vertex to the SDF surface
            float sdfDistance = CalculateSDFDistance(patch[i].positionWS);
    
            // Adjust subdivision factor based on SDF distance
            if (sdfDistance < _TessellationDistanceThreshold) {
                multipliers[i] = lerp(_MinTessellationFactor, _MaxTessellationFactor, (1 - sdfDistance / _TessellationDistanceThreshold));
            } else {
                multipliers[i] = _MinTessellationFactor;
            }
        }
    
        // Calculate the final subdivision factor
        TessellationFactors f;
        f.edge[0] = max(multipliers[0], multipliers[1]);
        f.edge[1] = max(multipliers[1], multipliers[2]);
        f.edge[2] = max(multipliers[2], multipliers[0]);
        f.inside = (multipliers[0] + multipliers[1] + multipliers[2]) / 3;
    
        return f;
    }

    I have not implemented this one concretely yet; treat the code above as a sketch of the idea to be refined in practice.
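The per-vertex multiplier logic can at least be exercised on the CPU. In this Python sketch the sphere SDF is a hypothetical stand-in for the article's assumed CalculateSDFDistance, and the threshold and min/max factors are plain parameters standing in for the shader properties:

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=1.0):
    # Hypothetical stand-in for CalculateSDFDistance: signed distance to a sphere.
    return math.dist(p, center) - radius

def lerp(a, b, t):
    return a + (b - a) * t

def vertex_multiplier(p, threshold=2.0, min_f=1.0, max_f=8.0):
    """Vertices near the SDF surface get a high factor, distant ones the minimum."""
    d = abs(sphere_sdf(p))  # abs() so both sides of the surface are refined
    if d < threshold:
        return lerp(min_f, max_f, 1.0 - d / threshold)
    return min_f

on_surface = vertex_multiplier((1.0, 0.0, 0.0))   # on the sphere: maximum factor
far_away   = vertex_multiplier((10.0, 0.0, 0.0))  # beyond the threshold: minimum
```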

    4. Vertex offset – contour smoothing

    The easiest way to add detail to a mesh is high-resolution textures, but textures can only go so far: past a point, adding vertices helps more than adding texels. A normal map, for example, changes the shading normal of each fragment but not the geometry itself; no matter how large the texture, it cannot remove a jagged silhouette or pointy edges.

    Therefore, we need to tessellate the surface and then offset the vertices. All the tessellation operations just mentioned are operated on the plane where the patch is located. If we want to bend these vertices, one of the simplest operations is Phong tessellation.

    4.1 Phong subdivision

    First, the original paper is attached. https://perso.telecom-paristech.fr/boubek/papers/PhongTessellation/PhongTessellation.pdf

    Phong shading should be familiar to you. It is a technique that uses linear interpolation of normal vectors to obtain smooth shading. Phong subdivision is inspired by Phong shading and extends the concept of Phong shading to the spatial domain.

    The core idea of Phong subdivision is to use the vertex normals of each corner of the triangle to affect the position of new vertices during the subdivision process, thereby creating a curved surface instead of a flat surface.

    Note that many tutorials use the term "triangle corner" for these; the distinction does not matter here, so this article sticks with "vertex".

    First, in the Domain function, Unity gives us the barycentric coordinates of the new vertex to process. Suppose we are currently processing $(\frac{1}{3}, \frac{1}{3}, \frac{1}{3})$.

    Each vertex of a patch has a normal. Imagine a tangent plane emanating from each vertex, perpendicular to the respective normal vector.

    Then project the current vertex onto these three tangent planes respectively.

    Described mathematically:

    $$
    P' = P - ((P - V) \cdot N)\, N
    $$

    where:

    • $P$ is the initially interpolated flat position.
    • $V$ is a vertex position on the plane.
    • $N$ is the normal at vertex $V$.
    • $\cdot$ denotes the dot product.
    • $P'$ is the projection of $P$ onto the plane.

    This yields three projected points $P'$, one per tangent plane.

    The three projected points form a new triangle; applying the current vertex's barycentric coordinates to this new triangle gives the new, smoothed position.

    //Calculate Phong projection offset
    float3 PhongProjectedPosition(float3 flatPositionWS, float3 cornerPositionWS, float3 normalWS) {
        return flatPositionWS - dot(flatPositionWS - cornerPositionWS, normalWS) * normalWS;
    }
    
    // Apply Phong smoothing
    float3 CalculatePhongPosition(float3 bary, float3 p0PositionWS, float3 p0NormalWS,
        float3 p1PositionWS, float3 p1NormalWS, float3 p2PositionWS, float3 p2NormalWS) {
        // Flat position: plain barycentric interpolation of the corner positions
        float3 flatPositionWS = bary.x * p0PositionWS + bary.y * p1PositionWS + bary.z * p2PositionWS;
        float3 smoothedPositionWS =
            bary.x * PhongProjectedPosition(flatPositionWS, p0PositionWS, p0NormalWS) +
            bary.y * PhongProjectedPosition(flatPositionWS, p1PositionWS, p1NormalWS) +
            bary.z * PhongProjectedPosition(flatPositionWS, p2PositionWS, p2NormalWS);
        return smoothedPositionWS;
    }
    
    // The domain function runs once per vertex in the final, tessellated mesh
    // Use it to reposition vertices and prepare for the fragment stage
    [domain("tri")] // Signal we're inputting triangles
    Interpolators Domain(
        TessellationFactors factors, //The output of the patch constant function
        OutputPatch<TessellationControlPoint, 3> patch, // The Input triangle
        float3 barycentricCoordinates : SV_DomainLocation) { // The barycentric coordinates of the vertex on the triangle
    
        Interpolators output;
        ...
        float3 positionWS = CalculatePhongPosition(barycentricCoordinates, 
          patch[0].positionWS, patch[0].normalWS, 
          patch[1].positionWS, patch[1].normalWS, 
          patch[2].positionWS, patch[2].normalWS);
        float3 normalWS = BARYCENTRIC_INTERPOLATE(normalWS);
        float3 tangentWS = BARYCENTRIC_INTERPOLATE(tangentWS.xyz);
        ...
        output.positionCS = TransformWorldToHClip(positionWS);
        output.normalWS = normalWS;
        output.positionWS = positionWS;
        output.tangentWS = float4(tangentWS, patch[0].tangentWS.w);
        ...
    }
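The projection formula can be verified numerically. This CPU-side Python sketch (illustrative only) checks the defining property of the projection: the result lies on the tangent plane, i.e. $(P' - V) \cdot N = 0$.

```python
def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def dot(a, b): return sum(x * y for x, y in zip(a, b))

def phong_project(p, v, n):
    """P' = P - ((P - V) . N) N : project P onto the tangent plane at vertex V."""
    k = dot(sub(p, v), n)
    return tuple(pi - k * ni for pi, ni in zip(p, n))

v = (0.0, 0.0, 0.0)   # a patch corner
n = (0.0, 1.0, 0.0)   # its unit normal
p = (0.3, 0.5, 0.2)   # an interpolated flat position
proj = phong_project(p, v, n)
# The projected point lies on the tangent plane: (P' - V) . N == 0
residual = dot(sub(proj, v), n)
```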

    Note that we also need to carry the tangent through the pipeline: add it to the structs, write it in Vertex, and read it in Domain. We also need a helper that barycentrically interpolates three values, used to recombine the projected points $P'$.

    struct Attributes {
        ...
        float4 tangentOS : TANGENT;
    };
    struct TessellationControlPoint {
        ...
        float4 tangentWS : TANGENT;
    };
    struct Interpolators {
        ...
        float4 tangentWS : TANGENT;
    };
    TessellationControlPoint Vertex(Attributes input) {
        TessellationControlPoint output;
        ...
        // tangent.w stores the sign used to reconstruct the bitangent
        output.tangentWS = float4(normalInputs.tangentWS, input.tangentOS.w); // tangent.w contains bitangent multiplier
    }
    // Barycentric interpolation as a function
    float3 BarycentricInterpolate(float3 bary, float3 a, float3 b, float3 c) {
        return bary.x * a + bary.y * b + bary.z * c;
    }

    The original Phong tessellation paper adds an α factor to control the amount of curvature; the authors recommend a global value of α = 3/4 for the best visual results. With the α extension the algorithm produces a quadratic Bezier patch, which cannot represent inflection points but is sufficient for practical development.

    First, let’s look at the formula in the original paper.

    Essentially, α controls the degree of interpolation. When α = 0, all vertices stay on the original plane, equivalent to no displacement; when α = 1, the new vertices fully follow the Phong-projected positions. Values below zero or above one are also worth trying, and the results are quite interesting. ~~If the math in the paper is unclear, don't worry: it boils down to a single lerp.~~

    // Apply Phong smoothing
    float3 CalculatePhongPosition(float3 bary, float smoothing, float3 p0PositionWS, float3 p0NormalWS,
        float3 p1PositionWS, float3 p1NormalWS, float3 p2PositionWS, float3 p2NormalWS) {
        float3 flatPositionWS = BarycentricInterpolate(bary, p0PositionWS, p1PositionWS, p2PositionWS);
        float3 smoothedPositionWS =
            bary.x * PhongProjectedPosition(flatPositionWS, p0PositionWS, p0NormalWS) +
            bary.y * PhongProjectedPosition(flatPositionWS, p1PositionWS, p1NormalWS) +
            bary.z * PhongProjectedPosition(flatPositionWS, p2PositionWS, p2NormalWS);
        return lerp(flatPositionWS, smoothedPositionWS, smoothing);
    }
    

    Don't forget to expose in the material panel.

    // .shader
    _TessellationSmoothing("_TessellationSmoothing", Range(0,1)) = 0.5
    
    // .hlsl
    float _TessellationSmoothing;
    
    
    
    Interpolators Domain( .... ) {
        ...
        float smoothing = _TessellationSmoothing;
        float3 positionWS = CalculatePhongPosition(barycentricCoordinates, smoothing,
          patch[0].positionWS, patch[0].normalWS, 
          patch[1].positionWS, patch[1].normalWS, 
          patch[2].positionWS, patch[2].normalWS);
        ...
    }

    Note that some models need modification first. If a model's edges are very sharp, the vertex normal there is nearly parallel to the face normal. In Phong tessellation this makes the projection of a point onto the tangent plane land almost exactly on the original flat position, so the subdivision has little visible effect.

    To solve this problem, you can add more geometric details by performing what is called "adding loop edges" or "loop cuts" in the modeling software. Insert additional edge loops near the edges of the original model to increase the subdivision density. The specific operation will not be expanded here.

    In general, the effect and performance of Phong subdivision are relatively good. However, if you want a higher quality smoothing effect, you can consider PN triangles. This technology is based on the curved triangle of Bezier curve.

    4.2 PN triangles subdivision

    First, here is the original paper. http://alex.vlachos.com/graphics/CurvedPNTriangles.pdf

    PN triangles likewise need no information about neighboring triangles and are inexpensive: the algorithm only requires the positions and normals of the patch's three vertices, and everything else can be computed from them. Note that the surface is evaluated in barycentric coordinates.

    In the PN algorithm, 10 control points need to be calculated for surface subdivision, as shown in the figure below. Three triangle vertices, a centroid, and three pairs of control points on the edges constitute all the control points. The calculated Bezier curve control points will be passed to the Domain. Since the control points of each triangle patch are consistent, it is very appropriate to place the step of calculating the control points in the Patch Constant Function.

    The calculation method in the paper is as follows:

    $$
    \begin{aligned}
    b_{300} & = P_1 \\
    b_{030} & = P_2 \\
    b_{003} & = P_3 \\
    w_{ij} & = \left(P_j - P_i\right) \cdot N_i \in \mathbf{R} \quad \text{where '} \cdot \text{' is the scalar product,} \\
    b_{210} & = \left(2 P_1 + P_2 - w_{12} N_1\right) / 3 \\
    b_{120} & = \left(2 P_2 + P_1 - w_{21} N_2\right) / 3 \\
    b_{021} & = \left(2 P_2 + P_3 - w_{23} N_2\right) / 3 \\
    b_{012} & = \left(2 P_3 + P_2 - w_{32} N_3\right) / 3 \\
    b_{102} & = \left(2 P_3 + P_1 - w_{31} N_3\right) / 3 \\
    b_{201} & = \left(2 P_1 + P_3 - w_{13} N_1\right) / 3 \\
    E & = \left(b_{210} + b_{120} + b_{021} + b_{012} + b_{102} + b_{201}\right) / 6 \\
    V & = \left(P_1 + P_2 + P_3\right) / 3 \\
    b_{111} & = E + (E - V) / 2 .
    \end{aligned}
    $$

    The quantity $w_{ij}$ is computed twice per edge, six times in total. For example, $w_{12}$ is the length of the projection of the vector from $P_1$ to $P_2$ onto the normal at $P_1$; multiplying it by that normal gives the projection vector itself, of length $w_{12}$.

    Take the control point nearest $P_1$ as an example. The current vertex should carry more weight, so $P_1$ is multiplied by 2 to pull the control point toward it. Subtracting the projection vector corrects for $P_2$ not lying in the plane defined by the normal at $P_1$, which keeps the patch consistent and reduces distortion. Finally, dividing by 3 normalizes the weights.

    Next, compute the mean $E$ of the six edge control points, which captures the central tendency of the boundary control points, and the mean $V$ of the three triangle vertices. The center control point is then $E$ pushed away from $V$ by half their difference: $b_{111} = E + (E - V)/2$. This is the tenth and final control point.

    To summarize: the first three control points are the triangle vertices themselves (so they need not be stored in the struct), six are computed from the weighted edge formula, and the last is derived from the averages above. The code is straightforward.

    struct TessellationFactors {
        float edge[3] : SV_TessFactor;
        float inside : SV_InsideTessFactor;
        float3 bezierPoints[7] : BEZIERPOS;
    };
    
    //Bezier control point calculations
    float3 CalculateBezierControlPoint(float3 p0PositionWS, float3 aNormalWS, float3 p1PositionWS, float3 bNormalWS) {
        float w = dot(p1PositionWS - p0PositionWS, aNormalWS);
        return (p0PositionWS * 2 + p1PositionWS - w * aNormalWS) / 3.0;
    }
    
    void CalculateBezierControlPoints(inout float3 bezierPoints[7],
        float3 p0PositionWS, float3 p0NormalWS, float3 p1PositionWS, float3 p1NormalWS, float3 p2PositionWS, float3 p2NormalWS) {
        bezierPoints[0] = CalculateBezierControlPoint(p0PositionWS, p0NormalWS, p1PositionWS, p1NormalWS);
        bezierPoints[1] = CalculateBezierControlPoint(p1PositionWS, p1NormalWS, p0PositionWS, p0NormalWS);
        bezierPoints[2] = CalculateBezierControlPoint(p1PositionWS, p1NormalWS, p2PositionWS, p2NormalWS);
        bezierPoints[3] = CalculateBezierControlPoint(p2PositionWS, p2NormalWS, p1PositionWS, p1NormalWS);
        bezierPoints[4] = CalculateBezierControlPoint(p2PositionWS, p2NormalWS, p0PositionWS, p0NormalWS);
        bezierPoints[5] = CalculateBezierControlPoint(p0PositionWS, p0NormalWS, p2PositionWS, p2NormalWS);
        float3 avgBezier = 0;
        [unroll] for (int i = 0; i < 6; i++) {
            avgBezier += bezierPoints[i];
        }
        avgBezier /= 6.0;
        float3 avgControl = (p0PositionWS + p1PositionWS + p2PositionWS) / 3.0;
        bezierPoints[6] = avgBezier + (avgBezier - avgControl) / 2.0;
    }
    
    // The patch constant function runs once per triangle, or "patch"
    // It runs in parallel to the hull function
    TessellationFactors PatchConstantFunction(
        InputPatch<TessellationControlPoint, 3> patch) {
        ...
        TessellationFactors f = (TessellationFactors)0;
        // Check if this patch should be culled (it is out of view)
        if (ShouldClipPatch(...)) {
            ...
        } else {
            ...
            CalculateBezierControlPoints(f.bezierPoints, patch[0].positionWS, patch[0].normalWS, 
              patch[1].positionWS, patch[1].normalWS, patch[2].positionWS, patch[2].normalWS);
        }
        return f;
    }
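The edge control-point construction has a checkable geometric property: because the projection along the normal is subtracted, each edge control point lies in the tangent plane of its nearest corner. This Python sketch (CPU-side, illustrative only) mirrors CalculateBezierControlPoint and verifies that:

```python
def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def dot(a, b): return sum(x * y for x, y in zip(a, b))

def bezier_control_point(p0, n0, p1):
    """b near p0: (2*P0 + P1 - w*N0) / 3, with w = (P1 - P0) . N0."""
    w = dot(sub(p1, p0), n0)
    return tuple((2 * a + b - w * c) / 3.0 for a, b, c in zip(p0, p1, n0))

p0, n0 = (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)  # corner with unit normal
p1 = (1.0, 0.5, 0.0)                        # the other end of the edge
b210 = bezier_control_point(p0, n0, p1)
# The control point lies in the tangent plane at p0: (b210 - P0) . N0 == 0
residual = dot(sub(b210, p0), n0)
```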

    Then, in the domain function, use the ten control points output by the patch constant function to evaluate the final position on the cubic Bezier surface according to the formula in the paper, interpolate the remaining attributes, and expose the smoothing factor on the material panel.

    $$
    \begin{aligned}
    b &: \; R^2 \mapsto R^3, \quad \text{for } w = 1 - u - v, \quad u, v, w \geq 0 \\
    b(u, v) &= \sum_{i+j+k=3} b_{ijk} \frac{3!}{i!\,j!\,k!} u^i v^j w^k \\
    &= b_{300} w^3 + b_{030} u^3 + b_{003} v^3 \\
    &\quad + b_{210}\, 3 w^2 u + b_{120}\, 3 w u^2 + b_{201}\, 3 w^2 v \\
    &\quad + b_{021}\, 3 u^2 v + b_{102}\, 3 w v^2 + b_{012}\, 3 u v^2 \\
    &\quad + b_{111}\, 6 w u v .
    \end{aligned}
    $$

    // Barycentric interpolation as a function
    float3 BarycentricInterpolate(float3 bary, float3 a, float3 b, float3 c) {
        return bary.x * a + bary.y * b + bary.z * c;
    }
    
    float3 CalculateBezierPosition(float3 bary, float smoothing, float3 bezierPoints[7],
        float3 p0PositionWS, float3 p1PositionWS, float3 p2PositionWS) {
        float3 flatPositionWS = BarycentricInterpolate(bary, p0PositionWS, p1PositionWS, p2PositionWS);
        float3 smoothedPositionWS =
            p0PositionWS * (bary.x * bary.x * bary.x) +
            p1PositionWS * (bary.y * bary.y * bary.y) +
            p2PositionWS * (bary.z * bary.z * bary.z) +
            bezierPoints[0] * (3 * bary.x * bary.x * bary.y) +
            bezierPoints[1] * (3 * bary.y * bary.y * bary.x) +
            bezierPoints[2] * (3 * bary.y * bary.y * bary.z) +
            bezierPoints[3] * (3 * bary.z * bary.z * bary.y) +
            bezierPoints[4] * (3 * bary.z * bary.z * bary.x) +
            bezierPoints[5] * (3 * bary.x * bary.x * bary.z) +
            bezierPoints[6] * (6 * bary.x * bary.y * bary.z);
        return lerp(flatPositionWS, smoothedPositionWS, smoothing);
    }
    
    // The domain function runs once per vertex in the final, tessellated mesh
    // Use it to reposition vertices and prepare for the fragment stage
    [domain("tri")] // Signal we're inputting triangles
    Interpolators Domain(
        TessellationFactors factors, //The output of the patch constant function
        OutputPatch<TessellationControlPoint, 3> patch, // The Input triangle
        float3 barycentricCoordinates : SV_DomainLocation) { // The barycentric coordinates of the vertex on the triangle
    
        Interpolators output;
        ...
        // Calculate tessellation smoothing multiplier
        float smoothing = _TessellationSmoothing;
    #ifdef _TESSELLATION_SMOOTHING_VCOLORS
        smoothing *= BARYCENTRIC_INTERPOLATE(color.r); // Multiply by the vertex's red channel
    #endif
    
        float3 positionWS = CalculateBezierPosition(barycentricCoordinates,
          smoothing, factors.bezierPoints, 
          patch[0].positionWS, patch[1].positionWS, patch[2].positionWS);
        float3 normalWS = BARYCENTRIC_INTERPOLATE(normalWS);
        float3 tangentWS = BARYCENTRIC_INTERPOLATE(tangentWS.xyz);
        ...
    }
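A useful sanity check on CalculateBezierPosition is that its ten Bernstein weights sum to $(u+v+w)^3 = 1$, so the surface reproduces the corners exactly and never extrapolates. This Python sketch (illustrative only) lists the weights in the same order as the HLSL above:

```python
def cubic_bezier_weights(x, y, z):
    """The ten Bernstein weights used in CalculateBezierPosition, in the same order."""
    return [x**3, y**3, z**3,
            3*x*x*y, 3*y*y*x, 3*y*y*z, 3*z*z*y, 3*z*z*x, 3*x*x*z,
            6*x*y*z]

# Barycentric coordinates sum to 1, so the ten weights sum to (x+y+z)^3 = 1.
w = cubic_bezier_weights(0.2, 0.3, 0.5)
total = sum(w)

# At a corner (bary = (1,0,0)) only the first weight survives:
corner = cubic_bezier_weights(1.0, 0.0, 0.0)
```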

    Compare the effects, PN triangles off and on.

    4.3 Improved PN triangles – Output subdivided normals

    Traditional PN triangles only change the position information of the vertices. We can combine the normal information of the vertices to output dynamically changing normal information to provide better light reflection effects.

    In the original algorithm the normals vary too coarsely. As the upper part of the figure below shows, the normals at an edge's two endpoints may fail to capture how the true surface normal varies; we want the behavior in the lower part, so we use quadratic interpolation to approximate the normal variation within a single patch.

    Since the position is a cubic Bezier surface, the normal should be interpolated as a quadratic Bezier surface, which requires three additional normal control points. The detailed mathematical derivation is explained clearly in Ref. 10 of the reference list.

    The following is a brief introduction on how to obtain the normal direction of the subdivision.

    First, take the normals at the edge's two endpoints A and B and average them.

    Construct the plane perpendicular to segment AB that passes through its midpoint.

    Reflect the averaged normal across that plane.

    Do this for each edge, giving three normal control points.

    struct TessellationFactors {
        float edge[3] : SV_TessFactor;
        float inside : SV_InsideTessFactor;
        float3 bezierPoints[10] : BEZIERPOS;
    };
    
    float3 CalculateBezierControlNormal(float3 p0PositionWS, float3 aNormalWS, float3 p1PositionWS, float3 bNormalWS) {
        float3 d = p1PositionWS - p0PositionWS;
        float v = 2 * dot(d, aNormalWS + bNormalWS) / dot(d, d);
        return normalize(aNormalWS + bNormalWS - v * d);
    }
    
    void CalculateBezierNormalPoints(inout float3 bezierPoints[10],
        float3 p0PositionWS, float3 p0NormalWS, float3 p1PositionWS, float3 p1NormalWS, float3 p2PositionWS, float3 p2NormalWS) {
        bezierPoints[7] = CalculateBezierControlNormal(p0PositionWS, p0NormalWS, p1PositionWS, p1NormalWS);
        bezierPoints[8] = CalculateBezierControlNormal(p1PositionWS, p1NormalWS, p2PositionWS, p2NormalWS);
        bezierPoints[9] = CalculateBezierControlNormal(p2PositionWS, p2NormalWS, p0PositionWS, p0NormalWS);
    }
    
    // The patch constant function runs once per triangle, or "patch"
    // It runs in parallel to the hull function
    TessellationFactors PatchConstantFunction(
        InputPatch<TessellationControlPoint, 3> patch) {
        ...
        TessellationFactors f = (TessellationFactors)0;
        // Check if this patch should be culled (it is out of view)
        if (ShouldClipPatch(...)) {
            ...
        } else {
            ...
            CalculateBezierControlPoints(f.bezierPoints, 
              patch[0].positionWS, patch[0].normalWS, patch[1].positionWS, 
              patch[1].normalWS, patch[2].positionWS, patch[2].normalWS);
            CalculateBezierNormalPoints(f.bezierPoints, 
              patch[0].positionWS, patch[0].normalWS, patch[1].positionWS, 
              patch[1].normalWS, patch[2].positionWS, patch[2].normalWS);
        }
        return f;
    }

    Also note that every interpolated normal vector must be normalized.

    float3 CalculateBezierNormal(float3 bary, float3 bezierPoints[10],
        float3 p0NormalWS, float3 p1NormalWS, float3 p2NormalWS) {
        return p0NormalWS * (bary.x * bary.x) +
            p1NormalWS * (bary.y * bary.y) +
            p2NormalWS * (bary.z * bary.z) +
            bezierPoints[7] * (2 * bary.x * bary.y) +
            bezierPoints[8] * (2 * bary.y * bary.z) +
            bezierPoints[9] * (2 * bary.z * bary.x);
    }
    
    float3 CalculateBezierNormalWithSmoothFactor(float3 bary, float smoothing, float3 bezierPoints[10],
        float3 p0NormalWS, float3 p1NormalWS, float3 p2NormalWS) {
        float3 flatNormalWS = BarycentricInterpolate(bary, p0NormalWS, p1NormalWS, p2NormalWS);
        float3 smoothedNormalWS = CalculateBezierNormal(bary, bezierPoints, p0NormalWS, p1NormalWS, p2NormalWS);
        return normalize(lerp(flatNormalWS, smoothedNormalWS, smoothing));
    }
    
    // The domain function runs once per vertex in the final, tessellated mesh
    // Use it to reposition vertices and prepare for the fragment stage
    [domain("tri")] // Signal we're inputting triangles
    Interpolators Domain(
        TessellationFactors factors, //The output of the patch constant function
        OutputPatch<TessellationControlPoint, 3> patch, // The Input triangle
        float3 barycentricCoordinates : SV_DomainLocation) { // The barycentric coordinates of the vertex on the triangle
    
        Interpolators output;
        ...
        // Calculate tessellation smoothing multiplier
        float smoothing = _TessellationSmoothing;
        float3 positionWS = CalculateBezierPosition(barycentricCoordinates, smoothing, factors.bezierPoints, patch[0].positionWS, patch[1].positionWS, patch[2].positionWS);
        float3 normalWS = CalculateBezierNormalWithSmoothFactor(
            barycentricCoordinates, smoothing, factors.bezierPoints,
            patch[0].normalWS, patch[1].normalWS, patch[2].normalWS);
        float3 tangentWS = BARYCENTRIC_INTERPOLATE(tangentWS.xyz);
        ...
    }

    One more caveat: once we use the interpolated (smoothed) normal, the original tangent is no longer orthogonal to it. To restore orthogonality, a new tangent vector must be computed.

    void CalculateBezierNormalAndTangent(
        float3 bary, float smoothing, float3 bezierPoints[10],
        float3 p0NormalWS, float3 p0TangentWS, 
        float3 p1NormalWS, float3 p1TangentWS, 
        float3 p2NormalWS, float3 p2TangentWS,
        out float3 normalWS, out float3 tangentWS) {
    
        float3 flatNormalWS = BarycentricInterpolate(bary, p0NormalWS, p1NormalWS, p2NormalWS);
        float3 smoothedNormalWS = CalculateBezierNormal(bary, bezierPoints, p0NormalWS, p1NormalWS, p2NormalWS);
        normalWS = normalize(lerp(flatNormalWS, smoothedNormalWS, smoothing));
    
        float3 flatTangentWS = BarycentricInterpolate(bary, p0TangentWS, p1TangentWS, p2TangentWS);
        float3 flatBitangentWS = cross(flatNormalWS, flatTangentWS);
        tangentWS = normalize(cross(flatBitangentWS, normalWS));
    }
    
    [domain("tri")] // Signal we're inputting triangles
    Interpolators Domain(
        TessellationFactors factors, //The output of the patch constant function
        OutputPatch<TessellationControlPoint, 3> patch, // The Input triangle
        float3 barycentricCoordinates : SV_DomainLocation) { // The barycentric coordinates of the vertex on the triangle
        ...
        float3 normalWS, tangentWS;
        CalculateBezierNormalAndTangent(
            barycentricCoordinates, smoothing, factors.bezierPoints,
            patch[0].normalWS, patch[0].tangentWS.xyz, 
            patch[1].normalWS, patch[1].tangentWS.xyz, 
            patch[2].normalWS, patch[2].tangentWS.xyz,
            normalWS, tangentWS);
        ...
    }
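    The tangent fix in CalculateBezierNormalAndTangent boils down to two cross products. As a sanity check, here is a small CPU-side Python sketch (hypothetical vectors, not the shader code itself) showing that the rebuilt tangent is orthogonal to the smoothed normal:

    ```python
    import math

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def normalize(v):
        l = math.sqrt(dot(v, v))
        return (v[0] / l, v[1] / l, v[2] / l)

    def reorthogonalize_tangent(flat_normal, flat_tangent, smoothed_normal):
        # Bitangent from the flat frame, then rebuild a tangent
        # that is orthogonal to the new (smoothed) normal.
        bitangent = cross(flat_normal, flat_tangent)
        return normalize(cross(bitangent, smoothed_normal))

    flat_n = (0.0, 1.0, 0.0)
    flat_t = (1.0, 0.0, 0.0)
    smooth_n = normalize((0.1, 1.0, 0.0))  # slightly tilted after interpolation

    new_t = reorthogonalize_tangent(flat_n, flat_t, smooth_n)
    print(abs(dot(new_t, smooth_n)) < 1e-6)  # True: orthogonality restored
    ```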

    References

    1. https://www.youtube.com/watch?v=63ufydgBcIk
    2. https://nedmakesgames.medium.com/mastering-tessellation-shaders-and-their-many-uses-in-unity-9caeb760150e
    3. https://zhuanlan.zhihu.com/p/148247621
    4. https://zhuanlan.zhihu.com/p/124235713
    5. https://zhuanlan.zhihu.com/p/141099616
    6. https://zhuanlan.zhihu.com/p/42550699
    7. https://en.wikipedia.org/wiki/Barycentric_coordinate_system
    8. https://zhuanlan.zhihu.com/p/359999755
    9. https://zhuanlan.zhihu.com/p/629364817
    10. https://zhuanlan.zhihu.com/p/629202115
    11. https://perso.telecom-paristech.fr/boubek/papers/PhongTessellation/PhongTessellation.pdf
    12. http://alex.vlachos.com/graphics/CurvedPNTriangles.pdf
  • Unity可互动可砍断八叉树草海渲染 – 几何、计算着色器(BIRP/URP)

    Unity interactive and chopable octree grass sea rendering – geometry, compute shader (BIRP/URP)

    Project (BIRP) on Github:

    https://github.com/Remyuu/Unity-Interactive-Grass

    First, here is a screenshot of 100,500 grass blades running through a compute shader on my M1 Pro without any optimization; it holds over 200 FPS.

    After adding octree frustum culling, distance fading, and other operations, the frame rate actually became less stable (painful). I suspect the CPU is under too much pressure maintaining such a large amount of grass information every frame. But as long as enough culling is done, 700+ FPS is no problem (a relief). The octree depth also needs to be tuned to the actual scene; in the figure below I set the depth to 5.

    Preface

    This article has grown quite long. I mainly use it to consolidate my own knowledge, so parts of it may feel basic. I am still a complete novice, and I welcome discussion and corrections.

    This article mainly has two stages:

    • The GS + TS method achieves the most basic effect of grass rendering
    • Then I used CS to re-render the sea of grass, adding various optimization methods

    The geometry shader + tessellation shader route is relatively simple to implement, but its performance ceiling is low and its platform compatibility is poor.

    Combining compute shaders with GPU Instancing is the mainstream approach in the industry today, and it also runs well on mobile.

    The compute-shader grass sea in this article mainly follows the implementations of Colin and Minions Art, and ends up as a hybrid of the two (the former has been analyzed on Zhihu in "Grass rendering study notes based on GPU Instance"). Three ComputeBuffers are used: one holding all the grass, one append buffer that feeds the material, and one visibility buffer (filled in real time by frustum culling). A quad/octree (splitting four or eight ways on alternating depths) divides the space; frustum culling against it yields the indices of all grass inside the current frustum, which are passed to the compute shader for further processing (mesh generation, quaternion rotation, LoD, and so on). Finally, a variable-length append buffer (ComputeBufferType.Append) hands the grass to be rendered to the material, which draws it via instancing.

    A Hi-Z culling solution could also be used; I'm leaving that as a hole to fill while I keep studying.

    In addition, following the article by Minions Art, I reimplemented a (partial) in-editor grass-painting tool that stores the positions of all grass vertices in a maintained vertex list.

    Furthermore, a separate Cut buffer is maintained: grass marked with -1 is left untouched, while grass marked with any other value (the cutter height) has that value passed to the material, where WorldPos + Split.y plus a lerp hides the upper half of the blade and tints its color; finally, some grass clippings are spawned to complete the grass-cutting effect.

    In a previous article, I introduced in detail what a tessellation shader is along with various optimization methods; next, I integrate tessellation into actual development. I have also combined the compute shader material I studied over the past few days to build a compute-shader-based grass field; you can find more details in those notes. Below are the small effects this article will achieve, with complete code attached:

    • Grass Rendering
    • Grass Rendering – Geometry Shader (BIRP/URP)
    • Define grass width, height, orientation, pour, curvature, gradient, color, band, normal
    • INTEGER tessellation
    • URP adds Visibility Map
    • Grass rendering – Compute Shader (BIRP/URP) work on MacOS
    • Octree frustum culling
    • Distance fades
    • Grass Interaction
    • Interactive Geometry Shaders (BIRP/URP)
    • Interactive Compute Shader (BIRP) work on MacOS
    • Unity custom grass generation tool
    • Grass cutting system

    Main reference (ahem, "plagiarized") articles:

    There are many ways to render grass, two of which are shown in this article:

    • Geometry Shader + Tessellation Shader
    • Compute Shaders + GPU Instancing

    First of all, the first approach has serious limitations: many mobile devices and Metal do not support geometry shaders, and a GS recomputes the mesh every frame, which is quite expensive.

    Secondly, does this mean macOS can no longer run geometry shaders at all? Not quite. To use GS you must go through OpenGL rather than Metal, but note that Apple only supports OpenGL up to 4.1, which lacks compute shaders. Intel-era Macs could reach OpenGL 4.3 and run CS and GS together; the M-series chips have no such luck: it's either OpenGL 4.1 or Metal. On my M1 Pro MBP, even a virtual machine doesn't help (Parallels 18+ provides DX11 and Vulkan, but Vulkan on macOS is translated and essentially Metal underneath), so there is still no GS. In short, there is no native GS on Apple Silicon.

    Furthermore, Metal does not even support tessellation shaders directly. Apple simply doesn't want these two stages on its chips; they are too inefficient. On the M chips, TS is actually emulated with compute shaders!

    To sum up, the geometry shader is a dead-end technology, especially after the advent of mesh shaders. GS remains popular in Unity, and quite a few shipped games still use it, but any similar effect can be achieved with instancing driven by compute shaders, and more efficiently. New graphics cards will keep supporting GS; Apple simply cut it off without worrying about compatibility.

    This article explains in detail why GS is so slow: http://www.joshbarczak.com/blog/?p=667. In short, Intel optimized GS (thread batching and so on) while other vendors' chips lack that optimization.

    This article is a study note and is likely to contain errors.

    1. Overview of Geometry Shader Rendering Grass (BIRP)

    This chapter is a concise summary of Roystan's tutorial. If you need the project files or the final code, you can download them from the original article, or read the write-up by "Socrates has no bottom".

    1.1 Overview

    After the Domain Stage, you can choose to use a geometry shader.

    A geometry shader takes a whole primitive as input and is able to generate vertices on output. The input to a geometry shader is the vertices of a complete primitive (three vertices for a triangle, two vertices for a line or a single vertex for a point). The geometry shader is called once for each primitive.

    Download the starter project from the web.

    1.2 Drawing a triangle

    Draw a triangle.

    // Add inside the CGINCLUDE block.
    struct geometryOutput
    {
        float4 pos : SV_POSITION;
    };
    
    ...
    // Vertex shader: pass the vertex through unchanged.
    return vertex;
    ...
    
    [maxvertexcount(3)]
    void geo(triangle float4 IN[3] : SV_POSITION, inout TriangleStream<geometryOutput> triStream)
    {
        geometryOutput o;
    
        o.pos = UnityObjectToClipPos(float4(0.5, 0, 0, 1));
        triStream.Append(o);
    
        o.pos = UnityObjectToClipPos(float4(-0.5, 0, 0, 1));
        triStream.Append(o);
    
        o.pos = UnityObjectToClipPos(float4(0, 1, 0, 1));
        triStream.Append(o);
    }
    
    // Add inside the SubShader Pass, just below the #pragma fragment frag line.
    #pragma geometry geo

    We are actually drawing one triangle for each vertex of the mesh, but the positions we assign to the triangle's vertices are constant (they do not change for each input vertex), so all the triangles end up stacked on top of each other.

    1.3 Vertex Offset

    Therefore, we simply offset each generated triangle by its input vertex's position.

    // Add to the top of the geometry shader.
    float3 POS = IN[0];
    
    // Update each assignment of o.pos.
    o.pos = UnityObjectToClipPos(POS + float3(0.5, 0, 0));
    
    o.pos = UnityObjectToClipPos(POS + float3(-0.5, 0, 0));
    
    o.pos = UnityObjectToClipPos(POS + float3(0, 1, 0));

    1.4 Rotating blades

    Note, however, that all triangles are currently emitted in the same direction, so we add normal correction: construct a TBN matrix and multiply it with the offset directions. The code is also tidied up.

    float3 vNormal = IN[0].normal;
    float4 vTangent = IN[0].tangent;
    float3 vBinormal = cross(vNormal, vTangent.xyz) * vTangent.w;
    
    float3x3 tangentToLocal = float3x3(
        vTangent.x, vBinormal.x, vNormal.x,
        vTangent.y, vBinormal.y, vNormal.y,
        vTangent.z, vBinormal.z, vNormal.z
        );
    
    triStream.Append(VertexOutput(POS + mul(tangentToLocal, float3(0.5, 0, 0))));
    triStream.Append(VertexOutput(POS + mul(tangentToLocal, float3(-0.5, 0, 0))));
    triStream.Append(VertexOutput(POS + mul(tangentToLocal, float3(0, 0, 1))));

    1.5 Coloring

    Then define the upper and lower colors of the grass, and use UV to make a lerp gradient.

    return lerp(_BottomColor, _TopColor, i.uv.y);

    1.6 Rotation Matrix Principle

    Give each blade a random facing by constructing a rotation matrix. The principle is covered in GAMES101, and there is also a video deriving the formula very clearly. The gist of the derivation: assume vector $a$ rotates around the axis $n$ to $b$; decompose $a$ into a component parallel to $n$ (which stays unchanged) plus a component perpendicular to $n$.

    float3x3 AngleAxis3x3(float angle, float3 axis)
    {
        float c, s;
        sincos(angle, s, c);
    
        float t = 1 - c;
        float x = axis.x;
        float y = axis.y;
        float z = axis.z;
    
        return float3x3(
            t * x * x + c, t * x * y - s * z, t * x * z + s * y,
            t * x * y + s * z, t * y * y + c, t * y * z - s * x,
            t * x * z - s * y, t * y * z + s * x, t * z * z + c
            );
    }

    The rotation matrix $R$ is calculated here using Rodrigues' rotation formula: $$R = I + \sin(\theta)\,[k]_{\times} + (1-\cos(\theta))\,[k]_{\times}^{2}$$

    Among them, $\theta$ is the rotation angle. $k$ is the unit rotation axis. $I$ is the identity matrix. $[k]_{\times}$ is the antisymmetric matrix corresponding to the axis $k$.

    For a unit vector $k=(x,y,z)$ , the antisymmetric matrix $[k]_{\times}=\left[\begin{array}{ccc} 0 & -z & y \\ z & 0 & -x \\ -y & x & 0 \end{array}\right]$ finally obtains the matrix elements:

    $$ \left[\begin{array}{ccc} t x^2 + c & t x y - s z & t x z + s y \\ t x y + s z & t y^2 + c & t y z - s x \\ t x z - s y & t y z + s x & t z^2 + c \end{array}\right] $$ where $s = \sin\theta$, $c = \cos\theta$, and $t = 1 - c$, matching the code above.

    float3x3 facingRotationMatrix = AngleAxis3x3(rand(POS) * UNITY_TWO_PI, float3(0, 0, 1));
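    As a sanity check, AngleAxis3x3 can be transcribed to the CPU. In this Python sketch (a hypothetical helper, not part of the shader), rotating (1, 0, 0) by 90° around the z-axis should give (0, 1, 0):

    ```python
    import math

    def angle_axis_3x3(angle, axis):
        # Direct transcription of the HLSL AngleAxis3x3 above
        s, c = math.sin(angle), math.cos(angle)
        t = 1 - c
        x, y, z = axis
        return [
            [t*x*x + c,   t*x*y - s*z, t*x*z + s*y],
            [t*x*y + s*z, t*y*y + c,   t*y*z - s*x],
            [t*x*z - s*y, t*y*z + s*x, t*z*z + c  ],
        ]

    def mul(m, v):
        # Matrix * column vector
        return tuple(sum(m[r][k] * v[k] for k in range(3)) for r in range(3))

    rotated = mul(angle_axis_3x3(math.pi / 2, (0.0, 0.0, 1.0)), (1.0, 0.0, 0.0))
    print([round(v, 6) for v in rotated])  # [0.0, 1.0, 0.0]
    ```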

    1.7 Blade tipping

    Now that each blade has a random facing, also tilt it over by a random angle around an axis in the ground plane.

    float3x3 bendRotationMatrix = AngleAxis3x3(rand(POS.zzx) * _BendRotationRandom * UNITY_PI * 0.5, float3(-1, 0, 0));

    1.8 Leaf size

    Adjust the width and height of the blades. Originally both were one unit; to make the grass look more natural, we add rand() in this step.

    _BladeWidth("Blade Width", Float) = 0.05
    _BladeWidthRandom("Blade Width Random", Float) = 0.02
    _BladeHeight("Blade Height", Float) = 0.5
    _BladeHeightRandom("Blade Height Random", Float) = 0.3
    
    
    float height = (rand(POS.zyx) * 2 - 1) * _BladeHeightRandom + _BladeHeight;
    float width = (rand(POS.xzy) * 2 - 1) * _BladeWidthRandom + _BladeWidth;
    
    
    triStream.Append(VertexOutput(POS + mul(transformationMatrix, float3(width, 0, 0)), float2(0, 0)));
    triStream.Append(VertexOutput(POS + mul(transformationMatrix, float3(-width, 0, 0)), float2(1, 0)));
    triStream.Append(VertexOutput(POS + mul(transformationMatrix, float3(0, 0, height)), float2(0.5, 1)));

    1.9 Tessellation

    Since there are too few blades, the surface is tessellated here to increase their number.

    1.10 Perturbations

    To animate the grass, scroll the wind distortion texture's UVs with _Time, sample it, then build a wind rotation matrix and apply it to the grass.

    float2 uv = POS.xz * _WindDistortionMap_ST.xy + _WindDistortionMap_ST.zw + _WindFrequency * _Time.y;
    
    float2 windSample = (tex2Dlod(_WindDistortionMap, float4(uv, 0, 0)).xy * 2 - 1) * _WindStrength;
    
    float3 wind = normalize(float3(windSample.x, windSample.y, 0));
    
    float3x3 windRotation = AngleAxis3x3(UNITY_PI * windSample, wind);
    
    float3x3 transformationMatrix = mul(mul(mul(tangentToLocal, windRotation), facingRotationMatrix), bendRotationMatrix);

    1.11 Fixed blade rotation issue

    At this point the wind can also rotate the blades around the x and y axes, which looks like this:
    
    So for the two vertices at the blade's base, use a matrix that rotates only around z.

    float3x3 transformationMatrixFacing = mul(tangentToLocal, facingRotationMatrix);
    
    
    
    triStream.Append(VertexOutput(POS + mul(transformationMatrixFacing, float3(width, 0, 0)), float2(0, 0)));
    triStream.Append(VertexOutput(POS + mul(transformationMatrixFacing, float3(-width, 0, 0)), float2(1, 0)));

    1.12 Blade curvature

    To give the blades curvature, we need to add vertices. Since double-sided rendering is currently enabled, the vertex winding order does not matter. A manual interpolation for-loop builds the triangles, and a forward offset is computed to bend each blade.

    float forward = rand(POS.yyz) * _BladeForward;
    
    
    for (int i = 0; i < BLADE_SEGMENTS; i++)
    {
        float t = i / (float)BLADE_SEGMENTS;
        // Add below the line declaring float t.
        float segmentHeight = height * t;
        float segmentWidth = width * (1 - t);
        float segmentForward = pow(t, _BladeCurve) * forward;
        float3x3 transformMatrix = i == 0 ? transformationMatrixFacing : transformationMatrix;
        triStream.Append(GenerateGrassVertex(POS, segmentWidth, segmentHeight, segmentForward, float2(0, t), transformMatrix));
        triStream.Append(GenerateGrassVertex(POS, -segmentWidth, segmentHeight, segmentForward, float2(1, t), transformMatrix));
    }
    
    triStream.Append(GenerateGrassVertex(POS, 0, height, forward, float2(0.5, 1), transformationMatrix));
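    The per-segment math in this loop is easy to verify numerically. A Python sketch with hypothetical blade parameters shows the blade narrowing toward the tip while the forward offset grows with the curve exponent:

    ```python
    BLADE_SEGMENTS = 4

    def blade_profile(width, height, forward, curve):
        # Mirrors segmentWidth / segmentHeight / segmentForward in the loop above
        rows = []
        for i in range(BLADE_SEGMENTS):
            t = i / BLADE_SEGMENTS
            rows.append((width * (1 - t), height * t, (t ** curve) * forward))
        rows.append((0.0, height, forward))  # top vertex appended after the loop
        return rows

    profile = blade_profile(width=0.05, height=0.5, forward=0.2, curve=2.0)
    widths = [w for w, _, _ in profile]
    assert widths == sorted(widths, reverse=True)  # blade narrows toward the tip
    print(profile[-1])  # (0.0, 0.5, 0.2): zero width, full height at the tip
    ```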

    1.13 Creating Shadows

    Create shadows in another Pass and output.

    Pass{
        Tags{
            "LightMode" = "ShadowCaster"
        }
    
        CGPROGRAM
        #pragma vertex vert
        #pragma geometry geo
        #pragma fragment frag
        #pragma hull hull
        #pragma domain domain
        #pragma target 4.6
        #pragma multi_compile_shadowcaster
    
        float4 frag(geometryOutput i) : SV_Target{
            SHADOW_CASTER_FRAGMENT(i)
        }
    
        ENDCG
    }

    1.14 Receiving Shadows

    Use SHADOW_ATTENUATION directly in Frag to determine the shadow.

    // geometryOutput struct.
    unityShadowCoord4 _ShadowCoord : TEXCOORD1;
    ...
    o._ShadowCoord = ComputeScreenPos(o.POS);
    ...
    #pragma multi_compile_fwdbase
    ...
    return SHADOW_ATTENUATION(i);

    1.15 Removing shadow acne

    Apply the linear shadow bias to remove surface acne.

    #if UNITY_PASS_SHADOWCASTER
        o.POS = UnityApplyLinearShadowBias(o.POS);
    #endif

    1.16 Adding Normals

    Add normal information to vertices generated by the geometry shader.

    struct geometryOutput
    {
        float4 POS : SV_POSITION;
        float2 uv : TEXCOORD0;
        unityShadowCoord4 _ShadowCoord : TEXCOORD1;
        float3 normal : NORMAL;
    };
    ...
    o.normal = UnityObjectToWorldNormal(normal);

    1.17 Full code‼️ (BIRP)

    The final effect.

    Code:

    https://pastebin.com/8u1ytGgU

    Complete: https://pastebin.com/U14m1Nu0

    2. Geometry Shader Rendering Grass (URP)

    2.1 References

    I have already written the BIRP version, and now I just need to port it.

    • URP code specification reference: https://www.cyanilux.com/tutorials/urp-shader-code/
    • BIRP->URP quick reference table: https://cuihongzhi1991.github.io/blog/2020/05/27/builtinttourp/

    You can follow this article by Daniel, or follow along with my modifications. Note that the space-transformation code in the original repo has problems; the fix can be found in its pull requests.

    Now put the above BIRP tessellation shader together.

    • Tags changed to URP
    • The header file is introduced and replaced with the URP version
    • Variables are surrounded by CBuffer
    • Shadow casting, receiving code

    2.2 Start to change

    Declare the URP pipeline.

    LOD 100
    Cull Off
    Pass{
        Tags{
            "RenderType" = "Opaque"
            "Queue" = "Geometry"
            "RenderPipeline" = "UniversalPipeline"
        }

    Import the URP library.

    #include "Packages/com.unity.render-pipelines.universal/ShaderLibrary/Core.hlsl"
    #include "Packages/com.unity.render-pipelines.universal/ShaderLibrary/Lighting.hlsl"
    #include "Packages/com.unity.render-pipelines.universal/ShaderLibrary/ShaderVariablesFunctions.hlsl"
    

    Change the function.

    // o.normal = UnityObjectToWorldNormal(normal);
    o.normal = TransformObjectToWorldNormal(normal);

    URP shadow receiving. Ideally this would be computed in the vertex shader, but for convenience it is all done in the geometry shader here.

    Then generate the shadows in a ShadowCaster Pass.

    Pass{
        Name "ShadowCaster"
        Tags{ "LightMode" = "ShadowCaster" }
    
        ZWrite On
        ZTest LEqual
    
        HLSLPROGRAM
    
            half4 frag(geometryOutput input) : SV_TARGET{
                return 1;
            }
    
        ENDHLSL
    }

    2.3 Full code‼️(URP)

    https://pastebin.com/6KveEKMZ

    3. Optimize tessellation logic (BIRP/URP)

    3.1 Organize the code

    Above we only used a fixed subdivision level, which I cannot accept. If you are unfamiliar with how tessellation works, see my tessellation article, which details several schemes for optimizing subdivision.
    
    I will use the BIRP version of the code completed in Section 1 as the example. The current version only supports uniform subdivision.

    _TessellationUniform("Tessellation Uniform", Range(1, 64)) = 1

    The output structures of each stage are quite confusing, so let's reorganize them.

    3.2 Partitioning Mode

    [KeywordEnum(INTEGER, FRAC_EVEN, FRAC_ODD, POW2)] _PARTITIONING("Partition algorithm", Float) = 0
    
    #pragma shader_feature_local _PARTITIONING_INTEGER _PARTITIONING_FRAC_EVEN _PARTITIONING_FRAC_ODD _PARTITIONING_POW2
    
    #if defined(_PARTITIONING_INTEGER)
        [partitioning("integer")]
    #elif defined(_PARTITIONING_FRAC_EVEN)
        [partitioning("fractional_even")]
    #elif defined(_PARTITIONING_FRAC_ODD)
        [partitioning("fractional_odd")]
    #elif defined(_PARTITIONING_POW2)
        [partitioning("pow2")]
    #else 
        [partitioning("integer")]
    #endif

    3.3 Subdivided Frustum Culling

    In BIRP, use _ProjectionParams.z to represent the far plane, and in URP use UNITY_RAW_FAR_CLIP_VALUE.

    bool IsOutOfBounds(float3 p, float3 lower, float3 higher) { //Given rectangle judgment
        return p.x < lower.x || p.x > higher.x || p.y < lower.y || p.y > higher.y || p.z < lower.z || p.z > higher.z;
    }
    bool IsPointOutOfFrustum(float4 positionCS) { //View cone judgment
        float3 culling = positionCS.xyz;
        float w = positionCS.w;
        float3 lowerBounds = float3(-w, -w, -w * _ProjectionParams.z);
        float3 higherBounds = float3(w, w, w);
        return IsOutOfBounds(culling, lowerBounds, higherBounds);
    }
    bool ShouldClipPatch(float4 p0PositionCS, float4 p1PositionCS, float4 p2PositionCS) {
        bool allOutside = IsPointOutOfFrustum(p0PositionCS) &&
            IsPointOutOfFrustum(p1PositionCS) &&
            IsPointOutOfFrustum(p2PositionCS);
        return allOutside;
    }
    
    TessellationControlPoint vert(Attributes v)
    {
        ...
        o.positionCS = UnityObjectToClipPos(v.vertex);
        ...
    }
    
    TessellationFactors patchConstantFunction (InputPatch<TessellationControlPoint, 3> patch)
    {
        TessellationFactors f;
        if(ShouldClipPatch(patch[0].positionCS, patch[1].positionCS, patch[2].positionCS)){
            f.edge[0] = f.edge[1] = f.edge[2] = f.inside = 0;
        }else{
            f.edge[0] = _TessellationFactor;
            f.edge[1] = _TessellationFactor;
            f.edge[2] = _TessellationFactor;
            f.inside = _TessellationFactor;
        }
        return f;
    }

    Note, however, that the test here uses the clip-space coordinates of the grass's root triangle. If that triangle leaves the screen entirely while the tall grass it spawns should still be visible, the grass will suddenly pop out of view. Whether this matters depends on the project: with a high viewing angle and short grass, this optimization works fine.

    From a high viewing angle, it is not much of a problem.
    
    Viewed from ground level ("Voldemort's perspective"), however, the grass is over-culled and visibly incomplete.
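    The patch-culling test above can be mirrored on the CPU. A Python sketch (the far-plane value is hypothetical, standing in for _ProjectionParams.z) culls a patch only when all three vertices fail the bounds test:

    ```python
    def is_out_of_bounds(p, lower, higher):
        # Axis-aligned bounds test, one component at a time
        return any(v < lo or v > hi for v, lo, hi in zip(p, lower, higher))

    def is_point_out_of_frustum(position_cs, far_plane):
        # position_cs = (x, y, z, w) in homogeneous clip space
        x, y, z, w = position_cs
        lower = (-w, -w, -w * far_plane)
        higher = (w, w, w)
        return is_out_of_bounds((x, y, z), lower, higher)

    def should_clip_patch(p0, p1, p2, far_plane=1.0):
        # Cull only if ALL three vertices are outside (conservative)
        return all(is_point_out_of_frustum(p, far_plane) for p in (p0, p1, p2))

    inside = (0.0, 0.0, 0.5, 1.0)
    outside = (2.0, 0.0, 0.5, 1.0)   # x > w: outside the left/right bounds
    print(should_clip_patch(outside, outside, outside))  # True
    print(should_clip_patch(outside, outside, inside))   # False: one vertex visible
    ```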

    3.4 Fine-grained control of screen distance

    The grass is dense up close and sparse far away, but the factor is based on screen-space (clip-space) edge length, so this method is affected by resolution.

    float EdgeTessellationFactor(float scale, float4 p0PositionCS, float4 p1PositionCS) {
        float factor = distance(p0PositionCS.xyz / p0PositionCS.w, p1PositionCS.xyz / p1PositionCS.w) / scale;
        return max(1, factor);
    }
    
    TessellationFactors patchConstantFunction (InputPatch<TessellationControlPoint, 3> patch)
    {
        TessellationFactors f;
    
        f.edge[0] = EdgeTessellationFactor(_TessellationFactor, 
            patch[1].positionCS, patch[2].positionCS);
        f.edge[1] = EdgeTessellationFactor(_TessellationFactor, 
            patch[2].positionCS, patch[0].positionCS);
        f.edge[2] = EdgeTessellationFactor(_TessellationFactor, 
            patch[0].positionCS, patch[1].positionCS);
        f.inside = (f.edge[0] + f.edge[1] + f.edge[2]) / 3.0;
    
    
        #if defined(_CUTTESS_TRUE)
            if(ShouldClipPatch(patch[0].positionCS, patch[1].positionCS, patch[2].positionCS))
                f.edge[0] = f.edge[1] = f.edge[2] = f.inside = 0;
        #endif
    
        return f;
    }

    Tessellation Factor = 0.08

    I do not recommend the Frac partitioning modes here; they produce strong, very noticeable shimmering. I don't like this method much.

    3.5 Camera distance classification

    Compute the ratio of the edge length between two vertices to the distance from the edge midpoint to the camera. The larger the ratio, the more screen space the edge occupies and the more subdivision it needs.

    float EdgeTessellationFactor_WorldBase(float scale, float3 p0PositionWS, float3 p1PositionWS) {
        float length = distance(p0PositionWS, p1PositionWS);
        float distanceToCamera = distance(_WorldSpaceCameraPos, (p0PositionWS + p1PositionWS) * 0.5);
        float factor = length / (scale * distanceToCamera * distanceToCamera);
        return max(1, factor);
    }
    ...
    f.edge[0] = EdgeTessellationFactor_WorldBase(_TessellationFactor_WORLD_BASE, 
        patch[1].vertex, patch[2].vertex);
    f.edge[1] = EdgeTessellationFactor_WorldBase(_TessellationFactor_WORLD_BASE, 
        patch[2].vertex, patch[0].vertex);
    f.edge[2] = EdgeTessellationFactor_WorldBase(_TessellationFactor_WORLD_BASE, 
        patch[0].vertex, patch[1].vertex);
    f.inside = (f.edge[0] + f.edge[1] + f.edge[2]) / 3.0;

    There is still room for improvement: keep nearby grass from becoming too dense while making mid-distance grass transition more smoothly, by introducing a nonlinear factor into the distance-to-tessellation-factor relationship.

    float EdgeTessellationFactor_WorldBase(float scale, float3 p0PositionWS, float3 p1PositionWS) {
        float length = distance(p0PositionWS, p1PositionWS);
        float distanceToCamera = distance(_WorldSpaceCameraPos, (p0PositionWS + p1PositionWS) * 0.5);
        // Use the square root function to adjust the effect of distance to make the tessellation factor change more smoothly at medium distances
        float adjustedDistance = sqrt(distanceToCamera);
        // Adjust the impact of scale. You may need to further fine-tune the coefficient here based on the actual effect.
        float factor = length / (scale * adjustedDistance);
        return max(1, factor);
    }

    This is more appropriate.
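    To see why, the two falloffs can be compared numerically. A quick Python sketch (scale, edge length, and distances are hypothetical values):

    ```python
    import math

    def factor_squared(scale, edge_len, dist):
        # First version: tessellation falls off with distance squared
        return max(1.0, edge_len / (scale * dist * dist))

    def factor_sqrt(scale, edge_len, dist):
        # Adjusted version: sqrt softens the falloff at medium distance
        return max(1.0, edge_len / (scale * math.sqrt(dist)))

    edge, scale = 1.0, 0.05
    near_sq, far_sq = factor_squared(scale, edge, 2.0), factor_squared(scale, edge, 10.0)
    near_rt, far_rt = factor_sqrt(scale, edge, 2.0), factor_sqrt(scale, edge, 10.0)

    # The squared version collapses to the minimum factor of 1 much sooner
    print(near_sq, far_sq)  # ~5.0 and exactly 1.0 (already clamped at 10 units)
    print(near_rt, far_rt)  # both still well above 1
    ```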

    3.6 Visibility Map Controls Grass Subdivision

    The vertex shader reads the visibility texture and passes the value on to the tessellation stage, which applies it in the patch constant function (PCF).

    Take FIXED mode as an example:

    _VisibilityMap("Visibility Map", 2D) = "white" {}
    TEXTURE2D(_VisibilityMap); SAMPLER(sampler_VisibilityMap);
    struct Attributes
    {
        ...
        float2 uv : TEXCOORD0;
    };
    struct TessellationControlPoint
    {
        ...
        float visibility : TEXCOORD1;
    };
    TessellationControlPoint vert(Attributes v){
        ...
        float visibility = SAMPLE_TEXTURE2D_LOD(_VisibilityMap, sampler_VisibilityMap, v.uv, 0).r; 
        o.visibility    = visibility;
        ...
    }
    TessellationFactors patchConstantFunction (InputPatch<TessellationControlPoint, 3> patch){
        ...
        float averageVisibility = (patch[0].visibility + patch[1].visibility + patch[2].visibility) / 3; // Calculate the average grayscale value of the three vertices
        float baseTessellationFactor = _TessellationFactor_FIXED; 
        float tessellationMultiplier = lerp(0.1, 1.0, averageVisibility); // Adjust the factor based on the average gray value
        #if defined(_DYNAMIC_FIXED)
            f.edge[0] = _TessellationFactor_FIXED * tessellationMultiplier;
            f.edge[1] = _TessellationFactor_FIXED * tessellationMultiplier;
            f.edge[2] = _TessellationFactor_FIXED * tessellationMultiplier;
            f.inside  = _TessellationFactor_FIXED * tessellationMultiplier;
        ...

    3.7 Complete code‼️ (BIRP)

    Grass Shader:

    https://pastebin.com/TD0AupGz

    3.8 Full code‼️ (URP)

    URP differs in a few places. For example, ShadowBias needs to be applied as follows. I won't expand on it here; just read the code.

    #if UNITY_PASS_SHADOWCASTER
        // o.pos = UnityApplyLinearShadowBias(o.pos);
        o.shadowCoord = TransformWorldToShadowCoord(ApplyShadowBias(posWS, norWS, 0));
    #endif

    Grass Shader:

    https://pastebin.com/2ZX2aVm9

    4. Interactive Grassland

    URP and BIRP are exactly the same.

    4.1 Implementation steps

    The principle is very simple: a script uploads the character's world position, and the grass is bent away from it according to the configured radius and interaction strength.

    uniform float3 _PositionMoving; // Object position
    float _Radius; // Object interaction radius
    float _Strength; // Interaction strength

    In the grass-generation loop, compute the distance between each blade and the object, and adjust the blade's position according to that distance.

    float dis = distance(_PositionMoving, posWS); // Calculate distance
    float radiusEffect = 1 - saturate(dis / _Radius); // Calculate effect attenuation based on distance
    float3 sphereDisp = POS - _PositionMoving; // Calculate the position difference
    sphereDisp *= radiusEffect * _Strength; // Apply falloff and intensity
    sphereDisp = clamp(sphereDisp, -0.8, 0.8); // Limit the maximum displacement

    The new positions are then calculated within each blade of grass.

    // Apply interactive effects
    float3 newPos = i == 0 ? POS : POS + (sphereDisp * t);
    triStream.Append(GenerateGrassVertex(newPos, segmentWidth, segmentHeight, segmentForward, float2(0, t), transformMatrix));
    triStream.Append(GenerateGrassVertex(newPos, -segmentWidth, segmentHeight, segmentForward, float2(1, t), transformMatrix));

    Don't forget the tip vertex outside the for loop.

    // Final grass fragment
    float3 newPosTop = POS + sphereDisp;
    triStream.Append(GenerateGrassVertex(newPosTop, 0, height, forward, float2(0.5, 1), transformationMatrix));
    triStream.RestartStrip();
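    The falloff arithmetic above can be checked on the CPU. A Python sketch with hypothetical object position, radius, and strength values:

    ```python
    import math

    def clamp(v, lo, hi):
        return max(lo, min(hi, v))

    def grass_displacement(grass_pos, obj_pos, radius, strength):
        # Mirrors the distance falloff + clamp in the shader snippet above
        dis = math.dist(grass_pos, obj_pos)
        radius_effect = 1.0 - clamp(dis / radius, 0.0, 1.0)   # saturate()
        disp = tuple((g - o) * radius_effect * strength
                     for g, o in zip(grass_pos, obj_pos))
        return tuple(clamp(d, -0.8, 0.8) for d in disp)

    # Outside the radius: no displacement at all.
    # Close to the object: full effect, clamped to the +-0.8 limit.
    far = grass_displacement((5.0, 0.0, 0.0), (0.0, 0.0, 0.0), radius=2.0, strength=3.0)
    near = grass_displacement((0.5, 0.0, 0.0), (0.0, 0.0, 0.0), radius=2.0, strength=3.0)
    print(far)   # (0.0, 0.0, 0.0)
    print(near)  # (0.8, 0.0, 0.0)
    ```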

    In URP, using uniform float3 _PositionMoving may cause SRP Batcher to fail.

    4.2 Script Code

    Bind the object that needs interaction.

    using UnityEngine;
    
    public class ShaderInteractor : MonoBehaviour
    {
        // Update is called once per frame
        void Update()
        {
            Shader.SetGlobalVector("_PositionMoving", transform.position);
        }
    }

    4.3 Full code‼️ (URP)

    Grass shader:

    https://pastebin.com/Zs77EQgy

    5. Compute Shader Rendering Grass v1.0

    Why v1.0? Because rendering a grass sea with compute shaders properly is quite difficult, and the pieces that are missing now can be improved gradually later. I also wrote some Compute Shader notes:

    1. Compute Shader Study Notes (I)
    2. Compute Shader Learning Notes (II) Post-processing Effects
    3. Compute Shader Learning Notes (III) Particle Effects and Cluster Behavior Simulation
    4. Compute Shader Learning Notes (IV) Grass Rendering

    5.1 Review/Organization

    The Compute Shader notes above fully describe how to write a stylized grass sea from scratch in CS. If you forgot, review it here.

    The CPU still does a lot in the initialization stage. First, it defines the grass mesh and uploads buffers (blade width and height, the spawn position of each blade, random facing, random color darkening). It also passes the maximum bend value and the grass interaction radius to the compute shader.

    Each frame, the CPU also passes the time, wind direction, wind strength/speed, and wind-field scale factor to the compute shader.

    The compute shader uses this information to calculate how each blade should rotate, outputting quaternions.

    Finally, the vertex stage combines the instance ID with the computed results: it first applies the vertex offset, then the quaternion rotation, and finally recalculates the normal.

    This demo can be optimized further:

    • Move more calculations into the Compute Shader, such as mesh generation, blade width/height, and random tilt; expose more parameters for real-time adjustment.
    • Add culling: cull by distance from the camera position, cull against the view frustum, and so on. This culling process requires some atomic operations.
    • Support interaction with multiple objects, and refine the interactive-bending logic, e.g. make the bend amount proportional to a power of the distance to the interacting object.
    • Extend the tooling on the engine side, such as a grass-painting brush, which would probably require a quadtree storage system.

    And in Compute Shader, use vectors instead of scalars when possible.

    First, reorganize the code: put all variables that do not need to be sent to the Compute Shader every frame into a single initialization function, and tidy up the Inspector panel. (This touches a lot of code.)

    After this pass, essentially all calculations run on the GPU; the only exception is that the world position of each blade is computed on the CPU and passed to the GPU through a buffer.

    The size of this buffer depends entirely on the size of the ground mesh and the chosen density. In other words, for a super-large open world the buffer becomes enormous. For a 5×5 grass field with Density set to 0.5, roughly 312,576 blades of data are sent; at 4 floats per blade that is 312,576 × 4 × 4 = 5,001,216 bytes. At an assumed CPU→GPU bandwidth of 8 GB/s, the raw copy alone takes roughly 0.6 ms, and driver and scheduling overhead make the real cost noticeably higher.
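    As a sanity check on those numbers, here is a back-of-the-envelope sketch; the blade count, 4-float layout, and 8 GB/s bandwidth are the article's own assumptions:

    ```python
    # Hypothetical cost estimate for the per-blade position upload.
    blade_count = 312_576
    floats_per_blade = 4          # e.g. xyz position plus one extra float
    bytes_per_float = 4

    buffer_bytes = blade_count * floats_per_blade * bytes_per_float
    transfer_seconds = buffer_bytes / 8e9   # assumed 8 GB/s CPU->GPU bandwidth

    print(buffer_bytes)               # 5001216 bytes (~5 MB)
    print(transfer_seconds * 1000)    # ~0.63 ms of raw copy time
    ```

    Real uploads cost more than the raw copy time, but the point stands: the cost scales linearly with field area and density.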

    Fortunately, this buffer does not need to be uploaded every frame, but it still deserves attention: if the field grows to 100×100, the cost multiplies many times over, which is scary. Moreover, many of those blades may never be used, which wastes a lot of performance.

    I added a function to generate perlin noise in the Compute Shader, as well as the xorshift128 random number generation algorithm.

    // Fractal value noise ("Perlin-style") built on a sine hash
    float hash(float x, float y) {
        return frac(abs(sin(sin(123.321 + x) * (y + 321.123)) * 456.654));
    }
    float perlin(float x, float y){
        float col = 0.0;
        for (int i = 0; i < 8; i++) {
            float fx = floor(x); float fy = floor(y);
            float cx = ceil(x);  float cy = ceil(y);
            float a = hash(fx, fy); float b = hash(fx, cy);
            float c = hash(cx, fy); float d = hash(cx, cy);
            col += lerp(lerp(a, b, frac(y)), lerp(c, d, frac(y)), frac(x));
            col /= 2.0; x /= 2.0; y /= 2.0;
        }
        return col;
    }
    // XorShift128 random number algorithm -- edited to output normalized data directly
    uint state[4];
    void xorshift_init(uint s) {
        state[0] = s; state[1] = s | 0xffff0000u;
        state[2] = s << 16; state[3] = s >> 16;
    }
    float xorshift128() {
        uint t = state[3]; uint s = state[0];
        state[3] = state[2]; state[2] = state[1]; state[1] = s;
        t ^= t << 11u; t ^= t >> 8u;
        state[0] = t ^ s ^ (s >> 19u);
        return (float)state[0] / float(0xffffffffu);
    }
    
    [numthreads(THREADGROUPSIZE,1,1)]
    void BendGrass (uint3 id : SV_DispatchThreadID)
    {
        xorshift_init(id.x * 73856093u ^ id.y * 19349663u ^ id.z * 83492791u);
        ...
    }
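    The xorshift128 above ports directly to Python for a quick sanity check (the helper names are mine); it confirms that the normalized output stays within [0, 1] and doesn't degenerate:

    ```python
    # Python port of the HLSL xorshift128 above, with 32-bit wrap-around masked in.
    MASK32 = 0xFFFFFFFF

    def xorshift_init(seed):
        return [seed & MASK32,
                (seed | 0xFFFF0000) & MASK32,
                (seed << 16) & MASK32,
                (seed >> 16) & MASK32]

    def xorshift128(state):
        t, s = state[3], state[0]
        state[3], state[2], state[1] = state[2], state[1], s
        t ^= (t << 11) & MASK32
        t ^= t >> 8
        state[0] = (t ^ s ^ (s >> 19)) & MASK32
        return state[0] / 0xFFFFFFFF   # normalize to [0, 1]

    state = xorshift_init(73856093)
    samples = [xorshift128(state) for _ in range(1000)]
    print(min(samples) >= 0.0 and max(samples) <= 1.0)  # True
    ```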

    To review: at present the CPU uses an AABB-based even distribution to generate all candidate grass positions, which are then passed to the GPU, where the Compute Shader performs culling, LoD, and other operations.

    So far I have three Buffers.

    m_InputBuffer is the structure on the left of the picture above: it sends every blade of grass to the GPU without any culling.

    m_OutputBuffer is an append buffer that grows inside the Compute Shader: if the blade for the current thread ID survives, it is appended to this buffer for later instanced rendering (the structure on the right of the picture above).

    m_argsBuffer is an indirect-arguments buffer, different from the other buffers. It passes parameters to the draw call: the number of vertex indices to render per instance, the number of instances, and so on. Let's look at it in detail:

    The first parameter: my grass mesh has seven triangles, so there are 21 vertex indices to render per instance.

    The second parameter is initially set to 0, meaning nothing is rendered. After the Compute Shader finishes, it is set dynamically from the length of m_OutputBuffer; in other words, it ends up equal to the number of blades appended in the Compute Shader.

    The third and fourth parameters are the index of the first vertex and the index of the first instance, respectively.

    I haven't used the fifth parameter; according to the Unity documentation it is the start-instance location, an offset applied to the instance index.

    The last step looks like this: pass in the mesh, material, AABB, and the argument buffer.
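    The five-uint args layout can be sketched like this (the field meanings follow the Unity documentation for DrawMeshInstancedIndirect; the 21-index count comes from the seven-triangle blade mesh above):

    ```python
    # Sketch of the indirect-args buffer contents:
    #   [0] index count per instance, [1] instance count,
    #   [2] start index location, [3] base vertex location, [4] start instance location
    triangles_in_blade = 7
    args = [triangles_in_blade * 3,  # 21 vertex indices per grass blade
            0,                       # instance count, filled in after the compute pass
            0, 0, 0]                 # start index / base vertex / start instance
    print(args)  # [21, 0, 0, 0, 0]
    ```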

    5.2 Customizing Unity Tools

    Create a new C# script and save it in the project's Editor directory (create the directory if it doesn't exist). The script inherits from Editor and is annotated with [CustomEditor(typeof(XXX))], which means it customizes the Inspector for XXX. Mine targets GrassControl, so whatever I draw here is attached to that component's Inspector. You can also make a standalone window instead, which would inherit from EditorWindow.

    Write tools in the OnInspectorGUI() function, for example, write a Label.

    GUILayout.Label("== Remo Grass Generator ==");

    To center the label, pass in a GUIStyle:

    GUILayout.Label("== Remo Grass Generator ==", new GUIStyle(EditorStyles.boldLabel) { alignment = TextAnchor.MiddleCenter });

    Too crowded? Just add a line of space.

    EditorGUILayout.Space();

    If you want your tools to appear above XXX's default Inspector, write your logic before the base call inside OnInspectorGUI:

    ... // Write here
    // The default Inspector interface of GrassControl
    base.OnInspectorGUI();

    Creating a button works like this; the body runs when it is pressed:

    if (GUILayout.Button("xxx"))
    {
        ... // Code to run after the press
    }

    Anyway, these are the ones I use now.

    5.3 Editor selects the object to generate grass

    It is also easy to grab the GameObject that the current script lives on and display it in the Inspector.

    [SerializeField] private GameObject grassObject;
    ...
    grassObject = (GameObject)EditorGUILayout.ObjectField("Write any name", grassObject, typeof(GameObject), true);
    if (grassObject == null)
    {
        grassObject = FindObjectOfType<GrassControl>()?.gameObject;
    }

    After obtaining it, you can access the contents of the current script through GameObject.

    How to get the object selected in the Editor window? It can be done with one line of code.

    foreach (GameObject obj in Selection.gameObjects)

    Display the selected objects in the Inspector panel. Note that you need to handle the multi-selection case, otherwise Unity issues a warning.

    // Display the current Editor selected object in real time and control the availability of the button
    EditorGUILayout.LabelField("Selection Info:", EditorStyles.boldLabel);
    bool hasSelection = Selection.activeGameObject != null;
    GUI.enabled = hasSelection;
    if (hasSelection)
        foreach (GameObject obj in Selection.gameObjects)
            EditorGUILayout.LabelField(obj.name);
    else
        EditorGUILayout.LabelField("No active object selected.");

    Next, get the MeshFilter and Renderer of the selected object. Since Raycast detection is required, get a Collider. If it does not exist, create one.

    I won't go over the grass-painting code here.

    5.4 Processing AABBs

    After generating a bunch of grass, add each grass to the AABB and finally pass it to Instancing.

    I assume that each grass is the size of a unit cube, so it is Vector3.one. If the grass is particularly tall, this should need to be modified.

    Stuff each blade of grass into the big AABB and pass the new AABB back to the script's m_LocalBounds for Instancing.

    Graphics.DrawMeshInstancedIndirect(blade, 0, m_Material, m_LocalBounds, m_argsBuffer);
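    The encapsulation step can be sketched as follows (a minimal stand-in for Bounds.Encapsulate; the unit-cube blade size and helper names are assumptions):

    ```python
    # Grow one big AABB, stored as (min_corner, max_corner), around every blade.
    def encapsulate(bounds, point, blade_size=1.0):
        # treat each blade as a unit cube centered on its position (Vector3.one)
        half = blade_size / 2
        lo = [min(bounds[0][i], point[i] - half) for i in range(3)]
        hi = [max(bounds[1][i], point[i] + half) for i in range(3)]
        return (lo, hi)

    bounds = ([0.0] * 3, [0.0] * 3)
    for pos in [(2.0, 0.0, 3.0), (-1.0, 0.0, 5.0)]:
        bounds = encapsulate(bounds, pos)
    print(bounds)  # ([-1.5, -0.5, 0.0], [2.5, 0.5, 5.5])
    ```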

    5.5 Surface Shader – Pitfalls

    There is a small catch here. Because the current material is a Surface Shader, its vertex stage offsets vertices relative to the AABB center by default, so the world positions passed in earlier cannot be used directly: you also have to pass in the AABB center and subtract it. It feels clumsy; I wonder if there is a more elegant way.

    5.6 Simple Camera Distance Culling + Fade

    Currently, all generated grass is passed from the CPU to the Compute Shader, and every blade is added to the AppendBuffer, meaning there is no culling logic at all.

    The simplest culling scheme is to cull grass by its distance from the camera: expose a culling distance in the Inspector, compute the distance between the camera and each grass instance, and skip appending it to the AppendBuffer when the distance exceeds the threshold.

    First, pass the camera's world position in from C#. Semi-pseudocode:

    // Get the camera
    private Camera m_MainCamera;
    
    m_MainCamera = Camera.main;
    
    if (m_MainCamera != null)
        m_ComputeShader.SetVector(ID_camreaPos, m_MainCamera.transform.position);

    In CS, calculate the distance between the grass and the camera:

    float distanceFromCamera = distance(input.position, _CameraPositionWS);

    The fade factor is then computed like this:

    float distanceFade = 1 - saturate((distanceFromCamera - _MinFadeDist) / (_MaxFadeDist - _MinFadeDist));

    If the value is effectively zero (below 0.001), return early.

    // skip if out of fading range too
    if (distanceFade < 0.001f)
    {
        return;
    }
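    The fade curve behaves as follows; this is a sketch with hypothetical _MinFadeDist/_MaxFadeDist values of 10 and 50:

    ```python
    # Linear distance fade: 1 inside min_fade, 0 beyond max_fade.
    def saturate(x):
        return max(0.0, min(1.0, x))

    def distance_fade(dist, min_fade=10.0, max_fade=50.0):
        return 1.0 - saturate((dist - min_fade) / (max_fade - min_fade))

    print(distance_fade(5.0))    # 1.0 -> fully visible
    print(distance_fade(30.0))   # 0.5 -> half-faded
    print(distance_fade(80.0))   # 0.0 -> culled (skipped when < 0.001)
    ```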

    In the transition band between culled and fully visible, scale the blade width and height by the fade value to achieve a gradual fade-out.

    Result.height = (bladeHeight + bladeHeightOffset * (xorshift128()*2-1)) * distanceFade;
    Result.width = (bladeWeight + bladeWeightOffset * (xorshift128()*2-1)) * distanceFade;
    ...
    Result.fade = xorshift128() * distanceFade;

    In the figure below, both are set to be relatively small for the convenience of demonstration.

    I think the actual effect is quite good and smooth. If the width and height of the grass are not modified, the effect will be greatly reduced.

    Of course, you could also change the logic: instead of completely removing grass beyond the maximum draw distance, reduce how much of it is drawn; or selectively draw only some of the grass in the transition band.

    Both approaches are acceptable; if it were me, I would choose the latter.

    5.7 Maintaining a set of visible ID buffers

    Frustum culling here means using the CPU to discard, up front, work the GPU would otherwise waste on grass that cannot be seen.

    So how does the Compute Shader know which grass to render and which to cull? My approach is to maintain an ID list. If a blade is culled, its index is simply not added; otherwise the index of the blade to render is recorded.

    List<uint> grassVisibleIDList = new List<uint>();
    
    // buffer that contains the ids of all visible instances
    private ComputeBuffer m_VisibleIDBuffer;
    
    private const int VISIBLE_ID_STRIDE        =  1 * sizeof(uint);
    
    m_VisibleIDBuffer = new ComputeBuffer(grassData.Count, VISIBLE_ID_STRIDE,
        ComputeBufferType.Structured); //uint only, per visible grass
    m_ComputeShader.SetBuffer(m_ID_GrassKernel, "_VisibleIDBuffer", m_VisibleIDBuffer);
    
    m_VisibleIDBuffer?.Release();

    Since some grass has already been culled before reaching the Compute Shader, the dispatch count is no longer based on the total number of blades but on the current list's length.

    // m_ComputeShader.Dispatch(m_ID_GrassKernel, m_DispatchSize, 1, 1);
    
    m_DispatchSize = Mathf.CeilToInt(grassVisibleIDList.Count / (float)threadGroupSize); // cast so the division isn't floored before CeilToInt
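    The dispatch-size calculation is simply a round-up division: one thread per visible blade, padded to whole thread groups (the group size of 64 here is an assumed value):

    ```python
    import math

    # One thread group covers thread_group_size blades; round up so every
    # visible blade gets a thread. Extra threads early-out in the kernel.
    def dispatch_size(visible_count, thread_group_size=64):
        return math.ceil(visible_count / thread_group_size)

    print(dispatch_size(312_576))  # 4884
    print(dispatch_size(100))      # 2 groups cover 128 threads
    ```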

    Generates a fully visible ID sequence.

    void GrassFastList(int count)
    {
        grassVisibleIDList = Enumerable.Range(0, count).ToList();
    }

    This list must also be uploaded to the GPU each frame. With the preparation complete, we can now manage this array with a quadtree.

    5.8 Quad/Octree Storing Grass Indices

    You can consider dividing an AABB into multiple sub-AABBs and then use a quadtree to store and manage them.

    Currently, all grass is in one AABB. Next, we build an octree and put all the grass in this AABB into branches. This makes it easy to do frustum culling in the early stages of the CPU.

    How should it be stored? If the terrain has little vertical variation, a quadtree is enough; for an open world with rolling hills, use an octree. However, since the grass density is much higher horizontally than vertically, I use a hybrid quadtree + octree structure here: the parity of the depth decides whether the current level splits into four or eight child nodes. If strong height subdivision isn't needed, a pure octree also works, though I suspect it is slightly less efficient. The split here is simply even; a later optimization could size the child AABBs dynamically.

    if (depth % 2 == 0)
    {
        ...
        m_children.Add(new CullingTreeNode(topLeftSingle, depth - 1));
        m_children.Add(new CullingTreeNode(bottomRightSingle, depth - 1));
        m_children.Add(new CullingTreeNode(topRightSingle, depth - 1));
        m_children.Add(new CullingTreeNode(bottomLeftSingle, depth - 1));
    }
    else
    {
        ...
        m_children.Add(new CullingTreeNode(topLeft, depth - 1));
        m_children.Add(new CullingTreeNode(bottomRight, depth - 1));
        m_children.Add(new CullingTreeNode(topRight, depth - 1));
        m_children.Add(new CullingTreeNode(bottomLeft, depth - 1));
    
        m_children.Add(new CullingTreeNode(topLeft2, depth - 1));
        m_children.Add(new CullingTreeNode(bottomRight2, depth - 1));
        m_children.Add(new CullingTreeNode(topRight2, depth - 1));
        m_children.Add(new CullingTreeNode(bottomLeft2, depth - 1));
    }

    The detection of the view frustum and AABB can be done with GeometryUtility.TestPlanesAABB.

    public void RetrieveLeaves(Plane[] frustum, List<Bounds> list, List<int> visibleIDList)
    {
        if (GeometryUtility.TestPlanesAABB(frustum, m_bounds))
        {
            if (m_children.Count == 0)
            {
                if (grassIDHeld.Count > 0)
                {
                    list.Add(m_bounds);
                    visibleIDList.AddRange(grassIDHeld);
                }
            }
            else
            {
                foreach (CullingTreeNode child in m_children)
                {
                    child.RetrieveLeaves(frustum, list, visibleIDList);
                }
            }
        }
    }

    This code is the key part. It takes:

    • The six planes of the camera frustum Plane[]
    • A list of Bounds objects storing all nodes within the frustum
    • Stores a list of all grass indices contained in the node within the frustum

    By calling the method of this quad/octree, you can get the list of all bounding boxes and grass within the frustum.

    Then all the grass indexes can be made into a Buffer and passed to the Compute Shader.

    m_VisibleIDBuffer.SetData(grassVisibleIDList);

    To get a visual AABB, use the OnDrawGizmos() method.

    Pass in all the AABBs that survive frustum culling so you can see them intuitively.

    Everything inside the view frustum is also written into the visible-grass list.

    5.9 Flickering grass problem – Pitfalls

    I hit a small pit here. I finished the octree and successfully produced many sub-AABBs as shown above, but when I moved the camera the grass flickered wildly. (I was too lazy to record a GIF.) Compare the two screenshots below: moving the view only slightly changes the current visibility list, the grass positions jump around a lot, and it looks like continuous flickering.

    I can't figure it out, there is no problem with Compute Shader culling.

    The dispatch count is also computed from the length of the visibility list, so the compute shader definitely has enough threads.

    And there is no problem with DrawMeshInstancedIndirect.

    What's the problem?

    After a long debugging session, I found that the problem lies in how the Compute Shader's xorshift takes its random numbers.

    Before _VisibleIDBuffer existed, each blade corresponded to one thread ID, fixed from the moment the blade was created. Now that the index list has been introduced, seeding the random generator with the raw thread ID instead of the visible ID makes the random values jump around between frames.

    In other words, every place that previously used the thread ID must now use the index value read from _VisibleIDBuffer!
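    The fix can be illustrated with a small sketch (the hash constant matches the seeding code above; the list contents are hypothetical): seeding with the persistent blade index keeps a blade's randomness stable even as its thread ID changes between frames:

    ```python
    # Seed per-blade randomness with the blade's own index, read from the
    # visible-ID list, not with the raw thread/dispatch ID.
    def blade_seed(grass_index):
        return (grass_index * 73856093) & 0xFFFFFFFF

    visible_frame_a = [4, 7, 9]      # hypothetical visible IDs this frame
    visible_frame_b = [7, 9, 12]     # camera moved; blade 7 is now thread 0

    seed_a = blade_seed(visible_frame_a[1])  # blade 7 processed by thread 1
    seed_b = blade_seed(visible_frame_b[0])  # blade 7 processed by thread 0
    print(seed_a == seed_b)  # True: blade 7 looks identical in both frames
    ```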

    5.10 Multi-object Interaction

    Currently only one trampler can be passed in, and if none is assigned an error is thrown, which is unacceptable.

    There are three parameters about interaction:

    • pos – Vector3
    • trampleStrength – Float
    • trampleRadius – Float

    Now pack trampleRadius into pos as the w component of a Vector4 (or another component, depending on your needs) and upload the position array with SetVectorArray. That way each interacting object gets its own radius: larger for bulky objects, smaller for slim ones. That is, remove the following line:

    // In SetGrassDataBase, no need to upload every frame
    // m_ComputeShader.SetFloat("trampleRadius", trampleRadius);

    become:

    // In SetGrassDataUpdate, each frame must be uploaded
    // Set up multiple interactive objects
    if (trampler.Length > 0)
    {
        Vector4[] positions = new Vector4[trampler.Length];
        for (int i = 0; i < trampler.Length; i++)
        {
            positions[i] = new Vector4(trampler[i].transform.position.x, trampler[i].transform.position.y, trampler[i].transform.position.z,
                trampleRadius);
        }
        m_ComputeShader.SetVectorArray(ID_tramplePos, positions);
    }

    Then pass the number of interacting objects so the Compute Shader knows how many to process. This also needs updating every frame. For values uploaded every frame, I habitually cache a property ID, which is more efficient.

    // Initializing
    ID_trampleLength = Shader.PropertyToID("_trampleLength");
    // In each frame
    m_ComputeShader.SetFloat(ID_trampleLength, trampler.Length);

    I repackaged it:

    By modifying the corresponding code, you can adjust the radius of each interactive object on the panel. If you want to enrich this adjustment function, you can consider passing a separate Buffer into it.

    In the Compute Shader, it is relatively simple to combine multiple rotations.

    // Trampler
    float4 qt = float4(0, 0, 0, 1); // identity quaternion: imaginary part 0, real part 1
    for (int trampleIndex = 0; trampleIndex < trampleLength; trampleIndex++)
    {
        float trampleRadius = tramplePos[trampleIndex].a;
        float3 relativePosition = input.position - tramplePos[trampleIndex].xyz;
        float dist = length(relativePosition);
        if (dist < trampleRadius) {
            // Use the power to enhance the effect at close range
            float eff = pow((trampleRadius - dist) / trampleRadius, 2) * trampleStrength;
            float3 direction = normalize(relativePosition);
            float3 newTargetDirection = float3(direction.x * eff, 1, direction.z * eff);
            qt = quatMultiply(MapVector(float3(0, 1, 0), newTargetDirection), qt);
        }
    }
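    The quatMultiply/MapVector pattern above can be sketched in Python; map_vector here is a stand-in for a shortest-arc rotation from one unit vector to another, and quaternions are stored as (x, y, z, w):

    ```python
    import math

    def quat_multiply(q1, q2):
        # Hamilton product: applying q2, then q1
        x1, y1, z1, w1 = q1
        x2, y2, z2, w2 = q2
        return (w1*x2 + x1*w2 + y1*z2 - z1*y2,
                w1*y2 - x1*z2 + y1*w2 + z1*x2,
                w1*z2 + x1*y2 - y1*x2 + z1*w2,
                w1*w2 - x1*x2 - y1*y2 - z1*z2)

    def map_vector(a, b):
        # shortest-arc rotation taking unit vector a onto unit vector b
        cross = (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])
        w = 1.0 + sum(ai * bi for ai, bi in zip(a, b))
        norm = math.sqrt(cross[0]**2 + cross[1]**2 + cross[2]**2 + w*w)
        return (cross[0]/norm, cross[1]/norm, cross[2]/norm, w/norm)

    qt = (0.0, 0.0, 0.0, 1.0)   # identity, as in the loop above
    # accumulate one bend per trampler in range, exactly like the HLSL loop
    qt = quat_multiply(map_vector((0.0, 1.0, 0.0), (0.0, 0.0, 1.0)), qt)
    print(qt)  # ~(0.707, 0.0, 0.0, 0.707): 90 degrees about +x, taking +y onto +z
    ```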

    5.11 Editor real-time preview

    The camera currently passed to the Compute Shader is the main camera, i.e. the Game-window camera. Now we want to temporarily use the Scene-view camera while working in the editor and switch back to the main camera once the game starts. This can be done with the Scene view's GUI draw event.

    Here is an example of remodeling my current code:

    #if UNITY_EDITOR
        SceneView view;
    
        void OnDestroy()
        {
            // When the window is destroyed, remove the delegate
            // so that it will no longer do any drawing.
            SceneView.duringSceneGui -= this.OnScene;
        }
    
        void OnScene(SceneView scene)
        {
            view = scene;
            if (!Application.isPlaying)
            {
                if (view.camera != null)
                {
                    m_MainCamera = view.camera;
                }
            }
            else
            {
                m_MainCamera = Camera.main;
            }
        }
        private void OnValidate()
        {
            // Set up components
            if (!Application.isPlaying)
            {
                if (view != null)
                {
                    m_MainCamera = view.camera;
                }
            }
            else
            {
                m_MainCamera = Camera.main;
            }
        }
    #endif

    When initializing the shader, subscribe to the event first, then check whether we are currently in Play mode before choosing a camera. In edit mode, m_MainCamera may still be null at this point.

    void InitShader()
    {
    #if UNITY_EDITOR
        SceneView.duringSceneGui += this.OnScene;
        if (!Application.isPlaying)
        {
            if (view != null && view.camera != null)
            {
                m_MainCamera = view.camera;
            }
        }
    #endif
        if (Application.isPlaying)
        {
            m_MainCamera = Camera.main;
        }
        ...

    In the per-frame Update function, if m_MainCamera is null we conclude we are in edit mode:

    // Pass in the camera coordinates
    if (m_MainCamera != null)
        m_ComputeShader.SetVector(ID_camreaPos, m_MainCamera.transform.position);
    #if UNITY_EDITOR
    else if (view != null && view.camera != null)
    {
        m_ComputeShader.SetVector(ID_camreaPos, view.camera.transform.position);
    }
    #endif

    6. Cutting Grass

    Maintain a set of Cut Buffers

    // added for cutting
    private ComputeBuffer m_CutBuffer;
    float[] cutIDs;

    Initializing Buffer

    private const int CUT_ID_STRIDE            =  1 * sizeof(float);
    // added for cutting
    m_CutBuffer = new ComputeBuffer(grassData.Count, CUT_ID_STRIDE, ComputeBufferType.Structured);
    // added for cutting
    m_ComputeShader.SetBuffer(m_ID_GrassKernel, "_CutBuffer", m_CutBuffer);
    m_CutBuffer.SetData(cutIDs);

    Don't forget to release it when you disable it.

    // added for cutting
    m_CutBuffer?.Release();

    Define a method that takes the hit position and radius and finds the affected blades. In cutIDs, -1 means uncut; a cut blade stores the height of the cut (hitPoint.y).

    // newly added for cutting
    public void UpdateCutBuffer(Vector3 hitPoint, float radius)
    {
        // can't cut grass if there is no grass in the scene
        if (grassData.Count > 0)
        {
            List<int> grasslist = new List<int>();
            // Get the list of IDS that are near the hitpoint within the radius
            cullingTree.ReturnLeafList(hitPoint, grasslist, radius);
            Vector3 brushPosition = this.transform.position;
            // Compute the squared radius to avoid square root calculations
            float squaredRadius = radius * radius;
    
            for (int i = 0; i < grasslist.Count; i++)
            {
                int currentIndex = grasslist[i];
                Vector3 grassPosition = grassData[currentIndex].position + brushPosition;
    
                // Calculate the squared distance
                float squaredDistance = (hitPoint - grassPosition).sqrMagnitude;
    
                // Check if the squared distance is within the squared radius
                // Check if there is grass to cut, or if the grass is uncut (-1)
                if (squaredDistance <= squaredRadius && (cutIDs[currentIndex] > hitPoint.y || cutIDs[currentIndex] == -1))
                {
                    // store cutting point
                    cutIDs[currentIndex] = hitPoint.y;
                }
    
            }
        }
        m_CutBuffer.SetData(cutIDs);
    }

    Then bind a script to the object that needs to be cut:

    using System.Collections;
    using System.Collections.Generic;
    using UnityEngine;
    
    
    public class Cutgrass : MonoBehaviour
    {
        [SerializeField]
        GrassControl grassComputeScript;
    
        [SerializeField]
        float radius = 1f;
    
        public bool updateCuts;
    
        Vector3 cachedPos;
        // Start is called before the first frame update
    
    
        // Update is called once per frame
        void Update()
        {
            if (updateCuts && transform.position != cachedPos)
            {
                Debug.Log("Cutting");
                grassComputeScript.UpdateCutBuffer(transform.position, radius);
                cachedPos = transform.position;
    
            }
        }
    
        private void OnDrawGizmos()
        {
            Gizmos.color = new Color(1, 0, 0, 0.3f);
            Gizmos.DrawWireSphere(transform.position, radius);
        }
    }

    In the Compute Shader, just modify the grass height. (Very straightforward...) You can change the effect to whatever you want.

    StructuredBuffer<float> _CutBuffer;// added for cutting
    
        float cut = _CutBuffer[usableID];
        Result.height = (bladeHeight + bladeHeightOffset * (xorshift128()*2-1)) * distanceFade;
        if(cut != -1){
            Result.height *= 0.1f;
        }

    Done!

    References

    1. https://learn.microsoft.com/zh-cn/windows/uwp/graphics-concepts/geometry-shader-stage–gs-
    2. https://roystan.net/articles/grass-shader/
    3. https://danielilett.com/2021-08-24-tut5-17-stylised-grass/
    4. https://catlikecoding.com/unity/tutorials/basics/compute-shaders/
    5. Notes - A preliminary exploration of compute-shader
    6. https://www.patreon.com/posts/53587750
    7. https://www.youtube.com/watch?v=xKJHL8nQiuM
    8. https://www.patreon.com/posts/40090373
    9. https://www.patreon.com/posts/47447321
    10. https://www.patreon.com/posts/wip-patron-only-83683483
    11. https://www.youtube.com/watch?v=DeATXF4Szqo
    12. https://docs.unity3d.com/Manual/class-ComputeShader.html
    13. https://docs.unity3d.com/ScriptReference/ComputeShader.html
    14. https://learn.microsoft.com/en-us/windows/win32/api/D3D11/nf-d3d11-id3d11devicecontext-dispatch
    15. https://zhuanlan.zhihu.com/p/102104374
    16. Unity compute shader basics (notes)
    17. https://kylehalladay.com/blog/tutorial/2014/06/27/Compute-Shaders-Are-Nifty.html
    18. https://cuihongzhi1991.github.io/blog/2020/05/27/builtinttourp/
    19. https://jadkhoury.github.io/files/MasterThesisFinal.pdf
