Tags: SSR

  • Games202 Assignment 3 SSR Implementation

    Assignment source code:

    https://github.com/Remyuu/GAMES202-Homework

    TODO List

    • Implement shading of the scene's direct lighting (taking shadows into account).
    • Implement screen space ray intersection (SSR).
    • Implement shading of the scene's indirect lighting.
    • Implement RayMarch with dynamic step size.
    • (Not written yet) Bonus 1: Screen Space Ray Tracing with Mipmap Optimization.
    img

    Number of samples: 32

    Preface

    The basic part of this assignment is the easiest of all the GAMES202 assignments; there is nothing particularly complicated. But I have no idea where to start with the bonus part. Can someone please help me?

    Depth buffer problem in the framework

    This time I ran into a fairly serious problem on macOS. Where the cube meets the ground, jagged clipping artifacts appeared and changed as the camera distance changed. Strangely, this did not happen on Windows.

    img

    I personally feel that this is related to the accuracy of the depth buffer, and may be caused by z-fighting, in which two or more overlapping surfaces compete for the same pixel. There are generally several solutions to this problem:

    • Adjust the near and far planes: don't make the near plane too close to the camera, and don't make the far plane too far away.
    • Improve the precision of the depth buffer: use 32-bit or higher precision.
    • Multi-Pass Rendering: Use different rendering schemes for objects in different distance ranges.

    The simplest solution is to adjust the near and far plane values, which are set on line 25 of the framework's engine.js.

    // engine.js
    // const camera = new THREE.PerspectiveCamera(75, gl.canvas.clientWidth / gl.canvas.clientHeight, 0.0001, 1e5);
    const camera = new THREE.PerspectiveCamera(75, gl.canvas.clientWidth / gl.canvas.clientHeight, 5e-2, 1e2);

    This will give you a pretty sharp border.

    img
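
    To get a feel for why moving the planes helps, here is a rough sketch in JavaScript (the language of engine.js). It assumes a standard OpenGL-style perspective projection and a 24-bit fixed-point depth buffer; I have not verified either against the framework, and windowDepth and depthResolution are hypothetical helpers written only for this illustration. It estimates how far apart two surfaces must be, at a given distance from the camera, before the depth buffer can tell them apart.

```javascript
// Window-space depth in [0, 1] for a view-space distance z,
// assuming a standard perspective projection with near n and far f.
function windowDepth(z, n, f) {
  return (f * (z - n)) / (z * (f - n));
}

// Approximate smallest view-space separation that still maps to two
// different depth-buffer values at distance z (assumed 24-bit buffer).
function depthResolution(z, n, f, bits = 24) {
  const slope = (f * n) / ((f - n) * z * z); // derivative of windowDepth w.r.t. z
  return 1 / (slope * Math.pow(2, bits));
}

const z = 10; // distance from the camera, in scene units
console.log(depthResolution(z, 1e-4, 1e5)); // original near/far values
console.log(depthResolution(z, 5e-2, 1e2)); // modified near/far values
```

    Under these assumptions, the original values resolve only about 0.06 scene units at a distance of 10, while the modified values resolve about 0.0001, which would be consistent with the jagged edge disappearing.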

    Adding a "Pause Rendering" switch

    This section is optional. To reduce the strain on your computer, simply add a switch that pauses rendering.

    // engine.js
    let settings = { 'Render Switch': true };

    function createGUI() {
      ...
      // Add the boolean switch here
      gui.add(settings, 'Render Switch');
      ...
    }

    function mainLoop(now) {
      // Only update and render while the switch is on
      if (settings['Render Switch']) {
        cameraControls.update();
        renderer.render();
      }
      requestAnimationFrame(mainLoop);
    }
    requestAnimationFrame(mainLoop);
    img

    1. Implementing direct lighting

    Implement EvalDiffuse(vec3 wi, vec3 wo, vec2 uv) and EvalDirectionalLight(vec2 uv) in shaders/ssrShader/ssrFragment.glsl.

    // ssrFragment.glsl
    vec3 EvalDiffuse(vec3 wi, vec3 wo, vec2 screenUV) {
      vec3 reflectivity = GetGBufferDiffuse(screenUV);
      vec3 normal = GetGBufferNormalWorld(screenUV);
      // Lambertian term: albedo times the clamped cosine of the incident angle
      float cosi = max(0., dot(normal, wi));
      vec3 f_r = reflectivity * cosi;
      return f_r;
    }

    vec3 EvalDirectionalLight(vec2 screenUV) {
      // Light radiance attenuated by the shadow (visibility) term
      vec3 Li = uLightRadiance * GetGBufferuShadow(screenUV);
      return Li;
    }

    The first code snippet implements the Lambertian reflection model, which corresponds to $f_r \cdot \cos(\theta_i)$ in the rendering equation.

    Strictly speaking, the Lambertian BRDF includes a division by $\pi$. I originally divided by $\pi$, but judging from the reference results given in the assignment framework there should be no division, so the code here leaves it out. Take it or leave it as you see fit.

    The second part is responsible for direct lighting (including shadow occlusion), corresponding to the $L_i \cdot V$ of the rendering equation.

    $$L_o(p,\omega_o) = L_e(p,\omega_o) + \int_{\Omega} L_i(p,\omega_i) \cdot f_r(p,\omega_i,\omega_o) \cdot V(p,\omega_i) \cdot \cos(\theta_i) \, \mathrm{d}\omega_i$$

    Let's review the Lambertian reflection model here. Note that EvalDiffuse takes two directions, wi and wo, but only uses the incident direction wi. This is because the Lambertian model is independent of the viewing direction; it depends only on the surface normal and the cosine of the incident angle.
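
    As a worked version of how the two functions fit the rendering equation: for a single directional light, and assuming no emission, the incoming radiance is non-zero only along the light direction, so the integral collapses to one term. The symbols $\rho$ (the albedo read by GetGBufferDiffuse), $n$ (the world normal) and $\omega_{\text{light}}$ are my own notation for this sketch.

```latex
L_o(p,\omega_o)
  = \underbrace{f_r \,\max(0,\; n\cdot\omega_{\text{light}})}_{\texttt{EvalDiffuse}}
    \;\cdot\;
    \underbrace{L_{\text{light}}\; V(p,\omega_{\text{light}})}_{\texttt{EvalDirectionalLight}},
\qquad
f_r = \frac{\rho}{\pi}\ \text{(textbook Lambertian)},\quad
f_r = \rho\ \text{(as in the code above)}
```

    The product of the two function results in main() is exactly this single term.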

    Finally, set the result in main().

    // ssrFragment.glsl
    void main() {
      float s = InitRand(gl_FragCoord.xy);
      vec3 L = vec3(0.0);
      vec3 wi = normalize(uLightDir);
      vec3 wo = normalize(uCameraPos - vPosWorld.xyz);
      // Screen-space UV of the current fragment, used to look up the gBuffer
      vec2 screenUV = GetScreenCoordinate(vPosWorld.xyz);
      L = EvalDiffuse(wi, wo, screenUV) * EvalDirectionalLight(screenUV);
      // Clamp, then apply gamma correction
      vec3 color = pow(clamp(L, vec3(0.0), vec3(1.0)), vec3(1.0 / 2.2));
      gl_FragColor = vec4(vec3(color.rgb), 1.0);
    }
    img

    2. Specular SSR – Implementing RayMarch

    Implement the RayMarch(ori, dir, out hitPos) function to find the intersection point between the ray and the scene, and return whether the ray hits anything. The parameters ori and dir are given in world coordinates and represent the origin and direction of the ray respectively, where the direction is a unit vector. For more information, please refer to EA's SIGGRAPH 2015 (SIG15) course report.

    The "cube1" scene in the assignment framework includes the ground, so the final SSR result is not very pretty. "Pretty" here means the clarity of the result images in papers, or the polish of water reflections in games.

    To be precise, what we implement in this article is the most basic "mirror SSR", namely Basic mirror-only SSR.

    img

    The easiest way to implement "mirror SSR" is to use Linear Raymarch, which gradually determines the occlusion relationship between the current position and the depth position of gBuffer through small steps.

    img
    // ssrFragment.glsl
    bool RayMarch(vec3 ori, vec3 dir, out vec3 hitPos) {
      const int totalStepTimes = 60;
      const float threshold = 0.0001;
      float step = 0.05;
      vec3 stepDir = normalize(dir) * step;
      vec3 curPos = ori;
      for (int i = 0; i < totalStepTimes; i++) {
        vec2 screenUV = GetScreenCoordinate(curPos);
        float rayDepth = GetDepth(curPos);
        float gBufferDepth = GetGBufferDepth(screenUV);
        // Check if the ray has hit an object
        if (rayDepth > gBufferDepth + threshold) {
          hitPos = curPos;
          return true;
        }
        curPos += stepDir;
      }
      return false;
    }

    Finally, fine-tune the step size. I ended up with 0.05. If the step is too large, the reflection looks "broken". If the step is too small and the number of steps is not enough, the march may terminate before reaching the place where the reflection should be, because the total distance covered is too short. The maximum number of steps in the figure below is 150.

    img
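
    The trade-off between step size and step count can be reproduced with a toy 1D march in JavaScript. This is only a sketch under simplifying assumptions: the ray moves straight away from the camera, so its depth equals the marched distance t, and sceneDepthAt is a hypothetical stand-in for the gBuffer depth lookup.

```javascript
// Toy 1D version of the linear march. The ray's depth is assumed to equal
// the marched distance t; sceneDepthAt(t) plays the role of the gBuffer depth.
function linearMarch(sceneDepthAt, step, maxSteps, threshold = 1e-4) {
  let t = 0;
  for (let i = 0; i < maxSteps; i++) {
    if (t > sceneDepthAt(t) + threshold) {
      return t; // first sample found behind the recorded surface
    }
    t += step;
  }
  return null; // ran out of steps before reaching the surface
}

const wall = () => 4.0; // flat surface 4 units away

console.log(linearMarch(wall, 0.05, 60));  // null: 60 steps only cover about 3 units
console.log(linearMarch(wall, 0.05, 150)); // a hit slightly past 4.0
console.log(linearMarch(wall, 0.5, 60));   // 4.5: a coarse step overshoots the surface
```

    With a small step, the only way to reach distant surfaces is to raise the step count; with a large step, the hit position can be off by up to one step.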
    // ssrFragment.glsl
    vec3 EvalSSR(vec3 wi, vec3 wo, vec2 screenUV) {
      vec3 worldNormal = GetGBufferNormalWorld(screenUV);
      // Mirror the view direction about the surface normal
      vec3 reflectDir = normalize(reflect(-wo, worldNormal));
      vec3 hitPos;
      if (RayMarch(vPosWorld.xyz, reflectDir, hitPos)) {
        vec2 INV_screenUV = GetScreenCoordinate(hitPos);
        return GetGBufferDiffuse(INV_screenUV);
      } else {
        return vec3(0.);
      }
    }

    EvalSSR above wraps the call to RayMarch so that it can be used in main().

    // ssrFragment.glsl
    void main() {
      float s = InitRand(gl_FragCoord.xy);
      vec3 L = vec3(0.0);
      vec3 wi = normalize(uLightDir);
      vec3 wo = normalize(uCameraPos - vPosWorld.xyz);
      vec2 screenUV = GetScreenCoordinate(vPosWorld.xyz);
      // Basic mirror-only SSR
      float reflectivity = 0.2;
      L = EvalDiffuse(wi, wo, screenUV) * EvalDirectionalLight(screenUV);
      L += EvalSSR(wi, wo, screenUV) * reflectivity;
      vec3 color = pow(clamp(L, vec3(0.0), vec3(1.0)), vec3(1.0 / 2.2));
      gl_FragColor = vec4(vec3(color.rgb), 1.0);
    }

    If you just want to test the effect of SSR, please adjust it yourself in main().

    img
    img

    Before "Killzone Shadow Fall" was released in 2013, SSR was still quite restricted: in real projects we usually need to simulate glossy objects, and the performance available at the time kept SSR from being widely adopted. The release of "Killzone Shadow Fall" marked significant progress in real-time reflection technology. Thanks to the special hardware of the PS4, it became possible to render high-quality glossy and semi-reflective objects in real time.

    img

    In the following years, SSR technology developed rapidly, especially in combination with technologies such as PBR.

    Starting with Nvidia's RTX graphics cards, the rise of real-time ray tracing has gradually replaced SSR in some scenarios. However, in most development scenarios, traditional SSR still plays a considerable role.

    The future development trend will still be a mixture of traditional SSR technology and ray tracing technology.

    3. Indirect lighting

    Write it according to the pseudocode, that is, use the Monte Carlo method to solve the rendering equation. Unlike before, the samples this time are all taken in screen space. For sampling, you can use SampleHemisphereUniform(inout s, out pdf) and SampleHemisphereCos(inout s, out pdf) provided by the framework. Both functions return a direction in local coordinates; s is the random number (read and updated), and pdf receives the sampling probability density.

    For this part, you need to understand the pseudo code in the figure below, and then complete EvalIndirectionLight() accordingly.

    img

    First of all, remember that our sampling is still based on screen space. Anything that is not on screen (not in the gBuffer) is treated as non-existent; in effect, the scene is just a single-layer shell facing the camera.

    Indirect lighting involves randomly sampling directions on the upper hemisphere and computing the corresponding PDF. Use InitRand(screenUV) to get the random number, then choose one of SampleHemisphereUniform(inout float s, out float pdf) or SampleHemisphereCos(inout float s, out float pdf). The call updates the random number and returns the corresponding PDF together with the direction dir on the unit hemisphere, expressed in the local coordinate system.
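
    To make the role of the PDF concrete, here is a small JavaScript sketch of the two sampling strategies and a Monte Carlo estimate that uses them. It is not the framework's GLSL code: the function names, the mulberry32 generator and the test integral (the cosine integrated over the hemisphere, whose exact value is $\pi$) are my own choices for illustration.

```javascript
// Small deterministic PRNG (mulberry32) so the sketch is reproducible.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Uniform sampling of the hemisphere around +z: pdf = 1 / (2 * pi).
function sampleHemisphereUniform(rand) {
  const z = rand();
  const phi = 2 * Math.PI * rand();
  const sinTheta = Math.sqrt(1 - z * z);
  const dir = [sinTheta * Math.cos(phi), sinTheta * Math.sin(phi), z];
  return { dir, pdf: 1 / (2 * Math.PI) };
}

// Cosine-weighted sampling of the same hemisphere: pdf = cos(theta) / pi.
function sampleHemisphereCos(rand) {
  const u = rand();
  const phi = 2 * Math.PI * rand();
  const z = Math.sqrt(1 - u);
  const sinTheta = Math.sqrt(u);
  const dir = [sinTheta * Math.cos(phi), sinTheta * Math.sin(phi), z];
  return { dir, pdf: z / Math.PI };
}

// Monte Carlo estimate of the hemisphere integral of cos(theta):
// average f(sample) / pdf(sample) over all samples.
function estimateCosineIntegral(sampler, count, seed) {
  const rand = mulberry32(seed);
  let sum = 0;
  for (let i = 0; i < count; i++) {
    const { dir, pdf } = sampler(rand);
    sum += dir[2] / pdf; // dir[2] is cos(theta) in local coordinates
  }
  return sum / count;
}

console.log(estimateCosineIntegral(sampleHemisphereUniform, 20000, 1));
console.log(estimateCosineIntegral(sampleHemisphereCos, 20000, 1));
```

    Both estimates should land near $\pi$. The cosine-weighted one has no noise for this particular integrand because the PDF is proportional to it, which is the usual argument for preferring SampleHemisphereCos on diffuse surfaces.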

    Pass the normal of the current shading point into LocalBasis(n, out b1, out b2), which returns b1 and b2; the three unit vectors n, b1, b2 are mutually orthogonal. Using the local coordinate system formed by these three vectors, dir is converted to world coordinates. I will write about how LocalBasis() works at the end.

    By the way, the matrix constructed with the vectors n (normal), b1, and b2 is commonly referred to as the TBN matrix in computer graphics.

    // ssrFragment.glsl
    #define SAMPLE_NUM 5

    vec3 EvalIndirectionLight(vec3 wi, vec3 wo, vec2 screenUV) {
      vec3 L_ind = vec3(0.0);
      float s = InitRand(screenUV);
      vec3 normal = GetGBufferNormalWorld(screenUV);
      vec3 b1, b2;
      LocalBasis(normal, b1, b2);
      for (int i = 0; i < SAMPLE_NUM; i++) {
        float pdf;
        vec3 direction = SampleHemisphereUniform(s, pdf);
        // Local hemisphere direction -> world space
        vec3 worldDir = normalize(mat3(b1, b2, normal) * direction);
        vec3 position_1;
        if (RayMarch(vPosWorld.xyz, worldDir, position_1)) {
          // The sampling ray hits position_1
          vec2 hitScreenUV = GetScreenCoordinate(position_1);
          // BRDF * cos at the shading point, for light arriving from the hit point
          vec3 bsdf_d = EvalDiffuse(worldDir, wo, screenUV);
          // BRDF * cos at the hit point, for the direct light it receives
          vec3 bsdf_i = EvalDiffuse(wi, worldDir, hitScreenUV);
          L_ind += bsdf_d / pdf * bsdf_i * EvalDirectionalLight(hitScreenUV);
        }
      }
      L_ind /= float(SAMPLE_NUM);
      return L_ind;
    }

    // ssrFragment.glsl
    // Main entry point for the shader
    void main() {
      vec3 wi = normalize(uLightDir);
      vec3 wo = normalize(uCameraPos - vPosWorld.xyz);
      vec2 screenUV = GetScreenCoordinate(vPosWorld.xyz);

      // Basic mirror-only SSR coefficient
      float ssrCoeff = 0.0;
      // Indirection Light coefficient
      float indCoeff = 0.3;

      // Direction Light
      vec3 L_d = EvalDiffuse(wi, wo, screenUV) * EvalDirectionalLight(screenUV);
      // SSR Light
      vec3 L_ssr = EvalSSR(wi, wo, screenUV) * ssrCoeff;
      // Indirection Light
      vec3 L_i = EvalIndirectionLight(wi, wo, screenUV) * indCoeff;

      vec3 result = L_d + L_ssr + L_i;
      vec3 color = pow(clamp(result, vec3(0.0), vec3(1.0)), vec3(1.0 / 2.2));
      gl_FragColor = vec4(vec3(color.rgb), 1.0);
    }
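
    Written as a formula, the loop in EvalIndirectionLight is a one-bounce Monte Carlo estimator. The notation is mine: $p$ is the shading point, $\omega_k$ the $k$-th sampled direction, $q_k$ the screen-space hit found by RayMarch along $\omega_k$, $\rho$ the albedo from the gBuffer, and $N$ is SAMPLE_NUM. Samples whose ray finds no hit contribute zero.

```latex
L_{\text{ind}}(p) \;\approx\; \frac{1}{N}\sum_{k=1}^{N}
  \frac{\rho(p)\,\max(0,\; n_p\cdot\omega_k)}{\mathrm{pdf}(\omega_k)}
  \;\cdot\; \rho(q_k)\,\max(0,\; n_{q_k}\cdot\omega_{\text{light}})
  \;\cdot\; L_{\text{light}}\,V(q_k),
\qquad
\mathrm{pdf}(\omega_k) = \frac{1}{2\pi}\ \text{for uniform hemisphere sampling}
```

    The first factor is bsdf_d / pdf, the second is bsdf_i, and the third is EvalDirectionalLight evaluated at the hit point.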

    Show only indirect lighting. Samples = 5.

    img

    Direct lighting + indirect lighting. Number of samples = 5.

    img

    It was such a headache to write this part. Even with SAMPLE_NUM set to 1, my computer was sweating profusely. Once the Live Server was running, even typing lagged. I couldn't stand it. Is this the performance of the M1 Pro? And what I can stand the least is that when Safari freezes, the whole system freezes with it. Is this macOS's "User First" strategy? I don't understand. I had no choice but to take out my gaming computer and test the project over the LAN (sad). I just didn't expect that the RTX 3070 would also sweat profusely when running it. It seems that the algorithm I wrote is a pile of shit, and my life is also a pile of shit.

    4. RayMarch Improvements

    The current RayMarch() is actually problematic and will cause light leakage.

    img

    With 5 samples, the frame rate is only about 46.2 fps. My device is an M1 Pro with 16 GB.

    img

    Here we focus on why light leakage occurs. See the figure below. Our gBuffer only holds the depth of the blue part. Even if the algorithm above finds that curPos is deeper than the gBuffer depth, that does not guarantee curPos is an actual collision point. The algorithm above ignores the situation in the figure, which leads to light leakage.

    img

    To solve the light leakage problem, we introduce a threshold (yes, it is an approximation). If the difference between the depth of curPos and the depth recorded in the gBuffer is greater than a certain threshold, we are in the situation shown in the figure below. In that case screen space cannot provide correct reflection information, so the SSR result for this shading point is vec3(0). Simple and crude!

    img

    The idea of the code is similar to the previous one. At each step, we compare the depth of the next position with the depth in the gBuffer. If the next position is in front of the gBuffer (nextDepth < gBufferDepth), the ray keeps marching. Otherwise, if the gap between the gBuffer depth and the depth at curPos exceeds the threshold, we give up and return false; if it does not, we step forward and run the usual hit test.

    bool RayMarch(vec3 ori, vec3 dir, out vec3 hitPos) {
      const float EPS = 1e-2;
      const int totalStepTimes = 60;
      const float threshold = 0.1;
      float step = 0.05;
      vec3 stepDir = normalize(dir) * step;
      vec3 curPos = ori + stepDir;
      vec3 nextPos = curPos + stepDir;
      for (int i = 0; i < totalStepTimes; i++) {
        if (GetDepth(nextPos) < GetGBufferDepth(GetScreenCoordinate(nextPos))) {
          // Next position is still in front of the gBuffer surface: keep marching
          curPos = nextPos;
          nextPos += stepDir;
        } else if (GetGBufferDepth(GetScreenCoordinate(curPos)) - GetDepth(curPos) + EPS > threshold) {
          // curPos is too far in front of the recorded depth: treat as no hit
          return false;
        } else {
          curPos += stepDir;
          vec2 screenUV = GetScreenCoordinate(curPos);
          float rayDepth = GetDepth(curPos);
          float gBufferDepth = GetGBufferDepth(screenUV);
          if (rayDepth > gBufferDepth + threshold) {
            hitPos = curPos;
            return true;
          }
        }
      }
      return false;
    }

    The frame rate dropped to around 42.6, but the picture was significantly improved! At least there was no noticeable light leakage.

    img

    However, the image still has flaws: fuzzy, burr-like reflection patterns appear at the edges, which means the light leakage problem is still not fully solved, as shown in the following figure:

    img

    The method above does indeed have a problem. When comparing with the threshold, we mistakenly used curPos (i.e., Step n in the figure below), which sends the code into the third branch and returns a hitPos based on the wrong curPos.

    img

    Taking a step back, we cannot guarantee that the final curPos falls exactly on the line between the edge of the object and the camera origin. Put bluntly, the blue line in the figure below is sampled quite coarsely. We would like the curPos that lies "just" on the boundary, and then handle the defect over the distance from "Step n" to that "just right" curPos (the burr error above), but for various precision reasons we cannot obtain it. In the figure below, the green line represents one step.

    img

    Even if we adjust the ratio of threshold/step to make it close to 1, we can hardly eliminate the problem and can only alleviate it, as shown in the figure below.

    img

    Therefore, we need to improve the "anti-light leakage" method again.

    The idea behind the improvement is very simple. Since I cannot get the "exact" curPos, I guess it with a linear interpolation. Before interpolating, I make one approximation: I treat the sight lines as parallel to each other, then build similar triangles as shown in the figure below, estimate the curPos we want, and use it as hitPos.

    img

    $$\text{hitPos} = \text{curPos} + \frac{s_1}{s_1 + s_2} \cdot \text{stepDir}$$
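
    A short derivation of the interpolation weight, under the parallel-sight-line approximation. Here $t \in [0,1]$ parameterizes the segment from curPos to nextPos, $s_1$ is how far the ray is in front of the gBuffer surface at curPos, $s_2$ is how far it is behind the surface at nextPos, and $g(t)$ (my notation) is the signed depth gap between surface and ray, assumed to vary linearly along the step.

```latex
g(0) = s_1, \qquad g(1) = -s_2, \qquad
g(t) = s_1 - t\,(s_1 + s_2)
\;\;\Longrightarrow\;\;
g(t^{*}) = 0 \iff t^{*} = \frac{s_1}{s_1 + s_2},
\qquad
\text{hitPos} = \text{curPos} + t^{*}\,\text{stepDir}
```

    For example, with $s_1 = 0.03$ and $s_2 = 0.01$ the estimated hit lies three quarters of the way along the step.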

    bool RayMarch(vec3 ori, vec3 dir, out vec3 hitPos) {
      bool result = false;
      const float EPS = 1e-3;
      const int totalStepTimes = 60;
      const float threshold = 0.1;
      float step = 0.05;
      vec3 stepDir = normalize(dir) * step;
      vec3 curPos = ori + stepDir;
      vec3 nextPos = curPos + stepDir;
      for (int i = 0; i < totalStepTimes; i++) {
        if (GetDepth(nextPos) < GetGBufferDepth(GetScreenCoordinate(nextPos))) {
          // Still in front of the surface: keep marching
          curPos = nextPos;
          nextPos += stepDir;
          continue;
        }
        // s1: how far curPos is in front of the surface; s2: how far nextPos is behind it
        float s1 = GetGBufferDepth(GetScreenCoordinate(curPos)) - GetDepth(curPos) + EPS;
        float s2 = GetDepth(nextPos) - GetGBufferDepth(GetScreenCoordinate(nextPos)) + EPS;
        if (s1 < threshold && s2 < threshold) {
          // Linear interpolation between curPos and nextPos
          hitPos = curPos + stepDir * s1 / (s1 + s2);
          result = true;
        }
        break;
      }
      return result;
    }

    The effect is quite good, with no ghosting or border artifacts. And the frame rate is similar to the original algorithm, averaging around 49.2.

    img

    Next, we will focus on optimizing performance, specifically:

    • Add an adaptive step
    • Ignore samples that fall off screen

    The off-screen check is very simple: if the screen UV (uvScreen) of curPos is not between 0 and 1, marching stops for this ray.

    Concretely, add the two lines below at the beginning of the for loop. In practice the frame rate increases slightly, by about 2-3 fps.

    vec2 uvScreen = GetScreenCoordinate(curPos);
    if (any(bvec4(lessThan(uvScreen, vec2(0.0)), greaterThan(uvScreen, vec2(1.0))))) break;

    The adaptive step is not difficult either. First, set a larger initial step. If, after stepping, curPos is off screen, or its depth is deeper than the gBuffer, or "s1 < threshold && s2 < threshold" is not satisfied, halve the step to preserve accuracy.

    bool RayMarch(vec3 ori, vec3 dir, out vec3 hitPos) {
      const float EPS = 1e-2;
      const int totalStepTimes = 20;
      const float threshold = 0.1;
      bool result = false, firstIn = false;
      float step = 0.8;
      vec3 curPos = ori;
      vec3 nextPos;
      for (int i = 0; i < totalStepTimes; i++) {
        nextPos = curPos + dir * step;
        vec2 uvScreen = GetScreenCoordinate(curPos);
        // Stop as soon as the ray leaves the screen
        if (any(bvec4(lessThan(uvScreen, vec2(0.0)), greaterThan(uvScreen, vec2(1.0))))) break;
        if (GetDepth(nextPos) < GetGBufferDepth(GetScreenCoordinate(nextPos))) {
          // Still in front of the surface: advance
          curPos += dir * step;
          if (firstIn) step *= 0.5;
          continue;
        }
        // The next position is behind the surface: we have overshot at least once
        firstIn = true;
        if (step < EPS) {
          float s1 = GetGBufferDepth(GetScreenCoordinate(curPos)) - GetDepth(curPos) + EPS;
          float s2 = GetDepth(nextPos) - GetGBufferDepth(GetScreenCoordinate(nextPos)) + EPS;
          if (s1 < threshold && s2 < threshold) {
            hitPos = curPos + 2.0 * dir * step * s1 / (s1 + s2);
            result = true;
          }
          break;
        }
        if (firstIn) step *= 0.5;
      }
      return result;
    }
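
    The core of the adaptive step can be tried on the CPU with a toy 1D version in JavaScript. This is a simplified sketch, not a port of the shader: the ray's depth is assumed to equal the marched distance t, sceneDepthAt is a hypothetical stand-in for the gBuffer lookup, the threshold test is left out, and the step is halved only when the ray overshoots.

```javascript
// Toy 1D adaptive march: advance with a large step while the ray stays in
// front of the surface, halve the step on overshoot, and interpolate once
// the step is smaller than eps.
function marchAdaptive(sceneDepthAt, initStep = 0.8, eps = 0.05, maxSteps = 20) {
  let step = initStep;
  let cur = 0;
  for (let i = 0; i < maxSteps; i++) {
    const next = cur + step;
    if (next < sceneDepthAt(next)) {
      cur = next; // still in front of the surface: safe to advance
      continue;
    }
    if (step < eps) {
      const s1 = sceneDepthAt(cur) - cur;   // gap in front of the surface at cur
      const s2 = next - sceneDepthAt(next); // gap behind the surface at next
      const denom = s1 + s2;
      const w = denom > 0 ? s1 / denom : 0; // guard against a zero denominator
      return cur + step * w;
    }
    step *= 0.5; // overshot: retry from cur with a finer step
  }
  return null; // no hit within the step budget
}

console.log(marchAdaptive(() => 3.33)); // close to 3.33
console.log(marchAdaptive(() => 100));  // null: the surface is out of reach
```

    In this toy the surface at 3.33 is found in about a dozen iterations, where a fixed step of 0.05 would need more than sixty.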

    After this improvement, the frame rate jumped to about 100 fps, almost double.

    img

    Finally, tidy up the code.

    #define EPS 5e-2
    #define TOTAL_STEP_TIMES 20
    #define THRESHOLD 0.1
    #define INIT_STEP 0.8

    // True if the position projects outside the [0, 1] screen UV range
    bool outScreen(vec3 curPos) {
      vec2 uvScreen = GetScreenCoordinate(curPos);
      return any(bvec4(lessThan(uvScreen, vec2(0.0)), greaterThan(uvScreen, vec2(1.0))));
    }

    // True if the position is still in front of the gBuffer surface
    bool testDepth(vec3 nextPos) {
      return GetDepth(nextPos) < GetGBufferDepth(GetScreenCoordinate(nextPos));
    }

    bool RayMarch(vec3 ori, vec3 dir, out vec3 hitPos) {
      float step = INIT_STEP;
      bool result = false, firstIn = false;
      vec3 nextPos, curPos = ori;
      for (int i = 0; i < TOTAL_STEP_TIMES; i++) {
        nextPos = curPos + dir * step;
        if (outScreen(curPos)) break;
        if (testDepth(nextPos)) {
          // Still in front of the surface: safe to advance
          curPos += dir * step;
          continue;
        } else {
          // Overshot the surface
          firstIn = true;
          if (step < EPS) {
            float s1 = GetGBufferDepth(GetScreenCoordinate(curPos)) - GetDepth(curPos) + EPS;
            float s2 = GetDepth(nextPos) - GetGBufferDepth(GetScreenCoordinate(nextPos)) + EPS;
            if (s1 < THRESHOLD && s2 < THRESHOLD) {
              hitPos = curPos + 2.0 * dir * step * s1 / (s1 + s2);
              result = true;
            }
            break;
          }
          if (firstIn) step *= 0.5;
        }
      }
      return result;
    }

    Switching to the cave scene with the number of samples set to 32, the frame rate is only a pitiful 4 fps.

    img

    And the quality of the secondary light source is very good.

    img

    However, this algorithm will cause new problems when applied to reflections, especially the following picture, which has serious distortion.

    img
    img

    5. Mipmap Implementation

    Hierarchical-Z map based occlusion culling

    6. How LocalBasis builds the TBN basis

    Generally speaking, the tangent-space basis (normal, tangent, and bitangent vectors) is constructed with cross products. The method is simple: first pick a helper vector that is not parallel to the normal and take the cross product of the two to get the tangent; then take the cross product of the normal and the tangent to get the bitangent. The code looks like this:

    void CalculateTBN(const vec3 &normal, vec3 &tangent, vec3 &bitangent) {
      // Pick a helper axis that is not parallel to the normal
      vec3 helperVec;
      if (abs(normal.x) < abs(normal.y))
        helperVec = vec3(1.0, 0.0, 0.0);
      else
        helperVec = vec3(0.0, 1.0, 0.0);
      tangent = normalize(cross(helperVec, normal));
      bitangent = normalize(cross(normal, tangent));
    }

    But the code in the assignment framework avoids the cross product, which is very clever. Simply put, it guarantees that the pairwise dot products of the vectors are all 0.

    • $b_1 \cdot n = 0$
    • $b_2 \cdot n = 0$
    • $b_1 \cdot b_2 = 0$

    void LocalBasis(vec3 n, out vec3 b1, out vec3 b2) {
      float sign_ = sign(n.z);
      // sign() returns 0 for n.z == 0, which would make the division below blow up
      if (n.z == 0.0) {
        sign_ = 1.0;
      }
      float a = -1.0 / (sign_ + n.z);
      float b = n.x * n.y * a;
      b1 = vec3(1.0 + sign_ * n.x * n.x * a, sign_ * b, -sign_ * n.x);
      b2 = vec3(b, sign_ + n.y * n.y * a, -n.y);
    }

    This algorithm is heuristic and introduces a sign function, which is quite impressive. It also handles the division-by-zero case, which is thoughtful. However, the last four lines look like something the author arrived at by rearranging formulas one day. Here I try to reconstruct those steps by working backwards.

    img

    By the way, the sign function in the code can be multiplied in the last step.

    In fact, I can create a hundred such formulas, and I don’t know the difference between them. If you know, please tell me QAQ. If you insist, then it can be explained like this:

    Traditional cross-product-based methods can be numerically unstable when the helper vector is nearly parallel to the normal, because the cross product is then close to the zero vector. The method used here is a heuristic that constructs an orthogonal basis through a series of carefully designed steps. It pays special attention to numerical stability, so it stays effective and stable for normals close to extreme directions.
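
    The orthogonality conditions are easy to check numerically. Below is a JavaScript port of the framework's LocalBasis, written by me for testing, together with a hypothetical toWorld helper that mirrors mat3(b1, b2, normal) * direction from the shader. Normals are assumed to be unit length.

```javascript
function dot(a, b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Port of LocalBasis: builds two tangent vectors b1, b2 for a unit normal n.
function localBasis(n) {
  const [x, y, z] = n;
  const s = z >= 0 ? 1 : -1; // sign of n.z, treating 0 as +1
  const a = -1 / (s + z);
  const b = x * y * a;
  const b1 = [1 + s * x * x * a, s * b, -s * x];
  const b2 = [b, s + y * y * a, -y];
  return { b1, b2 };
}

// Local direction (x along b1, y along b2, z along n) -> world space.
function toWorld(dir, n) {
  const { b1, b2 } = localBasis(n);
  return [0, 1, 2].map((i) => dir[0] * b1[i] + dir[1] * b2[i] + dir[2] * n[i]);
}

const len = Math.hypot(1, 2, 3);
const normals = [[0, 0, 1], [0, 0, -1], [1, 0, 0], [1 / len, 2 / len, 3 / len]];
for (const n of normals) {
  const { b1, b2 } = localBasis(n);
  // All three dot products should be (numerically) zero
  console.log(dot(b1, n), dot(b2, n), dot(b1, b2));
}
```

    Note that the case n = (0, 0, -1) produces a valid basis here, and the local direction (0, 0, 1) maps back to the normal itself.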

    Thanks to @I am a dragon set little fruit for pointing out that the method above is in fact carefully designed. The algorithm provided in the assignment framework was obtained by Tom Duff et al. in 2017 by improving Frisvad's method. For details, please refer to the following two papers.

    https://graphics.pixar.com/library/OrthonormalB/paper.pdf

    https://backend.orbit.dtu.dk/ws/portalfiles/portal/126824972/onb_frisvad_jgt2012_v2.pdf

    References

    1. Games 202
    2. LearnOpenGL – Normal Mapping