Physically-Based Rendering and Per-Pixel LOD - Rogue Performance

Physically-based rendering (PBR) is a lighting model, compatible with both forward and deferred rendering, that aims to better represent real-world light behaviour. It is costlier to calculate than traditional diffuse, specular, and ambient lighting, but it appeals to artists because it makes complex material properties easier to specify. PBR art pipelines are rapidly becoming the norm in AAA titles.

Per-pixel texture level of detail (LOD)

PBR pairs each object in a scene with a roughness/gloss map. This texture allows artists to vary roughness and glossiness across an object's surface, rather than applying a single value to the whole object. For example, an artist can add areas of dull rust to a shiny pistol, or describe the properties of a rubber grip, all within a single draw call.

To add reflectivity, environment maps are applied to all objects. The mip chain of each environment map contains progressively blurrier versions of the environment towards the bottom of the chain. The roughness value sampled from the roughness/gloss map is used to calculate which mip level of the environment map to sample.
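As a minimal sketch of that lookup, the fragment shader below maps a sampled roughness value to an environment map LOD. The names RoughnessMap, MaxEnvMip, vUV and vReflectDir, the use of the red channel, and the linear roughness-to-LOD mapping are all assumptions made for illustration; a real PBR pipeline may use a different mapping. Note that this direct form computes the LOD per pixel, which is the case examined in the following section.

```glsl
#version 310 es
// Hypothetical names: RoughnessMap, MaxEnvMip, vUV and vReflectDir are
// illustrative and do not come from the original text.
in mediump vec2 vUV;
in mediump vec3 vReflectDir;

uniform lowp sampler2D RoughnessMap;
uniform lowp samplerCube EnvMap;
uniform mediump float MaxEnvMip; // assumed: index of the blurriest mip level

layout (location = 0) out lowp vec4 oColour;

void main()
{
    // Assumption: roughness is stored in the red channel, in the range [0, 1].
    mediump float roughness = texture(RoughnessMap, vUV).r;

    // Assumed linear mapping: 0 selects the sharpest mip, 1 the blurriest.
    mediump float lod = roughness * MaxEnvMip;

    // Explicit, per-pixel LOD sample of the environment map.
    oColour = textureLod(EnvMap, vReflectDir, lod);
}
```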

Why is this approach a problem for Rogue?

Rogue subdivides a fragment shader USC task into 2x2 blocks of spatially aligned pixels (pixel-quads). A primary reason for this is so that gradients can be calculated across a pixel-quad to determine how texture filtering should be applied. The hardware is also optimised for the standard rendering case, where a single LOD value is calculated for the pixel-quad from those gradients. This allows the graphics core to batch the texture sample operations for the pixel-quad into a single TPU request.

When texture LOD is specified per-pixel (passed in via a varying), the graphics core assumes that each pixel in the quad has a unique LOD. The USC then issues a TPU request for each pixel instead of one for the entire quad (USC instruction: pplod), which reduces TPU throughput to one quarter. This behaviour could lead to a memory bandwidth bottleneck in some applications.
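The two cases can be contrasted in a short sketch. PER_PIXEL_LOD is a hypothetical compile-time switch introduced only for this illustration; the comments restate the behaviour described in the text rather than anything that can be observed from the GLSL itself.

```glsl
#version 310 es
// Illustrative only: PER_PIXEL_LOD is a hypothetical compile-time switch.
in mediump float LOD;
in mediump vec3 TexCoords;

uniform lowp samplerCube EnvMap;
layout (location = 0) out lowp vec4 oColour;

void main()
{
#ifdef PER_PIXEL_LOD
    // Explicit LOD from a varying: each pixel is assumed to have its own
    // LOD, so (per the text) one TPU request is issued per pixel.
    oColour = textureLod(EnvMap, TexCoords, LOD);
#else
    // Implicit LOD: derived from gradients across the pixel-quad, so
    // (per the text) the quad can be served by a single TPU request.
    oColour = texture(EnvMap, TexCoords);
#endif
}
```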

An example fragment shader is given further down. It shows how an application can get around this behaviour by branching and performing the blend between adjacent mip levels in software. By branching to a textureLod operation with a constant value as the LOD parameter, the compiler no longer has to assume that each pixel has a different LOD. Therefore, the compiler will not automatically issue a separate fetch for each pixel in the pixel-quad.
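The software blend can be written out as a short derivation. This is a sketch inferred from the example shader in "The workaround"; the symbols (the LOD value, its integer and fractional parts, and the sampled mip levels) are notation introduced here, and the LOD is assumed to lie in the range covered by that shader's eight mip levels.

```latex
% \ell : per-pixel LOD, assumed to satisfy 0 \le \ell < 7
% M_k  : environment map sampled at constant mip level k
\[
  n = \lfloor \ell \rfloor, \qquad f = \ell - n, \qquad
  C(\ell) = (1 - f)\, M_n + f\, M_{n+1}
\]
% The shader keeps the even level in mip0 and the odd level in mip1 and
% evaluates mix(mip0, mip1, w) = (1 - w)\,\mathrm{mip0} + w\,\mathrm{mip1}, so
\[
  w =
  \begin{cases}
    f,     & n \text{ even} \\
    1 - f, & n \text{ odd}
  \end{cases}
\]
% Worked example, \ell = 3.25: n = 3 (odd), f = 0.25,
% mip0 = M_4, mip1 = M_3, w = 0.75
\[
  C(3.25) = 0.25\, M_4 + 0.75\, M_3
\]
```

Each branch samples at a constant LOD, so the only per-pixel quantity left is the blend weight, which is computed on the USC.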

Note

The workaround described below increases the number of USC instructions significantly, so it is important to profile the application before implementing it. If the application is bandwidth or USC-limited, this workaround may negatively impact performance. Decreasing memory bandwidth in an application that is USC-limited would yield no performance benefits.

The workaround

The GLSL workaround shown below avoids the one-quarter speed path. However, it introduces dynamic branching and additional instructions.

#version 310 es
in mediump float LOD;
in mediump vec3 TexCoords;

uniform lowp samplerCube EnvMap;
layout (location = 0) out lowp vec4 oColour;

mediump vec4 envSample(lowp samplerCube envMap_, mediump vec3 texCoords_, mediump float LOD_)
{
    mediump vec4 mip0;
    mediump vec4 mip1;

    if(LOD_ <= 4.0)
    {
        if(LOD_ <= 2.0)
            mip1 = textureLod(envMap_, texCoords_, 1.0);
        else // LOD_ > 2.0
            mip1 = textureLod(envMap_, texCoords_, 3.0);
    }
    else // LOD_ > 4.0
    {
        if(LOD_ <= 6.0)
            mip1 = textureLod(envMap_, texCoords_, 5.0);
        else // LOD_ > 6.0
            mip1 = textureLod(envMap_, texCoords_, 7.0);
    }

    if(LOD_ <= 3.0)
    {
        if(LOD_ <= 1.0)
            mip0 = textureLod(envMap_, texCoords_, 0.0);
        else // LOD_ > 1.0
            mip0 = textureLod(envMap_, texCoords_, 2.0);
    }
    else // LOD_ > 3.0
    {
        if(LOD_ <= 5.0)
            mip0 = textureLod(envMap_, texCoords_, 4.0);
        else // LOD_ > 5.0
            mip0 = textureLod(envMap_, texCoords_, 6.0);
    }

    bool isEven = ((int(LOD_) & 1) == 0);
    mediump float fractVal = fract(LOD_);
    mediump float invFractVal = 1.0 - fractVal;
    mediump float mixVal = isEven ? fractVal : invFractVal;
    return mix(mip0, mip1, mixVal);
}

void main()
{
    oColour = envSample(EnvMap, TexCoords, LOD);