Optimising the Model Fragment Shader

Or: A Journey of Half a Float

It is clear that the shader described earlier is expensive, with many calculations occurring in the fragment stage of the pipeline. For application performance, it is very important either to move some of these calculations out of the fragment shader to earlier parts of the pipeline, or to make the shader itself as efficient as possible.

This section will explain how this shader was optimised during development of the PBR demo. These optimisations range from steps performed in the asset pipeline to low-level optimisations in the GLSL code of the fragment shader.

Coarse Optimisations

Many algorithmic and logic-level optimisations went into these shaders. Most were implemented by the original authors of this incarnation of PBR and by the PowerVR Developer Technology team when putting them together. Various optimisations have been applied based on the PowerVR Performance Recommendations, such as not doing unnecessary work and moving calculations as early as possible in the pipeline (offline → CPU → vertex shader → fragment shader). However, the fact remains that a lot of ALU calculations still happen in the model's fragment shader, making it very expensive.

Among the most important coarse optimisations that make this technique possible are:

  • Integrating the BRDF offline in a lookup table texture

    This is an obvious optimisation as this calculation could not be done in the fragment shader while maintaining acceptable performance. The whole equation can be very nicely approximated in a 2D texture. If a function with more parameters needs to be integrated, such as in a more complex lighting scenario, then 3D textures could be used.

  • Integrating the environment lighting into the irradiance and prefiltered maps

    As with the BRDF, it takes a few seconds to calculate each of those maps. Pre-calculating the environment should not even be considered an optimisation, but just another step that makes it possible to have a real-time approximation of global illumination.

  • Choosing a cheap tonemapping operator

    The section Avoiding Overflow During Tonemapping goes into more detail about this, but essentially tonemapping is the process of mapping HDR colour values to the normal sRGB range (LDR) so that they can be displayed on screen. The algorithms which perform this conversion are called tonemapping operators. The choice of operator often depends on visual and aesthetic considerations. In this demo, a simple operator was chosen to reduce the performance impact of tonemapping. Not requiring gamma correction is an additional bonus.
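
To illustrate the first point, the following is a minimal Python sketch of how a BRDF lookup table might be integrated offline. It assumes the common split-sum formulation, in which the table is indexed by N·V and roughness and stores a scale and a bias that are applied to the specular colour at normal incidence. The text above does not state the exact BRDF, parameterisation, or sample count used by the demo, so the GGX importance sampling, the Smith geometry term, the k = alpha / 2 remapping, and all function names here are illustrative assumptions rather than the demo's actual asset pipeline code.

```python
import math


def radical_inverse_base2(i):
    """Van der Corput sequence: mirror the bits of i around the binary point."""
    result, digit = 0.0, 0.5
    while i:
        if i & 1:
            result += digit
        i >>= 1
        digit *= 0.5
    return result


def smith_g1(n_dot_x, k):
    """Schlick-GGX masking term for a single direction (assumed form)."""
    return n_dot_x / (n_dot_x * (1.0 - k) + k)


def integrate_brdf(n_dot_v, roughness, sample_count=256):
    """Return (scale, bias) for one texel of the lookup table."""
    # Work in tangent space, where the surface normal is +Z.
    v = (math.sqrt(max(0.0, 1.0 - n_dot_v * n_dot_v)), 0.0, n_dot_v)
    alpha = roughness * roughness
    k = alpha / 2.0  # assumed remapping for image-based lighting
    scale = bias = 0.0
    for i in range(sample_count):
        # Hammersley point (u1, u2) in the unit square.
        u1 = i / sample_count
        u2 = radical_inverse_base2(i)
        # Importance-sample a GGX half vector h around the normal.
        phi = 2.0 * math.pi * u1
        cos_t = math.sqrt((1.0 - u2) / (1.0 + (alpha * alpha - 1.0) * u2))
        sin_t = math.sqrt(max(0.0, 1.0 - cos_t * cos_t))
        h = (sin_t * math.cos(phi), sin_t * math.sin(phi), cos_t)
        v_dot_h = v[0] * h[0] + v[1] * h[1] + v[2] * h[2]
        # Z component of the light vector l = reflect(-v, h); this is N.L.
        n_dot_l = 2.0 * v_dot_h * h[2] - v[2]
        v_dot_h = min(1.0, max(0.0, v_dot_h))
        if n_dot_l > 0.0 and h[2] > 0.0:
            g = smith_g1(n_dot_v, k) * smith_g1(n_dot_l, k)
            g_vis = g * v_dot_h / (h[2] * n_dot_v)
            fc = (1.0 - v_dot_h) ** 5  # Schlick Fresnel weight
            scale += (1.0 - fc) * g_vis
            bias += fc * g_vis
    return scale / sample_count, bias / sample_count


def build_brdf_lut(size, sample_count=256):
    """Build a size x size table: columns index N.V, rows index roughness."""
    lut = []
    for y in range(size):
        roughness = (y + 0.5) / size  # sample at texel centres
        row = [integrate_brdf((x + 0.5) / size, roughness, sample_count)
               for x in range(size)]
        lut.append(row)
    return lut
```

Under this formulation, the fragment shader would replace the whole integral with a single texture fetch and combine the two channels as F0 * scale + bias. Even a small table costs size² × sample_count inner-loop iterations to build, which is why this work belongs in the asset pipeline rather than in the shader.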

Texture Optimisations

A lot of textures are sampled in this demo, so each individual texture should be as small as possible to reduce overall bandwidth usage. FP16 textures should only be considered for the BRDF lookup table. It is best to stick to ASTC HDR on platforms that support it, or, if an uncompressed format is necessary, to choose either B10G11R11F or RGB9E5.
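
As a rough illustration of why the format choice matters, the Python sketch below compares the storage cost of a single mip level in several formats. The per-texel sizes follow from the format definitions (64 bits for a four-channel FP16 texel, 32 bits for B10G11R11F and RGB9E5, and one 128-bit block per tile for ASTC). The 1024×1024 size, the ASTC block sizes chosen, and the helper names are hypothetical and are not taken from the demo.

```python
def uncompressed_bytes(width, height, bits_per_texel):
    """Size in bytes of one mip level of an uncompressed texture."""
    return width * height * bits_per_texel // 8


def astc_bytes(width, height, block_w, block_h):
    """Size in bytes of one mip level of an ASTC texture.

    Every ASTC block occupies 128 bits (16 bytes) regardless of its
    footprint, and partial blocks at the edges are rounded up.
    """
    blocks_x = (width + block_w - 1) // block_w
    blocks_y = (height + block_h - 1) // block_h
    return blocks_x * blocks_y * 16


# Hypothetical 1024x1024 texture, base level only.
W = H = 1024
sizes = {
    "RGBA16F (FP16)": uncompressed_bytes(W, H, 64),  # 8 MiB
    "B10G11R11F": uncompressed_bytes(W, H, 32),      # 4 MiB
    "RGB9E5": uncompressed_bytes(W, H, 32),          # 4 MiB
    "ASTC HDR 4x4": astc_bytes(W, H, 4, 4),          # 1 MiB
    "ASTC HDR 8x8": astc_bytes(W, H, 8, 8),          # 0.25 MiB
}

for name, size in sizes.items():
    print(f"{name:16s} {size / (1024 * 1024):6.2f} MiB")
```

Storage size is only a proxy for bandwidth, since caching and access patterns also play a part, but the ordering gives a sense of why the packed and compressed formats are preferred wherever FP16 precision is not needed.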

For more information on the advantages and disadvantages of different texture formats, please refer to Determining the Image Format.

After these more general optimisations, let's focus on what was actually done to optimise the fragment shader code itself.