Flow Control

PowerVR hardware offers full support for flow control in both vertex and fragment shaders without the need to explicitly enable an extension.

Static flow control is conditional execution that depends on the value of a uniform variable, so the same shader execution path is applied to all vertex or fragment instances in a draw call.

Dynamic flow control refers to conditional execution based on per-fragment or per-vertex data, such as textures or vertex attributes.
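The difference between the two kinds of flow control can be illustrated with a minimal fragment shader sketch. All variable names here (u_useTint, u_tint, v_mask, v_color) are hypothetical and chosen only for illustration.

```glsl
#version 300 es
precision mediump float;

// Hypothetical inputs, not taken from any real application.
uniform bool u_useTint;   // one value for the whole draw call
uniform vec4 u_tint;
in float v_mask;          // interpolated, differs per fragment
in vec4 v_color;
out vec4 fragColor;

void main()
{
    vec4 color = v_color;

    // Static flow control: the condition is a uniform, so every
    // fragment in the draw call takes the same path.
    if (u_useTint)
        color *= u_tint;

    // Dynamic flow control: the condition comes from per-fragment
    // data, so neighbouring fragments may take different paths.
    if (v_mask > 0.5)
        color.rgb = vec3(1.0) - color.rgb;

    fragColor = color;
}
```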

Static flow control can be used to combine many shaders into one big “uber-shader”. Thorough profiling should be done when taking this approach, as a performance advantage may not be gained. A better solution when an uber-shader is desired is to use pre-processor defines to create separate shaders from one larger shader at build time. This effectively creates many smaller shaders from a single original source file.
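As a rough sketch of the pre-processor approach, the single source below could be compiled twice, once with and once without a define. The USE_FOG symbol, the variable names, and the assumption that the build step inserts a line such as "#define USE_FOG 1" directly after the #version directive are all hypothetical.

```glsl
#version 300 es
// Assumption: the build step inserts "#define USE_FOG 1" here for the
// fog variant and inserts nothing for the plain variant.
precision mediump float;

in vec4 v_color;
#ifdef USE_FOG
in float v_fogFactor;      // assumed convention: 1.0 means no fog
uniform vec3 u_fogColor;
#endif
out vec4 fragColor;

void main()
{
    vec4 color = v_color;
#ifdef USE_FOG
    // Only present in the fog variant; the plain variant contains
    // no fog inputs and no fog arithmetic at all.
    color.rgb = mix(u_fogColor, color.rgb, v_fogFactor);
#endif
    fragColor = color;
}
```

Each variant contains only the code it needs, so no uniform has to be tested at run time to choose between features.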

Using dynamic branching in a shader has a non-constant overhead that depends on the exact shader code. Dynamic branching is therefore unpredictable in its effect on performance.

The following specific points should be considered:

  • Make use of conditionals to skip unnecessary operations when the condition is met in a significant number of cases.

  • Do not branch to discard (see Do Not Use Discard).

  • On Series5 and Series5XT: Avoid branching to a texture read, as samplers in dynamic branches qualify as dependent texture reads and will harm performance.
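One possible way to follow the last point is to sample the texture unconditionally, outside the branch, and use the branch only to decide whether the sample is applied. The sketch below assumes an OpenGL ES 2.0 context and uses hypothetical names (u_detail, v_uv, v_blend, v_color). Whether it is faster than the branching version depends on how often the branch is taken, so it should be profiled.

```glsl
precision mediump float;

// Hypothetical inputs for illustration only.
uniform sampler2D u_detail;
varying vec2 v_uv;
varying float v_blend;
varying vec4 v_color;

void main()
{
    // The texture is sampled outside any dynamic branch, using the
    // interpolated coordinate without modification.
    vec4 detail = texture2D(u_detail, v_uv);

    vec4 color = v_color;

    // The branch now only skips arithmetic, not a texture read.
    if (v_blend > 0.0)
        color = mix(color, detail, clamp(v_blend, 0.0, 1.0));

    gl_FragColor = color;
}
```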

Discard

Applications should avoid the discard operation in the fragment shader, as using it will not improve performance. Some of the benefits of our TBDR architecture are lost when discard is used, so prefer alpha blending over discard where possible.

Note

This is a general problem across many tile-based platforms and applies to many mobile/embedded graphics cores, not just PowerVR devices.
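A minimal sketch of replacing a discard-based cutout with alpha blending is shown below. The names u_albedo and v_uv are hypothetical. The sketch assumes the application enables blending for the draw call, for example with glEnable(GL_BLEND) and glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA).

```glsl
#version 300 es
precision mediump float;

// Hypothetical inputs for illustration only.
uniform sampler2D u_albedo;
in vec2 v_uv;
out vec4 fragColor;

void main()
{
    vec4 texel = texture(u_albedo, v_uv);

    // Discard-based version that this sketch replaces:
    //     if (texel.a < 0.5)
    //         discard;

    // Output the alpha value instead and let the blend stage (assumed
    // to be enabled by the application) hide transparent texels.
    fragColor = texel;
}
```

The two versions are not strictly equivalent. Blended fragments still write depth if depth writes are enabled, and blended geometry is usually sensitive to draw order, so the application may need to adjust its render state to match.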

Shader group vote - OpenGL ES

OpenGL ES 3.0 provides a new extension, GL_EXT_shader_group_vote. This extension is designed to allow divergent code, such as branching, in shader programs to be optimised. Consider how the graphics core (a SIMD processor) executes shaders: invocations are commonly grouped together into a set that must all take the same code path. In compute this is known as a local work group.

In the code snippet below, if even a single invocation in the local work group evaluates the condition as true while the others evaluate it as false, the whole local work group must step through the do_fast_path() code path, usually with most threads dormant. Once do_fast_path() returns, the local work group must then also step through the do_general_path() code path, meaning the local work group executes both code paths.

if (condition)
    result = do_fast_path();
else
    result = do_general_path();

With the same example but using the allInvocationsEXT function (see below), the function returns the same value for all invocations of the shader in the local work group. This means the group executes either do_fast_path() or do_general_path(), but not both. It achieves this by computing the Boolean value across the local work group. The implementation uses this result to decide which path to take for all active threads in the local work group.

if (allInvocationsEXT(condition))
    result = do_fast_path();
else
    result = do_general_path();

The GL_EXT_shader_group_vote extension exposes three new built-in shader functions:

  • bool anyInvocationEXT(bool value) - returns true if value is true for at least one active invocation in the local work group.

  • bool allInvocationsEXT(bool value) - returns true if value is true for all active invocations in the local work group.

  • bool allInvocationsEqualEXT(bool value) - returns true if value is the same for all active invocations in the local work group.
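To show how these functions might be used in a complete shader, the sketch below enables the extension and falls back to an ordinary branch when it is not available. The #version line, the variable names, and the bodies of do_fast_path() and do_general_path() are assumptions made for illustration.

```glsl
#version 310 es
#extension GL_EXT_shader_group_vote : enable
precision mediump float;

// Hypothetical inputs for illustration only.
uniform sampler2D u_detail;
in vec2 v_uv;
in float v_blend;
in vec4 v_color;
out vec4 fragColor;

// Cheap path: valid only when there is nothing to blend.
vec4 do_fast_path()
{
    return v_color;
}

// General path: gives the same result as the fast path when
// v_blend <= 0.0, so it is safe for every invocation.
vec4 do_general_path()
{
    // textureLod avoids implicit derivatives, which are undefined
    // inside non-uniform control flow (the fallback branch below).
    vec4 detail = textureLod(u_detail, v_uv, 0.0);
    return mix(v_color, detail, clamp(v_blend, 0.0, 1.0));
}

void main()
{
    bool condition = (v_blend <= 0.0);
    vec4 result;

#ifdef GL_EXT_shader_group_vote
    // Whole group agrees: take the fast path only if all can.
    if (allInvocationsEXT(condition))
#else
    // Extension unavailable: ordinary, possibly divergent branch.
    if (condition)
#endif
        result = do_fast_path();
    else
        result = do_general_path();

    fragColor = result;
}
```

Note that when the vote fails, invocations whose own condition is true also run the general path. This pattern is therefore only correct when the general path produces a valid result for every invocation, with the fast path acting purely as a shortcut.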

More details about this extension can be found on its Khronos extensions page.