Flow Control

Flow control in shaders must be used carefully to avoid performance penalties.

PowerVR hardware offers full support for flow control in both vertex and fragment shaders without the need to explicitly enable an extension.

  • Static flow control refers to when conditional execution depends on the value of a uniform variable. The same shader execution path is applied to all vertex or fragment instances in a draw call.
  • Dynamic flow control refers to conditional execution based on per-fragment or per-vertex data, such as textures or vertex attributes.
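
As an illustration of the two cases above, the following fragment shader is a minimal sketch that contains one branch of each kind. The names uUseTint, uTint, uAlbedo and vTexCoord are hypothetical, and GLSL ES 3.00 syntax is assumed.

```glsl
#version 300 es
precision mediump float;

// Hypothetical inputs, chosen only for this illustration.
uniform bool uUseTint;        // same value for every fragment in the draw call
uniform vec4 uTint;
uniform sampler2D uAlbedo;
in vec2 vTexCoord;
out vec4 fragColor;

void main()
{
    vec4 color = texture(uAlbedo, vTexCoord);

    // Static flow control: the condition is a uniform, so every
    // fragment in the draw call follows the same path.
    if (uUseTint)
    {
        color *= uTint;
    }

    // Dynamic flow control: the condition depends on per-fragment
    // data (a texture sample), so neighbouring fragments may diverge.
    if (color.a < 0.5)
    {
        color.rgb *= 0.5;
    }

    fragColor = color;
}
```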

Static flow control can be used to combine many shaders into one big uber-shader. Thorough profiling should be done when taking this approach, as a performance advantage may not be gained. A better solution when an uber-shader is desired is to use pre-processor defines to create separate shaders from one larger shader at build time. This effectively creates many smaller shaders from a single original source file.
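
The pre-processor approach could look like the sketch below. It assumes that a build step (or the application, before compiling the shader) inserts lines such as "#define USE_FOG 1" directly after the #version line. The define names USE_FOG and USE_VERTEX_COLOR, and all variable names, are hypothetical.

```glsl
#version 300 es
// A build step is assumed to insert variant defines here, for example:
//   #define USE_FOG 1
//   #define USE_VERTEX_COLOR 1
precision mediump float;

uniform sampler2D uAlbedo;
in vec2 vTexCoord;

#ifdef USE_VERTEX_COLOR
in vec4 vColor;
#endif

#ifdef USE_FOG
uniform vec3 uFogColor;
in float vFogFactor;
#endif

out vec4 fragColor;

void main()
{
    vec4 color = texture(uAlbedo, vTexCoord);

#ifdef USE_VERTEX_COLOR
    // Compiled in only for the vertex-colour variant.
    color *= vColor;
#endif

#ifdef USE_FOG
    // Compiled in only for the fog variant.
    color.rgb = mix(color.rgb, uFogColor, clamp(vFogFactor, 0.0, 1.0));
#endif

    fragColor = color;
}
```

Each combination of defines produces a separate, smaller shader that contains no run-time branches for the disabled features.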

Using dynamic branching in a shader has a non-constant overhead that depends on the exact shader code. Dynamic branching is therefore unpredictable in its effect on performance.

The following specific points should be considered:

  • Make use of conditionals to skip unnecessary operations when the condition is met in a significant number of cases.
  • Do not branch to discard (see the Discard section below).
  • Series5 and Series5XT only: Avoid branching to a texture read as samplers in dynamic branches qualify as dependent texture reads, and will harm performance.
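
The first point above can be sketched as follows: a conditional skips the lighting arithmetic for fragments that face away from the light. This is an illustration only. The names uLightDir, uBaseColor, vNormal and vViewDir are hypothetical, the constants are arbitrary, and whether the branch is a net gain should be confirmed by profiling.

```glsl
#version 300 es
precision mediump float;

// Hypothetical inputs for this illustration.
uniform vec3 uLightDir;    // assumed normalised, pointing from surface to light
uniform vec3 uBaseColor;
in vec3 vNormal;
in vec3 vViewDir;
out vec4 fragColor;

void main()
{
    vec3 n = normalize(vNormal);
    float nDotL = dot(n, uLightDir);

    // Constant ambient term, always applied.
    vec3 color = uBaseColor * 0.1;

    // Fragments facing away from the light receive no diffuse or
    // specular contribution, so the arithmetic below is skipped.
    if (nDotL > 0.0)
    {
        vec3 v = normalize(vViewDir);
        vec3 h = normalize(uLightDir + v);
        float spec = pow(max(dot(n, h), 0.0), 32.0);
        color += uBaseColor * nDotL + vec3(spec);
    }

    fragColor = vec4(color, 1.0);
}
```

The branch contains only arithmetic and no texture reads, so it also stays clear of the Series5 and Series5XT restriction noted above.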

Discard

Applications should avoid using the discard operation in the fragment shader, as it will not improve performance. When discard is used, some of the benefits of our TBDR architecture are lost, so prefer alpha blending over discard where possible.

Note: This is a general problem across many tile based platforms and applies to many mobile/embedded graphics cores, not just PowerVR devices.
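
As a rough sketch of this recommendation, the fragment shader below writes the texture alpha to the output and leaves the hiding of transparent texels to blending, with the discard-based form shown in a comment for comparison. The names uSprite and vTexCoord are hypothetical. The application is assumed to enable blending itself, for example with glEnable(GL_BLEND) and glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA), and to handle the draw ordering that blended geometry normally needs.

```glsl
#version 300 es
precision mediump float;

// Hypothetical inputs for this illustration.
uniform sampler2D uSprite;
in vec2 vTexCoord;
out vec4 fragColor;

void main()
{
    vec4 color = texture(uSprite, vTexCoord);

    // Discard-based version (not recommended):
    //   if (color.a < 0.5) discard;

    // Blending-based version: keep the alpha value and let the
    // blend stage make transparent texels invisible.
    fragColor = color;
}
```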

Shader group vote – OpenGL ES

OpenGL ES 3.0 provides a new extension, GL_EXT_shader_group_vote. This extension is designed to allow divergent code, such as branching, in shader programs to be optimised. Consider how the graphics core (a SIMD processor) executes shaders: shader invocations are commonly grouped together into a set in which all invocations must take the same code path. In compute this is known as a local work group.

In the code snippet below, if even a single shader in the local work group diverges from all other active shaders in the group with a true condition, then all other threads in the local work group must also step through the do_fast_path() code path. This will usually leave most threads in the local work group dormant. Once do_fast_path() returns, all active shaders in the local work group must then also execute the do_general_path() code path, meaning the local work group executes both code paths.

if (condition)
    result = do_fast_path(); 
else 
    result = do_general_path();

When the same example is written with the allInvocationsEXT function (see below), the function returns the same value for all invocations of the shader in the local work group. The group therefore executes either do_fast_path() or do_general_path(), but not both. It achieves this by computing the Boolean value across the local work group. The implementation uses this result to decide which path to take for all active threads in the local work group.

if (allInvocationsEXT(condition))
    result = do_fast_path(); 
else 
    result = do_general_path();

The GL_EXT_shader_group_vote extension exposes three new built-in shader functions:

  • bool anyInvocationEXT(bool value) - returns true if value is true for at least one active invocation in the local work group.
  • bool allInvocationsEXT(bool value) - returns true if value is true for all active invocations in the local work group.
  • bool allInvocationsEqualEXT(bool value) - returns true if value is the same for all active invocations in the group.
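
For context, a complete fragment shader using one of these functions might look like the sketch below. It assumes GLSL ES 3.10 and a fog effect invented purely for illustration. The names uAlbedo, uFogColor, vTexCoord and vFogFactor are hypothetical, and the bodies of do_fast_path() and do_general_path() are placeholders. Because the fast path is taken only when the condition holds for every active invocation, the general path must still produce the correct result for invocations where the condition is true.

```glsl
#version 310 es
#extension GL_EXT_shader_group_vote : enable
precision mediump float;

// Hypothetical inputs for this illustration.
uniform sampler2D uAlbedo;
uniform vec3 uFogColor;
in vec2 vTexCoord;
in float vFogFactor;
out vec4 fragColor;

// Fast path: no fog, return the colour unchanged.
vec4 do_fast_path(vec4 color)
{
    return color;
}

// General path: correct for any fog factor, including zero.
vec4 do_general_path(vec4 color)
{
    float f = clamp(vFogFactor, 0.0, 1.0);
    return vec4(mix(color.rgb, uFogColor, f), color.a);
}

void main()
{
    vec4 color = texture(uAlbedo, vTexCoord);
    bool condition = (vFogFactor <= 0.0);

#ifdef GL_EXT_shader_group_vote
    // True only if the condition holds for all active invocations
    // in the group, so the whole group takes a single path.
    bool takeFastPath = allInvocationsEXT(condition);
#else
    // Fallback when the extension is unavailable: a plain
    // per-fragment branch, which may diverge.
    bool takeFastPath = condition;
#endif

    if (takeFastPath)
        fragColor = do_fast_path(color);
    else
        fragColor = do_general_path(color);
}
```

The #ifdef guard relies on the usual convention that an implementation supporting the extension defines a pre-processor macro with the extension's name.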

Further details on this extension can be found on the Khronos® extensions page.