Demystifying Precision

PowerVR hardware supports the multiple precision features of graphics APIs such as OpenGL ES and Vulkan. The API specifications for OpenGL ES 2.0 onwards and Vulkan define three precision modifiers: lowp, mediump, and highp. Lower precision calculations can be performed faster, but must be used carefully to avoid introducing visible artefacts. The best way to arrive at the right precision for a given value is to begin with lowp or mediump for everything (except samplers), then increase the precision of specific variables until the visual output is as desired.
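As a minimal sketch of this workflow, assuming a GLSL ES 1.00 fragment shader on a target that supports highp in fragment shaders, the shader below sets a mediump default and raises a single value to highp. The names uTexture, vTexCoord, and vLightDistance are hypothetical and only serve the illustration.

```glsl
// Hypothetical fragment shader; all identifiers are illustrative assumptions.
precision mediump float;              // default for every float in this shader

uniform sampler2D uTexture;           // samplers keep their own default (lowp)
varying vec2 vTexCoord;               // uses the mediump default

// Raised to highp only because, in this imagined case, mediump produced
// visible artefacts. Everything else stays at the lower default.
varying highp float vLightDistance;

void main()
{
    mediump vec4 colour = texture2D(uTexture, vTexCoord);

    // The expression is evaluated at highp because one operand is highp,
    // then the result is stored at mediump.
    mediump float attenuation = 1.0 / (1.0 + vLightDistance * vLightDistance);

    gl_FragColor = colour * attenuation;
}
```

The point of the sketch is the direction of travel: one default for the whole shader, with individual variables promoted only when the output shows a problem.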

`highp`

Float variables with the highp precision modifier will be represented as 32 bit floating point (FP32) values, where integer values range from -2^16 to 2^16.

This precision should be used for all position calculations, including world, view, and projection matrices, as well as any bone matrices used for skinning where the precision, or range, of mediump is not sufficient. It should also be used for any scalar calculations that use complex built-in functions such as sin, cos, pow, and log.
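The recommendation above could look like the following vertex shader. This is a sketch only, assuming GLSL ES 1.00; the attribute, uniform, and varying names (aPosition, aTexCoord, uModelViewProjection, uTime, vTexCoord) are hypothetical.

```glsl
// Hypothetical vertex shader; all identifiers are illustrative assumptions.
attribute highp vec3 aPosition;
attribute mediump vec2 aTexCoord;

uniform highp mat4 uModelViewProjection;
uniform highp float uTime;

varying mediump vec2 vTexCoord;

void main()
{
    // Scalar calculation using a complex built-in function, kept at highp.
    highp float wave = sin(uTime + aPosition.x);

    // Position calculation, kept at highp from attribute to gl_Position.
    highp vec3 displaced = aPosition + vec3(0.0, 0.1 * wave, 0.0);
    gl_Position = uModelViewProjection * vec4(displaced, 1.0);

    // Texture coordinates do not need highp in this example.
    vTexCoord = aTexCoord;
}
```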

`mediump`

Variables declared with the mediump modifier are represented as 16 bit floating point (FP16) values covering the range (-65504.0, 65504.0). The integer values cover the range -2^10 to 2^10.

Applications should consider FP16 wherever FP32 would normally be used, as it typically offers a performance improvement over FP32. This holds as long as the precision is sufficient and values stay within the FP16 range; if they overflow, visual artefacts may be introduced.
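To make the overflow risk concrete, the following sketch shows a squared-distance calculation that can leave the FP16 range, together with one possible way to keep it in range. It assumes a GLSL ES 1.00 fragment shader; vToLight and the scale factor of 256 are hypothetical choices for illustration. Note that a compiler is free to evaluate mediump at a higher precision, so the overflow may not appear on every device.

```glsl
// Hypothetical fragment shader; all identifiers are illustrative assumptions.
precision mediump float;

varying mediump vec3 vToLight;   // fragment-to-light vector in world units

void main()
{
    // Risky: a vector of length 256 has a squared length of 65536, which is
    // above the FP16 maximum of 65504.
    // mediump float distSq = dot(vToLight, vToLight);

    // One option: scale the vector down before squaring so the result
    // stays within the FP16 range.
    mediump vec3 scaled = vToLight * (1.0 / 256.0);
    mediump float scaledDistSq = dot(scaled, scaled);   // in units of 256 * 256

    // Simple falloff based on the scaled squared distance.
    mediump float falloff = 1.0 / (1.0 + scaledDistSq);
    gl_FragColor = vec4(vec3(falloff), 1.0);
}
```

Scaling trades range for precision on small values. If that trade is not acceptable for a given calculation, declaring that one calculation as highp is the alternative.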

The improvement in performance experienced in FP16 is due to the dedicated FP16 Sum of Products (SOP) arithmetic pipeline, which can perform two SOP operations in parallel per cycle, theoretically doubling the throughput of floating point operations. The FP16 SOP pipeline is available on most PowerVR Rogue graphics cores, depending on the exact variant.

Some Rogue cores, such as Series6XT, also provide an FP16 SOP/MAD (Sum of Products, Multiply-Add) arithmetic pipeline. This can perform four SOP/MAD operations in parallel per cycle, again significantly improving performance compared to high precision.

Verify the improvements of using medium precision by using PVRShaderEditor and selecting the appropriate compiler for the target device.

`lowp`

SGX

A variable declared with the lowp modifier will use a 10 bit fixed point format on SGX, allowing values in the range (-2, 2) to be represented to a precision of 1/256. The integer values are in the range (-2^8, 2^8). This precision is useful for representing colours and any data read from low precision textures, such as normals from a normal map. Care must be taken not to overflow the maximum or minimum value of lowp precision, especially with intermediate results.
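The warning about intermediate results can be illustrated with a three-texture average. This is a sketch under stated assumptions: a GLSL ES 1.00 fragment shader, with hypothetical names uTexA, uTexB, uTexC, and vTexCoord. Each colour channel is at most 1.0, but the sum of three can reach 3.0, which is outside the lowp range of (-2, 2).

```glsl
// Hypothetical fragment shader; all identifiers are illustrative assumptions.
precision mediump float;

uniform sampler2D uTexA;
uniform sampler2D uTexB;
uniform sampler2D uTexC;
varying mediump vec2 vTexCoord;

void main()
{
    lowp vec3 a = texture2D(uTexA, vTexCoord).rgb;
    lowp vec3 b = texture2D(uTexB, vTexCoord).rgb;
    lowp vec3 c = texture2D(uTexC, vTexCoord).rgb;

    // Risky on SGX: the intermediate sum a + b + c can reach 3.0 before the
    // division brings it back into range.
    // lowp vec3 average = (a + b + c) / 3.0;

    // Safer: scale each term first so every intermediate stays within (-2, 2).
    lowp vec3 average = a * (1.0 / 3.0) + b * (1.0 / 3.0) + c * (1.0 / 3.0);

    gl_FragColor = vec4(average, 1.0);
}
```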

Rogue

On PowerVR Rogue devices lowp is represented as a 16 bit floating point value, meaning lowp and mediump have identical representations as far as the hardware is concerned.

Swizzling

Swizzling is the act of accessing or reordering the components of a vector out of order. The following examples show which accesses count as swizzling:

a = var.brg;                     // Swizzled - Out of order access
b = vec3(var.g, var.b, var.r);   // Swizzled - Out of order access
c = vec3(vec4);                  // Not swizzled - Dropping a component does not change
                                 // access order
d.gr = a.gr + b.gr;              // Not swizzled - This will be optimized to a
                                 // non-swizzled form

Swizzling costs performance on Series5 (lowp only) and Series5XT (all precisions) due to the additional work required to reorder vector components. As PowerVR Rogue is scalar based, swizzling is a significantly cheaper operation.

Attributes

The per-vertex attributes passed to a vertex shader should use a precision appropriate to the data-type being passed in. For example, highp would not be required for a float whose maximum value never goes above 2 and for which a precision of 1/256 would be acceptable.

Varyings

Varyings represent the outputs from the vertex shader which are interpolated across a triangle and then fed into the fragment shader. Varyings are significantly cheaper than performing per-fragment operations to calculate data that could have been passed in from a vertex shader via a varying.

Keep the following considerations in mind when using varyings:

Each varying requires additional space in the parameter buffer, and additional processing time to perform interpolation;
Varying outputs are stored in registers. Having too many may introduce register pressure and potentially reduce shader occupancy. This will reduce the maximum number of concurrent shader executions per Unified Shader Core (USC).

Packing varyings

Packing multiple varyings together, for example packing two vec2 values into a single vec4, should incur no performance penalty and will save varyings. On PowerVR Series5 and Series5XT only, coordinate varyings packed into the .zw channels of a vec4 are always treated as a dependent texture read, so this packing should be avoided (see Dependent texture read in Texture Sampling).
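A packing that respects the Series5 and Series5XT restriction might look like the vertex shader below. It is a sketch assuming GLSL ES 1.00; the names aPosition, aTexCoord, aFogAndFade, uModelViewProjection, and vPacked are hypothetical, and aFogAndFade stands in for any pair of per-vertex scalars that are not used as texture coordinates.

```glsl
// Hypothetical vertex shader; all identifiers are illustrative assumptions.
attribute highp vec3 aPosition;
attribute mediump vec2 aTexCoord;
attribute mediump vec2 aFogAndFade;   // two scalars that are not coordinates

uniform highp mat4 uModelViewProjection;

// One vec4 varying instead of two vec2 varyings.
// .xy carries the texture coordinate; .zw carries non-coordinate data, so
// no coordinate is read from the .zw channels.
varying mediump vec4 vPacked;

void main()
{
    gl_Position = uModelViewProjection * vec4(aPosition, 1.0);
    vPacked = vec4(aTexCoord, aFogAndFade);
}
```

The matching fragment shader would sample with vPacked.xy and read the two scalars from vPacked.z and vPacked.w. The text above singles out only the .zw case, so the cost of this layout on a specific core is best confirmed with profiling.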

Samplers

Samplers are used to sample from a texture bound to a certain texture unit. The default precision for sampler variables is lowp, and usually this is good enough.

Two main exceptions exist to the lowp rule:

If the sampler will be used to read from either a depth or float texture, then it should be declared with highp;
If the sampler will be used to read from a half float texture, then it should be declared as mediump.
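The default and its two exceptions can be written as sampler declarations like the following. This is a sketch assuming GLSL ES 1.00; the sampler names are hypothetical, and on OpenGL ES 2.0 the depth and float texture formats themselves depend on extensions being available on the target.

```glsl
// Hypothetical fragment shader; all identifiers are illustrative assumptions.
precision mediump float;

uniform sampler2D uAlbedo;              // default sampler precision (lowp)
uniform highp sampler2D uShadowDepth;   // depth or float texture
uniform mediump sampler2D uHdrHalf;     // half float texture

varying mediump vec2 vTexCoord;

void main()
{
    lowp vec4 albedo = texture2D(uAlbedo, vTexCoord);
    highp float depth = texture2D(uShadowDepth, vTexCoord).r;
    mediump vec3 hdr = texture2D(uHdrHalf, vTexCoord).rgb;

    // Arbitrary use of the three samples, only so the shader is complete.
    lowp float lit = depth > 0.5 ? 1.0 : 0.0;
    gl_FragColor = vec4(albedo.rgb * hdr * lit, albedo.a);
}
```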

Uniforms

Uniform variables represent values that are constant for all vertices or fragments processed as part of a draw call. They should be used to pass data that can be computed once on the CPU, and then not changed for the duration of a draw call. Unlike attributes and varyings, uniform variables may be declared as arrays. Using uniforms is significantly cheaper than using varyings. However, keep the following considerations in mind when using uniforms:

A certain number of uniforms (uniform storage varies between graphics cores) can be stored in registers on-chip. Large uniform arrays will be stored in system memory and accessing them comes at a bandwidth and execution time cost.
Redundant uniform updates in between draw calls should be avoided.

Constant calculations

The PowerVR shader compiler can extract calculations based on constant values (for example, uniforms) from the shader and perform these calculations once per draw call.
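As an illustration of the kind of expression this applies to, the sketch below groups the uniform-only terms of a lighting calculation so that they form one subexpression with the same value for every fragment in the draw call. It assumes GLSL ES 1.00, and all names are hypothetical. Whether the compiler actually extracts a particular expression is not guaranteed by this example.

```glsl
// Hypothetical fragment shader; all identifiers are illustrative assumptions.
precision mediump float;

uniform mediump vec3 uLightColour;
uniform mediump float uIntensity;
uniform mediump float uExposure;
uniform sampler2D uTexture;

varying mediump vec2 vTexCoord;

void main()
{
    // Depends only on uniforms, so the value is identical for every
    // fragment in the draw call.
    mediump vec3 light = uLightColour * (uIntensity * uExposure);

    // Depends on per-fragment data, so it must run for every fragment.
    lowp vec4 texel = texture2D(uTexture, vTexCoord);

    gl_FragColor = vec4(texel.rgb * light, texel.a);
}
```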

Conversion costs

When performing arithmetic operations on multiple precisions within the same calculation, it is likely that values will have to be packed or unpacked. Packing is the act of taking a higher precision value and placing it into a lower precision variable, while unpacking is the reverse: taking a lower precision value and placing it into a higher precision variable.

Where possible, precisions should be kept the same for an entire calculation as each pack and unpack has a cost associated with it. This cost can be further reduced by writing shaders in such a way that:

All higher precision calculations are performed together, at the top of the shader.
All lower precision calculations are performed at the bottom.

This ensures that variables are not repeatedly packed and unpacked. It also ensures that variables are not all unpacked into highp, which would lose any benefit of using lower precision.
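The ordering described above can be sketched as a fragment shader with the highp work grouped at the top, a single conversion down, and the lower precision work at the bottom. It assumes GLSL ES 1.00 and a target with highp support in fragment shaders; all identifiers are hypothetical.

```glsl
// Hypothetical fragment shader; all identifiers are illustrative assumptions.
precision mediump float;

uniform sampler2D uTexture;
uniform highp vec3 uLightPos;

varying highp vec3 vWorldPos;
varying mediump vec2 vTexCoord;

void main()
{
    // 1. All highp calculations together, at the top of the shader.
    highp vec3 toLight = uLightPos - vWorldPos;
    highp float distSq = dot(toLight, toLight);
    highp float attenuationHp = 1.0 / (1.0 + distSq);

    // 2. A single pack from highp down to lowp. The value lies in [0, 1],
    //    so it fits the lowp range.
    lowp float attenuation = attenuationHp;

    // 3. All lower precision calculations at the bottom.
    lowp vec3 colour = texture2D(uTexture, vTexCoord).rgb;
    gl_FragColor = vec4(colour * attenuation, 1.0);
}
```

Interleaving steps 1 and 3 instead would force the attenuation value to be converted each time it meets a value of a different precision.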

Using fixed point values in an arithmetic operation will result in the graphics core performing a type conversion. This should be avoided as additional cycles will be introduced to the shader.