# Demystifying Precision

The choice of shader variable precision has to balance calculation speed and visual quality.

PowerVR hardware is designed to support the multiple precision levels offered by graphics
APIs such as OpenGL ES and Vulkan. The API specifications for OpenGL ES 2.0 onwards and
Vulkan include three precision modifiers: `lowp`, `mediump`, and `highp`. Lower precision
calculations can be performed faster, but must be used carefully to avoid introducing
visible artefacts. The best way to arrive at the right precision for a given value is to
begin with `lowp` or `mediump` for everything (except samplers), then increase the
precision of specific variables until the visual output is as desired.
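As a minimal sketch of that workflow, assuming a GLSL ES 1.00 fragment shader (all variable names here are hypothetical), the default float precision is set low and individual variables are raised only where artefacts appear:

```glsl
// Fragment shader, GLSL ES 1.00 (OpenGL ES 2.0).
// Fragment shaders have no default float precision, so one must be declared.
precision mediump float;

uniform sampler2D uAlbedo;     // samplers keep their lowp default
uniform vec3 uLightDir;        // inherits mediump; assumed to be normalized by the application

varying highp vec2 vTexCoord;  // raised to highp: assume mediump UVs showed visible artefacts
varying vec3 vNormal;          // inherits mediump

void main()
{
    lowp vec4 albedo = texture2D(uAlbedo, vTexCoord);
    float nDotL = max(dot(normalize(vNormal), uLightDir), 0.0);
    gl_FragColor = vec4(albedo.rgb * nDotL, albedo.a);
}
```

Note that `highp` support in fragment shaders is optional in OpenGL ES 2.0; the `GL_FRAGMENT_PRECISION_HIGH` macro indicates whether it is available.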

## Highp

Float variables with the `highp` precision modifier are represented as 32 bit floating
point (`FP32`) values. Integer values range from $-2^{31}$ to $2^{31}-1$.

This precision should be used for all position calculations, including world, view, and
projection matrices, as well as any bone matrices used for skinning where the precision,
or range, of `mediump` is not sufficient. It should also be used for any scalar
calculations that use complex built-in functions such as `sin`, `cos`, `pow`, and `log`.
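The following vertex shader is an illustrative sketch of these recommendations; the attribute and uniform names, and the wave animation itself, are assumptions made for the example. Vertex shaders default to `highp` for floats in GLSL ES, so the qualifiers are written out only to make the intent explicit.

```glsl
// Vertex shader, GLSL ES 1.00.
attribute highp vec3 aPosition;
attribute mediump vec3 aNormal;

uniform highp mat4 uModelViewProjection;  // position transform: keep at highp
uniform highp float uTime;                // hypothetical animation clock, grows without bound

varying mediump vec3 vNormal;

void main()
{
    // sin() applied to a steadily growing argument: evaluate at highp.
    highp float wave = sin(uTime + aPosition.x) * 0.1;

    highp vec4 position = vec4(aPosition.x, aPosition.y + wave, aPosition.z, 1.0);
    gl_Position = uModelViewProjection * position;

    // Unit-length normals fit comfortably in mediump.
    vNormal = aNormal;
}
```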

## Mediump

Variables declared with the `mediump` modifier are represented as 16 bit floating point
(`FP16`) values covering the range [-65504.0, 65504.0]. Integer values cover the range
[$-2^{15}$, $2^{15}-1$].

Applications should consider `FP16` wherever `FP32` would normally be used, as it
typically offers a performance improvement. This holds only as long as the precision is
sufficient and values stay within the representable range; otherwise, overflow or loss of
precision may introduce visual artefacts.

Using medium precision (`FP16`) in shaders can result in a significant improvement in
performance over high precision (`FP32`). This is due to the dedicated FP16 Sum of
Products (SOP) arithmetic pipeline, which can perform two SOP operations in parallel per
cycle, theoretically doubling the throughput of floating point operations. The FP16 SOP
pipeline is available on most PowerVR Rogue graphics cores, depending on the exact
variant.

Some Rogue cores, such as Series6XT, also provide an FP16 SOP/MAD (Sum of Products, Multiply-Add) arithmetic pipeline. This can perform four SOP/MAD operations in parallel per cycle, again significantly improving performance compared to high precision.

Verify the improvements of using medium precision by opening the shader in PVRShaderEditor and selecting the appropriate compiler for the target device.
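As a hedged illustration of the overflow concern, the fragment shader below keeps a squared distance at `highp` and runs the rest of the lighting at `mediump`. The uniform names and the attenuation formula are assumptions made for this sketch. Because the largest `FP16` value is 65504, a squared distance overflows once the distance exceeds roughly 256 units.

```glsl
// Fragment shader, GLSL ES 1.00.
precision mediump float;

uniform highp vec3 uLightPos;   // hypothetical world-space light position
uniform vec3 uLightColour;      // mediump: colour values are small

varying highp vec3 vWorldPos;   // world positions can be large: keep at highp
varying vec3 vNormal;           // mediump: unit-length direction

void main()
{
    highp vec3 toLight = uLightPos - vWorldPos;

    // dot(toLight, toLight) passes 65504 at a distance of about 256 units,
    // so this intermediate is kept at highp.
    highp float distSq = dot(toLight, toLight);

    // Everything below is in a small range and is safe at mediump.
    vec3 lightDir = toLight * inversesqrt(distSq);
    float attenuation = 1.0 / (1.0 + distSq);   // illustrative falloff only
    float nDotL = max(dot(normalize(vNormal), lightDir), 0.0);

    gl_FragColor = vec4(uLightColour * (nDotL * attenuation), 1.0);
}
```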

## Lowp

**SGX**

A variable declared with the `lowp` modifier will use a 10 bit fixed point format on SGX,
allowing values in the range [-2, 2] to be represented to a precision of 1/256. Integer
values are in the range [$-2^{9}$, $2^{9}-1$]. This precision is useful for representing
colours and any data read from low precision textures, such as normals from a normal map.
Care must be taken not to overflow the maximum or minimum value of `lowp` precision,
especially with intermediate results.
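The risk with intermediate results can be sketched as follows, assuming three hypothetical colour layers are averaged at `lowp`. Summing first can reach 3.0, which is outside the [-2, 2] range, even though the final average is not; scaling each term first keeps every intermediate in range.

```glsl
// Fragment shader, GLSL ES 1.00.
precision mediump float;

uniform sampler2D uLayer0;   // hypothetical 8-bit colour textures
uniform sampler2D uLayer1;
uniform sampler2D uLayer2;

varying vec2 vTexCoord;

void main()
{
    lowp vec4 a = texture2D(uLayer0, vTexCoord);
    lowp vec4 b = texture2D(uLayer1, vTexCoord);
    lowp vec4 c = texture2D(uLayer2, vTexCoord);

    // Risky at lowp: (a + b + c) can reach 3.0 before the scale is applied.
    // lowp vec4 average = (a + b + c) * (1.0 / 3.0);

    // Safer: scale each term first so no intermediate leaves [-2, 2].
    lowp float third = 1.0 / 3.0;
    lowp vec4 average = a * third + b * third + c * third;

    gl_FragColor = average;
}
```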

**Rogue**

On PowerVR Rogue devices `lowp` is represented as a 16 bit floating point value, meaning
`lowp` and `mediump` have identical representations as far as the hardware is concerned.

## Swizzling

Swizzling is the act of accessing or reordering the components of a vector out of order. Some examples follow:

```glsl
// Assumes var is a vec4, and that a, b, c, and d are vec3 variables declared earlier.
a = var.brg;                    // Swizzled - Out of order access
b = vec3(var.g, var.b, var.r);  // Swizzled - Out of order access
c = vec3(var);                  // Not swizzled - Dropping a component does not change
                                // access order
d.gr = a.gr + b.gr;             // Not swizzled - This will be optimized to a
                                // non-swizzled form
```

Swizzling costs performance on Series5 (lowp only) and Series5XT (all precisions) due to the additional work required to reorder vector components. As PowerVR Rogue is scalar based, swizzling is a significantly cheaper operation.

## Attributes

The per-vertex attributes passed to a vertex shader should use a precision appropriate to
the data type being passed in. For example, `highp` would not be required for a float
whose maximum value never goes above 2 and for which a precision of 1/256 would be
acceptable.
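A short sketch of attribute declarations chosen this way is shown below. The attribute names and the blend-weight use case are hypothetical; the point is only that each attribute carries the lowest qualifier that still covers its range.

```glsl
// Vertex shader, GLSL ES 1.00.
attribute highp vec3 aPosition;      // positions need full range and precision
attribute mediump vec2 aTexCoord;    // assumed to stay within a small range
attribute lowp float aBlendWeight;   // assumed to lie in [0, 1]; 1/256 steps are acceptable
attribute lowp vec4 aColour;         // e.g. normalized unsigned bytes in [0, 1]

uniform highp mat4 uMVP;

varying mediump vec2 vTexCoord;
varying lowp vec4 vColour;

void main()
{
    gl_Position = uMVP * vec4(aPosition, 1.0);
    vTexCoord = aTexCoord;
    vColour = aColour * aBlendWeight;
}
```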

## Varyings

Varyings represent the outputs from the vertex shader which are interpolated across a triangle and then fed into the fragment shader. Varyings are significantly cheaper than performing per-fragment operations to calculate data that could have been passed in from a vertex shader via a varying.

Keep the following considerations in mind when using varyings:

- Each varying requires additional space in the parameter buffer, and additional processing time to perform interpolation.
- Varying outputs are stored in registers. Having too many may introduce register pressure and potentially reduce shader occupancy. This will reduce the maximum number of concurrent shader executions per Unified Shader Core (USC).
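
To illustrate moving work from the fragment shader to a varying, the sketch below offsets a texture co-ordinate once per vertex instead of once per fragment. The names are hypothetical. Because the offset is linear in the vertex data, the interpolated result matches what a per-fragment calculation would produce.

```glsl
// Vertex shader, GLSL ES 1.00.
attribute highp vec4 aPosition;
attribute mediump vec2 aTexCoord;

uniform highp mat4 uMVP;
uniform mediump vec2 uScrollOffset;      // hypothetical offset, updated by the application

varying mediump vec2 vScrolledTexCoord;  // consumed directly by the fragment shader

void main()
{
    gl_Position = uMVP * aPosition;

    // Computed per vertex and interpolated, rather than recomputed per fragment.
    vScrolledTexCoord = aTexCoord + uScrollOffset;
}
```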

## Packing varyings

Packing multiple varyings together, for example packing two `vec2` into a single `vec4`,
should suffer no performance penalty and will save varyings. On PowerVR Series5 and
Series5XT only, texture co-ordinate varyings packed into the `.zw` channels of a `vec4`
will always be treated as a dependent texture read, so this should be avoided (see
Dependent Texture Read).
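
A minimal sketch of packing is shown below, assuming a texture co-ordinate and two hypothetical per-vertex factors share one `vec4`. The texture co-ordinate is placed in `.xy` so that the Series5 and Series5XT restriction on `.zw` does not apply. The vertex shader writes the packed varying:

```glsl
// Vertex shader, GLSL ES 1.00.
attribute highp vec4 aPosition;
attribute mediump vec2 aTexCoord;
attribute mediump vec2 aFactors;   // hypothetical: x = fog visibility, y = ambient occlusion

uniform highp mat4 uMVP;

varying mediump vec4 vPacked;      // xy = texture co-ordinate, zw = factors

void main()
{
    gl_Position = uMVP * aPosition;
    vPacked = vec4(aTexCoord, aFactors);
}
```

The fragment shader then reads each half back out:

```glsl
// Fragment shader, GLSL ES 1.00.
precision mediump float;

uniform sampler2D uAlbedo;
uniform vec3 uFogColour;

varying mediump vec4 vPacked;

void main()
{
    lowp vec3 albedo = texture2D(uAlbedo, vPacked.xy).rgb;
    float visibility = vPacked.z;   // 1.0 means no fog in this sketch
    float occlusion = vPacked.w;

    gl_FragColor = vec4(mix(uFogColour, albedo * occlusion, visibility), 1.0);
}
```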

## Samplers

Samplers are used to sample from a texture bound to a certain texture unit. The default
precision for sampler variables is `lowp`, and usually this is good enough.

Two main exceptions exist to the `lowp` rule:

- If the sampler will be used to read from either a depth or float texture, then it
  should be declared with `highp`.
- If the sampler will be used to read from a half float texture, then it should be
  declared as `mediump`.
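
The sampler declarations might look like the following sketch. The texture roles are assumptions, and the way the three samples are combined is arbitrary; only the qualifiers on the `sampler2D` uniforms matter here.

```glsl
// Fragment shader, GLSL ES 1.00.
precision mediump float;

uniform sampler2D uAlbedo;            // default lowp: 8-bit colour texture
uniform highp sampler2D uDepthMap;    // depth or float texture
uniform mediump sampler2D uHdrMap;    // half float texture

varying highp vec2 vTexCoord;

void main()
{
    lowp vec3 albedo = texture2D(uAlbedo, vTexCoord).rgb;
    highp float depth = texture2D(uDepthMap, vTexCoord).r;
    mediump vec3 hdr = texture2D(uHdrMap, vTexCoord).rgb;

    // Arbitrary combination, present only so that every sample is used.
    gl_FragColor = vec4(albedo * hdr * depth, 1.0);
}
```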

## Uniforms

Uniform variables represent values that are constant for all vertices or fragments processed as part of a draw call. They should be used to pass data that can be computed once on the CPU, and then not changed for the duration of a draw call. Unlike attributes and varyings, uniform variables may be declared as arrays. Using uniforms is significantly cheaper than using varyings; however keep the following considerations in mind when using uniforms:

- A certain number of uniforms (uniform storage varies between graphics cores) can be stored in registers on-chip. Large uniform arrays will be stored in system memory and accessing them comes at a bandwidth and execution time cost.
- Redundant uniform updates in between draw calls should be avoided.

## Constant calculations

The PowerVR shader compiler can extract calculations based on constant values (for example uniforms) from the shader and perform these calculations once per draw call.
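
As an illustrative sketch with hypothetical uniform names, the expression below depends only on uniforms, so it has the same value for every fragment in a draw call and is a candidate for this kind of extraction. Grouping the uniform-only terms in parentheses keeps that subexpression explicit; whether a given compiler version actually extracts it is an assumption that should be checked in PVRShaderEditor.

```glsl
// Fragment shader, GLSL ES 1.00.
precision mediump float;

uniform vec3 uLightColour;
uniform float uLightIntensity;
uniform float uExposure;

varying lowp vec4 vColour;

void main()
{
    // Uniform-only subexpression: identical for every fragment of the draw call.
    vec3 light = uLightColour * (uLightIntensity * uExposure);

    gl_FragColor = vec4(vColour.rgb * light, vColour.a);
}
```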

## Conversion costs

When performing arithmetic operations on multiple precisions within the same calculation, it is likely that values will have to be packed or unpacked. Packing is the act of taking a higher precision value and placing it into a lower precision variable, while unpacking is the reverse: taking a lower precision value and placing it into a higher precision variable.

Where possible, precisions should be kept the same for an entire calculation as each pack and unpack has a cost associated with it. This cost can be further reduced by writing shaders in such a way that:

- All higher precision calculations are performed together, at the top of the shader.
- All lower precision calculations are performed at the bottom.

This ensures that variables are not repeatedly packed and unpacked. It also ensures that
variables are not all unpacked into `highp`, which would lose any benefit of using lower
precision.
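
The ordering described above can be sketched as follows, assuming a simple fog calculation with hypothetical uniform and varying names. All `highp` work is done first, the result crosses to `mediump` exactly once, and the remaining work stays at `mediump`.

```glsl
// Fragment shader, GLSL ES 1.00.
precision mediump float;

uniform highp vec3 uCameraPos;
uniform highp float uFogDensity;
uniform vec3 uFogColour;

varying highp vec3 vWorldPos;
varying vec3 vNormal;
varying vec4 vColour;

void main()
{
    // Higher precision calculations, grouped at the top.
    highp float dist = length(uCameraPos - vWorldPos);
    highp float visibility = exp(-dist * uFogDensity);

    // Single conversion from highp to mediump.
    float fog = clamp(visibility, 0.0, 1.0);

    // Lower precision calculations, grouped at the bottom.
    vec3 n = normalize(vNormal);
    float facing = 0.5 + 0.5 * n.z;      // illustrative shading term only
    vec3 lit = vColour.rgb * facing;

    gl_FragColor = vec4(mix(uFogColour, lit, fog), vColour.a);
}
```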

Using fixed point values in an arithmetic operation will result in the graphics core performing a type conversion. This should be avoided as additional cycles will be introduced to the shader.