Abs, Neg, and Saturate
On PowerVR architecture, modifiers such as abs(), neg(), and clamp(…, 0.0, 1.0) (also known as saturate()) are free in certain cases, so it is essential to use them where they qualify.
abs() and neg() are free when they are used on an input to an operation, in which case the compiler turns them into a free modifier. saturate(), by contrast, becomes a free modifier when it is used on the output of an operation.
Note
Complex and sampling/interpolation instructions are exceptions to this rule: saturate() is not free when used on a texture sampling output or on a complex instruction output. When these functions are not used as described above, they may introduce additional MOV instructions, which may increase the cycle count of the shader.
It is also beneficial to use clamp(…, 0.0, 1.0) instead of min(…, 1.0) and max(…, 0.0). This changes a test instruction into a saturate modifier:
fragColor.x = abs(t.x * t.y); // two cycles
{sop, sop}
{mov, mov, mov}
-->
fragColor.x = abs(t.x) * abs(t.y); // one cycle
{sop, sop}

fragColor.x = -dot(t.xyz, t.yzx); // three cycles
{sop, sop, sopmov}
{sop, sop}
{mov, mov, mov}
-->
fragColor.x = dot(-t.xyz, t.yzx); // two cycles
{sop, sop, sopmov}
{sop, sop}

fragColor.x = 1.0 - clamp(t.x, 0.0, 1.0); // two cycles
{sop, sop, sopmov}
{sop, sop}
-->
fragColor.x = clamp(1.0 - t.x, 0.0, 1.0); // one cycle
{sop, sop}

fragColor.x = min(dot(t, t), 1.0) > 0.5 ? t.x : t.y; // five cycles
{sop, sop, sopmov}
{sop, sop}
{mov, fmad, tstg, mov}
{mov, mov, pck, tstg, mov}
{mov, mov, tstz, mov}
-->
fragColor.x = clamp(dot(t, t), 0.0, 1.0) > 0.5 ? t.x : t.y; // four cycles
{sop, sop, sopmov}
{sop, sop}
{fmad, mov, pck, tstg, mov}
{mov, mov, tstz, mov}
// Instructions on Volcanic:
fragColor.x = abs(t.x * t.y);
mul i0, i1, i0
mov r0, i0.abs // use the .abs instruction modifier
-->
fragColor.x = abs(t.x) * abs(t.y); // On Volcanic, the operations are the same thanks to the .abs instruction modifier
mul i0, i1, i0
mov r0, i0.abs

fragColor.x = -dot(t.xyz, t.yzx); // On Volcanic, fused multiply-add helps reduce the instruction count
{mul}
{fma}
{fma}
{mov}
-->
fragColor.x = dot(-t.xyz, t.yzx);
{mul}
{fma}
{fma}
{mov}

fragColor.x = 1.0 - clamp(t.x, 0.0, 1.0); // On Volcanic, .sat and .neg are used
mov sl0.sat, i0
add r0, 1f (sc1), sl0.neg
-->
fragColor.x = clamp(1.0 - t.x, 0.0, 1.0); // On Volcanic, .neg and .sat are used again
add r0.sat, 1f (sc1), i0.neg
mov r2, i0

fragColor.x = min(dot(t, t), 1.0) > 0.5 ? t.x : t.y;
{mul}
{mul}
{fma}
{fma}
{fma}
p0 = 1f (sc1) <min i0
{movc}
p0 = i0 > 0.5f (sc42)
{movc}
-->
fragColor.x = clamp(dot(t, t), 0.0, 1.0) > 0.5 ? t.x : t.y;
{mul}
{mul}
{fma}
{fma}
{fma}
p0 = i0 > 0.5f (sc42)
{movc}
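The preferred forms from the comparisons above can be collected into a single fragment shader. The following is a minimal sketch, assuming GLSL ES 3.00; the input name t mirrors the snippets above and is otherwise hypothetical, as is the choice to write each result to a separate output channel. It illustrates modifier placement only and is not a statement of what any particular compiler version will emit.

```glsl
#version 300 es
precision mediump float;

// Hypothetical interpolated input; the name mirrors the snippets above.
in vec4 t;
out vec4 fragColor;

void main()
{
    // abs() applied to the inputs of the multiply rather than to its result.
    fragColor.x = abs(t.x) * abs(t.y);

    // Negation applied to an input of the dot product rather than to its result.
    fragColor.y = dot(-t.xyz, t.yzx);

    // Saturate applied to the output of the subtraction.
    fragColor.z = clamp(1.0 - t.x, 0.0, 1.0);

    // clamp(..., 0.0, 1.0) used in place of min(..., 1.0) before the comparison.
    fragColor.w = clamp(dot(t, t), 0.0, 1.0) > 0.5 ? t.x : t.y;
}
```

For finite inputs, each rewritten expression produces the same value as its original form: |a·b| = |a|·|b|, negation distributes over the dot product, 1.0 - clamp(x, 0.0, 1.0) equals clamp(1.0 - x, 0.0, 1.0), and dot(t, t) is never negative, so the lower bound of the clamp has no effect. Only the placement of the modifier changes.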
However, be wary of complex functions: they are translated into multiple operations, so it matters where the modifiers are placed.
For example, normalize() is broken down into:
vec3 normalize( vec3 v )
{
    return v * inversesqrt( dot( v, v ) );
}
In this case it is best to negate one of the inputs of the final multiplication, rather than negating the input everywhere it is used or creating a temporary negated input:
fragColor.xyz = -normalize(t.xyz); // six cycles
{fmul, mov}
{fmad, mov}
{fmad, mov}
{frsq}
{fmul, fmul, mov, mov}
{fmul, mov}
-->
fragColor.xyz = normalize(-t.xyz); // seven cycles
{mov, mov, mov}
{fmul, mov}
{fmad, mov}
{fmad, mov}
{frsq}
{fmul, fmul, mov, mov}
{fmul, mov}
// Instructions on Volcanic:
fragColor.xyz = -normalize(t.xyz);
{mul}
{fma}
{fma}
{rsq}
{mul}
{mul}
{mul}
{mul}
-->
fragColor.xyz = normalize(-t.xyz);
{mul}
{fma}
{fma}
{rsq}
{mul}
{mul}
{mul}
{mul}
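If the negation needs to be pinned to the final multiplication explicitly, normalize() can be written out by hand following the breakdown given earlier. This is a minimal sketch, assuming GLSL ES 3.00; the input name t and the local invLen are hypothetical. Whether this compiles to the same instructions as -normalize(t.xyz) on a given core is an assumption that should be checked against the compiler's disassembly.

```glsl
#version 300 es
precision mediump float;

// Hypothetical interpolated input; the name mirrors the snippets above.
in vec4 t;
out vec4 fragColor;

void main()
{
    // Reciprocal length, as in the normalize() breakdown: inversesqrt(dot(v, v)).
    float invLen = inversesqrt(dot(t.xyz, t.xyz));

    // Negate a single input of the final multiplication only,
    // instead of negating t.xyz before it feeds the dot product as well.
    fragColor.xyz = t.xyz * -invLen;
    fragColor.w = 1.0;
}
```

The result is the same as -normalize(t.xyz); the manual form only makes the position of the negation visible in the source.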