Exploiting the SOP/MAD FP16 Pipeline

The PowerVR Rogue architecture has a powerful FP16 pipeline optimised for common graphics operations. This section describes how to write shader arithmetic that takes advantage of it.

Note

Converting the inputs to FP16 and then converting the output to FP32 is free.

With SOP (sum of products) and MAD (multiply-add) instructions, any one of the following combinations can execute in one cycle:

  • Two SOPs

  • Two MADs

  • One MAD and one SOP

  • Four FP16 MADs

Here are some examples.

Executing four MADs in one cycle:

fragColor.x = t.x * t.y + t.z;
fragColor.y = t.y * t.z + t.w;
fragColor.z = t.z * t.w + t.x;
fragColor.w = t.w * t.x + t.y;
{sopmad, sopmad, sopmad, sopmad}
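The four scalar statements above are component-wise multiply-adds, so they can also be written as a single swizzled vector expression. The sketch below is a complete GLSL ES 3.00 fragment shader under a few assumptions: uT is a hypothetical uniform standing in for t, and mediump is assumed to map to FP16 on the target device. The source does not say whether the compiler emits the same four sopmad instructions for the vector form, so this is worth confirming in the shader disassembly.

```glsl
#version 300 es
precision mediump float;

// Hypothetical input standing in for `t` in the text above.
uniform vec4 uT;

out vec4 fragColor;

void main() {
    // mediump is assumed to be FP16 on the target hardware.
    vec4 t = uT;

    // Component-wise, this expands to exactly the four statements above:
    //   x: t.x * t.y + t.z
    //   y: t.y * t.z + t.w
    //   z: t.z * t.w + t.x
    //   w: t.w * t.x + t.y
    fragColor = t * t.yzwx + t.zwxy;
}
```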

SOP with a choice of an operation between the result of the multiplies:

fragColor.z = t.y * t.z OP t.w * t.x;
fragColor.w = t.x * t.y OP t.z * t.w;

where OP can be either an addition, a subtraction, a min() or a max():

fragColor.z = t.y * t.z + t.w * t.x;
fragColor.z = t.y * t.z - t.w * t.x;
fragColor.z = min(t.y * t.z, t.w * t.x);
fragColor.z = max(t.y * t.z, t.w * t.x);
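To try the four variants side by side, they can be placed in one shader, each writing a different output component. This is only a sketch: uT is a hypothetical uniform standing in for t, and mediump is assumed to map to FP16. Going by the combinations listed earlier (two SOPs per cycle), one might expect these four SOPs to take two cycles, but that figure is an inference rather than something the source states.

```glsl
#version 300 es
precision mediump float;

// Hypothetical input standing in for `t` in the text above.
uniform vec4 uT;

out vec4 fragColor;

void main() {
    vec4 t = uT;

    // Each line has the SOP shape (a * b) OP (c * d).
    fragColor.x = t.y * t.z + t.w * t.x;      // OP = addition
    fragColor.y = t.y * t.z - t.w * t.x;      // OP = subtraction
    fragColor.z = min(t.y * t.z, t.w * t.x);  // OP = min()
    fragColor.w = max(t.y * t.z, t.w * t.x);  // OP = max()
}
```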

It is possible to apply a negate, an abs(), or a clamp() (saturate) to each of the inputs:

fragColor.z = -t.y * abs(t.z) + clamp(t.w, 0.0, 1.0) * -t.x;

Finally, it is also possible to apply a clamp() (saturate) to the end result:

fragColor.z = clamp(t.y * t.z OP t.w * t.x, 0.0, 1.0);
fragColor.z = clamp(t.y * t.z + t.w, 0.0, 1.0);
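Because OP is a placeholder and not valid GLSL, the sketch below writes out concrete saturated forms and contrasts them with a general clamp. It rests on some assumptions: uT is a hypothetical uniform standing in for t, and mediump is assumed to map to FP16. The text above describes only the 0.0 to 1.0 clamp (saturate) as applicable to the result, so the last line, which uses other bounds, should not be assumed to behave the same way.

```glsl
#version 300 es
precision mediump float;

// Hypothetical input standing in for `t` in the text above.
uniform vec4 uT;

out vec4 fragColor;

void main() {
    vec4 t = uT;

    // Saturate form: the bounds are exactly 0.0 and 1.0.
    fragColor.x = clamp(t.y * t.z + t.w * t.x, 0.0, 1.0);      // SOP, OP = addition
    fragColor.y = clamp(min(t.y * t.z, t.w * t.x), 0.0, 1.0);  // SOP, OP = min()
    fragColor.z = clamp(t.y * t.z + t.w, 0.0, 1.0);            // MAD

    // General clamp with other bounds: not a saturate, so the
    // statement about the end result above does not cover it.
    fragColor.w = clamp(t.y * t.z + t.w, 0.0, 2.0);
}
```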

Putting all of this together, the example below uses every feature described above (input modifiers, a min() between the products, and a saturated result) in a single cycle:

// one cycle
mediump vec4 fp16 = t;
highp vec4 res;
res.x = clamp(min(-fp16.y * abs(fp16.z), clamp(fp16.w, 0.0, 1.0) * abs(fp16.x)), 0.0, 1.0);
res.y = clamp(abs(fp16.w) * -fp16.z + clamp(fp16.x, 0.0, 1.0), 0.0, 1.0);
fragColor = res;
{sop, sop}