FBA Load

What does this counter show?

This counter represents the average load of the GPU's Frame Buffer Accumulate (FBA) unit. This unit provides a faster type of image access, similar to image atomics but without returning a result. Because no result has to be sent back to the shader, the operations can be queued and executed asynchronously.
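
The effect of not returning a result can be illustrated with a small software model. This is a hypothetical Python sketch, not the hardware or any shading-language API: `FbaQueue`, `issue_accumulate` and `drain` are invented names. The issuer never waits for a value, so each request is simply appended to a queue and applied later by a separate worker.

```python
import queue
import threading


class FbaQueue:
    """Hypothetical model of fire-and-forget accumulation (not a real API)."""

    def __init__(self, width, height):
        self.width = width
        self.texels = [0] * (width * height)
        self._pending = queue.Queue()
        # A single background worker applies queued updates asynchronously.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def issue_accumulate(self, x, y, value):
        # Returns None immediately: the caller never sees the old texel value,
        # so it can continue without waiting for the update to complete.
        self._pending.put((x, y, value))

    def _run(self):
        while True:
            x, y, value = self._pending.get()
            self.texels[y * self.width + x] += value
            self._pending.task_done()

    def drain(self):
        # Block until every queued accumulation has been applied.
        self._pending.join()


fb = FbaQueue(4, 4)
for _ in range(100):
    fb.issue_accumulate(1, 2, 3)
fb.drain()
print(fb.texels[2 * 4 + 1])  # 300
```

An image atomic, by contrast, hands the previous texel value back to the shader, which therefore has to wait for the operation to finish before it can use that value.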

Accumulation image operations read the selected texel, compute a new value as described by the accumulate function, and write the new value back to that texel. The texel being updated is guaranteed not to be modified by any other accumulate function between the time the original value is read and the time the new value is written.
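
The read/compute/write sequence and its atomicity guarantee can be modelled in Python as below. This is a minimal sketch that assumes a per-texel lock as a stand-in for the hardware guarantee; `AccumulateImage` and `accumulate` are hypothetical names. The accumulate function is passed in as `fn`, so the same model covers addition, maximum, and similar operations.

```python
import operator
import threading


class AccumulateImage:
    """Hypothetical software model of accumulate semantics (not a real API)."""

    def __init__(self, width, height, initial=0):
        self.width = width
        self.texels = [initial] * (width * height)
        # One lock per texel: accumulates on the same texel cannot interleave,
        # while accumulates on different texels proceed independently.
        self._locks = [threading.Lock() for _ in range(width * height)]

    def accumulate(self, x, y, fn, operand):
        index = y * self.width + x
        with self._locks[index]:
            original = self.texels[index]      # read the selected texel
            new_value = fn(original, operand)  # compute via the accumulate function
            self.texels[index] = new_value     # write back; nothing is returned


def worker(image, count):
    for _ in range(count):
        image.accumulate(0, 0, operator.add, 1)
        image.accumulate(1, 0, max, count)


image = AccumulateImage(2, 1)
threads = [threading.Thread(target=worker, args=(image, 1000)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(image.texels)  # [8000, 1000]
```

Without the lock, two threads could both read the same original value and one of the additions would be lost; the guarantee described above rules that out.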

Image load and store operations perform relatively simple, exclusive reads and writes, respectively, on a given texel. Image accumulation functions perform comparatively more complex operations: they accelerate read/modify/write functions that act atomically with respect to any other accumulation function. Note that accumulation functions perform only associative and commutative operations, such as addition, so the order in which they execute does not affect the result.
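
Why order does not matter can be checked by brute force. The Python sketch below, in which `apply_accumulations` and `results_for_all_orders` are hypothetical helpers, applies the same set of requests to a texel in every possible order and collects the distinct final values. Addition and maximum always give one result, while a plain "last write wins" store, which is not commutative, gives a different result depending on which request arrives last.

```python
import itertools
import operator
from functools import reduce


def apply_accumulations(initial, fn, operands):
    # Apply fn(texel, operand) for each request in the order it arrives.
    return reduce(fn, operands, initial)


operands = [5, -2, 7, 3]


def results_for_all_orders(fn):
    # Distinct final texel values over every possible arrival order.
    return {apply_accumulations(0, fn, order) for order in itertools.permutations(operands)}


print(results_for_all_orders(operator.add))  # {13}
print(results_for_all_orders(max))           # {7}
# A "last write wins" store depends on which request is applied last.
print(sorted(results_for_all_orders(lambda old, new: new)))  # [-2, 3, 5, 7]
```

The example uses integers on purpose: floating-point addition is only approximately associative, so with float values different orders can differ in the last bits of the result.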

Neither frame shaders nor ray shaders have access to generic image load or store operations; they can use only image accumulation operations, which are executed by the FBA unit.

What does a high value mean?

A high load value indicates that a very large number of frame buffer accumulation operations have been issued to the FBA.

If this value is very high, first identify where and when accumulation operations are issued, and determine whether some accumulations can be combined or removed altogether. Once only the necessary accumulations remain, try the following to reduce the FBA load:

  • Avoid frame buffer accumulations whose result would have a negligible effect on the final image. Check the value before accumulating, and skip the accumulation if it falls below a predefined threshold. The same technique can also reduce the total number of rays issued: check the data carried by a ray against a threshold, and skip emitting the ray if its contribution would be negligible.

  • In common usage, each ray emission is expected to result in one or very few frame buffer accumulations; tailor your usage to this expectation as far as possible.

  • Although frame shaders can issue frame buffer accumulates, it is often more sensible to use traditional frame buffer writes from fragment shaders executed at an earlier stage instead. These typically execute faster than the comparatively slow frame buffer accumulates. This approach is more common in hybrid ray-traced applications.

  • Some applications bounce rays multiple times per frame shader iteration and issue a frame buffer accumulate at each bounce. It may instead be possible to defer these accumulations until the ray terminates, i.e., carry the data to be accumulated along the ray to a later bounce, and issue a single accumulate (or a small number of combined accumulates) at the end of the ray chain.
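
The threshold and deferral techniques above can be compared by counting accumulate requests. The following Python sketch is a simplified model, not shader code: contributions are scalars rather than colours, each path is a list of per-bounce contributions, and `CountingFrameBuffer`, `trace_per_bounce`, `trace_deferred` and the `THRESHOLD` value are all hypothetical.

```python
from dataclasses import dataclass

THRESHOLD = 1e-3  # hypothetical cut-off below which a contribution is negligible


@dataclass
class CountingFrameBuffer:
    """Hypothetical stand-in for one texel that counts accumulate requests."""
    value: float = 0.0
    accumulates: int = 0

    def accumulate(self, contribution):
        self.accumulates += 1
        self.value += contribution


def trace_per_bounce(contributions, fb):
    # Naive: one accumulate per bounce, however small the contribution.
    for c in contributions:
        fb.accumulate(c)


def trace_deferred(contributions, fb, threshold=THRESHOLD):
    # Deferred: carry a running total in the ray payload and issue a single
    # accumulate when the ray chain terminates, skipping it if negligible.
    payload = 0.0
    for c in contributions:
        payload += c
    if abs(payload) >= threshold:
        fb.accumulate(payload)


paths = [[0.5, 0.25, 0.125], [0.0004, 0.0002], [0.2, 0.0]]
naive, deferred = CountingFrameBuffer(), CountingFrameBuffer()
for path in paths:
    trace_per_bounce(path, naive)
    trace_deferred(path, deferred)
print(naive.accumulates, deferred.accumulates)  # 7 2
```

In this model the deferred version issues 2 accumulates instead of 7, and the only contribution it drops is the second path, whose total falls below the threshold.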