Using glClear and glColorMask on PowerVR

Partial clears cause overdraw and should be avoided.

An application must avoid a partial clear (partial colour mask) at the start of a frame for two important reasons:

  1. The previous frame must be read in. This is performed by a full screen primitive reading it in as a texture.
  2. This texture must be masked out by the partial clear, which is done by submitting another full screen primitive as a blend.

This results in two overdraws before work on the frame begins. If the colour mask is then changed to full and glClear is called, this counts as a state change for the colour mask. A state change requires a flush of the tile accelerator, and the clear becomes another full screen primitive, which adds the second overdraw.

With a single full clear (no partial colour mask) at the start of the frame, the fast clear path is followed. This marks the whole frame as a set colour and does no further work, so no full screen primitive is required and no pixels are drawn at all.

Note: PVRTrace GUI emulates this behaviour.
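
The full clear at the start of a frame can be sketched as follows. This is a minimal illustration in C rather than driver-verified code: begin_frame is a hypothetical helper, and the stub section at the top stands in for the real OpenGL ES header so that the sketch builds without a GL context. It assumes the application may have left a partial colour mask set at the end of the previous frame, so the full mask is restored before the single clear.

```c
/* Stand-ins for the OpenGL ES declarations so the sketch builds without a GL
 * context. In a real application, remove this section and include
 * <GLES2/gl2.h> or <GLES3/gl3.h> instead. The stub_* variables are
 * hypothetical bookkeeping used only to observe the calls. */
typedef unsigned int  GLbitfield;
typedef unsigned char GLboolean;
#define GL_TRUE               1
#define GL_DEPTH_BUFFER_BIT   0x00000100
#define GL_STENCIL_BUFFER_BIT 0x00000400
#define GL_COLOR_BUFFER_BIT   0x00004000

static int        stub_mask_is_full = 0;
static int        stub_clear_calls = 0;
static GLbitfield stub_last_clear_mask = 0;

static void glColorMask(GLboolean r, GLboolean g, GLboolean b, GLboolean a)
{
    stub_mask_is_full = (r && g && b && a);
}

static void glClear(GLbitfield mask)
{
    stub_clear_calls++;
    stub_last_clear_mask = mask;
}

/* Hypothetical helper: call once at the start of every frame, before any
 * draw calls are submitted. */
static void begin_frame(void)
{
    /* Make every colour channel writable so the clear is not partial. */
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);

    /* One clear covering colour, depth, and stencil. */
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT | GL_STENCIL_BUFFER_BIT);
}
```

If a later pass in the frame needs a partial colour mask, one option is to set it only around that pass and restore the full mask afterwards, so that the next frame again starts with a full clear.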

Invalidating frame buffer attachments

There is a way to prevent unnecessary memory transfers when rendering to a frame buffer object using the OpenGL ES API. The application should invalidate frame buffer attachments using the function glInvalidateFramebuffer, passing the attachments that are no longer needed, for example GL_DEPTH_ATTACHMENT or GL_STENCIL_ATTACHMENT. Calling this function tells the driver that it may discard the contents of the specified attachments, so the driver does not need to store them into system memory. This can save a large amount of memory bandwidth and improve performance significantly, because by default OpenGL ES preserves frame buffer attachments.

When developing for PowerVR, it is highly recommended to invalidate depth and stencil buffers at all times. One exception to this rule is when swapping buffers using eglSwapBuffers, as the driver automatically discards the backbuffer's depth and stencil buffers. Invalidation usually needs to happen before a frame buffer is sent for processing, such as when switching frame buffer objects. However, in certain cases, such as when using glFenceSync (which causes a CPU-GPU sync point in the command stream), an extra invalidation is needed to avoid a depth load/store to main memory. Check that no unnecessary depth load/store operations happen by observing PVRTune's "Z load/store" counter.
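
Invalidating depth and stencil before switching frame buffer objects might look like the sketch below. It is illustrative only: end_offscreen_pass and the stub_* variables are hypothetical names, and the stub section replaces the real <GLES3/gl3.h> declarations so that the sketch builds without a GL context. It assumes the colour attachment of the pass is still needed afterwards, while its depth and stencil contents are not.

```c
/* Stand-ins for the OpenGL ES 3.0 declarations so the sketch builds without
 * a GL context. In a real application, remove this section and include
 * <GLES3/gl3.h> instead. The stub_* variables are hypothetical bookkeeping. */
typedef unsigned int GLenum;
typedef unsigned int GLuint;
typedef int          GLsizei;
#define GL_FRAMEBUFFER        0x8D40
#define GL_DEPTH_ATTACHMENT   0x8D00
#define GL_STENCIL_ATTACHMENT 0x8D20

static GLuint  stub_bound_fbo = 0;
static GLuint  stub_invalidated_fbo = 0;
static GLsizei stub_invalidated_count = 0;

static void glBindFramebuffer(GLenum target, GLuint framebuffer)
{
    (void)target;
    stub_bound_fbo = framebuffer;
}

static void glInvalidateFramebuffer(GLenum target, GLsizei numAttachments,
                                    const GLenum *attachments)
{
    (void)target;
    (void)attachments;
    stub_invalidated_fbo = stub_bound_fbo;   /* record which FBO was bound */
    stub_invalidated_count = numAttachments;
}

/* Hypothetical helper: finish an off-screen pass and move on to next_fbo. */
static void end_offscreen_pass(GLuint next_fbo)
{
    static const GLenum discard[] = { GL_DEPTH_ATTACHMENT, GL_STENCIL_ATTACHMENT };

    /* Invalidate while the pass's frame buffer object is still bound, so the
     * driver knows depth and stencil need not be stored to system memory. */
    glInvalidateFramebuffer(GL_FRAMEBUFFER, 2, discard);

    /* Only then switch to the next frame buffer object. */
    glBindFramebuffer(GL_FRAMEBUFFER, next_fbo);
}
```

glInvalidateFramebuffer is part of OpenGL ES 3.0. On OpenGL ES 2.0, the EXT_discard_framebuffer extension offers glDiscardFramebufferEXT for the same purpose where the extension is supported.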

Before drawing to a frame buffer, an application should call glClear, specifying the buffers to clear, for example GL_DEPTH_BUFFER_BIT or GL_STENCIL_BUFFER_BIT. Calling this function tells the driver that it does not need to load the contents of the specified attachments from system memory, again saving a large amount of memory bandwidth.
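
As a rough sketch of clearing at the start of an off-screen pass: begin_offscreen_pass and the stub_* variables below are hypothetical, and the stub section replaces the real OpenGL ES header so that the sketch builds without a GL context. The colour buffer is cleared as well, on the assumption that nothing from an earlier pass has to be preserved in this frame buffer.

```c
/* Stand-ins for the OpenGL ES declarations so the sketch builds without a GL
 * context. In a real application, remove this section and include
 * <GLES2/gl2.h> or <GLES3/gl3.h> instead. The stub_* variables are
 * hypothetical bookkeeping used only to observe the calls. */
typedef unsigned int GLenum;
typedef unsigned int GLuint;
typedef unsigned int GLbitfield;
#define GL_FRAMEBUFFER        0x8D40
#define GL_DEPTH_BUFFER_BIT   0x00000100
#define GL_STENCIL_BUFFER_BIT 0x00000400
#define GL_COLOR_BUFFER_BIT   0x00004000

static GLuint     stub_bound_fbo = 0;
static GLuint     stub_cleared_fbo = 0;
static GLbitfield stub_last_clear_mask = 0;

static void glBindFramebuffer(GLenum target, GLuint framebuffer)
{
    (void)target;
    stub_bound_fbo = framebuffer;
}

static void glClear(GLbitfield mask)
{
    stub_cleared_fbo = stub_bound_fbo;   /* record which FBO was cleared */
    stub_last_clear_mask = mask;
}

/* Hypothetical helper: start rendering to fbo. */
static void begin_offscreen_pass(GLuint fbo)
{
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);

    /* Clear before any draw call so the previous contents of these
     * attachments do not have to be loaded from system memory. */
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT | GL_STENCIL_BUFFER_BIT);
}
```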