Synchronisation

Understanding synchronisation in relation to performance

Hardware schedules work most efficiently when vertex processing executes in parallel with fragment processing. To achieve this, the application should avoid calls that force synchronisation between the CPU and the graphics core wherever possible. In OpenGL ES, these include glFlush, glReadPixels, glFinish, eglClientWaitSync, glClientWaitSync, and glWaitSync.

  • On PowerVR hardware, calling the function glFlush flushes (kicks) all outstanding work on render targets and kicks any outstanding tiler work. In practice, though, there is no outstanding work when double-buffering is used.
  • On PowerVR hardware, calling the function glFinish flushes (kicks) all outstanding work on render targets in a context, as defined by the OpenGL ES specification. glFinish might wait for these operations to complete.
  • On PowerVR hardware, calling the function eglClientWaitSync blocks the calling thread until the specified sync object is signalled, or until the specified timeout (in nanoseconds) has passed. In other words, it waits for the work to be completed.
  • Calling the function glClientWaitSync results in similar behaviour to calling the function eglClientWaitSync.
  • Calling the function glWaitSync results in similar behaviour to calling the function glClientWaitSync, except that the wait happens on the GL server rather than in the calling thread. glWaitSync returns immediately so the application can continue processing, but causes the GL server to block until the sync object is signalled.
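
Where a wait cannot be removed entirely, one way to keep the CPU from stalling is to poll a sync object instead of blocking on it. The following is a minimal sketch of that idea, not a PowerVR-specific recipe. It is written in plain C without any GL headers so that it stands alone: the names fence_status, fence_poll_fn, frame_slot, acquire_free_slot and poll_simulated are all hypothetical. In a real renderer the poll callback would be assumed to wrap glClientWaitSync with a timeout of zero, which only tests the sync object and returns immediately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes for this sketch, loosely mirroring the
 * possible outcomes of a zero-timeout glClientWaitSync call. */
typedef enum { FENCE_SIGNALLED, FENCE_PENDING, FENCE_ERROR } fence_status;

/* Hypothetical callback. In a real renderer this is assumed to wrap
 * glClientWaitSync(sync, 0, 0), which polls without blocking. */
typedef fence_status (*fence_poll_fn)(void *fence);

typedef struct {
    void *fence;  /* opaque handle, for example a GLsync */
    bool in_use;  /* true while the graphics core may still use the slot */
} frame_slot;

/* Returns the index of the first slot the CPU can safely reuse, or -1 if
 * every slot is still in flight. This function never blocks. */
int acquire_free_slot(frame_slot *slots, size_t count, fence_poll_fn poll)
{
    assert(slots != NULL && poll != NULL);
    for (size_t i = 0; i < count; ++i) {
        if (slots[i].in_use && poll(slots[i].fence) == FENCE_SIGNALLED) {
            slots[i].in_use = false;
        }
        if (!slots[i].in_use) {
            return (int)i;
        }
    }
    return -1;
}

/* Simulated fence for illustration only: the handle points to an int that
 * counts the polls remaining until the work is treated as complete. */
fence_status poll_simulated(void *fence)
{
    int *remaining = fence;
    if (*remaining > 0) {
        --*remaining;
        return FENCE_PENDING;
    }
    return FENCE_SIGNALLED;
}
```

When acquire_free_slot returns -1, the application can do other CPU work, or skip the update for this frame, rather than calling glFinish or a blocking wait.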

Multithreading in OpenGL ES

Synchronisation between OpenGL ES threads is done by eglMakeCurrent. It performs the following:

  • Binds the supplied context to the current rendering thread and the supplied draw/read surfaces.
  • If the calling thread already has a current rendering context, then all outstanding operations are flushed and that context is marked as no longer current.
  • If the draw and read parameters are set to EGL_NO_SURFACE and context is set to EGL_NO_CONTEXT, then the current context is released from the calling thread without assigning it to a new one.

Usually the driver must flush all outstanding operations, unless the currently bound context is released and then rebound. In this latter case, all outstanding operations have already been kicked. The driver has to wait for operations to finish if a context/surface pair is broken up and either half is paired with a different context or surface, for example when the surface is kept and the context is changed, or the context is kept and the surface is changed. When the current context and surfaces are released without assigning new ones, the driver must flush all outstanding operations but does not need to wait for them. Therefore, calls to eglMakeCurrent should be kept to a minimum.

Multi-threaded rendering usually has no performance benefit, and it can sometimes lead to worse performance. The worst case is to frequently bind a single graphics context to different threads using eglMakeCurrent. In this case, the API calls cost the same as in a single-threaded renderer because API call submission is serialised. However, there is the additional overhead of the context switch, so performance will be worse than that of a single-threaded renderer.

For the best possible performance, rendering threads should be created at start-up. A primary thread should be used for all rendering. Additional threads created with a shared context should only be used for shader compilation and buffer data upload. The number of background threads should be kept to a minimum, preferably one thread per CPU core. Creating excess threads leads to code that is hard to maintain and debug.
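
The thread roles described above can be written down as a small policy. The following is a minimal sketch in plain C. The names job_kind, thread_role, route_job and clamp_background_threads are hypothetical, and treating "one thread per CPU core" as an upper bound is an assumed reading of the guidance. In a real application, each background thread would typically get its own context created with eglCreateContext, passing the primary context as the share_context argument, and would make it current once at start-up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical job categories and thread roles for this sketch. */
typedef enum { JOB_DRAW, JOB_SHADER_COMPILE, JOB_BUFFER_UPLOAD } job_kind;
typedef enum { THREAD_PRIMARY, THREAD_BACKGROUND } thread_role;

/* All rendering stays on the primary thread. Only shader compilation and
 * buffer data upload may run on a background thread with a shared context. */
thread_role route_job(job_kind kind)
{
    switch (kind) {
    case JOB_SHADER_COMPILE:
    case JOB_BUFFER_UPLOAD:
        return THREAD_BACKGROUND;
    case JOB_DRAW:
    default:
        return THREAD_PRIMARY;
    }
}

/* Caps the number of background threads at one per CPU core. This cap is
 * an assumed interpretation of the guidance, not a driver requirement. */
size_t clamp_background_threads(size_t requested, size_t cpu_cores)
{
    assert(cpu_cores > 0);
    return requested < cpu_cores ? requested : cpu_cores;
}
```

Deciding the thread count once at start-up, and routing work by category, keeps eglMakeCurrent out of the per-frame path.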