Additional Features of PVRTune Complete

The client stream and synchronisation objects are two important concepts to understand when using PVRTune Complete.

PVRTune Complete brings extra performance analysis features to PVRTune that are not available in PVRTune Developer. This section will go through all these additional features in detail.

To get the most out of these features, it is important to understand the data streams which are captured by PVRTune Complete. These are the client stream and synchronisation events.

Client stream

The client data stream refers to the timing data generated and emitted by the client part of our graphics driver, such as the OpenGL® ES, EGL™, Vulkan®, and OpenCL™ modules. The driver emits timing events for a subset of API calls, which are then captured by PVRTune.

Comprehensive lists of the various client stream timing events for the different APIs are available in the Captured API Calls section.

There are some functions which are not technically OpenGL ES API calls, but are invoked as a direct result of OpenGL ES calls made by the user application. These functions are a part of the OpenGL ES driver module and are also tracked. These include:

  • TA kick - Called when the OpenGL ES driver submits 3D work, such as Tiler and Renderer tasks, to the hardware to be executed.
  • TQ kick - Called when the OpenGL ES driver manipulates textures/render surfaces, including blits, texture uploads, and mip-map generation. This results in 2D tasks being executed on the GPU.
  • CDM kick - Called when the OpenGL ES driver submits compute work, such as OpenGL ES compute shaders, to the hardware for processing. This results in a compute task being executed by the GPU.

API events generated by the OpenGL ES, EGL, Vulkan, and OpenCL client drivers are retrieved and presented in the PVRTune GUI. A piece of work submitted by an application can be tracked from the application level, through the PowerVR driver services and firmware, and onto the hardware for execution. As a result, it is possible to see how work submitted at the application level directly affects the driver and hardware. This gives a more complete understanding of the impact that application-level API calls have on the behaviour and performance of both the driver and the hardware.

The client stream enables PVRTune Complete to collect other useful information emitted from the client driver, such as:

  • Surface format and size - details about the buffer(s) in use when work is submitted by the client driver to the hardware.
  • Shader information - the bound program that gets executed on the GPU at the time of a glDraw* call.

Synchronisation objects/events

Synchronisation objects are used by the driver to ensure that operations queued by the driver are executed in the proper order. Operations can be blocked by synchronisation objects until a previous operation completes and frees a resource, such as a render target.

PVRTune Complete captures the synchronisation objects emitted by our driver and uses this information to create new timelines called “queues”. Each hardware core, such as the tiler, renderer, and compute, has an associated queue if synchronisation data is available.

The timeline queues make it possible to see when work is queued by the driver. The operations can be traced from the client drivers to the hardware queue. These timelines will usually contain several “check-fail” events followed by a “check-pass” event for the synchronisation object(s) of that task.

  • A failed check means that a dependency was not satisfied, and the operation cannot begin execution on the hardware.
  • A successful check means that all dependencies were met, and the task can begin execution.
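The check behaviour described above can be illustrated with a small model. This is only a sketch under stated assumptions: the SyncObject class, the check_events function, the event labels, and all times are hypothetical and are not part of PVRTune or the driver. It shows how repeated checks fail until every dependency of a task has been signalled, at which point a single passing check allows the task to start.

```python
from dataclasses import dataclass


@dataclass
class SyncObject:
    # Hypothetical model of a synchronisation object (not a PVRTune type).
    name: str
    signalled_at: float  # illustrative time (ms) at which the object is updated


def check_events(dependencies, check_times):
    """Return (time, label) pairs for each check, stopping at the first pass."""
    events = []
    for t in check_times:
        # A check passes only if every dependency was signalled by time t.
        if all(dep.signalled_at <= t for dep in dependencies):
            events.append((t, "check-pass"))
            break
        events.append((t, "check-fail"))
    return events


# Illustrative data: the task waits on a render target and on earlier geometry work.
render_target_free = SyncObject("render_target", signalled_at=2.5)
geometry_done = SyncObject("geometry", signalled_at=1.0)

events = check_events([render_target_free, geometry_done], [0.5, 1.5, 2.0, 3.0, 4.0])
for t, label in events:
    print(f"{t:4.1f} ms  {label}")
```

In this model the task is blocked by the render target until 2.5 ms, so the checks at 0.5, 1.5, and 2.0 ms fail and the check at 3.0 ms passes.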

In most cases, there will be a delay between the driver scheduling the work and the hardware executing it. This information can help diagnose pipeline bubbles in an application's workload. For example, work may have been queued for a long time but been unable to execute because of a dependency on a resource, possibly leaving the hardware idle.
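As a rough illustration of this kind of analysis, the sketch below computes the queue-to-execution delay for each task on one hardware core and flags intervals where the core was idle while work was already queued. The Task record, the find_bubbles function, and the timing values are assumptions made for the example. They do not correspond to a PVRTune export format or API.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical record for one task on a single hardware core (times in ms).
    name: str
    queued: float  # when the driver queued the work
    start: float   # when the hardware began executing it
    end: float     # when the hardware finished it


def find_bubbles(tasks, min_gap=0.0):
    """Return (task name, idle_from, idle_to) for gaps where the core sat idle
    although the next task had already been queued."""
    ordered = sorted(tasks, key=lambda t: t.start)
    bubbles = []
    for prev, nxt in zip(ordered, ordered[1:]):
        # The core can only be idle-with-work-pending once the previous task
        # has finished and the next task has been queued.
        idle_from = max(prev.end, nxt.queued)
        if nxt.start - idle_from > min_gap:
            bubbles.append((nxt.name, idle_from, nxt.start))
    return bubbles


tasks = [
    Task("A", queued=0.0, start=1.0, end=4.0),
    Task("B", queued=2.0, start=4.0, end=6.0),
    Task("C", queued=5.0, start=9.0, end=11.0),
]

for task in tasks:
    print(f"{task.name}: waited {task.start - task.queued:.1f} ms in the queue")
print(find_bubbles(tasks))
```

With this data, task B waits only because the core is busy with A, whereas task C is queued at 5.0 ms but does not start until 9.0 ms even though the core is free from 6.0 ms. That 3 ms gap is the kind of bubble that a blocked dependency can cause.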

In addition to timeline queues, each synchronisation object used by the driver in the current profiling session will have its own separate timeline (not shown by default). This timeline will show the creation/deletion of the object, and when the object is checked and updated.
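Conceptually, a per-object timeline is the full set of synchronisation events grouped by the object they refer to. The sketch below shows that grouping on made-up data. The tuple layout, the object names, the event kinds, and the per_object_timelines function are all hypothetical and chosen only to mirror the create, check, update, and delete events mentioned above.

```python
from collections import defaultdict

# Hypothetical events: (time in ms, synchronisation object, event kind).
sync_events = [
    (0.0, "sync_a", "create"),
    (0.2, "sync_b", "create"),
    (1.0, "sync_a", "check"),
    (1.5, "sync_a", "update"),
    (1.2, "sync_b", "check"),
    (2.0, "sync_a", "delete"),
]


def per_object_timelines(events):
    """Group events by synchronisation object, each list ordered by time."""
    timelines = defaultdict(list)
    for time, obj, kind in sorted(events):  # tuples sort by time first
        timelines[obj].append((time, kind))
    return dict(timelines)


timelines = per_object_timelines(sync_events)
for obj, timeline in timelines.items():
    print(obj, timeline)
```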