When using a profiler to look into your programs, sometimes it feels like looking behind the stage of magician and suddenly grasping the trick behind the magic… Quite recently, I had an application in front of me, which demanded surprisingly much CPU time. In a nutshell, this application has some heavy computational operations in its core and (primarily) produces a rectangular 2D output image, which is rendered by QPainter to display the results. This output is updated once every few milliseconds and is embedded inside a QtQuick window. The handover of the rendered QImage is done by a harmless looking Q_PROPERTY.
So, I wondered: How big can the impact of handing over a QImage to the QSG renderer be? In particular — as we all know — copying a big chunk of memory is a CPU expensive operation which should be avoided if possible. For getting proper profiling results, I created a simple test application. This application just creates a QtQuick scene with a QQuickPaintedItem derived render object, which updates its output every millisecond (thus renders whenever the render-loop iterates). I use a big output rectangle of 640×640, because I want to focus on the memory copying effect, which is more obvious with bigger outputs.
QQuickPaintedItem::Image as render target for the QQuickPaintedItem object, on my computer I can see a quite constant 30% CPU usage (one core) and the following flamegraph when looking into the process with Perf (
sudo perf record --call-graph dwarf -p $(pidof qmlwithqpainter-experiment) -o perf-qimage.data) and visualizing the result with Hotspot:
Actually, this change is to be expected! We get rid of a quite expensive copy operation that has to be done on every image update. Let’s look into the QQuickPaintedItem documentation for confirmation:
When the render target is a QImage, QPainter first renders into the image then the content is uploaded to the texture. When a QOpenGLFramebufferObject is used, QPainter paints directly onto the texture.
So, what is the story I want to tell? — When facing performance problems inside an application, one can only guess until looking at the problem with a decent profiler (and Perf + Hotspot is an excellent combination for that!). And even then, you have to think about what your application is doing when much of CPU time is lost and ponder whether there are better code paths for your specific situations. In my example, the output still looks the same after the change, but note that I lost all of the fancy anti-aliasing of QImage and now resizing the output became a much more expensive operation.
Hence, for my scenario this change made sense, because the CPU usage drops from 30% to 11% and I do not need to support resizing operations. For other scenarios, this might be different.