Thursday, January 8, 2009

fillRect overhead analysis

Ever since I started working on the pipeline, I've been interested in how many cycles are spent in which parts.
Today I profiled fillRect a bit:
Protocol generation: 120 cycles (40%)
Pipeline overhead: 90 cycles (30%)
Locking/synchronization: 90 cycles (30%)
Total: 300 cycles with the server compiler (480 cycles with the client compiler; ~20,000 cycles interpreter-only)
Protocol generation is writing the X11 protocol data into a buffer via sun.misc.Unsafe.
Pipeline overhead is all the work done to validate pipeline/surface state and decide which code path to use for the current operation, as well as all the abstraction layers from Graphics2D down to our XRender surface.
Locking means acquiring/releasing a ReentrantLock, which guards AWT access.
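
To make the three cost centers a bit more concrete, here is a minimal sketch of what a locked fill path could look like. It is not the actual pipeline code: it uses a `ByteBuffer` where the real code writes through sun.misc.Unsafe, and it encodes the core X11 PolyFillRectangle request (opcode 70) purely as a stand-in for whatever XRender request the pipeline really emits. The class name, the lock field, the `surfaceValid` flag and the method signature are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.concurrent.locks.ReentrantLock;

final class FillRectSketch {
    // Hypothetical stand-in for the lock that guards AWT access.
    static final ReentrantLock AWT_LOCK = new ReentrantLock();

    // Hypothetical stand-in for the protocol buffer (the post writes via sun.misc.Unsafe).
    static final ByteBuffer OUT =
            ByteBuffer.allocateDirect(4096).order(ByteOrder.nativeOrder());

    // Hypothetical pipeline state that has to be validated per operation.
    static boolean surfaceValid = true;

    /** Encodes one rectangle and returns the number of bytes written. */
    static int fillRect(int drawable, int gc, int x, int y, int w, int h) {
        AWT_LOCK.lock();                          // locking/synchronization
        try {
            if (!surfaceValid) {                  // pipeline overhead: validate, pick code path
                throw new IllegalStateException("surface lost");
            }
            int start = OUT.position();           // protocol generation starts here
            OUT.put((byte) 70);                   // core PolyFillRectangle opcode
            OUT.put((byte) 0);                    // unused pad byte
            OUT.putShort((short) (3 + 2));        // request length in 4-byte units (1 rectangle)
            OUT.putInt(drawable);
            OUT.putInt(gc);
            OUT.putShort((short) x);
            OUT.putShort((short) y);
            OUT.putShort((short) w);
            OUT.putShort((short) h);
            return OUT.position() - start;
        } finally {
            AWT_LOCK.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes written: " + fillRect(1, 2, 10, 20, 100, 50));
    }
}
```

The sketch never flushes or resets the buffer, so it only works for a handful of calls; a real pipeline would hand the buffer to the X connection at some point.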

Conclusions:
- 300 cycles is not that good; however, we are probably generating rectangles faster than the XServer can process them :). With specific optimizations as well as biased locking, I guess 175 cycles is realistic, which is not that bad.
- The server compiler does pretty well; hopefully tiered compilation will be implemented for JDK7. In this case the client compiler produces 60% slower code (480 vs. 300 cycles) :-/
- Locking is expensive, especially on older multi-core processors (like my Core2Duo). Biased locking could really help here; unfortunately it has a limitation which makes it hard to use for the pipeline.
Furthermore, some optimizations seem to have no measurable effect when locking is done, but show an improvement of e.g. 10 cycles when no locking is done.
- The pipeline overhead could be lower, but it's not bad.
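
For anyone who wants to get a feeling for the locking share on their own machine, a rough sketch like the following compares a trivial operation with and without an uncontended ReentrantLock around it. This is not the harness used for the numbers above. The class, the `work` method and the iteration counts are made up for illustration, it reports nanoseconds rather than cycles, and a hand-rolled loop like this is easily distorted by the JIT, so treat the output as a ballpark only.

```java
import java.util.concurrent.locks.ReentrantLock;

final class LockCostSketch {
    static final ReentrantLock LOCK = new ReentrantLock();
    static long sink; // written by work() so the JIT cannot drop the loop body entirely

    // Hypothetical stand-in for the real per-rectangle work.
    static void work(int i) {
        sink += i ^ (i >>> 3);
    }

    /** Runs n operations without locking and returns the elapsed nanoseconds. */
    static long runUnlocked(int n) {
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            work(i);
        }
        return System.nanoTime() - t0;
    }

    /** Runs n operations, each wrapped in lock()/unlock(), and returns the elapsed nanoseconds. */
    static long runLocked(int n) {
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            LOCK.lock();
            try {
                work(i);
            } finally {
                LOCK.unlock();
            }
        }
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) {
        int n = 5_000_000;
        for (int warm = 0; warm < 5; warm++) { // give the JIT a chance to compile both loops
            runUnlocked(n);
            runLocked(n);
        }
        double unlocked = runUnlocked(n) / (double) n;
        double locked = runLocked(n) / (double) n;
        System.out.printf("unlocked: %.2f ns/op, locked: %.2f ns/op%n", unlocked, locked);
    }
}
```

Running it with `-server`, `-client` or `-Xint` (where the JVM supports those flags) should show the same kind of spread between compilers as in the totals above, although the absolute values will differ.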

Comments:

rkennke said…

Quite cool. But if we are generating rectangles faster than the backend can handle, it's all moot, right? I mean, there's no need to optimize something if we are waiting for the backend anyway, no?

JProgrammer said…

I think you might be referring to sun.misc.Unsafe not java.misc.Unsafe

Linuxhippy said…

JProgrammer: yes of course sun.misc.Unsafe, I somehow mixed that up ;)

rkennke:
On single-cores, every cycle consumed by the Java process is taken away from the XServer.

On multi-cores the benefit is a lot smaller, of course. If we are limited by the XServer, there's nothing that optimizations on the client can do.
However, if the bottleneck is on the Java side, like complex Swing drawing code where only e.g. 30% of the time is spent in Java2D, faster Java2D means more cycles left for that non-Java2D code.

I guess the focus should be on optimizing the X11 commands so the server can process them easily, e.g. rectangle/line batching, which in turn means more overhead on the Java2D side ;)
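
A minimal sketch of what rectangle batching could look like, assuming rectangles are simply collected in a buffer and sent as one request once the buffer is full or someone flushes explicitly. The class name, the batch size and the `requestsSent` counter are hypothetical, and the actual sending is left out.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class RectBatchSketch {
    static final int MAX_RECTS = 256;   // hypothetical batch size
    static final int RECT_BYTES = 8;    // x, y, width, height as 16-bit values

    private final ByteBuffer rects =
            ByteBuffer.allocate(MAX_RECTS * RECT_BYTES).order(ByteOrder.nativeOrder());

    int requestsSent;                   // how many batched requests were "sent"

    /** Queues one rectangle; flushes first if the batch is already full. */
    void fillRect(int x, int y, int w, int h) {
        if (rects.remaining() < RECT_BYTES) {
            flush();
        }
        rects.putShort((short) x).putShort((short) y)
             .putShort((short) w).putShort((short) h);
    }

    /** Sends all queued rectangles as a single request (sending itself is omitted). */
    void flush() {
        int count = rects.position() / RECT_BYTES;
        if (count == 0) {
            return;                     // nothing queued
        }
        // A real implementation would write one request header plus `count` rectangles here.
        requestsSent++;
        rects.clear();
    }

    public static void main(String[] args) {
        RectBatchSketch batch = new RectBatchSketch();
        for (int i = 0; i < 600; i++) {
            batch.fillRect(i, i, 10, 10);
        }
        batch.flush();
        System.out.println("requests sent: " + batch.requestsSent);
    }
}
```

The extra bookkeeping is exactly the additional Java2D-side overhead mentioned above, and a real version would presumably also have to flush whenever drawing state such as the color changes.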