Again I could not resist benchmarking my pipeline when I remembered the MigLayout benchmark.
I used it a few years ago to prove SWT's poor performance (inherited from GTK+), and I think it's quite a good Swing benchmark.
These were the results I got:
The X11 pipeline is really fast on XAA: whenever an operation falls back to software, it can manipulate the target pixmap directly through shared-memory (shm) pixmaps.
With EXA, however, pixmaps are stored in VRAM and shm pixmaps are not supported, so the sysprof profile is completely dominated by moving data to and from VRAM.
The real surprise was how well the XRender pipeline does when running on XAA, and how little EXA helps when running Nimbus. Xorg's profile looks quite good, however, and top reports that most of the time is spent in the java process itself (65% java, 30% Xorg), so either I am making too many JNI calls or some validation/transformation code eats up all the cycles.
With Ocean on EXA, most of the time is spent in gradients and text; text, at least, will improve a lot once Owen Taylor's glyph patches are in Xorg.
Profiling my pipeline during the benchmark showed that only a little time was spent in the pipeline itself, so I ran the same benchmark on my brother's computer (Sempron64 1.8 GHz, GeForce 6600, WinXP, ForceWare 9371 driver):
* OpenGL on Windows did not work at all with fbobject=true, and got stuck after 5s with fbobject=false.
* OpenGL on Linux showed artifacts and became slower with each run (6535 ms -> 33000 ms), even with the latest NVIDIA binary driver installed.
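For anyone wanting to reproduce the OpenGL runs: the fbobject setting above is the `sun.java2d.opengl.fbobject` system property, and the OpenGL pipeline itself is switched on with `sun.java2d.opengl=true`. Usually both are simply passed on the command line (`java -Dsun.java2d.opengl=true -Dsun.java2d.opengl.fbobject=false ...`). The following is only a minimal sketch; the `PipelineFlags` class and its methods are made up for illustration, and setting the properties programmatically assumes this happens before any AWT/Swing class initializes the graphics environment, since the pipeline generally reads them only once.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of the JDK or the benchmark): builds or applies
// the sun.java2d.* properties that select the OpenGL pipeline.
class PipelineFlags {
    static final String OPENGL = "sun.java2d.opengl";
    static final String FBOBJECT = "sun.java2d.opengl.fbobject";

    /** Command-line form, e.g. for a launcher script. */
    static List<String> openGlFlags(boolean useFbo) {
        return Arrays.asList("-D" + OPENGL + "=true",
                             "-D" + FBOBJECT + "=" + useFbo);
    }

    /** Programmatic form: assumed effective only before any AWT/Swing class is loaded. */
    static void applyOpenGl(boolean useFbo) {
        System.setProperty(OPENGL, "true");
        System.setProperty(FBOBJECT, Boolean.toString(useFbo));
    }

    public static void main(String[] args) {
        applyOpenGl(false);
        // The Swing benchmark would be started only after this point.
        System.out.println(String.join(" ", openGlFlags(false)));
    }
}
```

Passing the flags on the command line is the safer choice, because it cannot race with class initialization.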
The benchmark seems to stress Nimbus in a way it doesn't like, no matter which pipeline was used.
I am totally impressed by the Ocean result on D3D; keep in mind that this CPU is ~50% slower than mine.
Another thing I am not happy about is the poor MaskFill performance.
I thought the mask upload (XPutImage) would be the slow part, because it goes through suboptimal upload paths with considerable overhead, and the X server additionally has to migrate the data from system memory to VRAM.
It turned out that composition (with the mask already in VRAM) and mask upload (including migration) cost roughly the same.
Antialiasing relies heavily on MaskFill/MaskBlit (Nimbus), but I am not sure how much room for improvement is left. It would certainly help if the no-mask operations could be accumulated in the MaskBuffer, but that would require an API change.
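The idea of accumulating no-mask operations can be illustrated with a small, purely hypothetical batching sketch: solid fills are queued and sent to the backend as one batch, and a masked operation forces a flush first so drawing order is preserved. `FillBatcher` and its methods are invented for this example and are not the actual Java2D MaskBuffer API; the counters merely stand in for real backend round trips.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: queue solid (no-mask) fills and send them in one batch,
// so one backend round trip is paid per batch instead of per rectangle.
class FillBatcher {
    private final int capacity;
    private final List<int[]> pending = new ArrayList<>();
    private int flushCount = 0; // simulated backend round trips
    private int rectsSent = 0;  // rectangles handed to the backend so far
    private int maskOps = 0;    // masked operations issued

    FillBatcher(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
    }

    /** Queues a solid rectangle fill; flushes automatically once the batch is full. */
    void fillRect(int x, int y, int w, int h) {
        pending.add(new int[] {x, y, w, h});
        if (pending.size() == capacity) flush();
    }

    /** A masked op is not batched here, so queued fills go out first to keep drawing order. */
    void maskFill(byte[] mask) {
        flush();
        maskOps++; // the masked composite itself would be issued here
    }

    /** Sends all queued rectangles as a single simulated backend call. */
    void flush() {
        if (pending.isEmpty()) return;
        rectsSent += pending.size();
        flushCount++;
        pending.clear();
    }

    int flushCount() { return flushCount; }
    int rectsSent() { return rectsSent; }
    int maskOps() { return maskOps; }
}
```

With a capacity of 4, ten queued rectangles followed by an explicit `flush()` cost three round trips instead of ten. The trade-off is that every masked operation breaks the batch, and in a real pipeline state changes would likely do so as well.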