Wednesday, 23 July 2008

Benchmarking again...

MigLayout Benchmark:

Once again I could not resist benchmarking my pipeline, this time after remembering the MigLayout benchmark.
I used it a few years ago to prove SWT's poor performance (inherited from GTK+), and I think it's quite a good Swing benchmark.

These were the results I got:

Pipeline       Nimbus    Ocean
XRender/EXA    2100ms    1000ms
X11/EXA        8000ms    5800ms
XRender/XAA    2800ms    750ms
X11/XAA        2600ms    800ms

The X11 pipeline is really fast when using XAA, because when something falls back to software it can manipulate the target pixmap directly using shm pixmaps.
On EXA, however, pixmaps are stored in VRAM and shm pixmaps are not supported; the sysprof profile is completely dominated by moving data to and from VRAM.

The real surprise was how well the XRender pipeline does when running on XAA, and how little EXA helps when running Nimbus. However, Xorg's profile looks quite reasonable, and top says a lot of time is spent in the Java process itself (65% java, 30% Xorg), so either I am using JNI too much or some validation/transformation code eats up all the cycles.

Ocean on EXA spends most of its time in gradients and text; at least text will improve a lot once Owen Taylor's glyph patches are in Xorg.


Update:


I profiled my pipeline while running the benchmark, and only a little time was spent in the pipeline itself, so I ran the same benchmark on my brother's computer (Sempron64 1.8 GHz, GeForce 6600, WinXP, ForceWare 9371 driver):


Platform/Pipeline   Nimbus     Ocean
Linux/X11           4400ms     800ms
Linux/OpenGL        6535ms*    --------
Windows-D3D         4800ms     500ms

* OpenGL on Windows did not work at all with fbobject=true, and got stuck after 5s with fbobject=false.
* OpenGL on Linux showed artifacts and became slower with each run (6535ms -> 33000ms). The latest NVIDIA binary driver was installed.

The benchmark seems to stress Nimbus in a way it doesn't like, no matter which pipeline is used.
I am really impressed by the Ocean result running on D3D; keep in mind that this CPU is ~50% slower than mine.


MaskFill Performance:
Another topic I am not happy about is the poor MaskFill performance.


Configuration     BezierAnim   LineAnim
All               175fps       150fps
no mask upload    300fps       200fps
no composition    300fps       200fps
nothing           750fps       440fps

I thought the mask upload (XPutImage) would be the slow part, because it uses suboptimal upload paths with quite some overhead, and furthermore the X server has to migrate the data from sysmem to VRAM.
It turned out that composition (with a mask in VRAM) and mask uploading (plus migration) are almost equally expensive.
Antialiasing relies a lot on MaskFill/MaskBlit (Nimbus), but I am not sure how much room for improvement is left. It would certainly help if the no-mask operations could be accumulated in the MaskBuffer, but that would require an API change.
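To make the comparison between mask upload and composition easier to read, the fps values can be converted into time per frame. The following is only a rough sketch using the BezierAnim column from the table above; the class and method names are made up for illustration, and it assumes the cost of the two steps is roughly additive, which the measurements only approximately support.

```java
import java.util.Locale;

// Hypothetical helper, not part of the pipeline: converts the measured
// fps values into milliseconds per frame to estimate the cost of each step.
class MaskFillFrameTimes {

    // Milliseconds spent on one frame at the given frame rate.
    static double frameMillis(double fps) {
        return 1000.0 / fps;
    }

    public static void main(String[] args) {
        // BezierAnim results from the MaskFill table.
        double all = frameMillis(175);            // upload + composition enabled
        double noMaskUpload = frameMillis(300);   // mask upload disabled
        double noComposition = frameMillis(300);  // composition disabled
        double nothing = frameMillis(750);        // both disabled

        System.out.printf(Locale.ROOT, "full frame:          %.2f ms%n", all);
        System.out.printf(Locale.ROOT, "mask upload cost:    %.2f ms%n", all - noMaskUpload);
        System.out.printf(Locale.ROOT, "composition cost:    %.2f ms%n", all - noComposition);
        System.out.printf(Locale.ROOT, "remaining base cost: %.2f ms%n", nothing);
    }
}
```

For BezierAnim this gives about 5.7 ms for a full frame, with roughly 2.4 ms attributable to each of mask upload and composition, and about 1.3 ms left when both are disabled.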

Comments:

Dmitri said…

Note that Nimbus uses lots of clever caching tricks, and so improving AA performance may not necessarily give you the improvement you're looking for.

For example, most of the UI elements are rendered into VolatileImages once, and then copied/scaled from those cached images instead of being rendered every time. So by improving AA performance you'd only help the first "cache fill" stage.

Dmitri

Linuxhippy said…

For this case it seems most of the time is spent in Nimbus's caching logic; that's why the results are almost the same across all pipelines.

I also updated the values with the results of the D3D pipeline. Good job, the results are very impressive :)