Friday, June 20, 2008

Java2D uses scanlines internally to fill transformed shapes, and because those
shapes are used a lot, I did some benchmarking today on my laptop (C2D,
i945GM) to investigate which way of processing those scanlines is most
efficient.

I compared two ways of rendering a 100 pixel wide, 1 pixel high scanline 2500 times:
  • Draw the scanline as trapezoids to an alpha-mask, and later do one
    composition step using that mask (XRender generates an implicit alpha
    mask internally).
  • Render the scanlines one-by-one, without any mask.
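
To make the difference between the two strategies concrete, here is a rough software model in Python. It is not XRender code and says nothing about speed; the surface size, the blend helper and all function names are made up for illustration. It only shows that collecting the scanlines in an A8 mask and compositing once gives the same pixels as compositing each scanline on its own.

```python
# Toy software model of the two strategies; this is NOT XRender code.
WIDTH, HEIGHT = 100, 25  # hypothetical surface size, chosen for illustration


def over(dst, src, alpha):
    """Blend one 8-bit channel: src over dst with coverage alpha (0..255)."""
    return (src * alpha + dst * (255 - alpha) + 127) // 255


def fill_via_mask(dst, scanlines, color):
    """Accumulate all scanlines into an A8 mask, then composite once."""
    mask = [[0] * WIDTH for _ in range(HEIGHT)]
    for y, x0, x1 in scanlines:
        for x in range(x0, x1):
            mask[y][x] = 255
    # Single composition step over the whole surface, driven by the mask.
    return [[over(dst[y][x], color, mask[y][x]) for x in range(WIDTH)]
            for y in range(HEIGHT)]


def fill_separately(dst, scanlines, color):
    """Composite each scanline on its own, without an intermediate mask."""
    out = [row[:] for row in dst]
    for y, x0, x1 in scanlines:
        for x in range(x0, x1):
            out[y][x] = over(out[y][x], color, 255)
    return out


# One (y, x_start, x_end) span per row, roughly what a scanline converter
# would emit for a slanted shape. The spans do not overlap.
scanlines = [(y, y, min(WIDTH, y + 60)) for y in range(HEIGHT)]
background = [[40] * WIDTH for _ in range(HEIGHT)]

masked = fill_via_mask(background, scanlines, 200)
direct = fill_separately(background, scanlines, 200)
print("identical output:", masked == direct)
```

In the real benchmark the interesting part is where the work happens: the mask variant pays for mask generation plus one large composite, the other variant pays the per-call overhead 2500 times.
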
I also tested how much batching solid color fills is worth, to see whether
optimizing in this direction would pay off. I compared Fedora 8 to Fedora 9,
because EXA and the intel driver were in quite bad shape in Xorg-1.3.

1.) Fedora 9 x86_64, Xorg-1.5, intel-master with TTM:


Intermediate mask type    Time
A1 (1 bit alpha)          14ms
A8 (8 bit alpha)           4ms
Separate rendering*       16ms

* Separate rendering = many independent calls to XRenderComposite, each covering one scanline (width x 1 pixels)
* No mask = XRenderCompositeTrapezoids with mask type None


On this machine, rendering to an A8 mask and compositing with that mask yields
the best results.

2.) Fedora 8 i386, Xorg-1.3, Intel-2.1.1, EXA:


Intermediate mask type    Time
A1 (1 bit alpha)          100ms
A8 (8 bit alpha)          120ms
No mask*                   40ms
Separate rendering*        13ms

Way of filling            Time
Batched solid FillRects     1ms
Batched alpha FillRects    14ms
Single solid FillRects      8ms
Single alpha FillRects     35ms
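
For readers wondering what "batched" versus "single" means here: I am assuming it comes down to how many requests carry the rectangles to the server. A toy model in Python follows; `CountingServer` is a hypothetical stand-in and not an Xlib or XRender API, and the rectangle count and size simply reuse the scanline dimensions from above, which is also an assumption.

```python
class CountingServer:
    """Hypothetical stand-in for a display server; it only counts requests."""

    def __init__(self):
        self.requests = 0
        self.rects_filled = 0

    def fill_rectangles(self, rects):
        # One request, however many rectangles it carries.
        self.requests += 1
        self.rects_filled += len(rects)


# (x, y, width, height); 2500 rectangles of 100x1 pixels, assumed sizes.
rects = [(0, y, 100, 1) for y in range(2500)]

# "Single": one request per rectangle.
single = CountingServer()
for rect in rects:
    single.fill_rectangles([rect])

# "Batched": all rectangles in one request.
batched = CountingServer()
batched.fill_rectangles(rects)

print("single requests: ", single.requests)
print("batched requests:", batched.requests)
```

Both variants fill the same rectangles; only the number of round trips through the request path differs, which is the overhead batching tries to avoid.
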


Conclusions:
The results for composition are quite surprising.
* The mask-based approach performs terribly on Fedora 8, although I thought this technique should be fairly hardware/driver independent. Most of the time is spent inside libfb.so (arrrg, no symbols); maybe the driver is falling back to software completely, or mask generation is really that slow.
* The many-small-areas composition approach performs quite similarly on both systems (16ms vs. 13ms).
* On Fedora 9, as expected, using an A8 intermediate mask yields the best results, being 4 times faster than rendering many small pieces one-by-one.
* Batching solid color fills seems to speed things up a lot (1ms batched vs. 8ms single).
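
The ratios behind these conclusions can be recomputed from the measured times. A small Python sketch follows; the numbers are copied from the tables in this post, while the dictionary and function names are just for illustration.

```python
# Times in milliseconds, copied from the tables in this post.
fedora9 = {"A1": 14, "A8": 4, "separate": 16}
fedora8 = {"A1": 100, "A8": 120, "no mask": 40, "separate": 13}
fills = {
    "batched solid": 1,
    "batched alpha": 14,
    "single solid": 8,
    "single alpha": 35,
}


def speedup(slow_ms, fast_ms):
    """How many times faster the second measurement is than the first."""
    return slow_ms / fast_ms


# Fedora 9: A8 mask versus separate rendering.
a8_vs_separate_f9 = speedup(fedora9["separate"], fedora9["A8"])
# Fedora 8: the picture flips, separate rendering beats the A8 mask.
separate_vs_a8_f8 = speedup(fedora8["A8"], fedora8["separate"])
# Batching gain for solid and for alpha fills.
solid_batching = speedup(fills["single solid"], fills["batched solid"])
alpha_batching = speedup(fills["single alpha"], fills["batched alpha"])

print(f"Fedora 9, A8 mask vs. separate: {a8_vs_separate_f9:.1f}x")
print(f"Fedora 8, separate vs. A8 mask: {separate_vs_a8_f8:.1f}x")
print(f"Solid fills, batched vs. single: {solid_batching:.1f}x")
print(f"Alpha fills, batched vs. single: {alpha_batching:.1f}x")
```

So the A8 mask wins by a factor of 4 on Fedora 9 and loses by roughly a factor of 9 on Fedora 8, while batching gains a factor of 8 for solid fills and 2.5 for alpha fills.
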

I'll try the benchmark on other hardware to see whether Xorg-1.3 is the reason for
the slow masking (no problem), or whether the driver has to be highly tuned (quite bad).
I would prefer good performance across different drivers (not all GPUs will have highly optimized EXA drivers) over peak performance on some cards.
Hopefully I'll find a recent live CD with Xorg-1.5 and Nouveau included to test some NVIDIA hardware ;)
