Monday, 26 January 2009

Fosdem, Mozilla, RadeonHD, Intel

I'll talk a bit about the pipeline at FOSDEM 09.
I got a 30-minute slot assigned, no idea what I should talk about for half an hour ;)

Mozilla / RepeatPad:
The XRender pipeline has to use RepeatPad for transformed and scaled images; however, this is only accelerated by NVidia and Intel for now.
Mozilla is now discussing a switch to RepeatPad (it eliminates some artifacts and allows bilinear scaling), which has already led to a series of bug reports.
Update: the open-source radeon driver already has experimental RepeatPad acceleration for R100-R300 :)

AMD is currently working on XRender acceleration for their Radeon HD series in a separate git tree.
Hopefully it will be merged into mainline soon, and in the long term I hope it will make its way into the proprietary Catalyst drivers.

Intel has released 2.6.0 and, soon after that, 2.6.1. Hopefully their GEM refactoring is finished soon; the driver is in fairly bad shape right now.
Performance is bad and non-GEM-related bug fixing seems to have mostly stalled.
There's still an off-by-one-half bug on i915/i865 which makes Nimbus look ugly on my 945GM, as well as several performance regressions which affect antialiasing :-/

Thursday, 22 January 2009

No Software Patents! (again!)

Please sign the Stop Software Patents petition.

Nobody needs software patents, except patent trolls.
Writing code should not be a crime, but with those thousands of patents it's almost impossible not to infringe.

Examples of what can be patented in the EU:
- Selling over a mobile phone network - EP1090494
- Video streaming (segmented video on-demand) - EP633694
- Electronic shopping cart - EP807891

Thursday, 8 January 2009

fillRect overhead analysis

Ever since I started working on the pipeline I've been interested in how many cycles are spent in which parts.
Today I profiled fillRect a bit:
Protocol generation: 120 cycles (40%)
Pipeline overhead: 90 cycles (30%)
Locking/synchronization: 90 cycles (30%)
Total: 300 cycles with the server compiler (480 cycles with the client compiler; ~20,000 cycles interpreter-only)
Protocol generation means writing the X11 protocol data through sun.misc.Unsafe.
Pipeline overhead is all the work done to validate pipeline/surface state and decide which code path to use for the current operation, as well as all the abstraction layers from Graphics2D down to our XRender surface.
Locking means acquiring/releasing a ReentrantLock, which guards AWT access.
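
For reference, the arithmetic behind the breakdown can be checked with a few lines of Java. This is only a sketch: the class and method names are made up for illustration, and the cycle counts are simply the ones measured above.

```java
public class FillRectBreakdown {
    // Cycle counts from the profile above (server compiler).
    static final int PROTOCOL = 120;
    static final int PIPELINE = 90;
    static final int LOCKING = 90;

    // Sum of the three measured parts.
    static int total() {
        return PROTOCOL + PIPELINE + LOCKING;
    }

    // Share of one part in the total, in percent.
    static double share(int cycles) {
        return 100.0 * cycles / total();
    }

    // How much slower "slower" is compared to "faster", in percent.
    static double slowdownPercent(int slower, int faster) {
        return 100.0 * (slower - faster) / faster;
    }

    public static void main(String[] args) {
        System.out.printf("total: %d cycles%n", total());
        System.out.printf("protocol: %.0f%%%n", share(PROTOCOL));
        System.out.printf("pipeline: %.0f%%%n", share(PIPELINE));
        System.out.printf("locking: %.0f%%%n", share(LOCKING));
        // 480 cycles (client compiler) vs. 300 cycles (server compiler)
        System.out.printf("client compiler slowdown: %.0f%%%n",
                slowdownPercent(480, total()));
    }
}
```

The last line is where the "60% slower" figure for the client compiler comes from: 480 cycles against 300.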

- 300 cycles is not that great, but we are probably generating rectangles faster than the X server can process them :). With some specific optimizations as well as biased locking I guess 175 cycles is realistic, which is not that bad.
- The server compiler does pretty well; hopefully tiered compilation will be implemented for JDK7. In this case the client compiler produces 60% slower code :-/
- Locking is expensive, especially on older multi-core processors (like my Core2Duo). Biased locking could really help here; unfortunately it has a limitation which makes it hard to use for the pipeline.
Furthermore, it seems some optimizations don't have any effect when locking is done, but show e.g. a 10-cycle improvement when no locking is done.
- The pipeline overhead could be lower, but it's not bad.
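
To get a rough feeling for the locking cost on your own machine, something like the following could be used. It is a naive timing loop and not the profiling setup used for the numbers above: the class name and the trivial loop body are made up, it reports nanoseconds instead of cycles, and it only measures the uncontended case from a single thread.

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockOverheadSketch {
    // Updated inside the loop so the JIT cannot drop the work entirely.
    static long counter;

    // Average time per iteration in nanoseconds.
    // Pass null as lock to measure the same loop without locking.
    static double nanosPerOp(int iterations, ReentrantLock lock) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            if (lock != null) {
                lock.lock();
                try {
                    counter += i;
                } finally {
                    lock.unlock();
                }
            } else {
                counter += i;
            }
        }
        return (System.nanoTime() - start) / (double) iterations;
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        ReentrantLock lock = new ReentrantLock();
        // Warm-up runs so the JIT has compiled the loop before we measure.
        nanosPerOp(n, lock);
        nanosPerOp(n, null);
        double locked = nanosPerOp(n, lock);
        double unlocked = nanosPerOp(n, null);
        System.out.printf("with lock:    %.2f ns/op%n", locked);
        System.out.printf("without lock: %.2f ns/op%n", unlocked);
    }
}
```

The difference between the two numbers is only a ballpark for the lock/unlock pair; for anything serious a proper benchmark harness would be the better choice.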