The current pipeline performs a lot better than the old one when running on EXA (the new, default acceleration architecture of Xorg); however, the old pipeline running on XAA (almost no acceleration, pixmaps always in sysram) is faster than everything else.
Of course XAA won't matter for much longer: some distributions (Ubuntu, Fedora 9) have already switched to EXA by default for many drivers. Still, it looks a bit odd to have an old unaccelerated pipeline which is faster than the new accelerated one ;)
I already have some ideas on how to speed up fills in general (for strokes it's not that easy, unfortunately), and I am quite interested to see how it will work out. Furthermore, it would also solve the problem with extra alpha.
For fills the approach could be like this:
- Have a mask-pixmap with a fixed size (e.g. 512x512), A8 format
- Render geometry to this pixmap using XRenderFillRectangles, which is itself hw-accelerated and *really* fast
- (apply extra alpha to the mask image if necessary)
- Composite with the specified Texture- or GradientPaint. (For colors we can directly paint to the surface).
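To make the four steps concrete, here is a small CPU-only model of them in C. It is only a sketch of the idea, not the pipeline code: no X calls are made, the `Rect` type and all function names are made up for illustration, and the composite step assumes an opaque source and handles a single colour channel.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MASK_SIZE 512  /* fixed mask edge length, as in the example size above */

typedef struct { int x, y, w, h; } Rect;  /* hypothetical helper type */

/* Steps 1+2: clear the A8 mask, then fill each rectangle with full
 * coverage, clipped to the mask bounds. */
void mask_fill_rects(uint8_t *mask, const Rect *rects, int n)
{
    assert(mask != NULL && n >= 0);
    memset(mask, 0, (size_t)MASK_SIZE * MASK_SIZE);
    for (int i = 0; i < n; i++) {
        int x0 = rects[i].x < 0 ? 0 : rects[i].x;
        int y0 = rects[i].y < 0 ? 0 : rects[i].y;
        int x1 = rects[i].x + rects[i].w;
        int y1 = rects[i].y + rects[i].h;
        if (x1 > MASK_SIZE) x1 = MASK_SIZE;
        if (y1 > MASK_SIZE) y1 = MASK_SIZE;
        for (int y = y0; y < y1; y++)
            for (int x = x0; x < x1; x++)
                mask[y * MASK_SIZE + x] = 255;
    }
}

/* Step 3: scale every mask value by the extra alpha (0..255). */
void mask_apply_extra_alpha(uint8_t *mask, uint8_t extra_alpha)
{
    for (int i = 0; i < MASK_SIZE * MASK_SIZE; i++)
        mask[i] = (uint8_t)((mask[i] * extra_alpha + 127) / 255);
}

/* Step 4: blend one channel of an opaque source over the destination
 * through the mask: dst = src * m + dst * (1 - m). */
void composite_channel(uint8_t *dst, const uint8_t *src,
                       const uint8_t *mask, int count)
{
    for (int i = 0; i < count; i++) {
        int m = mask[i];
        dst[i] = (uint8_t)((src[i] * m + dst[i] * (255 - m) + 127) / 255);
    }
}
```

In the real pipeline the first and last step would map to XRenderFillRectangles and a composite with the mask picture, so none of these loops would run on the CPU.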
This approach would introduce some tiling (if the shape is larger than the fixed-size mask pixmap), but it has quite a few benefits:
- The mask is explicit, so we have control over its content (helpful for extra alpha)
- Always using the same mask removes the need for allocating implicit masks every time (but I guess Xorg optimizes this anyway)
- Rendering rectangles is really fast, and EXA supports doing this operation in hardware. Trapezoids are currently rendered in software and then sent down to VRAM.
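The tiling itself should be cheap to compute. A rough sketch in C of how a shape's bounding box could be split into mask-sized tiles, assuming the 512x512 example size from above (the `Tile` type and function names are hypothetical, not from the pipeline):

```c
#include <assert.h>

#define MASK_SIZE 512  /* assumed fixed mask edge length */

typedef struct { int x, y, w, h; } Tile;  /* hypothetical helper type */

/* Number of tiles needed along one axis: ceil(len / MASK_SIZE). */
int tiles_along(int len)
{
    assert(len > 0);
    return (len + MASK_SIZE - 1) / MASK_SIZE;
}

/* Returns the tile with the given row-major index for a w x h bounding
 * box whose top-left corner is at (bx, by). Tiles on the right and
 * bottom edge are cut down to the remaining size. */
Tile tile_at(int bx, int by, int w, int h, int index)
{
    int cols = tiles_along(w);
    int rows = tiles_along(h);
    assert(index >= 0 && index < cols * rows);

    int tx = (index % cols) * MASK_SIZE;  /* offset inside the box */
    int ty = (index / cols) * MASK_SIZE;

    Tile t;
    t.x = bx + tx;
    t.y = by + ty;
    t.w = (w - tx < MASK_SIZE) ? (w - tx) : MASK_SIZE;
    t.h = (h - ty < MASK_SIZE) ? (h - ty) : MASK_SIZE;
    return t;
}
```

A 1200x600 shape would need 3x2 tiles here, each one rendered into the same mask pixmap and composited before moving on to the next.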
For MaskFills I will experiment with uploading the image to X itself; it seems the lock/getrasinfo/unlock functions introduce quite some overhead (C->JNI->Java calls, locking and an XSync).
Furthermore, ~30% of CPU cycles when doing MaskFills are spent in malloc, although I don't malloc anything.
I hope I get sysprof working to see where all the cycles are wasted.