Tuesday, 10 November 2009

Multi-Threading Jules

I played with the idea of making Jules multi-threaded, by rasterizing the trapezoids using multiple threads.
For large shapes it does improve things quite a bit (on my Core2Duo notebook):



Hey, in one case Jules now even beats ductus :)

However, it reminded me again how hard it is to write well-performing multi-threaded code in the real world.
The outcome is a 2-thread producer/consumer implementation, where the consumer can produce for itself if it has caught up with the producer.
The first implementation (multi-unopt) simply fetched idle tiles from one pool and added rasterized tiles to another, with a volatile variable for communication. That makes 4 uncontended monitorenter/exit pairs plus a volatile read/write.
With that approach the synchronization overhead ate up almost all of the speedup, and in some cases I even saw ugly regressions.

I now fetch from and store to the idle/completed tile lists in batches in order to minimize synchronization costs, and I let the worker start a bit ahead of the consumer.
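
To illustrate why batching helps, here is a minimal sketch in Java. It is not the actual Jules code: `TilePool`, `BatchDemo`, the tile count and the batch size are all hypothetical names and values chosen for the example, and an `int[]` stands in for a rasterized tile. The producer hands tiles to the consumer either one at a time or in batches, and the pool counts how often the producer had to enter the monitor.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pool of completed tiles, guarded by its own monitor. */
final class TilePool {
    private final ArrayDeque<int[]> tiles = new ArrayDeque<>();
    private int producerLocks; // how often the producer entered the monitor

    /** Unbatched: one monitor enter/exit per tile. */
    synchronized void put(int[] tile) {
        producerLocks++;
        tiles.add(tile);
    }

    /** Batched: one monitor enter/exit for the whole batch. */
    synchronized void putAll(List<int[]> batch) {
        producerLocks++;
        tiles.addAll(batch);
    }

    /** Moves up to max tiles into out under a single lock; returns the count moved. */
    synchronized int takeBatch(List<int[]> out, int max) {
        int n = 0;
        while (n < max && !tiles.isEmpty()) {
            out.add(tiles.poll());
            n++;
        }
        return n;
    }

    synchronized int producerLocks() { return producerLocks; }
}

final class BatchDemo {
    static final int TILES = 1024; // assumed workload size
    static final int BATCH = 32;   // assumed batch size

    /** Returns { producer-side lock acquisitions, checksum of received tiles }. */
    static long[] run(boolean batched) {
        TilePool completed = new TilePool();
        Thread producer = new Thread(() -> {
            List<int[]> local = new ArrayList<>(BATCH);
            for (int i = 0; i < TILES; i++) {
                int[] tile = { i }; // stand-in for a rasterized tile
                if (!batched) {
                    completed.put(tile);
                } else {
                    local.add(tile);
                    if (local.size() == BATCH) {
                        completed.putAll(local);
                        local.clear();
                    }
                }
            }
            if (!local.isEmpty()) completed.putAll(local); // flush the remainder
        });
        producer.start();

        List<int[]> got = new ArrayList<>(BATCH);
        long checksum = 0;
        int received = 0;
        while (received < TILES) {
            got.clear();
            int n = completed.takeBatch(got, BATCH);
            if (n == 0) { Thread.yield(); continue; } // nothing ready yet
            for (int[] t : got) checksum += t[0];
            received += n;
        }
        try {
            producer.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return new long[] { completed.producerLocks(), checksum };
    }
}
```

With these assumed sizes, `BatchDemo.run(false)` makes the producer enter the monitor 1024 times, while `BatchDemo.run(true)` needs only 32 entries for the same tiles. The busy-waiting consumer is only there to keep the sketch short; a real implementation would block or, as described above, rasterize tiles itself while it waits.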

I am still not sure whether the whole effort makes much sense, since it makes the whole thing quite complex.
After all, it was fun ;)

I wonder how much speedup can be achieved by using profile-driven optimizations and -O3 when compiling pixman and Jules. I guess around 10-20%, which could be enough to push threaded Jules in front of ductus.

Comments:

Xerxes Rånby said…

Thank you for doing this!

I look forward to running Jules on my ARM PC-Z1 NetWalker.
Impressive work!

Cheers and have a great day!
Xerxes

Linuxhippy said…

Xerxes, I guess you are the first person actually trying out that stuff :)

Hope it will be useful for you. (and work on ARM)

Thanks, Clemens

cl333r said…

Can you please say which renderer (ductus, yours, or another) the JDK 7 nightly builds from Sun's site use?

http://download.java.net/jdk7/binaries/

If not your solution, when do you plan pushing it into mainline?

Linuxhippy said…

cl333r:

Sun's (proprietary) builds use ductus by default. OpenJDK uses Pisces.

The problem with pushing this into mainline is that it requires a private version of cairo, and I am quite sure Sun is not willing to integrate this into the official OpenJDK code (although I haven't asked them).
My hope is to get it into IcedTea as an optional feature some day.

cl333r said…

Thanks,
If Cairo alone added less than (about) 2MB to the JRE, I'd certainly go for your solution, because Pisces makes for a bad Java experience on a Linux desktop (compared to Windows), and adding (about) 2MB to gain such a big performance improvement is a very good trade-off imho.
Besides, if the Cairo devs see that Java is about to ship with a custom build of Cairo, they're probably much more likely to respond positively to your need for slight changes to Cairo, which would release the Unix version of Java from this slight burden.
I think you should try asking Sun and the Cairo devs the corresponding questions. I guess their replies would give you a better clue whether you're going in the “right” direction, and maybe some of the Cairo folks would even be willing to help you somehow.

Linuxhippy said…

cl333r: Currently it adds ~300k on x86.

However, only OpenJDK suffers from Pisces (the Sun bundles include ductus), and OpenJDK is almost entirely distributed by Linux distributors. So I guess it would be a lot easier to ask the distributors, and get the same end result.

cl333r said…

Wow (only) 300K is even better.
It's been over 2 years since Java went open source, but almost every Desktop Linux user (including me) still uses Sun's proprietary version of Java to this day for a few reasons, the main one being graphics quality. Thus, if your solution goes in, the usage of the open source Java version will grow significantly. I hope you manage to push your solution before the next wave of Linux distros arrives next spring. Btw, regarding Cairo, there was a meeting/event on Cairo about a week ago where its developers agreed on committing serious changes to Cairo to improve video playback with Cairo etc. I hope you heard of it; maybe it's the right time to ask those devs about your changes to Cairo along the way. More info here:
http://www.phoronix.com/scan.php?page=news_item&px=NzcyOA

Endre Stølsvik said…

What is that jules, pisces and ductus stuff? Do you have some quick links that could fill me slightly in? Thanks!

Regarding threading - have you followed the concurrency-interest list? The JSR166 "y" stuff? You could maybe just use that directly. Or implement some logic from there. Apparently "work stealing" is a better way to handle sharing of work than a single queue which each thread serves itself from: You quickly divide the work roughly among the threads, then if one thread runs empty, it goes to another thread and "steals" work - from the BACK of that thread's queue. Therefore, you typically end up with uncontended and "lock free" distribution of work. Just a thought.
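
As an illustration of the work-stealing idea described above, here is a minimal sketch using `java.util.concurrent.ForkJoinPool`, which grew out of the JSR166y work and is part of the JDK from Java 7 on. It is not Jules code: the `RasterizeRange` class, the threshold and the stand-in "rasterization" loop are hypothetical and only show the shape of the technique.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Hypothetical task that "rasterizes" the tiles in [from, to). */
final class RasterizeRange extends RecursiveTask<Long> {
    private static final int THRESHOLD = 64; // assumed: ranges this small are not split further
    private final int from;
    private final int to;

    RasterizeRange(int from, int to) {
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long covered = 0;
            for (int tile = from; tile < to; tile++) {
                covered += tile % 7; // stand-in for real rasterization work
            }
            return covered;
        }
        int mid = (from + to) >>> 1;
        RasterizeRange left = new RasterizeRange(from, mid);
        RasterizeRange right = new RasterizeRange(mid, to);
        left.fork();                    // queued on this worker's deque; an idle worker may steal it
        long rightResult = right.compute(); // keep working on the other half locally
        return rightResult + left.join();
    }

    /** Runs the whole range on a pool with the given number of worker threads. */
    static long rasterizeAll(int tiles, int threads) {
        ForkJoinPool pool = new ForkJoinPool(threads);
        try {
            return pool.invoke(new RasterizeRange(0, tiles));
        } finally {
            pool.shutdown();
        }
    }
}
```

A call such as `RasterizeRange.rasterizeAll(10000, 2)` splits the range recursively. Each worker mostly processes its own queue, and queues are only shared when a worker runs out of work, which is what keeps contention low.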

Linuxhippy said…

Hi Endre,

> What is that jules, pisces and ductus stuff?
"Pisces" is the open-source replacement for its proprietary counterpart "ductus".
It rasterizes shapes (turns geometric definitions into pixels).
Jules aims to become a replacement for Pisces, because Pisces has some performance problems.

> Apparently "work stealing" is a
> better way to handling sharing of work
It is currently implemented in a simplified way, but what it does is more or less work stealing.
I haven't had a look at JSR166 yet, but I plan to in the long term.

Unfortunately it crashes quite mysteriously somewhere in JVM code when multithreading is enabled, so I have to find out what's going on before I can enable it by default :/