The results are quite pleasing:

At least for large operations, Jules is now really close to ductus, although it's burning significantly more CPU cycles using two threads.
I am still not sure the whole multi-threading approach is really a good idea; however, the fill optimization mentioned applies to the already released single-threaded version as well :)
