Friday 22 December 2006
Parallelism
This is a huge huge problem because after fifty years of research we still don't have good ways to write parallel software. Threads and locks are very difficult to use correctly, don't work when you combine software components, and don't even scale well in performance. Language-level transactions are one possible answer but we don't have language and runtime support for them in mainstream systems. (At IBM I worked on the design of X10, a programming language designed for highly concurrent and distributed programming that used transactions as one of its building blocks.)
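To make the "don't work when you combine software components" point concrete, here is a minimal Python sketch. The Account class and transfer function are hypothetical names invented for illustration. Each account is thread-safe on its own, but an atomic transfer cannot simply call withdraw and then deposit: it has to reach inside both components, take both locks, and follow a global lock order that neither component knows about.

```python
import itertools
import threading

_ids = itertools.count()  # hypothetical global ordering for locks


class Account:
    """A component that is thread-safe when used on its own."""

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()
        self.id = next(_ids)

    def deposit(self, amount):
        with self.lock:
            self.balance += amount

    def withdraw(self, amount):
        with self.lock:
            self.balance -= amount


def transfer(src, dst, amount):
    """Move money atomically between two accounts.

    Calling src.withdraw() then dst.deposit() would expose a state in
    which the money is in neither account. Holding both locks fixes
    that, but taking them in argument order can deadlock when another
    thread transfers in the opposite direction, so the locks are taken
    in a fixed global order (by id) instead.
    """
    if src is dst:
        return
    first, second = sorted((src, dst), key=lambda a: a.id)
    with first.lock, second.lock:
        # The locks are not reentrant, so the thread-safe methods
        # cannot be reused here; the balances are updated directly.
        src.balance -= amount
        dst.balance += amount
```

Note that transfer cannot be built out of the existing deposit and withdraw methods, which is the composition problem in miniature.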
Even if we get language-level transactions, we still have several more problems to deal with. Reworking large legacy codebases to use them will be a ton of work, and often impractical. Then there's the problem that many tasks are intrinsically very difficult or impossible to parallelize. In particular, many higher-level programming models (such as, oh, say, Javascript on the Web) assume single-threadedness.
For Mozilla, we need to be thinking about how we can exploit multiple cores in the browser to transparently accelerate existing Web content. We also need to be thinking about how we can add transaction support to Javascript to enable Web developers to exploit parallelism --- they don't ask for this today, but it will take a few years to do it right and by then, they'll probably be begging for it.
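As a rough idea of what optimistic transactions look like to the programmer, here is a toy Python sketch. It is not X10's design or any proposed Javascript API; TVar and atomically are hypothetical names, and the sketch covers only a single variable. A real transactional system would track read and write sets across many variables.

```python
import threading


class TVar:
    """A versioned cell (hypothetical name, for illustration only)."""

    def __init__(self, value):
        self.value = value
        self.version = 0


_commit_lock = threading.Lock()


def atomically(tvar, update):
    """Apply update(old_value) to tvar as an optimistic transaction.

    The update function runs without holding any lock. At commit time
    the version is checked; if another thread committed in between,
    the work is thrown away and retried.
    """
    while True:
        with _commit_lock:
            seen_version, seen_value = tvar.version, tvar.value
        new_value = update(seen_value)  # speculative work, no lock held
        with _commit_lock:
            if tvar.version == seen_version:
                tvar.value = new_value
                tvar.version += 1
                return new_value
        # Conflict detected: loop and retry with fresh data.
```

The caller writes only the update function and never names a lock, which is the property that makes transactions easier to combine than threads and locks.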
For open source in general, we need to be thinking about how the open source stack is going to evolve to incorporate more parallelism. If we can't get useful transaction support into C or C++ then two things will happen: first, most performance-sensitive projects will soon have to move away from them except for projects that can afford to invest in hairy threads-and-locks parallelism. Then, when chips with dozens of cores become popular, even those projects will have to migrate. I'm talking mostly about clients here, not servers; servers are easier to parallelize because one server usually serves many independent clients, so you can parallelize over the clients --- and of course people have been doing that for a long time.
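The server case is easy enough to sketch in a few lines of Python. The handle_request and serve functions are hypothetical; the assumption is that each request touches only its own client's data, so no locking is needed.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_request(client_id):
    """Hypothetical handler that uses only this client's data."""
    return f"response for client {client_id}"


def serve(client_ids, workers=4):
    """Handle independent clients in parallel, one task per client."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(handle_request, client_ids))
```

A client application rarely decomposes this cleanly, because its tasks tend to share one document, one UI and one user.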
I think there's a huge opportunity here for a client software mass extinction followed by a Cambrian explosion. There's a new environmental pressure; projects that aren't agile enough to adapt quickly will be replaced by more agile projects, or by entirely new projects, perhaps ones that rewrite from scratch to take advantage of new languages. Exciting times.
Comments
You're wondering how you can do it; maybe you should ask if you should.
Ok, maybe that's a bit harsh, but I wanted to make the point that there's value in limiting applications to a single thread (or at least not using *every* core available).
Intel didn't invent speculative multithreading. Researchers such as Todd Mowry at U of Toronto and CMU did. Speculative multithreading is very cool but the speedups haven't been all that great. I think the hardware support for tracking data dependencies for speculative multithreading will be very useful to assist optimistic explicit transactions, though.
http://img242.imageshack.us/img242/8065/parallelismdk8.gif
- A thread for interface and menu rendering.
- A thread for the text rendering inside every tab.
- A subthread for every image rendering inside every tab.
- A subthread for every Flash animation inside every tab (start the Flash plugin multiple times?)
...
and so on.
But I assume that separating all those tasks into threads is hard work.
(When Intel's CSI bus gets released, both AMD and Intel should have bus architectures that support MOESI, which ought to make things easier to extend beyond just cache coherency, to tracking other facets of data coherency and dependency issues across multiple CPUs as well as cores. That's a few years out still, I think CSI isn't due until 2008, and it's been delayed several times already.)
BTW, there's an interesting short article on Valve's new multi-threaded Source engine at The Tech Report. Apparently the multi-threading middleware is quite flexible and cleverly designed.
http://techreport.com/etc/2006q4/source-multicore/index.x?pg=1
I have found Herb Sutter's ideas from the Concur project interesting. He proposes a way to make it really easy to do n-way multithreading, by use of "active" method calls and code sections and "futures" as return values. He also mentions declarative locking, where you declare what data a lock protects and an ordering of the locks, so the compiler can ensure that all locks are taken before data is accessed and that the locks are taken in the correct order. I couldn't find a project web page, but here is a video presentation:
http://www.nwcpp.org/Meetings/2006/09.html
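For readers who haven't met futures, here is a small Python sketch of the general idea using the standard concurrent.futures module. This is not Concur's syntax, and word_count and total_words are hypothetical names: each call is started asynchronously and hands back a future, and the caller blocks only when it actually asks for the result.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    """Hypothetical unit of work with no shared state."""
    return len(text.split())


def total_words(texts):
    """Start one asynchronous call per text, then collect the results."""
    with ThreadPoolExecutor() as pool:
        # submit() returns immediately with a Future for each call.
        futures = [pool.submit(word_count, t) for t in texts]
        # result() blocks until that particular call has finished.
        return sum(f.result() for f in futures)
```

This works neatly here because word_count reads only its own argument; nothing is shared between the calls.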
With a decent JIT being added to Mozilla, would it be possible to move much more of the code from C++ to Javascript? Javascript seems like a more controlled environment to do language innovation and optimizations in.
I don't trust anyone who says they have a silver bullet that makes multithreading "really easy". Futures certainly aren't it. They're a useful concurrency primitive but there's still the huge problem of state shared across threads.
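A small Python sketch of why futures alone don't settle it (count_hits is a hypothetical function): as soon as the tasks behind the futures update the same data, the locks are back.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def count_hits(n_tasks, hits_per_task):
    """Run n_tasks futures that all update one shared counter."""
    hits = {"total": 0}
    lock = threading.Lock()

    def record():
        for _ in range(hits_per_task):
            # The read-modify-write below is not atomic, so the
            # shared counter still has to be protected by a lock.
            with lock:
                hits["total"] += 1

    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(record) for _ in range(n_tasks)]
        for f in futures:
            f.result()  # wait, and re-raise any exception from the task
    return hits["total"]
```

The future tells you when a task has finished, but it says nothing about what that task touched while it was running.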
You're right about Javascript.