Eyes Above The Waves

Robert O'Callahan. Christian. Repatriate Kiwi. Hacker.

Friday 22 December 2006

Parallelism

Here's an interesting interview with Hennessey and Patterson, the famous hardware architecture researchers. Nothing particularly surprising or new, but they remind me of the importance of the parallelism revolution that is happening right now. Namely, processors aren't getting much faster anymore; instead, we're all getting multicore chips instead. Software that doesn't take advantage of this parallelism will be slower than software that does.

This is a huge huge problem because after fifty years of research we still don't have good ways to write parallel software. Threads and locks are very difficult to use correctly, don't work when you combine software components, and don't even scale well in performance. Language-level transactions are one possible answer but we don't have language and runtime support for them in mainstream systems. (At IBM I worked on the design of X10, a programming language designed for highly concurrent and distributed programming that used transactions as one of its building blocks.)

Even if we get language-level transactions, we still have several more problems to deal with. Reworking large legacy codebases to use them will be tons of work, and often impractical. Then there's the problem that many tasks are intrinsically very difficult or impossible to parallelize. In particular many higher-level programming models (such as oh say Javascript on the Web) assume single-threadedness.

For Mozilla, we need to be thinking about how we can exploit multiple cores in the browser to transparently accelerate existing Web content. We also need to be thinking about how we can add transaction support to Javascript to enable Web developers to exploit parallelism --- they don't ask for this today, but it will take a few years to do it right and by then, they'll probably be begging for it.

For open source in general, we need to be thinking about how the open source stack is going to evolve to incorporate more parallelism. If we can't get useful transaction support into C or C++ then two things will happen: first, most performance-sensitive projects will soon have to move away from them except for projects that can afford to invest in hairy threads-and-locks parallelism. Then, when chips with dozens of cores become popular, even those projects will have to migrate. I'm talking mostly about clients here, not servers; servers are easier to parallelize because one server usually serves many independent clients, so you can parallelize over the clients --- and of course people have been doing that for a long time.

I think there's a huge opportunity here for a client software mass extinction followed by Cambrian explosion. There's a new environmental pressure; projects that aren't agile enough to adapt quickly will be replaced by more agile projects, or by entirely new projects, perhaps ones that rewrite from scratch to take advantage of new languages. Exciting times.



Comments

Flenser
Please don't parallelize Firefox (or at lease give us the option to turn it off). I'd like to be able to use other applications while I load webpages that thrash the CPU. If you parallelize it then it might thrash more than one core and I'm back in the same boat. For me multi-core means that I'll finally be able to run winamp (with vis), Firefox, a dozen other things and compile apps without having the app I'm currently in slowing down (and therefore slowing me down) because of the others.
You're wondering how you can do it; maybe you should ask if you should.
Ok, maybe that's a bit harsh, but I wanted to make the point that there's value in limiting applications to a single thread (or at least not using *every* core available).
Robert O'Callahan
That's simply a matter of proper process priorities and OS scheduling. If that's done right, having the application parallelized shouldn't hurt anything.
Robert O'Callahan
Another commenter mentioned Mitosis, an Intel project (sorry, the comment got lost because I had a duplicate post in here). Mitosis does speculative multithreading.
Intel didn't invent speculative multithreading. Researchers such as Tood Mowry at U of Toronto and CMU did. Speculative multithreading is very cool but the speedups haven't been all that great. I think the hardware support for tracking data dependencies for speculative multithreading will be very useful to assist optimistic explicit transactions, though.
Justin
This has little to do with the actual content of your post, but I was fairly amused how my feed reader caught and displayed both copies of your post:
http://img242.imageshack.us/img242/8065/parallelismdk8.gif
Stupid
I'm not a programmer, just a simple user (that means I can say something dumb), but I think one simple way (architecturally speaking) would be to assign a thread for the following tasks:
- A thread for interface and menu rendering.
- A thread for the text renderind inside every tab.
- A subthread for every image rendering inside every tab.
- A subthread for every flash animation inside every tab (start Flash plugin multiple times?)
...
and so on.
But I assume that separating all those tasks in threads it's a hard work.
Fredrik
No Mitosis doesn't provide any huge speedups, but it is an interesting actual implementation of the concept of speculative multi-threading. I think it's inevitable that we see some extension to x86 for multi-threading sooner rather than later.
(When Intel's CSI bus gets released, both AMD and Intel should have bus architectures that support MOESI, which ought to make things easier to extend beyond just cache coherency, to tracking other facets of data coherency and dependency issues across multiple CPUs as well as cores. That's a few years out still, I think CSI isn't due until 2008, and it's been delayed several times already.)
BTW, there's an interesting short article on Valve's new multi-threaded Source engine at The Tech Report. Apparently the multi-threading middleware is quite flexible and cleverly designed.
http://techreport.com/etc/2006q4/source-multicore/index.x?pg=1
Robert O'Callahan
Fredrik: unfortunately that doesn't tell us very much other than that Valve is using a mix of task and data parallelism, and they like lock-free synchronization. That's probably going to be the same for any similar application that can afford the investment.
Anders
When computers in a few years will have 32-128 cores it would be insane for Mozilla to not run multi-threaded and therefor only to use a few percent of the cpu. So something has to be done.
I have found Herb Sutters ideas from the Concur project interesting. He proposes a way to make it really easy to do n-way multithreading, by use of "active" method calls and code sections and "futures" as return values. He also mentions declarative locking, where you declare what data a lock protects and an ordering of the locks, so the compiler can ensure that all lock are taken before data is accessed and that the locks are taken in the correct order. I couldn't find a project web page, but here is a video presentation:
http://www.nwcpp.org/Meetings/2006/09.html
With a decent jit being added to mozilla, would it be posible to move much more of of the code from c++ to javascript? Javascript seems like a more controlled environment to do language innovation and optimizations in.
Robert O'Callahan
Declarative locking is interesting, but locking just doesn't scale to a large number of cores in general.
I don't trust anyone who says they have a silver bullet that makes multithreading "really easy". Futures certainly aren't it. They're a useful concurrency primitive but there's still the huge problem of state shared across threads.
You're right about Javascript.
Luke Gedeon
I would love to see each core operate independently and communicate via XML over IP when necessary. I also am in favor of moving some of that horse-power over to the monitor. I talk a little about this at http://luke.gedeon.name/is-it-really-web-30-or-just-25.html
Mike
I suspect that allowing people to write more functional-style code in imperative languages combined with compilers that can "notice" this and automatically multi-thread it will also play a big part. If you are willing to pay a memory overhead, you can often rewrite code in terms of data structure transformations rather than mutating shared state, and replacing a pointer from one tree to another is atomic.