Saturday, 19 May 2007

Status

I've fixed a number of bugs in the new textframe code. It's now very usable; I've been dogfooding it most of this week and some other people have too. I think we're on track for turning it on in the next few weeks. Set MOZ_ENABLE_NEW_TEXT_FRAME = 1 in layout/generic/Makefile.in for a good time. There are still some problems; for example, word selection is a little bit broken.

Much of this week I was looking at white-space handling. I've been cleaning up tricky issues like where we allow line breaks in white-space:pre-wrap text. A big item in the last two days was handling tabs in pre/pre-wrap text; our existing code expanded each tab into a run of space characters whose length depended on how many characters were already in the line. CSS 2.1 wants us to advance to tab stops instead, which works better with proportional fonts. I've implemented that, among other fixes, so we should now be very compliant with CSS 2.1 white-space.
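To make the tab-stop idea concrete, here's a small sketch (the function name and units are invented for illustration; the real code works on font metrics, not abstract units):

```python
TAB_STOP = 8  # CSS 2.1 suggests tab stops every 8 "space advance" units

def advance_to_tab_stop(current_advance, space_advance, tab_stops=TAB_STOP):
    """Return the new line advance after a tab: snap to the next tab stop.

    current_advance: width of the line so far
    space_advance: advance width of the space glyph in the current font
    """
    stop = tab_stops * space_advance
    # Snap to the next multiple of `stop` strictly greater than where we are.
    return (current_advance // stop + 1) * stop
```

With a proportional font this keeps columns aligned at fixed positions, whereas expanding tabs to N space characters drifts as soon as the preceding characters have varying widths.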

One remaining issue is that currently our white-space collapsing pass uses only backward context; for example, we collapse a space that has a space before it. We can't, however, collapse spaces that are followed by, say, a line feed, as support for white-space:pre-line would require; that needs knowledge of the forward context. Similarly, we should translate a newline between two CJK characters to a zero-width space, but that also requires forward context in general. (Right now we do this only when both CJK characters are in the same text node.) Backward context is easy to implement because you can use a little state machine as you traverse the DOM or text frame tree or whatever. Supporting both directions requires something significantly more elaborate. I haven't decided whether it's worth doing right now, but now would be the easiest time to do it.
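To illustrate why the backward-only case is easy, here's a toy one-pass collapser (not the actual Gecko code): it only ever consults state about what came *before* the current character, so there's no way for it to handle the pre-line case, which needs to know what comes next.

```python
def collapse_whitespace(text):
    """One-pass collapser using only backward context: a white-space
    character is dropped when the previously kept character was also
    white-space. Collapsing a space *because a line feed follows it*
    (as white-space:pre-line needs) is impossible in this structure."""
    out = []
    prev_was_space = True  # treat start-of-text as white-space context
    for ch in text:
        if ch in " \t\n":
            if not prev_was_space:
                out.append(" ")
            prev_was_space = True
        else:
            out.append(ch)
            prev_was_space = False
    return "".join(out)
```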

I still need to do some performance tests on all three platforms to see what the effect is. Now all the pieces are in place ... in particular the new textrun cache is in. Even with the old text frame that cache sped up Tinderbox pageload performance by 3-10%. Some of that speedup was because the old textrun cache did a lot of PR_IntervalNow calls, and get-current-time calls are particularly slow in VMs because they'll trap to the virtual-machine monitor. But there was real speedup there too.

Vlad and Pav are arriving tomorrow to spend a couple of weeks here. This'll be exciting... First we're going to have to get some more furniture :-). I hope the weather's good! You never know in Auckland :-).

Hiring's been disappointing lately. If anyone reading this knows anyone in NZ who'd be good to work on Firefox, let me know!!!

I switched from Parallels to VMware Fusion beta 3 because VMware's support for Linux is so much better. Fusion's working great so far ... except that Windows builds are even slower than they were in Parallels, which was already slow. I just discovered that our "mkdepend.exe" tool, which crawls C/C++ code to find #include dependencies on Windows, is incredibly slow and is actually eating most of the build time. It looks like it's doing tons of stat() calls that are really slow, perhaps because they're often going to VMware's virtual filesystem. I plan to take a quick look to see if there's some easy optimization that would help.

On the Chronicle front, there's a steady stream of people downloading the tarball. It's up to 161 downloads; not bad for something that doesn't really do anything useful. In snatches of time at airports and at home, I've gradually been putting an Eclipse UI together; it's actually pretty easy. Writing Java code in Eclipse again is fun.

People were quite interested in Chronicle when I talked about it at the OSQ retreat and before that, with the people gathered for the OOPSLA committee meeting. At the OSQ retreat I was amused to find myself an "industry representative" --- in fact, as the primary "open source" representative. It is nominally an "Open Source Quality" meeting, after all :-). I pushed the message that although all the research on bug finding is great, the truth is that in our domain we have already found plenty of bugs, and our biggest problem is fixing them. I suggested that therefore we need more research on topics such as debugging and performance analysis. I also suggested that the worst code is not necessarily buggy code, but code that is unnecessarily complex. Detecting that would be an interesting new direction for program analysis.

There was a great deal of discussion of parallel programming, now that we have entered the multicore world. More than one person opined that multicore is a mistake we will live to regret --- "we don't know what the right way is to use the transistors, but multicore is definitely wrong". There was general agreement that the state of parallel programming models, languages and tools remains pathetic for general-purpose single-user programs and no breakthrough should be expected. My position is that for regular desktop software to scale to 32 cores by 2011 (as roadmaps predict) we'd have to rewrite everything above the kernel, starting today, using some parallel programming model that doesn't suck. Since that model doesn't exist, it's already too late. Probably we will scale out to a handful of cores, with some opportunistic task or data parallelism, and then hit Amdahl's law, hard. It is probably therefore more fruitful to focus on new kinds of applications which we think we have reasonable ideas for parallelizing. I think virtual worlds (which are not just "games", people!) are a prime candidate. That's a direction I think the PL/software engineering research community should be pushing in.



13 comments:

  1. On the issue of backward versus forward context: I have a suggestion which may or may not be relevant to your particular situation. One possibility, which is fairly fast, fairly simple, and fairly memory-hungry, is to start with a right-to-left pass, and store just enough information at each character to make the left-to-right pass able to know what to do.
    For the two examples given above, you could allocate a new structure with one byte per source character, and fill it in on a right-to-left pass, where one bit in each byte would mean "the next character is a line feed" and another bit would mean "the next character is a CJK character".
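    A toy version of that right-to-left pass might look like this (the flag names are invented for illustration, and the CJK test is deliberately oversimplified):

```python
NEXT_IS_LINEFEED = 0x01
NEXT_IS_CJK = 0x02

def is_cjk(ch):
    # Real code would cover all CJK ranges; this checks only the
    # CJK Unified Ideographs block, for brevity.
    return "\u4e00" <= ch <= "\u9fff"

def build_forward_context(text):
    """Right-to-left pass: fill in one flag byte per character
    describing what follows it, so that a subsequent left-to-right
    pass effectively has forward context available."""
    flags = bytearray(len(text))
    for i in range(len(text) - 1, -1, -1):
        if i + 1 < len(text):
            nxt = text[i + 1]
            if nxt == "\n":
                flags[i] |= NEXT_IS_LINEFEED
            if is_cjk(nxt):
                flags[i] |= NEXT_IS_CJK
    return flags
```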

  2. Actually, there is a parallel programming model that doesn't suck. Check out Erlang, whose pure message-passing model is easier to use in a large-scale program than any model based on shared state, such as threads guarded by semaphores.
    Obviously it's not a silver bullet, because it's still possible to write programs that never utilize the message passing features, and therefore are no more parallelizable than anything else. On the other hand, when properly used a program might consist of dozens or even hundreds of Erlang processes passing messages to each other, which means the program will run faster on a computer with more cores (or a cluster with more computers, whichever you happen to have on your desk.)

  3. Robert O'Callahan, 19 May 2007 21:55

    Carl: We could do that, but adding an extra pass which is hardly ever needed is not good for performance.
    db48x: We know about Erlang, and message passing in general. We had a couple of people there who work on Microsoft's Singularity project, which heavily uses message passing and tries to improve on it by adding a type system for controlling message ordering. Everyone agreed that eliminating (programmer-visible) shared memory is a good step.
    However, that's not nearly enough to solve the problem.
    For example, for many problems, successful parallelization requires *speculative* parallelism, where you don't know for sure that two computations will not interfere, but they *probably* won't, so you try to run them both in parallel and detect interference dynamically. If interference is detected you roll back some of the computation to avoid it. Optimistic transactions are one form of speculative parallelism. Pure message passing systems don't have an analogue to this.
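    A toy sketch of the roll-back-on-interference idea (all names invented here; real systems use transactional memory machinery rather than a single lock):

```python
import threading

class VersionedCell:
    """A single versioned value supporting optimistic updates."""
    def __init__(self, value):
        self.value = value
        self.version = 0
        self._lock = threading.Lock()

    def transact(self, fn):
        """Apply fn(old_value) -> new_value optimistically: compute
        without holding the lock, then commit only if nobody else
        committed in the meantime; otherwise discard the work and retry."""
        while True:
            snapshot_version = self.version
            snapshot_value = self.value
            new_value = fn(snapshot_value)  # speculative work, no lock held
            with self._lock:
                if self.version == snapshot_version:  # no interference
                    self.value = new_value
                    self.version += 1
                    return new_value
                # else: interference detected; roll back and retry
```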
    This is related to another big problem, composability. Suppose I have a big hash table that I want to provide parallel access to. Ignoring resizing for a moment, I can have one lock on each bucket, or equivalently, I can have one process for each bucket in a message passing system. Now suppose that as a user of that library I want to build on top of it a "swap" operation that's atomic (and performs well), without modifying the library. In a sequential world adding an operation like this is trivial, in a parallel world it's impossible in both the locking and message passing models.
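    For concreteness, here's a toy per-bucket-locked table (invented names) showing why a client can't layer an atomic swap on top of the public API:

```python
import threading

class StripedHashTable:
    """Hash table with one lock per bucket. Nothing in the public API
    lets a caller hold two bucket locks at once -- which is exactly
    what an atomic swap(k1, k2) would need."""
    def __init__(self, nbuckets=16):
        self.buckets = [dict() for _ in range(nbuckets)]
        self.locks = [threading.Lock() for _ in range(nbuckets)]

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def get(self, key):
        i = self._index(key)
        with self.locks[i]:
            return self.buckets[i].get(key)

    def put(self, key, value):
        i = self._index(key)
        with self.locks[i]:
            self.buckets[i][key] = value

def non_atomic_swap(table, k1, k2):
    # The best a client can do with the public API: two separate
    # critical sections per key. Another thread can observe (or
    # clobber) the intermediate state between the gets and the puts.
    v1, v2 = table.get(k1), table.get(k2)
    table.put(k1, v2)
    table.put(k2, v1)
```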

  4. Sure, you can't add a swap operator without modifying the library in a concurrent language, but if you can do it in a procedural language it's only because you're not allowing concurrent access to the hash table. That's not much of a problem, if you ask me.
    Erlang (and most other concurrent systems) allows you to upgrade code while the system is running, which allows you to upgrade your libraries as needed without bringing the system down. Of course, that depends on the library being available for you to legally modify, which isn't always the case.

  5. I like what I see on Mac OS X :-).
    But I have one (two) problems with floated ::first-letter :-(
    1. the whole paragraph simply disappears
    2. when hitting the back button: crash ...
    I'll file bugs as soon as I have minimized my test cases.
    Thanks for all your work on this.

  6. Robert O'Callahan, 20 May 2007 06:20

    db48x: you can't do the hash table swap in *Erlang*. There are other parallel programming models (transactional memory, for example) where it would be easy. Just pointing out that Erlang (and message passing in general) is not the last word in concurrent programming.
    The problem with saying "we'll just change the library!" is that if you have to change the library whenever an application wants to build something new on top of it, it's going to be very hard to build big complex systems, you're not going get much code reuse, and systems will be brittle and hard to maintain.
    And yes, of course the problem is very easy in sequential languages. That's the point. We're miles away from making parallel programming as "easy" as sequential programming, and if we can't do that, we won't be able to write parallel programs that are as capable as the sequential programs we have today.

  7. Yea, I'm aware that Erlang isn't perfect, it's just what I have ready to hand.
    Anyway, I think the example you've chosen is pretty artificial. Who wants to deal with hash buckets individually? I'd rather be dealing with a single process that I can query, that way I don't care how the hash table is implemented. It can be implemented using a collection of subordinate processes for the buckets if it wants, but regardless I can extend the behavior of that process by writing a new process with the library implementation as a subordinate that implements a swap message, and passes anything else to the subordinate. It's exactly like subclassing a hash implementation in an object oriented language (procedural or otherwise), but without the difficulty of making sure the object remains thread safe. Erlang does require you to remember not to pass out the pids of your subordinate processes, because if you do that you can't rely on having exclusive access to them. That seems like a less difficult task though.
    Still, you're right in saying that concurrent programming isn't as easy as procedural programming, but concurrent programming in a concurrent language is easier than concurrent programming in a language that only gives you threads and semaphores (or locks, or critical sections or whatnot). If concurrency isn't needed, then there's no need to use a concurrent language. Last I checked, grep wasn't multithreaded, and it wouldn't help if it was.

  8. Robert O'Callahan, 20 May 2007 11:21

    If you arbitrate access to the hash table with a single process, then that process will become a bottleneck and you won't scale to a large number of cores.
    The problem with the multicore world is that individual processors won't get faster. We'll just have more of them. So if you want your programs to get faster, you have to use concurrency.
    grep and similar tools are actually good candidates for parallelization...

  9. To enable faster disk IO in your virtual machines, there is a currently hidden switch to enable buffered IO.
    - Right click on your VM package
    - Open the VMX settings file in a text editor that uses Unix line feeds by default, like TextWrangler or TextMate
    - Add the following line to the end of the file
    aiomgr.buffered = "TRUE"
    - Save the VMX file
    Hope this helps.
    For more hints and help, join the VMware Fusion beta forum at:
    http://www.vmware.com/vmwarestore/newstore/community_login.jsp?followingPage=/community/forum.jspa?forumID=371
    Pat

  10. Robert O'Callahan, 20 May 2007 21:42

    Pat: thanks for that. I also found a way to speed up mkdepend itself, which I'll blog about in a second...

  11. Yea, the single hash process can become a bottleneck, but so can the individual bucket processes. As with any type of programming, there are trade-offs to be made. Subordinating the bucket processes to a hash process gives you more control, but less potential for concurrency.
    As for grep, people expect its output to be deterministic, which means it would have to collect all of its output and sort it to remain deterministic. Unfortunately, that would decrease its utility, because it would hold up communication along the pipeline that it's a member of. That converts the concurrent process represented by the pipeline into a sequential one, defeating the purpose.

  12. Robert O'Callahan, 21 May 2007 07:19

    The individual buckets won't become a bottleneck if your hashtable is working properly (i.e. has enough buckets and a good hash function). Erlang's forcing you to make a tradeoff; other parallel programming models can give you control and scalability in this example. Waving your hands and saying "there are always tradeoffs" is a cop-out.
    As for grep, for most uses it would be fine to give up on determinism. In other cases it would be fine to buffer the output: if you're speeding up grep by a factor of 10, and there are only a few matches, it doesn't much matter if you have to delay match processing till grep has finished. This is a silly argument, but you were silly to pick search as an example of something that couldn't be usefully parallelized.

  13. The speculative parallelism you mention above is already implemented in Haskell's STM (Software Transactional Memory) monad.
