Monday, 27 October 2008

American Tidbits

It's been a crazy few days. I pulled into Boston late Wednesday night. On Thursday I hung out at MIT. I visited my friend's quantum computation lab; lasers and liquid helium make science so much more interesting. I gave a talk about Chronicle and Chronomancer to a pretty good-sized CSAIL audience. I had the honour of being hassled by Richard Stallman for suggesting that there was synergy between Chronicle and VMWare's record-and-replay features. (VMWare, as non-free software, is apparently never the solution to any problem.)

On Friday morning I took the train to Stamford and then visited the IBM Hawthorne lab to talk about the future of the open Web platform. My talk was too long so I sped up and skipped the demos (contrary to my own point that visual gratification is the driving force behind platform evolution). Still, it went well and I enjoyed catching up with a lot of my old colleagues.

On Saturday I was at another friend's wedding. It was too much fun hailing friends I hadn't seen for years and watching the multi-second transition from unrecognition, to recognition, to shock and exclamation. I left the party around 3pm and arrived at my hotel in Mountain View 13 hours later.

When I'm in the Bay Area I get my Sunday fix at Home Of Christ 5. I go because a few very good friends go there, but also because "Home Of Christ Five" is the coolest name for a church ever (I'm not sure why). It's a very Silicon Valley church; they meet in a converted office building in an industrial park in Cupertino, right next to an Apple satellite. And this morning, the pastor compared the Christian struggle with sin to Boot Camp.

[Irony: a very attractive woman had just sat down next to me, and I was trying very hard to ignore this fact, when the pastor said "Now, turn to the person next to you and ask them if they struggle with sin!" You gotta be kidding me, Lord!]

Anyway, the HOC5 congregation is very friendly to newcomers. I know this because I only go there once or twice a year so naturally no-one ever remembers me and they're very nice every time :-).

I've watched some TV at times. The election coverage is appalling. Most of the "commentary" is clearly pushing one candidate or the other. Most of it's negative. I watched for hours and learned nothing about any significant differences between McCain and Obama's actual proposed policies. Most reporting is actually meta-news on the campaign itself, or even meta-meta-news on the media's coverage of the campaign. The worst is when pundits eat up screen time bemoaning the lack of meaningful coverage --- HELLO! Even the comic coverage isn't funny.

Another thing I noticed is that the news shows have so much animated rubbish --- scrolling tickers, bouncing icons, rotating stars. Larry King even has periodic full-screen zooming stars take over the screen, blotting out the actual picture, occurring seemingly at random while people are talking. It's impossible to concentrate on the actual content of the show (such as it is). What is the purpose of this?

Finally, in-flight movie summary:

  • The Happening: OK.
  • The Forbidden Kingdom: OK, but what a waste. Lose the white kid.
  • Get Smart: OK.

Thank goodness I didn't pay for those. Better luck on the way home, I hope.

Tuesday, 21 October 2008

The Tragedy Of Naive Software Development

A friend of mine is doing a graduate project in geography and statistics. He's trying to do some kind of stochastic simulation of populations. His advisor suggested he get hold of another academic's simulator and adapt it for this project.

The problem is, the software is a disaster. It's a festering pile of copy-paste coding. There's lots of use of magic numbers instead of symbolic constants ("if (x == 27) y = 35;"). The names are all wrong, it's full of parallel arrays instead of records, there are no abstractions, there are no comments except for code that is commented out, half the code that isn't commented out is dead, and so on.

It gets worse. These people don't know how to use source control, which is why they comment code out so they can get it back if they need it. No-one told them about automated tests. They just make some changes, run the program (sometimes), and hope the output still looks OK.

This probably isn't anyone's fault. As far as I know, it was written by someone with no training and little experience who had to get a job done quickly. But I think this is not uncommon. I know other people who did research in, say, aeronautics but spent most of their time grappling with gcc and gdb. That is a colossal waste of resources.

What's the solution? Obviously anyone who is likely to depend on programming to get their project done needs to take some good programming classes, just as I'd need to take classes before anyone let me near a chemistry or biology lab. This means that someone would actually have to teach good but not all-consuming programming classes, which is pretty hard to do. But I think it's getting easier, because these days we have more best practices and rules of thumb that aren't bogus enterprise software process management --- principles that most people, even hardcore hackers, will agree on. (A side benefit of forcing people into those classes is that maybe some will discover they really like programming and have the epiphany that blood and gears will pass away, but software is all.)

There is some good news in this story. This disaster is written in Java, which is no panacea but at least the nastiest sorts of errors are off-limits. The horror of this program incarnated in full memory-corrupting C glory is too awful to contemplate. I'm also interested to see that Eclipse's Java environment is really helping amateur programmers. The always-instant, inline compiler error redness means that wrestling with compiler errors is not a conscious part of the development process. We are making progress. I would love to see inline marking of dead code, though.

Monday, 20 October 2008

October Travel

On Wednesday I'm taking off to the US for about 10 days. First I plan to visit Boston to see a few friends and give a talk at MIT about Chronicle on Thursday. Then I'm heading to New York on Friday where I'll give a talk at IBM Research about Web stuff (3pm, 1S-F40). On Saturday I'm at a friend's wedding. On Saturday night I fly back to California for a platform work week. Hopefully that week I'll also be able to attend the WHATWG social event in Mountain View.

It's going to be somewhat tiring and I probably won't be very responsive online until I get to California, but I should be quite responsive from then on --- especially if you manage to corner me in meatspace!

Sunday, 19 October 2008


Whenever content or style changes, Gecko has to ensure that the necessary regions of the window are repainted. This is generally done by calling nsIFrame::Invalidate to request the repainting of a rectangle relative to a particular frame. Each of these operations is nontrivial; these rectangles have to be translated up the frame tree into window coordinate space, which is tricky if there's an ancestor element using CSS transforms, or there's an SVG foreignObject ancestor. Ancestors with SVG filter effects can even cause the invalidation area to grow in tricky ways. I've always been surprised that Invalidate doesn't show up in profiles very often.

Worse than the performance issue, though, is that these Invalidate calls are smeared through layout and consequently there are lots of bugs --- mostly where we invalidate too little and leave "rendering turds" in windows, but also where we invalidate too much and do unnecessary repainting that slows things down. Part of the problem is that Gecko's invariants about who is responsible for invalidating what are complex and, in some cases, just plain wrong. However the problem is also fundamental: invalidation is in some sense duplication of information and code, because we already have code to paint, and in principle you could derive what needs to be invalidated from that code.

So, I've been tossing around the idea of doing just that. We already create frame display lists representing what needs to be rendered in a window. So in theory we can just keep around a copy of the display list for the window; whenever we need to repaint we can just create a new display list for the window, diff it against the old display list to see what's changed, and repaint that area.

There are a few problems that make it not so easy. First, we need to know when we may need to repaint the window. In most cases that's fairly easy: we should repaint after every reflow or style change with a "repaint" hint. A few other cases, such as animated images, would need to signal their repainting needs explicitly. We'd have to deal with maintaining the "currently visible" display list in the face of content it refers to being deleted. We'd also need to update that display list to take account of scrolling.

The big question is performance. Display list construction is typically very cheap, but in pathological cases (when huge numbers of elements are visible at once), it can be slow, so small visual changes to such pages could get significantly slower than they are now. On the other hand, when most of the page area is changing this scheme should be faster than what we do today, because the costs of invalidation will go away and we have to build a display list for the whole window at each paint anyway.

Another option is to do some kind of hybrid scheme where we make a little effort to keep track of what's changed --- perhaps just the frame subtree that was affected, possibly via dirty bits in the frames themselves --- and use that to bound the display list (re)construction.

Tuesday, 14 October 2008

SVG Bling Update

For those who don't follow the Web-Tech blog --- you should. But anyway, support for SVG filter, clip-path and mask on non-SVG content landed on Gecko trunk a while ago and is in Firefox 3.1 beta 1. Also, I've proposed these extensions to the SVG WG for standardization.

Even more exciting is that Boris Zbarsky did an awesome job of implementing external document references --- I'm not sure if that made beta 1, but it will definitely be in beta 2. This means that all code that uses nsReferencedElement to track which element is referenced by a given URI/fragment-ID now automatically supports referring to external resource documents --- i.e. URIs of the form foobar.xml#abc. And Robert Longson has done a great job of migrating our last remaining SVG URI-ref users to use nsReferencedElement --- that is, markers and textPath --- so as of today, all the places where we support SVG referring to elements by ID support referring to elements in external documents as well as the current document. (It also means they're all "live for ID changes" and safe to use with incremental loading of SVG documents.)

The combination of these features is particularly cool because it means you can now apply SVG filter/clip-path/mask in regular HTML (non-XHTML) documents by placing the effect definitions in an external SVG XML file.

We're pretty much done with new features in Gecko 1.9.1 at this point. Looking beyond Gecko 1.9.1, we will be able to build on the external resource document loader to support SVG fonts (including via CSS @font-face) and SVG images (for CSS background-image etc, and HTML <img>). They should be a top priority for Gecko 1.9.2 or whatever it ends up being called.

At this point most of my "bling branch" has landed, except for two features: SVG paint servers (gradients and patterns) for non-SVG content, via CSS background-image, and the "use any element as a CSS background-image" feature. I'm not sure what to do with them. The former probably should land at some point, but it's not a high priority for me at the moment --- maybe I'll roll it into SVG background-image support, since they're closely related. For the latter, my current thinking is that some uses are adequately served with a CSS background-image referencing an SVG pattern containing a <foreignObject>, and other uses really demand an API that lets you specify a particular DOM node to render (e.g. to mirror a particular element in a particular IFRAME).

For that case, I think the way to go is to create a new element --- some sort of viewPort element that acts like a replaced element and renders the content of some other element. It would have an attribute href that lets you declaratively specify a URI to the element to render, but it could also have a setSource(node) API so that you can give it a specific DOM node to mirror. You could even have an allowEvents attribute that lets events pass through the looking-glass... Right now MozAfterPaint and canvas.drawWindow are the best way to do effects like that, but they're not optimal. (Although there are uses for MozAfterPaint that the putative viewPort element would not satisfy, such as paint flashing/logging for debugging tools.)

Hating Pixels

Drawing an image on the screen should be a simple operation. However, in a browser engine, things get complicated because there are a number of subtle requirements involving subpixel layout, scaling, tiling, and device pixels. We've had lots of bugs where visual artifacts appear in sites at certain zoom levels; we fixed most of them, but the code got really messy and some bugs remained. So several days ago I sat down and worked out what all our known requirements for image rendering are. Then I worked out an approach that would satisfy those requirements, and implemented it. As is often the case, the implementation revealed that some of my requirements were not strong enough. The resulting patch seems to fix all the bugs and is much much simpler than our current code.

The problem at hand is to render a raster image to a pixel-based device at a specified subpixel-precise rectangle, possibly scaling or tiling the image to fill the rectangle. We control this by specifying two rectangles: a "fill rectangle" which is the area that should be filled with copies of the image, and an "initial rectangle" which specifies where one copy of the image is mapped to (thus establishing the entire grid of tiled images). There may also be a "dirty rectangle" outside of which we don't need to render. There are several requirements, the first three of which are actually more general than just image rendering:

  1. Horizontal or vertical edges (e.g., of background color, background image, border, foreground image, etc.) laid out so they're not precisely on pixel boundaries should generally be "snapped" during rendering to lie on pixel boundaries, so they look crisp and not fuzzy.
  2. All edges at the same subpixel location must be snapped (or not snapped) to the same pixel boundary. This includes multiple edges of the same element, edges of ancestor/descendant elements, and edges of elements without an ancestor/descendant relationship. Otherwise, you get nasty-looking seams or overlaps.
  3. Any two edges separated by a width that maps to an exact number of device pixels must snap to locations separated by the same amount (and in the same direction, of course). As far as possible, we want widths specified by the author to be honoured on the screen.

    In Gecko, we achieve the first three requirements by rounding the subpixel output rectangle top-left and bottom-right corners to device pixel boundaries, ensuring that the set of pixel centers remains unchanged.

  4. When content is rendered with different dirty rects, the pixel values where those rects overlap should be the same. Otherwise you get nasty visual artifacts when windows are partially repainted.
  5. Let the "ideal rendering" be what would be drawn to an "infinite resolution" device. This rendering would simply draw each image pixel as a rectangle on the device. Then image pixels which are not visible in the ideal rendering should not be sampled by the actual rendering. This requirement is important because in the real Web there's a lot of usage of background-position to slice a single image up into "sprites", and sampling outside the intended sprite produces really bad results. Note that a "sprite" could actually be a fixed number of copies of a tiled image...

    (This may need further explanation. Good image scaling algorithms compute output pixels by looking at a range of input pixels, not just a single image pixel. Thus, if you have an image that's half black and half white, and you use it as a CSS background for an element that should just be showing the half-black part, if you scale the whole thing naively the image scaling algorithm might look at some white pixels and you get some gray on an edge of your element.)

  6. The exact ratio of initial rectangle size in device pixels to image size in CSS pixels, along each axis, should be used as the scale factors when we transform the image for rendering. This is especially important when the ratios are 1; pixel-snapping logic should not make an image need scaling when it didn't already need scaling. It's also important for tiled images; a 5px-wide image that's filling a 50px-wide area should always get exactly 10 copies, no matter what scaling or snapping is happening.
  7. Here's a subtle one... Suppose we have an image that's being scaled to some fractional width, say 5.3px and we're extracting some 20px-wide area out of the tiled surface. We can only pixel-align one vertical edge of the image, but which one? It turns out that if the author specified "background-position:right" you want to pixel-align the right edge of a particular image tile, but if they specified "background-position:left" you want to pixel-align the left edge of that image tile. So the image drawing algorithm needs an extra input parameter: an "anchor point" that when mapped back to image pixels is pixel-aligned in the final device output.

It turns out that given these requirements, extracting the simplest algorithm that satisfies them is pretty easy. For more details, see this wiki page. Our actual implementation is a little more complicated, mainly in the gfx layer where we don't have direct support for subimage sampling restrictions (a.k.a. "source clipping"), so we have to resort to cairo's EXTEND_PAD and/or temporary surfaces, with fast paths where possible and appropriate.

Note: this algorithm has not actually been checked in yet, so we don't have battle-tested experience with it. However, we have pretty good test coverage in this area and it passed all the trunk tests with no change to the design, as well as new tests I wrote, so I'm pretty confident.

Sunday, 12 October 2008


I took a couple of days off and took off to Coromandel with the family --- we needed to do something with the kids during the school holidays, and I haven't been out that way for a very long time.

On Thursday we drove to Clevedon, then to Kawakawa and around the coast to Waharau Regional Park for a short walk. We carried on around the Firth of Thames, taking a lunch stop near Thames, then continued north to Coromandel town. The road from Thames to Coromandel town is mostly right next to the shore and very pretty. (See the first picture below.)

We stopped at Coromandel town for a bit and then crossed the peninsula to Whitianga where we stayed in a motel, right across the road from the beach. Despite being the school holidays, the whole place was very quiet. I guess it's still a bit early in the season, or maybe it's the dreaded Financial Crisis. Anyway, it was lovely.

On Friday we drove around to Hahei and did the walk to Cathedral Cove (see second picture below). There's nothing to say about that that hasn't already been said --- amazing etc. Kayaking around it would be a great way to go but the kids aren't up to that yet. Had lunch on Hahei beach, then went back to Whitianga, took the ferry to the south side of the harbour and walked to the pa site on Whitianga Rock, then past Maramatotara Bay, up the hill and along the ridge to Flaxmill Bay and back to the ferry.

It's neat how the eastern and western Coromandel coasts have their own flavours. The eastern coast has the Pacific outlook and the best beaches, but the western coast has a bit more character.

This morning we drove back along the southern route, crossing the Coromandel Ranges near Tairua. We came pretty much straight back --- only took about two-and-a-half hours, so with a leisurely start we still got back to Auckland in time for yum cha. I love living here :-).

East coast of Coromandel, looking south

Looking north from the Cathedral Cove carpark along the coast

Friday, 3 October 2008

Interesting Developments In Program Recording

A few interesting developments tangentially related to Chronicle have emerged in the last week or so.

VMWare developers have announced Valgrind-RR, syscall-level record and replay for Valgrind. The idea is to isolate and record all sources of nondeterminism --- syscall results, CPU timer results, thread scheduling, etc --- so you can play back the execution deterministically to get the same results, but in Valgrind with other Valgrind tools turned on. This is great stuff. It actually complements Chronicle very well, because you could run your program first with Valgrind-RR, with less than 10X overhead (going by their numbers), and then rerun under Chronicle with higher overhead but guaranteed to get the same execution. So this would make Chronicle more useful for interactive programs.

VMWare has also announced improved replay for their VM record-and-replay functionality. That's cool, but what's especially interesting is that their Valgrind announcement hinted at possible future integration of Valgrind replay with VM recording. That's really the ultimate scenario for Chronicle: record your app in a VM at less than 2X overhead, then replay under Chronicle instrumentation at your leisure for an awesome debugging experience. You could even parallelize the replay under Chronicle to reduce that overhead by throwing hardware at it.

Another piece of excitement is that a partial port of Valgrind to Mac has been announced. I haven't tried it myself, but people say it can run Firefox opt builds and is close to being able to run Firefox debug builds. This means that at some point I'll probably be able to get Chronicle running on Mac!