Tuesday, 28 February 2006

Linker Performance Revisited

About this time last year, I did some work to improve ld performance and reported it on this blog, getting to the point where I could link gklayout.so with debug symbols in only five seconds. Sadly at some point during my SUSE upgrades, things went downhill again: recently it's been taking about 70 seconds to link. Vlad says he's seen similar performance regressions with the ld in Fedora Core 5.

So I went to investigate, and the first thing I found is that the ld in binutils CVS HEAD only takes 10 seconds, i.e., it's seven times faster than the SUSE package. So my advice to other Linux developers is, if you feel that the packaged ld is sluggish, try building your own from CVS.

Monday, 27 February 2006

Choosing Sides

Suppose you're a great programmer and your goal in life is to get great desktop software into the hands of as many people as possible. Where should you work? Since most desktop users in the world use Microsoft software, Microsoft's recruiters tell you that you should work there --- but they're wrong. This is quite easy to see if you ask yourself, "what is the scarce resource that limits Microsoft's ability to deliver great software?" It is not a shortage of brilliant engineers! It is instead a shortage of competitive pressures.

The current situation with IE7 is a classic example: after achieving a monopoly position, the company sat back and did nothing until Firefox forced its hand. Looking further back, one of the greatest software engineering feats ever (in my opinion) was when IE went from nowhere to Netscape parity with the incredibly fast development of IE3, and I don't think it was a coincidence that Microsoft was then facing one of their greatest threats ever.

Another interesting example is Windows Sidebar. Five years ago Microsoft Research had a project called Sideshow which provided desktop widgets for "peripheral awareness". That project struggled to emerge from the research lab and was killed. Since then a number of similar competing products have emerged, notably Google Desktop. Microsoft responded by hastily assembling a team to copy those features by (re)creating Windows Sidebar.

One could make many more comparisons. Where Microsoft dominates (desktop OSes, office apps), progress is slow; where they are threatened (MSN/Live, Xbox, Java/.NET), progress is rapid.

The lesson is that if you want to see great Microsoft software, the biggest impact you can have is by working elsewhere. Create something compelling, and Microsoft will clone it. Make a successful challenge to an entrenched product, and prod them out of complacency.

A word of caution: you will run a high risk of being flattened by Microsoft's combination of technological prowess and monopoly power, which may daunt some. But fighting the good fight is not necessarily a ticket to penury.

Tuesday, 21 February 2006

Cairo, Linux And GTK2 Themes

In the last week or two I've been spending most of my time working on cairo-gtk2 trunk builds, trying to get them back into shape. Vlad did some wonderful, long-needed work to move all double-buffering and window translucency management out of the view manager into platform-specific windowing code, in a very elegant way, but he only updated Windows, so GTK2 needed to catch up there. When I fixed that, I encountered a serious cairo issue --- it was creating 24-bit temporary pixmaps for the offscreen buffer, which was killing performance on my 16-bit display setup --- and it took a while to diagnose and fix (in cairo).

But the biggest chunk of work has been reenabling GTK2 native themes in cairo builds, without which Firefox looks completely naff. This isn't easy since we want to be able to render GTK2 themes to any cairo context (e.g., a PDF context, or a screen context with some rotation applied) but GTK2 themes expect to be rendering to an X pixmap or window. We can hack around this by having the theme render to an offscreen pixmap and then extracting the data, but that's painful and slow especially when the theme paints with transparency or translucency. It's important that the common case of painting the theme to an X window with no scaling or rotation remain as efficient as it is today. Well, it's done :-).

One bonus of this work is that we can finally enable GTK2 themes for HTML content. We never did so before because we never wrote the code to handle rendering to a non-X printing context (i.e. render to temporary pixmaps and send them as images to the printer). Now that we've solved this problem in a general way, we will (when my patches land) turn on GTK2 themes for HTML. The screenshot below is using the "Industrial" theme. The Cancel button is being hovered.

GTK2-themed GMail preferences

Friday, 17 February 2006

Irrational Glumness

I just spotted a piece of classic gloom-based newspaper copy, so blatant that I just have to share it.

Contrast the headline:
Driving, costs shock migrants

... with the body:

Immigrants don't like our driving habits and cannot believe the cost of living.

But nearly all new migrants would tell others to join them in New Zealand, saying the country's natural beauty, relaxed pace of life and friendly people made their Kiwi experience special.

Wednesday, 15 February 2006

The Greatest Movies Of All Time

Police Story III: Supercop is one of the greatest movies of all time.

Like Casablanca, it's a genre masterpiece featuring industry regulars at the height of their powers. Unlike Casablanca, it has dazzling martial arts, outrageous stunts and no characters or plot worth mentioning. But I'm convinced, after repeated viewing over a span of years, that it's the finest Hong Kong action-comedy flick I've seen. I actually enjoy it more each time, which is quite remarkable for this kind of movie.

What makes it so great? At the core, it's Jackie Chan near his best (possibly not as good as he was in Drunken Master 2, but very close). That means more than just martial arts; it means a certain style of humour, action and direction. Police Story III adds variety by laying on gun-fu and explosions to a much greater extent than you normally see in Jackie Chan's Hong Kong movies. But what really sets it apart is the pairing of Jackie Chan with Michelle Yeoh, who is phenomenal, matching him in martial arts and slapstick.

She also matches him in doing her own stunts. It really adds to the enjoyment knowing that when she appears to jump a motorbike onto a moving train, she really did jump the motorbike onto the moving train --- despite not having ridden a motorbike before. In the out-takes you can see that the crew thoughtfully provided a pile of cardboard boxes for her to land in, should she overshoot the train --- which she did, twice. Top that, Casablanca.

Saturday, 11 February 2006

Anatomy Of Bloat

Yesterday I ran into some really nasty, bloaty code. This isn't particularly unusual, but this time I was bothered by the fact that this code was mostly written after a lot of people started working hard to de-bloat core parts of Gecko. This wasn't the first time I noticed problems in this code either, but for various reasons I felt particularly bothered yesterday. So I'm taking the opportunity to make something positive out of this by letting off some steam and exposing some practices that absolutely must be stamped out.

I have some examples from the SVG code. I do not intend to defame our SVG contributors, without whom we simply wouldn't have SVG support. Yet the fact that SVG has a lot of problems suggests they needed external code review and encouragement well before now.

Let's take a look at something simple: rectangles. An interesting question to ask is what gets created for an SVG <rect> element? Here's an answer:
Type | Count | Object Size | Total Size
This is only approximate, since I computed it by hand, and it doesn't include some details like the storage for the DOM attributes that the rect would presumably have, nor does it account for the frame's style context and some other generic data (which might, however, be shared if you have a lot of similar elements). It makes some optimistic assumptions about the allocation behaviour of some of the dynamically allocating arrays. It also doesn't account for intra-object padding and inter-object overhead for the heap objects. I apologize in advance for any errors. But it's good enough to get an idea of what's going on, especially with respect to SVG-specific data.

Overall, that's a lower bound of 1372 bytes on a normal 32-bit machine. I won my bet with Boris that it wouldn't be under 1K! For a single rectangle element! Other interesting statistics to think about: that's 39 heap allocated objects, 26 of which are reference counted XPCOM objects. Many of the objects support multiple C++ interfaces, hence require multiple vtable pointers per object; in fact 320 of those 1372 bytes are for vtable pointers. On a 64-bit machine, which we can no longer consider pie in the sky, the numbers are worse as pointers grow to 8 bytes each: 2376 bytes, of which 640 bytes are vtable pointers. Eeeep!
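
To make the vtable arithmetic concrete, here is a rough C++ sketch. The interface and class names are invented for illustration and are not Gecko types. It assumes a typical ABI in which each polymorphic base class contributes one vtable pointer to the object; the standard does not guarantee this layout, but common compilers behave this way.

```cpp
#include <cassert>
#include <cstddef>

// Three unrelated interfaces (hypothetical names, standing in for XPCOM-style interfaces).
struct IShape    { virtual ~IShape() = default;    virtual void Draw() = 0; };
struct IObserver { virtual ~IObserver() = default; virtual void Notify() = 0; };
struct IValue    { virtual ~IValue() = default;    virtual float Get() const = 0; };

// An object exposing one interface typically carries one vtable pointer.
struct OneInterface : IShape {
  void Draw() override {}
  float mValue = 0;
};

// An object exposing three interfaces typically carries three vtable pointers.
struct ThreeInterfaces : IShape, IObserver, IValue {
  void Draw() override {}
  void Notify() override {}
  float Get() const override { return mValue; }
  float mValue = 0;
};

// Bytes spent on vtable pointers for a given pointer count and pointer size.
constexpr std::size_t VtableBytes(std::size_t aPointerCount, std::size_t aPointerSize) {
  return aPointerCount * aPointerSize;
}

// The figures quoted above: 320 bytes of 4-byte pointers means 80 vtable
// pointers, and the same 80 pointers cost 640 bytes at 8 bytes each.
static_assert(VtableBytes(80, 4) == 320, "32-bit figure");
static_assert(VtableBytes(80, 8) == 640, "64-bit figure");

// Extra bytes paid for implementing two additional interfaces.
inline std::size_t ExtraBytesForTwoMoreInterfaces() {
  assert(sizeof(ThreeInterfaces) >= sizeof(OneInterface));
  return sizeof(ThreeInterfaces) - sizeof(OneInterface);
}
```

On the usual ABIs the extra cost is at least two pointers per object, which is why objects that support several interfaces add up so quickly.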

So how did we get to here, and how can we get out of here? I'm afraid the main culprit is XPCOM and a design that leans heavily on it; the critical decision was to have every object in the SVG DOM implemented directly as an XPCOM object. One of the biggest issues is the way SVG handles coordinates. Following the SVG DOM spec, each coordinate (an SVG rect has 6, x/y/width/height plus two bonus corner-rounding values) has been made an XPCOM object, an nsSVGAnimatedLength. An nsSVGAnimatedLength has two nsSVGLength components in theory: the "base value" and an "animated value". Since we don't support animation yet (if we did, a straightforward extension would make the above numbers significantly worse) our nsSVGAnimatedLength objects have just a single nsSVGLength child object, so that's 12 XPCOM objects accounted for already.

Another big problem is that the SVG code relies on an observer pattern implemented through XPCOM. Changes to SVG values like the coordinates are propagated to objects which have explicitly attached themselves as listeners. To avoid strong-reference cycles, the listeners implement a weak-reference pattern, which is where our nsWeakReference objects come in (each of which is a ref-counted XPCOM object). As you might expect, the frame listens for changes to each of its SVG coordinates. For no reason I can see, each nsSVGAnimatedLength listens to its base value. That's 7 more XPCOM objects for their nsWeakReferences.

There's also an nsSVGAnimatedTransformList for the rect, just in case it has a "transform" attribute I suppose, with a child nsSVGTransformList that it listens to. That's 3 XPCOM objects for those two plus the listener's nsWeakReference. The other XPCOM objects are the content element, the nsCSSStyleRule to hold a CSS rule expressing the style impact of any presentational attributes that might be on a <rect>, the nsSVGClassValue which seems to hold the element's class in case you want to animate it (!), and an nsSVGRendererPathGeometry which gets attached to the frame. I'm not sure why this last one needs to be a separate object.
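
The running tally in the last few paragraphs can be checked with a small C++ sketch. The struct and its field names are made up for this illustration. The split of the 7 weak references into one for the frame plus one per nsSVGAnimatedLength is my reading of the text rather than something it states outright.

```cpp
// A hand tally of the reference-counted XPCOM objects behind one <rect>,
// following the counts given in the text (struct and field names are illustrative).
struct XpcomTally {
  int mAnimatedLengths  = 6;  // one nsSVGAnimatedLength per coordinate
  int mBaseLengths      = 6;  // one nsSVGLength child per animated length
  int mWeakReferences   = 7;  // assumed: 1 for the frame + 6 for the animated lengths
  int mTransformObjects = 3;  // animated transform list, transform list, weak reference
  int mOthers           = 4;  // content element, nsCSSStyleRule, nsSVGClassValue,
                              // nsSVGRendererPathGeometry

  constexpr int Total() const {
    return mAnimatedLengths + mBaseLengths + mWeakReferences +
           mTransformObjects + mOthers;
  }
};

// 6 + 6 + 7 + 3 + 4 matches the 26 XPCOM objects quoted earlier.
static_assert(XpcomTally().Total() == 26, "tally should match the quoted count");
```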

Clearly it's imperative to simplify the coordinates, especially animated coordinates, so they're not XPCOM objects. I actually have a suggestion for how to do this that's been languishing for a while. It would shrink an animated coordinate down to 8 bytes for the common case where the value is not animated (compared to 92 bytes currently). The DOM interfaces would be implemented as "tearoffs" so we only allocate them if they're used.
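
As a rough illustration of the idea, and not the actual proposal, the C++ sketch below stores a coordinate as plain data in 8 bytes and only creates a DOM-facing wrapper when someone asks for one. All names here (CompactLength, DomLengthTearoff, GetDomLength) are hypothetical, and the field layout is an assumption chosen to hit the 8-byte figure.

```cpp
#include <cstdint>
#include <memory>

// Plain-data coordinate: no vtable, no reference count (hypothetical layout).
struct CompactLength {
  float        mValue;       // value in the given unit
  std::uint8_t mUnit;        // unit code, e.g. 0 = user units
  bool         mIsAnimated;  // false in the common case
};
static_assert(sizeof(CompactLength) == 8, "expected 8 bytes on common ABIs");

// DOM-facing wrapper, allocated only when script actually requests it.
// It does not own the data; the owning element must outlive it.
class DomLengthTearoff {
 public:
  explicit DomLengthTearoff(CompactLength* aLength) : mLength(aLength) {}
  float GetValue() const { return mLength->mValue; }
  void SetValue(float aValue) { mLength->mValue = aValue; }

 private:
  CompactLength* mLength;  // non-owning pointer into the element's storage
};

// Creates the wrapper on demand; elements that are never touched through
// the DOM never pay for it.
inline std::unique_ptr<DomLengthTearoff> GetDomLength(CompactLength& aLength) {
  return std::unique_ptr<DomLengthTearoff>(new DomLengthTearoff(&aLength));
}
```

A real tearoff would also need reference counting and some way to keep its owner alive, which this sketch leaves out.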

Another imperative (actually a prerequisite for the above) is to eliminate the observer pattern and the use of weak references. We do not need to support an arbitrary graph of notifications in almost all cases; storing the explicit dependence graph is hugely wasteful. Attribute changes, style changes and DOM method calls already notify or are implemented by the content element, which can directly notify its frame. This can eliminate a huge amount of code and data. If we just do these two items I think we'll have made a huge step getting the code back under control.
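
A minimal C++ sketch of direct notification, under the assumption that each element knows at most one frame. The class and method names are invented for illustration and are not the Gecko interfaces. There are no observer lists and no weak-reference objects; the element simply calls its frame.

```cpp
#include <string>

// Stand-in for a frame that reacts to attribute changes (hypothetical).
class RectFrameSketch {
 public:
  void AttributeChanged(const std::string& aName) {
    mLastChanged = aName;
    ++mInvalidations;  // a real frame would schedule a repaint or reflow here
  }
  int Invalidations() const { return mInvalidations; }
  const std::string& LastChanged() const { return mLastChanged; }

 private:
  std::string mLastChanged;
  int mInvalidations = 0;
};

// Stand-in for the content element: coordinates are plain fields, and every
// setter notifies the frame directly instead of going through observers.
class RectElementSketch {
 public:
  void SetPrimaryFrame(RectFrameSketch* aFrame) { mFrame = aFrame; }
  void SetX(float aX) { mX = aX; NotifyFrame("x"); }
  void SetWidth(float aWidth) { mWidth = aWidth; NotifyFrame("width"); }
  float X() const { return mX; }
  float Width() const { return mWidth; }

 private:
  void NotifyFrame(const char* aName) {
    if (mFrame) {
      mFrame->AttributeChanged(aName);
    }
  }

  RectFrameSketch* mFrame = nullptr;  // non-owning; null when not displayed
  float mX = 0;
  float mWidth = 0;
};
```

The dependence graph is implicit in the code (element to frame), so nothing has to be allocated to represent it.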

While studying this code I discovered some interesting additional problems which I'd better mention before I forget. They may prove instructive.

  • A CSS style rule is created for every SVG element that might have presentational attributes, even if there aren't any on the element. This is just silly and should be easy to fix (as a comment in the code mentions!).
  • SVG values are expected to have zero or one observers, so nsSVGValue uses an nsSmallVoidArray to represent the observer list which is optimized for the 0-1 case. This is a pointer-sized field that can either store a direct pointer to an observer, or a heap-allocated array for the hopefully rare case of multiple observers. But it turns out the rectangle's six nsSVGLength values are observed by both the frame and its containing nsSVGAnimatedLength! So the common case is to have 2 elements, and each value has to heap-allocate an nsSmallVoidArray::Impl, and happens to allocate one with a default size of 8 pointer elements (plus overhead). nsSVGTransformList suffers the same fate. Welcome to 7 heap allocations and 280 bytes.
  • nsSVGTransformList contains an nsAutoVoidArray member for its list of transforms. This array type is meant for use on the stack, and lets you avoid a heap allocation if the maximum count required is no more than 8 elements. Since most SVG elements don't have any transforms, we burn 8 pointer elements plus overhead for nothing most of the time. This is actually a good place to use nsSmallVoidArray. (But why can't nsSVGRectElement::mTransforms just be null if there are no transforms? Beats me.) Generally speaking one should not use nsAutoVoidArray (or similar classes such as nsAuto(C)String) as object fields.
  • Not actually seen in this example, but seen in code nearby: some SVG classes contain PRBool fields. PRBool is a 4-byte type and should never be used in objects. Space could be saved just by making these fields PRPackedBool (or even PRPackedBool mField:1, for one bit!) and rearranging the fields to avoid padding.
  • There are other things I could complain about, like the presence of "mFillGradient" and "mStrokeGradient" in every single rectangle frame regardless of whether the frame actually uses gradients, but those are small fry compared to the above.
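
The point about boolean fields can be illustrated with a small C++ sketch. WideBool and PackedBool are stand-ins I've defined here to play the roles of PRBool and PRPackedBool, with widths assumed to be 4 bytes and 1 byte; the structs and their fields are invented for the example.

```cpp
#include <cstdint>

// Stand-ins for the NSPR types named above (assumed widths).
typedef std::int32_t WideBool;    // plays the role of PRBool: 4 bytes
typedef std::uint8_t PackedBool;  // plays the role of PRPackedBool: 1 byte

// Wide booleans interleaved with a pointer: 12 bytes of flags plus any
// alignment padding the pointer requires.
struct Scattered {
  WideBool mDirty;
  void*    mOwner;
  WideBool mVisible;
  WideBool mHasTransform;
};

// Same information with one-byte booleans grouped after the pointer.
struct Packed {
  void*      mOwner;
  PackedBool mDirty;
  PackedBool mVisible;
  PackedBool mHasTransform;
};

// One bit per flag; all three flags share a single byte.
struct Bits {
  void*        mOwner;
  std::uint8_t mDirty : 1;
  std::uint8_t mVisible : 1;
  std::uint8_t mHasTransform : 1;
};
```

On common 32-bit and 64-bit ABIs, Packed comes out smaller than Scattered, and Bits is no larger than Packed; the bitfield version starts to pay off once there are more flags than fit in the struct's tail padding.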

I haven't said much about code size yet. Just constructing and maintaining the big graph of objects takes considerable code. Much of it is boilerplate code repeated over and over with little or no sharing. There is other evidence of copy-paste coding. It's been said so many times, but I'll say it again: copy-paste coding is evil. It bloats source and object code, breeds inconsistency, discourages the creation of useful mental abstractions, and will inevitably make you fix the same bugs many times.

An overarching lesson is that XPCOM is a disease. It tempts people who should know better to create systems which are nasty, large, and inevitably, slow. It is a contagious disease; people acquire it by being exposed to infected code. Developers can be immunised, as many of us have, mostly by suffering through a bout of it. Perhaps diatribes like mine can help immunise others with less suffering. XPCOM does have its uses --- as a component mechanism in situations where one absolutely requires dynamic component loading and binding, and where we need to cross language boundaries --- but wherever it spreads beyond that, it must be stamped out.

Friday, 10 February 2006

The Joy Of Text

nsTextFrame is the core of our text layout and rendering. To the rest of the layout engine, it exposes an interface pretty much like that of any other inline frame. It consumes fairly simple platform-specific APIs for measuring and drawing strings of single-style text. It implements a whole lot of behaviour:

  • Regular text measurement, layout, linebreaking and painting
  • XML/HTML whitespace compression (for CSS "white-space" not "pre")
  • Text selection painting
  • IME-specific selection/conversion painting
  • CSS "word-spacing" and "letter-spacing"
  • CSS "text-decoration" in "quirks mode" (including "text-decoration:blink")
  • CSS "text-transform"
  • CSS "font-variant: small-caps"
  • CSS "text-align: justify" (in conjunction with control logic in nsLineLayout)

All this is really complex for a few reasons. For starters, the functionality itself is complex, especially when you take into account weird languages and scripts: e.g., the German "ß" whose uppercase form is two characters, "SS", so uppercasing a string can actually change its length; various RTL issues; UTF-16 surrogate pairs; and glyph clustering (when multiple Unicode code points combine to form an atomic cluster of glyphs). Furthermore we support a mix of platforms with different underlying capabilities, so some nsTextFrame code is only used on platforms that don't have a required capability. And of course text layout is a core part of Web rendering, and hence nsTextFrame is performance critical, so we have convoluted code to avoid copying text and to speed things up in other ways.
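
Two of these wrinkles are easy to show in a few lines of C++. This is a deliberately tiny sketch with invented function names: the uppercasing handles only ASCII letters plus "ß" (real case mapping needs full Unicode tables), and the code point counter only understands well-formed or lone surrogates.

```cpp
#include <cstddef>
#include <string>

// Uppercases ASCII letters and expands U+00DF (ß) to "SS".
// The result can be longer than the input.
inline std::u16string UppercaseSketch(const std::u16string& aText) {
  std::u16string out;
  for (char16_t c : aText) {
    if (c == u'\u00DF') {
      out += u"SS";  // one code unit in, two code units out
    } else if (c >= u'a' && c <= u'z') {
      out.push_back(static_cast<char16_t>(c - (u'a' - u'A')));
    } else {
      out.push_back(c);
    }
  }
  return out;
}

// Counts code points in UTF-16 text, treating a high surrogate followed by
// a low surrogate as a single code point.
inline std::size_t CountCodePoints(const std::u16string& aText) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < aText.size(); ++i) {
    bool isHigh = aText[i] >= 0xD800 && aText[i] <= 0xDBFF;
    bool nextIsLow = i + 1 < aText.size() &&
                     aText[i + 1] >= 0xDC00 && aText[i + 1] <= 0xDFFF;
    if (isHigh && nextIsLow) {
      ++i;  // skip the low half of the pair
    }
    ++count;
  }
  return count;
}
```

Any code that keeps offsets into the original string, as text frames do, has to cope with both effects: transformed text whose length differs from the source, and "characters" that span two code units.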

Unsurprisingly nsTextFrame has become very messy. Apart from the usual difficulties this causes us with bugfixing and performance, we also need to extend it to support text-overflow, hyphenation and other features. Furthermore, over the last several years the multilingual, multiscript text layout capabilities of our platforms have improved enormously, with the introduction and widespread availability of Uniscribe, Pango, and ATSUI. Now with the move to cairo-based rendering, it is time for a wholesale redesign of nsTextFrame.

The primary goal is to separate text handling into two abstractions, nsTextFrame and gfxTextRun. A gfxTextRun is a run of single-direction, single-style, single-language text. gfxTextRun is responsible for converting the text into a sequence of glyph clusters and rendering them. The implementation will be platform-specific and rely on platform APIs as much as possible. A new nsTextFrame implementation is being written that implements all the functionality of text frames on top of gfxTextRun. Based on our experiences with the current nsTextFrame, I have designed a gfxTextRun interface which is fairly clean and around which we can hopefully build a much leaner, cleaner implementation of text frames. The actual work will proceed incrementally, building the real interface and implementation step by step, and will no doubt diverge a bit from the proposal, but I believe it's a good idea to think far enough ahead to have confidence we're heading to a coherent end point.

Currently Red Hat ships Firefox builds configured to use Pango underneath the existing nsTextFrame. These builds have notoriously poor performance. In SUSE we're going to enable these builds only in certain Indic locales likely to use scripts that the non-Pango builds are completely incapable of. In the new world I hope we can achieve much better performance, possibly even better performance than non-Pango builds today, because we'll be able to construct a gfxTextRun for an entire single-style text paragraph, eating the cost of glyph conversion and placement just once and then sharing it among all the text frames for that paragraph.
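
The sharing idea can be sketched in C++ as follows. This is not the gfxTextRun interface; SharedTextRunSketch and TextFrameSketch are hypothetical names, and the fixed per-character advances are a fake stand-in for real glyph shaping. The point is only that the expensive conversion happens once per paragraph, and each frame then measures its own slice cheaply.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Holds the result of "shaping" a whole paragraph once.
class SharedTextRunSketch {
 public:
  explicit SharedTextRunSketch(const std::u16string& aText) {
    // Prefix sums of advances: mPrefix[i] is the width of the first i units.
    mPrefix.reserve(aText.size() + 1);
    mPrefix.push_back(0.0f);
    for (char16_t c : aText) {
      mPrefix.push_back(mPrefix.back() + FakeAdvance(c));
    }
  }

  // Width of the slice [aStart, aStart + aLength) in constant time.
  float MeasureRange(std::size_t aStart, std::size_t aLength) const {
    assert(aStart + aLength < mPrefix.size());
    return mPrefix[aStart + aLength] - mPrefix[aStart];
  }

 private:
  // Fake metrics: spaces are 4 units wide, everything else 8.
  static float FakeAdvance(char16_t aChar) { return aChar == u' ' ? 4.0f : 8.0f; }

  std::vector<float> mPrefix;
};

// Each frame covers a slice of the paragraph and shares the same run.
struct TextFrameSketch {
  std::shared_ptr<const SharedTextRunSketch> mRun;
  std::size_t mStart;
  std::size_t mLength;

  float Width() const { return mRun->MeasureRange(mStart, mLength); }
};
```

When a line break splits a paragraph into several frames, each frame just gets a different (start, length) pair into the same run, so re-breaking lines does not redo glyph conversion.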

One interesting detail is that I think we can move text-transform and smallcaps out of text frames into specialized gfxTextRun implementations, treating them as nothing more than an extra processing pass during text to glyph conversion. This will help simplify nsTextFrame even more. Unfortunately nsTextFrame will still have to do whitespace compression as this depends on information from surrounding frames.

One implication of this plan is that we will not be fixing many bugs in the existing trunk nsTextFrame, unless they are candidates for landing on a FF1.5 or FF2 branch.

Thursday, 2 February 2006


Sunday through Tuesday I was in the central North Island with some very good friends visiting from the USA. The central volcanic plateau is one of my favourite places. The landscape is compelling, the alpine vegetation is fascinating, and the barely-dormant vulcanism of Ngauruhoe, Tongariro and Ruapehu is thrilling. Just thinking about the place brings back the flavour of childhood road trips with my family.

The focus of the trip was tramping the Tongariro Crossing on Monday. This eight-hour trip skirts the base of Ngauruhoe, then heads north, straight across the craters of Tongariro to descend its northern slopes. Some authorities call it the best one-day hike in New Zealand and I know no reason to contradict them.

Ruapehu over a hillock

This is a view of Ruapehu from early in the trip. Here we have tussock, but the track quickly ascends to areas with just moss, and in the craters of Tongariro, nothing but sand, gravel and bare rock.

Craggy wall

From the open country the track enters a valley with impossibly craggy walls, then at the head of the valley the trail takes a long, steep climb to the base of the Ngauruhoe cone. We had low lying cloud all day --- mercifully, I think, keeping us cool. Even so I perspired and eventually drank all two-and-a-half litres of water I was carrying. We climbed into thick cloud, which was stirred by the wind so that in one minute visibility would drop from kilometres to just a few metres and then open up again in a different direction. It was eerie and a little worrying; one wouldn't want to lose the path in that environment.

Emerald Lakes from the top of the Red Crater rim

The track crosses the South Crater and climbs again, to the top of the lip of the Red Crater. Here we could see and smell the steam of sulphurous fumaroles, and even feel the warmth of gases pouring up and over the crater rim. A little further along we got a view of the Central Crater and the Emerald Lakes.

The track continues past the lakes, through the Central Crater, around another lake and then gently down the mountainside. You pass the Ketetahi Hot Springs belching steam and streams of warm, silica-coloured water. The vegetation changes before your eyes from barren rock to low tussock to high tussock and then with startling abruptness you plunge into rainforest.

It's a long walk, but not really very difficult in summertime, except for the two punishing climbs. They pushed my lungs and legs well beyond their usual operating parameters. The warnings about being prepared for sudden weather changes and about taking plenty of water are certainly no exaggeration. Gushing testimonies are no exaggeration either.

I'm looking forward to returning to the area soon. I hope to see more geothermal activity ... I'd love to be down there during an eruption. Maybe that sounds a bit pathological but I suspect it's similar to why I love extreme weather: the feeling of God tearing through the comfort of our technology and putting us in our place.