Tuesday, 29 May 2007

We Don't Need No Office Education

Today's Herald leads with a story about the Government's licensing of Microsoft software. The Government declined to renew a deal that licensed Microsoft Office for every single Mac in schools, because it turns out only about half of those Macs are actually using Office.

The Herald paints this in alarming terms:

Pupils suffer in schools computer row

Software has been wiped from thousands of school computers because of a row over Government funding.

Principals said the move was baffling, as it went against the drive to use computers to enhance learning.

Auckland Primary Principals Association past president Julien Le Sueur said schools were being told to improve links with parents and their communities.

But the outside world was dominated by the global giant Microsoft.

"We've had our legs cut out from underneath us," he said.

This is outrageous in several dimensions...

Obviously the 50% who want Office should buy it themselves instead of being subsidised by the other 50%; overall it probably saves money.

(Aside: Microsoft loves these "pay per machine, not per copy" deals because they make it impossible for competing products to get a foothold. It's these sorts of deals that gave them dominance in OEM operating systems until the US Department of Justice forced them to stop. The NZ government would be wise to avoid these deals just as a matter of policy.)

I'm greatly disturbed that educators are linking computer skills and enhanced learning with Office training. Word and Excel are boring, and if kids are taught that's what computers are about, they will be turned off computers for life if they have any sense. That would be a disservice to them and to our economy.

By the time kids get into the workforce it's very likely that Office as we know it will have been eclipsed by Web-based tools. The future of computing is more likely to resemble kid-driven solutions like phones, online games and social networking sites. Microsoft-pushing school principals need to get out of the way.

Furthermore, the dismissal of the Apple and OpenOffice alternatives is very weak:

But Mr Le Sueur said NeoOffice was littered with problems, and its website warned that users could expect lots of bugs.

"That's not the sort of software we should be expecting kids in New Zealand to be using."

As if Microsoft Office doesn't have lots of bugs! Anyway, kids won't push these products all that hard. A stronger point against OpenOffice would have been its sluggishness on low-end machines. On the other hand, Apple's stuff is cheap and supposed to be good.

I hope Steve Maharey stands his ground.



Things I've Seen With My Eyes

Today was a fun day at the Auckland office. We did the first ever demos of a few exciting new features:


  • Theora video playing natively in Firefox!
  • Windowless plugins in X/Linux!
  • Rendering --- and editing --- crazy Zapfino ligatures without bogus visual artifacts!

... each the work of our local developers. It's a hive of excitement here, I tell you!



Saturday, 26 May 2007

Record-And-Replay In Virtual Machines

VMWare has announced very interesting record and replay functionality in VMWare Workstation 6. This is not completely new --- in the research world, TTVM did this, and probably they weren't the first --- but it's excellent to see this functionality arriving so quickly in commercial VMs. It's also neat that they've got debugging integrated into it already. Too bad the debugger is gdb...

They report a slowdown of around 5% and a logging rate of 2MB/minute for some unspecified workload, both of which are very good. These numbers aren't surprising because the inputs to a virtual machine from the outside world are usually very small. Disk I/O stays inside the VM. Only very heavy users of the network, the VMWare shared file system, or some other high-bandwidth input device (e.g., a USB video camera) are going to see a significant penalty.
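
To make the "small inputs" point concrete, here's a toy sketch of what a record-and-replay log conceptually contains: each nondeterministic input, stamped with the point in the guest's execution at which it has to be redelivered on replay. The names and structure are invented; this is certainly not VMWare's actual format.

  // Toy sketch of a record-and-replay input log. All names are invented;
  // this is not VMWare's format.
  #include <cstdint>
  #include <vector>

  enum class InputKind : uint8_t { NetworkPacket, SharedFileRead, TimerRead, DeviceInterrupt };

  struct InputEvent {
    uint64_t instructionCount;   // where in the guest's execution the input arrived
    InputKind kind;
    std::vector<uint8_t> data;   // the bytes that were delivered to the guest
  };

  // Record: append an InputEvent for every nondeterministic input as it is delivered.
  // Replay: run the guest normally and, at each logged instructionCount, hand back
  // the saved bytes instead of consulting the real outside world. I/O to a virtual
  // disk is deterministic and never needs logging, which is why the log stays small.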

This is definitely the way to go for low-overhead record and replay. A great way to debug would be to use a VM to record a test failure, replay the recording under Chronicle-style instrumentation to build a database, and then debug the failure from that database. This way, the recording overhead is minimal and could even be used on a live system with negligible perturbation. This suggests to me that approaches like Nirvana, which trade off queryability in search of reduced recording overhead, probably aren't worth it.

Replaying a VM with massive binary instrumentation is a significant engineering problem; it would require major changes to Valgrind+Chronicle, for example. But conceptually it's fairly straightforward.

This is a really exciting development. There are lots of potential uses for VM record-and-replay. I think this will be a big win for VMWare. Hopefully people will see beyond the obvious forward-and-back time travel approaches to debugging.



Friday, 25 May 2007

The Glyph Bounds Problem

New textframe is looking pretty good. I'm dogfooding it and so are a few other people. The list of regressions is fairly short. We should be able to turn it on as soon as the tree opens for alpha6.

In the meantime I'm thinking about the glyph bounds problem. This is the problem of certain characters in certain fonts having glyphs that are particularly large, and fall outside the "font box" for the glyph: i.e. the glyph extends above the font's specified ascent, below the font's specified descent, to the left of the glyph origin ("left bearing"), or to the right of the glyph advance ("right bearing"). When a text frame contains such glyphs, we need to carefully compute the "overflow area" --- the union of the extents of the (positioned) glyphs --- when we lay out the text. That lets us redraw the correct area efficiently if the text moves or changes. Since time immemorial we have assumed no glyphs overflow their font boxes, which is plain wrong and getting more wrong as fonts get fancier...
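
For what it's worth, the union computation itself is trivial; the hard part is getting the per-glyph extents cheaply. A rough sketch, using made-up types rather than the real Gecko/Thebes ones:

  #include <algorithm>
  #include <cstddef>

  // Sketch only: hypothetical types standing in for Gecko's rects and textruns.
  struct Rect {
    double x, y, width, height;
    void UnionWith(const Rect& r) {
      double x0 = std::min(x, r.x), y0 = std::min(y, r.y);
      double x1 = std::max(x + width, r.x + r.width);
      double y1 = std::max(y + height, r.y + r.height);
      x = x0; y = y0; width = x1 - x0; height = y1 - y0;
    }
  };

  struct PositionedGlyph {
    double x, y;         // glyph origin in frame coordinates
    Rect   tightBounds;  // extents relative to the origin; may exceed the font box
  };

  // The overflow area is just the frame's normal bounds unioned with every
  // positioned glyph's extents; the hard part is getting tightBounds cheaply.
  Rect ComputeOverflowArea(const Rect& frameBounds,
                           const PositionedGlyph* glyphs, size_t count) {
    Rect overflow = frameBounds;
    for (size_t i = 0; i < count; ++i) {
      Rect r = glyphs[i].tightBounds;
      r.x += glyphs[i].x;
      r.y += glyphs[i].y;
      overflow.UnionWith(r);
    }
    return overflow;
  }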

The hard part (as often in the world of browser text) is performance. For example, the Mac implementation of cairo_glyph_extents gets the glyph extents by retrieving the glyph outline curves from ATS and computing their bounding boxes with grim mathematics. There's no way in the world we could apply that to each glyph, even with caching. The other Mac APIs aren't better, I'm told, and the other platforms aren't looking good performance-wise either.

One possible approach is to exploit the fact that normally we only care about glyph bounds if they overflow the font box. If we could examine the font and cheaply determine, for each glyph, whether it's guaranteed not to overflow the font box, we'd be fast on most fonts and pages, where these checks will succeed and we won't need exact bounds for the glyphs. Looking at the OpenType/TrueType font format, the 'glyf' table contains, for each glyph, min/max x and y coordinates. But this data does not take hinting into account: hinting could make the glyph larger than that box.

Perhaps we can be a little conservative and assume that hinting will make the glyph ascend at most one more pixel, and descend at most one more pixel. We could also assume that hinting will not increase the left bearing and will not increase the right bearing (because the hinted advance should increase if the glyph width does). Of course these are just heuristics, and it would be a good idea to test them against a large collection of fonts to see whether they hold. But if they do, we should be able to use the 'glyf' data to quickly rule out overflow for a large number of glyphs.
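
In code, the check might look something like this. It's just a sketch with invented names; the real thing has to parse big-endian 'glyf' (and 'loca') data, scale font units to pixels, and deal with composite glyphs:

  // Sketch of the "can this glyph overflow its font box?" heuristic.
  // Field names are invented; coordinates are in pixels, i.e. font units
  // already scaled by pixelsPerEm / unitsPerEm, with y increasing upwards.
  struct GlyphBoxPx {
    double xMin, xMax, yMin, yMax;   // from the glyph's 'glyf' bounding box
  };

  struct FontBoxPx {
    double ascent, descent;   // positive distances above/below the baseline
    double advance;           // the glyph's advance width
  };

  // True if the glyph is guaranteed not to overflow the font box, allowing one
  // extra pixel above and below for hinting, and assuming hinting never
  // increases the left or right bearing.
  bool GlyphDefinitelyFitsFontBox(const GlyphBoxPx& g, const FontBoxPx& f) {
    const double hintingSlop = 1.0;   // heuristic: hinting moves edges <= 1px vertically
    return g.yMax + hintingSlop <= f.ascent &&
           -g.yMin + hintingSlop <= f.descent &&
           g.xMin >= 0.0 &&          // no left-bearing overflow
           g.xMax <= f.advance;      // no right-bearing overflow
  }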

There is one more problem: when we actually compute the bounding box approximation for a string of glyphs, our glyph advances may include kerning. The font box definition above assumed no kerning. This probably won't worry us in practice because we generally measure substrings that end with spaces or have no characters following them, so there is no kerning across the end of the substring.

Grabbing and analyzing these tables could still be expensive. I wish there were a better way, but I can't see one right now. Possibly the way to go, if it were practical, would be to abandon "overflow areas" in Gecko, but then we'd repaint a lot more than we need to when documents change dynamically.

One remaining card we can play is to identify common fonts that have no overflowing glyphs and hardcode that knowledge into Gecko to avoid looking up tables. That would be ugly, but expedient.



Thursday, 24 May 2007

Performance Observation

I'm running Tp2 page load tests on a Linux box --- a real dual-core machine, not a VM. Interesting observations:


  • Firefox uses 40% of one processor
  • X11 uses 60% of one processor
  • One processor is basically idle

I don't know why we're spending so much time in the X server. These are old pages and they're fairly simple. Maybe we're not using cairo well, or cairo isn't using X well.

I also wonder why we aren't using two processors more effectively here. Firefox should be able to overlap with X. They must be waiting on each other for some reason.

I'm tempted to go off on a rant here about how X has sucked for so long and continues to suck in spite of all promises to the contrary. But I won't.



Wednesday, 23 May 2007

Chronicle Update

I committed the Chronicle sources to Google Code's Subversion repository, along with a couple of bug fixes. One of those fixes will hopefully get it to build on Linux distros like Ubuntu where libelf headers go in /usr/include. The only downside is that I can no longer track download numbers --- currently 174 :-).

I'm still working ever-so-gradually on the Eclipse debugger plugin. I need to play the name game again... incumbent: Chronometer, challenger: Anachron.

Update: New challenger: Parachron.



Sunday, 20 May 2007

Speeding Up Mkdepend

I did a little bit of research into the problem of mkdepend.exe being slow under VMWare with sources in VMWare's shared file system.

The first thing to do was to measure the problem. This was pretty easy: running mkdepend on nsPresShell.cpp consistently takes 32 seconds. Actually compiling the file takes only a few seconds. So clearly something is very wrong.

Poor-man's profiling in the debugger shows that mkdepend is spending all its time in stat() calls. These stat calls are just checking to see whether a file exists; on a hunch I wondered if there's a faster way to check for file existence than stat(). Some Web searches suggested that GetFileAttributesEx is a good fast way. So I refactored the code to use a does_file_exist function which calls GetFileAttributesEx on Windows. Ta da! Now mkdepend on nsPresShell.cpp takes 1.5 seconds. GAME OVER. Patch submitted.
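
For the record, the change amounts to something like this, a sketch of the idea rather than the exact patch:

  // Sketch of the fix: a cheap existence check instead of a full stat().
  #ifdef _WIN32
  #include <windows.h>
  #else
  #include <sys/stat.h>
  #endif

  static int does_file_exist(const char* path) {
  #ifdef _WIN32
    // GetFileAttributesEx only has to report existence and basic attributes,
    // so it skips the extra work stat() does to synthesize inode numbers,
    // permissions and so on.
    WIN32_FILE_ATTRIBUTE_DATA data;
    return GetFileAttributesExA(path, GetFileExInfoStandard, &data) != 0;
  #else
    struct stat st;
    return stat(path, &st) == 0;
  #endif
  }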

I conjecture that stat() is so slow because it fills in a bunch of information about devices, inode numbers, and access permissions, which on Windows may require several filesystem operations to gather. Perhaps VMWare's file system is particularly slow at some of those extra operations.



Authentication

Today was gorgeous. We went up north and stopped at Wenderholm for a walk. One of my kids was napping in the car so I left my wife with him and took the other for a short walk to the Couldrey House lookout. It's a short track through lovely bush and the view is spectacular.

Along the way my little companion decided he wanted to turn back, but I coerced him into continuing. He wasn't happy about that and started howling, but we carried on anyway. We passed a couple, and a little later one of them (a woman, American by the sound of it) ran back to us, apologised for intruding, and asked "is he yours?", and then asked the unhappy boy, "is he your daddy?" As it happens he declined to confirm or deny.

At the time I was glanding Toddlerproof Stoic Calm so I said little, just waited until she seemed satisfied and then we carried on our way. I don't really know what kind of confirmation she expected. In retrospect I find it rather galling to be questioned this way --- especially to have my son questioned this way. On the other hand, she probably had the best of intentions. I guess I'll just carry a family portrait in my wallet from now on...



Saturday, 19 May 2007

Status

I've fixed a number of bugs in the new textframe code. It's now very usable; I've been dogfooding it most of this week and some other people have too. I think we're on track for turning it on in the next few weeks. Set MOZ_ENABLE_NEW_TEXT_FRAME = 1 in layout/generic/Makefile.in for a good time. There are still some problems; for example, word selection is a little bit broken.

Much of this week I was looking at white-space handling. I've been cleaning up tricky issues like where we allow line breaks in white-space:pre-wrap text. A big thing in the last two days was handling pre/pre-wrap tabs; our existing code expanded them to a number of space characters based on the number of characters present in the line. CSS 2.1 wants us to advance to tab stops instead, which works better with proportional fonts. I've implemented that, among other fixes, and we should be very compliant with CSS 2.1 white-space now.
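
The tab-stop computation itself is simple: instead of expanding a tab to some number of spaces, you just advance to the next tab stop. Roughly, and glossing over how the tab width is chosen (in reality it's derived from the width of a space in the current font):

  #include <cmath>

  // Sketch: a pre/pre-wrap tab advances to the next tab stop instead of being
  // expanded into space characters. Not the real Gecko code; in reality
  // tabWidth is derived from the width of a space in the current font.
  double AdvanceToNextTabStop(double currentX, double lineStartX, double tabWidth) {
    double offset = currentX - lineStartX;   // distance into the line so far
    double nextStop = (std::floor(offset / tabWidth) + 1.0) * tabWidth;
    return lineStartX + nextStop;   // tab stops are fixed positions, so this works with proportional fonts
  }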

One remaining issue is that currently our white-space collapsing pass uses only backward context; for example, we collapse a space that has a space before it. We can't, however, collapse spaces that are followed by, say, a line feed, as support for white-space:pre-line would require; that requires knowledge of the forward context. Similarly, we should translate a newline between two CJK characters to a zero-width space, but that also requires forward context in general. (Right now we do this only when both CJK characters are in the same text node.) Backward context is easy to implement because you can use a little state machine as you traverse the DOM or text frame tree or whatever. Supporting both directions requires something significantly more elaborate. I haven't decided whether it's worth doing right now, although now would be the easiest time to do it.
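
To illustrate why backward-only context is so convenient: collapsing "a space preceded by a space" needs just one bit of state carried forward as you walk the text. A toy sketch, nothing like the real code, which walks text frames and handles tabs, newlines, CJK and so on:

  #include <string>

  // Toy sketch of backward-context white-space collapsing: one bit of state
  // ("was the previous kept character a space?") carried forward is enough.
  // Collapsing a space that is *followed* by a line feed (white-space:pre-line)
  // or turning a newline between CJK characters into a zero-width space would
  // need to know what comes next, which this scheme can't see.
  std::string CollapseSpaces(const std::string& text) {
    std::string out;
    bool prevWasSpace = false;
    for (char c : text) {
      bool isSpace = (c == ' ');
      if (isSpace && prevWasSpace) {
        continue;   // drop a space preceded by a space
      }
      out.push_back(c);
      prevWasSpace = isSpace;
    }
    return out;
  }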

I still need to do some performance tests on all three platforms to see what the effect is. Now all the pieces are in place ... in particular the new textrun cache is in. Even with the old text frame that cache sped up Tinderbox pageload performance by 3-10%. Some of that speedup was because the old textrun cache did a lot of PR_IntervalNow calls, and get-current-time calls are particularly slow in VMs because they'll trap to the virtual-machine monitor. But there was real speedup there too.

Vlad and Pav are arriving tomorrow to spend a couple of weeks here. This'll be exciting... First we're going to have to get some more furniture :-). I hope the weather's good! You never know in Auckland :-).

Hiring's been disappointing lately. If anyone reading this knows anyone in NZ who'd be good to work on Firefox, let me know!!!

I switched from Parallels to VMWare Fusion beta 3 because VMWare's support for Linux is so much better. Fusion's working great so far ... except that Windows builds are even slower than they were in Parallels, which was already slow. I just discovered that our "mkdepend.exe" tool that crawls C/C++ code to find the #include dependencies on Windows is incredibly slow and actually eating most of the build time. It looks like it's doing tons of stat() calls that are really slow, perhaps because they're often going to VMWare's virtual filesystem. I plan to take a quick look to see if there's some easy optimization that would help.

On the Chronicle front, there's a steady stream of people downloading the tarball. It's up to 161 downloads; not bad for something that doesn't really do anything useful. In snatches of time at airports and at home, I've gradually been putting an Eclipse UI together; it's actually pretty easy. Writing Java code in Eclipse again is fun.

People were quite interested in Chronicle when I talked about it at the OSQ retreat and, before that, with the people gathered for the OOPSLA committee meeting. At the OSQ retreat I was amused to find myself an "industry representative" --- in fact, the primary "open source" representative. It is nominally an "Open Source Quality" meeting, after all :-). I pushed the message that although all the research on bug finding is great, the truth is that in our domain we have already found plenty of bugs, and our biggest problem is fixing them. I suggested that therefore we need more research on topics such as debugging and performance analysis. I also suggested that the worst code is not necessarily buggy code, but code that is unnecessarily complex. Detecting that would be an interesting new direction for program analysis.

There was a great deal of discussion of parallel programming, now that we have entered the multicore world. More than one person opined that multicore is a mistake we will live to regret --- "we don't know what the right way is to use the transistors, but multicore is definitely wrong". There was general agreement that the state of parallel programming models, languages and tools remains pathetic for general-purpose single-user programs and no breakthrough should be expected. My position is that for regular desktop software to scale to 32 cores by 2011 (as roadmaps predict) we'd have to rewrite everything above the kernel, starting today, using some parallel programming model that doesn't suck. Since that model doesn't exist, it's already too late. Probably we will scale out to a handful of cores, with some opportunistic task or data parallelism, and then hit Amdahl's law, hard. It is probably therefore more fruitful to focus on new kinds of applications which we think we have reasonable ideas for parallelizing. I think virtual worlds (which are not just "games", people!) are a prime candidate. That's a direction I think the PL/software engineering research community should be pushing in.
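
To put numbers on "hit Amdahl's law, hard": speedup is 1 / ((1 - p) + p/n) for parallel fraction p on n cores, so even a program that's 90% parallelizable tops out around 7.8x on 32 cores, and a 50% parallelizable one barely reaches 2x. A throwaway calculation:

  #include <cstdio>

  // Amdahl's law: speedup with parallel fraction p on n cores.
  double AmdahlSpeedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
  }

  int main() {
    // Even generously parallel desktop code saturates well before 32 cores.
    std::printf("90%% parallel, 32 cores: %.1fx\n", AmdahlSpeedup(0.9, 32));  // ~7.8x
    std::printf("50%% parallel, 32 cores: %.1fx\n", AmdahlSpeedup(0.5, 32));  // ~1.9x
    return 0;
  }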



Wednesday, 16 May 2007

Collision

I'm scheduled to give a talk at next month's Auckland Web meetup --- about Firefox 3 stuff, of course. It turns out that Nigel Parker from Microsoft will be at the same event giving the Silverlight pitch. When it comes to marketing, Nigel's a pro, so that should be interesting.

The indomitable Chris Double should be there with me and we will have some interesting demos to show, although perhaps less flashy than others :-). We'll want to strike the right balance, though, pushing not just Firefox but our message about the open Web.

Anyway, it should be a fun evening!



Sunday, 13 May 2007

Desperate Measures

Greetings from Santa Cruz, where the Berkeley OSQ group's retreat has just ended and I'm hacking away in my hotel room so I can run some performance tests while I go for a walk on the beach. I love software development.

Silverlight is interesting. It sounds pretty good. We definitely have work to do to make sure that the Web stays attractive as a platform. But we already knew that, and a lot of what we've been working on for Firefox 3 over the last few years aims to address that --- offline apps, better text rendering, video, better graphics; we're doing the right things and we need to keep on doing them.

To tell the truth, I'm a lot less worried about Silverlight and Flash than I am about WPF, for a couple of reasons. First, SilverFlash is a lot more Web-centric than WPF: the browser is still in the loop. There are compelling development and user benefits for this approach, like the ability to participate in hyperlinking, so I think it'll remain popular. That means that browsers won't become irrelevant any time soon and that means we have an ongoing opportunity to keep moving the Web forward. Secondly, SilverFlash is cross-platform (at least for now), especially if Miguel has his way and is able to produce a viable open source implementation. This is very interesting and perhaps underappreciated: Silverlight is a very risky move by Microsoft. The Web has been steadily undermining the Windows platform monopoly, and to the extent that SilverFlash is a more attractive development target than the Web, they undermine Windows that much more. Sure, in the long term, if Silverlight dominates Microsoft can pull back support for platforms they don't own --- especially if Adobe is dead (but I think they'd be bought first). However that is a long term strategy and it can be countered, especially if Miguel's open source implementation gets traction. In the medium term, things don't look good for Windows.

We need to keep the pressure on and keep forcing Microsoft to make decisions like this.