Sunday, 31 January 2010

Ruapehu Redux

Five Mozilla people at the top of Mt Ruapehu 12 days ago. The crater lake is in the background.




Three Lessons From A Pernicious Bug

We were having a lot of problems on Tinderbox with xpcshell tests randomly failing on Windows by returning exit code 1. Curiously, it seemed that almost any test could fail this way, with low probability, and the output log for each test indicated that all the actual subtest assertions passed; the process just returned an exit code of 1 for no apparent reason.

Naturally I charged into battle with my sword +5 against orange (i.e., VMWare's record-and-replay debugging). The tests failed randomly about one run in a thousand, so capturing a failure wasn't hard. Then I verified that the problematic xpcshell run was passing 0 as the exit code for the final NtTerminateProcess system call, but the outer python script was indeed getting 1 from GetExitCodeProcess. Very mysterious!

I was at a loss for a while until some questions from Benjamin Smedberg led me to observe that in the failing run, the Google Breakpad helper thread was still alive at process termination, but in passing runs, that thread had exited before process termination.

Benjamin then looked into the Breakpad shutdown code and noticed it using TerminateThread to shut down its helper thread. The MSDN documentation warns that TerminateThread is an extremely dangerous API, because it abruptly terminates the thread, leaving any shared resources it was manipulating in a possibly inconsistent state. Nevertheless, the Breakpad shutdown code seems to use it in a safe way ... mostly. We observed that the shutdown code was passing exit code 1 to TerminateThread. Changing that exit code to 44 made the exit code returned in our randomly failing tests change to 44. Clearly, somehow the exit code set for the Breakpad helper thread by TerminateThread was becoming the exit code for the process!

We think that, although it's not mentioned in MSDN, TerminateThread is actually asynchronous. The thread is not terminated immediately. If you're (un)lucky, the main thread can trigger process exit (passing exit code 0) before the helper thread's termination is complete. Then, somewhere in the netherworld of NT kernel process finalization, after the main thread has been terminated, termination of the helper thread finally completes and as the last thread to terminate, its exit code is recorded as the process exit code!

Moral of the story: never use TerminateThread. If you must use TerminateThread, you should probably try to wait for termination to finish (e.g. by calling WaitForSingleObject) before exiting the process, if you care about your process exit code. Another moral is that record-and-replay debugging rocks (but we already knew that). Yet another moral is that debugging closed-source operating systems sucks (but we already knew that too).
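
Concretely, a safer shutdown looks something like this (a rough Win32 sketch, not the actual Breakpad code; the function and variable names are mine):

    #include <windows.h>

    void ShutDownHelperThread(HANDLE helperThread)
    {
      // Still dangerous in general: the thread is killed wherever it happens to be.
      if (TerminateThread(helperThread, 0)) {
        // TerminateThread can complete asynchronously, so block until the thread
        // object is actually signaled before letting the process exit. Otherwise
        // the helper thread's exit code can race with, and replace, the process
        // exit code.
        WaitForSingleObject(helperThread, INFINITE);
      }
      CloseHandle(helperThread);
    }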



Saturday, 30 January 2010

H.264 Licensing And Free Software

I've read comments on the Web suggesting that since the MPEG-LA's patent licensing documentation only mentions playback products that are "sold", the MPEG-LA doesn't expect software that is given away at zero cost to need a license. Intrepid LWN reader Trelane actually bothered to ask them, and got a response. Here it is.

The most important part of the response:

In response to your specific question, under the Licenses royalties are
paid on all MPEG-4 Visual/AVC products of like functionality, and the
Licenses do not make any distinction for products offered for free
(whether open source or otherwise).

Also


I would also like to mention that while our Licenses are not concluded
by End Users, anyone in the product chain has liability if an end
product is unlicensed. Therefore, a royalty paid for an end product by
the end product supplier would render the product licensed in the hands
of the End User, but where a royalty has not been paid, such a product
remains unlicensed and any downstream users/distributors would have
liability.

Therefore, we suggest that all End Users deal with products only from
licensed suppliers. In that regard, we maintain lists of Licensees in
Good Standing to each of our Licenses at http://www.mpegla.com.


In other words, if you're an end user in a country where software patents (or method patents) are enforceable, and you're using software that encodes or decodes H.264 and the vendor is not on the list of licensees, the MPEG-LA reserves the right to sue you, the end user, as well as the software vendor or distributor.



Tuesday, 26 January 2010

ActiveX All Over Again

Many responses to my post about video codec issues have expressed the opinion that "Mozilla should do whatever is most useful for users today, regardless of 'politics' or 'ideology'" and also that "if a user already has codecs on their system, it's wrong for Mozilla to refuse to use them". These opinions fail due to various legal and technical issues that I've already discussed, but I also want to point out that resisting this kind of pressure is nothing new for Mozilla, and history has proved us right for doing so, in particular regarding ActiveX and Web standards in general.

Perhaps it's not widely known, but Gecko has had code to support hosting ActiveX controls, dating back as far as 1999. ActiveX controls are very much like system video codecs. ActiveX support would have been very useful to users ever since 1999, and still would be now --- certainly in corporate intranets, and everywhere in China and South Korea. Enabling ActiveX support would probably boost our market share significantly. Most users have useful ActiveX controls on their machines. But for the last ten years, even during Mozilla's most desperate days, we have consistently refused to turn this feature on, because we believe that ActiveX is not good for the Web.

I think history has proved that this decision was completely right. Our market share rose anyway. The ActiveX ecosystem was a big vector for security attacks. Most importantly, if we'd caved, the Web everywhere would look like it does in China and South Korea, but more so --- dependent on ActiveX, and tied to Windows. No resurgent Apple, no Linux netbooks, precious few Linux users, no ChromeOS, no iPhone, no usable browsers on phones at all, and Microsoft's grip on the industry stronger than one dare imagine. We would have sacrificed huge long-term wins for users --- ALL users, not just Firefox users --- for the sake of a temporary fillip.

During the ascendancy of IE, we had similar pressures over the issue of whether to follow Web standards or focus on IE compatibility. The "pragmatists" wanted us to focus on IE compatibility for the very sensible reason that authors had to develop for IE anyway, so it would be easier and cheaper for them if we just fell in line. Obviously pursuing IE compatibility would also have been good for our market share --- and our users --- at the time. We chose standards. Again, I think history shows we made the right decision.

I'm not suggesting that the consequences of exposing system codecs to the Web would be identical to exposing ActiveX. That's unlikely, and unknowable. But favouring our principles over short-term gains for users is nothing new for Mozilla, and when we've done it in the past, history shows it was the right thing to do.



Sunday, 24 January 2010

LCA

At LCA I found it harder to talk to people than I normally do at conferences, probably because a lot of people already seemed to know each other and I don't already belong to any of those cliques. The same was true at my first computer science conferences, but there I at least had connections to other students and faculty from CMU that I could leverage. At LCA I was often forced to initiate conversations with random people, which is definitely a good thing.

I went to several interesting talks, and some less interesting. Although I enjoyed the joke talks at the time, I think they probably weren't a good use of time. Some of the other talks were just reiterating things I already knew.

One talk that was really useful was the talk about Clutter. I had been wondering if we should be using Clutter somewhere instead of writing our own layers system, but listening to the talk and talking to Emmanuele afterwards, it became clear that would not work. One big design issue is that Clutter animation runs on the main GLib thread, synchronously with application processing (and that will be hard to change), whereas we eventually need to run animation on its own thread. Another big issue is that Clutter depends heavily on OpenGL whereas we need the option to use D3D on Windows.

Jeremy Allison's talk about Microsoft was good. We've feared Microsoft for so long it's become almost unfashionable, but I think Jeremy is right to keep reminding the free software community of the danger there. He talked about Microsoft's attempts to take over the Web, and kindly mentioned Firefox's role in pulling us back from that brink. He made the point (which I think is too often overlooked) that which company one works for is almost always an individual moral choice and we should hold people accountable for it ... we can't let people off the hook by saying "oh, the company I work for is just evil and I can't do anything about it". The focus of his talk was the suggestion that Microsoft is gearing up for an all-out patent war on free software. I don't know if this is true --- honestly, I expected them to do it long ago and I'm not sure what's been holding them back --- but we certainly do need to keep aware of the possibility. Jeremy suggested that Microsoft will promote "RAND" standards --- standards covered by patents whose licenses would require a "Reasonable And Non-Discriminatory" fee, which sound good except that for free software, any non-zero fee is a show-stopper. In fact, as I discussed later in my talk, RAND-encumbered standards won't fly in the traditional Web standards world --- e.g. CSS and HTML5. We have a very good situation there, where everyone understands that any suggestion that can't be implemented in Gecko (MPL/LGPL/GPL) or Webkit (LGPL) is simply a non-starter. However, we do face a very serious situation in video, where the licensing isn't even RAND, and possibly in other technologies such as touch interfaces. It was good to be able to use some of these issues that Jeremy raised as launching points for my talk.

I went to Jan Schmidt's talk about GStreamer. Although the talk wasn't super-interesting, during LCA I did have some interesting discussions with people about whether we should be using GStreamer as the basis of our implementation of HTML5 <video> and <audio>, like Opera is. For now I tend to think it still makes sense to do our own thing, since there's a bunch of stuff GStreamer has that we don't want, like filter graphs and a GLib dependency. But going forward, as we want to add functionality, I think it's definitely something we shouldn't rule out.

Another interesting talk was Andrew Tridgell on patent defence for free software. Most of it was basic stuff about patents that I already knew, but he made one very important suggestion: we need to find a way for free software organizations to collaborate on patent defence by sharing legal information. Right now this is basically impossible; for reasons I don't understand, we are required to keep the legal opinions we get secret. This makes us vulnerable to a divide-and-conquer strategy, because we can't band together to work effectively on strategy and share legal advice to keep costs down. Some sort of legal hacking may be required to solve this problem.

Of course, one of the main purposes of conferences is to meet people, and I did. In particular it was great to meet more of the Xiph/Annodex people behind Ogg. It was also great to meet Carl Worth and talk to him about cairo and the layers work that we're doing for hardware acceleration.

The social events were so-so for me, largely because of the issues I mentioned at the beginning, except for the final Penguin Dinner, which was great. The show was amazingly good ... to be honest, I often suspect "culture on display"-style shows of being insulting to everyone involved, but Friday night's was just great. One nice feature of the dinner was that the food was very good and they put out food for every seat at the table, including the seat next to me where nobody was sitting, so I had a double helping of everything I liked. The only really annoying thing about the Penguin Dinner was the relentless fundraising. OK, Life Flight is a good cause, but we can give to good causes without being constantly harangued ... can't we?



Video, Freedom And Mozilla

Note: below is nothing but my own opinion as a developer of video-related Mozilla code!

My LCA talk on Friday was about why open video is critically important to free software, and what Mozilla is doing about it (plus a discussion of the relationship between Web standards and free software in general). Little did I know that Youtube and Vimeo would pick the day before my talk to cast a glaring spotlight on the issue!

Youtube and Vimeo have started offering video playback using the HTML5 <video> element. That is good news for free software, since it means you don't need a closed-source Flash player to play the video [1]. However, they only offer video in H.264 format, and that is not good news for free software. A lot of people have noticed that Firefox doesn't support H.264, and apparently many people don't understand why, or know what the problems are with H.264. This is a good time to restate the facts and re-explain why Firefox does not support H.264. I'll be mostly recapitulating the relevant chunks of my talk. (Hopefully a full recording of my talk will become available from the LCA site next week.)
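
(For reference, here's roughly what offering the same clip in more than one format looks like in markup --- file names invented; the browser plays the first <source> it can handle:)

    <video controls width="640" height="360">
      <source src="clip.ogv" type="video/ogg">
      <source src="clip.mp4" type="video/mp4">
      <!-- fallback content, e.g. a Flash player or a download link -->
    </video>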

The basic problem is simple: H.264 is encumbered by patents whose licensing is actively pursued by the MPEG-LA. If you distribute H.264 codecs in a jurisdiction where software patents are enforceable, and you haven't paid the MPEG-LA for a patent license, you are at risk of being sued.

So why doesn't Mozilla just license H.264 (like everybody else)? One big reason is that that would violate principles of free software that we strongly believe in. In particular, we believe that downstream recipients of our code should be able to modify and redistribute it without losing any functionality. This is freedom that copyleft licenses such as the GPL and LGPL (which we use for our code) are intended to ensure. It is possible to obtain patent licenses in a way which works around the letter of the GPLv2 and LGPLv2, but honoring the letter while violating the spirit is not a game we are interested in playing.

But aren't there (L)GPL implementations of H.264? Yes, but they're not as free as they appear. Their freedom has been silently stolen by patents (in jurisdictions where those patents exist and are enforceable). The software license permits you to redistribute and use the code, but the MPEG-LA can still stop you. [2]

But the MPEG-LA won't bother suing me or my project, we're not worth bothering with. Perhaps true, but I hope "remain irrelevant" is not the favoured strategy for most free software projects. It's certainly not an option for Mozilla. If we hadn't distributed Firefox to tens of millions of users --- legally --- it probably wouldn't be possible to browse the Web today using anything but IE on Windows. Plus, relying on selective enforcement is rarely a good idea.

Mozilla should just ship without licensing as a civil disobedience measure. That might be fun, but I expect an injunction would quickly force us to disable H.264 and send a hefty damages payout to the MPEG-LA. That's not a win.

Mozilla should pick up and use H.264 codecs that are already installed on the user's system. I've previously written about a variety of reasons this would be a bad idea, especially on Windows. Really there are two main issues:


  • Most users with Windows Vista and earlier do not have an H.264 codec installed. So for the majority of our users, this doesn't solve any problem.
  • It pushes the software freedom issues from the browser (where we have leverage to possibly change the codec situation) to the platform (where there is no such leverage). You still can't have a completely free software Web client stack.

But I could just download gst-plugins-ugly and I'd be OK. That's a selfish attitude. Everyone should be able to browse the Web with a free software stack without having to jump through arcane hoops to download and install software (whose use is legally questionable).

The H.264 patents will expire soon, and then we'll be OK. Many H.264 patents don't expire until 2017 or later. Anyway, H.264 isn't the last word in video compression. There will be an H.265 and the same set of problems will persist.

Users just want video to work. You Mozilla people are such idealists! Yes, that is the reason for Mozilla to exist. Anyway, in the short term, our users probably won't be affected much since Flash fallback will still work. In the long term, I think freedom will ultimately benefit users (not just Firefox users, but all users).

Apart from the issues with H.264 support in clients, there are also huge issues around H.264 for Web authors and content providers. Currently providing H.264 content on the Internet is zero-cost, but after 2010 that will almost certainly change. A couple of good articles are here and here. We won't know much about the terms until the end of this month. The key issue is not exactly how much it will cost, but that if you want to publish H.264 you will probably have to hire lawyers and negotiate a license with the MPEG-LA. If you just want to put a few videos on your Web site, or add a help video to your Web application, or put a video cut-scene in your Web game, that is probably not something you want to do. Web video is not just about Youtube; mandatory licensing would cripple the use of video on the Web. (Just imagine if we had such a regime for still images...) Even if there were no patent issues on the client side, this would still be a good reason for Mozilla to push for truly free codecs.

The honest truth is that none of us know how this is going to play out. The proponents of mandatory licensing are strong, and most people don't care about software freedom. We're doing our best to make Ogg Theora rock, and I don't know what else we can do directly right now except spread the word and help people understand what's at stake here ... hence my LCA talk, and this blog post.

[1] Yes, I know gnash and/or swfdec might play these videos, but in general they are not able to keep up with the latest Flash APIs offered by Adobe and used by major sites. Anyway, it's good to not have to play that game.

[2] RANT: in many cases, free implementations of heavily patent-encumbered technology are harmful to the free software ecosystem, for two reasons: people are confused into thinking they have rights that they actually don't, and these implementations can discourage the adoption of alternatives that are free for everyone.



The Road To LCA: Ruapehu

Several Mozilla-related people went to linux.conf.au in Wellington last week. We decided to combine this with our annual outdoorsy "team building for accounting purposes" Auckland Mozilla office event, by driving down from Auckland and spending a couple of days in Tongariro National Park on the way.

So, Sunday night found me, Karl Tomlinson, Michael Ventnor, Josh Matthews, Taras Glek, Matthew Gregan, Chris Pearce, Tim Terriberry and a few others in National Park. The drive down was great ... our visitors finally saw the sheep they'd been waiting for. We had a great dinner at Schnapps Bar, the only hitch being that we exhausted their venison pie special. Our evening was occupied with a six-player game of Settlers. I won.

On Monday the weather report predicted afternoon thunderstorms so our plans to ascend to the summit of Mt Ruapehu were aborted. Instead we took the Whakapapaiti Valley Track from the Bruce Road down to Whakapapa Village, many of us taking a detour to the Silica Rapids near the end. Even though it rained a fair bit, I still really enjoyed this walk. It descends from alpine terrain into forest, crosses some nice mountain streams, and the Silica Rapids look awesome. Photos below.

Monday night's dinner was at the Station Cafe ... pretty good, although I think Schnapps Bar is better value for money. Monday night's game was Scrabble. Josh won, although I claim I was crippled by getting 4 "I"s on my last tile draw.

On Tuesday the weather was again very cloudy up the mountain and the chairlifts were reported to be closed. Just for the heck of it we drove up to the Whakapapa ski area anyway, and thanks to some fast talking by Karl we managed to hire a guide and get the chairlifts to run us up the mountain to the start of the walk to the Ruapehu summit. Due to people leaving on Monday night and Michael not feeling up to wearing his boots for another five hours, six of us went up: me, Matthew and Casey, Karl, Josh and Taras. It was a fantastic trip. The cloud came and went, but we had great views of parts of the mountain, especially the crater lake. Just being up there eating lunch looking down into the swirling waters of the lake was amazing for a volcanophile like me. (The base of the lake is a molten sulphur plug over the vent; the surface lake water is currently heated to 16-17C and is highly acidic.) The walk up was steep and rockhoppy, but fun, and going down was even more fun, due to a few short cuts that involved butt-sliding down the snow. Highly, highly recommended. I can't wait to do it again.

Our guide Callum was great, and had a lot of stories to tell. He was involved in the rescue operation in 2007 when the volcano belched a few rocks into the Dome Shelter hut and smashed the leg of a climber sleeping there. The DOC ranger we met at the summit claimed that there are still pieces of sleeping bag at the bottom of a hole in the hut floor...

As soon as we got back to the car, Michael, Josh, Taras, Karl and I drove straight to Wellington. We didn't get there in time for the Speaker's Dinner, but I had already expected that ... I heard the dinner was great, but I don't regret choosing Ruapehu instead! I'll blog more about LCA soon.



Alpine terrain near the top of the Bruce Road.



A stream in Whakapapaiti Valley. The water was so cold that just dipping my hand made my fingers tingle.



No wonder the venison pie was on special.



Lunch in the Whakapapaiti Valley.



Apparently the water leaches alumino-silica minerals out of the volcanic rock while underground. When it reaches the surface and encounters rapids, carbon dioxide leaves the water and the minerals precipitate out.



Heading up to the Ruapehu summit. Most of the walk ascends rocky ridges like the one you see here.



View of the crater lake from the Dome Shelter area. If you look carefully you can see an outer area of lighter turquoise. The boundaries of that area change slowly as you watch, presumably due to convection from the volcanic heat below. When the volcano is more active you can see more disturbances --- lumps of elastic yellow sulphur rising to the surface, geysers, and I suppose full-on phreatomagmatic blasts, although you only get to see one of those :-).



Sliding down the mountain. Lots of fun, especially with limited visibility.



More fun than static analysis.



A view of Ruapehu on the way to Wellington down the Desert Road. For me, Ruapehu defines the word "mountain".

Saturday, 16 January 2010

Volcanoes And Penguins

Next week is linux.conf.au in Wellington. I'm going to be there from Wednesday to Friday, and I'll be giving a talk about why open video is important for free software. Actually I think I'll make it more interesting by broadening it with some discussion of the relationship between Web standards and free software in general.

On the way down to Wellington, a group of Mozilla-related people are going to be stopping at National Park Sunday night and Monday night so we can attempt the Mt Ruapehu summit walk. Should be fun, if the weather cooperates.

Anyway, this means I'll be completely offline Monday/Tuesday and mostly offline Wednesday to Friday next week.



Friday, 15 January 2010

More On Patch Division

Kyle Huey asked me to comment on my approach to breaking up work into a large number of small patches.

First a big disclaimer: I've only been doing this for a short time, but I know other communities have been doing it for a long time. I bet Linux kernel developers know far more about this than I do. Probably many Mozilla developers do too.

As I always have, I try to plan out the work into incremental steps in my head before I actually start writing code. This plan always needs modification after contact with the enemy but it generally works. Each step is of course a new mq patch.

Sometimes I find that additional steps are needed. If I need to add new steps after the point I'm currently up to, there's nothing to do. If I need to add new steps before the point I'm currently up to, it's easy to pop off mq patches and do the work in the right place.
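
For example (patch names invented), inserting a new step between parts 3 and 4 is just:

    hg qgoto part-03            # pop until part-03 is the topmost applied patch
    hg qnew part-03b-newstep    # insert a new patch here in the series
    # ... make the changes for the new step ...
    hg qrefresh                 # capture them into the new patch
    hg qpush -a                 # reapply the rest of the queue on top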

When I think I've completed a step, I often (but not always) do a build to make sure it compiles. However, I rarely test my changes until I've finished the work. Perhaps this makes me a bad person, but I find that test effort before that point is often wasted since I frequently go back and change previous patches as I learn things from writing later patches.

When I'm testing and fixing bugs, I accumulate many fixes as a diff in my working copy. Periodically (once a day?) I take the changes in that diff and distribute them to the patches that each change logically belongs to. I.e., if I fix a bug in new code that was added by a patch, I apply the fix to that patch.
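
In concrete terms that looks something like this (patch and file names invented; the fix file is just the relevant hunks split out of the big diff by hand):

    hg diff > fixes.diff          # save all the accumulated fixes
    hg revert --all               # clean the working copy
    hg qgoto part-07-scrolling    # make the patch the fix belongs to the topmost applied one
    patch -p1 < part-07.fix       # apply just the hunks that belong to it
    hg qrefresh                   # fold them into that patch
    hg qpush -a                   # reapply the rest of the queue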

Once the code works, I then reread the patches (a sort of self-review). One thing I look for is large patches that could be broken up into smaller ones consisting of logically independent changes. (I especially focus on the largest patches, to try to minimize the maximum patch size.) Wherever breakup is easy to do (e.g. because the smaller patches don't overlap), I'll do it. If it's hard because patches overlap, I generally won't do it unless the gain in clarity seems large.
When patches can be broken up into really tiny fragments, I'm not sure what the correct minimum patch size is ... a patch consisting solely of many instances of the same mechanical change probably isn't worth breaking up, unless different reviewers are going to review each piece.

It's important for bisection searches that at each stage in the patch queue, the project at least builds. To verify this for large patch queues I just run a script overnight that does a loop of hg qpush ; make.
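
Such a script doesn't need to be fancy --- something like:

    #!/bin/sh
    # Rebuild at every stage of the patch queue and stop at the first breakage.
    hg qpop -a
    while hg qpush; do
      make || { echo "build broke at patch $(hg qtop)"; exit 1; }
    done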

When the code is ready to submit, I upload all the patches to the bug, numbering them with "Part NNN" to make it easier to refer to a particular patch. For large projects it's also helpful to publish the patches as a Mercurial repository using qcommit.

If reviews take a while, as they often do for large projects, every so often (once a week to once a month) I'll refresh the patches to trunk (see below) and push the results to tryserver to make sure nothing's broken.

One tip: at least with Mercurial 1.1, DO NOT refresh to trunk using rebasing (e.g. hg pull --rebase) while you have mq patches applied. Instead, hg qpop -a, hg pull -u, and then hg qpush -a. For me, rebasing screws up my patches in various ways. Also I find that fixing conflicts during rebasing is far more tedious than fixing conflicts during qpush, partly because I have no idea what rebasing is actually doing, but mostly because as of 1.1 rebasing with mq seems to force me to fix the same conflicts over and over again.



Wednesday, 13 January 2010

CSS Absolute Length Units

While reexamining the way we handle screen resolutions in Gecko, we had to reconsider how to handle CSS "absolute length units" --- especially pt, but also in, cm, mm and pc. The spec says that these units should be displayed at their physical sizes, so CSS "1in" is rendered as one inch, etc. Unfortunately this breaks Web content, because most Web pages were designed on desktop screens. A one-inch margin may be fine on paper or on a desktop screen but doesn't make much sense on a two-inch-high phone screen. For related reasons, IE and Webkit have already redefined these units to be fixed numbers of CSS pixels, i.e. 1in = 96px, etc.

After I raised this issue on www-style, it became clear that this is not just an issue of compatibility with existing content. In fact, absolute length units as defined in CSS are only rarely useful. When do you want to force a length to be one inch, no matter what kind of screen is being used or how far from the user's eyes it would be? The only use cases I know of would be touch interfaces and "life size" diagrams. Indeed, the CSS spec says

Absolute length units are only useful when the physical properties of the output medium are known.

On the Web, physical properties of the output medium are almost never known, so the spec suggests that currently absolute length units are useless on the Web.

It seems to me that it will be far more useful, as well as compatible with existing Web content, to specify that absolute length units should take those physical lengths when printed to "normal paper" --- paper (or other media) that you hold when reading, like almost every printed page. Then the browser decides how to render the content on a screen to best express the author's intent. Common sense, as well as Web compatibility, will dictate that the ratios between "px", "pt", "in", "cm", "mm" and "pc" will be fixed based on the assumption that 1in = 96px.

We may find it useful to define a new unit or units, say "truemm", that is a true physical millimeter, for the rare use cases of touch interfaces and "life size" diagrams.
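
In a style sheet the split would look something like this (class names are made up, and remember "truemm" is only a hypothetical unit at this point):

    /* Under the proposal this means 96px on any screen, and a physical
       inch only when printed to normal paper: */
    .page { margin: 1in; }

    /* A true physical unit would be reserved for the rare cases that need
       real-world sizes, e.g. touch targets or life-size diagrams: */
    .button { min-width: 10truemm; min-height: 10truemm; }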

Redefining "in" to not always mean physical inches is quite controversial in www-style. I hope sanity prevails.



Unifying Abstractions, Dividing Patches

Speaking of architectural misfeatures, one that's been glaring for quite some time now is the Gecko "view system" --- nsIView, nsIViewManager and friends. Originally views were intended to be used as "heavyweight" rendering objects for elements that used "advanced" features such as translucency. Over time it became clear that it was simpler to just support those features directly in the "frame tree" of rendering objects that most elements have, and now we see that everything in the view system would be simpler and more performant if it was handled by the frame system.

However, there are large chunks of functionality still handled in the view system. One big one is scrolling. Scrollable elements have an nsHTMLScrollFrame (or nsXULScrollFrame) which handles layout, scrollbars, and related stuff, except for the actual scrolling --- the visual movement. That is handled by nsScrollPortView. Well, until today, when I landed a series of patches to move all scrolling functionality in the view system out to the frame system. The hard part of these patches was that there were two interfaces exposed to let other parts of Gecko manipulate scrolling --- nsIScrollableFrame and nsIScrollableView. A lot of code was using the latter view-system interface; APIs present in nsIScrollableView but not nsIScrollableFrame had to be added to nsIScrollableFrame, and then all the callers of nsIScrollableView APIs had to be modified to use the frame API instead. This was a lot of work --- not really very hard, but difficult to get right without accidentally causing regressions.

Anyway, it's done. A lot of code that used to have to mess around with views no longer has to know about them. Scrolling code is simplified since it's all in one place. Page load is a little more efficient because we don't have to construct scrollable view objects. Better still, this just happened to fix the long hang at the end of Firefox loading the HTML5 spec, since that was all about updating a big pile of scrollable view objects! (There are still some issues with slow script execution ... they're being worked on.)

When I did this work I decided to do a little experiment in development style by breaking up the work into a very large number of small patches, managed with Mercurial Queues. The final commit has 37 separate changesets. I thought this might be overkill since our culture, and Bugzilla, aren't really designed to handle that many patches for a single bug. While Bugzilla (especially the way we use it for code review) could certainly be much better at handling a large set of patches, overall I declare the experiment a resounding success. I think this approach made it much easier for me to keep track of things, made code review much easier and less daunting, and made it much easier for me to keep the patches up to date over several months as the tree changed underneath them. I'll do it again. In fact, I'm already doing it again with the layers work --- more about that later :-).



There Can Be Only One

For quite a while now, Firefox 3.6 has been done as far as most Gecko work is concerned. Meanwhile all kinds of exciting things are happening on trunk, but we haven't been blogging enough about them. I'll try to do that a bit more starting today, and I encourage my colleagues to do the same!

Since it was first designed, Gecko has supported multiple presentations per document (in code terms, multiple nsIPresShells per nsIDocument). The idea is that you would be able to have multiple views of the same document, each with different style sheets and layouts, and display and even edit the views simultaneously and have them all stay in sync. It sounds cool, and the idea also appears in the DOM specs (see document.defaultView), but the reality is that there is no need for this feature in Web browsers, as Dave Hyatt pointed out a long time ago. (Different views into the same presentation --- e.g., live thumbnails --- are useful, but that's a different and much simpler feature.) Supporting multiple presentations just adds unnecessary complexity.

The only use of multiple presentations in Gecko was for printing and print preview. When printing or previewing a document, we'd create a temporary extra presentation. But for printing you don't need a new presentation for the same document; it's fine to make a copy of the document and create a presentation for the copy. In fact this is a better approach, because you can print or display a preview while continuing to change the original document. You don't want to be displaying a print preview while a script keeps making changes to the document; apart from not being what the user wants, we never supported such dynamic changes properly, so we had to carefully freeze scripting while a print preview was displayed. This was complex and fragile.

Thanks to Olli Pettay this is all history now. He went ahead and implemented the document copying approach for printing and print preview. This approach let us fix a lot of bugs more easily. For example, we can take a snapshot of the current frame of each animated image --- and video, and plugins --- when copying the document, so the copy is truly static (although you can still change the layout, e.g. by changing the page margins). We don't have to worry about temporarily freezing scripts --- we just disable scripting altogether in the copied document.

Better still, this let Olli go ahead and remove the support for multiple presentations. This lets us do various simplifications and optimizations. In particular, before, each presentation had a hash table mapping elements to their "primary frames" used for rendering that element in that presentation. Now a DOM element can have a direct pointer to its primary frame. This is more efficient and also lets us use much simpler code since we no longer need to maintain primary frame maps.
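
Roughly speaking (with simplified, invented class names, not the real Gecko interfaces), the change looks like this:

    #include <unordered_map>

    class Frame;      // a rendering object
    class Element;

    // Before: each presentation kept its own hash table from elements to
    // their primary frames.
    class PresShell {
    public:
      Frame* GetPrimaryFrameFor(Element* aElement) const {
        auto it = mPrimaryFrameMap.find(aElement);
        return it == mPrimaryFrameMap.end() ? nullptr : it->second;
      }
    private:
      std::unordered_map<Element*, Frame*> mPrimaryFrameMap;
    };

    // After: with exactly one presentation per document, the element can
    // simply hold a direct pointer to its primary frame.
    class Element {
    public:
      Frame* GetPrimaryFrame() const { return mPrimaryFrame; }
    private:
      Frame* mPrimaryFrame = nullptr;
    };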

In general there are lots of places in our code where we used to have to think "am I using the right presentation for this document"? We no longer have to think about that.

One architectural misfeature eliminated. Stay tuned for more :-).