Saturday, 14 November 2009

Sword +5 Against Orange

Today for the first time I seriously used VMware's record-and-replay feature for debugging. Chris Pearce had set up the mochitest harness to run test_access_controls.html over and over until it failed, to try to catch the "random orange" where the test has been intermittently failing on Tinderbox. He caught a failure after about 30 minutes of continuous test running in a recording VM a couple of days ago. (Each test run takes about 15 seconds, so this represents over a hundred test runs before we reproduced the failure.) Today I got around to debugging it.
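The rerun-until-failure loop is simple enough to sketch. This is a minimal illustration, not the actual mochitest harness change: run_until_failure and make_fake_test are hypothetical names, and the fake test simply fails on a chosen call so the example is deterministic.

```python
def run_until_failure(run_once, max_runs=1000):
    """Call run_once() repeatedly. Return the 1-based number of the first
    run that fails (returns False), or None if max_runs runs all pass."""
    for run in range(1, max_runs + 1):
        if not run_once():
            return run
    return None


def make_fake_test(fail_on_call):
    """Hypothetical stand-in for a real test run: passes on every call
    except call number fail_on_call."""
    calls = 0

    def run_once():
        nonlocal calls
        calls += 1
        return calls != fail_on_call

    return run_once


first_failure = run_until_failure(make_fake_test(120))
print(first_failure)            # 120
print(first_failure * 15 / 60)  # 30.0 minutes at 15 seconds per run
```

At 15 seconds per run, 30 minutes of continuous running is 120 runs, which matches the "over a hundred" figure above.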

Chris had captured an output log and had added some code to nsGlobalWindow::Dump to number the dump() messages. Then we can set a conditional breakpoint in nsGlobalWindow::Dump to stop whenever a particular message is about to be printed. A sensible thing to do is to break when we output the last "TEST PASSED" message before the first test failure. It took almost an hour to replay up to that point (replay is a bit slower than recording), so another sensible thing to do is to then immediately take a VM snapshot so future debugging can resume from that point relatively quickly as often as you like. It takes about two minutes to resume replay from a snapshot.
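The numbering trick can be sketched outside of Gecko. The following is an illustration only, not the code Chris added to nsGlobalWindow::Dump: NumberedDump, last_pass_before_first_failure and the "TEST FAILED" marker are assumptions made for the example. It numbers each message, finds the last "TEST PASSED" before the first failure in a captured log, and then stops at that message number on a second, identical run.

```python
import io
import sys


class NumberedDump:
    """Prefix each message with a running count, and call a hook just
    before a chosen message number is written."""

    def __init__(self, out=sys.stdout, break_at=None, on_break=None):
        self.out = out
        self.count = 0
        self.break_at = break_at
        self.on_break = on_break

    def dump(self, message):
        self.count += 1
        if self.count == self.break_at and self.on_break is not None:
            # Plays the role of the conditional breakpoint: we stop just
            # before message number break_at is printed.
            self.on_break(self.count, message)
        self.out.write(f"[{self.count}] {message}\n")


def last_pass_before_first_failure(log_lines, fail_marker="TEST FAILED"):
    """Return the number of the last 'TEST PASSED' line that precedes the
    first failing line, or None if there is no such line or no failure.
    fail_marker is an assumed failure string, not taken from mochitest."""
    last_pass = None
    for line in log_lines:
        number = int(line[1:line.index("]")])
        if fail_marker in line:
            return last_pass
        if "TEST PASSED" in line:
            last_pass = number
    return None


messages = ["TEST PASSED a", "TEST PASSED b", "TEST FAILED c", "TEST PASSED d"]

# "Recording" run: capture a numbered log.
log = io.StringIO()
recorder = NumberedDump(out=log)
for message in messages:
    recorder.dump(message)
target = last_pass_before_first_failure(log.getvalue().splitlines())

# "Replay" run: the same messages arrive in the same order, so the
# message number identifies the same point in the execution.
hits = []
replayer = NumberedDump(out=io.StringIO(), break_at=target,
                        on_break=lambda n, m: hits.append((n, m)))
for message in messages:
    replayer.dump(message)

print(target)  # 2
print(hits)    # [(2, 'TEST PASSED b')]
```

This only works because replay is deterministic: the message numbers in the captured log line up exactly with the numbers seen during replay.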

From then on it was fairly similar to a normal debugging session: setting breakpoints, looking at variables, rerunning the program as you work backwards towards the cause. Except that you don't have to worry about the execution being different and the bug not showing up. You know exactly what the log output is going to be, right down to the addresses of objects dumped in the log, so it's easy to set conditional breakpoints to stop on log events you want to look into.

The whole experience is pretty much what I expected: fabulous! It would be nice if the couple of minutes to restart execution from a snapshot could be reduced, and some of the other UI operations feel sluggish and heavyweight, but this was certainly by far the best way to debug what turned out to be a complex and very hard to reproduce failure. It's a good feeling to know that whatever happens, you will be able to go back over this execution again; it takes the fear out of debugging, and replaces it with confidence that you *will* be able to find the bug. My hat's off to E Lewis and the rest of the VMware team.

We need to do a bit more work to optimize and document our setup, get some more experience with these features and perhaps get some more automation for running tests, catching failures and building that vital initial snapshot. But I'm pretty confident that soon this will be an essential tool for us.

Wednesday, 4 November 2009

CSS Gradient Syntax

We landed support for a form of CSS gradients on trunk a while ago, but we got considerable feedback that our syntax --- an incremental improvement on WebKit's syntax, which basically exposes a standard gradient API in the most direct possible way --- sucked. A bunch of people on www-style got talking and Tab Atkins produced a much better proposal. Since we haven't shipped our syntax anywhere yet, dropping it and implementing Tab's syntax instead was a no-brainer. So Zack Weinberg, David Baron and I did that (using a -moz prefix of course), and today it landed on trunk. It should land on the Firefox 3.6 branch shortly. It's unfortunate to land something new like this after the last beta, but in this case, it seems like the right thing to do instead of shipping CSS gradient syntax that we know nobody wants.

This does mean that anyone who's currently using -moz-linear-gradient or -moz-radial-gradient on pages is going to find that their syntax doesn't work anymore. Hopefully that's not too many people yet.

Tuesday, 3 November 2009

Challenges In Software Research

One of the greatest errors I see in computer science research is a tendency to view research as a spectrum with "applied" at one end and "basic" at the other, with attached assumptions that "applied" is incremental, boring, and engineering-oriented, but "basic" is crisp, radical, and intellectually satisfying. This is a false dichotomy, and I like to debunk false dichotomies.

I think the most principled way to construct such a spectrum is to classify research projects according to the degree of simplifying assumption they make to reach a crisp problem statement. So "applied" research adopts most of the relevant "real world constraints" into its problem statements, and "basic" research throws most of them out. The error is to assume that "real world constraints" make a problem boring, intellectually unsatisfying, and non-conducive to finding real breakthroughs [1]. This may be true "on average", but when choosing a research project one is not constrained by averages. In my experience, interesting research topics are abundant, and there are problems whose solution can be immediately applicable to the world while also being intellectually satisfying and potentially breakthrough science. (The current frenzy over combined symbolic and concrete execution is a good example.)

Let me suggest several such problems in the area of software development technology that I feel aren't yet getting the attention they deserve.

Verifying Refactorings

We know that verifying real software against full specifications is prohibitively expensive in most cases. However, I hypothesize that one could cheaply verify most changes that land in mozilla-central. The secret is that most patches are refactorings --- changes that should not alter observable behaviour --- or can be split into refactorings plus a smaller change that actually alters behaviour. "Should not alter observable behaviour" is very simple to state and understand, but still very tight. It would be a huge win if we could prove that most such patches meet that specification. It's unclear how difficult this would be in general, but clearly patches that are the output of refactoring tools should be tractable to verify, and there is a lot of unexplored territory beyond that.

Improved Record And Replay Debugging

I don't have time to work on Chronomancer, but someone should be working out how to expose omniscience to the programmer to optimize the debugging experience.

Testing Fixes For Non-Reproducible Bugs

Record and replay technology is more or less here. (I'll post more about VMware's solution later, but suffice to say it's working quite well so far!) Suppose you have a recording of a bug which isn't reproducible. Now you can replay that recording one or more times and understand the bug and write a patch that should fix it. The problem you immediately run into is: how do you check that the patch really fixes the bug? You somehow need to inject the fix into the code and replay; that's tricky, and the fix may disturb the replay and cause the bug not to occur on that run for reasons unrelated to the bug actually being fixed.

Performance Measurement And Variation

Measuring performance is hard. One huge problem is that you want to measure effects that are much smaller than the random noise in the tests. I don't really know where this noise comes from. I would love to see research that analyzes the sources of run-to-run performance variation and describes techniques for reducing the variation.
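To make "much smaller than the random noise" concrete, here is a rough sketch of how many runs it takes to see a small effect through the noise. It is textbook statistics rather than anything specific to our tests, and it assumes independent runs with roughly normal noise of the same size before and after the change; runs_needed and the sample timings are made up for illustration. It says nothing about where the noise comes from.

```python
import math
import statistics


def runs_needed(sigma, delta, z=2.0):
    """Runs per configuration so that a difference of delta between two
    means is at least z standard errors of that difference.

    With n runs each, the standard error of the difference of the means is
    sigma * sqrt(2 / n), so we need n >= 2 * (z * sigma / delta) ** 2.
    z = 2 is a rough rule of thumb, not a full power analysis."""
    if sigma < 0 or delta <= 0:
        raise ValueError("sigma must be >= 0 and delta must be > 0")
    return math.ceil(2 * (z * sigma / delta) ** 2)


# Hypothetical timings in milliseconds for one test, used only to show
# how the per-run noise would be estimated from repeated runs.
samples = [100.0, 105.0, 95.0, 102.0, 98.0]
noise = statistics.stdev(samples)

# Noise of 5 ms per run and an effect of 1 ms: 200 runs before and
# 200 runs after the change.
print(runs_needed(5.0, 1.0))  # 200
```

The required number of runs grows with the square of the noise-to-effect ratio, which is why reducing the variation itself would be worth so much more than simply running tests more often.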

I believe all these problems are reasonably easy to get into, have been little investigated in the past, have lots of room for exploration by multiple groups, are intellectually stimulating, and are likely amenable to solutions that could be immediately applied to the needs of software developers.

[1] The converse error --- assuming that "basic" research must not be boring or incremental --- is also widely held, but more easily falsified. Consider the endless stream of esoteric type theories that are both boring and completely incremental, such as my POPL '99 paper!