Thursday, 28 December 2006

A New Approach To Debugging

One of the painful truths about Linux development is that debugging sucks. Specifically, gdb on Linux sucks. Basic functionality simply does not work reliably. The details of the brokenness vary depending on the version, but the general suckiness seems to hold up across versions. Since it fails on the basics, we need not even discuss the lack of modern features or its appalling usability. This is a big problem for Linux because it is driving developers away from the platform.

Here's a deeper and less widely understood truth: all debuggers suck. Debugging is mainly about tracing bug symptoms back to buggy code, i.e. tracing effects to causes. Effects follow their causes in time, so debugging by nature requires backwards traversal of time. But traditional debuggers don't provide this because it's not obvious how to implement it. Instead they let the programmer monitor a program's forward execution, in the hope that the programmer will be able to somehow remember or reconstruct enough history, usually by running the program several times, to figure out for themselves the backwards cause and effect chain. Such tools aren't particularly suited to the debugging task; they're just a compromise forced by either limited imagination or implementation constraints.

Lots of other people have made this observation and have tried to extend traditional debuggers with some support for looking backwards in time. For example:


  • SoftICE supports a "back trace buffer" that records a limited number of instruction executions in a certain instruction address range. Only the program counter values are recorded.
  • Green Hills' "Time Machine" debugger also has a trace buffer, which apparently can include register and memory deltas so that old memory and register values can be restored by "backwards stepping" through the buffer. Unfortunately it seems this buffer is limited to memory, and therefore for a large application only a small history window can be retained. Furthermore Green Hills' implementation requires specialized hardware if you want to trace more than program counter values.
  • TTVM logs the interactions between a guest virtual machine and its host so that later the exact execution of the guest VM can be replayed. Periodic checkpoints are recorded so that the replay cost required to reach any desired point in time is limited.

In Time Machine and TTVM, the UI is a traditional debugger UI extended with "reverse" versions of the standard forward execution commands. The debugger still displays the program state at a certain point in time; there is now some capability to move the current point backwards as well as forwards.

This is good progress --- but those tools still need to overcome implementation limitations to become immediately useful for mainstream desktop application debugging (i.e. useful to me!). More importantly, I want more than just reverse execution. For example, when debugging we frequently want to know exactly when a variable was set to its current (incorrect) value, and see what the program was doing at that time. Reverse-execution debuggers can set a watchpoint on the variable and execute backwards to find that write, but it would be quicker and more powerful if the debugger just knew the history of the variable and could perform an efficient database lookup to find the last time the variable was written before the current time. The debugger should have the entire history of the program available for efficient inspection and querying. With such a debugger, after we've gathered the history we don't need to run the program any more; the user just browses the history in the debugger until the bug is located.

The only other exposition of this vision I know of is Omnisicient Debugging. Bil Lewis has created a Java debugger that takes this approach. Unfortunately it is restricted to Java programs, and apparently rather small and short-running Java programs since it keeps the trace data in memory. So the big question is: can we implement this vision so that it's a practical approach for debugging real applications on commodity hardware?

I have spent a few of the last 18 months investigating this question. I have built a system, which I'm calling Amber, to record the complete execution history of arbitrary Linux processes. The history is recorded using binary instrumentation based on Valgrind. The history is indexed to support efficient queries that debuggers need, and then compressed and written to disk in a format optimized for later query and retrieval. The history supports efficient reconstruction of the contents of any memory location or register at any point in time. It also supports efficient answers to "when was the last write to location X before time T", "when was location P executed between times T1 and T2", and other kinds of queries. I can record the 4.1 billion instructions of a Firefox debug build starting up, displaying a Web page, and exiting; the compressed, indexed trace is about 0.83 bytes per instruction executed. (With some cleverness, we achieve compression ratios of over 20 to 1.) It takes less than half an hour to record on my laptop. 300X slowdown may sound like a lot, but spending computer time to save programmer time is almost always a good idea. Furthermore Amber tracing easily parallelizes across multiple cores so hardware trends are in my favour.

I've built a trace query engine in C and a prototype debugger front end using XULRunner to demonstrate the power of this approach. It works as expected. I've found that basing the debugger entirely on a "complete recording" approach dramatically simplifies some of the difficult parts of debugger implementation. Novell has filed for a patent on my work (and is planning to file another regarding an extension which I haven't described here yet), but I hope to get Novell to release the source code under an open source license so I can keep working on it. Maybe others will be interested in working on it too.

When I get a chance I'll write more about how Amber works, its full set of features, and what needs to be done to get it to prime-time.



37 comments:

  1. This sounds amazing, and could really help. I hope Novell do release it as OSS. :-D

    ReplyDelete
  2. Does it only work for firefox, or can you throw just about any app at it?
    Either way, I *really* hope it'll become free software.

    ReplyDelete
  3. This is a cool idea. I looked at the problem a bit 15 years ago, and even then there was a lot of prior work. If I recall correctly, a serious problem with this approach is that in order to replay, you either have to remember a *lot* of state, or remember a smaller amount but take a lot of time for each step backwards. That somewhat limits the usefullness of the approach.
    More importantly for Amanda, if the tool is restricted by patents, it'll essentially be dead in the water in the Open Source world. Novell's name is mud right now because of the deal they cut with Microsoft. Maybe you could convince Novell to renounce their Amanda patents as a measure of good will. It won't help them, but it would help you.
    (PS: This blog software is seriously broken.)

    ReplyDelete
  4. I'm sure Rob is also aware of the Standard ML of NJ debugger that supported reverse execution, based on checkpoint and replay:
    http://citeseer.ist.psu.edu/tolmach93debugger.html
    That method wouldn't extend well to imperative languages like C that do more frequent stores. And unlike Rob's slick approach, it interacts poorly with operations like I/O that are hard to reverse and/or nondeterministic.
    This raises another question I have: would the Amber approach support exploring "alternate timelines" (what if value X rather than Y had been written at time T?). This seems like a next step you might want after determining the apparent cause of the misbehavior, to try out a fix without having to reproduce the problem.
    Would reconstituting an executable state from the trace be "hard"? Does the trace include enough information to do it in principle? Of course you have the sticky I/O issues.
    Looking forward to hearing more.

    ReplyDelete
  5. Robert O'Callahan28 December 2006 21:52

    Anonymous: you can throw pretty much any app at it.
    Mike:
    > If I recall correctly, a serious problem with
    > this approach is that in order to replay, you
    > either have to remember a *lot* of state
    What I've tried to explain is that a) "replay" is the wrong way to think about this and b) you do have to remember a lot of state, but it turns out you can compress it really, really well, and the output data volume is not too high to get work done.
    There's a lot of FUD floating around about Novell, Microsoft and patents. Nevertheless, the concerns are about Microsoft's patents, not Novell's, so I don't see a problem here.
    Darrell: Reconstructing an executable state and restarting execution would be fairly easy in theory: as I said, we can reconstruct the contents of every memory location and register for whatever time we want. As you obviously realize, the hard issues are reconstructing out-of-process state such as kernel state. Probably doable though. Would be a very nice project. It would be useful perhaps to skip that part and just allow execution up to the next system call ... that would let you test many kinds of fixes. Another issue is that you'd need some kind of edit-and-continue capability to "patch in" updated code without changing the layout of unaffected code.

    ReplyDelete
  6. Any idea if the recording is IO bound on the disk or if a single machine running the app could quickly offload the state stream to a farm of recorders?
    run state -> socket -> message queue -> farm -> disks?

    ReplyDelete
  7. Robert O'Callahan29 December 2006 02:14

    The way things are configured right now, the bottleneck is the zlib compression.
    I suspect the before-compression bandwidth is too high for effective offload to other machines, but it might be doable. There are lots of possibly interesting configurations.

    ReplyDelete
  8. Sounds great - but for now, I'd be happy with the simplicity of the VC debugger and the ability to step into the code with watches, expression evaluators, etc. This may technically be possible with gdb and some GUI wrapper, but I really have no clue and haven't spent the time to try and learn gdb's ins and outs...

    ReplyDelete
  9. Brilliant idea. I'd always taken non-reversibility as a given :)
    I wonder how expensive the javascript (or python, perl, etc.) equivalent would be in terms of space and speed --- whether it would be practical today.

    ReplyDelete
  10. Robert O'Callahan29 December 2006 06:48

    You can use Amber to record the execution of a Python or Javascript program. Heck, Firefox is such a program.
    Of course, front-end work is required to display the state of the Javascript or Python interpreter (or JIT, for that matter) in a way that makes sense to Javascript or Python programmers, so there's no free lunch. But the basic technology is language and runtime neutral.
    I think this is a very interesting approach to debugging JITted code. You can see the execution of JITted code, then look back at the execution of the compiler that generated the code to figure out the relationship between the source code to the JITted code. The JIT compiler probably doesn't have to generate debug information at all.

    ReplyDelete
  11. Sorry, but the problem is not just with Microsoft's patents. The problem is with software patents in general.

    ReplyDelete
  12. Robert O'Callahan29 December 2006 20:41

    Yes, software patents in general are bad. But the anti-Novell brouhaha that Mike referred to and I was responding to is specifically about *Microsoft* patents.

    ReplyDelete
  13. Oh the irony - I was working on reverse execution in GDB at Apple, before jumping to Mozilla. Can't say too much about it, since not in any product, except that I was very happy to escape...
    Another system worthy of notice is the Simics instruction-set simulator, which can reverse at the instruction level. Trunk GDB has support for it, in the form of reverse-step and the like.

    ReplyDelete
  14. This is a good approach to debugging, but hardly new. Check out the debugger in Visualworks Smalltalk (you can go back to arbitrary parts of the stack frame, of the running program, change values and/or change CODE, save, and continue as if nothing had happened; this is much more useful as the default action for unhandled exceptions is to give several options, including bringing up the debugger). Other Smalltalk implementations, such as Squeak, have similar debuggers. http://www.cc.gatech.edu/computing/classes/cs2390_96_spring/stable/Debugger.html
    has a dated-looking screenshot - but it is from '96; I believe the general form predates this.
    The VisualWorks debugger is -by far- the best that I've used, and I've used quite a few. I'd go so far as to say that it doesn't suck.
    Less impressively, Ocaml has a "back-in-time" debugger based on snapshots; http://caml.inria.fr/pub/docs/manual-ocaml/manual030.html
    Related, though not identical, with C and gdb, from 1995: http://lwn.net/1999/0121/a/mec.html

    ReplyDelete
  15. Robert O'Callahan30 December 2006 02:35

    Stack unwind and edit-and-continue are quite common these days; Java and Visual Basic have supported them for a long time, for example. They are, however, quite different to what I'm talking about.
    The O'Caml debugger sounds similar to the SML/NJ debugger that Darrell mentioned. That's closer to what I'm talking about, but still has a lot of limitations compared to Amber.
    I've been aware of MEC's work for a long time. It can be thought of as a replay technology that prefigures work like ReVirt. It's less complete, for example I don't think MEC ever replayed signal timing accurately.

    ReplyDelete
  16. Robert O'Callahan30 December 2006 02:42

    It's worth pointing out that a replay facility like MEC's or ReVirt would work well in conjunction with Amber. You could capture a replayable trace with low overhead, then reexecute --- slowly, and offline --- with Amber recording enabled --- then debug.

    ReplyDelete
  17. What about DTrace?

    ReplyDelete
  18. Robert O'Callahan30 December 2006 07:46

    DTrace is also different --- a more powerful program monitoring framework, basically.
    And, in case anyone mentions it, Frysk is another program monitoring framework in development. Interesting, but not much like Amber.

    ReplyDelete
  19. From the paper titled "Framework for Instruction-level Tracing and Analysis of Program Executions"
    We present a runtime framework with a goal of collecting a complete, machine- and task-independent, user-mode trace of a program’s execution that can be re-simulated deterministically with full fidelity down to the instruction level. The framework has reasonable runtime overhead and by using a novel compression scheme, we significantly reduce the size of traces. Our framework enables building a wide variety of tools for understanding program behavior. As examples of the applicability of our framework, we present a program analysis and a data locality profiling tool. Our program analysis tool is a time travel debugger that enables a developer to debug in both forward and backward direction over an execution trace with nearly all information available as in a regular debugging session. Our profiling tool has been used to improve data locality and reduce the dynamic working sets of real world applications.
    Oh. Here's the URL to the PDF version: http://research.microsoft.com/manuvir/papers/instruction_level_tracing_VEE06.pdf.

    ReplyDelete
  20. Robert O'Callahan30 December 2006 09:46

    That is interesting; I didn't know about it, so thanks for the pointer.
    It's definitely the closest thing to Amber that I know of. However, their approach is still based on reexecution (or rather, resimulation), and hence is actually quite different to what Amber does. It means they have lower overhead --- although not as much lower as you might think, if you add up the "recording" and "resimulation" overheads. On the other hand, it can't provide the efficiency of Amber's state reconstruction and queries (although unfortunately they don't measure that, and just mention vaguely "indexes" in their debugger). They certainly do enough to add reverse execution to a traditional debugger, but that was already done in ReVirt. I'm aiming for even more.
    Very interesting to know that it's being used inside Microsoft, though. I wonder if they're planning to make a product out of it.

    ReplyDelete
  21. I doubt it'll become a product anytime soon (<= 3yrs). Several internal dev apps have been used for years before becoming commercial products, if at all.
    The MS Research project's perf claims look like they're about an order of magnitude faster than what you state for Amber - though I don't know if the numbers are comparable. If they are, then Amber will have to have some features not easily doable with other tools.
    It sounds like you have other ideas in mind.

    ReplyDelete
  22. Robert O'Callahan31 December 2006 02:13

    My idea is that the recording overhead doesn't matter very much for a lot of applications and/or bugs, in particular when we can do the recording without any human intervention ... which we often can, because these days we have a lot of infrastructure for running automated tests.
    Given that assumption, it makes most sense for us to focus on reducing the time spent during the debugging phase, because that's human time, which is far more valuable than computer time. That's where I think the extra information captured by Amber will be valuable.
    If that assumption is wrong, though, there are quite a few things that can be done to reduce Amber's recording overhead.

    ReplyDelete
  23. Moving the mechanics of recording from a binary instrumentation framework like Valgrind to a hypervisor (virtual machine monitor, e.g. Xen) would probably make it easier to record either the full state of the machine or an arbitrary subset of the functionality, since the hypervisor sits below even operating systems and knows everything about all aspects of what it's hosting. Perhaps it would also speed things up, compared to the kinds of indirections Valgrind introduces.
    I can only assume that it would at least make it fully possible (and "easy") to make whole-system snapshots of every single aspect you're interested in (kernel state, you name it). Of course, the amount of data output by such a low-level recorder would have an upper bound that is enormous.
    (Thinking out loud: Much like you can share a VM snapshot today, you could potentially share a "debug snapshot" between developers or, in an ideal world, between end users to developers, allowing a bug assignee to run an actual snapshot of a system where the bug manifests itself.)

    ReplyDelete
  24. Robert O'Callahan31 December 2006 04:05

    Hypervisors have no way to see what's happening on an instruction-by-instruction basis, which is what Amber wants. If all you want is replay, AND you're running on a single processor core, hypervisors are great.
    If you're running on multiple cores then for accurate replay a hypervisor isn't enough, because it can't tell how writes to shared memory by one processor are observed by other processors.
    Amber actually makes it easy to share program traces; the traces themselves are very large and hard to move around, but the query engine is accessible over a socket so you can allow other people to connect and analyze the trace. So we could have new kinds of Bugzilla workflows where users and testers record bugs using Amber and publish query engine connection info in Bugzilla; developers can then connect, debug, and submit a patch for testing. Great for those hard-to-reproduce bugs.

    ReplyDelete
  25. I'm not saying there exist such capabilities today. I was thinking along the lines of a far-reaching paravirtualized system, where more aspects of the hosted OS would be hypercalls rather than plain old syscalls. A lower level instrumentation sort of thing.
    Although thinking about it some more, it may end up being a problem that degrades into a "attach Valgrind to everything" sort of situation... At the very least it's even farther from trivial than Amber. Possibly not worth the hassle.

    ReplyDelete
  26. Wow. This is huge. That's all I have to say. Congrats Robert!

    ReplyDelete
  27. Some comments/corrections about the Green Hills TimeMachine debugger.
    1) About 90% of the debugging problems a programmer can encounter can be solved with a very small trace buffer. Typically the last 1 to 5 Million instructions is sufficient.
    2) The TimeMachine trace buffer is not limited to memory. It downloads chunks of trace data from the program at certain intervals. Those intervals are either when the user requests, when the target halts, when the chunk fills up, and/or a regular polling interval.
    Each chunk is limited to memory. However all of the chunks put together are not.
    3) You would be surprised at how much data memory can store. A 1MB trace buffer can store about 5 Million instructions. Therefore if you have a GB of extra RAM you can record up to 5 billion instructions. I've tested this with the Green Hills tools and works well.
    4) The performance hit of the Green Hill's instrumentation is about 1.5X to 7X. At that speed your program is still completely usable. Slowing things down 300X can make any application unusable. Something that takes 3 seconds (like program startup) now takes 15 minutes.
    5) If you are debugging certain types of hardware targets you can do all of this with NO PERFORMANCE HIT. There is built in hardware for these devices which spits out trace data at real time.
    6) If you are using a Green Hills SuperTrace Probe you automatically get 1GB of trace buffer. Therefore you have zero performance hit, and zero memory hit.
    Note that these tools are all currently commercially available (I have them!), but very much proprietary.

    ReplyDelete
  28. FWIW, real automatic tests for many GUI-rich apps is still fairly rare in some circles. It would be nice to dream of a day when this isn't so, but I'm not sure how realistic a dream it is. I think having the performance be good enough to be able to run small interactive test cases would be a great thing, although I don't know how easy that would be.

    ReplyDelete
  29. Robert O'Callahan4 January 2007 02:17

    Michael: Thanks for those comments. All too often it's hard to figure out what's really going on from marketing material. Does the Green Hills debugger support "go back to the last time this memory location was modified and show me the program state at that time"?

    ReplyDelete
  30. I have no special information about Amber, but it seems to me that many folks commenting above seem to be misunderstanding the approach Amber is taking, and how it's fundamentally different that the "rewind" tools. (Robert, please confirm, clarify or refute my point when you have a minute!)
    The goal of Amber is not just to be able to step backwards, the goal is to create a complete multidimensional model of the program's execution including the time dimension. I think the easiest way to explain the difference is to imagine tracking a bouncing ball.
    A traditional debugger lets you press pause and resume time, but it can't ever go back in time. You can start over again, but that's it (as far as time is concerned).
    What the majority of the "rewind" tools mentioned above do (to a greater or lesser degree) is let you also rewind time. Some let you rewind a little, others a lot.
    What Amber is doing is essentially creating an n+1 dimensional model of the data, with the time dimension expanded out. So, instead of seeing where the ball is at a particular point in time, Amber doing the moral equivalent of drawing a 2D graph of the ball's height on the y-axis and time on the x-axis. For some problems (eg. determining the height of the ball at time 't'), Amber doesn't seem to be very interesting. But for other problems (eg. determining the point in time when the ball has the greatest height), the Amber approach is pretty compelling.
    Another way to think of this difference is that, when you're examining the program state with Amber, the program is no longer running. Of course, everything that happened during the run can be examined, to the granularity of a single instruction. (Once I understood this, the name Amber was immediately and obviously the perfect choice!)
    The advantages to this is that, when looking at the results an Amber run, you are not at all bound by real-time. You can do queries with time as a component, but going "forward" or "backward" doesn't actually take any time. (Any more than looking at a different part of the ball's height chart takes time.)
    The primary disadvantage is that it doesn't let you alter the program state. Robert pointed out that you could use the Amber data to reconstruct the memory state at a particular point in time, then modify that state and continue executing from there. While that's cool and all, it's not really the point of Amber. If it's important to modify the program's state to see how that changes things, it doesn't seem like the Amber approach is the best.
    On the other hand, modifying the program state is more a short-cut to helping determine *how* to fix a bug, while Amber seems like it would be a way more efficient tool to determine *where* to find the bug.
    Robert, I'm pretty darned impressed with the concept behind Amber, and I hope that somehow we'll all be able to use something like it someday! Thanks!

    ReplyDelete
  31. What about http://www.virtutech.com/products/simics-hindsight.html

    ReplyDelete
  32. Robert O'Callahan6 January 2007 23:46

    dete: That's very accurate. I'm glad you "get it"!
    ol: That's another reverse execution system, yes. Like other reverse execution systems, it doesn't have Amber's query capability.

    ReplyDelete
  33. Robert,
    In response to your question...
    > Does the Green Hills debugger support "go back to
    > the last time this memory location was modified
    > and show me the program state at that time"?
    Yes it does have that ability. In fact, if you double click on a variable in the debugger a graph will appear showing a history of its value over time.
    There are some limitations... it only works on global variables and graphing only works on integer types.
    The n-dimensional approach sounds cool, but this tool just works and is insanely powerful. Once you start using a back in time debugger like this you never want to go back to gdb or visual studio.

    ReplyDelete
  34. >DTrace is also different --- a more powerful program monitoring framework, basically.
    As far as I know, dtrace is actually a "system" monitoring tool,it is able to trace a great deal of userland + kernel behaviours to locate the problem.
    Information for dtrace can be found at:
    http://opensolaris.org/os/community/dtrace/ and http://docs.sun.com/doc/817-6223
    BTW: dtrace is really a good tool which helps me a lot in debugging problems.
    And Amber seems like to be a tool for "program" debugging,aims to give detailed history for "application programs".

    ReplyDelete
  35. Another one to check out is Jockey:
    "Jockey is a user-space library for recording and replaying an execution of generic GNU/Linux programs. It is a debugging tool especially for long-running networked servers that suffer from bugs that are difficult to reproduce."
    http://home.gna.org/jockey/

    ReplyDelete