Tuesday 1 October 2024
Advanced Debugging Technology In Practice
I spent most of September traveling, partly to deliver a keynote at the DEBT debugging workshop in Vienna (associated with ECOOP). My goal was to inform people that “advanced” debugging technologies like record-and-replay and omniscience exist, that they are being used in practice, and that this use is increasing; and to explain what has been successful, what has not, and what I think we need to do to be more successful. There was no recording and I can’t share the slides, so here’s my summary.
The first part of my talk was about https://rr-project.org. After a brief explanation of what it is, I talked about what people like about it: the “record first, debug later” workflow; “reverse dataflow analysis” by setting a data watchpoint and reverse-continuing; solving the “stepped too far, start over” problem; breaking at a specific line of output using rr replay -a -M and rr replay -g; and the familiarity of the GDB interface. I discussed (at a high level) what we know about rr adoption in specific communities — browser vendors, security researchers, Wall St, “Big Tech” companies, and language runtimes. The general trend is that powerful debuggers are popular with people who have lots of code they don’t understand.
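To make those workflows concrete, here is roughly what they look like at the command line. This is a sketch, not a tutorial: the program name, the watched expression, and the event number are made up, but the rr and GDB commands themselves are real.

    # Record once, then debug the same execution as many times as you like.
    $ rr record ./myapp                 # ./myapp is a hypothetical program
    $ rr replay                         # replay the latest recording under GDB

    # "Reverse dataflow analysis": watch the memory holding a bad value,
    # then run backwards to the write that produced it.
    (rr) watch -l obj.field
    (rr) reverse-continue

    # "Stepped too far, start over": don't start over, just step backwards.
    (rr) reverse-step

    # Breaking at a specific line of output:
    $ rr replay -a -M                   # annotate stdout/stderr with event numbers
    $ rr replay -g 12345                # start debugging at that event number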
I talked about the increasing awareness of rr in the developer community. These days, in almost every Hacker News comment thread about debugging someone will mention rr, which wasn’t the case even as recently as a couple of years ago. (I have heard a similar observation from my friends at Undo.) In the room, most people had previously heard about rr, several people had used it, and a few were even using it in their research. I pointed out that the growth in awareness and usage is organic and cultural, not because of changes in rr itself — rr hasn’t had any major new features in eight years. rr has succeeded more than most similar projects because we made some good decisions ten years ago and have doggedly kept on polishing it. (Thanks to Google for paying me to spend some of my time doing that.)
An interesting point is that awareness of rr and similar tools like Undo and Pernosco is not just a matter of people knowing they exist, but believing that they actually work in practice. In the past, a lot of people simply did not believe that these tools could possibly do what we say they do. That is becoming less of a problem.
I switched gears to talk about Pernosco. I introduced the idea of omniscient debugging and how Kyle and I built Pernosco to explore and try to sell it, first as a hosted service and then on-prem. I talked about how we can exploit massive parallelism to build omniscient databases that instantly answer key questions (e.g. “what is the value in this memory location/register at time T?”, “when was the value last modified?”). I ran through a few of the features that I think our users enjoy — the basic UI structure, reverse dataflow, exploring control flow via callees, and the shared notebook. This UI is less familiar than GDB so some people bounce off, but some people grasp it right away without instruction, so I think we did something right.
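For readers who haven’t seen omniscient debugging before, here’s a toy sketch of the core idea in Python. This is only an illustration of the kind of index involved, not how Pernosco actually works: if the replay logs every memory write, then a per-address, time-sorted index answers both of those questions with a binary search, and building such indexes shards naturally across addresses and time ranges, which is where the parallelism comes in.

    import bisect
    from collections import defaultdict

    class WriteIndex:
        """Toy omniscient index: every memory write, keyed by address, sorted by time."""
        def __init__(self):
            self.times = defaultdict(list)    # addr -> [t0, t1, ...] (nondecreasing)
            self.values = defaultdict(list)   # addr -> [v0, v1, ...] (parallel list)

        def record_write(self, addr, time, value):
            # Writes are fed in replay order, so each per-address list stays sorted.
            self.times[addr].append(time)
            self.values[addr].append(value)

        def value_at(self, addr, time):
            """What is the value at this address at time T?"""
            i = bisect.bisect_right(self.times[addr], time) - 1
            return self.values[addr][i] if i >= 0 else None

        def last_modified(self, addr, time):
            """When was this address last written, at or before time T?"""
            i = bisect.bisect_right(self.times[addr], time) - 1
            return self.times[addr][i] if i >= 0 else None

The real system has to handle registers and vastly larger traces, but the shape of the queries is the same.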
Pernosco has only been modestly successful so far, so I talked about what’s gone well and what hasn’t. Customers seem to love the product and keep subscribing. On the other hand, people are used to debuggers being free — that’s a cultural issue. You need an rr recording to use Pernosco, so you usually have the option of just debugging with rr, and some people are completely happy with that. Scalability of omniscience to very compute-heavy workloads is a problem (although not as problematic as everyone assumes).
I finished up by talking about open problems that we need to solve to accelerate the adoption of advanced debugging technology. I see two kinds: increasing the power of debuggers, and increasing the ubiquity of debuggers.
We still lack direct approaches to understanding why things didn’t happen. I’ve been saying it for years, but we need good tools for understanding why something happened in one situation (testcase, code revision, nondeterministic test run) but not in another very similar situation, assuming we have complete recordings of the two runs — and doing this at realistic scale.
Users often want to visualize program state at a higher level than the C/C++ level. People write custom pretty-printers for their data structures, but a lot more work is needed here. For debuggers with richer interfaces (like Visual Studio and Pernosco) you want custom visualizations, like Natvis or something even richer.
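As a reminder of what the state of the art looks like for plain GDB, here is a minimal pretty-printer using GDB’s Python API (the Point type and its fields are invented for the example); LLDB and Visual Studio each need their own, different version of the same thing.

    # point_printer.py: a minimal GDB pretty-printer for a hypothetical C++ "Point" type.
    # Load it with: (gdb) source point_printer.py
    import gdb

    class PointPrinter:
        def __init__(self, val):
            self.val = val

        def to_string(self):
            return "Point({}, {})".format(self.val["x"], self.val["y"])

    def lookup_point(val):
        # Return a printer only for values whose (typedef-stripped) type is Point.
        if str(val.type.strip_typedefs()) == "Point":
            return PointPrinter(val)
        return None

    gdb.pretty_printers.append(lookup_point)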
Custom control-flow visualization is also important. The problems with debugging C++/Rust “async” code are well-known, but there are many other examples. Suppose you have a classic event loop pulling events from a queue; while handling an event, if you look at the stack trace you probably don’t want to see the boring event loop code; you want to see the stack at the point where this event was dispatched.
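To illustrate the kind of information a debugger would need for that, here is a toy Python event loop that captures the stack at dispatch time and carries it along with the event. This is purely a sketch and the names are mine; the point is that a debugger which understood this convention could splice the dispatch stack in above the handler frames instead of showing the loop itself.

    import traceback
    from collections import deque

    class EventLoop:
        def __init__(self):
            self.queue = deque()

        def post(self, handler, *args):
            # Capture the stack at the point where the event is dispatched;
            # this is the context that is actually interesting while the handler runs.
            dispatch_stack = traceback.extract_stack()
            self.queue.append((handler, args, dispatch_stack))

        def run(self):
            while self.queue:
                handler, args, dispatch_stack = self.queue.popleft()
                # Today you'd have to print dispatch_stack yourself; ideally the
                # debugger would show it in place of the boring loop frames.
                handler(*args)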
The most interesting case of debugging at a higher level is when the higher level is an actual programming language. For example, if your application contains a JavaScript VM, you will sometimes want to look at the state at the JavaScript level. Pernosco can do this, but it’s somewhat hardcoded; how can we make it easy for users to provide these customizations for their own programming languages?
What makes all this extra hard is that we really want extension APIs that work across debuggers, even when those debuggers are quite different in their implementations (e.g. GDB vs Pernosco). It’s already a real problem that GDB, LLDB and Visual Studio all have different APIs for writing custom pretty-printers 😔.
The ubiquity problem is basically that advanced debuggers tend to be the tool of last resort. This creates a vicious cycle: they don’t get used much, so there’s little investment, which makes them less usable. I dream of a world where, when you hit a bug (e.g. by running a test), it is faster and easier to drop into the UI of a powerful debugger like Pernosco to debug it than to add logging statements, recompile and rerun. That means, among other things, that your test runs are always being recorded and any necessary debuginfo is always available. That will be a lot of work to achieve, but I think it’s feasible — at least in specific domains.