Eyes Above The Waves

Robert O'Callahan. Christian. Repatriate Kiwi. Hacker.

Monday 23 April 2007

History Based Stack Reconstruction

One thing I forgot to mention in the last post was history-based stack reconstruction. This is an idea that I came up with a while ago but I had to leave it out of my original Amber discussion. It was implemented in the Xulrunner prototype debugger that I showed screenshots of.

Traditional debuggers reconstruct the call stack for a thread by examining stack memory. You can pick out return addresses, saved registers, etc., and most of the time you can figure out what state each function was in before it called its callee. The problem is that a) this is very hard to do right and b) you can't always do it right. Anyone who's used gdb a lot is probably familiar with situations where gdb won't give you a stack trace, or will only give you part of a stack trace.

It's hard to do right because optimizing compilers like to mess around with the layout of stack frames, and they like to avoid storing things in stack memory at all. They also like to avoid using a register for the "frame pointer" (especially on x86 where registers are precious gold dust) so they don't always construct the linked list of stack frames that you read about in textbooks. Debuginfo formats like DWARF2 define hellishly complex schemes for encoding information about what the compiler has done so the debugger can figure everything out, but it seems that gcc or gdb or both frequently screw it up. That's not surprising since there's a lot of hairy specification to implement.

Sometimes even the best-implemented tools won't help. Consider tail call optimization, for example: when a function A ends by calling another function B, a commonly desired optimization is to overwrite A's stack frame with the stack frame for B, so that when B returns, control returns directly to A's caller. This optimization simply destroys the information you need to construct the true call stack. Consider bugs where the stack is accidentally (or deliberately!) overwritten with garbage: same problem.

Now suppose you have a Chronicle-based debugger with easy access to the entire history of the program execution, and you want to reconstruct the call stack for some timestamp T. Instead of messing around with the stack memory at time T, look in the history to see which functions have been called but have not yet returned. Those are the functions that are on the call stack.
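To make the idea concrete, here's a minimal sketch in Python, assuming the history has already been digested into one record per function invocation with an entry timestamp and (possibly absent) return timestamp. The Invocation and call_stack_at names are purely illustrative; they're not Chronicle's actual query interface.

    # Hypothetical sketch of history-based stack reconstruction, assuming
    # the recorded history has been digested into per-invocation records.
    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class Invocation:
        function: str                  # name of the function entered
        entry_time: int                # timestamp of the call
        return_time: Optional[int]     # timestamp of the return, or None if it never returned

    def call_stack_at(history: List[Invocation], t: int) -> List[str]:
        # A frame is live at time t if it was entered at or before t and has
        # not returned by t. Sorting by entry time orders the stack from the
        # outermost caller down to the innermost callee.
        live = [inv for inv in history
                if inv.entry_time <= t and (inv.return_time is None or inv.return_time > t)]
        live.sort(key=lambda inv: inv.entry_time)
        return [inv.function for inv in live]

Of course, the interesting part is deciding what counts as a call and a return in the first place, which is what the next paragraph pins down.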

To implement this, we need to define what it means for a function to return. This isn't obvious since we want to include things like exception unwinding. The definition I chose was that a function called at time T has returned when the stack pointer first becomes greater than the value it had at time T. Defining when a function is called is also not obvious since we want to include things like tail calls, where the control transfer to the function might be just a jump instruction. I chose to define a function call as any control transfer to an instruction labelled as a function by symbol table information, or any execution of an x86 CALL instruction --- unless the destination of the CALL is the next instruction. The latter is a common idiom for getting the current program counter onto the stack so you can pop it off into a register for PC-relative addressing.
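Here's what those two definitions might look like in code. This is a sketch over a hypothetical stream of retired-instruction records: the field names (target, next_pc, is_call_insn) and the max_sp_over helper are assumptions for illustration, not Chronicle's real trace format; max_sp_over stands for the range-maximum query discussed in the next paragraph.

    def is_function_call(rec, function_entry_points) -> bool:
        # A call is any control transfer to an instruction the symbol table
        # labels as a function entry, or any x86 CALL instruction --- except a
        # CALL whose target is the next instruction (the "push the PC" idiom).
        if rec.target in function_entry_points:
            return True
        return rec.is_call_insn and rec.target != rec.next_pc

    def has_returned_by(sp_at_call: int, call_time: int, t: int, max_sp_over) -> bool:
        # A call made at call_time, when the stack pointer was sp_at_call, has
        # returned by time t if the stack pointer exceeded sp_at_call at some
        # point in (call_time, t]. max_sp_over(t1, t2) is assumed to answer the
        # range-maximum query described below.
        if t <= call_time:
            return False
        return max_sp_over(call_time + 1, t) > sp_at_call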

The rest of the gory details are in the updated paper --- a PDF version is in the Chronicle tarball I linked to in the last post. It's really not all that hard. The hardest bit is probably that you need a way to efficiently answer the question "what is the maximum value of the stack pointer over the time interval from T1 to T2", which requires a small amount of additional instrumentation all the way back to the Valgrind tool. But in the end it works really well. It's a lot less effort than implementing the DWARF2 stack crawling spec, and it's incredibly robust. Since we don't look at the stack memory at all, we construct a correct call stack in the presence of tail calls, or even if someone has zeroed out the stack completely. Also we never need to determine from the current PC what the enclosing function is (something traditional debuggers do) so we're robust to wild jumps, and also to arbitrary code placement optimizations.
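Purely for illustration, here's one way that range-max query could be answered if you had an in-memory array holding the stack pointer value at every timestamp: a standard sparse table, giving constant-time queries after O(n log n) preprocessing. Chronicle itself answers the query from its on-disk database using the extra Valgrind instrumentation mentioned above, so treat this SparseMaxTable class as a stand-in, not the real implementation.

    class SparseMaxTable:
        # Stand-in range-maximum structure over sp_values[t] = stack pointer at
        # timestamp t. Level k of the table holds maxima over blocks of length 2**k.
        def __init__(self, sp_values):
            self.table = [list(sp_values)]
            k = 1
            while (1 << k) <= len(sp_values):
                prev, half = self.table[-1], 1 << (k - 1)
                self.table.append([max(prev[i], prev[i + half])
                                   for i in range(len(prev) - half)])
                k += 1

        def max_sp(self, t1, t2):
            # Maximum stack pointer over timestamps t1..t2 inclusive (t1 <= t2):
            # cover the interval with two (possibly overlapping) power-of-two blocks.
            k = (t2 - t1 + 1).bit_length() - 1
            return max(self.table[k][t1], self.table[k][t2 - (1 << k) + 1])

With something like max_sp available, the has_returned_by check above becomes a constant-time test per candidate call, so reconstructing the whole stack at T stays cheap even for long histories.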

I think this is a good example of how designing a debugger around complete recording of program execution gives you the chance to simplify a lot of things and make them more robust at the same time.