Tuesday, 27 April 2021

Print Debugging Should Go Away

This is based on a comment I left on HN.

Many people prefer print debugging over interactive debugging tools. Some of them seem to have concluded that the superiority of print debugging is some kind of eternal natural law. It isn't: almost all the reasons people use print debugging can be overcome by improving debuggers — and to some extent already have been. (In the words of William Gibson, the future is already here, it's just not evenly distributed yet). The superiority of print debugging is contingent and, for most developers, it will end at some point (or it has already ended and they don't know it.)

Record-and-replay debuggers like rr (disclaimer: I initiated it and help maintain it), Undo, TTD, replay.io, etc address one set of problems with interactive debuggers. You don't have to stop the program to debug it; you can record a complete run, and debug it later. You can record the program many times until it fails and debug only the execution that failed until you understand the failure. You can record the program running in a far-off machine, extract the recording and debug it wherever you want.

Pernosco (disclaimer: also my baby) and other omniscient debuggers go much further. Fans of print debugging observe that "step debuggers" (even record-and-replay step debuggers, like rr) only show you one point in time, and this is limiting. They are absolutely right. Omniscient debuggers have fast access to all program states and can show you at a glance how program state changes over time. One of our primary goals in Pernosco (mostly achieved, I think) is that developers should never feel the need to "step" to build up a mental picture of how program state evolves over time. One way we do this is by supporting a form of "interactive print debugging":

Once you buy into omniscient debugging a world of riches opens to you. For example omniscient debuggers like Pernosco let you track dataflow backwards in time, a debugging superpower print debugging can't touch.

There are many reasons why print debugging is still the best option for many developers. rr, Pernosco and similar tools can't even be used at all in many contexts. However, most of the limitations of these tools (programming languages, operating systems, hardware platforms, overhead) could be mitigated with sufficient investment in engineering work and a modicum of support from platform vendors. It's important to keep in mind that the level of investment in these tools to date has been incredibly low, basically just a handful of startups and destitute open source projects. If the software industry took debugging seriously — instead of just grumbling about the tools and reverting to print debugging (or, at best, building a polished implementation of the features debuggers have had since the 1980s) — and invested accordingly we could make enormous strides, and not many people would feel the need to resort to print debugging.

Friday, 16 April 2021

Demoing The Pernosco Omniscient Debugger: Debugging Crashes In Node.js And GDB

This post was written by Pernosco co-founder Kyle Huey.

Traditional debugging forms a hypothesis about what is going wrong with the program, gathers evidence to accept or reject that hypothesis, and repeats until the root cause of the bug is found. This process is time-consuming, and formulating useful hypotheses often requires deep understanding of the software being debugged. With the Pernosco omniscient debugger there’s no need to speculate about what might have happened, instead an engineer can ask what actually did happen. This radically simplifies the debugging process, enabling much faster progress while requiring much less domain expertise.

To demonstrate the power of this approach we have two examples from well-known and complex software projects. The first is an intermittently crashing node.js test. From a simple stack walk it is easy to see that the proximate cause of the crash is calling a member function with a NULL `this` pointer. The next logical step is to determine why that pointer is NULL. In a traditional debugging approach, this requires pre-existing familiarity with the codebase, or reading code and looking for places where the value of this pointer could originate from. Then an experiment, either poking around in an interactive debugger or adding relevant logging statements, must be run to see where the NULL pointer originates from. And because this test fails intermittently, the engineer has to hope that the issue can be reproduced again and that this experiment doesn’t disturb the program’s behavior so much that the bug vanishes.

In the Pernosco omniscient debugger, the engineer just has to click on the NULL value. With all program state available at all points in time, the Pernosco omniscient debugger can track this value back to its logical origin with no guesswork on the part of the user. We are immediately taken backwards to the point where the connection in question received an EOF and set this pointer to NULL. You can read the full debugging transcript here.

Similarly, with a crash in gdb, the proximate cause of the crash is immediately obvious from a stack walk: the program has jumped through a bad vtable pointer to NULL. Figuring out why the vtable address has been corrupted is not trivial with traditional methods: there are entire tools such as ASAN (which requires recompilation) or Valgrind (which is very slow) that have been designed to find and diagnose memory corruption bugs like this. But in the Pernosco omniscient debugger a click on the object’s pointer takes the user to where it was assigned into the global variable of interest, and another click on the value of the vtable pointer takes the user to where the vtable pointer was erroneously overwritten. Walk through the complete debugging session here.

As demonstrated in the examples above, the Pernosco omniscient debugger makes it easy to track down even classes of bugs that are notoriously difficult to work with such as race conditions or memory corruption errors. Try out Pernosco individual accounts or on-premises today!

Wednesday, 14 April 2021

Visualizing Control Flow In Pernosco

In traditional debuggers, developers often single-step through the execution of a function to discover its control flow. One of Pernosco's main themes is avoiding single-stepping by visualizing state over time "all at once". Therefore, presenting control flow through a function "at a glance" is an important Pernosco feature and we've recently made significant improvements in this area.

This is a surprisingly hard problem. Pernosco records control flow at the instruction level. Compiler-generated debuginfo maps instructions to source lines, but lacks other potentially useful information such as the static control flow graph. We think developers want to understand control flow in the context of their source code (so approaches taken by, e.g., reverse engineering tools are not optimal for Pernosco). However, mapping potentially complex control flow onto the simple top-to-bottom source code view is inherently lossy or confusing or both.

For functions without loops there is a simple, obvious and good solution: highlight the lines executed, and let the user jump in time to that line's execution when clicked on. In the example below, we can see immediately where the function took an early exit.

To handle loops, Pernosco builds a dynamic control flow graph, which is actually a tree where leaf nodes are the execution of source lines, non-leaf nodes are the execution of a loop iteration and the root node is the execution of the function itself. Constructing a dynamic CFG is surprisingly non-trivial (especially in the presence of optimized code and large functions with long executions), but outside the scope of this post. Then, given a "current moment" during the function call, we identify which loop iterations are "current", and highlight the lines executed by those loop iterations; clicking on these highlights jumps directly to the appropriate point in time. Any lines executed during this function call but not in a current loop iteration are highlighted differently; clicking on these highlights shows all executions of that line in that function call. Hover text explains what is going on.

This presentation is still lossy — for example control-flow edges are not visible. However, user feedback has been very positive.

Try out Pernosco individual accounts or on-premises today!