Friday, 16 April 2021

Demoing The Pernosco Omniscient Debugger: Debugging Crashes In Node.js And GDB

This post was written by Pernosco co-founder Kyle Huey.

In traditional debugging, an engineer forms a hypothesis about what is going wrong with the program, gathers evidence to accept or reject that hypothesis, and repeats until the root cause of the bug is found. This process is time-consuming, and formulating useful hypotheses often requires a deep understanding of the software being debugged. With the Pernosco omniscient debugger there’s no need to speculate about what might have happened. Instead, an engineer can ask what actually did happen. This radically simplifies the debugging process, enabling much faster progress while requiring much less domain expertise.

To demonstrate the power of this approach we have two examples from well-known and complex software projects. The first is an intermittently crashing Node.js test. From a simple stack walk it is easy to see that the proximate cause of the crash is calling a member function with a NULL `this` pointer. The next logical step is to determine why that pointer is NULL. In a traditional debugging approach, this requires either pre-existing familiarity with the codebase or reading code to find the places where the value of this pointer could originate. Then an experiment, either poking around in an interactive debugger or adding relevant logging statements, must be run to see where the NULL pointer actually comes from. And because this test fails intermittently, the engineer has to hope that the issue can be reproduced again and that the experiment doesn’t disturb the program’s behavior so much that the bug vanishes.

In the Pernosco omniscient debugger, the engineer just has to click on the NULL value. With all program state available at all points in time, the Pernosco omniscient debugger can track this value back to its logical origin with no guesswork on the part of the user. We are immediately taken backwards to the point where the connection in question received an EOF and set this pointer to NULL. You can read the full debugging transcript here.
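To make the idea of tracking a value back to its origin concrete, here is a minimal toy sketch in Python. It is not how Pernosco is implemented. It assumes a made-up recorded trace of writes, and every location name (`conn.stream`, `local.stream_ptr`), timestamp, and note is hypothetical. It only illustrates that once every write is recorded, "where did this NULL come from?" becomes a lookup rather than an experiment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Write:
    time: int            # position in the recorded execution
    dest: str            # location that was written (hypothetical names)
    value: object        # value that was stored
    src: Optional[str]   # location the value was copied from, or None if produced here
    note: str            # human-readable description of the event


def last_write(trace, location, before):
    """Return the most recent write to `location` with time < before, or None."""
    candidates = [w for w in trace if w.dest == location and w.time < before]
    return max(candidates, key=lambda w: w.time) if candidates else None


def trace_origin(trace, location, at_time):
    """Follow copies backwards until reaching the write that produced the value."""
    chain = []
    loc, t = location, at_time
    while True:
        w = last_write(trace, loc, t)
        if w is None:
            break
        chain.append(w)
        if w.src is None:      # the value was produced here, not copied
            break
        loc, t = w.src, w.time
    return chain


# Hypothetical recording, loosely modelled on the NULL `this` scenario above.
TRACE = [
    Write(1, "conn.stream", "<stream object>", None, "connection opened"),
    Write(5, "conn.stream", None, None, "EOF received; pointer cleared"),
    Write(8, "local.stream_ptr", None, "conn.stream", "pointer loaded before the call"),
    Write(9, "this", None, "local.stream_ptr", "pointer passed as `this`"),
]

# The crash is observed at time 10 with `this` equal to None (NULL).
CHAIN = trace_origin(TRACE, "this", at_time=10)
for w in CHAIN:
    print(w.time, w.dest, w.note)
```

The last entry in `CHAIN` is the write that produced the value instead of copying it, which in this toy trace is the EOF handler clearing the pointer.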

Similarly, with a crash in gdb, the proximate cause of the crash is immediately obvious from a stack walk: the program has jumped through a bad vtable pointer to NULL. Figuring out why the vtable address has been corrupted is not trivial with traditional methods: entire tools, such as AddressSanitizer (ASan, which requires recompilation) and Valgrind (which is very slow), have been designed to find and diagnose memory corruption bugs like this. But in the Pernosco omniscient debugger a click on the object’s pointer takes the user to where it was assigned into the global variable of interest, and another click on the value of the vtable pointer takes the user to where the vtable pointer was erroneously overwritten. Walk through the complete debugging session here.
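The "where was this vtable pointer overwritten?" question can be pictured the same way. The sketch below is a toy model, not Pernosco's implementation and not the actual gdb bug: the addresses, sizes, and the stray `memset` are all invented for illustration. It assumes a log of memory writes as byte ranges and finds the most recent write that overlaps the bytes holding the vtable pointer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemWrite:
    time: int   # position in the recorded execution
    addr: int   # first byte written
    size: int   # number of bytes written
    note: str   # human-readable description of the event


def last_write_covering(log, addr, size, before):
    """Most recent write with time < before that overlaps [addr, addr + size)."""
    hits = [
        w for w in log
        if w.time < before and w.addr < addr + size and addr < w.addr + w.size
    ]
    return max(hits, key=lambda w: w.time) if hits else None


# Hypothetical layout: an object at 0x1000 whose first 8 bytes hold its vtable pointer.
VTABLE_SLOT = 0x1000
VTABLE_SIZE = 8

# Hypothetical recorded writes.
LOG = [
    MemWrite(2, 0x1000, 8, "constructor stores the vtable pointer"),
    MemWrite(4, 0x1008, 4, "constructor initialises a field"),
    MemWrite(7, 0x0FF0, 24, "memset on a neighbouring buffer runs 8 bytes too far"),
]

# The crash is observed at time 9 when calling through the vtable pointer.
CULPRIT = last_write_covering(LOG, VTABLE_SLOT, VTABLE_SIZE, before=9)
print(CULPRIT.time, hex(CULPRIT.addr), CULPRIT.note)
```

In this toy log the query skips the unrelated field write at time 4 and lands on the overlong write at time 7, which is the kind of answer a single click provides in the session described above.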

As demonstrated in the examples above, the Pernosco omniscient debugger makes it easy to track down even classes of bugs that are notoriously difficult to work with, such as race conditions or memory corruption errors. Try Pernosco with an individual account or on-premises today!
