Eyes Above The Waves

Robert O'Callahan. Christian. Repatriate Kiwi. Hacker.

Wednesday 1 April 2015

Reverse Execution And Signals

gdb's reverse execution interface interacts with signals in counter-intuitive ways. If you're using rr and gdb reverse execution to debug situations involving signals, e.g. a SIGSEGV, read on...

Consider the following program test.c:

int main(int argc, char **argv) {
  /* Jump to address 0x42, where no code is mapped. */
  __asm__ __volatile__("jmp 0x42");
}

We can debug this with rr as follows:

[roc@eternity test]$ rr ./test
rr: Saving the execution of `/home/roc/tmp/test' to trace directory `/home/roc/.rr/test-6'.
[rr.170] Warning: task 14677 (process 14677) dying from fatal signal SIGSEGV.
[roc@eternity test]$ rr replay
GNU gdb (GDB) 7.9
...
0x00002aaaaaaaf6f6 in _dl_start () from /lib64/ld-linux-x86-64.so.2
(gdb) cont
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000042 in ?? ()
(gdb) where
#0  0x0000000000000042 in ?? ()
#1  0x0000000000000000 in ?? ()

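At this point you get that awful sinking feeling: the program jumped to a garbage address, and since a jmp leaves no return address on the stack, the backtrace can't tell us how we got here. But wait!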

(gdb) reverse-stepi
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000042 in ?? ()
(gdb) reverse-stepi
main (argc=1, argv=0x7fffffffdf38) at /home/roc/tmp/test.c:28
28   __asm__ __volatile__("jmp 0x42");

Hurrah!

The obvious questions are: why did we have to reverse-stepi twice to get back to the jmp, and why did the first reverse-stepi trigger SIGSEGV again?

If you single-step forward through the program under plain gdb (without rr), starting at the jmp instruction, you actually see two separate events:

(gdb) run
Starting program: /home/roc/tmp/test 
Breakpoint 1, main (argc=1, argv=0x7fffffffdfd8) at /home/roc/tmp/test.c:28
28   __asm__ __volatile__("jmp 0x42");
(gdb) stepi
0x0000000000000042 in ?? ()
(gdb) stepi
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000042 in ?? ()

What's happening is that the first stepi executes the jmp and arrives at the invalid address. The second stepi tries to execute the instruction at that address and triggers SIGSEGV instead. In the original rr session, cont ran until the SIGSEGV was reported, so both of those steps had already happened. Our first reverse-stepi therefore reverse-executes one step, which in this case is the triggering of the SIGSEGV. (In gdb, reverse-executing the triggering of a signal reports the signal just as forward execution does.) The second reverse-stepi reverse-executes the jmp itself.

It feels a little weird, though it makes some amount of sense. It's even weirder when you reverse-singlestep through the execution of a signal handler, but it still all makes sense. I've been pleasantly surprised by gdb's robustness at handling that sort of thing. I've been unpleasantly unsurprised by the number of rr bugs I've had to iron out to make this work properly at scale for Gecko debugging!