Thursday, 23 April 2015

rr 3.1 Released

I released rr 3.1 just now. It contains reverse execution support, but I'm not yet convinced that feature is stable enough to release rr 4.0. (Some people are already using it with joy, but it's much more fun to use when you can really trust it to not blow up your debugging session, and I don't yet.) We needed to do a minor release since we've added a lot of bug fixes and system-call support since 3.0, and for Firefox development rr 3.0 is almost unusable now. In particular various kinds of sandboxing support are being added to desktop Linux Firefox (using unshare and seccomp syscalls) and supporting those in rr was nontrivial.

In the future we plan to do more frequent minor releases to ensure low-risk bug fixes and expanded system-call support are quickly made available in a release.

An increasing number of Mozilla developers are trying rr, which is great. However, I need to figure out a way to measure how much rr is being used successfully, and the impact it's having.

Monday, 20 April 2015

Another VMWare Hypervisor Bug

Single-stepping through instructions in VMWare (6.0.4 build-2249910 in my case) with a 32-bit x86 guest doesn't trigger hardware watchpoints.

Steps to reproduce:

  1. Configure a VMWare virtual machine (6.0.4 build-2249910 in my case) booting 32-bit Linux (Ubuntu 14.04 in my case).
  2. Compile this program with gcc -g -O0 and run it in gdb:
    int main(int argc, char** argv) {
      char buf[100];
      buf[0] = 99;
      return buf[0];
    }
  3. In gdb, do
    1. break main
    2. run
    3. watchpoint -l buf[0]
    4. stepi until main returns
  4. This should trigger the watchpoint. It doesn't :-(.

Doing the same thing in a KVM virtual machine works as expected.

Sigh.

Thursday, 2 April 2015

Reverse Execution And Signals

gdb's reverse execution interface interacts with signals in counter-intuitive ways. If you're using rr and gdb reverse execution to debug situations involving signals, e.g. a SIGSEGV, read on...

Consider the following program test.c:

int main(int argc, char **argv) {
  __asm__ __volatile__("jmp 0x42");
}

We can debug this with rr as follows:

[roc@eternity test]$ rr ./test
rr: Saving the execution of `/home/roc/tmp/test' to trace directory `/home/roc/.rr/test-6'.
[rr.170] Warning: task 14677 (process 14677) dying from fatal signal SIGSEGV.
[roc@eternity test]$ rr replay
GNU gdb (GDB) 7.9
...
0x00002aaaaaaaf6f6 in _dl_start () from /lib64/ld-linux-x86-64.so.2
(gdb) cont
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000042 in ?? ()
(gdb) where
#0  0x0000000000000042 in ?? ()
#1  0x0000000000000000 in ?? ()

At this point you get that awful sinking feeling... But wait!

(gdb) reverse-stepi
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000042 in ?? ()
(gdb) reverse-stepi
main (argc=1, argv=0x7fffffffdf38) at /home/roc/tmp/test.c:28
28   __asm__ __volatile__("jmp 0x42");

Hurrah!

The obvious questions are: why did we have to reverse-stepi twice to get back to the jmp, and why did the first reverse-stepi trigger SIGSEGV again?

If you singlestep forwards through the program using gdb normally, starting at the jmp instruction, you actually see two separate events:

(gdb) run
Starting program: /home/roc/tmp/test 
Breakpoint 1, main (argc=1, argv=0x7fffffffdfd8) at /home/roc/tmp/test.c:28
28   __asm__ __volatile__("jmp 0x42");
(gdb) stepi
0x0000000000000042 in ?? ()
(gdb) stepi
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000042 in ?? ()
What's happening is that the first stepi runs the jmp and arrives at the invalid location. The second stepi tries to run the instruction at that location and triggers SIGSEGV instead. Now, in the original session we first ran all the way past the SIGSEGV. Then our first reverse-stepi makes us reverse-execute one step: in this case, we reverse-execute the triggering of the SIGSEGV. (In gdb, reverse-executing the triggering of a signal prints the signal just as forward-executing does.) The next reverse-stepi reverse-executes the actual jump.

It feels a little weird, though it makes some amount of sense. It's even weirder when you reverse-singlestep through the execution of a signal handler, but it still all makes sense. I've been pleasantly surprised by gdb's robustness at handling that sort of thing. I've been unpleasantly unsurprised by the number of rr bugs I've had to iron out to make this work properly at scale for Gecko debugging!