Saturday, 23 May 2015

rr Performance Update

It's been a while (March 2014 to be precise) since I gathered meaningful rr performance numbers. I'm preparing a talk for the TCE 2015 conference and as part of that I ran some new benchmarks with mozilla-central Firefox. It turned out that numbers had regressed --- unsurprisingly, since we don't have continuous performance tests for rr, and a lot has changed since March 2014. In particular, Firefox has evolved a lot, our tests have changed, we're using x86-64 now instead of x86-32, and rr has changed a lot. Over the last few days I studied the regressions and fixed a number of issues: in particular, during the transition to x86-64 some of the optimizations related to syscall-buffering were lost because we weren't patching some important syscall callsites and we weren't handling the recvfrom syscall, which is common in 64-bit Firefox. I also realized that in some cases we were flushing much more data from the syscallbuf to the trace file than we'd actually recorded in the buffer, massively bloating the traces, and fixed that.

There are still some regressions evident since last March. Octane overhead has increased significantly. Forcing Octane to run on a single core without rr shows a similar overhead; in particular that alone causes one test (Mandreel) to regress by a factor of 10! My guess is that Spidermonkey is using multiple cores much more aggressively that it did last year and because it's carefully tuned for Octane, going back to a single core really hurts performance. Replay overhead on the HTML mochitests has also increased significantly; I think this is partly because we changed rr to disable syscall buffering on writes to standard output. This improves the debugging experience but it results in a lot more overhead during replay.

Overall though, I remain very happy with rr performance, especially recording performance, which is critical when you're trying to capture a test failure under rr. Replay performance is becoming more important since it impacts the debugging experience, especially reverse execution; but doing a lot of work to improve raw replay performance is low priority since I think there are projects that could provide a better improvement in the debugging experience for less work (e.g. the ability to take a checkpoint during a recording and start debugging from there, and implement support for gdb's evaluate-in-target conditional breakpoints).

Friday, 22 May 2015

BlinkOn 4

Last week I went to BlinkOn 4 in Sydney, having been invited by a Google developer. It was a lot of fun and I'm glad I was able to go. A few impressions:

It was good to hear talk about acting responsibly for the Web platform. My views about Google are a matter of public record, but the Blink developers I talked to have good intentions.

The talks were generally good, but there wasn't as much audience interaction as I'd expected. In my experience interaction makes most talks a lot better, and the BlinkOn environment is well-suited to interaction, so I'd encourage BlinkOn speakers and audiences to be a bit more interactive next time. I admit I didn't ask as many questions during talks as I usually do, because I felt the time belonged to actual Blink developers.

Blink project leaders felt that there wasn't enough long-term code ownership, so they formed subteams to own specific areas. It's a tricky balance between strong ownership, agile migration to areas of need, and giving people the flexibility to work on what excites them. I think Mozilla has a good balance right now.

The Blink event scheduling work is probably the only engine work I saw at BlinkOn that I thought was really important and that we're not currently working on in Gecko. We need to get going on that.

Another nice thing that Blink has that Gecko needs is the ability to do A/B performance testing on users in the field, i.e. switch on a new code path for N% of users and see how that affects performance telemetry.

On the other hand, we're doing some cool stuff that Blink doesn't have people working on --- e.g. image downscaling during decode, and compositor-driven video frame selection.

I spent a lot of time talking to Google staff working on the Blink "slimming paint" project. Their design is similar to some of what Gecko does, so I had information for them, but I also learned a fair bit by talking to their people. I think their design can be improved on, but we'll have to see about that.

Perhaps the best part of the conference was swapping war stories, realizing that we all struggle with basically the same set of problems, and remembering that the grass is definitely not all green on anyone's side of the fence. For example, Blink struggles with flaky tests just as we do, and deals with them the same way (by disabling them!).

It would be cool to have a browser implementors' workshop after some TPAC; a venue to swap war stories and share knowledge about how to implement all the specs we agreed on at TPAC :-).

Friday, 15 May 2015

Using rr To Debug Dropped Video Frames In Gecko

Lately I've been working on a project to make video rendering smoother in Firefox, by sending our entire queue of decoded timestamped frames to the compositor and making the compositor responsible for choosing the correct video frame to display every time we composite the window. I've been running a testcase with a 60fps video, which should draw each video frame exactly once, with logging to show when that isn't the case (i.e. we dropped or duplicated a frame). rr is excellent for debugging such problems! It's low overhead so it doesn't change the results much. After recording a run, I examine the log to identify dropped or dup'ed frames, and then it's easy to use rr to replay the execution and figure out exactly why each frame was dropped or dup'ed. Using a regular interactive debugger to debug such issues is nigh-impossible since stopping the application in the debugger totally messes up the timing --- and you don't know which frame is going to have a problem, so you don't know when to stop anyway.

I've been using rr on optimized Firefox builds because debug builds are too slow for this work, and it turns out rr with reverse execution really helps debugging optimized code. One of the downsides of debugging optimized code is the dreaded "Value optimized out" errors you often get trying to print values in gdb. When that happens under rr, you can nearly always find the correct value by doing "reverse-step" or "reverse-next" until you reach a program point where the variable wasn't optimized out.

I've found it's taking me some time to learn to use reverse execution effectively. Finding the fastest way to debug a problem is a challenge, because reverse execution makes new and much more effective strategies available that I'm not used to having. For example, several times I've found myself exploring the control flow of a function invocation by running (forwards or backwards) to the start of a function and then stepping forwards from there, because that's how I'm used to working, when it would be more effective to set a breakpoint inside the function and then reverse-next a few times to see how we got there.

But even though I'm still learning, debugging is much more fun now!

Sunday, 10 May 2015

Piha Night Walk

Yesterday, as a bit of an experiment, we did a walk out at Piha intending to see the sun set during the walk and completing the walk in the dark. It was a resounding success!

We started near Lion Rock around 4pm and walked north along the beach, then turned inland up the Marawhara Walk, White Track and Rose Track walking quite fast uphill and downhill to Whites Beach. We arrived around 5:40pm just after sunset (5:30ish), so we didn't quite get to watch the sun actually set from the beach as planed --- next time I'd start a little earlier. The view in the dusk was still lovely.

After that we walked out to Te Waha Point in the gathering gloom, then back to Piha Beach via the Laird Thomson track --- lights essential. After reaching the beach we turned our lights off and walked back south along the beach in darkness. This was really fun --- the beach is an easy walk in the dark, the night sky was surprisingly good (considering proximity to Auckland), and there's a narrow sea-cave near the north end of the beach that you can go all the way into to experience true pitch-darkness.

This was even more fun than I'd hoped. Most walks are best done in summer when there's plenty of daylight, but this walk is a great winter option --- after it gets dark you've still got time to drive back to town and have dinner together or whatever. Do bring lights and warm clothing --- it could be quite cold if windy and wet, though we were lucky with the weather. As a bonus it helped my children get used to tramping in the dark.

Thursday, 23 April 2015

rr 3.1 Released

I released rr 3.1 just now. It contains reverse execution support, but I'm not yet convinced that feature is stable enough to release rr 4.0. (Some people are already using it with joy, but it's much more fun to use when you can really trust it to not blow up your debugging session, and I don't yet.) We needed to do a minor release since we've added a lot of bug fixes and system-call support since 3.0, and for Firefox development rr 3.0 is almost unusable now. In particular various kinds of sandboxing support are being added to desktop Linux Firefox (using unshare and seccomp syscalls) and supporting those in rr was nontrivial.

In the future we plan to do more frequent minor releases to ensure low-risk bug fixes and expanded system-call support are quickly made available in a release.

An increasing number of Mozilla developers are trying rr, which is great. However, I need to figure out a way to measure how much rr is being used successfully, and the impact it's having.

Monday, 20 April 2015

Another VMWare Hypervisor Bug

Single-stepping through instructions in VMWare (6.0.4 build-2249910 in my case) with a 32-bit x86 guest doesn't trigger hardware watchpoints.

Steps to reproduce:

  1. Configure a VMWare virtual machine (6.0.4 build-2249910 in my case) booting 32-bit Linux (Ubuntu 14.04 in my case).
  2. Compile this program with gcc -g -O0 and run it in gdb:
    int main(int argc, char** argv) {
      char buf[100];
      buf[0] = 99;
      return buf[0];
  3. In gdb, do
    1. break main
    2. run
    3. watchpoint -l buf[0]
    4. stepi until main returns
  4. This should trigger the watchpoint. It doesn't :-(.

Doing the same thing in a KVM virtual machine works as expected.


Thursday, 2 April 2015

Reverse Execution And Signals

gdb's reverse execution interface interacts with signals in counter-intuitive ways. If you're using rr and gdb reverse execution to debug situations involving signals, e.g. a SIGSEGV, read on...

Consider the following program test.c:

int main(int argc, char **argv) {
  __asm__ __volatile__("jmp 0x42");

We can debug this with rr as follows:

[roc@eternity test]$ rr ./test
rr: Saving the execution of `/home/roc/tmp/test' to trace directory `/home/roc/.rr/test-6'.
[rr.170] Warning: task 14677 (process 14677) dying from fatal signal SIGSEGV.
[roc@eternity test]$ rr replay
GNU gdb (GDB) 7.9
0x00002aaaaaaaf6f6 in _dl_start () from /lib64/
(gdb) cont
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000042 in ?? ()
(gdb) where
#0  0x0000000000000042 in ?? ()
#1  0x0000000000000000 in ?? ()

At this point you get that awful sinking feeling... But wait!

(gdb) reverse-stepi
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000042 in ?? ()
(gdb) reverse-stepi
main (argc=1, argv=0x7fffffffdf38) at /home/roc/tmp/test.c:28
28   __asm__ __volatile__("jmp 0x42");


The obvious questions are: why did we have to reverse-stepi twice to get back to the jmp, and why did the first reverse-stepi trigger SIGSEGV again?

If you singlestep forwards through the program using gdb normally, starting at the jmp instruction, you actually see two separate events:

(gdb) run
Starting program: /home/roc/tmp/test 
Breakpoint 1, main (argc=1, argv=0x7fffffffdfd8) at /home/roc/tmp/test.c:28
28   __asm__ __volatile__("jmp 0x42");
(gdb) stepi
0x0000000000000042 in ?? ()
(gdb) stepi
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000042 in ?? ()
What's happening is that the first stepi runs the jmp and arrives at the invalid location. The second stepi tries to run the instruction at that location and triggers SIGSEGV instead. Now, in the original session we first ran all the way past the SIGSEGV. Then our first reverse-stepi makes us reverse-execute one step: in this case, we reverse-execute the triggering of the SIGSEGV. (In gdb, reverse-executing the triggering of a signal prints the signal just as forward-executing does.) The next reverse-stepi reverse-executes the actual jump.

It feels a little weird, though it makes some amount of sense. It's even weirder when you reverse-singlestep through the execution of a signal handler, but it still all makes sense. I've been pleasantly surprised by gdb's robustness at handling that sort of thing. I've been unpleasantly unsurprised by the number of rr bugs I've had to iron out to make this work properly at scale for Gecko debugging!