Sunday, 8 October 2017

Thoughts On Microsoft's Time-Travel Debugger

I'm excited that Microsoft's TTD is finally available to the public. Congratulations to the team! The video is well worth watching. I haven't used TTD myself yet since I don't have a Windows system at hand, but I've talked to Mozilla developers who've tried it on Firefox.

The most important and obvious difference between TTD and rr is that TTD is for Windows and rr is for Linux (though a few crazy people have had success debugging Windows applications in Wine under rr).

TTD supports recording of multiple threads in parallel, while rr is limited to a single core. On the other hand, per-thread recording overhead seems to be much higher in TTD than in rr. It's hard to make a direct comparison, but a simple "start Firefox, display mozilla.org, shut down" test run on similar hardware takes about 250 seconds under TTD and 26 seconds under rr. This is not surprising given TTD relies on pervasive binary instrumentation and rr was designed not to. This means recording extremely parallel workloads might be faster under TTD, but for many workloads rr recording will be faster. Starting up a large application really stresses binary translation frameworks, so it's a bit of a worst-case scenario for TTD — though a common one for developers. TTD's multicore recording might be better at reproducing certain kinds of concurrency bugs, though rr's chaos mode helps mitigate that problem — and lower recording overhead means you can churn through test iterations faster.

Therefore for Firefox-like workloads, on Linux, I still think rr's recording approach is superior. Note that when the technology behind TTD was first developed the hardware and OS features needed to support an rr-like approach did not exist.

TTD's ability to attach to arbitrary processes and start recording sounds great and would mitigate some of the slow-recording problem. This would be nice to have with rr, but hard to implement. (Currently we require reserving a couple of pages at specific addresses that might not be available when attaching to an arbitrary process.)

Some of the performance overhead of TTD comes from it copying all loaded libraries into the trace file, to ensure traces are portable across machines. rr doesn't do that by default; instead you have to run rr pack to make traces self-contained. I still like our approach, especially in scenarios where you repeatedly re-record a testcase until it fails.

The video mentions that TTD supports shared memory and async I/O and suggests rr doesn't. It can be confusing, but to clarify: rr supports shared memory as long as you record all the processes that are using the shared memory; for example Firefox and Chromium communicate with subprocesses using shared memory and work fine under rr. Async I/O is pretty rare in Linux; where it has come up so far (V4L2) we have been able to handle it.

Supporting unlimited data breakpoints is a nice touch. I assume that's done using their binary instrumentation.

TTD's replay looks fast in the demo videos but they mention that it can be slower than live debugging. They have an offline index build step, though it's not clear to me yet what exactly those indexes contain. It would be interesting to compare TTD and rr replay speed, especially for reverse execution.

The TTD trace querying tools look cool. A lot more can be done in this area.

rr+gdb supports running application functions at debug time (e.g. to dump data structures), while TTD does not. This feature is very important to some rr users, so it might be worthwhile for the TTD people to look at.

No comments:

Post a comment