Wednesday 14 June 2017
New "rr pack" Command
I think there's huge potential to use rr for debugging cloud services. Apparently right now interactive debugging is mostly not used in the cloud, which makes sense — it's hard to identify the right process to debug, much less connect to it, and even if you could, stopping it for interactive analysis would likely interfere too much with your distributed system. However, with rr you could record any number of process executions without breaking your system, identify the failed runs after the fact, and debug them at your leisure.
Unfortunately there are a couple of problems making that difficult right now. One is that the largest cloud providers don't support the hardware performance counter rr needs. I'm excited to hear that Amazon has recently enabled some HW performance counters on dedicated hosts — hopefully they can be persuaded to add the retired-conditional-branch counter to their whitelist (and someone can fix the Xen PMU virtualization bug that breaks rr). Another problem is that rr's traces aren't easy to move from one machine to another. I've started addressing this problem by implementing a new rr command, rr pack.
There are two problems with rr traces. One is that on filesystems that do not support "reflink" file copies, to keep recording overhead low we sometimes hardlink files into the trace, or for system libraries we just assume they won't change even if we can't hardlink them. This means traces are not fully self-contained in the latter case, and in the former case the recording can be invalidated if the files change. The other problem is that every time an mmap occurs we clone/link a new file into the trace, even if a previous mmap mapped the same file, because we have no fast way of telling if the file has changed or not. This means traces appear to contain large numbers of large files but many of those files are duplicates.
rr pack fixes both of those problems. You run it on a trace directory in-place. It deduplicates trace files by computing a cryptographic hash (BLAKE2b, 256 bits) of each file and keeping only one file for any given hash. It identifies needed files outside the trace directory, and hardlinks to files outside the trace directory, and copies them into the trace directory. It rewrites trace records (the mmaps file) to refer to the new files, so the trace format hasn't changed. You should be able to copy around the resulting trace, and modify any files outside the trace, without breaking it. I tried pretty hard to ensure that interrupted rr pack commands leave the trace intact (using fsync and atomic rename); of course, an interrupted rr pack may not fully pack the trace so the operation should be repeated. Successful rr pack commands are idempotent.
We haven't really experimented with trace portability yet so I can't say how easy it will be to just zip up a trace directory and replay it on a different computer. We know that currently replaying on a machine with different CPUID values is likely to fail, but we have a solution in the works for that — Kyle's patches to add ARCH_SET_CPUID to control "CPUID faulting" are in Linux kernel 4.12 and will let rr record and replay CPUID values.