Saturday, 26 May 2007

Record-And-Replay In Virtual Machines

VMware has announced very interesting record and replay functionality in VMware Workstation 6. This is not completely new (in the research world, TTVM did this, and they probably weren't the first), but it's excellent to see this functionality arriving so quickly in commercial VMs. It's also neat that they've got debugging integrated into it already. Too bad the debugger is gdb...

They report slowdown of around 5% and a logging rate of 2MB/minute for some unspecified workload, both of which are very good. These numbers aren't surprising, because only the inputs to a virtual machine from the outside world need to be logged, and those are usually very small. Disk I/O stays inside the VM. Only very heavy users of the network, the VMware shared file system, or some other high-bandwidth input device (e.g., USB video camera) are going to see a significant penalty.
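To make the point about small logs concrete, here is a toy sketch in Python. It is not how VMware does it (a real VM also has to log exactly when interrupts and other asynchronous events arrive); the `run`, `record`, `replay` and `network_source` names are all made up for illustration. The "guest" is deterministic apart from its external inputs, so logging those inputs is enough to reproduce the whole run.

```python
import random

MOD = 1_000_003

def run(steps, get_input):
    """Toy deterministic 'guest': its state depends only on the step
    number and on whatever external input arrives at that step."""
    state = 0
    for step in range(steps):
        # Internal work (think CPU and disk): deterministic, never logged.
        state = (state * 31 + step) % MOD
        payload = get_input(step)
        if payload is not None:
            state = (state + payload) % MOD
    return state

def network_source(step):
    """Hypothetical nondeterministic input: a byte arrives on ~1% of steps."""
    return random.randrange(256) if random.random() < 0.01 else None

def record(steps, source):
    """Run live, logging only the external inputs as (step, payload) pairs."""
    log = []
    def get_input(step):
        payload = source(step)
        if payload is not None:
            log.append((step, payload))
        return payload
    return run(steps, get_input), log

def replay(steps, log):
    """Re-run the guest, feeding inputs from the log instead of the outside world."""
    pending = dict(log)
    return run(steps, lambda step: pending.get(step))
```

Calling `final, log = record(10_000, network_source)` and then `replay(10_000, log)` gives back `final`, while `log` holds only the handful of steps on which input actually arrived. The more input the guest consumes, the bigger the log, which is why heavy network users pay more.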

This is definitely the way to go for low-overhead record and replay. A great way to debug would be to use a VM to record a test failure, then replay it under Chronicle-style instrumentation, then debug the failure from the Chronicle database using Chronicle-style debugging. This way, the recording overhead is minimal and could even be used on a live system with negligible perturbation. This suggests to me that approaches like Nirvana that trade off queryability in search of reduced recording overhead probably aren't worth it.
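As a rough sketch of that three-phase workflow, again in toy Python: a cheap recording run that keeps only the inputs, a replay with an instrumentation hook that captures every write, and a query over the resulting trace. None of this is Chronicle's actual API or database format; `guest`, `on_write` and `first_step_where` are hypothetical names, and the "database" is just a list.

```python
def guest(inputs, on_write=None):
    """Toy deterministic 'guest' that accumulates its inputs.
    on_write is an optional instrumentation hook called on every write."""
    total = 0
    for step, value in enumerate(inputs):
        total += value
        if on_write is not None:
            on_write(step, "total", total)
    return total

# Phase 1: cheap recording. Only the external inputs are kept.
recorded_inputs = [3, -1, 4, -10, 5]
live_result = guest(recorded_inputs)

# Phase 2: replay offline with heavy instrumentation, capturing every write.
trace = []
replay_result = guest(
    recorded_inputs,
    on_write=lambda step, name, value: trace.append((step, name, value)),
)

# Phase 3: debug by querying the trace rather than stepping through time.
def first_step_where(trace, name, predicate):
    """Return the first step at which variable `name` satisfied `predicate`."""
    for step, var, value in trace:
        if var == name and predicate(value):
            return step
    return None
```

Here `first_step_where(trace, "total", lambda v: v < 0)` answers "when did the total first go negative?" directly. The expensive part (building the trace) happens during replay, so the original run only paid for logging its inputs.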

Replaying a VM with massive binary instrumentation is a significant engineering problem; it would require major changes to Valgrind+Chronicle, for example. But conceptually it's fairly straightforward.

This is a really exciting development. There are lots of potential uses for VM record-and-replay. I think this will be a big win for VMware. Hopefully people will see beyond the obvious forward-and-back time travel approaches to debugging.


  1. A paper is here.

  2. found the pdf here:

  3. There was a paper from a group at TU Vienna this year that instrumented QEMU to collect taint flow and symbolic constraints from a Win2K guest. That might be an interesting start point for someone who wanted to follow up your suggestion.