Eyes Above The Waves

Robert O'Callahan. Christian. Repatriate Kiwi. Hacker.

Saturday 30 December 2006

More About Amber

I have uploaded a document describing Amber in some detail. It focuses on the back end: the instrumentation, indexing, and compression. I also motivate Amber by discussing the debugging problem and related attempts to solve it. There are a few important ideas here:

  • A conceptual framework for classifying debuggers and execution recorders.
  • Treating instruction execution and memory writes as two instances of a common concept --- "memory effects" --- and providing a single implementation and API for recording, indexing, compressing, storing and querying them (a rough sketch follows this list).
  • The "bunching" optimization for memory effects.
  • The principle that repetitious program behaviour should produce repetitious input to the compressor in order to maximise the benefits of compression, and how to tweak the formatting of trace data to follow that principle.
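To make the "memory effects" idea concrete, here is a rough sketch of what a unified record type and recording entry point might look like. The names, fields and sizes are my own illustration and almost certainly differ from Amber's actual implementation; the point is only that one record type and one API can cover both instruction executions and memory writes.

    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative sketch only: these names and fields are guesses at the
       shape of the idea, not Amber's real data structures or API. */

    typedef enum {
      EFFECT_INSTRUCTION_EXEC,  /* the instruction at 'address' was executed */
      EFFECT_MEMORY_WRITE       /* 'length' bytes at 'address' were written  */
    } EffectKind;

    typedef struct {
      uint64_t   timestamp;  /* global instruction count at the effect */
      uint64_t   address;    /* code address or data address           */
      uint32_t   length;     /* bytes covered by the effect            */
      EffectKind kind;
    } MemoryEffect;

    /* One recording entry point for both kinds of effect, so indexing,
       compression, storage and querying can share a single implementation. */
    static MemoryEffect effect_log[1 << 16];
    static size_t effect_count;

    static void record_effect(uint64_t timestamp, EffectKind kind,
                              uint64_t address, uint32_t length) {
      if (effect_count < sizeof effect_log / sizeof effect_log[0]) {
        MemoryEffect e = { timestamp, address, length, kind };
        effect_log[effect_count++] = e;
      }
    }

    int main(void) {
      /* A toy two-effect trace: one instruction executed, plus the
         4-byte store it performed. */
      record_effect(1, EFFECT_INSTRUCTION_EXEC, 0x400123, 3);
      record_effect(1, EFFECT_MEMORY_WRITE, 0x7fff0010, 4);
      printf("%zu effects recorded\n", effect_count);
      return 0;
    }

Presumably the bunching optimization and the compression-friendly trace formatting then operate on streams of records like these; the details are in the document.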

The document also contains screenshots of the prototype XULRunner-based debugger based on Amber, to motivate Amber's design by illustrating what is possible when you go beyond just reverse execution. I assure you that those screenshots are not faked in any way; it really works :-).



Comments

Philip Withnall
Having read through the PDF, and developed a headache, I've got a few comments:
The timestamps could get very large; as you said, a simple Firefox test run executes over 4.8 billion instructions. Do you have any plans to make them easier to read in the interface? How about displaying them as differences from the current timestamp, when applicable?
I suppose you've already explored this, but since you suspect most of the CPU time is being spent in zlib, how about switching to a different compression library (one designed for ridiculously fast compression), such as lzop?
Chris Cunningham
That pdf is being sent as application/octet-stream by the looks of things.
- Chris
Robert O'Callahan
Philip: I never display timestamps in the interface.
There's a lot that could be done to improve performance. But now it's more important to build a useful debugger that can use this data.
Chris: I can't change the MIME type unfortunately, the site is hosted by Google Pages.
Philip Withnall
Hmmm...OK.
There's probably some obvious reason I'm missing (still relatively new to programming and the like), but why show the memory addresses of variables in the debugger interface? Or is this just temporary, waiting for something else to be implemented, like hyperlinked queries to when that variable was last modified, or its source?
Dan Amelang
Very exciting stuff! FYI, there is another little-known PIN-based Deterministic Replay Debugging tool that you might want to look into (if you haven't already) called BugNet.
The software implementation is described here:
http://www.cse.ucsd.edu/~skumar/papers/sw-bugnet.pdf
The hardware implementation is described here:
http://www-cse.ucsd.edu/~calder/papers/ISCA-05-BugNet.pdf
Some slides from a talk about BugNet are here:
http://www.cs.ucsd.edu/~skumar/bugnet.ppt
BugNet uses some interesting heuristics for both keeping the log size down and reducing the performance overhead. From the first paper: "...on average we require less than 10MB of BugNet checkpoint logs to have the ability to replay 100 million instructions..." and "...on average the performance overhead of the logger is 86x".
Of course, they don't log as much as you do, so YMMV, but you might still get some good ideas.
Once again, nice work! It will be fun to play around with it once you're ready to release it.
Robert O'Callahan
I know a little bit about Bugnet. Thanks for the pointers.
Bugnet seems less interesting than the alternatives. Their overhead isn't much lower than Amber's, even though they're logging a lot less and don't support efficient state reconstruction. Compared to other replay systems like Nirvana, they seem to have accepted higher logging overhead in order to get smaller log sizes. But this doesn't seem like a good tradeoff to me, because disk space is so cheap. Average PCs have 200GB disks these days, so spending 5GB to store a log actually isn't a big deal. You just need to make sure that the log is small enough that reading or writing the log is not a bottleneck.
Michael
Green Hills TimeMachine:
PC Instrumentation: 1MB ~ 5 Million Instructions. Typical slowdown is 2-10x.
Full Data Instrumentation: 1MB ~ 1 Million Instructions. Typical slowdown is 5-30x.
Hardware Trace Data Collection: the number of instructions captured depends on the hardware. Slowdown: none; the hardware does the instrumentation, so the process itself isn't slowed. (Only supported on some ARM, ColdFire, and PPC chips.)
All products scale to gigabytes of data.