Experimental Data On Reproducing Intermittent MongoDB Test Failures With rr Chaos Mode

Max Hirschhorn from MongoDB has released some very interesting results from an experiment reproducing intermittent MongoDB test failures using rr chaos mode.

He collected 18 intermittent test failure issues and tried running each of them 1000 times under the test harness, both under rr (with and without chaos mode) and without rr. He noted that for 13 of these failures, MongoDB developers had been able to make them reproducible on demand through manual study of the failure and trial-and-error insertion of "sleep" calls at relevant points in the code.

Unfortunately rr didn't reproduce any of his 5 not-manually-reproducible failures. However, it did reproduce 9 of the 13 manually reproduced failures. Doing many test runs under rr chaos mode is a lot less developer effort than the manual method, so it's probably a good idea to try running under rr first.
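As a rough illustration of what "many test runs under rr chaos mode" could look like, here is a minimal Python sketch. It is not Max's actual harness. It assumes the test can be launched as a single command, that a failure shows up as a non-zero exit status, and that rr passes that status through; the helper names `rr_chaos_command` and `count_failures` are made up for this example. The only rr feature used is `rr record --chaos`.

```python
import subprocess


def rr_chaos_command(test_cmd):
    """Wrap a test command so that it is recorded under rr chaos mode."""
    return ["rr", "record", "--chaos", *test_cmd]


def count_failures(cmd, runs, run=subprocess.run):
    """Run cmd `runs` times and return how many runs exited non-zero.

    Assumption: a non-zero exit status means the test failed.
    `run` is injectable so the loop can be exercised without rr installed.
    """
    failures = 0
    for _ in range(runs):
        result = run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            failures += 1
    return failures


# Hypothetical usage (./run_one_test.sh is a placeholder, not a real script):
#   failed = count_failures(rr_chaos_command(["./run_one_test.sh"]), 1000)
#   print(f"{failed}/1000 runs failed under rr chaos mode")
```

Every run leaves a recording behind, so a real script would probably keep only the traces of failing runs, since those are the ones worth replaying in the debugger.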

Of the 9 failures reproducible under rr, 3 also reproduced at least once in 1000 runs without rr (1, 3 and 54 times respectively). Of course with such low reproduction rates those failures would still be pretty hard to debug with a regular debugger or logging.
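To put those rates in perspective, here is a small Python sketch that estimates how many runs are needed to see a failure at least once. It assumes, as a simplification, that runs are independent and fail with a fixed probability p, so the chance of at least one failure in n runs is 1 - (1 - p)^n. The function name `runs_for_confidence` is made up for this example.

```python
import math


def runs_for_confidence(p, confidence=0.95):
    """Smallest n such that 1 - (1 - p)**n >= confidence.

    Assumes independent runs with a fixed per-run failure probability p.
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


# Observed no-rr rates from the three failures mentioned above.
for failures in (1, 3, 54):
    p = failures / 1000
    print(f"p = {p:.3f}: {runs_for_confidence(p)} runs for a 95% chance of one failure")
```

Under that assumption, a failure that shows up once per 1000 runs needs about 3000 runs for a 95% chance of catching it even once, and a run without rr leaves no recording to replay afterwards.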

The data also shows that rr chaos mode is really effective: in almost all cases where he measured chaos mode vs rr non-chaos or running without rr, rr chaos mode dramatically increased the failure reproduction rate.

The data has some gaps but I think it's particularly valuable because it's been gathered on real-world test failures on an important real-world system, in an application domain where I think rr hasn't been used before. Max has no reason to favour rr, and I had no interaction with him between the start of the experiment and the end. As far as I know there's been no tweaking of rr and no cherry-picking of test cases.

I plan to look into the failures that rr was unable to reproduce to see if we can improve chaos mode to catch them and others like them in the future. Max hit at least one rr bug as well.

I've collated the data for easier analysis here:

Failure	Reproduced manually	rr-chaos reproductions	regular rr reproductions	no-rr reproductions
BF-9810	--	0/1000	?	?
BF-9958	Yes	71/1000	2/1000	0/1000
BF-10932	Yes	191/1000	0/1000	0/1000
BF-10742	Yes	97/1000	0/1000	0/1000
BF-6346	Yes	0/1000	0/1000	0/1000
BF-8424	Yes	1/232	1/973	0/1000
BF-7114	Yes	0/48	?	?
BF-7588	Yes	193/1000	96/1000	54/1000
BF-7888	Yes	0/1000	?	?
BF-8258	--	0/636	?	?
BF-8642	Yes	3/1000	?	0/1000
BF-9248	Yes	0/1000	?	?
BF-9426	--	0/1000	?	?
BF-9552	Yes	5/563	?	?
BF-9864	--	0/687	?	?
BF-10729	Yes	2/1000	?	1/1000
BF-11054	Yes	7/1000	?	3/1000
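
Because not every configuration completed 1000 runs, the raw counts are easier to compare once they are normalized to rates. The following Python sketch does that for the 9 failures that rr chaos mode reproduced. The numbers are copied from the table above; the names `RESULTS`, `rate` and `format_rate` are made up for this example, and `None` stands for a "?" cell.

```python
# (failures, runs) pairs copied from the table above; None marks a "?" cell.
RESULTS = {
    "BF-9958":  {"rr-chaos": (71, 1000),  "regular rr": (2, 1000),  "no rr": (0, 1000)},
    "BF-10932": {"rr-chaos": (191, 1000), "regular rr": (0, 1000),  "no rr": (0, 1000)},
    "BF-10742": {"rr-chaos": (97, 1000),  "regular rr": (0, 1000),  "no rr": (0, 1000)},
    "BF-8424":  {"rr-chaos": (1, 232),    "regular rr": (1, 973),   "no rr": (0, 1000)},
    "BF-7588":  {"rr-chaos": (193, 1000), "regular rr": (96, 1000), "no rr": (54, 1000)},
    "BF-8642":  {"rr-chaos": (3, 1000),   "regular rr": None,       "no rr": (0, 1000)},
    "BF-9552":  {"rr-chaos": (5, 563),    "regular rr": None,       "no rr": None},
    "BF-10729": {"rr-chaos": (2, 1000),   "regular rr": None,       "no rr": (1, 1000)},
    "BF-11054": {"rr-chaos": (7, 1000),   "regular rr": None,       "no rr": (3, 1000)},
}


def rate(cell):
    """Return failures / runs, or None if the cell was not measured."""
    if cell is None:
        return None
    failures, runs = cell
    return failures / runs


def format_rate(cell):
    """Render a cell as a percentage, or '?' if it was not measured."""
    r = rate(cell)
    return "?" if r is None else f"{100 * r:.2f}%"


for issue, row in RESULTS.items():
    cells = ", ".join(f"{mode}: {format_rate(cell)}" for mode, cell in row.items())
    print(f"{issue}: {cells}")
```

For example, BF-8424 shows one reproduction in both rr columns, but as rates that is 1 in 232 under chaos mode against 1 in 973 under regular rr.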

Eyes Above The Waves
