Thursday, 20 June 2019

Stack Write Traffic In Firefox Binaries

For people who like this sort of thing...

I became interested in how much CPU memory write traffic corresponds to "stack writes". For x86-64 this roughly corresponds to writes that use RSP or RBP as a base register (including implicitly via PUSH/CALL). I thought I had pretty good intuitions about x86 machine code, but the results surprised me.

In a Firefox debug build running a (non-media) DOM test (including browser startup/rendering/shutdown), Linux x86-64, non-optimized (in an rr recording, though that shouldn't matter):

Base register      Fraction of written bytes
RAX                0.40%
RCX                0.32%
RDX                0.31%
RBX                0.01%
RSP                53.48%
RBP                44.12%
RSI                0.50%
RDI                0.58%
R8                 0.01%
R9                 0.00%
R10                0.00%
R11                0.00%
R12                0.00%
R13                0.00%
R14                0.00%
R15                0.00%
RIP                0.00%
RDI (MOVS/STOS)    0.25%
Other              0.00%
RSP/RBP            97.59%

Ooof! I expected stack writes to dominate, since non-opt Firefox builds have lots of trivial function calls and local variables live on the stack, but 97.6% is a lot more dominant than I expected.

You would expect optimized builds to be much less stack-dominated because trivial functions have been inlined and local variables should mostly be in registers. So here's a Firefox optimized build:

Base register      Fraction of written bytes
RAX                1.23%
RCX                0.78%
RDX                0.36%
RBX                2.75%
RSP                75.30%
RBP                8.34%
RSI                0.98%
RDI                4.07%
R8                 0.19%
R9                 0.06%
R10                0.04%
R11                0.03%
R12                0.40%
R13                0.30%
R14                1.13%
R15                0.36%
RIP                0.14%
RDI (MOVS/STOS)    3.51%
Other              0.03%
RSP/RBP            83.64%

Definitely less stack-dominated than for non-opt builds, but still very stack-dominated! And of course this is not counting indirect writes to the stack, e.g. to out-parameters via pointers held in general-purpose registers. (Note that opt builds could use RBP for non-stack purposes, but Firefox is built with -fno-omit-frame-pointer, so that could only happen in leaf functions, and even then it probably doesn't.)

It would be interesting to compare the absolute number of written bytes between opt and non-opt builds but I don't have traces running the same test immediately at hand. Non-opt builds certainly do a lot more writes.
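For anyone who wants to reproduce this kind of tally, here is a minimal sketch of the bookkeeping. It assumes you already have a list of decoded writes, each tagged with its base register and byte count. The BaseReg and Write types are hypothetical and are not part of rr or any real tracer, and implicit stack writes from PUSH/CALL are assumed to have been tagged as RSP by whatever produced the records.

```rust
use std::collections::BTreeMap;

/// Base register of a write's destination operand (hypothetical, simplified:
/// R8-R15, RIP and string instructions are all lumped into `Other` here).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum BaseReg { Rax, Rcx, Rdx, Rbx, Rsp, Rbp, Rsi, Rdi, Other }

/// One observed memory write. PUSH/CALL are assumed to be recorded with
/// `base == BaseReg::Rsp` by whatever decoded the trace.
pub struct Write {
    pub base: BaseReg,
    pub bytes: u64,
}

fn percent(part: u64, total: u64) -> f64 {
    if total == 0 { 0.0 } else { 100.0 * part as f64 / total as f64 }
}

/// Percentage of all written bytes attributed to each base register.
pub fn fractions(writes: &[Write]) -> BTreeMap<BaseReg, f64> {
    let total: u64 = writes.iter().map(|w| w.bytes).sum();
    let mut per_reg: BTreeMap<BaseReg, u64> = BTreeMap::new();
    for w in writes {
        *per_reg.entry(w.base).or_insert(0) += w.bytes;
    }
    per_reg
        .into_iter()
        .map(|(reg, bytes)| (reg, percent(bytes, total)))
        .collect()
}

/// Percentage of written bytes whose base register is RSP or RBP,
/// i.e. the "RSP/RBP" row of the tables.
pub fn stack_fraction(writes: &[Write]) -> f64 {
    let total: u64 = writes.iter().map(|w| w.bytes).sum();
    let stack: u64 = writes
        .iter()
        .filter(|w| matches!(w.base, BaseReg::Rsp | BaseReg::Rbp))
        .map(|w| w.bytes)
        .sum();
    percent(stack, total)
}
```

The fractions are weighted by bytes written rather than by instruction count, which matches the "fraction of written bytes" column.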

Tuesday, 4 June 2019

Winter Tramp: Waihohonu Hut To Tama Lakes

New Zealand's Tongariro National Park is one of my favourite places. We had a three-day weekend so I drove three friends and family down for a two-night stay at Waihohonu Hut, surely the grandest public hut in New Zealand, and we enjoyed the park in a wintry setting ... an interesting change from our previous visits.

We left Auckland around 7am on Saturday to avoid traffic — often hordes of people leave Auckland for long weekends — but there was no congestion. After stopping for lunch in Turangi we reached the trailhead on the Desert Road shortly before 1pm. The wind was cold and there was thick low-lying cloud, but it wasn't snowing ... yet. From there the walk to Waihohonu Hut is easy in less than two hours, on a good quality track with a very gentle upward slope. Much of the route is very exposed but the wind wasn't as high as forecast and we were well equipped. Towards the end it started snowing gently, but that was fun and we got to the hut in high spirits before 3pm. The hut is well insulated and other trampers had arrived before us and got the fire going, and the LED lighting was on, so it was cosy. We talked to them, made popcorn, watched the snow fall, played some card games and enjoyed the rest of the afternoon and evening as more trampers trickled in.

I had wondered how full the hut would get. There are 28 bunks, but it's off-season so they can't be booked, and given the public holiday potentially a lot of people could have turned up. As it happened about 35 people ended up there on Saturday night — many people tramping in from the Desert Road just to visit Waihohonu, like us, but also quite a few doing round trips from Whakapapa or even doing the Tongariro Northern Circuit (which requires alpine skills at this time of year). People grabbed bunks as they arrived, and the rest slept on spare mattresses in the common room, which was actually a fine option. The only problem with sleeping in the common room is people staying up late and (probably other) people coming in early for breakfast. Even though it was technically overfull, Waihohonu Hut's common areas are so spacious that at no time did it ever feel crowded.

On Sunday morning there was a bit more snow on the ground, some low cloud and light snow falling. I was hoping to walk west from the hut to the Tama Saddle, which separates Mt Ruapehu to the south from Mts Tongariro and Ngauruhoe to the north, and visit the Lower Tama Lake just north of the saddle. It was unclear what conditions were going to be like but the forecast had predicted snow would stop falling in the morning, and we were well equipped, so we decided to give it a go. The expected walking time was about six and a half hours and we left before 9am so we had plenty of time. In the end it worked out very well. The cloud lifted quickly, except for some tufts around Ruapehu, and the snow did stop falling, so we had stunning views of the mountains the whole day. We were the first walkers heading west that day so we walked through fresh snow, broke the ice of frozen puddles and streams, and saw the footprints of rabbits and other animals, and relished the pristine wintry environment. It's the first time I've done a long-ish walk in the snow in the wilderness like this, and it was magnificent! I'm so grateful we had the chance to be there and that the weather turned out well.

As we got close to the saddle the snow was thicker, up to our knees in a few places, and the wind got stronger, and at the Lower Tama Lake it was quite cold indeed and blowing steadily from the east. I was a bit worried about having to walk back into that wind, and there was still the possibility of a change in the weather, so even though we were ahead of schedule I decided after lunch above the lake we should head back to Waihohonu rather than carrying on up to Upper Tama Lake (where no doubt the views would have been even better, but the wind would have been even colder!). Interestingly though, we were far from alone; many people, mostly foreign tourists, had walked to the lakes from Whakapapa (on the western side of Ruapehu), a shorter walk, and even headed up the ridge to the upper lake. As it turned out, our walk back was pretty easy. The wind mostly died away and the sun even came out.

We got back to Waihohonu about 3:30pm and once again relaxed at the hut for the rest of the afternoon, catching up with the trampers who were staying both nights and meeting new arrivals. That night the hut was again overfull but only by a couple of people, and again that wasn't a problem.

This morning (Monday) the sky was completely clear, giving magnificent views of snow-covered Ngauruhoe and Ruapehu through the hut's huge picture windows. A thick frost on the ground combined with the snow to form a delightfully crunchy surface for our walk back to the car park. I for one kept turning around to take in the incredible views. It was a very pleasant drive back in the sun through the heart of the North Island, but I can't wait to go tramping again!

Wednesday, 29 May 2019

A Few Comments On "Sparse Record And Replay With Controlled Scheduling"

This upcoming PLDI paper is cool. One thing I like about it is that it does a detailed comparison against rr, and a fair comparison too. The problem of reproducing race bugs using randomized scheduling in a record-and-replay setting is important, and the paper has interesting quantitative results.

It's unfortunate that the paper doesn't mention rr's chaos mode, which is our attempt to tackle roughly the same problem. It would be very interesting to compare chaos mode to the approach in this paper on the same or similar benchmarks.

I'm quite surprised that the PLDI reviewers accepted this paper. I don't mean that the paper is poor, because I think it's actually quite good. We submitted papers about rr to several conferences including PLDI (until USENIX ATC accepted it), and we consistently got quite strong negative review comments that it wasn't clear enough which programs rr would record and replay successfully, and what properties of the execution were guaranteed to be preserved during the replay. We described many steps we had to take to get applications to record efficiently in rr in practice, and many reviewers seemed to perceive rr as just a collection of hacks and thus not publishable. Yet it seems to me this "sparse replay" approach is considerably more vague than rr about what it can handle and what gets preserved during replay. I do not see any principled reason why the critical reviewers of our rr paper would not have criticised this paper even harder. I wonder what led to a different outcome.

Perhaps making the idea of "sparse replay" (i.e., record only some subset of behaviour that's necessary and sufficient for a particular application) a focus of the paper effectively lampshaded the problem, or just sufficiently reduced expectations by not claiming to be a general-purpose tool.

I also suspect it's partly just "luck of the draw" in reviewer assignment. It is an unfortunate fact that paper review outcomes can be pretty random. As both a submitter and reviewer, I've seen that scores from different reviewers often differ wildly — it's not uncommon for a paper to get both A and D reviews on an A-to-D scale. When a paper gets both A and D, it typically gets a lot more scrutiny from the review committee to reach a decision, but one should also expect that there are many (un)lucky papers that just happen to avoid a D reviewer or fail to connect with an A reviewer. Given how important publications are to many people (fortunately, not to me), it's not a great system. Though, like democracy, maybe it's better than the others.

Saturday, 25 May 2019

Microsoft's Azure Time-Travel Debugging

This looks pretty cool. The video is also instructive.

It's not totally clear to me how this works under the hood, but apparently they have hooked up the Nirvana TTD to the .NET runtime so that it will enable TTD recording of invocations of particular methods. That means you can inspect the control flow, the state of registers (i.e. local variables), and any memory read by the method or its callees, at any point during the method invocation. It's not clear what happens if you inspect memory outside the scope of the method (e.g. global variables) or if you inspect memory that was modified concurrently by other threads. Plus there are performance and other issues listed in the blog post.

This seems like a good idea but somewhat more limited than a full-fledged record-everything debugger like rr or WinDbg-TTD. I suspect they're pushing this limited-scope debugging as a way to reduce run-time overhead. Various people have told me that WinDbg-TTD has almost unusably high overhead for Firefox ... though other people have told me they found it tolerable for their work on Chrome, so data is mixed.

One interesting issue here is that if I was designing a Nirvana-style multithread-capable recorder for .NET applications — i.e., one that records all memory reads in some fashion via code instrumentation — I would try building it into the .NET VM itself, like Chronon for Java. That way you avoid recording stuff like GC (noted as a problem for this Azure debugger), and the JIT compiler can optimize your instrumentation. I guess Microsoft people were looking for a way to deploy TTD more widely and decided this was the best option. That would be reasonable, but it would be a "solution-driven" approach to the problem, which I have strong feelings about.

Monday, 20 May 2019

Don't Call Socially Conservative Political Parties "Christian"

There is talk about starting a new "Christian" (or "Christian values") political party in New Zealand. The party might be a good idea, but if it's really a "social conservative" party, don't call it "Christian".

Audrey Young writes:

The issues that would galvanise the party are the three big social issues before Parliament at present and likely to be so in election year as well: making abortions easier to get, legalising euthanasia, and legalising recreational cannabis.

None of those issues are specifically Christian. None of them are mentioned directly in the New Testament. I even think Christians can be for some version of all of them (though it makes sense to me that most Christians would oppose the first two at least). Therefore "social conservative" is a much more accurate label than "Christian" for a party focused on opposing those changes.

A truly Christian party's key issues would include reminding the voting public that we are all sinners against God, in need of repentance and forgiveness that comes through Jesus. The party would proclaim to voters "how hard it is for the rich to enter the kingdom of God" and warn against storing up treasures on earth instead of heaven. It would insist on policies that support "the least of these". It would find a way to denounce universally popular sins such as greed, gluttony and heterosexual extra-marital sex, and advocate policies that reduce their harm, while visibly observing Paul's dictum "What business is it of mine to judge those outside the church? Are you not to judge those inside?" A Christian party would follow Jesus' warning against "those who for a show make lengthy prayers" and downplay their own piety. It would put extraordinary emphasis on honouring the name of Christ by avoiding any sort of lies, corruption or scandal. Its members would show love for their enemies and not retaliate when attacked. If they fail in public, they would confess and repent in public.

That sounds pretty difficult, but it's what Jesus deserves from any party that claims his name.

I'm all for Christians being involved in politics and applying their Christian worldview to politics, if they can succeed without making moral compromises. But it's incredibly important that any Christian who publicly connects Christ with politics takes into account how that will shape unbelievers' view of Christianity. If they lead people to believe that Christianity is about being socially conservative and avoiding certain hot-button sins, with the gospel nowhere in sight, then they point people towards Hell and betray Jesus and his message.

Monday, 6 May 2019

Debugging Talk At Auckland Rust Meetup

I gave a talk about "debugging techniques for Rust" at tonight's Auckland Rust Meetup. There were many good questions and I had a good time. It wasn't recorded. Thanks to the organiser and sponsors!

I'm also going to give a talk at the next meetup in June!

Monday, 29 April 2019

Goodbye Mozilla IRC

I've been connected to Mozilla IRC for about 20 years. When I first started hanging out on Mozilla IRC I was a grad student at CMU. It's how I got to know a lot of Mozilla people. I was never an IRC op or power user, but when #mozilla was getting overwhelmed with browser user chat I was the one who created #developers. RIP.

I'll be sad to see it go, but I understand the decision. Technologies have best-before dates. I hope that Mozilla chooses a replacement that sucks less. I hope they don't choose Slack. Slack deliberately treats non-Chrome browsers as second-class — in particular, Slack Calls don't work in Firefox. That's obviously a problem for Mozilla users, and it would send a bad message if Mozilla says that sort of attitude is fine with them.

I look forward to finding out what the new venue is. I hope it will be friendly to non-Mozilla-staff and the community can move over more or less intact.

Friday, 26 April 2019

Update To rr Master To Debug Firefox Trunk

A few days ago Firefox started using LMDB (via rkv) to store some startup info. LMDB relies on file descriptor I/O being coherent with memory-maps in a way that rr didn't support, so people have had trouble debugging Firefox in rr, and Pernosco's CI test failure reproducer also broke. We have checked in a fix to rr master and are in the process of updating the Pernosco pipeline.

The issue is that LMDB opens a file, maps it into memory MAP_SHARED, and then opens the file again and writes to it through the new file descriptor, and requires that the written data be immediately reflected in the shared memory mapping. (This behavior is not guaranteed by POSIX but is guaranteed by Linux.) rr needs to observe these writes and record the necessary memory changes, otherwise they won't happen during replay (because writes to files don't happen during replay) and replay will fail. rr already handled the case where the application writes to the file descriptor (technically, the file description) that was used to map the file; Chromium has needed this for a while. The LMDB case is harder to handle. To fix LMDB, whenever the application opens a file for writing, we have to check to see if any shared mapping of that file exists and if so, mark that file description so writes to it have their shared-memory effects recorded. Unfortunately this adds overhead to writable file opens, but hopefully it doesn't matter much since in many workloads most file opens are read-only. (If it turns out to be a problem there are ways we can optimize further.) While fixing this, we also added support for the case where the application opens a file (possibly multiple times with different file descriptions) and then creates a shared mapping of one of them. To handle that, when creating a shared mapping we have to scan all open files to see if any of them refer to the mapped file, and if so, mark them so the effects of their writes are recorded.
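To make the two checks concrete, here is a toy model of the bookkeeping. This is a sketch only and not rr's actual code: FileId, OpenFile, Tracker and the numeric description ids are all hypothetical names, and unmapping and closing are left out entirely.

```rust
use std::collections::{HashMap, HashSet};

/// Identifies a file on disk (hypothetical: device and inode numbers).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct FileId {
    pub dev: u64,
    pub inode: u64,
}

/// One open file description as seen by this toy recorder.
#[derive(Debug)]
pub struct OpenFile {
    pub file: FileId,
    pub writable: bool,
    /// True if writes through this description must have their
    /// shared-memory effects recorded.
    pub record_writes: bool,
}

#[derive(Default)]
pub struct Tracker {
    open_files: HashMap<u32, OpenFile>,
    shared_mapped: HashSet<FileId>,
    next_id: u32,
}

impl Tracker {
    /// Called when the application opens a file. For writable opens we check
    /// whether a shared mapping of that file already exists (the LMDB case).
    pub fn open(&mut self, file: FileId, writable: bool) -> u32 {
        let id = self.next_id;
        self.next_id += 1;
        let record_writes = writable && self.shared_mapped.contains(&file);
        self.open_files.insert(id, OpenFile { file, writable, record_writes });
        id
    }

    /// Called when the application creates a MAP_SHARED mapping through
    /// description `id`. We scan every open description that refers to the
    /// same file and mark the writable ones.
    pub fn mmap_shared(&mut self, id: u32) {
        let file = match self.open_files.get(&id) {
            Some(f) => f.file,
            None => return,
        };
        self.shared_mapped.insert(file);
        for f in self.open_files.values_mut() {
            if f.file == file && f.writable {
                f.record_writes = true;
            }
        }
    }

    /// Whether a write through description `id` needs its effects recorded.
    pub fn must_record_write(&self, id: u32) -> bool {
        self.open_files.get(&id).map_or(false, |f| f.record_writes)
    }
}
```

The cost shows up in the same places as described above: every writable open does a lookup, and every shared mapping does a scan over all open descriptions.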

Update: Actually, at least this commit is required.

Thursday, 11 April 2019

Mysteriously Low Hanging Fruit: A Big Improvement To LLD For Rust Debug Builds

LLD is generally much faster than the GNU ld.bfd and ld.gold linkers, so you would think it has been pretty well optimised. You might then be surprised to discover that a 36-line patch dramatically speeds up linking of Rust debug builds, while also shrinking the generated binaries dramatically, both in simple examples and large real-world projects.

The basic issue is that the modern approach to eliminating unused functions from linked libraries, --gc-sections, is not generally able to remove the DWARF debug info associated with the eliminated functions. With --gc-sections the compiler puts each function in its own independently linkable ELF section, and then the linker is responsible for selecting only the "reachable" functions to be linked into the final executable and discarding the rest. However, compilers are still putting the DWARF debug info into a single section per compilation unit, and linkers mostly treat debug info sections as indivisible black boxes, so those sections get copied into the final executable even if the functions they're providing debug info for have been discarded. My patch tackles the simplest case: when a compilation unit has had all its functions and data discarded, discard the debug info sections for that unit. Debug info could be shrunk a lot more if the linker was able to rewrite the DWARF sections to discard info for a subset of the functions in a compilation unit, but that would be a lot more work to implement (and would potentially involve performance tradeoffs). Even so, the results of my patch are good: for Pernosco, our "dist" binaries with debug info shrink from 2.9GB to 2.0GB.
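The rule is simple enough to model in a few lines. The following is an illustration of the idea only, not the LLD patch itself: Kind, Section and CompilationUnit are hypothetical types, and the live flag stands in for the result of --gc-sections reachability.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Code,
    Data,
    DebugInfo,
}

/// One input section. `live` stands in for "reachable according to
/// --gc-sections"; it is ignored for debug info sections.
pub struct Section {
    pub name: &'static str,
    pub kind: Kind,
    pub live: bool,
}

/// All sections contributed by one compilation unit.
pub struct CompilationUnit {
    pub sections: Vec<Section>,
}

/// Names of the sections that end up in the output. Code and data are kept
/// only if live. Debug info is kept only if at least one code or data
/// section of the same unit survived.
pub fn sections_to_keep(units: &[CompilationUnit]) -> Vec<&'static str> {
    let mut kept = Vec::new();
    for unit in units {
        let any_live = unit
            .sections
            .iter()
            .any(|s| s.kind != Kind::DebugInfo && s.live);
        for s in &unit.sections {
            let keep = match s.kind {
                Kind::DebugInfo => any_live,
                Kind::Code | Kind::Data => s.live,
            };
            if keep {
                kept.push(s.name);
            }
        }
    }
    kept
}
```

Note that a unit with even one surviving function keeps its whole debug info section, including the info for its discarded functions. Shrinking that further is the DWARF-rewriting work mentioned above.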

Not only was the patch small, it was also pretty easy to implement. I went from never having looked at LLD to working code in an afternoon. So an interesting question is, why wasn't this done years ago? I can think of a few contributing reasons:

People just expect binaries with debug info to be bloated, and because they're only used for debugging, except for a few people working on Linux distros, it's not worth spending much effort trying to shrink them.

C/C++ libraries that expect to be statically linked, especially common ones like glibc, don't rely on --gc-sections to discard unused functions. Instead, they split the library into many small compilation units, ideally one per independently usable function. This is extra work for library developers, but it solves the debug info problem. Rust developers don't (and really, can't) do this because rustc splits crates into compilation units in a way that isn't under the control of the developer. Less work for developers is good, so I don't think Rust should change this; tools need to keep up.

Big companies that contribute to LLD, with big projects that statically link third-party libraries, often "vendor" those libraries, copying the library source code into their big project and building it as part of that project. As part of that process, they would usually tweak the library to only build the parts their project uses, avoiding the problem.

There has been tension in the LLD community between doing the simple thing I did and doing something more difficult and complex involving DWARF rewriting, which would have greater returns. Perhaps my patch submission to some extent forced the issue.

Friday, 5 April 2019

Rust Discussion At IFIP WG2.4

I've spent this week at an IFIP WG2.4 meeting, where researchers share ideas and discuss topics in programming languages, analysis and software systems. The meeting has been in Paihia in the Bay of Islands, so very conveniently located for me. My main talk was about Pernosco, but I also took the opportunity to introduce people to Rust and the very significant advances in programming language technology that it delivers. My slides are rudimentary because I wanted to minimize my talking and leave plenty of time for questions and discussion. I think it went pretty well. The main point I wanted researchers to internalize is that Rust provides a lot of structure that could potentially be exploited by static analysis and other kinds of tools, and that we should expect future systems programming languages to at least meet the bar set by Rust, so forward-looking research should try to exploit these properties. I think Rust's tight control of aliasing is especially important because aliasing is still such a problematic issue for all kinds of static analysis techniques. The audience seemed receptive.

One person asked me whether they should be teaching Rust instead of C for their "systems programming" courses. I definitely think so. I wouldn't teach Rust as a first programming language, but for a more advanced course focusing on systems programming I think Rust would be a great way to force people to think about issues such as lifetimes — issues that C programmers should grapple with but can often get away with sloppy handling of in classroom exercises.

Saturday, 30 March 2019

Marama Davidson And The Truth About Auckland's History

On March 24 I sent the following email to Marama Davidson's parliamentary office email address.

Subject: Question about Ms Davidson's speech at the Auckland peace rally on March 16

I was at the rally. During her speech Ms Davidson mentioned that the very land we were standing on (Aotea Square) was taken from Māori by European settlers by force. However Wikipedia says

By 1840 Te Kawau had become the paramount chief of Ngāti Whātua. Cautious of reprisals from the Ngāpuhi defeated at Matakitaki, Te Kawau found it most convenient to offer Governor Hobson land around the present central city.

https://en.wikipedia.org/wiki/History_of_Auckland

Can you clarify Ms Davidson's statement and/or provide a source for her version?

Sincerely,
Robert O'Callahan

I haven't received a response. Te Ara agrees with Wikipedia.

I'd genuinely like to know the truth here. It would be disappointing if Davidson lied — blithely accepting "all politicians lie" is part of the path to electing people like Donald Trump. On the other hand if the official histories are wrong, that would also be disappointing and they need to be corrected.

Monday, 18 February 2019

Banning Huawei Is The Right Decision

If China's dictator-for-life Xi Jinping orders Huawei to support Chinese government spying, it's impossible to imagine Huawei resisting. The Chinese government flaunts its ability to detain anyone at any time for any reason.

The argument "no-one has caught Huawei doing anything wrong" (other than stealing technology) misses the point; the concern is about what they might do in the future.

The idea that you can buy equipment from Huawei today and protect it from future hijacking doesn't work. It will need to be maintained and upgraded by Huawei, which will let them add backdoors in the future even if there aren't any (accidental or deliberate) today.

Don't imagine you can inspect their systems to find backdoors. Skilled engineers can insert practically undetectable backdoors at many different levels of a computer system.

These same issues apply to other Chinese technology companies.

These same issues apply to technology companies from other countries, but New Zealand should worry less about technology companies from Western powers. Almost every developed country has much greater rule of law than China has; for example US spy agencies can force tech companies to cooperate using National Security Letters, but those can be challenged in court. We also have to weigh how much we fear the influence of different governments. I think New Zealand should worry a lot less about historically friendly democracies, flawed as they are, than about a ruthless tyranny like the Chinese government with a history of offensive cyberwarfare.

New Zealand and other countries may pay an economic price for such decisions, and I can see scenarios where the Chinese government decides to make an example of us to try to frighten other nations into line. Hopefully that won't happen and we won't be forced to choose between friendship with China and digital sovereignty — but if we have to pick one, we'd better pick digital sovereignty.

It would be easier for Western countries to take the right stand if the US President didn't fawn over dictators, spit on traditional US allies, and impose tariffs on us for no good reason.

Monday, 11 February 2019

Rust's Affine Types Catch An Interesting Bug

A function synchronously downloads a resource from Amazon S3 using a single GetObject request. I want it to automatically retry the download if there's a network error. A wrapper function aws_retry_sync based on futures-retry takes a closure and automatically reruns it if necessary, so the new code looks like this:

pub fn s3_download<W: Write>(
    client: S3Client,
    bucket: String,
    key: String,
    out: W,
) -> io::Result<()> {
    aws_retry_sync(move || {
        let response = client.get_object(...).sync()?;
        if let Some(body) = response.body {
            body.fold(out, |mut out, bytes: Vec<u8>| -> io::Result<W> {
                out.write_all(&bytes)?;
                Ok(out)
            })
            .wait()?;
        }
    })
}
This fails to compile for an excellent reason:
error[E0507]: cannot move out of captured variable in an `FnMut` closure
   --> aws-utils/src/lib.rs:194:23
    |
185 |     out: W,
    |     --- captured outer variable
...
194 |             body.fold(out, |mut out, bytes: Vec<u8>| -> io::Result<W> {
    |                       ^^^ cannot move out of captured variable in an `FnMut` closure
I.e., the closure can execute more than once, but each time it executes it wants to take ownership of out. Imagine if this compiled ... then if the closure runs once and writes N bytes to out, then the network connection fails and we retry successfully, we would write those N bytes to out again followed by the rest of the data. This would be a subtle and hard to reproduce error.

A retry closure should not have side effects for failed operations and should not, therefore, take ownership of out at all. Instead it should capture data to a buffer which we'll write to out if and only if the entire fetch succeeds. (For large S3 downloads you need parallel downloads of separate ranges, so that network errors only require refetching part of the object, and that approach deserves a separate implementation.)
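Here is a minimal sketch of that buffered shape using only the standard library. The retry_sync helper is a hypothetical stand-in for aws_retry_sync, and the fetch parameter stands in for the S3 GetObject call; neither is the real API.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a retry helper like aws_retry_sync: runs `op`
/// up to `max_attempts` times, returning the first success or the last error.
pub fn retry_sync<T>(
    max_attempts: usize,
    mut op: impl FnMut() -> io::Result<T>,
) -> io::Result<T> {
    let mut last_err = io::Error::new(io::ErrorKind::Other, "no attempts made");
    for _ in 0..max_attempts {
        match op() {
            Ok(value) => return Ok(value),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}

/// Downloads into a fresh buffer on every attempt and touches `out` only
/// after an attempt has fully succeeded. `fetch` is a stand-in for the
/// network request: it appends the object's bytes to the given buffer.
pub fn download_buffered<W: Write>(
    mut fetch: impl FnMut(&mut Vec<u8>) -> io::Result<()>,
    mut out: W,
) -> io::Result<()> {
    let data = retry_sync(3, || {
        // The closure owns nothing from the caller except `fetch`,
        // so it can safely run more than once.
        let mut buf = Vec::new();
        fetch(&mut buf)?;
        Ok(buf)
    })?;
    // `out` is used exactly once, outside the retried closure.
    out.write_all(&data)
}
```

Because out is never captured by the closure, the borrow checker has nothing to complain about, and a failed attempt leaves no partial data behind.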

Ownership types are for more than just memory and thread safety.

Mt Taranaki 2019

Last weekend I climbed Mt Taranaki again. Last time was just me and my kids, but this weekend I had a larger group of ten people — one of my kids and a number of friends from church and elsewhere. We had a range of ages and fitness levels but everyone else was younger than me and we had plans in place in case anyone needed to turn back.

We went this weekend because the weather forecast was excellent. We tried to start the walk at dawn on Saturday but were delayed because the North Egmont Visitor's Centre carpark apparently filled up at 4:30am; everyone arriving after that had to park at the nearest cafe and catch a shuttle to the visitor's centre, so we didn't start until 7:40am.

In short: we had a long hard day, as expected, but everyone made it to the crater, most of us by 12:30pm. Most of our group clambered up to the very summit, and we all made it back safely. Unfortunately clouds set in around the top not long before we got there so there wasn't much of a view, but we had good views much of the rest of the time. You could clearly see Ruapehu, Ngauruhoe and Tongariro to the east, 180km away. It was a really great day. The last of our group got back to the visitor's centre around 6pm.

My kid is six years older than last time and much more experienced at tramping, so this time he was actually the fastest of our entire group. I'm proud of him. I think I found it harder than last time — probably just age. As I got near the summit my knees started to twinge and cramp if I wasn't careful on the big steps up. I was also a bit shorter of breath than I remember from last time. I was faster at going down the scree slope though, definitely the trickiest part of the descent.

On the drive back from New Plymouth yesterday, the part of the group in our car stopped at the "Three Sisters", rock formations on the beach near Highway 3 along the coast. I just saw it on the map and we didn't know what was there, but it turned out to be brilliant. We had a relaxing walk and the beach, surf, rocks and sea-caves were beautiful. Highly recommended — but you need to be there around low tide to walk along the riverbank to the beach and through the caves.

Sunday, 27 January 2019

Experimental Data On Reproducing Intermittent MongoDB Test Failures With rr Chaos Mode

Max Hirschhorn from MongoDB has released some very interesting results from an experiment reproducing intermittent MongoDB test failures using rr chaos mode.

He collected 18 intermittent test failure issues and tried running them 1000 times under the test harness and rr with and without chaos mode. He noted that for 13 of these failures, MongoDB developers were able to make them reproducible on demand with manual study of the failure and trial-and-error insertion of "sleep" calls at relevant points in the code.

Unfortunately rr didn't reproduce any of his 5 not-manually-reproducible failures. However, it did reproduce 9 of the 13 manually reproduced failures. Doing many test runs under rr chaos mode is a lot less developer effort than the manual method, so it's probably a good idea to try running under rr first.

Of the 9 failures reproducible under rr, 3 also reproduced at least once in 1000 runs without rr (with frequencies 1, 3 and 54). Of course with such low reproduction rates those failures would still be pretty hard to debug with a regular debugger or logging.

The data also shows that rr chaos mode is really effective: in almost all cases where he measured chaos mode vs rr non-chaos or running without rr, rr chaos mode dramatically increased the failure reproduction rate.

The data has some gaps but I think it's particularly valuable because it's been gathered on real-world test failures on an important real-world system, in an application domain where I think rr hasn't been used before. Max has no reason to favour rr, and I had no interaction with him between the start of the experiment and the end. As far as I know there's been no tweaking of rr and no cherry-picking of test cases.

I plan to look into the failures that rr was unable to reproduce to see if we can improve chaos mode to catch them and others like them in the future. He hit at least one rr bug as well.

I've collated the data for easier analysis here:

Failure     Reproduced manually    rr-chaos reproductions    regular rr reproductions    no-rr reproductions
BF-9810     --                     0 /1000                   ?                           ?
BF-9958     Yes                    71 /1000                  2 /1000                     0 /1000
BF-10932    Yes                    191 /1000                 0 /1000                     0 /1000
BF-10742    Yes                    97 /1000                  0 /1000                     0 /1000
BF-6346     Yes                    0 /1000                   0 /1000                     0 /1000
BF-8424     Yes                    1 /232                    1 /973                      0 /1000
BF-7114     Yes                    0 /48                     ?                           ?
BF-7588     Yes                    193 /1000                 96 /1000                    54 /1000
BF-7888     Yes                    0 /1000                   ?                           ?
BF-8258     --                     0 /636                    ?                           ?
BF-8642     Yes                    3 /1000                   ?                           0 /1000
BF-9248     Yes                    0 /1000                   ?                           ?
BF-9426     --                     0 /1000                   ?                           ?
BF-9552     Yes                    5 /563                    ?                           ?
BF-9864     --                     0 /687                    ?                           ?
BF-10729    Yes                    2 /1000                   ?                           1 /1000
BF-11054    Yes                    7 /1000                   ?                           3 /1000
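For comparing cells with different run counts (e.g. 1 /232 vs 1 /973), it helps to convert the counts to rates. Here is a small sketch of that arithmetic. The Runs type and improvement function are my own hypothetical helpers, not part of Max's analysis.

```rust
/// A "failures /total" cell from the table, e.g. 193 /1000.
#[derive(Debug, Clone, Copy)]
pub struct Runs {
    pub failures: u32,
    pub total: u32,
}

impl Runs {
    /// Fraction of runs that reproduced the failure.
    pub fn rate(self) -> f64 {
        self.failures as f64 / self.total as f64
    }
}

/// How many times more often configuration `a` reproduced the failure than
/// configuration `b`. Returns None when the ratio is undefined, i.e. when
/// `b` never reproduced it or either side has no runs.
pub fn improvement(a: Runs, b: Runs) -> Option<f64> {
    if a.total == 0 || b.total == 0 || b.failures == 0 {
        None
    } else {
        Some(a.rate() / b.rate())
    }
}
```

For BF-7588, improvement(Runs { failures: 193, total: 1000 }, Runs { failures: 54, total: 1000 }) is about 3.6, while for rows like BF-9958 the no-rr column is 0 so the ratio is undefined and the function returns None.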