Wednesday, 16 December 2020

Rees-Dart Track

In the first part of our December tramping trip we did the Kepler Track "Great Walk" with a group of thirteen. Four of that group had a rest day in Queenstown and then, joined by two others, did the Rees-Dart Track December 10-14. This was the first time I've done this track. I chose it because it's the only major track in the Queenstown region that I haven't done. As expected, it's more difficult than any of the Great Walks, but it was amazing!

I originally planned to start in the Rees valley and end in the Dart valley because that's the direction described on the DoC site. DoC staff in Queenstown suggested the reverse direction might be a little better (better views looking up the Dart valley, going up instead of down the steep slope to Dart Hut), and also the weather forecast was for poor weather on December 11, when we'd be crossing the Rees Saddle from the Rees side. However the shuttle company wasn't willing to switch our booking at the last minute so we stuck with starting in the Rees valley. As we shall see this was probably for the best...

So on the morning of the 10th a van picked us up in Queenstown and took us to Glenorchy, with marvellous views along Lake Wakatipu, and there we switched to another van to take us all the way to Muddy Creek where the track officially starts. The driver offered to take us a couple of kilometres further on; a few of us got out so we would walk "the whole track", but the rest took the offer (and took our packs too!).

That first day's walk up the Rees valley was wonderful. The weather was warm and clear. The track through the open land off Rees Station is a bit boggy in places; we noticed some pack-rafters walking up the riverbed instead, but without local knowledge we thought it best to stick to the marked track. There were incredible views as we skirted the flanks of Mt Earnslaw. After a few hours we reached the park boundary and were into the bush, but the track was still easy going. After a while we emerged into clearings with marvellous views of snowcapped peaks and waterfalls all around. This was true for the entire tramp actually so I'm just going to stop mentioning them! We got to Shelter Rock Hut in about six hours, feeling good.

There were five other trampers at Shelter Rock Hut, all heading in the opposite direction to us, i.e. out to the car park the next day. I enjoyed talking to them; two were from Singapore, on working-holiday visas that were extended by several months due to COVID. One of the other three was a volunteer hut maintainer with lots of interesting information to share. As it turned out these were the last trampers we'd be staying with on the entire trip! Before we went to Queenstown I had expected the track to be a lot busier, because I'd met a couple of random people planning to do the Rees-Dart in December, but either that was a fluke or we were early enough in December to beat the rush.

It rained during the night but the next day dawned clear, which was good news as we planned to cross the Rees Saddle. However, just as we were about to leave the hut, it clouded over and snow started to fall! There was a forecast for morning snow to 1200m, but we were at 900m. It wasn't just flurries; the snow thickened and almost immediately started to accumulate as we headed off up the valley. It was lovely, especially for those in our group who hadn't been in snowfall before, but after an hour we were nearing the head of the valley, snow was still falling and there were a couple of inches on the ground. I was seriously wondering about crossing the saddle that day: certainly it would not be ideal to cross a mountain pass with snow falling, given we had no alpine gear and very little experience tramping in snow, and some people had neglected to bring some of the wet-weather gear I had asked for. On the other hand, the (day-old at this point) forecast was for snow to stop before midday and for the temperature to rise during the day, and in fact it was already reasonably warm and there was little wind, so we should not encounter ice. Plus of course we always had the option of turning back, and for worst-case scenarios I always carry an emergency locator beacon.

In fact as we got near the head of the valley the snow did stop falling and the sky cleared, leaving us in a snowy wonderland and making it a fairly easy decision to proceed at least to the top of the saddle and see the condition of the rest of the route. The last climb up to the saddle was a bit tricky but not really a problem. The view at the saddle was incredible. After a long lunch break we carried on down alongside Snowy Creek, and reached Dart Hut a bit later than anticipated but having had a truly exhilarating day.

At this point we discovered one of our group's boot was coming apart. Fortunately, there was a DoC hut warden staying at Dart Hut that night, and he did a great job patching the boot up with fencing wire! That repair lasted for over a day; later we had to resort to wrapping the boot in duct-tape to hold it together for the rest of the trip. A roll of duct-tape is another item I always bring on overnight tramps.

Our third day had brilliant fine weather, as indeed we had for the rest of the trip. As planned we used this day for a day trip up to the end of Dart Valley and then Cascade Saddle. It's a fairly long walk — it took us nine hours, leaving at 9:30am and returning at 6:30pm — but we were able to leave most of our gear at Dart Hut and the walk was truly outstanding. You walk up to where the Dart River emerges from the Dart Glacier, and get incredible views of the glacier and surrounding peaks as you climb up to the saddle. The view from the saddle itself across Matukituki Valley and Mt Aspiring National Park is awe-inspiring; our group was literally gasping "oh oh OH!" as we arrived.

This was the longest day of our trip and in some ways the hardest. The track is well-cairned except for one stretch crossing a slope of broken schist, but we got across that OK (and it was much easier to go back down than I feared). It was hot but water wasn't a problem since there are little streams everywhere (all drinkable) — even up on the saddle itself!

I think this was also the day when we finally got all our group members playing Bang. (And liking it!)

Day four was a relatively simple walk for six-ish hours down the Dart valley to Daley Flat Hut. I made it unnecessarily hard for myself by not packing my pack properly so it was top-heavy, and for that and a few other niggly reasons I was a little bit grumpy — sorry team! We passed a handful of trampers going the other way but Daley Flat Hut was unoccupied. There were some sandflies inside the hut but after we eliminated them we had a pleasant afternoon. Given it was a hot day we thought we'd try bathing in a pool by the Dart River ... unsurprisingly we could only stand the water very briefly, since it was literally glacial runoff.

That night some of our group got up at 2am to see the stars and the Geminid meteor shower. I wasn't one of them but apparently it was incredibly impressive — minimal light pollution out there!

Our last day was another relatively easy walk to the trail end at the Chinaman's Bluff car park. The track skirts the Sandy Flat area of Dart River which is an amazing lake. We started at 7am to be sure to make our pickup at 2pm, but only took a little over five hours in the end.

Given that the snow turned out to be a big win, I'd say in many ways this is the best tramp I've ever done. Getting all the way to Cascade Saddle is certainly a challenge but definitely a goal worth aiming for if you can get fit for it!

Tuesday, 15 December 2020

Kepler Track 2020

In what is becoming an annual tradition, I organised an early-December tramping trip in the South Island for friends and family, starting with a reasonably accessible tramp with a large group and followed by a more challenging tramp with a smaller group. This year our accessible tramp was the Kepler Track "Great Walk", December 5 to 8. It was the second Kepler trip for my kids and me, but new to the rest of the group.

This group was thirteen people this year, the largest group I've had to manage yet! There was a big range of ages and tramping experience, including a first-time overnight tramper. There was one person I'd never met before (a friend of a friend). We had seven men and six women. It was all a good mix and I thank God that, from my perspective at least, the group dynamics worked well.

In previous years with groups of ten I've found it difficult to keep track of who's carrying which supplies, especially food. This year I tried to mitigate those problems by splitting the large group into three subgroups and having the members of each subgroup carry supplies for their subgroup. This worked well. I had planned for the subgroups to actually cook independently but we ended up sharing cooking work across subgroups, which was a bit chaotic but still worked well with everyone eager to pitch in as needed. I mixed up the membership of the subgroups so that people got to know each other a bit more.

The weather forecast was looking pretty bad a week out but we ended up getting quite good weather, especially after the first day. That day was made more interesting because it also happened to be the day of the Kepler Challenge! That race starts at 6am and normally the runners run the whole Kepler Track the same direction we were going, thus would have been well ahead of us the whole way. However, due to the bad weather forecast they ran only as far as Luxmore Hut, then turned around and came back to the start at Lake Te Anau's control gates (followed by a run down to Moturau Hut and back). So, after we walked from town to the control gates (spotting the takahē at the sanctuary along the way) and got onto the Kepler Track proper, we were passed by hundreds of runners coming in the other direction. It was a little annoying but quite interesting, and I'm glad I wasn't running it myself!

It drizzled all morning and we had to stop for lunch in the rain, which was a slight downer, but after we got above the bushline the weather cleared up, we got some excellent views over Lake Te Anau, and everyone cheered right up. Pretty soon after that we got to Luxmore Hut and enjoyed a pleasant afternoon. We visited Luxmore Cave and were entertained by the antics of a kea on the deck. For dinner we had our current favourite first-night meal: sausages, fried onions and buttered bread.

On the second day we had our usual first breakfast of bacon and eggs. Carrying eggs in their cartons at the top of a few people's packs works surprisingly well. We had a marvellous clear day walking across the Kepler tops, and not much wind either, except up Mt Luxmore and later on the descent into Iris Burn valley. We encountered keas close up again at the Forest Burn and Hanging Valley shelters. Really we were exceptionally fortunate because the views were outstanding and the walk pleasant; on a windy, wet day, the tops could be a very unpleasant environment indeed. That night at Iris Burn Hut we had pasta with canned tuna, sundried tomato pesto and grated Parmesan cheese — our new second-night favourite.

The third day of the Kepler is an easy walk down to Moturau Hut and we got there early in the afternoon. It was a lovely sunny day and the whole group got to laze around at the fine beach next to the hut. A number of us braved the waters of Lake Manapouri — cold, but endurable to the point where after fifteen minutes or so it starts feeling OK! The swim was highly refreshing and made more enjoyable by the stunning views around the lake. Dinner was instant noodles and there was plenty of time for more relaxed socialising at the beach as the sun set around 9pm.

The last day was an even easier walk from Moturau back to the control gates and then into town to pick up our bags and take a shuttle back to Queenstown. Once again the weather was superb, sunny but shady in the bush and then windy to keep us cool in the open as we walked back to town.

It was a wonderful trip and I believe our whole group enjoyed it very much. A couple of people had some small fitness issues but were feeling much better by the end. Anyone who didn't enjoy it should assume tramping isn't for them! Coordinating such a large group was a bit stressful for me at times; part of the problem is that I'm not naturally good at socialising in large groups, especially when I don't know some of them very well, so occasionally I had to wander off on my own to pray and decompress a bit. Overall though I was very happy.

A couple of times it was appropriate to remind everyone what a privilege it is to be able to do a trip like this worry-free, while much of the rest of the world is suffering in various ways, and how grateful we should all be for that — we did not earn it!

Thursday, 3 December 2020

Exploiting Precognition In Binary Instrumentation Of rr Replays

This post is part of a series about the rr remix instrumentation engine that powers the Pernosco omniscient debugger.

When rr replays a recording, it constructs processes that have identical memory and register contents to the recorded processes. It replays execution of the threads in those processes in steps; each step runs CPU code until some specific program state is reached (e.g., the next system call, or until registers and the retired-conditional-branch counter match some recorded values). The most efficient way to implement binary instrumentation of this code is to inject the instrumentation engine into each replay process so the engine and its generated code share the same address space as the application code and data. Then, instead of rr replay using PTRACE_CONT to directly execute application code, it uses PTRACE_CONT to enter the instrumentation engine, which is then responsible for executing the application code with instrumentation. When the engine detects that the instrumented code has reached the desired stopping point for that replay step, it returns control to rr replay.

A key invariant is that the remix engine produces the same effects as native execution on application memory and registers at the end of each replay step. This ensures that rr replay continues to produce memory and register states that match those during recording. It also means we can switch between instrumented execution and native execution at any time between replay steps, e.g. we can replay up to a certain point using regular rr replay and then turn on instrumentation. This is useful for debugging the instrumentation engine and for applying instrumentation-based analysis to a subinterval of an rr recording. In particular Pernosco uses this to parallelize analysis by running multiple replays at once, each one instrumenting a different time interval in the recording.

When injecting the instrumentation engine into each replay process we need to allocate a contiguous range of virtual memory that will never be used by the application. Fortunately, because this is an rr replay, we can see into the future. We can quickly scan the recording, identify a sufficiently large range of memory that will never be used in any replay process, and place the engine and its data there in all replay processes from the beginning.
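As a rough illustration of that scan, here is a minimal sketch in Python. It assumes we have already collected, from the recording, every address range that any replay process will ever map, as half-open `(start, end)` pairs. The function name `find_free_range`, the page-size default and the interval representation are all hypothetical; this is not how rr or remix represent trace data.

```python
from typing import Iterable, Optional, Tuple

Range = Tuple[int, int]  # half-open [start, end), in bytes


def find_free_range(used: Iterable[Range], size: int, lo: int, hi: int,
                    align: int = 4096) -> Optional[int]:
    """Return the lowest aligned address in [lo, hi) where `size` bytes fit
    without touching any range in `used`, or None if there is no such gap.

    `used` is assumed to be the union of all mappings made by all processes
    over the whole recording (hypothetical input, for illustration only).
    """
    def round_up(addr: int) -> int:
        return -(-addr // align) * align

    cursor = lo  # everything in [lo, cursor) is known to be occupied
    for start, end in sorted(used):
        candidate = round_up(cursor)
        # The gap between the occupied prefix and this range is [cursor, start).
        if start - candidate >= size and candidate + size <= hi:
            return candidate
        cursor = max(cursor, end)
    candidate = round_up(cursor)
    return candidate if candidate + size <= hi else None


# Example with made-up ranges: two overlapping mappings and one further up.
used = [(0x1000, 0x3000), (0x2000, 0x5000), (0x9000, 0xA000)]
engine_base = find_free_range(used, size=0x2000, lo=0x1000, hi=0x10000)
```

Because the input covers the whole recording rather than the current moment, the address chosen this way stays free for the entire replay, which is the point of "seeing into the future".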

rr replay needs to count the number of retired conditional branches so that we can deliver asynchronous interrupts at the right time during program execution. Effectively, the RCB counter is part of the state that we instruct the engine to stop at. To avoid having to stop and start a hardware performance counter around the instrumentation's own conditional branches, the engine disregards hardware counters and instead adds instructions to count the conditional branches explicitly.
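To make the idea of a software-maintained counter concrete, here is a toy sketch in Python. It interprets a made-up two-instruction machine (`dec` and `jnz`), increments a counter at every conditional branch whether taken or not, and stops when both the counter and the registers match a goal state. The instruction set, the function `run_until` and the goal representation are all invented for illustration; remix emits machine code rather than interpreting anything.

```python
from typing import Dict, List, Tuple

Regs = Dict[str, int]
Insn = Tuple  # ("dec", reg) or ("jnz", reg, target_pc): a toy instruction set


def run_until(program: List[Insn], regs: Regs,
              goal_rcb: int, goal_regs: Regs) -> Tuple[int, int]:
    """Run the toy program until the explicit branch counter and the registers
    both match the goal, or the program falls off the end.
    Returns (pc, rcb) at the point where execution stopped."""
    pc, rcb = 0, 0
    while pc < len(program):
        if rcb == goal_rcb and regs == goal_regs:
            return pc, rcb
        op = program[pc]
        if op[0] == "dec":
            regs[op[1]] -= 1
            pc += 1
        elif op[0] == "jnz":
            rcb += 1  # counted in software, taken or not
            pc = op[2] if regs[op[1]] != 0 else pc + 1
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return pc, rcb


# A countdown loop: rcx goes 3, 2, 1, 0.
loop = [("dec", "rcx"), ("jnz", "rcx", 0)]
regs = {"rcx": 3}
stop = run_until(loop, regs, goal_rcb=2, goal_regs={"rcx": 1})
```

In this example `rcx == 1` holds both before and after the second branch, so the registers alone do not pin down the stopping point; the counter value is what makes it unambiguous.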

Single-Exit Fragments

Like other instrumentation engines, remix processes a group of instructions at a time, translating each group of application instructions into instrumented instructions in a hidden code buffer; we call these groups "fragments". This lets us apply optimizations across instruction boundaries within a fragment. For example, the shortest instruction to increment our conditional branch counter is a single inc instruction, but this instruction modifies the CPU's arithmetic flags, which could disrupt the application. However, it is safe to modify flags if we can guarantee that the inc will always be followed by an application instruction that overwrites those flags without reading them. Conditional branches are often preceded by such instructions. Therefore, for example, consider the following application instructions:

    cmp r12,[rsp-8]
    jz label

Because cmp overwrites the arithmetic flags, we can translate this code to

    inc [remix_rcb_counter]
    cmp r12,[rsp-8]
    jz translated_label

Correctly applying this kind of optimization in a binary instrumentation engine is more difficult than it looks, because of unexpected early exits from fragments. In this case, a problem would arise if rsp-8 is not a valid address, so that the cmp instruction triggers a segmentation fault. We would fault after incrementing remix_rcb_counter, counting a conditional branch that may never happen. Even worse, we will have corrupted the application flag values; the cmp instruction we were counting on to cover that up has not executed, and may never execute! (Keep in mind that segfaults don't have to be fatal...) Normally, the possibility of these unexpected exits limits the optimizations an engine can use, and/or requires elaborate recovery machinery to mitigate, machinery that adds overhead to the just-in-time binary instrumentation process.

During remix execution, however, rr replay indicates whether execution will stop at a segmentation fault or not. If it will stop at a fault, the fault state will be the goal state for that execution step, and remix will insert code before the faulting instruction to stop execution when the right state has been reached — it will never execute a faulting instruction. In remix there are no early exits from fragments; once a fragment has been entered, it is guaranteed to run to completion. We can apply code motion within a fragment at will as long as dataflow dependencies are respected. Binary instrumentation of rr replays is a much easier problem than regular just-in-time binary instrumentation and this lets remix achieve lower overhead with a simpler implementation.
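The flag-safety reasoning can be sketched as a small Python model. It assumes each instruction in a fragment is annotated with whether it reads the arithmetic flags and whether it overwrites every flag that inc modifies; it also assumes the fragment runs to completion and ends in the single conditional branch being counted. The `Insn` class and both functions are hypothetical, and the model ignores the stop-state checks that remix inserts.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Insn:
    text: str
    reads_flags: bool = False
    overwrites_flags: bool = False  # writes every flag that inc modifies


def inc_insertion_point(fragment: List[Insn]) -> Optional[int]:
    """Index of the last instruction that overwrites the flags without
    reading them. A flag-clobbering inc placed just before it is harmless,
    provided the fragment cannot exit early. None if no such instruction."""
    for i in range(len(fragment) - 1, -1, -1):
        insn = fragment[i]
        if insn.overwrites_flags and not insn.reads_flags:
            return i
    return None


def instrument(fragment: List[Insn]) -> Optional[List[str]]:
    """Return the fragment text with the counter increment inserted, or None
    when a flag-preserving increment would be needed instead."""
    i = inc_insertion_point(fragment)
    if i is None:
        return None
    texts = [insn.text for insn in fragment]
    return texts[:i] + ["inc [remix_rcb_counter]"] + texts[i:]


fragment = [
    Insn("mov rax,rbx"),
    Insn("cmp r12,[rsp-8]", overwrites_flags=True),
    Insn("jz label", reads_flags=True),
]
translated = instrument(fragment)
```

An instruction such as adc, which both reads and writes the flags, does not qualify as an insertion point, so a fragment containing only `adc` followed by `jc` makes `instrument` return None in this model.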

Tuesday, 1 December 2020

rr remix: Efficient Replay-Only Binary Instrumentation

The Pernosco omniscient debugger analyzes user-submitted rr recordings to extract all program states into a database, so that during debugging sessions it can efficiently recreate the program state at any point in time. This analysis requires intensive binary instrumentation of rr replays (e.g. to observe all memory writes performed by the application). To integrate binary instrumentation with rr replay we had to create a new binary instrumentation framework, which we call "rr remix". (Record, replay, remix, ...)

Our main goals for remix were:

  • Instrument the replay of any rr recording (including applications with self-modifying code)
  • Support executing arbitrary instrumentation code
  • Minimise space and time overhead, especially on huge applications (e.g. Web browsers)
  • Be efficient on code without compiler optimizations, as well as optimized code (the former being common when debugging)
  • Be simple so we could build and maintain it with minimal effort

It turned out we hit an extra goal we didn't start with:

  • Support replaying rr recordings in hardware/VMs without access to hardware performance counters or CPUID faulting, and with incompatible XSAVE layouts

Traditionally, tool performance has been mostly evaluated on small, compiler-optimized benchmark applications. Therefore our goals led us to make rather different design decisions compared to other well-known binary instrumentation tools such as Pin, Valgrind, and DynamoRIO. Also, doing binary instrumentation during rr replay imposes some extra requirements on the instrumentation engine, but also gives us very valuable knowledge of the future that we can exploit to simplify the design of the instrumentation engine and improve its performance.

To illustrate the performance of remix we will show the performance of clang++ compiling a simple C++ program. This resembles running a single test of a large application, typical behaviour for Pernosco users.

We compare up-to-date DynamoRIO, Valgrind and remix all using "null tool" instrumentation, i.e. rewriting the code but not actually adding any unnecessary instrumentation. These are geometric means of five runs, real time. On optimized clang++, remix of an rr replay beats both DynamoRIO and Valgrind instrumenting a normal run.

On non-optimized code, remix beats both DynamoRIO and Valgrind again, but the overhead ratios of all the instrumentation systems (especially DR/Valgrind) are lower. This is a pretty good result for remix because it is much simpler than DR and Valgrind; in particular it doesn't perform any inlining, while the others do. This hurts more on non-optimized code, which has a lot more function calls that compilers would normally inline. (Valgrind deliberately optimizes for ease of tool writing and portability over performance; I'm including it for completeness and because people are familiar with it.)

Unfortunately we aren't in a position to open-source remix at this time.

We plan to follow up with some more posts documenting interesting design decisions in remix and how they contribute to these results. Probable topics:

  • The basic remix architecture and how it integrates into rr
  • Fixing regular rr's limitations on trace portability and target hardware
  • Leveraging knowledge of the future to improve the efficiency of binary rewriting
  • The mystery of efficient branch-and-link instructions on x86-64
  • Optimizing non-optimized code: leveraging hardware return address prediction in binary instrumentation
  • Optimizing non-optimized code: dataflow analysis

PS, remember this is all in service of the Pernosco omniscient debugger. Try it out!