Saturday, 19 June 2021

Spectre Mitigations Murder *Userspace* Performance In The Presence Of Frequent Syscalls

I just made a performance improvement to the (single-threaded) rr sources command to cache the results of the access() system calls that check for directory existence. It's a simple and very effective optimization on my Skylake, Linux 5.12 laptop:

Before:
[roc@localhost code]$ time rr sources ~/pernosco/main/test-tmp/basics-demo >& ~/tmp/output2
real	3m19.648s
user	1m9.157s
sys	2m9.416s

After:
[roc@localhost code]$ time rr sources  ~/pernosco/main/test-tmp/basics-demo >& ~/tmp/output2
real	0m36.160s
user	0m36.009s
sys	0m0.053s
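
The caching itself is about as simple as it sounds. Here's a minimal sketch of the idea in C++, not rr's actual code; the helper name and the cache are invented for illustration:

#include <string>
#include <unordered_map>
#include <unistd.h>

// Hypothetical helper: returns whether `path` passes an access() existence
// check, caching the result so repeated queries for the same path skip the
// system call entirely.
static bool cached_path_exists(const std::string& path) {
  static std::unordered_map<std::string, bool> cache;
  auto it = cache.find(path);
  if (it != cache.end()) {
    return it->second;
  }
  bool exists = access(path.c_str(), F_OK) == 0;
  cache.insert({path, exists});
  return exists;
}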

One interesting thing is that we cut the userspace execution time in half even though we're executing more userspace instructions than before. Frequent system calls actually slow down code execution in userspace. I assumed this was at least partly due to Spectre mitigations, so I turned those off (booting with mitigations=off) and reran the test:

Before:
[roc@localhost code]$ time rr sources ~/pernosco/main/test-tmp/basics-demo >& ~/tmp/output2
real	2m5.776s
user	0m33.052s
sys	1m32.280s

After:
[roc@localhost code]$ time rr sources  ~/pernosco/main/test-tmp/basics-demo >& ~/tmp/output2
real	0m33.422s
user	0m32.934s
sys	0m0.110s

So those Spectre mitigations make pre-optimization userspace run 2x slower (due to cache and TLB flushes, I guess) and the whole workload overall 1.6x slower! Without the Spectre mitigations, those system calls hardly slowed down userspace execution at all.

Monday, 7 June 2021

Tama Lakes Winter Tramp 2021

This weekend my kids and I went down to Tongariro National Park for a winter tramp, repeating a similar trip in 2019. Overall it was great but apparently word got out and Waihohonu Hut was a lot busier than last time.

The weather forecast wasn't great, so it was just me and my kids on this trip. The first sign of busyness was that the car park off the Desert Road was almost full. We got to the hut about 2:15pm and were able to claim the last bunks, but people kept arriving. I guess there were probably more than 50 people there on Saturday night: 30+ squeezed into bunks and a lot of people who had to sleep on the floor of the main room. It's such a huge hut that this was still tolerable and we had a fun afternoon and evening. We got some views of the lower flanks of Mt Ruapehu on the walk in, and a good view of Mt Ngauruhoe topped by cloud.

Sunday was drizzly with low cloud, as forecast. One of my kids stayed at the hut to study for exams. The other one and I walked to the Tama Lakes via an off-track route I heard about years ago and had been hoping to try out ever since. I don't have much experience with off-track walking and the conditions weren't ideal, but: they weren't bad, we were carrying all relevant gear, my son and I are reasonably fit and fast walkers, we left at 8am so had plenty of time, and the return trip from the lakes was via the Great Walk track (easy). It worked out well and we had a lot of fun, though the views were obscured by cloud — on a good day they would be magnificent. I'd like to do it again but only with a small group and not often; I don't want the environment to be damaged by the route becoming popular.

We got back to the hut about 2:20pm after some fast walking and found it already pretty full, again. In fact it looked like it would be even fuller than on Saturday night, so my kids and I decided to just walk out to the car and come home a day early. That seemed like the right decision for us, but also freed up a little bit of space in the hut.

So unfortunately it seems that the Queen's Birthday trip to Waihohonu Hut will not be such a great option in the future; given it was barely tolerable with a poor weather forecast, it probably would have been intolerable had the forecast been good.

Wednesday, 19 May 2021

Forward Compatibility Of rr Recordings

Since 2017 rr has maintained backward compatibility for recordings, i.e. new rr versions can replay any recording made by any earlier rr version back to 5.0. When we set that goal, it wasn't clear for how long we'd be able to sustain it, but so far so good!

However, we have said nothing about forward compatibility — whether old rr versions are able to replay recordings produced by new rr versions — and in practice we have broken that many times. Generally that's OK. However, when we do break forward compatibility and an old rr tries to replay an incompatible recording, it often just crashes mysteriously. This is suboptimal.

So, I have added "forward compatibility version checking" to rr. rr builds have a forward compatibility version; each recording is stamped with the forward compatibility version it was created with; and rr will refuse to replay a recording with a later forward compatibility version than the rr build supports. When we make an rr change that means old rrs can no longer replay new rr recordings, we'll bump the forward compatibility version in the source.
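
In case it helps to see the shape of the check, here is a hedged C++ sketch; the constant, function, and error message are invented for illustration and don't reflect rr's actual source or trace format.

#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Hypothetical constant: the newest forward compatibility version this build
// understands. It gets bumped whenever a change breaks replay by older rrs.
static const uint32_t FORWARD_COMPATIBILITY_VERSION = 1;

// Sketch of the check at trace-open time. `trace_version` would be the value
// stamped into the recording when it was created.
static void check_forward_compatibility(uint32_t trace_version) {
  if (trace_version > FORWARD_COMPATIBILITY_VERSION) {
    fprintf(stderr,
            "This recording requires forward compatibility version %u, but "
            "this rr build only supports up to %u; please use a newer rr.\n",
            (unsigned)trace_version, (unsigned)FORWARD_COMPATIBILITY_VERSION);
    exit(1);
  }
}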

Note that rrs built before this change don't have the check, will continue to merrily try to replay recordings they can't replay, and die in exciting ways.

Tuesday, 4 May 2021

Lake Waikaremoana 2021

Last week I did the Lake Waikaremoana Great Walk again, in Te Urewera. Last time it was just me and my kids, and that was our first "Great Walk" and first more-than-one-night tramp, so it has special memories for me (hard to believe it was only seven years ago!). Since then my tramping group has grown; this time I was with one of my kids and ten friends. The weather was excellent and once again, I think everyone had a great time — I certainly did! I really thank God for all of it: the weather; the location; the huts and tracks and the people who maintain them; my tramping friends, whom I love so much; and the time I get to spend with them.

We did the track in the anticlockwise direction, starting from Hopuruahine at the north end of the lake and walking the west edge of the lake to Onepoto in the south. Most people go the other way, but I like this direction because it leaves the best views for last, along the Panekiri Bluff. We took four days, even though it's definitely doable in three, because I like spending time around huts, and the only way to spread the walk evenly over three days is to skip staying at Panekiri Hut, which has the best views.

I heard from a local that this summer was the busiest ever at Lake Waikaremoana, which makes sense because NZers haven't wanted to leave the country. We were right at the end of the first-term school holidays and the track was reasonably busy (the most popular huts, Marauiti Hut and Waiopaoa Hut, were full, but the others weren't) but by no means crowded. We would see a couple of dozen other people on the track each day and fewer people at the huts we stayed at.

So six of our group stayed at Marauiti Hut the first night, and four at Waiharuru Hut (because Marauiti Hut had filled up before they booked). On the second day one of my friends and his six-year-old son (on his first overnight tramp) were dropped by water taxi at the beach by Maraunui campsite; that worked well. Unfortunately one of our group had a leg problem and had to take the same water taxi out. That night we all stayed at Waiopaoa Hut (after most of us did the side trip to Korokoro Falls). On the third day we raced up to Panekiri Hut in three hours and had the whole afternoon there — a bit cold, but amazing views when the cloud lifted. Many games of Bang were played. The last day was pretty slow for various reasons so we didn't exit until about 1:30pm, but it was great to be in the extraordinary "goblin forest" along the Panekiri Bluff, with amazing views over the lake on a perfectly clear day.

For this trip I ran my dehydrator four times — lots of fruit chips, and half a kilogram of beef jerky. That was all well received. We also had lots of other usual snacks (chocolate, muesli bars). As a result we didn't eat many of the cracker boxes I had brought for lunch; we'll cut down on those next time. We did nearly run out of gas and actually run out of toilet paper (at the end), so I need to increase our budget for those. We rehydrated my dehydrated carrots by draining pasta water over them; they didn't grow much, but they took on the texture of pickled vegetables and were popular, so we'll do that again.

With such a big group it's easy to just talk to each other, so I need to make a special effort to talk to other people at the huts. We did meet a few interesting characters. One group at Panekiri Hut, who had just started the track, needed their car moved from Onepoto to Hopuruahine so we did that for them after we finished the track. I hope they picked it up OK!

In my opinion, Lake Waikaremoana isn't the best of the Great Walks in any particular dimension, but it's certainly very good and well worth doing if you've done the others. Kia ora Ngāi Tūhoe!

Tuesday, 27 April 2021

Print Debugging Should Go Away

This is based on a comment I left on HN.

Many people prefer print debugging over interactive debugging tools. Some of them seem to have concluded that the superiority of print debugging is some kind of eternal natural law. It isn't: almost all the reasons people use print debugging can be overcome by improving debuggers — and to some extent already have been. (In the words of William Gibson, the future is already here; it's just not evenly distributed yet.) The superiority of print debugging is contingent and, for most developers, it will end at some point (or it has already ended and they don't know it).

Record-and-replay debuggers like rr (disclaimer: I initiated it and help maintain it), Undo, TTD, replay.io, etc. address one set of problems with interactive debuggers. You don't have to stop the program to debug it; you can record a complete run and debug it later. You can record the program many times until it fails and debug only the execution that failed until you understand the failure. You can record the program running on a far-off machine, extract the recording and debug it wherever you want.

Pernosco (disclaimer: also my baby) and other omniscient debuggers go much further. Fans of print debugging observe that "step debuggers" (even record-and-replay step debuggers, like rr) only show you one point in time, and this is limiting. They are absolutely right. Omniscient debuggers have fast access to all program states and can show you at a glance how program state changes over time. One of our primary goals in Pernosco (mostly achieved, I think) is that developers should never feel the need to "step" to build up a mental picture of how program state evolves over time. One way we do this is by supporting a form of "interactive print debugging".

Once you buy into omniscient debugging, a world of riches opens up to you. For example, omniscient debuggers like Pernosco let you track dataflow backwards in time, a debugging superpower print debugging can't touch.

There are many reasons why print debugging is still the best option for many developers. rr, Pernosco and similar tools can't even be used at all in many contexts. However, most of the limitations of these tools (programming languages, operating systems, hardware platforms, overhead) could be mitigated with sufficient investment in engineering work and a modicum of support from platform vendors. It's important to keep in mind that the level of investment in these tools to date has been incredibly low, basically just a handful of startups and destitute open source projects. If the software industry took debugging seriously — instead of just grumbling about the tools and reverting to print debugging (or, at best, building a polished implementation of the features debuggers have had since the 1980s) — and invested accordingly we could make enormous strides, and not many people would feel the need to resort to print debugging.

Friday, 16 April 2021

Demoing The Pernosco Omniscient Debugger: Debugging Crashes In Node.js And GDB

This post was written by Pernosco co-founder Kyle Huey.

Traditional debugging means forming a hypothesis about what is going wrong with the program, gathering evidence to accept or reject that hypothesis, and repeating until the root cause of the bug is found. This process is time-consuming, and formulating useful hypotheses often requires deep understanding of the software being debugged. With the Pernosco omniscient debugger there’s no need to speculate about what might have happened; instead, an engineer can ask what actually did happen. This radically simplifies the debugging process, enabling much faster progress while requiring much less domain expertise.

To demonstrate the power of this approach we have two examples from well-known and complex software projects. The first is an intermittently crashing node.js test. From a simple stack walk it is easy to see that the proximate cause of the crash is calling a member function with a NULL `this` pointer. The next logical step is to determine why that pointer is NULL. In a traditional debugging approach, this requires pre-existing familiarity with the codebase, or reading code and looking for places where the value of this pointer could originate from. Then an experiment, either poking around in an interactive debugger or adding relevant logging statements, must be run to see where the NULL pointer originates from. And because this test fails intermittently, the engineer has to hope that the issue can be reproduced again and that this experiment doesn’t disturb the program’s behavior so much that the bug vanishes.

In the Pernosco omniscient debugger, the engineer just has to click on the NULL value. With all program state available at all points in time, the Pernosco omniscient debugger can track this value back to its logical origin with no guesswork on the part of the user. We are immediately taken backwards to the point where the connection in question received an EOF and set this pointer to NULL. You can read the full debugging transcript here.

Similarly, with a crash in gdb, the proximate cause of the crash is immediately obvious from a stack walk: the program has jumped through a bad vtable pointer to NULL. Figuring out why the vtable address has been corrupted is not trivial with traditional methods: there are entire tools such as ASAN (which requires recompilation) or Valgrind (which is very slow) that have been designed to find and diagnose memory corruption bugs like this. But in the Pernosco omniscient debugger a click on the object’s pointer takes the user to where it was assigned into the global variable of interest, and another click on the value of the vtable pointer takes the user to where the vtable pointer was erroneously overwritten. Walk through the complete debugging session here.

As demonstrated in the examples above, the Pernosco omniscient debugger makes it easy to track down even classes of bugs that are notoriously difficult to work with, such as race conditions or memory corruption errors. Try out Pernosco individual accounts or on-premises today!

Wednesday, 14 April 2021

Visualizing Control Flow In Pernosco

In traditional debuggers, developers often single-step through the execution of a function to discover its control flow. One of Pernosco's main themes is avoiding single-stepping by visualizing state over time "all at once". Therefore, presenting control flow through a function "at a glance" is an important Pernosco feature and we've recently made significant improvements in this area.

This is a surprisingly hard problem. Pernosco records control flow at the instruction level. Compiler-generated debuginfo maps instructions to source lines, but lacks other potentially useful information such as the static control flow graph. We think developers want to understand control flow in the context of their source code (so approaches taken by, e.g., reverse engineering tools are not optimal for Pernosco). However, mapping potentially complex control flow onto the simple top-to-bottom source code view is inherently lossy or confusing or both.

For functions without loops there is a simple, obvious and good solution: highlight the lines that were executed, and let the user click a highlighted line to jump to the moment when that line executed. In the example below, we can see immediately where the function took an early exit.

To handle loops, Pernosco builds a dynamic control flow graph, which is actually a tree: leaf nodes are executions of source lines, non-leaf nodes are executions of loop iterations, and the root node is the execution of the function itself. Constructing a dynamic CFG is surprisingly non-trivial (especially in the presence of optimized code and large functions with long executions), but that is outside the scope of this post. Then, given a "current moment" during the function call, we identify which loop iterations are "current" and highlight the lines executed by those iterations; clicking on one of these highlights jumps directly to the appropriate point in time. Any lines executed during this function call but not in a current loop iteration are highlighted differently; clicking on one of those shows all executions of that line in that function call. Hover text explains what is going on.
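
To make the shape of that tree concrete, here is a hedged C++ sketch; the type and field names are invented for illustration and are not Pernosco's actual representation.

#include <cstdint>
#include <vector>

// Hypothetical dynamic CFG node. Leaves record one execution of one source
// line at a particular moment; non-leaf nodes group the executions belonging
// to one loop iteration; the root covers the whole function call.
struct DynamicCfgNode {
  enum class Kind { FunctionCall, LoopIteration, LineExecution };
  Kind kind;
  uint32_t line = 0;    // source line (meaningful for leaves only)
  uint64_t moment = 0;  // time of execution (meaningful for leaves only)
  std::vector<DynamicCfgNode> children;  // empty for leaves
};

// Collect the source lines executed anywhere under `node`. Lines under the
// "current" loop iterations would get the click-to-jump highlight; the rest
// get the weaker "show all executions" highlight.
static void collect_lines(const DynamicCfgNode& node,
                          std::vector<uint32_t>* lines) {
  if (node.kind == DynamicCfgNode::Kind::LineExecution) {
    lines->push_back(node.line);
    return;
  }
  for (const auto& child : node.children) {
    collect_lines(child, lines);
  }
}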

This presentation is still lossy — for example control-flow edges are not visible. However, user feedback has been very positive.

Try out Pernosco individual accounts or on-premises today!