Tuesday, 4 May 2021

Lake Waikaremoana 2021

Last week I did the Lake Waikaremoana Great Walk again, in Te Urewera National Park. Last time it was just me and my kids, and that was our first "Great Walk" and first more-than-one-night tramp, so it has special memories for me (hard to believe it was only seven years ago!). Since then my tramping group has grown; this time I was with one of my kids and ten friends. The weather was excellent and once again, I think everyone had a great time — I certainly did! I really thank God for all of it: the weather; the location; the huts and tracks and the people who maintain them; my tramping friends, whom I love so much; and the time I get to spend with them.

We did the track in the clockwise direction, starting from Hopuruahine at the north end of the lake, walking the west edge of the lake to Onepoto in the south. Most people go the other way but I like this direction because it leaves the best views for last, along the Panekiri Bluff. We took four days, even though it's definitely doable in three, because I like spending time around huts, and the only way to spread the walk evenly over three days is to skip staying at Panekiri Hut, which has the best views.

I heard from a local that this summer was the busiest ever at Lake Waikaremoana, which makes sense because NZers haven't wanted to leave the country. We were right at the end of the first-term school holidays and the track was reasonably busy (the most popular huts, Marauiti Hut and Waiopaoa Hut, were full, but the others weren't) but by no means crowded. We would see a couple of dozen other people on the track each day, and fewer people at the huts we stayed at.

So six of our group stayed at Marauiti Hut the first night, and four at Waiharuru Hut (because Marauiti Hut had filled up before they booked). On the second day one of my friends and his six-year-old son (on his first overnight tramp) were dropped by water taxi at the beach by Maraunui campsite; that worked well. Unfortunately one of our group had a leg problem and had to take the same water taxi out. That night we all stayed at Waiopaoa Hut (after most of us did the side trip to Korokoro Falls). On the third day we raced up to Panekiri Hut in three hours and had the whole afternoon there — a bit cold, but amazing views when the cloud lifted. Many games of Bang were played. The last day was pretty slow for various reasons so we didn't exit until about 1:30pm, but it was great to be in the extraordinary "goblin forest" along the Panekiri Bluff, with amazing views over the lake on a perfectly clear day.

For this trip I ran my dehydrator four times — lots of fruit chips, and half a kilogram of beef jerky. That was all well received. We also had lots of other usual snacks (chocolate, muesli bars). As a result we didn't eat many of the cracker boxes I had brought for lunch; we'll cut down on those next time. We did nearly run out of gas and actually run out of toilet paper (at the end), so I need to increase our budget for those. We rehydrated my dehydrated carrots by draining pasta water over them; they didn't grow much, but they took on the texture of pickled vegetables and were popular, so we'll do that again.

With such a big group it's easy to just talk to each other, so I need to make a special effort to talk to other people at the huts. We did meet a few interesting characters. One group at Panekiri Hut, who had just started the track, needed their car moved from Onepoto to Hopuruahine so we did that for them after we finished the track. I hope they picked it up OK!

In my opinion, Lake Waikaremoana isn't the best of the Great Walks in any particular dimension, but it's certainly very good and well worth doing if you've done the others. Kia ora Ngāi Tūhoe!

Tuesday, 27 April 2021

Print Debugging Should Go Away

This is based on a comment I left on HN.

Many people prefer print debugging over interactive debugging tools. Some of them seem to have concluded that the superiority of print debugging is some kind of eternal natural law. It isn't: almost all the reasons people use print debugging can be overcome by improving debuggers — and to some extent already have been. (In the words of William Gibson, the future is already here; it's just not evenly distributed yet.) The superiority of print debugging is contingent and, for most developers, it will end at some point (or it has already ended and they don't know it).

Record-and-replay debuggers like rr (disclaimer: I initiated it and help maintain it), Undo, TTD, replay.io and others address one set of problems with interactive debuggers. You don't have to stop the program to debug it; you can record a complete run and debug it later. You can record the program many times until it fails, then debug only the execution that failed until you understand the failure. You can record the program running on a far-off machine, extract the recording, and debug it wherever you want.

Pernosco (disclaimer: also my baby) and other omniscient debuggers go much further. Fans of print debugging observe that "step debuggers" (even record-and-replay step debuggers, like rr) only show you one point in time, and this is limiting. They are absolutely right. Omniscient debuggers have fast access to all program states and can show you at a glance how program state changes over time. One of our primary goals in Pernosco (mostly achieved, I think) is that developers should never feel the need to "step" to build up a mental picture of how program state evolves over time. One way we do this is by supporting a form of "interactive print debugging".

Once you buy into omniscient debugging a world of riches opens to you. For example omniscient debuggers like Pernosco let you track dataflow backwards in time, a debugging superpower print debugging can't touch.

There are many reasons why print debugging is still the best option for many developers. rr, Pernosco and similar tools can't be used at all in many contexts. However, most of the limitations of these tools (programming languages, operating systems, hardware platforms, overhead) could be mitigated with sufficient investment in engineering work and a modicum of support from platform vendors. It's important to keep in mind that the level of investment in these tools to date has been incredibly low: basically just a handful of startups and destitute open source projects. If the software industry took debugging seriously — instead of just grumbling about the tools and reverting to print debugging (or, at best, building a polished implementation of the features debuggers have had since the 1980s) — and invested accordingly, we could make enormous strides, and not many people would feel the need to resort to print debugging.

Friday, 16 April 2021

Demoing The Pernosco Omniscient Debugger: Debugging Crashes In Node.js And GDB

This post was written by Pernosco co-founder Kyle Huey.

In traditional debugging, an engineer forms a hypothesis about what is going wrong with the program, gathers evidence to accept or reject that hypothesis, and repeats until the root cause of the bug is found. This process is time-consuming, and formulating useful hypotheses often requires deep understanding of the software being debugged. With the Pernosco omniscient debugger there’s no need to speculate about what might have happened; instead, an engineer can ask what actually did happen. This radically simplifies the debugging process, enabling much faster progress while requiring much less domain expertise.

To demonstrate the power of this approach we have two examples from well-known and complex software projects. The first is an intermittently crashing Node.js test. From a simple stack walk it is easy to see that the proximate cause of the crash is calling a member function with a NULL `this` pointer. The next logical step is to determine why that pointer is NULL. In a traditional debugging approach, this requires pre-existing familiarity with the codebase, or reading code to find the places where that pointer's value could originate. Then an experiment, either poking around in an interactive debugger or adding relevant logging statements, must be run to see where the NULL pointer actually comes from. And because the test fails intermittently, the engineer has to hope that the issue can be reproduced again and that the experiment doesn’t disturb the program’s behavior so much that the bug vanishes.

In the Pernosco omniscient debugger, the engineer just has to click on the NULL value. With all program state available at all points in time, the Pernosco omniscient debugger can track this value back to its logical origin with no guesswork on the part of the user. We are immediately taken backwards to the point where the connection in question received an EOF and set this pointer to NULL. You can read the full debugging transcript here.

Similarly, with a crash in GDB, the proximate cause of the crash is immediately obvious from a stack walk: the program has jumped through a bad vtable pointer to NULL. Figuring out why the vtable address has been corrupted is not trivial with traditional methods: entire tools such as ASAN (which requires recompilation) and Valgrind (which is very slow) have been designed to find and diagnose memory corruption bugs like this. But in the Pernosco omniscient debugger, a click on the object’s pointer takes the user to where it was assigned into the global variable of interest, and another click on the value of the vtable pointer takes the user to where the vtable pointer was erroneously overwritten. Walk through the complete debugging session here.

As demonstrated in the examples above, the Pernosco omniscient debugger makes it easy to track down even classes of bugs that are notoriously difficult to work with, such as race conditions and memory corruption errors. Try out Pernosco individual accounts or on-premises today!

Wednesday, 14 April 2021

Visualizing Control Flow In Pernosco

In traditional debuggers, developers often single-step through the execution of a function to discover its control flow. One of Pernosco's main themes is avoiding single-stepping by visualizing state over time "all at once". Therefore, presenting control flow through a function "at a glance" is an important Pernosco feature and we've recently made significant improvements in this area.

This is a surprisingly hard problem. Pernosco records control flow at the instruction level. Compiler-generated debuginfo maps instructions to source lines, but lacks other potentially useful information such as the static control flow graph. We think developers want to understand control flow in the context of their source code (so approaches taken by, e.g., reverse engineering tools are not optimal for Pernosco). However, mapping potentially complex control flow onto the simple top-to-bottom source code view is inherently lossy or confusing or both.

For functions without loops there is a simple, obvious and good solution: highlight the lines executed, and let the user click a highlighted line to jump to the time it executed. In the example below, we can see immediately where the function took an early exit.

To handle loops, Pernosco builds a dynamic control flow graph, which is actually a tree: leaf nodes are executions of source lines, non-leaf nodes are executions of loop iterations, and the root node is the execution of the function itself. Constructing a dynamic CFG is surprisingly non-trivial (especially in the presence of optimized code and large functions with long executions), but that is outside the scope of this post. Then, given a "current moment" during the function call, we identify which loop iterations are "current" and highlight the lines executed by those loop iterations; clicking on one of these highlights jumps directly to the appropriate point in time. Any lines executed during this function call but not in a current loop iteration are highlighted differently; clicking on one of these shows all executions of that line in that function call. Hover text explains what is going on.

This presentation is still lossy — for example control-flow edges are not visible. However, user feedback has been very positive.

Try out Pernosco individual accounts or on-premises today!

Thursday, 4 March 2021

On-Premises Pernosco Now Available; Reflecting On Application Confinement

In November we announced Pernosco availability for individual developers via our debugging-as-a-service platform. That product requires developers to share binary code, debug symbols and test data with us (but not necessarily source code), and we recognize that many potential customers are not comfortable with that. Therefore we are making Pernosco available to run on-premises. Contact us for a free trial! On-prem pricing is negotiable, but we prefer to charge a fixed amount per month for unlimited usage by a given team. Keep in mind Pernosco's current limitations: applications that work with rr (Linux, x86-64), C/C++/Rust/Ada/V8.

An on-premises customer says:

One of the key takeaways for me in our evaluation is that users keep coming back to pernosco without me pushing for it, and really like it — I have rarely seen such a high adoption rate in a new tool.

To deploy Pernosco on-premises we package it into two containers: the database builder and the application server. You collect an rr trace, run the database builder, then run the application server to export a Web interface to the database. Both steps, especially database building, require a reasonably powerful machine; our hosted service uses c5d.9xlarge instances for database building and a smaller shared i3.4xlarge instance for the application servers. If you want to run your own private shared service, you are responsible for any authentication, authorization, public routing to the Web interface, wrangling database storage, etc.

To help our customers feel comfortable using Pernosco on-premises, all our closed-source code is bundled into those two containers, and we provide an open-source Python wrapper which runs those containers with sandboxing to confine them. Our goal is that you should not have to trust anything other than that easily-audited wrapper script (and rr of course, which is open source but not so easily audited, though lots of people have looked into it). Our database builder container simply reads one subtree of the filesystem and drops a database into it, and the application server reads that subtree and exposes a Web server on a private subnet. Both containers should be incapable of leaking data to the outside world, even if their contents were malicious.

This seems like a natural approach to deploying closed-source software — no-one wants to be the next Solarwinds. In fact even if you receive the entire product's source code from your vendor, you still want confinement because you're not really going to audit it effectively. Therefore I'm surprised to find that this use-case doesn't seem to be well-supported by infrastructure!

For example, Docker provides an embedded DNS server to containers that cannot be disabled. We don't need it, and in theory crafted DNS queries could leak information to the outside world, so I'd like to disable it. But apparently our confinement goal is not considered a valid use-case by Docker. (I guess you can work around this by limiting the DNS service on the container host somehow, but that sucks more.)

Another interesting issue is preventing leaks through our Web application. For example, when a user loads our application, it could send messages to public Internet hosts that leak information. We try to prevent this by having our application send CSP headers that deny all non-same-origin access. The problem is, if our application were malicious it could simply change the CSP it sends. Preventing that would seem to require a proxy outside the container that checks or modifies the CSP headers we send.
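That proxy idea can be sketched as an nginx configuration fragment (the upstream name and port are assumptions, not our actual deployment): discard whatever CSP the application sends and attach a trusted one.

```nginx
# Trusted proxy outside the container: the app cannot weaken the CSP
# because we drop its header and substitute our own.
location / {
    proxy_pass http://pernosco-app:8080;        # app server on the private subnet (name assumed)
    proxy_hide_header Content-Security-Policy;  # discard whatever the container sent
    add_header Content-Security-Policy "default-src 'self'" always;
}
```

Of course this only moves the trust boundary: you now have to trust the proxy configuration instead of the container, but the proxy is a few auditable lines rather than a closed-source application.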

I'm surprised these problems haven't already been solved, or at least that the solutions aren't widely known and deployed.

What Would Jesus Do ... About Vaccination?

My thoughts about COVID19 vaccination as a Christian are pretty simple (assuming the Pfizer vaccine or something similar):

  • Is it safe for me?
    Yes. I'm not known to be allergic to vaccine components or immunologically compromised, and the safety data is solid.
  • If I get exposed to COVID19, will the vaccine stop me from getting infected and passing it on to people around me?
    Yes, almost certainly per the data.
  • Knowing this, if I chose not to get vaccinated, caught COVID19, and infected people around me with COVID19, would I have disobeyed Jesus' command to love my neighbour?
    Yes.
  • Is there any other way I can ensure I won't catch COVID19 and infect others?
    No.
  • Are there any countervailing ethical issues with taking the vaccine?
    No. None of the vaccines on offer are closely connected to abortion.

Thus it is pretty clearly God's will for me to be vaccinated.

Monday, 22 February 2021

Mercer Bay

West of Auckland, between Karekare beach and Piha beach, there is a small bay called Mercer Bay. It's surrounded by cliffs and there is no marked track to get down to it, but I've known for a long time that there is an unmarked route down the cliff. Last year I met someone who knows the route and she kindly agreed to guide a group of us down it on Saturday morning.

We started from Karekare up the Cowan track to the turnoff. The cliffs look precipitous and I'm not good with heights, but the route is actually not very difficult, and we got down quite easily. Mercer Bay is very pretty. At dead low tide you can walk around the north end of the beach to an inlet with three large sea-caves. The largest goes a significant distance into the hill to a blowhole where the roof has collapsed. It's incredibly impressive.

Climbing back up the cliff is the sensible but strenuous way out, but at dead low tide you can walk south around the rocks back to Karekare beach, and that's what we did. There is some nontrivial (for me) climbing involved — a few narrow ledges, a few overhangs — but the barnacle-covered conglomerate rocks provide excellent footholds and handholds if you have gloves or don't mind a few scrapes. Unfortunately we did not time our walk well and the tide was coming in, so we had to hurry. In a few places we had to cross little inlets with waves surging in and out, and later on we just plunged in when it wasn't too deep. I don't know how close we came to being trapped, but I certainly would have preferred a larger safety margin. Lesson learned! (I was carrying my personal locator beacon and we may even have had cellphone coverage, so I think we would have been rescued had we taken refuge on the higher rocks, but how embarrassing!)

Anyway, now that I know the way I'm already looking forward to going back :-). If we go around the rocks again I'll make sure we start well before low tide.