Sunday, 19 December 2021

Mt Pirongia 2021

On Friday I took advantage of the Auckland border having opened (on Wednesday) to travel down to Mt Pirongia and tramp to the summit, staying at Pahautea Hut overnight and then walking out again on Saturday (yesterday). This is the second time I've done Mt Pirongia (the last one was April 2016). It was intense but pretty great!

IIRC last time we took the shortest route — Corcoran Rd end, up Tirohanga Track to Ruapane peak and then along the track to Pirongia summit, returning via the same route. This time we did a loop from the Grey Road end: taking the link track to Ruapane Track, then joining the Tirohanga track to Pirongia summit, then back down Mahaukura Track to the car park via Mahaukura and Wharauroa peaks. It's definitely longer this way but you see and do more.

Pirongia is extremely rugged and the tracks reflect this. There aren't many steps or boardwalk sections and the tracks stick to the ridgelines, and there are many peaks along those ridges (the remains of many volcanic cores), so you're constantly scrambling up and down steep slopes with the aid of rocks and roots. Where the rock faces are nontrivial, chains have been installed to help with the climbing. Pirongia gets a lot of rainfall and its soils don't drain well so there's a lot of mud along the way. Although the tracks are quite short horizontally, they're hard going. Good fitness, good boots, and determination are all pretty important here. But Pirongia isn't huge so you should get there in the end.

On Friday night the hut was not very full — twenty bunks but only seven people there, me and my three friends and another group, three young women. We got talking to their leader, who told us about her extensive tramping experience and an upcoming 10-day tramp around the infamous Northwest Circuit on Stewart Island that she was organising. Later she mentioned she's 17 years old. I was a bit flabbergasted to be honest. Good for her, and well done to her parents!

There's a real shortage of tramping huts in the Auckland region. Within a two-hour drive there's really only Pahautea at Pirongia and Crosbie's/Pinnacles in the Kauaeranga Valley in Coromandel, as far as I know. The latter are super busy and Pirongia is just a bit too hard for inexperienced trampers. But if you are fit and at least a little bit experienced, it's a good option.

Saturday, 18 December 2021

Do We Really Need A Link Step?

mold looks pretty cool, and a faster drop-in ld replacement is obviously extremely useful. But having no link step at all would be even faster. Why do native-code compilers write out temporary object files which then have to be linked together, anyway? Could we stop doing that and have compilers emit compiled translation units directly into a final executable file that the OS can execute directly --- a "zero-link" approach? I think we could ... in many cases.

The basic idea is to treat the final executable file (an ELF file, say) as a mutable data structure. When the compiler would emit an object file it instead allocates space in that executable file using a shared memory allocator, and writes the object code directly into that space. To make this tractable we'll assume we aren't going to generate optimal code in size or speed; we're going to build an executable that runs "pretty fast", for testing purposes (manual or automated).

An obvious problem is how to handle symbol resolution, i.e. what happens when the compiler emits code for a translation unit that uses symbol A from some other unit --- especially if that other unit hasn't been compiled yet? Here's an option for function symbols: when A is used for the first time, write a stub for A to the final binary and call that. When a definition for A is seen, patch the stub to jump to the definition. Internally this will mean maintaining a parallel hashtable of all undefined symbols that all compiler instances can share efficiently.
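The stub idea for function symbols can be sketched as a toy model in C. This is only an illustration under simplifying assumptions: a real implementation would patch machine code in the output file and share a concurrent hashtable between compiler instances, whereas here a stub is modelled as a function-pointer slot and the table is a small linear array. All names (`sym_use`, `sym_define`, `sym_call`, `MAX_SYMS`, `twice`) are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model: a "stub" is a slot holding a function pointer. In a real
   zero-link build it would be a small piece of code in the output file. */
typedef int (*fn_t)(int);

struct symbol {
    const char *name;
    fn_t target;   /* where the stub currently jumps */
    int defined;   /* has a definition been seen yet? */
};

#define MAX_SYMS 64
static struct symbol syms[MAX_SYMS];
static size_t nsyms;

/* Placeholder target for stubs whose symbol is not defined yet. */
static int unresolved(int arg) { (void)arg; return -1; }

/* Find the entry for `name`, creating an unresolved stub on first use. */
static struct symbol *sym_lookup(const char *name) {
    for (size_t i = 0; i < nsyms; i++)
        if (strcmp(syms[i].name, name) == 0)
            return &syms[i];
    assert(nsyms < MAX_SYMS);
    syms[nsyms] = (struct symbol){ name, unresolved, 0 };
    return &syms[nsyms++];
}

/* A translation unit uses `name`: it gets the stub to call through. */
struct symbol *sym_use(const char *name) { return sym_lookup(name); }

/* A translation unit defines `name`: patch the stub to jump to it. */
void sym_define(const char *name, fn_t definition) {
    struct symbol *s = sym_lookup(name);
    s->target = definition;
    s->defined = 1;
}

/* Calling through the stub, as emitted code would. */
int sym_call(const struct symbol *s, int arg) { return s->target(arg); }

/* Example definition, standing in for code from a later translation unit. */
int twice(int x) { return 2 * x; }
```

A caller that obtained its stub via `sym_use("twice")` before `sym_define("twice", twice)` ran does not need to be touched afterwards; only the stub changes, which is the point of the indirection.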

For data symbols, instead of using a stub, we can emit a pointer that can be patched to point to the data definition. For complex constants, we might need to defer initialization until load time or emit code to initialize them lazily.

To challenge the design a bit more, let's think about why object files are useful.

Sometimes compilers emit object files for a project which are then linked into multiple different output binaries. True, but it's more efficient to link them once into a single shared library which is then loaded by each of those output binaries, so projects should just do that.

Compilers use object files for incremental compilation: when a translation unit hasn't changed, its object file can be reused. We can capture the same benefits with the zero-link approach: reuse the final executable and keep around its symbol hashtable; when an object file changes, release the object file's space in the final executable, and allocate new space for the new object file.

You can combine multiple object files into a static library and the linker will select the object files that are needed to satisfy undefined symbols. In many projects this feature is only used for "system libraries"; a project's build system should avoid building project object files that will not be used in the final link. System libraries are usually dynamically linked for sharing reasons. When we really need to subset static libraries, we could link objects from those libraries into our final executable on-demand when we first see them being used.

Another issue is debuginfo (particularly important to me!) Supporting debuginfo would require extending DWARF5 debuginfo sections to allow their data to be scattered over the final executable.

There are lots of unresolved questions, enough that I wouldn't bet money on this actually being practical. But I think it's worth questioning the assumption that we have to have a link step for native binaries.

Update: Zig can do something like this.

Tuesday, 9 November 2021

Some Observations On The NZ CovidPass System

NZ's Ministry of Health has published a specification for the data in the CovidPass QR code. The spec looks pretty good to me; there's probably enough information for anyone to go ahead and implement a verifier app today, and it should only take a few days to put together a bare-bones verifier app. The spec also tells us a lot about how the system will work. I see some confusion/misinformation out there so here are some observations.

The main idea is very simple. You ask the Ministry (probably via the My Covid Record Web site, but possibly in other ways) to generate a statement of the form "<full-name>, <date-of-birth> is considered fully vaccinated". The Ministry computer system checks your records to ensure that they agree you're fully vaccinated, then generates that statement, digitally signs it with the Ministry's private key, and encodes the statement and the signature as a QR code. You can store that code on your phone, or print it out on a piece of paper. Later, you show that QR code to a gatekeeper who wants to check your vaccination status. They scan it with their own app, which decodes the statement, checks that the statement has a valid signature from the Ministry, and if it does, tells the gatekeeper "<full-name>, <date-of-birth> is considered fully vaccinated". To confirm that you're the person the statement is talking about, the gatekeeper will need to check your driver's license or other ID.

If you're not familiar with digital signatures, it's important to know that unlike pen-and-paper signatures, altering the statement invalidates the signature and only the Ministry of Health can generate new signatures that verifiers will accept. This is basic "public key cryptography" and generally very secure. To generate a fake vaccine certificate someone would have to break into Ministry computer systems, or feed false data into the Ministry database recording them as vaccinated, or find an egregious bug in the verification software. So of course you can easily copy someone else's statement, but if you change the details to match your own details, verifier apps will reject the new statement; a copied statement is only useful if you can pretend to be the person you copied it from.

For privacy: be aware that when you let someone view your QR code, you're telling them your full name and date of birth. They could record that information if they want to (though there may be legislation soon that restricts what they can do with that information). There is no need for a verifier app to notify anyone of these QR code scans, and I would expect the government's app to not notify or record scans. (Hopefully they'll release the source code like they do for CovidTracer.)

As I mentioned above, you don't need a phone to prove you're vaccinated; your code printed on a piece of paper will work fine. Verifiers will need a phone or similar device, but it doesn't have to be connected to the Internet to verify certificates (though the app will need to be updated once in a while). So DoC rangers could scan vaccination certificates at huts for example.

The data in the QR code currently doesn't record which vaccines you have had or when. In fact the Ministry could choose to issue these certificates to people who haven't even been vaccinated, if there's a good reason.

These signed statements have an expiration date on them, so periodically a particular QR code will expire. People using their phones will probably get the new one automatically but if you carry a printed one, you will need to print a new one every so often. This means the Ministry could change the criteria for issuing new certificates (e.g. to require a booster shot) in the future.

I like the way this has been designed. It could perhaps be a bit simpler (I'm not sure using W3C DID is worthwhile) but it's simple enough. Because the Ministry has committed to this spec, it will be pretty easy to integrate certificate verification into other apps. People might even be able to implement interesting enhancements like scanning a QR code alongside a driver's license to verify the name and DoB automatically with one action. Let's hope the Ministry's contractors can finish their backend work and verifier app before the end of this month!

Monday, 4 October 2021

How WHO Failed

I see that WHO is in contention for the Nobel Peace Prize. This is absurd. WHO got almost everything wrong early in the COVID19 pandemic and probably made the pandemic much worse. Here's a list:

  • As late as April 2020 WHO was advising countries against closing borders. (NZ eliminated COVID19 after closing borders in March against WHO advice. Later, WHO had the gall to pretend NZ eliminated COVID19 by following WHO advice.)
  • As late as June 2020 WHO was advising that asymptomatic spread of COVID19 was "rare". We now know that asymptomatic spread of COVID19 was and is a major factor in transmission.
  • Until June 2020 WHO was advising people to not wear masks unless they were sick with COVID19 or caring for someone sick, because it would be either ineffective or harmful. We now know that general mask-wearing is helpful at preventing transmission.
  • Until May 2021 WHO was advising that COVID19 was spread mainly by droplets. Now we know that it is spread mainly via aerosols.
  • As late as July 2020 WHO was advising that fomite transmission was a "likely mode of transmission" for COVID19. Fomite transmission has never been demonstrated as far as I know.
  • WHO delayed declaring COVID19 a pandemic until 11 March 2020, long after it was obviously a pandemic.

It would be unreasonable to expect WHO to get everything right given the unknowns of a new pandemic. However, we should expect WHO to get more right than wrong, and the above list shows they were actually worse than useless. These failures demand serious investigation and reform, not a Nobel Prize. If that investigation and reform doesn't happen, in the next pandemic, countries will be best off ignoring WHO advice.

Sadly, I see little sign of such criticism and reform happening. Instead, as this Nobel talk illustrates, mainstream opinion backs WHO's COVID19 response and is almost completely silent on WHO's appalling COVID19 track record. I'm not sure why this has happened, but I suspect it's another casualty of American partisan politics: "Trump attacked WHO, therefore reasonable people have to uncritically support WHO". It's maddening.

(Note for those who don't know me: I am an enthusiastic supporter of mainstream science and institutions, in general. WHO bungled this one.)

Sunday, 12 September 2021

Emulating AMD Approximate Arithmetic Instructions On Intel

Pernosco accepts uploaded rr recordings from customers and replays them with binary instrumentation to build a database of all program execution, to power an amazing debugging experience. Our infrastructure is Intel-based AWS instances. Some customers upload recordings made on AMD (Zen) machines; for these recordings to replay correctly on Intel machines, instruction execution needs to produce bit-identical results. This is almost always true, but I recently discovered that the approximate arithmetic instructions RSQRTSS, RCPSS and friends do not produce identical results on Zen vs Intel. Fortunately, since Pernosco replays with binary instrumentation, we can insert code to emulate the AMD behavior of these instructions. I just needed to figure out a good way to implement that emulation.

Reverse engineering AMD's exact algorithm and reimplementing it with Intel's instructions seemed like it would be a lot of work and tricky to reimplement correctly. Instead, we take advantage of the fact that RSQRT/RCP are unary operations on single-precision float values. This means there are only 2^32 possible inputs, so a lookup table of all results is not out of the question: in the worst case it would only be 16GB. Of course we would prefer something smaller, so I computed the full table of Intel and AMD results and looked for patterns we can exploit.

Since the Intel and AMD values should always be pretty close together, I computed the XOR of the Intel and AMD values. Storing just this table lets us convert from AMD to Intel and vice versa. It turns out that for RSQRT there are only 22 distinct difference values, and for RCP only 17. This means we can store just one byte per table entry, an index into a secondary lookup table that gives the actual difference value. Another key observation is that the difference value depends only on the upper 21 bits of the input. (I suspect RSQRT/RCP completely ignore the bottom 11 bits of the input mantissa, but I haven't verified that.) Thus the main table can be stored in just 2^21 bytes, i.e. 2MB, and of course we need one table for RSQRT and one for RCP, so 4MB total, which is acceptable. With deeper analysis we might find more patterns we can use to compress the table further, but this is already good enough for our purposes.
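A minimal C sketch of that two-level lookup follows. It is not Pernosco's actual code: the struct layout and the names `approx_table` and `convert_result` are assumptions for illustration, and the table contents would have to come from the measured Intel and AMD results.

```c
#include <assert.h>
#include <stdint.h>

/* Main table: one byte per value of the upper 21 bits of the input,
   i.e. 2^21 entries = 2MB. Each byte indexes the secondary table. */
#define INDEX_BITS 21
#define INDEX_COUNT (1u << INDEX_BITS)

struct approx_table {
    uint8_t index[INDEX_COUNT];  /* placeholder: filled from measured data */
    uint32_t diff[32];           /* distinct XOR differences (22 for RSQRT) */
    unsigned diff_count;         /* number of valid entries in diff[] */
};

/* Convert one vendor's result into the other's. Because the stored value
   is an XOR difference, the same function works in both directions. */
uint32_t convert_result(const struct approx_table *t,
                        uint32_t input_bits, uint32_t result_bits) {
    uint8_t i = t->index[input_bits >> (32 - INDEX_BITS)];
    assert(i < t->diff_count);
    return result_bits ^ t->diff[i];
}
```

For example, for the input 256.0f (bit pattern 0x43800000), Skylake's RSQRTSS result 0x3d7ff000 and Rome's result 0x3d7ff800 differ by the XOR value 0x00000800, so the table entry for that input would select a `diff` slot holding 0x00000800.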

That part was pretty easy. It turned out that most of the work was actually implementing the instrumentation. The problem is that each instruction comes in five distinct flavours. For RSQRT, there is:

  • RSQRTSS: compute RSQRT of the bottom 32 bits of the input operand, store the result in the bottom 32 bits of the output register, leaving all other output register bits unchanged
  • RSQRTPS: compute RSQRT of the bottom four 32-bit lanes of the input operand, store the results in the bottom four 32-bit lanes of the output register, leaving all other output register bits unchanged
  • VRSQRTSS (two input operands): compute RSQRT of the bottom 32 bits of the second input operand, store the result in the bottom 32 bits of the output register, copy bits 32-127 from the first input register to the output register, zero all bits >= 128 of the output register (seriously, Intel?)
  • VRSQRTPS, 128-bit version: compute RSQRT of the bottom four 32-bit lanes of the input operand, store the results in the bottom four 32-bit lanes of the output register, zero all bits >= 128 of the output register
  • VRSQRTPS, 256-bit version: compute RSQRT of the eight 32-bit lanes of the input operand, store the results in the eight 32-bit lanes of the output register
In each of these instructions the primary input operand can be a memory load operand instead of a register.

So our generated instrumentation has to perform one table lookup per lane and also handle the correct effects on other bits of the output register. If we really cared about performance we'd probably want to vectorize the table lookups, but that's hard and the performance impact is unlikely to matter in our case, so I kept it simple with serial logic using general purpose registers.
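To make the per-flavour bookkeeping concrete, here is a small C model of the effects on the output register. It is a sketch, not the generated instrumentation: registers are modelled as eight 32-bit lanes (so only bits 0-255 are covered), memory operands are not modelled, and the per-lane emulated result is supplied as a callback. The names `emulate`, `lane_fn` and `demo_lane` are invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum flavour { RSQRTSS, RSQRTPS, VRSQRTSS, VRSQRTPS_128, VRSQRTPS_256 };

/* Stand-in for "emulated result for one 32-bit lane" (for instance the
   native result corrected via a lookup table). Supplied by the caller. */
typedef uint32_t (*lane_fn)(uint32_t input_bits);

/* Registers are modelled as eight 32-bit lanes (256 bits).
   `src1` is only read by VRSQRTSS; `src` is the primary input operand. */
void emulate(enum flavour f, uint32_t dst[8], const uint32_t src1[8],
             const uint32_t src[8], lane_fn lane) {
    uint32_t out[8];
    memcpy(out, dst, sizeof out);  /* default: output bits unchanged */
    switch (f) {
    case RSQRTSS:
        out[0] = lane(src[0]);
        break;
    case RSQRTPS:
        for (int i = 0; i < 4; i++) out[i] = lane(src[i]);
        break;
    case VRSQRTSS:
        out[0] = lane(src[0]);
        memcpy(&out[1], &src1[1], 3 * sizeof out[0]);  /* bits 32-127 */
        memset(&out[4], 0, 4 * sizeof out[0]);         /* bits >= 128 */
        break;
    case VRSQRTPS_128:
        for (int i = 0; i < 4; i++) out[i] = lane(src[i]);
        memset(&out[4], 0, 4 * sizeof out[0]);         /* bits >= 128 */
        break;
    case VRSQRTPS_256:
        for (int i = 0; i < 8; i++) out[i] = lane(src[i]);
        break;
    default:
        assert(!"unknown flavour");
    }
    memcpy(dst, out, sizeof out);  /* safe even if dst aliases an input */
}

/* Dummy lane function for demonstration only; it is NOT an rsqrt. */
uint32_t demo_lane(uint32_t bits) { return ~bits; }
```

Building the result in a temporary and copying it back at the end keeps the model correct when the output register is also one of the inputs, which is common in real code.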

Anyway it's working well now and Pernosco is able to process AMD submissions using these instructions, so go ahead and send us your recordings to debug! (The logic also handles emulating Intel semantics if you happen to be running Pernosco on-prem on Zen hardware.) Tracing replay divergence back to RSQRTSS (through many long def-use chains) was extremely painful so I wrote a fairly good automated test suite for this work; I want to never again have to debug divergence caused by this.

Thursday, 9 September 2021

rr Trace Portability: Diverging Behavior of RSQRTSS in AMD vs Intel

When we added Zen support to rr, it was an open question whether it would be possible to reliably replay Zen recordings on Intel CPUs or vice versa. It wasn't clear whether CPU instructions normally used by applications had bit-identical semantics across vendors. Over time the news was good: replaying Zen recordings on Intel generally works — if you trap and emulate CPUID to return the Zen results, and work around a difference in x87 FIP handling. So Pernosco has been able to handle submissions from Zen users.

Unfortunately, today I discovered a new difference between AMD and Intel: the RSQRTSS instruction. Perhaps this is unsurprising, since it is described as: "computes an approximate reciprocal of the square root of the low single-precision floating-point value in the source operand" (emphasis mine). A simple test program:

#include <stdio.h>
#include <string.h>
int main(void) {
  float in = 256;
  float out;
  unsigned int raw;
  asm ("rsqrtss %1,%0" : "=x"(out) : "x"(in));
  memcpy(&raw, &out, 4);
  printf("out = %x, float = %f\n", raw, out);
  return 0;
}
On Intel Skylake I get
out = 3d7ff000, float = 0.062485
On AMD Rome I get
out = 3d7ff800, float = 0.062492
Intel's result just stays within the documented 1.5 × 2^-12 relative error bound. (Seems unfortunate given that the exact reciprocal square root of 256 is so easily computed to 0.0625, but whatever...)
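The error-bound claim can be checked with a few lines of C that decode the two bit patterns printed above and compare them with the exact answer 0.0625. The helper names `float_from_bits` and `relative_error` are just for this sketch, which assumes `float` is a 32-bit IEEE 754 value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as an IEEE 754 single-precision float. */
float float_from_bits(uint32_t bits) {
    float f;
    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Relative error of an approximate result against the exact value. */
double relative_error(uint32_t approx_bits, double exact) {
    double diff = (double)float_from_bits(approx_bits) - exact;
    double rel = diff / exact;
    return rel < 0 ? -rel : rel;
}
```

Calling `relative_error(0x3d7ff000, 0.0625)` for the Skylake result gives exactly 2^-12, and `relative_error(0x3d7ff800, 0.0625)` for the Rome result gives exactly 2^-13; both are below 1.5 × 2^-12, with the AMD value being one step closer to the true answer.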

The net effect of this is that rr recordings captured on Zen that use RSQRTSS may not replay correctly on Intel machines. The instructions will execute fine but it's possible that the slight differences in results may later lead to diverging control flow, which breaks replay of the recording. We have seen this in practice with a Pernosco user.

I have some ideas about how to fix this for Pernosco. If they work that'll be fodder for another post.

Update: For what it's worth, the same issue exists with RCPSS (and presumably the SIMD versions (V)RCPPS and (V)RSQRTPS). Intel also has a number of new approximate-arithmetic instructions in AVX512, but has published software reference implementations of those, so hopefully if AMD does ever implement them Zen will match those. I'm not (yet) aware of any other non-AVX512 "approximate" instructions.

Saturday, 19 June 2021

Spectre Mitigations Murder *Userspace* Performance In The Presence Of Frequent Syscalls

I just made a performance improvement to the (single-threaded) rr sources command to cache the results of access system calls checking for directory existence. It's a simple and very effective optimization on my Skylake, Linux 5.12 laptop:

[roc@localhost code]$ time rr sources ~/pernosco/main/test-tmp/basics-demo >& ~/tmp/output2
real	3m19.648s
user	1m9.157s
sys	2m9.416s

[roc@localhost code]$ time rr sources  ~/pernosco/main/test-tmp/basics-demo >& ~/tmp/output2
real	0m36.160s
user	0m36.009s
sys	0m0.053s

One interesting thing is that we cut the userspace execution time in half even though we're executing more userspace instructions than before. Frequent system calls actually slow down code execution in userspace. I assumed this was at least partly due to Spectre mitigations so I turned those off (with mitigations=off) and reran the test:

[roc@localhost code]$ time rr sources ~/pernosco/main/test-tmp/basics-demo >& ~/tmp/output2
real	2m5.776s
user	0m33.052s
sys	1m32.280s

[roc@localhost code]$ time rr sources  ~/pernosco/main/test-tmp/basics-demo >& ~/tmp/output2
real	0m33.422s
user	0m32.934s
sys	0m0.110s
So those Spectre mitigations make pre-optimization userspace run 2x slower (due to cache and TLB flushes I guess) and the whole workload overall 1.6x slower! Before Spectre mitigations, those system calls hardly slowed down userspace execution at all.
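For readers wondering what the optimization at the top of this post amounts to, here is a rough C sketch of caching existence checks so that each distinct path costs at most one access() system call. It is not rr's actual code (rr is written in C++); the names `cached_exists` and `cached_exists_syscalls`, the bucket count and the hash function are all assumptions made for the example, and the cache assumes the filesystem does not change during the run.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct cache_entry {
    char *path;
    int exists;
    struct cache_entry *next;
};

#define BUCKETS 1024
static struct cache_entry *buckets[BUCKETS];
static unsigned long syscall_count;  /* for demonstration only */

/* Simple string hash (djb2 style), reduced to a bucket index. */
static unsigned hash_path(const char *s) {
    unsigned h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h % BUCKETS;
}

/* Returns nonzero if `path` exists, calling access() at most once per
   distinct path. */
int cached_exists(const char *path) {
    unsigned b = hash_path(path);
    for (struct cache_entry *e = buckets[b]; e; e = e->next)
        if (strcmp(e->path, path) == 0)
            return e->exists;       /* cache hit: no system call */

    struct cache_entry *e = malloc(sizeof *e);
    assert(e);
    size_t len = strlen(path) + 1;
    e->path = malloc(len);
    assert(e->path);
    memcpy(e->path, path, len);
    syscall_count++;
    e->exists = access(path, F_OK) == 0;
    e->next = buckets[b];
    buckets[b] = e;
    return e->exists;
}

/* How many access() calls have actually been made so far. */
unsigned long cached_exists_syscalls(void) { return syscall_count; }
```

The cache adds userspace work (hashing and string comparison) on every call, which matches the observation above: more userspace instructions, yet much less time overall once the system calls are gone.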

Monday, 7 June 2021

Tama Lakes Winter Tramp 2021

This weekend my kids and I went down to Tongariro National Park for a winter tramp, repeating a similar trip in 2019. Overall it was great but apparently word got out and Waihohonu Hut was a lot busier than last time.

The weather forecast wasn't great so it was just me and my kids on this trip. The first sign of busyness was that the car park off the Desert Road was about full. We got to the hut about 2:15pm and were able to claim the last bunks, but people kept arriving. I guess there were probably more than 50 people there on Saturday night, 30+ squeezed into bunks and a lot of people who had to sleep on the floor of the main room. It's such a huge hut that this was still tolerable and we had a fun afternoon and evening. We got some views of the lower flanks of Mt Ruapehu on the walk in, and a good view of Mt Ngauruhoe topped by cloud.

Sunday was drizzly with low cloud, as forecast. One of my kids stayed at the hut to study for exams. The other one and I walked to the Tama Lakes via an off-track route I heard about years ago and had been hoping to try out ever since. I don't have much experience with off-track walking and the conditions weren't ideal, but: they weren't bad, we were carrying all relevant gear, my son and I are reasonably fit and fast walkers, we left at 8am so had plenty of time, and the return trip from the lakes was via the Great Walk track (easy). It worked out well and we had a lot of fun, though the views were obscured by cloud — on a good day they would be magnificent. I'd like to do it again but only with a small group and not often; I don't want the environment to be damaged by the route becoming popular.

We got back to the hut about 2:20pm after some fast walking and found it already pretty full, again. In fact it looked like it would be even fuller than on Saturday night, so my kids and I decided to just walk out to the car and come home a day early. That seemed like the right decision for us, but also freed up a little bit of space in the hut.

So unfortunately it seems that the Queen's Birthday trip to Waihohonu Hut will not be such a great option in the future; given it was barely tolerable with a poor weather forecast, it probably would be really intolerable had the forecast been good.

Wednesday, 19 May 2021

Forward Compatibility Of rr Recordings

Since 2017 rr has maintained backward compatibility for recordings, i.e. new rr versions can replay any recording made by any earlier rr version back to 5.0. When we set that goal, it wasn't clear for how long we'd be able to sustain it, but so far so good!

However, we have said nothing about forward compatibility (whether old rr versions are able to replay recordings produced by new rr versions), and in practice we have broken it many times. That's generally OK. However, when we do break forward compatibility and an old rr tries to replay an incompatible recording, it often just crashes mysteriously. This is suboptimal.

So, I have added "forward compatibility version checking" to rr. rr builds have a forward compatibility version; each recording is stamped with the forward compatibility version it was created with; and rr will refuse to replay a recording with a later forward compatibility version than the rr build supports. When we make an rr change that means old rrs can no longer replay new rr recordings, we'll bump the forward compatibility version in the source.
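The check itself is simple. As an illustration only (this is not rr's source, and `trace_header`, `stamp_recording`, `can_replay` and the version value 3 are all hypothetical), it amounts to something like:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical value; bumped whenever a change stops older builds from
   replaying new recordings. */
#define BUILD_FORWARD_COMPAT_VERSION 3u

struct trace_header {
    uint32_t forward_compat_version;  /* stamped at record time */
};

/* Stamp a new recording with this build's version. */
void stamp_recording(struct trace_header *h) {
    assert(h);
    h->forward_compat_version = BUILD_FORWARD_COMPAT_VERSION;
}

/* Returns 1 if this build may replay the recording, 0 if it was made by
   a newer, incompatible build and replay should be refused up front. */
int can_replay(const struct trace_header *h) {
    assert(h);
    return h->forward_compat_version <= BUILD_FORWARD_COMPAT_VERSION;
}
```

Refusing early with a clear message replaces the mysterious crash with an error the user can act on, namely upgrading rr.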

Note that rrs built before this change don't have the check, will continue to merrily try to replay recordings they can't replay, and die in exciting ways.

Tuesday, 4 May 2021

Lake Waikaremoana 2021

Last week I did the Lake Waikaremoana Great Walk again, in Te Urewera National Park. Last time it was just me and my kids, and that was our first "Great Walk" and first more-than-one-night tramp, so it has special memories for me (hard to believe it was only seven years ago!). Since then my tramping group has grown; this time I was with one of my kids and ten friends. The weather was excellent and once again, I think everyone had a great time. I certainly did! I really thank God for all of it: the weather; the location; the huts and tracks and the people who maintain them; my tramping friends, whom I love so much; and the time I get to spend with them.

We did the track in the clockwise direction, starting from Hopuruahine at the north end of the lake, walking the west edge of the lake to Onepoto in the south. Most people go the other way but I like this direction because it leaves the best views for last, along the Panekiri Bluff. We took four days, even though it's definitely doable in three, because I like spending time around huts, and the only way to spread the walk evenly over three days is to skip staying at Panekiri Hut, which has the best views.

I heard from a local that this summer was the busiest ever at Lake Waikaremoana, which makes sense because NZers haven't wanted to leave the country. We were right at the end of the first-term school holidays and the track was reasonably busy (the most popular huts (Marauiti Hut, Waiopaoa Hut) were full, but the others weren't) but by no means crowded. We would see a couple of dozen other people on the track each day and fewer people at the huts we stayed at.

So six of our group stayed at Marauiti Hut the first night, and four at Waiharuru Hut (because Marauiti Hut had filled up before they booked). On the second day one of my friends and his six-year-old son (on his first overnight tramp) were dropped by water taxi at the beach by Maraunui campsite; that worked well. Unfortunately one of our group had a leg problem and had to take the same water taxi out. That night we all stayed at Waiopaoa Hut (after most of us did the side trip to Korokoro Falls). On the third day we raced up to Panekiri Hut in three hours and had the whole afternoon there — a bit cold, but amazing views when the cloud lifted. Many games of Bang were played. The last day was pretty slow for various reasons so we didn't exit until about 1:30pm, but it was great to be in the extraordinary "goblin forest" along the Panekiri Bluff, with amazing views over the lake on a perfectly clear day.

For this trip I ran my dehydrator four times — lots of fruit chips, and half a kilogram of beef jerky. That was all well received. We also had lots of other usual snacks (chocolate, muesli bars). As a result we didn't eat many of the cracker boxes I had brought for lunch; we'll cut down on those next time. We did nearly run out of gas and actually run out of toilet paper (at the end), so I need to increase our budget for those. We rehydrated my dehydrated carrots by draining pasta water over them; they didn't grow much, but they took on the texture of pickled vegetables and were popular, so we'll do that again.

With such a big group it's easy to just talk to each other, so I need to make a special effort to talk to other people at the huts. We did meet a few interesting characters. One group at Panekiri Hut, who had just started the track, needed their car moved from Onepoto to Hopuruahine so we did that for them after we finished the track. I hope they picked it up OK!

In my opinion, Lake Waikaremoana isn't the best of the Great Walks in any particular dimension, but it's certainly very good and well worth doing if you've done the others. Kia ora Ngai Tūhoe!

Tuesday, 27 April 2021

Print Debugging Should Go Away

This is based on a comment I left on HN.

Many people prefer print debugging over interactive debugging tools. Some of them seem to have concluded that the superiority of print debugging is some kind of eternal natural law. It isn't: almost all the reasons people use print debugging can be overcome by improving debuggers — and to some extent already have been. (In the words of William Gibson, the future is already here, it's just not evenly distributed yet). The superiority of print debugging is contingent and, for most developers, it will end at some point (or it has already ended and they don't know it.)

Record-and-replay debuggers like rr (disclaimer: I initiated it and help maintain it), Undo, TTD, etc. address one set of problems with interactive debuggers. You don't have to stop the program to debug it; you can record a complete run, and debug it later. You can record the program many times until it fails and debug only the execution that failed until you understand the failure. You can record the program running on a far-off machine, extract the recording and debug it wherever you want.

Pernosco (disclaimer: also my baby) and other omniscient debuggers go much further. Fans of print debugging observe that "step debuggers" (even record-and-replay step debuggers, like rr) only show you one point in time, and this is limiting. They are absolutely right. Omniscient debuggers have fast access to all program states and can show you at a glance how program state changes over time. One of our primary goals in Pernosco (mostly achieved, I think) is that developers should never feel the need to "step" to build up a mental picture of how program state evolves over time. One way we do this is by supporting a form of "interactive print debugging".

Once you buy into omniscient debugging a world of riches opens to you. For example omniscient debuggers like Pernosco let you track dataflow backwards in time, a debugging superpower print debugging can't touch.

There are many reasons why print debugging is still the best option for many developers. rr, Pernosco and similar tools can't even be used at all in many contexts. However, most of the limitations of these tools (programming languages, operating systems, hardware platforms, overhead) could be mitigated with sufficient investment in engineering work and a modicum of support from platform vendors. It's important to keep in mind that the level of investment in these tools to date has been incredibly low, basically just a handful of startups and destitute open source projects. If the software industry took debugging seriously — instead of just grumbling about the tools and reverting to print debugging (or, at best, building a polished implementation of the features debuggers have had since the 1980s) — and invested accordingly we could make enormous strides, and not many people would feel the need to resort to print debugging.

Friday, 16 April 2021

Demoing The Pernosco Omniscient Debugger: Debugging Crashes In Node.js And GDB

This post was written by Pernosco co-founder Kyle Huey.

Traditional debugging forms a hypothesis about what is going wrong with the program, gathers evidence to accept or reject that hypothesis, and repeats until the root cause of the bug is found. This process is time-consuming, and formulating useful hypotheses often requires deep understanding of the software being debugged. With the Pernosco omniscient debugger there’s no need to speculate about what might have happened; instead, an engineer can ask what actually did happen. This radically simplifies the debugging process, enabling much faster progress while requiring much less domain expertise.

To demonstrate the power of this approach we have two examples from well-known and complex software projects. The first is an intermittently crashing Node.js test. From a simple stack walk it is easy to see that the proximate cause of the crash is calling a member function with a NULL `this` pointer. The next logical step is to determine why that pointer is NULL. In a traditional debugging approach, this requires pre-existing familiarity with the codebase, or reading code and looking for places where the value of this pointer could originate from. Then an experiment, either poking around in an interactive debugger or adding relevant logging statements, must be run to see where the NULL pointer originates from. And because this test fails intermittently, the engineer has to hope that the issue can be reproduced again and that this experiment doesn’t disturb the program’s behavior so much that the bug vanishes.

In the Pernosco omniscient debugger, the engineer just has to click on the NULL value. With all program state available at all points in time, the Pernosco omniscient debugger can track this value back to its logical origin with no guesswork on the part of the user. We are immediately taken backwards to the point where the connection in question received an EOF and set this pointer to NULL. You can read the full debugging transcript here.
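To make the idea of tracking a value back to its origin concrete, here is a minimal Python sketch over a toy trace. It is not how Pernosco is implemented: the `Write` record, the location names (`conn.stream`, `handle`, `this`) and the trace contents are all invented for illustration, and a real debugger works on recorded machine state rather than a hand-written list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Write:
    """One recorded write in a toy execution trace (hypothetical model)."""
    time: int               # position in the recorded execution
    dest: str               # location that was written
    value: object           # value stored (None stands in for NULL)
    source: Optional[str]   # location the value was copied from, or None if produced here
    note: str = ""


def last_write(trace: List[Write], location: str, time: int) -> Optional[Write]:
    """Return the latest write to `location` at or before `time`, if any."""
    candidates = [w for w in trace if w.dest == location and w.time <= time]
    return max(candidates, key=lambda w: w.time, default=None)


def trace_origin(trace: List[Write], location: str, time: int) -> List[Write]:
    """Follow copies backwards in time; the last element is the value's origin."""
    chain = []
    while True:
        w = last_write(trace, location, time)
        if w is None:
            return chain
        chain.append(w)
        if w.source is None:
            return chain
        # The copied value must have been written to the source strictly earlier.
        location, time = w.source, w.time - 1


# Invented trace, loosely shaped like the crash described above.
TRACE = [
    Write(1, "conn.stream", "0x5581", None, "stream allocated"),
    Write(2, "handle", "0x5581", "conn.stream", "copied into local"),
    Write(3, "conn.stream", None, None, "EOF received, pointer cleared"),
    Write(4, "handle", None, "conn.stream", "copied into local"),
    Write(5, "this", None, "handle", "passed as receiver"),
]

for w in trace_origin(TRACE, "this", 5):
    print(w.time, w.dest, w.note)
```

Because every write is available after the fact, the walk needs no re-running of the program and no guess about where to put a breakpoint; it just follows the copies backwards until it reaches a write that produced the value.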

Similarly, with a crash in gdb, the proximate cause of the crash is immediately obvious from a stack walk: the program has jumped through a bad vtable pointer to NULL. Figuring out why the vtable address has been corrupted is not trivial with traditional methods: there are entire tools such as ASAN (which requires recompilation) or Valgrind (which is very slow) that have been designed to find and diagnose memory corruption bugs like this. But in the Pernosco omniscient debugger a click on the object’s pointer takes the user to where it was assigned into the global variable of interest, and another click on the value of the vtable pointer takes the user to where the vtable pointer was erroneously overwritten. Walk through the complete debugging session here.

As demonstrated in the examples above, the Pernosco omniscient debugger makes it easy to track down even classes of bugs that are notoriously difficult to work with such as race conditions or memory corruption errors. Try out Pernosco individual accounts or on-premises today!

Wednesday, 14 April 2021

Visualizing Control Flow In Pernosco

In traditional debuggers, developers often single-step through the execution of a function to discover its control flow. One of Pernosco's main themes is avoiding single-stepping by visualizing state over time "all at once". Therefore, presenting control flow through a function "at a glance" is an important Pernosco feature and we've recently made significant improvements in this area.

This is a surprisingly hard problem. Pernosco records control flow at the instruction level. Compiler-generated debuginfo maps instructions to source lines, but lacks other potentially useful information such as the static control flow graph. We think developers want to understand control flow in the context of their source code (so approaches taken by, e.g., reverse engineering tools are not optimal for Pernosco). However, mapping potentially complex control flow onto the simple top-to-bottom source code view is inherently lossy or confusing or both.

For functions without loops there is a simple, obvious and good solution: highlight the lines executed, and let the user jump in time to that line's execution when clicked on. In the example below, we can see immediately where the function took an early exit.

To handle loops, Pernosco builds a dynamic control flow graph, which is actually a tree where leaf nodes are the execution of source lines, non-leaf nodes are the execution of a loop iteration and the root node is the execution of the function itself. Constructing a dynamic CFG is surprisingly non-trivial (especially in the presence of optimized code and large functions with long executions), but outside the scope of this post. Then, given a "current moment" during the function call, we identify which loop iterations are "current", and highlight the lines executed by those loop iterations; clicking on these highlights jumps directly to the appropriate point in time. Any lines executed during this function call but not in a current loop iteration are highlighted differently; clicking on these highlights shows all executions of that line in that function call. Hover text explains what is going on.
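The tree and the "current iteration" rule can be sketched in a few lines of Python. This is only an illustration under assumptions of mine, not Pernosco's data structure: the `Node` fields, the use of integer "moments", and the rule that an iteration is current when its time span contains the current moment are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Node:
    kind: str    # "function" (root), "iteration" (non-leaf) or "line" (leaf)
    start: int   # first moment covered by this node (inclusive)
    end: int     # last moment covered by this node (inclusive)
    line: int = 0                                  # source line, for "line" leaves only
    children: List["Node"] = field(default_factory=list)


def highlight(root: Node, now: int) -> Tuple[Dict[int, int], Set[int]]:
    """Return (current, other).

    current maps each line executed in the function body or in a current
    loop iteration to the moment a click would jump to; other holds lines
    executed only in iterations that are not current.
    """
    current: Dict[int, int] = {}
    other: Set[int] = set()

    def visit(node: Node, in_current: bool) -> None:
        for child in node.children:
            if child.kind == "line":
                if in_current:
                    current.setdefault(child.line, child.start)
                else:
                    other.add(child.line)
            else:
                # An iteration is current only if all its ancestors are too.
                visit(child, in_current and child.start <= now <= child.end)

    visit(root, root.start <= now <= root.end)
    return current, other - set(current)


# Invented call: line 10, a loop run twice (the second pass skips line 13), line 15.
root = Node("function", 0, 6, children=[
    Node("line", 0, 0, line=10),
    Node("iteration", 1, 3, children=[
        Node("line", 1, 1, line=11),
        Node("line", 2, 2, line=12),
        Node("line", 3, 3, line=13),
    ]),
    Node("iteration", 4, 5, children=[
        Node("line", 4, 4, line=11),
        Node("line", 5, 5, line=12),
    ]),
    Node("line", 6, 6, line=15),
])

current, other = highlight(root, now=5)
print(current)  # lines in the function body and the second iteration
print(other)    # line 13, executed only in the first iteration
```

With the current moment inside the second iteration, lines 11 and 12 jump to that iteration's executions, while line 13 falls into the "highlighted differently" group because it ran only in the first iteration.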

This presentation is still lossy — for example control-flow edges are not visible. However, user feedback has been very positive.

Try out Pernosco individual accounts or on-premises today!

Thursday, 4 March 2021

On-Premises Pernosco Now Available; Reflecting On Application Confinement

In November we announced Pernosco availability for individual developers via our debugging-as-a-service platform. That product requires developers to share binary code, debug symbols and test data with us (but not necessarily source code), and we recognize that many potential customers are not comfortable with that. Therefore we are making Pernosco available to run on-premises. Contact us for a free trial! On-prem pricing is negotiable, but we prefer to charge a fixed amount per month for unlimited usage by a given team. Keep in mind Pernosco's current limitations: applications that work with rr (Linux, x86-64), C/C++/Rust/Ada/V8.

An on-premises customer says:

One of the key takeaways for me in our evaluation is that users keep coming back to pernosco without me pushing for it, and really like it — I have rarely seen such a high adoption rate in a new tool.

To deploy Pernosco on-premises we package it into two containers: the database builder and the application server. You collect an rr trace, run the database builder, then run the application server to export a Web interface to the database. Both steps, especially database building, require a reasonably powerful machine; our hosted service uses c5d.9xlarge instances for database building and a smaller shared i3.4xlarge instance for the application servers. If you want to run your own private shared service you are responsible for any authentication, authorization, public routing to the Web interface, wrangling database storage, etc.

To help our customers feel comfortable using Pernosco on-premises, all our closed-source code is bundled into those two containers, and we provide an open-source Python wrapper which runs those containers with sandboxing to confine them. Our goal is that you should not have to trust anything other than that easily-audited wrapper script (and rr of course, which is open source, but not so easily audited, though lots of people have looked into it). Our database builder container simply reads one subtree of the filesystem and drops a database into it, and the application server reads that subtree and exposes a Web server on a private subnet. Both containers should be incapable of leaking data to the outside world, even if the contents were malicious.

This seems like a natural approach to deploying closed-source software; no-one wants to be the next SolarWinds. In fact even if you receive the entire product's source code from your vendor, you still want confinement because you're not really going to audit it effectively. Therefore I'm surprised to find that this use-case doesn't seem to be well-supported by infrastructure!

For example, Docker provides an embedded DNS server to containers that cannot be disabled. We don't need it, and in theory crafted DNS queries could leak information to the outside world, so I'd like to disable it. But apparently our confinement goal is not considered a valid use-case by Docker. (I guess you can work around this by limiting the DNS service on the container host somehow, but that sucks more.)

Another interesting issue is preventing leaks through our Web application. For example when a user loads our application, it could send messages to public Internet hosts that leak information. We try to prevent this by having our application send CSP headers that deny all non-same-origin access. The problem is, if our application was malicious it could simply change the CSP it sends. Preventing that would seem to require a proxy outside the container that checks or modifies the CSP headers we send.
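As a rough illustration of what such a checking proxy might do, here is a small Python sketch that inspects a `Content-Security-Policy` header value and accepts it only if every source list is restricted to `'self'` or `'none'`. It is a simplification built on my own assumptions: the function names are invented, it only looks at `*-src`, `form-action` and `base-uri` directives, and it is deliberately conservative rather than a complete CSP parser. A real proxy would also have to handle multiple headers, `<meta>` policies and other leak channels.

```python
from typing import Dict, List

# Sources treated as safe for confinement (assumption: same-origin or nothing).
ALLOWED_SOURCES = {"'self'", "'none'"}


def parse_csp(header: str) -> Dict[str, List[str]]:
    """Split a CSP header value into {directive_name: [source, ...]}."""
    policy: Dict[str, List[str]] = {}
    for part in header.split(";"):
        tokens = part.split()
        if not tokens:
            continue
        # Keep the first occurrence of a directive; browsers ignore later duplicates.
        policy.setdefault(tokens[0].lower(), tokens[1:])
    return policy


def is_same_origin_only(header: str) -> bool:
    """Conservatively decide whether the policy denies all non-same-origin access."""
    policy = parse_csp(header)
    if "default-src" not in policy:
        return False  # without a fallback, unlisted fetch types are unrestricted
    for name, sources in policy.items():
        if name.endswith("-src") or name in ("form-action", "base-uri"):
            # Reject empty lists too, rather than trying to interpret them.
            if not sources or not set(sources) <= ALLOWED_SOURCES:
                return False
    return True


print(is_same_origin_only("default-src 'self'"))
print(is_same_origin_only("default-src 'self'; img-src https://evil.example"))
```

A proxy sitting outside the container could apply a check like this to each response and refuse to forward anything that fails, or simply overwrite the header with a known-good policy so the application's own header never matters.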

I'm surprised these problems haven't already been solved, or at least that the solutions aren't widely known and deployed.

What Would Jesus Do ... About Vaccination?

My thoughts about COVID19 vaccination as a Christian are pretty simple (assuming the Pfizer vaccine or something similar):

  • Is it safe for me?
    Yes. I'm not known to be allergic to vaccine components or immunologically compromised, and the safety data is solid.
  • If I get exposed to COVID19, will the vaccine stop me from getting infected and passing it on to people around me?
    Yes, almost certainly per the data.
  • Knowing this, if I chose to not get vaccinated, caught COVID19, and infected people around me with COVID19, would I have disobeyed Jesus' command to love my neighbour?
  • Is there any other way I can ensure I won't catch COVID19 and infect others?
  • Are there any countervailing ethical issues with taking the vaccine?
    No. None of the vaccines on offer are closely connected to abortion.

Thus it is pretty clearly God's will for me to be vaccinated.

Monday, 22 February 2021

Mercer Bay

West of Auckland, between Karekare beach and Piha beach, there is a small bay called Mercer Bay. It's surrounded by cliffs and there is no marked track to get down to it, but I've known for a long time that there is an unmarked route down the cliff. Last year I met someone who knows the route and she kindly agreed to guide a group of us down it on Saturday morning.

We started from Karekare up the Cowan track to the turnoff. The cliffs look precipitous and I'm not good with heights, but the route is actually not very difficult, and we got down quite easily. Mercer Bay is very pretty. At dead low tide you can walk around the north end of the beach to an inlet with three large sea-caves. The largest goes a significant distance into the hill to a blowhole where the roof has collapsed. It's incredibly impressive.

Climbing back up the cliff is the sensible but strenuous way out, but at dead low tide you can walk south around the rocks back to Karekare beach, and that's what we did. There is some nontrivial (for me) climbing involved — a few narrow ledges, a few overhangs — but the barnacle-covered conglomerate rocks provide excellent footholds and handholds if you have gloves or don't mind a few scrapes. Unfortunately we did not time our walk well and the tide was coming in, so we had to hurry. In a few places we had to cross little inlets with waves surging in and out, and later on we just plunged in when it wasn't too deep. I don't know how close we came to being trapped, but I certainly would have preferred a larger safety margin. Lesson learned! (I was carrying my personal locator beacon and we may even have had cellphone coverage, so I think we would have been rescued had we taken refuge on the higher rocks, but how embarrassing!)

Anyway, now that I know the way I'm already looking forward to going back :-). If we go around the rocks again I'll make sure we start well before low tide.

Saturday, 23 January 2021


Dehydrated food is great for tramping trips (saves weight and is less perishable) but the variety and cost in our local shops is not great, so although I don't like to accumulate many gadgets I bought a dehydrator over Christmas — a Biochef "Arizona" 6-tray unit. I've only used it a few times so far, but I'm very happy with it.

I've dehydrated sliced fruits: apples, pears, peaches, bananas and plums. Different people prefer different fruits but all of them have been well received. The pears are so sweet I feel guilty eating them. The unit can dehydrate at least 12 apples at a time, taking about 8 hours at 63C.

I've made beef and lamb jerky based on this recipe (using some Jack Daniels BBQ sauce I had around instead of "liquid smoke"). The unit can process about 2kg of meat in one run, taking about 6 hours at 70C with 6-7mm thick slices of meat. Longer dehydration times or smaller slices make the jerky crunchy, which is fine but not to everyone's taste. (The strips shown in the photo below are really too thin, because I bought a "stir fry" package from the supermarket with thinner strips than I expected, but they still taste good.)

I was surprised by how easy the process is. I thought it would take some practice to get good results but just about everything I've tried has turned out well.

There are a lot more experiments I want to do. In particular I want to investigate dehydrating vegetables for cooking meals while tramping. Fun!

Thursday, 21 January 2021

Tongariro Northern Circuit 2021

Yesterday I got back from a walk on the Tongariro Northern Circuit. Unfortunately things didn't go quite according to plan!

We had intended to walk the circuit over three days, clockwise. On Monday we would walk from Mangatepopo over Mt Tongariro, via the Tongariro Crossing, to Emerald Lakes, where we would turn off and carry on down to Oturere Hut. On Tuesday we'd walk south to Waihohonu Hut, and on Wednesday walk west to Whakapapa to complete most of the circuit. However, high winds were forecast at the top of Tongariro on Monday and the Department of Conservation issued a "bad weather" forecast. As a result the shuttle we had booked would not take us to Mangatepopo, so we couldn't walk from there even if we thought it was safe, which we did not. We talked to staff at the DoC visitor's centre in Whakapapa and eventually decided to walk from Whakapapa east across the Tama Saddle to Waihohonu Hut and then north to Oturere, i.e. do half the circuit anticlockwise from Whakapapa on our first day. On Tuesday we walked south to Waihohonu as planned. On Wednesday we walked out to the Desert Rd and caught a shuttle back to Whakapapa instead of walking there as planned, because the weather forecast was still poor and there didn't seem much point in re-traversing the saddle into wind and rain.

I thought the first day might be a bit gruelling — about eight hours of walking, on paper, with significant wind and rain forecast. It actually turned out pretty well. Only a little rain fell on us — we seemed to be moving east just ahead of it — and the wind was mostly at our backs. The sun even broke out a few times. No-one complained about the length of the walk and I felt pretty good myself. We reached Oturere Hut after about eight hours but that included our lunch break, a lengthy stop at Waihohonu Hut for a rest and hot drinks, and a side trip to Lower Tama Lake, so we were actually quite fast. One upside of the weather was that Oturere Hut, which is rather small, would have been packed with twenty-six people in good weather but only our group of ten and two other women actually showed up, so it was very comfortable.

On Tuesday the weather was similar — westerly wind and rain — but some of us wanted to do a "side trip" up Oturere Valley to Emerald Lakes if possible, before moving on to Waihohonu Hut — an hour and a half each way. Six of us (out of ten) did it, but it was a bit brutal! It wasn't too bad in the valley — wind and some drizzle in our faces, the spectacular volcanic desert landscape obscured by drifting fog — but the track then climbs steeply up to the saddle with the lakes, and there it was colder and the wind was much stronger. Scrambling up the last, steep part of the path into strong winds and driving rain was no fun at all! (The strong smell of sulphurous gases from the volcano added an extra frisson!) We had a quick look around the lake and the Tongariro Crossing junction and then scrambled right back down again. The walk back down the valley with the wind at our backs was positively pleasant. One good thing about this side trip is that it confirmed we had made the right choice in not risking crossing Tongariro on Monday!

We were able to have lunch in the cosy Oturere Hut and then it was just an easy two and a half hour walk to Waihohonu. Actually it was only mostly easy; in a few especially exposed places we got some very big wind gusts, probably the windiest conditions I've ever walked in. I had to lean hard into the wind to not be blown over, and some in our group just had to squat down and wait for the gusts to pass. I guess it was probably blowing a hundred kilometres an hour.

Once again we had the hut mostly to ourselves, and I think Waihohonu is still the best hut in New Zealand! The two women from Oturere stayed there, and there was also a French woman who was in NZ to work for a few years (arriving eighteen months ago, so that was good timing). We had a great afternoon and evening (a fire, games, good food and fellowship) and then on Wednesday morning, an early start and a short seventy-five minute walk to the car park to get picked up and returned to Whakapapa.

The weather was certainly disappointing. Ruapehu and Ngauruhoe are beautiful mountains and were entirely covered by cloud the entire time we were there. Of course it was disappointing we couldn't cross Tongariro. On the other hand, I think most or even all people in our group of ten had a good time and have much to be thankful for. If you tramp regularly you have to accept that the weather won't always be good; if you enjoy yourself even in the bad weather, you've got the Right Stuff for tramping :-).