Friday, 22 July 2016

Further Improving My Personal Digital Security

A few months ago I moved my 2FA secrets (my Github account and three Google accounts) from a phone app to a Yubikey. Recently, somewhat inspired by Daniel Pocock's blog posts about SMS and phone security --- plus other news --- I've decided to reduce the trust in my phone further.

I don't want my phone to be usable in an account-recovery attack, so I've removed it as a recovery option for my Google and Github accounts. To avoid increasing the risk of losing control of those accounts unrecoverably, I bought a second Yubikey as a backup and regenerated 2FA secrets for those accounts onto both Yubikeys. (For both Google and Github, generating 2FA secrets invalidates existing ones, but it's easy enough to load a secret into any number of devices while the QR code for the new secret is visible.) I generated new backup verification codes and printed them without saving them anywhere. (Temporary data for the print job might linger on my laptop storage, though that's encrypted with a decent password. More worrying is that the printer might keep data around... I probably should have copied them down by hand!)

Unfortunately my other really important account --- my online banking account --- is weakly protected by comparison. Westpac's personal-banking system uses simple user-name-and-password logon. There are heuristics to detect "suspicious" transfers, which you need to confirm with a code sent to your phone by SMS. This is quite unsatisfactory, though not unsatisfactory enough to justify the trouble of switching banks (given that generally Westpac would reimburse me for losses due to my account being compromised).

Thursday, 7 July 2016

Ordered Maps For Stable Rust

The canonical ordered-map type for Rust is std::collections::BTreeMap, which looks nice, except that the range methods are marked unstable, so they can only be used with the nightly-build Rust toolchain. Those methods are the only way to perform operations like "find the first element greater than a given key", so BTreeMap is mostly useless in stable Rust.

This wouldn't be a big deal if Rust had a good ordered-map library that worked with stable Rust ... but, as far as I can tell, until now it did not. I didn't want to switch to Rust nightly just to use ordered maps, so I solved this problem by forking container-rs's bst crate, modifying it to work on stable Rust (which meant ripping out a bunch of "unstable" annotations, fixing a few places that required unstable "box" syntax, and fixing some test code that depended on unboxed closures), and publishing the result as stable_bst. (Note: I haven't actually gotten around to using it yet, so maybe it's broken, but at least its tests pass.)

So, if you want to use ordered maps with stable Rust, give it a try. bst has a relatively simple implementation and, no doubt, is less efficient than BTreeMap, but it should be comparable to the usual C++ std::map implementations.

Currently it supports only C++-style lower_bound and upper_bound methods for finding elements less/greater than a given key. range methods similar to BTreeMap's could easily be added, using a local copy of the unstable standard Bound type. I'm not sure if I'll bother, but I'd accept PRs.

Update: I realized the lower_bound and upper_bound methods were somewhat useless since they only return forward iterators, so I bit the bullet: I implemented the range/range_mut methods, removed lower_bound/upper_bound and the reverse iterators (which are superseded by range), and updated the crate.

FWIW I really like the range API compared to C++-style upper/lower_bound. I always have to think carefully to use the C++ API correctly, whereas the range API is easy to use correctly: you specify upper and lower bounds, each of which can be unbounded, exclusive or inclusive, just like in mathematics. A nice feature of the range API (when implemented correctly!) is that if you happen to specify a lower bound greater than the upper bound, it returns an empty iterator, instead of returning some number of wrong elements --- or crashing exploitably --- as the obvious encoding in C++ would do.
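For illustration, here's what that style of query looks like in code. This sketch uses BTreeMap::range and the standard Bound type, so it needs a toolchain where those methods are available; stable_bst's range method is meant to be used the same way:

```rust
use std::collections::BTreeMap;
use std::ops::Bound::{Excluded, Included, Unbounded};

// Collect the keys of `map` falling within the given pair of bounds.
fn keys_in_range(map: &BTreeMap<i32, i32>,
                 bounds: (std::ops::Bound<i32>, std::ops::Bound<i32>)) -> Vec<i32> {
    map.range(bounds).map(|(k, _)| *k).collect()
}

fn main() {
    let mut map = BTreeMap::new();
    for &k in &[1, 3, 5, 7, 9] {
        map.insert(k, k * 10);
    }
    // "Everything greater than 3" is just the half-open range (3, +infinity).
    assert_eq!(keys_in_range(&map, (Excluded(3), Unbounded)), vec![5, 7, 9]);
    // Inclusive on both ends: [3, 7].
    assert_eq!(keys_in_range(&map, (Included(3), Included(7))), vec![3, 5, 7]);
    println!("ok");
}
```

Each endpoint is specified independently as Included, Excluded or Unbounded, so "first element greater than k"-style queries fall out naturally with no mental gymnastics.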

Another somewhat obscure but cool feature of range is that the values for bounds don't have to be exactly the same type as the keys, if you set up traits correctly. ogoodman on github pointed out that in some obscure cases you want range endpoints that can't be expressed as key values. Their example is keys of type (A, B), lexicographically ordered, where B does not have min or max values (e.g., arbitrary-precision integers), and you want a range containing all keys with a specific value for A. With the BTreeMap and stable_bst::TreeMap APIs you can handle this by making the bounds be type B', where B' is B extended with artificial min/max values, and defining traits to order B/B' values appropriately.
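A simpler instance of the different-bound-type feature, for illustration: with std's BTreeMap you can keep owned String keys but query with borrowed &str bounds, because range is generic over any type the key can Borrow as. The (A, B) trick above follows the same pattern, just with a hand-written bound type and ordering traits:

```rust
use std::collections::BTreeMap;
use std::ops::Bound::{Excluded, Included};

fn main() {
    let mut map: BTreeMap<String, i32> = BTreeMap::new();
    for (k, v) in [("apple", 1), ("banana", 2), ("cherry", 3)].iter() {
        map.insert(k.to_string(), *v);
    }
    // The keys are Strings, but the bounds are plain &str: this works
    // because String: Borrow<str> and str: Ord.
    let found: Vec<&i32> = map
        .range::<str, _>((Included("banana"), Excluded("cherry")))
        .map(|(_, v)| v)
        .collect();
    assert_eq!(found, vec![&2]);
    println!("ok");
}
```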

Saturday, 2 July 2016

Itanium Zombie Claims Another Victim

Oracle owes HP $3B for (temporarily) dropping support for Itanium CPUs in 2011. This is the latest in a long line of embarrassments caused by that architecture. Its long, sad and very expensive story is under-reported and under-appreciated in the industry, probably because Intel, thanks to its x86 near-monopoly, ended up shrugging it off with no long-lasting ill effects. I'm disappointed about that; market efficiency requires that companies that make such enormous blunders should suffer. Ironically Intel's partners who jumped on the Itanium bandwagon --- HP, SGI, DEC, and even software companies such as Microsoft and Oracle --- ended up suffering a lot more than Intel did. Someone should do a proper retrospective and try to tally up the billions of dollars wasted and the products and companies ruined.

It was all so foreseeable, too. I was in graduate school during Itanium development and there was massive skepticism in the CMU CS department that Itanium's explicit ILP would ever work well in the face of the unpredictable runtime behavior of real code. People correctly predicted that the compiler advances required were, in fact, unachievable. Corporate agendas, large budgets, and some over-optimistic academic researchers trumped common sense.

This amusing graph is a fine illustration of the folly of trusting "industry analysts", if any were needed.

Friday, 1 July 2016

rr 4.3.0 Released

I've just released rr 4.3.0. This release doesn't have any major new user-facing features, just a host of small improvements:

  • AVX (i.e. YMM) registers are exposed through gdb.
  • Optimizations for I/O-heavy tracees on btrfs. I highly recommend putting tracee data and the traces on the same btrfs filesystem to take advantage of this.
  • Support for dconf's shared memory usage.
  • Much better support for vfork.
  • Much better support for ptrace. This allows rr to record rr replay.
  • Support for tracees calling setuid.
  • Support for tracees compiled with AddressSanitizer.
  • Support for optimized release builds via cmake -DCMAKE_BUILD_TYPE=Release (thanks to Keno Fischer).
  • Keno Fischer also dived into the guts of rr and did some nice cleanups.
  • As always, syscall support was expanded and many minor bugs fixed.
  • This release has been tested on Ubuntu 16.04 and Fedora 24 (as well as older distros).
  • With the help of Brad Spengler, we got rr working on grsecurity kernels. (The changes to grsecurity only landed a few days ago.)

In this release I've fixed the last known intermittent test failure! Some recent Linux kernels have a regression in performance counter code that very rarely causes some counts to be lost. This regression seems to be fixed in 4.7rc5 which I'm currently running.

Ubuntu 16.04 was released with gdb 7.11.0, which contains a serious regression that makes it very unreliable with rr. The bug is fixed in gdb 7.11.1 which is shipping as an update to 16.04, so make sure to update.

Wednesday, 29 June 2016

Nexus 5X vs Wettest June Hour In Auckland's History

I had lunch with a friend in Newmarket today. I walked there from home, but it was raining so I half-ran there holding my phone and keys under my jacket. Unfortunately when I got there I was only holding my keys :-(. I ran around a bit looking for the phone, couldn't find it, and decided lunch with my friend was more important.

So later I walked home, keeping an eye out for my phone in vain --- during which it was really pouring; activated the Android Device Manager locator; drove to where it said the phone was; looked around (still raining) but couldn't find it and couldn't activate the ring (due to not having a phone!); drove home and activated the "call me when you find this phone" screen. Not long after that a miracle happens: a kid calls me on the phone. Not only has he found it and is willing to wait on the street for fifteen minutes while I drive back to pick it up, but the phone still works despite having been out in the rain for the wettest June hour in Auckland's history. Seriously, on my way home there were torrents of water in the gutters and whole streets were flooded. Congratulations to LG and Google, thanks to God the phone wasn't simply washed away, and thanks to the Grammar boys who found it and waited for me. (The latter I was able to thank directly by giving them my copy of China Mieville's wonderful The City And The City.)

Relearning Debugging With rr

As I've mentioned before, once you have a practical reverse-execution debugger like rr, you need to learn new debugging strategies to exploit its power, and that takes time. (Almost all of your old debugging strategies still work --- they're just wasting your time!) A good example presented itself this morning. A new rr user wanted to stop at a location in JIT-generated code, and modified the JIT compiler to emit an int3 breakpoint instruction at the desired location --- because that's what you'd do with a regular debugger. But with rr there's no need: you can just run past the generation of the code, determine the address of your generated instruction after the fact (by inserting a logging statement at the point where you would have triggered generation of int3, if you must), set a hardware execution breakpoint at that address, and reverse-execute until that location is reached.
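Concretely, the tail end of that workflow in rr's gdb session looks something like this (the address here is made up for illustration):

```
(rr) continue
...
JIT: generated code at 0x7f3d4c001080    <-- from your temporary logging statement
(rr) hbreak *0x7f3d4c001080              <-- hardware execution breakpoint
Hardware assisted breakpoint 1 at 0x7f3d4c001080
(rr) reverse-continue                    <-- run backwards to that instruction
```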

One of the best reasons I've heard for not using rr was given by Jeff: "I don't want to forget how to debug on other platforms".

Tuesday, 28 June 2016

Handling Read-Only Shared Memory Usage In rr

One of rr's main limitations is that it can't handle memory being shared between recorded processes and not-recorded processes, because writes by a not-recorded process can't be recorded and replayed at the right moment. This mostly hasn't been a problem in practice. On Linux desktops, the most common cases where this occurs (X, pulseaudio) can be disabled via configuration (and rr does this automatically). However, there is another common case --- dconf --- that isn't easily disabled via configuration. When applications read dconf settings for the first time, the dconf daemon hands them a shared-memory buffer containing a one-byte flag. When those settings change, the flag is set. Whenever an application reads cached dconf settings, it checks this flag to see if it should refetch settings from the daemon. This is very efficient but it causes rr replay to diverge if dconf settings change during recording, because we don't replay that flag change.
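In application terms, the protocol looks roughly like this. This is an illustrative sketch, not dconf's actual code; the AtomicU8 stands in for the one-byte flag in the shared buffer, and all the names are invented:

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU8, Ordering};

// Sketch of a dconf-style settings cache: the daemon sets the shared flag
// when settings change; clients refetch lazily on their next read.
struct SettingsCache<'a> {
    cached: HashMap<String, String>,
    stale: &'a AtomicU8, // stands in for the byte in the shared-memory buffer
}

impl<'a> SettingsCache<'a> {
    // Read a setting, refetching everything from the daemon only if the
    // shared flag says the cache is out of date.
    fn get(&mut self, key: &str,
           refetch: impl FnOnce() -> HashMap<String, String>) -> Option<String> {
        if self.stale.swap(0, Ordering::AcqRel) != 0 {
            self.cached = refetch();
        }
        self.cached.get(key).cloned()
    }
}

fn main() {
    let flag = AtomicU8::new(1); // the daemon has flagged a change
    let mut cache = SettingsCache { cached: HashMap::new(), stale: &flag };
    let fresh = || {
        let mut m = HashMap::new();
        m.insert("theme".to_string(), "dark".to_string());
        m
    };
    assert_eq!(cache.get("theme", fresh), Some("dark".to_string()));
    println!("ok");
}
```

The point of the design is that the common case, flag clear, costs a single byte read; rr only has to worry about reproducing the rare flag updates.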

Fortunately I've been able to extend rr to handle this. When an application maps the dconf memory, we map that memory into the rr supervisor process as well, and then replace the application mapping with a "shadow mapping" that's only shared between rr and the application. Then rr periodically checks to see whether the dconf memory has changed; if it has, we copy the changes to the shadow mapping and record that we did so. Essentially we inject rr into the communication from dconf to the application, and forward memory updates in a controlled manner. This seems to work well.

That "periodic check" is performed every time the recorded process completes a traced system call. That means we'll forward memory updates immediately when any blocking system call completes, which is generally what you'd want. If an application busy-waits on an update, we'll never forward it and the application will deadlock, but that could easily be fixed by also checking for updates on a timeout. I'd really like to have a kernel API that lets a process be notified when some other process has modified a chunk of shared memory, but that doesn't seem to exist yet!
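The forwarding step itself can be sketched like this (a hypothetical illustration; the function and the trace representation are invented, not rr's actual internals):

```rust
// `real` is the mapping shared with the not-recorded process; `shadow` is
// the private copy the tracee actually sees.
fn forward_shared_memory_updates(real: &[u8], shadow: &mut [u8],
                                 trace: &mut Vec<(usize, u8)>) {
    assert_eq!(real.len(), shadow.len());
    for (i, (&r, s)) in real.iter().zip(shadow.iter_mut()).enumerate() {
        if r != *s {
            *s = r;             // forward the update into the tracee's view
            trace.push((i, r)); // record it so replay can apply the same change
        }
    }
}

fn main() {
    let real = [0u8, 1, 0, 1];
    let mut shadow = [0u8, 0, 0, 0];
    let mut trace = Vec::new();
    // Called whenever the recorded process completes a traced system call.
    forward_shared_memory_updates(&real, &mut shadow, &mut trace);
    assert_eq!(shadow, real);
    assert_eq!(trace, vec![(1, 1), (3, 1)]);
    println!("ok");
}
```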

This technique would probably work for other cases where shared memory is used as a one-way channel from not-recorded to recorded processes. Using shared memory as a one-way channel from recorded to not-recorded processes already works, trivially. So this leaves shared memory that's read and written from both sides of the recording boundary as the remaining hard (unsupported) case.