Monday, 20 November 2017

Tararua Southern Crossing

I spent three days last week hiking the Tararua Southern Crossing in the Tararua Ranges north of Wellington with one of my children, and had a great time!

On the first day we took the train from Wellington city up the coast to Waikanae on the Kapiti Coast, then were driven east into the ranges, to Otaki Forks where the track starts. It took about five hours to walk to Kime Hut, at about 1400m — an altitude gain of about 1200m. From there we watched a spectacular sunset over the northern end of the South Island, which was wholly visible from the Marlborough Sounds in the west to the Kaikoura Ranges in the east. Kime has no heating and has a reputation for being cold, but it was upgraded in 2014 and although there was a hard frost outside, we were fine sleeping in our clothes inside our sleeping bags.

The next day we hiked over "the tops" — semi-alpine terrain above the tree line. This area is notoriously windy and potentially dangerous in bad weather, but the weather was fine and the wind tolerable. The DoC office had warned us not to take the route if the wind forecast for Powell Hut further north was 50km/h or higher, and I was able to download a fresh forecast in the morning from near Kime Hut which forecast 35km/h. In some places, notably the south side of Mount Hector, the wind was a little unnerving, but it was never a real problem. In places the track is narrow with steep drops on both sides, and although I'm generally not good with heights, I was fine just keeping my eyes on the track; wielding a couple of hiking poles helped. We were cautious and it took us about five hours to reach that day's destination, Alpha Hut. Along the way we had spectacular views from the Kapiti Coast in the west, around to the Hutt Valley, Wellington, and snow-capped peaks in the South Island to the south, and over to the Wairarapa valley in the east. It was a wonderful day.

The final day was a long walk through the bush along the Marchant Ridge down to rural Kaitoke. It took us about eight hours of walking, after which we had a taxi take us to Upper Hutt and then a train back to Wellington. It's a bit of a slog, but it has more good views and is generally fun. I tripped on a root near the end and banged my knee, which is still sore a few days later, but so it goes.

I'd never been tramping in the Tararuas before, and from the visitor books it looked like most trampers in this area are locals. I'm glad I got a chance to try it. We were lucky that the weather was excellent; it's often wet and/or foggy. Although the area between Kime and Alpha is classed as a "route" and most of the rest is "tramping track" grade, we found almost all the track well-defined and in good condition, though a lot of stepping up and down rocks and tree roots is required!

After the tramp we met up with the rest of the family for a couple of days in Wellington, in particular to visit the World War I exhibitions at Te Papa museum and the Dominion Museum. They are excellent, though they required a significant amount of extra walking, which probably didn't help my knee.

Sunday, 29 October 2017

Auckland Half Marathon 2017

Not as good a time as last year, but OK. I seemed to run out of steam a bit towards the end this year; I'm not sure why. Maybe I pushed too hard too early, or just had a bad day. I should try running more fast 10Ks in the lead-up next year.

I followed Alfredo's advice to reduce the wear on my bare feet by sticking some small pieces of climbing tape to the balls of my feet. It worked really well. The tape almost entirely wore off during the race, and my feet didn't feel any different while I was running, but the skin didn't wear away. In experiments on training runs I discovered that strips of tape parallel to the side of the foot work better than strips across the foot because in the latter case, the flexing of the foot seems to pull the tape off.

Thursday, 19 October 2017

Microsoft's Chrome Exploitation And The Limitations Of Control Flow Integrity

Microsoft published an interesting blog post about exploiting a V8 bug to achieve arbitrary code execution in a Chrome content sandbox. They rightly point out that even if you don't escape the sandbox, you can break important Web security properties (e.g., assuming the process is allowed to host content from more than one origin, you can break same-origin restrictions). However, the message we're supposed to take away from this article is that Microsoft's CFI would prevent similar bugs in Edge from having the same impact. I think that message is basically wrong.

The problem is, once you've achieved arbitrary memory read/write from Javascript, it's very likely you can break those Web security properties without running arbitrary machine code, without using ROP, and without violating CFI at all. For example, if you want to violate same-origin restrictions, your JS code could find the location in memory where the origin of the current document is stored and rewrite it to be a different origin. In practice it would be quite a lot more complicated than that, but the basic idea should work, and once you've implemented the technique it could be used to exploit any arbitrary read/write bug. It might even be easier to write some exploits this way than using traditional arbitrary code execution; JS is a more convenient programming language than ROP gadgets.
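
To make the idea concrete, here is a toy Python model of such a data-only attack. It is only a sketch and nothing in it resembles real browser internals: the flat `memory` buffer, the `ORIGIN_ADDR` slot and the `arbitrary_write` helper are all hypothetical names invented for illustration. The point is that the security check's code is never touched; only the data it reads is changed.

```python
# Toy model of a data-only attack; all names and layout here are hypothetical.
ORIGIN_ADDR = 16          # made-up address of the document's origin slot
ORIGIN_SIZE = 32          # made-up fixed-size slot, NUL-padded

memory = bytearray(64)    # stand-in for the content process's address space


def store_origin(origin: str) -> None:
    """Legitimate code path: the 'browser' records the document's origin."""
    data = origin.encode("ascii")
    if len(data) > ORIGIN_SIZE:
        raise ValueError("origin too long for slot")
    memory[ORIGIN_ADDR:ORIGIN_ADDR + ORIGIN_SIZE] = data.ljust(ORIGIN_SIZE, b"\0")


def current_origin() -> str:
    """Reads the origin back out of 'memory'."""
    raw = bytes(memory[ORIGIN_ADDR:ORIGIN_ADDR + ORIGIN_SIZE])
    return raw.rstrip(b"\0").decode("ascii")


def same_origin_check(target_origin: str) -> bool:
    """The security decision. Its code never changes; only its input does."""
    return current_origin() == target_origin


def arbitrary_write(addr: int, data: bytes) -> None:
    """Stand-in for the primitive an attacker gets from a memory-corruption bug."""
    memory[addr:addr + len(data)] = data


store_origin("https://attacker.example")
before = same_origin_check("https://bank.example")

# The attack: overwrite the stored origin. No new machine code runs, and every
# call still goes to its intended target, so a CFI check has nothing to catch.
arbitrary_write(ORIGIN_ADDR, b"https://bank.example".ljust(ORIGIN_SIZE, b"\0"))
after = same_origin_check("https://bank.example")

print(before, after)
```

In this model the check fails before the write and passes after it, even though the sequence of functions called is entirely ordinary.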

The underlying technical problem is that once you've achieved arbitrary read/write you can almost completely violate data-flow integrity within the process. As I recently wrote, DFI is extremely important and (unlike CFI) it's probably impossible to dynamically enforce with low overhead in the presence of arbitrary read/write, with any reasonable granularity.

I think there's also an underlying cultural problem here, which is that traditionally "Remote Code Execution" — of unconstrained machine code — has been the gold standard for a working exploit, which is why techniques to prevent that, like CFI, have attracted so much attention. But Javascript (or some other interpreter, or even some Turing-complete interpreter-like behavior) armed with an arbitrary memory read/write primitive is just as bad in a lot of cases.

Sunday, 15 October 2017

"Slow To Become Angry"

James 1:19-20

My dear brothers and sisters, take note of this: Everyone should be quick to listen, slow to speak and slow to become angry, because human anger does not produce the righteousness that God desires.

Online media exploit the intoxicating effects of righteous anger to generate "engagement". But even — or especially — when the targets of one's anger richly deserve it, becoming a person whose life is characterized by anger is contrary to God's will.

Something to remember when I'm tempted to click on yet another link about some villain getting what he deserves.

Monday, 9 October 2017

Type Safety And Data Flow Integrity

We talk a lot about memory safety assuming everyone knows what it is, but I think that can be confusing and sell short the benefits of safety in modern programming languages. It's probably better to talk about "type safety". This can be formalized in various ways, but intuitively a language's type system proposes constraints on what is allowed to happen at run-time — constraints that programmers assume when reasoning about their programs; type-safe code actually obeys those constraints. This includes classic memory safety features such as avoidance of buffer overflows: writing past the end of an array has effects on the data after the array that the type system does not allow for. But type safety also means, for example, that (in most languages) a field of an object cannot be read or written except through pointers/references created by explicit access to that field. With this loose definition, type safety of a piece of code can be achieved in different ways. The compiler might enforce it, or you might prove the required properties mechanically or by hand, or you might just test it until you've fixed all the bugs.

One implication of this is that type-safe code provides data-flow integrity. A type system provides intuitive constraints on how data can flow from one part of the program to another. For example, if your code has private fields that the language only lets you access through a limited set of methods, then at run time it's true that all accesses to those fields are by those methods (or due to unsafe code).

Type-safe code also provides control-flow integrity, because any reasonable type system also suggests fine-grained constraints on control flow.

Data-flow integrity is very important. Most information-disclosure bugs (e.g. Heartbleed) violate data-flow integrity, but usually don't violate control-flow integrity. "Wild write" bugs are a very powerful primitive for attackers because they allow massive violation of data-flow integrity; most security-relevant decisions can be compromised if you can corrupt their inputs.
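
Here is a rough Python sketch of a Heartbleed-style over-read, to show why this is a data-flow problem and not a control-flow one. It is a toy model, not the actual OpenSSL code: the `memory` buffer, the `SECRET_ADDR` layout and both `echo_*` functions are made up for illustration.

```python
# Toy model of a Heartbleed-style over-read; layout and names are hypothetical.
memory = bytearray(64)                                # stand-in for the process heap
SECRET_ADDR = 8                                       # request buffer is bytes 0..7
memory[SECRET_ADDR:SECRET_ADDR + 10] = b"PRIVATEKEY"  # unrelated data stored next to it


def echo_unchecked(payload: bytes, claimed_len: int) -> bytes:
    """Copies the payload into memory, then echoes `claimed_len` bytes back."""
    memory[0:len(payload)] = payload
    return bytes(memory[0:claimed_len])               # trusts the attacker-supplied length


def echo_checked(payload: bytes, claimed_len: int) -> bytes:
    """Same behavior, but refuses to read past the data that was actually sent."""
    if claimed_len > len(payload):
        raise ValueError("claimed length exceeds payload")
    memory[0:len(payload)] = payload
    return bytes(memory[0:claimed_len])


honest = echo_unchecked(b"hi", 2)
leaked = echo_unchecked(b"hi", 18)    # same code path, larger claimed length
print(honest, leaked)
```

Both calls to `echo_unchecked` execute exactly the same statements in the same order, so there is nothing for a control-flow check to notice; the only difference is which bytes flow into the reply.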

A lot of work has been done to enforce CFI for C/C++ using dynamic checks with reasonably low overhead. That's good and important work. But attackers will move to attacking DFI, and that's going to be a lot harder to solve for C/C++. For example the checking performed by ASAN is only a subset of what would be required to enforce the C++ type system, and ASAN's overhead is already too high. You would never choose C/C++ for performance reasons if you had to run under ASAN. (I guess you could reduce ASAN's overhead if you dropped all the support for debugging, but it would still be too high.)

Note 1: people often say "even type safe programs still have correctness bugs, so you're just solving one class of bugs which is not a big deal" (or, "... so you should just use C and prove everything correct"). This underestimates the power of type safety with a reasonably rich type system. Having fine-grained CFI and DFI, and generally being able to trust the assumptions the type system suggests to you, are essential for sound reasoning about programs. Then you can leverage the type system to build abstractions that let you check more properties; e.g. you can enforce separation between trusted and untrusted data by giving untrusted user input different types and access methods to trusted data. The more of your code is type-safe, the stronger your confidence in those properties.
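
As a minimal sketch of that last idea, here is one way to keep trusted and untrusted data apart with types in Python. The class and function names (`Untrusted`, `SafeHtml`, `escape`, `render_comment`) are hypothetical; only `html.escape` and `dataclasses` come from the standard library. Python does not enforce annotations by itself, so the sketch adds a run-time check where a statically typed language would reject the bad call at compile time.

```python
from dataclasses import dataclass
import html


@dataclass(frozen=True)
class Untrusted:
    """Raw user input. Deliberately offers no way to use it as HTML directly."""
    raw: str


@dataclass(frozen=True)
class SafeHtml:
    """Text that has been escaped for inclusion in an HTML page."""
    text: str


def escape(value: Untrusted) -> SafeHtml:
    """The only intended route from Untrusted to SafeHtml."""
    return SafeHtml(html.escape(value.raw))


def render_comment(body: SafeHtml) -> str:
    # Run-time stand-in for a compile-time type check.
    if not isinstance(body, SafeHtml):
        raise TypeError("render_comment requires SafeHtml")
    return "<p>" + body.text + "</p>"


user_input = Untrusted("<b>hi</b>")
page = render_comment(escape(user_input))
print(page)
```

Passing `user_input` straight to `render_comment` raises `TypeError`. Nothing here stops code from constructing `SafeHtml` directly around unescaped text; a language with private constructors could close that hole as well, which is the sense in which more type safety buys more confidence.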

Note 2: C/C++ could be considered "type safe" just because the specification says any program executing undefined behavior gets no behavioral constraints whatsoever. However, in practice, programmers reasoning about C/C++ code must (and do) assume the constraint "no undefined behavior occurs"; type-safe C/C++ code must ensure this.

Note 3: the presence of unsafe code within a hardware-enforced protection domain can undermine the properties of type-safe code within the same domain, but minimizing the amount of such unsafe code is still worthwhile, because it reduces your attack surface.

Legacy Code Strikes Again

This blog post describes using binary diffing to find security-relevant bugs in Windows 7 that were fixed in Windows 10. It's an interesting example of the problems you can get into if you don't fix bugs across all your product versions at about the same time.

CVE-2017-8685 is particularly sad. The system call NtGdiEngCreatePalette can leak uninitialized values from kernel memory to user-space; these leaks are quite serious security issues. This is basically the old GDI CreatePalette function. It's a system call because in Windows NT 4.0 (1996) Microsoft moved the GDI subsystem into the kernel for performance reasons. That may have made sense at the time, but at that time GDI was already a mess of complicated legacy code, so this was a large increase in attack surface that's been hurting security and stability for Microsoft users ever since.

What's especially sad about this function is that CreatePalette is only useful for palette-based displays, which became obsolete in the 1990s, around the time NT 4.0 came out. It's been about 20 years since this API was useful for anything other than compatibility with even older software ... or kernel memory disclosure!

Sunday, 8 October 2017

Thoughts On Microsoft's Time-Travel Debugger

I'm excited that Microsoft's TTD is finally available to the public. Congratulations to the team! The video is well worth watching. I haven't used TTD myself yet since I don't have a Windows system at hand, but I've talked to Mozilla developers who've tried it on Firefox.

The most important and obvious difference between TTD and rr is that TTD is for Windows and rr is for Linux (though a few crazy people have had success debugging Windows applications in Wine under rr).

TTD supports recording of multiple threads in parallel, while rr is limited to a single core. On the other hand, per-thread recording overhead seems to be much higher in TTD than in rr. It's hard to make a direct comparison, but a simple "start Firefox, display mozilla.org, shut down" test run on similar hardware takes about 250 seconds under TTD and 26 seconds under rr. This is not surprising given TTD relies on pervasive binary instrumentation and rr was designed not to. This means recording extremely parallel workloads might be faster under TTD, but for many workloads rr recording will be faster. Starting up a large application really stresses binary translation frameworks, so it's a bit of a worst-case scenario for TTD — though a common one for developers. TTD's multicore recording might be better at reproducing certain kinds of concurrency bugs, though rr's chaos mode helps mitigate that problem — and lower recording overhead means you can churn through test iterations faster.
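
A back-of-the-envelope calculation shows roughly where the crossover might be. This is an idealized model of my own, not a measurement: it assumes the 250s/26s ratio reflects per-thread overhead (the Firefox test is itself multithreaded, so it is only a rough proxy), that rr's cost grows linearly with the number of equally busy threads because it serializes them onto one core, and that TTD scales perfectly across cores. The function name `rr_is_faster` is made up for this sketch.

```python
# Timings quoted above for "start Firefox, display mozilla.org, shut down".
ttd_seconds = 250.0
rr_seconds = 26.0

# How much slower TTD was than rr on this particular test.
slowdown_ratio = ttd_seconds / rr_seconds


def rr_is_faster(parallel_threads: int, ratio: float = slowdown_ratio) -> bool:
    """Idealized model: rr's recording time scales with the thread count,
    while TTD records threads in parallel at `ratio` times rr's per-thread cost."""
    return parallel_threads < ratio


print(round(slowdown_ratio, 1), rr_is_faster(4), rr_is_faster(16))
```

Under those assumptions the ratio is a bit under 10, so a workload would need something like ten or more fully busy threads before TTD's parallel recording pulls ahead.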

Therefore for Firefox-like workloads, on Linux, I still think rr's recording approach is superior. Note that when the technology behind TTD was first developed the hardware and OS features needed to support an rr-like approach did not exist.

TTD's ability to attach to arbitrary processes and start recording sounds great and would mitigate some of the slow-recording problem. This would be nice to have with rr, but hard to implement. (Currently we require reserving a couple of pages at specific addresses that might not be available when attaching to an arbitrary process.)

Some of the performance overhead of TTD comes from it copying all loaded libraries into the trace file, to ensure traces are portable across machines. rr doesn't do that by default; instead you have to run rr pack to make traces self-contained. I still like our approach, especially in scenarios where you repeatedly re-record a testcase until it fails.

The video mentions that TTD supports shared memory and async I/O and suggests rr doesn't. It can be confusing, but to clarify: rr supports shared memory as long as you record all the processes that are using the shared memory; for example Firefox and Chromium communicate with subprocesses using shared memory and work fine under rr. Async I/O is pretty rare in Linux; where it has come up so far (V4L2) we have been able to handle it.

Supporting unlimited data breakpoints is a nice touch. I assume that's done using their binary instrumentation.

TTD's replay looks fast in the demo videos but they mention that it can be slower than live debugging. They have an offline index build step, though it's not clear to me yet what exactly those indexes contain. It would be interesting to compare TTD and rr replay speed, especially for reverse execution.

The TTD trace querying tools look cool. A lot more can be done in this area.

rr+gdb supports running application functions at debug time (e.g. to dump data structures), while TTD does not. This feature is very important to some rr users, so it might be worthwhile for the TTD people to look at.