Friday 7 August 2020
What Is The Minimal Set Of Optimizations Needed For Zero-Cost Abstraction?
A compelling feature of Rust and C++ is "zero-cost abstractions". You can write "high level" code, e.g. using iterators, that compiles down to the same machine code as the low-level code that you'd write by hand. You can add layers of abstraction, e.g. wrapping a primitive value in a struct and providing a specialized API for it, without adding run-time overhead. However, "zero-cost" only applies if you enable an adequate set of compiler optimizations. Unfortunately, enabling those optimizations slows down compilation and, with current compilers, trashes a lot of debug information, making it very difficult to debug with those binaries. Since the Rust standard library (and increasingly the C++ standard library) makes heavy use of "zero-cost" abstractions, using non-optimized builds for the sake of better debugging and build times creates binaries that are many times slower than release builds, often untenably slow. So the question is: how can we get fast builds and quality debuginfo while keeping zero-cost abstractions?
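To make "zero-cost" concrete, here's the classic illustration (my own sketch, not taken from any particular codebase): with optimizations enabled, the iterator version below compiles to essentially the same machine code as the explicit loop, but in an unoptimized build it pays for every adapter and closure call.

```rust
// Two ways to sum the squares of the even numbers in a slice.
// With optimizations on these should compile to essentially the same
// machine code; without them, the iterator version pays for every
// adapter and closure call.
fn sum_even_squares_iter(xs: &[i64]) -> i64 {
    xs.iter()
        .filter(|&&x| x % 2 == 0)
        .map(|&x| x * x)
        .sum()
}

fn sum_even_squares_loop(xs: &[i64]) -> i64 {
    let mut total = 0;
    for &x in xs {
        if x % 2 == 0 {
            total += x * x;
        }
    }
    total
}

fn main() {
    let xs: [i64; 6] = [1, 2, 3, 4, 5, 6];
    assert_eq!(sum_even_squares_iter(&xs), sum_even_squares_loop(&xs));
}
```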
An obvious approach is to limit the set of enabled compiler optimizations to the minimum necessary to achieve zero-cost abstraction, and hope that that produces acceptable build speed and debuginfo quality. But what is that set? If it's "all optimizations", then this is no help. I guess it depends on the exact set of "zero-cost abstraction" patterns we want to support. Here are some optimizations that I think need to be on the list (a concrete sketch follows the list):
- Inlining: Most or all abstractions depend on inlining functions to achieve zero-cost, because they encapsulate code into functions that you'd otherwise write yourself. So, we must have aggressive inlining.
- Copy propagation: After inlining functions, there will be chains of assignments through parameters and results that will need to be shortened using copy propagation.
- Limited dead store elimination and temporary elimination: After copy propagation, a lot of temporary values aren't needed. We shouldn't store their values or allocate space for them.
- Scalar replacement: Many abstractions package up variables into structs and encapsulate operations on the struct into functions. It seems likely that the optimizer needs to be able to undo that, taking structs apart and treating primitive components like any other scalar variable. This would typically be needed to make the above optimizations work on those components.
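To make that list concrete, here's a sketch of the newtype-wrapper pattern these passes have to dismantle (hypothetical type and function names, purely for illustration):

```rust
// A typical "zero-cost" wrapper: a struct around a primitive with a
// tiny API. (Hypothetical example for illustration.)
#[derive(Clone, Copy)]
struct Meters(f64);

impl Meters {
    fn new(v: f64) -> Meters {
        Meters(v)
    }
    fn add(self, other: Meters) -> Meters {
        Meters(self.0 + other.0)
    }
    fn get(self) -> f64 {
        self.0
    }
}

fn total_distance(legs: &[f64]) -> f64 {
    let mut total = Meters::new(0.0);
    for &leg in legs {
        total = total.add(Meters::new(leg));
    }
    total.get()
}

fn main() {
    assert_eq!(total_distance(&[1.5, 2.5]), 4.0);
}
```

For `total_distance` to match the hand-written `f64` loop, the compiler has to inline `new`, `add` and `get`, propagate the copies those calls introduce, drop the dead stores to the intermediate `Meters` temporaries, and replace the struct with its single `f64` field.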
Here are some optimizations that I think don't need to be on the list:
- Register allocation: Unoptimized debug builds typically do almost no register allocation: local variables live in stack slots. Thus, if unnecessary temporaries are eliminated (see above), storing the surviving variables on the stack doesn't penalize abstraction.
- Vectorization: Unoptimized debug builds typically don't do any vectorization, so again we can just keep on not doing it without penalizing abstraction.
- Tail calls: I can't think of a Rust or C++ abstraction that relies on tail-call optimization to be zero-cost.
Here's an optimization I'm not sure about:
- Common subexpression elimination: Are there common abstractions that we think should be zero-cost in Rust and C++ that require CSE? I can't think of any off the top of my head.
It would be very interesting to dive into this further and actually configure the Rust compiler with a promising set of optimizations and evaluate the results.
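I haven't tried this, but a crude first approximation with today's toolchain might be bumping the dev profile's opt-level in Cargo.toml. This is only a stand-in for the hand-picked pass set above, since Cargo's profiles expose coarse opt-level knobs rather than individual passes:

```toml
# Sketch only: a rough approximation of "just enough optimization"
# using the knobs Cargo actually exposes.
[profile.dev]
opt-level = 1           # some optimization, including basic inlining
debug = true            # keep full debuginfo
debug-assertions = true # keep debug assertions
incremental = true      # keep incremental compilation for build speed
```

Even opt-level = 1 already perturbs debuginfo somewhat, which is exactly the trade-off a genuinely minimal pass set would try to avoid.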
A somewhat orthogonal approach to the whole problem is to simply improve the debuginfo quality and build speed of optimized builds. The former is mostly a lot of engineering and architectural work to preserve debuginfo through the various optimization passes. Some optimizations (e.g. register allocation) make variable values unavailable at program points where they're technically still in scope, but good record-and-replay debuggers like rr, and especially omniscient debuggers like Pernosco, can work around that. Build speed is really the harder problem: doing lots of optimizations, especially with all the bookkeeping to preserve debuginfo, is inherently expensive.
I wonder what a compiler backend would look like if you designed it from scratch with the goal of good debuginfo and the fastest possible builds while enabling zero-cost abstractions.