Wednesday, 5 April 2017

Rust Optimizations That C++ Can't Do

Update dbaupp on Reddit pointed out that this particular example does not support the point I'm trying to make. I'll explain why below. I followed up with a new post with a better example.

Consider the Rust code

pub struct Foo {
    v1: u64,
    v2: u64,
pub fn foo(t: &Foo) -> u64 {
    t.v1 + t.v2

The stable Rust compiler, in Release mode, generates

 leaq (%rdi,%rsi), %rax

Hang on! foo is supposed to take a reference, so shouldn't it take a pointer as its parameter and do two loads to get the values of v1 and v2? Instead it receives the values of v1 and v2 in %rdi/%rsi and adds them using leaq as a three-operand "add" instruction. The Rust compiler has decided (correctly) that even though you wrote a reference, in this case it's more efficient to just pass the parameter by value. This is good to know as a Rust developer; I can use references freely without worrying about whether it would be more efficient to pass the value directly.

C++ compilers can't do this optimization, for a couple of reasons. One reason is that C++ has a standard ABI that dictates how parameters are passed; it requires that references are always passed as pointers. A more interesting reason is that in general this optimization is not safe in C++. An immutable Rust reference promises that the referenced value does not change (except for some special cases involving UnsafeCell which are easily detected by the compiler), but not so for C++ const references; a "const" value can be modified through other mutable references, so copies induced by pass-by-value could change program behavior. For a simple case like this a C++ compiler could do some analysis to show that can't happen, but those analyses don't scale (e.g. if foo called, and was called by, unknown code, all bets are off).

Here Rust's strong control over aliasing is enabling improved optimization, not just improved safety. Update Actually as dbaupp points out, this optimization example is not specific to Rust. In fact, LLVM is doing it via local analysis (the sort I mentioned above), and clang++ produces the same results for the equivalent C++ code. In fact Rust in general would need to preserve references because you can convert a reference to a raw pointer and compare them for equality.

It's easy to find other examples, such as improved code motion and common subexpression elimination. I think with more study of large Rust programs, more investment in the Rust compiler, and tighter control over unsafe Rust, we will probably discover that Rust enables many more interesting optimizations.

One of the promises of hair-shirt functional languages was that referential transparency and other strong semantic guarantees would enable extremely aggressive optimizations, which would recover the performance lost by moving away from the stateful semantics of hardware. That didn't really work out, but with Rust we might get the aggressive optimizations on top of semantics that are already very efficient.

1 comment: