Wednesday 5 April 2017
Rust Optimizations That C++ Can't Do (Version 2)
My last post was wrong in the details, so let me show a better example. Rust code:
pub extern fn foo(v: &u64, callback: fn()) -> u64 { let mut sum = 0; for _ in 0..100 { sum += *v; callback(); } sum }C++ version here.
The inner loop of the Rust version turns into:
.LBB0_1: inc ebx call r14 add r12, r15 cmp ebx, 100 jl .LBB0_1sum is in r12 and the evaluation of *v has been hoisted out of the loop with the result in r15. In the compiled C++ code the value has not been hoisted:
.LBB0_1: # =>This Inner Loop Header: Depth=1 add rbx, qword ptr [r15] call r14 dec ebp jne .LBB0_1
Of course the performance difference in this case would be negligible, but if *v was replaced with a more complicated expression, the difference would increase. (No, I don't know why the Rust code counts up instead of counting down like C++ to get slightly better code; someone should fix that.)
Here it's really clear that the semantics of Rust are making this optimization possible. In C++ v could be a reference to a global variable which is modified by the callback function, in which case hoisting the load would be incorrect.
Several people pointed out that with LTO/WPO a C++ compiler might be able to determine that such cases don't happen, in this case that no function passed into callback directly or indirectly modifies v. That's what I meant in my previous post by "a C++ compiler could do some analysis to show that can't happen" ... but as I continued, "those analyses don't scale". The larger and more complicated your code gets, the more likely it is that static analysis will fail to prove the invariants you care about. Sometimes, due to dynamic linking or run-time code generation, relevant code just isn't available to be analyzed. In other cases the analysis algorithms aren't sophisticated enough to prove what you care about (often because no such algorithm is known), or adequate algorithms exist but they were too expensive to deploy. Perhaps the worst cases for sophisticated global analysis are when the property you care about isn't even true, because you accidentally violated it in some part of the program, but you'll never know, because the only effect of that accident is that performance got a little bit worse somewhere else. (E.g. with this example if you accidentally change one of the callback functions to indirectly modify v, your hypothetical WPO compiler does a slightly worse job and you won't know why.)
What we need are language-level invariants that apply globally and support aggressive optimization using only local analysis. C++ has some, but Rust's are stronger. We also need those invariants to be checked and enforced, not just "it's undefined behavior if you do this"; (safe) Rust wins again on that.
Comments