Saturday, 8 August 2020

Scaling Debuginfo For Zero-Cost Abstractions

In yesterday's post I raised the issue that Rust and C++ promise zero-cost abstractions, but with the standard build configurations people use today, you have to choose between "zero-cost abstractions" and "fast and debuggable builds". Reflecting on this a bit more, I realized that the DWARF debuginfo format used with ELF binaries inherently conflicts with the goal of having both zero-cost abstractions and fast debuggable builds, because of the way it handles inline functions (and possibly for other reasons).

Suppose we have a non-inline function F calling non-inline function G, at K sites, each time via a stack of N inline trivial wrapper functions I1 ... IN. Zero-cost abstraction means this compiles to the same code as F calling G directly K times. However, correct DWARF debuginfo for F must contain KxN explicit DW_TAG_inlined_subroutines, one for each instance of an inlined function :-(. Tracking and then emitting all this debuginfo will necessarily slow down the build, and make it slower as you add more layers of abstraction. Maybe this isn't (yet) a problem in practice compared to the cost of optimizations, but it is a fundamental limitation. Fixing it would probably require changing the debuginfo format so that, at least for the "minimal optimizations for fast debuggable builds" mode, the debuginfo for an inline function can be emitted just once and then parameterized for each call site where it is inlined. I don't have a clear idea of how that would work! This probably means the format and emission of debuginfo needs to be carefully codesigned with the optimizer to achieve fast debuggable builds with zero-cost abstractions.


  1. Seems you want no-inline flags!? And, linker can merge duplicated functions, so as their debug info.

  2. No, we need inlining because without inlining "zero-cost abstractions" aren't zero-cost.

    Unfortunately linkers don't modify debuginfo at all during linking, so merging identical functions doesn't merge their debuginfo. Anyway that won't help since in my example there are no duplicate functions to merge.

  3. Great couple of posts... thanks.

    My first reaction is to wonder how much this matters. The few bytes that an inlined_subroutine_instance *has* to take up, even if multiplied up by K call sites, might not add up to very much. I'm not sure. This whole thing is bounde by the structural source-level complexity that arises in "real programs"... people don't build infinitely deep towers, even though they ought to be able to. And of course DWARF-level stuff like the factoring out of "abstract origins" is intended to help with this. Maybe a better encoding at the DWARF level is a "good enough"?

    This reminds me of a related problem: when you build these big abstraction towers, at least when they involve functions, the debugging experience becomes miserable because "step into" becomes too crude -- I might want to step into some part of an expression, like a "real" function call of interest, but not into the bowels of the smart pointer manipulations (or whatever) involved in that expression. I've suspected the answer is to create a notion of "layers"... I could "step over" some layers but "into" others. But maybe it would also let the compiler "cut" the debuggability at some layer, so that stepping below that level it is not possible and debug info can be discarded earlier? Not comfortable for me to advocate this, since normally I advocate full source-level recovery. I guess it comes back to how much of the DWARF storage and compile-time costs are really essential.

    1. Good questions. I don't have real answers but I do think the size of debuginfo relative to the the size of optimized binaries indicates we have a real problem.

      The step-over/step-into abstraction problem is not a big issue in my mind because I think step-over/step-into are fundamentally misfeatures. With omniscient debugging we can offer alternative features that are just better, e.g.