Eyes Above The Waves

Robert O'Callahan. Christian. Repatriate Kiwi. Hacker.

Thursday 20 June 2019

Stack Write Traffic In Firefox Binaries

For people who like this sort of thing...

I became interested in how much CPU memory write traffic corresponds to "stack writes". For x86-64 this roughly corresponds to writes that use RSP or RBP as a base register (including implicitly via PUSH/CALL). I thought I had pretty good intuitions about x86 machine code, but the results surprised me.

In a Firefox debug build running a (non-media) DOM test (including browser startup/rendering/shutdown), Linux x86-64, non-optimized (in an rr recording, though that shouldn't matter):

Base registerFraction of written bytes
RAX0.40%
RCX0.32%
RDX0.31%
RBX0.01%
RSP53.48%
RBP44.12%
RSI0.50%
RDI0.58%
R80.01%
R90.00%
R100.00%
R110.00%
R120.00%
R130.00%
R140.00%
R150.00%
RIP0.00%
RDI (MOVS/STOS)0.25%
Other0.00%
RSP/RBP97.59%

Ooof! I expected stack writes to dominate, since non-opt Firefox builds have lots of trivial function calls and local variables live on the stack, but 97.6% is a lot more dominant than I expected.

You would expect optimized builds to be much less stack-dominated because trivial functions have been inlined and local variables should mostly be in registers. So here's a Firefox optimized build:

Base registerFraction of written bytes
RAX1.23%
RCX0.78%
RDX0.36%
RBX2.75%
RSP75.30%
RBP8.34%
RSI0.98%
RDI4.07%
R80.19%
R90.06%
R100.04%
R110.03%
R120.40%
R130.30%
R141.13%
R150.36%
RIP0.14%
RDI (MOVS/STOS)3.51%
Other0.03%
RSP/RBP83.64%

Definitely less stack-dominated than for non-opt builds — but still very stack-dominated! And of course this is not counting indirect writes to the stack, e.g. to out-parameters via pointers held in general-purpose registers. (Note that opt builds could use RBP for non-stack purposes, but Firefox builds with -fno-omit-frame-pointer so only in leaf functions, and even then, probably not.)

It would be interesting to compare the absolute number of written bytes between opt and non-opt builds but I don't have traces running the same test immediately at hand. Non-opt builds certainly do a lot more writes.