Thursday 20 June 2019
Stack Write Traffic In Firefox Binaries
For people who like this sort of thing...
I became interested in how much CPU memory write traffic corresponds to "stack writes". For x86-64 this roughly corresponds to writes that use RSP or RBP as a base register (including implicitly via PUSH/CALL). I thought I had pretty good intuitions about x86 machine code, but the results surprised me.
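The post doesn't describe the tooling behind these numbers. As a rough illustration of the classification only, here is a minimal Python sketch that attributes each write to a row label and computes fractions of written bytes. The `Write` record, its fields, and the function names are hypothetical. The sketch assumes some trace tool has already decoded each write's mnemonic, the base register of its destination operand (if explicit), and its size.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


# Hypothetical trace record: one memory write seen while replaying a recording.
@dataclass(frozen=True)
class Write:
    mnemonic: str        # e.g. "mov", "push", "call", "rep stosq"
    base: Optional[str]  # base register of the *destination* operand; None if implicit
    size: int            # bytes written


# Instructions whose destination is implicitly addressed through RSP.
IMPLICIT_RSP_OPS = {"push", "call"}
# String stores implicitly write through RDI; the tables list them separately.
STRING_STORE_OPS = {"movsb", "movsw", "movsd", "movsq",
                    "stosb", "stosw", "stosd", "stosq"}


def category(w: Write) -> str:
    """Map one write to a row label like those used in the tables."""
    if w.base is not None:
        # Explicit memory operand, e.g. mov [rbp-8], rax or an SSE movsd store.
        return w.base.upper()
    parts = w.mnemonic.lower().split()
    op = parts[-1] if parts else ""  # drop prefixes such as "rep" or "lock"
    if op in IMPLICIT_RSP_OPS:
        return "RSP"
    if op in STRING_STORE_OPS:
        return "RDI (MOVS/STOS)"
    return "Other"


def write_fractions(writes: Iterable[Write]) -> Dict[str, float]:
    """Fraction of written *bytes* per category, plus the combined RSP/RBP share."""
    totals: Counter = Counter()
    for w in writes:
        totals[category(w)] += w.size
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    fractions = {name: n / grand_total for name, n in totals.items()}
    # Compute the combined share from byte counts to avoid adding rounded floats.
    fractions["RSP/RBP"] = (totals["RSP"] + totals["RBP"]) / grand_total
    return fractions


if __name__ == "__main__":
    trace = [
        Write("push", None, 8),       # implicit RSP
        Write("call", None, 8),       # return address, implicit RSP
        Write("mov", "RBP", 8),       # e.g. mov [rbp-16], rax
        Write("mov", "RAX", 8),       # e.g. a heap store through rax
        Write("rep stosq", None, 8),  # implicit RDI
    ]
    print(write_fractions(trace))
```

Keying on the destination operand's base register, rather than on the instruction alone, lets a PUSH or CALL count as an RSP write while an SSE `movsd` store with an explicit base is attributed to that base instead of the MOVS/STOS row.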
In a non-optimized Firefox debug build on Linux x86-64, running a (non-media) DOM test that includes browser startup, rendering and shutdown (in an rr recording, though that shouldn't matter):
| Base register | Fraction of written bytes |
|---|---|
| RAX | 0.40% |
| RCX | 0.32% |
| RDX | 0.31% |
| RBX | 0.01% |
| RSP | 53.48% |
| RBP | 44.12% |
| RSI | 0.50% |
| RDI | 0.58% |
| R8 | 0.01% |
| R9 | 0.00% |
| R10 | 0.00% |
| R11 | 0.00% |
| R12 | 0.00% |
| R13 | 0.00% |
| R14 | 0.00% |
| R15 | 0.00% |
| RIP | 0.00% |
| RDI (MOVS/STOS) | 0.25% |
| Other | 0.00% |
| RSP/RBP | 97.59% |
Ooof! I expected stack writes to dominate, since non-opt Firefox builds make lots of trivial function calls and keep local variables on the stack, but I didn't expect them to account for 97.6% of written bytes.
You would expect optimized builds to be much less stack-dominated because trivial functions have been inlined and local variables should mostly be in registers. So here's a Firefox optimized build:
| Base register | Fraction of written bytes |
|---|---|
| RAX | 1.23% |
| RCX | 0.78% |
| RDX | 0.36% |
| RBX | 2.75% |
| RSP | 75.30% |
| RBP | 8.34% |
| RSI | 0.98% |
| RDI | 4.07% |
| R8 | 0.19% |
| R9 | 0.06% |
| R10 | 0.04% |
| R11 | 0.03% |
| R12 | 0.40% |
| R13 | 0.30% |
| R14 | 1.13% |
| R15 | 0.36% |
| RIP | 0.14% |
| RDI (MOVS/STOS) | 3.51% |
| Other | 0.03% |
| RSP/RBP | 83.64% |
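One way to put the two tables side by side is to look at the share of written bytes that does *not* go through RSP or RBP. The short Python check below uses only the RSP, RBP and RSP/RBP figures copied from the two tables; the dictionary and function names are just for illustration. It also confirms that the RSP/RBP row is the sum of the RSP and RBP rows, up to rounding of the printed percentages.

```python
# Percentages copied from the two tables (fraction of written bytes).
NON_OPT = {"RSP": 53.48, "RBP": 44.12, "RSP/RBP": 97.59}
OPT = {"RSP": 75.30, "RBP": 8.34, "RSP/RBP": 83.64}


def non_stack_share(table: dict) -> float:
    """Percent of written bytes whose base register is neither RSP nor RBP."""
    return 100.0 - table["RSP/RBP"]


for name, table in (("non-opt", NON_OPT), ("opt", OPT)):
    # The combined row should match RSP + RBP, up to rounding of the printed figures.
    assert abs(table["RSP"] + table["RBP"] - table["RSP/RBP"]) <= 0.02
    print(f"{name}: {non_stack_share(table):.2f}% of written bytes not via RSP/RBP")

ratio = non_stack_share(OPT) / non_stack_share(NON_OPT)
print(f"the opt build's non-stack share is about {ratio:.1f}x the non-opt build's")
```

This gives 2.41% versus 16.36%, so the non-stack share is roughly 6.8 times larger in the opt build. These are relative shares only; they say nothing about the absolute number of bytes each build writes.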
Definitely less stack-dominated than non-opt builds, but still very stack-dominated! And of course this doesn't count indirect writes to the stack, e.g. writes to out-parameters through pointers held in general-purpose registers. (Opt builds could use RBP for non-stack purposes, but Firefox builds with -fno-omit-frame-pointer, so that could only happen in leaf functions, and even there it probably doesn't.)
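To illustrate why classifying by base register undercounts stack writes, here is a minimal Python sketch that counts stack bytes two ways: by base register, and by whether the written address falls inside the stack. Everything here is hypothetical: the `AddressedWrite` record, the assumption of a single thread whose stack bounds are known, and the addresses in the example. It also only checks each write's start address, so a write straddling a stack boundary is not split.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


# Hypothetical trace record that also carries the effective address written.
@dataclass(frozen=True)
class AddressedWrite:
    base: Optional[str]  # base register of the destination; implicit PUSH/CALL recorded as "RSP"
    address: int         # effective address of the first byte written
    size: int            # bytes written


STACK_BASES = {"RSP", "RBP"}


def stack_bytes(writes: Iterable[AddressedWrite],
                stack_lo: int, stack_hi: int) -> Tuple[int, int]:
    """Return (bytes written via an RSP/RBP base, bytes whose start address is in the stack).

    Assumes a single thread whose stack occupies [stack_lo, stack_hi).
    """
    by_base = 0
    by_address = 0
    for w in writes:
        if w.base is not None and w.base.upper() in STACK_BASES:
            by_base += w.size
        if stack_lo <= w.address < stack_hi:
            by_address += w.size
    return by_base, by_address


if __name__ == "__main__":
    STACK_LO, STACK_HI = 0x7FFF_0000_0000, 0x7FFF_0080_0000  # hypothetical 8 MiB stack
    writes = [
        AddressedWrite("RSP", STACK_HI - 8, 8),      # push / call
        AddressedWrite("RBP", STACK_HI - 32, 4),     # local variable store
        AddressedWrite("RDI", STACK_HI - 24, 8),     # callee fills a caller's out-parameter
        AddressedWrite("RAX", 0x5555_0000_1000, 8),  # heap store
    ]
    print(stack_bytes(writes, STACK_LO, STACK_HI))  # (12, 20)
```

The out-parameter write through RDI lands on the stack but is not counted by the base-register rule, which is the kind of indirect stack write that the tables leave out.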
It would be interesting to compare the absolute number of written bytes between opt and non-opt builds, but I don't have traces of the same test for both builds immediately at hand. Non-opt builds certainly do a lot more writes.