Thursday 20 June 2019
Stack Write Traffic In Firefox Binaries
For people who like this sort of thing...
I became interested in how much CPU memory write traffic corresponds to "stack writes". For x86-64 this roughly corresponds to writes that use RSP or RBP as a base register (including implicitly via PUSH/CALL). I thought I had pretty good intuitions about x86 machine code, but the results surprised me.
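To make the metric concrete, here is a minimal sketch of the tallying step, not the tool actually used for these measurements. It assumes a hypothetical trace format in which each memory write has already been reduced to a `(base_register, bytes_written)` pair, with implicit stack writes from PUSH/CALL attributed to RSP. The names `write_fractions`, `stack_fraction` and `sample` are made up for illustration.

```python
from collections import Counter

# Base registers treated as "stack writes" in this post.
STACK_BASES = {"RSP", "RBP"}


def write_fractions(writes):
    """Return {base_register: fraction of all written bytes}.

    `writes` is an iterable of (base_register, bytes_written) pairs.
    This input format is an assumption made for the sketch.
    """
    totals = Counter()
    for base, nbytes in writes:
        totals[base] += nbytes
    total_bytes = sum(totals.values())
    if total_bytes == 0:
        return {}
    return {base: count / total_bytes for base, count in totals.items()}


def stack_fraction(writes):
    """Fraction of written bytes whose base register is RSP or RBP."""
    fractions = write_fractions(writes)
    return sum(frac for base, frac in fractions.items() if base in STACK_BASES)


# Hypothetical sample of five writes (32 bytes in total).
sample = [
    ("RSP", 8),  # push rbx: implicit 8-byte write via RSP
    ("RSP", 8),  # call f: return address pushed via RSP
    ("RBP", 4),  # mov [rbp-4], eax
    ("RDI", 4),  # mov [rdi], eax
    ("RAX", 8),  # mov [rax+16], rcx
]

if __name__ == "__main__":
    print(write_fractions(sample))
    print(stack_fraction(sample))
```

Note that the fractions are weighted by bytes written rather than by instruction count, which matches the "Fraction of written bytes" column in the tables.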
Here are the results for a non-optimized Firefox debug build on Linux x86-64, running a (non-media) DOM test, including browser startup, rendering and shutdown (measured in an rr recording, though that shouldn't matter):
| Base register | Fraction of written bytes |
| RAX | 0.40% |
| RCX | 0.32% |
| RDX | 0.31% |
| RBX | 0.01% |
| RSP | 53.48% |
| RBP | 44.12% |
| RSI | 0.50% |
| RDI | 0.58% |
| R8 | 0.01% |
| R9 | 0.00% |
| R10 | 0.00% |
| R11 | 0.00% |
| R12 | 0.00% |
| R13 | 0.00% |
| R14 | 0.00% |
| R15 | 0.00% |
| RIP | 0.00% |
| RDI (MOVS/STOS) | 0.25% |
| Other | 0.00% |
| RSP/RBP | 97.59% |
Ooof! I expected stack writes to dominate, since non-opt Firefox builds have lots of trivial function calls and local variables live on the stack, but 97.6% is a lot more dominant than I expected.
You would expect optimized builds to be much less stack-dominated because trivial functions have been inlined and local variables should mostly be in registers. So here's a Firefox optimized build:
| Base register | Fraction of written bytes |
| RAX | 1.23% |
| RCX | 0.78% |
| RDX | 0.36% |
| RBX | 2.75% |
| RSP | 75.30% |
| RBP | 8.34% |
| RSI | 0.98% |
| RDI | 4.07% |
| R8 | 0.19% |
| R9 | 0.06% |
| R10 | 0.04% |
| R11 | 0.03% |
| R12 | 0.40% |
| R13 | 0.30% |
| R14 | 1.13% |
| R15 | 0.36% |
| RIP | 0.14% |
| RDI (MOVS/STOS) | 3.51% |
| Other | 0.03% |
| RSP/RBP | 83.64% |
Definitely less stack-dominated than for non-opt builds, but still very stack-dominated! And of course this is not counting indirect writes to the stack, e.g. to out-parameters via pointers held in general-purpose registers. (Note that opt builds could use RBP for non-stack purposes, but Firefox builds with -fno-omit-frame-pointer, so that could only happen in leaf functions, and even then it probably doesn't.)
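As a quick sanity check on the two tables, the sketch below recomputes the final RSP/RBP row of each from the separate RSP and RBP rows, using only the percentages quoted above. It uses `Decimal` to avoid floating-point noise. The dictionary layout and the helper name `combined_gap` are just for illustration.

```python
from decimal import Decimal

# Percentages copied from the two tables above.
debug = {"RSP": Decimal("53.48"), "RBP": Decimal("44.12"), "reported": Decimal("97.59")}
opt = {"RSP": Decimal("75.30"), "RBP": Decimal("8.34"), "reported": Decimal("83.64")}


def combined_gap(row):
    """Difference between RSP + RBP and the reported combined row."""
    return row["RSP"] + row["RBP"] - row["reported"]


if __name__ == "__main__":
    print("debug gap:", combined_gap(debug))
    print("opt gap:", combined_gap(opt))
    # Drop in the combined stack share from debug to opt, in percentage points.
    print("drop:", debug["reported"] - opt["reported"])
```

The opt rows add up exactly. The debug rows differ from the reported total by 0.01 percentage points, presumably because each row was rounded independently. The combined stack share falls by 13.95 percentage points between the two builds.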
It would be interesting to compare the absolute number of written bytes between opt and non-opt builds, but I don't have traces of both running the same test immediately at hand. Non-opt builds certainly do a lot more writes.