2019-06-20
For people who like this sort of thing...
I became interested in how much CPU memory write traffic corresponds to "stack writes". For x86-64 this roughly corresponds to writes that use RSP or RBP as a base register (including implicitly via PUSH/CALL). I thought I had pretty good intuitions about x86 machine code, but the results surprised me.
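To make the metric concrete, here is a minimal sketch of the tally, assuming each memory write has already been reduced to a (base register, bytes written) pair. The record format, the function names and the sample data are all hypothetical; this is not the tooling that produced the numbers in this post.

```python
from collections import Counter

# Hypothetical record format: (base_register, bytes_written).
# Assumption: implicit stack writes (PUSH, CALL) are recorded with base "RSP".
STACK_BASES = {"RSP", "RBP"}


def written_byte_fractions(writes):
    """Return {base_register: fraction of all written bytes}."""
    totals = Counter()
    for base, size in writes:
        totals[base] += size
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {base: n / grand_total for base, n in totals.items()}


def stack_fraction(writes):
    """Fraction of written bytes whose base register is RSP or RBP."""
    fractions = written_byte_fractions(writes)
    return sum(f for base, f in fractions.items() if base in STACK_BASES)


# Made-up sample, not data from the trace.
sample_writes = [
    ("RSP", 8),  # push rbx
    ("RSP", 8),  # call: pushes the return address
    ("RBP", 4),  # mov dword ptr [rbp-4], eax
    ("RDI", 4),  # mov dword ptr [rdi], eax
]

# 20 of the 24 sample bytes are written through RSP or RBP.
print(f"{stack_fraction(sample_writes):.2%}")
```

Note that the weighting is by bytes written, not by instruction count, so one 8-byte PUSH counts twice as much as a 4-byte store.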
Here are the results for a non-optimized Firefox debug build on Linux x86-64, running a (non-media) DOM test, including browser startup, rendering and shutdown (measured in an rr recording, though that shouldn't matter):
|Base register|Fraction of written bytes|
|---|---|
|RAX|0.40%|
|RCX|0.32%|
|RDX|0.31%|
|RBX|0.01%|
|RSP|53.48%|
|RBP|44.12%|
|RSI|0.50%|
|RDI|0.58%|
|R8|0.01%|
|R9|0.00%|
|R10|0.00%|
|R11|0.00%|
|R12|0.00%|
|R13|0.00%|
|R14|0.00%|
|R15|0.00%|
|RIP|0.00%|
|RDI (MOVS/STOS)|0.25%|
|Other|0.00%|
|RSP/RBP|97.59%|
Ooof! I expected stack writes to dominate, since non-opt Firefox builds have lots of trivial function calls and local variables live on the stack, but 97.6% is a lot more dominant than I expected.
You would expect optimized builds to be much less stack-dominated because trivial functions have been inlined and local variables should mostly be in registers. So here's a Firefox optimized build:
|Base register|Fraction of written bytes|
|---|---|
|RAX|1.23%|
|RCX|0.78%|
|RDX|0.36%|
|RBX|2.75%|
|RSP|75.30%|
|RBP|8.34%|
|RSI|0.98%|
|RDI|4.07%|
|R8|0.19%|
|R9|0.06%|
|R10|0.04%|
|R11|0.03%|
|R12|0.40%|
|R13|0.30%|
|R14|1.13%|
|R15|0.36%|
|RIP|0.14%|
|RDI (MOVS/STOS)|3.51%|
|Other|0.03%|
|RSP/RBP|83.64%|
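As a quick check on the combined rows, the arithmetic can be redone from the per-register percentages. The snippet below uses only the RSP and RBP rows of the two tables above. Summing the rounded non-opt rows gives 97.60% rather than the 97.59% shown, which is presumably just rounding in the individual rows.

```python
# Percentages copied from the RSP and RBP rows of the two tables above.
non_opt = {"RSP": 53.48, "RBP": 44.12}
opt = {"RSP": 75.30, "RBP": 8.34}


def stack_share(build):
    """Combined RSP + RBP share of written bytes, in percent."""
    return round(build["RSP"] + build["RBP"], 2)


for name, build in (("non-opt", non_opt), ("opt", opt)):
    share = stack_share(build)
    print(f"{name}: stack {share:.2f}%, everything else {100 - share:.2f}%")
```

Seen from the other side, the share of writes that do not go through RSP or RBP grows from roughly 2.4% to roughly 16.4% of written bytes.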
Definitely less stack-dominated than for non-opt builds, but still very stack-dominated! And of course this is not counting indirect writes to the stack, e.g. to out-parameters via pointers held in general-purpose registers. (Note that opt builds could use RBP for non-stack purposes, but Firefox builds with -fno-omit-frame-pointer, so that could only happen in leaf functions, and even then it probably doesn't.)
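One way to capture those indirect writes would be to classify by destination address instead of by base register. The sketch below assumes each write record also carries its effective address and that the bounds of a single stack mapping are known. All names, addresses and records here are hypothetical, and a real process has one stack per thread, so this only illustrates the idea.

```python
def is_stack_write(addr, size, stack_low, stack_high):
    """True if the written range [addr, addr + size) lies within [stack_low, stack_high)."""
    return stack_low <= addr and addr + size <= stack_high


# Hypothetical bounds of one thread's stack mapping.
STACK_LOW, STACK_HIGH = 0x7FFC_0000_0000, 0x7FFC_0080_0000

# Hypothetical records: (base_register, effective_address, bytes_written).
writes = [
    ("RSP", 0x7FFC_007F_FF00, 8),  # ordinary spill via RSP
    ("RDI", 0x7FFC_007F_FE80, 4),  # out-parameter pointing at a caller's local
    ("RAX", 0x5555_0000_1000, 8),  # heap write
]

by_base = sum(size for base, _, size in writes if base in ("RSP", "RBP"))
by_address = sum(
    size for _, addr, size in writes
    if is_stack_write(addr, size, STACK_LOW, STACK_HIGH)
)

# The RDI write lands on the stack but is missed by the base-register rule.
print(by_base, by_address)
```

Under this rule the base-register numbers above would be a lower bound on stack traffic, since any write through RSP or RBP in a frame-pointer build also lands in the stack range.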
It would be interesting to compare the absolute number of written bytes between opt and non-opt builds, but I don't have traces running the same test immediately at hand. Non-opt builds certainly do a lot more writes.
