I looked into the frequency of register usage in x86-64. I disassembled optimized and debug builds of Firefox's libxul.so and counted, for each general-purpose register, the number of instructions referencing that register. The graph below shows those results, normalized to a fraction of the total number of instructions in each binary.
These results severely undercount the actual usage of rsp, since it's implicitly used in every push, pop, call and ret instruction, but otherwise should be accurate. 8, 16 and 32-bit registers were counted as the 64-bit register they belong to.
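The counting step described above can be sketched roughly as follows. This is not the script used for the measurements; it is a minimal reconstruction that assumes AT&T-syntax `objdump -d` output (registers written with a `%` prefix, tab-separated address, bytes and instruction fields). The names `build_alias_map`, `instruction_text` and `register_fractions` are made up for illustration.

```python
import re
from collections import Counter


def build_alias_map():
    """Map every 8/16/32/64-bit register name to its 64-bit parent."""
    aliases = {}
    # rax, rbx, rcx, rdx and their eax/ax/al/ah-style sub-registers.
    for letter in "abcd":
        full = f"r{letter}x"
        for name in (full, f"e{letter}x", f"{letter}x", f"{letter}l", f"{letter}h"):
            aliases[name] = full
    # rsi, rdi, rbp, rsp and their esi/si/sil-style sub-registers.
    for base in ("si", "di", "bp", "sp"):
        full = f"r{base}"
        for name in (full, f"e{base}", base, f"{base}l"):
            aliases[name] = full
    # r8-r15 and their r8d/r8w/r8b-style sub-registers.
    for n in range(8, 16):
        full = f"r{n}"
        for suffix in ("", "d", "w", "b"):
            aliases[full + suffix] = full
    return aliases


ALIASES = build_alias_map()
REG_TOKEN = re.compile(r"%([a-z0-9]+)")  # assumes AT&T syntax: %rax, %r8d, ...


def instruction_text(line):
    """Return the instruction part of an objdump line, or None for other lines."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:  # headers, labels, blank lines, byte-only continuation lines
        return None
    text = "\t".join(fields[2:]).strip()
    return text or None


def register_fractions(lines):
    """Fraction of instructions that explicitly reference each 64-bit register."""
    counts = Counter()
    total = 0
    for line in lines:
        text = instruction_text(line)
        if text is None:
            continue
        total += 1
        # A set, so an instruction counts once per register even if it names it twice.
        counts.update({ALIASES[t] for t in REG_TOKEN.findall(text) if t in ALIASES})
    return {reg: n / total for reg, n in counts.items()} if total else {}
```

Feeding it the lines of `objdump -d libxul.so` would give one fraction per register. Like the numbers above, it only sees explicit operands, so the implicit rsp use in push, pop, call and ret is not counted.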
- Debug builds use rax and rbp a lot more than optimized builds. This is because gcc debug builds keep local variables in memory and reference them via rbp. Optimized builds use register allocation to spread locals across registers and make far fewer memory references.
- Optimized builds make far more use of r12-r15 than r8-r11. I think that's because r12-r15 are four of the six callee-saves registers in the System V x86-64 ABI (the others are rbx and rbp), so the third callee-saves register you allocate will be r12. On the other hand rdi, rsi, rax, rdx and rcx are all caller-saves so you don't get into r8 etc until you need a sixth caller-saves register.
- In optimized builds r11 is referenced by less than 0.4% of instructions, so it looks like there's no need for more caller-saves general-purpose registers. Given that the least-used callee-saves register, r15, is still used by 3% of instructions, maybe it would have been a good idea for the x86-64 ABI to make more registers callee-saves instead of caller-saves.
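To make the caller-saves versus callee-saves comparison concrete, here is a small sketch that groups per-register fractions by their System V x86-64 class and picks the least-used register in each. The register lists follow the ABI as described above (rsp is left out as the stack pointer). The helper name `least_used_by_class` and the shape of the `fractions` dict (register name mapped to fraction of instructions) are assumptions for illustration.

```python
# System V x86-64 register classes, excluding rsp.
CALLEE_SAVED = ("rbx", "rbp", "r12", "r13", "r14", "r15")
CALLER_SAVED = ("rax", "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10", "r11")


def least_used_by_class(fractions):
    """Return {class name: (register, fraction)} for the least-used register per class.

    `fractions` maps 64-bit register names to the fraction of instructions
    referencing them; registers missing from the dict count as 0.0.
    """
    result = {}
    for label, registers in (("callee-saved", CALLEE_SAVED),
                             ("caller-saved", CALLER_SAVED)):
        reg = min(registers, key=lambda r: fractions.get(r, 0.0))
        result[label] = (reg, fractions.get(reg, 0.0))
    return result
```

If the least-used callee-saves register is still referenced far more often than the least-used caller-saves one, that is the imbalance the last point above is about.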