Wednesday, 31 August 2016

Avoiding Cache Writebacks For Freed Memory

I wonder how much memory traffic is generated by the CPU writing out evicted cache lines for memory locations the application knows are dead because they belong to freed memory. This would be interesting to measure. If it's significant, then perhaps the application (or some lower level of the software stack) should try more aggressively to reuse freed memory, or perhaps it would be valuable to have hardware support for invalidating cache lines without writing them back. I suppose the latter is problematic, though, since it would make buggy software nondeterministic: a read of freed memory after the invalidation would return whatever happened to be in RAM, which depends on when earlier evictions occurred. At the OS level, Linux has madvise(..., MADV_DONTNEED) to discard unwanted pages, but I'm not aware of anything analogous for the CPU cache.
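For reference, here is a minimal sketch of the page-level mechanism mentioned above. It assumes Linux and a private anonymous mapping, where pages discarded with MADV_DONTNEED are zero-filled on the next access. The helper name discard_page_demo is made up for illustration.

```c
#define _DEFAULT_SOURCE  /* exposes madvise() and MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: dirty one anonymous page, tell the kernel its
 * contents are dead, and return the first byte read back afterwards.
 * Returns -1 if the mapping or the madvise() call fails. */
static int discard_page_demo(void) {
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t len = (size_t)page;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }

    memset(p, 0xAB, len);  /* make the page dirty */

    /* The kernel may now drop the page instead of preserving its contents. */
    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }

    /* Assumption: Linux semantics for private anonymous memory, where the
     * next access faults in a fresh zero-filled page. */
    int first = p[0];
    munmap(p, len);
    return first;
}
```

This only tells the kernel that the page is dead. It says nothing to the CPU about dirty lines for that page that are still sitting in cache, which is the gap being discussed here.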

Update: Xi Yang pointed me to a paper on exactly this topic: Cooperative Cache Scrubbing.

"For modern Java programs, 10 to 60% of DRAM writes are useless, because the data on these lines are dead - the program is guaranteed to never read them again."
That's a lot more than I expected! It would be very interesting to know what these numbers are for C++ or Rust programs. If the numbers hold up, it sounds like it would be worth adding some kind of hardware support to quash these writes. It would be nice to avoid the problem of nondeterministic behavior for buggy software; I wonder if you could have a "zero cache line" instruction that sets the cache line to zero at each level of cache, marks the lines as clean, and writes zeroes to RAM, all more efficiently than a set of non-temporal writes of zero. Actually you might want to be able to do this for a large range of virtual addresses all at once, since programs often free large chunks of memory.
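As a point of comparison, the "set of non-temporal writes of zero" baseline could look roughly like the sketch below. It assumes an x86 target with SSE2 and falls back to a plain memset elsewhere. The function name zero_range_nontemporal is hypothetical, and the hoped-for "zero cache line" instruction itself is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_stream_si128, _mm_setzero_si128, _mm_sfence */
#endif

/* Hypothetical helper: zero [p, p + len) using non-temporal stores.
 * Precondition: p is 16-byte aligned and len is a multiple of 16. */
static void zero_range_nontemporal(void *p, size_t len) {
    assert(((uintptr_t)p % 16) == 0 && (len % 16) == 0);
#if defined(__SSE2__)
    __m128i zero = _mm_setzero_si128();
    __m128i *dst = (__m128i *)p;
    for (size_t i = 0; i < len / 16; i++) {
        /* Streaming store: a hint that the data should not be kept in cache. */
        _mm_stream_si128(&dst[i], zero);
    }
    _mm_sfence();  /* order the streaming stores before later stores */
#else
    /* Assumption: no SSE2 available, so use ordinary cached stores instead. */
    memset(p, 0, len);
#endif
}
```

Note that this still sends the zeroes to memory one 16-byte store at a time, so it is only the baseline that a dedicated instruction, or a variant covering a whole address range, would have to beat.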


  1. "trying more aggressively to reuse freed memory"
    Quick thoughts on just this:
    - The stack should naturally do that already, so that's good for parameters and local variables.
    - The heap could be managed in such a way that recently-freed memory is reused first. It would be interesting to know if that's already the case on existing platforms.
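
The heap idea above, reusing the most recently freed memory first, can be sketched as a fixed-size pool with a LIFO free list. This is only an illustration and not a description of any existing platform allocator. The names (lifo_pool_t, pool_alloc, pool_free) and the 64-byte block size are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 64   /* assumed: one block per typical 64-byte cache line */
#define NUM_BLOCKS 128

/* A free block stores the link to the next free block in its own bytes. */
typedef union block {
    union block *next;
    unsigned char bytes[BLOCK_SIZE];
} block_t;

typedef struct {
    block_t storage[NUM_BLOCKS];
    block_t *free_list;  /* head is the most recently freed block */
} lifo_pool_t;

/* Put every block on the free list, lowest address at the head. */
static void pool_init(lifo_pool_t *pool) {
    pool->free_list = NULL;
    for (size_t i = NUM_BLOCKS; i > 0; i--) {
        pool->storage[i - 1].next = pool->free_list;
        pool->free_list = &pool->storage[i - 1];
    }
}

/* Pop the head of the free list, or return NULL if the pool is exhausted. */
static void *pool_alloc(lifo_pool_t *pool) {
    block_t *b = pool->free_list;
    if (b == NULL) {
        return NULL;
    }
    pool->free_list = b->next;
    return b;
}

/* Push the block on the head, so it is the next one handed out. */
static void pool_free(lifo_pool_t *pool, void *p) {
    assert(p != NULL);
    block_t *b = (block_t *)p;
    b->next = pool->free_list;
    pool->free_list = b;
}
```

With this policy a block freed a moment ago is the first one returned by the next pool_alloc, so it is more likely to be overwritten while its lines are still in cache.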

  2. My student looked into 'unread writes' via dynamic binary translation a few years ago. Appeared at RV 2012: "Detecting Unread Memory using Dynamic Binary Translation." Summary: not as many as we'd hoped.

    1. Interesting, but not the same thing that I'm considering here. I'm wondering about writes that may or may not be read, but where the last read (if any) occurs before the cache line is written back to RAM.