I wonder how much memory traffic is generated by the CPU writing out evicted cache lines for memory locations the application knows are dead because they belong to freed memory. This would be interesting to measure. If it's significant, then perhaps the application (or some lower level of the software stack) should be trying more aggressively to reuse freed memory, or perhaps it would be valuable to have hardware support for invalidating cache lines without writing them back. Though I suppose the latter case is problematic since it would make buggy software nondeterministic. At the OS level Linux has madvise(..., MADV_DONTNEED) to clear unwanted pages, but I'm not aware of anything analogous for the CPU cache.
Update Xi Yang pointed me at a paper on exactly this topic: Cooperative Cache Scrubbing.
For modern Java programs, 10 to 60% of DRAM writes are useless, because the data on these lines are dead - the program is guaranteed to never read them again.That's a lot more than I expected! It would be very interesting to know what these numbers are for C++ or Rust programs. If the numbers hold up, it sounds like it would be worth adding some kind of hardware support to quash these writes. It would be nice to avoid the problem of nondeterministic behavior for buggy software; I wonder if you could have a "zero cache line" instruction that sets the cache line to zero at each level of cache, marks the lines as clean, and writes zeroes to RAM, all more efficiently than a set of non-temporal writes of zero. Actually you might want to be able to do this for a large range of virtual addresses all at once, since programs often free large chunks of memory.