Six basic cache optimizations

Six basic cache optimizations. Reducing miss rate: larger block size, larger cache size, and higher associativity. Reducing miss penalty: multilevel caches and giving reads priority over writes. Reducing the time to hit in the cache: avoiding address translation when indexing the cache.

3Cs. Three categories of misses: Compulsory (or cold-start) misses: the very first access to a block cannot be in the cache, so the block must be brought into the cache. These misses occur even with an infinite amount of cache.

Capacity. If the cache cannot contain all the blocks needed during the execution of a program, capacity misses (in addition to compulsory misses) will occur because blocks are discarded and later retrieved. These misses occur even in a fully associative cache.

Conflict. If the block placement strategy is set associative or direct mapped, conflict misses (in addition to compulsory and capacity misses) will occur because a block may be discarded and later retrieved if too many blocks map to its set. These misses appear when going from a fully associative cache to a set-associative or direct-mapped cache; the sketch below makes the conflict case concrete.
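As a minimal sketch (cache parameters and the access pattern here are invented for illustration), a toy direct-mapped cache in C shows conflict misses directly: two addresses that map to the same line keep evicting each other, producing misses that a fully associative cache of the same size would not suffer.

    #include <stdio.h>

    /* Toy direct-mapped cache: 8 lines of 64-byte blocks (illustrative sizes). */
    #define NUM_LINES 8
    #define BLOCK_SIZE 64

    int main(void) {
        long tags[NUM_LINES];
        int valid[NUM_LINES] = {0};
        int misses = 0;

        /* Alternate between two addresses whose block addresses differ by
           exactly NUM_LINES, so both map to line 0 and evict each other:
           every access after the first two is a conflict miss. */
        long pattern[] = {0, NUM_LINES * BLOCK_SIZE,
                          0, NUM_LINES * BLOCK_SIZE,
                          0, NUM_LINES * BLOCK_SIZE};
        int n = sizeof pattern / sizeof pattern[0];

        for (int i = 0; i < n; i++) {
            long block = pattern[i] / BLOCK_SIZE;   /* block address */
            int line = block % NUM_LINES;           /* direct-mapped index */
            if (!valid[line] || tags[line] != block) {
                misses++;                           /* miss: fill the line */
                valid[line] = 1;
                tags[line] = block;
            }
        }
        /* 6 accesses, 6 misses: 2 compulsory + 4 conflict. A fully
           associative cache with 8 lines would have only the 2 compulsory. */
        printf("%d accesses, %d misses\n", n, misses);
        return 0;
    }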

Conclusion. The compulsory miss rate of the SPEC 2000 programs is very small. Fully associative caches get rid of conflict misses, but are expensive in hardware and often slow the processor clock rate, implying lower overall performance. Capacity misses can be reduced by enlarging the cache. Note: a technique that reduces the miss rate may also increase the hit time or the miss penalty.

Techniques for reducing miss rate. Larger block sizes reduce compulsory misses (by exploiting spatial locality) BUT increase the miss penalty, and a larger block size means fewer blocks fit in the cache (hence more conflict misses). A short average-memory-access-time calculation below illustrates the tradeoff.
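The tradeoff can be quantified with the standard formula AMAT = hit time + miss rate x miss penalty. A quick sketch, where the miss rates and latencies are invented numbers for illustration, not measurements:

    #include <stdio.h>

    /* AMAT = hit time + miss rate * miss penalty.
       All numbers below are invented for illustration. */
    int main(void) {
        double hit_time = 1.0;            /* cycles */

        /* Small blocks: more compulsory misses, cheaper refills. */
        double miss_rate_16B  = 0.070, penalty_16B  = 42.0;
        /* Large blocks: fewer compulsory misses, costlier refills
           (and, in a small cache, more conflict misses). */
        double miss_rate_256B = 0.050, penalty_256B = 72.0;

        printf("AMAT, 16B blocks:  %.2f cycles\n",
               hit_time + miss_rate_16B * penalty_16B);    /* 1 + 2.94 = 3.94 */
        printf("AMAT, 256B blocks: %.2f cycles\n",
               hit_time + miss_rate_256B * penalty_256B);  /* 1 + 3.60 = 4.60 */
        return 0;
    }

Even though the larger block lowers the miss rate here, the higher miss penalty makes the overall AMAT worse.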

Larger caches help to reduce capacity misses. Disadvantages: higher hit time and higher cost. Popular for off-chip caches; caches today are larger than entire main memories were a decade ago. (The Pentium 4 E has a 1 MB L2 cache.)

Higher associativity. Data suggests that 8-way set associative is often as good as fully associative. The 2:1 cache rule of thumb (holds for cache sizes up to 128 KB): a direct-mapped cache of size N has about the same miss rate as a 2-way set-associative cache of size N/2. Problem: higher associativity increases hit time. The sketch below shows how associativity changes cache indexing.
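A minimal sketch of the indexing, assuming a 32 KB cache with 64-byte blocks (both sizes invented for illustration): at fixed capacity, doubling the associativity halves the number of sets, so the set index uses one fewer address bit.

    #include <stdio.h>

    #define CACHE_BYTES (32 * 1024)  /* assumed 32 KB cache */
    #define BLOCK_BYTES 64           /* assumed 64-byte blocks */

    /* Set index for an n-way set-associative cache: at the same
       capacity, more ways means fewer sets. */
    unsigned set_index(unsigned long addr, int ways) {
        unsigned long block = addr / BLOCK_BYTES;
        unsigned sets = CACHE_BYTES / (BLOCK_BYTES * ways);
        return block % sets;
    }

    int main(void) {
        unsigned long addr = 0x4340;  /* block address 269 */
        printf("direct-mapped: set %u of %d\n",
               set_index(addr, 1), CACHE_BYTES / BLOCK_BYTES);        /* set 269 of 512 */
        printf("2-way:         set %u of %d\n",
               set_index(addr, 2), CACHE_BYTES / (BLOCK_BYTES * 2));  /* set 13 of 256  */
        return 0;
    }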

Multilevel caches. CPUs get faster, so cache speed needs to keep up with the CPU; at the same time, caches need to be BIGGER to reduce misses. We need both, but a single cache can hardly be both fast and big.

How to solve the bigger-and-faster problem? Add a second level of cache. The first-level cache is small enough to match the clock cycle time of the fast CPU; the second-level cache should be large enough to capture many accesses that would otherwise go to main memory. How do we analyze performance?

Miss rate. The local miss rate is the number of misses in a given cache divided by the total number of memory accesses to this cache. For L1 this is Miss rate(L1) = Misses(L1) / CPU memory accesses. For L2 it is Miss rate(L2) = Misses(L2) / Accesses to L2 = Misses(L2) / Misses(L1).
Miss rate. The local miss rate is large for second-level caches because the first-level cache skims the "juicy" memory accesses. The global miss rate is a more useful measure: it is the fraction of all CPU memory accesses that must go all the way to memory. For L1 the global and local miss rates coincide; for L2 the global miss rate is Miss rate(L1) x Miss rate(L2). The arithmetic is sketched below.
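A small sketch of the two-level arithmetic, with invented event counts and latencies: local miss rates of both levels, the global L2 miss rate, and the resulting two-level AMAT = hit time(L1) + Miss rate(L1) x (hit time(L2) + Miss rate(L2) x miss penalty(L2)).

    #include <stdio.h>

    /* Invented counts for illustration: 1000 CPU memory accesses. */
    int main(void) {
        double accesses  = 1000.0;
        double l1_misses = 40.0;   /* accesses that miss in L1 and go to L2 */
        double l2_misses = 20.0;   /* of those, accesses that also miss in L2 */

        double l1_local  = l1_misses / accesses;   /* 0.04: also L1's global rate */
        double l2_local  = l2_misses / l1_misses;  /* 0.50: large; L1 skimmed the hits */
        double l2_global = l2_misses / accesses;   /* 0.02 = l1_local * l2_local */

        /* Assumed latencies (cycles), for illustration only. */
        double l1_hit = 1.0, l2_hit = 10.0, mem_penalty = 200.0;
        double amat = l1_hit + l1_local * (l2_hit + l2_local * mem_penalty);

        printf("L1 local/global miss rate: %.3f\n", l1_local);
        printf("L2 local miss rate:        %.3f\n", l2_local);
        printf("L2 global miss rate:       %.3f\n", l2_global);
        printf("AMAT: %.2f cycles\n", amat);  /* 1 + 0.04*(10 + 0.5*200) = 5.40 */
        return 0;
    }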

L1 and L2. If data is in the L1 cache, should it be in the L2 cache too? Multilevel inclusion (L1 data is always present in L2) simplifies consistency between I/O and caches (or among caches): only L2 needs to be checked. The drawback: statistics suggest it is a good idea to have small block sizes for the L1 cache and bigger block sizes for the L2 cache, which complicates maintaining inclusion. Still doable; the Pentium 4 does it.

Multilevel exclusion. L1 data is NEVER found in the L2 cache. If the designer can only afford an L2 cache that is slightly bigger than the L1 cache, they don't want to waste L2 space on duplicates. The AMD Athlon uses exclusion. A toy sketch of the data movement follows.
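A minimal sketch of an exclusive hierarchy's data movement, under simplifying assumptions (tiny fully associative levels, invented sizes, and a naive replacement policy): on an L2 hit the block moves up to L1 and leaves L2, and an L1 victim drops down into L2 instead of being discarded, so the two levels hold distinct blocks.

    #include <stdio.h>

    /* Toy exclusive two-level hierarchy: a block lives in L1 or L2,
       never both. Sizes and policies are invented for illustration. */
    #define L1_LINES 2
    #define L2_LINES 4

    static int find(long *c, int n, long b) {
        for (int i = 0; i < n; i++)
            if (c[i] == b) return i;
        return -1;
    }

    static void access_block(long *l1, long *l2, long b) {
        if (find(l1, L1_LINES, b) >= 0) return;       /* L1 hit */
        int j = find(l2, L2_LINES, b);
        if (j >= 0) l2[j] = -1;                       /* L2 hit: promote; L2 slot freed */
        long victim = l1[L1_LINES - 1];               /* L1 evicts its oldest line */
        for (int i = L1_LINES - 1; i > 0; i--)
            l1[i] = l1[i - 1];
        l1[0] = b;
        if (victim >= 0) {                            /* the victim is not discarded: */
            int k = find(l2, L2_LINES, -1);           /* it drops down into L2, so    */
            l2[k >= 0 ? k : 0] = victim;              /* L1+L2 hold distinct blocks   */
        }
    }

    int main(void) {
        long l1[L1_LINES] = {-1, -1}, l2[L2_LINES] = {-1, -1, -1, -1};
        for (long b = 0; b < 5; b++) access_block(l1, l2, b);
        /* Blocks 0..4 touched: 4 and 3 end up in L1, 0..2 in L2, no duplicates. */
        printf("L1: %ld %ld  L2: %ld %ld %ld %ld\n",
               l1[0], l1[1], l2[0], l2[1], l2[2], l2[3]);
        return 0;
    }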

Giving priority to read misses over writes. Serve read misses before earlier buffered writes have completed; but then a read may need a value that is still sitting in the write buffer. Assume a direct-mapped cache in which addresses 512 and 1024 map to the same block:

    SW R3, 512(R0)   ; write R3; the value sits in the write buffer
    LW R1, 1024(R0)  ; misses; replaces the block containing 512
    LW R2, 512(R0)   ; misses; reading memory directly returns the stale value

R2 should end up equal to R3: a read-after-write hazard through memory.

Solution. Either wait until the write buffer is empty on a read miss, OR check the contents of the write buffer on a read miss: if there is no conflict and the memory system is available, let the read miss continue. The same check also reduces the cost in a write-back cache if the needed content is found in the write buffer.
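A sketch of the buffer check (the four-entry buffer and its layout are assumptions for illustration): on a read miss, scan the write buffer for a pending write to the missed address and, if one matches, forward its value instead of reading a stale copy from memory.

    #include <stdio.h>

    #define WB_ENTRIES 4           /* assumed write-buffer depth */

    /* A pending write: address and the data waiting to reach memory. */
    struct wb_entry { long addr; long data; int valid; };

    /* On a read miss, check the write buffer before going to memory:
       if a pending write matches the address, return its data. */
    int read_miss_check(struct wb_entry *wb, long addr, long *out) {
        for (int i = 0; i < WB_ENTRIES; i++) {
            if (wb[i].valid && wb[i].addr == addr) {
                *out = wb[i].data;  /* conflict: forward the buffered value */
                return 1;
            }
        }
        return 0;                   /* no conflict: read miss may proceed */
    }

    int main(void) {
        /* Recreate the slide's example: the SW to address 512 is still
           buffered when the LW from 512 misses; forwarding returns the
           new value rather than the stale memory copy. */
        struct wb_entry wb[WB_ENTRIES] = { { .addr = 512, .data = 7, .valid = 1 } };
        long v;
        if (read_miss_check(wb, 512, &v))
            printf("forwarded from write buffer: %ld\n", v);   /* prints 7 */
        return 0;
    }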
