ECE4680 Computer Organization and Architecture. Memory Hierarchy: Cache System

Author: Logan Turner
ECE4680 Cache.1

2002-4-17

The Motivation for Caches

Figure: Memory System (Processor, Cache, DRAM)

°Motivation:
  • Large memories (DRAM) are slow
  • Small memories (SRAM) are fast
°Make the average access time small by:
  • Servicing most accesses from a small, fast memory.
°Reduce the bandwidth required of the large memory
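A minimal sketch of why servicing most accesses from the small, fast memory lowers the average access time. It assumes a simplified model in which a hit costs only the fast-memory time and a miss costs only the slow-memory time; the function name and the 10 ns / 100 ns figures are illustrative assumptions, not values from the slides.

```python
def average_access_time(hit_rate, fast_time, slow_time):
    """Weighted average: hits go to the fast memory, misses to the slow one."""
    return hit_rate * fast_time + (1.0 - hit_rate) * slow_time


# Hypothetical timings: 10 ns SRAM cache, 100 ns DRAM.
for hit_rate in (0.0, 0.5, 0.95):
    print(hit_rate, average_access_time(hit_rate, 10.0, 100.0))
```

The average moves toward the fast-memory time as the hit rate rises, which is the point of the slide.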

An Expanded View of the Memory System

Figure: Processor (Control, Datapath) followed by a chain of Memory levels. Moving away from the processor: Speed goes from Fastest to Slowest, Size from Smallest to Biggest, Cost from Highest to Lowest.

Levels of the Memory Hierarchy

Figure: levels listed from the Upper Level down, with columns for Capacity, Access Time, and Cost, and the Staging Xfer Unit between adjacent levels. First entry: CPU Registers, 100s Bytes ...

°DRAM (Read/Write) Cycle Time >> DRAM (Read/Write) Access Time
  • roughly 2:1; why?
°DRAM (Read/Write) Cycle Time:
  • How frequently can you initiate an access?
  • Analogy: A little kid can only ask his father for money on Saturday
°DRAM (Read/Write) Access Time:
  • How quickly will you get what you want once you initiate an access?
  • Analogy: As soon as he asks, his father will give him the money
°DRAM Bandwidth Limitation analogy:
  • What happens if he runs out of money on Wednesday?

Increasing Bandwidth - Interleaving

Access Pattern without Interleaving (figure: CPU, Memory): start access for D1, wait until D1 is available, then start access for D2.

Access Pattern with 4-way Interleaving (figure: CPU, Memory Banks 0 to 3): access Bank 0, Bank 1, Bank 2, Bank 3 in turn; after that we can access Bank 0 again.

Main Memory Performance (M.P. = miss penalty, in clock cycles)
°Timing model
  • 1 to send address,
  • 6 access time, 1 to send data
  • Cache Block is 4 words
°Simple M.P. = 4 x (1+6+1) = 32
°Wide M.P. = 1 + 6 + 1 = 8
°Interleaved M.P. = 1 + 6 + 4x1 = 11
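The three miss penalties follow directly from the timing model, as this short sketch shows. The function and parameter names are illustrative. The wide case assumes memory and bus are as wide as the whole block, and the interleaved case assumes at least one bank per word in the block.

```python
def miss_penalties(send_addr=1, access=6, send_data=1, words_per_block=4):
    """Miss penalty in clock cycles for three main-memory organizations."""
    # Simple: every word pays address + access + transfer, one after another.
    simple = words_per_block * (send_addr + access + send_data)
    # Wide: the whole block is read and transferred in a single access.
    wide = send_addr + access + send_data
    # Interleaved: bank accesses overlap, words are transferred one per cycle.
    interleaved = send_addr + access + words_per_block * send_data
    return {"simple": simple, "wide": wide, "interleaved": interleaved}


print(miss_penalties())
```

With the default arguments this reproduces the 32, 8, and 11 cycles given on the slide.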

Independent Memory Banks

°How many banks?
  number of banks ≥ number of clocks to access a word in a bank
  • For sequential accesses; otherwise the access stream returns to the original bank before it has the next word ready
°Increasing DRAM capacity per chip => fewer chips => harder to have banks
  • Growth in bits/chip of DRAM: 50%-60%/yr
  • Nathan Myhrvold (Microsoft): mature software growth (33%/yr for NT) vs. growth in MB/$ of DRAM (25%-30%/yr)
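A rough sketch of the bank-count rule for sequential accesses. It assumes a simplified model (not from the slides): one word is requested per clock, words are spread round-robin over the banks, and a bank is busy for a fixed number of clocks after each request. The function name is hypothetical.

```python
def stall_cycles(num_banks, bank_busy_clocks, num_words):
    """Count clocks lost waiting on a busy bank during sequential accesses."""
    free_at = [0] * num_banks  # clock at which each bank is ready again
    clock = 0
    stalls = 0
    for word in range(num_words):
        bank = word % num_banks
        if free_at[bank] > clock:
            # We came back to this bank before it finished its last access.
            stalls += free_at[bank] - clock
            clock = free_at[bank]
        free_at[bank] = clock + bank_busy_clocks
        clock += 1  # next request is issued on the following clock
    return stalls


for banks in (4, 6, 8):
    print(banks, stall_cycles(banks, bank_busy_clocks=6, num_words=32))
```

Under this model, stalls disappear once the number of banks reaches the number of clocks a bank needs per word.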

SPARCstation 20’s Memory System

Figure: Memory Modules 0 to 7 sit on the Memory Bus (SIMM Bus), a 128-bit wide datapath. The Memory Controller connects it to the Processor Bus (Mbus), which is 64 bits wide. The Processor Module (Mbus Module) holds the External Cache and the SuperSPARC Processor, which contains the Instruction Cache, Data Cache, and Register File.

SPARCstation 20’s External Cache

Figure: Processor Module (Mbus Module) with the SuperSPARC Processor (Instruction Cache, Data Cache, Register File) and the External Cache (1 MB, Direct Mapped, Write Back, Write Allocate).

°SPARCstation 20’s External Cache:
  • Size and organization: 1 MB, direct mapped
  • Block size: 128 B
  • Sub-block size: 32 B
  • Write Policy: Write back, write allocate
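As a quick check on what these parameters imply, the sketch below derives the number of blocks and the block-offset and index widths for a direct-mapped cache. The function name is illustrative, and sizes are assumed to be powers of two. Tag width is left out because the slide does not give the address width.

```python
def direct_mapped_fields(cache_bytes, block_bytes, sub_block_bytes):
    """Derive basic geometry of a direct-mapped cache (power-of-two sizes)."""
    num_blocks = cache_bytes // block_bytes
    return {
        "blocks": num_blocks,
        # For a power of two n, n.bit_length() - 1 equals log2(n).
        "offset_bits": block_bytes.bit_length() - 1,
        "index_bits": num_blocks.bit_length() - 1,
        "sub_blocks_per_block": block_bytes // sub_block_bytes,
    }


print(direct_mapped_fields(1 << 20, 128, 32))
```

For the 1 MB cache with 128 B blocks this gives 8192 blocks, a 7-bit block offset, a 13-bit index, and 4 sub-blocks per block.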

SPARCstation 20’s Internal Instruction Cache

Figure: Processor Module (Mbus Module) with the SuperSPARC Processor (I-Cache: 20 KB, 5-way; Data Cache; Register File) and the External Cache (1 MB, Direct Mapped, Write Back, Write Allocate).

°SPARCstation 20’s Internal Instruction Cache:
  • Size and organization: 20 KB, 5-way Set Associative
  • Block size: 64 B
  • Sub-block size: 32 B
  • Write Policy: Does not apply
°Note: Sub-block size the same as the External (L2) Cache

SPARCstation 20’s Internal Data Cache

Figure: Processor Module (Mbus Module) with the SuperSPARC Processor (I-Cache: 20 KB, 5-way; D-Cache: 16 KB, 4-way, WT, WNA; Register File) and the External Cache (1 MB, Direct Mapped, Write Back, Write Allocate).

°SPARCstation 20’s Internal Data Cache:
  • Size and organization: 16 KB, 4-way Set Associative
  • Block size: 64 B
  • Sub-block size: 32 B
  • Write Policy: Write through, write not allocate
°Sub-block size the same as the External (L2) Cache

Two Interesting Questions?

Figure: Processor Module (Mbus Module) with the SuperSPARC Processor (I-Cache: 20 KB, 5-way; D-Cache: 16 KB, 4-way, WT, WNA; Register File) and the External Cache (1 MB, Direct Mapped, Write Back, Write Allocate).

°Why did they use N-way set associative caches internally?
  • Answer: An N-way set associative cache is like having N direct mapped caches in parallel. They want each of those N direct mapped caches to be 4 KB, the same as the “virtual page size.”
°How many levels of cache does SPARCstation 20 have?
  • Answer: Three levels. (1) Internal I & D caches, (2) External cache and (3) ...
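The 4 KB-per-way claim can be checked from the sizes quoted for the two internal caches. The sketch below is only that arithmetic, and the function name is illustrative. A common reason for matching the way size to the page size is that the set index then fits within the page offset, but the slides do not spell this out, so treat it as an assumption.

```python
def way_geometry(cache_bytes, ways, block_bytes):
    """Return (bytes per way, number of sets) for a set-associative cache."""
    way_bytes = cache_bytes // ways
    num_sets = way_bytes // block_bytes
    return way_bytes, num_sets


print("I-cache:", way_geometry(20 * 1024, 5, 64))
print("D-cache:", way_geometry(16 * 1024, 4, 64))
```

Both caches come out at 4096 bytes per way, which matches the answer on the slide.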

SPARCstation 20’s Memory Module
°Supports a wide range of sizes:
  • Smallest: 4 MB: 16 2Mb DRAM chips, 8 KB of Page Mode SRAM
  • Biggest: 64 MB: 32 16Mb chips, 16 KB of Page Mode SRAM

Figure: DRAM Chip 0 through DRAM Chip 15, each organized as 512 rows x 512 cols, 256K x 8 = 2 Mb. Each chip has a 512 x 8 bits SRAM and delivers 8 bits toward the Memory Bus.
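The module sizes can be cross-checked against the chip counts. This sketch assumes every chip carries one 512 x 8 bit page-mode SRAM, which the slide shows for the 2 Mb chips and which is consistent with the 16 KB figure for the larger module. The function name is illustrative.

```python
def module_capacity(num_chips, mbit_per_chip, sram_bits_per_chip=512 * 8):
    """Return (DRAM capacity in MB, page-mode SRAM capacity in KB)."""
    dram_mbytes = num_chips * mbit_per_chip // 8           # 8 bits per byte
    sram_kbytes = num_chips * sram_bits_per_chip // 8 // 1024
    return dram_mbytes, sram_kbytes


print("smallest:", module_capacity(16, 2))
print("biggest: ", module_capacity(32, 16))
```

The results agree with the slide: 4 MB with 8 KB of SRAM, and 64 MB with 16 KB of SRAM.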

DRAM Performance
°A 60 ns (tRAC) DRAM can
  • perform a row access only every 110 ns (tRC)
  • perform column access (tCAC) in 15 ns, but time between column accesses is at least 35 ns (tPC).
    - In practice, external address delays and turning around buses make it 40 to 50 ns
°These times do not include the time to drive the addresses off the microprocessor nor the memory controller overhead.
  • Drive parallel DRAMs, external memory controller, bus to turn around, SIMM module, pins…
  • 180 ns to 250 ns latency from processor to memory is good for a “60 ns” (tRAC) DRAM
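A small sketch that turns the quoted chip timings into access rates. It covers only the chip-level numbers (tRAC, tRC, tPC) and ignores the controller and bus overheads listed above. The variable and function names are illustrative.

```python
# Timing parameters quoted on the slide, in nanoseconds.
T_RAC = 60   # row access time
T_RC = 110   # minimum time between row accesses (cycle time)
T_PC = 35    # minimum time between column accesses in page mode


def accesses_per_microsecond(interval_ns):
    """How many accesses can start per microsecond at a given spacing."""
    return 1000.0 / interval_ns


cycle_to_access_ratio = T_RC / T_RAC
row_rate = accesses_per_microsecond(T_RC)
page_rate = accesses_per_microsecond(T_PC)
print(round(cycle_to_access_ratio, 2), round(row_rate, 2), round(page_rate, 2))
```

The cycle time is a little under twice the access time, and page-mode column accesses can be started about three times as often as full row accesses.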

Summary:
°The Principle of Locality: Temporal Locality vs Spatial Locality
°Four Questions For Any Cache
  • Where to place in the cache
  • How to locate a block in the cache
  • Replacement
  • Write policy: Write through vs Write back
    - Write miss:
°Three Major Categories of Cache Misses:
  • Compulsory Misses: sad facts of life. Example: cold start misses.
  • Conflict Misses: increase cache size and/or associativity. Nightmare Scenario: ping pong effect!
  • Capacity Misses: increase cache size
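The ping pong effect can be reproduced with a very small cache model. This is a sketch under stated assumptions: LRU replacement (the slides only name "Replacement" as a question), a toy 1 KB cache with 64 B blocks, and a hypothetical function name.

```python
def count_misses(addresses, num_blocks, block_bytes, ways=1):
    """Count misses in a set-associative LRU cache; ways=1 is direct mapped."""
    num_sets = num_blocks // ways
    sets = [[] for _ in range(num_sets)]  # each list ordered LRU -> MRU
    misses = 0
    for addr in addresses:
        block = addr // block_bytes
        lines = sets[block % num_sets]
        if block in lines:
            lines.remove(block)      # hit: re-appended below as most recent
        else:
            misses += 1
            if len(lines) == ways:
                lines.pop(0)         # evict the least recently used block
        lines.append(block)
    return misses


# Two addresses that map to the same set, accessed alternately.
trace = [0, 1024] * 8
print("direct mapped:", count_misses(trace, num_blocks=16, block_bytes=64, ways=1))
print("2-way:        ", count_misses(trace, num_blocks=16, block_bytes=64, ways=2))
```

In the direct-mapped case every access evicts the block the next access needs, so all 16 accesses miss. With 2-way associativity only the two compulsory misses remain.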
