One Word Wide Memory Organization. Memory Hierarchies.

Author: Debra Reynolds

Memory Hierarchies

Outline of Lectures on Memory Systems
1. Memory Hierarchies
2. Cache Memory
3. Virtual Memory
4. The future


One Word Wide Memory Organization

[Figure: on-chip CPU and Cache, connected by a bus to Memory]

• If the block size is one word, then for a memory access due to a cache miss, the pipeline will have to stall the number of cycles required to return one data word from memory
    1 cycle to send address
    25 cycles to read DRAM
    1 cycle to return data
    27 total clock cycles miss penalty
• Number of bytes transferred per clock cycle (bandwidth) for a single miss is 4/27 = 0.148 bytes per clock

One Word Wide Memory Organization, cont'd

[Figure: on-chip CPU and Cache, connected by a bus to Memory; four back-to-back DRAM accesses of 25 cycles each]

• What if the block size is four words?
    1 cycle to send 1st address
    4 x 25 = 100 cycles to read DRAM
    1 cycle to return last data word
    102 total clock cycles miss penalty
• Number of bytes transferred per clock cycle (bandwidth) for a single miss is (4 x 4)/102 = 0.157 bytes per clock
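The arithmetic for the one-word and four-word cases can be checked with a short sketch. The function and parameter names are illustrative and do not come from the lecture; the 4-byte word is an assumption implied by the 4/27 figure.

```python
WORD_BYTES = 4  # assumed word size, implied by the 4/27 bandwidth figure


def miss_penalty_sequential(block_words, addr_cycles=1, dram_cycles=25, xfer_cycles=1):
    """Miss penalty in clock cycles when every word needs a full DRAM access.

    One cycle sends the address, each word in the block costs a full
    DRAM read, and one cycle returns the (last) data word.
    """
    return addr_cycles + block_words * dram_cycles + xfer_cycles


def bandwidth(block_words, penalty_cycles):
    """Bytes transferred per clock cycle for a single miss."""
    return block_words * WORD_BYTES / penalty_cycles


for words in (1, 4):
    penalty = miss_penalty_sequential(words)
    print(f"{words}-word block: {penalty} cycles, "
          f"{bandwidth(words, penalty):.3f} bytes/clock")
```

Quadrupling the block size almost quadruples the miss penalty here, which is why the bandwidth barely improves (0.148 versus 0.157 bytes per clock).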


One Word Wide Memory Organization, cont'd

[Figure: on-chip CPU and Cache, connected by a bus to Memory; one 25-cycle DRAM access followed by three 8-cycle accesses]

• What if the block size is four words and if a fast page mode DRAM is used?
    1 cycle to send 1st address
    25 + 3*8 = 49 cycles to read DRAM
    1 cycle to return last data word
    51 total clock cycles miss penalty
• Number of bytes transferred per clock cycle (bandwidth) for a single miss is (4 x 4)/51 = 0.314 bytes per clock

Interleaved Memory Organization

[Figure: on-chip CPU and Cache, connected by a bus to Memory bank 0, Memory bank 1, Memory bank 2, and Memory bank 3; the four 25-cycle accesses overlap]

• For a block size of four words
    1 cycle to send 1st address
    25 + 3 = 28 cycles to read DRAM
    1 cycle to return last data word
    30 total clock cycles miss penalty
• Number of bytes transferred per clock cycle (bandwidth) for a single miss is (4 x 4)/30 = 0.533 bytes per clock
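The fast page mode and interleaved figures can be reproduced the same way. This is a minimal sketch: the function names are hypothetical, the 4-byte word is assumed from the (4 x 4) numerators, and the interleaved model assumes at least as many banks as words in the block.

```python
WORD_BYTES = 4  # assumed word size, consistent with the (4 x 4) numerators


def fast_page_mode_penalty(block_words, first_access=25, page_access=8):
    """1 address cycle + one full access + cheaper page-mode accesses + 1 return cycle."""
    dram = first_access + (block_words - 1) * page_access
    return 1 + dram + 1


def interleaved_penalty(block_words, bank_access=25):
    """Banks are read in parallel; each word after the first adds one cycle.

    Assumes the number of banks is at least `block_words`.
    """
    dram = bank_access + (block_words - 1)
    return 1 + dram + 1


for name, penalty in (("fast page mode", fast_page_mode_penalty(4)),
                      ("interleaved", interleaved_penalty(4))):
    bytes_per_clock = 4 * WORD_BYTES / penalty
    print(f"{name}: {penalty} cycles, {bytes_per_clock:.3f} bytes/clock")
```

In both organizations only the first word pays the full 25-cycle DRAM latency, so the penalty grows slowly with block size.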


DRAM Memory System Summary

• It's important to match the cache characteristics
    – caches access one block at a time (usually more than one word)
• with the DRAM characteristics
    – use DRAMs that support fast multiple word accesses, preferably ones that match the block size of the cache
• with the memory-bus characteristics
    – make sure the memory-bus can support the DRAM access rates and patterns
    – with the goal of increasing the Memory-Bus to Cache bandwidth

The Memory Hierarchy

• Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology at the speed offered by the fastest technology

[Figure: levels of the hierarchy, showing increasing distance from the processor in access time and the (relative) size of the memory at each level]

Processor
    ↕ 4-8 bytes (word)
L1$
    ↕ 8-32 bytes (block)
L2$
    ↕ 1 to 4 blocks
Main Memory
    ↕ 1,024+ bytes (disk sector = page)
Secondary Memory

• Inclusive: what is in L1$ is a subset of what is in L2$, which is a subset of what is in MM, which is a subset of what is in SM


The Memory Hierarchy: Why Does it Work?

• Temporal Locality (Locality in Time):
    ⇒ Keep most recently accessed data items closer to the processor
• Spatial Locality (Locality in Space):
    ⇒ Move blocks consisting of contiguous words to the upper levels

[Figure: To Processor / From Processor; Upper Level Memory holding Blk X; Lower Level Memory holding Blk Y]

The Memory Hierarchy: Terminology

• Hit: data is in some block in the upper level (Blk X)
    – Hit Rate: the fraction of memory accesses found in the upper level
    – Hit Time: Time to access the upper level, which consists of RAM access time + Time to determine hit/miss

[Figure: To Processor / From Processor; Upper Level Memory holding Blk X; Lower Level Memory holding Blk Y]

• Miss: data is not in the upper level so needs to be retrieved from a block in the lower level (Blk Y)
    – Miss Rate = 1 - (Hit Rate)
    – Miss Penalty: Time to replace a block in the upper level + Time to deliver the block to the processor
    – Hit Time << Miss Penalty
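To make hit rate, miss rate, and the two kinds of locality concrete, the sketch below counts hits for a trace of word addresses. The slides do not specify how the upper level is organized, so the model here is an assumption: the upper level holds a fixed number of blocks, any block may go anywhere, and the least recently used block is replaced. The name `hit_rate` and its parameters are hypothetical.

```python
from collections import OrderedDict


def hit_rate(trace, block_words, upper_blocks):
    """Fraction of word accesses in `trace` found in the upper level.

    Assumed model: the upper level holds `upper_blocks` blocks and
    replaces the least recently used block on a miss.
    """
    upper = OrderedDict()  # block number -> None, ordered oldest to newest use
    hits = 0
    for word_addr in trace:
        block = word_addr // block_words
        if block in upper:
            hits += 1
            upper.move_to_end(block)       # mark block as most recently used
        else:
            if len(upper) >= upper_blocks:
                upper.popitem(last=False)  # evict the least recently used block
            upper[block] = None            # miss: bring the whole block up
    return hits / len(trace)


sequential = list(range(64))  # touches 64 contiguous words once each
for words in (1, 4):
    h = hit_rate(sequential, block_words=words, upper_blocks=8)
    print(f"block = {words} word(s): hit rate {h:.2f}, miss rate {1 - h:.2f}")
```

With one-word blocks the sequential trace never hits, while four-word blocks turn three of every four accesses into hits: that is spatial locality at work. A trace that revisits the same few words would show the benefit of temporal locality instead.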
