Lecture 9 - Cache Memory
SRAM:
  value is stored on a pair of inverting gates
  very fast, but takes up more space than DRAM (4 to 6 transistors per bit)
DRAM:
  value is stored as charge on a capacitor (must be refreshed)
  very small, but slower than SRAM (by a factor of 5 to 10)
Memory vs Logic Performance
  memory speed improves much more slowly than logic speed; consequently, very fast memory is expensive
  applications eat up more and more memory (without necessarily providing better functionality)
Users want large and fast memories!
SRAM: access times of 2 - 25 ns, at a cost of $100 to $250 per Mbyte
DRAM: access times of 60 - 120 ns, at a cost of $2 to $5 per Mbyte
Disk: access times of 10 to 20 million ns, at a cost under $0.05 to $0.10 per Mbyte
gac1/pykc - 31-Oct-03
ISE1 / EE2 Computing
Lecture 9 - 1
Exploiting Memory Hierarchy
Question: how to organize memory to improve performance without the cost?
Answer: build a memory hierarchy
Exploiting Memory Hierarchy

Memory type (nearest to CPU first)    Size                   Access speed
Processor registers                   64 to 256 bytes        1 - 5 nsec
On-chip cache                         8 - 32 Kbytes          ~ 10 nsec
Second-level cache                    128 - 512 Kbytes       10's of nsec
Main memory (DRAM)                    16M - 4G bytes         ~ 100 nsec
Disk or other store                   10's - 100's Gbytes    10's - 100's msec
Memory transfer between levels of hierarchy
Every pair of levels in the memory hierarchy can be thought of as having an upper and a lower level. Within each level, the unit of information that is present or not is called a block. Usually an entire block is transferred between levels.
Principle of Locality
The principle of locality makes having a memory hierarchy a good idea. If an item is referenced:
  temporal locality: it will tend to be referenced again soon
  spatial locality: nearby items will tend to be referenced soon
Why does code have locality? Instructions are normally executed in sequence, and loops re-execute the same instructions repeatedly.
A memory hierarchy can have multiple levels. Data is copied between two adjacent levels at a time. We will focus on two levels:
  Upper level (closer to the processor, smaller but faster)
  Lower level (further from the processor, larger but slower)
Some terms used in describing a memory hierarchy:
  block: minimum unit of data to transfer - also called a cache line
  hit: data requested is in the upper level
  miss: data requested is not in the upper level
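As a sketch of these two kinds of locality, the hypothetical snippet below maps byte addresses to cache lines (the 16-byte line size is an assumption for illustration, not a figure from the lecture) and counts how many distinct lines an access pattern touches:

```python
# Hypothetical illustration of locality; the 16-byte line size is assumed,
# not taken from the lecture.
LINE_SIZE = 16  # bytes per cache line (assumption)

def lines_touched(addresses):
    """Set of cache line numbers covered by a sequence of byte addresses."""
    return {addr // LINE_SIZE for addr in addresses}

# Spatial locality: a sequential scan of 64 bytes touches only 4 lines,
# so after each miss the next 15 accesses fall in an already-fetched line.
print(len(lines_touched(range(64))))      # 4

# Temporal locality: 64 repeated reads of one address touch a single line.
print(len(lines_touched([100] * 64)))     # 1
```

Fewer distinct lines for the same number of accesses means more hits in the upper level.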
The Basics of Caches
Say we wish to access an address in main memory. The processor communicates this request to the cache. Two possibilities:
  the cache contains this address: the access can be performed without involving main memory (HIT)
  the cache does not contain this address: we need to go to main memory to perform the access (MISS)
On a cache miss, the cache will update itself to store this memory address, so next time we should get a HIT.
Problem: only a limited number of addresses can be stored in a cache.
Solution: the cache needs a replacement policy – which location should be replaced by the new entry?
Unified instruction and data cache
A single cache is shared between instruction and data accesses.
Separate data and instruction caches
The Basics of Caches
Two issues:
How do we know if a data item is in the cache? (Is it a hit or a miss?) If it is there (a hit), how do we find it?
Our first example:
  block size is one byte of data
  we will consider the "direct mapped" approach
For each item of data at the lower level, there is exactly one location in the cache where it might be. Lots of items at the lower level share locations in the upper level.
Direct Mapped Cache
Mapping: address is modulo the number of blocks in the cache
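As a minimal sketch of this mapping rule (the 8-block cache size is an assumption, chosen to match the walk-through below):

```python
NUM_BLOCKS = 8  # assumed direct-mapped cache size

def cache_index(block_address):
    """Direct mapping: each block has exactly one possible cache location."""
    return block_address % NUM_BLOCKS

# Blocks 1, 9, 17 and 25 all map to, and contend for, index 1.
print([cache_index(b) for b in (1, 9, 17, 25)])  # [1, 1, 1, 1]
```

Because many block addresses share one index, the cache must also store a tag to tell them apart.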
Cache Contents - A walk-through

Division of the address into tag and index (cache line number): for a 5-bit address, bits 4-3 form the tag and bits 2-0 form the index.

Index   Valid bit (V)   Tag   Data
000     N
001     N
010     N
011     N
100     N
101     N
110     N
111     N

Steps (fill in the table at each step):
  Initial state on power-ON
  After handling read of address 00000
  After handling read of address 00001
  After handling write to address 01010
  After handling read of address 01011
  After handling read of address 11010
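The walk-through can be replayed with a short simulation: an 8-line direct-mapped cache with one-byte blocks, where for a 5-bit address the low 3 bits select the line and the high 2 bits form the tag.

```python
# Replay of the walk-through: 8-line direct-mapped cache, one-byte blocks,
# 5-bit addresses (tag = bits 4-3, index = bits 2-0).
NUM_LINES = 8

valid = [False] * NUM_LINES
tags  = [None]  * NUM_LINES

def access(addr):
    """Return 'hit' or 'miss', filling the line on a miss."""
    index = addr & 0b111      # low 3 bits select the cache line
    tag   = addr >> 3         # remaining high bits are stored as the tag
    if valid[index] and tags[index] == tag:
        return "hit"
    valid[index], tags[index] = True, tag  # allocate on miss
    return "miss"

sequence = [0b00000, 0b00001, 0b01010, 0b01011, 0b11010]
print([access(a) for a in sequence])  # every first touch is a miss
# 11010 evicts 01010: both have index 010, so 01010 now misses again.
print(access(0b01010))                # 'miss'
```

This shows the contention problem directly: 01010 and 11010 share index 010 and keep replacing each other.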
A Further Example

We have considered an 8-word cache, consisting of 8 lines each of 1 byte. An alternative arrangement is 4 lines, each of 2 bytes. On each cache miss, we must then fill the entire line.

Division of the address: for a 5-bit address, bits 4-3 form the tag, bits 2-1 the index, and bit 0 selects the byte within the line.

Index   Valid bit (V)   Tag   Data
00      N
01      N
10      N
11      N

Steps (fill in the table at each step):
  Initial state on power-ON
  After handling read of address 00000
  After handling read of address 00001
  After handling write to address 01010
  After handling read of address 01011
  After handling read of address 11010
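The new address split can be sketched as a small helper that separates tag, index, and byte-offset fields:

```python
# Address split for the 4-line, 2-bytes-per-line arrangement:
# bit 0 selects the byte within the line, bits 2-1 the index, bits 4-3 the tag.
def split(addr):
    byte  = addr & 0b1
    index = (addr >> 1) & 0b11
    tag   = addr >> 3
    return tag, index, byte

# Addresses 00000 and 00001 share a tag and index, i.e. the same line,
# so filling the line on the first miss makes the second access a hit.
print(split(0b00000))  # (0, 0, 0)
print(split(0b00001))  # (0, 0, 1)
print(split(0b11010))  # (3, 1, 0)
```

This is how a longer line turns spatial locality into extra hits: one miss fetches both bytes of the line.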
Organization of Direct-mapped cache
Direct Mapped Cache
A particular memory item is stored in a unique location in the cache. To check whether a particular memory item is in the cache, the relevant address bits are used to access the cache entry. The top address bits are then compared with the stored tag; if they are equal, we have a hit.
Two items with the same cache address field will contend for use of that location.
Only those bits of the address which are not used to select within the line or to address the cache RAM need be stored in the tag field.
When a miss occurs, data cannot be read from the cache. A slower read from the next level of memory must take place, incurring a miss penalty.
A cache line is typically more than one word (the diagram here shows 4 words per line). A large cache line exploits the principle of spatial locality, giving more hits for sequential access, but it also incurs a higher miss penalty.
Hits vs. Misses
Read hits:
  this is what we want: the cache passes the data straight to the processor
Read misses:
  stall the CPU, fetch the block from memory, deliver it to the cache, restart the access
Write hits:
  write the data to both the cache and main memory (write-through)
  write the data only into the cache, updating main memory later (write-back)
Write misses:
read the entire block into the cache, then write the word
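The write-miss rule above (fetch the whole block, then write the word) can be sketched as follows; the 4-word block size and the single-line cache are assumptions for illustration, not from the lecture:

```python
# Sketch of handling a write miss: read the entire block from memory first,
# then overwrite the one word. Hypothetical 4-word blocks, single-line cache.
WORDS_PER_BLOCK = 4

memory = list(range(32))              # 8 blocks of 4 words each
cache_block, cache_tag = None, None   # a one-line "cache" for illustration

def write_word(addr, value):
    global cache_block, cache_tag
    block, offset = divmod(addr, WORDS_PER_BLOCK)
    if cache_tag != block:            # write miss:
        start = block * WORDS_PER_BLOCK
        cache_block = memory[start:start + WORDS_PER_BLOCK]  # fetch whole block
        cache_tag = block
    cache_block[offset] = value       # then write just the requested word

write_word(5, 99)                     # miss: block 1 (words 4-7) is fetched
print(cache_tag, cache_block)         # 1 [4, 99, 6, 7]
```

Fetching the rest of the block keeps the line consistent, so later reads of neighbouring words can hit.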
Write Strategies in Caches
Cache writes are more complicated than cache reads because, even on a hit, you must decide if and when to write the data back to main memory. This is of utmost importance in multiprocessor systems. Two main strategies are used:
1. Write-through
  All writes are passed to main memory immediately
  If there is a hit, the cache is updated to hold the new value
  The processor slows down to main memory speed during the write
2. Write-back
  The write operation updates the cache, but not main memory
  The cache remembers that it differs from main memory via a dirty bit
  The line is written back to main memory only when the cache line is reused for new data
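The two policies can be contrasted in a minimal sketch, using a hypothetical one-line cache and one-word memory (names and sizes assumed for illustration):

```python
# Contrast of the two write policies on a hit, using a hypothetical
# single-line cache and single-word memory.
class Line:
    def __init__(self):
        self.data = 0
        self.dirty = False

cache, memory = Line(), [0]

def write_through(value):
    cache.data = value
    memory[0] = value          # memory updated immediately (memory-speed write)

def write_back(value):
    cache.data = value
    cache.dirty = True         # memory is now stale; the dirty bit records it

def evict():
    if cache.dirty:            # write back only when the line is replaced
        memory[0] = cache.data
        cache.dirty = False

write_through(7)
print(cache.data, memory[0])   # 7 7: cache and memory always consistent

write_back(9)
print(cache.data, memory[0])   # 9 7: memory stale until eviction
evict()
print(memory[0])               # 9
```

Write-back trades the multiprocessor-visible inconsistency window for fewer memory-speed writes.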