Cache Memory and Performance

Author: Grant Norton

Memory Hierarchy 1

Many of the following slides are taken with permission from Complete Powerpoint Lecture Notes for Computer Systems: A Programmer's Perspective (CS:APP), Randal E. Bryant and David R. O'Hallaron, http://csapp.cs.cmu.edu/public/lectures.html

The book is used explicitly in CS 2505 and CS 3214 and as a reference in CS 2506.

CS@VT

Computer Organization II

©2005-2015 CS:APP & McQuain

An Example Memory Hierarchy

Devices toward the top (L0) are smaller, faster, and costlier per byte; devices toward the bottom (L5) are larger, slower, and cheaper per byte.

L0: Registers. CPU registers hold words retrieved from L1 cache.
L1: L1 cache (SRAM). L1 cache holds cache lines retrieved from L2 cache.
L2: L2 cache (SRAM). L2 cache holds cache lines retrieved from main memory.
L3: Main memory (DRAM). Main memory holds disk blocks retrieved from local disks.
L4: Local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.
L5: Remote secondary storage (tapes, distributed file systems, Web servers).

Random-Access Memory (RAM)

Key features
- RAM is traditionally packaged as a chip.
- Basic storage unit is normally a cell (one bit per cell).
- Multiple RAM chips form a memory.

Static RAM (SRAM)
- Each cell stores a bit with a four- or six-transistor circuit.
- Retains value indefinitely, as long as it is kept powered.
- Relatively insensitive to electrical noise (EMI), radiation, etc.
- Faster and more expensive than DRAM.

Dynamic RAM (DRAM)
- Each cell stores a bit with a capacitor. One transistor is used for access.
- Value must be refreshed every 10-100 ms.
- More sensitive to disturbances (EMI, radiation, ...) than SRAM.
- Slower and cheaper than SRAM.

SRAM vs DRAM Summary

        Trans.    Access   Needs      Needs
        per bit   time     refresh?   EDC?    Cost   Applications
SRAM    4 or 6    1X       No         Maybe   100x   Cache memories
DRAM    1         10X      Yes        Yes     1X     Main memories, frame buffers

Traditional CPU-Memory Bus Structure

A bus is a collection of parallel wires that carry address, data, and control signals. Buses are typically shared by multiple devices.

[Figure: CPU chip containing the register file, ALU, and bus interface; the system bus connects the bus interface to the I/O bridge, and the memory bus connects the I/O bridge to main memory.]

Memory Read Transaction (1)

Load operation: movl A, %eax

CPU places address A on the memory bus.

Memory Read Transaction (2)

Main memory reads A from the memory bus, retrieves word x, and places it on the bus.

Memory Read Transaction (3)

CPU reads word x from the bus and copies it into register %eax.

[Figures: register file (%eax), ALU, and bus interface on the CPU chip, the I/O bridge, and main memory holding word x at address A; each step shows the value currently on the bus (A, then x).]

Memory Write Transaction (1)

Store operation: movl %eax, A

CPU places address A on bus. Main memory reads it and waits for the corresponding data word to arrive.

Memory Write Transaction (2)

CPU places data word y on the bus.

Memory Write Transaction (3)

Main memory reads data word y from the bus and stores it at address A.

[Figures: register file (%eax holding y), ALU, and bus interface on the CPU chip, the I/O bridge, and main memory; each step shows the value currently on the bus (A, then y), ending with y stored at address A.]
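The three-step read and write transactions above can be mimicked in software as a rough sketch. Everything named here (the `Bus` struct, `bus_read`, `bus_write`, and the 16-word `main_memory`) is an assumption made for illustration; a real bus is a hardware signaling protocol, not a function call.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 16  /* hypothetical tiny main memory, addressed by word */

/* Hypothetical model: the bus carries one address or one data word at a time. */
typedef struct {
    uint32_t lines;   /* value currently driven onto the bus */
} Bus;

static uint32_t main_memory[MEM_WORDS];

/* Load (movl A, %eax): returns the word stored at word address a. */
uint32_t bus_read(Bus *bus, uint32_t a) {
    assert(a < MEM_WORDS);
    bus->lines = a;                    /* (1) CPU places address A on the bus */
    uint32_t addr = bus->lines;        /* (2) memory reads A from the bus ... */
    bus->lines = main_memory[addr];    /*     ... and places word x on the bus */
    return bus->lines;                 /* (3) CPU copies x into the register */
}

/* Store (movl %eax, A): writes word y to word address a. */
void bus_write(Bus *bus, uint32_t a, uint32_t y) {
    assert(a < MEM_WORDS);
    bus->lines = a;                    /* (1) CPU places address A on the bus */
    uint32_t addr = bus->lines;        /*     memory reads A and waits for data */
    bus->lines = y;                    /* (2) CPU places data word y on the bus */
    main_memory[addr] = bus->lines;    /* (3) memory stores y at address A */
}
```

Calling `bus_write(&bus, 5, 1234)` followed by `bus_read(&bus, 5)` returns the stored word, with the single `lines` field standing in for the shared wires that carry first the address and then the data.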

The Bigger Picture: I/O Bus

[Figure: CPU chip (register file, ALU, bus interface) connected by the system bus to the I/O bridge, which connects by the memory bus to main memory and by the I/O bus to a USB controller (mouse, keyboard), a graphics adapter (monitor), a disk controller (disk), and expansion slots for other devices such as network adapters.]

Storage Trends

SRAM
Metric        1980     1985    1990   1995   2000   2005   2010   2010:1980
$/MB          19,200   2,900   320    256    100    75     60     320
access (ns)   300      150     35     15     3      2      1.5    200

DRAM
Metric              1980    1985    1990   1995   2000   2005    2010    2010:1980
$/MB                8,000   880     100    30     1      0.1     0.06    130,000
access (ns)         375     200     100    70     60     50      40      9
typical size (MB)   0.064   0.256   4      16     64     2,000   8,000   125,000

Disk
Metric              1980   1985   1990   1995    2000     2005      2010        2010:1980
$/MB                500    100    8      0.30    0.01     0.005     0.0003      1,600,000
access (ms)         87     75     28     10      8        4         3           29
typical size (MB)   1      10     160    1,000   20,000   160,000   1,500,000   1,500,000
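The 2010:1980 column can be reproduced from the 1980 and 2010 entries. The sketch below assumes the ratio is "1980 value over 2010 value" for cost and access time and "2010 value over 1980 value" for typical size; the helper names `shrink_factor` and `growth_factor` are made up for this example, and the table appears to round its ratios.

```c
#include <assert.h>

/* Factor by which a cost or access time shrank: 1980 value / 2010 value. */
double shrink_factor(double v1980, double v2010) {
    assert(v2010 > 0.0);
    return v1980 / v2010;
}

/* Factor by which a typical size grew: 2010 value / 1980 value. */
double growth_factor(double v1980, double v2010) {
    assert(v1980 > 0.0);
    return v2010 / v1980;
}
```

For example, `shrink_factor(300, 1.5)` gives 200 for SRAM access time, while `shrink_factor(375, 40)` gives about 9.4 for DRAM access time (listed as 9). Over the same period `growth_factor(0.064, 8000)` gives 125,000 for typical DRAM size, so DRAM capacity and cost improved far more than DRAM speed.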

The CPU-Memory Gap

The gap between DRAM, disk, and CPU speeds.

[Figure: log-scale plot of time in ns (0.1 to 100,000,000) against year (1980 to 2010), with series for disk seek time, flash SSD access time, DRAM access time, SRAM access time, CPU cycle time, and effective CPU cycle time.]

Locality

Principle of Locality: Programs tend to use data and instructions with addresses near or equal to those they have used recently.

Temporal locality:
- Recently referenced items are likely to be referenced again in the near future.

Spatial locality:
- Items with nearby addresses tend to be referenced close together in time.

Locality Example

    sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;

Data references
- Reference array elements in succession (stride-1 reference pattern): spatial locality.
- Reference variable sum each iteration: temporal locality.

Instruction references
- Reference instructions in sequence: spatial locality.
- Cycle through loop repeatedly: temporal locality.
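To contrast the stride-1 pattern with a larger stride, here is a minimal sketch that sums the same row-major array in two loop orders. The function names and the flat `rows * cols` layout are assumptions for this example; both functions compute the same sum, and only the order of memory references differs.

```c
#include <assert.h>
#include <stddef.h>

/* Row-by-row traversal: the index grows by 1 each step (stride-1),
 * so consecutive references touch adjacent elements. */
long sum_stride_1(const int *a, size_t rows, size_t cols) {
    assert(a != NULL);
    long sum = 0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            sum += a[i * cols + j];
    return sum;
}

/* Column-by-column traversal of the same array: the index grows by
 * cols each step (stride-cols), so consecutive references are far apart. */
long sum_stride_n(const int *a, size_t rows, size_t cols) {
    assert(a != NULL);
    long sum = 0;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            sum += a[i * cols + j];
    return sum;
}
```

In both versions `sum` is referenced every iteration (temporal locality), but only the first version references array elements in succession, so it would be expected to have better spatial locality on large arrays.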

Taking Advantage of Locality

Memory hierarchy
- Store everything on disk.
- Copy recently accessed (and nearby) items from disk to smaller DRAM memory.
  - Main memory
- Copy more recently accessed (and nearby) items from DRAM to smaller SRAM memory.
  - Cache memory attached to CPU

An Example Memory Hierarchy

Devices toward the top (L0) are smaller, faster, and costlier per byte; devices toward the bottom (L5) are larger, slower, and cheaper per byte.

L0: Registers. CPU registers hold words retrieved from L1 cache.
L1: L1 cache (SRAM). L1 cache holds cache lines retrieved from L2 cache.
L2: L2 cache (SRAM). L2 cache holds cache lines retrieved from main memory.
L3: Main memory (DRAM). Main memory holds disk blocks retrieved from local disks.
L4: Local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.
L5: Remote secondary storage (tapes, distributed file systems, Web servers).

Caches

Cache: a smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device.

Fundamental idea of a memory hierarchy:
- For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1.

Why do memory hierarchies work?
- Because of locality, programs tend to access the data at level k more often than they access the data at level k+1.
- Thus, the storage at level k+1 can be slower, and thus larger and cheaper per bit.

Big Idea: The memory hierarchy creates a large pool of storage that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top.
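One way to make the Big Idea concrete is a first-order estimate of average access time: hit time plus miss rate times miss penalty. This formula is a common textbook model and an assumption here (it does not appear on the slide), and the function name and sample numbers are hypothetical.

```c
#include <assert.h>

/* Average cycles per access for a level-k cache in front of level k+1:
 * every access pays the hit time; a fraction miss_rate also pays the penalty. */
double avg_access_cycles(double hit_cycles, double miss_rate,
                         double miss_penalty_cycles) {
    assert(miss_rate >= 0.0 && miss_rate <= 1.0);
    return hit_cycles + miss_rate * miss_penalty_cycles;
}
```

With a 1-cycle hit, a 100-cycle penalty, and an assumed 3% miss rate, `avg_access_cycles(1.0, 0.03, 100.0)` is about 4 cycles: much closer to the fast level than to the slow one, which is what good locality buys.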

General Cache Concepts

Cache: smaller, faster, more expensive memory caches a subset of the blocks (here blocks 8, 9, 14, and 3).

Data is copied in block-sized transfer units (here blocks 4 and 10).

Memory: larger, slower, cheaper memory viewed as partitioned into "blocks" (here blocks 0 through 15).

General Cache Concepts: Hit

Request: 14

Cache: blocks 8, 9, 14, 3
Memory: blocks 0 through 15

Data in block b is needed.
Block b is in cache: Hit!

General Cache Concepts: Miss

Request: 12

Cache: blocks 8, 9, 14, 3 before the request; block 12 is brought in from memory.
Memory: blocks 0 through 15

- Data in block b is needed.
- Block b is not in cache: Miss!
- Block b is fetched from memory.
- Block b is stored in cache.
  - Placement policy: determines where b goes.
  - Replacement policy: determines which block gets evicted (victim).
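The hit and miss cases above can be sketched as a tiny four-block cache. The `Cache` type, the function names, the "any free slot" placement, and the first-in-first-out replacement are all assumptions chosen for brevity; the slides do not say which policies the pictured cache uses.

```c
#include <assert.h>
#include <stdbool.h>

#define CACHE_BLOCKS 4

typedef struct {
    int  block[CACHE_BLOCKS];  /* block number held in each slot */
    bool valid[CACHE_BLOCKS];  /* false until the slot is first filled */
    int  next_victim;          /* FIFO pointer: slot to fill or evict next */
} Cache;

/* Start with every slot empty. */
void cache_init(Cache *c) {
    for (int i = 0; i < CACHE_BLOCKS; i++)
        c->valid[i] = false;
    c->next_victim = 0;
}

/* Returns true on a hit. On a miss, stores block b in the cache,
 * evicting the oldest resident block once the cache is full. */
bool cache_access(Cache *c, int b) {
    assert(b >= 0);
    for (int i = 0; i < CACHE_BLOCKS; i++)
        if (c->valid[i] && c->block[i] == b)
            return true;                    /* block b is in cache: hit */
    int slot = c->next_victim;              /* placement and replacement (FIFO) */
    c->block[slot] = b;                     /* block b fetched and stored */
    c->valid[slot] = true;
    c->next_victim = (slot + 1) % CACHE_BLOCKS;
    return false;                           /* miss */
}
```

After loading blocks 8, 9, 14, and 3, a request for 14 hits and a request for 12 misses; under this assumed FIFO policy the victim is block 8, the oldest block.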

Types of Cache Misses

Cold (compulsory) miss
- Cold misses occur because the cache is empty.

Conflict miss
- Most caches limit blocks at level k+1 to a small subset (sometimes a singleton) of the block positions at level k.
  E.g., block i at level k+1 must be placed in block (i mod 4) at level k.
- Conflict misses occur when the level k cache is large enough, but multiple data objects all map to the same level k block.
  E.g., referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time, because blocks 0 and 8 both map to block 0 at level k.

Capacity miss
- Occurs when the set of active cache blocks (working set) is larger than the cache.
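The conflict-miss example can be checked with a short simulation of the (i mod 4) placement rule. This is a sketch under the assumption of a four-slot cache with one candidate slot per block; the function name and the use of -1 to mark an empty slot are choices made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SLOTS 4

/* Counts misses for a trace of block numbers when block i may be
 * placed only in slot (i mod 4). */
int count_misses_direct_mapped(const int *trace, size_t n) {
    assert(trace != NULL);
    int slot[NUM_SLOTS];
    for (int s = 0; s < NUM_SLOTS; s++)
        slot[s] = -1;                   /* -1 marks an empty slot */
    int misses = 0;
    for (size_t t = 0; t < n; t++) {
        int b = trace[t];
        assert(b >= 0);
        int s = b % NUM_SLOTS;          /* the only slot block b may use */
        if (slot[s] != b) {             /* cold miss or conflict miss */
            slot[s] = b;                /* evict whatever was there */
            misses++;
        }
    }
    return misses;
}
```

The trace 0, 8, 0, 8, 0, 8 misses on all six references even though three of the four slots stay empty, while 0, 1, 0, 1, 0, 1 misses only twice (the two cold misses).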

Examples of Caching in the Hierarchy

Cache Type             What is Cached?        Where is it Cached?   Latency (cycles)   Managed By
Registers              4-8 byte words         CPU core              0                  Compiler
TLB                    Address translations   On-Chip TLB           0                  Hardware
L1 cache               64-byte block          On-Chip L1            1                  Hardware
L2 cache               64-byte block          On/Off-Chip L2        10                 Hardware
Virtual Memory         4-KB page              Main memory           100                Hardware + OS
Buffer cache           Parts of files         Main memory           100                OS
Disk cache             Disk sectors           Disk controller       100,000            Disk firmware
Network buffer cache   Parts of files         Local disk            10,000,000         AFS/NFS client
Browser cache          Web pages              Local disk            10,000,000         Web browser
Web cache              Web pages              Remote server disks   1,000,000,000      Web proxy server

Cache Memories

Cache memories are small, fast SRAM-based memories managed automatically in hardware.
- Hold frequently accessed blocks of main memory.

CPU looks first for data in caches (e.g., L1, L2, and L3), then in main memory.

Typical system structure:
[Figure: CPU chip containing the register file, cache memories, ALU, and bus interface; the system bus connects the bus interface to the I/O bridge, and the memory bus connects the I/O bridge to main memory.]
