Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Adapted from the slides of the following professors: Dave Patterson (UC Berkeley), Jeanine Cook (New Mexico State University), Mithuna Thottethodi (Purdue University), M.S. Schmalz (University of Florida), and Dr. Son Vuong (University of British Columbia)

Processor-Memory Gap


5.1 Overview of Memory

• Storage of data and instructions
• Modeled as a linear array of 32-bit words
• MIPS: 32-bit (byte) addresses ⇒ 2^32 bytes = 2^30 words
• Response time for any word is the same
• Memory can be read/written in 1 cycle ☺
• Types of memory
  – Physical: what is installed in the computer
  – Virtual: disk storage that looks like memory
  – Cache: fast memory for temporary storage

Memory Instructions

• lw reg, offset(base)
• sw reg, offset(base)
• Load (lw) ⇒ memory-to-register transfer
• Store (sw) ⇒ register-to-memory transfer

(Diagram: lw and sw move data between a register and the memory word at address base + offset.)
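As a rough sketch (not real MIPS: the register names and the word-addressed dictionary standing in for memory are illustrative assumptions), the load/store semantics above can be modeled as:

```python
# Illustrative model of lw/sw addressing. Memory is a word-addressed
# dictionary; register names are hypothetical.
memory = {}
regs = {"$t0": 0, "$s0": 1000}    # assume $s0 holds a base address

def effective_address(base, offset):
    # Both lw and sw form the address as (base register) + offset
    return regs[base] + offset

def sw(reg, offset, base):
    # Store: register-to-memory transfer
    memory[effective_address(base, offset)] = regs[reg]

def lw(reg, offset, base):
    # Load: memory-to-register transfer
    regs[reg] = memory[effective_address(base, offset)]

regs["$t0"] = 42
sw("$t0", 8, "$s0")    # store 42 at address 1000 + 8
regs["$t0"] = 0
lw("$t0", 8, "$s0")    # load it back into $t0
print(regs["$t0"])     # 42
```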

5.2 Memory Technology

• Random Access Memory
  – "Random" is good: access time is the same for all locations
  – DRAM: Dynamic Random Access Memory
    • High density, low power, cheap, slow
    • Dynamic: needs to be "refreshed" regularly
  – SRAM: Static Random Access Memory
    • Low density, high power, expensive, fast
    • Static: content lasts "forever" (until power is lost)



Memory Technology

Memory technology    Typical access time    $ per GB (2004)
SRAM                 0.5–5 ns               $4000–$10,000
DRAM                 40–70 ns               $100–$200
Magnetic disk        5–20 million ns        $0.50–$20

• "Not-so-random" access technology
  – Access time varies from location to location and from time to time
  – Examples: disk, CD-ROM
• Sequential access technology
  – Access time is linear in location (e.g., tape)
• The main memory: DRAMs

DRAM - Main Memory

• Memory organized as a matrix of cells
• Addresses divided into two halves (row and column)
• DRAM chips built into DIMMs/SIMMs
• Dense: one transistor per bit cell
• "Forgets" after a while, so it needs periodic "refresh"

SRAM - Cache

• Data is static (as long as power is applied)
• Six transistors per bit cell
• No refresh needed
• Address not divided

Flash Storage

• Nonvolatile semiconductor storage
  – A type of EEPROM chip (Electrically Erasable Programmable Read-Only Memory)
  – 100×–1000× faster than disk
  – Smaller, lower power, more robust
  – But more $/GB (between disk and DRAM)

Example Use

• Your computer's BIOS chip
• Memory cards for video game consoles, cellular phones, video/music players
• CompactFlash (most often found in digital cameras)
• SmartMedia (most often found in digital cameras)
• Memory Stick (most often found in digital cameras)
• PCMCIA Type I and Type II memory cards (used as solid-state disks in laptops)
• In computers, replacing the hard drive?

Characteristics


Disk Storage

• Nonvolatile, rotating magnetic storage

Disk Drives

• Sector organization: 10-byte header + 512 bytes of data + 12-byte ECC

Disk Drives

(Diagram: a stack of platters; each platter holds concentric tracks, and each track is divided into sectors.)

Disk Device Terminology

(Diagram: arm, head, and actuator positioned over a platter with inner and outer tracks divided into sectors.)

• Several platters, with information recorded magnetically on both surfaces (usually)
• Bits are recorded in tracks, which are in turn divided into sectors (e.g., 512 bytes); an error-correcting code per sector finds and corrects errors
• The actuator moves the head (at the end of the arm) over a track ("seek"); the drive then waits for the sector to rotate under the head, then reads or writes

Disk Sectors and Access

• Each sector records
  – Sector ID
  – Data (512 bytes; 4096 bytes proposed)
  – Error-correcting code (ECC)
    • Used to hide defects and recording errors
  – Synchronization fields and gaps
• Access to a sector involves
  – Queuing delay if other accesses are pending
  – Seek: move the heads
  – Rotational latency
  – Data transfer
  – Controller overhead

Disk drive capacities

• A high-capacity disk drive may have 512 bytes per sector, 1,000 sectors per track, 5,000 tracks per surface, and 8 platters. The total capacity of this drive is
  C = 512 bytes/sector × 1,000 sectors/track × 5,000 tracks/surface × 2 surfaces/platter × 8 platters = 40.96 × 10^9 bytes ≈ 38 GiB
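The capacity arithmetic above can be checked with a short script (the variable names are mine; the drive geometry is taken from the example):

```python
# Sanity check of the disk-capacity example: 512 B/sector, 1,000 sectors/track,
# 5,000 tracks/surface, 2 surfaces/platter, 8 platters.
bytes_per_sector = 512
sectors_per_track = 1_000
tracks_per_surface = 5_000
surfaces_per_platter = 2
platters = 8

capacity_bytes = (bytes_per_sector * sectors_per_track *
                  tracks_per_surface * surfaces_per_platter * platters)

print(capacity_bytes)                      # 40960000000 bytes (40.96 GB)
print(round(capacity_bytes / 2**30, 1))    # 38.1 -- the "38 GB" figure is GiB
```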

Disk Device Performance

(Diagram: platter with inner and outer tracks, a sector, head, arm, actuator, spindle, and controller.)

• Disk Latency = Seek Time + Rotation Time + Transfer Time + Controller Overhead
  – Seek time depends on the number of tracks the arm must move and the seek speed of the disk
  – Rotation time depends on how fast the disk rotates and how far the sector is from the head
  – Transfer time depends on the data rate (bandwidth) of the disk (bit density) and the size of the request

Disk drive speeds

• To access data:
  – Seek: position the head over the proper track (8 to 20 ms average)
  – Rotational latency: wait for the desired sector (0.5 rotation / RPM)
  – Transfer: move the data (one or more sectors) at 2 to 15 MB/sec
• Controller time: the overhead the controller imposes on an I/O access
• Disk read time = average seek time + average rotational delay + transfer time + controller overhead
• Example: What is the average time to read or write a 512-byte sector for a typical disk rotating at 5400 RPM? The average seek time is 12 ms, the transfer rate is 5 MB/sec, and the controller overhead is 2 ms.
  – Rotational delay = 0.5 rotation / 5400 RPM = 5.6 ms
  – Read time = 12 ms + 5.6 ms + 0.5 KB / (5 MB/sec) + 2 ms = 12 ms + 5.6 ms + 0.1 ms + 2 ms = 19.7 ms
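The read-time formula above can be packaged as a small helper (the function name and units are my own; it follows the slide's convention of 1 MB = 10^6 bytes):

```python
# Check of the 5400 RPM example: 12 ms seek, 5 MB/s transfer, 2 ms overhead.
def disk_read_time_ms(seek_ms, rpm, transfer_bytes_per_s, overhead_ms,
                      sector_bytes=512):
    rotational_ms = 0.5 / rpm * 60 * 1000            # half a rotation, in ms
    transfer_ms = sector_bytes / transfer_bytes_per_s * 1000
    return seek_ms + rotational_ms + transfer_ms + overhead_ms

t = disk_read_time_ms(seek_ms=12, rpm=5400,
                      transfer_bytes_per_s=5e6, overhead_ms=2)
print(round(t, 1))    # 19.7 -- matches the slide's answer
```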

Another Disk Access Example

• Given
  – 512 B sector, 15,000 rpm, 4 ms average seek time, 100 MB/s transfer rate, 0.2 ms controller overhead, idle disk
• Average read time
  – 4 ms seek time
    + ½ / (15,000/60) = 2 ms rotational latency
    + 512 B / (100 MB/s) = 0.005 ms transfer time
    + 0.2 ms controller delay
    = 6.2 ms
• If the actual average seek time is 1 ms
  – Average read time = 3.2 ms
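As a quick check of the arithmetic above (values taken from the example; variable names are mine):

```python
# 512 B sector, 15,000 rpm, 4 ms seek, 100 MB/s transfer, 0.2 ms overhead.
seek_ms = 4.0
rotational_ms = 0.5 / (15_000 / 60) * 1000   # half a rotation at 250 rev/s = 2 ms
transfer_ms = 512 / 100e6 * 1000             # 512 bytes at 100 MB/s ~ 0.005 ms
controller_ms = 0.2

total_ms = seek_ms + rotational_ms + transfer_ms + controller_ms
print(round(total_ms, 1))                    # 6.2

# With a 1 ms actual average seek time instead of 4 ms:
print(round(total_ms - seek_ms + 1.0, 1))    # 3.2
```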

Disk Performance Issues

• Manufacturers quote average seek time
  – Based on all possible seeks
  – Locality and OS scheduling lead to smaller actual average seek times
• Smart disk controllers allocate physical sectors on disk
  – Present a logical-sector interface to the host
  – SCSI, ATA, SATA
• Disk drives include caches
  – Prefetch sectors in anticipation of access
  – Avoid seek and rotational delay

Exploiting Memory Hierarchy

• Fact:
  – Large memories are slow (and cheap)
  – Fast memories are small (and expensive)
• Users want large, cheap, and fast memories!
• How do we create a memory that is large, cheap, and fast (most of the time)?
  – Memory hierarchy: multiple levels of memory with different speeds and sizes
  – How: take advantage of the principle of locality

Memory Hierarchy

• Trade-off among cost, capacity, and access time
• Inboard memory: registers, cache, main memory
• Outboard storage: magnetic disks, CD-ROM
• Off-line storage: magnetic tapes
• Moving down the hierarchy:
  – Decreasing cost per bit
  – Increasing capacity
  – Increasing access time
  – Decreasing frequency of access by the processor (HIT RATIO!)

Memory Hierarchy

(Diagram: processor at the top; Level 1, Level 2, Level 3, ..., Level n below, with the size of memory growing at each level.)

• Increasing distance from the processor means decreasing speed
• As we move to deeper levels, the latency goes up and the price per bit goes down


Memory Hierarchy

1. Registers
   – Small, fast, expensive storage internal to the processor
   – Speed = 1 CPU clock cycle; capacity ~ 0.1 KB to 2 KB
2. Cache (SRAM)
   – Fast, costly storage internal or external to the processor
   – Speed = a few CPU clock cycles; capacity 0.5 MB to 10 MB
3. Main memory (DRAM)
   – Large, cheap main storage external to the processor
   – Capacity < 16 GB; speed ~ 50 ns
4. Disk storage
   – Large, far from the processor, very slow
   – Access time = 1 to 15 ms
   – Used as a backing store for main memory

Memory Hierarchy

• If a level is closer to the processor, it must…
  – Be smaller
  – Be faster
  – Contain a subset (the most recently used data) of the level below it
  – Have all its data contained in the levels below it
• The lowest level (usually disk) contains all available data
• Is there another level lower than disk?

Locality

• The Principle of Locality:
  – Programs (lw and sw) access only a small part of memory in a given time slice (tens of contiguous cycles)
• Temporal Locality (locality in time): reuse of data
  – If an item is accessed, it will tend to be referenced again soon
  ⇒ Keep the most recently accessed data items closer to the processor
  – Examples: instructions in a loop, items in a data structure
  – 90% of the execution time of a program is spent in just 10% of the code, due to loops and function calls

Locality

• Spatial Locality (locality in space): use of nearby data
  – If an item is referenced, items whose addresses are close by will tend to be referenced soon
  ⇒ Move blocks consisting of contiguous words to the upper levels
  – Examples: matrix, array, stack

(Diagram: byte addresses 4n through 4n+11 holding the consecutive 4-byte array elements A[0], A[1], A[2].)
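The address pattern in the diagram can be sketched as follows (the base address is a made-up value for illustration):

```python
# Consecutive elements of a 32-bit word array live at consecutive byte
# addresses (base + 4*i), which is why fetching a whole block of contiguous
# words exploits spatial locality.
base = 0x1000        # hypothetical start address of array A
word_size = 4        # bytes per 32-bit word

def addr(i):
    # Byte address of A[i]
    return base + word_size * i

print([hex(addr(i)) for i in range(3)])   # ['0x1000', '0x1004', '0x1008']
```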

5.3 The Basics of Caches

• What is a cache?
  – The level of the memory hierarchy between the CPU and main memory. Also used to refer to any storage managed to take advantage of locality.
• Block: a group of words
• Set: a group of blocks

(Diagram: the CPU issues an address; a mapping function moves words between the cache and blocks in main memory.)

Cache Read Operation

• CPU requests the contents of a memory location
• Check the cache for this data
• If present, get it from the cache (fast)
• If not present, read the required block from main memory into the cache
• Then deliver it from the cache to the CPU
• The cache includes tags to identify which block of main memory is in each cache slot
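The steps above can be sketched as a tiny direct-mapped cache model (the slot count and the fake memory contents are illustrative assumptions, not from the text):

```python
# Minimal direct-mapped cache model: tags identify which memory block
# occupies each slot, as described above.
NUM_SLOTS = 8
cache_tags = [None] * NUM_SLOTS
cache_data = [None] * NUM_SLOTS
memory = {addr: addr * 10 for addr in range(64)}   # fake main memory

def read(addr):
    slot = addr % NUM_SLOTS        # which cache slot this address maps to
    tag = addr // NUM_SLOTS        # identifies which block is in the slot
    if cache_tags[slot] == tag:    # hit: deliver from the cache (fast)
        return cache_data[slot], "hit"
    # Miss: fetch from main memory into the cache, then deliver
    cache_tags[slot] = tag
    cache_data[slot] = memory[addr]
    return cache_data[slot], "miss"

print(read(5))    # (50, 'miss')  first access misses
print(read(5))    # (50, 'hit')   repeat access hits (temporal locality)
```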

Cache Memory - Terms

• Hit ☺
  – When you look for a block B in the cache, you find it
  – Hit time: the time it takes to find B
  – Hit rate: the fraction of the time you find B in the cache
• Miss
  – You look for B in the cache and can't find it
  – The data must be retrieved from a block in the lower level
    • Stall the CPU, fetch the block from memory, deliver it to the cache, restart
  – Miss rate: the fraction of the time you can't find B in the cache when you go looking for B = 1 – hit rate
  – Miss penalty: the time to replace a block in the upper level + the time to deliver the block to the processor
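These terms combine into the standard average memory access time (AMAT = hit time + miss rate × miss penalty); a worked example with made-up numbers, not values from the text:

```python
# AMAT from the terms above; all numbers are illustrative assumptions.
hit_time_cycles = 1         # time on a hit
miss_rate = 0.05            # 5% of accesses miss (= 1 - hit rate)
miss_penalty_cycles = 100   # time to fetch the block from the lower level

amat = hit_time_cycles + miss_rate * miss_penalty_cycles
print(amat)    # 6.0 cycles per access on average
```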
