Chapter 7. The Memory Hierarchy Part I

Author: Rodney Hines

The slides of Part I are taken in large part from V. Heuring & H. Jordan, “Computer Systems Design and Architecture”.

S07 Mark Franklin

Memory Hierarchy Outline (1)
• Memory components:
  – RAM memory cells & cell arrays.
  – Static RAM: more expensive, but less complex.
  – Tree and matrix decoders: needed for large RAM chips.
  – Dynamic RAM: less expensive, but needs “refreshing”.
    • Chip organization
    • Timing
  – ROM: Read-only memory.
• Memory boards
  – Arrays of chips give more addresses and/or wider words.
  – 2-D and 3-D chip arrays.
• Memory modules
  – Large systems can benefit by partitioning memory for:
    • separate access by system components.
    • fast access to multiple words.

Memory Hierarchy Outline (2)
• The memory hierarchy, from fast & expensive to slow & cheap:
  – Registers → Cache → Main Memory → Disk
  – Cache: high speed, expensive (1st level on-chip, 2nd level off-chip)
    • Design types: direct mapped, associative, set associative
  – Virtual memory: makes the hierarchy down to disk transparent
    • Address translation: logical address → physical address
    • Memory management: control of information movement between levels
    • Multiprogramming, multithreading: computation while waiting for memory → improved efficiency and resource utilization
    • The “TLB”: for speeding up the address translation process
• Memory as a subsystem: overall performance.

Memory Technology Characteristics

Level | Memory Type     | Average Access Time | Typical Size | Unit of Transfer (Block Size)
1     | Cache (on-chip) | 0.25 – 10 ns        | 8 KB – 8 MB  | Word, 16–64 bits
2     | Main Memory     | 40 – 200 ns         | 2 MB – 32 GB | Cache line, 8–32 B
3     | Disk            | 5 – 10 ms           | ~1,000 GB    | Page, 4–16 KB
4     | Magnetic Tape   | 1 – 5 s             | > 1,000 GB   | Record, 16 KB

AMD Athlon


Typical Disk Drive: SATA 750 GB


Memory Performance Gap
[Figure: Processor–DRAM memory gap (latency); performance on a log scale (1 to 1000) vs. time, 1980–2000, with CPU and DRAM curves.]
• µProc performance: 60%/yr (“Moore’s Law”, 2X/1.5 yr).
• DRAM performance: 9%/yr (2X/10 yrs).
• Processor–memory performance gap: grows 50%/year.
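The gap growth rate follows from dividing the two annual improvement factors. A small sketch, assuming the 60%/yr and 9%/yr rates quoted above and treating the gap as 1.0 at the starting year; the function names are illustrative:

```python
# Sketch: compound growth of the processor-DRAM performance gap, assuming
# the slide's annual improvement rates (60%/yr CPU, 9%/yr DRAM).

def gap_growth_per_year(cpu_rate: float, dram_rate: float) -> float:
    """Factor by which the CPU/DRAM performance ratio grows each year."""
    return (1.0 + cpu_rate) / (1.0 + dram_rate)


def gap_after(years: int, cpu_rate: float = 0.60, dram_rate: float = 0.09) -> float:
    """Relative gap after `years`, taking the gap to be 1.0 at year 0."""
    return gap_growth_per_year(cpu_rate, dram_rate) ** years


if __name__ == "__main__":
    per_year = gap_growth_per_year(0.60, 0.09)
    print(f"gap grows by {100 * (per_year - 1):.1f}% per year")
    print(f"gap after 20 years: {gap_after(20):.0f}x")
```

With these inputs the yearly factor is 1.60/1.09, a little under the rounded 50%/year on the slide.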

Levels of the Memory Hierarchy: Capacity, Access Time, Cost
Staging Xfer Unit
CPU Registers 100s Bytes … 8, require additional gate levels.
• When fan-in > 8, tree & matrix decoders are often used.

[Figure: 3-to-8 line tree decoder constructed from 2-input gates.]
[Figure: 4-to-16 line matrix decoder constructed from 2-input gates, using two 2–4 decoders to drive outputs m0–m15.]
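Both decoder structures can be modeled with nothing but inverters and 2-input AND gates. The sketch below is a behavioral model under assumed conventions: the tree decoder takes its address bits least significant first, and in the matrix decoder x3 x2 select the row while x1 x0 select the column. All function names are illustrative.

```python
# Sketch: behavioral models of tree and matrix decoders built from
# inverters and 2-input AND gates. Bit orderings and names are assumptions.

def decoder_2_to_4(x1: int, x0: int) -> list[int]:
    """2-to-4 decoder: output i is 1 exactly when (x1 x0) encodes i."""
    n1, n0 = 1 - x1, 1 - x0          # inverters
    return [n1 & n0, n1 & x0, x1 & n0, x1 & x0]


def tree_decoder(bits: list[int]) -> list[int]:
    """n-to-2^n tree decoder; bits are given least significant first.

    Each level ANDs every existing output with one more address bit or its
    complement, doubling the number of outputs (the first level is just the
    bit and its complement).
    """
    outputs = [1]
    for b in bits:
        nb = 1 - b
        outputs = [o & nb for o in outputs] + [o & b for o in outputs]
    return outputs


def matrix_decoder_4_to_16(x3: int, x2: int, x1: int, x0: int) -> list[int]:
    """4-to-16 matrix decoder: m[4*r + c] = row[r] AND col[c]."""
    rows = decoder_2_to_4(x3, x2)    # high-order bits choose the row
    cols = decoder_2_to_4(x1, x0)    # low-order bits choose the column
    return [rows[r] & cols[c] for r in range(4) for c in range(4)]
```

In the matrix version every output needs only a single 2-input gate, no matter how many address bits the two row and column decoders handle.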

6-Transistor Static RAM Cell
[Figure: 6-transistor static RAM storage cell with active loads; dual-rail data lines b_i and NOT b_i for reading and writing; word line w_i driving the switches that control access to the cell; +5 V supply; additional cells on the same bit lines; column select (from the column address decoder); sense/write amplifiers with R/W, CS, and d_i.]
Reading a value:
1) “Precharge” the bit lines to a value halfway between a 0 and a 1.
2) Then assert the word line. This allows the latch to drive the bit lines to the value stored in the latch.
The sense/write amplifiers sense and amplify data on a Read, and drive b_i and NOT b_i on a Write.

Static RAM Read Operation
[Timing diagram: Memory address, Read/write, CS, and Data waveforms, with tAA marked.]
tAA, access time from address: the time required for the RAM array to decode the address and provide the value to the data bus.

Static RAM Write Operations
[Timing diagram: Memory address, Read/write, CS, and Data waveforms, with tw marked.]
tw, write time: the time the data must be held valid in order to decode the address and store the value in the memory cells.

Example Commercial Product
• NEC Electronics: 8M standard synchronous SRAM, 512K words x 16 bits (µPD4482161, µPD4482162)
• Various package types; supply voltages 3.3 V, 2.5 V

Clk. Freq. (MHz) | Max. Access Time (ns) | Type
133              | 6.5                   | Flow Thru
117              | 7.5                   | Flow Thru
100              | 8.5                   | Flow Thru
225              | 2.8                   | Pipelined
200              | 3.1                   | Pipelined
167              | 3.5                   | Pipelined

• Also 512K words x 18 bits; 512K words x 32 bits; 512K words x 36 bits.
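The part organizations above give capacity directly, and the clock frequency bounds the transfer rate. A sketch assuming K = 2^10, M = 2^20, and, for the peak figure only, one word transferred per clock; function names are illustrative:

```python
# Sketch: capacity and peak transfer rate for the SRAM organizations listed
# above, assuming K = 2^10, M = 2^20, and one word per clock at peak.
K = 1 << 10
M = 1 << 20


def capacity_bits(words: int, bits_per_word: int) -> int:
    """Total bits in a chip organized as words x bits_per_word."""
    return words * bits_per_word


def peak_mb_per_s(clock_mhz: float, bits_per_word: int) -> float:
    """Peak rate in 10^6 bytes per second if one word moves every clock."""
    return clock_mhz * bits_per_word / 8


if __name__ == "__main__":
    for width in (16, 18, 32, 36):
        print(f"512K x {width}: {capacity_bits(512 * K, width) / M:.0f} Mbit")
    print(f"225 MHz pipelined, x16: {peak_mb_per_s(225, 16):.0f} MB/s peak")
```

The x16 organization comes out to the 8 Mbit in the part description; the sustained rate will be lower than this peak once address setup and bus turnaround are included.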

Dynamic RAM Organization
[Figure: DRAM cell array; each cell is a capacitor plus a switch that controls access to the cell, on a single bit line b_i and word line w_j; additional cells; column select (from the column address decoder); sense/write amplifiers with R/W, CS, and d_i.]
• Storage: the capacitor stores charge for a 1 and no charge for a 0.
• Write: place the value on the bit line and assert the word line.
• Read: precharge the bit line, assert the word line, and sense the value on the bit line with the sense amplifier.
• Refresh: the capacitor discharges in 4–15 ms. To refresh it, read (sense) the value on the bit line, amplify it, and place it back on the bit line to recharge the capacitor.
• Sense/write amplifiers: sense and amplify data on a Read; drive b_i on a Write.
The need to refresh the storage cells of DRAM chips complicates DRAM system design.

DRAM Chip Organization
• Addresses are time-multiplexed on the address bus, using RAS (Row Address Strobe) and CAS (Column Address Strobe) as the strobes for rows and columns.
• CAS is normally used as the CS function.
[Figure: 1024 × 1024 cell array with row latches and decoder; 1024 sense/write amplifiers and column latches; 10 column address latches; 1–1024 muxes and demuxes; control logic; inputs A0–A9, RAS, CAS, R/W; data d_i and d_o.]
Pin counts:
• Without address multiplexing: 27 pins, including power and ground.
• With address multiplexing: 17 pins, including power and ground.
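Address multiplexing means a 20-bit cell address for the 1024 × 1024 array is sent over A0–A9 in two halves. The sketch below assumes the high-order 10 bits form the row address, which is a design choice and not stated on the slide; function names are hypothetical.

```python
# Sketch: time-multiplexing a 20-bit address for a 1024 x 1024 DRAM array
# over 10 address pins (A0-A9). Assumes the high-order 10 bits are the row.

ROW_BITS = 10
COL_BITS = 10


def split_address(addr: int) -> tuple[int, int]:
    """Return (row, column) for a cell address in the 2^20-cell array."""
    if not 0 <= addr < 1 << (ROW_BITS + COL_BITS):
        raise ValueError("address out of range")
    return addr >> COL_BITS, addr & ((1 << COL_BITS) - 1)


def address_bus_sequence(addr: int) -> list[tuple[str, int]]:
    """Values placed on A0-A9, in order, with the strobe that latches each."""
    row, col = split_address(addr)
    return [("RAS", row), ("CAS", col)]


def join_address(row: int, col: int) -> int:
    """Full cell address recovered from the row and column latches."""
    return (row << COL_BITS) | col
```

Halving the address pins this way is where most of the 27-to-17 pin saving comes from.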

DRAM Read and Write Cycles
[Timing diagrams: typical DRAM Read operation and typical DRAM Write operation. Signals: memory address (row address, then column address), RAS, CAS, R/W (W for write), and Data. Marked intervals: tRAS, tPrechg, tA (access time), tC (cycle time), and tDHR (data hold from RAS).]
Notice that it is the bit line precharge operation that causes the difference between access time and cycle time.

DRAM Refresh & Row Access
• Refresh is usually accomplished by a “RAS-only” cycle. The row address is placed on the address lines and RAS is asserted, which refreshes the entire row. CAS is not asserted: the absence of a CAS phase signals the chip that a row refresh is requested, so no data is placed on the external data lines.
• Many chips use “CAS before RAS” to signal a refresh. The chip has an internal counter; whenever CAS is asserted before RAS, the chip refreshes the row pointed to by the counter and increments the counter.
• Most DRAM vendors also supply DRAM controllers that encapsulate the refresh and other functions.
• Page mode, nibble mode, and static column mode allow rapid access to the entire row that has been read into the column latches.
• Video RAMs (VRAMs) clock an entire row into a shift register, where it can be rapidly read out, bit by bit, for display.
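The CAS-before-RAS scheme can be modeled as a row counter that advances on every refresh cycle. The sketch is a hypothetical behavioral model, not any vendor's logic. It also computes how far apart the row refreshes can be if they are spread evenly over the retention time. For example, using 4 ms (the low end of the 4–15 ms on the DRAM organization slide) and the 1024 rows of the chip organization slide gives 3.90625 µs.

```python
# Sketch: behavioral model of CAS-before-RAS refresh with an on-chip row
# counter, plus the refresh spacing needed to cover every row in time.
# Class and function names are hypothetical.

class CbrRefreshModel:
    def __init__(self, num_rows: int = 1024):
        self.num_rows = num_rows
        self.counter = 0                  # internal refresh row counter
        self.refreshed: list[int] = []    # log of rows refreshed, in order

    def cas_before_ras(self) -> int:
        """Refresh the row the counter points to, then increment the counter."""
        row = self.counter
        self.refreshed.append(row)        # stands in for sense, amplify, restore
        self.counter = (self.counter + 1) % self.num_rows
        return row


def refresh_spacing_us(retention_ms: float, num_rows: int) -> float:
    """Maximum average interval between row refreshes, in microseconds,
    if refreshes are spread evenly over the retention time."""
    return retention_ms * 1000.0 / num_rows
```

Because the counter lives on the chip, the controller only has to issue CBR cycles often enough. It never has to supply row addresses for refresh, which is one reason CBR refresh simplifies controller design compared with RAS-only refresh.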

DRAM Commercial Product
• Micron Synchronous DRAM:
  – MT48LC128M4A2: 32M x 4 x 4 banks
  – MT48LC64M8A2: 16M x 8 x 4 banks
  – MT48LC32M16A2: 8M x 16 x 4 banks
• SEE HANDOUT
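The three Micron organizations trade depth for width at the same total density. A quick check, assuming "32M x 4 x 4 banks" reads as locations per bank x data width x number of banks:

```python
# Sketch: total density of each listed SDRAM organization, assuming the
# notation is (locations per bank in M) x (bits per location) x (banks).

PARTS = {
    "MT48LC128M4A2": (32, 4, 4),
    "MT48LC64M8A2": (16, 8, 4),
    "MT48LC32M16A2": (8, 16, 4),
}


def density_mbit(depth_m: int, width: int, banks: int) -> int:
    """Total megabits: locations per bank x width x number of banks."""
    return depth_m * width * banks


if __name__ == "__main__":
    for name, org in PARTS.items():
        print(name, density_mbit(*org), "Mbit")
```

Under that reading, every part is a 512 Mbit device; the choice among them is about how many data pins each chip contributes to a memory word.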

A 2-D CMOS ROM Chip
[Figure: 2-D CMOS ROM chip with +V supply, address into a row decoder, CS input, and a stored pattern of 1s and 0s.]

ROM Types

ROM Type            | Cost             | Programmability   | Time to Program | Time to Erase
Mask-programmed ROM | Very inexpensive | At factory only   | Weeks           | N/A
PROM                | Inexpensive      | Once, by end user | Seconds         | N/A
EPROM               | Moderate         | Many times        | Seconds         | Minutes
Flash EPROM         | Moderate         | Many times        | 100 µs          | 1 s, large block
EEPROM              | Expensive        | Many times        | 100 µs          | 10 ms, byte

Memory Boards and Modules
• There is a need for memories that are larger and wider than a single chip.
• Chips can be organized into “boards.”
  – Boards may not be actual, physical boards, but may consist of structured chip arrays present on the motherboard.
• A board or collection of boards makes up a memory module.
• Memory modules:
  – Satisfy the processor–main memory interface requirements
  – May have DRAM refresh capability
  – May expand the total main memory capacity
  – May be interleaved to provide faster access to blocks of words

General Structure of a Memory Chip
This is a slightly different view of the memory chip from the previous one.
[Figure: multiple chip-select inputs combined into CS; m-bit address into the row decoder; memory cell array; I/O multiplexer; R/W; s-bit data. Simplified symbol with CS, R/W, Address, and Data (s bits).]
Multiple chip selects ease the assembly of chips into chip arrays. (Usually provided by an external AND gate.)

Word Assembly from Narrow Chips
All chips have common CS, R/W, and Address lines.
[Figure: p chips, each with s-bit Data, sharing Select (CS), R/W, and Address; their data lines together form a p×s-bit word.]
p chips expand the word size from s bits to p × s bits.
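Because the chips share CS, R/W, and Address, a wide access is just p narrow accesses at the same address, each handling its own slice of the word. In this sketch chip i is assumed to hold bits i·s through i·s + s − 1, and each chip is modeled as a plain list. The class name is hypothetical.

```python
# Sketch: p chips, each s bits wide, sharing CS, R/W, and Address lines so
# that together they store a p*s-bit word. Assumes chip i holds bits
# i*s .. i*s + s - 1 of the word; names are hypothetical.

class WideWordMemory:
    def __init__(self, p: int, s: int, words: int):
        self.p, self.s = p, s
        self.chips = [[0] * words for _ in range(p)]

    def write(self, address: int, word: int) -> None:
        mask = (1 << self.s) - 1
        for i, chip in enumerate(self.chips):
            # Every chip sees the same address; each stores its own s-bit slice.
            chip[address] = (word >> (i * self.s)) & mask

    def read(self, address: int) -> int:
        word = 0
        for i, chip in enumerate(self.chips):
            # Concatenate the slices back into one p*s-bit word.
            word |= chip[address] << (i * self.s)
        return word
```

The number of words stays the same as in one chip; only the width grows.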

Increasing the Number of Words by a Factor of 2^k
The additional k address bits are used to select one of 2^k chips, each of which has 2^m words:
[Figure: (m+k)-bit address; k bits drive a k-to-2^k decoder whose outputs feed the CS inputs of the 2^k chips; the remaining m address bits, R/W, and the s-bit data lines are common to all chips.]
Word size remains at s bits.
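In this arrangement the decoder output acts as the chip select. The sketch assumes the k high-order address bits go to the k-to-2^k decoder and the m low-order bits go to every chip; the slide's figure does not fix which end of the address is used. The class and method names are hypothetical.

```python
# Sketch: 2^k chips of 2^m s-bit words each. Assumes the k high-order
# address bits drive the k-to-2^k decoder (chip select) and the m low-order
# bits go to every chip. Names are hypothetical.

class WordExpandedMemory:
    def __init__(self, m: int, k: int, s: int):
        self.m, self.k, self.s = m, k, s
        self.chips = [[0] * (1 << m) for _ in range(1 << k)]

    def _decode(self, addr: int) -> tuple[int, int]:
        if not 0 <= addr < 1 << (self.m + self.k):
            raise ValueError("address out of range")
        chip = addr >> self.m                 # decoder input: k high bits
        offset = addr & ((1 << self.m) - 1)   # shared address lines: m low bits
        return chip, offset

    def write(self, addr: int, word: int) -> None:
        chip, offset = self._decode(addr)
        # Only the selected chip stores; the word stays s bits wide.
        self.chips[chip][offset] = word & ((1 << self.s) - 1)

    def read(self, addr: int) -> int:
        chip, offset = self._decode(addr)
        return self.chips[chip][offset]
```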

Chip Using 2 Chip Selects
[Figure: (m+q+k)-bit address; k bits drive a horizontal decoder, q bits drive a vertical decoder, and m bits go to every chip; each chip has R/W, CS1, CS2, Address, and Data; the result is one of 2^(m+q+k) s-bit words.]
• Multiple chip select lines are used to replace the last level of gates in this matrix decoder scheme.
• This scheme simplifies the decoding from the use of a (q+k)-bit decoder to the use of one q-bit and one k-bit decoder.
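The two select lines act as the row and column of a matrix decoder, with the chip's own CS1·CS2 requirement playing the part of the final AND gate. The sketch assumes the field order [q | k | m] from most to least significant bit. All function names are hypothetical.

```python
# Sketch: decoding an (m+q+k)-bit address with one q-bit ("vertical") and
# one k-bit ("horizontal") decoder feeding CS1 and CS2 of a 2^q x 2^k chip
# grid. Assumed field order, MSB to LSB: [q | k | m]. Names are hypothetical.

def one_hot(value: int, width: int) -> list[int]:
    """Outputs of a width-to-2^width decoder."""
    return [1 if i == value else 0 for i in range(1 << width)]


def select_chip(addr: int, m: int, q: int, k: int) -> tuple[list[tuple[int, int]], int]:
    """Return (chips with both CS1 and CS2 asserted, on-chip address)."""
    if not 0 <= addr < 1 << (m + q + k):
        raise ValueError("address out of range")
    on_chip = addr & ((1 << m) - 1)
    horizontal = (addr >> m) & ((1 << k) - 1)
    vertical = addr >> (m + k)
    cs1 = one_hot(vertical, q)      # one line per chip row
    cs2 = one_hot(horizontal, k)    # one line per chip column
    # A chip responds only when CS1 and CS2 are both asserted, which takes
    # the place of the decoder's last level of AND gates.
    chips = [(r, c) for r in range(1 << q) for c in range(1 << k) if cs1[r] and cs2[c]]
    return chips, on_chip


def decoder_outputs(q: int, k: int) -> tuple[int, int]:
    """Decoder outputs: one (q+k)-bit decoder vs. separate q- and k-bit decoders."""
    return 1 << (q + k), (1 << q) + (1 << k)
```

For q = k = 3 the single decoder needs 64 outputs, while the split scheme needs only 8 + 8.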

3-Dimensional Dynamic RAM Array
[Figure: multiplexed m/2-bit address with RAS and CAS; the high address (kc + kr bits) feeds a 2^kc decoder and 2^kr decoders, with an Enable input at the top of the decoder tree; a grid of DRAM chips, each with RAS, CAS, R/W, Address, and Data; w-bit data.]
• CAS is used to enable the top decoder in the decoder tree.
• Use one 2-D array for each bit. Each 2-D array is on a separate board.

A Memory Module and Its Interface
The bus interface must provide:
• Read and Write signals.
• Ready: memory is ready to accept commands.
• Address: to be sent with the Read/Write command.
• Data: sent with Write, or available upon Read when Ready is asserted.
• Module select: needed when there is more than one module.
[Figure: (k+m)-bit address into an address register; k bits to chip/board selection and m bits to the memory boards and/or chips; w-bit data register; control signal generator driven by Module select, Read, and Write, producing Ready.]
Control signal generator:
• For SRAM: just strobes data on Read, and provides Ready on Read/Write.
• For DRAM: also provides CAS, RAS, and R/W, multiplexes the address, generates refresh signals, and provides Ready.

Dynamic RAM Module with Refresh Control
[Figure: (k+m)-bit address into an address register; k bits to chip/board selection, producing board and chip selects; the two m/2-bit address halves and an m/2-bit refresh counter feed an address multiplexer that drives the DRAM address lines; refresh clock and control exchange Request and Grant with the memory timing generator, which takes Module select, Read, and Write and produces RAS, CAS, R/W, and Ready; the dynamic RAM array connects through w-bit data lines to a data register.]

Two Kinds of Memory Module Organizations
Memory modules are used to allow access to more than one word simultaneously.
(a) Consecutive words in consecutive modules (interleaving): on the m-bit address bus (j + k = m), the k lsbs form the module select and the j msbs form the address within the module, for Modules 0 through 2^k – 1.
(b) Consecutive words in the same module: on the m-bit address bus (k + j = m), the k msbs form the module select and the j lsbs form the address within the module, for Modules 0 through 2^k – 1.
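The two organizations differ only in which end of the address selects the module. A sketch of the split for each, returning (module, address within module); the function names are hypothetical:

```python
# Sketch: splitting an m-bit address into (module, address-within-module)
# for 2^k modules under the two organizations. Names are hypothetical.

def interleaved(addr: int, k: int) -> tuple[int, int]:
    """(a) The k lsbs select the module; the j msbs address the word in it."""
    return addr & ((1 << k) - 1), addr >> k


def consecutive(addr: int, j: int) -> tuple[int, int]:
    """(b) The k msbs select the module; the j lsbs address the word in it."""
    return addr >> j, addr & ((1 << j) - 1)


if __name__ == "__main__":
    for a in range(6):
        print(a, interleaved(a, k=2), consecutive(a, j=4))
```

Under (a), a run of consecutive addresses, such as the words of a cache line, spreads across different modules that can work in parallel. Under (b), the same run stays inside one module.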

Timing Advantage of Interleaving
If the time to transmit information over the bus, tb, is less than the module cycle time, tc, information transmission to several modules can be time-multiplexed. Example: store each word of a cache line in a separate module.
Main Memory Address: [ Word | Module No. ]
This provides successive words in successive modules.
[Timing diagram: on the bus, “Read module 0 address” and then “Write module 3 address and data” are each sent in time tb; Module 0 performs its read (cycle time tc) while Module 3 performs its write, and Module 0 then returns its data over the bus in time tb.]
With interleaving of 2^k modules and tb < tc/2^k, it is possible to get a 2^k-fold increase in memory bandwidth, provided memory requests are pipelined. DMA satisfies this requirement.
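The bandwidth claim can be checked by comparing peak rates. This sketch assumes fully pipelined requests, a bus occupied for tb per transfer, and each module busy for tc per access. Under those assumptions the interleaved rate is limited either by the modules (n words per tc) or by the bus (one word per tb), whichever is smaller. Function names and the sample values are illustrative.

```python
# Sketch: peak bandwidth with and without interleaving, assuming fully
# pipelined requests, a bus busy tb per transfer, and each module busy tc
# per access. Names and sample values are illustrative.

def peak_rate(tb: float, tc: float, n_modules: int) -> float:
    """Words per unit time: limited by the modules or by the bus."""
    return min(n_modules / tc, 1.0 / tb)


def interleave_speedup(tb: float, tc: float, n_modules: int) -> float:
    """Bandwidth of n interleaved modules relative to a single module."""
    return peak_rate(tb, tc, n_modules) / peak_rate(tb, tc, 1)


if __name__ == "__main__":
    tc = 100.0  # module cycle time in ns (illustrative value)
    for tb in (10.0, 25.0, 50.0):
        print(f"tb = {tb} ns, 4 modules: {interleave_speedup(tb, tc, 4):.1f}x")
```

With 4 modules the full 4-fold gain appears once tb is at most tc/4; a slower bus caps the gain at tc/tb.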