Chapter 7 The Memory Hierarchy – Part I
The slides of Part I are taken in large part from V. Heuring & H. Jordan, “Computer Systems Design and Architecture.”
S07 Mark Franklin
Memory Hierarchy Outline (1)
• Memory components:
  – RAM memory cells and cell arrays.
  – Static RAM: more expensive, but less complex.
  – Tree and matrix decoders: needed for large RAM chips.
  – Dynamic RAM: less expensive, but needs “refreshing.”
    • Chip organization
    • Timing
  – ROM: read-only memory.
• Memory boards
  – Arrays of chips give more addresses and/or wider words.
  – 2-D and 3-D chip arrays.
• Memory modules
  – Large systems can benefit by partitioning memory for:
    • separate access by system components.
    • fast access to multiple words.
Memory Hierarchy Outline (2)
• The memory hierarchy, from fast and expensive to slow and cheap:
  – Registers → Cache → Main Memory → Disk
  – Cache: high speed, expensive (1st level on-chip, 2nd level off-chip).
    • Design types: direct mapped, associative, set associative.
  – Virtual memory: makes the hierarchy down to disk transparent to the program.
    • Address translation: logical address → physical address.
    • Memory management: control of information movement between levels.
    • Multiprogramming, multithreading: computation while waiting for memory → improved efficiency and resource utilization.
    • The “TLB”: for speeding up the address translation process.
• Memory as a subsystem: overall performance.
Memory Technology Characteristics

| Level | Memory Type | Average Access Time | Typical Size | Unit of Transfer (Block Size) |
|---|---|---|---|---|
| 1 | Cache (on-chip) | 0.25–10 ns | 8 KB–8 MB | Word, 16–64 bits |
| 2 | Main Memory | 40–200 ns | 2 MB–32 GB | Cache line, 8–32 B |
| 3 | Disk | 5–10 ms | ~1,000 GB | Page, 4–16 KB |
| 4 | Magnetic Tape | 1–5 s | > 1,000 GB | Record, 16 KB |
AMD Athlon
Typical Disk Drive: SATA 750 GB
Memory Performance Gap
[Figure: processor-DRAM memory gap (latency). Performance (log scale, 1 to 1000) vs. time, 1980–2000. µProc (CPU) performance grows 60%/yr (“Moore’s Law,” 2X/1.5 yr); DRAM grows 9%/yr (2X/10 yrs). The processor-memory performance gap grows 50%/yr.]
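The growth rates quoted on the slide can be checked with compound-growth arithmetic. This is a minimal sketch, assuming the rates are constant compound annual rates; the function names are illustrative. The exact figures come out near 47%/yr for the gap and about 8 years for DRAM doubling, so the slide's 50% and 10 years are rounded values.

```python
import math

CPU_RATE = 0.60   # µProc performance: +60% per year (from the slide)
DRAM_RATE = 0.09  # DRAM performance: +9% per year (from the slide)

def gap_growth(cpu_rate: float, dram_rate: float) -> float:
    """Yearly growth factor of the processor/DRAM performance ratio."""
    return (1 + cpu_rate) / (1 + dram_rate)

def doubling_time(rate: float) -> float:
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + rate)

print(f"gap grows {100 * (gap_growth(CPU_RATE, DRAM_RATE) - 1):.0f}% per year")
print(f"CPU doubles every {doubling_time(CPU_RATE):.1f} years")
print(f"DRAM doubles every {doubling_time(DRAM_RATE):.1f} years")
```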
Levels of the Memory Hierarchy: Capacity, Access Time, Cost
[Figure: levels of the memory hierarchy and the staging/transfer unit between them, beginning with CPU registers (100s of bytes) …]
… 8, require additional gate levels.
• When fan-in > 8, tree and matrix decoders are often used.
[Figure: 3-to-8 line tree decoder constructed from 2-input gates (inputs x0–x2, outputs m0–m7).]
[Figure: 4-to-16 line matrix decoder constructed from 2-input gates: two 2-to-4 decoders (on x0, x1 and on x2, x3) drive a 4 × 4 array of gates with outputs m0–m15.]
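Both decoder styles above can be modelled at the gate level. This is a minimal behavioural sketch, assuming active-high signals represented as 0/1 integers; the function names are hypothetical, and the tree arrangement shown (decode the low two bits, then add one level of 2-input ANDs with x2 or NOT x2) is one plausible construction rather than a copy of the slide's figure.

```python
from itertools import product

def decoder_2to4(x1: int, x0: int) -> list[int]:
    """2-to-4 decoder from inverters and 2-input AND gates.
    Output i is 1 exactly when (x1, x0) is the binary code for i."""
    n1, n0 = 1 - x1, 1 - x0  # inverters
    return [n1 & n0, n1 & x0, x1 & n0, x1 & x0]

def tree_decoder_3to8(x2: int, x1: int, x0: int) -> list[int]:
    """3-to-8 tree decoder: decode (x1, x0) first, then one more level of
    2-input AND gates combines each partial output with NOT x2 or x2."""
    low = decoder_2to4(x1, x0)
    return [m & (1 - x2) for m in low] + [m & x2 for m in low]

def matrix_decoder_4to16(x3: int, x2: int, x1: int, x0: int) -> list[int]:
    """4-to-16 matrix decoder: a row decoder on (x3, x2) and a column decoder
    on (x1, x0) feed a 4 x 4 grid of 2-input AND gates; gate (r, c) drives m[4r + c]."""
    rows = decoder_2to4(x3, x2)
    cols = decoder_2to4(x1, x0)
    return [rows[r] & cols[c] for r in range(4) for c in range(4)]

# Exhaustive check: exactly one output line is asserted, and it is the right one.
for x3, x2, x1, x0 in product((0, 1), repeat=4):
    out = matrix_decoder_4to16(x3, x2, x1, x0)
    assert out.count(1) == 1 and out.index(1) == 8 * x3 + 4 * x2 + 2 * x1 + x0
```

Every gate here has fan-in 2. The tree decoder uses 4 + 8 = 12 AND gates, and the matrix decoder uses 4 + 4 + 16 = 24, instead of 16 four-input gates for a single-level 4-to-16 decoder.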
6-Transistor Static RAM Cell
[Figure: the storage cell (a latch with active loads, powered from +5 V) is connected through two switches, controlled by word line w_i, to the dual-rail data lines b_i and NOT b_i, which are used for both reading and writing. Additional cells share the bit lines. A column select (from the column address decoder) connects them to the sense/write amplifiers, which are controlled by R/W and CS and connect to data line d_i.]
• Sense/write amplifiers sense and amplify data on a read, and drive b_i and NOT b_i on a write.
Reading a value:
1) “Precharge” the bit lines to a value halfway between a 0 and a 1.
2) Assert the word line. This allows the latch to drive the bit lines toward the value stored in the latch.
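The two-step read above works because the sense amplifier only has to decide which of two precharged lines moved up. This toy model is a sketch under stated assumptions: voltages are normalised to VDD = 1, and the `swing` the latch produces before sensing is an arbitrary illustrative value, not device data.

```python
VDD = 1.0  # normalised supply voltage (assumption for illustration)

def sram_read(stored_bit: int, swing: float = 0.1) -> int:
    """Toy model of a dual-rail SRAM read; `swing` is an assumed voltage change."""
    # 1) Precharge both bit lines halfway between a 0 and a 1.
    b = b_bar = VDD / 2
    # 2) Assert the word line: the latch pulls b toward the stored value
    #    and NOT b toward its complement.
    if stored_bit:
        b, b_bar = b + swing, b_bar - swing
    else:
        b, b_bar = b - swing, b_bar + swing
    # 3) The sense amplifier only needs the sign of the small difference.
    return 1 if b > b_bar else 0

print(sram_read(1), sram_read(0))
```

Because both lines start at the midpoint, even a small swing gives an unambiguous differential signal, which is why the latch does not have to drive the long bit lines all the way to a full 0 or 1 before the value can be sensed.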
Static RAM Read Operation
[Timing diagram: memory address, Read/write, CS, and Data, with t_AA marked.]
t_AA (access time from address): the time required for the RAM array to decode the address and provide the value on the data bus.
Static RAM Write Operations
[Timing diagram: memory address, Read/write, CS, and Data, with t_w marked.]
t_w (write time): the time the data must be held valid in order to decode the address and store the value in the memory cells.
Example Commercial Product
• NEC Electronics: 8M standard synchronous SRAM, 512K words × 16 bits, various package types, supply voltage 3.3 V or 2.5 V.
• Part numbers: µPD4482161, µPD4482162.

| Clk. Freq. (MHz) | Max. Access Time (ns) | Type |
|---|---|---|
| 133 | 6.5 | Flow Thru |
| 117 | 7.5 | Flow Thru |
| 100 | 8.5 | Flow Thru |
| 225 | 2.8 | Pipelined |
| 200 | 3.1 | Pipelined |
| 167 | 3.5 | Pipelined |

• Also 512K words × 18 bits; 512K words × 32 bits; 512K words × 36 bits.
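The headline numbers for the 512K × 16 part can be derived directly. This is a sketch with illustrative helper names; the bandwidth figure assumes one 16-bit word is transferred per clock cycle, which is an assumption about sustained pipelined operation rather than a datasheet value.

```python
import math

def sram_geometry(words: int, width_bits: int) -> tuple[int, int]:
    """Return (address lines, total capacity in bits) for a words x width_bits chip."""
    address_lines = math.ceil(math.log2(words))
    return address_lines, words * width_bits

def peak_bandwidth_mb_s(clock_mhz: float, width_bits: int) -> float:
    """Peak transfer rate in MB/s, ASSUMING one word per clock cycle."""
    return clock_mhz * width_bits / 8  # MHz x bytes per cycle = MB/s

lines, bits = sram_geometry(512 * 1024, 16)
print(lines, "address lines,", bits // 2**20, "Mbit")        # the "8M" in the name
print(peak_bandwidth_mb_s(225, 16), "MB/s at 225 MHz (pipelined)")
```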
Dynamic RAM Organization
[Figure: a DRAM cell. A switch controlled by word line w_j connects the storage capacitor to a single bit line b_i; additional cells share the bit line. A column select (from the column address decoder) connects the bit line to the sense/write amplifiers, which are controlled by R/W and CS and connect to data line d_i.]
• The capacitor stores charge for a 1 and no charge for a 0.
• The capacitor discharges in 4–15 ms. Refreshing the capacitor: read (sense) the value on the bit line, amplify it, and place it back on the bit line to recharge the capacitor.
• Write: place the value on the bit line and assert the word line.
• Read: precharge the bit line, assert the word line, and sense the value on the bit line with the sense amplifier.
• Sense/write amplifiers sense and amplify data on a read, and drive b_i on a write.
• The need to refresh the storage cells of DRAM chips complicates DRAM system design.
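The write, read, and refresh steps above can be illustrated with a charge-sharing model. This is a toy sketch: the class name, the normalised capacitances, and the leakage model are all hypothetical choices for illustration, not device parameters. In this model a read also writes the amplified value back, mirroring the refresh step described above.

```python
class DramCell:
    """Toy 1-capacitor DRAM cell with normalised voltages (illustrative values only)."""

    def __init__(self, c_cell: float = 1.0, c_bitline: float = 5.0, vdd: float = 1.0):
        self.c_cell, self.c_bit, self.vdd = c_cell, c_bitline, vdd
        self.v = 0.0  # capacitor voltage: charged for a 1, no charge for a 0

    def write(self, bit: int) -> None:
        # Place the value on the bit line and assert the word line.
        self.v = self.vdd if bit else 0.0

    def leak(self, fraction: float) -> None:
        # Stored charge drains away over time; this is why refresh is needed.
        self.v *= 1 - fraction

    def read(self) -> int:
        # Precharge the bit line to VDD/2, then assert the word line:
        # charge sharing nudges the bit line slightly above or below VDD/2.
        v_pre = self.vdd / 2
        v_shared = (self.c_cell * self.v + self.c_bit * v_pre) / (self.c_cell + self.c_bit)
        bit = 1 if v_shared > v_pre else 0  # sense amplifier decision
        self.write(bit)  # amplified value placed back on the bit line (refresh)
        return bit

cell = DramCell()
cell.write(1)
cell.leak(0.3)      # refreshed in time: still reads as 1, and is restored to full charge
print(cell.read())
cell.leak(0.6)      # too much charge lost before the next access: the 1 reads back as 0
print(cell.read())
```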
DRAM Chip Organization
• Addresses are time-multiplexed on the address bus, using RAS (Row Address Select) and CAS (Column Address Select) as strobes for rows and columns.
• CAS is normally used as the CS function.
[Figure: 1M × 1 DRAM chip. Address lines A0–A9 feed 10-bit row latches and a decoder that select one of the 1024 rows of a 1024 × 1024 cell array; 1024 sense/write amplifiers and column latches connect the array to 10 column address latches and 1-of-1024 muxes and demuxes, which drive data out d_o and accept data in d_i. Control logic takes RAS, CAS, and R/W.]
Pin counts:
• Without address multiplexing: 27 pins, including power and ground.
• With address multiplexing: 17 pins, including power and ground.
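Address multiplexing means a 20-bit cell address is sent in two 10-bit halves over pins A0–A9. This sketch assumes the upper 10 bits form the row address (strobed by RAS) and the lower 10 bits form the column address (strobed by CAS); the bit assignment and function name are illustrative assumptions.

```python
ROW_BITS = 10  # A0-A9 carry the row address while RAS is asserted
COL_BITS = 10  # ...and then the column address while CAS is asserted

def multiplex_address(addr: int) -> tuple[int, int]:
    """Split a 20-bit cell address into (row, column) halves for the A0-A9 pins."""
    if not 0 <= addr < 1 << (ROW_BITS + COL_BITS):
        raise ValueError("address does not fit a 1024 x 1024 array")
    row = addr >> COL_BITS               # sent first, strobed by RAS
    col = addr & ((1 << COL_BITS) - 1)   # sent second, strobed by CAS
    return row, col

print(multiplex_address(0xABCDE))
```

Halving the address pins from 20 to 10 costs two extra control pins (RAS and CAS), which is where most of the saving in the pin counts above comes from.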
DRAM Read and Write Cycles
[Timing diagrams: typical DRAM read operation and typical DRAM write operation. Signals: memory address (row address, then column address), RAS, CAS, R/W (W for the write), and Data. Marked intervals: t_RAS, t_Prechg, t_A (access time), t_C (cycle time), and t_DHR (data hold from RAS).]
Notice that it is the bit line precharge operation that causes the difference between access time and cycle time.
DRAM Refresh & Row Access
• Refresh is usually accomplished by a “RAS-only” cycle. The row address is placed on the address lines and RAS is asserted; this refreshes the entire row. CAS is not asserted. The absence of a CAS phase signals the chip that a row refresh is requested, and thus no data is placed on the external data lines.
• Many chips use “CAS before RAS” to signal a refresh. The chip has an internal counter, and whenever CAS is asserted before RAS, it is a signal to refresh the row pointed to by the counter and to increment the counter.
• Most DRAM vendors also supply DRAM controllers that encapsulate the refresh and other functions.
• Page mode, nibble mode, and static column mode allow rapid access to the entire row that has been read into the column latches.
• Video RAMs (VRAMs) clock an entire row into a shift register, where it can be rapidly read out, bit by bit, for display.
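The “CAS before RAS” scheme and the refresh budget can be sketched together. This assumes the 1024-row array and the 4 ms lower bound on retention quoted on earlier slides, and that refreshes are spread evenly; the class and function names are hypothetical.

```python
class CbrRefresh:
    """Sketch of CAS-before-RAS refresh: an on-chip counter names the next row."""

    def __init__(self, rows: int = 1024):
        self.rows = rows
        self.counter = 0
        self.refreshed = []  # log of refreshed rows, for inspection

    def cas_before_ras(self) -> int:
        """Refresh the row the counter points to, then increment the counter (wrapping)."""
        row = self.counter
        self.refreshed.append(row)
        self.counter = (self.counter + 1) % self.rows
        return row

def row_refresh_interval_us(retention_ms: float, rows: int) -> float:
    """Longest average spacing between row refreshes so that every row
    is refreshed once per retention time (evenly spread refreshes assumed)."""
    return retention_ms * 1000 / rows

print(row_refresh_interval_us(4, 1024), "us between row refreshes")
```

With 4 ms retention and 1024 rows, a refresh cycle is needed about every 3.9 µs. The controller does not need to supply a row address, because the counter inside the chip tracks it.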
DRAM Commercial Product
• Micron synchronous DRAM:
  – MT48LC128M4A2: 32M × 4 × 4 banks
  – MT48LC64M8A2: 16M × 8 × 4 banks
  – MT48LC32M16A2: 8M × 16 × 4 banks
• See handout.
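The three Micron organizations listed above describe the same total capacity arranged differently. This is a short arithmetic sketch; the dictionary simply restates the depth × width × banks figures from the list.

```python
# (depth in M words per bank, width in bits, banks), restated from the list above
PARTS = {
    "MT48LC128M4A2": (32, 4, 4),
    "MT48LC64M8A2": (16, 8, 4),
    "MT48LC32M16A2": (8, 16, 4),
}

def capacity_mbit(depth_m: int, width: int, banks: int) -> int:
    """Total capacity in Mbit = words per bank x bits per word x banks."""
    return depth_m * width * banks

for name, org in PARTS.items():
    print(name, capacity_mbit(*org), "Mbit")
```

All three come to 512 Mbit. They trade depth for width, so a system designer picks the part whose word width best matches the memory board being built.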
A 2-D CMOS ROM Chip
[Figure: 2-D CMOS ROM chip with address input, row decoder, CS, and +V supply.]
ROM Types

| ROM Type | Cost | Programmability | Time to Program | Time to Erase |
|---|---|---|---|---|
| Mask-programmed ROM | Very inexpensive | At factory only | Weeks | N/A |
| PROM | Inexpensive | Once, by end user | Seconds | N/A |
| EPROM | Moderate | Many times | Seconds | Minutes |
| Flash EPROM | Moderate | Many times | 100 µs | 1 s, large block |
| EEPROM | Expensive | Many times | 100 µs | 10 ms, byte |
Memory Boards and Modules
• There is a need for memories that are larger and wider than a single chip.
• Chips can be organized into “boards.”
  – Boards may not be actual, physical boards, but may consist of structured chip arrays present on the motherboard.
• A board or collection of boards makes up a memory module.
• Memory modules:
  – Satisfy the processor–main memory interface requirements.
  – May have DRAM refresh capability.
  – May expand the total main memory capacity.
  – May be interleaved to provide faster access to blocks of words.
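General Structure of a Memory Chip
This is a slightly different view of the memory chip from the previous one.
[Figure: m address lines drive a row decoder into the memory cell array; an I/O multiplexer connects the array to s data lines; multiple chip selects combine to form CS; R/W controls reads and writes. Simplified symbol: a chip with CS, R/W, m-bit Address, and s-bit Data.]
• Multiple chip selects ease the assembly of chips into chip arrays. The combined chip select is usually provided by an external AND gate.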
Word Assembly from Narrow Chips
All chips have common CS, R/W, and Address lines.
[Figure: p chips, each s bits wide, share the Select (CS), R/W, and Address lines; their s-bit Data lines are placed side by side to form a p × s-bit word.]
p chips expand the word size from s bits to p × s bits.
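Because every chip sees the same address, CS, and R/W, a wide word is simply split into s-bit slices, one per chip. This sketch models each chip as a hypothetical Python dict from address to an s-bit value and assumes chip i holds bit positions i·s through (i+1)·s − 1.

```python
def write_wide_word(chips: list[dict], address: int, word: int, s: int) -> None:
    """Store a p*s-bit word across p chips; chip i gets bits [i*s, (i+1)*s)."""
    mask = (1 << s) - 1
    for i, chip in enumerate(chips):
        chip[address] = (word >> (i * s)) & mask  # same address on every chip

def read_wide_word(chips: list[dict], address: int, s: int) -> int:
    """Reassemble the word by placing each chip's s-bit slice side by side."""
    word = 0
    for i, chip in enumerate(chips):
        word |= chip[address] << (i * s)
    return word

chips = [{} for _ in range(4)]        # p = 4 chips, each s = 4 bits wide
write_wide_word(chips, 5, 0xBEEF, 4)  # one 16-bit word
print(hex(read_wide_word(chips, 5, 4)))
```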
Increasing the Number of Words by a Factor of 2^k
The additional k address bits are used to select one of 2^k chips, each of which has 2^m words:
[Figure: an (m+k)-bit address is split; k bits drive a k-to-2^k decoder whose outputs are the CS inputs of the 2^k chips, and the remaining m bits go to the Address inputs of every chip. All chips share R/W and the s-bit Data lines.]
Word size remains at s bits.
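The address split described above can be written out directly. This sketch assumes the k additional bits are the high-order bits of the (m+k)-bit address and that chip selects are active-high; the function names are illustrative.

```python
def select_chip(address: int, m: int, k: int) -> tuple[int, int]:
    """Split an (m+k)-bit address into (chip number, in-chip address).
    The high k bits go to the k-to-2^k decoder; the low m bits go to every chip."""
    if not 0 <= address < 1 << (m + k):
        raise ValueError("address out of range")
    return address >> m, address & ((1 << m) - 1)

def decode(chip: int, k: int) -> list[int]:
    """k-to-2^k decoder outputs: one-hot CS lines, one per chip."""
    return [int(i == chip) for i in range(1 << k)]

chip, offset = select_chip(35, m=4, k=2)  # 35 = 0b10_0011
print(chip, offset, decode(chip, 2))
```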
Chip Using 2 Chip Selects
[Figure: an (m+q+k)-bit address is split into k bits for a horizontal decoder, q bits for a vertical decoder, and m bits that go to every chip. Each chip receives CS1 and CS2 from the two decoders, plus R/W, Address, and s-bit Data. The array selects one of 2^(m+q+k) s-bit words.]
• Multiple chip select lines are used to replace the last level of gates in this matrix decoder scheme.
• This scheme simplifies the decoding from use of a (q+k)-bit decoder to using one q-bit and one k-bit decoder.
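The two chip selects act as the final AND gate of a matrix decoder. This sketch assumes the k horizontal bits are the most significant, followed by the q vertical bits, then the m in-chip bits, and that CS1 comes from the horizontal decoder; these assignments and the function names are illustrative assumptions.

```python
def split_address(address: int, m: int, q: int, k: int) -> tuple[int, int, int]:
    """Split an (m+q+k)-bit address into (horizontal field, vertical field, in-chip address)."""
    if not 0 <= address < 1 << (m + q + k):
        raise ValueError("address out of range")
    offset = address & ((1 << m) - 1)
    v = (address >> m) & ((1 << q) - 1)
    h = address >> (m + q)
    return h, v, offset

def chip_selected(h_field: int, v_field: int, chip_h: int, chip_v: int) -> int:
    """The chip at grid position (chip_h, chip_v) responds only if both CS inputs are high."""
    cs1 = int(h_field == chip_h)  # one output line of the k-bit horizontal decoder
    cs2 = int(v_field == chip_v)  # one output line of the q-bit vertical decoder
    return cs1 & cs2              # the chip's own CS logic replaces the last gate level

def decoder_outputs(q: int, k: int) -> tuple[int, int]:
    """Decoder output lines: one (q+k)-bit decoder vs. separate q-bit and k-bit decoders."""
    return 2 ** (q + k), 2 ** q + 2 ** k

h, v, off = split_address((5 << 7) | (3 << 4) | 15, m=4, q=3, k=3)
print(h, v, off, decoder_outputs(3, 3))
```

For q = k = 3, a single 6-bit decoder needs 64 outputs, while the split scheme needs only 8 + 8 = 16 decoder outputs.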
3-Dimensional Dynamic RAM Array
[Figure: a decoder tree (a 2^kc decoder at the top, 2^kr decoders below) decodes the k_c + k_r high address bits into per-chip RAS and CAS strobes; the m/2-bit multiplexed address and R/W go to every chip; each chip has RAS, CAS, R/W, Address, and Data; w data bits in total.]
• CAS is used to enable the top decoder in the decoder tree.
• Use one 2-D array for each bit. Each 2-D array is on a separate board.
A Memory Module and Its Interface
Must provide:
• Read and Write signals.
• Ready: memory is ready to accept commands.
• Address: to be sent with the Read/Write command.
• Data: sent with Write, or available upon Read when Ready is asserted.
• Module select: needed when there is more than one module.
[Figure: bus interface with a (k+m)-bit address register, split into k bits for chip/board selection and m bits for the memory boards and/or chips; a w-bit data register; a control signal generator driven by Read, Write, and Module select that produces Ready.]
• Control signal generator: for SRAM, it just strobes data on Read and provides Ready on Read/Write.
• For DRAM, it also provides CAS, RAS, and R/W, multiplexes the address, generates refresh signals, and provides Ready.
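Dynamic RAM Module with Refresh Control
[Figure: a (k+m)-bit address register supplies k bits to chip/board selection and two m/2-bit halves to an address multiplexer, which also accepts the m/2-bit output of a refresh counter and drives the m/2 address lines of the dynamic RAM array. A memory timing generator, driven by Read, Write, and Module select, produces RAS, CAS, R/W, the board and chip selects, and Ready; it exchanges Request and Grant signals with the refresh clock and control unit. A w-bit data register connects the data lines to the external Data bus.]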
Two Kinds of Memory Module Organizations
Memory modules are used to allow access to more than one word simultaneously.
(a) Consecutive words in consecutive modules (interleaving): on a j + k = m-bit address bus, the k lsbs drive the module select and the j msbs give the address within Module 0 … Module 2^k – 1.
(b) Consecutive words in the same module: on a k + j = m-bit address bus, the k msbs drive the module select and the j lsbs give the address within Module 0 … Module 2^k – 1.
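The two organizations differ only in which address bits select the module. This sketch returns (module, word-within-module) for each scheme; the function names are illustrative, and the field positions follow the two cases described above.

```python
def interleaved(address: int, k: int) -> tuple[int, int]:
    """(a) Interleaving: the k lsbs select the module, the j msbs address a word inside it."""
    return address & ((1 << k) - 1), address >> k

def consecutive(address: int, m: int, k: int) -> tuple[int, int]:
    """(b) The k msbs of the m-bit address select the module, the j = m - k lsbs address a word."""
    j = m - k
    return address >> j, address & ((1 << j) - 1)

# Eight consecutive addresses on a 6-bit bus with 2^2 = 4 modules.
for a in range(8):
    print(a, "interleaved ->", interleaved(a, 2), " consecutive ->", consecutive(a, 6, 2))
```

Under (a), a run of consecutive addresses such as a cache line spreads across all modules, so the modules can work in parallel. Under (b), the same run stays inside one module.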
Timing Advantage of Interleaving
• If the time to transmit information over the bus, t_b, is less than the module cycle time, t_c, information transmission to several modules can be time-multiplexed.
• Example: store one word of each cache line in a separate module.
• Main memory address: Word | Module No. This provides successive words in successive modules.
[Timing diagram: on the bus, “read module 0 address” is followed by “write module 3 address and data,” each taking t_b; module 0 performs its read (taking t_c) while module 3 performs its write, and module 0 then returns its data over the bus.]
• With interleaving of 2^k modules and t_b < t_c/2^k, it is possible to get a 2^k-fold increase in memory bandwidth, provided memory requests are pipelined. DMA satisfies this requirement.
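The bandwidth claim can be checked with a small scheduling simulation. This is a sketch under simplifying assumptions: requests for consecutive words are pipelined onto the bus, each request occupies the bus for t_b, a module can begin a new access only t_c after its previous one began, and the bus time for returning data is ignored. The function name is illustrative.

```python
def read_time(n_words: int, t_b: float, t_c: float, modules: int) -> float:
    """Time to finish reading n consecutive words with low-order interleaving."""
    module_free = [0.0] * modules  # earliest time each module can start a new cycle
    bus_free = 0.0                 # earliest time the bus can carry the next request
    finish = 0.0
    for i in range(n_words):
        mod = i % modules          # successive words live in successive modules
        start = max(bus_free, module_free[mod])
        bus_free = start + t_b
        module_free[mod] = start + t_c
        finish = max(finish, start + t_c)
    return finish

print(read_time(8, t_b=10, t_c=40, modules=1))  # one module: every access waits t_c
print(read_time(8, t_b=10, t_c=40, modules=4))  # 2^2 modules with t_b = t_c / 4
```

With one module, 8 words take 8 × 40 = 320 time units. With four modules and t_b = t_c/4, a new access starts every t_b, so 8 words finish at 7 × 10 + 40 = 110, and the speedup approaches 4 = 2^k as the number of words grows.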