Computer Architecture
ESE 345 Computer Architecture Memory Technology
Early Read-Only Memory Technologies
Punched cards: from the early 1700s through the Jacquard loom, Babbage, and then IBM
Diode matrix: EDSAC-2 µcode store
Punched paper tape: instruction stream in Harvard Mk 1
IBM Card Capacitor ROS
IBM Balanced Capacitor ROS
Early Read/Write Main Memory Technologies
Babbage, 1800s: Digits stored on mechanical wheels
Williams Tube, Manchester Mark 1, 1947
Mercury Delay Line, Univac 1, 1951
Also, regenerative capacitor memory on the Atanasoff-Berry computer, and rotating magnetic drum memory on the IBM 650
MIT Whirlwind Core Memory
Core Memory
Core memory was the first large-scale, reliable main memory, invented by Forrester in the late 1940s/early 1950s at MIT for the Whirlwind project.
Bits are stored as magnetization polarity on small ferrite cores threaded onto a two-dimensional grid of wires.
Coincident current pulses on the X and Y wires write a cell and also sense its original state (destructive reads).
• Robust, non-volatile storage
• Used on space shuttle computers until recently
• Cores threaded onto wires by hand (25 billion a year at peak production)
• Core access time ~1µs
[Photo: DEC PDP-8/E board, 4K words x 12 bits (1968)]
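Coincident-current selection and the destructive read described above can be sketched as a toy Python model. It assumes each selected X or Y wire carries half of the switching current, that a core flips only when it sees the full current, and that a read is modeled as writing a 0 and watching whether the core flipped. The names `write_plane` and `read_core` and the 4 x 4 plane size are hypothetical, for illustration only.

```python
HALF = 0.5       # current on one selected X or Y wire (fraction of the switching threshold)
THRESHOLD = 1.0  # a core switches only when its total current reaches this value


def write_plane(plane, x, y, bit):
    """Pulse X wire x and Y wire y; only the core at their intersection gets full current."""
    for row in range(len(plane)):
        for col in range(len(plane[row])):
            current = (HALF if row == x else 0.0) + (HALF if col == y else 0.0)
            if current >= THRESHOLD:
                plane[row][col] = bit  # half-selected cores (current 0.5) keep their state


def read_core(plane, x, y):
    """Destructive read: drive the core toward 0, sense whether it flipped, then write back."""
    before = plane[x][y]             # stands in for the core's magnetization
    write_plane(plane, x, y, 0)      # the read pulse itself erases a stored 1
    flipped = before != plane[x][y]  # a sense wire would see a pulse only if the core switched
    if flipped:
        write_plane(plane, x, y, 1)  # restore the value the read destroyed
    return 1 if flipped else 0


plane = [[0] * 4 for _ in range(4)]  # hypothetical 4 x 4 core plane
write_plane(plane, 1, 2, 1)
print(read_core(plane, 1, 2), plane[1][2])  # 1 1
```

The write-back step is why every core read costs a full read-write cycle.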
Semiconductor Memory
• Semiconductor memory began to be competitive in the early 1970s
- Intel was formed to exploit the market for semiconductor memory
- Early semiconductor memory was Static RAM (SRAM); SRAM cell internals are similar to a latch (cross-coupled inverters)
• The first commercial Dynamic RAM (DRAM) was the Intel 1103
- 1Kbit of storage on a single chip
- Charge on a capacitor used to hold value
• Semiconductor memory quickly replaced core in the '70s
One-Transistor Dynamic RAM [Dennard, IBM]
[Figure: 1-T DRAM cell (word line, bit line, access transistor, storage capacitor implemented as FET gate, trench, or stack) and cell cross-section (TiN top electrode at VREF, Ta2O5 dielectric, W bottom electrode, poly word line, access transistor)]
Sub-70 nm DRAM Structure
[Samsung, sub-70nm DRAM, 2004]
1977: DRAM faster than microprocessors
Apple ][ (1977): CPU 1000 ns, DRAM 400 ns
[Photo: Steve Jobs and Steve Wozniak]
Processor-DRAM Gap (latency)
[Figure: performance (log scale, 1 to 1000) vs. time, 1980 to 2000. µProc (CPU) performance improves 60%/year and DRAM 7%/year, so the processor-memory performance gap grows 50%/year.]
A four-issue 3GHz superscalar accessing 100ns DRAM could execute 1,200 instructions during the time for one memory access!
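The 1,200-instruction figure and the 50%/year gap both follow from simple arithmetic, sketched below in Python. The function names are hypothetical; the rates (60%/year for the processor, 7%/year for DRAM) and the 4-issue, 3GHz, 100ns example come from the slide.

```python
def instructions_per_miss(issue_width, clock_ghz, dram_ns):
    """Peak instructions a processor could issue while waiting for one DRAM access."""
    cycles = clock_ghz * dram_ns  # GHz x ns = clock cycles
    return issue_width * cycles


def gap_after(years, cpu_rate=0.60, dram_rate=0.07):
    """Ratio of CPU to DRAM performance improvement after `years` of compound growth."""
    return ((1 + cpu_rate) / (1 + dram_rate)) ** years


print(instructions_per_miss(4, 3.0, 100))  # 1200.0
print(round((1.60 / 1.07) - 1, 2))         # 0.5: the gap grows about 50% per year
```

Because the gap compounds, ten years at these rates widens it by a factor of roughly 56.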
Levels of the Memory Hierarchy
Upper Level
Capacity / Access Time / Cost
CPU Registers: 100s Bytes
make sure equal!
2. Select row
3. Cell pulls one line low
4. Sense amp on column detects difference between bit and bit-bar
Typical SRAM Organization: 16-word x 4-bit
[Figure: a 16 x 4 array of SRAM cells. An address decoder on A0 to A3 drives word lines Word 0 to Word 15. Each of the four columns has a write driver & precharger at the top (inputs Din 3 to Din 0, controlled by WrEn and Precharge) and a sense amp at the bottom (outputs Dout 3 to Dout 0).]
Q: Which is longer: word line or bit line?
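The 16-word x 4-bit organization can be sketched behaviorally in Python: a 4-bit address is decoded into a one-hot word-line select, writes place Din 3..0 into the selected row, and reads return that row through one "sense amp" per column. This is a functional model only (no precharge, timing, or analog behavior), and the class name `SRAM16x4` is hypothetical.

```python
class SRAM16x4:
    """Behavioral sketch of a 16-word x 4-bit SRAM array (not a circuit model)."""

    WORDS, BITS = 16, 4

    def __init__(self):
        # cells[word][bit]: one row of SRAM cells per word line
        self.cells = [[0] * self.BITS for _ in range(self.WORDS)]

    def decode(self, addr):
        """Address decoder: 4 address bits (A3..A0) -> one-hot word-line select."""
        if not 0 <= addr < self.WORDS:
            raise ValueError("address must fit in A3..A0")
        return [1 if w == addr else 0 for w in range(self.WORDS)]

    def write(self, addr, din):
        """WrEn asserted: write drivers put Din 3..0 onto the cells of the selected word."""
        for w, selected in enumerate(self.decode(addr)):
            if selected:
                self.cells[w] = [(din >> b) & 1 for b in range(self.BITS)]

    def read(self, addr):
        """Selected word drives the bit lines; one sense amp per column yields Dout 3..0."""
        word = self.cells[self.decode(addr).index(1)]
        return sum(bit << b for b, bit in enumerate(word))


m = SRAM16x4()
m.write(5, 0b1010)
print(bin(m.read(5)))  # 0b1010
```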
Logic Diagram of a Typical SRAM
[Figure: 2^N words x M bit SRAM with N-bit address A, control inputs WE_L and OE_L, and M-bit data pins D]
° Write Enable is usually active low (WE_L)
° Din and Dout are combined to save pins:
• A new control signal, output enable (OE_L), is needed
• WE_L is asserted (Low), OE_L is deasserted (High)
- D serves as the data input pin
• WE_L is deasserted (High), OE_L is asserted (Low)
- D is the data output pin
• Both WE_L and OE_L are asserted:
- Result is unknown. Don't do that!!!
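The rules for the shared D pins amount to a small truth table over two active-low signals, sketched below in Python. The function name `d_pin_mode` is hypothetical, and the case where neither signal is asserted is an assumption: the slide does not list it, but the timing diagram shows D as High Z whenever the SRAM is not driving it.

```python
def d_pin_mode(we_l, oe_l):
    """Return how the shared D pins behave for active-low WE_L and OE_L (0 = asserted)."""
    we, oe = (we_l == 0), (oe_l == 0)
    if we and not oe:
        return "input"   # write: D carries data into the SRAM
    if oe and not we:
        return "output"  # read: the SRAM drives D
    if not we and not oe:
        return "high-z"  # assumption: neither asserted, the SRAM leaves D floating
    raise ValueError("WE_L and OE_L both asserted: result is unknown")


print(d_pin_mode(0, 1), d_pin_mode(1, 0))  # input output
```

Raising an error for the "both asserted" case mirrors the slide's warning rather than guessing at hardware behavior.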
Typical SRAM Timing
[Figure: timing diagrams for the 2^N words x M bit SRAM (address A, data D, control WE_L and OE_L). Write timing: Write Address on A and Data In on D while WE_L is pulsed low, with Write Setup Time and Write Hold Time marked around the write pulse. Read timing: Read Address on A with OE_L low; D goes from High Z through Junk to Data Out after the Read Access Time.]
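Setup and hold requirements can be checked with two subtractions, sketched below in Python. It assumes, as is common for asynchronous SRAMs, that both times are referenced to the rising (end-of-write) edge of WE_L; the function name and all numeric values are hypothetical.

```python
def write_timing_ok(data_valid_from, data_valid_until, we_rise, t_setup, t_hold):
    """Check that Data In is stable t_setup before and t_hold after WE_L rises.

    All times in ns. Assumption: setup and hold are measured from the rising
    (end-of-write) edge of WE_L.
    """
    setup_met = we_rise - data_valid_from >= t_setup
    hold_met = data_valid_until - we_rise >= t_hold
    return setup_met and hold_met


# Hypothetical numbers: data valid from 10ns to 32ns, WE_L rises at 30ns.
print(write_timing_ok(10, 32, 30, t_setup=8, t_hold=2))  # True
```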
1-Transistor Memory Cell (DRAM)
[Figure: access transistor controlled by row select, connecting the storage capacitor to the bit line]
° Write:
• 1. Drive bit line
• 2. Select row
° Read:
• 1. Precharge bit line to Vdd/2
• 2. Select row
• 3. Cell and bit line share charges
- Very small voltage changes on the bit line
• 4. Sense (fancy sense amp)
- Can detect changes of ~1 million electrons
• 5. Write: restore the value
° Refresh
• 1. Just do a dummy read to every cell.
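Step 3 of the read (charge sharing) explains why the bit-line change is so small: by charge conservation the bit line moves by (Vcell - Vdd/2) x Cs / (Cs + Cb), and Cb is much larger than Cs. The Python sketch below assumes hypothetical values (Vdd = 1.2V, Cs = 25fF, Cb = 250fF); the function name `bitline_swing` is also hypothetical.

```python
def bitline_swing(v_cell, v_precharge, c_cell, c_bitline):
    """Bit-line voltage change after the access transistor connects the cell.

    Charge conservation: Cs*Vcell + Cb*Vpre = (Cs + Cb) * Vfinal.
    The cell is also left at Vfinal, which is why the read is destructive.
    """
    v_final = (c_cell * v_cell + c_bitline * v_precharge) / (c_cell + c_bitline)
    return v_final - v_precharge


VDD = 1.2                 # hypothetical supply voltage (V)
CS, CB = 25e-15, 250e-15  # hypothetical cell and bit-line capacitances (F)

print(round(bitline_swing(VDD, VDD / 2, CS, CB) * 1000, 1))  # stored 1: 54.5 (mV)
print(round(bitline_swing(0.0, VDD / 2, CS, CB) * 1000, 1))  # stored 0: -54.5 (mV)
```

A swing of tens of millivolts around Vdd/2 is what the sense amplifier must resolve before step 5 restores the cell.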
Classical DRAM Organization (square)
[Figure: square RAM cell array. The row decoder takes the row address and drives the word (row) select lines; bit (data) lines run to the column selector & I/O circuits, which take the column address and deliver data. Each intersection represents a 1-T DRAM cell.]
° Row and column address together:
• Select 1 bit at a time
DRAM Architecture
[Figure: an (N+M)-bit address is split into N row bits, decoded by the row address decoder to drive one of the word lines Row 1 to Row 2^N, and M column bits, used by the column decoder & sense amplifiers to select among bit lines Col. 1 to Col. 2^M. Each memory cell stores one bit; data leaves on D.]
• Bits stored in 2-dimensional arrays on chip
• Modern chips have around 4-8 logical banks on each chip
- each logical bank physically implemented as many smaller arrays
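Selecting one cell from an (N+M)-bit address is a shift and a mask, sketched below in Python. It assumes the row index is the high-order N bits and the column index the low-order M bits; real memory controllers may map address bits differently. The function name is hypothetical.

```python
def split_address(addr, n_row_bits, m_col_bits):
    """Split an (N+M)-bit cell address into (row, column).

    Assumption: row = upper N bits, column = lower M bits.
    """
    if not 0 <= addr < 1 << (n_row_bits + m_col_bits):
        raise ValueError("address out of range")
    row = addr >> m_col_bits
    col = addr & ((1 << m_col_bits) - 1)
    return row, col


# A square 1Mbit array (N = M = 10): 1024 rows x 1024 columns.
print(split_address((768 << 10) | 5, 10, 10))  # (768, 5)
```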
Logic Diagram of a Typical DRAM
[Figure: 256K x 8 DRAM with 9 address pins A, control inputs RAS_L, CAS_L, WE_L, OE_L, and 8 data pins D]
° Control signals (RAS_L, CAS_L, WE_L, OE_L) are all active low
° Din and Dout are combined (D):
• WE_L is asserted (Low), OE_L is deasserted (High)
- D serves as the data input pin
• WE_L is deasserted (High), OE_L is asserted (Low)
- D is the data output pin
° Row and column addresses share the same pins (A)
• RAS_L goes low: pins A are latched in as the row address
• CAS_L goes low: pins A are latched in as the column address
• RAS/CAS are edge-sensitive
DRAM Read Timing
° Every DRAM access begins at:
• The assertion of RAS_L
• 2 ways to read: early or late relative to CAS
[Figure: read waveforms for the 256K x 8 DRAM (9 address pins A, 8 data pins D) over one DRAM Read Cycle Time. A carries Row Address, then Col Address, then Junk, as RAS_L and then CAS_L fall. Early Read Cycle: OE_L asserted before CAS_L; D goes from High Z through Junk to Data Out after the Read Access Time. Late Read Cycle: OE_L asserted after CAS_L; D goes from High Z to Data Out after the Output Enable Delay.]
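Why an early read is limited by the Read Access Time and a late read by the Output Enable Delay can be captured in one line. The Python sketch below assumes the usual asynchronous-DRAM datasheet model: output is valid only after an access time from RAS_L falling, another from CAS_L falling, and another from OE_L falling, whichever is latest. The parameter names and default values (60ns, 15ns, 15ns) are hypothetical.

```python
def data_out_valid(t_ras, t_cas, t_oe, t_rac=60, t_cac=15, t_oea=15):
    """Earliest time (ns) Data Out is valid on D for an asynchronous DRAM read.

    Assumption: output is valid only after t_rac from RAS_L falling, t_cac from
    CAS_L falling, and t_oea from OE_L falling. Defaults are hypothetical.
    """
    return max(t_ras + t_rac, t_cas + t_cac, t_oe + t_oea)


early = data_out_valid(t_ras=0, t_cas=20, t_oe=10)  # OE_L asserted before CAS_L
late = data_out_valid(t_ras=0, t_cas=20, t_oe=70)   # OE_L asserted after CAS_L
print(early, late)  # 60 85
```

In the early cycle the RAS-based access time dominates; in the late cycle the OE_L edge arrives last, so the output enable delay sets when data appears.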
DRAM Write Timing
° Every DRAM access begins at:
• The assertion of RAS_L
• 2 ways to write: early or late relative to CAS
[Figure: write waveforms for the 256K x 8 DRAM over one DRAM WR Cycle Time. A carries Row Address, then Col Address, then Junk, as RAS_L and then CAS_L fall; D carries Data In between Junk intervals. Early Wr Cycle: WE_L asserted before CAS_L. Late Wr Cycle: WE_L asserted after CAS_L. The WR Access Time is marked in each case.]
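The early/late distinction decides which edge captures Data In. In conventional asynchronous DRAMs, data is strobed by whichever of CAS_L and WE_L falls last: CAS_L in an early write, WE_L in a late write. The Python sketch below encodes that rule; the function name and example times are hypothetical, and treating simultaneous edges as an early write is an assumption.

```python
def write_latch_edge(t_cas_fall, t_we_fall):
    """Return (cycle type, time Data In is latched) for an asynchronous DRAM write.

    Assumption (conventional behavior): data is latched on whichever of CAS_L
    and WE_L falls last. Simultaneous edges are treated as an early write.
    """
    if t_we_fall <= t_cas_fall:
        return "early", t_cas_fall  # WE_L already low: CAS_L falling strobes the data
    return "late", t_we_fall        # WE_L falls after CAS_L: WE_L falling strobes the data


print(write_latch_edge(20, 5), write_latch_edge(20, 40))  # ('early', 20) ('late', 40)
```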
DRAM Operation
Three steps in read/write access to a given bank:
• Row access (RAS)
- decode row address, enable addressed row (often multiple Kb in row)
- bitlines share charge with storage cell
- small change in voltage detected by sense amplifiers, which latch whole row of bits
- sense amplifiers drive bitlines full rail to recharge storage cells
• Column access (CAS)
- decode column address to select small number of sense amplifier latches (4, 8, 16, or 32 bits depending on DRAM package)
- on read, send latched bits out to chip pins
- on write, change sense amplifier latches, which then charge storage cells to required value
- can perform multiple column accesses on same row without another row access (burst mode)
• Precharge
- charges bit lines to known value, required before next row access
Each step has a latency of around 15-20ns in modern DRAMs.
Various DRAM standards (DDR, RDRAM) have different ways of encoding the signals for transmission to the DRAM, but all share the same core architecture.
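The three steps suggest a simple latency model for one bank that keeps the last row open in the sense amplifiers (an "open-row" policy, named here as an illustrative choice, not something the slide prescribes). A row hit needs only a column access; a different row needs precharge, row access, and column access. The Python sketch below uses hypothetical step latencies of 15ns each, taken from the 15-20ns range above; the parameter names echo common datasheet terms but are assumptions here.

```python
def bank_latency(accesses, t_rp=15, t_rcd=15, t_cas=15):
    """Total latency (ns) for a sequence of (row, col) accesses to one DRAM bank.

    Open-row sketch: a row hit costs a column access only; a different row costs
    precharge + row access + column access; the first access to an idle,
    already-precharged bank costs row access + column access.
    """
    open_row = None
    total = 0
    for row, _col in accesses:
        if open_row == row:
            total += t_cas                 # row hit: bits already latched in sense amps
        elif open_row is None:
            total += t_rcd + t_cas         # idle, precharged bank
        else:
            total += t_rp + t_rcd + t_cas  # row miss: precharge before the new row
        open_row = row
    return total


print(bank_latency([(5, 1), (5, 2), (7, 0)]))  # 90
```

The second access is cheap because it reuses the open row, which is exactly what burst mode exploits.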
Advanced DRAM Organization
• Bits in a DRAM are organized as a rectangular array
- DRAM accesses an entire row
- Burst mode: supply successive words from a row with reduced latency
• Double data rate (DDR) DRAM
- Transfer on rising and falling clock edges
• Quad data rate (QDR) DRAM
- Separate DDR inputs and outputs
Double-Data Rate (DDR2) DRAM
[Figure: DDR2 command and data timing with a 200MHz clock: Row, Column, Precharge, and Row' commands; data transferred on both clock edges for a 400Mb/s data rate.]
[ Micron, 256Mb DDR2 SDRAM datasheet ]
DRAM Generations
Year   Capacity   $/GB
1980   64Kbit     $1,500,000
1983   256Kbit    $500,000
1985   1Mbit      $200,000
1989   4Mbit      $50,000
1992   16Mbit     $15,000
1996   64Mbit     $10,000
1998   128Mbit    $4,000
2000   256Mbit    $1,000
2004   512Mbit    $250
2007   1Gbit      $50
[Figure: row access time (Trac) and column access time (Tcac), in ns (0 to 300), for each generation from '80 to '07]
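The table's endpoints imply compound annual rates, computed below in Python from the 1980 and 2007 rows only (so intermediate generations are ignored). The names `GENERATIONS` and `annual_factor` are hypothetical.

```python
# (year, capacity in bits, $/GB) from the first and last rows of the table above
GENERATIONS = [
    (1980, 64 * 2**10, 1_500_000),
    (2007, 2**30, 50),
]


def annual_factor(start, end, years):
    """Compound per-year factor that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years)


(y0, cap0, cost0), (y1, cap1, cost1) = GENERATIONS
years = y1 - y0
print(round(annual_factor(cap0, cap1, years), 2))    # 1.43: capacity grows ~1.43x per year
print(round(annual_factor(cost1, cost0, years), 2))  # 1.46: $/GB falls ~1.46x per year
```

A factor of about 1.46 per year means the cost per gigabyte dropped by roughly a third each year over this period.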
DRAM name based on peak chip transfers/sec; DIMM name based on peak DIMM MBytes/sec
Standard  Clock Rate (MHz)  M transfers/s  DRAM Name  Mbytes/s/DIMM  DIMM Name
DDR       133               266            DDR266     2128           PC2100
DDR       150               300            DDR300     2400           PC2400
DDR       200               400            DDR400     3200           PC3200
DDR2      266               533            DDR2-533   4264           PC4300
DDR2      333               667            DDR2-667   5336           PC5300
DDR2      400               800            DDR2-800   6400           PC6400
DDR3      533               1066           DDR3-1066  8528           PC8500
DDR3      666               1333           DDR3-1333  10664          PC10700
DDR3      800               1600           DDR3-1600  12800          PC12800
DDR4      1600              3200           DDR4-3200  25600          PC25600
(M transfers/s = 2 x clock rate; Mbytes/s/DIMM = 8 x M transfers/s)
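The x2 and x8 steps in the table can be reproduced in a few lines of Python. The function name `ddr_rates` is hypothetical; M transfers/s is passed explicitly because the names use rounded values (a 266MHz clock is listed as 533, not 532). The table's DIMM names round the Mbytes/s figure irregularly (2128 becomes PC2100, 4264 becomes PC4300), so this sketch does not try to reproduce them.

```python
def ddr_rates(standard, clock_mhz, mt_per_s):
    """Return (DRAM name, peak Mbytes/s per DIMM) using the table's x2 and x8 steps.

    DDR moves data on both clock edges (x2); the DIMM data bus is 64 bits = 8 bytes (x8).
    """
    if abs(mt_per_s - 2 * clock_mhz) > 1:
        raise ValueError("DDR moves data twice per clock")
    sep = "" if standard == "DDR" else "-"  # DDR266 vs DDR2-533 naming style
    return f"{standard}{sep}{mt_per_s}", mt_per_s * 8


print(ddr_rates("DDR3", 800, 1600))  # ('DDR3-1600', 12800)
```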
DRAM Packaging (Laptops/Desktops/Servers)
[Figure: DRAM chip with ~7 clock and control signals, ~12 address lines (multiplexed row/column address), and a data bus (4b, 8b, 16b, 32b)]
DIMM (Dual Inline Memory Module) contains multiple chips with clock/control/address signals connected in parallel (buffers are sometimes needed to drive signals to all chips). Data pins work together to return a wide word (e.g., a 64-bit data bus using 16 x4-bit parts).
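The number of chips that must work together on a DIMM is the bus width divided by the chip's data width, as in the 16 x4-bit example above. A minimal Python sketch, with a hypothetical function name:

```python
def chips_for_bus(bus_bits, chip_bits):
    """Number of DRAM chips whose data pins together form one DIMM-wide word."""
    if bus_bits % chip_bits:
        raise ValueError("chip width must divide the bus width")
    return bus_bits // chip_bits


for width in (4, 8, 16):
    print(f"x{width} parts: {chips_for_bus(64, width)} chips for a 64-bit bus")
```

All chips share the same address and control signals, so each access reads the same location in every chip and concatenates their data pins.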
DRAM Packaging, Mobile Devices
[Figure: Apple A4 package on circuit board]
[Figure: Apple A4 package cross-section, iFixit 2010, showing two stacked DRAM die and the processor plus logic die]
Summary
° Two Different Types of Locality:
• Temporal Locality (Locality in Time): If an item is referenced, it will tend to be referenced again soon.
• Spatial Locality (Locality in Space): If an item is referenced, items whose addresses are close by tend to be referenced soon.
° By taking advantage of the principle of locality: • Present the user with as much memory as is available in the cheapest technology. • Provide access at the speed offered by the fastest technology.
° DRAM is slow but cheap and dense: • Good choice for presenting the user with a BIG memory system
° SRAM is fast but expensive and not very dense: • Good choice for providing the user FAST access time.