ECE 485/585 Microprocessor System Design Lecture 7:
Memory Modules Error Correcting Codes Memory Controllers
Zeshan Chishti Electrical and Computer Engineering Dept.
Maseeh College of Engineering and Computer Science Source: Lecture based on materials provided by Mark F.
DRAM Refresh Frequency
The DRAM standard requires memory controllers to send periodic refresh commands to DRAM.
tRefLatency (tRFC): varies with DRAM chip density (e.g., 350 ns)
tRefPeriod (tREFI): remains constant (7.8 usec for current-generation DRAM)
[Figure: refresh timeline]
Refresh Now: N Simultaneous Rows
[Figure: timing diagram with CLK, CMD (PRE, REF, ACT), ADDR (Bank/All, ROW), and DATA lines, annotated with tRP and tRFC]

Device density   Num. banks   Per-bank rows   Total rows   Rows in AR   tRFC (ns)
8Gb              16           128K            2M           256          350
16Gb             16           256K            4M           512          480
32Gb             16           512K            8M           1024         640
Command is called Auto-Refresh (AR)
Retention time = 64 ms
Refresh N rows in each 7.8 usec (64 ms ÷ 8K)
Per-bank N increases with density (N: 16 for 8Gb, 32 for 16Gb)
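The refresh arithmetic can be checked with a short sketch. It assumes, as the slide does, a 64 ms retention time covered by 8K auto-refresh commands; the function names are illustrative, not part of any standard.

```python
# Sketch of the auto-refresh arithmetic, using the values from the slide.
RETENTION_MS = 64             # retention time
REFRESH_COMMANDS = 8 * 1024   # 8K auto-refresh commands per retention window
M = 1024 * 1024


def refresh_interval_us(retention_ms=RETENTION_MS, commands=REFRESH_COMMANDS):
    """Time between auto-refresh commands (tREFI) in microseconds."""
    return retention_ms * 1000 / commands


def rows_per_refresh(total_rows, banks, commands=REFRESH_COMMANDS):
    """Return (rows refreshed per AR command, rows per bank per AR command)."""
    rows_per_command = total_rows // commands
    return rows_per_command, rows_per_command // banks


if __name__ == "__main__":
    print(f"tREFI = {refresh_interval_us():.2f} usec")
    for density, total_rows in [("8Gb", 2 * M), ("16Gb", 4 * M), ("32Gb", 8 * M)]:
        total, per_bank = rows_per_refresh(total_rows, banks=16)
        print(f"{density}: {total} rows per AR, {per_bank} per bank")
```

With the table's row counts this gives 16 rows per bank per command for 8Gb and 32 for 16Gb, matching the values of N quoted above.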
Impact of Refresh on Performance
DRAM is unavailable to serve requests for tRefLatency / tRefPeriod of the time
4.5% for today’s 4Gb DRAM
Unavailability increases with higher density due to higher tRefLatency
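As a rough illustration, the unavailable fraction can be computed directly from tRFC and tREFI. The sketch below uses the example values quoted earlier in the lecture (350, 480, and 640 ns with a 7.8 usec refresh period); the function name is an assumption made for this example.

```python
# Fraction of time the DRAM is busy refreshing: tRefLatency / tRefPeriod.
def refresh_overhead(trfc_ns, trefi_us=7.8):
    """Return the fraction of time spent in refresh."""
    return trfc_ns / (trefi_us * 1000)


if __name__ == "__main__":
    for trfc_ns in (350, 480, 640):
        print(f"tRFC = {trfc_ns} ns -> {refresh_overhead(trfc_ns):.1%} unavailable")
```

A tRFC of 350 ns yields roughly 4.5%, and the fraction grows as tRFC increases with density.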
Memory Modules
184 pin DDR SDRAM DIMM
All chips in a “rank” receive the same address and control signals
Each chip is responsible for a subset of the data bits in its rank
Module acts as a high capacity DRAM with a wide data path
Example: 8 chips, each 8 bits wide = 64 bits
Easy to add/replace memory in a system
No need to solder or remove individual chips
Memory granularity issue
What’s the smallest increment in memory size?
From Hsien-Hsin Sean Lee, Georgia Institute of Technology
DRAM Ranks
Organization of DRAM Modules
Memory Modules
SIMM (Single Inline Memory Module)
  30-pin: some 286, most 386, some 486 systems
    Page Mode, Fast Page Mode devices
  72-pin: some 386, most 486, nearly all Pentium (before DIMM)
    Fast Page Mode, EDO devices
DIMM (Dual Inline Memory Module)
  Dominant today
SODIMM (Small Outline DIMM)
  Used in notebooks, Apple iMac
RIMM (Rambus RDRAM Module)
[Figure: module photos labeled SIMM; 168 pin SDRAM DIMM; 184 pin DDR SDRAM DIMM; 200 pin DDR2, DDR3 SDRAM SODIMM; 240 pin DDR2, DDR3 SDRAM DIMM; RIMM]
SPD (Serial Presence Detect)
8-pin serial EEPROM on memory module
Key parameters for SDRAM controller
  Number of row/column addresses
  Number of ranks
  Module width
  Refresh rate/type
  Error checking (none, parity, ECC)
  Latency
  Timing parameters
DRAM and DIMM Nomenclature

Device name   Clock     M transfers per sec   MB/sec per DIMM   DIMM name
DDR200        100 MHz   200                   1,600 MB/s        PC-1600
DDR266        133 MHz   266                   2,133 MB/s        PC-2100
DDR333        166 MHz   333                   2,666 MB/s        PC-2700
DDR400        200 MHz   400                   3,200 MB/s        PC-3200
DDR2-400      200 MHz   400                   3,200 MB/s        PC2-3200
DDR2-533      266 MHz   533                   4,266 MB/s        PC2-4200
DDR2-667      333 MHz   667                   5,333 MB/s        PC2-5300
DDR2-800      400 MHz   800                   6,400 MB/s        PC2-6400
DDR2-1066     533 MHz   1066                  8,533 MB/s        PC2-8500
DDR3-800      400 MHz   800                   6,400 MB/s        PC3-6400
DDR3-1066     533 MHz   1066                  8,533 MB/s        PC3-8500
DDR3-1333     666 MHz   1333                  10,666 MB/s       PC3-10600
DDR3-1600     800 MHz   1600                  12,800 MB/s       PC3-12800
DDR3-1866     933 MHz   1866                  14,928 MB/s       PC3-14900

M transfers/second = 2 transfers (DDR) x clock rate
DRAM name incorporates M transfers per second
MB/sec = 8 bytes x M transfers per second
DIMM name incorporates MB/sec (rounded)
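The two naming rules can be expressed as a small sketch. It assumes a 64-bit (8-byte) module data path, as in the nomenclature rules; the function name is illustrative only.

```python
# Sketch of the DDR naming arithmetic: transfers/s and peak MB/s per DIMM.
def ddr_rates(clock_mhz, bus_bytes=8):
    """Return (M transfers per second, peak MB/s) for a DDR interface."""
    mega_transfers = 2 * clock_mhz          # two transfers per clock (DDR)
    return mega_transfers, mega_transfers * bus_bytes


if __name__ == "__main__":
    for name, clock_mhz in [("DDR400", 200), ("DDR2-800", 400), ("DDR3-1600", 800)]:
        mts, mbs = ddr_rates(clock_mhz)
        print(f"{name}: {mts} MT/s, {mbs} MB/s")
```

For example, an 800 MHz clock gives 1600 M transfers per second and 12,800 MB/s, which is where the names DDR3-1600 and PC3-12800 come from.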
DRAM/SDRAM Latency Specifications
DRAM
Used 4 numbers (e.g. 4-1-1-1)
Indicates number of CPU cycles for 1st and successive accesses
SDRAM
CAS Latency (CAS or CL)
Delay in clock cycles between request and the time the first data is available
PC133 module might be described as CAS-2, CAS=2, CL2, CL-2, or CL=2
SDR-DRAM: CAS Latency of 1, 2, or 3
DDR-DRAM: CAS Latency of 2 or 2.5
When three numbers appear (e.g. 3-2-2)
CAS Latency (tCAC)
RAS-to-CAS delay (tRCD)
RAS pre-charge time (tRP)
DDR3 is seeing use of four numbers
CAS Latency (tCAS, tCL, CL)
RAS-to-CAS delay (tRCD)
RAS pre-charge time (tRP)
RAS access time (tRAS)
3-3-3-10 timing
Key SDRAM Timing Parameters
Determines Latency:
tRCD: Minimum time between an ACTIVE command and a READ command
CL (CAS Latency): Time between READ command and first data valid
Determines Bandwidth:
tRC: Minimum time between successive accesses to different rows in the same bank (tRC = tRAS + tRP)
tRAS: Time between ACTIVE command and end of restoration of data in DRAM array
tRP: Time to pre-charge DRAM array in preparation for another row access
EX: Comparing Performance of DIMMs

Parameter             SDRAM Spec   PC3-12800 DIMM         PC3-14900 DIMM
Clock period          tCK          1/800 MHz = 1.25 ns    1/933 MHz = 1.07 ns
CAS latency           CL           9                      9
RAS-to-CAS delay      tRCD         9                      9
RAS pre-charge time   tRP          9                      9
RAS access time       tRAS         27                     27
Cost/pair             $            176                    196
Best Bandwidth/$:
tRC = tRAS + tRP = 27 + 9 = 36 cycles (for both DIMMs)
14900/12800 = 1.16, 196/176 = 1.11, so a 16% bandwidth gain for an 11% increase in cost... I’d buy the PC3-14900 DIMMs
Time from ACTIVE to end of cycle:
Time to first byte (latency) for PC3-12800 = tRCD + CL = 9 + 9 = 18 cycles
Time to get the data burst (burst size = 8, DDR) = 8 transfers / 2 per clock = 4 cycles
Total time = (18 + 4) * 1.25 ns = 27.5 ns
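The same calculation can be repeated for the second DIMM. The sketch below assumes the parameters from the comparison table (CL = tRCD = 9, burst length 8, two transfers per clock); the function names are made up for this example.

```python
# Sketch: time from ACTIVE to end of burst, and bandwidth per dollar.
def access_time_ns(clock_mhz, trcd, cl, burst_length=8):
    """(tRCD + CL + burst cycles) converted to nanoseconds."""
    tck_ns = 1000 / clock_mhz
    burst_cycles = burst_length // 2        # DDR: two transfers per clock
    return (trcd + cl + burst_cycles) * tck_ns


def bandwidth_per_dollar(mb_per_s, cost):
    """Peak MB/s bought per dollar."""
    return mb_per_s / cost


if __name__ == "__main__":
    dimms = [("PC3-12800", 800, 12800, 176), ("PC3-14900", 933, 14900, 196)]
    for name, clock_mhz, mb_per_s, cost in dimms:
        t = access_time_ns(clock_mhz, trcd=9, cl=9)
        print(f"{name}: {t:.1f} ns, {bandwidth_per_dollar(mb_per_s, cost):.1f} MB/s per $")
```

With identical cycle counts, the faster clock shortens the PC3-14900 access to roughly 23.6 ns and it also delivers slightly more bandwidth per dollar.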
DDR4
• JEDEC released standard September 2012
• Projected to be ~50% of market by 2015-2016
• Hynix announced 128 GB module using 8 Gb DDR4 in April 2014
• AMD (Hierofalcon), Intel (Haswell-E) supporting DDR4 in 2014
• No longer multi-drop: point-to-point with single DIMM per channel
• 288-pin DIMM interface
Error Correcting Codes
Error Correction
Motivation
Failures/time proportional to number of bits
As DRAM cell sizes and voltages shrink, cells become more vulnerable
Why was/is this not an issue on your PC?
Failure rate was low
Few consumers would know what to do anyway
DRAM banks too large: so much memory that you are not likely to encounter an error
Servers (always) correct memory system errors (e.g. usually use ECC)
Sources
Alpha particles (impurities in IC manufacturing)
Cosmic rays (vary with altitude)
Bigger problem in Denver and on space-bound electronics
Noise
Need to handle failures throughout memory subsystem
DRAM chips, module, bus
DRAM chips don’t incorporate ECC
Store the ECC bits in DRAM alongside the data bits
Chipset (or integrated controller) handles ECC
Error Detection: Parity
[from Bruce Jacob]
Error Correction Codes (ECC)
Single bit error correction requires n+1 check bits for 2^n data bits
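The check-bit count follows from the Hamming bound: r check bits must distinguish "no error" from an error in any of the m + r stored bits, so 2^r >= m + r + 1. The sketch below finds the smallest such r; the function name is an assumption for this example.

```python
# Sketch: minimum check bits for single-error correction (Hamming bound).
def sec_check_bits(data_bits):
    """Smallest r such that 2**r >= data_bits + r + 1."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


if __name__ == "__main__":
    for n in range(2, 7):
        m = 2 ** n
        print(f"{m} data bits -> {sec_check_bits(m)} check bits (n + 1 = {n + 1})")
```

For power-of-two data widths from 4 to 64 bits, the result equals n + 1, in line with the rule of thumb on the slide.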
Error Correction Codes (ECC)
[Figure: encoding example with check-bit computation, e.g. 1^0^0^0 = 1]
Error Correction Codes (ECC)
An example: decoding and verifying
[Figure: sent vs. received code word with recomputed check bit, e.g. 1^0^0^0 = 1]
Error Correction Codes (ECC)
Add another check bit: SECDED (Single Error Correction, Double Error Detection)
Requires n+2 check bits for 2^n data bits
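To make SECDED concrete, here is a minimal sketch on 4 data bits: a Hamming(7,4) code plus one overall parity bit, giving n + 2 = 4 check bits for 2^2 data bits. The bit layout and function names are choices made for this illustration; real memory controllers use wider codes such as 8 check bits over 64 data bits.

```python
# Sketch of SECDED on 4 data bits: Hamming(7,4) plus an overall parity bit.
# Layout (an assumption for this example): index 0 holds the overall parity,
# indices 1..7 are Hamming positions; check bits sit at positions 1, 2, 4.

def encode(data):
    """Encode [d1, d2, d3, d4] into an 8-bit SECDED code word."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4            # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4            # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4            # covers positions 5, 6, 7
    word = [p1, p2, d1, p4, d2, d3, d4]
    overall = 0
    for bit in word:
        overall ^= bit
    return [overall] + word


def decode(word):
    """Return (data or None, status) for a possibly corrupted code word."""
    word = list(word)
    syndrome = 0
    for position in range(1, 8):
        if word[position]:
            syndrome ^= position  # XOR of set positions is 0 for a valid word
    overall = 0
    for bit in word:
        overall ^= bit
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        word[syndrome] ^= 1       # single error; syndrome 0 means parity bit
        status = "corrected"
    else:
        return None, "double error detected"
    return [word[3], word[5], word[6], word[7]], status


if __name__ == "__main__":
    sent = encode([1, 0, 1, 1])
    received = list(sent)
    received[5] ^= 1              # flip one bit in transit
    print(decode(received))
```

A single flipped bit leaves the overall parity odd and the syndrome points at the bad position; two flipped bits leave the overall parity even with a nonzero syndrome, so the error is detected but not corrected.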
Error Correction Codes (ECC)
64-bit data path + 8 bits ECC stored to DRAM module
[from Bruce Jacob]
Memory Controllers
Memory Controllers
Handle the actual interface to memory
  Determine memory configuration/capability
  Memory timing/signal interface
  Address mapping
    Physical address to memory topology
  Error correction
  Scheduling
  Refresh
WAS in North Bridge of chipset
  Intel prior to Nehalem
  MCH (Memory Controller Hub)
  Isolates the microprocessor from memory technology/device changes
IS integrated with microprocessor
  AMD, Intel Nehalem
  Low latency for high performance
  Opens possibility for processor-directed hints
Address Mapping
[Figure: physical address fields: channel ID (dual channels), rank (memory module), row, bank, column]
Address Mapping (cont’d)
Channel: physical path between CPU and memory
Rank: group of DRAM chips operating in lockstep
  Same address, control, CS
  Each chip responsible for a subset of the same “word”
Bank: set of independent memory arrays in a DRAM chip
Row/Column: address of a bit cell in a bank
  May be several “planes” to achieve n bits “wide”
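A controller's address mapping can be pictured as slicing a physical address into bit fields. The sketch below is purely illustrative: the field order (channel, rank, row, bank, column from most to least significant) and all field widths are assumptions, not the mapping of any particular controller.

```python
# Sketch: splitting a physical address into DRAM coordinates.
# Field widths and ordering are hypothetical, chosen only for illustration.
BYTE_OFFSET_BITS = 3                      # 8-byte (64-bit) data path
FIELDS = [                                # least significant field first
    ("column", 10),
    ("bank", 3),
    ("row", 14),
    ("rank", 1),
    ("channel", 1),
]


def decode_address(addr):
    """Return a dict of field name -> value for a physical address."""
    addr >>= BYTE_OFFSET_BITS
    fields = {}
    for name, width in FIELDS:
        fields[name] = addr & ((1 << width) - 1)
        addr >>= width
    return fields


def encode_address(fields):
    """Inverse of decode_address (byte offset set to zero)."""
    addr = 0
    for name, width in reversed(FIELDS):
        addr = (addr << width) | (fields[name] & ((1 << width) - 1))
    return addr << BYTE_OFFSET_BITS


if __name__ == "__main__":
    location = {"channel": 1, "rank": 0, "row": 5, "bank": 2, "column": 7}
    addr = encode_address(location)
    print(hex(addr), decode_address(addr))
```

Placing the column bits lowest keeps consecutive addresses in the same row, which favors an open page policy; other orderings trade that for more bank or channel parallelism.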
Memory Scheduling
Memory transactions: read, write
DRAM commands: refresh, activate, read, write, precharge
Memory scheduling policy
  Handle transaction requests
    Possibly from different cores
  Refresh
  Prioritize low/high priority
    CPU cache line fill request
    Prefetch
  Prioritize read over write
  Re-order to take advantage of open page in bank
  Page policy
    Open page
    Close page
Memory Scheduling
[Figure: command timelines for eight accesses, given as (bank,row,col): (0,0,0), (0,1,0), (0,0,1), (0,1,3), (1,0,0), (1,1,1), (1,0,1), (1,1,2)]
Without access scheduling (56 DRAM cycles)
With access scheduling (19 DRAM cycles)
DRAM commands
P: bank precharge (3 cycles)
A: row activation (3 cycles)
C: column access (1 cycle)
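The unscheduled total follows directly from the command costs, and reordering shows why scheduling helps. The sketch below is a simplified model, not the scheduler behind the figure: it reproduces the 56-cycle serial case and counts row activations before and after grouping accesses by (bank, row). It does not model overlapping commands across banks, so it does not derive the 19-cycle schedule.

```python
# Sketch: serial cost and row-activation counts for the eight accesses.
P_CYCLES, A_CYCLES, C_CYCLES = 3, 3, 1   # precharge, activate, column access

ACCESSES = [(0, 0, 0), (0, 1, 0), (0, 0, 1), (0, 1, 3),
            (1, 0, 0), (1, 1, 1), (1, 0, 1), (1, 1, 2)]   # (bank, row, col)


def serial_cycles(accesses):
    """Every access pays precharge + activate + column access, no overlap."""
    return len(accesses) * (P_CYCLES + A_CYCLES + C_CYCLES)


def row_activations(accesses):
    """Count activations needed when accesses are issued in the given order."""
    open_row = {}
    count = 0
    for bank, row, _col in accesses:
        if open_row.get(bank) != row:
            count += 1
            open_row[bank] = row
    return count


def group_by_row(accesses):
    """Reorder so accesses to the same bank and row are adjacent."""
    return sorted(accesses, key=lambda access: (access[0], access[1]))


if __name__ == "__main__":
    print("serial cycles:", serial_cycles(ACCESSES))
    print("activations in order:", row_activations(ACCESSES))
    print("activations grouped:", row_activations(group_by_row(ACCESSES)))
```

In the original order every access opens a new row; after grouping, each bank needs only two activations, and the remaining accesses become one-cycle row hits.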
Memory Access to Idle Bank
Memory Access to Active Page (Open Bank)
Memory Access to New Page (Open Bank)
Open page vs. Close page policy
Open page policy:
Row hit latency: tCL + tBURST
Row miss latency: tRP + tRCD + tCL + tBURST
Close page policy:
Row is closed after every access => no row hits
Latency: tRCD + tCL + tBURST (slower than open page row hits but faster than open page row misses)
Assume that a fraction n of the accesses are row hits with open page policy; then the break-even point for leaving the page open (or closed) is:
tRCD + tCL = (n * tCL) + ((1 - n) * (tRP + tRCD + tCL))
n = tRP / (tRP + tRCD)
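The break-even hit rate can be checked numerically. The sketch below assumes all timings are in clock cycles and includes tBURST on both sides, where it cancels; the function names and the example values (tRP = tRCD = tCL = 9, tBURST = 4) are chosen for illustration.

```python
# Sketch: average latency under each page policy and the break-even hit rate.
def close_page_latency(trcd, tcl, tburst):
    """Every access activates a row that was already precharged."""
    return trcd + tcl + tburst


def open_page_latency(hit_rate, trp, trcd, tcl, tburst):
    """Average of row-hit and row-miss latencies, weighted by hit rate."""
    hit = tcl + tburst
    miss = trp + trcd + tcl + tburst
    return hit_rate * hit + (1 - hit_rate) * miss


def break_even_hit_rate(trp, trcd):
    """Hit rate n at which both policies have the same average latency."""
    return trp / (trp + trcd)


if __name__ == "__main__":
    n = break_even_hit_rate(trp=9, trcd=9)
    print("break-even hit rate:", n)
    print("open page :", open_page_latency(n, 9, 9, 9, 4))
    print("close page:", close_page_latency(9, 9, 4))
```

With equal tRP and tRCD the break-even point is a 50% row hit rate: above it the open page policy has the lower average latency, below it the close page policy does.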