ECE 485/585 Microprocessor System Design

Lecture 7: Memory Modules, Error Correcting Codes, Memory Controllers

Zeshan Chishti
Electrical and Computer Engineering Dept.
Maseeh College of Engineering and Computer Science

Source: Lecture based on materials provided by Mark F.

DRAM Refresh Frequency

• DRAM standard requires memory controllers to send periodic refresh commands to DRAM
• tRefLatency (tRFC): varies with DRAM chip density (e.g., 350 ns)
• tRefPeriod (tREFI): remains constant (7.8 µs for current-generation DRAM)


Refresh Now: N Simultaneous Rows

[Timing diagram: PRE to bank/all, wait tRP, issue REF, wait tRFC before the next ACT to a row]

Device density | Num. banks | Per-bank rows | Total rows | Rows in AR | tRFC (ns)
8Gb            | 16         | 128K          | 2M         | 256        | 350
16Gb           | 16         | 256K          | 4M         | 512        | 480
32Gb           | 16         | 512K          | 8M         | 1024       | 640

• Command is called Auto-Refresh (AR)
• Retention time = 64 ms
• Refresh N rows in each 7.8 µs interval (64 ms ÷ 8K), per bank – checked in the sketch below
• N increases with density (N = 16 for 8Gb, 32 for 16Gb)
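A quick sanity check of the "Rows in AR" column (a minimal Python sketch, numbers taken from the table above):

```python
# 64 ms retention / 7.8 us tREFI ~= 8K Auto-Refresh commands per retention
# period, so each AR must cover total_rows / 8K rows.
INTERVALS = 8 * 1024  # AR commands per 64 ms retention period

for density, total_rows in [("8Gb", 2 * 2**20), ("16Gb", 4 * 2**20), ("32Gb", 8 * 2**20)]:
    print(density, total_rows // INTERVALS, "rows per AR")  # 256, 512, 1024
```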


Impact of Refresh on Performance

• DRAM is unavailable to serve requests for tRefLatency/tRefPeriod (tRFC/tREFI) of the time
• ≈4.5% for today's 4Gb DRAM (computed below)
• Unavailability increases with higher density due to higher tRefLatency


Memory Modules

[Photo: 184-pin DDR SDRAM DIMM]

• All chips in a "rank" receive same address and control signals
• Each chip responsible for subset of data bits in its rank
• Module acts as high-capacity DRAM with wide data path
  - Example: 8 chips, each 8 bits wide = 64 bits
• Easy to add/replace memory in a system
  - No need to solder or remove individual chips
• Memory granularity issue
  - What's the smallest increment in memory size? (see the sketch below)
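A minimal sketch of the granularity point: every chip in a rank supplies a slice of the 64-bit word, so capacity can only grow one whole rank at a time (the chip parameters below are illustrative assumptions):

```python
# Smallest capacity increment = one rank: 64 / chip_width chips,
# each contributing its full density.
def rank_capacity_gib(chip_density_gbit, chip_width):
    chips = 64 // chip_width      # chips needed to fill the 64-bit data path
    return chips * chip_density_gbit / 8

print(rank_capacity_gib(8, 8))    # eight x8 8Gb chips -> 8.0 GiB increment
```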


From Hsien-Hsin Sean Lee, Georgia Institute of Technology

DRAM Ranks

[Figure: ranks on a memory module]

Organization of DRAM Modules

[Figure: DRAM module organization]

Memory Modules

• SIMM (Single Inline Memory Module)
  - 30-pin: some 286, most 386, some 486 systems – Page Mode, Fast Page Mode devices
  - 72-pin: some 386, most 486, nearly all Pentium (before DIMM) – Fast Page Mode, EDO devices
• DIMM (Dual Inline Memory Module)
  - Dominant today
• SODIMM (Small Outline DIMM)
  - Used in notebooks, Apple iMac
• RIMM (Rambus RDRAM Module)

[Photos: SIMM; 168-pin SDRAM DIMM; 184-pin DDR SDRAM DIMM; 240-pin DDR2/DDR3 SDRAM DIMM; 200-pin DDR2/DDR3 SODIMM; RIMM]

SPD (Serial Presence Detect)

• 8-pin serial EEPROM on memory module
• Key parameters for SDRAM controller:
  - Number of row/column addresses
  - Number of ranks
  - Module width
  - Refresh rate/type
  - Error checking (none, parity, ECC)
  - Latency
  - Timing parameters


DRAM and DIMM Nomenclature

Device name | Clock   | M transfers/sec | MB/sec per DIMM | DIMM name
DDR200      | 100 MHz | 200             | 1,600 MB/s      | PC-1600
DDR266      | 133 MHz | 266             | 2,133 MB/s      | PC-2100
DDR333      | 166 MHz | 333             | 2,666 MB/s      | PC-2700
DDR400      | 200 MHz | 400             | 3,200 MB/s      | PC-3200
DDR2-400    | 200 MHz | 400             | 3,200 MB/s      | PC2-3200
DDR2-533    | 266 MHz | 533             | 4,266 MB/s      | PC2-4200
DDR2-667    | 333 MHz | 666             | 5,333 MB/s      | PC2-5300
DDR2-800    | 400 MHz | 800             | 6,400 MB/s      | PC2-6400
DDR2-1066   | 533 MHz | 1066            | 8,533 MB/s      | PC2-8500
DDR3-800    | 400 MHz | 800             | 6,400 MB/s      | PC3-6400
DDR3-1066   | 533 MHz | 1066            | 8,533 MB/s      | PC3-8500
DDR3-1333   | 666 MHz | 1333            | 10,666 MB/s     | PC3-10600
DDR3-1600   | 800 MHz | 1600            | 12,800 MB/s     | PC3-12800
DDR3-1866   | 933 MHz | 1866            | 14,928 MB/s     | PC3-14900

• M transfers/second = 2 transfers (DDR) × clock rate
• DRAM name incorporates M transfers per second
• MB/sec = 8 bytes × M transfers per second
• DIMM name incorporates MB/sec (rounded) – see the sketch below
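The naming rules are mechanical enough to script; a minimal sketch (the generation-prefix handling is an assumption for illustration):

```python
# DDR transfers twice per clock; a 64-bit (8-byte) DIMM moves 8 bytes per transfer.
def ddr_names(clock_mhz, generation=""):
    mtps = 2 * clock_mhz              # mega-transfers per second -> DRAM name
    mbps = 8 * mtps                   # MB/s per DIMM -> DIMM name
    return f"DDR{generation}-{mtps}", f"PC{generation}-{mbps}"

print(ddr_names(200))        # ('DDR-400', 'PC-3200'); first-gen parts drop the hyphen
print(ddr_names(400, "3"))   # ('DDR3-800', 'PC3-6400')
# Marketed names round the MB/s figure (e.g., PC3-14900 for 14,928 MB/s).
```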

DRAM/SDRAM Latency Specifications

• DRAM
  - Used 4 numbers (e.g., 4-1-1-1)
  - Indicates number of CPU cycles for 1st and successive accesses
• SDRAM
  - CAS Latency (CAS or CL)
    - Delay in clock cycles between request and the time the first data is available
    - A PC133 module might be described as CAS-2, CAS=2, CL2, CL-2, or CL=2
  - SDR-DRAM: CAS Latency of 1, 2, or 3
  - DDR-DRAM: CAS Latency of 2 or 2.5
• When three numbers appear (e.g., 3-2-2):
  - CAS Latency (tCAC)
  - RAS-to-CAS delay (tRCD)
  - RAS pre-charge time (tRP)
• DDR3 seeing use of four numbers (e.g., 3-3-3-10 timing):
  - CAS Latency (tCAS, tCL, CL)
  - RAS-to-CAS delay (tRCD)
  - RAS pre-charge time (tRP)
  - RAS access time (tRAS)

Key SDRAM Timing Parameters

• Determines latency:
  - tRCD: minimum time between an ACTIVE command and a READ command
  - CL (CAS Latency): time between READ command and first data valid
• Determines bandwidth:
  - tRC: time between successive accesses to different rows (tRC = tRAS + tRP)
  - tRAS: time between ACTIVE command and end of restoration of data in DRAM array
  - tRP: time to pre-charge DRAM array in preparation for another row access


EX: Comparing Performance of DIMMs

Parameter           | SDRAM spec | PC3-12800 DIMM      | PC3-14900 DIMM
Clock period        | tCK        | 1/800 MHz = 1.25 ns | 1/933 MHz = 1.07 ns
CAS Latency         | CL         | 9                   | 9
RAS-to-CAS delay    | tRCD       | 9                   | 9
RAS pre-charge time | tRP        | 9                   | 9
RAS access time     | tRAS       | 27                  | 27
Cost/pair           | $          | 176                 | 196

• Best bandwidth/$:
  - tRC = tRAS + tRP = 27 + 9 = 36 (for both DIMMs)
  - 14900/12800 = 1.16 and 196/176 = 1.11, so a 16% bandwidth gain for an 11% increase in cost … I'd buy the PC3-14900 DIMMs
• Time from ACTIVE to end of cycle (PC3-12800) – replicated in the sketch below:
  - Time to first data (latency) = tRCD + CL = 9 + 9 = 18 cycles
  - Time to complete a burst of 8 transfers (DDR: 2 per clock) = 4 cycles
  - Total time = (18 + 4) × 1.25 ns = 27.5 ns
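The same arithmetic in a short Python sketch (timings and prices from the table above):

```python
dimms = {
    "PC3-12800": dict(clock_mhz=800, cl=9, trcd=9, trp=9, tras=27, cost=176),
    "PC3-14900": dict(clock_mhz=933, cl=9, trcd=9, trp=9, tras=27, cost=196),
}
for name, d in dimms.items():
    tck_ns = 1000 / d["clock_mhz"]
    latency_cycles = d["trcd"] + d["cl"]   # ACTIVE to first data
    burst_cycles = 8 // 2                  # burst of 8 transfers, 2 per clock (DDR)
    total_ns = (latency_cycles + burst_cycles) * tck_ns
    print(f"{name}: tRC = {d['tras'] + d['trp']} cycles, "
          f"ACTIVE to end of burst = {total_ns:.1f} ns")   # 27.5 ns / 23.6 ns

print("bandwidth ratio:", 14900 / 12800, "cost ratio:", 196 / 176)
```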


DDR4

• JEDEC released standard September 2012
• Projected to be ~50% of market by 2015-2016
• Hynix announced 128 GB module using 8 Gb DDR4 in April 2014
• AMD (Hierofalcon), Intel (Haswell-E) supporting DDR4 in 2014
• No longer multi-drop – point-to-point with single DIMM per channel
• 288-pin DIMM interface


Error Correcting Codes


Error Correction

• Motivation
  - Failures/time proportional to number of bits
  - As DRAM cell sizes & voltages shrink, cells become more vulnerable
• Why was/is this not an issue on your PC?
  - Failure rate was low
  - Few consumers would know what to do anyway
  - DRAM banks so large – so much memory that an error is unlikely to land in data you actually use
  - Servers (almost always) correct memory system errors (e.g., usually use ECC)
• Sources
  - Alpha particles (impurities in IC manufacturing)
  - Cosmic rays (vary with altitude)
    - Bigger problem in Denver and on space-bound electronics
  - Noise
• Need to handle failures throughout memory subsystem
  - DRAM chips, module, bus
• DRAM chips don't incorporate ECC
  - Store the ECC bits in DRAM alongside the data bits
  - Chipset (or integrated controller) handles ECC


Error Detection: Parity

[Figure: parity example – a single check bit over the data word detects, but cannot locate, a single-bit error; from Bruce Jacob]
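Since the figure itself is lost, a minimal even-parity sketch of what it showed:

```python
# The check bit is the XOR of all data bits, so any single flipped bit
# changes the recomputed parity: the error is detected but not located.
def parity(bits):
    p = 0
    for b in bits:
        p ^= b
    return p

data = [1, 0, 1, 1, 0, 0, 1, 0]
check = parity(data)               # stored alongside the data
received = data.copy()
received[3] ^= 1                   # single-bit error in storage/transit
print(parity(received) != check)   # True: error detected
```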


Error Correction Codes (ECC)

• Single-bit error correction requires n+1 check bits for 2^n data bits


Error Correction Codes (ECC)

[Worked example, encoding: each check bit is the XOR of a subset of the data bits, e.g., 1^0^0^0 = 1]

Error Correction Codes (ECC)

[Worked example, decoding and verifying: the check bits of the received word are recomputed and compared against the stored ones to locate a flipped bit]

Error Correction Codes (ECC)

• Add another check bit – SECDED (Single Error Correction, Double Error Detection)
• Requires n+2 check bits for 2^n data bits (see the sketch below)
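To make the mechanism concrete, a minimal Hamming(7,4) single-error-correction sketch (an illustration only – the DIMM case on the next slide uses 8 check bits over 64 data bits, and SECDED adds one extra check bit covering the whole codeword):

```python
# Check bit at position 2^i covers every codeword position whose index has
# bit i set; recomputing the checks on receipt yields a syndrome equal to
# the position of a single flipped bit.
def encode(d):                      # d = 4 data bits
    c = [0] * 8                     # codeword positions 1..7
    c[3], c[5], c[6], c[7] = d
    c[1] = c[3] ^ c[5] ^ c[7]
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    return c[1:]

def correct(word):                  # word = 7 received bits
    c = [0] + word
    syndrome = ((c[1] ^ c[3] ^ c[5] ^ c[7])
                | (c[2] ^ c[3] ^ c[6] ^ c[7]) << 1
                | (c[4] ^ c[5] ^ c[6] ^ c[7]) << 2)
    if syndrome:                    # non-zero syndrome locates the bad bit
        c[syndrome] ^= 1
    return [c[3], c[5], c[6], c[7]]

word = encode([1, 0, 1, 1])
word[4] ^= 1                        # flip codeword position 5
print(correct(word))                # [1, 0, 1, 1] -- corrected
```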

Error Correction Codes (ECC)

• 64-bit data path + 8 bits ECC stored to DRAM module (a 72-bit SECDED codeword)

[from Bruce Jacob]


Memory Controllers


Memory Controllers

• Handle the actual interface to memory
  - Determine memory configuration/capability
  - Memory timing/signal interface
  - Address mapping
    - Physical address to memory topology
  - Error correction
  - Scheduling
  - Refresh
• WAS in North Bridge of chipset
  - Intel prior to Nehalem: MCH (Memory Controller Hub)
  - Isolates microprocessor from memory technology/device changes
• IS integrated with microprocessor
  - AMD, Intel Nehalem
  - Low latency for high performance
  - Opens possibility for processor-directed hints


Address Mapping

[Diagram: physical address split across dual channels – Channel ID selects the channel; Rank, Bank, Row, and Column select the location within the memory module]

Address Mapping (cont'd)

• Channel: physical path between CPU and memory
• Rank: group of DRAM chips operating in lockstep
  - Same address, control, CS
  - Each chip responsible for a subset of the same "word"
• Bank: set of independent memory arrays in a DRAM chip
• Row/Column: address of a bit cell in a bank
  - May be several "planes" to achieve n bits "wide"
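A toy decoder showing one possible field ordering (the widths and ordering below are assumptions; real controllers pick orderings that spread consecutive cache lines across channels and banks):

```python
# Low-order fields are peeled off first; the byte offset within a 64-bit
# word is dropped before decoding.
FIELDS = [("column", 10), ("channel", 1), ("bank", 3), ("rank", 1), ("row", 16)]

def decode(phys_addr):
    addr = phys_addr >> 3                      # drop byte offset within 8-byte word
    out = {}
    for name, width in FIELDS:
        out[name] = addr & ((1 << width) - 1)  # extract this field
        addr >>= width
    return out

print(decode(0x12345678))
```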

Memory Scheduling

• Memory transactions: read, write
• DRAM commands: refresh, activate, read, write, precharge
• Memory scheduling policy
  - Handle transaction requests
    - Possibly from different cores
    - CPU cache line fill request
    - Prefetch
  - Refresh
  - Prioritize low/high priority
  - Prioritize Read over Write
  - Re-order to take advantage of open page in bank
  - Page policy
    - Open Page
    - Close Page


Memory Scheduling (Example)

Eight references, labeled (bank, row, col): (0,0,0), (0,1,0), (0,0,1), (0,1,3), (1,0,0), (1,1,1), (1,0,1), (1,1,2)
DRAM commands: P = bank precharge (3 cycles), A = row activation (3 cycles), C = column access (1 cycle)

• Without access scheduling: each reference issues its own P-A-C sequence in order – 56 DRAM cycles
• With access scheduling: references are re-ordered so that an opened row serves all of its pending column accesses, and P/A commands to the two banks overlap – 19 DRAM cycles
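A toy model of the example (simplified: commands are fully serialized with no overlap between the two banks, so the scheduled case lands at 32 cycles rather than the slide's 19):

```python
P, A, C = 3, 3, 1                   # precharge, activate, column access (cycles)
refs = [(0, 0, 0), (0, 1, 0), (0, 0, 1), (0, 1, 3),
        (1, 0, 0), (1, 1, 1), (1, 0, 1), (1, 1, 2)]

def cycles(requests):
    open_row = {}                   # bank -> currently open row
    total = 0
    for bank, row, _col in requests:
        if open_row.get(bank) != row:
            total += P + A          # row miss: precharge + activate
            open_row[bank] = row
        total += C
    return total

print(cycles(refs))                              # in order: 56
print(cycles(sorted(refs, key=lambda r: r[:2]))) # grouped by (bank, row): 32
```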

Memory Access to Idle Bank

[Timing diagram]

Memory Access to Active Page (Open Bank)

[Timing diagram]

Memory Access to New Page (Open Bank)

[Timing diagram]

Open Page vs. Close Page Policy

• Open page policy:
  - Row hit latency: tCL + tBURST
  - Row miss latency: tRP + tRCD + tCL + tBURST
• Close page policy:
  - Row is closed after every access => no row hits
  - Latency: tRCD + tCL + tBURST (slower than open-page row hits but faster than open-page row misses)
• Assume that a fraction n of the accesses are row hits under the open page policy; the break-even point for leaving the page open (vs. closed) is then:
  - tRCD + tCL = (n × tCL) + ((1 – n) × (tRP + tRCD + tCL))
  - => n = tRP / (tRP + tRCD)
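A numeric check of the break-even point, plugging in the PC3-12800 timings from the earlier example (tBURST cancels on both sides):

```python
tRP, tRCD, tCL = 9, 9, 9
n = tRP / (tRP + tRCD)              # predicted break-even row-hit rate
open_page = n * tCL + (1 - n) * (tRP + tRCD + tCL)
close_page = tRCD + tCL
print(n, open_page, close_page)     # 0.5 18.0 18 -- equal at n = 50%
```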