Photonic On-Chip Networks for Performance-Energy Optimized Off-Chip Memory Access

Author: Primrose Lamb
GILBERT HENDRY, JOHNNIE CHAN, DANIEL BRUNINA, LUCA CARLONI, KEREN BERGMAN

Lightwave Research Laboratory, Columbia University, New York, NY

Motivation

• The memory gap warrants a paradigm shift in how we move information to and from storage and computing elements

[www.OpenSparc.net] Lightwave Research Lab, Columbia University

[Exascale Report, 2008]

10/1/2009

Main Premise

• Current memory subsystem technology and packaging are not well-suited to future trends:
  - Networks on chip
  - Growing cache sizes
  - Growing bandwidth requirements
  - Growing pin counts

SDRAM context

• DIMMs controlled fully in parallel, sharing access on data and address busses
• Many wires/pins
• Matched signal paths (for delay)
• DIMMs made for short, random accesses

[Figure: a chip connected through a memory controller to three DIMMs; annotation on the memory controller: "Lately, this is on chip"] [Intel]

Future SDRAM context

• Example: Tilera TILE64

SDRAM DIMM Anatomy

[Figure: hierarchy of an SDRAM device.
 DRAM_DIMM: ranks of DRAM chips, with addr/cntrl and data lines.
 DRAM_Chip: IO, control, and banks (usually 8).
 DRAM_Bank: row decoder (row addr/en), column decoder (col addr/en), sense amps, DRAM cell arrays, and data lines.]

Memory Access in an Electronic NoC

• Messages are packetized; the size of a packet is determined by router buffers
• Burst length is dictated by packet size

[Figure: a message travelling through NoC routers to a memory controller at the chip boundary]
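The packetization described on this slide can be sketched as follows. This is only an illustration: `packetize` is a hypothetical helper, header overhead is ignored, and the 256 b packet size is borrowed from the electronic setup listed later in the deck.

```python
def packetize(message_bits: int, packet_bits: int) -> list[int]:
    """Split a message into packet payload sizes in bits.

    Assumption: no header overhead; the last packet may be shorter.
    """
    if message_bits <= 0 or packet_bits <= 0:
        raise ValueError("sizes must be positive")
    full, rest = divmod(message_bits, packet_bits)
    return [packet_bits] * full + ([rest] if rest else [])


# A 4096 b message over 256 b packets becomes 16 packets, so the memory
# controller sees 16 short bursts instead of one long one.
packets = packetize(message_bits=4096, packet_bits=256)
print(len(packets))
```

Because each packet arrives at the memory controller on its own, the DRAM burst length cannot exceed what one packet carries, whatever the size of the original message.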

Memory Control

• Complex DRAM control
  - Scheduling accesses around:
    - Open/closed rows
    - Precharging
    - Refreshing
    - Data/Control bus usage

[DRAMsim, UMD]
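To make the open/closed-row point concrete, here is a minimal sketch of how the state of a bank's row buffer changes access latency. The `Bank` class is hypothetical (it is not DRAMsim), the timing values assume the conventional CL-tRCD-tRP reading of the 10-10-10 figure used later in the deck, and refresh and bus contention are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Timing:
    tCL: int = 10   # column command to first data, in cycles
    tRCD: int = 10  # row activate to column command
    tRP: int = 10   # precharge (close the currently open row)


class Bank:
    """Toy single-bank model that tracks only the open row."""

    def __init__(self, timing: Timing):
        self.timing = timing
        self.open_row: Optional[int] = None

    def access(self, row: int) -> int:
        """Return the latency in cycles and leave `row` open afterwards."""
        t = self.timing
        if self.open_row == row:        # row hit
            cycles = t.tCL
        elif self.open_row is None:     # bank idle, row closed
            cycles = t.tRCD + t.tCL
        else:                           # row conflict: precharge first
            cycles = t.tRP + t.tRCD + t.tCL
        self.open_row = row
        return cycles


bank = Bank(Timing())
print(bank.access(5), bank.access(5), bank.access(7))
```

A real controller reorders requests to favor row hits, which is part of the scheduling complexity this slide refers to.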

Experimental Setup – Electronic NoC

System:
• 2cm×2cm chip
• 8×8 Electronic Mesh
  - 28 DRAM Access points (MCs)
  - 2 DIMMs per DRAM AP
• Routers:
  - 1 kb input buffers (per VC)
  - 4 virtual channels
  - 256 b packet size
  - 128 b channels
• 32 nm tech. point (ORION)
  - Normal Vt
  - Vdd = 1.0 V
  - Freq = 2.5 GHz

[Figure: 5-port Electronic Router]

Traffic:
• Random core-DRAM access point pairs
• Random read/write
• Uniform message sizes
• Poisson arrival at 1µs
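A minimal sketch of a traffic generator matching this description. Several things here are assumptions rather than details from the deck: "Poisson arrival at 1µs" is read as a mean inter-arrival time of 1 µs, the 8×8 mesh is taken to hold 64 cores, and `generate_traffic` and its field names are hypothetical.

```python
import random


def generate_traffic(n_msgs, n_cores=64, n_access_points=28,
                     mean_gap_us=1.0, msg_bits=4096, seed=0):
    """Return a list of memory requests with Poisson arrivals.

    Exponentially distributed gaps between requests give a Poisson
    arrival process; source core, destination access point, and
    read/write are drawn uniformly at random.
    """
    rng = random.Random(seed)
    now_us = 0.0
    events = []
    for _ in range(n_msgs):
        now_us += rng.expovariate(1.0 / mean_gap_us)
        events.append({
            "time_us": now_us,
            "core": rng.randrange(n_cores),
            "access_point": rng.randrange(n_access_points),
            "op": rng.choice(("read", "write")),
            "bits": msg_bits,  # uniform message size within one run
        })
    return events


for event in generate_traffic(3):
    print(event)
```

Sweeping `msg_bits` across runs would reproduce the kind of message-size axis used in the result plots.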

DRAM:
• Modeled cycle-accurately with DRAMsim [Univ. MD]
• DDR3 (10-10-10) @ 1333 MT/s
• 8 chips per DIMM, 8 banks per Chip, 2 ranks
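For scale, the peak data rate implied by these DRAM parameters can be worked out as below. The 64-bit data bus is an assumption (8 chips per DIMM, each taken to be 8 bits wide), as is the idea that the two DIMMs at an access point share one channel; `dimm_peak_gbps` is a hypothetical helper.

```python
def dimm_peak_gbps(transfers_per_s: float, bus_bits: int = 64) -> float:
    """Peak data rate of one DIMM channel in Gb/s.

    bus_bits=64 is an assumption, not a figure stated in the deck.
    """
    return transfers_per_s * bus_bits / 1e9


per_channel = dimm_peak_gbps(1333e6)   # DDR3 @ 1333 MT/s
aggregate = 28 * per_channel           # upper bound over 28 access points
print(f"{per_channel:.1f} Gb/s per channel, {aggregate:.0f} Gb/s aggregate")
```

This is an upper bound only: it ignores row activation, precharge, refresh, and any limits imposed by the network.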

Experiment Results

[Figure, two panels plotted against Msg Size (b), from 1000 to 10000: DRAM Bandwidth (Gb/s), annotated with 269 Gb/s; Latency (µs), log scale, with series Avg Read Latency and Zero Load Latency]

Current


Goal: Optically Integrated Memory

[Figure: memory module labelled with Optical Fiber, Optical Transceiver, and Vdd, Gnd]

Advantages of Photonics

• Decoupled energy-distance relationship
• No long traces to drive and synchronize with the clock
  - DRAM chips can run faster
  - Less power
• Fewer pins on the DIMM module and going into the chip
  - Eventually required by packaging constraints
  - Waveguides can achieve dramatically higher density due to WDM
• DRAM can be arbitrarily distant – fiber is low loss

Hybrid Circuit-Switched Photonic Network

[Figure: broadband 1×2 switch and broadband 2×2 switch, with a transmission-versus-wavelength (λ) plot] [Cornell, 2008] [Shacham, NOCS ’07]

[Figure] [Bergman, HPEC ’07]

International Symposium on Networks-on-Chip

Photonic DRAM Access

[Figure: DIMMs attached over fiber / PCB waveguide to a memory gateway at the chip boundary; a processor / cache attached to a processor gateway; both gateways connect to the network. Legend: photonic + electronic, electronic. Inside the memory gateway: photonic switch, modulators, memory control, and a network interface to/from the network.]

• Modulators are needed to send commands to DRAM
• Memory control generates memory control commands

Memory Transaction

1) Read or write request is initiated from local or remote processor, travels on electronic network
2) Processor gateway forwards it to memory gateway
3) Memory gateway receives request

[Figure: steps 1 to 3 marked on the path from processor / cache, through the processor gateway and the network, to the memory gateway and its DIMMs at the chip boundary]

Memory READ Transaction

4) MC receives READ command
5) Switch is set up from modulators to DIMM, and from DIMM to network
6) Path setup travels back to receiving processor. Path ACK returns when path is set up
7) Row/Col addresses sent to DIMM optically
8) Read data returned optically
9) Path torn down, MC knows how long it will take

[Figure: steps 4 to 8 marked on the memory gateway (control, modulators, photonic switch)]
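The READ sequence above is strictly ordered, which can be sketched as a small state machine that rejects out-of-order steps. The class and enum names are hypothetical; only the step order comes from the slide.

```python
from enum import Enum, auto


class ReadStep(Enum):
    MC_RECEIVES_READ = auto()    # step 4
    SWITCH_SET_UP = auto()       # step 5: modulators->DIMM, DIMM->network
    PATH_SETUP_AND_ACK = auto()  # step 6
    ADDRESSES_SENT = auto()      # step 7: row/col sent optically
    DATA_RETURNED = auto()       # step 8: read data returned optically
    PATH_TORN_DOWN = auto()      # step 9


class ReadTransaction:
    """Accepts the READ steps only in the order listed on the slide."""

    ORDER = list(ReadStep)

    def __init__(self):
        self.done = []

    @property
    def complete(self) -> bool:
        return len(self.done) == len(self.ORDER)

    def advance(self, step: ReadStep) -> None:
        if self.complete:
            raise RuntimeError("transaction already complete")
        expected = self.ORDER[len(self.done)]
        if step is not expected:
            raise RuntimeError(f"expected {expected.name}, got {step.name}")
        self.done.append(step)


tx = ReadTransaction()
for step in ReadStep:
    tx.advance(step)
print(tx.complete)
```

Trying to send addresses before the switch is set up raises an error, which mirrors the dependency between steps 5 and 7.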

Memory WRITE Transaction

4) MC receives WRITE command, which is also a path setup from the processor to memory gateway
5) Switch is set up from modulators to DIMM
6) Row/Col addresses sent to DIMM
7) Switch is set up from network to DIMM
8) Path ACK sent back to processor
9) Data transmitted optically to DIMM
10) Path torn down from processor after data transmitted

[Figure: steps 4 to 9 marked on the memory gateway (control, modulators, photonic switch)]
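The WRITE steps can be traced against a toy model of the photonic switch, to show which optical path each step depends on. Everything named here is hypothetical, and the model assumes the modulator-to-DIMM path may stay in place when the network-to-DIMM path is added, which the slide does not specify.

```python
class PhotonicSwitch:
    """Toy model: a set of (source, destination) optical connections."""

    def __init__(self):
        self.paths = set()

    def connect(self, src: str, dst: str) -> None:
        self.paths.add((src, dst))

    def require(self, src: str, dst: str) -> None:
        if (src, dst) not in self.paths:
            raise RuntimeError(f"no optical path {src} -> {dst}")

    def teardown(self) -> None:
        self.paths.clear()


def write_transaction(switch: PhotonicSwitch) -> list[str]:
    """Run steps 4 to 10 in order and return a log of what happened."""
    log = ["4: MC receives WRITE (path setup from processor)"]
    switch.connect("modulators", "DIMM")
    log.append("5: switch set up, modulators -> DIMM")
    switch.require("modulators", "DIMM")   # addresses need this path
    log.append("6: row/col addresses sent to DIMM")
    switch.connect("network", "DIMM")
    log.append("7: switch set up, network -> DIMM")
    log.append("8: path ACK sent back to processor")
    switch.require("network", "DIMM")      # data needs this path
    log.append("9: data transmitted optically to DIMM")
    switch.teardown()
    log.append("10: path torn down")
    return log


for line in write_transaction(PhotonicSwitch()):
    print(line)
```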

Optical Circuit Memory (OCM) Anatomy

[Figure: DRAM_OpticalTransceiver with a detector bank, a modulator bank with Nλ drivers, DLL, latches, mux, clk, and control, driving Addr/cntrl (25) and Data (64) lines; reached through fiber coupling or waveguide coupling; electrical VDD and Gnd.]

Packet format, fields shown: Burst length, Bank ID, Col address, Row address, Data; the timing intervals tRCD and tCL are marked.
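As an illustration of the packet format, the control fields named on this slide can be packed into a single word and recovered again. The field widths below are hypothetical placeholders (only the 3-bit bank ID follows from the 8 banks per chip used in the deck's setups), the field order is simply the order in which the slide lists them, and the data payload is left out.

```python
# (name, width in bits); all widths except bank_id are illustrative guesses.
FIELDS = (
    ("burst_length", 8),
    ("bank_id", 3),
    ("col_addr", 10),
    ("row_addr", 14),
)


def pack(values: dict) -> int:
    """Pack the fields into one integer, first field most significant."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word: int) -> dict:
    """Inverse of pack: peel fields off from the least significant end."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out


header = {"burst_length": 64, "bank_id": 5, "col_addr": 300, "row_addr": 9000}
print(unpack(pack(header)) == header)
```

Carrying the burst length in the packet is what lets the DIMM side stream a long access without further commands from the controller.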

Advantages of Photonics

• Decoupled energy-distance relationship
• No long traces to drive and synchronize with the clock
  - DRAM chips can run faster
  - Less power
• Fewer pins on the DIMM module and going into the chip
  - Eventually required by packaging constraints
  - Waveguides can achieve dramatically higher density due to WDM
• DRAM can be arbitrarily distant – fiber is low loss
• Simplified memory control logic – no contending accesses, contention handled by path setup
  - Accesses are optimized for large streams of data
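Why circuit switching favors large streams can be seen from a simple amortization model: each access pays a fixed path-setup cost before data flows at the full line rate. The setup time and line rate below are hypothetical values chosen only to show the trend, and `effective_bandwidth_gbps` is an illustrative helper.

```python
def effective_bandwidth_gbps(message_bits: float,
                             setup_ns: float = 100.0,
                             line_rate_gbps: float = 100.0) -> float:
    """Delivered bandwidth when every message pays one path setup.

    Both default values are assumptions, not measurements from the deck.
    1 Gb/s equals 1 bit per ns, so bits / ns gives Gb/s directly.
    """
    transfer_ns = message_bits / line_rate_gbps
    return message_bits / (setup_ns + transfer_ns)


for bits in (1_000, 10_000, 100_000):
    print(bits, round(effective_bandwidth_gbps(bits), 1))
```

Under this model the delivered bandwidth approaches the line rate as messages grow, because the setup cost is spread over more bits.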

Experimental Setup - Photonic

System:
• 2cm×2cm chip
• 8×8 Photonic Torus
  - 28 DRAM Access points (MCs)
  - 2 DIMMs per DRAM AP
• Routers:
  - 256 b buffers
  - 32 b packet size
  - 32 b channels
• 32 nm tech. point (ORION)
  - High Vt
  - Vdd = 0.8 V
  - Freq = 1 GHz
• Photonics - 13λ

[Figure: Photonic Torus Tile]

Traffic:
• Random core-DRAM access point pairs
• Random read/write
• Uniform message sizes
• Poisson arrival at 1µs

DRAM:
• Modeled with our event-driven DRAM model
• DDR3 (10-10-10) @ 1600 MT/s
• 8 chips per DIMM, 8 banks per Chip
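One way to relate the 13 wavelengths to the DRAM parameters is to ask what rate each wavelength would need for the optical link to keep up with one DIMM at its peak. The 64-bit data width is an assumption (it matches the "Data (64)" label on the OCM slide), and the function name is hypothetical.

```python
def required_rate_per_wavelength_gbps(transfers_per_s: float,
                                      bus_bits: int = 64,
                                      wavelengths: int = 13) -> float:
    """Per-wavelength rate needed to match one DIMM's peak data rate.

    bus_bits is an assumption; wavelengths=13 is taken from the setup.
    """
    dimm_peak_gbps = transfers_per_s * bus_bits / 1e9
    return dimm_peak_gbps / wavelengths


rate = required_rate_per_wavelength_gbps(1600e6)  # DDR3 @ 1600 MT/s
print(f"{rate:.2f} Gb/s per wavelength")
```

The modulation rate actually used per wavelength is not stated on the slide, so this is a sizing exercise rather than a description of the simulated system.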

Performance Comparison

[Figure, three panels comparing Electronic Mesh and Photonic Torus against Msg Size (b), from 1000 to 10000: DRAM Bandwidth (Gb/s); Avg Read Latency (µs), log scale; Zero Load Latency (µs), log scale]

Experiment #2

[Figure: two access patterns, Random and Statically Mapped Address Space]

Results

[Figure, two panels plotted against Msg Size (b), from 1000 to 10000, with series EM - random, EM - mapped, PT - random, PT - mapped (EM: Electronic Mesh, PT: Photonic Torus): DRAM Bandwidth (Gb/s); Avg Read Latency (µs), log scale]

Network Energy Comparison

[Figure: two pie charts of network power by component.
 Electronic Mesh: Electronic Arbiter, Electronic Clock Tree, Electronic Crossbar, Electronic Inport, Electronic Wire, Electronic IO Wire.
 Photonic Torus: Electronic Arbiter, Electronic Clock Tree, Electronic Crossbar, Electronic Inport, Electronic Wire, Detector, Modulator, PSE1x2, PSE2x2, Thermal Tuning.]

Power = 0.42 W
Power = 13.3 W
Total Power = 2.53 W (including laser power)

Summary

• Extending a photonic network to include access to DRAM looks good for many reasons:
  - Circuit-switching allows large burst lengths and simplified memory control, for increased bandwidth
  - Energy-efficient end-to-end transmission
  - Alleviates pin count constraints with high-density waveguides

PhotoMAN