3D-DRAM Circuit Design, Modeling and Exploration for Computer Memory Hierarchy
Rakesh Anigu, Hongbin Sun, James J.-Q. Lu, Ken Rose, and Tong Zhang
Electrical, Computer and Systems Engineering Department
Rensselaer Polytechnic Institute
Motivation
Challenges of 3D integration:
TSV size/pitch
Thermal
Yield loss
EDA tools
Equipment
Cost
…
Motivation
Naturally embraces the immaturity of 3D integration
TSV size/pitch: coarse-grained die-to-die interconnect only
Thermal: inherently low power and less heat
Yield loss: easy to achieve very high defect tolerance
EDA tools: minimal departure from 2D design
Equipment: big $$$ market
Cost: higher-end, definitely not commodity
Why 3D Processor-DRAM Integration
[Figure: overall performance vs. time, showing the memory wall & bandwidth wall (Dr. Phil Emma @ IBM)]
Move more memory closer to processor cores at minimal extra cost!
3D processor-DRAM integration
Why 3D Processor-DRAM Integration
[Figure labels: DRAM dies, processor die]
Almost no yield loss
2D design know-how
Coarse-grained TSVs
Thermal friendly
Justifiable cost
To break the memory & bandwidth wall!
Quantitatively evaluate the potential
Outline
Motivation
3D DRAM Architecture Design
3D Processor-DRAM Integration
Conclusions
3D DRAM Architecture Design
[Figure labels: stacked commodity DRAM dies, processor die]
L2 cache ⇔ main memory: bandwidth, latency
CACTI 5 → 1Gb 2D DRAM @ 65nm: area, latency, energy
3D DRAM Architecture Design
Stacked commodity DRAM → customized 3D DRAM
At which granularity should we carry out 3D mapping?
Intra-sub-array 3D mapping: fine-grained TSVs
Inter-sub-array 3D mapping: coarse-grained TSVs
Inter-Sub-Array 3D Mapping
[Figure: top view, showing TSV I/Os]
3D Sub-Array Set
[Figure: a 3D sub-array set distributed across dies; labels: 2D sub-arrays, data bus, address bus, TSV bundle]
Multi-layer data access (MLDA)
- All 2D sub-arrays are activated
- Each handles a portion of data
Single-layer data access (SLDA)
- Only one 2D sub-array is activated
- One 2D sub-array handles all data
Comparison criteria for both: TSVs, energy
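The difference between the two access modes can be sketched in a few lines of Python. This is only an illustration of which layers are activated and how many bits each one serves; the class name `SubArraySet3D` and its fields `layers` and `access_bits` are hypothetical, and the values 8 and 256 are borrowed from elsewhere in the slides purely as example numbers. It does not model TSV count, latency, or energy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubArraySet3D:
    """Hypothetical model of one 3D sub-array set (one 2D sub-array per layer)."""
    layers: int       # number of stacked 2D sub-arrays in the set
    access_bits: int  # bits delivered by the set per access

    def mlda(self) -> List[int]:
        """Multi-layer data access: every layer is activated and serves a slice."""
        if self.access_bits % self.layers != 0:
            raise ValueError("access_bits must divide evenly across layers")
        share = self.access_bits // self.layers
        return [share] * self.layers

    def slda(self, target_layer: int) -> List[int]:
        """Single-layer data access: only target_layer is activated and serves all bits."""
        if not 0 <= target_layer < self.layers:
            raise IndexError("target_layer out of range")
        return [self.access_bits if i == target_layer else 0
                for i in range(self.layers)]


# Illustrative values only (assumed, not a configuration stated for one set).
example = SubArraySet3D(layers=8, access_bits=256)
mlda_bits = example.mlda()   # bits served by each layer under MLDA
slda_bits = example.slda(3)  # bits served by each layer under SLDA
```

In both modes the bits served across the layers sum to the same access width; what changes is how many 2D sub-arrays are activated per access.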
3D DRAM Architecture Design
Inter-sub-array 3D mapping
- Small number of TSVs (1K~10K)
- Intact individual DRAM sub-array design
- Distributed global routing → performance gain
Modified CACTI 5 to support inter-sub-array 3D mapping
Case study: 1Gb with 8 banks and 256-bit I/O @ 65nm
Compared designs: 2D vs. 3D die packaging (i.e., no TSVs) vs. 3D DRAM (SLDA and MLDA)
Defect Tolerance
One more dimension for redundancy repair
Heterogeneous 3D DRAM
Stacked commodity DRAM → customized 3D DRAM
Heterogeneous 3D-DRAM L2 cache + main memory structure
Each core has its private 2D-SRAM L1 cache & 3D-DRAM L2 cache
DRAM density vs. speed trade-off
[Figure labels: density, speed, sub-array]
Integrate both high-threshold & low-threshold MOSFETs
Evaluation
M5 full-system simulator with Linux (U. of Mich.)
Four 4.0GHz cores with 8-layer 3D-DRAM at 45nm node
- 3D-DRAM L2 cache per core: 2MB
- 3D-DRAM main memory: 1GB
[Figure: processor die with four cores, each with L1; configurations: baseline, without multi-Vt, with multi-Vt]
Instructions Per Cycle (IPC) Gain over Baseline
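The slide reports IPC gain over a baseline but its definition did not survive in the text. A minimal sketch, assuming the common convention of relative improvement, (IPC_new - IPC_baseline) / IPC_baseline; the function name `ipc_gain` and the sample values are hypothetical and are not results from the slides.

```python
def ipc_gain(ipc_new: float, ipc_baseline: float) -> float:
    """Relative IPC gain over a baseline (assumed definition, not from the slides)."""
    if ipc_baseline <= 0:
        raise ValueError("baseline IPC must be positive")
    return (ipc_new - ipc_baseline) / ipc_baseline


# Made-up IPC values for illustration only.
example_gain = ipc_gain(ipc_new=1.5, ipc_baseline=1.0)
```

Under this convention a gain of 0.5 means the evaluated configuration retires 50% more instructions per cycle than the baseline.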
One Step Further
Decentralized, distributed main memory structure
Fastlane between L2 cache and its closest main memory block
Reduced L2 cache miss penalty
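Why a lower L2 miss penalty matters can be seen from the standard two-level average memory access time formula. The sketch below is a textbook model, not the authors' evaluation method; the function name `average_access_time` and all numbers are hypothetical.

```python
def average_access_time(l1_hit: float, l1_miss_rate: float,
                        l2_hit: float, l2_miss_rate: float,
                        l2_miss_penalty: float) -> float:
    """Two-level average memory access time, in cycles (textbook model)."""
    for rate in (l1_miss_rate, l2_miss_rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("miss rates must lie in [0, 1]")
    # An L1 miss costs an L2 access; an L2 miss adds the main memory penalty.
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * l2_miss_penalty)


# Made-up latencies and miss rates; only the L2 miss penalty differs.
slow = average_access_time(2, 0.05, 20, 0.2, 300)
fast = average_access_time(2, 0.05, 20, 0.2, 200)
```

With everything else held fixed, the average access time falls linearly with the L2 miss penalty, scaled by the product of the two miss rates.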
Conclusions
3D multi-core processor-DRAM integration
3D DRAM design
- Simple but effective inter-sub-array 3D mapping strategy
- Simple but effective 3D redundancy repair
- Good memory performance gain
Integration of processor and 3D DRAM
- Heterogeneous 3D DRAM architecture
- Great computing system performance gain