3D-DRAM Circuit Design, Modeling and Exploration for Computer Memory Hierarchy




Rakesh Anigu, Hongbin Sun, James J.-Q. Lu, Ken Rose, and Tong Zhang
Electrical, Computer and Systems Engineering Department, Rensselaer Polytechnic Institute

Motivation
But…
TSV size/pitch
Thermal
Yield loss
EDA tools
Equipment
Cost
…

Motivation
Naturally embraces the immaturity of 3D integration
TSV size/pitch: coarse-grained die-to-die interconnect only
Thermal: inherently low power and less heat
Yield loss: easy to achieve very high defect tolerance
EDA tools: minimal departure from 2D design
Equipment: big $$$ market
Cost: higher-end, definitely not commodity

Why 3D Processor-DRAM Integration
[Figure: overall performance vs. time, showing the memory wall & bandwidth wall (Dr. Phil Emma @ IBM)]
Move more memory closer to processor cores at minimal extra cost!
3D Processor-DRAM Integration

Why 3D Processor-DRAM Integration
[Figure: DRAM dies and processor die]
Almost no yield loss
2D design know-how
Coarse-grained TSVs
Thermal friendly
Justifiable cost
To break the memory & bandwidth wall!
Quantitatively evaluate the potential

Outline
Motivation
3D DRAM Architecture Design
3D Processor-DRAM Integration
Conclusions

3D DRAM Architecture Design
[Figure: stacked commodity DRAM dies and processor die]
L2 cache ⇔ main memory: bandwidth, latency
CACTI 5 → 1Gb 2D DRAM @ 65nm: area, latency, energy

3D DRAM Architecture Design
Stacked commodity DRAM → customized 3D DRAM
At which granularity should we carry out 3D mapping?
Intra-sub-array 3D mapping: fine-grained TSVs
Inter-sub-array 3D mapping: coarse-grained TSVs

Inter-Sub-Array 3D Mapping
[Figure: top view, showing TSV I/Os]

3D Sub-Array Set
Distributed across dies
[Figure: 2D sub-arrays on different dies, connected by a TSV bundle carrying the data bus and address bus]
Multi-layer data access (MLDA)
  - All 2D sub-arrays are activated
  - Each handles a portion of data
  - Trade-off dimensions: TSVs, energy
Single-layer data access (SLDA)
  - Only one 2D sub-array is activated
  - One 2D sub-array handles all data
  - Trade-off dimensions: TSVs, energy
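The difference between the two access schemes can be sketched in a few lines of Python. This is only an illustration of the activation pattern described on the slide, not the authors' circuit model: the names `AccessPlan` and `plan_access` are hypothetical, and the example values (8 layers, 256-bit I/O) are borrowed from other slides in this deck as assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AccessPlan:
    active_layers: List[int]  # dies whose 2D sub-array is activated
    bits_per_layer: int       # data bits supplied by each active 2D sub-array


def plan_access(mode: str, num_layers: int, io_width: int,
                target_layer: int = 0) -> AccessPlan:
    """Return which 2D sub-arrays of a 3D sub-array set take part in an access.

    Hypothetical helper: MLDA activates every layer and splits the data
    evenly; SLDA activates only the addressed layer, which supplies all data.
    """
    if mode == "MLDA":
        if io_width % num_layers != 0:
            raise ValueError("io_width must divide evenly across layers")
        return AccessPlan(list(range(num_layers)), io_width // num_layers)
    if mode == "SLDA":
        if not 0 <= target_layer < num_layers:
            raise ValueError("target_layer out of range")
        return AccessPlan([target_layer], io_width)
    raise ValueError("mode must be 'MLDA' or 'SLDA'")


mlda = plan_access("MLDA", num_layers=8, io_width=256)
slda = plan_access("SLDA", num_layers=8, io_width=256, target_layer=3)
print(len(mlda.active_layers), mlda.bits_per_layer)  # 8 active layers, 32 bits each
print(len(slda.active_layers), slda.bits_per_layer)  # 1 active layer, 256 bits
```

Under these assumptions MLDA spreads a 256-bit access over all eight dies, while SLDA keeps seven of the eight sub-arrays idle.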

3D DRAM Architecture Design
Inter-sub-array 3D mapping
  - Small number of TSVs (1K~10K)
  - Intact individual DRAM sub-array design
  - Distributed global routing → performance gain
Modified CACTI 5 to support inter-sub-array 3D mapping
Case study: 1Gb with 8 banks and 256-bit I/O @ 65nm
Compared designs: 2D vs. 3D die packaging (i.e., no TSVs) vs. 3D DRAM (SLDA, MLDA)

Defect Tolerance
One more dimension for redundancy repair
[Figure: stacked sub-arrays, each with its own redundancy; one is marked x]
Inter-die inter-sub-array redundancy repair

Inter-Die Inter-Sub-Array Redundancy Repair
[Figure: repair results for a 1024x256 sub-array, defect density: 0.05%, repair-most algorithm]
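For readers unfamiliar with repair-most, the following is a minimal sketch of the generic greedy heuristic: repeatedly replace the row or column holding the most remaining faults with a spare. It is not the authors' implementation. The function name and the spare counts are hypothetical, and inter-die repair is modeled only crudely, by letting a sub-array draw on a larger spare pool. If defect density is read as the fraction of faulty cells (an assumption), a 1024x256 sub-array at 0.05% has about 1024 × 256 × 0.0005 ≈ 131 faulty cells on average.

```python
from collections import Counter
from typing import Iterable, Tuple


def repair_most(faults: Iterable[Tuple[int, int]],
                spare_rows: int, spare_cols: int) -> bool:
    """Greedy repair-most: True if all faulty cells can be covered by spares.

    faults is a collection of (row, col) positions of defective cells.
    """
    remaining = set(faults)
    while remaining:
        row_counts = Counter(r for r, _ in remaining)
        col_counts = Counter(c for _, c in remaining)
        best_row, row_n = row_counts.most_common(1)[0]
        best_col, col_n = col_counts.most_common(1)[0]
        # Prefer the line with more faults; fall back to whichever spare is left.
        use_row = spare_rows > 0 and (row_n >= col_n or spare_cols == 0)
        if use_row:
            remaining = {(r, c) for r, c in remaining if r != best_row}
            spare_rows -= 1
        elif spare_cols > 0:
            remaining = {(r, c) for r, c in remaining if c != best_col}
            spare_cols -= 1
        else:
            return False  # faults remain but no spares are left
    return True


faults = {(0, 0), (0, 1), (0, 2), (5, 7)}
print(repair_most(faults, spare_rows=1, spare_cols=1))  # repairable with own spares
print(repair_most(faults, spare_rows=0, spare_cols=1))  # not repairable alone
# Crude model of inter-die repair: borrow one spare row from another die.
print(repair_most(faults, spare_rows=0 + 1, spare_cols=1))
```

The last call illustrates the extra dimension mentioned on the slide: a sub-array that fails with its own spares may still be repaired when spares on another die are available.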

Outline
Motivation
3D DRAM Architecture Design
3D Processor-DRAM Integration
Conclusions

Current Design Practice
[Figure: multiple cores, each with L1, a shared L2 cache (SRAM), and commodity DRAM on a DDRx channel]
Shared L2 cache (SRAM): L2 capacity & L1↔L2 bandwidth
Commodity DRAM over a DDRx channel: L2 ↔ main memory bandwidth
3D integration: high-density DRAM, high-speed DRAM

Heterogeneous 3D DRAM
Stacked commodity DRAM → customized 3D DRAM
Heterogeneous 3D-DRAM L2 cache + main memory structure
Each core has its private 2D-SRAM L1 cache & 3D-DRAM L2 cache
DRAM density vs. speed trade-off
[Figure: sub-array density vs. speed]
Integrate both high-threshold & low-threshold MOSFETs

Evaluation
M5 full system simulator with Linux (U. of Mich.)
Four 4.0GHz cores with 8-layer 3D-DRAM at 45nm node
  - 3D-DRAM L2 cache per core: 2MB
  - 3D-DRAM main memory: 1GB
Configurations: baseline, without multi-Vt, with multi-Vt
[Figure: processor die with four cores, each with L1]

Instructions per Cycle (IPC) Gain over Baseline

One Step Further
Decentralized, distributed main memory structure
Fastlane between each L2 cache and its closest main memory block
Reduced L2 cache miss penalty
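Why a smaller L2 miss penalty matters can be seen with the textbook average memory access time (AMAT) formula. This is not the evaluation method used in the deck (the results come from the M5 simulator), and every latency and miss rate below is a hypothetical placeholder chosen only to make the arithmetic visible.

```python
def amat(l1_hit: float, l1_miss_rate: float,
         l2_hit: float, l2_miss_rate: float, mem_latency: float) -> float:
    """Average memory access time for a two-level cache, in cycles.

    AMAT = L1 hit time + L1 miss rate * (L2 hit time + L2 miss rate * memory latency)
    """
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem_latency)


# Hypothetical numbers: only the main memory latency (L2 miss penalty) changes.
baseline = amat(l1_hit=1, l1_miss_rate=0.1, l2_hit=10,
                l2_miss_rate=0.5, mem_latency=100)
fastlane = amat(l1_hit=1, l1_miss_rate=0.1, l2_hit=10,
                l2_miss_rate=0.5, mem_latency=60)
print(baseline, fastlane)
```

With these placeholder values, shortening the path from an L2 cache to its closest main memory block lowers AMAT even though hit times and miss rates are unchanged.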

Conclusions
3D multi-core processor-DRAM integration
3D DRAM design
  - Simple but effective inter-sub-array 3D mapping strategy
  - Simple but effective 3D redundancy repair
  - Good memory performance gain
Integration of processor and 3D DRAM
  - Heterogeneous 3D DRAM architecture
  - Great computing system performance gain