Multi-Core Microprocessor Chips: Motivation & Challenges

Author: Clemence Ward
Multi-Core Microprocessor Chips: Motivation & Challenges
Dileep Bhandarkar, Ph.D.
Architect at Large, Digital Enterprise Group
Intel Corporation
May 2006

Copyright © 2006 Intel Corporation.

2006 Intel Distinguished Lecture

Agenda
• Semiconductor Technology Evolution
• Design Challenges
• Why Multi-Core Processor Chips?
• Power/Performance Trade-Offs
• CMP Directions
• Beyond CMP
• Summary

©2006, Intel Corporation Intel, the Intel logo, Pentium, Itanium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries *Other names and brands may be claimed as the property of others

www.intel.com/education


Intel only: On-time “2-year-cycle”

Process           180nm       130nm       90nm        65nm        45nm
Wafer Size (mm)   200         200/300     300         300         300
1st Production    1999        2001        2003        2005        2007
Transistors       100nm LG    70nm LG     50nm LG     35nm LG     Details Coming!
                  CoSi2       CoSi2       NiSi        NiSi
                                          Strain Si   Strain Si
                                          SiGe        SiGe
Interconnects     6 Al        6 Cu        7 Cu        8 Cu
                  SiOF        SiOF        Low-k       Low-k

45 nm Logic Process on Track for Delivery in 2007

Process Name      P1262   P1264   P1266   P1268
Lithography       90 nm   65 nm   45 nm   32 nm
1st Production    2003    2005    2007    2009

Moore's Law continues! Intel continues to develop a new technology generation every 2 years.

Intel 11th EMEA Academic Forum

Historical Driving Forces

[Figure: two plots against year, 1970 to 2020. "Increased Performance via Increased Frequency" plots Frequency (MHz); "Shrinking Geometry" plots Feature Size (um).]

Milestones:
1971   4004 Processor      2300 Transistors
1978   8008 Processor      IBM PC
1985   i386 Processor      32-bit
1993   Pentium Processor   3.1M transistors
2005   Montecito           1.7B Transistors

The Challenges

[Figure: "Power Limitations" plots CPU Power (W) against year, 1990 to 2015. "Diminishing Voltage Scaling" plots Supply Voltage (V) against process generation (0.7um, 0.5um, 0.35um, 0.25um, 0.18um, 0.13um, 90nm, 65nm, 45nm, 30nm) over 1990 to 2009, with a "~30%" annotation.]

Power = Capacitance x Voltage^2 x Frequency; also Power ~ Voltage^3


Design Challenges
• Memory latency not scaling as fast as processor speed
• Power growing non-linearly with single thread performance
• Designer productivity lagging design complexity
• Ability to validate and test complex design
• Keeping up with new process technology every two years


Long Latency DRAM Accesses: Needs Latency Tolerant Techniques

[Figure: bar chart of Peak Instructions during DRAM Access (axis 0 to 1400) for Pentium® Processor 66 MHz, Pentium-Pro Processor 200 MHz, Pentium III Processor 1100 MHz, Pentium 4 Processor 2 GHz, and Future CPUs.]


DRAM Latency Tolerance
• Continue building even larger caches
  – Every semiconductor process generation provides opportunity to double cache size
  – Cache becomes larger part of die
• Hide memory latency behind multiple threads of execution
• Intel implemented simultaneous multithreading in 2000
• Implement multi-core products as Moore’s Law allows


Situational Analysis
• With each process generation, transistor density doubles
  – Frequency has increased by ~1.5x; ~1.3x in future
  – Vcc has scaled by ~0.8x; ~0.9x in future
  – Capacitance has scaled by 0.7x
  – Total power may not scale down due to increased leakage
• Instruction Level Parallelism harder to find
• Increasing single-stream performance often requires non-linear increase in design complexity
• Many server applications are inherently parallel
• Parallelism exists in multimedia applications
• Multi-tasking usage models becoming popular
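The per-generation factors above can be combined using Power = Capacitance x Voltage^2 x Frequency. The sketch below is illustrative only: it assumes the 0.7x capacitance factor applies per transistor, that every transistor on a same-size die switches, and that leakage is ignored (the slide notes leakage makes things worse). The function name `relative_power` is hypothetical.

```python
# Rough sketch: how dynamic power changes from one process generation to the
# next, using the approximate scaling factors quoted on the slide.
# Assumptions (not from the slide): capacitance factor is per transistor,
# die area is unchanged, leakage power is ignored.

def relative_power(cap_scale, vcc_scale, freq_scale, density_scale=2.0):
    """Return (per-transistor, per-die) dynamic power relative to the
    previous generation, from Power = Capacitance * Voltage^2 * Frequency."""
    per_transistor = cap_scale * vcc_scale ** 2 * freq_scale
    per_die = per_transistor * density_scale  # twice the transistors per die
    return per_transistor, per_die

historical = relative_power(cap_scale=0.7, vcc_scale=0.8, freq_scale=1.5)
future = relative_power(cap_scale=0.7, vcc_scale=0.9, freq_scale=1.3)

print("historical: per transistor %.3f, per die %.3f" % historical)
print("future:     per transistor %.3f, per die %.3f" % future)
```

Under these assumptions, power per transistor falls each generation, but power for a full die of the same size rises, and it rises faster with the weaker voltage scaling expected in future.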


Processor Power


Design Complexity and Productivity Factors
• Huge transistor budgets stress ability to design and verify complex chips
• Multi-core fits well with increasing transistor budgets
• Multi-core design addresses density/designer gap


Iron Law of Performance
• Execution Time is the product of
  – Path Length
  – Cycles Per Instruction (CPI)
  – Cycle Time
• CPI is the sum of
  – infinite-cache core CPI
  – miss rate * effective memory latency
• Bad (good) news is that performance does not scale up (down) linearly with frequency
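The Iron Law can be turned into a small model that shows why performance does not scale linearly with frequency: memory latency is roughly fixed in nanoseconds, so it costs more cycles as the clock speeds up. This is a sketch under stated assumptions. The workload numbers (1e9 instructions, core CPI of 1.0, 0.01 misses per instruction, 100 ns memory latency) are made up for illustration, and `execution_time_s` is a hypothetical helper.

```python
# Sketch of the Iron Law: time = path length * CPI * cycle time, with
# CPI = infinite-cache core CPI + miss rate * effective memory latency.
# All workload parameters below are illustrative assumptions.

def execution_time_s(path_length, core_cpi, misses_per_instr,
                     mem_latency_ns, freq_ghz):
    cycle_time_ns = 1.0 / freq_ghz
    # Memory latency is fixed in ns, so its cost in cycles grows with frequency.
    mem_latency_cycles = mem_latency_ns / cycle_time_ns
    cpi = core_cpi + misses_per_instr * mem_latency_cycles
    return path_length * cpi * cycle_time_ns * 1e-9

base = execution_time_s(1e9, 1.0, 0.01, 100.0, freq_ghz=2.0)
fast = execution_time_s(1e9, 1.0, 0.01, 100.0, freq_ghz=3.0)
speedup = base / fast

print("2 GHz: %.3f s, 3 GHz: %.3f s, speedup %.3fx" % (base, fast, speedup))
```

With these inputs, a 1.5x frequency increase gives well under 1.5x speedup. The same arithmetic run in reverse is the "good news": lowering frequency costs less performance than the frequency drop suggests.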


The Magic of Voltage Scaling
• Power = Capacitance * Voltage^2 * Frequency
• Frequency ∝ Voltage in region of interest
• Power increases as the cube of Frequency
• Good news is that voltage scaling works
• 10% reduction in voltage yields
  – 10% reduction in frequency
  – 30% reduction in power
  – less than 10% reduction in performance
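The cube relationship can be checked with a few lines of arithmetic. This sketch assumes, as the slide does, that frequency tracks voltage linearly in the region of interest and that capacitance is unchanged. The helper name `scale_voltage` is hypothetical.

```python
# Sketch: effect of scaling supply voltage on frequency and dynamic power.
# Assumes Frequency is proportional to Voltage and capacitance is constant.

def scale_voltage(voltage_scale):
    """Return (frequency scale, power scale) for a given voltage scale."""
    freq_scale = voltage_scale                     # Frequency ∝ Voltage
    power_scale = voltage_scale ** 2 * freq_scale  # Power = C * V^2 * f
    return freq_scale, power_scale

freq, power = scale_voltage(0.9)  # 10% voltage reduction
print("frequency reduction: %.0f%%" % ((1 - freq) * 100))
print("power reduction:     %.0f%%" % ((1 - power) * 100))
```

The exact figure from the cube law is about 27%, which the slide rounds to 30%.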


Simple Dual Core Example
• Assume single-core processor at 100W
  – 80W for core, 20W for cache and I/O
  – 50% of die area is core
• Dual core within same power envelope
  – 20W for I/O and cache
  – 40W per core
  – Die size increases by 50%
  – Reduce voltage by 21% to reduce core power to 40W
  – Frequency reduces by ~20%
  – Single thread perf reduces by ~15%
  – Throughput increases by 70-80%
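The numbers in this example follow from the cube law (Power ∝ Voltage^3 when Frequency ∝ Voltage). The sketch below reproduces them. The ~15% single-thread loss is taken from the slide as an input rather than derived, and the throughput figure assumes both cores are kept fully busy with independent work.

```python
# Sketch of the dual-core arithmetic. Power figures are the slide's; the
# cube law and perfect two-core scaling are the modeling assumptions.

SINGLE_CORE_W = 80.0   # core power of the single-core design
UNCORE_W = 20.0        # cache and I/O
BUDGET_W = 100.0       # total power envelope, unchanged

per_core_budget_w = (BUDGET_W - UNCORE_W) / 2            # 40 W per core
power_scale = per_core_budget_w / SINGLE_CORE_W          # 0.5
voltage_scale = power_scale ** (1.0 / 3.0)               # Power ∝ Voltage^3
freq_scale = voltage_scale                               # Frequency ∝ Voltage

single_thread_perf = 0.85          # slide's ~15% loss, taken as given
throughput_gain = 2 * single_thread_perf - 1.0

print("voltage reduction:   %.1f%%" % ((1 - voltage_scale) * 100))
print("frequency reduction: %.1f%%" % ((1 - freq_scale) * 100))
print("throughput gain:     %.0f%%" % (throughput_gain * 100))
```

Single-thread performance drops less than frequency because memory latency does not scale with the clock, which is the Iron Law effect described earlier in the deck.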


Possible Improvements
• Develop new power efficient core
  – E.g. extensive clock gating
  – Big power savings with little or no performance loss
• Design a smaller core with lower performance
  – Area and power savings much greater than performance loss
  – Use larger number of cores
• Adjust frequency and power of each core with load factor
  – Inactive cores can be put in sleep mode
  – Maintain overall die power constant
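The last idea, holding die power constant while cores sleep, can be sketched with the same cube law. This is a simplified model, not a description of any shipping power-management scheme: it assumes sleeping cores draw no power, that the freed budget is split evenly among active cores, and that no maximum voltage or frequency limit applies. `active_core_freq_scale` is a hypothetical name.

```python
# Sketch: frequency headroom for active cores when the core power budget is
# held constant and inactive cores sleep. Assumes sleeping cores draw zero
# power and Power ∝ Frequency^3; real parts are capped by voltage limits.

def active_core_freq_scale(total_cores, active_cores):
    """Frequency of each active core relative to the all-cores-active case."""
    if not 0 < active_cores <= total_cores:
        raise ValueError("active_cores must be between 1 and total_cores")
    power_per_active_core = total_cores / active_cores  # relative share
    return power_per_active_core ** (1.0 / 3.0)

for active in (4, 2, 1):
    scale = active_core_freq_scale(4, active)
    print("%d of 4 cores active: %.2fx frequency" % (active, scale))
```

Because of the cube law, doubling the power available to a core buys only about 26% more frequency, which is why the gain from sleeping cores is modest compared with the gain from adding them.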


A New Era…

THE OLD
• Performance Equals Frequency
• Unconstrained Power
• Voltage Scaling

THE NEW
• Performance Equals IPC
• Multi-Core
• Power Efficiency
• Microarchitecture Advancements

Intel Core Micro-architecture: Five Key Innovations
• Intel® Wide Dynamic Execution
• Intel® Intelligent Power Capability
• Intel® Advanced Digital Media Boost
• Intel® Smart Memory Access
• Intel® Advanced Smart Cache

Multi-Core Trajectory

[Figure: roadmap showing Dual-Core and Quad-Core against a timeline marked 2H 2006 and 1H 2007.]

Microprocessor Design Model

Architecture Transitions (2 YEARS each)

Process   Shrink/Derivative            New Microarchitecture
65nm      PRESLER · YONAH · DEMPSEY    MEROM · CONROE · WOODCREST
45nm      PENRYN                       NEHALEM
32nm      NEHALEM-C                    GESHER

PRINCIPLES
1. One micro-architecture for all high volume market segments
2. Optimized for performance/watt
3. Parallel design teams
4. No waiting on new process technology
5. Chipset cadence offset for fast ramp

OBJECTIVE: Sustained Technology Leadership


Possible Evolution
• Transistor density doubles with each process generation
• New generation enables complex new core
• Possible alternative design point
  – Double the cache capacity in same area
  – Double the number of processor cores
  – Frequency improves with process technology

[Figure: 90 nm: 1 Core + Cache; 65 nm: 2 Cores + 2 x Cache; 45 nm: 4 Cores + 4 x Cache.]


Ramping Multi-core Everywhere

Segment                                    2005       2006*   2007*
Desktop Client (Mainstream/Performance)    Shipping   >70%    >90%
Mobile Client (Mainstream/Performance)     Shipping   >70%    >90%
Server & Workstation                       Shipping   >85%    ~100%

* Data is projected run rate exiting the year. Source: Intel

Expect to ship >60 million multi-core processors by end of 2006.
All products and dates are preliminary and subject to change without notice.

CMP Challenges
• How much Thread Level Parallelism is there in most workloads?
• Ability to generate code with lots of threads & performance scaling
• Thread synchronization
• Operating systems for parallel machines
• Single thread performance tradeoff
• Power limitations
• On-chip interconnect/cache infrastructure
• Memory and I/O bandwidth required


Intel’s Software Tools and Support
• Thread Checker, Thread Profiler
• Math Kernel Libraries, Performance Primitives
• Solutions, Blueprints, Sizing/Scaling Guides
• Driver Optimization Labs Drivers
• Solution Services, Developer Services
• Compilers
• VTune™ Analyzers
• Software College, Early Access Programs


How Many Cores?
• Where does the doubling stop?
  – Driven by software issues
• Today Microsoft Windows supports only 64 threads!
• How many applications scale to 64 threads?
• How well does performance scale with thread count?
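One common way to reason about the scaling question is Amdahl's law, which is not part of the slides and is used here only as a standard illustrative model. It assumes a fixed workload where a given fraction parallelizes perfectly and the rest is serial, and it ignores synchronization and memory bandwidth costs. The parallel fractions below are made-up examples.

```python
# Sketch using Amdahl's law (a standard model, not taken from the lecture):
# speedup = 1 / (serial fraction + parallel fraction / threads).

def amdahl_speedup(parallel_fraction, threads):
    """Ideal speedup for a fixed workload on the given number of threads."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)

for fraction in (0.50, 0.90, 0.95, 0.99):
    row = ", ".join("%2d threads: %5.1fx" % (n, amdahl_speedup(fraction, n))
                    for n in (2, 8, 64))
    print("%.0f%% parallel -> %s" % (fraction * 100, row))
```

Under this model, even a workload that is 95% parallel cannot exceed a 20x speedup no matter how many threads are available, which is one reason the answer to "where does the doubling stop?" is driven by software.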


Looking Beyond CMP
• How far do we push the number of general purpose cores?
• Is there a role for application specific engines?
• Programming model for heterogeneous cores


Improving Power Efficiency

[Figure: Performance plotted against Power, with points for One processor, 2 CMP, 3 CMP, and 4 CMP. Three trends are annotated:
  – Specialized MIPS: >3% performance for 1% in power
  – CMP* parallelism: ~1% performance for 1% in power
  – Conventional design penalty: 1% performance for 3% in power]

*CMP = Symmetric General Purpose (GP) cores


Application Specific Engines
• Can achieve better power efficiency than general purpose cores
• Simpler design due to targeted application and lack of support for full operating system
• Challenge
  – Needs to support high volume application
  – Reconfigurable?
• Graphics and Multimedia engines are good candidates


Summary
• One billion transistors are here already!
• Chip Level Multiprocessing and large caches can exploit Moore’s Law
• Amount of parallelism in future microprocessor systems will increase
• Heterogeneous cores may emerge eventually
• Need applications and tools that can exploit parallelism
• Design challenges and software issues remain

Collaborate, Innovate, Lead!


Closing Thought
“Don’t be encumbered by past history, go off and do something wonderful.”
- Robert Noyce, Intel Co-founder
