Multi-Core Microprocessor Chips: Motivation & Challenges
Dileep Bhandarkar, Ph.D.
Architect at Large, Digital Enterprise Group, Intel Corporation
May 2006
Copyright © 2006 Intel Corporation.
2006 Intel Distinguished Lecture
Agenda
• Semiconductor Technology Evolution
• Design Challenges
• Why Multi-Core Processor Chips?
• Power/Performance Trade-Offs
• CMP Directions
• Beyond CMP
• Summary
©2006, Intel Corporation Intel, the Intel logo, Pentium, Itanium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries *Other names and brands may be claimed as the property of others
www.intel.com/education
Intel only: On-time “2-year-cycle”

Process node      180nm             130nm             90nm                              65nm                              45nm
Wafer Size (mm)   200               200/300           300                               300                               300
1st Production    1999              2001              2003                              2005                              2007
Transistors       100nm LG, CoSi2   70nm LG, CoSi2    50nm LG, NiSi, Strain Si, SiGe    35nm LG, NiSi, Strain Si, SiGe    Details Coming!
Interconnects     6 Al, SiOF        6 Cu, SiOF        7 Cu, Low-k                       8 Cu, Low-k
45 nm Logic Process on Track for Delivery in 2007

Process Name     P1262   P1264   P1266   P1268
Lithography      90 nm   65 nm   45 nm   32 nm
1st Production   2003    2005    2007    2009
Moore's Law continues! Intel continues to develop a new technology generation every 2 years
Intel 11th EMEA Academic Forum
Historical Driving Forces
Increased Performance via Increased Frequency; Shrinking Geometry
[Figure: two log-scale plots against year (1970 to 2020): Frequency (MHz) and Feature Size (um). Milestones marked at 1971, 1978, 1985, 1993 and 2005: 4004 Processor (2300 transistors), 8008 Processor (IBM PC), i386 Processor (32-bit), Pentium Processor (3.1M transistors), Montecito (1.7B transistors).]
The Challenges
Power Limitations; Diminishing Voltage Scaling
[Figure: two log-scale plots, CPU Power (W) and Supply Voltage (V), across process generations from 0.7um to 30nm and years 1990 to 2015; annotation "~30%".]
Power = Capacitance x Voltage^2 x Frequency; also Power ~ Voltage^3
Design Challenges
• Memory latency not scaling as fast as processor speed
• Power growing non-linearly with single thread performance
• Designer productivity lagging design complexity
• Ability to validate and test complex design
• Keeping up with new process technology every two years
Long Latency DRAM Accesses: Needs Latency Tolerant Techniques
[Figure: bar chart of Peak Instructions during DRAM Access (axis 0 to 1400) for Pentium® Processor 66 MHz, Pentium-Pro Processor 200 MHz, Pentium III Processor 1100 MHz, Pentium 4 Processor 2 GHz, and Future CPUs.]
DRAM Latency Tolerance
• Continue building even larger caches
  – Every semiconductor process generation provides opportunity to double cache size
  – Cache becomes larger part of die
• Hide memory latency behind multiple threads of execution
• Intel implemented simultaneous multithreading in 2000
• Implement multi-core products as Moore’s Law allows
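One way to see why extra threads help tolerate DRAM latency is a simple utilization model. This is a sketch under stated assumptions, not something taken from the lecture: each thread alternates between a fixed burst of compute cycles and a fixed memory stall, and switching between threads costs nothing. The function name and the cycle counts are hypothetical.

```python
def core_utilization(threads, compute_cycles, stall_cycles):
    """Fraction of cycles doing useful work in an idealized multithreaded core.

    Assumption: every thread repeats `compute_cycles` of work followed by
    `stall_cycles` waiting on memory, and thread switches are free.
    """
    if threads < 1:
        raise ValueError("threads must be >= 1")
    per_thread = compute_cycles / (compute_cycles + stall_cycles)
    # Other threads fill the stall time until the core is fully busy.
    return min(1.0, threads * per_thread)


# Hypothetical workload: 100 compute cycles per 300-cycle memory stall.
for n in (1, 2, 4, 8):
    print(n, core_utilization(n, compute_cycles=100, stall_cycles=300))
```

In this model a single thread keeps the core busy a quarter of the time, and four threads are enough to cover the stall completely; real workloads have variable stalls and contend for cache, so actual gains are smaller.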
Situational Analysis
• With each process generation, transistor density doubles
  – Frequency has increased by ~1.5x; ~1.3x in future
  – Vcc has scaled by ~0.8x; ~0.9x in future
  – Capacitance has scaled by 0.7x
  – Total power may not scale down due to increased leakage
• Instruction Level Parallelism harder to find
• Increasing single-stream performance often requires non-linear increase in design complexity
• Many server applications are inherently parallel
• Parallelism exists in multimedia applications
• Multi-tasking usage models becoming popular
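Taken at face value, the per-generation factors on this slide can be combined through Power = Capacitance x Voltage^2 x Frequency. The sketch below does that arithmetic. It assumes that the 0.7x capacitance factor applies per transistor, that a same-size die holds twice as many transistors, and that leakage is ignored; none of these assumptions is stated in the lecture, and the function name is illustrative.

```python
def dynamic_power_scale(cap_scale, vcc_scale, freq_scale, density_scale=1.0):
    """Relative dynamic power after one process generation (P = C * V^2 * f).

    Assumption: `cap_scale` is per transistor; `density_scale` is the
    transistor count relative to the previous generation. Leakage is ignored.
    """
    return density_scale * cap_scale * vcc_scale ** 2 * freq_scale


# Historical factors from the slide: C x0.7, Vcc x0.8, f x1.5
past_per_transistor = dynamic_power_scale(0.7, 0.8, 1.5)
past_same_area = dynamic_power_scale(0.7, 0.8, 1.5, density_scale=2.0)

# Projected factors from the slide: C x0.7, Vcc x0.9, f x1.3
future_per_transistor = dynamic_power_scale(0.7, 0.9, 1.3)
future_same_area = dynamic_power_scale(0.7, 0.9, 1.3, density_scale=2.0)

print(f"past:   {past_per_transistor:.3f} per transistor, {past_same_area:.3f} same area")
print(f"future: {future_per_transistor:.3f} per transistor, {future_same_area:.3f} same area")
```

Under these assumptions, dynamic power for a same-size die grows by roughly a third per generation with the historical factors and by nearly half with the projected ones, before leakage is counted.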
Processor Power
Design Complexity and Productivity factors
• Huge transistor budgets stress ability to design and verify complex chips
• Multi-core fits well with increasing transistor budgets
• Multi-core design addresses density/designer gap
Iron Law of Performance
• Execution Time is the product of
  – Path Length
  – Cycles Per Instruction (CPI)
  – Cycle Time
• CPI is the sum of
  – infinite-cache core CPI
  – miss rate * effective memory latency
• Bad (good) news is that performance does not scale up (down) linearly with frequency
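The Iron Law above can be sketched in a few lines to show why performance does not track frequency linearly. The workload numbers here (path length, core CPI, misses per instruction, 100 ns memory latency) are hypothetical, and the function names are illustrative. The one modeling assumption is that memory latency is fixed in nanoseconds, so it costs more cycles as the clock speeds up.

```python
def cycles_per_instruction(core_cpi, misses_per_instr, mem_latency_ns, freq_ghz):
    """CPI = infinite-cache core CPI + miss rate * memory latency in cycles."""
    # Latency is fixed in ns, so its cycle cost grows with frequency.
    return core_cpi + misses_per_instr * mem_latency_ns * freq_ghz


def execution_time_s(path_length, core_cpi, misses_per_instr, mem_latency_ns, freq_ghz):
    """Execution time = path length * CPI * cycle time."""
    cycle_time_s = 1e-9 / freq_ghz
    cpi = cycles_per_instruction(core_cpi, misses_per_instr, mem_latency_ns, freq_ghz)
    return path_length * cpi * cycle_time_s


# Hypothetical workload, for illustration only.
WORKLOAD = dict(path_length=1e9, core_cpi=1.0, misses_per_instr=0.005, mem_latency_ns=100.0)

base = execution_time_s(freq_ghz=2.0, **WORKLOAD)
faster = execution_time_s(freq_ghz=2.2, **WORKLOAD)
speedup = base / faster
print(f"10% higher frequency gives {speedup - 1:.1%} more performance")
```

With these made-up numbers, a 10% frequency increase buys a little under 5% in performance, because the memory-stall part of CPI grows with the clock. The same effect softens the loss when frequency is lowered.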
The Magic of Voltage Scaling
• Power = Capacitance * Voltage^2 * Frequency
• Frequency ∝ Voltage in region of interest
• Power increases as the cube of Frequency
• Good news is that voltage scaling works
• 10% reduction in voltage yields
  – 10% reduction in frequency
  – 30% reduction in power
  – less than 10% reduction in performance
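The cube relationship on this slide can be checked numerically. The sketch below assumes constant capacitance and frequency exactly proportional to voltage, which the slide claims only "in region of interest"; the function name is illustrative.

```python
def scale_with_voltage(voltage_scale):
    """Return (frequency_scale, power_scale) for a relative supply voltage."""
    frequency_scale = voltage_scale                      # assumption: f proportional to V
    power_scale = voltage_scale ** 2 * frequency_scale   # P = C * V^2 * f, C held constant
    return frequency_scale, power_scale


freq, power = scale_with_voltage(0.90)
print(f"frequency: {1 - freq:.1%} lower, power: {1 - power:.1%} lower")
```

A 10% voltage reduction gives 0.9^3 = 0.729 of the original power, a reduction of about 27%, which the slide rounds to 30%.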
Simple Dual Core Example
• Assume Single Core processor at 100W
  – 80W for core, 20W for cache and I/O
  – 50% of die area is core
• Dual core within same power envelope
  – 20W for I/O and cache
  – 40W per core
  – Die size increases by 50%
  – Reduce voltage by 21% to reduce core power to 40W
  – Frequency reduces by ~20%
  – Single thread perf reduces by ~15%
  – Throughput increases by 70-80%
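The numbers in this example follow from the cube law, as the sketch below reproduces. It assumes core power scales with the cube of voltage and that two cores deliver exactly twice the single-thread performance of one; the ~15% single-thread loss is taken from the slide rather than derived. Constant names are illustrative.

```python
SINGLE_CORE_W = 80.0         # core power of the single-core design (from the slide)
UNCORE_W = 20.0              # cache and I/O power (from the slide)
BUDGET_W = 100.0             # total power envelope (from the slide)
SINGLE_THREAD_SCALE = 0.85   # slide's ~15% single-thread loss, taken as given

# Power left for each of the two cores inside the same envelope.
per_core_w = (BUDGET_W - UNCORE_W) / 2

# Assumption: core power ~ V^3, so the voltage factor is a cube root.
voltage_scale = (per_core_w / SINGLE_CORE_W) ** (1 / 3)
frequency_scale = voltage_scale              # assumption: f proportional to V

# Assumption: the workload scales ideally across both cores.
throughput_scale = 2 * SINGLE_THREAD_SCALE

print(f"per-core power: {per_core_w:.0f} W")
print(f"voltage and frequency reduction: {1 - voltage_scale:.1%}")
print(f"throughput gain: {throughput_scale - 1:.0%}")
```

The cube root of 0.5 is about 0.794, so voltage and frequency drop by roughly 20.6%, in line with the slide's 21% and ~20%. Ideal two-core scaling at 0.85 single-thread performance gives the low end of the slide's 70-80% throughput range.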
Possible Improvements
• Develop new power efficient core
  – E.g. extensive clock gating
  – Big power savings with little or no performance loss
• Design a smaller core with lower performance
  – Area and power savings much greater than performance loss
  – Use larger number of cores
• Adjust frequency and power of each core with load factor
  – Inactive cores can be put in sleep mode
  – Maintain overall die power constant
A New Era…
THE OLD: Performance Equals Frequency; Unconstrained Power; Voltage Scaling
THE NEW: Performance Equals IPC; Multi-Core; Power Efficiency; Microarchitecture Advancements
Intel Core Micro-architecture: Five Key Innovations
• Intel® Wide Dynamic Execution
• Intel® Intelligent Power Capability
• Intel® Advanced Digital Media Boost
• Intel® Smart Memory Access
• Intel® Advanced Smart Cache
Multi-Core Trajectory
[Figure: trajectory from Dual-Core to Quad-Core, with time markers 2H 2006 and 1H 2007.]
Architecture Transitions
Microprocessor Design Model (2 years per process generation)

Process   Shrink/Derivative            New Microarchitecture
65nm      PRESLER · YONAH · DEMPSEY    MEROM · CONROE · WOODCREST
45nm      PENRYN                       NEHALEM
32nm      NEHALEM-C                    GESHER

PRINCIPLES
1. One micro-architecture for all high volume market segments
2. Optimized for performance/watt
3. Parallel design teams
4. No waiting on new process technology
5. Chipset cadence offset for fast ramp
OBJECTIVE: Sustained Technology Leadership
Possible Evolution
• Transistor density doubles with each process generation
• New generation enables complex new core
• Possible alternative design point
  – Double the cache capacity in same area
  – Double the number of processor cores
  – Frequency improves with process technology
[Figure: 90 nm: one Core plus Cache; 65 nm: two Cores plus 2 x Cache; 45 nm: four Cores plus 4 x Cache.]
Ramping Multi-core Everywhere

Segment                                   2005       2006*   2007*
Desktop Client (Mainstream/Performance)   Shipping   >70%    >90%
Mobile Client (Mainstream/Performance)    Shipping   >70%    >90%
Server & Workstation                      Shipping   >85%    ~100%

* Data is projected run rate exiting the year. Source: Intel
Expect to ship >60 million multi-core processors by end of 2006
All products and dates are preliminary and subject to change without notice.
CMP Challenges
• How much Thread Level Parallelism is there in most workloads?
• Ability to generate code with lots of threads & performance scaling
• Thread synchronization
• Operating systems for parallel machines
• Single thread performance tradeoff
• Power limitations
• On-chip interconnect/cache infrastructure
• Memory and I/O bandwidth required
Intel’s Software Tools and Support
• Thread Checker, Thread Profiler
• Math Kernel Libraries, Performance Primitives
• Solutions, Blueprints, Sizing/Scaling Guides
• Driver Optimization Labs, Drivers
• Solution Services, Developer Services
• Compilers
• VTune™ Analyzers
• Software College, Early Access Programs
How Many Cores?
• Where does the doubling stop?
  – Driven by software issues
• Today Microsoft Windows supports only 64 threads!
• How many applications scale to 64 threads?
• How well does performance scale with thread count?
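One standard way to reason about the last question is Amdahl's law, which the lecture does not name; it is brought in here as an illustration. The sketch assumes a fixed fraction of the work parallelizes perfectly and the rest stays serial, with no synchronization or memory-bandwidth overhead. The parallel fractions are hypothetical.

```python
def amdahl_speedup(parallel_fraction, threads):
    """Speedup over one thread when `parallel_fraction` of the work scales ideally."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


# Hypothetical parallel fractions, evaluated at 64 threads.
for fraction in (0.50, 0.90, 0.95, 0.99):
    print(f"{fraction:.0%} parallel: {amdahl_speedup(fraction, 64):.1f}x at 64 threads")
```

Even a workload that is 95% parallel reaches only about 15x on 64 threads under this model, which is why the software questions on this slide bound how far core counts can usefully grow.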
Looking Beyond CMP
• How far do we push the number of general purpose cores?
• Is there a role for application specific engines?
• Programming model for heterogeneous cores
Improving Power Efficiency
[Figure: Performance versus Power, with three curves. Conventional design: 1% performance for 3% in power. CMP* parallelism: ~1% performance for 1% in power. Specialized: >3% performance for 1% in power. Points marked: One processor, 2 CMP, 3 CMP, 4 CMP. Other labels on the figure: MIPS, penalty.]
*CMP = Symmetric General Purpose (GP) cores
Application Specific Engines
• Can achieve better power efficiency than general purpose cores
• Simpler design due to targeted application and lack of support for full operating system
• Challenge
  – Needs to support high volume application
  – Reconfigurable?
• Graphics and Multimedia engines are good candidates
Summary
• One billion transistors are here already!
• Chip Level Multiprocessing and large caches can exploit Moore’s Law
• Amount of parallelism in future microprocessor systems will increase
• Heterogeneous cores may emerge eventually
• Need applications and tools that can exploit parallelism
• Design challenges and software issues remain
Collaborate, Innovate, Lead!
Closing Thought
“Don’t be encumbered by past history, go off and do something wonderful.”
- Robert Noyce, Intel Co-founder