Massively Parallel Electrically Aware Design

Author: Gertrude Robertson

By David White, Sr. Group Director of R&D, and Xiao Lin, Senior Principal Software Engineer, Virtuoso, Custom IC, and PCB Group, Cadence

In-design verification is opening new opportunities to shorten design cycles and maximize circuit performance. Whereas physical verification has traditionally required a tradeoff between accuracy and performance for larger designs, recent advances in large-scale distributed computing may offer an alternative. Cloud infrastructure needs are pushing the industry toward larger multi-core server architectures and massively parallel computing frameworks. To address the larger designs necessary for today’s market, massively parallel computing frameworks and in-design-based parallel extraction, enabled through tools such as Cadence® Virtuoso® Layout Suite for Electrically Aware Design (EAD), will allow random walk-based solvers to provide accurate parasitic extraction to support more sophisticated use models and enable better optimization and faster processing.

Contents
Massive Multi-Core Server Infrastructures Enable Significant Advances in EDA
Traditional Verification Forces a Choice Between Accuracy and Performance
Massively Distributed Solutions Enable In-Design Performance with Golden Accuracy
Cadence Virtuoso Layout Suite Provides Deep and Accurate Analysis
The Future of Electrically Aware Design
Conclusion

Massive Multi-Core Server Infrastructures Enable Significant Advances in EDA

Cloud-inspired computing infrastructure has accelerated the scaling of multi-core systems and reduced overall computing costs. Furthermore, cloud-based solutions now offer graphics processing unit (GPU) clusters for solving complex science and engineering problems on an increasingly large scale. The result is a significant increase in the compute performance available for computationally heavy tasks that can take advantage of parallel software architectures.

At the same time, there is a growing need in custom IC design to gain greater visibility into electrical behavior and to incrementally optimize IC layout as it is constructed. This new in-design verification paradigm is opening new opportunities to shorten design schedules and maximize performance. New advances in EDA solutions allow for more in-design extraction, analysis, and optimization across interconnect and devices, but the computational requirements have grown substantially as these solutions are asked to do more. As the industry increasingly adopts multi-core solutions with the associated reduction in computing costs, designers can leverage massively parallel processing architectures to provide high-accuracy, solver-based parasitic extraction as well as parasitic-aware simulation at earlier design stages.

This article explores how in-design parasitic extraction and high-accuracy field solvers are being implemented in a massively parallel framework to extract and analyze larger and larger designs. These approaches are opening the door to more sophisticated use models that leverage local and distributed parallelism and enable even greater in-design verification.

Traditional Verification Forces a Choice Between Accuracy and Performance

In the past, circuit design and layout engineers had to wait until the layout was completed and the design rule check (DRC) was clean before they could run it through their signoff flow. This flow required the finished layout to cycle through layout vs. schematic (LVS), DRC, parasitic extraction, simulation, and electromigration (EM) verification to review the effects of even a modest change to a handful of nets. Waiting until a layout is fully completed to learn of problems often resulted in a reduced set of potential solutions and added cycles to fix the issues. This flow also hampered the ability to do what-if analysis for incremental design decisions or optimization based on electrical design intent.

In addition, a typical verification flow may consist of tools from different vendors with different data formats and different interpretations of silicon effects. Designers must wait longer for data to be transferred from one tool to another and hope the tools interpret the data consistently. Also, when designing at advanced silicon nodes, it can be difficult to achieve alignment across separate tools from separate vendors to correctly extract connectivity, extract parasitics, simulate, and apply the correct EM rules.

The most efficient way to address the impact of layout decisions on verification and simulation is to correct problems when they are created. Ideally the layout tool would be “electrically aware” and the in-design solutions fully integrated to ensure consistency and improve productivity. It would also be more efficient if all of the tools in the verification flow were fully integrated to ensure interoperability. This ensures that the tools are interpreting advanced node rules consistently and that the user can identify where in the flow an issue occurred.
To address this, electrically aware design solutions were architected and built directly into Cadence Virtuoso Layout Suite for Electrically Aware Design (EAD). For lower-level signal nets, most of the analysis (e.g., EM checks) was quasi-real time, in that connectivity and parasitic extraction were fast enough to keep up with layout engineers. However, as users began to perform more top-level and hierarchical full-chip analysis, they needed additional functionality:

• More compute power and higher parasitic accuracy on critical nets, with additional processing power for field solvers including random walk approaches
• What-if analysis to tune or optimize the layout to meet electrical design intent and constraints
• Electrically driven (or at least electrically assisted) placement and routing to reduce trial and error

These combined requirements demanded more and more processing power beyond the original interactive solution, creating a need for computing frameworks that address the real-time, local processing needs and provide more distributed processing for deeper analysis and optimization.

Massively Distributed Solutions Enable In-Design Performance with Golden Accuracy

Golden levels of extraction accuracy in massively distributed solutions can be achieved with random walk field solvers at much faster speeds than conventional finite-element and boundary-element-based field solvers. Random walk field solvers also allow the user to easily set the level of required accuracy versus performance with one simple convergence selection that can be tailored for an individual net or a set of nets.

For applications such as automated EM-aware routing, compute performance is paramount and machine learning methods (e.g., neural networks, deep learning) offer a potential solution. These methods can learn complex nonlinear relationships, such as the input and output behavior of field solvers, but are much faster, in some cases achieving a hundred nets per second on a single CPU. However, there is a slight tradeoff in accuracy to achieve that level of performance, so machine learning may be suitable for some nets and not for others. While machine learning will no doubt have a large impact on design automation in the future, users of these tools would ideally prefer as much accuracy and performance as possible.

Not all nets have equal impact on circuit performance, which raises questions about why past verification methodologies extracted the parasitics of all nets to the same level of accuracy. It is important that in-design methodologies allow extraction and simulation accuracy to be tailored based on the sensitivity of design performance to specific “critical” nets.
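The single convergence setting described above can be illustrated with a toy random walk solver. The sketch below is not the Virtuoso solver. It applies the walk-on-spheres method to a two-plate geometry whose exact answer is known (the potential at height y between a 0 V plate at y = 0 and a 1 V plate at y = 1 is y volts), and it assumes that "convergence" means the relative standard error of the Monte Carlo estimate. The function and parameter names are hypothetical.

```python
import math
import random


def estimate_potential(y0, tol, rng, eps=1e-4, min_walks=200, max_walks=200_000):
    """Walk-on-spheres estimate of the potential at height y0 between two plates.

    The plate at y = 0 is held at 0 V and the plate at y = 1 at 1 V.
    Walks are launched until the relative standard error drops to `tol`
    (an assumed reading of a "convergence" setting such as 2% or 5%).
    Returns (estimated potential, number of walks used).
    """
    total = 0.0
    total_sq = 0.0
    walks = 0
    while walks < max_walks:
        y = y0
        # Jump to a uniformly random point on the largest circle that
        # touches the nearest plate; only the y coordinate matters here.
        while eps < y < 1.0 - eps:
            radius = min(y, 1.0 - y)
            theta = rng.uniform(0.0, 2.0 * math.pi)
            y += radius * math.sin(theta)
        value = 1.0 if y >= 1.0 - eps else 0.0  # potential of the plate reached
        walks += 1
        total += value
        total_sq += value * value
        if walks >= min_walks:
            mean = total / walks
            variance = max(total_sq / walks - mean * mean, 0.0)
            std_error = math.sqrt(variance / walks)
            if mean > 0.0 and std_error / mean <= tol:
                break
    return total / walks, walks


rng = random.Random(7)
v_tight, n_tight = estimate_potential(0.3, tol=0.02, rng=rng)
v_loose, n_loose = estimate_potential(0.3, tol=0.05, rng=rng)
print(f"2% target: {v_tight:.3f} V after {n_tight} walks")
print(f"5% target: {v_loose:.3f} V after {n_loose} walks")
```

Tightening the tolerance from 5% to 2% costs several times as many walks, which is the accuracy-versus-performance trade the text describes. Because each walk is independent, the walks can be spread across cores and servers.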

Cadence Virtuoso Layout Suite Provides Deep and Accurate Analysis

Virtuoso Layout Suite EAD offers a massively distributable solution created to address both local, threaded interactive analysis and distributed processing for deeper and more accurate analysis. This method provides fully threaded parasitic extraction using a combination of random walk and machine-learning-based extraction on a net-by-net basis. This approach allows certain critical nets or net types to be selected for extraction with the random walk solver, while others that require less accuracy (e.g., power and ground nets) are extracted more rapidly with machine-learning-based methods. The result is a tailored accuracy that lets the user select where and how to trade off accuracy versus performance on a net-by-net basis within the same layout, using net names or net types that match certain search criteria or pre-set constraints.

Figure 1 shows how Virtuoso Layout EAD is extended using a massively parallel framework in the Virtuoso Layout Suite that combines multi-threading with large-scale distributed resource management across server clusters. This framework enables greater accuracy in in-design parasitic extraction, capacitance solving, and EM checking for larger designs, including top-level analysis through hierarchy. Users can run Virtuoso Layout EAD solutions across a distributed server farm without impacting their current Virtuoso session.
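The net-by-net selection described above can be pictured as an ordered list of name-matching rules. The sketch below is a hypothetical illustration, not the Virtuoso interface: the rule format, the glob patterns, the method labels, and the choice of machine learning as the default are all assumptions made for the example.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class ExtractionRule:
    pattern: str                         # glob matched against the net name
    method: str                          # "random_walk" or "machine_learning"
    convergence: Optional[float] = None  # only meaningful for random walk


# Hypothetical search criteria; the first matching rule wins.
RULES = [
    ExtractionRule("clk*", "random_walk", 0.02),
    ExtractionRule("*_crit", "random_walk", 0.02),
    ExtractionRule("VDD*", "machine_learning"),
    ExtractionRule("VSS*", "machine_learning"),
]
DEFAULT_RULE = ExtractionRule("*", "machine_learning")


def select_rule(net_name, rules=RULES, default=DEFAULT_RULE):
    """Return the first rule whose pattern matches the net name."""
    for rule in rules:
        if fnmatchcase(net_name, rule.pattern):
            return rule
    return default


def plan_extraction(net_names):
    """Map every net name to the rule that decides how it is extracted."""
    return {name: select_rule(name) for name in net_names}


for name, rule in plan_extraction(["clk_core", "bias_crit", "VDD_io", "data7"]).items():
    print(name, rule.method, rule.convergence)
```

Keeping the rules in an ordered list makes the precedence explicit: a critical net that also happens to match a broader pattern is claimed by whichever rule appears first.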

[Figure 1: diagram. Interactive use model (seconds to minutes): a Virtuoso Layout EAD session running on a multi-core machine. Distributed use model (minutes to hours): a massively distributed framework on a multi-core server cluster, in which each server supports a Virtuoso Layout EAD session running on a multi-core machine and returns results and analysis.]

Figure 1: EAD now provides a massively parallel engine for RC extraction and solver
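The two levels of parallelism in Figure 1, servers in a cluster and cores within each server, can be mimicked on one machine with nested thread pools. This is only a structural sketch under that assumption: a real deployment would dispatch batches to remote servers through a resource manager, and `extract` stands in for whatever per-net extraction routine is used. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(nets, n_servers):
    """Round-robin split of the net list into one batch per server."""
    return [nets[i::n_servers] for i in range(n_servers)]


def run_batch(batch, extract, cores_per_server):
    """Process one server's batch with one worker thread per core."""
    with ThreadPoolExecutor(max_workers=cores_per_server) as cores:
        return dict(zip(batch, cores.map(extract, batch)))


def extract_distributed(nets, extract, n_servers, cores_per_server):
    """Fan batches out to the stand-in 'servers' and merge the results."""
    results = {}
    batches = [b for b in partition(nets, n_servers) if b]
    if not batches:
        return results
    with ThreadPoolExecutor(max_workers=len(batches)) as farm:
        futures = [farm.submit(run_batch, b, extract, cores_per_server) for b in batches]
        for future in futures:
            results.update(future.result())
    return results


# Placeholder "extraction" that just returns the length of the net name.
nets = [f"net{i}" for i in range(10)]
print(extract_distributed(nets, len, n_servers=3, cores_per_server=2))
```

Because each net is handled independently, results can be merged in any order, and the interactive session that submitted the work is free to continue while batches run.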

This distributed framework allows high-accuracy solvers such as random walk methods to be extended for use on full designs. The user may select certain critical nets to extract with the random walk solver and other nets that require less accuracy (e.g., power nets) for fast extraction. The result is a tailored accuracy where the user can select where and how to trade off accuracy versus performance on a net-by-net basis within the same partial layout.

The achievable performance is shown in Figures 2 and 3. This framework uses the Virtuoso HPC random walk solver to fully extract capacitance in 0.7 hours (42 minutes) for a test case with over 30,000 nets across all hierarchical levels. For these benchmarks, the processors used are Xeon CPUs (E5-2697) operating at 2.60 GHz. Based on typical customer preferences, the Virtuoso HPC solver was set to 2% convergence for the signal nets and 5% for the power/ground nets. Although this framework could be extended beyond the 84 cores and six servers used here, the goal is to focus on realizable levels of computing resources that a user may have available. Also, using only the random walk solver on the full design provides a lower bound on performance; using machine learning extraction for some portion of the 30,000 nets would provide even faster performance.

Figure 2 shows a plot of the distributed processing performance for the full 30,000-net design as a function of the number of cores from one to 84, using up to six servers with 14 cores each. The performance is shown as a bar chart in terms of both wall-clock hours (left axis) and the increase in speed over that for a single core (right axis).

[Figure 2: bar chart of wall-clock hours (left axis) and speed-up over a single CPU (right axis) versus number of cores. Bar labels:]

Number of cores    Hours    Speed-up
1                  38.8     1
6                  7.5      5
18                 2.5      15
42                 1.2      32
84                 0.7      55

Figure 2: Distributed processing performance over threads
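The speed-up values in Figure 2 follow directly from the wall-clock hours, and dividing by the core count gives the parallel efficiency. The short calculation below assumes the bar labels read 38.8, 7.5, 2.5, 1.2, and 0.7 hours for 1, 6, 18, 42, and 84 cores; the function name is hypothetical.

```python
# Wall-clock hours per core count, as read from the Figure 2 bar labels.
FIGURE_2_HOURS = {1: 38.8, 6: 7.5, 18: 2.5, 42: 1.2, 84: 0.7}


def scaling_table(hours_by_cores):
    """Return (cores, hours, speed-up, efficiency) rows relative to the smallest run."""
    base_cores = min(hours_by_cores)
    base_hours = hours_by_cores[base_cores]
    rows = []
    for cores in sorted(hours_by_cores):
        hours = hours_by_cores[cores]
        speedup = base_hours / hours
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, hours, speedup, efficiency))
    return rows


for cores, hours, speedup, efficiency in scaling_table(FIGURE_2_HOURS):
    print(f"{cores:3d} cores  {hours:5.1f} h  {speedup:5.1f}x  {efficiency:4.0%}")
```

At 84 cores the ratio is 38.8 / 0.7, a speed-up of about 55 and a parallel efficiency of roughly 66%, so the scaling is strong but sublinear.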

To provide a number of analysis test points that could map to more common server infrastructures, the x-axis shows the number of cores that represent a server and core combination. The axis in Figure 2 corresponds to the following analysis points: a single CPU, six servers using one core (six cores total), six servers using three cores (18 cores total), six servers using seven cores (42 cores total), and six servers using 14 cores (84 cores total). While other server and core combinations may yield slightly different results, the plot shown here provides a general guideline.

Figure 3 shows a plot of the distributed processing performance for the full design of over 30,000 nets as a function of the number of servers from one to six, using 14 cores each. The performance is shown as a bar chart in terms of both wall-clock hours (left axis) and the increase in speed over that for a single server using 14 cores (right axis). As the two figures show, Virtuoso Layout Suite enables designers to achieve “golden” levels of parasitic accuracy and electrical analysis for a large design during in-design use.

[Figure 3: bar chart of wall-clock hours (left axis) and speed-up over a single server (right axis) versus number of servers. Bar labels:]

Number of servers    Hours    Speed-up
1                    3.2      1.0
2                    1.8      1.8
3                    1.3      2.4
4                    1.0      3.2
6                    0.7      4.6

Figure 3: Distributed processing performance over servers

Although farms with multi-core servers are becoming more prevalent and economical, not every user will have 84 cores (six servers with 14 cores each) available at a given time. Using the convenient “over lunch” and “overnight” periods as time metrics for batch use models, one server with 6 cores is sufficient to run the test case overnight, allowing the user to come in the next day with a full set of highly accurate RC parasitics and electrically aware analysis to review. For the “over-lunch” period of one hour, four servers with 14 cores each are required for golden accuracy and analysis of a full layout.
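The sizing argument above amounts to picking the smallest measured configuration that fits a time budget. The sketch below does that lookup over the bar labels in Figures 2 and 3. The one-hour lunch budget comes from the text; the 12-hour overnight budget is an assumption made for the example, and the function name is hypothetical.

```python
# Wall-clock hours as read from the figures.
FIGURE_2_HOURS = {1: 38.8, 6: 7.5, 18: 2.5, 42: 1.2, 84: 0.7}  # cores -> hours
FIGURE_3_HOURS = {1: 3.2, 2: 1.8, 3: 1.3, 4: 1.0, 6: 0.7}      # 14-core servers -> hours


def smallest_config(hours_by_size, budget_hours):
    """Smallest measured configuration that finishes within the budget, or None."""
    for size in sorted(hours_by_size):
        if hours_by_size[size] <= budget_hours:
            return size
    return None


OVER_LUNCH_HOURS = 1.0   # stated in the text
OVERNIGHT_HOURS = 12.0   # assumed length of an overnight run

print("over lunch:", smallest_config(FIGURE_3_HOURS, OVER_LUNCH_HOURS), "servers")
print("overnight: ", smallest_config(FIGURE_2_HOURS, OVERNIGHT_HOURS), "cores")
```

Only measured points are considered, so the answer is conservative: an unmeasured configuration between two data points might also meet the budget.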

Also consider that in-design solutions are incremental: they extract only the parts of the design impacted by a recent change, which speeds the design process. Virtuoso Layout EAD determines the three-dimensional region in which to track the impact of one or more changes. While the performance results shown here are for extracting a full design, the majority of in-design edits and modifications will impact a smaller subset of nets within the larger layout, and thus incremental extraction will provide even faster turnaround times.
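One simple way to picture incremental extraction is as a box-intersection query: grow the edited shape by a coupling distance in all three dimensions and re-extract only the nets with geometry inside that region. The sketch below assumes axis-aligned boxes and a uniform halo; it is an illustration of the idea, not a description of how Virtuoso Layout EAD computes its region. All names and coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box3D:
    xmin: float
    ymin: float
    zmin: float
    xmax: float
    ymax: float
    zmax: float

    def expanded(self, halo):
        """Grow the box by `halo` in every direction."""
        return Box3D(self.xmin - halo, self.ymin - halo, self.zmin - halo,
                     self.xmax + halo, self.ymax + halo, self.zmax + halo)

    def intersects(self, other):
        """True when the two boxes overlap or touch on all three axes."""
        return (self.xmin <= other.xmax and other.xmin <= self.xmax and
                self.ymin <= other.ymax and other.ymin <= self.ymax and
                self.zmin <= other.zmax and other.zmin <= self.zmax)


def impacted_nets(net_shapes, edit_box, halo):
    """Names of nets with at least one shape inside the halo around the edit."""
    region = edit_box.expanded(halo)
    return sorted(name for name, shapes in net_shapes.items()
                  if any(region.intersects(shape) for shape in shapes))


# Made-up geometry: three wires on one layer and one on a higher layer.
shapes = {
    "net_a": [Box3D(0, 0, 0, 10, 1, 1)],
    "net_b": [Box3D(0, 3, 0, 10, 4, 1)],
    "net_c": [Box3D(0, 50, 0, 10, 51, 1)],
    "net_d": [Box3D(0, 0, 5, 10, 1, 6)],
}
edit = Box3D(4, 1.5, 0, 6, 2.5, 1)
print(impacted_nets(shapes, edit, halo=1.0))
```

With a halo of 1.0 only the two neighbouring wires on the same layer are selected; the distant wire and the wire on the higher layer are left alone, so the re-extraction cost tracks the size of the edit rather than the size of the design.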

The Future of Electrically Aware Design

In the future, the in-design use model shown in Figure 4 will blend tasks that can be done in quasi-real time on local systems, such as manual layout edits, with tasks that are performed on distributed systems, such as place-and-route optimization. Given the complexity of placement and routing decisions, particularly at advanced nodes, the in-design methodology has started shifting from electrically aware to electrically driven design.

Local tasks, as shown in the orange block of Figure 4, include manually driven layout and other tasks where the user expects zero lag and highly responsive analysis. This is typically used at lower levels of design. The extraction and analysis are confined to the region impacted by each subsequent modification. This use model prioritizes performance and responsiveness of the tool to ensure design productivity is improved.

[Figure 4: diagram. Local session: layout creation in Virtuoso Layout EAD with fast extraction and electrically aware assistance (fast RC, P2P R, linear re-sim, static EM). Massively parallel distribution: electrically driven optimization, in which intent captured as design constraints feeds a cost function that drives electrically driven placement (device extraction) and electrically driven routing (device and interconnect extraction) to produce a set of design alternatives aligned with intent. Extraction option: EAD fast extract or RW solver. The distributed work runs on a server cluster in which each server supports a Virtuoso Layout EAD session running on a multi-core machine.]

Figure 4: In-design use model requires alignment of local and distributed processing solutions

For deeper levels of extraction, analysis, and optimization on larger sections of the design, a more powerful distributed processing framework may be required. This includes larger layouts or top-level extraction with deep hierarchy that may have been completed without incremental extraction or where analysis across multiple corners is desired. The random walk solvers and other computation-heavy tasks will be extended to run on GPUs as well as CPUs to take full advantage of all available hardware and efficiently scale solutions to new use models. With the increase in complexity of advanced silicon nodes, there is a growing need to support electrically driven placement and routing that can guarantee electrical design intent and designer-established constraints. Optimization engines can be used to drive the placement and routing based on a set of cost functions, but this approach requires heavy computation not normally available on a local machine. The system iterates through the layout net by net, ensuring the overall design meets the intended criteria across all corners. This approach can also be extended to include what-if analysis where the search space of potential placement or routing options can be explored in-design to find an electrically optimal set.
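The cost-function idea above can be sketched as a search over candidate routes, each evaluated at every corner. In this hypothetical example a candidate is rejected if any corner violates a constraint, and the remaining candidates are ranked by their worst-case corner cost. The metric names, limits, weights, and numbers are all invented for illustration and do not come from the paper.

```python
def corner_cost(metrics, limits, weights):
    """Weighted sum of each metric expressed as a fraction of its limit."""
    return sum(weights[key] * metrics[key] / limits[key] for key in limits)


def is_feasible(metrics, limits):
    """True when every constrained metric is within its limit."""
    return all(metrics[key] <= limits[key] for key in limits)


def best_alternative(candidates, limits, weights):
    """Feasible candidate with the lowest worst-case cost across corners."""
    best_name, best_cost = None, float("inf")
    for name, per_corner in candidates.items():
        if not all(is_feasible(m, limits) for m in per_corner.values()):
            continue  # violates design intent in at least one corner
        cost = max(corner_cost(m, limits, weights) for m in per_corner.values())
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name, best_cost


# Made-up constraints and what-if candidates (resistance in ohms, capacitance in fF).
limits = {"r_ohm": 50.0, "c_ff": 20.0}
weights = {"r_ohm": 0.5, "c_ff": 0.5}
candidates = {
    "route_a": {"slow": {"r_ohm": 40.0, "c_ff": 18.0}, "fast": {"r_ohm": 30.0, "c_ff": 15.0}},
    "route_b": {"slow": {"r_ohm": 55.0, "c_ff": 10.0}, "fast": {"r_ohm": 45.0, "c_ff": 9.0}},
    "route_c": {"slow": {"r_ohm": 35.0, "c_ff": 16.0}, "fast": {"r_ohm": 28.0, "c_ff": 14.0}},
}
print(best_alternative(candidates, limits, weights))
```

Each candidate needs a fresh extraction at every corner before it can be scored, which is why this kind of search is a natural fit for the distributed side of the use model rather than the local session.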

Conclusion

With advances in the performance of multi-core server hardware and distributed computing frameworks, the EDA industry is approaching a time when high-accuracy solvers and subsequent simulation and reliability analysis can be highly parallelized to handle even the largest designs in a relatively short time. Previously, golden-level parasitic accuracy was only available from field solvers that required significant compute times for just a handful of nets. The introduction of massively parallel computing frameworks and in-design-based incremental extraction, enabled through tools such as Virtuoso Layout Suite, will allow random walk-based solvers to provide accurate parasitic extraction for larger and larger layouts in minutes. These frameworks will allow for greater optimization and what-if analysis of electrically driven placement and routing in custom IC design.

Cadence Design Systems enables global electronic design innovation and plays an essential role in the creation of today’s electronics. Customers use Cadence software, hardware, IP, and expertise to design and verify today’s mobile, cloud and connectivity applications. www.cadence.com © 2016 Cadence Design Systems, Inc. All rights reserved worldwide. Cadence, the Cadence logo, and Virtuoso are registered trademarks of Cadence Design Systems, Inc. in the United States and other countries. All rights reserved. All other trademarks are the property of their respective owners. 6955 09/16 SC/JT/PDF