Elements of CPU performance


Author: Melinda Owen


Elements of CPU performance    

Cycle time. CPU pipeline. Superscalar design. Memory system.

$T_{exec} = \left(\frac{\text{instructions}}{\text{program}}\right)\left(\frac{\text{cycles}}{\text{instruction}}\right)\left(\frac{\text{seconds}}{\text{cycle}}\right)$
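As a rough illustration of the execution-time equation, the sketch below multiplies the three factors. The function name `exec_time` and the sample values (instruction count, cycles per instruction, clock rate) are assumptions chosen for the arithmetic, not measurements of any ARM core.

```python
def exec_time(instructions, cycles_per_instruction, clock_hz):
    """Execution time in seconds: instructions x cycles/instruction x seconds/cycle."""
    seconds_per_cycle = 1.0 / clock_hz
    return instructions * cycles_per_instruction * seconds_per_cycle

# Hypothetical program: 1 million instructions, 1.5 cycles each, 100 MHz clock.
t = exec_time(1_000_000, 1.5, 100e6)
print(f"{t * 1e3:.1f} ms")  # prints "15.0 ms"
```

Each factor maps to one of the design elements listed above: the compiler and instruction set affect the instruction count, the pipeline and memory system affect cycles per instruction, and the cycle time sets seconds per cycle.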

ARM7TDM CPU Core

ARM Cortex A-9 Microarchitecture

ARM Cortex A-9 MPcore

Pipelining 

Several instructions are executed simultaneously at different stages of completion.

[Figure: pipeline timing diagram. Instructions I1 through I5 each pass through the Fetch, Decode, Execute, and Write stages, overlapping in time.]

Various conditions can cause pipeline bubbles that reduce utilization: branches; memory system delays; etc.

ARM pipeline execution

ARM7 has a 3-stage pipeline: fetch instruction from memory; decode opcode and operands; execute.

[Figure: ARM7 pipeline timing. The instructions add r0,r1,#5; sub r2,r3,r6; and cmp r2,#3 each pass through fetch, decode, and execute, overlapping along the time axis (cycles 1, 2, 3).]

Pipeline changes for ARM9TDMI

ARM10 and ARM11 pipelines

(superscalar design)

Performance measures  



Latency: time it takes for an instruction to get through the pipeline.

Throughput: number of instructions executed per time period.

Pipelining increases throughput without reducing latency.

Assume a program with N instructions, each K stages long.

Without pipeline: $T_{exec} = N \times K$

With K-stage pipeline: $T_{exec} = K + (N - 1)$ (K cycles for the 1st instruction, 1 cycle to complete each additional instruction)

$\text{Speedup} = \frac{N \times K}{K + (N - 1)}$

For large N, $\text{Speedup} \approx K$.

This assumes no pipeline stalls.
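The speedup formula can be checked numerically. This is a minimal sketch under the same no-stall assumption; the function name `pipeline_speedup` and the choice of K = 4 are illustrative, not taken from the slides.

```python
def pipeline_speedup(n, k):
    """Speedup of a K-stage pipeline over unpipelined execution, assuming no stalls."""
    unpipelined = n * k      # each instruction occupies the processor for K cycles
    pipelined = k + (n - 1)  # K cycles for the first, then one more per instruction
    return unpipelined / pipelined

# Speedup for a hypothetical 4-stage pipeline as the program grows.
for n in (1, 10, 1000, 1_000_000):
    print(n, round(pipeline_speedup(n, 4), 3))
```

For a single instruction the speedup is 1, since nothing overlaps; as N grows the value approaches K.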

Pipeline stalls  

If not every step can be completed in the same amount of time, the pipeline stalls. Bubbles introduced by a stall increase latency and reduce throughput.

ARM multi-cycle LDMIA instruction

[Figure: pipeline timing along the time axis.]

ldmia r0,{r2,r3}: fetch, decode, ex ld r2, ex ld r3

sub r2,r3,r6: fetch, decode, ex sub

cmp r2,#3: fetch, decode, ex cmp
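The effect of a multi-cycle execute stage can be sketched with a small timing model. The sketch assumes a simplified in-order 3-stage pipeline (fetch, decode, execute) in which fetch and decode take one cycle each, an instruction holds its stage until the next stage is free, and only the execute stage may take several cycles. The function name `finish_cycles` and this model are illustrative assumptions, not a description of the actual ARM7 hardware.

```python
def finish_cycles(exec_cycles):
    """Return the 1-based cycle in which each instruction leaves the execute stage."""
    finish = []
    # Cycles at which the previous instruction entered decode, entered execute,
    # and finished execute.
    prev_decode = prev_exec = prev_end = 0
    for cycles in exec_cycles:
        # The fetch stage frees up when the previous instruction moves to decode.
        fetch = prev_decode if finish else 1
        # Decode needs fetch to be done and the decode stage to be vacated.
        decode = max(fetch + 1, prev_exec)
        # Execute needs decode to be done and the previous instruction to be finished.
        execute = max(decode + 1, prev_end + 1)
        end = execute + cycles - 1
        finish.append(end)
        prev_decode, prev_exec, prev_end = decode, execute, end
    return finish

# ldmia with two execute cycles, followed by single-cycle sub and cmp.
print(finish_cycles([2, 1, 1]))  # prints [4, 5, 6]
# Same sequence if every instruction executed in one cycle.
print(finish_cycles([1, 1, 1]))  # prints [3, 4, 5]
```

In this model the extra execute cycle of the load-multiple pushes every later instruction back by one cycle, which is the bubble described above.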

Control stalls 

Branches often introduce stalls (branch penalty). 

 

Stall time may depend on whether branch is taken.

May have to squash instructions that have already started executing. The processor doesn't know what to fetch until the condition is evaluated.

ARM pipelined branch

[Figure: pipeline timing along the time axis.]

bne foo: fetch, decode, ex bne, ex bne, ex bne

sub r2,r3,r6: fetch, decode

foo add r0,r1,r2: fetch, decode, ex add
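A back-of-the-envelope way to see the cost of control stalls is to fold the branch penalty into an average cycles-per-instruction figure. The sketch below assumes a base CPI of 1, a fixed penalty paid only by taken branches, and made-up branch statistics; the function name `average_cpi` and all numbers are illustrative assumptions rather than values from the slides.

```python
def average_cpi(base_cpi, branch_fraction, taken_fraction, penalty_cycles):
    """Average CPI when only taken branches pay a fixed stall penalty."""
    stall_per_instruction = branch_fraction * taken_fraction * penalty_cycles
    return base_cpi + stall_per_instruction

# Hypothetical mix: 20% branches, 60% of them taken, 2 stall cycles each.
print(round(average_cpi(1.0, 0.20, 0.60, 2), 2))  # prints 1.24
```

Because the stall time may depend on whether the branch is taken, the taken fraction is kept as a separate parameter.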

Delayed branch 



To increase pipeline efficiency, a delayed branch mechanism requires that the n instructions after the branch are always executed, whether or not the branch is taken. SHARC supports delayed and non-delayed branches.

Specified by a bit in the branch instruction. 2-instruction branch delay slot.

Example: ARM execution time 

Determine execution time of FIR filter: for (i=0; i