CS429: Computer Organization and Architecture

Author: Bryce Phillips
CS429: Computer Organization and Architecture Instruction Set Architecture III Dr. Bill Young Department of Computer Sciences University of Texas at Austin

Last updated: January 13, 2017 at 08:54

CS429 Slideset 8: 1

Instruction Set Architecture III

Controlling Program Execution

We can now generate programs that execute linear sequences of instructions:
  - Access registers and storage
  - Perform computations

But what about loops, conditions, etc.? Need ISA support for:
  - comparing and testing data values
  - directing program control
      - jump to some instruction that isn't just the next one in sequence
      - do so based on some condition that has been tested


Processor State (x86-64, Partial)

Information about currently executing program:
  - Temporary data (%rax, ...)
  - Location of runtime stack (%rsp)
  - Location of current code control point (%rip)
  - Status of recent tests (CF, ZF, SF, OF)

Registers:           %rax %rbx %rcx %rdx %rsi %rdi %rsp %rbp
                     %r8  %r9  %r10 %r11 %r12 %r13 %r14 %r15
Instruction pointer: %rip
Condition codes:     CF ZF SF OF

PC-relative Addressing

In general, you shouldn't use %rip as a general purpose register. However, the compiler may generate PC-relative addressing.

    jmp 0x10(%rip)

The effective address for a PC-relative instruction is the offset parameter added to the address of the next instruction. This offset is signed to allow reference to code both before and after the instruction. Can you guess why the compiler might generate such code?
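The address computation described above can be sketched in C. This is only an illustration: the function name, the parameter names, and the instruction length used in the usage note are assumptions, not taken from the slides.

```c
#include <stdint.h>

/* Hypothetical helper: effective address of a PC-relative operand.
 * insn_addr: address of the instruction itself
 * insn_len:  its encoded length in bytes (assumed known)
 * offset:    signed displacement, e.g. the 0x10 in 0x10(%rip) */
uint64_t pc_relative_target(uint64_t insn_addr, uint64_t insn_len, int32_t offset)
{
    uint64_t next = insn_addr + insn_len;       /* address of the next instruction */
    return next + (uint64_t)(int64_t)offset;    /* sign-extend, then add */
}
```

For example, assuming a 6-byte instruction at 0x400000, an offset of 0x10 yields 0x400016, while an offset of -0x10 yields 0x3ffff6, which lies before the instruction.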


Condition Codes (Implicit Setting)

Single bit registers:
  - CF: carry flag (for unsigned)
  - ZF: zero flag
  - SF: sign flag (for signed)
  - OF: overflow flag (for signed)

Implicitly set by arithmetic operations.
E.g., addq Src, Dest      C analog: t = a + b;
  - CF set if carry out from most significant bit (unsigned overflow)
  - ZF set if t == 0
  - SF set if t < 0 (as signed)
  - OF set if two's complement overflow:
        (a>0 && b>0 && t<0) || (a<0 && b<0 && t>=0)
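As a rough model of the rules above, the following C sketch computes the four flags for a 64-bit addition. The type flags_t and the function flags_after_add are hypothetical names introduced for illustration; real hardware sets these bits as a side effect of addq.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical container for the four condition codes. */
typedef struct { bool cf, zf, sf, of; } flags_t;

/* Model of the flags after addq: t = a + b on 64-bit values. */
flags_t flags_after_add(int64_t a, int64_t b)
{
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint64_t ut = ua + ub;                  /* wraps modulo 2^64 */
    flags_t f;
    f.cf = ut < ua;                         /* carry out of the most significant bit */
    f.zf = ut == 0;                         /* t == 0 */
    f.sf = (ut >> 63) != 0;                 /* t < 0 as signed */
    /* Signed overflow: operands share a sign bit that the result lacks. */
    f.of = ((~(ua ^ ub) & (ua ^ ut)) >> 63) != 0;
    return f;
}
```

Adding 1 and -1 sets CF and ZF but not OF, while adding 1 to INT64_MAX sets SF and OF but not CF.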


Jumping

jX Instructions: Jump to different parts of the code depending on condition codes.

    jX     Condition        Description
    jmp    1                Unconditional
    je     ZF               Equal / Zero
    jne    ~ZF              Not equal / not zero
    js     SF               Negative
    jns    ~SF              Nonnegative
    jg     ~(SF^OF)&~ZF     Greater (signed)
    jge    ~(SF^OF)         Greater or equal (signed)
    jl     (SF^OF)          Less (signed)
    jle    (SF^OF)|ZF       Less or equal (signed)
    ja     ~CF&~ZF          Above (unsigned)
    jb     CF               Below (unsigned)
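To see why these flag combinations correspond to the stated comparisons, here is a small C model that computes the flags of a - b (what a compare instruction would set) and then evaluates several of the conditions from the table. All names (flags_t, flags_after_cmp, cond_jg, and so on) are hypothetical and chosen only for this sketch.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { bool cf, zf, sf, of; } flags_t;

/* Model of comparing a with b: flags of a - b, result discarded. */
flags_t flags_after_cmp(int64_t a, int64_t b)
{
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint64_t ut = ua - ub;                  /* wraps modulo 2^64 */
    flags_t f;
    f.cf = ua < ub;                         /* borrow: unsigned a below b */
    f.zf = ut == 0;
    f.sf = (ut >> 63) != 0;
    /* Signed overflow on subtraction: operands differ in sign and
       the result's sign differs from a's. */
    f.of = (((ua ^ ub) & (ua ^ ut)) >> 63) != 0;
    return f;
}

/* Conditions copied from the jX table. */
bool cond_je(flags_t f)  { return f.zf; }
bool cond_jg(flags_t f)  { return !(f.sf ^ f.of) && !f.zf; }
bool cond_jge(flags_t f) { return !(f.sf ^ f.of); }
bool cond_jl(flags_t f)  { return f.sf ^ f.of; }
bool cond_jle(flags_t f) { return (f.sf ^ f.of) || f.zf; }
bool cond_ja(flags_t f)  { return !f.cf && !f.zf; }
bool cond_jb(flags_t f)  { return f.cf; }
```

Comparing -1 with 1 makes cond_jl true (signed less) and also cond_ja true, because the same bit pattern is a very large unsigned value. Comparing INT64_MIN with 1 overflows, and SF^OF still reports "less" correctly.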

Conditional Branch Example (Old Style)

Generation: gcc -Og -fno-if-conversion control.c

    long absdiff(long x, long y)
    {
        long result;
        if (x > y)
            result = x-y;
        else
            result = y-x;
        return result;
    }

    absdiff:
        cmpq    %rsi, %rdi      # x:y
        jle     .L4
        movq    %rdi, %rax
        subq    %rsi, %rax
        retq
    .L4:                        # x <= y
        movq    %rsi, %rax
        subq    %rdi, %rax
        retq

    Register   Use(s)
    %rdi       Argument x
    %rsi       Argument y
    %rax       return value

The same function expressed with goto code:

    long absdiff_j(long x, long y)
    {
        long result;
        int ntest = x <= y;
        if (ntest) goto Else;
        result = x-y;
        goto Done;
    Else:
        result = y-x;
    Done:
        return result;
    }

Translating a general conditional expression:

C Code:

    val = x>y ? x-y : y-x;

Goto Version:

        ntest = !Test;
        if (ntest) goto Else;
        val = Then_Expr;
        goto Done;
    Else:
        val = Else_Expr;
    Done:
        ...

Create separate code regions for then and else expressions. Execute the appropriate one.
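As a further illustration of the goto template, the sketch below applies it to a different conditional expression, val = x > y ? x : y. The function names larger and larger_goto are hypothetical; only the shape of the translation follows the slide.

```c
/* Structured form: val = Test ? Then_Expr : Else_Expr */
long larger(long x, long y)
{
    long val = x > y ? x : y;
    return val;
}

/* The same logic with separate code regions for the two expressions. */
long larger_goto(long x, long y)
{
    long val;
    int ntest = !(x > y);       /* ntest = !Test */
    if (ntest) goto Else;
    val = x;                    /* Then_Expr */
    goto Done;
Else:
    val = y;                    /* Else_Expr */
Done:
    return val;
}
```

Both functions return the same value for every pair of inputs; exactly one of the two regions runs on any given call.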


Conditional Move Instructions

Refer to generically as “cmovXX”
  - Based on values of condition codes
  - Conditionally copy value from source to destination
  - Can be used to eliminate conditional jump

Conditional Move Instructions

    Inst.     Synonym    Description
    cmove     cmovz      Equal / zero
    cmovne    cmovnz     Not equal / not zero
    cmovs                Negative
    cmovns               Not negative
    cmovg     cmovnle    Greater (signed)
    cmovge    cmovnl     Greater or equal (signed)
    cmovl     cmovnge    Less (signed)
    cmovle    cmovng     Less or equal (signed)
    cmova     cmovnbe    Above (unsigned)
    cmovae    cmovnb     Above or equal (unsigned)
    cmovb     cmovnae    Below (unsigned)
    cmovbe    cmovna     Below or equal (unsigned)

Using Conditional Moves

Conditional Move Instructions
  - Instruction supports: if (Test) Dest ← Src
  - Supported in post-1995 x86 processors
  - GCC tries to use them, but only when safe

Why?
  - Branches are very disruptive to instruction flow through pipelines.
  - Conditional moves do not require control transfer.

C Code:

    val = Test ? Then_Expr : Else_Expr;

Goto Version:

    result = Then_Expr;
    eval = Else_Expr;
    nt = !Test;
    if (nt) result = eval;
    return result;

Conditional Move Example

    long absdiff(long x, long y)
    {
        long result;
        if (x > y)
            result = x-y;
        else
            result = y-x;
        return result;
    }

    absdiff:
        movq    %rdi, %rax      # x
        subq    %rsi, %rax      # result = x-y
        movq    %rsi, %rdx
        subq    %rdi, %rdx      # eval = y-x
        cmpq    %rsi, %rdi      # x:y
        cmovle  %rdx, %rax      # if <=, result = eval
        retq

    Register   Use(s)
    %rdi       Argument x
    %rsi       Argument y
    %rax       return value

Computations with side effects are a bad fit for conditional moves:

    val = x > 0 ? x*=7 : x+=3;

  - Both values get computed
  - Must be side effect free
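The "compute both, then select" pattern and its hazard can be sketched in C. The names absdiff_select, update_branch and update_both are hypothetical, and update_both is a deliberately naive translation written to show the problem; it is not something a compiler would emit.

```c
/* absdiff written in the conditional-move style: compute both
   candidate values, then pick one without a control transfer. */
long absdiff_select(long x, long y)
{
    long result = x - y;        /* Then_Expr */
    long eval = y - x;          /* Else_Expr */
    int nt = !(x > y);
    if (nt) result = eval;      /* candidate for a single cmov */
    return result;
}

/* Branching semantics: only one arm's side effect happens. */
long update_branch(long x)
{
    if (x > 0)
        x *= 7;
    else
        x += 3;
    return x;
}

/* Naive "compute both" version: both side effects hit x,
   so the else value is computed from an already modified x. */
long update_both(long x)
{
    int test = x > 0;           /* decided on the original x */
    long then_val = (x *= 7);
    long else_val = (x += 3);
    return test ? then_val : else_val;
}
```

For x = -1 the branching version returns 2, while the naive version returns -4 because x was multiplied by 7 before 3 was added. This is why the arms must be side effect free before a conditional move is safe.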


Do-While Loop Example

A common compilation strategy is to take a C construct and rewrite it into a semantically equivalent C version that is closer to assembly.

C Code:

    long pcount_do(unsigned long x)
    {
        long result = 0;
        do {
            result += x & 0x1;
            x >>= 1;
        } while (x);
        return result;
    }

Goto Version:

    long pcount_goto(unsigned long x)
    {
        long result = 0;
    loop:
        result += x & 0x1;
        x >>= 1;
        if (x) goto loop;
        return result;
    }

Count number of 1's in argument x (“popcount”).
Use conditional branch to either continue looping or to exit loop.

Do-While Loop Compilation

Goto Version:

    long pcount_goto(unsigned long x)
    {
        long result = 0;
    loop:
        result += x & 0x1;
        x >>= 1;
        if (x) goto loop;
        return result;
    }

        movl    $0, %eax        # result = 0
    .L2:                        # loop:
        movq    %rdi, %rdx
        andl    $1, %edx        #   t = x & 0x1
        addq    %rdx, %rax      #   result += t
        shrq    $1, %rdi        #   x >>= 1
        jne     .L2             #   if (x) goto loop
        retq

    Register   Use(s)
    %rdi       Argument x
    %rax       return value

General Do-While Translation

C Code:

    do
        Body
    while (Test);

Goto Version:

    loop:
        Body
        if (Test) goto loop;

Body can be any C statement, typically is a compound statement. Test is an expression returning an integer. If it evaluates to 0, that’s interpreted as false. If it evaluates to anything but 0, that’s interpreted as true.
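A minimal sketch of this translation on a different loop: counting decimal digits, where the body must run at least once because 0 still has one digit. The functions ndigits_do and ndigits_goto are hypothetical examples, not part of the course code.

```c
/* Number of decimal digits in n, as a do-while loop. */
int ndigits_do(unsigned long n)
{
    int count = 0;
    do {
        count++;
        n /= 10;
    } while (n);                /* nonzero n is interpreted as true */
    return count;
}

/* The same loop after the do-while to goto translation. */
int ndigits_goto(unsigned long n)
{
    int count = 0;
loop:
    count++;                    /* Body */
    n /= 10;
    if (n) goto loop;           /* Test */
    return count;
}
```

Both versions return 1 for an argument of 0, since the body executes before the test is ever evaluated.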


General While Translation #1

“Jump-to-middle” translation
Used with -Og

While version:

    while (Test)
        Body

Goto version:

        goto test;
    loop:
        Body
    test:
        if (Test) goto loop;
    done:

While Loop Example #1

C Code:

    long pcount_while(unsigned long x)
    {
        long result = 0;
        while (x) {
            result += x & 0x1;
            x >>= 1;
        }
        return result;
    }

Jump to Middle:

    long pcount_goto_jtm(unsigned long x)
    {
        long result = 0;
        goto test;
    loop:
        result += x & 0x1;
        x >>= 1;
    test:
        if (x) goto loop;
        return result;
    }

Compare to do-while version of function.
Initial goto starts loop at test.

General While Translation

C Code:

    while (Test)
        Body

which gets compiled as if it were:

Do-While Version:

        if (!Test) goto done;
        do
            Body
        while (Test);
    done:

which is equivalent to:

Goto Version:

        if (!Test) goto done;
    loop:
        Body
        if (Test) goto loop;
    done:

Are all three versions semantically equivalent?

While Loop Example #2

C Code:

    long pcount_while(unsigned long x)
    {
        long result = 0;
        while (x) {
            result += x & 0x1;
            x >>= 1;
        }
        return result;
    }

Do-While version:

    long pcount_goto_dw(unsigned long x)
    {
        long result = 0;
        if (!x) goto done;
    loop:
        result += x & 0x1;
        x >>= 1;
        if (x) goto loop;
    done:
        return result;
    }

Compare to do-while version of function.
Initial conditional guards entrance to loop.
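To compare the two while translations side by side on something other than popcount, the sketch below counts how many times a value can be halved before it reaches 1 or less. The three function names are hypothetical; the structure follows the jump-to-middle and guarded do-while patterns from the slides.

```c
/* Plain while loop. */
long halvings_while(unsigned long x)
{
    long n = 0;
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Jump-to-middle translation: start at the test. */
long halvings_jtm(unsigned long x)
{
    long n = 0;
    goto test;
loop:
    x >>= 1;
    n++;
test:
    if (x > 1) goto loop;
    return n;
}

/* Guarded do-while translation: test once, then loop with test at bottom. */
long halvings_dw(unsigned long x)
{
    long n = 0;
    if (!(x > 1)) goto done;
loop:
    x >>= 1;
    n++;
    if (x > 1) goto loop;
done:
    return n;
}
```

The inputs 0 and 1 are the interesting cases: the loop body must not run at all, which the initial goto handles in one version and the initial guard handles in the other.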


For Loop Form

General Form:

    for (Init; Test; Update)
        Body

    #define WSIZE 8*sizeof(long)
    long pcount_for(unsigned long x)
    {
        size_t i;
        long result = 0;
        for (i = 0; i < WSIZE; i++) {
            unsigned bit = (x >> i) & 0x1;
            result += bit;
        }
        return result;
    }

Init:

    i = 0

Test:

    i < WSIZE

Update:

    i++

Body:

    {
        unsigned bit = (x >> i) & 0x1;
        result += bit;
    }

For Loop to While Loop

For version:

    for (Init; Test; Update)
        Body

translates to:

While version:

    Init;
    while (Test) {
        Body;
        Update;
    }

For-While Conversion Example

Init:

    i = 0

Test:

    i < WSIZE

Update:

    i++

Body:

    {
        unsigned bit = (x >> i) & 0x1;
        result += bit;
    }

    long pcount_for_while(unsigned long x)
    {
        size_t i;
        long result = 0;
        i = 0;
        while (i < WSIZE) {
            unsigned bit = (x >> i) & 0x1;
            result += bit;
            i++;
        }
        return result;
    }

For Loop Do-While Conversion

C Code:

    long pcount_for(unsigned long x)
    {
        size_t i;
        long result = 0;
        for (i = 0; i < WSIZE; i++) {
            unsigned bit = (x >> i) & 0x1;
            result += bit;
        }
        return result;
    }

Goto version:

    long pcount_for_goto_dw(unsigned long x)
    {
        size_t i;
        long result = 0;
        i = 0;
        if (!(i < WSIZE))       /* drop */
            goto done;          /* drop */
    loop:
        {
            unsigned bit = (x >> i) & 0x1;
            result += bit;
        }
        i++;
        if (i < WSIZE) goto loop;
    done:
        return result;
    }

Note that the initial test is not needed. Why?
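One way to see when the initial test can be dropped is to contrast a loop whose bound is only known at run time with the constant bound used by pcount_for. The sketch below is an illustration under assumptions: sum_for and sum_goto_dw are hypothetical functions, WSIZE is parenthesized here for safety, and _Static_assert requires a C11 compiler.

```c
#include <stddef.h>

/* For loop with a run-time bound. */
long sum_for(const long *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Do-while style goto translation; the guard must stay because n may be 0. */
long sum_goto_dw(const long *a, size_t n)
{
    size_t i;
    long s = 0;
    i = 0;                          /* Init */
    if (!(i < n)) goto done;        /* initial test: needed here */
loop:
    s += a[i];                      /* Body */
    i++;                            /* Update */
    if (i < n) goto loop;           /* Test */
done:
    return s;
}

/* With a constant bound, the initial test 0 < WSIZE is known to be
   true at compile time, so the guard is dead code and can be removed. */
#define WSIZE (8 * sizeof(long))
_Static_assert(0 < WSIZE, "initial test of pcount_for is always true");
```

In sum_goto_dw, removing the guard would read a[0] even for an empty array; in pcount_for the compiler can prove that the first test always succeeds.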

Switch Statement Example

    long switch_eq(long x, long y, long z)
    {
        long w = 1;
        switch (x) {
        case 1:
            w = y*z;
            break;
        case 2:
            w = y/z;
            /* Fall through */
        case 3:
            w += z;
            break;
        case 5:
        case 6:
            w -= z;
            break;
        default:
            w = 2;
        }
        return w;
    }

  - Multiple case labels (e.g., 5, 6)
  - Fall through cases (e.g., 2)
  - Missing cases (e.g., 4)

Jump Table Structure

Switch Form:

    switch (x) {
    case val_0:
        Block 0
    case val_1:
        Block 1
    ...
    case val_n-1:
        Block n-1
    }

Jump Table:

    JTab:   Targ0
            Targ1
            Targ2
            ...
            Targn-1

Jump Targets:

    Targ0:     Code Block 0
    Targ1:     Code Block 1
    Targ2:     Code Block 2
    ...
    Targn-1:   Code Block n-1

Translation (Extended C):

    goto *JTab[x];

Switch Example

    long switch_eq(long x, long y, long z)
    {
        long w = 1;
        switch (x) {
            ...
        }
        return w;
    }

Setup:

    switch_eq:
        movq    %rdx, %rcx
        cmpq    $6, %rdi        # x:6
        ja      .L8
        jmp     *.L4(,%rdi,8)

    Register   Use(s)
    %rdi       Argument x
    %rsi       Argument y
    %rdx       Argument z
    %rax       return value

Note that w is not initialized here.

Switch Statement Example

    long switch_eq(long x, long y, long z)
    {
        long w = 1;
        switch (x) {
            ...
        }
        return w;
    }

Jump table:

    .section .rodata
        .align 8
    .L4:
        .quad   .L8     # x = 0
        .quad   .L3     # x = 1
        .quad   .L5     # x = 2
        .quad   .L9     # x = 3
        .quad   .L8     # x = 4
        .quad   .L7     # x = 5
        .quad   .L7     # x = 6

Setup:

    switch_eq:
        movq    %rdx, %rcx
        cmpq    $6, %rdi        # x:6
        ja      .L8             # use default
        jmp     *.L4(,%rdi,8)   # goto *JTAB[x], indirect jump

Assembly Setup Explanation

Table Structure
  - Each target requires 8 bytes
  - Base address at .L4

Jumping
  - Direct: jmp .L8
      Jump target is denoted by label .L8
  - Indirect: jmp *.L4(,%rdi,8)
      Start of jump table: .L4
      Must scale by factor of 8 (addresses are 8 bytes)
      Fetch target from effective address (.L4 + x*8), but only for 0 ≤ x ≤ 6

    .section .rodata
        .align 8
    .L4:
        .quad   .L8     # x = 0
        .quad   .L3     # x = 1
        .quad   .L5     # x = 2
        .quad   .L9     # x = 3
        .quad   .L8     # x = 4
        .quad   .L7     # x = 5
        .quad   .L7     # x = 6
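The range check plus table lookup can be approximated in portable C with an array of function pointers standing in for the code labels. This is a sketch only: the block functions, jtab, switch_table and switch_ref are hypothetical names, and a compiler's jump table holds code addresses inside one function rather than separate functions. The single unsigned comparison mirrors ja, which sends both x > 6 and negative x to the default case.

```c
/* One function per code block; fall-through from case 2 into case 3
   is folded into blk_2. */
static long blk_default(long y, long z) { (void)y; (void)z; return 2; }
static long blk_1(long y, long z)       { return y * z; }
static long blk_2(long y, long z)       { return y / z + z; }
static long blk_3(long y, long z)       { (void)y; return 1 + z; }
static long blk_56(long y, long z)      { (void)y; return 1 - z; }

/* Table indexed by x = 0..6, same layout as the .L4 table. */
static long (*const jtab[7])(long, long) = {
    blk_default, blk_1, blk_2, blk_3, blk_default, blk_56, blk_56
};

long switch_table(long x, long y, long z)
{
    if ((unsigned long)x > 6)       /* like ja: also catches x < 0 */
        return blk_default(y, z);
    return jtab[x](y, z);           /* indirect transfer via the table */
}

/* Reference version using the switch statement from the slides. */
long switch_ref(long x, long y, long z)
{
    long w = 1;
    switch (x) {
    case 1: w = y * z; break;
    case 2: w = y / z;              /* fall through */
    case 3: w += z; break;
    case 5:
    case 6: w -= z; break;
    default: w = 2;
    }
    return w;
}
```

Calling both functions with x ranging over negative values, 0 through 6, and values above 6 should give identical results, provided z is nonzero so the division in case 2 is defined.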

Jump Table

Jump Table:

    .section .rodata
        .align 8
    .L4:
        .quad   .L8     # x = 0
        .quad   .L3     # x = 1
        .quad   .L5     # x = 2
        .quad   .L9     # x = 3
        .quad   .L8     # x = 4
        .quad   .L7     # x = 5
        .quad   .L7     # x = 6

    long switch_eq(long x, long y, long z)
    {
        long w = 1;
        switch (x) {
        case 1:
            w = y*z;
            break;
        case 2:
            w = y/z;
            /* Fall through */
        case 3:
            w += z;
            break;
        case 5:
        case 6:
            w -= z;
            break;
        default:
            w = 2;
        }
        return w;
    }

Code Blocks (x == 1)

    switch (x) {
    case 1:             // .L3
        w = y*z;
        break;
    ...
    }

    .L3:
        movq    %rsi, %rax      # y
        imulq   %rdx, %rax      # y*z
        retq

    Register   Use(s)
    %rdi       Argument x
    %rsi       Argument y
    %rdx       Argument z
    %rax       return value

Handling Fall-Through

    long w = 1;
    ...
    switch (x) {
    ...
    case 2:
        w = y/z;
        /* Fall Through */
    case 3:
        w += z;
        break;
    ...
    }

is handled as if it were:

    case 2:
        w = y/z;
        goto merge;
    ...
    case 3:
        w = 1;
    merge:
        w += z;

Code Blocks (x == 2, x == 3)

    long w = 1;
    ...
    switch (x) {
    ...
    case 2:
        w = y/z;
        /* Fall Through */
    case 3:
        w += z;
        break;
    ...
    }

    .L5:                        # Case 2
        movq    %rsi, %rax
        cqto
        idivq   %rcx            # y/z
        jmp     .L6             # goto merge
    .L9:                        # Case 3
        movl    $1, %eax        # w = 1
    .L6:                        # merge:
        addq    %rcx, %rax      # w += z
        retq

    Register   Use(s)
    %rdi       Argument x
    %rsi       Argument y
    %rdx       Argument z
    %rax       return value

Code Blocks (x == 5, x == 6, default)

    switch (x) {
    ...
    case 5:             // .L7
    case 6:             // .L7
        w -= z;
        break;
    default:            // .L8
        w = 2;
    }

    .L7:                        # Case 5, 6
        movl    $1, %eax        # w = 1
        subq    %rdx, %rax      # w -= z
        retq
    .L8:                        # default
        movl    $2, %eax        # 2
        retq

    Register   Use(s)
    %rdi       Argument x
    %rsi       Argument y
    %rdx       Argument z
    %rax       return value

Jump Table Structure

Suppose you have a set of switch labels that are “sparse” (widely separated). In this case, it doesn't make sense to use a jump table.
  - If there are only a few labels, simply use a nested if structure.
  - If there are many, build a balanced binary search tree.

The compiler decides the appropriate thresholds for what's “sparse,” what are “a few,” etc.

    switch (x) {
    case 0:
        Block 0
    case 620:
        Block 620
    ...
    case 1040:
        Block 1040
    }
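A decision tree for a sparse switch might look roughly like the following C sketch. The case values 250 and 800 are invented for illustration (the slide only shows 0, 620 and 1040 and elides the rest), the function name sparse_dispatch is hypothetical, and the actual code a compiler generates will differ.

```c
/* Returns the index of the matching block, or -1 for "no case matched".
   Case values (assumed): 0, 250, 620, 800, 1040, split around the median. */
int sparse_dispatch(long x)
{
    if (x < 620) {                  /* left subtree: 0, 250 */
        if (x == 0)   return 0;
        if (x == 250) return 1;
        return -1;
    } else if (x > 620) {           /* right subtree: 800, 1040 */
        if (x == 800)  return 3;
        if (x == 1040) return 4;
        return -1;
    }
    return 2;                       /* x == 620: the root of the tree */
}
```

Each call makes at most three comparisons against case values, whereas a jump table covering 0 through 1040 would need 1041 entries, nearly all of them pointing at the default target.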


Summarizing

C Control
  - if-then-else
  - do-while
  - while, for
  - switch

Assembler Control
  - Conditional jump
  - Conditional move
  - Indirect jump (via jump tables)
  - Compiler generates code sequence to implement more complex control

Standard Techniques
  - Loops converted to do-while or jump-to-middle form
  - Large switch statements use jump tables
  - Sparse switch statements may use decision trees
