CS429: Computer Organization and Architecture Instruction Set Architecture III Dr. Bill Young Department of Computer Sciences University of Texas at Austin
Last updated: January 13, 2017 at 08:54
CS429 Slideset 8: 1
Controlling Program Execution
We can now generate programs that execute linear sequences of instructions:
    Access registers and storage
    Perform computations
But what about loops, conditions, etc.? We need ISA support for:
    Comparing and testing data values
    Directing program control:
        Jump to some instruction that isn’t just the next one in sequence
        Do so based on some condition that has been tested
Processor State (x86-64, Partial)
Information about the currently executing program:
    Temporary data (%rax, ...)
    Location of runtime stack (%rsp)
    Location of current code control point (%rip)
    Status of recent tests (CF, ZF, SF, OF)

Registers:
    %rax  %rbx  %rcx  %rdx  %rsi  %rdi  %rsp  %rbp
    %r8   %r9   %r10  %r11  %r12  %r13  %r14  %r15
Instruction pointer:
    %rip
Condition codes:
    CF  ZF  SF  OF
PC-relative Addressing
In general, you shouldn’t use %rip as a general-purpose register. However, the compiler may generate PC-relative addressing:

    jmp 0x10(%rip)

The effective address for a PC-relative instruction is the offset parameter added to the address of the next instruction. This offset is signed, to allow reference to code both before and after the instruction. Can you guess why the compiler might generate such code?
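The address arithmetic described above can be sketched in C. This is only an illustration: the helper name rip_relative and the example addresses are assumptions, not something from the slides.

```c
#include <assert.h>   /* only needed if a caller adds sanity checks */
#include <stdint.h>

/* Hypothetical helper: effective address of a PC-relative operand.
 * next_rip is the address of the instruction FOLLOWING the one being
 * executed; disp is the signed offset encoded in the instruction. */
uint64_t rip_relative(uint64_t next_rip, int32_t disp)
{
    /* Sign-extend the offset to 64 bits, then add with wraparound. */
    return next_rip + (uint64_t)(int64_t)disp;
}
```

A positive offset reaches forward from the next instruction and a negative offset reaches backward, which is why the offset must be signed.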
Condition Codes (Implicit Setting)
Single-bit registers:
    CF: carry flag (for unsigned)
    ZF: zero flag
    SF: sign flag (for signed)
    OF: overflow flag (for signed)
Implicitly set by arithmetic operations.
    E.g., addq Src, Dest
    C analog: t = a + b;
    CF set if carry out from most significant bit (unsigned overflow)
    ZF set if t == 0
    SF set if t < 0 (as signed)
    OF set if two’s complement overflow:
        (a>0 && b>0 && t<0) || (a<0 && b<0 && t>=0)
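A minimal C sketch of how the four flags could be modeled for t = a + b on 64-bit values. The type flags_t and the function add_flags are hypothetical names introduced for illustration; the sketch also assumes a two's complement machine, as on x86-64 with GCC or Clang.

```c
#include <assert.h>   /* only needed if a caller adds sanity checks */
#include <stdint.h>

/* Hypothetical model of the condition codes, one field per flag. */
typedef struct { int cf, zf, sf, of; } flags_t;

/* Flags that an addq would set for t = a + b. */
flags_t add_flags(int64_t a, int64_t b)
{
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint64_t ut = ua + ub;      /* unsigned addition wraps, no undefined behavior */
    int64_t  t  = (int64_t)ut;  /* assumes two's complement conversion */
    flags_t f;
    f.cf = ut < ua;             /* carry out of the most significant bit */
    f.zf = (t == 0);
    f.sf = (t < 0);
    f.of = (a > 0 && b > 0 && t < 0) || (a < 0 && b < 0 && t >= 0);
    return f;
}
```

Adding 1 to the largest signed value sets SF and OF but not CF, while adding 1 to -1 sets CF and ZF but not OF: the carry flag tracks unsigned overflow and the overflow flag tracks signed overflow.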
Jumping
jX instructions: jump to different parts of the code depending on condition codes.

    jX    Condition       Description
    jmp   1               Unconditional
    je    ZF              Equal / zero
    jne   ~ZF             Not equal / not zero
    js    SF              Negative
    jns   ~SF             Nonnegative
    jg    ~(SF^OF)&~ZF    Greater (signed)
    jge   ~(SF^OF)        Greater or equal (signed)
    jl    (SF^OF)         Less (signed)
    jle   (SF^OF)|ZF      Less or equal (signed)
    ja    ~CF&~ZF         Above (unsigned)
    jb    CF              Below (unsigned)
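To see why the signed and unsigned jumps test different flags, here is a hedged C sketch that models the flags after a comparison of a with b (computing a - b as cmpq does) and then evaluates a few conditions from the table. The names flags_t, cmp_flags, and the *_taken helpers are hypothetical, and two's complement conversion is assumed.

```c
#include <assert.h>   /* only needed if a caller adds sanity checks */
#include <stdint.h>

typedef struct { int cf, zf, sf, of; } flags_t;   /* hypothetical flag model */

/* Flags after comparing a with b, i.e. after computing a - b. */
flags_t cmp_flags(int64_t a, int64_t b)
{
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    int64_t  t  = (int64_t)(ua - ub);   /* assumes two's complement conversion */
    flags_t f;
    f.cf = ua < ub;                     /* borrow: unsigned a is below unsigned b */
    f.zf = (t == 0);
    f.sf = (t < 0);
    f.of = (a >= 0 && b < 0 && t < 0) || (a < 0 && b >= 0 && t >= 0);
    return f;
}

/* Conditions copied from the jump table. */
int jl_taken(flags_t f) { return f.sf ^ f.of; }               /* less (signed)    */
int jg_taken(flags_t f) { return !(f.sf ^ f.of) && !f.zf; }   /* greater (signed) */
int jb_taken(flags_t f) { return f.cf; }                      /* below (unsigned) */
int ja_taken(flags_t f) { return !f.cf && !f.zf; }            /* above (unsigned) */
```

Comparing -1 with 1 takes jl (signed: -1 is less than 1) but also ja (unsigned: the bit pattern of -1 is the largest 64-bit value).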
Conditional Branch Example (Old Style)
Generation: gcc -Og -fno-if-conversion control.c

    long absdiff(long x, long y)
    {
        long result;
        if (x > y)
            result = x-y;
        else
            result = y-x;
        return result;
    }

    absdiff:
        cmpq  %rsi, %rdi    # x:y
        jle   .L4
        movq  %rdi, %rax
        subq  %rsi, %rax
        retq
    .L4:                    # x <= y
        movq  %rsi, %rax
        subq  %rdi, %rax
        retq

    Register  Use(s)
    %rdi      Argument x
    %rsi      Argument y
    %rax      return value
Goto version of absdiff:

    long absdiff_j(long x, long y)
    {
        long result;
        int ntest = x <= y;
        if (ntest) goto Else;
        result = x-y;
        goto Done;
    Else:
        result = y-x;
    Done:
        return result;
    }

Translating a conditional expression using branches:

C Code:
    val = Test ? Then_Expr : Else_Expr;
Example:
    val = x > y ? x-y : y-x;
Goto Version:
        ntest = !Test;
        if (ntest) goto Else;
        val = Then_Expr;
        goto Done;
    Else:
        val = Else_Expr;
    Done:
        ...

Create separate code regions for the then and else expressions. Execute the appropriate one.
Conditional Move Instructions
Refer to generically as “cmovXX”.
    Based on values of condition codes
    Conditionally copy value from source to destination
    Can be used to eliminate conditional jumps

    Inst.    Synonym   Description
    cmove    cmovz     Equal / zero
    cmovne   cmovnz    Not equal / not zero
    cmovs              Negative
    cmovns             Not negative
    cmovg    cmovnle   Greater (signed)
    cmovge   cmovnl    Greater or equal (signed)
    cmovl    cmovnge   Less (signed)
    cmovle   cmovng    Less or equal (signed)
    cmova    cmovnbe   Above (unsigned)
    cmovae   cmovnb    Above or equal (unsigned)
    cmovb    cmovnae   Below (unsigned)
    cmovbe   cmovna    Below or equal (unsigned)
Using Conditional Moves
Conditional move instructions:
    Instruction supports: if (Test) Dest ← Src
    Supported in post-1995 x86 processors
    GCC tries to use them, but only when safe
Why?
    Branches are very disruptive to instruction flow through pipelines.
    Conditional moves do not require control transfer.

C Code:
    val = Test ? Then_Expr : Else_Expr;
Goto Version:
    result = Then_Expr;
    eval = Else_Expr;
    nt = !Test;
    if (nt) result = eval;
    return result;
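A small C sketch of the conditional-move pattern, written out for absdiff. The function names absdiff_branch and absdiff_cmov_style are hypothetical, and whether a compiler actually emits a cmov for the second form depends on the compiler and its flags.

```c
#include <assert.h>   /* only needed if a caller adds sanity checks */

/* Branching form: only one of the two subtractions executes. */
long absdiff_branch(long x, long y)
{
    long result;
    if (x > y)
        result = x - y;
    else
        result = y - x;
    return result;
}

/* Conditional-move form: compute both values, then select one. */
long absdiff_cmov_style(long x, long y)
{
    long result = x - y;     /* Then_Expr */
    long eval   = y - x;     /* Else_Expr */
    int  nt     = !(x > y);  /* nt = !Test */
    if (nt) result = eval;   /* the step a cmov can perform without a jump */
    return result;
}
```

Both forms return the same value here because neither subtraction has a side effect.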
Conditional Move Example

    long absdiff(long x, long y)
    {
        long result;
        if (x > y)
            result = x-y;
        else
            result = y-x;
        return result;
    }

    absdiff:
        movq    %rdi, %rax    # x
        subq    %rsi, %rax    # result = x-y
        movq    %rsi, %rdx
        subq    %rdi, %rdx    # eval = y-x
        cmpq    %rsi, %rdi    # x:y
        cmovle  %rdx, %rax    # if <=, result = eval
        retq

    Register  Use(s)
    %rdi      Argument x
    %rsi      Argument y
    %rax      return value

A case where conditional moves must not be used is a computation with side effects:

    val = x > 0 ? x*=7 : x+=3;

    Both values get computed
    Must be side-effect free
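The side-effect problem can be demonstrated with a short C sketch. Both function names are hypothetical; select_cmov_style deliberately applies the "compute both, then select" translation to an expression where it is not valid.

```c
#include <assert.h>   /* only needed if a caller adds sanity checks */

/* Correct semantics: only the selected arm runs. */
long select_branch(long x)
{
    long val;
    if (x > 0)
        val = (x *= 7);
    else
        val = (x += 3);
    return val;
}

/* Incorrect "compute both" translation: both arms modify x, and the
 * test then sees the already-modified x. */
long select_cmov_style(long x)
{
    long result = (x *= 7);   /* Then_Expr, runs unconditionally */
    long eval   = (x += 3);   /* Else_Expr, runs on the modified x */
    int  nt     = !(x > 0);
    if (nt) result = eval;
    return result;
}
```

For x = -1 the branching version yields 2, while the compute-both version yields -4, which is why the compiler only uses conditional moves when both expressions are free of side effects.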
Do-While Loop Example
A common compilation strategy is to take a C construct and rewrite it into a semantically equivalent C version that is closer to assembly.

C Code:
    long pcount_do(unsigned long x)
    {
        long result = 0;
        do {
            result += x & 0x1;
            x >>= 1;
        } while (x);
        return result;
    }

Goto Version:
    long pcount_goto(unsigned long x)
    {
        long result = 0;
    loop:
        result += x & 0x1;
        x >>= 1;
        if (x) goto loop;
        return result;
    }

Count number of 1’s in argument x (“popcount”).
Use conditional branch to either continue looping or to exit loop.
Do-While Loop Compilation

Goto Version:
    long pcount_goto(unsigned long x)
    {
        long result = 0;
    loop:
        result += x & 0x1;
        x >>= 1;
        if (x) goto loop;
        return result;
    }

        movl  $0, %eax      # result = 0
    .L2:                    # loop:
        movq  %rdi, %rdx
        andl  $1, %edx      # t = x & 0x1
        addq  %rdx, %rax    # result += t
        shrq  $1, %rdi      # x >>= 1
        jne   .L2           # if (x) goto loop
        retq

    Register  Use(s)
    %rdi      Argument x
    %rax      return value
General Do-While Translation

C Code:
    do
        Body
    while (Test);
Goto Version:
    loop:
        Body
        if (Test) goto loop;

Body can be any C statement; typically it is a compound statement. Test is an expression returning an integer. If it evaluates to 0, that’s interpreted as false. If it evaluates to anything but 0, that’s interpreted as true.
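As a further illustration of the do-while template, here is a sketch using a different, hypothetical example (counting decimal digits), written both as a do-while loop and in goto form. The function names are assumptions made for this example.

```c
#include <assert.h>   /* only needed if a caller adds sanity checks */

/* Number of decimal digits in x; the body must run at least once so
 * that 0 counts as one digit, which makes do-while a natural fit. */
int ndigits_do(unsigned long x)
{
    int n = 0;
    do {
        n++;
        x /= 10;
    } while (x);
    return n;
}

/* The same function after the do-while to goto translation. */
int ndigits_goto(unsigned long x)
{
    int n = 0;
loop:
    n++;            /* Body */
    x /= 10;
    if (x)          /* Test: nonzero means true */
        goto loop;
    return n;
}
```

Because the test comes after the body, both versions execute the body once even when x is 0.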
General While Translation #1
“Jump-to-middle” translation
Used with -Og

While version:
    while (Test)
        Body
Goto version:
        goto test;
    loop:
        Body
    test:
        if (Test) goto loop;
    done:
While Loop Example #1 (Jump to Middle)

C Code:
    long pcount_while(unsigned long x)
    {
        long result = 0;
        while (x) {
            result += x & 0x1;
            x >>= 1;
        }
        return result;
    }

Jump-to-Middle Version:
    long pcount_goto_jtm(unsigned long x)
    {
        long result = 0;
        goto test;
    loop:
        result += x & 0x1;
        x >>= 1;
    test:
        if (x) goto loop;
        return result;
    }

Compare to do-while version of function.
Initial goto starts loop at test.
General While Translation

C Code:
    while (Test)
        Body

which gets compiled as if it were:

Do-While Version:
        if (!Test) goto done;
        do
            Body
        while (Test);
    done:

which is equivalent to:

Goto Version:
        if (!Test) goto done;
    loop:
        Body
        if (Test) goto loop;
    done:

Are all three versions semantically equivalent?
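One way to explore the equivalence question is to write a single loop in each form and compare results. The sketch below uses a hypothetical example (counting how many times n can be halved before it reaches 1 or 0); the function names are assumptions.

```c
#include <assert.h>   /* only needed if a caller adds sanity checks */

/* Plain while loop. */
int halvings_while(unsigned long n)
{
    int k = 0;
    while (n > 1) {
        n >>= 1;
        k++;
    }
    return k;
}

/* Jump-to-middle translation: enter the loop at the test. */
int halvings_jtm(unsigned long n)
{
    int k = 0;
    goto test;
loop:
    n >>= 1;
    k++;
test:
    if (n > 1) goto loop;
    return k;
}

/* Guarded do-while translation: test once up front, then loop. */
int halvings_guarded(unsigned long n)
{
    int k = 0;
    if (!(n > 1)) goto done;
loop:
    n >>= 1;
    k++;
    if (n > 1) goto loop;
done:
    return k;
}
```

All three agree, including on inputs for which the body never runs (n = 0 and n = 1); this relies on Test having no side effects, since the guarded form writes the test twice in the code.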
While Loop Example #2 (Do-While Version)

C Code:
    long pcount_while(unsigned long x)
    {
        long result = 0;
        while (x) {
            result += x & 0x1;
            x >>= 1;
        }
        return result;
    }

Do-While Version:
    long pcount_goto_dw(unsigned long x)
    {
        long result = 0;
        if (!x) goto done;
    loop:
        result += x & 0x1;
        x >>= 1;
        if (x) goto loop;
    done:
        return result;
    }

Compare to do-while version of function.
Initial conditional guards entrance to loop.
For Loop Form
General Form:
    for (Init; Test; Update)
        Body

    #define WSIZE 8*sizeof(long)
    long pcount_for(unsigned long x)
    {
        size_t i;
        long result = 0;
        for (i = 0; i < WSIZE; i++) {
            unsigned bit = (x >> i) & 0x1;
            result += bit;
        }
        return result;
    }

Init:
    i = 0
Test:
    i < WSIZE
Update:
    i++
Body:
    {
        unsigned bit = (x >> i) & 0x1;
        result += bit;
    }
For Loop to While Loop

For version:
    for (Init; Test; Update)
        Body

translates to:

While version:
    Init;
    while (Test) {
        Body;
        Update;
    }
For-While Conversion Example

Init:
    i = 0
Test:
    i < WSIZE
Update:
    i++
Body:
    {
        unsigned bit = (x >> i) & 0x1;
        result += bit;
    }

    long pcount_for_while(unsigned long x)
    {
        size_t i;
        long result = 0;
        i = 0;
        while (i < WSIZE) {
            unsigned bit = (x >> i) & 0x1;
            result += bit;
            i++;
        }
        return result;
    }
For Loop Do-While Conversion

C Code:
    long pcount_for(unsigned long x)
    {
        size_t i;
        long result = 0;
        for (i = 0; i < WSIZE; i++) {
            unsigned bit = (x >> i) & 0x1;
            result += bit;
        }
        return result;
    }

Goto version:
    long pcount_for_goto_dw(unsigned long x)
    {
        size_t i;
        long result = 0;
        i = 0;
        if (!(i < WSIZE))    /* drop */
            goto done;       /* drop */
    loop:
        {
            unsigned bit = (x >> i) & 0x1;
            result += bit;
        }
        i++;
        if (i < WSIZE)
            goto loop;
    done:
        return result;
    }

Note that the initial test is not needed. Why?
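A hedged sketch of why the initial test can be dropped: with i initialized to 0 and WSIZE a positive compile-time constant, the first evaluation of i < WSIZE is always true. The version below omits the guard and can be compared against the original for loop. Parenthesizing the WSIZE macro is a small defensive change made for this sketch.

```c
#include <assert.h>   /* only needed if a caller adds sanity checks */
#include <stddef.h>

#define WSIZE (8 * sizeof(long))

/* Original for-loop form. */
long pcount_for(unsigned long x)
{
    size_t i;
    long result = 0;
    for (i = 0; i < WSIZE; i++) {
        unsigned bit = (x >> i) & 0x1;
        result += bit;
    }
    return result;
}

/* Do-while form with the initial guard removed: 0 < WSIZE always holds,
 * so the guard could never have jumped to done. */
long pcount_for_goto_dw(unsigned long x)
{
    size_t i;
    long result = 0;
    i = 0;
loop:
    {
        unsigned bit = (x >> i) & 0x1;
        result += bit;
    }
    i++;
    if (i < WSIZE)
        goto loop;
    return result;
}
```

The two functions agree on every input; the guard only matters when the test can be false on entry.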
Switch Statement Example

    long switch_eq(long x, long y, long z)
    {
        long w = 1;
        switch (x) {
            case 1:
                w = y*z;
                break;
            case 2:
                w = y/z;
                /* Fall through */
            case 3:
                w += z;
                break;
            case 5:
            case 6:
                w -= z;
                break;
            default:
                w = 2;
        }
        return w;
    }

Multiple case labels (e.g., 5, 6)
Fall-through cases (e.g., 2)
Missing cases (e.g., 4)
Jump Table Structure

Switch Form:
    switch (x) {
        case val_0:
            Block 0
        case val_1:
            Block 1
        ...
        case val_n-1:
            Block n-1
    }

Jump Table:
    JTab:  Targ0
           Targ1
           Targ2
           ...
           Targn-1

Jump Targets:
    Targ0:    Code Block 0
    Targ1:    Code Block 1
    Targ2:    Code Block 2
    ...
    Targn-1:  Code Block n-1

Translation (Extended C):
    goto *JTab[x];
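The "extended C" form goto *JTab[x] can be tried directly with the labels-as-values extension of GNU C. The sketch below assumes GCC or Clang (it is not standard C); the function name dispatch, the labels, and the returned values are hypothetical.

```c
#include <assert.h>   /* only needed if a caller adds sanity checks */

/* Jump-table dispatch using the GNU C computed-goto extension. */
long dispatch(unsigned long x)
{
    /* JTab: one code address per case value 0, 1, 2. */
    static void *jtab[] = { &&targ0, &&targ1, &&targ2 };

    if (x > 2)          /* out of range: act like a default case */
        return -1;
    goto *jtab[x];      /* indirect jump through the table */

targ0:
    return 100;         /* Code Block 0 */
targ1:
    return 200;         /* Code Block 1 */
targ2:
    return 300;         /* Code Block 2 */
}
```

The dispatch takes one bounds check and one indirect jump regardless of how many cases there are, which is the appeal of a jump table over a chain of comparisons.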
Switch Example

    long switch_eq(long x, long y, long z)
    {
        long w = 1;
        switch (x) {
            ...
        }
        return w;
    }

Setup:
    switch_eq:
        movq  %rdx, %rcx
        cmpq  $6, %rdi           # x:6
        ja    .L8
        jmp   *.L4(, %rdi, 8)

    Register  Use(s)
    %rdi      Argument x
    %rsi      Argument y
    %rdx      Argument z
    %rax      return value

Note that w is not initialized here.
Switch Statement Example

Setup:
    switch_eq:
        movq  %rdx, %rcx
        cmpq  $6, %rdi           # x:6
        ja    .L8                # use default
        jmp   *.L4(, %rdi, 8)    # goto *JTAB[x], indirect jump

Jump table:
    .section .rodata
    .align 8
    .L4:
        .quad .L8    # x = 0
        .quad .L3    # x = 1
        .quad .L5    # x = 2
        .quad .L9    # x = 3
        .quad .L8    # x = 4
        .quad .L7    # x = 5
        .quad .L7    # x = 6
Assembly Setup Explanation
Table Structure:
    Each target requires 8 bytes
    Base address at .L4
Jumping:
    Direct: jmp .L8
        Jump target is denoted by label .L8
    Indirect: jmp *.L4(, %rdi, 8)
        Start of jump table: .L4
        Must scale by factor of 8 (addresses are 8 bytes)
        Fetch target from effective address (.L4 + x*8), but only for 0 ≤ x ≤ 6
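The range check deserves a closer look: ja is an unsigned comparison, so a single cmpq/ja pair rejects both x > 6 and negative x, because a negative value reinterpreted as unsigned is very large. A minimal C sketch of that lookup, where the enum and the function switch_target are hypothetical stand-ins for the assembly labels:

```c
#include <assert.h>   /* only needed if a caller adds sanity checks */

/* Hypothetical stand-ins for the assembly labels in the jump table. */
enum label { L3 = 3, L5 = 5, L7 = 7, L8 = 8, L9 = 9 };

/* Mirrors the table at .L4: the entry at index x is the target for case x. */
static const enum label jtab[7] = { L8, L3, L5, L9, L8, L7, L7 };

enum label switch_target(long x)
{
    /* cmpq $6, %rdi ; ja .L8 : one unsigned test covers x < 0 and x > 6. */
    if ((unsigned long)x > 6)
        return L8;
    return jtab[x];   /* jmp *.L4(, %rdi, 8) */
}
```

Cases 0 and 4 have no case label in the C source, so their table entries point at the default block, the same target used for out-of-range values.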
Jump Table

    .section .rodata
    .align 8
    .L4:
        .quad .L8    # x = 0
        .quad .L3    # x = 1
        .quad .L5    # x = 2
        .quad .L9    # x = 3
        .quad .L8    # x = 4
        .quad .L7    # x = 5
        .quad .L7    # x = 6

    long switch_eq(long x, long y, long z)
    {
        long w = 1;
        switch (x) {
            case 1:
                w = y*z;
                break;
            case 2:
                w = y/z;
                /* Fall through */
            case 3:
                w += z;
                break;
            case 5:
            case 6:
                w -= z;
                break;
            default:
                w = 2;
        }
        return w;
    }
Code Blocks (x == 1)

    switch (x) {
        case 1:        // .L3
            w = y*z;
            break;
        ...
    }

    .L3:
        movq   %rsi, %rax    # y
        imulq  %rdx, %rax    # y*z
        retq

    Register  Use(s)
    %rdi      Argument x
    %rsi      Argument y
    %rdx      Argument z
    %rax      return value
Handling Fall-Through

    long w = 1;
    ...
    switch (x) {
        ...
        case 2:
            w = y/z;
            /* Fall Through */
        case 3:
            w += z;
            break;
        ...
    }

is translated as:

    case 2:
        w = y/z;
        goto merge;
    ...
    case 3:
        w = 1;
    merge:
        w += z;
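The merge translation can be checked against the original switch with a short C sketch. The function switch_eq_merge and its labels are hypothetical; for simplicity it selects a block with a chain of comparisons rather than a jump table, since the point here is only the fall-through handling.

```c
#include <assert.h>   /* only needed if a caller adds sanity checks */

/* Original switch with a fall-through from case 2 into case 3. */
long switch_eq(long x, long y, long z)
{
    long w = 1;
    switch (x) {
    case 1:
        w = y * z;
        break;
    case 2:
        w = y / z;
        /* Fall through */
    case 3:
        w += z;
        break;
    case 5:
    case 6:
        w -= z;
        break;
    default:
        w = 2;
    }
    return w;
}

/* Goto form: w is set per block, and cases 2 and 3 share the merge code. */
long switch_eq_merge(long x, long y, long z)
{
    long w;
    if (x == 1) goto case1;
    if (x == 2) goto case2;
    if (x == 3) goto case3;
    if (x == 5 || x == 6) goto case56;
    goto dflt;
case1:  w = y * z; goto done;
case2:  w = y / z; goto merge;
case3:  w = 1;
merge:  w += z; goto done;
case56: w = 1; w -= z; goto done;
dflt:   w = 2;
done:   return w;
}
```

Because case 3 sets w = 1 itself, the initialization of w does not need to happen before the dispatch, which matches the note that w is not initialized in the setup code.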
Code Blocks (x == 2, x == 3)

    long w = 1;
    ...
    switch (x) {
        ...
        case 2:
            w = y/z;
            /* Fall Through */
        case 3:
            w += z;
            break;
        ...
    }

    .L5:                      # Case 2
        movq   %rsi, %rax
        cqto
        idivq  %rcx           # y/z
        jmp    .L6            # goto merge
    .L9:                      # Case 3
        movl   $1, %eax       # w = 1
    .L6:                      # merge:
        addq   %rcx, %rax     # w += z
        retq

    Register  Use(s)
    %rdi      Argument x
    %rsi      Argument y
    %rdx      Argument z
    %rax      return value
Code Blocks (x == 5, x == 6, default)

    switch (x) {
        ...
        case 5:        // .L7
        case 6:        // .L7
            w -= z;
            break;
        default:       // .L8
            w = 2;
    }

    .L7:                      # Case 5, 6
        movl   $1, %eax       # w = 1
        subq   %rdx, %rax     # w -= z
        retq
    .L8:                      # default
        movl   $2, %eax       # 2
        retq

    Register  Use(s)
    %rdi      Argument x
    %rsi      Argument y
    %rdx      Argument z
    %rax      return value
Jump Table Structure
Suppose you have a set of switch labels that are “sparse” (widely separated):

    switch (x) {
        case 0:
            Block 0
        case 620:
            Block 620
        ...
        case 1040:
            Block 1040
    }

In this case, it doesn’t make sense to use a jump table.
    If there are only a few labels, simply use a nested if structure.
    If there are many, build a balanced binary search tree.
The compiler decides the appropriate thresholds for what’s “sparse,” what are “a few,” etc.
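A decision tree for sparse labels might look like the following sketch. The labels 0, 620, and 1040 come from the example above; the labels 111 and 2000, the function name sparse_dispatch, and the convention of returning the matched label (or -1 for no match) are assumptions made for illustration.

```c
#include <assert.h>   /* only needed if a caller adds sanity checks */

/* Binary-search style dispatch over the sparse labels
 * 0, 111, 620, 1040, 2000 (111 and 2000 are made up for this sketch).
 * Returns the matched label, or -1 when no case matches. */
long sparse_dispatch(long x)
{
    if (x < 620) {                 /* left half: 0, 111 */
        if (x == 0)   return 0;
        if (x == 111) return 111;
    } else if (x > 620) {          /* right half: 1040, 2000 */
        if (x == 1040) return 1040;
        if (x == 2000) return 2000;
    } else {
        return 620;                /* the middle label itself */
    }
    return -1;                     /* default */
}
```

Each lookup makes at most a handful of comparisons, and no table with thousands of mostly unused entries is needed.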
Summarizing
C Control:
    if-then-else
    do-while
    while, for
    switch
Assembler Control:
    Conditional jump
    Conditional move
    Indirect jump (via jump tables)
    Compiler generates code sequence to implement more complex control
Standard Techniques:
    Loops converted to do-while or jump-to-middle form
    Large switch statements use jump tables
    Sparse switch statements may use decision trees