Computer Architecture - Overview
processor architecture
– privileged execution modes
– asynchronous exceptions (traps)
I/O architecture
– busses, controllers, devices, smart controllers
– I/O: direct, polled, mapped, DMA, interrupt driven
– sequential and random access devices
– disks and factors affecting disk I/O performance
You need to understand how these really work
computer and I/O architecture
3/5/03 - 1

User Mode Execution
able to use all of the “normal” instructions
– load and store general registers from/to memory
– arithmetic, logical, test, compare, data copying
– branches and subroutine calls
able to address some subset of memory
– what is controlled by a Memory Management Unit
not able to perform privileged operations
– I/O operations, update the MMU
– interrupt enables, enter supervisor mode

Supervisor Mode Execution
can execute privileged instructions
– able to perform I/O operations
– interrupt enable/disable/return, load PS
– instructions to change processor mode
can access privileged address spaces
– access data structures inside the OS
– access other process's address spaces
– change and create address spaces
may have alternate registers, alternate stack

Processor Status Register
contains condition codes
– set by arithmetic/logical operations (0,+,-,ovflo)
– tested by conditional branch instructions
controls execution mode (user/supervisor)
describes which interrupts are enabled
may describe what address space to use
may control other processor features/options
– word length, endian-ness, instruction set, ...
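
The roles of the Processor Status Register listed above (condition codes, execution mode, interrupt enables) can be pictured as bit fields in one word. The following is a minimal sketch only: the bit positions and all names (`PS_CC_ZERO`, `PS_SUPERVISOR`, `ps_after_add`, and so on) are hypothetical and do not describe any particular processor.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define PS_CC_ZERO     (1u << 0)   /* result was zero */
#define PS_CC_NEGATIVE (1u << 1)   /* result was negative */
#define PS_CC_OVERFLOW (1u << 2)   /* signed overflow occurred */
#define PS_CC_MASK     (PS_CC_ZERO | PS_CC_NEGATIVE | PS_CC_OVERFLOW)
#define PS_SUPERVISOR  (1u << 8)   /* 1 = supervisor mode, 0 = user mode */
#define PS_INTR_ENABLE (1u << 9)   /* interrupts enabled */

typedef uint32_t ps_word;

/* Update the condition codes the way a 32-bit signed add might. */
ps_word ps_after_add(ps_word ps, int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a + (int64_t)b;   /* cannot overflow in 64 bits */
    uint32_t low = (uint32_t)wide;            /* the 32-bit result, as raw bits */

    ps &= ~PS_CC_MASK;                        /* mode and enable bits are untouched */
    if (low == 0)                             ps |= PS_CC_ZERO;
    if (low & 0x80000000u)                    ps |= PS_CC_NEGATIVE;
    if (wide > INT32_MAX || wide < INT32_MIN) ps |= PS_CC_OVERFLOW;
    return ps;
}

/* A conditional branch tests a condition code. */
bool branch_if_zero_taken(ps_word ps)  { return (ps & PS_CC_ZERO) != 0; }

/* A privileged instruction is only legal when the mode bit says supervisor. */
bool privileged_op_allowed(ps_word ps) { return (ps & PS_SUPERVISOR) != 0; }
```

For example, `ps_after_add(0, 5, -5)` sets only the zero bit, so `branch_if_zero_taken` reports true, while `privileged_op_allowed(0)` is false because the mode bit is clear.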

Choice of Execution Modes
computer boots up in supervisor mode
– used by bootstrap and OS to initialize the system
applications run in user mode
– OS changes to user mode before running user code
– user programs cannot do I/O, restricted address space
– they have no way to get into supervisor mode because instructions to change the PS are privileged
reentering supervisor mode is strictly controlled
– only happens in response to traps and interrupts

Asynchronous Exceptions and Handlers
most errors can be handled “in-line”
– arithmetic overflows are reflected in condition codes
– program can test for, and handle such conditions
some errors must interrupt program execution
– e.g. CPU was unable to execute this instruction
– there must be a way to inform OS if this happens
most computers accomplish this with “traps”
– a well specified list of all possible exceptions
– a means for the OS to associate handlers with each

Trap Handling
(figure) Application Program, user mode: ... instr; instr; instr; bad instr; instr; instr; instr ...
(figure) supervisor mode: TRAP vector table (PS/PC entries), 1st level trap handler (saves registers and selects 2nd level handler), 2nd level handler (actually deals with the problem), return to user mode

(Transition into Supervisor Mode)
hardware trap handling
– use trap cause to index into trap vector table for PC/PS
– load new processor status word, switch to supv mode
– push PC/PS of program that caused trap onto stack
– load new program counter (w/addr of 1st level handler)
software trap handling
– 1st level handler pushes all other registers onto stack
– 1st level handler gathers info, selects 2nd level handler
– 2nd level handler deals with the exception condition
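
The hardware and software steps of trap entry can be simulated in a few lines. This is a sketch under stated assumptions, not real kernel code: the `cpu` and `trap_frame` structures, the `PS_SUPV` bit, the handler addresses, and the 4-byte instruction size are all hypothetical, and a local `trap_frame` stands in for the supervisor-mode stack.

```c
#include <stdint.h>
#include <string.h>

#define NREGS   8
#define PS_SUPV 0x100u              /* hypothetical supervisor-mode bit */

enum trap_cause { TRAP_ILLEGAL_INSTR, TRAP_ZERO_DIVIDE, TRAP_COUNT };

struct cpu        { uint32_t pc, ps, regs[NREGS]; };
struct trap_frame { uint32_t pc, ps, regs[NREGS]; };

typedef void (*second_level_fn)(struct cpu *, struct trap_frame *);

/* One vector per trap cause: the PC/PS to load, plus (in this simulation)
 * the 2nd level handler that the 1st level handler will select. */
struct trap_vector { uint32_t pc, ps; second_level_fn handler; };

static int      traps_handled;       /* bookkeeping for the demo */
static uint32_t ps_seen_by_handler;

/* 2nd level handler: deals with the problem by skipping the instruction. */
static void skip_instruction(struct cpu *cpu, struct trap_frame *f)
{
    ps_seen_by_handler = cpu->ps;
    traps_handled++;
    cpu->regs[0] = 0xdeadbeefu;      /* handlers are free to clobber registers */
    f->pc += 4;                      /* assumes fixed 4-byte instructions */
}

static struct trap_vector vector_table[TRAP_COUNT] = {
    [TRAP_ILLEGAL_INSTR] = { 0x1000, PS_SUPV, skip_instruction },
    [TRAP_ZERO_DIVIDE]   = { 0x1000, PS_SUPV, skip_instruction },
};

/* Hardware part: save the interrupted PC/PS, load new PC/PS from the vector. */
static void hw_trap(struct cpu *cpu, enum trap_cause cause, struct trap_frame *f)
{
    const struct trap_vector *v = &vector_table[cause];
    f->pc = cpu->pc;
    f->ps = cpu->ps;
    cpu->ps = v->ps;                 /* now in supervisor mode */
    cpu->pc = v->pc;                 /* address of the 1st level handler */
}

/* Software part: save registers, dispatch, restore, privileged return. */
static void first_level_handler(struct cpu *cpu, enum trap_cause cause,
                                struct trap_frame *f)
{
    memcpy(f->regs, cpu->regs, sizeof f->regs);
    vector_table[cause].handler(cpu, f);
    memcpy(cpu->regs, f->regs, sizeof f->regs);
    cpu->pc = f->pc;                 /* restore PC and PS together */
    cpu->ps = f->ps;
}

void take_trap(struct cpu *cpu, enum trap_cause cause)
{
    struct trap_frame frame;         /* stands in for the supervisor stack */
    hw_trap(cpu, cause, &frame);
    first_level_handler(cpu, cause, &frame);
}
```

Because the 2nd level handler edits the saved PC rather than the live one, the change only takes effect when the saved state is restored on return.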

Control of Supervisor mode transitions
all user->supervisor changes are via traps/interrupts
– it is difficult to know when these will happen
there is a designated handler for each trap/intr
– its address is stored in a trap/interrupt vector table
– the operating system sets up all of the handler vectors
ordinary programs can't access these vectors
– vectors are not in the process' address spaces
the OS controls all supervisor mode transitions
– by carefully controlling all of the trap/intr “gateways”

Dealing with the cause of a trap
some exceptions are handled by the OS
– e.g. page faults, alignment, floating point emulation
– OS simulates expected behavior and returns
some exceptions may be fatal to running task
– e.g. zero divide, illegal instruction, invalid address
– OS reflects the failure back to the running process
some exceptions may be fatal to the system
– e.g. power failure, cache parity, stack violation
– OS cleanly shuts down the affected hardware

(Returning to User Mode)
return is opposite of interrupt/trap entry
– 2nd level system call handler returns to 1st level handler
– 1st level handler restores all registers from stack
– use privileged return instruction to restore PC/PS
– resume user-mode execution after trapped instruction
saved registers can be changed before return
– used to set entry point for newly loaded programs
– used to deliver signals to user-mode processes
– used to set return codes from system calls

Stacking and unstacking a trap
(figure) user mode stack: user mode computation, direction of stack growth
(figure) supervisor mode stack: user-mode PC and PS, saved user-mode registers, parameters to 2nd level handler, return PC, stack frame for 2nd level handler, ...

Traps while in Supervisor Mode
nearly identical to traps while in user mode
– trap saves interrupted PC/PS on supervisor mode stack
– trap goes to same vector & 1st level handler
– same register saving, restoring, and return
there are very few differences
– saved PS at time of interrupt shows supervisor mode
– 2nd level handler knows trap was from supervisor mode (and may consider it to be more or less severe than the same trap from user mode)

I/O architectures: busses
(figure) main bus (control, data, address, interrupts) connecting CPU, Memory, and Controllers; Device attached to Controller

Memory type busses
came from back-plane memory-to-CPU interconnects
– a few “bus masters”, and many “slave devices”
– arbitrated multi-cycle bus transactions
  request, grant, address, respond, transfer, ack
– operations: read, write, read/modify/write, interrupt
originally most busses were of this sort
– ISA, EISA, PCMCIA, PCI, cPCI, video busses, ...
– distinguished by form-factor, speed, data width, ...
– newer busses support bridging, hot-swap, self-identifying

Network type busses
evolved as peripheral device interconnects
– SCSI, USB, 1394 (firewire), Infiniband, ...
– cables and connectors rather than back-planes
– designed for easy and dynamic extensibility
– originally slower than back-plane, but no longer
much more similar to a general purpose network
– packet switched, topology, routing, node identity
– may be master/slave (USB) or peer-to-peer (1394)
– may be implemented by controller or by host

I/O architectures: devices & controllers
I/O devices
– peripheral devices that interface between the computer and other media (disks, tapes, networks, serial ports, keyboards, displays, pointing devices, etc.)
device controllers connect a device to a bus
– communicate control operations to device
– relay status information back to the bus
– manage DMA transfers for the device
– generate interrupts for the device
a controller is usually specific to a device and a bus

mechanisms: device controller registers
device controllers export registers to the bus
– registers in controller can be addressed from bus
– writing into registers controls device or sends data
– reading from registers obtains data/status
register access method varies with CPU type
– may require special instructions (e.g. x86 IN/OUT)
  privileged instructions restricted to supervisor mode
– may be mapped onto bus like memory
  accessed with normal (load/store) instructions
  I/O address space not accessible to most processes

A simple device: 16550 UART
offset  register
0       Data Register
1       Interrupt Enable Register
2       Interrupt Register
3       Line Control Register
4       Modem Control Register
5       Line Status Register
6       Modem Status Register
(figure) bit-field labels shown in the register diagram: MDM STS, XMT, RCV, speed, BRK, PARITY, STOP, WORDLEN, RTS, DTR, EMT, FER, PER, OVR, RER, DCD, RI, DSR, CTS

(16550 UART registers)
A 16550 presents seven 8-bit registers to the bus.
0: data – read received byte, write to transmit a byte (or LSB of speed divisor when speed set is enabled)
1: interrupt enables – for transmit done, data received, cd/ring (or MSB of speed divisor when speed set is enabled)
2: interrupt registers – currently pending interrupt conditions
3: line control register – character length, parity and speed
4: modem control register – control signals sent by computer
5: line status register – xmt/rcv completion and error conditions
6: modem status registers – received modem control signals
All communication between the bus and the device (send data, receive data, status and control) is performed by reading from, and writing to these registers.
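
The seven register offsets above map naturally onto an enumeration, and a small simulated device makes the polled write loop from this lecture testable without hardware. This is a sketch only. The `inb`/`outb` functions here are local stand-ins that operate on an in-memory register array, not the x86 port instructions. `TR_DONE` and `RX_READY` follow the lecture's names, and their bit values are assumed to be the standard 16550 line-status positions (transmit holding register empty, data ready). The three-poll transmit delay is invented for the demonstration.

```c
#include <stdint.h>

/* Register offsets as listed in the lecture (0 through 6). */
enum uart_reg {
    UART_DATA = 0, UART_IER = 1, UART_IIR = 2, UART_LCR = 3,
    UART_MCR  = 4, UART_LSR = 5, UART_MSR = 6, UART_NREGS = 7
};

/* Line status bits (assumed standard 16550 positions). */
#define RX_READY 0x01u   /* received data is available */
#define TR_DONE  0x20u   /* previous write completed, ready for next byte */

/* --- simulated device: an in-memory register file and an output "wire" --- */
static uint8_t       uart_regs[UART_NREGS];
static char          uart_wire[64];    /* bytes the device has transmitted */
static unsigned      uart_sent;
static unsigned      busy_polls;       /* status reads left until ready */
static unsigned long wasted_polls;     /* polls that found the device busy */

static uint8_t inb(enum uart_reg r)
{
    if (r == UART_LSR) {
        if (busy_polls > 0)
            busy_polls--;                        /* still transmitting */
        else
            uart_regs[UART_LSR] |= TR_DONE;      /* transmitter is free */
    }
    return uart_regs[r];
}

static void outb(enum uart_reg r, uint8_t value)
{
    uart_regs[r] = value;
    if (r == UART_DATA && uart_sent < sizeof uart_wire - 1) {
        uart_wire[uart_sent++] = (char)value;
        uart_regs[UART_LSR] &= (uint8_t)~TR_DONE;
        busy_polls = 3;                          /* invented transmit delay */
    }
}

/* Polled write: spin on the status register, then store the byte. */
void uart_write_char(char c)
{
    while ((inb(UART_LSR) & TR_DONE) == 0)
        wasted_polls++;
    outb(UART_DATA, (uint8_t)c);
}
```

Writing three characters in a row leaves `wasted_polls` at 6 in this simulation: the first write finds the device idle, and each later write spins three times. Those spins are the CPU time that busy-wait polling throws away.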

Scenario: direct I/O with polling
    /* wait until the previous write has completed, then send one byte */
    void uart_write_char( char c ) {
        while( (inb(UART_LSR) & TR_DONE) == 0);
        outb( UART_DATA, c );
    }

    /* wait until a byte has been received, then return it */
    char uart_read_char() {
        while( (inb(UART_LSR) & RX_READY) == 0);
        return( inb(UART_DATA) );
    }

(mechanisms: direct polled I/O)
all transfers happen under direct control of CPU
– CPU transfers data to/from device controller registers
– transfers are typically one byte or word at a time
– may be accomplished with normal or I/O instructions
CPU polls device until it is ready for data transfer
– received data is available to be read
– previously initiated write operations have been completed
advantages
– very easy to implement (both hardware and software)

performance of direct I/O
CPU intensive data transfers
– each byte or word transferred requires multiple instructions
CPU is wasted while awaiting completion of transfers
– busy-wait polling ties up CPU until I/O is completed
devices are idle while we are running other tasks
– I/O can only happen when an I/O task is running
how can problems be dealt with
– let controller transfer data without attention from CPU
– let application block pending I/O completion
– let controller interrupt CPU when I/O is finally done

Direct Memory Access – I/O w/o the CPU
bus facilitates data flow in all directions between
– CPU, memory, and device controllers
CPU can be the bus-master
– initiating data transfers with memory or device controllers
device controllers can also master the bus
– CPU instructs controller what transfer is desired
  what data to move to/from what part of memory
– device controller performs transfer w/o CPU assistance
– device controller generates interrupt at end of transfer
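
The division of labor in DMA (the CPU describes the transfer, the controller moves the data and signals completion) can be sketched with a simulated controller. Everything here is illustrative: the field names `loc`, `adr`, `cnt`, `op`, and `ctrl` are borrowed from the DMA read example in this lecture, while the `GO`/`DONE` bit values, the 4096-byte `device_media` array, and `controller_step` (a stand-in for hardware acting as bus master) are assumptions.

```c
#include <stdint.h>
#include <string.h>

enum dma_op { DMA_READ, DMA_WRITE };

#define GO   0x1u    /* hypothetical control bit: start the transfer */
#define DONE 0x2u    /* hypothetical status bit: transfer finished */

struct dma_regs {
    uint32_t    loc;   /* byte offset on the device */
    uint8_t    *adr;   /* memory address to transfer to/from */
    uint32_t    cnt;   /* bytes requested; residual count afterwards */
    enum dma_op op;
    uint32_t    ctrl;
};

#define DEV_SIZE 4096u
static uint8_t device_media[DEV_SIZE];   /* the simulated device contents */
static int     completion_interrupts;    /* counts simulated interrupts */

/* CPU side: describe the transfer, then set GO. No data is moved here. */
void dma_start(struct dma_regs *dp, enum dma_op op,
               uint32_t loc, void *adr, uint32_t cnt)
{
    dp->loc  = loc;
    dp->adr  = adr;
    dp->cnt  = cnt;
    dp->op   = op;
    dp->ctrl = GO;
}

/* Controller side: performs the whole transfer without CPU involvement,
 * then raises a completion "interrupt". */
static void controller_step(struct dma_regs *dp)
{
    uint32_t n;

    if (!(dp->ctrl & GO))
        return;
    n = dp->cnt;
    if (dp->loc >= DEV_SIZE)
        n = 0;                               /* nothing transferable */
    else if (n > DEV_SIZE - dp->loc)
        n = DEV_SIZE - dp->loc;              /* clip at end of device */

    if (dp->op == DMA_READ)
        memcpy(dp->adr, device_media + dp->loc, n);
    else
        memcpy(device_media + dp->loc, dp->adr, n);

    dp->cnt -= n;                            /* residual (untransferred) bytes */
    dp->ctrl = DONE;
    completion_interrupts++;
}
```

After `dma_start` returns, the CPU is free to run other work; the destination buffer is only valid once the controller has finished, which is why the requester in the lecture's example waits for the completion interrupt before reading the residual count.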

completion interrupts – waking up CPU
device controllers, busses, and interrupts
– busses have ability to send interrupts to the CPU
– devices signal controller when they are done/ready
– when device is done, controller asserts interrupt on bus
CPUs and interrupts
– interrupts look very much like traps
  traps come from CPU, interrupts are caused externally
– unlike traps, interrupts can be selectively enabled/disabled
  a device can be told it can or cannot generate interrupts
  special instructions can enable/disable interrupts to CPU

Interrupt Handling
(figure) Application Program, user mode: ... instr; instr; instr; instr; instr; instr ...
(figure) supervisor mode: Interrupt vector table (PS/PC entries), 1st level interrupt handler, list of device interrupt handlers, 2nd level handler (device driver interrupt routine), return to user mode

interrupts vs. traps
traps are caused by an instantaneous condition
– they are triggered when something happens
– there is (usually) no persistent state that must be cleared
interrupts are caused by a device being in some state
– they are triggered when the device enters a particular state
– they will continue to be asserted until device state changes
once delivered, an interrupt must be disabled
– CPU must ignore continuing request for that interrupt
cause must be cleared, and interrupt acknowledged
– the device is changed from DONE to BUSY again

DMA read w/completion interrupts
    /* requesting process */
    lock(devlock);                      /* lock device */

    /* program the DMA request */
    dp->loc = req_loc;
    dp->adr = req_adr;
    dp->cnt = req_cnt;
    dp->op = READ;

    /* re-enable and await completion */
    dp->ctrl = IENABLE | GO;
    await(devcompletion);

    /* request has completed */
    /* update data read count */
    req_xfr = req_cnt - dp->cnt;
    unlock(devlock);                    /* release device */

    /* interrupt handler */
    dev_intr_handler() {
        save = intr_enable(DISABLE);

        /* turn off device ability to interrupt */
        dp->ctrl = IDISABLE;

        /* wake up the requester */
        wakeup(devcompletion);
        intr_enable( save );

        /* tell intr dispatcher we're done */
        return( ACKNOWLEDGE_INTERUPT );
    }

(device I/O with completion interrupts)
requesting process checks to see if device is busy
– if idle, start the I/O operation, and await its completion
– if busy, wait for the device to become idle
I/O interrupt handler
– gathers completion information from the device
– posts completion awakening requester
when current device owner finishes using the device
– wake up the next requester
we'll talk about waiting and waking up in two weeks

mechanisms: memory mapped I/O
DMA may not be the easiest way to do I/O
– consider a video game display adaptor
– continuous updates to isolated areas of the screen
implement as a bit-mapped display adaptor
– 1MB display controller sits on the CPU memory bus
– each byte of display memory corresponds to one pixel
– application uses ordinary stores to update display
low overhead per update, no interrupts to service
relatively easy to program
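
A bit-mapped display with one byte per pixel can be updated with nothing more than array stores. The sketch below assumes a 1024 x 1024 layout (which happens to total 1MB) and uses an ordinary static array, `fake_display_memory`, in place of real display memory; on actual hardware the pointer would instead hold the address at which the controller is mapped. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define FB_WIDTH  1024u
#define FB_HEIGHT 1024u              /* 1024 x 1024 x 1 byte = 1MB (assumed shape) */

/* Stand-in for the display controller's memory. */
static uint8_t fake_display_memory[FB_WIDTH * FB_HEIGHT];

/* volatile: every store must really reach the (simulated) device. */
static volatile uint8_t *const framebuffer = fake_display_memory;

/* One pixel update is one ordinary store instruction. */
static inline void set_pixel(unsigned x, unsigned y, uint8_t color)
{
    if (x < FB_WIDTH && y < FB_HEIGHT)
        framebuffer[(size_t)y * FB_WIDTH + x] = color;
}

/* Update an isolated rectangle of the screen, clipping at the edges. */
void fill_rect(unsigned x0, unsigned y0, unsigned w, unsigned h, uint8_t color)
{
    for (unsigned y = y0; y < y0 + h; y++)
        for (unsigned x = x0; x < x0 + w; x++)
            set_pixel(x, y, color);
}
```

There is no request to set up and no completion to wait for, which is why small scattered updates are cheap here; the cost is that the CPU executes a store for every byte.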

trade-off: memory mapped vs. DMA
DMA performs large transfers efficiently
– better utilization of both the devices and the CPU
  device doesn't have to wait for CPU to do transfers
– but there is considerable per transfer overhead
  setting up the operation, processing completion interrupt
memory-mapped I/O has no start/finish overhead
– but every byte is transferred by a CPU instruction
DMA better for occasional large transfers
memory-mapped better for frequent small transfers
memory-mapped devices are more difficult to share

Smart Device Controller
(figure) device driver uses I/O instructions, normal instructions, and I/O completion interrupts to reach the device controller
(figure) control registers (on bus), accessed through bus: basic status, basic control, buffer pointers
(figure) shared buffers (in memory), accessed through DMA

(I/O Mechanisms: smart controllers)
Smarter controllers can improve on basic DMA
they can queue multiple input/output requests
– when one finishes, automatically start next one
– reduce completion/start-up delays
– eliminate need for CPU to service interrupts
they can relieve CPU of other I/O responsibilities
– request scheduling to improve performance
– they can do automatic error handling & retries
they can better hide the details of underlying devices

Random vs. Sequential Access
Sequential access devices
– byte/block N must be read before byte/block N+1
– may be read/write once, or may be rewindable
– examples: magnetic tape, printer, keyboard
Random access devices
– possible to seek directly to any desired byte/block
– seeks may or may not be instantaneous
– examples: memory, magnetic disk, CD, graphics adaptor
They are used very differently

random access devices: disks
random access devices are much more interesting
– usage, performance, and scheduling techniques
key time sharing services depend on disk I/O
– program loading, file I/O, paging
– disk performance drives timesharing performance
disk I/O operations are subject to overhead
– higher overhead means fewer operations/second
– careful scheduling can reduce overhead
– clever scheduling can improve throughput and delay

Disk drive geometry
spindle
– a mounted assembly of circular platters
head assembly
– read/write head per surface, all moving in unison
track
– ring of data readable by one head in one position
cylinder
– corresponding tracks on all platters
sector
– logical records written within tracks
disk address =

Disk Drive - Logical
Disk Drive – Physical
(figures) labels: Spindle, Motor, head positioning assembly, 10 heads (head numbers 0, 1, 8, 9 shown), 5 platters / 10 surfaces, platter/surface, Track, Cylinder, Sectors
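
The overhead of a disk operation can be made concrete: one access costs seek time plus rotational latency plus transfer time. The sketch below totals those for one 8,000-byte block, taking its inputs from this lecture's Disk Drive Performance figures (seek 0/15/9 ms, latency 0/8/4 ms, 400 µs transfer, 7200 RPM). The structure and function names are hypothetical, and times are kept in integer microseconds so the arithmetic is exact.

```c
#include <stdint.h>

/* One row of the access-time table, in microseconds. */
struct access_case {
    const char *name;
    uint32_t    seek_us;
    uint32_t    rotate_us;
};

#define TRANSFER_US 400u   /* time to transfer one 8,000-byte block */

static const struct access_case cases[] = {
    { "best case",      0,    0 },   /* already on cylinder and sector */
    { "worst case", 15000, 8000 },   /* longest seek, nearly a full rotation */
    { "average",     9000, 4000 },
};

/* Total access time = seek + rotational latency + transfer. */
uint32_t total_us(const struct access_case *c)
{
    return c->seek_us + c->rotate_us + TRANSFER_US;
}

/* How many times slower than the pure transfer time (truncated). */
uint32_t slowdown(const struct access_case *c)
{
    return total_us(c) / TRANSFER_US;
}

/* Time for one full revolution, which bounds the rotational latency. */
uint32_t revolution_us(uint32_t rpm)
{
    return 60000000u / rpm;
}
```

With these inputs the worst case totals 23,400 µs (58 times the transfer time) and the average 13,400 µs (33 times), and one revolution at 7200 RPM takes about 8,333 µs, consistent with the 0-8 ms latency range. The data transfer itself is a small fraction of each access, which is why reducing seeks and rotational waits matters so much.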

Disk Drive Performance
heads           10
platters        5
cylinders       17,000
tracks/inch     18,000
sectors/track   400
bytes/sector    512
RPM             7200
speed           200Mb/sec
seek time       2-15ms (average 9ms)
latency         0-8ms (average 4ms)

time to read one 8,000 byte block
             seek    rotate   transfer   total
best case    0ms     0ms      400µs      400µs
worst case   15ms    8ms      400µs      23.4ms (58X)
average      9ms     4ms      400µs      13.4ms (33X)

Optimizing disk performance
don't start I/O until disk is on-cyl/near sector
– I/O ties up the controller, locking out other operations
– other drives seek while one drive is doing I/O
minimize head motion
– do all possible reads in current cylinder before moving
– make minimum number of trips in small increments
encourage efficient data requests
– have lots of requests to choose from
– encourage cylinder locality
– encourage largest possible block sizes
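
How much request ordering affects head motion can be checked by totalling the cylinders travelled under different orderings of one queue. The sketch below compares arrival order (first come first served), nearest request first (shortest seek first), and a single upward sweep followed by a downward sweep (scan/look). The starting cylinder 76 and the request list are the ones used in this lecture's head travel figure; the function names and the `MAX_REQ` limit are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_REQ 32

static int dist(int a, int b) { return a > b ? a - b : b - a; }

/* First come first served: visit requests in arrival order. */
int fcfs_travel(int head, const int *req, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += dist(head, req[i]);
        head = req[i];
    }
    return total;
}

/* Shortest seek first: always visit the closest pending request. */
int ssf_travel(int head, const int *req, int n)
{
    int done[MAX_REQ] = {0};
    int total = 0;

    if (n > MAX_REQ)
        return -1;
    for (int served = 0; served < n; served++) {
        int best = -1;
        for (int i = 0; i < n; i++)
            if (!done[i] &&
                (best < 0 || dist(head, req[i]) < dist(head, req[best])))
                best = i;
        total += dist(head, req[best]);
        head = req[best];
        done[best] = 1;
    }
    return total;
}

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Look: sweep up to the highest request, then back down to the lowest. */
int look_travel(int head, const int *req, int n)
{
    int sorted[MAX_REQ];
    int start = head, total = 0;

    if (n > MAX_REQ)
        return -1;
    memcpy(sorted, req, (size_t)n * sizeof *req);
    qsort(sorted, (size_t)n, sizeof *sorted, cmp_int);

    for (int i = 0; i < n; i++)            /* upward sweep */
        if (sorted[i] >= start) {
            total += sorted[i] - head;
            head = sorted[i];
        }
    for (int i = n - 1; i >= 0; i--)       /* downward sweep */
        if (sorted[i] < start) {
            total += head - sorted[i];
            head = sorted[i];
        }
    return total;
}
```

Calling these with a head at cylinder 76 and the queue `{124, 17, 269, 201, 29, 137, 12}` gives 880 cylinders for first come first served, 321 for shortest seek first, and 450 for look. Shortest seek first wins on this queue, though it can postpone distant requests indefinitely when nearby ones keep arriving, a weakness the sweep-based approach avoids.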

Head Travel under various algorithms
(head starts at cylinder 76; requests arrive as 124, 17, 269, 201, 29, 137, 12)
First Come First Served
  visits: 76, 124, 17, 269, 201, 29, 137, 12
  moves:  48, 107, 252, 68, 172, 108, 125    Tot=880
Shortest Seek First
  visits: 76, 29, 17, 12, 124, 137, 201, 269
  moves:  47, 12, 5, 112, 13, 64, 68    Tot=321
Scan/look (elevator algorithm)
  visits: 76, 124, 137, 201, 269, 29, 17, 12
  moves:  48, 13, 64, 68, 240, 12, 5    Tot=450

For the next lecture
read sections 6-6.3 (see Greek to English dictionary regarding figure 6-3)
there will be a quiz on the reading
topics for the next lecture
user view of processes
– process address spaces
– object modules, load modules, linkage editing
– procedure calls, stack frames, system calls, signals

key points
supervisor mode execution, privileged instructions
– trap and interrupt handling
– save/restore, vectoring 1st and 2nd level handlers
busses, devices, controllers, interconnections
I/O mechanisms, what they are, how they work
– polled I/O, direct I/O, memory mapped I/O, DMA
– interrupt driven I/O, smart controllers
random access devices
– disk geometry, disk performance, disk scheduling

Channel Controllers – I/O co-processors
channels sit between CPU and I/O devices
– think of them as extremely smart busses
– they include highly specialized CPUs
they execute channel I/O programs
– instructions to read, write and control devices
– instructions to generate progress interrupts
– command chaining
– data chaining
once started, I/O programs execute w/o CPU attention

Typical Channel Architecture
(figure) Main CPU on the Main bus with Channel Controller 0x1??, Channel Controller 0x2??, ...
(figure) under Channel Controller 0x1??: Device Controller 0x11? ... Device Controller 0x1F?
(figure) under Device Controller 0x11?: Device 0x110 ... Device 0x11F
all channels, controllers and devices have "Geographic" addresses

Typical Channel Program (both programs located in main memory)
Main CPU
        ...
        SIO 0x101, iopgm
        ...
intr:   TIO 0x101
        ...
Channel Controller
iopgm:  SEEK cyl=1020, hd=5, rec=10
        READ buf=xxx, cnt=4096
        READX buf=yyy, cnt=4096, intr
        TIC next
next:   SEEK cyl=1050, hd=0, rec=2
        WRITE buf=zzz, cnt=8192, intr
        END intr
(note, channel can concurrently execute one program per controller)