Von Neumann computer
• Modern electronic computers started with a design by von Neumann (~1945)
Basic machine architecture
Von Neumann computer
• Control + Arithmetic logic unit
  – "Central Processing Unit" where all data processing work is done
• Memory
  – Stores both program and data
  – Some earlier proto-computers had different memories (sometimes using differing hardware technologies) for program and data
• In the Von Neumann design, all I/O data transfers have to pass through registers ("accumulators") in the arithmetic-logic unit
  – This design was adequate for the earliest computers with their very small memories and slow, limited I/O devices
  – It proved less appropriate when, by the mid-1950s, machines had somewhat larger memories and faster I/O using magnetic tapes and then disks
• Transfer of a block of data from tape to memory, using a machine with 3 registers in its ALU, would have been programmed something like:
        Load r0,512
        Load r1,destination
  Loop: Read-word from tape-control into r2
        Store r2,@r1
        Increment r1
        Decrement r0
        Test r0
        Skip if zero
        Goto Loop
        // Continue now block has loaded
– The program might actually be “hard wired”
• Input and output
  – Input of data and output of results of computation
With this programmed scheme, the disk/tape transfer occupies the CPU.
Autonomous transfers
• By the late 1950s, logic circuitry had become a little cheaper and it was practical to put control logic (still implemented using vacuum tubes) into device controllers for things like disks and tapes – potentially relieving the CPU of the detailed work of the transfer
  – The code in the CPU would have been more like:
        Load r0,destination
        Copy r0,tape-control register
        Start transfer of block on tape
        …
        Check if transfer complete
  – The tape/disk unit could have its own counter register counting the bytes transferred, its own address register holding the location where data are to be stored, and circuitry to update these registers as the data transfers took place
• Of course, there had to be a data pathway from the device directly to memory – "direct memory access"
• From the late 1950s onwards, this became the basic structure of a computer:
Simple modern architecture
• CPU (Central Processing Unit)
  – Timing and control circuitry
  – High speed data registers (older term for register – "accumulator")
  – Although some computer architectures allow a little more flexibility, it's typical for instructions for data processing operations (add, multiply, xor, …) to take data from registers and place results in registers
  – Variables in memory have to have their values copied into registers before the values can be manipulated; the results must be stored back into memory
• Memory
  – Magnetic core (1950s–1970s), later semi-conductor (1970…) memory for OS, programs, and data
• Peripheral device controllers
  – Sophisticated, complex controllers for disks and tapes
  – Simpler controllers for slow devices like printers, terminals, card-readers, …
• Bus
  – A common communications highway

Disk working with direct memory access
[A sequence of six diagrams shows the CPU (PC, IR, flags, ALU, registers), the disk controller (with its own block number, byte counter, destination address, and flags registers, plus a disk cache), and memory, all connected by the bus. The exchange proceeds:]
1. CPU to disk: load block number XXX and start seek
2. Disk moving heads (seeking); CPU executing other instructions
3. CPU to disk: copy into memory starting at address ******; disk to CPU: got it
4. Data transferred "directly" into memory – block of memory being filled
5. Disk to CPU: transfer complete – block of memory now filled
CPU
• The CPU of a modern small computer is physically implemented as a single silicon "chip".
  – This chip will have engraved on it the millions of transistors and the interconnecting "wiring" that define the CPU's circuits.
  – The chip will have one hundred or more pins around its rim – some of these pins are connection points for the signal lines from the bus, others will be the points where electrical power is supplied to the chip.
[Photos: Intel 4004 – one of the first single-chip CPUs (~1971) – and more modern CPU chips]
CPU
• Although physically a single component, the CPU is logically made up from a number of subparts.
• The three most important, which will be present in every CPU, are the timing and control circuits, the ALU, and the registers.

Fetch-decode-execute
• The timing and control circuits are the heart of the system.
• A controlling circuit defines the computer's basic processing cycle:
    repeat
        fetch next instruction from memory
        decode instruction (i.e. determine which data manipulation circuit is to be activated)
        fetch from memory any additional data that are needed
        execute the instruction (feed the data to the appropriate manipulation circuit)
    until "halt" instruction has been executed;
CPU registers
• Timing & control
  – Program counter
    • Address of next instruction to be executed
    • By default, it's the address following the current instruction
    • Changed if current instruction is a subroutine-call, jump (goto), branch, or "skip"
  – Instruction register
    • Holds current instruction for the decoding circuits
  – Flags
    • Did last operation result in a zero value, +ve value, -ve value, …?
• ALU
  – Lots of anonymous registers that hold data temporarily during operations like multiplications, shifts etc.

[Diagram: Timing and Control Unit with program counter (PC), instruction register (IR), and flags; ALU with various anonymous registers; high speed registers]
CPU registers
• High speed registers
  – Hold data values and/or addresses of data values
  – On most modern machines:
    • One register reserved as "stack pointer"
    • One register reserved as "stack frame pointer"
• At assembly language level (hand-written or compiler-generated), use of registers is explicit – code at this level is all about:
  • Copy data to this register
  • Combine data in these registers
  • Store data from this register back into main memory
  • Use contents of this register as address of some data value
  • …
C++ has provision for specifying use of registers in high-level code. Rarely useful! It makes code very machine specific, and anyway use of registers is better left to the optimisation phase of the compiler.

Instruction repertoire
• Each distinct machine architecture has its own "instruction set"
  – Instructions correspond to circuits built into the ALU¶
  – In the machine, instructions are represented by "op-codes" – specific bit patterns
  – Assembly language uses mnemonic names
    • e.g. Add for an addition instruction
    • ("mnemonic" – designed to aid the memory, the names chosen to remind the programmer of the effect of the instruction)

¶Not always true. Some CPUs are "micro-coded". Their built-in hardware circuits implement a different, usually much simpler architecture and instruction set. They have "read-only memory" subroutines that simulate the supposed instruction set using the simpler circuits actually present. Such approaches are old – the IBM360 series (1964) had a range of machines varying in power by 1..100 at least, all supposedly with the same instruction set; actually, the smaller machines simulated all the complex instructions using simpler circuitry. Modern Intel CPU chips have to be backward compatible with the 386/486 chips introduced in the 1980s; again, they simulate some of the instructions of those old designs using their newer circuitry.
Instructions & high level languages
Instructions
• Example: the Motorola 68000 CPU chip's instruction repertoire includes
    ADD   Add two integer values
    AND   Perform an AND operation on two bit patterns
    Bcc   Test a condition flag, and possibly branch to another instruction (variants like BEQ testing equality, BLT testing less than)
    CLR   Clear, i.e. set to 0
    CMP   Compare two values
    JMP   Jump or goto
    JSR   Call a subroutine
    SUB   Subtract second value from first
    RTS   Return from subroutine
  (The original 1984 Macintosh used this Motorola CPU.)
• It's not hard to envisage how simple expressions in a high level language might get coded using a given instruction set:
Registers are mostly the same size (same number of bits); something like "flags" might have fewer bits. The size of a register is the size of the data element most readily manipulated – 1 byte, 2 bytes, 4 bytes, or some arbitrary "word size" (12-bit, 18-bit, 36-bit, 40-bit, 60-bit). Operations, e.g. additions, on larger data elements will need to be done by sequences of instructions that manipulate register-sized portions.
int main(int argc, char** argv) {
    int total = 0;
    int data[] = { 1, 2, 3, 4, 5 };
    int len = sizeof(data)/sizeof(int);
    int* ptr = data;
    int i = 0;
    while (i < len) {
        total += *ptr;
        ptr++;
        i++;
    }
    return total;
}

• "Floating point" unit
  – Circuits implement floating point add, subtract, multiply, divide
  – > 50 times faster than subroutines
  – Exploited by re-written subroutines for faster floating point arithmetic
• Expanding the instruction set is no longer really an option. Modern chips typically have all heavily used operations implemented in circuitry; exotic instructions to perform special operations aren't readily exploited by compilers, so they only prove worthwhile in specialist chips – e.g. a chip for decoding compressed movies might be derived from a standard chip extended with additional specialized instructions.
Bus
• CPU and direct memory access devices have to compete for use of the bus
  – Transfers on the bus will take (at least) one clock cycle, so the CPU may have to wait a whole clock cycle when it wants to fetch the next instruction but the bus is busy with a transfer
  – Wait for a whole clock cycle? But we want speed!

Multiple bus
• Memory can be "multi-ported" so that it works with multiple buses.
• Instructions to disks (and more sophisticated things like auxiliary I/O processors or channels) can go on the same bus as used by the CPU, but the data can be transferred on a different bus
  – CPU doesn't have to wait even a single bus cycle
Multiple memory modules
• Memory can be organized in multiple modules so as to allow more than one simultaneous transfer
  – (Memory will have read and write speeds; maybe these are not fast enough)
  – Different schemes – can use either high-order or low-order bits of the address to identify the module
• Low order bits will place "successive bytes/words" in different modules
  – Could be useful if the disk and bus could potentially transfer data faster than a memory module can write words – by having successive words in different modules, transfers can utilize full speed.

[Diagram: CPU and memory on Bus-1; a device controller making DMA data transfers on a second bus, Bus-2]
Multi‐module memory
[Diagram: multiple reads/writes in progress simultaneously on different modules – addresses ending 000b, 001b, …, 111b illustrate the case where low order bits select the module; CPU and memory on Bus-1, a device controller making DMA data transfers on a second bus]

"Cache" memories in device controllers
• Can speed up some I/O by providing device controllers with their own memory
  – "Disk write"
    • Give the disk controller details of the disk address and immediately copy the contents of the main-memory data buffer into the disk controller's memory.
    • The disk controller will write the data to disk when the opportunity arises
  – Useful when bus speed is faster than memory write cycle time
• "Cache" – a secret hiding place or store
CPU & Memory
• Add a "cache" of high-speed memory to the CPU
  – Instruction cache
  – Data cache
• Cache may be "on chip" with the CPU circuitry, or accessed via some separate higher speed bus

Instruction cache
• Useful for code like loops that are executed a large number of times
  – (Operations like matrix multiplication)
• Code runs faster if instructions are in the cache
  – How did they get there?
    • Combination of hardware and (OS) software (mainly hardware)
      – Detect that the same small area of instruction memory has been in use for a large number of cycles
      – Copy those instructions into the cache
      – Fudge the address decoding/instruction fetch mechanism so that these instructions are subsequently fetched from the cache
Data cache
• Matrix multiplication again
  – Large chunks of memory holding 2D arrays of floating point numbers
  – Code repeatedly reading values
  – Code runs faster if the data are brought up into cache memory.
• How? Again, a combination of hardware and software (mainly hardware) detects repeated access to the same regions of memory, copies that block of memory into the cache, and fixes the addressing mechanism so that the cache is used on data fetches.

Data cache
• Writes to the data cache?
  – Matrix multiplication: ResultMatrix = MatrixA x MatrixB
  – This would again run faster if the "store" operations used the cache rather than real memory – so hold the ResultMatrix in the cache as well
  – Of course, it will still have to be written back into main memory!
CPU caches
• CPU caches add lots of complexity, especially in multi-CPU designs where the same data sets may be manipulated by code running in parallel on different CPUs
• Extra hardware
  – Cache loading has to be done largely by hardware
  – Locking and coherence mechanisms
    • With multiple CPUs, need to avoid the problems that could arise if a CPU tries to update data in its cache when that data is shared with other CPUs
  – Need to re-load instruction and data caches whenever there is a process switch

Instruction pipeline
• A computer with an instruction pipeline doesn't have a single "instruction register" – in effect it has several "instruction registers" in a pipe-line (queue)
• Instructions are "pre-fetched" and appended to the back of the pipeline
• The instruction at the front of the pipe-line – and some of the others in the pipe-line – are being executed
  – Extra hardware in the CPU determines which instructions can run concurrently
Instruction pipeline
• Don't have to work on a single instruction
  – Code like x = a[i][j]*b[j][k]; j++; if(j …
• Pipelined CPU
  – Start the floating point multiply
  – Store?