Operating Systems & Memory Systems: Address Translation

Operating Systems & Memory Systems: Address Translation Computer Science 220 ECE 252 Professor Alvin R. Lebeck Fall 2006 Outline • Finish Main Memory...
Author: Lindsey Conley
1 downloads 0 Views 192KB Size
Operating Systems & Memory Systems: Address Translation Computer Science 220 ECE 252 Professor Alvin R. Lebeck Fall 2006

Outline • Finish Main Memory • Address Translation – basics – 64-bit Address Space

• Managing memory • OS Performance Throughout • Review Computer Architecture • Interaction with Architectural Decisions

© Alvin R. Lebeck 2001

CPS 220

Page 1

2

Fast Memory Systems: DRAM specific • Multiple RAS accesses: several names (page mode) – 64 Mbit DRAM: cycle time = 100 ns, page mode = 20 ns

• New DRAMs to address gap; what will they cost, will they survive? – Synchronous DRAM: Provide a clock signal to DRAM, transfer synchronous to system clock – RAMBUS: reinvent DRAM interface (Intel will use it) » Each Chip a module vs. slice of memory » Short bus between CPU and chips » Does own refresh » Variable amount of data returned » 1 byte / 2 ns (500 MB/s per chip) – Cached DRAM (CDRAM): Keep entire row in SRAM

© Alvin R. Lebeck 2001

CPS 220

3

Main Memory Summary • Big DRAM + Small SRAM = Cost Effective – Cray C-90 uses all SRAM (how many sold?)

• Wider Memory • Interleaved Memory: for sequential or independent accesses • Avoiding bank conflicts: SW & HW • DRAM specific optimizations: page mode & Specialty DRAM, CDRAM – Niche memory or main memory? » e.g., Video RAM for frame buffers, DRAM + fast serial output

• IRAM: Do you know what it is?

© Alvin R. Lebeck 2001

CPS 220

Page 2

4

Review: Reducing Miss Penalty Summary • Five techniques – – – – –

Read priority over write on miss Subblock placement Early Restart and Critical Word First on miss Non-blocking Caches (Hit Under Miss) Second Level Cache

• Can be applied recursively to Multilevel Caches – Danger is that time to DRAM will grow with multiple levels in between

© Alvin R. Lebeck 2001

CPS 220

5

Review: Improving Cache Performance 1. Reduce the miss rate, 2. Reduce the miss penalty, or 3. Reduce the time to hit in the cache

© Alvin R. Lebeck 2001

CPS 220

Page 3

6

Review: Cache Optimization Summary Technique Larger Block Size Higher Associativity Victim Caches Pseudo-Associative Caches HW Prefetching of Instr/Data Compiler Controlled Prefetching Compiler Reduce Misses Priority to Read Misses Subblock Placement Early Restart & Critical Word 1st Non-Blocking Caches Second Level Caches Small & Simple Caches Avoiding Address Translation Pipelining Writes

© Alvin R. Lebeck 2001

MR + + + + + + +

MP HT – –

+ + + + +

+



+ + +

Complexity 0 1 2 2 2 3 0 1 1 2 3 2 0 2 1

CPS 220

7

System Organization interrupts

Processor

Cache

Core Chip Set

I/O Bus

Main Memory Disk Controller

Disk

© Alvin R. Lebeck 2001

CPS 220

Page 4

Disk

Graphics Controller Graphics

Network Interface

Network

8

Computer Architecture • Interface Between Hardware and Software Applications Operating System

CPU

Software Compiler

This is IT

Memory

Multiprocessor

I/O

Hardware

Networks

© Alvin R. Lebeck 2001

CPS 220

9

Memory Hierarchy 101 Very fast Cost Effective Memory System (Price/Performance) © Alvin R. Lebeck 2001

CPS 220

Page 5

10

Virtual Memory: Motivation • Process = Address Space + thread(s) of control • Address space = PA

Virtual Physical

– programmer controls movement from disk – protection? – relocation?

• Linear Address space – larger than physical address space » 32, 64 bits v.s. 28-bit physical (256MB)

• Automatic management © Alvin R. Lebeck 2001

CPS 220

11

Virtual Memory • Process = virtual address space + thread(s) of control • Translation – VA -> PA – What physical address does virtual address A map to – Is VA in physical memory?

• Protection (access control) – Do you have permission to access it?

© Alvin R. Lebeck 2001

CPS 220

Page 6

12

Virtual Memory: Questions • How is data found if it is in physical memory? • Where can data be placed in physical memory? Fully Associative, Set Associative, Direct Mapped

• What data should be replaced on a miss? (Take Compsci 210 …)

© Alvin R. Lebeck 2001

CPS 220

13

Segmented Virtual Memory • Virtual address (232, 264) to Physical Address mapping (230) • Variable size, base + offset, contiguous in both VA and PA Virtual 0x1000 0x6000

Physical 0x0000 0x1000 0x2000

0x9000 0x11000

© Alvin R. Lebeck 2001

CPS 220

Page 7

14

Intel Pentium Segmentation Physical Address Space

Logical Address Seg Selector

Offset

Global Descriptor Table (GDT)

Segment Descriptor

Segment Base Address

© Alvin R. Lebeck 2001

CPS 220

15

Pentium Segmention (Continued) • Segment Descriptors – Local and Global – base, limit, access rights – Can define many

• Segment Registers – contain segment descriptors (faster than load from mem) – Only 6

• Must load segment register with a valid entry before segment can be accessed – generally managed by compiler, linker, not programmer

© Alvin R. Lebeck 2001

CPS 220

Page 8

16

Paged Virtual Memory • Virtual address (232, 264) to Physical Address mapping (228) – virtual page to physical page frame

Virtual page number

Offset

• Fixed Size units for access control & translation Virtual 0x1000 0x6000

Physical 0x0000 0x1000 0x2000

0x9000 0x11000

© Alvin R. Lebeck 2001

CPS 220

17

Page Table • Kernel data structure (per process) • Page Table Entry (PTE) – VA -> PA translations (if none page fault) – access rights (Read, Write, Execute, User/Kernel, cached/uncached) – reference, dirty bits

• Many designs – Linear, Forward mapped, Inverted, Hashed, Clustered

• Design Issues – support for aliasing (multiple VA to single PA) – large virtual address space – time to obtain translation

© Alvin R. Lebeck 2001

CPS 220

Page 9

18

Alpha VM Mapping (Forward Mapped) • “64-bit” address divided into 3 21 segments seg 0/1 – seg0 (bit 63=0) user code/heap – seg1 (bit 63 = 1, 62 = 1) user stack – kseg (bit 63 = 1, 62 = 0) base kernel segment for OS +

• Three level page table, each one page

L1 10

L2 10

+

– Alpha 21064 only 43 unique bits of VA – (future min page size up to 64KB => 55 bits of VA)

+

• PTE bits; valid, kernel & user read & write enable (No reference, use, or dirty bit)

phys page frame number

– What do you do for replacement?

© Alvin R. Lebeck 2001

L3 PO 10 13

CPS 220

19

Inverted Page Table (HP, IBM) Virtual page number

• One PTE per page frame

Offset Inverted Page Table (IPT)

Hash

VA

PA,ST

– only one VA per physical frame

• Must search for virtual address • More difficult to support aliasing • Force all sharing to use the same VA

Hash Anchor Table (HAT)

© Alvin R. Lebeck 2001

CPS 220

Page 10

20

Intel Pentium Segmentation + Paging Logical Address

Dir Table

Seg Selector

Physical Address Space

Linear Address Space Offset

Offset Page Table

Global Descriptor Table (GDT) Page Dir

Segment Descriptor

Segment Base Address

© Alvin R. Lebeck 2001

CPS 220

21

The Memory Management Unit (MMU) • Input – virtual address

• Output – physical address – access violation (exception, interrupts the processor)

• Access Violations – – – – –

not present user v.s. kernel write read execute

© Alvin R. Lebeck 2001

CPS 220

Page 11

22

Translation Lookaside Buffers (TLB) • Need to perform address translation on every memory reference – 30% of instructions are memory references – 4-way superscalar processor – at least one memory reference per cycle

• Make Common Case Fast, others correct • Throw HW at the problem • Cache PTEs

© Alvin R. Lebeck 2001

CPS 220

23

Fast Translation: Translation Buffer • Cache of translated addresses • Alpha 21164 TLB: 48 entry fully associative Page Number 1

Page offset 2

v r w

phys frame

tag

...

... ...

3

© Alvin R. Lebeck 2001

4

...

48

48:1 mux

CPS 220

Page 12

24

TLB Design • • • •

Must be fast, not increase critical path Must achieve high hit ratio Generally small highly associative Mapping change – page removed from physical memory – processor must invalidate the TLB entry

• PTE is per process entity – Multiple processes with same virtual addresses – Context Switches?

• Flush TLB • Add ASID (PID) – part of processor state, must be set on context switch

© Alvin R. Lebeck 2001

CPS 220

25

Hardware Managed TLBs • Hardware Handles TLB miss • Dictates page table organization • Compilicated state machine to “walk page table”

CPU

TLB

Control

– Multiple levels for forward mapped – Linked list for inverted

• Exception only if access violation

Memory

© Alvin R. Lebeck 2001

CPS 220

Page 13

26

Software Managed TLBs • Software Handles TLB miss • Flexible page table organization • Simple Hardware to detect Hit or Miss • Exception if TLB miss or access violation • Should you check for access violation on TLB miss?

CPU

TLB

Control

Memory

© Alvin R. Lebeck 2001

CPS 220

27

Mapping the Kernel • Digital Unix Kseg

264-1 User Stack

– kseg (bit 63 = 1, 62 = 0)

• Kernel has direct access to physical memory • One VA->PA mapping for entire Kernel • Lock (pin) TLB entry

Kernel

– or special HW detection

Physical Memory

Kernel

User Code/ Data

0 © Alvin R. Lebeck 2001

CPS 220

Page 14

28

Considerations for Address Translation Large virtual address space • Can map more things – – – –

files frame buffers network interfaces memory from another workstation

• Sparse use of address space • Page Table Design – space – less locality => TLB misses

OS structure • microkernel => more TLB misses © Alvin R. Lebeck 2001

CPS 220

29

Address Translation for Large Address Spaces • Forward Mapped Page Table – grows with virtual address space » worst case 100% overhead not likely – TLB miss time: memory reference for each level

• Inverted Page Table – grows with physical address space » independent of virtual address space usage – TLB miss time: memory reference to HAT, IPT, list search

© Alvin R. Lebeck 2001

CPS 220

Page 15

30

Hashed Page Table (HP) Virtual page number

• Combine Hash Table and IPT [Huck96]

Offset

– can have more entries than physical page frames

Hash Hashed Page Table (HPT) VA

PA,ST

• Must search for virtual address • Easier to support aliasing than IPT • Space – grows with physical space

• TLB miss – one less memory ref than IPT

© Alvin R. Lebeck 2001

CPS 220

31

Clustered Page Table (SUN) VPBN

Boff

Offset

...

Hash

– virtual page block number (VPBN) – block offset

VPBN next PA0 attrib

VPBN next PA0 attrib

...

VPBN next PA0 attrib PA1 attrib PA2 attrib PA3 attrib VPBN next PA0 attrib

• Combine benefits of HPT and Linear [Talluri95] • Store one base VPN (TAG) and several PPN values

© Alvin R. Lebeck 2001

CPS 220

Page 16

32

Reducing TLB Miss Handling Time • Problem – must walk Page Table on TLB miss – usually incur cache misses – big problem for IPC in microkernels

• Solution – build a small second-level cache in SW – on TLB miss, first check SW cache » use simple shift and mask index to hash table

© Alvin R. Lebeck 2001

CPS 220

33

Cache Indexing • Tag on each block – No need to check index or block offset

• Increasing associativity shrinks index, expands tag Block Address

TAG

Index

Block offset

Fully Associative: No index Direct-Mapped: Large index

© Alvin R. Lebeck 2001

CPS 220

Page 17

34

Address Translation and Caches • Where is the TLB wrt the cache? • What are the consequences? • Most of today’s systems have more than 1 cache – Digital 21164 has 3 levels – 2 levels on chip (8KB-data,8KB-inst,96KB-unified) – one level off chip (2-4MB)

• Does the OS need to worry about this? Definition: page coloring = careful selection of va->pa mapping

© Alvin R. Lebeck 2001

CPS 220

35

TLBs and Caches CPU

CPU

VA Tags

PA Tags

$

TLB

PA

PA

PA L2 $

TLB

MEM

MEM

Conventional Organization

Virtually Addressed Cache Translate only on miss Alias (Synonym) Problem

© Alvin R. Lebeck 2001

$

VA

PA $

VA

VA

VA TLB

CPU

CPS 220

Page 18

MEM Overlap $ access with VA translation: requires $ index to remain invariant across translation

36

Virtual Caches • Send virtual address to cache. Called Virtually Addressed Cache or just Virtual Cache vs. Physical Cache or Real Cache • Avoid address translation before accessing cache – faster hit time to cache

• Context Switches? – Just like the TLB (flush or pid) – Cost is time to flush + “compulsory” misses from empty cache – Add process identifier tag that identifies process as well as address within process: can’t get a hit if wrong process

• I/O must interact with cache

© Alvin R. Lebeck 2001

CPS 220

37

I/O and Virtual Caches Virtual Cache Physical Addresses I/O is accomplished with physical addresses DMA • flush pages from cache • need pa->va reverse translation • coherent DMA

© Alvin R. Lebeck 2001

interrupts

Processor

Cache

Memory Bus

I/O Bridge

I/O Bus

Main Memory Disk Controller

Disk

CPS 220

Page 19

Disk

Graphics Controller Graphics

Network Interface

Network

38

Aliases and Virtual Caches 264-1

• aliases (sometimes called synonyms); Two different virtual addresses map to same physical address • But, but... the virtual address is used to index the cache • Could have data in two different locations in the cache

Physical Memory

User Stack

Kernel

Kernel

User Code/ Data

0 © Alvin R. Lebeck 2001

CPS 220

39

Index with Physical Portion of Address • If index is physical part of address, can start tag access in parallel with translation so that can compare to physical tag Page Offset

Page Address

Address Tag

Index

Block Offset

• Limits cache to page size: what if want bigger caches and use same trick? – Higher associativity – Page coloring

© Alvin R. Lebeck 2001

CPS 220

Page 20

40

Page Coloring for Aliases • HW that guarantees that every cache frame holds unique physical address • OS guarantee: lower n bits of virtual & physical page numbers must have same value; if direct-mapped, then aliases map to same cache frame – one form of page coloring

Page Offset Page Address

Address Tag

Block Offset Index

© Alvin R. Lebeck 2001

CPS 220

41

Page Coloring to reduce misses Cache

Page frames

• Notion of bin – region of cache that may contain cache blocks from a page

• Random vs careful mapping • Selection of physical page frame dictates cache index • Overall goal is to minimize cache misses

© Alvin R. Lebeck 2001

CPS 220

Page 21

42

Careful Page Mapping [Kessler92, Bershad94] • Select a page frame such that cache conflict misses are reduced – only choose from available pages (no VM replacement induced)

• static – “smart” selection of page frame at page fault time

• dynamic – move pages around

© Alvin R. Lebeck 2001

CPS 220

43

A Case for Large Pages • Page table size is inversely proportional to the page size – memory saved

• Fast cache hit time easy when cache > # of cache blocks in page – must be conflict misses – interrupt processor – move a page (recolor)

• Cost of moving page