Chapter 6. Data Types

Author: Jesse Moore

Chapter 6 Topics
• Introduction
• Primitive Data Types
• Character String Types
• User-Defined Ordinal Types
• Array Types
• Associative Arrays
• Record Types
• Union Types
• Pointer and Reference Types

Introduction
• A data type defines
  – a collection of data values, and
  – a set of predefined operations on those values

Primitive Data Types
• Almost all programming languages provide a set of primitive data types
• Primitive data types are those not defined in terms of other data types
• Some primitive data types are merely reflections of the hardware
• Others require only a little non-hardware support

Primitive Data Types: Integer
• Almost always an exact reflection of the hardware, so the mapping is trivial
• There may be as many as eight different integer types in a language
• Java's signed integer sizes: byte, short, int, long

Primitive Data Types: Floating Point
• Model real numbers, but only as approximations
• Languages for scientific use support at least two floating-point types (e.g., float and double); sometimes more
• Usually exactly like the hardware, but not always
• IEEE Floating-Point Standard 754 layout: sign bit | exponent (8 or 11 bits) | fraction (23 or 52 bits)

More on Floating Point
• Precision: the accuracy of the fractional part of a value, measured as the number of bits
• Range: a combination of the range of fractions and the range of exponents
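The sign, exponent, and fraction fields can be inspected directly. The following is a minimal sketch, assuming the 64-bit IEEE 754 double format (1 sign bit, 11 exponent bits, 52 fraction bits); the helper names float_bits and is_exact_sum are illustrative and not part of the slides.

```python
import struct

def float_bits(x):
    """Split a float, encoded as an IEEE 754 double, into (sign, exponent, fraction)."""
    # Pack as a big-endian double, then reinterpret the same 8 bytes as an integer.
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = raw >> 63                    # 1 sign bit
    exponent = (raw >> 52) & 0x7FF      # 11 exponent bits (biased by 1023)
    fraction = raw & ((1 << 52) - 1)    # 52 fraction bits
    return sign, exponent, fraction

def is_exact_sum():
    """Report whether 0.1 + 0.2 equals 0.3 exactly (it does not: all three are approximations)."""
    return 0.1 + 0.2 == 0.3
```

Here the fraction width sets the precision, and the exponent width sets most of the range.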

Primitive Data Types: Complex
• Represented as an ordered pair of floating-point numbers
• Python specifies the imaginary part by following it with a j or J, e.g., (7 + 3j)
• Languages that support a complex type include operations for arithmetic on complex values

Primitive Data Types: Decimal
• For business applications (money)
  – Essential to COBOL
  – C# offers a decimal data type
• Store a fixed number of decimal digits
[Figure: BCD code, showing decimal digits stored in 1-byte cells]
• Advantage: accuracy
• Disadvantages: limited range, wastes memory
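One common BCD layout packs two decimal digits into each byte. The sketch below assumes that packed layout and non-negative integers; to_packed_bcd and from_packed_bcd are hypothetical helper names, and real decimal types also store a sign and a scale, which are omitted here.

```python
def to_packed_bcd(number):
    """Encode a non-negative integer as packed BCD: two decimal digits per byte."""
    if number < 0:
        raise ValueError("this sketch handles non-negative integers only")
    digits = str(number)
    if len(digits) % 2:                 # pad to an even number of digits
        digits = "0" + digits
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])   # high nibble, low nibble
        for i in range(0, len(digits), 2)
    )

def from_packed_bcd(data):
    """Decode packed BCD bytes back into an integer."""
    return int("".join(f"{byte >> 4}{byte & 0x0F}" for byte in data))
```

Each nibble can hold 16 values but only 10 are used, which is the memory waste noted above.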

Primitive Data Types: Boolean
• Simplest of all
• Range of values: two elements, one for "true" and one for "false"
• Could be implemented as bits, but often as bytes
• Advantage: readability (compared with using integers to represent switches/flags)

Primitive Data Types: Character
• Stored as numeric codings
• Most commonly used coding: ASCII
• An alternative, 16-bit coding: Unicode
  – Includes characters from most natural languages
  – Originally used in Java
  – C# and JavaScript also support Unicode

Character String Types
• Values are sequences of characters
• Design issues:
  – Is it a primitive type or just a special kind of array?
  – Should the length of strings be static or dynamic?

Operations of Character String Type
• Typical operations:
  – Assignment and copying (strcpy in C)
  – Comparison (=, >, etc.) (strcmp in C)
  – Catenation (strcat in C)
  – Substring reference (NAME1(2:4) in Ada)
  – Pattern matching
• In Perl, string-matching patterns are defined in terms of regular expressions
    /[A-Za-z][A-Za-z\d]+/
    /\d+\.?\d*|\.\d+/
  where \d matches a digit, | means "or", and ? means zero or one of the preceding element
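The two Perl patterns can be tried unchanged with Python's re module. This is a small sketch; the names IDENTIFIER, NUMBER, is_identifier, and is_number are chosen here for illustration, and fullmatch is used so that the whole string must match.

```python
import re

# The two patterns from the slide, written as Python raw strings.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z\d]+")   # a letter followed by one or more letters or digits
NUMBER = re.compile(r"\d+\.?\d*|\.\d+")           # digits with an optional point, or a point then digits

def is_identifier(text):
    """True if the whole string matches the first pattern."""
    return IDENTIFIER.fullmatch(text) is not None

def is_number(text):
    """True if the whole string matches the second pattern."""
    return NUMBER.fullmatch(text) is not None
```

Because of the trailing +, the first pattern as written rejects one-character names such as "x".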

Character String Type in Certain Languages
• C and C++
  – Not primitive
  – Use char arrays and a library of functions that provide operations
• SNOBOL4 (a string manipulation language)
  – Primitive
  – Many operations, including elaborate pattern matching
• Java
  – Primitive via the String class

Character String Length Options
• Static: length is static and set when the string is created
• Limited dynamic length: varying length up to a declared maximum
  – C and C++
• Dynamic: no maximum
  – SNOBOL4, Perl, JavaScript
• Ada supports all three string length options

Character String Type Evaluation
• Aid to writability
• As a primitive type with static length, they are inexpensive to provide, so why not have them?
• Dynamic length is nice, but is it worth the expense?

Character String Implementation
• Static length: needs a compile-time descriptor (fields describing the character string)
• Limited dynamic length: may need a run-time descriptor for length (but not in C and C++, where the end of the string is marked with the null character)
• Dynamic length: needs a run-time descriptor; allocation/de-allocation is the biggest implementation problem

Compile- and Run-Time Descriptors
[Figure: compile-time descriptor for static strings and run-time descriptor for limited dynamic strings; each begins with a type name entry]
• Any descriptor entries that are dynamically bound must be maintained at run time

User-Defined Ordinal Types
• An ordinal type is one in which the range of possible values can be easily associated with the set of positive integers (that is, the values are countable)
• Examples of primitive ordinal types in Java
  – integer
  – char
  – boolean
• User-defined ordinal types
  – enumeration
  – subrange

Enumeration Types
• The user enumerates all of the possible values, which are named constants
• C# example
    enum days {mon, tue, wed, thu, fri, sat, sun};
  – A variable of this type can be assigned one of the values {mon, tue, wed, thu, fri, sat, sun}
  – A variable of this type can be tested for equality or inequality with one of these values
  – A variable of this type can be tested for being greater than or less than one of these values

Design Issues of Enumeration Types
• Is an enumeration constant allowed to appear in more than one type definition, and if so, how is the type of an occurrence of that constant checked?
    enum days {mon, tue, wed, thu, fri, sat, sun};
    enum planets {sun, earth, mars, moon};
• Are enumeration values coerced to integer?
• Is any other type coerced to an enumeration type?

Design of Enumeration Types
• In languages that do not have enumeration types
  – Programmers usually simulate them with integer values
  – Example in C:
      int red = 0, blue = 1;
  – Problem: no type checking when they are used; all of the following are legal:
      red + blue
      blue + 8 + red / 2
      red = 3

Enumeration Type in C++ (1/2)
• C++ includes C's enumeration type
    enum colors {red, blue, green, yellow, black};
    colors myColor = blue, yourColor = red;
• Enumeration values are coerced to int when they are put in integer context
    myColor++;   // would assign green to myColor if myColor is currently blue
• In fact, C++ allows enumeration constants to be assigned to variables of any numeric type

Enumeration Type in C++ (2/2)
• Values of other types are not coerced to an enumeration type
    myColor = 4;   // illegal in C++
• C++ enumeration constants can appear in only one enumeration type in the same referencing environment

Enumeration Type in Ada
• Ada permits overloaded literals: literals are allowed to appear in more than one declaration
• Neither the enumeration literals nor the enumeration variables are coerced to integers
  – Both the range of operations and the range of values of enumeration types are restricted

Enumeration Type in C#
• Like C++, but enumeration types are never coerced to integers
• Operations on enumeration types are restricted to those that make sense
• The range of values is restricted to that of the particular enumeration type

Evaluation of Enumerated Type
• Aid to readability, e.g., no need to code a color as a number
• Aid to reliability (in Ada, C#, and Java 5.0):
  – No arithmetic operations are legal on enumeration types, so colors, for example, cannot be added
  – No enumeration variable can be assigned a value outside its defined range
• C treats enumeration variables like integer variables, so it doesn't have such advantages
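Python's enum.Enum shows the same two reliability properties. This sketch is an illustration only; Day, can_add, and from_number are names invented here, and Python is not one of the languages evaluated on the slide.

```python
from enum import Enum

class Day(Enum):
    MON = 1
    TUE = 2
    WED = 3

def can_add(a, b):
    """Return True if a + b is a legal operation, False if it is a type error."""
    try:
        a + b
        return True
    except TypeError:
        return False

def from_number(n):
    """Convert an integer to a Day, or return None if n is outside the defined values."""
    try:
        return Day(n)
    except ValueError:
        return None
```

Adding two Day members fails, and a number with no corresponding member cannot become a Day.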

Subrange Types
• An ordered contiguous subsequence of an ordinal type
  – Example: 12..18 is a subrange of integer type
• Ada's design
    type Days is (mon, tue, wed, thu, fri, sat, sun);
    subtype Weekdays is Days range mon..fri;
    subtype Index is Integer range 1..100;
    Day1: Days;
    Day2: Weekdays;
    Day2 := Day1;   -- legal unless Day1 is sat or sun; requires run-time range checking

Usages of Subrange
• The compiler must generate range-checking code for every assignment to a subrange variable
• For the indices of arrays (most commonly)
  – Arrays are of limited size
• For loop variables
  – Subranges of ordinal types are the only way the range of Ada for loop variables can be specified

Subrange Evaluation
• Aid to readability
  – Makes it clear to readers that variables of a subrange type can store only a certain range of values
• Reliability
  – Assigning a value to a subrange variable that is outside the specified range is detected as an error
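The range check that a compiler inserts for subrange assignments can be imitated by hand. This is a rough sketch, assuming integer bounds; the Subrange class is hypothetical and stands in for compiler-generated checking code.

```python
class Subrange:
    """A variable restricted to low..high, checked on every assignment."""

    def __init__(self, low, high, value):
        self.low = low
        self.high = high
        self.value = value          # goes through the checked setter below

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        # The check a compiler would insert before each assignment.
        if not (self.low <= new_value <= self.high):
            raise ValueError(f"{new_value} is outside {self.low}..{self.high}")
        self._value = new_value

def assign_ok(variable, new_value):
    """Try an assignment; return True if it passed the range check."""
    try:
        variable.value = new_value
        return True
    except ValueError:
        return False
```

For example, an index declared as Subrange(1, 100, 1) accepts 100 but rejects 101, and keeps its old value after the rejected assignment.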

Implementation of User-Defined Ordinal Types
• Enumeration types are usually implemented as integers, which by themselves provide no increase in reliability
• Subrange types are implemented like the parent types, with code inserted (by the compiler) to restrict assignments to subrange variables

Array Types
• An array is an aggregate of homogeneous data elements in which an individual element is identified by its position in the aggregate, relative to the first element

Array Design Issues
• What types are legal for subscripts?
• Are subscripting expressions in element references range-checked?
• When are subscript ranges bound?
• When does allocation take place?
• What is the maximum number of subscripts?
• Can array objects be initialized?
• Are any kinds of slices allowed?

Array Indexing
• Indexing (or subscripting) is a mapping from indices to elements
    array_name (index_value_list) → element
• Index syntax
  – FORTRAN, PL/I, and Ada use parentheses
      • Ada explicitly uses parentheses to show uniformity between array references and function calls, because both are mappings
  – Most other languages use brackets

Arrays Index (Subscript) Types
• Often a subrange of integers
• Ada: any ordinal type (such as Boolean, char, and enumeration)
    type Week_Day_Type is (Monday, Tuesday, Wednesday, Thursday, Friday);
    type Sales is array (Week_Day_Type) of Float;
• Checking subscript ranges?
  – C, C++, Perl, and Fortran do not specify range checking
  – Java, ML, and C# specify range checking

Array Categories
• Based on the binding to subscript ranges, the binding to storage, and from where the storage is allocated
• Five categories
  – Static array
  – Fixed stack-dynamic array
  – Stack-dynamic array
  – Fixed heap-dynamic array
  – Heap-dynamic array

Static Arrays
• Subscript ranges are statically bound and storage allocation is static (before run time)
  – Advantage: efficiency (no dynamic allocation)
• In C:
    float b[30];
    …
    int fun1() { static int a[20]; … };

Fixed Stack-Dynamic
• Subscript ranges are statically bound, but storage allocation is done at declaration elaboration time during execution
  – Advantage: space efficiency
• In C:
    int fun1() { int a[20]; … };
• You cannot do this, because the size must be a compile-time constant (C99 variable-length arrays relax this rule):
    int fun1(int k) { int a[k]; … };

Stack-Dynamic
• Both the subscript ranges and the storage allocation are dynamically bound at elaboration time (but fixed during lifetime)
  – Advantage: flexibility (the size of an array need not be known until the array is to be used)
• In Ada:
    Get(List_Len);          -- user input
    declare
      List: array (1..List_Len) of Integer;
    begin
      …
    end;                    -- List is deallocated here

Fixed Heap-Dynamic
• Similar to fixed stack-dynamic
  – subscript ranges are statically bound
  – storage binding is dynamic but fixed after allocation
• Differences
  – bindings are done when the user program requests them during execution
  – storage is allocated from the heap, not the stack

Fixed Heap-Dynamic Example
• In C++:
    int *a;
    …
    a = new int[20];   // a's range is statically bound; a's storage binding is dynamic
    a[1] = 18;
    …

Heap-Dynamic
• Binding of subscript ranges and storage allocation is dynamic and can change any number of times during the array's lifetime
  – Advantage: flexibility (arrays can grow or shrink during program execution)
• In Perl:
    @list = (1, 2, 4, 7, 10);   # create an array of 5 members
    …
    push(@list, 13, 17);        # add two more elements
    …
    @list = ();                 # empty the array

Subscript Binding and Array Categories (continued)
• C and C++ arrays that include the static modifier are static
• Arrays within C and C++ functions without the static modifier are fixed stack-dynamic
• Ada arrays can be stack-dynamic
• C and C++ also provide fixed heap-dynamic arrays (via malloc/free and new/delete)
• C# includes a second array class, ArrayList, that provides heap-dynamic arrays
• Perl and JavaScript support heap-dynamic arrays

Heterogeneous Arrays
• Elements need not be of the same type
• Supported by Perl, Python, JavaScript, and Ruby
• Arrays are heap-dynamic

Array Initialization
• Some languages allow initialization at the time of storage allocation
  – C, C++, Java, C# example
      int list [] = {4, 5, 7, 83};
  – Character strings in C and C++
      char name [] = "freddie";
  – Arrays of strings in C and C++
      char *names [] = {"Bob", "Jake", "Joe"};
  – Java initialization of String objects
      String[] names = {"Bob", "Jake", "Joe"};

Array Initialization in Ada
    List : array (1..5) of Integer := (1, 3, 5, 7, 9);
    Bunch : array (1..5) of Integer := (1 => 17, 3 => 34, others => 0);

List Comprehension in Python
• A function is applied to each of the elements of a given array
• A new array is constructed from the results
• Syntax
    [expression for iterate_var in array if condition]
• Example
    [x * x for x in range(12) if x % 3 == 0]
  – The range function produces the values 0, 1, 2, …, 11
  – Result: [0, 9, 36, 81]

Arrays Operations
• C: none
• Ada
  – assignment; the right side is an aggregate constant rather than an array name
  – catenation; for all single-dimensioned arrays
  – relational operators (= and /= only)
• Python
  – array catenation (+), element membership (in), identity (is), equality (==), etc.
• Fortran
  – elemental operations like pair-wise addition

Elemental Operation Example
• From MATLAB:
    >> a = [1 3 4]      % an aggregate constant
    a =
         1     3     4
    >> b = [5 7 8]
    b =
         5     7     8
    >> a+b
    ans =
         6    10    12

Arrays Operations in APL
• Addition of vectors/matrices
• Reverse elements of a vector
• Reverse columns of a matrix
• Reverse rows of a matrix
• Transpose a matrix
• Invert a matrix
• Product of two matrices
• Inner product of two vectors

Rectangular and Jagged Arrays
• A rectangular array is a multi-dimensioned array in which all of the rows have the same number of elements and all columns have the same number of elements
• A jagged matrix has rows with varying numbers of elements
  – Possible when multi-dimensioned arrays actually appear as arrays of arrays
  – Example: an array of character arrays in C
[Figure: a jagged array whose rows (Row 1, Row 2, Row 3) have different lengths]

Slices
• A slice is some substructure of an array; nothing more than a referencing mechanism
• Slices are only useful in languages that have array operations

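Slice Examples in Python
    vector = [2, 4, 6, 8, 10, 12, 14, 16]
    mat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
• vector[3:6] is [8, 10, 12]: 3 is the subscript of the first element of the slice, and 6 is the first subscript after the last element
• mat[0][0:2] is [1, 2], the first two elements of the first row
• mat[1] is [4, 5, 6], the second row
[Figure: vector drawn with subscripts 0 to 7 above the values 2 to 16; mat drawn as a 3 × 3 grid]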

Slice Examples in Fortran 95
    Integer, Dimension (10) :: Vector
    Integer, Dimension (3, 3) :: Mat
    Integer, Dimension (3, 3, 4) :: Cube
• Vector(3:6) is a four-element array
• Mat(:, 2) refers to the 2nd column of Mat
• Mat(3, :) refers to the 3rd row of Mat
• Cube(2, :, :) (illustrated on the next slide)

Slice Examples in Fortran 95
[Figure: the slices Mat(:, 2), Mat(2:3, :), Cube(2, :, :), and Cube(:, :, 2:3)]

Implementation of Arrays
• An access function maps subscript expressions to an address in memory
• Access function for single-dimensioned arrays:
    address(list[k]) = address(list[lower_bound]) + ((k - lower_bound) * element_size)
[Figure: list laid out in memory from lower_bound to k, each cell element_size wide, starting at address(list[lower_bound])]

Implementing the Access Function
• If the element type is statically bound and the array is statically bound to storage, the access function
    address(list[k]) = address(list[lower_bound]) + ((k - lower_bound) * element_size)
  can be rearranged so that its constant part, address(list[lower_bound]) - (lower_bound * element_size), is computed at compile time
• However, the addition and multiplication involving k must be done at run time
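The split between compile-time and run-time work can be sketched as follows. The function names are hypothetical, and plain integers stand in for machine addresses.

```python
def element_address(base_address, lower_bound, element_size, k):
    """Direct form of the access function for a single-dimensioned array."""
    return base_address + (k - lower_bound) * element_size

def make_access_function(base_address, lower_bound, element_size):
    """Precompute the constant part once, as a compiler could for a static array."""
    constant = base_address - lower_bound * element_size   # known before run time
    def access(k):
        # Only one multiplication and one addition remain per access.
        return constant + k * element_size
    return access
```

Both forms give the same address. With base address 1000, lower bound 1, and 4-byte elements, element 3 is at 1008.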

Accessing Multi-dimensioned Arrays
• The elements of a multi-dimensioned array must be placed, in order, into one-dimensional memory
• Two common ways:
  – Row major order (by rows): used in most languages
  – Column major order (by columns): used in Fortran
• Example matrix:
    3 4 7
    6 2 5
    1 3 8
  – Row major: 3 4 7 6 2 5 1 3 8
  – Column major (in Fortran): 3 6 1 4 2 3 7 5 8

Locating an Element in a Multidimensioned Array
• General format (row major), where n is the number of elements in a row:
    Location(a[i,j]) = address of a[row_lb,col_lb] + (((i - row_lb) * n) + (j - col_lb)) * element_size
[Figure: the array with row_lb, col_lb, and a[row_lb,col_lb] marked, and the i - row_lb rows that precede row i]
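The row-major formula, and its column-major counterpart, can be checked against the 3 × 3 matrix with rows 3 4 7, 6 2 5, and 1 3 8. This sketch works in element offsets rather than byte addresses (that is, element_size is taken as 1); the function and constant names are illustrative.

```python
def row_major_offset(i, j, row_lb, col_lb, n_cols):
    """Offset of a[i, j] from a[row_lb, col_lb], in elements, when stored by rows."""
    return (i - row_lb) * n_cols + (j - col_lb)

def column_major_offset(i, j, row_lb, col_lb, n_rows):
    """Offset of a[i, j] from a[row_lb, col_lb], in elements, when stored by columns."""
    return (j - col_lb) * n_rows + (i - row_lb)

MATRIX = [[3, 4, 7],
          [6, 2, 5],
          [1, 3, 8]]
ROW_MAJOR = [3, 4, 7, 6, 2, 5, 1, 3, 8]      # the matrix flattened by rows
COLUMN_MAJOR = [3, 6, 1, 4, 2, 3, 7, 5, 8]   # the matrix flattened by columns
```

Multiplying either offset by element_size and adding the address of the first element gives the memory location.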

Compile-Time Descriptors
[Figure: compile-time descriptors for a single-dimensioned array and for a multi-dimensional array; the index range entries are there for run-time checking of index ranges]
• Any descriptor entries that are dynamically bound must be maintained at run time

Associative Arrays
• An associative array is an unordered collection of data elements that are indexed by an equal number of values called keys
  – User-defined keys must be stored
• Design issue: What is the form of references to elements?

Associative Arrays in Perl (1/2)
• Often called hashes
• Names begin with %; literals are delimited by parentheses
    %hi_temps = ("Mon" => 77, "Tue" => 79, "Wed" => 65, …);
• Key is a string ("Mon", "Tue", …)
• Value is a scalar (number, string, or reference)

Associative Arrays in Perl (2/2)
• Subscripting is done using braces and keys
    $hi_temps{"Wed"} = 83;   # adds a new element, or replaces the value if the key already exists
• Elements can be removed
    delete $hi_temps{"Tue"};
• exists returns true or false
    if (exists $hi_temps{"Wed"}) …
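For comparison, the same operations can be written with a Python dict. This is a small sketch that follows the Perl example, with the elided entries left out; the function name hi_temps_demo is invented for illustration.

```python
def hi_temps_demo():
    """Repeat the Perl hash operations with a Python dict and return the results."""
    hi_temps = {"Mon": 77, "Tue": 79, "Wed": 65}   # literal: key/value pairs in braces
    hi_temps["Wed"] = 83                           # subscripting by key; replaces the old value
    del hi_temps["Tue"]                            # remove an element
    has_wed = "Wed" in hi_temps                    # membership test, like Perl's exists
    has_tue = "Tue" in hi_temps
    return hi_temps, has_wed, has_tue
```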

Record Types
• A record is a possibly heterogeneous aggregate of data elements in which the individual elements are identified by names
  – Records are supported with the struct data type in C, C++, and C#
• Design issues:
  – What is the syntactic form of references to the field?
  – Are elliptical references (references with some names omitted) allowed?

Comparison With Array
• Elements of an array are referenced by indices
• Elements of a record (fields) are referenced with identifiers (names)
• Records in some languages are allowed to include unions

Definition of Records in COBOL
• COBOL uses level numbers to show nested records; others use recursive definition
    01 EMPLOYEE-RECORD.
       02 EMPLOYEE-NAME.
          05 FIRST PICTURE IS X(20).
          05 MID PICTURE IS X(10).
          05 LAST PICTURE IS X(20).
       02 HOURLY-RATE PICTURE IS 99V99.
• The leading numbers (01, 02, 05) are level numbers; lines with the same level number are in the same record

Definition of Records in Ada
• Record structures are indicated in an orthogonal way
    type Employee_Name_Type is record
      First: String (1..20);
      Middle: String (1..10);
      Last: String (1..20);
    end record;
    type Employee_Record_Type is record
      Employee_Name : Employee_Name_Type;
      Hourly_Rate: Float;
    end record;
    Employee_Record : Employee_Record_Type;

Record Field References
• COBOL (from the innermost name outward)
    field_name OF record_name_1 OF ... OF record_name_n
    MID OF EMPLOYEE-NAME OF EMPLOYEE-RECORD
• Others (dot notation, from the outermost name inward)
    record_name_1.record_name_2. ... record_name_n.field_name
    Employee_Record.Employee_Name.Middle

References to Records
• Fully qualified references must include all record names
• Elliptical references allow leaving out record names as long as the reference is unambiguous
• For example, in COBOL, FIRST, FIRST OF EMPLOYEE-NAME, and FIRST OF EMPLOYEE-RECORD are elliptical references to the employee's first name

Operations on Records
• Assignment is very common if the types are identical
• Ada allows record comparison for equality or inequality
• Ada records can be initialized with aggregate literals
• COBOL provides MOVE CORRESPONDING
  – Copies a field of the source record to the corresponding field in the target record

COBOL's MOVE CORRESPONDING
    01 INPUT-RECORD.
       02 NAME.
          05 LAST PICTURE IS X(20).
          05 MID PICTURE IS X(15).
          05 FIRST PICTURE IS X(20).
       02 EMPLOYEE-NUMBER PICTURE IS 9(10).
       02 HOURS-WORKED PICTURE IS 99.
    01 OUTPUT-RECORD.
       02 NAME.
          05 FIRST PICTURE IS X(20).
          05 MID PICTURE IS X(15).
          05 LAST PICTURE IS X(20).
       02 EMPLOYEE-NUMBER PICTURE IS 9(10).
       02 GROSS-PAY PICTURE IS 99V99.
       02 NET-PAY PICTURE IS 99V99.

    MOVE CORRESPONDING INPUT-RECORD TO OUTPUT-RECORD.
• Fields with the same name are copied even when they do not appear in the same order

Evaluation and Comparison to Arrays
• Straightforward and safe design
• Records are used when the collection of data values is heterogeneous
• Access to array elements is much slower than access to record fields, because subscripts are dynamic (field names are static)
• Dynamic subscripts could be used with record field access, but that would disallow type checking and would be much slower

Implementation of Record Type
• Fields are stored in adjacent memory locations
• An offset address, relative to the beginning of the record, is associated with each field
• Field accesses are all handled using these offsets
[Figure: a record with Field1, Field2, and Field3 stored in sequence, showing the offset of Field3]

Union Types
• A union is a type that may store different type values at different times during program execution
  – All fields are stored in the same location
• Design issues
  – Should type checking be required?
  – Should unions be embedded in records?
[Figure: a union whose fields Field1, Field2, and Field3 share the same storage]

Why Unions (1/2)?
• Suppose you have three types of records:
    X: fields A, B, C
    Y: fields A, B, D
    Z: fields A, B, E
• These records are similar but not identical, and have different usages
• Then you have to write different versions of functions that act similarly for these records
    fun f1(record X) { … }
    fun f2(record Y) { … }
    fun f3(record Z) { … }

Why Unions (2/2)?
• If these records are combined into one record U with fields A, B, C, D, and E, then you need only one function
    fun f(record U) { … }
• But C, D, and E are not used at the same time, which wastes storage
• If C, D, and E are defined in a union, they occupy the same storage
• Whether C, D, or E is stored in a variable of type U depends on the usage of the variable (specified by the programmer)

Free Unions
• Fortran, C, and C++ provide union constructs in which there is no language support for type checking; the union in these languages is called a free union
• Unsafe union in C:
    union flexType {
      int intE1;
      float floatE1;
    };
    union flexType el1;
    float x;
    …
    el1.intE1 = 27;
    x = el1.floatE1;   /* not type checked */
• The assignment is not type checked because the system doesn't track the current type of the current value of el1
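The effect of reading an int through a float field can be imitated by reinterpreting the same four bytes. This sketch assumes a 32-bit int and a 32-bit IEEE 754 float in little-endian order, which is typical but not guaranteed by C; reinterpret_int_as_float is a hypothetical helper name.

```python
import struct

def reinterpret_int_as_float(value):
    """Store a 32-bit int, then read the same 4 bytes back as a 32-bit float."""
    raw = struct.pack("<i", value)          # like el1.intE1 = value
    return struct.unpack("<f", raw)[0]      # like x = el1.floatE1
```

Under these assumptions, storing 27 and reading it back as a float yields a tiny denormalized number rather than 27.0. No conversion takes place; the bits are simply reread under another type.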

Discriminated Unions
• Type checking of unions requires that each union include a type indicator called a discriminant or tag
  – Supported by Ada

Ada Union Type (Embedded in a Record)
    type Shape is (Circle, Triangle, Rectangle);
    type Colors is (Red, Green, Blue);
    type Figure (Form : Shape) is        -- Form is the discriminant
      record                             -- the enclosing record
        Filled : Boolean;
        Color : Colors;
        case Form is                     -- the union
          when Circle =>
            Diameter : Float;
          when Triangle =>
            Leftside, Rightside : Integer;
            Angle : Float;
          when Rectangle =>
            Side1, Side2 : Integer;
        end case;
      end record;

Ada Union Type Illustrated
[Figure: a discriminated union of three shape variables]

Ada Union Type
    Figure_1 : Figure;
    Figure_2 : Figure(Form => Triangle);
• Figure_1 is declared to be an unconstrained variant record that has no initial values
  – Its type can be changed by assignment of a whole record, including the discriminant:
      Figure_1 := (Filled => True, Color => Blue, Form => Rectangle, Side1 => 12, Side2 => 3);
• Figure_2 is constrained to be a triangle and cannot be changed to another variant

Type Checking Discriminated Union
    Figure_1 : Figure;
    Figure_2 : Figure(Form => Triangle);
    if (Figure_1.Diameter > 3.0) …
• The run-time system would need to check Figure_1 to determine whether its Form tag was Circle
• If it was not, it would be a type error to reference its Diameter
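The tag check can be sketched by hand. This is an illustration of the idea, not a model of how an Ada run-time system is built; the Figure class, its variant table, and the method name get are assumptions made for the example.

```python
class Figure:
    """A record with a tag (form) that decides which variant fields exist."""

    VARIANT_FIELDS = {
        "circle": {"diameter"},
        "triangle": {"leftside", "rightside", "angle"},
        "rectangle": {"side1", "side2"},
    }

    def __init__(self, form, **fields):
        if set(fields) != self.VARIANT_FIELDS[form]:
            raise TypeError(f"wrong fields for variant {form!r}")
        self.form = form                 # the discriminant
        self._fields = dict(fields)      # storage for the current variant

    def get(self, name):
        # The run-time check: the field must belong to the current variant.
        if name not in self.VARIANT_FIELDS[self.form]:
            raise TypeError(f"{name!r} is not a field of variant {self.form!r}")
        return self._fields[name]

def diameter_or_none(figure):
    """Read the diameter if the tag allows it; otherwise report the type error as None."""
    try:
        return figure.get("diameter")
    except TypeError:
        return None
```

Reading diameter from a circle succeeds, while the same reference on a rectangle is rejected by the tag check.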

Evaluation of Unions
• Potentially unsafe constructs in some languages
  – Do not allow type checking
• Java and C# do not support unions
  – Reflective of growing concerns for safety in programming languages

Implementation of Union Types
• An Ada union:
    type Node (Tag : Boolean) is
      record
        case Tag is
          when True => Count : Integer;
          when False => Sum : Float;
        end case;
      end record;
[Figure: descriptor for the union]

Pointer and Reference Types
• A pointer type variable has a range of values that consists of memory addresses and a special value, nil
• Usage
  – indirect addressing
  – dynamic memory management; that is, a pointer can be used to access a location in the area where storage is dynamically created (usually called a heap)

Design Issues of Pointers
• What are the scope and lifetime of a pointer variable?
• What is the lifetime of a heap-dynamic variable?
• Are pointers restricted as to the type of value to which they can point?
• Are pointers used for dynamic storage management, indirect addressing, or both?
• Should the language support pointer types, reference types, or both?

Pointer Operations
• Two fundamental operations: assignment and dereferencing
• Assignment is used to set a pointer variable's value to some useful address
• Dereferencing yields the value stored at the location represented by the pointer's value
  – Dereferencing can be explicit or implicit
  – C++ uses an explicit operation via *
      j = *ptr   sets j to the value located at ptr

Pointer Dereferencing Illustrated
[Figure: the assignment operation j = *ptr]

When Pointer Points to Records…
• If a pointer p points to a record with a field named age, there are two equivalent references in C and C++:
    (*p).age
    p->age
• In Ada, p.age can be used

Problems with Pointers
• Dangling pointers (dangerous)
  – A pointer points to a heap-dynamic variable that has been de-allocated
• Creating one (with explicit deallocation):
  – Allocate a heap-dynamic variable and set a pointer to point at it
  – Set a second pointer to the value of the first pointer
  – Deallocate the heap-dynamic variable, using the first pointer
  – The second pointer becomes a dangling pointer

Problems with Pointers
• Lost heap-dynamic variable
  – An allocated heap-dynamic variable that is no longer accessible to the user program (often called garbage)
• Creating one:
  – Pointer p1 is set to point to a newly created heap-dynamic variable
  – p1 is later set to point to another newly created heap-dynamic variable
• The process of losing heap-dynamic variables is called memory leakage

Pointers in Ada
• Some dangling pointers are disallowed
  – A heap-dynamic variable can be automatically de-allocated at the end of the scope of its pointer type
  – Lessens the need for explicit de-allocation
  – Explicit de-allocation is the major source of dangling pointers
• The lost heap-dynamic variable problem is not eliminated by Ada

Pointers in C and C++
• Extremely flexible but must be used with care
• Pointers can point at any variable regardless of where it is allocated (not limited to the heap)
• Used for dynamic storage management and addressing
• Pointer arithmetic is possible
• Explicit dereferencing (*) and address-of (&) operators
• Domain type need not be fixed (void *)
• void * can point to any type; type checking is not a problem because a void * cannot be dereferenced

Pointer Arithmetic in C and C++
• Array names without subscripts refer to the address of the first element
• An array name is treated exactly like a constant pointer
    float stuff[100];
    float *p;
    p = stuff;
• *(p+5) is equivalent to stuff[5]
• p[i] is equivalent to stuff[i]

Pointers in Fortran 95
• Pointers point to heap and non-heap variables
• Implicit dereferencing
• Pointers can only point to variables that have the TARGET attribute
• The TARGET attribute is assigned in the declaration:
    INTEGER, TARGET :: Apple    ! can be pointed to
    INTEGER :: Orange           ! cannot be pointed to

C++ Reference Types
• Constant pointers that are always implicitly dereferenced
[Figure: a pointer can be made to point to Cell A and later to Cell B; a reference stays bound to Cell A for its whole lifetime]

C++ Reference Types (Cont.)
    int result = 0;
    int &ref_result = result;   // ref_result is a reference variable; reference variables must be initialized
    …
    ref_result = 100;
• Because a reference stays bound to one cell for its whole lifetime, assigning to ref_result always sets the contents of the cell it refers to, never ref_result itself
• ref_result can be treated as an alias of result
• Used primarily for formal parameters
• Advantages of both pass-by-reference and pass-by-value

C++ Reference Variable Example
• C++ parameters are passed by value
    int n;
    ...
    f1(n);
    f2(&n);
• A reference-type formal parameter is used like any other parameter
    int f1(int &r) { … r = 5; }
• Passing a pointer requires explicit dereferencing
    int f2(int *p) { … *p = 5; }

Reference Variables in Java and C#
• Java extends C++'s reference variables and allows them to replace pointers entirely
  – Java removes C/C++-style pointers
  – References refer to class instances
• C# includes both the references of Java and the pointers of C++

Evaluation of Pointers
• Dangling pointers and garbage are problems, as is heap management
• Pointers are like gotos: they widen the range of cells that can be accessed by a variable
• Pointers or references are necessary for dynamic data structures, so we can't design a language without them

Representations of Pointers
• Large computers use single values
• Intel microprocessors use segment and offset
    056f:7403   (segment:offset; an offset addresses at most 64 KB within a memory segment)

Dangling Pointer Problem Solutions
• Tombstone: an extra heap cell that is a pointer to the heap-dynamic variable
  – The actual pointer variable points only at tombstones
  – When a heap-dynamic variable is de-allocated, the tombstone remains but is set to nil
  – Costly in time and space
[Figure: pointer → tombstone → heap-dynamic variable; the tombstone is nil when the variable is deallocated]
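A toy model makes the tombstone mechanism concrete. This sketch assumes a simulated heap held in a dict, with None playing the role of nil; TombstoneHeap and its method names are hypothetical.

```python
class TombstoneHeap:
    """Simulated heap in which every pointer goes through a tombstone cell."""

    def __init__(self):
        self._cells = {}         # address -> value of a heap-dynamic variable
        self._tombstones = []    # tombstone index -> address, or None (nil)
        self._next_address = 0

    def allocate(self, value):
        """Create a heap-dynamic variable; the returned pointer is a tombstone index."""
        address = self._next_address
        self._next_address += 1
        self._cells[address] = value
        self._tombstones.append(address)
        return len(self._tombstones) - 1

    def deallocate(self, pointer):
        """Free the variable; the tombstone stays behind, set to nil."""
        address = self._tombstones[pointer]
        if address is None:
            raise RuntimeError("variable was already deallocated")
        del self._cells[address]
        self._tombstones[pointer] = None

    def dereference(self, pointer):
        """Follow pointer -> tombstone -> variable, detecting dangling pointers."""
        address = self._tombstones[pointer]
        if address is None:
            raise RuntimeError("dangling pointer detected")
        return self._cells[address]
```

Copies of a pointer share one tombstone, so deallocating through one copy makes every other copy detectably dangling. The cost is an extra indirection on each access and tombstones that are never reclaimed.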

Dangling Pointer Problem Solutions
• Locks-and-keys:
  – Pointer values are represented as (key, address) pairs
  – Heap-dynamic variables are represented as the variable plus a cell for an integer lock value
  – When a heap-dynamic variable is allocated, a lock value is created and placed both in the lock cell and in the key cell of the pointer
[Figure: while the variable is allocated, the pointer's key and the variable's lock cell hold the same lock value; after deallocation the lock cell holds a different value (#*@!&%), so the key no longer matches]
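The locks-and-keys scheme can be modeled in the same style. In this sketch the lock values come from a simple counter and 0 serves as the illegal lock value written on deallocation; both choices, and the names LockedHeap and ILLEGAL_LOCK, are assumptions made for illustration.

```python
import itertools

class LockedHeap:
    """Simulated heap in which each variable carries a lock and each pointer a key."""

    ILLEGAL_LOCK = 0   # value written into the lock cell on deallocation

    def __init__(self):
        self._cells = {}                    # address -> [lock, value]
        self._locks = itertools.count(1)    # source of fresh lock values
        self._next_address = 0

    def allocate(self, value):
        """Create a variable and return a pointer as a (key, address) pair."""
        lock = next(self._locks)
        address = self._next_address
        self._next_address += 1
        self._cells[address] = [lock, value]
        return (lock, address)              # the key starts equal to the lock

    def deallocate(self, pointer):
        """Free the variable by clearing its lock, so no existing key matches."""
        key, address = pointer
        if self._cells[address][0] != key:
            raise RuntimeError("dangling pointer detected")
        self._cells[address][0] = self.ILLEGAL_LOCK

    def dereference(self, pointer):
        """Compare the pointer's key with the lock before allowing access."""
        key, address = pointer
        lock, value = self._cells[address]
        if key != lock:
            raise RuntimeError("dangling pointer detected")
        return value
```

Copying a pointer copies its key, so any number of pointers can reach the variable until its lock is cleared.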

Type Checking
• We generalize the concept of operands and operators to include subprograms and assignments
    b + c                    operator: +; operands: b and c
    int sub1(int a, int b)   operator: sub1; operands: a and b
    a = b + c;               operator: =; operands: a and b + c

Type Checking
• Type checking is the activity of ensuring that the operands of an operator are of compatible types
• A compatible type is one that is either legal for the operator, or is allowed under language rules to be implicitly converted, by compiler-generated code, to a legal type
  – This automatic conversion is called a coercion
• A type error is the application of an operator to an operand of an inappropriate type

Type Checking (continued)
• If all type bindings are static, nearly all type checking can be static
• If type bindings are dynamic, type checking must be dynamic (done at run time)
• A programming language is strongly typed if type errors are always detected
  – This requires that the types of all operands can be determined, either at compile time or at run time
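Python binds types dynamically, so it offers a quick way to see dynamic type checking and coercion side by side. The helper checked_add is a name invented for this sketch.

```python
def checked_add(a, b):
    """Apply + and report a type error instead of letting it propagate."""
    try:
        return a + b
    except TypeError as error:
        # Detected at run time, when the operand types are finally known.
        return f"type error: {error}"
```

An int plus a float is accepted because the int operand is coerced to float. An int plus a str is rejected, but only when the addition actually executes.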

Strong Typing
• Advantage of strong typing: allows the detection of the misuses of variables that result in type errors
• Language examples:
  – FORTRAN 77 is not: parameters, EQUIVALENCE
  – C and C++ are not: parameter type checking can be avoided; unions are not type checked
  – Ada is, almost (UNCHECKED CONVERSION is a loophole); Java is similar

Strong Typing (continued)
• Coercion rules strongly affect strong typing; they can weaken it considerably (C++ versus Ada)
  – Suppose a and b are int and d is float
      a + b;   intended
      a + d;   keying error
  – C++ detects no error; a is coerced to float
• Although Java has just half the assignment coercions of C++, its strong typing is still far less effective than that of Ada

Type Compatibility
• How can we determine that two variables have compatible types?
  – Predefined scalar types: simple
  – Structured types and user-defined types: use type equivalence rules (no coercion)
• Type equivalence rules
  – Name type equivalence: judge by name
  – Structure type equivalence: judge by structure

Name Type Equivalence
• Name type equivalence: two variables have equivalent types if
  – they are in the same declaration, or
  – they are in declarations that use the same type name
• Example with a subrange type; count and index have equivalent types in both forms:
    type Indextype is 1..100;
    count, index : Indextype;
  or
    type Indextype is 1..100;
    count : Indextype;
    …
    index : Indextype;

Pros and Cons of Name Type Equivalence
• Easy to implement but highly restrictive
  – Subranges of integer types are not equivalent to integer types
      type Indextype is 1..100;
      count : Integer;
      index : Indextype;
    count and index are not equivalent


  – Formal parameters must be the same type as their corresponding actual parameters (Pascal)
      type Mytype = integer;
      var count : Mytype;
      procedure s1(a : integer);   { a is the formal parameter }
      begin … end;

      begin
        …
        s1(count);                 { count is the actual parameter }
        …
    count and a are not equivalent types
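For contrast, C relaxes this rule for renamed types: a typedef introduces an alias rather than a new type, so the analogous call is accepted. The sketch below mirrors the example above; mytype, s1, and call_with_alias are illustrative names only.

```c
typedef int mytype;            /* an alias for int, not a distinct type */

/* The formal parameter a is declared as plain int. */
int s1(int a) {
    return a + 1;
}

/* The actual parameter count is declared with the alias; C treats
   mytype and int as the same type, so no conversion is involved. */
int call_with_alias(void) {
    mytype count = 41;
    return s1(count);
}
```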

Structure Type Equivalence
• Structure type equivalence: two variables have equivalent types if their types have identical structures
• More flexible, but harder to implement


Structure Type Equivalence Issue 1
– Are two record (or struct) types equivalent if they are structurally the same but use different field names?
    struct node {
        int age;
        char name[20];
        struct node *next;
    } a;

    struct item {
        int num;
        char id[20];
        struct node *item;
    } b;
  Structurally the same
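C itself answers this question with name equivalence for struct types: struct node and struct item are distinct types even though their layouts match, so a direct assignment between them does not compile. A minimal sketch of the resulting field-by-field copy follows; the helper item_to_node is hypothetical.

```c
#include <string.h>

struct node { int age; char name[20]; struct node *next; };
struct item { int num; char id[20];   struct node *item; };

/* `*dst = *src;` would be rejected at compile time because the two
   struct types have different names. Each field is copied explicitly. */
void item_to_node(struct node *dst, const struct item *src) {
    dst->age = src->num;
    memcpy(dst->name, src->id, sizeof dst->name);   /* both arrays hold 20 chars */
    dst->next = src->item;
}
```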

Structure Type Equivalence Issue 2
– Are two array types equivalent if they are the same except that the subscripts are different?
  In Pascal (which allows the array index type to be a subrange):
    type name1 = array [0..9] of char;
    type name2 = array [1..10] of char;
  name1 and name2 are structurally the same

Structure Type Equivalence Issue 3
– Are two enumeration types equivalent if their components are spelled differently?
  In C:
    typedef enum {RED, GREEN, BLUE} color_t;
    typedef enum {ONE, TWO, THREE} number_t;
  color_t and number_t are structurally the same
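In C the question is largely moot, because enumeration types convert implicitly to and from integer types. The sketch below, with the hypothetical helper number_as_color, shows a value of one enumeration type being stored in the other; some compilers emit a warning for this, but the language does not require a diagnostic, whereas C++ rejects the assignment.

```c
typedef enum {RED, GREEN, BLUE} color_t;
typedef enum {ONE, TWO, THREE} number_t;

/* Assigns a number_t to a color_t without a cast. C permits this
   because both enumeration types are compatible with integer types. */
color_t number_as_color(number_t n) {
    color_t c = n;
    return c;
}
```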


Structure Type Equivalence Issue 4
– With structural type compatibility, you cannot differentiate between types of the same structure (e.g., different units of speed, both float)
    type NType1 = record
      speed_in_meter : real;
      rank : integer;
    end;

    type NType2 = record
      speed_in_foot : real;
      rank : integer;
    end;
  Structurally the same
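Name equivalence avoids this problem. A minimal C sketch, assuming hypothetical wrapper types meters_per_sec_t and feet_per_sec_t: because C compares struct types by name, the two units cannot be mixed by accident, and an explicit conversion function is needed.

```c
/* Two structurally identical types that C nevertheless keeps distinct. */
typedef struct { double value; } meters_per_sec_t;
typedef struct { double value; } feet_per_sec_t;

/* Explicit conversion between the units (1 foot = 0.3048 meter).
   Passing a meters_per_sec_t here would be a compile-time error. */
meters_per_sec_t to_meters(feet_per_sec_t v) {
    meters_per_sec_t m = { v.value * 0.3048 };
    return m;
}
```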


Heap Management
• A very complex run-time process
• Single-size cells vs. variable-size cells
• Two approaches to reclaim garbage
  – Reference counters (eager approach): reclamation is gradual
  – Garbage collection (lazy approach): reclamation occurs when the list of available space becomes empty


Reference Counter
• Reference counters: maintain a counter in every cell that stores the number of pointers currently pointing at the cell
  – Disadvantages: space required, execution time required, complications for cells connected circularly
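A minimal sketch of the eager approach in C follows. All names (cell_t, cell_new, cell_retain, cell_release, cell_live_count) are hypothetical, and the global live_cells counter exists only so the effect can be observed.

```c
#include <stdlib.h>

typedef struct cell {
    int refcount;              /* number of pointers currently referring to this cell */
    int payload;
} cell_t;

static int live_cells = 0;     /* demonstration only: cells not yet reclaimed */

/* Allocates a cell that starts with one reference (the caller's). */
cell_t *cell_new(int payload) {
    cell_t *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    c->refcount = 1;
    c->payload = payload;
    live_cells++;
    return c;
}

/* Called whenever a new pointer is set to the cell. */
cell_t *cell_retain(cell_t *c) {
    if (c != NULL)
        c->refcount++;
    return c;
}

/* Called whenever a pointer to the cell is dropped; the cell is
   reclaimed immediately when its counter reaches zero. */
void cell_release(cell_t *c) {
    if (c != NULL && --c->refcount == 0) {
        free(c);
        live_cells--;
    }
}

int cell_live_count(void) { return live_cells; }
```

Each pointer assignment pays for a counter update, and cells that point at each other in a cycle never reach a count of zero, which are the time and circularity costs listed above.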


Garbage Collection
• The run-time system allocates storage cells as requested and disconnects pointers from cells as necessary, without storage reclamation
• A mark-sweep process gathers garbage when no cells are available
  – All cells initially are set to garbage
  – All pointers are traced into the heap, and reachable cells are marked as not garbage
  – All garbage cells are returned to the list of available cells
• Disadvantage: when you need it most, it works worst (it takes the most time when the program needs most of the cells in the heap)
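The three mark-sweep steps can be sketched in C over a small fixed heap. This is an illustration under stated assumptions, not a production collector: the heap is an array of 8 single-size cells, each cell holds two pointers, marking is recursive, and the names cell_t, heap, mark, and collect are hypothetical.

```c
#include <stddef.h>

#define HEAP_CELLS 8

typedef struct cell {
    int marked;                /* 0 = garbage, 1 = reachable */
    struct cell *left;
    struct cell *right;
} cell_t;

static cell_t heap[HEAP_CELLS];

/* Step 2 helper: follow pointers from c and mark every reachable cell.
   The marked test also stops the recursion on circular structures. */
static void mark(cell_t *c) {
    if (c == NULL || c->marked)
        return;
    c->marked = 1;
    mark(c->left);
    mark(c->right);
}

/* Runs one collection and returns the number of cells reclaimed.
   Reclaimed cells are chained through their right field onto *free_list. */
int collect(cell_t **roots, int nroots, cell_t **free_list) {
    int i, reclaimed = 0;

    for (i = 0; i < HEAP_CELLS; i++)      /* step 1: everything is garbage */
        heap[i].marked = 0;

    for (i = 0; i < nroots; i++)          /* step 2: trace from the roots */
        mark(roots[i]);

    *free_list = NULL;
    for (i = 0; i < HEAP_CELLS; i++) {    /* step 3: sweep unmarked cells */
        if (!heap[i].marked) {
            heap[i].left = NULL;
            heap[i].right = *free_list;
            *free_list = &heap[i];
            reclaimed++;
        }
    }
    return reclaimed;
}
```

Note that the sweep visits every cell in the heap and the mark phase visits every reachable cell, so the cost is highest exactly when most of the heap is in use.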

Marking Algorithm
• The process of finding garbage cells


Variable-Size Cells
• All the difficulties of single-size cells plus more
• Required by most programming languages
• If garbage collection is used, additional problems occur
  – The initial setting of the indicators of all cells in the heap is difficult
  – The marking process is nontrivial
  – Maintaining the list of available space is another source of overhead

Summary
• The data types of a language are a large part of what determines that language’s style and usefulness
• The primitive data types of most imperative languages include numeric, character, and Boolean types
• The user-defined enumeration and subrange types are convenient and add to the readability and reliability of programs
• Arrays and records are included in most languages
• Pointers are used for addressing flexibility and to control dynamic storage management