Programming Languages and Data Structures. Discussion Oct 29

Programming Languages and Data Structures Discussion 5 2015 Oct 29

Outline
• BST: Deletion
• Bytes/Bits Ordering
• Bitwise Operators in C/C++
• Command line parameters
• Exercise

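BST: Successor node
Next larger node

    Node* successor(Node* t) {
        // Case 1: a right subtree exists, so the successor is its minimum.
        if (t->right != NULL)
            return min(t->right);
        // Case 2: climb until we leave a left subtree.
        Node* p = t->parent;
        while (p != NULL && t == p->right) {
            t = p;
            p = p->parent;
        }
        return p;  // NULL if t holds the largest key
    }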

How to find a minimum node?

[Figure: example BST containing the keys 10, 5, 15, 2, 9, 7, 20, 17, 30]
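One way to answer the minimum-node question: keep following left links until there are none. The sketch below pairs that with the successor routine from the slide. The `Node` layout, the `minNode` name and the `attach` helper are assumptions made for illustration; the slides do not show the node definition.

```cpp
#include <cstddef>

// Hypothetical node layout: the slide code relies on left, right and parent.
struct Node {
    int key;
    Node* left;
    Node* right;
    Node* parent;
    explicit Node(int k) : key(k), left(nullptr), right(nullptr), parent(nullptr) {}
};

// Minimum of a subtree: follow left links as far as possible.
Node* minNode(Node* t) {
    while (t != nullptr && t->left != nullptr) t = t->left;
    return t;
}

// Next larger node, or nullptr if t holds the largest key.
Node* successor(Node* t) {
    if (t->right != nullptr) return minNode(t->right);
    Node* p = t->parent;
    while (p != nullptr && t == p->right) {
        t = p;
        p = p->parent;
    }
    return p;
}

// Illustrative helper: link child under parent and set the parent pointer.
void attach(Node* parent, Node* child, bool asLeft) {
    child->parent = parent;
    if (asLeft) parent->left = child;
    else parent->right = child;
}
```

Both routines walk a single root-to-leaf path at most, so they take time proportional to the height of the tree.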

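BST: Predecessor node
Next smaller node

    Node* predecessor(Node* t) {
        // Case 1: a left subtree exists, so the predecessor is its maximum.
        if (t->left != NULL)
            return max(t->left);
        // Case 2: climb until we leave a right subtree.
        Node* p = t->parent;
        while (p != NULL && t == p->left) {
            t = p;
            p = p->parent;
        }
        return p;  // NULL if t holds the smallest key
    }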

How to find a maximum node?

[Figure: example BST containing the keys 10, 5, 2, 15, 7, 20, 9, 8, 17, 30]
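The maximum node is the mirror image of the minimum: follow right links until there are none. A minimal sketch, again assuming a hypothetical `Node` with `left`, `right` and `parent` members (the names `maxNode` and `attach` are illustrative):

```cpp
#include <cstddef>

// Hypothetical node layout with a parent pointer.
struct Node {
    int key;
    Node* left;
    Node* right;
    Node* parent;
    explicit Node(int k) : key(k), left(nullptr), right(nullptr), parent(nullptr) {}
};

// Maximum of a subtree: follow right links as far as possible.
Node* maxNode(Node* t) {
    while (t != nullptr && t->right != nullptr) t = t->right;
    return t;
}

// Next smaller node, or nullptr if t holds the smallest key.
Node* predecessor(Node* t) {
    if (t->left != nullptr) return maxNode(t->left);
    Node* p = t->parent;
    while (p != nullptr && t == p->left) {
        t = p;
        p = p->parent;
    }
    return p;
}

// Illustrative helper: link child under parent and set the parent pointer.
void attach(Node* parent, Node* child, bool asLeft) {
    child->parent = parent;
    if (asLeft) parent->left = child;
    else parent->right = child;
}
```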

BST: Deletion
• Why might deletion be harder than insertion?

[Figure: example BST containing the keys 10, 5, 15, 2, 9, 7, 20, 17, 30]

BST: Lazy Deletion
• Instead of physically deleting nodes, just mark them as deleted
  • simpler
  • physical deletions done in batches
  • some adds just flip deleted flag
  • extra memory for deleted flag
  • many lazy deletions slow finds
  • some operations may have to be modified (e.g., min and max)
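The points above can be sketched in a few functions. This is only an illustration under assumed names (`LazyNode`, `lazyErase`, `liveMin`); it shows the deleted flag, an insert that just flips the flag back, and a minimum search that has to skip marked nodes. Batch cleanup of marked nodes is left out.

```cpp
#include <cstddef>

// Hypothetical node with a "deleted" mark instead of physical removal.
struct LazyNode {
    int key;
    bool deleted;
    LazyNode* left;
    LazyNode* right;
    explicit LazyNode(int k) : key(k), deleted(false), left(nullptr), right(nullptr) {}
};

LazyNode* insert(LazyNode* t, int key) {
    if (t == nullptr) return new LazyNode(key);
    if (key < t->key) t->left = insert(t->left, key);
    else if (key > t->key) t->right = insert(t->right, key);
    else t->deleted = false;  // re-adding a marked key only flips the flag
    return t;
}

bool contains(const LazyNode* t, int key) {
    while (t != nullptr) {
        if (key < t->key) t = t->left;
        else if (key > t->key) t = t->right;
        else return !t->deleted;  // a marked node counts as absent
    }
    return false;
}

// Marks the key as deleted; returns true if a live key was found.
bool lazyErase(LazyNode* t, int key) {
    while (t != nullptr) {
        if (key < t->key) t = t->left;
        else if (key > t->key) t = t->right;
        else {
            bool wasLive = !t->deleted;
            t->deleted = true;
            return wasLive;
        }
    }
    return false;
}

// min must be modified: the leftmost node may be marked, so search in order.
const LazyNode* liveMin(const LazyNode* t) {
    if (t == nullptr) return nullptr;
    if (const LazyNode* m = liveMin(t->left)) return m;
    if (!t->deleted) return t;
    return liveMin(t->right);
}
```

Note that `liveMin` may visit many marked nodes before finding a live one, which is the "many lazy deletions slow finds" cost in practice.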

BST: Deletion - Leaf Case

[Figure: Delete(17) on the BST containing 10, 5, 15, 2, 9, 7, 20, 17, 30]

BST: Deletion - One Child Case

[Figure: Delete(15) on the BST containing 10, 5, 15, 2, 9, 7, 20, 30]

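BST: Deletion - Two Child Case
• Replace the node's value with that of a descendant whose value is guaranteed to lie between the left and right subtrees: the successor. Then delete the successor node itself, which falls under the leaf case or the one-child case.

[Figure: Delete(5); before: BST containing 10, 5, 20, 2, 9, 7, 30; after: BST containing 10, 2, 7, 9, 20, 30]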

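BST: Deletion - Two Child Case
• Why will case 2 (two children) always reduce to case 0 (leaf) or case 1 (one child)? Because when a node has 2 children, its successor is the minimum of its right subtree. That minimum has no left child, so it has at most one child.
• Could we have used the predecessor instead? Yes. Which one to choose? One consideration is keeping the tree balanced.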
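The three deletion cases can be sketched as one recursive routine. This version assumes a hypothetical `Node` without parent pointers and uses the successor for the two-child case; the slides do not show their own implementation, so treat the names and layout as illustrative.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical node layout without a parent pointer.
struct Node {
    int key;
    Node* left;
    Node* right;
    explicit Node(int k) : key(k), left(nullptr), right(nullptr) {}
};

Node* insert(Node* t, int key) {
    if (t == nullptr) return new Node(key);
    if (key < t->key) t->left = insert(t->left, key);
    else if (key > t->key) t->right = insert(t->right, key);
    return t;
}

Node* minNode(Node* t) {
    while (t->left != nullptr) t = t->left;
    return t;
}

// Removes key from the subtree rooted at t and returns the new subtree root.
Node* erase(Node* t, int key) {
    if (t == nullptr) return nullptr;  // key not present
    if (key < t->key) {
        t->left = erase(t->left, key);
    } else if (key > t->key) {
        t->right = erase(t->right, key);
    } else if (t->left != nullptr && t->right != nullptr) {
        // Two children: copy the successor's key, then delete the successor,
        // which has no left child and so hits the branch below.
        Node* s = minNode(t->right);
        t->key = s->key;
        t->right = erase(t->right, s->key);
    } else {
        // Leaf or one child: splice the node out.
        Node* child = (t->left != nullptr) ? t->left : t->right;
        delete t;
        return child;
    }
    return t;
}

// In-order traversal, used to check that the keys stay sorted.
void inorder(const Node* t, std::vector<int>& out) {
    if (t == nullptr) return;
    inorder(t->left, out);
    out.push_back(t->key);
    inorder(t->right, out);
}
```

Usage: `root = erase(root, 5);` since the root itself may change when it is the node being removed.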

Binary Representations
• Base 2 number representation
• Represent 351₁₀ as 0000000101011111₂ or 101011111₂

Encoding Byte Values
• Byte = 8 bits (binary digits)
• Binary: 00000000₂ -- 11111111₂
  • Example: 00101011₂ = 32+8+2+1 = 43₁₀
  • Example: 26₁₀ = 16+8+2 = 00011010₂
• Decimal: 0₁₀ -- 255₁₀
• Hexadecimal: 00₁₆ -- FF₁₆
  • Groups of 4 binary digits
  • Byte = 2 hexadecimal (hex) or base 16 digits
  • Base-16 number representation
  • Use characters '0' to '9' and 'A' to 'F' to represent the digits
  • Write FA1D37B₁₆ in C code as a 4-byte value: 0xFA1D37B or 0xfa1d37b


Hex   Decimal   Binary
0     0         0000
1     1         0001
2     2         0010
3     3         0011
4     4         0100
5     5         0101
6     6         0110
7     7         0111
8     8         1000
9     9         1001
A     10        1010
B     11        1011
C     12        1100
D     13        1101
E     14        1110
F     15        1111
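The conversions in this section can be checked mechanically. The sketch below uses repeated division by the base, plus `std::bitset` for a fixed-width binary string; the function names `toBase` and `toBinary16` are made up for illustration.

```cpp
#include <bitset>
#include <string>

// Digits of n in the given base (2..16), most significant digit first.
std::string toBase(unsigned long n, unsigned base) {
    const char* digits = "0123456789ABCDEF";
    if (n == 0) return "0";
    std::string s;
    while (n > 0) {
        s.insert(s.begin(), digits[n % base]);  // prepend the next digit
        n /= base;
    }
    return s;
}

// 16-bit binary string with leading zeros.
std::string toBinary16(unsigned n) {
    return std::bitset<16>(n).to_string();
}
```

For instance, `toBase(351, 2)` gives the short form and `toBinary16(351)` the zero-padded 16-bit form of the same number.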

Sizes of objects (in bytes)

C data type    Typical 32-bit    x86-64
bool           1                 1
char           1                 1
short int      2                 2
int            4                 4
float          4                 4
long int       4                 8
double         8                 8
long long      8                 8
long double    8                 16
pointer *      4                 8
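The table lists typical values only; the actual sizes depend on the compiler and platform. A small sketch that prints them with `sizeof` (the function name `printSizes` is illustrative):

```cpp
#include <cstdio>

// Prints the size in bytes of each type from the table on this platform.
void printSizes() {
    std::printf("bool        %zu\n", sizeof(bool));
    std::printf("char        %zu\n", sizeof(char));
    std::printf("short int   %zu\n", sizeof(short));
    std::printf("int         %zu\n", sizeof(int));
    std::printf("float       %zu\n", sizeof(float));
    std::printf("long int    %zu\n", sizeof(long));
    std::printf("double      %zu\n", sizeof(double));
    std::printf("long long   %zu\n", sizeof(long long));
    std::printf("long double %zu\n", sizeof(long double));
    std::printf("pointer     %zu\n", sizeof(void*));
}
```

Only `sizeof(char) == 1` and the ordering `short <= int <= long <= long long` are guaranteed by the language; everything else should be measured rather than assumed.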

MSB and LSB
• Most significant bit/byte (MSB): the bit/byte with the largest place value, sometimes referred to as the left-most bit/byte (from the convention of positional notation in writing)
• Least significant bit/byte (LSB): the right-most bit/byte

Example: in A0 B1 C2 D3 E4 F5 67 89₁₆ the most significant byte is A0 and the least significant byte is 89.

Byte Ordering
• How should bytes within a multi-byte word be ordered in memory or files?
• You want to store the 4-byte word 0xaabbccdd. In what order will the bytes be stored?
• Endianness: big endian vs. little endian
  • Two different conventions, used by different architectures
  • Origin: Gulliver's Travels (which end of an egg to break)

Little-Endian vs. Big-Endian Representation
Value: A0 B1 C2 D3 E4 F5 67 89₁₆

address          0                                  MAX
Big-Endian       A0   B1   C2   D3   E4   F5   67   89     (MSB = A0 first, LSB = 89 last)
Little-Endian    89   67   F5   E4   D3   C2   B1   A0     (LSB = 89 first, MSB = A0 last)
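To see which convention the current machine uses, one can copy the bytes of a known 32-bit value out of memory and look at the byte stored at the lowest address. A minimal sketch with illustrative function names:

```cpp
#include <cstdint>
#include <cstring>

// Copies the in-memory bytes of a 32-bit value, lowest address first.
void bytesInMemory(std::uint32_t value, unsigned char out[4]) {
    std::memcpy(out, &value, 4);
}

// True if the least significant byte is stored at the lowest address.
bool isLittleEndian() {
    unsigned char b[4];
    bytesInMemory(0xaabbccddu, b);
    return b[0] == 0xdd;
}
```

On a little-endian machine the bytes of 0xaabbccdd come out as dd cc bb aa; on a big-endian machine as aa bb cc dd.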

Little-Endian vs. Big-Endian Camps
• Big-Endian: Motorola 68xx, 680x0; IBM; Hewlett-Packard; Sun SuperSPARC; Internet TCP/IP
• Bi-Endian: Motorola Power PC; Silicon Graphics MIPS
• Little-Endian: Intel; AMD; DEC VAX; RS 232
• Why is endianness so important?

Little-Endian vs. Big-Endian
• Advantages and Disadvantages
• Big-Endian
  • Easier to determine the sign of a number: look at the byte at address offset 0.
  • Easier to compare or divide two numbers.
  • More natural: strings and integers are stored in the same order.
• Little-Endian
  • Makes it easier to place values on non-word boundaries.
  • Conversion from a 16-bit integer address to a 32-bit integer address does not require any arithmetic.
  • Easier addition and multiplication of multi-precision numbers.

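Endianness example: Pointers

address   0                                  MAX
bytes     89   67   F5   E4   D3   C2   B1   A0

long int * lptr;   (lptr points at address 0; the values below assume a 4-byte long int)

• Big-Endian: (* lptr) = 8967F5E4;
• Little-Endian: (* lptr) = E4F56789;
• lptr + 1 points at the next word, which starts at byte D3
• If ptr is int* ?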
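The two readings of the same bytes can be reproduced without depending on the host machine, by assembling the word with shifts. This is the portable approach, not necessarily what the slide intended; the function names are illustrative.

```cpp
#include <cstdint>

// Interprets 4 bytes as a big-endian 32-bit value (first byte is the MSB).
std::uint32_t readBigEndian32(const unsigned char* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
}

// Interprets 4 bytes as a little-endian 32-bit value (first byte is the LSB).
std::uint32_t readLittleEndian32(const unsigned char* p) {
    return (std::uint32_t(p[3]) << 24) | (std::uint32_t(p[2]) << 16) |
           (std::uint32_t(p[1]) << 8)  |  std::uint32_t(p[0]);
}
```

With the byte sequence 89 67 F5 E4 D3 C2 B1 A0, reading at offset 0 gives 0x8967F5E4 or 0xE4F56789 depending on the convention, and reading at offset 4 corresponds to dereferencing the pointer after adding 1 to it.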

Note about signed and unsigned
• The first bit (most significant bit - MSB) in signed ints/chars is used as a sign bit.
• e.g., x = 1000 0010 (0x82)
  • unsigned char x = 130
  • signed char x = -126 in two's complement (it would be -2 only in a sign-magnitude representation)
• Similarly, for 32-bit/64-bit ints the first bit (MSB) denotes the sign (0 = +ve, 1 = -ve)
• Therefore, signed char's range = -128 to 127 and unsigned char's range = 0 to 255
• Normally unsigned integers are used when dealing with bitwise operations (unsigned char, unsigned int, ...).
• How does casting between signed and unsigned work, and what values are going to be produced? Bits are unchanged, just interpreted differently!
https://en.wikipedia.org/wiki/Signed_number_representations
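The "same bits, different interpretation" point can be checked with the fixed-width 8-bit types, which are two's complement by definition. The helper names below are illustrative.

```cpp
#include <cstdint>
#include <cstring>

// Reinterprets the same 8 bits as a signed value (no bits change).
std::int8_t asSigned(std::uint8_t bits) {
    std::int8_t s;
    std::memcpy(&s, &bits, 1);
    return s;
}

// Converts a signed value to unsigned; the result is the value modulo 256,
// which leaves the two's-complement bit pattern unchanged.
std::uint8_t asUnsigned(std::int8_t value) {
    return static_cast<std::uint8_t>(value);
}
```

For the pattern 1000 0010 this gives 130 when read as unsigned and -126 when read as signed.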

Most Significant Bit First vs Least Significant Bit First
• The expressions Most Significant Bit First and Least Significant Bit First indicate the order in which the bits of each byte are sent over a wire in a transmission protocol or in a stream (e.g. an audio stream).
• Most Significant Bit First means that the most significant bit will arrive first:
  • e.g. the hexadecimal number 0x12 will arrive as the sequence 0 0 0 1 0 0 1 0.
• Least Significant Bit First means that the least significant bit will arrive first:
  • e.g. the same hexadecimal number 0x12 will arrive as the (reversed) sequence 0 1 0 0 1 0 0 0.
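The two bit orders for a byte can be generated with a shift and a mask. A small sketch (function names are illustrative) that returns the bits in arrival order as a string:

```cpp
#include <string>

// Bits of one byte in the order they would arrive, most significant first.
std::string msbFirst(unsigned char byte) {
    std::string bits;
    for (int i = 7; i >= 0; --i)
        bits += ((byte >> i) & 1) ? '1' : '0';
    return bits;
}

// Bits of one byte in the order they would arrive, least significant first.
std::string lsbFirst(unsigned char byte) {
    std::string bits;
    for (int i = 0; i <= 7; ++i)
        bits += ((byte >> i) & 1) ? '1' : '0';
    return bits;
}
```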

Bitwise Operations in Integers
• Corresponding bits of both operands are combined by the usual logic operations.
  • & – AND: result is 1 if both operand bits are 1
  • | – OR: result is 1 if either operand bit is 1
  • ^ – Exclusive OR: result is 1 if operand bits are different
  • ~ – Complement: each bit is reversed

    unsigned int c, a, b;
    a = 0xF0;     // 1111 0000
    b = 0xAA;     // 1010 1010
    c = a & b;    // 1010 0000
    c = a | b;    // 1111 1010
    c = a ^ b;    // 0101 1010
    c = ~a;       // 0000 1111 (low 8 bits shown)

Bit shift Operators
• << – Shift left: multiply by 2
• >> – Shift right: divide by 2

    int a = 5;         // binary: 101
    int b = a << 3;    // binary: 101000, or 40 in decimal
    int c = b >> 3;    // binary: 101, or back to 5 like we started with
    char d = 5;        // binary (all 8 bits): 00000101
    char e = d << 7;   // binary: 10000000 - the first 1 in 101 was shifted out

• The << and >> operators have different meanings in C++ streams: they are the stream insertion and extraction operators.

    int x = -16;                    // binary (all 32 bits): 11111111111111111111111111110000
    int y = x >> 3;                 // binary: 11111111111111111111111111111110
    int z = (unsigned int)x >> 3;   // binary: 00011111111111111111111111111110

Bitwise Operators
• Issues for signed int
  • Shifting left: the result may overflow.
  • Shifting right: it may shift in zeroes or the sign bit, depending on platform and/or compiler (an arithmetic (signed) shift or a logical (unsigned) shift).
  • Sign extension: short int y = -12345; int iy = (int) y;
• Why use them?
  – A single bit shift left is a very fast way of integer multiplication by 2.
  – A single bit shift right is a very fast way of integer division by 2.
  – An AND operation with all zeros is a very efficient way to clear out a field.
  – We can use single bits in a word (32 bits) to represent Boolean values very efficiently. These are sometimes called flags.
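The last point, packing Boolean values into single bits of a word, might look like the sketch below. The flag names and helper functions are hypothetical, chosen only for illustration.

```cpp
#include <cstdint>

// Hypothetical flags, one bit each in a 32-bit word.
const std::uint32_t FLAG_VERBOSE = 1u << 0;
const std::uint32_t FLAG_QUIET   = 1u << 1;
const std::uint32_t FLAG_DRY_RUN = 1u << 2;

// OR turns a bit on.
std::uint32_t setFlag(std::uint32_t word, std::uint32_t flag) { return word | flag; }

// AND with the complement turns a bit off.
std::uint32_t clearFlag(std::uint32_t word, std::uint32_t flag) { return word & ~flag; }

// XOR flips a bit.
std::uint32_t toggleFlag(std::uint32_t word, std::uint32_t flag) { return word ^ flag; }

// AND isolates a bit for testing.
bool hasFlag(std::uint32_t word, std::uint32_t flag) { return (word & flag) != 0; }
```

Using an unsigned type here avoids the signed-shift issues listed above.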

Bitwise Operators: Example
• Creating a Mask
  • testing if bit i is set:

    unsigned char isBitISet( unsigned char ch, int i ) { unsigned char mask = 1 …

24; 8; 8; 24;
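A mask-based bit test could be written as below. The second function, a 32-bit byte swap built from masks and shifts, is an assumed companion example and not something recovered from the slide; both names are illustrative.

```cpp
#include <cstdint>

// True if bit i (0 = least significant) of ch is set.
bool isBitSet(unsigned char ch, int i) {
    unsigned char mask = static_cast<unsigned char>(1u << i);
    return (ch & mask) != 0;
}

// Reverses the byte order of a 32-bit word: each mask selects one byte,
// and the shift moves it to the mirrored position.
std::uint32_t swapBytes32(std::uint32_t x) {
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}
```

Applying `swapBytes32` twice returns the original word, which makes it usable for converting in either direction between little-endian and big-endian values.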

http://commandcenter.blogspot.fr/2012/04/byte-order-fallacy.html http://en.cppreference.com/w/cpp/types/integer

C++: Command line parameters
• Purpose:
  • Modifying program behaviour: command-line parameters can be used to tell a program how you expect it to behave; for example, some programs have a -q (quiet) option to tell them not to output as much text (-v is another common option).
  • Having a program run without user interaction: this is especially useful for programs that are called from scripts or other programs.
• Examples: python xx.py, cd ~/

C++: Command line parameters

    #include <string>

    int main(int argc, char* argv[]) {
        // argv[0] is the program name; the first real argument is argv[1].
        if (argc > 1) {
            if (std::string(argv[1]) == "-v") {
                // do something
            } else if (std::string(argv[1]) == "-q") {
                // do something
            } else {
                // do something
            }
        }
        return 0;
    }

• Arguments are integer or float? Use atoi, atof, stoi, stof
• Tools: Getopt, Boost.Program_options
• Parsing command-line arguments by hand like this is simple but not very robust
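A slightly more defensive hand-written parser, as a sketch: it loops over all arguments and converts a numeric one with `std::stoi`, rejecting trailing garbage. The `Options` struct, the `parseArgs` name and the `-n` option are assumptions made for this example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical option set for illustration.
struct Options {
    bool verbose = false;
    bool quiet = false;
    int count = 1;
};

// Parses "-v", "-q" and "-n <integer>"; throws std::invalid_argument
// (or std::out_of_range from std::stoi) on bad input.
Options parseArgs(int argc, char* argv[]) {
    Options opt;
    for (int i = 1; i < argc; ++i) {  // argv[0] is the program name
        std::string arg = argv[i];
        if (arg == "-v") {
            opt.verbose = true;
        } else if (arg == "-q") {
            opt.quiet = true;
        } else if (arg == "-n" && i + 1 < argc) {
            std::string text = argv[++i];
            std::size_t used = 0;
            opt.count = std::stoi(text, &used);  // throws if no digits at all
            if (used != text.size())
                throw std::invalid_argument("trailing characters in " + text);
        } else {
            throw std::invalid_argument("unknown option " + arg);
        }
    }
    return opt;
}
```

A `main` would simply call `parseArgs(argc, argv)` inside a try block and print a usage message on failure. Unlike `atoi`, `std::stoi` reports non-numeric input instead of silently returning 0.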

Exercise 1: Hardwood Species

Hardwoods are the botanical group of trees that have broad leaves, produce a fruit or nut, and generally go dormant in the winter. America's temperate climates produce forests with hundreds of hardwood species -- trees that share certain biological characteristics. Although oak, maple and cherry all are types of hardwood trees, for example, they are different species. Together, all the hardwood species represent 40 percent of the trees in the United States.

On the other hand, softwoods, or conifers, from the Latin word meaning "cone-bearing," have needles. Widely available US softwoods include cedar, fir, hemlock, pine, redwood, spruce and cypress. In a home, the softwoods are used primarily as structural lumber such as 2x4s and 2x6s, with some limited decorative applications.

Using satellite imaging technology, the Department of Natural Resources has compiled an inventory of every tree standing on a particular day. You are to compute the total fraction of the tree population represented by each species.

Output
Print the name of each species represented in the population, in alphabetical order, followed by the percentage of the population it represents, to 4 decimal places.

Exercise 1: Hardwood Species

Sample Input
Red Alder
Ash
Aspen
Basswood
Ash
Beech
Yellow Birch
Ash
Cherry
Cottonwood
Ash
Cypress
Red Elm
Gum
Hackberry
White Oak
Hickory
Pecan
Hard Maple
White Oak
Soft Maple
Red Oak
Red Oak
White Oak
Poplan
Sassafras
Sycamore
Black Walnut
Willow

Sample Output
Ash 13.7931
Aspen 3.4483
Basswood 3.4483
Beech 3.4483
Black Walnut 3.4483
Cherry 3.4483
Cottonwood 3.4483
Cypress 3.4483
Gum 3.4483
Hackberry 3.4483
Hard Maple 3.4483
Hickory 3.4483
Pecan 3.4483
Poplan 3.4483
Red Alder 3.4483
Red Elm 3.4483
Red Oak 6.8966
Sassafras 3.4483
Soft Maple 3.4483
Sycamore 3.4483
White Oak 10.3448
Willow 3.4483
Yellow Birch 3.4483

Exercise 2: CLRS 6.5-9
• Give an O(n lg k) time algorithm to merge k sorted lists into one sorted list, where n is the total number of elements in all the input lists. (Hint: Use a min-heap for k-way merging.)
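Following the hint, one possible sketch keeps a min-heap holding the current front element of each list, so the heap never has more than k entries. It uses `std::priority_queue` with a reversed comparison; the function name and the use of `std::vector` for the lists are assumptions.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Merges k sorted lists into one sorted list using a min-heap of size <= k.
std::vector<int> mergeKSorted(const std::vector<std::vector<int>>& lists) {
    struct Entry {
        int value;          // the element itself
        std::size_t list;   // which list it came from
        std::size_t pos;    // its index within that list
    };
    // priority_queue is a max-heap by default; reversing the comparison
    // puts the smallest value on top.
    auto greater = [](const Entry& a, const Entry& b) { return a.value > b.value; };
    std::priority_queue<Entry, std::vector<Entry>, decltype(greater)> heap(greater);

    for (std::size_t i = 0; i < lists.size(); ++i)
        if (!lists[i].empty()) heap.push(Entry{lists[i][0], i, 0});

    std::vector<int> out;
    while (!heap.empty()) {
        Entry e = heap.top();
        heap.pop();
        out.push_back(e.value);
        // Replace the extracted element with the next one from the same list.
        if (e.pos + 1 < lists[e.list].size())
            heap.push(Entry{lists[e.list][e.pos + 1], e.list, e.pos + 1});
    }
    return out;
}
```

Each of the n elements is pushed and popped once, and each heap operation costs O(lg k) because the heap holds at most one entry per list, giving O(n lg k) overall.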

Thanks