Written and compiled by Mrs Ellis
last updated: 17/11/2013
Data representation, data types and data structures
You need to understand:
Representation of data as bit patterns
• Describe and use the binary number system and the hexadecimal notation as shorthand for binary number patterns.
• Describe how characters and numbers are stored in binary form.
• Explain the representation of positive and negative integers in a fixed-length store using both two's complement and sign/magnitude representation.
• Explain and use shift functions: logical and arithmetic shifts.
• Describe the need for standardised character sets.
• Explain the use and nature of the ASCII character set. (Knowledge of actual ASCII codes is not required.)
• Describe the nature and uses of floating point form.
• State the advantages and disadvantages of representing numbers in integer and floating point formats.
• Convert a real number to floating point form.
• Describe truncation and rounding, and explain their effect upon accuracy.
• Describe the causes of overflow and underflow.
Data types and data structures
• Describe, interpret and manipulate data structures: stacks, queues, trees, linked lists, arrays (up to three dimensional) and records.
• Represent the operation of linked lists and trees using pointers or arrays.
• Select and justify appropriate data types and structures for given situations.
Representation Of Data As Bit Patterns
How computers store information
Computers store information in binary: 0 or 1. In short, the 0s and 1s are achieved by power being on or off, a logic gate being open or closed, or a transistor conducting or not. 0s and 1s can be combined to form many different patterns (bit patterns) to encode information.
Since writing a long string of 0s and 1s can be exhausting, a more compact form is used: hexadecimal. Each hexadecimal digit represents four binary digits, which makes a long binary number much shorter. Since the largest 4-digit binary number is 1111 (decimal 15) but the largest single decimal digit is 9, hexadecimal (or simply hex) needs six more symbols, A to F. The following table shows the hex digits and their binary and decimal counterparts.

hex   binary   decimal
0     0000     0
1     0001     1
2     0010     2
3     0011     3
4     0100     4
5     0101     5
6     0110     6
7     0111     7
8     1000     8
9     1001     9
A     1010     10
B     1011     11
C     1100     12
D     1101     13
E     1110     14
F     1111     15

The primary use of hexadecimal notation is as a human-friendly representation of binary-coded values in computing and digital electronics. Hexadecimal is commonly used for memory addresses.
To convert binary to hexadecimal, follow these steps:
Step 1: group the binary bits into groups of 4, starting from the rightmost bit (the least significant bit), padding with zeros on the left if necessary. For example, the bit pattern 1001100 can be grouped as 0100 1100.
Step 2: convert each group of 4 bits into hex. In the example above, 0100 is hex 4 and 1100 is hex C. Therefore the bit pattern 1001100 is 4C in hex.

Representation Of Positive And Negative Integers In A Fixed-Length Store
Computers use a fixed number of bits to represent an integer. This is called a fixed-length store. The commonly used bit-lengths for integers are 8-bit, 16-bit, 32-bit and 64-bit. Besides the bit-length, there are two representation schemes for integers:
Unsigned integers: can represent zero and positive integers.
Signed integers: can represent zero, positive and negative integers.
Three representation schemes have been proposed for signed integers:
• Sign-magnitude representation
• 1's complement representation
• 2's complement representation
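The two grouping steps for converting binary to hexadecimal, described earlier in this section, can be sketched in Python as below. This is only an illustration, assuming bit patterns are given as strings of 0s and 1s; the function names binary_to_hex and hex_to_binary are hypothetical.

```python
HEX_DIGITS = "0123456789ABCDEF"

def binary_to_hex(bits):
    """Convert a string of 0s and 1s to hex by grouping into 4-bit groups."""
    # Step 1: pad on the left with zeros so the length is a multiple of 4.
    padding = (-len(bits)) % 4
    bits = "0" * padding + bits
    hex_digits = []
    # Step 2: convert each group of 4 bits into one hex digit.
    for i in range(0, len(bits), 4):
        group = bits[i:i + 4]
        value = int(group, 2)              # e.g. "1100" -> 12
        hex_digits.append(HEX_DIGITS[value])
    return "".join(hex_digits)

def hex_to_binary(hex_string):
    """Reverse the process: each hex digit becomes exactly 4 bits."""
    return "".join(format(HEX_DIGITS.index(d), "04b") for d in hex_string.upper())

print(binary_to_hex("1001100"))   # 4C
print(hex_to_binary("4C"))        # 01001100
```

Going back from hex to binary always gives a multiple of 4 bits, so the padding zero added in Step 1 reappears at the left.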
All three schemes use the leftmost bit (the most significant bit, or sign bit) to indicate whether the integer is negative or positive: if the sign bit is 1, the bit pattern represents a negative integer; if it is 0, the pattern represents a positive integer.
Sign-magnitude representation:
The most significant bit (msb) is the sign bit, with 0 representing a positive integer and 1 representing a negative integer. The remaining n-1 bits represent the magnitude (absolute value) of the integer.
Example: the 8-bit pattern 1000 0001 has 1 as the msb, so it is a negative number, and the remaining bits 000 0001 give the decimal number 1. Therefore this bit pattern represents -1.
The drawbacks of sign-magnitude representation are:
• There are two representations (0000 0000B and 1000 0000B) for the number zero, which can lead to inefficiency and confusion.
• Positive and negative integers need to be processed separately.
1's complement representation:
Again, the most significant bit (msb) is the sign bit, with 0 representing positive integers and 1 representing negative integers. The remaining n-1 bits represent the magnitude of the integer, as follows:
• For positive integers, the absolute value of the integer is equal to the magnitude of the (n-1)-bit binary pattern.
• For negative integers, the absolute value of the integer is equal to the magnitude of the complement (inverse) of the (n-1)-bit binary pattern (hence the name 1's complement).
Example: the 8-bit pattern 1000 0001 represents a negative number (msb = 1). The inverse (complement) of the remaining bit pattern is 111 1110, which is 126. Thus the pattern represents -126.
Again, the drawbacks are:
• There are two representations (0000 0000B and 1111 1111B) for zero.
• Positive and negative integers need to be processed separately.
2's complement representation:
The most significant bit (msb) is the sign bit, with 0 representing positive integers and 1 representing negative integers. The remaining n-1 bits represent the magnitude of the integer, as follows:
• For positive integers, the absolute value of the integer is equal to the magnitude of the (n-1)-bit binary pattern.
• For negative integers, the absolute value of the integer is equal to the magnitude of the complement of the (n-1)-bit binary pattern, plus one (hence the name 2's complement).
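As a minimal sketch, the three decoding rules above can be written in Python so that the same bit pattern can be read under each scheme. The function names are hypothetical, and patterns are assumed to be strings such as "10000001" whose first character is the sign bit.

```python
def invert(bits):
    """Flip every bit: 0 becomes 1 and 1 becomes 0."""
    return "".join("1" if b == "0" else "0" for b in bits)

def sign_magnitude_to_int(bits):
    """The msb is the sign; the remaining n-1 bits are the magnitude."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude

def ones_complement_to_int(bits):
    """Negative: the magnitude is the inverse of the remaining n-1 bits."""
    if bits[0] == "0":
        return int(bits[1:], 2)
    return -int(invert(bits[1:]), 2)

def twos_complement_to_int(bits):
    """Negative: the magnitude is the inverse of the remaining bits, plus one."""
    if bits[0] == "0":
        return int(bits[1:], 2)
    return -(int(invert(bits[1:]), 2) + 1)

pattern = "10000001"
print(sign_magnitude_to_int(pattern))    # -1
print(ones_complement_to_int(pattern))   # -126
print(twos_complement_to_int(pattern))   # -127
```

Passing 1000 0000 to sign_magnitude_to_int, or 1111 1111 to ones_complement_to_int, also gives 0, which is the "two representations of zero" drawback in action. The 2's complement version gives -60 for 1100 0100, matching the worked example that follows.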
Steps to convert a 2's complement representation to decimal:
1. Check the sign bit. If the sign bit is 0, the number is positive and its absolute value is the binary value of the remaining n-1 bits. If the sign bit is 1, the number is negative.
2. Invert the remaining bits and add 1 to get the absolute value (magnitude) of the negative number.
For example, take the 8-bit pattern 1 100 0100:
Sign bit is 1 → negative
Invert the remaining bits: 100 0100 ⇒ 011 1011
Add 1 to the inverted bits: 011 1011 + 1 = 011 1100, which is 60 in decimal
Hence the value is -60.
Step 2 can also be done by scanning the remaining bits from the right (the least significant bit) for the first occurrence of 1, then flipping all the bits to the left of that 1. The resulting pattern gives the absolute value. In the example, the first 1 from the right in 100 0100 is the one worth 4; keeping it and the zeros to its right, and flipping the bits to its left, gives 011 1100 = 60.

Arithmetic shifts can be useful as efficient ways of performing multiplication or division of signed integers by powers of two. Shifting left by n bits on a signed or unsigned binary number has the effect of multiplying it by 2^n. Shifting right by n bits on a two's complement signed binary number has the effect of dividing it by 2^n, but it always rounds down (towards negative infinity).
A left arithmetic shift of a binary number by 1: the empty position in the least significant bit is filled with a zero. Note that an arithmetic left shift may cause an overflow.
Before shift: 0001 0111 (23)   After shift: 0010 1110 (46)
A right arithmetic shift of a binary number by 1: the empty position in the most significant bit is filled with a copy of the original msb.
Before shift: 0001 0111 (23)   After shift: 0000 1011 (11)
Logical shifts can be useful as efficient ways of performing multiplication or division of unsigned integers by powers of two. Shifting left by n bits on a signed or unsigned binary number has the effect of multiplying it by 2^n. Shifting right by n bits on an unsigned binary number has the effect of dividing it by 2^n (rounding towards 0).
A logical left shift of a binary number by 1: the empty position in the least significant bit is filled with a zero.
Before shift: 0001 0111 (23)   After shift: 0010 1110 (46)
A logical right shift of a binary number by 1: the empty position in the most significant bit is filled with a zero.
Before shift: 0001 0111 (23)   After shift: 0000 1011 (11)
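The four shifts above can be tried on 8-bit patterns with the sketch below. It assumes patterns are held as 8-character strings and uses hypothetical function names; to_signed reads a pattern as a 2's complement number so that the effect of the arithmetic shifts on negative values, including overflow and rounding down, can be seen.

```python
def logical_left(bits):
    """Shift left by 1; the empty least significant bit is filled with 0."""
    return bits[1:] + "0"

def logical_right(bits):
    """Shift right by 1; the empty most significant bit is filled with 0."""
    return "0" + bits[:-1]

def arithmetic_left(bits):
    """Moves the bits exactly like a logical left shift, so it can overflow."""
    return bits[1:] + "0"

def arithmetic_right(bits):
    """Shift right by 1; the empty msb is filled with a copy of the original msb."""
    return bits[0] + bits[:-1]

def to_signed(bits):
    """Read a bit pattern as a 2's complement integer."""
    value = int(bits, 2)
    return value - 2 ** len(bits) if bits[0] == "1" else value

x = format(23, "08b")                               # "00010111"
print(logical_left(x), int(logical_left(x), 2))     # 00101110 46
print(logical_right(x), int(logical_right(x), 2))   # 00001011 11

print(to_signed(arithmetic_right("11000100")))      # -30: -60 / 2, sign kept
print(int(logical_right("11000100"), 2))            # 98: 196 // 2, as unsigned
print(to_signed(arithmetic_right("11111101")))      # -2: -3 / 2 rounded down
print(to_signed(arithmetic_left("01000000")))       # -128: 64 * 2 overflows
```

For a positive value such as 23 the logical and arithmetic versions give the same result; they only differ when the msb is 1.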
Character representations in computers
So far we have covered integer representation in a fixed-length store. How do computers represent characters? They still use 0s and 1s, of course; it is just a matter of establishing a standard way to encode a set of characters as numbers. One of the most widely adopted standards is ASCII, which stands for American Standard Code for Information Interchange. An ASCII code is the numerical representation of a character such as 'a' or '@', or of an action of some sort. ASCII was developed a long time ago and has been extended from its original 128 (7-bit) characters to 256 (8-bit) to include more symbols, but still not the characters of most other writing systems.
The drawbacks of ASCII:
• It was not originally designed for computers.
• It does not support other writing systems, such as Chinese.
• It is limited to 256 characters and symbols.
Benefits of using ASCII:
• It enables computer systems to communicate with each other easily.
• The use of (mainly) just one code avoids confusion.
New character encoding standards have been developed to address the drawbacks of ASCII. One of these is Unicode, a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems. Developed in conjunction with the Universal Character Set standard and published in book form as The Unicode Standard, the latest version of Unicode at the time of writing contains a repertoire of more than 110,000 characters covering 100 scripts.
Real number representation in computers
Now we know we can represent any integer (positive or negative) using binary numbers. However, representing fractions, or numbers with a decimal point, is not so easy.
In base ten notation, the number 56.23 can be broken down as 50 + 6 + 0.2 + 0.03, which written in powers of ten is:
56.23 = 5×10^1 + 6×10^0 + 2×10^-1 + 3×10^-2
Similarly, in binary (base two) notation, the number 4.625 can be written as 100.101:
100.101 = 1×2^2 + 0×2^1 + 0×2^0 + 1×2^-1 + 0×2^-2 + 1×2^-3 = 4 + 0.5 + 0.125 = 4.625
Loss Of Accuracy In Binary Representations Of Real Numbers
Assume we have a computer that uses 8 bits to represent fractions, with the binary point in the middle, as the following table shows. The smallest value is zero (0000 0000) and the largest value is 15.9375 (1111 1111). The smallest non-zero value is 0.0625, and every represented value is a multiple of 0.0625.

Power of 2:          2^3   2^2   2^1   2^0  .  2^-1   2^-2   2^-3    2^-4
Represented value:   8     4     2     1    .  0.5    0.25   0.125   0.0625

For example, if we need to represent the number 3.14 using this scheme, the closest binary representation is 0011 0010, which is in fact 3.125! This is NOT very accurate. In fact, no matter what we do, using a finite number of bits to represent real numbers will always give limited accuracy, because there are infinitely many real numbers between any two real numbers (even between 0.1 and 0.2).
Floating Point Numbers
Introduction
Floating point numbers are numbers that contain a fractional part, i.e. they contain a decimal point with digits after it. They are called floating point because the point can 'float', or move, when the number is expressed using scientific notation. For example, 123.456 and 0.4546 can be expressed as 1.23456 × 10^2 and 4.546 × 10^-1. In the first number the point has floated left, and in the second it has floated right.
Terminology
Points to note for the numbers 1.23456 × 10^2 and 4.546 × 10^-1:
• Base ten is used in both numbers.
• 1.23456 and 4.546 are called the mantissas of the respective numbers.
• 2 and -1 (from 10^2 and 10^-1) are called the powers or exponents of the numbers.
Floating Point Binary Numbers
Real numbers can be stored in floating point form, which stores a real number as a mantissa and an exponent. An international standard called IEEE 754 (not required by the exam) defines how a floating point binary number is stored. There are two main forms: one uses 32 bits (single precision) to store a number and the other uses 64 bits (double precision), which gives more accuracy. In the 32-bit form, 1 bit holds the sign, 8 bits hold the exponent and 23 bits hold the mantissa.
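Going back to the 8-bit fixed-point scheme described earlier (four bits before the binary point and four after), this sketch rounds a value to the nearest multiple of 0.0625 to show why 3.14 ends up stored as 3.125. The function names are hypothetical, and only non-negative values are handled, as in the table.

```python
FRACTION_BITS = 4   # bits after the binary point: 0.5, 0.25, 0.125, 0.0625
TOTAL_BITS = 8      # 4 bits before the point + 4 bits after

def to_fixed_point(value):
    """Return the 8-bit pattern closest to a non-negative value."""
    # round() picks the nearest whole number of 0.0625 steps
    # (an exact tie goes to the even step).
    steps = round(value * 2 ** FRACTION_BITS)
    if not 0 <= steps < 2 ** TOTAL_BITS:
        raise OverflowError(f"{value} is outside the range 0 to 15.9375")
    return format(steps, f"0{TOTAL_BITS}b")

def from_fixed_point(bits):
    """Every pattern stands for a whole number of 0.0625 steps."""
    return int(bits, 2) / 2 ** FRACTION_BITS

pattern = to_fixed_point(3.14)
print(pattern, from_fixed_point(pattern))   # 00110010 3.125
print(from_fixed_point("11111111"))         # 15.9375
print(to_fixed_point(4.625))                # 01001010
```

4.625 is stored exactly (0100.1010 is the 100.101 worked out above), whereas 3.14 is not, because it is not a whole number of 0.0625 steps.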
In the WJEC exam, you are unlikely to be asked to use the IEEE standard, but you may be asked to convert a decimal number to binary floating point form using a given number of bits. The next example shows how to do this.
Example 2: suppose we have the decimal number 2.75D. To convert it to binary floating point form in two's complement, with 12 bits for the mantissa and 4 bits for the exponent, follow the steps below:
Step 1: convert the integer part: 2D = 10B
Step 2: convert the fractional part: 0.75D = 1 × ½ + 1 × ¼ = .11B
Step 3: create the mantissa by combining the sign bit (0, because the number is positive) and the two parts: 010.11B
Step 4: normalise by moving the binary point two places to the left, so that it sits just after the sign bit: 010.11B = 0.1011B × 2^2. The exponent is therefore 2D, which is 0010 in 4 bits.
Step 5: pad the mantissa with 0s on the right to make 12 bits: 0101 1000 0000B
Result (mantissa followed by exponent): 0101 1000 0000 0010
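The steps of Example 2 can be automated with the sketch below. It assumes the format used in the example (a normalised two's complement mantissa with the binary point just after the sign bit, followed by a two's complement exponent); the function name to_floating_point is hypothetical. Bits that do not fit in the mantissa are simply dropped (truncated), which in two's complement always rounds down.

```python
import math

def to_floating_point(value, mantissa_bits=12, exponent_bits=4):
    """Return (mantissa, exponent) bit strings, both in 2's complement.

    The mantissa is normalised so that it lies in [0.5, 1) for positive values
    or [-1, -0.5) for negative ones: the binary point sits just after the
    sign bit, and the next bit differs from the sign bit.
    """
    if value == 0:
        return "0" * mantissa_bits, "0" * exponent_bits
    mantissa, exponent = value, 0
    while not (0.5 <= mantissa < 1 or -1 <= mantissa < -0.5):
        if abs(mantissa) >= 1:
            mantissa /= 2      # move the binary point one place left
            exponent += 1
        else:
            mantissa *= 2      # move the binary point one place right
            exponent -= 1
    if not -(2 ** (exponent_bits - 1)) <= exponent < 2 ** (exponent_bits - 1):
        raise OverflowError("exponent does not fit in the exponent field")
    # Count the mantissa in units of its smallest bit; floor drops (truncates)
    # any bits that do not fit, which rounds down in 2's complement.
    scaled = math.floor(mantissa * 2 ** (mantissa_bits - 1))
    # Taking the remainder modulo 2**bits gives the 2's complement pattern.
    mantissa_pattern = format(scaled % 2 ** mantissa_bits, f"0{mantissa_bits}b")
    exponent_pattern = format(exponent % 2 ** exponent_bits, f"0{exponent_bits}b")
    return mantissa_pattern, exponent_pattern

print(to_floating_point(2.75))    # ('010110000000', '0010')
print(to_floating_point(-2.75))   # ('101010000000', '0010')
```

The negative result is the 2's complement of the positive mantissa with the same exponent, and a fractional value such as 0.25 gives a negative exponent (1111, i.e. -1).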
Numbers should be stored in integer form when possible, because:
• they are stored with complete accuracy
• there is no need to store decimal places
• integers require less storage space than floating point form
• processing time may be reduced
Benefits of storing numbers in floating point form:
• a greater range of (positive and negative) numbers can be stored in the same number of bits
• numbers with more decimal places can be stored in the same number of bits (precision)
• the decimal (fractional) parts of numbers can be stored
Drawbacks of floating point form:
• numbers are not normally stored completely accurately
• processing is more complex
• zero cannot be written as a normalised mantissa, so it has no exact normalised representation and needs special handling
Overflow and underflow
No matter how many bits we use to represent real numbers, the finite number of bits limits the values that can be stored. For example, the 32-bit single precision scheme has only 8 bits for the exponent, which gives a range of magnitudes from approximately 10^-38 to 10^38.
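The limits just described, and the rounding problems listed after the overflow and underflow definitions that follow, can be seen in a few lines of Python. This assumes Python's built-in float, which is a 64-bit (double precision) IEEE 754 number on common platforms, so its range is wider than the 32-bit figures quoted above; the specific numbers used are only illustrations.

```python
import math

# Overflow: the result is too large, so it becomes infinity
print(1.7e308 * 10)                  # inf
# Underflow: the result is too close to zero, so it becomes 0.0
print(1e-320 / 1e10)                 # 0.0

# Rounding versus truncating 37.75 to a whole number
print(round(37.75))                  # 38
print(math.trunc(37.75))             # 37

# A test for equality can fail because 0.1, 0.2 and 0.3 are all stored
# as the nearest binary values, not exactly
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True: compare within a tolerance instead
```

Comparing within a tolerance, as math.isclose does, is a common way to avoid the failed equality test.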
• Overflow: the number is too large to be handled correctly by the computer.
• Underflow: the number is too small (too close to zero) to be represented in the exponent field (for example, smaller in magnitude than 2^-126, the smallest normalised value in the 32-bit IEEE standard).
Rounding: the number is approximated to the nearest whole number/tenth/hundredth.
Truncating: the digits beyond the whole number/tenth/hundredth are simply dropped.
Examples:
Rounding 37.75 to the nearest whole number gives 38.
Truncating 37.75 to a whole number gives 37.
In general, rounding gives an answer closer to the original number. Using rounding in a program may result in problems such as:
o further calculations increase the inaccuracy
o a test for equality may fail due to a minor difference caused by rounding
o in some applications a high level of accuracy is vital, and rounding may reduce this accuracy.
Data types and data structures
In CG2, you learned about some primitive data types, one- and two-dimensional arrays, and records. A data structure is a collection of related data items, and data can be organised in a computer in many ways. In addition to arrays, the queue, stack, linked list and binary tree are commonly used, and each has its own way of organising the items it contains.
Using a two-dimensional array we can organise data such as pupils' names and their heights, like so:
JOE    HARRY   EMILY   ERIKA
1.7    1.69    1.65    1.76
For JOE, element (0,0) holds the name and element (0,1) holds JOE's height.
The best way to think of a 3-D array is to imagine neatly stacked cubes in space, with x, y, z coordinates for each cube's position. For example, the cube at x = 2, y = 3, z = 1 is the element myArray(2,3,1).
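A rough Python equivalent of the two-dimensional pupils example and of the 3-D indexing is shown below. Python has no built-in fixed-size array, so nested lists stand in for arrays here; the variable names and the 3 × 4 × 2 size of the cube grid are hypothetical.

```python
# 2D array: one row per pupil; column 0 holds the name, column 1 the height (m)
pupils = [
    ["JOE", 1.7],
    ["HARRY", 1.69],
    ["EMILY", 1.65],
    ["ERIKA", 1.76],
]
print(pupils[0][0], pupils[0][1])   # JOE 1.7

# 3D array: a hypothetical 3 x 4 x 2 grid of cubes, indexed [x][y][z]
X_SIZE, Y_SIZE, Z_SIZE = 3, 4, 2
cubes = [[[0 for _ in range(Z_SIZE)] for _ in range(Y_SIZE)] for _ in range(X_SIZE)]
cubes[2][3][1] = 99                 # the element myArray(2,3,1)
print(cubes[2][3][1])               # 99
```

Building the 3-D list with nested comprehensions gives every row its own list; writing [[0] * Z_SIZE] * Y_SIZE instead would make the rows share one list, so changing one element would change several.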
An example of the use of a 3D array is keeping track of pupils' grades in subjects over a period of time.
Linked List
A linked list can be thought of as an array that can be as large as is needed (within the bounds of the RAM available on the computer). Put simply, each element of the list contains the item of data that is to be stored and the memory address of the next item in the list (a pointer). This means that each item does not have to be stored immediately next to the previous one, and that you can always add another item to the list as needed.
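The specification asks you to represent linked lists using arrays, and the exam question at the end of these notes stores a list in a table with a pointer column. The sketch below follows that idea: one array holds the data, a parallel array holds the pointer to the next item (-1 marks the end), and start points at the lowest item. The class name, the NULL constant and the sample values are hypothetical, and freed slots are not reused, to keep it short.

```python
NULL = -1  # hypothetical "null pointer": marks the end of the list


class ArrayLinkedList:
    """An ordered linked list held in two parallel arrays: data and pointer."""

    def __init__(self, capacity):
        self.data = [None] * capacity
        self.pointer = [NULL] * capacity
        self.start = NULL   # index of the first (lowest) item
        self.free = 0       # next unused slot (freed slots are not reused here)

    def insert(self, value):
        if self.free == len(self.data):
            raise MemoryError("no free slots left")
        new = self.free
        self.free += 1
        self.data[new] = value
        # Walk the list to find the items that come before and after the new one.
        previous, current = NULL, self.start
        while current != NULL and self.data[current] < value:
            previous, current = current, self.pointer[current]
        self.pointer[new] = current            # new item points to the next item
        if previous == NULL:
            self.start = new                   # new item becomes the first item
        else:
            self.pointer[previous] = new       # only one existing pointer changes

    def items(self):
        """Follow the pointers from start, yielding values in order."""
        current = self.start
        while current != NULL:
            yield self.data[current]
            current = self.pointer[current]


numbers = ArrayLinkedList(capacity=5)
for n in [2617, 2108, 2456, 1]:
    numbers.insert(n)
print(list(numbers.items()))            # [1, 2108, 2456, 2617]
print(numbers.start, numbers.pointer)   # 3 [-1, 2, 0, 1, -1]
```

Following the pointers from slot 3 visits the values in ascending order even though they were stored in a different order. Each insertion changes just one existing pointer (or the start pointer), as the next paragraph describes.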
To add a new record to an ordered linked list, the insert location is found, the previous element's pointer is changed to the location of the new element, and the new element's pointer is set to the next element in the sequence. Therefore only one existing item (the one before the new item) needs to be modified.
Binary Tree
A binary tree is a tree-like data structure in which each node has at most two child nodes, usually distinguished as "left" and "right". A binary tree is a simple way of storing data so that it can be searched easily later. To create one, you choose a "root node", then take each additional value and put it to the left of the root if it is less than the root, or to the right if it is greater.
Advantage: it is usually faster to search for or add a value than with an array.
Disadvantage: it is more complex to program and process.
To add a new value to the tree, you work down from the root (using the less than/greater than idea) and add your new value at an appropriate place at the bottom. For example, in a tree whose root is 2617, where 2456 is the left child of 2617 and 2108 is the left child of 2456, the number 1 would go to the left of 2108: it is less than 2617, less than 2456 and less than 2108.
Queue (FIFO – First In First Out)
A queue structure allows instructions to be passed to the CPU in the order in which they are received. A new instruction joins the “tail” of the queue and moves along it as each instruction is processed at the head. It is simple to add an item to a queue: just add it to the end.
Example uses of a queue data structure:
• a printer queue
• a keyboard buffer
• a download buffer
• a processor scheduling queue (see CG3.2)
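A queue like the printer queue above can be sketched with Python's collections.deque, which supports adding at one end and removing from the other efficiently. The function name and job names are hypothetical.

```python
from collections import deque

def run_printer_queue(jobs):
    """Return the jobs in the order a FIFO queue would print them."""
    queue = deque()
    for job in jobs:
        queue.append(job)                 # a new job joins the tail
    printed = []
    while queue:
        printed.append(queue.popleft())   # the job at the head is processed first
    return printed

print(run_printer_queue(["report.doc", "photo.jpg", "essay.doc"]))
# ['report.doc', 'photo.jpg', 'essay.doc']
```

The jobs come out in exactly the order they went in: first in, first out.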
Stack (LIFO – Last In First Out)
A stack is similar to a queue, the difference being that the last item added to the stack is the first item to be retrieved. You can think of it as similar to a large pile of papers on an office worker's desk, to which things are added throughout the day. The worker simply works through the pile from top to bottom, and new material can be added to the top of the pile at any time.
Good examples of stacks:
• subroutine return addresses
• interrupt handling
• undo function
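The undo function is a good example of last in, first out. In this minimal sketch a Python list acts as the stack (append pushes onto the top, pop removes from the top); the function and action names are hypothetical.

```python
undo_stack = []

def do_action(action):
    """Record an action by pushing it onto the top of the stack."""
    undo_stack.append(action)

def undo():
    """Take back the most recent action: last in, first out."""
    return undo_stack.pop()

for action in ["type 'Hello'", "make bold", "insert picture"]:
    do_action(action)

print(undo())   # insert picture
print(undo())   # make bold
```

Each call to undo winds back one step, most recent first, which is the "winding back" idea examiners look for.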
The examiner will be looking for the idea of “winding back” in your answers, so ensure your example has an element of “last in, first out” about it.
Exam Questions:
S2011.08
In a certain implementation, a linked list of integers is actually stored in a table form as shown below. The integers are to be accessed in ascending numerical order. A variable points to the address 852, which contains the lowest integer, 2415. Complete the pointer column in the table below.
S2010.10
(a) In a certain computer, sign/magnitude is used to represent integers using eight bits, with the left bit being set to zero for a positive number. Show how the negative number -8 will be represented. [1]
(b) (i) An advantage of floating point form in a computer is that it can be used to store numbers which are not integers. State one other advantage of using floating point form rather than integer form and state one problem which may result from storing numbers in floating point form. [2]
(ii) In a certain computer, real numbers are stored in floating point form using 16 bits as shown below:
Convert the number 18.5 into this floating point form. [2]
(c) State one benefit of using a character set such as ASCII. In the ASCII character set, the character “S” is stored as 01010011. How will the character “U” be stored? [2]