## Data representation, data types and data structures

Author: Edith Bond
Written  and  compiled  by  Mrs  Ellis

last  updated:17/11/2013

You need to understand:

Representation of data as bit patterns

• Describe and use the binary number system and the hexadecimal notation as shorthand for binary number patterns.
• Describe how characters and numbers are stored in binary form.
• Explain the representation of positive and negative integers in a fixed-length store using both two’s complementation and sign/magnitude representation.
• Explain and use shift functions: logical and arithmetic shifts.
• Describe the need for standardised character sets. Explain the use and nature of the ASCII character set. (Knowledge of actual ASCII codes is not required.)
• Describe the nature and uses of floating point form.
• State the advantages and disadvantages of representing numbers in integer and floating point formats.
• Convert a real number to floating point form.
• Describe truncation and rounding, and explain their effect upon accuracy.
• Describe the causes of overflow and underflow.

Data types and data structures

• Describe, interpret and manipulate data structures: stacks, queues, trees, linked lists, arrays (up to three dimensional) and records.
• Represent the operation of linked lists and trees using pointers or arrays.
• Select and justify appropriate data types and structures for given situations.

Representation Of Data As Bit Patterns

How computers store information

Binary: a computer stores everything as 0s and 1s. Physically, a 0 or a 1 corresponds to power being off or on, a logic gate being open or closed, or a transistor conducting or not. 0s and 1s can be combined into many different patterns (bit patterns) to encode information.

Since writing a long string of 0s and 1s is tedious and error-prone, a more compact form is used: hexadecimal. Each hexadecimal digit represents four binary digits, which makes a long binary number much shorter. Since the largest 4-digit binary number is 1111 (decimal 15), and the largest single decimal digit is 9, six more symbols (A to F) are used for hexadecimal (or simply hex). The following table shows the hexadecimal digits and their binary counterparts.

| hex | binary | decimal |
|-----|--------|---------|
| 0 | 0000 | 0 |
| 1 | 0001 | 1 |
| 2 | 0010 | 2 |
| 3 | 0011 | 3 |
| 4 | 0100 | 4 |
| 5 | 0101 | 5 |
| 6 | 0110 | 6 |
| 7 | 0111 | 7 |
| 8 | 1000 | 8 |
| 9 | 1001 | 9 |
| A | 1010 | 10 |
| B | 1011 | 11 |
| C | 1100 | 12 |
| D | 1101 | 13 |
| E | 1110 | 14 |
| F | 1111 | 15 |

The primary use of hexadecimal notation is as a human-friendly representation of binary-coded values in computing and digital electronics. Hexadecimal is commonly used for memory addresses.

To convert binary to hexadecimal, follow these steps:

Step 1: group the binary bits in groups of 4, starting from the rightmost bit (the least significant bit) and padding with zeros on the left if necessary. For example:

Binary bit pattern: 1001100 can be grouped: 0100 1100

Step 2: convert each group of 4 bits into hex. In the above example, 0100 is hex 4 and 1100 is hex C. Therefore, the bit pattern 1001100 in hex representation is 4C.

Representation Of Positive And Negative Integers In A Fixed-Length Store

Computers use a fixed number of bits to represent an integer. This is called a fixed-length store. The commonly used bit-lengths for integers are 8-bit, 16-bit, 32-bit and 64-bit. Besides bit-lengths, there are two representation schemes for integers:

• Unsigned integers: can represent zero and positive integers.
• Signed integers: can represent zero, positive and negative integers.

Three representation schemes have been proposed for signed integers:

• Sign-magnitude representation
• 1's complement representation
• 2's complement representation


All the above schemes use the leftmost bit, the most significant bit (msb), as the sign bit to indicate whether the integer is negative or positive. If the sign bit is 1, the bit pattern represents a negative integer; if it is 0, the pattern represents a positive integer.

Sign-magnitude representation:

• The most significant bit (msb) is the sign bit, with a value of 0 representing a positive integer and 1 representing a negative integer.
• The remaining n-1 bits represent the magnitude (absolute value) of the integer.

Example: the 8-bit pattern 1000 0001 has 1 as the msb, so it is a negative number. The remaining bit pattern (000 0001) is decimal 1. Therefore, this bit pattern represents -1.

The drawbacks of sign-magnitude representation are:

• There are two representations (0000 0000B and 1000 0000B) for the number zero, which could lead to inefficiency and confusion.
• Positive and negative integers need to be processed separately.

1's complement representation:

• Again, the most significant bit (msb) is the sign bit, with a value of 0 representing positive integers and 1 representing negative integers.
• The remaining n-1 bits represent the magnitude of the integer, as follows:
  For positive integers, the absolute value of the integer is equal to "the magnitude of the (n-1)-bit binary pattern".
  For negative integers, the absolute value of the integer is equal to "the magnitude of the complement (inverse) of the (n-1)-bit binary pattern" (hence called 1's complement).

Example: the 8-bit pattern 1000 0001 represents a negative number (msb = 1). The inverse (complement) of the remaining bit pattern 000 0001 is 111 1110, which is 126. Thus we get -126.


Again, the drawbacks are:

• There are two representations (0000 0000B and 1111 1111B) for zero.
• Positive and negative integers need to be processed separately.

2's complement representation:

• The most significant bit (msb) is the sign bit, with a value of 0 representing positive integers and 1 representing negative integers.
• The remaining n-1 bits represent the magnitude of the integer, as follows:
  For positive integers, the absolute value of the integer is equal to "the magnitude of the (n-1)-bit binary pattern".
  For negative integers, the absolute value of the integer is equal to "the magnitude of the complement of the (n-1)-bit binary pattern plus one" (hence called 2's complement).
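The three signed schemes can be compared by decoding the same bit pattern under each of them. The following is a minimal sketch, not part of the original notes; the function name `decode_signed` is hypothetical and the pattern is assumed to be a string of at least two 0/1 characters.

```python
def decode_signed(bits):
    """Decode a bit string such as "10000001" under the three signed schemes."""
    n = len(bits)
    sign = bits[0]
    rest = int(bits[1:], 2)        # value of the remaining n-1 bits
    mask = (1 << (n - 1)) - 1      # n-1 ones, used to invert the remaining bits

    if sign == "0":
        # A positive number looks the same in all three schemes.
        return {"sign_magnitude": rest,
                "ones_complement": rest,
                "twos_complement": rest}

    inverted = rest ^ mask         # complement of the remaining bits
    return {
        "sign_magnitude": -rest,            # magnitude is the remaining bits
        "ones_complement": -inverted,       # magnitude is the complement
        "twos_complement": -(inverted + 1), # magnitude is the complement plus one
    }


print(decode_signed("10000001"))
print(decode_signed("11000100"))
```

The same pattern 1000 0001 therefore means -1, -126 or -127 depending on the scheme in use, which is why the scheme must always be stated.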

Steps to convert a 2’s complement representation to decimal:

1. Check the sign bit.
   If the sign bit is 0, the number is positive and its absolute value is the binary value of the remaining n-1 bits.
   If the sign bit is 1, the number is negative.


2. If the number is negative, invert the remaining bits and add 1 to get the absolute value (magnitude) of the negative number.

For example, take the 8-bit pattern 1 100 0100:
Sign bit is 1 → negative
Invert the remaining bits: 100 0100 ⇒ 011 1011
Add 1 to the inverted bits: 011 1011 + 1 = 011 1100, which in decimal is 60
Hence, the value is -60

Step 2 can also be done by scanning the remaining bits from the right (least significant bit). Look for the first occurrence of 1. Flip all the bits to the left of that first occurrence of 1. The flipped pattern gives the absolute value.

Arithmetic shifts can be useful as efficient ways of performing multiplication or division of signed integers by powers of two. Shifting left by n bits on a signed or unsigned binary number has the effect of multiplying it by 2ⁿ. Shifting right by n bits on a two's complement signed binary number has the effect of dividing it by 2ⁿ, but it always rounds down (towards negative infinity).

A left arithmetic shift of a binary number by 1: the empty position in the least significant bit is filled with a zero. Note that an arithmetic left shift may cause an overflow.
Before shift: 23 (0001 0111)
After shift: 46 (0010 1110)

A right arithmetic shift of a binary number by 1: the empty position in the most significant bit is filled with a copy of the original MSB.
Before shift: 23 (0001 0111)
After shift: 11 (0000 1011)

Logical shifts can be useful as efficient ways of performing multiplication or division of unsigned integers by powers of two. Shifting left by n bits on a signed or unsigned binary number has the effect of multiplying it by 2ⁿ. Shifting right by n bits on an unsigned binary number has the effect of dividing it by 2ⁿ (rounding towards 0).


A logical left shift of a binary number by 1: the empty position in the least significant bit is filled with a zero.
Before shift: 23 (0001 0111)
After shift: 46 (0010 1110)

A logical right shift of a binary number by 1: the empty position in the most significant bit is filled with a zero.
Before shift: 23 (0001 0111)
After shift: 11 (0000 1011)
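The difference between the two right shifts only shows up for negative numbers. The sketch below is an illustration rather than part of the original notes: it assumes an 8-bit store, and the helper names (`logical_left`, `logical_right`, `arithmetic_right`, `as_signed`) are hypothetical.

```python
BITS = 8
MASK = (1 << BITS) - 1            # 0xFF: keeps every result within 8 bits
SIGN_BIT = 1 << (BITS - 1)        # 1000 0000


def logical_left(x, n=1):
    """Shift left, filling the least significant bits with zeros."""
    return (x << n) & MASK


def logical_right(x, n=1):
    """Shift right, filling the most significant bits with zeros."""
    return (x & MASK) >> n


def arithmetic_right(x, n=1):
    """Shift right, filling the most significant bits with copies of the sign bit."""
    x &= MASK
    sign = x & SIGN_BIT
    for _ in range(n):
        x = (x >> 1) | sign
    return x


def as_signed(x):
    """Interpret an 8-bit pattern as a two's complement number."""
    return x - (1 << BITS) if x & SIGN_BIT else x


print(logical_left(23), logical_right(23), arithmetic_right(23))

minus_23 = 0b11101001             # -23 in 8-bit two's complement
print(as_signed(arithmetic_right(minus_23)))   # sign bit is copied, result stays negative
print(logical_right(minus_23))                 # sign bit is lost, result becomes positive
```

For 23 both right shifts give 11. For -23 the arithmetic shift gives -12 (rounding towards negative infinity), while the logical shift produces the unrelated positive value 116.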

Character representations in computers

So far we have covered integer representation in a fixed-length store. How do computers represent characters? Computers still use 0s and 1s, of course. It is just a matter of establishing a standard way to encode a set of characters using numbers. One of the most widely adopted standards is ASCII, which stands for American Standard Code for Information Interchange. An ASCII code is the numerical representation of a character such as 'a' or '@', or of an action of some sort. ASCII was developed in the 1960s and has been extended from its original 128 (7-bit) characters to 256 to include more symbols (not characters of other languages).

The drawbacks of ASCII:

• It was not originally designed for computers.
• It does not support other writing systems, like Chinese.
• It is limited to 256 characters and symbols.

Benefits of using ASCII:

• It enables computers (systems) to communicate with each other easily.
• Use of (mainly) just one code avoids confusion.

New character encoding standards have been developed to address the drawbacks of ASCII. One of these is Unicode, a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems. Developed in conjunction with the Universal Character Set standard and published in book form as The Unicode Standard, the latest version of Unicode at the time of writing (2013) contains a repertoire of more than 110,000 characters covering 100 scripts.

Real number representation in computers


Now we know we can represent integers (positive or negative) using binary numbers. However, representing fractions, or numbers with decimal points, in computers is not so easy. In base ten notation, the number 56.23 can be treated as follows:

50 + 6 + 0.2 + 0.03

which, written in base ten, is:

| Digit | 5 | 6 | . | 2 | 3 |
|-------|---|---|---|---|---|
| Place value | 5×10¹ | 6×10⁰ | . | 2×10⁻¹ | 3×10⁻² |

Similarly, in binary (base two) notation, the number 4.625 can be represented in the following way: 100.101

| Digit | 1 | 0 | 0 | . | 1 | 0 | 1 |
|-------|---|---|---|---|---|---|---|
| Place value | 1×2² | 0×2¹ | 0×2⁰ | . | 1×2⁻¹ | 0×2⁻² | 1×2⁻³ |

Loss Of Accuracy in binary representations of real numbers

Assume we have a computer that uses 8 bits to represent fractions, with the binary point in the middle, as the following table shows. The smallest value is zero (00000000), the largest value is 15.9375 (11111111) and the smallest non-zero value is 0.0625. Every represented value is a multiple of 0.0625.

| Power of 2 | 3 | 2 | 1 | 0 | . | -1 | -2 | -3 | -4 |
|------------|---|---|---|---|---|----|----|----|----|
| Value | 8 | 4 | 2 | 1 | . | 0.5 | 0.25 | 0.125 | 0.0625 |

For example, if we need to represent the number 3.14 using the above scheme, the closest binary representation to 3.14 is 00110010, which is in fact 3.125! This is NOT very accurate! In fact, no matter what we do, using a finite number of bits to represent real numbers will always give limited accuracy: there are infinitely many real numbers between any two real numbers (even between 0.1 and 0.2).

Floating Point Numbers

Introduction

Floating point numbers are numbers that contain a fractional part, i.e. they contain a decimal point with digits after it. They are called floating point because the point can 'float' or move when the number is expressed using scientific notation. For example, 123.456 and 0.4546 can be expressed as 1.23456 × 10² and 4.546 × 10⁻¹. In the first number the point has floated left, and in the second it has floated right.

Terminology

Points to note for the numbers 1.23456 × 10² and 4.546 × 10⁻¹:

• Base ten is used in the two numbers.
• 1.23456 and 4.546 are called the mantissa of each respective number.
• 2 and -1 (from 10² and 10⁻¹) are called the powers or exponents of the numbers.

Floating point binary numbers

Real numbers can be stored using floating point form, which stores a real number as a mantissa and an exponent.

An international standard called the IEEE 754 floating point standard (not required by the exam) defines the way a floating point binary fraction is stored. There are two main forms. One uses 32 bits (single precision) to store a number and the other uses 64 bits (double precision) to do the same, but with more accuracy.

The 32-bit form consists of a sign bit, an 8-bit exponent field and a 23-bit mantissa field.

[Figure: layout of the 32-bit single precision form]
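As an illustration only (IEEE 754 is not required by the exam), the three fields of the 32-bit form can be inspected with Python's standard `struct` module. The function name `float32_fields` is hypothetical; note that in IEEE 754 the stored exponent is offset by 127 and the leading 1 of the mantissa is not stored.

```python
import struct


def float32_fields(value):
    """Return (sign, exponent, mantissa) bit strings of a 32-bit float."""
    # Pack the value as a big-endian 32-bit float, then reinterpret
    # the same 4 bytes as an unsigned 32-bit integer.
    (raw,) = struct.unpack(">I", struct.pack(">f", value))
    bits = format(raw, "032b")
    return bits[0], bits[1:9], bits[9:]   # 1 + 8 + 23 bits


sign, exponent, mantissa = float32_fields(2.75)
print(sign, exponent, mantissa)
# 2.75 is 1.011 (binary) x 2^1, so the stored exponent is 1 + 127 = 128
print(int(exponent, 2))
```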

In the WJEC exam, you are unlikely to be asked to use the IEEE standards, but you may be asked to convert a decimal number to a binary floating point number with a given number of bits. The next example shows you how to do this.

Example 2: suppose we have the decimal number 2.75D. To convert it to binary floating point form with a 12-bit two’s complement mantissa and a 4-bit exponent, follow the steps below:

Step 1: convert the integer part: 2D = 10B
Step 2: convert the fractional part: 0.75D = 1 × ½ + 1 × ¼ = .11B
Step 3: create the mantissa by combining the sign bit (0 for positive) and the two parts: 010.11B
Step 4: calculate the exponent. The binary point has to move 2 places to the left to sit immediately after the sign bit (0.1011B), so the exponent is 2D, which as a 4-bit binary number is 0010
Step 5: pad the mantissa with 0s to 12 bits: 0101 1000 0000B

| Mantissa | Exponent |
|----------|----------|
| 0101 1000 0000 | 0010 |
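The steps above can be sketched in code. This is a minimal illustration under stated assumptions, not an exam method: the function name `to_floating_point` is hypothetical, the binary point is assumed to sit immediately after the sign bit of the mantissa, and only positive numbers are handled.

```python
from fractions import Fraction


def to_floating_point(value, mantissa_bits=12, exponent_bits=4):
    """Return (mantissa, exponent) as bit strings for a positive number."""
    if value <= 0:
        raise ValueError("this sketch only handles positive numbers")

    x = Fraction(value)    # exact arithmetic while normalising
    exponent = 0
    # Normalise: move the binary point until 0.5 <= x < 1,
    # so the mantissa starts 0.1... (sign bit 0, then a 1).
    while x >= 1:
        x /= 2
        exponent += 1
    while x < Fraction(1, 2):
        x *= 2
        exponent -= 1

    low = -(2 ** (exponent_bits - 1))
    high = 2 ** (exponent_bits - 1) - 1
    if not low <= exponent <= high:
        raise OverflowError("exponent does not fit in the exponent field")

    # Keep mantissa_bits - 1 fractional bits; int() truncates any extra bits.
    mantissa = int(x * 2 ** (mantissa_bits - 1))
    # Masking gives the two's complement pattern of a negative exponent.
    exponent_pattern = exponent & (2 ** exponent_bits - 1)
    return (format(mantissa, f"0{mantissa_bits}b"),
            format(exponent_pattern, f"0{exponent_bits}b"))


print(to_floating_point(2.75))   # ('010110000000', '0010')
```

A number that needs more than 11 fractional bits loses its extra bits at the `int()` step, which is exactly the truncation error that floating point form is prone to.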


Numbers should be stored in integer form when possible, because:

• they are stored with complete accuracy
• there is no need to store decimal places
• an integer requires less storage space than floating point form
• processing time may be reduced

Benefits of storing numbers in floating point form:

• a greater range of (positive/negative) numbers can be stored in the same number of bits
• numbers with more decimal places can be stored in the same number of bits (precision)
• decimal parts of a number can be stored

Drawbacks of floating point form:

• numbers are not normally stored completely accurately
• more complex processing is required
• there is no exact representation of zero

Overflow and underflow

No matter how many bits we use to represent real numbers, the finite number of bits used will limit the values that can be stored. For example, the 32-bit single precision floating point scheme only has 8 bits for the exponent. This gives magnitudes ranging approximately from 10⁻³⁸ to 10³⁸, for both positive and negative numbers.

• Overflow: the number is too large to be handled correctly by the computer.
• Underflow: the number is too small (too close to zero) to be represented in the exponent field (less than 2⁻¹²⁷, for example, in the 32-bit IEEE standard).

Rounding: the number is approximated to the nearest whole number/tenth/hundredth.
Truncating: the number is cut off at a whole number/tenth/hundredth, and the remaining digits are discarded.

Examples:

Rounding 37.75 to the nearest whole number gives 38
Truncating 37.75 to a whole number gives 37

In general, rounding tends to give an answer closer to the original number.

Using rounding in a program may result in problems such as:

• Further calculations increase the inaccuracy.
• A test for equality may fail due to a minor difference caused by rounding.
• In some applications a high level of accuracy is vital, and rounding may reduce this accuracy.

Data types and data structures


In CG2, you learned about some primitive data types, and about the one- and two-dimensional array and record data structures.

A data structure is a collection of related data items. They can be organised in a computer in many ways. In addition to arrays, the queue, stack, linked list and binary tree are commonly used, and each has its own way of organising the items it contains.

Using a two-dimensional array we can organise data such as pupils’ names and their heights like so:

| Name | JOE | HARRY | EMILY | ERIKA |
|------|-----|-------|-------|-------|
| Height | 1.7 | 1.69 | 1.65 | 1.76 |

For JOE: (0,0) holds the name and (0,1) holds JOE’s height.

The best way to think of a 3-D array is to imagine neatly stacked cubes in space, with x, y, z coordinates giving each cube’s position.

[Figure: stacked cubes with one cube highlighted; this element in the 3D array is myArray(2,3,1)]
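A short sketch of both arrays, using nested Python lists. The dimensions of the 3-D array and the value stored in it are arbitrary assumptions for illustration; `my_array` mirrors the `myArray(2,3,1)` element mentioned above.

```python
# 2-D array: the first index selects the pupil,
# the second index selects the field (0 = name, 1 = height).
pupils = [
    ["JOE", 1.7],
    ["HARRY", 1.69],
    ["EMILY", 1.65],
    ["ERIKA", 1.76],
]
print(pupils[0][0], pupils[0][1])   # JOE and his height

# 3-D array: a 4 x 5 x 3 block of zeros (sizes chosen arbitrarily).
X, Y, Z = 4, 5, 3
my_array = [[[0 for _ in range(Z)] for _ in range(Y)] for _ in range(X)]

# Three indices are needed to pick out a single element.
my_array[2][3][1] = 42
print(my_array[2][3][1])
```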

An example of the use of a 3D array is to keep track of pupils’ grades in subjects over a period of time.

Linked List

A linked list can be thought of as an array that can be as large as is needed (within the bounds of the RAM available on the computer). Put simply, each element of the list contains the item of data that is to be stored and the memory address of the next item in the list.

[Figure: linked list elements, each holding a data item and a pointer to the next element]

Doing this means that each item doesn’t have to be located immediately next to the previous element, and that you can always add another item to the list as needed.
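A linked list can be represented with arrays instead of real memory addresses: one array holds the data and a parallel array holds the index of the next item. The sketch below is one possible way to do this; the class and method names are hypothetical, and -1 is assumed as the null pointer. Inserting in order changes only the pointer of the item before the new one.

```python
NULL = -1   # assumed marker for "no next item"


class LinkedList:
    def __init__(self):
        self.data = []       # data[i] is the item stored in slot i
        self.next = []       # next[i] is the slot of the following item, or NULL
        self.start = NULL    # slot of the first (lowest) item

    def insert_ordered(self, item):
        # Store the new item in the next free slot.
        new = len(self.data)
        self.data.append(item)
        self.next.append(NULL)

        # Walk the list to find where the new item belongs.
        prev, cur = NULL, self.start
        while cur != NULL and self.data[cur] < item:
            prev, cur = cur, self.next[cur]

        # Aim the new item at its successor, then re-aim its predecessor.
        self.next[new] = cur
        if prev == NULL:
            self.start = new
        else:
            self.next[prev] = new

    def items(self):
        """Follow the pointers from the start and collect the data in order."""
        out, cur = [], self.start
        while cur != NULL:
            out.append(self.data[cur])
            cur = self.next[cur]
        return out


lst = LinkedList()
for value in (30, 10, 20):
    lst.insert_ordered(value)
print(lst.data, lst.next, lst.start)
print(lst.items())
```

The data stays in the order it arrived (30, 10, 20); only the pointer column and the start variable encode the ascending order.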


To add a new record to an ordered linked list, the insert location is established, the previous element’s pointer is changed to the location of the new element, and the newly inserted element’s pointer is aimed at the next element in the sequence. Therefore, only one existing item (the one before the new item) needs to be modified.

[Figure: inserting a new element into a linked list]

Binary Tree

A binary tree is a tree-like data structure in which each node has at most two child nodes, usually distinguished as "left" and "right". A binary tree is a simple way of storing data that can be searched easily later. To create one, you choose a “root node”, then take each additional value and put it to the left of the root if it is less than the root, or to the right if it is greater.

Advantage: faster to search or add a value than an array.
Disadvantage: more complex to program and process.

To add a new value to the tree, you work down to the bottom (using the less than/more than idea) and add your new value at the appropriate place. For example, the number “1” would go to the left of 2108: it’s less than 2617, less than 2456 and less than 2108.

[Figure: example binary tree with the root node labelled]

Queue (FIFO: First In, First Out)

A queue structure allows instructions to be passed to the CPU in the order in which they are received. A new instruction joins the “tail” of the queue, and moves along it as each instruction is processed at the head. It is simple to add an item to a queue: just add it to the end.

Example uses of the queue data structure:

• a printer queue
• a keyboard buffer
• a download buffer
• a processor scheduling queue (see CG3.2)
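A printer queue can be sketched with `collections.deque` from the Python standard library, which supports adding at the tail and removing from the head. The file names are made up for illustration.

```python
from collections import deque

print_queue = deque()

# New jobs join the tail of the queue.
print_queue.append("report.doc")
print_queue.append("photo.png")
print_queue.append("essay.doc")

# Jobs are processed from the head: first in, first out.
first = print_queue.popleft()
print(first)               # the job that was added first
print(list(print_queue))   # the remaining jobs, still in arrival order
```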

Stack (LIFO: Last In, First Out)

A stack is similar to a queue, with the difference being that the last item added to the stack is the first item to be retrieved. You can think of it as being similar to a large stack of papers on an office worker’s desk, where the employee has things added to it throughout the day. They simply work through the pile from top to bottom. New material can be added to the top of the pile at any time.
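A stack can be sketched with a plain Python list, pushing with `append` and popping with `pop`. The example models an undo history; the action descriptions are made up for illustration.

```python
undo_stack = []

# Each action is pushed onto the top of the stack.
undo_stack.append("type 'Hello'")
undo_stack.append("make text bold")
undo_stack.append("delete word")

# Undo removes the most recent action first: last in, first out.
last_action = undo_stack.pop()
print(last_action)    # the action that was added last
print(undo_stack)     # the earlier actions remain, oldest at the bottom
```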

Good examples of stacks:

• subroutine return addresses
• interrupt handling
• undo function

The examiner will be looking for the idea of “winding back” in your answers, so ensure your example has an element of “last in, first out” about it.

Exam Questions:


S2011.08

In a certain implementation, a linked list of integers is actually stored in a table form as shown below. The integers are to be accessed in ascending numerical order. A variable points to the address 852, which contains the lowest integer, 2415. Complete the pointer column in the table below.

S2010.10

(a) In a certain computer, sign/magnitude is used to represent integers using eight bits, with the left bit being set to zero for a positive number. Show how the negative number -8 will be represented. [1]

(b) (i) An advantage of floating point form in a computer is that it can be used to store numbers which are not integers. State one other advantage of using floating point form rather than integer form and state one problem which may result from storing numbers in floating point form. [2]

(ii) In a certain computer, real numbers are stored in floating point form using 16 bits as shown below:

Convert the number 18.5 into this floating point form. [2]

(c) State one benefit of using a character set such as ASCII. In the ASCII character set, the character “S” is stored as 01010011. How will the character “U” be stored? [2]