Data Structures and Algorithms. Abstract Data Types

Author: Maximilian Paul
Data Structures and Algorithms

Abstract Data Types

Introduction

Data Type (DT): the set of values a variable can take. Ex: an integer number is of the type integer.
• Internal representation: array of bits.
• Why DTs are needed:
  – To choose an optimal internal representation.
  – To make the most of the characteristics of a DT. Ex: arithmetic operations.
• Simple (elementary or primitive) DTs: integers, floats, booleans and chars.
• Simple associated operations. Ex: multiply.

Introduction (II)

Working with matrices of numbers → define a matrix DT and its associated operations (Ex: multiply).
Abstract Data Type (ADT): mathematical model with a set of operations (procedures).
• ADT = a generalization of a DT.
• Procedures = a generalization of primitive operations.
• Encapsulation or abstraction allows:
  – Keeping the ADT definition and its operations in the same place. ⇒ Specific libraries for an ADT.
  – Changing the implementation according to the problem.
  – Easier code debugging.
  – A better program structure.

Introduction (III)

Ex: ADT matrix → traditional implementation or sparse matrices.
• ADT implementation: translating a type definition into programming language statements, with one procedure for each ADT operation.
• An implementation has to choose a Data Structure (DS) for the ADT.
• A DS is built from simple DTs plus the structuring mechanisms of the programming language (arrays or records).

Example

Defining an ADT matrix of integers:
• Matrix of integers ≡ array of integer numbers arranged in rows and columns. Every row has the same number of elements, and every column has the same number of elements. An integer that belongs to a matrix is identified by its row and column.

Logic representation:

$$M = \begin{pmatrix} 3 & 7 & 9 \\ 4 & 5 & 6 \\ 2 & 3 & 5 \end{pmatrix}, \qquad M[1, 2] = 4.$$

Example (II)

• Operations:
  – Sum_Mat(A, B: Matrix) returns C: Matrix ≡ adds two matrices with the same number of rows and columns. The addition is carried out element by element: each integer of matrix A is added to the integer of matrix B at the same row and column.
  – Multiply_Mat(A, B: Matrix) returns C: Matrix ≡ multiplies two matrices that satisfy . . . .
  – Invert_Mat(A: Matrix) returns C: Matrix ≡ if the matrix has an inverse, it is computed . . . .

Example: Implementation

In C language:

#define NUM_FILAS 10
#define NUM_COLUMNAS 10

/* We define the ADT. */
typedef int t_matriz[NUM_FILAS][NUM_COLUMNAS];

/* We define a variable. */
t_matriz M;

void Suma_Mat(t_matriz A, t_matriz B, t_matriz C)
{
  int i, j;

  for (i = 0; i < NUM_FILAS; i++)
    for (j = 0; j < NUM_COLUMNAS; j++)
      C[i][j] = A[i][j] + B[i][j];
}

pila *apilar(pila *p, int e)
{
  if (p->tope + 1 == maxP)      /* Check whether the element fits. */
    tratarPilaLlena();          /* If it does not fit, handle the error. */
  else {                        /* If it fits, */
    p->tope = p->tope + 1;      /* update the top and */
    p->v[p->tope] = e;          /* insert the element. */
  }
  return(p);                    /* Return a pointer to the modified stack. */
}

pila *desapilar(pila *p)
{
  p->tope = p->tope - 1;        /* Decrement the top marker. */
  return(p);                    /* Return a pointer to the modified stack. */
}

int tope(pila *p)
{
  return(p->v[p->tope]);        /* Return the element marked by tope. */
}

int vaciap(pila *p)
{
  return(p->tope < 0);          /* Return 0 (false) if the stack is not empty, */
}                               /* and 1 (true) otherwise. */
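The array-based stack functions rely on a type `pila` with fields `v` and `tope` and on a constant `maxP`, whose definitions do not appear in this text. The following is a minimal sketch under stated assumptions: the struct layout is inferred from how the fields are used, and the value of `maxP`, the function `crearp` and the body of `tratarPilaLlena` are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

#define maxP 100                /* Assumed capacity; not given in the text. */

typedef struct {
  int v[maxP];                  /* Assumed: elements stored in a fixed array. */
  int tope;                     /* Index of the top element, -1 when empty. */
} pila;

/* Hypothetical error handler: the text only names it. */
void tratarPilaLlena(void)
{
  fprintf(stderr, "stack is full\n");
}

/* Assumed constructor, by analogy with the other representations. */
pila *crearp(void)
{
  pila *p = (pila *) malloc(sizeof(pila));
  if (p != NULL)
    p->tope = -1;               /* Consistent with vaciap(), which tests tope < 0. */
  return p;
}

pila *apilar(pila *p, int e)
{
  if (p->tope + 1 == maxP)
    tratarPilaLlena();
  else
    p->v[++p->tope] = e;
  return p;
}

pila *desapilar(pila *p) { p->tope--; return p; }
int tope(pila *p)        { return p->v[p->tope]; }
int vaciap(pila *p)      { return p->tope < 0; }
```

Starting `tope` at -1 keeps every operation at constant cost, since pushing and popping only move the marker. As in the text, `desapilar` and `tope` assume the caller has checked `vaciap` first.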


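Linked representation with dynamic memory

[Figure: the access pointer to the stack points to the top node Pn, which is linked to Pn−1, ···, P2 and finally P1, whose next pointer is null.]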

Type definition

#include <stdlib.h>     /* malloc, free */

typedef struct _pnodo {
  int e;                /* Element of the stack stored in this node. */
  struct _pnodo *sig;   /* Pointer to the next node holding an element. */
} pnodo;                /* Node type. Each node holds one stack element. */

typedef pnodo pila;

pila *crearp()
{
  return(NULL);         /* Return NULL to initialize */
}                       /* the access pointer to the stack. */

int tope(pila *p)
{
  return(p->e);         /* Return the element pointed to by p. */
}

pila *apilar(pila *p, int e)
{
  pnodo *paux;

  paux = (pnodo *) malloc(sizeof(pnodo));   /* Create a node. */
  paux->e = e;                              /* Store the element e. */
  paux->sig = p;                            /* The new node becomes the top of the stack. */
  return(paux);                             /* Return a pointer to the new top. */
}

pila *desapilar(pila *p)
{
  pnodo *paux;

  paux = p;             /* Keep a pointer to the node to delete. */
  p = p->sig;           /* The new top is the node pointed to by the current top. */
  free(paux);           /* Free the memory used by the current top. */
  return(p);            /* Return a pointer to the new top. */
}

int vaciap(pila *p)
{
  return(p == NULL);    /* Return 0 (false) if the stack is not empty, */
}                       /* and 1 (true) otherwise. */

Queues. Array and linked representation

Definition. A queue is a linear data structure characterized by the way its data is accessed: first-in-first-out (FIFO). Elements are inserted at the tail and extracted from the head.

[Figure: a queue Q1, Q2, ···, Qn with pcab marking the head (Q1) and pcol marking the tail (Qn). Inserting Qn+1 places it after Qn at the tail; extracting removes Q1 from the head.]

Applications
• A queue of processes waiting for a specific resource.
• A print queue.

Queue operations
• crearq(): creates an empty queue.
• encolar(q,e): puts an element e at the tail of the queue q.
• desencolar(q): deletes the head of the queue q.
• cabeza(q): returns the head element of the queue q.
• vaciaq(q): checks whether the queue q is empty: true if n = 0, false if n > 0.

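An array representation for queues

[Figure: an array with positions 0 to maxC−1 holding Q1, Q2, ···, Qn; pcab marks the head and pcol marks the tail.]

Possible cases

[Figure: two cases. In the first, Q1, ···, Qn occupy consecutive positions with pcab before pcol. In the second, the queue wraps around the end of the array, so pcol lies before pcab.]

[Figure: the array drawn as a circle, with position maxC−1 followed by position 0. When pcab and pcol coincide, is the queue empty or full?]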

Type definition

#define maxC ...        /* Maximum size of the array. */

typedef struct {
  int v[maxC];          /* Array defined at compile time. */
  int pcab, pcol;       /* Markers to the head and to the tail. */
  int talla;            /* Number of elements. */
} cola;

cola *crearq()
{
  cola *q;

  q = (cola *) malloc(sizeof(cola));   /* Allocate memory for the queue. */
  q->pcab = 0;                         /* Initialize the head marker. */
  q->pcol = 0;                         /* Initialize the tail marker. */
  q->talla = 0;                        /* Initialize the size. */
  return(q);                           /* Return a pointer to the created queue. */
}

cola *encolar(cola *q, int e)
{
  if (q->talla == maxC)                /* Check whether the element fits. */
    tratarColaLlena();                 /* If it does not fit, handle the error. */
  else {                               /* If it fits, */
    q->v[q->pcol] = e;                 /* store the element, */
    q->pcol = (q->pcol + 1) % maxC;    /* advance the tail marker, */
    q->talla = q->talla + 1;           /* and increase the size. */
  }
  return(q);                           /* Return a pointer to the modified queue. */
}

cola *desencolar(cola *q)
{
  q->pcab = (q->pcab + 1) % maxC;      /* Advance the head marker. */
  q->talla = q->talla - 1;             /* Decrease the size. */
  return(q);                           /* Return a pointer to the modified queue. */
}

int cabeza(cola *q)
{
  return(q->v[q->pcab]);               /* Return the element at the head. */
}

int vaciaq(cola *q)
{
  return(q->talla == 0);               /* Return 0 (false) if the queue is not empty, */
}                                      /* and 1 (true) otherwise. */
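The circular array leaves one question open: when `pcab` and `pcol` coincide, the queue may be either empty or full. The sketch below suggests how the `talla` field settles it. It is an illustration under assumptions: `maxC` is set to 4 only to force a wrap-around quickly, and `encolar_ok` is a hypothetical variant of `encolar` that reports a full queue through its return value instead of calling an error handler.

```c
#include <stdlib.h>

#define maxC 4                  /* Assumed small capacity, to force a wrap-around. */

typedef struct {
  int v[maxC];
  int pcab, pcol;               /* Head and tail markers. */
  int talla;                    /* Number of stored elements. */
} cola;

cola *crearq(void)
{
  cola *q = (cola *) malloc(sizeof(cola));
  if (q != NULL) { q->pcab = 0; q->pcol = 0; q->talla = 0; }
  return q;
}

/* Hypothetical variant: returns 1 on success and 0 when the queue is full. */
int encolar_ok(cola *q, int e)
{
  if (q->talla == maxC)
    return 0;
  q->v[q->pcol] = e;
  q->pcol = (q->pcol + 1) % maxC;   /* Wraps from maxC-1 back to 0. */
  q->talla++;
  return 1;
}

/* Removes the head; the caller must ensure the queue is not empty. */
void desencolar(cola *q)
{
  q->pcab = (q->pcab + 1) % maxC;
  q->talla--;
}

int cabeza(cola *q) { return q->v[q->pcab]; }
int vaciaq(cola *q) { return q->talla == 0; }
```

After four insertions into this queue both markers are 0 again, exactly as in the empty queue; only `talla` (4 versus 0) tells the two states apart.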

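Linked representation with dynamic memory

[Figure: pcab points to the node Q1, which is linked to Q2, ···, Qn−1 and Qn; pcol points to the last node Qn, whose next pointer is null.]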

Type definition

typedef struct _cnodo {
  int e;                /* Element of the queue stored in this node. */
  struct _cnodo *sig;   /* Pointer to the next node holding an element. */
} cnodo;                /* Node type. Each node holds one queue element. */

typedef struct {
  cnodo *pcab, *pcol;   /* Pointers to the head and to the tail. */
} cola;

cola *crearq()
{
  cola *q;

  q = (cola *) malloc(sizeof(cola));   /* Create a queue. */
  q->pcab = NULL;                      /* Initialize the pointers to NULL. */
  q->pcol = NULL;
  return(q);                           /* Return a pointer to the created queue. */
}

cola *encolar(cola *q, int e)
{
  cnodo *qaux;

  qaux = (cnodo *) malloc(sizeof(cnodo));   /* Create a node. */
  qaux->e = e;                              /* Store the element e. */
  qaux->sig = NULL;
  if (q->pcab == NULL)         /* If there is no element, */
    q->pcab = qaux;            /* pcab points to the new node; */
  else                         /* otherwise, */
    q->pcol->sig = qaux;       /* the new node goes after the one pointed to by pcol. */
  q->pcol = qaux;              /* The new node is now the one pointed to by pcol. */
  return(q);                   /* Return a pointer to the modified queue. */
}

cola *desencolar(cola *q)
{
  cnodo *qaux;

  qaux = q->pcab;              /* Keep a pointer to the node to delete. */
  q->pcab = q->pcab->sig;      /* Update pcab. */
  if (q->pcab == NULL)         /* If the queue becomes empty, */
    q->pcol = NULL;            /* update pcol. */
  free(qaux);                  /* Free the memory used by the node. */
  return(q);                   /* Return a pointer to the modified queue. */
}

int cabeza(cola *q)
{
  return(q->pcab->e);          /* Return the element at the head. */
}

int vaciaq(cola *q)
{
  return(q->pcab == NULL);     /* Return 0 (false) if the queue is not empty, */
}                              /* and 1 (true) otherwise. */

Lists. Linked and array representation

Definition. A list is a data structure formed by a sequence of objects. Each object is referenced by its position in the sequence.

Operations
• crearl(): creates an empty list.
• insertar(l,e,p): inserts e at position p of the list l. The elements from this position to the end are moved one position to the right: L1, ···, Lp, ···, Ln becomes L1, ···, e, Lp, ···, Ln, with n+1 elements.
• borrar(l,p): removes the element at position p of the list l: L1, ···, Lp, ···, Ln becomes L1, ···, Lp+1, ···, Ln, with n−1 elements.
• recuperar(l,p): returns the element at position p of the list l, i.e. Lp.
• vacial(l): checks whether the list l is empty: true if n = 0, false if n > 0.
• fin(l): returns the position that follows the last position of the list l, i.e. n+1.
• principio(l): returns the first position of the list l, i.e. 1.
• siguiente(l,p): returns the position that follows p in the list l, i.e. p+1.

Array representation of lists

[Figure: an array with positions 0 to maxL−1; the elements L1, L2, ···, Ln occupy positions 0 to n−1, and ultimo marks position n−1.]

Type definition

#define maxL ...           /* Maximum size of the array. */

typedef int posicion;      /* Each position is referenced with an integer. */

typedef struct {
  int v[maxL];             /* Array defined at compile time. */
  posicion ultimo;         /* Position of the last element. */
} lista;

lista *crearl()
{
  lista *l;

  l = (lista *) malloc(sizeof(lista));   /* Create the list. */
  l->ultimo = -1;                        /* Initialize the marker to the last element. */
  return(l);                             /* Return a pointer to the created list. */
}

lista *insertar(lista *l, int e, posicion p)
{
  posicion i;

  if (l->ultimo == maxL-1)               /* Check whether the element fits. */
    tratarListaLlena();                  /* If it does not fit, handle the error. */
  else {                                 /* If it fits, */
    for (i=l->ultimo; i>=p; i--)         /* open a gap at position p, */
      l->v[i+1] = l->v[i];
    l->v[p] = e;                         /* store the element, */
    l->ultimo = l->ultimo + 1;           /* and advance the marker to the last element. */
  }
  return(l);                             /* Return a pointer to the modified list. */
}

lista *borrar(lista *l, posicion p)
{
  posicion i;

  for (i=p; i<l->ultimo; i++)            /* Shift the elements of the array. */
    l->v[i] = l->v[i+1];
  l->ultimo = l->ultimo - 1;             /* Move the marker to the last element back. */
  return(l);                             /* Return a pointer to the modified list. */
}

int recuperar(lista *l, posicion p)
{
  return(l->v[p]);                       /* Return the element at position p. */
}

int vacial(lista *l)
{
  return(l->ultimo < 0);                 /* Return 0 (false) if the list is not empty, */
}                                        /* and 1 (true) otherwise. */

posicion fin(lista *l)
{
  return(l->ultimo + 1);                 /* Return the position after the last one. */
}

posicion principio(lista *l)
{
  return(0);                             /* Return the first position. */
}

posicion siguiente(lista *l, posicion p)
{
  return(p+1);                           /* Return the position that follows p. */
}

Linked representation of lists with dynamic memory

We use a sentinel node at the beginning of the list to make list updates more efficient.

[Figure: first points to the sentinel node, which is followed by the nodes L1, ···, Ln−1, Ln; last points to the node holding Ln.]

• First option: given a position p, the element Lp is in the node pointed to by p.
• Second option: given a position p, the element Lp is in the node pointed to by p->sig.

Type definition

typedef struct _lnodo {
  int e;                       /* Element of the list stored in this node. */
  struct _lnodo *sig;          /* Pointer to the next node holding an element. */
} lnodo;

typedef lnodo *posicion;       /* Each position is referenced with a pointer. */

typedef struct {               /* The list type holds pointers */
  posicion primero, ultimo;    /* to the first and last nodes. */
} lista;

lista *crearl()
{
  lista *l;

  l = (lista *) malloc(sizeof(lista));            /* Create a list. */
  l->primero = (lnodo *) malloc(sizeof(lnodo));   /* Create the sentinel. */
  l->primero->sig = NULL;
  l->ultimo = l->primero;
  return(l);                   /* Return a pointer to the created list. */
}

lista *insertar(lista *l, int e, posicion p)
{
  posicion q;

  q = p->sig;                  /* q points to the node that gets displaced. */
  p->sig = (lnodo *) malloc(sizeof(lnodo));       /* Create a node. */
  p->sig->e = e;               /* Store the element. */
  p->sig->sig = q;             /* The successor of the new node is the one pointed to by q. */
  if (p == l->ultimo)          /* If the inserted node has become the last one, */
    l->ultimo = p->sig;        /* update ultimo. */
  return(l);                   /* Return a pointer to the modified list. */
}

lista *borrar(lista *l, posicion p)
{
  posicion q;

  if (p->sig == l->ultimo)     /* If the node to delete is the last one, */
    l->ultimo = p;             /* update ultimo. */
  q = p->sig;                  /* q points to the node to delete. */
  p->sig = p->sig->sig;        /* p->sig now points to its successor. */
  free(q);                     /* Free the memory used by the deleted node. */
  return(l);                   /* Return a pointer to the modified list. */
}

int recuperar(lista *l, posicion p)
{
  return(p->sig->e);           /* Return the element at position p. */
}

int vacial(lista *l)
{
  return(l->primero->sig == NULL);   /* Return 0 (false) if the list is not empty, */
}                                    /* and 1 (true) otherwise. */

posicion fin(lista *l)
{
  return(l->ultimo);           /* Return the last position. */
}

posicion principio(lista *l)
{
  return(l->primero);          /* Return the first position. */
}

posicion siguiente(lista *l, posicion p)
{
  return(p->sig);              /* Return the position that follows p. */
}
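With the sentinel node, a position p refers to the element stored in p->sig, so `principio` returns the sentinel and `fin` returns the last node. A possible way to walk the whole list with these operations is sketched below. The helper `sumar` is hypothetical and not part of the ADT, and allocation failures are not handled in this sketch.

```c
#include <stdlib.h>

typedef struct _lnodo {
  int e;
  struct _lnodo *sig;
} lnodo;

typedef lnodo *posicion;

typedef struct {
  posicion primero, ultimo;
} lista;

lista *crearl(void)
{
  lista *l = (lista *) malloc(sizeof(lista));
  l->primero = (lnodo *) malloc(sizeof(lnodo));   /* Sentinel node. */
  l->primero->sig = NULL;
  l->ultimo = l->primero;
  return l;
}

lista *insertar(lista *l, int e, posicion p)
{
  posicion q = p->sig;
  p->sig = (lnodo *) malloc(sizeof(lnodo));
  p->sig->e = e;
  p->sig->sig = q;
  if (p == l->ultimo)            /* The new node is now the last one. */
    l->ultimo = p->sig;
  return l;
}

int recuperar(lista *l, posicion p)      { (void) l; return p->sig->e; }
posicion fin(lista *l)                   { return l->ultimo; }
posicion principio(lista *l)             { return l->primero; }
posicion siguiente(lista *l, posicion p) { (void) l; return p->sig; }

/* Hypothetical helper: adds up all the elements by walking the positions. */
int sumar(lista *l)
{
  int total = 0;
  posicion p;

  for (p = principio(l); p != fin(l); p = siguiente(l, p))
    total += recuperar(l, p);
  return total;
}
```

For an empty list `principio` and `fin` return the same node, so the loop body never runs. Inserting at `fin(l)` appends and inserting at `principio(l)` prepends, both without special cases, which is the benefit the sentinel is meant to bring.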

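Linked list representation with static memory

[Figure: an array with positions 0 to maxL−1. The marker p1 points to the start of the list L1, L2, ···, Ln, whose nodes are chained through the array; u1 marks its last node; p2 marks the start of the chain of empty nodes.]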

Type definition

#define maxL ...           /* Maximum size of the array. */

typedef int posicion;      /* The type posicion is defined as an integer. */

typedef struct {
  int v[maxL];             /* Array defined at compile time. */
  posicion p[maxL];        /* Array of positions, filled in at run time. */
  posicion p1,             /* Marker to the beginning of the list. */
           u1,             /* Marker to the end of the list. */
           p2;             /* Marker to the beginning of the list of empty nodes. */
} lista;

Given a position p, the element Lp is stored at l.v[l.p[p]].

lista *crearl()
{
  lista *l;
  int i;

  l = (lista *) malloc(sizeof(lista));   /* Create the list. */
  l->p1 = 0;                 /* Node 0 is the sentinel. */
  l->u1 = 0;
  l->p[0] = -1;
  l->p2 = 1;                 /* The list of empty nodes starts at node 1. */
  for (i=1; i<maxL-1; i++)
    l->p[i] = i+1;
  l->p[maxL-1] = -1;         /* The last empty node points nowhere. */
  return(l);                 /* Return a pointer to the created list. */
}

lista *insertar(lista *l, int e, posicion p)
{
  posicion q;

  if (l->p2 == -1)           /* If there are no empty nodes left, */
    tratarListaLlena();      /* handle the error. */
  else {
    q = l->p2;               /* Keep a marker to the first empty node. */
    l->p2 = l->p[q];         /* The first empty node is now the successor of q. */
    l->v[q] = e;             /* Store the element in the reserved node. */
    l->p[q] = l->p[p];       /* Its successor is the one of position p. */
    l->p[p] = q;             /* The successor of the node pointed to by p is now q. */
    if (p == l->u1)          /* If the inserted node has become the last one, */
      l->u1 = q;             /* update the marker u1. */
  }
  return(l);                 /* Return a pointer to the modified list. */
}

lista *borrar(lista *l, posicion p)
{
  posicion q;

  if (l->p[p] == l->u1)      /* If the node to delete is the last one, */
    l->u1 = p;               /* update u1. */
  q = l->p[p];               /* q marks the node to delete. */
  l->p[p] = l->p[q];         /* The successor of the node marked by p is now
                                the successor of the node marked by q. */
  l->p[q] = l->p2;           /* The deleted node becomes the first empty node. */
  l->p2 = q;                 /* The list of empty nodes now starts at q. */
  return(l);
}

Data Structures and Algorithms

Divide and Conquer

“Divide and Conquer” general concept

Given a problem with input size n:
1. Divide the problem into problems of smaller input size (sub-problems).
2. Solve the sub-problems independently (recursively).
3. Combine the solutions of the sub-problems to get the solution of the original problem.

Characteristics
• Recursive method.
• Cost of dividing the problem into sub-problems + cost of combining the results.
• Efficient method when the sub-problems have a similar input size.
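The three steps can be illustrated on a very small problem. The sketch below is a hypothetical example, not taken from the slides: it computes the maximum of A[l..r] by splitting the range into two halves of similar size. The name `dc_max` is an assumption.

```c
/* Hypothetical example: maximum of A[l..r] (l <= r) by divide and conquer. */
int dc_max(const int *A, int l, int r)
{
  int m, left, right;

  if (l == r)                      /* Base case: a single element. */
    return A[l];
  m = l + (r - l) / 2;             /* 1. Divide into two halves of similar size. */
  left = dc_max(A, l, m);          /* 2. Solve each half recursively. */
  right = dc_max(A, m + 1, r);
  return left > right ? left : right;   /* 3. Combine the two partial results. */
}
```

Here dividing and combining both take constant time, so the total cost is dominated by the number of recursive calls.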

Sorting Algorithms

Problem: to sort, in non-decreasing order, a set of n integers stored in an array A.

Insertion Sort

Strategy:
1. The array has two parts: one sorted and one unsorted.
2. The first element of the unsorted part (at index i) is selected and inserted at its corresponding position in the sorted part, keeping that part sorted.

[Figure: the array split into a sorted part (left) and an unsorted part (right); the element at index i moves from the start of the unsorted part into the sorted part, which grows by one position.]

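Algorithm: Insertion Sort
Parameters: A: array A[l, . . . , r]; l: index of the first array position; r: index of the last array position.

void Insertion_Sort(int *A, int l, int r)
{
  int i, j, aux;

  for (i = l+1; i <= r; i++) {
    aux = A[i];                              /* Element to insert. */
    j = i;
    while ((j > l) && (A[j-1] > aux)) {      /* Shift larger elements to the right. */
      A[j] = A[j-1];
      j--;
    }
    A[j] = aux;                              /* Place the element in its position. */
  }
}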

Function call: Insertion_Sort(A,0,4)

index:             0   1   2   3   4
initial array A:   45  14  33  3   56

first iteration (i=1, aux=14)
  j=1:   45  45  33  3   56
  j=0:   14  45  33  3   56    (aux stored at A[0])

second iteration (i=2, aux=33)
  j=2:   14  45  45  3   56
  j=1:   14  33  45  3   56    (aux stored at A[1])

third iteration (i=3, aux=3)
  j=3:   14  33  45  45  56
  j=2:   14  33  33  45  56
  j=1:   14  14  33  45  56
  j=0:   3   14  33  45  56    (aux stored at A[0])

fourth iteration (i=4, aux=56)
  j=4:   3   14  33  45  56    (aux stays at A[4])

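Efficiency analysis

Worst case

$$\sum_{i=2}^{n} (i - 1) = \frac{n(n-1)}{2} \in \Theta(n^2)$$

Best case

$$\sum_{i=2}^{n} 1 = n - 1 \in \Theta(n)$$

Average case

Cost of the insertion at step i:

$$c_i = \frac{1}{i}\left( 2(i-1) + \sum_{k=1}^{i-2} k \right) = \frac{(i-1)(i+2)}{2i} = \frac{i+1}{2} - \frac{1}{i}$$

$$\sum_{i=2}^{n} c_i = \sum_{i=2}^{n} \left( \frac{i+1}{2} - \frac{1}{i} \right) = \frac{n^2 + 3n}{4} - H_n \in \Theta(n^2)$$

where $H_n = \sum_{i=1}^{n} \frac{1}{i} \in \Theta(\log n)$.

Temporal cost of the “Insertion Sort” algorithm ⟹ Ω(n), O(n²)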
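The worst-case and best-case sums can be checked experimentally. The sketch below is an instrumented variant of insertion sort, added here for illustration only: it returns the number of evaluations of the comparison A[j-1] > aux. The name `insertion_sort_count` and the counter are assumptions, not part of the original algorithm.

```c
/* Insertion sort on A[l..r] that also counts key comparisons
   (evaluations of A[j-1] > aux). The counter is an illustrative addition. */
long insertion_sort_count(int *A, int l, int r)
{
  int i, j, aux;
  long comparisons = 0;

  for (i = l + 1; i <= r; i++) {
    aux = A[i];
    j = i;
    while (j > l) {
      comparisons++;               /* One comparison of A[j-1] against aux. */
      if (A[j-1] <= aux)
        break;                     /* Position found: stop shifting. */
      A[j] = A[j-1];
      j--;
    }
    A[j] = aux;
  }
  return comparisons;
}
```

On an array of n = 5 elements in decreasing order the function performs 1 + 2 + 3 + 4 = 10 = n(n−1)/2 comparisons, and on an already sorted array it performs n − 1 = 4, in agreement with the two sums.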

Selection Sort

Characteristics: given an array A[1, . . . , N]:
1. Initially, the minimum value is selected and placed at the first position; i.e., the smallest element goes to position 1.
2. Then, the minimum value of the sub-sequence A[2, . . . , N] is selected and placed at the second position; i.e., the 2nd smallest element goes to position 2.
3. This procedure is repeated for every position i of the array, assigning to each one the corresponding element: the i-th smallest.

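Algorithm: Selection Sort
Parameters: A: array A[l, . . . , r]; l: index of the first array position; r: index of the last array position.

void Selection_Sort(int *A, int l, int r)
{
  int i, j, min, aux;

  for (i = l; i < r; i++) {
    min = i;                       /* Index of the minimum of A[i..r] found so far. */
    for (j = i+1; j <= r; j++)
      if (A[j] < A[min])
        min = j;
    aux = A[i];                    /* Swap the minimum into position i. */
    A[i] = A[min];
    A[min] = aux;
  }
}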


Representation of trees

Representation using lists of children ⇒ build a list of children for each node.

Data Structure:
1. An array v with the information of each node.
2. Each element (node) of the array points to a linked list of elements (nodes) that identifies its children.
3. An index to the root.

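Representation using lists of children: example

[Figure: a tree whose nodes are numbered 1 to 13, next to its representation: an array v in which each entry points to the linked list of the children of that node (an empty list, drawn as /, for the leaves), plus an index to the root.]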

Representation using lists of children (III)

Advantages of this representation:
• It is a simple representation.
• It facilitates the operations related to accessing the children of a node.

Disadvantages of this representation:
• It wastes memory.
• Accessing the parent of a node is costly.

Representation using lists of children (IV)

Exercise: recursive function that prints the indexes of the nodes in preorder.

#define N ...

typedef struct snode{
  int e;
  struct snode *next;
} node;

typedef struct {
  int root;
  node *v[N];
} tree;

void preorder(tree *T, int n){
  node *aux;

  aux = T->v[n];
  printf(" %d ",n);
  while (aux != NULL){
    preorder(T,aux->e);
    aux = aux->next;
  }
}

Representation of trees (II)

Leftmost child − right sibling representation

For each node, save the following information:
1. key: value of the base type T stored in the node.
2. left child: leftmost child of the node.
3. right sibling: right sibling of the node.

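Leftmost child − right sibling representation: Example

[Figure: a tree with nodes A to M, next to its leftmost child − right sibling representation, in which each node stores its key, a pointer to its leftmost child and a pointer to its right sibling; empty pointers are drawn as /.]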

Leftmost child − right sibling representation (III)

A variation that facilitates access to the parent from a child node: link the rightmost child to its parent. Each node must then specify whether its right sibling pointer points to a sibling or to its parent.

Example:

[Figure: the same tree with nodes A to M, where the right sibling pointer of each rightmost child points back to its parent instead of being empty.]

Leftmost child − right sibling representation (IV)

Advantages of this representation:
• It facilitates the operations related to accessing the children and the parent of a node.
• Efficient memory use.

Disadvantages of this representation:
• The maintenance of the structure is complex.

Leftmost child − right sibling representation (V)

Exercise: function that recursively computes the height of a tree.

typedef struct snode{
  char key[2];
  struct snode *left;
  struct snode *right;
} tree;

int height(tree *T){
  tree *aux;
  int maxhsub=0, hsub;

  if (T == NULL)
    return(0);
  else if (T->left == NULL)
    return(0);
  else{
    aux = T->left;
    while (aux != NULL){
      hsub = height(aux);
      if (hsub > maxhsub)
        maxhsub = hsub;
      aux = aux->right;
    }
    return(maxhsub + 1);
  }
}

Binary trees

Binary tree: a finite set of nodes that is either empty or consists of a special node called the root plus the remaining nodes, which are grouped into two disjoint binary trees called the left subtree and the right subtree.

[Figure: a binary tree with root A and nodes B, C, D, E, F and G, with its left subtree and right subtree marked.]

Example of different binary trees:

[Figure: two different binary trees built from the nodes A, B, C, D, E, F and G.]

Representing binary trees

Representation using arrays

Data structure:
• Index to the root.
• Array v to store the information of each node.

Each element of the array (node) is a structure with:
1. key: value of the type T stored in the node.
2. left child: index of the node that is the left child.
3. right child: index of the node that is the right child.

Representation using arrays: Example

[Figure: binary tree with root A; B and C are the children of A; D and E are the children of B; G is the left child of E; F is the right child of C.]

index   left   key   right
0       /      /     /
1       /      F     /
2       /      /     /
3       5      A     6      ← root
4       /      D     /
5       4      B     8
6       /      C     1
7       /      G     /
8       7      E     /
...
N−1

Definition of types in C:

#define N ...
typedef ... type_baseT;

typedef struct{
  type_baseT e;
  int left, right;
} node;

typedef struct{
  int root;
  node v[N];
} tree;

Representing binary trees (II)

Representation using dynamic memory

For each node, the following information has to be saved:
1. key: value of type T stored in the node.
2. left child: pointer to the left child.
3. right child: pointer to the right child.

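Representation using dynamic memory: Example

[Figure: the binary tree with nodes A to G, next to its linked representation, in which each node holds its key and pointers to its left and right children; empty pointers are drawn as /.]

Definition of types in C:

typedef ... type_baseT;

typedef struct snode{
  type_baseT e;
  struct snode *left, *right;
} tree;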

Representation using dynamic memory (III)

⇒ Variation that facilitates access to the parent from a child: each node also stores a pointer to its parent.

[Figure: the binary tree with nodes A to G, next to its linked representation, in which each node also has a pointer to its parent; the parent pointer of the root A is empty.]

Definition of types in C:

typedef ... type_baseT;

typedef struct snode{
  type_baseT e;
  struct snode *left, *right, *parent;
} tree;

Traversing binary trees

As for every tree, we are going to study three ways:
• preorder traversal,
• inorder (symmetric order) traversal,
• postorder traversal.

Preorder(x)
  if x ≠ EMPTY then
    ActionP(x)
    Preorder(left(x))
    Preorder(right(x))

Inorder(x)
  if x ≠ EMPTY then
    Inorder(left(x))
    ActionP(x)
    Inorder(right(x))

Postorder(x)
  if x ≠ EMPTY then
    Postorder(left(x))
    Postorder(right(x))
    ActionP(x)

Cost ∈ Θ(n), where n is the number of nodes of the tree.

Traversing binary trees: Implementation

⇒ Representation using dynamic memory. ActionP ≡ print the key of the node.

Definition of types in C:

typedef int type_baseT;

typedef struct snode{
  type_baseT e;
  struct snode *left, *right;
} tree;

void preorder(tree *a){
  if (a != NULL){
    printf(" %d ",a->e);
    preorder(a->left);
    preorder(a->right);
  }
}

void inorder(tree *a){
  if (a != NULL){
    inorder(a->left);
    printf(" %d ",a->e);
    inorder(a->right);
  }
}

void postorder(tree *a){
  if (a != NULL){
    postorder(a->left);
    postorder(a->right);
    printf(" %d ",a->e);
  }
}
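To compare the three orders on a concrete tree, the sketch below replaces the printing action with one that stores the keys in an array, which makes the visiting order easy to inspect. The helper `new_node` and the `*_collect` functions are hypothetical names introduced for this illustration, and allocation failures are not handled.

```c
#include <stdlib.h>

typedef int type_baseT;

typedef struct snode {
  type_baseT e;
  struct snode *left, *right;
} tree;

/* Hypothetical helper: allocates a node with the given key and children. */
tree *new_node(type_baseT e, tree *left, tree *right)
{
  tree *t = (tree *) malloc(sizeof(tree));
  t->e = e;
  t->left = left;
  t->right = right;
  return t;
}

/* Each function appends the keys to out[] starting at index n
   and returns the number of keys stored so far. */
int preorder_collect(tree *a, type_baseT *out, int n)
{
  if (a != NULL) {
    out[n++] = a->e;
    n = preorder_collect(a->left, out, n);
    n = preorder_collect(a->right, out, n);
  }
  return n;
}

int inorder_collect(tree *a, type_baseT *out, int n)
{
  if (a != NULL) {
    n = inorder_collect(a->left, out, n);
    out[n++] = a->e;
    n = inorder_collect(a->right, out, n);
  }
  return n;
}

int postorder_collect(tree *a, type_baseT *out, int n)
{
  if (a != NULL) {
    n = postorder_collect(a->left, out, n);
    n = postorder_collect(a->right, out, n);
    out[n++] = a->e;
  }
  return n;
}
```

For the tree with root 1, whose left child 2 has children 4 and 5 and whose right child is 3, preorder visits 1 2 4 5 3, inorder visits 4 2 5 1 3, and postorder visits 4 5 2 3 1. Each node is visited exactly once, as the Θ(n) cost indicates.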

Binary trees: Exercise

A C function that deletes all the leaves of a binary tree, keeping the interior nodes.

tree *delete_leaves(tree *T){
  if (T == NULL)
    return(NULL);
  else if ( (T->left == NULL) && (T->right == NULL) ){
    free(T);
    return(NULL);
  }else{
    T->left = delete_leaves(T->left);
    T->right = delete_leaves(T->right);
    return(T);
  }
}

Complete binary tree

Complete binary tree: a binary tree in which every level has the maximum number of nodes except, possibly, the last one. In that case, the leaves of the last level are placed as far left as possible.

Complete binary tree: Representation

Complete binary trees can be represented with an array:
The root of the tree is located at position 1.
Given a node located at position i in the array:
• Its left child is located at position 2i.
• Its right child is located at position 2i + 1.
• Its parent is located at position ⌊i/2⌋, if i > 1.

[Figure: a complete binary tree of 11 nodes, each labeled with its array position, and the corresponding array:]

position:   1   2   3   4   5   6   7   8   9   10  11
value:      7   10  16  3   11  5   1   4   13  2   19

Complete binary tree: Exercise

Three C functions that, given a node of a complete binary tree represented using an array, calculate the array positions of its left child, its right child and its parent:

int left(int i)
{
  return(2*i);
}

int right(int i)
{
  return((2*i)+1);
}

int parent(int i)
{
  return(i/2);
}
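As a usage illustration, the index functions can be applied to the 11-element example array, stored from position 1 so that the formulas hold unchanged. In this sketch the array name `example_v`, the unused slot 0 and the helper `is_leaf` are assumptions made for the example; the values are those of the example array as read from the figure.

```c
/* Example array, stored from position 1; slot 0 is unused (assumption). */
static const int example_v[12] = {0, 7, 10, 16, 3, 11, 5, 1, 4, 13, 2, 19};

int left(int i)   { return 2 * i; }
int right(int i)  { return 2 * i + 1; }
int parent(int i) { return i / 2; }      /* Integer division gives the floor. */

/* Hypothetical helper: in a tree of n nodes stored at positions 1..n,
   node i is a leaf when its left child would fall outside the array. */
int is_leaf(int i, int n)
{
  return left(i) > n;
}
```

For instance, the children of the root (position 1, value 7) are at positions 2 and 3, with values 10 and 16, and the parent of position 11 is position 5, with value 11. No pointers are stored: the shape of the tree is implied by the positions.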

Properties of binary trees

• The maximum number of nodes at level i is $2^{i-1}$, i ≥ 1.
• A binary tree of i levels has at most $2^{i} - 1$ nodes, i ≥ 1.
• In a non-empty binary tree, if n0 is the number of leaves and n2 is the number of nodes of degree 2, then n0 = n2 + 1.
• The height of a complete binary tree that has n nodes is ⌊log2 n⌋.

Properties of binary trees: Example

[Figure: a binary tree with levels 1 to 4 completely filled; height = 3.]

Maximum nodes per level:
• Level 1: $2^0 = 1$
• Level 2: $2^1 = 2$
• Level 3: $2^2 = 4$
• Level 4: $2^3 = 8$

Maximum nodes in a tree: $2^4 - 1 = 15$

Number of leaves (n0): n2 = 7, so n0 = n2 + 1 = 7 + 1 = 8

Height of a complete binary tree: n = 15, ⌊log2 n⌋ = ⌊log2 (15)⌋ = 3

Data Structures and Algorithms

Sets

General Concepts

Set: a collection of distinct elements; each element can itself be a set or an atom.
Multiset ≡ set with repeated elements.
Representation of sets:
• Explicit representation. C = {1, 4, 7, 11}
• Representation using properties. C = {x ∈ N | x is even}

Notation

Fundamental relationship: membership (∈).
• x ∈ A, if x is a member of the set A.
• x ∉ A, if x is not a member of the set A.
Size or cardinality: the number of elements in a set.
Empty or null set: the set without any element, ∅.
A ⊆ B or B ⊇ A, if all the elements of A are also elements of B.
Two sets A and B are equal if and only if A ⊆ B and B ⊆ A.
Proper subset: A ≠ B and A ⊆ B.

Elementary operations on sets

Union of two sets: A ∪ B is the set whose elements are elements of A, of B, or of both.
Intersection of two sets: A ∩ B is the set whose elements are elements of both A and B.
Difference of two sets: A − B is the set whose elements are elements of A and are not elements of B.

Dynamic sets

Dynamic set: a set whose elements can change over time.

Representation of dynamic sets

Its elements have:
• key: the value that identifies the element.
• satellite information: additional information of the element.

A total order relationship may exist between the keys. Ex: the keys are integer or real numbers, words (lexicographic order), etc.
If a total order exists → we can define the minimum and maximum of the set, and the previous and successor of an element.

Operations on dynamic sets

Given a set S and an element x with key(x) = k, there are two types of operations:

Queries:
• Search(S,k): → x such that x ∈ S and key(x) = k; → null if there is no such element in S.
• Empty(S): whether S is empty or not.

Possible operations if a total order relationship exists between the keys of S:
• Minimum(S): → x with the smallest key k in S.
• Maximum(S): → x with the greatest key k in S.
• Previous(S,x): → element with the key immediately below that of x.
• Successor(S,x): → element with the key immediately above that of x.

Modifications:
• Insert(S,x): adds x to S.
• Delete(S,x): removes x from S.
• Create(S): creates S empty.

Hash tables

Dictionary: a set that mainly supports the following operations: insertion of an element, deletion of an element, and determining whether an element belongs to the set.

Direct addressing tables

Direct addressing: advisable when the universe U of possible elements is small.
Each element can be identified with a unique key.
Representation using an array whose size equals the size of the universe |U|: T[0, . . . , |U| − 1], where each position k of the array references the element x with key k.

Representation of direct addressing tables

Array of pointers to structures in which the information of the elements is stored.
Trivial operations of cost O(1):
• insert(T,x).
• search(T,k).
• delete(T,x).
Variation: store the information in the array itself. The position of an element indicates its key. A technique is then needed to distinguish empty positions from occupied ones.

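Representation of direct addressing tables (II)

[Figure: two tables with positions 0 to M−1. Left: an array of pointers in which each occupied position k points to a structure holding the key and the information of the element. Right: the variation in which the key and the information are stored in the array itself.]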

Hash tables

Using direct addressing when the universe U is large:
• It may be impossible to store the array.
• The number of stored elements is usually small compared with the size of the universe U → wasted space.
⇒ Limit the size of the array to T[0, . . . , M − 1].

Hash tables: the position of x with key k is obtained by applying a hashing function h to the key k: h(k).

h : K −→ {0, 1, . . . , M − 1}

Table address: each position of the array. x with key k is hashed to the table address h(k).

Hash tables (II)

[Figure: the keys k1, k2, k5 and k7 are mapped by h to the table addresses h(k1), h(k2), h(k5) and h(k7) of the array T, with positions 0 to M−1.]

Collision: two or more keys mapped to the same table address.

Collision-resolution by chaining

Strategy: use a linked list in each table address.

[Figure: table T with positions 0 to M−1; each table address points to a linked list holding the keys (k1 to k9) that are hashed to it.]

Collision-resolution by chaining (II) Given T and x, with key(x) = k, the operations are: Insert(T,x): inserts the element x at the head of the list pointed to by T[h(key(x))]. Search(T,k): searches for an element with key k in the list T[h(k)]. Delete(T,x): deletes x from the list whose head is T[h(key(x))].
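The three operations above can be sketched in C as follows. This is only an illustration under assumptions: integer keys, a table of 7 addresses, the division method as h, and the names `hnode` and `ch_*` are all hypothetical.

```c
#include <stdlib.h>

#define M 7                        /* number of table addresses (hypothetical) */

typedef struct hnode {
    int key;                       /* non-negative key */
    struct hnode *next;
} hnode;

static unsigned h(int k) { return (unsigned)k % M; }

/* Insert: link a new node at the head of the list T[h(k)]. O(1). */
hnode *ch_insert(hnode *T[], int k) {
    hnode *x = malloc(sizeof *x);
    if (x == NULL) return NULL;
    x->key = k;
    x->next = T[h(k)];
    T[h(k)] = x;
    return x;
}

/* Search: walk the list T[h(k)] until the key is found or the list ends. */
hnode *ch_search(hnode *T[], int k) {
    hnode *p = T[h(k)];
    while (p != NULL && p->key != k) p = p->next;
    return p;
}

/* Delete: unlink x from the list T[h(key(x))]; returns 1 if x was found. */
int ch_delete(hnode *T[], hnode *x) {
    hnode **pp = &T[h(x->key)];
    while (*pp != NULL && *pp != x) pp = &(*pp)->next;
    if (*pp == NULL) return 0;
    *pp = x->next;
    free(x);
    return 1;
}
```

With singly linked lists, Delete has to walk the list to find the predecessor of x, which is why its cost depends on the length of the list.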

Analysis of operation costs Given T with m table addresses storing n elements ⇒ load factor: α = n/m. Assumption: h(k) can be calculated in O(1).
Insert → O(1).
Search → O(n) in the worst case; Θ(1 + α) on average; if n = O(m) ⇒ O(1).
Delete → Θ(1 + α) ⇒ O(1).

Hashing functions “Ideal” function → satisfies simple uniform hashing: each element is equally likely to be mapped to any of the m table addresses. Such a function is difficult to find. In practice we use functions that distribute the elements among the table addresses acceptably. Universe of keys ≡ the set of natural numbers N = {0, 1, 2, ...}: keys are represented as natural numbers. Ex: string → combine the ASCII codes of its characters.

Hashing functions: the division method The key k is transformed to a value between 0 and m − 1: h(k) = k mod m. Example: If k = 100 and m = 12 → h(100) = 100 mod 12 = 4 → table address 4. If k = 16 and m = 12 → h(16) = 16 mod 12 = 4 → table address 4. Critical point: the choice of m. Good behavior → m prime and not close to a power of 2. Example: to store 2000 strings with α ≤ 3. Minimum number of table addresses required: 2000/3 ≈ 666.67. m close to 666, prime and not close to a power of 2: 701.
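The numbers of this slide can be reproduced with a few lines of C. The function names below are hypothetical helpers written for this example; the primality test is plain trial division, which is enough for table sizes.

```c
/* Division method: h(k) = k mod m. */
unsigned hash_division(unsigned long k, unsigned m) {
    return (unsigned)(k % m);
}

/* Smallest number of table addresses that keeps n/m <= alpha
   (alpha taken as an integer here): the ceiling of n / alpha. */
unsigned min_addresses(unsigned n, unsigned alpha) {
    return (n + alpha - 1) / alpha;
}

/* Trial-division primality test, to check a candidate value of m. */
int is_prime(unsigned m) {
    if (m < 2) return 0;
    for (unsigned d = 2; d * d <= m; d++)
        if (m % d == 0) return 0;
    return 1;
}
```

For the example above, `min_addresses(2000, 3)` gives 667, which is not prime (667 = 23 · 29), while the suggested m = 701 passes `is_prime`.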

Hashing functions: the multiplication method The key k is transformed to a value between 0 and m − 1 in two steps: 1. Multiply k by a constant A in the range 0 < A < 1 and keep only the fractional part: (k · A) mod 1. 2. Multiply the previous value by m and truncate the result to the closest lower integer (floor): h(k) = ⌊m ((k · A) mod 1)⌋. The value of m is not critical. Choosing m as a power of two simplifies the calculation: m = 2^p, p integer.
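A floating-point sketch of the two steps follows. The function name is hypothetical, and the constant A is left as a parameter because the slides do not fix it; the value (√5 − 1)/2 ≈ 0.6180339887 is a common textbook suggestion, used here only as an assumption. Precision degrades for very large keys, so this is an illustration rather than a production hash.

```c
/* Multiplication method: h(k) = floor(m * ((k * A) mod 1)), with 0 < A < 1. */
unsigned hash_multiplication(unsigned long k, unsigned m, double A) {
    double x = (double)k * A;
    /* Step 1: fractional part of k*A (x is non-negative, so truncation is floor). */
    double frac = x - (double)(unsigned long)x;
    /* Step 2: scale by m and truncate to the closest lower integer. */
    return (unsigned)(m * frac);
}
```

With A = 0.5 and m = 8, the key 3 gives 3 · 0.5 = 1.5, fractional part 0.5, and table address 4.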

Hashing functions: Examples with strings First, the string is converted to a natural number. String x of n characters (x = x_0 x_1 x_2 ... x_{n−1}).

Example of functions based on the division method. Function 1: sum the ASCII codes of the characters (uses the whole key).

$h(x) = \left(\sum_{i=0}^{n-1} x_i\right) \bmod m$

Disadvantages: • For large m, bad distribution of keys. Ex: m = 10007, strings of length ≤ 10 → maximum value of the sum: 255 · 10 = 2550. Table addresses from 2551 to 10006 empty. • The order of the characters is not considered: h(“spot”) = h(“stop”).

Example of functions based on the division method. (II) Function 2: use the first three characters as digits of a number in a given base (256).

$h(x) = \left(\sum_{i=0}^{2} x_i \, 256^{i}\right) \bmod m$

Disadvantages: • Strings with the same first three characters → same table address. h(“class”) = h(“clark”) = h(“clan”).

Function 3: similar to function 2, but considering the whole string.

$h(x) = \left(\sum_{i=0}^{n-1} x_i \, 256^{(n-1)-i}\right) \bmod m$

Disadvantages: • A costly h computation.
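One common way to evaluate function 3 without computing large powers of 256 is Horner's rule, reducing modulo m after every character. The sketch below is an assumption about how it could be coded (the name `hash_string3` is hypothetical); it yields the same value as the formula because (a · 256 + b) mod m can be reduced at each step.

```c
/* Function 3 via Horner's rule: h = ((...(x_0*256 + x_1)*256 + ...) + x_{n-1}) mod m,
   taking the remainder at every step so the intermediate value stays small. */
unsigned hash_string3(const char *x, unsigned m) {
    unsigned long long h = 0;
    for (; *x != '\0'; x++)
        h = (h * 256 + (unsigned char)*x) % m;
    return (unsigned)h;
}
```

Unlike function 1, the order of the characters matters here, so “spot” and “stop” are not forced into the same table address.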

Example of functions based on the multiplication method. Function 4: sum the ASCII codes of the characters, treating them as digits of a number in base 2.

$h(x) = \left\lfloor m \left( \left( \left(\sum_{i=0}^{n-1} x_i \, 2^{(n-1)-i}\right) \cdot A \right) \bmod 1 \right) \right\rfloor$

Examples of functions over strings: Empirical evaluation Number of elements n = 3975. Number of table addresses m = 2003. Function 1: α = 1.985; standard deviation = 4.33. [Figure: two plots. Elements per bucket for each bucket (0 to 2000), and number of buckets (logarithmic scale) holding each number of elements (0 to 33).]

Examples of functions over strings: Empirical evaluation (II) Function 2: α = 1.985; standard deviation = 8.97. [Figure: two plots. Elements per bucket for each bucket (0 to 2000), and number of buckets (logarithmic scale) holding each number of elements (0 to 300).]

Examples of functions over strings: Empirical evaluation (III) Function 3: α = 1.985; standard deviation = 1.43. [Figure: two plots. Elements per bucket for each bucket (0 to 2000), and number of buckets (logarithmic scale) holding each number of elements (0 to 9).]

Examples of functions over strings: Empirical evaluation (IV) Function 4: α = 1.985; standard deviation = 1.8. [Figure: two plots. Elements per bucket for each bucket (0 to 2000), and number of buckets (logarithmic scale) holding each number of elements (0 to 14).]

Examples of functions over strings: Variation of the number of table addresses Function 3:
m = 2003: α = 1.985; standard deviation = 1.43.
m = 2000: α = 1.987; standard deviation = 7.83.
[Figure: for each value of m, elements per bucket for each bucket, and number of buckets (logarithmic scale) holding each number of elements.]

Examples of functions over strings: Variation of the number of table addresses (II) Function 4:
m = 2003: α = 1.985; standard deviation = 1.8.
m = 2000: α = 1.987; standard deviation = 1.8.
[Figure: for each value of m, elements per bucket for each bucket, and number of buckets (logarithmic scale) holding each number of elements.]

Hash tables: Exercise Definitions:
#define NCUB ...
typedef struct snode{
  char *pal;
  struct snode *next;
}node;
typedef node *Table[NCUB];
Table *T1, *T2, *T3;
/* Creates an empty table */
void create_Table(Table *T);
/* Inserts the word pal in the table T */
void insert(Table *T, char *pal);
/* Returns a pointer to the node which has the word pal, or NULL if it is not found */
node *search(Table *T, char *pal);

Hash tables: Exercise (II) Function that creates a table T3 with the elements of the intersection of the tables T1 and T2.
void intersection(Table *T1, Table *T2, Table *T3){
  int i; node *aux;
  create_Table(T3);
  /* Traverse every list of T1 and keep the words that also appear in T2 */
  for(i=0; i<NCUB; i++){
    aux = (*T1)[i];
    while (aux != NULL){
      if ( search(T2,aux->pal) != NULL ) insert(T3,aux->pal);
      aux = aux->next;
    }
  }
}

Binary search trees Binary search tree: a binary tree that is either empty or satisfies: Each node has a unique key (no repeated keys). For each node n, if its left subtree is not empty, all the keys stored in that subtree are smaller than the key stored at n. For each node n, if its right subtree is not empty, all the keys stored in that subtree are greater than the key stored at n. Used for representing dictionaries, priority queues, etc.

Binary search trees: Examples [Figure: several example binary trees, marked YES when they are binary search trees and NO when some key violates the ordering condition.]

Binary search trees representation Using dynamic variables, each node is a struct: key: key of the node. left child: pointer to the left child. right child: pointer to the right child. [Figure: a bst with keys 15, 7, 18, 3, 11, 21 and 9, and its linked representation; / marks a NULL pointer.]

Binary search trees representation (II) Type definition in C:
typedef ... type_baseT;
typedef struct snode{
  type_baseT key;
  struct snode *left, *right;
}bst;
[Figure: linked representation of the previous tree using the left and right pointers; / marks a NULL pointer.]

Binary search trees representation (III) Variation: saving a pointer to the parent.
typedef ... type_baseT;
typedef struct snode{
  type_baseT key;
  struct snode *left, *right, *parent;
}bst;
[Figure: linked representation of the previous tree in which every node also points to its parent; the parent pointer of the root is NULL.]

Maximum and minimum height of a bst Minimum height: O(log n). Maximum height: O(n). [Figure: a full tree of height log n (minimum height) and a chain-shaped tree of height h (maximum height).] Notation: height of a tree = h. Normally, the height of a random bst ∈ O(log n).

Traversing binary search trees 3 methods: preorder, inorder and postorder. Inorder: if the action is, for instance, to print the key of the node, the keys of the bst are printed in increasing order.
void inorder(bst *T){
  if (T != NULL){
    inorder(T->left);
    printf(" %d ",T->key);
    inorder(T->right);
  }
}
Exercise: a trace of inorder with the previous tree: 3, 7, 9, 11, 15, 18, 21

Searching an element in a bst The most common operation.

bst *bst_search(bst *T, type_baseT x){
  while ( (T != NULL) && (x != T->key) )
    if (x < T->key) T = T->left;
    else T = T->right;
  return(T);
}
Cost: O(h).

Searching an element in a bst (II) Exercise: recursive version of the previous algorithm.
bst *bst_search(bst *T, type_baseT x){
  if (T != NULL){
    if (x == T->key) return(T);
    else if (x < T->key) return(bst_search(T->left,x));
    else return(bst_search(T->right,x));
  }
  return(T);
}

Searching an element in a bst: Example 1 Search x = 11 → node = bst_search(T,11); [Figure: trace on the previous tree. T starts at 15; T=T->left; moves to 7; T=T->right; moves to 11; return(T);]

Searching an element in a bst: Example 2 Search x = 19 → node = bst_search(T,19); [Figure: trace on the previous tree. T starts at 15; T=T->right; moves to 18; T=T->right; moves to 21; T=T->left; leaves T=NULL; return(T);]

Searching the minimum and maximum
bst *minimum(bst *T){
  if (T != NULL)
    while(T->left != NULL) T=T->left;
  return(T);
}
bst *maximum(bst *T){
  if (T != NULL)
    while(T->right != NULL) T=T->right;
  return(T);
}
Cost: O(h).

Searching the minimum and maximum: Example [Figure: traces on the previous tree. Minimum: T starts at 15; T=T->left; moves to 7; T=T->left; moves to 3; return(T); Maximum: T starts at 15; T=T->right; moves to 18; T=T->right; moves to 21; return(T);]

Inserting an element in a bst → the bst condition must be kept. Strategy: 1. If the bst is empty → insert the new node as the root. 2. If the bst is not empty: a) Search the position that corresponds to the new node, traversing the tree from the root as in a search. b) Once the position is located, insert the node by linking it correctly with its parent node. Cost: O(h).

Inserting an element in a bst (II)
bst *bst_insert(bst *T, bst *node){
  bst *aux, *aux_parent=NULL;
  aux = T;
  while (aux != NULL){
    aux_parent = aux;
    if (node->key < aux->key) aux = aux->left;
    else aux = aux->right;
  }
  if (aux_parent == NULL) return(node);
  else{
    if (node->key < aux_parent->key) aux_parent->left = node;
    else aux_parent->right = node;
    return(T);
  }
}

Inserting an element in a bst: Example Inserting a node of key 10. [Figure: trace on the previous tree. aux descends 15 → 7 → 11 → 9 → NULL (left, right, left, right), each step doing aux_parent=aux; before moving aux. When aux=NULL, aux_parent points to 9: aux_parent->right=node; return(T);]

Deleting an element in a bst Strategy: 1. If the element to delete has no children (it is a leaf), it is removed. 2. If the element to delete is an interior node with one child, it is removed and its position is taken by its only child. 3. If the element to delete is an interior node with two children, it is removed and its position is taken by the node of minimum key of its right subtree (or by the node of maximum key of its left subtree). Cost: O(h). Exercise: Implement a C function that deletes an element x from the bst.

Deleting an element in a bst: Leaf
Deleting x = 15. A leaf with no parent (aux_parent = NULL): T=NULL; free(aux); return(T);
Deleting x = 9. A leaf that is the left child of a node: aux_parent->left=NULL; free(aux); return(T);
[Figure: the tree before and after each deletion, with aux pointing to the deleted node and aux_parent to its parent.]

Deleting an element in a bst: Leaf (II)
Deleting x = 21. A leaf that is the right child of a node: aux_parent->right = NULL; free(aux); return(T);
[Figure: the tree before and after the deletion.]

Deleting an element in a bst: Interior node with just one child
Deleting x = 15. Interior node that is the root of the tree and has only a right child (aux_parent = NULL): T=aux->right; free(aux); return(T);
Deleting x = 7. Interior node that is a left child and has only a right child: aux_parent->left=aux->right; free(aux); return(T);
[Figure: the tree before and after each deletion.]

Deleting an element in a bst: Interior node with just one child (II)
Deleting x = 21. Interior node that is a right child and has only a right child: aux_parent->right=aux->right; free(aux); return(T);
Deleting x = 15. Interior node that is the root and has only a left child (aux_parent = NULL): T=aux->left; free(aux); return(T);
[Figure: the tree before and after each deletion.]

Deleting an element in a bst: Interior node with just one child (III)
Deleting x = 3. Interior node that is a left child and has only a left child: aux_parent->left=aux->left; free(aux); return(T);
Deleting x = 19. Interior node that is a right child and has only a left child: aux_parent->right=aux->left; free(aux); return(T);
[Figure: the tree before and after each deletion.]

Deleting an element in a bst: Interior node with two children
Deleting x = 18. An interior node with two children whose right child is the node of minimum key of its right subtree (p1 points to that node; p2 = NULL): aux->key=p1->key; aux->right=p1->right; free(p1); return(T);
[Figure: the tree before and after the deletion.]

Deleting an element in a bst: Interior node with two children (II)
Deleting x = 10. Interior node with two children in which the node of minimum key of its right subtree is not its right child (p1 points to the minimum, p2 to the parent of p1; aux_parent = NULL): aux->key=p1->key; p2->left=p1->right; free(p1); return(T);
[Figure: the tree before and after the deletion.]
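The cases traced in the deletion slides can be gathered into a single function. What follows is one possible solution sketch to the exercise, not the official one: `type_baseT` is assumed to be int, and the variable names aux, aux_parent, p1 and p2 are reused from the figures.

```c
#include <stdlib.h>

typedef int type_baseT;            /* assumption: integer keys */
typedef struct snode {
    type_baseT key;
    struct snode *left, *right;
} bst;

/* Deletes the node with key x (if present) and returns the new root. */
bst *bst_delete(bst *T, type_baseT x) {
    bst *aux = T, *aux_parent = NULL, *child, *p1, *p2;

    /* Locate the node and its parent, as in a search. */
    while (aux != NULL && aux->key != x) {
        aux_parent = aux;
        aux = (x < aux->key) ? aux->left : aux->right;
    }
    if (aux == NULL) return T;     /* x is not in the tree */

    if (aux->left != NULL && aux->right != NULL) {
        /* Two children: copy the minimum key of the right subtree (p1)
           into aux and unlink p1, which has no left child. */
        p1 = aux->right;
        p2 = NULL;
        while (p1->left != NULL) { p2 = p1; p1 = p1->left; }
        aux->key = p1->key;
        if (p2 == NULL) aux->right = p1->right;
        else p2->left = p1->right;
        free(p1);
        return T;
    }

    /* Leaf or one child: the only child (or NULL) takes the place of aux. */
    child = (aux->left != NULL) ? aux->left : aux->right;
    if (aux_parent == NULL) T = child;
    else if (aux_parent->left == aux) aux_parent->left = child;
    else aux_parent->right = child;
    free(aux);
    return T;
}
```

Treating a leaf as the one-child case with a NULL child keeps the function short; the cost remains O(h), since both the search and the descent to the minimum follow a single path.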

Binary search trees: Exercise Recursive function that prints the keys of a bst smaller than a key k.

typedef ... type_baseT;
typedef struct snode{
  type_baseT key;
  struct snode *left, *right;
}bst;
bst *T;

void smaller(bst *T, type_baseT k){
  if (T != NULL){
    if (T->key < k){
      printf(" %d ",T->key);
      smaller(T->left,k);
      smaller(T->right,k);
    }
    else smaller(T->left,k);
  }
}

→ Cost: O(n).

Heaps. Priority queues. Complete binary tree: binary tree in which all levels have the maximum number of nodes except, possibly, the last one, whose leaves are placed as far left as possible. Heap: set of n elements represented in a complete binary tree that satisfies the following: for every node i except the root, the key of node i is less than or equal to the key of the parent of i (the heap property).

Representing heaps → using the static representation of complete binary trees: Position 1 of the array → root of the tree. Node at position i of the array: • Position 2i → left child of i. • Position 2i + 1 → right child of i. • Position ⌊i/2⌋ → parent of i, if i > 1. From position ⌊n/2⌋ + 1 to position n → keys of the leaf nodes. (n is the number of nodes of the complete binary tree)
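The index arithmetic can be checked with a few helper functions. The names below are hypothetical; the array is used from position 1 (position 0 is left unused) to match the slides, and keys are assumed to be integers.

```c
/* Index arithmetic of the static representation (positions start at 1). */
int heap_parent(int i) { return i / 2; }      /* floor(i/2) for i > 1 */
int heap_left(int i)   { return 2 * i; }
int heap_right(int i)  { return 2 * i + 1; }

/* Returns 1 if A[1..n] satisfies the heap property A[parent(i)] >= A[i]. */
int is_heap(const int *A, int n) {
    for (int i = 2; i <= n; i++)
        if (A[heap_parent(i)] < A[i]) return 0;
    return 1;
}
```

For n = 10 the leaves occupy positions 6 to 10, since `heap_left(6)` is already 12, beyond the end of the heap.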

Representing heaps: Example [Figure: a heap drawn as a complete binary tree, with the array position next to each node, and the corresponding array A.]
A[1..10] = 16 14 10 8 7 9 3 2 4 1
Number of nodes of the heap: size_Heap (here 10), which may be smaller than the size of the array.

Heaps: Properties Using the static representation of a complete binary tree: A[⌊i/2⌋] ≥ A[i], 1 < i ≤ n. Node of maximum key → root. Each branch is sorted in decreasing order from the root to the leaves. Height ∈ Θ(log n).

Keeping the heap property Assume A[i] < A[2i] and/or A[i] < A[2i + 1]. [Figure: the tree 16 4 10 14 7 9 3 2 8 1, in which the node at position 2 (key 4) is smaller than its children.] → heapify function: restores the heap property in the subtree rooted at i, assuming that the left and right subtrees of i are already heaps.

Keeping heap property: Heapify
void heapify(type_baseT *M, int i){
  type_baseT aux; int left, right, greater;
  left = 2*i; right = 2*i+1;
  /* greater = position of the largest key among i and its children */
  if ( (left <= size_Heap) && (M[left] > M[i]) ) greater = left;
  else greater = i;
  if ( (right <= size_Heap) && (M[right] > M[greater]) ) greater = right;
  if (greater != i){
    aux = M[i]; M[i] = M[greater]; M[greater] = aux;
    heapify(M,greater);
  }
}

Keeping heap property: Example → initial call heapify(M,2) [Figure: tree and array M = 16 4 10 14 7 9 3 2 8 1. i = 2, left = 4, right = 5 → greater = 4 (key 14).]

Keeping heap property: Example (II) [Figure: after swapping M[2] and M[4], M = 16 14 10 4 7 9 3 2 8 1. Recursive call with i = 4, left = 8, right = 9 → greater = 9 (key 8).]

Keeping heap property: Example (III) [Figure: after swapping M[4] and M[9], M = 16 14 10 8 7 9 3 2 4 1. Recursive call with i = 9, which has no children → greater = i and the process ends.]

Temporal cost of heapify: Best case: O(1). Worst case: O(h) = O(log n). For a node i: O(h), where h is the height of i.

Building a heap: build_Heap Problem: given a set of n elements in a complete binary tree represented in an array M, how to transform the array into a heap. Strategy: 1. The leaves are heaps. 2. Apply heapify over the rest of the nodes, starting with the greatest non-leaf index (M[⌊n/2⌋]) and descending until the root (M[1]).
void build_Heap(type_baseT *M, int n){
  int i;
  size_Heap = n;
  for(i=n/2; i>0; i--) heapify(M,i);
}

Building a heap: Example build_Heap(M,10); [Figure: tree and array after each call to heapify, with i and greater marked.]
Initial array: M = 4 1 3 2 16 9 10 14 8 7
heapify(M,5): no change.
heapify(M,4): M = 4 1 3 14 16 9 10 2 8 7
heapify(M,3): M = 4 1 10 14 16 9 3 2 8 7
heapify(M,2) → heapify(M,5): M = 4 16 10 14 7 9 3 2 8 1
heapify(M,1) → heapify(M,2) → heapify(M,4): M = 16 14 10 8 7 9 3 2 4 1

Temporal cost of build_Heap 1st approximation: as heapify ∈ O(log n) ⇒ build_Heap ∈ O(n log n). 2nd approximation: heapify over a node of height h is O(h) → for the N_h nodes of height h: N_h O(h).

Height (h)        Level (i)         Max. number of nodes
⌊log2 n⌋          1                 2^0
⌊log2 n⌋ − 1      2                 2^1
⌊log2 n⌋ − 2      3                 2^2
...               ...               ...
1                 ⌊log2 n⌋          2^(⌊log2 n⌋ − 1)
0                 ⌊log2 n⌋ + 1      2^⌊log2 n⌋

$N_h \le 2^{\lfloor \log_2 n \rfloor - h} = \dfrac{2^{\lfloor \log_2 n \rfloor}}{2^h}$

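Temporal cost of build_Heap (II) For each height, heapify is applied over each node.

$$T(n) = \sum_{h=0}^{\lfloor \log_2 n \rfloor} N_h \, O(h) \le \sum_{h=0}^{\lfloor \log_2 n \rfloor} \frac{2^{\lfloor \log_2 n \rfloor}}{2^h} \, h \le \sum_{h=0}^{\log_2 n} \frac{2^{\log_2 n}}{2^h} \, h = n \sum_{h=0}^{\log_2 n} h \left(\frac{1}{2}\right)^h$$

As $\sum_{h=0}^{\infty} h x^h = x/(1 - x)^2$ if $0 < x < 1$, ⇒

$$n \sum_{h=0}^{\log_2 n} h \left(\frac{1}{2}\right)^h \le n \, \frac{1/2}{(1 - 1/2)^2} = 2n \in O(n)$$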

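Heapsort Sorting method Strategy: 1. build_Heap: transform the array M into a heap. 2. The greatest key is always at the root → swap the root with the last node of the heap → the greatest element is placed in the last position. Then reduce the size of the heap by 1, apply heapify over the root, and repeat with the remaining heap. [Figure: array M[1..n] with size_Heap marking the end of the heap; the positions after it already hold their final values.]
void heapsort(type_baseT *M, int n){
  int i; type_baseT aux;
  build_Heap(M,n);
  for(i=n; i>1; i--){
    aux = M[i]; M[i] = M[1]; M[1] = aux;
    size_Heap--;
    heapify(M,1);
  }
}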

Heapsort Sorting method: Example [Figure: tree and array after each step of every iteration: M[1]↔M[i]; heapify(M,1); end of iteration.]
Initial heap: M = 16 14 10 8 7 9 3 2 4 1
End of iteration i = 10: M = 14 8 10 4 7 9 3 2 1 16
End of iteration i = 9: M = 10 8 9 4 7 1 3 2 14 16
End of iteration i = 8: M = 9 8 3 4 7 1 2 10 14 16
End of iteration i = 7: M = 8 7 3 4 2 1 9 10 14 16
End of iteration i = 6: M = 7 4 3 1 2 8 9 10 14 16
End of iteration i = 5: M = 4 2 3 1 7 8 9 10 14 16
End of iteration i = 4: M = 3 2 1 4 7 8 9 10 14 16
End of iteration i = 3: M = 2 1 3 4 7 8 9 10 14 16
End of iteration i = 2: M = 1 2 3 4 7 8 9 10 14 16

Temporal cost of heapsort Building a heap: O(n). n − 1 calls to heapify (each of cost O(log n)). ⇒ heapsort ∈ O(n log n). Only when all the elements are equal is the cost of heapsort O(n).

Priority queues The common application of a heap. Priority queue: set S of elements, each one with a key (priority). Associated operations: • Insert(S,x): inserts x in the set S. • Extract_Max(S): deletes and returns the element of S with the greatest key. • Maximum(S): returns the element of S with the greatest key. ⇒ process management in a shared system.

Inserting an element in a heap Strategy: 1. Increase the size of the heap by 1 (size_Heap + 1). 2. Insert the new element in this position of the array (the right-most leaf). 3. Compare the key of the new element with the key of its parent. If the key of the new node is greater → the parent does not satisfy the heap property: a) Swap the keys. b) If the new parent of the element still does not satisfy the heap property → repeat the swap until the parent of the new element is greater than or equal to it, or the new element is the root.

Inserting an element in a heap (II) Function that inserts the new element of key x (before calling it, we have to check that the size of the array allows the insertion):
void insert(type_baseT *M, type_baseT x){
  int i;
  size_Heap++;
  i = size_Heap;
  /* Move smaller ancestors one level down until the position of x is found */
  while ( (i > 1) && (M[i/2] < x) ){
    M[i] = M[i/2];
    i = i/2;
  }
  M[i] = x;
}

Inserting an element in a heap: Example Inserting an element of key 15. [Figure: tree and array after each step.]
Initial heap: M = 16 14 10 8 7 9 3 2 4 1, size_Heap = 10
size_Heap++; i=size_Heap; → i = 11
M[i]=M[i/2]; i=i/2; → M[11] = 7, i = 5
M[i]=M[i/2]; i=i/2; → M[5] = 14, i = 2
M[i]=x; → M = 16 15 10 8 14 9 3 2 4 1 7

Temporal cost of inserting an element in a heap Cost: number of comparisons until the position of the new element is found. Heap of n elements: • best case: the element is already in its position when it is inserted as the last leaf: O(1). • worst case: the element is the maximum → it climbs up to the root: O(log n).

Extracting the maximum of a heap → the root Strategy: 1. Get the root value (M[1]). 2. Delete the root by replacing it with the value of the last position of the heap (M[1]=M[size_Heap]). Reduce the size of the heap by 1. 3. Apply heapify over the root in order to restore the heap property.

Extracting the maximum of a heap (II)

type_baseT extract_max(type_baseT *M){
  type_baseT max;
  if (size_Heap == 0){
    fprintf(stderr,"Empty heap");
    exit(-1);
  }
  max = M[1];
  M[1] = M[size_Heap];
  size_Heap--;
  heapify(M,1);
  return(max);
}

Extracting the maximum of a heap: Example [Figure: tree and array after each step.]
Initial heap: M = 16 14 10 8 7 9 3 2 4 1
max=M[1]; → max = 16
M[1]=M[size_Heap]; size_Heap--; → M = 1 14 10 8 7 9 3 2 4
heapify(M,1); → M = 14 8 10 4 7 9 3 2 1
return(max);

Temporal cost of extracting the maximum of a heap Applying heapify over the root: O(log n). Extract_Max ∈ O(log n).

⇒ heap that represents a priority queue: operations ∈ O(log n).

Data structure for disjoint sets: MF-set Equivalence class of a: subset that contains all the elements related to a (through an equivalence relation). Equivalence classes → partition of the set C: every element of the set appears in exactly one equivalence class. To know whether a is equivalent to b ⇒ check whether a and b are in the same equivalence class.

MF-set (II) MF-set (Merge-Find set): structure of n fixed elements. Elements cannot be added or deleted. Elements are organized in equivalence classes. Each subset is identified by a representative: - the smallest element, or - any element, no matter which. If the subset is not modified → same representative. [Figure: set C with the elements 1 to 12 partitioned into subsets.]

MF-set (III) Operations: Merge(x,y): x ∈ Sx and y ∈ Sy → union of Sx with Sy. The representative of the new subset is one of its members; normally the representative of Sx or Sy is selected. The subsets Sx and Sy are removed. Find(x): returns the representative of the equivalence class to which x belongs.

Applications: grammar inference, equivalence of finite state machines, computation of the minimum spanning tree of an undirected graph, etc.

MF-sets representation Each subset → a tree: • Node: element information. • The root is the representative. Tree representation using pointers to the parent: the node that points to itself is the root. MF-set: collection of trees (forest). Each element → a number from 1 to n, plus an array M: position i ≡ index of the parent of i. [Figure: forest with roots 1, 4, 10 and 12.]
i:    1 2 3 4 5 6 7 8  9 10 11 12
M[i]: 1 1 2 4 4 4 9 10 8 10 9  12

Operations on MF-sets: Merge Merge(x,y): make the root of one tree point to the root of the other tree. Assuming that x and y are roots (representatives) ⇒ only the parent pointer of one of the representatives is modified, O(1). [Figure: the tree with root 4 now hangs from root 10.]
i:    1 2 3 4  5 6 7 8  9 10 11 12
M[i]: 1 1 2 10 4 4 9 10 8 10 9  12

Operations on MF-sets: Find Find(x): following the parent pointers, traverse the tree from x up to the root. The visited nodes form the search path. Cost proportional to the depth of the node, O(n) in the worst case. [Figure: the tree with root 10 after the previous merge.]
i:    1 2 3 4  5 6 7 8  9 10 11 12
M[i]: 1 1 2 10 4 4 9 10 8 10 9  12
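The array representation makes both operations very short. The sketch below is an illustration with hypothetical names (`mf_init`, `mf_find`, `mf_merge`) and n = 12 as in the figures; unlike the slide, `mf_merge` accepts any two elements and looks for their representatives first.

```c
#define N 12                       /* number of elements, as in the figures */

/* M[i] = index of the parent of i; a root points to itself. M[0] is unused. */
void mf_init(int M[]) {
    for (int i = 1; i <= N; i++) M[i] = i;
}

/* Find: follow the parent pointers from x up to the root. */
int mf_find(const int M[], int x) {
    while (M[x] != x) x = M[x];
    return x;
}

/* Merge: the root of the tree of x is made to point to the root of the tree of y. */
void mf_merge(int M[], int x, int y) {
    int rx = mf_find(M, x), ry = mf_find(M, y);
    if (rx != ry) M[rx] = ry;
}
```

Starting from the array of the representation slide, `mf_merge(M, 4, 10)` sets M[4] = 10, which is the array shown for the Merge and Find examples.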

Analysis of the temporal cost Initial MF-set: n subsets of one element ⇒ worst sequence of operations: 1. carry out n − 1 Merge operations → a single set of n elements, and 2. carry out m Find operations. Temporal cost: n − 1 Merge operations ∈ O(n) → m Find operations ∈ O(mn). The cost is determined by how the Merge operation is carried out: k Merge operations can produce a tree of height k. ⇒ Heuristic techniques to improve the cost (by reducing the height of the trees).

Merge by height or rank

Strategy: merge so that the root of the shorter tree points to the root of the taller one.

Height of the resulting tree: max(h1, h2) if h1 ≠ h2, and h1 + 1 if h1 = h2.

The height of each tree has to be stored.

The height of a tree of n elements ≤ ⌊log2 n⌋.

Cost of m searching operations: O(m log n).

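[Figure: merge by height of the tree with root 4 (children 5 and 6) and the taller tree with root 10 (nodes 8, 9, 7 and 11): the root 4 is made to point to the root 10.]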

Path compression Strategy: when an element is searched, make every node on the search path point directly to the root. Combining both heuristics, cost of m searching operations: O(m α(m, n)), with α(m, n) ≡ inverse Ackermann function (very slow growing). For any practical input, α(m, n) ≤ 4. In practice, with both heuristics ⇒ m searching operations have an almost linear cost in m.

[Figure: the tree rooted at 10 before and after a search with path compression.]
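Both heuristics can be combined in a few lines of C. The sketch below is an assumption-laden illustration (the names mfset_h, mfh_init, mfh_find, mfh_merge and the capacity MF_MAX are invented here). Once path compression is active, the stored height is only an upper bound of the real height, which is why it is usually called a rank.

```c
#include <assert.h>

#define MF_MAX 100  /* hypothetical capacity; elements are numbered 1..n */

typedef struct {
    int n;
    int M[MF_MAX + 1];       /* M[i] = parent of i; a root satisfies M[i] == i */
    int height[MF_MAX + 1];  /* upper bound of the height of the tree rooted at i */
} mfset_h;

void mfh_init(mfset_h *s, int n) {
    assert(n >= 1 && n <= MF_MAX);
    s->n = n;
    for (int i = 1; i <= n; i++) {
        s->M[i] = i;
        s->height[i] = 0;
    }
}

/* Find with path compression: every node of the search path ends up
   pointing directly to the root. */
int mfh_find(mfset_h *s, int x) {
    assert(x >= 1 && x <= s->n);
    int root = x;
    while (s->M[root] != root)      /* first pass: locate the root */
        root = s->M[root];
    while (s->M[x] != root) {       /* second pass: relink the path */
        int next = s->M[x];
        s->M[x] = root;
        x = next;
    }
    return root;
}

/* Merge by height: the root of the shorter tree points to the root of
   the taller one. x and y must be roots. Returns the new root. */
int mfh_merge(mfset_h *s, int x, int y) {
    assert(s->M[x] == x && s->M[y] == y);
    if (x == y)
        return x;
    if (s->height[x] < s->height[y]) {  /* make x the taller tree */
        int t = x; x = y; y = t;
    }
    s->M[y] = x;
    if (s->height[x] == s->height[y])   /* equal heights: result grows by 1 */
        s->height[x]++;
    return x;
}
```

Keeping the recursion out of mfh_find (two loops instead) avoids deep call stacks on long paths; a recursive version would behave the same.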

104

Tries A kind of search tree. Application: dictionaries. Words with common prefixes share the memory used for those prefixes. Compacting common prefixes ⇒ saves space.

105

Tries (II) Ex: set {pool, prize, preview, prepare, produce, progress}

106

Tries: Searching an element The search begins at the root. The word is processed character by character, from beginning to end. At each node, choose the edge labeled with the current character. Each step consumes one character and descends one level. If the word is finished and a leaf has been reached ⇒ found. If at any time there is no edge with the current character, or the word is finished and we are at an internal node ⇒ word not recognized. Search time is proportional to the length of the word −→ very efficient data structure.

107

Tries: Representation Trie ≡ a kind of finite state machine (FSM). Representation: transition matrix. • Rows = states. • Columns = labels. Each position of the matrix stores the next state of the transition. Very efficient temporal cost. Most of the nodes will have few edges =⇒ a great amount of memory is wasted.
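A transition-matrix trie and its search can be sketched in C as below. Several details are assumptions made for the sketch and are not stated in the slides: words use only the letters 'a'..'z', every word is closed with an end marker '$' so that each stored word ends in a leaf, and the names trie, trie_init, trie_insert, trie_search and the capacity MAX_STATES are invented.

```c
#include <assert.h>
#include <string.h>

#define MAX_STATES 256  /* hypothetical capacity (rows of the matrix) */
#define ALPHABET   27   /* 'a'..'z' plus the assumed end marker '$' */

typedef struct {
    int next[MAX_STATES][ALPHABET];  /* next[state][label], -1 = no edge */
    int states;                      /* rows in use; state 0 is the root */
} trie;

/* Column of the matrix that corresponds to a character. */
static int label(char c) {
    assert(c == '$' || (c >= 'a' && c <= 'z'));
    return c == '$' ? 26 : c - 'a';
}

void trie_init(trie *t) {
    memset(t->next, -1, sizeof t->next);  /* all bytes 0xFF: every int is -1 */
    t->states = 1;
}

/* Follow the edge (state, lab), creating a new state if it is missing. */
static int step_or_create(trie *t, int state, int lab) {
    if (t->next[state][lab] < 0) {
        assert(t->states < MAX_STATES);
        t->next[state][lab] = t->states++;
    }
    return t->next[state][lab];
}

void trie_insert(trie *t, const char *word) {
    int state = 0;
    for (; *word != '\0'; word++)
        state = step_or_create(t, state, label(*word));
    step_or_create(t, state, label('$'));  /* the leaf that closes the word */
}

/* One character per step, one level per step; returns 1 if found. */
int trie_search(const trie *t, const char *word) {
    int state = 0;
    for (; *word != '\0'; word++) {
        state = t->next[state][label(*word)];
        if (state < 0)
            return 0;  /* no edge with the current character */
    }
    return t->next[state][label('$')] >= 0;  /* word ends in a leaf? */
}
```

Most rows of next hold only one or two valid entries, which is the memory waste mentioned above; a list of edges per state trades some lookup time for space.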

108

Balanced trees Data structures to store elements. They allow efficient searches.

Balanced tree: a tree where no leaf is much farther away from the root than any other leaf. There are several balance strategies (different definitions of much farther away), each with different algorithms to update the tree.

109

AVL tree AVL tree: binary search tree that satisfies: 1. The heights of the two subtrees of every node differ by at most 1. 2. Every subtree is an AVL tree. Creators: Adelson-Velskii and Landis. They are not perfectly balanced. Search, insert and delete an element ∈ O(log n).

[Figure: two example trees built from the keys 4, 5, 8, 11, 12, 17 and 18.]

110

2-3 trees 2-3 tree: an empty tree, a single node, or a tree with several nodes that satisfies: 1. Every interior node has 2 or 3 children. 2. Every path from the root to a leaf has the same length. Internal nodes: p1 k1 p2 k2 p3
• p1: pointer to the first child.
• p2: pointer to the second child.
• p3: pointer to the third child (if it exists).
• k1: smallest key of any descendant of the second child.
• k2: smallest key of any descendant of the third child.

Leaves: information of the corresponding key.

111

2-3 trees (II) Example:

112

2-3 trees: Searching The values in the internal nodes guide the search. Begin at the root: k1 and k2 are the two values stored in the root. • If x < k1, carry on searching in the first child. • If x ≥ k1 and the node has only 2 children, carry on searching in the second child. • If x ≥ k1 and the node has 3 children, carry on searching in the second child if x < k2 and in the third child if x ≥ k2. Apply the same strategy to every node of the search path. End: a leaf is reached with • the key x =⇒ element found. • a key different from x =⇒ element not found.
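The search rules above translate almost literally into C. The node layout below (one struct for leaves and interior nodes, with an is_leaf flag and p3 == NULL for nodes with two children) is an assumption chosen for brevity, and node23 and search23 are invented names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node23 {
    int is_leaf;                  /* 1 for a leaf, 0 for an interior node */
    int key;                      /* leaves only: the stored key */
    int k1, k2;                   /* interior only: guide values */
    struct node23 *p1, *p2, *p3;  /* interior only: p3 == NULL if 2 children */
} node23;

/* Returns the leaf that holds x, or NULL if x is not in the tree. */
const node23 *search23(const node23 *t, int x) {
    while (t != NULL && !t->is_leaf) {
        assert(t->p1 != NULL && t->p2 != NULL);  /* at least 2 children */
        if (x < t->k1)
            t = t->p1;                           /* first child */
        else if (t->p3 == NULL || x < t->k2)
            t = t->p2;                           /* second child */
        else
            t = t->p3;                           /* third child */
    }
    return (t != NULL && t->key == x) ? t : NULL;
}
```

Because every path from the root to a leaf has the same length, the loop runs the same number of iterations for any key, whether it is found or not.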

113

B-trees Generalization of 2-3 trees. Applications: • External data storage. • Database index management. • They reduce the number of disk accesses in database queries.

114

B-trees (II) B-tree of order n: n-ary search tree that satisfies: The root is a leaf or it has at least two children. Every node, except the root and the leaves, has between ⌈n/2⌉ and n children. Every path from the root to a leaf has the same length. Every interior node has up to (n − 1) key values and up to n pointers to its children. The elements are stored in the interior nodes and in the leaves. A B-tree can be seen as a hierarchical index. The root would be the first index level.

115

B-tree (III) Internal nodes:

p1 k1 p2 k2 . . . . . . kn−1 pn

• pi is a pointer to the i-th child, 1 ≤ i ≤ n.
• ki are the values of the keys (k1 < k2 < . . . < kn−1), so that:
  ◦ All the keys in the subtree p1 are less than k1.
  ◦ For 2 ≤ i ≤ n − 1, all the keys in the subtree pi are greater than or equal to ki−1 and less than ki.
  ◦ All the keys in the subtree pn are greater than or equal to kn−1.

116

B+ trees B+ tree: B-tree in which the keys stored in the internal nodes do not carry element data (they are only used to guide the search). All the keys of the internal nodes are duplicated in the leaves. Advantage: the leaves are sequentially linked, so it is possible to access the information of the elements without visiting the interior nodes. This improves the efficiency of some searching methods.

117

Data Structures and Algorithms

Graphs

Definitions Graph → model to represent relationships between the elements of a set. Graph: (V, E), V is a set of vertices or nodes, with a relationship between them; E is a set of pairs (u,v), u,v ∈ V, called edges or arcs. Directed graph: the relationship on V is not symmetric. Edge ≡ ordered pair (u,v). Undirected graph: the relationship on V is symmetric. Edge ≡ unordered pair {u,v}, u,v ∈ V and u ≠ v.

[Figure: directed graph with 6 vertices.] Directed graph G(V, E). V = {1,2,3,4,5,6} E = {(1,2),(1,4),(2,5),(3,5),(3,6),(4,2),(5,4),(6,6)}

[Figure: undirected graph with 5 vertices.] Undirected graph G(V, E). V = {1,2,3,4,5} E = {{1,2},{1,5},{2,3},{2,4},{2,5},{3,4},{4,5}}

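Definitions (II) Path from u ∈ V to v ∈ V: sequence v1, v2, . . . , vk such that u = v1, v = vk, and (vi−1, vi) ∈ E, for i = 2, . . . , k. Ex: in the directed graph with E = {(1,2),(1,4),(2,5),(3,5),(3,6),(4,2),(5,4),(6,6)}, <3, 5, 4, 2> is a path from 3 to 2.

[Figure: directed graph with 6 vertices and the path highlighted.]

Length of a path: number of edges of the path. Simple path: path in which all the vertices, except possibly the first and the last, are different.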

2

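Definitions (III) Cycle: simple path v1, v2, . . . , vk such that v1 = vk. Ex: in the directed graph with E = {(1,2),(1,4),(2,5),(3,5),(3,6),(4,2),(5,4),(6,6)}, <2, 5, 4, 2> is a cycle of length 3.

[Figure: directed graph with 6 vertices and a cycle highlighted.]

Loop: cycle of length 1.

Acyclic graph: graph without cycles.

[Figure: an acyclic graph with 6 vertices.]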

3

Definitions (IV) v is adjacent to u if there exists an edge (u,v) ∈ E. In an undirected graph, (u,v) ∈ E relates the nodes u and v symmetrically. In a directed graph, (u,v) ∈ E has u as origin and v as destination. Degree of a vertex: number of edges incident to the vertex. In directed graphs there are the out-degree (edges leaving the vertex) and the in-degree (edges entering it); the degree of the vertex is the sum of in-degree and out-degree. Degree of a graph: maximum degree of its vertices.

4

Definitions (V) G′ = (V′, E′) is a subgraph of G = (V, E) if V′ ⊆ V and E′ ⊆ E. Subgraph induced by V′ ⊆ V: G′ = (V′, E′) such that E′ = {(u,v) ∈ E | u,v ∈ V′}. Examples of subgraphs of the directed graph from the first Definitions slide:

[Figure: subgraph with vertices 1, 2, 4 and 5.] V′ = {1,2,4,5} E′ = {(1,2),(1,4),(2,5),(5,4)}

[Figure: subgraph induced by V′ = {1,2,4,5}; unlike the previous one, it also contains the edge (4,2).]

5

Definitions (VI) v is reachable from u if there exists a path from u to v. An undirected graph is connected if there exists a path between every pair of vertices. A directed graph with this property is called strongly connected.

[Figure: a strongly connected directed graph with 4 vertices.]

If a directed graph is not strongly connected, but the underlying graph (ignoring the direction of the edges) is connected, the graph is weakly connected.

6

Definitions (VII) In an undirected graph, the connected components are the equivalence classes of the equivalence relation “to be reachable from”. An undirected graph is disconnected if it consists of several connected components. In a directed graph, the strongly connected components are the equivalence classes of the equivalence relation “to be mutually reachable”. A directed graph is not strongly connected if it consists of several strongly connected components.

[Figure: example graph with 6 vertices.]

7

Definitions (VIII) Weighted graph: every edge, or vertex, or both, has a weight.

[Figure: weighted directed graph with 6 vertices and edge weights 8, 10, 12, 7, 9, −1, 15 and 9.]

8

Introduction to the theory of graphs problem → representation with graphs → algorithm → computer

The Königsberg bridges An island in the center of the river. Seven bridges link the different areas. Problem: plan a walk that starts and ends at the same point and crosses every bridge once and only once.

The Königsberg bridges (II), (III)

[Figures: map of Königsberg with the river, the island and the seven bridges.]

The Königsberg bridges (IV)

Translating the problem to graphs: islands and river banks are represented by points; bridges become lines that link the points.

[Figure: graph with the vertices A, B, C and D, and one edge per bridge.]

New problem: is it possible to draw the figure starting from a point and coming back to the same point, without passing twice over a line and without lifting the pen?

The Königsberg bridges (V) Euler found the following rules: 1. A connected graph in which every vertex has even degree can be traversed in one pass, from a vertex back to the same vertex. 2. A connected graph with exactly two vertices of odd degree can be traversed in one pass, but without coming back to the starting vertex. 3. A graph with more than two vertices of odd degree cannot be traversed in one pass. The Königsberg bridges:

VERTEX  A  B  C  D
DEGREE  3  3  5  3

⇒ the problem has no solution!
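Euler's three rules only need the number of vertices of odd degree, so they can be checked with a short C function. This is a sketch under the assumption that the graph is connected and undirected; the names euler_kind and euler_classify are invented for the illustration.

```c
#include <assert.h>

typedef enum {
    EULER_CIRCUIT,    /* rule 1: one pass, back to the starting vertex */
    EULER_OPEN_PATH,  /* rule 2: one pass, ending at a different vertex */
    EULER_NONE        /* rule 3: cannot be traversed in one pass */
} euler_kind;

/* Classifies a connected undirected graph from its vertex degrees.
   Connectivity is assumed, not checked. */
euler_kind euler_classify(const int degree[], int n) {
    assert(n >= 1);
    int odd = 0;
    for (int i = 0; i < n; i++)
        if (degree[i] % 2 != 0)
            odd++;
    if (odd == 0)
        return EULER_CIRCUIT;
    if (odd == 2)
        return EULER_OPEN_PATH;
    return EULER_NONE;
}
```

For the Königsberg degrees {3, 3, 5, 3} all four vertices are odd, so the function returns EULER_NONE, in agreement with the table above.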

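Representing graphs: Adjacency lists G = (V, E): array of size |V|. Position i → pointer to a linked list of elements (adjacency list). The elements of the list are the vertices adjacent to i.

[Figure: the undirected and the directed example graphs with their adjacency lists. The lists below follow from the edge sets of those graphs; the order of the vertices within each list is arbitrary.]

Undirected graph:
1 → 2, 5
2 → 1, 3, 4, 5
3 → 2, 4
4 → 2, 3, 5
5 → 1, 2, 4

Directed graph:
1 → 2, 4
2 → 5
3 → 5, 6
4 → 2
5 → 4
6 → 6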

14

Adjacency lists (II) If G is directed, the sum of the lengths of the adjacency lists is |E|. If G is undirected, the sum of the lengths of the adjacency lists is 2|E|. Spatial cost, directed or not: O(|V| + |E|). Suitable representation for sparse graphs, with |E| much smaller than |V|². Disadvantage: checking whether an edge (u,v) belongs to E requires searching v in the adjacency list of u. Cost O(Degree(G)) ⊆ O(|V|).

15

Adjacency lists (III) Representation suitable for weighted graphs. The weight of (u,v) is stored in the node of v in the adjacency list of u.

[Figure: the weighted directed graph and its adjacency lists; each list element is written below as vertex/weight.]

1 → 2/10, 4/8
2 → 5/7
3 → 5/−1, 6/15
4 → 2/12
5 → 4/9
6 → 6/9

16

Adjacency lists (IV) C types definition (weighted graphs):

#define MAXVERT ...

typedef struct vertex {
    int node, weight;
    struct vertex *next;
} vert_adj;

typedef struct {
    int size;
    vert_adj *adj[MAXVERT];
} graph;
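A possible way to use these types is sketched below. The value given to MAXVERT and the functions graph_init, graph_add_edge, graph_find_edge and graph_free are assumptions added for the illustration; vertices are numbered from 0 here (the slides number them from 1).

```c
#include <assert.h>
#include <stdlib.h>

#define MAXVERT 100  /* assumed value; the slide leaves it unspecified */

typedef struct vertex {
    int node, weight;
    struct vertex *next;
} vert_adj;

typedef struct {
    int size;
    vert_adj *adj[MAXVERT];
} graph;

/* Graph with 'size' vertices (0..size-1) and no edges. */
void graph_init(graph *g, int size) {
    assert(size >= 0 && size <= MAXVERT);
    g->size = size;
    for (int i = 0; i < size; i++)
        g->adj[i] = NULL;
}

/* Adds the directed edge (u,v) with weight w at the head of the list
   of u. Returns 0 on success, -1 if memory runs out. */
int graph_add_edge(graph *g, int u, int v, int w) {
    assert(u >= 0 && u < g->size && v >= 0 && v < g->size);
    vert_adj *cell = malloc(sizeof *cell);
    if (cell == NULL)
        return -1;
    cell->node = v;
    cell->weight = w;
    cell->next = g->adj[u];
    g->adj[u] = cell;
    return 0;
}

/* Returns 1 (and stores the weight in *w, if w is not NULL) when the
   edge (u,v) exists. Cost proportional to the length of the list of u. */
int graph_find_edge(const graph *g, int u, int v, int *w) {
    assert(u >= 0 && u < g->size);
    for (const vert_adj *p = g->adj[u]; p != NULL; p = p->next)
        if (p->node == v) {
            if (w != NULL)
                *w = p->weight;
            return 1;
        }
    return 0;
}

/* Releases every list cell. */
void graph_free(graph *g) {
    for (int i = 0; i < g->size; i++) {
        while (g->adj[i] != NULL) {
            vert_adj *next = g->adj[i]->next;
            free(g->adj[i]);
            g->adj[i] = next;
        }
    }
}
```

Inserting at the head keeps graph_add_edge O(1); for an undirected graph the same call would be made twice, once for (u,v) and once for (v,u).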

17

Representing graphs: Adjacency matrix G = (V, E): matrix A of size |V| × |V|. Value aij of the matrix:

$$a_{ij} = \begin{cases} 1 & \text{if } (i,j) \in E \\ 0 & \text{otherwise} \end{cases}$$

[Figure: the undirected and the directed example graphs with their adjacency matrices.]

Undirected graph:
    1  2  3  4  5
1   0  1  0  0  1
2   1  0  1  1  1
3   0  1  0  1  0
4   0  1  1  0  1
5   1  1  0  1  0

Directed graph:
    1  2  3  4  5  6
1   0  1  0  1  0  0
2   0  0  0  0  1  0
3   0  0  0  0  1  1
4   0  1  0  0  0  0
5   0  0  0  1  0  0
6   0  0  0  0  0  1

18

Adjacency matrix (II) Spatial cost: O(|V|²). Representation suitable for graphs with a small number of vertices, or for dense graphs (|E| ≈ |V| × |V|). Checking whether an edge (u,v) belongs to E → look up position A[u][v]. Cost O(1).

19

Adjacency matrix (III) Representing weighted graphs: the weight of (i,j) is stored in A[i, j].

$$a_{ij} = \begin{cases} w(i,j) & \text{if } (i,j) \in E \\ 0 \text{ or } \infty & \text{otherwise} \end{cases}$$

[Figure: the weighted directed graph and its adjacency matrix, using 0 for missing edges.]

    1   2   3   4   5   6
1   0  10   0   8   0   0
2   0   0   0   0   7   0
3   0   0   0   0  −1  15
4   0  12   0   0   0   0
5   0   0   0   9   0   0
6   0   0   0   0   0   9

20

Adjacency matrix (IV) C types definition:

#define MAXVERT ...

typedef struct {
    int size;
    int A[MAXVERT][MAXVERT];
} graph;

21

Traversing graphs: depth-first algorithm → A generalization of the pre-order traversal of a tree. Strategy: Start from an arbitrary vertex v. When a new vertex is visited, explore every path that starts from it. The next path is not started until the current one is finished. The exploration of a path finishes when a vertex that has already been visited is reached. If some vertices are not reachable from v, the traversal is incomplete: select one of them as the new starting vertex, and repeat the process.

22

Depth-first traversing algorithm (II) Recursive strategy: given G = (V, E) 1. Mark all the vertices as unvisited. 2. Choose a vertex u as the starting point. 3. Mark u as visited. 4. ∀v adjacent to u, (u,v) ∈ E, if v has not been visited, repeat (3) and (4) recursively for v. Finish when all the nodes reachable from u have been visited. If not all the nodes are reachable from u: go back to (2), choose a new unvisited vertex v as the starting point, and repeat the process until all the vertices have been traversed.

23

Depth-first traversing algorithm (III) → use a color array (of size |V|) to indicate whether u has been visited (color[u] = YELLOW) or not (color[u] = WHITE):

Algorithm Depth-first(G) {
    for each vertex u ∈ V
        color[u] = WHITE
    end for
    for each vertex u ∈ V
        if (color[u] = WHITE) Visit node(u)
    end for
}

Algorithm Visit node(u) {
    color[u] = YELLOW
    for each vertex v ∈ V adjacent to u
        if (color[v] = WHITE) Visit node(v)
    end for
}
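The pseudocode can be turned into C over adjacency lists roughly as follows. The sketch makes some assumptions of its own: an unweighted variant of the list types, vertices numbered from 0, an assumed value for MAXVERT, and an extra array order in which the visit sequence is recorded (the slides only color the vertices).

```c
#include <assert.h>
#include <stddef.h>

#define MAXVERT 100  /* assumed value */

enum { WHITE, YELLOW };

typedef struct vertex {
    int node;
    struct vertex *next;
} vert_adj;

typedef struct {
    int size;
    vert_adj *adj[MAXVERT];
} graph;

/* Visit node(u): color u and recurse into its unvisited neighbours. */
static void visit_node(const graph *g, int u, int color[],
                       int order[], int *count) {
    color[u] = YELLOW;
    order[(*count)++] = u;  /* record the moment u is visited */
    for (const vert_adj *p = g->adj[u]; p != NULL; p = p->next)
        if (color[p->node] == WHITE)
            visit_node(g, p->node, color, order, count);
}

/* Depth-first(G): stores the visit sequence in order[] (room for
   g->size entries) and returns the number of visited vertices. */
int depth_first(const graph *g, int order[]) {
    int color[MAXVERT];
    int count = 0;
    assert(g->size >= 0 && g->size <= MAXVERT);
    for (int u = 0; u < g->size; u++)
        color[u] = WHITE;
    for (int u = 0; u < g->size; u++)
        if (color[u] == WHITE)  /* restart from every unreached vertex */
            visit_node(g, u, color, order, &count);
    return count;
}
```

The visit sequence depends on the order of the cells in each adjacency list, so two equivalent graphs built in a different order may be traversed differently.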

Depth-first traversing algorithm: Example

[Figure: example graph with vertices 1 to 7 and its adjacency lists.]

Note: the traversal depends on the order in which the vertices appear in the adjacency lists.

25

Depth-first traversing algorithm: Example (II)

[Figure: successive states of the example graph; u marks the vertex being visited.]

Sequence of steps: Depth-First(G) → Visit node(1) → color[1]=YELLOW → Visit node(4) → color[4]=YELLOW → Visit node(6)

26

Depth-first traversing algorithm: Example (III)

[Figure: successive states of the example graph; u marks the vertex being visited.]

Sequence of steps: color[6]=YELLOW → Visit node(7) → color[7]=YELLOW → Visit node(3) → color[3]=YELLOW → Visit node(5)

27

Depth-first traversing algorithm: Example (IV)

[Figure: successive states of the example graph; u marks the vertex being visited.]

Sequence of steps: color[5]=YELLOW → Visit node(2) → color[2]=YELLOW

28

Depth-first traversing algorithm: temporal cost G = (V, E) is represented using adjacency lists. Visit node is applied only to unvisited vertices → only once per vertex. The cost of Visit node(u) depends on the number of vertices adjacent to u (length of its adjacency list). Cost of all the calls to Visit node:

$$\sum_{v \in V} |adj(v)| = \Theta(|E|)$$

Adding the cost of the loops of Depth-First: O(|V|). ⇒ The cost of Depth-First is O(|V| + |E|).

29

Traversing graphs: breadth-first algorithm → Generalization of the level-order traversal of a tree. Strategy: Start from an arbitrary vertex u, visit u and, afterwards, visit every node adjacent to u. Repeat the process for every node adjacent to u, following the order in which they were visited. Cost: O(|V| + |E|).

[Figure: example graph with vertices 1 to 7; u marks the starting vertex.]
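The slides give no pseudocode for the breadth-first traversal; one common way to implement the strategy is with a queue of vertices that have been reached but not yet expanded. The C sketch below is such an implementation under stated assumptions: unweighted adjacency lists, vertices numbered from 0, an assumed MAXVERT, and a single starting vertex (vertices not reachable from it are not visited).

```c
#include <assert.h>
#include <stddef.h>

#define MAXVERT 100  /* assumed value */

typedef struct vertex {
    int node;
    struct vertex *next;
} vert_adj;

typedef struct {
    int size;
    vert_adj *adj[MAXVERT];
} graph;

/* Visits the vertices reachable from 'start' level by level, stores the
   visit sequence in order[] and returns how many vertices were visited. */
int breadth_first(const graph *g, int start, int order[]) {
    int visited[MAXVERT] = {0};
    int queue[MAXVERT];  /* each vertex enters the queue at most once */
    int head = 0, tail = 0, count = 0;

    assert(start >= 0 && start < g->size && g->size <= MAXVERT);
    visited[start] = 1;
    queue[tail++] = start;
    while (head < tail) {
        int u = queue[head++];
        order[count++] = u;
        for (const vert_adj *p = g->adj[u]; p != NULL; p = p->next)
            if (!visited[p->node]) {
                visited[p->node] = 1;  /* mark when enqueued, not when expanded */
                queue[tail++] = p->node;
            }
    }
    return count;
}
```

Every vertex is enqueued once and every adjacency list is scanned once, which matches the O(|V| + |E|) cost stated above.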

Least cost paths G = (V, E) directed and weighted, with w(u,v) the weight of each edge. Weight of a path p = <v0, v1, . . . , vk>: the sum of the weights of its edges:

$$w(p) = \sum_{i=1}^{k} w(v_{i-1}, v_i)$$

Minimum weight path from u to v: path with minimum weight among all the paths from u to v; its weight is ∞ if there is no path from u to v. Length of a path from u to v: weight of the path from u to v. Shortest path from u to v: minimum weight path from u to v.

31

Least cost paths (II) Paths from 1 to 2:

[Figure: weighted directed graph with vertices 1 to 5, and a table that lists each path from 1 to 2 with its length (weight or cost).]

Least cost paths (III) We use a directed weighted graph to represent the connections between cities: • vertex = city. • edge (u,v) = road from u to v; the weight associated to the edge is the distance. Shortest path = the shortest route. Variants of the least cost path problem:
• Shortest paths from a vertex to all the others.
• Shortest paths from all the vertices to a particular one.
• Shortest path from a vertex u to a vertex v.
• Shortest paths between every pair of vertices.

33

Dijkstra algorithm Problem: G = (V, E) directed weighted graph with non-negative weights; given an origin vertex s, get the shortest paths to the rest of the vertices of V. If there are negative weights, the solution could be wrong. Other algorithms allow negative weights, as long as there are no cycles of negative weight. Idea: exploit the property that every subpath of a shortest path between two vertices is itself a shortest path between the vertices it joins.

34

Dijkstra algorithm (II) The Dijkstra algorithm keeps these sets: A set of vertices S that contains the vertices whose shortest distance from the origin is known. Initially S = ∅. A set of vertices Q = V − S that keeps, for every vertex, the shortest distance from the origin passing only through vertices that belong to S (provisional distance). To store the provisional distances an array D[1..|V|] is used, where D[i] is the provisional distance from the origin s to the vertex i. Initially, D[u] = ∞ ∀u ∈ V − {s} and D[s] = 0. Besides: An array P[1..|V|] to recover the shortest paths that were computed. P[i] stores the index of the vertex that precedes the vertex i in the shortest path from s to i.

35

Dijkstra algorithm (III) Strategy: 1. Extract from Q the vertex u whose provisional distance D[u] is the smallest. → This distance is the smallest possible between the origin vertex s and u. The provisional distances correspond to the shortest paths that use only vertices from S. ⇒ It is no longer possible to find a shorter path from s to u using any other vertex of the graph. 2. Insert u, whose shortest path from s has just been determined, in S (S = S ∪ {u}). Update the provisional distances of the vertices of Q adjacent to u that improve by using the new path. 3. Repeat 1 and 2 until Q is empty ⇒ D contains, for every vertex, the shortest distance from the origin.

36

Algorithm Dijkstra(G, w, s) {
    for each vertex v ∈ V do
        D[v] = ∞; P[v] = NULL;
    end for
    D[s] = 0; S = ∅; Q = V;
    while Q ≠ ∅ do
        u = extract min(Q);   /∗ according to D ∗/
        S = S ∪ {u};
        for each vertex v ∈ V adjacent to u do
            if D[v] > D[u] + w(u,v) then
                D[v] = D[u] + w(u,v); P[v] = u;
            end if
        end for
    end while
}
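A direct C version of this pseudocode, over the adjacency matrix type, might look as follows. It is a sketch with several assumptions that are not in the slides: extract min is done by a linear scan (total cost O(|V|²)), the constants NO_EDGE, INF and NIL and the value of MAXVERT are invented, vertices are numbered from 0, and weights are assumed small enough that D[u] + w does not overflow an int.

```c
#include <assert.h>
#include <limits.h>

#define MAXVERT 100      /* assumed value */
#define INF     INT_MAX  /* plays the role of ∞ in D */
#define NO_EDGE (-1)     /* assumed marker: weights are non-negative */
#define NIL     (-1)     /* plays the role of NULL in P */

typedef struct {
    int size;
    int A[MAXVERT][MAXVERT];  /* A[u][v] = w(u,v), or NO_EDGE */
} graph;

/* Fills D[v] with the shortest distance from s to v (INF if v is not
   reachable) and P[v] with the predecessor of v on that path. */
void dijkstra(const graph *g, int s, int D[], int P[]) {
    int in_S[MAXVERT] = {0};  /* in_S[v] != 0: v belongs to S */

    assert(s >= 0 && s < g->size && g->size <= MAXVERT);
    for (int v = 0; v < g->size; v++) {
        D[v] = INF;
        P[v] = NIL;
    }
    D[s] = 0;

    for (int iter = 0; iter < g->size; iter++) {
        /* extract min(Q): vertex outside S with the smallest D */
        int u = NIL;
        for (int v = 0; v < g->size; v++)
            if (!in_S[v] && D[v] != INF && (u == NIL || D[v] < D[u]))
                u = v;
        if (u == NIL)
            break;  /* the remaining vertices are not reachable */
        in_S[u] = 1;

        /* update the provisional distances of the neighbours of u */
        for (int v = 0; v < g->size; v++) {
            int w = g->A[u][v];
            if (w != NO_EDGE && !in_S[v] && D[u] + w < D[v]) {
                D[v] = D[u] + w;
                P[v] = u;
            }
        }
    }
}
```

With adjacency lists and a priority queue for Q the same algorithm runs faster on sparse graphs; the linear scan is used here only because it mirrors the pseudocode most closely.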

Dijkstra algorithm: Example Origin vertex: 1. Dashed edges = provisional paths from the origin to the vertices. Thick edges = shortest paths already computed. The rest of the edges are drawn with thin lines.

38

[Figure: weighted directed graph with vertices 1 to 6 and origin s = 1, redrawn after each iteration. The table below collects the state shown on each slide.]

Step  u  S              Q              D[1] D[2] D[3] D[4] D[5] D[6]   P[1] P[2] P[3] P[4] P[5] P[6]
0     −  {}             {1,2,3,4,5,6}  0    ∞    ∞    ∞    ∞    ∞      NULL NULL NULL NULL NULL NULL
1     1  {1}            {2,3,4,5,6}    0    ∞    40   ∞    10   5      NULL NULL 1    NULL 1    1
2     6  {1,6}          {2,3,4,5}      0    25   40   ∞    10   5      NULL 6    1    NULL 1    1
3     5  {1,6,5}        {2,3,4}        0    25   40   30   10   5      NULL 6    1    5    1    1
4     2  {1,6,5,2}      {3,4}          0    25   40   30   10   5      NULL 6    1    5    1    1
5     4  {1,6,5,2,4}    {3}            0    25   35   30   10   5      NULL 6    4    5    1    1
6     3  {1,6,5,2,4,3}  {}             0    25   35   30   10   5      NULL 6    4    5    1    1