Introduction to Algorithms Massachusetts Institute of Technology Singapore-MIT Alliance Professors Erik Demaine, Lee Wee Sun, and Charles E. Leiserson

September 28, 2001 6.046J/18.410J SMA5503 Handout 12

Problem Set 2 Solutions

MIT students: This problem set is due in lecture on Monday, September 24.

SMA students: This problem set is due after the video-conferencing session on Wednesday, September 26.

Reading: Chapters 6, 7, 5.1-5.3.

Both exercises and problems should be solved, but only the problems should be turned in. Exercises are intended to help you master the course material. Even though you should not turn in the exercise solutions, you are responsible for material covered by the exercises.

Mark the top of each sheet with your name, the course number, the problem number, your recitation instructor and time, the date, and the names of any students with whom you collaborated.

MIT students: Each problem should be done on a separate sheet (or sheets) of three-hole punched paper.

SMA students: Each problem should be done on a separate sheet (or sheets) of two-hole punched paper.

You will often be called upon to “give an algorithm” to solve a certain problem. Your write-up should take the form of a short essay. A topic paragraph should summarize the problem you are solving and what your results are. The body of your essay should provide the following:

1. A description of the algorithm in English and, if helpful, pseudocode.
2. At least one worked example or diagram to show more precisely how your algorithm works.
3. A proof (or indication) of the correctness of the algorithm.
4. An analysis of the running time of the algorithm.

Remember, your goal is to communicate. Graders will be instructed to take off points for convoluted and obtuse descriptions.

Handout 12: Problem Set 2 Solutions

Exercise 2-1. Do Exercise 5.3-1 on page 104 of CLRS.

Solution:

RANDOMIZE-IN-PLACE(A)
   n ← length[A]
   Exchange A[1] ↔ A[RANDOM(1, n)]
   for i ← 2 to n
       do Exchange A[i] ↔ A[RANDOM(i, n)]

For our base case, we have $i$ initialized to 2. Therefore we must show that for each possible 1-permutation, the subarray $A[1..1]$ contains this 1-permutation with probability $(n-1)!/n! = 1/n$. Clearly this is the case, as each element has a chance of $1/n$ of being in the first position.
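The modified procedure can be sketched in Python as follows. This is a minimal 0-based translation, assuming the standard library's `random.randint` (inclusive on both ends) plays the role of RANDOM; the function name and the optional `rng` parameter are illustrative choices, not part of the handout.

```python
import random

def randomize_in_place(a, rng=random):
    """Permute the list a uniformly at random (0-based version of the pseudocode)."""
    n = len(a)
    if n == 0:
        return a
    # First exchange is hoisted out of the loop, so the loop invariant
    # already describes the nonempty prefix a[0:1] before the first iteration.
    k = rng.randint(0, n - 1)
    a[0], a[k] = a[k], a[0]
    for i in range(1, n):
        k = rng.randint(i, n - 1)  # uniform choice from positions i..n-1
        a[i], a[k] = a[k], a[i]
    return a
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is convenient when checking that the output is always a rearrangement of the input.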


Exercise 2-2. Solution:

For our base case, a heap with $n = 1$ node has height $0 = \lfloor \lg 1 \rfloor$.

Now we use induction, and assume that every heap with $m < n$ nodes has a height of $\lfloor \lg m \rfloor$. Next we consider a heap with $n$ nodes. Looking at the $n$th node, we know its depth is one greater than that of its parent (and since we're not in the base case, every node other than the root has a parent). The parent of the $n$th node is the $\lfloor n/2 \rfloor$th node, which is the last node of the heap formed by the first $\lfloor n/2 \rfloor$ nodes. Therefore its depth is $\lfloor \lg \lfloor n/2 \rfloor \rfloor$. Then the $n$th node has a depth of $1 + \lfloor \lg \lfloor n/2 \rfloor \rfloor = \lfloor \lg n \rfloor$. Since the last node lies on the deepest level, by induction we have shown that the height of an $n$-node heap is $\lfloor \lg n \rfloor$.
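As a quick numerical check of the induction, the sketch below walks the parent links $n \to \lfloor n/2 \rfloor$ from the last node up to the root and compares the number of steps with $\lfloor \lg n \rfloor$. The helper names are illustrative, and `int.bit_length` is used as an exact integer way of computing $\lfloor \lg n \rfloor$.

```python
def last_node_depth(n):
    """Depth of node n in a binary heap with 1-based indices, found by following parents."""
    depth = 0
    while n > 1:
        n //= 2      # parent of node n is node floor(n / 2)
        depth += 1
    return depth

def floor_lg(n):
    """Exact floor(log2(n)) for a positive integer n."""
    return n.bit_length() - 1
```

For example, `last_node_depth(10)` and `floor_lg(10)` both evaluate to 3.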



Exercise 2-3. Do Exercise 6.4-3 on page 136 of CLRS.

Solution:

The running time of HEAPSORT on an array of length $n$ that is already sorted in increasing order is $\Theta(n \lg n)$ because, even though it is already sorted, it will be transformed into a heap and then sorted again.

The running time of HEAPSORT on an array of length $n$ that is sorted in decreasing order will also be $\Theta(n \lg n)$. This occurs because, even though the heap will be built in linear time, every time the max element is removed and HEAPIFY is called, it will cover the full height of the tree.

Exercise 2-4. Do Exercise 7.2-2 on page 153 of CLRS.

Solution:

The running time will be $\Theta(n^2)$ because every time PARTITION is called, all of the elements will be put into the subarray of elements no larger than the pivot, leaving the other subarray empty. The recurrence will be $T(n) = T(n-1) + \Theta(n)$, which is clearly $\Theta(n^2)$.
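The quadratic behavior can be observed directly by counting comparisons. The sketch below assumes a partition routine that uses the first element as the pivot and the test `a[j] <= x`; the function names and the one-element list used as a counter are illustrative choices.

```python
def partition(a, p, q, counter):
    """Partition a[p..q] (inclusive, 0-based) around the pivot a[p]; return its final index."""
    x = a[p]
    i = p
    for j in range(p + 1, q + 1):
        counter[0] += 1              # one comparison against the pivot
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]
    return i

def quicksort(a, p, q, counter):
    if p < q:
        r = partition(a, p, q, counter)
        quicksort(a, p, r - 1, counter)
        quicksort(a, r + 1, q, counter)

def comparisons_on_equal_input(n):
    """Number of comparisons made when all n elements have the same value."""
    counter = [0]
    quicksort([7] * n, 0, n - 1, counter)
    return counter[0]
```

With all elements equal, each call leaves one side empty, so the count is $(n-1) + (n-2) + \cdots + 1 = n(n-1)/2$, matching the recurrence $T(n) = T(n-1) + \Theta(n)$.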

Exercise 2-5. Do Problem 7-3 on page 161 of CLRS.

Solution:

(a) This sort is intuitively correct because the largest $1/3$ of the elements will eventually be sorted among their peers. If they are in the first third of the array to begin with, the first sub-sort moves them into the middle third. The second sub-sort, which covers the last two thirds, then sorts them into their proper positions. Similarly, any element which belongs in either of the other thirds will be sorted into position by the three sub-sorts.

(b) The recurrence is $T(n) = 3T(2n/3) + \Theta(1)$, which solves to $T(n) = \Theta(n^{\log_{3/2} 3}) \approx \Theta(n^{2.71})$.

(c) The worst-case running times of the other sorts are INSERTION-SORT $= \Theta(n^2)$, MERGE-SORT $= \Theta(n \lg n)$, HEAPSORT $= \Theta(n \lg n)$, and QUICKSORT $= \Theta(n^2)$. Therefore all other sorts are faster and these professors do not deserve tenure for this work!

Problem 2-1. Average-case performance of quicksort

We have shown that the expected time of randomized quicksort is $O(n \lg n)$, but we have not yet analyzed the average-case performance of ordinary quicksort. We shall prove that, under the assumption that all input permutations are equally likely, not only is the running time of ordinary quicksort $O(n \lg n)$, but it performs essentially the same comparisons and exchanges between input elements as randomized quicksort.

Consider the implementation of PARTITION given in lecture on a subarray $A[p..q]$:

PARTITION(A, p, q)
1  x ← A[p]
2  i ← p
3  for j ← p + 1 to q
4      do if A[j] ≤ x
5            then i ← i + 1
6                 exchange A[i] ↔ A[j]
7  exchange A[p] ↔ A[i]
8  return i

Let $S$ be a set of $n$ distinct elements which are provided in random order (all orders equally likely) as the input array $A[p..q]$ to PARTITION, where $n = q - p + 1$ is the size of the array. Let $x$ denote the initial value of $A[p]$.

(a) Argue that $A[p+1..q]$ is a random permutation of $S - \{x\}$, that is, that all permutations of the input subarray $A[p+1..q]$ are equally likely.

Solution:

Each of the $n!$ orderings of $A[p..q]$ is equally likely, where $n = q - p + 1$. For any fixed value of $x = A[p]$, exactly $n! \cdot (1/n) = (n-1)!$ of these orderings remain, one for each permutation of $S - \{x\}$ in $A[p+1..q]$. Hence all permutations of $A[p+1..q]$ are equally likely.

Define $\delta : S \to \{-1, 0, +1\}$ as follows:

$$\delta(s) = \begin{cases} -1 & \text{if } s < x, \\ 0 & \text{if } s = x, \\ +1 & \text{if } s > x. \end{cases}$$

(b) Consider two input arrays $A_1[p..q]$ and $A_2[p..q]$ consisting of the elements of $S$ such that $\delta(A_1[i]) = \delta(A_2[i])$ for all $i = p, p+1, \ldots, q$. Suppose that we run PARTITION on $A_1[p..q]$ and $A_2[p..q]$ and trace the two executions to record the branches taken, indices calculated, and exchanges performed, but not the actual array values manipulated. Argue briefly that the two execution traces are identical. Argue further that PARTITION performs the same permutation on both inputs.

Solution:

PARTITION takes different branches based only on the comparisons made in the function. This is clear by observing line 4 of the function, as it is the only place where the sequence of instructions may differ. The outcome of each comparison depends only on whether the element at index $j$ is smaller or larger than the pivot. As the two arrays have identical function values at every index, they must take the same branches, calculate the same indices, and perform the same exchanges. Consequently PARTITION will perform the same permutation on both arrays, which follows directly as the exchanges performed are identical.
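The argument can be illustrated by recording a trace. The sketch below is a 0-based Python rendering of PARTITION that logs each comparison outcome and each exchange by index only; the names `partition_trace` and `sign_pattern`, and the sample arrays in the usage note, are made up for illustration.

```python
def partition_trace(a, p, q):
    """Run PARTITION on a[p..q] (inclusive); return (pivot index, trace of branches and exchanges)."""
    trace = []
    x = a[p]
    i = p
    for j in range(p + 1, q + 1):
        took_branch = a[j] <= x
        trace.append(("compare", j, took_branch))   # line 4 of the pseudocode
        if took_branch:
            i += 1
            a[i], a[j] = a[j], a[i]
            trace.append(("exchange", i, j))
    a[p], a[i] = a[i], a[p]
    trace.append(("exchange", p, i))
    return i, trace

def sign_pattern(a):
    """-1, 0, or +1 for each element, relative to the pivot a[0]."""
    x = a[0]
    return [(v > x) - (v < x) for v in a]
```

For instance, `[5, 2, 9, 1, 7]` and `[5, 1, 7, 2, 9]` contain the same elements and have the same sign pattern, so `partition_trace` returns identical traces for both, even though the resulting arrays differ in their values.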

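We say that a permutation $\langle s_1, s_2, \ldots, s_n \rangle$ of $S$ satisfies a pattern $a = \langle a_1, a_2, \ldots, a_n \rangle$ if $\delta(s_i) = a_i$ for all $i = 1, 2, \ldots, n$, where $\delta$ is the function defined above. Define a sequence $a = \langle a_1, a_2, \ldots, a_n \rangle$ to be an $(n, k)$ input pattern if $a_1 = 0$, $a_i \in \{-1, +1\}$ for $i = 2, 3, \ldots, n$, and $|\{i : a_i = -1\}| = k - 1$. Define a sequence $a = \langle a_1, a_2, \ldots, a_n \rangle$ to be an $(n, k)$ output pattern if

$$a_i = \begin{cases} -1 & \text{if } i < k, \\ 0 & \text{if } i = k, \\ +1 & \text{if } i > k. \end{cases}$$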

(c) How many $(n, k)$ input patterns are there? How many $(n, k)$ output patterns are there?

Solution:

There are $\binom{n-1}{k-1}$ input patterns because we can choose $k - 1$ positions out of $n - 1$ possible positions to have $-1$. There is one ($1$) output pattern because the pattern must be exactly $k - 1$ negative ones followed by a $0$, followed by $n - k$ ones.
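The counts can be confirmed by brute-force enumeration for small $n$. The sketch below builds every candidate pattern with `itertools.product` and keeps those with exactly $k - 1$ entries equal to $-1$; the function names are illustrative.

```python
from itertools import product
from math import comb

def input_patterns(n, k):
    """All (n, k) input patterns: a 0 followed by n-1 entries of -1 or +1, with k-1 entries equal to -1."""
    return [(0,) + rest
            for rest in product((-1, 1), repeat=n - 1)
            if rest.count(-1) == k - 1]

def output_patterns(n, k):
    """The single (n, k) output pattern: k-1 negative ones, a 0, then n-k ones."""
    return [(-1,) * (k - 1) + (0,) + (1,) * (n - k)]
```

For example, `len(input_patterns(5, 3))` is 6, which agrees with `comb(4, 2)`.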

(d) How many permutations of $S$ satisfy a particular $(n, k)$ input pattern? How many permutations of $S$ satisfy a particular $(n, k)$ output pattern?

Solution:

$(n-k)!\,(k-1)!$ permutations of $S$ satisfy a particular input pattern. This is the total number of ways to rearrange the elements which have a value of $-1$ amongst themselves, and rearrange those with a value of $+1$ amongst themselves. There are also $(n-k)!\,(k-1)!$ permutations that satisfy a particular output pattern, for the same reason.

Let $a = \langle a_1, a_2, \ldots, a_n \rangle$ be an $(n, k)$ input pattern, and let $a' = \langle a'_1, a'_2, \ldots, a'_n \rangle$ be an $(n, k)$ output pattern. Define $S|_a$ to be the set of permutations of $S$ that satisfy $a$, and likewise define $S|_{a'}$ to be the set of permutations of $S$ that satisfy $a'$.

(e) Argue that PARTITION implements a bijection from $S|_a$ to $S|_{a'}$. (Hint: Use the fact from group theory that composing a fixed permutation with each of the $n!$ possible permutations yields the set of all $n!$ permutations.)

Solution:

All members of $S|_a$ satisfy $a$, and so they all have the same result when the function $\delta$ is applied to their elements. Therefore, by part (b), when all these inputs are given to PARTITION they are subject to the same permutation. Using the hint, we then know that the distinct inputs run through PARTITION produce distinct outputs, so PARTITION is one-to-one on $S|_a$. Every output satisfies $a'$, because PARTITION places the $k - 1$ smaller elements first, then $x$, then the $n - k$ larger elements. From part (d) we know that $S|_a$ and $S|_{a'}$ are the same size, $(n-k)!\,(k-1)!$, and a one-to-one map between finite sets of the same size is also onto. Therefore PARTITION must be a bijection!

(f) Suppose that before the call to PARTITION, the input subarray $A[p+1..q]$ is a random permutation of $S - \{x\}$, where $x = A[p]$. Argue that after PARTITION, the two resulting subarrays are random permutations of their respective elements.

Solution:

Since the input is random, every member of $S|_a$ is equally likely. Using our solution from part (e), we know that after PARTITION is run on every member of $S|_a$, we get every member of the set $S|_{a'}$ exactly once. Therefore we get all $(n-k)!$ permutations of the ones and all $(k-1)!$ permutations of the negative ones. Furthermore, we get each subarray permutation an equal number of times, and so the subarrays are also random permutations.
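Parts (d) through (f) can be checked exhaustively for a small case. The sketch below takes $S = \{1, \ldots, n\}$ with the pivot $x = k$ (so $x$ has rank $k$), runs a 0-based PARTITION on every input whose first element is $x$, and groups the outputs by input pattern. The function names and the choice of $S$ are assumptions made for illustration.

```python
from itertools import permutations
from collections import defaultdict

def partition(a, p, q):
    """Partition a[p..q] (inclusive, 0-based) around the pivot a[p]; return its final index."""
    x = a[p]
    i = p
    for j in range(p + 1, q + 1):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]
    return i

def outputs_by_input_pattern(n, k):
    """Map each (n, k) input pattern to the list of PARTITION outputs of the inputs satisfying it."""
    x = k                                    # pivot of rank k among 1..n
    rest = [v for v in range(1, n + 1) if v != x]
    groups = defaultdict(list)
    for tail in permutations(rest):
        a = [x, *tail]
        pattern = tuple((v > x) - (v < x) for v in a)
        partition(a, 0, n - 1)
        groups[pattern].append(tuple(a))
    return groups
```

For $n = 5$ and $k = 3$ there are 6 groups, and each group contains the same 4 distinct outputs: every arrangement of $\{1, 2\}$, followed by 3, followed by every arrangement of $\{4, 5\}$. Each subarray arrangement therefore appears equally often.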


(g) Use induction to show that, under the assumption that all input permutations are equally likely, at each recursive call of QUICKSORT$(A, p, q)$, every element of $S$ belonging to $A[p..q]$ is equally likely to be the pivot $x = A[p]$.

Solution:

The base case is the initial array: we know that it is randomly permuted, so every element is equally likely to be the pivot $A[p]$. By parts (a) and (f), each of its subarrays will also be randomly permuted after PARTITION. Therefore we can inductively apply parts (a) and (f) at each partition to prove that every subarray passed to a recursive call is also randomly permuted, and hence that each of its elements is equally likely to be the pivot.

(h) Use the analysis of RANDOMIZED-QUICKSORT to conclude that the average-case running time of QUICKSORT on $n$ elements is $O(n \lg n)$.

Solution:

By part (g) we know that, under the assumption that the input permutation is random, every element is equally likely to be chosen as the pivot at each recursive call, which produces the same random distribution of quicksort traces as RANDOMIZED-QUICKSORT. Since the distributions are the same, the expected-case analysis for RANDOMIZED-QUICKSORT applies to the average case of QUICKSORT. Therefore the average case of QUICKSORT also takes $O(n \lg n)$ time.
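For small $n$ the conclusion can be checked exactly. The sketch below averages the comparison count of ordinary quicksort (first element as pivot, written iteratively with an explicit stack) over all $n!$ inputs, and compares it with the expectation obtained when the pivot rank is uniform, $C(n) = n - 1 + \frac{2}{n}\sum_{j<n} C(j)$. The function names are illustrative, and `Fraction` is used so that the comparison is exact rather than floating point.

```python
from fractions import Fraction
from itertools import permutations

def quicksort_comparisons(values):
    """Comparisons made by ordinary quicksort with the first element of each subarray as pivot."""
    a = list(values)
    count = 0
    stack = [(0, len(a) - 1)]
    while stack:
        p, q = stack.pop()
        if p >= q:
            continue
        x, i = a[p], p
        for j in range(p + 1, q + 1):
            count += 1
            if a[j] <= x:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[p], a[i] = a[i], a[p]
        stack.append((p, i - 1))
        stack.append((i + 1, q))
    return count

def average_comparisons(n):
    """Exact average over all n! input permutations."""
    perms = list(permutations(range(n)))
    return Fraction(sum(quicksort_comparisons(p) for p in perms), len(perms))

def expected_with_uniform_pivot(n):
    """C(n) = n - 1 + (2/n) * (C(0) + ... + C(n-1)), the randomized-quicksort expectation."""
    c = [Fraction(0)]
    for m in range(1, n + 1):
        c.append(Fraction(m - 1) + Fraction(2, m) * sum(c[:m]))
    return c[n]
```

For $n = 6$ both functions give $103/10$, that is, 10.3 comparisons on average.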




Problem 2-2. Analysis of $d$-ary heaps

A $d$-ary heap is like a binary heap, but (with one possible exception) nonleaf nodes have $d$ children instead of 2 children.



(a) How would you represent a $d$-ary heap in an array?

Solution:

The $d$-ary heap would be similar to a binary heap, with the parent and child indexes calculated as follows:

   PARENT(i) = ⌊(i − 2)/d⌋ + 1
   CHILD(i, j) = d(i − 1) + j + 2

where $j = 0, \ldots, d - 1$. The root of the tree would be at index $i = 1$.

Alternate Solution: A $d$-ary heap can be represented in a 1-dimensional array as follows. The root is kept in $A[1]$, its $d$ children are kept in order in $A[2]$ through $A[d+1]$, their children are kept in order in $A[d+2]$ through $A[d^2+d+1]$, and so on. The two procedures that map a node with index $i$ to its parent and to its $j$th child (for $1 \le j \le d$), respectively, are:

D-ARY-PARENT(i)
   return ⌈(i − 1)/d⌉

D-ARY-CHILD(i, j)
   return d(i − 1) + j + 1

To convince yourself that these procedures really work, verify that

   D-ARY-PARENT(D-ARY-CHILD(i, j)) = i,

for any $1 \le j \le d$. Notice that the binary heap procedures are a special case of the above procedures when $d = 2$.
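The suggested verification is easy to automate. The sketch below is a direct Python rendering of the two procedures for 1-based indices, using integer floor division, where $\lfloor (i-2)/d \rfloor + 1$ equals $\lceil (i-1)/d \rceil$ for every $i \ge 2$; the function names and the extra parameter `d` are illustrative.

```python
def d_ary_parent(i, d):
    """1-based index of the parent of node i, for i >= 2."""
    return (i - 2) // d + 1

def d_ary_child(i, j, d):
    """1-based index of the j-th child (1 <= j <= d) of node i."""
    return d * (i - 1) + j + 1
```

With $d = 2$ these reduce to the familiar binary-heap formulas: `d_ary_child(i, 1, 2)` is `2 * i`, `d_ary_child(i, 2, 2)` is `2 * i + 1`, and `d_ary_parent(i, 2)` is `i // 2`.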



(b) What is the height of a $d$-ary heap of $n$ elements in terms of $n$ and $d$?

Solution:

CORRECTION: A $d$-ary heap would have a height of $\Theta(\log_d n)$. We know that

$$1 + d + d^2 + \cdots + d^{h-1} < n \le 1 + d + d^2 + \cdots + d^h$$

$$\frac{d^h - 1}{d - 1} < n \le \frac{d^{h+1} - 1}{d - 1}$$

$$d^h < n(d-1) + 1 \le d^{h+1}$$

$$h < \log_d(n(d-1) + 1) \le h + 1$$

which solves to $h = \lceil \log_d(n(d-1) + 1) - 1 \rceil$.

(c) Give an efficient implementation of EXTRACT-MAX in a $d$-ary max-heap. Analyze its running time in terms of $d$ and $n$.

Solution:

HEAPIFY(A, i, n, d)
   j ← i
   for k ← 0 to d − 1
       do if d(i − 1) + k + 2 ≤ n and A[d(i − 1) + k + 2] > A[j]
             then j ← d(i − 1) + k + 2
   if j ≠ i
      then Exchange A[i] ↔ A[j]
           HEAPIFY(A, j, n, d)

Here $d(i-1) + k + 2$, for $k = 0, \ldots, d-1$, ranges over the indices of the children of node $i$ when the root is stored at index 1. The running time of HEAPIFY is $O(d \log_d n)$ because at each depth we are doing $d$ loops, and we recurse to the depth of the tree. In HEAPIFY we compare the $i$th node and each of its children to find the maximum value among these nodes. Then, if the maximum child is greater than the $i$th node, we switch the two nodes and recurse on the child.

EXTRACT-MAX(A, n)
   max ← A[1]
   A[1] ← A[n]
   n ← n − 1
   HEAPIFY(A, 1, n, d)
   return max

The running time of this algorithm is clearly constant work plus the time of HEAPIFY, which as shown above is $O(d \log_d n)$. EXTRACT-MAX works by storing the value of the maximum element, moving the last element of the heap into the root, and then calling HEAPIFY to restore the heap property.
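A runnable sketch of the same idea is given below. It is an assumption-laden translation rather than the handout's exact code: it uses 0-based Python lists (so the children of node `i` are `d*i + 1` through `d*i + d`), replaces the recursion with a loop, and adds a hypothetical `build_heap` helper that is only there to set up the usage example.

```python
def heapify(a, i, n, d):
    """Sift a[i] down within the d-ary max-heap stored in a[0:n] (0-based indices)."""
    while True:
        largest = i
        first = d * i + 1                       # index of the first child of i
        for c in range(first, min(first + d, n)):
            if a[c] > a[largest]:
                largest = c
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest

def build_heap(a, d):
    """Helper for the example: heapify every internal node, last parent first."""
    for i in range((len(a) - 2) // d, -1, -1):
        heapify(a, i, len(a), d)

def extract_max(a, d):
    """Remove and return the maximum of the nonempty d-ary max-heap stored in list a."""
    top = a[0]
    last = a.pop()                              # last element of the heap
    if a:
        a[0] = last
        heapify(a, 0, len(a), d)
    return top
```

Each level of the sift-down inspects at most $d$ children, and there are $O(\log_d n)$ levels, which is where the $O(d \log_d n)$ bound comes from. Calling `extract_max` repeatedly on a heap built from any list returns its elements in decreasing order.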







(d) Give an efficient implementation of INSERT in a $d$-ary max-heap. Analyze its running time in terms of $d$ and $n$.

Solution:

See next problem part for INCREASE-KEY definition.

INSERT(A, k, n, d)
   n ← n + 1
   A[n] ← −∞
   INCREASE-KEY(A, n, k, d)

From the following problem part, we know INCREASE-KEY runs in $O(\log_d n)$ time; therefore, since INSERT only adds constant-time operations, it is also $O(\log_d n)$. It is rather trivially correct, as the algorithm is unchanged from the binary-heap version: all calculations involving the number of children are performed by INCREASE-KEY.

(e) Give an efficient implementation of INCREASE-KEY$(A, i, k)$, which first sets $A[i] \leftarrow \max(A[i], k)$ and then updates the $d$-ary max-heap structure appropriately. Analyze its running time in terms of $d$ and $n$.

Solution:

INCREASE-KEY(A, i, k, d)
   A[i] ← max(A[i], k)
   if A[i] = k
      then while i > 1 and A[⌈(i − 1)/d⌉] < A[i]
              do Exchange A[i] ↔ A[⌈(i − 1)/d⌉]
                 i ← ⌈(i − 1)/d⌉

Here $\lceil (i-1)/d \rceil$ is the index of the parent of node $i$ when the root is stored at index 1. Our implementation loops proportionally to at most the depth of the tree, therefore it runs in $O(\log_d n)$ time. INCREASE-KEY loops, at each step comparing the increased node to its parent and exchanging them if the heap property is violated. Therefore, once the algorithm terminates, we know that the node once again satisfies the heap property and has the correct value.
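The two operations can be sketched together in Python. As an assumption for illustration, the sketch uses 0-based lists, so the parent of node `i` is `(i - 1) // d`, and it lets the list grow with `append` instead of tracking `n` separately; `is_max_heap` is a hypothetical checker added only for the usage example.

```python
import math

def increase_key(a, i, k, d):
    """Set a[i] to max(a[i], k), then restore the d-ary max-heap order by moving it up."""
    a[i] = max(a[i], k)
    while i > 0:
        parent = (i - 1) // d
        if a[parent] >= a[i]:
            break                               # heap property holds, stop early
        a[i], a[parent] = a[parent], a[i]
        i = parent

def insert(a, k, d):
    """Append a minus-infinity sentinel, then raise it to k with increase_key."""
    a.append(-math.inf)
    increase_key(a, len(a) - 1, k, d)

def is_max_heap(a, d):
    """Check that every non-root node is no larger than its parent."""
    return all(a[(i - 1) // d] >= a[i] for i in range(1, len(a)))
```

Each loop iteration does one comparison regardless of $d$, so the work is proportional to the depth of the node, which is $O(\log_d n)$.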



(f) When might it be better to use a $d$-ary heap instead of a binary heap?

Solution:

It would be better to use a $d$-ary heap when it is predicted that the heap will do many more INSERTs and INCREASE-KEYs than EXTRACT-MAXs, because INSERT and INCREASE-KEY become faster as $d$ increases while EXTRACT-MAX gets slower.