A06494

Calculators may be used in this examination provided they are not capable of being used to store alphabetical information other than hexadecimal numbers.

School of Computer Science

Second Year - BSc Artificial Intelligence and Computer Science
Second Year - BSc Computer Science
Second Year - BEng/MEng Computer Science/Software Engineering
Third Year - BSc Accounting and Finance with Year in Computer Science

06 02640

Machine Learning Summer Examinations 2012

Time allowed: 1 hr 30 min

[Answer FOUR out of FIVE questions]


1. Decision Tree Learning

The following table contains Student Exam Performance Data. We want to predict whether a student will get a first this year.

Student ID | First last year? | Male? | Works hard? | Drinks? | First this year?
-----------|------------------|-------|-------------|---------|------------------
1          | yes              | yes   | no          | yes     | yes
2          | yes              | yes   | yes         | no      | yes
3          | no               | no    | yes         | no      | yes
4          | no               | yes   | no          | yes     | no
5          | yes              | no    | yes         | yes     | yes
6          | no               | yes   | yes         | yes     | no

(a) Construct a minimal decision tree, using the ID3 algorithm, that predicts whether or not a student will get a first this year. Show all your working. [15%]

(b) Translate your decision tree into a collection of decision rules. [5%]

(c) Explain how your tree classifies the following new instances. [5%]

Student ID | First last year? | Male? | Works hard? | Drinks? | First this year?
-----------|------------------|-------|-------------|---------|------------------
7          | no               | yes   | no          | yes     | ??
8          | no               | no    | yes         | yes     | ??
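A minimal Python sketch of the information-gain computation that drives the ID3 construction in part (a); the rows and attribute names follow the table above:

```python
from math import log2
from collections import Counter

# Rows from the table: (first_last_year, male, works_hard, drinks, first_this_year)
data = [
    ("yes", "yes", "no",  "yes", "yes"),
    ("yes", "yes", "yes", "no",  "yes"),
    ("no",  "no",  "yes", "no",  "yes"),
    ("no",  "yes", "no",  "yes", "no"),
    ("yes", "no",  "yes", "yes", "yes"),
    ("no",  "yes", "yes", "yes", "no"),
]
attributes = ["First last year?", "Male?", "Works hard?", "Drinks?"]

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, i):
    """Entropy reduction from splitting the rows on attribute i."""
    base = entropy([r[-1] for r in rows])
    remainder = 0.0
    for value in {r[i] for r in rows}:
        subset = [r[-1] for r in rows if r[i] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

for i, name in enumerate(attributes):
    print(f"Gain({name}) = {information_gain(data, i):.3f}")
```

ID3 splits on the attribute with the highest gain and recurses on each branch until the leaves are pure or no attributes remain.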


2. Bayesian Learning and Probabilistic Sequence Models

(a) Consider the following training instances to train a robot janitor to predict whether or not an office contains a recycling bin.

   STATUS  | FLOOR | DEPT. | OFFICE SIZE | RECYCLING BIN?
1. faculty | four  | cs    | medium      | yes
2. student | four  | ee    | large       | yes
3. staff   | five  | cs    | medium      | no
4. student | three | ee    | small       | yes
5. staff   | four  | cs    | medium      | no

How would a naive Bayesian classifier classify the following instance?

STATUS  | FLOOR | DEPT. | OFFICE SIZE | RECYCLING BIN?
student | four  | cs    | small       | ??

Show all your working. [15%]
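A minimal sketch of the naive Bayes scoring for part (a); the values follow the table above, and Laplace smoothing is deliberately omitted so the numbers match a by-hand calculation:

```python
# Training rows from the table: (status, floor, dept, size, recycling_bin)
train = [
    ("faculty", "four",  "cs", "medium", "yes"),
    ("student", "four",  "ee", "large",  "yes"),
    ("staff",   "five",  "cs", "medium", "no"),
    ("student", "three", "ee", "small",  "yes"),
    ("staff",   "four",  "cs", "medium", "no"),
]
query = ("student", "four", "cs", "small")

def score(cls):
    """Unnormalised naive Bayes score: P(cls) * prod_i P(value_i | cls)."""
    rows = [r for r in train if r[-1] == cls]
    prior = len(rows) / len(train)
    likelihood = 1.0
    for i, value in enumerate(query):
        likelihood *= sum(1 for r in rows if r[i] == value) / len(rows)
    return prior * likelihood

for cls in ("yes", "no"):
    print(f"score({cls}) = {score(cls):.4f}")
# The class with the larger score is the naive Bayes prediction.
```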

(b) Consider the following sequence over the alphabet {x, y}:

S = [x x x x x y y y y y x x x x]

and consider the two models:

M1: a random (i.i.d.) sequence model with parameters P(x) = 0.4, P(y) = 0.6.

M2: a first-order Markov model with initial probabilities of 0.5 for each symbol, and transition probabilities P(x|x) = 0.6, P(y|x) = 0.4, P(x|y) = 0.1, and P(y|y) = 0.9.

Which model is more likely to be the generator of the sequence S, and why? Give both an intuitive argument and then rigorous computations to justify your answer. [10%]
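A minimal sketch comparing the two models of part (b) by log-likelihood; the model assigning the higher log-likelihood to S is the more plausible generator:

```python
from math import log

S = "xxxxxyyyyyxxxx"  # the sequence from part (b)

# M1: i.i.d. model
p = {"x": 0.4, "y": 0.6}
loglik_m1 = sum(log(p[s]) for s in S)

# M2: first-order Markov model; trans[(prev, nxt)] = P(nxt | prev)
init = {"x": 0.5, "y": 0.5}
trans = {("x", "x"): 0.6, ("x", "y"): 0.4,
         ("y", "x"): 0.1, ("y", "y"): 0.9}
loglik_m2 = log(init[S[0]]) + sum(log(trans[pair]) for pair in zip(S, S[1:]))

print(f"log P(S | M1) = {loglik_m1:.3f}")
print(f"log P(S | M2) = {loglik_m2:.3f}")  # the higher value wins
```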


3. Reinforcement Learning

Consider the deterministic reinforcement learning environment drawn below. The nodes are states, the arcs are actions, and the numbers on the arcs are the immediate rewards. Let the discount rate equal 0.8. The L, R, or C at the beginning of each arc label is the name of the action that arc represents.

[Figure: state-transition diagram (states include start, a, b, d, and end); not reproduced here.]

(a) Start with a Q-table that initially contains all Q-values equal to 3 (an arbitrary choice). Use Q-learning to update these values after each of the following three episodes, showing all of your working:

Episode 1: start → a → b → d → end
Episode 2: start → a → b → end
Episode 3: start → a → d → end

[20%]
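A minimal sketch of the deterministic Q-learning update used in part (a). Since the transition diagram is not reproduced here, the action names and reward numbers below are placeholders, not the exam's values; the discount rate of 0.8 and the initial Q-value of 3 are as given:

```python
GAMMA = 0.8  # discount rate, as given in the question

# transitions[(state, action)] -> (next_state, reward).
# PLACEHOLDER values: the real actions and rewards come from the exam's
# diagram, which is not reproduced here.
transitions = {
    ("start", "R"): ("a", 2),
    ("a", "L"): ("b", 1),
    ("a", "R"): ("d", 4),
    ("b", "L"): ("d", 3),
    ("b", "R"): ("end", 5),
    ("d", "C"): ("end", 6),
}
# Q-table initialised to 3 everywhere, as the question specifies.
Q = {sa: 3.0 for sa in transitions}

def q_update(state, action):
    """Deterministic Q-learning: Q(s,a) <- r + gamma * max_a' Q(s',a')."""
    next_state, reward = transitions[(state, action)]
    future = [Q[sa] for sa in Q if sa[0] == next_state]
    Q[(state, action)] = reward + GAMMA * (max(future) if future else 0.0)

# Episode 1: start -> a -> b -> d -> end (placeholder action names)
for state, action in [("start", "R"), ("a", "L"), ("b", "L"), ("d", "C")]:
    q_update(state, action)
print(Q)
```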

(b) What is the optimal policy estimate at the end of the last episode in (a)? [5%]


4. Instance-Based Learning

(a) Consider the following data set with two real-valued inputs x (i.e. the coordinates of the points) and one binary output y (taking values + or -). We want to use k-nearest neighbours (k-NN) with Euclidean distance to predict y from x.

[Figure: scatter plot of the labelled data points; not reproduced here.]

Calculate the leave-one-out cross-validation error of 1-NN on this data set. That is, for each point in turn, try to predict its label y using the rest of the points, and count up the number of misclassification errors. [10%]

(b) Calculate the leave-one-out cross-validation error of 3-NN on the same data set. [5%]
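A minimal sketch of the leave-one-out procedure for parts (a) and (b). The coordinates and labels below are invented placeholders, since the data set figure is not reproduced here:

```python
from math import dist  # Euclidean distance (Python 3.8+)

# PLACEHOLDER data: ((x1, x2), y); the exam's actual points are in the figure.
points = [((0, 0), "+"), ((1, 0), "+"), ((0, 1), "-"),
          ((2, 2), "-"), ((3, 2), "-"), ((1, 2), "+")]

def knn_predict(train, query, k):
    """Majority label among the k nearest training points."""
    nearest = sorted(train, key=lambda p: dist(p[0], query))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

def loocv_errors(data, k):
    """Predict each point from all the others; count misclassifications."""
    return sum(
        knn_predict(data[:i] + data[i + 1:], xy, k) != label
        for i, (xy, label) in enumerate(data)
    )

print("1-NN LOOCV errors:", loocv_errors(points, 1))
print("3-NN LOOCV errors:", loocv_errors(points, 3))
```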

(c) Describe how you would choose the number of neighbours k in k-NN in general. [5%]

(d) Which of the situations below risks resulting in overfitting, and why?
(i) increasing the number of neighbours k
(ii) decreasing the number of neighbours k
[5%]

5. Short Questions

(a) Argue whether the following statement is true or false: a classifier that attains 100% accuracy on the training set and 70% accuracy on the test set is better than a classifier that attains 70% accuracy on the training set and 75% accuracy on the test set. [5%]

(b) Which of the classifiers that you know of would perform well when the data classes are not linearly separable? List all the classifiers that you would consider for such a case. [5%]

(c) Point out two advantages and two disadvantages of k-NN learning. [5%]

(d) In decision tree learning, can perfect purity always be achieved at the leaf nodes once all attributes have been used? Why or why not? [5%]

(e) Describe a conceptual similarity between Independent Component Analysis and Latent Semantic Analysis. [5%]

End of Paper