NLP Programming Tutorial 10 – Neural Networks
Graham Neubig
Nara Institute of Science and Technology (NAIST)
Prediction Problems
Given x, predict y
Example we will use:
● Given an introductory sentence from Wikipedia
● Predict whether the article is about a person

Given: Gonso was a Sanron sect priest (754-827) in the late Nara and early Heian periods.
Predict: Yes!

Given: Shichikuzan Chigogataki Fudomyoo is a historical site located at Magura, Maizuru City, Kyoto Prefecture.
Predict: No!

This is binary classification (of course!)
Linear Classifiers

y = sign(w⋅φ(x)) = sign(∑_{i=1}^{I} w_i⋅φ_i(x))

● x: the input
● φ(x): vector of feature functions {φ1(x), φ2(x), …, φI(x)}
● w: the weight vector {w1, w2, …, wI}
● y: the prediction, +1 if “yes”, -1 if “no”
● (sign(v) is +1 if v >= 0, -1 otherwise)
Example Feature Functions: Unigram Features

● Equal to “the number of times a particular word appears”

x = A site , located in Maizuru , Kyoto

φunigram “A”(x) = 1
φunigram “site”(x) = 1
φunigram “,”(x) = 2
φunigram “located”(x) = 1
φunigram “in”(x) = 1
φunigram “Maizuru”(x) = 1
φunigram “Kyoto”(x) = 1
φunigram “the”(x) = 0
φunigram “temple”(x) = 0
…

● The rest are all 0
● For convenience, we use feature names (φunigram “A”) instead of feature indexes (φ1)
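As a concrete sketch, unigram features take only a few lines of Python. The function name create_features matches the training pseudo-code later in this tutorial, while the “UNI:” key prefix is just an illustrative convention:

    # A minimal sketch of unigram feature extraction.
    from collections import defaultdict

    def create_features(x):
        phi = defaultdict(int)
        for word in x.split(" "):
            phi["UNI:" + word] += 1   # count how often each word appears
        return phi

    # e.g. create_features("A site , located in Maizuru , Kyoto")["UNI:,"] == 2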
Calculating the Weighted Sum

x = A site , located in Maizuru , Kyoto

φunigram “A”(x) = 1        *  wunigram “A” = 0        →  0
φunigram “site”(x) = 1     *  wunigram “site” = -3    →  -3
φunigram “located”(x) = 1  *  wunigram “located” = 0  →  0
φunigram “Maizuru”(x) = 1  *  wunigram “Maizuru” = 0  →  0
φunigram “,”(x) = 2        *  wunigram “,” = 0        →  0
φunigram “in”(x) = 1       *  wunigram “in” = 0       →  0
φunigram “Kyoto”(x) = 1    *  wunigram “Kyoto” = 0    →  0
φunigram “priest”(x) = 0   *  wunigram “priest” = 2   →  0
φunigram “black”(x) = 0    *  wunigram “black” = 0    →  0
…

0 + -3 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + … = -3 → No!
The Perceptron

● Think of it as a “machine” that calculates a weighted sum and takes its sign:

y = sign(∑_{i=1}^{I} w_i⋅φ_i(x))

[Figure: the nine unigram features above (φ“A” = 1, φ“site” = 1, φ“located” = 1, φ“Maizuru” = 1, φ“,” = 2, φ“in” = 1, φ“Kyoto” = 1, φ“priest” = 0, φ“black” = 0) feed a single node with weights {0, -3, 0, 0, 0, 0, 0, 2, 0}, which outputs -1]
Problem: Linear Constraint

● A single perceptron cannot achieve high accuracy on non-linear functions

[Figure: four points arranged X O / O X, as in XOR; no straight line separates the Xs from the Os]
Neural Networks

● Neural networks connect multiple perceptrons together
● Motivation: they can express non-linear functions

[Figure: the unigram features feed several hidden perceptrons in parallel, whose outputs in turn feed a final perceptron that outputs -1]
Example:

● Build two classifiers over the four points
  φ(x1) = {-1, 1}   φ(x2) = {1, 1}
  φ(x3) = {-1, -1}  φ(x4) = {1, -1}
  where x1 and x4 are class X, and x2 and x3 are class O:

y1 = sign(1⋅φ1 + 1⋅φ2 - 1)
y2 = sign(-1⋅φ1 - 1⋅φ2 - 1)

[Figure: the four points in the φ1-φ2 plane, with perceptron w1 = {φ1: 1, φ2: 1, bias: -1} and perceptron w2 = {φ1: -1, φ2: -1, bias: -1}]
Example:

● These classifiers map the points to a new space:

φ(x1) = {-1, 1}   →  y(x1) = {-1, -1}
φ(x2) = {1, 1}    →  y(x2) = {1, -1}
φ(x3) = {-1, -1}  →  y(x3) = {-1, 1}
φ(x4) = {1, -1}   →  y(x4) = {-1, -1}

[Figure: in the original φ1-φ2 space the classes form the non-separable X O / O X pattern; in the new y1-y2 space the two Xs (x1 and x4) collapse onto the single point {-1, -1}, while the Os sit at {1, -1} and {-1, 1}]
Example:

● In the new space, the examples are linearly separable!

y3 = sign(1⋅y1 + 1⋅y2 + 1)

[Figure: in the y1-y2 space, a single line separates the X at {-1, -1} from the Os at {-1, 1} and {1, -1}; the separating perceptron w3 has weights {y1: 1, y2: 1, bias: 1}]
Example:

● The final neural network:

[Figure: inputs φ1, φ2 and a bias of 1 feed two hidden perceptrons w1 = {1, 1, -1} and w2 = {-1, -1, -1}; their outputs y1, y2 together with a bias of 1 feed the output perceptron w3 = {1, 1, 1}, which produces y3]
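To make the example concrete, here is a minimal sketch in plain Python that hand-codes this network and checks all four points. The dict-based weight representation and the explicit “bias” feature are illustrative choices, not part of the original slides:

    def sign(v):
        return 1 if v >= 0 else -1

    def perceptron(w, phi):
        return sign(sum(w[name] * value for name, value in phi.items()))

    w1 = {"phi1": 1, "phi2": 1, "bias": -1}
    w2 = {"phi1": -1, "phi2": -1, "bias": -1}
    w3 = {"y1": 1, "y2": 1, "bias": 1}

    for phi1, phi2 in [(-1, 1), (1, 1), (-1, -1), (1, -1)]:
        phi = {"phi1": phi1, "phi2": phi2, "bias": 1}
        y1 = perceptron(w1, phi)
        y2 = perceptron(w2, phi)
        y3 = perceptron(w3, {"y1": y1, "y2": y2, "bias": 1})
        print((phi1, phi2), "->", y3)   # O points (x2, x3) give +1; X points give -1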
Representing a Neural Network

● Assume the network is fully connected and organized in layers
● Each perceptron is represented by:
  ● a layer ID
  ● a weight vector

network = [ (1, w0), (1, w1), (1, w2), (2, w3) ]

[Figure: the unigram features are the input; perceptrons 0, 1, and 2 form layer 1, and perceptron 3 forms layer 2]
Neural Network Prediction Process

● Predict one perceptron at a time, using the outputs of the previous layer as input

[Figure, built up over several slides: given the unigram features, the layer-1 perceptrons 0, 1, and 2 are computed in turn (here outputting -1, 1, and 1), and finally the layer-2 perceptron 3 takes those outputs as its features and produces the final answer -1]
Review: Pseudo-code for Perceptron Prediction

predict_one(w, phi)
    score = 0                          # score = w⋅φ(x)
    for each name, value in phi
        if name exists in w
            score += value * w[name]
    if score >= 0
        return 1
    else
        return -1
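In Python, this pseudo-code translates almost line for line (phi and w as dicts):

    def predict_one(w, phi):
        score = 0
        for name, value in phi.items():
            if name in w:              # unseen features contribute nothing
                score += value * w[name]
        return 1 if score >= 0 else -1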
Pseudo-Code for NN Prediction

predict_nn(network, phi)
    y = [ phi, {}, {}, … ]             # activations for each layer
    for each node i:
        layer, weight = network[i]
        # predict the answer with this perceptron, using the previous layer
        answer = predict_one(weight, y[layer-1])
        # save this answer as a feature for the next layer
        y[layer][i] = answer
    return the answer for the last perceptron
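A runnable Python sketch of the same procedure, assuming network is a list of (layer, weight_dict) pairs as on the previous slide, and that the weight dicts of layer-2 nodes are keyed by node index:

    def predict_nn(network, phi):
        num_layers = max(layer for layer, weight in network)
        y = [phi] + [{} for _ in range(num_layers)]   # activations per layer
        answer = None
        for i, (layer, weight) in enumerate(network):
            # predict with this perceptron, using the previous layer's activations
            answer = predict_one(weight, y[layer - 1])
            y[layer][i] = answer                      # feed into the next layer
        return answer                                 # the last node is the output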
Neural Network Activation Functions

● The NN described so far uses the step function:

  y = sign(w⋅φ(x))

  [Plot: the step function jumps from -1 to +1 at 0]

● The step function is not differentiable → use tanh instead:

  y = tanh(w⋅φ(x))

  [Plot: tanh is a smooth S-shaped curve from -1 to +1]

  Python: from math import tanh; then call tanh(x)
Learning a Perceptron w/ tanh

● First, calculate the error (y' is the correct tag, y the system output):

  δ = y' - y

● Update each weight with:

  w ← w + λ⋅δ⋅φ(x)

● where λ is the learning rate
● (for the step-function perceptron, δ = -2 or +2 and λ = 1/2)
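A minimal sketch of this update for a single tanh perceptron; the learning-rate default is an arbitrary placeholder:

    from math import tanh

    def update_one(w, phi, y_prime, lam=0.1):
        # system output with the tanh activation
        y = tanh(sum(w.get(name, 0) * value for name, value in phi.items()))
        delta = y_prime - y                   # error: correct tag minus output
        for name, value in phi.items():
            w[name] = w.get(name, 0) + lam * delta * value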
Problem: Don't Know Correct Answer!

● For NNs, we only know the correct tag for the last layer

[Figure: for the hidden nodes 0, 1, and 2 the correct output is unknown (y' = ?), while for the final node 3 we know y' = 1 even though the system output is y = -1]
Answer: Back-Propagation

● Pass the error backwards along the network: each node j collects the errors of the next layer, weighted by its connections

  δ_j = ∑_i δ_i⋅w_{j,i}

  [Figure: node j receives errors δ = 0.2, 0.4, and -0.9 through weights w = 0.1, 1, and -0.3]

● Also consider the gradient of tanh:

  d tanh(w⋅φ(x)) = 1 - tanh(w⋅φ(x))² = 1 - y²

  [Plot: the gradient of tanh, a bump that peaks at 1 around 0]

● Combine:

  δ_j = (1 - y_j²) ∑_i δ_i⋅w_{j,i}
Back-Propagation Code

update_nn(network, phi, y')
    create array δ
    calculate y using predict_nn
    for each node j in reverse order:
        if j is the last node:
            δ_j = y' - y_j
        else:
            δ_j = (1 - y_j²) ∑_i δ_i⋅w_{j,i}
    for each node j:
        layer, w = network[j]
        for each name, val in y[layer-1]:
            w[name] += λ * δ_j * val
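A runnable sketch under the same representation as predict_nn above, but using tanh activations throughout so the gradients are defined. The forward_nn helper and the learning-rate default are assumptions of this sketch:

    from math import tanh

    def forward_nn(network, phi):
        # like predict_nn, but stores and returns every node's tanh activation
        num_layers = max(layer for layer, weight in network)
        y = [phi] + [{} for _ in range(num_layers)]
        for i, (layer, w) in enumerate(network):
            score = sum(w.get(name, 0) * val for name, val in y[layer - 1].items())
            y[layer][i] = tanh(score)
        return y

    def update_nn(network, phi, y_prime, lam=0.1):
        y = forward_nn(network, phi)
        delta = [0.0] * len(network)
        for j in reversed(range(len(network))):
            layer_j, _ = network[j]
            y_j = y[layer_j][j]
            if j == len(network) - 1:
                delta[j] = y_prime - y_j          # error at the output node
            else:
                # collect the errors passed back from nodes in the next layer
                back = sum(delta[i] * w_i.get(j, 0)
                           for i, (layer_i, w_i) in enumerate(network)
                           if layer_i == layer_j + 1)
                delta[j] = (1 - y_j ** 2) * back  # times the tanh gradient
        for j, (layer_j, w) in enumerate(network):
            for name, val in y[layer_j - 1].items():
                w[name] = w.get(name, 0) + lam * delta[j] * val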
Training Process

create network
randomize the network weights
for I iterations:
    for each labeled pair x, y in the data:
        phi = create_features(x)
        update_nn(network, phi, y)

● For the single perceptron, we initialized the weights to zero
● In an NN, we initialize the weights randomly (otherwise all perceptrons in a layer would be identical)
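One possible sketch of network creation with random initialization; the hidden-layer size, the weight range, and the function name are all assumptions:

    import random

    def create_network(feature_names, num_hidden=2):
        def random_weights(names):
            return {name: random.uniform(-0.1, 0.1) for name in names}
        # layer 1: hidden perceptrons over the input features
        hidden = [(1, random_weights(feature_names)) for _ in range(num_hidden)]
        # layer 2: one output perceptron, keyed by hidden node index
        output = (2, random_weights(range(num_hidden)))
        return hidden + [output]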
Exercise
Exercise (1)

● Write two programs:
  ● train-nn: creates a neural network model
  ● test-nn: reads a neural network model
● Test train-nn:
  ● Input: test/03-train-input.txt
  ● Use one iteration, one hidden layer, and two hidden nodes
  ● Calculate the updates by hand and make sure they are correct
Exercise (2)

● Train a model on data/titles-en-train.labeled
● Predict the labels of data/titles-en-test.word
● Grade your answers:
  ● script/grade-prediction.py data-en/titles-en-test.labeled your_answer
● Compare:
  ● with single perceptron/SVM classifiers
  ● with different neural network structures
Thank You!