Data Mining: Neural Network Applications

Data Mining: Neural Network Applications by Louise Francis CAS Annual Meeting, Nov 11, 2002 Francis Analytics and Actuarial Data Mining, Inc. [email protected]

Objectives of Presentation

• Introduce insurance professionals to neural networks
• Show that neural networks are a lot like some conventional statistics
• Indicate where use of neural networks might be helpful
• Show practical examples of using neural networks
• Show how to interpret neural network models

A Common Actuarial Application: Loss Development

Cumulative paid losses by accident year and months of development:

Accident   Months of Development
Year        12     24     36     48     60     72     84     96    108    120    132    144    156    168    180
1980       267  1,975  4,587  7,375 10,661 15,232 17,888 18,541 18,937 19,130 19,189 19,209 19,234 19,234 19,246
1981       310  2,809  5,686  9,386 14,884 20,654 22,017 22,529 22,772 22,821 23,042 23,060 23,127 23,127 23,127
1982       370  2,744  7,281 13,287 19,773 23,888 25,174 25,819 26,049 26,180 26,268 26,364 26,371 26,379 26,397
1983       577  3,877  9,612 16,962 23,764 26,712 28,393 29,656 29,839 29,944 29,997 29,999 29,999 30,049 30,049
1984       509  4,518 12,067 21,218 27,194 29,617 30,854 31,240 31,598 31,889 32,002 31,947 31,965 31,986
1985       630  5,763 16,372 24,105 29,091 32,531 33,878 34,185 34,290 34,420 34,479 34,498 34,524
1986     1,078  8,066 17,518 26,091 31,807 33,883 34,820 35,482 35,607 35,937 35,957 35,962
1987     1,646  9,378 18,034 26,652 31,253 33,376 34,287 34,985 35,122 35,161 35,172
1988     1,754 11,256 20,624 27,857 31,360 33,331 34,061 34,227 34,317 34,378
1989     1,997 10,628 21,015 29,014 33,788 36,329 37,446 37,571 37,681
1990     2,164 11,538 21,549 29,167 34,440 36,528 36,950 37,099
1991     1,922 10,939 21,357 28,488 32,982 35,330 36,059
1992     1,962 13,053 27,869 38,560 44,461 45,988
1993     2,329 18,086 38,099 51,953 58,029
1994     3,343 24,806 52,054 66,203
1995     3,847 34,171 59,232
1996     6,090 33,392
1997     5,451

An Example of a Nonlinear Function

[Figure: scatterplot of cumulative paid losses against development age — a clearly nonlinear pattern]

Conventional Statistics: Regression

• One of the most common methods of fitting a function is linear regression
• Models a relationship between two variables by fitting a straight line through the points
• Minimizes the squared deviation between observed and fitted values

Neural Networks

• Also minimize the squared deviation between fitted and actual values
• Can be viewed as a non-parametric, non-linear regression

The Feedforward Neural Network

[Diagram: three-layer neural network — Input Layer (input data) → Hidden Layer (processes data) → Output Layer (predicted value)]

The Activation Function

• The sigmoid (logistic) function:

  f(Y) = 1 / (1 + e^(-Y))

  Y = w0 + w1*X1 + w2*X2 + ... + wn*Xn
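As a minimal sketch of the two formulas above (the function and variable names are illustrative, not from the presentation):

```python
import math

def sigmoid(y):
    """Logistic activation: maps any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))

def node_output(x, weights):
    """Weighted sum Y = w0 + w1*X1 + ... + wn*Xn passed through the sigmoid.
    weights[0] is the intercept w0; x holds the inputs X1..Xn."""
    y = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return sigmoid(y)
```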

The Logistic Function

[Figure: the logistic function plotted for w1 = -10, -5, -1, 1, 5, 10 over X from -1.2 to 0.8 — larger |w1| gives a steeper S-curve; negative w1 reverses its direction]

Simple Example: One Hidden Node

[Diagram: neural network with a single hidden node — Input Layer (input data) → Hidden Layer (processes data) → Output Layer (predicted value)]

Function if Network has One Hidden Node

Hidden node:

  h = f(X; w0, w1) = f(w0 + w1*X) = 1 / (1 + e^(-(w0 + w1*X)))

Output:

  f(f(X; w0, w1); w2, w3) = 1 / (1 + e^(-(w2 + w3 * (1 / (1 + e^(-(w0 + w1*X)))))))
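The nested function above is just two sigmoid evaluations in sequence; a sketch with hypothetical weight names:

```python
import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def one_node_network(x, w0, w1, w2, w3):
    """Feedforward pass for a network with one hidden node:
    hidden value h = f(w0 + w1*x), output = f(w2 + w3*h)."""
    h = sigmoid(w0 + w1 * x)
    return sigmoid(w2 + w3 * h)
```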

Development Example: Incremental Payments Used for Fitting

[Figure: scatterplot of incremental paid losses against development age (years)]

Two Methods for Fitting Development Curve

• Neural networks
  • Simpler model using only development age for prediction
  • More complex model using development age and accident year
• GLM model
  • Example uses Poisson regression
  • Like OLS regression, but does not require normality
  • Fits some nonlinear relationships
  • See England and Verrall, PCAS 2001

The Chain Ladder Model

Cumulative paid:

  D_ij = sum over k = 1 to j of C_ik

Age-to-age factor:

  λ_ij = D_i,j+1 / D_ij

Estimate of the age-to-age factor using the mean:

  λ_j = (sum over i = 1 to n of λ_ij) / n
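A sketch of these calculations on a ragged triangle (the helper name is illustrative; the toy data are the first few cells of the exhibit above):

```python
def age_to_age_factors(triangle):
    """Mean age-to-age factor lambda_j for each development period.
    triangle: one list of cumulative paid per accident year; newer
    accident years have fewer entries, so the triangle is ragged."""
    n_dev = max(len(row) for row in triangle)
    factors = []
    for j in range(n_dev - 1):
        # only accident years observed at both ages j and j+1 contribute
        ratios = [row[j + 1] / row[j] for row in triangle if len(row) > j + 1]
        factors.append(sum(ratios) / len(ratios))
    return factors

# Toy triangle drawn from the top-left corner of the exhibit
tri = [[267, 1975, 4587],
       [310, 2809],
       [370]]
```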

Common Approach: The Deterministic Chain Ladder

Estimate of incremental paid at 24 months:

  C_24 = D_12 * λ_12 − D_12

Estimate of ultimate paid:

  D_iu = D_ij * product over k = j to u of λ_k
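The projection to ultimate is then just a running product of factors; a sketch with hypothetical development factors:

```python
def project_ultimate(latest_cumulative, remaining_factors):
    """Deterministic chain ladder: develop the latest cumulative paid
    to ultimate by multiplying through the remaining age-to-age factors."""
    value = latest_cumulative
    for f in remaining_factors:
        value *= f
    return value

# Hypothetical factors for 12-24, 24-36 and 36-to-ultimate
ldfs = [8.0, 2.3, 1.5]
# an accident year with 1,000 paid at 12 months, developed to ultimate
ultimate = project_ultimate(1000.0, ldfs)
```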

GLM Model: A Stochastic Chain Ladder Model

Poisson model:

  E(C_ij) = m_ij = x_i * y_j

  Var(C_ij) = φ * x_i * y_j

  sum over k = 1 to n of y_k = 1

Data are often normalized by dividing by an exposure base.
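One way to fit this multiplicative Poisson model is to iterate on row and column marginal totals (England and Verrall discuss the GLM formulation; the routine below is a numpy sketch under that approach, with illustrative names):

```python
import numpy as np

def fit_poisson_chain_ladder(tri, n_iter=200):
    """Fit E(C_ij) = x_i * y_j to an incremental triangle by iterating
    on observed row and column marginal totals; unobserved cells are
    np.nan. Returns (x, y) with the y_j scaled to sum to 1."""
    obs = ~np.isnan(tri)
    y = np.full(tri.shape[1], 1.0 / tri.shape[1])
    for _ in range(n_iter):
        # row effects from observed row totals, then column effects
        x = np.nansum(tri, axis=1) / np.where(obs, y[None, :], 0.0).sum(axis=1)
        y = np.nansum(tri, axis=0) / np.where(obs, x[:, None], 0.0).sum(axis=0)
    s = y.sum()  # impose the constraint sum_k y_k = 1 (x absorbs the scale)
    return x * s, y / s
```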

Hidden Nodes for Paid Chain Ladder Example

[Figure, two panels against development age (years): (1) output of hidden nodes 1 and 2; (2) the neural network fitted value for the 2-node model]

NN Chain Ladder Model with 3 Nodes

[Figure: fitted values of the 3-node neural network plotted against development age]

Universal Function Approximator

• The feedforward neural network with one hidden layer is a universal function approximator
• Theoretically, with a sufficient number of nodes in the hidden layer, any continuous nonlinear function can be approximated

Neural Network Curve with Dev Age and Accident Year

[Figure: neural network fitted paid per exposure plotted against development age, with points labeled by accident year 1980–1997 — later accident years sit on higher curves]

GLM Poisson Regression Curve

[Figure: chain ladder GLM fitted values plotted against development age (years), with points labeled by accident year (80–97)]

How Many Hidden Nodes for Neural Network?

• Too few nodes: don't fit the curve very well
• Too many nodes: overparameterization
  • May fit noise as well as pattern

How Do We Determine the Number of Hidden Nodes?

• Use methods that assess goodness of fit
• Hold out part of the sample
• Resampling
  • Bootstrapping
  • Jackknifing
• Algebraic formula
  • Uses gradient and Hessian matrices

Hold Out Part of Sample

• Fit model on 1/2 to 2/3 of the data
• Test fit of model on the remaining data
• Needs a large sample

Cross-Validation

• Hold out 1/n (say 1/10) of the data
• Fit model to the remaining data
• Test on the portion of the sample held out
• Do this n (say 10) times and average the results
• Used for moderate sample sizes
• Jackknifing is similar to cross-validation
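A generic sketch of that n-fold procedure (the `fit` and `score` callables are placeholders for whatever model is being assessed):

```python
import random

def cross_validate(data, fit, score, n_folds=10, seed=0):
    """n-fold cross-validation: hold out 1/n of the data, fit on the
    rest, score on the held-out fold, and average the n scores.
    fit(train) returns a model; score(model, test) returns a fit measure."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [set(idx[i::n_folds]) for i in range(n_folds)]
    scores = []
    for fold in folds:
        test = [data[i] for i in fold]
        train = [data[i] for i in idx if i not in fold]
        scores.append(score(fit(train), test))
    return sum(scores) / len(scores)
```

For example, with the "model" taken to be the training-set mean and the score taken to be mean squared error, the routine returns the cross-validated MSE of that mean.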

Bootstrapping

• Create many samples by drawing, with replacement, from the original data
• Fit the model to each of the samples
• Measure overall goodness of fit and create a distribution of results
• Used for small and moderate sample sizes

Jackknife of 95% CI for 2 and 5 Nodes

[Figure: neural network fitted values with jackknife 95% confidence bounds for the 2-node and 5-node models, plotted against development age]

Another Complexity of Data: Interactions

[Figure, four panels of fitted paid development against development age, by accident-year group (1980–1984, 1984–1989, 1989–1993, 1993–1997) — the shape of the development curve differs across accident-year groups]

Technical Predictors of Stock Price: A Complex Multivariate Example

Stock Prediction: Which Indicator is Best?

• Moving averages
• Measures of volatility
• Seasonal indicators
  • The January effect
• Oscillators

The Data

• S&P 500 Index since 1930
  • Open
  • High
  • Low
  • Close

Moving Averages

• A very commonly used technical indicator
  • 1 week MA of returns
  • 2 week MA of returns
  • 1 month MA of returns
• These are trend-following indicators
• Also used: a more complicated time series smoother based on running medians, called T4253H
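A trailing moving average of returns can be sketched as follows (a simple equally weighted version; the T4253H smoother mentioned above is more involved and is not reproduced here):

```python
def moving_average(returns, window):
    """Trailing moving average of returns: entry i averages the `window`
    most recent values ending at i (shorter at the start of the series)."""
    out = []
    for i in range(len(returns)):
        lo = max(0, i - window + 1)
        out.append(sum(returns[lo:i + 1]) / (i + 1 - lo))
    return out
```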

Volatility Measures

• Finance literature suggests the volatility of the market changes over time
• More turbulent market -> higher volatility
• Measures
  • Standard deviation of returns
  • Range of returns
  • Moving averages of the above
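The first two measures in that list can be sketched as rolling-window calculations (function names are my own):

```python
import statistics

def rolling_volatility(returns, window):
    """Standard deviation of returns over a trailing window."""
    return [statistics.pstdev(returns[i - window + 1:i + 1])
            for i in range(window - 1, len(returns))]

def rolling_range(returns, window):
    """High-minus-low range of returns over a trailing window."""
    return [max(returns[i - window + 1:i + 1]) - min(returns[i - window + 1:i + 1])
            for i in range(window - 1, len(returns))]
```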

Seasonal Effects

[Figure: 95% confidence intervals for the mean 20-day return by calendar month (months 1–12), with roughly 1,430–1,650 observations per month]

Oscillators

• May indicate that the market is overbought or oversold
• May indicate that a trend is nearing completion
• Some oscillators
  • Moving average differences
  • Stochastic

Stochastic and Relative Strength Index

• Stochastic is based on the observation that as prices increase, closing prices tend to be closer to the upper end of the range
  • %K = (C – L5) / (H5 – L5)
  • C is the closing price, L5 is the 5-day low, H5 is the 5-day high
  • %D = 3-day moving average of %K
• RS = (average of x days' up closes) / (average of x days' down closes)
  • RSI = 100 – 100 / (1 + RS)
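A sketch of the %K and RSI formulas above (a common textbook form; parameter defaults of 5 and 14 days are assumptions, not from the presentation):

```python
def stochastic_k(closes, lows, highs, n=5):
    """%K = (C - Ln) / (Hn - Ln): where the latest close sits within
    the range of the last n periods' lows and highs."""
    c = closes[-1]
    ln, hn = min(lows[-n:]), max(highs[-n:])
    return (c - ln) / (hn - ln)

def rsi(closes, n=14):
    """RSI = 100 - 100 / (1 + RS), where RS is the average up close
    divided by the average down close over the last n price changes."""
    start = len(closes) - n
    changes = [closes[i] - closes[i - 1] for i in range(start, len(closes))]
    ups = sum(c for c in changes if c > 0) / n
    downs = -sum(c for c in changes if c < 0) / n
    if downs == 0:
        return 100.0  # no down closes over the window
    return 100.0 - 100.0 / (1.0 + ups / downs)
```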

Measuring Variable Importance

• Look at the weights to the hidden layer
• Compute sensitivities:
  • a measure of how much the predicted value's error increases when the variables are excluded from the model one at a time
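A sketch of the sensitivity idea — the model and error function below are toy placeholders, and neutralizing a variable at its mean is used as a common proxy for excluding it from the fitted model:

```python
def sensitivity(prediction_error, X, y, var):
    """Increase in prediction error when variable `var` is neutralized
    (replaced by its mean) — a proxy for dropping it from the model.
    prediction_error(X, y) returns the model's error on rows X, targets y."""
    base = prediction_error(X, y)
    mean_v = sum(row[var] for row in X) / len(X)
    X_masked = [row[:var] + [mean_v] + row[var + 1:] for row in X]
    return prediction_error(X_masked, y) - base

# Toy "fitted model" y_hat = 2*x0 + 0.1*x1: x0 matters far more than x1
def mse(X, y):
    return sum((2.0 * x[0] + 0.1 * x[1] - t) ** 2 for x, t in zip(X, y)) / len(X)

X = [[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]]
y = [2.1, 4.0, 0.3]
importance = [sensitivity(mse, X, y, v) for v in range(2)]
```

The more important variable shows the larger error increase when neutralized, which is how the ranking on the next slide is produced.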

Neural Network Result: Variable Importance

• Smoothed return
• %K (from stochastic)
• Smoothed %K
• 2 week %D
• 1 week range of returns
• Smoothed standard deviation
• Month
• R² was .13, i.e., 13% of variance explained

Understanding Relationships Between Predictor and Target Variables: A Tree Example

[Tree diagram: first split on smooth.r (the smoothed return)]
