Extreme Value Analysis

Author: Elinor Fisher

Nov 8, 2012

Extremes


Motivation

- Quantification of extremes is important for environmental disaster planning: floods, wind, mudslides, fire, tornadoes, temperatures, drought, etc.
- Financial applications: insurance, risk analysis, stock gains/losses, etc.
- Human health: heat waves, ozone/pollutant levels, contamination levels, etc.


Extreme Value Distribution

[Figure: *simulation from a χ²₃ distribution; panels labeled χ²₃ and GEV(10.8, 0.8, 1)]


Why do we need EVA?

What if the distributional model is correctly specified?

But what if the data are normally distributed?


Motivating Example

[Figure: Daily Dow Jones, 1996-2000; dowjones[, 2] (8000 to 12000) against Index (0 to 1200)]

Motivating Example

The series is log-transformed for normality and differenced to remove temporal correlation, giving djd:

[Figure: diff(log(dowjones[, 2])) against Index (0 to 1200); kernel density of djd, density.default(x = djd), N = 1303, Bandwidth = 0.001849, against Log Difference]

Motivating Example

But what if the data are normally distributed?

[Figure: normal QQ plot of djd, Sample Quantiles against Standard Normal Quantiles]

djd ∼ N(µ, σ) is a reasonable assumption for more than 99% of the data.


Motivating Example

djd ∼ N(µ, σ), estimated by x̄ = 0.0006697099 and σ̂ = 0.01068724:

- P(drop of 249 points or more) = P(djd ≤ −0.02843519) ≈ 0.0032
- P(drop of 617 points or more) = P(djd ≤ −0.07454905) ≈ 1 × 10⁻¹²

And yet both of these were observed: −249.43 is the 1st percentile of the daily differences, and −617.78 is the minimum difference (the maximum drop). Extreme value analysis yields much more realistic (block) quantiles (return levels) and probabilities for extreme events.
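The normal-model calculation above can be reproduced with a short base-R sketch. The estimates and log-difference cutoffs are the values quoted on the slide. The variable names are illustrative.

```r
# Normal-model tail probabilities for the Dow Jones log differences (djd).
# Estimates and cutoffs are the values quoted on the slide.
mu_hat    <- 0.0006697099
sigma_hat <- 0.01068724

# P(djd <= cutoff) under N(mu_hat, sigma_hat); pnorm() takes the standard deviation.
p_249 <- pnorm(-0.02843519, mean = mu_hat, sd = sigma_hat)
p_617 <- pnorm(-0.07454905, mean = mu_hat, sd = sigma_hat)

cat(sprintf("P(drop >= 249 points) ~ %.4g\n", p_249))
cat(sprintf("P(drop >= 617 points) ~ %.3g\n", p_617))

# Expected number of such days among N = 1303 daily differences.
cat(sprintf("expected 249-point drops in 1303 days ~ %.2f\n", 1303 * p_249))
```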


Interpretable Quantity

A common measure of extreme events is the n-year return level.¹ The n-year return level, y_n, is the level so extreme that it is expected to be exceeded on average once every n time units. Time units are usually natural groupings of the measured quantity: years, months, days, hours. Note: the n-year (block) return level corresponds to the (1 − 1/n) quantile of the predictive distribution.

¹ Cooley, D., Nychka, D., Naveau, P., “Bayesian Spatial Modeling of Extreme Precipitation Return Levels.”
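Here is a minimal sketch of the definition. It assumes a vector of block maxima is already available; the empirical n-block return level is then the (1 − 1/n) sample quantile. The function name and the toy data are hypothetical. An empirical estimate like this cannot extrapolate beyond the largest observed block maximum, which is one reason to fit a GEV instead.

```r
# Empirical n-block return level: the (1 - 1/n) sample quantile of block maxima.
empirical_return_level <- function(block_maxima, n) {
  # type = 7 (R's default) interpolates linearly between order statistics
  unname(quantile(block_maxima, probs = 1 - 1 / n, type = 7))
}

# Hypothetical block maxima, for illustration only.
maxima <- as.numeric(1:100)
empirical_return_level(maxima, 10)
```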

Extreme Value Distribution

How do we decide what is extreme?

[Figure: two panels of Temp (10 to 30) against Months (0 to 200)]

Theory

Given a sequence of iid random variables X_1, ..., X_n with common distribution function F, consider the maximum M_n = max{X_1, ..., X_n}. By independence, the distribution of M_n can be derived exactly: P{M_n ≤ y} = {F(y)}^n (Coles, 2001).

Extremal Types Theorem: if there exist sequences of normalizing constants {a_n > 0} and {b_n} such that P{(M_n − b_n)/a_n ≤ y} → G(y), where G is a non-degenerate distribution function, then G belongs to the class of Generalized Extreme Value distributions.

Note: in practice, a_n and b_n are not estimated; they are absorbed into the location and scale parameters of the GEV.
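A standard textbook case illustrates both P{M_n ≤ y} = {F(y)}^n and the Extremal Types Theorem. For iid Exp(1) variables, the normalizing constants a_n = 1 and b_n = log n give a Gumbel limit, since F(y + log n)^n = (1 − e^{−y}/n)^n → exp(−e^{−y}). The base-R sketch below measures how close the exact distribution of the normalized maximum is to that limit as n grows. Its function names are illustrative.

```r
# Exact distribution of the normalized maximum of n iid Exp(1) variables,
# using P{M_n <= z} = F(z)^n with a_n = 1 and b_n = log(n).
normalized_max_cdf <- function(y, n) {
  z <- y + log(n)              # undo the normalization: z = a_n * y + b_n
  pexp(z, rate = 1)^n          # pexp() returns 0 for z < 0
}

# Limiting Gumbel distribution G(y) = exp(-exp(-y)).
gumbel_cdf <- function(y) exp(-exp(-y))

y <- seq(-2, 5, by = 0.1)
for (n in c(10, 100, 10000)) {
  gap <- max(abs(normalized_max_cdf(y, n) - gumbel_cdf(y)))
  cat(sprintf("n = %6d: max |F(a_n y + b_n)^n - G(y)| = %.2e\n", n, gap))
}
```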


Theory

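The Generalized Extreme Value (GEV) distribution is defined by

$$G(y) = \exp\left\{-\left[1 + \xi\left(\frac{y - \mu}{\psi}\right)\right]_+^{-1/\xi}\right\},$$

with location µ, scale ψ > 0, and shape ξ, where [z]_+ = max(z, 0).

Parameter interpretation: µ locates the distribution, ψ controls its spread, and ξ controls its tail behavior.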
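The GEV distribution function can be transcribed directly into base R. This sketch uses the slides' (µ, ψ, ξ) parameterization and treats ξ = 0 as the Gumbel limit. It is for illustration only; the evd and extRemes packages cited in the references provide their own implementations.

```r
# GEV distribution function G(y) with location mu, scale psi > 0 and shape xi,
# written to match the slide's parameterization (mu, psi, xi).
gev_cdf <- function(y, mu, psi, xi) {
  stopifnot(psi > 0)
  s <- (y - mu) / psi
  if (xi == 0) return(exp(-exp(-s)))    # Gumbel limit xi -> 0
  z <- pmax(1 + xi * s, 0)              # the [.]_+ operation
  exp(-z^(-1 / xi))                     # 0^(-1/xi) is Inf (xi > 0) or 0 (xi < 0)
}

gev_cdf(c(-10, 0, 10), mu = 0, psi = 1, xi = 0.5)
gev_cdf(c(-10, 0, 10), mu = 0, psi = 1, xi = -0.5)
```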


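Theory

GEV:

$$\Pr\{Y \le y\} = \exp\left\{-\left[1 + \xi\left(\frac{y - \mu}{\psi}\right)\right]_+^{-1/\xi}\right\}$$

Three types:
- Gumbel (ξ = 0, taken as the limit ξ → 0)
- Fréchet (ξ > 0: heavy upper tail)
- Weibull (ξ < 0: bounded upper tail)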


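Theory

Parameters: location µ, scale ψ, and shape ξ.

Main advantage: direct parameterization for the n-year return level; y_n is obtained directly from the estimated GEV parameters. Setting

$$\frac{1}{n} = \left[1 + \xi\left(\frac{y_n - \mu}{\psi}\right)\right]^{-1/\xi}$$

leads to

$$y_n = \begin{cases} \mu + \psi\,\dfrac{n^{\xi} - 1}{\xi} & \text{if } \xi \neq 0, \\[1ex] \mu + \psi \log n & \text{if } \xi = 0. \end{cases}$$

(This uses −log(1 − 1/n) ≈ 1/n, which is accurate for large n; the exact (1 − 1/n) quantile of G differs slightly.)

Main disadvantage: loss of information, since only one value per block is kept, which leads to biased return levels.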
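The return-level formula can be checked numerically. The sketch below computes y_n from the relation above. For comparison, it also computes the exact (1 − 1/n) GEV quantile, obtained by solving G(y) = 1 − 1/n; the two agree increasingly well as n grows. The parameter values are hypothetical and the function names are illustrative.

```r
# n-block return level from GEV parameters, solving the slide's relation
#   1/n = [1 + xi (y_n - mu)/psi]^(-1/xi)
gev_return_level <- function(mu, psi, xi, n) {
  if (xi == 0) return(mu + psi * log(n))
  mu + psi * (n^xi - 1) / xi
}

# Exact (1 - 1/n) quantile of the GEV, solving G(y) = p, for comparison.
gev_quantile <- function(mu, psi, xi, p) {
  w <- -log(p)
  if (xi == 0) return(mu - psi * log(w))
  mu + psi * (w^(-xi) - 1) / xi
}

# Hypothetical parameters (mu = 0, psi = 1, xi = 0.1), for illustration only.
n <- c(10, 100, 1000)
cbind(n,
      approx = sapply(n, gev_return_level, mu = 0, psi = 1, xi = 0.1),
      exact  = sapply(n, function(k) gev_quantile(0, 1, 0.1, 1 - 1 / k)))
```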


Motivating Example

Dow Jones Example:

[Figure: diff(log(dowjones[, 2])) against Index; kernel density of djd (N = 1303, Bandwidth = 0.001849) against Log Difference; QQ Norm Plot of Sample Quantiles against Standard Normal Quantiles]



GEV Analysis of Dow Jones

Monthly blocks: ∼22 days/month, 61 months (blocks). Minimums (the largest daily drop in each month).

[Figure: GEV diagnostics for the monthly minimums: Probability Plot and Quantile Plot (Empirical vs Model), Return Level Plot (Return Level against Return Period), Density Plot (f(z) against z)]
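Before a GEV fit, the daily series has to be reduced to one value per block. This minimal sketch assumes fixed blocks of equal length rather than the calendar months used on the slide. With that simplification, 1303 values in 22-day blocks give 59 complete blocks, not 61. The minima are negated because GEV fitting routines such as ismev's gev.fit model maxima. The helper name and toy series are hypothetical.

```r
# Split a series into consecutive blocks of fixed length and take each block's minimum.
# A trailing partial block is dropped. Fixed-length blocks simplify calendar months.
block_minima <- function(x, block_size) {
  n_blocks <- length(x) %/% block_size
  block_id <- rep(seq_len(n_blocks), each = block_size)
  as.vector(tapply(x[seq_along(block_id)], block_id, min))
}

# GEV fitting routines work with maxima, so minima are negated: min(x) = -max(-x).
x <- c(3, 1, 2, 5, 4, 6, 0)   # hypothetical toy series
mins <- block_minima(x, 3)
mins
-mins
```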

library(ismev); library(extRemes)
djd.gevMIN

GPD:

$$\Pr\{X - u \le x \mid X > u\} = 1 - \left[1 + \xi\,\frac{x}{\sigma_u}\right]_+^{-1/\xi}, \quad x > 0$$

Advantage of GPD:
- Richer source of extreme data: every exceedance of u is used, not just one value per block

Disadvantages of GPD:
- Parameters are tied to the threshold value u, e.g., the scale parameter σ_GPD = ψ_GEV + ξ(u − µ)
- Extremes tend to occur in clusters, which violates the independence assumption
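The threshold dependence of the GPD scale can be seen numerically. Suppose excesses over u follow a GPD with scale σ_u and shape ξ. Then excesses over a higher threshold v follow a GPD with the same ξ and scale σ_u + ξ(v − u), the same linear form as σ_GPD = ψ_GEV + ξ(u − µ). The parameter values below are hypothetical.

```r
# GPD survival function for an excess x >= 0 over a threshold: P{X - u > x | X > u}.
gpd_survival <- function(x, sigma, xi) {
  if (xi == 0) return(exp(-x / sigma))
  pmax(1 + xi * x / sigma, 0)^(-1 / xi)
}

# Hypothetical parameters at a lower threshold u and a higher threshold v.
u <- 0.015; v <- 0.022
sigma_u <- 0.006; xi <- 0.2

# Conditional survival of excesses over v, computed from the GPD at u ...
y <- c(0.001, 0.01, 0.03)
from_u <- gpd_survival(v - u + y, sigma_u, xi) / gpd_survival(v - u, sigma_u, xi)

# ... equals a GPD at v with the same shape and a scale shifted linearly in the threshold.
sigma_v <- sigma_u + xi * (v - u)
from_v <- gpd_survival(y, sigma_v, xi)
cbind(y, from_u, from_v)
```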


POT Analysis of Dow Jones

How to choose the threshold?

[Figure: Mean Residual Life Plot, Mean Excess against u (0.00 to 0.06)]

Mean Residual Life Plot: mrl.plot(djd.neg, umin=0, umax=max(djd.neg)-0.01)
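A mean residual life plot shows the sample mean excess over each candidate threshold u. Above a threshold at which the GPD is valid, the mean excess is linear in u with slope ξ/(1 − ξ) (for ξ < 1). One therefore looks for the lowest u above which the plot is roughly linear. The base-R sketch below computes the quantity that mrl.plot displays; the helper name and toy data are hypothetical.

```r
# Sample mean excess e(u) = mean(x - u | x > u) over a grid of candidate thresholds,
# with the number of exceedances n_u (few exceedances means a noisy estimate).
mean_residual_life <- function(x, thresholds) {
  t(sapply(thresholds, function(u) {
    exc <- x[x > u] - u
    c(u = u, mean_excess = if (length(exc) > 0) mean(exc) else NA, n_u = length(exc))
  }))
}

mean_residual_life(c(1, 2, 3, 4, 5), thresholds = c(0, 2, 5))
```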


Threshold: Theory

Peaks Over Thresholds (POT): if the threshold is chosen to be “high enough”,
- the parameters will stabilize;
- parameter estimates are equivalent to the GEV parameters after a linear transformation of the scale parameter (allowing for sampling effects);
- ξ_GPD = ξ_GEV;
- return value estimates are robust and asymptotically equivalent to GEV return value estimates.

Bias-variance trade-off: a threshold that is too low violates the asymptotic basis of the GPD and biases the estimates, while one that is too high leaves few exceedances and inflates their variance.
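For the POT approach, a return value can be computed from the GPD parameters at u together with ζ_u = P(X > u). The level x_m exceeded on average once every m observations is x_m = u + (σ_u/ξ)[(mζ_u)^ξ − 1], or u + σ_u log(mζ_u) when ξ = 0. This is the standard POT return-level formula (see Coles, 2001). Below is a base-R sketch with hypothetical values, including a check that P(X > x_m) = 1/m.

```r
# m-observation return level under a GPD fitted above threshold u:
#   x_m = u + sigma_u / xi * ((m * zeta_u)^xi - 1),   zeta_u = P(X > u)
# (xi = 0 gives u + sigma_u * log(m * zeta_u)).
pot_return_level <- function(u, sigma_u, xi, zeta_u, m) {
  if (xi == 0) return(u + sigma_u * log(m * zeta_u))
  u + sigma_u / xi * ((m * zeta_u)^xi - 1)
}

# Hypothetical values, for illustration only; in practice zeta_u is estimated
# as the fraction of observations exceeding u.
x_m <- pot_return_level(u = 0.015, sigma_u = 0.006, xi = 0.2, zeta_u = 0.05, m = 250)
x_m

# Check: P(X > x_m) = zeta_u * (GPD survival of x_m - u) should equal 1/m.
0.05 * (1 + 0.2 * (x_m - 0.015) / 0.006)^(-1 / 0.2)
```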


POT Analysis of Dow Jones

Thresholds: 0.015, 0.022

[Figure: GPD diagnostics for each threshold: Probability Plot and Quantile Plot (Empirical vs Model), Return Level Plot (Return level against Return period (years)), Density Plot (f(x) against x)]

djd.gpdMIN15

The likelihood L requires 1 + ξ(Y_i − µ)/ψ > 0 for each i (if these constraints are violated, L is automatically set to 0).


Point Process

- The basic method of estimation is therefore to choose the parameters (µ, ψ, ξ) to minimize the negative log-likelihood (equivalently, to maximize the log-likelihood). This is performed by numerical nonlinear optimization.
- In practice it is convenient to replace (µ, ψ, ξ) by (θ₁, θ₂, θ₃), where θ₁ = µ, θ₂ = log ψ, θ₃ = ξ. Defining θ₂ as log ψ rather than ψ itself makes the algorithm more numerically stable. It also means the constraint ψ > 0 does not have to be built explicitly into the optimization procedure.
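The sketch below shows the objective such an optimizer would minimize. It assumes the standard point-process likelihood for the exceedances y_i of a threshold u over n_blocks blocks (as presented in Coles, 2001), written in the θ = (µ, log ψ, ξ) parameterization. It returns Inf whenever 1 + ξ(y − µ)/ψ ≤ 0, i.e. when L = 0. The data, starting values, and function name are hypothetical. Base R's optim() stands in for the optimizer.

```r
# Negative log-likelihood of the point-process model for exceedances of u,
# in the theta = (mu, log psi, xi) parameterization.
#   y        : observations above the threshold u
#   n_blocks : number of blocks (e.g. years) spanned by the full record
pp_negloglik <- function(theta, y, u, n_blocks) {
  mu <- theta[1]; log_psi <- theta[2]; xi <- theta[3]
  psi <- exp(log_psi)                         # psi > 0 automatically
  if (xi == 0) {
    return(n_blocks * exp(-(u - mu) / psi) + sum(log_psi + (y - mu) / psi))
  }
  t_u <- 1 + xi * (u - mu) / psi
  t_y <- 1 + xi * (y - mu) / psi
  if (t_u <= 0 || any(t_y <= 0)) return(Inf)  # likelihood L = 0
  n_blocks * t_u^(-1 / xi) + sum(log_psi + (1 / xi + 1) * log(t_y))
}

# Hypothetical exceedances, for illustration only; optim() performs the
# numerical nonlinear minimization (Nelder-Mead by default).
set.seed(1)
y <- 0.02 + rexp(40, rate = 100)
fit <- optim(c(0.02, log(0.01), 0.1), pp_negloglik, y = y, u = 0.02, n_blocks = 5)
fit$par
```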


Point Process

Provided the model fits the data:
- the PP approach should produce parameter values equivalent to those of the POT approach;
- the parameters are independent of the threshold (up to estimation error);
- the ideal threshold is determined by considering where the parameter values stabilize.


PP Analysis of Dow Jones

[Figure: PP diagnostics: Probability plot and Quantile Plot (empirical against model)]

djd.ppMIN

$$G(x, y) = \exp\{-V(x, y)\}, \quad x > 0,\ y > 0.$$


Bivariate Extremes - Block Maxima

$$V(x, y) = 2\int_0^1 \max\left(\frac{w}{x}, \frac{1 - w}{y}\right) dH(w),$$

where H is a distribution function on [0, 1] satisfying the mean constraint

$$\int_0^1 w \, dH(w) = \frac{1}{2}.$$

If H is differentiable with density h, then

$$V(x, y) = 2\int_0^1 \max\left(\frac{w}{x}, \frac{1 - w}{y}\right) h(w)\, dw.$$

This defines the class of bivariate extreme value distributions.
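The integral for V can be evaluated numerically for any density h. This minimal base-R sketch assumes a uniform h on [0, 1], chosen purely for illustration because it satisfies the mean constraint. The last call shows why the mean constraint matters: letting y → ∞ leaves V(x, ∞) = (2/x)∫w dH(w) = 1/x, so each margin is unit Fréchet. Function names are illustrative.

```r
# V(x, y) = 2 * integral_0^1 max(w/x, (1-w)/y) h(w) dw, evaluated with integrate().
v_from_density <- function(x, y, h) {
  integrand <- function(w) pmax(w / x, (1 - w) / y) * h(w)
  2 * integrate(integrand, 0, 1)$value
}

# Hypothetical choice of H: uniform on [0, 1], so h(w) = 1 (vectorized for integrate()).
h_unif <- function(w) rep(1, length(w))
integrate(function(w) w * h_unif(w), 0, 1)$value   # mean constraint: should equal 1/2

v_from_density(1, 1, h_unif)
v_from_density(2, 2, h_unif)       # homogeneity: half of V(1, 1)
v_from_density(2, 1e12, h_unif)    # margin: V(x, Inf) = 1/x
```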


Bivariate Extremes - Block Maxima

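For any constant a > 0, V(a⁻¹x, a⁻¹y) = aV(x, y). Thus,

$$G^n(x, y) = \left(\exp\{-V(x, y)\}\right)^n = \exp\{-nV(x, y)\} = \exp\{-V(n^{-1}x, n^{-1}y)\} = G(n^{-1}x, n^{-1}y).$$

So if (X, Y) has distribution function G, then the componentwise maximum M_n of n independent copies also has distribution function G after rescaling by n⁻¹ (max-stability).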
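Homogeneity and max-stability can be checked numerically for a concrete V. The sketch below uses the logistic family, V(x, y) = (x^{−1/α} + y^{−1/α})^α with 0 < α ≤ 1, one of the dependence classes listed in the slides. The formula is the standard logistic model; the parameter values are hypothetical.

```r
# Logistic dependence function, used here only as a concrete example of V:
#   V(x, y) = (x^(-1/alpha) + y^(-1/alpha))^alpha,  0 < alpha <= 1
logistic_v <- function(x, y, alpha) (x^(-1 / alpha) + y^(-1 / alpha))^alpha
bv_cdf     <- function(x, y, alpha) exp(-logistic_v(x, y, alpha))

alpha <- 0.5          # hypothetical dependence parameter
x <- 1.3; y <- 0.7
a <- 4; n <- 12

# Homogeneity: V(x/a, y/a) = a V(x, y)
c(logistic_v(x / a, y / a, alpha), a * logistic_v(x, y, alpha))

# Max-stability: G(x, y)^n = G(x/n, y/n)
c(bv_cdf(x, y, alpha)^n, bv_cdf(x / n, y / n, alpha))
```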


Bivariate Extremes - Block Maxima

The theory was derived using standard unit Fréchet margins. It can be generalized to the complete class of bivariate extreme value distributions by generalizing the marginal distributions:

$$\tilde{x} = \left[1 + \xi_x\left(\frac{x - \mu_x}{\sigma_x}\right)\right]^{1/\xi_x} \quad\text{and}\quad \tilde{y} = \left[1 + \xi_y\left(\frac{y - \mu_y}{\sigma_y}\right)\right]^{1/\xi_y},$$

with distribution function

$$G(x, y) = \exp\{-V(\tilde{x}, \tilde{y})\}.$$
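The marginal transformation can be sketched in base R, assuming GEV margins with parameters (µ, σ, ξ). After the transform, each value has the same probability under the standard unit Fréchet distribution exp(−1/z) as under its GEV margin. This is what allows G(x, y) = exp{−V(x̃, ỹ)} to reuse the unit-Fréchet theory. Function names and parameter values are hypothetical.

```r
# Transform a GEV(mu, sigma, xi) margin to standard unit Frechet:
#   x_tilde = [1 + xi (x - mu)/sigma]^(1/xi)   (exp((x - mu)/sigma) when xi = 0)
to_unit_frechet <- function(x, mu, sigma, xi) {
  if (xi == 0) return(exp((x - mu) / sigma))
  z <- 1 + xi * (x - mu) / sigma
  stopifnot(all(z > 0))            # x must lie inside the GEV support
  z^(1 / xi)
}

gev_cdf <- function(x, mu, sigma, xi) {
  if (xi == 0) return(exp(-exp(-(x - mu) / sigma)))
  exp(-(1 + xi * (x - mu) / sigma)^(-1 / xi))
}
unit_frechet_cdf <- function(z) exp(-1 / z)

# Hypothetical margin parameters (mu = 0, sigma = 1, xi = 0.2): the transformed
# value has the same probability under unit Frechet as x has under the GEV.
x <- c(-1, 0.5, 3)
cbind(gev = gev_cdf(x, 0, 1, 0.2),
      frechet = unit_frechet_cdf(to_unit_frechet(x, 0, 1, 0.2)))
```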


Bivariate Extremes - Block Maxima

H does not have to be differentiable. For example, when H places mass 0.5 on each of w = 0 and w = 1, V(x, y) = x⁻¹ + y⁻¹, and the corresponding bivariate extreme value distribution is

$$G(x, y) = \exp\{-(x^{-1} + y^{-1})\}.$$

This corresponds to independent variables. When H places mass 1 on w = 0.5,

$$G(x, y) = \exp\{-\max(x^{-1}, y^{-1})\}.$$

This corresponds to completely dependent variables.
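Both examples can be checked by writing V for an H made of point masses. A minimal base-R sketch, with illustrative names:

```r
# V(x, y) for an H made of point masses p_j at w_j:
#   V(x, y) = 2 * sum_j p_j * max(w_j / x, (1 - w_j) / y)
v_point_masses <- function(x, y, w, p) 2 * sum(p * pmax(w / x, (1 - w) / y))

x <- 2; y <- 5

# Mass 0.5 at w = 0 and w = 1: independence, V = 1/x + 1/y
w_ind <- c(0, 1); p_ind <- c(0.5, 0.5)
# Mass 1 at w = 0.5: complete dependence, V = max(1/x, 1/y)
w_dep <- 0.5; p_dep <- 1

# Both satisfy the mean constraint sum_j w_j p_j = 1/2
c(sum(w_ind * p_ind), sum(w_dep * p_dep))

c(v_point_masses(x, y, w_ind, p_ind), 1 / x + 1 / y)
c(v_point_masses(x, y, w_dep, p_dep), max(1 / x, 1 / y))
```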


Bivariate Extremes - Block Maxima

Families of H for which the mean constraint holds for all parameter values and the V integral is tractable define parametric dependence classes:
- Logistic family
- Bi-logistic family
- Dirichlet model


References

- Coles, S. (2001). An Introduction to Statistical Modeling of Extreme Values.
- Cooley, D. (2012). Course notes from the ENVR short course on uni- and multivariate extremes.
- R packages: ismev, evd, evir, SpatialExtremes, extRemes.
- Smith, R. spatemp notes: http://www.stat.unc.edu/faculty/rs/s321/spatemp.pdf
- SpatioTemporal package tutorial vignette: http://cran.r-project.org/web/packages/SpatioTemporal/vignettes/Tutorial.pdf

Spatial Extremes
