Lecture 6: Non Normal Distributions

Lecture 6: Non Normal Distributions and their Uses in GARCH Modelling
Prof. Massimo Guidolin

20192– Financial Econometrics Spring 2016

Overview

▪ Non-normalities in (standardized) residuals from asset return models

▪ Tools to detect non-normalities: Jarque-Bera tests, kernel density estimators, Q-Q plots

▪ Conditional and unconditional t-Student densities; MLE vs. method-of-moments estimation

▪ Cornish-Fisher density approximations and their applications in risk management

▪ Hints to Extreme Value Theory (EVT)

Lecture 6: Non-normal distributions – Prof. Guidolin


Overview and General Ideas

▪ In this lecture we learn how to model departures of the marginal conditional densities from normality

▪ Let’s recap where we are in the course. This is what we said…

▪ We will proceed in three steps following a stepwise distribution modeling (SDM) approach:
  • Establish a variance forecasting model for each of the assets individually and introduce methods for evaluating the performance of these forecasts: DONE!
  • Consider ways to model conditionally non-normal aspects of the returns on the assets in our portfolio, i.e., aspects that are not captured by conditional means, variances, and covariances: NEXT
    • We still study $R_{PF,t}$ and possibly assume a GARCH has been fitted
  • Link individual variance forecasts with correlations

▪ Recall baseline model:


Why an Interest in the Conditional Density?

▪ (G)ARCH models fail to produce sufficient non-normalities

▪ In previous lectures, we have studied dynamic univariate models of conditional heteroskedasticity
  • It has been stressed that these induce unconditional return distributions which are non-normal

▪ However, ARCH models do not seem to induce sufficient non-normality
  • This can be seen in the fact that the standardized residuals from most GARCH models fail to be normally distributed

[Figure: kernel density vs. matching Gaussian density]


Tools to Test for Normality: Jarque-Bera

▪ For instance, in a Gaussian GARCH(1,1) model,
$R_{t+1} = \sigma_{t+1} z_{t+1}, \qquad z_{t+1} \sim N(0,1)$
$\sigma^2_{t+1} = \omega + \alpha R_t^2 + \beta \sigma_t^2$
and $z_{t+1} = R_{t+1}/\sigma_{t+1} \sim N(0,1)$ is a testable implication

  • This GARCH is called “Gaussian” because $z_{t+1} \sim N(0,1)$, where $z_t$ is the standardized residual series
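To make the testable implication concrete, here is a minimal Python sketch (standard library only) that runs the GARCH(1,1) variance recursion forward and returns the standardized residuals whose normality is then tested. The function name, the choice to start the recursion at the unconditional variance, and the parameter values in the usage line are illustrative assumptions; the parameters are taken as already estimated.

```python
import math


def garch11_standardized_residuals(returns, omega, alpha, beta, sigma2_0=None):
    """Run sigma^2_{t+1} = omega + alpha*R_t^2 + beta*sigma^2_t and return z_t = R_t / sigma_t.

    omega, alpha, beta are taken as already estimated (this sketch does not fit them).
    If sigma2_0 is None, the recursion starts at the unconditional variance
    omega / (1 - alpha - beta), which is an assumption of this sketch.
    """
    if alpha + beta >= 1.0:
        raise ValueError("alpha + beta must be < 1 for a finite unconditional variance")
    sigma2 = omega / (1.0 - alpha - beta) if sigma2_0 is None else sigma2_0
    z = []
    for r in returns:
        z.append(r / math.sqrt(sigma2))                 # standardized residual for this period
        sigma2 = omega + alpha * r * r + beta * sigma2  # conditional variance for the next period
    return z


# Hypothetical daily returns and parameter values, for illustration only
z = garch11_standardized_residuals([0.01, -0.02, 0.015], omega=1e-6, alpha=0.05, beta=0.90)
```

Under the Gaussian GARCH, the series `z` should look like draws from N(0,1); the normality tools that follow are applied to it.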

▪ Therefore non-normalities keep plaguing standardized residuals from many types of Gaussian GARCH models

▪ Two issues: (A) How can we detect non-normalities in an empirical density (for either returns or standardized residuals)? (B) What can we do about it?

▪ Jarque-Bera test based on sample skewness & kurtosis

▪ If X is a r.v. with mean μ and standard deviation σ, the skewness measures the asymmetry of the density function:
$Skew(X) = \frac{E[(X - \mu)^3]}{\sigma^3}$

In our case, X is the standardized residual, but this can be applied generally


Tools to Test for Normality: Jarque-Bera

▪ Skewness is the scaled third central moment and reveals whether the empirical distribution of standardized residuals is asymmetric around the mean

▪ Skewness is computed as an odd-power scaled central moment
▪ Its sign depends on the weight of the observations below the mean relative to those above the mean:
  • Skew = 0: symmetric distribution (e.g., normal)
  • Skew > 0: asymmetric to the right (e.g., log-normal)
  • Skew < 0: asymmetric to the left (e.g., many empirical densities for realized asset returns)

▪ Kurtosis is instead defined as:
$Kurt(X) = \frac{E[(X - \mu)^4]}{\sigma^4}$

  • This measure gives large weights to the observations far from the mean, i.e., the observations that fall in the tails of the distribution
  • The normal distribution has a kurtosis of 3, so that its excess kurtosis (Kurt − 3) is 0; a kurtosis larger than 3 means tails fatter than in the normal case


Tools to Test for Normality: Jarque-Bera

▪ Kurtosis is the scaled fourth central moment and reveals whether the empirical distribution of standardized residuals has tails thicker than those of a Gaussian distribution

▪ The Jarque-Bera test summarizes any non-zero skewness and any non-zero excess kurtosis in a formal test of hypothesis

▪ Jarque and Bera (1980) proposed a test that measures departure from normality in terms of skewness and kurtosis
  • Under the null of normally distributed errors, the asymptotic distributions of the sample estimators of skewness and kurtosis are:
$\widehat{Skew} \overset{a}{\sim} N\left(0, \frac{6}{T}\right), \qquad \widehat{Kurt} \overset{a}{\sim} N\left(3, \frac{24}{T}\right)$
where T is the sample size

  • Asymptotic means that the normal approximation becomes increasingly good as the sample size grows
  • Because they are asymptotically independent, the squares of their standardized forms can be added to obtain the Jarque-Bera statistic:
$JB = T\left[\frac{\widehat{Skew}^2}{6} + \frac{(\widehat{Kurt} - 3)^2}{24}\right] \overset{a}{\sim} \chi^2_2$
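A minimal sketch of the JB computation in Python, assuming the standard form $JB = T[\widehat{Skew}^2/6 + (\widehat{Kurt} - 3)^2/24]$ with sample moments normalized by 1/T. The p-value uses the fact that a χ² variable with 2 degrees of freedom has survival function exp(−x/2), so no statistics library is needed; the function name and the toy sample are hypothetical.

```python
import math


def jarque_bera(x):
    """Return (skewness, kurtosis, JB statistic, asymptotic chi^2(2) p-value).

    Moments use the 1/T normalization; small-sample corrections are ignored.
    """
    T = len(x)
    mean = sum(x) / T
    m2 = sum((v - mean) ** 2 for v in x) / T
    m3 = sum((v - mean) ** 3 for v in x) / T
    m4 = sum((v - mean) ** 4 for v in x) / T
    skew = m3 / m2 ** 1.5                 # scaled third central moment
    kurt = m4 / m2 ** 2                   # scaled fourth central moment (3 under normality)
    jb = T * (skew ** 2 / 6.0 + (kurt - 3.0) ** 2 / 24.0)
    p_value = math.exp(-jb / 2.0)         # chi^2 with 2 d.o.f.: Pr(X > x) = exp(-x/2)
    return skew, kurt, jb, p_value


# Toy symmetric sample: zero skewness, kurtosis 1.7 (thinner tails than the normal)
skew, kurt, jb, p = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
```

In practice `x` would be the standardized residuals from a fitted GARCH; a small p-value rejects normality.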


Tools to Test for Normality: Kernel Estimators

▪ A kernel density estimator is a “smoother” of a standard empirical histogram

  • Large values of the JB statistic indicate departures from normality
  • Example on S&P 500 daily returns, 1926-2010:

▪ A kernel density estimator is an empirical density “smoother” based on the choice of two objects, the kernel function K(x) and the bandwidth parameter h:
$\hat f(x) = \frac{1}{Th}\sum_{t=1}^{T} K\left(\frac{x - x_t}{h}\right)$

  • It generalizes the “histogram estimator”:


Tools to Test for Normality: Kernel Estimators

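  • δ(x) is the delta function, equal to zero everywhere except at x = 0, where δ(0) = 1
  • Let’s give a few examples. The most common type of kernel function used in applied finance is the Gaussian kernel:
$K(x) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right)$
  • A K(x) with optimal (in a mean-squared-error sense) properties is Epanechnikov’s:
$K(x) = \frac{3}{4}\left(1 - x^2\right) I(|x| \le 1)$
  • Other popular kernels are the triangular and box kernels:
$K(x) = \left(1 - |x|\right) I(|x| \le 1), \qquad K(x) = \frac{1}{2}\, I(|x| \le 1)$
where I(·) is the indicator function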
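The four kernels named above and the estimator $\hat f(x) = \frac{1}{Th}\sum_t K((x - x_t)/h)$ can be sketched in a few lines of Python. This is an illustration under stated assumptions: the bandwidth h is passed in by hand (no bandwidth rule is implemented), and `kde` and the kernel function names are hypothetical.

```python
import math


def gaussian_kernel(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def epanechnikov_kernel(x):
    return 0.75 * (1.0 - x * x) if abs(x) <= 1.0 else 0.0


def triangular_kernel(x):
    return 1.0 - abs(x) if abs(x) <= 1.0 else 0.0


def box_kernel(x):
    return 0.5 if abs(x) <= 1.0 else 0.0


def kde(x, data, h, kernel=gaussian_kernel):
    """Kernel density estimate at x: (1 / (T h)) * sum_t K((x - x_t) / h)."""
    T = len(data)
    return sum(kernel((x - xt) / h) for xt in data) / (T * h)


# Evaluate the estimate on a small grid for a hypothetical sample
sample = [-0.8, -0.1, 0.0, 0.3, 1.9]
grid_values = [kde(-3.0 + 0.5 * i, sample, h=0.5) for i in range(13)]
```

Each kernel integrates to one, so the estimate is itself a density; h controls how much the empirical histogram is smoothed.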


Tools to Test for Normality: Kernel Estimators

  • The bandwidth parameter h is usually chosen according to the rule (T here is the sample size):
  • The choice of the bandwidth in this way depends on the fact that it minimizes the integrated MSE:
$IMSE(h) = \int E\left[\left(\hat f(x) - f(x)\right)^2\right] dx$

  • Do different choices of K(x) make a big difference?
  • It seems not; financial returns are typically leptokurtic, i.e., they have fat tails and highly peaked densities around the mean

[Figure: kernel density estimates vs. moment-matched Gaussian]


Tools to Test for Normality: Q-Q Plots

▪ A Q-Q plot represents the quantiles of an empirical density vs. the quantiles of some theoretical distribution
  • A less formal and yet powerful method to visualize non-normalities consists of quantile-quantile (Q-Q) plots

▪ The idea is to plot the quantiles of the returns against the quantiles of the normal (or otherwise selected) theoretical distribution
  • If the returns are truly normal, then the graph should look like a straight line on a 45-degree angle
  • Systematic deviations from the 45-degree line signal that the returns are not well described by the normal distribution
  • The recipe is: sort all standardized returns $z_t = R_{PF,t}/\sigma_{PF,t}$ in ascending order, and call the ith sorted value $z_i$
  • Then calculate the empirical probability of getting a value below the actual one as (i − 0.5)/T, where T is the number of observations
  • The subtraction of 0.5 is an adjustment allowing for a continuous distribution

Tools to Test for Normality: Q-Q Plots

  • Calculate the standard normal quantiles as $\Phi^{-1}\left(\frac{i - 0.5}{T}\right)$, where $\Phi^{-1}(\cdot)$ denotes the inverse of the standard normal CDF
  • We can scatter plot the standardized and sorted returns on the Y-axis against the standard normal quantiles on the X-axis

[Figure: normal Q-Q plots, panels “Raw S&P 500 returns” and “After GARCH(1,1)”]

▪ Why do risk managers care? Because, unlike the JB test and kernel density estimators, Q-Q plots provide information on where (in the support of the empirical return distribution) non-normalities occur
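The Q-Q recipe translates directly into code. A minimal Python sketch, using `statistics.NormalDist().inv_cdf` from the standard library for Φ⁻¹; plotting is left out and `normal_qq_points` is a hypothetical name. Each returned pair is (theoretical quantile, sorted standardized return), i.e., the (X, Y) coordinates of one point of the plot.

```python
from statistics import NormalDist


def normal_qq_points(z):
    """Return (normal quantile, empirical quantile) pairs for a normal Q-Q plot.

    Sort standardized returns ascending, attach the empirical probability
    (i - 0.5)/T to the ith value, and map it through the inverse normal CDF.
    """
    T = len(z)
    inv = NormalDist().inv_cdf
    return [(inv((i - 0.5) / T), z_i) for i, z_i in enumerate(sorted(z), start=1)]


# Hypothetical standardized returns
pts = normal_qq_points([0.3, -1.2, 0.0])
```

Points far below the 45-degree line in the lower-left corner are the fat left tail that matters most for risk management.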


Non-Normality: What Can We do?

▪ Two key approaches to deal with non-normalities: to model conditional Gaussian moments, or to change the marginal density

▪ An obvious question is then: if all (most) financial returns have non-normal distributions, what can we do about it?
▪ Probably, to stop pretending asset returns are “more or less” Gaussian in many applications and conceptualizations
▪ Given that, there are two possibilities. First, to keep assuming that asset returns are IID, but with marginal, unconditional distributions different from the normal
  • Such marginal distributions will have to capture the fat tails and possibly also the presence of asymmetries

▪ Second, to stop assuming that asset returns are IID and to model instead the presence of dynamics/time variation in conditional densities
  • You have done this already: GARCH models!

▪ It turns out that both approaches are needed for high-frequency (e.g., daily) return data

Non-Normality: t Student Returns

▪ A Student-t distribution captures thickness in the tails in excess of the Gaussian through a power-type pdf

▪ Perhaps the most important deviations from normality are the fatter tails and the more pronounced peak in the standardized returns distribution as compared with the normal
▪ The standardized Student t(d), parameterized by d, is a relatively simple distribution that is well suited to deal with these features:
$f_{t(d)}(z; d) = \frac{\Gamma\left(\frac{d+1}{2}\right)}{\Gamma\left(\frac{d}{2}\right)\sqrt{\pi(d-2)}}\left(1 + \frac{z^2}{d-2}\right)^{-\frac{d+1}{2}}$
where d > 2 and Γ(·) is the standard gamma function

  • d should in principle be an integer, but a real-valued d is usually accepted in estimation
  • It can be shown that only the moments of t(d) of order lower than d exist, so that d > 2 is a way to guarantee that at least the variance exists
  • Check out the “gamma” function in Wikipedia
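A sketch of the standardized t(d) density in Python, written with `math.lgamma` so that Γ(·) does not overflow for large d. It assumes the unit-variance parameterization $f(z; d) = \frac{\Gamma((d+1)/2)}{\Gamma(d/2)\sqrt{\pi(d-2)}}\left(1 + z^2/(d-2)\right)^{-(d+1)/2}$; the function name is hypothetical.

```python
import math


def std_t_pdf(z, d):
    """Density of the standardized (unit-variance) Student t(d), d > 2."""
    if d <= 2:
        raise ValueError("d must exceed 2 for the variance to exist")
    # log of the normalizing constant, computed with lgamma for numerical stability
    log_c = (math.lgamma((d + 1) / 2.0) - math.lgamma(d / 2.0)
             - 0.5 * math.log(math.pi * (d - 2.0)))
    # power-type (not exponential) decay in z: this produces the fat tails
    return math.exp(log_c) * (1.0 + z * z / (d - 2.0)) ** (-(d + 1.0) / 2.0)


# Compare tail heights at z = 4 with the standard normal density
t_tail = std_t_pdf(4.0, 6.7)
normal_tail = math.exp(-8.0) / math.sqrt(2.0 * math.pi)
```

At z = 4 the t(6.7) density is roughly ten times the normal one, while both densities integrate to one and have unit variance.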


Non-Normality: t Student Returns

▪ d is the only but key parameter of a t-Student; as d → ∞, a t-Student effectively becomes Gaussian

▪ A key feature of the t(d) distribution is that the random variable, z, is taken to a power, rather than an exponential, as in the normal case
▪ This allows t(d) to have fatter tails than the normal, that is, higher values of the density f(z) when z is far from zero

▪ Example: $\exp(-5^2) = 1.39 \times 10^{-11}$, i.e., an exponential in z² becomes negligible very quickly, whereas a power of z decays far more slowly
▪ The factor $\sqrt{(d-2)/d}$ has a dimming effect on the volatility coefficient, given the sample standard deviation ($\hat\sigma$), in the sense that as d → ∞ the coefficient $\sqrt{(d-2)/d}\,\hat\sigma$ converges to the sample standard deviation, but it is otherwise lower
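The dimming factor is easy to check numerically. In this Python sketch (the function name is hypothetical), plugging in the monthly stock inputs that appear in the Gaussian VaR example later in this lecture (standard deviation 4.657%, d = 6.70) returns a volatility coefficient of about 3.900, the value reported for VW CRSP stock returns below.

```python
import math


def t_scale_from_sample_sd(sample_sd, d):
    """Volatility coefficient sigma_hat * sqrt((d - 2) / d) for returns driven by a t(d) shock.

    A conventional t(d) variable has variance d / (d - 2); scaling it by
    sigma_hat * sqrt((d - 2) / d) makes the return variance equal sigma_hat^2.
    """
    return sample_sd * math.sqrt((d - 2.0) / d)


coef = t_scale_from_sample_sd(4.657, 6.70)   # monthly, in percent
```

As d grows the factor tends to one, so the coefficient converges to the sample standard deviation, exactly as stated above.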

▪ Let’s examine the case of our 4 asset classes, monthly:
  • VW CRSP stock returns: $r_{t+1} = 0.890 + 3.900\,z_{t+1}$, with d = 6.70
  • VW REIT returns: $r_{t+1} = 1.052 + 3.780\,z_{t+1}$, with d = 4.69

Applications of t Student Returns: Density Modelling

  • 10Y Treasury note returns: $r_{t+1} = 0.670 + 2.034\,z_{t+1}$, with d = 8.57
  • 1M Treasury bill returns: $r_{t+1} = 0.465 + 0.225\,z_{t+1}$, with d = 8.50
  • d < 9 in all cases is a rather powerful indication of non-normalities

▪ We can generalize Q-Q plots to assess the appropriateness of non-normal distributions
  • E.g., assess whether returns standardized by a GARCH conform to the t(d) distribution

▪ The quantiles of the standardized t(d) are usually not easily found. One then uses their relationship with the quantiles of the conventional Student t:
$\tilde t^{-1}_{(i-0.5)/T}(d) = \sqrt{\frac{d-2}{d}}\; t^{-1}_{(i-0.5)/T}(d)$

[Figure: Q-Q plots, panels “After GARCH(1,1)” and “After t-GARCH(1,1)”]

▪ t-Student conditional distributions may often improve GARCH fit

Applications of t Student Returns: Value-at-Risk

▪ Under Gaussian returns, $VaR_{t+1}(p) = -\sigma_{t+1}\Phi^{-1}(p) - \mu_{t+1} > 0$

▪ Remember (see Appendix A) that $VaR_{t,K} > 0$ is such that $\Pr(R^P_{t,K} < -VaR_{t,K}) = p$
  • The calculation of $VaR_{t+1}$ is trivial in the univariate case, when n = 1, and $R_{t+1}$ has a Gaussian density:
$p = \Pr(R_{t+1} < -VaR_{t+1}) = \Pr\left(\frac{R_{t+1} - \mu_{t+1}}{\sigma_{t+1}} < -\frac{VaR_{t+1} + \mu_{t+1}}{\sigma_{t+1}}\right) = \Phi\left(-\frac{VaR_{t+1} + \mu_{t+1}}{\sigma_{t+1}}\right)$
so that $VaR_{t+1}(p) = -\sigma_{t+1}\Phi^{-1}(p) - \mu_{t+1} > 0$ (as $\Phi^{-1}(p) < 0$ for small p) …
  • This means that on any single day, there is a probability of 1% of recording a percentage loss larger than 5.85%
  • Yes, it is not that high and yet the data used are rather plausible: start having some doubts about the Gaussian as a density for returns…
  • The corresponding absolute VaR on an investment of $10M is then: $\$VaR_{t+1}(1\%) = (1 - e^{-0.028}) \times \$10\text{M} = \$276{,}116$ a day
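A minimal Python sketch of the Gaussian relative VaR and of its conversion into an absolute $VaR (the conversion follows Appendix A). Function names are hypothetical; inputs only need consistent units, e.g., percent for `gaussian_var` and decimals for `dollar_var`.

```python
import math
from statistics import NormalDist


def gaussian_var(mu, sigma, p):
    """Relative VaR under normality: -sigma * Phi^{-1}(p) - mu (reported as a positive loss)."""
    return -sigma * NormalDist().inv_cdf(p) - mu


def dollar_var(relative_var, portfolio_value):
    """Absolute $VaR = (1 - exp(-VaR)) * V for a continuously compounded relative VaR (decimal)."""
    return (1.0 - math.exp(-relative_var)) * portfolio_value


daily_dollar_var = dollar_var(0.028, 10_000_000)      # the $10M example above
monthly_var_pct = gaussian_var(0.89, 4.657, 0.01)     # monthly stock inputs, in percent
```

`daily_dollar_var` rounds to 276,116 dollars, and `monthly_var_pct` is about 9.94, the Gaussian monthly VaR quoted for the stock portfolio in this lecture.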

▪ What happens when portfolio returns follow a t-Student distribution?
▪ In this case, the expression $VaR_{t+1}(p) = -\sigma_{t+1}\Phi^{-1}(p) - \mu_{t+1}$ is easily extended to:
$VaR^{tSt}_{t+1}(p) = -\sigma_{t+1}\sqrt{\frac{d-2}{d}}\; t^{-1}_p(d) - \mu_{t+1}$
▪ This derives from the relationship between the quantiles of the standardized and of the conventional t(d) distributions

Cornish-Fisher Approximations

▪ t-Student models only accommodate fat tails and fail to capture asymmetries in the empirical distribution of returns

  • For instance, for our monthly data set on stock portfolio returns, $\mu_{t+1} = 0.89\%$, $\sigma_{t+1} = 4.657\%$ (so that $\sigma_{t+1}\sqrt{(d-2)/d} = 3.90\%$), estimated d = 6.70, and $t^{-1}_{1\%}(6.70) = -3.036$
  • $VaR^{tSt}_{t+1}(1\%) = -3.900 \times (-3.036) - 0.890 = 10.95\%$ per month
  • A Gaussian IID VaR would have been: $VaR_{t+1}(1\%) = -4.657 \times (-2.326) - 0.890 = 9.94\%$ per month, remarkably lower

▪ The t(d) distribution is the most used tool that allows for conditional non-normality in portfolio returns
▪ However, it builds on only one parameter and it does not allow for conditional skewness
▪ Approximations represent a simple alternative in risk management that allows for skewness and excess kurtosis
▪ Here we use the Cornish-Fisher approximation (other approximations exist):

Cornish-Fisher Approximations

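▪ A Cornish-Fisher quantile is an expansion around the normal that depends on sample skewness and excess kurtosis

$CF^{-1}_p = \Phi^{-1}_p + \frac{\zeta_1}{6}\left[(\Phi^{-1}_p)^2 - 1\right] + \frac{\zeta_2}{24}\left[(\Phi^{-1}_p)^3 - 3\Phi^{-1}_p\right] - \frac{\zeta_1^2}{36}\left[2(\Phi^{-1}_p)^3 - 5\Phi^{-1}_p\right]$
where $\zeta_1$ is the sample skewness, $\zeta_2$ the sample excess kurtosis, and $\Phi^{-1}_p \equiv \Phi^{-1}(p)$; the corresponding VaR is $VaR^{CF}_{t+1}(p) = -\sigma_{t+1}\,CF^{-1}_p - \mu_{t+1}$

▪ The Cornish-Fisher quantile, $CF^{-1}_p$, can be viewed as a Taylor expansion around the normal distribution
▪ If we have neither skewness nor excess kurtosis, so that $\zeta_1 = \zeta_2 = 0$, then we simply get the quantile of the normal distribution back, $CF^{-1}_p = \Phi^{-1}_p$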

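  • For instance, for our monthly data set on stock portfolio returns, $\mu_{t+1} = 0.89\%$, $\sigma_{t+1} = 4.657\%$, $\zeta_1 = -0.584$, $\zeta_2 = 5.226 - 3 = 2.226$. Because $\Phi^{-1}_{1\%} = -2.326$, we have:
$\frac{\zeta_1}{6}\left[(\Phi^{-1}_p)^2 - 1\right] = -0.429, \qquad \frac{\zeta_2}{24}\left[(\Phi^{-1}_p)^3 - 3\Phi^{-1}_p\right] = -0.520, \qquad -\frac{\zeta_1^2}{36}\left[2(\Phi^{-1}_p)^3 - 5\Phi^{-1}_p\right] = 0.128$

  • Therefore $CF^{-1}_{1\%} = -3.148$ and $VaR^{CF}_{t+1}(1\%) = 13.77\%$ per month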
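A Python sketch of the three-term Cornish-Fisher expansion, assuming $VaR^{CF} = -\sigma\,CF^{-1}_p - \mu$ and the standard correction terms in the skewness ζ1 and excess kurtosis ζ2. Function names are hypothetical. With ζ1 = −0.584, ζ2 = 2.226, μ = 0.89 and σ = 4.657 (percent), the correction terms come out at about −0.429, −0.520 and 0.128, the quantile at about −3.148, and the VaR at about 13.77% per month.

```python
from statistics import NormalDist


def cornish_fisher_terms(p, skew, excess_kurt):
    """Return the three correction terms added to Phi^{-1}(p)."""
    q = NormalDist().inv_cdf(p)
    skew_term = skew / 6.0 * (q ** 2 - 1.0)
    kurt_term = excess_kurt / 24.0 * (q ** 3 - 3.0 * q)
    skew2_term = -skew ** 2 / 36.0 * (2.0 * q ** 3 - 5.0 * q)
    return skew_term, kurt_term, skew2_term


def cornish_fisher_quantile(p, skew, excess_kurt):
    """CF^{-1}_p: normal quantile plus skewness and excess-kurtosis corrections."""
    return NormalDist().inv_cdf(p) + sum(cornish_fisher_terms(p, skew, excess_kurt))


def cf_var(mu, sigma, p, skew, excess_kurt):
    """VaR^CF = -sigma * CF^{-1}_p - mu."""
    return -sigma * cornish_fisher_quantile(p, skew, excess_kurt) - mu


terms = cornish_fisher_terms(0.01, -0.584, 2.226)
var_cf = cf_var(0.89, 4.657, 0.01, -0.584, 2.226)   # percent per month
```

Setting `skew = excess_kurt = 0` makes every correction vanish, so the function falls back to the Gaussian quantile.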


Cornish-Fisher Approximations

▪ You can use the difference between $VaR^{CF}_{t+1}(1\%) = 13.77\%$ and $VaR^{tSt}_{t+1}(1\%) = 10.95\%$ to quantify the importance of negative skewness for monthly risk management (2.82% per month)
▪ The Gaussian $VaR_{t+1}(1\%) = 9.94\%$ looks increasingly dangerous!

▪ The following plot concerns the 1% VaR for monthly US stock returns data (i.e., $\mu_{t+1} = 0.89\%$, $\sigma_{t+1} = 3.27\%$)
▪ The approach to risk management followed so far is a bit odd: we care mostly about the left tail of the density of portfolio returns, but we model the entire density

▪ Can we do differently?

Extreme Value Theory

▪ Extreme value theory estimates (conditional) tail probabilities of IID returns standardized according to an appropriate volatility model

▪ Typically, the biggest risk to a portfolio is the sudden occurrence of a single large negative return
▪ Having as precise a knowledge as possible of the probabilities of such extremes is therefore essential
▪ Prerequisite: an appropriately scaled version of asset returns must be IID according to some distribution
  • Appropriate scaling will often involve specifying and estimating a volatility (GARCH) model

▪ Consider the probability of standardized returns z less a threshold u being below a value x, given that the standardized return itself is beyond the threshold u:
$F_u(x) \equiv \Pr\{z - u \le x \mid z > u\}, \qquad x > 0$

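Extreme Value Theory

▪ Hold on: what does $F_u(x)$ really mean?

[Figure: density of z with 0, u and x + u marked on the horizontal axis]

▪ Not really useful to risk managers, is it?

▪ The solution is simple: instead of considering z (standardized returns), consider −z, the negative of standardized returns

[Figure: the same construction for −z, with −u marked]

▪ Notice that, given u, x > 0,
$1 - F_u(x) = 1 - \Pr\{-z - u \le x \mid -z > u\} = 1 - \Pr\{z \ge -(x+u) \mid z < -u\} = \Pr\{z < -(x+u) \mid z < -u\}$
Ah! That’s what you want!

▪ Using the general definition of a conditional probability, $\Pr(A \mid B) = \Pr(A \cap B)/\Pr(B)$, and since $\{z < -(x+u)\} \subseteq \{z < -u\}$:
$1 - F_u(x) = \frac{\Pr\{z < -(x+u)\}}{\Pr\{z < -u\}}$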
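To see what this ratio looks like for a concrete distribution, here is a sketch that assumes, purely for illustration, that z is standard normal. Because {z < −(x+u)} is contained in {z < −u}, the exceedance probability reduces to Φ(−(x+u))/Φ(−u). The function name is hypothetical.

```python
from statistics import NormalDist


def gaussian_tail_exceedance(x, u):
    """Pr{z < -(x+u) | z < -u} = Phi(-(x+u)) / Phi(-u) for z ~ N(0,1) and u, x >= 0."""
    cdf = NormalDist().cdf
    return cdf(-(x + u)) / cdf(-u)


# Among Gaussian shocks below -2, the share that also falls below -3
share = gaussian_tail_exceedance(1.0, 2.0)
```

`share` is about 0.059: under normality only about 6% of the shocks beyond −2 also go beyond −3, which is precisely the kind of thin-tail behaviour that EVT lets the data overturn.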


Extreme Value Theory

▪ Extreme value theory (EVT) exploits the fact that the tails of the density of (almost) any IID series can be approximated by a generalized Pareto distribution as we move towards the extreme tails

▪ So it seems that all one needs is a model of the conditional CDF, as we have been developing so far
▪ However, EVT has one key result: as you let the threshold, u, get large, almost any distribution, $F_u(x)$, converges to the generalized Pareto (GP) distribution, $G(x; \xi, \beta)$, where β > 0:
$G(x; \xi, \beta) = \begin{cases} 1 - \left(1 + \xi x/\beta\right)^{-1/\xi} & \text{if } \xi \neq 0 \\ 1 - \exp(-x/\beta) & \text{if } \xi = 0 \end{cases}$
▪ ξ is the key parameter of the GPD:
  • ξ > 0 implies a thick-tailed distribution such as the t-Student
  • ξ = 0 corresponds to exponentially decaying tails, as in the Gaussian density
  • ξ < 0 implies a thin-tailed distribution

  • That the Gaussian falls in the ξ = 0 case is not a surprise: its tails decay exponentially
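A Python sketch of the GPD CDF G(x; ξ, β), assuming the standard parameterization 1 − (1 + ξx/β)^(−1/ξ) for ξ ≠ 0 and its exponential limit 1 − exp(−x/β) for ξ = 0. For ξ < 0 the support is bounded above at −β/ξ, which the code handles by returning 1 beyond that point. The function name is hypothetical.

```python
import math


def gpd_cdf(x, xi, beta):
    """Generalized Pareto CDF G(x; xi, beta) for x >= 0 and beta > 0."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    if x < 0:
        return 0.0
    if xi == 0.0:
        return 1.0 - math.exp(-x / beta)        # exponential tail (the xi = 0 limit)
    base = 1.0 + xi * x / beta
    if base <= 0.0:                              # beyond the upper endpoint when xi < 0
        return 1.0
    return 1.0 - base ** (-1.0 / xi)             # power-type tail when xi > 0


tail_prob_fat = 1.0 - gpd_cdf(5.0, 0.3, 1.0)     # slowly decaying tail
tail_prob_exp = 1.0 - gpd_cdf(5.0, 0.0, 1.0)     # exponential tail
```

For the same β, the ξ = 0.3 tail probability at x = 5 is several times larger than the exponential one, which is the sense in which ξ > 0 means fat tails.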


Extreme Value Theory

▪ Maximum likelihood estimates of the parameters of the GPD can be obtained using standard methods

▪ At this point, the conditional CDF $F_u(x)$ does not have a “congenial” expression for applied purposes
▪ Re-write it instead (for $y \equiv x + u$, with F now denoting the CDF of the standardized losses −z):
$F(y) = F(u) + [1 - F(u)]\,F_u(y - u)$

  • Now let T denote the total sample size and let $T_u$ denote the number of observations beyond the threshold, u
  • The term $1 - F(u)$ can then be estimated simply by the proportion of data points beyond the threshold u, $T_u/T$

▪ $F_u(y - u)$ can be estimated by MLE on the standardized observations in excess of the chosen threshold
▪ This means: assuming β, ξ ≠ 0, suppose we have obtained ML estimates $\hat\xi$ and $\hat\beta$ in $G(x; \xi, \beta)$
▪ Then the resulting CDF is:
$\hat F(y) = 1 - \frac{T_u}{T}\left(1 + \hat\xi\,\frac{y - u}{\hat\beta}\right)^{-1/\hat\xi}$
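A sketch of the GPD-based tail CDF estimate in Python, assuming ξ̂ ≠ 0 and β̂ have already been obtained by MLE (no optimizer is included here) and that y is a standardized loss at or beyond the threshold u. The function name and the example numbers are hypothetical.

```python
def evt_tail_cdf(y, u, xi_hat, beta_hat, T, T_u):
    """F_hat(y) = 1 - (T_u / T) * (1 + xi_hat * (y - u) / beta_hat) ** (-1 / xi_hat), y >= u."""
    if y < u:
        raise ValueError("the GPD-based estimate only applies beyond the threshold u")
    tail_share = T_u / T                      # estimate of 1 - F(u)
    return 1.0 - tail_share * (1.0 + xi_hat * (y - u) / beta_hat) ** (-1.0 / xi_hat)


# Hypothetical estimates: 50 of 1000 standardized losses exceed u = 2
F_at_u = evt_tail_cdf(2.0, 2.0, 0.2, 0.5, 1000, 50)
F_at_3 = evt_tail_cdf(3.0, 2.0, 0.2, 0.5, 1000, 50)
```

At y = u the estimate equals 1 − T_u/T = 0.95, and it rises towards 1 as the loss y grows.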


Extreme Value Theory: Hill’s Estimator

▪ Using an approximation based on the fact that the tails have a smooth shape, Hill’s estimator is obtained in closed form

▪ This way of proceeding represents the “high” way because it is based on MLE plus an application of the GPD approximation result
  • However, this is not the most common approach: when ξ > 0 (the case of fat tails, most common in finance), a very easy estimator exists, namely the so-called Hill’s estimator

▪ The idea is that a rather complex ML estimation under the GPD may be approximated in the following way (for y > u):
$F(y) = 1 - c\,y^{-1/\xi}$
which exploits the fact that, for most distributions, the factor multiplying $y^{-1/\xi}$ in the tail is a slowly varying function of y and is thus set to a constant, c

▪ See Appendix B for a sketch of proof of the following result:
$\hat\xi_{Hill} = \frac{1}{T_u}\sum_{i=1}^{T_u}\ln\left(\frac{y_i}{u}\right), \qquad \hat c = \frac{T_u}{T}\,u^{1/\hat\xi}$
so that $\hat F(y) = 1 - \hat c\,y^{-1/\hat\xi}$
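Because the Hill estimates are in closed form, they take only a few lines. A Python sketch, assuming `losses` holds standardized losses (−z) and that the threshold u > 0 is chosen by the user; the function name and toy data are hypothetical. It returns ξ̂ = (1/T_u)Σ ln(y_i/u), the constant ĉ = (T_u/T)u^(1/ξ̂), and T_u.

```python
import math


def hill_estimator(losses, u):
    """Hill estimates (xi_hat, c_hat, T_u) for the tail F(y) = 1 - c * y**(-1/xi), y > u."""
    T = len(losses)
    tail = [y for y in losses if y > u]       # observations beyond the threshold
    T_u = len(tail)
    if T_u == 0:
        raise ValueError("no observations beyond the threshold")
    xi_hat = sum(math.log(y / u) for y in tail) / T_u
    c_hat = (T_u / T) * u ** (1.0 / xi_hat)   # matches 1 - F(u) to the share T_u / T
    return xi_hat, c_hat, T_u


# Toy losses chosen so that ln(y_i / u) equals 1 and 2 for the two exceedances
losses = [0.5, 1.0, 1.5, 2.0 * math.e, 2.0 * math.e ** 2]
xi_hat, c_hat, T_u = hill_estimator(losses, 2.0)
```

By construction 1 − ĉ·u^(−1/ξ̂) equals 1 − T_u/T, so the fitted tail reproduces the empirical share of exceedances at the threshold.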

Extreme Value Theory: Hill’s Estimator

  • What is the payoff of all our approximation efforts? Our estimates are available in closed form: they do not require numerical optimization!
  • They are therefore extremely easy to calculate
  • A first application of Hill’s EVT estimator consists of the computation of (partial) Q-Q plots for returns below some threshold loss −u < 0
  • It can be shown that the Q-Q plot from EVT can be built by plotting each $y_i$ against $u\left[\frac{i - 0.5}{T_u}\right]^{-\hat\xi}$, where $y_i$ is the ith standardized loss sorted in descending order (i.e., for standardized returns below −u)
  • Being based on a partial (tail) CDF estimator, EVT-based Q-Q plots are frequently excellent
  • They obviously suffer from consistency issues, as the same quantile varies with the threshold u
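A sketch of the EVT Q-Q coordinates under the Hill tail approximation. Setting the estimated tail probability (T_u/T)(y_i/u)^(−1/ξ̂) equal to (i − 0.5)/T gives the theoretical quantile u[(i − 0.5)/T_u]^(−ξ̂), which is paired with the ith largest standardized loss. Function names are hypothetical and ξ̂ is taken as given (e.g., from Hill's estimator).

```python
def evt_qq_quantiles(u, xi_hat, T_u):
    """Theoretical EVT quantiles u * ((i - 0.5) / T_u) ** (-xi_hat), i = 1, ..., T_u."""
    return [u * ((i - 0.5) / T_u) ** (-xi_hat) for i in range(1, T_u + 1)]


def evt_qq_points(losses, u, xi_hat):
    """Pairs (theoretical quantile, observed loss) for losses beyond u, sorted descending."""
    tail = sorted((y for y in losses if y > u), reverse=True)
    return list(zip(evt_qq_quantiles(u, xi_hat, len(tail)), tail))


# Hypothetical standardized losses and tail index
pts = evt_qq_points([0.1, 2.5, 3.0, 5.0, 1.0], 2.0, 0.5)
```

Only the T_u exceedances enter the plot, which is why it is a partial Q-Q plot and why its shape can change with u.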


Reading List/How to prepare the exam

▪ Carefully read these Lecture Slides + class notes
▪ Possibly read CHRISTOFFERSEN, chapter 6

▪ Lecture Notes are available on Prof. Guidolin’s personal web page



Jaschke, S. (2002) “The Cornish-Fisher-Expansion in the Context of Delta-Gamma-Normal Approximations”, Journal of Risk, Number 4, Summer 2002.

Teräsvirta, T. (2009) “An Introduction to Univariate GARCH Models”, in Andersen, T., R. Davis, J.-P. Kreiß, and T. Mikosch, Handbook of Financial Time Series, Springer.


Appendix A: Value-at-Risk

▪ Let’s review the definition of (relative) VaR: VaR simply answers the question “What percentage loss is such that it will only be exceeded p × 100% of the time in the next K trading periods (days)?”
▪ Formally:

$VaR_{t,K} > 0$ is such that $\Pr(R^P_{t,K} < -VaR_{t,K}) = p$

where $R^P$ is a continuously compounded portfolio return

▪ The absolute $VaR has a similar definition with “dollar/euro” replacing “percentage” in the definition above
▪ Continuously compounded means that $R^P_{t,K} \equiv \ln(V^P_{t+K}) - \ln(V^P_t)$, where $V^P_t$ is the portfolio value

▪ Absolute $VaR is defined as follows:
$p = \Pr\left(\exp(R^P_{t,K}) < \exp(-VaR_{t,K})\right)$
$\;\; = \Pr\left(V^P_{t+K}/V^P_t - 1 < \exp(-VaR_{t,K}) - 1\right)$ [subtract 1]
$\;\; = \Pr\left(V^P_{t+K} - V^P_t < (\exp(-VaR_{t,K}) - 1)\,V^P_t\right)$ [multiply by $V^P_t$]
$\;\; = \Pr\left(\$Loss_{t,K} > (1 - \exp(-VaR_{t,K}))\,V^P_t\right) = \Pr\left(\$Loss_{t,K} > \$VaR_{t,K}\right)$
so that $\$VaR_{t,K} = (1 - \exp(-VaR_{t,K}))\,V^P_t$

Appendix B: Deriving Hill’s Estimator

▪ Proceed to develop the tail of the distribution into $B(x)\,x^{-1/\xi}$ and absorb the remaining parameter into the c constant

▪ Writing the log-likelihood function for the approximate conditional density, taking first-order conditions, and solving delivers a simple estimator for ξ:
$\hat\xi_{Hill} = \frac{1}{T_u}\sum_{i=1}^{T_u}\ln\left(\frac{y_i}{u}\right)$

which is easy to implement and remember
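A worked version of the first-order-condition step, as a sketch that assumes the tail approximation F(y) = 1 − c·y^(−1/ξ) for y > u. Note how the constant c cancels from the conditional density, which is why ξ can be estimated before c:

```latex
\begin{aligned}
f(y \mid y > u) &= \frac{f(y)}{1 - F(u)}
  = \frac{\tfrac{c}{\xi}\, y^{-1/\xi - 1}}{c\, u^{-1/\xi}}
  = \frac{1}{\xi}\, y^{-1/\xi - 1}\, u^{1/\xi} \\
\ln L(\xi) &= \sum_{i=1}^{T_u} \left[ -\ln \xi - \left(\frac{1}{\xi} + 1\right) \ln y_i + \frac{1}{\xi} \ln u \right] \\
\frac{\partial \ln L}{\partial \xi} &= -\frac{T_u}{\xi} + \frac{1}{\xi^2} \sum_{i=1}^{T_u} \ln y_i - \frac{T_u}{\xi^2} \ln u = 0 \\
\Longrightarrow \quad \hat\xi_{Hill} &= \frac{1}{T_u} \sum_{i=1}^{T_u} \ln\left(\frac{y_i}{u}\right)
\end{aligned}
```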

▪ We can estimate the c parameter by ensuring that the fraction of observations beyond the threshold is accurately captured by the density, as in $1 - F(u) = c\,u^{-1/\xi} = T_u/T$, because we have approximated $F(u) = 1 - c\,u^{-1/\xi}$

▪ Solving this equation for c yields:
$\hat c = \frac{T_u}{T}\,u^{1/\hat\xi}$
