THE UNIVERSITY OF CALGARY. Estimation of the Shape Parameter in the Elliptical Family of Distributions

Author: Buddy Wilson
THE UNIVERSITY OF CALGARY

Estimation of the Shape Parameter in the Elliptical Family of Distributions Based on Non–Parametric Measures of Association

by

Matthew K. Davis

A THESIS SUBMITTED TO THE FACULTY OF GRADUATE STUDIES IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science

DEPARTMENT OF Mathematics and Statistics

CALGARY, ALBERTA June, 2007

© Matthew K. Davis 2007

THE UNIVERSITY OF CALGARY FACULTY OF GRADUATE STUDIES The undersigned certify that they have read, and recommend to the Faculty of Graduate Studies for acceptance, a thesis entitled “Estimation of the Shape Parameter in the Elliptical Family of Distributions Based on Non–Parametric Measures of Association” submitted by Matthew K. Davis in partial fulfillment of the requirements for the degree of Master of Science.

Supervisor, Dr. Gemai Chen Department of Mathematics and Statistics

Dr. Alexander de Leon Department of Mathematics and Statistics

Dr. Gordon Fick Department of Community Health Sciences

Date


Abstract

It is not well known that relationships exist between the correlation ρ (association parameter) of a bivariate normal population and some non–parametric measures of association such as Kendall’s tau and Spearman’s rho. These relationships are introduced for the two aforementioned measures along with Blomqvist’s quadrant measure of association. The relationships are not necessarily limited to the normal distribution, and it is illustrated that for the family of elliptical distributions the relationship between Kendall’s tau or Blomqvist’s quadrant measure and the parameter ρ holds. Based on these relationships, robust estimates of the association (correlation) parameter are proposed. The properties of these estimates are investigated via simulation for various elliptical distributions in terms of bias, variance and root mean square error, and the robustness of these non–parametric estimates to extreme observations is discussed for a normal sample contaminated with a few outliers. To illustrate the benefits of using robust estimates of association, practical examples are discussed for distributions with heavy tails, an important area in application. For example, various exchange rates are heavy tailed, and the association among them can be calculated using the proposed robust estimates.


Acknowledgements

I would like to thank the many people who aided me throughout this work, most importantly my supervisor, Dr. Gemai Chen. His support and suggestions have been invaluable throughout my Master’s program. The helpful comments and suggestions of Dr. Gordon Fick and Dr. Alex de Leon have improved this thesis greatly and are much appreciated. I would also like to thank Dr. Peter Ehlers for his support and interest, and for running the graduate seminars. My time at the University of Otago was instrumental in starting this work, and I would like to thank my previous supervisor, Dr. Mark Meerschaert, for his support and guidance. I would also like to thank Dr. David Fletcher for helping me through some trying times. I am indebted to my family and friends for supporting me throughout my education. I am especially happy to thank my girlfriend Kristin for her support and for finding the odd typo in my thesis. The funding provided by the Department of Mathematics and Statistics has been much appreciated over the past year and a half, and the gracious financial support of my Aunt Rosemary has been instrumental in supporting me throughout my education. I cannot say thank you enough for that. Finally, I would like to dedicate this to my late father. He would be jealous of the computers that I did all these simulations on, and proud of the end result.


Table of Contents

Approval Page
Abstract
Acknowledgements
Table of Contents
List of Symbols

1 Introduction
  1.1 Measuring Association
    1.1.1 Properties
    1.1.2 Correlation
    1.1.3 Spearman’s Rho
    1.1.4 Kendall’s Tau
    1.1.5 Quadrant Measure
  1.2 Thesis Outline
    1.2.1 Chapter 2: Results
    1.2.2 Chapter 3: Simulation Study
    1.2.3 Chapter 4: Applications

2 Properties of Various Estimates of ρ
  2.1 Elliptical Distributions
    2.1.1 Definition
    2.1.2 The Shape Parameter ρ
    2.1.3 The Orthant Probability of Elliptical Distributions
    2.1.4 Relationships with ρ
  2.2 Pearson’s Rho
    2.2.1 Assumptions
    2.2.2 Distribution
  2.3 Kendall’s Tau
    2.3.1 Distribution
    2.3.2 Greiner’s Estimate $\hat{\rho}_g$ of ρ
  2.4 Quadrant Measure
    2.4.1 Distribution
    2.4.2 Quadrant Estimate $\hat{\rho}_q$ of ρ
  2.5 Spearman’s Rho
    2.5.1 Distribution
    2.5.2 Spearman’s Estimate $\hat{\rho}_s$ of ρ
    2.5.3 Kendall’s Estimate $\hat{\rho}_k$ of ρ

3 Simulation Study
  3.1 Simulation #1: Estimates for Elliptical Distributions
    3.1.1 Design
    3.1.2 Normal
    3.1.3 Student’s T
    3.1.4 Cauchy
    3.1.5 Dependence on Heavy Tails
  3.2 Simulation #2: Contaminated Normal
    3.2.1 Design
    3.2.2 Results
  3.3 Discussion

4 Applications
  4.1 Association in Log–Returns of Exchange Rates: #1
    4.1.1 Canadian–US & Euro–US
    4.1.2 Australia–US & New Zealand–US
  4.2 Association in Log–Returns of Exchange Rates: #2
    4.2.1 Elliptically Contoured Stable Distributions
    4.2.2 German Mark & Japanese Yen versus British Pound

5 Conclusions
  5.1 Summary
  5.2 Future Work

Bibliography

A Bootstrap Estimation of the Standard Error of the Estimates

B R–Code
  B.1 Generation of Random Bivariate Samples
  B.2 Estimates for ρ
    B.2.1 Pearson’s Product Moment Correlation Coefficient ($\hat{\rho}_p$)
    B.2.2 Greiner’s Estimate ($\hat{\rho}_g$)
    B.2.3 Quadrant Estimate ($\hat{\rho}_q$)
    B.2.4 Spearman’s Estimate ($\hat{\rho}_s$)
    B.2.5 Kendall’s Suggested Estimate ($\hat{\rho}_k$)
  B.3 Simulation #1
  B.4 Simulation #2

List of Tables

3.1 Estimates Used in the Simulation
4.1 Estimates of ρ for the Bivariate Daily Log–Returns of the CUS and EUS FX Rates (Bootstrap SEs in Parentheses)
4.2 Estimates of ρ for the Bivariate Daily Returns of the AUS and NUS FX Rates since 1971 (Bootstrap SEs in Parentheses)
4.3 Estimates of ρ for the Bivariate Daily Returns of the AUS and NUS FX Rates since 1999 (Bootstrap SEs in Parentheses)
4.4 Estimates of ρ for the Bivariate Daily Returns of the MBP and YBP (Bootstrap SEs in Parentheses)

List of Figures

1.1 Examples of Different Types of Association
1.2 Example of the Influence of Outliers on $\hat{\rho}_p$
1.3 Greiner’s Relationship
2.1 The Effect of ρ and $\sigma_x$ on the Contours of the Density Function for a Bivariate T–Distribution with 4 Degrees of Freedom
3.1 Legend for Simulation Results
3.2 Bias, Variance, RMSE for NORM (n = 20, 100)
3.3 Comparing Ratios of RMSE for NORM (n = 100)
3.4 Bias, Variance, RMSE for TDF2 (n = 20, 100)
3.5 Comparing Ratio of RMSE for TDF2 (n = 100)
3.6 Bias, Variance, RMSE for TDF4 (n = 20, 100)
3.7 Comparing Ratio of RMSE for TDF4 (n = 100)
3.8 Bias, Variance, RMSE for TDF8, TDF12, TDF16, TDF20 (n = 100)
3.9 Comparing Ratios of RMSE for the Estimates to the RMSE of $\hat{\rho}_p$ for TDF8, TDF12, TDF16, TDF20 (n = 100)
3.10 Comparing Ratios of RMSE for the Estimates to the RMSE of $\hat{\rho}_g$ for TDF8, TDF12, TDF16, TDF20 (n = 100)
3.11 Comparing Ratios of RMSE for CAUC (n = 100)
3.12 Bias, Variance, RMSE for CAUC (n = 20, 100)
3.13 Effect of Tails on the Bias of the Estimates (n = 100)
3.14 Effect of Tails on the Variance of the Estimates (n = 100)
3.15 Effect of Tails on the RMSE of the Estimates (n = 100)
3.16 Outlier Situations Considered for Simulation #2 with Bivariate Contours
3.17 Bias for Outlier Simulation: Cases A, B, C
3.18 Bias for Outlier Simulation: Cases D, E, F
3.19 Variance for Outlier Simulation: Cases A, B, C
3.20 Variance for Outlier Simulation: Cases D, E, F
3.21 RMSE for Outlier Simulation: Cases A, B, C
3.22 RMSE for Outlier Simulation: Cases D, E, F
4.1 CUS and EUS Exchange Rate Log–Returns
4.2 AUS and NUS Exchange Rate Log–Returns since 1971
4.3 AUS and NUS Exchange Rate Log–Returns since 1999
4.4 Ger–UK and Jpn–UK Exchange Rate Log–Returns

List of Symbols

$(X, Y)^T$    A bivariate random variable
$X = (X_1, \ldots, X_p)^T$    A p–variate random vector
$(x_1, y_1)^T, (x_2, y_2)^T, \ldots, (x_n, y_n)^T$    A bivariate random sample
$\Sigma$    The covariance matrix when it exists, otherwise the association/shape matrix
$\rho$    The association/shape parameter of an elliptical distribution and, when appropriate, the correlation of the bivariate normal distribution
$\tau$    The population parameter associated with Kendall’s Tau
$\varrho_S$    The population parameter associated with Spearman’s Rho
$\kappa$    The population parameter associated with the quadrant measure of association
$\hat{\tau}$    Kendall’s estimate of rank correlation, also known as Kendall’s Tau
$\sigma^2_{\hat{\tau}}$    The variance of $\hat{\tau}$
$\hat{\varrho}_S$    Spearman’s estimate of rank correlation, also known as Spearman’s Rho
$\hat{\kappa}$    The estimate of the quadrant measure of association
$\hat{\rho}_p$    Pearson’s product moment correlation estimate
$\hat{\rho}_g$    The estimate of ρ based on its relation to τ in the normal case
$\hat{\rho}_s$    The estimate of ρ based on its relation to $\varrho_S$ in the normal case
$\hat{\rho}_k$    The estimate of ρ based on its relation to $\varrho_S$ in the normal case and the bias of $\hat{\varrho}_S$
$\hat{\rho}_q$    The estimate of ρ based on its relation to κ in the normal case
$\varphi_X(t)$    The characteristic function of X
$E_p(\mu, \Sigma, \phi)$    A p–variate elliptical distribution with location parameter µ, association/shape matrix Σ, and characteristic generator φ
$\stackrel{d}{=}$    Equality in distribution
RMSE    Root mean squared error

Chapter 1

Introduction

The study of association between random variables is important in statistics. Once it is determined that two random variables are not independent, it is often of interest to quantify how much association there is between them. For example, when building a portfolio of stocks it is prudent to know if, how, and how much two or more stocks are related. This can then be used in financial applications such as portfolio optimization or value at risk.

There are numerous measures in the literature that can be used to measure the association between populations. The three most common are the linear correlation, commonly estimated by Pearson’s product–moment correlation coefficient, and two non–parametric measures, Spearman’s rho and Kendall’s tau, both of which have population meanings and estimates that are presented later. Pearson’s is by far the most frequently used estimate; however, in many instances it is not the most appropriate measure of association because of the assumptions it requires.

It is best to start with an exploration of the semantics of the rich vocabulary used in the study of association. Association, correlation, covariance, covariation, and dependence are all common terms used to discuss the relationship between two random variables, X and Y. van Belle (2002) notes that there are numerous terms for denoting relationships; he adopts covariation in his discussion and chooses not to use association since it “conjures up images of categorical data” (van Belle, 2002, p. 53). While association may have this problem, the term does not suggest that a finite second moment is required. Correlation, covariance, and covariation appear to imply an assumption of finite second moments, and in some cases it is not possible to meet this assumption. Since association appears to be free of the assumptions that the other terms carry, it is adopted here. Thus, the terms association and correlation can be used interchangeably when there are finite second moments; when there are not, association is used as the analogue of correlation.

The bivariate normal distribution is commonly used in many bivariate situations. Its use in a diverse range of applications is due in part to its mathematical simplicity and in part to the elliptical nature of many observed multivariate data sets. It is, however, only one member of a broad family of distributions that have elliptical shapes. While sharing the elliptical shape of the normal, these distributions allow for properties that the normal does not, making them applicable to a broader range of situations and suitable for capturing aspects of observed data that the normal cannot, such as heavy tails. The use of association over correlation and other terms is necessary since there are members of the elliptical family, such as the symmetric stable distributions, that do not have a finite second moment and possibly not even a finite first moment.

For normal distributions linear association is often discussed, but there are many other ways that random variables may be associated. One way of relaxing the assumption of linear association is to consider monotone association. Both of these types of association imply that if one variable increases then the other either increases or decreases. Some questions that can be asked are: Is there any association? How strong is the association? Is the association positive (one increases, the other increases) or negative (one increases, the other decreases)?

1.1 Measuring Association

Lehmann (1966) discusses various definitions of dependence (association), some of which are stronger than others. Copulas (Nelsen, 2006; with references therein) provide a very versatile method for studying the association between random variables. To measure the association between two populations there are a multitude of methods that may be used, depending on what questions are being asked. In this thesis the association measures considered are limited to those that measure linear or monotone association. By far the most common method for measuring linear association is the correlation (Casella & Berger, 1990). Another measure is the correlation ratio (Kruskal, 1958; Prokhorov, 2002). There is also an abundance of rank based, or non–parametric, measures, such as Spearman’s rho (Spearman, 1904; Kruskal, 1958), Kendall’s tau (Kendall, 1938; Kruskal, 1958), and the coefficient of medial correlation (Blomqvist, 1950; Kruskal, 1958). The latter is also known as Blomqvist’s quadrant measure of association, which is the name adopted here. Kruskal (1958) gives probabilistic population meanings to all three of these non–parametric measures. Lehmann (1966) notes that the correlation or covariance and the three non–parametric measures are all appropriate for measuring quadrant dependence, which includes monotone and linear dependence. The first two measures discussed are suited for measuring linear association, whereas the rank based procedures are useful for measuring a monotone or linear relationship. Figure 1.1 illustrates the difference among linear, monotone, and quadratic relationships. Supposing that $(X, Y)^T$ is a bivariate random vector, the desirable properties of a measure of association between X and Y are introduced below, followed by some specific measures.

[Figure 1.1: Examples of Different Types of Association. Panels: Linear; Two Monotone Relationships; Quadratic Relationship. Axes: X and Y.]

1.1.1 Properties

To properly define measures of association, there should be some common properties that all measures possess. Let λ(X, Y) denote an association measure. The standard properties that λ(X, Y) must have are

P1 |λ(X, Y)| ≤ 1
P2 λ(X, Y) = λ(Y, X)
P3 If X and Y are independent then λ(X, Y) = 0
P4 If |λ(X, Y)| = 1 then there is a perfect association between X and Y

These properties differ from those of Renyi (1959), which are for symmetric non–parametric measures of dependence (association). Renyi’s axioms are (see Schweizer & Wolff, 1981)

R1 λ(X, Y) is defined for any X and Y
R2 λ(X, Y) = λ(Y, X)
R3 0 ≤ λ(X, Y) ≤ 1
R4 λ(X, Y) = 0 if and only if X and Y are independent
R5 λ(X, Y) = 1 if and only if each of X and Y is almost surely a strictly monotone function of the other
R6 λ(f(X), g(Y)) = λ(X, Y) where f, g are monotone functions
R7 If $(X, Y)^T$ has a bivariate normal distribution with correlation coefficient ρ, then λ(X, Y) = φ(|ρ|) where φ is a strictly increasing function

These axioms are stricter than the properties P1 to P4 above but do include all of the same properties except P1. Note however that in P1, if λ(X, Y) < 0 then $\lambda^2(X, Y) > 0$, and so $\lambda^2(X, Y)$ would satisfy R3. The axioms R1, R6, and R7 are not included though, and hence there may be bivariate distributions and/or monotone functions f, g for which an association measure that meets P1 to P4 fails to meet R1, R6, and R7.

The first set of properties concerns the primary questions regarding association and holds for both parametric and non–parametric measures of association. Renyi’s axioms highlight some of the benefits of using a non–parametric measure, such as robustness to outliers and invariance under monotone transformations of the data. The last of Renyi’s axioms provides the impetus for investigating estimates of the association parameter ρ of a bivariate normal distribution based on non–parametric measures of association. This in turn leads to the investigation of whether these relationships hold when the normal distribution is extended to the elliptical distributions.

1.1.2 Correlation

The most standard measure of association is the correlation. Suppose that $(X, Y)^T$ is a bivariate random variable with mean $\mu = (\mu_x, \mu_y)^T$ and finite second moments. Then the correlation between X and Y is defined as

$$\mathrm{Cor}(X, Y) = \frac{E\left((X - \mu_x)(Y - \mu_y)\right)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}. \tag{1.1}$$

If $(X, Y)^T$ has a bivariate normal distribution with covariance matrix

$$\Sigma = \begin{pmatrix} \sigma_x^2 & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^2 \end{pmatrix}, \tag{1.2}$$

then the correlation is ρ. Here the parameter ρ is termed the association parameter, and Σ is parameterized to include ρ in the off–diagonals unless mentioned otherwise.

Correlation is a measure of association that is not applicable to all distributions, notably those with heavy tails such as bivariate T distributions with two degrees of freedom, bivariate Cauchy distributions, and bivariate stable distributions. This is due to its requirement of finite second moments. In some situations, however, an analogue of correlation can be applied when this moment requirement is not met. For example, in time series, Brockwell & Davis (1991) use the analogue of auto–correlation when considering time series with heavy tailed innovations, where the auto–correlation does not exist. This analogue is a function of the time series coefficients and has the same notation as the auto–correlation. It would be the auto–correlation if the innovations had finite variances.

The history of the correlation coefficient is diverse, with many historical curiosities intertwined with its development. Rodgers & Nicewander (1988) give a brief overview of these curiosities, which span nearly two centuries. Porter (1986) notes that a French astronomer, Auguste Bravais, was perhaps the first to coin the term correlation in 1846, but it was really the work of Francis Galton and Karl Pearson that paved the way for estimating correlation. Galton’s interest in inherited characteristics and his experiments led him to realize that he required mathematical tools to explain the variation in the linear associations that he was finding. Pearson aided in this matter and is probably best known for his product moment correlation coefficient $\hat{\rho}_p$ (Porter, 1986), which estimates the correlation of a bivariate normal population. Pearson (1920) gives a personal account of the development of $\hat{\rho}_p$ and Stigler (1989) discusses Galton’s account. Pearson’s product moment correlation coefficient, $\hat{\rho}_p$, is calculated for a sample $(x_1, y_1)^T, (x_2, y_2)^T, \ldots, (x_n, y_n)^T$ from a normal distribution with mean µ and covariance matrix Σ by

$$\hat{\rho}_p = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}. \tag{1.3}$$
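As a minimal sketch of equation (1.3), the estimate can be computed directly from the sums of deviations about the sample means. The function name `pearson_r` and the toy data are illustrative assumptions and are not taken from the thesis; the sketch uses only the Python standard library.

```python
import math


def pearson_r(xs, ys):
    """Product moment correlation of equation (1.3) for paired samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of cross-products of deviations from the sample means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the sums of squared deviations.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical toy data, chosen only to exercise the formula.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
print(pearson_r(xs, ys))
```

For these toy data the cross-product sum is 8 and both sums of squares are 10, so the value is 8/10. A constant sample would make the denominator zero; the sketch does not guard against that case.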

There is a huge literature on the properties of $\hat{\rho}_p$. Devlin, Gnanadesikan & Kettenring (1975) note that $\hat{\rho}_p$ is very sensitive to outliers. Abdullah (1990) remarks that this is illustrated in Chernick (1982) using influence functions. Here we offer a simple illustration. Figure 1.2 contains a sample of size 50 from a bivariate normal distribution with $\hat{\rho}_p = 0.48$. If one outlier at (−4, 4) is added then $\hat{\rho}_p$ reduces to −0.0008.
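The effect of a single outlier can be reproduced in spirit with a short simulation. This is only a sketch under stated assumptions: the population value ρ = 0.5, the standard normal margins, and the seed are all illustrative choices, since the parameters behind Figure 1.2 are not given in the text, so the printed values will not match 0.48 and −0.0008.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient, equation (1.3)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def bivariate_normal_sample(n, rho, rng):
    """Draw n pairs with standard normal margins and correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        xs.append(x)
        # y mixes x with independent noise so that Cor(x, y) = rho.
        ys.append(rho * x + math.sqrt(1.0 - rho ** 2) * z)
    return xs, ys


rng = random.Random(2007)  # arbitrary seed, for repeatability only
xs, ys = bivariate_normal_sample(50, 0.5, rng)  # rho = 0.5 is an assumption

r_clean = pearson_r(xs, ys)
r_contaminated = pearson_r(xs + [-4.0], ys + [4.0])  # one outlier at (-4, 4)

print("without outlier:", round(r_clean, 4))
print("with outlier:   ", round(r_contaminated, 4))
```

The outlier lies in the second quadrant, so it contributes a large negative cross-product while inflating both sums of squares, which pulls a positive estimate towards zero.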

[Figure 1.2: Example of the Influence of Outliers on $\hat{\rho}_p$. Scatter plot of y against x.]

The discussion of the robustness of $\hat{\rho}_p$ is a lengthy one, mostly focussing on the effects of non–normality on $\hat{\rho}_p$. Kowalski (1972) presents a broad historical survey of this discussion of robustness. He notes that in reviewing the literature there are approximately equally large camps claiming robustness or lack thereof. Yuan & Bentler (2000) also discuss some more recent developments in the debate on the robustness of $\hat{\rho}_p$.

While it is agreed upon by many that $\hat{\rho}_p$ is quite robust to non–normality when ρ = 0, this stability breaks down for non–zero values of ρ (Kowalski, 1972). There are many situations where $\hat{\rho}_p$ is greatly affected. One such instance is when the kurtosis of the population deviates greatly from that of the normal, where Kowalski (1972) notes that the variance of the estimate is either higher or lower than the value given by the normal theory. Thus the estimate is not robust to heavy tails. Kraemer (1980) notes this as well, along with two other requirements for the robustness of $\hat{\rho}_p$: linearity and homoscedasticity. To deal with the lack of robustness it is possible to use non–parametric estimates, which are more robust. Three such estimates are suggested in the next three subsections.

1.1.3 Spearman’s Rho

Renyi’s axioms state that if one were to use a non–parametric measure of association then the distributional assumption issues with ρ would be nullified by R1. One of the early non–parametric, or rank based, estimates of association is Spearman’s rho (Spearman, 1904). Essentially this estimate of association is based on calculating Pearson’s product moment correlation on ranked data. When the assumption of normality fails and/or when practitioners are working with ordinal data, the use of Pearson’s linear correlation is unjustified. Various rank based measures were presented near the turn of the last century to deal with this; one was Spearman’s rho, which estimates the population association measure $\varrho_S$. (To disambiguate between the rho (ρ) that is used to denote association and the measure that Spearman’s rho is estimating, a script rho with a subscript S, $\varrho_S$, is used to denote the latter.) The measure which Spearman’s rho estimates can be written as (Kruskal, 1958)

$$\varrho_S = 6P\left((X_1 - X_2)(Y_1 - Y_3) > 0\right) - 3, \tag{1.4}$$

where $(X_1, Y_1)^T$, $(X_2, Y_2)^T$, $(X_3, Y_3)^T$ are independent copies of the bivariate random vector $(X, Y)^T$. This is arrived at from the definition

$$\varrho_S = 3P\left((X_1 - X_2)(Y_1 - Y_3) > 0\right) - 3P\left((X_1 - X_2)(Y_1 - Y_3) < 0\right),$$

which is equivalent to (1.4) because, for continuous $(X, Y)^T$, the two probabilities sum to one. Kruskal (1958) gives two interpretations of $\varrho_S$, with one being more natural (according to the author) than the other. In that interpretation, $\varrho_S$ is defined as the standardized difference in the probabilities that at least one observation out of three is concordant with the other two and at least one is discordant with the other two. Here concordance is defined as the slope between two points being positive, and discordance as the slope being negative. Assuming that the bivariate population is normal, Renyi’s seventh axiom applied to $\varrho_S$ leads to the relationship

$$\varrho_S = \frac{6}{\pi}\sin^{-1}(\rho/2). \tag{1.5}$$

Kruskal (1958) shows this result, and in Chapter 2 the result is shown based on the orthant probability of the normal distribution. To estimate $\varrho_S$, Spearman (1904) suggests the following procedure: assign ranks $R(x_i)$ to $x_i$ and ranks $R(y_i)$ to $y_i$; then $\varrho_S$ can be estimated by calculating $\hat{\rho}_p$ on the ranks. When there are no ties an alternative form is (Conover, 1999; Hollander & Wolfe, 1999)

$$\hat{\varrho}_S = 1 - \frac{6\sum_{i=1}^{n} \left(R(x_i) - R(y_i)\right)^2}{n(n^2 - 1)}. \tag{1.6}$$
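The two routes to Spearman’s estimate described above, Pearson’s coefficient on the ranks and the no–ties shortcut (1.6), can be checked against each other with a small sketch. The helper names `ranks`, `spearman_rho`, and `pearson_r` and the toy data are illustrative assumptions; the sketch rejects tied values rather than handling them.

```python
import math


def ranks(values):
    """Ranks 1..n of the values; this sketch assumes there are no ties."""
    if len(set(values)) != len(values):
        raise ValueError("this sketch assumes no tied values")
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rho(xs, ys):
    """Spearman's estimate via the no-ties form, equation (1.6)."""
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))


def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient, equation (1.3)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical toy data without ties.
xs = [0.3, -1.2, 2.5, 0.9, -0.4, 1.7]
ys = [0.1, -0.8, 1.9, 1.4, 0.6, -0.2]

via_formula = spearman_rho(xs, ys)
via_ranks = pearson_r(ranks(xs), ranks(ys))
print(via_formula, via_ranks)
```

For these data the squared rank differences sum to 14, so (1.6) gives 1 − 84/210 = 0.6, and Pearson’s coefficient on the ranks gives the same value up to rounding.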

Inverting the function linking ρ and $\varrho_S$ and substituting the estimate $\hat{\varrho}_S$ for $\varrho_S$, it is possible to form an estimate of ρ for a normal population based on $\hat{\varrho}_S$, namely,

$$\hat{\rho}_s = 2\sin\left(\frac{\pi\hat{\varrho}_S}{6}\right), \tag{1.7}$$

where the subscript s denotes that this estimate for the association parameter is based on Spearman’s rho.
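That (1.7) really is the inverse of (1.5) can be verified numerically with a round trip. The function names below are illustrative assumptions; the sketch only evaluates the two formulas and makes no claim about sampling behaviour.

```python
import math


def spearman_population_value(rho):
    """Equation (1.5): varrho_S for a bivariate normal with parameter rho."""
    return 6.0 / math.pi * math.asin(rho / 2.0)


def rho_from_spearman(varrho_s_hat):
    """Equation (1.7): estimate of rho obtained by inverting (1.5)."""
    return 2.0 * math.sin(math.pi * varrho_s_hat / 6.0)


# Map rho to varrho_S and back again for a few illustrative values.
for rho in (-0.9, -0.5, 0.0, 0.5, 0.9):
    varrho_s = spearman_population_value(rho)
    print(rho, round(varrho_s, 4), round(rho_from_spearman(varrho_s), 4))
```

The middle column shows that $\varrho_S$ is slightly smaller in magnitude than ρ for the normal distribution, which is why the raw rank correlation is transformed rather than used directly as an estimate of ρ.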

1.1.4 Kendall’s Tau

Another non–parametric measure of association is τ. It has a rich history of discovery and rediscovery. It is not uncommon for concepts to be developed and then languish in obscurity, only to be independently discovered and promoted later on. The history of τ and the estimate $\hat{\tau}$ follows such a path. It was originally proposed by Fetchner (1897) as a measure of association for rank variables, and measures similar to τ are used in Lipps (1905) and March (1905), among others, around the turn of the twentieth century. March (1905) even uses terminology similar to that of Kendall (1938): he uses concordant and discordant pairs in his formulation of an index of dependence between time series, and Kendall uses the same terms in his paper. In the 1920s, Lindberg (1925, 1929) and Esscher (1924) both considered the estimation of τ. At the end of the decade a graduate student at Mount Allison University, S.D. Homes, proposed a graphical method for estimating the linear correlation coefficient and developed an accompanying formula which is in fact $\hat{\tau}$ (Sandiford, 1929). This method was not followed up on, as it was found to be computationally laborious for all but small samples and there was uncertainty amongst the authors as to the standard error of the estimate (Griffin, 1958).

The definition of the measure associated with Kendall’s tau is simpler than that of $\varrho_S$. Suppose that $(X_1, Y_1)^T$ and $(X_2, Y_2)^T$ are independent copies of $(X, Y)^T$; then

$$\tau = P\left(\frac{Y_1 - Y_2}{X_1 - X_2} > 0\right) - P\left(\frac{Y_1 - Y_2}{X_1 - X_2} < 0\right).$$

[...]

In a sample, a pair of points $(x_i, y_i)^T$ and $(x_j, y_j)^T$ is concordant if $(y_j - y_i)/(x_j - x_i) > 0$ and discordant if $(y_j - y_i)/(x_j - x_i) < 0$. When there are no tied x values nor tied y values, the total number of concordant pairs $N_c$ and the total number of discordant pairs $N_d$ add up to n(n − 1)/2, so in the case of no ties, Kendall (1938) defines the following estimate of association

$$\hat{\tau} = \frac{N_c - N_d}{n(n - 1)/2}. \tag{1.10}$$
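The counting of concordant and discordant pairs in (1.10) can be sketched as a direct loop over all n(n − 1)/2 pairs. The function name `kendall_tau` and the toy data are illustrative assumptions; the sketch rejects ties and uses the sign of the product of differences, which agrees with the sign of the slope when there are no ties.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's estimate of equation (1.10); assumes no ties in xs or ys."""
    n = len(xs)
    if len(set(xs)) != n or len(set(ys)) != n:
        raise ValueError("this sketch assumes no tied x values or y values")
    n_c = n_d = 0
    for i, j in combinations(range(n), 2):
        # The sign of this product equals the sign of the slope between
        # points i and j, which decides concordance or discordance.
        s = (xs[j] - xs[i]) * (ys[j] - ys[i])
        if s > 0:
            n_c += 1
        else:
            n_d += 1
    return (n_c - n_d) / (n * (n - 1) / 2)


# Hypothetical toy data without ties.
xs = [0.3, -1.2, 2.5, 0.9, -0.4, 1.7]
ys = [0.1, -0.8, 1.9, 1.4, 0.6, -0.2]
print(kendall_tau(xs, ys))
```

For these six points, 11 of the 15 pairs are concordant and 4 are discordant, so the estimate is 7/15. The loop is quadratic in n, which is consistent with the remark above that early methods were laborious for all but small samples.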

It should be noted that this estimate is appropriate when there are no ties, and it is on occasion referred to as $\tau_a$. In samples with ties, an estimate $\tau_b$ may be used, and for contingency tables $\tau_c$ is used. Neither of these last two measures is discussed here; the methods can be seen in Conover (1999), Hollander & Wolfe (1999), or Sheskin (2004), and are briefly discussed in Appendix A. In Greiner (1909), Esscher (1924), and Kendall (1949, 1970), it is suggested that the relationship outlined in R7 may be used to create an estimate of ρ by inverting the relationship and using the sample value $\hat{\tau}$ for τ, that is,

$$\hat{\rho}_g = \sin\left(\frac{\pi\hat{\tau}}{2}\right). \tag{1.11}$$

The subscript g is used to highlight that this estimate of ρ is based on Greiner’s relationship. Thus, in the normal case this joins $\hat{\rho}_s$ as another alternative to $\hat{\rho}_p$ for estimating ρ based on rank based association measures.
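Equation (1.11) is a one–line transformation of Kendall’s estimate. A minimal sketch follows; the function name `greiner_rho` is an illustrative assumption, and the input values are hypothetical values of $\hat{\tau}$ rather than results from the thesis.

```python
import math


def greiner_rho(tau_hat):
    """Equation (1.11): estimate of rho from Kendall's tau-hat."""
    if not -1.0 <= tau_hat <= 1.0:
        raise ValueError("tau-hat must lie in [-1, 1]")
    return math.sin(math.pi * tau_hat / 2.0)


# Hypothetical tau-hat values; tau-hat = 1/3 corresponds to sin(pi/6).
for tau_hat in (-1.0, -1.0 / 3.0, 0.0, 1.0 / 3.0, 1.0):
    print(round(tau_hat, 4), round(greiner_rho(tau_hat), 4))
```

The map is odd and strictly increasing on [−1, 1], so the sign and ordering of $\hat{\tau}$ are preserved, while intermediate values are stretched away from zero; for example, $\hat{\tau} = 1/3$ corresponds to an estimate of 0.5.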

1.1.5 Quadrant Measure

A final measure of association, which has its roots in overcoming the complexities of calculating $\hat{\tau}$ and $\hat{\varrho}_S$ for large samples at a time when computational power was cost prohibitive, is Blomqvist’s quadrant measure of association, κ. It only requires knowledge of the proportion of the population that falls into each of the four quadrants around the median of the distribution. The population interpretation is given by Kruskal (1958) and is probabilistic, as with τ and $\varrho_S$. Suppose that the bivariate random variable $(X, Y)^T$ has marginal medians $\tilde{\mu}_X$ and $\tilde{\mu}_Y$; then κ is defined as

$$\kappa = P\left((X - \tilde{\mu}_X)(Y - \tilde{\mu}_Y) > 0\right) - P\left((X - \tilde{\mu}_X)(Y - \tilde{\mu}_Y) < 0\right). \tag{1.12}$$

13 The interpretation of κ is the probability that an observation is in the first and third quadrant about the median minus the probability of it being in the second and fourth quadrant. Kruskal (1958) showed that for bivariate normal distributions 2 sin−1 (ρ) , κ= π

(1.13)

which is the same as for τ . Kruskal (1958) remarks that the quadrant measure κ can be estimated naturally by calculating the sample medians and using the four quadrants around the sample medians (e x, ye)T to calculate the number of sample points in each quadrant. Denote Ni as the number of points in each quadrant, then κ b=

N1 + N3 − N2 − N4 . N1 + N3 + N2 + N4

(1.14)

When n is even and there are no ties then this defines the sample statistic unambiguously and the denominator is simply n. To make the definition precise for odd sample sizes, Blomqvist (1950) suggested that if (e x, ye)T = (xj , yj )T for some j then neglect that particular point in the calculation. And, if (e x, ye)T = (xj , yk )T for some j 6= k then count the two sample points (xj , yj )T , (xk , yk )T as one point and assign it to the quadrant that is touched by both points. Thus using κ b it is possible to estimate ρ as with τb and %bS , that is,  ρbq = sin

πb κ 2

 ,

(1.15)

where the subscript q denotes that this estimate of ρ is based on the quadrant measure of association.
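A minimal Python sketch of (1.14) and (1.15) follows. The names `blomqvist_kappa` and `quadrant_rho` are assumptions made for this illustration. The sketch is intended for even n with no ties; as a simplification, any point lying on a median line is dropped, so Blomqvist’s rule for merging two points when n is odd is not implemented.

```python
import math
import statistics


def blomqvist_kappa(xs, ys):
    """Quadrant estimate (N1 + N3 - N2 - N4) / (N1 + N3 + N2 + N4) about the sample medians."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length n >= 2")
    x_med = statistics.median(xs)
    y_med = statistics.median(ys)
    agree = 0     # points in the first or third quadrant
    disagree = 0  # points in the second or fourth quadrant
    for x, y in zip(xs, ys):
        s = (x - x_med) * (y - y_med)
        if s > 0:
            agree += 1
        elif s < 0:
            disagree += 1
        # s == 0: the point lies on a median line and is dropped (simplification)
    if agree + disagree == 0:
        raise ValueError("no points fall strictly inside a quadrant")
    return (agree - disagree) / (agree + disagree)


def quadrant_rho(xs, ys):
    """Estimate of rho based on the quadrant measure, as in (1.15)."""
    return math.sin(math.pi * blomqvist_kappa(xs, ys) / 2)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 2, 3, 5, 4, 6, 7, 8]
kappa_hat = blomqvist_kappa(xs, ys)  # 6 points agree, 2 disagree
rho_q = quadrant_rho(xs, ys)
print(kappa_hat, rho_q)
```

Only two medians and one pass over the data are needed, which is the computational advantage over $\hat{\tau}$ described above.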

1.2 Thesis Outline

There has been little discussion of the estimates proposed here based on non–parametric measures of association. In this thesis, these estimates and the relationships among them are explored to provide a detailed description of their benefits and, on occasion, their pitfalls. The relationship between τ and ρ, occasionally referred to as Greiner’s relationship, is the most well known of these relationships, and even then it is rarely mentioned. Subsequent to the discussion of Greiner’s relationship in the early part of the twentieth century, not many references to the estimate $\hat{\rho}_g$ have been made. Kendall (1970) mentions the relationship along with the relationship between Spearman’s $\varrho_S$ and ρ for the bivariate normal distribution. Kruskal (1958) mentions the two relationships that Kendall (1970) mentions, along with a relationship between the coefficient of medial correlation and ρ. Newson (2001) briefly mentions the relationship between ρ and τ as a method for creating robust confidence intervals for ρ. Rupinski & Dunlap (1996) use Greiner’s relationship to deal with the presence of $\hat{\tau}$ instead of $\hat{\rho}_p$ when undertaking a meta-analysis. Lindskog (2000) and Lindskog, McNeil & Schmock (2002) show that the relationship between τ and ρ holds for the elliptical family and give a short simulation study of the usage of $\hat{\rho}_g$ in comparison to other estimates of the association based on the estimated root mean squared error. Evandt et al. (2004) also discuss the estimated root mean squared error of $\hat{\rho}_g$ in the case of the bivariate normal distribution being contaminated by a few outliers. Apart from the work on Greiner’s relationship, the other non–parametric estimates for ρ mentioned here are underrepresented in the literature. This collection of non–parametric estimates of ρ is considered in this thesis and the benefits of the various estimates are discussed.

1.2.1 Chapter 2: Results

In Chapter 2, the elliptical distributions are introduced and the parameter ρ is discussed in detail. The relationships between κ or τ and ρ are noted to hold for all elliptical distributions and a note is given as to why the relationship does not hold for $\varrho_S$ when the population is not normal. Results for the various measures of association and their corresponding estimates are given. These include the bias and variance of the estimates when available. By considering the bias in the estimate of Spearman’s rho, a fourth estimate of ρ is given which is based on $\hat{\tau}$ and $\hat{\varrho}_S$ and is proposed in Kendall (1970).

1.2.2 Chapter 3: Simulation Study

To determine the differences among the various estimates of ρ based on non–parametric measures of association, a simulation study is undertaken. This entails simulating various bivariate samples from elliptical distributions and calculating the bias, variance and root mean square error of each estimate. The main aim is to show that when there are heavy tails, these estimates perform better than the commonly used $\hat{\rho}_p$. To highlight the robustness to outliers in normal samples, a simulation expanding the simulation of Evandt et al. (2004) is conducted. Based on the simulations, suggestions as to the use of the various estimates are discussed for various situations.

1.2.3 Chapter 4: Applications

To illustrate the use of the proposed estimates of ρ, some practical examples are considered. These examples involve accurately estimating the association for the log–returns of daily foreign exchange rates. A comparison is also made with one method for estimating the association when the distributions are elliptically stable; there the robust estimates provide a different and perhaps more appropriate estimate of association.

Finally, a brief discussion is provided highlighting the conclusions arrived at in this thesis using the non–parametric estimates of association. This focuses on recommended usage and possible future work on the subject. The appendices include a discussion on how to obtain standard errors for our non–parametric estimates of association using the bootstrap, and the code in the R statistical computing language for the estimates and the simulations.

Chapter 2 Properties of Various Estimates of ρ

In Chapter 1 three non–parametric measures of association are presented along with their relationships with the correlation parameter ρ of the bivariate normal distribution. Here the more general elliptical family of distributions is presented. Some analytic notes are provided for the properties of the three proposed estimates and a fourth estimate is proposed based on the bias of $\hat{\varrho}_S$.

2.1 Elliptical Distributions

One of the most commonly used multivariate distributions is the multivariate normal distribution. The elliptical family can be thought of as a generalization of the normal distribution. The name refers to the shape of the contours of the density function: as with the bivariate normal density, all the contours are elliptical. There are many elliptical distributions; three well known ones are the normal, the T, and the Cauchy distributions. The normal is a member of the broader class of Kotz–type distributions, which is included in the elliptical family of distributions along with the Pearson type II and VII classes of distributions. Other elliptical distributions include the multivariate symmetric stable distributions, the logistic distribution, and the multi–uniform distribution. A comprehensive list is given in Table 3.1 in Fang, Kotz & Ng (1990), or in Jensen (2006).

With the breadth of distributions that fall into the elliptical family there are many avenues for applications. For example, the T and the stable distributions are commonly used in finance to account for excessively heavy tails (Nolan, 2003; Lamantia, Ortobelli & Rachev, 2006; Breymann, Dias & Embrechts, 2003). The multivariate Cauchy has applications in physics, where it is occasionally referred to as the Lorentz distribution. Nadarajah (2003) notes that Kotz–type distributions are being used more frequently in fields such as ecology, mathematical finance, and signal processing.

2.1.1 Definition

Fang, Kotz & Ng (1990) note that there are numerous equivalent definitions of an elliptical distribution. Some equivalent definitions can be seen in Cambanis, Huang & Simons (1981) and Chmielewski (1981). The latter also gives a review and bibliography for early papers on elliptically symmetric distributions. More recent references regarding elliptical distributions are available in Fang (2006). Fang, Kotz & Ng (1990) define an elliptical distribution based on a spherical distribution, which is an extension of the standard multivariate normal distribution $N_p(0, I_p)$, where the location vector is the origin and the covariance matrix is the identity matrix. Cambanis, Huang & Simons (1981) and Fang (2006) define elliptical distributions directly via their characteristic functions. When the density function exists, Kelker (1970) defines elliptical and spherical distributions using the probability density function. In this thesis the development from Fang, Kotz & Ng (1990) is followed.

Spherical Distributions

Definition 1 A random p–vector X is said to have a spherical (one could say circular in the bivariate case) distribution if for every p × p orthogonal matrix Γ

$$\Gamma X \overset{d}{=} X,$$

where $\overset{d}{=}$ stands for equality in distribution.

In Theorem 2.1 of Fang, Kotz & Ng (1990) two equivalent conditions for X to be spherically distributed are presented, namely, X has a spherical distribution if and only if the characteristic function $\varphi_X(t) = E(e^{it^T X})$ satisfies $\varphi_X(t) = \varphi_X(\Gamma^T t)$ for every p × p orthogonal matrix Γ, or there exists a scalar function φ(·) such that

$$\varphi_X(t) = \phi(t^T t). \tag{2.1}$$

The function φ(·) uniquely determines the type of spherical distribution X has and is termed the characteristic generator. The spherical random p–vector is denoted by $S_p(\phi)$.

Example (Bivariate Standard Normal): A standardized bivariate normal distribution $Z \sim N_2(0, I_2)$ is a spherical distribution because $\varphi_Z(t) = \exp\{(-1/2)t^T t\}$ and thus $\phi(a) = \exp\{(-1/2)a\}$.

By Theorem 2.5 of Fang, Kotz & Ng (1990) another equivalent form for $X \sim S_p(\phi)$ is

$$X \overset{d}{=} R\,U^{(p)}, \tag{2.2}$$

where R ≥ 0 is a random variable independent of $U^{(p)}$, which is uniformly distributed on the surface of the unit sphere in p dimensions. For details regarding the relationship between R and φ(·) see Theorem 2.2 in Fang, Kotz & Ng (1990). R may also be found by considering the result of Theorem 2.3 in Fang, Kotz & Ng (1990), where $X \overset{d}{=} R\,U^{(p)}$ implies $\|X\| \overset{d}{=} R$ independent of $X/\|X\| \overset{d}{=} U^{(p)}$. Using this result it can be shown that for $Z \sim N_2(0, I_2)$, $Z \overset{d}{=} R\,U^{(2)}$, where $R^2 \sim \chi^2_2$ has a chi–squared distribution with 2 degrees of freedom.

Elliptical Distributions

Elliptical distributions can be defined as extensions of spherical distributions in the same way that the standard normal is extended to the general normal distribution.

Definition 2 A random p–vector X is said to have an elliptical distribution if

$$X \overset{d}{=} \mu + A^T Y, \tag{2.3}$$

where $Y \sim S_k(\phi)$, µ is a p × 1 vector, and A is a k × p matrix (in the bivariate case of X, p = 2) satisfying $\Sigma = A^T A$ and Rank(Σ) = k. Such an elliptical distribution is denoted $X \sim E_p(\mu, \Sigma, \phi)$. An equivalent definition in Cambanis, Huang & Simons (1981) gives the characteristic function of $X \sim E_p(\mu, \Sigma, \phi)$ as

$$\varphi_X(t) = \exp\{it^T\mu\}\,\phi(t^T\Sigma t). \tag{2.4}$$
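The constructions in (2.2) and (2.3) can be illustrated by simulation. The Python sketch below is an illustration under stated assumptions and not the thesis code: it draws a bivariate spherical normal vector as R·U with $R^2 \sim \chi^2_2$, and then forms $\mu + A^T Y$ with $A^T$ taken to be the lower-triangular factor of a 2 × 2 matrix Σ with diagonal entries $\sigma_x^2$, $\sigma_y^2$ and off-diagonal entry $\rho\sigma_x\sigma_y$. The function names and this choice of A are assumptions made for the sketch.

```python
import math
import random


def spherical_normal_draw(rng):
    """One draw Y = R * U in two dimensions, with U uniform on the unit circle
    and R^2 chi-squared with 2 degrees of freedom, so that Y ~ N2(0, I2)."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(-2.0 * math.log(1.0 - rng.random()))  # R^2 ~ chi-squared(2)
    return r * math.cos(angle), r * math.sin(angle)


def elliptical_draw(mu, sigma_x, sigma_y, rho, rng, spherical=spherical_normal_draw):
    """One draw X = mu + A^T Y, where A^T A = Sigma and Y is spherical."""
    y1, y2 = spherical(rng)
    # A^T is the lower-triangular factor of Sigma:
    # [[sigma_x, 0], [rho * sigma_y, sigma_y * sqrt(1 - rho^2)]]
    x = mu[0] + sigma_x * y1
    y = mu[1] + sigma_y * (rho * y1 + math.sqrt(1.0 - rho * rho) * y2)
    return x, y


rng = random.Random(2007)
sample = [elliptical_draw((1.0, -1.0), 2.0, 1.0, 0.6, rng) for _ in range(20000)]
print(sample[:3])
```

Passing a different `spherical` function (a different distribution for R) would give other members of the elliptical family with the same µ and Σ.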

Example (Normal Distribution): If $X \sim N_2(\mu, \Sigma)$, then the characteristic function of X is $\varphi_X(t) = \exp\{it^T\mu\}\exp\{-t^T\Sigma t/2\}$, hence X is elliptically distributed.

Example (Cauchy Distribution): If X has a multivariate Cauchy distribution with location µ and association matrix Σ then its characteristic function is

$$\varphi_X(t) = \exp\{it^T\mu\}\exp\left\{-\sqrt{t^T\Sigma t}\right\}.$$

Thus the Cauchy is elliptically distributed. The bivariate Cauchy is a special case of the bivariate T distribution, which has a more complicated form for its characteristic function that varies depending on whether the degrees of freedom are even, odd, or fractional, and thus is not reproduced here.¹ The Cauchy is also a special case of the elliptically symmetric stable distributions.

¹ See Theorem 3.10 in Fang, Kotz & Ng (1990) for the characteristic functions for even, odd and fractional degrees of freedom. A detailed proof is provided in Sutradhar (1986, 1988).

Example (Stable Distribution, Nolan, 2003): A random p–vector X with independent copies $X_1$, $X_2$ is said to be stable if it is stable under addition, that is, there exist constants a > 0, b > 0, and c such that $X \overset{d}{=} aX_1 + bX_2 + c$. Stable distributions are usually represented by their characteristic functions. If the distribution of X is symmetric and has a characteristic function of the form

$$\varphi_X(t) = \exp\{it^T\mu\}\exp\left\{-(t^T\Sigma t)^{\alpha/2}\right\},$$

then X is an elliptical stable distribution with tail–index α.

As with the multivariate normal, Fang, Kotz & Ng (1990) note that the marginal distributions of an elliptical distribution are elliptical. If $X \sim E_p(\mu, \Sigma, \phi)$ where $X = (X_{(1)}^T, X_{(2)}^T)^T$, $\mu = (\mu_{(1)}^T, \mu_{(2)}^T)^T$ and

$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},$$

then $X_{(1)} \sim E_q(\mu_{(1)}, \Sigma_{11}, \phi)$ and $X_{(2)} \sim E_r(\mu_{(2)}, \Sigma_{22}, \phi)$, where p = q + r.

2.1.2 The Shape Parameter ρ

When $(X, Y)^T \sim N_2(\mu, \Sigma)$ with Σ parameterized as

$$\Sigma = \begin{pmatrix} \sigma_x^2 & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^2 \end{pmatrix}, \tag{2.5}$$

the parameter ρ is interpreted as the correlation. For the more general bivariate elliptical distributions this is not necessarily the case. To understand the meaning of ρ in the elliptical case, a brief discussion on how it can be thought of as both the shape and association parameter, and how it affects the conditional distributions of bivariate elliptical distributions, is given below.

For a bivariate elliptical distribution $(X, Y)^T \sim E_2(\mu, \Sigma, \phi)$ we can also write Σ in the same form as (2.5). For Σ to remain non-negative definite ρ needs to be in the range [−1, 1]. Geometrically the parameter ρ can be thought of as the shape parameter in transforming a spherical distribution to an elliptical distribution. The parameters $\sigma_x$ and $\sigma_y$ can be thought of as scaling transformation parameters which spread or shrink the spherical distribution. The parameter µ denotes a shift of the distribution from the origin. Figure 2.1 illustrates the effects of ρ and $\sigma_x$ for a T distribution. It shows that ρ is responsible for changing the shape and the angle of the contours of the bivariate density function and that $\sigma_x$ scales the distribution along the x–axis ($\sigma_y$ would do the same for the y–axis).

[Figure: four contour-plot panels with axes x and y, each running from −3 to 3. Panel titles: σx = 1, σy = 1, ρ = 0; σx = 1, σy = 1, ρ = 0.6; σx = 2, σy = 1, ρ = 0; σx = 2, σy = 1, ρ = 0.6.]

Figure 2.1: The Effect of ρ and σx on the Contours of the Density Function for a Bivariate T–Distribution with 4 Degrees of Freedom

The parameter ρ also plays a role in the covariance and hence the correlation for the elliptical distributions. When the distribution has finite second moments the covariance matrix is found to be (Theorem 2.17 in Fang, Kotz & Ng, 1990)

$$\mathrm{Cov}\left((X, Y)^T\right) = -2\phi'(0)\Sigma.$$

Thus when ρ = 0 the covariance matrix is a diagonal matrix. It should be noted though that only when the distribution is normal does this result in independence. For example, for a bivariate T distribution with Σ equal to the identity matrix, the density function is (Kotz & Nadarajah, 2004)

$$f(x, y) = \frac{1}{2\pi}\left[1 + \frac{1}{\nu}\left(x^2 + y^2\right)\right]^{-(\nu+2)/2},$$

where ν is the degrees of freedom. This joint density function cannot be factored as the product of the marginal densities, so X and Y are dependent even when ρ = 0. The covariance is also affected by any additional parameters passed to φ(·).

In considering how ρ affects the conditional distributions it is first noted that a conditional distribution of a given elliptical distribution is still elliptical, but the generator φ(·) is not the same as the original unless the distribution is normal. From Corollary 5 in Cambanis, Huang & Simons (1981), if $(X, Y)^T \sim E_2(\mu, \Sigma, \phi)$ then the location of the conditional distribution of Y given X = x is

$$\mu_{Y|x} = \mu_y + \rho\frac{\sigma_y}{\sigma_x}(x - \mu_x), \tag{2.6}$$

and the conditional spread is

$$\sigma^2_{Y|x} = \sigma_y^2(1 - \rho^2). \tag{2.7}$$
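The remark above that ρ = 0 does not imply independence for the bivariate T can be checked numerically. The Python sketch below compares the joint density with Σ = I against the product of two univariate T densities with ν degrees of freedom. That the marginals are univariate T with ν degrees of freedom is a standard property assumed here rather than taken from the text, and the function names are hypothetical.

```python
import math


def bivariate_t_density(x, y, nu):
    """Joint density of the bivariate T with nu degrees of freedom and Sigma = I."""
    return (1.0 + (x * x + y * y) / nu) ** (-(nu + 2.0) / 2.0) / (2.0 * math.pi)


def t_density(x, nu):
    """Univariate T density with nu degrees of freedom (the assumed marginal)."""
    log_c = math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
    kernel = (1.0 + x * x / nu) ** (-(nu + 1.0) / 2.0)
    return math.exp(log_c) / math.sqrt(nu * math.pi) * kernel


nu = 4
joint = bivariate_t_density(0.0, 0.0, nu)
product = t_density(0.0, nu) * t_density(0.0, nu)
print(joint, product)  # the two values differ, so the density does not factor
```

As ν grows the two quantities approach each other, in line with the normal being the only case where ρ = 0 gives independence.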

We see that the conditional location $\mu_{Y|x}$ is the marginal location $\mu_y$ plus a portion ρ of the change in x. When ρ = 0, the conditional location $\mu_{Y|x}$ is the marginal location $\mu_y$ (changes in x do not play a role in determining $\mu_{Y|x}$). As to the conditional spread $\sigma^2_{Y|x}$, it has nothing to do with the spread in x, equals a portion $(1 - \rho^2)$ of the marginal spread $\sigma_y^2$, and becomes zero when ρ = ±1, implying that for any given x, the values of Y do not spread out when ρ = ±1. So the parameter ρ has a similar interpretation in terms of location and spread for the elliptical distributions to the interpretation of ρ for a normal distribution in terms of mean and variance. It is in this sense that we say that ρ measures the linear relationship between X and Y.

For example, consider a special case where $(X, Y)^T$ has a central bivariate T distribution with ν degrees of freedom, where

$$\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}.$$

The joint density function is (Kotz & Nadarajah, 2004)

$$f(x, y) = \frac{1}{2\pi\sqrt{1 - \rho^2}}\left[1 + \frac{1}{\nu(1 - \rho^2)}\left(x^2 + y^2 - 2\rho xy\right)\right]^{-(\nu+2)/2}.$$

Kotz & Nadarajah (2004) give the conditional density function of Y | X = x as

$$f(y|x) = \frac{\Gamma((\nu + 2)/2)}{\Gamma((\nu + 1)/2)\sqrt{\pi(\nu + x^2)}\sqrt{1 - \rho^2}}\left[1 + \frac{(y - \rho x)^2}{(\nu + x^2)(1 - \rho^2)}\right]^{-(\nu+2)/2}.$$

The conditional location is ρx and the conditional spread is $1 - \rho^2$. By examining the density function it is noted in Kotz & Nadarajah (2004) that Y | X has a T distribution with ν + 1 degrees of freedom when x = ±1; otherwise Y | X still has an elliptical (symmetric) distribution. This is an example of the result in Cambanis, Huang & Simons (1981) that the generator φ(·) of the joint distribution is not the same as the generator of the conditional distribution.

2.1.3 The Orthant Probability of Elliptical Distributions

Definition 3 The orthant probability of a multivariate random variable X centered at the origin is $P(X \geq 0) = P(X_1 \geq 0, \ldots, X_p \geq 0)$.

Using calculus and two changes of variables as in Lindskog (2000) it is simple to find that for $(X, Y) \sim N_2(0, \Sigma)$ the orthant probability of the normal distribution is

$$P(X > 0, Y > 0) = \frac{1}{4} + \frac{\sin^{-1}\rho}{2\pi},$$

and for a general covariance matrix $\Sigma^{\star} = \begin{pmatrix} \alpha & \theta \\ \theta & \beta \end{pmatrix}$ the orthant probability is

$$\frac{1}{4} + \frac{1}{2\pi}\sin^{-1}\left(\frac{\theta}{\sqrt{\alpha\beta}}\right). \tag{2.8}$$

Example 2.9 in Fang, Kotz & Ng (1990) states that the orthant probability of any non–degenerate member of the elliptical family of distributions is the same as that of the normal. The following result leads directly to showing that the orthant probability of an elliptical distribution is the same as that of the normal distribution. Theorem 2.5 in Lindskog (2000) states that every elliptical distribution has a stochastic representation related to the normal. If the non–degenerate random variable $X \sim E_p(\mu, \Sigma, \phi)$ then X can be stochastically represented as

$$X \overset{d}{=} \mu + RY, \tag{2.9}$$

where $Y \sim N_p(0, \Sigma)$ and need not be independent of R > 0. The orthant probability of a bivariate elliptical distribution can then be shown to be the same as that of the normal by considering $(X_1, X_2) \sim E_2(0, \Sigma, \phi)$ and $(Y_1, Y_2) \sim N_2(0, \Sigma)$. Then using the representation in (2.9),

$$\begin{aligned}
P(X_1 > 0, X_2 > 0) &= P(RY_1 > 0, RY_2 > 0) \\
&= \int_0^\infty P(rY_1 > 0, rY_2 > 0 \mid R = r)\,dF_R(r) \\
&= \int_0^\infty P(Y_1 > 0, Y_2 > 0 \mid R = r)\,dF_R(r) \\
&= P(Y_1 > 0, Y_2 > 0) = \frac{1}{4} + \frac{\sin^{-1}\rho}{2\pi},
\end{aligned}$$

where the third equality holds because multiplying by r > 0 does not change the signs of $Y_1$ and $Y_2$.
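The invariance of the orthant probability can be checked by simulation. The Python sketch below is a rough illustration under assumptions, not the thesis code: it evaluates the closed form for unit scales and compares it with Monte Carlo estimates for a bivariate normal and for a bivariate T generated as a normal vector divided by an independent $\sqrt{\chi^2_\nu/\nu}$ (a standard construction assumed here). The function names are hypothetical.

```python
import math
import random


def orthant_probability(rho):
    """Closed form 1/4 + asin(rho) / (2 pi) for a centered bivariate elliptical vector."""
    return 0.25 + math.asin(rho) / (2.0 * math.pi)


def simulate_orthant(rho, n, rng, nu=None):
    """Monte Carlo estimate of P(X > 0, Y > 0) for a centered bivariate normal
    (nu=None) or a bivariate T with nu degrees of freedom, unit scales."""
    hits = 0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = z1
        y = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
        if nu is not None:
            # Dividing by a positive scale gives heavy tails but leaves signs unchanged.
            w = math.sqrt(rng.gammavariate(nu / 2.0, 2.0) / nu)
            x, y = x / w, y / w
        if x > 0 and y > 0:
            hits += 1
    return hits / n


rng = random.Random(1)
print(orthant_probability(0.5))
print(simulate_orthant(0.5, 20000, rng))
print(simulate_orthant(0.5, 20000, rng, nu=3))
```

The comment on the division step mirrors the argument in the derivation: a positive radial factor cannot move a point out of its quadrant.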

2.1.4 Relationships with ρ

To show that the relationships between τ or κ and ρ hold for elliptical distributions one additional result is needed. From Theorem 2.16 in Fang, Kotz & Ng (1990), if $X \sim E_p(\mu, \Sigma, \phi)$ then for a p × q matrix A and a q × 1 vector b,

$$b + A^T X \sim E_q(b + A^T\mu, A^T\Sigma A, \phi). \tag{2.10}$$

To show that the relationship between τ and ρ holds for elliptical distributions, consider two independent copies $(X_1, Y_1)^T$ and $(X_2, Y_2)^T$ from $E_2(\mu, \Sigma, \phi)$. Then

$$(X_1, Y_1, X_2, Y_2)^T \sim E_4\left(\begin{pmatrix} \mu \\ \mu \end{pmatrix}, \begin{pmatrix} \Sigma & 0 \\ 0 & \Sigma \end{pmatrix}, \phi\right).$$

Define

$$A^T = \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{pmatrix},$$

then

$$(\tilde{X}, \tilde{Y})^T = A^T (X_1, Y_1, X_2, Y_2)^T = \begin{pmatrix} X_1 - X_2 \\ Y_1 - Y_2 \end{pmatrix} \sim E_2(0, 2\Sigma, \phi).$$

Rewrite τ as

$$\tau = P\left((X_1 - X_2)(Y_1 - Y_2) > 0\right) - P\left((X_1 - X_2)(Y_1 - Y_2) < 0\right),$$

which is equivalent to the definition given in Chapter 1 when there are no ties. Using $(\tilde{X}, \tilde{Y})^T$, τ can further be rewritten as

$$\begin{aligned}
\tau &= P(\tilde{X}\tilde{Y} > 0) - P(\tilde{X}\tilde{Y} < 0) = 2P(\tilde{X}\tilde{Y} > 0) - 1 \\
&= 2\left(P(\tilde{X} > 0, \tilde{Y} > 0) + P(\tilde{X} < 0, \tilde{Y} < 0)\right) - 1 = 4P(\tilde{X} > 0, \tilde{Y} > 0) - 1,
\end{aligned}$$

which is a function of the orthant probability of $(\tilde{X}, \tilde{Y})^T$. By the results on the orthant probability, which by (2.8) depends on 2Σ only through ρ, we have

$$\tau = 4\left(\frac{1}{4} + \frac{\sin^{-1}\rho}{2\pi}\right) - 1 = \frac{2\sin^{-1}\rho}{\pi},$$

which is the relationship that Greiner (1909) finds. This is referred to as Greiner’s relationship linking τ and ρ.

Since the location parameter of an elliptical distribution is the median of the distribution, subtracting the median from $(X, Y)^T \sim E_2(\mu, \Sigma, \phi)$ results in $(\tilde{X}, \tilde{Y})^T \sim E_2(0, \Sigma, \phi)$. Using the definition of κ given in Chapter 1, we have

$$\begin{aligned}
\kappa &= P\left((X - \tilde{\mu}_X)(Y - \tilde{\mu}_Y) > 0\right) - P\left((X - \tilde{\mu}_X)(Y - \tilde{\mu}_Y) < 0\right) \\
&= P(\tilde{X}\tilde{Y} > 0) - P(\tilde{X}\tilde{Y} < 0) \\
&= P(\tilde{X} > 0, \tilde{Y} > 0) + P(\tilde{X} < 0, \tilde{Y} < 0) - P(\tilde{X} < 0, \tilde{Y} > 0) - P(\tilde{X} > 0, \tilde{Y} < 0) \\
&= 2\left(\frac{1}{4} + \frac{\sin^{-1}\rho}{2\pi}\right) - 2\left(\frac{1}{4} - \frac{\sin^{-1}\rho}{2\pi}\right) \\
&= \frac{2\sin^{-1}\rho}{\pi}.
\end{aligned}$$

For $\varrho_S$ it seems intuitive based on the definition in Chapter 1 that the relationship holds for all elliptical distributions. This is not the case though. Fang, Fang & Kotz (2002) show via the copula definition of $\varrho_S$ that while $\varrho_S$ does depend on ρ, it also depends on the marginal distribution of the elliptical distributions. A thorough introduction to copulas is necessary to explain the reasoning behind this result and is beyond the scope of this discussion. Though this is a setback in using estimates based on $\hat{\varrho}_S$, these estimates are still investigated to illustrate the extent of this extra dependence. This is made evident in the simulations in Chapter 3 and is discussed there.

For a bivariate normal random variable it is possible to show that the relationship is based on the orthant probability. Consider the distribution of

$$\begin{pmatrix} \tilde{X} \\ \tilde{Y} \end{pmatrix} = \begin{pmatrix} X_1 - X_2 \\ Y_1 - Y_3 \end{pmatrix} = \begin{pmatrix} 1 & 0 & -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & -1 \end{pmatrix}(X_1, Y_1, X_2, Y_2, X_3, Y_3)^T,$$

where $(X_i, Y_i)^T \sim N_2(\mu, \Sigma)$, i = 1, 2, 3, are independent. The covariance matrix of $(\tilde{X}, \tilde{Y})^T$ is

$$\begin{pmatrix} 2\sigma_x^2 & \sigma_x\sigma_y\rho \\ \sigma_x\sigma_y\rho & 2\sigma_y^2 \end{pmatrix}.$$

By (2.8), the corresponding orthant probability for a normal distribution with such a covariance matrix is $\frac{1}{4} + \frac{\sin^{-1}(\rho/2)}{2\pi}$, so that

$$P(\tilde{X}\tilde{Y} > 0) = \frac{1}{2} + \frac{\sin^{-1}(\rho/2)}{\pi}.$$

By the definition of $\varrho_S$ the relationship between $\varrho_S$ and ρ when the population is normal is

$$\varrho_S = 6P\left((X_1 - X_2)(Y_1 - Y_3) > 0\right) - 3 = 6P(\tilde{X}\tilde{Y} > 0) - 3 = 6\left(\frac{1}{2} + \frac{\sin^{-1}(\rho/2)}{\pi}\right) - 3 = \frac{6\sin^{-1}(\rho/2)}{\pi}.$$

Even though this relationship is only applicable when the population is normal, it is used in this thesis to create estimates for ρ when the population is elliptical. This is done to determine how much of an effect the deviation from normality has on estimates based on this relationship.
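The population relationships derived in this section, together with their inverses, are simple enough to tabulate directly. The Python sketch below is only an illustration; the function names are assumptions made here, and the Spearman relationship is valid for the normal case only, as noted above.

```python
import math


def tau_from_rho(rho):
    """Greiner's relationship: tau = 2 asin(rho) / pi (also the value of kappa)."""
    return 2.0 * math.asin(rho) / math.pi


def spearman_from_rho(rho):
    """Normal-case relationship: rho_S = 6 asin(rho / 2) / pi."""
    return 6.0 * math.asin(rho / 2.0) / math.pi


def rho_from_tau(tau):
    """Inverse of Greiner's relationship (the same inverse applies to kappa)."""
    return math.sin(math.pi * tau / 2.0)


def rho_from_spearman(rho_s):
    """Inverse of the normal-case Spearman relationship."""
    return 2.0 * math.sin(math.pi * rho_s / 6.0)


for rho in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(rho, tau_from_rho(rho), spearman_from_rho(rho))
```

Each inverse applied to a sample value of the corresponding measure gives one of the estimates of ρ studied in this thesis.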

2.2 Pearson’s Rho

Chapter 1 introduces Pearson’s rho as an estimate for ρ. For a bivariate sample $(x_1, y_1)^T, (x_2, y_2)^T, \ldots, (x_n, y_n)^T$ with $\bar{x} = n^{-1}\sum_{i=1}^n x_i$ and $\bar{y} = n^{-1}\sum_{i=1}^n y_i$, Pearson’s estimate of ρ is

$$\hat{\rho}_p = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2 \sum_{i=1}^n (y_i - \bar{y})^2}}. \tag{2.11}$$

2.2.1 Assumptions

Sheskin (2004) gives a list of assumptions on which $\hat{\rho}_p$ is based. First, the sample measurements must be of the interval or ratio level. Second, the sample should be drawn from a normal distribution. Thus there should be linearity, homoscedasticity, and elliptical shape in the population. In Chapter 1 it is mentioned that there is a large collection of work regarding the robustness of $\hat{\rho}_p$ to deviations from normality. Kowalski (1972) concludes that $\hat{\rho}_p$ should be limited to cases where the bivariate random variable is normal, or near normal. Based on this it is noted that there should be very few extreme observations in the sample, as the normal rarely admits observations more than three standard deviations from the mean. Thus, as with most estimates that are based on a specific distribution, deviations from the assumed distribution lead to less than desirable results. In finance the normality assumption is rarely met for most observed returns (Cont, 2000; among others); thus Pearson’s estimate does not perform as well as it does for normal samples.

2.2.2 Distribution

It can be shown that $\hat{\rho}_p$ has an asymptotically normal distribution. Fisher (1915) gives the exact distribution of $\hat{\rho}_p$ when sampling from a normal distribution. The density function involves an infinite series and is improved upon by Hotelling (1953), whose infinite series converges faster than Fisher’s. Hotelling (1953) derived the exact density of $\hat{\rho}_p$ as (reported in Krishnamoorthy, 2006)

$$f_{\hat{\rho}_p}(r) = \frac{(n - 2)\Gamma(n - 1)}{\sqrt{2\pi}\,\Gamma(n - (1/2))}(1 - \rho^2)^{(n-1)/2}(1 - r^2)^{(n/2)-2}(1 - r\rho)^{-n+(3/2)} \times \sum_{j=0}^{\infty}\frac{\Gamma(j + (1/2))\Gamma(j + (1/2))\Gamma(n - (1/2))((1 + r\rho)/2)^j}{\Gamma(1/2)\Gamma(1/2)\Gamma(n - (1/2) + j)\,j!}.$$

Chapter 1 includes a discussion on the robustness of this distribution to non–normal samples, and a conclusion can be drawn that it is affected by even moderate perturbations of normality, especially heavy tails as these are related to the kurtosis. It is possible to use a normal distribution to approximate the distribution of $\hat{\rho}_p$ when the assumption is that ρ = 0 (Sheskin, 2004). For ρ ≠ 0, to overcome the difficulty in using the complicated distribution of $\hat{\rho}_p$ the use of Fisher’s Z–transform is suggested (Wackerly, Mendenhall & Scheaffer, 2002; Sheskin, 2004). Fisher’s Z–transform normalizes and stabilizes $\hat{\rho}_p$. It does, though, still suffer from many of the same robustness issues as $\hat{\rho}_p$. Gayen (1951) notes that the Z–transform is also affected by non–normality, mostly in the mean and variance. An improvement on Fisher’s Z–transformation is considered in Hotelling (1953) along with the moments of the transform. This thesis does not concentrate on the transformation; it considers the bias, variance, and RMSE of the actual estimate $\hat{\rho}_p$.

Bias

The estimate $\hat{\rho}_p$ is a biased estimate of ρ. When the underlying distribution is a bivariate normal the expected value is an infinite series of which Hotelling (1953) gives the first few terms as

$$E(\hat{\rho}_p) = \rho + (1 - \rho^2)\left[\frac{-\rho}{2n} + \frac{\rho(1 - 9\rho^2)}{8n^2} + \frac{\rho(1 + 42\rho^2 - 75\rho^4)}{16n^3} + \ldots\right]. \tag{2.12}$$

The bias is then approximately

$$\mathrm{Bias}(\hat{\rho}_p) = E(\hat{\rho}_p) - \rho \approx \frac{\rho(\rho^2 - 1)}{2n}, \tag{2.13}$$

which is an odd function of ρ.

Variance

Hotelling (1953) also gives the variance of $\hat{\rho}_p$ as an infinite series,

$$V(\hat{\rho}_p) = (1 - \rho^2)^2\left[\frac{1}{n} + \frac{11\rho^2}{2n^2} + \frac{-24\rho^2 + 75\rho^4}{2n^3} + \ldots\right]. \tag{2.14}$$

Thus the RMSE of the estimate $\hat{\rho}_p$ when sampling from a normal distribution is approximately

$$\mathrm{RMSE}(\hat{\rho}_p) = \sqrt{V(\hat{\rho}_p) + \mathrm{Bias}^2(\hat{\rho}_p)} \approx (1 - \rho^2)\sqrt{\frac{1}{n} + \frac{\rho^2}{4n^2}}. \tag{2.15}$$

While it is possible to ascertain results such as the RMSE for $\hat{\rho}_p$ for the normal case, it is not so simple in other contexts. It is well understood that this estimate is greatly affected by deviations from normality and this is illustrated via simulations in Chapter 3.
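For reference, Pearson’s estimate (2.11) and the leading-order approximations (2.13) and (2.15) can be written out as below. This Python sketch is an illustration only; the function names are assumptions, and the two approximations apply to normal samples as stated above.

```python
import math


def pearson_rho(xs, ys):
    """Pearson's product moment estimate of rho, as in (2.11)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def approx_bias(rho, n):
    """Leading term of the bias under normality, as in (2.13)."""
    return rho * (rho ** 2 - 1.0) / (2.0 * n)


def approx_rmse(rho, n):
    """Approximate RMSE under normality, as in (2.15)."""
    return (1.0 - rho ** 2) * math.sqrt(1.0 / n + rho ** 2 / (4.0 * n ** 2))


print(pearson_rho([1, 2, 3, 4], [1, 3, 2, 4]))
print(approx_bias(0.5, 30), approx_rmse(0.5, 30))
```

The bias term is an order of magnitude smaller than the RMSE for moderate n, so the approximate RMSE is driven almost entirely by the variance.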

2.3 Kendall’s Tau

Like all non–parametric estimates of association, there are no specific distributional assumptions made in the construction of $\hat{\tau}$. The only assumption is that the data are at least ordinal. In Section 1.1.4 the estimation procedure that Kendall (1938) proposed was introduced for the case of no ties. Throughout the remainder of this thesis it is assumed that the samples are drawn from continuous distributions and hence the possibility of a tie is negligible. To formally work with the estimate $\hat{\tau}$ of the non–parametric association measure τ, the notation of Hollander & Wolfe (1999) is adopted. For a bivariate sample $(x_1, y_1)^T, (x_2, y_2)^T, \ldots, (x_n, y_n)^T$, the estimate of τ that corresponds to Kendall’s estimate can be written as

$$\hat{\tau} = \frac{2}{n(n - 1)}\sum_{i<j} Q\left((x_i, y_i)^T, (x_j, y_j)^T\right),$$

where

$$Q\left((x_i, y_i)^T, (x_j, y_j)^T\right) = \mathrm{sign}\left(\frac{y_j - y_i}{x_j - x_i}\right), \qquad \mathrm{sign}(t) = \begin{cases} 1 & \text{if } t > 0, \\ 0 & \text{if } t = 0, \\ -1 & \text{if } t < 0. \end{cases}$$

This estimate is invariant to monotone transformations of the data; in particular it is invariant to the ranking transformation and hence may be easily calculated based on ranks.

2.3.1 Distribution

Kendall (1938) shows that under the null hypothesis that τ = 0, $\hat{\tau}$ is asymptotically normally distributed. For finite samples Conover (1999) gives tables of exact upper and lower quantiles for n ≤ 60, and Kendall (1970) notes that the normal curve “provides a satisfactory approximation” (Kendall, 1970, p. 51) for samples larger than 10. In the non–null case (τ ≠ 0) Hoeffding (1947) shows that $\hat{\tau}$ is also asymptotically normally distributed. It should be noted that a larger n is needed for a normal approximation to be appropriate in this case.

Bias

$\hat{\tau}$ is an unbiased estimate for τ. To see this, consider $Q\left((X_1, Y_1)^T, (X_2, Y_2)^T\right)$ to find

$$\begin{aligned}
E\left(Q\left((X_1, Y_1)^T, (X_2, Y_2)^T\right)\right) &= \int_{\mathbb{R}^4} Q\left((x_1, y_1)^T, (x_2, y_2)^T\right)dF_{XY}(x_1, y_1)\,dF_{XY}(x_2, y_2) \\
&= \int_{S^+} dF_{XY}(x_1, y_1)\,dF_{XY}(x_2, y_2) - \int_{S^-} dF_{XY}(x_1, y_1)\,dF_{XY}(x_2, y_2) \\
&= P\left(\frac{Y_1 - Y_2}{X_1 - X_2} > 0\right) - P\left(\frac{Y_1 - Y_2}{X_1 - X_2} < 0\right) = \tau,
\end{aligned}$$

where

$$S^+ = \left\{\left((x_1, y_1)^T, (x_2, y_2)^T\right) : (y_1 - y_2)/(x_1 - x_2) > 0\right\}, \qquad S^- = \left\{\left((x_1, y_1)^T, (x_2, y_2)^T\right) : (y_1 - y_2)/(x_1 - x_2) < 0\right\}.$$

Based on this result,

$$E(\hat{\tau}) = \frac{2}{n(n - 1)}\sum_{i<j} E\left(Q\left((X_i, Y_i)^T, (X_j, Y_j)^T\right)\right) = \tau.$$

[…]

$$\ldots = P\left(\mathrm{sign}\left(\frac{X_i - \tilde{x}}{Y_i - \tilde{y}}\right) > 0\right) - P\left(\mathrm{sign}\left(\frac{X_i - \tilde{x}}{Y_i - \tilde{y}}\right) < 0\right) = \kappa,$$

where

$$S^+ = \left\{(x_i, y_i) : (x_i - \tilde{x})/(y_i - \tilde{y}) > 0\right\}, \qquad S^- = \left\{(x_i, y_i) : (x_i - \tilde{x})/(y_i - \tilde{y}) < 0\right\},$$

thus

$$E(\hat{\kappa}) = n^{-1}\sum_{i=1}^n E(Q(X_i, Y_i)) = \kappa, \tag{2.24}$$

and hence $\hat{\kappa}$ is an unbiased estimate for κ.

Variance

To calculate the variance of $\hat{\kappa}$, note that $E(Q_i^2) = 1$ and, since $Q_i$ and $Q_j$ are independent, $E(Q_iQ_j) = E(Q_i)E(Q_j) = \kappa^2$. Thus the variance of $\hat{\kappa}$ is

$$\begin{aligned}
V(\hat{\kappa}) &= n^{-2}E\left[\left(\sum_{i=1}^n Q_i\right)^2\right] - \kappa^2 = n^{-2}\left(\sum_{i=1}^n E(Q_i^2) + 2\sum_{i<j} E(Q_iQ_j)\right) - \kappa^2 \\
&= n^{-2}\left(n + \frac{2n(n - 1)\kappa^2}{2}\right) - \kappa^2 = \frac{1 - \kappa^2}{n}.
\end{aligned}$$

[…] > 0) is $(1/2) + \sin^{-1}\rho/\pi$, then the result would be the same as if κ were replaced by its relationship with ρ.

The variance of this non–parametric estimate of association is much higher than that of $\hat{\tau}$. Blomqvist (1950) notes this and gives the asymptotic relative efficiency between the two estimates as 4/9 when ρ = 0. Thus it is expected that the estimate of ρ based on $\hat{\kappa}$ has larger variance than the other estimates.

2.4.2 Quadrant Estimate $\hat{\rho}_q$ of ρ

Using the estimate for κ and inverting the relationship between ρ and κ results in an estimate for ρ based on $\hat{\kappa}$,

$$\hat{\rho}_q = \sin\left(\frac{\pi\hat{\kappa}}{2}\right).$$

The bias, variance and RMSE can be expressed using the same Taylor expansion as with $\hat{\rho}_g$. Since none of the higher moments of $\hat{\kappa}$ depend on the specific elliptical distribution, it is expected that the bias, variance, and RMSE are the same for any elliptical distribution. This is illustrated in Chapter 3.

2.5 Spearman’s Rho

Spearman’s rho requires ranking each of the marginal samples and then calculating Pearson’s product moment estimate on the ranked data. This leads to an estimate of the non–parametric measure of association $\varrho_S$. The popularity of this rank-based estimate is in part due to its simple calculation.

2.5.1 Distribution

Under the assumption of independence the distribution of %bS is approximately normal. Conover (1999) gives a good overview of the test of independence based on %bS along with τb. In the non–null case it is discussed in Sheskin (2004) that Fisher’s Z–transformation can be used for tests based on %bS when n ≥ 10 and %S < 0.9.

Bias

Unlike the previous two non–parametric estimates of association, the estimate $\hat{\varrho}_S$ is a biased estimate for $\varrho_S$. Kendall (1970) shows that

$$E(\hat{\varrho}_S) = \frac{n-2}{n+1}\,\varrho_S + \frac{3\tau}{n+1}, \tag{2.26}$$

so the bias is

$$\mathrm{Bias}(\hat{\varrho}_S) = \frac{3(\tau - \varrho_S)}{n+1}. \tag{2.27}$$

Asymptotically $\hat{\varrho}_S$ is unbiased, but for finite samples it is not. An alternative, more natural and unbiased estimate for $\varrho_S$ is suggested in Kruskal (1958), which is related to the estimate of τ in that it requires the consideration of the concordance of all $\binom{n}{3}$ triples of the sample points. The estimate proves to be equivalent to

$$\frac{n+1}{n-2}\,\hat{\varrho}_S - \frac{3}{n-2}\,\hat{\tau}.$$

While the benefits of this estimate are that it is unbiased and is derived in a more natural manner (Kruskal, 1958), the estimate $\hat{\varrho}_S$ is simpler to compute and is more widely adopted and hence is used as the estimate for $\varrho_S$.

Variance

Kendall (1970) notes that there does not exist a formula for the variance of the estimate $\hat{\varrho}_S$ and gives an upper bound for the variance when no assumptions are made about the sampling distribution,

$$V(\hat{\varrho}_S) \leq \frac{3(1 - \varrho_S^2)}{n}. \tag{2.28}$$

The calculation of the variance of $\hat{\varrho}_S$ when sampling from a normal distribution is quite involved and has been given in the form of an infinite series. Kendall (1970) gives the following expression, where ρ is the correlation of the bivariate normal,

$$V(\hat{\varrho}_S) = \frac{1}{n}\left(1 - 1.5635\rho^2 + 0.3047\rho^4 + 0.1553\rho^6 + 0.0616\rho^8 + 0.0242\rho^{10} + \dots\right). \tag{2.29}$$

This expansion has been improved upon in various works, for example, in David & Mallows (1961).
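As a small numerical companion to (2.27)–(2.29) and to Kruskal’s unbiased estimate, the formulas can be written as plain functions. This is a sketch only; the function names are hypothetical, and the series in (2.29) is truncated at the ρ^10 term exactly as printed above.

```python
def spearman_bias(tau, rho_s, n):
    """Finite-sample bias of Spearman's estimate, equation (2.27)."""
    return 3.0 * (tau - rho_s) / (n + 1)


def kruskal_unbiased(rho_s_hat, tau_hat, n):
    """Kruskal's unbiased estimate, written in terms of the two sample measures."""
    return (n + 1) / (n - 2) * rho_s_hat - 3.0 / (n - 2) * tau_hat


def spearman_variance_bound(rho_s, n):
    """Distribution-free upper bound on the variance, equation (2.28)."""
    return 3.0 * (1.0 - rho_s ** 2) / n


def spearman_variance_normal(rho, n):
    """Truncated series (2.29) for the variance under bivariate normality."""
    coefficients = [1.0, -1.5635, 0.3047, 0.1553, 0.0616, 0.0242]
    series = sum(c * rho ** (2 * k) for k, c in enumerate(coefficients))
    return series / n
```

Substituting the expectation (2.26) for the sample value in `kruskal_unbiased` returns $\varrho_S$ exactly, which is one way to confirm that the correction removes the bias.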

2.5.2

Spearman’s Estimate $\hat{\rho}_s$ of ρ

To estimate ρ using Spearman’s rho it is possible to use $\hat{\varrho}_S$ as the estimate for $\varrho_S$, thus giving the estimate

$$\hat{\rho}_s = 2\sin\left(\frac{\pi\hat{\varrho}_S}{6}\right),$$

with the subscript s to denote that this is based on Spearman’s rho. Since the non–null distribution of $\hat{\varrho}_S$ is quite complicated, the bias, variance and RMSE are illustrated in Chapter 3.
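The mapping from Spearman’s rho to ρ, and its inverse, can be sketched as follows. The function names are hypothetical; the inverse is obtained simply by solving the displayed formula for $\varrho_S$.

```python
import math


def rho_from_spearman(rho_s_hat):
    """Spearman-based estimate of rho: 2 * sin(pi * rho_S / 6)."""
    return 2.0 * math.sin(math.pi * rho_s_hat / 6.0)


def spearman_from_rho(rho):
    """Inverse relationship: rho_S = (6 / pi) * arcsin(rho / 2)."""
    return 6.0 / math.pi * math.asin(rho / 2.0)
```

The transformation maps the interval [−1, 1] onto itself and fixes −1, 0 and 1, so the estimate always falls in the admissible range for ρ.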

2.5.3

Kendall’s Estimate $\hat{\rho}_k$ of ρ

Accounting for the bias in estimating $\varrho_S$ by $\hat{\varrho}_S$, Kendall (1970) suggests an alternative estimate for the association ρ, namely,

$$\hat{\rho}_k = 2\sin\left(\frac{\pi}{6}\left[\hat{\varrho}_S - \frac{3(\hat{\tau} - \hat{\varrho}_S)}{n-2}\right]\right), \tag{2.30}$$

with the subscript k denoting that this estimate was proposed by Kendall. As with the other non–parametric estimates of ρ, the properties of this estimate are also best ascertained via simulation.

All together the five estimates that are considered in this thesis for estimating ρ are $\hat{\rho}_p$, $\hat{\rho}_g$, $\hat{\rho}_q$, $\hat{\rho}_s$ and $\hat{\rho}_k$. In Chapter 3, simulation results are used to study the bias, variance and root mean square error of these estimates.
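For orientation, the five estimates can be computed side by side as in the sketch below. Several things here are assumptions rather than the thesis’ own definitions, since the defining sections are not reproduced at this point: Greiner’s relation is taken in its standard form ρ = sin(πτ/2), the quadrant relation as ρ = sin(πκ/2), the quadrant measure is computed about the sample medians, and ties are assumed absent. All function names are hypothetical.

```python
import math
from statistics import median


def pearson(x, y):
    """Pearson's product moment correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, assuming no ties (continuous data)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def sign(value):
    return (value > 0) - (value < 0)


def kendall_tau(x, y):
    """Kendall's tau: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += sign((x[i] - x[j]) * (y[i] - y[j]))
    return 2.0 * total / (n * (n - 1))


def quadrant_kappa(x, y):
    """Quadrant measure about the sample medians (an assumed definition)."""
    mx, my = median(x), median(y)
    return sum(sign((a - mx) * (b - my)) for a, b in zip(x, y)) / len(x)


def five_estimates(x, y):
    """Return the estimates of rho labelled p, g, q, s and k."""
    n = len(x)
    rho_s_hat = pearson(ranks(x), ranks(y))
    tau_hat = kendall_tau(x, y)
    kappa_hat = quadrant_kappa(x, y)
    corrected = rho_s_hat - 3.0 * (tau_hat - rho_s_hat) / (n - 2)  # inner term of (2.30)
    return {
        "p": pearson(x, y),
        "g": math.sin(math.pi * tau_hat / 2.0),
        "q": math.sin(math.pi * kappa_hat / 2.0),
        "s": 2.0 * math.sin(math.pi * rho_s_hat / 6.0),
        "k": 2.0 * math.sin(math.pi * corrected / 6.0),
    }
```

Kendall’s tau is computed here with a simple double loop over all pairs, which is adequate for the sample sizes considered in the simulations but would be slow for very large n.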

Chapter 3

Simulation Study

In the previous chapter it is seen that analytically determining the bias, variance, and hence the root mean square error (RMSE) of many of the estimators is not possible even in the normal case, let alone for other elliptical distributions. For this reason a thorough simulation study of the estimators is undertaken.

Previous simulations using the estimate $\hat{\rho}_g$ are undertaken in Evandt et al. (2004), Lindskog (2000), and Rupinski & Dunlap (1996), with the latter also considering $\hat{\rho}_s$. Lindskog (2000) considers three distributions, the normal and the bivariate T with one or three degrees of freedom, for which the mean square errors of both $\hat{\rho}_p$ and $\hat{\rho}_g$ are empirically estimated via simulation for samples of size n = 30, 90, 300 and values of ρ = 0.1, 0.3, 0.5, 0.7, 0.9. Rupinski & Dunlap (1996) use Greiner’s relationship, and the relationship between Spearman’s rho and ρ, to deal with both Kendall’s tau and Spearman’s rho when doing a meta-analysis based on $\hat{\rho}_p$. Their findings are that the biases and standard errors of the resulting estimators are small enough and comparable enough that the two relationships may be used instead of simply ignoring analyses that use Kendall’s tau or Spearman’s rho. Evandt et al. (2004) suggest using Greiner’s relationship to construct what they term “a little known robust estimator” for the correlation coefficient, ρ, when a normal sample is contaminated by a few outliers. It is shown in a small simulation that the estimate of ρ using $\hat{\tau}$ and Greiner’s relationship is at least as good as Spearman’s rho.

The estimates $\hat{\rho}_g$, $\hat{\rho}_q$, $\hat{\rho}_s$, and $\hat{\rho}_k$ are all robust estimates for the association parameter ρ. Their robustness to outliers is illustrated by considering various heavy–tailed elliptical and contaminated normal distributions.

Table 3.1: Estimates Used in the Simulation

Estimate            Name
$\hat{\rho}_p$      Pearson
$\hat{\rho}_g$      Greiner
$\hat{\rho}_q$      Quadrant or Blomqvist
$\hat{\rho}_s$      Spearman
$\hat{\rho}_k$      Kendall

3.1

Simulation #1: Estimates for Elliptical Distributions

To determine some of the properties of these robust estimates of association for elliptical distributions various specific distributions are considered. First, results for the estimates of association when sampling from a bivariate normal distribution are given. The Cauchy and the bivariate T are also considered to give an illustration of the effect of heavy tails on the estimates.

3.1.1

Design

The design of this simulation in part follows the simulation in Evandt et al. (2004). To estimate the bias, variance, and the RMSE of the estimates in various conditions, $N = 10^5$ samples of the desired distribution are generated for sample sizes n = 20, 100 and for values of ρ = 0, 0.01, 0.02, . . . , 0.99, and the estimates $\hat{\rho}_p$, $\hat{\rho}_g$, $\hat{\rho}_q$, $\hat{\rho}_s$, and $\hat{\rho}_k$ are calculated for each sample. The bias is estimated by the mean of the estimates over all the samples for each value of ρ minus the true value. The variance is estimated as the variance of the estimates over all the samples for each value of ρ. All the R codes are included in Appendix B along with a brief discussion on how the random samples are generated. The RMSE is estimated by

$$\widehat{\mathrm{RMSE}}(\hat{\rho}) = \sqrt{\widehat{V}(\hat{\rho}) + \widehat{\mathrm{Bias}}(\hat{\rho})^2}, \tag{3.1}$$

where $\widehat{\mathrm{Bias}}(\hat{\rho})$ and $\widehat{V}(\hat{\rho})$ are the estimated bias and variance for any of the five estimates of ρ.

As with $\hat{\rho}_p$, the biases of the other four estimates appear to be odd functions of ρ and only the non-negative values of ρ are considered here. The distributions considered are the normal (NORM), the Cauchy (CAUC), and the T with degrees of freedom ν = 2, 4, 8, 12, 16, 20 (TDF2, . . . , TDF20). The results are presented graphically with the resulting curves for the bias, variance and RMSE smoothed using R’s built in spline smoother. In all the figures the bias, variance and RMSE presented are based on the simulation results and not the analytical results. Figure 3.1 presents a legend for the presentation of the simulation results, which is used for the majority of the figures presented in this chapter.

Figure 3.1: Legend for Simulation Results [legend not reproduced; it identifies the line styles for $\hat{\rho}_p$, $\hat{\rho}_g$, $\hat{\rho}_q$, $\hat{\rho}_s$ and $\hat{\rho}_k$]
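The estimation of the bias, variance and RMSE described in this design can be sketched as a small Monte Carlo loop. The sketch below is in Python rather than the R used in Appendix B, is restricted to the bivariate normal case with Pearson’s estimate, and uses far fewer replications than the thesis; the function names and the default seed are hypothetical.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson's product moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bivariate_normal_sample(n, rho, rng):
    """Draw n points from a standard bivariate normal with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return xs, ys


def simulate(estimator, rho, n, n_samples, seed=0):
    """Estimate bias, variance and RMSE of an estimator of rho, as in (3.1)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_samples):
        xs, ys = bivariate_normal_sample(n, rho, rng)
        estimates.append(estimator(xs, ys))
    bias = statistics.mean(estimates) - rho
    variance = statistics.variance(estimates)  # sample variance across replications
    rmse = math.sqrt(variance + bias ** 2)
    return bias, variance, rmse


bias, variance, rmse = simulate(pearson, rho=0.5, n=20, n_samples=2000)
print(f"bias={bias:.4f} variance={variance:.4f} rmse={rmse:.4f}")
```

Any function that takes two sequences and returns an estimate of ρ can be passed as `estimator`, so the same loop could be reused for the robust estimates.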

3.1.2

Normal

It is well documented that $\hat{\rho}_p$ is developed specially for estimating ρ when the sample is from a normal distribution. It is illustrated here that most of the robust estimates fare only marginally worse than $\hat{\rho}_p$. The samples are generated from a bivariate normal distribution with density function

$$f(x_1, x_2) = \frac{1}{2\pi|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right), \tag{3.2}$$

where $x = (x_1, x_2)^T$, the covariance matrix is standardized such that $\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$ and the mean is $\mu = (0, 0)^T$.

Figure 3.2: Bias, Variance, RMSE for NORM (n = 20, 100) [plots not reproduced; panels show bias, variance and RMSE against ρ for n = 20 and n = 100]

It is noted in Figure 3.2 that $\hat{\rho}_k$ is the least biased out of the five estimates for ρ and that $\hat{\rho}_g$ and $\hat{\rho}_p$ have nearly identical biases, with the difference between them being quite small compared to the differences between the biases of the other estimates. The biases do not contribute greatly to the RMSE. It is readily apparent that $\hat{\rho}_p$ has a lower variance and RMSE than all other estimates, as expected. The estimates $\hat{\rho}_g$, $\hat{\rho}_s$, and $\hat{\rho}_k$ all have nearly identical variance and RMSE, particularly for n = 100. The estimate based on the quadrant measure has a significantly higher variance and RMSE than all the other estimates. This can be attributed to the very simple design of the estimate and the large variability in $\hat{\kappa}$.

To better compare the RMSE, the ratios of the RMSE of $\hat{\rho}_g$, $\hat{\rho}_q$, $\hat{\rho}_s$ and $\hat{\rho}_k$ over the RMSE of $\hat{\rho}_p$ and the ratios of the RMSE of $\hat{\rho}_q$, $\hat{\rho}_s$, $\hat{\rho}_k$, and $\hat{\rho}_p$ over the RMSE of $\hat{\rho}_g$ are plotted in Figure 3.3.

Figure 3.3: Comparing Ratios of RMSE for NORM (n = 100) [plots not reproduced; two panels show Ratio against ρ: ratio of RMSE of estimates to RMSE of Pearson’s estimate, and ratio of RMSE of estimates to RMSE of Greiner’s estimate]

It is noted from Figure 3.3 that $\hat{\rho}_g$ has a lower RMSE than $\hat{\rho}_s$ when ρ is greater than approximately 0.4 and has a consistently lower RMSE than $\hat{\rho}_k$. This is illustrated throughout the following simulations. Overall, the estimate $\hat{\rho}_p$ performs the best among the five estimates. It is also found that $\hat{\rho}_g$ is a better robust estimate for large ρ.

3.1.3

Student’s T

There are many situations where the assumption of normality is not met. One of the most common issues is with the kurtosis of the sample: the tails are too heavy for the sample to have come from a normal distribution, but it may instead have come from some other elliptical distribution with heavier tails, such as a bivariate T distribution, a bivariate Cauchy distribution, or a bivariate stable distribution. In this section bivariate T distributions are considered over differing degrees of freedom. The Cauchy situation is dealt with in the next section.

The samples in this section are generated from the bivariate T distribution with ν degrees of freedom. The density function is

$$f(x_1, x_2) = \frac{1}{2\pi|\Sigma|^{1/2}} \left(1 + \frac{1}{\nu}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)^{-\left(1 + \frac{\nu}{2}\right)}, \tag{3.3}$$

where $x = (x_1, x_2)^T$, the association matrix is $\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$, and the location parameter µ is set to the origin $(0, 0)^T$.

By varying the degrees of freedom for the T distribution one varies the kurtosis, with a smaller value of the degrees of freedom implying a higher kurtosis and in turn heavier tails. The degrees of freedom considered are 2, 4, 8, 12, 16 and 20. The corresponding samples are labeled TDF2, TDF4, TDF8, TDF12, TDF16, and TDF20, respectively.

While it has not been considered here, it is possible to use numeric optimization to find maximum likelihood estimates for the location, scale and shape parameters. It would be expected that an estimate for ρ based on this approach would have better properties than those of $\hat{\rho}_p$ for T–distributions.

Depending on the degrees of freedom it is seen in the following figures that the bias, variance, and RMSE results vary for many of the estimators. This is discussed in more detail in Section 3.1.5.

Figure 3.4: Bias, Variance, RMSE for TDF2 (n = 20, 100) [plots not reproduced; panels show bias, variance and RMSE against ρ for n = 20 and n = 100]
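One common way to draw samples from the bivariate T distribution used in this section is to divide a correlated normal pair by the square root of an independent chi-square variate over its degrees of freedom. The sketch below follows that construction; it is an assumption that this matches the method described in Appendix B, and the function name `bivariate_t_sample` is hypothetical.

```python
import math
import random


def bivariate_t_sample(n, rho, nu, rng):
    """Draw n points from a bivariate T with nu degrees of freedom and shape rho."""
    xs, ys = [], []
    for _ in range(n):
        # Correlated standard normal pair.
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y1 = z1
        y2 = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
        # Chi-square(nu) is a gamma variate with shape nu/2 and scale 2.
        w = rng.gammavariate(nu / 2.0, 2.0)
        scale = math.sqrt(nu / w)  # the same scale multiplies both coordinates
        xs.append(y1 * scale)
        ys.append(y2 * scale)
    return xs, ys


rng = random.Random(2007)
xs, ys = bivariate_t_sample(5, rho=0.5, nu=4, rng=rng)
print(list(zip(xs, ys)))
```

Because both coordinates share one scale factor, the signs of the coordinates, and hence the quadrant each point falls in, are the same as for the underlying normal pair.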

Figure 3.5: Comparing Ratio of RMSE for TDF2 (n = 100) [plots not reproduced; two panels show Ratio against ρ: ratio of RMSE of estimates to RMSE of Pearson’s estimate, and ratio of RMSE of estimates to RMSE of Greiner’s estimate]

The results for ν = 2 are presented in Figures 3.4 and 3.5. It is apparent that all the robust estimates have smaller variance and RMSE than $\hat{\rho}_p$. The two estimates $\hat{\rho}_s$ and $\hat{\rho}_k$, which are based on $\hat{\varrho}_S$, have larger (in absolute value) biases than $\hat{\rho}_p$ for the larger sample size. For the smaller sample size, the biases of the robust estimates $\hat{\rho}_q$ and $\hat{\rho}_s$ are greater than that of $\hat{\rho}_p$. The results for $\hat{\rho}_g$ and $\hat{\rho}_q$ have not fluctuated much in comparison to the normal case. When considering where the RMSE for $\hat{\rho}_g$ becomes less than that of $\hat{\rho}_s$ and $\hat{\rho}_k$, it appears that this occurs near ρ = 0.4 (Figure 3.5) and is comparable to the result in Section 3.1.2. Overall it is noticed that for ν = 2 the estimates $\hat{\rho}_g$ and $\hat{\rho}_q$ are the least biased for large samples and that for all the robust estimates the variances and RMSEs are comparable to the normal results.

The results for ν = 4 are displayed in Figures 3.6 and 3.7. Pearson’s product moment correlation coefficient is less biased than all the robust estimates except $\hat{\rho}_g$, and $\hat{\rho}_k$ when n = 20. $\hat{\rho}_p$ now has a smaller variance and RMSE than $\hat{\rho}_q$. In terms of variance and RMSE little has changed from the ν = 2 case. The value of ρ where $\hat{\rho}_g$ has lower RMSE than $\hat{\rho}_s$ and $\hat{\rho}_k$ remains approximately 0.4. Overall the trend that $\hat{\rho}_p$ is improving, in the sense of its RMSE decreasing, is beginning. This is further emphasized as ν increases.

Figure 3.6: Bias, Variance, RMSE for TDF4 (n = 20, 100) [plots not reproduced; panels show bias, variance and RMSE against ρ for n = 20 and n = 100]

Figure 3.7: Comparing Ratio of RMSE for TDF4 (n = 100) [plots not reproduced; two panels show Ratio against ρ: ratio of RMSE of estimates to RMSE of Pearson’s estimate, and ratio of RMSE of estimates to RMSE of Greiner’s estimate]

It is apparent that when ν = 8 (Figure 3.8) the biases of $\hat{\rho}_g$ and $\hat{\rho}_p$ are nearly identical and that the biases of $\hat{\rho}_s$ and $\hat{\rho}_k$ are starting to decrease, with the bias of the latter decreasing at a faster rate. It is noted that even by ν = 20 the bias of $\hat{\rho}_k$ has not yet dropped below those of $\hat{\rho}_p$ and $\hat{\rho}_g$, as it does when the samples are from a normal distribution. For these higher values of ν it is apparent that by ν = 12 the variance and RMSE of $\hat{\rho}_p$ are comparable to those of $\hat{\rho}_g$, $\hat{\rho}_s$, and $\hat{\rho}_k$. By ν = 16 all of the proposed robust estimates have larger RMSE than $\hat{\rho}_p$, with the exception of $\hat{\rho}_s$ for very low values of ρ (Figures 3.9 and 3.10), and by ν = 20 all the robust estimates have higher RMSE than $\hat{\rho}_p$.

Figure 3.8: Bias, Variance, RMSE for TDF8, TDF12, TDF16, TDF20 (n = 100) [plots not reproduced; panels show bias, variance and RMSE against ρ for DF = 8, 12, 16 and 20]

Figure 3.9: Comparing Ratios of RMSE for the Estimates to the RMSE of $\hat{\rho}_p$ for TDF8, TDF12, TDF16, TDF20 (n = 100) [plots not reproduced; panels show Ratio against ρ for df = 8, 12, 16 and 20]

Figure 3.10: Comparing Ratios of RMSE for the Estimates to the RMSE of $\hat{\rho}_g$ for TDF8, TDF12, TDF16, TDF20 (n = 100) [plots not reproduced; panels show Ratio against ρ for df = 8, 12, 16 and 20]

Overall for the T distribution it is noticed that $\hat{\rho}_g$ and $\hat{\rho}_q$ are the most consistent. This fact is further illustrated in Section 3.1.5. The least biased of the estimates is consistently $\hat{\rho}_g$, with $\hat{\rho}_p$ and $\hat{\rho}_k$ closely approaching the bias of $\hat{\rho}_g$ as ν increases. In the case of $\hat{\rho}_p$ it eventually has lower bias than $\hat{\rho}_g$. Furthermore, as the degrees of freedom increase it is noticed that $\hat{\rho}_p$ starts to improve its performance to the point where for ν = 20 its variance and RMSE are consistently lower than those of all the robust estimates. This is also illustrated in Section 3.1.5 where the effect of changing ν on each estimate is presented.

When the sample is from a T distribution it is noticed that the properties of $\hat{\rho}_s$ and $\hat{\rho}_k$ fluctuate with ν. This is expected according to the caution in Fang, Fang & Kotz (2002) that the relationship between ρ and $\varrho_S$ is dependent on the distribution of the underlying population. For samples that are close to normal (large ν), $\hat{\rho}_p$ performs nearly the same as it does for the normal. The other estimates perform in a similar fashion. This is to be expected as the normal is the limiting distribution of the T as ν → ∞.

In the next case the Cauchy distribution is investigated to discuss the issues that some of these estimates have with extremely heavy tails. The results are expected to be similar to those for the T with ν = 2. Additional illustrative comparisons of the estimates and the effect of the number of degrees of freedom are given in the subsequent subsection.

3.1.4

Cauchy

The bivariate Cauchy is a special case of both the symmetric stable distributions and the T distribution. It is a T distribution with one degree of freedom, or it can be thought of as a symmetric stable distribution with tail index α = 1. Considering the Cauchy in either manner quickly yields the result that this distribution does not even have a finite mean, and thus it is extremely heavy tailed. The samples are generated from a bivariate Cauchy distribution with the density function

$$f(x_1, x_2) = \frac{1}{2\pi|\Sigma|^{1/2}} \left(1 + (x - \mu)^T \Sigma^{-1} (x - \mu)\right)^{-3/2}, \tag{3.4}$$

where $x = (x_1, x_2)^T$ and the parameters µ and Σ are standardized such that $\mu = (0, 0)^T$ and $\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$.

For the Cauchy distribution with density (3.4) the correlation does not exist due to moment requirements, but the parameter ρ does determine the shape of the density function. As the mean also does not exist, the location parameter µ does not have the interpretation as the mean of the distribution, although it is still the location parameter and corresponds to the median of the distribution.

Figure 3.11: Comparing Ratios of RMSE for CAUC (n = 100) [plots not reproduced; two panels show Ratio against ρ: ratio of RMSE of estimates to RMSE of Pearson’s estimate, and ratio of RMSE of estimates to RMSE of Greiner’s estimate]

Simulation results are presented in Figure 3.11 and Figure 3.12. Because of the extremely heavy tails, $\hat{\rho}_p$ is an inappropriate estimate for ρ: its bias, variance and RMSE are all large. It is worth noting that $\hat{\rho}_k$ and $\hat{\rho}_s$ also have exceptionally large biases, whereas $\hat{\rho}_g$ and $\hat{\rho}_q$ retain similar biases as in the T distribution and normal cases. This feature is attributed to the fact that the relationship between $\varrho_S$ and ρ on which these estimates are based is only appropriate for normal populations. It appears that for $\hat{\rho}_g$ and $\hat{\rho}_q$ the bias remains the same no matter what the distribution is. This is also the case for the variance and RMSE of $\hat{\rho}_q$. In Figure 3.11 the RMSE ratios appear to be very similar to those for the cases in Sections 3.1.2 and 3.1.3, except that for large ρ, $\hat{\rho}_q$ is a better estimate than $\hat{\rho}_s$ and $\hat{\rho}_k$.

Figure 3.12: Bias, Variance, RMSE for CAUC (n = 20, 100) [plots not reproduced; panels show bias, variance and RMSE against ρ for n = 20 and n = 100]

3.1.5

Dependence on Heavy Tails

In the previous sections the robust estimates and $\hat{\rho}_p$ are compared for different distributions. Here the effect of heavy tails is further investigated. It is well known that the T distribution converges to the normal as ν → ∞; thus, using the results for the normal, the T distribution, and the Cauchy, which is T with ν = 1, it is possible to compare the bias, variance, and RMSE for differing heavy–tailed behaviors. The results are illustrated in Figures 3.13, 3.14, and 3.15.

The bias, variance, and RMSE for all five estimates are expected to converge to the normal results as ν increases. This behavior is very apparent for all the estimates with the exception of $\hat{\rho}_q$, which has very static variances regardless of the degrees of freedom.

The estimates $\hat{\rho}_g$ and $\hat{\rho}_q$ are expected to be the most robust to changing the heaviness of the tails because the two relationships they are based on hold for all elliptical distributions. $\hat{\rho}_q$ is the more robust of the two, particularly with respect to its variance and, because of this, its RMSE. Empirically the variance of $\hat{\rho}_q$ appears not to fluctuate whatever the degrees of freedom are. This is understandable since all points in each quadrant are treated equally and the estimate is therefore unaffected by extreme values. The variance of this estimate, though, is greater than that of the other estimates for most values of ν. With regard to the bias there is very little fluctuation for both $\hat{\rho}_g$ and $\hat{\rho}_q$.

The dependence on heavy tails for the estimates $\hat{\rho}_s$ and $\hat{\rho}_k$, which depend on the relationship between $\varrho_S$ and ρ, is apparent in all three figures, particularly for the bias. It is easy to differentiate between the results when ν is small and when ν is large. For example, the largest difference between the biases occurs when the sample is from T with ν = 1 and ν = 2.

Figure 3.13: Effect of Tails on the Bias of the Estimates (n = 100) [plots not reproduced; one panel per estimate, Bias($\hat{\rho}_p$), Bias($\hat{\rho}_g$), Bias($\hat{\rho}_q$), Bias($\hat{\rho}_s$), Bias($\hat{\rho}_k$), showing bias against ρ with curves for TDF1, TDF2, TDF4, TDF8, TDF12, TDF16, TDF20 and NORM]

Figure 3.14: Effect of Tails on the Variance of the Estimates (n = 100) [plots not reproduced; one panel per estimate showing variance against ρ with curves for TDF1, TDF2, TDF4, TDF8, TDF12, TDF16, TDF20 and NORM]

Figure 3.15: Effect of Tails on the RMSE of the Estimates (n = 100) [plots not reproduced; one panel per estimate showing RMSE against ρ with curves for TDF1, TDF2, TDF4, TDF8, TDF12, TDF16, TDF20 and NORM]

The traditional estimate $\hat{\rho}_p$ is expected to perform worse for small values of ν and it does. When considering the results, though, there is a point somewhere between ν = 4 and ν = 8 where the RMSE starts to be approximately the same as the RMSE for the normal sample. This observation is not limited to $\hat{\rho}_p$; it is true of all the estimates. For small ν, the RMSE of $\hat{\rho}_p$ varies more than those of the other four estimates.

3.2

Simulation #2: Contaminated Normal

If there are a few outliers in a bivariate normal sample then it is possible that $\hat{\rho}_p$ is greatly affected. This is briefly illustrated in Chapter 1. Here a normal sample is contaminated with some outliers that could be cleaned out, or they could be interesting observations that should be kept. If they are kept then $\hat{\rho}_p$ is unduly affected by their presence whereas the robust estimates for ρ are not affected as much.

Evandt et al. (2004) consider using $\hat{\rho}_g$ to estimate ρ when the normal sample is contaminated with some outliers. In this simulation their simulation is repeated and expanded. The most glaring results in their simulation are seen when the outliers take opposite positions in the data. This can result in the extreme case where $\hat{\rho}_p$ switches to the opposite sign.

To expand the simulation done in Evandt et al. (2004) more values of ρ are considered along with additional outlier situations. In Evandt et al. (2004) $\hat{\rho}_g$ is compared with $\hat{\rho}_p$ and $\hat{\varrho}_S$ for estimating ρ from a standard normal sample of size n = 50 and ρ = 0.2, 0.8. Three situations are considered: the first with no outliers added, the second with two outliers at (−4, −4) and (4, 4), and the third with outliers at (−4, 4) and (4, −4). The outliers used in this simulation are quite extreme, with the probability of observing these points or something more extreme being as small as $10^{-5}$. Here additional cases are presented where the outliers are not as extreme, such as considering outliers at (−3, −3), (−3, 3), (3, −3), (3, 3), where the probability of observing these events is much higher (on the order of $10^{-3}$) and it is more difficult to determine whether these are outliers or usual observations. This simulation illustrates the robustness to outliers that the various estimates of ρ have, particularly $\hat{\rho}_q$ which treats all points in a quadrant equally.

3.2.1

Design

The outliers are added as pairs to a randomly generated bivariate sample by replacing randomly selected sample points. The outlier cases considered are

A  Outliers at (−4, −4) & (4, 4)
B  Outliers at (−4, 4) & (4, −4)
C  Outliers at (0, −4) & (0, 4)
D  Outliers at (−3, −3) & (3, 3)
E  Outliers at (−3, 3) & (3, −3)
F  Outliers at (0, −3) & (0, 3)

These cases are illustrated below in Figure 3.16, with a contour plot of the density included to highlight that these are outliers. The sample sizes considered are n = 20, 50, 100 from a standard bivariate normal distribution with ρ = 0, 0.01, 0.02, . . . , 0.99. $N = 10^4$ samples of each combination of ρ, n, and outlier case are used to estimate the bias, variance and RMSE. The results focus on the biases of the estimates but the variance and the RMSE are given as well. As with the previous simulation a spline smoother is used to generate smooth curves for the three performance measures.
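The contamination step described in this design, replacing two randomly selected sample points by an outlier pair, can be sketched as follows. The dictionary `OUTLIER_CASES` and the function `contaminate` are hypothetical names; the coordinates are those of cases A to F above, and the sample is assumed to be a list of (x1, x2) tuples.

```python
import random

# Outlier pairs for the six cases considered in the design.
OUTLIER_CASES = {
    "A": [(-4, -4), (4, 4)],
    "B": [(-4, 4), (4, -4)],
    "C": [(0, -4), (0, 4)],
    "D": [(-3, -3), (3, 3)],
    "E": [(-3, 3), (3, -3)],
    "F": [(0, -3), (0, 3)],
}


def contaminate(sample, case, rng):
    """Return a copy of sample with two randomly chosen points replaced by outliers."""
    outliers = OUTLIER_CASES[case]
    contaminated = list(sample)
    # Choose distinct positions so that both outliers end up in the sample.
    positions = rng.sample(range(len(sample)), len(outliers))
    for position, outlier in zip(positions, outliers):
        contaminated[position] = outlier
    return contaminated


rng = random.Random(42)
clean = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(20)]
dirty = contaminate(clean, "B", rng)
print(sum(1 for point in dirty if point in OUTLIER_CASES["B"]))
```

Replacing points, rather than appending them, keeps the sample size fixed at n, so results for the contaminated and uncontaminated samples remain directly comparable.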

61

C

A

F

D

D

F

E

C

A

E

F

D

D

F

E

0 A −4

C −2

0

B 2

4

−4

−2

0 −2 −4

B

2

E

4

B

ρ = 0.6

2

4

ρ=0

A −4

C −2

0

B 2

4

Figure 3.16: Outlier Situations Considered for Simulation # 2 with Bivariate Contours 3.2.2

Results

The simulated bias, variance and RMSE for the cases A–F and n = 20, 50, 100 are presented in Figures 3.17 to 3.22. In Figures 3.17 and 3.18 the biases of the cases A–F are illustrated and it is noted that the estimate ρbq has the most consistently small bias of all the estimates across all the situations. For the in–line outliers (A and D) the three estimates ρbg , ρbs , and ρbk have nearly the same biases. For the other cases ρbg is less biased. Overall it is noted that ρbp is definitely more biased than all the robust estimates in every situation. When considering the variance of the estimates (Figures 3.19 and 3.19) it is noted that ρbp has the lowest variance among all estimates when considering cases A and D. This can be attributed to the dominating and consistently over–estimating effect of the outliers on ρbp . ρbq has the highest variance out of all the estimates considered with the exception of very high values of ρ and cases B and E where ρbp has a higher variance. This is expected as the quadrant estimator has the highest variance out of all the estimates of association. The other three estimates of ρ have roughly equivalent variances across all

62

A, n=50

A, n=100

0.3

0.5

^ ρ p ^ ρ g ^ ρ q

0.15 Bias 0.10

0.2

0.3

Bias

0.4

^ ρ s ^ ρ k

0.00

0.0

0.0

0.1

0.05

0.1

0.2

Bias

0.20

0.6

0.4

0.25

A, n=20

0.2

0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

ρ

ρ

ρ

B, n=20

B, n=50

B, n=100

0.8

1.0

0.8

1.0

0.8

1.0

−0.2 0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.6

C, n=20

C, n=50

C, n=100

Bias

−0.15

Bias

−0.10

−0.05

0.0 −0.1 −0.2

−0.20 0.4

0.6 ρ

0.8

1.0

0.0

0.2

0.4

0.6

0.8

1.0

−0.12 −0.10 −0.08 −0.06 −0.04 −0.02

0.00

ρ

−0.3

0.2

0.4

ρ

−0.4 0.0

0.2

ρ

0.00

0.2

−0.5

−0.8

−0.4

−0.6

−0.3

Bias

Bias

−0.4

−0.2

−0.4 −0.6 Bias

−0.8 −1.0 −1.2 0.0

Bias

−0.1

−0.2

0.0

0.0

0.0

0.2

0.4

ρ

Figure 3.17: Bias for Outlier Simulation: Cases A, B, C

0.6 ρ

63

0.25

0.05 0.00

0.00

0.0

0.05

0.1

0.10

Bias

Bias

0.3 Bias 0.2

0.10

0.20

q

^ ρ s ^ ρ k

0.15

0.4

^ ρ p ^ ρ g ^ ρ

0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

0.8

1.0

0.0

E, n=50

E, n=100

0.8

1.0

0.6

0.8

1.0

0.0

0.4

0.6

F, n=100 0.00 −0.02 −0.06

−0.10

Bias

Bias

−0.15

−0.05

−0.10

−0.05

0.00

0.00

F, n=50

−0.20

−0.08

−0.15

Bias

0.2

F, n=20

0.6

0.8

1.0

1.0

−0.15

Bias 0.4

ρ

ρ

0.8

−0.20 0.2

ρ

0.4

1.0

−0.30 0.0

ρ

−0.25

0.2

0.8

−0.25

−0.4 −0.5 0.6

1.0

−0.10

−0.1 −0.3

Bias

−0.2

−0.4 −0.6

0.4

0.8

−0.05

E, n=20

−0.30 0.0

0.6 ρ

−0.8

0.2

0.4

ρ

−1.0 0.0

0.2

ρ

−0.04

0.2

−0.2

0.0

Bias

D, n=100 0.15

D, n=50

0.5

D, n=20

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.2

0.4

ρ

Figure 3.18: Bias for Outlier Simulation: Cases D, E, F

0.6 ρ

64

A, n=50

Variance

0.03

0.005

Variance

0.00

0.00

0.000

0.01

0.04

0.02

0.06

k

0.2

0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.6 ρ

B, n=20

B, n=50

B, n=100

0.8

1.0

0.8

1.0

0.8

1.0

0.4

0.6

0.8

1.0

Variance

0.005

0.010

0.03 Variance

0.02

0.000

0.01 0.00

0.02

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

ρ

ρ

ρ

C, n=20

C, n=50

C, n=100

0.0

0.2

0.4

0.6 ρ

0.8

1.0

0.015

Variance

0.010 0.005

Variance

0.000

0.00

0.00

0.02

0.01

0.04

0.02

0.06

0.03

0.08

0.020

0.04

0.10

0.025

0.2

0.015

0.020

0.04

0.10 0.08 0.06 0.04

Variance

0.4

ρ

0.00 0.0

Variance

0.2

ρ

0.025

0.0

0.015

0.020

0.04

q

^ ρ s ^ ρ

0.010

0.10 0.08

^ ρ p ^ ρ g ^ ρ

0.02

Variance

A, n=100 0.025

A, n=20

0.0

0.2

0.4

0.6 ρ

0.8

1.0

0.0

0.2

0.4

0.6 ρ

Figure 3.19: Variance for Outlier Simulation: Cases A, B, C

65

D, n=50

Variance

0.03 Variance

0.00

0.000

0.005

0.01

0.02

0.06 0.04 0.00

0.2

0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

ρ

ρ

ρ

E, n=20

E, n=50

E, n=100

0.8

1.0

0.8

1.0

0.8

1.0

0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.4

0.6

ρ

ρ

F, n=20

F, n=50

F, n=100

0.2

0.4

0.6 ρ

0.8

1.0

Variance

0.005

0.010

0.03 Variance

0.02

0.000

0.01 0.00 0.0

0.015

0.020

0.04

0.10 0.08 0.06 0.04 0.02 0.00

Variance

0.2

ρ

0.025

0.2

0.015

Variance

0.005 0.000

0.01 0.00

0.02 0.00 0.0

0.010

0.03 Variance

0.02

0.06 0.04

Variance

0.08

0.020

0.04

0.10

0.025

0.0

0.015

0.020

0.04

q

^ ρ s ^ ρ k

0.010

0.10 0.08

^ ρ p ^ ρ g ^ ρ

0.02

Variance

D, n=100 0.025

D, n=20

0.0

0.2

0.4

0.6 ρ

0.8

1.0

0.0

0.2

0.4

0.6 ρ

Figure 3.20: Variance for Outlier Simulation: Cases D, E, F

66

A, n=50

0.0

0.0

0.00

0.1

0.05

0.1

0.2

0.10

0.15

RMSE

0.3

0.3

RMSE

k

0.2

^ ρ s ^ ρ

0.4

0.20

0.4

0.6 0.5

^ ρ p ^ ρ g ^ ρ q

RMSE

A, n=100 0.25

A, n=20

0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

ρ

ρ

ρ

B, n=20

B, n=50

B, n=100

0.8

1.0

0.8

1.0

0.8

1.0

0.5

0.2

0.2

0.1

0.2

0.4

0.2

0.3

RMSE

0.6 RMSE

0.4

0.8 0.6

RMSE

1.0

0.4

1.2

0.8

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.4

0.6

ρ

ρ

C, n=20

C, n=50

C, n=100

0.10

RMSE

0.15

RMSE

0.10

0.20

0.40 0.35 0.30 0.25

0.05

0.20

0.05

0.15 0.10

RMSE

0.2

ρ

0.15

0.0

0.0

0.2

0.4

0.6 ρ

0.8

1.0

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.2

0.4

ρ

Figure 3.21: RMSE for Outlier Simulation: Cases A, B, C

0.6 ρ

67

Figure 3.22: RMSE for Outlier Simulation: Cases D, E, F
[Panels: cases D, E, F at n = 20, 50, 100; horizontal axis: ρ; vertical axis: RMSE; curves: ρ̂_p, ρ̂_g, ρ̂_q, ρ̂_s, ρ̂_k]

the cases, though ρ̂_g has a slightly higher variance for cases B and E. When considering the RMSE, it is apparent in Figures 3.21 and 3.22 that the bias plays a much larger role than in Simulation #1. This is especially true for ρ̂_p, whose RMSE is dominated by the bias term. Due to its lower bias, the estimate ρ̂_q has an RMSE closer to those of the other robust estimates, unlike the results in the previous simulations. In fact, ρ̂_q has the best performance for estimating ρ in cases B and E for small samples.

3.3 Discussion

It has been noted that the robust estimates, particularly ρ̂_g, ρ̂_s and ρ̂_k, all perform reasonably well for samples from elliptical distributions when compared to ρ̂_p. This is expected, as ρ̂_p is not robust to heavy tails or high kurtosis (see Kowalski, 1972; Kraemer, 1980; along with the discussion in Chapter 1). When the tails are the heaviest, ρ̂_g is the least biased of all the estimates. As the tails lighten and become similar to the tails of a normal distribution, ρ̂_k is moderately less biased than ρ̂_g and ρ̂_p. The biases of ρ̂_g and ρ̂_q remain nearly constant across all elliptical distributions presented, and the variance of ρ̂_q does so as well. The results are expected for ρ̂_q based on the results in Section 2.5.2, where it is noted that the bias, variance, and RMSE of the estimate are not affected by the sample’s distribution. The performance of ρ̂_k and ρ̂_s depends on the heaviness of the tails of the distribution, whereas the performance of ρ̂_g and ρ̂_q is not affected as much in this manner. The estimates ρ̂_s and ρ̂_k have a significantly larger variation across the various distributions.

The estimate ρ̂_q does not perform better than the other estimates in any of the cases considered in Simulation #1, due to the significant amount of information reduction involved in calculating κ̂. The only case where it shows promise is with extremely heavy tails such as the Cauchy distribution. Even there it is still outperformed by the estimate ρ̂_g based on Greiner’s relationship. However, ρ̂_q does present a useful estimate of the association when outliers are present and could be used as an estimate for discovering highly influential outliers. It is also the only estimate whose bias, variance and RMSE are consistent for any elliptical distribution.

It was expected that ρ̂_p would be a poor performer for both the heavy–tailed and the outlier cases. This is because it is not intended for these situations due to its lack of robustness to deviations from normality.

Overall, ρ̂_g performs quite well in all situations. If this were a competition among the robust estimates, it is suggested that this estimate win for its versatility. This is particularly true given the dependence on the distribution that ϱ_S and its subsequent estimates of ρ have. The estimate ρ̂_q based on the quadrant measure of association deserves special mention as it is the simplest to calculate, is appropriate for all elliptical distributions, and does result in a decent estimate for ρ, especially in Simulation #2.
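To illustrate why the quadrant-based estimate is described as the simplest to calculate, the following is a minimal sketch in Python. It counts points by quadrant about the sample medians to obtain κ̂ and then applies ρ = sin(πκ/2), the standard form of the quadrant relationship. The function name `quadrant_rho`, the toy data, and the rule of discarding points that fall exactly on a median line are assumptions made for this sketch; the thesis’s own treatment of odd sample sizes is not reproduced here.

```python
import math
from statistics import median

def quadrant_rho(x, y):
    """Sketch of a quadrant-based estimate of rho (hypothetical helper)."""
    mx, my = median(x), median(y)
    concordant = discordant = 0
    for xi, yi in zip(x, y):
        s = (xi - mx) * (yi - my)
        if s > 0:          # first or third quadrant about the medians
            concordant += 1
        elif s < 0:        # second or fourth quadrant about the medians
            discordant += 1
        # s == 0: the point lies on a median line and is dropped (assumption)
    kappa_hat = (concordant - discordant) / (concordant + discordant)
    return math.sin(math.pi * kappa_hat / 2)

# Toy data, purely illustrative.
x = [1.2, -0.4, 0.3, 2.1, -1.5, 0.8, -0.9, 0.1]
y = [0.9, -0.1, -0.2, 1.7, -1.1, 0.2, -1.3, 0.4]
print(quadrant_rho(x, y))
```

Because only the signs relative to the medians enter the calculation, moving an extreme point further out in its own quadrant leaves the estimate unchanged, which is consistent with the robustness to outliers discussed above.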

Chapter 4 Applications

Even though many classic financial problems such as portfolio optimization and value at risk are originally based on normal models, in practice the empirical properties of financial data do not fit a normal model. Many authors have discussed this; the references in Galea, Diaz–Garcia & Vilca (2005) and Cont (2001) list some examples. Cont (2001) presents a list of stylized empirical facts regarding financial data, specifically the returns, and one of the primary stylized aspects of these data sets is that they have heavy tails or, equivalently, excess kurtosis. Models that allow for heavy tails are used regularly in finance to account for this issue. Some suggested models have distributions that are in the family of elliptical distributions. Hauksson et al. (2001) suggest that “while the tails of the normal distribution are too thin for most financial assets, it is still possible that some other elliptical distribution could fit the returns better.” (Hauksson et al., 2001, p. 87–88). Nolan (2003) uses stable distributions (some of which are elliptical) to fit the returns of foreign exchange (FX) rates and allow for significantly heavy tails. A more moderate distribution which allows for slightly lighter tails (still not as light as the normal) is the T distribution, which is used by Breymann, Dias & Embrechts (2003) to model FX returns. Using elliptical distributions for the capital asset pricing model is discussed in Galea, Diaz–Garcia & Vilca (2005) and the references within. Lamantia, Ortobelli & Rachev (2006) discuss value at risk models using elliptical distributions.

Kat (2002, p. 1) considers the association or shape parameter ρ the “single most important parameter in portfolio theory.” This is due to its importance for a classical portfolio optimization problem. Pafka & Kondor (2004) also mention its importance in the fields of investment theory, capital allocation, and risk management, where it is important to have reliable estimates of the association. Therefore the robust estimates of ρ studied in this thesis are useful alternatives to ρ̂_p, whose use is not warranted under non–normality, especially when the data show high kurtosis.

The use of Greiner’s relationship to estimate the association parameter in econometric analysis has been considered by Breymann, Dias & Embrechts (2003) when studying the log–returns for high frequency exchange rate data. ρ̂_g was used there since it is more robust and efficient for estimating ρ than Pearson’s product moment estimator. From the results in this thesis it appears appropriate to use ρ̂_q in the same application.

In this chapter, we study three pairs of foreign exchange (FX) rates. For the first pair, we show that ρ̂_p is comparable to the robust estimates. For the second pair we show that ρ̂_p underestimates the association. For the third pair we show that both ρ̂_p and an estimate used in financial data analysis overestimate the association.

The standard errors (SE) of the various association estimates in the following applications are calculated using the bootstrap method of Efron (1979). The details of this bootstrap method are given in Appendix A.
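As a rough illustration of how a bootstrap SE of this kind can be obtained, the sketch below resamples the observed pairs with replacement, recomputes an estimate on each resample, and takes the standard deviation of the replicates. The names `bootstrap_se` and `pearson`, the number of replications (`n_boot=1000`), the seed, and the synthetic data are all assumptions for illustration; Pearson’s estimate is used only because it is short to write, and any of the other estimates could be passed in its place.

```python
import math
import random

def pearson(pairs):
    """Product-moment correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_se(pairs, estimator, n_boot=1000, seed=0):
    """Standard deviation of the estimator over resampled pairs (sketch)."""
    rng = random.Random(seed)
    n = len(pairs)
    reps = []
    for _ in range(n_boot):
        # Resample whole (x, y) pairs so the dependence is preserved.
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        reps.append(estimator(sample))
    mean = sum(reps) / n_boot
    return math.sqrt(sum((r - mean) ** 2 for r in reps) / (n_boot - 1))

# Synthetic, purely illustrative data (not the FX data used in this chapter).
rng = random.Random(42)
data = []
for _ in range(50):
    x = rng.gauss(0, 1)
    data.append((x, 0.6 * x + 0.8 * rng.gauss(0, 1)))

se = bootstrap_se(data, pearson)
print(round(pearson(data), 4), round(se, 4))
```

Resampling pairs rather than the two coordinates separately is the key design choice: it keeps each x attached to its y, so the association being estimated is not destroyed by the resampling.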

4.1 Association in Log–Returns of Exchange Rates: #1

In this application the log–returns of the Canadian–US, Euro–US, Australian–US, and New Zealand–US exchange rates are investigated. While it is naive to assume that these observations are independent (they are time series), many practitioners do not consider this to be an issue and analyze the data under the assumption that they are. Hence, for this example, which illustrates the benefits of using these robust estimates of ρ, the daily log–returns are assumed to be independent.

Supposing that the exchange rate on day t is x_t, the log–returns are defined as

$$ r_t = \ln\left(\frac{x_t}{x_{t-1}}\right). \tag{4.1} $$

4.1.1 Canadian–US & Euro–US

As an example consider the log–returns on the noon buying rates in New York City for cable transfers payable in foreign currencies for the Canada-US (CUS) rate and the EURO-US (EUS) rate from January 4, 1999 till February 7, 2007. The data is downloadable from the St. Louis Federal Reserve website. The total number of daily returns is 2035 (from 2036 days of trading). The scatter plot of the bivariate data is given in Figure 4.1 along with box–plots of the marginals showing moderate amounts of heavy tail behavior. Each of the five estimates of ρ is given in the following table with the bootstrap SEs in parentheses.
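The log–return definition in (4.1), and the fact that 2036 trading days yield 2035 returns, can be sketched as follows. The function name `log_returns` and the four rates shown are hypothetical and are not taken from the St. Louis Federal Reserve data.

```python
import math

def log_returns(rates):
    """Return r_t = ln(x_t / x_{t-1}) for consecutive rates."""
    return [math.log(curr / prev) for prev, curr in zip(rates, rates[1:])]

# Hypothetical daily rates, purely illustrative.
rates = [1.5200, 1.5125, 1.5180, 1.5090]
r = log_returns(rates)
print(len(rates), len(r))  # n rates give n - 1 log-returns
```

A convenient property of log–returns is that they add over time: the sum of the daily log–returns equals the log–return over the whole period.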

Figure 4.1: CUS and EUS Exchange Rate Log–Returns


Table 4.1: Estimates of ρ for the Bivariate Daily Log–Returns of the CUS and EUS FX Rates (Bootstrap SEs in Parentheses)

ρ̂_p               ρ̂_g               ρ̂_q               ρ̂_s               ρ̂_k
0.3308 (0.0200)   0.3326 (0.0233)   0.2987 (0.0315)   0.3280 (0.0233)   0.3281 (0.0219)

It is noted that all the estimates of ρ give similar values with the exception of ρ̂_q, which normally has large variability. The SEs for all of the estimates are also comparable. This similarity is expected, as the empirical distribution of these returns does not contain many outliers or extreme values that could greatly affect ρ̂_p. In the next example this will not be the case. Thus for the Canada–US and Euro–US log–returns it is estimated that the linear association is approximately 0.33, based on the consensus of the five estimates for ρ with the exception of ρ̂_q; once one accounts for the large SE of ρ̂_q, it too concurs with this approximation.

4.1.2 Australia–US & New Zealand–US

Consider the New Zealand–US (NUS) and the Australia–US (AUS) FX rates over a longer period (Jan. 04, 1971 – Feb. 07, 2007), where any dates that are missing at least one value have been removed. The following illustrates the large difference any potential outliers would have on the estimation of ρ.

Table 4.2: Estimates of ρ for the Bivariate Daily Returns of the AUS and NUS FX Rates since 1971 (Bootstrap SEs in Parentheses)

ρ̂_p               ρ̂_g               ρ̂_q               ρ̂_s               ρ̂_k
0.5445 (0.0336)   0.6374 (0.0081)   0.6260 (0.0099)   0.6016 (0.0086)   0.6016 (0.0087)

Figure 4.2: AUS and NUS Exchange Rate Log–Returns since 1971

In Figure 4.2 and Table 4.2 it is noticed that the heavy–tail behavior illustrated in the box–plots leads to an underestimation of the association between these two exchange rates using ρ̂_p. Considering the SEs of the estimates, it is noticed that the heavy tails have affected the SE of ρ̂_p, producing an estimate with wide variability, whereas the robust estimates all have relatively small SEs. By considering the estimates ±2 × SE there may be a significant difference between ρ̂_p and ρ̂_g or ρ̂_q. This is of particular concern in building a diversified portfolio, where having low association leads to good diversification and highly associated returns should be avoided (Kat, 2002). Thus if ρ̂_p were used here the risk may be underestimated in comparison to using the robust estimates.

Overall, when considering the association between the log–returns of the New Zealand–US and Australian–US exchange rates, a good estimate of the association parameter under the assumption that the rates follow a bivariate elliptical distribution would be somewhere between 0.62 and 0.64, based on the estimates ρ̂_g and ρ̂_q.
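The ±2 × SE comparison mentioned above can be worked through directly from the values in Table 4.2. The sketch below is only an informal check (overlap of rough intervals is not a formal test), and the helper names `interval` and `overlap` are assumptions for illustration.

```python
# Estimates and bootstrap SEs copied from Table 4.2.
estimates = {
    "p": (0.5445, 0.0336),
    "g": (0.6374, 0.0081),
    "q": (0.6260, 0.0099),
    "s": (0.6016, 0.0086),
    "k": (0.6016, 0.0087),
}

def interval(est, se, width=2.0):
    """Rough interval: estimate plus or minus width * SE."""
    return (est - width * se, est + width * se)

def overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

intervals = {name: interval(est, se) for name, (est, se) in estimates.items()}
for name, (lo, hi) in intervals.items():
    print(f"rho_{name}: ({lo:.4f}, {hi:.4f})")
print("p vs g overlap:", overlap(intervals["p"], intervals["g"]))
print("p vs q overlap:", overlap(intervals["p"], intervals["q"]))
```

With these values the rough interval for ρ̂_p ends near 0.6117 while that for ρ̂_g starts near 0.6212, so the two do not overlap; the interval for ρ̂_q starts near 0.6062 and therefore overlaps that of ρ̂_p only marginally.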

Examining more recent dates of the Australia–US and New Zealand–US rates (January 4, 1999 to February 7, 2007, Figure 4.3), some of the most extreme returns are no longer included in the sample, yet there are still enough that the robust estimates provide a better estimate than ρ̂_p.

Figure 4.3: AUS and NUS Exchange Rate Log–Returns since 1999

Table 4.3: Estimates of ρ for the Bivariate Daily Returns of the AUS and NUS FX Rates since 1999 (Bootstrap SEs in Parentheses)

ρ̂_p               ρ̂_g               ρ̂_q               ρ̂_s               ρ̂_k
0.7315 (0.0306)   0.7903 (0.0101)   0.7943 (0.0164)   0.7803 (0.0109)   0.7805 (0.0110)

Upon analysis of the five estimates of association it is noticed that ρ̂_p continues to underestimate in comparison to the general consensus of the four robust estimates, particularly ρ̂_g and ρ̂_q. It is also noted that ρ̂_s and ρ̂_k are again slightly less than the two most consistent estimates of ρ. The SEs here are slightly larger for the robust estimates as the sample size has decreased, while the SE for ρ̂_p has remained nearly the same because the sample has slightly lighter tails. Due to this there does not appear to be as much of a glaring difference between the estimates apart from the difference in precision. Based on these results it is suggested that the association between these two log–returns is approximately 0.79. A confidence interval could be given using bootstrapping, or by transforming the confidence interval for τ̂ as in Newson (2001).

Using these estimates of association it is apparent that ρ̂_p possibly gives a poor estimate of the association parameter. Attention should be paid to determining whether the sample appears to be from an elliptical distribution. If not, other methods such as copulas would be more appropriate, yet more involved.

4.2 Association in Log–Returns of Exchange Rates: #2

Two families of distributions commonly assumed in place of the normal for financial data are the T distributions and the elliptically symmetric stable distributions. Thus for estimation purposes one needs to estimate the location parameters, the spread parameters, the association parameter(s), and the weight of the tails, determined by the tail index (α) for the stable distributions and the degrees of freedom (ν) for the T. Here the five estimates of association are compared to the estimate provided by Nolan (2003) for stable distributions.

4.2.1 Elliptically Contoured Stable Distributions

Stable distributions allow for heavy tails, and symmetric stable distributions are also members of the elliptical family of distributions. In Chapter 2 the elliptical symmetric stable distribution was defined to have the characteristic function

$$ \exp\left\{ -\left( x^T \Sigma x \right)^{\alpha/2} \right\}, \qquad 0 < \alpha \le 2, $$

which has the form of an elliptical characteristic function. Nolan (2003) uses the following procedure for estimating the association matrix Σ. First, estimate the scale parameter $\sigma_i$ for each $X_i$, and denote the estimate by $\hat{\sigma}_i$. This can be done by the method of maximum likelihood. Second, find the scale parameter estimate for $X_i + X_j$, denoted by $\widehat{\sigma(i,j)}$. Then $\hat{\sigma}_{ii} = \hat{\sigma}_i^2$ and $\hat{\sigma}_{ij} = 2^{-1}\left( \widehat{\sigma(i,j)}^2 - \hat{\sigma}_{ii} - \hat{\sigma}_{jj} \right)$. Thus, for a bivariate sample the estimated association matrix is

$$ \hat{\Sigma} = \begin{pmatrix} \hat{\sigma}_{11} & \hat{\sigma}_{12} \\ \hat{\sigma}_{12} & \hat{\sigma}_{22} \end{pmatrix}, $$

and the estimate of the association parameter ρ is

$$ \hat{\rho} = \frac{\widehat{\sigma(1,2)}^2 - \hat{\sigma}_{11} - \hat{\sigma}_{22}}{2\sqrt{\hat{\sigma}_{11}\hat{\sigma}_{22}}}. \tag{4.2} $$

Nolan uses this method to estimate the association of the following pair of exchange rates.
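The arithmetic of this procedure, once the three scale estimates are available, can be sketched as below. The function name `stable_association` and the numerical inputs are hypothetical; obtaining the scale estimates themselves (for example by maximum likelihood) is not shown and is not something this sketch attempts.

```python
import math

def stable_association(scale_1, scale_2, scale_sum):
    """Association matrix and rho from scale estimates (sketch).

    scale_1, scale_2: scale estimates for X1 and X2.
    scale_sum: scale estimate for X1 + X2.
    """
    s11 = scale_1 ** 2
    s22 = scale_2 ** 2
    s12 = 0.5 * (scale_sum ** 2 - s11 - s22)
    rho = s12 / math.sqrt(s11 * s22)
    return [[s11, s12], [s12, s22]], rho

# Hypothetical scale values chosen so that Sigma = [[4, 1.2], [1.2, 1]].
sigma_hat, rho_hat = stable_association(2.0, 1.0, math.sqrt(7.4))
print(sigma_hat, rho_hat)
```

The off-diagonal term follows from the scale of the sum: the squared scale of X1 + X2 equals σ11 + 2σ12 + σ22, which is rearranged to isolate σ12.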

4.2.2 German Mark & Japanese Yen versus British Pound

Nolan (2003) fits a bivariate symmetric stable distribution to the log–returns of the German Mark and Japanese Yen against the British Pound for the period January 2, 1980 to May 21, 1996. He utilizes some of the properties of stable distributions to estimate an association matrix, from which the value of ρ is estimated as 0.4469. The SE of this estimate is not provided. Figure 4.4 gives the scatter plot of the log–returns and the marginal box–plots. Note that, as with the two previous pairs of exchange rates, the box–plots imply heavy–tailed marginals. In Table 4.4 the standard estimate ρ̂_p and the four robust estimates for the association parameter are given for this data set.


Figure 4.4: Ger–UK and Jpn–UK Exchange Rate Log–Returns

Table 4.4: Estimates of ρ for the Bivariate Daily Returns of the MBP and YBP (Bootstrap SEs in Parentheses)

ρ̂_p               ρ̂_g               ρ̂_q               ρ̂_s               ρ̂_k               Nolan (2003)
0.4354 (0.0183)   0.3845 (0.0159)   0.3576 (0.0227)   0.3757 (0.0156)   0.3758 (0.0167)   0.4469

Note in this case that Nolan’s estimate and ρ̂_p are comparable and higher than all the non–parametric estimates. Nolan does not give a SE for his estimate, making it difficult to compare with the others. The other SEs all appear to be similar. The overestimation of ρ by ρ̂_p may be partially due to the outlying value in the bottom left of the scatter plot. From cases A and D in the outlier simulation (Section 3.2) it is noted that if there are outliers like this then ρ̂_p overestimates ρ, as it does here. The non–parametric measures appear to reach similar conclusions, with ρ being estimated between 0.36 and 0.38 using a consensus of the robust measures. As for Nolan’s estimate, it appears to provide an overestimate of the association parameter ρ.

The results in this section show that the use of the robust estimates, particularly ρ̂_g, is useful when estimating the association parameter ρ for financial data. Based on this it is possible to estimate the association matrix Σ using the median absolute deviation (MAD) method as in Breymann, Dias & Embrechts (2003), where MAD = median(|x_i − median(x)|) for a sample x = (x_1, ..., x_n). Using this result it is then possible to continue the analysis of the data for various portfolio and risk analysis procedures such as portfolio optimization (Lauprete, Samarov & Welsch, 2002) and value at risk.
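The MAD defined above can be computed in a few lines. This sketch covers only the MAD itself, without any scaling constant; how Breymann, Dias & Embrechts (2003) combine it with an association estimate to build Σ is not reproduced here. The function name `mad` and the sample values are assumptions for illustration.

```python
from statistics import median

def mad(x):
    """Median absolute deviation: median(|x_i - median(x)|)."""
    m = median(x)
    return median(abs(xi - m) for xi in x)

# One extreme value barely moves the MAD.
print(mad([1, 2, 3, 4, 5]), mad([1, 2, 3, 4, 100]))
```

Because it depends only on medians, the MAD is a spread measure that is largely unaffected by a few extreme observations, which is why it pairs naturally with the robust association estimates.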

Chapter 5 Conclusions

5.1 Summary

It is established in this thesis and in Lindskog (2000) that the non–parametric measures of association τ and κ have relationships to the association parameter ρ in elliptical distributions. It is also suggested here that there is a relationship between ϱ_S and ρ which is dependent on the distribution. Based on these relationships it becomes possible to construct non–parametric estimates of association as robust estimates for ρ when the population distribution is elliptical.

In Chapter 2 some analytic results are given showing that τ, ϱ_S and κ all have relationships with the correlation coefficient of the normal distribution (ρ). This is shown based on the probabilistic definitions of these measures and the fact that these definitions can be represented as orthant probabilities. For samples from a normal distribution all the relationships are well documented, and some have been discussed for nearly a century. For the elliptical case it is noted that caution is needed, as the relationship between ϱ_S and ρ is dependent on the underlying distribution. The relationships of the remaining two non–parametric measures (τ, κ) to the association parameter ρ do hold for all elliptical distributions. Based on these relationships a total of four estimates for the association parameter are proposed, and their bias, variance and RMSE studied.

In Chapter 3 simulation results are given to show the properties of the proposed estimates. The dependence on the distribution of the estimates based on ϱ_S is shown in Simulation #1, and the primary conclusion drawn is that overall ρ̂_g is the most desirable of the four proposed estimates. In Simulation #2 it is noted that ρ̂_q is the estimate least affected in terms of bias when there are outliers. Hence, even though ρ̂_q has the largest variance, it is quite accurate at estimating the association when outliers are present. The lack of robustness of ρ̂_p that is discussed in Chapter 1 is also seen in both the heavy–tailed and outlier simulations.

Chapter 4 illustrates a practical use of the proposed estimates of association. It focusses on the log–returns for foreign exchange rates, and it is noted that in some cases the common estimate ρ̂_p of association provides substantially lower or higher estimates of ρ than the proposed estimates, and hence its use may result in overexposure or underexposure to risk. In these cases, our proposed estimates of ρ seem to be preferred.

Overall, the four estimates based on non–parametric measures of association are useful in estimating the association when the distribution is elliptical with heavy tails, or has outliers, since they are more robust than ρ̂_p, which is not designed for use in these situations. It is noted that the estimate of association based on τ provides the most consistent and accurate estimate. The estimate based on the quadrant measure performs admirably in terms of bias when outliers are present and is the most consistent in its properties amongst the estimates. The estimates based on the relationship between ϱ_S and ρ are shown to be dependent on the distribution, though they still at times outperform Pearson’s product moment estimate.
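For reference, the relationships summarized above are written out below in their standard textbook forms, stated here as an aid to the reader rather than as a quotation of the Chapter 2 derivations. The first two hold across the elliptical family as discussed; the third is the normal–case relationship for ϱ_S, which, as noted, changes with the underlying distribution. The last line gives the corresponding inversions behind the estimates based on Greiner’s relationship and on the quadrant measure.

```latex
\begin{align*}
  \tau      &= \frac{2}{\pi}\arcsin(\rho),
  &\kappa   &= \frac{2}{\pi}\arcsin(\rho),
  &\varrho_S &= \frac{6}{\pi}\arcsin\!\left(\frac{\rho}{2}\right)
               \quad\text{(normal case)}, \\[4pt]
  \hat{\rho}_g &= \sin\!\left(\frac{\pi\hat{\tau}}{2}\right),
  &\hat{\rho}_q &= \sin\!\left(\frac{\pi\hat{\kappa}}{2}\right). &&
\end{align*}
```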

5.2 Future Work

Part of the impetus for this thesis is extending the results of Rényi’s seventh axiom from the specific assumption of normality to more general cases, in particular the elliptical distributions. It is noted that a general extension does not work, since for certain measures of association the relationship with the association parameter ρ is not a strictly increasing function, and there is some dependence on the underlying distribution. This is emphasized in the work of Fang, Fang & Kotz (2002), where copula interpretations are used. Hence in further work it would be desirable to utilize copulas to arrive at results for the elliptical distributions.

In Simulation #1 there appears to be a relationship between ϱ_S and ρ that is affected by the degrees of freedom. An open question is: is there a function γ such that ϱ_S = γ(ρ, ν) for a bivariate T distribution with ν degrees of freedom?

Other work pertaining to the estimates investigated here is the construction of confidence intervals for ρ which have appropriate coverage properties. Newson (2001) has suggested using the standard confidence interval for τ and transforming the bounds using Greiner’s relationship to create a robust confidence interval for ρ. Alternative confidence interval construction methods would also be of interest, such as the Samara–Randles construction for τ (Hollander & Wolfe, 1999), the Z–transform for ϱ_S, and various bootstrap procedures.

The application of the estimates and the relationships that spawned them to time series could also be investigated. Ferguson, Genest & Hallin (2000) show that Kendall’s tau can be adapted for serial dependence. Based on this it would be of interest to know whether ρ̂_g, which is based on Kendall’s tau, can provide a good initial estimate for the coefficient of an AR(1) model.

Other non–parametric measures of association share similar relationships to ρ in the normal case. Measures such as Blest’s measure of association (Blest, 2000), the measure proposed by Gideon & Hollister (1987), and Gini’s coefficient of cograduation are not considered here. Further consideration of these alternative measures and others may provide some interesting results.
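The interval transformation suggested by Newson (2001) can be sketched as follows, assuming Greiner’s relationship in its standard form ρ = sin(πτ/2). Because this function is increasing on [−1, 1], the bounds of an interval for τ map directly to bounds for ρ. The function names and the interval for τ used below are hypothetical; how the interval for τ is obtained is not shown.

```python
import math

def greiner(tau):
    """Map Kendall's tau to rho via rho = sin(pi * tau / 2)."""
    return math.sin(math.pi * tau / 2)

def rho_interval_from_tau(tau_lower, tau_upper):
    """Transform the bounds of an interval for tau into bounds for rho."""
    return greiner(tau_lower), greiner(tau_upper)

# Hypothetical interval for tau, purely illustrative.
lower, upper = rho_interval_from_tau(0.55, 0.60)
print(lower, upper)
```

The transformed interval is generally not symmetric about the point estimate of ρ, since the sine transformation is non-linear.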

Bibliography

[1] Abdullah, M. (1990). On a Robust Correlation Coefficient. The Statistician, 39, 455–460.

[2] Blest, D.C. (2000). Rank Correlation–An Alternative Measure. Australian and New Zealand Journal of Statistics, 42, 101–111.

[3] Blomqvist, N. (1950). On a Measure of Dependence Between Two Random Variables. The Annals of Mathematical Statistics, 21, 593–600.

[4] Breymann, W., Dias, A., & Embrechts, P. (2003). Dependence Structures for Multivariate High–Frequency Data in Finance. Quantitative Finance, 3, 1–14.

[5] Brockwell, P. & Davis, R. (1991). Time Series: Theory and Methods, 2nd Ed. Springer, New York.

[6] Casella, G. & Berger, R. (1990). Statistical Inference. Duxbury Press, Belmont, California.

[7] Chernick, M. (1982). The Influence Function and its Application to Data Validation. American Journal of Mathematical and Management Sciences, 2, 263–288.

[8] Conover, W.J. (1999). Practical Nonparametric Statistics, 3rd Ed. Wiley, New York.

[9] Cont, R. (2001). Empirical Properties of Asset Returns: Stylized Facts and Statistical Issues. Quantitative Finance, 1, 223–236.

[10] David, F. & Mallows, C. (1961). The Variance of Spearman’s Rho in Normal Samples. Biometrika, 48, 19–28.

[11] Devlin, S., Gnanadesikan, R., & Kettenring, J. (1975). Robust Estimation and Outlier Detection with Correlation Coefficients. Biometrika, 62, 531–558.

[12] Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics, 7, 1–26.

[13] Efron, B. (1981). Nonparametric Estimates of Standard Error: The Jackknife, The Bootstrap, and Other Resampling Methods. Biometrika, 68, 589–599.

[14] Esscher, F. (1924). On a Method of Determining Correlation from the Ranks of Variates. Skandinavisk Aktuarietidskrift, 7, 201–219.

[15] Evandt, O., Coleman, S., Ramalhoto, M. & van Lottum, C. (2004). A Little Known Robust Estimator of the Correlation Coefficient and Its Use in a Robust Graphical Test for Bivariate Normality with Applications in the Aluminum Industry. Quality and Reliability Engineering International, 20, 433–456.

[16] Fang, K. (2006). Elliptically Contoured Distributions, in Encyclopedia of Statistical Sciences, Ed. Kotz, S. et al. Wiley, New York. http://www.mrw.interscience.wiley.com/emrw/9780471667193/home

[17] Fang, H., Fang, K., & Kotz, S. (2002). The Meta–Elliptical Distributions with Given Marginals. Journal of Multivariate Analysis, 82, 1–16.

[18] Fang, K., Kotz, S., & Ng, K. (1990). Symmetric Multivariate and Related Distributions. Chapman and Hall, New York.

[19] Fechner, G.T. (1897). Kollektivmasslehre. Wilhelm Engelmann, Leipzig.

[20] Ferguson, T., Genest, C., & Hallin, M. (2000). Kendall’s Tau for Serial Dependence. Canadian Journal of Statistics, 28, 587–604.

[21] Fisher, R. A. (1915). Frequency Distribution of the Values of the Correlation Coefficient in Samples from an Indefinitely Large Population. Biometrika, 10, 507–521.

[22] Galea, M., Diaz–Garcia, J. & Vilca, F. (2005). Influence Diagnostics in the Capital Asset Pricing Model Under Elliptical Distributions. Research Report, UNICAMP Institute of Mathematics, Statistics and Scientific Computation. http://www.ime.unicamp.br/rel pesq/2005/ps/rp42-05.pdf

[23] Gayen, A. (1951). The Frequency Distribution of the Product–Moment Correlation Coefficient in Random Samples of Any Size Drawn from Non–Normal Universes. Biometrika, 38, 219–247.

[24] Gideon, R. & Hollister, R. (1987). A Rank Correlation Coefficient Resistant to Outliers. Journal of the American Statistical Association, 82, 656–666.

[25] Greiner, R. (1909). Über das Fehlersystem der Kollektivmasslehre. Zeitschrift für Mathematik und Physik, 57, 121, 225, and 337.

[26] Griffin, H.D. (1958). Graphic Computation of Tau as a Coefficient of Disarray. Journal of the American Statistical Association, 53, 441–447.

[27] Hauksson, H., Dacorogna, M., Domenig, T. & Samorodnitsky, G. (2001). Multivariate Extremes, Aggregation and Risk Estimation. Quantitative Finance, 1, 79–95.

[28] Hoeffding, W. (1947). On the Distribution of the Rank Correlation Coefficient τ When the Variates are Not Independent. Biometrika, 34, 183–196.

[29] Hollander, M. & Wolfe, D. (1999). Nonparametric Statistical Methods. Wiley, New York.

[30] Hothorn, T., Bretz, F. & Genz, A. (2001). On Multivariate–T and Gauss Probabilities in R. R–News, 1:2, 27–29.

[31] Hotelling, H. (1953). New Light on the Correlation Coefficient and its Transforms. Journal of the Royal Statistical Society Series B (Methodological), 15, 193–232.

[32] Jensen, D. (2006). Multivariate Distributions, in Encyclopedia of Statistical Sciences, Ed. Kotz, S. et al. Wiley, New York. http://www.mrw.interscience.wiley.com/emrw/9780471667193/home

[33] Kat, H.M. (2002). The Dangers of Using Correlation to Measure Dependence. ISMA Centre Discussion Papers in Finance.

[34] Kelker, D. (1970). Distribution Theory of Spherical Distributions and a Location–Scale Parameter Generalization. Sankhya Series A, 32, 419–430.

[35] Kendall, M.G. (1938). A New Measure of Rank Correlation. Biometrika, 30, 81–89.

[36] Kendall, M.G. (1949). Rank and Product–Moment Correlation. Biometrika, 36, 177–193.

[37] Kendall, M.G. (1970). Rank Correlation Methods. Charles Griffin & Company Ltd., London.

[38] Kotz, S. & Nadarajah, S. (2004). Multivariate t Distributions and Their Applications. Cambridge University Press, Cambridge.

[39] Kowalski, C. (1972). On the Effects of Non–Normality on the Distribution of the Sample Product–Moment Correlation Coefficient. Applied Statistics, 21, 1–12.

[40] Kraemer, H. (1980). Robustness of the Distribution Theory of the Product Moment Correlation Coefficient. Journal of Educational Statistics, 5, 115–128.

[41] Krishnamoorthy, K. (2006). Handbook of Statistical Distributions with Applications. Chapman & Hall/CRC, Boca Raton.

[42] Kruskal, W. H. (1958). Ordinal Measures of Association. Journal of the American Statistical Association, 53, 814–861.

[43] Lamantia, F., Ortobelli, S. & Rachev, S. (2006). An Empirical Comparison Among VaR Models and Time Rules with Elliptical and Stable Distributed Returns. Technical Report, Institute of Statistics and Mathematical Economic Theory, University of Karlsruhe. http://www.pstat.ucsb.edu/research/papers/VaRfa23lv.pdf

[44] Lauprete, G., Samarov, A., & Welsch, R. (2002). Robust Portfolio Optimization. Metrika, 55, 139–149.

[45] Lehmann, E. (1966). Some Concepts of Dependence. The Annals of Mathematical Statistics, 37, 1137–1153.

[46] Lindeberg, J.W. (1925). Über die Korrelation. VI Skandinaviska Mathematikerkongress, Copenhagen, 437–446.

[47] Lindeberg, J.W. (1929). Some Remarks on the Mean Error of the Percentage of Correlation. Nordic Statistical Journal, 1, 137–141.

[48] Lindskog, F. (2000). Linear Correlation Estimation. Preprint, RiskLab ETH Zurich. www.risklab.ch/ftp/papers/LinearCorrelationEstimation.pdf

[49] Lindskog, F., McNeil, A., & Schmock, U. (2002). Kendall’s Tau for Elliptical Distributions. In: Credit Risk: Measurement, Evaluation and Management, G. Bohl, G. Nakhaeizadeh, S.T. Rachev, T. Ridder, K.H. Vollmer, Eds. Physica–Verlag, Heidelberg, 649–676.

[50] Lipps, G.F. (1905). Die Bestimmung der Abhängigkeit zwischen den Merkmalen eines Gegenstandes. Berichte über die Verhandlungen der Königlich Sächsischen Gesellschaft der Wissenschaften zu Leipzig, Mathematisch–Physische Klasse, 57, 1–32.

[51] March, L. (1905). Comparaison Numerique de Courbes Statistiques. Journal de la Societe de Statistique de Paris, 46, 255–277 and 306–311.

[52] Nadarajah, S. (2003). The Kotz–Type Distributions with Applications. Statistics, 37, 341–358.

[53] Newson, R. (2001). Parameters Behind “Non–Parametric” Statistics: Kendall’s τ_a, Somers’ D and median differences. The Stata Journal, 1, 1–20.

[54] Nelsen, R. (2006). An Introduction to Copulas. Springer, New York.

[55] Nolan, J. (2003). Modeling Financial Data with Stable Distributions, in Handbook of Heavy Tailed Distributions in Finance, Ed. S.T. Rachev, 106–129.

[56] Pafka, S. & Kondor, I. (2004). Estimated Correlation Matrices and Portfolio Optimization. Physica A, 343, 623–634.

[57] Pearson, K. (1920). Notes on the History of Correlation. Biometrika, 13, 25–45.

[58] Porter, T. M. (1986). The Rise of Statistical Thinking: 1820–1900. Princeton University Press, Princeton, New Jersey.

[59] Prokhorov, A. (2002). Correlation Ratio, in Encyclopaedia of Mathematics, Ed. M. Hazewinkel, Springer, New York. http://eom.springer.de

[60] Rodgers, J. & Nicewander, A. (1988). Thirteen Ways to Look at the Correlation Coefficient. The American Statistician, 42, 59–66.

[61] Rupinski, M. & Dunlap, W. (1996). Approximating Pearson Product–Moment Correlations From Kendall’s Tau and Spearman’s Rho. Educational and Psychological Measurement, 56, 419–429.

[62] Sandiford, P. (1929). Educational Psychology. Longmans, Green and Co., New York.

[63] Schweizer, B. & Wolff, E. (1981). On Nonparametric Measures of Dependence for Random Variables. The Annals of Statistics, 9, 879–885.

[64] Sheskin, D.J. (2004). Handbook of Parametric and Nonparametric Statistical Procedures, 3rd Ed. Chapman & Hall.

[65] Spearman, C. (1904). The Proof and Measurement of Association Between Two Things. American Journal of Psychology, 15, 72–101.

[66] Stigler, S. (1989). Francis Galton’s Account of the Invention of Correlation. Statistical Science, 4, 73–79.

[67] Sutradhar, B.C. (1986). On the Characteristic Function of Multivariate Student T–Distribution. Canadian Journal of Statistics, 14, 329–337.

[68] Sutradhar, B.C. (1988). Author’s Revision. Canadian Journal of Statistics, 16, 323.

[69] van Belle, G. (2002). Statistical Rules of Thumb. John Wiley and Sons, New York.

[70] Wackerly, D., Mendenhall, W. & Scheaffer, R. (2002). Mathematical Statistics with Applications. Duxbury, Pacific Grove, California.

[71] Yuan, K. & Bentler, P. (2000). Inferences on Correlation Coefficients in Some Classes of Non–Normal Distributions. Journal of Multivariate Analysis, 72, 230–248.

Appendix A

Bootstrap Estimation of the Standard Error of the Estimates

To estimate the standard errors (SE) of the five estimates, a basic bootstrap procedure based on Efron (1979, 1981) is used. Hollander & Wolfe (1999) suggest a bootstrap method to determine the SE of $\hat{\tau}$ when no distributional assumptions are made, and the procedure used here follows that suggestion.

One issue with the standard bootstrap is that ties exist in the resampled data, whereas the proposed estimates were developed under the assumption of no ties. Thus the tied versions of $\hat{\tau}$, $\hat{\varrho}_S$ and $\hat{\kappa}$ are used when there are ties. The simulations in Chapter 3 did not contain ties, so the tied versions of the non–parametric estimates were not discussed there. Here they are necessary and are presented.

To calculate $\hat{\kappa}$ when there are ties, the only problematic situation is when observations are tied with the medians. This is treated in the same way in which the estimate is calculated when the sample size is odd.

To calculate $\hat{\tau}$ when there are ties, the program R uses the procedure outlined in Sheskin (2004). Let $N_C$ and $N_D$ denote the numbers of concordant and discordant pairs, and calculate

\[ T_X = \sum_{i=1}^{s} \left( t_{i(X)}^2 - t_{i(X)} \right), \]

where $t_{i(X)}$ denotes the number of tied values of x for a given rank, and s is the total number of different tied sets in the x observations. Do the same for the y observations, and denote the result by $T_Y$. The corrected estimate for $\tau$ when there are ties is

\[ \hat{\tau}_{\text{corrected}} = \frac{2(N_C - N_D)}{\sqrt{n(n-1) - T_X}\,\sqrt{n(n-1) - T_Y}}. \]
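The tie correction can be checked numerically. The following is a minimal R sketch, assuming the denominator subtracts the tie terms from n(n − 1) as in the usual tie-corrected form of Kendall's tau; the helper names tie.term and tau.corrected are hypothetical and are not taken from the thesis code.

```r
# Tie term: sum over groups of equal values of (t^2 - t).
# Untied values form groups of size 1 and contribute 1 - 1 = 0.
tie.term <- function(v) {
  counts <- table(v)
  sum(counts^2 - counts)
}

# Tie-corrected Kendall's tau (hypothetical helper, illustration only).
tau.corrected <- function(x, y) {
  n <- length(x)
  # Sign products over all ordered pairs (i, j); each unordered pair
  # appears twice and tied pairs contribute 0, so sum / 2 = N_C - N_D.
  s <- sign(outer(x, x, "-")) * sign(outer(y, y, "-"))
  nc.minus.nd <- sum(s) / 2
  TX <- tie.term(x)
  TY <- tie.term(y)
  2 * nc.minus.nd / (sqrt(n * (n - 1) - TX) * sqrt(n * (n - 1) - TY))
}

# Small example with ties in both variables
x <- c(1, 2, 2, 3, 4, 4, 4, 5)
y <- c(2, 1, 3, 3, 5, 4, 6, 6)
tau.corrected(x, y)
cor(x, y, method = "kendall")   # R's built-in value, for comparison
```

The two values should agree, since cor() with method = "kendall" applies a tie correction of this form.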

To calculate $\hat{\varrho}_S$ when there are ties, the formula for $\hat{\rho}_p$ can be used on the ranked data.

Below is the procedure to obtain the bootstrap SE for $\hat{\rho}_g$. For the other estimates, the method is similar.

1. Take a simple random sample of size n, with replacement, from the original data $(x_1, y_1)^T, (x_2, y_2)^T, \ldots, (x_n, y_n)^T$ to generate a bootstrap sample $(x_1^*, y_1^*)^T, (x_2^*, y_2^*)^T, \ldots, (x_n^*, y_n^*)^T$.

2. Calculate $\hat{\rho}_g$ for the bootstrap sample and denote the result by $\hat{\rho}^*_{g(1)}$.

3. Repeat steps 1 and 2 a further m − 1 times to generate m values $\hat{\rho}^*_{g(i)}$, i = 1, 2, . . . , m.

4. Calculate the bootstrap standard error for $\hat{\rho}_g$ as

\[ SE_{Boot}(\hat{\rho}_g) = \sqrt{ \frac{1}{m-1} \sum_{i=1}^{m} \left( \hat{\rho}^*_{g(i)} - m^{-1} \sum_{j=1}^{m} \hat{\rho}^*_{g(j)} \right)^2 }. \]

This procedure is used to estimate the SE for each of $\hat{\rho}_p$, $\hat{\rho}_g$, $\hat{\rho}_q$, $\hat{\rho}_p$, and $\hat{\rho}_k$. For the estimates in Chapter 4, the number of bootstrap resamples is m = 200.
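The four steps above can be sketched in R as follows. This is an illustration under stated assumptions, not the thesis code: boot.se and rho.kendall are hypothetical names, and the example estimate uses the standard relation ρ = sin(πτ/2) between Kendall's tau and ρ, which may not correspond to $\hat{\rho}_g$.

```r
# Bootstrap SE of an estimate of rho (hypothetical helper).
# X   : n x 2 data matrix
# est : function mapping an n x 2 matrix to a single estimate
# m   : number of bootstrap resamples
boot.se <- function(X, est, m = 200) {
  n <- nrow(X)
  reps <- numeric(m)
  for (i in seq_len(m)) {
    # Step 1: resample the n pairs (rows) with replacement
    idx <- sample.int(n, n, replace = TRUE)
    # Step 2: recompute the estimate on the bootstrap sample
    reps[i] <- est(X[idx, , drop = FALSE])
  }
  # Step 4: sd() divides the sum of squared deviations by m - 1,
  # which matches the bootstrap SE formula
  sd(reps)
}

# Example estimate: sin(pi * tau / 2), with cor() handling ties in tau
rho.kendall <- function(X) {
  sin(pi * cor(X[, 1], X[, 2], method = "kendall") / 2)
}

set.seed(1)
z <- rnorm(50)
X <- cbind(z + rnorm(50), z + rnorm(50))   # correlated bivariate sample
boot.se(X, rho.kendall)
```

Resampling rows rather than individual coordinates keeps each (x, y) pair intact, which is what step 1 requires.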

Appendix B

R–Code

The language R (versions 2.3.1–2.4.1) is used to perform all the simulations. R is available for download at www.r-project.org.

B.1 Generation of Random Bivariate Samples

A pseudo–random number generator in R for generating multivariate T and multivariate normal vectors is included in the package mvtnorm written by Hothorn, Bretz & Genz (see Hothorn, Bretz & Genz, 2001 and references therein for further details). This package also includes the calculation of multivariate probabilities for both the normal and T.

The method used to generate n i.i.d. p–variate random normal vectors from $N_p(\mu, \Sigma)$ is:

1. Calculate the eigenvalues of the p × p covariance matrix $\Sigma$ provided by the user. If any are less than zero then stop.

2. Perform a singular value decomposition of $\Sigma$ such that $\Sigma = U D V^T$, where U and V are orthogonal and D is a diagonal matrix.

3. Calculate $R = (V((U^T D_{ii})^T))^T$, where $D_{ii}$ is a diagonal matrix of the square roots of the diagonal entries of D.

4. Generate np standard normal random numbers and form an n × p matrix X.

5. Compute $Y = XR + (\mu 1^T)^T$, where 1 is an n–vector with all entries equal to 1.

Then the n rows of Y form an i.i.d. sample of size n from $N_p(\mu, \Sigma)$.
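The five steps for the normal case can be sketched in base R as below. This is a hedged illustration and not the mvtnorm source: rmvnorm.svd is a hypothetical name, and step 3 is written as $R = U D^{1/2} V^T$, which is assumed here as one convenient choice satisfying $R^T R = \Sigma$ for a symmetric non-negative definite $\Sigma$.

```r
# Generate n i.i.d. draws from N_p(mu, Sigma) via the SVD of Sigma
# (hypothetical helper, illustration only).
rmvnorm.svd <- function(n, mu, Sigma) {
  p <- length(mu)
  # Step 1: stop if any eigenvalue of Sigma is negative
  ev <- eigen(Sigma, symmetric = TRUE, only.values = TRUE)$values
  if (any(ev < 0)) stop("Sigma has a negative eigenvalue")
  # Step 2: singular value decomposition Sigma = U D V^T
  s <- svd(Sigma)
  # Step 3: R = U D^{1/2} V^T (assumed form), so that t(R) %*% R = Sigma
  R <- s$u %*% diag(sqrt(s$d), p) %*% t(s$v)
  # Step 4: n x p matrix of standard normal random numbers
  X <- matrix(rnorm(n * p), nrow = n)
  # Step 5: Y = X R, then add mu to every row
  sweep(X %*% R, 2, mu, "+")
}

set.seed(1)
Sigma <- matrix(c(1, 0.5, 0.5, 1), 2, 2)
Y <- rmvnorm.svd(1000, mu = c(0, 0), Sigma = Sigma)
cor(Y)[1, 2]   # sample correlation, expected to be near 0.5
```

Each row of X has identity covariance, so each row of XR has covariance $R^T R = \Sigma$.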


The method used to generate n i.i.d. p–variate T random vectors from a multivariate T with $\nu$ degrees of freedom, location vector $\mu$ and covariance matrix $\Sigma$ is:

1. Generate n i.i.d. p–variate normal vectors from $N_p(0, \Sigma)$ and denote each as $Y_i$.

2. Generate n $\chi^2_\nu$ random numbers and denote each as $S_i$.

3. Calculate $Z_i = \sqrt{\nu}\, Y_i / \sqrt{S_i} + \mu$.

The $Z_i$, i = 1, 2, . . . , n are the desired i.i.d. sample.

Full details on the coding are available by viewing the code included in the mvtnorm package.
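The multivariate T steps can also be sketched in base R. This is a minimal illustration rather than the mvtnorm code: rmvt.sketch is a hypothetical name, and the normal vectors are generated with a Cholesky factor instead of the SVD route described for the normal case, which assumes $\Sigma$ is positive definite.

```r
# Generate n i.i.d. draws from a p-variate T with nu degrees of freedom,
# location mu and matrix Sigma (hypothetical helper, illustration only).
rmvt.sketch <- function(n, nu, mu, Sigma) {
  p <- length(mu)
  # Step 1: Y_i ~ N_p(0, Sigma); chol() returns upper-triangular C
  # with t(C) %*% C = Sigma (swapped in here for the SVD method)
  Y <- matrix(rnorm(n * p), nrow = n) %*% chol(Sigma)
  # Step 2: S_i ~ chi-square with nu degrees of freedom
  S <- rchisq(n, df = nu)
  # Step 3: Z_i = sqrt(nu) * Y_i / sqrt(S_i) + mu; dividing the matrix
  # by a length-n vector scales row i by sqrt(S_i)
  Z <- sqrt(nu) * Y / sqrt(S)
  sweep(Z, 2, mu, "+")
}

set.seed(1)
Sigma <- matrix(c(1, 0.5, 0.5, 1), 2, 2)
Z <- rmvt.sketch(1000, nu = 5, mu = c(0, 0), Sigma = Sigma)
cor(Z, method = "kendall")[1, 2]   # sample Kendall's tau for the T sample
```

For $\nu > 2$ the covariance of each $Z_i$ is $\nu/(\nu - 2)\,\Sigma$, so the sample covariance is expected to exceed $\Sigma$ while the correlation structure is unchanged.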

B.2 Estimates for ρ

All estimates for ρ require the bivariate sample X to be in matrix form (n × 2). All of the following functions, with the exception of the one for $\hat{\rho}_q$, are wrappers to the cor() function.

B.2.1 Pearson’s Product Moment Correlation Coefficient ($\hat{\rho}_p$)

rhohat.p
