Introduction. Covariance and Correlation. Results from this section

Author: Oscar Flynn
Introduction
• Chapters 5/6 described properties of random variables
  – discrete
  – continuous
• Focused on one variable at a time
  – Doctor visits
  – Time to failure
  – Household income

Covariance and Correlation


Results from this section
• Sometimes we are interested in examining the properties of two or more random variables (RVs) together
• Extend the concepts and definitions from Chapters 5 and 6 to include two RVs
• Will provide the basis for analyzing properties of sample means
• Develop two measures of 'comovement'
  – covariance
  – correlation coefficient
• With these definitions, we can examine properties of linear combinations of random variables

• Sometimes we worry about a weighted average of two or more random variables
  – Combined SAT score (M+V)
  – Income from different jobs
  – Income in household from husband and wife
  – Returns from different stocks in your portfolio
• All of these outcomes are combinations of two or more RVs
• Properties of the combination are a function of the individual series, plus their covariance
• As in any chapter, when we have a random variable, we want to know
  – What is its expected value?
  – What is its variance?

Recall some definitions
• E[x] = expected value of x
  = Σi xi Pr(xi) = µx (for a discrete random variable)
• Var[x] = variance of x = E[(x - µx)²] = σx²
• Theoretical measures of central tendency and variation, respectively

Two Random Variables
• Suppose, however, you are interested in the probability that two events occur
• Pr(A ∩ B) = probability that A and B both happen
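The recalled definitions of E[x] and Var[x] can be sketched for a discrete random variable as follows. The distribution below (a count of doctor visits with made-up probabilities) and the function names are assumptions for illustration only; they do not come from the slides.

```python
# Hypothetical discrete distribution: number of doctor visits in a year.
# The probabilities are illustrative values, not data from the slides.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}


def expected_value(pmf):
    """E[x] = sum over i of x_i * Pr(x_i)."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Var[x] = E[(x - mu_x)^2], computed from the same probabilities."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


mu_x = expected_value(pmf)
var_x = variance(pmf)
print(f"E[x] = {mu_x:.2f}, Var[x] = {var_x:.2f}")
```

Both functions weight each outcome by its probability, so they describe the theoretical distribution rather than a sample.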

• Two events A and B are independent if Pr(A|B) = Pr(A)
• The realization of B conveys no information about the likelihood that A will happen
• Now we want to measure how much dependence there is between two variables
• How much information about A is conveyed in the value of B?

Covariance
• Covariance: a measure of how much x and y vary together
• Defined just like a variance, but with two variables
• Cov(x,y) = E[(x - µx)(y - µy)] = σxy
• Can also be written as
• When x and y are independent, Cov(x,y) = 0
• Let's look at some pictures

• If Cov(X,Y) > 0 and Y > ȳ, we expect X > x̄
• If Cov(X,Y) < 0 and Y > ȳ, we expect X < x̄
• If Cov(X,Y) = 0 and Y > ȳ, we know nothing about the likely value of X in relation to x̄

A Word of Caution
• A non-zero covariance between two variables means the outcomes are related
• This does not, however, mean they are "causally" related
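A minimal sketch of the covariance definition, assuming a small hypothetical joint distribution Pr(x, y). The cross-check E[xy] - µxµy is a standard identity added here for illustration and is not necessarily the alternative form the slides had in mind. The second table is built as Pr(x)Pr(y), so x and y are independent by construction.

```python
# Hypothetical joint distribution Pr(x, y); the numbers are illustrative only.
joint = {
    (0, 0): 0.3, (0, 1): 0.1,
    (1, 0): 0.2, (1, 1): 0.4,
}


def covariance(joint):
    """Cov(x, y) = E[(x - mu_x)(y - mu_y)] for a discrete joint distribution."""
    mu_x = sum(x * p for (x, _), p in joint.items())
    mu_y = sum(y * p for (_, y), p in joint.items())
    return sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())


cov_xy = covariance(joint)

# Cross-check with the standard identity Cov(x, y) = E[xy] - mu_x * mu_y.
mu_x = sum(x * p for (x, _), p in joint.items())
mu_y = sum(y * p for (_, y), p in joint.items())
e_xy = sum(x * y * p for (x, y), p in joint.items())
cov_shortcut = e_xy - mu_x * mu_y

# Independent case: the joint probability is the product of the marginals.
pr_x = {0: 0.4, 1: 0.6}
pr_y = {0: 0.5, 1: 0.5}
independent = {(x, y): pr_x[x] * pr_y[y] for x in pr_x for y in pr_y}
cov_independent = covariance(independent)

print(cov_xy, cov_shortcut, cov_independent)
```

In the first table the large probabilities sit on (0, 0) and (1, 1), so high values of y tend to come with high values of x and the covariance is positive.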

Some notes about expectations
• E[x] = µx
• Suppose Y = a + bX
• We are interested in E[Y]
• E[Y] = E[a + bX]
• Expectations of linear combinations equal linear combinations of expectations
• E[Y] = E[a] + E[bX]
• The constant a is fixed, so E[a] = a
• The constant b is fixed (not random), so it comes outside the expectation
• E[Y] = a + bE[X] = a + bµx
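The rule that constants pass through the expectation can be spelled out for the discrete case, using the definition of E[x] as a probability-weighted sum and the fact that probabilities sum to one. This is a sketch for a discrete X only; for a continuous X the sums would be replaced by integrals.

```latex
\begin{aligned}
E[Y] = E[a + bX] &= \sum_i (a + b x_i)\,\Pr(x_i) \\
                 &= a \sum_i \Pr(x_i) + b \sum_i x_i \Pr(x_i) \\
                 &= a \cdot 1 + b\,\mu_x \\
                 &= a + b\,\mu_x
\end{aligned}
```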

Let's look at variance
• Var(X) = E[(X - E[X])²] = σx²
• Var(X) = E[(X - µx)²] = σx²
• Y = a + bX
• µy = a + bµx
• Var(Y) = E[(Y - µy)²]
• Substitute in the definitions of Y and µy
• Var(Y) = E[(a + bX - a - bµx)²]
• = E[(bX - bµx)²] = E[b²(X - µx)²]
• Again, b is a constant, so it can be taken outside the expectation
• Var(Y) = E[b²(X - µx)²] = b² E[(X - µx)²]
• = b²σx²
• Note that
  – Var(X) = E[(X - µx)²] = σx²
  – Var(Y) = b²σx²

Problem with covariance: scale dependent
• Cov(x,y) = E[(x - µx)(y - µy)] = σxy
• x is income in dollars, y is education
• Compare this to the covariance when income is measured in cents
• z is income in cents
• z = 100x, y is education
• E[z] = E[100x] = 100E[x] = 100µx = µz
• Cov(z,y) = E[(z - µz)(y - µy)]
  = E[(100x - 100µx)(y - µy)]
  = 100 E[(x - µx)(y - µy)]
  = 100σxy
• By changing the scale of x, we have changed the scale of the covariance
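Both rescaling results, Var(a + bX) = b²Var(X) and Cov(100x, y) = 100 Cov(x, y), can be checked numerically. The sketch below treats a handful of hypothetical (income, education) pairs as equally likely outcomes; the data values, the constants a and b, and the helper names are assumptions for illustration.

```python
def mean(values):
    return sum(values) / len(values)


def pop_var(values):
    """Variance with every listed outcome treated as equally likely."""
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values) / len(values)


def pop_cov(xs, ys):
    """Covariance with every listed (x, y) pair treated as equally likely."""
    mu_x, mu_y = mean(xs), mean(ys)
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: income in dollars and years of education.
income_dollars = [20000.0, 35000.0, 50000.0, 80000.0]
education = [10.0, 12.0, 16.0, 18.0]
income_cents = [100.0 * x for x in income_dollars]  # z = 100x

cov_dollars = pop_cov(income_dollars, education)
cov_cents = pop_cov(income_cents, education)
print(cov_cents / cov_dollars)  # ratio of the two covariances

# Var(a + bX) compared with b^2 * Var(X), for arbitrary constants a and b.
a, b = 5.0, 3.0
y = [a + b * x for x in education]
var_x = pop_var(education)
var_y = pop_var(y)
print(var_y, b ** 2 * var_x)
```

The additive constant a drops out of the variance entirely, while the multiplicative constant enters squared in the variance and only once in the covariance.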

Scale Independence: Correlation Coefficient
• corr(x,y) = ρxy = cov(x,y)/(σxσy)
• -1 ≤ ρxy ≤ 1

Consider ρzy where z = 100x
• ρzy = cov(z,y)/(σzσy)
• σz = 100σx
• cov(z,y) = σzy = 100σxy
• ρzy = σzy/(σzσy) = 100σxy/(100σxσy) = σxy/(σxσy) = ρxy
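The cancellation of the factor of 100 can be traced with numbers. The population moments below are hypothetical values chosen only to make the arithmetic easy to follow.

```python
# Hypothetical population moments (illustrative values only).
sigma_x = 4.0    # standard deviation of x
sigma_y = 2.5    # standard deviation of y
sigma_xy = 6.0   # covariance of x and y

rho_xy = sigma_xy / (sigma_x * sigma_y)

# Rescale x: z = 100x, so sigma_z = 100*sigma_x and cov(z, y) = 100*sigma_xy.
scale = 100.0
sigma_z = scale * sigma_x
sigma_zy = scale * sigma_xy
rho_zy = sigma_zy / (sigma_z * sigma_y)

print(rho_xy, rho_zy)
```

The scale factor multiplies both the numerator and the denominator, so any positive rescaling of x leaves the correlation unchanged.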

How to Estimate a Correlation Coefficient
• n observations of x and y: (xi, yi)
• x̄ = sample mean of x
• ȳ = sample mean of y
• sx² = [Σi (xi - x̄)²]/(n-1)
• sy² = [Σi (yi - ȳ)²]/(n-1)
• sxy = Σi (xi - x̄)(yi - ȳ)/(n-1)
• rxy = sxy/(sx sy)

In PC-SAS
proc corr data=two;
  var earn age;
run;

Let's look at some pictures
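The sample formulas above translate directly into a short function. This is a sketch in Python rather than SAS; the `earn` and `age` lists are hypothetical stand-ins for the variables named in the PROC CORR call, not the actual data set.

```python
import math


def sample_corr(xs, ys):
    """Sample correlation r_xy = s_xy / (s_x * s_y), using n - 1 divisors."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s2_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    s2_y = sum((y - y_bar) ** 2 for y in ys) / (n - 1)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    return s_xy / math.sqrt(s2_x * s2_y)


# Hypothetical observations (illustrative values only).
earn = [21.0, 34.0, 29.0, 45.0, 52.0]
age = [23.0, 31.0, 35.0, 44.0, 50.0]

r_earn_age = sample_corr(earn, age)
print(f"r = {r_earn_age:.3f}")
```

Because the same (n-1) divisor appears in the numerator and the denominator, it cancels in the ratio; it is kept here so that each intermediate quantity matches its definition.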

Linear combination of two random variables
• Let z = x + y
• E[z] = E[x + y] = µx + µy
• Var(z) = E[(z - µz)²]
• Add in the definition of z
• Expand the quadratic form
• Var(z) = E[(z - µz)²]
  = E[(x + y - µx - µy)²]
  = E[((x - µx) + (y - µy))²]
  = E[(x - µx)² + (y - µy)² + 2(x - µx)(y - µy)]
  = E[(x - µx)²] + E[(y - µy)²] + E[2(x - µx)(y - µy)]
• Note:
  – E[(x - µx)²] = Var(x) = σx²
  – E[(y - µy)²] = Var(y) = σy²
  – E[(x - µx)(y - µy)] = σxy
• Therefore:
• Var(z) = σx² + σy² + 2σxy

• General model:
• z = a + bx + cy
• E[x] = µx, Var(x) = σx²
• E[y] = µy, Var(y) = σy²
• E[z] = a + bµx + cµy
• Var(z) = b²σx² + c²σy² + 2bcσxy
  = b²σx² + c²σy² + 2bcσxσyρxy
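The general-model formulas can be compared against a direct calculation of the mean and variance of z. The joint distribution, the constants a, b, c, and the function names below are hypothetical choices for illustration.

```python
def joint_moments(joint):
    """Return (mu_x, mu_y, var_x, var_y, cov_xy) for a discrete joint pmf."""
    mu_x = sum(x * p for (x, _), p in joint.items())
    mu_y = sum(y * p for (_, y), p in joint.items())
    var_x = sum((x - mu_x) ** 2 * p for (x, _), p in joint.items())
    var_y = sum((y - mu_y) ** 2 * p for (_, y), p in joint.items())
    cov_xy = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())
    return mu_x, mu_y, var_x, var_y, cov_xy


def lin_comb_moments(a, b, c, mu_x, mu_y, var_x, var_y, cov_xy):
    """Mean and variance of z = a + b*x + c*y from the moments of x and y."""
    mean_z = a + b * mu_x + c * mu_y
    var_z = b ** 2 * var_x + c ** 2 * var_y + 2 * b * c * cov_xy
    return mean_z, var_z


# Hypothetical joint distribution and constants (illustrative only).
joint = {(1, 2): 0.1, (1, 5): 0.3, (4, 2): 0.4, (4, 5): 0.2}
a, b, c = 1.0, 2.0, -3.0

formula_mean, formula_var = lin_comb_moments(a, b, c, *joint_moments(joint))

# Direct calculation: evaluate z at every outcome and apply the definitions.
direct_mean = sum((a + b * x + c * y) * p for (x, y), p in joint.items())
direct_var = sum(
    (a + b * x + c * y - direct_mean) ** 2 * p for (x, y), p in joint.items()
)

print(formula_mean, direct_mean)
print(formula_var, direct_var)
```

A negative c, as used here, flips the sign of the covariance term, which is what drives the difference-in-scores case.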

Example 1: SAT Scores
• M = math score and V = verbal score
• T = total = M + V
• µm = 520, σm² = 120, σm = 10.95
• µv = 480, σv² = 143, σv = 11.95
• Cov(M,V) = 66

• T = M + V
• a = 0, b = 1, c = 1
• E[T] = a + bµm + cµv
• E[T] = µm + µv = 520 + 480 = 1000
• σt² = b²σm² + c²σv² + 2bcσmv
• σt² = Var(T) = σm² + σv² + 2σmv = 120 + 143 + 2(66) = 395
• σt = 19.87

Extension
• What is the correlation coefficient between M and V?
• σm² = 120, σm = 10.95
• σv² = 143, σv = 11.95
• Cov(M,V) = 66
• Cov(M,V) = σmσvρmv
• ρmv = Cov(M,V)/(σmσv) = 66/(10.95 × 11.95) = 0.504
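The arithmetic in Example 1 and its extension can be reproduced from the stated inputs (variances 120 and 143, covariance 66, means 520 and 480). The variable names are assumptions; the sketch works from the variances directly, so small rounding differences from the slide's two-decimal standard deviations are possible.

```python
import math

# Inputs as stated in Example 1.
mu_m, mu_v = 520.0, 480.0
var_m, var_v = 120.0, 143.0
cov_mv = 66.0

# T = M + V, so a = 0, b = 1, c = 1.
mean_t = mu_m + mu_v
var_t = var_m + var_v + 2 * cov_mv
sd_t = math.sqrt(var_t)

# Correlation between M and V from the covariance and standard deviations.
rho_mv = cov_mv / (math.sqrt(var_m) * math.sqrt(var_v))

print(f"E[T] = {mean_t:.0f}, Var(T) = {var_t:.0f}, sd(T) = {sd_t:.2f}")
print(f"corr(M, V) = {rho_mv:.3f}")
```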

Example 2: SAT Prep Course
• T1 = total SAT before test prep course
• T2 = SAT after test prep
• D = difference in scores = T2 - T1
• µ1 = 1000, σ1² = 395, σ1 = 19.87
• µ2 = 1111, σ2² = 399, σ2 = 19.97
• Cov(T1,T2) = 280
• What is the distribution for the gain in SAT scores after taking a prep course?

Example 2: (continued)
• E[D] = a + bµ2 + cµ1
• a = 0, b = 1, c = -1
• E[D] = µ2 - µ1 = 1111 - 1000 = 111
• σd² = Var(D) = b²σ2² + c²σ1² + 2bcσ12
  = σ2² + σ1² - 2σ12 = 399 + 395 - 2(280) = 234
• σd = 15.3

Example 3: Portfolio Returns
• s is return on stocks
• b is return on bonds
• 75% of portfolio in stocks and 25% in bonds
• r = portfolio return = 0.75s + 0.25b
• µs = 12%, σs² = 16.12, σs = 4.01
• µb = 6%, σb² = 4.49, σb = 2.12
• ρsb = 0.26
• a = 0, b = 0.75, c = 0.25
• E[r] = 0.75µs + 0.25µb = 0.75(12) + 0.25(6) = 10.5
• σr² = Var(r) = 0.75²σs² + 0.25²σb² + 2(0.75)(0.25)σsσbρsb
  = 0.75²(4.01)² + 0.25²(2.12)² + 2(0.75)(0.25)(4.01)(2.12)(0.26)
  = 9.045 + 0.281 + 0.829 = 10.155
• σr = 3.19
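Examples 2 and 3 apply the same variance formula, once with a covariance and once with a correlation. The sketch below evaluates both from the stated inputs; the helper name `lin_comb_var` and the variable names are assumptions.

```python
import math


def lin_comb_var(b, c, var_x, var_y, cov_xy):
    """Var(a + b*x + c*y) = b^2 Var(x) + c^2 Var(y) + 2bc Cov(x, y)."""
    return b ** 2 * var_x + c ** 2 * var_y + 2 * b * c * cov_xy


# Example 2: D = T2 - T1, so x = T2 (b = 1) and y = T1 (c = -1).
mean_d = 1111.0 - 1000.0
var_d = lin_comb_var(1.0, -1.0, 399.0, 395.0, 280.0)
sd_d = math.sqrt(var_d)

# Example 3: r = 0.75s + 0.25b, with the covariance recovered from rho.
sd_s, sd_b, rho_sb = 4.01, 2.12, 0.26
cov_sb = sd_s * sd_b * rho_sb
mean_r = 0.75 * 12.0 + 0.25 * 6.0
var_r = lin_comb_var(0.75, 0.25, sd_s ** 2, sd_b ** 2, cov_sb)
sd_r = math.sqrt(var_r)

print(f"Example 2: E[D] = {mean_d:.0f}, Var(D) = {var_d:.0f}, sd = {sd_d:.2f}")
print(f"Example 3: E[r] = {mean_r:.2f}, Var(r) = {var_r:.3f}, sd = {sd_r:.2f}")
```

In Example 2 the positive covariance between the two scores reduces the variance of the difference, because c = -1 makes the cross term negative.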

Example 4: Admission to Kentucky State University


Admission to Kentucky State

• Admission of Freshmen: Academic Criteria. Favorable consideration for admission will be given to accredited secondary school graduates whose college ability test scores and high school grades give promise of success in college. Secondary school students planning to apply for admission to KSU should emphasize the following school courses: English, mathematics, history, and science. They must also meet the University general admission requirements.

• Index of 430 or greater. Kentucky State University requires all admitted students to meet an admission index in order to be admitted unconditionally to the University. The index was established to quantify an assessment of a student’s high school activities and ACT assessment. The admissions index is a numerical score calculated by multiplying the ACT by 10, the grade-point average by 100, and by adding the two sums. The equation is as follows: ACT x 10 + GPA x 100 = index.


• A = ACT and G = GPA
• µA = 20, σA = 5.6
• µG = 2.75, σG = 1.2
• ρAG = 0.45
• Index = I = 10A + 100G
• a = 0, b = 10, c = 100
• µI = 10(20) + 100(2.75) = 200 + 275 = 475
• σI² = 100(5.6)² + 10,000(1.2)² + 2(10)(100)(5.6)(1.2)(0.45)
  = 3,136 + 14,400 + 6,048 = 23,584
• σI = 153.6
• Suppose I is normally distributed
• N[475, 23,584]. What fraction are eligible for enrollment to Kentucky State?

• Pr(not admitted) = Pr[I ≤ 430]
• = Pr[z ≤ (430 - 475)/153.6]
• = Pr[z ≤ -0.29] = 0.3859
• Pr(admitted) = 1 - Pr(not admitted) = 1 - Pr[I ≤ 430] = 1 - 0.3859 = 0.6141
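The admission-index calculation can be evaluated directly from the stated inputs (µA = 20, σA = 5.6, µG = 2.75, σG = 1.2, ρAG = 0.45, cutoff 430). This is a sketch under the slide's assumption that the index is normally distributed; `normal_cdf` is a hypothetical helper built on the standard library's error function, and it does not round z to two decimals the way a printed table lookup does.

```python
import math


def normal_cdf(x, mean, sd):
    """Pr[X <= x] for X ~ Normal(mean, sd^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


# Inputs as stated in the example.
mu_a, sd_a = 20.0, 5.6     # ACT
mu_g, sd_g = 2.75, 1.2     # GPA
rho_ag = 0.45
b, c = 10.0, 100.0         # index I = 10*A + 100*G
cutoff = 430.0

mean_i = b * mu_a + c * mu_g
var_i = b ** 2 * sd_a ** 2 + c ** 2 * sd_g ** 2 + 2 * b * c * sd_a * sd_g * rho_ag
sd_i = math.sqrt(var_i)

p_not_admitted = normal_cdf(cutoff, mean_i, sd_i)
p_admitted = 1.0 - p_not_admitted

print(f"E[I] = {mean_i:.0f}, Var(I) = {var_i:.0f}, sd = {sd_i:.1f}")
print(f"Pr(admitted) = {p_admitted:.4f}")
```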
