Business Statistics COMPARISONS

Author: Victor Harrell

Business Statistics

TWO 𝜇S OR MEDIANS: COMPARISONS

CONTENTS
• Comparing two samples
• Comparing two unrelated samples
• Comparing the means of two unrelated samples
• Comparing the medians of two unrelated samples
• Old exam question

COMPARING TWO SAMPLES
It often happens that we want to compare two situations
• do I sell more when there is music in my shop?
• is the expensive machine more precise than the cheap one?
• are advertisements on TV and on the internet equally profitable?
• do people buy more on Tuesdays than on Wednesdays?
• in couples, who drinks more: the man or the woman?
• etc.

COMPARING TWO SAMPLES
In all these questions we compare two populations
• Situation 1: two populations (or sub-populations) with the same variable
  • sales in 105 days without music
  • sales in 96 days with music
• Data matrix: two options
  [Figure: the two data-matrix layouts; one of them is marked "SPSS requires this data presentation"]

COMPARING TWO SAMPLES
• Situation 2: one sample with paired observations
  • drinks of the man in 78 couples
  • drinks of the woman in the same 78 couples
• Data matrix: one option only
• Will be discussed in a later lecture

COMPARING TWO UNRELATED SAMPLES
Situation 1
• independent samples/unrelated samples
• introduce symbols for the two random variables
  • e.g., using $X_1$ and $X_2$
    • $X_1$ with sample $X_{1,1}, X_{1,2}, \ldots, X_{1,n_1}$ and $X_2$ with sample $X_{2,1}, X_{2,2}, \ldots, X_{2,n_2}$
  • or using $X$ and $Y$
    • $X$: $X_1, X_2, \ldots, X_{n_X}$ and $Y$: $Y_1, Y_2, \ldots, Y_{n_Y}$
• sample sizes can be different

Or of course using “meaningful” indices: $X_B$ and $X_G$ for Belgium and Germany. Not $B$ and $G$, because we need to stress that it is “about” a variable $X$ (like sales).

COMPARING TWO UNRELATED SAMPLES
We want to test hypotheses such as
• are the means equal?
  • $H_0: \mu_X = \mu_Y$ or $H_0: \mu_1 = \mu_2$ or $H_0: \mu_{X_1} = \mu_{X_2}$ or ...
• are the variances equal?
  • $H_0: \sigma_X^2 = \sigma_Y^2$ or etc.
• are the proportions equal?
  • $H_0: \pi_X = \pi_Y$ or etc.

Also:
• inequalities, like $H_0: \mu_X \geq \mu_Y$
• and non-zero differences, like $H_0: \mu_X = \mu_Y + 85$

COMPARING TWO UNRELATED SAMPLES
Context:
• sample $X_1$: sales in $n_1 = 105$ days without music
• sample $X_2$: sales in $n_2 = 96$ days with music

General idea:
• $X_1 \sim \text{distribution}(\theta_1)$
• $X_2 \sim \text{distribution}(\theta_2)$
• question: is $\theta_1 = \theta_2$?

COMPARING THE MEANS OF TWO UNRELATED SAMPLES
Assumption (for now!):
• $X \sim N(\mu_X; \sigma_X^2)$
• $Y \sim N(\mu_Y; \sigma_Y^2)$
• in words: both samples come from normally distributed populations with known variances

Question
• are $\mu_X$ and $\mu_Y$ different?
• can we test this, on the basis of the (limited) evidence concerning $\bar{x}$ and $\bar{y}$?
• so, can we reject $H_0: \mu_X = \mu_Y$?

To decide
• use $\bar{X} - \bar{Y} \sim N\left(\mu_{\bar{X}-\bar{Y}}, \sigma_{\bar{X}-\bar{Y}}^2\right)$

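COMPARING THE MEANS OF TWO UNRELATED SAMPLES
For one sample, we had
$$\frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} \sim N(0,1)$$
As it turns out, for two samples, we have
$$\frac{(\bar{X} - \bar{Y}) - (\mu_{\bar{X}} - \mu_{\bar{Y}})}{\sigma_{\bar{X}-\bar{Y}}} \sim N(0,1)$$
• $\mu_{\bar{X}} - \mu_{\bar{Y}} = \mu_X - \mu_Y$ follows from the null hypothesis
• for instance $H_0: \mu_X = \mu_Y$ or $H_0: \mu_X - \mu_Y = 85$
• $\bar{x}$ and $\bar{y}$ are obtained from the data
• but what is $\sigma_{\bar{X}-\bar{Y}}$?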

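COMPARING THE MEANS OF TWO UNRELATED SAMPLES
For one sample, we had
$$\sigma_{\bar{X}}^2 = \frac{\sigma_X^2}{n}$$
As it turns out, for two independent samples, we have $\sigma_{\bar{X}-\bar{Y}}^2 = \sigma_{\bar{X}}^2 + \sigma_{\bar{Y}}^2$, so
$$\sigma_{\bar{X}-\bar{Y}} = \sqrt{\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}}$$
• recall that variances add up when $X$ and $Y$ are independent
• e.g., $\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2$ but also $\sigma_{X-Y}^2 = \sigma_X^2 + \sigma_Y^2$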

COMPARING THE MEANS OF TWO UNRELATED SAMPLES
Example

Context:
• do I sell more when there is music in my shop?

Experiment
• on some days the music is turned on, on other days the music is turned off
• you keep track of the sales during each day

Data:
• sample of sales on days without music ($x_1, x_2, \ldots, x_{105}$)
• sample of sales on days with music ($y_1, y_2, \ldots, y_{96}$)

Five step procedure

COMPARING THE MEANS OF TWO UNRELATED SAMPLES
• Step 1:
  • $H_0: \mu_X = \mu_Y$; $H_1: \mu_X \neq \mu_Y$; $\alpha = 0.05$
• Step 2:
  • sample statistic: $\bar{X} - \bar{Y}$
  • reject for “too large” and “too small” values
• Step 3:
  • null distribution: $\frac{(\bar{X}-\bar{Y}) - (\mu_X - \mu_Y)}{\sigma_{\bar{X}-\bar{Y}}} = \frac{\bar{X}-\bar{Y}}{\sigma_{\bar{X}-\bar{Y}}} \sim N(0,1)$
  • valid because ...
• Step 4:
  • $z_{\text{calc}} = \ldots$
  • $z_{\text{crit}} = \ldots$
• Step 5:
  • reject or not reject because ...

In a minute we will supply full details and a worked example ...
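The five steps for the known-variance case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `two_sample_z_test` and all numerical inputs (sample means, population variances) are hypothetical and do not come from the shop data; only the sample sizes 105 and 96 are taken from the example.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(x_bar, y_bar, sigma2_x, sigma2_y, n_x, n_y,
                      delta0=0.0, alpha=0.05):
    """Two-sided z-test of H0: mu_X - mu_Y = delta0, population variances known."""
    # Standard error of X-bar minus Y-bar: variances of independent means add up.
    se = sqrt(sigma2_x / n_x + sigma2_y / n_y)
    # Step 4: calculated and critical values.
    z_calc = (x_bar - y_bar - delta0) / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z_calc)))
    # Step 5: reject H0 when z_calc falls in either tail.
    reject = abs(z_calc) > z_crit
    return z_calc, z_crit, p_value, reject


# Hypothetical inputs, chosen only to make the sketch runnable.
z_calc, z_crit, p_value, reject = two_sample_z_test(
    x_bar=520.0, y_bar=500.0,
    sigma2_x=3600.0, sigma2_y=2500.0,
    n_x=105, n_y=96,
)
print(round(z_calc, 3), round(z_crit, 3), reject)
```

Setting `delta0=85` would test a non-zero difference such as $H_0: \mu_X - \mu_Y = 85$ with the same function.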

COMPARING THE MEANS OF TWO UNRELATED SAMPLES
• But, wait ...
  • ... isn’t it weird to assume that $\sigma_X^2$ and $\sigma_Y^2$ are known, while $\mu_X$ and $\mu_Y$ are not known?
• In reality the population variances will often be unknown as well!
• remember we had the same problem in the one-sample case?
• there we decided to estimate the value of $\sigma^2$ with the value of $s^2$, and paid a price of using the wider $t$-distribution
• here we will do the same: estimate the two $\sigma^2$-values with two $s^2$-values
• and pay the same price: use the $t$-distribution instead of the $z$-distribution

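COMPARING THE MEANS OF TWO UNRELATED SAMPLES
For one sample, we had
$$\frac{\bar{X} - \mu_{\bar{X}}}{S_{\bar{X}}} \sim t_{\text{df}}$$
As it turns out, for two samples, we have
$$\frac{(\bar{X} - \bar{Y}) - (\mu_{\bar{X}} - \mu_{\bar{Y}})}{S_{\bar{X}-\bar{Y}}} \sim t_{\text{df}}$$
• $\mu_{\bar{X}} - \mu_{\bar{Y}} = \mu_X - \mu_Y$ follows from the null hypothesis
• $\bar{x}$ and $\bar{y}$ are obtained from the data
• but what is $s_{\bar{X}-\bar{Y}}$?
• and how to choose df?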

COMPARING THE MEANS OF TWO UNRELATED SAMPLES
Two options for $s_{\bar{X}-\bar{Y}}$:
• 1: estimating $\sigma_X^2$ and $\sigma_Y^2$ from $s_X^2$ and $s_Y^2$ respectively
• 2: assuming $\sigma_X^2 = \sigma_Y^2 = \sigma^2$ and estimating $\sigma^2$ as the weighted average of both sample variances

The two options lead to different values of df

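COMPARING THE MEANS OF TWO UNRELATED SAMPLES
Option 1:
• estimating $\sigma_X^2$ and $\sigma_Y^2$ from $s_X^2$ and $s_Y^2$ respectively
$$s_{\bar{X}-\bar{Y}} = \sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}$$
• testing with the $t$-distribution with
$$\text{df} = \frac{\left(\dfrac{s_X^2}{n_X} + \dfrac{s_Y^2}{n_Y}\right)^2}{\dfrac{\left(s_X^2/n_X\right)^2}{n_X - 1} + \dfrac{\left(s_Y^2/n_Y\right)^2}{n_Y - 1}}$$
• quick rule, but bad approximation: $\text{df} \approx \min(n_X - 1, n_Y - 1)$

Compare to
$$\sigma_{\bar{X}-\bar{Y}} = \sqrt{\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}}$$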
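The Option 1 standard error and its degrees of freedom can be computed as in the sketch below. The function name `separate_variance_se_and_df` and the sample variances are hypothetical choices for illustration; only the sample sizes come from the music example.

```python
from math import sqrt


def separate_variance_se_and_df(s2_x, s2_y, n_x, n_y):
    """Standard error and df when each variance is estimated separately (Option 1)."""
    a = s2_x / n_x  # estimated variance of X-bar
    b = s2_y / n_y  # estimated variance of Y-bar
    se = sqrt(a + b)
    # df formula: squared sum over the sum of squared terms, each divided by (n - 1)
    df = (a + b) ** 2 / (a ** 2 / (n_x - 1) + b ** 2 / (n_y - 1))
    return se, df


# Hypothetical sample variances; sample sizes as in the music example.
n_x, n_y = 105, 96
se, df = separate_variance_se_and_df(s2_x=3600.0, s2_y=2500.0, n_x=n_x, n_y=n_y)

# The quick rule gives a much smaller (more conservative) df than the formula.
quick_df = min(n_x - 1, n_y - 1)
print(round(se, 3), round(df, 1), quick_df)
```

The computed df is generally not an integer, and it lies between the quick-rule value and $n_X + n_Y - 2$.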

COMPARING THE MEANS OF TWO UNRELATED SAMPLES
Option 2:
• estimating the common $\sigma^2$ from both samples
• a “weighted mean” of $s_X^2$ and $s_Y^2$, the pooled variance $s_P^2$
$$s_P^2 = \frac{(n_X - 1)\, s_X^2 + (n_Y - 1)\, s_Y^2}{(n_X - 1) + (n_Y - 1)}$$
• and
$$s_{\bar{X}-\bar{Y}} = \sqrt{\frac{s_P^2}{n_X} + \frac{s_P^2}{n_Y}}$$
• testing with the $t$-distribution with $\text{df} = (n_X - 1) + (n_Y - 1) = n_X + n_Y - 2$

Compare to
$$s_{\bar{X}-\bar{Y}} = \sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}$$
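A minimal sketch of the pooled (Option 2) computation follows. The function name `pooled_se_and_df` and the sample variances are hypothetical; the sample sizes are those of the music example.

```python
from math import sqrt


def pooled_se_and_df(s2_x, s2_y, n_x, n_y):
    """Pooled variance, standard error and df under the equal-variance assumption."""
    df = (n_x - 1) + (n_y - 1)
    # Weighted mean of the two sample variances, with weights (n - 1).
    s2_p = ((n_x - 1) * s2_x + (n_y - 1) * s2_y) / df
    se = sqrt(s2_p / n_x + s2_p / n_y)
    return s2_p, se, df


# Hypothetical sample variances; sample sizes as in the music example.
s2_p, se, df = pooled_se_and_df(s2_x=3600.0, s2_y=2500.0, n_x=105, n_y=96)
print(round(s2_p, 2), round(se, 3), df)
```

Because the pooled variance is a weighted mean, it always lies between the two sample variances, and it equals them when they coincide.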

COMPARING THE MEANS OF TWO UNRELATED SAMPLES
Use of SPSS
[Screenshot: a data set on Computer Anxiety Rating, split by gender]

COMPARING THE MEANS OF TWO UNRELATED SAMPLES
[Screenshot of SPSS output: results split by gender, followed by the results of the $t$-test]

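COMPARING THE MEANS OF TWO UNRELATED SAMPLES
Zoom in
[Screenshot of the SPSS $t$-test table, annotated as follows]
• first row: $t$-test with pooled estimate of $\sigma_X^2 = \sigma_Y^2$
• second row: $t$-test with separate estimates of $\sigma_X^2$ and $\sigma_Y^2$
• columns: value of the $t$-statistic ($t_{\text{calc}}$), degrees of freedom, $p$-value (2-sided)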

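COMPARING THE MEANS OF TWO UNRELATED SAMPLES
And one more thing ...
[Screenshot of the SPSS $t$-test table, annotated as follows]
• test of the assumption of equal variances: $H_0: \sigma_X^2 = \sigma_Y^2$ versus $H_1: \sigma_X^2 \neq \sigma_Y^2$
• $p$-value for this test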

COMPARING THE MEANS OF TWO UNRELATED SAMPLES
For these two tests, we need both $\bar{X}$ and $\bar{Y}$ to be normally distributed
• This means one of the following three:
  • $X$ and $Y$ have normally distributed populations
  • $X$ has a symmetric distribution and $n_X \geq 15$, and the same holds for $Y$
  • $n_X \geq 30$ and $n_Y \geq 30$
• Very similar to the one-sample case!

COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES
• Recall the non-parametric one-sample test for the median
  • the Wilcoxon signed ranks test
  • replacing the values by ranks and testing the sum of the positive ranks
• Can we also develop a non-parametric (rank-order) test for two unrelated samples?
• Yes we can: the Wilcoxon-Mann-Whitney test
  • named after Frank Wilcoxon, Henry Mann, and Donald Whitney
  • also named Wilcoxon (Mann-Whitney) test, Mann-Whitney test, etc.

COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES
• Computational steps of the Wilcoxon-Mann-Whitney test
  • combine both samples ($X$ and $Y$)
  • assign ranks to the combined sample
  • ties get an average rank
  • sum the ranks of both samples separately ($T_X$ and $T_Y$)
  • compare the test statistic $T_X$ (or $T_Y$) to a critical value from the table

COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES
Example (same as before)
• Sample data are collected on the capacity rates (in %) for two factories
  • for factory A, the rates are 71, 82, 77, 94, 88
  • for factory B, the rates are 85, 82, 92, 97
• Are the median operating rates for the two factories the same (at a significance level $\alpha = 0.05$)?

COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES
Example
• data A: $x_i$ ($n_X = 5$)
• data B: $y_i$ ($n_Y = 4$)
• one case of ties (82)
• $T_Y = 24.5$

        Capacity                   Rank
Factory A   Factory B     Factory A   Factory B
   71                         1
   77                         2
   82                         3.5
               82                        3.5
               85                        5
   88                         6
               92                        7
   94                         8
               97                        9
Rank sums:                   20.5       24.5

• a tie: observations 3 and 4 are both 82, so each is assigned rank 3.5
• to facilitate the discussion, we focus on the sample with the smallest sample size
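The ranking steps for the factory example can be reproduced with a short Python sketch. The helper name `average_ranks` is a hypothetical choice for illustration; the data are the capacity rates given in the example.

```python
def average_ranks(values):
    """Return 1-based ranks for values; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Average of the 1-based positions i+1, ..., j+1.
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


factory_a = [71, 82, 77, 94, 88]
factory_b = [85, 82, 92, 97]

# Rank the combined sample, then sum the ranks of each sample separately.
ranks = average_ranks(factory_a + factory_b)
t_x = sum(ranks[:len(factory_a)])
t_y = sum(ranks[len(factory_a):])
print(t_x, t_y)  # 20.5 24.5
```

As a check, the two rank sums add up to $9 \cdot 10 / 2 = 45$, the sum of the ranks 1 to 9.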

COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES
Testing the Wilcoxon-Mann-Whitney $T$ statistic
• using a table of critical values
  • included in tables at exam
• using a normal approximation
  • valid for large samples, when the Wilcoxon-Mann-Whitney table of critical values is not sufficient

COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES
Table of critical values of Wilcoxon statistic
• for $n_1 = n_Y = 4$ (the smaller sample) and $n_2 = n_X = 5$ at $\alpha = 0.05$:
  • $T_{\text{lower}} = 11$, $T_{\text{upper}} = 29$
  • $R_{\text{crit}} = [0, 11] \cup [29, 45]$

COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES
Conclusion from small sample Wilcoxon-Mann-Whitney test
• $T_Y = 24.5$ is between $T_{\text{lower}} = 11$ and $T_{\text{upper}} = 29$
• Therefore, do not reject the null hypothesis ($H_0: M_X = M_Y$) at the 5% level
• There is not enough evidence to conclude that the medians are different

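COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES
Large sample approximation
• Under $H_0$, it can be shown that
  • $E(T_Y) = \dfrac{n_Y (n_X + n_Y + 1)}{2}$
  • $\operatorname{var}(T_Y) = \dfrac{n_X n_Y (n_X + n_Y + 1)}{12}$
• Further, when $n_X \geq 10$ or $n_Y \geq 10$, we use a normal approximation:
  • $T_Y \sim N\left(\dfrac{n_Y (n_X + n_Y + 1)}{2}, \dfrac{n_X n_Y (n_X + n_Y + 1)}{12}\right)$
  • $Z = \dfrac{T_Y - \dfrac{n_Y (n_X + n_Y + 1)}{2}}{\sqrt{\dfrac{n_X n_Y (n_X + n_Y + 1)}{12}}} \sim N(0,1)$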

COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES
Large sample approximation (continued)
• so you can compute
$$z_{\text{calc}} = \frac{T_{Y,\text{calc}} - \dfrac{n_Y (n_X + n_Y + 1)}{2}}{\sqrt{\dfrac{n_X n_Y (n_X + n_Y + 1)}{12}}}$$
• and compare it to $z_{\text{crit}}$ (e.g., $\pm 1.96$)
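The large-sample computation can be sketched as follows. The function name `wmw_z` and the inputs (a rank sum of 140 with sample sizes 12 and 10) are hypothetical values chosen for illustration, not results from the slides.

```python
from math import sqrt
from statistics import NormalDist


def wmw_z(t_y, n_x, n_y):
    """z-score of the rank sum T_Y under H0, using the normal approximation."""
    mean_t = n_y * (n_x + n_y + 1) / 2        # E(T_Y)
    var_t = n_x * n_y * (n_x + n_y + 1) / 12  # var(T_Y)
    return (t_y - mean_t) / sqrt(var_t)


# Hypothetical inputs: rank sum 140 for the Y sample, n_X = 12, n_Y = 10.
z_calc = wmw_z(t_y=140.0, n_x=12, n_y=10)
z_crit = NormalDist().inv_cdf(0.975)  # two-sided test at alpha = 0.05
reject = abs(z_calc) > z_crit
print(round(z_calc, 3), round(z_crit, 3), reject)
```

With these inputs $E(T_Y) = 115$ and $\operatorname{var}(T_Y) = 230$, so the rank sum lies less than two standard deviations above its expected value and $H_0$ is not rejected.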

COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES
Use of SPSS
[Screenshot of SPSS output, annotated as follows]
• $T = 345$
• $z$-score with normal approximation
• $p$-value (2-sided)

OLD EXAM QUESTION 21 May 2015, Q2a