Business Statistics
TWO μ'S OR MEDIANS: COMPARISONS
CONTENTS
• Comparing two samples
• Comparing two unrelated samples
• Comparing the means of two unrelated samples
• Comparing the medians of two unrelated samples
• Old exam question
COMPARING TWO SAMPLES
It often happens that we want to compare two situations:
• do I sell more when there is music in my shop?
• is the expensive machine more precise than the cheap one?
• are advertisements on TV and on the internet equally profitable?
• do people buy more on Tuesdays than on Wednesdays?
• in couples, who drinks more: the man or the woman?
• etc.
COMPARING TWO SAMPLES
In all these questions we compare two populations.
• Situation 1: two populations (or sub-populations) with the same variable
  • sales on 105 days without music
  • sales on 96 days with music
• Data matrix: two options
SPSS requires this data presentation
COMPARING TWO SAMPLES
• Situation 2: one sample with paired observations
  • drinks of the man in 78 couples
  • drinks of the woman in the same 78 couples
• Data matrix: one option only
• Will be discussed in a later lecture
COMPARING TWO UNRELATED SAMPLES
Situation 1
• independent samples/unrelated samples
• introduce symbols for the two random variables
  • e.g., using $X_1$ and $X_2$
    • $X_1$ with sample $X_{1,1}, X_{1,2}, \ldots, X_{1,n_1}$ and $X_2$ with sample $X_{2,1}, X_{2,2}, \ldots, X_{2,n_2}$
  • or using $X$ and $Y$
    • $X$: $X_1, X_2, \ldots, X_{n_X}$ and $Y$: $Y_1, Y_2, \ldots, Y_{n_Y}$
  • sample sizes can be different
Or, of course, using "meaningful" indices: $X_B$ and $X_G$ for Belgium and Germany. Not $B$ and $G$, because we need to stress that it is "about" a variable $X$ (like sales).
COMPARING TWO UNRELATED SAMPLES
We want to test hypotheses such as:
• are the means equal?
  • $H_0: \mu_X = \mu_Y$ or $H_0: \mu_1 = \mu_2$ or $H_0: \mu_{X_1} = \mu_{X_2}$ or ...
• are the variances equal?
  • $H_0: \sigma_X^2 = \sigma_Y^2$, etc.
• are the proportions equal?
  • $H_0: \pi_X = \pi_Y$, etc.
Also:
• inequalities, like $H_0: \mu_X \geq \mu_Y$
• and non-zero differences, like $H_0: \mu_X = \mu_Y + 85$
COMPARING TWO UNRELATED SAMPLES
Context:
• sample $X_1$: sales on $n_1 = 105$ days without music
• sample $X_2$: sales on $n_2 = 96$ days with music
General idea: $X_1 \sim \text{distribution}(\mu_1)$ and $X_2 \sim \text{distribution}(\mu_2)$
• is $\mu_1 = \mu_2$?
COMPARING THE MEANS OF TWO UNRELATED SAMPLES
Assumption (for now!):
• $X \sim N(\mu_X; \sigma_X^2)$
• $Y \sim N(\mu_Y; \sigma_Y^2)$
• in words: both samples come from normally distributed populations with known variances
Question:
• are $\mu_X$ and $\mu_Y$ different?
• can we test this on the basis of the (limited) evidence provided by $\bar{x}$ and $\bar{y}$?
• so, can we reject $H_0: \mu_X = \mu_Y$?
To decide:
• use $\bar{X} - \bar{Y} \sim N(\mu_{\bar{X}-\bar{Y}}, \sigma^2_{\bar{X}-\bar{Y}})$
COMPARING THE MEANS OF TWO UNRELATED SAMPLES
For one sample, we had
$$\frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} \sim N(0,1)$$
As it turns out, for two samples, we have
$$\frac{(\bar{X} - \bar{Y}) - (\mu_{\bar{X}} - \mu_{\bar{Y}})}{\sigma_{\bar{X}-\bar{Y}}} \sim N(0,1)$$
• $\mu_{\bar{X}} - \mu_{\bar{Y}} = \mu_X - \mu_Y$ follows from the null hypothesis
  • for instance $H_0: \mu_X = \mu_Y$ or $H_0: \mu_X - \mu_Y = 85$
• $\bar{x}$ and $\bar{y}$ are obtained from the data
• but what is $\sigma_{\bar{X}-\bar{Y}}$?
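COMPARING THE MEANS OF TWO UNRELATED SAMPLES
For one sample, we had
$$\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$$
As it turns out, for two independent samples, we have $\sigma^2_{\bar{X}-\bar{Y}} = \sigma^2_{\bar{X}} + \sigma^2_{\bar{Y}}$, so
$$\sigma_{\bar{X}-\bar{Y}} = \sqrt{\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}}$$
• recall that variances add up when $X$ and $Y$ are independent
  • e.g., $\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2$ but also $\sigma_{X-Y}^2 = \sigma_X^2 + \sigma_Y^2$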
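The variance-addition rule above can be checked by simulation. The sketch below is a minimal illustration, assuming normal populations with hypothetical parameters (not taken from the lecture). It draws many pairs of independent samples, records $\bar{x} - \bar{y}$ each time, and compares the variance of those differences with $\sigma_X^2/n_X + \sigma_Y^2/n_Y$. The function names `se_diff` and `simulated_var_of_diff` are hypothetical helpers.

```python
import random
import statistics

def se_diff(var_x, var_y, n_x, n_y):
    """Standard deviation of Xbar - Ybar for independent samples with known variances."""
    return (var_x / n_x + var_y / n_y) ** 0.5

def simulated_var_of_diff(mu_x, sd_x, n_x, mu_y, sd_y, n_y, reps=20000, seed=1):
    """Variance of Xbar - Ybar over many simulated pairs of independent normal samples."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu_x, sd_x) for _ in range(n_x))
        ybar = statistics.fmean(rng.gauss(mu_y, sd_y) for _ in range(n_y))
        diffs.append(xbar - ybar)   # note the MINUS sign: the variances still add up
    return statistics.variance(diffs)

# Hypothetical setting: sigma_X = 3, n_X = 10 and sigma_Y = 4, n_Y = 8
theory = se_diff(9, 16, 10, 8) ** 2          # 9/10 + 16/8 = 2.9
sim = simulated_var_of_diff(0, 3, 10, 0, 4, 8)
print(f"theory: {theory:.3f}, simulation: {sim:.3f}")
```

The simulated value should land close to 2.9, even though the statistic is a difference. That is the point of the "variances add up" remark.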
COMPARING THE MEANS OF TWO UNRELATED SAMPLES
Example
Context:
• do I sell more when there is music in my shop?
Experiment:
• on some days the music is turned on, on other days the music is turned off
• you keep track of the sales during each day
Data:
• sample of sales on days with music ($x_1, x_2, \ldots, x_{105}$)
• sample of sales on days without music ($y_1, y_2, \ldots, y_{96}$)
Five-step procedure
COMPARING THE MEANS OF TWO UNRELATED SAMPLES
• Step 1:
  • $H_0: \mu_X = \mu_Y$; $H_1: \mu_X \neq \mu_Y$; $\alpha = 0.05$
• Step 2:
  • sample statistic: $\bar{X} - \bar{Y}$
  • reject for "too large" and "too small" values
• Step 3:
  • null distribution:
$$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{\sigma_{\bar{X}-\bar{Y}}} = \frac{\bar{X} - \bar{Y}}{\sigma_{\bar{X}-\bar{Y}}} \sim N(0,1)$$
  • valid because ...
• Step 4:
  • $z_{\text{calc}} =$ ...
  • $z_{\text{crit}} =$ ...
• Step 5:
  • reject or do not reject because ...
In a minute we will supply full details and a worked example ...
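Steps 3 to 5 of the five-step procedure can be sketched in code for the known-variance case. This is a minimal illustration, not the lecture's worked example. The summary statistics (means 512 and 498, variances 1600 and 1444, sizes 105 and 96) are hypothetical numbers chosen only for the demonstration, and `two_sample_z_test` is a hypothetical helper name. It uses only Python's standard `statistics.NormalDist`.

```python
from statistics import NormalDist

def two_sample_z_test(xbar, ybar, var_x, var_y, n_x, n_y, delta0=0.0, alpha=0.05):
    """Two-sided z-test for H0: mu_X - mu_Y = delta0 with KNOWN population variances."""
    se = (var_x / n_x + var_y / n_y) ** 0.5          # sigma_{Xbar - Ybar}
    z_calc = ((xbar - ybar) - delta0) / se           # Step 4: test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # about 1.96 for alpha = 0.05
    p_value = 2 * (1 - NormalDist().cdf(abs(z_calc)))
    reject = abs(z_calc) > z_crit                    # Step 5: decision
    return z_calc, z_crit, p_value, reject

# Hypothetical numbers (not from the lecture): music days vs. no-music days
z, zc, p, rej = two_sample_z_test(512, 498, 1600, 1444, 105, 96)
print(f"z_calc = {z:.3f}, z_crit = {zc:.3f}, p = {p:.4f}, reject H0: {rej}")
```

Setting `delta0=85` would test a non-zero difference such as $H_0: \mu_X - \mu_Y = 85$.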
COMPARING THE MEANS OF TWO UNRELATED SAMPLES
• But, wait ...
• ... isn't it weird to assume that $\sigma_X^2$ and $\sigma_Y^2$ are known, while $\mu_X$ and $\mu_Y$ are not?
• In reality the population variances will often be unknown as well!
  • remember we had the same problem in the one-sample case?
  • there we decided to estimate the value of $\sigma^2$ with the value of $s^2$
  • and paid a price: using the wider $t$-distribution
  • here we will do the same: estimate the two $\sigma^2$-values with two $s^2$-values
  • and pay the same price: use the $t$-distribution instead of the $z$-distribution
COMPARING THE MEANS OF TWO UNRELATED SAMPLES
For one sample, we had
$$\frac{\bar{X} - \mu_{\bar{X}}}{s_{\bar{X}}} \sim t_{df}$$
As it turns out, for two samples, we have
$$\frac{(\bar{X} - \bar{Y}) - (\mu_{\bar{X}} - \mu_{\bar{Y}})}{s_{\bar{X}-\bar{Y}}} \sim t_{df}$$
• $\mu_{\bar{X}} - \mu_{\bar{Y}} = \mu_X - \mu_Y$ follows from the null hypothesis
• $\bar{x}$ and $\bar{y}$ are obtained from the data
• but what is $s_{\bar{X}-\bar{Y}}$? And how do we choose df?
COMPARING THE MEANS OF TWO UNRELATED SAMPLES
Two options for $s_{\bar{X}-\bar{Y}}$:
• 1: estimating $\sigma_X^2$ and $\sigma_Y^2$ from $s_X^2$ and $s_Y^2$, respectively
• 2: assuming $\sigma_X^2 = \sigma_Y^2 = \sigma^2$ and estimating $\sigma^2$ as the weighted average of both sample variances
The two options lead to different values of df.
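COMPARING THE MEANS OF TWO UNRELATED SAMPLES
Option 1:
• estimating $\sigma_X^2$ and $\sigma_Y^2$ from $s_X^2$ and $s_Y^2$, respectively:
$$s_{\bar{X}-\bar{Y}} = \sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}$$
  Compare to $\sigma_{\bar{X}-\bar{Y}} = \sqrt{\dfrac{\sigma_X^2}{n_X} + \dfrac{\sigma_Y^2}{n_Y}}$
• testing with the $t$-distribution with
$$df = \frac{\left(\dfrac{s_X^2}{n_X} + \dfrac{s_Y^2}{n_Y}\right)^2}{\dfrac{\left(s_X^2/n_X\right)^2}{n_X - 1} + \dfrac{\left(s_Y^2/n_Y\right)^2}{n_Y - 1}}$$
• quick rule, but a bad approximation: $df \approx \min(n_X - 1, n_Y - 1)$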
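Option 1 can be computed directly from raw data. The sketch below is a minimal version assuming two small hypothetical samples `x` and `y`, which are made up for illustration. It returns $t_{\text{calc}}$ and the df from the formula above. `welch_t` is a hypothetical helper name. `statistics.variance` uses the $n-1$ divisor, so it gives $s^2$.

```python
import statistics

def welch_t(x, y, delta0=0.0):
    """Option 1: separate variance estimates; returns t_calc and the df formula above."""
    n_x, n_y = len(x), len(y)
    a = statistics.variance(x) / n_x       # s_X^2 / n_X
    b = statistics.variance(y) / n_y       # s_Y^2 / n_Y
    se = (a + b) ** 0.5                    # s_{Xbar - Ybar}
    t_calc = (statistics.fmean(x) - statistics.fmean(y) - delta0) / se
    df = (a + b) ** 2 / (a ** 2 / (n_x - 1) + b ** 2 / (n_y - 1))
    return t_calc, df

# Hypothetical data, for illustration only
x = [12.1, 14.3, 11.8, 13.5, 15.0, 12.9]
y = [10.2, 11.5, 9.8, 12.0, 10.9]
t_calc, df = welch_t(x, y)
quick_df = min(len(x) - 1, len(y) - 1)   # the quick rule from the slide
print(f"t_calc = {t_calc:.3f}, df = {df:.2f} (quick rule: {quick_df})")
```

The df from the formula always lies between the quick-rule value and $n_X + n_Y - 2$. A $t$ table, or a library such as SciPy (`scipy.stats.ttest_ind(x, y, equal_var=False)`), is still needed for the $p$-value.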
COMPARING THE MEANS OF TWO UNRELATED SAMPLES
Option 2:
• estimating the common $\sigma^2$ from both samples
  • a "weighted mean" of $s_X^2$ and $s_Y^2$, the pooled variance $s_P^2$:
$$s_P^2 = \frac{(n_X - 1)s_X^2 + (n_Y - 1)s_Y^2}{(n_X - 1) + (n_Y - 1)}$$
  • and
$$s_{\bar{X}-\bar{Y}} = \sqrt{\frac{s_P^2}{n_X} + \frac{s_P^2}{n_Y}}$$
  Compare to $s_{\bar{X}-\bar{Y}} = \sqrt{\dfrac{s_X^2}{n_X} + \dfrac{s_Y^2}{n_Y}}$
• testing with the $t$-distribution with $df = (n_X - 1) + (n_Y - 1) = n_X + n_Y - 2$
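Option 2 can be sketched the same way. This is a minimal version under the equal-variance assumption, applied to the same kind of hypothetical data `x` and `y`, which are not from the lecture. `pooled_t` is a hypothetical helper name. The pooled variance always falls between the two sample variances. When $n_X = n_Y$, the two options give the same $t_{\text{calc}}$ and differ only in df.

```python
import statistics

def pooled_t(x, y, delta0=0.0):
    """Option 2: pooled variance (assumes sigma_X^2 = sigma_Y^2); returns t_calc, df, s_P^2."""
    n_x, n_y = len(x), len(y)
    s2_x, s2_y = statistics.variance(x), statistics.variance(y)
    # weighted mean of the sample variances, weights n - 1
    s2_p = ((n_x - 1) * s2_x + (n_y - 1) * s2_y) / ((n_x - 1) + (n_y - 1))
    se = (s2_p / n_x + s2_p / n_y) ** 0.5            # s_{Xbar - Ybar}
    t_calc = (statistics.fmean(x) - statistics.fmean(y) - delta0) / se
    df = n_x + n_y - 2
    return t_calc, df, s2_p

# Hypothetical data, for illustration only
x = [12.1, 14.3, 11.8, 13.5, 15.0, 12.9]
y = [10.2, 11.5, 9.8, 12.0, 10.9]
t_calc, df, s2_p = pooled_t(x, y)
print(f"t_calc = {t_calc:.3f}, df = {df}, s_P^2 = {s2_p:.3f}")
```

In SciPy, `scipy.stats.ttest_ind(x, y)` with its default `equal_var=True` performs this pooled test.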
COMPARING THE MEANS OF TWO UNRELATED SAMPLES
Use of SPSS
A data set on Computer Anxiety Rating, split by gender
COMPARING THE MEANS OF TWO UNRELATED SAMPLES
Results split by gender
Results of $t$-test
COMPARING THE MEANS OF TWO UNRELATED SAMPLES
Zoom in
$t$-test with pooled estimate of $\sigma_X^2 = \sigma_Y^2$
$t$-test with separate estimates of $\sigma_X^2$ and $\sigma_Y^2$
value of the $t$-statistic ($t_{\text{calc}}$)
degrees of freedom
$p$-value (2-sided)
COMPARING THE MEANS OF TWO UNRELATED SAMPLES
And one more thing ...
tests of the assumption of equal variances, $H_0: \sigma_X^2 = \sigma_Y^2$ versus $H_1: \sigma_X^2 \neq \sigma_Y^2$ (in SPSS: Levene's test for equality of variances)
$p$-value for this test
COMPARING THE MEANS OF TWO UNRELATED SAMPLES
For these two tests, we need both $\bar{X}$ and $\bar{Y}$ to be (approximately) normally distributed.
• This means that one of the following three conditions must hold:
  • $X$ and $Y$ have normally distributed populations
  • $X$ has a symmetric distribution and $n_X \geq 15$, and the same holds for $Y$
  • $n_X \geq 30$ and $n_Y \geq 30$
• Very similar to the one-sample case!
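The three conditions above amount to a simple decision rule. The sketch below encodes them literally as stated on the slide. `means_approx_normal` and its boolean arguments are hypothetical names. Whether a population is "normal" or "symmetric" is a judgement the user must supply, for example from a histogram. The code does not test it.

```python
def means_approx_normal(normal_x, normal_y, symmetric_x, symmetric_y, n_x, n_y):
    """Slide's rule of thumb: are Xbar and Ybar (approximately) normally distributed?"""
    both_normal = normal_x and normal_y                                  # condition 1
    both_symmetric_15 = symmetric_x and n_x >= 15 and symmetric_y and n_y >= 15  # condition 2
    both_large = n_x >= 30 and n_y >= 30                                 # condition 3
    return both_normal or both_symmetric_15 or both_large

# Music example sizes: 105 and 96 days, so condition 3 already holds
print(means_approx_normal(False, False, False, False, 105, 96))   # True
```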
COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES
• Recall the non-parametric one-sample test for the median
  • the Wilcoxon signed ranks test
  • replacing the values by ranks and testing the sum of the positive ranks
• Can we also develop a non-parametric (rank-order) test for two unrelated samples?
  • Yes we can: the Wilcoxon-Mann-Whitney test
  • named after Frank Wilcoxon, Henry Mann, and Donald Whitney
  • also called the Wilcoxon (Mann-Whitney) test, the Mann-Whitney test, etc.
COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES
• Computational steps of the Wilcoxon-Mann-Whitney test:
  • combine both samples ($X$ and $Y$)
  • assign ranks to the combined sample
    • ties get an average rank
  • sum the ranks of both samples separately ($T_X$ and $T_Y$)
  • compare the test statistic $T_X$ (or $T_Y$) to a critical value from the table
COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES
Example (same as before)
• Sample data are collected on the capacity rates (in %) for two factories:
  • factory A: the rates are 71, 82, 77, 94, 88
  • factory B: the rates are 85, 82, 92, 97
• Are the median operating rates for the two factories the same (at a significance level $\alpha = 0.05$)?
COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES
Example
• data A: $x_i$ ($n_X = 5$)
• data B: $y_i$ ($n_Y = 4$)
• one case of ties (82)
• $T_Y = 24.5$
A tie: observations 3 and 4 are both 82, so each is assigned rank 3.5. To facilitate the discussion, we focus on the sample with the smallest sample size (factory B).

| Capacity factory A | Rank | Capacity factory B | Rank |
|---|---|---|---|
| 71 | 1 | | |
| 77 | 2 | | |
| 82 | 3.5 | | |
| | | 82 | 3.5 |
| | | 85 | 5 |
| 88 | 6 | | |
| | | 92 | 7 |
| 94 | 8 | | |
| | | 97 | 9 |
| Rank sums: | 20.5 | | 24.5 |
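The computational steps (combine, rank with average ranks for ties, sum per sample) can be sketched in plain Python. This minimal version reproduces the rank sums in the table for the factory data. `average_ranks` and `rank_sums` are hypothetical helper names, not a library API.

```python
def average_ranks(values):
    """Rank 1..N in ascending order; tied values get the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the block of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + 1 + j + 1) / 2          # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sums(x, y):
    """Combine both samples, rank the combined sample, and sum the ranks per sample."""
    ranks = average_ranks(list(x) + list(y))
    return sum(ranks[:len(x)]), sum(ranks[len(x):])

factory_a = [71, 82, 77, 94, 88]
factory_b = [85, 82, 92, 97]
t_x, t_y = rank_sums(factory_a, factory_b)
print(t_x, t_y)   # 20.5 24.5
```

As a quick check, the two rank sums always add up to $N(N+1)/2$, here $9 \cdot 10 / 2 = 45$.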
COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES
Testing the Wilcoxon-Mann-Whitney $T$ statistic:
• using a table of critical values
  • included in the tables at the exam
• using a normal approximation
  • valid for large samples, when the Wilcoxon-Mann-Whitney table of critical values is not sufficient
COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES
Table of critical values of the Wilcoxon statistic
• for $n_x = n_1 = 4$ and $n_y = n_2 = 5$ at $\alpha = 0.05$:
  • $T_{\text{lower}} = 11$, $T_{\text{upper}} = 29$
  • critical region: $[0, 11] \cup [29, 45]$
COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES
Conclusion from the small-sample Wilcoxon-Mann-Whitney test:
• $T_Y = 24.5$ lies between $T_{\text{lower}} = 11$ and $T_{\text{upper}} = 29$
• Therefore, do not reject the null hypothesis ($H_0$: the two population medians are equal) at the 5% level
• There is not enough evidence to conclude that the medians are different
COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES
Large-sample approximation
• Under $H_0$, it can be shown that
  • $E(T_X) = \dfrac{n_X(n_X + n_Y + 1)}{2}$
  • $\operatorname{var}(T_X) = \dfrac{n_X n_Y (n_X + n_Y + 1)}{12}$
• Further, when $n_X \geq 10$ or $n_Y \geq 10$, we use a normal approximation:
  • $T_X \sim N\left(\dfrac{n_X(n_X + n_Y + 1)}{2}, \dfrac{n_X n_Y (n_X + n_Y + 1)}{12}\right)$
  • $Z = \dfrac{T_X - \dfrac{n_X(n_X + n_Y + 1)}{2}}{\sqrt{\dfrac{n_X n_Y (n_X + n_Y + 1)}{12}}} \sim N(0,1)$
COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES
Large-sample approximation (continued)
• so you can compute
$$z_{\text{calc}} = \frac{T_{X,\text{calc}} - \dfrac{n_X(n_X + n_Y + 1)}{2}}{\sqrt{\dfrac{n_X n_Y (n_X + n_Y + 1)}{12}}}$$
• and compare it to $z_{\text{crit}}$ (e.g., $\pm 1.96$)
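The large-sample approximation translates directly into a few lines of code. The sketch below assumes a hypothetical rank sum $T_X = 210$ for samples of sizes 12 and 15. These numbers are not from the lecture and serve only to exercise the formulas for $E(T_X)$ and $\operatorname{var}(T_X)$. `wmw_normal_approx` is a hypothetical helper name. It uses the slide's variance formula without any correction for ties, so results can differ slightly from software that adjusts for ties.

```python
from statistics import NormalDist

def wmw_normal_approx(t_calc, n_x, n_y, alpha=0.05):
    """Large-sample z-test for the rank sum T_X of sample X (no tie correction)."""
    mean_t = n_x * (n_x + n_y + 1) / 2             # E(T_X) under H0
    var_t = n_x * n_y * (n_x + n_y + 1) / 12       # var(T_X) under H0
    z_calc = (t_calc - mean_t) / var_t ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
    return z_calc, abs(z_calc) > z_crit

# Hypothetical rank sum for samples of size 12 and 15 (not from the lecture)
z, reject = wmw_normal_approx(210, 12, 15)
print(f"z_calc = {z:.3f}, reject H0: {reject}")
```

Here $E(T_X) = 168$ and $\operatorname{var}(T_X) = 420$, so $z_{\text{calc}} = 42/\sqrt{420} \approx 2.05$, which exceeds 1.96.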
COMPARING THE MEDIANS OF TWO UNRELATED SAMPLES
Use of SPSS
$T = 345$
$z$-score with normal approximation
$p$-value (2-sided)
OLD EXAM QUESTION
21 May 2015, Q2a