Statistics Lecture 13 Inference for a Population Mean

Author: Dorcas Glenn
The government claims that students earn an average of $4500 during their summer break from studies. A random sample of students gave a sample average of $3975, and a 95% CI was found to be ($3525, $4425). This interval is interpreted to mean that:
1. If the study were to be repeated many times, there is a 95% probability that the true population mean summer earnings is not $4500 as the government claims.
2. Because our specific CI does not contain the value $4500, there is a 95% probability that the true population mean summer earnings is not $4500.
3. If we were to repeat our survey many times, then about 95% of all the CIs will contain the value $4500.
4. If we repeat our survey many times, then about 95% of our confidence intervals will contain the true population value of the mean earnings of students.
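Statement 4 describes the long-run coverage property of a confidence interval, which can be checked by simulation. The following is a minimal sketch, not part of the lecture: the population mean, SD, sample size, and the use of a known-SD (z) interval are all illustrative assumptions chosen to keep the code short.

```python
import random
from statistics import NormalDist, mean

def coverage(true_mean=4000.0, sigma=1000.0, n=25, reps=2000, level=0.95, seed=1):
    """Fraction of simulated confidence intervals that contain the true mean.

    All default values are hypothetical; sigma is treated as known so that a
    z-interval can be used.
    """
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    margin = z_star * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        xbar = mean(sample)
        if xbar - margin <= true_mean <= xbar + margin:
            hits += 1
    return hits / reps

print(coverage())  # should land close to 0.95
```

Each simulated interval either contains the true mean or it does not; the 95% refers to the procedure across repeated samples, not to any single interval.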

Statistics 111 - Lecture 13
Inference for a Population Mean
Two-sample Tests for difference in means


Comparing Two Samples
• Up to now, we have looked at inference for one sample
• Our next focus in this course is comparing the data from two different samples
• For now, we will assume that these two different samples are independent of each other and come from two distinct populations

Population 1: $\mu_1$, $\sigma_1$
Sample 1: $\bar{x}_1$, $s_1$

Population 2: $\mu_2$, $\sigma_2$
Sample 2: $\bar{x}_2$, $s_2$

Blackout Baby Boom Revisited
• Nine months (Monday, August 8th) after Nov 1965 blackout, NY Times claimed an increased birth rate
• Already looked at single two-week sample: found no significant difference from usual rate (430 births/day)
• What if we instead look at difference between weekends and weekdays?

Births per day (calendar layout; "-" marks days outside the month):

Sun   Mon   Tue   Wed   Thu   Fri   Sat
-     452   470   431   448   467   377
344   449   440   457   471   463   405
377   453   499   461   442   444   415
356   470   519   443   449   418   394
399   451   468   432   -     -     -

Weekdays: Mon to Fri (23 days)
Weekends: Sat and Sun (8 days)


Two-Sample Z test
• We want to test the null hypothesis that the two populations have equal means
• H0: $\mu_1 = \mu_2$ or equivalently, $\mu_1 - \mu_2 = 0$
• Two-sided alternative hypothesis: $\mu_1 - \mu_2 \neq 0$
• If we assume our population SDs $\sigma_1$ and $\sigma_2$ are known, we can calculate a two-sample Z statistic:

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}}$$

• We can then calculate a p-value from this Z statistic using the standard normal distribution

Two-Sample Z test for Blackout Data
• To use Z test, we need to assume that our pop. SDs are known: $\sigma_1 = 21.7$ and $\sigma_2 = 24.5$
• We can then calculate a two-sided p-value for Z = 7.5 using the standard normal distribution
• From normal table, P(Z > 7.5) is less than 0.0002, so our p-value = 2 × P(Z > 7.5) is less than 0.0004
• We reject the null hypothesis at $\alpha$-level of 0.05 and conclude there is a significant difference between birth rates on weekends and weekdays
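The Z = 7.5 on this slide can be reproduced from the daily counts. The sketch below assumes the birth table is read as a calendar for a month that begins on a Monday, so that Monday to Friday counts are weekdays and Saturday and Sunday counts are weekends; the variable names are illustrative.

```python
from math import erfc, sqrt
from statistics import mean

# Daily births split by day type (assumes the calendar reading described above).
weekdays = [452, 470, 431, 448, 467, 449, 440, 457, 471, 463, 453, 499,
            461, 442, 444, 470, 519, 443, 449, 418, 451, 468, 432]
weekends = [377, 344, 405, 377, 415, 356, 394, 399]

sigma1, sigma2 = 21.7, 24.5  # treated as known population SDs, as on the slide

# Standard error of the difference in sample means.
se = sqrt(sigma1**2 / len(weekdays) + sigma2**2 / len(weekends))
z = (mean(weekdays) - mean(weekends)) / se

# Two-sided p-value: 2 * P(Z > |z|) equals erfc(|z| / sqrt(2)).
p_two_sided = erfc(abs(z) / sqrt(2))

print(f"z = {z:.2f}, two-sided p = {p_two_sided:.2e}")
```

Using `erfc` rather than `1 - cdf` avoids losing precision when the tail probability is this small.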


Two-Sample t test with unknown variances
• We still want to test the null hypothesis that the two populations have equal means (H0: $\mu_1 - \mu_2 = 0$)
• If $\sigma_1$ and $\sigma_2$ are unknown, then we need to use the sample SDs $s_1$ and $s_2$ instead, which gives us the two-sample T statistic:

$$T = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2 / n_1 + s_2^2 / n_2}}$$

• The p-value is calculated using the t distribution, but what degrees of freedom do we use?
• The exact df formula is complicated and is usually calculated by software
• Simpler and more conservative: set degrees of freedom equal to the smaller of ($n_1 - 1$) or ($n_2 - 1$)
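The lecture does not name the "complicated" df formula. Software commonly uses the Welch-Satterthwaite approximation, so the sketch below assumes that is what is meant and compares it with the conservative rule. The function names are illustrative, and the inputs are the blackout summary values ($s_1 = 21.7$, $n_1 = 23$, $s_2 = 24.5$, $n_2 = 8$).

```python
def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate degrees of freedom (assumed formula)."""
    a, b = s1**2 / n1, s2**2 / n2
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))

def conservative_df(n1, n2):
    """The simpler rule from the slide: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)

approx = welch_df(21.7, 23, 24.5, 8)
simple = conservative_df(23, 8)
print(f"Welch-Satterthwaite df = {approx:.2f}, conservative df = {simple}")
```

The approximate df always falls between the conservative value and $n_1 + n_2 - 2$, which is why the simpler rule gives larger p-values and wider intervals.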

Two-Sample t test for Blackout Data
• To use t test, we need to use our sample standard deviations $s_1 = 21.7$ and $s_2 = 24.5$
• We need to look up the tail probabilities using the t distribution
• Degrees of freedom is the smaller of $n_1 - 1 = 22$ or $n_2 - 1 = 7$


Two-Sample t test for Blackout Data
• From t-table with df = 7, we see that P(T > 7.5) < 0.0005
• If our alternative hypothesis is two-sided, then we know that our p-value < 2 × 0.0005 = 0.001
• We reject the null hypothesis at $\alpha$-level of 0.05 and conclude there is a significant difference between birth rates on weekends and weekdays
• Same result as Z-test, but we are a little more conservative
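The t-table bound can be checked numerically. This is a minimal sketch under two assumptions: the weekday and weekend means (about 456.4 and 383.4 births/day) are computed from the daily counts and are not stated on the slides, and the tail probability is obtained by integrating the t density with Simpson's rule because the Python standard library has no t distribution. The function names are illustrative.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=20000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Summary statistics; the means are computed from the daily counts (assumption).
xbar1, s1, n1 = 456.4, 21.7, 23   # weekdays
xbar2, s2, n2 = 383.4, 24.5, 8    # weekends

t_stat = (xbar1 - xbar2) / sqrt(s1**2 / n1 + s2**2 / n2)
df = min(n1 - 1, n2 - 1)          # conservative df = 7
p_two_sided = 2 * t_upper_tail(t_stat, df)

print(f"t = {t_stat:.2f}, df = {df}, two-sided p = {p_two_sided:.2e}")
```

The statistic is numerically the same as the Z statistic; only the reference distribution changes, which is what makes the t-based p-value larger.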


Two-Sample Confidence Intervals
• In addition to two-sample t-tests, we can also use the t distribution to construct confidence intervals for the mean difference
• When $\sigma_1$ and $\sigma_2$ are unknown, we can form the following 100·C% confidence interval for the mean difference $\mu_1 - \mu_2$:

$$(\bar{x}_1 - \bar{x}_2) \pm t_k^* \sqrt{s_1^2 / n_1 + s_2^2 / n_2}$$

• The critical value $t_k^*$ is calculated from a t distribution with degrees of freedom k
• k is equal to the smaller of ($n_1 - 1$) and ($n_2 - 1$)

Confidence Interval for Blackout Data
• We can calculate a 95% confidence interval for the mean difference between birth rates on weekdays and weekends
• Our critical value $t_k^* = 2.365$ comes from a t distribution with 7 degrees of freedom, so our 95% confidence interval is:

$$(\bar{x}_1 - \bar{x}_2) \pm 2.365 \sqrt{21.7^2 / 23 + 24.5^2 / 8}$$

• Since zero is not contained in this interval, we know the difference is statistically significant!
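The interval itself did not survive in these notes, so the sketch below recomputes it. It assumes weekday and weekend means of about 456.4 and 383.4 births/day, computed from the daily counts rather than quoted from the slide; the SDs, sample sizes, and critical value are the ones given in the lecture.

```python
from math import sqrt

xbar1, s1, n1 = 456.4, 21.7, 23   # weekdays (mean is computed from the table, an assumption)
xbar2, s2, n2 = 383.4, 24.5, 8    # weekends (mean is computed from the table, an assumption)
t_star = 2.365                    # 95% critical value for df = 7, from the slide

diff = xbar1 - xbar2
margin = t_star * sqrt(s1**2 / n1 + s2**2 / n2)
lower, upper = diff - margin, diff + margin

print(f"95% CI for mu1 - mu2: ({lower:.1f}, {upper:.1f})")
```

Both endpoints are well above zero, which agrees with the rejection of the null hypothesis in the two-sample t test.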


Two-Sample t test with unknown variances
• One more alternative: suppose we are comparing two populations that may have different means but have the same standard deviation
• We want to make inferences about the difference between the means when the standard deviation is unknown
• We are assuming that both populations have the same standard deviation, but we have two estimates of the common variance, $s_1^2$ and $s_2^2$ (the two sample variances)
• The best approach is to combine these two estimates into a single, more informative estimator. The pooled estimator of the variance is

$$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$$


Two-Sample t test with unknown variances
The test statistic that should be used in this situation is

$$T = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$$

Calculate the P-value by using the t distribution with $(n_1 + n_2 - 2)$ degrees of freedom and then compare it to the appropriate significance level.

Alternatively, if we are testing a two-sided hypothesis, we can construct the appropriate CI:

$$\left( (\bar{x}_1 - \bar{x}_2) - t^* \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)},\ (\bar{x}_1 - \bar{x}_2) + t^* \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} \right)$$
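The lecture gives no worked example for the pooled test. As an illustration only, the sketch below applies the pooled formulas to the blackout summary statistics, assuming for the sake of the example that weekdays and weekends share a common SD. The means (456.4 and 383.4) are computed from the daily counts, and the function names are hypothetical.

```python
from math import sqrt

def pooled_variance(s1, n1, s2, n2):
    """Pooled estimate of the common variance from two sample SDs."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

def pooled_t(xbar1, s1, n1, xbar2, s2, n2):
    """Pooled two-sample T statistic and its degrees of freedom."""
    sp2 = pooled_variance(s1, n1, s2, n2)
    t = (xbar1 - xbar2) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

sp2 = pooled_variance(21.7, 23, 24.5, 8)
t_stat, df = pooled_t(456.4, 21.7, 23, 383.4, 24.5, 8)
print(f"pooled variance = {sp2:.1f}, t = {t_stat:.2f}, df = {df}")
```

The pooled variance is a weighted average of the two sample variances, with weights $n_1 - 1$ and $n_2 - 1$, so the larger sample dominates the estimate.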

Matched Pairs
• Sometimes the two samples that are being compared are matched pairs (not independent)
• Example: Sentences for crack versus powder cocaine
• We could test for the mean difference between X1 = crack sentences and X2 = powder sentences
• However, we realize that these data are paired: each row of sentences has a matching quantity of cocaine
• Our t-test for two independent samples ignores this relationship


Matched Pairs Test
• First, calculate the difference d = X1 - X2 for each pair
• Then, calculate the mean and SD of the differences d

Quantity   Crack sentence X1   Powder sentence X2   Difference d = X1 - X2
5          70.5                12                   58.5
25         87.5                18                   69.5
100        136                 30                   106.0
200        169.5               37                   132.5
500        211.5               70.5                 141.0
2000       264                 87.5                 176.5
5000       264                 136                  128.0
50000      264                 211.5                52.5
150000     264                 264                  0.0

Matched Pairs Test
• Instead of a two-sample test for the difference between X1 and X2, we do a one-sample test on the difference d
• Null hypothesis: mean difference between the two samples is equal to zero, H0: $\mu_d = 0$ versus Ha: $\mu_d \neq 0$
• Usual test statistic when population SD is unknown:

$$T = \frac{\bar{d} - 0}{s_d / \sqrt{n}} = 5.24$$

• p-value calculated from t-distribution with df = 8
• P(T > 5.24) < 0.0005 so p-value < 0.001
• Difference between crack and powder sentences is statistically significant at $\alpha$-level of 0.05
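The T = 5.24 above can be reproduced directly from the sentence table. A minimal sketch, with illustrative variable names:

```python
from math import sqrt
from statistics import mean, stdev

# Sentences from the table, one entry per cocaine quantity (paired by position).
crack = [70.5, 87.5, 136, 169.5, 211.5, 264, 264, 264, 264]
powder = [12, 18, 30, 37, 70.5, 87.5, 136, 211.5, 264]

# Reduce the two paired samples to one sample of differences.
d = [x1 - x2 for x1, x2 in zip(crack, powder)]
n = len(d)
d_bar, s_d = mean(d), stdev(d)   # stdev uses the n - 1 denominator

t_stat = d_bar / (s_d / sqrt(n))
print(f"mean d = {d_bar:.2f}, SD d = {s_d:.2f}, t = {t_stat:.2f}, df = {n - 1}")
```

Because each difference removes the shared effect of quantity, the SD of the differences is what drives the test, not the SDs of the two original columns.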


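Matched Pairs Confidence Interval
• We can also construct a confidence interval for the mean difference $\mu_d$ of matched pairs
• We can just use the confidence intervals we learned for the one-sample, unknown $\sigma$ case:

$$\bar{d} \pm t_{n-1}^* \frac{s_d}{\sqrt{n}}$$

• Example: a 95% confidence interval for the mean difference between crack and powder sentences uses n = 9 pairs, so the critical value comes from a t distribution with 8 degrees of freedom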
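The numeric interval for this example is missing from these notes. The sketch below recomputes it from the differences in the sentence table, assuming the standard t-table critical value of 2.306 for 8 degrees of freedom at 95% confidence (a value not quoted in the lecture).

```python
from math import sqrt
from statistics import mean, stdev

# Differences d = X1 - X2 from the sentence table.
d = [58.5, 69.5, 106.0, 132.5, 141.0, 176.5, 128.0, 52.5, 0.0]
t_star = 2.306  # assumed: 95% critical value for df = 8 from a standard t table

n = len(d)
d_bar = mean(d)
margin = t_star * stdev(d) / sqrt(n)
lower, upper = d_bar - margin, d_bar + margin

print(f"95% CI for mean difference: ({lower:.1f}, {upper:.1f})")
```

The interval excludes zero, which matches the conclusion of the matched pairs test.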

Summary of Two-Sample Tests
• Two independent samples with known $\sigma_1$ and $\sigma_2$: we use the two-sample Z-test, with p-values calculated using the standard normal distribution
• Two independent samples with unknown $\sigma_1$ and $\sigma_2$: we use the two-sample t-test, with p-values calculated using the t distribution with degrees of freedom equal to the smaller of $n_1 - 1$ and $n_2 - 1$
• Two independent samples with unknown $\sigma_1$ and $\sigma_2$ that are assumed to be equal: we use the two-sample t-test with the pooled variance estimator, with the p-value calculated using the t distribution with $n_1 + n_2 - 2$ degrees of freedom
• Two samples that are matched pairs: we first calculate the differences for each pair, and then use our usual one-sample t-test on these differences


Summary of Two-Sample Tests
• JMP!
• How to make a one-sample t-test like we have learned
• How to make two-sample t-tests
