10 Hypothesis Testing


Author: Phillip Goodwin


10.1

Hypothesis Testing Introduction

Def 1: A hypothesis is a statement about a population parameter θ.
Def 2: The two complementary hypotheses in a hypothesis testing problem are called the null hypothesis and the alternative hypothesis. They are denoted by H0 and Ha, respectively.
A hypothesis testing problem is a decision-making problem. We use the data to reach one of two possible decisions:
- reject the null hypothesis (H0) and accept the alternative (Ha)
- accept the null hypothesis (H0) and reject the alternative (Ha)


Motivating Example 1
A political candidate, Jones, claims that he will gain more than 50% of the votes in a city election and so be the winner. If we do not believe Jones’s claim, we would like to seek support for the opposite statement: “the probability of selecting a voter favoring Jones is less than 0.5”. Suppose that n = 15 voters are randomly selected from the city and the number favoring Jones is recorded.
Question: Let p be the proportion of voters favoring Jones. Which claim do the data support?
H0 : p = 0.5 (Jones’s claim)
Ha : p < 0.5 (our belief)

This is a problem of hypothesis testing.


Motivating Example 2
A medical researcher hypothesizes that a new drug A is more effective than the old drug B in reducing blood pressure. She
• randomly selects patients and randomly divides them into two groups
• gives the new drug A to the first group of patients
• gives the old drug B to the second group of patients
• records the treatment effects (responses) to drug A
• records the treatment effects (responses) to drug B
Let µA (or µB) be the average change in a patient’s blood pressure after taking drug A (or B). Which statement is supported by the data?
H0 : µA = µB (no difference between the two drugs)
Ha : µA > µB (new drug A is better)


Motivating Example 3
A machine in a factory must be repaired if it produces more than 10% defectives among the large lot of items produced in a day. A random sample of 100 items from the day’s production contains 15 defectives, and the supervisor says that the machine must be repaired. Does the sample evidence support his decision?
Question: Let p be the proportion of defectives produced in that day. Which one of the following hypotheses should we accept?
H0 : p = 0.10 (no repair is needed)
Ha : p > 0.10 (a repair is needed)


More Examples
Hypothesis tests are widely used in practice:
• An educator claims that two methods of teaching reading are equally effective.
• A political candidate claims that a majority of voters favor his election.
• A drug company claims that its new drug is effective on 80% of people suffering from insomnia.
These hypotheses can be tested statistically using observed data.


Court-room Trial (from Wikipedia)
A statistical test procedure is comparable to a criminal trial:
• A defendant is considered not guilty as long as his/her guilt is not proven.
• The prosecutor tries to prove the guilt of the defendant.
• Only when there is enough incriminating evidence is the defendant convicted.
At the start of the procedure, there are two hypotheses:
H0 : “the defendant is not guilty” (null hypothesis)
Ha : “the defendant is guilty” (alternative hypothesis)
H0 is accepted for the time being, and Ha is the hypothesis one tries to prove. A hypothesis test can be regarded as a judgment of evidence.


Elements of Statistical Tests
Any statistical test of hypotheses is composed of four elements:
1. Null hypothesis H0
2. Alternative hypothesis Ha
3. Test statistic
• a function of the data, on which the statistical decision is based.
4. Rejection region (RR)
• contains the values of the test statistic for which H0 is rejected in favor of Ha.
• For a given sample, if the computed value of the test statistic falls in RR, we reject H0 and accept Ha; if the test statistic value does not fall in RR, we accept H0 and reject Ha.


Test Statistic: Jones Election Example
Let Y be the number of voters favoring Jones among n = 15 voters.
• If Y = 0, which hypothesis is more likely to be true? If Jones is actually favored by at least 50% of the electorate, it is highly improbable to observe Y = 0. Thus we would reject H0 in favor of Ha.
• What if Y = 1? What if Y = 14?
Note that small values of Y give more support to Ha. So one decision rule could be: if Y ≤ k, then we reject H0; otherwise we accept H0 (reject Ha).
• Y is the test statistic.
• The rejection region (RR) is RR = {y : y ≤ k}.


10.2

Type I and Type II Errors

A Type I error is made if H0 is rejected when H0 is true. Let
α = P(making a Type I error)
  = P(reject H0 when H0 is true)
  = P(value of the test statistic is in RR when H0 is true).
A Type II error is made if H0 is accepted when Ha is true. Let
β = P(making a Type II error)
  = P(accept H0 when H0 is false)
  = P(value of the test statistic is not in RR when Ha is true).
α and β measure the risks associated with the two possible erroneous decisions resulting from a statistical test.


Example 10.1: For Jones’s political poll, n = 15 voters were sampled. We wish to test
H0 : p = 0.5 vs Ha : p < 0.5.
The test statistic Y is the number of sampled voters favoring Jones. Suppose we select RR = {y : y ≤ 2} as the rejection region. Calculate α.
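One way to carry out this calculation is sketched below, assuming Y ~ Binomial(15, 0.5) under H0, so that α = P(Y ≤ 2). The helper name `binom_cdf` is illustrative and not part of the notes.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(Y <= k) for Y ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(k + 1))

# Example 10.1: RR = {y <= 2}, n = 15, and p = 0.5 under H0
alpha = binom_cdf(2, 15, 0.5)
print(f"alpha = {alpha:.4f}")
```

The sum has only three terms, (1 + 15 + 105)/2^15 = 121/32768, which rounds to 0.004.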


Example 10.2: Suppose that Jones will receive 30% of the votes (p = 0.3). What is the probability β that the sample will erroneously lead us to conclude that H0 is true and that Jones is going to win? (Is the test good in protecting us from concluding that Jones is a winner if in fact he will lose?)
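A possible calculation, assuming the rejection region RR = {y ≤ 2} from Example 10.1 is still in use: β is the probability that Y falls outside RR when p = 0.3, that is, P(Y > 2). The helper `binom_cdf` is an illustrative name.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(Y <= k) for Y ~ Binomial(n, p)."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(k + 1))

# Type II error: H0 is accepted (Y > 2) although the true p is 0.3
beta = 1 - binom_cdf(2, 15, 0.3)
print(f"beta = {beta:.3f}")
```

A value of β this close to 1 indicates that the test rarely detects that Jones will lose when p = 0.3.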


Which is More Important: Type I or Type II Error?
• Incorrect decisions cost money, prestige, or time and imply a loss.
• The two types of errors are not symmetric.
In the courtroom trial example,
• The consequence of committing a Type I error: convicting an innocent defendant.
• The consequence of committing a Type II error: acquitting a person who committed the crime.
Which is more serious?


Decision Table for Courtroom Trial Example
Table 1: Type I and Type II Errors for Courtroom Trial

| Decision | H0 is true (truly not guilty) | Ha is true (truly guilty) |
|---|---|---|
| Accept H0 (Acquittal) | Right decision | Wrong decision (Type II Error) |
| Reject H0 (Conviction) | Wrong decision (Type I Error) | Right decision |


How to Find a Good RR
In the Jones election example, how should we choose k? Ideally, we would like to choose k so that both α and β are small.


Example 10.3: Suppose that Jones will receive 10% of the votes (p = 0.1). Calculate β. Compare the result with Example 10.2.


How to Balance α and β
Examples 10.1 to 10.3 show that the test using RR = {y ≤ 2} guarantees a low risk of making a Type I error (α = 0.004), but it does not offer adequate protection against a Type II error. What if we enlarge RR into a new rejection region RR*, i.e., RR ⊂ RR*?
α* = P(test statistic is in RR* when H0 is true) ≥ P(test statistic is in RR when H0 is true) = α.
β* = P(test statistic is not in RR* when Ha is true) ≤ P(test statistic is not in RR when Ha is true) = β.
Similarly, if we shrink RR, then α will decrease and β will increase. Therefore, for a fixed sample size, α and β are inversely related.


Revisit Jones’s Political Poll Example
Example 10.4: For Example 10.1, now assume we use RR = {y : y ≤ 5} as the rejection region. Calculate the level α of the test. Calculate β if p = 0.3. Compare with Examples 10.1 and 10.2.
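The comparison can be sketched numerically as follows, assuming Y ~ Binomial(15, p) and evaluating both rejection regions side by side. The names `binom_cdf` and `results` are illustrative.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(Y <= k) for Y ~ Binomial(n, p)."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(k + 1))

n = 15
results = {}
for k in (2, 5):                      # RR = {y <= k}
    alpha = binom_cdf(k, n, 0.5)      # P(reject H0 | p = 0.5)
    beta = 1 - binom_cdf(k, n, 0.3)   # P(accept H0 | p = 0.3)
    results[k] = (alpha, beta)
    print(f"k = {k}: alpha = {alpha:.3f}, beta = {beta:.3f}")
```

Enlarging the rejection region from {y ≤ 2} to {y ≤ 5} raises α and lowers β, which is the trade-off described in the notes.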


10.3


Common Large-Sample Tests

Consider testing
H0 : θ = θ0 vs Ha : θ > θ0. (upper-tail alternative)
How to perform the test?
• Let θ̂ be some estimator of θ.
• If θ̂ is close to θ0, it seems reasonable to accept H0.
• If the true θ is larger than θ0, then θ̂ is more likely to be large.
Consequently, large values of θ̂ favor rejection of H0 : θ = θ0, or equivalently, acceptance of Ha : θ > θ0. So the rejection region is RR = {θ̂ > k} for some choice of k.


How to Choose k
• First, choose the level α; common values are 0.05 and 0.01.
• Determine k by fixing the Type I error probability equal to α (the level of the test).
If H0 is true, θ̂ has an approximately normal distribution with mean θ0 and standard error σ_θ̂. To achieve a level-α test, we choose k = θ0 + zα σ_θ̂, since
$$P(\hat\theta > \theta_0 + z_\alpha \sigma_{\hat\theta} \mid H_0 \text{ true}) = P\left(\frac{\hat\theta - \theta_0}{\sigma_{\hat\theta}} > z_\alpha\right) = \alpha.$$
Level-α test for upper-tailed alternatives:
- Test statistic: $Z = \dfrac{\hat\theta - \theta_0}{\sigma_{\hat\theta}}$
- Rejection region: RR = {z > zα}. (upper-tail rejection region)


Example 10.5. A vice president in charge of sales claims that their salespeople are averaging no more than 15 sales contacts per week. To check his claim, n = 36 salespeople are selected at random, and the number of contacts made by each is recorded. The mean and variance of the 36 measurements were 17 and 9. Does the evidence contradict the vice president’s claim? Use a test with level α = 0.05.
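A minimal sketch of the large-sample test for this example. It assumes the intended hypotheses are H0 : µ = 15 versus Ha : µ > 15 and that the sample variance can stand in for σ² because n is large; neither is stated explicitly in the example. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, ybar, s2 = 36, 17.0, 9.0   # sample size, sample mean, sample variance
mu0, alpha = 15.0, 0.05       # assumed null value and level

se = sqrt(s2 / n)                             # estimated standard error of the mean
z = (ybar - mu0) / se                         # observed test statistic
z_alpha = NormalDist().inv_cdf(1 - alpha)     # upper-tail critical value
reject = z > z_alpha

print(f"z = {z:.2f}, z_alpha = {z_alpha:.3f}, reject H0: {reject}")
```

Here the observed z lies well inside the upper-tail rejection region, so under these assumptions the data contradict the claim at the 0.05 level.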


Example 10.6. A machine in a factory must be repaired if it produces more than 10% defectives in a day. A random sample of 100 items from the day’s production contains 15 defectives. And the supervisor says that the machine must be repaired. Does the sample evidence support his decision? Use a test with level 0.01.
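A sketch of the corresponding test for a proportion, assuming H0 : p = 0.10 versus Ha : p > 0.10 as in Motivating Example 3 and using the standard error computed under H0. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, defectives = 100, 15
p0, alpha = 0.10, 0.01

p_hat = defectives / n
se = sqrt(p0 * (1 - p0) / n)                  # standard error of p_hat under H0
z = (p_hat - p0) / se                         # observed test statistic
z_alpha = NormalDist().inv_cdf(1 - alpha)     # upper-tail critical value
reject = z > z_alpha

print(f"z = {z:.3f}, z_alpha = {z_alpha:.3f}, reject H0: {reject}")
```

Under these assumptions the statistic does not reach the 0.01 critical value, so at this level the sample does not support the supervisor's decision.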


Level-α Test for Lower-tailed Alternatives
Consider testing
H0 : θ = θ0 vs Ha : θ < θ0. (lower-tail alternative)
Small values of θ̂ favor rejection of H0 : θ = θ0, or equivalently, acceptance of Ha : θ < θ0. The test statistic is
$$Z = \frac{\hat\theta - \theta_0}{\sigma_{\hat\theta}}.$$
The lower-tail rejection region is RR = {z < −zα}.


Level-α Test for Two-tailed Alternative
Consider testing
H0 : θ = θ0 vs Ha : θ ≠ θ0. (two-tailed alternative)
We reject H0 if θ̂ is either much smaller or much larger than θ0. We use the same test statistic
$$Z = \frac{\hat\theta - \theta_0}{\sigma_{\hat\theta}}.$$
The two-tailed rejection region is
RR = {z > zα/2 or z < −zα/2} = {|z| > zα/2}.


Example 10.7 A psychological study was conducted to compare the reaction times of men and women to a stimulus. Independent random samples of 50 men and 50 women were employed in the experiment. The results are shown in Table 10.2. Do the data present sufficient evidence to suggest a difference between true mean reaction times for men and women? Use α = 0.05.

| | Men | Women |
|---|---|---|
| Sample size | n1 = 50 | n2 = 50 |
| Sample mean | ȳ1 = 3.6 seconds | ȳ2 = 3.8 seconds |
| Sample variance | s1² = 0.18 | s2² = 0.14 |
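A sketch of the two-tailed large-sample test for this example, assuming H0 : µ1 − µ2 = 0 versus Ha : µ1 − µ2 ≠ 0 and using the sample variances in place of the population variances. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n1, ybar1, s1_sq = 50, 3.6, 0.18   # men
n2, ybar2, s2_sq = 50, 3.8, 0.14   # women
alpha = 0.05

se = sqrt(s1_sq / n1 + s2_sq / n2)              # standard error of ybar1 - ybar2
z = (ybar1 - ybar2 - 0) / se                    # observed test statistic
z_half = NormalDist().inv_cdf(1 - alpha / 2)    # two-tailed critical value
reject = abs(z) > z_half

print(f"z = {z:.2f}, z_alpha/2 = {z_half:.3f}, reject H0: {reject}")
```

The observed |z| exceeds the two-tailed critical value, so under these assumptions the data suggest a difference in mean reaction times at α = 0.05.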


Which Alternative To Use
There are three alternatives to use for a test:
- upper-tail alternative Ha : θ > θ0,
- lower-tail alternative Ha : θ < θ0,
- two-tailed alternative Ha : θ ≠ θ0.
The answer depends on the hypothesis that we seek to support.
• If we are interested only in detecting an increase in the percentage of defectives, we should locate the rejection region in the upper tail of the standard normal distribution.
• If we wish to detect a change in p either above or below p = 0.10, we should locate the rejection region in both tails of the standard normal distribution and employ a two-tailed test.
• The two-sided alternative permits us to detect either the case θ > θ0 or the reverse case θ < θ0; in either case, H0 is false.


10.4


Calculate Type II Error Probabilities β

For the test
H0 : θ = θ0 vs Ha : θ > θ0,
we use the rejection region RR = {θ̂ : θ̂ > k}, where k = θ0 + zα σ_θ̂ for the level-α test. Suppose that the experimenter has in mind a specific alternative θ = θa (where θa > θ0). The probability β of a Type II error is
$$\beta = P(\hat\theta \text{ is not in RR when } H_a \text{ is true}) = P(\hat\theta \le k \text{ when } \theta = \theta_a) = P\left(\frac{\hat\theta - \theta_a}{\sigma_{\hat\theta}} \le \frac{k - \theta_a}{\sigma_{\hat\theta}} \text{ when } \theta = \theta_a\right).$$
If θa is the true value of θ, then (θ̂ − θa)/σ_θ̂ has approximately a N(0,1) distribution. Consequently, β can be determined by finding the corresponding area under a standard normal curve.


Example 10.8. Suppose that the vice president in Example 10.5 wants to be able to detect a difference equal to one call in the mean number of customer calls per week. That is, he wishes to test H0 : µ = 15 against Ha : µ = 16. With the data as given in Example 10.5, find β for the test.
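A sketch of the β calculation, assuming the upper-tailed level 0.05 test with n = 36 and the sample variance 9 from Example 10.5 used as σ². Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, s2 = 36, 9.0
mu0, mua, alpha = 15.0, 16.0, 0.05

se = sqrt(s2 / n)                              # standard error of the sample mean
k = mu0 + NormalDist().inv_cdf(1 - alpha) * se # boundary of RR = {ybar > k}
beta = NormalDist().cdf((k - mua) / se)        # P(Ybar <= k when mu = mua)

print(f"k = {k:.3f}, beta = {beta:.3f}")
```

The result is roughly 0.36; a hand calculation with z rounded to two decimals in a normal table may differ slightly in the third decimal place.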


How to Reduce Type II Error
With n fixed, the size of β depends on the distance between θ0 and θa.
• If θa is close to θ0, the true value of θ is difficult to detect, and the probability of accepting H0 when Ha is true tends to be large.
• If θa is far from θ0, the true value is relatively easy to detect, and β is considerably smaller.
For a specified value of θa, the Type II error probability β can be made smaller by choosing a larger sample size n.


10.5

Sample Size Calculation for Z Tests

Suppose we wish to test H0 : µ = µ0 vs Ha : µ > µ0. If we specify
• the desired level of α
• the desired level of β (where β is evaluated when µ = µa with µa > µ0)
then we can choose the sample size n to reach these desired levels. For the rejection region {Ȳ > k}, we solve the following two equations simultaneously:
$$\alpha = P(\bar Y > k \text{ when } \mu = \mu_0) = P\left(\frac{\bar Y - \mu_0}{\sigma/\sqrt{n}} > \frac{k - \mu_0}{\sigma/\sqrt{n}} \text{ when } \mu = \mu_0\right) = P(Z > z_\alpha),$$
$$\beta = P(\bar Y \le k \text{ when } \mu = \mu_a) = P\left(\frac{\bar Y - \mu_a}{\sigma/\sqrt{n}} \le \frac{k - \mu_a}{\sigma/\sqrt{n}} \text{ when } \mu = \mu_a\right) = P(Z \le -z_\beta).$$


Sample Size for Upper-Tailed α-level Test
By simplifying the above two equations, we get
$$\frac{k - \mu_0}{\sigma/\sqrt{n}} = z_\alpha, \qquad \frac{k - \mu_a}{\sigma/\sqrt{n}} = -z_\beta.$$
Solving both equations, we have
$$k = \mu_a - z_\beta \frac{\sigma}{\sqrt{n}}, \qquad n = \frac{(z_\alpha + z_\beta)^2 \sigma^2}{(\mu_a - \mu_0)^2}.$$
Remark: Exactly the same solution would be obtained for a one-tailed alternative with µa < µ0.


Example 10.9. Suppose that the vice president of Example 10.5 wants to test H0 : µ = 15 against Ha : µ = 16 with α = β = 0.05. Find the sample size that will ensure this accuracy. Assume that σ² is approximately 9.
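The sample-size formula n = (zα + zβ)² σ² / (µa − µ0)² can be evaluated as in the following sketch. Rounding up to the next integer is an assumption made here so that both error targets are met; variable names are illustrative.

```python
from math import ceil
from statistics import NormalDist

mu0, mua = 15.0, 16.0
sigma2 = 9.0
alpha = beta = 0.05

z_alpha = NormalDist().inv_cdf(1 - alpha)
z_beta = NormalDist().inv_cdf(1 - beta)

n_exact = (z_alpha + z_beta) ** 2 * sigma2 / (mua - mu0) ** 2
n = ceil(n_exact)   # round up so that alpha and beta are at most 0.05

print(f"n_exact = {n_exact:.1f}, n = {n}")
```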


10.6

P-value

The probability α of a Type I error is also called the significance level, or simply, the level of the test. Some concerns about α are:
• Although small values of α are recommended, the actual value of α to use in an analysis is somewhat arbitrary.
• It is possible for two statisticians to analyze the same data and reach opposite conclusions: one rejects the null hypothesis at α = 0.05 and the other accepts it at α = 0.01.
• The α values of 0.05 or 0.01 are often used out of habit or for the sake of convenience rather than as a result of careful consideration of the ramifications of making a Type I error.


What is a p-value
Definition: The p-value is the smallest level of significance α for which the observed data indicate that the null hypothesis should be rejected.
• The smaller the p-value, the more compelling the evidence that the null hypothesis should be rejected.
• Many scientific journals require researchers to report p-values associated with statistical tests because these values provide the reader with more information than the conventional “rejected or accepted” decision.
• The p-value permits each reader to use his or her own choice for α in deciding whether the observed data should lead to rejection of H0.
If p-value ≤ α, we reject H0 (and accept Ha) at the level α.
If p-value > α, we accept H0 (and reject Ha) at the level α.


Computing p-value
• For the upper-tailed test, use the test statistic W and RR = {w : w ≥ k}. Compute the observed value w0 of the test statistic; then
p-value = P(W ≥ w0 when H0 is true).
• For the lower-tailed test, use the test statistic W and RR = {w : w ≤ k}. Compute the observed value w0 of the test statistic; then
p-value = P(W ≤ w0 when H0 is true).
• For the two-tailed test, use the test statistic W and RR = {w : |w| ≥ k}. Compute the observed value w0 of the test statistic; then
p-value = P(|W| ≥ |w0| when H0 is true).


Example 10.11. Find the p-value for the statistical test of Example 10.7.
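A sketch of the two-tailed p-value for the data of Example 10.7, assuming the large-sample Z statistic with sample variances in place of the population variances. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n1, ybar1, s1_sq = 50, 3.6, 0.18   # men
n2, ybar2, s2_sq = 50, 3.8, 0.14   # women

z0 = (ybar1 - ybar2) / sqrt(s1_sq / n1 + s2_sq / n2)   # observed statistic
# Two-tailed p-value: P(|Z| >= |z0|) under H0
p_value = 2 * (1 - NormalDist().cdf(abs(z0)))

print(f"z0 = {z0:.2f}, p-value = {p_value:.4f}")
```

The p-value is about 0.012, so H0 would be rejected at α = 0.05 but not at α = 0.01.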


10.7


Small Sample t-Test

Assume that Y1, ..., Yn denote a random sample of size n from a normal distribution with unknown mean µ and unknown variance σ². If Ȳ and S denote the sample mean and sample standard deviation, respectively, and if H0 : µ = µ0 is true, then
$$T = \frac{\bar Y - \mu_0}{S/\sqrt{n}}$$
has a t distribution with n − 1 df. As with the Z test, the proper rejection region for the upper-tailed alternative Ha : µ > µ0 is given by
RR = {t > tα},
where tα is such that P(T > tα) = α for a t distribution with n − 1 df.


Small-Sample Test for µ
Assumptions: Y1, ..., Yn constitute a random sample from a normal distribution with E(Yi) = µ.
H0 : µ = µ0.
$$H_a: \begin{cases} \mu > \mu_0 & \text{(upper-tail alternative)} \\ \mu < \mu_0 & \text{(lower-tail alternative)} \\ \mu \ne \mu_0 & \text{(two-tailed alternative)} \end{cases}$$
Test statistic:
$$T = \frac{\bar Y - \mu_0}{S/\sqrt{n}}.$$
Rejection region:
$$\begin{cases} t > t_\alpha & \text{(upper-tail RR)} \\ t < -t_\alpha & \text{(lower-tail RR)} \\ |t| > t_{\alpha/2} & \text{(two-tailed RR)} \end{cases}$$


Example 10.12 A manufacturer has developed a new gunpowder and tested it in eight shells. The resulting muzzle velocities (in feet per second) were:
3005, 2925, 2935, 2965, 2995, 3005, 2937, 2905
Assume muzzle velocities are normally distributed. The manufacturer claims that the new gunpowder produces an average velocity of not less than 3000 feet per second. Do the data provide sufficient evidence to contradict the manufacturer’s claim at the 0.025 significance level?
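A sketch of the t test for this example, assuming the hypotheses H0 : µ = 3000 versus Ha : µ < 3000. The Python standard library has no t quantile function, so the critical value t0.025 = 2.365 with 7 df is hard-coded from a t table. Variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

velocities = [3005, 2925, 2935, 2965, 2995, 3005, 2937, 2905]
mu0 = 3000

n = len(velocities)
ybar = mean(velocities)
s = stdev(velocities)               # sample standard deviation (divisor n - 1)
t = (ybar - mu0) / (s / sqrt(n))    # observed test statistic

T_CRIT = 2.365                      # t_{0.025} with n - 1 = 7 df, from a t table
reject = t < -T_CRIT                # lower-tail rejection region

print(f"ybar = {ybar}, s = {s:.2f}, t = {t:.3f}, reject H0: {reject}")
```

Under these assumptions the statistic falls in the lower-tail rejection region, so the data contradict the manufacturer's claim at the 0.025 level.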


Example 10.13. What is the p-value associated with the statistical test in Example 10.12?


10.8


Testing Hypotheses Concerning Variances

Assumptions: Y1, ..., Yn constitute a random sample from a normal distribution with unknown mean µ and unknown variance σ². We consider testing H0 : σ² = σ0² versus various alternative hypotheses. If H0 is true and σ² = σ0², then
$$\chi^2 = \frac{(n-1)S^2}{\sigma_0^2}$$
has a χ² distribution with n − 1 df. For the level-α test against the upper-tail alternative Ha : σ² > σ0², the rejection region is
RR = {χ² > χ²α}, where P(χ² > χ²α) = α.


Test of Hypotheses Concerning a Population Variance
Assumptions: Y1, ..., Yn constitute a random sample from a normal distribution with E(Yi) = µ, V(Yi) = σ².
Consider testing H0 : σ² = σ0² vs
$$H_a: \begin{cases} \sigma^2 > \sigma_0^2 & \text{(upper-tail alternative)} \\ \sigma^2 < \sigma_0^2 & \text{(lower-tail alternative)} \\ \sigma^2 \ne \sigma_0^2 & \text{(two-tailed alternative)} \end{cases}$$
Test statistic:
$$\chi^2 = \frac{(n-1)S^2}{\sigma_0^2}.$$
Rejection region:
$$\begin{cases} \chi^2 > \chi^2_\alpha & \text{(upper-tail RR)} \\ \chi^2 < \chi^2_{1-\alpha} & \text{(lower-tail RR)} \\ \chi^2 > \chi^2_{\alpha/2} \text{ or } \chi^2 < \chi^2_{1-\alpha/2} & \text{(two-tailed RR)} \end{cases}$$


Example 10.16 A company produces machined engine parts that are supposed to have a diameter variance no larger than 0.0002 (diameters measured in inches). A random sample of ten parts gave a sample variance of 0.0003. Test, at the 5% level, H0 : σ² = 0.0002 vs Ha : σ² > 0.0002. What is the p-value?
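A sketch of this test using only the standard library. The critical value χ²0.05 = 16.919 with 9 df is hard-coded from a chi-square table, and the p-value is approximated by numerically integrating the chi-square density with Simpson's rule; both choices, and the helper names `chi2_pdf` and `chi2_sf`, are assumptions of this sketch rather than part of the notes.

```python
from math import exp, gamma

def chi2_pdf(x, df):
    """Chi-square density with df degrees of freedom (df > 2 assumed)."""
    if x <= 0:
        return 0.0
    return x ** (df / 2 - 1) * exp(-x / 2) / (2 ** (df / 2) * gamma(df / 2))

def chi2_sf(x, df, steps=2000):
    """Approximate P(chi2 > x) by Simpson's rule on [0, x]; steps must be even."""
    h = x / steps
    total = chi2_pdf(0.0, df) + chi2_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_pdf(i * h, df)
    return 1.0 - total * h / 3

n, s2, sigma0_sq = 10, 0.0003, 0.0002
chi2_obs = (n - 1) * s2 / sigma0_sq     # observed test statistic
CHI2_CRIT = 16.919                      # chi2_{0.05} with 9 df, from a table
reject = chi2_obs > CHI2_CRIT
p_value = chi2_sf(chi2_obs, n - 1)

print(f"chi2 = {chi2_obs:.1f}, reject H0: {reject}, p-value = {p_value:.3f}")
```

The observed statistic is below the critical value, and the approximate p-value is larger than 0.10, so H0 is not rejected at the 5% level.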


Test for Comparing Variances of Two Populations
Assumptions: Y11, ..., Y1n1 and Y21, ..., Y2n2 are independent random samples from normal distributions with unknown means and variances Var(Y1i) = σ1² and Var(Y2i) = σ2².
Consider testing H0 : σ1² = σ2² versus Ha : σ1² > σ2². Use the rejection region
$$RR = \left\{ \frac{S_1^2}{S_2^2} > k \right\}.$$
Recall that the statistic
$$F = \frac{(n_1-1)S_1^2}{\sigma_1^2 (n_1-1)} \Big/ \frac{(n_2-1)S_2^2}{\sigma_2^2 (n_2-1)} = \frac{S_1^2 \sigma_2^2}{S_2^2 \sigma_1^2}$$
has an F distribution with n1 − 1 numerator and n2 − 1 denominator df. Under the null hypothesis, F = S1²/S2². For the level-α test, the rejection region is
RR = {F > F_{n1−1, n2−1, α}}, where P(F > F_{n1−1, n2−1, α}) = α.


Example 10.19. Suppose that we wish to compare the variation of parts (in diameters) produced by the company in Example 10.16 with that produced by a competitor. Recall that the sample variance for our company, based on n = 10 diameters, was s1² = 0.0003. In contrast, the sample variance of the diameter measurements for 20 of the competitor’s parts was s2² = 0.0001. Do the data provide sufficient information to indicate a smaller variation in diameters for the competitor? Test with α = 0.05. Compute the p-value.
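A sketch of the F test for this example, assuming H0 : σ1² = σ2² versus Ha : σ1² > σ2². The standard library has no F distribution, so the critical value F0.05 = 2.42 with (9, 19) df is hard-coded from an F table, and the p-value is not computed here. Variable names are illustrative.

```python
n1, s1_sq = 10, 0.0003   # our company
n2, s2_sq = 20, 0.0001   # competitor

f_obs = s1_sq / s2_sq    # observed F statistic with (n1 - 1, n2 - 1) = (9, 19) df
F_CRIT = 2.42            # F_{0.05} with (9, 19) df, from an F table
reject = f_obs > F_CRIT

print(f"F = {f_obs:.2f}, df = ({n1 - 1}, {n2 - 1}), reject H0: {reject}")
```

Under these assumptions the observed ratio exceeds the tabled value, which points to a smaller variance for the competitor at α = 0.05. An exact p-value requires F tables or statistical software.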


Duality between Hypothesis Testing and Confidence Intervals
• For the two-tailed α-level test: if the 100(1 − α)% confidence interval covers θ0, then we accept H0; otherwise we accept Ha : θ ≠ θ0.
• For an α-level test of H0 : θ = θ0 versus Ha : θ > θ0, accept the alternative if θ0 is less than a 100(1 − α)% lower confidence bound for θ.
• For a lower-tailed test, accept Ha when θ0 is larger than a 100(1 − α)% upper confidence bound for θ.


Power of the Test
power(θ) = P(W in RR when the parameter value is θ).
For a value θ in Ha, power(θ) = 1 − β(θ), where β(θ) is the Type II error probability at θ.
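As an illustration, the power function of the Jones poll test can be tabulated, assuming Y ~ Binomial(15, p) and RR = {y ≤ 2} from Example 10.1. The function names `binom_cdf` and `power` are illustrative.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(Y <= k) for Y ~ Binomial(n, p)."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(k + 1))

def power(p, n=15, k=2):
    """P(Y in RR) = P(Y <= k) when the true proportion is p."""
    return binom_cdf(k, n, p)

for p in (0.5, 0.3, 0.1):
    print(f"p = {p}: power = {power(p):.3f}")
```

At p = 0.5 the power equals α, and it grows as p moves further below 0.5, matching the β values in Examples 10.2 and 10.3 through power = 1 − β.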


Neyman-Pearson Lemma
Suppose that we wish to test the simple null hypothesis H0 : θ = θ0 versus the simple alternative hypothesis Ha : θ = θa, based on a random sample Y1, Y2, ..., Yn from a distribution with parameter θ. Let L(θ) be the likelihood of the sample. Then, for a given α, the test that maximizes the power at θa has a rejection region RR determined by
$$\frac{L(\theta_0)}{L(\theta_a)} < k.$$
The value of k is chosen so that the test has the desired value for α. Such a test is a most powerful α-level test for H0 versus Ha.


Likelihood Ratio Tests
Define λ by
$$\lambda = \frac{L(\hat\Omega_0)}{L(\hat\Omega)} = \frac{\max_{\Theta \in \Omega_0} L(\Theta)}{\max_{\Theta \in \Omega} L(\Theta)}.$$
A likelihood ratio test of H0 : Θ ∈ Ω0 versus Ha : Θ ∈ Ωa employs λ as a test statistic, and the RR is determined by λ ≤ k. For large n, −2 ln(λ) has approximately a χ² distribution with r0 − r df, where r0 and r are the numbers of free parameters that are specified by Θ ∈ Ω0 and Θ ∈ Ω, respectively.