Power of the Test for Population Proportion

Author: Frederica Jones
The purpose of examining the power of a significance test is to see how likely it is that we will reject the null hypothesis, given some alternative value of the parameter, when we sample from our population. Remember that when we conduct a significance test we assume the null hypothesis is correct. We can’t accept the null hypothesis, since it is assumed to be correct from the start. How we decide what the null hypothesis should state varies from problem to problem, and by doing the homework examples you can begin to understand how we arrive at such a value. The alternative value is unknown. When we conduct a test of significance we are trying to find evidence against the null hypothesis. Again, the test does not indicate what the alternative value could be. But we also need to be practical.

There are two different types of significance, practical and statistical. When we say a result is statistically significant we are saying that the observed result, or an even more extreme one, would be very rare if the null hypothesis were correct. Who judges what rare is? Every person must make that determination individually, although people in a particular field of study, such as medicine, can often come to an agreement on what counts as rare.

What is practical significance? As the word implies, the observed value must differ enough from the null value that the difference would be deemed important. For example, suppose that current statistics show that candidate A has the support of 35% of the voting public. Let us pretend for a moment that someone can show that by increasing the number of ads candidate A runs by 30%, the support would go up to 37%. Again, I am saying that the observed difference is real. The next question would be, “Do we care?” Is an increase of 2 percentage points important? If you answer no, then the result is not practically significant. What does this have to do with the power of the test?
In order to calculate the power of the test we need to choose an alternative value for the population proportion; keep in mind that we are trying to find evidence against the null value. How do we choose such a value? We pick the value closest to the null value that marks the boundary between differences deemed important and differences that are not.

[Figure: a number line for p with p0 and pa marked. If the true value of p falls between p0 and pa, we would agree the difference is real, yet we would not deem it important enough to care. If p turns out to be one of the values beyond pa (shown in red), we would deem it practically different from p0.]

So in the above picture the value of pa becomes the alternative proportion value we will use to calculate the power.

In order to calculate the power of a significance test we need to know the following: the null hypothesis; the alternative hypothesis; a significance level, α; and an alternative value for p, which I will identify as pa.

I will use the example on page 410 (Stats Data and Models by Velleman, 1st edition) concerning Therapeutic Touch. Therapeutic touch (TT), taught in many schools of nursing, is a therapy in which the practitioner moves her hands near, but not touching, a patient in an attempt to manipulate a “human energy field” (HEF). Therapeutic touch practitioners believe that by adjusting this field they can promote healing. In 1998 the Journal of the American Medical Association published a paper reporting work by a then 9-year-old girl (L. Rosa, E. Rosa, L. Sarner, S. Barrett, “A Close Look at Therapeutic Touch,” JAMA 279(13) [1 April 1998]: 1005-1010). She had performed a simple experiment in which she challenged 15 TT practitioners to detect her unseen hand hovering over their left or right hand (selected by a flip of a coin). The practitioners “warmed up” with a period during which they could see the experimenter’s hand, and each said that they could detect the girl’s human energy field.

In this scenario, since we are trying to show that the practitioners can detect the human energy field, the null hypothesis should state that the practitioners cannot detect the field. A total of 150 tests were run across all of the practitioners. Just to simplify the situation, let us say there is one practitioner, and this person will try to detect the girl’s energy field 150 times. Since we are assuming that this person is guessing, the expected number of correct guesses is 75. In terms of proportions we expect p = 0.5 under H0, since the person has just two choices, left or right hand.
The alternative states Ha: p > 0.5, that is, the practitioners are doing better than just guessing. Next we need a significance level; that is, we need to decide when we will deem a result rare enough to call it statistically significant, which will then mean we have found evidence against the null hypothesis. In the discussion about power on page 420 there is no mention of a significance level, so I will guess that they meant α to be 5%. Next we need to decide what counts as an important change with respect to the null value. On page 421 the discussion mentions that if the value of p turns out to be 75%, this would certainly be deemed significant by most people and worthy of attention. The effect size is then 25%, 0.75 – 0.5. So our alternative is pa = 0.75. So now the question the power calculation will answer is, “How likely are we to reject the null hypothesis, H0: p = 0.5, if the real proportion is 75%?”

In order to do so we need to find the critical value associated with α = 5%, assuming p is really p0. We will use the formula

$$Z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$$

First find the z-score associated with the extreme value of 5%. I find this value is 1.645. (Note: our alternative is to the right of p0, so only values greater than p0 can provide the necessary evidence. Thus the z-score will be positive, 1.645, instead of negative, -1.645.) When I solve the above equation for p-hat I get the following formula.

$$\hat{p}_c = Z\sqrt{\frac{p_0(1 - p_0)}{n}} + p_0$$

The value of p-hat, in this case, is called a critical value, because this value separates the rejection region from the non-rejection region.

$$\hat{p}_c = 1.645\sqrt{\frac{0.5(1 - 0.5)}{150}} + 0.5$$

p̂c = 0.5671. This value signifies that if our actual sample proportion exceeds 0.5671, we would find this enough evidence to reject the null hypothesis at the 5% significance level.

[Figure: the p̂ axis, which contains all possible values of p-hat in [0, 1], along with the likelihood (area above) when p = 0.5. The 5% rejection region lies to the right of the critical value 0.5671.]

Now we can see from the picture below that if the correct value of p is 0.75 it should be quite easy to detect when we collect a sample of 150 trials. In other words, when we find a sample proportion p̂ > 0.5671 we will reject the null hypothesis.
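The critical-value formula can be sketched in a few lines of Python. This is only an illustration, not part of the textbook example: the function name `critical_value` and its `upper_tail` flag are hypothetical, and `statistics.NormalDist` from the standard library stands in for a z-table.

```python
from math import sqrt
from statistics import NormalDist


def critical_value(p0: float, n: int, alpha: float, upper_tail: bool = True) -> float:
    """Critical sample proportion for a one-sided z-test of H0: p = p0."""
    # z-score that cuts off an area of alpha in the chosen tail
    z = NormalDist().inv_cdf(1 - alpha if upper_tail else alpha)
    # standard deviation of p-hat, computed assuming H0 is true
    sd0 = sqrt(p0 * (1 - p0) / n)
    return p0 + z * sd0


# Therapeutic Touch example: H0: p = 0.5, Ha: p > 0.5, n = 150, alpha = 5%
p_crit = critical_value(p0=0.5, n=150, alpha=0.05)
print(round(p_crit, 3))  # 0.567
```

The result agrees with the hand calculation up to rounding, since the code uses a few more digits of the z-score than 1.645.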

[Figure: sampling distributions of p̂ for p = 0.5 and p = 0.75 with n = 150; the critical value 0.5671 is marked on the p̂ axis.]

That is, our chance of observing p̂ > 0.5671 is nearly 100% if p = 0.75.

$$P(\hat{p} > 0.5671) = P\left(Z > \frac{0.5671 - 0.75}{\sqrt{\dfrac{0.75(1 - 0.75)}{150}}}\right) = P(Z > -5.17) \approx 1$$

The power of this significance test is nearly one, if we assume p = 0.75. In other words, our test would easily detect an effect size of 25%.
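As a rough check on this calculation, the power of an upper-tailed test can be computed directly. The sketch below assumes the same normal approximation used in the text; the function name `power_upper_tail` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_upper_tail(p0: float, pa: float, n: int, alpha: float) -> float:
    """Approximate power of the test H0: p = p0 vs Ha: p > p0 when p is really pa."""
    # critical value, computed assuming the null value p0
    z = NormalDist().inv_cdf(1 - alpha)
    p_crit = p0 + z * sqrt(p0 * (1 - p0) / n)
    # sampling distribution of p-hat, computed assuming the alternative value pa
    sd_a = sqrt(pa * (1 - pa) / n)
    # power = chance that p-hat lands above the critical value
    return 1 - NormalDist(mu=pa, sigma=sd_a).cdf(p_crit)


# Therapeutic Touch example: p0 = 0.5, pa = 0.75, n = 150, alpha = 5%
tt_power = power_upper_tail(0.5, 0.75, 150, 0.05)
print(tt_power > 0.999)  # True
```

Note that the standard deviation changes between the two steps: the critical value uses p0, while the power uses pa.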

Here is something to consider. What happens to the power when the sample size goes up when dealing with proportions? To answer this question let us look at a different scenario. In 2008, when Hillary Rodham Clinton makes her run for the White House, polling organizations will be taking polls all day and all night. Suppose that Hillary has been holding 54% of the popular vote against the Republican candidate Dan Quayle for a number of months now; the value has been verified by many polling organizations over 5 months. Her campaign manager will hold off running negative ads unless Hillary’s support slips below 47%. Now a controversy arises regarding Hillary, and the campaign manager is worried that her support may have changed considerably. The manager wonders if it has slipped to 47% or lower. He will conduct a significance test at the 1% level. The test that will be run is H0: p = 0.54, Ha: p < 0.54. Below are the critical values for different sample sizes, with the accompanying graphs, against the alternative proportion of pa = 0.47.

Sample size, n           200      400      600      800
Standard deviation, σ    0.0352   0.0249   0.0203   0.0176
Critical value, p̂c       0.458    0.482    0.493    0.499
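The table entries can be reproduced with a short loop. This is a minimal sketch under the assumptions stated in the text (p0 = 0.54, a lower-tailed test at α = 1%); the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

p0, alpha = 0.54, 0.01
z = NormalDist().inv_cdf(alpha)  # about -2.326 for a lower-tailed test

rows = []
for n in (200, 400, 600, 800):
    sd0 = sqrt(p0 * (1 - p0) / n)   # standard deviation of p-hat under H0
    p_crit = p0 + z * sd0           # critical value for this sample size
    rows.append((n, round(sd0, 4), round(p_crit, 3)))

for n, sd0, p_crit in rows:
    print(n, sd0, p_crit)
```

Each row printed should match the corresponding column of the table: the standard deviation shrinks as n grows, which pulls the critical value toward 0.54.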

[Figure: sampling distributions of p̂ centered at pa = 0.47 and at p0 = 0.54, shown in two panels, n = 200 and n = 400.]

Notice that as n increases, the overlap between the two curves decreases (why?). The smaller overlap indicates that whatever the correct value of p is, it should be easier to detect. As n increases, so does the power of the significance test.

[Figure: sampling distributions of p̂ centered at pa = 0.47 and at p0 = 0.54, for n = 800.]
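The claim that power grows with n can be checked numerically for the four sample sizes in the table. The sketch below again relies on the normal approximation; `power_lower_tail` is a hypothetical helper name, not something from the textbook.

```python
from math import sqrt
from statistics import NormalDist


def power_lower_tail(p0: float, pa: float, n: int, alpha: float) -> float:
    """Approximate power of the test H0: p = p0 vs Ha: p < p0 when p is really pa."""
    # critical value, computed assuming the null value p0
    p_crit = p0 + NormalDist().inv_cdf(alpha) * sqrt(p0 * (1 - p0) / n)
    # chance that p-hat falls below the critical value when p = pa
    sd_a = sqrt(pa * (1 - pa) / n)
    return NormalDist(mu=pa, sigma=sd_a).cdf(p_crit)


powers = {n: power_lower_tail(0.54, 0.47, n, 0.01) for n in (200, 400, 600, 800)}
for n, pw in powers.items():
    print(n, round(pw, 2))
```

The printed powers should rise steadily from a little over one third at n = 200 to roughly 0.95 at n = 800, which is the shrinking overlap between the two curves expressed as a number.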

The critical value changes as n increases. Since the critical value is calculated assuming the null hypothesis is correct, as n increases the critical value gets closer to p0, the null value (keeping the significance level equal). Let us calculate the power and type II error for the two extreme cases, n = 200 and n = 800.

Case 1 – n = 200. H0: p = 0.54, Ha: p < 0.54, α = 1%.

First let us calculate the critical value, p̂c. We find the corresponding z-score for 1% to be -2.326 (either look it up in a table, where a t-table is the easier method, or use a computer, like Excel, =NORMSINV(0.01)). Next, find the critical value using

$$\hat{p}_c = Z\sqrt{\frac{p_0(1 - p_0)}{n}} + p_0 = -2.326\sqrt{\frac{0.54(1 - 0.54)}{200}} + 0.54$$

p̂c = 0.458. And now we are ready to calculate either the power or the probability of a type II error; both are calculated assuming the alternative value for p is correct, in this case pa = 0.47. In Figure 1, if p̂ < 0.458, then we say there is enough evidence to reject the null hypothesis, even though there is a 1% chance, in the long run, that this decision will not be correct. That is, 1% of the samples collected will result in a p̂ below 0.458 even though p = 0.54.

Figure 1 - Assumed value of p is 0.54, null value; p0. Red area shows rejection region at 1%.

But if p is really 0.47, then the shaded area in Figure 2 depicts the chance of rejecting the null hypothesis, that is, the power. So

$$P(\hat{p} < 0.458) = P\left(Z < \frac{0.458 - 0.47}{\sqrt{\dfrac{0.47(1 - 0.47)}{200}}}\right) = P(Z < -0.34) = 0.3669$$

which is the power.

Figure 2 - Assumed value of p is 0.47, alternative value; pa. Red area shows rejection region, power of the test.

So this says that if p is really 0.47 and we run the significance test with the parameters H0: p = 0.54, Ha: p < 0.54, α = 1%, n = 200, then in the long run, when we gather a sample of 200 and calculate p̂, 36.69% of the time this value will be less than 0.458 (the rejection region for the null hypothesis). Therefore, if we believe p = 0.47, we should not run the significance test with this sample size, because we are not likely to detect the difference. The probability of a type II error is 1 – 0.3669 = 0.6331, which says we will most likely make this type of error.
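The phrase "in the long run" can be made concrete with a small simulation. The sketch below is an illustration only: it draws many samples of 200 with true p = 0.47 and counts how often p̂ falls below 0.458. The seed and the number of repetitions are arbitrary choices, and because the simulation uses exact counts rather than the normal approximation, the result will be close to, but not exactly, 0.3669.

```python
import random

random.seed(0)  # arbitrary seed, used only to make the run repeatable

n, pa, p_crit, reps = 200, 0.47, 0.458, 5000

rejections = 0
for _ in range(reps):
    # one sample of n voters, each supporting the candidate with probability pa
    successes = sum(random.random() < pa for _ in range(n))
    # reject H0 when the sample proportion falls in the rejection region
    if successes / n < p_crit:
        rejections += 1

sim_power = rejections / reps  # fraction of samples that rejected H0
print(sim_power)
```

A value somewhere in the mid-thirties percent is what the power calculation leads us to expect: most samples of 200 fail to reject H0 even though p is really 0.47.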

Case 2 – n = 800. H0: p = 0.54, Ha: p < 0.54, α = 1%.

Figure 3 - Assumed value of p is 0.54, null value; p0. Red area shows rejection region at 1%.

We calculate p̂c = 0.499. Notice how the rejection region has changed compared to Figure 1; a larger sample size means less variability in the value of p̂.

For the power calculation,

$$P(\hat{p} < 0.499) = P\left(Z < \frac{0.499 - 0.47}{\sqrt{\dfrac{0.47(1 - 0.47)}{800}}}\right) = P(Z < 1.643) = 0.9498$$

Figure 4 - Assumed value of p is 0.47, pa. Red area shows power of the test.

So this says that if p is really 0.47 and we run the significance test with the parameters H0: p = 0.54, Ha: p < 0.54, α = 1%, n = 800, then in the long run, when we gather a sample of 800 and calculate p̂, 94.98% of the time this value will be less than 0.499 (the rejection region for the null hypothesis). Therefore, if we believe p = 0.47, this significance test is very likely to detect it.

Now if Hillary’s campaign manager decides to run this test, and let us say he comes up with a value of p̂ = 0.52 (which means that we have not found evidence to reject the null hypothesis), we at least know that, whatever the value of p really is, it is not likely to be 0.47. What about a type II error? P(type II error) = 1 - power of the significance test. The remaining area under the curve drawn for the alternative value of p, outside the rejection region, is the probability of a type II error.


Figure 5 - Assumed value of p is 0.47; non shaded area depicts the probability of a type II error.
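For reference, the type II error probabilities for the two cases follow directly from the powers already calculated. The symbol β is an assumption here (it is the conventional name for this probability, though the notes above do not use it), and the subscripts simply label the sample size.

```latex
\beta = P(\text{type II error}) = 1 - \text{power}

\beta_{n=200} = 1 - 0.3669 = 0.6331
\qquad
\beta_{n=800} = 1 - 0.9498 = 0.0502
```

So with n = 200 we would fail to detect p = 0.47 about 63% of the time, while with n = 800 that happens only about 5% of the time.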
