One-sample inference: Categorical Data

Introduction Confidence intervals Hypothesis tests Conclusion

One-sample inference: Categorical Data Patrick Breheny

October 8

Patrick Breheny

STA 580: Biostatistics I

One-sample vs. two-sample studies

A common research design is to obtain two groups of people and look for differences between them
We will learn how to analyze these types of two-group, or two-sample, studies in a few weeks
We are going to start, however, with a simpler case: the one-sample study


One-sample inference

For example, a researcher collects a random sample of individuals, measures their heights, and wants to make a generalization about the heights in the population
Or a researcher collects a random sample of individuals, determines whether or not they smoke, and wants to make inferences about the percentage of the population that smokes
These are examples of one-sample inference problems – the first involving continuous data, the second involving categorical data


One-sample inference: categorical data

Today’s topic is inference for one-sample categorical data
The object of such inference is percentages:
What percent of patients survive surgery?
What percent of women develop breast cancer?
What percent of people do better on one therapy than another?

Investigators see one percentage in their sample, but what does that tell them about the population percentage? In short, how accurate are percentages?


Approximate approach Exact approach The big picture

The normal approximation

A percentage is a kind of average – the average number of times an event occurs per opportunity
Thus, one approach is to use the central limit theorem, which tells us that:
The expected value of the sample percentage is the population percentage
The standard error of the sample average is equal to the population standard deviation divided by the square root of n
The shape of the sampling distribution is approximately normal (how accurate this is depends on n)


The normal approximation (cont’d)

Statisticians often use p to represent the population proportion, and p̂ to represent the sample proportion
Thus, if we observe p̂ in our sample, the central limit theorem suggests that p̂ is a good estimate of p
If p̂ is a good estimate of the population percentage, then it follows that √(p̂(1 − p̂)) is a good estimate of the population standard deviation
Continuing, a good estimate for the SE is

SE = √(p̂(1 − p̂)/n)


The probability that p and pˆ are close

If the probability that p̂ is within 1 standard error of p is 68%, what is the probability that p is within 1 standard error of p̂?
Also 68%; it’s the same thing, just worded differently
Therefore, if p plus or minus 1.96 standard errors has a 95% chance of containing p̂, then p̂ plus or minus 1.96 standard errors has a 95% chance of containing p


The form of confidence intervals

Thus, x% confidence intervals look like:

(p̂ − z_x%·SE, p̂ + z_x%·SE)

where ±z_x% contains the middle x% of the standard normal distribution
For 95% confidence intervals, then, z is always 1.96


Procedure for finding confidence intervals

To sum up, the central limit theorem tells us that we can create x% confidence intervals by:

#1 Calculate the standard error: SE = √(p̂(1 − p̂)/n)
#2 Determine the values of the normal distribution that contain the middle x% of the data; denote these values ±z_x%
#3 Calculate the confidence interval: (p̂ − z_x%·SE, p̂ + z_x%·SE)
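The three steps above can be sketched in a few lines of Python; `NormalDist` from the standard library supplies the z_x% quantile. The function name and the demonstration values are illustrative, not part of the original slides.

```python
from math import sqrt
from statistics import NormalDist

def approx_ci(p_hat, n, level=0.95):
    """Normal-approximation confidence interval for a proportion."""
    se = sqrt(p_hat * (1 - p_hat) / n)          # step 1: standard error
    z = NormalDist().inv_cdf((1 + level) / 2)   # step 2: middle `level` of the normal
    return p_hat - z * se, p_hat + z * se       # step 3: the interval

# Demonstration: 31 of 39 infants born at 25 weeks survived
lo, hi = approx_ci(31 / 39, 39)
print(f"({100 * lo:.1f}%, {100 * hi:.1f}%)")  # (66.8%, 92.2%)
```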


Example: Survival of premature infants

In order to estimate the survival chances of infants born prematurely, researchers at Johns Hopkins surveyed the records of all premature babies born at their hospital in a three-year period
They found 39 babies who were born at 25 weeks gestation, 31 of whom survived at least 6 months
Their best estimate (point estimate) is that 31/39 = 79.5% of all babies (in other hospitals, in future years) born at 25 weeks gestation would survive at least 6 months, but how accurate is that percentage?


Example: Survival of premature infants (cont’d)

The standard error of the percentage is

SE = √(.795(1 − .795)/39) = 0.0647

So, one way of expressing the accuracy of the estimated percentage is: 79.5% ± 6.5% (this would be about a 68% confidence interval)
Another way would be to calculate the 95% confidence interval:

(79.5 − 1.96(6.47), 79.5 + 1.96(6.47)) = (66.8%, 92.2%)


Problems with the normal approximation

That approach works pretty well, but if you think about it, the distribution of our data isn’t normal – it’s binomial
The normal approximation works because the binomial distribution looks a lot like the normal distribution when n is large and p isn’t close to 0 or 1
Other times, the normal approximation doesn’t work as well

[Figure: binomial probability distributions for n=39, p=0.8 and for n=15, p=0.95]


Example: Survival of premature infants, part II

In their study, the Johns Hopkins researchers also found 29 infants born at 22 weeks gestation, none of whom survived 6 months
The normal approximation is clearly not going to work here, for two reasons:
The estimated standard deviation would be 0
Even if it weren’t, the confidence interval would be symmetric about 0, so half of it would be negative


Using the binomial distribution directly

But why settle for an approximation? The number of infants who survive is going to follow a binomial distribution; why not use that directly?
It seems pretty obvious that the lower limit of our confidence interval should be 0, but how can we use the binomial distribution to find an upper limit?
The upper limit should be a number p such that there would only be a 2.5% probability of observing 0 infants who survive if the probability of surviving really were p
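For x = 0 successes this requirement reduces to P(X = 0) = (1 − p)^29 = 0.025, which can be solved for p directly; a quick check in Python (illustrative, not from the slides):

```python
# Upper limit: the p for which P(0 out of 29 infants survive) = 2.5%.
# Since P(X = 0) = (1 - p)^29, solving (1 - p)^29 = 0.025 gives:
p_upper = 1 - 0.025 ** (1 / 29)
print(f"{100 * p_upper:.1f}%")  # 11.9%
```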


Finding the upper limit for p

[Figure: P(0 out of 29 infants survive) plotted as a function of p, for p between 0 and 0.25]


Exact confidence intervals

Thus, the exact confidence interval for the population percentage of infants who survive after being born at 22 weeks is (0%, 11.9%)
The exact confidence interval for the population percentage of infants who survive after being born at 25 weeks is (63.5%, 90.7%)
Recall that our approximate confidence interval for the population percentage of infants who survive after being born at 25 weeks was (66.8%, 92.2%)
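The slides don’t show an algorithm for these exact (Clopper–Pearson) limits; one way to compute them with only the standard library is to invert the binomial tail probabilities by bisection. A sketch, with illustrative function names:

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def solve(f, lo=0.0, hi=1.0, iters=60):
    """Bisection for the root of a monotone-decreasing f on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) interval for a binomial proportion."""
    # Lower limit: the p at which P(X >= x) = alpha/2 (0 when x = 0)
    lower = 0.0 if x == 0 else solve(lambda p: alpha / 2 - (1 - binom_cdf(x - 1, n, p)))
    # Upper limit: the p at which P(X <= x) = alpha/2 (1 when x = n)
    upper = 1.0 if x == n else solve(lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper

print(exact_ci(0, 29))   # about (0.0, 0.119)
print(exact_ci(31, 39))  # about (0.635, 0.907)
```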


Exact vs. approximate intervals

When n is large and p isn’t close to 0 or 1, it doesn’t really matter whether you choose the approximate or the exact approach
The advantage of the approximate approach is that it’s easy to do by hand
In comparison, finding exact confidence intervals by hand is quite time-consuming


Exact vs. approximate intervals (cont’d)

However, we live in an era with computers, which do the work of finding confidence intervals instantly (as we will see in lab)
If we can obtain the exact answer easily, there is no reason to settle for the approximate answer
That said, in practice, people use and report the approximate approach all the time
Possibly, this is because the analyst knew it wouldn’t matter, but more likely, it’s because the analyst learned the approximate approach in their introductory statistics course and doesn’t know any other way to calculate a confidence interval


Paired samples The sign test The z-test

One-sample hypothesis tests

It is relatively rare to have specific hypotheses about population percentages
One important exception is the collection of paired samples
In a paired sampling design, we collect n pairs of observations and analyze the difference between the pairs


Hypothetical example: A sunblock study

Suppose we are conducting a study investigating whether sunblock A is better than sunblock B at preventing sunburns
The first design that comes to mind is probably to randomly assign sunblock A to one group and sunblock B to a different group
There is nothing wrong with this design, but we can do better


Signal and noise

Generally speaking, our ability to make generalizations about the population depends on two factors: signal and noise
Signal is the magnitude of the difference between the two groups – in the present context, how much better one sunblock is than the other
Noise is the variability present in the outcome from all other sources besides the one you’re interested in – in the sunblock experiment, this would include factors like how sunny the day was, how much time the person spent outside, how easily the person burns, etc.
Hypothesis tests depend on the ratio of signal to noise – how easily we can distinguish the treatment effect from all other sources of variability


Signal to noise ratio

To get a larger signal-to-noise ratio, we must either increase the signal or reduce the variability
The signal is usually determined by nature and out of our control
Instead, we are going to have to reduce the variability/noise
If our sunblock experiment were controlled, we could take steps such as forcing all participants to spend an equal amount of time outside, on the same day, in an equally sunny area, etc.


Person-to-person variability

But what can be done about person-to-person variability (how easily certain people burn)?
A powerful technique for reducing person-to-person variability is pairing
For each person, we can apply sunblock A to one of their arms, and sunblock B to the other arm, and as an outcome, look at the difference between the two arms
In this experiment, the items that we randomly sample from the population are pairs of arms belonging to the same person


Benefits of paired designs

What do we gain from this? As variability goes down:
Confidence intervals become narrower
Hypothesis tests become more powerful

How much narrower? How much more powerful? This depends on the fraction of the total variability that comes from person-to-person variability


More examples

Investigators have come up with all kinds of clever ways to use pairing to cut down on variability:
Before-and-after studies
Crossover studies
Split-plot experiments


Pairing in observational studies

Pairing is also widely used in observational studies:
Twin studies
Matched studies

In a matched study, the investigator will pair up (“match”) subjects on the basis of variables such as age, sex, or race, then analyze the difference between the pairs
In addition to increasing power, pairing in observational studies also eliminates (some of the) potential confounding variables


Cystic fibrosis experiment

You may not have known it at the time, but you have already conducted an exact hypothesis test for paired categorical data in your homework
Recall our cystic fibrosis experiment in which each patient took both drug and placebo and the reduction in their lung function (measured by FVC) over a 25-week period was recorded
This is a crossover study, an example of a paired design


The null hypothesis

The null hypothesis here is that the drug provides no benefit – that whether the patient received drug or placebo has no impact on their lung function
Under the null hypothesis, then, the probability that a patient does better on drug than placebo (let’s call this p) is 50%
So, another, more compact and mathematical way of writing the null hypothesis is p0 = .5 (statisticians like to use a subscript 0 to denote the null hypothesis)


The sign test

We can test this null hypothesis by using our knowledge that, under the null hypothesis, the number of patients who do better on the drug than placebo (x) will follow a binomial distribution with n = 14 and p = 0.5
This approach to hypothesis testing is called the sign test
All we need to do is calculate the p-value (the probability of obtaining results as extreme or more extreme than the one observed in the data, given that the null hypothesis is true)


“As extreme or more extreme”

The result observed in the data was that 11 patients did better on the drug
But what exactly is meant by “as extreme or more extreme” than 11?
It is uncontroversial that 11, 12, 13, and 14 are as extreme or more extreme than 11
But what about 0? Is that more extreme than 11?
Under the null, P(11) = 2.2%, while P(0) = .006%
So 0 is more extreme than 11, but in a different direction


One-sided vs. two-sided tests

Potentially, then, we have two different approaches to calculating this p-value:
Find the probability that x ≥ 11
Find the probability that x ≥ 11 or x ≤ 3 (3 being the number that is as far away from the expected value of 7 as 11 is, but in the other direction)

These are both reasonable things to do, and intelligent people have argued both sides of the debate
However, the statistical and scientific community has for the most part come down in favor of the latter – the so-called “two-sided test”
For this class, all of our tests will be two-sided tests


The sign test

Thus, the p-value of the sign test is

p = P(x ≤ 3) + P(x ≥ 11)
  = P(x = 0) + · · · + P(x = 3) + P(x = 11) + · · · + P(x = 14)
  = .006% + .09% + .6% + 2.2% + 2.2% + .6% + .09% + .006%
  = 5.7%

One might call this result “borderline significant” – it isn’t below .05, but it’s close
These results suggest that the drug has potential, but with a sample size of only 14, it’s hard to say for sure
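This calculation is easy to verify with the standard library; a quick illustrative script (the function name is mine, not from the slides):

```python
from math import comb

def sign_test_pvalue(x, n):
    """Two-sided sign test p-value: probability of a result as extreme
    or more extreme than x under X ~ Binomial(n, 0.5).
    (Sketch; assumes x is not exactly n/2.)"""
    pmf = [comb(n, k) / 2**n for k in range(n + 1)]
    lo = min(x, n - x)                       # mirror-image cutoff for the other tail
    return sum(pmf[:lo + 1]) + sum(pmf[n - lo:])

# Cystic fibrosis experiment: 11 of 14 patients did better on the drug
print(f"{100 * sign_test_pvalue(11, 14):.1f}%")  # 5.7%
```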


Introduction

Thinking about the sign test, what enabled us to calculate the p-value? How were we able to attach a specific number to the probability that x would take on certain values?
We were able to do this because we knew that, under the null, x followed a specific distribution (in that case, the binomial)
This is the most common strategy for developing hypothesis tests – to calculate from the data a quantity for which we know its distribution under the null hypothesis
Note that in general, we would not know the distribution of the number of patients who do better on drug than placebo – only under the null hypothesis


Test statistics

This quantity that we know the distribution of under the null hypothesis is called a test statistic
Because we can calculate the test statistic from the data, and because we know its distribution under the null hypothesis, we can calculate the probability of obtaining a result as extreme or more extreme than the observed result (the p-value)


The z test statistic

As we did before with confidence intervals, we can use the central limit theorem for this problem, now to create a test statistic
From the central limit theorem, we know that z, the number of standard errors away from p that p̂ falls, follows (approximately) a standard normal distribution
Our test statistic, then, is

z = (p̂ − p0)/SE

Having calculated z, we can get p-values from the standard normal distribution
This approach to hypothesis testing is called the z-test


The standard error

What about the standard error? Under the null, the population standard deviation is √(p0(1 − p0)), which means that, under the null,

SE = √(p0(1 − p0)/n)


Procedure for a z-test

The procedure for a z-test is then:

#1 Calculate the standard error: SE = √(p0(1 − p0)/n)
#2 Calculate the test statistic z = (p̂ − p0)/SE
#3 Calculate the area under the normal curve outside ±z
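These three steps translate directly to Python; `NormalDist` from the standard library gives the normal tail area. The function name and demo are illustrative (note the result differs slightly from the slides' 3.2%, which rounds SE to .134 before dividing):

```python
from math import sqrt
from statistics import NormalDist

def z_test(p_hat, p0, n):
    """Two-sided z-test for a single proportion."""
    se = sqrt(p0 * (1 - p0) / n)                  # step 1: SE under the null
    z = (p_hat - p0) / se                         # step 2: test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # step 3: area outside +/- z
    return z, p_value

# Cystic fibrosis experiment: 11 of 14 patients did better on the drug
z, p = z_test(11 / 14, 0.5, 14)
print(f"z = {z:.2f}, p = {p:.4f}")  # z = 2.14, p about 0.033
```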


The z-test for the cystic fibrosis experiment

For the cystic fibrosis experiment, p0 = 0.5
Therefore,

SE = √(p0(1 − p0)/n) = √(0.5(0.5)/14) = .134


The z-test for the cystic fibrosis experiment (cont’d)

The test statistic is therefore

z = (p̂ − p0)/SE = (.786 − .5)/.134 = 2.14

The p-value of this test is therefore 2(1.6%) = 3.2%


Confidence intervals can produce hypothesis tests

It may not be obvious, but there is a close connection between confidence intervals and hypothesis tests
For example, suppose our hypothesis test was to construct a 95% confidence interval and then reject the null hypothesis if p0 was outside the interval
It turns out that this is exactly the same as conducting a hypothesis test with α = 5%


Hypothesis tests can produce confidence intervals

Alternatively, suppose we formed a collection of all the values of p0 for which the p-value of our hypothesis test was above 5%
This would form a 95% confidence interval for p
Note, then, that there is a correspondence between hypothesis testing at significance level α and confidence intervals with confidence level 1 − α
It turns out that the z-test corresponds to the approximate interval, and that the sign test corresponds to the exact interval
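This inversion is easy to see numerically: scan candidate values of p0 and keep those the test does not reject. A small illustrative sketch (the grid scan and function names are mine; and because the z-test computes its SE under p0 while the approximate interval uses p̂, the two intervals agree only approximately):

```python
from math import sqrt
from statistics import NormalDist

def z_test_pvalue(p_hat, p0, n):
    """Two-sided z-test p-value for H0: p = p0."""
    se = sqrt(p0 * (1 - p0) / n)
    return 2 * (1 - NormalDist().cdf(abs((p_hat - p0) / se)))

def interval_by_inversion(p_hat, n, alpha=0.05, step=0.0001):
    """All p0 not rejected at level alpha form a (1 - alpha) interval."""
    kept = [p0 for p0 in (i * step for i in range(1, int(1 / step)))
            if z_test_pvalue(p_hat, p0, n) > alpha]
    return min(kept), max(kept)

# 31 of 39 premature infants survived; compare the inverted interval
# with the approximate interval (0.668, 0.922) from earlier
print(interval_by_inversion(31 / 39, 39))
```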


Conclusion

In general, then, confidence intervals and hypothesis tests always lead to the same conclusion
This is a good thing – it would be confusing otherwise
Furthermore, this is not just true of confidence intervals for one-sample categorical data; it is generally true of all confidence intervals and hypothesis tests
However, the information provided by each technique is different: the confidence interval is an attempt to estimate a parameter, while the hypothesis test is an attempt to measure the evidence against the hypothesis that the parameter is equal to a certain, specific number
