The objective is to make inferences about the (unknown) population proportion p, based on the sample proportion p̂ from the data. For example, we may wish to know the proportion (or percentage) p of all potential voters who will support candidate A in an election or support a new policy. The population here can be viewed as binary (discrete): each individual has only two possible outcomes, denoted by Xi = 1 or Xi = 0 (such as “Yes/No”, “happy/unhappy”, “pass/fail”).

Inference for Population Proportion

We can make inference about the population proportion p based on the sample proportion p̂ = X/n, where X is the total number of cases with “Xi = 1” and n is the total sample size. Note that X ∼ B(n, p); for small sample sizes the distribution of p̂ is not well approximated by a normal distribution, whereas for large sample sizes it is approximately normal. We can make statistical inference about p in two ways: construct a confidence interval for p, or perform a hypothesis test for p.

Inference for population proportion p is based on the sampling distribution of the sample proportion p̂. When np ≥ 10 and n(1 − p) ≥ 10 (i.e., n is large and p is not too close to 0 or 1), we have, approximately,

$$\hat{p} \sim N\left(p,\ \sqrt{\frac{p(1-p)}{n}}\right), \quad\text{or}\quad \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \sim N(0, 1).$$

Note that E(p̂) = p and SD(p̂) = $\sqrt{p(1-p)/n}$.

The above approximate normal distribution is then used to construct confidence intervals or carry out hypothesis tests for p.
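To get a feel for how close the normal approximation is, the following minimal sketch compares an exact binomial probability for p̂ with its normal counterpart. The values n = 100, p = 0.3 and the cutoff 35/100 are illustrative assumptions, not taken from the notes, and no continuity correction is applied.

```python
from math import comb, sqrt
from statistics import NormalDist

def exact_prob(n, p, k):
    """Exact P(X <= k), i.e. P(p_hat <= k/n), for X ~ B(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

def normal_prob(n, p, k):
    """Normal approximation to P(p_hat <= k/n) using mean p and SD sqrt(p(1-p)/n)."""
    sd = sqrt(p * (1 - p) / n)
    return NormalDist(mu=p, sigma=sd).cdf(k / n)

# Illustrative (assumed) values: np = 30 and n(1 - p) = 70, both at least 10.
n, p, k = 100, 0.3, 35
exact = exact_prob(n, p, k)
approx = normal_prob(n, p, k)
print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

The two probabilities should be close but not identical, which is what "approximately" means here.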

CI for Population Proportion

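An approximate (1 − α) × 100% confidence interval (CI) for the population proportion p is

$$\left(\hat{p} - z^*_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \ \hat{p} + z^*_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right), \quad\text{or}\quad \hat{p} \pm z^*_{\alpha/2} \times SE(\hat{p}),$$

where p̂ is the sample proportion, SE(p̂) = $\sqrt{\hat{p}(1-\hat{p})/n}$, and $z^*_{\alpha/2}$ is the 100(1 − α/2)th percentile of N(0, 1). For example, for a 95% CI, α = 0.05 and $z^*_{\alpha/2} = z^*_{0.025}$ = 1.96 ≈ 2. The term $z^*_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ (or $z^*_{\alpha/2}$ × SE(p̂)) is called the margin of error.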
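The interval can be computed directly from the number of successes and the sample size. The following is a minimal sketch using only the Python standard library; the function name `proportion_ci` and the data (520 supporters among 1000 sampled voters) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Approximate CI for a population proportion: p_hat +/- z* x SE(p_hat)."""
    p_hat = successes / n
    alpha = 1 - confidence
    z_star = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when confidence = 0.95
    se = sqrt(p_hat * (1 - p_hat) / n)            # SE(p_hat)
    margin = z_star * se                          # margin of error
    return p_hat - margin, p_hat + margin

# Hypothetical data: 520 of 1000 sampled voters support candidate A.
low, high = proportion_ci(520, 1000)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
```

Raising `confidence` to 0.99 widens the interval, since the critical value increases.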

Interpretation of a 95% CI for p: if we take many samples of size n from the same population and construct a CI for p from each sample using the same formula, about 95% of these CIs will cover (include) the true value p. A 99% CI is interpreted similarly. We call 95% (or 99%) the confidence level. For a given confidence level, the shorter the CI, the more precise the interval estimate of p.
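This repeated-sampling interpretation can be checked by simulation. The sketch below draws many samples from a population with a known proportion and counts how often the interval covers it. The true proportion 0.3, the sample size 200, the number of repetitions and the random seed are all assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage_rate(p_true, n, reps=2000, confidence=0.95, seed=0):
    """Fraction of simulated CIs (p_hat +/- z* x SE) that contain p_true."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    covered = 0
    for _ in range(reps):
        # Draw one sample of size n from a binary population with proportion p_true.
        successes = sum(rng.random() < p_true for _ in range(n))
        p_hat = successes / n
        margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p_true <= p_hat + margin:
            covered += 1
    return covered / reps

rate = coverage_rate(p_true=0.3, n=200)
print(f"Fraction of intervals covering the true p: {rate:.3f}")
```

The printed fraction should be close to, though not exactly, 0.95, because the interval is only approximate and the simulation itself is random.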

The length (or width) l of the CI for p is

$$l = 2 z^*_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 2 \times \text{margin of error}.$$

Thus, the length of the CI depends on the sample size n and the confidence level 1 − α. The above formula can be used to compute a desired sample size n (before collecting data), with a pre-specified length l and confidence level 1 − α, where p̂ may be replaced by a guessed value (e.g., 0.5).
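Solving the length formula for n gives n = (2 z*/l)² × p̂(1 − p̂), rounded up to the next integer. A minimal sketch of that calculation follows; the function name and the target length 0.06 (a margin of error of 0.03) are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_length(length, confidence=0.95, p_guess=0.5):
    """Smallest n for which the approximate CI has width at most `length`."""
    alpha = 1 - confidence
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    # From l = 2 z* sqrt(p(1-p)/n), solved for n and rounded up.
    return ceil((2 * z_star / length) ** 2 * p_guess * (1 - p_guess))

# Assumed target: total width 0.06 at 95% confidence, with the guess p = 0.5.
n_needed = sample_size_for_length(0.06)
print(n_needed)
```

Using 0.5 as the guess is the cautious choice, because p(1 − p) is largest at p = 0.5 and so gives the largest required n.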

Consider the one-sided hypotheses

H0 : p = p0 (or p ≤ p0) versus Ha : p > p0,

where p0 is a known proportion. For example, H0 : p ≤ 0.5 vs Ha : p > 0.5 (here p0 = 0.5). If H0 holds, we have the following approximate sampling distribution for p̂ when the sample size n is large (i.e., when np0 ≥ 10 and n(1 − p0) ≥ 10):

$$\hat{p} \sim N\left(p_0,\ \sqrt{\frac{p_0(1-p_0)}{n}}\right), \quad\text{or}\quad \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} \sim N(0, 1).$$

The test statistic is given by

$$Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}.$$

Under H0, we have the null distribution Z ∼ N(0, 1) approximately for large samples. At significance level α, we reject H0 : p = p0 (or p ≤ p0) if the value z of the test statistic satisfies z > $z^*_{\alpha}$, where $z^*_{\alpha}$ is the 100(1 − α)th percentile of N(0, 1). So the rejection region is the set of all z values satisfying {z > $z^*_{\alpha}$}.

Alternatively, the p-value is given by p-value = P(Z ≥ z), where Z ∼ N(0, 1) is a random variable and z is the value of the test statistic computed from the sample. The p-value is the probability of observing the value z (computed from a dataset) or more extreme values (i.e., values larger than z here) if the null hypothesis holds. The smaller the p-value, the stronger the evidence against H0 (and so in favor of Ha).
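The upper-tailed test can be sketched in a few lines of Python. The function name `one_sided_z_test` and the data (540 successes out of 1000, tested against p0 = 0.5) are hypothetical and serve only to illustrate the test statistic, the rejection rule, and the p-value.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_z_test(successes, n, p0, alpha=0.05):
    """Test H0: p <= p0 versus Ha: p > p0 using the large-sample z statistic."""
    std_normal = NormalDist()
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # SD is evaluated at p0, i.e. under H0
    z_crit = std_normal.inv_cdf(1 - alpha)      # 100(1 - alpha)th percentile of N(0, 1)
    p_value = 1 - std_normal.cdf(z)             # P(Z >= z)
    return z, p_value, z > z_crit

# Hypothetical data: 540 of 1000 respondents answer "Yes"; p0 = 0.5.
z, p_value, reject = one_sided_z_test(540, 1000, p0=0.5)
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0 at 5%: {reject}")
```

The rejection decision and the p-value always agree: `reject` is true exactly when `p_value` is below `alpha`.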

There is another form of one-sided hypotheses, H0 : p ≥ p0 versus Ha : p < p0, as well as two-sided hypotheses, H0 : p = p0 versus Ha : p ≠ p0, where p0 is a known proportion. The choice of hypotheses should be determined by the research questions. For the above hypotheses, the formula of the test statistic remains the same. However, the formulas for the rejection regions and p-values need to be modified, depending on the form of Ha (please see lecture notes for details).
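As a hedged sketch of those modifications, the standard normal-theory p-values are P(Z ≥ z) for Ha : p > p0, P(Z ≤ z) for Ha : p < p0, and 2 × P(Z ≥ |z|) for Ha : p ≠ p0. These are the usual textbook forms rather than a quotation from the lecture notes referred to above, and the function and argument names are assumptions.

```python
from statistics import NormalDist

def z_test_p_value(z, alternative="greater"):
    """p-value of a z statistic for Ha: p > p0, p < p0, or p != p0."""
    std_normal = NormalDist()
    if alternative == "greater":      # Ha: p > p0, large z is extreme
        return 1 - std_normal.cdf(z)
    if alternative == "less":         # Ha: p < p0, small z is extreme
        return std_normal.cdf(z)
    if alternative == "two-sided":    # Ha: p != p0, large |z| is extreme
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

# The same observed statistic gives different p-values under different Ha.
for alt in ("greater", "less", "two-sided"):
    print(alt, round(z_test_p_value(1.5, alt), 4))
```

The corresponding rejection regions at level α are z > $z^*_{\alpha}$, z < −$z^*_{\alpha}$, and |z| > $z^*_{\alpha/2}$, respectively.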

General comments about hypothesis testing

It is important to set up the appropriate hypotheses, i.e., one-sided or two-sided hypotheses. The types of hypotheses are determined by the research questions, and should be set up before you see the data. Usually the alternative hypothesis Ha is what a researcher tries to verify (or something new), while H0 is the complement of Ha.

The hypotheses are formulated in terms of population parameters (e.g., population proportion p), not statistics. The test statistic may be viewed as a standardized difference between the sample proportion p̂ and the population proportion p0 assumed under the null hypothesis H0. The decision for a hypothesis testing problem can be (i) rejecting H0 or not, based on a given significance level α; or (ii) assessing the evidence against H0, based on a p-value; or reporting both results. Interpret the decision in words in the context of a given application.

Relationship between two-sided test and CI

A 95% CI covers p0 if and only if the two-sided test for H0 : p = p0 vs Ha : p ≠ p0 fails to reject H0 at the 5% level. Or, equivalently, the two-sided test for H0 : p = p0 vs Ha : p ≠ p0 rejects H0 at the 5% level if and only if the 95% CI does not cover p0. Notice the correspondence between 95% and 5%. Similar statements hold for 99% or 90% CIs, corresponding to 1% or 10% significance levels. Example: if a 95% (or 99%) CI for p is (0.1, 0.4), then a test for H0 : p = 0.5 vs Ha : p ≠ 0.5 will reject H0 at the 5% (or 1%) level, and vice versa.

Relationship between significance level α and p-value

Relationship between p-value and significance level α: A test rejects H0 at level α if and only if its p-value is less than α.

Example: if the p-value is 0.03, then we reject H0 at the 5% level (but not at the 1% level). Or, if we reject H0 at the 5% level, the p-value must be smaller than 0.05.

Thus, a p-value is more informative than a rejection region, since a rejection region is determined by a specific significance level.

Hypothesis Testing

Since a sample is only a small subset of the population, statistical inference cannot lead to definite (100%) conclusions. That is, errors are inevitable. In hypothesis testing, there are two types of errors: type I and type II errors. Neither error probability will be 0.
Type I error: we reject H0 when H0 is in fact true.
Type II error: we fail to reject H0 when H0 is in fact false (i.e., Ha is true).

The power of a test is the probability of rejecting H0 when an alternative is true, i.e., the ability to detect an alternative hypothesis we wish to prove. Power = 1− P(type II error). Significance level α = P(type I error). The power depends on (i) the sample size n, (ii) the significance level α, (iii) the type of hypotheses (one-sided or two-sided), and (iv) the population standard deviation. See an example in Chapter 6.
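As a worked sketch for the one-sided proportion test H0 : p ≤ p0 versus Ha : p > p0, let p1 > p0 denote a specific alternative value (the symbol p1 is introduced here only for illustration). The test rejects H0 when p̂ exceeds p0 + $z^*_{\alpha}\sqrt{p_0(1-p_0)/n}$. If the true proportion is p1, then p̂ is approximately normal with mean p1 and standard deviation $\sqrt{p_1(1-p_1)/n}$, so under the large-sample normal approximation

$$\text{Power}(p_1) \approx P\left(Z > \frac{p_0 - p_1 + z^*_{\alpha}\sqrt{\frac{p_0(1-p_0)}{n}}}{\sqrt{\frac{p_1(1-p_1)}{n}}}\right), \qquad Z \sim N(0, 1).$$

The expression shows how the power increases with the sample size n, with the significance level α, and with the distance between p1 and p0.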

In hypothesis testing, the test statistic and its null distribution are evaluated under the null hypothesis (i.e., assuming H0 holds). Thus, our statements are usually, “reject the null hypothesis” or “fail to reject the null hypothesis” or “strong evidence against the null hypothesis,” etc. Here, “null hypothesis” cannot be replaced by “alternative hypothesis” when we make a statement about the result of the test.

Sample size calculation: In designing a study, before collecting data, we first need to choose the sample size n. This is usually based on the desired power we hope to have (e.g., 80% power), or the desired length (or margin of error) of a confidence interval. For example, if we wish to have at least 80% power to detect a specific alternative, such as detecting p = 0.6 in testing H0 : p ≤ 0.5 versus Ha : p > 0.5, how large should the sample size n be?
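One way to answer this question is to compute the approximate power of the one-sided z test for increasing n until it reaches the target. The sketch below assumes a significance level of α = 0.05 (the text does not fix α for this example) and relies on the large-sample normal approximation; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided(n, p0, p1, alpha=0.05):
    """Approximate power of the test H0: p <= p0 vs Ha: p > p0 when the true p is p1."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    # Reject H0 when p_hat exceeds this cutoff (computed under H0).
    cutoff = p0 + z_alpha * sqrt(p0 * (1 - p0) / n)
    # Probability that p_hat exceeds the cutoff when the true proportion is p1.
    return 1 - std_normal.cdf((cutoff - p1) / sqrt(p1 * (1 - p1) / n))

def smallest_n_for_power(p0, p1, target=0.80, alpha=0.05):
    """Smallest n whose approximate power reaches the target (power increases with n)."""
    n = 1
    while power_one_sided(n, p0, p1, alpha) < target:
        n += 1
    return n

# Example from the text: detect p = 0.6 when testing H0: p <= 0.5; alpha = 0.05 is assumed.
n_required = smallest_n_for_power(p0=0.5, p1=0.6)
print(n_required, round(power_one_sided(n_required, 0.5, 0.6), 3))
```

A smaller α or an alternative closer to p0 would require a larger n for the same power.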