Chapter 11: Inference For Population Proportions

Author: Janel Matthews
Chapter 11: Inference For Population Proportions Jugal Ghorai University of Wisconsin-Milwaukee

January 6, 2010


Chapter Outline
• 11.1 Confidence Interval For One Proportion
• 11.2 Hypothesis Test For One Proportion
• 11.3 Inference for Two Proportions


General Objective: In Chapter 9 and Chapter 10 we learned to construct confidence intervals and test hypotheses about one or two population means. In this chapter we will learn to make similar statistical inferences for one or two population proportions. Section 11.1 describes the confidence interval for one proportion, Section 11.2 describes hypothesis tests for one proportion, and Section 11.3 describes inference for two proportions.


11.1 Confidence Interval for One Population Proportion

Some Examples of Proportions
• Proportion of US adults who have health insurance.
• Proportion of imported cars in the US.
• Proportion of US citizens identified as Republicans.

Notation
• The unknown population proportion will be denoted by $p$.
• The estimated sample proportion will be denoted by $\hat{p}$.

Definition
• Population Proportion: The proportion of individuals with a specified attribute in the entire population is called the population proportion. It will be denoted by $p$.
• Sample Proportion: The proportion of individuals with a specified attribute in the selected sample is called the sample proportion. It will be denoted by $\hat{p}$.

Goal: Based on a random sample,
• Obtain a point estimate of $p$
• Obtain an interval estimate of $p$
• Test hypotheses about $p$

NOTE: Population proportions are actually population means for data recorded as 0 or 1. For example, each individual either has health insurance or does not. We record this as $W = 0$ (no health insurance) or $W = 1$ (has health insurance). In this case $X = \sum_{i=1}^{N} W_i$ is the total number of individuals having health insurance. If there are $N$ individuals in the entire population, then the ratio $p = X/N$ is the proportion of individuals with health insurance. However,

$$p = \frac{X}{N} = \frac{1}{N}\sum_{i=1}^{N} W_i,$$

which is the population mean of all the $W_i$.
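The note above can be checked on a toy data set. The sketch below uses a made-up population of ten 0/1 records (the values are purely illustrative, not from the slides) and shows that the proportion of 1s and the mean of the records are the same number.

```python
# Hypothetical 0/1 records for a tiny population of N = 10 people
# (1 = has health insurance, 0 = does not). Illustrative data only.
w = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]

N = len(w)            # population size
X = sum(w)            # total count of individuals with the attribute
p = X / N             # population proportion
mean_w = sum(w) / N   # population mean of the W_i

print(X, N, p, mean_w)
```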

In this chapter we will use
• $X$ to denote the total count in the entire population and $N$ the population size. Hence $p = X/N$. This $p$ is usually unknown.
• $x$ to denote the total count in the sample and $n$ the sample size. Hence $\hat{p} = x/n$. This is calculated from the sample and hence is known.

Example 11.1: Playing Hooky from Work. In a survey of 1010 employees, 202 reported playing hooky from work. Estimate the proportion of employees who play hooky from work.
Solution: $x = 202$, $n = 1010$. Hence $\hat{p} = x/n = 202/1010 = 0.20$.
Conclusion: Based on the data, the estimated proportion of employees who play hooky is about 20%.
Note: This estimate will change if another sample is taken. Hence we need the sampling distribution of $\hat{p}$.

Key fact 11.1: For samples of size $n$,
• the mean of $\hat{p}$ is $\mu_{\hat{p}} = p$;
• the standard deviation of $\hat{p}$ is $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$;
• for large $n$, the distribution of $\hat{p}$ is approximately normal.

Note: $\sigma_{\hat{p}}$ will be estimated by $\hat{\sigma}_{\hat{p}} = \sqrt{\hat{p}(1-\hat{p})/n}$, where $\hat{p} = x/n$.
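The mean and standard deviation in Key fact 11.1 can be verified numerically. The sketch below assumes the sample count $x$ follows a binomial distribution with parameters $n$ and $p$ (an assumption added here for illustration; it corresponds to independent draws from a large population), and the values $n = 50$, $p = 0.2$ are hypothetical.

```python
import math

def phat_mean_sd(n, p):
    """Exact mean and standard deviation of p_hat = x/n when x ~ Binomial(n, p)."""
    probs = [math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
    mean = sum((x / n) * pr for x, pr in enumerate(probs))
    var = sum((x / n - mean) ** 2 * pr for x, pr in enumerate(probs))
    return mean, math.sqrt(var)

n, p = 50, 0.2                          # hypothetical values
mean, sd = phat_mean_sd(n, p)
formula_sd = math.sqrt(p * (1 - p) / n)  # sigma_phat from Key fact 11.1

print(mean, sd, formula_sd)
```

The two standard deviations agree up to floating-point error, and the mean equals $p$.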

Large sample CI for p

Steps in constructing a CI for $p$:
• Collect information: $x$ = count in the sample, $n$ = sample size.
• Compute $\hat{p}$: $\hat{p} = x/n$
• Set the confidence coefficient $1 - \alpha$ and compute $z_{\alpha/2}$
• Compute the margin of error $E$: $E = z_{\alpha/2} \times \sqrt{\hat{p}(1-\hat{p})/n}$
• Lower bound = $\hat{p} - E$
• Upper bound = $\hat{p} + E$

Example 11.3: Z interval for $p$
Playing Hooky From Work: Construct a 95% CI for $p$, the proportion of employees who play hooky from work.
• Collect Information: $x = 202$, $n = 1010$, $1 - \alpha = 0.95$, so $\hat{p} = 202/1010 = 0.2$
• Compute $z_{\alpha/2} = z_{0.025} = 1.96$
• Compute the margin of error: $E = z_{\alpha/2} \times \sqrt{\hat{p}(1-\hat{p})/n} = 1.96 \times \sqrt{(0.2)(0.8)/1010} = 0.025$
• Lower Bound = $\hat{p} - E = 0.2 - 0.025 = 0.175$
• Upper Bound = $\hat{p} + E = 0.2 + 0.025 = 0.225$
• CI: (0.175, 0.225)
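The steps for the large-sample interval can be sketched in Python as follows. The function name `proportion_ci` is made up for this illustration, and the critical value comes from the standard library's `statistics.NormalDist` rather than a printed table, so it is 1.95996... instead of the rounded 1.96.

```python
import math
from statistics import NormalDist

def proportion_ci(x, n, conf=0.95):
    """Large-sample z interval for one proportion (hypothetical helper)."""
    p_hat = x / n
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)       # z_{alpha/2}
    e = z * math.sqrt(p_hat * (1 - p_hat) / n)    # margin of error E
    return p_hat - e, p_hat + e

# Playing Hooky data: x = 202, n = 1010, 95% confidence
lower, upper = proportion_ci(202, 1010, conf=0.95)
print(round(lower, 3), round(upper, 3))
```

Rounded to three decimals, the bounds match the interval (0.175, 0.225) obtained by hand.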

Sample Size for Estimating $p$

Margin of Error Formula: $E = z_{\alpha/2} \times \sqrt{p(1-p)/n}$

The equation above contains $E$, $\alpha$, $p$ and $n$. Hence, given any three, we can solve for the remaining one. For example, for given $\alpha$, $p$ and $n$, we can calculate $E$ using the formula above. For given $E$, $\alpha$ and $p$ we can solve for $n$. The solution is:

$$n = p(1-p)\left[\frac{z_{\alpha/2}}{E}\right]^2$$

The formula for $n$ above is not very useful since we do not know the value of $p$. If no information is available regarding $p$, then $p(1-p)$ is replaced by its maximum value, which is 0.25 (attained at $p = 0.5$). This gives the most conservative estimate of $n$:

$$n = 0.25\left[\frac{z_{\alpha/2}}{E}\right]^2$$

In practice, partial information about $p$ is often known in the form $p \le p_g$ or $p \ge p_g$ for some $p_g$. In such cases the partially conservative estimate of $n$ is:

$$n = p_g(1-p_g)\left[\frac{z_{\alpha/2}}{E}\right]^2$$

Example 11.4 Sample Size for estimating $p$: Consider the "Playing Hooky From Work" problem.
Problem: How large a sample is necessary to obtain an estimate of $p$ with a margin of error of at most 0.01 at 95% confidence?
Solution: Since nothing is known about $p$, we will use the most conservative estimate of $n$, $n = 0.25\left[z_{\alpha/2}/E\right]^2$.
• Collect Information: $1 - \alpha = 0.95$. Hence $\alpha/2 = 0.025$ and $z_{0.025} = 1.96$. Also, a margin of error of at most 0.01 implies $E \le 0.01$, so use $E = 0.01$.
• Compute:

$$n = 0.25\left[\frac{z_{\alpha/2}}{E}\right]^2 = 0.25\left[\frac{1.96}{0.01}\right]^2 = 9604$$
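The sample size formula can be sketched as a small function. The name `sample_size_one_proportion` and the parameter `p_guess` are assumptions made for this illustration; `p_guess=0.5` reproduces the most conservative case because it gives $p(1-p) = 0.25$. The critical value is taken from `statistics.NormalDist`, not from a table.

```python
import math
from statistics import NormalDist

def sample_size_one_proportion(E, conf=0.95, p_guess=0.5):
    """Smallest n with margin of error at most E (hypothetical helper).

    p_guess = 0.5 gives the most conservative n, since p(1 - p) <= 0.25.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    n = p_guess * (1 - p_guess) * (z / E) ** 2
    return math.ceil(n)                            # always round up

n_conservative = sample_size_one_proportion(E=0.01, conf=0.95)
print(n_conservative)
```

For $E = 0.01$ at 95% confidence this returns 9604, in agreement with Example 11.4.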

Output 11.2 Using StatCrunch with summary data

• Open StatCrunch using Chapter 10.
• Follow: Stat > Proportion > One sample > With Summary
• Enter the number of successes ($x$) and the sample size ($n$) and press "Next"
• For hypothesis testing, select the alternative
• For a confidence interval, enter the confidence level
• Press "Calculate"

Hypothesis Test for One Proportion

Gun Control: Gun control is a very controversial issue in the US. Politicians always want to align themselves with the majority opinion. Hence it is of interest to know whether the majority of the population favors or opposes gun control. If $p$ denotes the proportion of people who favor gun control, then the problem can be formulated as a hypothesis test:

$H_0: p = 0.5$ vs $H_a: p > 0.5$.

In general the specified value of $p$ does not have to be 0.5. It could be any number between 0 and 1. The specified value will be denoted by $p_0$. Also, the alternative hypothesis does not always have to be of the greater-than type. The three cases are listed below.
• $H_0: p = p_0$ vs $H_a: p < p_0$ (less-than type)
• $H_0: p = p_0$ vs $H_a: p > p_0$ (greater-than type)
• $H_0: p = p_0$ vs $H_a: p \ne p_0$ (not-equal-to type)

Z test for H0 : p = p0

• Collect information: Find $x$, $n$, $p_0$ and $\alpha$.
• Compute the point estimate: $\hat{p} = x/n$
• Use the test statistic:

$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$$

• Rejection Rule:
  For $H_a: p < p_0$, reject $H_0: p = p_0$ if $z \le -z_{\alpha}$
  For $H_a: p > p_0$, reject $H_0: p = p_0$ if $z \ge z_{\alpha}$
  For $H_a: p \ne p_0$, reject $H_0: p = p_0$ if $|z| \ge z_{\alpha/2}$

Example 11.6: Gun Control problem. In a survey of 1250 randomly selected US citizens, 650 favored gun control. At a 5% level of significance, do the data support the claim that a majority favors banning hand gun sales?
Solution:
• Collect information: $n = 1250$, $x = 650$, $p_0 = 0.5$, $\alpha = 0.05$
• Compute the point estimate: $\hat{p} = x/n = 650/1250 = 0.520$
• Rejection Rule: For $H_0: p = 0.5$ vs $H_a: p > 0.5$, reject $H_0$ if $z \ge z_{\alpha} = z_{0.05} = 1.645$
• Compute the test statistic:

$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{0.520 - 0.5}{\sqrt{0.5(1-0.5)/1250}} = 1.41$$

• Since $z = 1.41 < z_{0.05} = 1.645$, do NOT reject $H_0$.
Conclusion: There is not enough evidence to conclude that a majority favors the ban on the sale of hand guns.

Using P value to test H0 : p = p0

Recall $H_0: p = 0.5$ vs $H_a: p > 0.5$ and the computed value $z = 1.41$. The P value is the area under the N(0,1) curve to the right of 1.41. Using either Table II or StatCrunch, this area is 0.0793, which is greater than $\alpha = 0.05$. Do NOT reject $H_0$.
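The Z test and its P value can be reproduced with a short sketch. The function name `one_prop_z_test` and its `alternative` argument are assumptions made for this illustration. Because the code keeps the unrounded statistic ($z \approx 1.414$ rather than 1.41), the P value it prints is slightly below the table-based 0.0793, but the decision is the same.

```python
import math
from statistics import NormalDist

def one_prop_z_test(x, n, p0, alternative="greater"):
    """Large-sample z test for H0: p = p0 (hypothetical helper)."""
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    std_normal = NormalDist()
    if alternative == "less":
        p_value = std_normal.cdf(z)                   # left-tail area
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)               # right-tail area
    else:
        p_value = 2 * (1 - std_normal.cdf(abs(z)))    # two-tailed area
    return z, p_value

# Gun control data: x = 650, n = 1250, H0: p = 0.5 vs Ha: p > 0.5
z, p_value = one_prop_z_test(650, 1250, 0.5, alternative="greater")
print(round(z, 2), round(p_value, 3))
print("reject H0" if p_value < 0.05 else "do not reject H0")
```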

Output 11.3 Using StatCrunch with summary data

• Open StatCrunch using Chapter 10.
• Follow: Stat > Proportion > One sample > With Summary
• Enter the number of successes ($x$) and the sample size ($n$) and press "Next"
• For hypothesis testing, select the alternative
• Press "Calculate"
Note: StatCrunch will print out the P value. It can be used to test the null hypothesis at any $\alpha$.

Inference for Two Proportions

Example 11.8: Eating Out Vegetarian: It is believed that men and women differ in ordering vegetarian food while eating out. In an effort to verify this, 747 men and 434 women were interviewed. 276 men and 195 women said they sometimes order a vegetarian dish.
Question:
(a) Formulate the problem as a hypothesis test.
(b) Explain the basic idea for carrying out the test.
(c) Discuss the use of the data to make a decision concerning the hypothesis test.

Solution: Some notation:

                        Population 1             Population 2
Population Proportion   $p_1$                    $p_2$
Sample Size             $n_1$                    $n_2$
Number of Successes     $x_1$                    $x_2$
Sample Proportion       $\hat{p}_1 = x_1/n_1$    $\hat{p}_2 = x_2/n_2$

(a) Population 1: All US men. Population 2: All US women.
$p_1$ = proportion of all US men who sometimes order a veg dish.
$p_2$ = proportion of all US women who sometimes order a veg dish.
$H_0: p_1 = p_2$ (men and women order a veg dish equally often.)
$H_a: p_1 < p_2$ (men order a veg dish less often than women.)

Solution: (b) Logic of testing $H_0: p_1 = p_2$:
Compute the estimates $\hat{p}_1 = x_1/n_1$ and $\hat{p}_2 = x_2/n_2$. Recall that $\hat{p}_1$ and $\hat{p}_2$ are estimates of $p_1$ and $p_2$. Hence $\hat{p}_1 - \hat{p}_2$ tends to be close to $(p_1 - p_2)$.

Hence
• If $p_1 = p_2$, then $\hat{p}_1 - \hat{p}_2 \approx 0$
• If $p_1 < p_2$, then $\hat{p}_1 - \hat{p}_2 \ll 0$
• If $p_1 > p_2$, then $\hat{p}_1 - \hat{p}_2 \gg 0$

This suggests the following intuitive rejection rules:
• If $H_a: p_1 < p_2$, then reject $H_0$ if $\hat{p}_1 - \hat{p}_2 < c_1$
• If $H_a: p_1 > p_2$, then reject $H_0$ if $\hat{p}_1 - \hat{p}_2 > c_2$
• If $H_a: p_1 \ne p_2$, then reject $H_0$ if $|\hat{p}_1 - \hat{p}_2| > c$

To determine $c_1$, $c_2$ and $c$ we need the sampling distribution of $(\hat{p}_1 - \hat{p}_2)$. This is stated next.

Sampling Distribution of $(\hat{p}_1 - \hat{p}_2)$

Key fact 11.2: Let $\hat{p}_1$ and $\hat{p}_2$ denote sample proportions from two independent samples of size $n_1$ and $n_2$ respectively. Then
• the mean of $(\hat{p}_1 - \hat{p}_2)$ is $\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$;
• the standard deviation of $(\hat{p}_1 - \hat{p}_2)$ is $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}$;
• the statistic

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}}$$

is approximately N(0,1).

If $p_1 = p_2 = p$, then

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{p(1-p)}\,\sqrt{1/n_1 + 1/n_2}}$$

is approximately N(0,1). Since $p$ is still unknown, the $Z$ above cannot be computed. The unknown $p$ will be replaced by a pooled estimate based on both samples.

Pooled estimate of common $p$

The pooled estimate of the common $p$ is:

$$\hat{p}_p = \frac{x_1 + x_2}{n_1 + n_2}$$

For large samples,

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_p(1-\hat{p}_p)}\,\sqrt{1/n_1 + 1/n_2}}$$

is approximately N(0,1) when $p_1 = p_2$.

This result will be used for testing hypotheses regarding $p_1$ and $p_2$. Now back to the rejection rules:

Rejection Rule Contd.

Recall the intuitive rejection rules based on $(\hat{p}_1 - \hat{p}_2)$:
• If $H_a: p_1 < p_2$, then reject $H_0$ if $\hat{p}_1 - \hat{p}_2 < c_1$
• If $H_a: p_1 > p_2$, then reject $H_0$ if $\hat{p}_1 - \hat{p}_2 > c_2$
• If $H_a: p_1 \ne p_2$, then reject $H_0$ if $|\hat{p}_1 - \hat{p}_2| > c$

These rules can now be restated in terms of the Z statistic:

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_p(1-\hat{p}_p)}\,\sqrt{1/n_1 + 1/n_2}}$$

• If $H_a: p_1 < p_2$, then reject $H_0$ if $Z \le -z_{\alpha}$
• If $H_a: p_1 > p_2$, then reject $H_0$ if $Z \ge z_{\alpha}$
• If $H_a: p_1 \ne p_2$, then reject $H_0$ if $|Z| \ge z_{\alpha/2}$

Example 11.9 Eating Out Vegetarian: It is believed that men and women differ in ordering vegetarian food while eating out. In an effort to verify this, 747 men and 434 women were interviewed. 276 men and 195 women said they sometimes order a vegetarian dish. At a 5% level of significance, do the data support the claim that men order a veg dish less often than women?
Solution:
• Collect information:
  For Men: $x_1 = 276$, $n_1 = 747$, true proportion $p_1$.
  For Women: $x_2 = 195$, $n_2 = 434$, true proportion $p_2$.
  Significance level $\alpha = 0.05$
• Set up hypotheses and rejection rule: $H_0: p_1 = p_2$ vs $H_a: p_1 < p_2$. Since $z_{\alpha} = z_{0.05} = 1.645$, reject $H_0$ if $Z \le -1.645$.

• Calculate estimates: $\hat{p}_1 = 276/747 = 0.369$, $\hat{p}_2 = 195/434 = 0.449$, $\hat{p}_p = (276 + 195)/(747 + 434) = 0.399$
• Calculate the Z statistic:

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_p(1-\hat{p}_p)}\,\sqrt{1/n_1 + 1/n_2}} = \frac{0.369 - 0.449}{\sqrt{0.399(1-0.399)}\,\sqrt{1/747 + 1/434}} = -2.71$$

• Make a decision: Since $Z = -2.71 < -1.645 = -z_{0.05}$, reject $H_0$.
Conclusion: There is enough evidence to conclude that men order a veg dish significantly less often than women.

Testing with P value

Since $H_a: p_1 < p_2$, the P value is the left-tail area below the observed value $z_o = -2.71$. By symmetry of the N(0,1) curve,

P value = $P(Z \le -2.71) = P(Z \ge 2.71) = 0.0034$

Since P value = 0.0034 < 0.05 = $\alpha$, $H_0$ is rejected.
Conclusion: There is enough evidence to conclude that men order a veg dish significantly less often than women.
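The pooled two-proportion test can be sketched in the same way. The function name `two_prop_z_test` and its `alternative` argument are assumptions made for this illustration. The code works with unrounded proportions, so it gives $Z \approx -2.70$ rather than the hand-computed $-2.71$ (which used proportions rounded to three decimals); the P value and the decision are essentially unchanged.

```python
import math
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2, alternative="less"):
    """Pooled large-sample z test for H0: p1 = p2 (hypothetical helper)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)                          # pooled estimate
    se = math.sqrt(p_pool * (1 - p_pool)) * math.sqrt(1 / n1 + 1 / n2)
    z = (p1_hat - p2_hat) / se
    std_normal = NormalDist()
    if alternative == "less":
        p_value = std_normal.cdf(z)                   # left-tail area
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)               # right-tail area
    else:
        p_value = 2 * (1 - std_normal.cdf(abs(z)))    # two-tailed area
    return z, p_value

# Men: 276 of 747; women: 195 of 434; Ha: p1 < p2
z, p_value = two_prop_z_test(276, 747, 195, 434, alternative="less")
print(round(z, 2), round(p_value, 4))
print("reject H0" if p_value < 0.05 else "do not reject H0")
```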

CI for $(p_1 - p_2)$

Steps for Constructing a CI for $(p_1 - p_2)$
• Collect Information:
  Sample 1: $n_1$, $x_1$, true proportion $p_1$
  Sample 2: $n_2$, $x_2$, true proportion $p_2$
  Confidence coefficient: $(1 - \alpha)$
• Compute: $\hat{p}_1 - \hat{p}_2 = x_1/n_1 - x_2/n_2$
• Compute the Margin of Error:

$$E = z_{\alpha/2} \times \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

• Compute the Lower Bound: $(\hat{p}_1 - \hat{p}_2) - E$
• Compute the Upper Bound: $(\hat{p}_1 - \hat{p}_2) + E$

Example 11.10: CI for $(p_1 - p_2)$
• Collect Information:
  Men: $n_1 = 747$, $x_1 = 276$, true proportion $p_1$
  Women: $n_2 = 434$, $x_2 = 195$, true proportion $p_2$
  Confidence coefficient: $(1 - \alpha) = 0.90$, $\alpha = 0.10$, $\alpha/2 = 0.05$, $z_{0.05} = 1.645$
• Compute: $\hat{p}_1 - \hat{p}_2 = x_1/n_1 - x_2/n_2 = 0.369 - 0.449 = -0.08$
• Compute the Margin of Error:

$$E = z_{\alpha/2} \times \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = 1.645 \times \sqrt{\frac{0.369(1-0.369)}{747} + \frac{0.449(1-0.449)}{434}} = 0.049$$

• Lower Bound: $(\hat{p}_1 - \hat{p}_2) - E = -0.08 - 0.049 = -0.129$
• Upper Bound: $(\hat{p}_1 - \hat{p}_2) + E = -0.08 + 0.049 = -0.031$
• CI: (−0.129, −0.031)
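The interval for the difference can be sketched as follows. The function name `two_prop_ci` is made up for this illustration, and the critical value comes from `statistics.NormalDist` (1.64485...) rather than the table value 1.645. Note that the interval uses the unpooled standard error, unlike the test statistic.

```python
import math
from statistics import NormalDist

def two_prop_ci(x1, n1, x2, n2, conf=0.90):
    """Large-sample z interval for p1 - p2 (hypothetical helper)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)            # z_{alpha/2}
    # Unpooled standard error of (p1_hat - p2_hat)
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    e = z * se                                              # margin of error E
    diff = p1_hat - p2_hat
    return diff - e, diff + e

# Men: 276 of 747; women: 195 of 434; 90% confidence
lower, upper = two_prop_ci(276, 747, 195, 434, conf=0.90)
print(round(lower, 3), round(upper, 3))
```

Rounded to three decimals the bounds are −0.129 and −0.031, matching the hand calculation. The whole interval lies below 0, which is consistent with the test's conclusion that $p_1 < p_2$.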

Sample Size for Estimating $(p_1 - p_2)$

So far we assumed that samples have already been collected. Our concern was to use the collected data to make inferences about $(p_1 - p_2)$. Based on the collected information, we computed the margin of error using the formula

$$E = z_{\alpha/2} \times \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}.$$

A large margin of error would indicate that the estimates are NOT very accurate. If a more accurate estimate is desired, then one has to collect more data. This poses the question: How large should the samples be if we want estimates with a given accuracy? The accuracy is measured by the magnitude of the margin of error, $E$.
Problem: Find sample sizes $n_1$ and $n_2$ for a specified value of $E$. This is done using the equation for $E$ above.

Sample Size for Estimating $(p_1 - p_2)$ Contd.

$$E = z_{\alpha/2} \times \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

The equation for $E$ involves $E$, $n_1$, $n_2$, $\alpha$, $\hat{p}_1$ and $\hat{p}_2$. To make life easier, we will assume that an equal number of observations will be taken from each population. This means we set $n_1 = n_2 = n$. Then the formula for $E$ reduces to:

$$E = z_{\alpha/2} \times \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n} + \frac{\hat{p}_2(1-\hat{p}_2)}{n}} = z_{\alpha/2} \times \frac{\sqrt{\hat{p}_1(1-\hat{p}_1) + \hat{p}_2(1-\hat{p}_2)}}{\sqrt{n}}$$

Solving for $n$:

$$n = \left[\hat{p}_1(1-\hat{p}_1) + \hat{p}_2(1-\hat{p}_2)\right]\left[\frac{z_{\alpha/2}}{E}\right]^2$$

This formula for $n$ is still NOT useful since we do not know the values of $\hat{p}_1$ and $\hat{p}_2$ before sampling. This is discussed next.

Common Sample Size for Estimating $(p_1 - p_2)$

If nothing is known about $p_1$ and $p_2$, we replace $p_1(1-p_1)$ and $p_2(1-p_2)$ each by its maximum value, which is 0.25. This gives:

$$n = [0.25 + 0.25]\left[\frac{z_{\alpha/2}}{E}\right]^2 = 0.5 \times \left[\frac{z_{\alpha/2}}{E}\right]^2$$

If, however, some bounds on $p_1$ and $p_2$ are known, for example if it can be assumed that $p_1 \le p_{1g}$ and $p_2 \le p_{2g}$, then

$$n = \left[p_{1g}(1-p_{1g}) + p_{2g}(1-p_{2g})\right]\left[\frac{z_{\alpha/2}}{E}\right]^2$$

Example of Sample Size Calculation

Recall that in Example 11.10 the calculated margin of error was $E = 0.049$. This was based on $n_1 = 747$ and $n_2 = 434$. If we want the margin of error to be, say, 0.02 rather than 0.049, how large a sample do we need from each population?
Solution: Suppose that prior to collecting data we have no idea about the magnitude of $p_1$ and $p_2$. So we will use the most conservative estimate $n = 0.5 \times \left[z_{\alpha/2}/E\right]^2$ with $(1 - \alpha) = 0.90$ and $z_{\alpha/2} = z_{0.05} = 1.645$:

$$n = 0.5 \times \left[\frac{z_{\alpha/2}}{E}\right]^2 = 0.5 \times \left[\frac{1.645}{0.02}\right]^2 = 3382.53, \text{ rounded up to } 3383.$$

Always round up for sample size. Reducing the margin of error from 0.049 to 0.02 would need much larger samples.
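The common sample size formula can be sketched as below. The function name `common_sample_size` and the parameters `p1_guess` and `p2_guess` are assumptions made for this illustration; the defaults of 0.5 give the most conservative case. The critical value is passed in explicitly, and the call uses the table value 1.645 so that the result can be compared with the hand calculation. Because the raw value 3382.53 is close to an integer boundary, a more precise quantile could change the rounded answer by one.

```python
import math

def common_sample_size(E, z, p1_guess=0.5, p2_guess=0.5):
    """Common n = n1 = n2 for margin of error E (hypothetical helper).

    The default guesses of 0.5 give the most conservative n,
    since p(1 - p) is at most 0.25 for each population.
    """
    spread = p1_guess * (1 - p1_guess) + p2_guess * (1 - p2_guess)
    n = spread * (z / E) ** 2
    return math.ceil(n)          # always round up for sample size

# 90% confidence (table value z_0.05 = 1.645), margin of error 0.02
n = common_sample_size(E=0.02, z=1.645)
print(n)
```

With these inputs the function returns 3383 per population.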

Output 11.4 Using StatCrunch with summary data

• Open StatCrunch using Chapter 10.
• Follow: Stat > Proportion > Two samples > With Summary
• Enter the number of successes ($x_1$) and the sample size ($n_1$) for sample 1
• Enter the number of successes ($x_2$) and the sample size ($n_2$) for sample 2 and press "Next"
• For hypothesis testing, select the alternative and press "Calculate"
• For a confidence interval, select the confidence level and press "Calculate"
Note: StatCrunch will print out the P value for hypothesis testing. It can be used to test the null hypothesis at any $\alpha$.