PubH 6414 Worksheet 9: Inference for Proportions from One Group
Author: Drusilla Ryan

Example 1: Population Proportion

Table 3-16 in the text describes the number of men and women with and without hematuria among a sample of patients with acquired hemophilia. Data for the sample of 15 men are represented below as ‘Yes’ for hematuria and ‘No’ for no hematuria.

Reference: Dawson B, Trapp RG. (2004). Chapter 3. Summarizing Data & Presenting Data in Tables and Graphs. In: Basic & Clinical Biostatistics. 4th ed. New York: McGraw-Hill.

1. We are interested in the proportion of men with hematuria. Code this binary variable as ‘1’ for ‘Yes’ and ‘0’ for ‘No’ and sum the coded outcomes.

Hematuria status        Yes No  No  Yes Yes No  Yes Yes Yes No  No  Yes Yes No  Yes Sum
X: Coded as ‘1’ or ‘0’  1   0   0   1   1   0   1   1   1   0   0   1   1   0   1   9

2. The proportion of men with hematuria in this sample is the sum of the coded variable divided by the number of men (n = 15):

$p = \frac{\sum x_i}{n} = \frac{9}{15} = 0.60$

When the sample proportion is calculated as the sum of the coded ‘1’s and ‘0’s divided by the number of trials, the sample proportion is a sample mean (the mean of the 0/1 values). By the Central Limit Theorem, the sample mean is approximately normally distributed, regardless of the population distribution, if the sample size is large enough. Applied to proportions, the Central Limit Theorem gives an approximately normal sampling distribution for the sample proportion when $n\pi > 5$ and $n(1-\pi) > 5$, where $\pi$ is the population proportion.
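The coding and averaging steps of Example 1 can be reproduced in R, the language of the worksheet's own script snippets. This is a minimal sketch: the vector names and the helper `normal_approx_ok()` are hypothetical, and the normal-approximation check plugs in the sample proportion p for the unknown population proportion π, which is an assumption made only for illustration.

```r
# Example 1: hematuria status for the 15 men ("Yes" = hematuria)
hematuria <- c("Yes", "No", "No", "Yes", "Yes", "No", "Yes", "Yes",
               "Yes", "No", "No", "Yes", "Yes", "No", "Yes")

# Code the binary variable: 1 for "Yes", 0 for "No"
x <- ifelse(hematuria == "Yes", 1, 0)
n <- length(x)

# Sample proportion = sum of coded values / n, i.e. the mean of the 0/1 values
p <- sum(x) / n
p_as_mean <- mean(x)

# Hypothetical helper for the rule of thumb n*prop > 5 and n*(1 - prop) > 5
normal_approx_ok <- function(n, prop) n * prop > 5 && n * (1 - prop) > 5

# Assumption: the unknown population proportion is replaced by p
approx_ok <- normal_approx_ok(n, p)

cat("sum =", sum(x), " n =", n, " p =", p, " mean(x) =", p_as_mean, "\n")
cat("n*p =", n * p, " n*(1 - p) =", n * (1 - p), " rule met:", approx_ok, "\n")
```

With these data, sum(x) is 9 and p is 0.6; both n*p = 9 and n*(1 - p) = 6 exceed 5, so the rule is met for this sample.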

Example 2: Confidence Interval of a Population Proportion

Data as presented in the section slides: 202 of 900 randomly selected metro area youth reported that they currently smoked in 2005.

a. Construct a 95% confidence interval for the population proportion of smokers among metro area youth in 2005. Does the 95% confidence interval provide evidence that the smoking rate among metro area youth is significantly different from the projected rate of 25% in 2005? Your answer should include the proportion of smokers, the standard error, the calculation of the confidence interval, and the interpretation of the confidence interval.

i. Calculate the sample proportion of smokers among metro area youth (p):
p = 202/900 = 0.224

ii. Calculate the SE of the sample proportion, using the sample proportion p:
$se = \sqrt{\frac{0.224(1 - 0.224)}{900}} = 0.014$

iii. Confidence coefficient for the 95% CI of the population proportion: 1.96
Rcmdr: Distributions > Continuous Distributions > Normal Distribution > Normal Quantiles; Probabilities = 0.975; select Lower tail
R script: z = qnorm(0.975)

iv. Compute the lower limit of the confidence interval:
p - z*se = 0.224 - 1.96*0.014 = 0.197

v. Compute the upper limit of the confidence interval:
p + z*se = 0.224 + 1.96*0.014 = 0.251

vi. Interpretation of the confidence interval: is there evidence that the metro area smoking rate is significantly different from 25%? Why or why not?
We have 95% confidence that the interval from 19.7% to 25.1% contains the true smoking rate (population proportion) among metro area youth. Since the 95% confidence interval contains the projected value of 25%, there is not sufficient evidence to conclude, at the 0.05 alpha level, that the rate in the metro area is significantly different from the projected rate.
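Steps i to v can be sketched in R as below, assuming the interval is the Wald interval described above (p ± z*se, with se computed from the observed proportion). Carrying full precision instead of the rounded p = 0.224 and se = 0.014 gives limits of about 0.197 and 0.252; the upper limit shifts slightly, but the interval still contains 0.25, so the conclusion is unchanged.

```r
# Example 2: 202 of 900 metro area youth reported current smoking
x <- 202
n <- 900

p  <- x / n                      # sample proportion
se <- sqrt(p * (1 - p) / n)      # SE from the observed proportion
z  <- qnorm(0.975)               # confidence coefficient, about 1.96

# 95% Wald confidence interval for the population proportion
ci <- c(lower = p - z * se, upper = p + z * se)
print(round(c(p = p, se = se, ci), 3))

# Does the interval contain the projected rate of 25%?
pi0 <- 0.25
contains_pi0 <- ci[["lower"]] <= pi0 && pi0 <= ci[["upper"]]
cat("Interval contains", pi0, ":", contains_pi0, "\n")
```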

b. Compare the confidence interval results to the hypothesis test results. What are some similarities and differences between these two methods of inference?

Similarities: The confidence coefficient for the 95% confidence interval is the same as the critical value for the hypothesis test with alpha = 0.05. The conclusions from the confidence interval and the hypothesis test are the same.

Differences: The SE for the confidence interval is calculated using the observed sample proportion. The SE for the hypothesis test is calculated using the hypothesized value, since the hypothesis test is conducted under the assumption that the null hypothesis is true. Because the two SEs are calculated differently, it is possible for a confidence interval and a hypothesis test of a proportion from one group to give different results.
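To see the difference in SE concretely, the sketch below runs a two-sided z-test of H0: π = 0.25 on the same data, assuming this is the hypothesis test the slides refer to. The test SE uses the hypothesized 0.25 while the interval SE uses the observed 0.224, so the two SEs differ slightly (about 0.0144 versus 0.0139), yet |z| is about 1.77 < 1.96 and both methods agree that the difference is not significant.

```r
# Example 2 data and the projected rate used as the null value
x <- 202; n <- 900; pi0 <- 0.25
p <- x / n

se_ci   <- sqrt(p * (1 - p) / n)       # SE for the CI: observed proportion
se_test <- sqrt(pi0 * (1 - pi0) / n)   # SE for the test: hypothesized value

# Two-sided z-test of H0: pi = 0.25
z         <- (p - pi0) / se_test
p_value   <- 2 * pnorm(abs(z), lower.tail = FALSE)
reject_h0 <- abs(z) > qnorm(0.975)

cat(sprintf("SE (CI) = %.4f, SE (test) = %.4f\n", se_ci, se_test))
cat(sprintf("z = %.2f, two-sided p-value = %.3f, reject H0: %s\n",
            z, p_value, reject_h0))
```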

Example 3: One Sample Hypothesis Test of a Proportion

From a random sample of 100 infants born to mothers who smoked during pregnancy, 16 had low birth weight (LBW).

a. Use the low birth weight data to conduct a hypothesis test with significance level 0.05 to investigate whether the LBW rate for infants born to mothers who smoke is different from the national average LBW rate of 0.077. In your answer you should include the null and alternative hypotheses, the appropriate type of test statistic, the critical value(s), the test statistic and p-value, and your conclusion.

i. Proportion of LBW infants:
p = 16/100 = 0.16

ii. State the null and alternative hypotheses.
Hypothesized value π0 = the national rate = 0.077
H0: π = 0.077
HA: π ≠ 0.077
This is a two-tailed alternative.

iii. Identify the appropriate test statistic.
The appropriate test statistic is the z-statistic, because the sample proportion is approximately normally distributed when $n\pi_0 > 5$ and $n(1-\pi_0) > 5$; here $n\pi_0 = 100(0.077) = 7.7$ and $n(1-\pi_0) = 92.3$.

iv. Determine the critical value(s) for the hypothesis test.
The critical values are from the standard normal distribution with a rejection region of 0.025 in each tail: -1.96 and 1.96.

v. Calculate the test statistic and p-value.
$SE = \sqrt{\frac{0.077(1 - 0.077)}{100}} \approx 0.0267$
$z = \frac{p - \pi_0}{SE} = \frac{0.16 - 0.077}{0.0267} \approx 3.11$
p-value = 2 * 0.0009247942 = 0.001849588
The test statistic is 3.11 with p-value 0.00185. Since the test statistic is positive, use Rcmdr to get the upper tail probability, then multiply by 2 in the script window to obtain the two-tailed probability.
Rcmdr: Distributions > Continuous Distributions > Normal Distribution > Normal Probabilities; Variable value = z; select Upper tail
R script: 2*pnorm(z, lower.tail=FALSE) or, equivalently, 2*(1-pnorm(z))

vi. State the conclusion of the test. Use both the critical value method and the p-value method to make the conclusion.
The test statistic is in the upper tail rejection region because 3.11 > 1.96. The p-value of the test (p = 0.00185) is less than the significance level 0.05. By both criteria the null hypothesis is rejected, and we can conclude that the LBW rate for infants whose mothers smoked is significantly greater than the LBW rate in the general population.

b. Compare the results of the 95% confidence interval in the lesson and the z-test of one proportion. What are the similarities and differences?

Similarities: The confidence coefficient for a 95% confidence interval is the same as the critical value for a hypothesis test with alpha level = 0.05. The conclusions are the same for both the confidence interval and the hypothesis test.

Differences: The SE for the confidence interval is calculated using the observed sample proportion. The SE for the hypothesis test is calculated using the hypothesized proportion. The confidence interval provides information about the precision of the estimate; the hypothesis test provides a p-value.
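The calculations in parts a and b can be sketched in R as follows. The cross-check with prop.test() relies on the fact that, for one sample without continuity correction, its chi-squared statistic equals z² and its p-value equals the two-sided z-test p-value. The Wald interval at the end assumes that "the 95% confidence interval in the lesson" is computed from these same 16 of 100 infants with the observed-proportion SE; that is an assumption, since the lesson's interval is not reproduced here.

```r
# Example 3: 16 of 100 infants of smoking mothers had LBW
x <- 16; n <- 100; pi0 <- 0.077; alpha <- 0.05

p       <- x / n
se_test <- sqrt(pi0 * (1 - pi0) / n)             # SE under H0
z       <- (p - pi0) / se_test                   # test statistic
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE) # two-tailed p-value
z_crit  <- qnorm(1 - alpha / 2)                  # about 1.96

# Decision by the critical value method and by the p-value method
reject_by_critical <- abs(z) > z_crit
reject_by_p        <- p_value < alpha

# Cross-check: prop.test without continuity correction gives the same p-value
check  <- prop.test(x, n, p = pi0, correct = FALSE)
same_p <- isTRUE(all.equal(check$p.value, p_value))

# Wald 95% CI using the observed proportion (assumed to match the lesson's CI)
se_ci <- sqrt(p * (1 - p) / n)
ci <- c(lower = p - z_crit * se_ci, upper = p + z_crit * se_ci)
ci_excludes_pi0 <- pi0 < ci[["lower"]] || pi0 > ci[["upper"]]

cat(sprintf("z = %.2f, p-value = %.5f\n", z, p_value))
cat("Reject H0 (critical value):", reject_by_critical,
    " (p-value):", reject_by_p, "\n")
print(round(ci, 3))
```

Under these assumptions the interval runs from about 0.088 to 0.232, so it excludes 0.077, which agrees with rejecting H0.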
