Chapter 3: Inference about Population Proportions

Author: Verity Davis
We are often concerned with making inferences about population proportions. For example:
- According to a recent Gallup poll, 60% of Americans are dissatisfied with the way things are going in the United States.
- In 1990, the proportion of females in the U.S. was 51.3%.

Section 1: Inferences about a single population proportion (Revision)

Recall: Normal Approximation to the Binomial

- The binomial random variable $X$ has mean $\mu = np$ and standard deviation $\sigma = \sqrt{np(1-p)}$. If $n$ is large ($np > 5$ and $n(1-p) > 5$), we may use the normal approximation to the binomial.
- We standardize $X$ to obtain

$$Z = \frac{X - np}{\sqrt{np(1-p)}} \sim N(0,1) \quad \text{(approximately)}.$$

Remark: The sample proportion

$$\hat{p} = \frac{x}{n} = \frac{\text{number of successes}}{\text{total number of observations}}$$

is approximately normal with mean $p$ and variance $p(1-p)/n$, where $np > 5$ and $n(1-p) > 5$. Hence,

a) If $n\hat{p} > 5$ and $n(1-\hat{p}) > 5$, a $100(1-\alpha)\%$ C.I. for the population proportion, $p$, is given by

$$\hat{p} \pm z(1-\alpha/2)\, s(\hat{p}), \quad \text{where } s(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$

b) To test the hypothesis

$$H_0: p = p_0 \quad \text{versus} \quad H_a: p \neq p_0,$$

use the test statistic

$$z^* = \frac{\hat{p} - p_0}{s(\hat{p})}, \quad \text{where } s(\hat{p}) = \sqrt{\frac{p_0(1-p_0)}{n}}.$$

Reject $H_0$ if $|z^*| > z(1-\alpha/2)$. The P-value is $2 \times P(Z > |z^*|)$.


Example: It has been reported that approximately 60% of U.S. households have two or more television sets and that at least half of Americans sometimes watch television alone. Suppose that 75 U.S. households are sampled, and of those sampled, 49 had two or more television sets and 35 respondents sometimes watch television alone.
1) Two claims can be tested using the sample information. What are the two claims?
2) Construct a 95% confidence interval for the proportion of U.S. households that have two or more television sets. Do the data present sufficient evidence to show that the reported 60% figure is incorrect?
3) Do the data present sufficient evidence to contradict the claim that at least half of Americans sometimes watch television alone?
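The calculations for parts 2 and 3 can be sketched in Python using only the standard library. This is a minimal sketch, not a model solution: the function names `proportion_ci` and `proportion_z` are hypothetical, and treating part 3 as a lower-tailed test of $H_0: p \geq 0.50$ against $H_a: p < 0.50$ is an assumption, since Section 1 only states the two-sided test.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(x, n, alpha=0.05):
    """Large-sample CI: p_hat +/- z(1 - alpha/2) * sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = x / n
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z_crit * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def proportion_z(x, n, p0):
    """Test statistic z* = (p_hat - p0) / sqrt(p0 (1 - p0) / n)."""
    return (x / n - p0) / sqrt(p0 * (1 - p0) / n)

n = 75

# Part 2: 49 of 75 households have two or more sets.
# H0: p = 0.60 versus Ha: p != 0.60 (two-sided, as in Section 1).
ci_low, ci_high = proportion_ci(49, n)
z_tv = proportion_z(49, n, 0.60)
p_value_tv = 2 * (1 - NormalDist().cdf(abs(z_tv)))

# Part 3 (assumption: lower-tailed test, H0: p >= 0.50 versus Ha: p < 0.50).
z_alone = proportion_z(35, n, 0.50)
p_value_alone = NormalDist().cdf(z_alone)

print(f"95% CI: ({ci_low:.3f}, {ci_high:.3f})")
print(f"TV sets: z* = {z_tv:.3f}, two-sided P-value = {p_value_tv:.3f}")
print(f"Watch alone: z* = {z_alone:.3f}, lower-tail P-value = {p_value_alone:.3f}")
```

Under these assumptions the interval runs from about 0.55 to 0.76 and contains 0.60, and $|z^*|$ is about 0.94 for the first claim, so neither calculation gives evidence against the reported figures at the 5% level.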


Section 2: Inferences about two population proportions (Revision)

Let $\hat{p}_1 = x_1/n_1$ and $\hat{p}_2 = x_2/n_2$, and assume $n_1\hat{p}_1 > 5$, $n_1(1-\hat{p}_1) > 5$, $n_2\hat{p}_2 > 5$, and $n_2(1-\hat{p}_2) > 5$. Then:

a) A $100(1-\alpha)\%$ C.I. for the difference between the population proportions, $D = p_1 - p_2$, is given by

$$\hat{D} \pm z(1-\alpha/2)\, s(\hat{D}), \quad \text{where } s(\hat{D}) = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \text{ and } \hat{D} = \hat{p}_1 - \hat{p}_2.$$

b) To test the hypothesis

$$H_0: p_1 - p_2 = 0 \quad \text{versus} \quad H_a: p_1 - p_2 \neq 0,$$

use the test statistic

$$z^* = \frac{\hat{D} - 0}{s(\hat{D})}, \quad \text{where } s(\hat{D}) = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}.$$

Reject $H_0$ if $|z^*| > z(1-\alpha/2)$. The P-value is $2 \times P(Z > |z^*|)$.

Example: An experiment was conducted to test the effect of a new drug on a viral infection. The infection was induced in 100 mice, and the mice were randomly split into two groups of 50. The first group, the control group, received no treatment for the infection. The second group, the experimental group, received the drug. After a 30-day period, the proportions of survivors in the two groups were found to be 0.36 and 0.60, respectively.
1) Is there sufficient evidence to indicate that the drug is effective in treating the viral infection?
2) Use a 95% confidence interval to estimate the actual difference in the cure rates for the experimental versus the control groups.
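The formulas of Section 2 can be applied to the mouse data as follows. This is a sketch under stated assumptions: the function name `two_proportion_summary` is hypothetical, the difference is taken as experimental minus control, and the P-value is the two-sided one given in Section 2 (a one-sided reading of question 1 would halve it when $z^*$ is positive).

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_summary(p1_hat, n1, p2_hat, n2, alpha=0.05):
    """Return D_hat, s(D_hat), z*, two-sided P-value and the CI for p1 - p2."""
    d_hat = p1_hat - p2_hat
    # Unpooled standard error, as used in Section 2 for both the CI and the test.
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_star = d_hat / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z_star)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return d_hat, se, z_star, p_value, (d_hat - z_crit * se, d_hat + z_crit * se)

# Group 1 = experimental (survival 0.60), group 2 = control (survival 0.36).
d_hat, se, z_star, p_value, (ci_low, ci_high) = two_proportion_summary(0.60, 50, 0.36, 50)

print(f"D_hat = {d_hat:.2f}, s(D_hat) = {se:.4f}")
print(f"z* = {z_star:.3f}, two-sided P-value = {p_value:.4f}")
print(f"95% CI for p1 - p2: ({ci_low:.3f}, {ci_high:.3f})")
```

With these inputs $z^*$ is about 2.47, which exceeds $z(0.975) \approx 1.96$, and the 95% interval for the difference is roughly 0.05 to 0.43, so it excludes zero.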


Section 3: Inferences about Several Proportions

If a random variable $X$ follows the gamma distribution, then its probability density function is given by

$$f(x) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta}, \quad x > 0.$$

The chi-square distribution with $r$ degrees of freedom is a special case of the gamma distribution where $\alpha = r/2$ and $\beta = 2$.

Recall that if $X$ has the binomial($n$, $p$) distribution, then

$$P(X = x) = \binom{n}{x} p^{x} (1-p)^{n-x}, \quad \text{where } \binom{n}{x} = \frac{n!}{x!\,(n-x)!}.$$

Proportions are really just special cases of means. To see this, let $x_i$ be 1 or 0 if the $i$th U.S. citizen is male or female, respectively, and let $p$ represent the actual proportion of male citizens. Then if $N$ is the population size,

$$p = \frac{1}{N}\sum_{i=1}^{N} x_i.$$

So $p$ is really the average of the $N$ 1s and 0s.

This means we can take a sample of size $n$ and estimate $p$ with the unbiased point estimator

$$\hat{p} \equiv \frac{1}{n}\sum_{i=1}^{n} X_i,$$

where $X_i$ is the number (1 or 0) for the $i$th randomly chosen person. Since the population of the U.S. is rather large, we can view the $X_i$s as independent Bernoulli($p$) random variables. That is, the $i$th person should be male with probability $p$. Then

$$\sum_{i=1}^{n} X_i \sim \text{binomial}(n, p),$$

which means that for $k \in \{0, 1, \ldots, n\}$,

$$P\left(\hat{p} = \frac{k}{n}\right) = P\left(\frac{1}{n}\sum_{i=1}^{n} X_i = \frac{k}{n}\right) = P\left(\sum_{i=1}^{n} X_i = k\right) = \binom{n}{k} p^{k} (1-p)^{n-k}.$$

Hence the mean and standard deviation of $\hat{p}$ are $p$ and $\sqrt{p(1-p)/n}$, respectively. (We used these results in Sections 1 and 2 of this chapter without seeing why they hold; now we know why.)
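The claim about the mean and standard deviation of $\hat{p}$ can be checked numerically from the binomial probabilities. The sketch below is illustrative only: the values $n = 20$ and $p = 0.3$ and the function name `p_hat_pmf` are assumptions chosen for the check, not taken from the notes.

```python
from math import comb, sqrt

def p_hat_pmf(n, p):
    """Return {k/n: P(p_hat = k/n)} computed from the binomial(n, p) pmf."""
    return {k / n: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

n, p = 20, 0.3  # illustrative values (assumption)
pmf = p_hat_pmf(n, p)

# Mean and standard deviation of p_hat taken directly over its distribution.
mean = sum(value * prob for value, prob in pmf.items())
sd = sqrt(sum((value - mean) ** 2 * prob for value, prob in pmf.items()))

print(f"mean of p_hat = {mean:.6f}, p = {p}")
print(f"sd of p_hat   = {sd:.6f}, sqrt(p(1-p)/n) = {sqrt(p * (1 - p) / n):.6f}")
```

Up to floating-point rounding, the two printed pairs agree, which matches the formulas used in Sections 1 and 2.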

The multinomial distribution is a joint-distribution generalization of the binomial distribution. That is, suppose that $n$ independent experiments are to be performed, each of which results in outcome 1 with probability $p_1$, outcome 2 with probability $p_2$, ..., outcome $k$ with probability $p_k$. Let $X_i$ denote the number of the $n$ experiments resulting in outcome $i$. Then for $\sum_{i=1}^{k} n_i = n$ and $\sum_{i=1}^{k} p_i = 1$, we have

$$P(X_1 = n_1, X_2 = n_2, \ldots, X_k = n_k) = \frac{n!}{n_1!\, n_2! \cdots n_k!}\, p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k} = n! \prod_{i=1}^{k} \frac{p_i^{n_i}}{n_i!}.$$

The expression above is the joint probability mass function for the multinomial distribution. We can verify this expression by noting that the probability of any particular sequence of the $n$ outcomes in which event $i$ occurs exactly $n_i$ times, for $i = 1, \ldots, k$, is exactly $\prod_{i=1}^{k} p_i^{n_i}$. Also, the number of these sequences is exactly

$$\frac{n!}{n_1!\, n_2! \cdots n_k!}.$$


Example: Cutthroat is a three-player game of pool that John, George, and Ringo like to play (Paul is dead). John is very good, and wins 60% of the time, George wins 30% of the time, and Ringo wins 10% of the time. Suppose they play ten games. What is the probability that John wins at least five games and George wins at least four?
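One way to approach this example is to list every outcome (John's wins, George's wins, Ringo's wins) that satisfies both conditions and add up the multinomial probabilities. The sketch below does exactly that; the function name `multinomial_pmf` is hypothetical, and the games are assumed independent with the stated win probabilities.

```python
from math import factorial

def multinomial_pmf(counts, probs):
    """P(X_1 = n_1, ..., X_k = n_k) = n!/(n_1! ... n_k!) * prod(p_i ** n_i)."""
    coefficient = factorial(sum(counts))
    for c in counts:
        coefficient //= factorial(c)  # stays an exact integer at every step
    probability = float(coefficient)
    for c, p in zip(counts, probs):
        probability *= p**c
    return probability

probs = (0.6, 0.3, 0.1)  # John, George, Ringo
games = 10

# All (john, george, ringo) win counts with john >= 5, george >= 4, total = 10.
outcomes = [
    (john, george, games - john - george)
    for john in range(5, games + 1)
    for george in range(4, games - john + 1)
]
answer = sum(multinomial_pmf(counts, probs) for counts in outcomes)

print(outcomes)
print(f"P(John >= 5 and George >= 4) = {answer:.4f}")
```

Only three outcomes qualify, (5, 4, 1), (5, 5, 0) and (6, 4, 0), and their probabilities sum to about 0.206.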

Back to statistics... The chi-square goodness-of-fit test is a hypothesis test for determining whether certain probabilities (or proportions) take on particular values. To do this, we perform a multinomial experiment and record the observed numbers $n_1, \ldots, n_k$ of trials resulting in each outcome type $1, \ldots, k$. We then compare these numbers to the expected numbers of outcomes of each type under the null hypothesis. The comparison is based on the test statistic

$$X^2 = \sum_{i=1}^{k} \frac{(n_i - E_i)^2}{E_i}, \quad \text{where } E_i = n \times p_i \text{ (expected cell frequency)},$$

with $p_i$ evaluated at its hypothesized value $p_{i0}$.

The Chi-Square Goodness-of-Fit Test:

$H_0: p_1 = p_{10},\ p_2 = p_{20},\ \ldots,\ p_k = p_{k0}$
$H_a$: at least one of the proportions differs from its hypothesized value

Reject $H_0$ if $X^2 > \chi^2(1-\alpha,\, k-1)$.


Example: The units in a population consist of one of five types. A random sample of 300 units is classified as follows:

Category    Observed count, n_i
1           60
2           50
3           130
4           40
5           20
Total       300

It is hypothesized that $H_0: p_1 = 0.20,\ p_2 = 0.15,\ p_3 = 0.40,\ p_4 = 0.15,\ p_5 = 0.10$. At the $\alpha = 0.05$ level, do the 300 units appear to be from a population with these values for $p_i$?
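The goodness-of-fit calculation for this example can be sketched as follows. The function name `chi_square_gof` is hypothetical, and the critical value 9.488 is the tabulated $\chi^2(0.95, 4)$ entered by hand rather than computed, so that only the standard library is needed.

```python
def chi_square_gof(observed, null_probs):
    """Return X^2 = sum((n_i - E_i)^2 / E_i) and the expected counts E_i = n * p_i0."""
    n = sum(observed)
    expected = [n * p for p in null_probs]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, expected

observed = [60, 50, 130, 40, 20]
null_probs = [0.20, 0.15, 0.40, 0.15, 0.10]

statistic, expected = chi_square_gof(observed, null_probs)
critical = 9.488  # tabulated chi-square(0.95, k - 1 = 4); entered by hand
reject = statistic > critical

print("Expected counts:", [round(e, 2) for e in expected])
print(f"X^2 = {statistic:.3f}, critical value = {critical}, reject H0: {reject}")
```

Here the expected counts are 60, 45, 120, 45 and 30, and $X^2$ is about 5.28, which is below 9.488, so under this sketch $H_0$ is not rejected at the 5% level.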


Chi-Square Test of Homogeneity

The following table is an $r \times c$ contingency table, where the $r$ rows correspond to the $r$ populations and the $c$ columns correspond to the $c$ categories of classification.

              Categories of classification
Population    1       2       ...     c       Total
1             n11     n12     ...     n1c     n1.
2             n21     n22     ...     n2c     n2.
...           ...     ...     ...     ...     ...
r             nr1     nr2     ...     nrc     nr.
Total         n.1     n.2     ...     n.c     n

To test the hypothesis

$H_0: p_1 = p_2 = \ldots = p_r$
$H_a$: Not all the population proportions are equal

or, in other words,

$H_0$: The populations are homogeneous
$H_a$: The populations are not homogeneous

we use the test statistic

$$X^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(n_{ij} - E_{ij})^2}{E_{ij}}, \quad \text{where } E_{ij} = \left(\frac{n_{.j}}{n}\right) n_{i.} \text{ (expected cell frequency)}.$$

Reject $H_0$ if $X^2 > \chi^2(1-\alpha,\, (r-1)(c-1))$.


Example: A researcher studied the characteristics of subjects attending a five-day human sexuality program. The results are shown in the following table:

                     Marital status
Group                Single    Married or divorced    Total
Medical students     50        20                     70
Nursing students     12        25                     37
Other students       6         8                      14
Group leaders        1         21                     22
Total                69        74                     143

Test whether or not the four populations represented in the study are homogeneous with respect to marital status.
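The homogeneity statistic for this table can be computed with a short sketch like the one below. The function name `chi_square_homogeneity` is hypothetical, the significance level $\alpha = 0.05$ is an assumption (the example does not state one), and the critical value 7.815 is the tabulated $\chi^2(0.95, 3)$ entered by hand.

```python
def chi_square_homogeneity(table):
    """Return X^2 and its degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n  # E_ij = (n.j / n) * ni.
            statistic += (n_ij - e_ij) ** 2 / e_ij
    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df

# Rows: medical students, nursing students, other students, group leaders.
# Columns: single, married or divorced.
table = [[50, 20], [12, 25], [6, 8], [1, 21]]

statistic, df = chi_square_homogeneity(table)
critical = 7.815  # tabulated chi-square(0.95, 3); alpha = 0.05 is an assumption
reject = statistic > critical

print(f"X^2 = {statistic:.2f} on {df} degrees of freedom")
print(f"critical value = {critical}, reject H0: {reject}")
```

With these counts $X^2$ comes out near 35.8 on 3 degrees of freedom, well above 7.815, so under the assumed 5% level the sketch rejects homogeneity of the four groups with respect to marital status.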
