Section 6.4. Parameter and Statistic. How Likely Are the Possible Values of a Statistic? The Sampling Distribution

Author: Gervase Wiggins
Section 6.4

How Likely Are the Possible Values of a Statistic?



The Sampling Distribution

Parameter and Statistic 





Agresti/Franklin Statistics, 1e, 1 of 21

Example: 2003 California Recall Election 



Prior to counting the votes, the proportion (p) in favor of recalling Governor Gray Davis was an unknown parameter. An exit poll of 3160 voters reported that the sample proportion in favor of a recall was 0.54.


That is, x = 1706 voters in the sample were in favor of a recall, so the sample proportion is x/n = 1706/3160 = 0.54.


Parameter: A numerical summary of a population, such as a population proportion (p) or a population mean (µ).
Statistic: A numerical summary of sample data, such as a sample proportion or a sample mean.
A statistic estimates a parameter.

If a different random sample of 3160 voters were selected, it might yield 1675 voters in favor, giving a different sample proportion, 1675/3160 = 0.53 rather than 0.54. Imagine all the distinct samples of 3160 voters you could possibly get. In this repeated-sampling sense, the sample proportion is a random variable.

Sampling Distribution

Question: How do we know that a sample statistic is a good estimate of a population parameter?
Answer: The sampling distribution.

• The sampling distribution of a statistic is the probability distribution that specifies probabilities for the possible values the statistic can take.
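The idea that the sample proportion varies from one random sample to the next can be illustrated with a small simulation. This is only a sketch: the population proportion `p_assumed = 0.5`, the number of repetitions, and the function name are assumptions made for illustration, since the true p is unknown before the votes are counted.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

n = 3160         # voters per exit poll, as in the example
p_assumed = 0.5  # hypothetical population proportion (the true p is unknown)
reps = 300       # number of simulated polls (arbitrary choice)

def simulate_sample_proportion(n, p):
    """Draw one random sample of n voters and return the proportion in favor."""
    in_favor = sum(random.random() < p for _ in range(n))
    return in_favor / n

# Each entry is the sample proportion from one simulated exit poll.
proportions = [simulate_sample_proportion(n, p_assumed) for _ in range(reps)]
print(min(proportions), max(proportions))
```

The spread of the values in `proportions` is an empirical picture of the sampling distribution under the assumed p.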

Example: Sampling Distribution 

Which Brand of Pizza Do You Prefer?

• Two choices: A or D.
• Assume that half of the population prefers A and half prefers D.
• Parameter of interest: p = population proportion preferring A. That is, p = 0.5.
• Take a random sample of n = 3 tasters.

Example: Sampling Distribution

Sample of size n = 3    No. Prefer Pizza A (x)    Proportion (x/n)
(A,A,A)                 3                         1
(A,A,D)                 2                         2/3
(A,D,A)                 2                         2/3
(D,A,A)                 2                         2/3
(A,D,D)                 1                         1/3
(D,A,D)                 1                         1/3
(D,D,A)                 1                         1/3
(D,D,D)                 0                         0

Sample Proportion    Probability
0                    1/8
1/3                  3/8
2/3                  3/8
1                    1/8
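The two tables above can be reproduced by listing every possible sample of three tasters and adding up probabilities. The sketch below assumes, as the example does, that p = 0.5, so each of the 8 ordered samples has probability 1/8; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

n = 3  # tasters per sample

# Every ordered sample of 3 tasters, each preferring A or D.
samples = list(product("AD", repeat=n))

# With p = 0.5 every ordered sample is equally likely: (1/2)^3 = 1/8.
prob_each = Fraction(1, 2) ** n

# Add up the probability of each possible value of the sample proportion x/n.
sampling_dist = Counter()
for sample in samples:
    x = sample.count("A")  # number preferring pizza A
    sampling_dist[Fraction(x, n)] += prob_each

for proportion in sorted(sampling_dist):
    print(proportion, sampling_dist[proportion])
```

Exact fractions are used instead of floats so the results can be compared directly with the table.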

Example: Sampling Distribution

Use the binomial distribution:

\[ P(x) = {}_nC_x \, p^x (1 - p)^{n - x} \]

Mean and Standard Deviation of the Sampling Distribution of the Sample Proportion

For a binomial random variable with n trials and probability p of success on each trial, the sampling distribution of the proportion of successes has:

Mean = p and standard deviation = \( \sqrt{p(1 - p)/n} \) = standard error

To obtain these values, take the mean np and standard deviation \( \sqrt{np(1 - p)} \) of the binomial distribution of the number of successes and divide each by n.

Example: 2003 California Recall Election

Sample: Exit poll of 3160 voters.

• n = 3160

Suppose that exactly 50% of the population of all voters voted in favor of the recall.

Describe the mean and standard deviation of the sampling distribution of the number in the sample who voted in favor of the recall.

n = 3160, p = 0.50.

• µ = np = 3160(0.50) = 1580
• \( \sigma = \sqrt{np(1 - p)} = \sqrt{3160(0.50)(0.50)} = 28.1 \)
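The binomial formula and the mean and standard deviation of the number of successes can be checked numerically. The following is a minimal sketch using only the standard library; the function names `binomial_pmf` and `count_mean_sd` are chosen here for illustration.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """P(x) = nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def count_mean_sd(n, p):
    """Mean np and standard deviation sqrt(np(1 - p)) of the number of successes."""
    return n * p, sqrt(n * p * (1 - p))

# Pizza example: n = 3 tasters, p = 0.5, for x = 0, 1, 2, 3.
print([binomial_pmf(x, 3, 0.5) for x in range(4)])

# Recall example: n = 3160 voters, p = 0.50.
mean_count, sd_count = count_mean_sd(3160, 0.50)
print(mean_count, round(sd_count, 1))
```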

Example: 2003 California Recall Election

Describe the mean and standard deviation of the sampling distribution of the proportion in the sample who voted in favor of the recall.

Mean = p = 0.50

Standard deviation = \( \sqrt{p(1 - p)/n} = \sqrt{(0.50)(0.50)/3160} = \sqrt{0.000079} = 0.0089 \)
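As a quick check of this arithmetic, the standard error of the sample proportion can be computed two ways: directly from \( \sqrt{p(1 - p)/n} \), and by dividing the standard deviation of the count by n. This is a sketch; the function name is illustrative.

```python
from math import isclose, sqrt

def proportion_standard_error(p, n):
    """Standard deviation of the sample proportion: sqrt(p(1 - p)/n)."""
    return sqrt(p * (1 - p) / n)

n, p = 3160, 0.50
se_direct = proportion_standard_error(p, n)

# Same quantity obtained by dividing the count's standard deviation by n.
se_from_count = sqrt(n * p * (1 - p)) / n

print(round(se_direct, 4), isclose(se_direct, se_from_count))
```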

Example: 2003 California Recall Election

If the population proportion supporting recall was 0.50, would it have been unlikely to observe the exit-poll sample proportion of 0.54? Based on your answer, would you be willing to predict that Davis would be recalled from office?

Fact: The sampling distribution of the sample proportion has a bell shape with mean µ = 0.50 and standard deviation σ = 0.0089 if np ≥ 15 and n(1 − p) ≥ 15.

Convert the sample proportion value of 0.54 to a z-score:

\[ z = \frac{0.54 - 0.50}{0.0089} = 4.5 \]

The sample proportion of 0.54 is more than four standard errors from the expected value of 0.50. A sample proportion of 0.54 voting for recall would be very unlikely if the population support were p = 0.50.

A sample proportion of 0.54 would be even more unlikely if the population support were less than 0.50. We have strong evidence that the actual p was larger than 0.50. The exit poll gives strong evidence that Governor Davis would be recalled.

Recap: Summary of the Sampling Distribution of a Proportion (p)

For a random sample of size n from a population with proportion p, the sampling distribution of the sample proportion has

Mean = p and standard error = \( \sqrt{p(1 - p)/n} \)

If n is sufficiently large that the expected numbers of outcomes of the two types, np and n(1 − p), are both at least 15, then this sampling distribution has a bell shape.
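The reasoning in the recall example (check the bell-shape condition, compute the standard error, convert 0.54 to a z-score) can be sketched in a few lines. The upper-tail probability is an addition not shown on the slides; it uses the normal approximation and the standard identity P(Z > z) = 0.5 · erfc(z/√2).

```python
from math import erfc, sqrt

n, p0, p_hat = 3160, 0.50, 0.54

# Bell-shape condition: both expected counts should be at least 15.
normal_ok = n * p0 >= 15 and n * (1 - p0) >= 15

# Standard error of the sample proportion if the population proportion is p0.
se = sqrt(p0 * (1 - p0) / n)

# Number of standard errors between the observed 0.54 and the assumed 0.50.
z = (p_hat - p0) / se

# Approximate chance of a sample proportion at least this large if p = p0.
upper_tail = 0.5 * erfc(z / sqrt(2))

print(normal_ok, round(z, 1), upper_tail)
```

A very small tail probability corresponds to the statement that 0.54 would be very unlikely if p were 0.50.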

Section 6.5 How Close Are Sample Means to Population Means?

The Sampling Distribution of the Sample Mean

The sample mean, x̄, is a random variable. The sample mean varies from sample to sample. By contrast, the population mean, µ, is a single fixed number.

Mean and Standard Error of the Sampling Distribution of the Sample Mean

For a random sample of size n from a population having mean µ and standard deviation σ, the sampling distribution of the sample mean has:

• Center described by the mean µ (the same as the mean of the population).
• Spread described by the standard error, which equals the population standard deviation divided by the square root of the sample size: \( \sigma / \sqrt{n} \)

Example: How Much Do Mean Sales Vary From Week to Week?

Daily sales at a pizza restaurant vary from day to day.

The sales figures fluctuate around a mean µ = $900 with a standard deviation σ = $300.

The mean sales for the seven days in a week are computed each week. The weekly means are plotted over time. These weekly means form a sampling distribution.

What are the center and spread of the sampling distribution?

Center: µ = $900
Spread: standard error = \( \sigma / \sqrt{n} = 300 / \sqrt{7} = 113 \)
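The standard error in the sales example can be computed directly. The sketch below also evaluates two other sample sizes, n = 1 and n = 28, which are illustrative choices not taken from the slides, to show how the spread of the sample mean depends on n.

```python
from math import sqrt

def standard_error_of_mean(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

mu, sigma = 900, 300  # daily sales mean and standard deviation from the example

weekly_se = standard_error_of_mean(sigma, 7)
print("center:", mu, "spread:", round(weekly_se))

# Illustrative comparison across sample sizes (1 and 28 are not from the slides).
for n in (1, 7, 28):
    print(n, round(standard_error_of_mean(sigma, n), 1))
```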

Sampling Distribution vs. Population Distribution

Standard Error

The standard error of the sample mean: \( \sigma / \sqrt{n} \)

As the sample size n increases, the denominator increases, so the standard error decreases. With larger samples, the sample mean is more likely to fall close to the population mean.

Knowing how to find a standard error gives us a mechanism for understanding how much variability to expect in sample statistics “just by chance.”

Central Limit Theorem

Question: How does the sampling distribution of the sample mean relate, with respect to shape, center, and spread, to the probability distribution from which the samples were taken?

For random sampling with a large sample size n, the sampling distribution of the sample mean is approximately a normal distribution. This result applies no matter what the shape of the probability distribution from which the samples are taken.

Central Limit Theorem: How Large a Sample?

The sampling distribution of the sample mean takes more of a bell shape as the random sample size n increases. The more skewed the population distribution, the larger n must be before the shape of the sampling distribution is close to normal. In practice, the sampling distribution is usually close to normal when the sample size n is at least about 30.
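A simulation can make the Central Limit Theorem concrete. The sketch below assumes a right-skewed population, an exponential distribution with mean 1 and standard deviation 1; that choice, the number of repetitions, and the variable names are illustrative assumptions, not part of the slides.

```python
import random
import statistics
from math import sqrt

random.seed(0)  # fixed seed so the run is repeatable

n = 30       # sample size, matching the "at least about 30" guideline
reps = 5000  # number of simulated samples (arbitrary choice)

# Assumed population: exponential with mean 1 and standard deviation 1.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

center = statistics.fmean(sample_means)   # should be near the population mean, 1
spread = statistics.pstdev(sample_means)  # should be near sigma / sqrt(n)

# Share of sample means within one standard error of the population mean;
# for a normal distribution this share is about 68%.
within_one_se = sum(abs(m - 1.0) <= 1 / sqrt(n) for m in sample_means) / reps

print(round(center, 3), round(spread, 3), round(1 / sqrt(n), 3))
print(round(within_one_se, 3))
```

Rerunning with a smaller n, such as 2 or 5, gives sample means that stay visibly skewed, which is the sense in which a more skewed population needs a larger n.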

A Normal Population Distribution and the Sampling Distribution 

If the population distribution is approximately normal, then the sampling distribution is approximately normal for all sample sizes.


How Does the Central Limit Theorem Help Us Make Inferences?



For large n, the sampling distribution is approximately normal even if the population distribution is not. This enables us to make inferences about population means regardless of the shape of the population distribution.
