Statistical Estimation and Sampling Distributions. Chapter 7


Introduction
Poll: Most think bin Laden planning another U.S. attack
Wednesday, August 23, 2006; CNN
As the five-year anniversary of the September 11, 2001, terrorist attacks approaches, nearly three-fourths of those responding to a CNN poll said they believe Osama bin Laden is planning another significant attack against the United States. Seventy-four percent of the 1,033 adult Americans polled said they believe an attack is being planned, according to the poll conducted by Opinion Research Corporation on behalf of CNN… The poll was conducted by telephone August 18-20 with 1,033 adult Americans. The margin of error for the question on whether bin Laden is planning another attack is plus or minus 3 percentage points.

Introduction
How do we use probability to infer something about unobserved population characteristics, using statistics from the sample that we do observe?
The purpose of statistics is to characterize the underlying population from which a sample was taken, i.e., to infer something about the population using information from the sample.


Introduction
Parameters are characteristics of the underlying population. Statistics are quantities we compute from our sample in order to estimate the values of the population parameters.
Example: Consider the mean height of all USU students. Let $\mu$ represent this average. To estimate $\mu$, we may sample, say, 100 students at random, measure their individual heights, and then compute $\bar{X}$, the average of our sample. Thus, $\mu$ is the population parameter representing the true mean height, and $\bar{X}$ is an estimate of $\mu$.

7.2 Point Estimates
A point estimate $\hat{\theta}$ of an unknown parameter $\theta$ is a statistic that represents the "best guess" at the value of $\theta$. There may be more than one good point estimate of a parameter.
An estimate is unbiased if $E(\hat{\theta}) = \theta$. Otherwise, it is biased, with bias $= E(\hat{\theta}) - \theta$. All else being equal (e.g., equal variances), the smaller the magnitude of the bias, the better.
Example: Suppose that $E(X_1) = \mu$ and $E(X_2) = \mu$. Is $\hat{\mu} = X_1/2 + X_2/2$ an unbiased estimate of $\mu$?
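One way to work the example, using only the linearity of expectation and the stated assumptions $E(X_1) = E(X_2) = \mu$:

```latex
\begin{aligned}
E(\hat{\mu}) &= E\!\left(\tfrac{X_1}{2} + \tfrac{X_2}{2}\right) \\
             &= \tfrac{1}{2}E(X_1) + \tfrac{1}{2}E(X_2) \\
             &= \tfrac{1}{2}\mu + \tfrac{1}{2}\mu = \mu
\end{aligned}
```

Since $E(\hat{\mu}) - \mu = 0$, the bias is zero and the estimate is unbiased. Note that no independence assumption is needed for this step.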

7.2 Point Estimates
If $X \sim B(n, p)$, construct a point estimate of the success probability $p$. Is the point estimate unbiased?
If $X_1, \ldots, X_n$ is a sample of observations from a probability distribution with mean $\mu$, construct a point estimate of $\mu$. Is the point estimate unbiased?
If $X_1, \ldots, X_n$ is a sample of observations from a probability distribution with variance $\sigma^2$, construct a point estimate of $\sigma^2$. Is the point estimate unbiased?


7.2 Point Estimates
If $X \sim B(n, p)$, then
$$\hat{p} = \frac{X}{n}$$
is an unbiased point estimate of the success probability $p$.
If $X_1, \ldots, X_n$ is a sample of observations from a probability distribution with mean $\mu$, then the sample mean $\hat{\mu} = \bar{X}$ is an unbiased point estimate of the population mean $\mu$.
If $X_1, \ldots, X_n$ is a sample of observations from a probability distribution with variance $\sigma^2$, then the sample variance
$$\hat{\sigma}^2 = S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
is an unbiased point estimate of the population variance $\sigma^2$.
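The role of the $n - 1$ divisor can be checked by simulation. The sketch below is illustrative only: the normal population, the sample size of 5, the value $\sigma = 2$, and the function name are all assumptions chosen for the demonstration, not part of the slides. It averages two variance estimates over many samples: the sample variance (divisor $n - 1$) and the alternative with divisor $n$.

```python
import random
import statistics

def average_variance_estimates(n=5, sigma=2.0, trials=10000, seed=0):
    """Average the (n - 1)-divisor and n-divisor variance estimates over many samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total_unbiased = 0.0
    total_biased = 0.0
    for _ in range(trials):
        # Hypothetical population: normal with mean 0 and standard deviation sigma.
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        total_unbiased += statistics.variance(sample)   # divides by n - 1
        total_biased += statistics.pvariance(sample)    # divides by n
    return total_unbiased / trials, total_biased / trials

unbiased_avg, biased_avg = average_variance_estimates()
print(f"true variance:            {2.0 ** 2:.3f}")
print(f"average with n - 1:       {unbiased_avg:.3f}")
print(f"average with n:           {biased_avg:.3f}")
```

Under these assumptions the first average should land near $\sigma^2 = 4$, while the second should land near $\sigma^2 (n - 1)/n = 3.2$, which is the sense in which dividing by $n$ is biased.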

7.2 Point Estimates
The sample proportion is an unbiased point estimate of the population proportion: $\hat{p}$ estimates $p$.
The sample mean is an unbiased point estimate of the population mean: $\bar{X}$ estimates $\mu$.
The sample variance is an unbiased point estimate of the population variance: $S^2$ estimates $\sigma^2$.

7.3 Sampling Distributions
Distribution of a Sample Proportion
A population proportion $p$ is just an average of 1's and 0's. The estimate $\hat{p}$, computed from a sample of size $n$, is likewise a sample average of 1's and 0's.
What does the Central Limit Theorem say about the distribution of a sample proportion? What are the mean and variance of that distribution?
The standard deviation of this sampling distribution is referred to as the standard error, or s.e.($\hat{p}$).

7.3 Sampling Distributions
Distribution of a Sample Proportion
If $X \sim B(n, p)$, then the sample proportion $\hat{p} = X/n$ has the approximate distribution
$$\hat{p} \sim N\left(p, \frac{p(1 - p)}{n}\right)$$

7.3 Sampling Distributions
How do we use the sampling distribution of an estimated proportion to infer something about the underlying population, based on what we observe in our sample?
Consider a gender question in science and engineering: Does the fact that there are 17 women among the 39 students in a statistics class for scientists and engineers say anything about the gender breakdown in the underlying population of science and engineering students at USU?
We can address this by asking: What is the probability of observing 17 or fewer women in a sample of size 39 if we assume, for the sake of argument, that the underlying population is 55% female (the same as the USU breakdown)?
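The probability in question can be computed in more than one way. The sketch below, which treats the class as a random sample so that the count of women is $B(39, 0.55)$, compares the exact binomial sum with the normal approximation for $\hat{p}$. The continuity-corrected version is an addition for comparison and is not taken from the slides.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, observed = 39, 0.55, 17

# Exact probability of 17 or fewer women under X ~ B(39, 0.55).
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(observed + 1))

# Normal approximation: p_hat ~ N(p, p(1 - p)/n).
std_error = sqrt(p * (1 - p) / n)
sampling_dist = NormalDist(mu=p, sigma=std_error)
approx = sampling_dist.cdf(observed / n)

# Same approximation with a continuity correction (17 -> 17.5).
approx_cc = sampling_dist.cdf((observed + 0.5) / n)

print(f"exact binomial:               {exact:.4f}")
print(f"normal approximation:         {approx:.4f}")
print(f"with continuity correction:   {approx_cc:.4f}")
```

A small probability would suggest that the class is unlikely to be a random draw from a population that is 55% female. How small counts as "small" is a judgment the slides leave open.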

7.3 Sampling Distributions
What does "margin of error" mean when sample proportions are reported in the news? Consider again the article shown previously. What is the margin of error?
What is the probability that a normally distributed random variable is within two standard deviations of its mean?
The margin of error represents approximately 2 standard errors (1.96 standard errors, to be exact). Hence, the interval that is ±1.96 standard errors around $\hat{p}$ is called a 95% confidence interval. That is, the researchers are 95% confident that the interval contains the true population proportion $p$: intervals constructed this way capture $p$ in about 95% of samples.
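The reported margin of error can be reproduced from the poll figures ($\hat{p} = 0.74$, $n = 1{,}033$). The sketch below assumes a simple random sample. The "conservative" version, which plugs in $p = 0.5$, is a common convention and an assumption here, since the article does not say how its figure was computed.

```python
from math import sqrt
from statistics import NormalDist

n = 1033        # adults polled
p_hat = 0.74    # proportion who believe an attack is being planned

standard_normal = NormalDist()

# Probability that a normal variable is within two standard deviations of its mean.
within_two_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)

# Multiplier for a 95% interval (about 1.96).
z = standard_normal.inv_cdf(0.975)

# Standard error of p_hat, estimated by plugging p_hat in for p.
std_error = sqrt(p_hat * (1 - p_hat) / n)
margin = z * std_error

# Conservative margin: p(1 - p) is largest at p = 0.5.
conservative_margin = z * sqrt(0.25 / n)

interval = (p_hat - margin, p_hat + margin)

print(f"P(within 2 sd):       {within_two_sd:.4f}")
print(f"z multiplier:         {z:.3f}")
print(f"margin of error:      {100 * margin:.1f} points")
print(f"conservative margin:  {100 * conservative_margin:.1f} points")
print(f"95% interval:         ({interval[0]:.3f}, {interval[1]:.3f})")
```

Both versions round to the plus or minus 3 percentage points quoted in the article.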

7.3 Sampling Distributions
Distribution of the Sample Variance
If $X_1, \ldots, X_n$ is a sample of observations from a normal distribution with variance $\sigma^2$, then the sample variance has the distribution
$$S^2 \sim \sigma^2 \frac{\chi^2_{n-1}}{n - 1}$$
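A simulation gives a quick sanity check of this result. If $S^2 \sim \sigma^2 \chi^2_{n-1}/(n-1)$, then $(n-1)S^2/\sigma^2$ should behave like a $\chi^2_{n-1}$ variable, which has mean $n - 1$ and variance $2(n - 1)$. The sketch below assumes normally distributed observations; the values $n = 10$, $\sigma = 3$, the population mean of 50, and the function name are hypothetical choices for illustration.

```python
import random
import statistics

def scaled_sample_variances(n=10, sigma=3.0, trials=10000, seed=2):
    """Return (n - 1) * S^2 / sigma^2 for many simulated normal samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    scaled = []
    for _ in range(trials):
        sample = [rng.gauss(50.0, sigma) for _ in range(n)]
        s2 = statistics.variance(sample)          # sample variance, divisor n - 1
        scaled.append((n - 1) * s2 / sigma**2)
    return scaled

scaled = scaled_sample_variances()
sim_mean = statistics.fmean(scaled)
sim_var = statistics.variance(scaled)

print(f"simulated mean:     {sim_mean:.2f}   (chi-square with 9 d.f. has mean 9)")
print(f"simulated variance: {sim_var:.2f}   (chi-square with 9 d.f. has variance 18)")
```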

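7.3 Sampling Distributions
Distribution of the Sample Mean
If $X_1, \ldots, X_n$ is a sample of observations from a probability distribution with mean $\mu$ and variance $\sigma^2$, then the sample mean has, approximately for large $n$ (and exactly if the observations are normally distributed), the distribution
$$\hat{\mu} = \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
and therefore
$$Z = \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \sim N(0, 1)$$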
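The mean and variance of the sample mean can be derived in a few lines, assuming the observations are independent and share the same mean $\mu$ and variance $\sigma^2$ (independence is an assumption made explicit here):

```latex
\begin{aligned}
E(\bar{X}) &= \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\, n\mu = \mu, \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i)
                             = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}.
\end{aligned}
```

The standard deviation of $\bar{X}$ is therefore $\sigma/\sqrt{n}$, and subtracting the mean and dividing by this standard deviation gives the standardized statistic $\sqrt{n}(\bar{X} - \mu)/\sigma$.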

7.3 Sampling Distributions
The t Statistic
The question is, how do we make inferences about $\mu$ when we don't know $\sigma^2$?
If $X_1, \ldots, X_n$ are normally distributed with mean $\mu$ and variance $\sigma^2$, then
$$Z = \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \sim N(0, 1)$$
If $\sigma$ is unknown, it can be replaced with the observed quantity $S$ (the sample standard deviation). The resulting statistic $T$ follows a t distribution with $n - 1$ degrees of freedom:
$$T = \frac{\sqrt{n}(\bar{X} - \mu)}{S} \sim t_{n-1}$$

7.3 Sampling Distributions
Recall: A t-distribution with $\nu$ degrees of freedom is defined as follows:
$$t_\nu = \frac{N(0, 1)}{\sqrt{\chi^2_\nu / \nu}}$$
where the $N(0, 1)$ and $\chi^2_\nu$ random variables are independently distributed. As $\nu \to \infty$, the t-distribution tends toward a standard normal distribution.
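The definition can be turned directly into a simulation. The sketch below builds t variates as an independent standard normal divided by the square root of a chi-square over its degrees of freedom, then estimates the upper 2.5% point for several values of $\nu$. The function names, the chosen degrees of freedom, and the trial count are illustrative assumptions. The chi-square draw uses the fact that $\chi^2_\nu$ is a gamma distribution with shape $\nu/2$ and scale 2.

```python
import random
from math import sqrt
from statistics import NormalDist

def simulate_t(nu, trials=20000, seed=1):
    """Draw t variates from the definition N(0,1) / sqrt(chi2_nu / nu)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    draws = []
    for _ in range(trials):
        z = rng.gauss(0.0, 1.0)                  # standard normal numerator
        chi2 = rng.gammavariate(nu / 2, 2.0)     # independent chi-square with nu d.f.
        draws.append(z / sqrt(chi2 / nu))
    return draws

def upper_quantile(draws, prob=0.975):
    """Empirical quantile: the value below which roughly `prob` of the draws fall."""
    ordered = sorted(draws)
    return ordered[int(prob * len(ordered))]

normal_quantile = NormalDist().inv_cdf(0.975)
quantile_by_nu = {nu: upper_quantile(simulate_t(nu)) for nu in (3, 15, 200)}

for nu, q in quantile_by_nu.items():
    print(f"nu = {nu:3d}: simulated 97.5% point = {q:.2f}")
print(f"standard normal 97.5% point = {normal_quantile:.2f}")
```

The simulated cutoffs should shrink toward the normal value of about 1.96 as $\nu$ grows, reflecting the heavier tails of the t-distribution at small degrees of freedom.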


7.3 Sampling Distributions
1. Suppose that we have a sample $X_1, \ldots, X_{16}$ that is normally distributed with mean $\mu$ and variance $\sigma^2$. What is the value $c$ for which $P(|(\bar{X} - \mu)/S| \le c) = 0.95$?

We can use data from a random sample of individuals in our class to draw conclusions about the class as a whole.
2. Construct a point estimate of the proportion of juniors in our class. What is the standard error of the estimate?
3. Construct a point estimate of the average shoe size of students in our class. What is the standard error of the estimate?
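For exercises 2 and 3, the calculations might be organized as in the sketch below. The data are hypothetical placeholders invented for illustration, not real class data. The standard errors use the usual plug-in forms $\sqrt{\hat{p}(1 - \hat{p})/n}$ and $S/\sqrt{n}$, and the sketch ignores any correction for sampling from a small finite class.

```python
from math import sqrt
import statistics

# Hypothetical sample of 10 students (placeholder values, not real data).
class_years = ["junior", "senior", "junior", "sophomore", "junior",
               "senior", "junior", "junior", "sophomore", "senior"]
shoe_sizes = [8.0, 9.5, 10.0, 7.5, 11.0, 9.0, 8.5, 10.5, 9.0, 7.0]

n = len(class_years)

# Exercise 2: point estimate of the proportion of juniors and its standard error.
p_hat = class_years.count("junior") / n
se_p_hat = sqrt(p_hat * (1 - p_hat) / n)

# Exercise 3: point estimate of the mean shoe size and its standard error.
x_bar = statistics.fmean(shoe_sizes)
s = statistics.stdev(shoe_sizes)          # sample standard deviation, divisor n - 1
se_x_bar = s / sqrt(len(shoe_sizes))

print(f"p_hat = {p_hat:.2f}, s.e.(p_hat) = {se_p_hat:.3f}")
print(f"x_bar = {x_bar:.2f}, s.e.(x_bar) = {se_x_bar:.3f}")
```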