Chapter 7: SAMPLING DISTRIBUTIONS & POINT ESTIMATION OF PARAMETERS

Author: Ronald Daniels
Part 1: Introduction
Sampling Distributions & the Central Limit Theorem
Point Estimation & Estimators
Sections 7-1 to 7-2

Sample data is collected on a population to draw conclusions, or make statistical inferences, about the population.

Types of statistical inference:
1) parameter estimation (e.g. estimating µ) - with a certain level of confidence
2) hypothesis testing (e.g. H₀: µ = 50)

Example of parameter estimation (or point estimation): We're interested in the value of µ. We collect data and use the observed x̄ as a point estimate for µ. µ is the unknown parameter being estimated.

NOTATION: µ̂ = X̄. X̄ is the estimator. {We often show an estimator as a 'hat' over its respective parameter.}

The estimate x̄ is a single value, or a point estimate.

X̄ is the statistic of interest from the data.

Sample-to-sample variability

The value we get for X̄ (the sample mean) depends on the specific sample chosen.

[Figure: a sample of size n drawn from a population]

This means X̄ is a random variable!

The distribution of the random variable X̄ is called the sampling distribution of X̄.

We expect X̄ to be close to µ (we ARE using it to estimate µ), but there is variability in X̄ before it is observed because we use random sampling to choose our sample of size n.


Sampling distribution of X̄
• Tells us what kind of values are likely to occur for X̄.
• Puts a probability distribution over the possible values for X̄.
HINT: Its distribution will be normal.

In a simple random sample of n observations from a population,

E(X̄) = µ

⇒ X̄ is an unbiased estimator of µ.

This gives us a measure of center for the sampling distribution of X̄, but what about the variability of the random variable X̄?


Sampling distribution of X̄

Case 1: Original population is normally distributed.

[Figure: a normal population density f(x)]

The x̄ I observe depends on the sample (the particular n observations) I chose from this normal distribution. Let's look at the distribution of x̄ values if I choose a sample of size n and compute x̄ for that sample, and I repeat this process 1000 times...

[Figure: frequency histogram of the x̄ values]

1) Choose a sample of size n from a normal distribution
2) Compute x̄
3) Plot the x̄ on our frequency histogram
4) Do steps 1-3 1000 times

See applet at: http://onlinestatbook.com/stat_sim/sampling_dist/index.html
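The four steps above can be sketched as a short simulation. The population values µ = 50 and σ = 10 (and summary statistics standing in for the histogram) are illustrative assumptions, not values from the notes.

```python
# Sketch of steps 1-4 above, assuming an illustrative N(50, 10^2) population.
import random
import statistics

random.seed(1)
mu, sigma, n, reps = 50.0, 10.0, 25, 1000

xbars = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]  # step 1
    xbars.append(statistics.mean(sample))                 # step 2; step 3 would add it to a histogram

# After 1000 repetitions, the x-bar values center at mu with
# spread close to sigma / sqrt(n) = 10 / 5 = 2.
print(statistics.mean(xbars), statistics.stdev(xbars))
```

Plotting `xbars` as a histogram reproduces the bell shape the applet shows.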


SKETCH THE PLOTS:
Distribution of X̄ for n=2 when original population is normal.

Distribution of X̄ for n=25 when original population is normal.


Turns out, in this case, the random variable X̄ is normally distributed.

This normal distribution is centered at µ (the mean of the original population we were sampling from).

The variability of X̄ depends on the sample size n, and the variability in the original population.

SPECIFICALLY: When X ∼ N(µ, σ²),

X̄ ∼ N(µ, σ²/n)

NOTE: the distribution for X̄ is less variable than the distribution for X.

X̄ ∼ N(µ, σ²/n)

NOTE: X̄ from n = 25 is less variable than X̄ from n = 2.

More data (larger n) gives us a better estimate of µ from X̄. The distribution of our estimator X̄ is squished closer, or is tighter, around the thing we're trying to estimate, which is beneficial when estimating something.


Sampling distribution of X̄

Case 2: Original population is NOT normally distributed.

[Figures: several non-normal population densities f(x) - skewed, bimodal, or anything else...]

What does the distribution of X̄ look like?

1) Choose a sample of size n from the distribution
2) Compute x̄
3) Plot the x̄ on our frequency histogram
4) Do steps 1-3 1000 times
———————————————————–
Right-skewed with n = 10.


Really non-normal (mass out at the ends) with n = 2.

Really non-normal (mass out at the ends) with n = 25.


Turns out the random variable X̄ is normally distributed no matter what your original distribution was IF n is large enough...

What's large enough? Rule of thumb is n ≥ 30.

So, what have we learned...

if X is normally distributed, then X̄ ∼ N(µ, σ²/n) for any n.
if X is NOT normally distributed, then X̄ ∼ N(µ, σ²/n) for n ≥ 30.
if X is not severely non-normal, then X̄ ∼ N(µ, σ²/n) is close to true for n < 30.
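A quick check of the n ≥ 30 rule of thumb: the right-skewed exponential population used here (rate 1, so µ = σ = 1) is an illustrative choice, not one from the notes. Even though the population is far from normal, the sample means cluster near 1 with spread close to 1/√30 ≈ 0.18.

```python
# Sample means from a skewed exponential population, with n = 30 (rule of thumb).
import random
import statistics

random.seed(2)
n, reps = 30, 2000
xbars = [statistics.mean(random.expovariate(1.0) for _ in range(n))
         for _ in range(reps)]

# Mean near 1, spread near 1/sqrt(30) ≈ 0.18; a histogram of xbars
# is roughly bell-shaped despite the skewed population.
print(statistics.mean(xbars), statistics.stdev(xbars))
```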


Sampling Distributions and the Central Limit Theorem Section 7-2


NOTATION:
− A large letter like X̄ represents the random variable X̄, and X̄ can take on many values.
− A small letter like x̄ represents an actual observed x̄ from a sample, and it is a fixed quantity once observed.


• Random Sample
The random variables X₁, X₂, . . . , Xₙ are a random sample of size n if...
a) the Xᵢ's are independent random variables, and
b) every Xᵢ has the same probability distribution (i.e. they are drawn from the same population).
NOTE: the observed data x₁, x₂, . . . , xₙ is also referred to as a random sample.


• Statistic
– A statistic is any function of the observations in a random sample.
∗ Example: The mean X̄ is a function of the observations (specifically, a linear combination of the observations).

X̄ = (X₁ + X₂ + · · · + Xₙ)/n = (1/n)X₁ + (1/n)X₂ + · · · + (1/n)Xₙ

– A statistic is a random variable, and it has a probability distribution.
– The distribution of a statistic is called the sampling distribution of the statistic because it depends on the sample chosen.


– The sampling distribution of the mean is very important.

What is the expected value of the sample mean X̄ in a random sample?

E(X̄) = E((1/n)X₁ + (1/n)X₂ + · · · + (1/n)Xₙ)
     = (1/n) Σ E(Xᵢ)
     = (1/n) Σ µ
     = nµ/n
     = µ

Notation: E(X̄) = µ_X̄ = µ

where µ is the population mean. (µ is also the expected value of a single Xᵢ)


What is the variance of the sample mean X̄ in a random sample?
(Xᵢ's in a random sample are independent.)

V(X̄) = V((1/n)X₁ + (1/n)X₂ + · · · + (1/n)Xₙ)
     = (1/n)² Σ V(Xᵢ)
     = (1/n)² Σ σ²
     = (1/n)² nσ²
     = σ²/n

Notation: V(X̄) = σ²_X̄ = σ²/n

where σ² is the population variance. (σ² is also the variance of a single Xᵢ)
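The two results E(X̄) = µ and V(X̄) = σ²/n can be checked numerically. The fair six-sided die used below (µ = 3.5, σ² = 35/12) is an illustrative population, not one from the notes.

```python
# Numeric check of E(Xbar) = mu and V(Xbar) = sigma^2 / n for a fair die.
import random
import statistics

random.seed(3)
n, reps = 10, 5000
xbars = [statistics.mean(random.randint(1, 6) for _ in range(n))
         for _ in range(reps)]

# Expect mean near mu = 3.5 and variance near (35/12)/10 ≈ 0.292.
print(statistics.mean(xbars), statistics.variance(xbars))
```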


As we have described earlier, for n ≥ 30

X̄ ∼ N(µ, σ²/n)

(and this is also true for n < 30 if each Xᵢ comes from a normal population).

Using this fact, and what we know about standardizing variables, leads to...

• The Central Limit Theorem
If X₁, X₂, . . . , Xₙ is a random sample of size n taken from a population with mean µ and variance σ², the limiting form of the distribution of

Z = (X̄ − µ)/(σ/√n)

as n → ∞ is the standard normal distribution, or N(0, 1).

The approximation

(X̄ − µ)/(σ/√n) ∼ N(0, 1)

depends on the size of n.

Satisfactory approximation for n ≥ 30 for any population.
Satisfactory approximation for n < 30 for near-normal populations.
————————————————————
The next graphic shows 3 different original populations (one nearly normal, two that are not), and the sampling distribution for X̄ based on a sample of size n = 5 and size n = 30.

The three original distributions are on the far left (one that is nearly symmetric and bell-shaped, one that is right skewed, and one that is highly right skewed).

As shown in: Navidi, W. 'Statistics for Engineers and Scientists', McGraw Hill, 2006

Things to notice from the previous graphic:
• The variability of X̄ decreases as n increases. Recall: V(X̄) = σ²/n.
• If the original population has a shape that's closer to normal, smaller n is sufficient for X̄ to be normal.
• The normal approximation gets better with larger n when you're starting with a non-normal population.
• Even when X has a very non-normal distribution, X̄ still has a normal distribution with a large enough n.


• Example: Flaws in a copper wire
Let X denote the number of flaws in a 1 inch length of copper wire. The probability mass function of X is presented in the following table:

x    P(X = x)
0    0.48
1    0.39
2    0.12
3    0.01

Suppose n = 100 wires are sampled from this population. What is the probability that the average number of flaws per wire in the sample is less than 0.5?


ANS: P(X̄ < 0.5) = ?
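One way to carry out the calculation (a sketch, not part of the original notes): µ = 0.66 and σ² = 0.5244 follow from the table, and since n = 100 ≥ 30 the CLT gives X̄ approximately N(µ, σ²/n).

```python
# P(Xbar < 0.5) for the copper-wire example via the CLT.
from math import sqrt
from statistics import NormalDist

pmf = {0: 0.48, 1: 0.39, 2: 0.12, 3: 0.01}
mu = sum(x * p for x, p in pmf.items())              # E(X) = 0.66
var = sum(x**2 * p for x, p in pmf.items()) - mu**2  # V(X) = 0.5244

n = 100
z = (0.5 - mu) / sqrt(var / n)   # standardize: z ≈ -2.21
prob = NormalDist().cdf(z)       # P(Xbar < 0.5) ≈ 0.014
print(prob)
```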


Differences in sample means X̄₁ and X̄₂

What if we're interested in estimating the difference in means between two populations?

Value of interest: µ₁ − µ₂

Estimator: X̄₁ − X̄₂

[Figure: densities of Pop'n 1 and Pop'n 2 plotted against Y, with different means]

The above picture shows two populations with different means, µ₁ − µ₂ ≠ 0.


[Figure: densities of Pop'n 1 and Pop'n 2 plotted against Y, lying on top of each other]

If the populations had the same mean, then the two distributions would be on top of each other (no distinction), and µ₁ − µ₂ = 0.

We want to know the behavior of our estimator X̄₁ − X̄₂. So far, we've only discussed the behavior of X̄.


The sampling distribution of X̄₁ − X̄₂:

We will assume the sample from each group was taken independent of the other (two independent samples).

E(X̄₁ − X̄₂) = E(X̄₁) − E(X̄₂) = µ₁ − µ₂

where µ₁ is the population mean of pop'n 1
where µ₂ is the population mean of pop'n 2

V(X̄₁ − X̄₂) = V(X̄₁) + V(X̄₂) = σ₁²/n₁ + σ₂²/n₂   {since independent}

where σ₁² is the population variance of pop'n 1
where σ₂² is the population variance of pop'n 2



X̄₁ − X̄₂ is a random variable with

E(X̄₁ − X̄₂) = µ₁ − µ₂   and   V(X̄₁ − X̄₂) = σ₁²/n₁ + σ₂²/n₂

So, we have the expected value and the variance of this random variable of interest. But we’d like to know the full distribution of the r.v.
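The expected value and variance above can be checked by simulation; the population parameters used here (µ₁ = 10, σ₁ = 2, n₁ = 20; µ₂ = 7, σ₂ = 3, n₂ = 30) are illustrative assumptions, not values from the notes.

```python
# Simulation check of E(X1bar - X2bar) and V(X1bar - X2bar).
import random
import statistics

random.seed(4)
mu1, s1, n1 = 10.0, 2.0, 20
mu2, s2, n2 = 7.0, 3.0, 30
reps = 4000

diffs = []
for _ in range(reps):
    x1 = statistics.mean(random.gauss(mu1, s1) for _ in range(n1))
    x2 = statistics.mean(random.gauss(mu2, s2) for _ in range(n2))
    diffs.append(x1 - x2)

# Expect mean near mu1 - mu2 = 3 and variance near 4/20 + 9/30 = 0.5.
print(statistics.mean(diffs), statistics.variance(diffs))
```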


IF both original populations were normal, then X̄₁ and X̄₂ are linear combinations of normal random variables, and X̄₁ − X̄₂ is also a linear combination of normals... so

X̄₁ − X̄₂ ∼ N(µ₁ − µ₂, σ₁²/n₁ + σ₂²/n₂)

Again, we have a random variable of interest X̄₁ − X̄₂ that has a normal distribution with known 'predictable' behavior.
————————————————————

What if both original populations were NOT normal? If n₁ and n₂ are both greater than 30, then we can apply the central limit theorem to show that X̄₁ − X̄₂ is, again, normally distributed.

• Approximate Sampling Distribution for X̄₁ − X̄₂
If we have two independent populations with means µ₁ and µ₂ and variances σ₁² and σ₂², and if X̄₁ and X̄₂ are sample means of two independent random samples of size n₁ and n₂ from the two populations, then the sampling distribution of

Z = ((X̄₁ − X̄₂) − (µ₁ − µ₂)) / √(σ₁²/n₁ + σ₂²/n₂)

is approximately standard normal (if the conditions of the central limit theorem apply). If the original populations were normal to begin with, then Z is exactly a standard normal.


• Example: Difference in means
A random sample of n₁ = 20 observations is taken from a normal population with mean 30. A random sample of n₂ = 25 observations is taken from a different normal population with mean 27. Both populations have σ² = 8.

What is the probability that X̄₁ − X̄₂ exceeds 5?
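A sketch of the solution (not part of the original notes): both populations are normal, so X̄₁ − X̄₂ is exactly N(µ₁ − µ₂, σ²/n₁ + σ²/n₂) = N(3, 0.72).

```python
# P(X1bar - X2bar > 5) for the difference-in-means example.
from math import sqrt
from statistics import NormalDist

mu_diff = 30 - 27            # mean of X1bar - X2bar
var_diff = 8 / 20 + 8 / 25   # variance: sigma^2/n1 + sigma^2/n2 = 0.72

z = (5 - mu_diff) / sqrt(var_diff)  # z ≈ 2.36
prob = 1 - NormalDist().cdf(z)      # P(X1bar - X2bar > 5) ≈ 0.009
print(prob)
```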


• Example: Picture tube brightness (problem 7-14, p. 248)
A consumer electronics company is comparing the brightness of two different types of picture tubes. Type A is the present model, and is thought to have a population mean brightness of 100 and a known standard deviation of 16. Type B has an unknown mean brightness and a standard deviation equal to that of type A. If µB exceeds µA, the manufacturer would like to adopt type B for use. A random sample of 25 is taken from each type...


The observed difference in sample means is x̄B − x̄A = 6.75 (so, the sample mean brightness for type B was higher than the sample mean for type A, but is it high enough?). What decision should they make?
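One hedged way to frame the decision (a sketch, not the book's solution): ask how likely a sample difference of 6.75 or more would be if type B were actually no brighter than type A (µB = µA), using the known σ = 16 and n = 25 per type.

```python
# If mu_B = mu_A, then XBbar - XAbar ~ N(0, 16^2/25 + 16^2/25).
from math import sqrt
from statistics import NormalDist

sigma, n = 16, 25
sd_diff = sqrt(sigma**2 / n + sigma**2 / n)  # sd of XBbar - XAbar ≈ 4.53

z = 6.75 / sd_diff              # z ≈ 1.49
prob = 1 - NormalDist().cdf(z)  # P(difference ≥ 6.75 when means are equal) ≈ 0.07
print(prob)
```

A probability of about 7% says a difference this large is somewhat unusual if the means are truly equal, which previews the hypothesis-testing logic mentioned at the start of the chapter; the notes leave the final decision to be worked out.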
