Distribution of the Sample Mean


Estimation of the population mean

• In many investigations the data of interest take on a wide range of possible values.
• Examples: attachment loss (mm) and DMFS.
• With this type of data it is often of interest to estimate the population mean, μ.
• A common estimator for μ is the sample mean, X̄.
• In this lecture we will focus on the sampling distribution of X̄.
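For concreteness, here is a minimal sketch of computing a sample mean as an estimate of μ; the attachment-loss values below are made up for illustration, not taken from any study.

```python
import numpy as np

# Hypothetical attachment-loss measurements (mm) for a small sample of patients.
attachment_loss = np.array([2.1, 3.4, 1.8, 2.9, 4.2, 2.5, 3.0])

xbar = attachment_loss.mean()  # the sample mean: our estimate of the population mean mu
print(f"estimated mu: {xbar:.2f} mm")
```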

Example: Fluoride Varnish Study*

• Children in Yakima, WA were randomized to two different methods of fluoride varnish delivery.
• Followed for ~3 years.
• The outcome of interest was the number of surfaces with new decay.

* Weinstein, P. et al. Caries Research 2009;43(6):484-90.

Example: Fluoride Varnish Study

• Can summarize the observed data with the sample mean and standard deviation.
• The sample mean is used as an estimate of the true population mean.
• X̄ = 7.4 for the “Standard” group.
• How good of an estimate is it?

[Figure: group means ± standard deviations]

Example: Fluoride Varnish Study

• X̄ is a random variable.
• Its value is determined by which people are randomly chosen to be in the sample.
• Many possible samples, many possible X̄’s.

[Figure: histograms of the outcome for many possible samples, each with its own sample mean (e.g., X̄ = 7.4, 7.8, 6.9, 7.2, 7.0, 7.6, 8.1, 7.5, 6.8, 8.0, 6.6, 7.9, 7.3, ...); each x-axis runs from 0 to 40 surfaces]

Example: Fluoride Varnish Study

• In our study we only see one occurrence of the sample mean.
• We will have a better idea of how good our one estimate is if we have good knowledge of how X̄ behaves.
• That is, if we know the probability distribution of X̄.
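The "many possible samples, many possible X̄'s" idea can be made concrete by simulation. A minimal sketch, assuming (purely for illustration, not from the study) that decayed-surface counts follow a Poisson distribution with mean 7.4 and that each sample contains 100 children:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: decayed-surface counts ~ Poisson(7.4).
pop_mean, n_per_sample, n_samples = 7.4, 100, 1000

# Draw many possible samples and record the sample mean of each one.
sample_means = np.array([rng.poisson(pop_mean, size=n_per_sample).mean()
                         for _ in range(n_samples)])

print("a few sample means:      ", np.round(sample_means[:5], 2))
print("mean of the sample means:", round(sample_means.mean(), 2))  # close to 7.4
print("SD of the sample means:  ", round(sample_means.std(), 2))   # close to sqrt(7.4)/sqrt(100), about 0.27
```

Each draw plays the role of one possible study; the spread of the resulting means is exactly the sampling distribution discussed next.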


The Central Limit Theorem

• An important result in probability theory states that the probability distribution for averages (i.e., X̄) is the Normal distribution.*
• The size of the sample needs to be reasonably large.
• This result will often hold regardless of the distribution of the original data.

* some restrictions will apply
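A small simulation sketch of this claim (not from the lecture): even when the raw data come from a strongly skewed distribution, the sample means behave approximately like a Normal random variable. The exponential population and sample size below are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Deliberately skewed "population": exponential with mean 1 and SD 1.
n, n_samples = 50, 5000
sample_means = rng.exponential(scale=1.0, size=(n_samples, n)).mean(axis=1)

# CLT: the means should be approximately Normal(mu, sigma^2/n) = Normal(1, 1/50).
z = (sample_means - 1.0) / (1.0 / np.sqrt(n))
print("skewness of raw exponential data:", round(stats.skew(rng.exponential(size=100_000)), 2))  # about 2
print("skewness of the sample means:    ", round(stats.skew(sample_means), 2))                   # much closer to 0
print("P(|Z| < 1.96) in the simulation: ", round(np.mean(np.abs(z) < 1.96), 3))                  # near 0.95
```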

Probability distribution for X̄

[Figure: sampling distributions of X̄, centered at μ; the approximation by the Normal distribution is not as good with only 10 observations]

More on the distribution of X̄

• The expected value of X̄ is μ.
• X̄ is “unbiased”.
• On average, X̄ neither overestimates nor underestimates μ.

More on the distribution of X̄

• The standard deviation of X̄ is

  $SE(\bar{X}) = \dfrac{\sigma}{\sqrt{n}}$

• σ is the standard deviation in the population.
• n is the number of people in the sample.
• SE(X̄) is called the standard error of the mean, or SEM.

More on the distribution of X̄

$SE(\bar{X}) = \dfrac{\sigma}{\sqrt{n}}$

• One can think of SE(X̄) as the average error that X̄ makes when estimating μ, or the precision of the estimate.
• The precision of X̄ is better (the SEM is smaller) when the sample is larger (larger n).
• The precision is worse (the SEM is greater) when the population is more variable (has greater σ).
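A quick numeric sketch of this behavior; σ = 20 is just an illustrative value, not from the lecture.

```python
import numpy as np

sigma = 20.0  # illustrative population SD

for n in (10, 40, 160):
    sem = sigma / np.sqrt(n)
    print(f"n = {n:4d}   SEM = {sem:5.2f}")
# Quadrupling n halves the SEM; doubling sigma would double every SEM.
```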

More on the distribution of X̄

• By the Central Limit Theorem, when n is reasonably large the distribution of X̄ will be approximately Normal, with mean μ and standard deviation σ/√n:

$\bar{X} \sim \mathrm{Normal}\!\left(\mu,\ \dfrac{\sigma^2}{n}\right)$

Example: Birthweight data

• The histogram shows the distribution of birthweights at a Boston hospital.
• Estimate the probability that the mean birthweight of the next 20 babies born will be greater than 120 oz.

μ = 112 oz, σ = 20.6 oz
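A sketch of the calculation using the CLT approximation above; the rounding here may differ slightly from the lecture's worked answer.

```python
from math import sqrt
from scipy.stats import norm

mu, sigma, n = 112.0, 20.6, 20
sem = sigma / sqrt(n)                   # standard error of the mean, about 4.61 oz

# P(Xbar > 120) when Xbar ~ Normal(mu, sigma^2 / n)
prob = norm.sf(120, loc=mu, scale=sem)  # survival function = 1 - CDF
print(round(sem, 2), round(prob, 3))    # about 4.61 and 0.041
```

So the chance that the next 20 babies average more than 120 oz is roughly 4%.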


Law of Large Numbers

• Recall $\bar{X} \sim \mathrm{Normal}\!\left(\mu,\ \dfrac{\sigma^2}{n}\right)$.
• As n gets large, the distribution of X̄ is forced to be closer and closer to μ.
• With larger sample sizes X̄ provides a better estimate of μ.
• The same is true for the sample standard deviation s:
• As the sample size increases, s should get closer to the population standard deviation σ.
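A minimal simulation sketch of this convergence; the population values μ = 5 and σ = 2 are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 5.0, 2.0  # illustrative population values

for n in (10, 100, 10_000):
    x = rng.normal(mu, sigma, size=n)
    # Both the sample mean and the sample SD settle near the population values as n grows.
    print(f"n = {n:6d}   xbar = {x.mean():6.3f}   s = {x.std(ddof=1):6.3f}")
```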

Standard Error versus Standard Deviation

• Standard Deviation: describes the variability of a population or a sample.
• Standard Error: describes the variability of an estimator that is usually a function of the whole sample.
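A short numeric contrast (simulated Normal data, purely illustrative): as n grows the sample SD stays near σ, while the standard error of the mean keeps shrinking.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 10.0  # illustrative population SD

for n in (25, 400, 10_000):
    x = rng.normal(0.0, sigma, size=n)
    s = x.std(ddof=1)      # sample SD: estimates sigma, does not shrink with n
    se = s / np.sqrt(n)    # standard error of the mean: shrinks like 1/sqrt(n)
    print(f"n = {n:6d}   s = {s:6.2f}   SE(xbar) = {se:5.2f}")
```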

Confidence intervals for the mean

$\dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$

• If n is large enough, we can use this result to construct a confidence interval for μ.
• However, this would result in a formula that involves σ, a value that we don’t usually know.
• In practice we will estimate σ with the sample standard deviation, s.
• Substituting the random variable s for σ will alter the distribution of the Z score slightly.

The t distribution

The distribution of the statistic

$T = \dfrac{\bar{X} - \mu}{s / \sqrt{n}}$

is called a “t” distribution with n − 1 “degrees of freedom”, and is denoted by $t_{n-1}$.

The t distribution

$T = \dfrac{\bar{X} - \mu}{s / \sqrt{n}}$

• The shape of the t distribution is similar to the Normal distribution, but it has higher variability.
• How much higher depends on the degrees of freedom, which depend on the sample size.

The t distribution

$T = \dfrac{\bar{X} - \mu}{s / \sqrt{n}}$

• The larger the sample, the less variability.
• t distributions with higher degrees of freedom are more similar to the Normal distribution.
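The sketch below compares t critical values with the Normal value 1.96 using scipy, as a quick check of this statement; the degrees of freedom shown are arbitrary.

```python
from scipy.stats import norm, t

print("Normal 97.5th percentile:", round(norm.ppf(0.975), 3))  # 1.96
for df in (4, 9, 29, 99):
    # 97.5th percentile of the t distribution with df = n - 1 degrees of freedom
    print(f"df = {df:3d}   t_0.975 = {t.ppf(0.975, df):.3f}")   # decreases toward 1.96
```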

Confidence intervals for the mean

• If X is Normal or n is large, then $T = \dfrac{\bar{X} - \mu}{s / \sqrt{n}}$ has a t distribution with n − 1 degrees of freedom, and

$P\!\left(-t_{n-1,\,0.975} \le T \le t_{n-1,\,0.975}\right) = 0.95$

• Rearranging this statement gives the usual 95% confidence interval for μ: $\bar{X} \pm t_{n-1,\,0.975}\,\dfrac{s}{\sqrt{n}}$.
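A minimal sketch of computing this interval in code; the sample below is simulated (reusing the birthweight values 112 and 20.6 only as convenient inputs), not real data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(loc=112, scale=20.6, size=20)  # illustrative simulated sample

n = len(x)
xbar, s = x.mean(), x.std(ddof=1)
tcrit = stats.t.ppf(0.975, df=n - 1)          # t critical value with n - 1 df
half_width = tcrit * s / np.sqrt(n)
print(f"95% CI for mu: ({xbar - half_width:.1f}, {xbar + half_width:.1f})")
```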
