PHP 2510 Inference about population mean; distribution of the sample mean; standard error; central limit theorem

Author: Georgia Barton

Begin formalities of drawing inference from data

Usual setup:
• Target parameter (e.g. population mean)
• Use a statistic to estimate the parameter (e.g. sample mean)

Recall: a statistic is a summary of data (random variables).

PHP 2510 – October 15, 2009


Key Idea: We want to estimate the population mean, which is unknown.

We know the sample mean. (We collect data and calculate the sample mean.)

We use the sample mean as an estimate of the population mean.

The sample mean is random. It has a distribution.

We need to quantify the uncertainty of using the sample mean to estimate the population mean.


To motivate the idea...

Suppose a biostatistics program has 10 PhD students. We label them A, B, ..., J.

Their heights (in feet) are: 5.41, 5.31, 5.47, 5.54, 5.58, 5.84, 5.73, 5.41, 5.74, and 5.26.

The population is {A, B, ..., J}. The population average height (the parameter of interest) is 5.529.

Suppose that we do not know their average height and want to estimate it using a random sample of 5 students. There are many ways to randomly choose 5 students.


Possible samples include:

    #      sample    sample mean
    1      BCDFH     5.514
    2      BCEFH     5.522
    3      DEHIJ     5.506
    4      BDGIJ     5.516
    5      ADEFI     5.622
    ...    ...       ...

We see only one of these, which is random.
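The table lists only a few of the possible samples. As a rough sketch, assuming the five students are drawn without replacement and every subset is equally likely, the full list of samples can be enumerated from the heights given above. The variable names here are illustrative and not part of the original slides.

```python
from itertools import combinations
from statistics import mean

# Heights of the 10 students, as listed in the example above.
heights = {
    "A": 5.41, "B": 5.31, "C": 5.47, "D": 5.54, "E": 5.58,
    "F": 5.84, "G": 5.73, "H": 5.41, "I": 5.74, "J": 5.26,
}

population_mean = mean(heights.values())

# Every possible sample of 5 students, keyed by its labels, with its sample mean.
sample_means = {
    "".join(sample): mean(heights[s] for s in sample)
    for sample in combinations(sorted(heights), 5)
}

print(len(sample_means))                       # number of possible samples
print(round(population_mean, 3))               # population mean
print(round(sample_means["BCDFH"], 3))         # first row of the table
print(round(mean(sample_means.values()), 3))   # average of all sample means
```

Averaging the sample means over all possible samples gives back the population mean, which previews the result that the sample mean is centered on the parameter it estimates.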


What if we decide to randomly select 8 students and use their average height as an estimate?

Possible samples include:

    #      sample      sample mean
    1      ABDEFHIJ    5.511
    2      ABCDEFGH    5.536
    3      ABCEGHIJ    5.489
    4      BCDEFGIJ    5.559
    5      ABCEFGHJ    5.501
    ...    ...         ...

[Figure: sample mean (vertical axis, roughly 5.3 to 5.8) plotted against sample size (horizontal axis, 4 to 9).]
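The figure appears to show how the spread of possible sample means changes with sample size. A minimal sketch of that comparison, assuming all subsets of each size are equally likely and using the ten heights from the example (the function name is hypothetical):

```python
from itertools import combinations
from statistics import mean

# The ten heights from the example, in order A through J.
heights = [5.41, 5.31, 5.47, 5.54, 5.58, 5.84, 5.73, 5.41, 5.74, 5.26]

def sample_mean_range(values, n):
    """Return (smallest, largest) sample mean over all samples of size n."""
    means = [mean(sample) for sample in combinations(values, n)]
    return min(means), max(means)

# Sample sizes 4 through 9, matching the horizontal axis of the figure.
ranges = {n: sample_mean_range(heights, n) for n in range(4, 10)}
for n, (low, high) in ranges.items():
    print(n, round(low, 3), round(high, 3), round(high - low, 3))
```

The last column is the width of the range of possible sample means, which narrows as the sample size grows.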


Posing the basic question

We want to draw inference about the mean of some population. We collect observations $X_1, \ldots, X_n$ and compute the sample mean

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

Note that each $X_i$ is random.

Question: What information about the true mean is captured by the sample mean? Specifically, what can we infer about the true mean from the sample mean?


Sample mean is a random variable

The key to conducting inference is to understand that the sample mean is a random variable. It has:
• A mean
• A variance
• An associated probability model

In short, we will study how the sample mean, which is observable, behaves in relation to the population mean, which is not.


Intuition about the sample mean

Much of what we will learn in a formal way appeals to intuition.

• A sample mean will almost never be equal to the population mean.

• A sample mean based on a large sample is expected to be closer to the population mean, compared to a sample mean based on a small sample.


Example: Presidential polling

The objective is to estimate the percent of eligible voters who will vote for Barack Obama. The true proportion who will vote for Obama is unknown; denote it by $\pi$.

In the poll, the response of each individual is a Bernoulli random variable:

$$X_i = \begin{cases} 1 & \text{vote for Obama} \\ 0 & \text{do not vote for Obama} \end{cases}$$

We can therefore write both the mean and variance of $X_i$:

$$E(X_i) = \pi, \qquad \mathrm{var}(X_i) = \pi(1 - \pi)$$

Three polls are taken, each using a different sample size:

• 10 randomly chosen subjects

• 100 subjects

• 1000 subjects
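To get a feel for how the three polls might behave, here is a small simulation sketch. The true proportion is unknown in practice, so the value `assumed_pi = 0.52` below is purely hypothetical, as are the function name and the seed.

```python
import random

def simulate_poll(pi, n, rng):
    """Return the sample proportion from n independent Bernoulli(pi) responses."""
    votes = [1 if rng.random() < pi else 0 for _ in range(n)]
    return sum(votes) / n

rng = random.Random(2510)   # fixed seed so the run is repeatable
assumed_pi = 0.52           # hypothetical value; the true proportion is unknown

# Repeat each poll five times to see how much the estimate moves around.
for n in (10, 100, 1000):
    estimates = [simulate_poll(assumed_pi, n, rng) for _ in range(5)]
    print(n, estimates)
```

Repeated polls of 10 subjects typically scatter widely, while repeated polls of 1000 subjects stay much closer to the assumed proportion.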


Example: Diastolic blood pressure

Suppose we want to estimate mean DBP among those who are at risk for hypertension. The unknown mean is $\mu$. The variance of DBP is also unknown; call it $\sigma^2$.

Random variable for the individual outcomes: $X_i$ = diastolic BP

• Mean of each outcome: $E(X_i) = \mu$
• Variance of each outcome: $\mathrm{var}(X_i) = \sigma^2$


Consider samples of size

• 10 subjects

• 100 subjects

• 1000 subjects


Mean and variance of sample mean

The sample mean is a random variable:

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \sum_{i=1}^{n} \frac{X_i}{n}$$

What are its mean and variance?
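One way to work this out, assuming the observations $X_1, \ldots, X_n$ are independent and share a common mean $\mu$ and variance $\sigma^2$ (the independence assumption is not stated on the slide and is added here):

```latex
\begin{align*}
E(\bar{X}) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
            = \frac{1}{n}\, n\mu = \mu, \\
\mathrm{var}(\bar{X}) &= \mathrm{var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{var}(X_i)
            = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}.
\end{align*}
```

The first line uses only linearity of expectation. The second line is where independence matters: without it, covariance terms would appear in the variance of the sum.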


Example 1: Presidential poll. What are the mean and variance of sample means?
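A possible answer, assuming the $n$ poll responses are independent Bernoulli variables with the same $\pi$, so that the sample mean $\bar{X}$ is the sample proportion:

```latex
\begin{align*}
E(\bar{X}) &= \pi, \\
\mathrm{var}(\bar{X}) &= \frac{\pi(1 - \pi)}{n}.
\end{align*}
```

Since $\pi(1 - \pi) \le 1/4$ for every $\pi$ between 0 and 1, the variance of the sample proportion is at most $1/(4n)$ whatever the true value of $\pi$.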


Example 2: Blood pressure

Population distribution: Suppose DBP has a normal distribution with $\mu = 80$, $\sigma = 10$.

What are the mean and variance of sample means?


Distribution of the sample mean

Case 1: Population distribution is normal

For an individual in the population, $X_i \sim N(\mu, \sigma^2)$ (Example: DBP has $\mu = 80$, $\sigma = 10$)

Then, for a sample of size n, the sample mean also has a normal distribution:

$$\bar{X}_n \sim N(\mu, \sigma^2/n)$$


Example: Population distribution of DBP is normal, with mean = 80, SD = 10.

Sample 25 individuals, compute $\bar{X}$.

$E(\bar{X}) =$

$\mathrm{var}(\bar{X}) =$

Moreover, $\bar{X} \sim$
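A worked version of this example, applying the normal-population result $\bar{X} \sim N(\mu, \sigma^2/n)$ with $\mu = 80$, $\sigma = 10$ and $n = 25$, and assuming the 25 individuals are sampled independently:

```latex
\begin{align*}
E(\bar{X}) &= \mu = 80, \\
\mathrm{var}(\bar{X}) &= \frac{\sigma^2}{n} = \frac{10^2}{25} = 4, \\
\bar{X} &\sim N(80,\, 4), \quad \text{so the SD of } \bar{X} \text{ is } \sqrt{4} = 2.
\end{align*}
```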


Continuing with the DBP example, what are the distribution and variance of the following sample means?

1. Sample mean of n = 64 people

2. Sample mean of n = 100 people

3. Sample mean of n = 10000 people
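These three cases can be checked with a few lines of code. The sketch below assumes independent observations from the DBP population ($\mu = 80$, $\sigma = 10$); the function name is illustrative.

```python
import math

mu, sigma = 80.0, 10.0   # population mean and SD of DBP from the example

def sample_mean_distribution(mu, sigma, n):
    """Return (mean, variance, standard deviation) of the sample mean for size n."""
    variance = sigma ** 2 / n
    return mu, variance, math.sqrt(variance)

results = {n: sample_mean_distribution(mu, sigma, n) for n in (64, 100, 10000)}
for n, (m, v, sd) in results.items():
    # Each sample mean is normal with this mean and variance.
    print(f"n={n}: N({m}, {v}), SD of sample mean = {sd}")
```

In each case the sample mean is normally distributed around 80, and only the variance changes: 1.5625 for n = 64, 1 for n = 100, and 0.01 for n = 10000.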


Distribution of the sample mean

Case 2: Population distribution is not normal

Examples: Poisson, Binomial

Then, for large samples, the sample mean approximately has a normal distribution with mean equal to $E(X_i)$ and variance equal to $\mathrm{var}(X_i)/n$, where $X_i$ is a single observation from the population.


Example 1. Population distribution is Bernoulli with p = 0.2. What is the distribution of the following sample means?

1. Sample mean with n = 100
2. Sample mean with n = 64
3. What if p = 0.5 instead?
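A short sketch of the calculation, using the large-sample normal approximation with mean $p$ and variance $p(1 - p)/n$ and assuming independent observations (the function name is hypothetical):

```python
def bernoulli_sample_mean(p, n):
    """Approximate normal parameters (mean, variance) of the sample mean."""
    return p, p * (1 - p) / n

# Both values of p from the example, at both sample sizes.
approx = {(p, n): bernoulli_sample_mean(p, n) for p in (0.2, 0.5) for n in (100, 64)}
for (p, n), (mean, variance) in approx.items():
    print(f"p={p}, n={n}: approximately N({mean}, {variance:.6f})")
```

Moving from p = 0.2 to p = 0.5 leaves the form of the answer unchanged but raises the variance, because $p(1 - p)$ is largest at p = 0.5.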


Example 2. Population distribution is Poisson with λ = 1. What is the distribution of the following sample means?

1. Sample mean of 25 observations
2. Sample mean of 100 observations
3. What is the probability that a sample mean from n = 25 will exceed 1.2? What is the probability it will exceed 1.6?
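For a Poisson population the mean and variance both equal λ, so the approximating normal distribution is N(1, 1/25) for n = 25 and N(1, 1/100) for n = 100. The sketch below uses that approximation for part 3; the helper name is illustrative, and with n as small as 25 the results are only approximate.

```python
import math

def normal_upper_tail(x, mean, sd):
    """P(X > x) for X ~ N(mean, sd^2), via the complementary error function."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

lam, n = 1.0, 25
se = math.sqrt(lam / n)   # Poisson variance equals lambda, so SE = sqrt(lambda / n)

p_12 = normal_upper_tail(1.2, lam, se)   # 1.2 is one SE above the mean
p_16 = normal_upper_tail(1.6, lam, se)   # 1.6 is three SEs above the mean
print(round(p_12, 4), round(p_16, 4))
```

Under this approximation the chance of exceeding 1.2 is roughly 0.16, while the chance of exceeding 1.6 is roughly 0.001.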


Standard error as a measure of precision

The SD of a sample mean is called its standard error (SE). SE is a measure of how precisely the sample mean estimates the population mean.
• Depends on n
• Depends on underlying variation in the population being sampled

For a sample drawn from a population having variance $\sigma^2$, the standard error of $\bar{X}$ is $\sigma/\sqrt{n}$
• decreases with n
• increases with σ
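Because the sample size enters through a square root, precision improves slowly. As a brief illustration using the formula above, with the DBP value $\sigma = 10$ as the numerical case:

```latex
\begin{align*}
\mathrm{SE}_{4n} &= \frac{\sigma}{\sqrt{4n}} = \frac{1}{2}\cdot\frac{\sigma}{\sqrt{n}} = \frac{1}{2}\,\mathrm{SE}_{n}, \\
\sigma = 10:\quad \mathrm{SE}_{25} &= \frac{10}{\sqrt{25}} = 2, \qquad
\mathrm{SE}_{100} = \frac{10}{\sqrt{100}} = 1.
\end{align*}
```

Halving the standard error therefore requires four times as many observations.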


Central Limit Theorem

Characterizes the distribution of $\bar{X}$ in large samples.

When n is large, $\bar{X}$ is approximately distributed as $N(\mu, \sigma^2/n)$, where

$\mu$ = population mean
$\sigma^2$ = population variance

This holds for any underlying distribution of values with finite variance. What is the implication?
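A simulation can make the theorem concrete. The sketch below draws sample means from an exponential population with mean 1 and variance 1, chosen here only because it is clearly skewed; the population, sample size, number of repetitions, and seed are all assumptions for illustration.

```python
import random
from statistics import mean, pvariance

def simulate_sample_means(n, reps, rng):
    """Draw `reps` sample means, each from n exponential(1) observations."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

rng = random.Random(2510)   # fixed seed so the run is repeatable
n, reps = 50, 2000
means = simulate_sample_means(n, reps, rng)

se = (1.0 / n) ** 0.5       # sigma / sqrt(n) with sigma = 1
# Share of simulated sample means within 2 SE of the population mean.
within_2se = sum(abs(m - 1.0) <= 2 * se for m in means) / reps

print(round(mean(means), 3))        # should be near the population mean, 1
print(round(pvariance(means), 4))   # should be near sigma^2 / n = 0.02
print(within_2se)                   # should be near 0.95 if the normal approximation is good
```

Even though individual exponential observations are far from normal, the simulated sample means behave much like draws from N(1, 1/50).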


Summary so far

• The sample mean is a random variable
• The expected value of the sample mean is the same as the expected value for the population
• The variance of the sample mean is $\sigma^2/n$
• The standard error of the sample mean is $\sigma/\sqrt{n}$
• When n is large, the sample mean approximately follows a normal distribution
• If the underlying distribution is normal, the sample mean is normally distributed, regardless of sample size
