Program. Statistical inference Statistical models, estimation and confidence intervals. The sample mean. Distribution of a sample mean


Author: Amanda Long

Faculty of Life Sciences

Statistical inference
Statistical models, estimation and confidence intervals
Ib Skovgaard & Claus Ekstrøm
E-mail: [email protected]

Program

• Distribution of a sample mean
• Statistical inference for a single sample
  • statistical model
  • estimation and precision of estimates
  • the t-distribution
  • confidence intervals
• Statistical inference for linear regression

Slide 2 — Statistics for Life Science (Week 3-2 2010) — Statistical inference

The sample mean

Weights of crabs:
• We have: a sample of n = 162 weights, $y_1, \ldots, y_{162}$.
• Wanted: the mean weight in the population, $\mu$.
• Sample statistics: $\bar{y} = 12.76$ and $s = 2.25$.
• Estimate of $\mu$: $\hat{\mu} = \bar{y} = 12.76$.
• But how precise is it? How large can we expect $\hat{\mu} - \mu$ to be?

To answer this we make a confidence interval for $\mu$. This requires a statistical model.

Distribution of a sample mean

[Figure: histograms of the sample mean of n independent N(0, 1) variables; panels n = 10 and n = 25; horizontal axis y, vertical axis density.]

Mean? Standard deviation? Distribution?

Distribution of a sample mean

In practice we only observe one sample mean, so how can we find its distribution?
• Answer: Mathematical computation!
• Because a mean of n independent $N(\mu, \sigma^2)$ variables is normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$
• . . . and $\sigma$ can be estimated from the sample.

Statistical model

[Figure: histogram and normal density of the weights (horizontal axis weight, vertical axis density), and a QQ-plot (theoretical quantiles against sample quantiles).]

Statistical model: $y_1, \ldots, y_{162}$ are independent and $y_i \sim N(\mu, \sigma^2)$.

In words: the observations are normally distributed, have the same mean and the same standard deviation, and are independent.
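The statement that a mean of n independent normal variables is normal with standard deviation $\sigma/\sqrt{n}$ can be checked by simulation. The following is a minimal sketch in R for the N(0, 1) case with n = 10 and n = 25; the seed, the number of replicates and the helper name `sim_means` are illustrative assumptions and not part of the slides.

```r
# Simulate the sampling distribution of the mean of n independent N(0, 1) draws.
set.seed(1)        # arbitrary seed, only for reproducibility (assumption)
n_rep <- 10000     # number of simulated samples (assumption)

# Hypothetical helper: each replicate draws n standard normal values
# and keeps only their mean.
sim_means <- function(n, n_rep) {
  replicate(n_rep, mean(rnorm(n)))
}

means10 <- sim_means(10, n_rep)
means25 <- sim_means(25, n_rep)

# Compare the simulated standard deviation with the theoretical sigma / sqrt(n),
# where sigma = 1 for N(0, 1).
result <- data.frame(
  n         = c(10, 25),
  sim_mean  = c(mean(means10), mean(means25)),
  sim_sd    = c(sd(means10), sd(means25)),
  theory_sd = 1 / sqrt(c(10, 25))
)
print(result)
```

Calling `hist(means10, freq = FALSE)` and `hist(means25, freq = FALSE)` afterwards should give pictures similar to the histograms on the slide: both centred at 0, with the n = 25 histogram visibly narrower.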

Estimation

Statistical model: $y_1, \ldots, y_{162} \sim N(\mu, \sigma^2)$, independent.

Parameters in the model:
• mean $\mu$ in the population
• standard deviation $\sigma$ in the population

Estimation: the population parameters are estimated by the sample statistics:
• $\hat{\mu} = \bar{y}$
• $\hat{\sigma} = s$

Precision of $\hat{\mu}$

The estimate $\hat{\mu}$ tells nothing about the precision. But we know that
• $\mathrm{sd}(\bar{y}) = \sigma/\sqrt{n}$
• $\bar{y}$ is within $\mu \pm 1.96 \cdot \sigma/\sqrt{n}$ with 95% probability.

But we don't know $\sigma$, just the estimate $s$.
• Standard error of $\bar{y}$ (the estimated standard deviation): $\mathrm{SE}(\bar{y}) = s/\sqrt{n}$
• $\bar{y}$ is within $\mu \pm\ ??? \cdot s/\sqrt{n}$ with probability 95%.
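Why the factor 1.96 cannot simply be reused once $\sigma$ is replaced by $s$ can be seen in a small simulation. The sketch below assumes a deliberately small sample size (n = 10) and a N(0, 1) population; these values, the seed and the variable names are illustrative choices, not taken from the slides.

```r
# How often is ybar within mu +/- 1.96 * (sd / sqrt(n)) when the standard
# deviation is known (sigma) and when it is estimated from the sample (s)?
set.seed(2)        # arbitrary seed (assumption)
n     <- 10        # small sample size chosen for illustration (assumption)
n_rep <- 20000     # number of simulated samples (assumption)
mu    <- 0         # assumed population mean
sigma <- 1         # assumed population standard deviation

within_known <- logical(n_rep)
within_est   <- logical(n_rep)
for (r in seq_len(n_rep)) {
  y    <- rnorm(n, mean = mu, sd = sigma)
  ybar <- mean(y)
  within_known[r] <- abs(ybar - mu) < 1.96 * sigma / sqrt(n)  # sigma known
  within_est[r]   <- abs(ybar - mu) < 1.96 * sd(y) / sqrt(n)  # s replaces sigma
}

prop_known <- mean(within_known)  # expected to be close to 0.95
prop_est   <- mean(within_est)    # expected to fall below 0.95 for small n
print(c(known_sigma = prop_known, estimated_s = prop_est))
```

With $\sigma$ known the proportion should be close to 95%, while with $s$ it should come out lower, so the unknown factor marked ??? has to be somewhat larger than 1.96.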

The t-distribution

Standardization:
$$z = \frac{\sqrt{n}\,(\bar{y} - \mu)}{\sigma} \sim N(0, 1).$$
When the estimate $s$ of $\sigma$ is inserted, the distribution changes from a normal distribution to a t-distribution:
$$T = \frac{\sqrt{n}\,(\bar{y} - \mu)}{s} \sim t_{n-1},$$
the t-distribution with $n - 1$ degrees of freedom.

[Figure: densities of the t-distribution with df = 1 and df = 4 and of N(0, 1); horizontal axis T, vertical axis density.]

• Thicker tails than N(0, 1)
• Resembles N(0, 1) more and more as df increases.

Confidence interval for $\mu$

If $t_{0.975,n-1}$ is the 97.5% quantile of the $t_{n-1}$-distribution, then
$$P\left(-t_{0.975,n-1} < \frac{\sqrt{n}\,(\bar{y} - \mu)}{s} < t_{0.975,n-1}\right) = 0.95.$$
These two inequalities can be rearranged to give two inequalities for $\mu$:
$$P\left(\bar{y} - t_{0.975,n-1} \cdot \frac{s}{\sqrt{n}} < \mu < \bar{y} + t_{0.975,n-1} \cdot \frac{s}{\sqrt{n}}\right) = 0.95.$$
This interval contains the population mean, $\mu$, with probability 95%. The interval is called a 95% confidence interval for $\mu$.
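The quantile $t_{0.975,n-1}$ used in the confidence interval can be compared with the corresponding normal quantile to see both the thicker tails and the convergence towards N(0, 1). This is a short sketch using the base R functions `qt` and `qnorm`; the particular degrees of freedom are chosen for illustration (1 and 4 appear in the figure, 161 corresponds to the crab data).

```r
# 97.5% quantiles of the t-distribution for increasing degrees of freedom,
# compared with the 97.5% quantile of N(0, 1).
df  <- c(1, 4, 10, 30, 161)   # illustrative degrees of freedom (assumption)
t_q <- qt(0.975, df)          # t quantiles, one per df
z_q <- qnorm(0.975)           # normal quantile (about 1.96)

print(data.frame(df = df, t_quantile = round(t_q, 3)))
print(round(z_q, 3))
```

The t quantiles are all larger than the normal quantile, which reflects the thicker tails, and they decrease towards it as df grows.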

Confidence intervals: weights of crabs

Recall: n = 162, $\bar{y} = 12.76$ and $s = 2.25$.

Quantiles:
> qt(0.975, 161)
[1] 1.974808
> qt(0.95, 161)
[1] 1.654373

Compute
• Standard error, $\mathrm{SE}(\hat{\mu})$?
• 95% confidence interval?
• 90% confidence interval?

Confidence intervals: interpretation

95% confidence interval for $\mu$:
$$\bar{y} \pm t_{0.975,n-1} \cdot \frac{s}{\sqrt{n}} = \hat{\mu} \pm t_{0.975,n-1} \cdot \mathrm{SE}(\hat{\mu})$$
Interpretation: with probability 95%, the interval contains the population mean, $\mu$.

What happens when the sample size, n, increases? Does the 95% confidence interval become wider or narrower?
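For the crab weights, the standard error and the two confidence intervals follow directly from the formula $\hat{\mu} \pm t \cdot \mathrm{SE}(\hat{\mu})$. The sketch below is one way to carry out the computation in R, taking $\bar{y} = 12.76$ as on the first crab slide; the variable names are illustrative.

```r
# Summary statistics for the crab weights, as quoted on the slides.
n    <- 162
ybar <- 12.76   # sample mean (value from the first crab slide)
s    <- 2.25    # sample standard deviation

# Standard error of the mean: s / sqrt(n).
se <- s / sqrt(n)

# Confidence intervals: estimate +/- t-quantile * SE, with n - 1 = 161 df.
ci95 <- ybar + c(-1, 1) * qt(0.975, df = n - 1) * se
ci90 <- ybar + c(-1, 1) * qt(0.95,  df = n - 1) * se

print(round(se, 3))
print(round(ci95, 2))
print(round(ci90, 2))
```

Under these inputs the standard error is about 0.177, the 95% interval is roughly (12.41, 13.11) and the 90% interval roughly (12.47, 13.05). The 90% interval is narrower because it uses the smaller quantile 1.654 instead of 1.975.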

Confidence intervals: interpretation

If we repeated the experiment, then in the long run 95% of the confidence intervals would contain the population mean.

[Figure: confidence intervals for 50 data sets from N(0, 1); panels "95%, n=10", "75%, n=10" and "95%, n=40"; horizontal axis $\mu$.]

The central limit theorem

The main reason that the normal distribution is so important.

The central limit theorem: Assume that $Y_1, \ldots, Y_n$ are independent random variables with the same distribution, with mean $\mu$ and standard deviation $\sigma$. Then their mean
$$\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$$
has a distribution which approaches the normal distribution $N(\mu, \sigma^2/n)$ as n increases. More precisely,
$$P\left(\frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \le z\right) \to \Phi(z) \quad \text{as } n \to \infty.$$
Hence, the confidence interval for the mean may be OK, even if the population is not normal.

Summary: a single sample

• Statistical model: $y_1, \ldots, y_{162}$ independent and $y_i \sim N(\mu, \sigma^2)$
• Parameters, $\mu$ and $\sigma$: mean and standard deviation in the population.
• Estimates: $\hat{\mu} = \bar{y}$ and $\hat{\sigma} = s$
• Distribution of the estimate: $\hat{\mu}$ is normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$
• Standard error is an estimate of the standard deviation of an estimate: $\mathrm{SE}(\hat{\mu}) = s/\sqrt{n}$
• 95% confidence interval: $\bar{y} \pm t_{0.975,n-1} \cdot \frac{s}{\sqrt{n}} = \hat{\mu} \pm t_{0.975,n-1} \cdot \mathrm{SE}(\hat{\mu})$
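Both the long-run interpretation of a 95% confidence interval and the remark that the interval may be acceptable for non-normal populations can be explored by simulation. The following sketch estimates the coverage of the t-based interval for normal data with n = 10 and, as an assumed example of a non-normal population, for exponential data with n = 40. The helper `coverage`, the seed and the choice of the exponential distribution are illustrative assumptions.

```r
# Estimate how often the t-based confidence interval contains the true mean.
set.seed(3)   # arbitrary seed (assumption)

# Hypothetical helper: rdist(n) must return a sample of size n from a
# population whose mean is true_mean.
coverage <- function(rdist, true_mean, n, n_rep = 5000, level = 0.95) {
  t_q <- qt(1 - (1 - level) / 2, df = n - 1)
  hits <- replicate(n_rep, {
    y  <- rdist(n)
    se <- sd(y) / sqrt(n)
    abs(mean(y) - true_mean) < t_q * se   # TRUE if the interval covers the mean
  })
  mean(hits)
}

# Normal population: the model assumptions hold exactly.
cov_normal <- coverage(function(n) rnorm(n), true_mean = 0, n = 10)

# Skewed (exponential) population with mean 1: only the central limit theorem
# justifies the interval.
cov_expo <- coverage(function(n) rexp(n, rate = 1), true_mean = 1, n = 40)

print(c(normal_n10 = cov_normal, exponential_n40 = cov_expo))
```

For the normal population the estimated coverage should be close to 95%. For the skewed population it is typically somewhat lower but of similar size, which illustrates why the interval "may be OK" rather than exact.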

Statistical model and parameters

Statistical model: the deviations from the straight line are normally distributed and independent:
$$y_i = \alpha + \beta \cdot x_i + e_i, \qquad e_1, \ldots, e_n \sim N(0, \sigma^2) \text{ independent}$$
In words: the mean of $y_i$ is $\alpha + \beta \cdot x_i$, and the remainders (or residuals) are normal and independent with the same standard deviation.

Parameters (population constants):
• Intercept $\alpha$ and slope $\beta$
• Standard deviation $\sigma$ for the deviations from the line

Estimates and distribution of the estimates

Estimates $\hat{\beta}$ and $\hat{\alpha}$ shown earlier (Chapter 2).

Distributions:
$$\hat{\beta} \sim N\left(\beta, \frac{\sigma^2}{SS_x}\right), \qquad \hat{\alpha} \sim N\left(\alpha, \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{SS_x}\right)\right),$$
where
$$SS_x = \sum_{i=1}^{n} (x_i - \bar{x})^2.$$
The statistical experiment is an instrument that "measures" the values $\alpha$ and $\beta$ with a precision given by the standard errors.

Standard errors and confidence intervals

Estimate of the residual standard deviation:
$$s = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} (y_i - \hat{\alpha} - \hat{\beta} \cdot x_i)^2} = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} r_i^2}$$
$\hat{\beta}$ and $\hat{\alpha}$ are normally distributed:
$$\hat{\beta} \sim N\left(\beta, \frac{\sigma^2}{SS_x}\right), \qquad \hat{\alpha} \sim N\left(\alpha, \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{SS_x}\right)\right)$$
Standard errors (estimates of standard deviations):
$$\mathrm{SE}(\hat{\beta}) = \frac{s}{\sqrt{SS_x}}, \qquad \mathrm{SE}(\hat{\alpha}) = s \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{SS_x}}$$
95% confidence intervals:
$$\hat{\beta} \pm t_{0.975,n-2} \cdot \mathrm{SE}(\hat{\beta}), \qquad \hat{\alpha} \pm t_{0.975,n-2} \cdot \mathrm{SE}(\hat{\alpha})$$
Note: the t-distribution with n − 2 degrees of freedom is used.

Stearic acid example

> model1 = lm(digest ~ st.acid)
> summary(model1)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 96.53336    1.67518   57.63 1.24e-10 ***
st.acid     -0.93374    0.09262  -10.08 2.03e-05 ***

Residual standard error: 2.97 on 7 degrees of freedom

• Statistical model? Interpretation of models?
• Estimates? Confidence intervals?

Reflection: What is a statistical model?

• A statistical model describes the probability distribution of the population from which our sample is drawn.
• But how can we know that?
• We can't, but a model is just a rough picture displaying the important features.
• Some of these features are not known. This is why we measure a sample.
• Therefore a statistical model is not complete; some aspects have to be estimated from the sample.
• These aspects may be given as a number of parameters such as mean and standard deviation.
• The remaining part of the model is assumed and should be validated as well as possible.

Without a model we have no basis for probability calculations.
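Returning to the regression slides, the confidence intervals for the stearic acid example can be computed from the printed estimates and standard errors alone, and the standard error formulas can be checked against `lm`. The sketch below does both. The first part uses only numbers copied from the `summary(model1)` output; the second part uses simulated data with made-up parameter values, because the original `digest` and `st.acid` observations are not reproduced in the slides.

```r
# Part 1: 95% confidence intervals from the printed summary output.
est <- c(intercept = 96.53336, slope = -0.93374)  # estimates from the output
se  <- c(intercept = 1.67518,  slope = 0.09262)   # standard errors from the output
df  <- 7                                          # residual df, i.e. n - 2
t_q <- qt(0.975, df)
ci  <- cbind(lower = est - t_q * se, upper = est + t_q * se)
print(round(ci, 2))

# Part 2: check the standard error formulas on simulated data.
# All values below are hypothetical and chosen only for illustration.
set.seed(4)
n <- 9
x <- runif(n, 0, 30)
y <- 95 - 1 * x + rnorm(n, sd = 3)
fit <- lm(y ~ x)

s   <- sqrt(sum(resid(fit)^2) / (n - 2))   # residual standard deviation
ssx <- sum((x - mean(x))^2)                # SS_x
se_slope     <- s / sqrt(ssx)
se_intercept <- s * sqrt(1 / n + mean(x)^2 / ssx)

# Standard errors as reported by lm, for comparison.
se_lm <- summary(fit)$coefficients[, "Std. Error"]
print(rbind(by_hand = c(se_intercept, se_slope), from_lm = se_lm))
```

From the printed output, the interval for the slope is roughly (−1.15, −0.71) and for the intercept roughly (92.57, 100.49). In the second part the hand-computed standard errors should agree with those reported by `summary`.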

A typical statistical model

Many statistical models consist of two parts:

observation = fixed part + random part
            = predictable part + unpredictable part

Predictable means that it depends on factors we know (type of antibiotics, amount of stearic acid, age, treatment, etc.). The random part is defined by the equation above as the remainder (or residual):

random part = observation − fixed part

The random part is often assumed to be normally distributed.

Main points from this lecture

• Statistical model and parameters
• Estimates, distribution of estimates, standard error
• Confidence intervals: estimate ± t-quantile · SE(estimate), and interpretation