BIOM5010: Statistics #2G. Confidence Intervals Statistical Testing Statistical Power

Author: Neal Waters

BIOM5010: Statistics Slides 2G.1

BIOM5010: Statistics #2G Confidence Intervals Statistical Testing Statistical Power

(C) 2011 – 2014. Andy Adler.

Confidence Intervals

Slides 2G.2

• Text: Estimation IX(E)

[Figure: a population with its population mean, and repeated samples drawn from it, each with its own sample mean and confidence interval]

• Typical confidence intervals are 95% or 99%

• If repeated samples were taken and the 95% confidence interval computed for each sample, 95% of the intervals would contain the population mean
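The repeated-sampling interpretation can be checked with a small simulation. The sketch below is illustrative only: the function name, the population values (mean 10, std 5), the sample size and the seed are all assumptions, and for simplicity it uses the known-std normal interval rather than the t interval.

```python
import random
from statistics import NormalDist, mean


def coverage(pop_mean=10.0, pop_sd=5.0, n=25, trials=2000, level=0.95, seed=1):
    """Fraction of simulated confidence intervals that contain the population mean."""
    rng = random.Random(seed)
    # Two-sided normal critical value (about 1.96 for a 95% interval).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * pop_sd / n ** 0.5  # known-std case (assumption)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        m = mean(sample)
        if m - half_width <= pop_mean <= m + half_width:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"fraction of intervals containing the population mean: {coverage():.3f}")
```

The printed fraction should land close to 0.95, and it moves toward the chosen level as `trials` grows.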

What's wrong here?

Slides 2G.3

[Figure: comic. Source: phdcomics.com]

Confidence Interval Calculation

Slides 2G.4

• t distribution:
  – you estimate mean and std from the data (the usual case)

• Normal distribution:
  – you know std; estimate mean from data (unusual case)

• Using t distribution
  – Calc: M (sample mean)
  – Calc: s (sample std)
  – DF: Degrees of Freedom, DF = N – 1
  – SE: Standard error, sM = s/√N
  – Find tCL in t distribution

[Figure: Student's t distribution (wikipedia)]
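The quantities in the list above can be computed with Python's standard library. This is a minimal sketch: the function name and the data values are made up for illustration and do not come from the slides.

```python
import math
from statistics import mean, stdev


def summary_stats(data):
    """Return (M, s, DF, sM) for a list of measurements."""
    n = len(data)
    m = mean(data)          # M: sample mean
    s = stdev(data)         # s: sample std (divides by N - 1)
    df = n - 1              # DF: degrees of freedom
    s_m = s / math.sqrt(n)  # sM: standard error of the mean
    return m, s, df, s_m


if __name__ == "__main__":
    example = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
    m, s, df, s_m = summary_stats(example)
    print(f"M={m}, s={s:.3f}, DF={df}, sM={s_m:.3f}")
```

Note that `statistics.stdev` is the sample standard deviation (N – 1 in the denominator), which is the one the t-based interval needs.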

Confidence Interval Calculation

Slides 2G.5

– Prev: Calculate: M, s, DF, sM
– Find tCL in t distribution
– Low Lim = M – tCL×sM
– Upp Lim = M + tCL×sM

DF     95%     99%
2      4.303   9.925
3      3.182   5.841
100    1.984   2.626

• Example (95% conf)
  – N=100, M=10, s=5
  – DF = 99
  – sM = 5/sqrt(100) = 0.5
  – tCL = 1.984
  – CI: Low lim = 10 – tCL×0.5
  – CI: Upp lim = 10 + tCL×0.5

• CI = [9.008 ... 10.992]

[Figure: Student's t distribution with the central 95% of area marked (wikipedia)]
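The worked example can be reproduced in a few lines of Python. This is a sketch under one assumption: the critical value tCL = 1.984 is taken from the table as a constant rather than computed, and the function name is illustrative.

```python
import math


def confidence_interval(m, s, n, t_cl):
    """Return (low, high) limits: M -/+ tCL * sM, with sM = s / sqrt(N)."""
    s_m = s / math.sqrt(n)
    return m - t_cl * s_m, m + t_cl * s_m


if __name__ == "__main__":
    # Example values from the slide: N=100, M=10, s=5, tCL=1.984
    low, high = confidence_interval(10, 5, 100, 1.984)
    print(f"CI = [{low:.3f} ... {high:.3f}]")
```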

t Tests

Slides 2G.6

• Example: students asked to rate (on 1 ... 7) whether “you think animal research is wrong?”
• Question: Is there a difference between population MMale and MFemale in answer?

Group     n     M      s
Female    17    5.35   1.67
Male      17    3.88   1.73

• Assumptions of the t-test
  – The two populations have the same variance
  – The populations are normally distributed.
  – Each value is sampled independently of every other value
    • If one subject provides two scores, then the scores are not independent.

t Tests

Slides 2G.7

• H0: There is no difference between the groups
• t = (ΔM – MH0)/sM
  – ΔM = difference of means = MF – MM = 1.470
  – MH0 = hypothesized value = 0
  – sM = estimated standard error
• $s_M = \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$ = sqrt(1.67²/17 + 1.73²/17) = .581
• t = 1.47/.581 = 2.53
• DF = (n1 – 1) + (n2 – 1) = 32
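The t statistic for the animal-research example can be sketched in Python as follows. The function name and argument order are illustrative assumptions. With the rounded standard deviations from the table (1.67 and 1.73) the code gives sM ≈ 0.583 and t ≈ 2.52, slightly different from the slide's .581 and 2.53; presumably the slide values were computed from unrounded data.

```python
import math


def two_sample_t(m1, s1, n1, m2, s2, n2, h0_diff=0.0):
    """Return (t, sM, DF) for the difference between two group means."""
    s_m = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # estimated standard error
    t = ((m1 - m2) - h0_diff) / s_m               # t = (dM - MH0) / sM
    df = (n1 - 1) + (n2 - 1)
    return t, s_m, df


if __name__ == "__main__":
    # Female: M=5.35, s=1.67, n=17; Male: M=3.88, s=1.73, n=17
    t, s_m, df = two_sample_t(5.35, 1.67, 17, 3.88, 1.73, 17)
    print(f"sM={s_m:.3f}, t={t:.2f}, DF={df}")
```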

One– vs two– tailed

Slides 2G.8

What was the research question?

Two tailed:
– Q: Is there a difference between population means male/female in answer?
– H0: There is no difference.

One tailed:
– Q: Is the female pop. mean greater than the male pop. mean?
– H0: The female pop. mean is not greater.

Implementation

Slides 2G.9

• Tables can be annoying
• Matlab charges $$$ for stats toolbox

    function p = two_sided(T, DF)
      % two-tailed p-value for t statistic T with DF degrees of freedom
      xi = 1 / ( 1 + T^2 / DF );
      p = 1.0*betainc(xi, DF/2, 0.5);

    function p = one_sided(T, DF)
      % one-tailed p-value: half of the two-tailed value
      xi = 1 / ( 1 + T^2 / DF );
      p = 0.5*betainc(xi, DF/2, 0.5);

DF     95%     99%
2      4.303   9.925
3      3.182   5.841
100    1.984   2.626

    >> % Usage
    >> two_sided(3.182,3)
    ans = 0.0500    (=> 95%)
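For readers without Matlab or Octave, the same p-values can be approximated with Python's standard library. Note that this sketch swaps in a different technique: rather than the incomplete beta function used above, it integrates the Student's t density numerically with Simpson's rule. The function names mirror the slide's, while `t_pdf`, `_central_area` and the `steps` parameter are illustrative assumptions.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def _central_area(t, df, steps=2000):
    """Area under the density between 0 and |t| (Simpson's rule, even steps)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return total * h / 3


def two_sided(t, df):
    """Two-tailed p-value: area outside [-|t|, |t|]."""
    return 1.0 - 2.0 * _central_area(t, df)


def one_sided(t, df):
    """One-tailed p-value: half of the two-tailed value."""
    return 0.5 * two_sided(t, df)


if __name__ == "__main__":
    print(f"{two_sided(3.182, 3):.4f}")
```

The usage line corresponds to the slide's `two_sided(3.182,3)` call and should agree with the table row for DF = 3 at 95%.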

Example

Slides 2G.10

• Two tailed question:
  – Is there a difference between population means male/female in answer?
  – H0: There is no difference

    >> % two sided
    >> two_sided(2.53,32)
    ans = 0.0165    (p-value)

• One tailed question:
  – Is population mean for female greater than for male?
  – H0: The population mean for female is not greater.

    >> % one sided
    >> one_sided(2.53,32)
    ans = 0.0082    (p-value)

Questions

Slides 2G.11

• What are the assumptions of the t test?
• What is the relationship between the confidence interval and the p-value? Can you estimate one from the other?
• Why does the 2-tailed test give different values from the 1-tailed test?
• If an instructor puts a table of t-test values on an exam, are you required to use it in the questions?

Statistical Power

Slides 2G.12

Scenario: You work for ABC pharma, doing stats.
• They have invented a new drug. They think it's great.
• They need evidence it's better than the current drug (XYZ pharma).
• Run a study
  – Randomize which drug patients get: ABC vs. XYZ.
  – Outcome measure M = measure of health

• If you choose too few subjects,
  – SE will be too large
  – ...

Statistical Power

Slides 2G.13

If you choose too few subjects,
– SE will be too large
– t = ΔM / SE will be too small
– p-value will be too large
– Study will be non-significant
– Drug will not get approved
– You will be fired

• Statistical Power
  – probability of correctly accepting the alternative hypothesis when it is true
  – ability of a test to detect an effect (if it exists)
  – Formally, the statistical power of a test is
    • p(test correctly rejects H0 when H0 is false)

Questions

Slides 2G.14

• How is Statistical power different from t-test result?
• You've made some measurements, what do you apply?
• You have a test plan, what do you apply?
• How about this idea?
  1. Get 10 patients
  2. Do test
  3. Is significant?
    • Yes => Collect bonus for inexpensive test plan
    • No => Add 10 more patients, go to step #1
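One way to explore the last idea is to simulate it when there is truly no effect and count how often the procedure declares significance anyway. The sketch below is illustrative and simplified: it uses a one-sample z test with known std = 1 instead of a two-group t test, and the function name, number of looks, batch size and seed are all assumptions.

```python
import random
from statistics import NormalDist, mean


def false_positive_rate(max_looks=5, batch=10, trials=2000, alpha=0.05, seed=2):
    """Simulate 'test, and add more patients if not significant' under a true H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    false_positives = 0
    for _ in range(trials):
        data = []
        for _ in range(max_looks):
            # H0 is true: every measurement has mean 0 and known std 1.
            data.extend(rng.gauss(0.0, 1.0) for _ in range(batch))
            z = mean(data) * len(data) ** 0.5
            if abs(z) > z_crit:
                false_positives += 1  # declared "significant" by chance
                break
    return false_positives / trials


if __name__ == "__main__":
    print("single test:    ", false_positive_rate(max_looks=1))
    print("up to 5 looks:  ", false_positive_rate(max_looks=5))
```

With a single look the rate stays near the nominal 0.05; allowing repeated looks pushes it noticeably higher, which is the problem with the proposed plan.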

Statistical Power

Slides 2G.15

• Statistical Power: the probability of correctly rejecting a false null hypothesis. power = 1 – β.
  – β = probability of failing to reject a false null hypothesis = False Negative Rate (Type II error rate)
  – β is not the p value. The p value (“prob data occur by chance given H0”) is compared with the significance level α, which is the False Positive Rate (FPR).

• It is important to consider power while designing an experiment.
• This will help avoid spending a lot of time and/or money on an experiment that has little chance of finding a significant effect.

Power example

Slides 2G.16

• You think your new drug is 10% better than the competition (i.e. they = 10 and you're at 11). Patient variability gives std of 5. How big a trial is needed to get power = 95%?
• β = 1 – 0.95 = .05
• Since we assume std, we can use normal distribution, or t distribution with large DF
• t = 1.645
• t = (ΔM – MH0)/sM
• sM = (11 – 10)/1.645 = 0.607
• $s_M = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{2\sigma^2}{n}} \;\rightarrow\; n = \frac{2\sigma^2}{s_M^2} = \frac{2 \times 5^2}{0.607^2} = 135.7$

One Tailed
DF     95%     99%
3      2.353   4.541
100    1.660   2.364
∞      1.645   2.326

Power example

Slides 2G.17

• You think your new drug is 10% better than the competition (i.e. they = 10 and you're at 11). Patient variability gives std of 5. How big a trial is needed to get power = 95%?
• sM = (11 – 10)/1.645 = 0.607
• $s_M = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{2\sigma^2}{n}} \;\rightarrow\; n = \frac{2\sigma^2}{s_M^2} = \frac{2 \times 5^2}{0.607^2} = 135.7$

• Choose 136 (realistically N=200)
  – This is N=136 in each group
  – Recalculate t test at this N
  – Or use online power calculators

One Tailed
DF     95%     99%
3      2.353   4.541
100    1.660   2.364
∞      1.645   2.326
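The arithmetic of this example can be checked with a short Python sketch (standard library only; all function names and the `alpha` parameter are illustrative assumptions, not part of the slides). `n_slide` reproduces the slide's calculation, which uses the single one-tailed quantile 1.645. `n_alpha_beta` uses the common normal-approximation formula n = 2σ²(z_{1–α} + z_{1–β})²/ΔM², which also fixes a one-tailed significance level α = 0.05; that extra assumption is not stated on the slide, and under it the required n per group is considerably larger. Under the same assumption, making the expected t equal to the 1.645 critical value corresponds to a power of roughly 50%.

```python
import math
from statistics import NormalDist

_norm = NormalDist()  # standard normal distribution


def n_slide(delta, sigma, power=0.95):
    """Per-group n as computed on the slide: sM = delta / z, n = 2*sigma^2 / sM^2."""
    z = _norm.inv_cdf(power)  # about 1.645 for 0.95
    s_m = delta / z
    return 2 * sigma ** 2 / s_m ** 2


def n_alpha_beta(delta, sigma, alpha=0.05, power=0.95):
    """Per-group n when both significance level and power are fixed (assumed one-tailed)."""
    z_alpha = _norm.inv_cdf(1 - alpha)
    z_beta = _norm.inv_cdf(power)
    return 2 * sigma ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2


def power_at(n, delta, sigma, alpha=0.05):
    """Approximate power of a one-tailed test with n subjects per group."""
    s_m = math.sqrt(2 * sigma ** 2 / n)
    return _norm.cdf(delta / s_m - _norm.inv_cdf(1 - alpha))


if __name__ == "__main__":
    for n in (math.ceil(n_slide(1, 5)), math.ceil(n_alpha_beta(1, 5))):
        print(f"n per group = {n}, approximate power = {power_at(n, 1, 5):.3f}")
```

The small difference between the unrounded result of `n_slide` and the slide's 135.7 comes from rounding sM to 0.607; both round up to 136.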

Questions

Slides 2G.18

• How is Statistical power different from t-test result?
• You've made some measurements, what do you apply?
• You have a test plan, what do you apply?
• How about this idea?
  1. Get 10 patients
  2. Do test
  3. Is significant?
    • Yes => Collect bonus for inexpensive test plan
    • No => Add 10 more patients, go to step #1

Question: Statistical Power

Slides 2G.19

Example:
– Imagine a researcher wants to claim that Green (G) people are smarter than Purple (P) people.

• He measures the head mass of a sample of volunteers from each group. Assume the true population statistics are μG=4.4kg, σG=0.6kg and μP=4.3kg, σP=0.6kg.
  – Can we use the t test here?
  – What is the confidence interval on the mean for each group for a sample size of N=100 and of N=10000?
  – What size of study (N) is required to achieve a power of 0.95?

Pairwise comparisons

Slides 2G.20

• Many experiments are designed to compare more than two conditions.
• Example: study Smiles and Leniency.
  – effect of different types of smiles (false, felt, miserable, neutral) on leniency shown to a person

• Obvious way: t test of difference between each group mean and each other group mean.
  – Test 1: false ≠ felt
  – Test 2: false ≠ miserable
  – Test 3: false ≠ neutral
  – ...
  – Number of tests: NT = (NM – 1)×NM/2
    • NM: number of means (groups)
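The list of pairwise tests and the count NT can be generated directly. This is a minimal Python sketch; the variable and function names are illustrative.

```python
from itertools import combinations

smiles = ["false", "felt", "miserable", "neutral"]

# Every unordered pair of groups corresponds to one pairwise t test.
pairs = list(combinations(smiles, 2))


def number_of_tests(nm):
    """NT = (NM - 1) * NM / 2 for NM group means."""
    return (nm - 1) * nm // 2


if __name__ == "__main__":
    for i, (a, b) in enumerate(pairs, start=1):
        print(f"Test {i}: {a} vs {b}")
    print("NT =", number_of_tests(len(smiles)))
```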

Multiple Comparisons

Slides 2G.21 – 2G.23

[Figure: comic on multiple comparisons. Source: xkcd.com/882/]

What to do?

Slides 2G.24

• Bonferroni Correction
  – onlinestatbook.com/chapter10/pairwise_correlated.html
  – Divide the significance level α by the number of comparisons
  – For 20 jelly beans: α/NT = .05/20 = .0025

• ANOVA
  – Analysis of Variance (ANOVA) compares means.
  – general rather than specific differences
    • H0: All means are equal

• Tukey HSD
  – Honestly Significant Difference test
  – Similar to t test for each pair of means
  – DF = Nobservations – NM
  – Use “studentized range calculator”

• Comment: understand the issue and the Bonferroni concept. Look up the other tests if you have to.
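The Bonferroni arithmetic, and the reason it is needed, can be sketched in Python. The function names are illustrative, and the family-wise error rate formula 1 – (1 – α)^NT assumes the tests are independent, which is an assumption made here for illustration.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive among n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests


if __name__ == "__main__":
    alpha, n_tests = 0.05, 20  # the 20 jelly bean colours
    print("corrected alpha:", bonferroni_alpha(alpha, n_tests))
    print("uncorrected family-wise rate:",
          round(familywise_error_rate(alpha, n_tests), 3))
    print("corrected family-wise rate:",
          round(familywise_error_rate(bonferroni_alpha(alpha, n_tests), n_tests), 3))
```

Without correction, 20 independent tests at α = .05 give about a 64% chance of at least one false positive; with the corrected level the family-wise rate stays just under .05.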

Questions

Slides 2G.25

• What is the multiple comparisons problem?
  – What can happen if we forget to correct for it?

• In the “Smiles and Leniency” study, what would the Bonferroni correction calculate (significance = .05)?
  – onlinestatbook.com/2/case_studies/leniency.html

• How is the ANOVA test different from the Tukey HSD test? (Difference in H0)
• Philosophical problem: we look at events until we see one that's unusual. Then we do “science” and see whether the effect is significant. What is NM?