Empirical Software Engineering: Hypothesis Testing

Technische Universität München

Empirical Software Engineering Hypotheses testing Dr. Antonio Vetrò

Technische Universität München Institute for Informatics Software & Systems Engineering

Outline
§  Statistics recall
§  Hypothesis testing process
§  Confidence intervals
§  A software engineering example
§  Final considerations: hypothesis testing pitfalls
§  Type of tests


Statistical inference

§  The process of making guesses about the truth from a sample: from the sample (the observation) we compute sample statistics to make guesses about the whole population, whose parameters are the truth (not observable).

Sample statistics*:
  μ̂ = X̄_n = (Σ_{i=1}^{n} x_i) / n
  σ̂² = s² = (Σ_{i=1}^{n} (x_i − X̄_n)²) / (n − 1)

Population parameters:
  μ = (Σ_{i=1}^{N} x_i) / N
  σ² = (Σ_{i=1}^{N} (x_i − μ)²) / N

*the notation ^ is often used to indicate "estimate"
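The estimators above can be sketched with Python's `statistics` module, which implements exactly this split: `variance` uses the n − 1 denominator (sample estimate s²), while `pvariance` uses the N denominator (population σ²). The data below are illustrative, not from the slides:

```python
import statistics

# Observed sample of class sizes in LOC (illustrative numbers)
sample = [120, 245, 310, 198, 402]

mu_hat = statistics.mean(sample)           # sample mean, estimate of mu
s2 = statistics.variance(sample)           # sample variance, n - 1 denominator
sigma2_pop = statistics.pvariance(sample)  # N denominator: only valid if
                                           # `sample` were the whole population

print(mu_hat, s2, sigma2_pop)
```

Note that s² is always slightly larger than the N-denominator value (by the factor n/(n − 1)), which is what corrects the bias of the naive estimator.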

Statistics vs. Parameters

§  Sample statistic
–  Any summary measure calculated from data; e.g., a mean, a difference in means or proportions, an odds ratio, or a correlation coefficient
•  E.g., the mean size in a sample of 1000 classes of a scientific Java application is 245 LOCs
•  E.g., the correlation coefficient between LOCs and number of bugs in a sample of 100 Java classes is 0.65

§  Population parameter
–  The true value/true effect in the entire population of interest
•  E.g., the true mean size in all classes of scientific applications written in Java is 250 LOCs
•  E.g., the true correlation coefficient between LOCs and number of bugs in classes of all scientific applications written in Java is 0.67

Examples of sample statistics:
§  Single population mean
§  Single population proportion
§  Difference in means
§  Difference in proportions
§  Odds ratio / risk ratio
§  Correlation coefficient
§  Regression coefficient
§  …

The Central Limit Theorem: if all possible random samples, each of size n, are taken from any population with a mean µ and a standard deviation σ, the sampling distribution of the sample means (averages) will:

1. have mean: μ_X̄ = μ
2. have standard deviation: σ_X̄ = σ / √n
3. be approximately normally distributed, regardless of the shape of the parent population (normality improves with larger n)
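The three properties can be checked by simulation; a minimal sketch, assuming an exponential parent population (strongly skewed, with μ = σ = 1):

```python
import math
import random
import statistics

random.seed(42)      # reproducible run

n = 50               # size of each sample
num_samples = 2000   # number of samples drawn

# Draw many samples from a skewed exponential population (mu = sigma = 1)
# and record the mean of each sample
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

# 1. the mean of the sample means is close to mu = 1
print(statistics.mean(sample_means))
# 2. their standard deviation is close to sigma / sqrt(n)
print(statistics.stdev(sample_means), 1 / math.sqrt(n))
```

A histogram of `sample_means` would look roughly bell-shaped even though the parent distribution is skewed, which is property 3.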

Central Limit Theorem caveats for small samples:
§  For small samples:
–  The sample standard deviation is an imprecise estimate of the true standard deviation (σ); this imprecision changes the sampling distribution from a normal to a t-distribution.
•  A t-distribution approaches a normal distribution for large n (≥ 100), but has fatter tails for small n (< 100).

The basics of the P-value (1/2)
§  The p-value is the probability of obtaining empirical data at least as extreme as those observed (e.g., a difference with μx > μy) by chance, if the null hypothesis is true.
–  E.g., a p-value < 0.001 means: P(empirical data | null hypothesis) < 0.001.
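The meaning of P(empirical data | null hypothesis) can be illustrated with a small permutation simulation (an illustrative sketch with made-up data, not the t-test used later in the slides): under H0 the group labels are exchangeable, so we estimate how often a difference in means at least as extreme as the observed one arises purely by chance.

```python
import random
import statistics

random.seed(7)

# Hypothetical measurements for two groups (illustrative data)
x = [5.0, 6.0, 7.0, 8.0, 9.0]
y = [0.0, 1.0, 2.0, 3.0, 4.0]

observed = statistics.mean(x) - statistics.mean(y)

pooled = x + y
trials = 10_000
extreme = 0
for _ in range(trials):
    random.shuffle(pooled)      # relabel the observations: H0 says labels don't matter
    diff = statistics.mean(pooled[:5]) - statistics.mean(pooled[5:])
    if diff >= observed:        # at least as extreme (one-sided)
        extreme += 1

p_value = extreme / trials
print(p_value)                  # small: such a difference is unlikely under H0
```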

Testing the equality of means of two normal populations*

To test H0: μx = μy, use the test statistic

  T = (X̄ − Ȳ) / √(Sp² (1/n + 1/m))

where Sp², the pooled estimator of the common variance σ², is given by

  Sp² = ((n − 1)Sx² + (m − 1)Sy²) / (n + m − 2)

When H0 is true, so that μx − μy = 0, the statistic T has a t-distribution with n + m − 2 degrees of freedom. From this, it follows that we can test the hypothesis that μx = μy as follows:

  accept H0 if |T| ≤ t_{α/2, n+m−2}
  reject H0 if |T| > t_{α/2, n+m−2}

where t_{α/2, n+m−2} is the 100·α/2 percentile point of a t-random variable with n + m − 2 degrees of freedom (see Figure 8.5).

Alternatively, the test can be run by determining the p-value. If T is observed to equal v, then the resulting p-value of the test of H0 against H1 is given by

  p-value = P{|T_{n+m−2}| ≥ |v|} = 2 P{T_{n+m−2} ≥ |v|}

where T_{n+m−2} is a t-random variable having n + m − 2 degrees of freedom.

If we are interested in testing the one-sided hypothesis H0: μx ≤ μy versus H1: μx > μy, then H0 will be rejected at large values of T. Thus the significance-level-α test is to reject H0 if T ≥ t_{α, n+m−2} and not reject it otherwise; if T is observed to equal v, the p-value is given by

  p-value = P{T_{n+m−2} ≥ v}

Program 8.4.2 computes both the value of the test statistic and the corresponding p-value.

*Normal distributions and unknown variances; n and m are the sample sizes for X and Y; 95% confidence level.

Source: Probability and Statistics for Engineers and Scientists, Sheldon Ross (EXAMPLE 8.4b: "Twenty-two volunteers at a cold research institute caught a cold after having …")
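The pooled statistic above can be sketched directly from the formulas using Python's `statistics` module (the samples `x` and `y` below are illustrative, not the data from the slides):

```python
import math
import statistics

def pooled_t(x, y):
    """Two-sample t statistic with pooled variance (equal-variance assumption)."""
    n, m = len(x), len(y)
    # Sp^2 = ((n-1)Sx^2 + (m-1)Sy^2) / (n + m - 2)
    sp2 = ((n - 1) * statistics.variance(x)
           + (m - 1) * statistics.variance(y)) / (n + m - 2)
    # T = (Xbar - Ybar) / sqrt(Sp^2 (1/n + 1/m))
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(sp2 * (1 / n + 1 / m))
    return t, n + m - 2   # statistic and degrees of freedom

# Illustrative samples
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 4.0, 5.0]
t, df = pooled_t(x, y)
print(t, df)              # compare |t| with t_{alpha/2, df} to decide
```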

[Figure 8.5: density of a t-random variable with k degrees of freedom, with rejection areas of total size α in the tails beyond t_{α,k}]

The basics of the P-value (2/2)

Compute in our example*:

  X̄ = 1.187, Ȳ = 0.208
  Sp² = 1.622
  T = (X̄ − Ȳ) / √(Sp² (1/n + 1/m)) = 6.98
  t_{0.95, 371} = 1.649 (confidence level = 0.95)
  p-value = P{T₃₇₁ ≥ 6.98} = 6.891e-12

Since T = 6.98 > t_{0.95, 371} = 1.649, H0 is rejected.

Hypothesis rejected, but this tells us nothing about causality: we need explanations.

The P-value and significance of the test
§  By convention, p-values below 0.05 are considered statistically significant.
