Technische Universität München
Institute for Informatics, Software & Systems Engineering
Empirical Software Engineering: Hypothesis Testing
Dr. Antonio Vetrò
Outline
§ Statistics recall
§ Hypothesis testing process
§ Confidence intervals
§ A software engineering example
§ Final considerations: hypothesis testing pitfalls
§ Types of tests
Statistical inference
§ The process of making guesses about the truth from a sample: the population is the truth (not observable); the sample is the observation, from which we make guesses about the whole population.

Sample statistics* (computed on a sample of size n):
  µ̂ = X̄ = ( Σ_{i=1}^{n} x_i ) / n
  σ̂² = s² = ( Σ_{i=1}^{n} (x_i − X̄)² ) / (n − 1)

Population parameters (defined on the whole population of size N):
  µ = ( Σ_{i=1}^{N} x_i ) / N
  σ² = ( Σ_{i=1}^{N} (x_i − µ)² ) / N

*The notation ^ is often used to indicate "estimate".
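The two sample statistics above can be sketched in a few lines of Python; the data values below are hypothetical, chosen only to illustrate the formulas.

```python
import statistics

# A hypothetical sample of class sizes (LOC) -- illustrative values only.
sample = [120, 245, 310, 98, 402, 180, 260, 150]
n = len(sample)

# Sample mean: the estimate of the population mean mu.
x_bar = sum(sample) / n

# Sample variance s^2 with the n - 1 (Bessel) correction, as on the slide.
s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)

# The stdlib agrees with the hand-rolled formulas.
assert x_bar == statistics.mean(sample)
assert abs(s2 - statistics.variance(sample)) < 1e-9
print(x_bar, s2)
```

Note the n − 1 divisor: dividing by n would systematically underestimate σ², which is why the sample variance uses the correction.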
Statistics vs. Parameters
§ Sample statistic
  – Any summary measure calculated from data; e.g., a mean, a difference in means or proportions, an odds ratio, or a correlation coefficient.
  • E.g., the mean size in a sample of 1,000 classes of a scientific Java application is 245 LOC.
  • E.g., the correlation coefficient between LOC and number of bugs in a sample of 100 Java classes is 0.65.
§ Population parameter
  – The true value/true effect in the entire population of interest.
  • E.g., the true mean size of all classes of scientific applications written in Java is 250 LOC.
  • E.g., the true correlation coefficient between LOC and number of bugs in the classes of all scientific applications written in Java is 0.67.
Examples of sample statistics:
§ Single population mean
§ Single population proportion
§ Difference in means
§ Difference in proportions
§ Odds ratio / risk ratio
§ Correlation coefficient
§ Regression coefficient
§ …
The Central Limit Theorem
If all possible random samples, each of size n, are taken from any population with mean µ and standard deviation σ, the sampling distribution of the sample means (averages) will:
1. have mean: µ_X̄ = µ
2. have standard deviation: σ_X̄ = σ / √n
3. be approximately normally distributed regardless of the shape of the parent population (normality improves with larger n).
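The three claims can be checked by simulation. The sketch below uses a uniform parent population on [0, 1] (so µ = 0.5 and σ = √(1/12)); the population, sample size, and repetition count are all arbitrary choices for illustration.

```python
import random
import statistics

random.seed(42)
mu, sigma = 0.5, (1 / 12) ** 0.5  # uniform [0, 1] parent population
n = 50        # size of each sample
reps = 5_000  # number of samples drawn

# Draw many samples of size n and record each sample mean.
sample_means = [statistics.fmean(random.random() for _ in range(n))
                for _ in range(reps)]

# 1. the mean of the sample means is close to mu
assert abs(statistics.fmean(sample_means) - mu) < 0.005
# 2. their standard deviation is close to sigma / sqrt(n)
assert abs(statistics.stdev(sample_means) - sigma / n ** 0.5) < 0.005
print(statistics.fmean(sample_means), sigma / n ** 0.5)
```

Plotting a histogram of `sample_means` would also show claim 3: the distribution looks bell-shaped even though the parent population is flat.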
Central Limit Theorem caveats for small samples
§ For small samples:
  – The sample standard deviation s is an imprecise estimate of the true standard deviation σ; this imprecision changes the sampling distribution to a t-distribution.
  – A t-distribution approaches a normal distribution for large n (≥ 100), but has fatter tails for small n (< 100).

The basics of the P-value (1/2)
§ The p-value is the probability of obtaining a result at least as extreme as the observed one by chance, assuming the null hypothesis is true; e.g., P(X̄x > X̄y | H0) by chance.
§ E.g., p-value < 0.001 means: P(empirical data | null hypothesis) < 0.001.

Testing the equality of means of two normal populations*
(Source: Sheldon Ross, Probability and Statistics for Engineers and Scientists)
*Normal distributions with unknown variances; n and m are the sample sizes for X and Y.

§ The pooled estimator of the common variance σ² is
  Sp² = ( (n − 1)Sx² + (m − 1)Sy² ) / (n + m − 2)
§ When H0 is true, so that µx − µy = 0, the statistic
  T = (X̄ − Ȳ) / √( Sp² (1/n + 1/m) )
has a t-distribution with n + m − 2 degrees of freedom. From this, it follows that we can test the hypothesis that µx = µy as follows:
  accept H0 if |T| ≤ t_{α/2, n+m−2}
  reject H0 if |T| > t_{α/2, n+m−2}
where t_{α/2, n+m−2} is the 100·α/2 upper percentile point of a t-random variable with n + m − 2 degrees of freedom (Figure 8.5 shows the density of a t-random variable with k degrees of freedom).
§ Alternatively, the test can be run by determining the p-value. If T is observed to equal v, then the p-value of the test of H0 against H1 is
  p-value = P{ |T_{n+m−2}| ≥ |v| } = 2·P{ T_{n+m−2} ≥ |v| }
where T_{n+m−2} is a t-random variable having n + m − 2 degrees of freedom.
§ For the one-sided hypothesis H0: µx ≤ µy versus H1: µx > µy, H0 is rejected at large values of T. The significance-level-α test is:
  reject H0 if T ≥ t_{α, n+m−2}, do not reject H0 otherwise,
and the p-value of an observed value v is p-value = P{ T_{n+m−2} ≥ v }.
(Program 8.4.2 in Ross computes both the value of the test statistic and the corresponding p-value.)
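The pooled-variance statistic above can be sketched directly from its definition; the function and the data values below are hypothetical, written only to mirror the formulas on this slide.

```python
import math

def pooled_t(x, y):
    """T = (x_bar - y_bar) / sqrt(Sp^2 * (1/n + 1/m)) with pooled variance Sp^2."""
    n, m = len(x), len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / m
    # Sample variances with the n-1 correction.
    sx2 = sum((v - x_bar) ** 2 for v in x) / (n - 1)
    sy2 = sum((v - y_bar) ** 2 for v in y) / (m - 1)
    # Pooled estimator of the common variance sigma^2.
    sp2 = ((n - 1) * sx2 + (m - 1) * sy2) / (n + m - 2)
    t = (x_bar - y_bar) / math.sqrt(sp2 * (1 / n + 1 / m))
    return t, n + m - 2  # test statistic and degrees of freedom

# Hypothetical measurements, e.g. defect counts in two groups of modules.
x = [4.0, 5.5, 3.8, 6.1, 5.0]
y = [2.9, 3.4, 2.2, 3.8, 3.1, 2.6]
t, df = pooled_t(x, y)
print(t, df)  # reject H0 at level alpha when |t| > t_{alpha/2, df}
```

The returned degrees of freedom (n + m − 2) are what you look up, together with α, to get the critical value t_{α/2, n+m−2} for the decision rule.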
The basics of the P-value (2/2)
§ Compute, in our example*:
  – Sp² = 1.622
  – T = 6.98
  – Confidence level = 0.95; t_{0.95, 371} = 1.649
§ Since 6.98 > 1.649, H0 is rejected; the p-value is 6.891e-12.
§ The hypothesis is rejected, but this tells us nothing about causality. We need explanations.
*Normal distributions and unknown variances; n and m are the sample sizes for X and Y; 95% confidence level.
(Source: Sheldon Ross, Probability and Statistics for Engineers and Scientists)
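The decision in this example can be checked numerically. With 371 degrees of freedom the t-distribution is essentially normal, so the sketch below approximates the one-sided p-value with the stdlib normal tail (an approximation, not the exact t-distribution value):

```python
from statistics import NormalDist

t_obs = 6.98    # observed test statistic from the slide
t_crit = 1.649  # t_{0.95, 371}, one-sided critical value at alpha = 0.05

# Decision rule: reject H0 when the statistic exceeds the critical value.
reject = t_obs > t_crit
print(reject)

# Normal approximation to the one-sided p-value P{T_371 >= 6.98}.
p_approx = 1 - NormalDist().cdf(t_obs)
print(p_approx)  # same order of magnitude as the slide's 6.891e-12
```

The exact t-based p-value is slightly larger than the normal approximation because of the t-distribution's fatter tails, but the conclusion (reject H0) is the same.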
The P-value and significance of the test
§ By convention, p-values of less than 0.05 are considered statistically significant.