Technische Universität München
Institute for Informatics, Software & Systems Engineering
Empirical Software Engineering: Hypothesis Testing
Dr. Antonio Vetrò
Outline
§ Statistics recall
§ Hypothesis testing process
§ Confidence intervals
§ A software engineering example
§ Final considerations: hypothesis testing pitfalls
§ Types of tests
Statistical inference
§ The process of making guesses about the truth from a sample: the population is the truth (not observable); the sample is the observation, from which we make guesses about the whole population.

Sample statistics* (computed on a sample of size n):
  µ̂ = X̄ = ( Σ_{i=1}^{n} x_i ) / n
  σ̂² = s² = ( Σ_{i=1}^{n} (x_i − X̄)² ) / (n − 1)

Population parameters (defined on the whole population of size N):
  µ = ( Σ_{i=1}^{N} x_i ) / N
  σ² = ( Σ_{i=1}^{N} (x_i − µ)² ) / N

*The notation ^ is often used to indicate "estimate".
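The two sample statistics above can be sketched in a few lines of Python; the data values below are hypothetical, chosen only to illustrate the formulas.

```python
import statistics

# A hypothetical sample of class sizes (LOC) -- illustrative values only.
sample = [120, 245, 310, 98, 402, 180, 260, 150]
n = len(sample)

# Sample mean: the estimate of the population mean mu.
x_bar = sum(sample) / n

# Sample variance s^2 with the n - 1 (Bessel) correction, as on the slide.
s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)

# The stdlib agrees with the hand-rolled formulas.
assert x_bar == statistics.mean(sample)
assert abs(s2 - statistics.variance(sample)) < 1e-9
print(x_bar, s2)
```

Note the n − 1 divisor: dividing by n would systematically underestimate σ², which is why the sample variance uses the correction.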
Statistics vs. Parameters
§ Sample statistic
  – Any summary measure calculated from data; e.g., a mean, a difference in means or proportions, an odds ratio, or a correlation coefficient.
  • E.g., the mean size in a sample of 1,000 classes of a scientific Java application is 245 LOC.
  • E.g., the correlation coefficient between LOC and number of bugs in a sample of 100 Java classes is 0.65.
§ Population parameter
  – The true value/true effect in the entire population of interest.
  • E.g., the true mean size of all classes of scientific applications written in Java is 250 LOC.
  • E.g., the true correlation coefficient between LOC and number of bugs in the classes of all scientific applications written in Java is 0.67.
Examples of sample statistics:
§ Single population mean
§ Single population proportion
§ Difference in means
§ Difference in proportions
§ Odds ratio / risk ratio
§ Correlation coefficient
§ Regression coefficient
§ …
The Central Limit Theorem
If all possible random samples, each of size n, are taken from any population with mean µ and standard deviation σ, the sampling distribution of the sample means (averages) will:
1. have mean: µ_X̄ = µ
2. have standard deviation: σ_X̄ = σ / √n
3. be approximately normally distributed regardless of the shape of the parent population (normality improves with larger n).
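The three claims can be checked by simulation. The sketch below uses a uniform parent population on [0, 1] (so µ = 0.5 and σ = √(1/12)); the population, sample size, and repetition count are all arbitrary choices for illustration.

```python
import random
import statistics

random.seed(42)
mu, sigma = 0.5, (1 / 12) ** 0.5  # uniform [0, 1] parent population
n = 50        # size of each sample
reps = 5_000  # number of samples drawn

# Draw many samples of size n and record each sample mean.
sample_means = [statistics.fmean(random.random() for _ in range(n))
                for _ in range(reps)]

# 1. the mean of the sample means is close to mu
assert abs(statistics.fmean(sample_means) - mu) < 0.005
# 2. their standard deviation is close to sigma / sqrt(n)
assert abs(statistics.stdev(sample_means) - sigma / n ** 0.5) < 0.005
print(statistics.fmean(sample_means), sigma / n ** 0.5)
```

Plotting a histogram of `sample_means` would also show claim 3: the distribution looks bell-shaped even though the parent population is flat.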
Central Limit Theorem caveats for small samples
§ For small samples:
  – The sample standard deviation s is an imprecise estimate of the true standard deviation σ; this imprecision changes the sampling distribution to a t-distribution.
  – A t-distribution approaches a normal distribution for large n (≥ 100), but has fatter tails for small n (< 100).

The basics of the P-value (1/2)
§ The p-value is the probability of obtaining a result at least as extreme as the observed one by chance, assuming the null hypothesis is true; e.g., P(X̄x > X̄y | H0) by chance.
§ E.g., p-value < 0.001 means: P(empirical data | null hypothesis) < 0.001.

Testing the equality of means of two normal populations*
(Source: Sheldon Ross, Probability and Statistics for Engineers and Scientists)
*Normal distributions with unknown variances; n and m are the sample sizes for X and Y.

§ The pooled estimator of the common variance σ² is
  Sp² = ( (n − 1)Sx² + (m − 1)Sy² ) / (n + m − 2)
§ When H0 is true, so that µx − µy = 0, the statistic
  T = (X̄ − Ȳ) / √( Sp² (1/n + 1/m) )
has a t-distribution with n + m − 2 degrees of freedom. From this, it follows that we can test the hypothesis that µx = µy as follows:
  accept H0 if |T| ≤ t_{α/2, n+m−2}
  reject H0 if |T| > t_{α/2, n+m−2}
where t_{α/2, n+m−2} is the 100·α/2 upper percentile point of a t-random variable with n + m − 2 degrees of freedom (Figure 8.5 shows the density of a t-random variable with k degrees of freedom).
§ Alternatively, the test can be run by determining the p-value. If T is observed to equal v, then the p-value of the test of H0 against H1 is
  p-value = P{ |T_{n+m−2}| ≥ |v| } = 2·P{ T_{n+m−2} ≥ |v| }
where T_{n+m−2} is a t-random variable having n + m − 2 degrees of freedom.
§ For the one-sided hypothesis H0: µx ≤ µy versus H1: µx > µy, H0 is rejected at large values of T. The significance-level-α test is:
  reject H0 if T ≥ t_{α, n+m−2}, do not reject H0 otherwise,
and the p-value of an observed value v is p-value = P{ T_{n+m−2} ≥ v }.
(Program 8.4.2 in Ross computes both the value of the test statistic and the corresponding p-value.)
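The pooled-variance statistic above can be sketched directly from its definition; the function and the data values below are hypothetical, written only to mirror the formulas on this slide.

```python
import math

def pooled_t(x, y):
    """T = (x_bar - y_bar) / sqrt(Sp^2 * (1/n + 1/m)) with pooled variance Sp^2."""
    n, m = len(x), len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / m
    # Sample variances with the n-1 correction.
    sx2 = sum((v - x_bar) ** 2 for v in x) / (n - 1)
    sy2 = sum((v - y_bar) ** 2 for v in y) / (m - 1)
    # Pooled estimator of the common variance sigma^2.
    sp2 = ((n - 1) * sx2 + (m - 1) * sy2) / (n + m - 2)
    t = (x_bar - y_bar) / math.sqrt(sp2 * (1 / n + 1 / m))
    return t, n + m - 2  # test statistic and degrees of freedom

# Hypothetical measurements, e.g. defect counts in two groups of modules.
x = [4.0, 5.5, 3.8, 6.1, 5.0]
y = [2.9, 3.4, 2.2, 3.8, 3.1, 2.6]
t, df = pooled_t(x, y)
print(t, df)  # reject H0 at level alpha when |t| > t_{alpha/2, df}
```

The returned degrees of freedom (n + m − 2) are what you look up, together with α, to get the critical value t_{α/2, n+m−2} for the decision rule.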
The basics of the P-value (2/2)
§ Compute, in our example*:
  – Sp² = 1.622
  – T = 6.98
  – Confidence level = 0.95; t_{0.95, 371} = 1.649
§ Since 6.98 > 1.649, H0 is rejected; the p-value is 6.891e-12.
§ The hypothesis is rejected, but this tells us nothing about causality. We need explanations.
*Normal distributions and unknown variances; n and m are the sample sizes for X and Y; 95% confidence level.
(Source: Sheldon Ross, Probability and Statistics for Engineers and Scientists)
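The decision in this example can be checked numerically. With 371 degrees of freedom the t-distribution is essentially normal, so the sketch below approximates the one-sided p-value with the stdlib normal tail (an approximation, not the exact t-distribution value):

```python
from statistics import NormalDist

t_obs = 6.98    # observed test statistic from the slide
t_crit = 1.649  # t_{0.95, 371}, one-sided critical value at alpha = 0.05

# Decision rule: reject H0 when the statistic exceeds the critical value.
reject = t_obs > t_crit
print(reject)

# Normal approximation to the one-sided p-value P{T_371 >= 6.98}.
p_approx = 1 - NormalDist().cdf(t_obs)
print(p_approx)  # same order of magnitude as the slide's 6.891e-12
```

The exact t-based p-value is slightly larger than the normal approximation because of the t-distribution's fatter tails, but the conclusion (reject H0) is the same.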
The P-value and significance of the test
§ By convention, p-values of less than 0.05 are considered statistically significant.