Applied Statistics in Biological Research

Basic concepts, related problems and common methods

Content:

I. Introduction: the research process; the ‘who is who …?’; some useful definitions

II. Basic Concepts: experimental vs. correlational research; methods of data collection; analyzing data/descriptive statistics; frequency table & distribution; population and sample; random sampling distribution (RSD); estimating parameters and the standard normal distribution; hypothesis testing

III. Related Problems: how many, how often, how sure …? and the badly desired significance; black sheep, outliers and underdogs; multiple testing

IV. Common Methods: selecting the correct test; simple cases: comparison of 2 or more groups on one metric character; more advanced cases: more than 2 groups and/or more than one character; correlation, regression and analyses of frequencies

September 2014 - CSF - Vienna BIOCENTER ([email protected])

 

 

Dr. Karin Schmid

 


 

I. INTRODUCTION

1.1 The research process/strategy
• Study the background literature and identify feasible methods of testing the hypothesis.
• Formulate clear objectives.
• Extract hypotheses and formulate them in a formal, logical language that allows direct observation/examination of the hypothesized relation between the variables in question.
• Design the experimental protocol carefully: number of experimental and/or control groups; description of the intervention(s); the sequence or course of any intervention; who, where, when and how the experiment will be performed.
• Develop and define the statistical model in parallel: the statistical method to be used, the necessary sample size, the desired power and the expected effect size.
• Data analysis: should be performed correctly and the maximum information should be extracted.
• Interpretation: only test your a priori formulated hypotheses; do the results reveal substantial scientific progress or knowledge? Post-hoc interpretation of randomly discovered relations is incorrect and unprofessional scientific work.
• Report: use guidelines to prevent losing important information and to support cumulative progress in research.

   


 

1.2. The ‘who is who’ in statistics

Variables: measures that vary/change between individuals, entities, locations or in time
• independent variable (cause) ~ factor
• dependent variable (effect) ~ variate

Scale levels:
• nominal or categorical variables: binary: 2 categories (dead/alive); categorical: > 2 categories (vegetarian/vegan/omnivore)
• ordinal or rank-scaled variables: categories have a logical order (ranks)
• interval-scaled variables: equal intervals represent equal differences, e.g. test scores
• ratio-scaled variables: ratios of scores make sense, e.g. length, weight

1.3. Some useful definitions:

Treatments, manipulations or interventions which are under the control of the experimenter are usually defined as ‘fixed factors’. A fixed (effect) factor is a discrete variable used to classify experimental units; for example “sex” (factor with two levels: “male” and “female”); “diet” (levels “low”, “medium” and “high”); dose, genotype and any treatment which can be administered. The levels within each factor can be discrete, such as “drug A” and “drug B”, or quantitative, such as 10, 20, 30 and 40 mg/kg.

Random (effect) factors: usually not controlled by the investigator; unknown factors that may influence the variable of interest, e.g. inter-individual differences, litter effects, time and environmental effects, differences in diet. These effects are responsible for noise (variation), so the aim is to partition them out.

Control of variation is of fundamental importance when designing an experiment: sample size can be reduced, power increased and smaller responses/effects detected.
• situational variation: groups should be processed identically throughout the whole experiment, e.g. housed in the same room
• inter-individual variation: age, weight, sex
Uncontrolled variability reduces the signal-to-noise ratio, so larger sample sizes are needed to detect the effect of a treatment.

Avoiding bias: time, space or other unknown factors also influencing the results can only be controlled by randomisation. Randomisation of experimental units to treatment groups depends on the experimental design (randomised block design; completely randomised design; randomisation of the order in which measurements are made, etc.).

   


 

II. BASIC CONCEPTS

2.1. Experimental vs. correlational research

Correlational research: register what naturally happens without any intervention, e.g. genome-wide association studies.

Experimental research: manipulate at least one variable and register the effect on another.

Examples: ‘… it is a proven fact that there is a significant correlation between the number of murders committed and the amount of ice cream consumed.’ ‘… children with longer arms reason better than those with shorter arms.’ ‘… shoe size is strongly correlated with reading skills.’ Spurious, sometimes even ‘spooky’, correlations.

Correlation measures association, but association is not causation.

2.2. Methods of data collection

Manipulate the independent variable using different entities: between-groups, between-subjects or independent design.
Manipulate the independent variable using the same entities: within-subjects or repeated-measures design.

Sources of variation: systematic variation due to the manipulation & unsystematic variation due to random factors.

2.3. Error propagation
Every measured quantity = true value + error/uncertainty. When we use a measured quantity to calculate another quantity (sum, mean, ratio), the error/uncertainty ‘propagates’ into the uncertainty of the calculated quantity. Assumption: errors are uncorrelated and random → the error propagates differently, depending on the mathematical operation.
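A minimal sketch of how uncorrelated, random errors combine for a sum and a ratio (standard quadrature rules; the function names and numbers are illustrative, not from the text):

import math

def propagate_sum(sx, sy):
    # uncertainty of x + y (or x - y) for uncorrelated errors: add in quadrature
    return math.sqrt(sx**2 + sy**2)

def propagate_ratio(x, sx, y, sy):
    # uncertainty of x / y: relative errors add in quadrature
    rel = math.sqrt((sx / x)**2 + (sy / y)**2)
    return abs(x / y) * rel

# example: two measurements, 10.0 ± 0.2 and 4.0 ± 0.1
print(propagate_sum(0.2, 0.1))                # ~0.22
print(propagate_ratio(10.0, 0.2, 4.0, 0.1))   # ~0.08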

2.4. Analyzing data / descriptive statistics

Frequency table and frequency distribution
Measures of central tendency: mode, median, mean
Measures of dispersion: range, interquartile range (IQR), standard deviation, variance
Skew: negative vs. positive (frequent scores are clustered at the upper or lower end, respectively)
Kurtosis: leptokurtic (thin and spiky) vs. platykurtic (broad and flat)

   


 

2.5. Population and sample

Samples are drawn from the population; parameters are estimated from the sample.

Aim: we want to say something about the population based on analyzing the sample!

2.6. Random sampling distribution (RSD)
The random sampling distribution is the distribution of sample means. The SEM (standard error of the mean) is the standard deviation of the sample means.

The central limit theorem (CLT) states that the distribution of means of a large number of independent, identically distributed variables will be approximately normal, regardless of the underlying distribution.
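To make the CLT concrete, a minimal simulation sketch: means of samples drawn from a strongly skewed population become approximately normal (the population, sample size and number of resamples are arbitrary illustrative choices):

import numpy as np

rng = np.random.default_rng(1)
population = rng.exponential(scale=2.0, size=100_000)   # strongly skewed population

n = 30                                                  # sample size
means = [rng.choice(population, size=n).mean() for _ in range(5_000)]

print(np.mean(means))                  # close to the population mean (~2.0)
print(np.std(means, ddof=1))           # close to the SEM ...
print(population.std() / np.sqrt(n))   # ... = population SD / sqrt(n)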

2.7. Estimating parameters

Standard normal distribution; 95% confidence interval:

CI = sample mean ± (1.96 x SE)
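As a minimal worked example of this interval (the ten measurements are invented for illustration):

import numpy as np

x = np.array([5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 5.2, 4.9, 5.4, 5.0])  # hypothetical measurements
mean = x.mean()
se = x.std(ddof=1) / np.sqrt(len(x))           # standard error of the mean
ci = (mean - 1.96 * se, mean + 1.96 * se)      # approximate 95% CI (z-based)
print(mean, se, ci)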

   


 

2.8. Hypothesis testing

Alternative hypothesis H1: µi ≠ µj – postulates an effect
Null hypothesis H0: µi = µj – postulates no effect

Hypotheses postulate an effect for the population – the decision is drawn from the sample!

What we test: … the chances of obtaining the data we have collected, assuming that the null hypothesis is true! p-value < .05 (or lower) – significant.

Hypothesis: ‘male and female mice differ with respect to their tail length’ → two groups on one metric character → t-test for independent samples (details see below)

systematic variation: explained by the model
unsystematic variation: not explained by the model ~ error

test statistic (t, F, chi² …) = variance explained by the model / variance not explained by the model = effect / error

                                   Truth (for the population studied)
Decision (based on sample)         Null hypothesis true          Null hypothesis false
Reject null hypothesis             Type I error (α error)        Correct decision
Accept null hypothesis             Correct decision              Type II error (β error)

Type I / Type II error
In a controlled experiment usually two or more samples are compared with respect to their means, medians, distributions or variances. We normally set up a “null hypothesis” and aim to reject it. Due to inter-individual variability we run the risk of making a mistake. If we fail to find a true difference, we have a false negative result, also known as a Type II or beta error. If we conclude that there is a difference when in fact it is just due to chance or sampling variation, we have a false positive, Type I or alpha error. These are shown in the table above.
Type I error: controlled by the significance level (alpha).
Type II error: control is more complicated; it depends on several parameters, most importantly on the effect size (difference between the group means), the inter-individual variability and the sample size.

   


 

III. RELATED PROBLEMS

3.1. How many, how often, how sure . . .? Power, effect & sample size

3.1.1. Power
Power is the probability that an experiment can detect a treatment effect (signal), if there is one. Power is often set at 0.8 or 0.9 (80% or 90%), though the higher the power, the larger the sample size required. Power depends on effect size, sample size and significance level.
In fact the null hypothesis can hardly ever be exactly true, since the probability that parameters calculated from two random samples will be exactly equal is vanishingly small. If the sample size is large enough, any small difference will become significant, without any scientific meaning. Hence we have to decide how big a difference between two groups has to be in order to be meaningful.

3.1.2. Effect size
The effect size characterizes the magnitude of the difference between the means of the two groups (M1 - M2) which is likely to be of clinical or scientific importance. It has to be specified by the investigator in advance. If an estimate of the standard deviation is available (previous work in the same field), a power analysis can help to estimate the effect size you are likely to be able to detect for the sample size you decide to use.
Inter-individual variability (noise) is the variation among the experimental subjects, expressed as a standard deviation (in the case of measurement characteristics), and needs to come from previous studies or a pilot study. If no good estimate is available, a power analysis can be conducted with a low and a high estimate to see what difference it makes to the estimated sample size, or the standard deviation of the control (non-treated) group can be used.
The “standardised effect size” or “Cohen’s d” is used as a general indication of the magnitude of an effect. Values of d of 0.2, 0.5 or 0.8 are considered “small”, “medium” and “large” effect sizes respectively in psychological research. In work with laboratory animals, especially inbred strains, much larger effects are seen since the noise is much better controlled. For animal studies use: small effect d = 0.5, medium effect d = 1.0 and large effect d = 1.5.

Some useful formulas:
• similar SDs in both groups: d = (M1 - M2) / SDpooled
• inhomogeneity of variances: use the SD of the control group
• n1 ≠ n2: use the weighted pooled SD: SDpooled = √( ((n1 - 1)·SD1² + (n2 - 1)·SD2²) / (n1 + n2 - 2) )

   


 

Calculation of the standardized mean difference from the t statistic or the correlation coefficient r (n1 = n2):
d = 2t / √df   (with df = n1 + n2 - 2)
d = 2r / √(1 - r²)
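A minimal sketch of these effect-size formulas (the function names and numbers are illustrative, not from the text):

import math

def cohens_d(m1, m2, sd1, sd2, n1, n2):
    # weighted pooled SD, as recommended above for n1 != n2
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (m1 - m2) / sd_pooled

def d_from_t(t, n1, n2):
    # common approximations: d = 2t / sqrt(df) for equal group sizes,
    # d = t * sqrt(1/n1 + 1/n2) more generally
    df = n1 + n2 - 2
    if n1 == n2:
        return 2 * t / math.sqrt(df)
    return t * math.sqrt(1 / n1 + 1 / n2)

print(cohens_d(m1=5.4, m2=4.9, sd1=0.5, sd2=0.6, n1=12, n2=10))  # ~0.9, a 'large' effect
print(d_from_t(t=2.1, n1=12, n2=10))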

3.1.3. Sample size
The sample size is the number in each (treatment or control) group. If we know the size of the effect to be expected, the power and the significance level, the sample size can be estimated by various programs or statistics packages (G*Power). If there is a fixed number of subjects/entities available, the achievable power and detectable effect size can be estimated instead.

Sample size → affects SD → SE → significance
• sample size affects whether an observed difference between samples is deemed significant
• in large samples even very small differences can be significant
• in small samples even large differences can be non-significant

The following quantities are interrelated:
• power
• sample size
• inter-individual variability
• effect size (magnitude of the response to a treatment)
• significance level

If it is possible to specify four of the above-mentioned variables, you can estimate the fifth. Hence we are able to estimate sample size, effect size or power.
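A minimal sketch of such a calculation, assuming the statsmodels package as a stand-in for G*Power (the effect size, alpha and power values are illustrative):

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# given effect size, alpha and power, solve for the sample size per group
n_per_group = analysis.solve_power(effect_size=1.0, alpha=0.05, power=0.8)
print(round(n_per_group))     # roughly 17 per group for d = 1.0

# given a fixed sample size, solve for the achieved power instead
achieved = analysis.solve_power(effect_size=1.0, alpha=0.05, nobs1=10)
print(achieved)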

   

 


Critical view
When a statistical result is significant, can we assume that it is also important? No! p is affected by n → a small and unimportant effect can be significant.
Does a non-significant result mean that the null hypothesis is true? No! The null hypothesis is never exactly true.
Does a significant result mean that the null hypothesis is false? No! It just means that the probability of making a Type I error is small if the null hypothesis is rejected.
Two results, say one with p = .0499 and the other with p = .0512 – should we really draw different conclusions? No.

Solution: meta-analysis; report:
• several effect sizes – constant positive effect?
• mean effect size or weighted mean effect size
• confidence interval; exact p-value + effect size [p = 0.06 or p = 0.93: both p > .05, non-significant]

3.2. Treatment of black sheep, underdogs and outliers …

3.2.1. General assumptions
Violations of assumptions are one main source of bias: the statistical model, test statistic or p-value becomes inaccurate and can lead to wrong conclusions.
Additivity and linearity: the outcome variable is linearly related to any predictors; with several predictors, their combined effect is best described by adding their effects together.
Normality: for CIs the estimate has to come from a normal distribution (ND); for significance tests, the RSD must be normally distributed; for estimates that define a model, the residuals in the population must be normally distributed. Central limit theorem: if your sample is large enough, normality is less of a concern.
Homoscedasticity ~ homogeneity of variances: violations influence/bias the standard error and therewith CIs and significance tests; Levene test – corrected model.
Independence: errors in the model are not related to each other; important for significance testing and CIs.

3.2.2. Outliers
An outlier is a score very different from the rest of the data.
• it changes the parameter estimate (the mean)
• it has an even greater influence on the error associated with that estimate
• the sum of squared errors becomes higher, which is used to compute the SD,
• which in turn is used to estimate the SE
• most test statistics are based on the SS and thus will be biased too
Identify: graphically (boxplots and histograms); use z-scores (± 2 SD; see also ‘trimming data’); best case: a data-entry mistake :)

 


3.2.3. Normality
Kolmogorov-Smirnov test: an exact test for any hypothesized distribution; can be used in small samples, though it tends to be conservative and fails to detect deviations (Lilliefors correction for the critical values).
Shapiro-Wilk test: specifically for the ND; more power; also for small samples.
Disadvantage of both: they are based on null-hypothesis significance testing:
• large samples – significant even for marginal deviations
• small samples – lack the power to detect violations
Large sample: don’t worry too much (CLT). Small sample: don’t rely on a non-significant test (look at skewness, kurtosis and the graph).

3.2.4. Reducing bias – correction of outliers
Trimming: a certain number of scores from the extremes is deleted.
(a) trimming by a percentage-based rule: e.g. deleting the 10% highest and lowest scores (resulting in a trimmed mean or M-estimator)
(b) trimming by an SD-based rule: remove all values that lie more than a certain number of SDs above or below the mean (usually 2.5 SD)
Winsorizing: replace outliers with the next highest score that is not an outlier, or use a z-score of 3 and replace the outlier by 3 x the SD.
Robust methods: non-parametric procedures; bootstrapping: properties of the (unknown) sampling distribution are estimated from the sample data.
Transforming data: when normality or linearity is in question; changes the form of the relationship, but not the relative distance between scores.

3.2.5. Transformations
The transformation is applied to every single value; if we compare differences between variables or over time, all variables have to be transformed.
Log transformation (log(xi)): positive skew, positive kurtosis, unequal variances, lack of linearity.
Square root transformation (√xi): more effect on large scores; positive skew, positive kurtosis, unequal variances, lack of linearity.
Reciprocal transformation (1/xi): reduces large scores; the resulting scale is inverted – use 1/(xhighest – xi); positive skew, positive kurtosis, unequal variances.
Reverse score transformation: negatively skewed data need to be reversed before any transformation by (xhighest – xi); this reverses the scores – caution with interpretation.
Prefer robust methods; increase the sample size (> 40 or more); the F-test and t-test are quite robust in skewed distributions!
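A minimal sketch of these normality checks and a log transformation using scipy (the data are generated for illustration):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.lognormal(mean=1.0, sigma=0.6, size=40)       # positively skewed data

print(stats.shapiro(x))                               # Shapiro-Wilk: W statistic and p-value
print(stats.kstest(x, 'norm', args=(x.mean(), x.std(ddof=1))))  # KS against a fitted normal

x_log = np.log(x)                                     # log transform to reduce positive skew
print(stats.shapiro(x_log))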

   

 


3.3. Non-parametric tests
Basic idea: metric data are ranked (or the data already have ordinal scale level) and the analysis is carried out on the ranks.
• ‘assumption-free’ – fewer assumptions
• small samples (the CLT does not really apply)
• ordinal or ranked data sets
Advantages: overcome the problem of distribution shape; overcome the problem of outliers.
Costs: loss of information about the magnitude of differences; less powerful; limited number of comparisons; the mean is not representative (report the median instead).

3.4. Multiple testing
Conducting more tests results in inflated error rates: the overall probability of no Type I error is multiplied.
Example: conduct 3 tests, each at α = .05:
overall probability of no Type I error = (0.95)^3 = 0.95 x 0.95 x 0.95 = 0.857
probability of at least one Type I error: 1 – 0.857 = 0.143 or 14.3% – an increase from 5% to 14.3%
Conducting 10 tests: 1 – (0.95)^10 = .40, i.e. a 40% chance of having made at least one Type I error.
Family-wise error rate: FWER = 1 – (0.95)^n

3.4.1. Control of inflated error rates
Bonferroni correction (pcrit = α/k): k = number of comparisons; conservative; loss of power.
The Šidák procedure (αsid = 1 – (1 – α)^(1/m)): m = number of comparisons; assumes independence of the test statistics; more power.
Hochberg’s sequential method: independence of test statistics; more power.
Tukey procedure: only for pairwise comparisons; independent observations and equal variances; more power.
Games-Howell: unequal variances, unequal group sizes, small sample sizes.

3.4.2. Different approaches to controlling Type I errors:
• Per-comparison error rate (PCER): the expected number of Type I errors divided by the number of hypotheses
• Per-family error rate (PFER): the expected number of Type I errors
• Family-wise error rate (FWER): the probability of at least one Type I error, FWER = P(V ≥ 1)
• False discovery rate (FDR): the expected proportion of Type I errors among the rejected hypotheses; FDR tests are designed to control the expected proportion of incorrectly rejected null hypotheses (“false discoveries”)
• Positive false discovery rate (pFDR): the rate at which discoveries are false
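A minimal sketch of applying such corrections to a set of p-values, assuming the statsmodels package (the raw p-values are made up):

from statsmodels.stats.multitest import multipletests

pvals = [0.001, 0.012, 0.049, 0.20, 0.34]             # hypothetical raw p-values

for method in ("bonferroni", "sidak", "fdr_bh"):      # FWER- and FDR-style corrections
    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method=method)
    print(method, reject, p_adj.round(3))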


 

IV. COMMON METHODS

4.1. Selection of the correct test
1. How many groups are we planning to compare? 2 groups or more than two groups?
2. Will each individual be tested once, or are there measurements over several points in time? A repeated-measures design is a design in which one and the same dependent variable is measured at two or more time points. Note that it is the same character or behaviour that is measured along the time dimension. If we investigate the effect of a drug on, say, several blood parameters, we have different dependent variables, also called variates. These can either be compared separately in different tests or by multivariate methods. [Note: whenever possible a within-subjects comparison is preferable! The number of animals is lower and the precision and power of the test are higher, due to lower inter-individual variability.]
3. What will be measured? The ‘dependent variable’, the variable of interest, the character we plan to manipulate with the treatment. The statistical method depends on the scale level of the dependent variable. The higher the scale level, the better the statistical approach – as long as …
4. Are the respective assumptions met?

4 hierarchical scales: characteristics and options for further operations
• ratio scale: natural zero point; all mathematical operations allowed; e.g. length, weight
• interval scale: equal intervals; e.g. IQ scores, but no ratios
• ordinal scale: scale of ranks; >/<; e.g. ranks, level of education
• nominal scale: equal/unequal; frequencies in categories; e.g. sex, strain, group

Keep in mind that it is always possible to scale downwards, from a higher to a lower scale level, but not the other way round. According to the scales, the available statistical methods can be ordered hierarchically with respect to their power and value. This is, however, only true if the test-specific assumptions are met or the violations are minor.

   


 

4.2. Selection of a statistical hypothesis test – comparison of 2 or more groups

Scale level of the dependent variable: metric & ND
  2 groups, independent: t-test for independent samples
  2 groups, dependent: t-test for paired samples
  > 2 groups, independent: univariate ANOVA for independent samples
  > 2 groups, dependent: univariate ANOVA for repeated measures

Scale level: ordinal/ranks
  2 groups, independent: test of the median, Mann-Whitney U-test
  2 groups, dependent: sign test, Wilcoxon signed-rank test
  > 2 groups, independent: Kruskal-Wallis H-test
  > 2 groups, dependent: Friedman’s rank ANOVA

Scale level: nominal
  2 groups, independent: chi-square test
  2 groups, dependent: McNemar’s χ² test of symmetry
  > 2 groups, independent: chi-square test
  > 2 groups, dependent: Cochran Q test

Random sampling is required for all statistical inference because it is based on probability, though true random samples are difficult to find.

4.3. Simple cases: 2 independent groups on one metric character
Example:
H1: male and female mice differ with respect to their tail length
H0: the population means of the two unrelated groups are equal

t-test for independent samples
Assumptions: unrelated, independent groups; normality of the dependent variable (Shapiro-Wilk); homogeneity of variances (Levene’s test)
Violations: robust against violation of the normality assumption; correction for heterogeneous variances
What if …? Deviation from normality & different group sizes: transform the data (log of the data) or apply …

   


 

Mann-Whitney U-test
Comparison of 2 independent groups for ranked data.
H0: the distributions of both groups are equal / the groups do not differ with respect to their medians.
Assumptions: independence of observations; responses/values are at least ordinal.
Asymptotic method: approximation for large samples (n1 > 10 and n2 > 10: U is approximately normally distributed).
Exact method: gives the exact significance (in small samples, n < 50).
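A minimal sketch of this decision path with scipy (the tail lengths are made-up numbers):

import numpy as np
from scipy import stats

male = np.array([8.1, 7.9, 8.4, 8.0, 7.7, 8.3, 8.2, 7.8])     # hypothetical tail lengths (cm)
female = np.array([7.6, 7.4, 7.9, 7.5, 7.8, 7.3, 7.7, 7.6])

print(stats.levene(male, female))                              # homogeneity of variances
print(stats.ttest_ind(male, female))                           # pooled-variance t-test
print(stats.ttest_ind(male, female, equal_var=False))          # Welch correction if variances differ
print(stats.mannwhitneyu(male, female))                        # rank-based alternative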

4.4. Simple cases: 2 related groups on one metric character
Example: one group of n = 12 mice was tested for the effect of ‘Red Bull’ on their explorative behaviour in the T-maze, compared to the application of saline.
H1: the number of head-dips differs between the two conditions (the same entities tested twice)
H0: the population mean of the paired differences between the two samples is 0

Paired-samples t-test
• one sample has been tested twice (repeated measures), or
• two samples have been “matched” or “paired” in some way (matched-subjects design)
Assumptions: paired observations; independent pairs; approximately normal distribution of the difference scores; homogeneity of variances; no significant outliers
Violations: robust against violation of the normality assumption
What if …? Strong deviation from normality: correction of outliers or apply …

Wilcoxon signed-rank test
Comparison of 2 related groups for ranked data.
H0: the median difference between the pairs is zero
Assumptions: the data are paired and come from the same population; each pair is chosen randomly and independently; the data are measured at least on an ordinal scale; the distribution of the differences is symmetric around the median
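A minimal sketch of the paired comparison and its rank-based fallback with scipy (the head-dip counts are invented):

import numpy as np
from scipy import stats

saline = np.array([12, 15, 9, 14, 11, 13, 10, 16, 12, 14, 11, 13])     # hypothetical head-dips
redbull = np.array([17, 16, 13, 18, 12, 17, 14, 19, 15, 18, 13, 16])

print(stats.ttest_rel(redbull, saline))     # paired-samples t-test on the same 12 mice
print(stats.wilcoxon(redbull, saline))      # Wilcoxon signed-rank test if normality is doubtful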

4.5. One-sample t-test

H0: the sample mean does not deviate from the population parameter µ

 


Assumptions: normal distribution of the dependent variable; random sample from the population; independence of observations; known population mean
Violations: robust against violation of the normality assumption for sample sizes equal to or greater than 30
___________________________________________________________________________

4.6. Simple cases: more than 2 independent groups on one metric character
Example:
H1: different drug treatments in the parent animals reveal differences in the tail length of their offspring
H0: µi = µj = … = µk
one (between) factor (≥ 3 levels): alcoholics, junkies, control
one dependent metric variable: tail length of the offspring

Univariate one-way ANOVA
Assumptions: independence of observations & random samples; normal distribution of the dependent variable (in each group) → Shapiro-Wilk test; homogeneity of variances → Levene’s test
Violations: independence – violation is very serious; violation of ND: robust with respect to the Type I error; skewness has little effect, platykurtosis attenuates power; homogeneity: robust if group sizes are approximately equal (largest/smallest < 1.5)
What if …? Strong deviation from normality and/or unequal sample sizes: correction of outliers; transformation of the raw data; or apply …

Kruskal-Wallis (H) test
More than 2 independent groups on ranked/ordinal data; normality not assumed; outliers present.
H0: the samples originate from the same distribution
Assumptions: independence of observations; the responses are ordinal; identically shaped and scaled distribution for each group
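A minimal sketch of this comparison with scipy (the offspring tail lengths per parental treatment group are made up):

import numpy as np
from scipy import stats

alcohol = np.array([7.2, 7.5, 7.1, 7.4, 7.3, 7.6])     # hypothetical tail lengths (cm)
opioid = np.array([6.8, 7.0, 6.9, 7.1, 6.7, 7.0])
control = np.array([7.6, 7.8, 7.7, 7.9, 7.5, 7.8])

print(stats.f_oneway(alcohol, opioid, control))         # univariate one-way ANOVA
print(stats.kruskal(alcohol, opioid, control))          # rank-based alternative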

___________________________________________________________________________

4.7. ANOVA/MANOVA models


 

Whenever possible, try to get on with an ANOVA method: it has the highest power and allows the maximum output from the data! The great benefit of all ANOVA designs comes with the addition of a 2nd or 3rd factor, since we can test more hypotheses within a single test. Typically there are many factors, such as sex, age, genotype or diet, which can influence the outcome of an experiment. Factorial designs are efficient and provide extra information (the interactions between the factors) which cannot be obtained when using single-factor designs; and factorial designs are powerful because differences among the levels of each factor are determined by averaging across all other factors.

Overview of ANOVA/MANOVA designs (> 2 groups):

ANOVA (one dependent variable), independent (univariate) design:
• one-way: 1 factor, e.g. 3 mouse strains
• two-way: 2 factors, e.g. 3 mouse strains x 2 sexes (3 x 2 design)
• three-way: 3 factors, e.g. 3 mouse strains x 2 sexes x 3 doses (3 x 2 x 3 design)

ANOVA (one dependent variable), repeated measures / mixed-model repeated measures design:
• 3 mouse strains at t1, t2, t3 (3 x 3)
• 3 mouse strains x 2 sexes at t1, t2, t3 (3 x 2 x 3)
• 3 mouse strains x 2 sexes x 3 doses at t1, t2, t3

MANOVA (more than one dependent variable), independent (multivariate) design:
• 1 factor: 3 mouse strains on BGL, temperature, cholesterol
• 2 factors: 3 mouse strains x 2 sexes on BGL, temperature, cholesterol (3 x 2)
• 3 factors: 3 mouse strains x 2 sexes x 3 doses on BGL, temperature, cholesterol (3 x 2 x 3)

MANOVA, repeated measures (multivariate) design:
• the same designs, measured at t1, t2, t3

(M)ANCOVA (independent and/or repeated measures; univariate or multivariate):
• 1, 2 or 3 fixed factors plus a covariate

   


 

4.8. Advanced cases: more than one independent group/factor on one metric character

(Univariate) two-way ANOVA
factor 1 (sex), factor 2 (drug); one dependent metric variable: tail length
3 hypotheses in one (omnibus) test:
• main effect 1: no difference between male and female
• main effect 2: no difference between drug treatments
• interaction: no difference between sex x drug
3 x 2 design: one test instead of 7 t-tests (α inflation)!

Example: two-way ANOVA: 2 x 3 design on one dependent variable: tail length

Basic concept of the sum-of-squares (SS) decomposition:
SSt (total variability) = SSm (variance explained by the experimental manipulation) + SSr (unexplained variance)
SSm = SSA (variance explained by factor A) + SSB (variance explained by factor B) + SSAxB (variance explained by the interaction)
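A minimal sketch of such an omnibus two-way test, assuming the statsmodels formula interface (the data frame, factor levels and tail lengths are hypothetical):

import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# hypothetical long-format data: one row per mouse
df = pd.DataFrame({
    "sex":  ["m", "m", "m", "f", "f", "f"] * 4,
    "drug": ["A", "B", "C"] * 8,
    "tail": [8.1, 7.6, 7.9, 7.4, 7.2, 7.5, 8.3, 7.8, 8.0, 7.5, 7.1, 7.6,
             8.0, 7.7, 7.8, 7.3, 7.0, 7.4, 8.2, 7.9, 8.1, 7.6, 7.3, 7.7],
})

model = smf.ols("tail ~ C(sex) * C(drug)", data=df).fit()   # main effects + interaction
print(anova_lm(model, typ=2))                               # SS and F for sex, drug and sex:drug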

   


 

Specific hypotheses: predefined contrasts – comparisons constructed to answer specific research questions:
• Deviation
• Simple
• Repeated
• Helmert
• Difference

or post-hoc comparisons:
• Fisher’s least significant difference (LSD): liberal
• Studentized Newman-Keuls: lacks control over the FWER
• Bonferroni (small number of means): controls the Type I error, but conservative
• Tukey (large number of means): controls the Type I error, but conservative
• Dunn: conservative
• Scheffé: conservative
• REGWQ (Ryan-Einot-Gabriel-Welsch Q): good power; comparison of all pairs; not for different group sizes

Keep in mind: Type I. error rate and statistical power are linked. If a test is conservative (prob. of Type I. error small) it is likely to lack statistical power (prob. of Type II. error high)

4.9. Advanced cases: more than 2 groups, more than 2 time points

ANOVA for repeated measures: > 2 groups and more than 2 time points; an “extension of the paired t-test” to more than 2 groups; one dependent metric variable measured at more than two points in time.

or Mixed-model repeated measures: one (between) factor (group variable); one dependent metric variable measured more than 2 times (within factor). Variants: one between – one within factor; one between – two within factors; two between – one within factor; and so on.

Benefit: randomized block designs; the variability among subjects due to individual differences is completely removed from the error term → more power and fewer subjects needed.

Assumptions: independence of observations; multivariate normality; sphericity: homogeneity of the variances of the differences between the repeated-measures levels (k > 2; Mauchly’s sphericity test; ε = 1 means sphericity holds; maximum deviation: ε = 1/(k-1))
Violations: independence – violation is very serious; violation of multivariate ND: robust with respect to the Type I error; sphericity: Greenhouse-Geisser ε for correction of the df, or multivariate approach (if n > k + 10)


 

Example: two-way ANOVA for repeated measures
2 fixed (between) factors: strain (3 levels); sex (2 levels)
1 repeated-measures factor: BGL at t0, t1 and t2 (within)
18 cell means

Hypotheses:
main effect 1: no difference between mouse strains (12 x 12 x 12 mice)
main effect 2: no difference between male and female mice (24 x 24 mice)
within effect: no difference between points in time (36 mice)
interactions: no difference for strain x sex; strain x time; sex x time; strain x sex x time

Post-hoc pairwise comparisons when the sphericity assumption is violated:
when ε > .70 → Tukey
when ε < .70 → Bonferroni

What if …? Friedman rank ANOVA – but only for one repeated-measures factor!
H0: no difference in the central tendency of the samples
Assumptions: the data are paired; each set of related measurements is chosen randomly and independently; the data are measured at least on an ordinal scale; comparable distributions of the samples
___________________________________________________________________________

   


 

4.10. Advanced cases: several factors on more than one metric character – the multivariate approach

One-way MANOVA
one fixed factor (between); more than one dependent metric variable
Assumptions: independence of observations & random samples; the observations on the dependent variables follow a multivariate ND within each group; the population covariance matrices for the p dependent variables are equal
Violations: independence – violation is very serious; violation of multivariate ND: robust with respect to the Type I error; platykurtosis attenuates power; in practice hardly ever tested; covariance matrices: robust if group sizes are approximately equal (largest/smallest < 1.5)

Example: one-way MANOVA
1 fixed (between) factor: sex (2 levels)
3 dependent variables: BGL, cholesterol, MCV
2 x 3 design: 6 cell means

After-treatment variables, per sex:

Sex        BGL     Chol.     MCV
female
male

H0: no difference between f/m with respect to BGL or Chol. or MCV
    no difference between f/m with respect to the pattern/relation between BGL, Chol. and MCV
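A minimal sketch of this test, assuming statsmodels’ MANOVA class (the data frame and measurements are hypothetical):

import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# hypothetical after-treatment measurements for 10 mice
df = pd.DataFrame({
    "sex":  ["f", "f", "f", "f", "f", "m", "m", "m", "m", "m"],
    "BGL":  [5.2, 5.0, 5.4, 5.1, 5.3, 5.8, 6.0, 5.7, 5.9, 6.1],
    "Chol": [3.1, 3.0, 3.3, 3.2, 3.1, 3.6, 3.8, 3.5, 3.7, 3.9],
    "MCV":  [48.0, 47.5, 49.0, 48.5, 47.8, 50.2, 51.0, 49.8, 50.5, 51.3],
})

mv = MANOVA.from_formula("BGL + Chol + MCV ~ sex", data=df)
print(mv.mv_test())        # Wilks' lambda, Pillai's trace etc. for the sex effect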

_____________________________________________________________________

   


 

4.11. Correlation
Example: … a dramatic increase in the number of people with diabetes mellitus. For monitoring diabetes it would thus be of high interest whether there is a relation between the glucose level in blood and the glucose level in saliva. 2 interval-scaled variables: is there a relation? Do the variables co-vary?

Standardized covariance: Pearson’s product-moment correlation coefficient

Bivariate correlation
Assumptions: linearity; normality; small samples are biased by outliers; interval scale level

Assumptions met:           Pearson r
Non-linearity:             transform the data
Non-normality/outliers:    Spearman rho (rs); Kendall’s τ (for small samples and tied ranks)
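A minimal sketch of these coefficient choices with scipy (the paired blood/saliva glucose values are made up):

import numpy as np
from scipy import stats

blood = np.array([4.8, 5.2, 5.9, 6.4, 7.1, 7.8, 8.3, 9.0])       # hypothetical glucose levels
saliva = np.array([0.10, 0.12, 0.13, 0.15, 0.17, 0.18, 0.21, 0.24])

print(stats.pearsonr(blood, saliva))      # assumptions met: linear, normal, interval scaled
print(stats.spearmanr(blood, saliva))     # rank-based: non-normality or outliers
print(stats.kendalltau(blood, saliva))    # small samples and tied ranks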

   


 

If one of the two variables is dichotomous . . .
Point-biserial correlation (rpb): the dichotomous variable is a true, discrete dichotomy; ‘dead or alive’, ‘male/female’, ‘marker present/absent’
Biserial correlation (rb): the dichotomous variable has an underlying continuum (‘artificial’ dichotomy); e.g. ‘passing or failing an exam’
Partial correlation: quantifies the relationship between two variables while accounting for the effect of a third variable on both.
___________________________________________________________________________

4.12. Linear regression: simple
Example: … we want to know whether the cholesterol level as a baseline measure can be used as a predictor for the cholesterol level after one month. So again, we fit a model to our data …

Constant b0: the predicted value of y when X = 0.

Model Summary
Model    R       R Square    Adjusted R Square    Std. Error of the Estimate
1        .861    .741        .740                 25.258
a. Predictors: (Constant), Cholesterol, baseline

R²: the baseline cholesterol level accounts for 74.1% of the variation.
R: an estimate of the overall fit of the regression model.

   


 

ANOVA
Model 1        Sum of Squares    df     Mean Square     F          Sig.
Regression     314337.948        1      314337.948      492.722    .000
Residual       109729.408        172    637.962
Total          424067.356        173
a. Predictors: (Constant), Cholesterol, baseline
b. Dependent Variable: Cholesterol after 1 Month

F-ratio: the regression model results in a significant prediction.

Coefficients
                          Unstandardized Coefficients    Standardized Coefficients
Model 1                   B         Std. Error           Beta                        t         Sig.
(Constant)                34.546    9.416                                            3.669     .000
Cholesterol, baseline     .863      .039                 .861                        22.197    .000
a. Dependent Variable: Cholesterol after 1 Month

Gradient of the regression line (b1): the change in the outcome associated with a one-unit change in the predictor.
Assumptions:
• Additivity and linearity
• Independent errors: residuals should be uncorrelated; Durbin-Watson test around 2
• Homoscedasticity (variances are assumed not to differ significantly)
• Normally distributed errors
• Predictors are uncorrelated with ‘external variables’
• No perfect multicollinearity
Variable types: predictor variables: quantitative or categorical (at least 2 categories); outcome variable: quantitative, continuous.

chol1 = 34.546 + (0.863 x chol0)
For a baseline cholesterol level of 280, the cholesterol level after one month is expected to be about 276.
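A minimal sketch of fitting and using such a simple regression with scipy (the baseline/one-month cholesterol values are made up, so the coefficients will not match the tables above):

import numpy as np
from scipy import stats

baseline = np.array([180, 200, 220, 240, 260, 280, 300, 320])      # hypothetical values
month1 = np.array([190, 205, 228, 242, 255, 275, 298, 310])

fit = stats.linregress(baseline, month1)
print(fit.slope, fit.intercept, fit.rvalue**2)                      # b1, b0 and R squared

predicted = fit.intercept + fit.slope * 280                         # prediction for a baseline of 280
print(predicted)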

   


 

4.13. Multiple regression
Example: … we want to include other predictors in the model: age, body weight, blood glucose level (BGL) …
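A minimal sketch of extending the model with additional predictors, assuming the statsmodels formula interface (the data frame and column names are hypothetical):

import pandas as pd
import statsmodels.formula.api as smf

# hypothetical data: cholesterol after one month plus candidate predictors
df = pd.DataFrame({
    "chol1":  [190, 205, 228, 242, 255, 275, 298, 310],
    "chol0":  [180, 200, 220, 240, 260, 280, 300, 320],
    "age":    [34, 41, 52, 46, 58, 63, 49, 55],
    "weight": [68, 75, 82, 79, 88, 91, 77, 85],
    "BGL":    [4.9, 5.1, 5.6, 5.3, 6.0, 6.4, 5.5, 5.9],
})

model = smf.ols("chol1 ~ chol0 + age + weight + BGL", data=df).fit()
print(model.params)        # b0 and one slope per predictor
print(model.rsquared)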

___________________________________________________________________________

4.14. Analysis of frequencies: χ²-procedures

Basic idea of all χ²-procedures: comparison of an observed frequency with an expected frequency according to a theoretical distribution (normal, equal, other). Is the difference between the expected and observed frequency due to sampling error, or is it a real difference?

… compare whether the parents’ drug addiction led to a different risk of dying (for three categories) by application of a one-dimensional χ²-test, or conduct a k x l χ²-test and compare the frequencies of two categorical variables in one test (sex x parental drug addiction: risk of dying).
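A minimal sketch of both procedures with scipy (the frequency counts are invented):

import numpy as np
from scipy import stats

# one-dimensional chi-square: observed deaths per parental-treatment category vs. equal expectation
observed = np.array([18, 12, 6])
print(stats.chisquare(observed))                    # tests against equal expected frequencies

# k x l chi-square: sex x parental drug addiction, counts of dead offspring
table = np.array([[10, 8, 3],
                  [8, 4, 3]])
print(stats.chi2_contingency(table))                # chi2, p, dof, expected frequencies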

   


 

k x l χ²-test: frequency table

Overview: Chi-square procedures

dichotomous (2-tiered) variables:
• 1 variable: 1 time: one-dimensional χ²; 2 times: McNemar test of symmetry (pre-/post); k times: Cochran Q test
• 2 variables: 4-field χ²-test
• k variables: configural frequency analysis for dichotomous variables

multi-level (categorical) variables:
• 1 variable: one-dimensional χ²-test: comparison of an empirical and a theoretical distribution
• 2 variables: k x l χ²-test
• k variables: configural frequency analysis for multi-level variables

References and recommendations:

Books
Brilliant introduction to the basics of statistics, with lots of laughter:
Field, A. (2012). Discovering Statistics using IBM SPSS Statistics. 4th ed.; Sage Publications.
or
Field, A. (2012). Discovering Statistics using R …; Sage Publications.
Field, A. (2012). Discovering Statistics using SAS …; Sage Publications.

More advanced/specialized literature:

 


Stevens, J.P. (2009). Applied Multivariate Statistics for the Social Sciences. 5th ed.; Routledge, NY.
Tinsley & Brown (2000). Handbook of Applied Multivariate Statistics and Mathematical Modelling.
Cohen et al. (2003). Applied Multiple Regression/Correlation Analysis for the Behavioural Sciences. 3rd ed.
Ewens, W.J. & Grant, G.R. (2002). Statistical Methods in Bioinformatics. Springer, New York.

Very good for small samples and non-parametric tests (though in German):
Bortz, J. & Lienert, G.A. (2008). Kurzgefasste Statistik für die Klinische Forschung. Leitfaden für die verteilungsfreie Analyse kleiner Stichproben. Springer, Berlin.

Basic books and bibles:
Stevens (1999). Intermediate Statistics. A Modern Approach.
Gravetter & Wallnau (2012). Statistics for the Behavioral Sciences. 9th ed.
Bortz, J. (1999). Statistik für Sozialwissenschaftler. 5. Auflage; Springer, Berlin.
Bortz, J. & Döring, N. (2006). Forschungsmethoden und Evaluation für Human- und Sozialwissenschaftler (Springer-Lehrbuch). 4. Auflage; Springer, Berlin.
___________________________________________________________________________