The Gaussian distribution (normal distribution)

Author: Rolf Banks
2009-09-01

The Gaussian distribution (normal distribution)

When the distribution of the observations is normal, then:

95% of all observations are located in the interval from mean - 1.96⋅SD to mean + 1.96⋅SD. This interval represents a descriptive 95% range for the individual observations, whereas the 95% CI for the mean (mean ± 1.96⋅SEM, where SEM = SD / sqrt(n)) represents the statistical uncertainty of the arithmetic mean.

90% of all observations are located in the interval from mean - 1.64⋅SD to mean + 1.64⋅SD. This interval represents a descriptive 90% range for the individual observations, whereas the 90% CI for the mean (mean ± 1.64⋅SEM) represents the statistical uncertainty of the arithmetic mean.

99% of all observations are located in the interval from mean - 2.58⋅SD to mean + 2.58⋅SD. This interval represents a descriptive 99% range for the individual observations, whereas the 99% CI for the mean (mean ± 2.58⋅SEM) represents the statistical uncertainty of the arithmetic mean.
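The difference between the descriptive range (based on SD) and the confidence interval for the mean (based on SEM) can be illustrated with a short sketch. The function names and the sample values below are hypothetical and chosen only for illustration; the confidence interval uses the large-sample normal multipliers quoted above.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_multiplier(coverage):
    """Two-sided standard normal multiplier, e.g. 0.95 -> about 1.96."""
    return NormalDist().inv_cdf(0.5 + coverage / 2)


def descriptive_range(values, coverage=0.95):
    """Range expected to hold `coverage` of the individual observations (uses SD)."""
    m, sd = mean(values), stdev(values)
    z = z_multiplier(coverage)
    return m - z * sd, m + z * sd


def ci_for_mean(values, coverage=0.95):
    """Large-sample confidence interval for the mean (uses SEM = SD / sqrt(n))."""
    m, sem = mean(values), stdev(values) / sqrt(len(values))
    z = z_multiplier(coverage)
    return m - z * sem, m + z * sem


# Hypothetical measurements, for illustration only.
sample = [4.8, 5.1, 5.4, 4.9, 5.0, 5.3, 5.2, 4.7, 5.1, 5.0]

for coverage in (0.90, 0.95, 0.99):
    print(coverage, round(z_multiplier(coverage), 2))

print("95% range for observations:", descriptive_range(sample))
print("95% CI for the mean:       ", ci_for_mean(sample))
```

The confidence interval for the mean is always narrower than the descriptive range, because SEM is SD divided by the square root of the sample size.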


HYPOTHESIS TESTING in statistical inference

1. State hypotheses for studied groups or populations
2. Verify hypotheses by using appropriate statistical tests

The null hypothesis (often written Ho) proposes that there is no difference. The null hypothesis is the basis of a statistical test. If a "significant difference" is found, the null hypothesis is rejected, but if no difference is found, the null hypothesis is accepted (strictly speaking, it is not rejected).

When the null hypothesis is rejected, then the alternative hypothesis (often written H1) is accepted.

What about an error of statistical inference?


Types of "error" are defined in terms of the null hypothesis. Two potential errors could be made. They are called Type I and Type II.

                         Decision
Null hypothesis     Accept Ho     Reject Ho
True                OK            Type I
False               Type II       OK

These four outcomes summarize the possibilities surrounding these errors. First, if we are given a situation in which the null hypothesis is true, that is, there is no difference, we can either accept it and make the correct decision, or reject it and make an incorrect decision, a Type I error. Second, if the null hypothesis is false, we can reject it, making a correct decision, or accept it and make an incorrect decision, a Type II error.
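The four outcomes can also be written as a small lookup, which may help when checking which error applies in a given situation. This is only an illustrative sketch; the function name `classify_decision` is hypothetical.

```python
def classify_decision(null_is_true, null_rejected):
    """Return the outcome of a test decision, given the (unknown) truth of Ho."""
    if null_is_true:
        # Rejecting a true null hypothesis is a Type I error.
        return "Type I error" if null_rejected else "correct decision"
    # Accepting (not rejecting) a false null hypothesis is a Type II error.
    return "correct decision" if null_rejected else "Type II error"


for null_is_true in (True, False):
    for null_rejected in (False, True):
        outcome = classify_decision(null_is_true, null_rejected)
        print(f"Ho true={null_is_true}, Ho rejected={null_rejected}: {outcome}")
```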


Statistical Inference

In statistics, a sample is used to obtain results that are representative of the target population (the whole) to which we want to generalize.

How can a sample that is representative of that population be selected? One way is by random selection. A random sample is drawn from the population of interest so that every member of the population has the same chance of being selected in the sample.
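A simple random sample without replacement can be sketched as follows. The population of 500 numbered members, the sample size, and the function name are assumptions made for illustration.

```python
import random


def draw_simple_random_sample(population, sample_size, seed=None):
    """Draw a sample without replacement; every member is equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(population, sample_size)


# Hypothetical population: members identified by the numbers 1..500.
population_ids = list(range(1, 501))
sample_ids = draw_simple_random_sample(population_ids, 20, seed=1)
print(sorted(sample_ids))
```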

PARAMETRIC vs. NONPARAMETRIC TESTS

                      Nonparametric tests                          Parametric analog
                      Nominal data    Ordinal data
Independent groups
  Two-group case      Chi-square      Mann-Whitney U               t test
  k-group case        Chi-square      Kruskal-Wallis               One-way ANOVA
Dependent groups
  Two-group case      McNemar test    Wilcoxon matched samples     Paired t test
  k-group case        -               Friedman                     Repeated-measures ANOVA


CHI-SQUARE TEST

It is used when the data are nominal (categorical) and presented in tables of counts, frequently to test whether two or more characteristics are independent. It compares the actual number (or frequency) in each group with the "expected" number. We present the observed counts as a contingency table. The number of rows and columns of the table depends on the number of categories of the two variables. Under independence, the expected count of a cell is (row total × column total) / grand total. In a contingency table, we compute an index for each cell in the following way: (observed count - expected count)^2 / expected count, and then we sum this index over all cells. If O stands for the observed count and E for the expected count, the index is written:

    chi^2 = Σ (O - E)^2 / E

Example 1: At City Hospital, patients often do not show up for their clinic appointments. To determine whether a telephone reminder improves the show rate, you telephoned 25 patients the day before their appointment. Of the 25 patients who were telephoned, 20 kept their appointments. Only 8 of 20 who were not telephoned attended the clinic.
(a) Construct a 2x2 table of observed and expected counts assuming independence between the telephone reminder and the show rate.
(b) Compute the chi-square test and the P value.
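One way to work through Example 1 is sketched below, using only the Python standard library. The function names are hypothetical, and no continuity (Yates) correction is applied, which is an assumption about how the exercise is meant to be solved. For a 2x2 table there is 1 degree of freedom, and the upper-tail P value can then be obtained from the normal distribution via `math.erfc`.

```python
from math import erfc, sqrt


def expected_counts(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Sum of (O - E)^2 / E over all cells of the contingency table."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def p_value_df1(chi2):
    """Upper-tail P value of the chi-square distribution with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2))


# Rows: telephoned, not telephoned. Columns: attended, did not attend.
observed = [[20, 5], [8, 12]]

expected = expected_counts(observed)
chi2 = chi_square_statistic(observed)
p = p_value_df1(chi2)

print("expected counts:", [[round(e, 2) for e in row] for row in expected])
print("chi-square:", round(chi2, 3), "P value:", round(p, 4))
```

With these counts the statistic is about 7.56 and the P value is about 0.006, so the hypothesis of independence would be rejected at the 5% level.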


Example 2: For three successive 10-week periods, the number of weeks with migraine attacks has been:

Period        First 10 weeks    Second 10 weeks    Third 10 weeks
Attacks       6                 2                  1
No attacks    4                 8                  9
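The slide does not state the question for Example 2. Assuming it asks whether the frequency of attack weeks is independent of the period, the same chi-square index can be computed for the 2x3 table, as in the sketch below (hypothetical function name). The table has (2 - 1) × (3 - 1) = 2 degrees of freedom, for which the upper-tail P value has the closed form exp(-chi2 / 2).

```python
from math import exp


def chi_square_statistic(observed):
    """Sum of (O - E)^2 / E, with E = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / total
            chi2 += (o - e) ** 2 / e
    return chi2


# Rows: weeks with attacks, weeks without attacks. Columns: the three periods.
observed = [[6, 2, 1], [4, 8, 9]]

chi2 = chi_square_statistic(observed)
p = exp(-chi2 / 2)  # valid only for 2 degrees of freedom

print("chi-square:", round(chi2, 3), "P value:", round(p, 4))
```

The expected number of attack weeks is only 3 per period, so the chi-square approximation is rough here and the P value of about 0.036 should be read with caution.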

NONPARAMETRIC TESTS

Comparing two dependent groups - the Wilcoxon test

The Wilcoxon test is used to determine whether two dependent samples (groups) represent similar populations. This distribution-free test is based on the ranked differences between the measures of matched members of the two groups, and is therefore an ordinal test. It is used as an alternative to the paired t test for dependent groups when the sample distributions are not normal.
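The ranking step of the Wilcoxon test can be sketched as follows. The paired measurements are hypothetical, zero differences are dropped (a common convention, assumed here), and tied absolute differences receive the average of their ranks. The sketch stops at the two rank sums; the smaller one would then be compared with a table of critical values.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value equals the current one (a tie).
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def wilcoxon_signed_rank(first, second):
    """Return (W+, W-): rank sums of the positive and negative paired differences."""
    differences = [b - a for a, b in zip(first, second) if b != a]
    ranks = average_ranks([abs(d) for d in differences])
    w_plus = sum(r for d, r in zip(differences, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(differences, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical paired measurements on the same 8 subjects.
before = [142, 150, 138, 160, 155, 147, 152, 149]
after = [136, 151, 130, 150, 153, 140, 147, 152]

w_plus, w_minus = wilcoxon_signed_rank(before, after)
print("W+ =", w_plus, "W- =", w_minus)
```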


Comparing two independent groups - the Mann-Whitney test

The Mann-Whitney test is used when the distribution of a variable is not normal and the t test for two independent groups cannot be used. It is based on procedures whereby the interval scores of the two distributions are combined, the scores are assigned ranks from the lowest to the highest value, and then the sums of the ranks for the two distributions are compared. If the two distributions have similar interval scores, the ranks should also be similarly distributed across the two groups. The test is particularly sensitive to differences in the central tendency of the distributions of the two populations.
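The procedure described above (combine the scores, rank them, compare the rank sums) can be sketched as follows. The two groups are hypothetical data, and the U statistic is derived from the rank sum of the first group in the usual way. The sketch does not compute a P value; the smaller U would be compared with tabulated critical values.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) computed from the ranks of the combined scores."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])  # ranks belonging to the first group
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical scores for two independent groups.
group_a = [12, 15, 11, 18, 14]
group_b = [20, 17, 22, 19, 16]

u_a, u_b = mann_whitney_u(group_a, group_b)
print("U_a =", u_a, "U_b =", u_b)
```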

PARAMETRIC ANALOGS

THE PAIRED t TEST

Two samples are said to be paired when each data point of the first sample is matched and is related to a unique data point of the second sample. The paired samples may represent two sets of measurements on the same people. In this case each person is serving as his or her own control. Analyzing the data, we wish to test the null hypothesis that the mean of the paired differences is zero, i.e., that the difference between the two sample means is not significant.


Let us consider the difference between two corresponding measurements, di = xi - yi. If the di are normally distributed, then the best test of the hypothesis H0 is based on the mean difference, mean(d). Specifically, the paired t test uses the statistic:

    t = mean(d) / (sd_d / sqrt(n))

where sd_d is the standard deviation of the observed differences and n is the number of pairs. Under H0, t follows a t distribution with n - 1 degrees of freedom.
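The paired t statistic can be computed directly from the differences, as in the sketch below. The measurements and the function name are hypothetical. The resulting t would be compared with a t distribution with n - 1 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Return (t, degrees of freedom) for the paired t test on matched samples."""
    d = [xi - yi for xi, yi in zip(x, y)]  # di = xi - yi
    n = len(d)
    t = mean(d) / (stdev(d) / sqrt(n))  # mean difference over its standard error
    return t, n - 1


# Hypothetical paired measurements on the same 5 people.
x = [10, 12, 9, 11, 13]
y = [8, 11, 9, 9, 10]

t, df = paired_t(x, y)
print("t =", round(t, 3), "df =", df)
```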

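t TEST FOR TWO INDEPENDENT SAMPLES

Two samples are called independent when the data points in one sample are unrelated to the data points in the second sample. We will assume that the measurements are normally distributed in the first group and in the second group as well, with a common variance. We wish to test the null hypothesis that the difference between the two sample means is not significant, i.e., that the difference between the means of x and y is 0. The test statistic is:

    t = (mean(x) - mean(y)) / (S ⋅ sqrt(1/n1 + 1/n2))

where S is the pooled standard deviation:

    S = sqrt( ((n1 - 1)⋅s1^2 + (n2 - 1)⋅s2^2) / (n1 + n2 - 2) )

Here s1 and s2 are the standard deviations and n1 and n2 the sizes of the two samples. Under H0, t follows a t distribution with n1 + n2 - 2 degrees of freedom.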
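A minimal sketch of the two-sample t statistic with a pooled standard deviation follows. The two samples and the function name are hypothetical. The pooled form assumes equal variances in the two groups.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Return (t, degrees of freedom) for the equal-variance two-sample t test."""
    n1, n2 = len(x), len(y)
    s1_sq, s2_sq = variance(x), variance(y)  # sample variances s1^2 and s2^2
    pooled_sd = sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
    t = (mean(x) - mean(y)) / (pooled_sd * sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical measurements in two independent groups.
x = [5, 7, 6, 8, 9]
y = [4, 5, 3, 6]

t, df = pooled_t(x, y)
print("t =", round(t, 3), "df =", df)
```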
