Tests for categorical data

Author: Barbara Price

Variables

• Numerical (quantitative)
    • Discrete, e.g. sick days
    • Continuous, e.g. weight, height
• Categorical (qualitative)
    • Ordinal, e.g. disease stage
    • Nominal, e.g. sex, blood group

Two examples of categorical variables: questions from the population survey in Östergötland 2006.

What is your highest level of education?
• Primary school (9 years)
• College (2-4 years)
• University

What do you do for a living?
• Employed
• Unemployed
• Student
• Other

Comparing the results for the two main cities in Östergötland.

Comparing the results for men and women in Östergötland.

It is better to compare relative frequencies (percentages) than raw counts, because the groups being compared need not be equally large.
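As a small illustration of why relative frequencies are easier to compare, the sketch below converts counts to shares of each group's total. The city names and counts are hypothetical placeholders, not the survey results.

```python
# Hypothetical counts per education level for two groups (illustration only,
# not the Östergötland survey data).
counts = {
    "City A": {"Primary school": 120, "College": 230, "University": 150},
    "City B": {"Primary school": 90, "College": 140, "University": 70},
}


def relative_frequencies(group_counts):
    """Return each category's share of the group's total sample."""
    total = sum(group_counts.values())
    return {level: n / total for level, n in group_counts.items()}


rel = {city: relative_frequencies(c) for city, c in counts.items()}
for city, shares in rel.items():
    print(city, {level: f"{100 * p:.1f} %" for level, p in shares.items()})
```

The raw counts differ in every category simply because the two samples differ in size; the shares put both groups on the same scale.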

A statistical test for differences in the distribution of level of education.
H0: The population distribution is the same in the two cities
H1: The population distribution is not the same in the two cities

Remember: based on our sample, we are testing hypotheses about population characteristics.

The same principles as for the t- or z-test apply:
• State your statistical hypotheses
• Draw a random sample
• Calculate the test variable, given that H0 is true
• Reject H0 or do not reject H0

What are the expected frequencies, given that H0 is true?

The expected count for primary school in Linköping, given that H0 is true, is the overall proportion with primary school education multiplied by the Linköping sample size.

A general formula for expected counts is

$$E = \frac{\text{row marginal total} \times \text{column marginal total}}{\text{total}}$$

For the education table this gives expected counts of 261.0, 495.6, 244.0, and so on.
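The general formula for expected counts can be sketched in a few lines of Python. The function name `expected_counts` and the 3 x 2 table are assumptions made for illustration; the counts are hypothetical and are not the survey data.

```python
def expected_counts(observed):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 3 x 2 table: rows = education levels, columns = two cities.
observed = [
    [120, 90],
    [230, 140],
    [150, 70],
]
expected = expected_counts(observed)
for row in expected:
    print([round(e, 2) for e in row])
```

Each row of expected counts adds up to the observed row total, which is a quick sanity check on any hand calculation.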

The test variable compares observed frequencies (O) to expected frequencies (E):

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where the sum runs over all cells of the table.

The test variable is approximately χ²-distributed with (r−1)(c−1) degrees of freedom, given that H0 is true, where r is the number of rows and c the number of columns in the table.

Assumption: All expected values ≥ 5.

Choose a significance level, e.g. α = 0.05, and compare the value of the test variable with the critical value found in a χ² table.

Reject H0 if the observed χ² > critical value.

Conclusion: Reject H0. The distribution of level of education is probably not the same in Linköping and Norrköping.
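The whole procedure (expected counts, test variable, degrees of freedom, assumption check, decision) can be put together as below. This is a minimal sketch under stated assumptions: the function name `chi_square_test` and the counts are hypothetical, the critical value 5.991 is the standard table value for 2 d.f. at α = 0.05, and the p-value shortcut is exact only for 2 d.f.

```python
import math


def chi_square_test(observed):
    """Return (statistic, degrees of freedom, expected counts) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


observed = [[120, 90], [230, 140], [150, 70]]  # hypothetical counts, 3 x 2
stat, df, expected = chi_square_test(observed)

# Assumption from the slides: all expected counts should be at least 5.
assumption_ok = all(e >= 5 for row in expected for e in row)

CRITICAL_2DF_05 = 5.991  # chi-square table value for 2 d.f. at alpha = 0.05
reject_h0 = stat > CRITICAL_2DF_05
p_value = math.exp(-stat / 2)  # upper-tail probability, valid for 2 d.f. only

print(f"chi2 = {stat:.2f}, d.f. = {df}, p = {p_value:.3f}, reject H0: {reject_h0}")
```

For tables of other sizes the critical value has to be looked up for the corresponding degrees of freedom, exactly as with the printed table.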

In SPSS you find the Chi2 test here: [screenshots of the SPSS menus]

Example: Compare proportions in more than two groups. Suppose a new drug is to be tested in four different patient groups. The result is a binary variable: either there is a positive response or there is not.

Patient group        1     2     3     4     Total
Positive response    45    35    30    45    155 (38.75 %)
No response          55    65    70    55    245 (61.25 %)

Hypotheses:
H0: π1 = π2 = π3 = π4
H1: at least one πi ≠ πj

Observed counts, with expected counts under H0 in parentheses:

Patient group        1             2             3             4
Positive response    45 (38.75)    35 (38.75)    30 (38.75)    45 (38.75)
No response          55 (61.25)    65 (61.25)    70 (61.25)    55 (61.25)

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 7.11$$

with 3 d.f., p-value = 0.068.

Conclusion: H0 is not rejected at the 5 % level (p = 0.068 > 0.05). There is no statistical evidence of different proportions.
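The drug example can be checked with a short script using the counts from the table. The helper `chi2_sf_3df` is an assumption introduced for this sketch: it uses a closed-form expression for the upper tail of the χ² distribution that holds only for exactly 3 degrees of freedom, so that no statistics library is needed.

```python
import math

observed = [
    [45, 35, 30, 45],  # positive response, patient groups 1-4
    [55, 65, 70, 55],  # no response
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum of (O - E)^2 / E over all cells, with E = row total * column total / n.
chi2 = sum(
    (o - r * c / n) ** 2 / (r * c / n)
    for row, r in zip(observed, row_totals)
    for o, c in zip(row, col_totals)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with exactly 3 d.f."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_sf_3df(chi2)
print(f"chi2 = {chi2:.2f}, d.f. = {df}, p = {p_value:.3f}")
```

Because each patient group has 100 patients, every column has the same expected counts, 38.75 and 61.25.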

Another application of a Chi2 test: Suppose you want to test the distribution of a categorical variable. You have a sample and want to find out whether it comes from a population with a particular distribution.

Example: Let the definition of socioeconomic status divide the population of Sweden into four categories of equal size. (It could be based on e.g. occupation, education or income.) We have a sample from a population of patients and want to check whether the socioeconomic distribution in the patient population is the same as in the Swedish population.

Formalise this into statistical hypotheses.
H0: π1 = 0.25, π2 = 0.25, π3 = 0.25, π4 = 0.25
H1: not all proportions are as specified in H0

Socioeconomic category    1            2            3            4
Observed counts (%)       24 (20.0)    33 (27.5)    42 (35.0)    21 (17.5)
Expected counts (%)       30 (25.0)    30 (25.0)    30 (25.0)    30 (25.0)

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 9.0$$

In a one-sample case, d.f. = number of categories − 1, so here there are 3 d.f. With α = 0.05 the critical value is 7.81.

Since 9.0 > 7.81, reject H0: the distribution in the patient population is not the same as specified in H0.
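The one-sample calculation can be reproduced as follows, using the observed counts and the hypothesised proportions from the example. The variable names are illustrative, and the critical value is the table value quoted in the slides.

```python
observed = [24, 33, 42, 21]              # counts per socioeconomic category
hypothesised = [0.25, 0.25, 0.25, 0.25]  # proportions under H0

n = sum(observed)
expected = [n * p for p in hypothesised]  # expected count in each category
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

CRITICAL_3DF_05 = 7.81  # table value for 3 d.f. at alpha = 0.05, as quoted in the slides
reject_h0 = chi2 > CRITICAL_3DF_05
print(f"chi2 = {chi2:.1f}, d.f. = {df}, reject H0: {reject_h0}")
```

The hypothesised proportions do not have to be equal; any set of proportions that sums to 1 can be placed in `hypothesised`.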

You find the one-sample Chi2 test in SPSS here: [screenshots of the SPSS menus]

Summary of χ² tests for categorical variables

• Comparing the distributions of two or more populations. Degrees of freedom = (r−1)(c−1)
• Comparing the distribution of one population to a hypothesised distribution. Degrees of freedom = number of categories − 1
• Compute the expected counts given that H0 is true.
• Calculate $\chi^2 = \sum \frac{(O - E)^2}{E}$
• Compare to the critical value or look at the p-value
• Reject or do not reject H0