Supplementary points. Multiple categorical variables. Data set women94. Multiple Correspondence Analysis.

Author: Oliver Cain
Supplementary points

Multiple categorical variables: Multiple Correspondence Analysis

Missing and “middle” responses

Suppose J substantive categories and K demographic groups.

Indicator matrices of individual respondent (case) data:
• Z1 (cases × J): substantive categories
• Z2 (cases × K): demographic groups

Burt matrix: B = Z1ᵀZ1 (J × J)

CA of B (or adjusted): standard coordinates the same. Supplementary points:
• Z2ᵀZ1 (K × J) shows each demographic category, at the average of the cases in this category
• Z1 (cases × J) shows each case as a point, at the average of his/her responses

Burt matrix of all the variables, in blocks:

          J         K
J      Z1ᵀZ1     Z1ᵀZ2
K      Z2ᵀZ1     Z2ᵀZ2
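The block structure can be checked on a toy example. The following is a minimal Python sketch, not the authors' code: the five cases, the category labels and the helper names `indicator` and `cross` are hypothetical, chosen only to show how Z1ᵀZ1 and Z2ᵀZ1 arise from the indicator matrices.

```python
def indicator(codes, categories):
    """One row per case, one 0/1 column per category."""
    return [[1 if c == cat else 0 for cat in categories] for c in codes]

def cross(A, B):
    """Return A^T B for two matrices with the same number of rows."""
    return [[sum(a[i] * b[j] for a, b in zip(A, B)) for j in range(len(B[0]))]
            for i in range(len(A[0]))]

# Hypothetical toy data: 5 cases, one substantive question, one demographic variable.
answers = ["agree", "middle", "agree", "disagree", "middle"]
gender = ["m", "f", "f", "m", "f"]

Z1 = indicator(answers, ["agree", "middle", "disagree"])  # cases x J, with J = 3
Z2 = indicator(gender, ["m", "f"])                        # cases x K, with K = 2

B = cross(Z1, Z1)    # J x J Burt matrix of the substantive categories
sup = cross(Z2, Z1)  # K x J supplementary rows, one per demographic group

# Profile of a supplementary row = average of the Z1 rows of the cases in that group.
profile_m = [x / sum(sup[0]) for x in sup[0]]
```

With a single question the Burt matrix is diagonal (the category counts); with several questions the off-diagonal blocks hold the two-way cross-tabulations.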

Data set “women94”

Substantive variables: Do you strongly agree / agree / neither…nor… / disagree / strongly disagree to these statements…
A: a working mother can establish a warm relationship with her child
B: a pre-school child suffers if his or her mother works
C: when a woman works the family life suffers
D: what women really want is a home and kids
E: running a household is just as satisfying as a paid job
F: work is best for a woman’s independence
G: a man’s job is to work; a woman’s job is the household
H: working women should get paid maternity leave

Demographic variables
g: gender (1=male, 2=female)
m: marital status (1=married/living as married, 2=widowed, 3=divorced, 4=separated, but married, 5=single, never married)
e: education (0=no formal education, 1=lowest education, 2=above lowest education, 3=higher secondary completed, 4=above higher secondary level, below full university, 5=university degree completed)
a: age (1=16-25 years, 2=26-35, 3=36-45, 4=46-55, 5=56-65, 6=66 and older)

Sample: Spanish sample (year 2002); N=2471 (including missing values)

ISSP 1994 survey on Family and Changing Gender Roles

A [+] A working mother can establish just as warm and secure a relationship with her children as a mother who does not work
B [–] A pre-school child is likely to suffer if his or her mother works
C [–] All in all, family life suffers when the woman has a full-time job
D [–] A job is all right, but what most women really want is a home and children
E [?] Being a housewife is just as fulfilling as working for pay
F [+] Having a job is the best way for a woman to be an independent person
G [?] Most women have to work these days to support their families
H [+] Both the man and woman should contribute to the household income
I [–] A man’s job is to earn money; a woman’s job is to look after the home and family
J [?] It is not good if the man stays at home and cares for the children and the woman goes out to work
K [?] Family life often suffers because men concentrate too much on their work

Data – middle alternative

Common to both surveys is the measurement scale, a 5-point bipolar scale:

strongly agree (extreme category)
agree (moderate category)
neither agree nor disagree
disagree (moderate category)
strongly disagree (extreme category)
missing

We are particularly interested in the middle alternative, how it associates
– with other middle alternatives
– with other response categories
– with the demographic covariates

24 countries (N=33,590)

Of course, there are also missing values, but since we will analyse the data at the nominal level, a missing value is just an additional category.

Background & previous research


Key to the orientation marks on the statements: [+] in favour of working women; [–] against women working; [?] not clear

Presser & Schuman (1980) The measurement of a middle position in attitude surveys. The Public Opinion Quarterly. Revised version in Schuman & Presser, Questions & Answers in Attitude Surveys: Experiments on Question Form, Wording, and Context. Sage, 1996.

Use contingency tables and χ² tests in 5 split-ballot experiments.

They assess the consequences of offering/omitting “a logical middle position, for example whether one is liberal or conservative could be answered by ‘middle-of-the-road’ ”…. “Although there is a very slight decrease in the proportion of spontaneous ‘don't know’ responses when the middle alternative is offered, almost all the change in the middle position comes from a decline in the polar positions.”

Andrich, de Jong & Sheridan (1997) Diagnostic opportunities with the Rasch model for ordered response categories. In: Applications of latent trait and latent class models in the social sciences.

Use Rasch modelling.

“…the middle category designated as Neutral, Not Sure or Undecided in the Likert-style response format … should not be treated as an attitude more or less somewhere between a negative and a positive attitude.”

O’Muircheartaigh, Krosnick & Helic (2000) Middle alternatives, acquiescence, and the quality of questionnaire data. Working paper posted on web.

Use response counts and structural equation modelling.

‘Approximately half the respondents … were asked to select an answer from among the following five alternatives: strongly agree, agree to some extent, neither agree nor disagree, disagree to some extent, strongly disagree. These responses were coded 5, 4, 3, 2, and 1, respectively. The other half of the respondents … were asked to select an answer from a set of four, omitting the “neither agree nor disagree” option. Answers were coded 5, 3.66, 2.33, and 1, respectively.’

…“We also found evidence of acquiescence response bias in answers to the agree/disagree items...”

González-Romá and Espejo (2003) Testing the middle response categories «Not sure», «In between» and «?» in polytomous items. Psicothema.

Use Bock’s Nominal Model to verify ordering.

They use different wordings for the middle category, and find that the ordering is verified only for the wording “in between”.

Hernández, Espejo and González-Romá (2006) The functioning of central categories Middle Level and Sometimes in graded response scales: Does the label matter? Psicothema.

Methods

We use correspondence analysis (CA) and several of its variants to visualize and interpret the relative positions of the response categories, at a country level, a respondent level and a demographic-subgroup level.

1. Simple CA provides maps of aggregated or count data, for example proportions of question responses for the set of countries.
2. Subset correspondence analysis permits focusing on a particular set of response categories so that we can eliminate missing responses, for example, or restrict our attention to specific categories such as the middle responses.
3. Multiple correspondence analysis (MCA) and its refinement joint correspondence analysis (JCA) provide maps of individual-level data, concentrating on the two-way relationships between the questions. In this respect MCA functions like an exploratory factor analysis for categorical data (on nominal scales). Usually we are not interested in individual case points but in mean points for the demographic and other external variables (e.g., age, level of interest…).
4. Canonical correspondence analysis (CCA) focuses on (or partials out) external variables. Thus we can explore variation in the responses that is focused on interest, for example, or eliminate acquiescence effects.

B: A pre-school child is likely to suffer if his or her mother works

Country  B-agree  B-MX   B-disagree
AU       0.497    0.149  0.354
DW       0.684    0.141  0.175
DE       0.325    0.186  0.490
GB       0.377    0.215  0.408
NI       0.382    0.187  0.431
US       0.406    0.142  0.452
A        0.720    0.106  0.174
H        0.727    0.160  0.113
I        0.671    0.152  0.177
IR       0.472    0.120  0.407
NL       0.439    0.225  0.336
N        0.359    0.218  0.424
S        0.272    0.267  0.461
CZ       0.487    0.197  0.315
SL       0.584    0.177  0.238
PL       0.664    0.115  0.220
BG       0.682    0.178  0.140
RU       0.717    0.130  0.153
NZ       0.498    0.182  0.320
CD       0.306    0.183  0.511
RP       0.564    0.144  0.292
IL       0.434    0.193  0.373
J        0.373    0.238  0.389
E        0.516    0.137  0.346
ave      0.514    0.171  0.316

Question B: A pre-school child is likely to suffer if his or her mother works

[Figure: country profiles (B-agree, B-MX, B-disagree) in ternary coordinates: Euclidean distance.]

[Figure: the same country profiles in stretched ternary coordinates: chi-square distance.]

From 3 to 4 categories… Middle response (M) and missing response (X) separated

Animation achieved by saving 101 frames in R of the ternary diagram as the metric changes smoothly from equally weighted Euclidean distance to differentially weighted chi-square distance; the frames are then saved into a GIF file.
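To make the change of metric concrete, here is a small Python sketch using the Question B proportions for AU and DW and the average profile from the three-category table. The interpolation used, weights 1/c^t with t running from 0 to 1, is an assumption made for illustration only; the slides do not say how the 101 intermediate metrics were defined, and the original frames were produced in R.

```python
import math

# Profiles from the three-category table for Question B (agree, MX, disagree).
AU = [0.497, 0.149, 0.354]
DW = [0.684, 0.141, 0.175]
ave = [0.514, 0.171, 0.316]  # average profile, used as the chi-square weights

def weighted_distance(p, q, centre, t):
    """Distance with weights 1 / centre**t (assumed interpolation).

    t = 0 gives the equally weighted Euclidean distance,
    t = 1 gives the chi-square distance.
    """
    return math.sqrt(sum((a - b) ** 2 / c ** t for a, b, c in zip(p, q, centre)))

euclidean = weighted_distance(AU, DW, ave, 0.0)
chi_square = weighted_distance(AU, DW, ave, 1.0)

# One distance per animation frame: 101 values of t from 0 to 1.
frames = [weighted_distance(AU, DW, ave, step / 100) for step in range(101)]
```

Because every average proportion is below 1, each weight grows with t, so the distance between two profiles increases along the animation, and the rarer categories are stretched the most.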

Country  B-agree  B-M    B-disagree  B-X
AU       0.497    0.140  0.354       0.009
DW       0.684    0.104  0.175       0.037
DE       0.325    0.150  0.490       0.036
GB       0.377    0.176  0.408       0.040
NI       0.382    0.151  0.431       0.036
US       0.406    0.121  0.452       0.021
A        0.720    0.088  0.174       0.018
H        0.727    0.146  0.113       0.014
I        0.671    0.138  0.177       0.015
IR       0.472    0.082  0.407       0.038
NL       0.439    0.197  0.336       0.028
N        0.359    0.179  0.424       0.039
S        0.272    0.220  0.461       0.047
CZ       0.487    0.178  0.315       0.020
SL       0.584    0.144  0.238       0.033
PL       0.664    0.068  0.220       0.047
BG       0.682    0.067  0.140       0.110
RU       0.717    0.086  0.153       0.044
NZ       0.498    0.157  0.320       0.026
CD       0.306    0.160  0.511       0.022
RP       0.564    0.143  0.292       0.001
IL       0.434    0.174  0.373       0.019
J        0.373    0.199  0.389       0.039
E        0.516    0.090  0.346       0.047
ave      0.514    0.138  0.316       0.033

Rotation in three dimensions of the country profiles within a tetrahedron, starting and ending with the middle (M) and missing (X) categories lined up (i.e., the two-dimensional map seen previously). Chi-square distance evens out the contributions of the categories.

Question B: A pre-school child is likely to suffer if his or her mother works

From Euclidean to chi2 distance (regular to irregular simplex)

[Figure: the country profiles and the corner B-X, with the principal axes of CA; in higher dimensions the profiles lie in an irregular simplex.]

Two-dimensional CA map, with “asymmetric scaling”, i.e.
• rows (countries) in principal coordinates as the projections of the profiles
• columns (response categories) in standard coordinates as the projections of the corners of the simplex

[Figure: asymmetric CA map of Question B showing the countries and the categories B-agree, B-M, B-disagree and B-X. Principal inertias: 0.0875 (80.4%) on axis 1 and 0.0142 (13.0%) on axis 2; 93.4% inertia explained.]

1994: response proportions, 24 x 66 table

Question A: “A working mother can establish just as warm and secure a relationship with her children as a mother who does not work”
Question K: “Family life often suffers because men concentrate too much on their work”

   AU    DW    DE    GB    NI    US    A     H     I     IR    NL    N     S     CZ    SL    PL    BG    RU    NZ    CD    RP    IL    J     E
A1 0.183 0.369 0.627 0.176 0.155 0.289 0.524 0.327 0.215 0.183 0.208 0.118 0.222 0.212 0.169 0.173 0.293 0.250 0.134 0.308 0.054 0.232 0.515 0.141
A2 0.349 0.356 0.275 0.443 0.468 0.409 0.230 0.197 0.409 0.428 0.490 0.415 0.432 0.261 0.430 0.319 0.218 0.406 0.402 0.412 0.568 0.429 0.158 0.406
AM 0.092 0.040 0.017 0.120 0.085 0.050 0.033 0.194 0.100 0.057 0.098 0.129 0.131 0.096 0.077 0.065 0.071 0.069 0.093 0.069 0.145 0.092 0.147 0.036
A4 0.282 0.158 0.050 0.188 0.201 0.189 0.150 0.161 0.179 0.215 0.155 0.254 0.152 0.263 0.268 0.327 0.139 0.185 0.285 0.160 0.213 0.166 0.073 0.337
A5 0.085 0.040 0.008 0.043 0.054 0.048 0.041 0.107 0.094 0.100 0.026 0.052 0.034 0.157 0.038 0.074 0.207 0.039 0.059 0.035 0.018 0.064 0.077 0.053
AX 0.009 0.037 0.022 0.030 0.037 0.015 0.021 0.013 0.003 0.017 0.023 0.032 0.030 0.012 0.018 0.041 0.072 0.050 0.028 0.015 0.001 0.016 0.030 0.027
······
K1 0.127 0.117 0.099 0.058 0.068 0.084 0.259 0.384 0.117 0.135 0.049 0.090 0.067 0.196 0.098 0.098 0.477 0.111 0.107 0.099 0.046 0.200 0.245 0.078
K2 0.608 0.488 0.427 0.544 0.495 0.480 0.418 0.349 0.542 0.596 0.531 0.570 0.352 0.386 0.523 0.519 0.253 0.295 0.606 0.476 0.468 0.494 0.264 0.510
KM 0.138 0.148 0.180 0.180 0.181 0.198 0.126 0.171 0.154 0.071 0.217 0.179 0.286 0.220 0.181 0.122 0.083 0.243 0.138 0.210 0.213 0.148 0.138 0.088
K4 0.112 0.142 0.180 0.172 0.212 0.168 0.127 0.055 0.145 0.132 0.155 0.107 0.189 0.140 0.130 0.147 0.045 0.224 0.119 0.163 0.251 0.103 0.081 0.226
K5 0.013 0.028 0.042 0.021 0.005 0.028 0.039 0.025 0.028 0.033 0.012 0.012 0.041 0.040 0.019 0.023 0.036 0.050 0.011 0.031 0.021 0.023 0.235 0.021
KX 0.003 0.077 0.072 0.025 0.040 0.041 0.032 0.017 0.014 0.032 0.036 0.043 0.065 0.019 0.048 0.091 0.105 0.077 0.018 0.020 0.002 0.032 0.037 0.076

e.g., in Spain (E) 14.1% strongly agree to Qu.A (average across countries: 24.9%) and 7.8% strongly agree to Qu.K (average across countries: 13.6%)

“Symmetric” CA map of response proportions

Category codes: 1 strongly agree, 2 agree, M middle, 4 disagree, 5 strongly disagree, X missing

[Figure: symmetric CA map of the 24 countries and 66 response categories. Principal inertias: 0.1021 (41.6%) on axis 1 and 0.0472 (19.3%) on axis 2. The full space is 23-dimensional; total inertia = 0.2453; 60.9% inertia explained in the 2-d CA map. Annotated regions: extreme responses, moderate & middle responses, liberal, traditional. Highlighted statement: Qu.I “Man’s job is to earn money; woman’s job is to look after home and family”.]

Decomposition of inertia across categories

Inertia contributions

Category  1       2       M       4       5       X       total
Inertia   0.0921  0.0400  0.0197  0.0394  0.0418  0.0123  0.2453

[Figure: inertia contributions by category 1, 2, M, 4, 5, X, with the middles (M) and missings (X) marked.]

Middles & missings account for 0.0320 of the total inertia of 0.2453, i.e., 13.0%; “extreme” responses account for 0.1339, i.e., 54.6%.
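The percentages quoted for the inertia decomposition follow directly from the table of contributions. As a quick arithmetic check (a sketch only; the helper name `share` is hypothetical), in Python:

```python
# Inertia contributions by response category, as given in the table.
inertia = {"1": 0.0921, "2": 0.0400, "M": 0.0197,
           "4": 0.0394, "5": 0.0418, "X": 0.0123}

total = sum(inertia.values())

def share(categories):
    """Return (inertia, percentage of total) for a group of categories."""
    part = sum(inertia[c] for c in categories)
    return part, 100 * part / total

middles_missings = share(["M", "X"])   # 0.0197 + 0.0123 = 0.0320
extremes = share(["1", "5"])           # 0.0921 + 0.0418 = 0.1339
```

The six contributions sum to the total inertia of 0.2453, so the remaining 32.4% is carried by the moderate categories 2 and 4.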

Methods

We use correspondence analysis (CA) and several of its variants to visualize and interpret data at a country level as well as respondent and demographic-subgroup level.

1. Simple CA provides maps of aggregated or count data, for example proportions of question responses for the set of countries.
2. Subset correspondence analysis permits focusing on a particular set of response categories so that we can eliminate missing responses, for example, or restrict our attention to specific categories such as the middle responses (Greenacre & Pardo, SMR, 2006).
3. Multiple correspondence analysis (MCA) and its refinement joint correspondence analysis (JCA) provide maps of individual-level data, concentrating on the two-way relationships between the questions. In this respect MCA functions like an exploratory factor analysis for categorical data (on nominal scales). Usually we are not interested in individual case points but in mean points for the demographic and other external variables (e.g., age, level of interest…).
4. Canonical correspondence analysis (CCA) focuses on (or partials out) external variables. Thus we can explore variation in the responses that is focused on interest, for example, or eliminate acquiescence effects.

Animation achieved by changing weight

CA of proportions of response categories

Data: WOMEN WORKING, stacked frequencies, N=33590
Method: Contribution of categories not in subset reduced by factor α, from 1 (CA) to limiting case of 0 (subset CA), always using original masses for centring and weighting

[Figure: CA map of the countries and response categories, with the extreme responses labelled; first frame of the animation.]

For example (to show the missings and all moderate responses in this example), we multiply these dummy variables by α, where α starts at 1 (i.e., the regular MCA) and decreases in small steps of 0.01 until 0 (i.e., the subset MCA). At each step a hybrid of a regular MCA and a subset MCA is performed, maintaining the margins of the table constant.
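The down-weighting step can be sketched in Python as follows. This is an illustration under assumptions, not the authors' implementation: it assumes that the columns being multiplied by α are those of the categories outside the subset (as the "Method" notes on the animation slides state), the function name `scale_outside_subset` and the tiny 3-column matrix are hypothetical, and the CA step itself (with the original masses kept for centring and weighting) is not shown.

```python
def scale_outside_subset(Z, labels, subset, alpha):
    """Multiply the columns whose label is NOT in `subset` by alpha."""
    return [[v if lab in subset else alpha * v for v, lab in zip(row, labels)]
            for row in Z]

# The sequence of factors: 1, 0.99, 0.98, ..., 0.01, 0 (101 steps).
alphas = [round(1 - step / 100, 2) for step in range(101)]

# Hypothetical toy indicator matrix with three categories of question A.
labels = ["A1", "AM", "AX"]
Z = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1]]

# Halfway through the animation, with the middle and missing categories as the subset.
halfway = scale_outside_subset(Z, labels, {"AM", "AX"}, 0.5)
```

At α = 1 the matrix is unchanged (regular analysis); at α = 0 the columns outside the subset contribute nothing (subset analysis).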

[Figure: CA map of response proportions with the groups labelled: extreme responses, middles, missings, moderate & middle responses. Principal inertias: 0.1021 (41.6%) and 0.0472 (19.3%).]

The animations which link methods are achieved by either reducing the mass of certain points or transferring mass between points.

CA to subset CA

α = 1, 0.99, 0.98, 0.97, …, 0.04, 0.03, 0.02, 0.01, 0

[Animation panels: total inertia and principal inertias in the 2-d solution as α decreases.]

Subset CA of proportions of middles and missings

Data: WOMEN WORKING, stacked frequencies, N=33590
Method: Contribution of categories not in subset reduced by factor α, from 1 (CA) to limiting case of 0 (subset CA), always using original masses for centring and weighting

[Figure: subset CA map of the middle (M) and missing (X) categories of questions A to K together with the 24 countries. Principal inertias: 0.0144 (45.0%) on axis 1 and 0.0065 (20.3%) on axis 2.]


Map of the M- and X-percentages. Of the (small, 13.0%) part of the inertia, which is contained in a 22-dimensional space, 65.3% is explained in this map. The fact that all the M’s are together, and all the X’s together and separate, does not reflect category associations at an individual level. For example, the proximity of Spain (E) and Russia (RU) means that their percentages of M- and X-responses are similar (fewer M’s than average, more X’s than average).

Looking at respondent-level data

To investigate response behaviour at individual respondent level we pass from CA to multiple correspondence analysis (MCA). The classic definition of MCA is the CA of the data coded in an indicator matrix, i.e., as dummy variables, one variable for each response category.

Original responses (Q = 11)

A B C D E F G H I J K
1 3 3 5 5 1 1 1 5 5 4
9 2 2 2 2 4 4 4 2 2 2
. . .

Dummy variables (J = 66)

A1 A2 AM A4 A5 AX  B1 B2 BM ...
 1  0  0  0  0  0   0  0  1 ...
 0  0  0  0  0  1   0  1  0 ...
. . .

Dimensionality is 66 less the 11 linear restrictions on the columns, i.e. 55
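The coding into dummy variables can be reproduced for the two example respondents shown on the slide. The sketch below is in Python and makes one assumption explicit: response code 3 is treated as the middle category M and code 9 as missing X, which is what the slide's own example implies (A = 9 becomes AX, B = 3 becomes BM). The names `RECODE` and `to_dummies` are hypothetical.

```python
QUESTIONS = "ABCDEFGHIJK"                  # Q = 11 questions
CODES = ["1", "2", "M", "4", "5", "X"]     # 6 categories per question
RECODE = {1: "1", 2: "2", 3: "M", 4: "4", 5: "5", 9: "X"}  # assumed coding

def to_dummies(responses):
    """Code one respondent's 11 answers as 66 zero/one dummy variables."""
    row = []
    for answer in responses:
        label = RECODE[answer]
        row.extend(1 if label == code else 0 for code in CODES)
    return row

labels = [q + c for q in QUESTIONS for c in CODES]   # A1, A2, AM, ..., KX

Z = [to_dummies(r) for r in ([1, 3, 3, 5, 5, 1, 1, 1, 5, 5, 4],
                             [9, 2, 2, 2, 2, 4, 4, 4, 2, 2, 2])]

# Each question contributes exactly one 1 per row, hence 11 linear restrictions.
dimensionality = len(labels) - len(QUESTIONS)
```

Each row sums to 11 because every respondent falls in exactly one category of every question, which is the source of the 11 linear restrictions.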

MCA of all response categories

Category codes: 1 strongly agree, 2 agree, M middle, 4 disagree, 5 strongly disagree, X missing

[Figure: MCA map of all 66 response categories, which fall into three groups: all extremes, all moderates & middles, and all missings. Principal inertias: 0.1213 (18.3%) on axis 1 and 0.1071 (16.2%) on axis 2. Total inertia = 0.6626; 34.5% inertia explained. Adjusted value: 59.9% inertia explained.]

Decomposition of inertia across categories in MCA

Inertia contributions

Category  1       2       M       4       5       X       total
Inertia   0.1297  0.0827  0.0881  0.0928  0.1207  0.1485  0.6626

[Figure: inertia contributions by category 1, 2, M, 4, 5, X, with the middles (M) and missings (X) marked.]

Middles & missings account for 0.2366 of the total inertia of 0.6626, i.e., 35.7%; missings account for the largest part of the inertia.

MCA to subset MCA

Data: WOMEN WORKING, Burt matrix, N=33590
Method: Contribution of categories not in subset reduced by factor α, from 1 (CA) to limiting case of 0 (subset CA), always using original masses for centring and weighting

[Animation panels: total inertia and principal inertia in the 2-d solution.]

Rotating the subset MCA solution (excluding missings)

Data: WOMEN WORKING, stacked frequencies, N=33590
Using ca package in R.

[Animation: rotation of the subset MCA solution, with the middles labelled.]

Also showing the country points in “symmetric” scaling so they are more spread out in the visualization for ease of interpretation.

Reference: Nenadić, O. & Greenacre, M.J. (2007). Correspondence analysis in R, with two- and three-dimensional graphics: the ca package. Journal of Statistical Software 20(3). URL http://www.jstatsoft.org/v20/i03/.

What does a “perfect unidimensional model” look like in an MCA? Data that follow a perfect traditional-to-liberal scale were generated, reversing the scales of oppositely worded statements, and randomly adding 3% missing responses. Here are different MCA maps of the data:

[Figures: three MCA maps of the generated data, showing the response categories of questions A to K.]

In first two dimensions: parabola (the “horseshoe” / “arch” / “Guttman” effect)
In dimensions 1 and 3: cubic
In dimensions 1 and 4: quartic, etc.
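One simple way to generate data of this kind is sketched below in Python. It is only a guess at the general idea, not the procedure actually used: it assumes that every respondent has a single position from 1 (liberal) to 5 (traditional), that positively oriented items record the position directly while negatively oriented items record the reversed code, and that each answer is independently set to missing with probability 0.03. Only the items marked [+] or [–] in the statement list are included; the function names and the seed are hypothetical.

```python
import random

# Orientation of the items marked [+] or [-] in the list of statements.
ORIENTATION = {"A": +1, "B": -1, "C": -1, "D": -1, "F": +1, "H": +1, "I": -1}

def simulate(n, p_missing=0.03, seed=1):
    """Generate n respondents on a perfect one-dimensional scale (assumed design)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        position = rng.randint(1, 5)          # 1 = most liberal, 5 = most traditional
        row = {}
        for item, sign in ORIENTATION.items():
            answer = position if sign > 0 else 6 - position   # reverse opposite wording
            row[item] = "X" if rng.random() < p_missing else answer
        data.append(row)
    return data

def implied_positions(row):
    """Positions implied by the non-missing answers; a perfect scale gives at most one."""
    return {v if ORIENTATION[k] > 0 else 6 - v for k, v in row.items() if v != "X"}

data = simulate(1000)
answers = [v for row in data for v in row.values()]
missing_rate = answers.count("X") / len(answers)
```

Feeding such data to an MCA is what produces the polynomial shapes listed above: the single underlying scale appears as a parabola in the first two dimensions, then as higher-order curves.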

Looking by country: Spanish and West German data

• Previous MCA maps based on all 33590 respondents from 24 countries.
• We have already seen that there are inter-country differences in their overall levels of middle and missing responses.
• Our aim is to investigate how respondents use the middle responses and if there are any associations with demographic variables. To avoid inter-country differences in our results we concentrate (separately) on two countries: Spain (N = 2494) and West Germany (N = 2324); as we shall see, they present contrasting results.
• We will also introduce the following demographic variables into our study:
  Gender (2 categories)
  Marital status (5 categories)
  Age (6 categories)
  Education (7 categories) (Education not available in Spanish 1994 sample)

MCA of Spanish “women working” data (N = 2494)

[Figure: MCA map, Dim 1 against Dim 2, with the respondents shown as dots. Labelled groups: missing categories; extreme categories (strongly agree and strongly disagree); moderate and middle categories.]

Subset MCA of Spanish “women working” data (N = 2494)

missing responses excluded from subset

[Figure: subset MCA map with the respondents shown as dots. Poles labelled traditional and liberal; groups labelled strong and moderate & middle. The middles lie amongst the moderate responses in this two-dimensional view.]

Rotating the subset MCA solution (subset excluding missings)

Data: WOMEN WORKING, Spanish data, subset MCA, N=2494
Method: Rotation about 2nd axis, recorded using plot3d function in ca package for R

MCA of W.GERMAN DATA (N=2324)

[Figure: MCA map of the West German data. Principal inertias: 0.318 (33.1%) on axis 1 and 0.288 (24.9%) on axis 2. Labelled: missing categories; LIBERAL and TRADITIONAL poles. Annotated points: (B5) strongly disagree that preschool child will suffer because mother works; (H5) strongly disagree that man & woman should both contribute to household income.]

MCA to subset MCA

Data: WOMEN WORKING, W.German sample, N=2324
Method: Contribution of categories not in subset reduced by factor α, from 1 (CA) to limiting case of 0 (subset CA), always using original masses for centring and weighting

[Animation panels: total inertia and principal inertia in the 2-d solution.]

Subset MCA of W German “women working” data (N = 2324)

missing responses excluded from subset

[Figure: subset MCA map with liberal and traditional poles. Annotated point: (K5) “strongly disagree” that family life often suffers because men concentrate too much on their work.]

Rotating the subset MCA solution (excluding missings)

Data: WOMEN WORKING, W.German data, subset MCA, N=2324
Method: Rotation about 1st axis, recorded using rgl.snapshot in rgl package and plot3d.ca in ca package for R

Rotating the subset MCA solution (excluding missings)

Data: WOMEN WORKING, W.German data, subset MCA, N=2324
Method: Rotation about 2nd axis, recorded using rgl.snapshot in rgl package and plot3d.ca in ca package for R

Response sets in Spanish sample

Looking more closely at the individual Spanish data, we discover the following response sets (figures for W.Germany, WG, for comparison):

Response set                               Spain            (WG)
“strongly agree” to all questions          4 respondents    (2)
“agree” to all questions                   20 respondents   (6)
“neither/nor” to all questions             6 respondents    (0)
“disagree” to all questions                2 respondents    (0)
“strongly disagree” to all questions       0 respondents    (0)
missing values for all questions           18 respondents   (6)

In total, 50 out of the 2494 Spanish respondents, i.e. 2% (WG: ½%).

The “middle” and “missing” response sets accentuate the association within these categories. The “strongly agree” and “agree” response sets (categories 1 and 2) to questions which have reverse orientations will tend to bring the opposite poles closer than they would otherwise have been.

When we remove all these response sets, all the features previously seen are still there; only their ordering on the principal axes changes: e.g., the group of “missings” is now on the 3rd dimension and the “middles” on the 5th.
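Detecting such response sets amounts to finding respondents who give the same code to all 11 questions. A minimal Python sketch follows; the function name `response_set` and the category labels are hypothetical, while the counts are the ones reported above.

```python
def response_set(row):
    """Return the common code if all answers are identical, else None."""
    first = row[0]
    return first if all(answer == first for answer in row) else None

# Counts of response sets reported for Spain and West Germany,
# keyed by category: 1, 2, M (neither/nor), 4, 5, X (missing).
spain = {"1": 4, "2": 20, "M": 6, "4": 2, "5": 0, "X": 18}
w_germany = {"1": 2, "2": 6, "M": 0, "4": 0, "5": 0, "X": 6}

n_spain = sum(spain.values())                 # number of response sets in Spain
pct_spain = 100 * n_spain / 2494              # share of the Spanish sample
pct_wg = 100 * sum(w_germany.values()) / 2324
remaining_spain = 2494 - n_spain              # sample size once the sets are removed
```

The remaining Spanish sample size agrees with the N=2444 quoted for the MCA without response sets.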

MCA of Spanish data without response sets

Data: WOMEN WORKING, Spanish sample, N=2444 (without 50 response sets)
Dimensions 1 and 2

[Figure: MCA map of the response categories. Principal inertias: 0.102 (41.0%) on axis 1 and 0.059 (23.8%) on axis 2.]

MCA of Spanish data without response sets

Data: WOMEN WORKING, Spanish sample, N=2444 (without 50 response sets)
Dimensions 1, 3 and 5

[Figure: three-dimensional view of the MCA solution in dimensions 1, 3 and 5.]

Focusing the display on the number of middle responses

11 questions (A,B,C,D,E,F,G,H,I,J,K) from ISSP survey

Z: indicator matrix of the n respondents (each row with weight 1/n)

A1 A2 AM A4 A5  B1 B2 BM B4 B5  ...  K1 K2 KM K4 K5
 0  0  1  0  0   0  1  0  0  0  ...   0  0  0  0  1
 1  0  0  0  0   0  0  0  1  0  ...   0  1  0  0  0

X: classification in terms of number of middles (#m = count of middles)

#m   m0 m1 ... m6+
 1    0  1 ...  0
 0    1  0 ...  0

XᵀZ: the g groups (rows with weights r1, r2, ..., rg)

Pass the weight smoothly from the respondents to the group centroids, where w is the vector of weights for the rows of Z stacked on top of the rows of XᵀZ:

\[
\mathbf{w} \;=\; \alpha \begin{pmatrix} \tfrac{1}{n}\mathbf{1} \\ \mathbf{0} \end{pmatrix} + (1-\alpha) \begin{pmatrix} \mathbf{0} \\ \mathbf{r} \end{pmatrix}
\]
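The classification by number of middles and the transfer of weight can be sketched in Python as below. This is an illustrative sketch under assumptions: the four toy respondents (with only four answers each) and the function names are hypothetical, the cap is lowered to 2 in the example so that the groups are small, and the group weights r are taken to be the relative group sizes, which the slide does not state explicitly.

```python
def middle_group(row, cap=6):
    """Group index = number of middle (M) responses, with counts >= cap pooled."""
    return min(sum(1 for answer in row if answer == "M"), cap)

def row_weights(groups, n_groups, alpha):
    """Weights for the n respondent rows followed by the group rows.

    alpha = 1: all weight on the respondents (1/n each);
    alpha = 0: all weight on the group centroids (r, assumed = relative group sizes).
    """
    n = len(groups)
    r = [groups.count(k) / n for k in range(n_groups)]
    return [alpha / n] * n + [(1 - alpha) * r_k for r_k in r]

# Hypothetical toy respondents with four answers each.
respondents = [["1", "M", "2", "M"],
               ["2", "2", "4", "5"],
               ["M", "M", "M", "1"],
               ["5", "4", "M", "2"]]

groups = [middle_group(row, cap=2) for row in respondents]   # groups m0, m1, m2+
w_start = row_weights(groups, 3, 1.0)
w_end = row_weights(groups, 3, 0.0)
w_mid = row_weights(groups, 3, 0.3)
```

For any α the weights sum to 1, so the total mass stays constant while it moves from the respondent points to the group centroids.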

MCA → MCA-Discriminant analysis (middle groups): Spain

MCA → MCA-Discriminant analysis (middle groups): W. Germany


Partialling out acquiescence effects

Identifying or partialling out “acquiescence effects”

11 questions (A,B,C,D,E,F,G,H,I,J,K) from the ISSP Family and Changing Gender Roles II survey (1994).

O’Muircheartaigh, Krosnick & Helic (2000) ‘We estimated a model in which all items were allowed to load on the same latent factor representing attitude toward science, plus a second latent factor intended to represent acquiescence. All items were constrained to load equally on this latter factor, an assumption required to identify the model. This is reasonable, because acquiescence is defined as a tendency to agree with any item regardless of its content, so it should account for the same amount of variance in responses to all the items. The acquiescence factor was constrained to be uncorrelated with the factor representing attitudes toward science, another assumption required in order to identify the model.’

Indicator matrix Z (one row per respondent; six columns per question: 1, 2, M, 4, 5, X):

A1 A2 AM A4 A5 AX   B1 B2 BM B4 B5 BX   ...   K1 K2 KM K4 K5 KX
 0  0  1  0  0  0    0  1  0  0  0  0   ...    0  0  0  0  1  0
 1  0  0  0  0  0    0  0  0  0  1  0   ...    0  0  0  0  0  1

Matrix X (counts of number of responses in each category):

 1  2  M  4  5  X
 2  3  1  3  1  1
 3  1  0  2  2  3

The counts in the matrix X are variables which quantify tendencies to use the same response category. CCA can restrict the MCA solution to be linearly related to any subset of these external variables. This is a projection in the MCA space. We can also partial out acquiescence effects by looking in the space orthogonal to the restricted space (this is an alternative strategy to subset CA).
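The construction of X from Z, and the split into a restricted space and its orthogonal complement, can be sketched as follows. This is a minimal sketch under stated assumptions: the responses are randomly generated (hypothetical, not the ISSP data), the external variables are taken to be the counts of middles and missings, and the projection is applied to the standardized residuals of the indicator matrix, which is one common way of formulating CCA.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cases, n_questions = 200, 11
cats = ["1", "2", "M", "4", "5", "X"]

# Hypothetical responses: a category index (0..5) per case and question
resp = rng.integers(0, len(cats), size=(n_cases, n_questions))

# Indicator matrix Z: six dummy columns per question
Z = np.zeros((n_cases, n_questions * len(cats)))
for q in range(n_questions):
    Z[np.arange(n_cases), q * len(cats) + resp[:, q]] = 1.0

# Counts X: how often each case used each response category
G = np.tile(np.eye(len(cats)), (n_questions, 1))  # Z column -> category
X = Z @ G                                         # each row sums to 11

# Standardized residuals of Z, as in CA of the indicator matrix
P = Z / Z.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

# External variables (assumed here): counts of middles and missings,
# centred with respect to the row masses
ext = X[:, [cats.index("M"), cats.index("X")]]
ext = ext - r @ ext
W = np.sqrt(r)[:, None] * ext
Q = W @ np.linalg.pinv(W)          # orthogonal projector onto span(W)

S_restricted = Q @ S               # part linearly related to the counts
S_partial = S - S_restricted       # part orthogonal to the counts

total = (S ** 2).sum()
inertia_restricted = (S_restricted ** 2).sum()
inertia_partial = (S_partial ** 2).sum()
print(total, inertia_restricted, inertia_partial)
```

An SVD of `S_restricted` would give the restricted (CCA) axes and an SVD of `S_partial` the axes with the chosen effects partialled out; the two inertias add up to the total inertia of the indicator matrix.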

[Six maps of the response categories (A1 … KX); only captions, axis labels and annotations are recoverable. Data for all six: WOMEN WORKING, Spanish sample, N=2494. Axis labels: CA1, CA2, CCA1, CCA2. Annotations: missings, middles.]

Figure: MCA

Figure: CCA, counts of missings as external variable which restricts the solution

Figure: Partial CCA, solution orthogonal to one that restricts the solution to be linearly related to missing counts

Figure: CCA, counts of missings and middles as external variables which restrict the solution

Figure: Partial CCA, solution orthogonal to one that restricts the solution to be linearly related to all acquiescence effects

Figure: CCA solution partialling out the middles and missings and restricted to the extremes and moderates

Figure: CCA of Spanish “women working” data, two panels: categories and cases. Annotations: conservative, extreme responses; liberal, moderate responses; conservative, moderate responses; liberal, extreme responses; direction liberal versus conservative; all middles and missings at centre.

Figure: CCA of W. German “women working” data, two panels: categories and cases. Annotations: conservative, extreme responses; liberal, moderate responses; conservative, moderate responses; liberal, extreme responses; direction liberal versus conservative.

Figure: Subset MCA of middle categories – “women working”, Spain. Points: AM, BM, CM, DM, EM, FM, GM, HM, IM, JM, KM. Annotations: exactly those worded positively towards women; exactly those worded negatively towards women.

Demographic categories
• Gender: g1 (male), g2 (female)
• Age 6 groups: a1 (up to 25), a2 (26-35), a3 (36-45), a4 (46-55), a5 (56-65), a6 (66 and over)
• Marital status 5 groups: m1 (married), m2 (widowed), m3 (divorced), m4 (separated), m5 (single)
• Education 7 groups: e1 (none), e2 (incomplete primary), e3 (primary), e4 (incomplete secondary), e5 (secondary), e6 (incomplete tertiary), e7 (tertiary)
(Education not available for Spanish sample in 1994)

Figure: Demographic averages on subset MCA map of middle responses – “women working”, Spain. Points: gender (g1 = M, g2 = F), age groups (a1 to a6) and marital status groups; labelled points include widowed (m2), 66+yrs (a6), divorced, 56-65yrs (a5) and separated (m4). Annotations: statements positive about women working; more middles.

Figure: AGE GROUPS, series Middles and Missings, scale from 0.6 to 1.2. Annotations: (M2) widowed 0.308; (M4) separated 0.654; (not significant), twice; widowed & separated groups small samples; (significant P
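The demographic averages on the subset MCA map are supplementary points: each demographic category is placed at the average of the cases in that category. A minimal sketch of that computation follows; the function name `group_averages`, the case coordinates and the group labels are all hypothetical and stand in for the coordinates produced by an actual analysis.

```python
import numpy as np

def group_averages(case_coords, groups):
    """Return the group labels and the mean case coordinates per group."""
    case_coords = np.asarray(case_coords, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    means = np.vstack([case_coords[groups == g].mean(axis=0) for g in labels])
    return labels, means

# Hypothetical case coordinates on two map dimensions and a gender label
coords = np.array([[0.2, -0.1],
                   [0.4, 0.3],
                   [-0.6, 0.1],
                   [0.0, -0.3]])
gender = np.array(["g1", "g2", "g1", "g2"])

labels, means = group_averages(coords, gender)
for label, mean in zip(labels, means):
    print(label, mean)
```

Because the averages are computed after the analysis, the demographic variables do not influence the map itself, which is what makes them supplementary.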
