Factor Analysis for Categorical Data

Author(s): D. J. Bartholomew
Source: Journal of the Royal Statistical Society. Series B (Methodological), Vol. 42, No. 3 (1980), pp. 293-321
Published by: Wiley-Blackwell for the Royal Statistical Society
Stable URL: http://www.jstor.org/stable/2985165


J. R. Statist. Soc. B (1980), 42, No. 3, pp. 293-321

Factor Analysis for Categorical Data

By D. J. BARTHOLOMEW

London School of Economics and Political Science

[Read before the ROYAL STATISTICAL SOCIETY at a meeting organized by the RESEARCH SECTION on Wednesday, May 21st, 1980, Professor P. WHITTLE in the Chair]

SUMMARY

The method of factor analysis is widely used as an exploratory tool to reduce the dimensionality of multivariate data. The fact that the standard model is strictly applicable only when the manifest variables are scaled is a serious limitation in social science where the variables are often categorical. In this paper we aim to provide a theoretical framework within which methods for the factor analysis of categorical data can be devised and compared. Discussion is restricted to the case of ordered categories where the latent variables are continuous. It is argued that the choice of model should be made from a restricted set which includes two existing models as special cases. A new method is proposed together with a simple approximate technique of fitting for the one-factor model. The paper concludes with an evaluation of existing methods and makes some suggestions about the direction which future research should take.

Keywords: FACTOR ANALYSIS; LATENT STRUCTURE ANALYSIS; MULTIVARIATE ANALYSIS; CATEGORICAL DATA; MULTI-DIMENSIONAL CONTINGENCY TABLES; DATA REDUCTION; SCALING; ORDINAL DATA

1. THE BACKGROUND

AN important object of much multivariate analysis is to reduce the dimensionality of the data. This is particularly desirable in the exploratory stages of an investigation both to provide an intelligible summary and to suggest fruitful lines for model building. When the variables are continuous and measured on a common scale, principal component analysis often serves this purpose. Factor analysis achieves much the same end by setting up a model in which the observed variables are related to a smaller set of latent variables and to an "error". Neither of these methods is directly applicable to categorical variables yet the need for data reduction techniques in such circumstances is no less pressing. This is particularly true in the social sciences where much of the data arising is categorical. The aim of this paper is to provide a framework for the development of methods for use when all the variables are measured on an ordered categorical scale. In the process we shall show how earlier approaches for dichotomous variables arise as special cases of our general formulation.

We assume that we have a simple random sample of size N whose members are cross-classified on p categorical ordered variables. Ordering is very common but it can always be achieved, at the loss of some information, by reducing each dimension to a dichotomy. The sample data can be set out in a multi-way contingency table whose cell frequencies have a joint multinomial distribution. There is no requirement that any marginal frequencies be fixed.

Our aim is to determine whether the p-variate representation of the original contingency table can be replaced, without significant loss of information, by one in a smaller number of dimensions. We shall argue that the theoretical framework for this to be done already exists in latent structure analysis, of which normal factor analysis is a special case.

Latent structure analysis has received little attention from statisticians. It appears to have originated with Lazarsfeld, a sociologist, and is expounded in Lazarsfeld and Henry (1968). A more recent discussion of some aspects from a statistical point of view is contained in Goodman


(1978) and a useful introduction is provided by Fielding (1977). Sociologists have used latent structure analysis as a tool for investigating and measuring attributes. The raw data in such cases consist of the responses by individuals to questions designed to elicit the attitude under investigation. The interest then lies in whether the multidimensional responses are consistent with the existence of one (or more) underlying attitude scales. Similar problems are encountered by psychologists concerned with abilities rather than attitudes. A key statistical reference is Lord and Novick (1968). Some of this work shares the theoretical foundation of latent structure analysis; Lazarsfeld and Henry (1968) point out that Guttman scaling arises as a special case of their model.

Although these two fields of application share a common mathematical structure their ways of proceeding are somewhat different. When scaling abilities one starts from the supposition that an ability (such as general intelligence) exists and is capable of being measured on a numerical scale. Response variables are therefore selected which are believed to be related to the underlying dimension of ability. Similar conditions may also apply with attitudes but in sociological enquiries it is more usual to proceed in the reverse direction. That is, the responses are given and the aim is to discover whether there is evidence of one or more underlying dimensions which could account for the response pattern. This is the path usually followed in principal component analysis and exploratory factor analysis and is the one we shall adopt here.

Latent variable models have not found wide favour with statisticians. Partly this is because of the tedious and somewhat arbitrary methods used for fitting the models. This difficulty has been largely overcome by using computers to implement efficient estimation procedures. More serious have been doubts about the apparent subjectiveness and arbitrariness of the methods used for interpreting the results. Possibly, this is because the substantive and technical aspects of the methods are so closely intertwined that only an expert in the applied field can deploy them effectively. Latent structure models have certainly arisen to meet real needs in psychology and sociology and there is a growing interest among economists (see, for example, Aigner and Goldberger, 1977). I would argue that such models are implicit in most qualitative analyses of social phenomena and that it is the business of statisticians to make their nature explicit and, as far as possible, quantitative. Attention by statisticians is therefore overdue but in the present paper our aim is more modest. It is to take the relatively neglected area of categorical data and consider how best to carry out what may reasonably be described as factor analysis.

This is not the first attempt to treat the factor analysis of categorical data. Some earlier attempts have aimed, by some means or other, to bring the problem within the framework of the standard normal theory, common factor model. Muthen (1978) is the latest in a group of papers including Bock and Lieberman (1970) and Christoffersson (1975) which deal with the case where all variables are dichotomous. In essence, they do this by supposing that the $2^p$ contingency table arises from grouping each dimension of a p-variate multinormal distribution into two categories. The underlying variables are then assumed to have the linear structure of the factor model. This approach is very closely related to that of Lazarsfeld and Henry (1968) and Lord and Novick (1968). All of these models arise as special cases of our general approach.

McDonald (1969) proposed a method for analysing multi-category data and he also reviewed much of the earlier work. Like us he made the latent structure model his starting point. McDonald's method does not utilize the ordering of the categories; neither does that of Bock (1972), based on a multivariate logistic model.

In this paper the aim has been to approach the problem from first principles and the emphasis is on fundamentals rather than computational techniques. Much more remains to be done especially on the computational side but the results achieved so far are encouraging.

2. THE MATHEMATICAL FRAMEWORK

2.1. Terminology and Notation

We begin by setting the problem in its general context and then introduce categorical data as a special case. The variables which we observe will be called manifest variables and are denoted


by $x = (x_1, x_2, \ldots, x_p)^T$. A latent structure model supposes these variables to be related to a set of q unobservable latent variables denoted by $y = (y_1, y_2, \ldots, y_q)^T$. For the model to be practically useful q needs to be much smaller than p. The relationship between x and y is stochastic and may be expressed by a conditional probability function $\pi(x \mid y)$, being the distribution of x given y. This will be a density or probability mass according as x is continuous or categorical. The problem is to infer something about y from the observed values of x. Let $p(y)$ denote the joint distribution of the y's and $f(x)$ that of the x's; then the two are related by

$$f(x) = \int_R \pi(x \mid y)\, p(y)\, dy, \qquad (1)$$

where R is the range space of the latent variables. After x has been observed our knowledge about y is given by

$$p(y \mid x) = p(y)\, \pi(x \mid y) / f(x). \qquad (2)$$

The data reduction we are seeking is thus achieved from the fact that the distribution of y is of smaller dimension than that of x. In practice we may well be content with some suitable summary measure of the conditional distribution of y such as $E(y \mid x)$.

In the case of the multi-way contingency table, x will identify a cell of the table and $f(x)$ will be its multinomial probability. We shall label the categories along each dimension by 0, 1, 2, ..., 0 being the "lowest" level, 1 the next and so on. Thus, for example, the designation (0, 2, 1, 3) refers to the cell where the first variable is at level 0, the second at level 2, the third at level 1 and the fourth at level 3. A distinctive feature of our model is that the latent variables are continuous; $p(y)$ and $p(y \mid x)$ are thus densities. This choice is based on the fact that most latent variables which arise in social science discourse are thought of as being continuous. For example, quality of life, standard of living, political hue and aggressiveness are all regarded in this way. The case where the latent variables are better treated as categorical may be handled by latent class analysis, for which see Goodman (1978).

2.2. Assumptions

Little progress can be made without some assumptions about the various functions which we have defined. The first assumption we make is that the y's are independent, that is that

$$p(y) = \prod_{i=1}^{q} p(y_i).$$

There is no completely compelling reason for this assumption but it makes the analysis easier to carry out and interpret. It therefore seems reasonable to adopt it until practical considerations dictate otherwise. The second assumption is about the form of $p(y_i)$. We shall argue below that this distribution is essentially arbitrary and that the choice may be made to suit our convenience. For this reason we have made it uniform on (0, 1). The justification for this assertion requires us to look more closely at the nature of a latent variable. There seem to be two distinct cases as follows:

(a) The latent variable may be "real" in the sense that it could, in principle, be measured directly. An example would be some sensitive quantity like personal wealth. To avoid asking the direct question we might ask a battery of questions about possessions and life-style in the hope that they might enable us to identify and scale the underlying variable, wealth in this case. The distribution of wealth is certainly not arbitrary and it would be quite inappropriate to assume that it was uniform. Such cases seem quite rare. More commonly we have the second case.

(b) The latent variable is not "real", meaning that it could not be measured directly, even in principle. It is a mental construct used to facilitate economy of thought. Attitudes and abilities largely come into this category.


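Since there is no "natural" scale in such cases we are at liberty to construct one to suit our convenience. Since ordering is the highest level of measurement available in the manifest variables it seems reasonable to ask for no more than an ordinal level of measurement of the latent variables also. Such a scale is arbitrary to the extent that any monotonic transformation of the chosen scale would serve equally well. Thus whatever the distribution of the latent variable on a chosen scale it can always be given a desired distribution, such as the uniform, by an appropriate monotonic change of scale.

The remaining function to be specified is $\pi(x \mid y)$. We make two assumptions about this. The crucial assumption, which is fundamental to the rationale of the method, is that of conditional independence. We assume that

$$\pi(x \mid y) = \prod_{i=1}^{p} \pi_i(x_i \mid y). \qquad (3)$$

This means that the observed dependence among the x's is wholly explained by their dependence on the y's. Eliminating variation in the latter removes the inter-dependence of the x's. In that sense the association among the x's is fully explained by their dependence on the latent variables. This gives formal expression to the hypothesis that the observed variables are describable in terms of a smaller number of latent dimensions. If (3) were not true it would imply that there was some other variable exerting a common influence on the x's.

Under the assumptions made so far (1) becomes

$$f(x) = \int_0^1 \int_0^1 \cdots \int_0^1 \prod_{i=1}^{p} \pi_i(x_i \mid y)\, dy = E \prod_{i=1}^{p} \pi_i(x_i \mid y). \qquad (4)$$

The choice of the form of the response function $\pi_i(x_i \mid y)$, sometimes called the trace function, is the final step in the specification of the model.

3. THE CHOICE OF RESPONSE FUNCTION

For the application to contingency tables we shall suppose, initially, that each variable is dichotomous. This requirement will be relaxed in Section 7. In this case we may write

$$\pi_i(x_i \mid y) = \{\pi_i(y)\}^{x_i} \{1 - \pi_i(y)\}^{1 - x_i}, \quad (x_i = 0, 1); \qquad (5)$$

$\pi_i(y)$ is thus the conditional probability of a response at the "upper" level on the ith manifest variable (also spoken of as a positive response).

We begin by listing some properties which the function $\pi_i(y)$ should possess and then consider whether functions exist which meet the specification. Let $\mathcal{F}$ denote the family of acceptable functions; then we claim that $\mathcal{F}$ should possess the following properties:

(i) 0 [...] 0, otherwise $R_{ij} - 1 < 0$ with equality only if at least one of $\pi_i(y)$ and $\pi_j(y)$ is constant.

Proof.

$$R_{ij} - 1 = \frac{E\pi_i(y)\pi_j(y) - E\pi_i(y)\, E\pi_j(y)}{E\pi_i(y)(1 - \pi_j(y))\; E\pi_j(y)(1 - \pi_i(y))},$$

hence the sign of $R_{ij} - 1$ is the same as that of $d_{ij} = E\pi_i(y)\pi_j(y) - E\pi_i(y)\, E\pi_j(y)$. Now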

(5)

ofa response at the"upper"levelon theithmanifest probability ;i(y)is thustheconditional variable(also spokenofas a positiveresponse). whichthefunction We beginby listingsomeproperties ;i(y) shouldpossessand then of Let # denotethefamily considerwhether functions existwhichmeetthespecification. thenwe claimthat# shouldpossessthefollowing properties: acceptablefunctions (i) 0 0, otherwiseRi -1 < 0 withequalityonlyifat leastone ofni(y)and icj(y)are constant. Proof Ri - 1 = {Eni(y) nj(y) - Eni(y) Enj(y))}/Eni(y)(1 - j(y))Eicj(y)(1 - i(y)) hence the signof Ri -1 is the same as thatof di,= Eni(y)7cj(y) - Eni(y)Eij(y). Now I

di

;i(y) {fj(y) - Eij(y)} dy.

=

Supposefirst that c1j(y) is monotonicdecreasing, thenwecan findy = y*suchthaticj(y)> Eicj(y) foryk y* and icj(y)< Eij(y) fory< y* so thatdij may be written ;i(y) {j(y)

dij=

-

Eij(y)} dy+

{

;i(y) {fj(y)- Eij(y)} dy.

If 7;i(y)is also monotonicnon-decreasing

J {icj(y) Enj(y)}dy+ 7ci(y*)J {j(y)

dij ;> i(y*)

-

-

Ej(y)} dy = 0.

If both functions are monotonic non-increasing a similar argument leads to the same conclusion; otherwise the inequality is reversed. Equality obviously occurs only when one or both functions are constant.

The practical relevance of this theorem is as follows. If we reverse the order of the categories on dimension i the probability of a positive response will be $1 - \pi_i(y)$ instead of $\pi_i(y)$. If $\pi_i(y)$ was formerly decreasing the corresponding probability for that dimension with the categories reversed will be increasing. In the one-factor case, therefore, it must be possible to re-order the dimensions so that the response functions either all increase or all decrease. If this is done all the $(R_{ij} - 1)$'s will be positive. For a one-factor model to be appropriate it is thus necessary (but not sufficient) that an ordering of the manifest dimensions exists such that all the cross-product ratios are greater than one. Suppose, for example, that a table with p = 4 has $R_{12} > 1$; $R_{13}, R_{14}, R_{23}, R_{24} < 1$; $R_{34} > 1$. Reversing the categories on a dimension changes the sign of $R_{ij} - 1$. In this case reversing the order on dimensions 1 and 2 will produce a set of positive values. In practice, of course, the $R_{ij}$'s have to be estimated from the sample cross-product ratios and so the signs cannot be determined with certainty. Nevertheless, if all or most of the $(R_{ij} - 1)$'s can be made positive it is worth trying a one-factor model.
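The screening check described above can be sketched in a few lines of Python. This is only an illustration: the function names, and the choice to store the $2^p$ table as a dictionary keyed by 0/1 response patterns, are assumptions made here and are not part of the paper. The first function collapses the table onto each pair of dimensions and forms the sample cross-product ratio; the second searches by brute force for a set of category reversals that makes every ratio exceed one.

```python
from itertools import combinations, product


def cross_product_ratios(counts, p):
    """Sample cross-product ratio for every pair of dimensions.

    counts maps each response pattern (a tuple of p zeros and ones)
    to its observed frequency. Empty cells in a two-way margin would
    cause a division by zero and are not handled here.
    """
    ratios = {}
    for i, j in combinations(range(p), 2):
        # Collapse the 2^p table onto the (i, j) two-way margin.
        n = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
        for cell, freq in counts.items():
            n[(cell[i], cell[j])] += freq
        ratios[(i, j)] = (n[(1, 1)] * n[(0, 0)]) / (n[(1, 0)] * n[(0, 1)])
    return ratios


def reversals_for_positive_association(ratios, p):
    """Return 0/1 flags (1 = reverse that dimension) making every R_ij > 1,
    or None when no such re-ordering exists."""
    for flags in product((0, 1), repeat=p):
        if flags[0] == 1:
            # Reversing every dimension leaves all ratios unchanged,
            # so the first dimension can be held fixed.
            continue
        ok = True
        for (i, j), r in ratios.items():
            # Reversing exactly one of i, j replaces R_ij by 1 / R_ij.
            effective = 1.0 / r if flags[i] != flags[j] else r
            if effective <= 1.0:
                ok = False
                break
        if ok:
            return flags
    return None
```

For the p = 4 pattern of signs quoted in the text the search returns the flags (0, 0, 1, 1), that is, reversing dimensions 3 and 4, which is equivalent to reversing dimensions 1 and 2.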


The following theorem holds for all members of the chosen family of response functions. It provides the basis for the approach to estimation proposed in the next Section and it also serves to display a feature which links various methods of factor analysis together.

Theorem 2.

$$E\pi_i(y)\pi_j(y) - E\pi_i(y)\, E\pi_j(y) = \tau^2\, (G^{-1})'(\alpha_{i0})\, (G^{-1})'(\alpha_{j0}) \sum_{k=1}^{q} \alpha_{ik}\alpha_{jk} + \text{terms of the 4th degree in } \alpha_{ik} \text{ and } \alpha_{jk},$$

where $\tau^2 = EH^2(y_j)$ $(i, j = 1, 2, \ldots, p;\ i \ne j)$.

The proof is based on a straightforward Taylor expansion of the response function and term by term integration.

The left-hand side is the predicted covariance between $x_i$ and $x_j$ and the theorem shows that this has a simple form if the departure from complete independence, as measured by the $\alpha$'s, is small. The covariances in the normal theory factor model have the form $\sum_{k=1}^{q} \alpha_{ik}\alpha_{jk}$ and this suggests we should look for sample functions which are estimates of the quantities

$$[\ldots] = E\pi_i(y)\pi_j(y) - E\pi_i(y)\, E\pi_j(y)\ [\ldots] \qquad (13)$$

If such can be found they would have the same structure (for small $\alpha$'s) as in the normal case and hence known methods of estimation could be used. If we take the logit form for G and H it turns out that

$$(R_{ij} - 1)/\sigma^2 = \sum_{k=1}^{q} \alpha_{ik}\alpha_{jk} + \text{terms of 4th degree}, \qquad (14)$$

where $\sigma^2 = E \log^2\{y/(1 - y)\} = 3.289868$. The sample cross-product ratio can be used to estimate $R_{ij}$ and hence the $\alpha$'s via (14). Since cross-product ratios are the "natural" measures of association in $2^p$ contingency tables it is satisfying to find them arising in this context, thus supporting the choice of the logit function.

If the probit functions are chosen then (13) is a first approximation to the tetrachoric correlation coefficients, which may be taken as partly justifying the heuristic method which carries out a normal factor analysis on these coefficients.

6. FITTING THE ONE-FACTOR LOGIT MODEL TO THE $2^p$ TABLE

6.1. Methods

Since the logit and probit functions are very similar one would expect both versions of the general model we have considered to give similar results and to involve about the same amount of calculation. The logit function is easier to compute than the probit but this advantage is likely to be fairly marginal. Bock and Lieberman (1970) developed a maximum likelihood method for the probit model and illustrated it on two examples with p = 5 and q = 1. The method involved extensive numerical integration and they suggested that it would not be feasible for p in excess of 10 or 12. Christoffersson (1975) found a faster method using a least squares fit of $E\pi_i(y)$ and $E\pi_i(y)\pi_j(y)$ $(i, j = 1, 2, \ldots, p)$ to their sample estimates. Muthen (1978) has made further improvements on this method by substantially reducing the amount of numerical integration required. It appears from this work that little information is lost by using only the first and second order margins for estimation. Programs could be provided for the logit model using the same methods and they would presumably involve similar amounts of computation.

However, the logit model has an important property which often makes it possible to obtain a simple approximate solution when q = 1. This solution also offers a good starting point for an iterative procedure by which it can be improved. The basis of the method rests on the fact that the approximation given by (14) is remarkably good even when the $\alpha$'s are far beyond the range


when they can be described as "small". Table 1 gives values of $(R_{ij} - 1)/\alpha_i \alpha_j \sigma^2$ for various combinations of $(\pi_i, \pi_j)$ and $(\alpha_i, \alpha_j)$. The approximation is to be judged by the closeness of the ratios to 1 (the second subscript on $\alpha$ has been dropped).

TABLE 1

Values of $c_{ij} = (R_{ij} - 1)/\alpha_i \alpha_j \sigma^2$. The entries are unchanged if $(\pi_i, \pi_j)$ is replaced by $(1 - \pi_i, 1 - \pi_j)$ and $(\alpha_i, \alpha_j)$ by $(\alpha_j, \alpha_i)$

(α_i, α_j)                  (π_i, π_j)

(2, 2)          0.942   1.192   0.984   1.119   1.280
(2, 1)          0.801   0.944   0.850   1.008   1.196
(2, 1/2)        0.614   0.668   0.644   0.731   0.820
(1, 1)          0.912   1.063   0.988   1.245   1.576
(1, 1/2)        0.846   0.934   0.912   1.125   1.372
(1/2, 1/2)      0.935   1.011   1.015   1.263   1.535
(1/2, 1/4)      0.917   0.965   0.977   1.139   1.288
(1/4, 1/4)      0.971   1.001   1.016   1.119   1.188
(1/10, 1/10)    0.994   1.000   1.004   1.018   1.003
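Entries of this kind can be checked by direct numerical integration. The sketch below is an illustration only and rests on an assumption that is not stated in this part of the text: that the one-factor logit response function has the form logit $\pi_i(y)$ = logit $\pi_i$ + $\alpha_i$ logit $y$, with y uniform on (0, 1). The function names and the midpoint rule are choices made here, not the author's.

```python
import math

# sigma^2 = E log^2{y / (1 - y)} for y uniform on (0, 1), i.e. pi^2 / 3 = 3.289868...
SIGMA2 = math.pi ** 2 / 3


def response(y, pi_i, alpha_i):
    """Assumed logit response function:
    logit pi_i(y) = logit(pi_i) + alpha_i * logit(y)."""
    eta = math.log(pi_i / (1 - pi_i)) + alpha_i * math.log(y / (1 - y))
    return 1.0 / (1.0 + math.exp(-eta))


def table1_entry(pi_i, pi_j, alpha_i, alpha_j, n=20000):
    """(R_ij - 1) / (alpha_i * alpha_j * sigma^2), midpoint rule on (0, 1)."""
    p11 = p10 = p01 = p00 = 0.0
    for k in range(n):
        y = (k + 0.5) / n
        a = response(y, pi_i, alpha_i)
        b = response(y, pi_j, alpha_j)
        p11 += a * b
        p10 += a * (1 - b)
        p01 += (1 - a) * b
        p00 += (1 - a) * (1 - b)
    # The common factor 1/n cancels in the cross-product ratio.
    r = (p11 * p00) / (p10 * p01)
    return (r - 1) / (alpha_i * alpha_j * SIGMA2)
```

Under this assumed form, `table1_entry(0.5, 0.5, 1, 1)` and `table1_entry(0.5, 0.5, 2, 2)` agree to three decimals with the values 0.912 and 0.942 for $(\alpha_i, \alpha_j)$ = (1, 1) and (2, 2), which suggests that those two entries correspond to $\pi_i = \pi_j = 1/2$.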

The worst cases occur when $\alpha_i$ and $\alpha_j$ are far apart and when $\pi_i$ and $\pi_j$ are small. Only positive values of $\alpha_i$ and $\alpha_j$ have been considered for reasons which will emerge below.

$R_{ij} - 1$ is not the only function of the expectations which has $\alpha_i \alpha_j \sigma^2$ as the first term of its expansion. The same is true, for example, of $\ln R_{ij}$. Unfortunately the approximation is much less good for this, and other functions, which we have investigated.

The basis of our method of estimation is to find estimates $\hat\alpha$ and $\hat\pi$ such that (a) the cross-product ratios for the model are as close as possible to those observed and (b) the marginal proportions of the model and the data agree exactly. The $\alpha$'s are found iteratively using the result of (14) as a starting point. We proceed by the following steps.

(1) Find a vector $\hat\alpha$ such that $\hat\alpha_i \hat\alpha_j$ is as close as possible to the estimated values of $c_{ij} = (R_{ij} - 1)/\sigma^2$ for $i, j = 1, 2, \ldots, p;\ i \ne j$.
(2) Find $\hat\pi$ by equating $E\pi_i(y)$ $(i = 1, 2, \ldots, p)$ to the corresponding marginal proportion using the vector $\hat\alpha$ obtained in step 1.
(3) Improve the estimate of $\alpha$ by a method to be described.
(4) Re-estimate $\pi$ using the improved $\hat\alpha$.
(5) Repeat the cycle until $\hat\pi$ and $\hat\alpha$ (or A) converge.

If many cycles of the iteration are required the amount of numerical integration required will prevent the method being used on a routine basis with present-day computers if p is, say, greater than 10. However, we shall show that the first approximation is often quite adequate for practical purposes. This can be obtained rapidly on a computer with no practical limit on p. The calculation of the expected frequencies from the estimated parameters does require numerical integration (with all methods) and this may take a considerable time for large p.

The method requires $p \ge 3$; if $p > 3$ it is not possible to reproduce the $\hat c_{ij}$'s exactly so we must find an $\alpha$ such that the distance between the $\hat c_{ij}$'s and the $\alpha_i \alpha_j$'s is as small, in some sense, as possible. Precisely this problem arises in normal theory factor analysis, in which context the $\hat c_{ij}$'s are covariances and the $\alpha$'s are factor loadings. One solution, also applicable here, is based on minimizing

$$\sum_{i=1}^{p} \sum_{\substack{j=1 \\ j \ne i}}^{p} (\hat c_{ij} - \alpha_i \alpha_j)^2.$$


It is an iterative procedure known as the "minres" method and will be found, for example, in Harman (1970).

An alternative method which is both intuitively appealing and easy to apply is as follows. We shall call it the row and column method.

Consider the matrix with elements $\{\alpha_i \alpha_j\}$ $(i, j = 1, 2, \ldots, p)$. This matrix has the property that

$$(i, j)\text{th element} = (\text{Row } i \text{ total}) \times (\text{Column } j \text{ total}) / \text{Grand total}. \qquad (15)$$

If we regard the $\hat c_{ij}$'s as estimates of the off-diagonal elements we can treat the estimation problem as one of finding diagonal elements for this matrix such that (15) holds. We can, in fact, ensure that (15) holds for every diagonal element. The $\alpha$'s we seek will therefore satisfy the equations

$$\alpha_i^2 = \frac{(C_i + \alpha_i^2)^2}{C + \sum_{j=1}^{p} \alpha_j^2} \quad (i = 1, 2, \ldots, p), \qquad (16)$$

where

$$C_i = \sum_{\substack{j=1 \\ j \ne i}}^{p} \hat c_{ij}, \qquad C = \sum_{i=1}^{p} C_i.$$

These equations are equivalent to

$$\alpha_i \sum_{j \ne i} \alpha_j = C_i \quad (i = 1, 2, \ldots, p). \qquad (17)$$

They have considerable appeal in their own right as a means of estimation since they result from equating the off-diagonal row totals to their observed values. Their solution is simplest if the $\alpha$'s are all of the same sign. On this question we have the following lemma:

Lemma. The two $\alpha$-vectors which satisfy (17) have elements of the same sign if and only if $C_i > 0$ for all i.

Before proceeding with the method of solution it is therefore necessary to ensure that $C_i > 0$ $(i = 1, 2, \ldots, p)$ by changing the order of categories, if necessary. Writing $A = \sum_{i=1}^{p} \alpha_i$, (17) becomes

$$\alpha_i (A - \alpha_i) = C_i \quad \text{or} \quad \alpha_i^2 - A\alpha_i + C_i = 0, \qquad (18)$$

which yields

$$\alpha_i = \tfrac{1}{2} A \pm \tfrac{1}{2} (A^2 - 4C_i)^{1/2} \quad (i = 1, 2, \ldots, p). \qquad (19)$$

If we sum both sides of this expression over i we shall obtain an equation for A. Once this is solved (19) will provide estimates of $\alpha_i$ $(i = 1, 2, \ldots, p)$. There is an ambiguity of sign involved in (19) which leads to two alternative equations for A. A is the real root, where it exists, of

$$p - 2 = \sum_{i=1}^{p} (1 - 4C_i/A^2)^{1/2}. \qquad (20)$$

Otherwise A is the real root of

$$p - 2 = \sum_{i=1}^{p-1} (1 - 4C_i/A^2)^{1/2} - (1 - 4C_p/A^2)^{1/2}. \qquad (21)$$
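The row and column method, as far as equations (17) to (20) go, can be sketched as follows. This is a hedged illustration rather than the author's program: the function name is hypothetical, bisection is used to locate the root of (20) as a simple choice made here, and the alternative equation (21) is not implemented, so the sketch stops with an error when (20) has no real root.

```python
import math


def row_column_alpha(c, tol=1e-12):
    """Solve alpha_i * (A - alpha_i) = C_i with A = sum(alpha), via (19)-(20).

    c is a symmetric p x p list of lists holding the estimated c_ij;
    only the off-diagonal entries are used.
    """
    p = len(c)
    C = [sum(c[i][j] for j in range(p) if j != i) for i in range(p)]
    if min(C) <= 0:
        raise ValueError("every C_i must be positive; reverse categories first")

    def g(A):
        # Right side of (20) minus its left side, p - 2.
        return sum(math.sqrt(max(0.0, 1 - 4 * Ci / A ** 2)) for Ci in C) - (p - 2)

    lo = 2 * math.sqrt(max(C))   # smallest A that keeps every square root real
    if g(lo) > 0:
        raise ValueError("equation (20) has no real root; equation (21) is needed")
    hi = 2 * lo
    while g(hi) < 0:             # g increases towards 2 as A grows
        hi *= 2
    while hi - lo > tol:         # bisection on the increasing function g
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    A = 0.5 * (lo + hi)
    # Equation (19), taking the minus sign for every i.
    alpha = [0.5 * A - 0.5 * math.sqrt(max(0.0, A * A - 4 * Ci)) for Ci in C]
    return A, alpha
```

When the off-diagonal entries are exactly $\alpha_i \alpha_j$, for instance with $\alpha$ = (0.5, 0.8, 1.0, 0.6), the routine returns A = 2.9 and recovers the original $\alpha$'s.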

It is not immediately obvious in what sense this procedure minimizes the distance between the $\hat c_{ij}$'s and the products $\alpha_i \alpha_j$. This becomes apparent when we observe that the same estimating


equations result from maximizing

$$\sum_{i=1}^{p} \sum_{\substack{j=1 \\ j \ne i}}^{p} \hat c_{ij} \ln \left\{ \alpha_i \alpha_j \Big/ \sum_{k=1}^{p} \sum_{\substack{l=1 \\ l \ne k}}^{p} \alpha_k \alpha_l \right\} \qquad (22)$$

with respect to the $\alpha$'s. The greatest possible value of (22) is $\sum_i \sum_{j \ne i} \hat c_{ij} \log(\hat c_{ij}/C)$, so by making (22) as near to this value as we can we are achieving the best fit in a certain sense. For the solution of (17) to be a maximum it is necessary for $C_i > 0$ $(i = 1, 2, \ldots, p)$. This also ensures that the $\alpha$'s will all have the same sign and thus that the argument of the logarithm in (22) is positive.

Having obtained $\hat\alpha$ we next estimate the $\pi_i$'s from

$$\frac{N_i}{N} = E\pi_i(y) = \int_0^1 \pi_i(y)\, dy \quad (i = 1, 2, \ldots, p), \qquad (23)$$

where $N_i$ is the number of positive responses on dimension i. These equations may be solved iteratively by the usual Newton-Raphson method using $\pi_i = N_i/N$ as a starting value.

The method so far rests on the supposition that $c_{ij} = \alpha_i \alpha_j$. This is only an approximation so we next write $c_{ij} = \alpha_i \alpha_j \theta_{ij}$, where $\theta_{ij}$ depends weakly on $\pi_i$, $\pi_j$, $\alpha_i$ and $\alpha_j$ but will usually be close to 1. Using the estimates of $\alpha$ and $\pi$ already obtained we next estimate $\theta_{ij}$ by

$$\hat\theta_{ij} = c_{ij}(\hat\pi_i, \hat\pi_j, \hat\alpha_i, \hat\alpha_j) / \hat\alpha_i \hat\alpha_j \quad (i, j = 1, 2, \ldots, p;\ i \ne j), \qquad (24)$$

$c_{ij}(\hat\pi_i, \hat\pi_j, \hat\alpha_i, \hat\alpha_j)$ being the value of $c_{ij}$ when the parameters are given the same values as the preliminary estimates. The cycle of estimation is now repeated by replacing the starting values $\hat c_{ij}$ by $\hat c_{ij}/\hat\theta_{ij}$.

It can happen, as one of the examples below shows, that the estimates of $\alpha$ do not appear to converge at all. The possibility of this is apparent from the fact that there is nothing in the model to prevent one or more $\alpha$'s being infinite. In such a case an iterative procedure starting from finite values may never terminate. This feature is not as serious as it might seem. When an $\alpha$ is large, greater than 2 say, big changes in $\alpha$ produce only small changes in the shape of $\pi_i(y)$ and hence in the overall fit of the model. From the point of view of interpretation all that matters is that the $\alpha$ in question is "large". In practice, therefore, no difficulty arises if we stop the iteration as soon as no worth-while improvement in the fit is obtained.

In most cases we have investigated, where the $\alpha$'s turn out to be small and of the same order of magnitude, convergence of the parameter estimates is rapid. This is especially true of $\hat\pi$.
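Step (2) of the procedure, the solution of (23) for one $\pi_i$ given $\hat\alpha_i$, can be sketched as below. Two things here are assumptions rather than the paper's own specification: the response function is taken to be logit $\pi_i(y)$ = logit $\pi_i$ + $\alpha_i$ logit $y$, and bisection replaces the Newton-Raphson iteration mentioned in the text because it needs no derivative. Function names are hypothetical.

```python
import math


def expected_response(pi_i, alpha_i, n=4000):
    """E pi_i(y) for y uniform on (0, 1), by the midpoint rule,
    under the assumed logit response function."""
    a0 = math.log(pi_i / (1 - pi_i))
    total = 0.0
    for k in range(n):
        y = (k + 0.5) / n
        eta = a0 + alpha_i * math.log(y / (1 - y))
        total += 1.0 / (1.0 + math.exp(-eta))
    return total / n


def solve_pi(prop, alpha_i, tol=1e-10):
    """Find pi_i such that E pi_i(y) equals prop = N_i / N.

    Bisection works because E pi_i(y) increases with pi_i.
    """
    lo, hi = 1e-9, 1 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_response(mid, alpha_i) < prop:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With $\alpha_i = 0$ the solution is simply the marginal proportion; for a positive $\alpha_i$ and a proportion above one half the fitted $\pi_i$ lies above the proportion.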

6.2. Examples

To illustrate the use of the one-factor logit model and to compare it with the probit method we shall give the results of fitting the model to seven sets of data. Two of these were used by Bock and Lieberman (1970), Christoffersson (1975) and Muthen (1978). They relate to 1000 cases on each of Sections VI and VII of the Law School Admission Test (LSAT). Background details and the original data are in Bock and Lieberman (1970). The results of fitting the logit and probit models are given in Table 2. For the logit we give the first approximation obtained from (14) and the final estimates after iteration. For the probit model we give Bock and Lieberman's maximum likelihood estimates, Muthen's generalized least squares (GLS) estimates and Muthen's unweighted least squares (ULS) estimates. The latter are obtained by doing a standard factor analysis on the tetrachoric correlations obtained from the table. We have re-parameterized the probit model to conform with (6) as explained later. For Section VI the fit by all methods is excellent with A almost equal to its expectation. The differences between the various parameter estimates are negligible and would have no effect on the interpretation of the factor. On these grounds there is therefore nothing to choose between


the logit and probit models. In the case of Section VII the fit is less good and there is greater variation in the estimates but, again, these are not sufficient to affect the interpretation of the analysis.

TABLE 2

Comparison of parameter estimates and goodness of fit for the LSAT data using the probit and logit models

Probit

Logit

SectionVI al 0(2 0(3

a4 a5

7rl 7r2 ir3 ir4 ir5 A

SectionVII a,

First approximation

Final estimate

00460 00431 00516 00401 00373 0-941 00731

0-410 0-424 0-538 0-391 0-351 0-938 0 730

0-418 0 433 0 537 0 404 0 359 0-924 0 709

0-417 0 455 0-510 0 457 0-380 0.925 0 707

0-784 0-885

0-763 0-870

0-762 0-870

0-562

0-785 0 887 21-24

0-663

0.555

0-552

0-763 0-870

21-28

00898

0-560 0-648 0-986

0-588 0-667 0.959

0.609 0-598 0.922

00444 0-876 00687 0-848 0-620 0-868

0-420 0-870 0-688 0-849 0-621 0-865

0-411 0-828 0.658 0-772 0.606 0.843

0-413 0.828 0-657 0 775 0-606 0-843

0430 0-828 0-658 0-772 0-606 0-843

00574

04

00455

0(5 7rl 7r2 ir3 i4 ir5 A

21-17

0-552

0-402 0-448 0.550 0-402 0 345 0-924 0 709

0 604 0-581 0 907

a2

a3

0-563

Maximum Muthen Muthen likelihood (GLS) (ULS)

33-1

0-465

32-21

0.462

0-480

0-480

31-59

From the computational point of view the "logit first approximation" is the simplest and can easily be carried out with a pocket calculator for problems of this size. In that case the integration in the solution of (23) can be avoided by using a normal approximation given by

$$\hat\pi_i \approx \Phi\{(1 + \hat\alpha_i^2)^{1/2}\, \Phi^{-1}(N_i/N)\}. \qquad (25)$$

The computing effort required for the final estimates depends on how many cycles of the iteration are necessary and this, in turn, depends on the accuracy required. No exact comparisons have yet been made with the various probit methods but it seems likely to be faster than the maximum likelihood method but slower than Muthen's GLS method.

The computation of the expected frequencies and the y-scores involves calculations of the same order for both models though here the logit has the slight advantage of being easier to calculate than the probit.

In social, as opposed to psychometric and educational applications, p is often quite small and what is wanted is a simple method of extracting one or two factors and a way of providing a scale of measurement for the latent variables. We therefore give five further examples of this kind


in which our main aim will be to see how good the first approximation is and to illustrate problems which may arise in the fitting process. The sets of data used are as follows.

Set I is taken from Lombard and Doering (1947) and it relates to knowledge about cancer. A sample of 1729 individuals were classified on four dimensions concerning sources of general knowledge, each having two categories as follows: (1) Radio/no radio; (2) Newspapers/no newspapers; (3) Solid reading/no solid reading; (4) Lectures/no lectures. A fifth variable was whether or not the respondent had a good knowledge of cancer. Here we shall look only at the first four variables to see whether there is evidence of a single latent variable which, we might anticipate, would have to do with how well informed people are in general.

Sets II and III are from Solomon (1961) and concern attitudes to science expressed by 2982 young people. They were divided into two equal groups on the basis of their IQ (High = II, Low = III). Attitudes were elicited in the form of positive or negative responses to four questions. The data and the questions are reproduced in Plackett (1974) which also contains Set I.

Set IV is a $2^5$ table from Upton (1978) where the questions from a survey on entry to the EEC were such as might be expected to relate to a latent political left/right variable.

Set V is taken from a study of mobility of the elderly and it is included here as an example which gives rise to problems in fitting.

TABLE 3

Parameter estimates and goodness of fit of the logit model for the five cases described in the text

                      I                II               III†               IV                V
                First   Final    First   Final    First   Final    First   Final    First   Final

α_1             0.444   0.445    0.169   0.195    0.168   0.164    0.962   0.986    2.757   1.695
α_2             1.228   1.550    0.448   0.400    0.097   0.161    0.351   0.397    2.682   2.382
α_3             0.864   0.860    0.818   1.068    1.143  15.225    0.546   0.571    0.275   0.457
α_4             0.506   0.456    0.217   0.223    0.242   0.168    0.998   1.074    0.386   0.594
α_5                                                                0.493   0.520

π_1             0.213   0.212    0.818   0.819    0.839   0.839    0.704   0.707    0.008   0.037
π_2             0.604   0.620    0.174   0.178    0.169   0.167    0.454   0.453    0.526   0.524
π_3             0.461   0.464    0.646   0.664    0.526   0.732    0.469   0.467    0.237   0.221
π_4             0.057   0.061    0.543   0.543    0.446   0.448    0.703   0.709    0.411   0.402
π_5                                                                0.389   0.387

A               23.71   19.15    11.80   11.10    17.03   12.92    89.83   90.40    21.62   33.31
Degrees of
freedom                 7                7                7                21               7

† After 15 cycles.

In all cases, except II, the value of Λ suggests that a further factor might be involved but the reduction in Λ from the case of complete independence is always very substantial. The first approximation is good on the whole though there are several marked discrepancies to which we turn in a moment.

In case III the estimates for α were not converging and the iteration was stopped after 15 cycles. At this point the value of Λ had virtually converged. In other words, the fit at this stage was hardly affected by changing the parameters. The extreme case, α = ∞, is equivalent to what is called a Heywood case in factor analysis where there is no uncertainty on the response once the value of the latent variable is fixed. It is this kind of response function which is assumed in Guttman scaling and there seems to be no reason for regarding it as anomalous in our model. In this case our first approximation is less good but it still shows the same basic pattern.

Case IV is interesting in that although the fit of the one-factor model is poor the first approximation actually provides a slightly better fit than the full iteration.

Case V arises from a $2^4$ table where one cross-product ratio was very large (273) whereas the others were in the range 2-4. Here it took 26 iterations to converge and there was a marked oscillation from one cycle to the next in the early stages. Even here, the first approximation gives the broad outline of the solution and the estimates of π are particularly good. We have other examples with very large cross-product ratios, including one for p = 7, where the iteration appears to diverge with many elements of π tending to zero. This appears to be because some of their $\pi_i(y)$'s have the Guttman form (see condition (vi) of Section 3) with $y_0$ near 0 or 1. Neither the logit nor the probit model can adequately cope with that situation.

A full analysis requires the calculation of expected frequencies and y-scores. These are given for Case I in Table 4.

TABLE 4
Fit of the one-factor model to Lombard and Doering's data on cancer knowledge

Cell     Observed     Independence    Fitted       y-score
         frequency    frequency†      frequency
0000        477          279.1          466.5       0.212
0001         12           23.6           16.1       0.304
0010        150          251.8          156.4       0.384
0011         11           21.3            8.6       0.475
0100        231          359.2          250.1       0.522
0101         13           30.4           18.9       0.615
0110        378          324.1          355.5       0.700
0111         45           27.4           44.0       0.797
1000         63           87.3           67.1       0.304
1001          7            7.4            3.0       0.394
1010         32           78.8           35.8       0.475
1011          4            6.7            2.4       0.567
1100         94          112.4           78.8       0.615
1101         12            9.5            7.5       0.711
1110        169          101.4          182.9       0.796
1111         31            8.6           34.4       0.889
Total      1729         1729.0         1729.0
Λ                       380.57          19.15

† These are the expected frequencies on the assumption of complete independence between the manifest variables.
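The "independence" column of Table 4 can be checked directly from the observed counts. The sketch below is illustrative only: it assumes that Λ is the usual likelihood-ratio statistic $2\sum O\log(O/E)$ (the definition is given earlier in the paper, not in this section), and the variable names are ours. Under that assumption the computed value comes out close to the tabulated 380.57.

```python
import math
from itertools import product

# Observed counts for cells 0000, 0001, ..., 1111 (Table 4).
observed = [477, 12, 150, 11, 231, 13, 378, 45,
            63, 7, 32, 4, 94, 12, 169, 31]
cells = list(product((0, 1), repeat=4))
n_total = sum(observed)

# Marginal proportion of "1" responses on each of the four dimensions.
margins = [sum(n for cell, n in zip(cells, observed) if cell[i] == 1) / n_total
           for i in range(4)]


def independence_frequency(cell):
    """Expected count when the four manifest variables are independent."""
    prob = 1.0
    for x, p in zip(cell, margins):
        prob *= p if x == 1 else 1.0 - p
    return n_total * prob


expected = [independence_frequency(cell) for cell in cells]

# Likelihood-ratio statistic (assumed form of Lambda).
lam = 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected))
print(round(expected[0], 1), round(lam, 2))
```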

Although the one-factor model is barely adequate it is a great improvement over the "independence" fit. The y-scores provide a useful ranking of the cells according to the degree of knowledge which the cell members exhibit.

7. MANIFEST VARIABLES WITH MORE THAN TWO CATEGORIES

7.1. Specification of the Response Function

When there are more than two categories on any dimension we need a response function to specify the probability of falling into each category. We do this by defining the cumulative probability

$$\pi_{is}(y) = \Pr\{\text{an individual with latent position } y \text{ falls in category } s \text{ or higher}\}, \qquad \pi_{i,0}(y) = 1, \quad \pi_{i,c}(y) = 0, \qquad (26)$$

where c is the number of categories. The probability that an individual falls into category s on the ith dimension is thus

$$\pi_{i,s}(y) - \pi_{i,s+1}(y). \qquad (27)$$

We suppose that the cumulative function has the same logit form as we used for the dichotomy, with parameters $\pi_{is}$ and $\alpha_i$. The α's do not depend on s. They measure the departure from independence and, as such, should be independent of the number of categories. This fact also ensures that the differences (27) are non-negative. The same approach could be used with the probit or any other response function of the form given by (6).
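As a minimal sketch of (26) and (27), the code below assumes that the logit form used for the dichotomy is logit $\pi_{is}(y)$ = logit $\pi_{is}$ + $\alpha_i$ logit $y$ with $y$ in (0, 1); that form comes from the earlier sections and is restated here as an assumption. The function names and the parameter values are hypothetical.

```python
import math


def logit(u):
    return math.log(u / (1.0 - u))


def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))


def cumulative_probs(y, pi_s, alpha):
    """Return [pi_{i,0}(y), ..., pi_{i,c}(y)] with the end conventions of (26)."""
    inner = [logistic(logit(p) + alpha * logit(y)) for p in pi_s]
    return [1.0] + inner + [0.0]


def category_probs(y, pi_s, alpha):
    """First differences (27): probabilities of categories 0, ..., c-1."""
    cum = cumulative_probs(y, pi_s, alpha)
    return [cum[s] - cum[s + 1] for s in range(len(cum) - 1)]


# Hypothetical three-category item; the pi_s must decrease with s.
probs = category_probs(0.3, pi_s=[0.7, 0.2], alpha=1.2)
print(probs)
```

Because a single α is shared by every cut point s, the cumulative curves never cross, which is why the differences stay non-negative.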

tothegeneral Theapproximate method alreadygivenforthe2Ptablecaneasilybeextended case.Atthecostofneglecting someinformation, thea's can be estimated usingthemethodof Thisshouldbe doneso as to makethe Section6 byreducing eachdimension to a dichotomy. marginal frequencies as nearlyequalas possible.Moreefficient methods would,ofcourse,be desirable. Oncethea's havebeenfoundthefullsetofni,'scan be estimated byequatingeach marginal cumulative frequency to itsexpectation. Ifthenumberofmanifest is at all largethecomputation ofthe variablesand categories expected frequencies andy-scores is a substantial undertaking, evenforthe2Ptable.It is thus tohavea systematic desirable of andthisis provided approachtothecalculations bytheresult Theorem3. Thestatement ofthetheorem requires somefurther definitions andterminology. Supposethatdimensioni has ri+1 categorieswhich,in accordancewithour earlier convention, are denotedby thelevels(0,1,2,...,ri).Each cell ofthetableis identified by a sequenceoflevelswhichcanbegenerated symbolically byforming a Kronecker (direct) product ofthevectorsoflevels.Thus,forexample, ifwe had twodimensions withlevels(0,1,2) and (0,1,2,3) thelisting ofthecellsis

[fl

x[?]=

00 01 02 03 11

(28)

20 21 22

23 Theorderofthecellsdependson theorderinwhichwetakethevectors. Thisis arbitrary but onceselected itmustbeadhered tothroughout. Itisconvenient torankthevectors inincreasing orderofri,as in theaboveexample, and to multiply outfromtheright-hand end.
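The symbolic Kronecker listing of (28) corresponds to an ordinary Cartesian product in which the right-most dimension varies fastest. A small illustration, with variable names of our own choosing:

```python
from itertools import product

# Levels of the two dimensions in the example of (28).
levels = [(0, 1, 2), (0, 1, 2, 3)]

# itertools.product varies the last factor fastest, which matches
# "multiplying out from the right-hand end".
cells = ["".join(str(level) for level in cell) for cell in product(*levels)]
print(cells)
```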

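In general, for a p-dimensional table, the listing of the cells is given by forming the product

$$\begin{bmatrix} 0 \\ 1 \\ \vdots \\ r_1 \end{bmatrix} \times \begin{bmatrix} 0 \\ 1 \\ \vdots \\ r_2 \end{bmatrix} \times \cdots \times \begin{bmatrix} 0 \\ 1 \\ \vdots \\ r_p \end{bmatrix}.$$

An E in front of such an expression means that the expectation is to be taken of all cell frequencies designated by the product. Finally, we define the $(r+1) \times (r+1)$ matrix $A_r$ as follows:

$$A_r = \begin{bmatrix} 1 & -1 & 0 & \cdots & 0 \\ 0 & 1 & -1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 & -1 \\ 0 & 0 & \cdots & 0 & 1 \end{bmatrix}.$$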

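This matrix forms the first differences of the elements of any column vector which it pre-multiplies. In particular, if it pre-multiplies a vector of cumulative probabilities such as given by (26) it yields the category probabilities of (27). The theorem is as follows:

Theorem 3. The expected cell frequencies can be computed from the formula

$$E\left\{ \begin{bmatrix} 0 \\ 1 \\ \vdots \\ r_1 \end{bmatrix} \times \begin{bmatrix} 0 \\ 1 \\ \vdots \\ r_2 \end{bmatrix} \times \cdots \times \begin{bmatrix} 0 \\ 1 \\ \vdots \\ r_p \end{bmatrix} \right\} = N\,[A_{r_1} \times A_{r_2} \times \cdots \times A_{r_p}]\; E\left\{ \begin{bmatrix} 1 \\ \pi_{11}(y) \\ \vdots \\ \pi_{1 r_1}(y) \end{bmatrix} \times \begin{bmatrix} 1 \\ \pi_{21}(y) \\ \vdots \\ \pi_{2 r_2}(y) \end{bmatrix} \times \cdots \times \begin{bmatrix} 1 \\ \pi_{p1}(y) \\ \vdots \\ \pi_{p r_p}(y) \end{bmatrix} \right\}.$$

(Note that the absence of the "×" sign before E on the right-hand side implies matrix multiplication of the standard kind.)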

Proof. The marginal probabilities that an individual falls into the various categories on the ith dimension, given y, are given by

$$A_{r_i} \begin{bmatrix} 1 \\ \pi_{i1}(y) \\ \pi_{i2}(y) \\ \vdots \\ \pi_{i r_i}(y) \end{bmatrix}. \qquad (29)$$

Thus the probability of falling into any cell of the table is obtained by multiplying together the relevant marginal probabilities for that cell (since the events are independent, given y). This is achieved for all cells together by forming the Kronecker product of the vectors (29) taken over i in the same order as in the statement of the theorem. Using the standard result that

$$(Ax \times By \times \cdots) = (A \times B \times \cdots)(x \times y \times \cdots)$$

the result then follows on taking expectations with respect to y.

The theorem identifies all the expectations which have to be evaluated. In all there are $(r_1 + 1)(r_2 + 1)\cdots(r_p + 1) - 1$ integrals to be calculated. The vector of expectations is then converted into one of expected frequencies by pre-multiplying by the Kronecker product of the differencing matrices. The theorem includes, as a special case, the $2^p$ table obtained by setting $r_i = 1$ for all i.

The y-score on dimension s for cell x is given by

$$E(y_s \mid x) = \int_0^1 \!\cdots\! \int_0^1 y_s\, p(y \mid x)\, dy = \frac{N \int_0^1 \!\cdots\! \int_0^1 y_s\, p(x \mid y)\, dy}{N \int_0^1 \!\cdots\! \int_0^1 p(x \mid y)\, dy}. \qquad (30)$$
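For a single latent variable (q = 1) both integrals in (30) are one-dimensional and can be approximated by a simple quadrature rule. The sketch below does this for the $2^4$ table of Case I, using the final estimates as transcribed from Table 3. It assumes the response function logit $\pi_i(y)$ = logit $\pi_i$ + $\alpha_i$ logit $y$ with $y$ uniform on (0, 1), which is taken from the earlier sections; the midpoint rule and all names are our own choices, and no claim is made that the output reproduces Table 4 exactly.

```python
import math
from itertools import product


def logit(u):
    return math.log(u / (1.0 - u))


def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))


# Final estimates for Case I, as transcribed from Table 3.
alpha = [0.445, 1.550, 0.860, 0.456]
pi = [0.212, 0.620, 0.464, 0.061]
N = 1729


def cell_prob_given_y(cell, y):
    """p(x | y): product of conditionally independent responses."""
    prob = 1.0
    for x, a, p in zip(cell, alpha, pi):
        p_y = logistic(logit(p) + a * logit(y))
        prob *= p_y if x == 1 else 1.0 - p_y
    return prob


def fit_cell(cell, n_nodes=2000):
    """Midpoint-rule versions of the denominator and numerator of (30)."""
    nodes = [(k + 0.5) / n_nodes for k in range(n_nodes)]
    denom = sum(cell_prob_given_y(cell, y) for y in nodes) / n_nodes
    numer = sum(y * cell_prob_given_y(cell, y) for y in nodes) / n_nodes
    return N * denom, numer / denom  # (expected frequency, y-score)


results = {cell: fit_cell(cell) for cell in product((0, 1), repeat=4)}
for cell, (freq, score) in results.items():
    print("".join(map(str, cell)), round(freq, 1), round(score, 3))
```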

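Theorem 3 provides a formula for the denominator of this expression. The numerator can be found by multiplying the Kronecker product after the E by $y_s$ and then evaluating the expression as before.

As an illustration consider the 2 × 3 case used as an example at the beginning of the section:

$$E\left\{ \begin{bmatrix} 1 \\ \pi_{11}(y) \end{bmatrix} \times \begin{bmatrix} 1 \\ \pi_{21}(y) \\ \pi_{22}(y) \end{bmatrix} \right\} = E \begin{bmatrix} 1 \\ \pi_{21}(y) \\ \pi_{22}(y) \\ \pi_{11}(y) \\ \pi_{11}(y)\pi_{21}(y) \\ \pi_{11}(y)\pi_{22}(y) \end{bmatrix}$$

and each element of this must be calculated by numerical integration. The resulting vector is now pre-multiplied by

$$A_1 \times A_2 = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} \times \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & -1 & 0 & -1 & 1 & 0 \\ 0 & 1 & -1 & 0 & -1 & 1 \\ 0 & 0 & 1 & 0 & 0 & -1 \\ 0 & 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$

which, after multiplication by N as in Theorem 3, gives the expected cell frequencies.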
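The 2 × 3 illustration can be sketched numerically as follows. The parameter values are hypothetical, the response function is again assumed to be logit $\pi_{is}(y)$ = logit $\pi_{is}$ + $\alpha_i$ logit $y$ with $y$ uniform on (0, 1), and the midpoint rule stands in for whatever integration method is preferred. The helper names are ours.

```python
import math


def logit(u):
    return math.log(u / (1.0 - u))


def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))


def diff_matrix(r):
    """(r+1) x (r+1) matrix A_r: 1 on the diagonal, -1 just above it."""
    return [[1 if j == i else -1 if j == i + 1 else 0 for j in range(r + 1)]
            for i in range(r + 1)]


def kron_matrix(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def kron_vector(u, v):
    return [x * y for x in u for y in v]


def cumulative_vector(y, pi_s, alpha):
    """(1, pi_{i1}(y), ..., pi_{i r}(y)) for one dimension."""
    return [1.0] + [logistic(logit(p) + alpha * logit(y)) for p in pi_s]


# Hypothetical parameters: a dichotomy and a three-category variable.
items = [([0.4], 0.8), ([0.7, 0.3], 1.5)]
N = 1000
n_nodes = 4000
nodes = [(k + 0.5) / n_nodes for k in range(n_nodes)]

# E{(1, pi_11)' x (1, pi_21, pi_22)'} by the midpoint rule.
expect = [0.0] * 6
for y in nodes:
    v = kron_vector(cumulative_vector(y, *items[0]),
                    cumulative_vector(y, *items[1]))
    expect = [e + t / n_nodes for e, t in zip(expect, v)]

A = kron_matrix(diff_matrix(1), diff_matrix(2))
frequencies = [N * sum(a * e for a, e in zip(row, expect)) for row in A]
print([round(f, 1) for f in frequencies])  # cells 00, 01, 02, 10, 11, 12
```

Only five of the six expectations need integrating, since the first element is identically 1; this agrees with the count $(r_1+1)(r_2+1) - 1$ given after the proof.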

8. EXTENSIONS AND EVALUATION

8.1. More than one Latent Variable

The methods of fitting the one-factor logit model extend in a natural way to the case of several factors. Theorem 2 holds for all q and hence any method for fitting the normal factor model to the off-diagonal elements of a covariance matrix can be used to provide approximate estimates for the α's. The new method given in Section 6.1 can also be used in the following way. First we fit the parameters $\alpha_1 = (\alpha_{11}, \alpha_{21}, \ldots, \alpha_{p1})$ as already described. Next we construct the residuals $c_{ij} - \hat\alpha_{i1}\hat\alpha_{j1}$. These will include negative quantities so signs must be changed to render the row totals positive. A second set of parameters $\alpha_2$ is then fitted to the residuals and π is re-estimated.

For example, in the case of Upton's (1978) data (Case IV, Table 3) the resulting estimates are

$$\hat\alpha_1 = (0.962, 0.351, 0.546, 0.998, 0.493),$$
$$\hat\alpha_2 = (-0.518, 0.241, -0.518, 0.520, 0.158),$$
$$\hat\pi = (0.704, 0.454, 0.469, 0.703, 0.389).$$

The estimate of π is very close to that for the one-factor model; Λ has been reduced from 89.83 to 61.47 which is still a poor fit. An iterative procedure could now be used to improve the estimates but the feasibility of this remains to be explored. Calculations of $c_{ij}$ for two-factor models similar to those given in Table 1 suggest that the approximation given by Theorem 2 is more restricted in its usefulness than when q = 1. Further investigation is required.

With more than two factors the probit model offers computational advantages. This is because the expectations $\{E\pi_i(y)\pi_j(y)\}$ can always be reduced to bivariate integrals however many latent variables there are. In fact it may easily be shown that

$$E\pi_i(y) = \Phi(\lambda_{i0}), \qquad (31)$$

$$E\pi_i(y)\pi_j(y) = \int_{-\infty}^{\lambda_{i0}} \int_{-\infty}^{\lambda_{j0}} \phi(z_1, z_2; \rho_{ij})\, dz_2\, dz_1 \quad (i \neq j), \qquad (32)$$

where φ is the standard bivariate normal density with correlation coefficient

$$\rho_{ij} = \sum_{k=1}^{q} \lambda_{ik}\lambda_{jk} \qquad (33)$$

where $\lambda_{ik} = \alpha_{ik} / (1 + \sum_{h=1}^{q} \alpha_{ih}^2)^{1/2}$ $(k = 0, 1, 2, \ldots, q)$. The various estimation methods proposed by Bock and Lieberman (1970), Christofferson (1975) and Muthen (1978) for this model are based essentially on (31) and (32). Their estimates of $\{\lambda_{ik}\}$ can easily be converted into estimates for $\{\alpha_{ik}\}$ which, in turn, would be good approximations to the corresponding parameters of the logit model.

From the point of view of interpretation and for reasons given below we prefer to use the logit parameterization for which

$$\pi_i = G^{-1}(\alpha_{i0}) = G^{-1}\left\{ \lambda_{i0} \Big/ \Big(1 - \sum_{h=1}^{q} \lambda_{ih}^2\Big)^{1/2} \right\}, \qquad \alpha_{ih} = \lambda_{ih} \Big/ \Big(1 - \sum_{h=1}^{q} \lambda_{ih}^2\Big)^{1/2}. \qquad (34)$$
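The conversion between the two parameterizations, and the identity (31), can be checked numerically. The sketch below assumes that the probit response function is $\Phi(\alpha_{i0} + \sum_k \alpha_{ik} z_k)$ with independent standard normal $z_k$ (that is, the uniform latent variable transformed by the inverse normal distribution function); the parameter values and function names are hypothetical.

```python
import math


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def alpha_to_lambda(alpha):
    """alpha = [alpha_0, alpha_1, ..., alpha_q] -> lambda, as below (33)."""
    scale = math.sqrt(1.0 + sum(a * a for a in alpha[1:]))
    return [a / scale for a in alpha]


def lambda_to_alpha(lam):
    """Inverse conversion, as in (34)."""
    scale = math.sqrt(1.0 - sum(l * l for l in lam[1:]))
    return [l / scale for l in lam]


def rho(lam_i, lam_j):
    """Correlation (33) between the two underlying normal variables."""
    return sum(a * b for a, b in zip(lam_i[1:], lam_j[1:]))


# Hypothetical one-factor item.
alpha = [0.3, 1.2]
lam = alpha_to_lambda(alpha)

# Midpoint-rule check of (31): E Phi(alpha_0 + alpha_1 z) over z ~ N(0, 1).
step = 0.001
grid = [-8.0 + step * (k + 0.5) for k in range(16000)]
expected_pi = sum(norm_cdf(alpha[0] + alpha[1] * z) * norm_pdf(z)
                  for z in grid) * step
print(expected_pi, norm_cdf(lam[0]))
```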

8.2. Comparisons with other Methods

It is not unusual to find factor analyses carried out on covariance matrices regardless of the form of the distribution of the manifest variables. For the $2^p$ table it is perfectly possible to treat the indicator variables $\{x_i\}$ just like any others and to perform a factor analysis on the estimated covariance matrix. Such an analysis can be roughly interpreted in terms of our model, when the $\alpha_{ik}$'s $(k = 1, 2, \ldots, q)$ are small, as follows. From Theorem 2, a factor model fitted to the off-diagonal elements will estimate the quantities $\{(G^{-1})'(\alpha_{i0})\,\alpha_{ik}\}$ $(k = 1, 2, \ldots, q)$. The variance of $x_i$ is $G^{-1}(\alpha_{i0})\{1 - G^{-1}(\alpha_{i0})\} + O(\alpha^2)$. For the logit function $(G^{-1})'(v) = G^{-1}(v)\{1 - G^{-1}(v)\}$ and hence the sample variances estimate $(G^{-1})'(\alpha_{i0})$ and therefore the α's are determined. It is doubtful whether such a procedure has any practical value.

We have already noted that the logit and probit models are likely to give similar numerical results. At the conceptual level there are considerable advantages in developing both models by means of the latent structure arguments used here. The traditional "factor analysis" approach assumes that there are two tiers of latent variables. First there is supposed to be a latent variable underlying each dichotomy; a positive response is then observed if that variable exceeds a threshold value. Secondly, these variables are related to the second tier of latent variables by the usual common factor model. This may be plausible in some applications but with dichotomies based on house ownership, trade union membership and such like, the notion of an underlying latent variable and its associated threshold is somewhat artificial. When we add to this the argument that the forms of the distributions of the latent variables are essentially arbitrary the usual model appears as no more than a convenient fiction. It is for these reasons that we prefer the parameterization in (34) which has a more robust interpretation.

Similar considerations apply to the hybrid model of Lord and Novick (1968) in which G is a logit and H a probit function. Samathanan and Blumenthal (1978) have given a maximum likelihood method for estimating its parameters similar to the EM algorithm. There is clearly room for further study of the numerical aspects of all models in the light of current, and the likely future, state of computer technology.

It is unfortunate that no other suitable response function has come to light for which the various integrals have simple explicit forms. If we are prepared to abandon the symmetry conditions we could consider such functions as

$$\pi_i(y) = y^{\alpha_i} \quad \text{or} \quad \pi_i(y) = 1 - (1 - y)^{\alpha_i} \qquad (35)$$

for q = 1. These models can be fitted very easily but, with only one parameter, they are not sufficiently flexible for most purposes. Introducing further parameters quickly destroys their simplicity. The approximate method of fitting our logit model seems to come nearest to combining simplicity and flexibility. Whether or not it is good enough to be generally useful for q > 1 requires further investigation. In the meantime the various methods for the probit model are available.

ACKNOWLEDGEMENTS

The approach on which this paper is based was first outlined in a paper read at the Society's conference at Oxford in March 1979. I am grateful to several participants for suggestions and especially to Dr J. A. Anderson whose remarks led to a major change of direction. The suggestions of referees and other readers of an earlier version have also led to many improvements. The method of fitting the logit model in Section 6 has been programmed in FORTRAN by J. Tomenson to whom I owe a special debt.

REFERENCES

AIGNER, D. J. and GOLDBERGER, A. S. (eds) (1977). Latent Variables in Socio-economic Models. Amsterdam: North-Holland.
BOCK, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29-51.
BOCK, R. D. and LIEBERMAN, M. (1970). Fitting a response model for n dichotomously scored items. Psychometrika, 35, 179-197.
CHRISTOFFERSON, A. (1975). Factor analysis of dichotomized variables. Psychometrika, 40, 5-32.
FIELDING, A. (1977). Latent structure models. In The Analysis of Survey Data, Vol. 1: Exploring Data Structures (C. A. O'Muircheartaigh and C. Payne, eds), pp. 125-157. London: Wiley.
GOODMAN, L. A. (1978). Analyzing Qualitative/Categorical Data: Log-Linear Models and Latent Structure Analysis. Reading, Mass.: Addison-Wesley.
HARMAN, H. H. (1970). Modern Factor Analysis, 2nd ed. Chicago: University of Chicago Press.
LAZARSFELD, P. F. and HENRY, N. W. (1968). Latent Structure Analysis. New York: Houghton Mifflin.
LOMBARD, H. L. and DOERING, C. R. (1947). Treatment of the four-fold table by partial association and partial correlation as it relates to public health problems. Biometrics, 3, 123-128.
LORD, F. M. and NOVICK, M. R. (1968). Statistical Theories of Mental Test Scores. Reading, Mass.: Addison-Wesley.
McDONALD, R. P. (1969). The common factor analysis of multi-category data. Brit. J. of Math. and Statist. Psych., 22, 165-175.
MUTHÉN, B. (1978). Contributions to factor analysis of dichotomous variables. Psychometrika, 43, 551-560.
PLACKETT, R. L. (1974). The Analysis of Categorical Data. High Wycombe: Griffin.
SAMATHANAN, L. and BLUMENTHAL, S. (1978). The logistic model and estimation of latent structure. J. Amer. Statist. Ass., 73, 794-799.
SOLOMON, H. (1961). Classification procedures based on dichotomous response vectors. In Studies in Item Analysis and Prediction (H. Solomon, ed.), pp. 177-186. Stanford: Stanford University Press.
UPTON, G. J. G. (1978). The Analysis of Cross-tabulated Data. London: Wiley.

DISCUSSION OF PROFESSOR BARTHOLOMEW'S PAPER

Professor MURRAY AITKIN (University of Lancaster): I am pleased to propose the vote of thanks for David Bartholomew's paper. The subject of latent variable models is of rapidly increasing practical importance and unifies a number of apparently unconnected statistical areas. Professor Bartholomew notes in Section 1 that latent variable models have not found wide favour with statisticians, partly because of the difficulty of fitting the models, and that attention to such models by statisticians is overdue. His paper tonight takes an important step towards developing proper latent variable models for categorical data.

The basis of the models is set out in Section 2. An early distinction is made between continuous and discrete latent variables, though the conditional independence model has been used with both types of latent variable. Given the properties (i)-(vi) in Section 3, the choice of response function comes down essentially to a probit/logit choice for both the manifest variables and the latent variables. Bartholomew chooses the logit/logit model, for reasons which are not entirely clear. Sections 4 and 5 discuss in detail the fitting of the model. Here the cross-ratios discussed play an important role. The LSAT example shows that the computing method for the logit model gives estimates consistent with the ML estimates for the probit model of Bock and Lieberman. The method is practicable, though it is not clear that it gives efficient estimates.

Latent variable models are natural candidates for ML estimation by the EM algorithm (Dempster, Laird and Rubin 1977). The logit model is unsuitable for EM, but the probit model is very suitable, as the sufficient statistics in the "complete data" model are just the usual regression sums of squares and cross-products. DLR pointed out that ML estimation in the normal factor model could be achieved by EM using simple back-and-forward least squares computations, and John Hinde has developed at Lancaster a GENSTAT macro for exploratory factor analysis using EM. Hasselblad, Stead and Creason, in a note to appear in Biometrics, point out that the standard probit analysis model for a dose-response curve can be fitted by EM. The combination of these two approaches allows the estimation of the parameters in the probit/probit latent variable model by ML using an EM algorithm. I am currently completing joint work with Darrell Bock on this procedure.

Professor Bartholomew notes that categorical latent variables may be treated by latent class analysis. While continuous latent variables are more natural choices for abilities and attitudes, it is of some interest that the simple latent class model also provides a scaling of the cell entries on a scale which is essentially continuous. This is of practical value because the computations for fitting the latent class model are the same as those for a general mixture model, and can easily be done in GLIM using an EM algorithm. The cancer knowledge example (Set I) provides a simple illustration.

The latent class model used is a two-component multinomial mixture. There are two classes of people: well-informed, and badly informed. In Professor Bartholomew's notation, y in (1) is a Bernoulli variable, with $P(y = 1) = \lambda$, $P(y = 0) = 1 - \lambda$. The response function of (5) is then just

$$\pi_i(x_i \mid y) = \pi_{iy}^{x_i} (1 - \pi_{iy})^{1 - x_i},$$

the y suffix indicating that there are two sets of $\pi_i$ in the two latent classes. The conditional probability function of y given x in (2) is then

$$P(y = 1 \mid x) = \frac{\lambda \prod_i \pi_i(x_i \mid y = 1)}{\lambda \prod_i \pi_i(x_i \mid y = 1) + (1 - \lambda) \prod_i \pi_i(x_i \mid y = 0)},$$

a monotone function of the likelihood ratio for the two components, and similarly for $P(y = 0 \mid x)$.

The EM algorithm begins with starting values for the probabilities of latent class membership, most simply by assigning each cell to one of the two classes. Parameter estimates are then obtained in the M-step from the conditional independence model. These are substituted into the likelihood ratio to give new probabilities of class membership in the E-step. The sequence of steps continues till convergence to the ML estimates of the $\pi_{iy}$. This is very simply accomplished in GLIM with a small macro. At convergence, the probabilities of latent class membership have also converged, and these also provide a ranking of cells from "most well informed" to "least well informed."

In addition, for the conditional independence model, the log of the ratio of the probabilities of class membership is a linear function of the $x_i$, a linear discriminant function, whose coefficients are the log-odds ratios for the ith item. This discriminant function can also be used to scale the individual cells. A small or zero coefficient indicates that the corresponding item does not discriminate between well and poorly informed classes, and can be dropped from the scale.

In the cancer data, the two-class model gives a goodness-of-fit value Λ of 15.4, using one extra parameter, so it fits as well as the one-factor model. The discriminant function is

$$1.43x_1 + 3.62x_2 + 2.35x_3 + 1.61x_4$$

assigning most weight to newspapers, next to solid reading, and least to radio and lectures. These coefficients are very similar in relative magnitude to the factor loadings in Table 3, column 2. The discriminant score, and the estimated probability of belonging to the well-informed group, are shown in Table D1, together with the cell code and the estimated factor score from Table 4.

TABLE D1

Cell     Factor (y)    Discriminant    Probability of being
         score         score           well-informed
0000       0.212          0.00             0.029
0001       0.304          1.61             0.131
0010       0.384          2.35             0.241
0011       0.475          3.96             0.613
0100       0.522          3.62             0.531
0101       0.615          5.23             0.850
0110       0.700          5.97             0.922
0111       0.797          7.58             0.983
1000       0.304          1.43             0.112
1001       0.394          3.04             0.386
1010       0.475          3.78             0.569
1011       0.567          5.39             0.868
1100       0.615          5.05             0.825
1101       0.711          6.66             0.959
1110       0.796          7.40             0.980
1111       0.889          9.01             0.996

A plot of the factor score against the discriminant score shows a very nearly linear relationship.

Similar results are obtained for the LSAT data of Bock and Lieberman. For Section 6, the value of Λ for the mixture model after 12 iterations is 23.9, with a discriminant function of

$$1.66x_1 + 1.48x_2 + 1.91x_3 + 1.32x_4 + 1.26x_5.$$

For Section 7, Λ is 35.5 after 12 iterations with discriminant function

$$1.76x_1 + 2.02x_2 + 2.67x_3 + 1.48x_4 + 1.37x_5.$$

The goodness-of-fit of both models can be improved by further iterations, without essentially changing the discriminant function.
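The EM steps described for the two-class model can be sketched in a few lines. This is an illustration only, not the GLIM macro referred to above: the starting values, the number of iterations and all names are our own assumptions, and the likelihood-ratio form of the fit statistic is assumed. No claim is made that it reproduces the quoted figures exactly.

```python
import math
from itertools import product

# Observed counts for cells 0000, 0001, ..., 1111 (Table 4).
observed = [477, 12, 150, 11, 231, 13, 378, 45,
            63, 7, 32, 4, 94, 12, 169, 31]
cells = list(product((0, 1), repeat=4))
n_total = sum(observed)


def cell_prob(cell, pi):
    """Probability of a response pattern under conditional independence."""
    return math.prod(p if x else 1.0 - p for x, p in zip(cell, pi))


def em_two_class(n_iter=200):
    # Soft start (an arbitrary choice): patterns reporting two or more
    # sources lean towards the "well-informed" class.
    post = [0.9 if sum(cell) >= 2 else 0.1 for cell in cells]
    history = []
    for _ in range(n_iter):
        # M-step: weighted proportions within each class.
        w1 = sum(n * w for n, w in zip(observed, post))
        w0 = n_total - w1
        mix = w1 / n_total
        pi1 = [sum(n * w * c[i] for n, w, c in zip(observed, post, cells)) / w1
               for i in range(4)]
        pi0 = [sum(n * (1.0 - w) * c[i]
                   for n, w, c in zip(observed, post, cells)) / w0
               for i in range(4)]
        # E-step: posterior probability of class 1 for every cell.
        post, loglik = [], 0.0
        for n, cell in zip(observed, cells):
            f1 = mix * cell_prob(cell, pi1)
            f0 = (1.0 - mix) * cell_prob(cell, pi0)
            post.append(f1 / (f1 + f0))
            loglik += n * math.log(f1 + f0)
        history.append(loglik)
    # Log-odds ratios: coefficients of the linear discriminant function.
    coef = [math.log(p1 * (1.0 - p0) / (p0 * (1.0 - p1)))
            for p0, p1 in zip(pi0, pi1)]
    # Likelihood-ratio statistic against the saturated model.
    lr_stat = 2.0 * (sum(n * math.log(n / n_total) for n in observed) - loglik)
    return mix, pi0, pi1, post, coef, history, lr_stat


mix, pi0, pi1, post, coef, history, lr_stat = em_two_class()
print([round(c, 2) for c in coef], round(lr_stat, 1))
```

The log-likelihood recorded at each cycle should never decrease, which is a convenient check that the two steps have been coded consistently.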

I referred at the beginning of these comments to the value of latent variable models in unifying apparently unconnected areas. I should like to conclude with an example: regression on principal components. It is common practice, or at least a commonly advocated practice, to extract principal variables from a highly correlated set of predictors, and then regress the response on a suitable subset of the principal variables. A latent variable model, the MIMIC model of Joreskog and Goldberger (1975), makes clear what principal component regression is trying to achieve. Underlying the set of predictors x is a set of latent variables z. Given z, the response y is independent of x and depends on z through a regression model. The x are conditionally independent given z:

$$y_i \mid z_i \sim N(\gamma + \beta' z_i, \tau^2)$$

independently of

$$x_i \mid z_i \sim N(\theta + \Lambda z_i, \Psi)$$

with Ψ a diagonal matrix, while marginally $z_i \sim N(0, I)$. The model can again be fitted by ML using an EM algorithm, and a GENSTAT program for this is under development by John Hinde.

In conclusion, I have much pleasure in proposing the vote of thanks for this stimulating and important paper.

Dr A. M. SKENE (University of Nottingham): Professor Bartholomew has described an approach to the analysis of ordinal data which is a very welcome addition to the rather sparse body of theory which exists at present in this area. The full impact must of course await future developments which release some of the computational constraints imposed by the present estimation procedure. The restriction to one or perhaps two latent variables must appear to be a very severe restriction to those accustomed to using normal theory factor analysis. However, accepting this limitation and aware of the arbitrary nature of both $p(y)$ and $\pi(x \mid y)$ I looked to possible uses of this logit latent structure model.

There are two areas of application: modelling and data reduction. A single continuous latent variable may provide a very good explanation for the pattern observed in a multidimensional contingency table, particularly if there are extra-statistical arguments to support the existence of such a variable. On the other hand, latent variables tend to be mental constructs and thus equally valid arguments can usually be made for a discrete formulation. The flexibility of latent class analysis as described by Goodman (1978) suggests that the discrete formulation should be our starting point, with the logit model being adopted when the latent class analysis reveals classes having a clear ordering.

The logit model effects data reduction by replacing x by $E(y \mid x)$, the factor scores. It follows from the conditional independence of the manifest variables given y, that it is a trivial matter to calculate $p(y \mid x^{(1)})$ where $x^{(1)}$ is any subvector of x. This, coupled with the fact that parameter estimation only requires knowledge of the one and two way margins of the manifest variables, leads to the observation that the logit model's applicability is unaffected by missing data.

Once $p(y \mid x)$ has been obtained however, its mean may be totally inappropriate as a summary measure. Fig. D1 displays two instances of $p(y \mid x)$ for the Lombard and Doering data. The problem is that of finding suitable summary measures for heavily skewed distributions. The choice of summary statistic is even more complicated when two latent variables are fitted and it is certainly dangerous to make much here of the analogy with normal factor analysis.

FIG. D1. Conditional distributions for Lombard and Doering data, each plotted against the y score. (i) $p(y \mid x' = (0,0,0,0))$. (ii) $p(y \mid x' = (0,1,1,0))$.

Effective data reduction is striking a balance between reducing dimension and retaining that information which is relevant to a specific objective. The real value of factor scores or conditional distributions cannot be judged in the abstract and it is pointless debating the meaning of these quantities. Their ultimate value must be judged by, for example, the accuracy of the final predictive equation or the insights gained into the subject of the analysis.

This point is relevant to all latent structure models and can be illustrated by the following model used for medical diagnosis. Given diseases $D_i$, $i = 1, \ldots, I$, and symptom vector S, one possible formulation of $p(S \mid D_i)$ is the latent class model

$$p(S \mid D_i) = \sum_{j=1}^{n} \prod_{k=1}^{K} p_k(S_k \mid C_j)\, p(C_j \mid D_i). \qquad (1)$$

Conditional upon latent class, $C_j$, we assume that the symptoms are mutually independent and independent of $D_i$. The parameters of this model, viz. the parameters of $p_k(\cdot \mid \cdot)$ and the probabilities $p(C_j \mid D_i)$, $j = 1, \ldots, n$; $i = 1, \ldots, I$, can be estimated using training data, Skene (1978), and, given a particular realisation of S, say T, disease probabilities $p(D_i \mid T) \propto p(T \mid D_i)\,p(D_i)$ can be computed.

Equation 1 makes absolutely no claim to be a representation of the truth. In any particular application it stands or falls by its ability to correctly diagnose patients.

There is a second way of writing this model. Given T, we may first calculate

$$p(C_j \mid T) \propto \prod_k p_k(T_k \mid C_j)\, p(C_j)$$

where

$$p(C_j) = \sum_i p(C_j \mid D_i)\, p(D_i)$$

and then calculate

$$p(D_i \mid T) = \sum_j p(D_i \mid C_j)\, p(C_j \mid T).$$
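That the two ways of writing the model give the same disease probabilities can be confirmed with a toy calculation. Every number below is hypothetical and chosen only for illustration; the function names are ours.

```python
import math
from itertools import product

# Hypothetical example: I = 2 diseases, n = 3 latent classes, K = 2 symptoms.
p_disease = [0.7, 0.3]                       # p(D_i)
p_class_given_disease = [[0.6, 0.3, 0.1],    # p(C_j | D_1)
                         [0.1, 0.3, 0.6]]    # p(C_j | D_2)
p_symptom_given_class = [[0.1, 0.5, 0.9],    # P(S_1 = 1 | C_j)
                         [0.2, 0.4, 0.7]]    # P(S_2 = 1 | C_j)
n_classes = 3


def class_likelihood(t, j):
    """prod_k p_k(T_k | C_j) for a binary symptom vector t."""
    return math.prod(p[j] if t_k else 1.0 - p[j]
                     for t_k, p in zip(t, p_symptom_given_class))


def direct(t):
    """p(D_i | T) from equation (1) and Bayes' theorem."""
    joint = [pd * sum(class_likelihood(t, j) * pc[j] for j in range(n_classes))
             for pd, pc in zip(p_disease, p_class_given_disease)]
    total = sum(joint)
    return [v / total for v in joint]


def two_step(t):
    """First p(C_j | T), then p(D_i | T) = sum_j p(D_i | C_j) p(C_j | T)."""
    p_class = [sum(pc[j] * pd
                   for pc, pd in zip(p_class_given_disease, p_disease))
               for j in range(n_classes)]
    weights = [class_likelihood(t, j) * p_class[j] for j in range(n_classes)]
    total = sum(weights)
    p_class_t = [w / total for w in weights]
    return [sum(pc[j] * pd / p_class[j] * p_class_t[j]
                for j in range(n_classes))
            for pc, pd in zip(p_class_given_disease, p_disease)]


for t in product((0, 1), repeat=2):
    print(t, [round(v, 4) for v in direct(t)],
          [round(v, 4) for v in two_step(t)])
```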

The probabilities $p(C_j \mid T)$, $j = 1, \ldots, n$, define a probability distribution over the latent classes and this alone is used in calculating the disease probabilities.

This particular formulation makes the two steps of the classification much clearer. The first step of data reduction is followed by the using of the transformed data. However this formulation is also very seductive as it exposes the latent classes and raises the possibility that they might have real meaning. Such emphasis is, in the main, unwarranted.

Professor Bartholomew, in effect, has described a rather different way of doing this first step of data reduction. The ultimate test of this particular model is whether effective predictions or good understanding of particular data sets result.

I have much pleasure in seconding the vote of thanks.

The vote of thanks was passed by acclamation.

Mr C. J. SKINNER (University of Southampton): I should also like to thank Professor Bartholomew for a very interesting paper. I particularly enjoyed the discussions of the response function in Section 3 and I note that certain latent class models may also be included in the general formulation of equation (6), if H becomes a discrete valued function.

My main comments concern the suggestion in this paper that the logit model is preferable to the numerically similar probit model, and I should like to offer a few words in defence of the probit model. One reason given for preferring the logit model is that Section 6 provides a simple approximate solution (at least when q = 1). This solution may, however, be viewed as an iterated analogue of the simple heuristic solution for the probit model, where the estimated $c_{ij}$'s correspond to the tetrachoric correlations. In fact, if one attempts to iterate the heuristic method in a corresponding manner, one finds that successive iterations give an identical solution, because, under the probit parameterisation, $E\pi_i(y)$ does not depend on the factor loadings and the corresponding $\theta_{ij}$'s are all unity. One advantage of such a non-iterated two-stage procedure is that conventional factor analysis packages or more sophisticated correlation structure packages such as LISREL (Joreskog and Sorbom, 1978) may be used with categorical variables or with combinations of categorical and continuous variables.

Available empirical evidence, as in Table 2, suggests that point estimates obtained by the heuristic procedure are very close to the full maximum likelihood estimates. A supposed problem with the heuristic procedure, as for example stated by Muthen (1978), is that of obtaining a statistical test of model fit. However, a chi-squared goodness of fit test may be obtained by direct analogy to Section 4 of this paper, where the computation of the test statistic requires the evaluation of a number of multivariate normal integrals. Perhaps a more difficult problem with the heuristic method is that of obtaining standard errors. As Professor Aitkin has noted, this problem is not referred to in this paper, and it would be interesting to know if the method in Section 6 can be simply adapted to give standard errors.

Finally, I find in teaching this subject that the probit model provides an easily understood modification of normal theory factor analysis, and I find it valuable to demonstrate simple links between contingency table analysis and continuous variable analysis.

Dr G. J. G. UPTON (University of Essex): Professor Bartholomew has set up an elegant mathematical superstructure which fills me, at least, with awe and wonder. I will therefore confine my comments to a discussion of the results that he has obtained.

Table 4 includes the y-scores for the Lombard and Doering data, which have apparently been obtained by numerical integration of (10). These scores are, however, simply connected to the estimates of the α parameters. Using subscripts i, j, k and l, each taking values 0 or 1 corresponding to the cell definition, I note that an excellent fit to the y-scores is given by

$$4.895\, y_{ijkl} = 1.024 + \alpha_1 i + \alpha_2 j + \alpha_3 k + \alpha_4 l.$$
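The closeness of this linear relation can be checked against the tabulated values. The short sketch below uses the final α estimates for Case I as transcribed from Table 3 and the y-scores as transcribed from Table 4; the variable names are ours.

```python
from itertools import product

# Final estimates of alpha for Case I (Table 3) and y-scores (Table 4),
# in cell order 0000, 0001, ..., 1111.
alpha = [0.445, 1.550, 0.860, 0.456]
y_scores = [0.212, 0.304, 0.384, 0.475, 0.522, 0.615, 0.700, 0.797,
            0.304, 0.394, 0.475, 0.567, 0.615, 0.711, 0.796, 0.889]
cells = list(product((0, 1), repeat=4))


def upton_score(cell):
    """Right-hand side of the linear relation, divided by 4.895."""
    return (1.024 + sum(a * x for a, x in zip(alpha, cell))) / 4.895


max_gap = max(abs(upton_score(c) - y) for c, y in zip(cells, y_scores))
print(round(max_gap, 4))
```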

I have been unable to find an explanation for this equation, but the fit is too good to be accidental.

I am unhappy about the scant attention paid by Professor Bartholomew to the interpretation of his results. In particular, I cannot believe that my book has been so widely read that it is unnecessary to define the manifest dimensions for data set IV. These are referendum votes (for or against entry into the Common Market), political allegiance in 1975, amount of schooling, union membership and social class. The order of categories of union membership is the reverse of the order given in my book. The positive $\alpha$-values place the minimally-schooled working-class anti-Common Market Labour union member at the left hand end of the political axis, which is reasonable. However, it is distinctly surprising that the strongest contrasts are those related to referendum vote and union membership rather than manifest political allegiance.

In Section 8 Professor Bartholomew fits a second latent factor to the referendum data. For this factor it is the minimally-schooled non-union middle-class anti-Common Market Conservative who is at one end of an axis. Could this be an age dimension? However, I am unhappy about the assumption that one of the two latent dimensions in the two-factor model will of necessity be the dimension found in the one-factor model. I would have felt that, in a case where there were really two dimensions, the single dimension found by fitting the one-factor model is more likely to be an over-worked hybrid lying between the two. My hypothesis could be tested by creating a data set which was derived, without random variation, from two latent variables, and then fitting the single latent variable model.

Mr G. J. A. STERN (I.C.L.): Many branches of science seem to follow the course of cosmological theories suggested in the lines, which consist of a couplet by Pope and a modern sequel:

    Nature, and Nature's laws lay hid in night,
    God said, "Let Newton be," and all was light.
    It did not last, the Devil shouting "Ho!
    Let Einstein be!" restored the status quo.

Darkness is followed by a clarifying theory, which is followed by worse darkness in the shape of sophistication leading to incomprehensibility and possibly meaninglessness, so that clarification follows an upside-down U-curve.

It seems to me that Spearman's G meant something: many measures of intelligence were highly correlated to a single factor. Likewise, I would suggest that linear combinations of the original variates, as is often the case with principal components, where the combination has a meaning, means something. It is harder to see what the full factor model means, with all sorts of non-orthogonality and the impossibility of expressing the factors in terms of the original variates (except by estimation).


If a factor analysis type of theory were to be applied at all, it should be, I suggest, to precise data where there are many readings so that precise estimates of parameters can be made. I would suggest that the cancer data, for example, is far from that. Can people really recall accurately from which combination of radio, papers etc. they got their cancer knowledge? I suggest that another sample would yield different answers, which would greatly alter the estimates of the parameters. Moreover, even the one-factor model is fitting eight parameters and a variate to what are really sixteen points, and a two-factor model would be worse still in leading to estimates with (as I believe) a huge variance.

I don't know what the answer is, even after playing with the data of Table 4, but would venture these suggestions:

(1) In fact even the independence fit is not all that bad having regard to the imprecision of the data. At most, I suggest, a slight modification of this assumption is needed.

(2) More than one factor should not be considered for the above reasons.

(3) Possibly the model would be more convincing if the parameters had a physical meaning, perhaps related to the correlation between the answers to the questions.

(4) If parameters as well as the y are needed, I would suggest that the $\alpha$'s should not be used but only the $\pi_i$'s.
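The counting behind the remark about eight parameters and sixteen points can be sketched as follows. The sketch assumes p binary items with two parameters per item in the one-factor model, and uses the usual multinomial convention that the residual degrees of freedom are the number of cells, less one, less the number of fitted parameters; the function name is illustrative only.

```python
def one_factor_counts(p):
    """Cells, free parameters and residual degrees of freedom for p binary items."""
    cells = 2 ** p                        # one cell per response pattern
    parameters = 2 * p                    # assumed: two parameters per item
    residual_df = cells - 1 - parameters  # cell probabilities sum to one
    return cells, parameters, residual_df


cells, parameters, residual_df = one_factor_counts(4)
print(cells, parameters, residual_df)  # 16 8 7
```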

In conclusion, I suggest that with social multivariate data we are often trying to explain, comment on, look at, rather imprecise figures in a way which adds to people's understanding of what the data is saying. Has this been achieved here? I think that quite a few assumptions have been built into the theory whose implications the user of the theory will often not fully comprehend, and so it will be hard for the user to know what has been achieved, in many cases. Certainly I think this is a core on which a clearer and simpler theory could be built, and I would hope that this will be done.

The foregoing expresses my own view.

Dr P. M. E. ALTHAM (Cambridge University): I would like to thank Professor Bartholomew for a useful and stimulating paper, and make two brief points.

(i) Although I should perhaps think further about Dr Upton's comments before I speak, my impression is that I would find it not too hard to interpret the latent structure models to a social scientist, and certainly not harder than interpreting a loglinear model with complicated high order interactions.

(ii) Latent structure models possess the following feature which I find attractive. The essential feature of the model is that for the observable variables $x_1, \ldots, x_p$, which are generally discrete, we postulate the existence of the latent variables y, such that given y, $x_1, \ldots, x_p$ are independent. Thus the joint distribution of any subset of $x_1, \ldots, x_p$ has the same structure as that of $x_1, \ldots, x_p$. This seems a desirable property in applications where the number of observable x's may not be very clearly defined; the social scientist would probably want to include or exclude extra x's or "questions" without drastically altering his model. This "invariance" feature is not shared by loglinear analysis, although of course it must be recognised that loglinear and latent structure analyses are addressing rather different problems, but for the same type of data.

A consequence of this property, as pointed out already by Dr Skene, is that Professor Bartholomew's logit model is "unaffected" by missing data; I only wish to put the positive advantages of the property more strongly.
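The invariance property in point (ii) can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's fitting method: it takes binary items, a single latent variable y uniform on (0, 1), a response function with logit of $\pi_i(y)$ equal to $\alpha_{i0} + \alpha_{i1}$ times logit(y), hypothetical parameter values, and a midpoint rule for the integral over y. It then confirms that summing the three-item joint distribution over the third item reproduces the two-item model with the same parameters.

```python
import math
from itertools import product


def response_prob(a0, a1, y):
    """pi_i(y) under the assumed form logit(pi) = a0 + a1 * logit(y)."""
    z = a0 + a1 * math.log(y / (1.0 - y))
    return 1.0 / (1.0 + math.exp(-z))


def joint_distribution(params, n_grid=2000):
    """Joint probabilities of binary x_1..x_p, with y integrated over Uniform(0, 1)."""
    probs = {x: 0.0 for x in product((0, 1), repeat=len(params))}
    for k in range(n_grid):
        y = (k + 0.5) / n_grid  # midpoint rule
        pis = [response_prob(a0, a1, y) for a0, a1 in params]
        for x in probs:
            cell = 1.0
            for xi, pi in zip(x, pis):  # conditional independence given y
                cell *= pi if xi == 1 else 1.0 - pi
            probs[x] += cell / n_grid
    return probs


# Hypothetical (alpha_i0, alpha_i1) pairs for three items.
params = [(0.3, 1.2), (-0.5, 0.8), (1.0, 1.5)]

full = joint_distribution(params)
marginal = {}
for (x1, x2, x3), pr in full.items():
    marginal[(x1, x2)] = marginal.get((x1, x2), 0.0) + pr

sub = joint_distribution(params[:2])
max_gap = max(abs(marginal[x] - sub[x]) for x in sub)
print(max_gap)
```

The gap is zero up to rounding error, because dropping an item only removes a factor that sums to one at every value of y.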
The following contributions were received in writing, after the meeting.

Professor E. B. ANDERSEN (University of Copenhagen): It has been very stimulating to read Professor Bartholomew's new unifying approach to latent structure analysis. The key issue is, of course, how to model in latent space. Although Professor Bartholomew argues very forcefully for always having a uniform distribution of the latent variable, it is important to note that many arguments demand that we consider latent distributions with parameters. We may thus be interested in comparing several latent distributions or we may be interested in changes in a latent variable over time. In such cases a statistical analysis will usually take the form of a comparison of the parameters of different latent distributions. One of the models mentioned by Professor Bartholomew (with G a logit and H a probit) was considered in a paper by Andersen and Madsen (1977) and it has recently been extended to cover the type of comparisons I mentioned above (Andersen (1980)). If one compares the approach of Professor Bartholomew with the results just mentioned, it appears that the $\alpha$-parameters of model (6) play different roles. Some of them are parameters connected with the manifest variables and some of them relate more to the latent variables. For an interpretation of the results of an analysis it may be worth the effort to make such a distinction.


As an example, if we consider the G-logit, H-probit model with one latent variable, the $\alpha_{i0}$ will combine item parameters of the simple logistic item characteristic curve model (or Rasch model) and the mean of the latent variable, while $\alpha_{i1}$ is a constant and equal to the standard deviation of the latent variable.

Dr J. A. ANDERSON (University of Newcastle upon Tyne): Professor Bartholomew's paper is doubly welcome because it combines two important topics, multivariate categorical data analysis and factor analysis. His paper provides a very helpful summary of necessary properties for factor models and contains the interesting results on the expectation of Rii. I agree entirely that these factor models are useful and important but I still retain some preference for the probit model.

The method of estimation suggested here is to fit the one-way margins exactly and to optimize a measure of goodness of fit of the two-way margins. A similar approach has been established for the probit model by Mackenzie (1976), with the advantages that (i) the equation (25) is exact and (ii) maximum likelihood estimation for the ($\alpha_{ij}$) conditional on the estimates ($\alpha_i$) is feasible for many more dimensions than p = 10; these can be shown to be asymptotically efficient and standard errors can be derived. Bock and Lieberman's (1970) limitation, p < 12, refers to the simultaneous estimation of ($\alpha_i$) and ($\alpha_{ij}$) and may in any case be superseded by better methods of optimization.

A more fundamental concern about the logit model, when there is more than one factor, relates to rotatability. In continuous factor analysis models with, say, k factors, it is possible only to estimate a k-dimensional factor space (Lawley & Maxwell, 1971). The choice of factors within this space is determined subjectively or by external criteria. The probit model for categorical factor analysis has exactly the same property. Rotation of the factor space corresponds to rotation of the factor loadings. However, the logit model appears not to be rotatable as, unlike the normal case, independent, homoscedastic logistic variates are not invariant under rotation. Since the probit and logit models are so close in other respects, I am concerned that there is an approximate rotatability in the logit model which would lead to numerical instability in the estimation procedure unless recognized and dealt with as in the continuous factor case.

This field has been neglected in the statistical literature and Professor Bartholomew is to be congratulated both on his results and on stimulating our interest.

Mr C. L. F. ATTFELD (University of Bristol): I found the paper particularly stimulating as the topic is one with which I am not altogether familiar and so the paper served to introduce me to the previous attempts to solve the problem, which, Professor Bartholomew shows, can be viewed as special cases of his more general approach. I was impressed by the accuracy of the parameter estimates obtained by the "logit first approximation" method for the one factor case which, Professor Bartholomew states, can be obtained on a pocket calculator. The approximation should prove an invaluable tool in the preliminary analysis of categorical data.

It would be interesting to see the result of relaxing the assumption of independence of the latent variables. In working with unobservable economic variables I find independence very difficult to justify. Would it be possible to work through the analysis without imposing the independence condition and then construct a test for independence?

I would argue with Professor Bartholomew's remark in Section 2 that latent variables which are "real", i.e. can in principle be measured directly such as "personal wealth", are quite rare. On the contrary most latent variables in economic theory are of exactly this form, e.g. permanent income, the expected rate of inflation, anticipated investment. It is true that in the majority of these cases there is no problem because the latent variables can be associated with manifest variables (or a transformation of them) which can be assumed to be continuous and distributed as multivariate normal. The models can then be estimated using the GLS procedure due to Brown (1974) or the maximum likelihood method due to Jöreskog outlined in Jöreskog and Sörbom (1974).

Professor H. GOLDSTEIN (University of London): I found Professor Bartholomew's paper interesting and useful, but I am a little puzzled by the importance he attaches to assumption (iv) in Section 3. This seems to me to lead to an unnecessarily restrictive set of models and I fail to see why, in general, $1 - \pi_i(y)$ should belong to the same family of functions as $\pi_i(y)$. For example, for an examination question with a correct/incorrect response one would normally expect an incorrect response, say to a multiple choice question, to be obtained by way of different mental processes to a correct response, and I would not therefore expect (iv) to hold. I have argued elsewhere (Goldstein, 1980) that the complementary log log function (which does not satisfy (iv) but does, incidentally, satisfy (ii)) may be in some circumstances a more appropriate one than the logit or probit for exam type data. Whilst I would accept Professor


Bartholomew's justification of (iv) for many kinds of data, it would seem unfortunate if it were to be used where more realistic functions are available.

On the topic of conditional independence (referred to as "local independence" in much of the psychometric literature), Professor Bartholomew asserts in Section 2.2 that if conditional independence were not true, then this implies that some other latent variable was exerting an influence on the manifest variables. I am not sure I agree. Suppose we have determined the latent space, and choose a set of individuals at a single point in this space, that is, all having the same set of values on the latent variables. If we consider a $2^p$ table of responses then conditional independence means that the response probabilities in this table depend only on the margins of the table and through these on the latent variables. This, however, seems to be rather a strong assumption, and even if conditional independence did not hold, we might still be able to relate the appropriate "interaction" probabilities in this table, via additional parameters (loadings), to the same set of latent variables. In practice we could presumably attempt this only if we had independent replicate observations on individuals, although this is a difficult thing to achieve in the social sciences. The dimensionality of the latent space is concerned with the between-individual variation, whereas the within-individual dependencies will determine how many parameters are associated with each latent variable.

I would like to endorse strongly what Professor Bartholomew says about the care needed in interpreting results from latent variable models. The history of factor analysis seems to be full of what are essentially mathematically convenient devices being confused with substantive reality. Of course, one way of inculcating a proper caution is if one can show that reasonable but different models, including some which do not satisfy (iv), can lead to quite different interpretations of a common data set. I very much hope that Professor Bartholomew will give some further thought to this issue.

The AUTHOR replied later, in writing, as follows.

The discussion has raised many fundamental questions which merit a more extended reply than the present limit on time and space allows. I am most grateful to those who submitted contributions and the following incomplete remarks are intended as a first contribution to what will I hope be a continuing discussion.

Several speakers have drawn attention to the possibility of using a latent class model in which the latent variable is categorical. As Mr Skinner pointed out, such a model with two latent classes arises as a special case of our general model. Thus suppose we choose

$$\pi_i(y) = \begin{cases} \pi_{i1}, & 0 \le y < y_0, \\ \pi_{i2}, & y_0 \le y \le 1, \end{cases} \qquad \text{(A)}$$

then $y_0$ is the proportion of the population in latent class 1. Models with three or more ordered latent classes arise similarly. The function in (A) satisfies all the conditions of Section 3 though it is not expressible in the form (6) and hence the parameters are not interpretable in the same way. In so far as (A) can approximate the logit (or probit) functions, or vice versa, we would expect the latent class model to give fits similar to ours. It is reassuring to find from Professor Aitkin's calculations that this is indeed the case and his examples will repay detailed study. It is worth noting that the coefficients of his discriminant functions exhibit the same pattern as the $\alpha$'s in our model. In particular they indicate that, in the example on cancer knowledge, written sources of information carry more weight than what is heard through radio or lectures.

The choice between a latent class and a latent variable model may be made on grounds of realism or computational convenience. My own experience suggests that continuous latent variables are almost always more realistic but in any case, as we have just shown, both kinds of model can be accommodated within the one framework. Professor Aitkin's case is thus essentially one for using step functions like (A). He claims the computational advantages which stem from the existence of computer programmes which can easily be adapted for fitting the model. There may well be short term advantages here but the main thrust of the paper was to look at a wide class of models within a framework which allows their relative merits to be assessed. I hope that this would caution against the premature adoption of second best models until the full range of options has been thoroughly explored and evaluated.

From a practical point of view there seems to be little at stake in the argument over whether a probit, logit or a hybrid version of the model should be used. My own decision to investigate the logit model was largely stimulated by the discovery of its close link with the cross-product ratios. The latter seem more natural measures of association than tetrachoric correlations and the remarkably good approximation provided by (14) may be indicative of some deeper connection yet to be uncovered. At the very least it justifies further study. In the long run I see no obstacle to making the general version of the logit model as