
Author: Neal Richardson
Constantino, G. & Malgady, R. (1996). Development of the TEMAS, a multicultural thematic apperception test: Psychometric properties and clinical utility. In G. R. Sodowsky & J. Impara (Eds.), Multicultural assessment in counseling and clinical psychology (pp. 85-136). Lincoln, NE: Buros Institute of Mental Measurements. (2 students)

Tamara Hammitt Bisbee
1st Year Doctoral Student
PY 870 Tests and Measurements
Department of Clinical Psychology
Antioch New England Graduate School
Keene, New Hampshire

In reviewing the pertinent literature, the authors ask the question: is personality testing culturally biased?

Pertinent psychometric definitions of bias:

Face Validity

The available evidence concerning the face validity of current psychometric instruments is qualitative; there is no empirical evidence concerning face validity to date. The authors note that some items on the MMPI-2 refer to culturally patterned beliefs, behaviors and feelings that are not pathological in certain Hispanic cultures. For example, Puerto Rican spiritualistic beliefs define mental illness as the invasion of evil spirits. In cases such as this, interview questions that infer pathology from spiritualistic responses would not be valid.

Implications for future research: The authors note that awareness of culturally patterned behavior does not mean that signs of dysfunction in mainstream culture should be disregarded simply because they are rooted in mainstream culture. Further, there are indications that clinicians who lack specific guidelines for taking culture into account in the diagnostic process develop their own notions, which are usually inaccurate. Consequently, the authors contend that specific research is needed on whether a test elicits an ostensibly valid assessment in the context of the client’s culture. Future research in this area should address whether items suspected of being biased on commonly used scales of psychopathology or psychodiagnostic criteria yield assessments that are discordant with those of other items that are clearly not biased.



Content Validity

The authors contend that if standardized test items understood to be biased are simply thrown out, content validity would suffer. Efforts to change current psychometric instruments in this way would be reductionistic. Efforts therefore need to be reconstructive, so that key elements of minority clients’ cultures are not lost. In other words, it is necessary to define which symptoms cross cultures and which are unique to a given culture.

Normative Differences between Populations

Evidence of normative differences:

1987 – The Hispanic Health and Nutrition Examination Survey reported higher rates of depression, as measured by the CES-D, among Puerto Ricans than among Mexican- and Cuban-Americans and Whites.
1987 – The DSM-III-R Diagnostic Interview Schedule indicated higher Puerto Rican rates of cognitive impairment, somatization and alcohol abuse/dependence than White rates.
1987 – A review of 37 studies of MMPI cross-cultural comparisons of Blacks, Hispanics and Whites yielded Hispanic-White and Hispanic-Black differences on selective scales in 6 out of 7 studies.
1992 – A comparison of native Puerto Ricans, Mexican-Americans and non-Hispanic Whites on five DSM-III disorders, as indicated by the DIS, showed that Mexican-Americans were at high risk for affective disorder and alcohol abuse/dependence, while Puerto Ricans were at the highest risk for somatization disorder.

The authors note that although there is empirical evidence of normative differences between ethnic populations, which would suggest the need for ethnic-specific test norms, adopting such norms would imply that no one population is more disordered than another. The danger is that if mean differences between ethnic populations actually reflect valid differences in pathology, separate norms would do a disservice to ethnic minorities, who would be less likely to receive services. Therefore, the authors contend that research is necessary to ascertain the real reason for the mean differences. If the differences stem from the lack of utility of current constructs, this would imply the need for new test construction, or what the authors refer to as reconstruction.



Invariance of Factor Structure

The authors suggest that, technically, evidence of cultural test bias would encompass a difference between ethnic groups in the number of factors, the pattern of factor loadings, the percentage of variance explained, or the correlations among factors.

Evidence of invariance across ethnic populations is limited and equivocal:

(1980, 1981) The Center for Epidemiological Studies Depression Scale (CES-D) was found to display similar factor structures among White, Black and Mexican groups.
(1979) Differences in the number and composition of MMPI factors among Whites, Blacks and Mexican-Americans were observed.
(1984) MMPI factor differences were not identified.

Consequences of differential latent structure: A test that profiles multiple scales derived from factor analysis may not be useful if the items do not hang together into the same factors with minority individuals as with majority individuals. A different arrangement of items into different scales would be warranted in this case, which would inevitably change the reliability of the test. If different factor structures underlie a total test score, the question of normative means is important. The test may be measuring different dimensions of the same construct across cultures, or measuring different constructs. Variations in the internal properties of a test across cultures imply the need for specific research to determine whether there is construct validity.
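One way to put a number on "the pattern of factor loadings" is Tucker's congruence coefficient, a widely used index for comparing loadings across groups. The chapter summary does not say which index the cited studies used, so the sketch below is only an illustration; the function name and the loadings are hypothetical.

```python
from math import sqrt


def congruence(loadings_a, loadings_b):
    """Tucker's congruence coefficient between two factor-loading vectors.

    Both vectors must list loadings for the same items in the same order.
    Values near 1 suggest the factor looks alike in the two groups.
    """
    if len(loadings_a) != len(loadings_b):
        raise ValueError("loading vectors must cover the same items")
    cross = sum(a * b for a, b in zip(loadings_a, loadings_b))
    norm = sqrt(sum(a * a for a in loadings_a) * sum(b * b for b in loadings_b))
    if norm == 0:
        raise ValueError("loadings must not all be zero")
    return cross / norm


# Hypothetical loadings of four items on one factor in two groups
# (made-up numbers, not taken from any study cited in the chapter).
majority = [0.70, 0.65, 0.60, 0.10]
minority = [0.68, 0.66, 0.15, 0.62]

print(round(congruence(majority, minority), 3))
```

A low coefficient for a factor would be one sign that the items do not hang together in the same way in both groups, which is the situation the authors warn about.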

Predictive Validity

The question of whether there are population differences in the way psychometric scores relate to an external criterion measure has not been studied. The authors identify two forms of validity in this area.

Differential validity: the equivalence across populations of the test’s prediction of an external criterion.
Differential prediction: the equivalence across populations of the regression equations used in that prediction.

The authors’ answer to the question: The authors conclude that there is some important empirical consensus that cultural factors do affect the outcomes of standardized testing and diagnosis. They take this as evidence of a need to develop culturally sensitive psychological tests for reliable and valid diagnosis and personality assessment of diverse individuals.

Background: Projective Tests

The Evolution of Thematic Apperception Tests

Thematic apperception techniques were developed on the psychodynamic assumption that individuals project often-repressed unconscious drives onto ambiguous stimuli. (This is akin to the idea of the blank slate in psychoanalytic technique.) In this case, the content of responses was considered key in the analysis of Thematic Apperception Test stories. TAT stories were thought to reflect fantasy material or primary process material. Ego Psychology’s introduction of the focus on the ego as an organizing structure shifted analysis to a primary focus on the theme of the story (fantasy or primary process material), with symbolic content being secondarily important. The advent of a focus on the cognitive component of psychology yielded the new idea that TAT stories are not fantasy material but products of conscious cognitive processes, which require analysis of both symbolic content and structure (ego defenses). Currently, there is an effort to use thematic apperception tests to integrate the assumptions of both ego psychology and cognitive-behavioral psychology, with particular interest in assessing problem-solving strategies, coping styles and self-instructional styles.

Psychometric Bias of Projective Tests

Because storytelling relies on language usage, projective tests are considered invalid for inarticulate individuals. While research suggests that Hispanic and Black children have been evaluated as less verbally fluent, less behaviorally mature and more pathological than nonminority children on the basis of their performance on projective tests, there is also strong research indicating that minority children are articulate when tested with culturally sensitive instruments. As early as 1943, it was discovered that the Thematic Apperception Test stimuli (pictures) had limited relevance to individuals from different cultures. Recent studies indicate that culture-specific stimuli were necessary for personality assessment of unacculturated Hopi and Zuni Indians, whereas the TAT was most effective with acculturated individuals. Further, since the relatively recent development of an objective scoring system and norms, the TAT has been found to have limited utility with European Spaniards. The TEMAS (Tell-Me-A-Story) test was created in response to this research; its development encompasses a correct etic orientation in order to demonstrate multicultural construct validity.

Development of the TEMAS

Theoretical Framework

The TEMAS is based on a dynamic-cognitive framework in which the context of the sociocultural system is foundational. Personality is understood to be a structure comprising motives that are learned, internalized dispositions and that interact with environmental stimuli to determine behavior in specific situations. When a test’s stimuli are sufficiently like the situations in which the personality functions were originally learned, these functions can be transferred to the testing situation and are reflected in the thematic stories.

Age

The TEMAS is normed for children and adolescents aged 5-13. It can be used clinically with children and adolescents aged 5-18.



Stimulus Cards

There are two parallel versions of the pictures. The minority version consists of pictures featuring predominantly Hispanic and African-American characters in urban environments. The nonminority version consists of corresponding pictures of predominantly White characters in urban environments. The personality functions depicted in both sets encompass identical themes. Both versions have a Long Form consisting of 23 cards and a Short Form consisting of 9 cards. Of the Short Form cards, 4 are administered to both genders and 5 are gender-specific. Of the Long Form cards, 12 are administered to both genders while 11 are gender-specific and 1 is age-specific. Notably, there are 4 cards with pluralistic characters which can be used interchangeably for both minority and nonminority versions (cards 15, 16, 20 and 21).



What the TEMAS measures

Cognitive Functions: There are 18 cognitive functions which can be scored for each protocol: Reaction Time (RT); Total Time (TT); Fluency (F); Total Omissions (OM); Main Character Omissions (MCO); Secondary Character Omissions (SCO); Event Omissions (EO); Setting Omissions (SO); Total Transformations (TRANS); Main Character Transformations (MCT); Secondary Character Transformations (SCT); Event Transformations (ET); Setting Transformations (ST); Inquiries (INQ); Relationships (REL); Imagination (IMAG); Sequencing (SEQ); and Conflict (CON).

Personality Functions: There are nine personality functions assessed, and each stimulus card pulls for at least one of them: Interpersonal Relations (IR); Aggression (AGG); Anxiety/Depression (A/D); Achievement Motivation (AM); Delay of Gratification (DG); Self-concept (SC); Sexual Identity (SEX); Moral Judgment (MJ); and Reality Testing (REAL).

Affective Functions: The TEMAS system evaluates seven affective functions: Happy (HAP); Sad (SAD); Angry (ANG); Fearful (FEAR); Neutral (NEUT); Ambivalent (AMB); and Inappropriate Affect (IA).

Examiner Qualifications

Examiners should be fluent in the language in which the examinee is dominant and should have knowledge of the cultural and ethnic/racial heritage of the individual being tested.



Administration

The test is administered individually. The instructions are standardized, and two types of instructions may be used: temporal sequencing or structured inquiries, both of which rely on explicit inquiries. Two times are recorded: the latency between card presentation and the beginning of storytelling (RT), and the total time the child takes to complete the story, including the structured inquiry time (TT).



Scoring

Each story is scored separately for cognitive, affective, and personality functions.

Cognitive: Fluency is indicated by the number of words per story. RT and TT are recorded for each story. Conflict is scored 1 if not recognized by examiner, and blank if omitted. Sequencing is scored 1 if it is omitted, and blank if noted. Imagination is scored 1 if the narrative is stimulus-bound and blank if it goes beyond the stimulus (hence exhibiting the use of imagination). Relationships is scored 1 if recognized (by the examiner) and blank if absent. Inquiries are scored 1 if they are not answered and blank if they are ALL answered. Omissions and Transformations are scored according to the number of omissions and transformations of Main Character, Secondary Character, Event and Setting.

Affective: All affective functions are scored 1 if they are present, blank if not.

Personality: Personality functions are scored on a Likert-type 4-point rating scale, with the least adaptive resolution receiving a score of 1 and the most adaptive resolution a score of 4.

Standardization Sample: 281 males and 361 females from public schools in New York City. Age range 5-13 years, mean age 8.9. Four ethnic/racial groups: Puerto Ricans, other Hispanics, Blacks, and Whites. SES: lower- and middle-income families.

Standard Scores: Raw scores for scales identified as Quantitative Scales are converted to T scores. For the Qualitative Scales, critical levels for the raw score distributions have been developed based on expert clinical opinion; scores at these levels should be labeled Clinical Indicators.
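For readers unfamiliar with T scores, the conventional linear definition rescales a raw score to a mean of 50 and a standard deviation of 10 within the norm group. The sketch below shows only that textbook formula; the summary does not say whether the TEMAS uses this linear form or normalized lookup tables, and the norm values in the example are hypothetical.

```python
def t_score(raw, norm_mean, norm_sd):
    """Convert a raw score to a linear T score (mean 50, SD 10).

    norm_mean and norm_sd describe the reference group for the scale;
    the values used below are invented for illustration.
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (raw - norm_mean) / norm_sd  # standard (z) score
    return 50 + 10 * z


# Hypothetical norms: mean raw score 10, standard deviation 4.
print(t_score(12, norm_mean=10, norm_sd=4))  # half an SD above the mean
```

A raw score equal to the norm-group mean maps to 50, and each standard deviation above or below the mean moves the T score by 10 points.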

Reliability

Internal Consistency: the degree to which individual cards are interrelated in measuring particular functions. Internal consistency reliabilities of TEMAS functions were obtained from a sample of 73 Hispanic and 42 Black children. Reliability coefficients for the Hispanic sample ranged from .41 for the affective function of Ambivalent to .98 for Fluency, with a median value of .73. In the Black sample, coefficients ranged from .31 for Setting Transformations to .97 for Fluency, with a median of .62. While Reaction Time, Fluency and Total Time demonstrated high levels of internal consistency in both minority groups, much lower coefficients were found for Omissions and Transformations of perceptual details, which the authors attribute to the fact that these functions constitute clinical scales and do not occur frequently in nonclinical populations. Coefficients for the cognitive function of Sequencing were low in the Black sample, and each minority group had different peaks in reliability for the Affective functions. Coefficients for personality functions varied, with some high levels of internal consistency in the Hispanic sample and uniformly lower alphas for Black children. The authors note that low reliabilities for personality functions may be partially due to the fact that relatively fewer cards are used for them. Coefficient alphas for the standardization sample, differentiated by racial/ethnic group for the Long Form, were mostly moderate, with a median alpha of .83 for the Quantitative Scales for the Total Sample. Reliability indicators for the Qualitative Scales were lower, likely due to the nonmetric nature of scoring.

Test-Retest Reliability: Test-retest reliability was computed for the Short Form using two administrations separated by an 18-week interval. The sample included 51 randomly chosen subjects from 210 Puerto Rican students screened for behavior problems. A generally low level of reliability was found, which the authors attribute to the possible introduction of interrater error from the use of different raters for pre- and post-testing.

Interrater Reliability: Estimates of interrater reliability have ranged from 31-100% in an early study to 75-95% in a later study, a change the authors attribute to the evolution of a more stable scoring process.
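The coefficient alphas and interrater percentages reported above can be made concrete with a short sketch. It uses the standard formula for Cronbach's coefficient alpha and simple percent agreement between two raters; the summary does not specify how agreement was computed in the TEMAS studies, so percent agreement is an assumption, and all data below are invented.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for rows of respondents by columns of cards.

    alpha = k / (k - 1) * (1 - sum of card variances / variance of totals)
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two cards")
    card_variances = [variance(card) for card in zip(*scores)]
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary")
    return k / (k - 1) * (1 - sum(card_variances) / total_variance)


def percent_agreement(rater_a, rater_b):
    """Percentage of stories on which two raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of stories")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return 100 * matches / len(rater_a)


# Hypothetical scores: four children, one function rated on two cards.
ratings = [[1, 2], [2, 1], [3, 4], [4, 3]]
print(round(cronbach_alpha(ratings), 2))

# Hypothetical present/absent scores from two raters on four stories.
print(percent_agreement([1, 0, 1, 1], [1, 0, 0, 1]))
```

Because alpha depends on the number of cards k, a function scored on only a few cards tends to show a lower alpha, which fits the authors' explanation for the weaker personality-function reliabilities.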



Validity

Content Validity: A panel of 14 clinicians of varying educational, experiential and ethnic backgrounds and clinical orientations showed very high agreement (71%-100%) on the psychological conflict identified in each card.

Criterion-related Validity: TEMAS profiles were found to significantly predict Ego Development, teachers’ behavior ratings, delay of gratification, self-concept of competence, disruptive behaviors and aggressive behavior, as measured by various psychological tests and by psychologists’ ratings of role-play situations. The multiple correlation for predicting trait was not significant. Hierarchical multiple regression analysis was also used to assess the utility of TEMAS profiles for predicting post-therapy scores on the criterion measures. Pretherapy TEMAS profiles significantly predicted all therapeutic outcomes, with variance explained ranging from 6% to 22%, with the exception of Self-concept of competence.

Psychometric Studies

Support for increased responsiveness of minority children to culturally relevant stimuli:

1983 – Assessment of the responsiveness of 72 Hispanic children in 4th and 5th grades from New York City public schools to the Thematic Apperception Test and the TEMAS. Students were more verbally fluent on the TEMAS cards, and this was more pronounced for females. The students were also more likely to respond in Spanish to the TEMAS and to switch from English on the TAT to Spanish on TEMAS administrations.
1983 – Comparison of TAT and TEMAS (both minority and nonminority versions) performances of 72 Hispanic, 41 Black, and 43 White students in grades K-6. Females were generally more fluent than males. Hispanics and Blacks were more responsive to both TEMAS versions than to the TAT, and only Hispanics were less fluent than Whites on the TAT. Effect sizes indicated small convergent and discriminant effects: ethnic minorities were more fluent on the minority version, whereas Whites were more fluent on the nonminority version.

Psychometric properties:

1984 – Administration of the TEMAS to 73 public school Puerto Rican children and 210 clinical Puerto Rican children of low SES in grades K-6 yielded estimates of internal consistency and interrater reliability. Further, TEMAS indices significantly discriminated between the public school and clinical samples.

1988 – TEMAS profiles of 100 Hispanic and Black psychiatric outpatients and 373 low-SES inner-city public school students discriminated between the two groups and explained 21% of the variance independent of ethnicity, age and SES. Discriminant accuracy was 89%. Better discrimination was evident for Hispanics than for African-Americans.
1991 – The TEMAS was administered to 152 normal and 95 clinical Hispanic, Black and White school-age children to measure attention to pictorial stimuli depicting characters, events, settings and psychological conflicts. ADHD children were significantly more likely to omit information about each aspect. Differences were large and persistent when structured inquiry was used. Results suggest a possible enlargement of the clinical utility of the TEMAS.
1991 – Comparison of normative profiles, reliability and criterion-related validity of the TEMAS with school and clinical children from three different Hispanic cultures: Puerto Ricans from New York City, San Juan natives, and Argentinians. Results supported the utility of the TEMAS in all three groups. Notably, Argentinian children scored lower in Moral Judgment on card 15 because of their tendency to perceive the presence of a policeman as punitive, reflecting their cultural experience, i.e., the military regime of the 1980s in which policemen were seen predominantly as punitive.
1992 – TEMAS profiles (nonminority version) discriminated between public school and outpatient samples of White children from inner-city, low to lower-middle SES, female-headed households. The TEMAS profiles discriminated between clinical and nonclinical samples with 86% accuracy.
1993 – The TEMAS profiles of 45 (of 80) Hispanic school-age children attending two mental health centers showed levels of agreement with DSM-III diagnostic categories ranging from .73 to .92.
1991 – Forty Mexican-American and Anglo-American 10- to 12-year-olds in grades 4-7 were administered the Roberts Apperception Test for Children (RAT-C) and the TEMAS in order to describe the relationship between level of acculturation and performance on the tests. Results indicated that the TEMAS seemed to be more culturally sensitive than the RAT-C in assessing Mexican-Americans; both tests seemed to be valid for assessing personality functioning among Anglo-American children.

TEMAS as therapeutic stimuli:

1994 – Ninety 9- to 13-year-olds were treated in this study with a culturally sensitive storytelling intervention of 8 weeks’ duration. (The TEMAS test was used as an intervention, NOT as an assessment instrument.) Results indicated significant improvement in anxiety, depression and phobic symptoms, and in school conduct, as measured by the CED and STAIC.
