On the law of intelligence

Author: Cordelia Ray
ARTICLE IN PRESS

Developmental Review xxx (2004) xxx–xxx www.elsevier.com/locate/dr

On the law of intelligence
William Lichten*
Koerner Center for Emeritus Faculty, Yale University, New Haven, CT 06520-8368, USA
Received 21 November 2003; revised 26 March 2004
Available online

Abstract

The law of intelligence is presented in test-independent form. Mental abilities, physical brain size, and infant motor capacity follow the same law of growth from birth to adolescence. Mental growth is independent of race, SES or the Flynn effect. The vitality of the mental age scale calls for a reexamination of Wechsler’s deviation IQ. This paper builds on Yen’s method of standardized differences (1986). The main theoretical advance here is to put development back into intelligence testing and to show a universality among different measures of the growth of the human nervous system.

© 2004 Elsevier Inc. All rights reserved.

This paper suggests a new theoretical structure for psychoeducational measurement and uses it to derive a scale of growth of mental ability. Over the years, many researchers sought the “law of intelligence,” the growth curve of mental ability, a goal to be addressed by this paper (Bloom, 1964; Bock, 1983; Gesell, 1928; Heinis, 1924; Jensen, 1973; Keats, 1982; Thorndike, Bregman, Cobb, & Woodward, 1927; Thurstone, 1925, 1928; Thurstone & Ackerson, 1929; and many others).

Remarks on natural laws

We consider quantitative relations in physics and psychophysics, fields that are sometimes emulated by mental testers.

* Fax: 1-203-432-8247. E-mail address: [email protected].

0273-2297/$ - see front matter © 2004 Elsevier Inc. All rights reserved. doi:10.1016/j.dr.2004.04.001

Scales

A pound of meat is a pound of meat. It weighs the same on the butcher’s scales whether by itself or if added to another piece of meat. The scale divisions are uniform in meaning over the entire range of measurement. Similarly, an inch is the same anywhere on a yardstick. Likewise, 1 s has a simple, well-defined and measurable meaning at any place or time. Thus the basic units of physics, mass, length, and time, and the laws based on them, are measured on uniform, well-standardized scales.

Psychophysical scales measure the relation between subjective sensations (such as loudness, pitch, and brightness) and objective physical correlates (sound intensity, frequency, and light intensity). The scales of psychophysics purport to be uniform. For example, one asks an observer to vary the intensity of one sound until it seems half as loud as a second sound. This way, one can set up a loudness scale which has equal divisions (Licklider, 1951; Stevens, 1951, 1975; Stevens & Davis, 1938; Woodworth & Schlosberg, 1954).

An example of a psychophysical law is that of Weber–Fechner: the sensation of pitch or loudness of a sound, brightness of a light, etc., is proportional to the logarithm of the corresponding physical variable. (For a sampling of the many discussions of this law and alternatives, see Engen, 1971; Luce, Bush, & Galanter, 1963; Luce & Krumhansl, 1988; Luce & Suppes, 2002; Stevens, 1975; Suppes & Zinnes, 1963; Thurlow, 1971; Woodworth & Schlosberg, 1954.)
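As a brief worked illustration, the logarithmic form of the Weber–Fechner law can be written out explicitly. The symbols here are notation assumed for this sketch and do not come from the paper: S is the sensation magnitude, I the physical intensity, I_0 a reference (threshold) intensity, and k a constant that depends on the sensory modality.

```latex
S = k \,\ln\!\left(\frac{I}{I_0}\right)
\quad\Longrightarrow\quad
S(2I) - S(I) = k \ln 2 \quad \text{for every } I .
```

Under this form, equal ratios of the physical variable correspond to equal steps of sensation, which is the sense in which such a scale would have uniform divisions.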

Is a law of intelligence possible?

Quantitative physical and psychophysical laws hinge on measurement scales. Can we set up a law of intelligence by merely following the examples of physics and psychophysics? Unfortunately the matter is not so simple. As Jensen (1993, p. 141) put it, “There are no existing tests that could render such statements as the following at all meaningful: ‘A person gains half of his adult level of mental ability by the age of five.’” The rationale underlying this statement is the impossibility of comparing directly the growth of intelligence at different ages. For example, infants are in Piaget’s sensorimotor stage:

. . .during the first year. . .intelligence, strictly speaking, is not yet observed. (Piaget & Inhelder, 1969, p. 9)

On the other hand, adults are in a formal operational stage. Comparing the two would be a case of apples and oranges.

The earliest intelligence measurements were expressed on a mental age (MA) scale (Binet & Simon, 1916). On Binet’s scale, test scores advanced by even amounts each year. But almost all subsequent mental tests showed a quite different growth pattern. Terman and Merrill (1937) noted that MA was a very uneven scale, with rapid mental growth among infants and young children and near stasis in late adolescence. Thus neither the MA scale nor its grade level achievement twin can answer Jensen’s rhetorical question. Yet both are quite useful and are still widely used in the clinical, developmental, and educational literature. In Terman and Merrill’s words (1937, p. 25):

The expression of a test result in terms of age norms is simple and unambiguous, resting upon no statistical assumptions. A test so scaled does not pretend to measure intelligence as linear distance is measured by the equal units of a foot-rule, but tells us merely that the ability of a given subject corresponds to the average ability of children of such and such an age.

It was a pity that the MA scale was dropped from IQ testing. This paper will use the MA scale to derive the law of intelligence. For further discussion of MA, see the later section IQ and Mental Age Scales.

Layzer (1972, p. 276) pointed out a related difficulty with IQ, as a measure of deviation of intelligence from the average at a given age: “IQ does not measure an individual phenotypic character like height or weight; it is a measure of the rank order or relative standing of test scores in a given population.” (See also Jensen, 1993.) To illustrate his point, consider this question. Which step in intelligence is the greater: from 100 to 130 or from 70 to 100 IQ points? 130 and 100 might be the difference between a research medical doctor and a butcher. On the other hand, 100–70 is the gap between average and mentally retarded, which under a recent Supreme Court decision can be the difference between life and death (Atkins vs Virginia, 2002; Greenhouse, 2002). Both intervals represent 30 points, but how can you compare the two? Thus it is again an apples and oranges problem to equate units at different points on existing mental ability scales.

Although we may not yet say what a mental growth scale is, we can certainly say what it is not. Mental ability is not like a pound of meat; it cannot be put on a scale where each interval is exactly equal to every other one in meaning and in size. If it were that simple, the problem of finding the law of intelligence would be solved and there would be no need for this paper. We now turn to the measurements which can be the basis of such a law.

The measurement of mental ability (IQ, achievement, etc.)

Without a simple, linear scale, how can we deal quantitatively with intelligence? In the words of Jensen (1969, pp. 5–6):

Intelligence, like electricity, is easier to measure than to define. And if the measurements bear some systematic relationship to other data, it means we can make meaningful statements about the phenomenon we are measuring. There is no point in arguing the question to which there is no answer, the question of what intelligence really is. The best we can do is to obtain measurements of certain kinds of behavior and look at their relationship to other phenomena and see if these relationships make any kind of sense and order.

Luce and Krumhansl (1988) pointed out that the situation is similar to the early days of the study of heat. Nobody knew exactly what temperature was. The basis of the concept was the subjective feeling of hot and cold. It took centuries, until the laws of thermodynamics were developed, before temperature was really understood. Nevertheless, pioneers went about constructing thermometers based on expansion of liquids. They marked their instruments at two standard temperatures and divided the interval between them into equal steps. For example, Fahrenheit’s scale had its zero at a mixture of ice and salt and its 96 at body temperature.

A simple way to compare scales is to match midpoints. For example, consider thermometers with standard temperatures at 0 and 100 °F. The scale midpoints for gas thermometers differ from each other by only a few thousandths of a degree. The midpoint of the mercury scale differs from gas thermometers by 0.1 °F and from alcohol by 1 °F. The excellent agreement among most thermometric materials means that thermometer scales are independent of the material used. Water is an exception. It would make a poor thermometer. It would read 81.3 °F at the midpoint of the 32–100 °F scale (66 °F). The reason for this gross discrepancy is the non-linear expansion of water. If one were to plot temperature from almost any scale against another, the plot would be a straight line. This linear agreement among scales made it reasonable to use any one to define temperature. Such scales preceded and agree with the now well understood laws of thermodynamics.

Note that we cannot directly compare different parts of the temperature scale with each other, as we might with two yardsticks by laying one on top of the other. There is no simple, direct way to compare the temperature intervals 0–10 °C and 90–100 °C. Yet the consistency and the linearity among scales make the measurement of temperature exact.

In conclusion, this paper is a search for scales of the growth of mental ability which do not depend on the specifics of the test used to measure it. A simple, practical test of the consistency of such scales is to compare growth midpoints. We follow Jensen (1969) and avoid the claim that this is the way that “intelligence” really grows. Rather, we shall compare the current scale with others in an effort to gain insight into the nature of intelligence and other mental abilities.
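The midpoint comparison described for thermometers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two expansion laws below (`linear_fluid` and `curved_fluid`) are hypothetical functions invented for the sketch, not measured data for gas, mercury, alcohol or water, and the function name `reading` is likewise an assumption.

```python
from typing import Callable


def reading(volume: Callable[[float], float], t_true: float,
            t_low: float, t_high: float) -> float:
    """Reading of a thermometer calibrated at two fixed points.

    The instrument is marked at t_low and t_high and the interval is
    divided into equal steps of fluid volume, as described in the text.
    """
    v_low, v_high = volume(t_low), volume(t_high)
    fraction = (volume(t_true) - v_low) / (v_high - v_low)
    return t_low + fraction * (t_high - t_low)


# Hypothetical expansion laws (illustrative only, not measured data).
def linear_fluid(t: float) -> float:
    return 1.0 + 2.0e-4 * t


def curved_fluid(t: float) -> float:
    return 1.0 + 2.0e-4 * t + 1.0e-6 * t * t


T_LOW, T_HIGH = 0.0, 100.0
T_MID = (T_LOW + T_HIGH) / 2.0

mid_linear = reading(linear_fluid, T_MID, T_LOW, T_HIGH)
mid_curved = reading(curved_fluid, T_MID, T_LOW, T_HIGH)

print(f"linear fluid reads {mid_linear:.2f} at the true midpoint {T_MID}")
print(f"curved fluid reads {mid_curved:.2f} at the true midpoint {T_MID}")
```

With these made-up coefficients the linear fluid reads 50 at the midpoint while the curved one reads about 41.7. A large midpoint disagreement of this kind is the signal that a material (or, by analogy, a test scale) does not share the common scale.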

Growth and variation. Local vs. global properties

The developmental psychologist Wohlwill (1973) split the growth of any quantitative psychological trait into a universal growth function (Allport’s nomothetic, 1942) and the individual variation about that function (Allport’s idiographic). McCall, Eichorn, and Hagerty (1977, p. 3) noted that developmental psychologists tend to slight one of these two factors:

Ironically, most empirical research has stemmed from an individual difference tradition in which cross age correlations were calculated between indices of mental performance, while the major theorist, Piaget, deals only with developmental function.

McCall’s criticism is particularly germane to IQ testing, where Wechsler’s (1939) deviation scale slights developmental function. One aim here is to overcome this lack.

Luce and Krumhansl (1988, p. 39) distinguished between local psychophysics, “which is concerned. . .with stimuli which are physically little different” vs. “global . . . sensations over the full dynamic range of the physical stimuli.” In physics, local properties are differential; global features are integral. Thus a differential equation governs the acceleration of a falling body at a given position. Finding the integral of this equation gives the overall motion of the body or projectile. Either form is derived from the other by means of calculus. The integral form contains more information (the boundary conditions) than the differential version. Newton’s laws give the acceleration of a projectile at each point of its trajectory, but it takes a whole chapter in physics textbooks to relate this local condition to global information, such as the time the object takes to fall to the ground or the path followed by a projectile.

Likewise, in psychophysics, Weber’s law, that the just noticeable difference (j.n.d.) of a stimulus is proportional to its magnitude, is a local relation stated in differential form. The Weber–Fechner law, which states that the sensation is proportional to the logarithm of the stimulus magnitude, is a global version. As in physics, the global and local relations can be derived mathematically from each other. However, in the psychophysical case the global law involves further assumptions. (For references, see the Remarks on natural laws section at the beginning of this paper.)

Importance of growth

Growth was at the heart of the first intelligence tests (Binet & Simon, 1916), which measured on the mental age scale but neglected to measure variation at a given age. On the other hand, the modern deviation IQ scale is based on variation only and gives no information about growth (Wechsler, 1939). For developmental psychology and education, growth is a sine qua non. Casual observations as far back as Aristotle have shown that two adults of the same age are more alike than a baby and an adult. The total growth of mental ability from birth to adulthood is large compared to population variations occurring at a given age (see Appendix).
The global nature of mental ability goes beyond variation (IQ) at a given CA and also includes the much larger growth and decay over the entire life cycle. We can view IQ and growth over a short term (such as a year) as local and thus as incomplete. Normally, factor analyses of IQ tests are based on data at the same CA. The general factor of such analyses is g, general intelligence (Jensen, 1998). If a factor analysis were instead made of data across the range of ages, for example in the WISC (Wechsler Intelligence Scale for Children), the general factor would by far be CA.

The problems of different fields are much more alike than their practitioners think. . .the physical sciences have learned much by storing up amounts, not just directions. . .being so uninterested in our variables that we do not care about their units can hardly be desirable. Tukey (1969, pp. 83, 86, 89)

The need for units

This paper’s goal is nomothetic: to find universal properties of the growth of ability. Accordingly, it aims to express measurements and derived scales in well-defined units. However, in psychology units are hard to come by. The psychophysicist, in the classical 19th century tradition of Wundt, Hering, and Helmholtz, measured mental events, such as brightness, loudness, and pitch, in terms of tangible, physical quantities like intensity and frequency. During the 20th century the word “psychophysics” became “psychometrics.” Mental testers had one physical variable (CA) upon which to hang their hats. In addition, standardized tests often use deviation based units like IQ or percentiles.

The present paper is based on normed, aggregated mental test data, the average and SD as a function of CA (for intelligence tests) or grade (for achievement tests). It works equally well and consistently on a variety of ability measures and derived scales: raw scores; MA; IQ; Thurstone (1925), Rasch, and Item Response Theory. Examples can be found in the Appendix and Yen (1986).

This paper limits its data to well standardized mental ability tests. The generic term “mental ability” covers a wide variety of standardized IQ, achievement, and infant development tests. Although some authors treat an even wider range of abilities (Gardner, 1983; Salovey & Mayer, 1990; Sternberg, 1997; Torrance, 1988; Torrance & Goff, 1989), none has made a standardized test which could be used in this monograph (Jensen, 1998). The goal of this paper is to find out to what extent an objective growth scale can be based on these measures.

Intelligence is what the tests test. Boring (1923, p. 35)

The tests

This paper builds scales from aggregated data (group or population averages), which are “true scores” (Gulliksen, 1987). It uses IQ, infant development and achievement tests from birth to adolescence. Standardized tests for teenagers and adults, such as the SAT (formerly the Scholastic Aptitude/Assessment Test), ACT (formerly American College Testing Program), GRE (Graduate Record Examination), LSAT (Law School Admission Test), MCAT (Medical College Admission Test), etc., and individual scores show idiosyncratic rather than lawful behavior and thus receive limited consideration in this paper. The tests consist of a nested series of components.

Items

The smallest unit of mental ability tests is the item, which consists of a single question or task.

Subscales

Similar items, such as vocabulary or arithmetic problems, are grouped together to form subscales.

Scales

Subscales in turn are combined to form scales. For example, the verbal scale in the Wechsler IQ tests is a combination of information, similarities, arithmetic, vocabulary, and comprehension subscales; the performance scale combines five subscales such as picture completion and block design.

The true (error-free) raw score T for persons of a certain ability is simply the total number of correctly answered or performed items listed in the norms manual for that group. Testing companies take a representative sample of the population to approach error-free tables in the manual. For example, an average 10-year-old would get 14 items correct on the WISC-III information subtest, which translates into a scale score of ten, according to the manual. The subscale scores are combined to arrive at scale scores, which are then combined to give an IQ for intelligence. Achievement test scores usually are given in percentiles (deviation score) or in grade level as a growth measure. The full scale score (or composite score) combines all scales.

This paper uses the terms subscales, scales, and full scale. Scale score should be distinguished from the term scaled score or standard score z, which is the deviation of a raw score T from the mean M, expressed in standard deviation units: $z = (T - M)/\sigma_T$.

IQ and mental age scales

The IQ and MA scales are test independent (all error-free tests give the same true IQ and MA). On an easy test, an average 10-year-old may get 75 right answers on 100 items; on a hard test, the score might be only 25 items correct. On a sufficiently large sample of a representative population, both scores would assign the same IQ and MA to the average 10-year-old or to a sample of persons of any age with the same mental ability.

For intelligence tests, MA is a single, easily understood quantity in well-defined units of years (see Fig. 1), as is grade level for achievement. MA’s leveling off in late adolescence is characteristic of mental tests. Terman and Merrill (1937, p. 25) noted:

. . .the mental age unit. . .appears definitely to decrease with age. . .the difference between 1-year and 2-year intelligence (100 IQ points-auth.) is so great that any one can sense it. . .The difference in intellectual ability between the average child of fifteen and the average child of 16 (then 5 IQ points-auth.) is so small that it can barely be detected by the most elaborate mental tests.
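The test independence of MA can be illustrated with a small Python sketch. It assumes that a norms table gives the average raw score at each age and that MA is read off by linear interpolation between tabulated ages. The two norms tables and the name `mental_age` are hypothetical, chosen only so that the average 10-year-old scores 75 on the easy test and 25 on the hard one, as in the example above; they are not taken from any real manual.

```python
from bisect import bisect_left
from typing import List, Tuple

Norms = List[Tuple[float, float]]  # (age in years, mean raw score)

# Hypothetical norms tables (illustrative only).
EASY_NORMS: Norms = [(8, 55), (9, 66), (10, 75), (11, 82), (12, 87)]
HARD_NORMS: Norms = [(8, 12), (9, 18), (10, 25), (11, 33), (12, 42)]


def mental_age(raw_score: float, norms: Norms) -> float:
    """Age at which the average child obtains raw_score.

    Assumes mean scores increase strictly with age and interpolates
    linearly between neighbouring ages in the table.
    """
    ages = [age for age, _ in norms]
    scores = [score for _, score in norms]
    if not scores[0] <= raw_score <= scores[-1]:
        # MA is undefined outside the range covered by the norms.
        raise ValueError("score outside the normed range")
    i = bisect_left(scores, raw_score)
    if scores[i] == raw_score:
        return float(ages[i])
    a0, a1 = ages[i - 1], ages[i]
    s0, s1 = scores[i - 1], scores[i]
    return a0 + (a1 - a0) * (raw_score - s0) / (s1 - s0)


print(mental_age(75, EASY_NORMS))  # raw score on the easy test
print(mental_age(25, HARD_NORMS))  # raw score on the hard test
```

Both calls return an MA of 10 years even though the raw scores differ by a factor of three, which is the sense in which the scale does not depend on the particular test. The `ValueError` branch reflects the fact that an age-norm lookup has nothing to say about scores beyond the ends of the table.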

Conversely, the SD or IQ unit, measured in MA or grade units, increases with age (see Fig. 1 and Eq. (A.2)). At birth, $\sigma_{MA}$ should vanish. This behavior may seem strange for MA and its SD (Fig. 1) in isolation. When both are combined to form a standardized growth function (in the Appendix to this paper), MA falls in line with other mental test scales (Fig. 18 and Table 1).

Fig. 1. The mental age (MA) scale (Terman & Merrill, 1937). Vertical bars: ±1 SD.

MA (and grade level) have the shortcoming that neither can handle children who fall outside the scale range at either end. The reason is that both MA and grade level can only refer to average abilities, and average ability never reaches above average children at the top of the scale, nor does it reach below average children at the bottom of the scale. For example, the top of the California Achievement Scale (CAT) is at the grade level of 12 years 8 months, the last grade at which exams are given. When a student, class, or school average is higher than that, an arbitrary score of “12.8” is assigned to it. The Iowa Achievement Test (ITBS) uses a fictitious scale going up to the 18th grade to handle above average 12th graders. In either case, the number assigned has little meaning.

The growth scale used here is given in units of standard scores for each age. This scale is extended simply by adding standard scores to it. Likewise, deviation IQ has no problem handling exceptional persons at any age.

Since the time that MA was dropped by Wechsler (1939), IQ test scales have been deviation based. These scales measure variation but not growth and thus are subject to the criticism voiced by McCall et al. (1977) and others. Indeed, one must make correlational, longitudinal studies to study mental development (Anderson, 1939; Bloom, 1964; Furfey & Muehlenbein, 1932; McCall et al., 1977). To remedy that lack, this paper uses the MA scale inter alia to handle both growth and variation.

Standard scores

The familiar standard scale z equates tests by aligning the population means and standard deviations (SD). For example, IQ has a mean of 100 and SD = 15. Figs. 2 and 3 show another example: distributions for two college entrance examinations, the SAT-Verbal (mean = 505, SD = 111) and the ACT-English (mean = 20.4, SD = 5.4). Fig. 4 plots both distributions against standard scores. (The vertical heights of both distributions also are normalized.) The distributions of standard scores equate well.
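The alignment of means and SDs described above amounts to a simple linear conversion, sketched here in Python. The means and SDs are the ones quoted in the text; the function names (`z_score`, `from_z`, `equate`) are assumptions made for this sketch, and the conversion is only meaningful to the extent that the two distributions have the same shape.

```python
from typing import Tuple

Scale = Tuple[float, float]  # (mean, SD)

# Means and SDs as quoted in the text.
SAT_VERBAL: Scale = (505.0, 111.0)
ACT_ENGLISH: Scale = (20.4, 5.4)
IQ: Scale = (100.0, 15.0)


def z_score(score: float, scale: Scale) -> float:
    """Deviation of a score from the mean, in SD units."""
    mean, sd = scale
    return (score - mean) / sd


def from_z(z: float, scale: Scale) -> float:
    """Score on a given scale that corresponds to standard score z."""
    mean, sd = scale
    return mean + z * sd


def equate(score: float, source: Scale, target: Scale) -> float:
    """Map a score to another scale by matching standard scores."""
    return from_z(z_score(score, source), target)


sat_score = 616.0  # one SD above the SAT-Verbal mean
print(z_score(sat_score, SAT_VERBAL))
print(equate(sat_score, SAT_VERBAL, ACT_ENGLISH))
print(equate(sat_score, SAT_VERBAL, IQ))
```

A SAT-Verbal score of 616 lies one SD above the mean, so it maps to about 25.8 on ACT-English and to 115 on the IQ scale. This is a sketch of linear equating by standard scores, not a claim about how either testing program publishes concordances.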

Fig. 2. SAT-Verbal score distribution.

Fig. 3. ACT English score distribution.

Deviations between two sets of measurements with effect sizes of less than 0.2 SD are considered small (Cohen, 1988). Inspection of Fig. 4 shows both tests to be interchangeable within this precision over the range of z values between −2 and +2. The SAT and ACT align because both have the same shaped distributions and the huge number of tests irons out statistical fluctuations. For most tests, the distributions near the mean (|z| ≤ 2) are close to normal, which makes alignment of standard scores practical. For larger |z| (Figs. 2 and 3), the curves deviate from each other and results become less comparable.

Fig. 4. Scaled score distributions of SAT-V and ACT English.

Norming samples at each age for IQ tests are only a few hundred at most. For deviations of |z| ≫ 1 (IQ well outside the normal range of 85–115), there are so few cases in the norming samples that one cannot talk of distributions at all and it is impossible to compare different IQ tests. For example, the WISC-IV standardization (norming) sample consisted of 200 children at each age. Assignments of exceptional children (gifted or retarded: IQ > 130 or
