Detection of Geometric Image Distortions at Various Eccentricities

Jyrki Rovamo,* Pia Mäkelä,* Risto Näsänen,* and David Whitaker†

Purpose. Human ability to perceive spatial stimuli declines with increasing eccentricity. To study this phenomenon with natural images, the authors applied the spatial scaling method by measuring the smallest detectable amount of geometric change in a human face at several eccentricities for a series of stimulus magnifications to find out whether performance could be made equal across the visual field simply by an appropriate enlargement. Methods. The authors used a novel method to produce subtle changes to an image of a face. The smallest change recognized was determined using a two-alternative forced-choice method and expressed in terms of correlation sensitivity, the inverse of the correlation between the images that just could be discriminated. Results. The detection of changes in the facial features, presumably a spatially complex task, became equal across the visual field simply by an appropriate change of scale. The E2 value represents the eccentricity at which the foveal stimulus size must double to maintain performance at the foveal level. The E2 values, found to be 1.73° to 2.45°, were similar to our previously measured values for vernier acuity, orientation discrimination, and curvature detection and discrimination, obtained with the same method of spatial scaling. Conclusions. The authors' results indicate that with adequate stimulus magnification, one is capable of detecting geometric changes in complex images such as a face equally well at the fovea and in the periphery. In this task, there seems to be no qualitative difference between the accuracy of foveal and peripheral processing. Invest Ophthalmol Vis Sci. 1997;38:1029-1039.

From the *Department of Optometry and Vision Sciences, University of Wales, College of Cardiff, Cardiff, Wales; and the †Department of Optometry, University of Bradford, Bradford, United Kingdom. Presented in part at the 17th European Conference on Visual Perception, Eindhoven, The Netherlands, September 1994. Supported in part by grants of the Association of Finnish Ophthalmic Opticians and Information Centre of Optics Business. Also supported (PM) by grants from the National Agency of Health and Welfare in Finland and the Trades Union of Finnish Ophthalmic Opticians. Submitted for publication May 22, 1996; revised November 15, 1996; accepted December 5, 1996. Proprietary interest category: N. Reprint requests: Jyrki M. Rovamo, Department of Optometry and Vision Sciences, University of Wales, Cardiff, P.O. Box 905, Cardiff CF1 3XF, Wales, United Kingdom.

Investigative Ophthalmology & Visual Science, April 1997, Vol. 38, No. 5. Copyright © Association for Research in Vision and Ophthalmology

Face perception has been studied by using the faces of two persons or two images of the same face when one of the images has been changed in some way. A natural face can be modified by displacing the features within the face,1 exchanging the features from another face,2 or changing the facial expression.3 Caricatures can be produced by exaggerating or diminishing the features of a face.4 The images of two different faces also can be combined by means of a beam-splitter or digital image processing. Hübner et al5 used the latter method to fuse the images of W. C. Fields and Salvador Dali. When the locations of facial features (e.g., the eyes, nose, or mouth) have been varied, the threshold of dislocation has been estimated by recording how many pixels a feature can be displaced before the displacement is noticed.1 Benson and Perrett4 determined the "likeness" or "familiarity" rating for caricatures of familiar faces when their features were digitally exaggerated or diminished by various magnitudes. With mixed faces used by Hübner et al,5 the observer's task was to indicate in each trial which of the two faces presented, comprising different percentages of information from the two original faces, resembled more the face of Fields, for instance. When the task is to recognize persons or expressions, thresholds have been recorded as percentages of correct answers8,9 or as reaction times.4,9,10 Face perception ability can be further assessed by determining the contrast threshold for face recognition. This


has been done, for instance, when investigating the effect of age11 or low vision8 on face recognition as well as when comparing contrast thresholds for the detection and discrimination of "real world" targets with visual acuity and contrast sensitivity for vertical, sinusoidal gratings of selected spatial frequencies.12 Individuals with deficits in the central visual field frequently report difficulty in recognizing faces.3 Bullimore et al3 investigated face recognition by presenting (monocularly) faces of various sizes to the subjects. The observers included four normal subjects. The smallest face for which both the expression and identity could be distinguished at 50% probability represented the threshold expressed in terms of viewing distance. Thresholds for patients with age-related maculopathy were approximately 12 times higher than for normal subjects. During an experimental session, Kolers et al13 presented a large number (300) of different faces, each in two different sizes. The sizes were chosen randomly from five available sizes ranging from 2° to 10° of visual angle in the vertical direction. The task was to indicate whether the face (of whichever size) had appeared before in the sequence. The effect of size on face recognition was determined, and thresholds were expressed in terms of percentage correct as a function of time lag or size ratio between the presentations. According to the results, it was easiest to recognize large faces, and similarity in size between the two presentations facilitated recognition. Few studies have investigated face discrimination specifically in the peripheral visual field, although increasing the stimulus size or presenting the images in two different visual field locations for comparison actually extends the stimuli into the peripheral visual field. 
The latter type of study usually investigates hemispheric asymmetry (i.e., whether the left or right cerebral hemisphere would show superiority) in recognizing faces under different experimental conditions, whereas eccentricity itself is not the subject of interest. However, Anderson and Parkin14 positioned the comparison image at the fovea and the test image in a peripheral location, and the task was to identify whether the second image shown in the periphery (at 4° eccentricity) was the "same" or "different" face as the first one shown at the fovea. The judgments were made more quickly when the second face was presented in the left rather than right visual field, in accord with many previous studies. In addition, Hübner et al5 compared performance at the fovea and 2° eccentricity with a mixture of two faces or a face mixed with a checkerboard texture. By "M-scaling"15 (i.e., increasing the peripheral image size in inverse proportion to visual acuity), the face + checkerboard combination became equally distinguishable at the fovea and in the periphery, whereas a two-face combination was more difficult in the periphery despite the size scaling.

In the periphery, the visibility of complex stimulus patterns is degraded by spatial interference of adjacent contours.16-19 The extent of this crowding effect has been found to be as much as 0.5 times the eccentricity studied in a letter recognition task.16 Similar results have been reported for orientational judgments.17 In a line-vernier task, the crowding area has been found to increase approximately at the same rate as vernier threshold.18 When the task is to recognize the orientation of a T-shaped target among other Ts, the crowding area increases faster than for resolving an isolated T-target.19 It would be conceivable that crowding also would degrade peripheral face recognition, because a face consists of a collection of features packed together closely.

Human ability to recognize faces has been suggested to decline with increasing eccentricity,9 and equalization of foveal and parafoveal performance by means of increasing size may be impossible.5 However, visual performance in many other tasks (refer to Table 1) can be made equal across the visual field simply by an appropriate, task-dependent magnification (i.e., spatial scaling of stimulus as a function of eccentricity). The need for magnification can be described quantitatively in terms of E2 value, which represents the eccentricity at which the foveal stimulus size must double to maintain performance at the foveal level.18 There is a large range of E2 values for different visual tasks. Based on the above data, it is interesting to investigate whether size scaling works for a spatially complex task (i.e., detection of geometric changes in a facial image), especially because there are tasks where scaling appears to be unable to equalize peripheral and foveal performance (e.g., color vision,20-22 phase discrimination5,23,24).

We used a novel method in which a desired amount of geometric distortion is introduced into an image. A face was distorted by various amounts, and the smallest distortion recognized was determined using a two-alternative forced-choice method. Results were expressed in terms of correlation sensitivity, the inverse of the correlation between the images (original and distorted) that could just be discriminated. Eccentricity dependence for the stimulus size was determined by using the method of spatial scaling.29-31 Correlation sensitivities were thus measured at the fovea and various eccentricities for a sequence of face stimuli that were all magnified versions of each other. Then correlation sensitivities were plotted against logarithmic stimulus size for each retinal location. Provided that size scaling is possible and the range of stimulus sizes is chosen appropriately, the data at each eccentricity will be displaced relative to one another


by an amount that depends on the rate at which performance deteriorates for the task. The amount of displacement shows the scaling factors (i.e., the rate at which magnification needs to increase with eccentricity). Assuming that the scaling factor, equal to unity at the fovea, increases linearly with eccentricity, we get the E2 value as the inverse of the slope.18,28 The purpose of the experiment was to measure the smallest detectable amount of geometric change in a human face at several eccentricities. The results should show whether performance in this task can be made equal across the visual field simply by an appropriate change of scale and what the E2 value of the task is.
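The linear scaling assumption described above can be illustrated with a short sketch. It assumes the scaling factor takes the form F(E) = 1 + E/E2, which follows from the statement that the factor is unity at the fovea, grows linearly with eccentricity, and has a slope whose inverse is E2. The function names and the example scaling factors are hypothetical and are not taken from the study's data.

```python
def scaling_factor(ecc_deg, e2_deg):
    """Magnification needed at eccentricity ecc_deg, assuming F = 1 + E / E2."""
    return 1.0 + ecc_deg / e2_deg


def estimate_e2(ecc_deg, factors):
    """Least-squares slope of F - 1 against E through the origin; E2 = 1 / slope.

    The fit is constrained so that F = 1 at the fovea (E = 0).
    """
    num = sum(e * (f - 1.0) for e, f in zip(ecc_deg, factors))
    den = sum(e * e for e in ecc_deg)
    slope = num / den
    return 1.0 / slope


# Hypothetical scaling factors generated with E2 = 2.0 deg (illustrative only).
eccentricities = [0.0, 5.0, 10.0, 20.0]
factors = [scaling_factor(e, 2.0) for e in eccentricities]
e2 = estimate_e2(eccentricities, factors)
```

With this form, the stimulus must be doubled in size at E = E2, which matches the verbal definition of the E2 value given in the text.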

METHODS

Apparatus. The stimuli were generated under computer control on a high-resolution 16-inch RGB monitor with fast phosphor B22 driven at the frame rate of 60 Hz by a VGA graphics board that generated 640 × 480 pixels. The pixel size was 0.42 mm × 0.42 mm. The display was used in a white mode. The average photopic luminance of the cathode ray tube was measured with a Minolta (Tokyo, Japan) Luminance Meter LS-110 and set to 50 cd/m². The nonlinear luminance response of the display was linearized by using the inverse function of the luminance response in stimulus image computations. To obtain a monochrome signal of 256 intensity levels (8 bits) from a monochrome palette of 16,384 (14 bits), we combined the red, green, and blue outputs of the VGA board by using a video summation device built according to Pelli and Zhang.32

Stimulus. The stimuli were created and the experiments were run by means of software developed by one of the authors (RN). The software used the graphics subroutine library of Professional HALO 2.0 developed by Media Cybernetics. The original image was a photograph of a female face (Fig. 1A). It was transformed to a digital form by means of a scanner. The image manipulation has been described in detail in the Appendix. In the facial test images (Figs. 1B, 1C), there are minor distortions also in the nostrils, but the distortion is most detectable at the mouth area. This is coincidental and results from the magnification matrix used. The observers thought, however, that at each location of the visual field, it actually was the distortion of the mouth that could be seen just above threshold, so that feature was specifically attended to. Therefore, for a control experiment, we created another series of more evenly distorted images as described in the Appendix. Examples of these images are shown in Figures 1D and 1E. Careful inspection of the images shows that in addition to the mouth, there now are clear distortions in the eyes and along the nose. Three face image sizes presented at 90% root mean square contrast [$C_{\mathrm{RMS}} = (E/A)^{0.5}$, where E is the energy of the image and A is its area] on the display were used in the experiment. They were 9.9 × 10.7, 5 × 5.4, and 3 × 3.2 cm² in horizontal (width) and vertical (height) dimensions, respectively. The horizontal and vertical dimensions of the equiluminous surround were 26.3 and 19.5 cm, respectively. The range of viewing distances was 0.14 to 4.91 m, producing angular image widths of 0.35° to 35°.

Procedures. Sensitivity for the detection of changes in facial features was measured for a series of stimulus sizes at the fovea and eccentricities of 5°, 10°, and 20°. Eccentricities and stimulus sizes were studied in random order. To avoid adaptation, which was strong at 10° and 20° eccentricities, sensitivity was never measured twice in succession at the same eccentricity. The subjects also fixated above or below the fixation point during the first presentations showing the most distorted images, as long as they were easily discernible from the original image. Further, the subjects did not start the next presentation immediately, but rested with their eyes open for a period of approximately 10 seconds between individual stimulus presentations. The room was illuminated dimly so that just enough indirect light was available for the fixation target (a black dot on a white background) to be visible. No reflections were visible on the cathode ray tube screen, and its surround was always of lower luminance than the screen itself. For the foveal presentation, the fixation target was positioned in the middle of the right-hand edge of the image to create comparable decline in the retinal sampling density across the image for the foveal and peripheral stimuli. Thus, as both observers used their right eye, the whole image was positioned in the nasal visual field. Interestingly, control experiments showed that thresholds remained the same whether fixation was at the center or at the edge of the image. For the peripheral stimuli, the fixation target was placed so that the stimulus was on the horizontal meridian further in the nasal visual field. Eccentricity, therefore, refers to the angular distance between the nearest (right) edge of the stimulus and the point of fixation.
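The relation between physical image width, viewing distance, and angular image width used in the stimulus description can be sketched as follows. This is a minimal illustration assuming the standard visual-angle formula, 2·atan(size / (2·distance)); the function name is hypothetical. The text does not state which image size was paired with which viewing distance, so only the lower end of the reported range (the 3 cm wide image at 4.91 m) is checked here.

```python
import math


def visual_angle_deg(size_cm, distance_cm):
    """Angular subtense in degrees of an object of width size_cm at distance_cm."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


# Narrowest image (3 cm) at the longest viewing distance (4.91 m = 491 cm).
smallest = visual_angle_deg(3.0, 491.0)
```

Under these assumptions, `smallest` comes out close to the 0.35° lower bound quoted for the angular image widths.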

FIGURE 1. (A) The undistorted face stimulus. (B) A slightly distorted test stimulus in the first image set. Distortion is just greater than the smallest detectable in optimum conditions. (C) A moderately distorted test stimulus in the first image set. (D) A slightly distorted test stimulus in the second image set. (E) A moderately distorted test stimulus in the second image set. In the distorted test images of the first set (examples B,C), there is some distortion at the nose, but the most detectable changes occur at the mouth area. In the second set (D,E), distortion is distributed more evenly. In addition to the mouth, there now are changes in the eyes and along the nose. The subjective impression was that a different expression of the face containing more "smile" in the second set in comparison with the original face (A) showed the distortion.

The least distorted image that still could be discriminated from the undistorted image was determined by a two-alternative forced-choice algorithm with feedback. The subject's task was to indicate, using the keyboard, which one (first or second) of the two successive 1000-msec exposures accompanied by identical sound signals contained the distorted stimulus. The interstimulus interval was 600 msec, and the delay for the new trial after each response was 250 msec. The estimation of threshold distortion took place in two consecutive staircases. The first staircase started from the most distorted image, and distortion was reduced image by image within the series. A random subthreshold starting point was established using this staircase with a one-correct-down, one-wrong-up rule. The second wrong choice initiated the second staircase, which measured the distortion required for the level of 84% correct with a four-correct-down, one-wrong-up rule.33 The estimate of threshold distortion was calculated as the arithmetic mean of the last eight reversals in the correlation coefficients between the undistorted image (Fig. 1A) and the distorted images within an estimation session. Correlation coefficient r = 1,1
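The four-correct-down, one-wrong-up rule of the second staircase can be sketched as below. This is a simplified illustration, not the authors' software: it tracks an integer index into the image series (higher index = more distortion) rather than correlation coefficients, omits the initial one-correct-down staircase, and stops as soon as eight reversals have been collected. The names `convergence_level`, `run_staircase`, and `respond` are hypothetical. The convergence level follows from requiring the probability of n consecutive correct responses to equal 0.5, which gives about 84% for n = 4.

```python
def convergence_level(n_down):
    """Proportion correct at which an n-down, one-up staircase settles: p**n = 0.5."""
    return 0.5 ** (1.0 / n_down)


def run_staircase(respond, start_level, n_levels, n_down=4, n_reversals=8,
                  max_trials=10000):
    """Run an n-down, one-up staircase over image indices 0 .. n_levels - 1.

    respond(level) returns True for a correct choice on that trial.
    Returns the arithmetic mean of the levels at the collected reversals.
    """
    level = start_level
    correct_run = 0
    last_direction = 0
    reversals = []
    for _ in range(max_trials):
        if len(reversals) >= n_reversals:
            break
        if respond(level):
            correct_run += 1
            if correct_run < n_down:
                continue  # stay at this level until n_down correct in a row
            correct_run = 0
            direction = -1  # make the task harder (less distortion)
        else:
            correct_run = 0
            direction = 1  # make the task easier (more distortion)
        if last_direction != 0 and direction != last_direction:
            reversals.append(level)
        last_direction = direction
        level = min(max(level + direction, 0), n_levels - 1)
    if not reversals:
        raise ValueError("no reversals recorded")
    return sum(reversals) / len(reversals)


# Idealized observer (an assumption): always correct at level 5 or above.
threshold = run_staircase(lambda level: level >= 5, start_level=10, n_levels=20)
target = convergence_level(4)
```

A real observer responds probabilistically, so in practice `respond` would wrap the subject's keyboard choice on each two-interval trial.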
