A Decision Theory Approach to Picture Smoothing

MORTON KANEFSKY, MEMBER, IEEE, AND MICHAEL G. STRINTZIS, MEMBER, IEEE

Abstract-This paper considers a detection theory approach to the restoration of digitized images. The images are modeled as second-order Markov meshes. This model is not only well suited to a decision approach to smoothing, but it also enables computer simulation of images, thereby permitting a statistical analysis of restoration techniques. Smoothing procedures that are near optimal, in the sense of approaching a nonrealizable bound, are demonstrated and evaluated. The achievable reduction in mean-square error is considerable for coarsely quantized pictures. This reduction, for the four-level pictures considered, is somewhat greater than that achievable by linear techniques. The approach actually minimizes the probability of error, which may be important for preserving picture features.

Index Terms-Decision theory, image enhancement, image simulations.

I. INTRODUCTION

IF the quantization of digitized pictures is relatively coarse, it seems reasonable that a discrete decision approach to smoothing might be preferable to linear smoothing. It certainly cannot be claimed that quantized picture statistics are Gaussian, which is the assumption for the optimality of linear smoothing. In fact, even for finely quantized pictures, the assumption of a Gaussian characteristic for the image cannot be reconciled with the observed and documented preponderance of "black" and "white" over "gray" points in most images. It is only necessary, as far as the decision approach is concerned, to assume that the additive noise is Gaussian, which is much more reasonable. If the image has convenient statistical properties, it is possible to incorporate the values of the surrounding data points into the decision mechanism. Such an approach could yield significant smoothing even in the finely quantized or, equivalently, strong noise situation. The distinction between this approach and nonlinear filtering is that optimality is sought in the sense of minimizing the probability of error rather than the mean-square error. It is anticipated, however, that the optimum detector will also perform quite well in the mean-square-error sense. The attractiveness of the decision approach lies not in any philosophical judgment on error criterion, but rather in the natural way that detection theory can be applied to the smoothing of digitized pictures.

As a starting point, a convenient statistical model of two-dimensional monochromatic images is needed. Among the models proposed to date, the "Markov mesh" model [1] appears particularly well suited to cases where the scanner output is coarsely quantized, while the "Markov field" models [2]-[4] are probably most appropriate when (as in video applications) the scanner output is finely quantized. These models are comprehensive in that they provide full probabilistic image descriptions and are also highly accurate. On the other hand, these models are rather unwieldy as applied to linear smoothing and enhancement procedures, and have not yet led to any practical image restoration procedures. Other models [5]-[8] have been constructed to fit known linear estimation procedures, the Kalman-Bucy linear recursive estimation procedure in particular. These models characterize only the spatial correlation function of the pictures, which is generally assumed to be either one-dimensional stationary, or two-dimensional stationary and factorable into the product of the respective point correlations along each of the two principal axes. In actual fact, it would be more reasonable to assume [2] that the correlation is a function of the Euclidean distance of points on the plane, which is not factorable. However, the motivation of much of the reported work has been to obtain computationally simple, optimal, linear estimation procedures, with only secondary concern for the accuracy of the image model. It will be seen that the Markov mesh model, which naturally leads to an optimal decision theory approach to image restoration, results, in at least some cases, in a spatial correlation that is a function of Euclidean distance (see Appendix A).

In order to analyze the effectiveness of the smoothing techniques considered here and to compare them with the upper bound of the smoothing capability of a decision theory approach (and indeed to determine this upper bound), a statistical analysis technique that enables the calculation of mean-square error or error probability for classes of images is needed. There is a virtual absence of such quantitative analysis reported in the literature. Fortunately, a unique aspect of the Markov mesh, as distinguished from other two-dimensional image models, is that it readily enables the computer simulation of stationary, random pictures with prescribed spatial dependence. The simulation of Markov mesh pictures is used to measure the capability of the decision theory approach to smoothing.

II. SIMULATION OF MARKOV MESH PICTURES

Consider the two arrays shown in Fig. 1, where the array X_ab corresponds to the "past" of the picture (relative to the data point x_ab) when scanned in the normal manner.

The array Z_ab is the entire picture (past and future) except for the data point x_ab.

Manuscript received August 5, 1976; revised April 10, 1977. The authors are with the Department of Electrical Engineering, University of Pittsburgh, Pittsburgh, PA 15213.


[Fig. 1. The Markov mesh model.]

In a manner that is analogous to a one-dimensional Markov process, the Markov mesh is defined by the relationship [1]

P(x_ab | X_ab) = P(x_ab | S_ab).   (1)

Thus, S_ab is the state of the picture, which is characterized by the transition matrix P(x_ab | S_ab) and the initial or border statistics. The paper by Abend et al. [1] shows that this definition results in each data point being dependent on a local neighborhood, or more precisely

P(x_ab | Z_ab) = P(x_ab | Y_ab),   (2)

and explores the relationship between the state and the neighborhood Y_ab. In particular, if the state is specified by the shaded region in Fig. 1(a), which will be called a "second-order Markov mesh," the neighborhood that x_ab depends on is indicated by the shaded region in Fig. 1(b). The simulations in this paper are based on a four-level second-order Markov mesh and, hence, the pictures are characterized by a 64 × 4 transition matrix.

The simulation begins by initializing a 40 × 40 array with zeroes on the left and top border, and then the array is swept out by using a random number generator to determine the value of the next data point from a selected matrix. Only the lower right hand 20 × 20 portion of the array is retained for processing in order to guarantee minimal dependence on the initial borders. The transition matrix selected was arbitrary and was obtained after a brief period of parameter juggling in an effort to obtain blob- or craterlike pictures. Only 10 of the entries in the 64 × 4 transition matrix are different from zero or one. One of these pictures is shown in Fig. 2.

[Fig. 2. A four-level second-order Markov mesh picture.]

The pictures are then used to determine the state probabilities, the spatial correlation function, and other parameters. It was found that these measurements converged rather rapidly and were independent of the initializing borders. It was also found that, for the particular transition matrix used, 25 of the 64 states had a zero probability of occurring and the spatial correlation function was accurately modeled by

R(Nx,Ny) = 0.81 exp{-[(0.44Nx)² + (0.34Ny)²]^1/2} + 0.30,   (3)

where Nx represents the number of horizontal steps and Ny the number of vertical steps (Appendix A). It appeared, from inspection of the measured correlation, that it could not be modeled adequately by a separable function. It is not known whether the Markov mesh leads inevitably to a spatial correlation function that depends on Euclidean distance or whether this is true of a particular class of transition matrices only. The analysis in the rest of this paper is based on these computer simulations and on the resultant measured state probabilities.
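As a concrete illustration of the simulation procedure just described, the following sketch raster-scans a 40 × 40 array and draws each new point from a 64 × 4 transition matrix conditioned on the three previously generated neighbors, keeping only the lower-right 20 × 20 portion. The state encoding and the transition matrix used here are hypothetical placeholders; the paper's actual matrix (with only 10 entries different from zero or one) is not reproduced in the text.

```python
import numpy as np

LEVELS = 4  # four-level pictures

def simulate_markov_mesh(trans, size=40, keep=20, seed=0):
    """Generate one second-order Markov mesh picture.

    trans[s] is the distribution of the new point given the state s, where s
    encodes the three already-generated neighbors (left, upper-left, upper)
    as a base-4 number, giving 4**3 = 64 possible states (encoding assumed).
    """
    rng = np.random.default_rng(seed)
    x = np.zeros((size, size), dtype=int)  # zero left and top borders
    for i in range(1, size):
        for j in range(1, size):
            s = x[i, j - 1] * 16 + x[i - 1, j - 1] * 4 + x[i - 1, j]
            x[i, j] = rng.choice(LEVELS, p=trans[s])
    # retain only the lower-right portion to reduce border dependence
    return x[-keep:, -keep:]

# Hypothetical transition matrix: each row is a probability vector over the
# four levels (the paper's blob/crater-producing matrix is not listed).
rng = np.random.default_rng(1)
trans = rng.dirichlet(alpha=[0.2] * LEVELS, size=LEVELS ** 3)

picture = simulate_markov_mesh(trans)
print(picture.shape)  # (20, 20)
```

Averaging statistics over several such pictures is how the state probabilities and the spatial correlation described above would be estimated.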


III. BOUNDS ON THE CAPABILITY OF THE DECISION THEORY APPROACH

This discussion will limit itself to "real-time" processing, which is interpreted as using the "past" of the picture (when scanned in the normal manner) in addition to the "present" data point to make the decision. The reason for this restriction is that the state of the data point, when using the four-level, second-order Markov mesh model, can have 64 different values if processed in real time and 65 536 different values if the real-time restriction is dropped. Thus, the restriction to real-time processing greatly simplifies the analysis. The extension to nonreal time should be conceptually straightforward, although it may introduce computational problems. The additive noise is presumed to be zero mean, Gaussian, and white (i.e., independent from one data point to another).

A nonachievable lower bound on the error probability that can be achieved with detection theory methods is obtained by considering the nonrealizable situation of a decision mechanism based on perfect knowledge of the actual state of the picture. It will henceforth be assumed that a data point y_ij is given by

y_ij = x_ij + n_ij,   (4)

where n_ij is a white noise process with variance σ_n². The signal x_ij is modeled by a Markov mesh and hence is characterized by the conditional probability P(x_ij | S_ij), which is an element of the transition matrix specifying the Markov mesh process. The probability of error is minimized if that value of x_ij is chosen which maximizes the probability P(x_ij | y_ij, S_ij), i.e., the a posteriori most likely value of x_ij. It is well known that, by a simple application of Bayes' rule, this is equivalent to maximizing P(x_ij | S_ij) p_n(y_ij - x_ij), where p_n(a) is the probability density of the noise. If the additive noise is Gaussian, it can easily be seen that the a posteriori most likely value of x_ij is obtained by choosing that x_ij which minimizes d_ij, defined as

d_ij = (y_ij - x_ij)² - 2σ_n² ln P(x_ij | S_ij).   (5)

This procedure has been tested via simulation, with the mean-square difference between the original and noisy picture, both before and after processing, specifying the mean-square error. As the decision approach for the data along the upper and left borders of the picture must be different, since there are no known states, these borders were omitted from the calculations.

Another convenient bound is determined by considering a decision mechanism which ignores the state of the picture and simply selects the closest value, in that it minimizes

d_ij = (y_ij - x_ij)².   (6)

While this will clearly determine an upper bound on the error probability for the optimum realizable decision mechanism, it will not determine an upper bound for all decision mechanisms. It is possible that an ad hoc technique might result in a larger error than this "closest value" procedure, in which case it should be discarded.

The mean-square error of the preprocessed pictures and the lower and upper bounds were calculated¹ and are plotted in Fig. 3 as a function of noise variance. The signal variance, as seen from (3), is 0.81. Thus the range of Fig. 3 represents a signal-to-noise ratio between 2.25 and 0.5. Observe that the nonrealizable lower bound indicates the possibility of considerable improvement, particularly for small signal-to-noise ratios. Pictures with a signal-to-noise ratio of 1 have a potential smoothing factor approaching 5.8, while the "nearest value" upper bound achieved a smoothing factor of 1.4. The other three curves in Fig. 3 indicate the performance of realizable algorithms that utilize the "past" of the picture. These are discussed in the following section. Note that one of them is worse than "nearest value" processing and is discarded, while the other two fall between the bounds.

¹ The calculations are based on limited simulations and hence the results should be regarded as "ball park" figures rather than precise ones.
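The two reference rules that define these bounds are simple per-point decisions. The minimal sketch below assumes the gray levels take the integer values 0 through 3 and that the conditional probabilities P(x|S) and the noise variance are available; the function and variable names are illustrative, not from the paper. The known-state rule maximizes P(x|S) exp(-(y - x)²/2σ_n²), which is equivalent to minimizing d_ij in (5).

```python
import numpy as np

LEVELS = np.arange(4)  # the four gray levels, assumed here to be 0..3

def decide_known_state(yij, p_x_given_s, sigma2):
    """Nonrealizable bound: with the true state S known, choose the level
    maximizing P(x|S) * exp(-(y - x)^2 / (2*sigma2)), i.e. minimizing d_ij
    of eq. (5). p_x_given_s is the length-4 row of the transition matrix."""
    score = np.log(p_x_given_s + 1e-30) - (yij - LEVELS) ** 2 / (2.0 * sigma2)
    return LEVELS[np.argmax(score)]

def decide_nearest_value(yij):
    """State-ignoring rule of eq. (6): pick the closest quantization level."""
    return LEVELS[np.argmin((yij - LEVELS) ** 2)]
```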


[Fig. 3. A comparison of decision methods for smoothing: mean-square error versus noise variance (0.36 to 1.62) for the ad hoc procedure, the "nearest value" "upper" bound, the maximum likelihood procedure, the optimum procedure under the B(1,1) block constraint, and the nonrealizable lower bound.]

IV. OPTIMAL AND AD HOC SMOOTHING TECHNIQUES

The first attempt was to consider an ad hoc procedure based on (5), achieved by replacing the actual state with the previously estimated values. This procedure would have the advantage of not having to retain any portion of the original noisy picture in storage (at least two lines of data for the second-order Markov mesh considered). Surprisingly, the results were very poor and, as indicated in Fig. 3 (labeled ad hoc procedure), no better than the original noisy picture. Replacing the state estimates with those based on closest value estimates [see (6)] made things even worse. One is forced to conjecture that ad hoc detection procedures using an estimated state in place of the actual state will not work well. In hindsight, this result seems reasonable, since the picture must have substantial spatial dependence in order for any scheme to do significantly better than closest value processing, and in this event errors in the estimated state values would propagate through the picture. It may be, however, that this tendency is exaggerated by the transition matrix used, which resulted in 25 of 64 states having a zero probability of occurring.

To effectively use surrounding data points, one must store and use the actual noisy data rather than the estimates. If one considers an (N + 1)(M + 1) block of noisy data points that includes the point to be estimated in the lower right hand corner (y_ij), then the optimum rule is to choose that x_ij which maximizes the probability P(x_ij | y_i,j; y_i-1,j; ...; y_i-N,j; y_i,j-1; ...; y_i,j-M; y_i-1,j-1; ...; y_i-N,j-M). Again, by the well-known manipulation using Bayes' rule, this is equivalent to choosing that x_ij which maximizes

Σ_{x_i-1,j = 1}^{4} ··· Σ_{x_i-N,j-M = 1}^{4} exp{ -(1/2σ_n²) Σ_{n=0}^{N} Σ_{m=0}^{M} (y_i-n,j-m - x_i-n,j-m)² } P(x_ij | x_i-1,j; x_i-1,j-1; x_i,j-1) P(B(N,M)),   (7)
where B(N,M) represents the block of noisy data points excluding the point y_i,j. While it is true that the bigger the block size, the better the smoothing, it could be conjectured that the improvement over the block that contains the state data points (in this case B(1,1)) should be minimal. Furthermore, the probabilities of large blocks are difficult to determine (requiring extensive simulations) and would require a great deal of storage in the processor. For these reasons, the second technique considered (indicated in Fig. 3 as optimum under B(1,1) block constraint) was based on choosing that x_ij which maximizes

Σ_{x_i-1,j = 1}^{4} Σ_{x_i-1,j-1 = 1}^{4} Σ_{x_i,j-1 = 1}^{4} exp{ -(1/2σ_n²) Σ_{n=0}^{1} Σ_{m=0}^{1} (y_i-n,j-m - x_i-n,j-m)² } P(x_ij | x_i-1,j; x_i-1,j-1; x_i,j-1) P(x_i-1,j; x_i-1,j-1; x_i,j-1).   (8)

This is not an absolutely optimum procedure, since the noisy picture is no longer a second-order Markov mesh.
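The rule in (8) can be written as a direct sum over the 4³ possible values of the three state points. The sketch below assumes the noisy 2 × 2 block, the measured transition and state probabilities, and the noise variance are supplied by the caller; the names are illustrative, not from the paper. Setting the state prior to a uniform distribution gives the maximum likelihood variant discussed next, and the optional multiplier k on the exponent corresponds to the modification examined in Section V.

```python
import numpy as np
from itertools import product

LEVELS = range(4)

def decide_b11(y_block, trans, prior, sigma2, k=1.0):
    """Eq. (8): y_block = [[y(i-1,j-1), y(i-1,j)], [y(i,j-1), y(i,j)]].

    trans[(xl, xul, xu)] is the length-4 vector P(x_ij | x_{i,j-1},
    x_{i-1,j-1}, x_{i-1,j}); prior[(xl, xul, xu)] is the measured joint
    state probability.  Returns the level x_ij with the largest sum."""
    best, best_score = 0, -np.inf
    for xij in LEVELS:
        total = 0.0
        # sum over the three state points x_{i,j-1}, x_{i-1,j-1}, x_{i-1,j}
        for xl, xul, xu in product(LEVELS, repeat=3):
            sq = ((y_block[1][1] - xij) ** 2 + (y_block[1][0] - xl) ** 2
                  + (y_block[0][0] - xul) ** 2 + (y_block[0][1] - xu) ** 2)
            total += (np.exp(-k * sq / (2.0 * sigma2))
                      * trans[(xl, xul, xu)][xij] * prior[(xl, xul, xu)])
        if total > best_score:
            best, best_score = xij, total
    return best
```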


[Fig. 4. Relationship between mean-square error and error probability: mean-square error and probability of error plotted against the exponential multiplier.]

We observe, from Fig. 3, that this procedure works quite well, coming significantly closer to the nonrealizable bound than nearest value processing. This result strengthens the conjecture that using larger blocks of data should result in minor improvement.

The final procedure considered was similar to that indicated by (8), except that the state probabilities were all assumed to be equal. While this maximum likelihood procedure will clearly be inferior, the purpose of this simulation was first to determine the consequence of not calculating and storing the state probabilities, and second to observe the sensitivity of this technique to accurate knowledge of the state probabilities. The results of this maximum likelihood procedure are shown in Fig. 3, and one can observe the resultant degradation. Clearly the procedure is not very sensitive to accurate knowledge of the state probabilities.

V. MEAN-SQUARE ERROR AND ERROR PROBABILITY

The smoothing procedure indicated by (8) was based on the concept of minimizing the probability of error rather than the mean-square error. An interesting relationship between these errors can be observed by introducing a multiplying constant K into the argument of the exponential in (8). This has the effect of misrepresenting the noise variance and thereby changes the relative emphasis of the processor between the a priori information and the noisy data. This modified procedure, while increasing the probability of error, could result in a smaller mean-square error. This effect is illustrated in Fig. 4, where it is assumed that the signal-to-noise ratio is unity. We observe, for this case, that a K of the order of 2.5 decreases the mean-square error from the value shown in Fig. 3 by nearly 15 percent. Whether the resultant mean-square error of the modified procedure is close to the minimum obtainable mean-square error is purely a matter of conjecture at this point.

APPENDIX A
Spatial Correlation Matrix of the Markov Mesh

For the 64 × 4 transition matrix considered here and in [9], the mean and spatial correlation matrix were estimated by measurements on 10 independent 20 × 20 pictures of the type shown in Fig. 2. While the correlation matrix of each picture varied somewhat, reasonable convergence when averaging the pictures did take place. The results are shown in Fig. 5. This rather standard way of plotting the two-dimensional correlation matrix is based on considering the picture lines (20 data points) serially, so that the tips of the curve represent the correlation in the vertical direction.

[Fig. 5. Periodic plot of the two-dimensional correlation function R(Nx,Ny) = R(Nx + 20Ny).]

A reasonable mathematical fit to these data can be made by (3) of the text, or

R(Nx,Ny) = 0.81 exp{-[(0.44Nx)² + (0.34Ny)²]^1/2} + 0.30,   (A-1)

where Nx and Ny represent the number of horizontal and vertical steps. Factorable models of the type

R(Nx,Ny) = 0.81 exp{-c|Nx| - d|Ny|} + 0.30,   (A-2)

did not fit as well. The correlation values in the area between the x and y axes were somewhat larger than the factorable model predicts. However, the variance of the measurements was comparable to the difference in the models.
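A sketch of how the correlation estimate and the model (A-1) might be reproduced from simulated pictures is given below; the averaging over several pictures follows the appendix, but the estimator details (uncentered correlation, the scaling of the gray levels) are assumptions not spelled out in the text.

```python
import numpy as np

def spatial_correlation(pics, max_lag=5):
    """Average R(Nx, Ny) over a list of equal-size pictures.
    Estimates E[x(i, j) * x(i + Ny, j + Nx)] over all positions where
    both points fall inside each picture (uncentered estimator assumed)."""
    R = np.zeros((max_lag + 1, max_lag + 1))
    for ny in range(max_lag + 1):
        for nx in range(max_lag + 1):
            vals = [np.mean(p[: p.shape[0] - ny, : p.shape[1] - nx]
                            * p[ny:, nx:]) for p in pics]
            R[ny, nx] = np.mean(vals)
    return R

def model_A1(nx, ny):
    """Fitted model (A-1) from the appendix."""
    return 0.81 * np.exp(-np.sqrt((0.44 * nx) ** 2 + (0.34 * ny) ** 2)) + 0.30
```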

terns," IEEE Trans. Inform. Theory, vol. IT-18, Oct. 1972.

[2] J. W. Woods, "Two-dimensional discrete Markovian fields," IEEE Trans. Inform. Theory, vol. IT-18, pp. 232-240, Mar. 1972.

[3] E. Wong, "Two-dimensional random fields and representation of images," SIAM J. Appl. Math., vol. C-21, pp. 756-770, 1968.

, "A likelihood ratio formula for two-dimensional random fields," [4] IEEE Trans. Inform. Theory, vol. IT-20, pp. 418-420, July 1974. [5] N. E. Nahi and T. Assefi, "Bayesian recursive image estimation,"

IEEE Trans. Comput., vol. C-21, pp. 734-738, July 1972.

[6] N. E. Nahi and C. A. Franco, "Application of Kalman filtering to image enhancement," in Proc. IEEE Conf. Decision and Control, New Orleans, LA, Dec. 1972, pp. 63-65.

[7] A. Habibi, "Two-dimensional Bayesian estimate of images," Proc. IEEE, vol. 60, pp. 878-884, July 1972.


[8] S. R. Powell and L. M. Silverman, "Modeling of two-dimensional covariance functions with application to image restoration," IEEE Trans. Automat. Contr., vol. AC-19, pp. 8-13, Feb. 1974.

[9] M. Kanefsky, "Modeling and simulation of two dimensional pictures," in Proc. 6th Pittsburgh Conf., Apr. 1975.


Morton Kanefsky (S'56-M'63) was born in Philadelphia, PA, on November 21, 1935. He received the B.S. degree in electrical engineering from the University of Pennsylvania, Philadelphia, in 1957 and the M.S. and Ph.D. degrees in 1960 and 1964, respectively, from Princeton University, Princeton, NJ.
At present he is an Associate Professor of Electrical Engineering at the University of Pittsburgh, Pittsburgh, PA. His principal research activities are in the areas of detection and estimation theory, nonparametric detection, data compression, and image enhancement.

Michael G. Strintzis (S'68-M'70) was born in Athens, Greece, on September 30, 1944. He received the B.S. degree in electrical and mechanical engineering from the National Technical University of Athens, Athens, Greece, in 1967 and the M.A. and Ph.D. degrees in electrical engineering from Princeton University, Princeton, NJ, in 1969 and 1970, respectively.
From 1967 to 1968 he was a Teaching Assistant in the Department of Electrical Engineering, Princeton University. He is presently Associate Professor of Electrical Engineering, University of Pittsburgh, Pittsburgh, PA.
Dr. Strintzis is a member of the Society for Industrial and Applied Mathematics and Sigma Xi.

