Use of Artificial Neural Network as a Risk Assessment Tool In Preventing Child Abuse Iraj Zandi Professor Emeritus of Systems & National Center Professor of Resource Management and Technology University of Pennsylvania Philadelphia, PA 19104 zandi@ seas.upenn.edu



Abstract – Every year, Child Protection Services (CPS) agencies in the US receive millions of reports of children being abused. CPS is required by law to investigate these reports and, if necessary, to intervene to protect the children at risk. Since resources are limited, some form of triage takes place, and various CPS offices use different risk assessment tools. The purpose of this paper is to report on a preliminary effort to explore the feasibility of utilizing artificial neural network (ANN) technology as a risk assessment tool. The Third National Incidence Study of Child Abuse and Neglect (NIS-3), a congressionally mandated, periodic effort of the National Center on Child Abuse and Neglect, provides a rich set of data on child abuse and was used to explore the utility of ANN as a risk assessment tool. While this data set is adequate for a feasibility study, future research must rely on more targeted data. In the current study, several different ANN designs were constructed and experimented with under a variety of conditions. These included several designs of multi-layer neural network (MLNN) using several different training algorithms, and a radial basis network (RBN). The data was divided randomly into three groups: a Training Set, a Validation Set, and a Test Set. The procedure was to use the Training Set to train the network and to test its accuracy using the Validation Set and the Test Set. We found that a 31-25-1 MLNN architecture, after being trained, is capable of classifying the Training Set 100% accurately. However, its performance deteriorates for the Validation Set and the Test Set (new data that was not included in the Training Set). Despite this deterioration, the results are very encouraging. The network of Experiment VII-1 correctly classified 90 percent of abused cases in the Validation Set and 89 percent in the Test Set.
This means that it missed 10 percent of the abused children in the Validation Set and 11 percent in the Test Set (false negatives). In addition, it misclassified as abused 13 percent of the children that the survey did not find to have been abused in the Validation Set (false positives, or false alarms); for the Test Set the misclassification rate was the same 13 percent. A radial basis network classified the Training Set 100% accurately and correctly identified 93 percent of abused cases in the Test Set, while misclassifying 16 percent of not-abused cases as abused (false positives). Although definitive data on the performance of currently used risk assessment tools is not available, experts tell us the performance of the neural network is a clear improvement over current practice. Much remains to be researched.


Introduction – Every year, Child Protection Services (CPS) agencies in the US receive millions of reports of children being abused. CPS is required by law to investigate these reports and, if necessary, to intervene to protect the children at risk. Since resources are limited, some form of triage takes place. The purpose of this paper is to report on a preliminary effort to explore the feasibility of utilizing artificial neural network (ANN) technology as a risk assessment tool in the campaign against child abuse. Whenever a Child Protection Service (CPS) receives information indicating the possibility of a child being harmed, the agency must make two decisions regarding: 1) the level of risk that the child may be exposed to, and 2) the nature of the response by the agency. Obviously, the nature and timing of the response depend on the level of risk: the higher the risk, the greater the need for speedy and more intrusive action. Across the US, welfare agencies face this predicament about three million times a year. Unfortunately, despite the dedicated professionals involved, about 1,100 children die each year as a result of caretaker abuse, and numerous others are harmed to various degrees. A tool that would give CPS a better capability to identify, at an early stage of involvement, those children who may be at risk could have significant social value.


Protective interventions by CPS can be justified only if the hypothesis that some kind of stable pattern exists is accepted. We reason that if such a pattern exists and we can discern it, then society is obliged to intervene in cases where children are found to be potentially at risk. CPS action is sanctioned upon acceptance of the hypothesis that family life, environmental features, socio-economic factors, the psychiatric profiles of participants, and other variables play a role in child abuse situations. Further, we must also agree that some more or less regular pattern exists that distinguishes potentially abusive situations from non-abusive ones. Obviously, if we are not allowed these assumptions, all cases must be treated alike. All sciences are predicated on the assumption of some regularity in the world, and the scientific treatment of the problem of child abuse is no exception. Risk assessment is an ongoing process: it is a tool that is used during the investigation phase and continues until case closure. It supports the findings of child welfare workers but does not replace good judgment. Risk assessment in child welfare is only a tool used in determining the need for agency intervention; determining the level of risk to a child involves much more than a single computation of data (Stevenson 2001). Nevertheless, if the analysis of data can provide some guideline to professionals, then the more realistic and accurate that guideline, the more effective the intervention will be. That explains why all CPS offices use some kind of screening model based on their past experience. The application of ANN technology to child abuse problems is novel and may offer a more reliable, faster, and more efficient alternative to the screening techniques currently used by CPS. If successful, the proposed methodology could usher


in the use of a new computational, learning technology in the field of child abuse prevention.

ANN – An artificial neural network (ANN) is a technology that is capable of discovering patterns in a set of data, if such patterns exist. ANN is a learning system that tacitly discovers the working mechanism of a system from examples of its past behavior. Thus, for complex systems for which explanatory theories are lacking but data regarding past behavior is available, ANN is a powerful tool of investigation. In addition to pattern recognition, ANN has other useful capabilities; however, here we are concerned only with its pattern recognition capabilities. Appendix A is a description of ANN for nontechnical readers.

Data – The data obtained by the Third National Incidence Study of Child Abuse and Neglect (NIS-3), a congressionally mandated, periodic effort of the National Center on Child Abuse and Neglect (a center within the Administration for Children and Families), was used in the current research. NIS-3 data was deemed adequate for this preliminary research on the feasibility of using ANN technology; its accessibility was also a motivating factor. Future research may be conducted using a more appropriate data set. The NIS-3 survey generated several sets of data, including what is known as the CPS-Only data file, which contains information on 7,565 individual cases. The survey attempted to collect data on 248 different information fields (dimensions) for each case (Sedlak 1997). The CPS-Only data required substantial manipulation prior to being used to train an ANN. Adam Kaufman (Kaufman 2000) converted the data for use with ANN. Kaufman had to calculate the Metropolitan Status (Metstat) for each case, remove cases with


missing data, fix errors and data inconsistencies, ensure that each field represented only one piece of information, and lastly, remove those variables that did not vary across the population. This process left 1,767 complete cases.

Since not all of the 248 fields (dimensions) of information are available to a decision maker at the time of triage, Kaufman selected only 40 of those dimensions for training the ANN. The most difficult data preparation procedure involved ensuring that each variable (field) represented only one piece of information. Among the 40 fields, several coded for multi-variable information (for example, the variable METH, which contains information regarding the ethnicity of the mother/substitute, includes seven pieces of distinct information). To account for this multiplicity, Kaufman re-coded the data set by expanding the multi-dimensional fields, thus generating 1,767 cases with 141 information fields.1

Experiments – We started with a pool of 1,767 cases. Each case consisted of a vector of 141 elements, each element representing an information field. We used Principal Component Analysis (PCA) to reduce the problem space. This process eliminated only those variables that exhibited up to one percent variability among the examples. Figure 1 shows the process we followed. After application of PCA to the input vectors, the dimension was reduced to only 31 effective elements. The output field contained information indicating whether a child was found to have been severely abused (harmed), designated by 1, or not abused, designated by 0. This data was organized in two matrices: 1) an input matrix of 31 x 1767 and 2) a target matrix of 1 x 1767. Further, these matrices were divided,

1 If you think of each case as being represented by a vector of properties, each field constitutes an element of that vector.


randomly, among three sets: a Training Set consisting of 884 cases, a Validation Set of 441 cases, and a Test Set of 442 cases. The procedure was to use the Training Set to train the network and to test its accuracy using the Validation and Test Sets.

Figure 1 – Application of Principal Component Analysis (PCA)

Original inputs (141 fields): Training Set 884, Validation Set 441, Test Set 442 (total 1767)
  -> PCA
  -> Reduced dimension (31 fields): Training Set 884, Validation Set 441, Test Set 442 (total 1767)
  -> Neural Net
  -> Target: Harmed or Not Harmed
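The preprocessing pipeline of Figure 1 (variance-based dimension reduction followed by a random three-way split) can be sketched in Python with NumPy rather than the Matlab environment used in the study. The data below is synthetic; the matrix shapes, the one-percent variance cutoff, and the 884/441/442 split sizes come from the text, while everything else (variable names, the low-rank structure of X) is assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the 1767 x 141 input matrix built from the
# NIS-3 CPS-Only file (low-rank structure plus a little noise, so that
# PCA has something to find); y stands in for the harmed/not-harmed target.
X = rng.normal(size=(1767, 20)) @ rng.normal(size=(20, 141))
X += 0.01 * rng.normal(size=(1767, 141))
y = rng.integers(0, 2, size=1767)

# PCA via SVD: drop principal components contributing no more than
# one percent of the total variance, as described in the text.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)
X_reduced = Xc @ Vt[explained > 0.01].T

# Random division into Training (884), Validation (441), Test (442) sets.
idx = rng.permutation(len(X_reduced))
train, val, test = idx[:884], idx[884:1325], idx[1325:]
```

With real data the number of retained components would be determined by the data itself; the paper reports that 31 components survived this cutoff.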

Table 1 shows the various architectures that were constructed and experimented with. Figure 2 shows an artificial neural network of the type used in Experiment III. The figure shows a 31-25-1 multi-layer neural network (MLNN). This is a three-layer network comprising 31 neurons in the first layer (input), 25 neurons in the second layer (hidden), and 1 neuron in the third layer (output). The transfer functions for neurons in the input and hidden layers are the hyperbolic tangent sigmoid2 and the single neuron in the

2 In Matlab this is called "tansig." Tansig is the name used in Matlab for the function of the following form: tansig(n) = 2 / (1 + exp(-2n)) - 1.

output layer has a linear transfer function3. The calculations were made in the Matlab environment using its Neural Network Toolbox. Matlab is a commercial product of The MathWorks Inc., and its Neural Network Toolbox is a high-performance software package.

Table 1 – Various ANN Architectures Used for Experimentation (the technical terms are from the Neural Network Toolbox of Matlab)

Experiment | Architecture | Type | Transfer Functions (Layer 1 / Layer 2 / Layer 3) | Training Algorithm | Goal Err
I    | 31-31-1   | MLNN | Tansig / Tansig / Purelin | Quasi-Newton Backpropagation (Matlab: Trainbfg) | 0
II   | 31-20-1   | MLNN | Tansig / Tansig / Purelin | Trainbfg | 0.15
III  | 31-25-1   | MLNN | Tansig / Tansig / Purelin | Trainbfg | 0
IV   | 31-25-1   | MLNN | Tansig / Tansig / Purelin | Trainbfg | 0.1
V    | 31-25-1   | MLNN | Tansig / Tansig / Purelin | Trainbfg | 0.15
VI   | 31-25-1   | MLNN | Tansig / Tansig / Purelin | Trainbfg | 0.20
VII  | 31-25-1   | MLNN | Tansig / Tansig / Purelin | Trainbr  | Validation Set
VIII | Radial B. | RBF  | Radbas / Purelin          | Newrbe   | ---
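As a rough illustration of what a 31-25-1 network of the kind listed in Table 1 computes, the forward pass can be sketched in Python with NumPy. The weights here are random placeholders rather than trained values (training is what Trainbfg and Trainbr do in Matlab); the hidden units use tanh, which is mathematically identical to tansig, and the output unit is linear, like purelin.

```python
import numpy as np

rng = np.random.default_rng(1)

# Randomly initialized (untrained) weights for a 31-25-1 network:
# 25 hidden units and a single linear output unit.
W1, b1 = rng.normal(scale=0.1, size=(25, 31)), np.zeros(25)
W2, b2 = rng.normal(scale=0.1, size=(1, 25)), np.zeros(1)

def forward(x):
    """Map a 31-element input vector to the network's single output."""
    h = np.tanh(W1 @ x + b1)      # hidden layer: tanh, identical to tansig
    return float(W2 @ h + b2)     # output layer: linear, like purelin

score = forward(rng.normal(size=31))   # one hypothetical reduced-input case
```

The single linear output is the "discrimination index" plotted on the vertical axes of the figures later in the paper; classification requires choosing a separation value for it.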

After the network was trained, it was first employed to classify the Training Set itself. This was undertaken to provide information on the capability of the network to learn the Training Set. This was followed by presenting the Validation and Test Sets to it. The results are shown in Tables 2a (for the Training Set), 2b (for the Validation Set), and 2c (for the Test Set).

3 In Matlab this is called "purelin." Purelin is the name used in Matlab for the function of the following form: purelin(n) = n.
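The two transfer functions defined in the footnotes can be checked numerically; in particular, the tansig formula is algebraically identical to the hyperbolic tangent (a short sketch, with Python's math.tanh standing in for Matlab's built-in):

```python
import math

def tansig(n):
    # Matlab's tansig, as defined in the footnote: 2 / (1 + exp(-2n)) - 1.
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0

def purelin(n):
    # Matlab's purelin is simply the identity function.
    return n

# tansig agrees with the hyperbolic tangent at every test point.
for n in (-2.0, -0.5, 0.0, 0.5, 2.0):
    assert abs(tansig(n) - math.tanh(n)) < 1e-12
    assert purelin(n) == n
```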


Figure 2 – Schematic Presentation of a Three-Layer ANN

Two points regarding Tables 2a, 2b, and 2c need to be made: 1) In Experiment VIII, where we used a Radial Basis Network, we divided the data into only two sets: a Training Set of 1,150 cases and a Test Set of 617 cases. 2) Since the output neurons' transfer functions in all experiments were linear, we needed to find a value of the output that separates the two classes. Column S shows the values we experimented with; this point is discussed later in this paper. For all but Experiment II, the networks could be trained to learn the Training Set with almost perfect accuracy, as is evident from the last three columns of Table 2a. The "Correct" column indicates the percentage of correct predictions for children observed in the survey to have actually been abused. The False Negative (FN) column shows the percentage of abused cases that the network failed to identify. The False Positive (FP) column shows the not-abused cases that were erroneously identified by the network as abused (false alarms). The network in Experiment II had 20 neurons in the hidden layer and was incapable of learning the Training Set, so it was discarded.
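The role of the separation value S can be illustrated with a short Python sketch. The outputs and targets below are toy numbers, not NIS-3 results; the function mirrors the Correct, False Negative, and False Positive columns of Tables 2a, 2b, and 2c.

```python
import numpy as np

def classification_rates(outputs, targets, S):
    """Percent correct, false-negative, and false-positive rates for a
    separation threshold S applied to the network's linear outputs."""
    pred = outputs > S                     # above the line -> "abused"
    abused, not_abused = targets == 1, targets == 0
    correct = 100.0 * np.mean(pred[abused])        # of truly abused cases
    false_neg = 100.0 - correct                    # abused cases missed
    false_pos = 100.0 * np.mean(pred[not_abused])  # false alarms
    return correct, false_neg, false_pos

# Toy example: abused cases tend to score high, not-abused cases low.
targets = np.array([1, 1, 1, 0, 0, 0, 0, 0])
outputs = np.array([0.9, 0.7, 0.2, 0.1, 0.4, 0.0, -0.2, 0.6])
rates = classification_rates(outputs, targets, S=0.5)
```

Sweeping S trades false negatives against false positives, which is exactly the pattern visible across the sub-experiments (e.g., III-1 versus III-3) in the tables.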


Table 2a – Results of Experimentation for the Training Set

Experiment | S    | Abused       | Not-Abused | Correct % | False Neg % | False Pos %
I      | 0.8  | 66           | 818 | 100 | 0   | 0
II     | NA   | 66           | 818 | --- | --- | ---
III-1  | 0.5  | 66           | 818 | 100 | 0   | 0
III-2  | 1.5  | 66           | 818 | 100 | 0   | 0
III-3  | 2.0  | 66           | 818 | 100 | 0   | 0
IV-1   | 0.3  | 66           | 818 | 100 | 0   | 3
IV-2   | 0.8  | 66           | 818 | 100 | 0   | 1
IV-3   | 1.2  | 66           | 818 | 97  | 2   | 0
V-1    | 0.2  | 66           | 818 | 100 | 0   | 13
V-2    | 0.8  | 66           | 818 | 100 | 0   | 0.4
V-3    | 1.5  | 66           | 818 | 100 | 0   | 0
VI-1   | 0.4  | 66           | 818 | 100 | 0   | 3
VI-2   | 0.8  | 66           | 818 | 100 | 0   | 0.1
VI-3   | 1.2  | 66           | 818 | 100 | 0   | 0.2
VII-1  | 0.3  | 66           | 818 | 100 | 0   | 0
VII-2  | 0.8  | 66           | 818 | 100 | 0   | 0
VII-3  | 1.0  | 66           | 818 | 100 | 0   | 0
VIII-1 | 0.15 | 1150 (total) |     | 100 | 0   | 0
VIII-2 | 0.4  | 1150 (total) |     | 100 | 0   | 0
VIII-3 | 1.0  | 1150 (total) |     | 100 | 0   | 0

Table 2b exhibits the results when we presented the Validation Set (new data that the network had not seen before) to the trained networks. There is no entry for Experiment VIII; this is discussed later. Comparing Tables 2a and 2b indicates that the performance of all networks studied deteriorated to various degrees when they encountered new cases that were not included in the Training Set. Table 2c shows the results when the Test Set was presented to the networks.

Results – There are a number of ways that one could express the performance of a trained

ANN. One such procedure is to calculate the mean sum of squares of the network's errors


(mse). The algorithms4 employed in the current study calculate and plot the progress of mse as the networks learn. Figure 3 shows the performance during training of the 31-31-1 multi-layer neural network (MLNN) architecture as the network learns.
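The mse measure that the toolbox plots is simply the mean of the squared differences between network outputs and targets; a minimal Python equivalent, with made-up outputs for illustration:

```python
import numpy as np

def mse(outputs, targets):
    """Mean sum of squares of the network's errors (the 'mse' the toolbox plots)."""
    outputs, targets = np.asarray(outputs, float), np.asarray(targets, float)
    return float(np.mean((outputs - targets) ** 2))

# Example: three cases with targets 1, 0, 1 and network outputs 0.9, 0.1, 0.8.
error = mse([0.9, 0.1, 0.8], [1, 0, 1])
```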

Table 2b – Results of Experimentation for the Validation Set

Experiment | S   | Abused | Not-Abused | Correct % | False Neg % | False Pos %
I      | 0.8 | 31 | 441 | 74  | 25  | 13
II     | NA  | 31 | 441 | --- | --- | ---
III-1  | 0.5 | 31 | 441 | 87  | 13  | 24
III-2  | 1.5 | 31 | 441 | 71  | 29  | 9
III-3  | 2.0 | 31 | 441 | 70  | 29  | 6
IV-1   | 0.3 | 31 | 441 | 87  | 13  | 25
IV-2   | 0.8 | 31 | 441 | 80  | 19  | 9
IV-3   | 1.2 | 31 | 441 | 74  | 26  | 8
V-1    | 0.2 | 31 | 441 | 90  | 10  | 27
V-2    | 0.8 | 31 | 441 | 77  | 22  | 14
V-3    | 1.5 | 31 | 441 | 77  | 22  | 6
VI-1   | 0.4 | 31 | 441 | 87  | 13  | 20
VI-2   | 0.8 | 31 | 441 | 87  | 13  | 9
VI-3   | 1.2 | 31 | 441 | 84  | 13  | 9
VII-1  | 0.3 | 31 | 441 | 90  | 10  | 13
VII-2  | 0.8 | 31 | 441 | 84  | 16  | 8
VII-3  | 1.0 | 31 | 441 | 84  | 16  | 6
VIII-1 | 0.2 | 0  | 0   | --- | --- | ---
VIII-2 | 0.4 | 0  | 0   | --- | --- | ---
VIII-3 | 1.0 | 0  | 0   | --- | --- | ---

It is to be noted that the error drops first moderately, then slowly, and finally rapidly until it falls to about zero. In fact, this trained network distinguishes all of the abused children in the Training Set from those not abused with 100 percent accuracy; see Experiment I in Table 2a and also Figure 4. In Figure 4 we have plotted the distribution

4 A variety of training algorithms were tested; see Table 1. "Trainbfg", "trainbr", and "newrbe" are names in Matlab. Trainbfg is a quasi-Newton backpropagation algorithm. Trainbr is an algorithm that uses the Validation Set to stop over-training. Newrbe creates a radial basis network and specializes it to the Training Set.


of original data in the Training Set, superimposed on the distribution of the outputs of the trained neural network upon presentation of the input vectors in the Training Set. There are 884 cases in the Training Set, shown on the horizontal axis. Associated with each case is a target and also the output from the neural network. The distributions of the original Training Set and the neural network outputs are identical and cannot be distinguished in Figure 4: classification is 100 percent correct.

Table 2c – Results of Experimentation for the Test Set

Experiment | S   | Abused | Not-Abused | Correct % | False Neg % | False Pos %
I      | 0.8 | 33 | 442 | 78  | 21  | 11
II     | NA  | 33 | 442 | --- | --- | ---
III-1  | 0.5 | 33 | 442 | 87  | 13  | 24
III-2  | 1.5 | 33 | 442 | 71  | 29  | 9
III-3  | 2.0 | 33 | 442 | 70  | 29  | 6
IV-1   | 0.3 | 33 | 442 | 93  | 6   | 22
IV-2   | 0.8 | 33 | 442 | 75  | 24  | 9
IV-3   | 1.2 | 33 | 442 | 70  | 30  | 4
V-1    | 0.2 | 33 | 442 | 89  | 12  | 22
V-2    | 0.8 | 33 | 442 | 82  | 19  | 8
V-3    | 1.5 | 33 | 442 | 63  | 36  | 2
VI-1   | 0.4 | 33 | 442 | 89  | 12  | 18
VI-2   | 0.8 | 33 | 442 | 72  | 27  | 3
VI-3   | 1.2 | 33 | 442 | 72  | 27  | 3
VII-1  | 0.3 | 33 | 442 | 89  | 12  | 13
VII-2  | 0.8 | 33 | 442 | 88  | 12  | 7
VII-3  | 1.0 | 33 | 442 | 91  | 9   | 7
VIII-1 | 0.2 | 45 | 617 | 93  | 6   | 16
VIII-2 | 0.4 | 45 | 617 | 73  | 27  | 4
VIII-3 | 1.0 | 45 | 617 | 77  | 22  | 9


Figure 3 – Performance of a 31-31-1 Network (Experiment I) During Training

Figure 4 – Distribution of outputs of the trained network when the Training Set was presented to it, superimposed on the distribution of actual Training Set data (vertical axis: discrimination index, from Not Harmed to Harmed; horizontal axis: cases in the Training Set)

The meaning of the vertical axis in Figure 4 needs explanation. Any horizontal line drawn between the lower and upper data points on the vertical axis can serve as an instrument for separating the two classes. A separation can be implemented by drawing a horizontal line at a given value of the vertical axis and counting the number of cases that fall above or below that line. Those falling above the line are cases that the trained neural network has identified as belonging to the abused class. Column S in Tables 2a, 2b, and 2c shows our search for such a value.


It is to be noted that the values on the vertical axis in Figure 4 differ from the 1 and 0 originally assigned to abused and not-abused cases because the data was processed through the Principal Component Analysis algorithm; see Figure 1. Tables 2b and 2c show the results when we presented the cases in the Validation and Test Sets to the previously trained network of Experiment I (data that the trained network had not seen before). As can be seen, the capability of the network to distinguish between the two classes (harmed and not harmed) deteriorates. There are 441 cases in the Validation Set, distributed according to Figure 5. Figure 6 shows the distribution of outputs of the trained neural network when the Validation Set was presented to it. Here the separation is not as clean as in Figure 5. From Figure 6 it can be seen that the lower the value of S, the more correctly the network classifies the abused cases, but at a price: the network also misclassifies other cases. Any horizontal line in Figure 6 (the value S in Tables 2a, 2b, and 2c) classifies the cases of the Validation Set in three different ways. Many abused cases fall above the line and are correctly classified. However, we also notice that some of the abused cases

Figure 5 – Distribution of the Actual Validation Set


are misclassified (False Negative). In addition, we find that some of the cases that are not abused are misclassified as abused (False Positive). Tables 2a, 2b, and 2c show these three types of classification as the value of the vertical axis is varied. Figures 7a and 7b show similar behavior with respect to the Test Set. In all cases the performance suffers when new data are presented to the trained network.

Figure 6 – Distribution of outputs of the 31-25-1 trained network when the Validation Set was presented to it (vertical axis runs from Not Harmed to Harmed; the horizontal line S marks the separation threshold)

In Experiment II we decreased the number of neurons in the hidden layer, creating a 31-20-1 network. Figure 8 shows that this network cannot learn the Training Set; the error remains very high. So we tested a 31-25-1 multi-layer neural network (MLNN). Experiments III to VII were undertaken to observe the performance of this network under different conditions and with different training algorithms.


Figure 7a – Distribution of the Actual Test Set

Figure 7b – Distribution of outputs of the 31-25-1 Trained Network when the Test Set was presented to it

Experiment III was conducted without any restriction on the learning process of the network. The goal for performance was set to 0 error. Figure 9 shows the error function. As can be observed from Table 2a, the network learns the Training Set almost perfectly. This reduction of the hidden layer also improves the network's capability to respond to new data, as can be seen by comparing the last three columns of Tables 2a, 2b, and 2c: the "Correct" classification increases from 74 to 83 percent for the Validation Set and from 78 to 89 percent for the Test Set. Experiments III-2 and III-3 were made to explore the impact of S.

Figure 8 – Performance of a 31-20-1 Network

Figure 9 – Performance of a 31-25-1 Network

Another way to improve the performance of a network is to stop over-training it. Experiments IV, V, and VI were undertaken to explore this possibility. Figure 10 shows the error function for Experiment VI, where we stopped the learning process when the error function fell below 20 percent. It is interesting to note the response of this network, after training, to the Training Set itself. Figure 11 shows the distribution of outputs for the Training Set. Comparison of Figure 11 with Figure 4 shows the impact of premature training stoppage. In Figure 4 the distribution of outputs is identical with the distribution of inputs, while in Figure 11 the scatter of outputs is self-evident. Tables 2b and 2c show that the performance for S = 0.4 improves from 83 to 87 percent in the case of the Validation Set but does not change for the Test Set.

Figure 10 – Performance of a 31-25-1 Network, Goal Set as 0.2

Experiment VII was undertaken to explore a technique that relies on the Validation Set to stop over-training. This technique, for S = 0.3, predicts 90 percent "Correct" for the Validation Set and 89 percent for the Test Set.

Figure 11 – Distribution of outputs of the 31-25-1 Trained Network when the Training Set was presented to it


It takes several hours (sometimes as much as 5 hours) to train an MLNN of the type discussed above on a desktop personal computer.5 In Experiment VIII we tested a two-layer Radial Basis Network (RBN). RBN is a radically different architecture from the MLNN; it can learn the Training Set in about 10 minutes and with excellent results, as can be seen from Tables 2a, 2b, and 2c. For S = 0.2 this network classifies 93 percent "Correct" for the Test Set. Figure 12 shows the distributions of the Training Set and the Test Set compared with the network's outputs when these sets were presented to the trained networks. This comparison shows the capability of RBN to classify.

An Observation – This preliminary work6 shows that ANN technology has the potential

to be used as a risk assessment tool to aid child protection agents in making decisions regarding children at risk. The development of any risk assessment tool in any field of interest depends on being able to discover the patterns that distinguish different situations. These patterns can be discovered in two ways. If a verifiable theory regarding the governing mechanism of the system under consideration exists, the classification is straightforward. However, if the system is complex and explanatory theories are lacking, the only other choice for

5 It must be noted that training time is not the same as the time required to classify new data presented to the network, which takes only a few seconds.

6 Almost simultaneously with the distribution of the first draft of this paper (November 2000), a very significant paper by David Marshall and Diana English appeared in print (Marshall 2000), in which the authors described their use of neural networks in the State of Washington. This scholarly paper is very important and supports the notion of the utility of neural networks as a risk assessment tool in the field of child abuse. In it the authors discuss the advantages of using neural networks for the modeling of complex social science data. They applied neural network analysis to Washington State Child Protective Services risk assessment data. The paper discusses various aspects of neural network design and compares the advantages and disadvantages of the multi-layer perceptron with a number of popularly used statistical techniques such as logistic regression. An extensive and useful literature survey is also presented. They concluded that neural networks can be a useful tool for the analyst seeking to model complex relationships in the risk assessment field related to child abuse.


Fig. 12a - Distribution of the Training Set

Fig. 12b - Distribution of outputs of Trained Network When the Training Set Was Presented to It

Fig. 12c – Distribution of the Test Set

Fig. 12d - Distribution of the Outputs of Trained Network When the Test Set Was Presented to It

discovery is by a process based on examples of the past behavior of the system. ANN is a technology that is capable of discovering patterns in a set of data, if such patterns exist. ANN is an inductive learning system that tacitly discovers the working mechanism of a system from examples of its past behavior. Thus, for complex systems for which explanatory theories are lacking but data regarding past behavior is available, ANN is a powerful tool of investigation. The phenomenon of child abuse belongs to one such class of systems.
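The exact-design radial basis network of Experiment VIII can be sketched in Python. This is a simplified analogue of Matlab's newrbe, not the toolbox code: one Gaussian radbas unit is centered on each training case and the output weights are solved for exactly, so the Training Set is reproduced perfectly, as in Table 2a. The data below is synthetic, and the 0.8326 spread scaling is taken from the newrbe convention (radbas falls to 0.5 at a distance equal to the spread).

```python
import numpy as np

rng = np.random.default_rng(2)

def radbas(n):
    # Gaussian radial basis transfer function: exp(-n^2).
    return np.exp(-n ** 2)

def newrbe_like(X, y, spread=1.0):
    """Exact radial basis network in the spirit of Matlab's newrbe:
    one radbas unit per training case; output weights solved exactly."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    Phi = radbas(0.8326 * D / spread)      # hidden-layer responses
    w = np.linalg.solve(Phi, y)            # exact fit to the training set
    def predict(Z):
        Dz = np.linalg.norm(Z[:, None, :] - X[None, :, :], axis=2)
        return radbas(0.8326 * Dz / spread) @ w
    return predict

# Toy data: the network reproduces its training targets exactly.
X = rng.normal(size=(30, 5))
y = rng.integers(0, 2, size=30).astype(float)
predict = newrbe_like(X, y)
```

Because the hidden layer interpolates the training cases exactly, generalization to a Test Set depends entirely on the spread parameter, which parallels the paper's observation that the RBN memorized the Training Set perfectly yet still misclassified some new cases.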

References:
1- Kaufman, Adam (2000). Application of Neural Network to Child Abuse Risk Assessment. Philadelphia, PA: University of Pennsylvania (unpublished).


2- Marshall, David B. and English, Diana J. (2000). "Neural Network Modeling of Risk Assessment in Child Protective Services," Psychological Methods, Vol. 5, No. 1, 102-124.
3- Sedlak, A.J., Hartman, I., & Schulz, D. (1997). Third National Incidence Study of Child Abuse and Neglect (NIS-3) Public Use Manual. Washington, DC: U.S. Department of Health and Human Services.
4- Stevenson, Wayne T., Director, Bureau of County Children and Youth Programs, State of Pennsylvania, private communication, Jan. 4, 2001.


Appendix A – A Non-technical Description of Artificial Neural Networks

The task of distinguishing various categories of "objects" is sine qua non to all human activities and is fundamental to all sciences. Table 1 shows the importance of the capability to classify objects in several different fields.

Table 1- Examples of Application

FIELD        | CLASSES
CHILD ABUSE  | ABUSED VS. NOT-ABUSED
BANKING      | ABUSED VS. NOT-ABUSED
MEDICINE     | CANCEROUS VS. NOT-CANCEROUS
DEFENCE      | ENEMY VS. FRIENDLY AIRCRAFT
RECOGNITION  | GRAND-MOTHER VS. NOT GRAND-MOTHER
INDUSTRY     | DEFECTIVE VS. NORMAL PARTS

For the sake of ease of discourse, let us examine a simple case. Assume that we have a container full of oranges and apples and we want to separate them; see Figure 1. A human being, even one with a low IQ, can accomplish the task of separating apples from oranges readily, albeit tediously. In contrast, even a modern, powerful computer requires massive numbers of equations and instructions to do the same. How does a human being do it? The source of this capability must lie in the brain. It is estimated that an average human brain consists of about 100,000,000,000 neurons1! In this gray mass, each neuron is connected to approximately 10,000 other neurons! All that we sense, know, feel, and think, including our capability to distinguish patterns, must be processed in this gray mass. It would be instructive to explore how it works. ANN research seeks to unravel this mystery by attempting to model the brain.

Figure 1- Statement of the Problem

1 I could have written 10^11 instead of those zeros, but I think this way the non-technical person will have a better intuitive feeling for the magnitude.


Figure 2, which appears in almost all elementary publications about the brain, schematically shows two typical cortical pyramidal neurons. A neuron receives signals from other neurons via its dendrites and processes these signals in its cell body; if the combined nature of the processed signals meets a certain criterion, the neuron fires a signal to other neurons. Thus neurons communicate, with the astonishing result of what we call consciousness.

Figure 2 – Two Typical Neurons in Brain

Figure 3 (Johnson 1997) depicts the cellular structure of the human visual cortex based on a Golgi stain preparation taken from Conel (1939-67).

Figure 3- Cellular Structure of Visual Cortex

The development of complexity in brain structure at the early stages of life is fascinating. Two observations regarding neurons are important to the current discussion: 1) each neuron performs a simple mechanical function of manipulating the signals it receives and, in


turn, fires signals if certain conditions are met (see Figure 4); and 2) there are huge numbers of interconnections among neurons. ANN research attempts to model these two properties and explore whether meaningful information can be acquired.

Figure 4- Stimulus-Response Relationship in a Single Neuron

Figure 5 shows the structure of an artificial neuron.

Figure 5- An Artificial Neuron (inputs 1 through R, each with its own weight, together with a bias, feed the neuron's "algorithm"; its interim output n is passed through a transfer "function" to produce the output a)

Note that in certain respects this artificial neuron resembles a biological neuron. It receives R input signals from outside (from the senses or from other neurons), manipulates these inputs via some kind of "algorithm" (with different weights for different signals), creating an interim output n, which, when presented to a certain "function," produces an output "a" that can be sent to other neurons. In the world of artificial neurons there are many different kinds of "algorithms" and many different "functions," so that the behavior of any biological neuron, with respect to signal processing, can be modeled using an artificial neuron.
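The signal flow just described (weighted inputs plus a bias forming the interim output n, then a transfer function producing a) can be written out in a few lines of Python. The numbers are arbitrary, and tanh is used here as one possible transfer "function":

```python
import math

def artificial_neuron(inputs, weights, bias):
    """Weighted sum of R inputs plus a bias (the 'algorithm'),
    passed through a transfer 'function' to produce the output a."""
    n = sum(w * p for w, p in zip(weights, inputs)) + bias  # interim output n
    return math.tanh(n)                                     # output a

# Three arbitrary input signals with arbitrary weights and bias.
a = artificial_neuron([0.5, -1.0, 2.0], [0.3, 0.2, 0.1], bias=-0.1)
```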


This simple mechanical structure has a very interesting and useful property: it can learn to distinguish between two linearly separable patterns. Let us examine again the apples-and-oranges problem. A human distinguishes an apple from an orange by looking at their color, shape, texture, etc. Without loss of generality, let us concentrate only on color and shape. Since in this case identification requires two pieces of information, we refer to it as a vector of properties with two elements and represent it by the notation [shape, color]. Obviously, not all apples are alike, nor are all oranges. Figure 6 exhibits these variations in a two-dimensional space. Despite the variations, the cluster of apples is sufficiently separable from the cluster of oranges that, for all practical purposes, we have no difficulty separating them. If we take a bunch of apples and a bunch of oranges (called examples of the apple and orange types) and somehow measure the color and shape of these apples and oranges, we can then teach a single neuron to learn (see the section on training below) to locate the line of separation shown in Figure 6. Thus, if we present a new apple to a trained neuron, it can distinguish it from an orange by simply finding which side of the line of separation the combined properties of color and shape fall on.

[Figure omitted: clusters of apples and oranges plotted by shape and color, divided by a line of separation.]

Figure 6 - Line of Separation
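Once the line of separation has been found, classification reduces to checking which side of the line a new [shape, color] vector falls on. In the sketch below, the weights and bias are made-up stand-ins for values a trained neuron would have found.

```python
# Illustrative sketch: a single trained neuron deciding which side of the
# line of separation a new fruit falls on. The weights (w) and bias (b)
# below are hypothetical, standing in for the result of training.

def classify(shape, color, w=(-1.0, 1.0), b=0.0):
    # n > 0 falls on the "orange" side of the line, n <= 0 on the "apple" side.
    n = w[0] * shape + w[1] * color + b
    return "orange" if n > 0 else "apple"

print(classify(shape=0.9, color=0.2))  # apple-like measurements
print(classify(shape=0.2, color=0.9))  # orange-like measurements
```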

In practice, what we need to do is to transfer the problem of apples and oranges to an abstract domain in which an artificial neuron of the type shown in Figure 5 can decide on which side of the separation line the combined properties of a given object fall. Using vector notation and appropriate metrics to represent the object's properties allows us to implement this transformation. Of course, in the real world the vector of properties may consist of many elements. In general, a vector of properties may contain n elements and may be represented by the notation [p1 p2 … pn], each p representing a different property. For example, when the State of Pennsylvania wants to distinguish children at risk of abuse and maltreatment from children who are not, it collects 15 pieces of information, see Table 2. In this case the vector of properties has 15 elements - we are dealing with a 15-dimensional space of properties. Of course, we cannot graphically show this space as we did for the two-dimensional space of shape and color. We have to envision this space in our minds.

Table 2 - Pennsylvania Collects Data on the Following 15 Properties to Identify Children at Risk

Environment:
- Vulnerability
- Severity/Frequency and Recentness
- Prior Abuse
- Extent of Emotional Harm

Caregiver:
- Age, Physical, Intellectual or Emotional Status
- Cooperation
- Parenting Skill/Knowledge
- Alcohol/Substance Abuse
- Access to Children
- Prior Abuse/Neglect
- Relationship with Children

Family:
- Family Violence
- Condition of the Home
- Family Support
- Stresses
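To make the idea of a 15-dimensional property vector concrete, here is a hypothetical encoding of a single case using the Pennsylvania properties of Table 2. The field names and the 0-to-1 scoring scheme are illustrative assumptions, not the state's actual encoding.

```python
# Hypothetical 15-element property vector for one case, keyed by the
# Pennsylvania properties of Table 2. Scores here are made-up values.

case = {
    "vulnerability": 0.8, "severity_recentness": 0.6, "prior_abuse": 1.0,
    "emotional_harm": 0.4, "caregiver_status": 0.5, "cooperation": 0.2,
    "parenting_skill": 0.3, "substance_abuse": 1.0, "access_to_children": 1.0,
    "caregiver_prior_abuse": 0.0, "relationship": 0.4, "family_violence": 0.7,
    "home_condition": 0.5, "family_support": 0.1, "stresses": 0.9,
}

# The network sees the case only as an ordered 15-element vector [p1 ... p15].
p = list(case.values())
assert len(p) == 15
```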

The State of Texas, on the other hand, collects 37 pieces of information to evaluate the situation of children at risk, see Table 3.

Table 3 - Texas Collects Data on the Following 37 Properties to Identify Children at Risk

Child Characteristics:
- Child Age
- Risk Level
- Disability
- Development
- Behavioral Problems
- Self-protection
- Fear of Caretaker

Severity of CA/N:
- Dangerous Acts
- Physical Injury/Harm
- Emotional Harm/Abuse
- Medical Care
- Basic Needs
- Supervision
- Hazards in Home
- Sexual Abuse
- Nonsexual Exploitation

Chronicity of CA/N:
- Victimization of Others

Caretaker Impairments:
- Deviant Arousal
- Substance Abuse
- History of Domestic Abuse
- History of Domestic Violence
- History of CA/N
- Parenting Skills
- Nurturance
- Recognition of Problem
- Protection of Child
- Cooperation with Agency

Caretaker Relationship with Child:
- Response to Child's Behavior
- Attachment/Bonding
- Child's Role in Family
- Pressuring Child to React
- Personal Boundary Issue
- Response to Disclosure

Social & Economic:
- Stress on Caretaker
- Employment Status
- Social Support
- Economic Resources

Access:
- Access to and Responsibility for Child


In the complex system of child abuse, the families that potentially may harm their children are not linearly separable from the families that will not. Thus, a single artificial neuron, which can separate only linearly separable objects (i.e., when a straight line of separation suffices, see Figure 5), is not up to the task. This problem of linearly nonseparable objects occurs even in much simpler situations than child abuse. For instance, look at Figure 6, where two classes of objects are shown in a two-dimensional space (X1, X2). No straight line exists that can discriminate between these two classes.7

[Figure omitted: two interleaved classes of objects plotted against axes X1 and X2, which no single straight line can separate.]

Figure 6 - Non-linearly Separable Objects

To implement the separation between the two classes shown in Figure 6, we need three neurons configured in the architecture shown in Figure 7.

[Figure omitted: inputs X1 and X2 feed two first-layer neurons (N11, N21) through weighted connections; their outputs (A11, A21) feed a single second-layer neuron (N2) that produces the output a.]

Figure 7 - A Two-Layer Neural Network: the First Layer Consists of Two Neurons and the Second (Output) Layer Consists of One Neuron

7 This figure represents the logical XOR statement.


Each of the neurons in the first layer generates one separation line, as shown in Figure 8, and the single neuron in the second layer performs an AND function, combining the appropriate regions to implement discrimination between the classes.

[Figure omitted: the two separation lines drawn by the first-layer neurons in the (X1, X2) plane.]

Figure 8 - Implementation of XOR
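This construction can be sketched directly: each first-layer neuron draws one line, and the output neuron ANDs the two. The particular weights and biases below are hand-picked for illustration, not trained.

```python
# Sketch of the two-layer network of Figure 7 solving XOR. Each first-layer
# neuron defines one separation line in the (X1, X2) plane; the output
# neuron implements AND over the first layer's outputs.

def step(n):
    # Hard-limit transfer function.
    return 1 if n >= 0 else 0

def xor_net(x1, x2):
    # First layer: two neurons, each drawing one line.
    h1 = step(x1 + x2 - 0.5)     # fires when at least one input is on
    h2 = step(-x1 - x2 + 1.5)    # fires when not both inputs are on
    # Second (output) layer: one neuron performing AND of h1 and h2.
    return step(h1 + h2 - 1.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))
```

No single neuron can reproduce this truth table, which is exactly the point of Figure 6: the XOR classes are not linearly separable.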

The network in Figure 7 is an example of a multi-layer neural network. In general, a multi-layer neural network may consist of many neurons arranged in many layers, with the capability to discriminate non-linearly separable classes. In the case of social systems, such as child abuse, we are dealing with a multi-dimensional space that cannot be shown graphically. Nevertheless, if there exist differences that discriminate between the families that potentially might harm their children and those that might not, a multi-layer neural network can learn to distinguish them.

Training – A neural network is an inductive system: it learns from examples. By "learning," we mean that an artificial neuron has adjusted its weights so that the output "a", see Figure 9, belongs either to the apple group or to the orange group. There are many algorithms to accomplish this task. The process starts by choosing small random values for the weights, then presenting the property vector of an apple to the neuron, calculating "a" (step one), comparing "a" with the target "t" for apple and calculating the error "t-a" (step two), and adjusting the weights to minimize some function of the error (step three). This process continues for each apple and each orange in the training lots available to us until the error function falls below a certain acceptable level. Figure 9 graphically shows the training process. There are many different error functions that can be used; a widely used one is the sum of the squared errors. When the error function is small enough, we say that the network has learned the data. After the network has been trained, if we present to it a new property vector representing an object, it can decide to which class that object belongs.
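The three-step loop described above can be sketched with the classic perceptron learning rule, one of the many training algorithms mentioned. The [shape, color] training examples and the unit learning rate are illustrative assumptions.

```python
# Sketch of the training process: step one computes the output a, step two
# computes the error t - a, step three adjusts the weights. One pass over
# all examples is one epoch; training stops when the sum of squared errors
# (the error function) reaches zero.

def step(n):
    return 1 if n >= 0 else 0

# Each example: (property vector [shape, color], target t): apple = 0, orange = 1.
examples = [([0.9, 0.1], 0), ([0.8, 0.3], 0), ([0.2, 0.9], 1), ([0.1, 0.8], 1)]

w, b = [0.0, 0.0], 0.0                 # start from small (here zero) weights
for epoch in range(20):                # one pass over all examples = one epoch
    sse = 0
    for p, t in examples:
        a = step(w[0] * p[0] + w[1] * p[1] + b)    # step one: output a
        e = t - a                                   # step two: error t - a
        sse += e * e
        w = [wi + e * pi for wi, pi in zip(w, p)]   # step three: adjust weights
        b += e
    if sse == 0:                       # error function small enough: learned
        break
```

After training, presenting a new [shape, color] vector to the neuron classifies it by which side of the learned line of separation it falls on.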


In the child protection system a property vector may look like {age of caregiver, vulnerability, prior abuse, …, alcohol abuse}8 and the targets may look like {abused (1), not abused (0)}.

[Figure omitted: the training process over one epoch of n examples. Each example's property vector (an apple or an orange) is given with its target (t1 = 0, t2 = 1, …, ti = 1, …, tn = 0); the network calculates the outputs a1, a2, …, ai, …, an (step one); the errors t1-a1, t2-a2, …, ti-ai, …, tn-an are computed (step two); and the learning algorithm adjusts the weights to minimize some function of the errors, such as the sum of squared errors (step three).]

Figure 9 - Training Process (n Examples = One Epoch)

8 If we use Pennsylvania data.
