Chapter 5: Discriminant Analysis

Author: Daniel Fisher
Discriminant analysis is the appropriate statistical technique when the dependent variable is categorical and the independent variables are quantitative. In many cases, the dependent variable consists of two groups or classifications, for example, male versus female, high versus low, or good credit risk versus bad credit risk. In other instances, more than two groups are involved, such as a three-group classification involving low, medium and high classifications.

The basic purpose of discriminant analysis is to estimate the relationship between a single categorical dependent variable and a set of quantitative independent variables. Discriminant analysis has widespread application in situations where the primary objective is identifying the group to which an object (e.g. person, firm or product) belongs. Potential applications include predicting the success or failure of a new product, deciding whether a student should be admitted to graduate school, classifying students as to vocational interests, determining what category of credit risk a person falls into, or predicting whether a firm will be a success or not.

Discriminant analysis is capable of handling either two groups or multiple groups. When three or more classifications are identified, the technique is referred to as multiple discriminant analysis (MDA). Discriminant analysis involves deriving a variate, the linear combination of two (or more) independent variables that will discriminate best between the defined groups. The linear combination for a discriminant analysis, also known as the discriminant function, is derived from an equation that takes the following form:

Z = W1X1 + W2X2 + W3X3 + … + WnXn

where:
Z  = discriminant score
Wi = discriminant weight for variable i
Xi = independent variable i
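The score for a single observation is just this weighted sum. A minimal sketch, using made-up weights and values for three independent variables (none of these numbers come from the chapter):

```python
import numpy as np

# Hypothetical discriminant weights W1..W3 and one object's values X1..X3;
# both are illustrative, not estimated from any real data.
weights = np.array([0.45, -0.12, 0.83])
x = np.array([6.1, 2.4, 7.9])

# Discriminant score Z = W1*X1 + W2*X2 + W3*X3
z = float(weights @ x)
```

In practice the weights are estimated so that the resulting Z scores separate the defined groups as well as possible.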

Discriminant analysis is the appropriate statistical technique for testing the hypothesis that the group means of a set of independent variables for two or more groups are equal. This group mean is referred to as a centroid. The centroids indicate the most typical location of any individual from a particular group, and a comparison of the group centroids shows how far apart the groups are along the dimension being tested. A situation with three groups (1, 2 and 3) and two independent variables (X1 and X2) is plotted below.
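A centroid is simply the mean vector of a group's observations. A small sketch with illustrative data for the three-group, two-variable situation described above:

```python
import numpy as np

# Illustrative data: two independent variables (X1, X2) for objects in
# three groups; the values are invented for demonstration only.
X = np.array([[1.0, 2.0], [1.2, 2.1], [0.8, 1.9],   # group 1
              [4.0, 4.5], [4.2, 4.4], [3.8, 4.6],   # group 2
              [7.0, 1.0], [7.1, 1.2], [6.9, 0.8]])  # group 3
groups = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])

# The centroid of each group is the mean of its members on each variable.
centroids = {g: X[groups == g].mean(axis=0) for g in np.unique(groups)}
```

The distances between these mean vectors are what the significance tests for the discriminant function summarize.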

The test for the statistical significance of the discriminant function is a generalized measure of the distance between the group centroids. If the overlap in the distributions is small, the discriminant function separates the groups well. If the overlap is large, the function is a poor discriminator between the groups.

Multiple discriminant analysis is unique in one characteristic among the dependence relationships we will study. If there are more than two groups in the dependent variable, discriminant analysis will calculate more than one discriminant function. In fact, it will calculate NG − 1 functions, where NG is the number of groups.

Step 1: Objectives of Discriminant Analysis

Discriminant analysis can address any of the following research questions:
• Determining whether statistically significant differences exist between the average score profiles on a set of variables for two (or more) defined groups.
• Determining which of the independent variables account the most for the differences in the average score profiles of the two or more groups.
• Establishing procedures for classifying statistical units (individuals or objects) into groups on the basis of their scores on a set of independent variables.
• Establishing the number and composition of the dimensions of discrimination between groups formed from the set of independent variables.

HATCO Example

One of the customer characteristics obtained by HATCO in its survey was a categorical variable (X11) indicating which purchasing approach a firm used: total value analysis (X11 = 1) versus specification buying (X11 = 0). The HATCO management team

expects that firms using these two approaches will emphasize different characteristics of suppliers in their selection decisions. The objective is to identify the perceptions of HATCO (X1 to X7) that differ significantly between firms using the two purchasing methods. The company would then be able to tailor its sales presentations and the benefits offered to best match the buyer's perceptions.

Step 2: Research Design for Discriminant Analysis

The number of dependent variable groups (categories) can be two or more, but these groups must be mutually exclusive and exhaustive. When three or more categories are created, the possibility arises of examining only the extreme groups in a two-group discriminant analysis. This procedure is called the polar-extremes approach: it involves comparing only the two extreme groups and excluding the middle group from the discriminant analysis. The polar-extremes approach may be useful if we had three groups of cola drinkers (light, medium and heavy) and there was considerable overlap between the three categories. We may not be able to discriminate clearly between the three groups, but the differences between light and heavy users may be more pronounced.

Independent variables are usually selected in two ways: either from previous research or from intuition – selecting variables for which no previous research or theory exists but that might logically be related to predicting the groups for the dependent variable.

Discriminant analysis is quite sensitive to the ratio of sample size to the number of predictor variables. Many studies suggest a ratio

of 20 observations for each predictor variable, although this will often be unachievable. At a minimum, the smallest group size must exceed the number of independent variables.

Often the sample is divided into two subsamples, one used for estimation of the discriminant function (the analysis sample) and another for validation purposes (the holdout sample). This method of validating the function is referred to as the split-sample or cross-validation approach. No definite guidelines have been established for dividing the sample into analysis and holdout groups. The most popular procedure is to divide the total group so that one-half of the respondents are placed in the analysis sample and the other half are placed in the holdout sample; some researchers prefer a 60-40 or a 75-25 split, however. When selecting the individuals for the analysis and holdout groups, one usually follows a proportionately stratified sampling procedure, i.e. if a sample consists of 40 males and 60 females, the holdout sample for a 50-50 split should consist of 20 males and 30 females. If the sample size is not large enough to split in this way (n < 100), one compromise is to develop the function on the entire sample and then use that function to classify the same group used to develop it. This approach gives an inflated idea of the predictive accuracy of the function, though.

Example (HATCO, continued)

The discriminant analysis will use the first seven variables from the database (X1 to X7) to discriminate between firms applying each purchasing method (X11). The sample of 100 observations meets the suggested minimum size and provides a 15-to-1 ratio of observations to independent variables.
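A proportionately stratified random split of this kind can be sketched as follows; the 40/60 group sizes echo the male/female example above, and the 60-40 analysis/holdout proportion is one of the splits mentioned in the text:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the split is reproducible

# Illustrative sample of 100: 40 members of group 0 and 60 of group 1.
labels = np.array([0] * 40 + [1] * 60)
indices = np.arange(len(labels))

analysis_idx, holdout_idx = [], []
for g in np.unique(labels):
    # Shuffle each stratum separately, then take 60% for the analysis
    # sample and the remaining 40% for the holdout sample.
    members = rng.permutation(indices[labels == g])
    cut = int(round(0.60 * len(members)))
    analysis_idx.extend(members[:cut])
    holdout_idx.extend(members[cut:])
```

Because the split is done within each group, the analysis and holdout samples preserve the 40:60 group proportions of the full sample.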

We can split the sample of 100 into an analysis sample of 60 objects and a holdout sample of 40 objects. We should split the total sample using a proportionately stratified sampling procedure, and the split should be performed randomly to negate any possible bias in the ordering of our data.

Step 3: Assumptions of Discriminant Analysis

The key assumptions for deriving the discriminant function are multivariate normality of the independent variables and unknown (but equal) dispersion and covariance matrices for the groups. Data not meeting the multivariate normality assumption can cause problems in the estimation of the discriminant function; in such cases it is suggested that logistic regression be used as an alternative technique, if possible.

Unequal covariance matrices can adversely affect the classification process. If the sample sizes are small and the covariance matrices are unequal, the statistical significance of the estimation process is adversely affected. More likely is the case of unequal covariances among groups of adequate sample size, whereby observations are "overclassified" into the groups with larger covariance matrices. Another characteristic of the data that can affect the results is multicollinearity among the independent variables. Finally, an implicit assumption is that all relationships are linear. Nonlinear relationships are not reflected in the discriminant function unless specific variable transformations are made to represent nonlinear effects.

Example (HATCO, continued)


Our previous examinations of the HATCO data set indicated no problems with multicollinearity, and tests on the assumptions of normality were performed in Chapter 2. There was not sufficient evidence to stop us from proceeding with our analysis.

Step 4: Estimation of the Discriminant Function and Assessment of Overall Fit

Simultaneous estimation involves computing the discriminant function so that all of the independent variables are considered concurrently. The discriminant function is computed based on the entire set of independent variables, regardless of the discriminating power of each one. The simultaneous method is appropriate when, for theoretical reasons, the analyst wants to include all the independent variables in the analysis and is not interested in seeing intermediate results based only on the most discriminating variables.

Stepwise estimation is an alternative to the simultaneous approach. It involves entering the independent variables into the discriminant function one at a time on the basis of their discriminating power. The stepwise procedure begins by choosing the single best discriminating variable. The initial variable is then paired with each of the other independent variables one at a time, and the variable that best improves the discriminating power of the function in combination with the first variable is chosen. Eventually, either all independent variables will have been included in the function or the excluded variables will have been judged as not contributing significantly to further discrimination. The reduced set of variables is typically almost as good as, and sometimes better than, the complete set of variables.

Wilks' lambda, Hotelling's trace and Pillai's criterion all evaluate the statistical significance of the discriminatory power of the discriminant function(s). Roy's greatest characteristic root evaluates only the first discriminant function.

Assessing Overall Fit

As discussed earlier, the discriminant Z score of any discriminant function can be calculated for each observation by the following formula:

Zjk = a + W1X1k + W2X2k + … + WnXnk

where:
Zjk = discriminant Z score of discriminant function j for object k
a   = intercept
Wi  = discriminant coefficient for independent variable i
Xik = independent variable i for object k

This score provides a direct means of comparing observations on each function. The statistical tests for assessing the significance of the discriminant functions do not tell how well the function predicts. We may have group means that are virtually identical, yet find significant results because our sample size was large. To determine the predictive accuracy of a discriminant function, the analyst must construct classification matrices. A classification matrix contains numbers that reveal the predictive ability of the discriminant function: the numbers on the diagonal of the matrix represent the number of correct classifications, while the off-diagonal numbers represent misclassifications.

Before a classification matrix can be constructed, however, the analyst must determine to which group each individual is assigned. If we have two groups (A and B) and a discriminant function for each


group (ZA and ZB), we will assign each individual to the group on which it has the higher discriminant score.

The optimal solution must also consider the cost of misclassifying an individual into the wrong group. If the costs of misclassification are approximately equal across groups, the optimal solution is the one that misclassifies the fewest individuals in each group. If the misclassification costs are unequal, the optimal solution is the one that minimizes the total cost of misclassification. If the analyst is unsure whether the observed proportions in the sample are representative of the population proportions, then equal prior probabilities should be employed. However, if the sample is randomly drawn from the population so that the group sizes do estimate the population proportions, then the best estimates of the prior probabilities are not equal values but the sample proportions.

To validate the discriminant function through the use of classification matrices, the sample should have been randomly divided into two groups. One of the groups (the analysis sample) is used to compute the discriminant function. The other group (the holdout, or validation, sample) is retained for use in developing the classification matrix. The procedure involves multiplying the weights generated from the analysis sample by the raw variable measurements of the holdout sample. The individual discriminant scores for the holdout sample are then calculated, and each individual is assigned to the group on which it has the higher discriminant score.

A statistical test for the discriminatory power of the classification matrix is Press's Q statistic. This simple measure compares the number of correct classifications with the total sample size and the number of groups. The calculated value is then compared with a

critical value from the chi-square distribution with 1 degree of freedom. If the calculated value exceeds this critical value, the classification matrix can be deemed statistically better than chance. The Q statistic is calculated by the following formula:

Press's Q = [N – (nK)]² / [N(K – 1)]

where:
N = total sample size
n = number of observations correctly classified
K = number of groups

One must be careful in drawing conclusions based solely on this statistic, however, because as the sample size becomes larger, a lower classification rate will still be deemed significant.

Example (HATCO, continued)

First we examine the group means for each of the independent variables, based on the 60 observations constituting the analysis sample. The comparison of group means is shown in the table below:

Variable                  X11 = 0   X11 = 1   F-value   Significance
X1, Delivery Speed          2.712    4.3343     36.53
X2, Price Level
X3, Price Flexibility
X4, Manufacturer Image
X5, Service
X6, Salesforce Image
X7, Product Quality
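To make the classification-matrix and Press's Q calculations concrete, here is a small sketch; the 2×2 matrix values are hypothetical and are not HATCO results:

```python
import numpy as np

# Hypothetical classification matrix for a holdout sample of 40 objects:
# rows = actual group, columns = predicted group.
matrix = np.array([[18, 4],
                   [3, 15]])

N = matrix.sum()        # total sample size
n = np.trace(matrix)    # correctly classified (diagonal entries)
K = matrix.shape[0]     # number of groups

# Press's Q = [N - (n * K)]^2 / [N * (K - 1)]
press_q = (N - n * K) ** 2 / (N * (K - 1))

# Compare with the chi-square critical value (1 df, alpha = .05), 3.84.
better_than_chance = press_q > 3.84
```

Here 33 of 40 objects are correctly classified, so Q = (40 − 66)² / 40 = 16.9, which exceeds 3.84, and the matrix would be deemed better than chance.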
