
Discriminant Function Analysis

Overview
Discriminant function analysis, a.k.a. discriminant analysis or DA, is used to classify cases into the values of a categorical dependent, usually a dichotomy. If discriminant function analysis is effective for a set of data, the classification table of correct and incorrect estimates will yield a high percentage correct. Discriminant function analysis is found in SPSS under Analyze, Classify, Discriminant. One gets DA or MDA from this same menu selection, depending on whether the specified grouping variable has two or more than two categories. Multiple discriminant analysis (MDA) is an extension of discriminant analysis and a cousin of multiple analysis of variance (MANOVA), sharing many of the same assumptions and tests. MDA is used to classify a categorical dependent which has more than two categories, using as predictors a number of interval or dummy independent variables. MDA is sometimes also called discriminant factor analysis or canonical discriminant analysis.

There are several purposes for DA and/or MDA: to classify cases into groups using a discriminant prediction equation; to test theory by observing whether cases are classified as predicted; to investigate differences between or among groups; to determine the most parsimonious way to distinguish among groups; to determine the percent of variance in the dependent variable explained by the independents; to determine the percent of variance in the dependent variable explained by the independents over and above the variance accounted for by control variables, using sequential discriminant analysis; to assess the relative importance of the independent variables in classifying the dependent variable; to discard variables which are little related to group distinctions; and to infer the meaning of MDA dimensions which distinguish groups, based on discriminant loadings.

Discriminant analysis has two steps: (1) an F test (Wilks' lambda) is used to test if the discriminant model as a whole is significant, and (2) if the F test shows significance, then the individual independent variables are assessed to see which differ significantly in mean by group, and these are used to classify the dependent variable.

Discriminant analysis shares all the usual assumptions of correlation, requiring linear and homoscedastic relationships and untruncated interval or near-interval data. Like multiple regression, it also assumes proper model specification (inclusion of all important independents and exclusion of extraneous variables). DA also assumes the dependent variable is a true dichotomy, since data which are forced into dichotomous coding are truncated, attenuating correlation.

DA is an earlier alternative to logistic regression, which is now frequently used in place of DA as it usually involves fewer violations of assumptions (independent variables need not be normally distributed, linearly related, or have equal within-group variances), is robust, handles categorical as well as continuous variables, and has coefficients which many find easier to interpret. Logistic regression is preferred when data are not normal in distribution or group sizes are very unequal. However, discriminant analysis is preferred when the assumptions of linear regression are met, since then DA has more statistical power than logistic regression (less chance of type II errors, i.e., accepting a false null hypothesis). See also the separate topic on multiple discriminant function analysis (MDA) for dependents with more than two categories.

Contents: Key concepts and terms, Tests of significance, Effect size measures, Interpreting discriminant functions, SPSS output, Assumptions, Frequently asked questions, Bibliography

Key Terms and Concepts


Discriminating variables: These are the independent variables, also called predictors.

The criterion variable: This is the dependent variable, also called the grouping variable in SPSS. It is the object of classification efforts.

Discriminant function: A discriminant function, also called a canonical root, is a latent variable which is created as a linear combination of discriminating (independent) variables, such that L = b1x1 + b2x2 + ... + bnxn + c, where the b's are discriminant coefficients, the x's are discriminating variables, and c is a constant. This is analogous to multiple regression, but the b's are discriminant coefficients which maximize the distance between the means of the criterion (dependent) variable. Note that the foregoing assumes the discriminant function is estimated using ordinary least squares, the traditional method, but there is also a version involving maximum likelihood estimation.
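To make the formula concrete, here is a minimal sketch (an illustration, not the SPSS procedure itself) that fits a two-group discriminant function on a small hypothetical dataset with scikit-learn and reads off the b coefficients, the constant c, and the resulting L for each case; the data and variable names are invented for the example.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical data: 100 cases, 3 discriminating variables, a binary grouping variable
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X @ np.array([1.0, -0.5, 0.3]) + rng.normal(scale=0.5, size=100) > 0).astype(int)

lda = LinearDiscriminantAnalysis().fit(X, y)

b = lda.coef_[0]        # discriminant coefficients b1..bn (one row for a two-group problem)
c = lda.intercept_[0]   # the constant c
L = X @ b + c           # L = b1*x1 + b2*x2 + ... + bn*xn + c for every case
print(b, c, L[:5])
```

Later sketches in this document reuse this hypothetical X, y, and fitted lda object as a running example.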

Pairwise group comparisons display the distances between group means (of the dependent variable) in the multidimensional space formed by the discriminant functions. (Not applicable to two-group DA, where there is only one function.) The pairwise group comparisons table gives an F test of significance (based on Mahalanobis distances) of the distance of the group means, enabling the researcher to determine if every group mean is significantly distant from every other group mean. Also, the magnitude of the F values can be used to compare distances between groups in multivariate space. In SPSS, Analyze, Classify, Discriminant; check "Use stepwise method"; click Method, check "F for pairwise distances."

Number of discriminant functions: There is one discriminant function for 2-group discriminant analysis, but for higher-order DA, the number of functions (each with its own cut-off value) is the lesser of (g - 1), where g is the number of categories in the grouping variable, or p, the number of discriminating (independent) variables. Each discriminant function is orthogonal to the others.

A dimension is simply one of the discriminant functions when there is more than one, as in multiple discriminant analysis. The first function maximizes the differences between the values of the dependent variable. The second function is orthogonal to it (uncorrelated with it) and maximizes the differences between values of the dependent variable, controlling for the first function. And so on. Though mathematically different, each discriminant function is a dimension which differentiates a case into categories of the dependent (e.g., religions) based on its values on the independents. The first function will be the most powerful differentiating dimension, but later functions may also represent additional significant dimensions of differentiation.

The eigenvalue, also called the characteristic root of each discriminant function, reflects the ratio of importance of the dimensions which classify cases of the dependent variable. There is one eigenvalue for each discriminant function. For two-group DA, there is one discriminant function and one eigenvalue, which accounts for 100% of the explained variance. If there is more than one discriminant function, the first will be the largest and most important, the second next most important in explanatory power, and so on. The eigenvalues assess relative importance because they reflect the percents of variance explained in the dependent variable, cumulating to 100% for all functions. That is, the ratio of the eigenvalues indicates the relative discriminating power of the discriminant functions. If the ratio of two eigenvalues is 1.4, for instance, then the first discriminant function accounts for 40% more between-group variance in the dependent categories than does the second discriminant function. Eigenvalues are part of the default output in SPSS (Analyze, Classify, Discriminant).

The relative percentage of a discriminant function equals a function's eigenvalue divided by the sum of all eigenvalues of all discriminant functions in the model. Thus it is the percent of discriminating power for the model associated with a given discriminant function. Relative % is used to tell how many functions are important. One may find that only the first two or so eigenvalues are of importance.

The canonical correlation, R*, is a measure of the association between the groups formed by the dependent and the given discriminant function. When R* is zero, there is no relation between the groups and the function. When the canonical correlation is large, there is a high correlation between the discriminant functions and the groups. Note that relative % and R* do not have to be correlated. R* is used to tell how much each function is useful in determining group differences. An R* of 1.0 indicates that all of the variability in the discriminant scores can be accounted for by that dimension. Note that for two-group DA, the canonical correlation is equivalent to the Pearsonian correlation of the discriminant scores with the grouping variable.
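As a quick numeric illustration of these two quantities, the sketch below computes relative percentages and canonical correlations from a set of eigenvalues, using the standard identity R* = sqrt(eigenvalue / (1 + eigenvalue)); the eigenvalue figures are hypothetical.

```python
import numpy as np

eigenvalues = np.array([1.8, 0.6, 0.1])                  # hypothetical: one per discriminant function
relative_pct = 100 * eigenvalues / eigenvalues.sum()     # relative % of discriminating power
canonical_r = np.sqrt(eigenvalues / (1 + eigenvalues))   # R* for each function

print(relative_pct)   # the first function carries most of the discriminating power
print(canonical_r)    # values near 1 indicate a function strongly tied to group differences
```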

The discriminant score, also called the DA score, is the value resulting from applying a discriminant function formula to the data for a given case. The Z score is the discriminant score for standardized data. To get discriminant scores in SPSS, select Analyze, Classify, Discriminant; click the Save button; check "Discriminant scores". One can also view the discriminant scores by clicking the Classify button and checking "Casewise results."

Cutoff: If the discriminant score of the function is less than or equal to the cutoff, the case is classed as 0; if above, it is classed as 1. When group sizes are equal, the cutoff is the mean of the two centroids (for two-group DA). If the groups are unequal, the cutoff is the weighted mean.

Unstandardized discriminant coefficients are used in the formula for making the classifications in DA, much as b coefficients are used in regression in making predictions. The constant plus the sum of products of the unstandardized coefficients with the observations yields the discriminant scores. That is, discriminant coefficients are the regression-like b coefficients in the discriminant function, in the form L = b1x1 + b2x2 + ... + bnxn + c, where L is the latent variable formed by the discriminant function, the b's are discriminant coefficients, the x's are discriminating variables, and c is a constant. The discriminant function coefficients are partial coefficients, reflecting the unique contribution of each variable to the classification of the criterion variable. The standardized discriminant coefficients, like beta weights in regression, are used to assess the relative classifying importance of the independent variables. If one clicks the Statistics button in SPSS after running discriminant analysis and then checks "Unstandardized coefficients," SPSS output will include the unstandardized discriminant coefficients.
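The sketch below, continuing the hypothetical model fitted earlier, shows discriminant scores computed from the unstandardized coefficients and a cutoff applied for two-group classification; it assumes roughly equal group sizes so the cutoff is the simple mean of the two centroids.

```python
import numpy as np

scores = X @ lda.coef_[0] + lda.intercept_[0]   # discriminant score L for each case
centroid_0 = scores[y == 0].mean()              # mean score in group 0
centroid_1 = scores[y == 1].mean()              # mean score in group 1
cutoff = (centroid_0 + centroid_1) / 2          # mean of the two centroids (equal group sizes)

predicted = (scores > cutoff).astype(int)       # at or below the cutoff -> 0, above -> 1
```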

Standardized discriminant coefficients, also termed the standardized canonical discriminant function coefficients, are used to compare the relative importance of the independent variables, much as beta weights are used in regression. Note that importance is assessed relative to the model being analyzed. Addition or deletion of variables in the model can change discriminant coefficients markedly. As with regression, since these are partial coefficients, only the unique explanation of each independent is being compared, not considering any shared explanation. Also, if there are more than two groups of the dependent, the standardized discriminant coefficients do not tell the researcher between which groups the variable is most or least discriminating. For this purpose, group centroids and factor structure are examined. The standardized discriminant coefficients appear by default in SPSS (Analyze, Classify, Discriminant) in a table of "Standardized Canonical Discriminant Function Coefficients". In MDA, there will be as many sets of coefficients as there are discriminant functions (dimensions).

Functions at group centroids are the mean discriminant scores for each of the dependent variable categories for each of the discriminant functions in MDA. Two-group discriminant analysis has two centroids, one for each group. We want the means to be well apart to show the discriminant function is clearly discriminating. The closer the means, the more errors of classification there likely will be. SPSS generates a table of "Functions at group centroids" by default when Analyze, Classify, Discriminant is invoked.
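A short sketch of the same idea, continuing the hypothetical example: the functions at group centroids are simply the mean canonical discriminant scores per group.

```python
import numpy as np

canonical_scores = lda.transform(X)                  # one column per discriminant function
for g in np.unique(y):
    centroid = canonical_scores[y == g].mean(axis=0)
    print(f"group {g} centroid:", centroid)          # well-separated centroids = clear discrimination
```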

Discriminant function plots, also called canonical plots, can be created in which the two axes are two of the discriminant functions (the dimensional meaning of which is determined by looking at the structure coefficients, discussed below), and circles within the plot locate the centroids of each category being analyzed. The farther apart one point is from another on the plot, the more the dimension represented by that axis differentiates those two groups. Thus these plots depict discriminant function space. For instance, occupational groups might be located in a space representing educational and motivational dimensions. In the Plots area of the Classify button, one can select Separate-groups plots, a Combined-groups plot, or a territorial map. Separate and combined group plots show where cases are located in the property space formed by two functions (dimensions). By default, SPSS uses the first two functions. The territorial map shows intergroup distances on the discriminant functions. Each group has a numeric symbol: 1, 2, 3, etc. Cases falling within the boundaries formed by the 2's, for instance, are classified as group 2. The individual cases are not shown in territorial maps under SPSS, however.

Tests of significance

(Model) Wilks' lambda is used to test the significance of the discriminant function as a whole. In SPSS, the "Wilks' Lambda" table will have a column labeled "Test of Function(s)" and a row labeled "1 through n" (where n is the number of discriminant functions). The "Sig." level for this row is the significance level of the discriminant function as a whole. The researcher wants a finding of significance, and the smaller the lambda, the more likely it is to be significant. A significant lambda means one can reject the null hypothesis that the two groups have the same mean discriminant function scores and conclude the model is discriminating. Wilks' lambda is part of the default output in SPSS (Analyze, Classify, Discriminant), appearing in the "Wilks' Lambda" table of the output section on "Summary of Canonical Discriminant Functions."
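For readers without SPSS, the model Wilks' lambda can be computed directly as the ratio of the within-groups to the total sums-of-squares-and-cross-products determinants; the sketch below does so for the hypothetical data above and adds Bartlett's chi-square approximation for its significance (a sketch of the textbook formula, not SPSS's exact output).

```python
import numpy as np
from scipy.stats import chi2

def model_wilks_lambda(X, y):
    N, p = X.shape
    groups = np.unique(y)
    g = len(groups)
    Xc = X - X.mean(axis=0)
    T = Xc.T @ Xc                                                  # total SSCP matrix
    W = sum((X[y == k] - X[y == k].mean(axis=0)).T @
            (X[y == k] - X[y == k].mean(axis=0)) for k in groups)  # within-groups SSCP matrix
    lam = np.linalg.det(W) / np.linalg.det(T)                      # Wilks' lambda
    stat = -(N - 1 - (p + g) / 2) * np.log(lam)                    # Bartlett's chi-square approximation
    df = p * (g - 1)
    return lam, stat, chi2.sf(stat, df)

lam, stat, p_value = model_wilks_lambda(X, y)   # small lambda, small p: the model discriminates
```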

Stepwise Wilks' lambda appears in the "Variables in the Analysis" table of stepwise DA output, after the "Sig. of F to Remove" column. The Step 1 model will have no entry, as removing the first variable would remove the only variable. The Step 2 model will have two predictors, each with a Wilks' lambda coefficient, which represents what the model Wilks' lambda would be if that variable were dropped, leaving only the other one. If V1 is entered at Step 1 and V2 is entered at Step 2, then the Wilks' lambda in the "Variables in the Analysis" table for V2 will be identical to the model Wilks' lambda in the "Wilks' Lambda" table for Step 1, since dropping it would reduce the model to the Step 1 model. The more important the variable in classifying the grouping variable, the higher its stepwise Wilks' lambda. Stepwise Wilks' lambda also appears in the "Variables Not in the Analysis" table of stepwise DA output, after the "Sig. of F to Enter" column. Here the criterion is reversed: the variable with the lowest stepwise Wilks' lambda is the best candidate to add to the model in the next step.

(Model) Wilks' lambda difference tests are also used in a second context to assess the improvement in classification when using sequential discriminant analysis. There is an F test of significance of the ratio of two Wilks' lambdas, such as between a first one for a set of control variables as predictors and a second one for a model including both control variables and independent variables of interest. The second lambda is divided by the first (where the first is the model with fewer predictors) and an approximate F value for this ratio is found using calculations reproduced in Tabachnick and Fidell (2001: 491).

ANOVA table for discriminant scores is another overall test of the DA model. It is an F test, where a "Sig." p value < .05 means the model differentiates discriminant scores between the groups

significantly better than chance (than a model with just the constant). It is obtained in SPSS by asking for Analyze, Compare Means, One-Way ANOVA, using discriminant scores from DA (which SPSS will label Dis1_1 or similar) as dependent.
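A minimal sketch of that check, using the discriminant scores from the hypothetical model above in place of the saved SPSS variable (e.g., Dis1_1):

```python
from scipy.stats import f_oneway

scores = X @ lda.coef_[0] + lda.intercept_[0]       # the saved discriminant scores
F, p = f_oneway(scores[y == 0], scores[y == 1])     # p < .05: scores differ by group beyond chance
```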

(Variable) Wilks' lambda also can be used to test which independents contribute significantly to the discriminant function. The smaller the variable's Wilks' lambda, the more that variable contributes to the discriminant function. Lambda varies from 0 to 1, with 0 meaning group means differ (thus the variable differentiates the groups well) and 1 meaning all group means are the same. The F test of Wilks' lambda shows which variables' contributions are significant. Wilks' lambda is sometimes called the U statistic. In SPSS, this use of Wilks' lambda appears in the "Tests of equality of group means" table in DA output.

Dichotomous independents are more accurately tested with a chi-square test than with Wilks' lambda for this purpose.

Effect size measures

Classification functions: There are multiple methods of actually classifying cases in MDA. Simple classification, also known as Fisher's classification function, simply uses the unstandardized discriminant coefficients. Generalized distance functions are based on the Mahalanobis distance, D-square, of each case to each of the group centroids. K-nearest neighbor discriminant analysis (KNN) is a nonparametric method which assigns a new case to the group to which its k nearest neighbors also belong. The KNN method is popular when there are inadequate data to define the sample means and covariance matrices. There are other methods of classification as well.

The classification table, also called a classification matrix, or a confusion, assignment, or prediction matrix or table, is used to assess the performance of DA. This is simply a table in which the rows are the observed categories of the dependent and the columns are the predicted categories of the dependent. When prediction is perfect, all cases will lie on the diagonal. The percentage of cases on the diagonal is the percentage of correct classifications. This percentage is called the hit ratio.

Expected hit ratio. Note that the hit ratio must be compared not to zero but to the percent that would have been correctly classified by chance alone. For two-group discriminant analysis with a 50-50 split in the dependent variable, the expected percent is 50%. For two-way groups of unequal size, the expected percent can be computed from the "Prior Probabilities for Groups" table in SPSS by multiplying each group's prior probability by its group size, summing over all groups, and dividing the sum by N. If group sizes are known a priori, the best strategy by chance is to pick the largest group for all cases, so the expected percent is then the largest group size divided by N.
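The sketch below builds the classification table for the hypothetical model, then computes the hit ratio and the two chance baselines just described (the proportional-chance figure matches the prior-probability-times-group-size calculation when priors are taken from the observed group sizes).

```python
import numpy as np
from sklearn.metrics import confusion_matrix

pred = lda.predict(X)
table = confusion_matrix(y, pred)          # rows = observed groups, columns = predicted groups
hit_ratio = np.trace(table) / len(y)       # proportion of cases on the diagonal

priors = np.bincount(y) / len(y)           # priors taken from observed group sizes
proportional_chance = np.sum(priors ** 2)  # expected hit ratio classifying by chance
maximum_chance = priors.max()              # always picking the largest group
```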

Cross-validation. Leave-one-out classification is available as a form of cross-validation of the classification table. Under this option, each case is classified using a discriminant function based on all cases except the given case. This is thought to give a better estimate of what classification results would be in the population. In SPSS, select Analyze, Classify, Discriminant; select variables; click Classify; select Leave-one-out classification; Continue; OK.

Measures of association can be computed by the crosstabs procedure in SPSS if the researcher saves the predicted group membership for all cases. In SPSS, select Analyze, Classify, Discriminant; select variables; click Save; select Predicted group membership; Continue; OK.
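An equivalent of the leave-one-out option, sketched with scikit-learn on the hypothetical data (each case is classified by a function estimated from all other cases):

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

loo_hits = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=LeaveOneOut())
print(loo_hits.mean())    # cross-validated proportion correctly classified
```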

Mahalanobis D-square, Rao's V, Hotelling's trace, Pillai's trace, and Roy's gcr are indexes other than Wilks' lambda of the extent to which the discriminant functions discriminate between criterion groups. Each has an associated significance test. A measure from this group is sometimes used in stepwise discriminant analysis to determine if adding an independent variable to the model will significantly improve classification of the dependent variable. SPSS uses Wilks' lambda by default but also offers Mahalanobis distance, Rao's V, unexplained variance, and smallest F ratio.

Canonical correlation, Rc: Squared canonical correlation, Rc2, is the percent of variation in the dependent discriminated by the set of independents in DA or MDA. The canonical correlation of each discriminant function is also the correlation of that function with the discriminant scores. A canonical correlation close to 1 means that nearly all the variance in the discriminant scores can be attributed to group differences. The canonical correlation of any discriminant function is displayed in SPSS by default as a column in the "Eigenvalues" output table. Note the canonical correlations are not the same as the correlations in the structure matrix, discussed below.

Interpreting discriminant functions

Structure coefficients and structure matrix: Structure coefficients, also called structure correlations or discriminant loadings, are the correlations between a given independent variable and the discriminant scores associated with a given discriminant function. They are used to tell how closely a variable is related to each function in MDA. Looking at all the structure coefficients for a function allows the researcher to assign a label to the dimension it measures, much like factor loadings in factor analysis. A table of structure coefficients of each variable with each discriminant function is called a canonical structure matrix or factor structure matrix. The structure coefficients are whole (not partial) coefficients, similar to correlation coefficients, and reflect the uncontrolled association of the discriminating variables with the criterion variable, whereas the discriminant coefficients are partial coefficients reflecting the unique, controlled association of the discriminating variables with the criterion variable, controlling for other variables in the equation. Technically, structure coefficients are pooled within-groups correlations between the independent variables and the standardized canonical discriminant functions. When the dependent has more than two categories, there will be more than one discriminant function. In that case, there will be multiple columns in the table, one for each function. The correlations then serve like factor loadings in factor analysis: by considering the set of variables that load most heavily on a given dimension, the researcher may infer a suitable label for that dimension. The structure matrix correlations appear in SPSS output in the "Structure Matrix" table, produced by default under Analyze, Classify, Discriminant. Thus for two-group DA, the structure coefficients show the order of importance of the discriminating variables by total correlation, whereas the standardized discriminant coefficients show the order of importance by unique contribution. The sign of the structure coefficient also shows the direction of the relationship. For multiple discriminant analysis, the structure coefficients additionally allow the researcher to see the relative importance of each independent variable on each dimension.

Structure coefficients vs. standardized discriminant function coefficients: The standardized discriminant function coefficients indicate the semi-partial contribution (the unique, controlled association) of each variable to the discriminant function(s), controlling the independent but not the dependent for other independents entered in the equation (just as regression coefficients are semi-partial coefficients). In contrast, structure coefficients are whole (not partial) coefficients, similar to correlation coefficients, and reflect the uncontrolled association of the discriminant scores with the criterion variable. That is, the structure coefficients indicate the simple correlations between the variables and the discriminant function or functions. The structure coefficients should be used to assign meaningful labels to the discriminant functions. The standardized discriminant function coefficients should be used to assess the importance of each independent variable's unique contribution to the discriminant function.

Mahalanobis distances are used in analyzing cases in discriminant analysis. For instance, one might wish to analyze a new, unknown set of cases in comparison to an existing set of known cases. Mahalanobis distance is the distance between a case and the centroid for each group (of the dependent) in attribute space (n-dimensional space defined by n variables). A case will have one Mahalanobis distance for each group, and it will be classified as belonging to the group for which its Mahalanobis distance is smallest. Thus, the smaller the Mahalanobis distance, the closer the case is to the group centroid and the more likely it is to be classed as belonging to that group. Because Mahalanobis distance is measured in standard deviation units from the centroid, a case which is more than 1.96 Mahalanobis distance units from the centroid has less than a .05 chance of belonging to the group represented by the centroid; 3 units would likewise correspond to less than a .01 chance. SPSS reports squared Mahalanobis distance: click the Classify button and then check "Casewise results."
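A sketch of that classification rule on the hypothetical data: compute a case's Mahalanobis distance to every group centroid using the pooled within-groups covariance matrix, and assign the case to the nearest group.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

groups = np.unique(y)
pooled_cov = sum((np.sum(y == k) - 1) * np.cov(X[y == k], rowvar=False)
                 for k in groups) / (len(y) - len(groups))    # pooled within-groups covariance
VI = np.linalg.inv(pooled_cov)

case = X[0]                                                   # any case of interest
dists = {k: mahalanobis(case, X[y == k].mean(axis=0), VI) for k in groups}
assigned_group = min(dists, key=dists.get)                    # smallest distance wins
```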

Wilks' lambda tests the significance of each discriminant function in MDA -- specifically, the significance of the eigenvalue for a given function. It is a measure of the difference between groups of the centroid (vector) of means on the independent variables. The smaller the lambda, the greater the differences. Lambda varies from 0 to 1, with 0 meaning group means differ (thus the variable differentiates the groups well) and 1 meaning all group means are the same. The Bartlett's V transformation of lambda is then used to compute the significance of lambda. Wilks' lambda is used, in conjunction with Bartlett's V, as a multivariate significance test of mean differences in MDA, for the case of multiple interval independents and multiple (>2) groups formed by the dependent. Wilks' lambda is sometimes called the U statistic.

Validation

A hold-out sample is often used for validation of the discriminant function. This is a split-halves test, where a portion of the cases is assigned to the analysis sample for purposes of training the discriminant function, which is then validated by assessing its performance on the remaining cases in the hold-out sample.
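A minimal sketch of hold-out validation on the hypothetical data, splitting the cases at random into an analysis half and a hold-out half:

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

X_train, X_holdout, y_train, y_holdout = train_test_split(X, y, test_size=0.5, random_state=0)
holdout_hit_ratio = LinearDiscriminantAnalysis().fit(X_train, y_train).score(X_holdout, y_holdout)
```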

SPSS Output Examples


Discriminant Function Analysis (two groups)
Multiple Discriminant Function Analysis (three groups)

Assumptions

Proper specification. The discriminant coefficients can change substantially if variables are added to or subtracted from the model.

True categorical dependents. The dependent variable is a true dichotomy. When the range of a true underlying continuous variable is constrained to form a dichotomy, correlation is attenuated (biased toward underestimation). One should never dichotomize a continuous variable simply for the purpose of applying discriminant function analysis. To a progressively lesser extent, the same considerations apply to trichotomies and higher.

All cases must belong to a group formed by the dependent variable. The groups must be mutually exclusive, with every case belonging to only one group.

Independence. All cases must be independent. Thus one cannot have correlated data (not before-after, panel, or matched-pairs data, for instance).

No lopsided splits. Group sizes of the dependent are not grossly different. If this assumption is violated, logistic regression is preferred. Some authors use 90:10 or worse as the criterion.

Adequate sample size. There must be at least two cases for each category of the dependent, and the maximum number of independents is sample size minus 2. However, it is recommended that there be at least four or five times as many cases as independent variables.

Interval data. The independent variables are interval. As with other members of the regression family, dichotomies, dummy variables, and ordinal variables with at least 5 categories are commonly used as well.

Variance. No independent has a zero standard deviation in one or more of the groups formed by the dependent.

Random error. Errors (residuals) are randomly distributed.

Homogeneity of variances (homoscedasticity). Within each group formed by the dependent, the variance of each interval independent should be similar between groups. That is, the independents may (and will) have different variances one from another, but for the same independent, the groups formed by the dependent should have similar variances and means on that independent. Discriminant analysis is highly sensitive to outliers. Lack of homogeneity of variances may indicate the presence of outliers in one or more groups. Lack of homogeneity of variances will mean significance tests are unreliable, especially if sample size is small and the split of the dependent variable is very uneven. Lack of homogeneity of variances and the presence of outliers can be evaluated through scatterplots of the variables.

Homogeneity of covariances/correlations. Within each group formed by the dependent, the covariance/correlation between any two predictor variables should be similar to the corresponding covariance/correlation in other groups. That is, each group has a similar covariance/correlation matrix, as reflected in the log determinants (see the "Large samples" discussion below).

Box's M tests the null hypothesis that the covariance matrices do not differ between groups formed by the dependent. This is an assumption of discriminant analysis. Box's M uses the F distribution. If p(M) < .05, then the covariance matrices are significantly different. The researcher wants M not to be significant, so as to retain the null hypothesis that the covariance matrices of the independents across categories of the categorical dependent are homogeneous, i.e., that the groups do not differ. Thus, the probability value of this F should be greater than .05 to demonstrate that the assumption of homoscedasticity is upheld. This test is very sensitive to the assumption of multivariate normality also being met. Note, though, that DA can be robust even when this assumption is violated. In SPSS, select Analyze, Classify, Discriminant; click the Statistics button; check Box's M.

Large samples. Where sample size is large, even small differences in covariance matrices may be found significant by Box's M, when in fact no substantial problem of violation of assumptions exists. Therefore, the researcher should also look at the log determinants of the group covariance matrices, which are printed along with Box's M. If the group log determinants are similar, then a significant Box's M for a large sample is usually ignored. Dissimilar log determinants indicate violation of the assumption of equal variance-covariance matrices, leading to greater classification errors (specifically, DA will tend to classify cases into the group with the larger variability). When violation occurs, quadratic DA may be used (not supported by SPSS as of Version 13).
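For reference, a hedged sketch of Box's M computed by hand on the hypothetical data, reporting the group log determinants discussed above and a chi-square approximation for the test (SPSS reports an F approximation, so figures will differ slightly):

```python
import numpy as np
from scipy.stats import chi2

def box_m(X, y):
    groups = np.unique(y)
    N, p = X.shape
    g = len(groups)
    ns = np.array([np.sum(y == k) for k in groups])
    covs = [np.cov(X[y == k], rowvar=False) for k in groups]
    pooled = sum((n - 1) * S for n, S in zip(ns, covs)) / (N - g)
    log_dets = [np.linalg.slogdet(S)[1] for S in covs]         # compare these across groups
    M = (N - g) * np.linalg.slogdet(pooled)[1] - sum((n - 1) * ld for n, ld in zip(ns, log_dets))
    c1 = (np.sum(1 / (ns - 1)) - 1 / (N - g)) * (2 * p**2 + 3 * p - 1) / (6 * (p + 1) * (g - 1))
    df = p * (p + 1) * (g - 1) / 2
    return M, log_dets, chi2.sf(M * (1 - c1), df)               # want p > .05 (homogeneous matrices)

M, log_dets, p_value = box_m(X, y)
```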

Absence of perfect multicollinearity. If one independent is very highly correlated with another, or one is a function (ex., the sum) of other independents, then the tolerance value for that variable will approach 0 and the matrix will not have a unique discriminant solution. Such a matrix is said to be ill-conditioned. Tolerance is discussed in the section on regression.

Low multicollinearity of the independents. To the extent that independents are correlated, the standardized discriminant function coefficients will not reliably assess the relative importance of the predictor variables. In SPSS, one check on multicollinearity is to look at the "pooled within-groups correlation matrix," which is output when one checks "Within-groups correlation" from the Statistics button in the DA dialog. "Pooled" refers to averaging across groups formed by the dependent. Note that pooled correlation can be very different from normal (total) correlation when two variables are less correlated within groups than between groups (ex., race and illiteracy are little correlated within region, but the total r is high because there are proportionately more blacks in the South, where illiteracy is high). When assessing the correlation matrix for multicollinearity, a rule of thumb is no r > .90 and not several > .80.

Linearity. DA assumes linearity (it does not take into account exponential terms unless such transformed variables are added as additional independents).

Additivity. DA assumes additivity (it does not take into account interaction terms unless new crossproduct variables are added as additional independents).

Multivariate normality. For purposes of significance testing, predictor variables follow multivariate normal distributions. That is, each predictor variable has a normal distribution about fixed values of all the other independents. As a rule of thumb, discriminant analysis will be robust against violation of this assumption if the smallest group has more than 20 cases and the number of independents is fewer than six. When non-normality is caused by outliers rather than skewness, violation of this assumption has more serious consequences, as DA is highly sensitive to outliers. If this assumption is violated, logistic regression is preferred.

Frequently Asked Questions

Isn't discriminant analysis the same as cluster analysis? No. In discriminant analysis the groups (clusters) are determined beforehand and the object is to determine the linear combination of independent variables which best discriminates among the groups. In cluster analysis the groups (clusters) are not predetermined and in fact the object is to determine the best way in which cases may be clustered into groups.

When does the discriminant function have no constant term? When the data are standardized or are deviations from the mean.

How important is it that the assumptions of homogeneity of variances and of multivariate normal distribution be met? Lachenbruch (1975) indicates that DA is relatively robust even when there are modest violations of these assumptions. Klecka (1980) points out that dichotomous variables, which often violate multivariate normality, are not likely to affect conclusions based on DA.

In DA, how can you assess the relative importance of the discriminating variables? The same as in regression, by comparing beta weights, which are the standardized discriminant coefficients. If not output directly by one's statistical package (SPSS does), one may obtain beta weights by running DA on standardized scores. That is, betas are standardized discriminant function coefficients. The ratio of the betas is the relative contribution of each variable. Note that the betas will change if variables are added or deleted from the equation. Dummy variables. As in regression, dummy variables must be assessed as a group, not on the basis of individual beta weights. This is done through hierarchical discriminant analysis, running the analysis first with, then without the set of dummies. The difference in the squared canonical correlation indicates the explanatory effect of the set of dummies. Alternatively, for interval independents, one can correlate the discriminant function scores with the independents. The discriminating variables which matter the most to a particular function will be correlated highest with the DA scores.

In DA, how can you assess the importance of a set of discriminating variables over and above a set of control variables? (What is sequential discriminant analysis?) As in sequential regression, in sequential discriminant analysis the control variables may be entered as independent variables separately first. In a second run, the discriminating variables of interest may be entered. The difference in the squared canonical correlation indicates the explanatory effect of the discriminating variables over and above the set of control variables. Alternatively, one could compare the hit rate in the two classification tables.

What is the maximum likelihood estimation method in discriminant analysis (logistic discriminant function analysis)? Using MLE, a discriminant function is a function of the form T = k1X1 + k2X2 + ... + knXn, where X1...Xn are the discriminating (independent) variables, k1...kn are the logit coefficients, and T is a function which classes the case into group 0 or group 1. If the data are unstandardized, there is also a constant term. The discriminant function arrives at coefficients which set the highest possible ratio of between-group to within-group variance (similar to the ANOVA F test, except that in DA the group variable is the dependent rather than the independent). This method, called logistic discriminant function analysis, is supported by SPSS.

What are Fisher's linear discriminant functions? The classical method of discriminant classification calculates one set of classification function coefficients for each dependent category, using these to make the classifications. SPSS still outputs these coefficients if you check the "Fisher's" box under the Statistics option in discriminant function analysis. This outputs a table with the dependent categories (groups) as columns and the independent variables plus constant as rows. The Fisher coefficients are used down the columns to compute a classification score for each group, and the case is classified into the group generating the highest score. This method gives the same results as using the discriminant function scores but is easier to compute.
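A sketch of the classical computation behind Fisher's classification functions, under the equal-covariance assumption and using the hypothetical data above: one coefficient column and constant per group, with cases assigned to the group whose function yields the highest score.

```python
import numpy as np

groups = np.unique(y)
pooled = sum((np.sum(y == k) - 1) * np.cov(X[y == k], rowvar=False)
             for k in groups) / (len(y) - len(groups))          # pooled within-groups covariance
inv_pooled = np.linalg.inv(pooled)

coef, const = {}, {}
for k in groups:
    mean_k = X[y == k].mean(axis=0)
    coef[k] = inv_pooled @ mean_k                               # Fisher coefficients for group k
    const[k] = -0.5 * mean_k @ inv_pooled @ mean_k + np.log(np.mean(y == k))  # constant (with log prior)

scores = np.column_stack([X @ coef[k] + const[k] for k in groups])
classified = groups[np.argmax(scores, axis=1)]                  # highest classification score wins
```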

What is stepwise DA? Stepwise procedures select the most correlated independent first, remove the variance in the dependent, then select the second independent which most correlates with the remaining variance in the dependent, and so on until selection of an additional independent does not increase the R-squared (in DA, canonical R-squared) by a significant amount (usually signif. = .05). As in multiple regression, there are both forward (adding variables) and backward (removing variables) stepwise versions. In SPSS there are several available criteria for entering or removing new variables at each step: Wilks' lambda, unexplained variance, Mahalanobis distance, smallest F ratio, and Rao's V. The researcher typically sets the critical significance level by setting the "F to remove" in most statistical packages. Stepwise procedures are sometimes said to eliminate the problem of multicollinearity, but this is misleading. The stepwise procedure uses an intelligent criterion to set order, but it certainly does not eliminate the problem of multicollinearity. To the extent that independents are highly intercorrelated, the standard errors of their standardized discriminant coefficients will be inflated and it will be difficult to assess the relative importance of the independent variables. The researcher should keep in mind that the stepwise method capitalizes on chance associations, and thus true significance levels are worse (that is, numerically higher) than the alpha significance rate reported. Thus a reported significance level of .05 may correspond to a true alpha rate of .10 or worse. For this reason, if stepwise discriminant analysis is employed, use of cross-validation is recommended. In the split-halves method, the original dataset is split in two at random and one half is used to develop the discriminant equation and the other half is used to validate it.
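Stepwise selection in the SPSS sense (Wilks' lambda with F-to-enter/remove) is not built into scikit-learn, but a rough analogue, forward sequential selection scored by cross-validated classification accuracy, can be sketched as below on the hypothetical data; because the criterion differs, the selected variables may not match SPSS's.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SequentialFeatureSelector

selector = SequentialFeatureSelector(LinearDiscriminantAnalysis(),
                                     n_features_to_select=2,   # stop after two predictors (illustrative)
                                     direction="forward", cv=5)
selector.fit(X, y)
print(selector.get_support())   # boolean mask of the retained discriminating variables
```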

I have heard DA is related to MANCOVA. How so?

Discriminant analysis can be conceptualized as the inverse of MANCOVA. MANCOVA can be used to see the effect on multiple dependents of a single categorical independent, while DA can be used to see the effect on a categorical dependent of multiple interval independents. The SPSS MANOVA procedure, which also covers MANCOVA, can be used to generate discriminant functions as well, though in practical terms this is not the easiest route for the researcher interested in DA.

How does MDA work? A first function is computed on which the group means are as different as possible. A second function is then computed uncorrelated with the first, then a third function is computed uncorrelated with the first two, and so on, for as many functions as possible. The maximum number of functions is the lesser of g - 1 (number of dependent groups minus 1) or k (the number of independent variables).

How can I tell if MDA worked? SPSS will print out a table of Classification Results, in which the rows are Actual and the columns are Predicted. The better MDA works, the more the cases will all be on the diagonal. Also, below the table SPSS will print the percent of cases correctly classified.

For any given MDA example, how many discriminant functions will there be, and how can I tell if each is significant? The answer is min(g-1, p), where g is the number of groups (categories) being discriminated and p is the number of predictor (independent) variables. The min() function, of course, means the lesser of the two. SPSS will print Wilks' lambda and its significance for each function, and this tests the significance of the discriminant functions.

In MDA there will be multiple discriminant functions, so there will be more than one set of unstandardized discriminant coefficients, and for each case a discriminant score can be obtained for each of the multiple functions. In dichotomous discriminant analysis, the discriminant score is used to classify the case as 0 or 1 on the dependent variable. But how are the multiple discriminant scores on a single case interpreted in MDA? Take the case of three discriminant functions with three corresponding discriminant scores per case. The three scores for a case indicate the location of that case in three-dimensional discriminant space. Each axis represents one of the discriminant functions, roughly analogous to factor axes in factor analysis. That is, each axis represents a dimension of meaning whose label is attributed based on inference from the structure coefficients. One can also locate the group centroid for each group of the dependent in discriminant space in the same manner. In the case of two discriminant functions, cases or group centroids may be plotted on a two-dimensional scatterplot of discriminant space (a canonical plot). Even when there are more than two functions, interpretation of the eigenvalues may reveal that only the first two functions are important and worthy of plotting.

Likewise in MDA, there are multiple sets of standardized discriminant coefficients, one set for each discriminant function. In dichotomous DA, the ratio of the standardized discriminant coefficients is the ratio of the importance of the independent variables. But how are the multiple sets of standardized coefficients interpreted in MDA? In MDA the standardized discriminant coefficients indicate the relative importance of the independent variables in determining the location of cases in discriminant space for the dimension represented by the function for that set of standardized coefficients.

Are the multiple discriminant functions the same as factors in principal components factor analysis? No. There are conceptual similarities, but they are mathematically different in what they are maximizing. MDA maximizes the difference between values of the dependent. PCA maximizes the variance in all the variables accounted for by the factor.

Bibliography

Dunteman, George H. (1984). Introduction to multivariate analysis. Thousand Oaks, CA: Sage Publications. Chapter 5 covers classification procedures and discriminant analysis.

Huberty, Carl J. (1994). Applied discriminant analysis. NY: Wiley-Interscience. (Wiley Series in Probability and Statistics).

Klecka, William R. (1980). Discriminant analysis. Quantitative Applications in the Social Sciences Series, No. 19. Thousand Oaks, CA: Sage Publications.

Lachenbruch, P. A. (1975). Discriminant analysis. NY: Hafner.

McLachlan, Geoffrey J. (2004). Discriminant analysis and statistical pattern recognition. NY: Wiley-Interscience. (Wiley Series in Probability and Statistics).

Press, S. J. and S. Wilson (1978). Choosing between logistic regression and discriminant analysis. Journal of the American Statistical Association, Vol. 73: 699-705. The authors make the case for the superiority of logistic regression for situations where the assumptions of multivariate normality are not met (ex., when dummy variables are used), though discriminant analysis is held to be better when assumptions are met. They conclude that logistic and discriminant analyses will usually yield the same conclusions, except when there are independents which result in predictions very close to 0 and 1 in logistic analysis.

Tabachnick, Barbara G. and Linda S. Fidell (2001). Using multivariate statistics, Fourth ed. Boston: Allyn and Bacon. Chapter 11 covers discriminant analysis.
