
The Journal of Productivity Analysis, 6, 27-45 (1995). © 1995 Kluwer Academic Publishers, Boston. Manufactured in the Netherlands.

Detecting Influential Observations in Data Envelopment Analysis


PAUL W. WILSON*

Department of Economics, University of Texas, Austin, Texas 78712

wilson@mundo.eco.utexas.edu

Abstract
This paper provides diagnostic tools for examining the role of influential observations in Data Envelopment Analysis (DEA) applications. Observations may be prioritized for further scrutiny to see if they are contaminated by data errors; this prioritization is important in situations where data-checking is costly and resources are limited. Several empirical examples are provided using data from previously published studies.

Keywords: DEA, influential observations, outliers.

1. Introduction

Deterministic frontier methods have become widely used in measuring efficiency in production [see Lovell (1993) for examples of applications]. These methods typically involve constructing a deterministic frontier and measuring efficiency in terms of distances in input/output space from the frontier. The deterministic frontier is often constructed via linear programming (LP) techniques (these approaches have been termed Data Envelopment Analysis, or DEA), although other techniques such as the Free Disposal Hull (FDH) of Deprins et al. (1984) are sometimes used. Unfortunately, the deterministic nature of the frontiers means that data errors (resulting from measurement errors, coding errors, or other problems) in observations on decision-making units (DMUs) that support the deterministic frontier may severely distort measured efficiency scores for some (or perhaps all) of the remaining DMUs.

This is analogous to the problem of outliers in classical linear regression (CLR) models. Outliers are observations that do not fit in with the pattern of the remaining data points and are not at all typical of the rest of the data (Gunst and Mason, 1980). Rousseeuw and van Zomeren (1990) refer to certain types of outliers as leverage points; these observations have a disproportionate effect on the estimated slope of the regression line in CLR models, and thus may pose a more serious problem than other outliers which might only affect the estimated intercept term. Outliers (including leverage points) are sometimes called influential observations because they have a disproportionate effect in determining the estimated results in CLR models.
*This research was performed while under contract with the Management Science Group, U.S. Department of Veterans Affairs, Bedford, MA 01730. Shawna Grosskopf and Richard Grabowski graciously provided data used in two of the empirical examples.


In this paper, influential observations are defined as those sample observations which play a relatively large role in determining estimated efficiency scores for at least some other observations in the observed sample. Thus, influential observations in the context of this paper will typically be outliers which support the deterministic frontier used to measure efficiency (influential observations owe their influence to the fact that they are outliers). Some outliers may result from recording or measurement errors and should be corrected, if possible, or deleted from the data. However, if data are viewed as having come from some probability distribution, then it is quite possible to observe points with low probability; one would not expect to observe many such points, given their low probability, and hence they appear as outliers. Cook and Weisberg (1982) observed that outliers of this type may lead to the recognition of important phenomena that might otherwise go unnoticed. Outliers of this type might represent the most interesting part of the data.

While a plethora of diagnostic tools exist with which to examine and evaluate results from CLR models,¹ there is a distinct lack of such tools suitable for applications where efficiency is measured relative to a deterministic frontier. This paper redresses part of this deficiency by suggesting a methodology to detect influential observations among a sample of DMUs. Note, however, that it will remain for the analyst to decide what to do with any influential observations that are found. The usefulness of the techniques presented below is in identifying observations which deserve closer scrutiny to determine whether the observations are real, or whether they result from data errors. Since data-checking is often costly, particularly in large samples, identifying and prioritizing observations for checking may save considerable resources.

Standard methods for detecting outliers in linear regression models using ordinary least squares (OLS) residuals cannot be adapted to DEA or FDH models due to the lack of stochastic structure [deterministic parametric models estimated by corrected OLS (COLS), such as those estimated by Greene (1980), Martin and Page (1983), Seaver and Triantis (1989), and others, do not have this problem]. Sexton (1986) illustrates the problem caused by outliers in the measurement of technical efficiency by estimating a DEA model using data reported in Charnes et al. (1981) and showing what happens to the estimated efficiency scores when one observation is deliberately miscoded. Sexton counsels the use of preprocessing error detection routines whenever possible, but makes no specific suggestions other than noting that the standard techniques based on OLS residuals cannot be used. In the case of large samples, it may not be possible to check each individual observation for accuracy, and so some method of prioritizing observations for checking is required.

Several DEA studies (e.g., Charnes and Neralić, 1990) have used sensitivity analysis to gauge the robustness of results from DEA models. Charnes et al. (1992) note that such analyses usually hinge on conditions which preserve the feasibility and optimality conditions of an optimal basic solution obtained via an extreme point algorithm, and provide a more comprehensive approach wherein for each DMU a region of stability is calculated such that all perturbations within the cell preserve the DMU's classification as efficient versus inefficient.
Similarly, Grosskopf and Valdmanis (1987) examine efficiency among hospitals using DEA and perform a sensitivity analysis by using alternative measures for some inputs and examining the effect on the measured efficiency scores. These techniques are not to be confused with attempts to identify outliers. Sensitivity analyses check the robustness of estimated efficiency scores with respect to deviations of observations from


their initial location in the input-output space, whereas outlier detection involves finding observations whose location in the input-output space is atypical.

As suggested above, the literature on outlier detection in the context of deterministic frontier models is sparse. Wilson and Jadlow (1982) fit a deterministic parametric frontier using LP techniques, and then delete observations lying on the estimated frontier until the parameter estimates stabilize. Deletion of observations on the frontier which are outliers will likely produce large changes in the parameter estimates, while deleting observations on the frontier that are not outliers will likely have only a small effect on parameter estimates. Wilson and Jadlow do not state whether observations are deleted sequentially, pairwise, or otherwise. Their method is a variation of that used by Timmer (1971), who deletes a fixed percentage of the observations lying on the initial frontier. Dusansky and Wilson (1994, 1995) use similar approaches in DEA models. Seaver and Triantis (1989) perform an extensive analysis of outliers in estimating a frontier production model by COLS, and state that the methods they use are applicable in deterministic nonparametric DEA models as well. However, this is not possible (with the exception of the AP statistic as discussed by Wilson, 1993), since OLS residuals are not available. Wilson (1993) provides a diagnostic statistic which may be used to identify outliers in deterministic frontier models with multiple inputs and multiple outputs, but this approach becomes computationally infeasible as the number of observations and the dimension of the input-output space increase (e.g., with 7 dimensions in the input-output space, the statistic becomes computationally infeasible for more than about 100 observations). The methodology presented below is much less computationally intensive, and thus may be applied in much larger samples than the statistic proposed by Wilson (1993).

The next section presents a modification of the standard DEA model which avoids a censoring problem inherent in conventional DEA formulations. The third section suggests a strategy for detecting influential observations using the modified DEA model. The fourth section presents several empirical examples; conclusions are discussed in the final section.

2. A Modified DEA Methodology


To illustrate the detection of influential observations in DEA applications, consider the input weak efficiency (IWE) score discussed by Färe et al. (1985) [the IWE score is the inverse of the Shephard (1970) input distance function]. This particular DEA model has been widely applied, and the methodology to be presented below may be easily extended to other DEA models or the FDH method. Given a sample of N DMUs, the IWE score for the ith DMU is computed by solving the linear program

$$\min\{\lambda_i \mid y_i \leq Yq_i,\; \lambda_i x_i \geq Xq_i,\; \mathbf{1}q_i = 1,\; q_i \in \Re^N_+\} \qquad (1)$$

where $q_i$ is an $(N \times 1)$ vector of weights to be computed, $\mathbf{1}$ is a $(1 \times N)$ vector of ones, $\lambda_i$ is a scalar, $x_i$ is a $(K \times 1)$ vector of inputs for the ith DMU, $y_i$ is an $(M \times 1)$ vector of outputs for the ith DMU, $X = [x_1, \ldots, x_N]$ is a $(K \times N)$ matrix of observed inputs, and $Y = [y_1, \ldots, y_N]$ is an $(M \times N)$ matrix of observed outputs. The minimand $\lambda_i$ measures the input weak efficiency of the ith DMU; typically, one would solve (1) $N$


times, once for each DMU. The constraint $\mathbf{1}q_i = 1$ results in a variable returns to scale technology; constant returns to scale may be imposed by omitting this constraint from the LP problem in (1). The observations that appear in the constraints of the LP problem in (1) comprise the reference set, and in the case of (1) include all observations in the sample. Clearly, $\lambda_i \leq 1$, with $\lambda_i = 1$ indicating an efficient DMU lying on the measured boundary of the production set. DMUs for which $\lambda_i < 1$ are regarded as technically inefficient.

To illustrate the IWE score, consider 7 DMUs, $i = 1, \ldots, 7$, labeled A, B, C, D, E, F, and G respectively, which each produce the same level of output from two input quantities $x_1$, $x_2$. Input and output quantities for these illustrative DMUs appear in columns 2-4 of Table 1, and are plotted in $(x_1, x_2)$-space in Figure 1. For DMUs C and D, IWE is computed by OC/OC and OD′/OD, respectively. Hence C is ostensibly efficient as indicated by its IWE score equal to 1, while D is inefficient since it lies within the interior of the production set bounded by the piecewise-linear production-set boundary passing through A, C, and G. In addition to C, DMUs A and G are also efficient; B, D, E, and F are considered inefficient by the IWE measure. In a typical data set, there may be many observations such as A, C, and G which lie on the computed frontier. Consequently, the distribution of efficiency scores will often include a mass at one in the case of the IWE score.

Computing the linear program in (1) for each of $N$ DMUs yields a set of efficiency scores $\{\lambda_i \mid i = 1, \ldots, N\}$. The set $\mathcal{E} = \{i \mid \lambda_i = 1, i = 1, \ldots, N\}$ is termed the efficient subset and is the set of DMUs which support the measured boundary of the production set. Korostelev et al. (1994) show that for a finite sample of identically, independently distributed observations $(x_i, y_i)$ drawn from a production set, the boundary implied by (1) and other DEA models is biased downward relative to the true boundary. Hence, the observations in $\mathcal{E}$ are not necessarily efficient in the true sense, even though their measured efficiency scores equal unity. Therefore the observations in $\mathcal{E}$ are regarded as ostensibly efficient.

Andersen and Petersen (1989) and Lovell et al. (1993) suggest a modification to the DEA model to allow ranking of ostensibly efficient observations.² This modification is also useful for identifying influential observations. In (1), efficiency for the ith DMU is measured relative to all DMUs in the sample, including the ith DMU. The modification used by Andersen and Petersen and by Lovell et al. involves removing the ith DMU from the constraint set when efficiency for the ith DMU is computed.
Table 1. Illustrative case for Figure 1.

DMU     x1      x2      y       λ*_i
A       0.50    8.00    2.00    4.0000
B       2.00    6.00    2.00    0.8519
C       3.00    2.00    2.00    1.6000
D       4.00    6.00    2.00    0.5897
E       5.00    3.00    2.00    0.6500
F       7.00    4.00    2.00    0.4815
G       8.00    1.00    2.00    2.0000


[Figure 1. Input weak technical efficiency. DMUs A-G plotted with x1 on the vertical axis and x2 on the horizontal axis.]

For the IWE score, the linear program becomes

$$\min\{\lambda^*_i \mid y_i \leq Y^{(i)}q^*_i,\; \lambda^*_i x_i \geq X^{(i)}q^*_i,\; \mathbf{1}q^*_i = 1,\; q^*_i \in \Re^{N-1}_+\} \qquad (2)$$

where $\lambda^*_i$ is the modified IWE score for the ith DMU, $q^*_i$ is a vector of weights, $X^{(i)} = [x_j]\ \forall\ j \neq i$, $Y^{(i)} = [y_j]\ \forall\ j \neq i$, and other variables are defined as before. Hence $X^{(i)}$ and $Y^{(i)}$ have dimensions $K \times (N-1)$ and $M \times (N-1)$, respectively, and $q^*_i$ has dimensions $((N-1) \times 1)$. Unlike $\lambda_i$ in (1), $\lambda^*_i$ is not bounded from above, and hence $\lambda^*_i > 0$. The modified score $\lambda^*_i$ measures technical efficiency for the ith DMU relative to all other DMUs in the sample. For $\lambda^*_i < 1$, the interpretation is the same as for $\lambda_i$ computed from (1). However, for $\lambda^*_i \geq 1$, $\lambda^*_i$ is interpreted as the amount by which the input vector for DMU i may be proportionately increased without being dominated by a linear combination of the remaining DMUs in the sample.

For DMU C in the above example, the modified IWE score is computed as OC′/OC as shown in Figure 2, and clearly exceeds unity. In Figure 2, efficiency for DMU C is


[Figure 2. Modified input weak technical efficiency. Axes: x1 (vertical) versus x2 (horizontal).]

measured relative to the frontier formed when C is dropped from the sample. The modification only affects efficiency measured for DMUs A, C, and G, which formed the efficient subset in Figure 1. For DMUs B, D, E, and F, $\lambda^*_i = \lambda_i$. Hence the frontier shown in Figure 2 is only relevant for measuring the technical efficiency of DMU C; for each of the DMUs in the efficient subset $\mathcal{E}$, efficiency is measured relative to a different frontier in the modified approach. Thus, while the modified approach in (2) is useful in detecting influential observations in the next section, the efficiency scores from (1) and the frontier implied by (1) may have a more intuitive economic meaning. Use of (2) for diagnostic purposes does not increase computational requirements, however; one can simply compute $\lambda^*_i$ from (2) and then set

$$\lambda_i = \begin{cases} \lambda^*_i & \text{if } \lambda^*_i < 1; \\ 1 & \text{otherwise.} \end{cases} \qquad (3)$$

This is equivalent to computing $\lambda_i$ directly from (1).
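As a sketch of how the modified score (2) and the censoring rule (3) might be computed, the function below extends the earlier linprog-based example (again an assumed illustrative setup, not the paper's original code): DMU i is deleted from the reference set before solving, and an infeasible LP, a possibility discussed next, is reported as an undefined score.

```python
def modified_iwe_score(i, X, Y):
    """Modified IWE score lambda*_i from LP (2): DMU i is excluded from the
    reference set. Returns np.nan when the LP is infeasible, i.e., when
    lambda*_i is undefined (the case discussed below)."""
    K, N = X.shape
    M = Y.shape[0]
    Xd = np.delete(X, i, axis=1)                      # X^(i): column i dropped
    Yd = np.delete(Y, i, axis=1)                      # Y^(i)
    c = np.r_[1.0, np.zeros(N - 1)]                   # variables: [lambda*, q*]
    A_ub = np.vstack([
        np.hstack([np.zeros((M, 1)), -Yd]),           # y_i <= Y^(i) q*
        np.hstack([-X[:, [i]], Xd]),                  # lambda* x_i >= X^(i) q*
    ])
    b_ub = np.r_[-Y[:, i], np.zeros(K)]
    A_eq = np.r_[0.0, np.ones(N - 1)].reshape(1, -1)  # 1'q* = 1 (VRS)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * N)
    return res.fun if res.success else np.nan

# Censoring rule (3): lambda_i = lambda*_i if lambda*_i < 1, and 1 otherwise
# (including the infeasible case, where lambda*_i is undefined).
lam_star = [modified_iwe_score(i, X, Y) for i in range(7)]
lam = [s if (not np.isnan(s) and s < 1.0) else 1.0 for s in lam_star]
# lam_star reproduces the last column of Table 1:
# [4.0000, 0.8519, 1.6000, 0.5897, 0.6500, 0.4815, 2.0000]
```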


Note that it is possible that $\lambda^*_i$ cannot be computed for some observations: deletion of observation i from the reference set may result in an infeasible constraint set in (2). The infeasibility problem results when an observation lies above the frontier supported by the other observations in the sample, so that neither a radial contraction nor expansion of inputs, holding output constant, will reach the frontier. This situation is illustrated in Figure 3 for the case of a single input (x) mapped into a single output (y). This problem could be avoided by using the constant returns to scale formulation (by dropping the constraint $\mathbf{1}q^*_i = 1$ in (2)). The infeasibility of the constraint set for observation i means that $\lambda^*_i$ is undefined. This problem does not occur for $\lambda_i$ computed from (1), since observation i is included in the reference set. Note that for any observation i such that $\lambda^*_i$ is undefined due to an infeasible constraint set, $\lambda_i = 1$.

Given a sample of observations on $N$ DMUs, (1) and (2) may be used to compute $\{\lambda_i \mid i = 1, \ldots, N\}$ and $\{\lambda^*_i \mid i = 1, \ldots, N\}$, respectively. As noted earlier, the distribution of the $\lambda_i$ will typically include a mass at unity; this is often indicative of a censored distribution.³ This is confirmed by the distribution of the $\lambda^*_i$ which, if censored from above at unity as in (3), becomes identical to the distribution of $\lambda_i$. In addition, the view that

[Figure 3. Infeasible constraint sets.]


$\lambda_i$ is censored is consistent with studies such as McCarty and Yaisawarng (1993), Dusansky and Wilson (1994), and others which have used censored regression (e.g., Tobit) models to regress efficiency scores from models such as (1) on factors that might influence measured technical efficiency. Censoring involves a loss of information; clearly, $\{\lambda^*_i \mid i = 1, \ldots, N\}$ contains more information about the $N$ DMUs in the observed sample than $\{\lambda_i \mid i = 1, \ldots, N\}$. The additional information contained in $\{\lambda^*_i \mid i = 1, \ldots, N\}$ is useful in detecting influential observations in the sample.

3. Detecting Influential Observations


DEA models such as (1) and (2) measure efficiency relative to a deterministic frontier supported by the sample observations. Consequently, the resulting efficiency scores incorporate statistical noise as well as inefficiency. Furthermore, any measurement errors, coding errors, or other data errors will affect the measured efficiency scores. For observations in the efficient subset, this problem is acute, since any errors will not only affect efficiency measured for the DMU with the data error, but possibly other DMUs as well. These observations suggest three questions regarding observations in the efficient subset:

(i) How confident can we be that an observation in the efficient subset is really efficient relative to other DMUs in the sample; i.e., how much support is there for the frontier in the neighborhood of such an observation?

(ii) How many efficiency scores for other DMUs are affected by the presence of an observation (possibly measured with error) in the efficient subset?

(iii) How much does the presence of an observation in the efficient subset affect the measured efficiency of other DMUs?

Insight regarding the answer to question (i) is obtained by examining the adjusted efficiency scores computed from (2). For the example pictured in Figures 1-2, these scores are shown in Table 1. As noted earlier, DMUs for which $\lambda^*_i \geq 1$ are in the efficient subset, which in this case includes DMUs A, C, and G. For DMU C, the modified efficiency score indicates that this DMU's inputs would have to be increased by 60 percent to reach the frontier supported by the other observations in the sample. This is a relatively large amount, and indicates that there is little support for the frontier in the neighborhood of DMU C.⁴ For DMUs A and G, the adjusted efficiency score is even larger, again suggesting that there is little support for the frontier in the neighborhoods of these DMUs. The values of $\lambda^*_i$ for DMUs A, C, and G indicate that these observations are atypical (i.e., outliers). In this example, one should thus be suspicious of any of the DMUs in the measured efficient subset being efficient in a true sense. Since the sample size is very small here, this should not be a surprising conclusion (the next section gives some empirical examples with larger sample sizes).

To examine questions (ii) and (iii), consider once again the example represented in Figures 1-2. In particular, consider DMU C. When the conventional DEA efficiency score in (1) is used, the existence of C affects measured efficiency for DMUs B, D, E, and F. When the modified efficiency score in (2) is used, the existence of C affects measured efficiency


for these same DMUs, but also affects measured efficiency for DMU G, information which is lost when the conventional score is used due to the censoring problem.⁵ Although the researcher may ultimately wish to use $\lambda_i$ computed from (1) to characterize the technology and any inefficiency, the information regarding DMU G may still be useful, since the shape of the frontier around DMU G is affected by the presence of DMU C. This would be important if the researcher wanted to measure shadow prices or marginal rates of technical substitution.

To examine the question of how much the presence of DMU C affects the efficiency measured for DMUs B, D, E, F, and G, (2) could be recomputed for each DMU in the sample while deleting DMU C from the reference set. Let the index $i = 1, \ldots, 7$ refer to the DMUs A, ..., G, respectively. Then $\mathcal{E} = \{1, 3, 7\}$. For each $j \in \mathcal{E}$, compute

$$\min\{\lambda^*_{ij} \mid y_i \leq Y^{(ij)}q^*_{ij},\; \lambda^*_{ij} x_i \geq X^{(ij)}q^*_{ij},\; \mathbf{1}q^*_{ij} = 1,\; q^*_{ij} \in \Re^{N-2}_+\} \qquad (4)$$

for $i = 1, \ldots, 7$, $i \neq j$, where $X^{(ij)} = [x_m]\ \forall\ m \neq i, j$, $Y^{(ij)} = [y_m]\ \forall\ m \neq i, j$, and other variables are defined as before (note that $X^{(ij)}$ and $Y^{(ij)}$ have dimensions $K \times (N-2)$ and $M \times (N-2)$, respectively, and $q^*_{ij}$ has dimensions $((N-2) \times 1)$). This yields, for each $j \in \mathcal{E}$, a set $\{\lambda^*_{ij} \mid i = 1, \ldots, N, i \neq j\}$ (note that when observation $j$ is deleted, $\lambda^*_{ij}$ may be undefined due to an infeasible constraint set in (4), as discussed above in connection with (2)). Next, define $n_j$, $j \in \mathcal{E}$, as the number of cases where $\lambda^*_{ij} \neq \lambda^*_i$ and both $\lambda^*_i$ and $\lambda^*_{ij}$ are defined, and define $n^*_j$ as $n_j$ plus the number of cases where $\lambda^*_i$ is defined and $\lambda^*_{ij}$ is undefined. Then the average change in measured efficiency resulting from deleting a DMU $j \in \mathcal{E}$ is

$$\bar{\delta}_j = n_j^{-1} \sum_i (\lambda^*_{ij} - \lambda^*_i),$$

where the sum runs over $\{i \mid i = 1, \ldots, N,\; i \neq j,\; \lambda^*_i \text{ and } \lambda^*_{ij} \text{ are defined}\}$. Performing this exercise for the data in Table 1 yields:

j    n*_j    δ̄_j      n*_j × δ̄_j
1    2       0.3474   0.6948
3    5       0.4136   2.0680
7    2       0.0176   0.0352

Thus, observations 1 and 7 (DMUs A and G, respectively) each influence measured efficiency for two other DMUs using the modified measure in (2). Observation 3 (DMU C) influences measured efficiency for five other DMUs. The average changes in measured efficiency, $\bar{\delta}_j$, for DMUs A and C are quite large (relative to the average amount of inefficiency that is typically found in empirical applications). However, DMU C affects measured efficiency for 2.5 times the number of DMUs affected by DMU A. Hence the quantity $n^*_j \times \bar{\delta}_j$ provides a measure of the relative importance of the effects of DMUs A and C. If data-checking is costly and resources are scarce (a common scenario), the careful researcher would want to first examine whether the observation on DMU C has been measured correctly; if not, the researcher would want to either correct the observation if possible, or delete it otherwise, since its presence has


such a large effect on efficiency measured for other DMUs in the sample. The researcher might next focus on the observation for DMU A, if sufficient resources were available.

Andrews and Pregibon (1978), Rousseeuw and Bassett (1991), Wilson (1993), and others have discussed the masking problem in the context of outlier detection, where the effects of an outlier may be masked by the presence of one or more other nearby outliers in the space containing the data. The methodology proposed above could be extended to an iterative analysis along the lines of Dusansky and Wilson (1994) where pairs, triplets, etc. are deleted, but this would greatly increase the computational burden and would make the methodology intractable in the case of large samples. The empirical examples in the next section demonstrate that highly influential observations can be found without considering the masking problem. Furthermore, if the effects of an outlier are masked by adjacent observations, this would seem to imply a lower probability that the observations are contaminated by data errors. Seaver and Triantis (1989) and Welsch (1982) suggest that it may be desirable to use more than one outlier detection scheme; the same caveat applies here, and this article presents only one such scheme.

The methodology described above allows the researcher to prioritize observations in the efficient subset for further scrutiny. The prioritization depends upon several factors: how many other observations' efficiency scores are influenced by a given observation, how much measured efficiency would change for these other observations if the given observation were deleted, and how much support exists for the frontier in the neighborhood of the given observation. Rather than relying on a statistical test to determine whether a given observation is an outlier, the researcher must use his knowledge of the underlying production process to determine whether the influence of a particular observation is large enough to warrant further scrutiny (which may be costly), and to decide how many of the prioritized observations should be scrutinized for errors.⁶ Note that the approach here involves the same calculus of comparing marginal benefits and marginal costs that economists use whenever resources are scarce. If the cost of data-checking is constant across observations, one should first check observations which, if they contain a data error, might cause the most damage to the entire analysis.

The next section presents some empirical examples using data from previously published studies to illustrate (1) how the ideas from this section can be applied in larger samples than the example considered above, and (2) the prevalence of the outlier problem in real data sets used in typical DEA applications.
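Putting this section's pieces together, the following sketch (built on the earlier assumed scipy-based functions) computes $n^*_j$, $\bar{\delta}_j$, and $n^*_j \times \bar{\delta}_j$ for each ostensibly efficient DMU by solving LP (4) with both i and j removed from the reference set; the numerical tolerance for deciding whether a score has changed is an illustrative choice.

```python
def influence_diagnostics(X, Y, tol=1e-6):
    """For each DMU j in the efficient subset, delete j from the reference
    set, re-solve for every other DMU i as in LP (4), and report n*_j,
    delta-bar_j, and the combined measure n*_j x delta-bar_j."""
    N = X.shape[1]
    lam_star = [modified_iwe_score(i, X, Y) for i in range(N)]
    # Efficient subset: lambda*_j >= 1, or lambda*_j undefined (infeasible LP).
    eff = [j for j in range(N) if np.isnan(lam_star[j]) or lam_star[j] >= 1.0]
    rows = []
    for j in eff:
        Xj = np.delete(X, j, axis=1)                  # reference data without j
        Yj = np.delete(Y, j, axis=1)
        n_j, n_star, total = 0, 0, 0.0
        for i in range(N):
            if i == j or np.isnan(lam_star[i]):       # lambda*_i undefined: skip
                continue
            # Column index of DMU i once column j has been removed.
            lam_ij = modified_iwe_score(i - (i > j), Xj, Yj)   # LP (4)
            if np.isnan(lam_ij):
                n_star += 1                           # score becomes undefined
            elif abs(lam_ij - lam_star[i]) > tol:     # score changes
                n_j += 1
                n_star += 1
                total += lam_ij - lam_star[i]         # unchanged scores add zero
        delta_bar = total / n_j if n_j else 0.0
        rows.append((j + 1, n_star, delta_bar, n_star * delta_bar))
    return rows

# For the Table 1 data this reproduces the small table above:
# (1, 2, 0.3474, 0.6948), (3, 5, 0.4136, 2.0680), (7, 2, 0.0176, 0.0352).
```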

4. Some Empirical Examples


For each example below, efficiency is measured in terms of the modified IWE score computed via (2). All computations were done on a SUN SPARCstation 10 Model 30 desktop workstation using optimized Fortran code.⁷

Example 4.1.
Charnes et al. (1981) report data on results from Program Follow Through; their data contain 70 observations on five inputs and three outputs. Measuring efficiency using the modified


IWE score in (2) indicates 27 ostensibly efficient observations, which are listed in Table 2. For each ostensibly efficient observation, Table 2 shows the modified IWE score $\lambda^*_j$, the number of DMUs for which $\lambda^*_i$ is altered when the ostensibly efficient observation is deleted ($n^*_j$), the average change in measured efficiency occurring when the ostensibly efficient observation is omitted ($\bar{\delta}_j$), and the weighted measure $n^*_j \times \bar{\delta}_j$. Among the ostensibly efficient observations listed in Table 2, observations 44 and 59 stand out after inspection of the values of $\lambda^*_j$; observation 44 has the largest value of $\lambda^*_j$, while $\lambda^*_{59}$ is undefined. Moreover, observation 44 has the highest value for $n^*_j \times \bar{\delta}_j$, indicating a great deal of influence on other observations in the sample. Observation 52 has the next highest value for $n^*_j \times \bar{\delta}_j$, followed by observations 69, 62, 59, 56, 15, 58, etc. Observations 17, 47, and 49 each influence 11 or more other observations, but rank rather low in terms of their average level of influence as measured by $n^*_j \times \bar{\delta}_j$.
Table 2. Charnes et al. (1981) data.

j ∈ E   λ*_j     n*_j   δ̄_j      n*_j × δ̄_j
5       1.0504   3      0.0405   0.1215
11      1.0577   0      0.0000   0.0000
12      1.0487   4      0.0097   0.0388
15      1.2868   5      0.0742   0.3709
17      1.2360   11     0.0186   0.2051
18      1.0393   0      0.0000   0.0000
20      1.1421   4      0.0203   0.0812
21      1.1122   6      0.0037   0.0222
22      1.0158   7      0.0008   0.0056
24      1.1055   5      0.0260   0.1298
27      1.0630   7      0.0064   0.0451
32      1.0615   3      0.0055   0.0165
35      1.0299   1      0.0469   0.0469
38      1.1455   3      0.0120   0.0360
44      2.0816   32     0.0412   1.3190
45      1.0120   4      0.0518   0.2071
47      1.1089   18     0.0089   0.1596
48      1.3018   4      0.0139   0.0556
49      1.0690   22     0.0062   0.1362
52      1.1863   37     0.0205   0.7568
54      1.2186   2      0.0054   0.0108
56      1.0921   7      0.0555   0.3883
58      1.3514   11     0.0198   0.2183
59      —        4      0.1370   0.5479
62      1.5541   32     0.0203   0.6487
68      1.1897   3      0.0106   0.0319
69      1.6448   18     0.0386   0.6941


Wilson (1993) reports results from an outlier analysis on the same data used in this example; observations 35, 44, 54, and 59 were found to be outliers, as well as 5 other observations that are not in the efficient subset. Observations 44 and 59 were found to be the most obvious outliers, consistent with the results above. The results here indicate other important influential observations such as observation 52, which influences measured efficiency for 53.6 percent of the remaining observations in the sample.

Example 4.2.
Färe et al. (1986) measure efficiency among 100 steam electric generating plants in 1975 using 5 inputs and 5 outputs. Computing the modified IWE score in (2) for each DMU in the sample reveals 83 ostensibly efficient observations, as listed in Table 3. For 18 of these observations, $\lambda^*_j$ is undefined due to infeasible constraint sets. Of the remaining 65 observations, 12 have values $\lambda^*_j > 2$, and 4 have values $\lambda^*_j > 25$. For observation 68, $\lambda^*_{68} = 1586.4$, suggesting that this plant's inputs would have to be scaled up by more than 150,000 percent to reach the frontier supported by the remaining observations; this seems quite unlikely. Moreover, this observation influences efficiency measurement for 13 other plants represented in the sample. For observation 94, $\lambda^*_{94}$ is undefined. This observation influences efficiency measurement for 11 other observations in the sample, and to an extraordinary degree as indicated by $\bar{\delta}_j$ and $n^*_j \times \bar{\delta}_j$. Observations 1, 20, 49, 66, and 69 also exert relatively large influence on measured efficiency for other observations in the sample. While other observations listed in Table 3 may contain data errors that affect efficiency for other observations, the observations listed above seem to be the most likely to have data errors, and potentially have the largest impact on efficiency measurement within the sample.

Example 4.3.
Aly et al. (1990) examine efficiency using the IWE score in (1) for a sample of 322 independent banks in 1986. Their data contain observations on 3 inputs and 5 outputs for each of the 322 banks in their sample. Computing the modified IWE score in (2) for each DMU in the sample reveals 90 ostensibly efficient observations, as listed in Table 4; of these, $\lambda^*_j$ is undefined for two observations, and $\lambda^*_j > 2$ for eight observations. In addition, several observations influence measured efficiency for a large number of DMUs in the sample; e.g., observation 298 affects measured efficiency for more than one third of the DMUs in the sample, and 18 of the observations listed in Table 4 affect measured efficiency for at least 10 percent of the sample. In terms of overall influence as measured by $n^*_j \times \bar{\delta}_j$, observations 26, 259, and 298 appear to have the largest influence and thus are good candidates for further scrutiny.

As noted previously, the question of whether $n^*_j \times \bar{\delta}_j$ is large may depend upon the underlying production process. Aly et al. (1990) report mean prices $\bar{P} = [23.7\ \ 0.43\ \ 0.07]$. Mean inputs are given by $\bar{X} = [32\ \ 889{,}000\ \ 35{,}680{,}000]$, and hence $\bar{X}\bar{P}' = \$2{,}880{,}628$. Thus, on average, a change in efficiency of 0.01 is worth approximately \$28,806, which gives a guideline for interpreting the magnitudes of $n^*_j \times \bar{\delta}_j$ in Table 4.
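Expanding the inner product behind that \$2,880,628 figure makes the arithmetic explicit:

$$\bar{X}\bar{P}' = 32(23.7) + 889{,}000(0.43) + 35{,}680{,}000(0.07) = 758.40 + 382{,}270 + 2{,}497{,}600 \approx \$2{,}880{,}628.$$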


Table 3. Färe et al. (1986) data.

j ∈ E   λ*_j       n*_j   δ̄_j      n*_j × δ̄_j
1       —          6      3.2687   19.6122
2       1.2858     9      0.0294   0.2648
3       1.0070     1      0.0000   0.0000
4       1.1253     5      0.0025   0.0126
5       1.2126     0      0.0000   0.0000
6       1.0000     0      0.0000   0.0000
7       1.0165     0      0.0000   0.0000
8       —          0      0.0000   0.0000
9       —          7      0.0910   0.6367
10      25.1167    13     0.0604   0.7849
11      1.0741     4      0.0036   0.0143
12      1.0009     0      0.0000   0.0000
14      1.4722     0      0.0000   0.0000
15      —          1      0.0000   0.0000
16      1.1768     2      0.0027   0.0054
17      1.5523     8      0.0069   0.0548
18      1.2107     3      0.0113   0.0338
19      1.0489     4      0.0056   0.0222
20      —          9      3.2077   28.8691
21      3.0195     5      0.0403   0.2015
22      1.0263     2      0.0004   0.0008
24      1.0228     2      0.0000   0.0000
26      1.0003     0      0.0000   0.0000
27      1.0098     0      0.0000   0.0000
28      1.9247     8      0.0294   0.2349
29      2.9746     11     0.1867   2.0540
30      —          9      0.1078   0.9701
31      216.6826   0      0.0000   0.0000
33      4.0362     4      0.0126   0.0506
34      1.1227     0      0.0000   0.0000
35      1.4428     3      0.0180   0.0539
36      2.3032     5      1.0111   5.0556
37      —          21     0.0588   1.2338
38      1.0669     0      0.0000   0.0000
39      4.5989     2      1.0800   2.1599
40      1.8467     4      0.0145   0.0579
41      1.0163     0      0.0000   0.0000
42      —          9      0.0382   0.3441
43      —          1      0.0165   0.0165
45      1.1201     0      0.0000   0.0000
47      1.2973     5      0.0111   0.0557
49      2.6866     8      7.1677   57.3417
51      1.5500     6      0.0770   0.4621
52      1.2459     2      0.1983   0.3966
53      1.8893     0      0.0000   0.0000
55      1.0136     0      0.0000   0.0000
57      1.0384     1      0.0000   0.0000
58      1.0765     0      0.0000   0.0000
59      1.5729     0      0.0000   0.0000
60      1.4589     10     0.0330   0.3303


Table 3. Continued.

j ∈ E   λ*_j        n*_j   δ̄_j        n*_j × δ̄_j
63      1.1121      0      0.0000     0.0000
64      1.0004      1      0.0001     0.0001
66      —           7      1.8124     12.6869
67      1.3901      0      0.0000     0.0000
68      1586.4194   13     4.6366     60.2761
69      1.0001      2      9.7569     19.5137
70      1.1262      4      0.0160     0.0640
71      1.0129      1      0.0058     0.0058
72      1.1154      3      0.0245     0.0734
73      1.3392      3      0.0237     0.0712
75      5.1170      4      0.4378     1.7513
76      1.0122      10     0.0012     0.0121
77      —           1      0.0000     0.0000
78      1.1102      2      0.0077     0.0153
79      1.5068      12     0.0479     0.5746
80      1.0086      0      0.0000     0.0000
81      —           3      0.0510     0.1531
82      —           5      0.1505     0.7524
83      1.0466      1      0.1357     0.1357
84      —           0      0.0000     0.0000
86      —           2      0.0051     0.0101
87      —           4      0.9660     3.8640
88      3.8671      15     0.1239     1.8584
89      1.1034      1      0.0176     0.0176
90      1.2555      2      0.0016     0.0031
91      1.0079      1      0.2205     0.2205
93      25.3920     20     0.0677     1.3541
94      —           11     116.2929   1279.2223
95      3.5466      5      0.0572     0.2861
96      1.0344      0      0.0000     0.0000
97      1.0383      0      0.0000     0.0000
98      —           27     0.0311     0.8399
100     1.0258      0      0.0000     0.0000

Table 4. Aly et al. (1990) data.

j ∈ E   λ*_j     n*_j   δ̄_j      n*_j × δ̄_j
2       1.7363   40     0.0281   1.1247
3       1.3160   72     0.0140   1.0052
4       1.0144   0      0.0000   0.0000
5       1.5773   36     0.0106   0.3804
6       2.1902   93     0.0227   2.1154
8       1.5017   38     0.0171   0.6479
13      1.1138   12     0.0079   0.0951
19      1.5556   60     0.0212   1.2739
20      1.2830   53     0.0113   0.5968
21      1.0638   0      0.0000   0.0000
23      1.0454   3      0.0051   0.0154


Table 4. Continued.

j ∈ E   λ*_j     n*_j   δ̄_j      n*_j × δ̄_j
25      1.2256   1      0.1417   0.1417
26      4.3915   83     0.0416   3.4529
28      1.0217   1      0.0538   0.0538
29      1.0827   10     0.0043   0.0435
30      1.6906   8      0.0724   0.5796
31      1.1668   6      0.0166   0.0997
35      1.0598   0      0.0000   0.0000
36      1.2734   27     0.0169   0.4555
37      1.0171   2      0.0011   0.0023
38      1.2880   21     0.0129   0.2706
42      1.1048   25     0.0049   0.1235
53      1.0311   1      0.0065   0.0065
54      1.0615   9      0.0117   0.1055
58      1.0216   0      0.0000   0.0000
61      1.4589   5      0.0126   0.0629
63      1.2025   3      0.0123   0.0368
68      1.0339   0      0.0000   0.0000
75      1.0547   2      0.0022   0.0045
83      1.0696   1      0.1146   0.1146
86      1.0352   5      0.0051   0.0255
94      1.3163   19     0.0738   1.4022
99      1.0245   1      0.0013   0.0013
100     1.1401   5      0.0098   0.0491
104     1.6791   4      0.0289   0.1158
106     1.3516   19     0.0119   0.2264
107     3.0921   24     0.0575   1.3811
110     1.1414   5      0.0025   0.0127
112     1.0572   2      0.0030   0.0060
119     1.0470   1      0.0071   0.0071
121     3.5218   18     0.1159   2.0865
122     1.3072   11     0.0795   0.8744
132     1.1386   4      0.0041   0.0166
135     1.2068   88     0.0110   0.9655
136     1.0543   1      0.0107   0.0107
139     1.0793   10     0.0040   0.0402
147     1.0851   19     0.0006   0.0105
151     1.5369   92     0.0230   2.1177
153     1.3229   19     0.0267   0.5076
168     1.0638   4      0.0018   0.0071
173     1.1333   14     0.0346   0.4843
180     1.1306   4      0.0050   0.0200
184     1.1516   8      0.0061   0.0492
197     1.1538   6      0.0048   0.0289
205     1.4875   3      0.0232   0.0696
213     1.2189   25     0.0121   0.3024
228     1.1566   8      0.0065   0.0519
235     1.2619   20     0.0118   0.2359
238     1.0144   1      0.0004   0.0004
239     1.1758   38     0.0075   0.2856
240     1.0822   32     0.0035   0.1118


Table 4. Continued.

j ∈ E   λ*_j     n*_j   δ̄_j      n*_j × δ̄_j
245     1.3169   22     0.0120   0.2647
246     1.1742   28     0.0088   0.2461
251     1.0717   3      0.0004   0.0011
254     1.1294   12     0.0049   0.0587
259     4.5327   81     0.0699   5.6625
261     1.1942   9      0.0006   0.0050
267     1.3726   22     0.0059   0.1288
268     1.9270   22     0.0497   1.0931
285     1.1049   1      0.0353   0.0353
292     1.4966   50     0.0229   1.1430
295     1.4098   61     0.0253   1.5458
296     1.2013   31     0.0099   0.3084
298     1.5483   111    0.0305   3.3900
299     2.7110   47     0.0573   2.6917
302     1.2529   3      0.0627   0.1881
304     1.1736   3      0.0129   0.0386
305     1.7552   20     0.0511   1.0222
306     1.2354   9      0.0170   0.1530
308     1.2392   7      0.0134   0.0940
309     1.6534   42     0.0236   0.9913
311     1.6011   2      0.1249   0.2498
312     1.4201   34     0.0151   0.5141
313     2.2224   16     0.0646   1.0332
316     1.4847   5      0.0177   0.0883
317     1.2780   1      0.0041   0.0041
319     1.1721   2      0.0271   0.0542
320     2.4940   22     0.1358   2.9883
321     —        11     0.0326   0.3582
322     —        19     0.1532   2.9105

5. Conclusions
Although none of the three studies cited as examples in the previous section makes any mention of attempts to measure the influence of particular observations or to test for the presence of outliers in the data, the examples are not intended to suggest that the authors of the studies were careless or otherwise negligent in their analyses. No good outlier diagnostics were available at the times of their studies. Rather, the examples merely illustrate the need for such diagnostics, and the potential for the present diagnostic to find influential observations in data used to estimate technical efficiency scores. The examples suggest that substantial numbers of influential observations exist in typical settings where DEA has been used. Furthermore, the diagnostics proposed in this paper give an indication of the consequences of a data error in an ostensibly efficient observation, and a scheme for prioritizing observations for further scrutiny has been proposed. This prioritization is particularly important in situations where data-checking is costly and resources are limited.


As noted in the introduction, identification of influential observations or outliers is only a first step. It remains for the researcher to scrutinize suspicious observations to ensure that they do not contain some type of measurement error. If a data error is found in a particular observation, it should be corrected if possible; otherwise, the researcher may wish to delete the observation with the error. If no data error is found, then the observation is merely atypical of the remaining data, and may contain useful information.

Notes
1. A wide literature exists describing the detection of outliers in the context of the CLR model. See Chatterjee and Hadi (1986) and Gray (1988) for summaries of this literature.

2. Charnes et al. (1986) appear to be the first to consider more than one class of efficient observations. The modification used by Andersen and Petersen and by Lovell et al. is related to the classification of ostensibly efficient observations discussed by Charnes et al. (1991), Thrall (1993), and elsewhere.

3. The distinction between censoring and truncation has sometimes been confused in the literature. In either case, a variable $z$ is observed, while a variable $z^*$ is the underlying variable of interest. If $z$ is censored from above at a value $c$, then $z_i = z^*_i$ for all $i$ such that $z^*_i \leq c$, and $z_i = c$ for all $i$ such that $z^*_i > c$. If $z$ is truncated from above at a value $c$, then $z_i = z^*_i$ for all $i$ such that $z^*_i < c$, and $z^*_i$ does not exist (and hence $z_i$ is unobserved) otherwise.

4. In many empirical applications, average inefficiency measured by (1) has often been found to range from about 0.75 to 1.00; e.g., see Aly et al. (1990), Burgess and Wilson (1993), and Dusansky and Wilson (1994, 1995).

5. Note that in the case of the modified score computed from (2), the presence of DMU C affects measured efficiency for DMU G even though C does not dominate G in the sense of using less of both inputs to produce the same or greater output.

6. The significance of efficiency score changes resulting from deletion of an ostensibly efficient observation is meaningful only in the context of the underlying production process; i.e., a given change in efficiency may be important for one industry and unimportant for another industry, depending upon the nature of the technology and input/output mapping used in each industry. In addition, use of a statistical test in the methodology of this section would require distributional assumptions regarding the efficiency scores; one of the attractive features of DEA and FDH methods is that they avoid distributional assumptions.

7. Performance of this machine is rated by the manufacturer at 86.1 million instructions per second and 10.6 million floating-point operations per second. In the largest sample investigated in this section (Aly et al., 1990, with 322 observations), all computations were performed in less than 69 minutes of elapsed time.

References
Aly, H.Y., R.G. Grabowski, C. Pasurka, and N. Rangan. (1990). "Technical, Scale, and Allocative Efficiencies in U.S. Banking: An Empirical Investigation." Review of Economics and Statistics 72, 211-218.

Andersen, P., and N.C. Petersen. (1989). "A Procedure for Ranking Efficient Units in Data Envelopment Analysis." Unpublished working paper, Department of Management, Odense University, Odense, Denmark.

Andrews, D.F., and D. Pregibon. (1978). "Finding the Outliers that Matter." Journal of the Royal Statistical Society, Series B 40, 85-93.

Burgess, J.F. Jr., and P.W. Wilson. (1993). "Technical Efficiency in VA Hospitals." In The Measurement of Productive Efficiency: Techniques and Applications (eds. H.O. Fried, C.A.K. Lovell, and S.S. Schmidt), Oxford: Oxford University Press, 335-351.

Charnes, A., W.W. Cooper, and E. Rhodes. (1981). "Evaluating Program and Managerial Efficiency: An Application of Data Envelopment Analysis to Program Follow Through." Management Science 27, 668-697.

Charnes, A., W.W. Cooper, and R.M. Thrall. (1986). "Identifying and Classifying Scale and Technical Efficiencies in Data Envelopment Analysis." Operations Research Letters 5, 105-116.

Charnes, A., W.W. Cooper, and R.M. Thrall. (1991). "A Structure for Classifying and Characterizing Efficiency and Inefficiency in Data Envelopment Analysis." Journal of Productivity Analysis 2, 197-237.

Charnes, A., S. Haag, P. Jaska, and J. Semple. (1992). "Sensitivity of Efficiency Classifications in the Additive Model of Data Envelopment Analysis." International Journal of Systems Science 23, 789-798.

Charnes, A., and L. Neralić. (1990). "Sensitivity Analysis of the Additive Model in Data Envelopment Analysis." European Journal of Operational Research 48, 332-341.

Chatterjee, S., and A.S. Hadi. (1986). "Influential Observations, High Leverage Points, and Outliers in Linear Regression." Statistical Science 1, 379-392.

Cook, R.D., and S. Weisberg. (1982). Residuals and Influence in Regression. New York: Chapman and Hall.

Deprins, D., L. Simar, and H. Tulkens. (1984). "Measuring Labor Inefficiency in Post Offices." In The Performance of Public Enterprises: Concepts and Measurements (eds. M. Marchand, P. Pestieau, and H. Tulkens), Amsterdam: North-Holland, 243-267.

Dusansky, R., and P.W. Wilson. (1994). "Measuring Efficiency in the Care of the Developmentally Disabled." Review of Economics and Statistics 76, 340-345.

Dusansky, R., and P.W. Wilson. (1995). "On the Relative Efficiency of Alternative Modes of Producing Public Sector Output: The Case of the Developmentally Disabled." European Journal of Operational Research 80, 608-628.

Färe, R., S. Grosskopf, and C.A.K. Lovell. (1985). The Measurement of Efficiency of Production. Boston: Kluwer-Nijhoff.

Färe, R., S. Grosskopf, and C. Pasurka. (1986). "Effects on Relative Efficiency in Electric Power Generation Due to Environmental Controls." Resources and Energy 8, 167-184.

Gray, J.B. (1988). "A Classification of Influence Measures." Journal of Statistical Computation and Simulation 30, 159-171.

Greene, W.H. (1980). "Maximum Likelihood Estimation of Econometric Frontier Functions." Journal of Econometrics 13, 27-56.

Grosskopf, S., and V. Valdmanis. (1987). "Measuring Hospital Performance." Journal of Health Economics 6, 89-107.

Gunst, R.F., and R.L. Mason. (1980). Regression Analysis and its Application. New York: Marcel Dekker.

Korostelev, A.P., L. Simar, and A.B. Tsybakov. (1994). "Efficient Estimation of Monotone Boundaries." Annals of Statistics, forthcoming.

Lovell, C.A.K. (1993). "Production Frontiers and Productive Efficiency." In The Measurement of Productive Efficiency: Techniques and Applications (eds. H. Fried, C.A.K. Lovell, and S. Schmidt), Oxford: Oxford University Press.

Lovell, C.A.K., L.C. Walters, and L.L. Wood. (1993). "Stratified Models of Education Production Using DEA and Regression Analysis." In Data Envelopment Analysis: Theory, Methods, and Applications (eds. A. Charnes, W.W. Cooper, A.Y. Lewin, and L.M. Seiford), New York: Quorum Books.

Martin, J.P., and J.M. Page. (1983). "The Impact of Subsidies on X-efficiency in LDC Industry: Theory and an Empirical Test." Review of Economics and Statistics 65, 608-617.

McCarty, T.A., and S. Yaisawarng. (1993). "Technical Efficiency in New Jersey School Districts." In The Measurement of Productive Efficiency: Techniques and Applications (eds. H. Fried, C.A.K. Lovell, and S. Schmidt), Oxford: Oxford University Press.

Rousseeuw, P.J., and G.W. Bassett. (1991). "Robustness of the p-Subset Algorithm with High Breakdown Point." In Directions in Robust Statistics and Diagnostics: Part II (eds. W. Stahel and S. Weisberg), New York: Springer-Verlag.

Rousseeuw, P.J., and B.C. van Zomeren. (1990). "Unmasking Multivariate Outliers and Leverage Points." Journal of the American Statistical Association 85, 633-639.

Seaver, B.L., and K.P. Triantis. (1989). "The Implications of Using Messy Data to Estimate Production-Frontier-Based Technical Efficiency Measures." Journal of Business and Economic Statistics 7, 49-59.

Sexton, T.R., R.H. Silkman, and A.J. Hogan. (1986). "Data Envelopment Analysis: Critique and Extensions." In Measuring Efficiency: An Assessment of Data Envelopment Analysis (ed. R.H. Silkman), San Francisco: Jossey-Bass.

Shephard, R.W. (1970). Theory of Cost and Production Functions. Princeton: Princeton University Press.

Thrall, R.M. (1993). "Duality, Classification, and Slacks in DEA." Unpublished working paper, Jones Graduate School of Administration, Rice University.


Timmer, C.P. (1971). "Using a Probabilistic Frontier Production Function to Measure Technical Efficiency." Journal of Political Economy 79, 776-794.

Welsch, R.E. (1982). "Influence Functions and Regression Diagnostics." In Modern Data Analysis (eds. R.L. Launer and A.F. Siegel), New York: Academic Press.

Wilson, G.W., and J.M. Jadlow. (1982). "Competition, Profit Incentives, and Technical Efficiency in the Provision of Nuclear Medicine Services." The Bell Journal of Economics 13, 472-482.

Wilson, P.W. (1993). "Detecting Outliers in Deterministic Nonparametric Frontier Models with Multiple Outputs." Journal of Business and Economic Statistics 11, 319-323.
