
PSYCHOMETRIKA — VOL. 74, NO. 3, 457–475
SEPTEMBER 2009
DOI: 10.1007/s11336-009-9115-2

EXEMPLAR-BASED CLUSTERING VIA SIMULATED ANNEALING

MICHAEL J. BRUSCO
FLORIDA STATE UNIVERSITY

HANS-FRIEDRICH KÖHN
UNIVERSITY OF MISSOURI-COLUMBIA

Several authors have touted the p-median model as a plausible alternative to within-cluster sums of
squares (i.e., K-means) partitioning. Purported advantages of the p-median model include the provision
of “exemplars” as cluster centers, robustness with respect to outliers, and the accommodation of a diverse
range of similarity data. We developed a new simulated annealing heuristic for the p-median problem
and completed a thorough investigation of its computational performance. The salient findings from our
experiments are that our new method substantially outperforms a previous implementation of simulated
annealing and is competitive with the most effective metaheuristics for the p-median problem.
Key words: cluster analysis, partitioning, heuristics, p-median model, simulated annealing.

Requests for reprints should be sent to Michael J. Brusco, Department of Marketing, College of Business, Florida State University, Tallahassee, FL 32306-1110, USA. E-mail: mbrusco@cob.fsu.edu
© 2009 The Psychometric Society

1. Introduction

The K-means method (Forgy, 1965; Hartigan & Wong, 1979; Howard, 1966; MacQueen,
1967; Steinhaus, 1956; Thorndike, 1953) is arguably the most popular method for conducting a
nonhierarchical cluster analysis of two-mode, two-way proximity data (see Steinley, 2006 for a
review). Such data are typically represented by an N × V matrix, A, that contains measurements
for N objects on each of V variables. In most applications, the K-means algorithm attempts
to partition the N objects into K clusters based on the V variables with the goal of minimiz-
ing the sum (across all objects) of the squared Euclidean distances between each object and its
cluster centroid. The centroid of a cluster in this context consists of a set of V variable means.
Because the variable means for a cluster typically do not explicitly correspond to the variable
measurements for any particular object in the cluster, the centroids are often termed implicit or
virtual.
Although less common than K-means methods, there are partitioning methods that seek to
establish clusters such that the center of each cluster explicitly corresponds to an object. Part of
the difficulty in tracking down these methods is that they span multiple disciplines and are often
called by different names. For example, Kaufman and Rousseeuw (1990, Chapter 2) describe
the “partitioning around medoids,” or “K-medoid” problem, which requires the selection of K
representative objects for clusters. This is congruent with the “p-median” approach to clustering,
which has an extensive history in the management sciences because of its relationship to facil-
ity location planning (Maranzana, 1964). The implementation of the p-median model as a data
analysis tool was discussed by Mulvey and Crowder (1979) and Klastorin (1985), and was advo-
cated by Brusco and Köhn (2008a, 2008b). Another recent contribution was offered by Frey and
Dueck (2007), who presented a method termed “affinity propagation,” which also focuses on the
solution of an adaptation of the p-median problem. Throughout the remainder of this paper, we

employ the term “p-median,” while recognizing the potential for other sources to use alternative
terminology for the same model.
In addition to providing a representative object (exemplar) as a centroid, several other advan-
tages of the p-median clustering method have been identified. First, Kaufman and Rousseeuw
(1990, Chapter 2) suggest that the p-median approach is more robust than K-means clustering
or other K-centroids procedures, particularly with respect to handling outliers. Second, the p-
median model has tremendous flexibility. For example, the model does not require metric data
and can be applied easily to proximity data established via the application of similarity coeffi-
cients to nominal or ordinal data. In fact, the p-median method can often be applied gainfully
to proximity matrices with triangle inequality violations, asymmetric proximity matrices, and
to nonsquare proximity matrices (Brusco & Köhn, 2008a). Third, as demonstrated by Avella,
Sassano, and Vasil’ev (2007), Beltran, Tadonki, and Vial (2006), and Brusco and Köhn (2008b),
exact (i.e., globally optimal) solutions for large instances of the p-median problems are often
achievable. Avella et al. (2007), for example, described a branch-cut-price algorithm that pro-
vides optimal solutions for problems as large as N = 3,795 and p = 500. It is important to note,
however, that the largest test problems used in most evaluations of exact methods employ in-
put proximities computed as Euclidean distances between points in two-dimensional space. The
performance of exact methods has not been sufficiently evaluated for input proximities obtained
from computing Euclidean distances between points measured on three or more variables (i.e.,
data located in a dimensional space greater than two). Moreover, violations of the triangle in-
equality and asymmetric proximity relations can pose formidable challenges for exact methods
(e.g., see Hansen, Mladenović, & Perez-Brito, 2001, p. 345).
Given the propitious characteristics of the p-median model, why does its usage pale in com-
parison to the ubiquitous K-means method? One plausible reason might be the potentially limited
scalability of p-median algorithms relative to K-means clustering heuristics. Whereas batch im-
plementations of the K-means algorithm (Forgy, 1965; Howard, 1966) can accommodate tens of
thousands of objects, some of the more popular p-median heuristics might encounter computer
storage and/or CPU time difficulties for data sets with several thousand objects. A second ex-
planation for the dearth of p-median clustering applications may stem from the lack of available
software for conducting p-median cluster analyses. Although several researchers have provided
software programs for p-median applications, the availability of procedures in commercial sta-
tistical packages is virtually nonexistent. Moreover, potential users of p-median clustering meth-
ods are currently restricted to a choice between relatively inflexible, but sophisticated, procedures
that are available as compiled Fortran or C++ codes (e.g., Avella et al., 2007; Brusco & Köhn,
2008b; Hansen et al., 2001; Resende & Werneck, 2004), or user-friendly implementations in R
or MATLAB that lack effectiveness for certain data conditions (e.g., Brusco & Köhn, 2008a;
Kaufman & Rousseeuw, 1990). Recently, Frey and Dueck (2007) presented a MATLAB imple-
mentation of their affinity propagation algorithm as a long-awaited breakthrough for application-
oriented, scalable, versatile, efficient, and effective exemplar-based clustering. Unfortunately, as
Brusco and Köhn (2008a) showed, affinity propagation suffers from a serious deficit with respect
to its ability to obtain a globally optimal partition for even modestly sized data sets. In light of
these issues, any intent to facilitate the advancement of exemplar-based clustering in the applied
behavioral sciences still faces the challenge of devising accessible programs that can produce
high-quality solutions for problems of various data structure and substantial size.
One procedure that would seem to be especially well suited for providing high-quality so-
lutions to p-median problems in a user-friendly format is simulated annealing (see Aarts &
Korst, 1989; Kirkpatrick, Gelatt, & Vecchi, 1983; van Laarhoven & Aarts, 1987), which has
been recognized as a powerful metaheuristic for solving combinatorial optimization problems
for more than two decades. Although simulated annealing does not guarantee that a globally
optimal solution will be identified, the procedure has been applied successfully to a vast array
of data-analytic tasks such as matrix permutation (Brusco, Köhn, & Stahl, 2008), classification
(Ceulemans & Van Mechelen, 2008; Ceulemans, Van Mechelen, & Leenen, 2007), unidimen-
sional scaling (Murillo, Vera, & Heiser, 2005) and multidimensional scaling (Vera, Heiser, &
Murillo, 2007). Given this formidable reputation, we were particularly intrigued by the some-
what lackluster performance of simulated annealing implementations for the p-median problem
(Chiyoshi & Galvão, 2000; Murray & Church, 1996) when compared to alternative approaches.
Consider, for example, reported results for the better of these two simulated annealing approaches
(Chiyoshi and Galvão’s) for the 40 p-median problems from the OR-Library (Beasley, 1990).
Chiyoshi and Galvão applied their simulated annealing heuristic to each of these test instances,
obtaining the known globally optimal solution for only 26 of the 40 problems. Contrastingly,
Hansen and Mladenović (1997, p. 217) obtained 39 and 38 optimal solutions (out of 40) using
variable neighborhood search and tabu search, respectively.
Our fundamental goal in this paper is to build on recent advancements in the area of
exemplar-based clustering by developing a new simulated annealing heuristic for the p-median
problem and conducting a rigorous evaluation of its performance relative to competing meth-
ods. First, we present results showing that our new method substantially outperformed Chiyoshi
and Galvão’s (2000) implementation of simulated annealing for the p-median problem. Second,
we demonstrate that our new simulated annealing heuristic generally provides partitions with
better objective function values than affinity propagation and vertex substitution across a di-
verse set of difficult test problems. Third, we provide evidence that our new simulated annealing
heuristic is competitive with the best metaheuristics available for the p-median problem (Alba &
Dominguez, 2006; Hansen & Mladenović, 1997; Resende & Werneck, 2004; Taillard, 2003).
In Section 2, we describe the p-median model and provide a brief review of exact and
approximate solution procedures. Section 3 presents the simulated annealing heuristic. Section 4
reports results for a variety of test problems from the p-median literature, ranging in size from
N = 100 to N = 5,934 objects. An application of the p-median model to a real-world data set
from the telecommunications industry is provided in Section 5. We conclude in Section 6 with a
brief summary and discussion of limitations and extensions.

2. The p-Median Problem

2.1. Model Formulation of the p-Median Problem (PMP)


Although the p-median model can be applied to two-mode matrices (e.g., demand points and
plants in facility location), clarity of presentation and relevance to later applications are facilitated
by considering a one-mode context pertaining to a collection of N objects. Moreover, for consis-
tency with Frey and Dueck (2007, 2008) and Brusco and Köhn (2008a), we present the p-median
problem as one of similarity maximization, as opposed to the equivalent dissimilarity minimiza-
tion version that is also common in the literature (Avella et al., 2007; Mulvey & Crowder, 1979;
Resende & Werneck, 2004). We denote S = [sij ] as an N × N nonpositive similarity matrix that
is measured for the set of N objects indexed by the set I = {1, 2, . . . , N }. The definition of S as
a similarity matrix implies that if i, j , and l are distinct objects, then sij > sil means that object
i is more similar to j than it is to l. We assume that sii = 0 for 1 ≤ i ≤ N (i.e.,
by convention, self-similarity is zero for all objects) and such relationships are ignored in the
analysis. The objective of the p-median model is to select a subset, J , consisting of p repre-
sentative objects (or exemplars), and to assign each object to its most similar exemplar with the
goal of maximizing the sum (across all objects) of the similarities between each object and its
nearest exemplar. A concise mathematical statement of the resulting p-median problem (PMP)
optimization problem, which is similar to the one offered in dissimilarity form by Mladenović,
Brimberg, Hansen, and Moreno-Pérez (2007), is:
 
\[
\underset{J}{\text{Maximize}}\colon\; z_1 = \sum_{i \in I} \max_{j \in J}\{s_{ij}\}, \tag{1}
\]
\[
\text{subject to}\quad J \subset I, \tag{2}
\]
\[
|J| = p. \tag{3}
\]

The PMP can also be represented as an integer program (Rao, 1971; ReVelle & Swain, 1970;
Vinod, 1969). The integer programming formulation of the PMP is facilitated by the following
set of decision variables:
X: X = {x11, x12, . . . , xNN} is a set of binary variables, where xij = 1 if object i is assigned
to exemplar j and 0 otherwise, for 1 ≤ i ≤ N and 1 ≤ j ≤ N.
The integer programming formulation is as follows:


\[
\underset{X}{\text{Maximize}}\colon\; z_1 = \sum_{i=1}^{N}\sum_{j=1}^{N} s_{ij}\,x_{ij}, \tag{4}
\]
\[
\text{subject to}\quad \sum_{j=1}^{N} x_{ij} = 1 \quad \text{for } 1 \le i \le N; \tag{5}
\]
\[
\sum_{j=1}^{N} x_{jj} = p; \tag{6}
\]
\[
0 \le x_{ij} \le x_{jj} \le 1 \quad \text{for } 1 \le i \le N \text{ and } 1 \le j \le N; \tag{7}
\]
\[
x_{ij} \in \{0, 1\} \quad \text{for } 1 \le i \le N \text{ and } 1 \le j \le N. \tag{8}
\]

The objective function (4) represents the total sum of the similarities of each object to its most
similar exemplar. Constraint set (5) requires that each object is assigned to exactly one exemplar.
Constraint (6) guarantees the selection of exactly p exemplars. Constraint set (7) ensures that
object i can only be assigned to object j if object j is a selected exemplar. The binary restrictions
on the variables are provided by constraint set (8).
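To make the relationship between (1) and (4)–(8) concrete, the following MATLAB fragment evaluates z1 for a fixed exemplar set, which is what constraint sets (5)–(8) reduce to once the exemplars are chosen. This is a minimal sketch under our own naming: S is an N × N similarity matrix and J a vector of p exemplar indices, neither taken from the programs discussed later.

    % Evaluate objective (1) for a candidate exemplar set J: each object
    % contributes its similarity to its most similar selected exemplar.
    % Exemplars serve themselves, because s_ii = 0 and S is nonpositive.
    z1 = sum(max(S(:, J), [], 2));

    % The implied assignment of objects to exemplars (constraint set (5)):
    [~, idx] = max(S(:, J), [], 2);
    assign = J(idx);   % assign(i) is the exemplar serving object i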

2.2. Solution Methods for PMP


The integer programming formulation of the PMP can be solved using commercial software
for modestly sized problems. However, the most effective exact procedures for the PMP are based
on Lagrangian relaxation and other customized mathematical programming methods (Avella et
al., 2007; Beltran et al., 2006; Brusco & Köhn, 2008b; Christofides & Beasley, 1982; Du Merle
& Vial, 2002; Galvão, 1980; Hanjoul & Peeters, 1985; Mulvey & Crowder, 1979; Narula, Ogbu,
& Samuelsson, 1977).
Although exact procedures have proven effective for some clustering problems, heuristics
remain especially popular for very large problems, as well as for more general proximity data.
The spectrum of heuristic approaches is quite extensive, encompassing algorithms that construct
a p-median solution via iterative selection of exemplars from 1 to p (Kuehn & Hamburger, 1963),
refine an initial feasible solution (Maranzana, 1964; Teitz & Bart, 1968), or employ Lagrangian-
based methods (Cornuejols, Fisher, & Nemhauser, 1977; Mulvey & Crowder, 1979). In addi-
tion, the PMP has been tackled using metaheuristics such as simulated annealing (Chiyoshi &
Galvão, 2000; Murray & Church, 1996), genetic algorithms (Alba & Dominguez, 2006; Alp,
Erkut, & Drezner, 2003; Moreno-Pérez, García-Roda, & Moreno-Vega, 1994), tabu search (Rol-
land, Schilling, & Current, 1996), ant-colony approaches (Levanova & Loresh, 2004), variable
neighborhood search (Hansen & Mladenović, 1997; Hansen et al., 2001), heuristic concentration
(Rosing & ReVelle, 1997; Rosing, ReVelle, Rolland, Schilling, & Current, 1998), and hybrid
methods (Resende & Werneck, 2004). A comprehensive review of these procedures was recently
provided by Mladenović et al. (2007).
In their review, Mladenović et al. (2007, p. 932) observe that the vertex substitution heuristic
(VSH) originally proposed by Teitz and Bart (1968) “. . . is one of the most often used heuris-
tics either alone or as a subroutine of other more complex methods or within metaheuristics.”
The VSH iteratively refines an initial set of exemplars by finding the best possible interchange
of one of the current exemplars with one of the unselected objects. This process continues until
there is no interchange that will lead to any improvement of the objective function. The effi-
cient implementation of this method is of considerable importance and has been the focus of
several research efforts (Hansen & Mladenović, 1997; Resende & Werneck, 2003; Rosing, 1997;
Whitaker, 1983). Fast implementations of VSH make use of lists of the most similar and second
most similar exemplars for each object. These lists help to avoid the explicit evaluation of the
replacement of each exemplar with one of the unselected objects.
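As a hedged illustration of this control flow (not Teitz and Bart's original implementation), the MATLAB sketch below repeats the single best interchange until no swap improves the objective; best_swap_all is a hypothetical helper that scans all exemplar/non-exemplar pairs, or their faster list-based equivalent, and returns the best interchange and its objective change.

    % VSH outer loop: apply the single best interchange repeatedly until
    % no exchange of an exemplar with an unselected object improves z1.
    improved = true;
    while improved
        improved = false;
        [jOut, uIn, delta] = best_swap_all(S, J);  % hypothetical helper
        if delta > 0
            J(J == jOut) = uIn;   % perform the improving interchange
            improved = true;
        end
    end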

2.3. The Affinity Propagation Problem (APP)


Frey and Dueck’s (2007) affinity propagation algorithm obtains a clustering of a data set by
transmitting information among objects. The affinity propagation method is designed to solve a
problem similar to PMP; however, the selection of a specific value of p is replaced with "preference"
values for each object that measure its suitability for serving as an exemplar. These
object preferences are incorporated in the objective function. Denoting the preference for object
i as θi for 1 ≤ i ≤ N , the affinity propagation problem (APP) is
  
\[
\underset{J}{\text{Maximize}}\colon\; z_2 = \sum_{i \in I} \max_{j \in J}\{s_{ij}\} + \sum_{j \in J} \theta_j, \tag{9}
\]

subject to constraint (2). The sum of the preference scores of objects selected as exemplars sup-
posedly steers the objective function towards selecting the optimal number of clusters for a given
data set. Accordingly, a purported advantage of APP is that the incorporation of the preference
information in the objective function can be used to facilitate the selection of the number of clus-
ters. Unfortunately, the choice for the preference values is highly subjective. In the absence of
any à priori information, Frey and Dueck (2007) suggest setting each θi value equal to the median
of the similarity measures, or to the minimum similarity measure if a smaller number of clusters
is desirable. We emphasize, however, that both of these choices are somewhat arbitrary and there
is no compelling evidence that either choice results in an appropriate number of clusters.
The affinity propagation algorithm passes two pieces of information between pairs of objects
that possess similarity relationships: (a) “responsibilities,” which indicate the suitability of one
object to serve as an exemplar for another, and (b) “availabilities,” which indicate the viability
of an object to accept another object as its exemplar. The responsibilities and availabilities are
iteratively updated to provide accumulated evidence of exemplar suitability and object assign-
ment information. Although the updating rules are straightforward, affinity propagation requires
a number of nontrivial parameter decisions. These include iteration limits and thresholds to con-
trol termination, as well as a “damping factor” to prevent numerical oscillations that can occur
for some test problems.

2.4. The Modified p-median Problem (MPMP)


For completeness and clarity, it is helpful to consider a modified version of PMP (and APP),
which corresponds to the maximization of (9) subject to constraints (2) and (3). As a result, this
modified p-median problem (MPMP) is an adaptation of the APP that enforces a p-cluster so-
lution, and can also be viewed as an augmented version of PMP that incorporates preference
information in the objective function. However, as noted previously, specific preference infor-
mation is typically unavailable. Consequently, in most applications of APP reported by Frey
and Dueck (2007, 2008), each object is assigned the same preference value, thereby reducing
the ambitious concept of individually varying preference scores to the selection of a “preference
constant” θ such that θi = θ for 1 ≤ i ≤ N . Thus, for any value of p, the objective function values
of PMP and MPMP (z1 and z2 , respectively) differ only by the constant pθ (i.e., z2 = z1 + pθ ).
Accordingly, the only time an analyst would need to consider MPMP instead of PMP is in the
case of unequal preferences.

2.5. Software Programs for PMP, APP, and MPMP


Although Kaufman and Rousseeuw’s (1990) implementation of VSH is available in the
R programming language, most of the more effective metaheuristics for PMP have been pro-
grammed in more traditional scientific languages such as Fortran (Hansen & Mladenović, 1997;
Hansen et al., 2001), C++ (Levanova & Loresh, 2004; Resende & Werneck, 2004; Taillard,
2003), and Pascal (Chiyoshi & Galvão, 2000; Rolland et al., 1996). Although these traditional
languages facilitate the implementation of compiled programs that are computationally efficient,
they do not necessarily offer the most user-friendly format for applied research analysts.
More recently, Frey and Dueck (2007) made their affinity propagation program available as
a MATLAB m-file, apcluster.m, from the website www.psi.toronto.edu/affinitypropagation.
Brusco and Köhn (2008a) presented an adaptation of Hansen and Mladenović’s (1997) fast
VSH procedure for MPMP that incorporates the preference information in the objective
function. This program, vsh_fc.m, is available as a MATLAB m-file from the website
http://mailer.fsu.edu/~mbrusco. Preliminary experiments conducted by Brusco and Köhn sug-
gest that vsh_fc.m provides better solutions (i.e., larger values of z2 ) than apcluster.m
in comparable CPU time. However, Frey and Dueck (2008) correctly point out that the solution
quality of vsh_fc.m deteriorates and its computation time explodes as p increases to
100 or more. This finding, which is well known in the p-median literature (see, for example,
Hansen & Mladenović, 1997, p. 218), has provided ample motivation for the development of
robust heuristics that are effective for large or small N , large or small p, and a broad spectrum
of data structures (Hansen et al., 2001; Levanova & Loresh, 2004; Murray & Church, 1996;
Resende & Werneck, 2004; Rosing & ReVelle, 1997).

2.6. Restricting the Focus to PMP


Throughout the remainder of this paper, all of our computational analyses focus on PMP.
Our rationale for restricting the comparisons to PMP is that the selection of nonequal prefer-
ences for APP or MPMP would be an extraordinarily subjective endeavor for most data analysis
applications in the behavioral sciences and there are no available guidelines for accomplishing
this task. Moreover, even in the case of equal preferences, the selection of the preference constant
θ is rather arbitrary. Given our emphasis on PMP, all of the objective function values in the text
and tables correspond to z1 . Although the apcluster.m algorithm does not solve the PMP
directly, a comparison of affinity propagation with the PMP methods is straightforward (Brusco
& Köhn, 2008a; Frey & Dueck, 2007, 2008). In our analyses, the apcluster.m algorithm
is always implemented under the assumption of equal preferences (i.e., θi = θ for 1 ≤ i ≤ N ),
and the objective value reported for the algorithm is z1 = z2 − pθ (this value is returned in the
variable “dpsim” from the apcluster.m program).

3. Simulated Annealing Heuristics

3.1. Overview and Key Components of the Heuristics


To describe the general structure of the simulated annealing algorithm for the p-median
problem, we define J ∗ as the best-found set of exemplars, z∗ as the corresponding best-found
objective value, and U = I \J as the set of objects that are not selected as exemplars. In addition,
we denote T as the “temperature” of the simulated annealing algorithm, which controls the prob-
ability of accepting (as a replacement for the current incumbent solution) a solution that worsens
the objective function. The “cooling factor” of the algorithm, c, periodically reduces the temper-
ature of the algorithm and the “temperature length” G represents the number of trial solutions
that are evaluated at each temperature. The basic blueprint of simulated annealing algorithms for
the p-median problem is as follows:
Step 0. INITIALIZE. Randomly generate an initial set of p exemplars, J . Set U = I\J, J∗ = J , compute z1 using (1), and set z∗ = z1 . Choose 0 < c < 1, G, and an initial value for T . Set g = 0 and b = 0.
Step 1. GENERATE A TRIAL SOLUTION. Set g = g + 1. Modify J to create a trial set of exemplars, J′, and compute the objective value for the trial solution, z1′.
Step 2. EVALUATE THE TRIAL SOLUTION. Compute Δ = z1′ − z1 . If Δ < 0, then go to Step 4; otherwise, go to Step 3.
Step 3. ACCEPT AN IMPROVED TRIAL SOLUTION. Set J = J′, U = I\J, z1 = z1′, b = 1, and, if z1 > z∗ , then also set J∗ = J and z∗ = z1 . Go to Step 5.
Step 4. ACCEPT AN INFERIOR TRIAL SOLUTION? Generate a uniform random number, r, on the interval [0, 1]. If r < exp(Δ/T ), then set b = 1, J = J′, U = I\J, and z1 = z1′. Go to Step 5.
Step 5. UPDATE TEMPERATURE? If g < G, then return to Step 1; otherwise, go to Step 6.
Step 6. TERMINATION? If b = 1, then set b = 0, T = cT , g = 0 and return to Step 1;
otherwise, STOP.
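The blueprint translates into a compact loop. The MATLAB sketch below is illustrative rather than a transcription of sa1.m; initial_temperature and trial_move are hypothetical helpers corresponding to the rules described in Sections 3.2 and 3.3, and S, N, and p are assumed given.

    J = randperm(N, p);                    % Step 0: random initial exemplars
    z1 = sum(max(S(:, J), [], 2));         % objective (1) for the incumbent
    Jstar = J;  zstar = z1;
    T = initial_temperature(S, J);         % see the sketch in Section 3.2
    G = 10 * N;  c = 0.9;                  % temperature length and cooling
    b = 1;                                 % acceptance flag for Step 6
    while b == 1
        b = 0;
        for g = 1:G                        % Steps 1-5: G trials per temperature
            [Jp, zp] = trial_move(S, J);   % Step 1 (see Section 3.3)
            delta = zp - z1;               % Step 2
            if delta >= 0 || rand < exp(delta / T)   % Steps 3-4
                J = Jp;  z1 = zp;  b = 1;
                if z1 > zstar, Jstar = J; zstar = z1; end
            end
        end
        T = c * T;                         % Step 6: cool and repeat, or stop
    end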
There are two aspects of the simulated annealing algorithm that are crucial to its effective-
ness: (1) the selection of parameters that control the cooling schedule, and (2) the generation of
trial solutions in Step 1. These issues are discussed in the following subsections.

3.2. Choosing Parameters for the Simulated Annealing Algorithm


To establish the initial temperature, T , in Step 0, we randomly generated 200 exchanges of
an exemplar in J with an unselected object in U . For each exchange, the effect on the objective
function was computed, and the maximum absolute change across all exchanges was specified as
the initial value of T . This process of selecting the initial temperature results in the acceptance of
a large number of inferior solutions in early stages of the algorithm. The temperature length, G,
was set equal to 10N . This setting enables larger problems to have more trial solutions at each
temperature, while maintaining reasonable limits on computation time.
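A minimal sketch of this initialization rule, assuming uniform sampling of the swapped pair (the helper name is ours, not the authors'):

    function T = initial_temperature(S, J)
    % Largest absolute objective change over 200 random exchanges of an
    % exemplar in J with an unselected object in U.
        N = size(S, 1);
        U = setdiff(1:N, J);
        z = sum(max(S(:, J), [], 2));
        T = 0;
        for t = 1:200
            Jt = J;
            Jt(randi(numel(Jt))) = U(randi(numel(U)));   % one random swap
            zt = sum(max(S(:, Jt), [], 2));
            T = max(T, abs(zt - z));
        end
    end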
The cooling parameter, c, controls the rate of change in the temperature. The selection of 0 <
c < 1 creates a monotonic decrease in the temperature that gradually reduces the probability of
accepting inferior solutions. Common choices for c in the combinatorial data analysis literature
are in the range of .8 ≤ c ≤ .95 (Brusco & Steinley, 2007; Brusco et al., 2008; Ceulemans et al.,
2007; Murillo et al., 2005; Vera et al., 2007). In our experiments, we tested parameter settings
of 0.8 and 0.9.
Chiyoshi and Galvão (2000) recommend adjustment of the temperature rather than reduction
during the execution of the simulated annealing algorithm. The goal is to keep the number of
accepted solutions from becoming too large or too small. At each temperature, the proportion, π ,
of the 10N trial solutions that were accepted is computed, and the following function is used to
obtain the temperature adjustment factor, c, in Step 6:


\[
c =
\begin{cases}
1 + (0.4 - \pi)/0.4 & \text{if } \pi < 0.4, \\
1/\bigl(1 + (\pi - 0.6)/0.4\bigr) & \text{if } \pi > 0.6, \\
1 & \text{if } 0.4 \le \pi \le 0.6.
\end{cases}
\tag{10}
\]

We developed a version of the simulated annealing algorithm that uses (10) for temperature ad-
justment. For this rule, the termination criterion in Step 6 is not appropriate, and it is necessary to
determine a limit on the number of temperature adjustments. For comparison to the standard tem-
perature reduction implementations, we obtained a limit by calculating the number of reductions
that would be necessary to reduce the initial temperature to .0001 using c = 0.9.
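In code, rule (10) and the comparison-based cap on the number of adjustments look roughly as follows (a sketch; the names are ours):

    function c = adjust_factor(piAcc)
    % Temperature adjustment factor (10): heat up when too few trial
    % solutions are accepted, cool down when too many are accepted.
        if piAcc < 0.4
            c = 1 + (0.4 - piAcc) / 0.4;
        elseif piAcc > 0.6
            c = 1 / (1 + (piAcc - 0.6) / 0.4);
        else
            c = 1;
        end
    end

    % Cap on the number of adjustments: the number of reductions needed
    % to drive the initial temperature T0 down to .0001 at c = 0.9:
    % maxAdjust = ceil(log(1e-4 / T0) / log(0.9));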

3.3. Generation of Trial Solutions


Murray and Church’s (1996) simulated annealing implementation for the p-median problem
used what is perhaps the most basic neighborhood move for producing the trial solution. Their
approach for obtaining J  in Step 1 was to replace a randomly selected exemplar, j ∈ J , with
a randomly chosen object, u ∈ U . As noted by Chiyoshi and Galvão (2000), however, this ap-
proach often requires considerable computational effort to produce only mediocre solutions. The
fundamental problem with this generation process is that solution improvement is contingent on
the simultaneous random selection of good choices for the outgoing and incoming object. Our
preliminary experimentation with this purely random generation strategy confirmed the findings
of Chiyoshi and Galvão, and we eliminated this approach from further consideration.
Chiyoshi and Galvão (2000) generated trial solutions by randomly selecting the outgoing
exemplar, j ∈ J , and, subsequently, finding the unselected object, u ∈ U , that produces the most
favorable change in the objective function. In some instances, the most favorable change im-
proves the objective function value and is accepted in Step 3, whereas in other situations the
most favorable change is the one that has the least detrimental effect on the objective function.
The acceptance of this inferior solution is then probabilistically determined in Step 4.
We adopt the opposite of Chiyoshi and Galvão’s (2000) approach in our generation of trial
solutions. More specifically, we randomly generate an unselected object, u ∈ U , and find the
exemplar, j ∈ J , for which replacement of that exemplar with u produces the most favorable
change in the objective function. Our rationale for this modification is twofold. First, for most
values of p encountered in clustering applications, U is a much larger set than J , and, therefore,
evaluation of |J | replacements is apt to be faster than evaluation of |U | replacements. Second,
and more importantly, we believe that our generation strategy should enable a broader range of
objects to enter the set of exemplars and thus facilitate the identification of better solutions.
For maximum efficiency, we use methods based on Whitaker’s (1983) clever ideas for find-
ing the outgoing exemplar, which uses lists of the nearest and second nearest exemplars for
each object. These lists are updated as existing exemplars are deleted and new ones are added.
Hansen and Mladenović (1997, pp. 222–225) and Kaufman and Rousseeuw (1990, pp. 102–104)
present adaptations of Whitaker’s method. Our implementation, which closely mirrors Hansen
and Mladenović’s procedure, uses the following lists:
α(i): the exemplar, j ∈ J , to which object i is most similar for 1 ≤ i ≤ N ;
β(i): the exemplar, (j = α(i)) ∈ J , to which object i is second most similar for 1 ≤ i ≤ N .
The α(i) and β(i) arrays are obtained for the initial set of exemplars, J , in Step 0 and
updated during the algorithm. In Step 1, a trial solution is obtained using the algorithm shown
in Figure 1. The entering unselected object, u ∈ U , is randomly selected based on a uniform
distribution with equal probability for each candidate object. Next, the α(i) and β(i) arrays are
used to identify the exiting exemplar, j′ ∈ J . The trial solution, J′, is produced by replacing j′
with u, and Δ is also obtained and stored. If the trial solution is accepted as the new incumbent
solution in Steps 3 and 4, then J = J′, U = I\J, and the logic in Figure 2 is used to update the
α(i) and β(i) arrays.

FIGURE 1.
The neighborhood solution generation procedure for the SA algorithm.

FIGURE 2.
A procedure to update the α and β arrays after replacing the exiting exemplar with the entering candidate object.
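Because the figures are reproduced here only as captions, the following MATLAB sketch illustrates the kind of single-pass gain computation that Figure 1 describes. It is our own rendering of the Whitaker-style evaluation, not the authors' sa1.m: given the entering candidate u, it uses the α(i) and β(i) lists to find the exiting exemplar with the most favorable objective change.

    function [jBest, delta] = best_swap(S, J, alpha, beta, u)
    % For entering candidate u, find the exemplar whose removal is least
    % detrimental, using the nearest (alpha) and second-nearest (beta) lists.
        N = size(S, 1);
        w = 0;                       % gain from objects that prefer u
        loss = zeros(1, numel(J));   % loss(k): cost of removing exemplar J(k)
        for i = 1:N
            a = S(i, alpha(i));      % similarity to current nearest exemplar
            if S(i, u) > a
                w = w + (S(i, u) - a);            % i switches to u regardless
            else
                k = (J == alpha(i));              % i is reassigned only if its
                b = max(S(i, u), S(i, beta(i)));  % own exemplar is removed
                loss(k) = loss(k) + (b - a);      % nonpositive contribution
            end
        end
        [lossBest, k] = max(loss);   % least detrimental exemplar to remove
        jBest = J(k);
        delta = w + lossBest;        % objective change for the best swap
    end

On acceptance, the lists are then repaired in the spirit of Figure 2: objects whose nearest or second-nearest exemplar was the exiting j′ recompute those entries over the new J, and every object checks whether u is now its nearest or second-nearest exemplar.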

3.4. Implementation of the Simulated Annealing Algorithms


Three different versions of the simulated annealing algorithms were programmed as
MATLAB *.m files. The first two versions are designed for PMP. The program sa1.m uses
standard temperature reductions, whereas the second, sa2.m, uses temperature adjustments as
described in Section 3.2. The third version, sa1_mpmp.m, is designed for the case of differen-
tial preferences for exemplars; however, this program is not used for any experimental analyses
in this paper. We also prepared a Fortran version of sa1.m (sa1.for), which is appreciably
faster than the MATLAB code.

4. Computational Results

4.1. Comparison to Chiyoshi and Galvão's (2000) Simulated Annealing Heuristic


To ensure that our new simulated annealing heuristic was superior to Chiyoshi and Galvão's
(2000) implementation, we obtained the 40 PMP test problems from the OR-Library (Beasley,
1990) and applied sa1.for using 20 restarts for each problem. Our new algorithm obtained
the globally optimal solution on at least one restart for all 40 test problems, and on at least 9 of
the 20 restarts for 39 of the 40 problems. This compares very favorably to the Chiyoshi–Galvão
simulated annealing implementation, which failed to find the global optimum for 14 of the 40
test problems on any of the 100 restarts permitted by the authors. Based on these findings, we
initiated comparisons of our new heuristic to alternative approaches with the confidence that it is
superior to previous simulated annealing implementations.

4.2. Test Problems


We evaluated the performances of apcluster.m, vsh_fc.m, sa1.m, and sa2.m using
some of the largest and most challenging test problems from the p-median literature. The test
problems stem from a total of seven unique data sets, which are described as follows:
• TravelRouting: An asymmetric similarity matrix (with triangle inequality violations) of
negative air travel times among N = 456 cities. These data were studied by Frey and
Dueck (2007) and were obtained from www.psi.toronto.edu/affinitypropagation.
• FaceClustering: A similarity matrix of negative squared Euclidean distances among
N = 900 facial images from the Olivetti data base. These data were studied by Frey and
Dueck (2007) and were obtained from www.psi.toronto.edu/affinitypropagation.
• ExtendedCircuitBoard: A similarity matrix of negative squared Euclidean distances
among N = 1,272 points. Frey and Dueck (2008) obtained these points by generating
four replications (adding Gaussian noise) of the 318 points from Lin and Kernighan’s
(1973) circuit board data. Their goal was to produce a data set with points that are very
close together, which should be conducive to better relative performance for apclus-
ter.m. These data were obtained from www.psi.toronto.edu/affinitypropagation.
• FaceVideo: An asymmetric similarity matrix among N = 1,965 facial images. These
data were studied by Frey and Dueck (2008) and obtained from www.psi.toronto.edu/
affinitypropagation.
• FL1400: A similarity matrix of negative Euclidean distances among N = 1,400 holes on
a printed circuit board whose coordinates were obtained from TSPLIB (Reinelt, 2001).
• PCB3038: A similarity matrix of negative Euclidean distances among N = 3,038 holes
on a printed circuit board whose coordinates were obtained from TSPLIB (Reinelt, 2001).
• RL5934: A similarity matrix of negative Euclidean distances among N = 5,934 holes on
a printed circuit board whose coordinates were obtained from TSPLIB (Reinelt, 2001).

4.3. Experimental Test Conditions


The apcluster.m and vsh_fc.m programs were downloaded from Frey and Dueck’s
(2007) and Brusco and Köhn’s (2008a) websites, respectively. All algorithms were implemented
on a 2.4 GHz, Intel Core 2 Duo Processor with 3 GB of SDRAM. For each of the seven data
sets, we began by attempting to obtain a solution using apcluster.m for each of eight dif-
ferent preference vector settings. Solutions were successfully obtained for six of the seven data
sets; however, apcluster.m crashed on the RL5934 data set because of insufficient mem-
ory. Thus, apcluster.m produced solutions for 48 test problems (six data sets ×8 preference
vectors), and we stored the number of exemplars corresponding to each test problem. Next, we
applied vsh_fc.m, sa1.m (once using c = .8 and once using c = .9) and sa2.m to the 48 test
problems using the number of exemplars determined by apcluster.m for each problem. Our
main study is a comparison of the five methods (apcluster.m, vsh_fc.m, sa1.m (c = .8),
sa1.m (c = .9), and sa2.m) across these 48 problems.
In addition, because the memory requirements and computation times for sa1.m did not
preclude its implementation for RL5934, we conducted a follow-up study of this algorithm using
the PCB3038 and RL5934 data sets at p = 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 300, 500, 700,
and 900 clusters. The sa1.m results for these test problems were compared to benchmark objec-
tive values published by Resende and Werneck (2004). The benchmarks correspond to the best-
known objective values obtained across Resende and Werneck’s hybrid heuristic, Hansen et al.’s
(2001) variable neighborhood search procedure, and Taillard’s (2003) decomposition heuristic.

4.4. Experimental Results


4.4.1. Main Study Results Table 1 displays, for each heuristic and for each of the 48 test
problems, the best objective function value identified during the execution of the heuristic.
Table 2 provides, for each heuristic and each test problem, a summary of the percentage de-
viation from the best-found objective function value, as well as the total computation time.
The results in Table 2 reveal that apcluster.m produces solutions to the p-median prob-
lem in less time than the simulated annealing heuristics. In contrast, vsh_fc.m is often faster
than apcluster.m when N and p are small, but much slower at larger values of N and (es-
pecially) p. The computation times for apcluster.m are modest across all test problems,
ranging from 5 seconds for several of the “Travel Routing” problems to 758 seconds for the
“PCB3038” test problem at p = 765. Although always requiring at least three times the amount
of time as apcluster.m, the sa1.m heuristic with c = .8 was also reasonably efficient with
computation times ranging from 88 to 2,068 seconds. Increasing the cooling parameter from .8
to .9 roughly doubled the computation time for sa1.m in most instances, producing a range of
181 to 4,162 seconds. The sa2.m heuristic, which was typically least efficient, yielded a compu-
tation time range of 310 to 8,256 seconds. The explosion of computation time for vsh_fc.m as
p increases is readily apparent from Table 2. Consider, for example, the FaceVideo data, where
vsh_fc.m is the most efficient heuristic when p = 7 or 11. However, at p = 239 or 591 for the
same data set, apcluster.m and sa1.m (at both c = .8 and .9) are much more efficient than
vsh_fc.m.
With respect to solution quality, the rank order of the performances of the five methods
was: (1) sa1.m with c = .9, (2) sa1.m with c = .8, (3) vsh_fc.m, and (4) apcluster.m
and sa2.m in a virtual tie for worst performance. We define the “best-found solution” as the
one corresponding to the best objective function value obtained across the five methods. When
using c = .9, the sa1.m heuristic obtained the best-found solution for 40 of the 48 problem in-
stances, whereas 32 best-found solutions were identified when c = .8. The vsh_fc.m, sa2.m,
and apcluster.m methods were somewhat worse, yielding 21, 13, and 4 best-found solutions, respectively.

TABLE 1.
Objective values (z1 ) for apcluster.m, vsh_fc.m, sa1.m, and sa2.m.

Test problem   Preferences θ   p   apcluster.m   vsh_fc.m   sa1.m (c = 0.8)   sa1.m (c = 0.9)   sa2.m
Travel −6000 6 −65185.00 −65185.00 −65185.00 −65185.00 −65185.00
routing −4500 7 −60960.00 −60654.00 −60654.00 −60654.00 −60654.00
N = 456 −3750 8 −57692.00 −56883.00 −56883.00 −56883.00 −56883.00
−3000 9 −53959.00 −53567.00 −53567.00 −53567.00 −53567.00
−2800 10 −51149.00 −50805.00 −50805.00 −50805.00 −50805.00
−2650 11 −48474.00 −48197.00 −48197.00 −48197.00 −48197.00
−2500 12 −45866.00 −45802.00 −45802.00 −45802.00 −45802.00
−2200 13 −43955.00 −43633.00 −43633.00 −43633.00 −43633.00

Face −600 5 −18432.91 −18328.66 −18328.66 −18328.66 −18328.66
clustering −450 9 −16545.24 −16315.23 −16315.23 −16315.23 −16373.26
N = 900 −300 12 −15590.31 −15422.65 −15422.54 −15408.62 −15455.77
−240 15 −14823.26 −14660.80 −14660.80 −14660.80 −14754.73
−180 19 −13948.09 −13856.01 −13852.89 −13852.89 −13950.33
−120 30 −12348.56 −12248.83 −12273.73 −12260.53 −12349.02
−60 62 −9734.72 −9692.34 −9690.78 −9685.84 −9783.47
−30 121 −7306.19 −7306.92 −7299.38 −7298.49 −7401.37

Extended −600000 68 −18459663.39 −18480931.87 −18369271.83 −18369271.83 −18832802.20
circuit −400000 72 −16444682.65 −16515167.12 −16296022.89 −16296413.34 −16730154.24
board −200000 103 −7772316.75 −7785956.03 −7756085.14 −7744505.99 −7952671.40
N = 1,272 −100000 138 −3215589.54 −3215402.66 −3224294.43 −3217615.02 −3250815.65
−50000 146 −2767469.98 −2767469.98 −2767469.98 −2767469.98 −2795786.66
−25000 180 −1529037.54 −1520844.30 −1521121.69 −1518270.03 −1531035.94
−10000 246 −334279.45 −348001.63 −334279.45 −334279.45 −334934.94
−5000 269 −186264.73 −186056.95 −185988.10 −185988.10 −186462.41

Face −600000 7 −41565688.00 −41565688.00 −41565688.00 −41565688.00 −41565688.00
video −400000 11 −40242610.00 −40236081.00 −40236081.00 −40236081.00 −40236081.00
N = 1,965 −200000 19 −38254719.00 −38239532.00 −38239532.00 −38239532.00 −38239532.00
−100000 40 −35365836.00 −35351635.00 −35351635.00 −35351635.00 −35351635.00
−50000 100 −31349521.00 −31334347.00 −31334347.00 −31334347.00 −31349201.00
−40000 146 −29339154.00 −29333083.00 −29336020.00 −29332189.00 −29352175.00
−30000 239 −26123503.00 −26124045.00 −26123175.00 −26123175.00 −26158972.00
−20000 591 −17694235.00 −17693778.00 −17693491.00 −17694189.00 −17738816.00

FL1400 −3000 18 −63117.09 −62957.09 −62957.09 −62957.09 −64157.57
N = 1,400 −2000 20 −57870.33 −57857.94 −57857.94 −57857.94 −59264.55
−1175 30 −44056.46 −44183.03 −44013.48 −44013.48 −45237.48
−600 47 −30762.33 −30713.31 −30713.31 −30709.99 −31582.44
−300 71 −22887.45 −21974.44 −21873.84 −21873.84 −22629.92
−100 125 −14676.94 −14066.88 −14037.17 −14030.79 −14392.96
−40 212 −9500.67 −8947.86 −8908.08 −8902.26 −9175.20
−20 316 −7022.84 −6369.40 −6321.80 −6330.42 −6450.04

PCB3038 −4800 58 −471024.86 −470421.34 −469881.02 −469362.74 −484493.49
N = 3,038 −3600 67 −438683.98 −437403.93 −436409.32 −435931.68 −449834.63
−2400 93 −370741.88 −368447.66 −366960.03 −366935.42 −380013.42
−1200 145 −290271.56 −288671.84 −287056.63 −286792.65 −297725.90
−800 201 −240336.79 −239189.79 −237855.19 −237892.60 −247042.14
−400 301 −189675.27 −188539.81 −187400.74 −187418.82 −194238.33
−200 482 −140046.21 −139848.94 −138915.43 −138959.87 −143332.13
−100 765 −99312.77 −99774.22 −98993.35 −98868.81 −100794.17

Moreover, the mean percentage deviations from the best-found objective function values for the sa1.m (c = .9), sa1.m (c = .8), vsh_fc.m, sa2.m, and apcluster.m methods were .008%, .032%, .289%, 1.345%, and 1.007%, respectively.

TABLE 2.
Percentage error and computation times for apcluster.m, vsh_fc.m, sa1.m, and sa2.m.

Test problem   p   Percentage deviation from best-found solution            Computation time (in seconds)
                   apcluster  vsh_fc  sa1.m (c=.8)  sa1.m (c=.9)  sa2.m     apcluster  vsh_fc  sa1.m (c=.8)  sa1.m (c=.9)  sa2.m
Travel 6 0.000 0.000 0.000 0.000 0.000 5 3 146 306 318
routing 7 0.505 0.000 0.000 0.000 0.000 5 3 145 305 318
N = 456 8 1.422 0.000 0.000 0.000 0.000 6 4 144 304 318
9 0.732 0.000 0.000 0.000 0.000 6 4 145 304 310
10 0.677 0.000 0.000 0.000 0.000 8 4 146 307 310
11 0.575 0.000 0.000 0.000 0.000 7 4 141 297 314
12 0.140 0.000 0.000 0.000 0.000 7 5 144 304 307
13 0.738 0.000 0.000 0.000 0.000 5 5 143 302 304

Face 5 0.569 0.000 0.000 0.000 0.000 28 11 94 196 627
clustering 9 1.410 0.000 0.000 0.000 0.356 22 18 103 209 616
N = 900 12 1.179 0.091 0.090 0.000 0.306 22 23 94 181 584
15 1.108 0.000 0.000 0.000 0.641 28 28 88 190 575
19 0.687 0.023 0.000 0.000 0.703 27 33 92 191 567
30 0.718 0.000 0.108 0.000 0.722 24 51 92 192 544
62 0.505 0.067 0.051 0.000 1.008 34 95 107 209 546
121 0.106 0.116 0.012 0.000 1.410 25 149 113 229 537

Extended 68 0.492 0.608 0.000 0.000 2.523 48 217 354 736 1770
circuit 72 0.912 1.345 0.000 0.002 2.664 40 225 370 765 1749
board 103 0.359 0.535 0.150 0.000 2.688 52 286 454 944 1673
N = 1,272 138 0.000 0.000 0.271 0.063 1.095 51 360 509 1058 1602
146 0.000 0.000 0.000 0.000 1.023 50 387 531 1108 1606
180 0.709 0.170 0.188 0.000 0.841 60 442 544 1130 1577
246 0.000 4.105 0.000 0.000 0.196 50 546 615 1279 1563
269 0.149 0.037 0.000 0.000 0.255 49 560 626 1306 1561

Face 7 0.000 0.000 0.000 0.000 0.000 86 55 576 1204 3636
Video 11 0.016 0.000 0.000 0.000 0.000 96 80 581 1216 3573
N = 1,965 19 0.040 0.000 0.000 0.000 0.000 107 139 509 1057 3396
40 0.040 0.000 0.000 0.000 0.000 119 276 485 1028 3350
100 0.048 0.000 0.000 0.000 0.047 116 639 494 1037 3347
146 0.024 0.003 0.013 0.000 0.068 113 892 577 1204 3411
239 0.001 0.003 0.000 0.000 0.137 118 1324 556 1141 3443
591 0.004 0.002 0.000 0.004 0.256 123 2185 582 1207 3615

FL1400 18 0.254 0.000 0.000 0.000 1.907 41 90 448 934 1747
N = 1,400 20 0.021 0.000 0.000 0.000 2.431 43 96 475 987 1813
30 0.098 0.385 0.000 0.000 2.781 59 139 483 1016 1751
47 0.170 0.011 0.011 0.000 2.841 46 214 451 933 1728
71 4.634 0.460 0.000 0.000 3.457 126 276 495 1026 1819
124 4.605 0.257 0.045 0.000 2.581 125 392 794 1208 1862
212 6.722 0.512 0.065 0.000 3.066 126 558 883 1847 1857
313 11.089 0.753 0.000 0.136 2.029 141 603 863 1786 1736

PCB3038 58 0.354 0.226 0.110 0.000 3.224 301 1542 1518 3042 8256
N = 3,038 67 0.631 0.338 0.110 0.000 3.189 471 1752 1477 3171 8172
93 1.037 0.412 0.007 0.000 3.564 277 2225 1683 3474 8205
145 1.213 0.655 0.092 0.000 3.812 302 2998 1519 3167 7634
201 1.043 0.561 0.000 0.016 3.862 310 3774 1504 3219 7566
301 1.214 0.608 0.000 0.010 3.649 316 4802 1741 3593 7752
482 0.814 0.672 0.000 0.032 3.179 617 6472 1822 3723 7443
765 0.449 0.916 0.126 0.000 1.947 758 7315 2068 4162 7821

Both versions of sa1.m produced an objective value that was as good as or better than that of sa2.m for all 48 test prob-
lems. The apcluster.m algorithm obtained a better solution than both versions of sa1.m in
one instance (the ExtendedCircuitBoard problem with p = 138) but yielded worse solutions than
sa1.m on the vast majority of test problems.

4.4.2. Follow-up Study Results With respect to the PCB3038 and RL5934 test problems
in the follow-up study, the results in Table 3 show that sa1.m performed well relative to the
benchmark objective values published by Resende and Werneck (2004). The average percent-
age deviations from the benchmark values for sa1.m were .056% and .041% for (c = .8) and
(c = .9), respectively. For at least one of the two cooling factor settings, sa1.m matched the
benchmark objective values for PCB3038 at p = 10, 20, 40, and 60 and RL5934 at p = 20 and
30. More impressively, sa1.m obtained objective values that are better than Resende and Wer-
neck’s benchmarks for PCB3038 at p = 30 and p = 100, as well as RL5934 at p = 40 and
p = 100.
It is important to reiterate that the benchmarks used for comparison purposes in Table 3 cor-
respond to the best objective values across multiple metaheuristics (Hansen et al., 2001; Resende
& Werneck, 2004, Taillard, 2003). A direct comparison of our proposed method with these meta-
heuristics is complicated by the fact that they were typically implemented using more computa-

TABLE 3.
A comparison of sa1.m to benchmark objective values (z1 ) for PCB3038 and RL5934 problems.

Test problem   p   Benchmark* (z1)   Objective values                   Percentage deviation from best known   Computation time (in seconds)
                                     sa1.m (c=0.8)    sa1.m (c=0.9)     sa1.m (c=0.8)    sa1.m (c=0.9)         sa1.m (c=0.8)    sa1.m (c=0.9)
PCB3038 10 −1213082.03 −1213082.03 −1213082.03 0.000 0.000 1404 2909
N = 3,038 20 −840844.53 −840844.53 −840844.53 0.000 0.000 1504 3121
30 −677306.76 −677272.22 −677837.61 −0.005 0.078 1362 2895
40 −571887.75 −571887.75 −572032.43 0.000 0.025 1497 3109
50 −507582.13 −508146.54 −507716.05 0.111 0.026 1485 3065
60 −460771.87 −460771.87 −460771.87 0.000 0.000 1445 3031
70 −426068.24 −426228.17 −426268.06 0.038 0.047 1539 3189
80 −397529.25 −397541.63 −397592.50 0.003 0.016 1627 3302
90 −373248.08 −373493.30 −373768.27 0.066 0.139 1594 3340
100 −352628.35 −352735.66 −352610.23 0.030 −0.005 1555 3240
300 −187712.12 −187896.60 −187760.70 0.098 0.026 1528 3088
500 −135467.85 −135680.37 −135577.07 0.157 0.081 1747 3620
700 −105584.40 −105990.96 −105996.00 0.385 0.390 1792 3655
900 −86984.10 −87136.26 −87034.69 0.175 0.058 1785 3705
RL5934 10 −9794951.00 −9796928.84 −9794973.65 0.020 0.000 4948 9530
N = 5,934 20 −6718848.19 −6718848.19 −6718848.19 0.000 0.000 5081 10233
30 −5374936.14 −5377477.79 −5374936.14 0.047 0.000 6289 12102
40 −4550364.60 −4550327.09 −4550327.09 −0.001 −0.001 6235 12942
50 −4032379.97 −4032411.54 −4032538.93 0.001 0.004 6038 12358
60 −3642397.88 −3645378.47 −3643935.28 0.082 0.042 6384 13812
70 −3343712.45 −3345335.92 −3345227.05 0.049 0.045 6574 14018
80 −3094824.49 −3097003.46 −3095423.38 0.070 0.019 6842 14569
90 −2893362.39 −2894864.53 −2894233.57 0.052 0.030 6663 13919
100 −2725180.81 −2725133.64 −2726233.80 −0.002 0.039 5704 11638
300 −1394115.39 −1394533.60 −1394314.35 0.030 0.014 7605 15793
500 −973995.18 −974641.60 −974258.22 0.066 0.027 6912 14250
700 −752068.38 −752361.91 −752126.98 0.039 0.008 12861 26691
900 −613367.44 −613647.27 −613660.71 0.046 0.048 8585 17678
* This column contains the best-known objective values published by Resende and Werneck (2004).

More specifically, these metaheuristics were
implemented on workstations using compiled Fortran or C++ programs, whereas our sa1.m
program is implemented on a personal computer as a MATLAB m-file, which is not compiled.
Nevertheless, the relative performance of sa1.m is exceptional. For example, in a head-to-head
comparison with published results for Hansen et al.’s (2001, pp. 347–348) variable neighborhood
search heuristic, sa1.m (c = .8) provided a better objective value for 19 of the 28 test problems
in Table 3. For c = .9, sa1.m obtained a better objective value for 21 of the 28 problems.

4.4.3. Summary of Main Findings In summary, the salient findings of our methodological
comparisons can be succinctly stated as follows:
1. Our new simulated annealing heuristic substantially outperformed Chiyoshi and Galvão’s
(2000) simulated annealing implementation across the 40 PMP test problems in the OR-
Library.
2. Simulated annealing worked much better with temperature reduction (sa1.m) than it did
with temperature adjustment (sa2.m).
3. The sa1.m heuristic performed well at both parameter settings. Increasing the cooling
parameter from c = .8 to c = .9 produced a slight improvement in average solution qual-
ity but roughly doubled the computation time.
4. Although computationally efficient relative to simulated annealing, apcluster.m sel-
dom achieves the best-found solution and can exhibit severe departures from the best-
found objective value in many instances. These findings support those reported by Br-
usco and Köhn (2008a). Moreover, apcluster.m appears to have greater computa-
tional storage requirements than sa1.m given that the former method ran out of memory
on problem RL5934.
5. Although computationally efficient and apt to identify the best-found objective value
when p is small, the relative performance of vsh_fc.m declines significantly as p in-
creases.
6. For the largest problem instances, PCB3038 and RL5934, sa1.m is competitive with the
best p-median heuristics in the literature.

5. Application: Telecommunication Customer Satisfaction Ratings

We applied Frey and Dueck’s (2007) affinity propagation program, Brusco and Köhn’s
(2008a) implementation of VSH, and simulated annealing to a telecommunications data set
originally studied by Brusco, Cradit, and Tashchian (2003) and more recently considered by
Brusco and Köhn (2008b). The data consist of N = 475 responses from corporate technology
managers regarding their satisfaction with the services of their current long-distance service
provider on V = 5 dimensions: (v1 ) price competitiveness, (v2 ) account executive responsive-
ness, (v3 ) billing accuracy, (v4 ) product offering adequacy, and (v5 ) repair service responsive-
ness. Each variable was measured using a five-point Likert scale, and the similarity matrix was
obtained based on the negative Euclidean distances between pairs of respondents.
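For readers who wish to reproduce this preprocessing, a hedged sketch follows, assuming the MATLAB Statistics Toolbox functions pdist and squareform; A is an N × V matrix of the Likert ratings, a name of our choosing rather than one from the original study.

    % Negative Euclidean distances between respondents as similarities;
    % the diagonal is zero by construction, matching the convention s_ii = 0.
    S = -squareform(pdist(A, 'euclidean'));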
The vsh_fc.m and sa1.m (c = 0.8) algorithms were applied to the telecommunications
data set for a range of 2 ≤ p ≤ 10 clusters. The preference weights of the apcluster.m program
were adjusted by trial-and-error to produce solutions for this same range of p. Ten additional sets
of preference weights were also evaluated for apcluster.m; however, the algorithm failed to
converge in two instances. For comparison purposes, the vsh_fc.m and sa1.m algorithms
were subsequently used to produce solutions using the number of clusters selected by apclus-
ter.m. Finally, for benchmarking purposes, the globally optimal solution for each value of p was obtained using the procedure described by Brusco and Köhn (2008b). The results are provided in Table 4.

TABLE 4.
A comparison of objective values (z1 ) for apcluster.m, vsh_fc.m, and sa1.m for the telecommunications data
set for different values of p (suboptimal values are shown in bold).

Preferences   p   Global optimum   apcluster.m   vsh_fc.m   sa1.m (c = 0.8)
−100 2 −795.0734 −795.0816 −795.0734 −795.0734
−70 3 −709.0435 −709.0517 −709.0435 −709.0435
−60 4 −656.9141 −656.9141 −656.9141 −656.9141
−50 5 −615.3340 −618.6273 −615.3340 −615.3340
−23 6 −591.3238 −591.7196 −591.3238 −591.3238
−20 7 −569.3812 −576.2291 −569.3812 −569.3812
−16 8 −551.5746 −551.5746 −551.5746 −551.5746
−14 9 −537.0994 −537.0994 −537.0994 −537.0994
−13 10 −522.7462 −523.8997 −522.7462 −522.7462
−10 15 −464.6955 −466.1379 −464.6955 −464.6955
−9 16 −454.6059 −454.6059 −454.6059 −454.6059
−8 18 −436.8337 −436.8337 −436.8337 −436.8337
−7 21 −415.4626 −415.7108 −415.4626 −415.4626
−6 26 −384.4445 −385.1051 −384.4445 −384.4445
−5 31 −358.2898 −359.1136 −358.3604 −358.2898
−4 37 −331.3577 −331.8430 −331.3577 −331.3577
−3 n/a n/a Failed to converge n/a n/a
−2 n/a n/a Failed to converge n/a n/a
−1 160 −84.0000 −122.7990 −84.0000 −84.0000

TABLE 5.
A comparison of the 7-cluster partitions obtained for the telecommunications data set
using sa1.m and apcluster.m (the ‘Medians’ column contains the measurements of
the cluster exemplar for each of the five variables).

Cluster   sa1.m solution             apcluster.m solution
          Size      Medians          Size      Medians
1 63 55555 63 55555
2 91 44444 96 44444
3 63 44344 71 44344
4 88 43444 72 43433
5 87 43333 65 42444
6 52 32333 51 33333
7 31 22232 57 32232

There are 19 different preference weights in Table 4; however, because apcluster.m
failed to converge for weights of −3 and −2, the comparison is limited to 17 different values
of p. The sa1.m algorithm obtained the globally optimal solution for all 17 problems, whereas
vsh_fc.m provided the optimum in all but one instance (p = 31). Contrastingly, apclus-
ter.m obtained the global optimum in only 5 of the 17 instances. From a practical standpoint,
it is important to recognize that suboptimality of apcluster.m can have serious ramifications
for interpretation of the clustering solution. To illustrate, we present a comparison in Table 5 of
the 7-cluster partitions for apcluster.m and sa1.m (alternatively, vsh_fc.m, which pro-
duced the same solution as sa1.m).

Table 5 reveals that, although apcluster.m and sa1.m yielded the same medians for
clusters 1, 2, and 3, the remaining four medians are different. Most noteworthy among the dif-
ferences is the fact that clusters 6 and 7 of the sa1.m partition reflect a more extreme level of
dissatisfaction than the corresponding clusters for the apcluster.m partition. For example,
consider that the median measures for cluster 7 are the same except that the sa1.m partition
has a more extreme measure of ‘2’ on the first variable in comparison to ‘3’ for the apcluster.m
partition. Similarly, cluster 6 of the sa1.m and apcluster.m partitions are the same, except
for the second variable where the sa1.m partition has a more extreme median of ‘2’. Although
these differences might seem subtle at first glance, they yield marked differences in cluster mem-
berships. For example, we computed a value of .639 for Hubert and Arabie’s (1985) adjusted
Rand index (ARI) as a measure of agreement between the two partitions. The ARI achieves a
value of one for perfect agreement and a value of zero for chance agreement. Based on Steinley’s
(2004) guidelines, a value of .639 would reflect a mediocre level of agreement between the two
partitions.
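For reference, the pair-counting form of the ARI is easy to compute; the sketch below (our own helper, assuming integer cluster labels in p1 and p2) follows Hubert and Arabie's (1985) formula.

    function ari = adj_rand(p1, p2)
    % Adjusted Rand index from the contingency table of two partitions.
        n = numel(p1);
        M = accumarray([p1(:), p2(:)], 1);   % contingency table of label pairs
        sumij = sum(sum(M .* (M - 1))) / 2;  % pairs together in both partitions
        ai = sum(M, 2);  bj = sum(M, 1);
        sa = sum(ai .* (ai - 1)) / 2;        % pairs together in partition 1
        sb = sum(bj .* (bj - 1)) / 2;        % pairs together in partition 2
        expectedIndex = sa * sb / (n * (n - 1) / 2);
        ari = (sumij - expectedIndex) / ((sa + sb) / 2 - expectedIndex);
    end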

6. Summary and Conclusions

Our objective in this paper was to build on a recent stream of research pertaining to the importance of the p-median problem and exemplar-based clustering (Alba & Dominguez, 2006; Avella et al., 2007; Beltran et al., 2006; Brusco & Köhn, 2008a, 2008b; Frey & Dueck, 2007, 2008; Hansen & Mladenović, 2008; Mladenović et al., 2007). Although vital for theoretical purposes, exact procedures are infeasible for large matrices, and most of the reported computational results for such methods are limited to similarity/dissimilarity data obtained from Euclidean distances between points in two-dimensional space (see, for example, Avella et al., 2007; Beltran et al., 2006; Brusco & Köhn, 2008a). The performance of these exact procedures for data structures that are asymmetric and/or violate the triangle inequality has not been demonstrated (see, for example, Hansen et al., 2001).
The approximate procedure termed “affinity propagation,” which was offered by Frey and Dueck (2007), is capable of obtaining reasonable solutions in modest computation time. However, Brusco and Köhn (2008a) showed that affinity propagation seldom obtains the globally optimal solution for small data sets, and our results indicate that this shortcoming persists for larger problems. Moreover, we found that apcluster.m also exhibits convergence difficulties in some instances. Brusco and Köhn's MATLAB implementation of the VSH performs well for problems where the number of clusters is 50 or fewer; however, its computation time explodes, and its performance degrades, as p increases.
We have developed a new simulated annealing heuristic for the p-median problem that substantially outperformed Chiyoshi and Galvão's (2000) implementation of simulated annealing across the test problems in the OR-Library. The new simulated annealing heuristic also overcomes the limitations of apcluster.m and vsh_fc.m. Although sa1.m requires more computation time than apcluster.m, the former method typically obtains solutions with better objective values for both large and small values of N and p. Moreover, sa1.m is less susceptible to the computer memory and convergence problems encountered by apcluster.m in our experiments. Relative to vsh_fc.m, sa1.m provides appreciably better objective function values in less time when N and p are large. We have shown that simulated annealing can produce exceptional results for the p-median problem; however, it is important to acknowledge that globally optimal solutions are not guaranteed. Nevertheless, the quality of the solutions obtained for the largest test problem in our study, RL5934, is extremely competitive with two of the best-performing methods in the literature: (a) Hansen et al.'s (2001) variable neighborhood search method, and (b) Resende and Werneck's (2004) hybrid heuristic.
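To convey the essential mechanics, a minimal Python sketch of a swap-based simulated annealing procedure for the p-median problem appears below. The function and parameter names (t0, alpha, moves, t_min) are our own illustrative choices; sa1.m employs a more carefully designed neighborhood, cooling schedule, and objective-update strategy than this sketch.

    import math
    import random

    def pmedian_objective(D, medians):
        # Sum, over all objects, of the dissimilarity to the nearest median.
        return sum(min(row[j] for j in medians) for row in D)

    def anneal_pmedian(D, p, t0=1.0, alpha=0.9, moves=100, t_min=1e-4, seed=0):
        rng = random.Random(seed)
        n = len(D)
        medians = set(rng.sample(range(n), p))
        best, z = set(medians), pmedian_objective(D, medians)
        z_best, t = z, t0
        while t > t_min:
            for _ in range(moves):
                # Swap move: exchange a current median for a nonselected object.
                out = rng.choice(sorted(medians))
                into = rng.choice([j for j in range(n) if j not in medians])
                trial = (medians - {out}) | {into}
                z_trial = pmedian_objective(D, trial)
                delta = z_trial - z
                # Metropolis rule: accept improvements outright; accept
                # deteriorations with probability exp(-delta / t).
                if delta <= 0 or rng.random() < math.exp(-delta / t):
                    medians, z = trial, z_trial
                    if z < z_best:
                        best, z_best = set(medians), z
            t *= alpha  # geometric cooling
        return best, z_best

The acceptance of occasional deteriorating swaps at high temperatures is what allows the procedure to escape the local optima that trap purely greedy interchange heuristics.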
Finally, it is important to clarify that we are not purporting that the p-median model should replace within-cluster sums of squares (K-means) partitioning as the most prominent discrete optimization approach for cluster analysis. However, we do feel that the p-median model should be considered a plausible alternative to K-means because it possesses several desirable properties: (a) the provision of exemplars as centroids, (b) the flexibility to accommodate both metric and nonmetric data, and (c) the flexibility to handle asymmetric and even rectangular proximity matrices. Through the development of software programs that can efficiently produce high-quality solutions to p-median problems, we hope to afford researchers the opportunity to at least evaluate the p-median approach as an alternative to K-means.
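To make property (c) concrete: evaluating the p-median criterion requires nothing more than a table of dissimilarities from each object to each candidate exemplar, so asymmetric or even rectangular input poses no difficulty. A minimal sketch follows, using a hypothetical 4 × 3 instance; all names and numbers are illustrative.

    def pmedian_value(D, exemplars):
        # D[i][j]: dissimilarity from object i to candidate exemplar j.  The
        # matrix may be asymmetric or rectangular and need not satisfy the
        # triangle inequality.
        return sum(min(row[j] for j in exemplars) for row in D)

    # A hypothetical 4 x 3 instance: four objects, three candidate exemplars.
    D = [[0.0, 2.5, 4.0],
         [1.5, 0.5, 3.0],
         [4.2, 3.1, 0.0],
         [2.0, 2.0, 1.0]]
    print(pmedian_value(D, {0, 2}))  # 2.5: each object joins its nearer exemplar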
References
Aarts, E., & Korst, J. (1989). Simulated annealing and Boltzmann machines: A stochastic approach to combinatorial
optimization and neural computing. New York: Wiley.
Alba, E., & Dominguez, E. (2006). Comparative analysis of modern optimization tools for the p-median problem. Sta-
tistics and Computing, 16, 251–260.
Alp, O., Erkut, E., & Drezner, Z. (2003). An efficient genetic algorithm for the p-median problem. Annals of Operations
Research, 122, 21–42.
Avella, P., Sassano, A., & Vasil’ev, I. (2007). Computational study of large-scale p-median problems. Mathematical
Programming A, 109, 89–114.
Beasley, J.E. (1990). OR-Library: Distributing test problems by electronic mail. Journal of the Operational Research
Society, 41, 1069–1072.
Beltran, C., Tadonki, C., & Vial, J. (2006). Solving the p-median problem with a semi-Lagrangian relaxation. Computa-
tional Optimization and Applications, June 5, 2006, doi:10.1007/s10589-006-6513-6.
Brusco, M.J., Cradit, J.D., & Tashchian, A. (2003). Multicriterion clusterwise regression for joint segmentation settings:
An application to customer value. Journal of Marketing Research, 40, 225–234.
Brusco, M.J., & Köhn, H.-F. (2008a). Comment on ‘Clustering by passing messages between data points’. Science, 319,
726.
Brusco, M.J., & Köhn, H.-F. (2008b). Optimal partitioning of a data set based on the p-median model. Psychometrika,
73, 89–105.
Brusco, M.J., Köhn, H.-F., & Stahl, S. (2008). Heuristic implementation of dynamic programming for matrix permutation
problems in combinatorial data analysis. Psychometrika, 73, 503–522.
Brusco, M.J., & Steinley, D. (2007). A comparison of heuristic procedures for minimum within-cluster sums of squares
partitioning. Psychometrika, 72, 583–600.
Ceulemans, E., & Van Mechelen, I. (2008). CLASSI: A classification model for the study of sequential processes and
individual differences therein. Psychometrika, 73, 107–124.
Ceulemans, E., Van Mechelen, I., & Leenen, I. (2007). The local minima problem in hierarchical classes analysis:
An evaluation of a simulated annealing algorithm and various multistart procedures. Psychometrika, 72, 377–391.
Chiyoshi, F., & Galvão, R.D. (2000). A statistical analysis of simulated annealing applied to the p-median problem.
Annals of Operations Research, 96, 61–74.
Christofides, N., & Beasley, J.E. (1982). A tree search algorithm for the p-median problem. European Journal of Oper-
ational Research, 10, 196–204.
Cornuejols, G., Fisher, M.L., & Nemhauser, G.L. (1977). Location of bank accounts to optimize float: An analytic study
of exact and approximate algorithms. Management Science, 23, 789–810.
Du Merle, O., & Vial, J.-P. (2002). Proximal-ACCPM, a cutting plane method for column generation and Lagrangian
relaxation: application to the p-median problem (Technical report 2002.23). HEC Genève, University of Genève.
Forgy, E.W. (1965). Cluster analyses of multivariate data: Efficiency versus interpretability of classifications. Biometrics,
21, 768.
Frey, B., & Dueck, D. (2007). Clustering by passing messages between data points. Science, 315, 972–976.
Frey, B., & Dueck, D. (2008). Response to comment on “Clustering by passing messages between data points”. Science,
319, 726.
Galvão, R.D. (1980). A dual-bounded algorithm for the p-median problem. Operations Research, 28, 1112–1121.
Hanjoul, P., & Peeters, D. (1985). A comparison of two dual-based procedures for solving the p-median problem. Euro-
pean Journal of Operational Research, 20, 387–396.
Hansen, P., & Mladenović, N. (1997). Variable neighborhood search for the p-median. Location Science, 5, 207–226.
Hansen, P., & Mladenović, N. (2008). Complement to a comparative analysis of heuristics for the p-median problem.
Statistics and Computing, 18, 41–46.
Hansen, P., Mladenović, N., & Perez-Brito, D. (2001). Variable neighborhood decomposition search. Journal of Heuris-
tics, 7, 335–350.
Hartigan, J.A., & Wong, M.A. (1979). Algorithm AS136: A k-means clustering program. Applied Statistics, 28, 100–
128.
Howard, R.N. (1966). Classifying a population into homogeneous groups. In J.R. Lawrence (Ed.), Operational research and social sciences (pp. 585–594). London: Tavistock.
Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2, 193–218.
Kaufman, L., & Rousseeuw, P.J. (1990). Finding groups in data: an introduction to cluster analysis. New York: Wiley.
Kirkpatrick, S., Gelatt, C.D., & Vecchi, M.P. (1983). Optimization by simulated annealing. Science, 220, 671–680.
Klastorin, T. (1985). The p-median problem for cluster analysis: A comparative test using the mixture model approach.
Management Science, 31, 84–95.
Kuehn, A.A., & Hamburger, M.J. (1963). A heuristic program for locating warehouses. Management Science, 9, 643–
666.
Levanova, T., & Loresh, M.A. (2004). Algorithms of ant system and simulated annealing for the p-median problem.
Automation and Remote Control, 65, 431–438.
Lin, S., & Kernighan, B.W. (1973). An effective heuristic algorithm for the traveling salesman problem. Operations
Research, 21, 498–516.
MacQueen, J.B. (1967). Some methods for classification and analysis of multivariate observations. In L.M. Le Cam &
J. Neyman (Eds.), Proceedings of the fifth Berkeley symposium on mathematical statistics and probability (Vol. 1,
pp. 281–297). Berkeley: University of California Press.
Maranzana, F.E. (1964). On the location of supply points to minimize transportation costs. Operational Research Quar-
terly, 15, 261–270.
Mladenović, N., Brimberg, J., Hansen, P., & Moreno-Pérez, J.A. (2007). The p-median problem: A survey of metaheuris-
tic approaches. European Journal of Operational Research, 179, 927–939.
Moreno-Pérez, J.A., García-Roda, J.L., & Moreno-Vega, J.M. (1994). A parallel genetic algorithm for the discrete
p-median problem. Studies in Location Analysis, 7, 131–141.
Mulvey, J.M., & Crowder, H.P. (1979). Cluster analysis: An application of Lagrangian relaxation. Management Science,
25, 329–340.
Murillo, A., Vera, J.-F., & Heiser, W.J. (2005). A permutation-translation simulated annealing algorithm for L1 and L2
unidimensional scaling. Journal of Classification, 22, 119–138.
Murray, A.T., & Church, R.L. (1996). Applying simulated annealing to location-planning models. Journal of Heuristics,
2, 31–53.
Narula, S.C., Ogbu, U.I., & Samuelsson, H.M. (1977). An algorithm for the p-median problem. Operations Research,
25, 709–713.
Rao, M.R. (1971). Cluster analysis and mathematical programming. Journal of the American Statistical Association, 66,
622–626.
Reinelt, G. (2001). TSPLIB. http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95.
Resende, M.G.C., & Werneck, R.F. (2003). On the implementation of a swap-based local-search procedure for the p-
median problem. In R.E. Ladner (Ed.), Proceedings of the fifth workshop on algorithm engineering and experiments
(pp. 119–127). Philadelphia: SIAM.
Resende, M.G.C., & Werneck, R.F. (2004). A hybrid heuristic for the p-median problem. Journal of Heuristics, 10,
59–88.
ReVelle, C.S., & Swain, R. (1970). Central facilities location. Geographical Analysis, 2, 30–42.
Rolland, E., Schilling, D.A., & Current, J.R. (1996). An efficient tabu search procedure for the p-median problem. European Journal of Operational Research, 96, 329–342.
Rosing, K.E. (1997). An empirical investigation of the effectiveness of a vertex substitution heuristic. Environment and Planning B, 24, 59–67.
Rosing, K.E., & ReVelle, C.S. (1997). Heuristic concentration: Two stage solution construction. European Journal of Operational Research, 97, 75–86.
Rosing, K.E., ReVelle, C.S., Rolland, E., Schilling, D.A., & Current, J.R. (1998). Heuristic concentration and tabu search: A head to head comparison. European Journal of Operational Research, 104, 93–99.
Steinhaus, H. (1956). Sur la division des corps matériels en parties. Bulletin de l'Académie Polonaise des Sciences, Classe III Mathématique, Astronomie, Physique, Chimie, Géologie, et Géographie, IV(12), 801–804.
Steinley, D. (2004). Properties of the Hubert-Arabie adjusted Rand index. Psychological Methods, 9, 386–396.
Steinley, D. (2006). K-means clustering: A half-century synthesis. British Journal of Mathematical and Statistical Psychology, 59, 1–34.
Taillard, E.D. (2003). Heuristic methods for large centroid clustering problems. Journal of Heuristics, 9, 51–74.
Teitz, M.B., & Bart, P. (1968). Heuristic methods for estimating the generalized vertex median of a weighted graph. Operations Research, 16, 955–961.
Thorndike, R.L. (1953). Who belongs in the family? Psychometrika, 18, 267–276.
van Laarhoven, P.J.M., & Aarts, E.H.L. (1987). Simulated annealing: Theory and applications. Dordrecht: Kluwer.
Vera, J.-F., Heiser, W.J., & Murillo, A. (2007). Global optimization in any Minkowski metric: A permutation-translation simulated annealing algorithm for multidimensional scaling. Journal of Classification, 24, 277–301.
Vinod, H. (1969). Integer programming and the theory of grouping. Journal of the American Statistical Association, 64, 506–517.
Whitaker, R. (1983). A fast algorithm for the greedy interchange of large-scale clustering and median location problems. INFOR, 21, 95–108.
Manuscript Received: 13 MAY 2008
Final Version Received: 27 JAN 2009
Published Online Date: 17 MAR 2009