AFOSR-TR-82-0264
UNCLASSIFIED
IMPLEMENTATION OF THE GIBBS-POOLE-STOCKMEYER AND GIBBS-KING ALGORITHMS
J. G. Lewis, Boeing Computer Services Co., Tukwila, WA
January 1982
Contract F49620-81-C-0072
Implementations of two effective matrix reordering algorithms were published in
this journal as Algorithms 508 [1], which implements the Gibbs-Poole-Stockmeyer
algorithm for reducing the bandwidth of a matrix, and 509 [9], which implements
the Gibbs-King algorithm for reducing the profile of a matrix. Reduction of
matrix profile is more often required than bandwidth reduction, but Algorithm
509 is substantially slower than Algorithm 508. Consequently, the
Gibbs-Poole-Stockmeyer algorithm has been recommended and used in contexts
where the better profile reduction provided by the Gibbs-King algorithm would
be more appropriate. In addition, Algorithms 508 and 509 both contain
unnecessary restrictions on problem size and provide little error checking.
We describe new implementations of the Gibbs-Poole-Stockmeyer and Gibbs-King
algorithms that remove these deficiencies.
Author's address: Boeing Computer Services Co., Mail Stop 9C-51, 565 Andover
Park West, Tukwila, WA 98188. This work was supported in part by the Air Force
Office of Scientific Research under contract F49620-81-C-0072.
Approved for public release; distribution unlimited.
1. Introduction

Many algorithms and programs which solve sparse linear equations use band or
variable band ("banded") matrix storage, particularly for matrices arising from
finite element discretizations. There is much recent work on more sophisticated
alternatives to banded storage, intended to achieve lower costs for large
problems (see [2] for a survey). However, determining the reordering and
setting up the data structures for these algorithms is substantially greater in
cost than for the banded reorderings provided by the Gibbs-Poole-Stockmeyer
algorithm (GPS) and the Gibbs-King algorithm (GK). As a result, banded and
variable banded linear equation solvers often have smaller total cost than
their more sophisticated counterparts. Banded orderings remain competitive,
even for large problems, on sequential machines like the CDC CYBER 175 computer
where arithmetic is fast and memory accesses are slow. Large problems can also
be solved with sophisticated sparse solvers which use banded algorithms to
solve subproblems [7]. All of these uses depend on effective bandwidth and
profile reduction algorithms.
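The two quantities being reduced have simple definitions, and a small sketch
makes them concrete. The following Python fragment (a modern illustration, not
part of the original FORTRAN codes) computes both, assuming the matrix's lower
triangle is given as a list of per-row column-index lists; the representation
and function names are ours.

```python
# Illustrative sketch (not from the original codes): bandwidth and profile of
# a symmetric matrix. rows[i] lists the column indices j <= i (0-based,
# diagonal included) where row i has a nonzero entry.

def bandwidth(rows):
    # Bandwidth: greatest distance of any nonzero from the diagonal.
    return max((i - min(cols) for i, cols in enumerate(rows) if cols), default=0)

def profile(rows):
    # Profile: sum over rows of the distance from the first nonzero in the
    # row to the diagonal (one common convention among several).
    return sum(i - min(cols, default=i) for i, cols in enumerate(rows))

# A 4x4 tridiagonal matrix: bandwidth 1, profile 3.
tridiag = [[0], [0, 1], [1, 2], [2, 3]]
```

A reordering that clusters each row's nonzeroes near the diagonal reduces both
quantities; the GPS algorithm targets the bandwidth, the GK variant the profile.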
The Gibbs-Poole-Stockmeyer algorithm [1], [11] has been shown to be an
effective tool for reducing the bandwidth and profile of finite element
matrices arising in structural engineering problems [4], [10], [16]. The
Gibbs-King variation [9] of this algorithm provides better reduction of the
profile and wavefront. Its implementation, Algorithm 509, is often much slower
in execution than its predecessor, Algorithm 508, which implemented the GPS
algorithm; speed of execution is an important virtue of Algorithm 508. We
present here new implementations of both algorithms, each faster than
Algorithm 508 or 509. The speedup for the GK algorithm is dramatic, and reduces
its execution time to nearly that of the quick GPS algorithm. In addition, the
use of these reorderings assumes that diagonal pivoting (symmetric row and
column interchanges) does not produce numerical instability.

It is convenient to describe the reordering algorithms in the context of the
graph of the matrix: each equation corresponds to a single node in the graph;
there is an edge or arc connecting nodes i and j only if the ij-th element of
the matrix is nonzero.
The two reordering algorithms consist of three phases, and they differ only
in the last phase:

1. Find the endpoints of a pseudodiameter of the graph, using rooted level
   structures.
2. Combine the level structures rooted at either end of the pseudodiameter
   into a more general level structure.
3. Number the nodes of the graph level by level. The GPS and GK algorithms
   differ only in the numbering used in this phase; the King-style numbering
   is referred to below as phase 3b.

Discussions of the details of these phases are given in [9] and in [11].
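Phase 1 can be pictured with a short sketch. The Python fragment below (a
modern illustration, not the published FORTRAN, with names of our choosing)
builds rooted level structures by breadth-first search and locates a
pseudodiameter by repeated searches, beginning from a node of minimum degree
as the codes discussed here do; the tie-breaking refinements of the real
algorithm are omitted.

```python
from collections import deque

def rooted_levels(adj, root):
    # Rooted level structure: the level (BFS distance) of every node
    # reachable from root. adj maps each node to a list of its neighbours.
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level

def pseudodiameter(adj):
    # Start from a node of minimum degree and repeatedly jump to a farthest
    # node until the eccentricity stops increasing (connected graph assumed).
    u = min(adj, key=lambda v: len(adj[v]))
    lev = rooted_levels(adj, u)
    while True:
        far = max(lev, key=lev.get)
        far_lev = rooted_levels(adj, far)
        if max(far_lev.values()) <= max(lev.values()):
            return u, far
        u, lev = far, far_lev
```

The loop terminates because the eccentricity strictly increases on every swap
and is bounded by the number of nodes.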
All of the phases involve creating or manipulating level structures. A level
structure is a partition of the nodes of the graph into equivalence classes
(levels) such that, for each node v, all of v's neighbors are in the same
equivalence class as v, the immediately preceding class, or the immediately
succeeding class. Rooted level structures are level structures whose first
level consists of a single node u; succeeding levels consist of nodes with
equal minimum distance to u. There are two obvious and compact representations
for such structures in FORTRAN:
1. a vector of length n giving, for each node, the index of the level to
   which it belongs;
2. a list of the nodes in each level, with a shorter vector giving the index
   of the initial node in the corresponding level.

These two representations provide equivalent information with differing costs.
The first form is efficient for deciding to which level a given node belongs.
The second form provides easy access to the set of nodes forming a given
level.
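Both representations can be built from a single breadth-first search, as the
following sketch shows (modern Python standing in for the FORTRAN vectors; the
variable names are ours).

```python
from collections import deque

def level_structure(adj, root):
    # adj: list of neighbour lists for nodes 0..n-1 (connected graph assumed).
    n = len(adj)
    level_of = [-1] * n          # representation 1: level index per node
    level_of[root] = 0
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if level_of[v] < 0:
                level_of[v] = level_of[u] + 1
                queue.append(v)
    # Representation 2: the nodes listed level by level, plus a shorter
    # vector giving the index of the initial node of each level.
    nlevels = max(level_of) + 1
    start = [0] * (nlevels + 1)
    for lev in level_of:
        start[lev + 1] += 1
    for lev in range(nlevels):
        start[lev + 1] += start[lev]
    by_level = sorted(range(n), key=level_of.__getitem__)
    return level_of, by_level, start
```

With both vectors available, membership tests use `level_of` directly, while
the nodes of level l are the slice `by_level[start[l]:start[l + 1]]`.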
Both forms of access are needed: operations to extract the individual levels
are required for phase 2, and the third phase can be performed most easily if
both parts of the representation are available. Care is also needed with the
storage requirements encountered in phase 1 for symmetrically reducible
matrices, which arise with surprising frequency in structural engineering
contexts.
An opportunity for improving the choice of data structures is particularly
evident in phase 3b, Gibbs' variant of the King algorithm. In this numbering
scheme there are three sets of nodes; call two of them S2 and S3. Neither S2
nor S3 is a simple queue: while additions occur only at the end of the queue,
deletions are made on the basis of a merit function which depends only partly
on location within the queue. The new implementation represents S2 and S3 by
simple linked lists, which reduces the complexity of the deletion operations
which occur on both priority queues. The most dramatic improvement over the
original data structures is in the implementation of the Gibbs-King merit
function. When a node is numbered, all nodes connected to the newly numbered
node move from S3 to S2; this latter operation makes S3 dynamic. It is in
repeatedly computing connection counts that King's numbering is more complex
than the Cuthill-McKee numbering. In the original codes the nodes to be moved
from S3 were determined by an iteration over all of S3.
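The deletion operation that the linked lists support can be sketched as
follows (illustrative Python; the cell class, function names, and merit
function are stand-ins of our choosing, not the Gibbs-King priority itself).
One pass locates the entry of best merit while remembering its predecessor, so
unlinking it is a single pointer update.

```python
class Cell:
    """Singly linked list cell, standing in for the FORTRAN link vectors."""
    def __init__(self, node, nxt=None):
        self.node, self.nxt = node, nxt

def remove_best(head, merit):
    # Scan once for the cell of smallest merit, tracking its predecessor,
    # then unlink it. Returns (chosen node, new head of the list).
    best_prev, best = None, head
    prev, cur = head, head.nxt
    while cur is not None:
        if merit(cur.node) < merit(best.node):
            best_prev, best = prev, cur
        prev, cur = cur, cur.nxt
    if best_prev is None:            # the best cell was the head
        return best.node, best.nxt
    best_prev.nxt = best.nxt
    return best.node, head
```

Additions at the end of the queue are equally simple with a tail pointer; the
point is that deletion from the middle, which a merit function forces, costs
only the scan and no data movement.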
4. Storage Requirements

The changes in data structures not only reduce the storage requirements of
the codes but also necessitate different calling sequences for the algorithms.
The first change is to the workspace needed for the reordering. The workspace
is given as a single vector, whose length is specified by the user; the new
code checks that it does not access elements beyond this length. A vector of
length 6n + 3 will suffice for any problem, but a vector of length 4n is
adequate for most problems. The actual space used for any particular problem
is reported back to the user. (The space required ranges from 2n to 6n + 3,
compared with the fixed 5n + 802 locations required by the old codes, for
sparse symmetric matrices.)
A more dramatic reduction in space requirements has been made by changing the
representation of the graph, or the connectivity structure of the matrix. The
matrix is represented by a connectivity or adjacency table: for each row, a
list of the columns in which nonzero entries occur off the diagonal.
Algorithms 508 and 509 require that this table be a rectangular array, with n
rows and at least as many columns as the maximum degree of any node (the
maximum number of off-diagonal nonzeroes in any row), together with a vector
giving the degree of each node. The new implementation instead stores the
table as packed adjacency lists, where the connections for each node are given
in consecutive locations. This requires a vector of length nz, the total
number of off-diagonal nonzeroes. (The full symmetric structure is needed in
both cases for efficiency.) An additional vector of length n is needed, giving
in each entry the index in the connectivity table of the first node in the
sublist for the corresponding row. A second vector giving the degree of each
node (the length of the corresponding connection list) is still needed. This
form can never require more storage than the rectangular array, and will
require much less storage whenever the number of nonzeroes in rows varies
significantly.
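The conversion from the rectangular table to the packed form can be sketched
in a few lines (illustrative Python; the function and variable names are
ours).

```python
def pack_connection_table(table, degree):
    # table: rectangular n x maxdeg connection table, where row i holds
    # degree[i] neighbour indices followed by padding, as Algorithms 508
    # and 509 require. Returns the packed lists and their start indices.
    start, packed = [0], []
    for row, deg in zip(table, degree):
        packed.extend(row[:deg])     # keep only the genuine connections
        start.append(len(packed))
    # packed has length nz (total off-diagonal nonzeroes); start[i] indexes
    # the first connection of row i, so degree[i] == start[i+1] - start[i].
    return start, packed
```

The rectangular form occupies n * maxdeg entries regardless of how full the
rows are; the packed form uses exactly nz entries plus the vector of n start
indices.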
This packed representation is the one commonly used in general sparse matrix
codes [3]. The storage savings provided by the compact representation of the
connection table dominate the differences in space requirements between the
new and old implementations, as reported in the following section. The reasons
for preferring this format are evident. For users with existing programs that
call Algorithms 508 and 509, a version of the new implementation which uses
the old format for the connection table is available from the author (not
through the ACM distribution service).
5. Empirical Results

The original and the new implementations of the two reordering algorithms
were compared on 30 test matrices collected by Everstine [4]. These test
problems are an outgrowth of the 19 problems used in the original papers [9]
and [11]. Everstine compared Algorithm 508 and other banded reordering
strategies, but his paper excluded Algorithm 509 because of its relatively
poor execution speed. The following results show that excluding the Gibbs-King
algorithm is unwarranted; it performs very well in an efficient
implementation. Tables 1 and 2 contain timings for the new implementation of
the GPS algorithm and the GK algorithm, and for Algorithms 508 and 509. The
timings in Table 1 were obtained on a CDC Cyber 175 computer, using the FTN
4.6 FORTRAN compiler with optimization level 2. Those in Table 2 were obtained
on an IBM 3032, using FORTRAN H Extended with optimization level 2, using full
length integers. Both tables clearly show the increased speed of the new
implementation and also demonstrate that an efficient implementation of the GK
algorithm can be nearly as fast as an efficient implementation of the GPS
algorithm.
Table 3 contains the reordering results from the GK algorithm needed to
complete Everstine's comparison of banded reordering algorithms. These results
show that, on half of the test problems, the GK algorithm performs better than
all of the algorithms considered by Everstine. Everstine concluded that
Algorithm 508 was the best algorithm because of its effectiveness and speed;
the Gibbs-King algorithm always performs at least as well as the
Gibbs-Poole-Stockmeyer algorithm on these test problems, and is faster in its
new implementation than the version of Algorithm 508 used by Everstine.
In Table 4 we give the minimum storage requirements for all of the vector
arguments to the reordering algorithms. The first column is based on the
minimum workspace requirement reported back by the code, which is less than
what the user would need without a priori knowledge; without such knowledge
the user would supply a workspace vector of length 5n. The last column of
Table 4 gives the minimum length (in units of n), which is used to compute the
first column of this table. The requirements for the old implementations are
those of the vector and matrix inputs, plus 801 locations of fixed workspace.
The columns labeled "Packed Integers" give the workspace requirements for a
machine like the IBM 370 series, where short integers can be packed two to a
long integer word. All requirements are given in "long integer words"; thus,
the actual requirements in bytes for an IBM 370 or DEC VAX 11 computer would
be four times as large.
The space requirements do not include the space occupied by the reordering
codes, which is clearly a system dependent quantity. The object code produced
by the FTN compiler for the CDC CYBER 175 required 2163 words for the new
algorithms and 1434 words for the old. Although the new, faster implementation
requires more storage for the code itself, the total space required will
usually be less.
6. Concluding Remarks

The new implementations are not direct replacements for the original
implementations, because the change in data structures requires a change in
calling sequence. The new code is also more careful about improper input. One
condition is not checked: the structure is assumed to be symmetric, so that
the j,i-th element is nonzero whenever the i,j-th element is nonzero. This
condition is not verified explicitly because of cost, but in most cases the
new implementation will detect an inconsistency in its data structures and
halt the reordering.
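The consistency in question is easy to state, and an explicit check, were one
willing to pay for it, amounts to one pass over the structure. A sketch in
Python (not part of the published codes; the names are ours):

```python
def structure_is_symmetric(start, packed):
    # start/packed: packed adjacency lists (0-based). Confirms that whenever
    # node j appears in the list for node i, node i appears in the list
    # for node j, i.e. the graph of the matrix is undirected.
    arcs = {(i, j)
            for i in range(len(start) - 1)
            for j in packed[start[i]:start[i + 1]]}
    return all((j, i) in arcs for (i, j) in arcs)
```

The cost is linear in nz but requires either a set as above or a sort of the
arc list, which is why the published codes leave the condition unverified.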
There are possibilities for further enhancement of the speed of these
algorithms. Several of these were considered in the development of these
codes, but rejected. The implementation uses insertion sorts to create lists
which are sorted; an asymptotically faster method like quicksort [15] might
provide some speedup. However, the instability [13] of such a sorting
algorithm would change the strategies for breaking ties, thus changing the
final reordering. George and Liu [6] propose a faster heuristic for computing
the pseudodiameter, which is also more effective, at least in the context of
the orderings they consider in [6]. We experimented with their recommended
"S2" heuristic, but took advantage of the context of Algorithms 508 and 509 to
begin the pseudo-diameter calculation with a node of minimum degree rather
than an arbitrary node. This selective choice of starting node changed the
results, and we concluded that the faster heuristic also carries the risk of
significantly reducing the effectiveness of the algorithm. The new
implementation therefore retains the same heuristic for computing the
pseudo-diameter as the original codes, and produces the same reorderings.
ACKNOWLEDGMENT
The author wishes to thank W. G. Poole, Jr. for making available copies of the
original implementation of the algorithms and for his helpful criticism of this
paper.
REFERENCES
[1]
[2]
[3] Eisenstat, S. C., Schultz, M. A., and Sherman, A. W. Efficient
Implementation of Sparse Symmetric Gaussian Elimination, Proceedings of the
AICA International Symposium on Computer Methods for PDE's, 1975, 33-39
[4]
[5]
[6]
[7]
[8]
George, A., Liu, J., and Ng, E. User Guide for Sparspak: Waterloo Sparse
Linear Equations Package, Dept. of Computer Science, University of
Waterloo, Waterloo, Ontario, 1979
[9]
Table 1
Relative Execution Times for the New and Old Implementations
Of the Gibbs-Poole-Stockmeyer and Gibbs-King Algorithms
(CDC Cyber 175)
    n    Time in Seconds    Old GPS /    Old GK /
         for New GPS        New GPS      New GK

    59        .006             1.3          2.4
    66        .006             1.5          2.1
    72        .006             1.3          2.3
    87        .009             1.4          4.6
   162        .017             1.5          4.2
   193        .078             2.1         12.2
   198        .036             1.7          2.5
   209        .026             1.5         16.6
   221        .030             1.5          4.2
   234        .033             1.9          3.1
   245        .031             1.5         14.1
   307        .045             1.6         22.1
   310        .044             1.6          3.7
   346        .059             1.5         16.2
   361        .035             1.7          7.3
   419        .055             1.5         14.5
   492        .059             1.8          6.2
   503        .069             1.6         36.2
   512        .163             2.1          3.9
   592        .101             1.6          9.2
   607        .123             1.7         16.5
   758        .120             2.1          5.2
   869        .202             1.6          7.6
   878        .223             1.6          7.6
   918        .188             1.6          9.6
   992        .555             1.6          6.7
  1005        .136             1.6         50.6
  1007        .271             1.6          7.4
  1242        .321             1.5         34.0
  2680        .448             1.7         32.6
Table 2
Relative Execution Times for the New and Old Implementations
Of the Gibbs-Poole-Stockmeyer and Gibbs-King Algorithms
(IBM 3032)
    n    Time in Seconds    Old GPS /    Old GK /
         for New GPS        New GPS      New GK

    59        .008             1.2          1.9
    66        .009             1.2          1.9
    72        .008             1.4          2.1
    87        .012             1.4          2.8
   162        .024             1.3          2.6
   193        .111             1.3          5.1
   198        .052             1.2          1.9
   209        .037             1.5          7.1
   221        .042             1.3          2.6
   234        .044             1.2          2.3
   245        .043             1.4          6.6
   307        .064             1.3          9.1
   310        .061             1.3          2.4
   346        .083             1.4          6.7
   361        .047             1.5          4.1
   419        .078             1.4          6.3
   492        .084             1.3          3.5
   503        .095             1.6         13.9
   512        .208             1.1          2.5
   592        .139             1.3          4.5
   607        .164             1.3          7.4
   758        .164             1.3          3.2
   869        .286             1.2          3.7
   878        .312             1.2          3.7
   918        .266             1.2          4.5
   992        .773             1.2          3.1
  1005        .192             1.5         19.3
  1007        .385             1.2          3.6
  1242        .458             1.3         13.0
  2680        .601             1.4         13.4
Table 3
Bandwidth, Profile and Wavefront Computed by
the Gibbs-King Algorithm
    n    Bandwidth   Profile      RMS        Max        Average
                                Wavefront  Wavefront   Wavefront

    59      11          255       5.53+        8          5.32*
    66       3          127       2.94+        3          2.92*
    72      12          172       3.46+        4          3.39
    87      20          595       8.24+       12          7.84*
   162      17         1417      10.09        13          9.75
   193      45         4416      24.86        36         23.88
   198      11         1115       6.95+        9          6.63*
   209      48         3823      20.40        32         19.29
   221      20         2002      10.53        17         10.06
   234      18         1115       6.29        12          5.76
   245      48         3568      16.64        27         15.56
   307      63         7825      27.36        35         26.49
   310      28         2696       9.85        16          9.70
   346      55         7335      23.29        34         22.20
   361      14         4699      14.23+       15         14.02*
   419      41         7654      19.96        30         19.27
   492      37         5021      11.99        22         11.21
   503      69        14539      32.22+       50         29.90*
   512      29         4309      11.43+       23          9.42*
   592      47        10333      19.70+       33         18.45*
   607      78        14153      27.25+       38         24.32*
   758      25         7417      12.01+       25         10.78*
   869      62        14859      19.87        37         18.10*
   878      40        18818      22.60+       25         22.43*
   918      64        19580      23.39+       39         22.33*
   992      35        33076      34.66+       36         34.34*
  1005     135        39136      44.80        89         39.94*
  1007      54        21422      22.60+       33         22.27*
  1242     129        51710      46.26        84         42.63
  2680      89        96746      38.03+       60         37.10*
Table 4
Workspace Requirements for Old and New Implementations
of the GPS and GK Algorithms
            Unpacked Integers      Packed Integers*     Workspace
    n        Old       New+         Old       New+      Factor (*n)

    59       1510       607         1363       332        3.76
    66       1594       752         1429       408        4.55
    72       1594       651         1450       361        3.96
    87       2455      1021         1933       553        3.52
   162       3232      2064         2584      1112        3.44
   193       7750      4485         4952      2338        3.14
   198       4366      2544         3277      1370        3.82
   209       5609      2830         3937      1519        3.20
   221       4780      2821         3565      1520        3.39
   234       4546      2208         3493      1220        3.87
   245       5457      2743         3987      1493        3.23
   307       5407      4103         4179      2204        3.15
   310       6072      4121         4522      2215        3.40
   346       9452      5300         6338      2822        3.99
   361       6217      4854         4773      2607        3.27
   419       8763      5730         6249      3074        3.17
   492       9166      5859         6706      3175        3.49
   503      16395      8617        10359      4559        3.15
   512      11554      6542         7970      3526        3.94
   592      13234      8181         9090      4386        3.20
   607      12942      8718         8997      4662        3.91
   758      13688     10105         9898      5431        3.42
   869      18182     11807        12534      6337        3.20
   878      14850     11964        10899      6420        3.14
   918      18244     12106        12736      6511        3.14
   992      24610     21800        16178     11395        3.10
  1005      33967     13754        20902      7379        3.11
  1007      16914     13757        12383      7381        3.15
  1242      23158     16735        16327      8988        3.08
  2680      67802     38657        43682     20668        3.09