DOI 10.1007/s00466-007-0218-2
ORIGINAL PAPER
Received: 14 May 2007 / Accepted: 22 August 2007 / Published online: 25 September 2007
Springer-Verlag 2007
Abstract  This paper compares the efficiency of multibody system (MBS) dynamic simulation codes that rely on different implementations of linear algebra operations. The dynamics of an N-loop four-bar mechanism has been solved with an index-3 augmented Lagrangian formulation combined with the trapezoidal rule as numerical integrator. Different implementations of this method, both dense and sparse, have been developed, using a number of linear algebra software libraries (including sparse linear equation solvers) and optimized sparse matrix computation strategies. Numerical experiments have been performed in order to measure their performance, as a function of problem size and matrix filling. Results show that optimal implementations can increase the simulation efficiency by a factor of 2-3, compared with our starting classical implementations, and on some topics they disagree with widespread beliefs in MBS dynamics. Finally, advice is provided to select the implementation which delivers the best performance for a given MBS dynamic simulation.

Keywords  Multibody dynamics - Real-time - Performance - Linear algebra - Implementation

1 Introduction

Dynamic simulation of multibody systems (MBS) is of great interest for the dynamics of machinery, road and rail vehicle design, robotics and biomechanics. Computer simulations performed by MBS simulation tools lead to more reliable, optimized designs and significant reductions in the cost and time of the product development cycle. The computational efficiency of these tools is a key issue for two reasons. First, there are applications, like hardware-in-the-loop settings or human-in-the-loop devices, which cannot be developed unless the MBS simulation runs in real-time. Second, when MBS simulation is used in virtual prototyping, faster simulations allow the design engineer to perform what-if analyses and optimizations in shorter times, increasing productivity and interaction with the model. Therefore, computational efficiency is an active area of research in MBS, and it holds a relevant position in MBS-related scientific conferences and journals.
A great variety of methods to improve simulation speed have been proposed during the last years [1-3]. Most of these methods base their efficiency improvements on the development of new dynamic formulations. However, although implementation aspects can also play a key role in the performance of numerical simulations, their effect on real-time multibody system dynamics has not been studied in detail. Some recent contributions have investigated the possibilities of parallel implementations [4], but comprehensive comparisons of serial implementations in MBS dynamics have not been published yet.

M. González (B) - F. González - D. Dopico - A. Luaces
Escuela Politécnica Superior, Universidad de A Coruña,
Mendizábal s/n, 15403 Ferrol, Spain
e-mail: lolo@cdf.udc.es
URL: http://lim.ii.udc.es
F. González e-mail: fgonzalez@udc.es
D. Dopico e-mail: ddopico@udc.es
A. Luaces e-mail: aluaces@udc.es
Comput Mech (2008) 41:607-615
Multibody dynamics codes make intensive use of linear algebra operations. This is especially true in global methods, which use a relatively large number of coordinates and constraint equations to define the position of the system; these methods usually lead to O(N^3) algorithms, where N is the number of bodies, and spend around 80% of the CPU time in matrix computations. Topological methods lead to O(N) algorithms due to the reduced size of the involved matrices, and therefore the weight of matrix computations is also reduced. However, if flexible bodies are considered, matrix computations take a significant percentage of simulation time even for topological methods.

As a result, the implementation of linear algebra operations is critical to the efficiency of MBS dynamic simulations. These operations can be grouped into two categories: (a) operations between scalars, vectors and matrices, and (b) solution of linear systems of equations; two additional orthogonal categories can be established based on the data storage format: dense storage or sparse storage. Many efficient implementations of these routines have been made freely available in the last decade. Their performance has been compared in previous works, both in an application-independent context [5-7] and from the perspective of a particular application like Finite Element Analysis [8] or Computational Chemistry [9]. But, as will be explained in this paper, these studies do not fit the particular features of MBS dynamics, and therefore their conclusions cannot be extrapolated to this field.

The goal of this paper is to compare the efficiency of different implementations of linear algebra operations, and to study their effect in the context of MBS dynamic simulation. Results will provide guidelines about which numerical libraries and implementation techniques are more convenient in each case. This information will be very helpful to researchers developing high-performance or real-time multibody simulation codes.

The remainder of the paper is organized as follows: Sect. 2 describes the test problem and the dynamic formulation used in the numerical experiments to compare the efficiency of different implementations; Sects. 3 and 4 present efficient implementations for dense and sparse linear algebra, respectively; Sect. 5 compares the results obtained in Sects. 3 and 4 and extrapolates them to other dynamic formulations; finally, Sect. 6 provides conclusions, advice for efficient implementations and areas of future work.

[...] be described, since its efficiency will serve as a reference to measure performance improvements.

2.1 Test problem

The selected test problem (Fig. 1) is a 2D one degree-of-freedom assembly of four-bar linkages with N loops, composed of thin rods of 1 m length with a uniformly distributed mass of 1 kg, moving under gravity effects. Initially, the system is in the position shown in Fig. 1, and the velocity of the x-coordinate of point B0 is +1 m/s. The simulation time is 20 s. This mechanism has been previously used as a benchmark problem for multibody system dynamics [3,10].

[Fig. 1: N-loop four-bar mechanism with points B0, B1, ..., BN-1, BN; g = 9.81 N/kg]

2.2 Dynamic formulation

The N-four-bar mechanism has been modeled using planar natural coordinates (global and dependent) [11], leading to 2N+2 variables (the x and y coordinates of the B points) and 2N+1 constraints, associated with the constant length condition of the rods. The equations of motion of the whole multibody system are given by the well-known index-3 augmented Lagrangian formulation in the form:

M \ddot{q} + \Phi_q^T \alpha \Phi + \Phi_q^T \lambda^* = Q
\lambda^*_{i+1} = \lambda^*_i + \alpha \Phi_{i+1} , \quad i = 0, 1, 2, \ldots    (1)

where M is the mass matrix (constant for the proposed test problem), \ddot{q} are the accelerations, \Phi_q the Jacobian matrix of the constraint equations, \alpha the penalty factor, \Phi the constraints vector, \lambda^* the Lagrange multipliers and Q the vector of applied and velocity-dependent inertia forces. The Lagrange multipliers for each time-step are obtained from an iteration process, where the value of \lambda^*_0 is equal to the \lambda^* obtained in the previous time-step.

As integration scheme, the implicit single-step trapezoidal rule has been adopted. The corresponding difference
equations in velocities and accelerations are:

\dot{q}_{n+1} = \frac{2}{\Delta t} q_{n+1} + \hat{\dot{q}}_n ; \quad
\hat{\dot{q}}_n = -\left( \frac{2}{\Delta t} q_n + \dot{q}_n \right)
\ddot{q}_{n+1} = \frac{4}{\Delta t^2} q_{n+1} + \hat{\ddot{q}}_n ; \quad
\hat{\ddot{q}}_n = -\left( \frac{4}{\Delta t^2} q_n + \frac{4}{\Delta t} \dot{q}_n + \ddot{q}_n \right)    (2)

Dynamic equilibrium can be established at time-step n+1 by introducing the difference equations (2) into the equations of motion (1), leading to a nonlinear algebraic system of equations with the dependent positions as unknowns:

f(q) = M q_{n+1} + \frac{\Delta t^2}{4} \Phi_{q,n+1}^T \left( \alpha \Phi_{n+1} + \lambda^*_{n+1} \right) - \frac{\Delta t^2}{4} Q_{n+1} + \frac{\Delta t^2}{4} M \hat{\ddot{q}}_n = 0    (3)

Such a system, whose size is the number of variables in the model, is solved through the Newton-Raphson iteration

\left[ \frac{\partial f(q)}{\partial q} \right]_i \Delta q_{i+1} = - \left[ f(q) \right]_i    (4)

using the approximate tangent matrix (symmetric and positive-definite)

\frac{\partial f(q)}{\partial q} = M + \frac{\Delta t}{2} C + \frac{\Delta t^2}{4} \left( \Phi_q^T \alpha \Phi_q + K \right)    (5)

where C and K represent the contribution of the damping and elastic forces of the system (which are zero for the test problem). Once convergence is attained within the time-step, the obtained positions q_{n+1} satisfy the equations of motion (1) and the constraint conditions \Phi = 0, but the corresponding sets of velocities \dot{q} and accelerations \ddot{q} may not satisfy \dot{\Phi} = 0 and \ddot{\Phi} = 0. To achieve this, cleaned velocities \dot{q}^* and accelerations \ddot{q}^* are obtained by means of mass-damping-stiffness orthogonal projections, reusing the factorization of the tangent matrix:

\left[ \frac{\partial f(q)}{\partial q} \right] \dot{q}^* = \left[ M + \frac{\Delta t}{2} C + \frac{\Delta t^2}{4} K \right] \dot{q} - \frac{\Delta t^2}{4} \Phi_q^T \alpha \Phi_t
\left[ \frac{\partial f(q)}{\partial q} \right] \ddot{q}^* = \left[ M + \frac{\Delta t}{2} C + \frac{\Delta t^2}{4} K \right] \ddot{q} - \frac{\Delta t^2}{4} \Phi_q^T \alpha \left( \dot{\Phi}_q \dot{q} + \dot{\Phi}_t \right)    (6)

This method, described in detail in [12], has proved to be a robust and efficient global formulation [13,14]. All numerical experiments will be performed using a time-step \Delta t of 1.25 x 10^-3 s and a penalty factor \alpha of 10^8.

2.3 Starting implementation

In our starting implementation, the simulation algorithm was implemented using Fortran 90 and the Compaq Visual Fortran compiler. Two versions were developed: (a) a dense matrix storage version, using Fortran 90 matrix manipulation capabilities and the linear equation solver included with this compiler (IMSL Fortran Library, from Visual Numerics), and (b) a sparse matrix storage version, using the MA27 sparse linear equation solver from the Harwell Subroutine Library. These two implementations, typical in the multibody community, have been tuned and improved by our group during the last years, and they have proved to be faster than commercial codes [13,14]. Their efficiency will serve as a reference to measure the performance improvements achieved with the new implementations proposed in this paper.

Table 1 shows the results of CPU usage profiling of our starting implementations, for both the dense and sparse versions, applied to representative problem sizes. As stated in the introduction, matrix computations consume most of the CPU time.

Table 1  Percentage of the total CPU time required by each algorithm phase in the starting implementation for typical problem sizes: dense version in small problems (10 loops, 22 variables) and sparse version in medium-size problems (40 loops, 82 variables)

Stage                                                          Dense (%)  Sparse (%)
Evaluation of residual and tangent matrix, Eqs. (3) and (5)        48         15
Evaluation of right-term in orthogonal projections, Eq. (6)         4         13
Tangent matrix factorizations and back-substitutions,              44         51
  Eqs. (4) and (6)
Other                                                               4         21

In order to test alternative implementations, the authors have developed a new MBS simulation software, implemented in C++, which can be easily configured to use different matrix storage formats and linear algebra algorithms and implementations. Numerical experiments have been performed on an AMD Athlon64 CPU. After testing different operating systems and compilers, results show that their effect on the performance is an order of magnitude lower than the effect of the linear algebra implementations. Final CPU times have been measured using the GNU gcc compiler and the Linux O.S., without loss of generality.

3 Efficient dense matrix implementations

Global formulations applied to reduced rigid models (e.g. an industrial robot), or topological formulations applied to medium-sized rigid models (e.g. a complete road vehicle), lead to algorithms that operate with small matrices of dimensions less than 50 x 50. In these cases, dense linear algebra is frequently used in MBS dynamics, since it is supposed to provide equal or higher performance than sparse implementations. Achieving real-time in the simulation of these small problems can be a challenge in hardware-in-the-loop settings (e.g. [...] systems for automobiles), due to the low computing power of embedded microprocessors, the small time-steps required for hardware synchronization and the added control logic.

[Fig. 2: performance of dense implementations; legend: Starting implementation, ATLAS+Ref., ATLAS+ATLAS, ACML+Ref.]
[...]number of non-zeros is 12, 6 and 1%. These are representative values for MBS simulations, and they are quite a bit higher than the typical values in other applications that require sparse matrix technology (FEA, CFD).

Hence, MBS dynamics has two characteristics which make its sparse matrix computations different from other applications:

(a) Matrix computations are very repetitive, and the sparse patterns remain constant during the simulation. Therefore, symbolic preprocessing can be applied to almost all matrix expressions at the beginning of the simulation, in order to accelerate the numerical evaluations during the simulation. Section 4.1 presents some tips to exploit this feature.

(b) The involved sparse matrices are relatively small and dense, compared with the typical values in sparse matrix technology. Section 4.2 evaluates how sparse linear equation solvers perform in these circumstances.

Fig. 3  Sparse storage formats used in our implementations:
  coordinate format:         val = [1, 2, 3, 4, 5, 6], indx = [1, 2, 2, 2, 3, 3], jndx = [1, 1, 2, 3, 2, 3]
  compressed column format:  val = [1, 2, 3, 5, 4, 6], indx = [1, 2, 2, 3, 2, 3], pntr = [1, 3, 5, 7]

4.1 Optimized sparse matrix computations

Several numerical libraries are available nowadays to support sparse matrix computations: MTL, MV++, Blitz++, SparseKIT, etc. For our new implementations, we have chosen uBLAS, a C++ template class library that provides BLAS functionality for sparse matrices [20]. Its design and implementation unify mathematical notation via operator overloading and efficient code generation via expression templates. Even so, the performance of some matrix operations can be further improved if special algorithms are used. Results of CPU usage profiling (similar to Table 1) guided us to optimize three operations.

The first optimized operation is the rank-k update of a symmetric matrix, \Phi_q^T \alpha \Phi_q, computed in Eq. (5). Since the structure of the Jacobian matrix \Phi_q is constant, a symbolic analysis is performed in order to pre-calculate the sparse pattern of the result matrix and to create a data structure that holds the operations needed to evaluate it during the simulation. In our starting sparse implementation, a similar approach was taken, but the Jacobian matrix was stored as dense, to simplify the operations at the cost of higher memory usage.

The second optimized operation is the matrix addition computed in Eq. (5). Our starting sparse implementation used the Harwell MA27 routine as linear equation solver, which requires the sparse matrix to be stored in coordinate format (Fig. 3) and allows duplicated entries in the matrix structure. Therefore, the matrix addition is not actually computed, since the different terms are appended as duplicated entries in the tangent matrix. Our new implementation uses the compressed column storage format (Fig. 3), since it is required by the sparse linear equation solvers tested in Sect. 4.2. With this storage format, matrix additions require complex data traversing that slows down the performance. The following approach was taken in order to optimize the operation:

B = t_1 A_1 + t_2 A_2    (7)

In the preprocessing stage, the sparse pattern of B is calculated as the union of the A_1 and A_2 sparse patterns, and the resulting pattern is added to A_1 and A_2. In this way, A_1, A_2 and B share the same sparse pattern (same indx and pntr arrays in the compressed column storage format shown in Fig. 3), and therefore the matrix addition can be computed as a vector addition of the val arrays:

val_B = t_1 val_{A_1} + t_2 val_{A_2}    (8)

This technique increases the number of non-zeros (NNZ) of the addend matrices. In the proposed MBS dynamic formulation, the NNZ of the mass matrix M is increased by approximately 10%, which slows down the matrix-vector multiplications needed in the right-hand terms of Eqs. (3) and (6). However, the simulation timings show that this slowdown is negligible compared with the gains derived from the fast sparse matrix addition.

Finally, the third optimized operation concerns sparse matrix access. The write operation A(i, j) = a_ij, straightforward in dense storage, needs an additional position lookup when compressed column storage is used. In the proposed formulation, the update of the Jacobian matrix \Phi_q in each iteration takes 10-15% of the CPU time. The involved operations are rather simple, and most of this time is spent in matrix access. In order to optimize this procedure, a preprocessing stage evaluates the Jacobian matrix and registers the order in which the entries \Phi_q(i, j) are written in the val array of the compressed column format, creating a vector that holds indices to these positions, in the same order of evaluation.
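The shared-pattern addition of Eqs. (7) and (8) can be sketched as follows. This is a minimal illustration, assuming the preprocessing stage has already unified the patterns; the CSC struct and function names are ours, not the paper's actual code, and the array names (val, indx, pntr) follow Fig. 3.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compressed column storage, as in Fig. 3.
struct CSC {
    std::vector<int> pntr;    // column pointers (one entry per column, plus end marker)
    std::vector<int> indx;    // row index of each stored entry
    std::vector<double> val;  // numerical values
};

// Once A1, A2 and B share the same sparse pattern, B = t1*A1 + t2*A2
// (Eq. 7) reduces to a plain vector operation on the val arrays (Eq. 8).
void add_shared_pattern(double t1, const CSC& A1,
                        double t2, const CSC& A2, CSC& B) {
    assert(A1.indx == A2.indx && A1.pntr == A2.pntr);  // identical patterns
    B.pntr = A1.pntr;
    B.indx = A1.indx;
    B.val.resize(A1.val.size());
    for (std::size_t k = 0; k < B.val.size(); ++k)
        B.val[k] = t1 * A1.val[k] + t2 * A2.val[k];
}
```

No pattern traversal happens in the numeric phase; the loop over val is exactly the kind of dense vector operation that a Level-1 BLAS routine (axpy-like) could also perform.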
Table 2  Efficiency of the optimized sparse matrix operations

Sparse operation                        CPU time (ms)               Ratio
                                        Not optimized   Optimized
(1) Rank-k update of symmetric matrix       2528.2         9.4        269
(2) Matrix addition                          140.9         1.9         74
(3) Jacobian matrix evaluation                11.6         3.8          3

[...] perform better in the second or third stage. The performance of sparse solvers has been compared in previous works [5,6], but the conditions of these studies (in particular, matrix sizes and percentage of non-zeros) do not fit the above-mentioned particular features of MBS dynamics, and therefore their conclusions cannot be extrapolated to this field. As a result, it is almost impossible to determine, without numerical experiments, which sparse solver will deliver the best performance in an MBS dynamic simulation.

Given the large number of existing sparse solvers, a selection process is required in order to narrow the scope. Solvers for shared memory or distributed memory parallel machines have been discarded, since the small matrix sizes in MBS real-time dynamics (which almost fit in the CPU cache memory) make them unprofitable. The same argument applies to iterative solvers and out-of-core solvers, designed for very big linear equation systems. From the remaining solvers, those that performed best in previous comparative studies have been selected.

Fig. 4  Performance of different sparse linear equation solvers as a function of problem size [plot: CPU time vs. number of variables (0-160); legend includes CHOLMOD and KLU]
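The symbolic preprocessing behind the large rank-k update speedup in Table 2 can be illustrated with a simplified sketch of our own (the paper does not publish its actual data structure, and the scalar penalty factor is omitted for brevity, since it only scales the values). A one-time symbolic phase pairs up the Jacobian entries that contribute to each entry of \Phi_q^T \Phi_q; the numeric phase then replays a flat list of multiply-accumulate operations with no pattern searches.

```cpp
#include <array>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// One stored entry of the sparse Jacobian (coordinate form for clarity).
struct Triplet { int row, col; double v; };

// Simplified symbolic rank-k update: S = J^T * J for a Jacobian J with a
// constant sparse pattern. S(i,j) = sum over k of J(k,i)*J(k,j).
struct RankKUpdate {
    std::vector<std::array<std::size_t, 3>> ops;  // (slot of S, entry a, entry b)
    std::vector<std::pair<int, int>> pattern;     // (i,j) of each slot of S

    // Symbolic phase (run once): pair entries of J that share a row.
    void analyze(const std::vector<Triplet>& J) {
        std::map<std::pair<int, int>, std::size_t> slot;
        for (std::size_t a = 0; a < J.size(); ++a)
            for (std::size_t b = 0; b < J.size(); ++b)
                if (J[a].row == J[b].row) {
                    auto key = std::make_pair(J[a].col, J[b].col);
                    auto it = slot.find(key);
                    if (it == slot.end()) {
                        it = slot.emplace(key, pattern.size()).first;
                        pattern.push_back(key);
                    }
                    ops.push_back({it->second, a, b});
                }
    }

    // Numeric phase (run every iteration): replay the precomputed ops.
    std::vector<double> evaluate(const std::vector<Triplet>& J) const {
        std::vector<double> S(pattern.size(), 0.0);
        for (const auto& op : ops) S[op[0]] += J[op[1]].v * J[op[2]].v;
        return S;
    }
};
```

The symbolic phase is quadratic in the NNZ of J, but it runs only once; every subsequent evaluation is a straight-line sequence of multiply-adds, which is why a precomputed schedule can beat an off-the-shelf sparse product by orders of magnitude on a fixed pattern.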
[...] of variables. The legend text shows the names of the sparse solvers, ordered by increasing efficiency.

Surprisingly, KLU is the fastest solver, despite being a general solver that does not exploit the symmetric positive-definite condition of the coefficient matrix; in addition, it has been designed for circuit simulation problems, which lead to very sparse matrices, the opposite case of MBS dynamics. However, these results have been obtained by using the KLU refactor routine for the numerical factorizations, which reuses the pivoting strategy generated in the preprocessing stage. In multibody problems where the elements of the tangent matrix of Eq. (5) may significantly change their relative values during the simulation (e.g. due to violent impacts), the initial pivoting strategy may become invalid and the refactor routine would probably accumulate high numerical errors. To avoid this, the KLU solver can recalculate the pivoting strategy in each numerical factorization, but this method increases the CPU times by about 50%. On the other hand, Cholmod, a symmetric positive-definite solver, performs at 85% of KLU, despite recalculating the pivoting strategy in each numerical factorization. Our best new sparse implementations (using KLU or Cholmod) perform faster than our starting implementation, by a factor from 2 (small problems) to 3 (big problems of 1,000 variables).

4.3 Effect of dense BLAS implementation

Some sparse solvers, like Cholmod, rely on dense BLAS routines to increase their performance. In addition, some sparse matrix operations (e.g. the optimized matrix addition described in Sect. 4.1) are actually computed as dense vector operations using BLAS routines. Results shown in Fig. 4 have been generated using the reference BLAS implementation. The same numerical experiment has been executed using the faster, optimized GotoBLAS and ACML implementations, and CPU times have decreased only by 2-3%. Hence, the reference BLAS implementation is recommended for MBS dynamics in sparse implementations, since it provides the best compromise between performance and usability.

5 Sparse versus dense implementations

As stated previously, dense linear algebra is frequently used in MBS dynamics for small problems (dimension of the coefficient matrix lower than 50), since it is supposed to provide higher performance than sparse implementations [27]. Our starting sparse implementation, which already employs some of the optimizations described in Sect. 4.1, disagrees with this assumption, and this fact is reinforced by the performance of our new optimized implementations: sparse versions always perform faster than dense versions, even for small problems, by a factor which ranges from 1.5 (problems of 10 variables) to 5 (problems of 50 variables).

However, this conclusion has been obtained for the proposed test problem and dynamic formulation, and it could be argued that it cannot be generalized to other situations that lead to a coefficient matrix with a higher percentage of non-zeros, as in the case of highly constrained mechanisms or topological formulations. The same objection could be made to the efficiency ranking shown in Fig. 4. In order to gain insight into this subject, the numerical experiments used to generate Fig. 4 were repeated, but in this case artificial non-zeros were introduced in the mass matrix M, in order to generate a tangent matrix with a variable percentage of non-zeros. Figure 5 shows the CPU times for a mechanism of 48 loops (100 variables), as a function of matrix filling. Results show that two sparse implementations, based on the Cholmod and WSMP sparse solvers, are always faster than the best dense implementation, even with 100% of non-zeros in the tangent matrix. This surprising fact can be explained by two factors: (a) Cholmod and WSMP rely on dense BLAS routines to perform the factorization, and therefore they start to operate as dense solvers as the matrix filling increases; (b) the percentage of non-zeros is always lower in the Jacobian matrix than in the tangent matrix, hence optimized sparse implementations achieve significant time savings in Jacobian operations, in comparison with dense implementations.

Results for other problem sizes are synthesized in Fig. 6: the different regions represent the points (problem size, matrix filling) where each implementation delivers the best performance. For most MBS problems and dynamic formulations, a sparse implementation based on the KLU solver will be the frontrunner. However, topological formulations with a symmetric tangent matrix will benefit from a sparse implementation based on the WSMP solver, especially when they are applied to rigid models, which result in a higher matrix filling.

Fig. 5  Performance of different sparse linear equation solvers as a function of tangent matrix filling, for a problem size of 100 variables [plot: CPU time (s, 0-30) vs. % of non-zeros in the tangent matrix (0-100); curves: KLU, SuperLU, fastest dense implementation, CHOLMOD, WSMP]
Fig. 6  Best implementation, as a function of problem size and percentage of non-zeros in the tangent matrix [region chart: % of non-zeros vs. number of variables (0-100); regions: WSMP (high filling), KLU (low filling), non-reached region]

[second region chart: % of non-zeros (0-100) vs. number of variables (<900 / >900); regions: LAPACK, WSMP, CHOLMOD, non-reached region]

[...] triple size can be solved with the same resources.

- The proposed optimizations, based on symbolic preprocessing of the sparse matrix computations, can deliver huge speedups, since off-the-shelf sparse matrix libraries do not take advantage of the constant sparse pattern of the operations during the dynamic simulation.

- Optimized sparse implementations are recommended, since they perform better than optimized dense implementations, even for small-sized problems or relatively dense matrices. This disagrees with the widespread belief in MBS dynamics.

- Concerning sparse linear equation solvers, it has been found that KLU, an unfamiliar solver designed for circuit simulation, performs very well with many of the linear equation systems resulting from MBS dynamics. In addition, it was found that the reference BLAS implementation provides the best compromise between performance and usability for sparse implementations.
[...] since all the numerical experiments have been performed using a particular global formulation.

Acknowledgments  This research has been sponsored by the Spanish MEC (Grant No. DPI2003-05547-C02-01 and the F.P.U. Ph.D. fellowship No. AP2005-4448) and the Galician DGID (Grant No. PGIDT04PXIC16601PN).

References

1. Cuadrado J, Cardenal J, Morer P (1997) Modeling and solution methods for efficient real-time simulation of multibody dynamics. Multibody Syst Dyn 1:259-280
2. Bae DS, Lee JK, Cho HJ, Yae H (2000) An explicit integration method for realtime simulation of multibody vehicle models. Comput Meth Appl Mech Eng 187:337-350
3. Anderson KS, Critchley JH (2003) Improved Order-N performance algorithm for the simulation of constrained multi-rigid-body dynamic systems. Multibody Syst Dyn 9:185-212
4. Anderson KS, Mukherjee R, Critchley JH, Ziegler J, Lipton S (2007) POEMS: parallelizable open-source efficient multibody software. Eng Comput 23:11-23
5. Gupta A (2002) Recent advances in direct methods for solving unsymmetric sparse systems of linear equations. ACM Trans Math Softw 28:301-324
6. Scott JA, Hu YF, Gould NIM (2006) An evaluation of sparse direct symmetric solvers: an introduction and preliminary findings. Appl Parallel Comput State Art Sci Comput 3732:818-827
7. Whaley RC, Petitet A, Dongarra JJ (2001) Automated empirical optimizations of software and the ATLAS project. Parallel Comput 27:3-35
8. Turek S, Becker C, Runge A (2001) The FEAST indices. Realistic evaluation of modern software components and processor technologies. Comput Math Appl 41(10-11):1431-1464
9. Yu JSK, Yu CH (2002) Recent advances in PC-Linux systems for electronic structure computations by optimized compilers and numerical libraries. J Chem Inform Comput Sci 42:673-681
10. Gonzalez M, Dopico D, Lugrís U, Cuadrado J (2006) A benchmarking system for MBS simulation software: problem standardization and performance measurement. Multibody Syst Dyn 16:179-190
11. García de Jalón J, Bayo E (1994) Kinematic and dynamic simulation of multibody systems: the real-time challenge. Springer, New York
12. Bayo E, Ledesma R (1996) Augmented Lagrangian and mass-orthogonal projection methods for constrained multibody dynamics. Nonlinear Dyn 9:113-130
13. Cuadrado J, Gutierrez R, Naya MA, Morer P (2001) A comparison in terms of accuracy and efficiency between a MBS dynamic formulation with stress analysis and a non-linear FEA code. Int J Numer Meth Eng 51:1033-1052
14. Cuadrado J, Dopico D, González M, Naya M (2004) A combined penalty and recursive real-time formulation for multibody dynamics. J Mech Des 126:602-608
15. NIST (2006) Basic linear algebra subprograms. http://www.netlib.org/blas/
16. Goto K (2006) GotoBLAS. http://www.tacc.utexas.edu/resources/software/
17. AMD (2007) AMD Core Math Library. http://developer.amd.com/acml.jsp
18. NETLIB (2007) LAPACK. http://www.netlib.org/lapack/
19. Dopico D, Lugrís U, González M, Cuadrado J (2006) Two implementations of IRK integrators for real-time multibody dynamics. Int J Num Meth Eng 65:2091-2111
20. Walter J, Koch M (2006) uBLAS. http://www.boost.org/libs/numeric/
21. Dongarra JJ (2004) Freely available software for linear algebra on the web. http://www.netlib.org/utk/people/JackDongarra/la-sw.html
22. Chen Y, Davis TA, Hager WW, Rajamanickam S (2006) Algorithm 8xx: CHOLMOD, supernodal sparse Cholesky factorization and update/downdate. http://www.cise.ufl.edu/~davis/techreports/cholmod/tr06-005.pdf
23. Davis TA, Stanley K (2004) KLU: a Clark Kent sparse LU factorization algorithm for circuit matrices. http://www.cise.ufl.edu/~davis/techreports/KLU/pp04.pdf
24. Demmel JW, Eisenstat SC, Gilbert JR, Li XS, Liu JWH (1999) A supernodal approach to sparse partial pivoting. SIAM J Matrix Anal Appl 20:720-755
25. Davis TA (2004) Algorithm 832: UMFPACK V4.3, an unsymmetric-pattern multifrontal method. ACM Trans Math Softw 30:196-199
26. Gupta A, Joshi M, Kumar V (1998) WSSMP: a high-performance serial and parallel symmetric sparse linear solver. Appl Parallel Comput 1541:182-194
27. Cuadrado J, Dopico D (2004) A combined penalty and semi-recursive formulation for closed-loops in MBS. In: Proceedings of the eleventh world congress in mechanism and machine science, vols 1-5, pp 637-641