
Analysis of the SOR Iteration for the 9-Point Laplacian
Author(s): Loyce M. Adams, Randall J. LeVeque and David M. Young
Source: SIAM Journal on Numerical Analysis, Vol. 25, No. 5 (Oct., 1988), pp. 1156-1180
Published by: Society for Industrial and Applied Mathematics
Stable URL: http://www.jstor.org/stable/2157663

SIAM J. NUMER. ANAL. Vol. 25, No. 5, October 1988

© 1988 Society for Industrial and Applied Mathematics

ANALYSIS OF THE SOR ITERATION FOR THE 9-POINT LAPLACIAN*


LOYCE M. ADAMS†, RANDALL J. LEVEQUE‡, AND DAVID M. YOUNG§

This paper is dedicated to Werner C. Rheinboldt.


Abstract. The SOR iteration for solving linear systems of equations depends upon an overrelaxation factor $\omega$. A theory for determining $\omega$ was given by Young ("Iterative methods for solving partial differential equations of elliptic types," Trans. Amer. Math. Soc., 76 (1954), pp. 92-111) for consistently ordered matrices. Here we determine the optimal $\omega$ for the 9-point stencil for the model problem of Laplace's equation on a square. We consider several orderings of the equations, including the natural rowwise and multicolor orderings, all of which lead to nonconsistently ordered matrices, and find two equivalence classes of orderings with different convergence behavior and optimal $\omega$'s. We compare our results for the natural rowwise ordering to those of Garabedian ("Estimation of the relaxation factor for small mesh size," Math. Comp., 10 (1956), pp. 183-185) and explain why both results are, in a sense, correct, even though they differ. We also analyze a pseudo-SOR method for the model problem and show that it is not as effective as the SOR methods. Finally, we compare the point SOR methods to known results for line SOR methods for this problem.

Key words. SOR, consistently ordered, multicolor ordering

AMS(MOS) subject classification. 65

1. Introduction. The SOR method (successive overrelaxation) is a standard iterative method for solving linear systems of equations, particularly large sparse systems arising from partial differential equations. Convergence of the method is greatly affected by the choice of overrelaxation parameter $\omega$. A standard model problem for analyzing SOR is the system of equations arising from a finite difference discretization of Laplace's equation on a rectangle with zero boundary data. The solution of this problem is identically zero; hence, the iterates of SOR also represent the error at each step. The convergence properties of SOR for the model problem also apply to Poisson's equation with general Dirichlet boundary data, since the errors will still satisfy the homogeneous equation.

The 5-point approximation to the Laplacian is

$$\begin{aligned} u_{j-1,k} + u_{j+1,k} + u_{j,k-1} + u_{j,k+1} - 4u_{jk} &= 0, && j, k = 1, 2, \ldots, N-1, \\ u_{jk} &= 0, && j = 0, N \ \text{or} \ k = 0, N, \end{aligned} \tag{1.1}$$

where $u_{jk}$ approximates the solution $u(x_j, y_k)$ with $x_j = jh$, $y_k = kh$, and $h = 1/N$. This gives a linear system of $(N-1)^2$ equations in $(N-1)^2$ unknowns. The exact form of the matrix equation, and the form that the SOR iteration takes, depends on the order in which the unknowns $u_{jk}$ are arranged in the vector of unknowns. Two standard
* Received by the editors October 28, 1986; accepted for publication (in revised form) July 14, 1987. This research was supported in part by NASA contract NAS1-18107 and AFOSR contract 85-0189 while the authors were in residence at the Institute for Computer Applications in Science and Engineering (ICASE) at NASA Langley Research Center.
† University of Washington, Seattle, Washington 98195. The research of this author was supported in part by U.S. Air Force Office of Scientific Research grant 86-0154.
‡ University of Washington, Seattle, Washington 98195. The research of this author was supported in part by National Science Foundation grant DMS-8601363.
§ University of Texas, Austin, Texas 78713. The research of this author was supported in part by Department of Energy grant DE-A505-81ER10954, National Science Foundation grant MCS-821473, and U.S. Air Force Office of Scientific Research grant 85-0052.


orderings are the natural rowwise (NR) ordering and the Red-Black (RB) ordering, in which the grid is colored in a checkerboard fashion and the Red points are ordered before the Black points (by rows within each color). For the model problem, the optimal $\omega$ and rate of convergence are the same for both of these orderings. This model problem was analyzed by Frankel [1950] and also by Young [1954], who gave a more general theory of SOR for a wide class of matrix equations in which the matrix is "consistently ordered."

Another standard model problem is the 9-point approximation to the Laplacian:

$$\begin{aligned} 4u_{j-1,k} + 4u_{j+1,k} + 4u_{j,k-1} + 4u_{j,k+1} + u_{j-1,k-1} + u_{j-1,k+1} + u_{j+1,k-1} + u_{j+1,k+1} - 20u_{jk} &= 0, && j, k = 1, 2, \ldots, N-1, \\ u_{jk} &= 0, && j = 0, N \ \text{or} \ k = 0, N. \end{aligned} \tag{1.2}$$

Again there are various ways to order the unknowns, but none of these leads to a consistently ordered matrix and so the theory of Young does not apply. Multicolor orderings, similar to the RB ordering mentioned above but usually involving four colors for the 9-point stencil, are of particular interest for parallel processing applications. Recently Adams and Jordan [1986] have studied this problem in a more general context and identified 72 distinct four-color orderings. These can be grouped into equivalence classes that are known to have the same convergence behavior. For the model problem considered here, their theory reduces these to six classes of orderings that could potentially have different convergence rates, although the actual rate was not determined for any class.

In this paper, we analyze four of these six classes and show that these four classes can be reduced to two. One class is shown to have the same convergence behavior as the natural rowwise ordering, but the other class is shown to be distinct with a different optimal $\omega$ and asymptotic convergence rate. The eigenvectors of the iteration matrix are determined for three separate orderings and the corresponding eigenvalues (which determine the convergence rate) are found in terms of the roots of quartic equations. The optimal $\omega$ for small $h$ is given by an asymptotic expansion about $h = 0$, and is verified numerically.

In § 2 we use a separation of variables technique to determine the eigenvalues and eigenvectors for the NR ordering. The resulting quartic equation is used to derive the expansion for the optimal $\omega$. Our results for this ordering differ from those given by Garabedian [1956]. We explain why both results are, in a sense, correct.

In § 3 we discuss the various multicolor orderings. The main technique we use is a change of variables from $n$, the iteration number, to $\nu$, the "data flow time," as defined by Adams and Jordan [1986]. The fact that this change of variables can be used to simplify and relate various SOR methods was observed by LeVeque and Trefethen [1986]. They present a simple Fourier analysis of SOR on the 5-point stencil and show the equivalence of NR and RB using a change of variables motivated by Garabedian [1956] that is equivalent to the data flow times.

In § 4 we analyze a pseudo-SOR method based on a Red-Black coloring for the 9-point stencil. Although this method has attractive features for parallel computers, we show that it is unsatisfactory, being an order of magnitude slower than the true SOR methods with optimal $\omega$.

Finally, in § 5, we briefly discuss line SOR methods and compare their convergence rates with the point SOR methods discussed in this paper. We summarize results for the 5-point and 9-point point and line SOR methods in Fig. 5.2.


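2. Analysis of the natural rowwise ordering. For the 9-point stencil with NR ordering, the SOR method takes the form

$$u_{jk}^{n+1} = (1-\omega)u_{jk}^{n} + \frac{\omega}{5}\left(u_{j,k-1}^{n+1} + u_{j-1,k}^{n+1} + u_{j+1,k}^{n} + u_{j,k+1}^{n}\right) + \frac{\omega}{20}\left(u_{j-1,k-1}^{n+1} + u_{j+1,k-1}^{n+1} + u_{j-1,k+1}^{n} + u_{j+1,k+1}^{n}\right). \tag{2.1}$$

We assume that this iteration has a solution of the form

$$u_{jk}^{n} = \lambda^{n} w(x_j, y_k). \tag{2.2}$$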
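As an illustration of the iteration (2.1), the following is a minimal sketch of one SOR sweep in natural rowwise order, assuming the grid is stored as a NumPy array `u[j, k]` with zero boundary values. The function names, the random initial data, and the choices `N = 8`, `omega = 1.5` are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def sor_sweep_9pt(u, omega):
    """One in-place SOR sweep of (2.1) in natural rowwise order.

    u has shape (N+1, N+1), is indexed u[j, k], and is zero on the boundary.
    Updating in place means row k-1 and the point (j-1, k) already hold
    iterate n+1 when (j, k) is visited, exactly as (2.1) requires.
    """
    N = u.shape[0] - 1
    for k in range(1, N):        # rows, in increasing k
        for j in range(1, N):    # points within a row, in increasing j
            edges = u[j, k - 1] + u[j - 1, k] + u[j + 1, k] + u[j, k + 1]
            corners = (u[j - 1, k - 1] + u[j + 1, k - 1]
                       + u[j - 1, k + 1] + u[j + 1, k + 1])
            u[j, k] = (1 - omega) * u[j, k] + omega * (edges / 5 + corners / 20)
    return u

def run_sor(N=8, omega=1.5, sweeps=200, seed=0):
    """Apply repeated sweeps to random interior data; return (initial, final) 2-norms."""
    rng = np.random.default_rng(seed)
    u = np.zeros((N + 1, N + 1))
    u[1:N, 1:N] = rng.standard_normal((N - 1, N - 1))
    norm0 = np.linalg.norm(u)
    for _ in range(sweeps):
        sor_sweep_9pt(u, omega)
    return norm0, np.linalg.norm(u)

norm0, norm_final = run_sor()
print(norm_final / norm0)
```

Because the exact solution of the model problem is zero, the printed ratio is the factor by which the error has been reduced.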

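Then $\lambda$ is an eigenvalue of the iteration matrix, and the vector with components $w(x_j, y_k)$, $j, k = 1, \ldots, N-1$, is the corresponding eigenvector. Putting (2.2) into (2.1), cancelling a common factor of $\lambda^n$, and dropping the subscripts on $x$ and $y$ gives

$$\begin{aligned} \lambda w(x,y) = {} & \lambda\omega\left(\tfrac{1}{5}w(x-h,y) + \tfrac{1}{20}w(x-h,y-h) + \tfrac{1}{5}w(x,y-h) + \tfrac{1}{20}w(x+h,y-h)\right) \\ & + \omega\left(\tfrac{1}{5}w(x+h,y) + \tfrac{1}{20}w(x-h,y+h) + \tfrac{1}{5}w(x,y+h) + \tfrac{1}{20}w(x+h,y+h)\right) \\ & + (1-\omega)\,w(x,y). \end{aligned} \tag{2.3}$$

The eigenfunction $w(x,y)$ must be zero on the boundary of the unit square in view of the given boundary conditions. We use separation of variables and let $w(x,y) = X(x)Y(y)$. We also set $\alpha = \lambda^{1/2}$. Substituting this into (2.3) and dividing by $X(x)Y(y)$ gives

$$\frac{\alpha^2 + \omega - 1}{\omega} = \frac{1}{5}\,\frac{\alpha^2 Y(y-h) + Y(y+h)}{Y(y)} + \frac{1}{20}\,\frac{\alpha^2 Y(y-h) + Y(y+h)}{Y(y)}\cdot\frac{X(x-h) + X(x+h)}{X(x)} + \frac{1}{5}\,\frac{\alpha^2 X(x-h) + X(x+h)}{X(x)}. \tag{2.4}$$

Now let

$$Y(y) = \alpha^{y/h} \sin \eta y \tag{2.5}$$

to get

$$\frac{\alpha^2 + \omega - 1}{\omega} = \frac{2\alpha}{5}\cos \eta h + \Phi_1 \frac{X(x-h)}{X(x)} + \Phi_2 \frac{X(x+h)}{X(x)}, \tag{2.6}$$

where

$$\Phi_1 = \frac{\alpha^2}{5} + \frac{\alpha}{10}\cos \eta h, \qquad \Phi_2 = \frac{1}{5} + \frac{\alpha}{10}\cos \eta h. \tag{2.7}$$

We note that $\Phi_1$ and $\Phi_2$ are independent of $x$ and $y$ and let

$$X(x) = (\Phi_1/\Phi_2)^{x/2h} \sin \xi x. \tag{2.8}$$

Then

$$\frac{\Phi_1 X(x-h) + \Phi_2 X(x+h)}{X(x)} = 2\,\Phi_1^{1/2}\Phi_2^{1/2} \cos \xi h,$$

and using this in (2.6) gives

$$\frac{\alpha^2 + \omega - 1}{\omega} - \frac{2\alpha}{5}\cos \eta h = 2\,\Phi_1^{1/2}\Phi_2^{1/2} \cos \xi h. \tag{2.9}$$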


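Squaring (2.9), using (2.7), and rearranging terms gives a quartic equation for $\alpha$:

$$\begin{aligned} \alpha^4 & - \left[\tfrac{4}{5}\omega\cos \eta h + \tfrac{2}{25}\omega^2 \cos^2 \xi h \cos \eta h\right]\alpha^3 \\ & - \left[2(1-\omega) - \tfrac{4}{25}\omega^2\cos^2 \eta h + \tfrac{1}{25}\omega^2 \cos^2 \xi h\,(\cos^2 \eta h + 4)\right]\alpha^2 \\ & + \left[\tfrac{4}{5}\omega(1-\omega)\cos \eta h - \tfrac{2}{25}\omega^2 \cos^2 \xi h \cos \eta h\right]\alpha + (1-\omega)^2 = 0. \end{aligned} \tag{2.10}$$

An eigenmode of the iteration matrix has the form

$$u_{jk}^{n} = \lambda^n X(x_j) Y(y_k) = \alpha^{2n+k}\left[(\Phi_1/\Phi_2)^{1/2}\right]^{j} \sin \xi x_j \sin \eta y_k. \tag{2.11}$$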

In order for the boundary conditions to be satisfied, $\xi$ and $\eta$ must be integer multiples of $\pi$. The eigenvalue is $\lambda = \alpha^2$, where $\alpha$ is a root of (2.10). Note that we must choose the correct square root of $\Phi_1/\Phi_2$ in (2.11) (recall that we squared (2.9) to obtain (2.10), introducing additional solutions). The correct sign for $\Phi_1^{1/2}\Phi_2^{1/2}$ is determined by the requirement that (2.9) be satisfied, and this gives the correct sign for $\Phi_1^{1/2}/\Phi_2^{1/2}$ as well.

The frequencies $\xi$ and $\eta$ each range over the values $\pi, 2\pi, \ldots, (N-1)\pi/2$. This gives a set of $(N-1)^2/4$ pairs of frequencies. Corresponding to each pair $(\xi, \eta)$ there are four roots of the quartic (2.10). In general these roots are distinct, and so we obtain $(N-1)^2$ eigenvalues and eigenvectors, the correct number. When the roots are not distinct, principal vectors will be obtained and the number of eigenvectors will be less than $(N-1)^2$. Recall that for the 5-point discretization, a principal vector is associated with the optimal $\omega$ (see, e.g., Young [1971]). We make no attempt here to determine the principal vectors or the values of $\omega$ for which they occur. However, by continuity, we do obtain all of the eigenvalues.

The frequencies $((N-1)/2 + 1)\pi, \ldots, (N-1)\pi$, which one might expect to be included as well, give repeats of the eigenvectors already found. Replacing $\xi$ by $N\pi - \xi$ leaves (2.10) unchanged while replacing $\eta$ by $N\pi - \eta$ simply negates the coefficients of $\alpha$ and $\alpha^3$. In either case the squares of the roots are unchanged. The eigenmodes (2.11) are also unchanged by these frequency reflections.

The convergence rate of the method (for fixed $\omega$) is determined by the spectral radius of the iteration matrix, which is

$$\rho = \max_{\xi,\eta} |\lambda(\xi, \eta)|.$$
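The relationship between the quartic (2.10) and the iteration matrix can be spot-checked numerically. The sketch below assembles the SOR iteration matrix for (1.2) in natural rowwise order on a small grid and compares its spectral radius with the largest $|\alpha|^2$ over the roots of (2.10). The helper names, the dense-matrix construction, and the values `N = 5` (odd, so that $(N-1)/2$ is an integer) and `omega = 1.5` are assumptions chosen for this illustration.

```python
import numpy as np

def sor_matrix_9pt(N, omega):
    """Dense SOR iteration matrix for (1.2), unknowns ordered by rows (NR)."""
    m = N - 1
    A = np.zeros((m * m, m * m))
    for k in range(m):
        for j in range(m):
            row = k * m + j
            for dk in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    jj, kk = j + dj, k + dk
                    if 0 <= jj < m and 0 <= kk < m:
                        if dj == 0 and dk == 0:
                            weight = 20.0      # centre point
                        elif dj == 0 or dk == 0:
                            weight = -4.0      # edge neighbours
                        else:
                            weight = -1.0      # corner neighbours
                        A[row, kk * m + jj] = weight
    D = np.diag(np.diag(A))
    L = -np.tril(A, -1)
    U = -np.triu(A, 1)
    # (D - omega L) u^{n+1} = ((1 - omega) D + omega U) u^n
    return np.linalg.solve(D - omega * L, (1 - omega) * D + omega * U)

def quartic_roots(omega, p, q):
    """Roots alpha of (2.10), with p = cos(xi h) and q = cos(eta h)."""
    w = omega
    coeffs = [
        1.0,
        -(4 * w * q / 5 + 2 * w**2 * p**2 * q / 25),
        -(2 * (1 - w) - 4 * w**2 * q**2 / 25 + w**2 * p**2 * (q**2 + 4) / 25),
        4 * w * (1 - w) * q / 5 - 2 * w**2 * p**2 * q / 25,
        (1 - w) ** 2,
    ]
    return np.roots(coeffs)

def spectral_radius_from_quartic(N, omega):
    """max |alpha|^2 over xi, eta = pi, 2 pi, ..., (N-1) pi / 2 (N odd)."""
    h = 1.0 / N
    freqs = np.pi * np.arange(1, (N - 1) // 2 + 1)
    rho = 0.0
    for xi in freqs:
        for eta in freqs:
            alpha = quartic_roots(omega, np.cos(xi * h), np.cos(eta * h))
            rho = max(rho, np.max(np.abs(alpha)) ** 2)
    return rho

N, omega = 5, 1.5
rho_matrix = np.max(np.abs(np.linalg.eigvals(sor_matrix_9pt(N, omega))))
rho_quartic = spectral_radius_from_quartic(N, omega)
print(rho_matrix, rho_quartic)
```

The two printed values should agree to rounding error if the eigenvalues are indeed $\lambda = \alpha^2$ as stated above.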

To determine the optimal $\omega$, we need to minimize $\rho$ over $\omega$. To help characterize the roots of (2.10), we first solved the quartic numerically for various values of the parameters. For example, the solid lines in Fig. 2.1 (the +'s will be explained later) show the magnitude of the four roots plotted as a function of $\omega$ when $\xi = \eta = \pi$ for $h = 1/10$, $1/100$, and $1/1000$. When $\omega$ is small there are four real roots. As $\omega$ increases, two of the roots become complex conjugates. The optimal $\omega$ occurs when these complex roots intersect the largest real root. As $\omega$ increases further, two more roots become complex conjugates and near $\omega = 2$ there are two complex conjugate pairs. The same behavior was observed for smaller values of $h$ using various values of $\xi$ and $\eta$.

We now show that these observations are correct by determining the optimal $\omega$ and corresponding spectral radius for small $h$. We let $\omega$ have the form

$$\omega = 2 - k_1 h + O(h^2) \tag{2.12}$$

as $h \to 0$. At each value of $h$, $\xi$ and $\eta$ range over $\pi, 2\pi, \ldots, (N-1)\pi/2$ where $N = 1/h$, and so $\cos \xi h$ and $\cos \eta h$ are selected from a set of points in $[0, 1]$ that becomes dense


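as $h \to 0$. At each $h$, the optimal $\omega$ is the one that minimizes the maximum root of (2.10) over all allowable choices of $\xi$ and $\eta$. We will simplify the problem by making the replacement

$$\cos \xi h \to p, \qquad \cos \eta h \to q$$

in (2.10), and proceed to determine the maximum root of the resulting quartic equation over all $p$ and $q$ in $[0, 1]$ for fixed $k_1$. We will find that the maximum root occurs when $p = q = 1$. We will show that this implies that the lowest frequencies $\xi = \eta = \pi$ determine the spectral radius of the iteration matrix as $h \to 0$. Making this replacement in (2.10) yields

$$\alpha^4 - \left[\tfrac{4}{5}\omega q + \tfrac{2}{25}\omega^2 p^2 q\right]\alpha^3 - \left[2(1-\omega) - \tfrac{4}{25}\omega^2 q^2 + \tfrac{1}{25}\omega^2 p^2 (q^2 + 4)\right]\alpha^2 + \left[\tfrac{4}{5}\omega(1-\omega)q - \tfrac{2}{25}\omega^2 p^2 q\right]\alpha + (1-\omega)^2 = 0. \tag{2.13}$$

We wish to analyze the behavior of the roots of this equation as $h \to 0$ with $\omega$ of the form (2.12) and $p$ and $q$ fixed. To determine the limiting behavior we set $h = 0$, $\omega = 2$ in (2.13) to obtain

$$\alpha^4 - \left[\tfrac{8}{5}q + \tfrac{8}{25}qp^2\right]\alpha^3 + \left[2 + \tfrac{16}{25}q^2 - \tfrac{4}{25}p^2(q^2 + 4)\right]\alpha^2 - \left[\tfrac{8}{5}q + \tfrac{8}{25}qp^2\right]\alpha + 1 = 0. \tag{2.14}$$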

In this limit all the roots of (2.14) lie on the unit circle since they must have modulus no greater than one and product equal to $(1-\omega)^2 = 1$. Setting $\alpha = e^{i\theta}$ and multiplying by $e^{-2i\theta}$ reduces (2.14) to a quadratic equation for $\cos \theta$:

$$\cos^2 \theta - \left(\tfrac{4}{5}q + \tfrac{4}{25}qp^2\right)\cos \theta + \tfrac{4}{25}q^2 - \tfrac{1}{25}p^2(q^2 + 4) = 0.$$

This equation has roots

$$\cos \theta_1 = \left(\tfrac{2}{5} + \tfrac{2}{25}p^2\right)q - \tfrac{1}{5}\,p\,\sqrt{\tfrac{13}{5}q^2 + \tfrac{4}{25}p^2q^2 + 4} \tag{2.15a}$$

and

$$\cos \theta_2 = \left(\tfrac{2}{5} + \tfrac{2}{25}p^2\right)q + \tfrac{1}{5}\,p\,\sqrt{\tfrac{13}{5}q^2 + \tfrac{4}{25}p^2q^2 + 4}. \tag{2.15b}$$

The corresponding roots of (2.14) are $e^{\pm i\theta_1}$ and $e^{\pm i\theta_2}$. From (2.15a), $-1 < \cos \theta_1 < 1$ for all $p, q \in [0, 1]$, and hence (2.14) always has at least one pair of nonreal roots. The right-hand side of (2.15b) is a nonnegative strictly increasing function of $p$ and $q$ for $p, q \in [0, 1]$ and gives $\cos \theta_2 = 1$ only for $p = q = 1$. Thus for $p = q = 1$, there are two real roots at $\alpha = 1$, but for all other choices of $p$ and $q$ (2.14) has four nonreal roots.

We now wish to determine the behavior of the roots of (2.13) as $h \to 0$ with $\omega$ of the form (2.12) for some fixed $k_1$ (we will optimize over $k_1$ later). If $(p, q) \neq (1, 1)$, then for sufficiently small $h$, the perturbed equation (2.10) also has four nonreal roots, by continuity. This might also be true for $p = q = 1$ and will be true for $k_1$ sufficiently small, as we will see. For larger $k_1$, two roots are real and this case will be discussed separately below.

We assume for now that (2.13) has two complex pairs of roots, expand them about the limiting values, and look for roots of the form

$$e^{\pm i\theta_1}(1 - \beta_1 h) + O(h^2), \qquad e^{\pm i\theta_2}(1 - \beta_2 h) + O(h^2), \tag{2.16}$$

where $\beta_1$ and $\beta_2$ could be complex but will in fact be real. From (2.13), we see that the product of the roots is $(1-\omega)^2$. Using (2.12) and (2.16) and equating the $O(h)$ terms yields

$$\beta_1 + \beta_2 = k_1. \tag{2.17}$$


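Similarly, the sum of the roots must be $\tfrac{4}{5}\omega q + \tfrac{2}{25}\omega^2 p^2 q$ and again equating the $O(h)$ terms yields

$$\beta_1 \cos \theta_1 + \beta_2 \cos \theta_2 = k_1\left(\tfrac{2}{5}q + \tfrac{4}{25}p^2 q\right). \tag{2.18}$$

If $p \neq 0$ then $\cos \theta_1 \neq \cos \theta_2$ and we can solve (2.17) and (2.18) for $\beta_1$ and $\beta_2$. Doing this and using (2.15a) and (2.15b) we obtain

$$\beta_1 = \frac{k_1}{2}\left(1 - \frac{2pq}{5\sqrt{\tfrac{13}{5}q^2 + \tfrac{4}{25}p^2q^2 + 4}}\right) \tag{2.19a}$$

and

$$\beta_2 = \frac{k_1}{2}\left(1 + \frac{2pq}{5\sqrt{\tfrac{13}{5}q^2 + \tfrac{4}{25}p^2q^2 + 4}}\right). \tag{2.19b}$$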
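The limiting roots (2.15a), (2.15b) and the first-order decay rates (2.19a), (2.19b) can be checked against a direct numerical solution of the quartic (2.13). The following is a minimal sketch; the function names and the sample values of $p$, $q$, $k_1$ and $h$ are arbitrary choices made for this illustration and do not come from the paper.

```python
import numpy as np

def quartic_coeffs(omega, p, q):
    """Coefficients of the quartic (2.13), highest power first."""
    w = omega
    return [
        1.0,
        -(4 * w * q / 5 + 2 * w**2 * p**2 * q / 25),
        -(2 * (1 - w) - 4 * w**2 * q**2 / 25 + w**2 * p**2 * (q**2 + 4) / 25),
        4 * w * (1 - w) * q / 5 - 2 * w**2 * p**2 * q / 25,
        (1 - w) ** 2,
    ]

def radical(p, q):
    """Square root appearing in (2.15) and (2.19)."""
    return np.sqrt(13 * q**2 / 5 + 4 * p**2 * q**2 / 25 + 4)

def cos_thetas(p, q):
    """(cos theta_1, cos theta_2) from (2.15a) and (2.15b)."""
    mid = (2 / 5 + 2 * p**2 / 25) * q
    return mid - p * radical(p, q) / 5, mid + p * radical(p, q) / 5

def betas(p, q, k1):
    """(beta_1, beta_2) from (2.19a) and (2.19b)."""
    shift = 2 * p * q / (5 * radical(p, q))
    return 0.5 * k1 * (1 - shift), 0.5 * k1 * (1 + shift)

# Illustrative sample point (an assumption, not a value used in the paper).
p, q, k1, h = 0.7, 0.4, 1.0, 1e-4

limit_roots = np.roots(quartic_coeffs(2.0, p, q))               # omega = 2, i.e. (2.14)
perturbed_roots = np.roots(quartic_coeffs(2.0 - k1 * h, p, q))  # omega = 2 - k1 h
decay_rates = np.sort((1 - np.abs(perturbed_roots)) / h)        # estimates of beta

print(np.abs(limit_roots))          # moduli of the limiting roots
print(decay_rates, betas(p, q, k1))
```

The first line of output shows whether the limiting roots lie on the unit circle; the second compares the finite-difference estimate $(1 - |\alpha|)/h$ with the closed forms, which should agree up to $O(h)$.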

By continuity, these expressions are also valid in the case $p = 0$.

Since $p$ and $q$ are nonnegative, $\beta_1 \le \beta_2$ and the maximum root of the quartic (2.13) (for $p$, $q$, $k_1$ fixed, assuming all roots are nonreal) has magnitude $1 - \beta_1 h + O(h^2)$ as $h \to 0$. Maximizing this value over $p$ and $q$ in $[0, 1]$, we find that the maximum occurs for $p = q = 1$ where

$$\beta_1 = \tfrac{11}{26}k_1. \tag{2.20}$$

Moreover, if we replace $p$ and $q$ by $\cos \xi h$ and $\cos \eta h$ with $\xi$ and $\eta$ fixed as $h \to 0$ (low frequencies) we obtain $\beta_1 = \tfrac{11}{26}k_1 + O(h)$ from (2.19a) and consequently the value of $\beta_1$ in (2.20) is valid for the discrete problem as well.

We conclude that if we vary $\omega$ as in (2.12) when $h \to 0$, and if the resulting quartic (2.13) has nonreal roots then the spectral radius of the iteration matrix will be $1 - \beta_1 h + O(h^2)$ with $\beta_1$ given by (2.20). But we must still consider the possibility that two of the roots of (2.13) are real. Recall that this can only occur when $p = q = 1$ and that in this case the limiting quartic (2.14) has a double root at $\alpha = 1$. Setting $p = q = 1$ in (2.13) we find that $\alpha = 1$ is in fact a root for all $\omega$. Recall, however, that in practice $p = q = 1$ cannot occur for $h > 0$ and the maximum value that $p$ and $q$ can actually take is $\cos \pi h = 1 - \tfrac{1}{2}\pi^2 h^2 + O(h^4)$. Consequently, we must return to the original quartic (2.10), set

$$\cos \xi h = 1 - \tfrac{1}{2}\xi^2 h^2 + O(h^4), \qquad \cos \eta h = 1 - \tfrac{1}{2}\eta^2 h^2 + O(h^4),$$

and look for a real root of the form

$$\alpha = 1 - c_1 h - c_2 h^2 + O(h^3). \tag{2.21}$$

We take terms out to the $O(h^2)$ term because now the $O(h)$ terms will cancel identically and we obtain an expression for $c_1$ only by equating the coefficients of $h^2$. Also, we must extend the expansion (2.12) as

$$\omega = 2 - k_1 h - k_2 h^2 + O(h^3),$$

although the coefficients $c_2$ and $k_2$ will drop out. Using these expansions in (2.10) and equating coefficients of $h^2$ gives

$$13c_1^2 - 15k_1 c_1 + 9(\xi^2 + \eta^2) = 0.$$

So,

$$c_1 = \frac{15k_1 \pm \sqrt{225k_1^2 - 468(\xi^2 + \eta^2)}}{26}. \tag{2.22}$$


We see that for a given choice of $\xi$ and $\eta$, (2.10) has real roots only if $c_1$ is real, that is, only for

$$k_1^2 \ge \tfrac{468}{225}(\xi^2 + \eta^2). \tag{2.23}$$

The largest real root is obtained by taking the minus sign in (2.22) and choosing $\xi = \eta = \pi$. This gives

$$c_1 = \frac{15k_1 - \sqrt{225k_1^2 - 936\pi^2}}{26}, \tag{2.24}$$

which is real for $k_1 \ge 2.04\pi$.

We must still determine the complex roots in the case where there are two real roots. Take these to be of the form $e^{\pm i\theta}(1 - \beta h) + O(h^2)$. From (2.10), the product of all four roots must be $(1-\omega)^2$. Using this form of the complex roots and (2.21) with $c_1$ given by (2.22) for the two real roots, equating coefficients of $h$ in the product yields

$$-2\beta - \tfrac{30}{26}k_1 = -2k_1.$$

So,

$$\beta = \tfrac{11}{26}k_1.$$

Note that this agrees with the value of $\beta_1$ found previously in (2.20) for the largest root in the case of four complex roots. This indicates that it is the pair of complex roots with smaller magnitude that splits into real roots as the critical value of $k_1$ given by (2.23) is passed. This agrees with what is observed in Fig. 2.1.

We now determine the optimal $\omega$. We need only minimize

$$\max\,(1 - \beta h,\ 1 - c_1 h)$$

where $\beta$ and $c_1$ are given by (2.20) and (2.24), respectively. Since $\beta$ is an increasing function of $k_1$ and $c_1$ is a decreasing function of $k_1$, the minimum occurs where $\beta = c_1$, which gives

$$k_1 = \pi\sqrt{936/209} \approx 2.116\pi.$$

Consequently,

$$c_1 = \beta = \tfrac{11}{26}k_1 \approx 0.895\pi$$

and the optimal $\omega$ and corresponding spectral radius, $\rho = \alpha^2 = 1 - 2c_1 h$, have the values

$$\omega_{\rm opt} = 2 - 2.116\pi h + O(h^2), \qquad \rho_{\rm opt} = 1 - 1.791\pi h + O(h^2) \tag{2.25}$$

as $h \to 0$. For comparison, the corresponding values for the 5-point model problem are

$$\omega_{\rm opt} \approx 2 - 2\pi h, \qquad \rho_{\rm opt} \approx 1 - 2\pi h.$$

Notice that for the 9-point stencil, the spectral radius is slightly larger, giving somewhat slower convergence than for the 5-point stencil, although the two are very close. More important, both are $1 - O(h)$ as $h \to 0$, giving asymptotically the same order of convergence. By contrast the Jacobi and Gauss-Seidel methods, and also the pseudo-SOR method analyzed in § 4, have spectral radii $1 - O(h^2)$ as $h \to 0$.

It is very interesting to compare these results with asymptotic results obtained by Garabedian [1956], especially since they do not agree and yet both are, in a sense, correct. Garabedian's analysis is based on viewing the SOR iteration (2.1) as a finite difference method for a time-dependent PDE. Expanding in Taylor series shows that this difference equation is consistent with the PDE

$$5Cu_t + 2u_{xt} + 3u_{yt} = 3u_{xx} + 3u_{yy}, \qquad u = 0 \ \text{on the boundary}, \tag{2.26}$$


where $C$ and $\omega$ are related by

$$\omega = \frac{2}{1 + Ch}. \tag{2.27}$$

If we fix $C > 0$ and choose $\omega$ according to (2.27) for each $h > 0$, then $0 < \omega < 2$, and so the method (2.1) is stable. Since it is consistent with the linear equation (2.26), iterates $u_{jk}^n$ with $n = T/h$ must converge to solutions $u(x_j, y_k, T)$ of (2.26) as $h \to 0$ (by the Lax Equivalence Theorem) if we choose fixed initial data $u_{jk}^0$ by discretizing $u(x, y, 0)$. Consequently, studying the decay of solutions to (2.26) gives information about the rate of convergence of SOR.

By introducing the change of variables

$$s = t + \frac{x}{3} + \frac{y}{2},$$

(2.26) is transformed to

$$5Cu_s + \tfrac{13}{12}u_{ss} = 3u_{xx} + 3u_{yy}.$$

Separation of variables shows that eigenmodes of this PDE have the form

$$u(x, y, s) = e^{-ps} \sin \xi x \sin \eta y,$$

where $\xi$ and $\eta$ are integer multiples of $\pi$ and $p = p(\xi, \eta)$ is a root of the quadratic equation

$$\tfrac{13}{12}p^2 - 5Cp + 3(\xi^2 + \eta^2) = 0. \tag{2.28}$$

Transforming back to the original time variable gives

$$u(x, y, t) = e^{-p(t + x/3 + y/2)} \sin \xi x \sin \eta y. \tag{2.29}$$

Note that in a time step of length $h$, this solution decays by a factor $e^{-\mathrm{Re}(p)h}$. The eigenmode with slowest decay is obtained by taking $\xi = \eta = \pi$ and the negative square root in solving (2.28), giving

$$p_{\min} = \tfrac{6}{13}\left(5C - \sqrt{25C^2 - 26\pi^2}\right). \tag{2.30}$$

If we obtain initial data for SOR by discretizing the corresponding eigenfunction, $u_{jk}^0 = u(x_j, y_k, 0)$ from (2.29), it follows (by convergence) that the decay factor for the SOR iteration must have the form

$$|\lambda_{\max}| = 1 - \mathrm{Re}(p_{\min})h + O(h^2) \tag{2.31}$$

as $h \to 0$. Since taking other eigenfunctions as initial data gives faster decay, one is led to the conclusion that in order to obtain the fastest possible convergence, we should maximize $\mathrm{Re}(p_{\min})$ and hence minimize $|\lambda_{\max}|$. Recall that the value of $C$ is still at our disposal. We can maximize $\mathrm{Re}(p_{\min})$ by setting the radical to zero in (2.30), giving

$$C = \tfrac{\sqrt{26}}{5}\pi \approx 1.02\pi, \qquad p_{\min} = \frac{6\sqrt{2}}{\sqrt{13}}\pi \approx 2.35\pi.$$

By (2.27) and (2.31), we obtain the following predictions for the optimal $\omega$ and the corresponding decay rate, as in Garabedian [1956]:

$$\omega^*_{\rm opt} = \frac{2}{1 + Ch} \approx 2(1 - Ch) = 2 - 2.04\pi h, \qquad |\lambda^*_{\rm opt}| \approx 1 - p_{\min}h = 1 - 2.35\pi h. \tag{2.32}$$
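The numerical constants in (2.25) and (2.32) follow from a few lines of arithmetic, which the sketch below reproduces. The variable and function names are illustrative; the `max(..., 0.0)` guard under the square root is an implementation detail added here to avoid a rounding-induced negative argument, not part of the analysis.

```python
import math

def c1(k1):
    """Largest-real-root coefficient (2.24), for xi = eta = pi."""
    disc = max(225 * k1**2 - 936 * math.pi**2, 0.0)  # guard against rounding
    return (15 * k1 - math.sqrt(disc)) / 26

# Eigenvalue analysis, (2.25): balance beta = (11/26) k1 against c1(k1).
k1_opt_over_pi = math.sqrt(936 / 209)
c1_opt_over_pi = 11 * k1_opt_over_pi / 26
rho_coeff = 2 * c1_opt_over_pi          # rho_opt = 1 - rho_coeff * pi * h

# Garabedian's PDE analysis, (2.30)-(2.32): radical set to zero.
C_over_pi = math.sqrt(26) / 5
pmin_over_pi = (6 / 13) * 5 * C_over_pi

# Choosing k1 at the threshold (2.23) maximizes c1 and recovers (2.32).
k1_crit = math.pi * math.sqrt(936 / 225)
garabedian_from_c1 = 2 * c1(k1_crit) / math.pi

print(round(k1_opt_over_pi, 3), round(c1_opt_over_pi, 3), round(rho_coeff, 3))
print(round(2 * C_over_pi, 2), round(pmin_over_pi, 2), round(garabedian_from_c1, 2))
```

The last quantity illustrates the remark that follows in the text: maximizing $c_1$ over $k_1$ in (2.24) gives $k_1 = 2C$ and a decay coefficient $2c_1 = p_{\min}$.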


These values do not agree with the values (2.25) found by computing the eigenvalues of the iteration matrix. The reason is the following. While (2.31) does indeed give a correct expression for the largest eigenvalue of the iteration matrix corresponding to a decay factor for the PDE, there are other, spurious, eigenvalues of the discrete problem that have larger magnitude for $\omega$ near 2 and hence determine the spectral radius $\rho$. This is seen clearly in Fig. 2.1, where we have plotted $|e^{-ph}|$ for the two roots of the quadratic (2.28) (as +'s) along with $|\lambda|$, the magnitudes of the actual eigenvalues obtained by solving the quartic (2.10) (as the solid lines). One pair of discrete eigenvalues closely matches $|e^{-ph}|$ for small $h$, while the other (complex conjugate) pair does not. In fact, Garabedian's results may also be obtained from our approach by choosing $k_1$ in (2.24) to maximize $c_1$, thereby ignoring the effect of the other root.

For each fixed $h$, we can choose initial data (namely, a spurious eigenvector) so that convergence is slow and determined by the spectral radius. On the other hand, these spurious eigenvectors become highly oscillatory and do not approach a limit as $h \to 0$. In particular, they do not approach the eigenmodes of the PDE as $h \to 0$. Consequently, if we obtain our initial data by discretizing a fixed function of $x$ and $y$ at each $h$ (as is more realistic in practice), we would expect to see vanishingly small components of these spurious eigenvectors as $h \to 0$. For practical purposes, then, the values (2.32) obtained by Garabedian may be more meaningful and useful than the "true" values (2.25).

This is demonstrated in Fig. 2.2, where we show the decay of $\|u^n\|_2$ for various initial data. For initial data obtained by discretizing the smooth data $u(x, y) = (x^2 - x)(y^2 - y)$, the observed decay is initially much closer to $|\lambda_{\max}|^n$, as predicted by (2.31), than to $\rho^n$. However, as the iteration continues we would expect to see the effect of the spurious eigenvectors take over. With the NR ordering on smooth initial data, this has not yet appeared over the number of iterations used. With different initial data, $u_{ij} = 1$ at all interior points (so that there is a discontinuity at the boundary where $u = 0$, and hence more high frequencies are present), this divergence does occur and the asymptotic slope appears to agree with the spectral radius. This effect is even more visible in Fig. 3.11 where the same experiments are performed for a different ordering. In that case, even with smooth initial data, the asymptotic convergence rate is clearly given by the spectral radius although down to an error level of $10^{-6}$ or so Garabedian's estimate is valid.

To verify that the eigenvector corresponding to the spectral radius is highly oscillatory, we note that by (2.5) and (2.8) an eigenvector has the form

$$w_{jk} = \alpha^{k}\left[(\Phi_1/\Phi_2)^{1/2}\right]^{j} \sin \xi x_j \sin \eta y_k. \tag{2.33}$$

Inserting a complex eigenvalue $\alpha = e^{i\theta}(1 - \beta h) + O(h^2)$ into (2.33) gives

$$w_{jk} = e^{i\theta k}(1 - \beta h)^{k}\left[(\Phi_1/\Phi_2)^{1/2}\right]^{j} \sin \xi x_j \sin \eta y_k.$$

Since $\theta \neq 0$, this function will be oscillatory in $k$ and $j$ and nonconvergent as $h \to 0$. By contrast, the eigenvector corresponding to the other pair of roots converges to the eigenfunctions (2.29) of the PDE as $h \to 0$. For these vectors our previous arguments show that $\lambda$ has the form of (2.31) and hence $\alpha$, the square root of $\lambda$, can be expressed as

$$\alpha = 1 - \tfrac{1}{2}ph + O(h^2). \tag{2.34}$$

Expanding $\Phi_1/\Phi_2$ using this value of $\alpha$ in (2.7) shows that

$$\Phi_1/\Phi_2 = 1 - \tfrac{2}{3}ph + O(h^2). \tag{2.35}$$


FIG. 2.2. Convergence history (2-norm of error versus iteration number) for the NR ordering with $h = 0.05$ and $\omega = 1.86$. Three choices of initial data are compared: (a) An eigenvector corresponding to the spectral radius. (b) An eigenvector corresponding to the largest nonspurious eigenvalue. (c) $u_{jk}^0 = (x_j^2 - x_j)(y_k^2 - y_k)$. (d) $u^0 \equiv 1$. [Plot not reproduced; horizontal axis: iteration number.]

Putting (2.34) and (2.35) into (2.11) gives

$$\begin{aligned} u_{jk}^{n} &= \left(1 - \tfrac{1}{2}ph + O(h^2)\right)^{2n+k}\left(1 - \tfrac{1}{3}ph + O(h^2)\right)^{j} \sin \xi jh \,\sin \eta kh \\ &= \left(1 - ph + O(h^2)\right)^{n + j/3 + k/2} \sin \xi jh \,\sin \eta kh. \end{aligned}$$

As $h \to 0$ with $t = nh$, $x = jh$ and $y = kh$ fixed, this approaches the eigenmode (2.29) of the PDE. (See note added in proof.)

3. Multicolor SOR. In this section we consider the SOR method applied to the 9-point model problem with several alternative orderings of the unknowns $u_{jk}$. These orderings are determined by labeling the grid points with four different colors (Red, Black, Green, and Orange) and then ordering the points by first listing all the points of one color, then a second color, and so on. The overall ordering of grid points is determined by two factors: (a) the manner in which the grid points are labeled (the coloring of the grid) and (b) the order in which the colors are taken (the ordering of the colors).

Four-colorings are of interest for the 9-point stencil because with four colors it is possible to decouple the grid, in the sense that the resulting SOR formula for updating a grid point of any given color involves neighboring grid points, all of which have different colors than the center point. This is advantageous in parallel processing applications since all grid points of the same color can be updated simultaneously.

For the 5-point stencil, two colors suffice to decouple the grid using the RB checkerboard pattern discussed in § 1. Recently LeVeque and Trefethen [1986] have presented an easy way to analyze the 5-point model problem using a change of variables from the iteration number, $n$, to the earliest time, $\nu$, that the unknown at a grid point can be updated assuming one update requires one time unit. This variable $\nu$ corresponds to the "data flow times" discussed in Adams and Jordan [1986] and closely resembles the change of variables used by Garabedian [1956] to analyze the PDE. This change of variables allows the use of Fourier analysis to determine the convergence rate and the optimal $\omega$. It also gives a straightforward proof of the equivalence of the NR and RB orderings.


Here we use this same approach to analyze the four-color orderings for the 9-point stencil. Before introducing these orderings, we briefly review the analysis for the 5-point NR and RB orderings to introduce notation and motivate our 9-point analysis.

For the 5-point model problem (1.1) with the NR ordering, the SOR iteration takes the form

$$u_{jk}^{n+1} = (1-\omega)u_{jk}^{n} + \frac{\omega}{4}\left(u_{j-1,k}^{n+1} + u_{j,k-1}^{n+1} + u_{j,k+1}^{n} + u_{j+1,k}^{n}\right). \tag{3.1}$$

The stencil for updating a grid point on iteration $n + 1$ using the NR ordering is given in Fig. 3.1. To assist in determining the change of variables, the earliest times at which an unknown can be updated on the first two iterations using the stencil in Fig. 3.1 are listed below each node in Fig. 3.2. These times define the iteration variable $\nu$. Each node in Fig. 3.2 is updated at time $\nu + 1$ by the stencil shown in Fig. 3.3, and the corresponding SOR iteration is

$$u_{jk}^{\nu+1} = (1-\omega)u_{jk}^{\nu-1} + \frac{\omega}{4}\left(u_{j-1,k}^{\nu} + u_{j,k-1}^{\nu} + u_{j,k+1}^{\nu} + u_{j+1,k}^{\nu}\right). \tag{3.2}$$

Figure 3.2 shows that the times are constant along lines $j + k = \text{const}$ and that iterations in variable $n$ occur every two time units. Hence, the proper change of variables between (3.1) and (3.2) is

$$\nu = 2n + j + k - 2. \tag{3.3}$$

The advantage of this change of variables is that the eigenmodes of (3.2) are easy to determine. They are simply Fourier modes of the form

$$u_{jk}^{\nu} = g^{\nu} \sin \xi x_j \sin \eta y_k. \tag{3.4}$$

These grid functions satisfy the boundary conditions provided that $\xi$ and $\eta$ are integer multiples of $\pi$, and substituting into (3.2) gives the following equation for $g$:

$$g^2 = (1-\omega) + \frac{\omega}{2}\,g\,(\cos \xi h + \cos \eta h). \tag{3.5}$$

FIG. 3.1. NR stencil in variable n. [Stencil diagram not reproduced.]

4,6  5,7  6,8  7,9
3,5  4,6  5,7  6,8
2,4  3,5  4,6  5,7
1,3  2,4  3,5  4,6

FIG. 3.2. Times for two iterations of NR.

FIG. 3.3. NR stencil in variable $\nu$. [Stencil diagram not reproduced.]
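The effect of the change of variables (3.3) on the NR iteration (3.1) can be made concrete with a short calculation. The sketch below evaluates $\nu$ for every value appearing on the right-hand side of (3.1) and reports it relative to the time of the point being updated. The function names and dictionary labels are made up for this illustration.

```python
def nu(n, j, k):
    """Data flow time (3.3) for the natural rowwise ordering."""
    return 2 * n + j + k - 2

def nr_time_offsets(n, j, k):
    """Times of the values used in (3.1), relative to the update of u^{n+1}_{jk}."""
    target = nu(n + 1, j, k)
    used = {
        "center (iterate n)": nu(n, j, k),
        "west (iterate n+1)": nu(n + 1, j - 1, k),
        "south (iterate n+1)": nu(n + 1, j, k - 1),
        "east (iterate n)": nu(n, j + 1, k),
        "north (iterate n)": nu(n, j, k + 1),
    }
    return {name: t - target for name, t in used.items()}

for name, offset in nr_time_offsets(n=3, j=4, k=5).items():
    print(f"{name:22s} {offset:+d}")
```

All four neighbours land one time level below the updated point and the old centre value two levels below, independently of $n$, $j$ and $k$, which is the two-step structure of (3.2).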


For each $\xi$ and $\eta$, this equation has two solutions, giving two eigenmodes. We obtain all modes by letting $\xi$ and $\eta$ range over the values $\xi, \eta = \pi, 2\pi, \ldots, (N-1)\pi$, for a total of $2(N-1)^2$ modes. This is the correct number since (3.2) requires two previous levels of data to calculate $u_{jk}^{\nu+1}$.

By the change of variables (3.3), an eigenvalue $\lambda$ of (3.1) is $g^2$ and the corresponding eigenvector has components $g^{j+k} \sin \xi x_j \sin \eta y_k$. We now appear to have twice as many eigenmodes as required for (3.1), but here each mode is repeated since replacing $(\xi, \eta)$ by $(N\pi - \xi, N\pi - \eta)$ simply negates the roots of (3.5). Since $\lambda = g^2$ this reflection leaves these eigenvalues unchanged. The eigenvector is also unchanged since

$(-g)^{j+k} \sin (N\pi - \xi)x_j \, \sin (N\pi - \eta)y_k = g^{j+k} \sin \xi x_j \sin \eta y_k.$

Equation (3.5) gives the famous relationship between an eigenvalue $\lambda$ of SOR and an eigenvalue $\mu = \frac{1}{2}(\cos \xi h + \cos \eta h)$ of the Jacobi iteration:

(3.6)  $\frac{\lambda + \omega - 1}{\omega} = \lambda^{1/2} \mu.$
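As a small numerical illustration of (3.5) and (3.6), the sketch below solves the quadratic for $g$ given a Jacobi eigenvalue $\mu$ and returns the SOR eigenvalues $\lambda = g^2$. The function names are illustrative and are not taken from the paper. The closed form $\omega = 2/(1 + \sqrt{1 - \mu^2})$ mentioned in the usage note is the standard result of Young's theory for consistently ordered matrices, quoted here rather than derived.

```python
import cmath
import math

def sor_eigenvalues(mu, omega):
    """Return the two SOR eigenvalues lambda = g**2 tied to Jacobi eigenvalue mu.

    g solves (3.5) rewritten as g**2 - omega*mu*g + (omega - 1) = 0.
    """
    disc = cmath.sqrt((omega * mu) ** 2 - 4.0 * (omega - 1.0))
    g_plus = (omega * mu + disc) / 2.0
    g_minus = (omega * mu - disc) / 2.0
    return g_plus ** 2, g_minus ** 2

def spectral_radius_5pt(omega, N):
    """Largest |lambda| over all Fourier modes of the 5-point model problem, h = 1/N."""
    h = 1.0 / N
    rho = 0.0
    for p in range(1, N):
        for q in range(1, N):
            mu = 0.5 * (math.cos(p * math.pi * h) + math.cos(q * math.pi * h))
            rho = max(rho, max(abs(lam) for lam in sor_eigenvalues(mu, omega)))
    return rho
```

With $\omega = 1$ this returns the Gauss-Seidel value $\cos^2 \pi h$, and with $\omega = 2/(1 + \sin \pi h)$ it returns $\omega - 1$ up to rounding, which is the familiar behavior at the optimal relaxation factor.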

Equation (3.6) can be used to determine the optimal value of $\omega$ as a function of $\mu$ (see, e.g., Young [1971]) and this derivation will not be repeated here.

A similar analysis can be carried out for the Red/Black ordering of the grid points. In the variable $n$ there are two stencils, one for the Red nodes and one for the Black ones, as given in Fig. 3.4. The corresponding iteration is

(3.7)
R:  $u_{jk}^{n+1} = (1-\omega)\,u_{jk}^{n} + \frac{\omega}{4}\left(u_{j,k+1}^{n} + u_{j,k-1}^{n} + u_{j-1,k}^{n} + u_{j+1,k}^{n}\right),$
B:  $u_{jk}^{n+1} = (1-\omega)\,u_{jk}^{n} + \frac{\omega}{4}\left(u_{j,k+1}^{n+1} + u_{j,k-1}^{n+1} + u_{j-1,k}^{n+1} + u_{j+1,k}^{n+1}\right).$

The earliest times corresponding to (3.7) are given in Fig. 3.5. In the data flow variable, $\nu$, both R and B nodes have the same stencil, namely, the stencil of Fig. 3.3. Equation (3.2) gives the update formula for all nodes. Hence, (3.5) and (3.6) also hold for the RB ordering. Figure 3.5 shows that the times along all lines with $j + k$ even are equal,
         n                  n+1
      n  n  n          n+1   n   n+1
         n                  n+1
        Red                Black

FIG. 3.4. Red/Black stencil in variable n.

    B 2,4   R 1,3   B 2,4   R 1,3
    R 1,3   B 2,4   R 1,3   B 2,4
    B 2,4   R 1,3   B 2,4   R 1,3
    R 1,3   B 2,4   R 1,3   B 2,4

FIG. 3.5. Times for two iterations of RB.


and similarly for $j + k$ odd, and that iterations in variable $n$ occur every two time units in $\nu$. Therefore, the change of variables is

$\nu = 2n + [(j + k) \bmod 2] - 1,$

and the eigenvector components of this iteration corresponding to $\lambda$ are

$\{\sin \xi x_j \sin \eta y_k,\ \ \lambda^{1/2} \sin \xi x_j \sin \eta y_k\}$

for the R and B nodes, respectively. This agrees with the result given in Young [1954].

We now turn to the 9-point stencil, with orderings based on four colors. The results of Adams and Jordan [1986] applied to this model problem show that the 72 distinct four-color orderings can be grouped into six equivalence classes with regard to convergence behavior (see Fig. 3.6). Representative orderings from each of these classes are:

Ordering #1: The grid is colored as in Fig. 3.6(a) with ordering R/B/G/O.
Ordering #2: The grid is colored as in Fig. 3.6(b) with ordering R/B/G/O.
Ordering #3: The grid is colored as in Fig. 3.6(b) with ordering R/B/O/G.
Ordering #4: The grid is colored as in Fig. 3.6(b) with ordering R/G/B/O.
Ordering #5: The grid is colored as in Fig. 3.6(a) with ordering R/B/O/G.
Ordering #6: The grid is colored as in Fig. 3.6(a) with ordering R/G/B/O.

    G  O  R  B          G  O  G  O
    R  B  G  O          R  B  R  B
    G  O  R  B          G  O  G  O
    R  B  G  O          R  B  R  B

    FIG. 3.6(a)         FIG. 3.6(b)

We will show that Orderings #1, #2, and #4 are in fact equivalent to the NR ordering discussed in § 2. Ordering #3 is different, however, and gives slightly slower convergence based on the spectral radius and slightly faster convergence based on the eigenvalue that dominates the iteration for smooth initial data. We have not been able to analyze either Ordering #5 or Ordering #6.

Ordering #1. Figure 3.7 shows the update times for this ordering, which define the data flow variable $\nu$. In this variable, each node has the same stencil with update formula
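(3.8)  $u_{jk}^{\nu+4} = (1-\omega)\,u_{jk}^{\nu} + \frac{\omega}{5}\left(u_{j-1,k}^{\nu+3} + u_{j+1,k}^{\nu+1} + u_{j,k+1}^{\nu+2} + u_{j,k-1}^{\nu+2}\right) + \frac{\omega}{20}\left(u_{j-1,k-1}^{\nu+1} + u_{j-1,k+1}^{\nu+1} + u_{j+1,k-1}^{\nu+3} + u_{j+1,k+1}^{\nu+3}\right).$

    G 3,7   O 4,8   R 1,5   B 2,6
    R 1,5   B 2,6   G 3,7   O 4,8
    G 3,7   O 4,8   R 1,5   B 2,6
    R 1,5   B 2,6   G 3,7   O 4,8

FIG. 3.7. Times for two iterations with Ordering #1.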


The change of variables is given by

(3.9)  $\nu = 4n + c - 3,$

where $c = 0, 1, 2, 3$ for the R, B, G, and O nodes of Fig. 3.7, respectively.

To see that this ordering is equivalent to the NR ordering, we only need to look at the update times for the NR ordering, shown in Fig. 3.8.

    7,11   8,12   9,13   10,14
    5,9    6,10   7,11   8,12
    3,7    4,8    5,9    6,10
    1,5    2,6    3,7    4,8

FIG. 3.8. Times for the 9-point NR ordering.
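The earliest update times in Figs. 3.2 and 3.8 can be reproduced by a simple data-flow calculation: a node is updated one time unit after the latest value it depends on. The sketch below is an illustration only; the function name, the offset lists, and the convention that initial and boundary data are available at time 0 are assumptions made for this example, not notation from the paper.

```python
def earliest_times(N, sweeps, new_deps, old_deps):
    """Earliest update time of every interior node for each rowwise sweep.

    new_deps: offsets (dj, dk) of neighbors whose current-sweep value is used.
    old_deps: offsets (dj, dk) of neighbors whose previous-sweep value is used.
    The node's own previous value is always a dependency.
    Returns a list (one entry per sweep) of tables indexed as table[k][j].
    """
    n = N - 1                                      # interior nodes per direction
    prev = [[0] * (n + 2) for _ in range(n + 2)]   # padded with boundary zeros
    history = []
    for _ in range(sweeps):
        cur = [[0] * (n + 2) for _ in range(n + 2)]
        for k in range(1, n + 1):                  # rows, bottom to top
            for j in range(1, n + 1):              # left to right within a row
                t = prev[k][j]
                for dj, dk in new_deps:
                    t = max(t, cur[k + dk][j + dj])
                for dj, dk in old_deps:
                    t = max(t, prev[k + dk][j + dj])
                cur[k][j] = t + 1
        history.append(cur)
        prev = cur
    return history

# Natural rowwise ordering: west and south neighbors are already updated.
NR5_NEW, NR5_OLD = [(-1, 0), (0, -1)], [(1, 0), (0, 1)]
NR9_NEW = [(-1, 0), (0, -1), (-1, -1), (1, -1)]
NR9_OLD = [(1, 0), (0, 1), (-1, 1), (1, 1)]
```

For a 4 by 4 interior grid (N = 5), the 5-point lists give times $j + k - 1$ and $j + k + 1$ on the first two sweeps, and the 9-point lists give $j + 2k - 2$ and $j + 2k + 2$, in agreement with the entries of Figs. 3.2 and 3.8.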

If we define the variable $\nu$ by these times, we again obtain iteration (3.8) with the change of variables

(3.10)  $\nu = 4n + 2k + j - 6.$

Since a change of variables gives (3.8) in either case, the eigenvalues and hence convergence behavior will be identical. If $g$ is an eigenvalue of (3.8), then $\lambda = g^4$ is an eigenvalue of both NR and Ordering #1.

The eigenvectors, however, will not be the same, since a different change of variables is used in each case. The eigenvectors for the NR ordering were determined in § 2. Using these and the above changes of variables allows us to determine the eigenvectors for Ordering #1. In analyzing (3.8) we view it as applying to all mesh points $(j, k)$ at each level of $\nu$, although in our applications it is applied only to points of a single color in each step. (Note that we could apply it to all points without affecting the results, but the work required would be increased by a factor of 4.) Since (3.8) requires four levels of prior data to determine $u_{jk}^{\nu+4}$, an eigenvector of (3.8) consists of $4(N-1)^2$ values,

$V = [V^0, V^1, V^2, V^3]^T \in \mathbb{R}^{4(N-1)^2},$

where $V^{\nu} = \{V_{jk}^{\nu}\}$ for $j, k = 1, 2, \ldots, N-1$. If $g$ is the corresponding eigenvalue, then

$[V^1, V^2, V^3, V^4]^T = g\,[V^0, V^1, V^2, V^3]^T.$

This indicates that $V^{\nu+1} = gV^{\nu}$ for each $\nu$, and hence

$V = [V^0, gV^0, g^2V^0, g^3V^0]^T.$


If we now let $V_R^0$ be the vector consisting only of the values $V_{jk}^0$ for which $(j, k)$ is a red point, and similarly for $V_B^0$, $V_G^0$ and $V_O^0$, then an eigenvector of the original iteration in the $n$ variable (with Ordering #1) has the form

(3.11)  $V^{(\#1)} = [V_R^0,\ gV_B^0,\ g^2V_G^0,\ g^3V_O^0]^T \in \mathbb{R}^{(N-1)^2}$

with eigenvalue $\lambda = g^4$. On the other hand, by the change of variables (3.10), an eigenvector for the NR ordering in the $n$ variable has the form

$V_{jk}^{(NR)} = g^{2k+j}\, V_{jk}^0,$

again with eigenvalue $\lambda = g^4$. Equation (2.11) gives the eigenvector for the NR ordering,

$V_{jk}^{(NR)} = \alpha^k \left[(\Phi_1/\Phi_2)^{1/2}\right]^j \sin \xi x_j \sin \eta y_k,$

where $\Phi_1$ and $\Phi_2$ are given by (2.7) and $\alpha = \lambda^{1/2} = g^2$. Using this we obtain

$V_{jk}^0 = g^{-2k-j}\, V_{jk}^{(NR)} = g^{-j} \left[(\Phi_1/\Phi_2)^{1/2}\right]^j \sin \xi x_j \sin \eta y_k.$

The eigenvectors are now determined by (3.11) with $\xi, \eta = \pi, 2\pi, \ldots, (N-1)(\pi/2)$. As before, the frequencies $\xi, \eta = ((N-1)/2 + 1)\pi, \ldots, (N-1)\pi$ give repeats of the eigenvalue-eigenvector pairs already found.

Ordering #2. For Ordering #2, the associated earliest times for the first two iterations are given in Fig. 3.9. An inspection of Fig. 3.9 shows that the R and G nodes have the same stencil in the variable $\nu$, with the following update formula:

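R, G:
(3.12)  $u_{jk}^{\nu+4} = (1-\omega)\,u_{jk}^{\nu} + \frac{\omega}{5}\left(u_{j-1,k}^{\nu+1} + u_{j+1,k}^{\nu+1} + u_{j,k-1}^{\nu+2} + u_{j,k+1}^{\nu+2}\right) + \frac{\omega}{20}\left(u_{j-1,k-1}^{\nu+3} + u_{j-1,k+1}^{\nu+3} + u_{j+1,k-1}^{\nu+3} + u_{j+1,k+1}^{\nu+3}\right).$

Likewise, the B and O nodes have the same stencil with update formula:

B, O:
(3.13)  $u_{jk}^{\nu+4} = (1-\omega)\,u_{jk}^{\nu} + \frac{\omega}{5}\left(u_{j-1,k}^{\nu+3} + u_{j+1,k}^{\nu+3} + u_{j,k-1}^{\nu+2} + u_{j,k+1}^{\nu+2}\right) + \frac{\omega}{20}\left(u_{j-1,k-1}^{\nu+1} + u_{j-1,k+1}^{\nu+1} + u_{j+1,k-1}^{\nu+1} + u_{j+1,k+1}^{\nu+1}\right).$

    G 3,7   O 4,8   G 3,7   O 4,8
    R 1,5   B 2,6   R 1,5   B 2,6
    G 3,7   O 4,8   G 3,7   O 4,8
    R 1,5   B 2,6   R 1,5   B 2,6

FIG. 3.9. R/B/G/O times for Ordering #2.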


The change of variables from $n$ to $\nu$ is again given by (3.9), where $c = 0, 1, 2$ and $3$ for the R, B, G and O equations, respectively.

Now, the equations in (3.12) and (3.13) have the symmetry needed to verify that $V_{jk}^0 = \sin \xi x_j \sin \eta y_k$. Thus the methods in (3.12) and (3.13) have the same value for $V_{jk}^0$ but different amplification factors, say $g_1$ and $g_2$. A single step of the full method, in variable $n$, consists of four sweeps, two with (3.12) and two with (3.13), and hence has amplification factor

$\lambda = g_1^2 g_2^2.$

In order to determine $\lambda$, we find $\alpha = g_1 g_2$ by considering two sweeps of the method, Red and Black, say. We substitute the following values

(3.14)
$u_{jk}^{\nu} = e^{i(\xi x_j + \eta y_k)},$
$u_{jk}^{\nu+1} = g_2\, u_{jk}^{\nu},$
$u_{jk}^{\nu+2} = g_1 g_2\, u_{jk}^{\nu},$
$u_{jk}^{\nu+3} = g_1 g_2^2\, u_{jk}^{\nu},$
$u_{jk}^{\nu+4} = g_1^2 g_2^2\, u_{jk}^{\nu},$
$u_{jk}^{\nu+5} = g_1^2 g_2^3\, u_{jk}^{\nu}$

into (3.12) to get

(3.15)  $g_1^2 g_2^2 = (1-\omega) + \frac{\omega}{5}\left(2g_2 \cos \xi h + 2g_1g_2 \cos \eta h\right) + \frac{\omega}{5}\, g_1 g_2^2 \cos \xi h \cos \eta h.$

Next, we take a step with (3.13) to update the Black nodes, and obtain $u_{jk}^{\nu+5}$. Putting (3.14) into (3.13) for $\nu + 5$ we get

(3.16)  $g_1^2 g_2^2 = (1-\omega) + \frac{\omega}{5}\left(2g_1^2 g_2 \cos \xi h + 2g_1g_2 \cos \eta h\right) + \frac{\omega}{5}\, g_1 \cos \xi h \cos \eta h.$

Equations (3.15) and (3.16) can be equated and $g_1$ and $g_2$ eliminated to give a quartic equation for $\alpha$. Surprisingly, this quartic is again (2.10), the quartic obtained in our analysis of the NR ordering by separation of variables. This shows that this ordering is equivalent to the NR ordering, and hence is also equivalent to Ordering #1.

The eigenvectors in the variable $n$ can be seen from (3.9) and (3.14) to be

$V^{(\#2)} = [V_R^0,\ g_2V_B^0,\ g_1g_2V_G^0,\ g_1g_2^2V_O^0]^T \in \mathbb{R}^{(N-1)^2},$

where $V_{jk}^0 = \sin \xi x_j \sin \eta y_k$, $\xi, \eta = \pi, 2\pi, \ldots, (N-1)(\pi/2)$ with eigenvalue $\lambda = g_1^2 g_2^2$. Again, we find that eigenvalue-eigenvector pairs are repeated for the frequencies $((N-1)/2 + 1)\pi, \ldots, (N-1)\pi$.

Ordering #3. For this ordering, the earliest times for the first two iterations are given in Fig. 3.10. The R and O nodes have the same stencil in the variable $\nu$ with the following update formula:

R, O:
(3.17)  $u_{jk}^{\nu+4} = (1-\omega)\,u_{jk}^{\nu} + \frac{\omega}{5}\left(u_{j,k+1}^{\nu+3} + u_{j,k-1}^{\nu+3} + u_{j+1,k}^{\nu+1} + u_{j-1,k}^{\nu+1}\right) + \frac{\omega}{20}\left(u_{j-1,k-1}^{\nu+2} + u_{j-1,k+1}^{\nu+2} + u_{j+1,k-1}^{\nu+2} + u_{j+1,k+1}^{\nu+2}\right).$

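    G 4,8   O 3,7   G 4,8   O 3,7
    R 1,5   B 2,6   R 1,5   B 2,6
    G 4,8   O 3,7   G 4,8   O 3,7
    R 1,5   B 2,6   R 1,5   B 2,6

FIG. 3.10. R/B/O/G times for Ordering #3.

Likewise, the B and G nodes are updated by the formula

B, G:
(3.18)  $u_{jk}^{\nu+4} = (1-\omega)\,u_{jk}^{\nu} + \frac{\omega}{5}\left(u_{j,k+1}^{\nu+1} + u_{j,k-1}^{\nu+1} + u_{j+1,k}^{\nu+3} + u_{j-1,k}^{\nu+3}\right) + \frac{\omega}{20}\left(u_{j-1,k-1}^{\nu+2} + u_{j-1,k+1}^{\nu+2} + u_{j+1,k-1}^{\nu+2} + u_{j+1,k+1}^{\nu+2}\right).$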

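Substituting (3.14) into (3.17) and (3.18) yields

(3.19)  $g_1^2 g_2^2 = (1-\omega) + \frac{2\omega}{5}\left(g_1 g_2^2 \cos \eta h + g_2 \cos \xi h\right) + \frac{\omega}{5}\, g_1 g_2 \cos \eta h \cos \xi h$

and

(3.20)  $g_1^2 g_2^2 = (1-\omega) + \frac{2\omega}{5}\left(g_1 \cos \eta h + g_1^2 g_2 \cos \xi h\right) + \frac{\omega}{5}\, g_1 g_2 \cos \xi h \cos \eta h,$

respectively. As before, we equate (3.19) and (3.20), eliminate $g_1$ and $g_2$, and get the following quartic in $\alpha = g_1 g_2$:

(3.21)
$\alpha^4 - \left(\frac{2\omega}{5} \cos \eta h \cos \xi h + \frac{4\omega^2}{25} \cos \eta h \cos \xi h\right)\alpha^3$
$\quad + \left(\frac{\omega^2}{25} \cos^2 \eta h \cos^2 \xi h + 2(\omega - 1) - \frac{4\omega^2}{25} \cos^2 \eta h - \frac{4\omega^2}{25} \cos^2 \xi h\right)\alpha^2$
$\quad + \left(\frac{2\omega}{5}(1-\omega) \cos \eta h \cos \xi h - \frac{4\omega^2}{25} \cos \xi h \cos \eta h\right)\alpha + (\omega - 1)^2 = 0.$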
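One way to carry out the elimination that leads to (3.21) is sketched below as a reading aid. The shorthand $c_\xi = \cos \xi h$, $c_\eta = \cos \eta h$ and the auxiliary quantity $K$ are introduced only for this sketch and do not appear in the paper. The two relations for $K$ come from isolating the terms of (3.19) and (3.20) that depend on $g_1$ and $g_2$ only through $\alpha = g_1 g_2$.

```latex
\begin{align*}
K &:= \alpha^2 - (1-\omega) - \tfrac{\omega}{5}\,\alpha\, c_\xi c_\eta, \\
\text{(3.19)} &\;\Longrightarrow\; K = \tfrac{2\omega}{5}\, g_2\,\bigl(c_\xi + \alpha\, c_\eta\bigr), \\
\text{(3.20)} &\;\Longrightarrow\; K = \tfrac{2\omega}{5}\, g_1\,\bigl(\alpha\, c_\xi + c_\eta\bigr), \\
K^2 &= \tfrac{4\omega^2}{25}\,\alpha\,\bigl(c_\xi + \alpha c_\eta\bigr)\bigl(\alpha c_\xi + c_\eta\bigr)
     = \tfrac{4\omega^2}{25}\Bigl[c_\xi c_\eta\,\alpha^3 + \bigl(c_\xi^2 + c_\eta^2\bigr)\alpha^2 + c_\xi c_\eta\,\alpha\Bigr].
\end{align*}
```

Multiplying the two expressions for $K$ removes $g_1$ and $g_2$ individually, since only the product $g_1 g_2 = \alpha$ survives. Expanding $K^2$ and collecting powers of $\alpha$ then reproduces the coefficients of (3.21), including the constant term $(\omega - 1)^2$.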

The change of variables from $n$ to $\nu$ is again given by (3.9), where $c = 0, 1, 2,$ and $3$ correspond to the R, B, O, and G equations, respectively. Following the same arguments as before, we find that the eigenvectors in variable $n$ for Ordering #3 are given by

$V^{(\#3)} = [V_R^0,\ g_2V_B^0,\ g_1g_2V_O^0,\ g_1g_2^2V_G^0]^T \in \mathbb{R}^{(N-1)^2},$

where $V_{jk}^0 = \sin \xi x_j \sin \eta y_k$, $\xi, \eta = \pi, 2\pi, \ldots, (N-1)(\pi/2)$ and the eigenvalue is $\lambda = g_1^2 g_2^2$.

Again, we find that eigenvalue-eigenvector pairs are repeated for $\xi, \eta = ((N-1)/2 + 1)\pi, \ldots, (N-1)\pi$. This can be seen for the eigenvalues from (3.21) since replacing $(\xi, \eta)$ by $(N\pi - \xi, N\pi - \eta)$ leaves the quartic unchanged and replacing $(\xi, \eta)$ by either $(N\pi - \xi, \eta)$ or $(\xi, N\pi - \eta)$ negates the coefficients of the $\alpha$ and $\alpha^3$ terms.

The quartic in (3.21) does not agree with the quartic in (2.10), and the roots do not agree in general. Consequently, the optimal $\omega$ and corresponding convergence rate for this ordering are different than for the other orderings considered so far.


Numerical results show that the roots of (3.21) have the same qualitative behavior as shown in Fig. 2.1 with $\xi = \eta = \pi$ giving the slowest decay. The optimal $\omega$ is also seen to occur where the largest real root and the complex root of largest modulus intersect for the frequencies $\xi = \eta = \pi$. We confirm these observations by performing an identical analysis to that given in § 2 for the rowwise ordering. We summarize our results below.

The complex root of (3.21) with largest modulus for small $h$ occurs when $\xi = \eta = \pi$ and is

(3.22)  $e^{i\theta_1}\left(1 - \tfrac{3}{8}k_1 h\right) + O(h^2),$

where $\theta_1 = \cos^{-1}(-7/25)$ and $k_1$ parametrizes the relaxation factor through $\omega = 2 - k_1 h$. The largest real root of (3.21) for small $h$ also occurs when $\xi = \eta = \pi$ and is

(3.23)  $1 - c_1 h + O(h^2),$

where

(3.24)  $c_1 = \frac{5k_1 - \sqrt{25k_1^2 - 96\pi^2}}{8}.$

The optimal $\omega$ occurs when the modulus of the root in (3.22) equals the modulus of the root in (3.23). This value of $\omega$ is

(3.25)  $\omega_{opt} = 2 - 4\sqrt{2/7}\,\pi h + O(h^2) \approx 2 - 2.138\pi h.$

The corresponding value of the spectral radius is

(3.26)  $\rho_{opt} = 1 - 2c_1 h + O(h^2) \approx 1 - 1.604\pi h.$

This RBOG SOR iteration was programmed and the results of (3.25) and (3.26) confirmed. By comparing (2.25) and (3.26), we see that different evaluation orderings for the same coloring of the grid points can lead to different asymptotic convergence rates, based on the spectral radius.

The eigenvector associated with the spectral radius is highly oscillatory as the mesh is refined (recall that this was true for the rowwise ordering also). An analysis similar to Garabedian's can be performed to find the convergence behavior for smooth initial data. To do this, we choose $k_1$ to maximize $c_1$ in (3.24) to get

$k_1 = \frac{4\sqrt{6}}{5}\,\pi \approx 1.95959\pi$  and  $2c_1 = \sqrt{6}\,\pi \approx 2.44949\pi.$

The corresponding values of $\rho^*_{opt}$ and $\omega^*_{opt}$ are

(3.27)  $\rho^*_{opt} = 1 - 2.44949\pi h, \qquad \omega^*_{opt} = 2 - 1.95959\pi h.$

The values in (3.27) show that for smooth initial data this ordering is preferred over the rowwise ordering and Orderings #1 and #2. Note that the values in (3.25) and (3.26), based on the true spectral radius, lead to the opposite conclusion. However, the results of (3.25) and (3.26) are valid for nonsmooth initial data. Figure 3.11 shows the decay of $\|u\|_2$ for various initial data. For initial data obtained by discretizing the smooth data $u(x, y) = (x^2 - x)(y^2 - y)$, the observed decay is initially much closer to that predicted by (3.27) than to $\rho_{opt}$.

Ordering #4. This ordering leads to a quartic equivalent to (2.10) with $\xi$ and $\eta$ interchanged. Hence, for a square grid with stepsize $h$ in both the $x$ and $y$ directions, $\omega_{opt}$ and $\rho_{opt}$ for this ordering are also given by (2.25).
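The asymptotic statements for Ordering #3 can be checked numerically against the quartic (3.21). The sketch below is an illustration under stated assumptions: the function names are hypothetical, only the frequency $\xi = \eta = \pi$ is examined (the text reports that it gives the slowest decay), and the roots are found with a plain Durand-Kerner iteration, which is a convenience chosen for this example and not the authors' procedure.

```python
import math

def quartic_coeffs(omega, cx, cy):
    """Coefficients [1, a3, a2, a1, a0] of (3.21), highest power of alpha first.

    cx = cos(xi*h), cy = cos(eta*h).
    """
    w, p = omega, cx * cy
    a3 = -(2.0 * w / 5.0 + 4.0 * w * w / 25.0) * p
    a2 = (w * w / 25.0) * p * p + 2.0 * (w - 1.0) - (4.0 * w * w / 25.0) * (cx * cx + cy * cy)
    a1 = (2.0 * w / 5.0 * (1.0 - w) - 4.0 * w * w / 25.0) * p
    a0 = (w - 1.0) ** 2
    return [1.0, a3, a2, a1, a0]

def polyval(coeffs, z):
    """Evaluate the polynomial at z by Horner's rule."""
    acc = 0.0
    for c in coeffs:
        acc = acc * z + c
    return acc

def poly_roots(coeffs, iters=500):
    """All roots of a monic polynomial by Durand-Kerner iteration."""
    n = len(coeffs) - 1
    roots = [complex(0.4, 0.9) ** k for k in range(n)]
    for _ in range(iters):
        updated = []
        for i, r in enumerate(roots):
            denom = 1.0
            for j, s in enumerate(roots):
                if j != i:
                    denom *= (r - s)
            updated.append(r - polyval(coeffs, r) / denom)
        roots = updated
    return roots

def radius_ordering3(omega, h):
    """max |alpha|**2 over the roots of (3.21) at xi = eta = pi."""
    c = math.cos(math.pi * h)
    return max(abs(a) for a in poly_roots(quartic_coeffs(omega, c, c))) ** 2
```

In the limit $h \to 0$ with $\omega = 2$ the quartic factors as $(\alpha - 1)^2(\alpha^2 + \tfrac{14}{25}\alpha + 1)$, which is where the angle $\cos^{-1}(-7/25)$ in (3.22) comes from. For small $h$, evaluating `radius_ordering3(2 - 2.138 * math.pi * h, h)` can be compared with the leading-order estimate $1 - 1.604\pi h$; agreement is only expected up to the $O(h^2)$ terms.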


FIG. 3.11. Convergence history (2-norm of error versus iteration number) for Ordering #3 with h = 0.05 and $\omega$ = 1.86. Four choices of initial data are compared: (a) An eigenvector corresponding to the spectral radius. (b) An eigenvector corresponding to the largest nonspurious eigenvalue. (c) $u_{jk}^0 = (x_j^2 - x_j)(y_k^2 - y_k)$. (d) $u^0 = 1$.
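An experiment in the spirit of Fig. 3.11 can be sketched with a few lines of code. The routine below applies four-color point SOR to the homogeneous 9-point Laplacian, so that the iterate itself is the error. It is a minimal sketch, not the program used by the authors: the function name is hypothetical, and the coloring is assumed to be a 2 by 2 periodic pattern (R at odd j and odd k, B at even j and odd k, G at odd j and even k, O at even j and even k), which is one reading of Fig. 3.6(b).

```python
import math

# ASSUMPTION: 2x2 periodic coloring keyed on the parity of (j, k).
COLOR_OF = {(1, 1): "R", (0, 1): "B", (1, 0): "G", (0, 0): "O"}

def four_color_sor(N, omega, color_order, sweeps, u0):
    """Return the 2-norm of the error after each sweep.

    The grid has (N-1) x (N-1) interior nodes with h = 1/N and zero
    boundary data; color_order is a string such as "RBOG".
    """
    u = [[0.0] * (N + 1) for _ in range(N + 1)]   # includes the zero boundary
    for k in range(1, N):
        for j in range(1, N):
            u[k][j] = u0(j / N, k / N)
    history = []
    for _ in range(sweeps):
        for color in color_order:
            for k in range(1, N):
                for j in range(1, N):
                    if COLOR_OF[(j % 2, k % 2)] != color:
                        continue
                    edges = u[k][j - 1] + u[k][j + 1] + u[k - 1][j] + u[k + 1][j]
                    corners = (u[k - 1][j - 1] + u[k - 1][j + 1]
                               + u[k + 1][j - 1] + u[k + 1][j + 1])
                    # 9-point weights: omega/5 on edges, omega/20 on corners.
                    u[k][j] = (1.0 - omega) * u[k][j] + omega * (4.0 * edges + corners) / 20.0
        history.append(math.sqrt(sum(u[k][j] ** 2
                                     for k in range(1, N) for j in range(1, N))))
    return history
```

Calling `four_color_sor(20, 1.86, "RBOG", 160, lambda x, y: (x * x - x) * (y * y - y))` mimics case (c) of the figure with h = 0.05. Successive ratios of the returned norms give an observed decay factor that can be set beside the predictions of (3.26) and (3.27).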

Any of the twenty-four possible orderings associated with Fig. 3.6(b) can be easily proved to have the same eigenvalues as Ordering #2 or Ordering #3 that we have analyzed here (Adams and Jordan [1986]). We have found two equivalence classes that characterize the asymptotic convergence rate behavior and the relaxation parameter $\omega$. Orderings #1, #2, and #4 belong to the first class and Ordering #3 belongs to the second one. In addition, there could be two more classes corresponding to Orderings #5 and #6 that we have not been able to analyze.

4. A 9-point pseudo-SOR method. We now consider a pseudo-SOR method with the stencil in Fig. 4.1, and iteration

(4.1)  $u_{jk}^{n+1} = (1-\omega)\,u_{jk}^{n} + \frac{\omega}{5}\left(u_{j,k-1}^{n+1} + u_{j-1,k}^{n+1} + u_{j+1,k}^{n} + u_{j,k+1}^{n}\right) + \frac{\omega}{20}\left(u_{j-1,k-1}^{n+1} + u_{j-1,k+1}^{n} + u_{j+1,k+1}^{n-1} + u_{j+1,k-1}^{n}\right),$

which differs from (2.1) in the last two terms. This method can be analyzed by the techniques of § 3. The earliest times for two iterations corresponding to Fig. 4.1 are equal to those of Fig. 3.2. That is, the iteration is expressed in terms of the variable $\nu$ by

(4.2)  $u_{jk}^{\nu+1} = (1-\omega)\,u_{jk}^{\nu-1} + \frac{\omega}{5}\left(u_{j,k-1}^{\nu} + u_{j-1,k}^{\nu} + u_{j+1,k}^{\nu} + u_{j,k+1}^{\nu}\right) + \frac{\omega}{20}\left(u_{j-1,k-1}^{\nu-1} + u_{j-1,k+1}^{\nu-1} + u_{j+1,k+1}^{\nu-1} + u_{j+1,k-1}^{\nu-1}\right),$

with the change of variables given by (3.3). It is interesting to note that the times in Fig. 3.5 and the iteration in (4.2) are also obtained for an RB ordering of the grid with

     n     n    n-1
    n+1    n     n
    n+1   n+1    n

FIG. 4.1. NR modified stencil in variable n.


the Red and Black stencils shown in Fig. 4.2. Since two colors do not decouple a grid discretized with the 9-point stencil, it is tempting to use old information for the Black to Black coupling as shown in Fig. 4.2 to obtain a method suitable for parallel computers. This modification was considered by Kuo, Levy, and Musicus [1986] for a 9-point stencil arising from a discretization of a PDE with a cross-derivative term. They show the convergence rate of SOR for their problem to be $1 - O(h)$ in the region where the lowest frequency dominates. We show by analyzing (4.2) that the use of only two colors is not sufficient for the 9-point stencil arising from the Laplacian. In particular, we will show that the method converges whenever $0 < \omega < 5/3$, that the optimal $\omega$ occurs where the lowest and highest frequencies cross, and that the rate of convergence with the optimal $\omega$ is approximately $1 - 3\pi^2h^2$ for small $h$ as opposed to the result $1 - 1.79\pi h$ obtained in § 3 for the true SOR method with Orderings #1 or #2.

We begin by observing that $u_{jk}^{\nu} = g^{\nu} \sin \xi x_j \sin \eta y_k$ is an eigenmode of (4.2). We substitute this into (4.2) to get

(4.3)  $g^2 = (1-\omega) + \frac{2\omega}{5}\, g\,(\cos \xi h + \cos \eta h) + \frac{\omega}{5} \cos \xi h \cos \eta h,$

where $g^2$ is the eigenvalue of the method in (4.1) or the RB method depicted in Fig. 4.2. The eigenvectors in variable $n$ for the NR ordering (4.1) or the RB ordering (Fig. 4.2) are the same as the respective ones given in § 3 for the 5-point stencil. Equation (4.3) can be solved for $g$ to get

(4.4)  $g = \frac{\omega}{5}(\cos \xi h + \cos \eta h) \pm \sqrt{\frac{\omega^2}{25}(\cos \xi h + \cos \eta h)^2 + (1-\omega) + \frac{\omega}{5} \cos \xi h \cos \eta h}.$

When $\omega = 1$, the "pseudo Gauss-Seidel" method has amplification factor

(4.5)  $g^2 = \left(\frac{1}{5}(\cos \xi h + \cos \eta h) \pm \sqrt{\frac{1}{25}(\cos \xi h + \cos \eta h)^2 + \frac{1}{5} \cos \xi h \cos \eta h}\right)^2,$

which is maximized when $\xi = \eta = \pi$. The maximum value is easily determined from (4.5) to be $\cos^2 \pi h$. This is identical to the spectral radius of Gauss-Seidel for the model problem with the 5-point stencil. The methods, however, differ drastically when $\omega \neq 1$.

To determine the optimal $\omega$, we must minimize the maximum modulus of $g$ in (4.4). Two cases must be considered. First, assume the quantity under the radical in (4.4) is negative. Then

(4.6)  $|g|^2 = \omega\left(1 - \tfrac{1}{5} \cos \xi h \cos \eta h\right) - 1,$

and for convergence, we require $|g|^2 < 1$, and hence

(4.7)  $0 < \omega < \frac{2}{1 - \frac{1}{5} \cos \xi h \cos \eta h}.$

     n   n   n          n    n+1    n
     n   n   n         n+1    n    n+1
     n   n   n          n    n+1    n
        Red                 Black

FIG. 4.2. RB 9-point modified stencil in variable n.


As $h \to 0$, (4.7) shows that the method is divergent for $\omega > 5/3$. Also (4.6) shows that the value of $|g|^2$ is maximized when $\cos \xi h = -\cos \eta h$. This occurs when $\eta = N\pi - \xi$ (low frequency in one direction and high frequency in the other) and this maximum is

(4.8)  $|g|^2_{max} = \omega\left(1 + \tfrac{1}{5} \cos^2 \pi h\right) - 1.$

As $h \to 0$, $|g|^2_{max} \to \tfrac{6}{5}\omega - 1$.

Second, assume the quantity under the radical in (4.4) is positive. Then, the maximum value of $g$ occurs when $\xi = \eta = \pi$ and is given by

(4.9)  $g^2_{max} = \left(\frac{2\omega}{5} \cos \pi h + \sqrt{\frac{4\omega^2}{25} \cos^2 \pi h + (1-\omega) + \frac{\omega}{5} \cos^2 \pi h}\right)^2.$

It is interesting to note that the $\omega$ that minimizes (4.9) occurs when the radical is zero, and as $h \to 0$, $\omega \to 5/2$. That is, the method is divergent for the $\omega$ that minimizes $g_{max}$ of (4.9) for the lowest frequency. Recall that the $\omega$ that is optimal for SOR for consistently ordered matrices corresponds to the lowest frequency. For the pseudo-SOR method, the optimal $\omega$ occurs where the moduli of the eigenvalues of the two frequencies $\eta = N\pi - \xi$ and $\eta = \xi = \pi$ are equal. This $\omega$ is determined by equating (4.8) and (4.9) to get

(4.10)  $\frac{16}{25}\,\omega^2 \cos^2 \pi h \left(\omega + \frac{\omega}{5} \cos^2 \pi h - 1\right) - 4(\omega - 1)^2 = 0.$

As $h \to 0$, $\omega \to 5/3$, so we look for a solution to (4.10) of the form

(4.11)  $\omega(h) = \frac{5}{3} + c_1 h + c_2 h^2 + \cdots.$

Substituting (4.11) into (4.10) and equating terms yields

$c_1 = 0, \qquad c_2 = -\frac{20}{9}\pi^2,$

and the corresponding values of the optimal $\omega$ and spectral radius of (4.1) are

(4.12)  $\omega_{opt} \approx \frac{5}{3} - \frac{20}{9}\pi^2 h^2, \qquad \rho_{opt} \approx 1 - 3\pi^2 h^2.$
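The behavior of the pseudo-SOR method described above is easy to probe numerically from (4.4). The sketch below scans all frequencies on a grid with $h = 1/N$ and returns the largest $|g|^2$; the function name is illustrative and the scan is a brute-force check written for this example, not a procedure from the paper.

```python
import cmath
import math

def pseudo_sor_radius(omega, N):
    """Largest |g|**2 over all frequencies, with g the two roots given by (4.4)."""
    h = 1.0 / N
    rho = 0.0
    for p in range(1, N):
        for q in range(1, N):
            cx = math.cos(p * math.pi * h)
            cy = math.cos(q * math.pi * h)
            half = omega * (cx + cy) / 5.0
            # cmath.sqrt handles the case of a negative radicand (complex roots).
            root = cmath.sqrt(half ** 2 + (1.0 - omega) + omega * cx * cy / 5.0)
            for g in (half + root, half - root):
                rho = max(rho, abs(g) ** 2)
    return rho
```

For example, `pseudo_sor_radius(1.0, 40)` reproduces $\cos^2 \pi h$, a value of $\omega$ near $5/3 - \tfrac{20}{9}\pi^2h^2$ gives a radius close to $1 - 3\pi^2h^2$, and any $\omega$ noticeably above $5/3$ (say 1.8) gives a radius larger than one, in line with (4.7), (4.8), and (4.12).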

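Comparing (4.12) to $\cos^2 \pi h \approx 1 - \pi^2 h^2$, the spectral radius of the pseudo-Gauss-Seidel method, shows that as $h \to 0$, this method with optimal $\omega$ is only three times faster than with $\omega = 1$. This is not nearly as good as the true SOR methods discussed in § 3, where the decay factor is $1 - O(h)$ for the optimal $\omega$.

5. Comparison of point and line methods. Let the system $Ax = b$ be blocked as

$\begin{bmatrix} D_1 & -U_{12} & \cdots & -U_{1m} \\ -L_{21} & D_2 & \cdots & -U_{2m} \\ \vdots & & \ddots & \vdots \\ -L_{m1} & -L_{m2} & \cdots & D_m \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_m \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}.$

The line-Jacobi method is defined as

(5.1)  $D_i u_i^{n+1} = \sum_{j<i} L_{ij} u_j^{n} + \sum_{j>i} U_{ij} u_j^{n} + b_i$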


and the line-SOR method as

$D_i u_i^{n+1} = \omega \sum_{j<i} L_{ij} u_j^{n+1} + \omega \sum_{j>i} U_{ij} u_j^{n} + \omega b_i + (1-\omega) D_i u_i^{n},$

where for both methods $u_i$ corresponds to the nodes in row $i$ of the grid. The spectral radius of the Jacobi method for the 5-point and 9-point stencils can easily be found by separation of variables, since $\sin \xi x_j \sin \eta y_k$ is an eigenvector of iteration (5.1). These results are given in Fig. 5.1.

The spectral radius of the line-SOR method for the 5-point and 9-point stencils can now be found using Young's theory for block-consistently ordered matrices. That is, if $\mu$ is the spectral radius of line-Jacobi, then $\omega_{opt}$ and $\rho_{opt}$ for line-SOR are given by

(5.2)  $\omega_{opt} = \frac{2}{1 + \sqrt{1 - \mu^2}}, \qquad \rho_{opt} = \omega_{opt} - 1.$

These SOR results are summarized in Fig. 5.2, where Garabedian's results for the 9-point point methods are given in parentheses. Figure 5.2 shows that, based on the actual spectral radius, the line methods converge faster than the point methods for both the 5-point and 9-point stencils. The 9-point line SOR method has the same convergence rate as the 5-point line SOR method for small $h$, whereas the 5-point SOR method with a consistent ordering can be expected to converge slightly faster than any of the 9-point point SOR methods that we analyzed. However, the convergence rate that will be observed in practice with smooth initial data for the 9-point point methods may be closer to Garabedian's results. That is, we can still expect the 9-point line methods to converge slightly faster than the point methods, but now the 9-point point methods will converge faster than the 5-point point method. This latter fact is encouraging since the 9-point discretization is more accurate than the 5-point one.

    Method           Spectral radius
    5-point point    $\cos \pi h \approx 1 - \frac{1}{2}\pi^2 h^2$
    5-point line     $\frac{\cos \pi h}{2 - \cos \pi h} \approx 1 - \pi^2 h^2$
    9-point point    $\frac{4 \cos \pi h + \cos^2 \pi h}{5} \approx 1 - \frac{3}{5}\pi^2 h^2$
    9-point line     $\frac{2 \cos \pi h + \cos^2 \pi h}{5 - 2 \cos \pi h} \approx 1 - \pi^2 h^2$

FIG. 5.1. Spectral radius of point and line Jacobi methods.

    Method           Ordering                 Spectral radius $\rho_{opt}$ ($\rho^*_{opt}$)
    5-point point    Rowwise or Red/Black     $1 - 2\pi h$
    5-point line     Rowwise or Red/Black     $1 - 2\sqrt{2}\,\pi h$
    9-point point    Ordering #1              $1 - 1.79\pi h$  ($1 - 2.35\pi h$)
    9-point point    Ordering #2              $1 - 1.79\pi h$  ($1 - 2.35\pi h$)
    9-point point    Ordering #3              $1 - 1.60\pi h$  ($1 - 2.45\pi h$)
    9-point point    Rowwise                  $1 - 1.79\pi h$  ($1 - 2.35\pi h$)
    9-point line     Rowwise or Red/Black     $1 - 2\sqrt{2}\,\pi h$

FIG. 5.2. Spectral radius for point and line SOR methods.
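The entries of Fig. 5.1 and the consistently ordered rows of Fig. 5.2 can be tabulated directly. The sketch below encodes the Jacobi spectral radii as they are read from Fig. 5.1 and applies (5.2); the function names and string keys are illustrative choices. Formula (5.2) is applied here only to the 5-point point method and the two line methods, since the 9-point point orderings are not consistently ordered and are treated separately in § 2 and § 3.

```python
import math

def jacobi_radius(method, h):
    """Spectral radius of the Jacobi method, following Fig. 5.1."""
    c = math.cos(math.pi * h)
    if method == "5-point point":
        return c
    if method == "5-point line":
        return c / (2.0 - c)
    if method == "9-point point":
        return (4.0 * c + c * c) / 5.0
    if method == "9-point line":
        return (2.0 * c + c * c) / (5.0 - 2.0 * c)
    raise ValueError("unknown method: " + method)

def sor_optimum(mu):
    """Optimal omega and spectral radius from (5.2) for a Jacobi radius mu."""
    omega = 2.0 / (1.0 + math.sqrt(1.0 - mu * mu))
    return omega, omega - 1.0

def consistently_ordered_table(h):
    """(method, omega_opt, rho_opt) for the cases covered by (5.2)."""
    rows = []
    for method in ("5-point point", "5-point line", "9-point line"):
        omega, rho = sor_optimum(jacobi_radius(method, h))
        rows.append((method, omega, rho))
    return rows
```

For h = 0.01 the two line methods give nearly identical $\rho_{opt}$, close to $1 - 2\sqrt{2}\,\pi h$, while the 5-point point method gives a value close to $1 - 2\pi h$; the differences from these leading-order estimates are of order $h^2$.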


6. Conclusions. The SOR method with several orderings and a pseudo-SOR method have been analyzed for the 9-point Laplacian. The Fourier analysis techniques proposed by LeVeque and Trefethen [1986] and separation of variables techniques were used to determine the eigenvalues and the eigenvectors of these methods.

We examined the SOR method for the 9-point Laplacian using the natural rowwise ordering and several multicolor orderings. For all these orderings, we gave a quartic equation for the square root of the eigenvalues as a function of the frequencies and $\omega$. The optimal $\omega$ was found for the rowwise ordering and Ordering #3 by asymptotically solving a quartic equation for the intersection of the largest (in modulus) of the complex and real roots. Our results were confirmed by performing the SOR iteration with an initial guess corresponding to the eigenvector associated with the spectral radius. The observed rate of convergence matched that predicted by the theory to five decimal places. The SOR iteration was also performed by using a smooth initial guess, obtained by discretizing $(x - x^2)(y - y^2)$ for various stepsizes, $h$. In these cases, the observed convergence rate more closely resembled that predicted by Garabedian.

The results also show that different orderings of the same coloring can lead to different spectral radii: R/B/G/O and R/B/O/G for the coloring in Fig. 3.6(b) have spectral radii of $1 - 1.79\pi h$ and $1 - 1.60\pi h$, respectively. For smooth initial data, these two orderings also led to different effective spectral radii observed in practice, namely, $1 - 2.35\pi h$ and $1 - 2.45\pi h$, respectively. This information can be useful in selecting a coloring, an ordering, and appropriate initial data to use with multicolor SOR on parallel computers (Adams and Ortega [1982]).

An analysis of the pseudo-SOR method showed that the optimal $\omega$ occurs when the high and low frequencies cross and that the corresponding spectral radius is only $1 - 3\pi^2 h^2$. This is inferior to both the 5-point and 9-point SOR methods we analyzed. In addition, for small $h$, the pseudo method only converges for $0 < \omega < 5/3$.

The 5-point and 9-point point and line SOR methods were compared for the model problem for small $h$. The line methods converge slightly faster than the point methods. The 9-point line and 5-point line methods have the same asymptotic rate of convergence, but the 5-point point method with a consistent ordering was 1.12 times faster, based on the spectral radius, and 1.23 times slower, based on Garabedian's arguments, than the best 9-point point method that we analyzed. Hence, for a smooth initial guess, the 9-point point methods can be expected to be more accurate and converge faster than the 5-point point method.

Acknowledgments. The authors thank Lloyd N. Trefethen for providing the initial stimulus for this work and for several valuable subsequent conversations. We also thank Elizabeth Ong for programming the SOR method with our orderings.

Note added in proof. In unpublished work carried out in 1951, the third listed author of this paper used the method of separation of variables to derive the quartic equation (2.13) for the eigenvalues of the SOR method for the 9-point equation with the rowwise ordering. Because of the lack of sufficient computational facilities, he was unable to use this equation to determine the optimum value of $\omega$ and the corresponding convergence rate of the SOR method. The work lay dormant until its resurrection in the summer of 1986.

Richard Varga and John Buoni brought to our attention that an analysis of the SOR method for the 9-point equation with the rowwise ordering had been carried out


by A. I. van de Vooren and A. C. Vliegenthart [1967]. This analysis includes cases where the mesh sizes $\Delta x$ and $\Delta y$ in the $x$ and $y$ directions, respectively, may be different. For the case where $\Delta x = \Delta y$, the authors obtain the same asymptotic results for the spectral radius as given by (2.25). In this section, we have carried these results further by finding the eigenvectors as well. These eigenvectors were used in our numerical experiments as starting data to verify (2.25). The eigenvectors were also used to explain the discrepancy between Garabedian's results and those given in (2.25).

REFERENCES

L. M. ADAMS AND J. M. ORTEGA [1982], A multi-color SOR method for parallel computation, Proc. of the 1982 Conference on Parallel Processing, IEEE Catalog No. 82CH1794-7, August, pp. 53-56.
L. M. ADAMS AND H. F. JORDAN [1986], Is SOR color-blind?, SIAM J. Sci. Statist. Comput., 7, pp. 490-506.
G. FORSYTHE AND W. WASOW [1960], Finite-Difference Methods for Partial Differential Equations, John Wiley, New York, p. 266.
S. FRANKEL [1950], Convergence rates of iterative treatments of partial differential equations, Math. Comp., 4, pp. 65-75.
P. GARABEDIAN [1956], Estimation of the relaxation factor for small mesh size, Math. Comp., 10, pp. 183-185.
C. KUO, B. LEVY, AND B. MUSICUS [1986], A local relaxation method for solving elliptic PDEs on mesh-connected arrays, SIAM J. Sci. Statist. Comput., accepted.
R. LEVEQUE AND L. N. TREFETHEN [1986], Fourier analysis of the SOR iteration, ICASE Report No. 86-93, ICASE-NASA Langley Research Center, Hampton, VA, to appear in IMA J. Num. Anal.
A. I. VAN DE VOOREN AND A. C. VLIEGENTHART [1967], The 9-point difference formula for Laplace's equation, J. Engr. Math., 1, pp. 187-202.
D. M. YOUNG [1954], Iterative methods for solving partial differential equations of elliptic type, Trans. Amer. Math. Soc., 76, pp. 92-111.
D. M. YOUNG [1971], Iterative Solution of Large Linear Systems, Academic Press, New York.
