Numerical Methods Using Matlab, 1999

Numerical Methods Using MATLAB
Third Edition

John H. Mathews, California State University, Fullerton
Kurtis D. Fink, Northwest Missouri State University

Prentice Hall, Upper Saddle River, NJ 07458

Contents

Preface
1 Preliminaries
  1.1 Review of Calculus
  1.2 Binary Numbers
  1.3 Error Analysis
2 The Solution of Nonlinear Equations f(x) = 0
  2.1 Iteration for Solving x = g(x)
  2.2 Bracketing Methods for Locating a Root
  2.3 Initial Approximation and Convergence Criteria
  2.4 Newton-Raphson and Secant Methods
  2.5 Aitken's Process and Steffensen's and Muller's Methods (Optional)
3 The Solution of Linear Systems AX = B
  3.1 Introduction to Vectors and Matrices
  3.2 Properties of Vectors and Matrices
  3.3 Upper-triangular Linear Systems
  3.4 Gaussian Elimination and Pivoting
  3.5 Triangular Factorization
  3.6 Iterative Methods for Linear Systems
  3.7 Iteration for Nonlinear Systems: Seidel and Newton's Methods (Optional)
4 Interpolation and Polynomial Approximation
  4.1 Taylor Series and Calculation of Functions
  4.2 Introduction to Interpolation
  4.3 Lagrange Approximation
  4.4 Newton Polynomials
  4.5 Chebyshev Polynomials (Optional)
  4.6 Padé Approximations
5 Curve Fitting
  5.1 Least-squares Line
  5.2 Curve Fitting
  5.3 Interpolation by Spline Functions
  5.4 Fourier Series and Trigonometric Polynomials
6 Numerical Differentiation
  6.1 Approximating the Derivative
  6.2 Numerical Differentiation Formulas
7 Numerical Integration
  7.1 Introduction to Quadrature
  7.2 Composite Trapezoidal and Simpson's Rule
  7.3 Recursive Rules and Romberg Integration
  7.4 Adaptive Quadrature
  7.5 Gauss-Legendre Integration (Optional)
8 Numerical Optimization
  8.1 Minimization of Function
9 Solution of Differential Equations
  9.1 Introduction to Differential Equations
  9.2 Euler's Method
  9.3 Heun's Method
  9.4 Taylor Series Method
  9.5 Runge-Kutta Methods
  9.6 Predictor-Corrector Methods
  9.7 Systems of Differential Equations
  9.8 Boundary Value Problems
  9.9 Finite-Difference Method
10 Solution of Partial Differential Equations
  10.1 Hyperbolic Equations
  10.2 Parabolic Equations
  10.3 Elliptic Equations
11 Eigenvalues and Eigenvectors
  11.1 Homogeneous Systems: The Eigenvalue Problem
  11.2 Power Method
  11.3 Jacobi's Method
  11.4 Eigenvalues of Symmetric Matrices
Appendix: An Introduction to MATLAB
Some Suggested References for Reports
Bibliography and References
Answers to Selected Exercises
Index

Preface

This book provides a fundamental introduction to numerical analysis suitable for undergraduate students in mathematics, computer science, physical sciences, and engineering. It is assumed that the reader is familiar with calculus and has taken a structured programming course. The text has enough material fitted modularly for either a single-term course or a year sequence. In short, the book contains enough material so instructors will be able to select topics appropriate to their needs.

Students of various backgrounds should find numerical methods quite interesting and useful, and this is kept in mind throughout the book. Thus, there is a wide variety of examples and problems that help sharpen one's skill in both the theory and practice of numerical analysis.
Computer calculations are presented in the form of tables and graphs whenever possible so that the resulting numerical approximations are easier to visualize and interpret. MATLAB programs are the vehicle for presenting the underlying numerical algorithms.

Emphasis is placed on understanding why numerical methods work and their limitations. This is challenging and involves a balance between theory, error analysis, and readability. An error analysis for each method is presented in a fashion that is appropriate for the method at hand, yet does not turn off the reader. A mathematical derivation for each method is given that uses elementary results and builds the student's understanding of calculus. Computer assignments using MATLAB give students an opportunity to practice their skills at scientific programming.

Shorter numerical exercises can be carried out with a pocket calculator or computer, and the longer ones can be done using MATLAB subroutines. It is left to the instructors to guide the students regarding the pedagogical use of numerical computations. Each instructor can make assignments that are appropriate to the available computing resources. Experimentation with the MATLAB subroutine libraries is encouraged. These materials can be used to assist students in the completion of the numerical analysis component of computer laboratory exercises.

This Third Edition grows out of much polishing of the narrative for the Second Edition. For example, the QR method has been added to the chapter on Eigenvalues and Eigenvectors. New to this edition is the explicit use of the software MATLAB. An appendix gives an introduction to MATLAB syntax. Examples have been added throughout the text with MATLAB, and complete MATLAB programs are given in each section. An instructor's disk is available upon request from the publisher.
Previously we took the attitude that any software program that students mastered would work fine. However, many students entering this course have yet to master a programming language (computer science students excepted). MATLAB has become the tool of nearly all engineers and applied mathematicians, and its newest versions have improved the programming aspects. So we think that students will have an easier and more productive time in this MATLAB version of our text.

Acknowledgments

We would like to express our gratitude to all the people whose efforts contributed to the various editions of this book. I (John Mathews) thank the students at California State University, Fullerton. I thank my colleagues Stephen Goode, Mathew Koshy, Edward Sabotka, Harris Schultz, and Soo Tang Tan for their support in the first edition; additionally, I thank Russell Egbert, William Gearhart, Ronald Miller, and Greg Pierce for their suggestions for the second edition. I also thank James Friel, Chairman of the Mathematics Department at CSUF, for his encouragement.

Reviewers who made useful recommendations for the first edition are Walter M. Patterson III, Lander College; George B. Miller, Central Connecticut State University; Peter J. Gingo, The University of Akron; Michael A. Freedman, The University of Alaska Fairbanks; and Kenneth P. Bube, University of California, Los Angeles. For the second edition, we thank Richard Bumby, Rutgers University; Robert L. Cony, U.S. Army; Bruce Edwards, University of Florida; and David R. Hill, Temple University. For this third edition we wish to thank Tim Sauer, George Mason University; Gerald M. Pitstick, University of Oklahoma; Victor DeBrunner, University of Oklahoma; George Trapp, West Virginia University; Tad ark, University of Alabama, Huntsville; Jeffrey S. Scroggs, North Carolina State University; Kurt Georg, Colorado State University; and James N. Cracock, Southern Illinois University at Carbondale.
Suggestions for improvements and additions to the book are always welcome and can be made by corresponding directly with the authors.

John H. Mathews
Mathematics Department
California State University
Fullerton, CA 92634
mathews@fullerton.edu

Kurtis D. Fink
Department of Mathematics
Northwest Missouri State University
Maryville, MO 64468
kfink@mail.nwmissouri.edu

1 Preliminaries

Consider the function $f(x) = \cos(x)$, its derivative $f'(x) = -\sin(x)$, and its antiderivative $F(x) = \sin(x) + C$. These formulas were studied in calculus. The former is used to determine the slope $m = f'(x_0)$ of the curve $y = f(x)$ at a point $(x_0, f(x_0))$, and the latter is used to compute the area under the curve for $a \le x \le b$.

The slope at the point $(\pi/2, 0)$ is $m = f'(\pi/2) = -1$ and can be used to find the tangent line at this point (see Figure 1.1(a)):

$$y_{\tan} = m\left(x - \frac{\pi}{2}\right) + 0 = f'\!\left(\frac{\pi}{2}\right)\left(x - \frac{\pi}{2}\right) = -x + \frac{\pi}{2}.$$

Figure 1.1 (a) The tangent line to the curve $y = \cos(x)$ at the point $(\pi/2, 0)$. (b) The area under the curve $y = \cos(x)$ over the interval $[0, \pi/2]$.

The area under the curve for $0 \le x \le \pi/2$ is computed using an integral (see Figure 1.1(b)):

$$\text{area} = \int_0^{\pi/2} \cos(x)\,dx = F\!\left(\frac{\pi}{2}\right) - F(0) = \sin\!\left(\frac{\pi}{2}\right) - 0 = 1.$$

These are some of the results that we will need to use from calculus.

1.1 Review of Calculus

It is assumed that the reader is familiar with the notation and subject matter covered in the undergraduate calculus sequence. This should have included the topics of limits, continuity, differentiation, integration, sequences, and series. Throughout the book we refer to the following results.

Limits and Continuity

Definition 1.1. Assume that $f(x)$ is defined on a set $S$ of real numbers. Then $f$ is said to have the limit $L$ at $x = x_0$, and we write

$$\lim_{x \to x_0} f(x) = L \qquad (1)$$

if, given any $\epsilon > 0$, there exists a $\delta > 0$ such that, whenever $x \in S$, $0 < |x - x_0| < \delta$ implies that $|f(x) - L| < \epsilon$. When the h-increment notation $x = x_0 + h$ is used, equation (1) becomes

$$\lim_{h \to 0} f(x_0 + h) = L. \qquad (2)$$

Definition 1.2. Assume that $f(x)$ is defined on a set $S$ of real numbers and let $x_0 \in S$.
Then $f$ is said to be continuous at $x = x_0$ if

$$\lim_{x \to x_0} f(x) = f(x_0). \qquad (3)$$

The function $f$ is said to be continuous on $S$ if it is continuous at each point $x \in S$. The notation $C^n(S)$ stands for the set of all functions $f$ such that $f$ and its first $n$ derivatives are continuous on $S$. When $S$ is an interval, say $[a, b]$, then the notation $C^n[a, b]$ is used. As an example, consider the function $f(x) = x^{4/3}$ on the interval $[-1, 1]$. Clearly, $f(x)$ and $f'(x) = (4/3)x^{1/3}$ are continuous on $[-1, 1]$, while $f''(x) = (4/9)x^{-2/3}$ is not continuous at $x = 0$.

Definition 1.3. Suppose that $\{x_n\}_{n=1}^{\infty}$ is an infinite sequence. Then the sequence is said to have the limit $L$, and we write

$$\lim_{n \to \infty} x_n = L \qquad (4)$$

if, given any $\epsilon > 0$, there exists a positive integer $N = N(\epsilon)$ such that $n > N$ implies that $|x_n - L| < \epsilon$.

When a sequence has a limit, we say that it is a convergent sequence. Another commonly used notation is "$x_n \to L$ as $n \to \infty$." Equation (4) is equivalent to

$$\lim_{n \to \infty} (x_n - L) = 0. \qquad (5)$$

Thus we can view the sequence $\{\epsilon_n\}_{n=1}^{\infty} = \{x_n - L\}_{n=1}^{\infty}$ as an error sequence. The following theorem relates the concepts of continuity and convergent sequence.

Theorem 1.1. Assume that $f(x)$ is defined on the set $S$ and $x_0 \in S$. The following statements are equivalent:
(a) The function $f$ is continuous at $x_0$.
(b) If $\lim_{n \to \infty} x_n = x_0$, then

$$\lim_{n \to \infty} f(x_n) = f(x_0). \qquad (6)$$

Theorem 1.2 (Intermediate Value Theorem). Assume that $f \in C[a, b]$ and $L$ is any number between $f(a)$ and $f(b)$. Then there exists a number $c$, with $c \in (a, b)$, such that $f(c) = L$.

Example 1.1. The function $f(x) = \cos(x - 1)$ is continuous over $[0, 1]$, and the constant $L = 0.8$ lies in $(f(0), f(1)) = (\cos(-1), \cos(0))$. The solution to $f(x) = 0.8$ over $[0, 1]$ is $c_1 = 0.356499$. Similarly, $f(x)$ is continuous over $[1, 2.5]$, and $L = 0.8$ lies in $(f(2.5), f(1)) = (\cos(1.5), \cos(0))$. The solution to $f(x) = 0.8$ over $[1, 2.5]$ is $c_2 = 1.643501$. These two cases are shown in Figure 1.2.
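The intermediate value theorem guarantees that $c$ exists, but it does not say how to find it. The values $c_1$ and $c_2$ of Example 1.1 can be located by repeatedly halving an interval on which $f(x) - L$ changes sign. This is the bisection idea that reappears among the bracketing methods of Chapter 2. The sketch below is written in Python rather than the book's MATLAB, and the helper name `bisect` and its tolerance are our own choices.

```python
import math

def bisect(g, a, b, tol=1e-12):
    """Locate a root of g in [a, b], assuming g(a) and g(b) have opposite signs."""
    ga = g(a)
    if ga * g(b) > 0:
        raise ValueError("g(a) and g(b) must have opposite signs")
    while b - a > tol:
        m = 0.5 * (a + b)
        gm = g(m)
        if ga * gm <= 0:      # sign change in [a, m]: keep the left half
            b = m
        else:                 # otherwise the sign change is in [m, b]
            a, ga = m, gm
    return 0.5 * (a + b)

f = lambda x: math.cos(x - 1)
L = 0.8
g = lambda x: f(x) - L        # f(c) = L exactly when g(c) = 0

c1 = bisect(g, 0.0, 1.0)
c2 = bisect(g, 1.0, 2.5)
print(f"c1 = {c1:.6f}, c2 = {c2:.6f}")
```

Running it prints `c1 = 0.356499, c2 = 1.643501`, which matches Example 1.1. Each endpoint pair brackets a sign change of $g$, as required by the theorem's hypothesis.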
Figure 1.2 The intermediate value theorem applied to the function $f(x) = \cos(x - 1)$ over $[0, 1]$ and over the interval $[1, 2.5]$.

Figure 1.3 The extreme value theorem applied to the function $f(x) = 35 + 59.5x - 66.5x^2 + 15x^3$ over the interval $[0, 3]$.

Theorem 1.3 (Extreme Value Theorem for a Continuous Function). Assume that $f \in C[a, b]$. Then there exists a lower bound $M_1$, an upper bound $M_2$, and two numbers $x_1, x_2 \in [a, b]$ such that

$$M_1 = f(x_1) \le f(x) \le f(x_2) = M_2 \quad \text{whenever } x \in [a, b]. \qquad (7)$$

We sometimes express this by writing

$$M_1 = f(x_1) = \min_{a \le x \le b} \{f(x)\} \quad \text{and} \quad M_2 = f(x_2) = \max_{a \le x \le b} \{f(x)\}. \qquad (8)$$

Differentiable Functions

Definition 1.4. Assume that $f(x)$ is defined on an open interval containing $x_0$. Then $f$ is said to be differentiable at $x_0$ if

$$\lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} \qquad (9)$$

exists. When this limit exists, it is denoted by $f'(x_0)$ and is called the derivative of $f$ at $x_0$. An equivalent way to express this limit is to use the h-increment notation:

$$f'(x_0) = \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h}. \qquad (10)$$

A function that has a derivative at each point in a set $S$ is said to be differentiable on $S$. Note that the number $m = f'(x_0)$ is the slope of the tangent line to the graph of the function $y = f(x)$ at the point $(x_0, f(x_0))$.

Theorem 1.4. If $f(x)$ is differentiable at $x = x_0$, then $f(x)$ is continuous at $x = x_0$.

It follows from Theorem 1.3 that if a function $f$ is differentiable on a closed interval $[a, b]$, then its extreme values occur at the end points of the interval or at the critical points (solutions of $f'(x) = 0$) in the open interval $(a, b)$.

1.2 Binary Numbers

Human beings do arithmetic using the decimal (base 10) number system. Most computers do arithmetic using the binary (base 2) number system. It may seem otherwise, since communication with the computer (input/output) is in base 10 numbers. This transparency does not mean that the computer uses base 10. In fact, it converts inputs to base 2 (or perhaps base 16), then performs base 2 arithmetic, and finally translates the answer into base 10 before it displays a result.
Some experimentation is required to verify this. One computer with nine decimal digits of accuracy gave the answer

$$\sum_{k=1}^{100{,}000} 0.1 = 9999.99447. \qquad (1)$$

Here the intent was to add the number $1/10$ repeatedly 100,000 times. The mathematical answer is exactly 10,000. One goal is to understand the reason for the computer's apparently flawed calculation. At the end of this section, it will be shown how something is lost when the computer translates the decimal fraction $1/10$ into a binary number.

Binary Numbers

Base 10 numbers are used for most mathematical purposes. For illustration, the number 1563 is expressible in expanded form as

$$1563 = (1 \times 10^3) + (5 \times 10^2) + (6 \times 10^1) + (3 \times 10^0).$$

In general, let $N$ denote a positive integer; then the digits $a_0, a_1, \ldots, a_K$ exist so that $N$ has the base 10 expansion

$$N = (a_K \times 10^K) + (a_{K-1} \times 10^{K-1}) + \cdots + (a_1 \times 10^1) + (a_0 \times 10^0),$$

where the digits $a_k$ are chosen from $\{0, 1, \ldots, 8, 9\}$. Thus $N$ is expressed in decimal notation as

$$N = a_K a_{K-1} \cdots a_2 a_1 a_{0\,\text{ten}} \quad \text{(decimal)}. \qquad (2)$$

If it is understood that 10 is the base, then (2) is written as $N = a_K a_{K-1} \cdots a_2 a_1 a_0$. For example, we understand that $1563 = 1563_{\text{ten}}$.

Using powers of 2, the number 1563 can be written

$$1563 = (1 \times 2^{10}) + (1 \times 2^9) + (0 \times 2^8) + (0 \times 2^7) + (0 \times 2^6) + (0 \times 2^5) + (1 \times 2^4) + (1 \times 2^3) + (0 \times 2^2) + (1 \times 2^1) + (1 \times 2^0). \qquad (3)$$

This can be verified by performing the calculation $1563 = 1024 + 512 + 16 + 8 + 2 + 1$.

In general, let $N$ denote a positive integer; the digits $b_0, b_1, \ldots, b_J$ exist so that $N$ has the base 2 expansion

$$N = (b_J \times 2^J) + (b_{J-1} \times 2^{J-1}) + \cdots + (b_1 \times 2^1) + (b_0 \times 2^0), \qquad (4)$$

where each digit $b_j$ is either a 0 or 1. Thus $N$ is expressed in binary notation as

$$N = b_J b_{J-1} \cdots b_2 b_1 b_{0\,\text{two}} \quad \text{(binary)}. \qquad (5)$$

Using the notation (5) and the result in (3) yields $1563 = 11000011011_{\text{two}}$.

Remark. The word "two" will always be used as a subscript at the end of a binary number. This will enable the reader to distinguish binary numbers from the ordinary base 10 usage. Thus 111 means one hundred eleven, whereas $111_{\text{two}}$ stands for seven.
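Expansion (4) can be evaluated directly to check a binary numeral. The following is a minimal Python sketch built around a hypothetical helper, `from_binary`, which is not from the book. It adds $b_j \times 2^j$ over every digit position. Python's built-in `int(s, 2)` appears only as a cross-check.

```python
def from_binary(digits):
    """Value of the binary numeral b_J ... b_1 b_0 (string, most significant digit first), per (4)."""
    if not digits or any(ch not in "01" for ch in digits):
        raise ValueError("expected a nonempty string of 0s and 1s")
    J = len(digits) - 1
    # The character at index i is the digit b_j with j = J - i; add b_j * 2**j.
    return sum(int(ch) * 2 ** (J - i) for i, ch in enumerate(digits))

print(from_binary("11000011011"))  # 1563 = 1024 + 512 + 16 + 8 + 2 + 1
print(from_binary("111"))          # 7, the value of 111_two
print(int("11000011011", 2))       # built-in base-2 parsing gives the same 1563
```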
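It is usually the case that the binary representation for $N$ will require more digits than the decimal representation. This is due to the fact that powers of 2 grow much more slowly than do powers of 10.

An efficient algorithm for finding the base 2 representation of the integer $N$ can be derived from equation (4). Dividing both sides of (4) by 2 yields

$$\frac{N}{2} = (b_J \times 2^{J-1}) + \cdots + (b_2 \times 2^1) + b_1 + \frac{b_0}{2}. \qquad (6)$$

Hence the remainder, upon dividing $N$ by 2, is the digit $b_0$. Next we determine $b_1$. If (6) is written as $N/2 = Q_0 + b_0/2$, then

$$Q_0 = (b_J \times 2^{J-1}) + \cdots + (b_2 \times 2^1) + b_1. \qquad (7)$$

Now divide both sides of (7) by 2 to get

$$\frac{Q_0}{2} = (b_J \times 2^{J-2}) + \cdots + (b_2 \times 2^0) + \frac{b_1}{2}.$$

Hence the remainder, upon dividing $Q_0$ by 2, is the digit $b_1$. This process is continued and generates sequences $\{Q_k\}$ and $\{b_k\}$ of quotients and remainders, respectively. The process is terminated when an integer $J$ is found such that $Q_J = 0$. The sequences obey the following formulas:

$$\begin{aligned} N &= 2Q_0 + b_0 \\ Q_0 &= 2Q_1 + b_1 \\ &\;\;\vdots \\ Q_{J-2} &= 2Q_{J-1} + b_{J-1} \\ Q_{J-1} &= 2Q_J + b_J \qquad (Q_J = 0). \end{aligned} \qquad (8)$$

Example 1.10. Show how to obtain $1563 = 11000011011_{\text{two}}$.

Start with $N = 1563$ and construct the quotients and remainders according to the equations in (8):

1563 = 2 × 781 + 1,   b_0 = 1
 781 = 2 × 390 + 1,   b_1 = 1
 390 = 2 × 195 + 0,   b_2 = 0
 195 = 2 × 97 + 1,    b_3 = 1
  97 = 2 × 48 + 1,    b_4 = 1
  48 = 2 × 24 + 0,    b_5 = 0
  24 = 2 × 12 + 0,    b_6 = 0
  12 = 2 × 6 + 0,     b_7 = 0
   6 = 2 × 3 + 0,     b_8 = 0
   3 = 2 × 1 + 1,     b_9 = 1
   1 = 2 × 0 + 1,     b_10 = 1

Thus the binary representation for 1563 is

$$1563 = b_{10} b_9 b_8 \cdots b_2 b_1 b_{0\,\text{two}} = 11000011011_{\text{two}}.$$

Sequences and Series

When rational numbers are expressed in decimal form, it is often the case that infinitely many digits are required. A familiar example is

$$\frac{1}{3} = 0.\overline{3}. \qquad (9)$$

Here the symbol $\overline{3}$ means that the digit 3 is repeated forever to form an infinite repeating decimal.

Theorem 1.14 (Geometric Series). Suppose that $c \ne 0$. If $-1 < r < 1$, then

$$\sum_{n=0}^{\infty} c\,r^n = \frac{c}{1 - r}. \qquad (12)$$

If $|r| \ge 1$, then the series diverges. $\qquad (13)$

Proof. The summation formula for a finite geometric series is

$$S_n = c + cr + cr^2 + \cdots + cr^n = \frac{c(1 - r^{n+1})}{1 - r} \quad \text{for } r \ne 1. \qquad (14)$$

To establish (12), observe that

$$|r| < 1 \text{ implies that } \lim_{n \to \infty} r^{n+1} = 0. \qquad (15)$$

Taking the limit as $n \to \infty$ in (14) and using (15), we get

$$\lim_{n \to \infty} S_n = \frac{c}{1 - r}\left(1 - \lim_{n \to \infty} r^{n+1}\right) = \frac{c}{1 - r}.$$

By equation (15) of Section 1.1, the limit above establishes (12). When $|r| \ge 1$, the sequence $\{r^{n+1}\}$ does not converge. Hence the sequence $\{S_n\}$ in (14) does not tend to a limit.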
Therefore, (13) is established.

Equation (12) in Theorem 1.14 represents an efficient way to convert an infinite repeating decimal into a fraction.

Example 1.11.

$$0.\overline{3} = \sum_{n=1}^{\infty} \frac{3}{10^n} = \sum_{n=0}^{\infty} \frac{3}{10}\left(\frac{1}{10}\right)^n = \frac{3/10}{1 - 1/10} = \frac{1}{3}.$$

Binary Fractions

Binary (base 2) fractions can be expressed as sums involving negative powers of 2. If $R$ is a real number that lies in the range $0 < R < 1$, there exist digits $d_1, d_2, \ldots, d_n, \ldots$ so that

$$R = (d_1 \times 2^{-1}) + (d_2 \times 2^{-2}) + \cdots + (d_n \times 2^{-n}) + \cdots, \qquad (16)$$

where $d_j \in \{0, 1\}$. We usually express the quantity on the right side of (16) in the binary fraction notation

$$R = 0.d_1 d_2 \cdots d_n \cdots_{\text{two}}. \qquad (17)$$

There are many real numbers whose binary representation requires infinitely many digits. The fraction $7/10$ can be expressed as 0.7 in base 10, yet its base 2 representation requires infinitely many digits:

$$\frac{7}{10} = 0.1\overline{0110}_{\text{two}}. \qquad (18)$$

The binary fraction in (18) is a repeating fraction where the group of four digits 0110 is repeated forever.

An efficient algorithm for finding base 2 representations can now be developed. If both sides of (16) are multiplied by 2, the result is

$$2R = d_1 + \left((d_2 \times 2^{-1}) + \cdots + (d_n \times 2^{-n+1}) + \cdots\right). \qquad (19)$$

The quantity in parentheses on the right side of (19) is a positive number and is less than 1. Therefore $d_1$ is the integer part of $2R$, denoted $d_1 = \operatorname{int}(2R)$. To continue the process, take the fractional part of (19) and write

$$F_1 = \operatorname{frac}(2R) = (d_2 \times 2^{-1}) + \cdots + (d_n \times 2^{-n+1}) + \cdots, \qquad (20)$$

where $\operatorname{frac}(2R)$ is the fractional part of the real number $2R$. Multiplication of both sides of (20) by 2 results in

$$2F_1 = d_2 + \left((d_3 \times 2^{-1}) + \cdots + (d_n \times 2^{-n+2}) + \cdots\right). \qquad (21)$$

Now take the integer part of (21) and obtain $d_2 = \operatorname{int}(2F_1)$.

The process is continued, possibly ad infinitum (if $R$ has an infinite nonrepeating base 2 representation), and two sequences $\{d_k\}$ and $\{F_k\}$ are recursively generated:

$$d_k = \operatorname{int}(2F_{k-1}), \qquad F_k = \operatorname{frac}(2F_{k-1}), \qquad (22)$$

where $d_1 = \operatorname{int}(2R)$ and $F_1 = \operatorname{frac}(2R)$. The binary decimal representation of $R$ is then given by the convergent geometric series

$$R = \sum_{j=1}^{\infty} d_j\,(2)^{-j}.$$

Example 1.12. The binary decimal representation of $7/10$ given in (18) was found using the formulas in (22).
Let $R = 7/10 = 0.7$; then

2R   = 1.4,  d_1 = int(1.4) = 1,  F_1 = frac(1.4) = 0.4
2F_1 = 0.8,  d_2 = int(0.8) = 0,  F_2 = frac(0.8) = 0.8
2F_2 = 1.6,  d_3 = int(1.6) = 1,  F_3 = frac(1.6) = 0.6
2F_3 = 1.2,  d_4 = int(1.2) = 1,  F_4 = frac(1.2) = 0.2
2F_4 = 0.4,  d_5 = int(0.4) = 0,  F_5 = frac(0.4) = 0.4
2F_5 = 0.8,  d_6 = int(0.8) = 0,  F_6 = frac(0.8) = 0.8
2F_6 = 1.6,  d_7 = int(1.6) = 1,  F_7 = frac(1.6) = 0.6

Note that $2F_2 = 1.6 = 2F_6$. The patterns $d_k = d_{k+4}$ and $F_k = F_{k+4}$ will occur for $k = 2, 3, 4, \ldots$. Thus $7/10 = 0.1\overline{0110}_{\text{two}}$.

Geometric series can be used to find the base 10 rational number that a binary number represents.

Example 1.13. Find the base 10 rational number that the binary number $0.\overline{01}_{\text{two}}$ represents.

In expanded form,

$$0.\overline{01}_{\text{two}} = (0 \times 2^{-1}) + (1 \times 2^{-2}) + (0 \times 2^{-3}) + (1 \times 2^{-4}) + \cdots = \sum_{n=1}^{\infty}\left(\frac{1}{4}\right)^n = \frac{1/4}{1 - 1/4} = \frac{1}{3}.$$

If a rational number that is equivalent to an infinite repeating binary expansion is to be found, then a shift in the digits can be helpful. For example, let $S$ be given by

$$S = 0.00000\overline{11000}_{\text{two}}. \qquad (23)$$

Multiplying both sides of (23) by $2^5$ will shift the binary point five places to the right, and $32S$ has the form

$$32S = 0.\overline{11000}_{\text{two}}. \qquad (24)$$

Similarly, multiplying both sides of (23) by $2^{10}$ will shift the binary point ten places to the right, and $1024S$ has the form

$$1024S = 11000.\overline{11000}_{\text{two}}. \qquad (25)$$

The result of naively taking the differences between the left- and right-hand sides of (24) and (25) is $992S = 11000_{\text{two}}$, or $992S = 24$, since $11000_{\text{two}} = 24$. Therefore $S = 24/992 = 3/124$.

Scientific Notation

A standard way to present a real number, called scientific notation, is obtained by shifting the decimal point and supplying an appropriate power of 10. For example,

$$0.0000747 = 7.47 \times 10^{-5}, \qquad 31.4159265 = 3.14159265 \times 10, \qquad 9{,}700{,}000{,}000 = 9.7 \times 10^{9}.$$

In chemistry, an important constant is Avogadro's number, which is $6.02252 \times 10^{23}$. It is the number of atoms in the gram atomic weight of an element.
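In computer science, $1\text{K} = 1.024 \times 10^3$.

Table 1.3 Decimal Equivalents for a Set of Binary Numbers with 4-Bit Mantissa and Exponent n = −3, −2, −1, 0, 1

Mantissa (two)   n = −3      n = −2     n = −1    n = 0     n = 1
0.1000           0.0625      0.125      0.25      0.5       1.0
0.1001           0.0703125   0.140625   0.28125   0.5625    1.125
0.1010           0.078125    0.15625    0.3125    0.625     1.25
0.1011           0.0859375   0.171875   0.34375   0.6875    1.375
0.1100           0.09375     0.1875     0.375     0.75      1.5
0.1101           0.1015625   0.203125   0.40625   0.8125    1.625
0.1110           0.109375    0.21875    0.4375    0.875     1.75
0.1111           0.1171875   0.234375   0.46875   0.9375    1.875

Machine Numbers

Computers use a normalized floating-point binary representation for real numbers. This means that the mathematical quantity $x$ is not actually stored in the computer. Instead, the computer stores a binary approximation to $x$:

$$x \approx \pm q \times 2^n. \qquad (26)$$

The number $q$ is the mantissa, and it is a finite binary expression satisfying the inequality

$$\frac{1}{2} \le q < 1, \qquad (27)$$

and the exponent $n$ is an integer. If the mantissa is restricted to four binary digits, the numbers that can be stored are

$$0.1000_{\text{two}} \times 2^n,\; 0.1001_{\text{two}} \times 2^n,\; \ldots,\; 0.1110_{\text{two}} \times 2^n,\; 0.1111_{\text{two}} \times 2^n \qquad (28)$$

for each allowed exponent $n$. The decimal forms of these numbers for $n = -3, -2, -1, 0, 1$ are given in Table 1.3. It is important to learn that when the mantissa and exponent in (26) are restricted, the computer has a limited number of values it chooses from to store as an approximation to the real number $x$.

What would happen if a computer had only a 4-bit mantissa and was restricted to perform the computation $\left(\frac{1}{10} + \frac{1}{5}\right) + \frac{1}{6}$? Assume that the computer rounds all real numbers to the closest binary number in Table 1.3. At each step the reader can look at the table to see that the best approximation is being used.

$$\begin{aligned} \tfrac{1}{10} &\approx 0.1101_{\text{two}} \times 2^{-3} = 0.01101_{\text{two}} \times 2^{-2} \\ +\ \tfrac{1}{5} &\approx 0.1101_{\text{two}} \times 2^{-2} = 0.1101_{\text{two}} \times 2^{-2} \\ &\qquad\qquad\qquad\quad\;\; 1.00111_{\text{two}} \times 2^{-2} \end{aligned} \qquad (29)$$

The computer must decide how to store the number $1.00111_{\text{two}} \times 2^{-2}$. Assume that it is rounded to $0.1010_{\text{two}} \times 2^{-1}$. The next step is

$$\begin{aligned} \tfrac{1}{10} + \tfrac{1}{5} &\approx 0.1010_{\text{two}} \times 2^{-1} = 0.1010_{\text{two}} \times 2^{-1} \\ +\ \tfrac{1}{6} &\approx 0.1011_{\text{two}} \times 2^{-2} = 0.01011_{\text{two}} \times 2^{-1} \\ &\qquad\qquad\qquad\qquad\quad\;\; 0.11111_{\text{two}} \times 2^{-1} \end{aligned} \qquad (30)$$

The computer must decide how to store the number $0.11111_{\text{two}} \times 2^{-1}$. Since rounding is assumed to take place, it stores $0.1000_{\text{two}} \times 2^{0}$. Therefore, the computer's solution to the addition problem is

$$\frac{7}{15} \approx 0.1000_{\text{two}} \times 2^{0}. \qquad (31)$$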
The error in the computer's calculation is

$$\left|\frac{7}{15} - 0.5000\right| \approx |0.4667 - 0.5000| = 0.0333.$$

Expressed as a percentage of $7/15$, this amounts to 7.14%.

Computer Accuracy

To store numbers accurately, computers must have floating-point binary numbers with at least 24 binary bits used for the mantissa; this translates to about seven decimal places. If a 32-bit mantissa is used, numbers with nine decimal places can be stored. Now, again, consider the difficulty encountered in (1) at the beginning of the section, when a computer added $1/10$ repeatedly.

Suppose that the mantissa $q$ in (26) contains 32 binary bits. The condition $1/2 \le q$ implies that the first digit is $d_1 = 1$. Hence $q$ has the form

$$q = 0.1 d_2 d_3 \cdots d_{31} d_{32\,\text{two}}. \qquad (32)$$

When fractions are represented in binary form, it is often the case that infinitely many digits are required. An example is

$$\frac{1}{10} = 0.0\overline{0011}_{\text{two}}. \qquad (33)$$

When the 32-bit mantissa is used, truncation occurs. In normalized form,

$$\frac{1}{10} = 0.\overline{1100}_{\text{two}} \times 2^{-3}, \qquad (34)$$

and the computer uses the internal approximation

$$\frac{1}{10} \approx 0.11001100110011001100110011001100_{\text{two}} \times 2^{-3}. \qquad (35)$$

The error in the approximation in (35), the difference between (34) and (35), is

$$0.\overline{1100}_{\text{two}} \times 2^{-35} \approx 2.328306437 \times 10^{-11}. \qquad (36)$$

Because of (36), the computer must be in error when it sums the 100,000 addends of $1/10$ in (1). The error must be greater than $(100{,}000)(2.328306437 \times 10^{-11}) = 2.328306437 \times 10^{-6}$. Indeed, there is a much larger error. Occasionally, the partial sum could be rounded up or down. Also, as the sum grows, the later addends of $1/10$ are small compared to the current size of the sum, and their contribution is truncated more severely. The compounding effect of these errors actually produced the error $10{,}000 - 9999.99447 = 5.53 \times 10^{-3}$.

Computer Floating-point Numbers

Computers have both an integer mode and a floating-point mode for representing numbers. The integer mode is used for performing calculations that are known to be integer valued and has limited usage for numerical analysis. Floating-point numbers are used for scientific and engineering applications.
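The effect of storing $1/10$ with a finite mantissa can be observed on current hardware. The Python sketch below is an illustration under stated assumptions, not the machine that produced (1). It emulates IEEE single precision, which has a 24-bit mantissa, by rounding $1/10$ and every partial sum through the `'f'` format of the standard `struct` module. It then compares the result with ordinary double precision. The function names `to_single` and `accumulate` are hypothetical.

```python
import struct

def to_single(x):
    """Round a Python float (IEEE double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def accumulate(term, count, single=False):
    """Add `term` to a running total `count` times, optionally emulating single precision."""
    total = 0.0
    if single:
        term = to_single(term)          # 1/10 itself is first rounded to a 24-bit mantissa
    for _ in range(count):
        total += term
        if single:
            total = to_single(total)    # round each partial sum, as single-precision hardware would
    return total

s32 = accumulate(0.1, 100_000, single=True)
s64 = accumulate(0.1, 100_000)
print("single precision:", s32, "  error:", 10_000 - s32)
print("double precision:", s64, "  error:", 10_000 - s64)
```

The single-precision total misses 10,000 by an error on the order of 1. This is consistent with the text's observation that later addends are rounded more severely as the sum grows. The double-precision total is off only in roughly the eighth decimal place.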
It must be understood that any computer implementation of equation (26) places restrictions on the number of digits used in the mantissa $q$, and that the range of possible exponents $n$ must be limited. Computers that use 32 bits to represent single-precision real numbers use 8 bits for the exponent and 24 bits for the mantissa. They can represent real numbers with magnitudes in the range 2.938736E−39 to 1.701412E+38 (i.e., $2^{-128}$ to $2^{127}$) with six decimal digits of numerical precision (e.g., $2^{-23} = 1.2 \times 10^{-7}$).

Computers that use 48 bits to represent single-precision real numbers might use 8 bits for the exponent and 40 bits for the mantissa. They can represent real numbers in the range 2.9387358771E−39 to 1.7014118346E+38 (i.e., $2^{-128}$ to $2^{127}$) with 11 decimal digits of numerical precision (e.g., $2^{-39} = 1.8 \times 10^{-12}$).

If the computer has 64-bit double-precision real numbers, it might use 11 bits for the exponent and 53 bits for the mantissa. They can represent real numbers in the range 5.562684646268003E−309 to 8.988465674311580E+307 (i.e., $2^{-1024}$ to $2^{1023}$) with about 16 decimal digits of numerical precision (e.g., $2^{-52} = 2.2 \times 10^{-16}$).

Exercises for Binary Numbers

1. Use a computer to accumulate the following sums. The intent is to have the computer do repeated subtractions. Do not use the multiplication shortcut.
   (a) $10{,}000 - \sum_{k=1}^{100{,}000} 0.1$   (b) $10{,}000 - \sum_{k=1}^{80{,}000} 0.125$
2. Use equations (4) and (5) to convert the following binary numbers to decimal (base 10) form.
   (a) $10101_{\text{two}}$   (b) $1110000_{\text{two}}$   (c) …   (d) $100000011_{\text{two}}$
3. Use equations (16) and (17) to convert the following binary fractions to decimal (base 10) form.
   (a) $0.10_{\text{two}}$   (b) $0.1010_{\text{two}}$   (c) $0.1010101_{\text{two}}$   (d) …
4. Convert the following binary numbers to decimal (base 10) form.
   (a) $1.0110101_{\text{two}}$   (b) $11.0010010001_{\text{two}}$
5. The numbers in Exercise 4 are approximately $\sqrt{2}$ and $\pi$. Find the error in these approximations, that is, find
   (a) $\sqrt{2} - 1.0110101_{\text{two}}$ (use $\sqrt{2} = 1.41421356237309\ldots$)
   (b) $\pi - 11.0010010001_{\text{two}}$ (use $\pi = 3.14159265358979\ldots$)
6. Follow Example 1.10 and convert the following to binary numbers.
   (a) 23   (b) 87   (c) 378   (d) 2388
7. Follow Example 1.12 and convert the following to a binary fraction of the form $0.d_1 d_2 \cdots d_{n\,\text{two}}$.
   (a) 7/16   (b) 13/16   (c) 23/32   (d) 75/128
8. Follow Example 1.12 and convert the following to an infinite repeating binary fraction.
   (a) 1/10   (b) 1/3   (c) …
9. For the following seven-digit binary approximations, find the error in the approximation $R - 0.d_1 d_2 d_3 d_4 d_5 d_6 d_{7\,\text{two}}$.
   (a) $1/10 \approx 0.0001100_{\text{two}}$   (b) $1/7 \approx 0.0010010_{\text{two}}$
10. Show that the binary expansion $1/7 = 0.\overline{001}_{\text{two}}$ is equivalent to $\frac{1}{7} = \frac{1}{8} + \frac{1}{64} + \frac{1}{512} + \cdots$. Use Theorem 1.14 to establish this expansion.
11. Show that the binary expansion $1/5 = 0.\overline{0011}_{\text{two}}$ is equivalent to $\frac{1}{5} = \frac{3}{16} + \frac{3}{256} + \frac{3}{4096} + \cdots$. Use Theorem 1.14 to establish this expansion.
12. Prove that any number $2^{-N}$, where $N$ is a positive integer, can be represented as a decimal number that has $N$ digits, that is, $2^{-N} = 0.d_1 d_2 d_3 \cdots d_N$. Hint. $1/2 = 0.5$, $1/4 = 0.25$, ….
13. Use Table 1.3 to determine what happens when a computer with a 4-bit mantissa performs the following calculations.
   (a) …   (b) …   (c) …   (d) …
14. Show that when 2 is replaced by 3 in all the formulas in (8), the result is a method for finding the base 3 expansion of a positive integer. Express the following integers in base 3.
   (a) 10   (b) 23   (c) …   (d) …
15. Show that when 2 is replaced by 3 in (22), the result is a method for finding the base 3 expansion of a positive number that lies in the interval $0 < R < 1$. Express the following numbers in base 3.
   (a) 1/3   (b) 1/2   (c) 1/10   (d) …
16. Show that when 2 is replaced by 5 in all the formulas in (8), the result is a method for finding the base 5 expansion of a positive integer. Express the following integers in base 5.
   (a) 10   (b) 35   (c) …   (d) …
17. Show that when 2 is replaced by 5 in (22), the result is a method for finding the base 5 expansion of a positive number that lies in the interval $0 < R < 1$. Express the following numbers in base 5.
   (a) 1/5   (b) 1/2   (c) 1/10   (d) 154/625

1.3 Error Analysis

In the practice of numerical analysis it is important to be aware that computed solutions are not exact mathematical solutions.
The precision of a numerical solution can be diminished in several subtle ways. Understanding these difficulties can often guide the practitioner in the proper implementation and/or development of numerical algorithms.

Definition 1.7. Suppose that $\hat{p}$ is an approximation to $p$. The absolute error is $E_p = |p - \hat{p}|$, and the relative error is $R_p = |p - \hat{p}|/|p|$, provided that $p \ne 0$.

The error is simply the difference between the true value and the approximate value, whereas the relative error is a portion of the true value.

Example 1.14. Find the error and relative error in the following three cases.

Let $x = 3.141592$ and $\hat{x} = 3.14$; then the error is

$$E_x = |x - \hat{x}| = |3.141592 - 3.14| = 0.001592, \qquad (1a)$$

and the relative error is

$$R_x = \frac{|x - \hat{x}|}{|x|} = \frac{0.001592}{3.141592} = 0.000507.$$

Let $y = 1{,}000{,}000$ and $\hat{y} = 999{,}996$; then the error is

$$E_y = |y - \hat{y}| = |1{,}000{,}000 - 999{,}996| = 4, \qquad (1b)$$

and the relative error is

$$R_y = \frac{|y - \hat{y}|}{|y|} = \frac{4}{1{,}000{,}000} = 0.000004.$$

Let $z = 0.000012$ and $\hat{z} = 0.000009$; then the error is

$$E_z = |z - \hat{z}| = |0.000012 - 0.000009| = 0.000003, \qquad (1c)$$

and the relative error is

$$R_z = \frac{|z - \hat{z}|}{|z|} = \frac{0.000003}{0.000012} = 0.25.$$

In case (1a), there is not too much difference between $E_x$ and $R_x$, and either could be used to determine the accuracy of $\hat{x}$. In case (1b), the value of $y$ is of magnitude $10^6$, the error $E_y$ is large, and the relative error $R_y$ is small. In this case, $\hat{y}$ would probably be considered a good approximation to $y$. In case (1c), $z$ is of magnitude $10^{-5}$ and the error $E_z$ is the smallest of all three cases, but the relative error $R_z$ is the largest. In terms of percentage, it amounts to 25%, and thus $\hat{z}$ is a bad approximation to $z$.

Observe that as $|p|$ moves away from 1 (greater than or less than), the relative error $R_p$ is a better indicator of the accuracy of the approximation than $E_p$. Relative error is preferred for floating-point representations since it deals directly with the mantissa.

Definition 1.8. The number $\hat{p}$ is said to approximate $p$ to $d$ significant digits if $d$ is the largest positive integer for which

$$\frac{|p - \hat{p}|}{|p|} < \frac{10^{1-d}}{2}.$$

Example 1.15. Determine the number of significant digits for the approximations in Example 1.14.
(1a) If $x = 3.141592$ and $\hat{x} = 3.14$, then $|x - \hat{x}|/|x| = 0.000507 < 10^{-2}/2$. Therefore, $\hat{x}$ approximates $x$ to three significant digits.

(1b) If $y = 1{,}000{,}000$ and $\hat{y} = 999{,}996$, then $|y - \hat{y}|/|y| = 0.000004 < 10^{-5}/2$. Therefore, $\hat{y}$ approximates $y$ to six significant digits.

(1c) If $z = 0.000012$ and $\hat{z} = 0.000009$, then $|z - \hat{z}|/|z| = 0.25 < 10^{0}/2$. Therefore, $\hat{z}$ approximates $z$ to one significant digit.

Figure 1.7 The graphs of $y = f(x) = e^{x^2}$ and $y = P_8(x)$, and the area under the curve for $0 \le x \le 1/2$.

Truncation Error

The notion of truncation error usually refers to errors introduced when a more complicated mathematical expression is "replaced" with a more elementary formula. This terminology originates from the technique of replacing a complicated function with a truncated Taylor series. For example, the infinite Taylor series

$$e^{x^2} = 1 + x^2 + \frac{x^4}{2!} + \frac{x^6}{3!} + \frac{x^8}{4!} + \cdots + \frac{x^{2n}}{n!} + \cdots$$

might be replaced with just the first five terms $1 + x^2 + \frac{x^4}{2!} + \frac{x^6}{3!} + \frac{x^8}{4!}$. This might be done when approximating an integral numerically.

Example 1.16. Given that $\int_0^{1/2} e^{x^2}\,dx = 0.544987104184 = p$, determine the accuracy of the approximation obtained by replacing the integrand $f(x) = e^{x^2}$ with the truncated Taylor series $P_8(x) = 1 + x^2 + \frac{x^4}{2!} + \frac{x^6}{3!} + \frac{x^8}{4!}$.

Term-by-term integration produces

$$\int_0^{1/2} \left(1 + x^2 + \frac{x^4}{2!} + \frac{x^6}{3!} + \frac{x^8}{4!}\right) dx = \left(x + \frac{x^3}{3} + \frac{x^5}{5(2!)} + \frac{x^7}{7(3!)} + \frac{x^9}{9(4!)}\right)\Bigg|_{x=0}^{x=1/2}$$

$$= \frac{1}{2} + \frac{1}{24} + \frac{1}{320} + \frac{1}{5376} + \frac{1}{110{,}592} = \frac{2{,}109{,}491}{3{,}870{,}720} = 0.544986720817 = \hat{p}.$$

Since $10^{-5}/2 > |p - \hat{p}|/|p| = 7.03442 \times 10^{-7} > 10^{-6}/2$, the approximation $\hat{p}$ agrees with the true answer $p = 0.544987104184$ to five significant digits. The graphs of $f(x) = e^{x^2}$ and $y = P_8(x)$ and the area under the curve for $0 \le x \le 1/2$ are shown in Figure 1.7.

Round-off Error

A computer's representation of real numbers is limited to the fixed precision of the mantissa. True values are sometimes not stored exactly by a computer's representation. This is called round-off error. In the preceding section the real number $1/10 = 0.0\overline{0011}_{\text{two}}$ was truncated when it was stored in a computer. The actual number that is stored in the computer may undergo chopping or rounding of the last digit.
Therefore, since the computer hardware works with only a limited number of digits in machine numbers, rounding errors are introduced and propagated in successive computations.

Chopping Off versus Rounding Off

Consider any real number $p$ that is expressed in a normalized decimal form:

$$p = \pm 0.d_1 d_2 d_3 \cdots d_k d_{k+1} \cdots \times 10^n,$$

where $1 \le d_1 \le 9$ and $0 \le d_j \le 9$ for $j > 1$. Suppose that $k$ is the maximum number of decimal digits carried in the floating-point computations of a computer; then the real number $p$ is represented by $fl_{\text{chop}}(p)$, which is given by

$$fl_{\text{chop}}(p) = \pm 0.d_1 d_2 d_3 \cdots d_k \times 10^n,$$

where $1 \le d_1 \le 9$ and $0 \le d_j \le 9$ for $1 < j \le k$. The number $fl_{\text{chop}}(p)$ is called the chopped floating-point representation of $p$. In this case the $k$th digit of $fl_{\text{chop}}(p)$ agrees with the $k$th digit of $p$. An alternative $k$-digit representation is the rounded floating-point representation $fl_{\text{round}}(p)$, which is given by

$$fl_{\text{round}}(p) = \pm 0.d_1 d_2 d_3 \cdots r_k \times 10^n,$$

where $1 \le d_1 \le 9$ and $0 \le d_j \le 9$ for $1 < j < k$, and the last digit, $r_k$, is obtained by rounding the number $d_k.d_{k+1}d_{k+2}\cdots$ to the nearest integer. For example, the real number

$$p = \frac{22}{7} = 3.142857142857142857\ldots$$

has the following six-digit representations:

$$fl_{\text{chop}}(p) = 0.314285 \times 10^1, \qquad fl_{\text{round}}(p) = 0.314286 \times 10^1.$$

For common purposes the chopping and rounding would be written as 3.14285 and 3.14286, respectively. The reader should note that essentially all computers use some form of the rounded floating-point representation method.

Loss of Significance

Consider the two numbers $p = 3.1415926536$ and $q = 3.1415957341$, which are nearly equal and both carry 11 decimal digits of precision. Suppose that their difference is formed: $p - q = -0.0000030805$. Since the first six digits of $p$ and $q$ are the same, their difference $p - q$ contains only five decimal digits of precision. This phenomenon is called loss of significance or subtractive cancellation. This reduction in the precision of the final computed answer can creep in when it is not suspected.

Example 1.17. Compare the results of calculating $f(500)$ and $g(500)$ using six digits and rounding.
The functions are $f(x) = x\left(\sqrt{x+1} - \sqrt{x}\right)$ and $g(x) = \dfrac{x}{\sqrt{x+1} + \sqrt{x}}$. For the first function,
$$f(500) = 500\left(\sqrt{501} - \sqrt{500}\right) = 500(22.3830 - 22.3607) = 500(0.0223) = 11.1500.$$
For $g(x)$,
$$g(500) = \frac{500}{\sqrt{501} + \sqrt{500}} = \frac{500}{22.3830 + 22.3607} = \frac{500}{44.7437} = 11.1748.$$
The second function, $g(x)$, is algebraically equivalent to $f(x)$, as shown by the computation
$$f(x) = \frac{x\left(\sqrt{x+1} - \sqrt{x}\right)\left(\sqrt{x+1} + \sqrt{x}\right)}{\sqrt{x+1} + \sqrt{x}} = \frac{x\left(\left(\sqrt{x+1}\right)^2 - \left(\sqrt{x}\right)^2\right)}{\sqrt{x+1} + \sqrt{x}} = \frac{x}{\sqrt{x+1} + \sqrt{x}}.$$
The answer, $g(500) = 11.1748$, involves less error and is the same as that obtained by rounding the true answer $11.174755300747198\ldots$ to six digits.

The reader is encouraged to study Exercise 12 on how to avoid loss of significance in the quadratic formula. The next example shows that a truncated Taylor series will sometimes help avoid the loss of significance error.

Example 1.18. Compare the results of calculating $f(0.01)$ and $P(0.01)$ using six digits and rounding, where
$$f(x) = \frac{e^x - 1 - x}{x^2} \qquad \text{and} \qquad P(x) = \frac{1}{2} + \frac{x}{6} + \frac{x^2}{24}.$$
The function $P(x)$ is the Taylor polynomial of degree $n = 2$ for $f(x)$ expanded about $x = 0$. For the first function
$$f(0.01) = \frac{e^{0.01} - 1 - 0.01}{(0.01)^2} = \frac{1.01005 - 1 - 0.01}{0.0001} = 0.5.$$
For the second function
$$P(0.01) = \frac{1}{2} + \frac{0.01}{6} + \frac{0.0001}{24} = 0.5 + 0.001667 + 0.000004 = 0.501671.$$
The answer $P(0.01) = 0.501671$ contains less error and is the same as that obtained by rounding the true answer $0.50167084168057542\ldots$ to six digits.

For polynomial evaluation, the rearrangement of terms into nested multiplication form will sometimes produce a better result.

Example 1.19. Let $P(x) = x^3 - 3x^2 + 3x - 1$ and $Q(x) = ((x - 3)x + 3)x - 1$. Use three-digit rounding arithmetic to compute approximations to $P(2.19)$ and $Q(2.19)$. Compare them with the true values, $P(2.19) = Q(2.19) = 1.685159$.
$$P(2.19) \approx (2.19)^3 - 3(2.19)^2 + 3(2.19) - 1 = 10.5 - 14.4 + 6.57 - 1 = 1.67.$$
$$Q(2.19) \approx ((2.19 - 3)2.19 + 3)2.19 - 1 = 1.69.$$
The errors are 0.015159 and -0.004841, respectively. Thus the approximation $Q(2.19) \approx 1.69$ has less error. Exercise 6 explores the situation near the root of this polynomial.

$O(h^n)$ Order of Approximation
Clearly the sequences $\{1/n^2\}_{n=1}^{\infty}$ and $\{1/n\}_{n=1}^{\infty}$ are both converging to zero. In addition, it should be observed that the first sequence is converging to zero more rapidly than the second sequence.
In the coming chapters some special terminology and notation will be used to describe how rapidly a sequence is converging.

Definition 1.9. The function $f(h)$ is said to be big Oh of $g(h)$, denoted $f(h) = O(g(h))$, if there exist constants $C$ and $c$ such that
$$|f(h)| \le C|g(h)| \quad \text{whenever } h \ge c. \tag{7}$$

Example 1.20. Consider the functions $f(x) = x^2 + 1$ and $g(x) = x^3$. Since $x^2 \le x^3$ and $1 \le x^3$ for $x \ge 1$, it follows that $x^2 + 1 \le 2x^3$ for $x \ge 1$. Therefore, $f(x) = O(g(x))$.

The big Oh notation provides a useful way of describing the rate of growth of a function in terms of well-known elementary functions ($x^n$, $x^{1/n}$, $a^x$, $\log_a x$, etc.). The rate of convergence of sequences can be described in a similar manner.

Definition 1.10. Let $\{x_n\}_{n=1}^{\infty}$ and $\{y_n\}_{n=1}^{\infty}$ be two sequences. The sequence $\{x_n\}$ is said to be of order big Oh of $\{y_n\}$, denoted $x_n = O(y_n)$, if there exist constants $C$ and $N$ such that
$$|x_n| \le C|y_n| \quad \text{whenever } n \ge N. \tag{8}$$

Example 1.21. $\dfrac{n^2 - 1}{n^3} = O\!\left(\dfrac{1}{n}\right)$, since $\dfrac{n^2 - 1}{n^3} \le \dfrac{n^2}{n^3} = \dfrac{1}{n}$ whenever $n \ge 1$.

Often a function $f(h)$ is approximated by a function $p(h)$ and the error bound is known to be $M|h^n|$. This leads to the following definition.

Definition 1.11. Assume that $f(h)$ is approximated by the function $p(h)$ and that there exist a real constant $M > 0$ and a positive integer $n$ so that
$$\frac{|f(h) - p(h)|}{|h^n|} \le M \quad \text{for sufficiently small } h. \tag{9}$$
We say that $p(h)$ approximates $f(h)$ with order of approximation $O(h^n)$ and write
$$f(h) = p(h) + O(h^n). \tag{10}$$

When relation (9) is rewritten in the form $|f(h) - p(h)| \le M|h^n|$, we see that the notation $O(h^n)$ stands in place of the error bound $M|h^n|$. The following results show how to apply the definition to simple combinations of two functions.

Theorem 1.15. Assume that $f(h) = p(h) + O(h^n)$, $g(h) = q(h) + O(h^m)$, and $r = \min\{m, n\}$. Then
$$f(h) + g(h) = p(h) + q(h) + O(h^r), \tag{11}$$
$$f(h)g(h) = p(h)q(h) + O(h^r), \tag{12}$$
$$\frac{f(h)}{g(h)} = \frac{p(h)}{q(h)} + O(h^r), \quad \text{provided that } g(h) \ne 0 \text{ and } q(h) \ne 0. \tag{13}$$

It is instructive to consider $p(h)$ to be the $n$th Taylor polynomial approximation of $f(h)$; then the remainder term is simply designated $O(h^{n+1})$, which stands for the presence of omitted terms starting with the power $h^{n+1}$.
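As a quick numerical illustration of the remark above, here is a sketch in which $f(h) = e^h$ and its third-degree Taylor polynomial are chosen for illustration (they are not taken from the text). With $n = 3$ the remainder should behave like $Mh^4$: the ratio error$/h^4$ should settle near $f^{(4)}(0)/4! = 1/24$, and halving $h$ should divide the error by about $2^4 = 16$.

```matlab
% Illustrative check that e^h - p3(h) = O(h^4), where p3 is the
% third-degree Taylor polynomial of e^h about 0 (n = 3, so O(h^(n+1))).
f  = @(h) exp(h);
p3 = @(h) 1 + h + h.^2/2 + h.^3/6;

h   = 0.1 ./ 2.^(0:4);                   % h = 0.1, 0.05, ..., 0.00625
err = abs(f(h) - p3(h));                 % truncation error at each h
M   = err ./ h.^4;                       % ratio should approach 1/24
orders = log2(err(1:end-1) ./ err(2:end));   % observed power of h, near 4

disp([h' err' M'])
disp(orders)
```

The observed exponents come out slightly above 4 for the larger $h$ because the next omitted term, $h^5/120$, still contributes there.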
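The remainder term converges to zero with the same rapidity that $h^{n+1}$ converges to zero as $h$ approaches zero, as expressed in the relationship
$$O(h^{n+1}) \approx Mh^{n+1} \approx \frac{f^{(n+1)}(c)}{(n+1)!}h^{n+1} \tag{14}$$
for sufficiently small $h$. Hence the notation $O(h^{n+1})$ stands in place of the quantity $Mh^{n+1}$, where $M$ is a constant or "behaves like a constant."

Theorem 1.16 (Taylor's Theorem). Assume that $f \in C^{n+1}[a, b]$. If both $x_0$ and $x = x_0 + h$ lie in $[a, b]$, then
$$f(x_0 + h) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!}h^k + O(h^{n+1}). \tag{15}$$

The following example illustrates the above theorems. The computations use the addition properties (i) $O(h^p) + O(h^p) = O(h^p)$, (ii) $O(h^p) + O(h^q) = O(h^r)$, where $r = \min\{p, q\}$, and the multiplicative property (iii) $O(h^p)O(h^q) = O(h^s)$, where $s = p + q$.

Example 1.22. Consider the Taylor polynomial expansions
$$e^h = 1 + h + \frac{h^2}{2!} + \frac{h^3}{3!} + O(h^4) \qquad \text{and} \qquad \cos(h) = 1 - \frac{h^2}{2!} + \frac{h^4}{4!} + O(h^6).$$
Determine the order of approximation for their sum and product.
For the sum we have
$$e^h + \cos(h) = 1 + h + \frac{h^2}{2!} + \frac{h^3}{3!} + O(h^4) + 1 - \frac{h^2}{2!} + \frac{h^4}{4!} + O(h^6) = 2 + h + \frac{h^3}{3!} + O(h^4) + \frac{h^4}{4!} + O(h^6).$$
Since $O(h^4) + \frac{h^4}{4!} = O(h^4)$ and $O(h^4) + O(h^6) = O(h^4)$, this reduces to
$$e^h + \cos(h) = 2 + h + \frac{h^3}{3!} + O(h^4),$$
and the order of approximation is $O(h^4)$.
The product is treated similarly:
$$e^h\cos(h) = \left(1 + h + \frac{h^2}{2!} + \frac{h^3}{3!} + O(h^4)\right)\left(1 - \frac{h^2}{2!} + \frac{h^4}{4!} + O(h^6)\right)$$
$$= \left(1 + h + \frac{h^2}{2!} + \frac{h^3}{3!}\right)\left(1 - \frac{h^2}{2!} + \frac{h^4}{4!}\right) + \left(1 + h + \frac{h^2}{2!} + \frac{h^3}{3!}\right)O(h^6) + \left(1 - \frac{h^2}{2!} + \frac{h^4}{4!}\right)O(h^4) + O(h^4)O(h^6)$$
$$= 1 + h - \frac{h^3}{3} - \frac{5h^4}{24} - \frac{h^5}{24} + \frac{h^6}{48} + \frac{h^7}{144} + O(h^6) + O(h^4) + O(h^4)O(h^6).$$
Since $O(h^4)O(h^6) = O(h^{10})$ and
$$-\frac{5h^4}{24} - \frac{h^5}{24} + \frac{h^6}{48} + \frac{h^7}{144} + O(h^6) + O(h^4) + O(h^{10}) = O(h^4),$$
the preceding equation is simplified to yield
$$e^h\cos(h) = 1 + h - \frac{h^3}{3} + O(h^4),$$
and the order of approximation is $O(h^4)$.

Order of Convergence of a Sequence
Numerical approximations are often arrived at by computing a sequence of approximations that get closer and closer to the desired answer. The definition of big Oh for sequences was given in Definition 1.10, and the definition of order of convergence for a sequence is analogous to that given for functions in Definition 1.11.

Definition 1.12. Suppose that $\lim_{n\to\infty} x_n = x$ and $\{r_n\}_{n=1}^{\infty}$ is a sequence with $\lim_{n\to\infty} r_n = 0$.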
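We say that $\{x_n\}_{n=1}^{\infty}$ converges to $x$ with the order of convergence $O(r_n)$ if there exists a constant $K > 0$ such that
$$\frac{|x_n - x|}{|r_n|} \le K \quad \text{for } n \text{ sufficiently large.}$$
This is indicated by writing $x_n = x + O(r_n)$, or $x_n \to x$ with order of convergence $O(r_n)$.

Example 1.23. Let $x_n = \cos(n)/n^2$ and $r_n = 1/n^2$; then $\lim_{n\to\infty} x_n = 0$ with a rate of convergence $O(1/n^2)$. This follows immediately from the relation
$$\frac{|\cos(n)/n^2|}{|1/n^2|} = |\cos(n)| \le 1 \quad \text{for all } n.$$

Propagation of Error
Let us investigate how error might be propagated in successive computations. Consider the addition of two numbers $p$ and $q$ (the true values) with the approximate values $\hat{p}$ and $\hat{q}$, which contain errors $\epsilon_p$ and $\epsilon_q$, respectively. Starting with $p = \hat{p} + \epsilon_p$ and $q = \hat{q} + \epsilon_q$, the sum is
$$p + q = (\hat{p} + \epsilon_p) + (\hat{q} + \epsilon_q) = (\hat{p} + \hat{q}) + (\epsilon_p + \epsilon_q). \tag{16}$$
Hence, for addition, the error in the sum is the sum of the errors in the addends.

The propagation of error in multiplication is more complicated. The product is
$$pq = (\hat{p} + \epsilon_p)(\hat{q} + \epsilon_q) = \hat{p}\hat{q} + \hat{p}\epsilon_q + \hat{q}\epsilon_p + \epsilon_p\epsilon_q. \tag{17}$$
Hence, if $\hat{p}$ and $\hat{q}$ are larger than 1 in absolute value, the terms $\hat{p}\epsilon_q$ and $\hat{q}\epsilon_p$ show that there is a possibility of magnification of the original errors $\epsilon_p$ and $\epsilon_q$. Insights are gained if we look at the relative error. Rearrange the terms in (17) to get
$$pq - \hat{p}\hat{q} = \hat{p}\epsilon_q + \hat{q}\epsilon_p + \epsilon_p\epsilon_q. \tag{18}$$
Suppose that $p \ne 0$ and $q \ne 0$; then we can divide (18) by $pq$ to obtain the relative error in the product $pq$:
$$R_{pq} = \frac{pq - \hat{p}\hat{q}}{pq} = \frac{\hat{p}\epsilon_q + \hat{q}\epsilon_p + \epsilon_p\epsilon_q}{pq} = \frac{\hat{p}\epsilon_q}{pq} + \frac{\hat{q}\epsilon_p}{pq} + \frac{\epsilon_p\epsilon_q}{pq}. \tag{19}$$
Furthermore, suppose that $\hat{p}$ and $\hat{q}$ are good approximations for $p$ and $q$; then $\hat{p}/p \approx 1$, $\hat{q}/q \approx 1$, and $R_pR_q = (\epsilon_p/p)(\epsilon_q/q) \approx 0$ ($R_p$ and $R_q$ are the relative errors in the approximations $\hat{p}$ and $\hat{q}$). Then making these substitutions into (19) yields a simplified relationship:
$$R_{pq} = \frac{pq - \hat{p}\hat{q}}{pq} \approx \frac{\epsilon_q}{q} + \frac{\epsilon_p}{p} + 0 = R_q + R_p. \tag{20}$$
This shows that the relative error in the product $pq$ is approximately the sum of the relative errors in the approximations $\hat{p}$ and $\hat{q}$.

Often an initial error will be propagated in a sequence of calculations. A quality that is desirable for any numerical process is that a small error in the initial conditions will produce small changes in the final result. An algorithm with this feature is called stable; otherwise, it is called unstable. Whenever possible we shall choose methods that are stable.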
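A small sketch of the two ideas just described. Part 1 uses illustrative approximations $\hat{p} = 3.1416$ of $\pi$ and $\hat{q} = 2.7183$ of $e$ (values chosen here, not from the text) to confirm that the error of a sum is $\epsilon_p + \epsilon_q$ and that $R_{pq}$ is close to $R_p + R_q$. With the convention $p = \hat{p} + \epsilon_p$, the exact relation is $R_{pq} = R_p + R_q - R_pR_q$. Part 2 feeds the same small starting error into the recursions (22b) and (22c) of Example 1.25, which appears later in this section. One keeps the error bounded; the other lets it grow like $3^n$.

```matlab
% Part 1: propagation of error in a sum and a product, cf. (16)-(20).
% The approximations ph and qh are illustrative values chosen here.
p  = pi;      ph = 3.1416;
q  = exp(1);  qh = 2.7183;
ep = p - ph;  eq = q - qh;              % errors, so p = ph + ep and q = qh + eq
Rp = ep/p;    Rq = eq/q;                % relative errors

sumErr = (p + q) - (ph + qh);           % equals ep + eq, as in (16)
Rpq = (p*q - ph*qh)/(p*q);              % relative error of the product, as in (19)
fprintf('Rpq = %.6e   Rp + Rq = %.6e\n', Rpq, Rp + Rq);

% Part 2: one small starting error, one stable and one unstable recursion.
% Started exactly, both recursions reproduce x_n = 1/3^n.
N  = 10;
x  = (1/3).^(0:N);
ps = zeros(1, N+1);  qs = ps;
ps(1:2) = [1 0.33332];                  % second term off by about 0.0000133
qs(1:2) = [1 0.33332];
for k = 3:N+1                           % index k holds the term with subscript k-1
    ps(k) = 4*ps(k-1)/3 - ps(k-2)/3;    % general solution A/3^n + B: error stays bounded
    qs(k) = 10*qs(k-1)/3 - qs(k-2);     % general solution A/3^n + B*3^n: error grows like 3^n
end
disp([(0:N)' (x - ps)' (x - qs)'])
```

The last column reproduces the unstable error column of Table 1.5. Double-precision round-off is amplified by roughly $3^{10}$ as well, but that is still far below the displayed digits.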
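The following definition is used to describe the propagation of error.

Definition 1.13. Suppose that $\epsilon$ represents an initial error and $\epsilon(n)$ represents the growth of the error after $n$ steps. If $|\epsilon(n)| \approx n\epsilon$, the growth of error is said to be linear. If $|\epsilon(n)| \approx K^n\epsilon$, the growth of error is called exponential. If $K > 1$, the exponential error grows without bound as $n \to \infty$, and if $0 < K < 1$, the exponential error diminishes to zero as $n \to \infty$.

The next two examples show how an initial error can propagate in either a stable or an unstable fashion. In the first example, three algorithms are introduced. Each algorithm recursively generates the same sequence. Then, in the second example, small changes will be made to the initial conditions and the propagation of error will be analyzed.

Example 1.24. Show that the following three schemes can be used with infinite-precision arithmetic to recursively generate the terms in the sequence $\{1/3^n\}_{n=0}^{\infty}$.
$$r_0 = 1 \quad \text{and} \quad r_n = \tfrac{1}{3}r_{n-1} \quad \text{for } n = 1, 2, \ldots, \tag{21a}$$
$$p_0 = 1,\ p_1 = \tfrac{1}{3}, \quad \text{and} \quad p_n = \tfrac{4}{3}p_{n-1} - \tfrac{1}{3}p_{n-2} \quad \text{for } n = 2, 3, \ldots, \tag{21b}$$
$$q_0 = 1,\ q_1 = \tfrac{1}{3}, \quad \text{and} \quad q_n = \tfrac{10}{3}q_{n-1} - q_{n-2} \quad \text{for } n = 2, 3, \ldots. \tag{21c}$$
Formula (21a) is obvious. In (21b) the difference equation has the general solution $p_n = A(1/3^n) + B$. This can be verified by direct substitution:
$$\frac{4}{3}p_{n-1} - \frac{1}{3}p_{n-2} = \frac{4}{3}\left(\frac{A}{3^{n-1}} + B\right) - \frac{1}{3}\left(\frac{A}{3^{n-2}} + B\right) = \left(\frac{4}{3^n} - \frac{3}{3^n}\right)A + \left(\frac{4}{3} - \frac{1}{3}\right)B = A\frac{1}{3^n} + B = p_n.$$
Setting $A = 1$ and $B = 0$ generates the required sequence. In (21c) the difference equation has the general solution $q_n = A(1/3^n) + B3^n$. This too is verified by substitution:
$$\frac{10}{3}q_{n-1} - q_{n-2} = \frac{10}{3}\left(\frac{A}{3^{n-1}} + B3^{n-1}\right) - \left(\frac{A}{3^{n-2}} + B3^{n-2}\right) = \left(\frac{10}{3^n} - \frac{9}{3^n}\right)A + (10 - 1)3^{n-2}B = A\frac{1}{3^n} + B3^n = q_n.$$
Setting $A = 1$ and $B = 0$ again generates the required sequence.

Example 1.25. Generate approximations to the sequence $\{x_n\} = \{1/3^n\}$ using the schemes
$$r_0 = 0.99996 \quad \text{and} \quad r_n = \tfrac{1}{3}r_{n-1} \quad \text{for } n = 1, 2, \ldots, \tag{22a}$$
$$p_0 = 1,\ p_1 = 0.33332, \quad \text{and} \quad p_n = \tfrac{4}{3}p_{n-1} - \tfrac{1}{3}p_{n-2} \quad \text{for } n = 2, 3, \ldots, \tag{22b}$$
$$q_0 = 1,\ q_1 = 0.33332, \quad \text{and} \quad q_n = \tfrac{10}{3}q_{n-1} - q_{n-2} \quad \text{for } n = 2, 3, \ldots. \tag{22c}$$
In (22a) the initial error in $r_0$ is 0.00004, and in (22b) and (22c) the initial errors in $p_1$ and $q_1$ are 0.000013. Investigate the propagation of error for each scheme.

Table 1.4 gives the first ten numerical approximations for each sequence, and Table 1.5 gives the error in each formula. The error for $\{r_n\}$ is stable and decreases in an exponential manner. The error for $\{p_n\}$ is stable. The error for $\{q_n\}$ is unstable and grows at an exponential rate. Although the error for $\{p_n\}$ is stable, the true terms $x_n = 1/3^n \to 0$ as $n \to \infty$, so that the error eventually dominates and the terms past $p_8$ have no significant digits. Figures 1.8, 1.9, and 1.10 show the errors in $\{r_n\}$, $\{p_n\}$, and $\{q_n\}$, respectively.

Table 1.4 The Sequence $\{x_n\} = \{1/3^n\}$ and the Approximations $\{r_n\}$, $\{p_n\}$, and $\{q_n\}$

n  | x_n                     | r_n          | p_n           | q_n
0  | 1 = 1.0000000000        | 0.9999600000 | 1.0000000000  | 1.0000000000
1  | 1/3 = 0.3333333333      | 0.3333200000 | 0.3333200000  | 0.3333200000
2  | 1/9 = 0.1111111111      | 0.1111066667 | 0.1110933333  | 0.1110666667
3  | 1/27 = 0.0370370370     | 0.0370355556 | 0.0370177778  | 0.0369022222
4  | 1/81 = 0.0123456790     | 0.0123451852 | 0.0123259259  | 0.0119407407
5  | 1/243 = 0.0041152263    | 0.0041150617 | 0.0040953086  | 0.0029002469
6  | 1/729 = 0.0013717421    | 0.0013716872 | 0.0013517695  | -0.0022732510
7  | 1/2187 = 0.0004572474   | 0.0004572291 | 0.0004372565  | -0.0104777503
8  | 1/6561 = 0.0001524158   | 0.0001524097 | 0.0001324188  | -0.0326525834
9  | 1/19683 = 0.0000508053  | 0.0000508032 | 0.0000308063  | -0.0983641945
10 | 1/59049 = 0.0000169351  | 0.0000169344 | -0.0000030646 | -0.2952280648

Table 1.5 The Error Sequences $\{x_n - r_n\}$, $\{x_n - p_n\}$, and $\{x_n - q_n\}$

n  | x_n - r_n    | x_n - p_n    | x_n - q_n
0  | 0.0000400000 | 0.0000000000 | 0.0000000000
1  | 0.0000133333 | 0.0000133333 | 0.0000133333
2  | 0.0000044444 | 0.0000177778 | 0.0000444444
3  | 0.0000014815 | 0.0000192593 | 0.0001348148
4  | 0.0000004938 | 0.0000197531 | 0.0004049383
5  | 0.0000001646 | 0.0000199177 | 0.0012149794
6  | 0.0000000549 | 0.0000199726 | 0.0036449931
7  | 0.0000000183 | 0.0000199909 | 0.0109349977
8  | 0.0000000061 | 0.0000199970 | 0.0328049992
9  | 0.0000000020 | 0.0000199990 | 0.0984149997
10 | 0.0000000007 | 0.0000199997 | 0.2952449999

Uncertainty in Data
Data from real-world problems contain uncertainty or error. This type of error is referred to as noise. It will affect the accuracy of any numerical computation that is based on the data. An improvement of precision is not accomplished by performing successive computations using noisy data. Hence, if you start with data with $d$ significant digits of accuracy, then the result of a computation should be reported in $d$ significant digits of accuracy. For example, suppose that the data $p_1 = 4.152$ and $p_2 = 0.07931$ both have four significant digits of accuracy.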
Then it is tempting to report all the digits that appear on your calculator (i.e., $p_1 + p_2 = 4.23131$). This is an oversight, because you should not report conclusions from noisy data that have more significant digits than the original data. The proper answer in this situation is $p_1 + p_2 = 4.231$.

Figure 1.10 An unstable increasing error sequence $\{x_n - q_n\}$.

Exercises for Error Analysis
1. Find the error $E_x$ and relative error $R_x$. Also determine the number of significant digits in the approximation.
(a) $x = 2.71828182$, $\hat{x} = 2.7182$
…
2. Complete the following computation … State what type of error is present in this situation. Compare your answer with the true value $p = 0.2553074606$.
3. (a) Consider the data $p_1 = 1.414$ and $p_2 = 0.09125$, which have four significant digits of accuracy. Determine the proper answer for the sum $p_1 + p_2$ and the product $p_1p_2$.
(b) Consider the data $p_1 = 31.415$ and $p_2 = 0.027182$, which have five significant digits of accuracy. Determine the proper answer for the sum $p_1 + p_2$ and the product $p_1p_2$.
4. Complete the following computation and state what type of error is present in this situation.
(a) $\dfrac{\sin(\pi/4 + 0.00001) - \sin(\pi/4)}{0.00001} = \dfrac{0.70711385222 - 0.70710678119}{0.00001} = \ldots$
(b) $\dfrac{\ln(2 + 0.00005) - \ln(2)}{0.00005} = \dfrac{0.69317218025 - 0.69314718056}{0.00005} = \ldots$
5. Sometimes the loss of significance error can be avoided by rearranging terms in the function using a known identity from trigonometry or algebra. Find an equivalent formula for the following functions that avoids a loss of significance.
(a) $\ln(x + 1) - \ln(x)$ for large $x$
(b) $\sqrt{x^2 + 1} - x$ for large $x$
(c) $\cos^2(x) - \sin^2(x)$ for $x \approx \pi/4$
(d) …
6. Polynomial Evaluation. Let $P(x) = x^3 - 3x^2 + 3x - 1$, $Q(x) = ((x - 3)x + 3)x - 1$, and $R(x) = (x - 1)^3$.
(a) Use four-digit rounding arithmetic and compute $P(2.72)$, $Q(2.72)$, and $R(2.72)$. In the computation of $P(x)$, assume that $(2.72)^3 = 20.12$ and $(2.72)^2 = 7.398$.
(b) Use four-digit rounding arithmetic and compute $P(0.975)$, $Q(0.975)$, and $R(0.975)$. In the computation of $P(x)$, assume that $(0.975)^3 = 0.9269$ and $(0.975)^2 = 0.9506$.
7.
Use three-digit rounding arithmetic to compute the following sums (sum in the given order):
(a) … (b) …
8. Discuss the propagation of error for the following:
(a) The sum of three numbers: $p + q + r = (\hat{p} + \epsilon_p) + (\hat{q} + \epsilon_q) + (\hat{r} + \epsilon_r)$.
(b) The quotient of two numbers: $\dfrac{p}{q} = \dfrac{\hat{p} + \epsilon_p}{\hat{q} + \epsilon_q}$.
(c) The product of three numbers: $pqr = (\hat{p} + \epsilon_p)(\hat{q} + \epsilon_q)(\hat{r} + \epsilon_r)$.
9. Given the Taylor polynomial expansions … determine the order of approximation for their sum and product.
10. Given the Taylor polynomial expansions … determine the order of approximation for their sum and product.
11. Given the Taylor polynomial expansions … determine the order of approximation for their sum and product.
12. Improving the Quadratic Formula. Assume that $a \ne 0$ and $b^2 - 4ac > 0$ and consider the equation $ax^2 + bx + c = 0$. The roots can be computed with the quadratic formulas
$$x_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a} \quad \text{and} \quad x_2 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}. \tag{i}$$
The roots can also be computed with the equivalent formulas
$$x_1 = \frac{-2c}{b + \sqrt{b^2 - 4ac}} \quad \text{and} \quad x_2 = \frac{-2c}{b - \sqrt{b^2 - 4ac}}. \tag{ii}$$
… If $b > 0$, then $x_1$ should be computed with formula (ii) and $x_2$ should be computed using (i). However, if $b < 0$, then $x_1$ should be computed using (i) and $x_2$ should be computed using (ii).
13. Use the appropriate formula for $x_1$ and $x_2$ mentioned in Exercise 12 to find the roots of the following quadratic equations.
(a) $x^2 - 1{,}000.001x + 1 = 0$
(b) $x^2 - 10{,}000.0001x + 1 = 0$
(c) $x^2 - 100{,}000.00001x + 1 = 0$
(d) $x^2 - 1{,}000{,}000.000001x + 1 = 0$

Algorithms and Programs
1. Use the results of Exercises 12 and 13 to construct an algorithm and MATLAB program that will accurately compute the roots of a quadratic equation in all situations, including the troublesome ones when $|b| \approx \sqrt{b^2 - 4ac}$.
2. Follow Example 1.25 and generate the first ten numerical approximations for each of the following three difference equations. In each case a small initial error is introduced. If there were no initial error, then each of the difference equations would generate the sequence $\{1/2^n\}$.
Produce output analogous to Tables 1.4 and 1.5 and Figures 1.8, 1.9, and 1.10.
(a) $r_0 = 0.994$ and $r_n = \tfrac{1}{2}r_{n-1}$, for $n = 1, 2, \ldots$
(b) $p_0 = 1$, $p_1 = 0.497$, and $p_n = \tfrac{3}{2}p_{n-1} - \tfrac{1}{2}p_{n-2}$, for $n = 2, 3, \ldots$
(c) $q_0 = 1$, $q_1 = 0.497$, and $q_n = \tfrac{5}{2}q_{n-1} - q_{n-2}$, for $n = 2, 3, \ldots$

2 The Solution of Nonlinear Equations $f(x) = 0$

Consider the physical problem that involves a spherical ball of radius $r$ that is submerged to a depth $d$ in water (see Figure 2.1). Assume that the ball is constructed from a variety of longleaf pine that has a density of $\rho = 0.638$ and that its radius measures $r = 10$ cm. How much of the ball will be submerged when it is placed in water?
The mass $M_w$ of water displaced when a sphere is submerged to a depth $d$ is
$$M_w = \int_0^d \pi\left(r^2 - (x - r)^2\right)dx = \frac{\pi d^2(3r - d)}{3},$$
and the mass of the ball is $M_b = 4\pi r^3\rho/3$. Applying Archimedes' law $M_w = M_b$ produces the following equation that must be solved:
$$\frac{\pi\left(d^3 - 3d^2r + 4r^3\rho\right)}{3} = 0.$$

Figure 2.1 The portion of a sphere of radius $r$ that is to be submerged to a depth $d$.

SEC. 2.1 ITERATION FOR SOLVING x = g(x)

Figure 2.2 The cubic $y = 2552 - 30d^2 + d^3$.

In our case (with $r = 10$ and $\rho = 0.638$) this equation becomes
$$\frac{\pi\left(2552 - 30d^2 + d^3\right)}{3} = 0.$$
The graph of the cubic polynomial $y = 2552 - 30d^2 + d^3$ is shown in Figure 2.2, and from it one can see that the solution lies near the value $d = 12$.
The goal of this chapter is to develop a variety of methods for finding numerical approximations for the roots of an equation. For example, the bisection method could be applied to obtain the three roots $d_1 = -8.17607212$, $d_2 = 11.86150151$, and $d_3 = 26.31457061$. The first root $d_1$ is not a feasible … continued indefinitely, and it is easily shown that $\lim_{n\to\infty} p_n = \infty$. In Chapter 9 we will see that the sequence $\{p_n\}$ is a numerical solution to the differential equation $y' = 0.001y$. The solution is known to be $y(x) = e^{0.001x}$. Indeed, if we compare the 100th term in the sequence with $y(100)$, we see that $p_{100} = 1.105116 \approx 1.105171 = e^{0.1} = y(100)$.
In this section we are concerned with the types of functions $g(x)$ that produce convergent sequences $\{p_k\}$.

Finding Fixed Points
Definition 2.1 (Fixed Point). A fixed point of a function $g(x)$ is a real number $P$ such that $P = g(P)$.
Geometrically, the fixed points of a function $y = g(x)$ are the points of intersection of $y = g(x)$ and $y = x$.

Definition 2.2 (Fixed-point Iteration). The iteration $p_{n+1} = g(p_n)$ for $n = 0, 1, \ldots$ is called fixed-point iteration.

Theorem 2.1. Assume that $g$ is a continuous function and that $\{p_n\}_{n=0}^{\infty}$ is a sequence generated by fixed-point iteration. If $\lim_{n\to\infty} p_n = P$, then $P$ is a fixed point of $g(x)$.
Proof. If $\lim_{n\to\infty} p_n = P$, then $\lim_{n\to\infty} p_{n+1} = P$. It follows from this result, the continuity of $g$, and the relation $p_{n+1} = g(p_n)$ that
$$g(P) = g\left(\lim_{n\to\infty} p_n\right) = \lim_{n\to\infty} g(p_n) = \lim_{n\to\infty} p_{n+1} = P.$$
Therefore, $P$ is a fixed point of $g(x)$.

Example 2.2. Consider the convergent iteration
$$p_0 = 0.5 \quad \text{and} \quad p_{k+1} = e^{-p_k} \quad \text{for } k = 0, 1, \ldots.$$
The first 10 terms are obtained by the calculations
$$p_1 = e^{-0.500000} = 0.606531, \quad p_2 = e^{-0.606531} = 0.545239, \quad p_3 = e^{-0.545239} = 0.579703,$$
$$\vdots$$
$$p_9 = 0.567560, \quad p_{10} = e^{-0.567560} = 0.566907.$$
The sequence is converging, and further calculations reveal that $\lim_{n\to\infty} p_n = 0.567143\ldots$. Thus we have found an approximation for the fixed point of the function $y = e^{-x}$.

The following two theorems establish conditions for the existence of a fixed point and the convergence of the fixed-point iteration process to a fixed point.

Theorem 2.2. Assume that $g \in C[a, b]$.
(3) If the range of the mapping $y = g(x)$ satisfies $y \in [a, b]$ for all $x \in [a, b]$, then $g$ has a fixed point in $[a, b]$.
(4) Furthermore, suppose that $g'(x)$ is defined over $(a, b)$ and that a positive constant $K < 1$ exists with $|g'(x)| \le K < 1$ for all $x \in (a, b)$; then $g$ has a unique fixed point $P$ in $[a, b]$.

CHAP. 2 THE SOLUTION OF NONLINEAR EQUATIONS f(x) = 0

Proof of (3). If $g(a) = a$ or $g(b) = b$, the assertion is true. Otherwise, the values of $g(a)$ and $g(b)$ must satisfy $g(a) \in (a, b]$ and $g(b) \in [a, b)$. The function $f(x) = x - g(x)$ has the property that
$$f(a) = a - g(a) < 0 \quad \text{and} \quad f(b) = b - g(b) > 0.$$
Now apply Theorem 1.2, the Intermediate Value Theorem, to $f(x)$, with the constant $L = 0$, and conclude that there exists a number $P$ with $P \in (a, b)$ so that $f(P) = 0$.
Therefore, $P = g(P)$ and $P$ is the desired fixed point of $g(x)$.
Proof of (4). Now we must show that this solution is unique. By way of contradiction, let us make the additional assumption that there exist two fixed points $P_1$ and $P_2$. Now apply Theorem 1.6, the Mean Value Theorem, and conclude that there exists a number $d \in (a, b)$ so that
$$g'(d) = \frac{g(P_2) - g(P_1)}{P_2 - P_1}. \tag{5}$$
Next, use the facts that $g(P_1) = P_1$ and $g(P_2) = P_2$ to simplify the right side of equation (5) and obtain
$$g'(d) = \frac{P_2 - P_1}{P_2 - P_1} = 1.$$
But this contradicts the hypothesis in (4) that $|g'(x)| < 1$ over $(a, b)$, so it is not possible for two fixed points to exist. Therefore, $g(x)$ has a unique fixed point $P$ in $[a, b]$ under the conditions given in (4).

Example 2.3. Apply Theorem 2.2 to rigorously show that $g(x) = \cos(x)$ has a unique fixed point in $[0, 1]$.
Clearly, $g \in C[0, 1]$. Secondly, $g(x) = \cos(x)$ is a decreasing function on $[0, 1]$, thus its range on $[0, 1]$ is $[\cos(1), 1] \subseteq [0, 1]$. Thus condition (3) of Theorem 2.2 is satisfied and $g$ has a fixed point in $[0, 1]$. Finally, if $x \in (0, 1)$, then $|g'(x)| = |-\sin(x)| = \sin(x) \le \sin(1) < 0.8415 < 1$. Thus $K = \sin(1) < 1$, condition (4) of Theorem 2.2 is satisfied, and $g$ has a unique fixed point in $[0, 1]$.

We can now state a theorem that can be used to determine whether the fixed-point iteration process given in (1) will produce a convergent or divergent sequence.

Theorem 2.3 (Fixed-point Theorem). Assume that (i) $g, g' \in C[a, b]$, (ii) $K$ is a positive constant, (iii) $p_0 \in (a, b)$, and (iv) $g(x) \in [a, b]$ for all $x \in [a, b]$.
(6) If $|g'(x)| \le K < 1$ for all $x \in [a, b]$, then the iteration $p_n = g(p_{n-1})$ will converge to the unique fixed point $P \in [a, b]$. In this case, $P$ is said to be an attractive fixed point.
(7) If $|g'(x)| > 1$ for all $x \in [a, b]$, then the iteration $p_n = g(p_{n-1})$ will not converge to $P$. In this case, $P$ is said to be a repelling fixed point and the iteration exhibits local divergence.

Figure 2.3 The relationship among $P$, $p_0$, $p_1$, $|P - p_0|$, and $|P - p_1|$.

Remark 1. It is assumed that $p_0 \ne P$ in statement (7).
Remark 2.
Because $g$ is continuous on an interval containing $P$, it is permissible to use the simpler criterion $|g'(P)| \le K < 1$ and $|g'(P)| > 1$ in (6) and (7), respectively.

Proof. We first show that the points $\{p_n\}_{n=0}^{\infty}$ all lie in $(a, b)$. Starting with $p_0$, we apply Theorem 1.6, the Mean Value Theorem. There exists a value $c_0 \in (a, b)$ so that
$$|P - p_1| = |g(P) - g(p_0)| = |g'(c_0)(P - p_0)| = |g'(c_0)||P - p_0| \le K|P - p_0| < |P - p_0|. \tag{8}$$
Therefore, $p_1$ is no further from $P$ than $p_0$ was, and it follows that $p_1 \in (a, b)$ (see Figure 2.3). In general, suppose that $p_{n-1} \in (a, b)$; then
$$|P - p_n| = |g(P) - g(p_{n-1})| = |g'(c_{n-1})(P - p_{n-1})| = |g'(c_{n-1})||P - p_{n-1}| \le K|P - p_{n-1}| < |P - p_{n-1}|. \tag{9}$$
Therefore, $p_n \in (a, b)$ and hence, by induction, all the points $\{p_n\}_{n=0}^{\infty}$ lie in $(a, b)$.
To complete the proof of (6), we will show that
$$\lim_{n\to\infty} |P - p_n| = 0. \tag{10}$$
First, a proof by induction will establish the inequality
$$|P - p_n| \le K^n|P - p_0|. \tag{11}$$
The case $n = 1$ follows from the details in relation (8). Using the induction hypothesis $|P - p_{n-1}| \le K^{n-1}|P - p_0|$ and the ideas in (9), we obtain
$$|P - p_n| \le K|P - p_{n-1}| \le KK^{n-1}|P - p_0| = K^n|P - p_0|.$$
Thus, by induction, inequality (11) holds for all $n$. Since $0 < K < 1$, the term $K^n$ goes to zero as $n$ goes to infinity. Hence
$$0 \le \lim_{n\to\infty} |P - p_n| \le \lim_{n\to\infty} K^n|P - p_0| = 0. \tag{12}$$
The limit of $|P - p_n|$ is squeezed between zero on the left and zero on the right, so we can conclude that $\lim_{n\to\infty} |P - p_n| = 0$. Thus $\lim_{n\to\infty} p_n = P$ and, by Theorem 2.1, the iteration $p_n = g(p_{n-1})$ converges to the fixed point $P$. Therefore, statement (6) of Theorem 2.3 is proved. We leave statement (7) for the reader to investigate.

Figure 2.4 (a) Monotone convergence when $0 < g'(P) < 1$.
Figure 2.4 (b) Oscillating convergence when $-1 < g'(P) < 0$.

Corollary 2.1. Assume that $g$ satisfies the hypothesis given in (6) of Theorem 2.3. Bounds for the error involved when using $p_n$ to approximate $P$ are given by
$$|P - p_n| \le K^n|P - p_0| \quad \text{for all } n \ge 1, \tag{13}$$
and
$$|P - p_n| \le \frac{K^n|p_1 - p_0|}{1 - K} \quad \text{for all } n \ge 1. \tag{14}$$
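A minimal sketch for checking Examples 2.2 and 2.3 against Corollary 2.1. It uses plain loops rather than Program 2.1. The starting value $p_0 = 1$ for $g(x) = \cos(x)$, the 25 iterations, and the 200 extra iterations used to get a reference value for $P$ are choices made here; $K = \sin(1)$ comes from Example 2.3.

```matlab
% Example 2.2: p_{k+1} = exp(-p_k), starting from p0 = 0.5
g1 = @(x) exp(-x);
p = 0.5;
for k = 1:10
    p = g1(p);                          % after the loop, p holds p_10
end
fprintf('Example 2.2: p10 = %.6f\n', p);

% Example 2.3: g(x) = cos(x) on [0, 1], with K = sin(1) < 1
g2 = @(x) cos(x);
K  = sin(1);
p0 = 1;                                 % starting value in [0, 1] (chosen here)
nmax = 25;
P = zeros(1, nmax + 1);  P(1) = p0;     % P(n+1) stores p_n
for n = 1:nmax
    P(n+1) = g2(P(n));
end
Pstar = P(end);
for n = 1:200                           % keep iterating to get a reference value for P
    Pstar = g2(Pstar);
end

n = 1:nmax;
actual  = abs(Pstar - P(n+1));                 % |P - p_n|
bound13 = K.^n * abs(Pstar - p0);              % right side of (13)
bound14 = K.^n * abs(P(2) - p0) / (1 - K);     % right side of (14)
disp([n' actual' bound13' bound14'])
```

Both bounds hold, but they are pessimistic here. Near $P \approx 0.739085$ the actual contraction factor is $|g'(P)| = \sin(P) \approx 0.674$, smaller than $K \approx 0.841$.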
Figure 2.5 (a) Monotone divergence when $1 < g'(P)$.
Figure 2.5 (b) Divergent oscillation when $g'(P) < -1$.

Graphical Interpretation of Fixed-point Iteration
Since we seek a fixed point $P$ to $g(x)$, it is necessary that the graph of the curve $y = g(x)$ and the line $y = x$ intersect at the point $(P, P)$. Two simple types of convergent iteration, monotone and oscillating, are illustrated in Figure 2.4(a) and (b), respectively.
To visualize the process, start at $p_0$ on the $x$-axis and move vertically to the point $(p_0, p_1) = (p_0, g(p_0))$ on the curve $y = g(x)$. Then move horizontally from $(p_0, p_1)$ to the point $(p_1, p_1)$ on the line $y = x$. Finally, move vertically downward to $p_1$ on the $x$-axis. The recursion $p_{n+1} = g(p_n)$ is used to construct the point $(p_n, p_{n+1})$ on the graph, then a horizontal motion locates $(p_{n+1}, p_{n+1})$ on the line $y = x$, and then a vertical movement ends up at $p_{n+1}$ on the $x$-axis. The situation is shown in Figure 2.4.
If $|g'(P)| > 1$, then the iteration $p_{n+1} = g(p_n)$ produces a sequence that diverges away from $P$. The two simple types of divergent iteration, monotone and oscillating, are illustrated in Figure 2.5(a) and (b), respectively.

Example 2.4. Consider the iteration $p_{n+1} = g(p_n)$ when the function $g(x) = 1 + x - x^2/4$ is used. The fixed points can be found by solving the equation $x = g(x)$. The two solutions (fixed points of $g$) are $x = -2$ and $x = 2$. The derivative of the function is $g'(x) = 1 - x/2$, and there are only two cases to consider.
Case (i): Start with $p_0 = -2.05$; then get $p_1 = -2.100625$, $p_2 = -2.20378135$, $p_3 = -2.41794441$, …, and $\lim_{n\to\infty} p_n = -\infty$. Since $|g'(x)| \ge 3/2$ on $[-3, -1]$, by Theorem 2.3, the sequence will not converge to $P = -2$.
Case (ii): Start with $p_0 = 1.6$; then get $p_1 = 1.96$, $p_2 = 1.9996$, $p_3 = 1.99999996$, …, and $\lim_{n\to\infty} p_n = 2$. Since $|g'(x)| \le 1/2$ on $[1, 3]$, by Theorem 2.3, the sequence will converge to $P = 2$.

Theorem 2.3 does not state what will happen when $g'(P) = 1$. The next example has been specially constructed so that the sequence $\{p_n\}$ converges whenever $p_0 > P$ and it diverges if we choose $p_0 < P$.
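The two cases of Example 2.4, and the slow $g'(P) = 1$ behavior of Example 2.5 that follows, can be reproduced with a short sketch. Plain loops are used instead of Program 2.1, and the loop lengths are choices made here. Only case (ii) of Example 2.5 is run, because in case (i) the iterate leaves the domain $x \ge 1$, and MATLAB's `sqrt` would then silently return a complex value.

```matlab
% Example 2.4: g(x) = 1 + x - x^2/4 has fixed points P = -2 and P = 2
g = @(x) 1 + x - x.^2/4;
a = -2.05;  b = 1.6;                    % starting values for cases (i) and (ii)
A = zeros(1, 4);  B = A;
for k = 1:4
    a = g(a);  A(k) = a;                % case (i): moves away from P = -2
    b = g(b);  B(k) = b;                % case (ii): approaches P = 2
end
disp([A; B])

% Example 2.5, case (ii): g(x) = 2*sqrt(x - 1), P = 2, and g'(2) = 1
h = @(x) 2*sqrt(x - 1);
p = zeros(1, 1003);  p(1) = 2.5;        % p(k+1) stores p_k
for k = 1:1002
    p(k+1) = h(p(k));
end
step = abs(p(1002) - p(1003));          % |p_1001 - p_1002|
errP = abs(2 - p(1001));                % |P - p_1000|
fprintf('p_1000 = %.8f  step = %.2e  error = %.2e\n', p(1001), step, errP);
```

The last line shows the point made under "Absolute and Relative Error Considerations". The step between consecutive terms is about 1000 times smaller than the true error.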
Example 2.5. Consider the iteration $p_{n+1} = g(p_n)$ when the function $g(x) = 2(x - 1)^{1/2}$ for $x \ge 1$ is used. Only one fixed point $P = 2$ exists. The derivative is $g'(x) = 1/(x - 1)^{1/2}$ and $g'(2) = 1$, so Theorem 2.3 does not apply. There are two cases to consider, depending on whether the starting value lies to the left or right of $P = 2$.
Case (i): Start with $p_0 = 1.5$; then get $p_1 = 1.41421356$, $p_2 = 1.28718851$, $p_3 = 1.07179943$, $p_4 = 0.53590832$, and $p_5 = 2(-0.46409168)^{1/2}$. Since $p_4$ lies outside the domain of $g(x)$, the term $p_5$ cannot be computed.
Case (ii): Start with $p_0 = 2.5$; then get $p_1 = 2.44948974$, $p_2 = 2.40789513$, $p_3 = 2.37309514$, $p_4 = 2.34358284$, …, and $\lim_{n\to\infty} p_n = 2$. This sequence is converging too slowly to the value $P = 2$; indeed, $p_{1000} = 2.00398714$.

Absolute and Relative Error Considerations
In Example 2.5, case (ii), the sequence converges slowly, and after 1000 iterations the three consecutive terms are
$$p_{1000} = 2.00398714, \quad p_{1001} = 2.00398317, \quad \text{and} \quad p_{1002} = 2.00397921.$$
This should not be disturbing; after all, we could compute a few thousand more terms and find a better approximation! But what about a criterion for stopping the iteration? Notice that if we use the difference between consecutive terms,
$$|p_{1001} - p_{1002}| = |2.00398317 - 2.00397921| = 0.00000396.$$
Yet the absolute error in the approximation $p_{1000}$ is known to be
$$|P - p_{1000}| = |2.00000000 - 2.00398714| = 0.00398714.$$
This is about 1000 times larger than $|p_{1001} - p_{1002}|$, and it shows that closeness of consecutive terms does not guarantee that accuracy has been achieved. But it is usually the only criterion available and is often used to terminate an iterative procedure.

Program 2.1 (Fixed-Point Iteration). To approximate a solution to the equation $x = g(x)$ starting with the initial guess $p_0$ and iterating $p_{n+1} = g(p_n)$.

    function [k,p,err,P]=fixpt(g,p0,tol,max1)
    % Input  - g is the iteration function, input as a string 'g'
    %          (a function handle also works with feval)
    %        - p0 is the initial guess for the fixed point
    %        - tol is the tolerance
    %        - max1 is the maximum number of iterations
    % Output - k is the number of iterations that were carried out
    %        - p is the approximation to the fixed point
    %        - err is the error in the approximation
    %        - P contains the sequence {pn}
    P(1)=p0;
    for k=2:max1
       P(k)=feval(g,P(k-1));          % next iterate p_k = g(p_(k-1))
       err=abs(P(k)-P(k-1));          % absolute change between iterates
       relerr=err/(abs(P(k))+eps);    % relative change; eps avoids division by zero
       p=P(k);
       if (err<tol) || (relerr<tol), break; end
    end
    if k==max1
       disp('maximum number of iterations exceeded')
    end
    P=P';

…

6. … $= g(p_1)$. Also, assume that there exists a constant $K$ such that $|g'(x)| < K$. Show that $|p_2 - p_1| < K|p_1 - p_0|$. Hint. Use the Mean Value Theorem.
7. Suppose that $g(x)$ and $g'(x)$ are continuous on $(a, b)$ and that $|g'(x)| > 1$ on this interval. If the fixed point $P$ and the initial approximations $p_0$ and $p_1$ lie in the interval $(a, b)$, then show that $p_1 = g(p_0)$ implies that $|E_1| = |P - p_1| > |P - p_0| = |E_0|$. Hence statement (7) of Theorem 2.3 is established (local divergence).
8. Let $g(x) = -0.0001x^2 + x$ and $p_0 = 1$, and consider fixed-point iteration.
(a) Show that $p_0 > p_1 > \cdots > p_n > p_{n+1} > \cdots$.
(b) Show that $p_n > 0$ for all $n$.
(c) Since the sequence $\{p_n\}$ is decreasing and bounded below, it has a limit. What is the limit?
9. Let $g(x) = 0.5x + 1.5$ and $p_0 = 4$, and consider fixed-point iteration.
(a) Show that the fixed point is $P = 3$.
(b) Show that $|P - p_n| = |P - p_{n-1}|/2$ for $n = 1, 2, 3, \ldots$.
(c) Show that $|P - p_n| = |P - p_0|/2^n$ for $n = 1, 2, 3, \ldots$.
10. Let $g(x) = x/2$, and consider fixed-point iteration.
(a) Find the quantity $|p_{k+1} - p_k|/|p_{k+1}|$.
(b) Discuss what will happen if only the relative error stopping criterion were used in Program 2.1.
11. For fixed-point iteration, discuss why it is an advantage to have $g'(P) \approx 0$.

Algorithms and Programs
1. Use Program 2.1 to approximate the fixed points (if any) of each function. Answers should be accurate to 12 decimal places. Produce a graph of each function and the line $y = x$ that clearly shows any fixed points.
(a) …
(b) $g(x) = \cos(\sin(x))$
(c) $g(x) = x^2 - \sin(x + 0.15)$
(d) …

2.2 Bracketing Methods for Locating a Root
Consider a familiar topic of interest.
Suppose that you save money by making regular monthly deposits $P$ and the annual interest rate is $I$; then the total amount $A$ after $N$ deposits is
$$A = P + P\left(1 + \frac{I}{12}\right) + P\left(1 + \frac{I}{12}\right)^2 + \cdots + P\left(1 + \frac{I}{12}\right)^{N-1}. \tag{1}$$
The first term on the right side of equation (1) is the last payment. Then the next-to-last payment, which has earned one period of interest, contributes $P(1 + I/12)$. The second-from-last payment has earned two periods of interest and contributes $P(1 + I/12)^2$, and so on. Finally, the first payment, which has earned interest for $N - 1$ periods, contributes $P(1 + I/12)^{N-1}$ toward the total. Recall that the formula for the sum of the $N$ terms of a geometric series is
$$1 + r + r^2 + r^3 + \cdots + r^{N-1} = \frac{1 - r^N}{1 - r}. \tag{2}$$
We can write (1) in the form
$$A = P\left(1 + \left(1 + \frac{I}{12}\right) + \left(1 + \frac{I}{12}\right)^2 + \cdots + \left(1 + \frac{I}{12}\right)^{N-1}\right),$$
and use the substitution $r = (1 + I/12)$ in (2) to obtain
$$A = P\,\frac{1 - \left(1 + \frac{I}{12}\right)^N}{1 - \left(1 + \frac{I}{12}\right)}.$$
This can be simplified to obtain the annuity-due equation,
$$A = \frac{P}{I/12}\left(\left(1 + \frac{I}{12}\right)^N - 1\right). \tag{3}$$
The following example uses the annuity-due equation and requires a sequence of repeated calculations to find an answer.

Example 2.6. You save \$250 per month for 20 years and desire that the total value of all payments and interest is \$250,000 at the end of the 20 years. What interest rate $I$ is needed to achieve your goal? If we hold $N = 240$ fixed, then $A$ is a function of $I$ alone; that is, $A = A(I)$. We will start with two guesses, $I_0 = 0.12$ and $I_1 = 0.13$, and perform a sequence of calculations to narrow down the final answer. Starting with $I_0 = 0.12$ yields
$$A(0.12) = \frac{250}{0.12/12}\left(\left(1 + \frac{0.12}{12}\right)^{240} - 1\right) = 247{,}314.$$
Since this value is a little short of the goal, we next try $I_1 = 0.13$:
$$A(0.13) = \frac{250}{0.13/12}\left(\left(1 + \frac{0.13}{12}\right)^{240} - 1\right) = 283{,}311.$$
This is a little high, so we try the value in the middle, $I_2 = 0.125$:
$$A(0.125) = \frac{250}{0.125/12}\left(\left(1 + \frac{0.125}{12}\right)^{240} - 1\right) = 264{,}623.$$
This is again high and we conclude that the desired rate lies in the interval $[0.12, 0.125]$. The next guess is the midpoint $I_3 = 0.1225$:
$$A(0.1225) = \frac{250}{0.1225/12}\left(\left(1 + \frac{0.1225}{12}\right)^{240} - 1\right) = 255{,}803.$$
This is high and the interval is now narrowed to $[0.12, 0.1225]$. Our last calculation uses the midpoint approximation $I_4 = 0.12125$:
$$A(0.12125) = \frac{250}{0.12125/12}\left(\left(1 + \frac{0.12125}{12}\right)^{240} - 1\right) = 251{,}518.$$
Further iterations can be done to obtain as many significant digits as required. The purpose of this example was to find the value of $I$ that produced a specified level $L$ of the function value, that is, to find a solution to $A(I) = L$. It is standard practice to place the constant $L$ on the left and solve the equation $A(I) - L = 0$.

Figure 2.6 The decision process for the bisection process. (a) If $f(a)$ and $f(c)$ have opposite signs, then squeeze from the right. (b) If $f(c)$ and $f(b)$ have opposite signs, then squeeze from the left.

Definition 2.3 (Root of an Equation, Zero of a Function). Assume that $f(x)$ is a continuous function. Any number $r$ for which $f(r) = 0$ is called a root of the equation $f(x) = 0$. Also, we say $r$ is a zero of the function $f(x)$.
For example, the equation $2x^2 + 5x - 3 = 0$ has two real roots $r_1 = 0.5$ and $r_2 = -3$, whereas the corresponding function $f(x) = 2x^2 + 5x - 3 = (2x - 1)(x + 3)$ has two real zeros, $r_1 = 0.5$ and $r_2 = -3$.

The Bisection Method of Bolzano
In this section we develop our first bracketing method for finding a zero of a continuous function. We must start with an initial interval $[a, b]$, where $f(a)$ and $f(b)$ have opposite signs. Since the graph $y = f(x)$ of a continuous function is unbroken, it will cross the $x$-axis at a zero $x = r$ that lies somewhere in the interval (see Figure 2.6). The bisection method systematically moves the end points of the interval closer and closer together until we obtain an interval of arbitrarily small width that brackets the zero. The decision step for this process of interval halving is first to choose the midpoint $c = (a + b)/2$ and then to analyze the three possibilities that might arise:
(4) If $f(a)$ and $f(c)$ have opposite signs, a zero lies in $[a, c]$.
(5) If $f(c)$ and $f(b)$ have opposite signs, a zero lies in $[c, b]$.
(6) If $f(c) = 0$, then the zero is $c$.
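The hand calculation in Example 2.6 is the decision process (4)-(6) applied to $f(I) = A(I) - L$. The following sketch automates it. It assumes the annuity-due equation (3) with $P = 250$, $N = 240$, and $L = 250{,}000$ from the example. The starting bracket $[0.12, 0.13]$ is the one found above, and the fixed count of 40 halvings is an arbitrary choice.

```matlab
% Annuity-due equation (3): A(I) = P/(I/12) * ((1 + I/12)^N - 1)
P = 250;  N = 240;  L = 250000;
A = @(I) P ./ (I/12) .* ((1 + I/12).^N - 1);
f = @(I) A(I) - L;                      % solve A(I) - L = 0

a = 0.12;  b = 0.13;                    % f(a) < 0 < f(b), as found in Example 2.6
for k = 1:40
    c = (a + b)/2;                      % midpoint
    if f(c) == 0                        % case (6): c is the zero
        a = c;  b = c;
        break
    elseif sign(f(a)) ~= sign(f(c))     % case (4): a zero lies in [a, c]
        b = c;
    else                                % case (5): a zero lies in [c, b]
        a = c;
    end
end
I = (a + b)/2;
fprintf('A(0.12) = %.2f, A(0.13) = %.2f, I = %.8f\n', A(0.12), A(0.13), I);
```

After 40 halvings, the bracket width is $0.01/2^{40}$, far smaller than any interest rate would ever be quoted to.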
If either case (4) or (5) occurs, we have found an interval half as wide as the original interval that contains the root, and we are "squeezing down on it" (see Figure 2.6). To continue the process, relabel the new smaller interval \([a, b]\) and repeat the process until the interval is as small as desired. Since the bisection process involves sequences of nested intervals and their midpoints, we will use the following notation to keep track of the details in the process:

(7)
\([a_0, b_0]\) is the starting interval and \(c_0 = (a_0 + b_0)/2\) is the midpoint.
\([a_1, b_1]\) is the second interval, which brackets the zero \(r\), and \(c_1\) is its midpoint; the interval \([a_1, b_1]\) is half as wide as \([a_0, b_0]\).
After arriving at the \(n\)th interval \([a_n, b_n]\), which brackets \(r\) and has midpoint \(c_n\), the interval \([a_{n+1}, b_{n+1}]\) is constructed, which also brackets \(r\) and is half as wide as \([a_n, b_n]\).

It is left as an exercise for the reader to show that the sequence of left end points is increasing and the sequence of right end points is decreasing; that is,

\[a_0 \le a_1 \le \cdots \le a_n \le \cdots \le r \le \cdots \le b_n \le \cdots \le b_1 \le b_0, \tag{8}\]

where \(c_n = (a_n + b_n)/2\), and if \(f(a_{n+1})f(b_{n+1}) < 0\), then

\[[a_{n+1}, b_{n+1}] = [a_n, c_n] \quad\text{or}\quad [a_{n+1}, b_{n+1}] = [c_n, b_n] \quad\text{for all } n. \tag{9}\]

Theorem 2.4 (Bisection Theorem). Assume that \(f \in C[a, b]\) and that there exists a number \(r \in [a, b]\) such that \(f(r) = 0\). If \(f(a)\) and \(f(b)\) have opposite signs, and \(\{c_n\}_{n=0}^{\infty}\) represents the sequence of midpoints generated by the bisection process of (8) and (9), then

\[|r - c_n| \le \frac{b - a}{2^{n+1}} \quad\text{for } n = 0, 1, \ldots, \tag{10}\]

and therefore the sequence \(\{c_n\}_{n=0}^{\infty}\) converges to the zero \(x = r\); that is,

\[\lim_{n\to\infty} c_n = r. \tag{11}\]

Proof. Since both the zero \(r\) and the midpoint \(c_n\) lie in the interval \([a_n, b_n]\), the distance between \(c_n\) and \(r\) cannot be greater than half the width of this interval (see Figure 2.7). Thus

\[|r - c_n| \le \frac{b_n - a_n}{2} \quad\text{for all } n. \tag{12}\]

Figure 2.7 The root \(r\) and midpoint \(c_n\) of \([a_n, b_n]\) for the bisection method.

Observe that the successive interval widths form the pattern

\[b_1 - a_1 = \frac{b_0 - a_0}{2^1}, \qquad b_2 - a_2 = \frac{b_1 - a_1}{2} = \frac{b_0 - a_0}{2^2}.\]

It is left
as an exercise for the reader to use mathematical induction and show that

\[b_n - a_n = \frac{b_0 - a_0}{2^n}. \tag{13}\]

Combining (12) and (13) results in

\[|r - c_n| \le \frac{b_0 - a_0}{2^{n+1}} \quad\text{for all } n. \tag{14}\]

Now an argument similar to the one given in Theorem 2.3 can be used to show that (14) implies that the sequence \(\{c_n\}_{n=0}^{\infty}\) converges to \(r\), and the proof of the theorem is complete.

Example 2.7. The function \(h(x) = x\sin(x)\) occurs in the study of undamped forced oscillations. Find the value of \(x\) that lies in the interval \([0, 2]\), where the function takes on the value \(h(x) = 1\) (the function \(\sin(x)\) is evaluated in radians).

We use the bisection method to find a zero of the function \(f(x) = x\sin(x) - 1\). Starting with \(a_0 = 0\) and \(b_0 = 2\), we compute

\[f(0) = -1.000000 \quad\text{and}\quad f(2) = 0.818595,\]

so a root of \(f(x) = 0\) lies in the interval \([0, 2]\). At the midpoint \(c_0 = 1\), we find that \(f(1) = -0.158529\). Hence the function changes sign on \([c_0, b_0] = [1, 2]\).

To continue, we squeeze from the left and set \(a_1 = c_0\) and \(b_1 = b_0\). The midpoint is \(c_1 = 1.5\) and \(f(c_1) = 0.496242\). Now, \(f(1) = -0.158529\) and \(f(1.5) = 0.496242\) imply that the root lies in the interval \([a_1, c_1] = [1.0, 1.5]\). The next decision is to squeeze from the right and set \(a_2 = a_1\) and \(b_2 = c_1\). In this manner we obtain a sequence \(\{c_k\}\) that converges to \(r \approx 1.114157141\). A sample calculation is given in Table 2.1.

Table 2.1 Bisection Method Solution of \(x\sin(x) - 1 = 0\)

 k   Left end point a_k   Midpoint c_k   Right end point b_k   Function value f(c_k)
 0   0                    1.             2.                    -0.158529
 1   1.0                  1.5            2.0                    0.496242
 2   1.00                 1.25           1.50                   0.186231
 3   1.000                1.125          1.250                  0.015051
 4   1.0000               1.0625         1.1250                -0.071827
 5   1.06250              1.09375        1.12500               -0.028362
 6   1.093750             1.109375       1.125000              -0.006643
 7   1.1093750            1.1171875      1.1250000              0.004208
 8   1.10937500           1.11328125     1.11718750            -0.001216

A virtue of the bisection method is that formula (10) provides a predetermined estimate for the accuracy of the computed solution. In Example 2.7 the width of the starting interval was \(b_0 - a_0 = 2\).
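Table 2.1 can be regenerated with a few lines of code, which also lets one confirm that every midpoint satisfies the bound (10). The Python sketch below is an illustration rather than a transcription of Program 2.2; the helper name `bisection_rows` is hypothetical, and it assumes the starting interval already brackets a sign change.

```python
import math


def bisection_rows(f, a, b, n_rows):
    """Return rows (k, a_k, c_k, b_k, f(c_k)) generated by the decision steps (4)-(6)."""
    if f(a) * f(b) > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    rows = []
    for k in range(n_rows):
        c = (a + b) / 2.0
        fc = f(c)
        rows.append((k, a, c, b, fc))
        if fc == 0:          # case (6): c is the zero
            break
        if f(a) * fc < 0:    # case (4): zero in [a, c], squeeze from the right
            b = c
        else:                # case (5): zero in [c, b], squeeze from the left
            a = c
    return rows


def f(x):
    return x * math.sin(x) - 1.0


if __name__ == "__main__":
    for k, a, c, b, fc in bisection_rows(f, 0.0, 2.0, 9):
        bound = 2.0 / 2 ** (k + 1)      # right-hand side of (10) with b0 - a0 = 2
        print(f"{k:2d}  {a:.8f}  {c:.8f}  {b:.8f}  {fc: .6f}  bound={bound:.2e}")
```

Each printed bound shrinks by half per row, while the actual error \(|r - c_k|\) can be much smaller (row 8 is already within about 0.001 of \(r\)).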
Suppose that Table 2.1 were continued to the thirty-first iterate; then, by (10), the error bound would be \(|r - c_{31}| \le (b_0 - a_0)/2^{32} = 2/2^{32} \approx 4.656613 \times 10^{-10}\).

…

    if ya*yb > 0, error('f(a) and f(b) must have opposite signs'), end
    max1=1+round((log(b-a)-log(delta))/log(2));   % number of halvings needed
    for k=1:max1
       c=(a+b)/2;
       yc=feval(f,c);
       if yc==0             % case (6): c is the zero
          a=c;
          b=c;
       elseif yb*yc>0       % zero lies in [a,c]: squeeze from the right
          b=c;
          yb=yc;
       else                 % zero lies in [c,b]: squeeze from the left
          a=c;
          ya=yc;
       end
       if b-a < delta, break, end
    end
    c=(a+b)/2;
    err=abs(b-a);
    yc=feval(f,c);

Program 2.3 (False Position or Regula Falsi Method). To approximate a root of the equation \(f(x) = 0\) in the interval \([a, b]\). Proceed with the method only if \(f(x)\) is continuous and \(f(a)\) and \(f(b)\) have opposite signs.

    function [c,err,yc]=regula(f,a,b,delta,epsilon,max1)
    %Input  - f is the function input as a string 'f'
    %       - a and b are the left and right end points
    %       - delta is the tolerance for the zero
    %       - epsilon is the tolerance for the value of f at the zero
    %       - max1 is the maximum number of iterations
    %Output - c is the zero
    %       - yc=f(c)
    %       - err is the error estimate for c
    ya=feval(f,a);
    yb=feval(f,b);
    if ya*yb>0
       error('Note: f(a)*f(b)>0')
    end
    for k=1:max1
       dx=yb*(b-a)/(yb-ya);   % c is where the secant through (a,ya),(b,yb) meets the x-axis
       c=b-dx;
       ac=c-a;
       yc=feval(f,c);
       if yc==0, break;
       elseif yb*yc>0         % zero lies in [a,c]
          b=c;
          yb=yc;
       else                   % zero lies in [c,b]
          a=c;
          ya=yc;
       end
       dx=min(abs(dx),ac);
       if abs(dx)<delta, break, end
       if abs(yc)<epsilon, break, end
    end
    err=abs(b-a)/2;
    yc=feval(f,c);

…3, then \(f(a_0)f(b_0) < 0\). Thus, on the interval \([a_0, b_0]\) the bisection method will converge to one of the three zeros. If \(a_0 < 1\) and \(b_0 > 3\) are selected such that \(c_n = (a_n + b_n)/2\) is not equal to 1, 2, or 3 for any \(n \ge 1\), then the bisection method will never converge to which zero(s)? Why?

15. If a polynomial \(f(x)\) has an odd number of real zeros in the interval \([a_0, b_0]\), and each of the zeros is of odd multiplicity, then \(f(a_0)f(b_0) < 0\), and the bisection method will converge to one of the zeros. If \(a_0\) and \(b_0\) are selected such that \(c_n = (a_n + b_n)/2\) is not equal to any of the zeros of \(f(x)\) for any \(n \ge 1\), then the bisection method will never converge to which zero(s)? Why?

Algorithms and Programs

1. Find an approximation (accurate to 10 decimal places) for the interest rate \(I\) that will yield a total annuity value of $500,000 if 240 monthly payments of $300 are made.
2.
Consider a spherical ball of radius \(r = 15\) cm that is constructed from a variety of white oak that has a density of \(\rho = 0.710\). How much of the ball (accurate to 8 decimal places) will be submerged when it is placed in water?
3. Modify Programs 2.2 and 2.3 to output a matrix analogous to Tables 2.1 and 2.2, respectively (i.e., the first row of the matrix would be \([0\ a_0\ c_0\ b_0\ f(c_0)]\)).
4. Use your programs from Problem 3 to approximate the three smallest positive roots of \(x = \tan(x)\) (accurate to 8 decimal places).
5. A unit sphere is cut into two segments by a plane. One segment has three times the volume of the other. Determine the distance \(x\) of the plane from the center of the sphere (accurate to 10 decimal places).

Initial Approximation and Convergence Criteria

The bracketing methods depend on finding an interval \([a, b]\) so that \(f(a)\) and \(f(b)\) have opposite signs. Once the interval has been found, no matter how large, the iterations will proceed until a root is found. Hence these methods are called globally convergent. However, if \(f(x) = 0\) has several roots in \([a, b]\), then a different starting interval must be used to find each root. It is not easy to locate these smaller intervals on which \(f(x)\) changes sign.

In Section 2.4 we develop the Newton-Raphson method and the secant method for solving \(f(x) = 0\). Both of these methods require that a close approximation to the root be given to guarantee convergence. Hence these methods are called locally convergent. They usually converge more rapidly than do global ones. Some hybrid algorithms start with a globally convergent method and switch to a locally convergent method when the iteration gets close to a root.

If the computation of roots is one part of a larger project, then a leisurely pace is suggested and the first thing to do is graph the function. We can view the graph \(y = f(x)\) and make decisions based on what it looks like (concavity, slope, oscillatory behavior, local extrema, inflection points, etc.).
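The hybrid idea mentioned above can be sketched in a few lines: bisection supplies a safe bracket, and Newton's method (developed in Section 2.4) finishes the job. This Python sketch is only one possible design; the name `hybrid_root`, the switching width, and the rule of stopping the Newton phase when an iterate leaves the bracket are all illustrative assumptions, not an algorithm from the text.

```python
import math


def hybrid_root(f, df, a, b, switch_width=1e-2, tol=1e-12, max_newton=20):
    """Bisection until [a, b] is narrower than switch_width, then Newton's method.

    If a Newton step would leave the current bracket, the iterate reached so far
    is returned; a fuller design would fall back to more bisection steps.
    """
    fa = f(a)
    if fa * f(b) > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > switch_width:            # globally convergent phase
        c = (a + b) / 2
        fc = f(c)
        if fa * fc <= 0:                   # zero in [a, c]
            b = c
        else:                              # zero in [c, b]
            a, fa = c, fc
    p = (a + b) / 2
    for _ in range(max_newton):            # locally convergent phase
        slope = df(p)
        if slope == 0:
            break
        p_next = p - f(p) / slope
        if not a <= p_next <= b:           # Newton left the bracket
            break
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    return p


if __name__ == "__main__":
    root = hybrid_root(lambda x: x * math.sin(x) - 1,
                       lambda x: math.sin(x) + x * math.cos(x), 0.0, 2.0)
    print(root)
```

On the function of Example 2.7, eight halvings reach a bracket narrower than 0.01, and a handful of Newton steps then give the root to full double precision.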
But more important, if the coordinates of points on the graph are available, they can be analyzed and the approximate location of roots determined. These approximations can then be used as starting values in our root-finding algorithms.

We must proceed carefully. Computer software packages use graphics software of varying sophistication. Suppose that a computer is used to graph \(y = f(x)\) on \([a, b]\). Typically, the interval is partitioned into \(N + 1\) equally spaced points \(a = x_0 < x_1 < \cdots < x_N = b\), and the function values \(y_k = f(x_k)\) are computed. Then either a line segment or a "fitted curve" is plotted between consecutive points \((x_{k-1}, y_{k-1})\) and \((x_k, y_k)\) for \(k = 1, 2, \ldots, N\). There must be enough points so that we do not miss a root in a portion of the curve where the function is changing rapidly. If \(f(x)\) is continuous and two adjacent points \((x_{k-1}, y_{k-1})\) and \((x_k, y_k)\) lie on opposite sides of the \(x\)-axis, then the Intermediate Value Theorem implies that at least one root lies in the interval \([x_{k-1}, x_k]\). But if there is a root, or even several closely spaced roots, in the interval \([x_{k-1}, x_k]\) and the two adjacent points \((x_{k-1}, y_{k-1})\) and \((x_k, y_k)\) lie on the same side of the \(x\)-axis, then the computer-generated graph would not indicate a situation where the Intermediate Value Theorem is applicable. The graph produced by the computer will not be a true representation of the actual graph of the function \(f\). It is not unusual for functions to have "closely" spaced roots, that is, roots where the graph touches but does not cross the \(x\)-axis, or roots "close" to a vertical asymptote. Such characteristics of a function need to be considered when applying any numerical root-finding algorithm.

Finally, near two closely spaced roots or near a double root, the computer-generated curve between \((x_{k-1}, y_{k-1})\) and \((x_k, y_k)\) may fail to cross or touch the \(x\)-axis. If \(|f(x_k)|\) is smaller than a preassigned value \(\epsilon\) (i.e., \(f(x_k) \approx 0\)), then \(x_k\) is a tentative approximate root.
But the graph may be close to zero over a wide range of values near \(x_k\), and thus \(x_k\) may not be close to an actual root. Hence we add the requirement that the slope change sign near \((x_k, y_k)\); that is,

\[m_{k-1} = \frac{y_k - y_{k-1}}{x_k - x_{k-1}} \quad\text{and}\quad m_k = \frac{y_{k+1} - y_k}{x_{k+1} - x_k}\]

must have opposite signs. Since \(x_k - x_{k-1} > 0\) and \(x_{k+1} - x_k > 0\), it is not necessary to use the difference quotients, and it will suffice to check whether the differences \(y_k - y_{k-1}\) and \(y_{k+1} - y_k\) change sign. In this case, \(x_k\) is the approximate root. Unfortunately, we cannot guarantee that this starting value will produce a convergent sequence. If the graph of \(y = f(x)\) has a local minimum (or maximum) that is extremely close to zero, then it is possible that \(x_k\) will be reported as an approximate root when \(f(x_k) \approx 0\), although \(x_k\) may not be close to a root.

Table 2.3 Finding Approximate Locations for Roots

 x_k    y_{k-1}   y_k      y_k - y_{k-1}   y_{k+1} - y_k   Significant changes in f(x) or f'(x)
-0.9   -0.968     0.361    1.329           0.663           f changes sign in [x_{k-1}, x_k]
-0.6    0.361     1.024    0.663           0.159
-0.3    1.024     1.183    0.159          -0.183           f' changes sign near x_k
 0.0    1.183     1.000   -0.183          -0.363
 0.3    1.000     0.637   -0.363          -0.381
 0.6    0.637     0.256   -0.381          -0.237
 0.9    0.256     0.019   -0.237           0.069           f' changes sign near x_k
 1.2    0.019     0.088    0.069           0.537

Figure 2.10 The graph of the cubic polynomial \(y = x^3 - x^2 - x + 1\).

Figure 2.11 (a) The horizontal convergence band for locating a solution to \(f(x) = 0\).

Example 2.9. Find the approximate location of the roots of \(x^3 - x^2 - x + 1 = 0\) on the interval \([-1.2, 1.2]\). For illustration, choose \(N = 8\) and look at Table 2.3.

The three abscissas for consideration are \(-1.05\), \(-0.3\), and \(0.9\). Because \(f(x)\) changes sign on the interval \([-1.2, -0.9]\), the value \(-1.05\) is an approximate root; indeed, \(f(-1.05) = -0.210\).

Although the slope changes sign near \(-0.3\), we find that \(f(-0.3) = 1.183\); hence \(-0.3\) is not near a root. Finally, the slope changes sign near \(0.9\) and \(f(0.9) = 0.019\), so \(0.9\) is an approximate root (see Figure 2.10).
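The screening behind Table 2.3 is easy to automate. The Python sketch below flags sign changes of \(f\) between neighboring samples and sign changes of the differences at a sample, labeling a slope change as an approximate root only when \(|f(x_k)|\) is small. The name `screen_points` is hypothetical, and the threshold `eps = 0.05` is an assumption, since Example 2.9 does not state a numerical value for \(\epsilon\).

```python
def screen_points(f, xs, eps):
    """Return (x, label) pairs for the sample points xs, following Table 2.3.

    A sign change of f between x_{k-1} and x_k reports the midpoint; a sign
    change of the differences at x_k reports x_k, labeled by the size of |f(x_k)|.
    """
    ys = [f(x) for x in xs]
    found = []
    for k in range(1, len(xs)):
        if ys[k - 1] * ys[k] < 0:
            found.append(((xs[k - 1] + xs[k]) / 2, "f changes sign"))
        if k < len(xs) - 1 and (ys[k] - ys[k - 1]) * (ys[k + 1] - ys[k]) < 0:
            label = "approximate root" if abs(ys[k]) < eps else "slope change only"
            found.append((xs[k], label))
    return found


def cubic(x):
    return x**3 - x**2 - x + 1


if __name__ == "__main__":
    xs = [-1.2 + 0.3 * k for k in range(9)]      # N = 8 subintervals of [-1.2, 1.2]
    for x, label in screen_points(cubic, xs, eps=0.05):
        print(f"{x: .2f}  {label}")
```

The three reported abscissas match Example 2.9: \(-1.05\) from a sign change, \(-0.3\) as a slope change far from zero, and \(0.9\) as an approximate (double) root.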
Figure 2.11 (b) The vertical convergence band for locating a solution to \(f(x) = 0\).

Checking for Convergence

A graph can be used to see the approximate location of a root, but an algorithm must be used to compute a value \(p_n\) that is an acceptable computer solution. Iteration is often used to produce a sequence \(\{p_k\}\) that converges to a root \(p\), and a termination criterion or strategy must be designed ahead of time so that the computer will stop when an accurate approximation is reached. Since the goal is to solve \(f(x) = 0\), the final value \(p_n\) should have the property that \(|f(p_n)| < \epsilon\).

The user can supply a tolerance value \(\epsilon\) for the size of \(|f(p_n)|\), and then an iterative process produces points \(P_k = (p_k, f(p_k))\) until the last point \(P_n\) lies in the horizontal band bounded by the lines \(y = +\epsilon\) and \(y = -\epsilon\), as shown in Figure 2.11(a). This criterion is useful if the user is trying to solve \(h(x) = L\) by applying a root-finding algorithm to the function \(f(x) = h(x) - L\).

Another termination criterion involves the abscissas, and we can try to determine whether the sequence \(\{p_k\}\) is converging. If we draw the vertical lines \(x = p + \delta\) and \(x = p - \delta\) on each side of \(x = p\), we could decide to stop the iteration when the point \(P_n\) lies between these two vertical lines, as shown in Figure 2.11(b).

The latter criterion is often desired, but it is difficult to implement because it involves the unknown solution \(p\). We adapt this idea and terminate further calculations when the consecutive iterates \(p_{n-1}\) and \(p_n\) are sufficiently close or if they agree within \(M\) significant digits.

Sometimes the user of an algorithm will be satisfied if \(p_n \approx p_{n-1}\) and other times when \(f(p_n) \approx 0\). Correct logical reasoning is required to understand the consequences. If we require that \(|p_n - p| < \delta\) and \(|f(p_n)| < \epsilon\), the point \(P_n\) will be located in the rectangular region about the solution \((p, 0)\), as shown in Figure 2.12(a). If we stipulate that \(|p_n - p| < \delta\) or \(|f(p_n)| < \epsilon\), the point \(P_n\) could be located
anywhere in the region formed by the union of the horizontal and vertical stripes, as shown in Figure 2.12(b). The sizes of the tolerances \(\delta\) and \(\epsilon\) are crucial. If the tolerances are chosen too small, iteration may continue forever. They should be chosen about 100 times larger than \(10^{-M}\), where \(M\) is the number of decimal digits in the computer's floating-point numbers. The closeness of the abscissas is checked with one of the criteria

\[|p_n - p_{n-1}| < \delta \quad\text{(estimate for the absolute error)}\]

or

\[\frac{2|p_n - p_{n-1}|}{|p_n| + |p_{n-1}|} < \delta \quad\text{(estimate for the relative error)}.\]

The closeness of the ordinate is usually checked by \(|f(p_n)| < \epsilon\).

Troublesome Functions

A computer solution to \(f(x) = 0\) will almost always be in error due to roundoff and/or instability in the calculations. If the graph \(y = f(x)\) is steep near the root \((p, 0)\), then the root-finding problem is well conditioned (i.e., a solution with several significant digits is easy to obtain). If the graph \(y = f(x)\) is shallow near \((p, 0)\), then the root-finding problem is ill conditioned (i.e., the computed root may have only a few significant digits). This occurs when \(f(x)\) has a multiple root at \(p\). This is discussed further in the next section.

Figure 2.12 (a) The rectangular region defined by \(|x - p| < \delta\) AND \(|y| < \epsilon\). (b) The unbounded region defined by \(|x - p| < \delta\) OR \(|y| < \epsilon\).

Program 2.4 (Approximate Location of Roots). To roughly estimate the locations of the roots of the equation \(f(x) = 0\) over the interval \([a, b]\), by using the equally spaced sample points \((x_k, f(x_k))\) and the following criteria:

(i) \((y_{k-1})(y_k) < 0\), or
(ii) \(|y_k| < \epsilon\) and \((y_k - y_{k-1})(y_{k+1} - y_k) < 0\).
That is, either \(f(x_{k-1})\) and \(f(x_k)\) have opposite signs, or \(|f(x_k)|\) is small and the slope of the curve \(y = f(x)\) changes sign near \((x_k, f(x_k))\).

    function R = approot(X,epsilon)
    % Input  - f is the object function saved as an M-file named f.m
    %        - X is the vector of abscissas
    %        - epsilon is the tolerance
    % Output - R is the vector of approximate roots
    Y=f(X);
    yrange=max(Y)-min(Y);
    epsilon2=yrange*epsilon;        % tolerance scaled to the range of f
    n=length(X);
    m=0;
    R=[];
    X(n+1)=X(n);                    % pad so that Y(k+1) exists when k = n
    Y(n+1)=Y(n);
    for k=2:n
       if Y(k-1)*Y(k)<=0            % criterion (i): f changes sign
          m=m+1;
          R(m)=(X(k-1)+X(k))/2;
       end
       s=(Y(k)-Y(k-1))*(Y(k+1)-Y(k));
       if (abs(Y(k))<epsilon2) && (s<=0)   % criterion (ii): small |f| and slope sign change
          m=m+1;
          R(m)=X(k);
       end
    end

Example 2.10. Use approot to find approximate locations for the roots of \(f(x) = \sin(\cos(x^3))\) in the interval \([-2, 2]\). First save \(f\) as an M-file named f.m. Since the results will be used as initial approximations for a root-finding algorithm, we will construct X so that the approximations will be accurate to 4 decimal places.

    >>X=-2:.001:2;
    >>approot(X,0.00001)
    ans =
      -1.9875  -1.6765  -1.1625   1.1625   1.6765   1.9875

Comparing the results with the graph of \(f\), we now have good initial approximations for one of our root-finding algorithms.

Exercises for Initial Approximation

In Exercises 1 through 6 use a computer or graphics calculator to graphically determine the approximate location of the roots of \(f(x) = 0\) in the given interval. In each case, determine an interval \([a, b]\) over which Programs 2.2 and 2.3 could be used to determine the roots (i.e., \(f(a)f(b) < 0\)).

1. \(f(x) = \ldots\)
2. \(f(x) = x - \cos(x)\) for \(-2 \le x \le 2\)

…extreme values of the functions in Problems 1 and 2. Compare your approximations with the actual values.

Newton-Raphson and Secant Methods

Slope Methods for Finding Roots

If \(f(x)\), \(f'(x)\), and \(f''(x)\) are continuous near a root \(p\), then this extra information regarding the nature of \(f(x)\) can be used to develop algorithms that will produce sequences \(\{p_k\}\) that converge faster to \(p\) than either the bisection or false position method.
The Newton-Raphson (or simply Newton's) method is one of the most useful and best known algorithms that relies on the continuity of \(f'(x)\) and \(f''(x)\). We shall introduce it graphically and then give a more rigorous treatment based on the Taylor polynomial.

Assume that the initial approximation \(p_0\) is near the root \(p\). Then the graph of \(y = f(x)\) intersects the \(x\)-axis at the point \((p, 0)\), and the point \((p_0, f(p_0))\) lies on the curve near the point \((p, 0)\) (see Figure 2.13). Define \(p_1\) to be the point of intersection of the \(x\)-axis and the line tangent to the curve at the point \((p_0, f(p_0))\). Then Figure 2.13 shows that \(p_1\) will be closer to \(p\) than \(p_0\) in this case. An equation relating \(p_1\) and \(p_0\) can be found if we write down two versions for the slope of the tangent line \(L\):

\[m = \frac{0 - f(p_0)}{p_1 - p_0}, \tag{1}\]

which is the slope of the line through \((p_1, 0)\) and \((p_0, f(p_0))\), and

\[m = f'(p_0), \tag{2}\]

which is the slope at the point \((p_0, f(p_0))\). Equating the values of the slope \(m\) in equations (1) and (2) and solving for \(p_1\) results in

\[p_1 = p_0 - \frac{f(p_0)}{f'(p_0)}. \tag{3}\]

Figure 2.13 The geometric construction of \(p_1\) and \(p_2\) for the Newton-Raphson method.

The process above can be repeated to obtain a sequence \(\{p_k\}\) that converges to \(p\). We now make these ideas more precise.

Theorem 2.5 (Newton-Raphson Theorem). Assume that \(f \in C^2[a, b]\) and there exists a number \(p \in [a, b]\), where \(f(p) = 0\). If \(f'(p) \ne 0\), then there exists a \(\delta > 0\) such that the sequence \(\{p_k\}_{k=0}^{\infty}\) defined by the iteration

\[p_k = g(p_{k-1}) = p_{k-1} - \frac{f(p_{k-1})}{f'(p_{k-1})} \quad\text{for } k = 1, 2, \ldots \tag{4}\]

will converge to \(p\) for any initial approximation \(p_0 \in [p - \delta, p + \delta]\).

…

When \(p_{k-1}\) and \(p_k\) are used in place of \(p_0\) and \(p_1\) in equation (9), the general rule (4) is established. For most applications this is all that needs to be understood. However, to fully comprehend what is happening, we need to consider the fixed-point iteration function and apply Theorem 2.2 in our situation. The key is in the analysis of \(g'(x)\):

\[g'(x) = 1 - \frac{f'(x)f'(x) - f(x)f''(x)}{(f'(x))^2} = \frac{f(x)f''(x)}{(f'(x))^2}.\]

By hypothesis, \(f(p) = 0\); thus \(g'(p) = 0\). Since \(g'(p) = 0\) and \(g'(x)\) is continuous, it is possible to find a \(\delta > 0\) so that the hypothesis \(|g'(x)| < 1\) of Theorem 2.2 is satisfied on \((p - \delta, p + \delta)\).
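The iteration (4) and the derivative \(g'(x) = f(x)f''(x)/(f'(x))^2\) are straightforward to code. The following Python sketch is not Program 2.5 (which is not reproduced here). Its stopping rule is an assumption that combines, with OR, the relative-change test \(2|p_k - p_{k-1}|/(|p_k| + \delta) < \delta\) and the ordinate test \(|f(p_k)| < \epsilon\). It raises an error instead of dividing by zero when \(f'(p_{k-1}) = 0\), and it reports failure rather than claiming a root when the iteration budget runs out. The names `newton` and `g_prime` are hypothetical.

```python
def newton(f, df, p0, delta=1e-10, epsilon=1e-10, max1=50):
    """Newton-Raphson iteration (4): p_k = p_{k-1} - f(p_{k-1}) / f'(p_{k-1}).

    Stops when 2|p_k - p_{k-1}| / (|p_k| + delta) < delta or |f(p_k)| < epsilon.
    Returns (p, k, converged), where k is the number of iterations performed.
    """
    p = p0
    for k in range(1, max1 + 1):
        slope = df(p)
        if slope == 0:
            raise ZeroDivisionError(f"f'(p) = 0 at p = {p}")
        p_new = p - f(p) / slope
        rel = 2 * abs(p_new - p) / (abs(p_new) + delta)
        p = p_new
        if rel < delta or abs(f(p)) < epsilon:
            return p, k, True
    return p, max1, False        # no convergence: say so instead of reporting a root


def g_prime(f, df, d2f, x):
    """Derivative of the iteration function g(x) = x - f(x)/f'(x)."""
    return f(x) * d2f(x) / df(x) ** 2


if __name__ == "__main__":
    f = lambda x: x**3 - 3 * x + 2
    df = lambda x: 3 * x**2 - 3
    d2f = lambda x: 6 * x
    print(newton(f, df, -2.4))             # simple root p = -2
    print(g_prime(f, df, d2f, -2.1))       # |g'| < 1 near the simple root
```

For \(f(x) = x^3 - 3x + 2\) the value \(g'(-2.1) \approx 0.116\) is well inside the hypothesis \(|g'(x)| < 1\) of Theorem 2.2.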
Therefore, a sufficient condition for \(p_0\) to initialize a convergent sequence \(\{p_k\}_{k=0}^{\infty}\), which converges to a root of \(f(x) = 0\), is that \(p_0 \in (p - \delta, p + \delta)\) and that \(\delta\) be chosen so that

\[\frac{|f(x)f''(x)|}{|f'(x)|^2} < 1 \quad\text{for all } x \in (p - \delta, p + \delta).\]

Corollary 2.2 (Newton's Iteration for Finding Square Roots). Assume that \(A > 0\) is a real number and let \(p_0 > 0\) be an initial approximation to \(\sqrt{A}\). Define the sequence \(\{p_k\}_{k=0}^{\infty}\) using the recursive rule

\[p_k = \frac{p_{k-1} + \dfrac{A}{p_{k-1}}}{2} \quad\text{for } k = 1, 2, \ldots. \tag{11}\]

Then the sequence \(\{p_k\}_{k=0}^{\infty}\) converges to \(\sqrt{A}\); that is, \(\lim_{k\to\infty} p_k = \sqrt{A}\).

Outline of Proof. Start with the function \(f(x) = x^2 - A\), and notice that the roots of the equation \(x^2 - A = 0\) are \(\pm\sqrt{A}\). Now use \(f(x)\) and the derivative \(f'(x)\) in formula (5) and write down the Newton-Raphson iteration formula

\[g(x) = x - \frac{f(x)}{f'(x)} = x - \frac{x^2 - A}{2x}. \tag{12}\]

This formula can be simplified to obtain

\[g(x) = \frac{x + \dfrac{A}{x}}{2}. \tag{13}\]

When \(g(x)\) in (13) is used to define the recursive iteration in (4), the result is formula (11). It can be proved that the sequence that is generated in (11) will converge for any starting value \(p_0 > 0\). The details are left for the exercises.

An important point of Corollary 2.2 is the fact that the iteration function \(g(x)\) involved only the arithmetic operations \(+\), \(-\), \(\times\), and \(/\). If \(g(x)\) had involved the calculation of a square root, we would be caught in the circular reasoning that being able to calculate the square root would permit you to recursively define a sequence that will converge to \(\sqrt{A}\). For this reason, \(f(x) = x^2 - A\) was chosen, because it involved only the arithmetic operations.

Example 2.11. Use Newton's square-root algorithm to find \(\sqrt{5}\).

Starting with \(p_0 = 2\) and using formula (11), we compute

\[\begin{aligned}
p_1 &= \frac{2 + 5/2}{2} = 2.25,\\
p_2 &= \frac{2.25 + 5/2.25}{2} = 2.236111111,\\
p_3 &= \frac{2.236111111 + 5/2.236111111}{2} = 2.236067978,\\
p_4 &= \frac{2.236067978 + 5/2.236067978}{2} = 2.236067978.
\end{aligned}\]

Further iterations produce \(p_k \approx 2.236067978\) for \(k > 4\), so we see that convergence accurate to nine decimal places has been achieved.

Now let us turn to a familiar problem from elementary physics and see why determining the location of a root is an important task.
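Before turning to that problem, it may help to see Example 2.11 run. Newton's square-root rule (11) needs only \(+\), \(-\), \(\times\), and \(/\), and a short loop makes the rapid growth in correct digits visible. This is a Python sketch; `newton_sqrt` and its relative stopping tolerance are illustrative choices, not part of the corollary.

```python
import math


def newton_sqrt(A, p0, tol=1e-12, max_iter=100):
    """Corollary 2.2: p_k = (p_{k-1} + A / p_{k-1}) / 2, using only +, -, *, /.

    Returns the list of iterates [p0, p1, ...].
    """
    if A <= 0 or p0 <= 0:
        raise ValueError("requires A > 0 and p0 > 0")
    iterates = [p0]
    for _ in range(max_iter):
        p = iterates[-1]
        p_next = (p + A / p) / 2.0
        iterates.append(p_next)
        if abs(p_next - p) <= tol * p_next:   # relative change is negligible
            break
    return iterates


if __name__ == "__main__":
    for k, p in enumerate(newton_sqrt(5.0, 2.0)):
        print(k, f"{p:.9f}")
    print("math.sqrt(5) =", f"{math.sqrt(5.0):.9f}")   # used only for comparison
```

The printed iterates reproduce \(p_1 = 2.25\), \(p_2 = 2.236111111\), and \(p_3 = 2.236067978\) from the example.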
Suppose that a projectile is fired from the origin with an angle of elevation \(b_0\) and initial velocity \(v_0\). In elementary courses, air resistance is neglected and we learn that the height \(y = y(t)\) and the distance traveled \(x = x(t)\), measured in feet, obey the rules

\[y = v_y t - 16t^2 \quad\text{and}\quad x = v_x t, \tag{14}\]

where the horizontal and vertical components of the initial velocity are \(v_x = v_0\cos(b_0)\) and \(v_y = v_0\sin(b_0)\), respectively. The mathematical model expressed by the rules in (14) is easy to work with, but tends to give too high an altitude and too long a range for the projectile's path. If we make the additional assumption that the air resistance is proportional to the velocity, the equations of motion become

\[y = f(t) = (Cv_y + 32C^2)\left(1 - e^{-t/C}\right) - 32Ct \tag{15}\]

and

\[x = r(t) = Cv_x\left(1 - e^{-t/C}\right), \tag{16}\]

where \(C = m/k\), \(k\) is the coefficient of air resistance, and \(m\) is the mass of the projectile. A larger value of \(C\) will result in a higher maximum altitude and a longer range for the projectile. The graph of a flight path of a projectile when air resistance is considered is shown in Figure 2.14. This improved model is more realistic but requires the use of a root-finding algorithm for solving \(f(t) = 0\) to determine the elapsed time until the projectile hits the ground. The elementary model in (14) does not require a sophisticated procedure to find the elapsed time.

Figure 2.14 Path of a projectile with air resistance considered.

Table 2.4 Finding the Time When the Height f(t) Is Zero

 k   Time p_k     p_{k+1} - p_k   Height f(p_k)
 0   8.00000000    0.79773101     83.22097200
 1   8.79773101   -0.05530159     -6.68369700
 2   8.74242942   -0.00025475     -0.03050700
 3   8.74217467   -0.00000001     -0.00000100
 4   8.74217466    0.00000000      0.00000000

Example 2.12. A projectile is fired with an angle of elevation \(b_0 = 45^\circ\), \(v_y = v_x = 160\) ft/sec, and \(C = 10\).
Find the elapsed time until impact and find the range.

Using formulas (15) and (16), the equations of motion are \(y = f(t) = 4800(1 - e^{-t/10}) - 320t\) and \(x = r(t) = 1600(1 - e^{-t/10})\). Since \(f(8) = 83.220972\) and \(f(9) = -31.534367\), we will use the initial guess \(p_0 = 8\). The derivative is \(f'(t) = 480e^{-t/10} - 320\), and its value \(f'(p_0) = f'(8) = -104.3220972\) is used in formula (4) to get

\[p_1 = 8 - \frac{83.22097200}{-104.3220972} = 8.797731010.\]

A summary of the calculation is given in Table 2.4.

The value \(p_4\) has eight decimal places of accuracy, and the time until impact is \(t \approx 8.74217466\) seconds. The range can now be computed using \(r(t)\) in (16), and we get

\[r(8.74217466) = 1600\left(1 - e^{-0.874217466}\right) \approx 932.49863 \text{ feet}.\]

The Division-by-Zero Error

One obvious pitfall of the Newton-Raphson method is the possibility of division by zero in formula (4), which would occur if \(f'(p_{k-1}) = 0\). Program 2.5 has a procedure to check for this situation, but what use is the last calculated approximation \(p_{k-1}\) in this case? It is quite possible that \(f(p_{k-1})\) is sufficiently close to zero and that \(p_{k-1}\) is an acceptable approximation to the root. We now investigate this situation and will uncover an interesting fact, namely, how fast the iteration converges.

Definition 2.4 (Order of a Root). Assume that \(f(x)\) and its derivatives \(f'(x), \ldots, f^{(M)}(x)\) are defined and continuous on an interval about \(x = p\). We say that \(f(x) = 0\) has a root of order \(M\) at \(x = p\) if and only if

\[f(p) = 0,\quad f'(p) = 0,\quad \ldots,\quad f^{(M-1)}(p) = 0,\quad\text{and}\quad f^{(M)}(p) \ne 0. \tag{17}\]

A root of order \(M = 1\) is often called a simple root, and if \(M > 1\), it is called a multiple root. A root of order \(M = 2\) is sometimes called a double root, and so on. The next result will illuminate these concepts.

Lemma 2.1. If the equation \(f(x) = 0\) has a root of order \(M\) at \(x = p\), then there exists a continuous function \(h(x)\) so that \(f(x)\) can be expressed as the product

\[f(x) = (x - p)^M h(x), \quad\text{where } h(p) \ne 0. \tag{18}\]

Example 2.13. The function \(f(x) = x^3 - 3x + 2\) has a simple root at \(p = -2\) and a double root at \(p = 1\).
This can be verified by considering the derivatives \(f'(x) = 3x^2 - 3\) and \(f''(x) = 6x\). At the value \(p = -2\), we have \(f(-2) = 0\) and \(f'(-2) = 9\), so \(M = 1\) in Definition 2.4; hence \(p = -2\) is a simple root. For the value \(p = 1\), we have \(f(1) = 0\), \(f'(1) = 0\), and \(f''(1) = 6\), so \(M = 2\) in Definition 2.4; hence \(p = 1\) is a double root. Also, notice that \(f(x)\) has the factorization \(f(x) = (x + 2)(x - 1)^2\).

Speed of Convergence

The distinguishing property we seek is the following. If \(p\) is a simple root of \(f(x)\), Newton's method will converge rapidly, and the number of accurate decimal places (roughly) doubles with each iteration. On the other hand, if \(p\) is a multiple root, the error in each successive approximation is a fraction of the previous error. To make this precise, we define the order of convergence. This is a measure of how rapidly a sequence converges.

Definition 2.5 (Order of Convergence). Assume that \(\{p_n\}_{n=0}^{\infty}\) converges to \(p\) and set \(E_n = p - p_n\) for \(n \ge 0\). If two positive constants \(A \ne 0\) and \(R > 0\) exist, and

\[\lim_{n\to\infty} \frac{|p - p_{n+1}|}{|p - p_n|^R} = \lim_{n\to\infty} \frac{|E_{n+1}|}{|E_n|^R} = A, \tag{19}\]

then the sequence is said to converge to \(p\) with order of convergence \(R\). The number \(A\) is called the asymptotic error constant. The cases \(R = 1, 2\) are given special consideration:

(20) If \(R = 1\), the convergence of \(\{p_n\}_{n=0}^{\infty}\) is called linear.
(21) If \(R = 2\), the convergence of \(\{p_n\}_{n=0}^{\infty}\) is called quadratic.

If \(R\) is large, the sequence \(\{p_n\}\) converges rapidly to \(p\); that is, relation (19) implies that for large values of \(n\) we have the approximation \(|E_{n+1}| \approx A|E_n|^R\). For example, suppose that \(R = 2\) and \(|E_n| \approx 10^{-2}\); then we would expect that \(|E_{n+1}| \approx A \times 10^{-4}\).

Some sequences converge at a rate that is not an integer, and we will see that the order of convergence of the secant method is \(R = (1 + \sqrt{5})/2 \approx 1.618033989\).

Example 2.14 (Quadratic Convergence at a Simple Root). Start with \(p_0 = -2.4\) and use Newton-Raphson iteration to find the root \(p = -2\) of the polynomial \(f(x) = x^3 - 3x + 2\). The iteration formula for computing \(\{p_k\}\) is

\[p_k = g(p_{k-1}) = \frac{2p_{k-1}^3 - 2}{3p_{k-1}^2 - 3}. \tag{22}\]

Using formula (21) to check for quadratic convergence, we get the values in Table 2.5.

Table 2.5 Newton's Method Converges Quadratically at a Simple Root

 k   p_k            p_{k+1} - p_k   E_k = p - p_k   |E_{k+1}|/|E_k|^2
 0   -2.400000000   0.323809524     0.400000000     0.476190476
 1   -2.076190476   0.072594466     0.076190476     0.619469027
 2   -2.003596011   0.003587421     0.003596011     0.664277916
 3   -2.000008590   0.000008590     0.000008590
 4   -2.000000000   0.000000000     0.000000000

A detailed look at the rate of convergence in Example 2.14 will reveal that the error in each successive iteration is proportional to the square of the error in the previous iteration. That is,

\[|p - p_{k+1}| \approx A|p - p_k|^2,\]

where \(A = 2/3\). To check this, we use

\[|p - p_3| = 0.000008590 \quad\text{and}\quad |p - p_2|^2 = |0.003596011|^2 = 0.000012931,\]

and it is easy to see that

\[|p - p_3| = 0.000008590 \approx 0.000008621 = \frac{2}{3}|p - p_2|^2.\]

Example 2.15 (Linear Convergence at a Double Root). Start with \(p_0 = 1.2\) and use Newton-Raphson iteration to find the double root \(p = 1\) of the polynomial \(f(x) = x^3 - 3x + 2\). Using formula (20) to check for linear convergence, we get the values in Table 2.6.

Table 2.6 Newton's Method Converges Linearly at a Double Root

 k   p_k           p_{k+1} - p_k   E_k = p - p_k   |E_{k+1}|/|E_k|
 0   1.200000000   -0.096969697    -0.200000000    0.515151515
 1   1.103030303   -0.050673886    -0.103030303    0.508165226
 2   1.052356417   -0.025955603    -0.052356417    0.504251732
 3   1.026400814   -0.013143080    -0.026400814    0.502171404
 4   1.013257734   -0.006614316    -0.013257734    0.501097536
 5   1.006643418   -0.003318043    -0.006643418    0.500551785

Notice that the Newton-Raphson method is converging to the double root, but at a slow rate. The values of \(f(p_k)\) in Example 2.15 go to zero faster than the values of \(f'(p_k)\), so the quotient \(f(p_k)/f'(p_k)\) in formula (4) is defined when \(p_k \ne p\). The sequence is converging linearly, and the error is decreasing by a factor of approximately 1/2 with each successive iteration.
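The ratios in Tables 2.5 and 2.6 follow directly from definition (19). The Python sketch below (helper names are hypothetical) iterates formula (22) from the two starting values and computes \(|E_{k+1}|/|E_k|^R\) with \(R = 2\) at the simple root and \(R = 1\) at the double root; the values should settle near \(2/3\) and \(1/2\), respectively.

```python
def newton_iterates(g, p0, n):
    """Apply the iteration function g n times starting from p0."""
    ps = [p0]
    for _ in range(n):
        ps.append(g(ps[-1]))
    return ps


def error_ratios(ps, p, R):
    """|E_{k+1}| / |E_k|^R with E_k = p - p_k, the quantity in (19)."""
    E = [p - q for q in ps]
    return [abs(E[k + 1]) / abs(E[k]) ** R for k in range(len(E) - 1) if E[k] != 0]


def g(x):
    """Formula (22): Newton iteration function for f(x) = x^3 - 3x + 2."""
    return (2 * x**3 - 2) / (3 * x**2 - 3)


if __name__ == "__main__":
    print(error_ratios(newton_iterates(g, -2.4, 3), -2.0, 2))  # Table 2.5: tends to 2/3
    print(error_ratios(newton_iterates(g, 1.2, 6), 1.0, 1))    # Table 2.6: tends to 1/2
```

The simple-root ratios stop at \(k = 2\) on purpose: one more step makes \(E_k\) so small that rounding error dominates the ratio.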
The following theorem summarizes the performance of Newton's method on simple and double roots.

Theorem 2.6 (Convergence Rate for Newton-Raphson Iteration). Assume that Newton-Raphson iteration produces a sequence \(\{p_n\}_{n=0}^{\infty}\) that converges to the root \(p\) of the function \(f(x)\). If \(p\) is a simple root, convergence is quadratic and

\[|E_{n+1}| \approx \frac{|f''(p)|}{2|f'(p)|}|E_n|^2 \quad\text{for } n \text{ sufficiently large}. \tag{23}\]

If \(p\) is a multiple root of order \(M\), convergence is linear and

\[|E_{n+1}| \approx \frac{M - 1}{M}|E_n| \quad\text{for } n \text{ sufficiently large}. \tag{24}\]

Pitfalls

The division-by-zero error was easy to anticipate, but there are other difficulties that are not so easy to spot. Suppose that the function is \(f(x) = x^2 - 4x + 5\); then the sequence \(\{p_k\}\) of real numbers generated by formula (4) will wander back and forth from left to right and not converge. A simple analysis of the situation reveals that \(f(x) > 0\) and has no real roots.

Figure 2.15 (a) Newton-Raphson iteration for \(f(x) = xe^{-x}\) can produce a divergent sequence.

Sometimes the initial approximation \(p_0\) is too far away from the desired root and the sequence \(\{p_k\}\) converges to some other root. This usually happens when the slope \(f'(p_0)\) is small and the tangent line to the curve \(y = f(x)\) is nearly horizontal. For example, if \(f(x) = \cos(x)\) and we seek the root \(p = \pi/2\) and start with \(p_0 = 3\), calculation reveals that \(p_1 = -4.01525255\), \(p_2 = -4.85265757\), ..., and \(\{p_k\}\) will converge to a different root, \(-3\pi/2 \approx -4.71238898\).

Suppose that \(f(x)\) is positive and monotone decreasing on the unbounded interval \([a, \infty)\) and \(p_0 > a\); then the sequence \(\{p_k\}\) might diverge to \(+\infty\). For example, if \(f(x) = xe^{-x}\) and \(p_0 = 2.0\), then

\[p_1 = 4.0,\quad p_2 = 5.333333333,\quad \ldots,\quad p_{15} = 19.723580834,\quad \ldots,\]

and \(\{p_k\}\) diverges slowly to \(+\infty\) (see Figure 2.15(a)).
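The pitfalls in this section, together with the cycling and oscillation examples discussed in the next paragraphs, are easy to reproduce. The Python sketch below applies the plain iteration (4) with no safeguards, so that each failure mode is visible in the iterates; the helper name `newton_path` and the iteration counts are illustrative choices.

```python
import math


def newton_path(f, df, p0, n):
    """Return [p0, p1, ..., pn] from iteration (4), with no safeguards at all."""
    ps = [p0]
    for _ in range(n):
        p = ps[-1]
        ps.append(p - f(p) / df(p))
    return ps


# cos(x) from p0 = 3: the nearly horizontal tangent sends p1 far away, toward -3*pi/2.
cos_path = newton_path(math.cos, lambda x: -math.sin(x), 3.0, 8)

# x e^{-x} from p0 = 2: here g(x) = x**2 / (x - 1), so the iterates creep toward +infinity.
xexp_path = newton_path(lambda x: x * math.exp(-x),
                        lambda x: (1 - x) * math.exp(-x), 2.0, 16)

# x^3 - x - 3: p0 = 0 falls into a near cycle of period 4, while p0 = 2 converges.
cubic = lambda x: x**3 - x - 3
dcubic = lambda x: 3 * x**2 - 1
cycle = newton_path(cubic, dcubic, 0.0, 8)
settle = newton_path(cubic, dcubic, 2.0, 5)

# arctan(x): g'(x) = -2x arctan(x), so p0 = 1.45 oscillates outward while p0 = 0.5 converges.
datan = lambda x: 1.0 / (1.0 + x * x)
wild = newton_path(math.atan, datan, 1.45, 3)
tame = newton_path(math.atan, datan, 0.5, 3)

if __name__ == "__main__":
    p15, p16 = xexp_path[15], xexp_path[16]
    print("cos:", cos_path[:3], "->", cos_path[-1])
    print("x e^-x: p15 =", p15, " f(p15) =", p15 * math.exp(-p15),
          " relative change =", 2 * abs(p16 - p15) / (abs(p15) + 1e-6))
    print("cycle:", [round(p, 6) for p in cycle])
    print("settle:", settle[-1])
    print("arctan:", wild, tame)
```

For \(xe^{-x}\), \(|f(p_{15})|\) is only about \(5 \times 10^{-8}\), yet the relative change between \(p_{15}\) and \(p_{16}\) is about 0.1: a small residual alone does not certify a root.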
This particular function hes ancthersorprising problem, The value of fx) goes 10 era rapidly as» gets large, for ‘example, f(pis) = 0.000(000536, and its possible that pss could be mistaken for ‘100t, For this reason we designed stopping criterion in Program 2.5 to involve the relative eror 2|p441 — pe|/|px|+10~4), ad when k = 15, this value is 0.106817, so the tolerance 5 = 10~¢ willhelp guard against reporting a false 00%. Another phenomenon, gjeling, occurs wher the terms in the sequence (px) tend to repeat or almost repeat. For example, if f(x)= x? —x~3 and heinital approximation then the sequence is 1.961538, ps 0.006579, 1.961818, py 1.147176, py 1.147430, mn Ps and we are stuck in a cycle where pis © pe for k = 0, 1, ... (see Figure 2.15(b), Butif the stating value pa issu Ticiently close tothe root p ~ 1.671699881. then {pi} SEC. 2.4 NEWTON-RAPHSON AND SECANT METHODS nv Figure 215 (b) Newion-Raphson iteration for (2) = sony — Son pradice seyslic sequence = taal) igure 218 4) Newton-Raphson iteration for fx) = arctan(x} can Froduce a divergent oscillating sequence. ‘converges. If po = 2, the sequence converges: p, = 1.72727272, pp = 1.67369173. ps = 1671702570, and pq = 1.671699881 When g/(2)| > 1 on an interval containing the root p, there is a chance of divergent oscillation. For example, let f(x) = arctan(x); then the Newton-Raphson iteration function is g(x) =x — (1 +) anctanx), and g(x) = —2x arctan(x). Ifthe starting value po = 1.43 is chosen, then =2.889109054, 1.550263297, p= 1.845931751, ps a te. (see Figure 2.15(c)). But ifthe starting value is sufficiently close to the root p = 0,80 CHAP.2 THESOLUTION OF NONLINEAR EQUATIONS f(x) =0 igare 2.16 The geometric construction of p> fer the se- can method. ‘a convergent sequeace results. If py = C.S, then 0.000000000. p= ~0.079559511, pp = 0.000335302, ps “The sittations above point o the fact that we must be ronest in reportingan answer. Sometimes the sequence does no! comserge. 
It is not always the case that a solution is found after a given number of iterations. The user of a root-finding algorithm needs to be warned when a root is not found. If there is other information concerning the context of the problem, then it is less likely that an erroneous root will be found. Sometimes $f(x)$ has a definite interval in which a root is meaningful. If knowledge of the behavior of the function or an "accurate" graph is available, then it is easier to choose $p_0$.

The Secant Method

The Newton-Raphson algorithm requires the evaluation of two functions per iteration, $f(p_{k-1})$ and $f'(p_{k-1})$. Traditionally, the calculation of derivatives of elementary functions could involve considerable effort. With modern computer algebra software packages, this has become less of an issue. Still, many functions have nonelementary forms (integrals, sums, etc.), and it is desirable to have a method that converges almost as fast as Newton's method yet involves only evaluations of $f(x)$ and not of $f'(x)$. The secant method requires only one evaluation of $f(x)$ per step, and at a simple root it has an order of convergence $R \approx 1.618033989$. It is almost as fast as Newton's method, which has order 2.

The formula involved in the secant method is the same one that was used in the regula falsi method, except that the logical decisions regarding how to define each succeeding term are different. Two initial points $(p_0, f(p_0))$ and $(p_1, f(p_1))$ near the point $(p, 0)$ are needed, as shown in Figure 2.16. Define $p_2$ to be the abscissa of the point of intersection of the line through these two points and the $x$-axis; then Figure 2.16 shows that $p_2$ will be closer to $p$ than either $p_0$ or $p_1$. The equation relating $p_2$, $p_1$, and $p_0$ is found by considering the slope
$$m = \frac{f(p_1) - f(p_0)}{p_1 - p_0} \quad\text{and}\quad m = \frac{0 - f(p_1)}{p_2 - p_1}. \tag{25}$$
The values of $m$ in (25) are the slope of the secant line through the first two approximations and the slope of the line through $(p_1, f(p_1))$ and $(p_2, 0)$, respectively. Set the right-hand sides equal in (25) and solve for $p_2 = g(p_1, p_0)$ to get
$$p_2 = g(p_1, p_0) = p_1 - \frac{f(p_1)(p_1 - p_0)}{f(p_1) - f(p_0)}. \tag{26}$$
The general term is given by the two-point iteration formula
$$p_{k+1} = g(p_k, p_{k-1}) = p_k - \frac{f(p_k)(p_k - p_{k-1})}{f(p_k) - f(p_{k-1})}. \tag{27}$$

Figure 2.16 The geometric construction of $p_2$ for the secant method.

Example 2.16 (Secant Method at a Simple Root). Start with $p_0 = -2.6$ and $p_1 = -2.4$ and use the secant method to find the root $p = -2$ of the polynomial function $f(x) = x^3 - 3x + 2$.

In this case the iteration formula (27) is
$$p_{k+1} = g(p_k, p_{k-1}) = p_k - \frac{(p_k^3 - 3p_k + 2)(p_k - p_{k-1})}{p_k^3 - p_{k-1}^3 - 3p_k + 3p_{k-1}}. \tag{28}$$
This can be algebraically manipulated to obtain
$$p_{k+1} = g(p_k, p_{k-1}) = \frac{p_k^2 p_{k-1} + p_k p_{k-1}^2 - 2}{p_k^2 + p_k p_{k-1} + p_{k-1}^2 - 3}. \tag{29}$$
The sequence of iterates is given in Table 2.7. •

Table 2.7 Convergence of the Secant Method at a Simple Root
(ratio = |p - p_{k+1}| / |p - p_k|^1.618)
k  p_k           p_{k+1} - p_k  p - p_k      ratio
0  -2.600000000  0.200000000    0.600000000  0.914152631
1  -2.400000000  0.293401015    0.400000000  0.469497765
2  -2.106598985  0.083957573    0.106598985  0.847290012
3  -2.022641412  0.021130314    0.022641412  0.693608522
4  -2.001511098  0.001488561    0.001511098  0.825840116
5  -2.000022537  0.000022515    0.000022537  0.727100687
6  -2.000000022  0.000000022    0.000000022
7  -2.000000000  0.000000000    0.000000000

There is a relationship between the secant method and Newton's method. For a polynomial function $f(x)$, the secant method two-point formula $p_{k+1} = g(p_k, p_{k-1})$ will reduce to Newton's one-point formula $p_{k+1} = g(p_k)$ if $p_{k-1}$ is replaced by $p_k$. Indeed, if we replace $p_{k-1}$ by $p_k$ in (29), then the right side becomes $(2p_k^3 - 2)/(3p_k^2 - 3)$, which is the same as the right side of (22) in Example 2.14. Proofs about the rate of convergence of the secant method can be found in advanced texts on numerical analysis.
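Formula (27) translates almost directly into code. The Python sketch below is a hedged illustration, not the book's Program 2.6. The function name `secant_sequence` is hypothetical. Its only safeguard is to stop when $f(p_k) = f(p_{k-1})$, where (27) would divide by zero. It also checks that the simplified form (29) gives the same $p_2$.

```python
def secant_sequence(f, p0, p1, n):
    """Return [p0, p1, p2, ...] from the two-point formula (27), at most n new terms."""
    seq = [p0, p1]
    for _ in range(n):
        prev, cur = seq[-2], seq[-1]
        f_prev, f_cur = f(prev), f(cur)
        if f_cur == f_prev:   # horizontal secant line: (27) would divide by zero
            break
        seq.append(cur - f_cur * (cur - prev) / (f_cur - f_prev))
    return seq

f = lambda x: x**3 - 3 * x + 2
seq = secant_sequence(f, -2.6, -2.4, 7)
for k, pk in enumerate(seq):
    print(k, f"{pk:.9f}")

# Formula (29), the simplified form of (28) for this particular f, gives the same p2.
g29 = lambda pk, pkm1: (pk**2 * pkm1 + pk * pkm1**2 - 2) / (pk**2 + pk * pkm1 + pkm1**2 - 3)
print(g29(-2.4, -2.6))
```

Only one new function value is needed per step. The sketch recomputes `f(prev)` for clarity, and a production version would cache it.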
Let us state that the error terms satisfy the relationship
$$|E_{k+1}| \approx |E_k|^{1.618} \left|\frac{f''(p)}{2f'(p)}\right|^{0.618}, \tag{30}$$
where the order of convergence is $R = (1 + \sqrt{5})/2 \approx 1.618$ and the relation in (30) is valid only at simple roots.

To check this, we make use of Example 2.16, where $f'(-2) = 9$ and $f''(-2) = -12$, and the specific values
$$|p - p_5| = 0.000022537, \qquad |p - p_4|^{1.618} = 0.001511098^{1.618} = 0.000027296,$$
and
$$A = \left|\frac{f''(-2)}{2f'(-2)}\right|^{0.618} = \left(\frac{2}{3}\right)^{0.618} = 0.778351205.$$
Combining these, it is easy to see that
$$|p - p_5| = 0.000022537 \approx 0.000021246 = A\,|p - p_4|^{1.618}.$$

Accelerated Convergence

We could hope that there are root-finding techniques that converge faster than linearly when $p$ is a root of order $M$. Our final result shows that a modification can be made to Newton's method so that convergence becomes quadratic at a multiple root.

Theorem 2.7 (Acceleration of Newton-Raphson Iteration). Suppose that the Newton-Raphson algorithm produces a sequence that converges linearly to the root $x = p$ of order $M > 1$. Then the Newton-Raphson iteration formula
$$p_k = p_{k-1} - \frac{M f(p_{k-1})}{f'(p_{k-1})} \tag{31}$$
will produce a sequence $\{p_k\}_{k=0}^{\infty}$ that converges quadratically to $p$.

Table 2.8 Acceleration of Convergence at a Double Root
(ratio = |E_{k+1}| / |E_k|^2)
k  p_k          p_{k+1} - p_k  E_k = p - p_k  ratio
0  1.200000000  -0.193939394   -0.200000000   0.151515150
1  1.006060606  -0.006054519   -0.006060606   0.165718578
2  1.000006087  -0.000006087   -0.000006087
3  1.000000000   0.000000000    0.000000000

Table 2.9 Comparison of the Speed of Convergence
Method                      Special considerations  Relation between successive error terms
Bisection                                           E_{k+1} ≈ (1/2)|E_k|
Regula falsi                                        E_{k+1} ≈ A|E_k|
Secant method               Multiple root           E_{k+1} ≈ A|E_k|
Newton-Raphson              Multiple root           E_{k+1} ≈ A|E_k|
Secant method               Simple root             E_{k+1} ≈ A|E_k|^1.618
Newton-Raphson              Simple root             E_{k+1} ≈ A|E_k|^2
Accelerated Newton-Raphson  Multiple root           E_{k+1} ≈ A|E_k|^2

Example 2.17 (Acceleration of Convergence at a Double Root). Start with $p_0 = 1.2$ and use accelerated Newton-Raphson iteration to find the double root $p = 1$ of $f(x) = x^3 - 3x + 2$.

Since $M = 2$, the acceleration formula (31) becomes
$$p_k = p_{k-1} - 2\,\frac{f(p_{k-1})}{f'(p_{k-1})} = \frac{p_{k-1}^3 + 3p_{k-1} - 4}{3p_{k-1}^2 - 3},$$
and we obtain the values in Table 2.8. •
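Theorem 2.6 and Theorem 2.7 can be compared side by side at the double root of Example 2.17. The hedged Python sketch below uses a hypothetical helper, `newton_order`, that applies formula (31). Setting $M = 1$ gives plain Newton-Raphson, whose error ratios should approach $(M-1)/M = 1/2$ by (24). The accelerated run is stopped after three steps. Near a multiple root the computed value of $f(p)$ is soon dominated by rounding error, and further steps would only amplify that noise.

```python
def newton_order(f, df, p0, M, n):
    """Iterate formula (31), p_k = p_{k-1} - M f(p_{k-1}) / f'(p_{k-1}); M = 1 is plain Newton."""
    seq = [p0]
    for _ in range(n):
        p = seq[-1]
        d = df(p)
        if d == 0:       # f'(p) = 0, e.g. when an iterate lands exactly on the multiple root
            break
        seq.append(p - M * f(p) / d)
    return seq

f = lambda x: x**3 - 3 * x + 2     # equals (x - 1)^2 (x + 2): double root at p = 1
df = lambda x: 3 * x**2 - 3

# Only three accelerated steps: near a multiple root f(p) soon drowns in rounding
# error, so further steps would just amplify noise.
fast = newton_order(f, df, 1.2, M=2, n=3)
plain = newton_order(f, df, 1.2, M=1, n=8)

print(fast)                                   # compare with Table 2.8
ratios = [abs(plain[k + 1] - 1) / abs(plain[k] - 1) for k in range(len(plain) - 1)]
print(ratios)                                 # plain Newton: ratios approach (M-1)/M = 1/2
```

In practice the order $M$ is rarely known in advance. This limitation is what motivates Aitken's process in Section 2.5.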
Table 2.9 compares the speed of convergence of the various root-finding methods that we have studied so far. The value of the constant $A$ is different for each method.

Program 2.5 (Newton-Raphson Iteration). To approximate a root of $f(x) = 0$ given one initial approximation $p_0$ and using the iteration
$$p_k = p_{k-1} - \frac{f(p_{k-1})}{f'(p_{k-1})} \quad\text{for } k = 1, 2, \ldots.$$

function [p0,err,k,y]=newton(f,df,p0,delta,epsilon,max1)
%Input  - f is the object function input as a string 'f'
%       - df is the derivative of f input as a string 'df'
%       - p0 is the initial approximation to a zero of f
%       - delta is the tolerance for p0
%       - epsilon is the tolerance for the function values y
%       - max1 is the maximum number of iterations
%Output - p0 is the Newton-Raphson approximation to the zero
%       - err is the error estimate for p0
%       - k is the number of iterations
%       - y is the function value f(p0)
for k=1:max1
   p1=p0-feval(f,p0)/feval(df,p0);
   err=abs(p1-p0);
   relerr=2*err/(abs(p1)+delta);
   p0=p1;
   y=feval(f,p0);
   %Stop when the step, the relative step, or |f(p0)| is small enough
   if (err<delta)||(relerr<delta)||(abs(y)<epsilon), break, end
end

…

… and $p_4$. What is $\lim_{k\to\infty} p_k$?
(c) If $p_0 = 20$, then find $p_1$, $p_2$, $p_3$, and $p_4$. What is $\lim_{k\to\infty} p_k$?
(d) What is the value of $f(p_4)$ in part (c)?

In Exercises 8 through 10, use the secant method and formula (27) and compute the next two iterates $p_2$ and $p_3$.
8. Let $f(x) = x^2 - 2x - 1$. Start with $p_0 = 2.6$ and $p_1 = 2.5$.
9. Let $f(x) = \ldots 3$. Start with $p_0 = 1.7$ and $p_1 = 1.67$.
10. Let $f(x) = x^3 - x + 2$. Start with $p_0 = -1.5$ and $p_1 = -1.52$.
11. Cube-root algorithm. Start with $f(x) = x^3 - A$, where $A$ is any real number, and derive the recursive formula
$$p_k = \frac{2p_{k-1} + A/p_{k-1}^2}{3} \quad\text{for } k = 1, 2, \ldots.$$
12. Consider $f(x) = x^N - A$, where $N$ is a positive integer.
(a) What real values are the solution to $f(x) = 0$ for the various choices of $N$ and $A$ that can arise?
(b) Derive the recursive formula
$$p_k = \frac{(N-1)p_{k-1} + A/p_{k-1}^{N-1}}{N} \quad\text{for } k = 1, 2, \ldots$$
for finding the $N$th root of $A$.
13. Can Newton-Raphson iteration be used to solve $f(x) = 0$ if $f(x) = x^2 - 14x + 50$? Why?
14. Can Newton-Raphson iteration be used to solve $f(x) = 0$ if $f(x) = x^{1/3}$? Why?
15.
Can Newton-Raphson iteration be used to solve $f(x) = 0$ if $f(x) = (x - 3)^{1/2}$ and the starting value is $p_0 = 4$? Why?
16. Establish the limit of the sequence in (11).
17. Prove that the sequence $\{p_k\}$ in equation (4) of Theorem 2.5 converges to $p$. Use the following steps.
(a) Show that if $p$ is a fixed point of $g(x)$ in equation (5), then $p$ is a zero of $f(x)$.
(b) If $p$ is a zero of $f(x)$ and $f'$ …

… $0.000062$, and $E_5 = 0.000000$. Estimate the asymptotic error constant $A$ and the order of convergence $R$ of the sequence generated by the iterative method.

Algorithms and Programs
1. Modify Programs 2.5 and 2.6 to display an appropriate error message when (i) division by zero occurs in (4) or (27), respectively, or (ii) the maximum number of iterations, max1, is exceeded.
2. It is often instructive to display the terms in the sequences generated by (4) and (27) (i.e., the second column of Table 2.4). Modify Programs 2.5 and 2.6 to display the sequences generated by (4) and (27), respectively.
3. Modify Program 2.5 to use Newton's square-root algorithm to approximate each of the following square roots to 10 decimal places.
(a) Start with $p_0 = 3$ and approximate $\sqrt{8}$.
(b) Start with $p_0 = 10$ and approximate $\sqrt{91}$.
(c) Start with $p_0 = -3$ and approximate $-\sqrt{8}$.
4. Modify Program 2.5 to use the cube-root algorithm in Exercise 11 to approximate each of the following cube roots to 10 decimal places.
(a) Start with $p_0 = 2$ and approximate $7^{1/3}$.
(b) Start with $p_0 = 6$ and approximate $200^{1/3}$.
(c) Start with $p_0 = -2$ and approximate $(-7)^{1/3}$.
5. Modify Program 2.5 to use the accelerated Newton-Raphson algorithm in Theorem 2.7 to find the root $p$ of order $M$ of each of the following functions.
(a) $f(x) = (x - 2)^5$, $M = 5$, $p = 2$; start with $p_0 = \ldots$
(b) $f(x) = \sin(x^3)$, $M = 3$, $p = 0$; start with $p_0 = \ldots$
(c) $f(x) = (x - 1)\ln(x)$, $M = 2$, $p = 1$; start with $p_0 = 2$.
6. Modify Program 2.5 to use Halley's method in Exercise 22 to find the simple zero of $f(x) = x^3 - 3x + 2$, using $p_0 = -2.4$.
7. Suppose that the equations of motion for a projectile are
$$y = f(t) = 9600\left(1 - e^{-t/15}\right) - 480t, \qquad x = r(t) = 2400\left(1 - e^{-t/15}\right).$$
(a) Find the elapsed time until impact accurate to 10 decimal places.
(b) Find the range accurate to 10 decimal places.
8. (a) Find the point on the parabola $y = x^2$ that is closest to the point $(3, 1)$ accurate to 10 decimal places.
(b) Find the point on the graph of $y = \sin(x - \sin(x))$ that is closest to the point $(2.1, 0.5)$ accurate to 10 decimal places.
(c) Find the value of $x$ at which the minimum vertical distance between the graphs of $f(x) = x^2 + 2$ and $g(x) = (x/5) - \sin(x)$ occurs accurate to 10 decimal places.
9. An open-top box is constructed from a rectangular piece of sheet metal measuring 10 by 16 inches. Squares of what size (accurate to 0.000000001 inch) should be cut from the corners if the volume of the box is to be 100 cubic inches?
10. A catenary is the curve formed by a hanging cable. Assume that the lowest point is $(0, 0)$; then the formula for the catenary is $y = C\cosh(x/C) - C$. To determine the catenary that goes through $(\pm a, b)$ we must solve the equation $b = C\cosh(a/C) - C$ for $C$.
(a) Show that the catenary through $(\pm 10, 6)$ is $y = 9.1889\cosh(x/9.1889) - 9.1889$.
(b) Find the catenary that passes through $(\pm 12, 5)$.

2.5 Aitken's Process and Steffensen's and Muller's Methods (Optional)

In Section 2.4 we saw that Newton's method converged slowly at a multiple root and the sequence of iterates $\{p_k\}$ exhibited linear convergence. Theorem 2.7 showed how to speed up convergence, but it depends on knowing the order of the root in advance.

Aitken's Process

A technique called Aitken's $\Delta^2$ process can be used to speed up convergence of any sequence that is linearly convergent. In order to proceed, we will need a definition.

Definition 2.6. Given the sequence $\{p_n\}_{n=0}^{\infty}$, define the forward difference $\Delta p_n$ by
$$\Delta p_n = p_{n+1} - p_n \quad\text{for } n \ge 0. \tag{1}$$
Higher powers $\Delta^k p_n$ are defined recursively by
$$\Delta^k p_n = \Delta^{k-1}(\Delta p_n) \quad\text{for } k \ge 2. \tag{2}$$

Theorem 2.8 (Aitken's Acceleration).
Assume that the sequence $\{p_n\}_{n=0}^{\infty}$ converges linearly to the limit $p$ and that $p - p_n \ne 0$ for all $n \ge 0$. If there exists a real number $A$ with $|A| < 1$ such that
$$\lim_{n\to\infty} \frac{p - p_{n+1}}{p - p_n} = A, \tag{3}$$
then the sequence $\{q_n\}_{n=0}^{\infty}$ defined by
$$q_n = p_n - \frac{(\Delta p_n)^2}{\Delta^2 p_n} = p_n - \frac{(p_{n+1} - p_n)^2}{p_{n+2} - 2p_{n+1} + p_n} \tag{4}$$
converges to $p$ faster than $\{p_n\}_{n=0}^{\infty}$, in the sense that
$$\lim_{n\to\infty} \left|\frac{p - q_n}{p - p_n}\right| = 0. \tag{5}$$

Proof. We will show how to derive formula (4) and will leave the proof of (5) as an exercise. Since the terms in (3) are approaching a limit, we can write
$$\frac{p - p_{n+1}}{p - p_n} \approx A \quad\text{and}\quad \frac{p - p_{n+2}}{p - p_{n+1}} \approx A \quad\text{when } n \text{ is large.} \tag{6}$$
The relations in (6) imply that
$$(p - p_{n+1})^2 \approx (p - p_{n+2})(p - p_n). \tag{7}$$
When both sides of (7) are expanded and the terms $p^2$ are canceled, the result is
$$p \approx \frac{p_{n+2}p_n - p_{n+1}^2}{p_{n+2} - 2p_{n+1} + p_n} \quad\text{for } n = 0, 1, \ldots. \tag{8}$$
The formula in (8) is used to define the term $q_n$. It can be rearranged algebraically to obtain formula (4), which has less error propagation when computer calculations are made. •

Table 2.10 Linearly Convergent Sequence {p_n}
n  p_n          E_n = p_n - p  A_n = E_n / E_{n-1}
1  0.606530660   0.039387369   -0.586616609
2  0.545239212  -0.021904078   -0.556119357
3  0.579703095   0.012559805   -0.573400269
4  0.560064628  -0.007078663   -0.563596551
5  0.571172149   0.004028859   -0.569155345
6  0.564862947  -0.002280343   -0.566000341

Table 2.11 Derived Sequence {q_n} Using Aitken's Process
n  q_n          q_n - p
1  0.567298989  0.000155699
2  …            …
3  0.567159364  0.000016074
4  …            …
5  0.567144952  0.000001662
6  0.567143825  0.000000534

Example 2.18. Show that the sequence $\{p_n\}$ in Example 2.2 exhibits linear convergence, and show that the sequence $\{q_n\}$ obtained by Aitken's $\Delta^2$ process converges faster.

The sequence $\{p_n\}$ was obtained by fixed-point iteration using the function $g(x) = e^{-x}$ and starting with $p_0 = 0.5$. After convergence has been achieved, the limit is $p \approx 0.567143290$. The values $p_n$ and $q_n$ are given in Tables 2.10 and 2.11. The ratios $A_n$ in Table 2.10 settle near a constant close to $g'(p) = -e^{-p} = -p \approx -0.567$, which is the signature of linear convergence. For illustration, the value of $q_1$ is given by the calculation
$$q_1 = p_1 - \frac{(p_2 - p_1)^2}{p_3 - 2p_2 + p_1} = 0.606530660 - \frac{(-0.061291448)^2}{0.095755331} = 0.567298989. \quad \bullet$$
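Tables 2.10 and 2.11 can be regenerated from formula (4). The following is a minimal Python sketch under stated assumptions: the helper name `aitken` is hypothetical, and it skips the division when $\Delta^2 p_n = 0$ rather than raising an error. It indexes $q_n$ so that $q_n$ uses $p_n$, $p_{n+1}$, and $p_{n+2}$, as in Example 2.18.

```python
import math

def aitken(p):
    """Aitken's Delta^2 sequence q_n = p_n - (Delta p_n)^2 / Delta^2 p_n, formula (4)."""
    q = []
    for n in range(len(p) - 2):
        d1 = p[n + 1] - p[n]                 # forward difference Delta p_n
        d2 = p[n + 2] - 2 * p[n + 1] + p[n]  # second difference Delta^2 p_n
        if d2 == 0:                          # Delta^2 p_n vanishes: stop instead of dividing by zero
            break
        q.append(p[n] - d1 * d1 / d2)
    return q

# Fixed-point iteration p_{n+1} = g(p_n) = exp(-p_n) with p_0 = 0.5, as in Example 2.18.
p = [0.5]
for _ in range(8):
    p.append(math.exp(-p[-1]))

q = aitken(p)            # q[n] is built from p[n], p[n+1], p[n+2]
limit = 0.567143290      # value of p quoted in Example 2.18
for n in range(1, 7):
    print(n, f"{p[n]:.9f}", f"{q[n]:.9f}", f"{abs(q[n] - limit):.9f}")
```

Evaluating (4) in the form $p_n - (\Delta p_n)^2/\Delta^2 p_n$ rather than as the quotient in (8) follows the text's remark about error propagation. The correction term is small, so cancellation in its numerator hurts less.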
Figure 2.17 The starting approximations $p_0$, $p_1$, and $p_2$ for Muller's method, and the differences $h_0$ and $h_1$.

Although the sequence $\{q_n\}$ in Table 2.11 converges linearly, it converges faster than $\{p_n\}$ in the sense of Theorem 2.8, and Aitken's method usually gives a better improvement than this. When Aitken's process is combined with fixed-point iteration, the result is called Steffensen's acceleration. The details are given in Program 2.7 and in the exercises.

Muller's Method

Muller's method is a generalization of the secant method, in the sense that it does not require the derivative of the function. It is an iterative method that requires three starting points $(p_0, f(p_0))$, $(p_1, f(p_1))$, and $(p_2, f(p_2))$. A parabola is constructed that passes through the three points; then the quadratic formula is used to find a root of the quadratic for the next approximation. It has been proved that near a simple root Muller's method converges faster than the secant method and almost as fast as Newton's method. The method can be used to find real or complex zeros of a function and can be programmed to use complex arithmetic.

Without loss of generality, we assume that $p_2$ is the best approximation to the root and consider the parabola through the three starting values, shown in Figure 2.17. Make the change of variable
$$t = x - p_2, \tag{9}$$
and use the differences
$$h_0 = p_0 - p_2 \quad\text{and}\quad h_1 = p_1 - p_2. \tag{10}$$
Consider the quadratic polynomial involving the variable $t$:
$$y = at^2 + bt + c. \tag{11}$$
Each point is used to obtain an equation involving $a$, $b$, and $c$:
$$\begin{aligned}
&\text{At } t = h_0: && ah_0^2 + bh_0 + c = f_0,\\
&\text{At } t = h_1: && ah_1^2 + bh_1 + c = f_1,\\
&\text{At } t = 0: && a\,0^2 + b\,0 + c = f_2.
\end{aligned} \tag{12}$$
From the third equation in (12), we see that
$$c = f_2. \tag{13}$$
Substituting (13) into the first two equations in (12) and using the definitions $e_0 = f_0 - c$ and $e_1 = f_1 - c$ results in the linear system
$$\begin{aligned} ah_0^2 + bh_0 &= f_0 - c = e_0,\\ ah_1^2 + bh_1 &= f_1 - c = e_1. \end{aligned} \tag{14}$$
Solving the linear system for $a$ and $b$ results in
$$a = \frac{e_0h_1 - e_1h_0}{h_1h_0^2 - h_0h_1^2}, \qquad b = \frac{e_1h_0^2 - e_0h_1^2}{h_1h_0^2 - h_0h_1^2}. \tag{15}$$
The quadratic formula is used to find the roots $t = z_1, z_2$ of (11):
$$z = \frac{-2c}{b \pm \sqrt{b^2 - 4ac}}. \tag{16}$$
Formula (16) is equivalent to the standard formula for the roots of a quadratic. It is better in this case because we know that $c = f_2$.

To ensure stability of the method, we choose the root in (16) that has the smallest absolute value. If $b > 0$, use the positive sign with the square root, and if $b < 0$, use the negative sign. Then $p_3$ is shown in Figure 2.17 and is given by
$$p_3 = p_2 + z. \tag{17}$$
To update the iterates, choose $p_0$ and $p_1$ to be the two values selected from among $\{p_0, p_1, p_2\}$ that lie closest to $p_3$ (i.e., throw out the one that is farthest away). Then replace $p_2$ with $p_3$. Although a lot of auxiliary calculations are done in Muller's method, it only requires one function evaluation per iteration.

If Muller's method is used to find the real roots of $f(x) = 0$, it is possible that one may encounter complex approximations, because the roots of the quadratic in (16) might be complex (nonzero imaginary components). In these cases the imaginary components will have a small magnitude and can be set equal to zero so that the calculations proceed with real numbers.
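Formulas (9) through (17) fit in one short loop. This is a hedged Python sketch, not a transcription of any program in the book. The name `muller` is hypothetical. It keeps all arithmetic complex through `cmath.sqrt` instead of zeroing small imaginary parts, and it picks the sign in (16) by taking the larger of $|b + \sqrt{\cdot}|$ and $|b - \sqrt{\cdot}|$. For real $b$ this reduces to the sign rule in the text.

```python
import cmath

def muller(f, p0, p1, p2, tol=1e-12, max_iter=50):
    """Muller's method following (9)-(17); returns (approximate zero, list of iterates)."""
    pts = [complex(p0), complex(p1), complex(p2)]
    history = []
    for _ in range(max_iter):
        q0, q1, q2 = pts
        h0, h1 = q0 - q2, q1 - q2                      # differences (10)
        c = f(q2)                                      # (13): c = f2
        e0, e1 = f(q0) - c, f(q1) - c
        den = h1 * h0**2 - h0 * h1**2
        if den == 0:                                   # repeated points: no unique parabola
            break
        a = (e0 * h1 - e1 * h0) / den                  # coefficients (15)
        b = (e1 * h0**2 - e0 * h1**2) / den
        root = cmath.sqrt(b * b - 4 * a * c)
        # The larger of |b + root| and |b - root| gives the smaller |z| in (16);
        # for real b this is the sign rule stated in the text.
        denom = b + root if abs(b + root) >= abs(b - root) else b - root
        if denom == 0:
            break
        z = -2 * c / denom                             # (16)
        q3 = q2 + z                                    # (17)
        history.append(q3)
        if abs(z) < tol or f(q3) == 0:
            break
        # Keep the two old points nearest q3 (discard the farthest); q3 becomes the new p2.
        near = sorted(pts, key=lambda t: abs(t - q3))[:2]
        pts = [near[0], near[1], q3]
    return (history[-1] if history else pts[2]), history

f = lambda x: x**3 - 3 * x + 2
r, hist = muller(f, -2.6, -2.5, -2.4)
print(hist[0], r)                # first new point p3, and the final approximation

# Complex arithmetic lets real starting values reach a complex zero, e.g. of x^2 + 1.
g = lambda x: x * x + 1
z_root, _ = muller(g, 0.5, 1.0, 1.5)
print(z_root)
```

Only one new function value, $f(p_3)$, is needed per step. The sketch re-evaluates $f$ at the retained points for readability, and caching those values would match the one-evaluation count in the text.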
Table 2.12 Comparison of Convergence near a Simple Root
k  Secant method    Muller's method  Newton's method  Steffensen with Newton
0  -2.600000000     -2.600000000     -2.400000000     -2.400000000
1  -2.400000000     -2.500000000     -2.076190476     -2.076190476
2  -2.106598985     -2.400000000     -2.003596011     -2.003596011
3  -2.022641412     -1.985275287     -2.000008589     -1.982618143
4  -2.001511098     …                -2.000000000     -2.000204982
5  -2.000022537     -2.000000018                      -2.000000028
6  -2.000000022     -2.000000000                      …
7  -2.000000000                                       …

Table 2.13 Comparison of Convergence near a Double Root
k  Secant method    Muller's method  Newton's method  Steffensen with Newton
0  1.400000000      1.400000000      1.200000000      1.200000000
1  1.200000000      1.300000000      1.103030303      1.103030303
2  1.138461538      1.200000000      1.052356417      1.052356417
3  1.083873738      1.003076923      1.026400812      0.996890433
4  …                …                1.013257730      0.998446023
5  1.032853156      1.000027140      …                0.999223213
6  1.020429426      …                1.003325375      0.999999193
7  1.012688627      …                1.001663607      0.999999597
8  1.007852128      …                1.000832034      0.999999798

Comparison of Methods

Steffensen's method can be used together with the Newton-Raphson fixed-point function $g(x) = x - f(x)/f'(x)$. In the next two examples we look at the roots of the polynomial $f(x) = x^3 - 3x + 2$. The Newton-Raphson function is $g(x) = (2x^3 - 2)/(3x^2 - 3)$. When this function is used in Program 2.7, we get the calculations under the heading Steffensen with Newton in Tables 2.12 and 2.13. For example, starting with $p_0 = -2.4$, we would compute
$$p_1 = g(p_0) = -2.076190476, \tag{18}$$
and
$$p_2 = g(p_1) = -2.003596011. \tag{19}$$
Then Aitken's improvement will give $p_3 = -1.982618143$.

Example 2.19 (Convergence near a Simple Root). This is a comparison of methods for the function $f(x) = x^3 - 3x + 2$ near the simple root $p = -2$. Newton's method and the secant method for this function were given in Examples 2.14 and 2.16, respectively. Table 2.12 provides a summary of calculations for the methods. •

Example 2.20 (Convergence near a Double Root).
This is a comparison of the methods for the function $f(x) = x^3 - 3x + 2$ near the double root $p = 1$. Table 2.13 provides a summary of calculations. •

Newton's method is the best choice for finding a simple root (see Table 2.12). At a double root, either Muller's method or Steffensen's method with the Newton-Raphson formula is a good choice (see Table 2.13). Note that in Aitken's acceleration formula (4), division by zero can occur as the sequence $\{p_k\}$ converges. In this case, the last calculated approximation to zero should be used as the approximation to the zero of $f$.

In the following program the sequence $\{p_k\}$, generated by Steffensen's method with the Newton-Raphson formula, is stored in a matrix Q that has max1 rows and three columns. The first column of Q contains the initial approximation to the root, $p_0$, and the terms $p_3, p_6, \ldots, p_{3k}, \ldots$ generated by Aitken's acceleration method (4). The second and third columns of Q contain the terms generated by Newton's method. The stopping criteria in the program are based on the difference between consecutive terms from the first column of Q.

Program 2.7 (Steffensen's Acceleration). To quickly find a solution of the fixed-point equation $x = g(x)$ given an initial approximation $p_0$, where it is assumed that both $g(x)$ and $g'(x)$ are continuous, $|g'(x)| < 1$, and that ordinary fixed-point iteration converges slowly (linearly) to $p$.

function [p,Q]=steff(f,df,p0,delta,epsilon,max1)
%Input  - f is the object function input as a string 'f'
%       - df is the derivative of f input as a string 'df'
%       - p0 is the initial approximation to a zero of f
%       - delta is the tolerance for p0
%       - epsilon is the tolerance for the function values y
%       - max1 is the maximum number of iterations
%Output - p is the Steffensen approximation to the zero
%       - Q is the matrix containing the Steffensen sequence
%Initialize the matrix R
R=zeros(max1,3);
R(1,1)=p0;
p=p0;
for k=1:max1
   for j=2:3
      %Denominator in Newton-Raphson method is calculated
      nrdenom=feval(df,R(k,j-1));
      %Calculate Newton-Raphson approximations
      if nrdenom==0
         disp('division by zero in Newton-Raphson method')
         break
      else
         R(k,j)=R(k,j-1)-feval(f,R(k,j-1))/nrdenom;
      end
   end
   %End program if division by zero occurred
   if nrdenom==0
      break
   end
   %Denominator in Aitken's Acceleration process is calculated
   %once both Newton steps of row k are available
   aadenom=R(k,3)-2*R(k,2)+R(k,1);
   if aadenom==0
      disp('division by zero in Aitken''s Acceleration')
      break
   end
   %Calculate Aitken's Acceleration approximation
   R(k+1,1)=R(k,1)-(R(k,2)-R(k,1))^2/aadenom;
   %Stopping criteria are evaluated
   err=abs(R(k,1)-R(k+1,1));
   relerr=err/(abs(R(k+1,1))+delta);
   y=feval(f,R(k+1,1));
   p=R(k+1,1);
   if (err<delta)||(relerr<delta)||(abs(y)<epsilon)
      break
   end
end
%Rows of R filled so far form the Steffensen sequence
Q=R(1:min(k+1,size(R,1)),:);

…

… stretches the vector when $|c| > 1$ and shrinks the vector when $|c| < 1$. This is shown by using equation (10):
$$\|cX\| = \left(c^2x_1^2 + c^2x_2^2 + \cdots + c^2x_N^2\right)^{1/2} = |c|\left(x_1^2 + x_2^2 + \cdots + x_N^2\right)^{1/2} = |c|\,\|X\|.$$

An important relationship exists between the dot product and norm of a vector. If both sides of equation (10) are squared and equation (9) is used, with $Y$ being replaced with $X$, we have
$$\|X\|^2 = x_1^2 + x_2^2 + \cdots + x_N^2 = X \cdot X. \tag{11}$$
If $X$ and $Y$ are position vectors that locate the two points $(x_1, x_2, \ldots, x_N)$ and $(y_1, y_2, \ldots, y_N)$ in $N$-dimensional space, then the displacement vector from $X$ to $Y$ is given by the difference
$$Y - X \quad \text{(displacement from position } X \text{ to position } Y\text{)}. \tag{12}$$
Notice that if a particle starts at position $X$ and moves through the displacement $Y - X$, its new position is $Y$. This can be obtained by the following vector sum:
$$Y = X + (Y - X). \tag{13}$$
Using equations (10) and (13), we can write down the formula for the distance between two points in $N$-space.
$$\|Y - X\| = \left((y_1 - x_1)^2 + (y_2 - x_2)^2 + \cdots + (y_N - x_N)^2\right)^{1/2}. \tag{15}$$
When the distance between points is computed using formula (15), we say that the points lie in $N$-dimensional Euclidean space.

Example 3.1. Let $X = (2, -3, 5, -1)$ and $Y = (6, 1, 2, -4)$. The concepts mentioned above are now illustrated for vectors in 4-space.
Sum: $X + Y = (8, -2, 7, -5)$
Difference: $X - Y = (-4, -4, 3, 3)$
Scalar multiple: $3X = (6, -9, 15, -3)$
Length: $\|X\| = (4 + 9 + 25 + 1)^{1/2} = 39^{1/2}$
Dot product: $X \cdot Y = 12 - 3 + 10 + 4 = 23$
Displacement from $X$ to $Y$: $Y - X = (4, 4, -3, -3)$
Distance from $X$ to $Y$: $\|Y - X\| = (16 + 16 + 9 + 9)^{1/2} = 50^{1/2}$ •

It is sometimes useful to write vectors as columns instead of rows. For example,
$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix} \quad\text{and}\quad Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}. \tag{16}$$
Then the linear combination $cX + dY$ is
$$cX + dY = \begin{bmatrix} cx_1 + dy_1 \\ cx_2 + dy_2 \\ \vdots \\ cx_N + dy_N \end{bmatrix}. \tag{17}$$
By choosing $c$ and $d$ appropriately in equation (17), we have the sum $1X + 1Y$, the difference $1X - 1Y$, and the scalar multiple $cX + 0Y$. We use the superscript $'$ for transpose to indicate that a row vector should be converted to a column vector, and vice versa:
$$(x_1, x_2, \ldots, x_N)' = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}' = (x_1, x_2, \ldots, x_N). \tag{18}$$
The set of vectors has a zero element $0$, which is defined by
$$0 = (0, 0, \ldots, 0). \tag{19}$$

Theorem 3.1 (Vector Algebra). Suppose that $X$, $Y$, and $Z$ are $N$-dimensional vectors and $a$ and $b$ are scalars (real numbers). The following properties of vector addition and scalar multiplication hold:
(20) $Y + X = X + Y$ (commutative property)
(21) $0 + X = X + 0$ (additive identity)
(22) $X - X = X + (-X) = 0$ (additive inverse)
(23) $(X + Y) + Z = X + (Y + Z)$ (associative property)
(24) $(a + b)X = aX + bX$ (distributive property for scalars)
(25) $a(X + Y) = aX + aY$ (distributive property for vectors)
(26) $a(bX) = (ab)X$ (associative property for scalars)

Matrices and Two-dimensional Arrays

A matrix is a rectangular array of numbers that is arranged systematically in rows and columns. A matrix having $M$ rows and $N$ columns is called an $M \times N$ (read "$M$ by $N$") matrix. The capital letter $A$ denotes a matrix, and the lowercase subscripted letter $a_{ij}$ denotes one of the numbers forming the matrix.
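We write
$$A = [a_{ij}]_{M \times N} \quad\text{for } 1 \le i \le M,\ 1 \le j \le N, \tag{27}$$
where $a_{ij}$ is the number in location $(i, j)$ (i.e., stored in the $i$th row and $j$th column of the matrix). We refer to $a_{ij}$ as the element in location $(i, j)$. In expanded form we write
$$A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1N}\\
a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2N}\\
\vdots & \vdots & & \vdots & & \vdots\\
a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{iN}\\
\vdots & \vdots & & \vdots & & \vdots\\
a_{M1} & a_{M2} & \cdots & a_{Mj} & \cdots & a_{MN}
\end{bmatrix}, \tag{28}$$
where the entries $a_{i1}, \ldots, a_{iN}$ form row $i$ and the entries $a_{1j}, \ldots, a_{Mj}$ form column $j$.

The rows of the $M \times N$ matrix $A$ are $N$-dimensional vectors:
$$V_i = (a_{i1}, a_{i2}, \ldots, a_{iN}) \quad\text{for } i = 1, 2, \ldots, M. \tag{29}$$
The row vectors in (29) can also be viewed as $1 \times N$ matrices. Here we have sliced the $M \times N$ matrix $A$ into $M$ pieces (submatrices) that are $1 \times N$ matrices. In this case we could express $A$ as an $M \times 1$ matrix consisting of the $1 \times N$ row matrices $V_i$; that is,
$$A = \begin{bmatrix} V_1 \\ V_2 \\ \vdots \\ V_i \\ \vdots \\ V_M \end{bmatrix}. \tag{30}$$
Similarly, the columns of the $M \times N$ matrix $A$ are $M \times 1$ matrices:
$$C_j = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{ij} \\ \vdots \\ a_{Mj} \end{bmatrix} \quad\text{for } j = 1, 2, \ldots, N. \tag{31}$$
In this case we could express $A$ as a $1 \times N$ matrix consisting of the $M \times 1$ column matrices $C_j$:
$$A = [C_1 \;\; C_2 \;\; \cdots \;\; C_j \;\; \cdots \;\; C_N]. \tag{32}$$

Example 3.2. Identify the row and column matrices associated with the $4 \times 3$ matrix
$$A = \begin{bmatrix} -2 & 4 & 9 \\ 5 & -7 & 1 \\ 0 & -3 & 8 \\ -4 & 6 & -5 \end{bmatrix}.$$
The four row matrices are $V_1 = [-2 \;\; 4 \;\; 9]$, $V_2 = [5 \;\; {-7} \;\; 1]$, $V_3 = [0 \;\; {-3} \;\; 8]$, and $V_4 = [-4 \;\; 6 \;\; {-5}]$. The three column matrices are
$$C_1 = \begin{bmatrix} -2 \\ 5 \\ 0 \\ -4 \end{bmatrix}, \quad C_2 = \begin{bmatrix} 4 \\ -7 \\ -3 \\ 6 \end{bmatrix}, \quad\text{and}\quad C_3 = \begin{bmatrix} 9 \\ 1 \\ 8 \\ -5 \end{bmatrix}.$$
Notice how $A$ can be represented with these matrices:
$$A = \begin{bmatrix} V_1 \\ V_2 \\ V_3 \\ V_4 \end{bmatrix} = [C_1 \;\; C_2 \;\; C_3]. \quad \bullet$$

Let $A = [a_{ij}]_{M \times N}$ and $B = [b_{ij}]_{M \times N}$ be two matrices of the same dimension. The two matrices $A$ and $B$ are said to be equal if and only if each corresponding element is the same; that is,
$$A = B \quad\text{if and only if}\quad a_{ij} = b_{ij} \quad\text{for } 1 \le i \le M,\ 1 \le j \le N. \tag{33}$$

…

5. The transpose of the $M \times N$ matrix $A$, denoted $A'$, is the $N \times M$ matrix obtained from $A$ by converting the rows of $A$ to columns of $A'$. That is, if $A = [a_{ij}]_{M \times N}$ and $A' = [b_{ij}]_{N \times M}$, then the elements satisfy the relation $b_{ij} = a_{ji}$ for $1 \le i \le N$, $1 \le j \le M$. …
6. The square matrix $A$ of dimension $N \times N$ is said to be symmetric if $A = A'$ (see Exercise 5 for the definition of $A'$). Determine whether the following square matrices are symmetric. … (d) $A = [a_{ij}]_{N \times N}$, where $a_{ij} = \ldots$
7. Prove statements (20), (24), and (25) in Theorem 3.1.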
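The vector operations of Example 3.1 and the row and column slicing of Example 3.2 can be checked with plain Python lists. This is a hedged sketch. Storing a matrix as a list of rows is an assumption made for illustration, and none of the variable names come from the book.

```python
import math

X = [2, -3, 5, -1]
Y = [6, 1, 2, -4]

add = [x + y for x, y in zip(X, Y)]                     # X + Y
diff = [x - y for x, y in zip(X, Y)]                    # X - Y
scaled = [3 * x for x in X]                             # 3X
dot = sum(x * y for x, y in zip(X, Y))                  # X . Y
norm_X = math.sqrt(sum(x * x for x in X))               # ||X|| = (X . X)^(1/2), see (11)
displacement = [y - x for x, y in zip(X, Y)]            # Y - X, see (12)
distance = math.sqrt(sum(d * d for d in displacement))  # ||Y - X||, formula (15)

# Example 3.2: a 4 x 3 matrix stored as a list of rows.
A = [[-2, 4, 9], [5, -7, 1], [0, -3, 8], [-4, 6, -5]]
rows = [row[:] for row in A]                            # V_1, ..., V_4
cols = [[row[j] for row in A] for j in range(3)]        # C_1, C_2, C_3
transpose = [list(c) for c in zip(*A)]                  # rows of A' are the columns of A

print(add, diff, scaled, dot, norm_X**2, displacement, distance**2)
print(cols[1], transpose == cols)
```

The last line illustrates the transpose of Exercise 5: the rows of $A'$ are exactly the column matrices $C_j$.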
3.2 Properties of Vectors and Matrices

A linear combination of the variables $x_1, x_2, \ldots, x_N$ is a sum
$$a_1x_1 + a_2x_2 + \cdots + a_Nx_N, \tag{1}$$
where $a_k$ is the coefficient of $x_k$ for $k = 1, 2, \ldots, N$.

A linear equation in $x_1, x_2, \ldots, x_N$ is obtained by requiring the linear combination in (1) to take on a prescribed value $b$; that is,
$$a_1x_1 + a_2x_2 + \cdots + a_Nx_N = b. \tag{2}$$
Systems of linear equations arise frequently, and if $M$ equations in $N$ unknowns are given, we write
$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1N}x_N &= b_1\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2N}x_N &= b_2\\
&\;\;\vdots\\
a_{k1}x_1 + a_{k2}x_2 + \cdots + a_{kN}x_N &= b_k\\
&\;\;\vdots\\
a_{M1}x_1 + a_{M2}x_2 + \cdots + a_{MN}x_N &= b_M.
\end{aligned} \tag{3}$$
To keep track of the different coefficients in each equation, it is necessary to use the two subscripts $(k, j)$. The first subscript locates equation $k$ and the second subscript locates the variable $x_j$.

A solution to (3) is a set of numerical values $x_1, x_2, \ldots, x_N$ that satisfies all the equations in (3) simultaneously. Hence a solution can be viewed as an $N$-dimensional vector:
$$X = (x_1, x_2, \ldots, x_N). \tag{4}$$

Example 3.4. Concrete (used for sidewalks, etc.) is a mixture of portland cement, sand, and gravel. A distributor has three batches available for contractors. Batch 1 contains cement, sand, and gravel mixed in the proportions 1/8, 3/8, 4/8; batch 2 has the proportions 2/10, 5/10, 3/10; and batch 3 has the proportions 2/5, 3/5, 0/5. Let $x_1$, $x_2$, and $x_3$ denote the amount (in cubic yards) to be used from each batch to form a mixture of 10 cubic yards. Also, suppose that the mixture is to contain $b_1 = 2.3$, $b_2 = 4.8$, and $b_3 = 2.9$ cubic yards of portland cement, sand, and gravel, respectively. Then the system of linear equations of the ingredients is
$$\begin{aligned}
0.125x_1 + 0.200x_2 + 0.400x_3 &= 2.3 \quad \text{(cement)}\\
0.375x_1 + 0.500x_2 + 0.600x_3 &= 4.8 \quad \text{(sand)}\\
0.500x_1 + 0.300x_2 + 0.000x_3 &= 2.9 \quad \text{(gravel)}
\end{aligned} \tag{5}$$
The solution to the linear system (5) is $x_1 = 4$, $x_2 = 3$, and $x_3 = 3$, which can be verified by direct substitution into the equations:
$$\begin{aligned}
(0.125)(4) + (0.200)(3) + (0.400)(3) &= 2.3,\\
(0.375)(4) + (0.500)(3) + (0.600)(3) &= 4.8,\\
(0.500)(4) + (0.300)(3) + (0.000)(3) &= 2.9. \quad \bullet
\end{aligned}$$
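To find the solution of (5) instead of only checking it, one can use Gaussian elimination with partial pivoting. That technique is developed later in the chapter. The short Python version below is an assumed textbook-standard variant, not the book's own program, and the function name `solve` is hypothetical. It reduces the augmented matrix $[A \mid B]$ to upper-triangular form and then back-substitutes.

```python
def solve(A, b):
    """Solve AX = B by Gaussian elimination with partial pivoting and back substitution."""
    n = len(A)
    M = [list(map(float, row)) + [float(bk)] for row, bk in zip(A, b)]  # augmented [A | B]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))   # row with the largest |pivot|
        if M[piv][k] == 0:
            raise ValueError("matrix is singular")
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):                            # eliminate below the pivot
            m = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= m * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):                           # back substitution
        s = sum(M[k][j] * x[j] for j in range(k + 1, n))
        x[k] = (M[k][n] - s) / M[k][k]
    return x

A = [[0.125, 0.200, 0.400],
     [0.375, 0.500, 0.600],
     [0.500, 0.300, 0.000]]
B = [2.3, 4.8, 2.9]
X = solve(A, B)
residual = [sum(a * x for a, x in zip(row, X)) - bk for row, bk in zip(A, B)]
print(X, residual)
```

The residuals $AX - B$ play the role of the direct substitution in Example 3.4. They are zero up to rounding error.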
Matrix Multiplication

Definition 3.1. If $A = [a_{ik}]_{M \times N}$ and $B = [b_{kj}]_{N \times P}$ are two matrices with the property that $A$ has as many columns as $B$ has rows, then the matrix product $AB$ is defined to be the matrix $C$ of dimension $M \times P$:
$$AB = C = [c_{ij}]_{M \times P}, \tag{6}$$
where the element $c_{ij}$ of $C$ is given by the dot product of the $i$th row of $A$ and the $j$th column of $B$:
$$c_{ij} = \sum_{k=1}^{N} a_{ik}b_{kj} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{iN}b_{Nj} \tag{7}$$
for $i = 1, 2, \ldots, M$ and $j = 1, 2, \ldots, P$. •

Example 3.5. Find the product $C = AB$ for the following matrices, and tell why $BA$ is not defined.
$$A = \begin{bmatrix} 2 & 3 \\ -1 & 4 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 5 & -2 & 1 \\ 3 & 8 & -6 \end{bmatrix}.$$
The matrix $A$ has two columns and $B$ has two rows, so the matrix product $AB$ is defined. The product of a $2 \times 2$ and a $2 \times 3$ matrix is a $2 \times 3$ matrix. Computation reveals that
$$AB = \begin{bmatrix} 2 & 3 \\ -1 & 4 \end{bmatrix}\begin{bmatrix} 5 & -2 & 1 \\ 3 & 8 & -6 \end{bmatrix} = \begin{bmatrix} 10 + 9 & -4 + 24 & 2 - 18 \\ -5 + 12 & 2 + 32 & -1 - 24 \end{bmatrix} = \begin{bmatrix} 19 & 20 & -16 \\ 7 & 34 & -25 \end{bmatrix} = C.$$
When an attempt is made to form the product $BA$, we discover that the dimensions are not compatible in this order because the rows of $B$ are three-dimensional vectors and the columns of $A$ are two-dimensional vectors. Hence the dot product of the $j$th row of $B$ and the $k$th column of $A$ is not defined. •

If it happens that $AB = BA$, we say that $A$ and $B$ commute. Most often, even when $AB$ and $BA$ are both defined, the products are not the same.

We now discuss how to use matrices to represent a linear system of equations. The linear equations in (3) can be written as a matrix product. The coefficients $a_{kj}$ are stored in a matrix $A$ (called the coefficient matrix) of dimension $M \times N$, and the unknowns $x_j$ are stored in a matrix $X$ of dimension $N \times 1$. The constants $b_k$ are stored in a matrix $B$ of dimension $M \times 1$.
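It is conventional to use column matrices for both $X$ and $B$ and write
$$AX = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1N}\\
a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2N}\\
\vdots & \vdots & & \vdots & & \vdots\\
a_{k1} & a_{k2} & \cdots & a_{kj} & \cdots & a_{kN}\\
\vdots & \vdots & & \vdots & & \vdots\\
a_{M1} & a_{M2} & \cdots & a_{Mj} & \cdots & a_{MN}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_j \\ \vdots \\ x_N \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_k \\ \vdots \\ b_M \end{bmatrix} = B. \tag{8}$$
The matrix multiplication $AX = B$ in (8) is reminiscent of the dot product for ordinary vectors, because each element $b_k$ in $B$ is the result obtained by taking the dot product of row $k$ in matrix $A$ with the column matrix $X$.

Example 3.6. Express the system of linear equations (5) in Example 3.4 as a matrix product. Use matrix multiplication to verify that $[4 \;\; 3 \;\; 3]'$ is the solution of (5).
$$\begin{bmatrix} 0.125 & 0.200 & 0.400 \\ 0.375 & 0.500 & 0.600 \\ 0.500 & 0.300 & 0.000 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2.3 \\ 4.8 \\ 2.9 \end{bmatrix}.$$
To verify that $[4 \;\; 3 \;\; 3]'$ is the solution of (5), we must show that $A[4 \;\; 3 \;\; 3]' = [2.3 \;\; 4.8 \;\; 2.9]'$:
$$\begin{bmatrix} 0.125 & 0.200 & 0.400 \\ 0.375 & 0.500 & 0.600 \\ 0.500 & 0.300 & 0.000 \end{bmatrix}\begin{bmatrix} 4 \\ 3 \\ 3 \end{bmatrix} = \begin{bmatrix} 0.5 + 0.6 + 1.2 \\ 1.5 + 1.5 + 1.8 \\ 2.0 + 0.9 + 0.0 \end{bmatrix} = \begin{bmatrix} 2.3 \\ 4.8 \\ 2.9 \end{bmatrix}. \quad \bullet$$

Some Special Matrices

The $M \times N$ matrix whose elements are all zero is called the zero matrix of dimension $M \times N$ and is denoted by
$$0 = [0]_{M \times N}. \tag{9}$$
When the dimension is clear, we use $0$ to denote the zero matrix.

The identity matrix of order $N$ is the square matrix given by
$$I_N = [\delta_{ij}]_{N \times N}, \quad\text{where}\quad \delta_{ij} = \begin{cases} 1 & \text{when } i = j, \\ 0 & \text{when } i \ne j. \end{cases} \tag{10}$$
It is the multiplicative identity, as illustrated in the next example.

Example 3.7. Let $A$ be a $2 \times 3$ matrix. Then $I_2A = AI_3 = A$. Multiplication of $A$ on the left by $I_2$ results in
$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} = \begin{bmatrix} a_{11} + 0 & a_{12} + 0 & a_{13} + 0 \\ 0 + a_{21} & 0 + a_{22} & 0 + a_{23} \end{bmatrix} = A.$$
Multiplication of $A$ on the right by $I_3$ results in
$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} a_{11} + 0 + 0 & 0 + a_{12} + 0 & 0 + 0 + a_{13} \\ a_{21} + 0 + 0 & 0 + a_{22} + 0 & 0 + 0 + a_{23} \end{bmatrix} = A. \quad \bullet$$

Some properties of matrix multiplication are given in the following theorem.

Theorem 3.3 (Matrix Multiplication). Suppose that $c$ is a scalar and that $A$,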
$B$, and $C$ are matrices such that the indicated sums and products are defined; then
(12) $(AB)C = A(BC)$ (associativity of matrix multiplication)
(13) $IA = AI = A$ (identity matrix)
(14) $A(B + C) = AB + AC$ (left distributive property)
(15) $(A + B)C = AC + BC$ (right distributive property)
(16) $c(AB) = (cA)B = A(cB)$ (scalar associative property)

The Inverse of a Nonsingular Matrix

The concept of an inverse applies to matrices, but special attention must be given. An $N \times N$ matrix $A$ is called nonsingular or invertible if there exists an $N \times N$ matrix $B$ such that
$$AB = BA = I. \tag{17}$$
If no such matrix $B$ can be found, $A$ is said to be singular. When $B$ can be found and (17) holds, we usually write $B = A^{-1}$ and use the familiar relation
$$AA^{-1} = A^{-1}A = I \quad\text{if } A \text{ is nonsingular}. \tag{18}$$
It is easy to show that at most one matrix $B$ can be found that satisfies relation (17). Suppose that $C$ is also an inverse of $A$ (i.e., $AC = CA = I$). Then properties (12) and (13) can be used to obtain
$$C = IC = (BA)C = B(AC) = BI = B.$$

Determinants

The determinant of a square matrix $A$ is a scalar quantity (real number) and is denoted by $\det(A)$ or $|A|$. If $A = [a_{ij}]$ is an $N \times N$ matrix, we write
$$\det(A) = |A| = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & \vdots & & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{NN} \end{vmatrix}.$$
Although the notation for a determinant may look like a matrix, its properties are completely different. For one, the determinant is a scalar quantity (real number). The definition of $\det(A)$ found in most linear algebra textbooks is not tractable for computation when $N > 3$. We will review how to compute determinants using the cofactor expansion method. Evaluation of higher-order determinants is done using Gaussian elimination … If $N \ge 2$, then let $M_{ij}$ be the determinant of the $(N-1) \times (N-1)$ submatrix of $A$ obtained by deleting the $i$th row and $j$th column of $A$. The determinant $M_{ij}$ is said to be the minor of $a_{ij}$. The cofactor $A_{ij}$ of $a_{ij}$ is defined as $A_{ij} = (-1)^{i+j}M_{ij}$. Then the determinant of an $N \times N$ matrix $A$ is given by
$$\det(A) = \sum_{j=1}^{N} a_{ij}A_{ij} \quad (i\text{th row expansion}) \tag{19}$$
or
$$\det(A) = \sum_{i=1}^{N} a_{ij}A_{ij} \quad (j\text{th column expansion}). \tag{20}$$
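Formulas (19) and (20) are naturally recursive. The hedged Python sketch below uses hypothetical helper names and 0-based indices, so the book's row $i = 1$ becomes index 0 and column $j = 2$ becomes index 1. The sign $(-1)^{i+j}$ is unchanged by that shift. It is applied to the matrix of Example 3.8.

```python
def minor(A, i, j):
    """Submatrix of A with row i and column j deleted (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A, i=0):
    """Determinant by cofactor expansion along row i, formula (19)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    if n == 2:
        return A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return sum((-1) ** (i + j) * A[i][j] * det(minor(A, i, j)) for j in range(n))

def det_col(A, j):
    """Determinant by cofactor expansion down column j, formula (20)."""
    n = len(A)
    return sum((-1) ** (i + j) * A[i][j] * det(minor(A, i, j)) for i in range(n))

A = [[2, 3, 8], [-4, 7, -1], [5, -6, 9]]
print(det(A, 0), det_col(A, 1))   # row 1 and column 2 in the book's numbering
```

The recursion performs on the order of $N!$ multiplications. This growth is why the text reserves cofactor expansion for small matrices and evaluates higher-order determinants by Gaussian elimination.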
2 × 2 matrix

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix},$$

we obtain $\det(A) = a_{11}a_{22} - a_{12}a_{21}$. Higher-order determinants can be handled by using formulas (19) and (20) to recursively reduce the calculation of the determinant of an N × N matrix to the calculation of a number of 2 × 2 determinants.

Example 3.8. Use formula (19) with i = 1 and formula (20) with j = 2 to calculate the determinant of the matrix

$$A = \begin{bmatrix} 2 & 3 & 8 \\ 4 & 5 & -1 \\ 7 & -6 & 9 \end{bmatrix}.$$

Using formula (19) with i = 1, we obtain

$$|A| = 2\begin{vmatrix} 5 & -1 \\ -6 & 9 \end{vmatrix} - 3\begin{vmatrix} 4 & -1 \\ 7 & 9 \end{vmatrix} + 8\begin{vmatrix} 4 & 5 \\ 7 & -6 \end{vmatrix} = 2(45 - 6) - 3(36 + 7) + 8(-24 - 35) = -523.$$

Using formula (20) with j = 2, we obtain

$$|A| = -3\begin{vmatrix} 4 & -1 \\ 7 & 9 \end{vmatrix} + 5\begin{vmatrix} 2 & 8 \\ 7 & 9 \end{vmatrix} - (-6)\begin{vmatrix} 2 & 8 \\ 4 & -1 \end{vmatrix} = -3(36 + 7) + 5(18 - 56) + 6(-2 - 32) = -523.$$

The following theorem gives sufficient conditions for the existence and uniqueness of solutions of the linear system AX = B for square coefficient matrices.

Theorem 3.4. Assume that A is an N × N matrix. The following statements are equivalent.
(21) Given any N × 1 matrix B, the linear system AX = B has a unique solution.
(22) The matrix A is nonsingular (i.e., $A^{-1}$ exists).
(23) The system of equations AX = 0 has the unique solution X = 0.
(24) $\det(A) \neq 0$.

Theorems 3.3 and 3.4 help relate matrix algebra to ordinary algebra. If statement (21) is true, then statement (22) together with properties (12) and (13) gives the following line of reasoning:

$$AX = B \quad \text{implies} \quad A^{-1}AX = A^{-1}B, \quad \text{which implies} \quad X = A^{-1}B. \tag{25}$$

Example 3.9. Use the inverse matrix $A^{-1} = \ldots$ and the reasoning in (25) to solve the linear system AX = B, where … .

Remark. In practice we never numerically calculate the inverse of a nonsingular matrix or the determinant of a square matrix. These concepts are used as theoretical "tools" to establish the existence and uniqueness of solutions or as a means to algebraically express the solution of a linear system (as in Example 3.9).

Plane Rotations

Suppose that A is a 3 × 3 matrix and $U = [x \;\; y \;\; z]'$ is a 3 × 1 matrix; then the product V = AU is another 3 × 1 matrix. This is an example of a linear transformation, and applications are found in the area of computer graphics.
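The product V = AU above is formed exactly as described for (8): each entry of V is the dot product of a row of A with the column U. The following MATLAB sketch (an illustration, not one of the book's numbered programs) reuses the matrix of Example 3.6, builds V one dot product at a time, and checks the linearity that makes U → AU a linear transformation. The test vectors U1 and U2 and the scalars a and b are arbitrary choices made only for this check.

```matlab
% Coefficient matrix and solution vector from Example 3.6
A = [0.125 0.200 0.400; 0.375 0.500 0.600; 0.500 0.300 0.000];
U = [4; 3; 3];

% Build V = AU entry by entry: V(i) is the dot product of row i of A with U
V = zeros(3, 1);
for i = 1:3
    V(i) = A(i, :) * U;      % (1 x 3) row times (3 x 1) column gives a scalar
end
disp(V')                     % agrees with [2.3 4.8 2.9] up to rounding

% Linearity of U -> AU; U1, U2, a, b are arbitrary illustrative choices
U1 = [1; 0; 2];
U2 = [-1; 5; 0.5];
a = 2;
b = -3;
lhs = A * (a*U1 + b*U2);
rhs = a*(A*U1) + b*(A*U2);
disp(norm(lhs - rhs))        % zero apart from rounding error
```

Writing the loop out is only for exposition; the single statement V = A*U does the same work.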
The matrix U is equivalent to the positional vector U = (x, y, z), which represents the coordinates of a point in three-dimensional space. Consider three special matrices:

$$R_x(\alpha) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos(\alpha) & -\sin(\alpha) \\ 0 & \sin(\alpha) & \cos(\alpha) \end{bmatrix}, \tag{26}$$

$$R_y(\beta) = \begin{bmatrix} \cos(\beta) & 0 & \sin(\beta) \\ 0 & 1 & 0 \\ -\sin(\beta) & 0 & \cos(\beta) \end{bmatrix}, \tag{27}$$

$$R_z(\gamma) = \begin{bmatrix} \cos(\gamma) & -\sin(\gamma) & 0 \\ \sin(\gamma) & \cos(\gamma) & 0 \\ 0 & 0 & 1 \end{bmatrix}. \tag{28}$$

Table 3.1 Coordinates of the Vertices of a Cube under Successive Rotations

U            V = Rz(π/4)U                       W = Ry(π/6)V
(0, 0, 0)'   (0.000000, 0.000000, 0.000000)'    (0.000000, 0.000000, 0.000000)'
(1, 0, 0)'   (0.707107, 0.707107, 0.000000)'    (0.612372, 0.707107, -0.353553)'
(0, 1, 0)'   (-0.707107, 0.707107, 0.000000)'   (-0.612372, 0.707107, 0.353553)'
(0, 0, 1)'   (0.000000, 0.000000, 1.000000)'    (0.500000, 0.000000, 0.866025)'
(1, 1, 0)'   (0.000000, 1.414214, 0.000000)'    (0.000000, 1.414214, 0.000000)'
(1, 0, 1)'   (0.707107, 0.707107, 1.000000)'    (1.112372, 0.707107, 0.512472)'
(0, 1, 1)'   (-0.707107, 0.707107, 1.000000)'   (-0.112372, 0.707107, 1.219579)'
(1, 1, 1)'   (0.000000, 1.414214, 1.000000)'    (0.500000, 1.414214, 0.866025)'

These matrices $R_x(\alpha)$, $R_y(\beta)$, and $R_z(\gamma)$ are used to rotate points about the x-, y-, and z-axes through the angles α, β, and γ, respectively. The inverses are $R_x(-\alpha)$, $R_y(-\beta)$, and $R_z(-\gamma)$, and they rotate space about the x-, y-, and z-axes through the angles −α, −β, and −γ, respectively. The next example illustrates the situation, and further investigations are left for the reader.

Example 3.10. A unit cube is situated in the first octant with one vertex at the origin. First, rotate the cube through an angle π/4 about the z-axis; then rotate this image through an angle π/6 about the y-axis. Find the images of all eight vertices of the cube.

The first rotation is given by the transformation

$$V = R_z\!\left(\tfrac{\pi}{4}\right)U = \begin{bmatrix} \cos(\frac{\pi}{4}) & -\sin(\frac{\pi}{4}) & 0 \\ \sin(\frac{\pi}{4}) & \cos(\frac{\pi}{4}) & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0.707107 & -0.707107 & 0.000000 \\ 0.707107 & 0.707107 & 0.000000 \\ 0.000000 & 0.000000 & 1.000000 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix}.$$

Then the second rotation is given by

$$W = R_y\!\left(\tfrac{\pi}{6}\right)V = \begin{bmatrix} \cos(\frac{\pi}{6}) & 0 & \sin(\frac{\pi}{6}) \\ 0 & 1 & 0 \\ -\sin(\frac{\pi}{6}) & 0 & \cos(\frac{\pi}{6}) \end{bmatrix} V = \begin{bmatrix} 0.866025 & 0.000000 & 0.500000 \\ 0.000000 & 1.000000 & 0.000000 \\ -0.500000 & 0.000000 & 0.866025 \end{bmatrix} V.$$

The composition of the two rotations is

$$W = R_y\!\left(\tfrac{\pi}{6}\right)R_z\!\left(\tfrac{\pi}{4}\right)U = \begin{bmatrix} 0.612372 & -0.612372 & 0.500000 \\ 0.707107 & 0.707107 & 0.000000 \\ -0.353553 & 0.353553 & 0.866025 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix}.$$

Figure 3.2 (a) The original starting cube. (b) V = Rz(π/4)U: rotation about the z-axis. (c) W = Ry(π/6)V: rotation about the y-axis.

Numerical computations for the coordinates of the vertices of the starting cube are given in Table 3.1 (as positional vectors), and the images of these cubes are shown in Figure 3.2(a) through (c).

MATLAB

The MATLAB functions det(A) and inv(A) calculate the determinant and inverse (if A is invertible), respectively, of a square matrix A.

Example 3.11. Use MATLAB to solve the linear system in Example 3.6. Use the inverse matrix method described in (25).

First we verify that A is nonsingular by showing that $\det(A) \neq 0$ (Theorem 3.4).

>> A=[0.125 0.200 0.400;0.375 0.500 0.600;0.500 0.300 0.000];
>> det(A)
ans =
   -0.0175

Following the reasoning in (25), the solution of AX = B is $X = A^{-1}B$.

>> X=inv(A)*[2.3 4.8 2.9]'
X =
    4.0000
    3.0000
    3.0000

We can check our solution by verifying that AX = B.

>> B=A*X
B =
    2.3000
    4.8000
    2.9000

Exercises for Properties of Vectors and Matrices

The reader is encouraged to carry out the following exercises by hand and with MATLAB.

1. Find AB and BA for the following matrices: …

3. If A, B, and C are the matrices …,
(a) Find (AB)C and A(BC).
(b) Find A(B + C) and AB + AC.
(c) Find (A + B)C and AC + BC.
(d) Find (AB)' and B'A'.

4. We use the notation $A^2 = AA$. Find $A^2$ and $B^2$ for the following matrices: …

… (see formula (26)).

7. (a) Show that

$$R_x(\alpha)R_y(\beta) = \begin{bmatrix} \cos(\beta) & 0 & \sin(\beta) \\ \sin(\alpha)\sin(\beta) & \cos(\alpha) & -\cos(\beta)\sin(\alpha) \\ -\cos(\alpha)\sin(\beta) & \sin(\alpha) & \cos(\alpha)\cos(\beta) \end{bmatrix}$$

(see formulas (26) and (27)).
(b) Show that

$$R_y(\beta)R_x(\alpha) = \begin{bmatrix} \cos(\beta) & \sin(\beta)\sin(\alpha) & \sin(\beta)\cos(\alpha) \\ 0 & \cos(\alpha) & -\sin(\alpha) \\ -\sin(\beta) & \cos(\beta)\sin(\alpha) & \cos(\beta)\cos(\alpha) \end{bmatrix}.$$

8. If A and B are nonsingular N × N matrices and C = AB, show that $C^{-1} = B^{-1}A^{-1}$. Hint. Use the associative property of matrix multiplication.

9. Prove statements (13) and (16) of Theorem 3.3.

10. Let A be an M × N matrix and X an N × 1 matrix.
(a) How many multiplications are needed to calculate AX?
(b) How many additions are needed to calculate AX?

11. Let A be an M × N matrix, and let B and C be N × P matrices. Prove the left distributive property A(B + C) = AB + AC.

…

The first column of Table 3.1 contains the coordinates of the vertices of a unit cube situated in the first octant with one vertex at the origin. Note that all eight vertices can be stored in a matrix U of dimension 8 × 3, where each row represents the coordinates of one of the vertices. It follows from Exercise 14 that the product of U and the transpose of $R_z(\pi/4)$ will produce a matrix of dimension 8 × 3 (representing the second column of Table 3.1, where each row represents the transformation of the corresponding row in U). Combining this idea with Exercise 15, it follows that the coordinates of the vertices of a cube under any number of successive rotations can be represented by a matrix product.

Figure 3.3 (a) The original starting cube. (b) V = Ry(π/6)U: rotation about the y-axis. (c) W = Rz(π/4)V: rotation about the z-axis.

1. A unit cube is situated in the first octant with one vertex at the origin. First, rotate the cube through an angle of π/6 about the y-axis; then rotate this image through an angle of π/4 about the z-axis. Find the images of all eight vertices of the starting cube. Compare this result with the result in Example 3.10. What is different? Explain your answer using the fact that, in general, matrix multiplication is not commutative.
(See Figure 3.3(a) to (c).) Use the plot3 command to plot each of the three cubes.

2. A unit cube is situated in the first octant with one vertex at the origin. First, rotate the cube through an angle of π/12 about the x-axis; then rotate this image through an angle of π/6 about the z-axis. Find the images of all eight vertices of the starting cube. Use the plot3 command to plot each of the three cubes.

3. The tetrahedron with vertices at (0, 0, 0), (1, 0, 0), (0, 1, 0), and (0, 0, 1) is rotated through an angle of 0.15 radian about the y-axis, then through an angle of … radians about the z-axis, and finally through an angle of 2.7 radians about the x-axis. Find the images of all four vertices. Use the plot3 command to plot each of the four images.

Upper-triangular Linear Systems

We will now develop the back-substitution algorithm, which is useful for solving a linear system of equations that has an upper-triangular coefficient matrix. This algorithm will be incorporated in the algorithm for solving a general linear system in Section 3.4.

Definition 3.2. An N × N matrix $A = [a_{ij}]$ is called upper triangular provided that the elements satisfy $a_{ij} = 0$ whenever i > j. The N × N matrix $A = [a_{ij}]$ is called lower triangular provided that $a_{ij} = 0$ whenever i < j.

We will develop a method for constructing the solution to upper-triangular linear systems of equations and leave the investigation of lower-triangular systems to the reader. If A is an upper-triangular matrix, then AX = B is said to be an upper-triangular system of linear equations and has the form

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1,N-1}x_{N-1} + a_{1N}x_N &= b_1 \\
a_{22}x_2 + a_{23}x_3 + \cdots + a_{2,N-1}x_{N-1} + a_{2N}x_N &= b_2 \\
a_{33}x_3 + \cdots + a_{3,N-1}x_{N-1} + a_{3N}x_N &= b_3 \\
&\;\;\vdots \\
a_{N-1,N-1}x_{N-1} + a_{N-1,N}x_N &= b_{N-1} \\
a_{NN}x_N &= b_N.
\end{aligned} \tag{1}$$

Theorem 3.5 (Back Substitution). Suppose that AX = B is an upper-triangular system with the form given in (1). If

$$a_{kk} \neq 0 \quad \text{for } k = 1, 2, \ldots, N, \tag{2}$$

then there exists a unique solution to (1).

Constructive Proof. The solution is easy to find. The last equation involves only $x_N$,
so we solve it first:

$$x_N = \frac{b_N}{a_{NN}}. \tag{3}$$

Now $x_N$ is known and it can be used in the next-to-last equation:

$$x_{N-1} = \frac{b_{N-1} - a_{N-1,N}x_N}{a_{N-1,N-1}}. \tag{4}$$

Now $x_N$ and $x_{N-1}$ are used to find $x_{N-2}$:

$$x_{N-2} = \frac{b_{N-2} - a_{N-2,N-1}x_{N-1} - a_{N-2,N}x_N}{a_{N-2,N-2}}. \tag{5}$$

Once the values $x_N, x_{N-1}, \ldots, x_{k+1}$ are known, the general step is

$$x_k = \frac{b_k - \sum_{j=k+1}^{N} a_{kj}x_j}{a_{kk}} \quad \text{for } k = N-1, N-2, \ldots, 1. \tag{6}$$

The uniqueness of the solution is easy to see. The Nth equation implies that $b_N/a_{NN}$ is the only possible value of $x_N$. Then finite induction is used to establish that $x_{N-1}, x_{N-2}, \ldots, x_1$ are unique. •

Example 3.12. Use back substitution to solve the linear system

$$\begin{aligned}
4x_1 - x_2 + 2x_3 + 3x_4 &= 20 \\
-2x_2 + 7x_3 - 4x_4 &= -7 \\
6x_3 + 5x_4 &= 4 \\
3x_4 &= 6.
\end{aligned}$$

Solving for $x_4$ in the last equation yields

$$x_4 = \frac{6}{3} = 2.$$

Using $x_4 = 2$ in the third equation, we obtain

$$x_3 = \frac{4 - 5(2)}{6} = -1.$$

Now $x_3 = -1$ and $x_4 = 2$ are used to find $x_2$ in the second equation:

$$x_2 = \frac{-7 - 7(-1) + 4(2)}{-2} = -4.$$

Finally, $x_1$ is obtained using the first equation:

$$x_1 = \frac{20 + 1(-4) - 2(-1) - 3(2)}{4} = 3.$$

The condition that $a_{kk} \neq 0$ is essential because equation (6) involves division by $a_{kk}$. If this requirement is not fulfilled, either no solution exists or infinitely many solutions exist.

Example 3.13. Show that there is no solution to the linear system

$$\begin{aligned}
4x_1 - x_2 + 2x_3 + 3x_4 &= 20 \\
0x_2 + 7x_3 - 4x_4 &= -7 \\
6x_3 + 5x_4 &= 4 \\
3x_4 &= 6.
\end{aligned} \tag{7}$$

Using the last equation in (7), we must have $x_4 = 2$, which is substituted into the second and third equations to obtain

$$\begin{aligned} 7x_3 - 8 &= -7 \\ 6x_3 + 10 &= 4. \end{aligned} \tag{8}$$

The first equation in (8) implies that $x_3 = 1/7$, and the second equation implies that $x_3 = -1$. This contradiction leads to the conclusion that there is no solution to the linear system (7).

Example 3.14. Show that there are infinitely many solutions to

$$\begin{aligned}
4x_1 - x_2 + 2x_3 + 3x_4 &= 20 \\
0x_2 + 7x_3 + 0x_4 &= -7 \\
6x_3 + 5x_4 &= 4 \\
3x_4 &= 6.
\end{aligned} \tag{9}$$

Using the last equation in (9), we must have $x_4 = 2$, which is substituted into the second and third equations to get $x_3 = -1$, which checks out in both equations.
But only two values $x_3$ and $x_4$ have been obtained from the second through fourth equations, and when they are substituted into the first equation of (9), the result is

$$4x_1 - x_2 = 16, \tag{10}$$

which has infinitely many solutions; hence (9) has infinitely many solutions. If we choose a value of $x_1$ in (10), then the value of $x_2$ is uniquely determined. For example, if we include the equation $x_1 = 2$ in the system (9), then from (10) we compute $x_2 = -8$.

Theorem 3.4 states that the linear system AX = B, where A is an N × N matrix, has a unique solution if and only if $\det(A) \neq 0$. The following theorem states that if any entry on the main diagonal of an upper- or lower-triangular matrix is zero, then $\det(A) = 0$. Thus, by inspecting the coefficient matrices in the previous three examples, it is clear that the system in Example 3.12 has a unique solution, and the systems in Examples 3.13 and 3.14 do not have unique solutions. The proof of Theorem 3.6 can be found in most introductory linear algebra textbooks.

Theorem 3.6. If the N × N matrix $A = [a_{ij}]$ is either upper or lower triangular, then

$$\det(A) = a_{11}a_{22}\cdots a_{NN} = \prod_{i=1}^{N} a_{ii}. \tag{11}$$

The value of the determinant for the coefficient matrix in Example 3.12 is $\det(A) = 4(-2)(6)(3) = -144$. The values of the determinants of the coefficient matrices in Examples 3.13 and 3.14 are both $4(0)(6)(3) = 0$.

The following program will solve the upper-triangular system (1) by the method of back substitution, provided $a_{kk} \neq 0$ for k = 1, 2, ..., N.

Program 3.1 (Back Substitution). To solve the upper-triangular system AX = B by the method of back substitution. Proceed with the method only if all the diagonal elements are nonzero. First compute $x_N = b_N/a_{NN}$ and then use the rule

$$x_k = \frac{b_k - \sum_{j=k+1}^{N} a_{kj}x_j}{a_{kk}} \quad \text{for } k = N-1, N-2, \ldots, 1.$$

function X=backsub(A,B)
% Input  - A is an n x n upper-triangular nonsingular matrix
%        - B is an n x 1 matrix
% Output - X is the solution to the linear system AX = B
% Find the dimension of B and initialize X
n=length(B);
X=zeros(n,1);
% Solve the last equation first, then work upward row by row
X(n)=B(n)/A(n,n);
for k=n-1:-1:1
   % Subtract the known terms a(k,j)*x(j), j > k, then divide by a(k,k)
   X(k)=(B(k)-A(k,k+1:n)*X(k+1:n))/A(k,k);
end

Exercises for Upper-triangular Linear Systems

In Exercises 1 through 3, solve the upper-triangular system and find the value of the determinant of the coefficient matrix.

1. …
2. …
3. …

4. (a) Consider the two upper-triangular matrices

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ 0 & b_{22} & b_{23} \\ 0 & 0 & b_{33} \end{bmatrix}.$$

Show that their product C = AB is also upper triangular.
(b) Let A and B be two N × N upper-triangular matrices. Show that their product is also upper triangular.

5. Solve the lower-triangular system AX = B and find det(A). …

6. Solve the lower-triangular system AX = B and find det(A). …

7. Show that back substitution requires N divisions, $(N^2 - N)/2$ multiplications, and $(N^2 - N)/2$ additions or subtractions. Hint. You can use the formula $\sum_{k=1}^{M} k = M(M+1)/2$.

Algorithms and Programs

1. Use Program 3.1 to solve the system UX = B, where

$$U = [u_{ij}]_{10 \times 10}, \quad u_{ij} = \begin{cases} \cos(ij) & i \le j, \\ 0 & i > j, \end{cases} \qquad B = [b_{i1}]_{10 \times 1}, \quad b_{i1} = \tan(i).$$

2. Forward-substitution algorithm. A linear system AX = B is called lower triangular provided that $a_{ij} = 0$ when i < j. Construct a program forsub, analogous to Program 3.1, to solve the following lower-triangular system. Remark. This program will be used in Section 3.5.

$$\begin{aligned}
a_{11}x_1 &= b_1 \\
a_{21}x_1 + a_{22}x_2 &= b_2 \\
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 &= b_3 \\
&\;\;\vdots \\
a_{N1}x_1 + a_{N2}x_2 + a_{N3}x_3 + \cdots + a_{NN}x_N &= b_N
\end{aligned}$$

3. Use forsub to solve the system LX = B, where $L = [l_{ij}]_{20 \times 20}$ and … .

Gaussian Elimination and Pivoting

In this section we develop a scheme for solving a general system AX = B of N equations and N unknowns. The goal is to construct an equivalent upper-triangular system UX = Y that can be solved by the method of Section 3.3.

Two linear systems of dimension N × N are said to be equivalent provided that their solution sets are the same.
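Equivalence in this sense can be checked numerically for a particular system: solve the original system and the transformed one and compare. The MATLAB sketch below assumes the system of Example 3.6, applies an interchange of two equations, a scaling by a nonzero constant, and a replacement to the augmented matrix [A B], and then solves again. MATLAB's backslash operator is used only as a convenient reference solver; the particular constants 4 and 0.25 are arbitrary.

```matlab
% System of Example 3.6; its solution is X = [4; 3; 3]
A = [0.125 0.200 0.400; 0.375 0.500 0.600; 0.500 0.300 0.000];
B = [2.3; 4.8; 2.9];
X = A \ B;                                 % reference solution of the original system

% Apply three transformations to the augmented matrix [A B]
Aug = [A B];
Aug([1 3], :) = Aug([3 1], :);             % interchange equations 1 and 3
Aug(2, :) = 4 * Aug(2, :);                 % multiply equation 2 by the nonzero constant 4
Aug(3, :) = Aug(3, :) - 0.25 * Aug(1, :);  % subtract 0.25 times equation 1 from equation 3

% Solve the transformed system and compare the two solutions side by side
X2 = Aug(:, 1:3) \ Aug(:, 4);
disp([X X2])
```

Both columns of the display agree up to rounding, which is what "equivalent" promises.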
Theorems from linear algebra show that when certain transformations are applied to a given system, the solution sets do not change.

Theorem 3.7 (Elementary Transformations). The following operations applied to a linear system yield an equivalent system:
(1) Interchanges: The order of two equations can be changed.
(2) Scaling: Multiplying an equation by a nonzero constant.
(3) Replacement: An equation can be replaced by the sum of itself and a nonzero multiple of any other equation.

It is common to use (3) by replacing an equation with the difference of that equation and a multiple of another equation. These concepts are illustrated in the next example.

Example 3.15. Find the parabola $y = A + Bx + Cx^2$ that passes through the three points (1, 1), (2, −1), and (3, 1).

For each point we obtain an equation relating the value of x to the value of y. The result is the linear system

$$\begin{aligned}
A + B + C &= 1 && \text{at } (1, 1) \\
A + 2B + 4C &= -1 && \text{at } (2, -1) \\
A + 3B + 9C &= 1 && \text{at } (3, 1).
\end{aligned} \tag{4}$$

The variable A is eliminated from the second and third equations by subtracting the first equation from them. This is an application of the replacement transformation (3), and the resulting equivalent linear system is

$$\begin{aligned} A + B + C &= 1 \\ B + 3C &= -2 \\ 2B + 8C &= 0. \end{aligned} \tag{5}$$

The variable B is eliminated from the third equation in (5) by subtracting from it two times the second equation. We arrive at the equivalent upper-triangular system:

$$\begin{aligned} A + B + C &= 1 \\ B + 3C &= -2 \\ 2C &= 4. \end{aligned} \tag{6}$$

The back-substitution algorithm is now used to find the coefficients C = 4/2 = 2, B = −2 − 3(2) = −8, and A = 1 − (−8) − 2 = 7, and the equation of the parabola is $y = 7 - 8x + 2x^2$.

It is efficient to store all the coefficients of the linear system AX = B in an array of dimension N × (N + 1). The coefficients of B are stored in column N + 1 of the array (i.e., $a_{k,N+1} = b_k$). Each row contains all the coefficients necessary to represent an equation in the linear system. The augmented matrix is denoted [A|B] and the
linear system is represented as follows:

$$[A|B] = \left[\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1N} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2N} & b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{NN} & b_N \end{array}\right]. \tag{7}$$

The system AX = B, with augmented matrix given in (7), can be solved by performing row operations on the augmented matrix [A|B]. The variables $x_k$ are placeholders for the coefficients and can be omitted until the end of the calculation.

Theorem 3.8 (Elementary Row Operations). The following operations applied to the augmented matrix (7) yield an equivalent linear system.
(8) Interchanges: The order of two rows can be changed.
(9) Scaling: Multiplying a row by a nonzero constant.
(10) Replacement: The row can be replaced by the sum of that row and a nonzero multiple of any other row; that is, $\text{row}_r = \text{row}_r - m_{rp} \times \text{row}_p$.

It is common to use (10) by replacing a row with the difference of that row and a multiple of another row.

Definition 3.3 (Pivot). The number $a_{rr}$ in the coefficient matrix A that is used to eliminate $a_{kr}$, where k = r + 1, r + 2, ..., N, is called the rth pivotal element, and the rth row is called the pivot row.

The following example illustrates how to use the operations in Theorem 3.8 to obtain an equivalent upper-triangular system UX = Y from a linear system AX = B, where A is an N × N matrix.

Example 3.16. Express the following system in augmented matrix form and find an equivalent upper-triangular system and the solution.

$$\begin{aligned}
x_1 + 2x_2 + x_3 + 4x_4 &= 13 \\
2x_1 + 0x_2 + 4x_3 + 3x_4 &= 28 \\
4x_1 + 2x_2 + 2x_3 + x_4 &= 20 \\
-3x_1 + x_2 + 3x_3 + 2x_4 &= 6
\end{aligned}$$

The augmented matrix is

$$\begin{array}{r} \text{pivot} \to \\ m_{21} = 2 \\ m_{31} = 4 \\ m_{41} = -3 \end{array} \left[\begin{array}{rrrr|r} 1 & 2 & 1 & 4 & 13 \\ 2 & 0 & 4 & 3 & 28 \\ 4 & 2 & 2 & 1 & 20 \\ -3 & 1 & 3 & 2 & 6 \end{array}\right].$$

The first row is used to eliminate elements in the first column below the diagonal. We refer to the first row as the pivotal row and the element $a_{11} = 1$ is called the pivotal element. The values $m_{k1}$ are the multiples of row 1 that are to be subtracted from row k for k = 2, 3, 4. The result after elimination is

$$\begin{array}{r} {} \\ \text{pivot} \to \\ m_{32} = 1.5 \\ m_{42} = -1.75 \end{array} \left[\begin{array}{rrrr|r} 1 & 2 & 1 & 4 & 13 \\ 0 & -4 & 2 & -5 & 2 \\ 0 & -6 & -2 & -15 & -32 \\ 0 & 7 & 6 & 14 & 45 \end{array}\right].$$

The second row is used to eliminate elements in the second column that lie below the diagonal.
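The second row is the pivotal row and the values $m_{k2}$ are the multiples of row 2 that are to be subtracted from row k for k = 3, 4. The result after elimination is

$$\begin{array}{r} {} \\ {} \\ \text{pivot} \to \\ m_{43} = -1.9 \end{array} \left[\begin{array}{rrrr|r} 1 & 2 & 1 & 4 & 13 \\ 0 & -4 & 2 & -5 & 2 \\ 0 & 0 & -5 & -7.5 & -35 \\ 0 & 0 & 9.5 & 5.25 & 48.5 \end{array}\right].$$

Finally, the multiple $m_{43} = -1.9$ of the third row is subtracted from the fourth row, and the result is the upper-triangular system

$$\left[\begin{array}{rrrr|r} 1 & 2 & 1 & 4 & 13 \\ 0 & -4 & 2 & -5 & 2 \\ 0 & 0 & -5 & -7.5 & -35 \\ 0 & 0 & 0 & -9 & -18 \end{array}\right]. \tag{11}$$

The back-substitution algorithm can be used to solve (11), and we get

$$x_4 = 2, \quad x_3 = 4, \quad x_2 = -1, \quad x_1 = 3.$$

The process described above is called Gaussian elimination and must be modified so that it can be used in most circumstances. If $a_{kk} = 0$, row k cannot be used to eliminate the elements in column k, and row k must be interchanged with some row below the diagonal to obtain a nonzero pivot element. If this cannot be done, then the coefficient matrix of the system of linear equations is singular, and the system does not have a unique solution.

Theorem 3.9 (Gaussian Elimination with Back Substitution). If A is an N × N nonsingular matrix, then there exists a system UX = Y, equivalent to AX = B, where U is an upper-triangular matrix with $u_{kk} \neq 0$. After U and Y are constructed, back substitution can be used to solve UX = Y for X.

Proof. We will use the augmented matrix with B stored in column N + 1:

$$AX = \begin{bmatrix} a^{(1)}_{11} & a^{(1)}_{12} & \cdots & a^{(1)}_{1N} \\ a^{(1)}_{21} & a^{(1)}_{22} & \cdots & a^{(1)}_{2N} \\ \vdots & \vdots & & \vdots \\ a^{(1)}_{N1} & a^{(1)}_{N2} & \cdots & a^{(1)}_{NN} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix} = \begin{bmatrix} a^{(1)}_{1,N+1} \\ a^{(1)}_{2,N+1} \\ \vdots \\ a^{(1)}_{N,N+1} \end{bmatrix} = B.$$

Then we will construct an equivalent upper-triangular system UX = Y:

$$UX = \begin{bmatrix} a^{(1)}_{11} & a^{(1)}_{12} & a^{(1)}_{13} & \cdots & a^{(1)}_{1N} \\ 0 & a^{(2)}_{22} & a^{(2)}_{23} & \cdots & a^{(2)}_{2N} \\ 0 & 0 & a^{(3)}_{33} & \cdots & a^{(3)}_{3N} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a^{(N)}_{NN} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_N \end{bmatrix} = \begin{bmatrix} a^{(1)}_{1,N+1} \\ a^{(2)}_{2,N+1} \\ a^{(3)}_{3,N+1} \\ \vdots \\ a^{(N)}_{N,N+1} \end{bmatrix} = Y.$$

Step 1. Store the coefficients in the augmented matrix. The superscript on $a^{(1)}_{rc}$ means that this is the first time that a number is stored in location (r, c):

$$\left[\begin{array}{cccc|c} a^{(1)}_{11} & a^{(1)}_{12} & \cdots & a^{(1)}_{1N} & a^{(1)}_{1,N+1} \\ a^{(1)}_{21} & a^{(1)}_{22} & \cdots & a^{(1)}_{2N} & a^{(1)}_{2,N+1} \\ \vdots & \vdots & & \vdots & \vdots \\ a^{(1)}_{N1} & a^{(1)}_{N2} & \cdots & a^{(1)}_{NN} & a^{(1)}_{N,N+1} \end{array}\right].$$

Step 2.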
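If necessary, switch rows so that $a^{(1)}_{11} \neq 0$; then eliminate $x_1$ in rows 2 through N. In this process, $m_{r1}$ is the multiple of row 1 that is subtracted from row r.

for r = 2:N
    $m_{r1} = a^{(1)}_{r1}/a^{(1)}_{11}$;
    $a^{(2)}_{r1} = 0$;
    for c = 2:N+1
        $a^{(2)}_{rc} = a^{(1)}_{rc} - m_{r1} \cdot a^{(1)}_{1c}$;
    end
end

The new elements are written $a^{(2)}_{rc}$ to indicate that this is the second time that a number has been stored in the matrix at location (r, c). The result after step 2 is

$$\left[\begin{array}{ccccc|c} a^{(1)}_{11} & a^{(1)}_{12} & a^{(1)}_{13} & \cdots & a^{(1)}_{1N} & a^{(1)}_{1,N+1} \\ 0 & a^{(2)}_{22} & a^{(2)}_{23} & \cdots & a^{(2)}_{2N} & a^{(2)}_{2,N+1} \\ 0 & a^{(2)}_{32} & a^{(2)}_{33} & \cdots & a^{(2)}_{3N} & a^{(2)}_{3,N+1} \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & a^{(2)}_{N2} & a^{(2)}_{N3} & \cdots & a^{(2)}_{NN} & a^{(2)}_{N,N+1} \end{array}\right].$$

Step 3. If necessary, switch the second row with some row below it so that $a^{(2)}_{22} \neq 0$; then eliminate $x_2$ in rows 3 through N. In this process, $m_{r2}$ is the multiple of row 2 that is subtracted from row r.

for r = 3:N
    $m_{r2} = a^{(2)}_{r2}/a^{(2)}_{22}$;
    $a^{(3)}_{r2} = 0$;
    for c = 3:N+1
        $a^{(3)}_{rc} = a^{(2)}_{rc} - m_{r2} \cdot a^{(2)}_{2c}$;
    end
end

The new elements are written $a^{(3)}_{rc}$ to indicate that this is the third time that a number has been stored in the matrix at location (r, c). The result after step 3 is

$$\left[\begin{array}{ccccc|c} a^{(1)}_{11} & a^{(1)}_{12} & a^{(1)}_{13} & \cdots & a^{(1)}_{1N} & a^{(1)}_{1,N+1} \\ 0 & a^{(2)}_{22} & a^{(2)}_{23} & \cdots & a^{(2)}_{2N} & a^{(2)}_{2,N+1} \\ 0 & 0 & a^{(3)}_{33} & \cdots & a^{(3)}_{3N} & a^{(3)}_{3,N+1} \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & a^{(3)}_{N3} & \cdots & a^{(3)}_{NN} & a^{(3)}_{N,N+1} \end{array}\right].$$

Step p + 1. This is the general step. If necessary, switch row p with some row beneath it so that $a^{(p)}_{pp} \neq 0$; then eliminate $x_p$ in rows p + 1 through N. Here $m_{rp}$ is the multiple of row p that is subtracted from row r.

for r = p+1:N
    $m_{rp} = a^{(p)}_{rp}/a^{(p)}_{pp}$;
    $a^{(p+1)}_{rp} = 0$;
    for c = p+1:N+1
        $a^{(p+1)}_{rc} = a^{(p)}_{rc} - m_{rp} \cdot a^{(p)}_{pc}$;
    end
end

The final result after $x_{N-1}$ has been eliminated from row N is

$$\left[\begin{array}{ccccc|c} a^{(1)}_{11} & a^{(1)}_{12} & a^{(1)}_{13} & \cdots & a^{(1)}_{1N} & a^{(1)}_{1,N+1} \\ 0 & a^{(2)}_{22} & a^{(2)}_{23} & \cdots & a^{(2)}_{2N} & a^{(2)}_{2,N+1} \\ 0 & 0 & a^{(3)}_{33} & \cdots & a^{(3)}_{3N} & a^{(3)}_{3,N+1} \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & a^{(N)}_{NN} & a^{(N)}_{N,N+1} \end{array}\right].$$

The upper-triangularization process is now complete.

Since A is nonsingular, when row operations are performed the successive matrices are also nonsingular. This guarantees that $a^{(k)}_{kk} \neq 0$ for all k in the construction process. Hence back substitution can be used to solve UX = Y for X, and the theorem is proved. •

Pivoting to Avoid $a^{(p)}_{pp} = 0$

If $a^{(p)}_{pp} = 0$, row p cannot be used to eliminate the elements in column p below the main diagonal.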
It is necessary to find row k, where $a^{(p)}_{kp} \neq 0$ and k > p, and then interchange row p and row k so that a nonzero pivot element is obtained. This process is called pivoting, and the criterion for deciding which row to choose is called a pivoting strategy. The trivial pivoting strategy is as follows. If $a^{(p)}_{pp} \neq 0$, do not switch rows. If $a^{(p)}_{pp} = 0$, locate the first row below p in which $a^{(p)}_{kp} \neq 0$ and switch rows k and p. This will result in a new element $a^{(p)}_{pp} \neq 0$, which is a nonzero pivot element.

Pivoting to Reduce Error

Because the computer uses fixed-precision arithmetic, it is possible that a small error will be introduced each time that an arithmetic operation is performed. The following example illustrates how the use of the trivial pivoting strategy in Gaussian elimination can lead to significant error in the solution of a linear system of equations.

Example 3.17. The values $x_1 = x_2 = 1.000$ are the solutions to

$$\begin{aligned} 1.133x_1 + 5.281x_2 &= 6.414 \\ 24.14x_1 - 1.210x_2 &= 22.93. \end{aligned} \tag{12}$$

Use four-digit arithmetic (see Exercises 6 and 7 in Section 1.3) and Gaussian elimination with trivial pivoting to find a computed approximate solution to the system.

The multiple $m_{21} = 24.14/1.133 = 21.31$ of row 1 is to be subtracted from row 2 to obtain the upper-triangular system. Using four digits in the calculations, we obtain the new coefficients

$$\begin{aligned} a_{22} &= -1.210 - 21.31(5.281) = -1.210 - 112.5 = -113.7 \\ a_{23} &= 22.93 - 21.31(6.414) = 22.93 - 136.7 = -113.8. \end{aligned}$$

The computed upper-triangular system is

$$\begin{aligned} 1.133x_1 + 5.281x_2 &= 6.414 \\ -113.7x_2 &= -113.8. \end{aligned}$$

Back substitution is used to compute $x_2 = -113.8/(-113.7) = 1.001$, and $x_1 = (6.414 - 5.281(1.001))/1.133 = (6.414 - 5.286)/1.133 = 0.9956$. •
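Example 3.17 can be replayed in MATLAB by rounding each intermediate result to four significant digits. The helper functions e and fl are assumptions of this sketch, not built-in functions: fl models four-digit arithmetic by rounding (rather than chopping) after every operation, which reproduces the values shown in the example. The model of Section 1.3 may differ in details.

```matlab
% Four-digit arithmetic model (an assumption of this sketch): round every
% result to 4 significant digits; the (x == 0) term keeps log10 away from 0.
e  = @(x) 3 - floor(log10(abs(x) + (x == 0)));
fl = @(x) round(x .* 10.^e(x)) ./ 10.^e(x);

% System (12) in its original order
A = [1.133 5.281; 24.14 -1.210];
B = [6.414; 22.93];

% Trivial pivoting: a11 = 1.133 is nonzero, so no rows are switched
m21 = fl(A(2,1) / A(1,1));              % multiplier 21.31
a22 = fl(A(2,2) - fl(m21 * A(1,2)));    % -113.7
b2  = fl(B(2)   - fl(m21 * B(1)));      % -113.8

% Back substitution, rounding after each operation
x2 = fl(b2 / a22);                              % 1.001
x1 = fl(fl(B(1) - fl(A(1,2) * x2)) / A(1,1));   % 0.9956
disp([x1 x2])
```

The large multiplier 21.31 amplifies the rounding in $a_{22}$ and $a_{23}$, and the error shows up mostly in $x_1$.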
The error in the solution of the linear system (12) is due to the magnitude of the multiplier $m_{21} = 21.31$. In the next example the magnitude of the multiplier $m_{21}$ is reduced by first interchanging the first and second equations in the linear system (12) and then using the trivial pivoting strategy in Gaussian elimination to solve the system.

Example 3.18. Use four-digit arithmetic and Gaussian elimination with trivial pivoting to solve the linear system

$$\begin{aligned} 24.14x_1 - 1.210x_2 &= 22.93 \\ 1.133x_1 + 5.281x_2 &= 6.414. \end{aligned}$$

This time $m_{21} = 1.133/24.14 = 0.04693$ is the multiple of row 1 that is to be subtracted from row 2. The new coefficients are

$$\begin{aligned} a_{22} &= 5.281 - 0.04693(-1.210) = 5.281 + 0.05679 = 5.338 \\ a_{23} &= 6.414 - 0.04693(22.93) = 6.414 - 1.076 = 5.338. \end{aligned}$$

The computed upper-triangular system is

$$\begin{aligned} 24.14x_1 - 1.210x_2 &= 22.93 \\ 5.338x_2 &= 5.338. \end{aligned}$$

Back substitution is used to compute $x_2 = 5.338/5.338 = 1.000$, and $x_1 = (22.93 + 1.210(1.000))/24.14 = 1.000$. •

The purpose of a pivoting strategy is to move the entry of greatest magnitude to the main diagonal and then use it to eliminate the remaining entries in the column. If there is more than one nonzero element in column p that lies on or below the main diagonal, then there is a choice to determine which rows to interchange. The partial pivoting strategy, illustrated in Example 3.18, is the most common one and is used in Program 3.2. To reduce the propagation of error, it is suggested that one check the magnitude of all the elements in column p that lie on or below the main diagonal. Locate row k in which the element that has the largest absolute value lies, that is,

$$|a_{kp}| = \max\{|a_{pp}|, |a_{p+1,p}|, \ldots, |a_{N-1,p}|, |a_{Np}|\}, \tag{13}$$

and then switch row p with row k if k > p. Now, each of the multipliers $m_{rp}$ for r = p + 1, ..., N will be less than or equal to 1 in absolute value. This process will usually keep the relative magnitudes of the elements of the matrix U in Theorem 3.9 the same as those in the original coefficient matrix A.
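Partial pivoting adds only a row search and an interchange to the elimination loop. The sketch below reuses the hypothetical four-digit helper fl (not a MATLAB built-in) and starts from system (12) in its original order; the row search picks 24.14 as the first pivot, which reproduces the interchange used in Example 3.18, and every multiplier stays at most 1 in magnitude.

```matlab
% Same four-digit rounding model as before (an assumption of this sketch)
e  = @(x) 3 - floor(log10(abs(x) + (x == 0)));
fl = @(x) round(x .* 10.^e(x)) ./ 10.^e(x);

% Augmented matrix of system (12) in its ORIGINAL order
Aug = [1.133 5.281 6.414; 24.14 -1.210 22.93];
N = size(Aug, 1);
mults = [];                                % every multiplier used

for p = 1:N-1
    % Partial pivoting: largest |a(k,p)| on or below the diagonal
    [~, j] = max(abs(Aug(p:N, p)));
    k = j + p - 1;
    Aug([p k], :) = Aug([k p], :);         % switch rows p and k
    for r = p+1:N
        m = fl(Aug(r, p) / Aug(p, p));
        mults(end+1) = m;
        Aug(r, p) = 0;
        for c = p+1:N+1
            Aug(r, c) = fl(Aug(r, c) - fl(m * Aug(p, c)));
        end
    end
end

% Back substitution, rounding after each operation
x = zeros(N, 1);
for i = N:-1:1
    s = Aug(i, N+1);
    for c = i+1:N
        s = fl(s - fl(Aug(i, c) * x(c)));
    end
    x(i) = fl(s / Aug(i, i));
end
disp(x')
```

With the rows in this order both components come out as 1.000, in contrast with the 0.9956 and 1.001 obtained with trivial pivoting.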
Usually, the choice of the larger pivot element in partial pivoting will result in a smaller error being propagated.

In Section 3.5 we will find that it takes a total of $(4N^3 + 9N^2 - 7N)/6$ arithmetic operations to solve an N × N system. When N = 20, the total number of arithmetic operations that must be performed is 5910, and the propagation of error in the computations could result in an erroneous answer. The technique of scaled partial pivoting or equilibrating can be used to further reduce the effect of error propagation. In scaled partial pivoting we search all the elements in column p that lie on or below the main diagonal for the one that is largest relative to the entries in its row. First search rows p through N for the largest element in magnitude in each row, say $s_r$:

$$s_r = \max\{|a_{rp}|, |a_{r,p+1}|, \ldots, |a_{rN}|\} \quad \text{for } r = p, p+1, \ldots, N.$$

The pivotal row k is determined by finding

$$\frac{|a_{kp}|}{s_k} = \max\left\{\frac{|a_{pp}|}{s_p}, \frac{|a_{p+1,p}|}{s_{p+1}}, \ldots, \frac{|a_{Np}|}{s_N}\right\}.$$

Now interchange row p and k, unless p = k. Again, this pivoting process is designed to keep the relative magnitudes of the elements in the matrix U in Theorem 3.9 the same as those in the original coefficient matrix A.

Ill Conditioning

A matrix A is called ill conditioned if there exists a matrix B for which small perturbations in the coefficients of A or B will produce large changes in $X = A^{-1}B$. The system AX = B is said to be ill conditioned when A is ill conditioned. In this case, numerical methods for computing an approximate solution are prone to have more error.

One circumstance involving ill conditioning occurs when A is "nearly singular" and the determinant of A is close to zero. Ill conditioning can also occur in systems of two equations when two lines are nearly parallel (or in three equations when three planes are nearly parallel). A consequence of ill conditioning is that substitution of erroneous values may appear to be genuine solutions.

Figure 3.4 A region where two equations are "almost satisfied." The solution point (0.8, 0.6) is marked.
For example, consider the two equations

$$\begin{aligned} x + 2y - 2.00 &= 0 \\ 2x + 3y - 3.40 &= 0. \end{aligned} \tag{15}$$

Substitution of $x_0 = 1.00$ and $y_0 = 0.48$ into these equations "almost produces zeros":

$$\begin{aligned} 1 + 2(0.48) - 2.00 &= 1.96 - 2.00 = -0.04 \approx 0 \\ 2 + 3(0.48) - 3.40 &= 3.44 - 3.40 = 0.04 \approx 0. \end{aligned}$$

Here the discrepancy from 0 is only ±0.04. However, the true solution to this linear system is x = 0.8 and y = 0.6, so the errors in the approximate solution are $x - x_0 = 0.80 - 1.00 = -0.20$ and $y - y_0 = 0.60 - 0.48 = 0.12$. Thus, merely substituting values into a set of equations is not a reliable test for accuracy. The rhombus-shaped region R in Figure 3.4 represents a set where both equations in (15) are "almost satisfied":

$$R = \{(x, y) : |x + 2y - 2.00| < 0.1 \ \text{and}\ |2x + 3y - 3.40| < 0.2\}.$$

There are points in R that are far away from the solution point (0.8, 0.6) and yet produce small values when substituted into the equations in (15). If it is suspected that a linear system is ill conditioned, computations should be carried out in multiple-precision arithmetic. The interested reader should research the topic of the condition number of a matrix to get more information on this phenomenon.

Ill conditioning has more drastic consequences when several equations are involved. Consider the problem of finding the cubic polynomial $y = c_1x^3 + c_2x^2 + c_3x + c_4$ that passes through the four points (2, 8), (3, 27), (4, 64), and (5, 125) (clearly $y = x^3$ is the desired cubic polynomial). In Chapter 5 we will introduce the method of least squares. Applying the method of least squares to find the coefficients requires that the following linear system be solved:

$$\begin{bmatrix} 20{,}514 & 4{,}424 & 978 & 224 \\ 4{,}424 & 978 & 224 & 54 \\ 978 & 224 & 54 & 14 \\ 224 & 54 & 14 & 4 \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{bmatrix} = \begin{bmatrix} 20{,}514 \\ 4{,}424 \\ 978 \\ 224 \end{bmatrix}.$$

A computer that carried nine digits of precision was used to compute the coefficients and obtained

$$c_1 = 1.000004, \quad c_2 = -0.000038, \quad c_3 = 0.000126, \quad \text{and} \quad c_4 = 0.000131.$$

Although this computation is close to the true solution, $c_1 = 1$ and $c_2 = c_3 = c_4 = 0$, it
shows how easy it is for error to creep into the solution. Furthermore, suppose that the coefficient $a_{11} = 20{,}514$ in the upper-left corner of the coefficient matrix is changed to the value 20,515 and the perturbed system is solved. Values obtained with the same computer were

$$c_1 = 0.642857, \quad c_2 = 3.75000, \quad c_3 = -12.3928, \quad \text{and} \quad c_4 = 12.7500,$$

which is a worthless answer. Ill conditioning is not easy to detect. If the system is solved a second time with slightly perturbed coefficients and an answer that differs significantly from the first one is discovered, then it is realized that ill conditioning is present. Sensitivity analysis is a topic normally introduced in advanced numerical analysis texts.

MATLAB

In Program 3.2 the MATLAB statement [A B] is used to construct the augmented matrix for the linear system AX = B, and the max command is used to determine the pivot element in partial pivoting. Once the equivalent triangulated matrix [U|Y] is obtained, it is separated into U and Y, and Program 3.1 is used to carry out back substitution (backsub(U,Y)). The use of these commands and processes is illustrated in the following example.

Example 3.19. (a) Use MATLAB to construct the augmented matrix for the linear system in Example 3.16; (b) use the max command to find the element of greatest magnitude in the first column of the coefficient matrix A; and (c) break the augmented matrix in (11) into the coefficient matrix U and constant matrix Y of the upper-triangular system UX = Y.

(a)
>> A=[1 2 1 4;2 0 4 3;4 2 2 1;-3 1 3 2];
>> B=[13 28 20 6]';
>> Aug=[A B]
Aug =
     1     2     1     4    13
     2     0     4     3    28
     4     2     2     1    20
    -3     1     3     2     6

(b) In the following MATLAB display, a is the element of greatest magnitude in the first column of A and j is the row number.

>> [a,j]=max(abs(A(1:4,1)))
a =
     4
j =
     3

(c) Let Augup
$= [U|Y]$ be the upper-triangular matrix in (11).

    >> Augup=[1 2 1 4 13;0 -4 2 -5 2;0 0 -5 -7.5 -35;0 0 0 -9 -18];
    >> U=Augup(1:4,1:4)
    U =
        1.0000    2.0000    1.0000    4.0000
             0   -4.0000    2.0000   -5.0000
             0         0   -5.0000   -7.5000
             0         0         0   -9.0000
    >> Y=Augup(1:4,5)
    Y =
        13
         2
       -35
       -18

Program 3.2 (Upper Triangularization Followed by Back Substitution). To construct the solution to $AX = B$ by first reducing the augmented matrix $[A|B]$ to upper-triangular form and then performing back substitution.

    function X = uptrbk(A,B)
    %Input  - A is an N x N nonsingular matrix
    %       - B is an N x 1 matrix
    %Output - X is an N x 1 matrix containing the solution to AX=B
    %Initialize X and the temporary storage matrix C
    N=size(A,1);
    X=zeros(N,1);
    C=zeros(1,N+1);
    %Form the augmented matrix: Aug=[A|B]
    Aug=[A B];
    for p=1:N-1
       %Partial pivoting for column p: j indexes the subvector Aug(p:N,p)
       [~,j]=max(abs(Aug(p:N,p)));
       %Interchange row p and row j+p-1
       C=Aug(p,:);
       Aug(p,:)=Aug(j+p-1,:);
       Aug(j+p-1,:)=C;
       if Aug(p,p)==0
          error('A was singular. No unique solution')
       end
       %Elimination process for column p
       for k=p+1:N
          m=Aug(k,p)/Aug(p,p);
          Aug(k,p:N+1)=Aug(k,p:N+1)-m*Aug(p,p:N+1);
       end
    end
    %Back substitution on [U|Y] using Program 3.1
    X=backsub(Aug(1:N,1:N),Aug(1:N,N+1));

Exercises for Gaussian Elimination and Pivoting

In Exercises 1 through 4 show that $AX = B$ is equivalent to the upper-triangular system $UX = Y$ and find the solution.

1.
$$\begin{aligned} 2x_1 + 4x_2 - 6x_3 &= -4 \\ x_1 + 5x_2 + 3x_3 &= 10 \\ x_1 + 3x_2 + 2x_3 &= 5 \end{aligned} \qquad\qquad \begin{aligned} 2x_1 + 4x_2 - 6x_3 &= -4 \\ 3x_2 + 6x_3 &= 12 \\ 3x_3 &= 3 \end{aligned}$$
2.
$$\begin{aligned} x_1 + x_2 + 6x_3 &= 7 \\ -x_1 + 2x_2 + 9x_3 &= 2 \\ x_1 - 2x_2 + 3x_3 &= 10 \end{aligned} \qquad\qquad \begin{aligned} x_1 + x_2 + 6x_3 &= 7 \\ 3x_2 + 15x_3 &= 9 \\ 12x_3 &= 12 \end{aligned}$$
3.
$$\begin{aligned} 2x_1 - 2x_2 + 5x_3 &= 6 \\ 2x_1 + 3x_2 + x_3 &= 13 \\ -x_1 + 4x_2 - 4x_3 &= 3 \end{aligned} \qquad\qquad \begin{aligned} 2x_1 - 2x_2 + 5x_3 &= 6 \\ 5x_2 - 4x_3 &= 7 \\ 0.9x_3 &= 1.8 \end{aligned}$$
4.
$$\begin{aligned} -5x_1 + 2x_2 - x_3 &= -1 \\ x_1 + 0x_2 + 3x_3 &= 5 \\ 3x_1 + x_2 + 6x_3 &= 17 \end{aligned} \qquad\qquad \begin{aligned} -5x_1 + 2x_2 - x_3 &= -1 \\ 0.4x_2 + 2.8x_3 &= 4.8 \\ -10x_3 &= -10 \end{aligned}$$
5. Find the parabola $y = A + Bx + Cx^2$ that passes through $(1, 4)$, $(2, 7)$, and $(3, 14)$.
6. Find the parabola $y = A + Bx + Cx^2$ that passes through $(1, 6)$, $(2, 5)$, and $(3, 2)$.
7. Find the cubic $y = A + Bx + Cx^2 + Dx^3$ that passes through $(0, 0)$, $(1, 1)$, $(2, 2)$, and $(3, 2)$.
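In Exercises 8 through 10, show that $AX = B$ is equivalent to the upper-triangular system $UX = Y$ and find the solution.

8.
$$\begin{aligned} 4x_1 + 8x_2 + 4x_3 + 0x_4 &= 8 \\ x_1 + 5x_2 + 4x_3 - 3x_4 &= -4 \\ x_1 + 4x_2 + 7x_3 + 2x_4 &= 10 \\ x_1 + 3x_2 + 0x_3 - 2x_4 &= -4 \end{aligned} \qquad\qquad \begin{aligned} 4x_1 + 8x_2 + 4x_3 + 0x_4 &= 8 \\ 3x_2 + 3x_3 - 3x_4 &= -6 \\ 4x_3 + 4x_4 &= 12 \\ x_4 &= 2 \end{aligned}$$
9.
$$\begin{aligned} 2x_1 + 4x_2 - 4x_3 + 0x_4 &= 12 \\ x_1 + 5x_2 - 5x_3 - 3x_4 &= 18 \\ 2x_1 + 3x_2 + x_3 + 3x_4 &= 8 \\ x_1 + 4x_2 - 2x_3 + 2x_4 &= 8 \end{aligned} \qquad\qquad \begin{aligned} 2x_1 + 4x_2 - 4x_3 + 0x_4 &= 12 \\ 3x_2 - 3x_3 - 3x_4 &= 12 \\ 4x_3 + 2x_4 &= 0 \\ 3x_4 &= -6 \end{aligned}$$
10.
$$\begin{aligned} x_1 + 2x_2 + 0x_3 - x_4 &= 9 \\ 2x_1 + 3x_2 - x_3 + 0x_4 &= 9 \\ 0x_1 + 4x_2 + 2x_3 - 5x_4 &= 26 \\ 5x_1 + 5x_2 + 2x_3 - 4x_4 &= 32 \end{aligned} \qquad\qquad \begin{aligned} x_1 + 2x_2 + 0x_3 - x_4 &= 9 \\ -x_2 - x_3 + 2x_4 &= -9 \\ -2x_3 + 3x_4 &= -10 \\ 1.5x_4 &= -3 \end{aligned}$$
11. Find the solution to the following linear system.
$$\begin{aligned} x_1 + 2x_2 &= 7 \\ 2x_1 + 3x_2 - x_3 &= 9 \\ 4x_2 + 2x_3 + 3x_4 &= 10 \\ 2x_3 - 4x_4 &= 12 \end{aligned}$$
12. Find the solution to the following linear system.
$$\begin{aligned} x_1 + x_2 &= 5 \\ 2x_1 - x_2 + 5x_3 &= -9 \\ 3x_2 - 4x_3 + 2x_4 &= 19 \\ 2x_3 + 6x_4 &= 2 \end{aligned}$$
13. The Rockmore Corp. is considering the purchase of a new computer and will choose either the DoGood 174 or the MightDo 11. They test both computers' ability to solve the linear system
$$34x + 55y = 21, \qquad 55x + 89y = 34.$$
The DoGood 174 computer gives $x = -0.11$ and $y = 0.45$, and its check for accuracy is found by substitution:
$$34(-0.11) + 55(0.45) - 21 = 0.01, \qquad 55(-0.11) + 89(0.45) - 34 = 0.00.$$
The MightDo 11 computer gives $x = -0.99$ and $y = 1.01$, and its check for accuracy is found by substitution:
$$34(-0.99) + 55(1.01) - 21 = 0.89, \qquad 55(-0.99) + 89(1.01) - 34 = 1.44.$$
Which computer gave the better answer? Why?
14. Solve the following linear systems using (i) Gaussian elimination with partial pivoting, and (ii) Gaussian elimination with scaled partial pivoting.
$$\text{(a)}\quad \begin{aligned} 2x_1 - 3x_2 + 100x_3 &= 1 \\ x_1 + 10x_2 - 0.001x_3 &= 0 \\ 3x_1 - 100x_2 + 0.01x_3 &= 0 \end{aligned} \qquad\qquad \text{(b)}\quad \begin{aligned} x_1 + 20x_2 - x_3 + 0.001x_4 &= 0 \\ 2x_1 - 5x_2 + 30x_3 - 0.1x_4 &= 1 \\ 5x_1 + x_2 - 100x_3 - x_4 &= 0 \\ 2x_1 - 100x_2 - x_3 + x_4 &= 0 \end{aligned}$$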
15. The Hilbert matrix is a classical ill-conditioned matrix, and small changes in its coefficients will produce a large change in the solution to the perturbed system.
(a) Find the exact solution of $AX = B$ (leave all numbers as fractions and do exact arithmetic) using the Hilbert matrix of dimension $4 \times 4$:
$$A = \begin{bmatrix} 1 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} \\ \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5} \\ \frac{1}{3} & \frac{1}{4} & \frac{1}{5} & \frac{1}{6} \\ \frac{1}{4} & \frac{1}{5} & \frac{1}{6} & \frac{1}{7} \end{bmatrix}, \qquad B = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}.$$
(b) Now solve $AX = B$ using four-digit rounding arithmetic:
$$A = \begin{bmatrix} 1.0000 & 0.5000 & 0.3333 & 0.2500 \\ 0.5000 & 0.3333 & 0.2500 & 0.2000 \\ 0.3333 & 0.2500 & 0.2000 & 0.1667 \\ 0.2500 & 0.2000 & 0.1667 & 0.1429 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}.$$
Note. The coefficient matrix in part (b) is an approximation to the coefficient matrix in part (a).

Algorithms and Programs

1. Many applications involve matrices with many zeros. Of practical importance are tridiagonal systems (see Exercises 11 and 12) of the form
$$\begin{aligned} d_1x_1 + c_1x_2 &= b_1 \\ a_1x_1 + d_2x_2 + c_2x_3 &= b_2 \\ a_2x_2 + d_3x_3 + c_3x_4 &= b_3 \\ &\;\;\vdots \\ a_{N-2}x_{N-2} + d_{N-1}x_{N-1} + c_{N-1}x_N &= b_{N-1} \\ a_{N-1}x_{N-1} + d_Nx_N &= b_N. \end{aligned}$$
Construct a program that will solve a tridiagonal system. You may assume that row interchanges are not needed and that row $k$ can be used to eliminate $x_k$ in row $k + 1$.
2. Use Program 3.2 to find the sixth-degree polynomial $y = a_1 + a_2x + a_3x^2 + a_4x^3 + a_5x^4 + a_6x^5 + a_7x^6$ that passes through $(0, 1)$, $(1, 3)$, $(2, 2)$, $(3, 1)$, $(4, 3)$, $(5, 2)$, and $(6, 1)$. Use the plot command to plot the polynomial and the given points on the same graph. Explain any discrepancies in your graph.
3. Use Program 3.2 to solve the linear system $AX = B$, where $A = [a_{ij}]_{N \times N}$ with $a_{ij} = i^{j-1}$, and $B = [b_{i1}]_{N \times 1}$, where $b_{11} = N$ and $b_{i1} = (i^N - 1)/(i - 1)$ for $i \ge 2$. Use $N = 3$, 7, and 11. The exact solution is $X = [1\ 1\ \cdots\ 1\ 1]'$. Explain any deviations from the exact solution.
4. Construct a program that changes the pivoting strategy in Program 3.2 to scaled partial pivoting.
5. Use your scaled partial pivoting program from Problem 4 to solve the system given in Problem 3 for $N = 11$. Explain any improvement in the solutions.
6. Modify Program 3.2 so that it will efficiently solve $M$ linear systems with the same coefficient matrix $A$ but different column matrices $B$.
The $M$ linear systems look like
$$AX_1 = B_1, \quad AX_2 = B_2, \quad \ldots, \quad AX_M = B_M.$$
7. The following discussion is presented for matrices of dimension $3 \times 3$, but the concepts apply to matrices of dimension $N \times N$. If $A$ is nonsingular, then $A^{-1}$ exists and $AA^{-1} = I$. Let $C_1$, $C_2$, and $C_3$ be the columns of $A^{-1}$ and $E_1$, $E_2$, and $E_3$ be the columns of $I$. The equation $AA^{-1} = I$ can be represented as
$$A\,[C_1\ C_2\ C_3] = [E_1\ E_2\ E_3].$$
This matrix product is equivalent to the three linear systems
$$AC_1 = E_1, \quad AC_2 = E_2, \quad \text{and} \quad AC_3 = E_3.$$
Thus finding $A^{-1}$ is equivalent to solving the three linear systems.
Using Program 3.2 or your program from Problem 6, find the inverse of each of the following matrices. Check your answer by computing the product $AA^{-1}$ and also by using the command inv(A). Explain any differences.
(a) $[\,\ldots\,]$
(b)
$$\begin{bmatrix} 16 & -120 & 240 & -140 \\ -120 & 1200 & -2700 & 1680 \\ 240 & -2700 & 6480 & -4200 \\ -140 & 1680 & -4200 & 2800 \end{bmatrix}$$

3.5 Triangular Factorization

In Section 3.3 we saw how easy it is to solve an upper-triangular system. Now we introduce the concept of factorization of a given matrix $A$ into the product of a lower-triangular matrix $L$ that has 1's along the main diagonal and an upper-triangular matrix $U$ with nonzero diagonal elements. For ease of notation we illustrate the concepts with matrices of dimension $4 \times 4$, but they apply to an arbitrary system of dimension $N \times N$.

Definition 3.4. The nonsingular matrix $A$ has a triangular factorization if it can be expressed as the product of a lower-triangular matrix $L$ and an upper-triangular matrix $U$:
$$A = LU. \tag{1}$$
In matrix form, this is written as
$$\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ m_{21} & 1 & 0 & 0 \\ m_{31} & m_{32} & 1 & 0 \\ m_{41} & m_{42} & m_{43} & 1 \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} & u_{13} & u_{14} \\ 0 & u_{22} & u_{23} & u_{24} \\ 0 & 0 & u_{33} & u_{34} \\ 0 & 0 & 0 & u_{44} \end{bmatrix}.$$
The condition that $A$ is nonsingular implies that $u_{kk} \ne 0$ for all $k$. The notation for the entries in $L$ is $m_{ij}$, and the reason for the choice of $m_{ij}$ instead of $l_{ij}$ will be pointed out soon.

Solution of a Linear System

Suppose that the coefficient matrix $A$ for the linear system $AX = B$ has a triangular factorization (1); then the solution to
$$LUX = B \tag{2}$$
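can be obtained by defining $Y = UX$ and then solving two systems:
$$\text{first solve } LY = B \text{ for } Y; \text{ then solve } UX = Y \text{ for } X. \tag{3}$$
In equation form, we must first solve the lower-triangular system
$$\begin{aligned} y_1 &= b_1 \\ m_{21}y_1 + y_2 &= b_2 \\ m_{31}y_1 + m_{32}y_2 + y_3 &= b_3 \\ m_{41}y_1 + m_{42}y_2 + m_{43}y_3 + y_4 &= b_4 \end{aligned}$$
to obtain $y_1$, $y_2$, $y_3$, and $y_4$ and use them in solving the upper-triangular system
$$\begin{aligned} u_{11}x_1 + u_{12}x_2 + u_{13}x_3 + u_{14}x_4 &= y_1 \\ u_{22}x_2 + u_{23}x_3 + u_{24}x_4 &= y_2 \\ u_{33}x_3 + u_{34}x_4 &= y_3 \\ u_{44}x_4 &= y_4. \end{aligned}$$

Example 3.20. Solve
$$\begin{aligned} x_1 + 2x_2 + 4x_3 + x_4 &= 21 \\ 2x_1 + 8x_2 + 6x_3 + 4x_4 &= 52 \\ 3x_1 + 10x_2 + 8x_3 + 8x_4 &= 79 \\ 4x_1 + 12x_2 + 10x_3 + 6x_4 &= 82. \end{aligned}$$
Use the triangular factorization method and the fact that
$$A = \begin{bmatrix} 1 & 2 & 4 & 1 \\ 2 & 8 & 6 & 4 \\ 3 & 10 & 8 & 8 \\ 4 & 12 & 10 & 6 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & 1 & 1 & 0 \\ 4 & 1 & 2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 4 & 1 \\ 0 & 4 & -2 & 2 \\ 0 & 0 & -2 & 3 \\ 0 & 0 & 0 & -6 \end{bmatrix} = LU.$$
Use the forward-substitution method to solve $LY = B$:
$$\begin{aligned} y_1 &= 21 \\ 2y_1 + y_2 &= 52 \\ 3y_1 + y_2 + y_3 &= 79 \\ 4y_1 + y_2 + 2y_3 + y_4 &= 82. \end{aligned}$$
Compute the values $y_1 = 21$, $y_2 = 52 - 2(21) = 10$, $y_3 = 79 - 3(21) - 10 = 6$, and $y_4 = 82 - 4(21) - 10 - 2(6) = -24$, or $Y = [21\ 10\ 6\ {-24}]'$. Next write the system $UX = Y$:
$$\begin{aligned} x_1 + 2x_2 + 4x_3 + x_4 &= 21 \\ 4x_2 - 2x_3 + 2x_4 &= 10 \\ -2x_3 + 3x_4 &= 6 \\ -6x_4 &= -24. \end{aligned}$$
Now use back substitution and compute the solution $x_4 = -24/(-6) = 4$, $x_3 = (6 - 3(4))/(-2) = 3$, $x_2 = (10 - 2(4) + 2(3))/4 = 2$, and $x_1 = 21 - 4 - 4(3) - 2(2) = 1$.

Triangular Factorization

We now discuss how to obtain the triangular factorization. If row interchanges are not necessary when using Gaussian elimination, the multipliers $m_{ij}$ are the subdiagonal entries in $L$.

Example 3.21. Use Gaussian elimination to construct the triangular factorization of the matrix
$$A = \begin{bmatrix} 4 & 3 & -1 \\ -2 & -4 & 5 \\ 1 & 2 & 6 \end{bmatrix}.$$
The matrix $L$ will be constructed from an identity matrix placed at the left. For each row operation used to construct the upper-triangular matrix, the multipliers $m_{ij}$ will be put in their proper places at the left. Start with
$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 4 & 3 & -1 \\ -2 & -4 & 5 \\ 1 & 2 & 6 \end{bmatrix}.$$
Row 1 is used to eliminate the elements of $A$ in column 1 below $a_{11}$. The multiples $m_{21} = -0.5$ and $m_{31} = 0.25$ of row 1 are subtracted from rows 2 and 3, respectively.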
These multipliers are put in the matrix at the left, and the result is
$$A = \begin{bmatrix} 1 & 0 & 0 \\ -0.5 & 1 & 0 \\ 0.25 & 0 & 1 \end{bmatrix} \begin{bmatrix} 4 & 3 & -1 \\ 0 & -2.5 & 4.5 \\ 0 & 1.25 & 6.25 \end{bmatrix}.$$
Row 2 is used to eliminate the elements of $A$ in column 2 below $a_{22}$. The multiple $m_{32} = -0.5$ of the second row is subtracted from row 3, and the multiplier is entered in the matrix at the left; we then have the desired triangular factorization of $A$:
$$A = \begin{bmatrix} 1 & 0 & 0 \\ -0.5 & 1 & 0 \\ 0.25 & -0.5 & 1 \end{bmatrix} \begin{bmatrix} 4 & 3 & -1 \\ 0 & -2.5 & 4.5 \\ 0 & 0 & 8.5 \end{bmatrix}.$$

Theorem 3.10 (Direct Factorization $A = LU$. No Row Interchanges). Suppose that Gaussian elimination, without row interchanges, can be successfully performed to solve the general linear system $AX = B$. Then the matrix $A$ can be factored as the product of a lower-triangular matrix $L$ and an upper-triangular matrix $U$:
$$A = LU.$$
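To make the bookkeeping of Example 3.21 and the two-stage solve of Example 3.20 concrete, here is a minimal Python sketch. It is an illustrative translation, not one of the book's MATLAB programs, and the function names lu_doolittle, forward_sub, back_sub, and lu_solve are hypothetical. Like Theorem 3.10, it assumes that no row interchanges are needed. Each multiplier $m_{kp}$ used to clear column $p$ is stored below the diagonal of $L$, and the reduced rows form $U$.

```python
def lu_doolittle(A):
    """Return (L, U) with A = LU and L unit lower-triangular.

    Assumes no row interchanges are needed (the setting of Theorem 3.10);
    raises ValueError on a zero pivot.
    """
    n = len(A)
    U = [[float(v) for v in row] for row in A]  # working copy, reduced to U
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for p in range(n - 1):
        if U[p][p] == 0:
            raise ValueError("zero pivot: elimination without row interchanges fails")
        for k in range(p + 1, n):
            m = U[k][p] / U[p][p]       # multiplier m_kp
            L[k][p] = m                 # put in its proper place in L
            U[k][p] = 0.0               # entry eliminated by the row operation
            for j in range(p + 1, n):
                U[k][j] -= m * U[p][j]  # row k <- row k - m * row p
    if U[n - 1][n - 1] == 0:
        raise ValueError("matrix is singular")
    return L, U


def forward_sub(L, B):
    """Solve LY = B when L has 1's on its main diagonal."""
    Y = []
    for i in range(len(L)):
        Y.append(B[i] - sum(L[i][j] * Y[j] for j in range(i)))
    return Y


def back_sub(U, Y):
    """Solve UX = Y for upper-triangular U with nonzero diagonal."""
    n = len(U)
    X = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * X[j] for j in range(i + 1, n))
        X[i] = (Y[i] - s) / U[i][i]
    return X


def lu_solve(A, B):
    """Solve AX = B in two stages: first LY = B, then UX = Y."""
    L, U = lu_doolittle(A)
    return back_sub(U, forward_sub(L, B))


if __name__ == "__main__":
    # Coefficient matrix and right-hand side of Example 3.20
    A = [[1, 2, 4, 1], [2, 8, 6, 4], [3, 10, 8, 8], [4, 12, 10, 6]]
    B = [21, 52, 79, 82]
    print(lu_solve(A, B))  # [1.0, 2.0, 3.0, 4.0]
```

Running the script prints the solution found by hand in Example 3.20. Calling lu_doolittle on the matrix of Example 3.21 returns the same $L$ and $U$ obtained there. Because $L$ and $U$ do not depend on $B$, they can be computed once and reused for several right-hand sides. That is one possible approach to Problem 6 above.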
