
CS450: Numerical Analysis

Solutions to HW 1
Name: Benjamin Chiang
Netid: bchiang2@illinois.edu
Problem 1:
(a) 3 significant digits: 1.23579 − 1.23456 = 0.00123.
(b) In a normalized system, the leading digit must be nonzero. The x and y terms use 1 as their exponent, while their difference, $1.23 \times 10^{-3}$, uses −3 as its exponent. Therefore, the exponent range is −3 ≤ E ≤ 1.
(c) Yes, because when y ≈ x, taking the difference between x and y results in cancellation. Cancellation leads to a loss of significant digits, so the result needs fewer mantissa digits. Because gradual underflow allows leading zeros in the mantissa, it is possible to represent y − x exactly when x ≈ y.
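The digit count in (a) can be checked with exact decimal arithmetic. This is only a small sketch using Python's standard `decimal` module; the variable names are my own and not part of the problem statement.

```python
from decimal import Decimal

# Operands from part (a), stored as exact decimal values.
x = Decimal("1.23456")
y = Decimal("1.23579")

diff = y - x                              # exact with the default 28-digit context
sig_digits = len(diff.as_tuple().digits)  # digits of the coefficient 123
lead_exponent = diff.adjusted()           # power of ten of the leading digit

print(diff)           # 0.00123
print(sig_digits)     # 3
print(lead_exponent)  # -3
```

The six-digit operands leave only three significant digits after the subtraction, and the leading digit of the difference sits at $10^{-3}$.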
Problem 2:
In the case where x ≈ y, log(x/y) is likely to give the better result, for two reasons. First, as stated in the problem, it avoids the risk of cancellation. Second, if x and y are close to zero, log(x) − log(y) is highly sensitive to changes in the input, because log(x) diverges as x approaches 0. This issue does not arise in log(x/y), because the division is performed first.
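A quick experiment can illustrate the two forms. The values of x and y below are hypothetical, chosen only so that x ≈ y and both are small. The reference value uses `math.log1p((x - y) / y)`, which I am assuming is accurate enough to serve as ground truth here because x − y is computed exactly for such close operands.

```python
import math

# Hypothetical inputs: close to each other and close to zero.
x = 1.0000000001e-8
y = 1.0e-8

quotient_form = math.log(x / y)              # log(x/y)
difference_form = math.log(x) - math.log(y)  # log(x) - log(y)

# Reference: log(x/y) = log1p((x - y)/y).
reference = math.log1p((x - y) / y)

def rel_err(value):
    return abs(value - reference) / abs(reference)

print("log(x/y)        relative error:", rel_err(quotient_form))
print("log(x) - log(y) relative error:", rel_err(difference_form))
```

Neither form is exact for these inputs, since the true answer is only about 1e-10, but the printed errors show how much each form loses.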
Problem 3:
Formula (b) is preferable because it reduces the risk of overflow. When a and b are both very large, the sum a+b in formula (a) can overflow and be cut off, thereby producing the wrong answer. For example, assuming 1-digit precision in base 10, using formula (a) for the points 8 and 9 would yield 7/2, because 8+9 = 17 would be cut off to 7, and the answer 7/2 lies outside the interval between 8 and 9. However, formula (b) would yield 8 + 1/2 = 8.5, the correct answer.
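The same effect can be reproduced in double precision. This is a minimal sketch that assumes formula (a) is (a + b)/2 and formula (b) is a + (b − a)/2; the function names and the test values near the largest finite double are my own choices.

```python
import math
import sys

def midpoint_a(a, b):
    # Formula (a): the sum a + b can overflow before the division.
    return (a + b) / 2

def midpoint_b(a, b):
    # Formula (b): b - a stays small when a and b are close.
    return a + (b - a) / 2

big = sys.float_info.max   # largest finite double
a, b = 0.9 * big, big

print(midpoint_a(a, b))    # inf
print(midpoint_b(a, b))    # finite, between a and b
```

With IEEE doubles the overflow shows up as `inf` rather than a truncated digit, but the conclusion is the same: only formula (b) returns a point between a and b.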
Problem 4:
(a) Yes, the graph appears to show a minimum in the magnitude of the error. The error seems to be at its minimum when h is between 10e−15 and 10e−1. According to the textbook, the optimal step size is $h \approx 2\sqrt{\epsilon}$; in Python, with ε = np.finfo(np.float32).eps, $\sqrt{\epsilon} \approx 3.45 \times 10^{-4}$, which agrees with the graph.
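The rule of thumb can be checked numerically. The sketch below uses only the standard library: `2.0 ** -23` has the same value as `np.finfo(np.float32).eps`, and the test function (sin at x = 1) is a hypothetical choice of mine, not necessarily the function from the assignment. The sweep itself runs in double precision, so its best h should land near the square root of the double-precision epsilon (about 1e-8) rather than near 3.45e-4.

```python
import math

EPS32 = 2.0 ** -23           # same value as np.finfo(np.float32).eps
h_rule = math.sqrt(EPS32)    # square root of the single-precision epsilon

def forward_difference(f, x, h):
    # First-order finite-difference approximation of f'(x).
    return (f(x + h) - f(x)) / h

# Hypothetical test problem: d/dx sin(x) at x = 1, exact value cos(1).
x0 = 1.0
exact = math.cos(x0)

errors = {}
for k in range(1, 16):
    h = 10.0 ** -k
    errors[h] = abs(forward_difference(math.sin, x0, h) - exact)

best_h = min(errors, key=errors.get)
print(f"sqrt(eps32)              = {h_rule:.3e}")
print(f"best h in float64 sweep  = {best_h:.0e}")
```

The error first shrinks with h (truncation error) and then grows again (rounding error), which is the minimum seen in the graph.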
Problem 5:
(a) The results of the infinite series and NumPy agree closely for positive inputs, where the relative error is small. However, the relative error grows rapidly as the input becomes more negative.
    x      relative error
    -20    1.442
    -15    3.302e-05
    -10    7.234e-09
    -5     2.136e-13
    -1     3.017e-16
    1      1.633e-16
    5      1.915e-16
    10     3.303e-16
    15     0.0
    20     2.457e-16
(b) Yes, simply computing $e^{x}$ as $1/e^{-x}$ for negative values resolved the issue, as is evident in the following table.
    x      relative error
    -20    2.00e-16
    -15    0.0
    -10    1.49e-16
    -5     2.57e-16
    -1     1.50e-16
    1      1.63e-16
    5      1.91e-16
    10     3.30e-16
    15     0.0
    20     2.45e-16
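For reference, parts (a) and (b) can be sketched as follows. This is not necessarily the code that produced the tables above: the stopping rule (stop when the sum no longer changes) and the use of `math.exp` in place of the NumPy reference are my assumptions, so the exact error values may differ.

```python
import math

def exp_series(x):
    """Sum 1 + x + x^2/2! + ... until adding a term no longer changes the sum."""
    total = 1.0
    term = 1.0
    n = 0
    while True:
        n += 1
        term *= x / n              # x^n / n! from the previous term
        new_total = total + term
        if new_total == total:
            return total
        total = new_total

def exp_reciprocal(x):
    """Part (b): use e^x = 1 / e^(-x) for negative x to avoid cancellation."""
    return 1.0 / exp_series(-x) if x < 0 else exp_series(x)

def rel_err(approx, x):
    ref = math.exp(x)              # stand-in for the NumPy reference
    return abs(approx - ref) / ref

for x in (-20, -15, -10, -5, -1, 1, 5, 10, 15, 20):
    print(x, rel_err(exp_series(x), x), rel_err(exp_reciprocal(x), x))
```

For negative x the series alternates in sign with terms as large as about 4e7 at x = −20, so the rounding error of those terms swamps the tiny result. The reciprocal form only ever sums positive terms.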
(c) Yes, in principle. By placing the positive terms and the negative terms in separate arrays and summing them up at the end, it is possible to produce a lower relative error, because subtraction is performed only once, during the final return, when the two array sums are combined. This reduces the risk of cancellation. However, my rewritten function did not outperform the previous one, due to the difference in magnitude between the terms, which resulted in a loss of precision. Because each array holds only every other term, the smaller terms lose more precision during addition, when terms with different exponents are aligned, producing an inaccurate answer.
    x      relative error
    -20    27.92
    -15    1.47e-12
    -10    9.36e-08
    -5     1.47e-12
    -1     0.0
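The rearrangement described in (c) might look like the sketch below. The fixed term count `n_terms` and the function name are hypothetical, and this is not necessarily the function that produced the table above.

```python
import math

def exp_split(x, n_terms=200):
    """Sum positive and negative series terms separately; subtract only once."""
    pos, neg = [1.0], []
    term = 1.0
    for n in range(1, n_terms):
        term *= x / n                       # x^n / n!
        (pos if term >= 0 else neg).append(term)
    return sum(pos) + sum(neg)              # the single cancellation step

for x in (-20.0, -15.0, -10.0, -5.0, -1.0):
    ref = math.exp(x)
    print(x, abs(exp_split(x) - ref) / ref)
```

The final step still subtracts two nearly equal large sums when x is very negative, so the cancellation is moved to the end rather than removed.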
Problem 6:
(a) Yes. The table below lists the relative error of the one-pass formula minus that of the two-pass formula, showing that two-pass mostly outperforms one-pass.
    input  relative error (one-pass − two-pass)
    100    -0.002
    200    0.337
    300    0.006
    400    0.0250
    500    0.185
    600    0.093
    700    0.135
    800    0.045
    900    0.0138
(b) Yes, the difference between the two formulas lies inside the summations: $(X_i - \bar{x})^2$ vs. $X_i^2 - n\bar{x}^2$.
We can see that the two-pass formula suffers less from cancellation because it computes the difference of two values before squaring it, so the squared quantity is small. The one-pass formula, in contrast, squares the terms first and only then takes the difference, which increases the chance of cancellation. Therefore, an input for which $(X_i - \bar{x})^2$ is very small compared with $X_i^2$ would dramatically illustrate the numerical difference between one-pass and two-pass.
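A small data set with a large mean and a small spread shows the effect. The four values below are a hypothetical example of such an input (exact sample variance 30), and I am assuming the usual n − 1 normalization for both formulas.

```python
def variance_two_pass(xs):
    # First pass: mean. Second pass: sum of squared deviations.
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def variance_one_pass(xs):
    # Single pass: accumulate sum and sum of squares, subtract at the end.
    n = len(xs)
    s = 0.0
    sq = 0.0
    for x in xs:
        s += x
        sq += x * x
    mean = s / n
    return (sq - n * mean * mean) / (n - 1)

# Hypothetical input: large mean (1e9 + 10), small deviations (-6, -3, 3, 6).
data = [1e9 + d for d in (4.0, 7.0, 13.0, 16.0)]

print(variance_two_pass(data))  # 30.0
print(variance_one_pass(data))
```

Here the squares are around 1e18, where adjacent doubles are more than 100 apart, so the one-pass difference of about 90 cannot be resolved. The two-pass deviations are small integers and are computed exactly.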
Problem 7:
The inconsistency between the results of the polynomial and its expanded form arises because the expanded form computes a small quantity as the difference of large quantities. Rounding error then dominates the result, and accuracy is lost to cancellation.
In the first form, $(x - 1)^6$, the program rounds only twice. In the second form, $x^6 - 6x^5 + 15x^4 - 20x^3 + 15x^2 - 6x + 1$, the program has to round many times before reaching the final result, and because the terms nearly cancel to zero, much of the information is lost to rounding.
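The two forms can be compared directly near x = 1. The grid of sample points below (201 points in [0.998, 1.002]) is a hypothetical choice for illustration and not necessarily the interval from the assignment.

```python
def factored(x):
    # One subtraction and one power: (x - 1)^6.
    return (x - 1) ** 6

def expanded(x):
    # Seven terms of magnitude up to 20 that nearly cancel near x = 1.
    return x**6 - 6*x**5 + 15*x**4 - 20*x**3 + 15*x**2 - 6*x + 1

# Hypothetical sample grid around x = 1.
xs = [0.998 + i * 0.00002 for i in range(201)]

max_diff = max(abs(expanded(x) - factored(x)) for x in xs)
negatives = sum(1 for x in xs if expanded(x) < 0)

print("largest |expanded - factored|:", max_diff)
print("grid points where expanded < 0:", negatives)
```

The factored form can never be negative, so any negative value from the expanded form is pure rounding noise. On this grid the true values are below 1e-16, smaller than the rounding error of the individual terms.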