CHAPTER 1 Mathematical Preliminaries and Error Analysis

This produces

0.99995000042
0.99995000000

To show both the function (in black) and the polynomial (in cyan) near $x_0 = 0$, we enter

plot((f, p3), x = -2..2)

and obtain the Maple plot shown in Figure 1.11.

[Figure 1.11: Maple plot of f and p3 for x between -2 and 2.]

The integrals of f and the polynomial are given by

q1 := int(f, x = 0..0.1); q2 := int(p3, x = 0..0.1)

0.099833416647
0.099833333333

We assigned the names q1 and q2 to these values so that we could easily determine the error with the command

err := |q1 - q2|

$8.3314 \times 10^{-8}$

There is an alternate method for generating the Taylor polynomials within the NumericalAnalysis subpackage of Maple's Student package. This subpackage will be discussed in Chapter 2.

EXERCISE SET 1.1

1. Show that the following equations have at least one solution in the given intervals.
   a. $x \cos x - 2x^2 + 3x - 1 = 0$, [0.2, 0.3] and [1.2, 1.3]
   b. $(x - 2)^2 - \ln x = 0$, [1, 2] and [e, 4]

Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
   c. $2x \cos(2x) - (x - 2)^2 = 0$, [2, 3] and [3, 4]
   d. $x - (\ln x)^x = 0$, [4, 5]
2. Find intervals containing solutions to the following equations.
   a. $x - 3^{-x} = 0$
   b. $4x^2 - e^x = 0$
   c. $x^3 - 2x^2 - 4x + 2 = 0$
   d. $x^3 + 4.001x^2 + 4.002x + 1.101 = 0$
3. Show that $f'(x)$ is 0 at least once in the given intervals.
   a. $f(x) = 1 - e^x + (e - 1)\sin((\pi/2)x)$, [0, 1]
   b. $f(x) = (x - 1)\tan x + x \sin \pi x$, [0, 1]
   c. $f(x) = x \sin \pi x - (x - 2)\ln x$, [1, 2]
   d. $f(x) = (x - 2)\sin x \ln(x + 2)$, [-1, 3]
4. Find $\max_{a \le x \le b} |f(x)|$ for the following functions and intervals.
   a. $f(x) = (2 - e^x + 2x)/3$, [0, 1]
   b. $f(x) = (4x - 3)/(x^2 - 2x)$, [0.5, 1]
   c. $f(x) = 2x \cos(2x) - (x - 2)^2$, [2, 4]
   d. $f(x) = 1 + e^{-\cos(x - 1)}$, [1, 2]
5. Use the Intermediate Value Theorem 1.11 and Rolle's Theorem 1.7 to show that the graph of $f(x) = x^3 + 2x + k$ crosses the x-axis exactly once, regardless of the value of the constant k.
6. Suppose $f \in C[a, b]$ and $f'(x)$ exists on (a, b). Show that if $f'(x) \ne 0$ for all x in (a, b), then there can exist at most one number p in [a, b] with $f(p) = 0$.
7. Let $f(x) = x^3$.
   a. Find the second Taylor polynomial $P_2(x)$ about $x_0 = 0$.
   b. Find $R_2(0.5)$ and the actual error in using $P_2(0.5)$ to approximate $f(0.5)$.
   c. Repeat part (a) using $x_0 = 1$.
   d. Repeat part (b) using the polynomial from part (c).
8. Find the third Taylor polynomial $P_3(x)$ for the function $f(x) = \sqrt{x + 1}$ about $x_0 = 0$. Approximate $\sqrt{0.5}$, $\sqrt{0.75}$, $\sqrt{1.25}$, and $\sqrt{1.5}$ using $P_3(x)$, and find the actual errors.
9. Find the second Taylor polynomial $P_2(x)$ for the function $f(x) = e^x \cos x$ about $x_0 = 0$.
   a. Use $P_2(0.5)$ to approximate $f(0.5)$. Find an upper bound for the error $|f(0.5) - P_2(0.5)|$ using the error formula, and compare it to the actual error.
   b. Find a bound for the error $|f(x) - P_2(x)|$ in using $P_2(x)$ to approximate $f(x)$ on the interval [0, 1].
   c. Approximate $\int_0^1 f(x)\,dx$ using $\int_0^1 P_2(x)\,dx$.
   d. Find an upper bound for the error in (c) using $\int_0^1 |R_2(x)|\,dx$, and compare the bound to the actual error.
10. Repeat Exercise 9 using $x_0 = \pi/6$.
11. Find the third Taylor polynomial $P_3(x)$ for the function $f(x) = (x - 1)\ln x$ about $x_0 = 1$.
   a. Use $P_3(0.5)$ to approximate $f(0.5)$. Find an upper bound for the error $|f(0.5) - P_3(0.5)|$ using the error formula, and compare it to the actual error.
   b. Find a bound for the error $|f(x) - P_3(x)|$ in using $P_3(x)$ to approximate $f(x)$ on the interval [0.5, 1.5].
   c. Approximate $\int_{0.5}^{1.5} f(x)\,dx$ using $\int_{0.5}^{1.5} P_3(x)\,dx$.
   d. Find an upper bound for the error in (c) using $\int_{0.5}^{1.5} |R_3(x)|\,dx$, and compare the bound to the actual error.
12. Let $f(x) = 2x \cos(2x) - (x - 2)^2$ and $x_0 = 0$.
   a. Find the third Taylor polynomial $P_3(x)$, and use it to approximate $f(0.4)$.
   b. Use the error formula in Taylor's Theorem to find an upper bound for the error $|f(0.4) - P_3(0.4)|$. Compute the actual error.
   c. Find the fourth Taylor polynomial $P_4(x)$, and use it to approximate $f(0.4)$.
   d. Use the error formula in Taylor's Theorem to find an upper bound for the error $|f(0.4) - P_4(0.4)|$. Compute the actual error.
13. Find the fourth Taylor polynomial $P_4(x)$ for the function $f(x) = xe^{x^2}$ about $x_0 = 0$.
   a. Find an upper bound for $|f(x) - P_4(x)|$, for $0 \le x \le 0.4$.
   b. Approximate $\int_0^{0.4} f(x)\,dx$ using $\int_0^{0.4} P_4(x)\,dx$.
   c. Find an upper bound for the error in (b) using $\int_0^{0.4} P_4(x)\,dx$.
   d. Approximate $f'(0.2)$ using $P_4'(0.2)$, and find the error.
14. Use the error term of a Taylor polynomial to estimate the error involved in using $\sin x \approx x$ to approximate $\sin 1^\circ$.
15. Use a Taylor polynomial about $\pi/4$ to approximate $\cos 42^\circ$ to an accuracy of $10^{-6}$.
16. Let $f(x) = e^{x/2} \sin(x/3)$. Use Maple to determine the following.
   a. The third Maclaurin polynomial $P_3(x)$.
   b. $f^{(4)}(x)$ and a bound for the error $|f(x) - P_3(x)|$ on [0, 1].
17. Let $f(x) = \ln(x^2 + 2)$. Use Maple to determine the following.
   a. The Taylor polynomial $P_3(x)$ for f expanded about $x_0 = 1$.
   b. The maximum error $|f(x) - P_3(x)|$, for $0 \le x \le 1$.
   c. The Maclaurin polynomial $\tilde{P}_3(x)$ for f.
   d. The maximum error $|f(x) - \tilde{P}_3(x)|$, for $0 \le x \le 1$.
   e. Does $P_3(0)$ approximate $f(0)$ better than $\tilde{P}_3(1)$ approximates $f(1)$?
18. Let $f(x) = (1 - x)^{-1}$ and $x_0 = 0$. Find the nth Taylor polynomial $P_n(x)$ for $f(x)$ about $x_0$. Find a value of n necessary for $P_n(x)$ to approximate $f(x)$ to within $10^{-6}$ on [0, 0.5].
19. Let $f(x) = e^x$ and $x_0 = 0$. Find the nth Taylor polynomial $P_n(x)$ for $f(x)$ about $x_0$. Find a value of n necessary for $P_n(x)$ to approximate $f(x)$ to within $10^{-6}$ on [0, 0.5].
20. Find the nth Maclaurin polynomial $P_n(x)$ for $f(x) = \arctan x$.
21. The polynomial $P_2(x) = 1 - \frac{1}{2}x^2$ is to be used to approximate $f(x) = \cos x$ in $[-\frac{1}{2}, \frac{1}{2}]$. Find a bound for the maximum error.
22. The nth Taylor polynomial for a function f at $x_0$ is sometimes referred to as the polynomial of degree at most n that "best" approximates f near $x_0$.
   a. Explain why this description is accurate.
   b. Find the quadratic polynomial that best approximates a function f near $x_0 = 1$ if the tangent line at $x_0 = 1$ has equation $y = 4x - 1$, and if $f''(1) = 6$.
23. Prove the Generalized Rolle's Theorem, Theorem 1.10, by verifying the following.
   a. Use Rolle's Theorem to show that $f'(z_i) = 0$ for $n - 1$ numbers in [a, b] with $a < z_1 < z_2 < \cdots < z_{n-1} < b$.
   b. Use Rolle's Theorem to show that $f''(w_i) = 0$ for $n - 2$ numbers in [a, b] with $z_1 < w_1 < z_2 < w_2 < \cdots < w_{n-2} < z_{n-1} < b$.
   c. Continue the arguments in a. and b. to show that for each $j = 1, 2, \ldots, n - 1$ there are $n - j$ distinct numbers in [a, b] where $f^{(j)}$ is 0.
   d. Show that part c. implies the conclusion of the theorem.
24. In Example 3 it is stated that for all x we have $|\sin x| \le |x|$. Use the following to verify this statement.
   a. Show that for all $x \ge 0$ the function $f(x) = x - \sin x$ is non-decreasing, which implies that $\sin x \le x$ with equality only when $x = 0$.
   b. Use the fact that the sine function is odd to reach the conclusion.
25. A Maclaurin polynomial for $e^x$ is used to give the approximation 2.5 to e. The error bound in this approximation is established to be $E = \frac{1}{6}$. Find a bound for the error in E.
26. The error function defined by
$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt$$
gives the probability that any one of a series of trials will lie within x units of the mean, assuming that the trials have a normal distribution with mean 0 and standard deviation $\sqrt{2}/2$. This integral cannot be evaluated in terms of elementary functions, so an approximating technique must be used.
   a. Integrate the Maclaurin series for $e^{-x^2}$ to show that
$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k+1}}{(2k + 1)\,k!}.$$
   b. The error function can also be expressed in the form
$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\, e^{-x^2} \sum_{k=0}^{\infty} \frac{2^k x^{2k+1}}{1 \cdot 3 \cdot 5 \cdots (2k + 1)}.$$
      Verify that the two series agree for k = 1, 2, 3, and 4. [Hint: Use the Maclaurin series for $e^{-x^2}$.]
   c. Use the series in part (a) to approximate erf(1) to within $10^{-7}$.
   d. Use the same number of terms as in part (c) to approximate erf(1) with the series in part (b).
   e. Explain why difficulties occur using the series in part (b) to approximate erf(x).
27. A function $f : [a, b] \to \mathbb{R}$ is said to satisfy a Lipschitz condition with Lipschitz constant L on [a, b] if, for every $x, y \in [a, b]$, we have $|f(x) - f(y)| \le L|x - y|$.
   a. Show that if f satisfies a Lipschitz condition with Lipschitz constant L on an interval [a, b], then $f \in C[a, b]$.
   b. Show that if f has a derivative that is bounded on [a, b] by L, then f satisfies a Lipschitz condition with Lipschitz constant L on [a, b].
   c. Give an example of a function that is continuous on a closed interval but does not satisfy a Lipschitz condition on the interval.
28. Suppose $f \in C[a, b]$, and that $x_1$ and $x_2$ are in [a, b].
   a. Show that a number $\xi$ exists between $x_1$ and $x_2$ with
$$f(\xi) = \frac{f(x_1) + f(x_2)}{2} = \frac{1}{2} f(x_1) + \frac{1}{2} f(x_2).$$
   b. Suppose that $c_1$ and $c_2$ are positive constants. Show that a number $\xi$ exists between $x_1$ and $x_2$ with
$$f(\xi) = \frac{c_1 f(x_1) + c_2 f(x_2)}{c_1 + c_2}.$$
   c. Give an example to show that the result in part b. does not necessarily hold when $c_1$ and $c_2$ have opposite signs with $c_1 \ne -c_2$.
29. Let $f \in C[a, b]$, and let p be in the open interval (a, b).
   a. Suppose $f(p) \ne 0$. Show that a $\delta > 0$ exists with $f(x) \ne 0$, for all x in $[p - \delta, p + \delta]$, with $[p - \delta, p + \delta]$ a subset of [a, b].
   b. Suppose $f(p) = 0$ and $k > 0$ is given. Show that a $\delta > 0$ exists with $|f(x)| \le k$, for all x in $[p - \delta, p + \delta]$, with $[p - \delta, p + \delta]$ a subset of [a, b].

1.2 Round-off Errors and Computer Arithmetic

The arithmetic performed by a calculator or computer is different from the arithmetic in algebra and calculus courses. You would likely expect that statements such as $2 + 2 = 4$, $4 \cdot 8 = 32$, and $(\sqrt{3})^2 = 3$ are always true. However, with computer arithmetic we expect exact results for $2 + 2 = 4$ and $4 \cdot 8 = 32$, but we will not have precisely $(\sqrt{3})^2 = 3$. To understand why this is true we must explore the world of finite-digit arithmetic.
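The three statements above can be tried directly on a machine. The following is a minimal sketch in Python rather than Maple (a choice made here only for illustration), and it assumes the interpreter stores floats in the usual 64-bit binary format, as standard CPython does. The variable names are hypothetical.

```python
import math

# Small integer-valued sums and products are exactly representable,
# so these two comparisons are expected to hold exactly.
sum_exact = (2.0 + 2.0 == 4.0)
product_exact = (4.0 * 8.0 == 32.0)

# sqrt(3) is irrational, so only a nearby machine number is stored;
# squaring that approximation need not give back exactly 3.
root3 = math.sqrt(3.0)
square = root3 * root3
square_exact = (square == 3.0)

# Size of the discrepancy between the computed square and 3.
gap = abs(square - 3.0)

print(sum_exact, product_exact, square_exact)
print(gap)
```

The discrepancy is tiny compared with 3, which is why this kind of error usually passes without notice.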
In our traditional mathematical world we permit numbers with an infinite number of digits. The arithmetic we use in this world defines $\sqrt{3}$ as that unique positive number that when multiplied by itself produces the integer 3. In the computational world, however, each representable number has only a fixed and finite number of digits. This means, for example, that only rational numbers (and not even all of these) can be represented exactly. Since $\sqrt{3}$ is not rational, it is given an approximate representation, one whose square will not be precisely 3, although it will likely be sufficiently close to 3 to be acceptable in most situations. In most cases, then, this machine arithmetic is satisfactory and passes without notice or concern, but at times problems arise because of this discrepancy.

Error due to rounding should be expected whenever computations are performed using numbers that are not powers of 2. Keeping this error under control is extremely important when the number of calculations is large.

The error that is produced when a calculator or computer is used to perform real-number calculations is called round-off error. It occurs because the arithmetic performed in a machine involves numbers with only a finite number of digits, with the result that calculations are performed with only approximate representations of the actual numbers. In a computer, only a relatively small subset of the real number system is used for the representation of all the real numbers. This subset contains only rational numbers, both positive and negative, and stores the fractional part, together with an exponential part.

Binary Machine Numbers

In 1985, the IEEE (Institute of Electrical and Electronics Engineers) published a report called Binary Floating Point Arithmetic Standard 754-1985. An updated version was published in 2008 as IEEE 754-2008.
This provides standards for binary and decimal floating point numbers, formats for data interchange, algorithms for rounding arithmetic operations, and for the handling of exceptions. Formats are specified for single, double, and extended precisions, and these standards are generally followed by all microcomputer manufacturers using floating-point hardware.

A 64-bit (binary digit) representation is used for a real number. The first bit is a sign indicator, denoted s. This is followed by an 11-bit exponent, c, called the characteristic, and a 52-bit binary fraction, f, called the mantissa. The base for the exponent is 2.

Since 52 binary digits correspond to between 16 and 17 decimal digits, we can assume that a number represented in this system has at least 16 decimal digits of precision. The exponent of 11 binary digits gives a range of 0 to $2^{11} - 1 = 2047$. However, using only positive integers for the exponent would not permit an adequate representation of numbers with small magnitude. To ensure that numbers with small magnitude are equally representable, 1023 is subtracted from the characteristic, so the range of the exponent is actually from -1023 to 1024.

To save storage and provide a unique representation for each floating-point number, a normalization is imposed. Using this system gives a floating-point number of the form
$$(-1)^s\, 2^{c - 1023}\, (1 + f).$$

Illustration Consider the machine number

0 10000000011 1011100100010000000000000000000000000000000000000000.

The leftmost bit is s = 0, which indicates that the number is positive. The next 11 bits, 10000000011, give the characteristic and are equivalent to the decimal number
$$c = 1 \cdot 2^{10} + 0 \cdot 2^9 + \cdots + 0 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0 = 1024 + 2 + 1 = 1027.$$
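The decoding in the Illustration can be checked mechanically. The sketch below, written in Python for illustration (the text itself uses Maple), evaluates $(-1)^s 2^{c-1023}(1 + f)$ from the three bit fields and compares the result with the value the hardware assigns to the same 64 bits. The function name `decode` and the constant names are hypothetical. The mantissa is entered as its leading bits 101110010001 and padded with zeros to 52 bits, and only normalized numbers are handled.

```python
import struct

# Bit fields of the machine number from the Illustration.
SIGN = "0"
CHARACTERISTIC = "10000000011"
# Leading mantissa bits, zero-padded on the right to the full 52 bits.
MANTISSA = "101110010001".ljust(52, "0")


def decode(sign_bits, char_bits, mant_bits):
    """Evaluate (-1)^s * 2^(c - 1023) * (1 + f) for a normalized number."""
    s = int(sign_bits, 2)
    c = int(char_bits, 2)
    # f is the binary fraction: bit k (counting from 1) contributes 2^(-k).
    f = sum(int(bit) * 2.0 ** -k for k, bit in enumerate(mant_bits, start=1))
    value = (-1) ** s * 2.0 ** (c - 1023) * (1.0 + f)
    return s, c, f, value


s, c, f, value = decode(SIGN, CHARACTERISTIC, MANTISSA)

# Cross-check: reinterpret the same 64 bits as a big-endian double.
raw = int(SIGN + CHARACTERISTIC + MANTISSA, 2)
hardware_value = struct.unpack(">d", struct.pack(">Q", raw))[0]

print(c, f, value, hardware_value)
```

The computed characteristic agrees with c = 1027 above, so the exponent is 1027 - 1023 = 4 and the number is $2^4 (1 + f)$.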