1.2 Roundoff Errors and Computer Arithmetic 19 The leftmost bit is zero,which indicates that the number is positive.The next 11 bits, 10000000011,giving the characteristic,are equivalent to the decimal number c=1.210+0.29+.+0.2+1.2+1.2°=1024+2+1=1027 The expontial pr of the umber is thereforeThe fn5bspecify that the mantissa i 1=1()+1()+1()+1()-1()+1()” As a consequence,this machine number precisely represents the decimal number -xm+D=9.2@-m(+(++6+克+6+0》 =27.56640625. However,the next smallest machine number is 01000000001:101110010000111111111111111111111111111111111111111. and the next largest machine number is 0100000000111011100100010000000000000000000000000000000000000001. ea2股a5初3 sents not only 27.56640625.but also neighbors.To be precise,it represents any real number in the interval [27.5664062499999982236431605997495353221893310546875, 27.5664062500000017763568394002504646778106689453125) The smallest normalized positive number that can be represented has all Os except for the rightmost bit of and is equivalent to 2102.(1+2-2)≈10-308 and the largest has a leading 0 followed by all 1s and is equivalent to 21024.(2-2-52)10308 Numbers occurring in calculations that have a magnitude less than 2-1023.(1+2-52)result in underflow and are generally set to zero.Numbers greater than 21024.(2-2-52)result in overflow and typi cally cause the co omputations to halt The use of binar d gits tends t omputational diffculties that occur wher ection of machine num ers is us sed to ret the rea o ex s,we now assume,for simplicity,that machine numbers are represented ir the normalized decimal floating-point form ±0.dd.dk×10,1≤d≤9,and0≤4≤9, for each i =2.,k.Numbers of this form are called k-digit decimal machine numbers
20 C HA P T E R 1.Mathematical Prelliminaries to thiv rcal number within the numenica range of the machine can be nommalized y=0.dd2.d4d4+d+2.×10 The floating-point form of y,denoted fl(y),is obtained by terminating the mantissa of y at k decimal digits.There are two ways of performing this termination.One method,called chopping.is to simply chop off the digitstoobtain f1(y)=0.d2.d×10. The other method,called rounding,adds 5x)toy and then chops the result to obtain a number of the form f0y)=0.81d2.d×10 So,when rounding,ifd5.we add 1 to d to obtain fl(y):that is,we round up When d+<5,we merely chop off all but the first k digits;so we round down.If we round down,then 6,=d,for each i =1.2.,k.However,if we round up,the digits might change. EXAMPLE 1 The number has an infinite decimal expansion of the form=3.14159265.Written in normalized decimal form,we have T=0.314159265.×102 The floating-point form of using five-digit chopping is f(r)=0.31415×10=3.1415. Since the sixth digit of the decimal expansion of is a 9,the floating-point form of r using five-digit rounding is f1(π)=(0.31415+0.0000I)×103=3.1416. ◆ The error that results from replacing a number with its floating-point form is called roundoff error(regardless of whether the rounding or chopping method is used).The following definition describes two inethods for measuring approximation errors. If p*is an approximation to p,the absolute error is Ip-p"l,and the relative error is ,provided that p≠o. Ipl ◆ Consider the absolute and relative errors in representing p by p'in the following example EXAMPLE 2 afp=0.3000×10
1.2 Roundoff Erors and Computer Arithmetic 个 b. This example shows that the same relative error,0.3333 x 10-,occurs for widely varying absolute errors.As a measure of accuracy,the absolute error can be misleading and the relative error more meaningful since the relative error takes into consideration the size of the value. The following definition uses relative error to give a measure of significant digits of accuracy for an approximation Definition 1.16 The number p*is said to approximate p to t significant digits (or figures)if r is the largest nonnegative integer for which ip-p*l <5×10 p Table 1.1 illustrates the continuous nature of significant digits by listing,for the various values of p,the least upper bound of lp-p"l,denoted max lp-pl,when p*agrees with p to four significant digits. Table 1.1 10.1 0.5 10010005000999010000 maxp-p10.000050.000250.050.52.54.9955. Returning to the machine representation of numbers,we see that the floating-point representation fl(y)for the number y has the relative error ly-fl(y) Ifk decimai digits and chopping are used for the machine representation of y=0,d1d.dd+1.×10, then y-fi(y_|0.d1.dd+.×10-0.dd.d4×10 y 0.d4k.×l0 =|0d44142.x10m- 0.de+14+2. 0.d1d2.×10 =0.dh. ×10 Sinced0.the minimal value of the denominator is 0.1.The numerator is bounded above by 1 As a consequence y-fIy) y ≤07×10=10+
22 C H A P T E R 1.Mathematical Preliminarles In a similar manner,a bound for the relative error when using k-digit rounding arithmetic is0.5×10- (See Exercise 24.) Note that the bounds for the relative error using k-digit arithmetic are independent of the number being represented.This result is due to the manner in which the machine numbers are distributed along the real line.Because of the exponential form of the char- acterstic,the same number of decimal machine numbers is used to represent each of the yas0.1,.[L,0.and[l0.100. within the limits of the ine.the num ber of de a bine numbers in ,10】is cons ate representation of numbers, putersnot exact.The anthmtic involves manipulati nyitshir logical.operations.Since the actual mechanics of these operations are not pertinent to this presentation,we shall devise our own approximation to computer arithmetic.Although our urithmetic will not give the exact picturc,it suffices to explain the problems that occur.(For an explanation of the manipulations actually involved,the reader is urged to consult more echnically oriented computer science texts.such as [Ma].Computer System Architecture.) entations fl(x)and fl(y)are given for the real numbe nd yand that ,日,8. addition tion,ultipli will assume a finite-digi arithmetic given by x⊕y=fl(f1(x)+f(y)以.x8y=f1(fx)×f1(y). xey=f1(f(x-f(y),xy=fi(f(x)÷f(v) This arithmetic corresponds to performing exact arithmetic on the floating-point repre sentations of x and y and then converting the exact result to its finite-digit floating-point representation. Rounding arithmetic is easily implemented in a CAS.The Maple command >Digits:=t; causes all arithnictic to be rounded to t digits.For example.flifl(x)+fl(y))is performed using t-digit rounding arithmetic by >evalf(evalf(x)+evalf(y)); Implementingt-digit ch etic is more difficult and requires a sequence of steps or a procedure.Exercise EXAMPLE 3 Suppose thatx =3,y=,and that five-digit chopping is used for arithmetic calculations involving x and y.Table 1.2 lists the values of these computer-type operations on fl()= 0.71428×10 and fl(y)-0.33333×10 ◆ Table 1.2 Operation Result Actual valve Absolute error Relative error 0.10476×10 22/21 0190×10 0.182×104 038095×10 8/21 0.238×10- 0.625×10- x31 0.23809×10 5/21 0.524×10- 0.220×10 yex 0.21428×10 15/2 0.571×109 0.267×10
1.2 Roundoff Errors and Computer Arlthmetic 的 Since the maximum relative error for the operations in Example 3 is 0. actory fiv Its S h ver,that we also hav 01111× fl()=0.98765 x 10,and fl(w)=0.11111x10-4.(These numbers were chosen to illustrate some problems that can arise with finite-digit arithmetic.) In Table 1.3,xu results in a small absolute error but a large relative error.The sub sequent division by the small number w or multiplication by the large number u magnifies the absolute error without modifying the relative error.The addition ol the large and sinall numbers and v produces large absolute error but not large relative error. Table 1.3 Operation Result Actual vatue Absolute crror Relative error x白 0.30000×104 0.34714×10 0.471×10 0.136 (xeu)段u 0.29629×10 0.34285×I0 0.465 0.13G (xGw)②U 0.29629×10 0.34285×10 0.465 4出U 0.98765×10 0.98766×10 0.161x10 0.163×10 One of the common err -producing calculations involves the cancellation of su racti arly equal numbe s.Suppose wo numberx and y.withy,have the k-digit representation fl(x)=0.did.dpuptiap+2.ax l and fy)=0.dd.d,p+p+2.月×10 The floating-point form of x-y is fl(fl(x)-fl(y))=0.ap+p+2.x l P, where 0.0p+10p+2,.om=0.p+1cn+2.ak-0.6p+1fp2.fBe The foating-point However,in most calcula y will be ed k digi with the last being either zero or randomly assigned.Any further calc the problem of having only k-p digits of significance.since a chain of calculations is no more accurate than its weakest portion. It a finite-digit representation or calculation introduces an error,further enlargement of the error occurs when dividing by a number with small magnitude (or,equivalently.when multiplying by a number with large magnitude).Suppose.for example,that he numher oximationz where the or 8 is introduced by representation or by previc divide by e =10-,whcre n U.Then ≈fn(e 】=(2+8)×10. fi(E)