Experiment 1. Round-off Errors

Suppose that (for some reason) we need to evaluate the definite integral

In = ó
õ
1

0 
xn ex - 1  dx      (1.1)(PopUp1.1)
for several integers n ³ 0. What is a ``good'' method to numerically evaluate In?

Although this is a specific mathematical problem, our main purpose in beginning this book with such a simple problem is not to actually evaluate In. Instead, our objectives in this experiment are the following:

·
We want to show that there are often two or more methods that can be use to solve a mathematical problem on the computer. It is important to think beforehand about the various methods and to determine which are ``good'' and which are ``not good''. To do this, a little theory is often helpful.
·
We want to study a recursive formula and the types of behavior that can arise when solving such a formula. It is quite common that numerical algorithms use recursive formulas and so we begin with a particularly simple one.
·
The accuracy of a method, and the errors inherent in it, should always be considered before putting the problem on the computer. Here we begin with a very simple problem, where the accuracy can be easily evaluated - and the results are quite surprising.
·
We begin with a formula which has a numerical instability in order to point out that even incredibly simple formulas require thought - and a little theory.
·
We discuss what it means for a mathematical model to be ``well-behaved'' and for a numerical algorithm to be ``well-behaved''.
·
In the first two experiments we also focus on the computer programs themselves. We explain the structure of the programs and how input and output are handled. Some of the topics discussed are:
·
single precision vs. double precision.
·
the machine epsilon.
·
passing parameters to subroutine and function subprograms.
·
input and output.
·
some basic errors which are often make in writing programs, and how to avoid them.
Section 1.1. Various Methods of Evaluation

Subsection 1.1.1. ``Standard'' Methods

Before we even consider evaluating In numerically, we should study In analytically. First, we determine some bounds for In. Since xn e-1 < xn ex-1 < xn for x Î (0,1) we obtain the bound

e-1
n+1
< In < 1
n+1
 ,     (1.2)
so that limn ® ¥ In = 0. Second, since In is a strictly monotonically decreasing function of n, xn+1 < xn for all x Î (0,1). These are two simple facts that are readily apparent from the integral 1.1, and that will be useful later.

Before deciding how to evaluate this integral numerically, there are a number of questions that need to be considered:

·
Is In to be evaluated for one n or for several n? (And, if it will be evaluated several times, how large is ``several''?)
·
How large can n be (i.e., 10 or 100 or 1000 or ¼)?
·
What is the accuracy required in the evaluation of In?
Based on the answers to these questions, we consider various algorithms to evaluate In numerically. Three methods for the evaluation of In come immediately to mind:

(1)
Use integration by parts to calculate the integral analytically.
(2)
Use a numerical integration code.
(3)
Expand ex in a Taylor series, do the integration analytically, and then evaluate the sum numerically.

Method 1 can certainly be done by hand if n is not ``very'' large. It is also possible to have a symbolic manipulation package compute In if n is not ``too'' large, and then input the results into an array in the program.

Method 2 can also be used if n is not ``too'' large. This method can be quite slow if the integrand must be evaluated repeatedly and if high accuracy is required. For large n the slope of the function becomes very large and the area is concentrated near x = 1. This can also increase the number of times the integrand must be evaluated and degrade the accuracy since if n >> 1, the interval [0,1] will have to be subdivided into very many subintervals to obtain an accurate answer.1 (Plot the integrand free-hand to see this.)

Method 3 is quite practical since it is straightforward to write down the Taylor series and then evaluate the integrals analytically. Using the Taylor series expansion for ex the integrand becomes

xn ex-1 = e-1 xn ¥
å
j = 0 
xj
j!
= e-1 ¥
å
j = 0 
xj+n
j!
     (1.3)
so that

In = e-1 ¥
å
j = 0 
1
(j+n+1)  j!
 .     (1.4)
This sum converges quickly and, since all the terms are of one sign, there is no danger of loss of accuracy from subtraction. (Catastrophic cancellation is discussed in Experiment .)

We want to use the method which is the most ``efficient''. There is no precise definition for this term, but generally the most efficient algorithm generates solutions to a predetermined degree of accuracy using the least amount of human and computer effort. Note that Method 3 converges rapidly even for large n, unlike Methods 1 and 2. Thus, in general we expect Method 3 to be the most efficient of the three methods. However, there is another method to consider.

Subsection 1.1.2. The Recursive Method

For this simple integral there is also a fourth method which is much simpler to use.

(4)
Use integration by parts to obtain the recursive formula

In
= xn ex-1 |01 - ó
õ
1

0 
n xn-1 ex-1  dx = 1 - n In-1   for \MS n = 1,2,3,¼      
     (1.5a)
where
I0
= ó
õ
1

0 
ex-1  dx = 1 - 1
e
 .       
     (1.5b)

With this simple recursive formula we can obtain Im for a specific m Î \NATURAL[1,¥), even for very large m. Note that we calculate all m integrals at once (i.e., I0 ® I1 ® I2 ® ¼® Im). This method is very fast and it is almost impossible to code incorrectly. Also, it appears to be very accurate (almost to the machine accuracy of the computer). Thus, either Method 3 or Method 4 appears to be the most efficient, depending on the answers to the questions we first posed.

Using the program round.f we have used Method 4 to calculate In for n = 0,1,¼,20 in single precision on a Sun, and we show the results in Table 1.1. The data is written out in free format as ``WRITE(6,*)N,FINTEG'' so that the numbers are correct to all the digits shown.2 (However, see the discussion in subsection 1.2.3, since this behavior does not occur for all compilers.)


10 11 0     1 In vs. n using the forward recursive formula 1.5 on a Sun.

The computed values are 0 horrible! With such a simple recursive formula it is hard to believe that I12 is negative or that | I20 | is larger than the total human population of this planet! Note that these values seem to be behaving correctly for ``small'' n, but all of a sudden the values begin to grow exponentially in magnitude and to alternate in sign.

The first thought that should immediately come to mind when viewing such results is that there is a bug in the program. However, we assure you that this is not the case. Instead, these results are due to a numerical instability in the algorithm. (Because of the alternating sign in the values, this is often called a sawtooth instability.) The simplest way to confirm that this is a numerical instability is to run the program again but in double precision. We see the same sawtooth instability, but it occurs for larger values of n. (See Question 1.2.1).

What can be the cause of this instability? (After all, it is hard to imagine a much simpler recursive formula.) Suppose that we could evaluate the formula In = 1 - n In-1 exactly, i.e., with no arithmetical errors. Then there would still be one source of error in the calculation of Im for any specific m Î \NATURAL[1,¥), namely the evaluation of I0. This error occurs because e is an irrational number, but it must be represented by a rational number in the computer. We denote this error by \deltaI º I0 - \WH I0 where \WH I0 is the numerical approximation to I0 in the computer. Does this error grow or decay as we calculate I1, I2, I3, ¼?

Every floating-point operation in a computer can introduce errors. However, we simplify our error analysis by assuming that only the calculation of I0 is in error and that every other calculation is done exactly. Let \WH In be the numerically calculated approximation to In for each n so that the initial error is \deltaI = \WH I0 - I0. Then

\WH I0
= I0 + \deltaI
\WH I1
= 1 - \WH I0 = 1 - (I0 + \deltaI) = I1 - \deltaI
\WH I2
= 1 - 2\WH I1 = 1 - 2(I1 - \deltaI) = I2 + 2\ < \deltaI
\WH I3
= 1 - 3\WH I2 = 1 - 3(I2 + 2\ < \deltaI) = I3 - 6\ < \deltaI
\EqualVdots
\WH Im
= Im + (-1)m m! \deltaI       
     (1.6)
Thus, the error grows in magnitude as m! and alternates in sign, which qualitatively explains the results in Table 1.1.

In order to compare the error formula 1.6 with Table 1.1 quantitatively, we need to estimate \deltaI. It is probably on the order of magnitude of the machine epsilon of the computer, which we denote by \me.3 It appears that \deltaI » 5 × 10-7 by comparing the analytical value of I0, namely

I0 = 0.63212 05588 28557 67840¼ ,     (1.7)
with \WH I0 in the table. However, if we print \WH I0 to full accuracy we find that the error is actually \deltaI » 9 × 10-94, so that I15 - \WH I15 » -12,000. This is an excellent approximation to the value in Table 1.1 and shows that the numerical instability is caused by the multiplication of the error by n for each iteration. We emphasize that every mathematical operation in the calculation of Im introduces errors, not just the initial calculation of I0, and so our error estimate is much better than we have any right to expect. However, this simple analysis certainly demonstrates the cause of the numerical instability.

Is there any way to develop a stable numerical algorithm using recursion? 0 Yes! Simply invert the recursive formula to

In-1 = 1 - In
n
    for \MS n = N, N-1, ¼,2,1  .     (1.8a)
Thus, to compute Im for a given m we can work backwards by

IN ® IN-1 ® IN-2 ® ¼ ® Im. It is reasonable to suppose that this backward recursive formula is a stable algorithm because the error is being multiplied by 1/n for each iteration. Of course, the difficulty is that we do not know the initial iterate, IN, for any N Î \NATURAL[1,¥) (except by computing it numerically somehow). We could use Method 3 to calculate IN - but then we could use Method 3 to calculate Im directly. Instead, we use the simple approximation that

IN = 0     (1.8b)
for N ``sufficiently large''. Our error is at worst 1/(N + 1), as we see from the bounds 1.2, and it seems reasonable that this error decays rapidly using backward recursion.

The calculation of I15 performed using this backward recursive formula with N = 30 is shown in Table 1.2. The value of \WH I15 certainly looks correct, but can we trust it? Obviously we have 0.229 \lt I15 \lt 0.625, as is required by our simple inequality 1.2. This gives us some cause for optimism. As a much better test, however, we can continue the backward iterations until we obtain I0. If we do so, we find that I4, I3, ¼, I0 in Table 1.2 agree identically with the values in Table 1.1 to the number of digits shown. (See Question 1.2.2).


20 21 0     1

In vs. n using the backward recursive formula 1.8 on a Sun.

To verify analytically that backward recursion is stable, we repeat the previous error analysis where now \deltaI(N) º IN - \WH IN and we find that

Im - \WH Im = (-1)N-m
N (N-1) (N-2) ¼(m+1)
 \deltaI(N)  .     (1.9)
Letting N = 30 and \WH IN = 0 (so that \deltaI(N) = IN), we find \Absb I15 - \WH I15 < 10-20  \deltaI(N). Of course the actual error in the value is much larger because every arithmetic operation introduces errors. But the errors remain small, as opposed to the ``factorial'' growth of errors using forward recursion. In other words, the errors rapidly decrease, so that the errors that arise in each arithmetic operation should remain on the order of a few \me.

We have now shown how to use Method 4 to efficiently calculate In. However, we still have a few loose ends to tie up. In our discussion of round-off errors we have been somewhat loose with our terminology. When we talk about the error in a numerical calculation we should say explicitly (or implicitly by making it clear by the context) whether we mean the absolute error or the relative error. For example, if the exact solution to some problem is Q = 0.000625678¼ and the number we find in the computer is \WH Q = 0.000625 then the absolute error is

dQ = Q - \WH Q » 7 × 10-7  ,
which is quite small. However, the relative error is

dQ
Q
» 7 × 10-7
6 × 10-4
» 10-3  ,
which is much larger. We say that the computed solution is good to three significant digits or has three digits of accuracy. When we talk about an error being on the order of magnitude of \me we mean a relative error. For example, if we let \WH P denote the floating-point representation of the number P, i.e., the value that is actually stored in the computer, then (as we discuss in Experiment )

ê
ê
ê
P - \WH P
P
ê
ê
ê
< \me  .
Thus, the relative error between Q and \WH Q is many orders of magnitude larger than \me (if \me » 10-7, as it is on a Sun in single precision).

It is important to realize that the ``trick'' of computing In by backward recursion is not merely a pedagogic example. A number of the functions used in scientific computing, such as Chebyshev polynomials (see Expt. ???), Legendre polynomials (see Expt. ???), and Bessel functions (see Expt. ???), are calculated using recursive formulas. The computations of the first two of these functions are stable using forward recursion. However, the computation of Bessel functions requires both forward and backward recursion depending on the values of the parameters. PUT IN OTHER APPLICATIONS???

Subsection 1.1.3. Well-posedness

We have now discussed some examples of ``well-behaved'' and ``badly-behaved'' numerical algorithms which can be used to evaluate In. For example, the backward recursive formula produces ``good'' approximations to In, whereas the forward recursive formulas produces ``bad'' approximations. It might seem that to evaluate a mathematical problem numerically we simply find a ``well-behaved'' numerical algorithm, code it up, and run it. However, life is not this simple - and neither is scientific computing. Many mathematical problems are themselves ``badly-behaved'' and so any resulting numerical algorithms will also be ``badly-behaved''.

In this subsection we digress briefly from our specific mathematical problem and discuss the general concept of what is a ``well-behaved'' mathematical model or problem and what is a ``well-behaved'' numerical algorithm. We then apply this concept specificially to the problem at hand. We consider a general mathematical model by using an operator \sL which converts the known input, represented by f, into the output, represented by y. We can write this mathematical model abstractly as

y = \sL{ f}  .
(f is the independent variable or variables, and y is the dependent variable or variables.)

A mathematical model, denoted by the operator \sL, is well-posed if the output satisfies the following three properties:

(1)
For each input f there exists at least one output y = \sL{ f}.
(2)
For each input f there exists at most one output y. (If properties (1) and (2) are both satisfied, then there exists exactly one output for every input.)
(3)
The output y is ``insensitive'' to small variations in the input.
If a mathematical model is not well-posed, then it is ill-posed. Frequently, the words ``well-conditioned'' and ``ill-conditioned'' are used in place of ``well-posed'' and ``ill-posed''.

The first two properties are quite straightforward. For every input we expect to obtain exactly one output. Also, if we repeatedly use the same input, we expect to obtain exactly the same output. If the mathematical model is linear, this explanation is sufficient. However, if the mathematical model is nonlinear, we have to explain more clearly what we mean by ``exactly one output'', since there is often more than one solution to a given nonlinear problem. As a very simple example, the solutions of x2 - 4 = 0 are x = ±2 so that the problem has two solutions. However, if we only use the larger root (if they are real) or the root with positive real part (if they are complex) then we obtain a unique solution. Note that we can even have an infinite number of roots to a nonlinear problem. For example, the complex solutions to ez = 1 are z = 2pi n for any integer n. Thus, we must replace the requirement of a ``unique solution'' by something more general. We will not attempt to give a precise mathematical description of this generalization. Instead, we simply require that the solutions to a nonlinear problem be ``distinct'' or ``locally unique''. That is, we can give a rule which will allow us to obtain a unique solution.

Since the first two properties are satisfied by almost all mathematical problems, it is the third property which determines whether a mathematical model is ``well-behaved''. In simplest terms, if the input changes by a ``small'' amount, then we want the output to change by a ``small'' amount - or, at least, not to change by a ``large'' amount. Mathematically, the input is now f+ df and the output is y+ dy. We want dy to be ``small'', or at least not ``large'', when df is ``small''.

Well-posedness is a crucial requirement for all the mathematical problems we will discuss for two reasons. First, many of these problems come from physical systems where the input parameters can only be determined to within certain error bounds. If the output varies by a ``large'' amount when the input parameters vary within their allowed bounds, then the mathematical problem is worthless in helping us understand the underlying physical system.

Second, to study the mathematical problem we write and run one or more computer programs. As we discuss at length in Experiment , the computer generally makes small errors in storing floating-point numbers and in performing mathematical operations on these numbers. Again, if the output varies by a ``large'' amount due to these small errors, then the computer program is worthless in helping us study the mathematical problem.

Now we return to our specific problem, namely the calculation of ò01 xn ex-1  dx and show that it is well-posed. This integral does not appear to have any parameters that we can change by a ``small'' amount, so let us generalize the integral to

\cI{f, a, b} = ó
õ
b

a 
f(x)  dx  .
Now we have three input parameters and we recover the original problem by lettingf(x) = xn ex-1, a = 0, and b = 1. If we make small changes in any or all parameter, we will only obtain a small change in the output. For example, suppose we change the upper endpoint by a small amount. Then

\Absb \cI{f, a, b + db } - \cI{f, a, b}
= ê
ê
ê
ó
õ
b+db

b 
f(x)  dx ê
ê
ê
£ ó
õ
b+db

b 
\Absb f(x)  dx
£ db
max
x Î [b,b+db] 
\Absb f(x)  ,
so that a ``small enough'' change in db does lead to a ``small'' change in the value of the integral. Thus, the original mathematical problem is well-posed.

The same cannot be said of the forward recursive formula. The input is I0 and m, and the output is Im, i.e., \sM{I0, m} = Im. From the error formula 1.6, if we change I0 to \WH I0 = I0 + \deltaI, then the output changes to Im + (-1)m m! \deltaI. In the discussion, we assumed that \deltaI arose because we could not store I0 precisely in the computer. However, we can think of \deltaI0 as being an analytical change to determine if the mathematical algorithm is well-posed or ill-posed. Obviously, if m is very small, the output does not change much. However, if m is not very small, the output changes by a large amount. Thus, the well-posedness of the algorithm \sM depends on m. For our purposes, we cannot assume that m is always small, and so we see that \sM is analytically ill-posed. Thus, we cannot expect the numerical implementation of this algorithm to give an accurate answer. (In Question 1.2.13 we explain how the forward recursive formula, which is equivalent to a well-posed integral operator, can be ill-posed.)

Before discussing the backward recursive formula, we briefly digress to explain how our formal notation helps us understand what is happening. There are three different calculations which are necessary to determine whether the mathematical algorithm \sM and/or the numerical algorithm \WH \sM are well-conditioned. The calculation we are actually interested in is \sM{I0, m}, which is the analytical calculation of the mathematical algorithm. Since we cannot actually evaluate this algorithm for arbitrary m, we have to settle for obtaining a numerical value which is as close as possible. This numerical value is \WH \sM{ \WH I0, m } for a given m, and from Table 1.1 it is clear that the numerical algorithm produces results which are not at all close to the analytical results.

What is going wrong? To answer this question, we first consider is whether the mathematical algorithm is well-conditioned. We determine this by comparing \sM{ \WH I0, m } to \sM{ I0, m }. Although we cannot analytically evaluate either algorithm for arbitrary m, we can analytically determine the difference between them. Since this difference grows as m!, it is clear that the mathematical algorithm is ill-conditioned. (If this difference was ``small'', then we would know that the mathematical algorithm was well-conditioned and so the numerical algorithm would have to be ill-conditioned.)

In the backward recursive formula we input N, IN, and m, and obtain the output Im, i.e., \sM 1{ N, IN, m } = Im. We use IN = 0 (but this is not required in the algorithm) and choose N > m large enough that the error in IN is small as shown in the error formula 1.9. (We investigate how large N - m has to be to obtain an accurate answer in Question 1.2.5.) Thus, \sM 1 is a well-posed algorithm. Numerically, each operation in \WH \sM 1{ N, IN, m } introduces new errors, but it seems ``reasonable'' that these errors should remain small. We check this for ``reasonably small'' values of m in Table 1.2. In Question 1.2.5 you are to check this for ``large'' values of m. (Of course, this does not prove that \WH \sM 1 is well-conditioned for ``very large'' value of m, but it does provide very strong supporting evidence.)

We will discuss the posedness of each mathematical problem and each numerical algorithm we use in this workbook. Rather than trying to give a more general discussion of posedness here, we will discuss it in more detail in the various experiments.

Section 1.2. The Program

Each experiment contains a section entitled ``The Program'' which discusses all facets of the computer program itself. (If there is more than one program for an experiment, then each program is discussed in a separate section.) This section always begins with a box which contains the name of the program, along with any other files that are also required. Then there is a discussion of the program itself, which includes a description of what is output by the program. This discussion can be quite lengthy if this program introduces a new ``feature'' which has not been present in previous programs. There is always a subsection entitled ``The Input'' which describes every parameter that needs to be input. This is immediately followed by a subsection entitled ``The Output'' which describes exactly what is printed out and/or plotted. Also, there is always a subsection entitled ``Debugging the Program'' which discusses the methods we used to debug this program. A number of the questions at the end of each experiment ask you to repeat some of these methods in order to convince yourself that the program is ``reliable''.

Note
The numerical output which you see on the computer screen is also stored in a data file with the same name as the program and with the extension ``lpt''. Any graphical output is stored in a data file with the extension ``ps''. We will not restate this in each experiment. However, we will always mention any other data files which are input or output.
Warning
If one of these data files does not exist before a run, then it is created by the program and the data is saved in it. However, if it does exist before a run, then any data which it initially contains is overwritten by the data from this run. If you want to save the output of a run for later use, then change the name of the file immediately after the run.
Each computer program itself also includes some documentation. For example, there is always a brief description at the beginning of the program which explains what it does. Of course, this description is often not easy to read because it is not easy to type mathematical expressions in text files. The discussion in the experiment itself in the workbook is much more complete and should be your main source of information about the program.

In one area the documentation in each computer program is quite detailed. The input parameters for each program are documented just before the lines of code that actually read in the parameters. To find the input parameters in the program, you look for the comment line ``CCCCC LEVEL ?:''. (``Levels'' are discussed in the subsection ``The Input'' which follows.) This line is followed by a description of each input parameter. Then comes a WRITE statement which prints out all the parameters that you are to input. This is followed by a READ statement which reads in the parameters that you enter. Thus, you are always prompted for the parameters that you are to input, so that you do not need to continually refer to the discussion in the experiment. To make inputting the parameters evey easier, the prompt is usually a brief description of each parameter, rather than its actual variable name in the program. (The reason the documention is so detailed in this one area is that we always write the program before the experiment. This documentation is our only description of what the parameters actually mean.)

Now it is time to discuss the program round.f which generates the calculations shown in the two tables we have presented. First, we discuss the main program, which is contained in the file round.f. Later, in subsection 1.2.3 we will discuss the subroutine file subutl.f, which is needed by almost every program in this workbook.

The main program itself is very easy to read and to use, particularly because it is so short. In fact, it is one of the shortest in the entire workbook. There is one feature about it which is generic to all our programs and which we want to emphasize. Note that the actual forward and backward recursive formulas require only two lines of code. Even if we include the DO and CONTINUE statements which cause these formulas to be repeated and the initialization of these formulas, they only require eight lines of code. Since there are well over one hundred lines of code, you can see that most of the program is taken up by documentation or comments, by blank lines which improve the readability of the program by setting off different parts of it, by the input and output statements, and by ``other'' Fortran statements (i.e., statements that are directly connected to the two recursive formulas). When looking at program listings, it is very important to understand that the coded mathematical formulas make up only a very small fraction of the entire program. If you think of a program as a ``reasonably small'' number of separate pieces of code, rather than as a large number of lines of code, you will find it much easier to ``understand'' how the program works.

There is another feature about this program which we want to emphasize. Sometimes, when writing codes programmers try to save a few lines of code at the expense of readability. We always try to make programs as readable as possible, even if this requires adding a number of lines of code. For example, in first part of round.f, which calculates the forward recursive formula, we calculate the initial iterate separately from the other iterates. A shortened version of the program is

ONE = 1.
FINTEG = 1. - EXP(-ONE)
I = 0
WRITE(6,*)' K , I(K) = ',I,FINTEG
DO 50 I = 1,M
FINTEG = 1. - I*FINTEG
WRITE(6,*)' K , I(K) = ',I,FINTEG
50 CONTINUE

However, we want to handle the initial iterate in exactly the same way as the other iterates. Thus, the WRITE statement we use to print the initial iterate is exactly the same as for the other iterates, even down to defining the variable I outside the loop as well as using it for the loop counter. This makes it easier to read the program and to check that it is ``correct''.

There are many other ways that we try to improve the readability of our programs. It is good programming style to have variables ``mean something'' to the programmer. (For example, EPSSSP means ``machine EPSilon Stored in Single Precision'' to us.) This makes it easier to read the code. Thus, TESTDP is a double precision variable which contains \fraction1/3 in either single precision or double precision. Also, IUNIT is the unit to which the output is stored and ISPODP is a variable which determines whether the program is being run in single or double precision.

The variable IUNIT which appears in the call to SPORDP is an example of another feature that we (almost) always use. We prefer to use only variables in subprogram arguments (i.e., no constants). For example, we find it easier to read

IUNIT = 10
CALL SPORDP(TESTDP,IUNIT,ISPODP,EPSSSP)

than

CALL SPORDP(TESTDP,10,ISPODP,EPSSSP)

where IUNIT is the unit to which the WRITE statement is directed in this subroutine. We agree that this adds variables and lines of code to the program. However, the program is now easier to follow.

Also, we prefer to have only one STOP or RETURN statement in a program or subprogram, because we think it makes the code easier to debug. (We almost always use the statement label ``1000'' for this statement.) For example, in a subroutine we can put in WRITE statements to print out the values of all the variables when the subprogram terminates. If we are debugging a running program, we can also put a breakpoint at this statement to stop the execution of the program just before exiting the subprogram. We can then examine the value of any variable.

There are also a number of other more complicated features in this program which are generic to all our programs. However, we leave this discussion to subsection 1.2.3. Instead, we now discuss the input to all our programs and to this program in particular.

Subsection 1.2.1. The Input

In this program there is only one line of input. However, often in later programs there are two or three or four or ... lines of input. Sometimes we need many lines because there are many parameters to input. And sometimes the input parameters for one line are different depending on the input to the previous line. We may even need additional lines of input in certain cases. (For example, we can usually select many different functions to test in a program. Some of these functions require additional parameters while others do not. When additional parameters are required, the next line prompts for them.) Thus, we do not talk about ``lines'' of input, but ``levels'' of input. We highlight which level we are discussing in the text to make it easier to follow. Also, when running the program we can usually move back up one level of input at a time in order to change previously set parameters. In this way, most programs are completely interactive.

In this program, there is only one level of input. However, different parameters are needed for forward and for backward recursion, and so we ``toggle'' back and forth between these two. This program always starts by asking for the input for the forward recursive formula.

When forward recursion is used, the only parameter needed is m. The I\Ftt-70 M that is desired. Calculate, and print, I0, I1, ¼, I\Ftt-70 M. Quit (or end) this program Switch to using backward recursion. You are then prompted for the parameters needed in backward recursion.

When backward recursion is being used, two parameters are needed. Begin the backward iteration with I\Ftt-70 Nmax = 0. (This parameter is denoted by N or Nm in this experiment.) The I\Ftt-70 M that is desired. Calculate, and print,

I\Ftt-70 Nmax, I\Ftt-70 Nmax-1, I\Ftt-70 Nmax-2, ¼, I\Ftt-70 M. Quit (or end) this program Switch to using forward recursion. You are then prompted for the parameter needed in forward recursion.

There are actually three ways to end this program.

·
We have already described how setting \Ftt-70 M = 0 ends this program. However, there are two alternative methods which are quite useful.
·
On many computers, typing `` d'' (i.e., Control-d, which means to hold the control key down while typing ``d'') also ends this program. This is the standard ASCII ``end-of-file'' mark and causes the read statement ``READ(5,*,END=1000,ERR=1000)...'' to pass control to the statement number 1000 which is given by the keyword ``END''. This statement number is the STOP statement and so ends the program. (`` d'' will be more useful in later experiments.)
·
On many computers, entering an incorrect quantity also ends this program. For example, this program is waiting for a number to be entered. If we enter a letter or a special character instead, many compilers will detect that this is an error and cause the read statement to pass control to the statement number 1000 which is given by the keyword ``ERR''.
Note
Most of the programs in this workbook involve more levels of input. Then it is very useful to be able to go down one level of input at a time in order to change some previously entered quantites. On most computers at least one of these two alternative methods can be used. However, neither is guaranteed to work. On some computers these are fatal errors and cause an immediate aborting of the program. However, if they do work, they provide a simple way to bounce around between the levels of input.
Subsection 1.2.2. The Output

In this program the output is very simple. However, since in many programs there is much output, we briefly describe everything that is output (both printed output and plotted output). Recall that printed output appears both on the computer screen and in the data file round.lpt.

The first line of output from the program states whether this program is being run in single precision or in double precision. The second line of output prints out an approximate value for \me. In Experiment  the machine epsilon is discussed in great detail, along with how it determines the maximum accuracy we can expect in numerical calculations. However, we need it here and so we simply print it out.

1

The only output is k and \WH Ik. In most programs we would also output the error in the numerical results, i.e., \WH Ik - Ik. However, we do not have a ``useful'' closed form expression for Ik and so we cannot calculate the error. (For a ``non-useful'' closed form expression for Ik see Question 1.2.13.)

Subsection 1.2.3. Generic ``Features'' of Our Programs

We have already discussed a number of simple features about our programs which are exemplified by round.f. Here we discuss a number of other features which require more explanation.

The two areas we will emphasize in almost every experiment are the accuracy and stability of numerical algorithms and of the resulting programs. One way we study these areas is by switching a program between single precision and double precision. It is important to be able to do this quickly and reliably, and thus to be able to do it in as few operations as possible. This is our first topic.

This program can be run in either single or double precision. Recall that in Fortran all variables which begin with the letters A-H and O-Z are implicitly declared to be real variables unless overwritten by another type declaration, such as an INTEGER or DOUBLE PRECISION statement. We use this fact to make all real variables double precision by putting the statement

IMPLICIT DOUBLE PRECISION (A-H,O-Z)

at the beginning of the program and at the beginning of 0 EACH subprogram in this file. (There are no subroutines in this program, but we are describing all the programs that appear in this workbook.) To convert the program back to single precision, simply comment out each IMPLICIT statement.

Warning
We emphasize the word 0 EACH because the program usually will not work correctly if any subprogram is skipped.
Since we do not want to have to modify anything else, there are a few other details we need to discuss. It is important to use generic names for Fortran library functions when using this IMPLICIT statement to convert between single and double precision. For example, the generic Fortran name for the sine function is SIN, which corresponds to SIN (real), DSIN (double precision), and CSIN (complex). That is, SIN(X) returns a real value if X is real, a double precision value if X is double precision, and a complex value if X is complex. A list of Fortran intrinsic functions is contained in Appendix .

There are two difficulties that remain when using IMPLICIT statements and they both concern floating-point constants. Suppose that Y is a double precision variable. Then the statement ``\MS Y = SIN(1.)'' calculates sin1 in single precision and then converts the single precision value to double precision. Thus, the value in Y is only accurate to single precision. One solution is to use the statement ``\MS Y = SIN(1.D0)''. However, we prefer either

ONE = 1.
Y = SIN(ONE)

or

DATA ONE / 1. /
Y = SIN(ONE)

The reason we do not like to use double precision constants is that there are times when they will fail. For example, the statement ``\MS Y = MIN(A,1.D0)'' only works if A is a double precision variable (since all the arguments to MIN and MAX must have the same precision). Similarly, the statement ``\MS Y = MIN(A,1.)'' only works if A is a single precision variable. Also, as we describe below, it is very dangerous to pass floating-point constants to subprograms in the argument lists.

The second difficulty with constants is the resulting precision when they are used in arithmetic operations. For example, if Y is double precision the statement ``\MS Y = 2./3.'' only evaluates \fraction2/3 in single precision because the operation of division is carried out in single precision (because all of the terms in the operation are real). However, if X is also in double precision then the statement ``\MS Y = 2.*X/3.'' evaluates \fraction2/3 in double precision because one of the terms is double precision. ``\MS Y = 2.d0/3.d0'' gives the correct result in both single and double precision. However, it is not always possible to use double precision constants when a program is being run in single precision (as discussed in the previous paragraph). The solution is to use variables whenever necessary. For example,

DATA TWO / 2. / , THREE / 3. /
Y = TWO/THREE

always gives the correct result.

Difficulties can also occur if floating-point constants are passed in the argument list. Consider the following code fragment.

IMPLICIT DOUBLE PRECISION(A-H,O-Z)
.....
CALL SUB(1.)
.....
SUBROUTINE SUB(Y)
IMPLICIT DOUBLE PRECISION(A-H,O-Z)
.....

The single precision number 1. is being passed to the double precision variable Y. On some computers Y will, in fact, have the value 1 on input to this subroutine. However, on others it will not! (This is discussed in detail in Experiment .) If we change the calling statement to ``CALL SUB(1.D0)'' then Y will always have the correct value. However, if we comment out the IMPLICIT statements then Y will be a single precision variable and so the value of Y might again be incorrect. The solution is to use the statement ``CALL SUB(ONE)'', which will always work correctly.

We emphasize that to change any program in this workbook between single precision and double precision, all that is required is to comment out or un-comment out any IMPLICIT statements in the main program and/or any subprograms (i.e., both function and subroutine subprograms) of the file which contains the main program. No other files need be modified. In particular, no changes need to be made to the subprogram SPORDP. In fact, this is why so many lines of code are required by this subprogram and so many variables are necessary (i.e., TEST13, TESTDP, and EPSSSP).

Since we are performing experiments on a computer program, it is important that everything we do be saved for later reference. We do this by saving everything that appears on the computer screen (even graphics) to one or more data files. To do this, each program begins with an OPEN statement using unit 10. Every WRITE statement that prints data on the screen (i.e., unit 6) is followed by a WRITE statement to unit 10. In this way everything that appears on the screen is stored in a data file for later use.

This program next calls the subroutine SPORDP (which is short for ``Single Precision OR Double Precision'' to check whether this program is being run in single precision (ISPODP = 1) or in double precision (ISPODP = 2). The \me for this precision is also calculated. The precision is then printed on the screen and is also stored in the data file (since IUNIT = 10). There are a number of reasons for using this subroutine:

·
When we first run this program during a session, we do not have to look at the source code to determine the precision.
·
The variable ISPODP can be used later in the program if different options are necessary for different precisions. For example, in Linpack all single precision subroutines begin with the letter ``S'' while all double precision subroutines begin with the letter ``D''. We use IF statements to call the appropriate subroutine based on the value of ISPODP. (Recall we said that no other files need be changed. This is how we do it. SAY MORE???)
·
The machine epsilon \me (EPSSSP) is available for later use in the program.
Often, our results will be very accurate and so we want to print the results to their full precision. If we simply print the errors in the results, then we only need one or two significant digits in E format. However, if we are printing out the results themselves, then we should print them out to their full precision - both when we are running in single precision and in double precision.

Most compilers allow this to be done by using list-directed I/O. For example, on most compilers the statement (which appears in round.f)

WRITE(6,*)' K , I(K) = ',I,FINTEG

causes the integer variable I and the real or double precision variable FINGET to be printed out to their full accuracy. Unfortunately, the ANSI standard for list-directed output only requires a slightly higher accuracy for double precision than for single precision. If the compiler you are using follows the ``spirit'' of the ANSI standard, rather than its ``letter'', then use list-directed write statements. If it does not, then more work is required.

It is possible to use a FORMAT statement to print the variables, as in

WRITE(6,10000)I,FINTEG
10000 FORMAT(' K , I(K) = ',I12,E26.16)

This works on most computers, but it is not foolproof. For example, on a Sun the maximum absolute value of an integer is 231 and so ``I12'' is sufficient. However, on a Cray the maximum absolute value of an integer is 263 and so it is not. Similarly, on a Sun ``E26.16'' is sufficient to print a double precision variable out to full accuracy, but on a Cray it is not. (We discuss this in Experiment .) Furthermore, on a Sun this will print too many digits of a single precision variable since approximately the last 9 digits will be in error. We mentioned this point previously in a footnote (in a slightly different context), and it is worth reemphasizing. 0 It is a very bad practice to print more than the actual number of digits which have been calculated. (For example, we would assume that the run had been done in double precision, since so many digits were printed out, and wonder why the results were not more accurate.)

Instead, we use the variable LDOORE which means ``List-Directed Output OR E format''. If LDOORE = 1 then list-directed output is used. However, if the compiler supports the ``letter'' of the ANSI standard then set LDOORE = 2 and the program determines what precision should be used from EPSSSP. It does this by the code fragment

NDIGIT = -LOG10(EPSSSP)
WRITE(FRMT,10050)NDIGIT+10,NDIGIT
10050 FORMAT('('' K , I(K) = '',I12,E',I2,'.',I2,')')

and the WRITE statement

WRITE(6,FRMT)I,FINTEG

Thus, I is printed out as ``I12'' (which seems large enough for our purposes) and FINTEG is printed out in E format using approximately log10 1/\me significant digits. It is a little annoying to print I out using so many extra spaces, but there is no easy way to avoid it. (We could determine the number of digits in I ourselves and modify the format statement accordingly, but this is a lot of work for very little gain.)

There are a number of subroutines in the file subutl.f. The most important one is SPORDP, which determines the precision of the program which calls it. In a nutshell, this subroutine works by comparing the variable TESTDP with the calculations of 1.E0/3.E0 and 1.D0/3.D0. It determines whether TESTDP has the same accuracy as the single or double precision calculation of \fraction1/3. It return the variable ISPODP, which is 1 if the program is in single precision and 2 if the program is in double precision. The REAL variable EPSSSP is also returned. It contains an approximate value of \me to the precision of the program, using the same method as discussed in subsection . (This constant is only needed by a few programs, but it is easy to calculate and so there is no reason not to include it.) Note that the variable EPSSSP itself is in single precision. However, if the program is in double precision then it returns the double precision value of \me, although stored in a single precision variable. (\me is usually only needed to one or two significant digits.)

There are also a number of other subroutines in this file. The subroutine WHSTRG in particular is used in many of the programs in this workbook. It parses a line of input to determine what action is desired. We discuss this subroutine further in subsection .

Subsection 1.2.4. Debugging the Program

In every experiment we discuss how we have debugged the program or programs. The ``guts'' of this program, i.e., the recursive formulas 1.5a and 1.8a, is so short that there is little likelihood of miscoding these two formulas. However, it is still worthwhile to go over the steps required to check that the program is ``reliable''. We can calculate I1, I2, I3 by the recursive formula and also directly by integration by parts as

I1
= ó
õ
1

0 
x e-x  dx = ( x - 1 ) ex - 1 |01 = e-1 » 0.36787 94411 71442 32159 ¼
I2
= ó
õ
1

0 
x2 e-x  dx = (x2 - 2x + 2) ex - 1 |01 = 1 - 2 e-1 » 0.26424 11176 57115 35680 ¼
I3
= ó
õ
1

0 
x3 e-x  dx = (x3 - 3x2 + 6x - 6) ex - 1 |01 = -2 + 6 e-1 » 0.20727 66470 28653 92957 ¼
We can check that our calculations agree with the results using the forward recursive formula, i.e., I1 = 1 - I0 = e-1, I2 = 1 - 2 I1 = 1 - 2 e-1, and I3 = 1 - 3 I2 = -2 + 6 e-1, and with the numerical values in Table 1.1. Since the coded forward recursive formula is the same for all values of n, it is hard to imagine that there can be any error in the code after we have checked it for n = 1, n = 2, and n = 3. To verify that the backward recursive formula is correct, we need only check it against the forward recursive formula, which we have just verified.

It is also important to verify the program in both single and double precision. There are two common bugs we are particularly looking for. First, we might have forgotten to comment out or un-comment out the IMPLICIT DOUBLE PRECISION statement in the main program as well as all the subroutines. On many computers this error is easy to spot because the results are obviously wrong; in fact, they are orders of magnitude wrong. (We discuss what causes this in Experiment .) And, second, we might have a mismatch with a constant. For example, if we begin the forward recursive formula in the program with the statement ``\MS FINTEG = 1. - EXP(-1.)'', instead of ``\MS FINTEG = 1. - EXP(ONE)'', then the variable FINTEG only approximates I0 in single precision, whether or not the program is in double precision.

In later experiments, more effort will usually be required to debug a program. However, our methods are always the same. We compare the numerical results with analytical results that have been done completely separately.5 That is, we are not allowed to ``cheat'' by looking at the numerical results when generating the analytical results. In this way we have two independent checks. (It is easy to be subconsciously swayed by the numbers we are trying to check and to make a silly mistake.)

Questions

1) (a) Generate Table 1.1 yourself in single precision. (i) Are your numerical results ``approximately'' the same as those in the table? (ii) Does the error formula 1.6 give a ``good'' approximation to the error in your numerical results? Answer (ii) in two steps. First, check if the error is growing as m! by simply calculating \WH Im / \WH Im-1 for a few ``small'', ``intermediate'', and ``large'' values of m. Second, if the error is growing approximately as m!, estimate \deltaI. Is this ``close'' to \me (which is printed out at the beginning of each run)?

Note
As long as |en| >> 1, it is easy to calculate the errors to a few significant digits (since In << 1). This is all the accuracy that is needed to answer this question.
Note
The largest value of m the program can handle is a ``large'' value of m. (b) Generate Table 1.1 yourself in double precision. Again, does the error formula 1.6 give a ``good'' approximation to the error in your numerical results?
Hint
If you are running in a window environment, you might want to have two versions of round.e, one in single precision and one in double precision. You can then run both programs at the same time and compare results.
2) Check that the forward and backward recursive formulas give the same numerical results for Im if m is ``small enough''. What is the smallest value of m for which Im is different using forward recursion and backward recursion? Do this both in single precision and in double precision.
Note
To answer this question, you have to choose N (the starting iterate for the backward recursive formula). Do not worry about exactly what value to use. Instead, simply choose N large enough that the results do not change (for the values of m in which you are interested) if you vary N by one or two.
3) Show that the errors using the forward recursion formula, i.e., \WH en º \WH In - In, grow faster than exponentially in magnitude. <Exponential growth> En has exponential growth if |En| ~ C egn as n ® ¥ for some constants C > 0 and g > 0.

(a) Show this numerically by plotting (free-hand) | \WH en | vs. n on semi-log paper. (Or use standard paper and compute log|en| using a calculator.)

Hint
Show that the plot of |C egn| vs. n is a straight line on semi-log paper (or log|C egn | vs. n is a straight line on standard paper). (b) Show this analytically by plotting | en+1/en |, where en is the error obtained from the error formula 1.6.
Note
Use the same sheet of paper in both parts.
4) Modify the second part of the program round.f, which calculates the backward recursive formula, to print In in increasing order, i.e., n = m,m+1,m+2,¼,N, rather than in decreasing order, i.e, n = N,N-1,N-2,¼,m, as is done now. This makes it easier to compare the results using the forward and backward formulas. To do this you need to use an array to store all the iterates as they are calculated and only to print them out after all the iterates have been calculated.

Note
How large does your array have to be? Do you actually have any idea? Often we use arrays in programs and have no idea what their maximum size should be. In fact, this happens in most of the programs in this workbook (but not this particular program). Since an array out of bounds does not usually generate a warning message, and may or may not generate wrong results, and may or may not crash the program, 0 it is very important to check explicitly that the array bounds are not exceeded. The following code fragment (which has nothing to do with this experiment) does this.
PARAMETER (MAXNPT=100)
DIMENSION ARRAY(MAXNPT)
.....
10 READ(5,*)NPT
IF ( NPT .GT. MAXNPT ) THEN
WRITE(6,*)' MAXIMUM ARRAY SIZE EXCEEDED - MAXIMUM ALLOWED IS ',
$ MAXNPT
GO TO 10
ENDIF
DO 20 I = 1,NPT
ARRAY(I) = I
20 CONTINUE

Note
The length of the array is defined in a PARAMETER statement. In this way, the size need be entered only once into the program. The symbolic constant MAXNPT can then be used throughtout the program instead of the explicit length of the array. For example, as soon as the desired size of the array is read in, the program checks if it exceeds the allowed length.  This check is just as important in C. It is true that in C it is possible to dynamically increase the size of an array while the program is running. However, this can only be done if the program has been programmed to perform this check.
5) In our discussion of the backward recursive formula, we stated that it begins with IN = 0 for N ``sufficiently large''. In this question you will quantify this vague statement. Choose some m > 0 and determine, by experimentation, the smallest N, call it Nm, for which the relative error in Im is within one order of magnitude of \me. Do this for a number of different values of m, both large and small. Draw a (free-hand) graph to show the relationship between m and Nm - m. Do this both in single precision and in double precision.
Hint
You can assume that N is ``sufficiently large if Im changes only in the least significant digit when we let the initial iterate be IN = 0, IN+1 = 0, IN+2 = 0, etc. (Three initial iterates is probably enough to use.)
Note
You are to do this for both large and small values of m. The U. S. federal budget, which is over one trillion dollars, is ``very, very large''. The number 100 is ``miniscule''.
6) Strip all the unessential lines of code from the program round.f to see how few lines are actually needed. Do this by

only running the program in double precision (so the call to SPORDP is not needed),  only writing the output to the screen (so WRITE statements to unit 10 are not needed),  only using list-directed output,  and, finally, removing all comments and blank lines.
Provide a listing of your ``very short'' program. How many lines long is it?

7) Since the factorial symbol occurs repeatedly in this experiment, it is worthwhile to present a simple formula which provides a good approximation to it without having to do all the multiplications. This formula, called Stirling's formula, is

k! »   æ
Ö

2pk
 
æ
ç
è
k
e
ö
÷
ø
k

 
 .     (1.10)
(a) You should never use a formula without checking it. After all, there might be a typographical error which renders the formula completely wrong! Thus, determine the relative error in the approximation to k! for some small and large values of k.
Note
Do not expend a lot of effort in calculating these quantities. For example, if you have a desk calculator, use it. If not, use a software package, such as MATLAB or MATHEMATICA, on a computer. Only as a last resort should you write a computer program. (b) You also should never use a formula without determining the approximate error. You have already calculated the relative error numerically in part 1.2.7a. Analytically, the relative error is ~ 1/(12k) as k ® ¥. How close is this analytical error to your numerically calculated error for both small and large values of k?
8) Suppose we use Method 3 to evaluate I20. Analytically determine how many terms are required in the sum 1.4 to obtain a result which is accurate to within \me. (We should really use the relative error. However the absolute error is easier to calculate than the relative error.) Then, estimate the number of floating-point operations which are required. Finally, estimate the number of floating-point operations required to obtain I20 by backward recursion (and using a ``good'' choice for N as determined by Question 1.2.5).
Hint
You must determine how many terms are required in the sum S = åk = 0¥ ak to obtain a result which is accurate to within a given error bound. That is, for what K does åk = 0K ak have an absolute error which is < \me? The answer is that this is equivalent to finding for what K
ê
ê
ê
¥
å
k = K+1 
ak ê
ê
ê
< \me  .
(Note that it is not necessary to find the smallest K for which this is true.) K is easy to find once you show that åk = K+1¥ ak < 2 aK+1. (So show it!)

9) Is Method 4 also applicable to the integral

\cIn = ó
õ
10

0 
xn ex - 10  dx     for \MS n = 0,1,2,¼ ?     (1.11)
(a) Determine this by doing the following:

(i)
Obtain a bound for \cIn (as we did for In in 1.2). Then, find a recursive formula for it.
(ii)
Do you think that the forward recursive formula is stable or unstable numerically? Also, is it easy to calculate an initial iterate?
(iii)
Do you think that backward recursive formula is stable or unstable numerically? Also, is it easy to calculate an initial iterate?
(b) One difficulty in working with the integral 1.11 is that 10n seems to appear in a number of places. Rescale this integral to remove these factors of 10 by letting u = 10-1 x and \cJn = 10-n  \cIn. Answer questions (i), (ii), and (iii) again for this rescaled integral.
Note
In this question you are asked to decide whether you think that the forward and backward recursive formulas are stable or unstable. Do this analytically by the methods we have used in this experiment for In.  If you wish, you can code these recursive formulas and see what actually happens numerically. However, do not turn in the results, because the purpose of this question is for you to apply the analytical arguments already used in this experiment to study a slightly different integral.
10) Is Method 4 also applicable to the integral

\scriptIn = ó
õ
1

0 
xn ex2  dx     for \MS n = 0,1,2,¼ ?
Answer questions (i), (ii), and (iii) as in Question 1.2.9.

11) Does Method 4 still work if n > 0 is not an integer? For example, can we numerically compute I1/2 accurately?

12) Write a program to evaluate In using Method 3 as given in 1.4, and explain what your program does. (You can simply write the comments in by hand on the side of the printed listing.) Use this program to calculate I100. Do you think I100 can be evaluated by 1.4 as accurately as by backward recursion (with a ``good'' choice of N)?

Note
Explain how you determine which method is more accurate.
13) Calculate In analytically in order to see, from another point of view, why the numerical calculation is unstable. (a) Use the methods of Appendix  to calculate the exact solution to In using the linear difference equation In + n In-1 = 1 with initial iterate I0 = a.
Hint
First, calculatey the solution to the homogeneous equation In + n In-1 = 0. (Note that this solution has an arbitrary constant.) Second, calculate a particular solution to the desired equation In + n In-1 = 1. Finally, put these two solutions together and solve for the arbitrary constant in terms of a.
Hint
What is the Taylor series expansion for e-1? (b) Show analytically that for any initial condition other than a = 1 - e-1 the solution blows up with the typical sawtooth instability (i.e., the solution grows at least exponentially in n and alternates in sign).
Note
The integral ò01 xn ex-1  dx is equivalent to the recursive formula In = 1 - n In-1 with the initial iterate I0 = 1 - e-1. Thus, In ® 0 as n ® ¥. However, the analytical solution as calculated in this question shows that for any other initial iterate the recursive formula blows up faster than exponentially as n ® ¥. Since the computer can only store an approximation to I0 = 1 - e-1, it is not surprising that the numerically calculated iterates lose their accuracy as n increases.
:[A: diff eq]probh *probhint= The exact solution to 1.5 is

In(part) = 1
n+1
- 1
(n+1)(n+2)
+ 1
(n+1)(n+2)(n+3)
- ¼ .
The common solution is to In + n In-1 = 0 is In(common) = c (-n)n. Thus,

I0 = c + 1 - 1
2!
+ 1
3!
- 1
4!
+ ¼ = c + 1 - e-1  .
Thus, c = a- (1 - e-1) so that c = 0 only for the initial iterate I0 = 1 - e-1.

Footnotes:

1 An adaptive quadrature algorithm would not take ``too much'' time - but it would still be the slowest method of all those discussed in this experiment.

2 It is a very bad practice to print more than the actual digits which are correct (for example, about 7 on a Sun in single precision and about 16 in double precision). The difficulty is that everyone tends to believe that all the digits shown are accurate - even if told otherwise. For example, if we compute p in single precision to two decimal digits (see Experiment ) we should write the answer as p » 3.14, not, for example, as p » 3.14033.

3 We discuss the machine epsilon in detail, and we also calculate it numerically, in Experiment . The basic definition is that \me is the smallest positive floating-point number in the computer for which 1 + \me ¹ 1.

4 The reason for this discrepancy is that \WH I0 is not quite printed out to full accuracy using the ``WRITE(6,*)'' statement. Instead, we should print it out using a FORMAT specification such as ``F14.10'' or ``E18.10''. Even better, we can write a very short program which calculates \WH I0 in double precision as well as in single precision and prints out the difference.

5 After World War II the measured speed of light in a vacuum jumped by nearly 20  km / sec, which was a large jump (at least to experimental physicists). In 1926 A. A. Michelson (of Michelson & Morley fame) determined the time it took a beam of light to travel from Mt. Wilson to Mt. San Antonio (which are in Southern California) and back again. The known index of refraction of air was then used to obtain c = 299,798 ±6  km / sec. In 1929 Michelson, Pease, and Pearson began a new experiment in which the light beam travelled through a mile-long evacuated pipe. Using 3000 measurements they obtained c = 299,774 ±6  km / sec in 1931 (after Michelson's death). For over fifteen years all following measurements were consistent with this latter value (in the sense that the value 299,774 was within their error bars). For example, in 1941 Anderson used a newer method (still using an evacuated pipe) and nearly 3000 observations to obtain c = 299,776 ±6  km / sec. During World War II questions started being raised about these latter calculations because distances could now also be measured by timing a short pulse of radio waves. For example, in 1947 a team led by Carl I. Aslakson calculated the value of c = 299,792 ±2.4  km / sec. However, he did not publish his results at the time because ``it was felt that to question the generally accepted determination by Anderson and others of 299,776  km./sec. would be presumptuous'' (). By 1949 other researchers had obtained similar values and so he finally published his results. All subsequent measurements have agreed with his value. In fact, by the end of the 1960's the value had been calculated as c = 299,792.56 ±0.11  km / sec. What happened was that this calculation was extremely difficult. In any calculation there are many source of uncorrelated and correlated errors. Uncorrelated errors correspond to the random errors that occur in any experiment. Correlated errors are due to such difficulties as the calibration being slightly off for some instrument, using a slightly incorrect value for some fundamental constant, etc. For example, the beam was not travelling in an absolute vacuum, it interacted with the walls of the pipe, and a very short pulse of light had to be generated and its transit time measured very accurately. Experimentalists always compensate for these correlated errors in their calculations. What happened for nearly a twenty year span was that experimentalists modified their calculations until they agreed with what they thought was the ``correct'' value and then they stopped.


File translated from TEX by TTH, version 1.46.