2.4 Integrating with a Spreadsheet

We have not yet defined functions so this section is way ahead of us. If you encounter something that bothers you here, stop, go on to the next chapter and come back here later. If what you see below makes sense, then keep going.

Integration has a geometric meaning. Given a positive function $f$, the Definite Integral of $f$ between $A$ and $B$ means the area between the plot of of the function $f(x)$, and the x-axis, from a fixed starting value of $x, A$, to another value $B$ with $B > A$.

If the function is constant, that area just the product of $(B-A)$ (the length of the interval) with the constant value of $f$, because the figure whose are we are computing is a rectangle, with sides at $x = A$ and $x = B$, top at $y = f$, and bottom at $y = 0$.

Otherwise, we can divide the interval from $A$ to $B$ into slivers of length $d$ and calculate the area by the sum of the areas in each sliver. (We count area below the x axis as negative when the function is negative and when $B < A$ positive becomes negative and vice versa.) We will choose slivers all of length $d$, and approximate the area in each sliver.

There is an interesting question here: what do you do to approximate the area in a sliver?

A sliver has width $d$, and we chose an approximate height, so this question becomes what height should we assign to the area between say $s$ and $s + d$?

There are three very simple ways to do this. One way is to use $f(s)$, and another to use $f(s+d)$ and another is to use their average.

These ways of estimating have names! They are, the left hand rule, the right hand rule and the trapezoid rule. The contribution to area from each sliver will be this estimate multiplied by $d$.

Happily, the only difference between these in the answer you get for the area in question comes from the contributions $f(A)d$ and $f(B)d$. All other intermediate points contribute the same amount no matter which of these "rules" is used.

This happens because the end of one sliver is the beginning of the next, and the contribution to the sum from point $s$ is $f(s)d$ no matter which of these methods is used. If you use the value of $f$ on the left side of intervals, then you get $f(s)d$ from the interval starting at $x$; if you use the right side value of $f$ you get the same thing coming from the interval ending at $x$; and if you use their average, you get half from either interval.

This means the only difference comes from the first and last intervals. With the "left rule", you get $f(A)d$ but not $f(B)d$ vice versa for the "right rule", and $\frac{(f(A)+f(B))d}{2}$ from the average or "Trapezoid Rule". In other words, in the trapezoid rule you get $f(s)d$ for every interior sliver except the end ones, and only $\frac{f(A)d}{2}$ and $\frac{f(B)d}{2}$ at the endpoints $A$ and $B$. The trapezoid rule turns out to be the best of the three.

So we will estimate the sum using $f(s)d$ for values s between A and B inclusive, and subtract $\frac{(f(A)+f(B))d}{2}$ from the total, and that will give us the answer supplied by the trapezoid rule. Later on we will see that this is much better than either of the others, because its error is proportional to $d^2$ while the others each differ from the actual area by a linear term $\frac{|f(A)-f(B)|d}{2}$ as well.

Calculating the sum of the contents of consecutive boxes in a column is what you did in column D with Fibonacci numbers. To get in column C the sum of what is in column B from 5 on you enter =B5+C4 into C5 and copy it down that column.

This will compute the left hand rule estimate for areas in column C. By putting =C5-(B$5+B5)/2 in D5, we convert the left hand rule to the trapezoid rule which will be displayed at each intermediate point by what is in column D. The -B$5/2 takes away half of the contribution at $x = A$, and the other subtraction takes away the contribution at the other end.

We start by putting the choice for d in B2; putting the starting value for $x$, $A$, in B3.

We do this so we can easily change these when we want to.

Column A will contain the values of $x$ from A on.

Entry Bk will contain the values of your function for $x = \text{Ak}$.

As an illustration we will estimate the integral of the function $\sin x$.

You can set this up starting at the fifth row by putting =B3 in A5. Then set A6 to =A5+B$2, and copy A6 down column A. That will be the value of your variable.

In B5 put =B$2*sin(A5) and copy this down column B.

In C5 put =B5+C4 and copy that down column C.

In D5 put =C5-(B$5+B5)/2 and copy down column D.

If you do this, you can change d just by inserting a different value for it in B2. You can change the starting point by entering the new one in B3. You can change the function you want to integrate by replacing sin(A5) by your new f(A5) and copying =B$2*f(A5) down column B.

The estimate of the area starting at A5 and ending at at x=A5+kd using the Left Hand Rule will appear in column C at row whose a value is B5+(k-1)d. (This box will have the sum of $k$ terms of the form $\sin(x)d$.)

The entries in column D convert the left hand rule to the Trapezoid rule. Thus what appears in the row with A value B4+kd will be the trapezoid rule estimate of the area between the x-axis, the sine curve and the lines x=B4 and x=B4+kd.

This is an estimate to the area; we can do better and will, later on.

This is what the spreadsheet should look like with $d = 0.01$ and $A = 1$.

Number of increments

Number of digits after decimal point

Now select columns A and B from A5 to B105, and insert an xy scatter chart. What do you see?

How can we do better?

If you add a column E which is like C except jumps by $2$, that is in E5 put =2*B5+E3 and copy down, and correct this to the trapezoid rule in column F by putting in F5 =E5-(B$5+B5) and copying down, and finally putting =(4*D5-F5)/3 in column G you will get the Simpson’s rule estimate to the area in question in the odd entries of column G. (like rows $5$, $7$, $9$, etc.) The even entries will be useless junk.

What the devil is this?

The odd entries in E and F repeat the previous calculation with $d$ replaced by $2d$. The error in the trapezoid rule behaves as $d^{-2}$; the term in that error that behaves as $d^{-2}$ will cancel out if you take $4$ times the $d$ computation and subtract the $2d$ one. The result will be roughly $3$ times the actual result. Thus dividing 4D5-F5 by 3 gives a rule for the area whose error actually goes as $d^4$. It is called Simpson's rule.

This will be discussed in detail in Chapter 14.