









◦ Finite differences
◦ Error analysis, effect of rounding error
◦ Deriving formulas using Taylor series...
◦ ...and using Lagrange interpolation
◦ Asymptotic error series
◦ Use in deriving differentiation formulas
◦ Estimating error using two approximations
1 Finite differences

We now consider the problem of deriving formulas for the derivative f'(x) of a function at a point x using its values at a finite set of nearby points. The simplest is the forward difference

f'(x) \approx \frac{f(x+h) - f(x)}{h},

defined for h > 0 (which is obvious from the definition of f'). To determine the error, we simply use a Taylor series to expand

f(x + h) = f(x) + h f'(x) + \frac{h^2}{2} f''(\xi_x).

This gives the forward difference formula with error term:

f'(x) = \frac{f(x+h) - f(x)}{h} - \frac{h}{2} f''(\xi_x)    (1)

where ξ_x lies between x and x + h. In particular, the error is O(h), which is not very good; this formula is about as inaccurate as a formula can be (while still valid). To get better formulas, we can use more points and/or arrange them more optimally. There are several approaches.
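This O(h) rate is easy to observe numerically. Below is a minimal Python sketch (an illustration, not part of the notes; the test function sin and the point x = 1 are arbitrary choices): each halving of h should roughly halve the error.

    import numpy as np

    def forward_diff(f, x, h):
        """Forward difference approximation to f'(x); the error is -(h/2) f''(xi)."""
        return (f(x + h) - f(x)) / h

    # f = sin, so the exact derivative at x = 1 is cos(1).
    x = 1.0
    for h in [1e-1, 5e-2, 2.5e-2]:
        err = abs(forward_diff(np.sin, x, h) - np.cos(x))
        print(f"h = {h:.3e}   error = {err:.3e}")  # error roughly halves with h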
A formula using n + 1 points will have the form

f'(x) = \sum_{k=0}^{n} c_k f(x + a_k h) + \text{error}

for coefficients c_k and a_k (important point: why must the formula be a linear combination of function values?). Typically the derivation goes as follows: expand each term in a Taylor series,

c_k f(x + a_k h) = c_k f(x) + c_k a_k h f'(x) + \cdots,

and match coefficients. Given n + 1 coefficients, we can expect to control the first n + 1 terms; the Taylor series should then be taken to n + 1 terms plus an error term. Note: in some cases, one more term should be taken; see below.
The result is a formula with error term,

f'(x) = \sum_{k=0}^{n} c_k f(x + a_k h) + C f^{(p+1)}(\xi) h^p,

for some ξ in the interval spanned by the points and some constant C (that would be found in the derivation). The formula has degree p (see subsection 3.1), i.e. it is exact for polynomials up to degree p, and the error is O(h^p) as h → 0.
Consider a 'centered difference formula'

D(h) = a f(x - h) + b f(x + h).    (2)

We seek a and b so that

f'(x) = a f(x - h) + b f(x + h) + \text{error}.    (3)
Key point (symmetry): the O(h^2) term cancels for free due to the symmetry of the approximation. We can guarantee that D(h) has degree of accuracy 1 (exact for constant and linear functions) by choosing a and b. But observe that the approximation is then also exact for

f(x) = x^2.

It follows that it is exact for all quadratic polynomials (why?). The symmetry provides extra accuracy. Note that this formula uses the same number of points as the forward difference (1) but has much better error; it is a superior formula when it is available.
We find the three-point backward difference using points x - 2h, x - h and x:

D(h) = \frac{a f(x) + b f(x - h) + c f(x - 2h)}{h}.
The 1/h is put in here knowing that the coefficients will have this factor. There are three coefficients and no symmetry, so we need three terms plus an error term. Expand:
f(x - h) = f(x) - h f'(x) + \frac{h^2}{2} f''(x) - \frac{h^3}{6} f'''(\xi_1),

f(x - 2h) = f(x) - 2h f'(x) + 2h^2 f''(x) - \frac{4h^3}{3} f'''(\xi_2).
Plugging in, we get

D(h) = \frac{a + b + c}{h} f(x) - (b + 2c) f'(x) + h\Big(\frac{b}{2} + 2c\Big) f''(x) - h^2\Big(\frac{b}{6} f'''(\xi_1) + \frac{4c}{3} f'''(\xi_2)\Big),

which gives the system

0 = a + b + c, \quad 1 = -(b + 2c), \quad 0 = \frac{b}{2} + 2c.
Solving for the coefficients (a = 3/2, b = -2, c = 1/2), we obtain the formula

f'(x) = \frac{3f(x) - 4f(x - h) + f(x - 2h)}{2h} - \frac{h^2}{3} f'''(\xi_1) + \frac{2h^2}{3} f'''(\xi_2).

Using (without proof) the rule that the ξ_i's can be taken to be the same (subsection 3.2),

f'(x) = \frac{3f(x) - 4f(x - h) + f(x - 2h)}{2h} + \frac{h^2}{3} f'''(\xi).    (4)
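As a sanity check of (4), here is a short Python sketch (my own illustration; exp and x = 0 are arbitrary test choices). Since the error is (h^2/3) f'''(ξ), the observed error should drop by a factor of about 4 each time h is halved.

    import numpy as np

    def backward3(f, x, h):
        """Three-point backward difference, formula (4):
        f'(x) ~ (3 f(x) - 4 f(x - h) + f(x - 2h)) / (2h)."""
        return (3 * f(x) - 4 * f(x - h) + f(x - 2 * h)) / (2 * h)

    # f = exp, so f'(0) = 1 exactly.
    for h in [1e-1, 5e-2, 2.5e-2]:
        err = abs(backward3(np.exp, 0.0, h) - 1.0)
        print(f"h = {h:.3e}   error = {err:.3e}")  # error shrinks ~4x per halving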
Notation (finite difference operators): In this section, we have foregone defining symbols for the various differences (using the placeholder D(h) instead). The forward, backward and centered differences are common enough that they are sometimes given notation. For instance,

\delta_h f = \frac{f(x+h) - f(x-h)}{2h}

is common for centered differences, with Δ and ∇ for forward and backward. These symbols are operators (mapping a function to a function) and are analogous to the derivative operator:

\delta_h \leftrightarrow \frac{d}{dx}.

A 'calculus' of finite differences can be developed, just as you have done in calculus for continuous derivatives.
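To make the operator analogy concrete, here is a small Python sketch (an illustration, not from the notes) in which each difference operator maps a function f to a new function approximating f':

    import numpy as np

    # Each operator takes f and h and returns a *function*, mirroring the fact
    # that the forward, backward and centered differences map functions to functions.
    def Delta(f, h):       # forward difference
        return lambda x: (f(x + h) - f(x)) / h

    def nabla(f, h):       # backward difference
        return lambda x: (f(x) - f(x - h)) / h

    def delta(f, h):       # centered difference
        return lambda x: (f(x + h) - f(x - h)) / (2 * h)

    h, x = 1e-3, 1.0
    for name, op in [("forward", Delta), ("backward", nabla), ("centered", delta)]:
        err = abs(op(np.sin, h)(x) - np.cos(x))
        print(f"{name:8s} error = {err:.2e}")  # centered is much more accurate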
Observe that both the numerator and denominator of a differentiation formula go to zero as the points get closer together, e.g.

f'(x) \approx \frac{f(x+h) - f(x)}{h}.    (5)

If there is an error ε in computing f due to rounding (or if f was given as empirical data with some error), then the resulting error in f'(x) is on the order of ε/h. Thus as h → 0 the error goes to ∞. See homework for details. This problem is more or less inherent to computing f', since it is a limit of difference quotients by definition. A key lesson is that it is dangerous to estimate derivatives from data, and calculating derivatives is best avoided when possible.^1
To give a concrete example, consider the forward difference (5), but assume that the computed function values are \tilde f rather than f:

\tilde f(x) = f(x) + \delta, \quad |\delta| < \eta,

where η ≈ 1.2 × 10^{-16} is the rounding unit. Also assume that |f''(x)| ≤ M. Using the error formula (1), the computed derivative \tilde D(h) satisfies

\tilde D(h) = \frac{\tilde f(x + h) - \tilde f(x)}{h} = f'(x) + \frac{h}{2} f''(\xi_x) + \frac{\delta_1 - \delta_0}{h},

where δ_0 and δ_1 are the rounding errors in the two function values. The total error is therefore bounded by

|\tilde D(h) - f'(x)| \le \frac{M}{2} h + \frac{2\eta}{h},

which is smallest not as h → 0 but at h^* = 2\sqrt{\eta/M}, where the bound equals 2\sqrt{\eta M}.

^1 Unfortunately, derivatives are sometimes unavoidable, and show up all the time in numerical calculations. In this case, we often accept that there will be some limited accuracy, and use the knowledge of this section to avoid disaster (e.g. taking h too small). When high accuracy is needed and some aspects of the function are known, there is a technique in between exact calculation and finite differences called automatic differentiation that can help.
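The tradeoff is easy to demonstrate. The Python sketch below (illustrative; f = exp and x = 1 are arbitrary choices) sweeps h over several decades: the total error first decreases like h, bottoms out near h* ≈ 10^-8, then grows like η/h.

    import numpy as np

    # Total error ~ (M/2) h + 2*eta/h: truncation dominates for large h,
    # rounding dominates for small h.
    f, x, exact = np.exp, 1.0, np.exp(1.0)
    for h in [10.0 ** (-k) for k in range(1, 13)]:
        err = abs((f(x + h) - f(x)) / h - exact)
        print(f"h = {h:.0e}   error = {err:.2e}")  # minimum near h = 1e-8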
The problem is even worse when trying to estimate derivatives from given data that has noise. The scale of the error δ is then larger than 10^{-16} and causes problems at a larger h. For a k-th derivative (at equally spaced points for simplicity), the formula will be

f^{(k)}(x) \approx \frac{1}{h^k} \sum_j c_j f(x + a_j h).

The approximation error will be O(h) or better. From the formula, we see that

\delta\text{-sized error in } f \implies \text{error} \sim \frac{\delta}{h^k} \text{ in } f^{(k)}.
Not only does this create error, but the nature of the error is 'random' if the error in f is not well-behaved (like random noise), leading to jagged shapes like the one in Figure 2. For a nice example, see the textbook, p. 425.
We can derive the same numerical differentiation formulas using Lagrange interpolation. The process is more mechanical, but cumbersome. Here we switch notation a bit but this is really the same problem as before.
Let x_0, x_1, \cdots, x_n be a set of nodes, and let f_k = f(x_k). Recall that the i-th Lagrange basis polynomial is

L_i(x) = \prod_{j \ne i} \frac{x - x_j}{x_i - x_j},

and the interpolant satisfies

f(x) = \sum_{i=0}^{n} f_i L_i(x) + \underbrace{\frac{f^{(n+1)}(\xi_x)}{(n+1)!}\, w(x)}_{E(x)}    (6)

where w(x) = \prod_{j=0}^{n} (x - x_j).
Suppose we wish to compute f'(x) at one of the nodes, x_\alpha. The idea is simple: just differentiate the interpolation formula (6). This gives

f'(x_\alpha) = \sum_{i=0}^{n} f_i L_i'(x_\alpha) + E'(x_\alpha).
The error term can be simplified using the observation that

\left.\big((x - a)g(x)\big)'\right|_{x=a} = g(a),    (7)

which avoids some nasty product rule terms. We have that
E(x) = (x - x_\alpha)\, \frac{f^{(n+1)}(\xi_x)}{(n+1)!} \prod_{k=0,\, k \ne \alpha}^{n} (x - x_k),

so

E'(x_\alpha) = \frac{f^{(n+1)}(\xi_{x_\alpha})}{(n+1)!} \prod_{k=0,\, k \ne \alpha}^{n} (x_\alpha - x_k).    (8)
If the points are all separated by a constant times h, then the error is O(h^n). Thus the order of the error is one less than the number of points (and one less than the order of the error for the interpolant itself). Useful rule of thumb: differentiating worsens the error by one power of h.
Example: To obtain the centered difference formula we derived earlier, first observe that it suffices to derive an approximation to f'(0) given points -h, 0 and h. Choose nodes

x_0 = -h, \quad x_1 = 0, \quad x_2 = h

and x_\alpha = x_1 = 0. Note that x_1 must be included here (even though it disappears from the final formula), since we must evaluate at an interpolation point. The Lagrange polynomials are

L_0(x) = \frac{x(x-h)}{2h^2}, \quad L_1(x) = -\frac{(x+h)(x-h)}{h^2}, \quad L_2(x) = \frac{(x+h)x}{2h^2}.
Differentiating and evaluating at zero,

L_0'(0) = \left.\frac{x - h}{2h^2}\right|_{x=0} = -\frac{1}{2h},

and similarly

L_1'(0) = 0, \quad L_2'(0) = \frac{1}{2h},

using the general rule (7) to avoid some product rule terms. Thus

f'(0) = -\frac{1}{2h} f(-h) + \frac{1}{2h} f(h) + E'(0),
and the error (8) is

\frac{f^{(3)}(\xi)}{3!} (0 + h)(0 - h) = -\frac{f^{(3)}(\xi)}{6} h^2,

which yields the same formula as obtained via Taylor series,

f'(0) = \frac{f(h) - f(-h)}{2h} - \frac{f^{(3)}(\xi)}{6} h^2.
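Since the Lagrange approach is mechanical, it is easy to automate. The sympy sketch below (my own illustration) reproduces the weights -1/(2h), 0, 1/(2h) by differentiating each basis polynomial and evaluating at x_α = 0:

    import sympy as sp

    x, h = sp.symbols('x h', positive=True)
    nodes = [-h, 0, h]  # x_0, x_1, x_2

    def basis(i):
        """The i-th Lagrange basis polynomial for the nodes above."""
        return sp.prod([(x - nodes[j]) / (nodes[i] - nodes[j])
                        for j in range(len(nodes)) if j != i])

    # Differentiate each L_i and evaluate at x_alpha = 0.
    weights = [sp.simplify(sp.diff(basis(i), x).subs(x, 0))
               for i in range(len(nodes))]
    print(weights)  # [-1/(2*h), 0, 1/(2*h)]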
Note that the Lagrange-form error (8) proves some observations from before: a formula using n + 1 points has error O(h^n), and since the error contains the factor f^{(n+1)}(ξ), the formula is exact for all polynomials of degree ≤ n.
2 Richardson extrapolation
Here we introduce a powerful, rather general tool for improving the accuracy of a formula with minimum effort. The underlying idea is of critical importance in numerical analysis, and will be used often.
The basic setup is this: suppose we have a quantity L to be computed and a formula A(h) defined for h > 0 such that

\lim_{h \to 0} A(h) = L.
We first illustrate via an example: the two-point centered difference formula. We have

L = f'(x), \quad A(h) = \frac{f(x + h) - f(x - h)}{2h}.    (9)
Consider the Taylor series, but this time with infinitely many terms:

f(x \pm h) = f(x) \pm h f'(x) + \frac{h^2}{2} f''(x) \pm \frac{h^3}{3!} f'''(x) + \cdots.

Every other term cancels when computing f(x + h) - f(x - h) (exercise: check this), and so

A(h) = L + c_2 h^2 + c_4 h^4 + \cdots    (10)

for some constants c_2, c_4, \cdots. Their values do not matter here.
The fundamental idea is that by evaluating A(h) at more than one value of h, we obtain systems of equations for the unknowns (like L). These equations can be combined to ‘solve’ for quantities of interest.
Suppose first that we want to obtain a better approximation to L than A(h). The process of Richardson extrapolation is to evaluate (10) at two values of h and solve for L.
Let us pick h and h/2. Then we have

A(h) = L + c_2 h^2 + c_4 h^4 + \cdots,
A(h/2) = L + \frac{c_2}{4} h^2 + \frac{c_4}{16} h^4 + \cdots.

To cancel the h^2 error term, take 4× the second equation minus the first:

4A(h/2) - A(h) = 3L - \frac{3}{4} c_4 h^4 + \cdots,

where \cdots is a series in even powers of h starting with h^6. Dividing by 3,

\frac{4}{3} A(h/2) - \frac{1}{3} A(h) = L + b_4 h^4 + b_6 h^6 + \cdots

for constants b_4, b_6, \cdots. Again, we could compute the b's in terms of the c's, but their values are probably not known and are not needed here anyway. The result is that

A_1(h) = \frac{4}{3} A(h/2) - \frac{1}{3} A(h)
is an O(h^4) formula for L. For the centered difference formula (9), this gives

A_1(h) = \frac{4}{3} \cdot \frac{f(x + h/2) - f(x - h/2)}{h} - \frac{1}{3} \cdot \frac{f(x + h) - f(x - h)}{2h}
       = \frac{-f(x + h) + 8f(x + h/2) - 8f(x - h/2) + f(x - h)}{6h},

which is the four-point centered difference formula for f' using points x ± h and x ± h/2.
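Here is a quick numerical check in Python (an illustration; sin and x = 1 are arbitrary test choices): A(h) should converge at O(h^2) while A_1(h) converges at O(h^4), i.e. its error drops ~16x per halving of h.

    import numpy as np

    f, x, exact = np.sin, 1.0, np.cos(1.0)

    def A(h):
        """Base O(h^2) formula: the centered difference."""
        return (f(x + h) - f(x - h)) / (2 * h)

    def A1(h):
        """One Richardson step: (4 A(h/2) - A(h)) / 3, an O(h^4) formula."""
        return (4 * A(h / 2) - A(h)) / 3

    for h in [2e-1, 1e-1, 5e-2]:
        print(f"h = {h:.1e}   A err = {abs(A(h) - exact):.2e}   "
              f"A1 err = {abs(A1(h) - exact):.2e}")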
But we can keep going! The approximation A_1(h) also has an asymptotic error series:

A_1(h) = L + b_4 h^4 + b_6 h^6 + \cdots.
Following the same process with A_1(h) and A_1(h/2) we obtain the formula

A_2(h) = \frac{16}{15} A_1(h/2) - \frac{1}{15} A_1(h) = \frac{1}{45} A(h) - \frac{4}{9} A(h/2) + \frac{64}{45} A(h/4).
The result is an O(h^6 ) formula using the O(h^2 ) formula evaluated at h, h/2 and h/ 4.
The whole process can be formalized, leading to a table of approximations A_k(h 2^{-j}) for j = 0, \cdots, k that can be computed efficiently using formulas like the above, with

A_k(h) = L + O(h^{2k+2}).
It is worth emphasizing that this process is cheap to compute, because we only need to compute some formula A(h) a few times to get higher-order accuracy. This makes Richardson extrapolation a convenient way to improve accuracy without much (computational) effort.
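One common way to organize the table (a sketch, assuming the base formula has an error series in even powers of h as in (10); the function name is my own) is the triangular recursion below:

    def richardson_table(A, h, levels):
        """Triangular Richardson table for an O(h^2) base formula A.

        R[j][k] is an O(h^(2k+2)) approximation to L built from
        A(h / 2**(j-k)), ..., A(h / 2**j)."""
        R = [[A(h / 2**j)] for j in range(levels)]
        for j in range(1, levels):
            for k in range(1, j + 1):
                R[j].append((4**k * R[j][k - 1] - R[j - 1][k - 1]) / (4**k - 1))
        return R

For instance, with A the centered difference from the previous sketch, richardson_table(A, 0.1, 4)[3][3] combines A(h), A(h/2), A(h/4) and A(h/8) into an O(h^8) approximation.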
Caution: Note that the process tells us the order of each approximation (as O(hp)) but not more precise information like the degree or the constant in front of the hp. A more detailed analysis would be required to do so. On the other hand, once the formula is known this is not so bad, and in practice is not always necessary.
What is required for Richardson extrapolation? All that is needed is an asymptotic error series

A(h) = L + \sum_{k=1}^{\infty} c_k h^{n_k}    (11)

with a known, increasing sequence of exponents n_1 < n_2 < \cdots; the values of the c_k need not be known.
Returning to the centered difference example, observe that the error in A(h) is

E(h) = c_2 h^2 + O(h^4) \approx c_2 h^2,

while A(h) - A(h/2) = \frac{3}{4} c_2 h^2 + O(h^4), so the error can be expressed in terms of the two computed values (up to O(h^4)):

E(h) = \frac{4}{3}\big(A(h) - A(h/2)\big) + O(h^4).
Alternately, the error in A(h/2) could be found as ≈ (A(h) − A(h/2))/3.
Key point: Combining two different approximations can yield an estimate of the error if we have some knowledge of the form of the error. This technique is a good way of doing practical error estimation for any method with an asymptotic error series.
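In code the estimate is one line. The Python sketch below (with illustrative choices of f, x and h) compares the estimate (4/3)(A(h) - A(h/2)) against the true error of the centered difference:

    import numpy as np

    f, x, h = np.sin, 1.0, 1e-2

    def A(s):
        """O(h^2) centered difference evaluated with step s."""
        return (f(x + s) - f(x - s)) / (2 * s)

    estimate = (4.0 / 3.0) * (A(h) - A(h / 2))  # ~ E(h), accurate up to O(h^4)
    true_err = A(h) - np.cos(x)                 # E(h) = A(h) - L
    print(f"estimated error = {estimate:.3e}, true error = {true_err:.3e}")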
3 Background
Some general concepts and details used in this section.
3.1 Degree of accuracy

Suppose we have an approximation f \mapsto A[f] that is linear, in that

A[c_1 f_1 + c_2 f_2] = c_1 A[f_1] + c_2 A[f_2]

for all scalars c_1, c_2 and functions f_1, f_2 that can be approximated.
The degree of accuracy is the largest integer n such that A[f] is exact for all polynomials of degree ≤ n. For instance, the interpolating polynomial with n + 1 points has degree of accuracy n, since the interpolant of a degree-n polynomial through n + 1 points is that polynomial itself.

The forward difference formula has degree of accuracy 1, since

f'(x) = \frac{f(x+h) - f(x)}{h}

holds exactly when f is linear, but fails for f(x) = x^2, where the error term -\frac{h}{2} f''(\xi) = -h does not vanish.
In general, if the error in a formula has the form

f^{(n+1)}(\xi) \, (\cdots)

then it has degree of accuracy n. This is simply because if f is a polynomial of degree n then f^{(n+1)}(\xi) = 0, but if it is one degree more then this may not vanish.

Observe that for the differentiation formulas,

\text{degree } n \implies \text{error} \approx C h^n.

For instance, the centered formula has one higher degree than the forward one, and its error is O(h^2) instead of O(h). The symmetry of the formula increases its degree by one for free, which leads to the improvement.
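The definition can also be checked mechanically. Here is a sympy sketch (my own; the helper name degree_of_accuracy is made up) that finds the degree of a differentiation formula \sum_k c_k f(x + a_k h) by testing monomials:

    import sympy as sp

    x, h = sp.symbols('x h')

    def degree_of_accuracy(coeffs, offsets, max_deg=8):
        """Largest n such that sum_k c_k f(x + a_k h) == f'(x) for every
        polynomial f of degree <= n (by linearity, it suffices to test x**n)."""
        for n in range(max_deg + 1):
            f = x**n
            approx = sum(c * f.subs(x, x + a * h) for c, a in zip(coeffs, offsets))
            if sp.expand(approx - sp.diff(f, x)) != 0:
                return n - 1
        return max_deg

    print(degree_of_accuracy([-1 / (2 * h), 1 / (2 * h)], [-1, 1]))  # centered: 2
    print(degree_of_accuracy([-1 / h, 1 / h], [0, 1]))               # forward: 1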
3.2 Combining the unknown points

Expressions like

c_1 f'''(\xi_1) + c_2 f'''(\xi_2)

that arise in calculations like the centered difference formula can be simplified. This is done using the IVT (intermediate value theorem): when c_1 and c_2 have the same sign, the weighted average (c_1 f'''(\xi_1) + c_2 f'''(\xi_2))/(c_1 + c_2) lies between the minimum and maximum of f''', so by the IVT it equals f'''(\xi) for some ξ in the interval, i.e. the two terms combine into (c_1 + c_2) f'''(\xi).
Mean value theorem (derivative version): Suppose f ∈ C([a, b]) and f' exists in (a, b). Then there exists a point ξ ∈ (a, b) such that

f'(\xi) = \frac{f(b) - f(a)}{b - a}.

Equivalently, if f is defined in an interval I and a, b ∈ I, then

f(b) - f(a) = f'(\xi)(b - a)

for some ξ between a and b.
The equivalent version is just a useful way of interpreting the result: it lets us connect (with an equality) the change in f to the change in its argument. Here's a simple example of how it is typically used:
Claim: Suppose f ∈ C^1([a, b]) and f' is bounded by M:

|f'(x)| \le M \text{ for } x \in [a, b].

Then |f(x_1) - f(x_2)| \le M |x_1 - x_2| for all x_1, x_2 ∈ [a, b].
Proof. Let x_1, x_2 ∈ [a, b]. By the mean value theorem, there is a ξ between x_1 and x_2 such that

f(x_1) - f(x_2) = f'(\xi)(x_1 - x_2).

Since ξ is in the interval, the bound applies, so

|f(x_1) - f(x_2)| \le |f'(\xi)| |x_1 - x_2| \le M |x_1 - x_2|.
The mean value theorem also has a variant for integrals:

Mean value theorem (integral version): Let f ∈ C([a, b]) and suppose g is an integrable function with g(x) ≥ 0 for x ∈ [a, b]. Then there is a point ξ ∈ (a, b) such that

\int_a^b f(x) g(x)\, dx = f(\xi) \int_a^b g(x)\, dx.

That is, the factor f(x) can be moved outside the integral if the rest does not change sign (and we replace x with an unknown point).
The integral form will be useful for simplifying the error in integration formulas. It allows us to pull complicated functions outside of an integral, at the cost of evaluating them at some unknown point. Integrating the rest can give nice error formulas, losing less structure than one would by just bounding terms. For example, if ξ_x is some point in (0, h) depending on x, then

\int_0^h f''(\xi_x)\, x\, dx = f''(\eta) \int_0^h x\, dx = \frac{h^2}{2} f''(\eta)

for a single value η ∈ (0, h) (this is ξ_{x^*} for some x^* ∈ (0, h)).