To understand the Chain Rule we will need to slightly blur the distinction between function and variable.
Example 14.3.1.
Here’s what we mean: The formula \(y=(3x^2+6x)^3\) is given entirely in terms of the variables \(x\) and \(y\text{.}\) To differentiate using differentials we would make the (variable) substitution \(z=3x^2+6x\) so that \(y=z^3\text{.}\) In that case, \(\dx{y}=3z^2\dx{z}=3\left(3x^2+6x\right)^2(6x+6)\dx{x}\text{,}\) and dividing through by \(\dx{x}\) gives us the derivative of \(y\) with respect to \(x\text{,}\)
\begin{gather*}
\dfdx{y}{x}=3\left(3x^2+6x\right)^2(6x+6)\text{.}
\end{gather*}
But Definition 13.2.3 requires that we think about functions, not variables, so let’s translate this problem into the language of functions. If \(y=\left(3x^2+6x\right)^3\text{,}\) clearly \(y\) is a function of (depends on) \(x\text{.}\) Naming that function \(f\text{,}\) we have \(y=f(x)\text{.}\) Replacing \(y\) with \(f(x)\text{,}\) we get \(f(x)=(3x^2+6x)^3\text{.}\)
Similarly, if \(z=3x^2+6x\) then \(z\) is also a function of (depends on) \(x\text{,}\) and naming that function \(\beta\) we have \(z=\beta(x)\text{.}\) Replacing \(z\) with \(\beta(x)\) we have \(f(x)=(\beta(x))^3\text{.}\) If we suppress the “\((x)\)” part of \(\beta(x)\text{,}\) we see that
\begin{gather*}
\dfdx{f}{x}=\dfdx{f}{\beta}\cdot\dfdx{\beta}{x}=3\beta^2\cdot\dfdx{\beta}{x}\text{.}
\end{gather*}
This is the Chain Rule. We have expressed the Chain Rule in this form so that we can prove it rigorously, not so that we can use it. The substitution process using differentials still works so there is no reason to stop using substitution when you are actually computing derivatives.
Theorem 14.3.2. The Chain Rule.
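Suppose that \(\beta(x)\) is differentiable at \(x\text{,}\) that \(\alpha(x)\) is differentiable at \(\beta(x)\text{,}\) and that \(\Delta\beta\neq0\) near \(x\text{.}\) Then the composition \(\alpha(\beta(x))\) is also differentiable at \(x\text{,}\) and
\begin{gather*}
\dfdx{\alpha}{x}=\dfdx{\alpha}{\beta}\cdot\dfdx{\beta}{x}\text{.}
\end{gather*}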
Before the invention of Calculus, arithmetic primers gave the name “The Chain Rule” to the computational technique that is used to, among other things, convert money from one currency to another. For example, suppose we need to convert \(30\) American dollars ($) to British pounds (£) but we only know their values in euros (€). Specifically, we know that
\begin{align*}
1 \text{ dollar} = 0.86\text{ euros,} \amp{}\amp{}\text{ and that }
\amp{}\amp{} 1\text{ euro} = 0.9 \text{ pounds.}
\end{align*}
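Written out, the conversion is a product of ratios in which the intermediate units cancel. The following is a sketch of that arithmetic using only the two rates above; laying it out as a product of fractions is our own choice of presentation.
\begin{gather*}
30\text{ dollars}\times\frac{0.86\text{ euros}}{1\text{ dollar}}\times\frac{0.9\text{ pounds}}{1\text{ euro}}=23.22\text{ pounds}
\end{gather*}
The dollars cancel against dollars and the euros against euros, leaving only pounds.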
A similar chain of cancellations will occur when we differentiate a function composition of the form \(\alpha(t)=\alpha(\beta(y(x(t))))\text{.}\) We think of
\begin{gather*}
\alpha \text{ as a function of }\beta \left(\text{so that }
\alpha^\prime(\beta)=\dfdx{\alpha}{\beta}\right),\\
\beta \text{ as a function of }y \left(\text{so that }
\beta^\prime(y)=\dfdx{\beta}{y}\right),\\
y \text{ as a function of }x \left(\text{so that }
y^\prime(x)=\dfdx{y}{x}\right),
\end{gather*}
and
\begin{gather*}
x \text{ as a function of }t \left(\text{so that }
x^\prime(t)=\dfdx{x}{t}\right).
\end{gather*}
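To see the cancellation, here is a sketch that treats each derivative formally as a quotient of differentials (an informal step, not a proof):
\begin{gather*}
\dfdx{\alpha}{t}=\dfdx{\alpha}{\beta}\cdot\dfdx{\beta}{y}\cdot\dfdx{y}{x}\cdot\dfdx{x}{t}
\end{gather*}
The intermediate differentials \(\dx{\beta}\text{,}\) \(\dx{y}\text{,}\) and \(\dx{x}\) cancel just as the intermediate currency units do, leaving \(\dx{\alpha}\) over \(\dx{t}\text{.}\)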
The substitution we used to make things “easier on your eyes” in Section 2.2 is equivalent to this chain of cancellations. With the invention of Calculus, the older Chain Rule for unit conversion was extended to the differentiation by substitution technique using differentials. Eventually the older usage was dropped and this became the only Chain Rule. When the limit was used to provide rigor to Calculus, the name was also applied to equation (14.2).
END OF DIGRESSION
Understanding the Chain Rule in this form requires that we blur the distinction between function and variable a bit. When we compute \(\dfdx{\alpha}{\beta}=\alpha^\prime(\beta)\) (the derivative of \(\alpha\) with respect to \(\beta\)) we view \(\beta\) as a variable, but when we compute \(\dfdx{\beta}{x}=\beta^\prime(x)\) (the derivative of \(\beta\) with respect to \(x\)) we view it as a function.
As far as the Chain Rule is concerned it is both.
Proof.
Before we begin take specific notice of the assumption “\(\Delta\beta\neq0\) near \(x\)” in the statement of the Chain Rule. We will have a few comments about this in Digression: Why Assume That \(\Delta\beta\neq0\) Near Zero? after the proof is completed.
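The heart of the argument can be sketched as follows. This is only an outline of one standard approach under the hypotheses of Theorem 14.3.2. The shorthand \(\Delta\beta=\beta(x+h)-\beta(x)\) and \(\Delta\alpha=\alpha(\beta(x)+\Delta\beta)-\alpha(\beta(x))\) is assumed here for illustration.
\begin{align*}
\dfdx{\alpha}{x}\amp{}=\lim_{h\rightarrow0}\frac{\alpha(\beta(x+h))-\alpha(\beta(x))}{h}\\
\amp{}=\lim_{h\rightarrow0}\left(\frac{\Delta\alpha}{\Delta\beta}\cdot\frac{\Delta\beta}{h}\right)\\
\amp{}=\left(\lim_{\Delta\beta\rightarrow0}\frac{\Delta\alpha}{\Delta\beta}\right)\cdot\left(\lim_{h\rightarrow0}\frac{\Delta\beta}{h}\right)\\
\amp{}=\alpha^\prime(\beta)\cdot\beta^\prime(x)
\end{align*}
The second line multiplies and divides by \(\Delta\beta\text{,}\) which is legitimate only because \(\Delta\beta\neq0\text{.}\) The third line uses the fact that \(\beta\) is differentiable, and hence continuous, at \(x\text{,}\) so that \(\Delta\beta\rightarrow0\) as \(h\rightarrow0\text{.}\)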
In equation (14.6) \(\beta\) is first used as a variable in \(\alpha^\prime(\beta)\text{,}\) and then as the function \(\beta(x)\text{.}\) While this is correct, it is also poor form because it accentuates the dual use of \(\beta\text{.}\) To avoid this we usually express the Chain Rule as
\begin{gather*}
\dfdx{\left(\alpha(\beta(x))\right)}{x}=\alpha^\prime(\beta(x))\cdot\beta^\prime(x)
\end{gather*}
to emphasize that \(x\text{,}\) not \(\beta\text{,}\) is the variable.
DIGRESSION: Why Assume That \(\Delta\beta\neq0\) Near Zero?
Do you see why we had to assume that \(\Delta\beta\neq0\) near \(x\text{?}\)
Observe that in equation (14.5) \(\Delta\beta\) plays the same role that \(h\) plays in Definition 13.2.3. In Definition 13.2.3 we were careful to insist that \(h\) could never equal zero, so if we are going to interpret
\begin{gather*}
\lim_{\Delta\beta\rightarrow0}\frac{\alpha(\beta+\Delta\beta)-\alpha(\beta)}{\Delta\beta}
\end{gather*}
as the derivative of \(\alpha\) with respect to \(\beta\text{,}\) as we did in equation (14.5), we need to know that \(\Delta\beta\neq0\) when \(h\) is near zero.
Our imposition of that constraint means that Theorem 14.3.2 does not apply to any function \(f(x)=\alpha(\beta(x))\) where \(\Delta\beta\) might be equal to zero no matter how close \(h\) is to zero. Fortunately, functions of that sort are generally the kinds of “pathological functions” that Poincaré complained about in the quote at the beginning of this chapter. A valid proof of the Chain Rule without that constraint is possible, but since it would have very little relevance to anything we’ll be doing, we have chosen to prove only this weaker form of the Chain Rule.
If you are unsatisfied with this proof and want to see a proof of the stronger version of the Chain Rule, consider majoring in mathematics. You’ll see that and much, much more. In the meantime try working through the following problem.
Problem 14.3.3.
(a)
Show that the function \(\beta(x)=\sin\left(\frac1x\right)\) does not satisfy the constraint \(\Delta\beta\neq0\) when \(x\) is near zero.
As a result of part 14.3.3.a, Theorem 14.3.2 does not apply to any of the following functions at \(x=0\text{.}\) Nevertheless, one of them is differentiable at \(x=0\text{.}\) Use Definition 13.2.3 to find out which one.
Example 14.3.4.
Suppose that \(f(x)=\left(\sin(x)+\cos(x)\right)^2\text{.}\) To use the Chain Rule to compute the derivative of \(f(x)\) we need to recognize that \(f(x)\) is the composition of \(\alpha(x)=x^2\) and \(\beta(x)=\sin(x)+\cos(x)\text{,}\) and then apply Theorem 14.3.2 as follows.
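Carried out step by step, the computation might look like the following sketch. It assumes the Chain Rule in the form \(f^\prime(x)=\alpha^\prime(\beta(x))\cdot\beta^\prime(x)\text{,}\) together with \(\alpha^\prime(x)=2x\) and \(\beta^\prime(x)=\cos(x)-\sin(x)\text{.}\)
\begin{align*}
f^\prime(x)\amp{}=\alpha^\prime(\beta(x))\cdot\beta^\prime(x)\\
\amp{}=2\left(\sin(x)+\cos(x)\right)\cdot\left(\cos(x)-\sin(x)\right)\\
\amp{}=2\left(\cos^2(x)-\sin^2(x)\right)
\end{align*}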
In our opinion the Chain Rule leaves a lot to be desired as a computational technique. But we don’t have to use it that way since Theorem 14.3.2 validates the substitutions we have always used.
Drill 14.3.5.
Suppose \(y=f(x)=\left(\sin(x)+\cos(x)\right)^2\text{.}\) Compute the differential \(\dx{y}\) and then divide through by \(\dx{x}\) to find the derivative \(\dfdx{y}{x}\text{.}\) Confirm that it is the same as the derivative we found in Example 14.3.4.
Problem 14.3.6.
Compute \(\dfdx{y}{x}\) for each of the following functions by identifying \(\alpha(x)\) and \(\beta(x)\) such that \(y(x) = \alpha(\beta(x))\) and applying the Chain Rule. You may have to do this more than once for a given problem. In each case confirm that your computation is correct with an appropriate differential substitution.