The Chain Rule

Section 14.3 The Chain Rule

To understand the Chain Rule we will need to slightly blur the distinction between function and variable.

Example 14.3.1.

Here’s what we mean: The formula $y=(3x^2+6x)^3$ is given entirely in terms of the variables $x$ and $y\text{.}$ To differentiate using differentials we would make the (variable) substitution $z=3x^2+6x$ so that $y=z^3\text{.}$ In that case, $\dx{y}=3z^2\dx{z}=3\left(3x^2+6x\right)^2(6x+6)\dx{x}\text{,}$ and dividing through by $\dx{x}$ gives us the derivative of $y$ with respect to $x\text{,}$

\begin{equation} \dfdx{y}{x}=3z^2\dfdx{z}{x}=3\left(3x^2+6x\right)^2(6x+6).\tag{14.1} \end{equation}
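As a cross-check that does not rely on the substitution, we can expand the cube and differentiate term by term. The expansion below is our own arithmetic, offered as a sanity check rather than as part of the example. With $z=3x^2+6x$ and $y=z^3\text{,}$

\begin{align*} y \amp{}= 27x^6+162x^5+324x^4+216x^3,\\ \dfdx{y}{x} \amp{}= 162x^5+810x^4+1296x^3+648x^2. \end{align*}

Multiplying out $3\left(3x^2+6x\right)^2(6x+6)=162x^2(x+1)(x+2)^2$ gives the same polynomial. For instance, at $x=1$ both expressions equal $2916\text{.}$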

But Definition 13.2.3 requires that we think about functions, not variables, so let’s translate this problem into the language of functions. If $y=\left(3x^2+6x\right)^3\text{,}$ clearly $y$ is a function of (depends on) $x\text{.}$ Naming that function $f\text{,}$ we have $y=f(x)\text{.}$ Replacing $y$ with $f(x)\text{,}$ we get $f(x)=(3x^2+6x)^3\text{.}$

Similarly, if $z=3x^2+6x$ then $z$ is also a function of (depends on) $x\text{,}$ and naming that function $\beta$ we have $z=\beta(x)\text{.}$ Replacing $z$ with $\beta(x)$ we have $f(x)=(\beta(x))^3\text{.}$ If we suppress the “$(x)$” part of $\beta(x)\text{,}$ we see that

\begin{equation*} f(\beta)= \beta^3 \end{equation*}

is also a valid representation of our function. If we now define $\alpha(\beta)=\beta^3$ we see that

\begin{equation*} f(x)= \alpha(\beta(x)). \end{equation*}

Looking again at equation (14.1), and mixing the differential and functional notations a bit we see that

\begin{equation*} f^\prime(x)=\dfdx{y}{x}=3z^2\dfdx{z}{x}=\underbrace{3\left(3x^2+6x\right)^2}_{\dfdx{\alpha}{\beta}=\alpha^\prime(\beta(x))}\underbrace{(6x+6)}_{\dfdx{\beta}{x}=\beta^\prime(x)}=\alpha^\prime(\beta(x))\cdot\beta^\prime(x). \end{equation*}

Thus if $f(x)=\alpha(\beta(x))$ is the composition of $\alpha(x)$ and $\beta(x)$ then

\begin{equation*} f^\prime(x)=\alpha^\prime(\beta(x))\beta^\prime(x). \end{equation*}

This is the Chain Rule. We have expressed the Chain Rule in this form so that we can prove it rigorously, not so that we can use it. The substitution process using differentials still works so there is no reason to stop using substitution when you are actually computing derivatives.

Theorem 14.3.2. The Chain Rule.

Suppose that $\beta(x)$ is differentiable at $x\text{,}$ that $\alpha(x)$ is differentiable at $\beta(x)$ and that $\Delta\beta\neq0$ near $x\text{.}$ Then the composition,

\begin{equation*} f(x) = \alpha(\beta(x)) \end{equation*}

is also differentiable, and

\begin{equation} f^\prime(x) =\alpha^\prime(\beta(x))\cdot\beta^\prime(x).\tag{14.2} \end{equation}
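Equation (14.2) can also be spot-checked numerically. The sketch below is ours, not part of the text: it reuses $\alpha(\beta)=\beta^3$ and $\beta(x)=3x^2+6x$ from Example 14.3.1, and compares the Chain Rule value with a central difference quotient. The function names and the step size `h` are illustrative assumptions.

```python
# Numerical sanity check of f'(x) = alpha'(beta(x)) * beta'(x).
# All names here are illustrative; they do not come from the text.

def beta(x):
    return 3 * x**2 + 6 * x

def beta_prime(x):
    return 6 * x + 6

def alpha(b):
    return b**3

def alpha_prime(b):
    return 3 * b**2

def f(x):
    # The composition alpha(beta(x)).
    return alpha(beta(x))

def chain_rule_derivative(x):
    # Right-hand side of equation (14.2).
    return alpha_prime(beta(x)) * beta_prime(x)

def central_difference(func, x, h=1e-6):
    # Symmetric difference quotient, an approximation to func'(x).
    return (func(x + h) - func(x - h)) / (2 * h)

if __name__ == "__main__":
    # beta'(x) is nonzero at these points, so Delta-beta is nonzero nearby.
    for x in (0.5, 1.0, 2.0):
        print(x, chain_rule_derivative(x), central_difference(f, x))
```

The two printed columns should agree to several digits at each sample point. Agreement at a few points is evidence, not a proof.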

DIGRESSION: The Origins of the Chain Rule.

Before the invention of Calculus, arithmetic primers gave the name “The Chain Rule” to the computational technique that is used to, among other things, convert money from one currency to another. For example, suppose we need to convert $30$ American dollars (\$) to British pounds (£) but we only know their values in euros (€). Specifically, we know that

\begin{align*} 1 \text{ dollar} = 0.86\text{ euros,} \amp{}\amp{}\text{ and that } \amp{}\amp{} 1\text{ euro} = 0.9 \text{ pounds.} \end{align*}

Then the conversion is

\begin{align*} 30 \text{ dollars} = 30 \textcolor{red}{\cancel{\text{dollars}}} \times \frac{0.86}{1} \frac{\textcolor{blue}{\cancel{ \text {euros}} } } {\textcolor{red}{\cancel{\text{dollars}}} } \amp{} \times \frac{0.9}{1} \frac{\text{pounds}}{\textcolor{blue}{\cancel{\text{euros}}}}\\ \amp{}=30\times0.86\times0.9\text{ pounds}\\ \amp{}=23.22£ \end{align*}
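The same chain of factors can be written as a short computation. This is only a sketch of the arithmetic above: the helper `chain_convert` and the constant names are our own, and the rates are the ones quoted in the example.

```python
# Chain of unit conversions: multiply the starting amount by each
# conversion factor in turn. The helper name is hypothetical.

def chain_convert(amount, factors):
    result = amount
    for factor in factors:
        result *= factor
    return result

EUROS_PER_DOLLAR = 0.86   # rate quoted in the example
POUNDS_PER_EURO = 0.9     # rate quoted in the example

pounds = chain_convert(30, [EUROS_PER_DOLLAR, POUNDS_PER_EURO])
print(round(pounds, 2))
```

Each factor plays the role of one link in the chain, and adding another intermediate currency only adds another factor to the list.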

Aside: Comment.

We’ve actually seen this type of conversion before. In Chapter 6 we converted angular velocity to linear velocity via the formula:

\begin{align*} \left(\frac{3}{1}\frac{\cancel{\text{revolution}}}{{\cancel{\text{minute}}}}\right)\cdot \left(\frac{2\pi}{1} \frac{\cancel{\text{radians}}}{\cancel{\text{revolution}}}\right)\cdot \left(\frac{10}{1}\frac{\text{meters}}{\cancel{\text{radians}}}\right)\amp{}\cdot \left(\frac{1}{60}\frac{\cancel{\text{minute}}}{\text{second}}\right)\\ \amp{}=\frac{\pi}{1} \frac{\text{meters}}{\text{second}}\\ \amp{}\approx 3.14 \frac{\text{meters}}{\text{second}}. \end{align*}

A similar chain of cancellations will occur when we differentiate a function composition of the form $\alpha(t)=\alpha(\beta(y(x(t))))\text{.}$ We think of

\begin{gather*} \alpha \text{ as a function of }\beta \left(\text{so that } \alpha^\prime(\beta)=\dfdx{\alpha}{\beta}\right),\\ \beta \text{ as a function of }y \left(\text{so that }\beta^\prime(y)=\dfdx{\beta}{y}\right),\\ y \text{ as a function of }x \left(\text{so that } y^\prime(x)=\dfdx{y}{x}\right), \end{gather*}

and

\begin{gather*} x \text{ as a function of }t \left(\text{so that } x^\prime(t)=\dfdx{x}{t}\right). \end{gather*}

Putting this all together we see that

\begin{equation*} \alpha^\prime(t) =\frac{\dx{\alpha} }{\cancel{\dx{\beta }}} \cdot\frac{\cancel{\dx{\beta}}}{\cancel{\dx{y}}}\cdot\frac{\cancel{\dx{y}}}{\cancel{\dx{x}}} \cdot \frac{\cancel{\dx{x} }}{\dx{t} } = \dfdx{\alpha}{t}. \end{equation*}
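To see the cancellations at work on a concrete chain, suppose (these four links are our own illustrative choices, not taken from the text) that $\alpha=\beta^2\text{,}$ $\beta=\sin(y)\text{,}$ $y=3x\text{,}$ and $x=t^2\text{.}$ Then

\begin{align*} \dfdx{\alpha}{t} \amp{}= \dfdx{\alpha}{\beta}\cdot\dfdx{\beta}{y}\cdot\dfdx{y}{x}\cdot\dfdx{x}{t}\\ \amp{}= 2\beta\cdot\cos(y)\cdot 3\cdot 2t\\ \amp{}= 12t\sin(3t^2)\cos(3t^2), \end{align*}

where in the last step we have substituted $y=3t^2$ and $\beta=\sin(3t^2)$ so that everything is expressed in terms of $t\text{.}$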

The substitution we used to make things “easier on your eyes” in Section 2.2 is equivalent to this chain of cancellations. With the invention of Calculus, the older Chain Rule for unit conversion was extended to the differentiation by substitution technique using differentials. Eventually the older usage was dropped and this became the only Chain Rule. When the limit was used to provide rigor to Calculus, the name was also applied to equation (14.2).

END OF DIGRESSION

Understanding the Chain Rule in this form requires that we blur the distinction between function and variable a bit. When we compute $\dfdx{\alpha}{\beta}=\alpha^\prime(\beta)$ (the derivative of $\alpha$ with respect to $\beta$) we view $\beta$ as a variable, but when we compute $\dfdx{\beta}{x}=\beta^\prime(x)$ (the derivative of $\beta$ with respect to $x$) we view it as a function.

As far as the Chain Rule is concerned it is both.

Proof.

Before we begin, take specific notice of the assumption “$\Delta\beta\neq0$ near $x$” in the statement of the Chain Rule. We will have a few comments about this in Digression: Why Assume That $\Delta\beta\neq0$ Near Zero? after the proof is completed.

We will first establish that

\begin{equation} \limit{h}{0}{\Delta\beta}=0.\tag{14.3} \end{equation}

Since $\beta(x)$ is differentiable at $x\text{,}$ by Theorem 17.4.22 it is also continuous at $x\text{.}$ Thus

\begin{align*} \limit{h}{0}{\Delta \beta } \amp{}= \limit{h}{0}{\left( \beta(x+h)-\beta(x)\right)}\\ \amp{}=\limit{h}{0}{\beta (x+h)}-\limit{h}{0}{\beta (x)}\\ \amp{}=\beta (x)-\beta (x) =0\text{.} \end{align*}

To prove the Chain Rule recall that

\begin{align*} f^\prime(x)\amp =\limit{h}{0}{\frac{f(x+h)-f(x)}{h}}\\ \amp =\limit{h}{0}{\frac{\alpha(\beta(x+h))-\alpha(\beta(x))}{h}}. \end{align*}

Multiplying by $1$ in the form $\textcolor{red}{\frac{\Delta\beta}{\Delta\beta}}$ gives

\begin{align} f^\prime(x) \amp =\limit{h}{0}{\left(\frac{\alpha(\beta(x+h))-\alpha(\beta(x))}{\textcolor{red}{\Delta\beta}}\cdot\frac{\textcolor{red}{\Delta\beta}}{h}\right)}.\tag{14.4} \end{align}

Since $\textcolor{red}{\Delta\beta}=\textcolor{blue}{\beta(x+h)-\beta(x)}$ we see that

\begin{align} f^\prime(x) \amp =\limit{h}{0}{\left(\frac{\alpha(\beta(x+h))-\alpha(\beta(x))}{\Delta\beta}\cdot\frac{\textcolor{blue}{\beta(x+h)-\beta(x)}}{h}\right)}.\notag \end{align}

By Theorem 14.1.2 we have:

\begin{align} f^\prime(x) \amp =\limit{h}{0}{\left(\frac{\alpha(\beta(x+h))-\alpha(\beta(x))}{\Delta\beta}\right)}\cdot\limit{h}{0}{\left(\frac{\beta(x+h)-\beta(x)}{h}\right)}.\notag \end{align}

Since $\beta(x+h)=\beta(x)+\Delta\beta\text{,}$ and equation (14.3) says that $\Delta\beta\rightarrow 0$ as $h\rightarrow 0\text{,}$ we have

\begin{align} f^\prime(x)\amp =\lim_{\textcolor{red}{\underset{\Delta\beta\rightarrow0}{\cancel{h\rightarrow0}}}}{\left(\frac{\alpha(\beta+\Delta\beta)-\alpha(\beta)}{\Delta\beta}\right)}\cdot\limit{h}{0}{\left(\frac{\beta(x+h)-\beta(x)}{h}\right)},\notag\\ f^\prime(x) \amp =\underbrace{\limit{\Delta\beta}{0}{\left(\frac{\alpha(\beta+\Delta\beta)-\alpha(\beta)}{\Delta\beta}\right)}}_{=\alpha^\prime(\beta)}\cdot\underbrace{\limit{h}{0}{\left(\frac{\beta(x+h)-\beta(x)}{h}\right)}}_{=\beta^\prime(x)}.\tag{14.5}\\ f^\prime(x) \amp =\alpha^\prime(\beta)\cdot\beta^\prime(x).\tag{14.6} \end{align}

In equation (14.6) $\beta$ is first used as a variable in $\alpha^\prime(\beta)\text{,}$ and then as the function $\beta(x)\text{.}$ While this is correct, it is also poor form because it accentuates the dual use of $\beta\text{.}$ To avoid this we usually express the Chain Rule as

\begin{equation*} f^\prime(x) =\alpha^\prime(\beta(x))\cdot\beta^\prime(x) \end{equation*}

to emphasize that $x\text{,}$ not $\beta\text{,}$ is the variable.
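To watch the steps of the proof on a concrete case, take the illustrative choices (ours, not the text’s) $\alpha(\beta)=\beta^2$ and $\beta(x)=3x\text{,}$ so that $f(x)=(3x)^2\text{.}$ Here $\Delta\beta=3(x+h)-3x=3h\text{,}$ which is nonzero whenever $h\neq0\text{,}$ so the hypothesis of the theorem holds. Equation (14.4) becomes

\begin{align*} f^\prime(x) \amp{}=\limit{h}{0}{\left(\frac{(3x+3h)^2-(3x)^2}{3h}\cdot\frac{3h}{h}\right)}\\ \amp{}=\limit{h}{0}{(6x+3h)}\cdot\limit{h}{0}{3}\\ \amp{}=6x\cdot 3 = 18x, \end{align*}

which agrees with $\alpha^\prime(\beta(x))\cdot\beta^\prime(x)=2(3x)\cdot 3=18x$ and with differentiating $f(x)=9x^2$ directly.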

DIGRESSION: Why Assume That $\Delta\beta\neq0$ Near Zero?

Do you see why we had to assume that $\Delta\beta\neq0$ near $x\text{?}$

Observe that in equation (14.5) $\Delta\beta$ plays the same role that $h$ plays in Definition 13.2.3. In Definition 13.2.3 we were careful to insist that $h$ could never equal zero, so if we are going to interpret

\begin{equation*} \limit{\Delta\beta}{0}{\left(\frac{\alpha(\beta+\Delta\beta)-\alpha(\beta)}{\Delta\beta}\right)} \end{equation*}

as the derivative of $\alpha$ with respect to $\beta\text{,}$ as we did in equation (14.5), we need to know that $\Delta\beta\neq0$ when $h$ is near zero.

Our imposition of that constraint means that Theorem 14.3.2 does not apply to any function $f(x)=\alpha(\beta(x))$ where $\Delta\beta$ might be equal to zero no matter how close $h$ is to zero. Fortunately, functions of that sort are generally the kinds of “pathological functions” that Poincaré complained about in the quote at the beginning of this chapter. A valid proof of the Chain Rule without that constraint is possible, but since it would have very little relevance to anything we’ll be doing, we have chosen to prove only this weaker form of the Chain Rule.

Aside: Comment.

If you are unsatisfied with this proof and want to see a proof of the stronger version of the Chain Rule, consider majoring in mathematics. You’ll see that and much, much more. In the meantime try working through the following problem.

Problem 14.3.3.

(a)

Show that the function $\beta(x)=\sin\left(\frac1x\right)$ does not satisfy the constraint $\Delta\beta\neq0$ when $x$ is near zero.

Hint.

Recall Definition 14.1.6.

(b)

As a result of part 14.3.3.a, Theorem 14.3.2 does not apply to any of the following functions at $x=0\text{.}$ Nevertheless, one of them is differentiable at $x=0\text{.}$ Use Definition 13.2.3 to find out which one.

$\displaystyle T(x)= \begin{cases} \sin\left(\frac1x\right) \amp x\neq0\\ 0 \amp x=0 \end{cases}.$

$\displaystyle U(x)= \begin{cases} x\sin\left(\frac1x\right) \amp x\neq0\\ 0 \amp x=0 \end{cases}.$

$\displaystyle V(x)= \begin{cases} x^2\sin\left(\frac1x\right) \amp x\neq0\\ 0 \amp x=0 \end{cases}.$

END OF DIGRESSION

Example 14.3.4.

Suppose that $f(x)=\left(\sin(x)+\cos(x)\right)^2\text{.}$ To use the Chain Rule to compute the derivative of $f(x)$ we need to recognize that $f(x)$ is the composition of $\alpha(x)=x^2$ and $\beta(x)=\sin(x)+\cos(x)\text{,}$ and then apply Theorem 14.3.2 as follows.

\begin{align*} f^\prime(x) \amp = \alpha^\prime(\beta(x))\cdot\beta^\prime(x)\\ \amp = \alpha^\prime(\beta(x))\cdot(\cos(x)-\sin(x))\\ \amp = \alpha^\prime(\sin(x)+\cos(x))\cdot(\cos(x)-\sin(x))\\ f^\prime(x)\amp = 2 (\sin(x)+\cos(x))\cdot(\cos(x)-\sin(x)). \end{align*}
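One way to double-check this result without the Chain Rule is to square out $f(x)$ first and then differentiate with the Product Rule. This check is our own addition, and it assumes the identity $\sin^2(x)+\cos^2(x)=1$ and the Product Rule are available at this point.

\begin{align*} f(x)\amp{}=\sin^2(x)+2\sin(x)\cos(x)+\cos^2(x)=1+2\sin(x)\cos(x),\\ f^\prime(x)\amp{}=2\left(\cos(x)\cos(x)-\sin(x)\sin(x)\right)=2\left(\cos^2(x)-\sin^2(x)\right). \end{align*}

Multiplying out the Chain Rule answer gives $2(\sin(x)+\cos(x))(\cos(x)-\sin(x))=2\left(\cos^2(x)-\sin^2(x)\right)\text{,}$ so the two computations agree.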

In our opinion the Chain Rule leaves a lot to be desired as a computational technique. But we don’t have to use it that way since Theorem 14.3.2 validates the substitutions we have always used.

Drill 14.3.5.

Suppose $y=f(x)=\left(\sin(x)+\cos(x)\right)^2\text{.}$ Compute the differential $\dx{y}$ and then divide through by $\dx{x}$ to find the derivative $\dfdx{y}{x}\text{.}$ Confirm that it is the same as the derivative we found in Example 14.3.4.

Problem 14.3.6.

Compute $\dfdx{y}{x}$ for each of the following functions by identifying $\alpha(x)$ and $\beta(x)$ such that $y(x) = \alpha(\beta(x))$ and applying the Chain Rule. You may have to do this more than once for a given problem. In each case confirm that your computation is correct with an appropriate differential substitution.

(a)

$y=(3x+5)^6$

(b)

$y=\sec(\tan(x))$

(c)

$y=\sqrt[7]{\frac{1}{x} +x^3}$

(d)

$y=\left(\frac{x-x^{\frac12}}{x^3-1}\right)^2$

(e)

$y=e^{x-\cos^2(x)}+(2x^2-3)^{\frac15}$

(f)

$y=\sqrt{x+\sqrt[3]{2+\sqrt[4]{3-x^2}}}$

Differential Calculus From Practice to Theory
