
Section 5.6 The Derivative

Figure 5.6.1. Joseph Louis Lagrange (1736–1813)
(https://mathshistory.st-andrews.ac.uk/Biographies/Lagrange/)
We have seen how differentials and the rules for computing them can be useful in applications involving slopes, velocities, and accelerations. The fact that they are infinitely small and difficult to rigorously define does not diminish their value as a problem-solving tool. Eventually, we will return to these foundational issues in Chapter 13.
As subtle as the concept of infinitely small changes such as \(\dx{x}\text{,}\) \(\dx{y}\text{,}\) or \(\dx{t}\) can be, we have seen how their ratios \(\dfdx{y}{x}\text{,}\) \(\dfdx{x}{t}\text{,}\) or \(\dfdxn{y}{x}{2} = \frac{\dx \left( \dfdx{y}{x} \right) }{\dx{x}}\) can represent actual finite physical quantities. For later mathematicians, these ratios became the focal point for the understanding of calculus.
Because of concerns regarding the validity of differentials, mathematicians in the \(18\)th and \(19\)th centuries had a strong motivation to skip over the differential concept and jump immediately to the more useful, and finite, differential ratio.
In his 1797 work Théorie des Fonctions Analytiques (The Theory of Analytic Functions; https://old.maa.org/press/periodicals/convergence/mathematical-treasure-lagrange-on-calculus), Joseph Louis Lagrange (https://mathshistory.st-andrews.ac.uk/Biographies/Lagrange/) attempted to make Calculus more rigorous. He even coined a new term for the differential ratio. He called it the fonction dérivée (meaning a function derived from another function). He also replaced the differential ratio \(\dfdx{y}{x}\) with the more modern function notation \(y^\prime(x)\) (read this aloud as “\(y\) prime of \(x\)”).
Lagrange’s attempt to make Calculus rigorous was very clever, but ultimately unsuccessful. Full rigor had to wait for another hundred years, so we will not say much about Lagrange’s efforts here. But we will adopt his terminology and his notation.

DIGRESSION: Function Notation and Prime Notation.

Suppose that \(y=x^3\text{.}\) It is clear that \(y\) depends on \(x\text{,}\) so we denote this functional dependence with the notation
\begin{equation*} y(x)=x^3. \end{equation*}
Lagrange called the differential ratio \(\dfdx{y}{x}\) a derived function. The “derived” part seems clear enough. After all, if \(y=x^3\) then \(\dfdx{y}{x}\) is obtained (derived) from \(y\) as follows:
\begin{align*} y\amp =x^3 \\ \dx{y}\amp =3x^2\dx{x}\\ \dfdx{y}{x}\amp =3x^2. \end{align*}
Since \(\dfdx{y}{x}=3x^2\) depends on \(x\) as a function, it would not be wrong to denote this functional dependence as
\begin{equation*} \dfdx{y}{x}(x) = 3x^2 \end{equation*}
but it would be awkward. Moreover, Lagrange was trying to get away from the use of differentials so instead he wrote
\begin{equation*} y^\prime(x)=3x^2 \end{equation*}
and called \(y^\prime(x)\) the fonction dérivée of \(y(x)\text{.}\) In English the phrase fonction dérivée has been shortened to the derivative.
In some contexts Lagrange’s prime notation has several advantages over the differential notation we’ve been using. Over time it has become the most common notation for the derivative in mathematics. But the fact that it took over \(100\) years to develop suggests that something more than mere notation is in play here.
Using multiple equivalent notations can be very confusing for beginners. Since our current task is simply to master the differentiation rules we will stick to Leibniz’s differential notation as much as possible. But there will come a time when Lagrange’s prime notation will be much more convenient. At that point we will casually use the two expressions \(\dfdx{y}{x}\) and \(y^\prime(x)\) interchangeably and we will think of them both as a function derived from the function \(y(x)\text{.}\)
When we do this the differential notation we’re currently emphasizing will take on two distinct “personalities.” On the one hand \(\dfdx{y}{x}\) represents a ratio of the differentials \(\dx{y}\) and \(\dx{x}\text{,}\) which are distinct infinitesimal quantities. On the other hand \(\dfdx{y}{x}\) is the name of a function -- it is all one symbol. When we are thinking of \(\dfdx{y}{x}\) as the derivative function we cannot detach the pieces of \(\dfdx{y}{x}\) any more than we can delete the letter “n” from \(\sin(x)\text{,}\) because \(\text{si}(x)\) has no meaning.
Eventually the differentials we’ve been using so casually will become a guilty secret. Given \(y=y(x)\) we’ll use them as a helpful aid while we compute. But as soon as we have \(\dfdx{y}{x}\) in hand we will view it as a single, complete symbol representing the (finite) derivative function. Often we will simply replace it with \(y^\prime(x)\) as if we are ashamed of having used differentials at all.
This more advanced viewpoint will become commonplace later, but to give you a preview, consider the following problem.

Problem 5.6.2.

Recall that in Descartes’ Method of Normals, we had to find a double root of a polynomial. To deal with this problem, Johann van Waveren Hudde (1628–1704; https://mathshistory.st-andrews.ac.uk/Biographies/Hudde/) developed an algebraic tool for determining such double roots. Calculus allows a development of Hudde’s Rule that does not require the complex algebraic reasoning that Hudde used and is much easier to follow.
Consider any polynomial \(p(x)=a_0+a_1 x+\cdots+a_n x^n\text{.}\) Let \(a\) and \(b\) be any real numbers and form the following “Hudde Polynomial.”
\begin{equation*} H(x) = a a_0 + (a+b) a_1 x + (a+2b) a_2 x^2 + (a+3b) a_3 x^3 + \cdots + (a+nb) a_n x^n \end{equation*}
Hudde showed that if \(r\) is a double root of \(p(x)\text{,}\) then \(r\) is a root of the Hudde polynomial \(H(x).\)
(a)
Show that if \(r\) is a double root of the polynomial \(p(x)\) then it is a root of \(p^\prime(x)=\dfdx{p}{x}\text{.}\)
Hint.
If \(r\) is a double root of \(p(x)\text{,}\) then \(p(x) = (x-r)^2q(x)\) for some polynomial \(q(x)\text{.}\)
(b)
Show that \(H(x) = ap(x)+bxp^\prime(x)\) and use this to prove Hudde’s Rule.
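Before working through the algebra, it may help to see Hudde’s claim in action. The following is a small numerical sketch of our own (the function names `eval_poly` and `hudde_poly` are ours, not part of the text): for a sample polynomial with a known double root, the Hudde polynomial vanishes at that root for every choice of \(a\) and \(b\) we try.

```python
# Numerically check Hudde's Rule for p(x) = (x - 2)^2 (x + 1)
# = x^3 - 3x^2 + 4, which has a double root at r = 2.
# Polynomials are stored as coefficient lists [a0, a1, ..., an].

def eval_poly(coeffs, x):
    """Evaluate a0 + a1*x + ... + an*x^n."""
    return sum(a * x**k for k, a in enumerate(coeffs))

def hudde_poly(coeffs, a, b):
    """Form the Hudde polynomial: the coefficient of x^k becomes (a + k*b)*a_k."""
    return [(a + k * b) * c for k, c in enumerate(coeffs)]

p = [4, 0, -3, 1]   # x^3 - 3x^2 + 4 = (x - 2)^2 (x + 1)
r = 2               # the double root of p(x)

# For several choices of a and b, r should be a root of H(x).
for a, b in [(1, 0), (0, 1), (2, 3), (-1, 5)]:
    H = hudde_poly(p, a, b)
    print(a, b, eval_poly(H, r))   # the value of H(r) is 0 in every case
```

Note that the identity in part (b), \(H(x) = ap(x)+bxp^\prime(x)\text{,}\) explains the output: at a double root both \(p(r)\) and \(p^\prime(r)\) vanish.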
END OF DIGRESSION
The bottom line is that we will adopt the name derivative to indicate the result of dividing one differential by another. So the expression \(\dfdx{y}{x}\) is “the derivative of \(y\) with respect to \(x\text{.}\)” Although \(y^\prime(x)\) is simpler to write and to read, the sheer simplicity of the notation can be a problem for beginners, so we will only use it when the prime notation actually clarifies our meaning. The first place this occurs is in Section 7.2.
When computing a derivative you will eventually become sufficiently proficient that you will jump directly to the derivative. But for now you should go through the two-step process of differentiating to obtain a differential and then dividing by another differential to obtain a derivative, because the computational rules you’ve learned are differentiation rules, not derivative rules. If you do this, you will avoid some difficulties created by trying to compute too much too soon. This is illustrated in the following example, where we purposely use prime notation to highlight the difficulties involved in the computation.

Example 5.6.3.

Given \(y(x)=(1+x^2)^\frac12\) we wish to compute \(y^\prime(x)\text{.}\) Setting \(z=1+x^2\) we see that
\begin{equation*} y(x)=z^\frac12. \end{equation*}
By the Power Rule we have
\begin{equation} y^\prime(x)=\frac12z^{-\frac12}\text{.}\tag{5.6} \end{equation}
This would seem to be correct but it is not. Do you see the problem?
The left side of equation (5.6) indicates that the variable is \(x\) but there is no \(x\) on the right side, only \(z\text{.}\) So this can’t be right. But what went wrong? We can avoid problems like this by using differentials:
\begin{equation*} \label{eq:PrimeVSDiff2} \dx{y}=\frac12z^{-\frac12}\dx{z}. \end{equation*}
At this point if we divide by \(\dx{z}\) we recover equation (5.6) in the form:
\begin{equation*} \dfdx{y}{z}=\frac12z^{-\frac12}. \end{equation*}
Thus we see the left side of equation (5.6) should have been \(y^\prime(z)\) not \(y^\prime(x)\text{.}\)
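The mismatch can also be seen numerically. Here is a small sketch of our own (not part of the text): we compare the right side of equation (5.6) against a finite-difference estimate of the rate of change of \(y\) with respect to \(x\text{,}\) and the two clearly disagree.

```python
# Compare the right side of equation (5.6), (1/2) z^(-1/2),
# with a finite-difference estimate of dy/dx for y = (1 + x^2)^(1/2).

def y(x):
    return (1 + x**2) ** 0.5

x0 = 1.0
h = 1e-6
dy_dx = (y(x0 + h) - y(x0 - h)) / (2 * h)   # numerical estimate of dy/dx

z0 = 1 + x0**2
naive = 0.5 * z0 ** -0.5                     # right side of (5.6)

print(dy_dx)   # about 0.7071
print(naive)   # about 0.3536 -- not the same, so (5.6) is not y'(x)
```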

Problem 5.6.4.

Starting with equation (5.6) complete the computation of \(y^\prime(x)\text{.}\)

Example 5.6.5.

Of course, using differentials does not address all of the difficulties. For example, let \(y= x^3\text{.}\) Then
\begin{equation*} \dx{y}= \dx{(x^3)}= 3x^2\dx{x} \left(\text{First Derivative: }\dfdx{y}{x}=3x^2\right) \end{equation*}
So far, so good. Next we apply the Product Rule,
\begin{equation} \dx{(\dx{y})}= \dx(3x^2\dx{x}) = 3x^2\underbrace{\dx{(\dx{x})}}_{=0} + 6x\dx{x}\dx{x}\tag{5.7} \end{equation}
so
\begin{equation} \dx{(\dx{y})} = 6x\dx{x}\dx{x} \left(\text{Second Derivative: }\frac{\dx(\dx{y})}{\dx{x}\dx{x}}=6x=\dfdxn{y}{x}{2}.\right)\tag{5.8} \end{equation}
The glaring question here is: why is \(\dx(\dx{x})\) equal to zero in equation (5.7) while \(\dx(\dx{y})\) is not equal to zero in equation (5.8)? Or, at a more fundamental level, what do we mean by “the infinitely small change of an infinitely small change”? As we will see in Chapter 13, the early critics of Calculus cited this question specifically to argue that Calculus was invalid.
We will address these issues beginning in Chapter 13. For now we will make the following compromise: We will only differentiate finite quantities, be they functions or derivatives. Since our ultimate goal is to compute some derivative, this will suit our needs without getting caught up in the very problematic question of the nature of higher order differentials. So for this example we have
\begin{align*} y\amp =x^3\\ \dx{y}\amp =3x^2\dx{x}\\ \text{ (First Derivative) } \dfdx{y}{x}\amp =3x^2\\ \dx\left(\dfdx{y}{x}\right)\amp =6x\dx{x}\\ \text{(Second Derivative) } \dfdxn{y}{x}{2}\amp =\dfdx{\left(\dfdx{y}{x}\right)}{x}=6x\\ \text{(Third Derivative) }\dfdxn{y}{x}{3}\amp =\dfdx{\left(\dfdxn{y}{x}{2}\right)}{x}=6. \end{align*}
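This differentiate-then-divide process can be mimicked for any polynomial by operating on its list of coefficients, since each differentiation sends \(a_k x^k\) to \(k a_k x^{k-1}\text{.}\) The following sketch is our own (the helper name `derive` is ours); it reproduces the three derivatives of \(y=x^3\) computed above.

```python
# Successive derivatives of a polynomial stored as [a0, a1, ..., an]:
# differentiating sends the coefficient a_k of x^k to k*a_k on x^(k-1).

def derive(coeffs):
    """One application of d/dx to a coefficient list."""
    return [k * a for k, a in enumerate(coeffs)][1:]

y = [0, 0, 0, 1]    # x^3
d1 = derive(y)      # [0, 0, 3] -> 3x^2  (first derivative)
d2 = derive(d1)     # [0, 6]    -> 6x    (second derivative)
d3 = derive(d2)     # [6]       -> 6     (third derivative)

print(d1, d2, d3)
```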

Example 5.6.6.

Consider the expression \(y=\frac1x =x^{-1}\text{.}\) Differentiating we have
\begin{align*} \dfdx{y}{x}\amp =(-1)x^{-2},\\ \dfdxn{y}{x}{2} \amp = (-1)(-2)x^{-3},\\ \dfdxn{y}{x}{3} \amp = (-1)(-2)(-3)x^{-4}, \text{ and finally}\\ \dfdxn{y}{x}{4} \amp = (-1)(-2)(-3)(-4)x^{-5} \end{align*}
You’ve probably been taught all of your life to “simplify” complex-looking expressions like \((-1)(-2)(-3)(-4)\text{,}\) and you probably do it without thinking. So you may be wondering why we left the coefficients above in the form we did.
The reason is simple. We were looking for patterns, not numbers. Writing the above formulas as \(\dfdxn{y}{x}{2} = 2x^{-3},\) \(\dfdxn{y}{x}{3} = -6x^{-4}\text{,}\) and \(\dfdxn{y}{x}{4} = 24x^{-5}\) obscures the pattern. Keep this in mind as you proceed. Algebraic or arithmetical “simplifications” often get in the way of recognizing patterns. Don’t do them until there is a compelling reason to.

Problem 5.6.7. Find the Pattern.

Find the pattern in Example 5.6.6. Use this pattern to find \(\dfdxn{y}{x}{50}\) directly, without computing all fifty derivatives.

Example 5.6.8.

Consider the circle \(x^2+y^2=1\text{.}\) Differentiating, we have \(2x\dx{x}+2y\dx{y}=0\text{,}\) or \(\dfdx{y}{x}=-\frac{x}{y}\text{.}\) Differentiating again we have
\begin{equation*} \dx{\left(\dfdx{y}{x}\right)}=-\frac{y\dx{x}-x\dx{y}}{y^2}. \end{equation*}

Problem 5.6.9.

(a)

Continue this example to show that \(\dfdxn{y}{x}{2}=-\frac{1}{y^3}\text{.}\)

(b)

Show that \(y=\pm\sqrt{1-x^2}\) and use this to compute \(\dfdxn{y}{x}{2}\text{.}\)

(c)

Do you get the same answer? Which method do you prefer?
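Since part (a) states the target formula, we can at least sanity-check it numerically before doing the algebra. This sketch is our own (not part of the problem); it uses the upper semicircle \(y=\sqrt{1-x^2}\) and a central second difference to approximate \(\dfdxn{y}{x}{2}\text{.}\)

```python
# Check d2y/dx2 = -1/y^3 on the unit circle, using y = sqrt(1 - x^2).
import math

def y(x):
    return math.sqrt(1 - x**2)

x0 = 0.6
h = 1e-4
# central second difference approximates the second derivative
second = (y(x0 + h) - 2 * y(x0) + y(x0 - h)) / h**2
claimed = -1 / y(x0) ** 3

print(second, claimed)   # both about -1.9531
```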

Problem 5.6.10.

For each of the following find \(\dfdxn{y}{x}{2}\) in terms of \(x\) and \(y\text{.}\)

(a)

\(y=3x^4-x^3+2x-7\)

(b)

\(x=y^2\)

(c)

\(y=\sqrt{x}\) Compare this with part (b).

(d)

\(xy=1\) Compare this with Example 5.6.6. Which method do you prefer?

(e)

\(\frac{x^2+y}{3x+y^2}=x-y\)

Problem 5.6.11.

We know that it is not generally true that \(a^b=a\cdot b\text{,}\) even though there are certain exceptions, like \(a=b=1\text{,}\) \(a=4\) and \(b=1/2\text{,}\) or \(a=b=2\text{.}\) In the same way, even though the Product Rule makes it very clear that, in general,
\begin{equation} \dfdx{(y\cdot z)}{x}\neq \dfdx{y}{x}\cdot\dfdx{z}{x}\text{,}\tag{5.9} \end{equation}
there are certain exceptional pairs of functions for which equality does hold. Show that for each of the following pairs it is true that \(\dfdx{(y\cdot z)}{x}= \dfdx{y}{x}\cdot\dfdx{z}{x}\text{.}\)

(a)

  1. \(y=x\)
    \(\displaystyle z=\frac{1}{1-x}\)
  2. \(y=x^2\)
    \(\displaystyle z=\frac{1}{(2-x)^{2}}\)
  3. \(y=x^3\)
    \(z=(3-x)^{-3}\)
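Before attacking the algebra, it can be reassuring to test the claim numerically. Here is a quick sketch of our own checking the first pair at a sample point with finite differences (the helper `d` is ours):

```python
# Numerical check that d(yz)/dx = (dy/dx)(dz/dx) for y = x, z = 1/(1 - x).

def y(x): return x
def z(x): return 1 / (1 - x)

x0, h = 0.3, 1e-6

def d(f, x):
    """Central-difference approximation of df/dx."""
    return (f(x + h) - f(x - h)) / (2 * h)

lhs = d(lambda x: y(x) * z(x), x0)   # derivative of the product
rhs = d(y, x0) * d(z, x0)            # product of the derivatives

print(lhs, rhs)   # both about 2.0408
```

A numerical check at one point is of course not a proof; the problem asks you to verify the equality symbolically.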

(b)

Find the general pattern in part (a).

(c)

Those pairs of functions which fit the pattern you found in part (b) are not the only exceptional pairs. Can you find others?
Adopting Lagrange’s terminology, but not his notation, we see that if the position of a point moving in a straight line (like the \(x\) axis) is given by \(x=x(t)\text{,}\) then the first derivative, \(\dfdx{x}{t}\text{,}\) will give its velocity, and its second derivative, \(\dfdxn{x}{t}{2}\text{,}\) will give its acceleration.
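As a sketch of our own (reusing \(x(t)=t^3\) from Example 5.6.5, not one of the drill items below), the velocity and acceleration can be estimated with finite differences and compared against the derivatives \(3t^2\) and \(6t\) computed earlier:

```python
# Velocity and acceleration for x(t) = t^3, whose derivatives
# (from Example 5.6.5) are dx/dt = 3t^2 and d2x/dt2 = 6t.

def x(t):
    return t**3

t0, h = 2.0, 1e-4

velocity = (x(t0 + h) - x(t0 - h)) / (2 * h)               # approximates 3*t0**2
acceleration = (x(t0 + h) - 2 * x(t0) + x(t0 - h)) / h**2  # approximates 6*t0

print(velocity, acceleration)   # both about 12.0 at t0 = 2
```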

Drill 5.6.12.

Each of the following represents the position of a point on the \(x\)-axis at time \(t\text{.}\) Find the velocity and acceleration.
  1. \(\displaystyle x(t)=12t^3\)
  2. \(\displaystyle x(t)=-4t^4+3t^2+1\)
  3. \(\displaystyle x(t)=5-2\sqrt{t}+t^3\)
  4. \(\displaystyle x(t)=\frac12t^{1/2}+t^{-1/2}\)
  5. \(\displaystyle x(t)=\frac{1}{\sqrt{t^2+t+1}}\)
  6. \(\displaystyle x(t)= t^{2/3}\)

Problem 5.6.13.

For each of the following \(x(t)\) represents the position of a point moving along the \(x\)-axis. Use the information given to determine if the point is slowing down or speeding up at the instant \(t_0\text{.}\)

(a)

\(\left.\dfdx{x}{t}\right|_{t=t_0}\gt 0, \left.\dfdxn{x}{t}{2}\right|_{t=t_0}\gt 0\)

(b)

\(\left.\dfdx{x}{t}\right|_{t=t_0}\gt 0, \left.\dfdxn{x}{t}{2}\right|_{t=t_0}\lt 0\)

(c)

\(\left.\dfdx{x}{t}\right|_{t=t_0}\lt 0, \left.\dfdxn{x}{t}{2}\right|_{t=t_0}\gt 0\)

(d)

\(\left.\dfdx{x}{t}\right|_{t=t_0}\lt 0, \left.\dfdxn{x}{t}{2}\right|_{t=t_0}\lt 0\)