5 Derivatives

These notes are reusing text and imagery from: Calculus Volume 1 from OpenStax, Print ISBN 193816802X, Digital ISBN 1947172131, https://www.openstax.org/details/calculus-volume-1 and Precalculus from OpenStax, Print ISBN 1938168348, Digital ISBN 1947172069, https://www.openstax.org/details/precalculus

5.1 Defining the derivative

The origins of calculus date back to the 17th century to the works of Isaac Newton and Gottfried Leibniz.

5.1.1 Secant line

We already discussed in section 1.2 Increasing / decreasing functions how to calculate the average rate of change of a function between two points.

The line intersecting the curve at the two points of interest is called the secant line. Note that the slope of the secant line is equal to the average rate of change of the function between the two points.

\[ \text{avg rate of chg } = m_{sec} = \frac{f(x)-f(a)}{x - a} = \frac{\Delta y}{\Delta x} \enspace .\]

Instead of using the points \(a\) and \(x\), we can use the points \(a\) and \(a+h\).

In this notation the average rate of change and the slope of the secant line are

\[\text{avg rate of chg } = m_{sec} = \frac{f(a+h)-f(a)}{a + h - a} = \frac{f(a+h)-f(a)}{h} \enspace .\]
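As a small illustration (not part of the original text), take \(f(x) = x^2\) between \(a = 1\) and \(a + h = 3\), i.e. \(h = 2\):

\[ m_{sec} = \frac{f(3)-f(1)}{3 - 1} = \frac{9 - 1}{2} = 4 \enspace .\]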

5.1.2 Tangent line, derivative

As \(h\) gets smaller and the point \(a+h\) moves closer to \(a\), the secant line approaches the tangent line at the point \(a\), and the average rate of change approaches the instantaneous rate of change at the point \(a\): the derivative at the point \(a\), denoted \(f'(a)\).

Definition: The derivative of a function \(f(x)\) at a point \(a\), denoted \(f'(a)\), is the limit \[ f'(a) = \text{inst rate of chg } = m_{tan} = \lim_{x \to a} \frac{f(x)-f(a)}{x - a} \enspace ,\] provided it exists. In the alternative notation the derivative \(f'(a)\) is defined as \[ f'(a) = \text{inst rate of chg } = m_{tan} = \lim_{h \to 0} \frac{f(a+h)-f(a)}{h} \enspace .\] We call the process of finding the derivative differentiation.
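A short worked example (not part of the original text) for \(f(x) = x^2\):

\[ f'(a) = \lim_{h \to 0} \frac{(a+h)^2 - a^2}{h} = \lim_{h \to 0} \frac{2ah + h^2}{h} = \lim_{h \to 0} (2a + h) = 2a \enspace .\]

So, for instance, the slope of the tangent line to \(f(x) = x^2\) at \(a = 1\) is \(f'(1) = 2\).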

We can find the derivative directly from the definition, as in the example above. For this, one must be quite skilled and experienced in solving limits. The other approach, more common in practice, is to rely on a few basic rules of differentiation. More on this later in section 5.4 Differentiation rules.

Graph: limit definition of derivative

5.2 Derivative function

We have defined a derivative \(f'(x)\) of a function \(f\) at a point \(x\) within the domain of the original function as a limit. It gives us the slope of the tangent line or the instantaneous rate of change at the point \(x\), useful information about the shape of the function.

It is quite natural to ask about the values of the derivative at every point of the domain of the original function \(f\).

Definition: For a function \(f\) the derivative function \(f'\) is the function that evaluates to the limit below for all points \(x\) in the domain where the limit exists \[f'(x) = \lim_{h \to 0} \frac{f(x+h)-f(x)}{h} \enspace .\] Note: This definition is quite similar to the definition of the derivative at a point. Do not panic, they are indeed very similar. The only difference is that in one we speak about the derivative at a specific point \(f'(x)\) and in the other about the derivative function \(f'\) in general. But as usual, the derivative at a point is simply the evaluation of the derivative function at that point.

Also, as for any other function, we can create a table of values of the derivative function \(f'\) and plot it in a graph. It is simply a function like any other.
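A minimal computational sketch (not part of the original notes; it assumes NumPy is available and approximates the limit with a small but finite \(h\)) tabulating approximate values of the derivative function, which could then be plotted like any other function:

```python
# Minimal sketch: tabulate approximate values of the derivative function f'
# using the difference quotient (f(x+h) - f(x)) / h with a small, fixed h.
import numpy as np

def f(x):
    return x**2  # example function; its derivative function is f'(x) = 2x

h = 1e-5                              # small step standing in for the limit h -> 0
xs = np.linspace(-2.0, 2.0, 9)        # points at which we tabulate f'
dfs = (f(xs + h) - f(xs)) / h         # approximate derivative values

for x, df in zip(xs, dfs):
    print(f"x = {x:5.2f}   f'(x) ≈ {df:7.4f}")
```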

Graph: derivative function

5.2.1 Differentiability

Not all functions \(f\) are differentiable at all points of their domain, that is, the limit above may not exist at all points \(x \in \mathit{D}\).

  • If the limit exists for a specific point \(a\) we say the function \(f\) is differentiable at \(a\) and the derivative \(f'(a)\) exists.
  • If the limit does not exist, the function is said to be non-differentiable at \(a\) and the derivative \(f'(a)\) does not exist.
  • If the limit exists for all points in an interval \((a,b)\) we say the function is differentiable on \((a,b)\).
  • If the limit exists for all points in the domain \(x \in \mathit D\) we say \(f\) is a differentiable function.

5.2.2 Notation

There is a variety of notations for derivatives, so it is good to get to know them. All of the following represent the derivative of a function \(y = f(x)\)

\[f'(x), \frac{dy}{dx}, y', \frac{d}{dx}\big(f(x)\big)\]

The notation \(\frac{dy}{dx}\) (called Leibniz notation) is very common in the neural networks literature. To indicate that we evaluate the derivative of the function at a specific point \(a\) we use the following \[\frac{dy}{dx} \Big\rvert_{x=a}\]

5.2.3 Derivatives and continuity

We discussed in section 3.3 Continuity the concept of continuity and how it relates to the existence of the limit of a function. Intuitively, there must be a link between differentiability (the existence of a derivative) and continuity, because we are still talking about limits here.

Differentiability implies continuity: If a function \(f\) is differentiable at a point \(a\) within its domain (the derivative \(f'(a)\) exists), the function is continuous at the point \(a\).

Continuity does not imply differentiability!

For example the absolute value function \(f(x) = |x|\) is continuous at 0 (\(\lim_{x \to 0} |x| = 0\)) but it is not differentiable because \[f'(0) = \lim_{x \to 0} \frac{f(x)-f(0)}{x - 0} = \lim_{x \to 0} \frac{|x|-|0|}{x - 0} = \lim_{x \to 0} \frac{|x|}{x} \quad \text{does not exist}\] \[\lim_{x \to 0^+} \frac{|x|}{x} = 1 \qquad \lim_{x \to 0^-} \frac{|x|}{x} = -1 \qquad \text{not equal}\]

The absolute value function has a sharp corner at \(0\); it is not smooth there. The limiting slope of the secant lines from the left is not the same as the limiting slope from the right.

The tangent line to the function \(f(x) = \sqrt[3]{x}\) at the point \(x = 0\) is vertical; its slope is infinite, so the function is not differentiable at \(0\).

Summary: A function \(f\) is not differentiable at a point \(a\) if

  • it is not continuous at the point
  • it has a sharp corner at the point
  • the tangent line at the point is vertical (slope \(\infty\))
  • (and some more complicated cases)

5.2.4 Higher order derivatives

The derivative function \(f'\) is a function like any other and therefore we can differentiate it again. The derivative of a derivative is called the 2nd-order derivative \(f''\). It is again just a function and therefore we can go on with the differentiation and create higher order derivatives. We indicate them as \[f'(x), f''(x), f'''(x), f^{(4)}(x), \ldots, f^{(n)}(x)\] or \[y'(x), y''(x), y'''(x), y^{(4)}(x), \ldots, y^{(n)}(x)\] or \[\frac{dy}{dx}, \frac{d^2y}{dx^2}, \frac{d^3y}{dx^3}, \frac{d^4y}{dx^4}, \ldots, \frac{d^ny}{dx^n}\] Observe: \[\frac{d^2y}{dx^2} = \frac{d}{dx}\Big(\frac{dy}{dx}\Big)\]
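For example (anticipating the power rule from section 5.4), repeatedly differentiating \(f(x) = x^3\) gives \[ f'(x) = 3x^2, \qquad f''(x) = 6x, \qquad f'''(x) = 6, \qquad f^{(4)}(x) = 0 \enspace .\]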

Graph: Second order derivative

5.3 Application of derivatives

There are many uses of derivatives in science but we will limit ourselves to two: finding extreme values and the linear approximation of a function.

5.3.1 Finding extrema (maxima and minima)

Definition: A point \(c\) in the domain of \(f\) is a critical point of \(f\) if \(f'(c) = 0\) or if \(f'(c)\) is undefined.

Fermat’s theorem: If a function \(f\) has a local extremum at point \(c\) and \(f\) is differentiable at \(c\) then \(f'(c) = 0\).

Graph: Derivative sign

Careful: Not all points with \(f'(c) = 0\) are extrema! For example, \(f(x) = x^3\) has \(f'(0) = 0\), but there is no local extremum at \(0\).

To find the global extrema of a continuous function \(f\) over a closed interval \([a, b]\), evaluate the function at the end points \(a, b\) and at all critical points \(c\) and compare the values.
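For example, for \(f(x) = x^2\) over \([-1, 2]\) the only critical point is \(c = 0\) (where \(f'(c) = 2c = 0\)). Comparing \(f(-1) = 1\), \(f(0) = 0\) and \(f(2) = 4\) shows that the global minimum \(0\) is attained at \(x = 0\) and the global maximum \(4\) at \(x = 2\).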

5.3.1.1 The first derivative test

A function \(f\) is increasing over an interval \([a, b]\) if \(f'(x) > 0\) for all \(x\) in the interval; it is decreasing if \(f'(x) < 0\).

For a critical point \(c\) (\(f'(c)=0\), or \(f'(c)\) is undefined):

  • if \(f'\) changes sign from positive when \(x<c\) to negative when \(x > c\) then \(c\) is a local maximum of \(f\)
  • if \(f'\) changes sign from negative when \(x<c\) to positive when \(x > c\) then \(c\) is a local minimum of \(f\)
  • if \(f'\) has the same sign when \(x<c\) and \(x > c\) then \(c\) is neither local maximum nor minimum of \(f\)

Strategy:

  1. Find all critical points \(c\) of the function \(f\) and split its domain into the corresponding sub-intervals
  2. Check the sign of \(f'\) in each sub-interval (checking one point is enough, all the others will have the same sign)
  3. Use the result of 2. to decide whether each \(c\) is a local maximum, a local minimum or neither
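A small worked example: for \(f(x) = x^3 - 3x\) we have \(f'(x) = 3x^2 - 3 = 3(x-1)(x+1)\), so the critical points are \(c = -1\) and \(c = 1\). Checking one point in each sub-interval, \(f'\) is positive on \((-\infty, -1)\), negative on \((-1, 1)\) and positive on \((1, \infty)\); therefore \(f\) has a local maximum at \(x = -1\) and a local minimum at \(x = 1\).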

5.3.1.2 The second derivative test

The function \(f\) is

  • convex over an interval \(I\) if \(f'\) is increasing over \(I\). \(f'\) is increasing over an interval \(I\) if its derivative \(f''(x) > 0\) for all \(x \in I\)
  • concave over interval \(I\) if \(f'\) is decreasing over \(I\). \(f'\) is decreasing over an interval \(I\) if its derivative \(f''(x) < 0\) for all \(x \in I\)

Alternative, yet equivalent definitions are as follows. The function \(f\) is

  • convex over an interval \(I\) if \(f(t x_1 + (1-t) x_2) \leq t f(x_1) + (1-t) f(x_2)\) for all \(x_1, x_2 \in I\) and \(t \in [0,1]\).
  • concave over an interval \(I\) if \(f(t x_1 + (1-t) x_2) \geq t f(x_1) + (1-t) f(x_2)\) for all \(x_1, x_2 \in I\) and \(t \in [0,1]\).

When a function \(f\) is convex over its whole domain \(D\), then any critical point \(c\) with \(f'(c) = 0\) is a global minimum. When a function is concave, then any critical point \(c\) with \(f'(c) = 0\) is a global maximum.


The point \(a\) where \(f\) changes from convex to concave (or vice versa) is an inflection point of \(f\). The second derivative \(f''\) at an inflection point is either zero, \(f''(a) = 0\), or undefined (that is, \(a\) is a critical point of the 1st derivative function \(f'\)).

Graphs: Convex, concave, inflection point


The second derivative test: Suppose \(f'(c) = 0\) and \(f''\) is continuous over an interval containing \(c\)

  • if \(f''(c) > 0\) then \(f\) has a local minimum at \(c\)
  • if \(f''(c) < 0\) then \(f\) has a local maximum at \(c\)
  • if \(f''(c) = 0\) then the test is inconclusive (use the 1st derivative test instead)
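Continuing the worked example from the first derivative test, for \(f(x) = x^3 - 3x\) we have \(f''(x) = 6x\). At the critical points, \(f''(-1) = -6 < 0\), so \(x = -1\) is a local maximum, and \(f''(1) = 6 > 0\), so \(x = 1\) is a local minimum, in agreement with the first derivative test.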

5.3.2 Linear approximation of a function at a point

We defined the derivative as the slope of the tangent line \[ f'(a) = m_{tan} = \lim_{h \to 0} \frac{f(a+h)-f(a)}{h} \enspace .\] For small \(h\) we can say that \[ f'(a) \approx \frac{f(a+h)-f(a)}{h} \enspace .\] We can then solve the equation for \(f(a + h)\) \[\begin{gather} f(a+h) \approx f(a) + f'(a)h \\ f(x) \approx f(a) + f'(a)(x-a) , \quad x = a+h \enspace ,\end{gather}\]

where \[L(x) = y = f(a) + f'(a)(x-a)\] is the tangent line, which is the linear approximation of the function \(f\) around the point \(x = a\).
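For example, to estimate \(\sqrt{4.1}\) we can linearly approximate \(f(x) = \sqrt{x}\) at \(a = 4\). With \(f'(x) = \frac{1}{2\sqrt{x}}\) (power rule, section 5.4) we get \[ L(x) = f(4) + f'(4)(x-4) = 2 + \frac{1}{4}(x-4) \enspace ,\] so \(\sqrt{4.1} \approx L(4.1) = 2 + \frac{0.1}{4} = 2.025\) (the true value is \(\approx 2.0248\)).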

Graph: linear approximation of a function

5.4 Differentiation rules

Finding derivatives by using the definition (the limit) can be lengthy and rather challenging for some functions.

Therefore, mathematicians have derived (and proved) rules of differentiation we can apply to the function to simplify the process.

Function Derivative Rule
\(f(x) = c\) \(f'(x) = \frac{d}{dx}(c) = 0\) constant
\(f(x) = x^n\) \(f'(x) = \frac{d}{dx}(x^n) = nx^{n-1}\) power
\(f(x) = x^\frac{m}{n}\) \(f'(x) = \frac{d}{dx}(x^\frac{m}{n}) = \frac{m}{n} x^{\frac{m}{n}-1}\) power for rational exponents
\(f(x) = cg(x)\) \(f'(x) = \frac{d}{dx}(cg(x)) = c\frac{d}{dx}(g(x)) = cg'(x)\) constant multiple
\(f(x) = g(x) \pm h(x)\) \(f'(x) = \frac{d}{dx}(g(x) \pm h(x)) = \frac{d}{dx}(g(x)) \pm \frac{d}{dx}(h(x)) = g'(x) \pm h'(x)\) sum, difference
\(f(x) = g(x)h(x)\) \(f'(x) = \frac{d}{dx}(g(x)h(x)) = \frac{d}{dx}(g(x))h(x) + \frac{d}{dx}(h(x))g(x) = g'(x)h(x)+h'(x)g(x)\) product
\(f(x) = \frac{g(x)}{h(x)}\) \(f'(x) = \frac{d}{dx}(\frac{g(x)}{h(x)}) = \frac{\frac{d}{dx}(g(x))h(x) - \frac{d}{dx}(h(x))g(x)}{(h(x))^2} = \frac{g'(x)h(x)-h'(x)g(x)}{(h(x))^2}\) quotient

We can combine all the rules listed above to differentiate complicated functions.
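For example, combining the power, constant multiple and sum rules for \(f(x) = 3x^4 + \frac{2}{x} = 3x^4 + 2x^{-1}\): \[ f'(x) = 3 \cdot 4x^3 + 2 \cdot (-1)x^{-2} = 12x^3 - \frac{2}{x^2} \enspace .\]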

5.4.1 Chain rule

The differentiation rules in the table above are useful and help us to find derivatives of simple functions.

However, when the function is more complicated, constructed as a composition of two or more functions, we need to rely on the chain rule.

Chain rule is THE rule you need to know to understand the concept of back-propagation in neural networks!

For example, using the chain rule we can find the derivative of the function \(f(x) = \sqrt{3x^2+1}\), which is the composition of the two functions \(h(x) = \sqrt{x}\) and \(g(x) = 3x^2+1\).

Chain rule: Let \(g(x)\) and \(h(x)\) be differentiable functions. The derivative of the composition \[f(x) = (h \circ g)(x) = h(g(x))\] is given by \[f'(x) = h'(g(x)) \, g'(x) \enspace .\]

For NNs we often use the following notation: \[\begin{gather} y=f(x) = h(g(x)) = h(u), \qquad u=g(x) \\ f'(x) = \frac{dy}{dx}, \qquad h'(u) = \frac{dy}{du}, \qquad g'(x) = \frac{du}{dx}\\ \textbf{chain rule: } \quad \frac{dy}{dx} = \frac{dy}{du} \frac{du}{dx} \end{gather}\]
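Returning to the example above, for \(f(x) = \sqrt{3x^2+1}\) with \(h(u) = \sqrt{u}\) and \(u = g(x) = 3x^2 + 1\): \[ f'(x) = h'(g(x)) \, g'(x) = \frac{1}{2\sqrt{3x^2+1}} \cdot 6x = \frac{3x}{\sqrt{3x^2+1}} \enspace .\]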


We can extend the same rule to compositions of three or more functions. The derivative of the function

\[k(x) = f( h ( g(x) ) )\] is given by \[k'(x) = f'( h ( g(x) ) ) \, h' ( g(x) ) \, g'(x)\] In the NN notation \[\begin{gather} y = k(x) = f( h ( g(x) ) ) = f( h ( v ) ) = f(u), \qquad u = h(v) = h( g(x) ), \qquad v = g(x) \\ k'(x) = \frac{dy}{dx}, \qquad f'(u) = \frac{dy}{du}, \qquad h'(v) = \frac{du}{dv}, \qquad g'(x) = \frac{dv}{dx} \\ \textbf{chain rule: } \quad \frac{dy}{dx} = \frac{dy}{du} \frac{du}{dv} \frac{dv}{dx} \end{gather}\]
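As a small illustration (not part of the original text), take \(k(x) = \sqrt{(x^2+1)^3}\) with \(v = g(x) = x^2 + 1\), \(u = h(v) = v^3\) and \(y = f(u) = \sqrt{u}\): \[ \frac{dy}{dx} = \frac{dy}{du} \frac{du}{dv} \frac{dv}{dx} = \frac{1}{2\sqrt{u}} \cdot 3v^2 \cdot 2x = \frac{3x(x^2+1)^2}{\sqrt{(x^2+1)^3}} = 3x\sqrt{x^2+1} \enspace .\]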

5.4.2 Derivatives of selected important functions

Function Derivative
\(f(x) = \sin x\) \(f'(x) = \frac{d}{dx}(\sin x) = \cos x\)
\(f(x) = \cos x\) \(f'(x) = \frac{d}{dx}(\cos x) = -\sin x\)
\(f(x) = e^x\) \(f'(x) = \frac{d}{dx}(e^x) = e^x\)
\(f(x) = e^{g(x)}\) \(f'(x) = \frac{d}{dx}(e^{g(x)}) = e^{g(x)} \, g'(x)\)
\(f(x) = \log x\) \(f'(x) = \frac{d}{dx}(\log x) = \frac{1}{x}\)
\(f(x) = \log g(x)\) \(f'(x) = \frac{d}{dx}(\log g(x)) = \frac{1}{g(x)} g'(x)\)
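Here \(\log\) denotes the natural logarithm. As a small illustration (not part of the original text), the \(e^{g(x)}\) and \(\log g(x)\) rows give \[ \frac{d}{dx}\big(e^{-x^2}\big) = -2x\, e^{-x^2}, \qquad \frac{d}{dx}\big(\log(x^2+1)\big) = \frac{2x}{x^2+1} \enspace .\]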

5.4.3 Approximation by finite difference

From the definition of the derivative we know that for small \(h\) \[f'(a) \approx \frac{f(a+h)-f(a)}{h} \enspace .\] We can thus use this finite difference calculation to approximate the derivative of the function at point \(a\).

Often you can see a two-sided, symmetric approximation \[f'(a) \approx \frac{f(a+h)-f(a - h)}{2h} \enspace .\] These finite difference approximations may be useful when checking the analytical form of your derivatives. The derivative function you found analytically is most likely correct if its evaluations are near the finite difference approximation at multiple randomly selected points \(a\).
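A minimal sketch of such a check (not from the original notes; it assumes NumPy and uses \(f(x) = e^{-x^2}\) with the candidate analytic derivative \(f'(x) = -2x\,e^{-x^2}\) from the rules above):

```python
# Minimal sketch: compare a candidate analytic derivative against the
# two-sided finite difference approximation at a few random points a.
import numpy as np

def f(x):
    return np.exp(-x**2)

def df_analytic(x):
    return -2.0 * x * np.exp(-x**2)   # candidate derivative to be checked

h = 1e-5
rng = np.random.default_rng(0)
for a in rng.uniform(-2.0, 2.0, size=5):      # randomly selected points a
    fd = (f(a + h) - f(a - h)) / (2.0 * h)    # two-sided finite difference
    print(f"a = {a:6.3f}   analytic = {df_analytic(a):9.5f}   finite diff = {fd:9.5f}")
```

If the two columns agree closely at all tested points, the analytic derivative is most likely correct.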