These notes are reusing text and imagery from: Calculus Volume 1 from OpenStax, Print ISBN 193816802X, Digital ISBN 1947172131, https://www.openstax.org/details/calculus-volume-1 and Precalculus from OpenStax, Print ISBN 1938168348, Digital ISBN 1947172069, https://www.openstax.org/details/precalculus
The origins of calculus date back to the 17th century to the works of Isaac Newton and Gottfried Leibniz.
We already discussed in section 1.2 Increasing / decreasing functions how to calculate the average rate of change of a function between two points.
The line intersecting the curve at the two points of interest is called the secant line. Note that the slope of the secant line equals the average rate of change of the function between those two points.
\text{avg rate of chg } = m_{sec} = \frac{f(x)-f(a)}{x-a} = \frac{\Delta y}{\Delta x} \enspace .
Instead of using the points a and x, we can use the points a and a+h.
In this notation the average rate of change and the slope of the secant line are
\text{avg rate of chg } = m_{sec} = \frac{f(a+h)-f(a)}{a+h-a} = \frac{f(a+h)-f(a)}{h} \enspace .
As h gets smaller and the point a+h moves closer to a, the secant line approaches the tangent line at point a, and the average rate of change approaches the instantaneous rate of change at point a, that is, the derivative at point a, denoted f′(a).
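The shrinking-h idea can be tried numerically. The sketch below is only an illustration: the function f(x) = x^2, the point a = 1 and the helper name `secant_slope` are assumptions made for this example, not part of the notes. It evaluates the secant slope for a few decreasing values of h.

```python
def secant_slope(f, a, h):
    """Average rate of change of f between a and a + h (slope of the secant line)."""
    return (f(a + h) - f(a)) / h


def f(x):
    # Illustrative function chosen for this sketch (an assumption, not from the notes).
    return x ** 2


a = 1.0
steps = (1.0, 0.1, 0.01, 0.001)
slopes = [secant_slope(f, a, h) for h in steps]

for h, m in zip(steps, slopes):
    print(f"h = {h:<6} m_sec = {m:.4f}")
```

For this particular f the secant slope works out to 2 + h, so the printed values settle toward 2, the slope of the tangent line at a = 1.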
Definition: The derivative of function f(x) at point a, denoted f'(a), is the limit f'(a) = \text{inst rate of chg } = m_{tan} = \lim_{x \to a} \frac{f(x)-f(a)}{x-a} provided it exists. In the alternative notation with the points a and a+h, the derivative f'(a) is f'(a) = \text{inst rate of chg } = m_{tan} = \lim_{h \to 0} \frac{f(a+h)-f(a)}{h} \enspace . We call the process of finding the derivative differentiation.
We can find the derivative directly from the definition. This requires skill and experience in evaluating limits. The other approach, more common in practice, is to rely on a few basic rules of differentiation. More on this later in 5.4 Differentiation rules.
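As a minimal worked example of using the definition directly, take f(x) = x^2 (a function chosen here only for illustration). Expanding the numerator lets the h in the denominator cancel before the limit is taken:

```latex
\begin{aligned}
f'(a) &= \lim_{h \to 0} \frac{(a+h)^2 - a^2}{h} \\
      &= \lim_{h \to 0} \frac{2ah + h^2}{h} \\
      &= \lim_{h \to 0} (2a + h) \\
      &= 2a
\end{aligned}
```

For instance, the slope of the tangent line to x^2 at a = 3 is 6.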
We have defined a derivative f'(x) of a function f at a point x within the domain of the original function as a limit. It gives us the slope of the tangent line or the instantaneous rate of change at the point x, useful information about the shape of the function.
It is natural to ask what values the derivative takes at every point of the domain of the original function f.
Definition: For a function f the derivative function f' is the function that evaluates to the limit below for all points x in the domain where the limit exists f'(x) = \lim_{h \to 0} \frac{f(x+h)-f(x)}{h} \enspace . Note: This definition is quite similar to the definition of the derivative at a point. Do not panic, they are indeed very similar. The only difference is that in one we speak about the derivative at a specific point f'(x) and in the other about the derivative function f' in general. But as usual, the derivative at a point is simply the evaluation of the derivative function at that point.
Also, as with any other function, we can tabulate the values of the derivative function f' and plot them in a graph. It is simply a function like any other.
Not all functions f are differentiable at all points of their domain, that is, the limit above may not exist at every point x \in \mathit{D}.
There is a variety of notations for derivatives, so it is good to get to know them. All of these represent the derivative of a function y = f(x):
f'(x), \frac{dy}{dx}, y', \frac{d}{dx}\big(f(x)\big)
The notation \frac{dy}{dx} (called Leibniz notation) is very common in the neural network literature. To indicate that we evaluate the derivative of the function at a specific point a, we write \frac{dy}{dx} \Big\rvert_{x=a}
We discussed in section 3.3 Continuity the concept of continuity and how it relates to the existence of the limit of a function. Intuitively, there must be a link between differentiability (the existence of a derivative) and the continuity because we are still talking about limits here.
Differentiability implies continuity: If a function f is differentiable at a point a within its domain (the derivative f'(a) exists), the function is continuous at the point a.
Continuity does not imply differentiability!
For example, the absolute value function f(x) = |x| is continuous at 0 (\lim_{x \to 0} |x| = 0) but it is not differentiable there because f'(0) = \lim_{x \to 0} \frac{f(x)-f(0)}{x - 0} = \lim_{x \to 0} \frac{|x|-|0|}{x - 0} = \lim_{x \to 0} \frac{|x|}{x} \quad \text{does not exist} \enspace .
The one-sided limits are not equal: \lim_{x \to 0^+} \frac{|x|}{x} = 1 \qquad \lim_{x \to 0^-} \frac{|x|}{x} = -1 \enspace .
The absolute value function has a sharp corner at 0, so it is not smooth there. The limiting slope of the secant lines from the left is not the same as the limiting slope from the right.
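The two one-sided limits can be observed numerically. In this small sketch the helper name `difference_quotient` and the chosen step sizes are assumptions for illustration. It evaluates the difference quotient of |x| at 0 with positive and with negative h.

```python
def difference_quotient(f, a, h):
    """Slope of the secant line of f through a and a + h."""
    return (f(a + h) - f(a)) / h


steps = (0.1, 0.01, 0.001)

# Approach 0 from the right (h > 0) and from the left (h < 0).
right = [difference_quotient(abs, 0.0, h) for h in steps]
left = [difference_quotient(abs, 0.0, -h) for h in steps]

print("from the right:", right)
print("from the left: ", left)
```

The quotients from the right stay at 1 and those from the left stay at -1, however small h becomes, so no single limit exists.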
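The tangent line of the function f(x) = \sqrt[3]{x} at the point 0 is vertical, its slope is \infty.
Summary: A function f is not differentiable at a point a if it is not continuous at a, if its graph has a sharp corner at a, or if its tangent line at a is vertical.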
The derivative function f' is a function like any other and therefore we can differentiate it again. The derivative of a derivative is called the 2nd-order derivative f''. It is again just a function and therefore we can go on with the differentiation and create higher-order derivatives. We denote them as f'(x), f''(x), f'''(x), f^{(4)}(x), \ldots, f^{(n)}(x) or y'(x), y''(x), y'''(x), y^{(4)}(x), \ldots, y^{(n)}(x) or \frac{d^2y}{dx^2}, \frac{d^3y}{dx^3}, \frac{d^4y}{dx^4}, \ldots, \frac{d^ny}{dx^n} Observe: \frac{d^2y}{dx^2} = \frac{d}{dx}\Big(\frac{dy}{dx}\Big)
There are many uses of derivatives in science, but we will limit ourselves to two: finding extreme values and linear approximation of a function.
Definition: A point c in the domain of f is a critical point of f if f'(c) = 0 or if f'(c) is undefined.
Fermat’s theorem: If a function f has a local extremum at point c and f is differentiable at c then f'(c) = 0.
Careful: Not all points with f'(c) = 0 are extrema!
To find the global extrema of a continuous function f over a closed interval [a, b], evaluate the function at the end points a, b and at all critical points c inside the interval, and compare the values.
A function f is increasing over an interval [a, b] if f'(x) > 0 for all x in (a, b); it is decreasing if f'(x) < 0 for all x in (a, b).
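The procedure for global extrema on a closed interval can be sketched in a few lines. Everything concrete here is an assumption for illustration: the function f(x) = x^3 - 3x, the interval [0, 3], and the critical points, which were found by hand from f'(x) = 3x^2 - 3 = 0.

```python
def f(x):
    # Illustrative function for this sketch: f(x) = x^3 - 3x
    return x ** 3 - 3 * x


# f'(x) = 3x^2 - 3 = 0 gives the critical points x = -1 and x = 1 (found by hand).
critical_points = [-1.0, 1.0]

a, b = 0.0, 3.0  # the closed interval [a, b]

# Candidates: both end points plus the critical points that lie inside the interval.
candidates = [a, b] + [c for c in critical_points if a < c < b]

x_min = min(candidates, key=f)
x_max = max(candidates, key=f)

print(f"global minimum: f({x_min}) = {f(x_min)}")
print(f"global maximum: f({x_max}) = {f(x_max)}")
```

In this example the minimum sits at a critical point (x = 1) while the maximum sits at an end point (x = 3), which is why both kinds of candidates have to be compared.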
For a critical point c (f'(c)=0, or f'(c) is undefined):
Strategy:
The function f is
Alternative, yet equivalent definitions are as follows. The function f is
When a function f is convex over its whole domain D, any critical point c with f'(c) = 0 is a global minimum. When a function is concave, any point c with f'(c) = 0 is a global maximum.
The point a where f changes from convex to concave (or vice versa) is an inflection point of f. At an inflection point the second derivative f'' is either zero, f''(a) = 0, or undefined (a is a critical point of the first derivative function f').
Graphs: Convex, concave, inflection point
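The second derivative test: Suppose f'(c) = 0 and f'' is continuous over an interval containing c. If f''(c) > 0, then f has a local minimum at c. If f''(c) < 0, then f has a local maximum at c. If f''(c) = 0, the test is inconclusive.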
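A short sketch of the second derivative test in code, under stated assumptions: the function f(x) = x^3 - 3x is an illustrative choice, its derivatives f'(x) = 3x^2 - 3 and f''(x) = 6x were computed by hand, and the helper name `classify` is hypothetical. The sign of f'' at each point with f'(c) = 0 decides the classification.

```python
def f_second(x):
    # Second derivative of the illustrative function f(x) = x^3 - 3x: f''(x) = 6x
    return 6 * x


def classify(c):
    """Apply the second derivative test at a point c where f'(c) = 0."""
    value = f_second(c)
    if value > 0:
        return "local minimum"
    if value < 0:
        return "local maximum"
    return "inconclusive"


# f'(x) = 3x^2 - 3 is zero at x = -1 and x = 1.
results = {c: classify(c) for c in (-1.0, 1.0)}
for c, kind in results.items():
    print(f"c = {c}: {kind}")
```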
We defined the derivative as the slope of the tangent line f'(a) = m_{tan} = \lim_{h \to 0} \frac{f(a+h)-f(a)}{h} \enspace . For small h we can say that f'(a) \approx \frac{f(a+h)-f(a)}{h} \enspace . We can then solve this for f(a + h): f(a+h) \approx f(a) + f'(a)h \enspace , and, substituting x = a+h, f(x) \approx f(a) + f'(a)(x-a) \enspace ,
where L(x) = y = f(a) + f'(a)(x-a) is the tangent line, which is the linear approximation of the function f at the point x = a.
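To see how good the tangent-line approximation is near a, the sketch below builds L(x) for the square root function at a = 9 and compares it with the true value at x = 9.1. The choice of function and points, and the helper name `linear_approximation`, are assumptions made for this illustration.

```python
import math


def linear_approximation(f, f_prime, a):
    """Return L(x) = f(a) + f'(a) (x - a), the tangent line of f at a."""
    def L(x):
        return f(a) + f_prime(a) * (x - a)
    return L


def sqrt_prime(x):
    # Derivative of sqrt(x): 1 / (2 sqrt(x))
    return 1.0 / (2.0 * math.sqrt(x))


L = linear_approximation(math.sqrt, sqrt_prime, 9.0)

approx = L(9.1)
exact = math.sqrt(9.1)
print(f"L(9.1) = {approx:.6f}, sqrt(9.1) = {exact:.6f}, error = {abs(approx - exact):.2e}")
```

The approximation is only trustworthy close to a; evaluating L far from 9 gives a visibly larger error.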
Finding derivatives by using the definition (the limit) can be lengthy and rather challenging for some functions.
Therefore, mathematicians have derived (and proved) rules of differentiation we can apply to the function to simplify the process.
Function | Derivative | Rule |
---|---|---|
f(x) = c | f'(x) = \frac{d}{dx}(c) = 0 | constant |
f(x) = x^n | f'(x) = \frac{d}{dx}(x^n) = nx^{n-1} | power |
f(x) = x^\frac{m}{n} | f'(x) = \frac{d}{dx}(x^\frac{m}{n}) = \frac{m}{n} x^{\frac{m}{n}-1} | power (rational exponents) |
f(x) = cg(x) | f'(x) = \frac{d}{dx}(cg(x)) = c\frac{d}{dx}(g(x)) = cg'(x) | constant multiple |
f(x) = g(x) \pm h(x) | f'(x) = \frac{d}{dx}(g(x) \pm h(x)) = \frac{d}{dx}(g(x)) \pm \frac{d}{dx}(h(x)) = g'(x) \pm h'(x) | sum, difference |
f(x) = g(x)h(x) | f'(x) = \frac{d}{dx}(g(x)h(x)) = \frac{d}{dx}(g(x))h(x) + \frac{d}{dx}(h(x))g(x) = g'(x)h(x)+h'(x)g(x) | product |
f(x) = \frac{g(x)}{h(x)} | f'(x) = \frac{d}{dx}(\frac{g(x)}{h(x)}) = \frac{\frac{d}{dx}(g(x))h(x) - \frac{d}{dx}(h(x))g(x)}{(h(x))^2} = \frac{g'(x)h(x)-h'(x)g(x)}{(h(x))^2} | quotient |
We can combine all the rules listed above to differentiate complicated functions.
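As a small worked example of combining the rules, consider f(x) = (x^2+1)(3x-2), a function chosen here only for illustration. The product rule splits it into two factors, and the sum, constant multiple, power and constant rules handle each factor:

```latex
\begin{aligned}
f'(x) &= \frac{d}{dx}(x^2+1) \, (3x-2) + \frac{d}{dx}(3x-2) \, (x^2+1) \\
      &= 2x(3x-2) + 3(x^2+1) \\
      &= 6x^2 - 4x + 3x^2 + 3 \\
      &= 9x^2 - 4x + 3
\end{aligned}
```

Expanding first, f(x) = 3x^3 - 2x^2 + 3x - 2, and differentiating term by term gives the same result.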
The differentiation rules in the table above are useful and help us to find derivatives of simple functions.
However, when the function is more complicated and constructed as a composition of two or more functions, we need to rely on the chain rule.
The chain rule is THE rule you need to know to understand the concept of back-propagation in neural networks!
For example, using the chain rule we can find the derivative of the function f(x) = \sqrt{3x^2+1}, which is the composition of the two functions h(x) = \sqrt{x} and g(x) = 3x^2+1.
Chain rule: Let g(x) and h(x) be differentiable functions. The derivative of the composition f(x) = (h \circ g)(x) = h(g(x)) is given by f'(x) = h'(g(x)) \, g'(x) \enspace .
For NNs we often use the following notation: \begin{gather} y=f(x) = h(g(x)) = h(u), \qquad u=g(x) \\ f'(x) = \frac{dy}{dx}, \qquad h'(u) = \frac{dy}{du}, \qquad g'(x) = \frac{du}{dx}\\ \textbf{chain rule: } \quad \frac{dy}{dx} = \frac{dy}{du} \frac{du}{dx} \end{gather}
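Applied to the example f(x) = \sqrt{3x^2+1} mentioned earlier, with y = h(u) = \sqrt{u} and u = g(x) = 3x^2+1, the chain rule can be worked out step by step as follows:

```latex
\begin{gather}
\frac{dy}{du} = \frac{1}{2\sqrt{u}}, \qquad \frac{du}{dx} = 6x \\
\frac{dy}{dx} = \frac{dy}{du} \frac{du}{dx}
  = \frac{1}{2\sqrt{3x^2+1}} \cdot 6x
  = \frac{3x}{\sqrt{3x^2+1}}
\end{gather}
```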
We can extend the same rule to a composition of three or more functions. The derivative of the function
k(x) = f( h ( g(x) ) ) is given by k'(x) = f'( h ( g(x) ) ) \, h' ( g(x) ) \, g'(x) In the NN notation y = k(x) = f( h ( g(x) ) ) = f( h ( v ) ) = f(u), \qquad u = h(v) = h( g(x) ), \qquad v = g(x) \\ k'(x) = \frac{dy}{dx}, \qquad f'(u) = \frac{dy}{du}, \qquad h'(v) = \frac{du}{dv}, \qquad g'(x) = \frac{dv}{dx} \\ \textbf{chain rule: } \quad \frac{dy}{dx} = \frac{dy}{du} \frac{du}{dv} \frac{dv}{dx}
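The three-function chain rule maps naturally onto code: evaluate the intermediate values v, u, y in a forward pass, then multiply the local derivatives. The sketch below assumes the illustrative composition y = exp(sin(x^2)), so g(x) = x^2, h(v) = sin v and f(u) = e^u; neither this function nor the helper name `chain_rule_derivative` comes from the notes.

```python
import math


def chain_rule_derivative(x):
    """Return (y, dy/dx) for y = exp(sin(x^2)) using the chain rule."""
    # Forward pass: evaluate the intermediate values.
    v = x ** 2          # v = g(x)
    u = math.sin(v)     # u = h(v)
    y = math.exp(u)     # y = f(u)

    # Local derivative of each step, evaluated at the values from the forward pass.
    dy_du = math.exp(u)   # f'(u)
    du_dv = math.cos(v)   # h'(v)
    dv_dx = 2 * x         # g'(x)

    # Chain rule: dy/dx = dy/du * du/dv * dv/dx
    return y, dy_du * du_dv * dv_dx


y, dy_dx = chain_rule_derivative(0.5)
print(f"y = {y:.6f}, dy/dx = {dy_dx:.6f}")
```

Reusing the forward-pass values when computing the local derivatives is the same bookkeeping that back-propagation performs layer by layer.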
Function | Derivative |
---|---|
f(x) = \sin x | f'(x) = \frac{d}{dx}(\sin x) = \cos x |
f(x) = \cos x | f'(x) = \frac{d}{dx}(\cos x) = -\sin x |
f(x) = e^x | f'(x) = \frac{d}{dx}(e^x) = e^x |
f(x) = e^{g(x)} | f'(x) = \frac{d}{dx}(e^{g(x)}) = e^{g(x)} \, g'(x) |
f(x) = \log x | f'(x) = \frac{d}{dx}(\log x) = \frac{1}{x} |
f(x) = \log g(x) | f'(x) = \frac{d}{dx}(\log g(x)) = \frac{1}{g(x)} g'(x) |
From the definition of the derivative we know that for small h f'(a) \approx \frac{f(a+h)-f(a)}{h} \enspace . We can thus use this finite difference calculation to approximate the derivative of the function at point a.
Often you will see the two-sided symmetric approximation f'(a) \approx \frac{f(a+h)-f(a - h)}{2h} \enspace . These finite difference approximations are useful for checking the analytical form of your derivatives. The derivative function you found analytically is most likely correct if its values are close to the finite difference approximation at multiple randomly selected points a.
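Such a check can be sketched as follows. The helper names, the step size h = 1e-6, the tolerance 1e-5 and the sampling range are assumptions chosen for this illustration; the function being checked is \sqrt{3x^2+1} from the chain rule section, with the candidate derivative 3x / \sqrt{3x^2+1}.

```python
import math
import random


def forward_difference(f, a, h=1e-6):
    """One-sided approximation of f'(a)."""
    return (f(a + h) - f(a)) / h


def central_difference(f, a, h=1e-6):
    """Two-sided symmetric approximation of f'(a)."""
    return (f(a + h) - f(a - h)) / (2 * h)


def check_derivative(f, f_prime, points, h=1e-6, tol=1e-5):
    """Return True if f_prime agrees with the central difference at every point."""
    return all(abs(f_prime(a) - central_difference(f, a, h)) <= tol for a in points)


def f(x):
    return math.sqrt(3 * x ** 2 + 1)


def f_prime(x):
    # Candidate analytic derivative to verify.
    return 3 * x / math.sqrt(3 * x ** 2 + 1)


random.seed(0)  # fixed seed so the sampled points are reproducible
points = [random.uniform(-3.0, 3.0) for _ in range(5)]
print(check_derivative(f, f_prime, points))
```

The tolerance has to be loose enough to absorb the truncation and rounding error of the finite difference itself, so a passing check indicates agreement, not a proof of correctness.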