Chain rule
In calculus, the chain rule is a formula that expresses the derivative of the composition of two differentiable functions f and g in terms of the derivatives of f and g. More precisely, if h = f ∘ g is the composition such that h(x) = f(g(x)) for every x, then the chain rule is, in Lagrange's notation,

h′(x) = f′(g(x)) g′(x),

or, equivalently,

(f ∘ g)′ = (f′ ∘ g) · g′.
The chain rule may also be expressed in Leibniz's notation. If a variable z depends on the variable y, which itself depends on the variable x, then z depends on x as well, via the intermediate variable y. In this case, the chain rule is expressed as

dz/dx = dz/dy · dy/dx,

and

dz/dx|x = dz/dy|y(x) · dy/dx|x

for indicating at which points the derivatives have to be evaluated.
In integration, the counterpart to the chain rule is the substitution rule.
Intuitive explanation
Intuitively, the chain rule states that knowing the instantaneous rate of change of z relative to y and that of y relative to x allows one to calculate the instantaneous rate of change of z relative to x as the product of the two rates of change. As put by George F. Simmons: "If a car travels twice as fast as a bicycle and the bicycle is four times as fast as a walking man, then the car travels 2 × 4 = 8 times as fast as the man."
The relationship between this example and the chain rule is as follows. Let z, y and x be the positions of the car, the bicycle, and the walking man, respectively. The rate of change of the position of the car relative to that of the bicycle is dz/dy = 2. Similarly, dy/dx = 4. So, the rate of change of the position of the car relative to that of the walking man is

dz/dx = dz/dy · dy/dx = 2 × 4 = 8.
The rate of change of positions is the ratio of the speeds, and the speed is the derivative of the position with respect to the time; that is,

dz/dy = (dz/dt) / (dy/dt),

or, equivalently,

dz/dt = dz/dy · dy/dt,

which is also an application of the chain rule.
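The arithmetic of this example can be checked with a short numerical sketch. The 5 km/h walking speed below is an assumed value for illustration (it is not stated in the text); the factors 4 and 2 come from the quotation, and the derivatives are approximated by central finite differences:

```python
import math

# Sketch of the car/bicycle/walker example. The 5 km/h walking speed
# is an assumed value; the factors 4 and 2 come from the quotation.
def man(t):
    return 5.0 * t                # walker's position (km) at time t (hours)

def bicycle(t):
    return 4.0 * man(t)           # the bicycle is 4 times as fast as the man

def car(t):
    return 2.0 * bicycle(t)       # the car is twice as fast as the bicycle

def diff(f, t, h=1e-6):
    # central finite-difference approximation of f'(t)
    return (f(t + h) - f(t - h)) / (2.0 * h)

dz_dt = diff(car, 1.0)            # car's speed
dx_dt = diff(man, 1.0)            # man's speed
print(dz_dt / dx_dt)              # ratio of the speeds: 2 * 4 = 8
```

The ratio of the computed speeds recovers the factor 2 × 4 = 8 from Simmons's example.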
History
The chain rule seems to have first been used by Gottfried Wilhelm Leibniz. He used it to calculate the derivative of √(a + bz + cz²) as the composite of the square root function and the function a + bz + cz². He first mentioned it in a 1676 memoir. The common notation of the chain rule is due to Leibniz. Guillaume de l'Hôpital used the chain rule implicitly in his Analyse des infiniment petits. The chain rule does not appear in any of Leonhard Euler's analysis books, even though they were written over a hundred years after Leibniz's discovery. It is believed that the first "modern" version of the chain rule appears in Lagrange's 1797 Théorie des fonctions analytiques; it also appears in Cauchy's 1823 Résumé des Leçons données à l'École Royale Polytechnique sur le Calcul Infinitésimal.

Statement
The simplest form of the chain rule is for real-valued functions of one real variable. It states that if g is a function that is differentiable at a point c and f is a function that is differentiable at g(c), then the composite function f ∘ g is differentiable at c, and the derivative is

(f ∘ g)′(c) = f′(g(c)) g′(c).

The rule is sometimes abbreviated as

(f ∘ g)′ = (f′ ∘ g) · g′.
If y = f(u) and u = g(x), then this abbreviated form is written in Leibniz notation as:

dy/dx = dy/du · du/dx.
The points where the derivatives are evaluated may also be stated explicitly:

dy/dx|x = dy/du|u(x) · du/dx|x.
Carrying the same reasoning further, given n functions f1, ..., fn with the composite function f1 ∘ (f2 ∘ ... (fn−1 ∘ fn)), if each function fi is differentiable at its immediate input, then the composite function is also differentiable, by repeated application of the chain rule, and the derivative is:

(f1 ∘ f2 ∘ ... ∘ fn)′(x) = f1′(f2(... fn(x))) · f2′(f3(... fn(x))) ··· fn′(x).
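The one-variable statement is easy to verify numerically. The sketch below uses f = sin and g(x) = x², choices made here purely for illustration, and compares f′(g(x)) g′(x) against a finite-difference approximation of (f ∘ g)′:

```python
import math

# Numerical check of (f o g)'(x) = f'(g(x)) * g'(x)
# for the illustrative choices f = sin and g(x) = x**2.
def f(x):       return math.sin(x)
def f_prime(x): return math.cos(x)
def g(x):       return x ** 2
def g_prime(x): return 2.0 * x

def h(x):
    return f(g(x))                 # the composite sin(x**2)

def numeric_derivative(fn, x, eps=1e-6):
    # central finite-difference approximation
    return (fn(x + eps) - fn(x - eps)) / (2.0 * eps)

x0 = 1.3
chain = f_prime(g(x0)) * g_prime(x0)     # chain-rule value
approx = numeric_derivative(h, x0)       # direct numerical value
print(abs(chain - approx))               # the two values agree closely
```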
Applications
Composites of more than two functions
The chain rule can be applied to composites of more than two functions. To take the derivative of a composite of more than two functions, notice that the composite of f, g, and h (in that order) is the composite of f with g ∘ h. The chain rule states that to compute the derivative of f ∘ g ∘ h, it is sufficient to compute the derivative of f and the derivative of g ∘ h. The derivative of f can be calculated directly, and the derivative of g ∘ h can be calculated by applying the chain rule again.

For concreteness, consider the function

y = e^(sin(x²)).
This can be decomposed as the composite of three functions:

y = f(u) = e^u,
u = g(v) = sin v,
v = h(x) = x².

So that y = f(g(h(x))).

Their derivatives are:

dy/du = f′(u) = e^u,
du/dv = g′(v) = cos v,
dv/dx = h′(x) = 2x.

The chain rule states that the derivative of their composite at the point x = a is:

(f ∘ g ∘ h)′(a) = f′((g ∘ h)(a)) · (g ∘ h)′(a) = f′((g ∘ h)(a)) · g′(h(a)) · h′(a).

In Leibniz's notation, this is:

dy/dx = dy/du|u=g(h(a)) · du/dv|v=h(a) · dv/dx|x=a,

or for short,

dy/dx = dy/du · du/dv · dv/dx.

The derivative function is therefore:

dy/dx = e^(sin(x²)) · cos(x²) · 2x.
Another way of computing this derivative is to view the composite function f ∘ g ∘ h as the composite of f ∘ g and h. Applying the chain rule in this manner would yield:

(f ∘ g ∘ h)′(a) = (f ∘ g)′(h(a)) · h′(a) = f′(g(h(a))) · g′(h(a)) · h′(a).

This is the same as what was computed above. This should be expected because (f ∘ g) ∘ h = f ∘ (g ∘ h).
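As a sanity check on the worked example, the closed-form derivative e^(sin(x²)) · cos(x²) · 2x can be compared against a finite-difference approximation at an arbitrarily chosen point:

```python
import math

# Check of the worked example: y = exp(sin(x**2)), with
# dy/dx = exp(sin(x**2)) * cos(x**2) * 2x by two uses of the chain rule.
def y(x):
    return math.exp(math.sin(x ** 2))

def dy_dx(x):
    # closed form obtained by two applications of the chain rule
    return math.exp(math.sin(x ** 2)) * math.cos(x ** 2) * 2.0 * x

def numeric_derivative(fn, x, eps=1e-6):
    # central finite-difference approximation
    return (fn(x + eps) - fn(x - eps)) / (2.0 * eps)

x0 = 0.7   # arbitrary test point
print(abs(dy_dx(x0) - numeric_derivative(y, x0)))   # difference is tiny
```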
Sometimes, it is necessary to differentiate an arbitrarily long composition of the form f1 ∘ f2 ∘ ... ∘ fn. In this case, define

f(a..b) = fa ∘ fa+1 ∘ ... ∘ fb,

where f(a..a) = fa and f(a..b) is the identity function when a > b. Then the chain rule takes the form

D f(1..n) = (D f1 ∘ f(2..n)) (D f2 ∘ f(3..n)) ··· (D fn−1 ∘ f(n..n)) D fn,

or, in the Lagrange notation,

f(1..n)′(x) = f1′(f(2..n)(x)) · f2′(f(3..n)(x)) ··· fn−1′(f(n..n)(x)) · fn′(x).
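This n-fold form of the rule translates directly into a short program. The sketch below is a minimal illustration, not a standard library routine: it takes a list of functions paired with their derivatives, outermost first, and accumulates the product of derivatives while evaluating the composition from the inside out:

```python
import math

def compose_derivative(funcs, x):
    """funcs is a list of (f, f_prime) pairs, outermost first; the
    composite is funcs[0][0](funcs[1][0](... funcs[-1][0](x) ...))."""
    value = x
    deriv = 1.0
    for f, f_prime in reversed(funcs):   # apply the innermost function first
        deriv *= f_prime(value)          # multiply by f'(current inner value)
        value = f(value)                 # feed the result to the next function out
    return value, deriv

# Illustrative three-fold composition: exp(sin(x**2)), as in the example above.
funcs = [
    (math.exp, math.exp),                    # outermost: e^u, derivative e^u
    (math.sin, math.cos),                    # middle: sin v, derivative cos v
    (lambda t: t ** 2, lambda t: 2.0 * t),   # innermost: t^2, derivative 2t
]

x0 = 0.7
val, der = compose_derivative(funcs, x0)
expected = math.exp(math.sin(x0 ** 2)) * math.cos(x0 ** 2) * 2.0 * x0
print(abs(der - expected))    # matches the closed-form derivative
```

Because the loop runs from the innermost function outward, each derivative is evaluated at the same intermediate value the chain rule prescribes.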
Quotient rule
The chain rule can be used to derive some well-known differentiation rules. For example, the quotient rule is a consequence of the chain rule and the product rule. To see this, write the function f(x)/g(x) as the product f(x) · 1/g(x). First apply the product rule:

(d/dx)(f(x)/g(x)) = (d/dx)(f(x) · 1/g(x)) = f′(x) · 1/g(x) + f(x) · (d/dx)(1/g(x)).

To compute the derivative of 1/g(x), notice that it is the composite of g with the reciprocal function, that is, the function that sends x to 1/x. The derivative of the reciprocal function is −1/x². By applying the chain rule, the last expression becomes:

f′(x) · 1/g(x) + f(x) · (−1/g(x)²) · g′(x) = (f′(x) g(x) − f(x) g′(x)) / g(x)²,

which is the usual formula for the quotient rule.
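The derived quotient-rule formula can likewise be checked numerically; the functions f(x) = sin x and g(x) = x² + 1 below are arbitrary choices for illustration:

```python
import math

# Numerical check of the quotient rule (f/g)' = (f' g - f g') / g**2
# for the illustrative choices f(x) = sin x and g(x) = x**2 + 1.
def f(x):  return math.sin(x)
def fp(x): return math.cos(x)
def g(x):  return x ** 2 + 1.0    # never zero, so f/g is defined everywhere
def gp(x): return 2.0 * x

def quotient(x):
    return f(x) / g(x)

def quotient_rule(x):
    return (fp(x) * g(x) - f(x) * gp(x)) / g(x) ** 2

def numeric_derivative(fn, x, eps=1e-6):
    # central finite-difference approximation
    return (fn(x + eps) - fn(x - eps)) / (2.0 * eps)

x0 = 1.1
print(abs(quotient_rule(x0) - numeric_derivative(quotient, x0)))  # tiny
```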
Derivatives of inverse functions
Suppose that y = g(x) has an inverse function. Call its inverse function f, so that we have x = f(y). There is a formula for the derivative of f in terms of the derivative of g. To see this, note that f and g satisfy the formula

f(g(x)) = x.

And because the functions f(g(x)) and x are equal, their derivatives must be equal. The derivative of x is the constant function with value 1, and the derivative of f(g(x)) is determined by the chain rule. Therefore, we have that:

f′(g(x)) g′(x) = 1.

To express f′ as a function of an independent variable y, we substitute f(y) for x wherever it appears. Then we can solve for f′:

f′(g(f(y))) g′(f(y)) = 1
f′(y) g′(f(y)) = 1
f′(y) = 1 / g′(f(y)).
For example, consider the function g(x) = e^x. It has an inverse f(y) = ln y. Because g′(x) = e^x, the above formula says that

f′(y) = 1 / e^(ln y) = 1/y.
This formula is true whenever g is differentiable and its inverse f is also differentiable. This formula can fail when one of these conditions is not true. For example, consider g(x) = x³. Its inverse is f(y) = y^(1/3), which is not differentiable at zero. If we attempt to use the above formula to compute the derivative of f at zero, then we must evaluate 1/g′(f(0)). Since f(0) = 0 and g′(0) = 0, we must evaluate 1/0, which is undefined. Therefore, the formula fails in this case. This is not surprising because f is not differentiable at zero.
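In the cases where it applies, the inverse-function formula is easy to check numerically. The sketch below uses the g(x) = e^x example from above, comparing 1/g′(f(y)) with a finite-difference derivative of f(y) = ln y:

```python
import math

# Check of f'(y) = 1 / g'(f(y)) for g(x) = e**x, whose inverse is
# f(y) = ln y; the formula predicts f'(y) = 1/y.
def g_prime(x):
    return math.exp(x)            # g(x) = e**x is its own derivative

def f(y):
    return math.log(y)            # the inverse function, ln y

def inverse_derivative(y):
    return 1.0 / g_prime(f(y))    # the formula 1 / g'(f(y))

def numeric_derivative(fn, y, eps=1e-6):
    # central finite-difference approximation
    return (fn(y + eps) - fn(y - eps)) / (2.0 * eps)

y0 = 3.0
print(inverse_derivative(y0))                                   # about 1/3
print(abs(inverse_derivative(y0) - numeric_derivative(f, y0)))  # tiny
```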
Backpropagation
The chain rule forms the basis of the backpropagation algorithm, which is used in gradient descent of neural networks in deep learning.

Higher derivatives

Faà di Bruno's formula generalizes the chain rule to higher derivatives. Assuming that y = f(u) and u = g(x), then the first few derivatives are:

dy/dx = dy/du · du/dx

d²y/dx² = d²y/du² · (du/dx)² + dy/du · d²u/dx²

d³y/dx³ = d³y/du³ · (du/dx)³ + 3 · d²y/du² · du/dx · d²u/dx² + dy/du · d³u/dx³

d⁴y/dx⁴ = d⁴y/du⁴ · (du/dx)⁴ + 6 · d³y/du³ · (du/dx)² · d²u/dx² + d²y/du² · (4 · du/dx · d³u/dx³ + 3 · (d²u/dx²)²) + dy/du · d⁴u/dx⁴

Proofs
First proof
One proof of the chain rule begins by defining the derivative of the composite function f ∘ g, where we take the limit of the difference quotient for f ∘ g as x approaches a:

(f ∘ g)′(a) = lim (x → a) of [f(g(x)) − f(g(a))] / (x − a).

Assume for the moment that g(x) does not equal g(a) for any x near a. Then the previous expression is equal to the product of two factors:

lim (x → a) of [f(g(x)) − f(g(a))] / [g(x) − g(a)] · [g(x) − g(a)] / (x − a).
If g oscillates near a, then it might happen that no matter how close one gets to a, there is always an even closer x such that g(x) = g(a). For example, this happens near a = 0 for the continuous function g defined by g(x) = 0 for x = 0 and g(x) = x² sin(1/x) otherwise. Whenever this happens, the above expression is undefined because it involves division by zero. To work around this, introduce a function Q as follows:

Q(y) = [f(y) − f(g(a))] / [y − g(a)] if y ≠ g(a),
Q(y) = f′(g(a)) if y = g(a).
We will show that the difference quotient for f ∘ g is always equal to:

Q(g(x)) · [g(x) − g(a)] / (x − a).

Whenever g(x) is not equal to g(a), this is clear because the factors of g(x) − g(a) cancel. When g(x) equals g(a), then the difference quotient for f ∘ g is zero because f(g(x)) equals f(g(a)), and the above product is zero because it equals f′(g(a)) times zero. So the above product is always equal to the difference quotient, and to show that the derivative of f ∘ g at a exists and to determine its value, we need only show that the limit as x goes to a of the above product exists and determine its value.
To do this, recall that the limit of a product exists if the limits of its factors exist. When this happens, the limit of the product of these two factors will equal the product of the limits of the factors. The two factors are Q(g(x)) and [g(x) − g(a)] / (x − a). The latter is the difference quotient for g at a, and because g is differentiable at a by assumption, its limit as x tends to a exists and equals g′(a).
As for Q(g(x)), notice that Q is defined wherever f is. Furthermore, f is differentiable at g(a) by assumption, so Q is continuous at g(a), by definition of the derivative. The function g is continuous at a because it is differentiable at a, and therefore Q ∘ g is continuous at a. So its limit as x goes to a exists and equals Q(g(a)), which is f′(g(a)).

This shows that the limits of both factors exist and that they equal f′(g(a)) and g′(a), respectively. Therefore, the derivative of f ∘ g at a exists and equals f′(g(a)) g′(a).
Second proof
Another way of proving the chain rule is to measure the error in the linear approximation determined by the derivative. This proof has the advantage that it generalizes to several variables. It relies on the following equivalent definition of differentiability at a point: A function g is differentiable at a if there exists a real number g′(a) and a function ε(h) that tends to zero as h tends to zero, and furthermore

g(a + h) − g(a) = g′(a) h + ε(h) h.

Here the left-hand side represents the true difference between the value of g at a and at a + h, whereas the right-hand side represents the approximation determined by the derivative plus an error term.
In the situation of the chain rule, such a function ε exists because g is assumed to be differentiable at a. Again by assumption, a similar function also exists for f at g(a). Calling this function η, we have

f(g(a) + k) − f(g(a)) = f′(g(a)) k + η(k) k.

The above definition imposes no constraints on η(0), even though it is assumed that η(k) tends to zero as k tends to zero. If we set η(0) = 0, then η is continuous at 0.
Proving the theorem requires studying the difference f(g(a + h)) − f(g(a)) as h tends to zero. The first step is to substitute for g(a + h) using the definition of differentiability of g at a:

f(g(a + h)) − f(g(a)) = f(g(a) + g′(a) h + ε(h) h) − f(g(a)).

The next step is to use the definition of differentiability of f at g(a). This requires a term of the form f(g(a) + k) for some k. In the above equation, the correct k varies with h. Set k(h) = g′(a) h + ε(h) h and the right-hand side becomes f(g(a) + k(h)) − f(g(a)). Applying the definition of the derivative gives:

f(g(a) + k(h)) − f(g(a)) = f′(g(a)) k(h) + η(k(h)) k(h).
To study the behavior of this expression as h tends to zero, expand k(h). After regrouping the terms, the right-hand side becomes:

f′(g(a)) g′(a) h + [f′(g(a)) ε(h) + η(k(h)) g′(a) + η(k(h)) ε(h)] h.

Because ε(h) and η(k(h)) tend to zero as h tends to zero, the first two bracketed terms tend to zero as h tends to zero. Applying the same theorem on products of limits as in the first proof, the third bracketed term also tends to zero. Because the above expression is equal to the difference f(g(a + h)) − f(g(a)), by the definition of the derivative f ∘ g is differentiable at a and its derivative is f′(g(a)) g′(a).
The role of Q in the first proof is played by η in this proof. They are related by the equation:

Q(y) = f′(g(a)) + η(y − g(a)).

The need to define Q at g(a) is analogous to the need to define η at zero.