Week 4 $e^x$, $\ln$, differentiation

Version 2025/02/23 Week 4 in PDF All notes in PDF To other weeks

Remark: we will use the Binomial Theorem, which says that for $x,y\in\mathbb{R}$ and $n\ge 0$, $(x+y)^n$ expands as $\binom{n}{0}x^n+\binom{n}{1}x^{n-1}y+\binom{n}{2}x^{n-2}y^2+\dots+\binom{n}{n}y^n$, where $\binom{n}{i}=\frac{n!}{(n-i)!\,i!}$. The Binomial Theorem is taught in Probability I, and the standard proof is by induction.

The exponential function

Theorem 3.5 allows us to define new continuous functions by power series (with non-zero radius of convergence). Here is the most important example.

Definition: the exponential function and the number e.

\[\exp(x)=1+\frac{x}{1!}+\frac{x^2}{2!}+\frac{x^3}{3!}+\dots=\sum_{n=0}^{\infty}\frac{x^n}{n!},\qquad e=\exp(1).\]
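To get a feel for the definition, here is a minimal numerical sketch that adds up the first few terms of the series and compares the result with the standard library's `math.exp`. The function name `exp_partial_sum` and the cut-off of 30 terms are illustrative choices, not part of the notes.

```python
import math

def exp_partial_sum(x, terms=30):
    """Sum the first `terms` terms of 1 + x/1! + x^2/2! + x^3/3! + ..."""
    total = 0.0
    term = 1.0  # the n = 0 term, x^0 / 0!
    for n in range(terms):
        total += term
        term *= x / (n + 1)  # turn x^n/n! into x^(n+1)/(n+1)!
    return total

for x in (0.0, 1.0, -2.0, 5.0):
    print(x, exp_partial_sum(x), math.exp(x))
```

For moderate values of x the two printed columns should agree to many decimal places; larger |x| would need more terms.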

Theorem 4.1: continuity and the law of the exponential.

$\exp$ is a continuous function on $\mathbb{R}$. One has $\exp(x)\exp(y)=\exp(x+y)$ for all $x,y$.

  • Proof. Apply the Ratio Test to the series $\sum_{n=0}^{\infty}\frac{|x|^n}{n!}$ with $x\ne 0$ to find $\ell=\lim_{n\to\infty}\frac{|x|^{n+1}/(n+1)!}{|x|^n/n!}=\lim_{n\to\infty}\frac{|x|}{n+1}=0$ (for $x=0$ the series trivially converges). Since $0<1$, the power series $\exp(x)$ is absolutely convergent for all $x\in\mathbb{R}$ (the radius of convergence is $R=\infty$). By Theorem 3.5, it follows that $\exp(x)$ is a continuous function on all of $\mathbb{R}$.

    (-tikz- diagram)

    Figure 4.1: The double series used to prove exp(x)exp(y)=exp(x+y)

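    To prove $\exp(x)\exp(y)=\exp(x+y)$, we compare two methods of summation of the double series $a_{m,n}=\frac{x^m}{m!}\frac{y^n}{n!}$, see Figure 4.1. We have

    \[\mathrm{RowSum}_m=\frac{x^m}{m!}\Bigl(1+y+\frac{y^2}{2!}+\dots\Bigr)=\frac{x^m}{m!}\exp(y),\]

    hence summation by rows gives

    \[\sum_{m=0}^{\infty}\mathrm{RowSum}_m=\sum_{m=0}^{\infty}\frac{x^m}{m!}\exp(y)=\exp(x)\exp(y).\]

    We now calculate the $d$th diagonal sum (multiplying and dividing by $d!$ for emphasis):

    \[\mathrm{DiagSum}_d=\frac{1}{d!}\Bigl(x^d+\frac{d!}{(d-1)!\,1!}x^{d-1}y^1+\frac{d!}{(d-2)!\,2!}x^{d-2}y^2+\dots+y^d\Bigr).\]

    By the Binomial Theorem, the expression in brackets is the expansion of $(x+y)^d$. Thus, summation by diagonals gives

    \[\sum_{d=0}^{\infty}\mathrm{DiagSum}_d=\sum_{d=0}^{\infty}\frac{1}{d!}(x+y)^d=\exp(x+y).\]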

    We claim that the sum of all numbers in this double series does not depend on the method of summation, and so $\exp(x)\exp(y)=\exp(x+y)$. We need to justify this claim.

    If both $x$ and $y$ are non-negative, then all the numbers $\frac{x^m}{m!}\frac{y^n}{n!}$ are non-negative. In this case, Proposition 2.3 guarantees that the sum, $\sum_{m,n}\frac{x^m}{m!}\frac{y^n}{n!}$, is independent of the method of summation, and so $\exp(x)\exp(y)=\exp(x+y)$.

    Without the assumption that $x,y$ are non-negative, we can show, by applying the previous case to $|x|$ and $|y|$, that the sum of all the absolute values in the table is finite:

    \[\Bigl|\frac{x^m}{m!}\frac{y^n}{n!}\Bigr|=\frac{|x|^m}{m!}\frac{|y|^n}{n!}\quad\Longrightarrow\quad\sum_{m,n}\Bigl|\frac{x^m}{m!}\frac{y^n}{n!}\Bigr|=\exp(|x|+|y|)<+\infty,\]

    so by Claim 2.7, the sum $\sum_{m,n}\frac{x^m}{m!}\frac{y^n}{n!}$ is still independent of the method of summation, and we still have $\exp(x)\exp(y)=\exp(x+y)$.
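The two methods of summation in the proof can be tried numerically. The sketch below sums a finite corner of the table by rows and by diagonals and compares both with `math.exp(x + y)`. The names `entry`, `sum_by_rows`, `sum_by_diagonals` and the truncation `size=40` are assumptions made for this illustration; a finite computation suggests, but of course does not prove, the independence claim.

```python
import math

def entry(m, n, x, y):
    """Entry a_{m,n} = x^m/m! * y^n/n! of the double series."""
    return x**m / math.factorial(m) * y**n / math.factorial(n)

def sum_by_rows(x, y, size=40):
    # Row m collects the entries a_{m,0}, a_{m,1}, ... (truncated at `size`).
    return sum(sum(entry(m, n, x, y) for n in range(size)) for m in range(size))

def sum_by_diagonals(x, y, size=40):
    # Diagonal d collects the entries with m + n = d.
    return sum(sum(entry(d - n, n, x, y) for n in range(d + 1)) for d in range(size))

x, y = 1.5, -0.7
print(sum_by_rows(x, y), sum_by_diagonals(x, y), math.exp(x + y))
```

One of the sample values is deliberately negative, so the example falls under the absolutely convergent case rather than the non-negative one.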

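Discussion of the $e^x$ notation. The law of the exponential tells us that, for all $n\in\mathbb{N}$,

\[\exp(n)=\exp(\underbrace{1+1+\dots+1}_{n})=\underbrace{\exp(1)\exp(1)\cdots\exp(1)}_{n}=e^n.\]

It also follows that, for $p,q\in\mathbb{N}$, $\bigl(\exp(\tfrac{p}{q})\bigr)^q=\exp(q\cdot\tfrac{p}{q})=\exp(p)$, which is $e^p$, and so by definition of the $q$th root and the $(p/q)$th power,

\[\exp\bigl(\tfrac{p}{q}\bigr)=\sqrt[q]{e^p}=e^{p/q}.\]

The law of the exponential also tells us that $\exp(x)\exp(-x)=\exp(0)=1$, hence

\[\exp(-x)=\frac{1}{\exp(x)}\quad\Longrightarrow\quad\exp\bigl(-\tfrac{p}{q}\bigr)=\frac{1}{e^{p/q}}=e^{-p/q}.\]

Therefore, $\exp(x)=e^x$ for all rational numbers $x$. Motivated by this, we extend the notation to all real $x$:

Notation: $e^x$.

$\exp(x)$ is written as $e^x$ for all $x\in\mathbb{R}$.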

Definition of ln, the natural logarithm function

We are going to introduce the inverse function to $e^x$. Let us show that $e^x$ is bijective.

Proposition 4.2: properties of $e^x$.

The function $f(x)=e^x$ is a strictly increasing bijection $\mathbb{R}\to(0,+\infty)$.

  • Proof. Observe that $x>0\ \Rightarrow\ e^x=1+x+\frac{x^2}{2}+\dots>1+x$. In particular, $e^x$ is positive for positive $x$. Then $e^{-x}=1/e^x$ shows that $e^x$ is positive for negative $x$ as well, and $e^0=1$. Hence $e^x$ is positive for all $x$, and is indeed a function from $\mathbb{R}$ to $(0,+\infty)$.

    For all $x,y\in\mathbb{R}$ we have $e^y-e^x=e^x(e^{y-x}-1)$. If $x<y$, then $e^{y-x}>1$ as observed above, so $e^y>e^x$. We have shown that $e^x$ is strictly increasing, hence injective.

    To show that $e^x$ is surjective, let $d\in(0,+\infty)$ be arbitrary. If $d>1$, note that $e^d>1+d>d$ as shown above. Also $e^0=1<d$. The function is continuous, so by the Intermediate Value Theorem there exists $c\in[0,d]$ such that $e^c=d$.

    If $d<1$ then $\frac{1}{d}>1$ and by the above, $\frac{1}{d}=e^c$ for some $c$. We then have $d=e^{-c}$ by the law of the exponential. Finally, if $d=1$ then $d=e^0$. We have proved that $e^x$ is surjective, and so it is bijective.

We immediately deduce

Theorem 4.3: natural logarithm ln.

There is a strictly increasing continuous bijection $\ln\colon(0,+\infty)\to\mathbb{R}$ such that $\ln e^x=x$ for all $x\in\mathbb{R}$, $e^{\ln y}=y$ for all $y>0$ and $\ln(yz)=\ln y+\ln z$ for all $y,z>0$.

  • Sketch of proof. $e^x$ is a bijection from $\mathbb{R}$ to $(0,+\infty)$ so it must have an inverse $(0,+\infty)\to\mathbb{R}$, which we denote $\ln$ and call the natural logarithm function. Inverse means that $\ln e^x=x$ and $e^{\ln y}=y$.

    Using the Inverse Function Theorem 1.2, we conclude that $\ln$ is strictly increasing and continuous.

    By definition of $\ln$, $x=\ln e^x$ for all $x$. Set $x=\ln y+\ln z$ to get $\ln y+\ln z=\ln e^{\ln y+\ln z}$. By the law of the exponential, this equals $\ln(e^{\ln y}e^{\ln z})$. Yet $e^{\ln y}=y$ and $e^{\ln z}=z$, so the expression simplifies to $\ln(yz)$. We have proved the logarithm law, $\ln y+\ln z=\ln(yz)$.
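The surjectivity argument in Proposition 4.2 is close to an algorithm: for d > 1 the solution of e^c = d lies in [0, d], and because e^x is strictly increasing, bisection can close in on it. The sketch below follows the three cases of that proof. The name `ln_by_bisection` and the tolerance are hypothetical, and `math.exp` stands in for the function e^x, so this is only an illustration for moderate d (a very large d would overflow `math.exp`).

```python
import math

def ln_by_bisection(d, tol=1e-12):
    """Approximate the unique c with e^c = d, for d > 0, by bisection."""
    if d <= 0:
        raise ValueError("ln is only defined on (0, +infinity)")
    if d < 1:
        # 1/d > 1 and 1/d = e^c gives d = e^(-c)
        return -ln_by_bisection(1.0 / d, tol)
    lo, hi = 0.0, d  # e^0 = 1 <= d and e^d > 1 + d > d
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if math.exp(mid) < d:
            lo = mid  # the solution lies to the right of mid
        else:
            hi = mid  # the solution lies at or to the left of mid
    return (lo + hi) / 2

print(ln_by_bisection(6.0), ln_by_bisection(2.0) + ln_by_bisection(3.0))
```

The two printed numbers should agree up to the tolerance, in line with the logarithm law ln(yz) = ln y + ln z.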

Differentiation of functions: an informal introduction

We begin the second part of the course: the theory of differentiation.

To differentiate a “smooth” function $f$ at a point $a\in\mathbb{R}$ means to calculate the derivative, $f'(a)$, of $f$ at $a$. The derivative, if it exists, shows “how fast” the function $f$ grows (or decreases) at the point $a$. It is impossible to measure growth by looking just at the value of $f$ at $a$. Rather, the derivative is defined by taking a limit; we illustrate this in Fig. 4.2.

(-tikz- diagram)

Figure 4.2: The slope of the secant passing through the points $(a,f(a))$ and $(x,f(x))$ on the graph is $m=\frac{f(x)-f(a)}{x-a}$. As $x\to a$, we expect the secant to get closer to the tangent at $(a,f(a))$.

We first present the idea informally (rigorous definitions are below). Fix a point $P=(a,f(a))$ on the graph of a function $f$. The slope, or gradient, of the secant passing through $P$ and another point $Q=(x,f(x))$ on the graph is

\[m_{PQ}=\frac{f(x)-f(a)}{x-a}.\]

As $Q$ “gets closer” to $P$, the secants “seem” to approach a fixed line, the tangent to the graph at $P$. The gradient of the tangent at $P$, if it exists, is the derivative of $f$ at $a$:

\[m_{\text{tangent at }P}=f'(a).\]
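The informal picture can be tried out numerically. The following sketch computes secant slopes for the sample function f(t) = t^3 at a = 2 as the second point moves closer. The function, the point and the step sizes are arbitrary choices made for illustration.

```python
def secant_slope(f, a, x):
    """Slope of the secant through (a, f(a)) and (x, f(x)); requires x != a."""
    return (f(x) - f(a)) / (x - a)

f = lambda t: t**3
a = 2.0
for h in (1.0, 0.1, 0.01, 0.001):
    print(h, secant_slope(f, a, a + h))
```

Algebraically the slope here is 12 + 6h + h^2, so the printed values should settle towards 12 as h shrinks.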

Why differentiate functions? It turns out that derivatives appear in powerful results which allow us to approximate functions by extremely good functions — polynomials — and to represent some functions as sums of infinite power series. But first, we build up theory to

  • differentiate basic functions, such as polynomials, rational functions, exponential, logarithm, trigonometric and inverse trigonometric functions;

  • use rules of differentiation, to find derivatives of new functions constructed from basic functions.

Definition of the derivative of f at a

We now start our rigorous treatment of differentiation.

Definition: open neighbourhood of the point $a\in\mathbb{R}$.

An open neighbourhood of $a$ is an open interval $(a-\delta,a+\delta)$ for some $\delta>0$.

Definition: differentiable at a, derivative at a.

Let $A\subseteq\mathbb{R}$, and let $f\colon A\to\mathbb{R}$ be a function. Suppose that $a\in A$ and $A$ contains an open neighbourhood of the point $a$. We say that $f$ is differentiable at $a$ if

\[\lim_{x\to a}\frac{f(x)-f(a)}{x-a}\]

exists. The value of this limit is the derivative of $f$ at $a$, and is denoted $f'(a)$.

Remark: for $f$ to be differentiable at $a$, $f'(a)$ must be a real number, not infinity.

Definition: differentiable on an open interval.

f is differentiable on an open interval I if it is differentiable at every point of I.

Remark: if f is defined on a closed interval [a,b], we will not try to differentiate f at a or at b. Though possible via one-sided limits, we will not need this.

Notation: $\frac{d}{dx}f(x)$.

If a function $f(x)$ is differentiable on an open interval, taking the derivative of $f$ at each point of the interval defines a new function. We will write $f'(x)$, or $\frac{d}{dx}f(x)$, to denote the derivative of $f(x)$ as a function of $x$.

There are functions whose derivatives can be computed by definition, i.e., by calculating the limit given in the definition of $f'(a)$ without using any further theorems.

Example: derivative of a constant function.

Given $c\in\mathbb{R}$, define a constant function on $\mathbb{R}$ by the formula $f(x)=c$ for all $x$. This function has derivative $0$ at all points of $\mathbb{R}$.

Justification: by definition, the derivative at $a$ is $\lim_{x\to a}\frac{c-c}{x-a}=\lim_{x\to a}0=0$.

Remark: Remember that the limit, $\lim_{x\to a}g(x)$, of $g(x)$ as $x$ tends to $a$, does not require $g(x)$ to be defined at $a$. Indeed, the MFA definition of limit (revisit it!) looks only at points $x$ such that $0<|x-a|<\delta$, and this excludes the case $x=a$.

For example, the expression $\frac{c-c}{x-a}$ above is undefined when $x=a$. But it is of no concern to us: $\frac{c-c}{x-a}$ has value $0$ for all $x$ such that $x\ne a$, and so we can write $\lim_{x\to a}\frac{c-c}{x-a}=\lim_{x\to a}0$.

To conclude: when calculating a limit $\lim_{x\to a}$, we can always assume $x\ne a$.

Example: derivative of the function x.

$\frac{d}{dx}x=1$ on $\mathbb{R}$.

Justification: by definition, the derivative of $x$ at $a$ is $\lim_{x\to a}\frac{x-a}{x-a}=\lim_{x\to a}1=1$.

Theorem 4.4: differentiable implies continuous.

If f is differentiable at a, then f is continuous at a.

  • Proof. The criterion of continuity says that $f$ is continuous at $a$ iff $\lim_{x\to a}f(x)=f(a)$. Rearranging, we obtain: $f$ is continuous at $a$ $\iff$ $\lim_{x\to a}(f(x)-f(a))=0$.

    Assume $f$ is differentiable at $a$, so that the limit $\lim_{x\to a}\frac{f(x)-f(a)}{x-a}=L$ exists. Then

    \begin{align*}
    \lim_{x\to a}(f(x)-f(a)) &= \lim_{x\to a}\frac{f(x)-f(a)}{x-a}\,(x-a) && \text{(can assume $x\ne a$)}\\
    &= \lim_{x\to a}\frac{f(x)-f(a)}{x-a}\cdot\lim_{x\to a}(x-a) && \text{(by AoL for functions)}\\
    &= L\cdot 0=0.
    \end{align*}

    Thus, $f$ satisfies the (rearranged) criterion of continuity above, so is continuous at $a$.

Alert: continuous at $a$ $\not\Rightarrow$ differentiable at $a$.

The converse to Theorem 4.4 does not hold. For example, f(x)=|x| is continuous but not differentiable at 0.

(-tikz- diagram)

Figure 4.3: Visibly, the graph of f(x)=|x| is “not smooth” at x=0.

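Justification. “Differentiable at $0$” requires the limit $\lim_{x\to 0}\frac{|x|-|0|}{x-0}=\lim_{x\to 0}\frac{|x|}{x}$ to exist. Yet the function is defined by $|x|=\begin{cases}x&\text{if }x\ge 0,\\-x&\text{if }x<0,\end{cases}$ see the graph in Fig. 4.3. Hence

\[\lim_{x\to 0+}\frac{|x|}{x}=\lim_{x\to 0+}\frac{x}{x}=1,\qquad\lim_{x\to 0-}\frac{|x|}{x}=\lim_{x\to 0-}\frac{-x}{x}=-1.\]

The one-sided limits are not equal, so the limit $\lim_{x\to 0}\frac{|x|}{x}$ does not exist.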
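The disagreement of the one-sided limits is easy to see in a small computation. This sketch evaluates the difference quotient of |x| at 0 from the right and from the left; the helper name `difference_quotient` and the sample step sizes are illustrative.

```python
def difference_quotient(f, a, x):
    """(f(x) - f(a)) / (x - a), defined only for x != a."""
    return (f(x) - f(a)) / (x - a)

steps = (0.1, 0.01, 0.001)
right = [difference_quotient(abs, 0.0, h) for h in steps]   # x > 0
left = [difference_quotient(abs, 0.0, -h) for h in steps]   # x < 0
print(right)  # every entry is 1.0
print(left)   # every entry is -1.0
```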

Rules of differentiation: sums and products

We can obtain new differentiable functions from known ones by addition and multiplication.

Theorem 4.5: sum and product rules of differentiation.

Suppose that the functions $f,g$ are differentiable at $a$. Then

  • the function $f+g$ is differentiable at $a$, and $(f+g)'(a)=f'(a)+g'(a)$;

  • the function $fg$ is differentiable at $a$, and $(fg)'(a)=f'(a)g(a)+f(a)g'(a)$.

  • Proof. The sum rule (proof not given in class): by definition of the function $f+g$, $\frac{(f+g)(x)-(f+g)(a)}{x-a}$ is the same as $\frac{f(x)+g(x)-(f(a)+g(a))}{x-a}$, which rearranges as $\frac{f(x)-f(a)}{x-a}+\frac{g(x)-g(a)}{x-a}$. Taking the limit as $x\to a$ and using AoL for functions, we obtain $(f+g)'(a)=f'(a)+g'(a)$ as claimed.

    The product rule: by definition, $(fg)(x)=f(x)g(x)$. Start with

    \[\frac{f(x)g(x)-f(a)g(a)}{x-a}=\frac{f(x)g(x)-f(a)g(x)+f(a)g(x)-f(a)g(a)}{x-a}\]

    where we subtract then add $f(a)g(x)$ in the numerator. The RHS rearranges as

    \[\frac{f(x)-f(a)}{x-a}\,g(x)+f(a)\,\frac{g(x)-g(a)}{x-a}.\]

    We are given that $g$ is differentiable at $a$. Differentiable implies continuous, so $g$ is continuous at $a$. Hence $\lim_{x\to a}g(x)=g(a)$. Taking $\lim_{x\to a}$ in the last displayed formula and using AoL, we get $f'(a)g(a)+f(a)g'(a)$, as claimed.

Now, using only + and ×, we can construct all polynomials in x from constants and the function x. If we apply the rules of differentiation, we obtain

Corollary.

A polynomial in $x$ is differentiable for all $x\in\mathbb{R}$.
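To see how the corollary follows from the two rules, here is a small sketch that differentiates x^n using nothing but the derivative of a constant, the derivative of x, and the product rule applied to x^n = x · x^(n-1). Polynomials are stored as coefficient lists [c0, c1, c2, ...]; this representation and all function names are assumptions made for the illustration.

```python
def poly_add(p, q):
    """Add two polynomials given as coefficient lists [c0, c1, c2, ...]."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [u + v for u, v in zip(p, q)]

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, u in enumerate(p):
        for j, v in enumerate(q):
            out[i + j] += u * v
    return out

X = [0, 1]  # the function x

def power(n):
    """Coefficients of x^n, built by repeated multiplication by x."""
    result = [1]
    for _ in range(n):
        result = poly_mul(result, X)
    return result

def derivative_of_power(n):
    """Derivative of x^n from c' = 0, x' = 1 and the product rule."""
    if n == 0:
        return [0]  # constant function
    if n == 1:
        return [1]  # the function x
    # x^n = x * x^(n-1), so (x^n)' = 1 * x^(n-1) + x * (x^(n-1))'
    return poly_add(power(n - 1), poly_mul(X, derivative_of_power(n - 1)))

print(derivative_of_power(3))  # [0, 0, 3], i.e. 3x^2
```

A general polynomial would then be handled by the sum rule together with multiplication by constants, which is itself a case of the product rule.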

Differentiating infinite sums

The sum rule of differentiation does not extend to infinite sums. A function defined as a sum of series of differentiable functions may not be differentiable.

Yet one can show that a function defined as the sum of a power series is differentiable on $(-R,R)$, where $R$ is the radius of convergence. We will not go through the proof of this in class. Interested students are invited to construct a proof as an exercise, along the following lines (not done in class and not examinable):

Let $f(x)=\sum_{k=0}^{\infty}c_kx^k$ where the radius of convergence is $R>0$. Let $a\in(-R,R)$. By Algebra of Infinite Sums, we have $f(x)-f(a)=F_a(x)(x-a)$ where $F_a(x)=\sum_{k=1}^{\infty}c_k(x^{k-1}+ax^{k-2}+\dots+a^{k-2}x+a^{k-1})$. By Proposition 4.6 below, $f(x)$ will be differentiable at $a$ if $F_a(x)$ is shown to be continuous at $a$.

We note that $F_a(x)$ is obtained if the double series $a_{m,n}=c_{m+n+1}a^mx^n$, $m,n\ge 0$, is summed by diagonals. Yet summation by columns gives the same answer (this needs to be justified by demonstrating that $\sum_{m,n}|a_{m,n}|<+\infty$ when $a,x\in(-R,R)$) and returns a power series in $x$. By Theorem 3.5, the sum of a power series is a continuous function, so $F_a$ is continuous on $(-R,R)$, as required.

One concludes from the above that $\bigl(\sum_{n=0}^{\infty}c_nx^n\bigr)'=\sum_{n=0}^{\infty}(c_nx^n)'=\sum_{n=1}^{\infty}nc_nx^{n-1}$. So in particular, since $\bigl(\frac{x^n}{n!}\bigr)'=\frac{nx^{n-1}}{n!}=\frac{x^{n-1}}{(n-1)!}$ for $n\ge 1$, differentiating the exponential series $\sum_{n=0}^{\infty}\frac{x^n}{n!}$ term-by-term gives the same series, so $(e^x)'=e^x$.
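The coefficient bookkeeping in the last step can be checked exactly on a truncated series. The sketch below applies the rule c_n x^n ↦ n c_n x^(n-1) to the first few coefficients 1/n! of the exponential series, using exact fractions. It illustrates only the formal manipulation, not its justification; the function name and the truncation at 8 coefficients are illustrative.

```python
from fractions import Fraction
from math import factorial

def differentiate_coefficients(c):
    """Map [c0, c1, c2, ...] to [1*c1, 2*c2, 3*c3, ...]."""
    return [n * c[n] for n in range(1, len(c))]

exp_coeffs = [Fraction(1, factorial(n)) for n in range(8)]  # 1/0!, 1/1!, ..., 1/7!
derived = differentiate_coefficients(exp_coeffs)

# Apart from the lost last term of the truncation, the coefficients are unchanged.
print(derived == exp_coeffs[:-1])  # True
```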

Instructions for the exam: differentiating a power series term-by-term as above without giving full justification will not be accepted in the exam. If asked to justify differentiation of $e^x$, give the result obtained below, Proposition 4.7.

Proving “differentiable” by constructing slope function

Rather than showing directly that $\lim_{x\to a}\frac{f(x)-f(a)}{x-a}$ exists, we may use the following:

Proposition 4.6: differentiability means continuity of the slope function at a.

A function $f(x)$, defined in an open neighbourhood of $a\in\mathbb{R}$, is differentiable at $a$ if and only if there is a function $F_a(x)$ such that $f(x)-f(a)=F_a(x)(x-a)$ for all $x$, and $F_a(x)$ is continuous at $x=a$. If these conditions hold, $f'(a)$ equals $F_a(a)$.

  • Proof. If such $F_a$ exists and is continuous at $a$, we have $\lim_{x\to a}\frac{f(x)-f(a)}{x-a}=\lim_{x\to a}F_a(x)$ which, by continuity, is $F_a(a)$. That is, $f'(a)$ exists and equals $F_a(a)$.

    Now suppose that $f$ is differentiable at $a$. Then, defining

    \[F_a(x)=\begin{cases}\dfrac{f(x)-f(a)}{x-a},&x\ne a,\\[1ex] f'(a),&x=a,\end{cases}\]

    gives $f(x)-f(a)=F_a(x)(x-a)$ for all $x$ (trivially so at $x=a$) and guarantees $\lim_{x\to a}F_a(x)=F_a(a)$, so by the criterion of continuity $F_a$ is continuous at $a$.

We call $F_a$ the slope function for $f$ at $a$, because for $x\ne a$, $F_a(x)$ is the slope (the gradient) of the secant through the points $(a,f(a))$ and $(x,f(x))$ on the graph of $f$. It is useful to note the slope function for the polynomial $x^n$:

\[f(x)=x^n\quad\Longrightarrow\quad F_a(x)=\frac{x^n-a^n}{x-a}=x^{n-1}+x^{n-2}a+\dots+a^{n-1}.\]

This formula defines a polynomial function of $x$ which is continuous everywhere, including at $x=a$. One has $F_a(a)=na^{n-1}$, which is the derivative of $x^n$ at $x=a$.
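The slope function for x^n is concrete enough to compute directly. The following sketch evaluates the sum x^(n-1) + x^(n-2) a + ... + a^(n-1) and checks the two properties used above on sample integers. The function name and the sample values are illustrative.

```python
def slope_function_power(n, a, x):
    """F_a(x) = x^(n-1) + x^(n-2) a + ... + a^(n-1) for f(x) = x^n, n >= 1."""
    return sum(x ** (n - 1 - j) * a ** j for j in range(n))

n, a, x = 4, 2, 3
# Factorisation x^n - a^n = F_a(x) (x - a):
print(slope_function_power(n, a, x) * (x - a), x**n - a**n)
# Value at x = a, which should equal n a^(n-1):
print(slope_function_power(n, a, a), n * a ** (n - 1))
```

Unlike the quotient (x^n - a^n)/(x - a), this sum can be evaluated at x = a, which is the point of working with the slope function.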

Differentiating $e^x$

We use the method of continuous slope function to differentiate $e^x$.

Proposition 4.7: derivative of $e^x$.

$\frac{d}{dx}e^x=e^x$.

  • Proof. To differentiate $e^x$ at $0$, write

    \[e^x-e^0=x+\frac{x^2}{2!}+\frac{x^3}{3!}+\dots\overset{\text{AoIS}}{=}x\sum_{k=1}^{\infty}\frac{x^{k-1}}{k!}=(x-0)F_0(x).\]

    The slope function $F_0(x)$ is the sum of a power series convergent for all $x$, hence is continuous by Theorem 3.5, and by Proposition 4.6 $\frac{d}{dx}(e^x)\big|_{x=0}$ exists and equals $F_0(0)=1$. This proves the Special Limit for $e^x$:

    \[\lim_{x\to 0}\frac{e^x-1}{x}=1.\]

    Indeed, the left-hand side is exactly the derivative of $e^x$ at $x=0$, which we have just found to be $1$. We now differentiate $e^x$ at an arbitrary $x\in\mathbb{R}$:

    \[\frac{d}{dx}e^x=\lim_{y\to x}\frac{e^y-e^x}{y-x}=\lim_{y\to x}e^x\,\frac{e^{y-x}-1}{y-x}\overset{h=y-x}{=}e^x\lim_{h\to 0}\frac{e^h-1}{h}.\]

    By the Special Limit, this is $e^x\times 1=e^x$.
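Both ingredients of this proof can be observed numerically. The sketch below evaluates a truncation of the slope function F_0 and prints the quotient (e^h - 1)/h for shrinking h. The name `F0` and the cut-off of 30 terms are assumptions for illustration, and `math.exp` stands in for e^x.

```python
import math

def F0(x, terms=30):
    """Truncated slope function of e^x at 0: sum over k >= 1 of x^(k-1)/k!."""
    return sum(x ** (k - 1) / math.factorial(k) for k in range(1, terms + 1))

print(F0(0.0))  # only the k = 1 term survives, giving 1.0

# For x != 0 the slope function agrees with the difference quotient.
x = 0.5
print(F0(x), (math.exp(x) - 1) / x)

# The Special Limit: the quotient approaches 1 as h shrinks.
for h in (0.1, 0.01, 0.001, 1e-6):
    print(h, (math.exp(h) - 1) / h)
```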

The Chain Rule and the Quotient Rule

We will work in the situation

\[\mathbb{R}\xrightarrow{\ g\ }\mathbb{R}\xrightarrow{\ f\ }\mathbb{R}\]

We will write $g$ as a function of $y\in\mathbb{R}$ and $f$ as a function of $x\in\mathbb{R}$.

Theorem 4.8: The Chain Rule.

If $g(y)$ is differentiable at $y=k$ and $f(x)$ is differentiable at $x=g(k)$ then $(f\circ g)(y)$ is differentiable at $y=k$, and $(f\circ g)'(k)=f'(g(k))\,g'(k)$.

  • Proof. By Proposition 4.6, whenever $f$ is differentiable at a point $\ell$, one has

    \[f(x)-f(\ell)=F_\ell(x)(x-\ell)\]

    for all $x$, where the slope function $F_\ell$ is continuous at $\ell$. In particular, this holds for $x=g(y)$ and $\ell=g(k)$:

    \[f(g(y))-f(g(k))=F_\ell(g(y))\bigl(g(y)-g(k)\bigr)=F_\ell(g(y))\,G_k(y)\,(y-k),\]

    where we assumed that $g$ was differentiable at $k$ and applied Proposition 4.6 to $g$.

    The function $F_\ell(g(y))$ is continuous at $y=k$, because $g(y)$ is continuous (even differentiable!) at $k$, $F_\ell$ is continuous at $g(k)=\ell$, and a composition of continuous functions is continuous. The function $G_k(y)$ is continuous at $k$. Therefore, by Algebra of Continuous Functions, $F_\ell(g(y))\,G_k(y)$ is continuous at $y=k$. It immediately follows by Proposition 4.6 that the function $f(g(y))$ is differentiable at $y=k$, with

    \[F_\ell(g(k))\,G_k(k)=f'(g(k))\,g'(k)\]

    as its derivative at $k$, as claimed.

Example.

Find $\frac{d}{dy}e^{-\frac{y^2}{2}}$.

Solution. Put $f(x)=e^x$ and $g(y)=-\frac{1}{2}y^2$ so that our required function is $f(g(y))$. To apply the Chain Rule, we must check that the assumptions of Theorem 4.8 are met:

  • $g(y)=-\frac{1}{2}y^2$ is a polynomial, hence is differentiable for all $y$, with $g'(y)=-y$;

  • $f(x)=e^x$ is differentiable for all $x$ by Proposition 4.7, with $f'(x)=e^x$.

Hence we are allowed to use the Chain Rule: $\frac{d}{dy}e^{-\frac{y^2}{2}}=f'(g(y))\,g'(y)=e^{-\frac{y^2}{2}}(-y)=-ye^{-\frac{y^2}{2}}$.
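A quick numerical sanity check of this example is sketched below. It reads the exponent as -y^2/2 (so that g'(y) = -y) and compares the Chain Rule answer with a central difference quotient. The helper `central_difference`, the step size and the sample points are assumptions made for the illustration, and `math.exp` stands in for e^x.

```python
import math

def composite(y):
    """f(g(y)) with f(x) = e^x and g(y) = -y^2/2 (exponent read as -y^2/2)."""
    return math.exp(-y**2 / 2)

def chain_rule_answer(y):
    """f'(g(y)) * g'(y) = e^(-y^2/2) * (-y)."""
    return -y * math.exp(-y**2 / 2)

def central_difference(func, y, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for the derivative."""
    return (func(y + h) - func(y - h)) / (2 * h)

for y in (-1.0, 0.0, 0.5, 2.0):
    print(y, central_difference(composite, y), chain_rule_answer(y))
```

The two printed columns should agree to several decimal places at each sample point.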

Corollary: the Quotient Rule.

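If $g(a)\ne 0$ and $f(y)$, $g(y)$ are differentiable at $y=a$, then

\[\Bigl(\frac{1}{g}\Bigr)'(a)=-\frac{g'(a)}{g(a)^2},\qquad\Bigl(\frac{f}{g}\Bigr)'(a)=\frac{f'(a)g(a)-f(a)g'(a)}{g(a)^2}.\]

  • Proof. If $h(x)=\frac{1}{x}$ then, for any $\ell\ne 0$, $h'(\ell)=\lim_{x\to\ell}\frac{\frac{1}{x}-\frac{1}{\ell}}{x-\ell}=\lim_{x\to\ell}\frac{\ell-x}{(x-\ell)\,x\ell}$. When calculating $\lim_{x\to\ell}$, we may assume that $x\ne\ell$, so this simplifies to $\lim_{x\to\ell}\frac{-1}{x\ell}=-\frac{1}{\ell^2}$.

    Writing $\frac{1}{g(y)}$ as $h(g(y))$ and applying the Chain Rule, we have $\bigl(\frac{1}{g}\bigr)'(a)=h'(g(a))\,g'(a)=-\frac{1}{g(a)^2}\,g'(a)$ as claimed. Now, to obtain $\bigl(\frac{f}{g}\bigr)'$, apply the Product Rule to $f\cdot\frac{1}{g}$.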
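As an illustration of the Quotient Rule, the sketch below takes the sample functions f(y) = y^2 and g(y) = 1 + y^2 (chosen here only because g is never zero and both derivatives are 2y) and compares the formula with a central difference quotient. All names, the step size and the sample point are assumptions for the example.

```python
def central_difference(func, y, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for the derivative."""
    return (func(y + h) - func(y - h)) / (2 * h)

f = lambda y: y**2        # f'(y) = 2y
g = lambda y: 1 + y**2    # g'(y) = 2y, and g(y) is never zero

def quotient_rule(y):
    """(f/g)'(y) = (f'(y) g(y) - f(y) g'(y)) / g(y)^2 for this f and g."""
    return (2 * y * g(y) - f(y) * 2 * y) / g(y) ** 2

a = 0.7
numeric = central_difference(lambda y: f(y) / g(y), a)
print(numeric, quotient_rule(a))
```

For this pair the formula simplifies to 2y/(1 + y^2)^2, which gives a second way to check the printed value.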
