Thursday, June 9, 2016

Welcome to Calculus One Jim Fowler, P4

How can I sketch a graph by hand?


Suppose I've got some function given by a rule and I want to make a graph of that function. I want to plot, say, this function: f(x) equals 2x cubed minus 3x squared minus 12x. First thing I might do is just plug in some values. All right, I'll pick some inputs and I'll see what the function outputs at those inputs. And once I've got this table of values, I could then plot those points on a graph. The issue is, how do I really know what happens between these points that I plotted on the graph? How do I know the graph isn't doing some crazy wiggling in between? How do I know that I've really picked enough input points to get a good idea of what this graph is doing? We're going to use derivatives to make sure that we're really capturing the qualitative features of the function. I might have been trying to graph a function like f(x) equals sin(pi x), and if I just plugged in some whole number inputs, the function would always output 0. That might trick me into making a graph like this, where I plot 0 as the output for all these whole number inputs. I might then be tempted to just fill in this graph by drawing a straight line across. But that's totally ridiculous, right? This graph, you know, actually looks like this. Not a horizontal straight line. There's all kinds of extra wiggling happening that I missed because I chose my input points badly. We're going to use derivatives to make sure that we're really capturing the qualitative features of the function, and there's a ton of different ways to do this. So let's work this out in one specific concrete example. Let's keep working on the graph of this function, f(x) equals 2x cubed minus 3x squared minus 12x. First thing I'm going to do is differentiate this. The derivative is 6x^2 - 6x - 12, because the derivative of 2x^3 is 6x^2, the derivative of minus 3x^2 is minus 6x,
and the derivative of minus 12x is minus 12. There's a common factor of 6 here which I can pull out, and then I'm left with this quadratic, x^2 - x - 2, and I can factor that quadratic into (x+1) times (x-2). Now once I've got this nice factored version of the derivative, I can figure out where the derivative is positive and negative. The derivative is positive when the input is more negative than minus 1, and it's positive when the input is more positive than 2. In between -1 and 2, the derivative is negative. And at the point -1, and at the point 2, the derivative is equal to zero. Now, since this function is differentiable everywhere, the only critical points are where the derivative is equal to zero. These are the critical points, minus 1 and 2. Alright. So I found the derivative, and I found the critical points. Now, I'll also find the second derivative of this function, which I get by differentiating the derivative. If I differentiate 6x^2 I get 12x, if I differentiate minus 6x I get minus 6, and if I differentiate minus 12 I get zero. Again, I've got a common factor of six, so I'll pull that out and I'm left with 2x-1. And now I can think about the sign of the second derivative. And what do I know about that? Well, the second derivative is negative if I plug in an x value which is less than 1/2, and the second derivative is positive if I plug in an x value which is bigger than 1/2. All right, now I know a lot of information about the sign of the first and the second derivative, so I can use this information to say something about the function. Let me look back to my preliminary graph that I made by just plugging in a few points. All right, so here I plugged in a few points, and what I'd like to be able to say now is where the function is increasing and decreasing. And by looking at the sign of the first derivative, I know that the function's increasing, decreasing, and then increasing.
Minus 1 and 2 are my critical points and, in fact, they're local extrema. This is a local maximum value, and this is a local minimum value down here, and I can also see that by considering the information given in the sign of the second derivative. Since the second derivative's negative here, the function's concave down. And since the second derivative is positive over here, the function is concave up. And that makes this point into a local maximum and this point into a local minimum. Alright, now that I've got all that information, I can try to fix the graph here by filling it in. So let's see, I've got these points here, and what do I know? I know the function is increasing here, and I know that it's decreasing here. And I know that it's concave down in this region. Over the rest of the graph it's concave up. There's an inflection point here when x=1/2, and this point over here is a local minimum. The function's decreasing, by looking at the sign of the first derivative, until I get to two. And then when I get to two, the first derivative tells me the function's increasing. So there we go, I've drawn a graph of my function. The point here is not to capture a perfect picture of the function. It's like an impressionistic painting: the point is to capture all of the meaning, all of the emotion of the function. Compare that to a photograph, which might be a perfectly accurate portrayal but somehow misses everything that's essential. So here's the graph that I drew in red, and here is a more perfect graph, admittedly, that the soulless robot drew. And you'll see that my graph really is just as good. I mean, it captures all the qualitative information, which is really what a human being cares about. The function's increasing, decreasing, increasing. You can see where it's concave down and where it's concave up. And you can kind of see roughly where this function crosses the x axis. Let's summarize the situation.
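The sign analysis in this example can be checked with a short script. This is only a sketch: the helper names (`sign`, `sign_chart`) and the choice of sample points are assumptions made for illustration, not part of the lecture.

```python
# Sign chart for f(x) = 2x^3 - 3x^2 - 12x, the function sketched in the lecture.

def f(x):
    return 2 * x**3 - 3 * x**2 - 12 * x

def f_prime(x):
    return 6 * x**2 - 6 * x - 12      # equals 6(x + 1)(x - 2)

def f_double_prime(x):
    return 12 * x - 6                 # equals 6(2x - 1)

def sign(value):
    """Return '+', '-' or '0' according to the sign of value."""
    if value > 0:
        return "+"
    if value < 0:
        return "-"
    return "0"

def sign_chart(xs):
    """List (x, sign of f', sign of f'') for each sample input."""
    return [(x, sign(f_prime(x)), sign(f_double_prime(x))) for x in xs]

# Hypothetical sample points: one in each interval, plus the special inputs.
sample_points = [-2, -1, 0, 0.5, 1, 2, 3]
for x, s1, s2 in sign_chart(sample_points):
    print(f"x = {x:>4}: f(x) = {f(x):>6}, sign f' = {s1}, sign f'' = {s2}")
```

The rows at x = -1 and x = 2 show a zero first derivative with a negative and a positive second derivative respectively, which matches the local maximum and local minimum described above.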
There's really 4 basic pieces that you're just gluing together when you're doing a  lot of these curve sketching problems. It depends on the SIGN of the first  derivative, and the SIGN of the second derivative.  If the derivative is positive, and the second derivative is positive, then the  function is increasing, and the slopes of the tangent lines are increasing.  If the function's derivative is negative but the second derivative is positive,  that means although the function's decreasing, the slopes of those tangent  lines are increasing. We've got kind of complementary pictures  over here when the second derivative's negative, here the function's increasing  but the slopes of those tangent lines are decreasing, and here both the function is  decreasing and the slopes of the tangent lines are decreasing.  A lot of the curve sketching problems amount to just gluing together these four  basic pieces in the appropriate way.  
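The four basic pieces can be written as a tiny lookup on the two signs. This is a sketch for illustration; the function name `describe_piece` and the wording of the labels are assumptions, and the case where either derivative is zero is deliberately left out.

```python
def describe_piece(first_derivative, second_derivative):
    """Name the local shape of a graph from the signs of f' and f''.

    Only handles the four cases where both values are nonzero.
    """
    if first_derivative == 0 or second_derivative == 0:
        raise ValueError("this sketch only covers nonzero derivatives")
    direction = "increasing" if first_derivative > 0 else "decreasing"
    bend = "concave up" if second_derivative > 0 else "concave down"
    return f"{direction}, {bend}"

# The four pieces that get glued together in a curve sketch.
for d1 in (1, -1):
    for d2 in (1, -1):
        print(f"f' sign {d1:+d}, f'' sign {d2:+d}: {describe_piece(d1, d2)}")
```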

What is a function which is its own derivative?


Up until now, we've been considering the functions that you can get by starting with variables and numbers, and combining them using sums, products, quotients, and differences. So we can write down, you know, functions like f(x) = x^2 + x/(x+1)^10 + x, all of this, minus 1/x. But there are more things in heaven and earth than are dreamt of in your rational functions. For instance, can you imagine a function f which is its own derivative? I'm looking for a function such that, if I differentiate it, I get back itself. Now, if you're thinking cleverly, you might be able to cook up such a function very quickly. What if f is just the zero function? Well, if I differentiate the zero function, I'm differentiating a constant function, and that's zero. So this would be an example of a function which is its own derivative. But that's not a very exciting example. So let's try to think of a nonzero function which is its own derivative. How might we try to find such a function? To make this concrete, I'm looking for a function f so that if I differentiate it, I get itself, and just to make sure that it's not the zero function, let's have this function output one if I plug in zero. Now, how could I rig this function to have the correct derivative at zero? If the derivative of this function is itself, the derivative of this function at zero should also be one. Can you think of a function whose value at zero is one and whose derivative at zero is one? Yes. Here is a function, f(x)=1+x. This function's value at zero is one, and this function's derivative at zero is also one. But if the derivative of f is f, then the derivative of the derivative of f is also the derivative of f, which is also f. So the second derivative must be f as well. So, if this function is its own derivative, the second derivative of f would also be equal to f.
Now, specifically, at the point zero, that means the second derivative of the function at the point zero would be the function's value at zero, which should be equal to one. Is this function's second derivative at zero equal to one? No. If I differentiate this function twice, I just get the zero function, but I can fix this, at least at the point zero. If I add on x^2/2, now this function's derivative at zero is one and this function's second derivative at zero is one. Since f is its own derivative, the third derivative of f must also be f. No worries. If the third derivative of f is also equal to f, which is a consequence of the derivative of f being equal to f, that means the third derivative of f at zero is equal to one, but this thing's third derivative is just zero. But if I add on x^3/6, now, if I take the third derivative of this function and plug in zero, I get out one. The fourth derivative of f must also be f. Okay, yeah. I gotta deal with the fourth derivative. I'm out of space here, but no worries, I'll just get more paper. Here, I've written down a function whose value at zero is one, whose derivative at zero is one, whose second derivative at zero is one, whose third derivative at zero is one, whose fourth derivative at zero is one. And you can see, this is sort of building me closer and closer to a function which is its own derivative. If I try to differentiate this function, what do I get? Well, the derivative of one is zero, but the derivative of x is one, and the derivative of x^2/2 is x, and the derivative of x^3/6, well, that's x^2/2, and the derivative of x^4/24, well, that's x^3/6. And yeah, I mean, this function isn't its own derivative, but things are looking better and better. But the fifth derivative of f must also be equal to f. Okay, yeah. The fifth derivative. I'll just add on another term, x^5/120. And if you check, take the fifth derivative now of this function, its value at zero is one.
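The pattern in these polynomials can be checked with exact fractions: differentiating 1 + x + x^2/2 + ... + x^n/n! gives back the same polynomial with its last term missing. The sketch below stores a polynomial as a list of coefficients; the function names are assumptions chosen for illustration.

```python
from fractions import Fraction
from math import factorial

def partial_sum_coeffs(n):
    """Coefficients [c0, ..., cn] of 1 + x + x^2/2 + ... + x^n/n!."""
    return [Fraction(1, factorial(k)) for k in range(n + 1)]

def differentiate(coeffs):
    """Coefficients of the derivative of sum(c_k * x^k), by the power rule."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def derivatives_at_zero(coeffs):
    """k-th derivative at 0 of sum(c_k * x^k) is k! * c_k."""
    return [factorial(k) * c for k, c in enumerate(coeffs)]

p5 = partial_sum_coeffs(5)                      # up to the x^5/120 term
print(differentiate(p5) == partial_sum_coeffs(4))
print(derivatives_at_zero(p5))
```

The derivative always falls one term short of the original, which is why no polynomial of this form is exactly its own derivative.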
I've written down a function so that if I take its fifth derivative at zero, I get one. The sixth derivative of f must be equal to f. The sixth derivative. I am out of room, but here we go. Here is a polynomial whose value and first, second, third, fourth, fifth and sixth derivatives at the point zero are all one. And you can see how this is edging us a little bit closer still to a function which is its own derivative, because if I differentiate this function, yeah, the one goes away, but the x gives me the one back, and the x^2/2, when I differentiate that, gives me the x. x^3/6, when I differentiate that, gives me x^2/2. x^4/24, when I differentiate that, gives me x^3/6. x^5/120, when I differentiate that, gives me x^4/24. x^6/720, when I differentiate that, I've got x^5/120. And now, of course, these aren't the same, but I'm doing better. The seventh derivative must be equal to f. To get the seventh derivative at zero to be correct, I'll add on x^7/5040. The eighth derivative, I'll add on x^8/40,320. The ninth derivative, I'll add on x^9/362,880. Okay, okay. This isn't working out. We're not really succeeding in writing down a function which is its own derivative. Let's introduce a new friend, the number e, to help us. Here is how we're going to get to the number e. This limit, the limit of (2^h - 1)/h as h approaches zero, is about 0.69, a little bit more. On the other hand, this limit, the limit of three to the h minus one, over h, as h approaches zero is a little bit more than one, it's about 1.099. If you think of this as a function that depends not on two or three, you could define a function g(x), right? The limit as h approaches zero of x to the h minus one, over h. In that case, this first statement, the statement about the limit of two to the h minus one, over h, that's really saying that g(2) is a bit less than one.
And this statement over here, if you think of this as a function g, is really saying that g(3) is a bit more than 1. Now, if you're also willing to concede that this function g is continuous, which is a huge assumption to make, but let's suppose that's the case. If that's the case, I've got a continuous function, let's say, and if I plug in two, I get a value that's a little bit less than one, and if I plug in three, I get a value that's a little bit more than one. Well, by the intermediate value theorem, that would tell me there must be some input so that the output is exactly one. I'm going to call that input e. In other words, e is the number so that the limit of (e^h - 1)/h as h approaches zero is equal to one, and this number is about 2.7183 blah, blah, blah. Now let's consider the function f(x)=e^x. So let's think about this function f(x)=e^x. Now, what's the derivative of this function? Well, from the definition, that's the limit as h approaches zero of f of x plus h minus f of x, over h. Now, in this case f is just e^x, so this is the limit as h approaches zero of e to the x plus h minus e to the x, over h. And e to the x plus h I can write as e to the x times e to the h. This is the limit, then, as h goes to zero of e to the x times e to the h, minus e to the x, over h. Now I've got a common factor of e to the x. So I'll pull out that common factor and I've got the limit as h approaches zero of e to the x times the quantity e to the h minus one, over h. Now, as far as h is concerned, e to the x is a constant, and this is the limit of a constant times something, so I can pull that constant out. This is e to the x times the limit as h goes to zero of e to the h minus one, over h. But I picked the number e precisely so that this limit was equal to one. And consequently, this is e to the x times one, this is just e to the x. Look, I've got a function whose derivative is the same function.
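Written out in symbols, the spoken computation above reads as follows. The only input beyond algebra is the defining property of e, namely that the limit of (e^h - 1)/h as h approaches zero equals one.

```latex
\begin{align*}
\frac{d}{dx}\, e^{x}
  &= \lim_{h \to 0} \frac{e^{x+h} - e^{x}}{h} \\
  &= \lim_{h \to 0} \frac{e^{x} e^{h} - e^{x}}{h} \\
  &= \lim_{h \to 0} e^{x} \cdot \frac{e^{h} - 1}{h} \\
  &= e^{x} \cdot \lim_{h \to 0} \frac{e^{h} - 1}{h} \\
  &= e^{x} \cdot 1 = e^{x}.
\end{align*}
```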
We've done it. We've found a function which is its own derivative. The derivative of e^x is e^x. e^x is honestly different from those polynomials and rational functions. We couldn't have produced that number e without using a limit. e^x is the function that only calculus could provide us with.
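The way e was pinned down can be imitated numerically. The sketch below approximates g(a), the limit of (a^h - 1)/h, by using one small fixed h, and then bisects between 2 and 3 for the input where g equals one. The step size, the number of bisection steps, and the function names are all assumptions; the lecture itself only argues that such an input exists, under the assumption that g is continuous.

```python
import math

def g(a, h=1e-6):
    """Rough stand-in for the limit of (a**h - 1)/h as h -> 0, using a fixed small h."""
    return (a**h - 1) / h

def find_e(lo=2.0, hi=3.0, steps=50):
    """Bisect for the input where g is 1, assuming g(lo) < 1 < g(hi)."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if g(mid) < 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print("g(2) is about", round(g(2), 4))
print("g(3) is about", round(g(3), 4))
print("estimate of e:", find_e(), " math.e:", math.e)
```

Because h is small but not zero, the estimate agrees with `math.e` only to about five decimal places.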


Is there anything more to learn about derivatives?



What is the chain rule?


Many of the functions that we'd most like to differentiate are actually compositions of two different functions. This happens in the real world, too. I mean, look, if you change the number of flowers, that's going to affect, say, how many rabbits there are around to, you know, eat those flowers. And if you change the number of rabbits, that'll affect how many wolves that forest can support. There are some really concrete examples of this. Here's a concrete example. Suppose that f of x is the number of widgets produced with an investment of x dollars, right? With more money, maybe, you can build more widgets. Suppose g of n is the income that you get by selling those n widgets. What you're probably really interested in is not exactly how many widgets you produce. What you'd like to know is, for a given investment, how much money are you going to make, right? Well, that's g of f of x minus your initial investment, right? g of f of x is how much money comes in when you sell the widgets that you produced with your initial investment of x dollars. This quantity is measuring the profit on an investment of x dollars in widget production. We need some framework, some general picture that lets us understand how one thing changing affects something else, and how that thing's changing goes on to affect something else. Specifically, if I've got some function h which is a composition of two functions, g of f of x in this case, I'd like to know something about the derivative of h. I want to know how changing x affects f and then how changing f goes on to affect g. And I'd like some sort of formula that gives me that answer, right? I'd like to know the derivative of h in terms of information about how changing x affects f and how changing the input to g affects g. I want a formula for the derivative of h in terms of the derivative of f and the derivative of g. This is exactly what the chain rule does.
What the chain rule says is that the derivative of the composition is the derivative of g evaluated at f of x, times the derivative of f evaluated at x. Sometimes people have the idea that the chain rule looks somehow wrong, that you'd really expect the formula to look very different. I mean, sometimes people think this formula looks a little bit weird, you know? I'm composing functions, but now it's the derivative of g composed with just the function f. What's going on? Given the fact that the derivative of a sum is the sum of the derivatives, you might be tempted to think that the derivative of a composition should be the composition of the derivatives, but that's not the case. The chain rule really is capturing what happens when you chain together these changes. So let's think about this chain rule, the derivative of g of f of x is g prime of f of x times f prime of x, in terms of chaining together different changes. What I'm trying to calculate is how changing x changes g of f of x, right? This is the derivative of the composition. What do I know? Well, I know how changing x will change f of x, right? This is what the derivative of f is measuring. The derivative is the ratio of output change to input change. Now, in between here, what I have is how the change in f of x will change g of f of x. This ratio of changes is really the derivative of g at the point f of x. What is the derivative? You plug in an input to the derivative to ask how wiggling that input would affect the output, and that's exactly what this ratio is. I'm asking how changing f of x will affect g of f of x, right? That's the derivative of g at the point that's wiggling, f of x. Well, if you think about it, now, if I just multiply these two things together, then I get the change in g of f of x divided by the change in x. This is the chain rule, right?
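In symbols, the rule and the "ratio of changes" picture just described look like this. The Delta notation is an informal shorthand introduced here for small changes; it is a heuristic, not a proof.

```latex
\[
  \frac{d}{dx}\, g\bigl(f(x)\bigr) = g'\bigl(f(x)\bigr) \cdot f'(x)
\]
\[
  \frac{\Delta\, g(f(x))}{\Delta x}
  \;=\;
  \underbrace{\frac{\Delta\, g(f(x))}{\Delta f(x)}}_{\approx\; g'(f(x))}
  \cdot
  \underbrace{\frac{\Delta f(x)}{\Delta x}}_{\approx\; f'(x)}
\]
```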
If I multiply together g prime of f of x and f prime of x, what I'm left with is exactly what I want, the derivative of g of f of x. You can see this pictorially as well. So here, I've drawn three number lines. On the first number line, I've drawn x, and I imagine x is the input to f. On the second number line, I've drawn f of x, and f of x is now the input to g. And on the last number line, I've drawn g of f of x. The essential question answered by the derivative is how changing x will affect g of f of x. But since this is a composition of functions, I'm going to analyze the effect of changing x on g of f of x in stages, right? I'm first going to see how changing x affects f of x, and then how changing f of x affects g of f of x. So let's imagine that I change x by a small quantity. I'm calling that small quantity h here. h is not a function, just some small number, the amount by which I'm wiggling the input. Now, how is the output affected? Well, that's exactly what the derivative measures, right? The derivative of f at x tells me how wiggling the input x would affect the output. So f prime of x, which is the ratio of output change to input change, times an actual input change gives me a first order approximation of the output change. So I imagine the output is changing by about f prime of x times h. Now, how does that change in the value of f of x affect g? Well, I have to figure out how wiggling the input to g will affect the output of g, and that depends on where I'm calculating the derivative. I need to calculate the derivative of g at the point f of x, because f of x is the point that's doing the wiggling. So it's the derivative of g at the point f of x that tells me how wiggling the input around f of x would affect the output of g. So it's that derivative times the amount by which the input changed, which is this quantity here, f prime of x times h.
And when you look at it this way, you can see that for an input change to x of some small amount h, the output changes by about g prime of f of x times f prime of x as much, which is exactly what the chain rule is telling me should be the case. So this is the correct rule: the chain rule really is the derivative of the outside at the inside, times the derivative of the inside function. Let's try to see a numerical example of this thing in action. So as a numerical example, let's consider the function g of x equals x to the 4th power and the function f of x equals 1 plus x to the 3rd power. And maybe what I want to try to estimate is g of f of 1.0001. Approximately what is that equal to? Well, it's not too hard to calculate g of f of 1, right? What's f of 1? Well, that's 1 plus 1 cubed, that's 2. So what's g of 2? Well, that's 2 to the 4th, that's 16. So I know that g of f of 1.0001 is going to be close to 16. The question is, how is wiggling the input up to 1.0001 going to affect the output of this composition of functions? Well, I could do it in stages, right? That's what the chain rule's telling me to do. So I could calculate first the derivative of f at 1, right? The derivative of f is 3x squared, so the derivative of f at 1 is 3. And indeed, if I calculate f of 1.0001, that's about 2.0003 and a bit more. Now, I want to try to calculate how changing the input to g will affect the output of g. So I should calculate the derivative of g, and that's 4x cubed by the power rule, but where should I evaluate the derivative of g? Your first temptation is to calculate the derivative of g at 1, but that is not a good idea, because you're not wiggling the input 1 to g. What you really should be calculating is the derivative of g at 2, because it's this 2 that's going to be wiggling. When you wiggle the input to f, it's the output of f, f of 1, that's going to be changing, so you should calculate the derivative of g there, and what is that?
That's 4 times 2 cubed, that's 4 times 8, that's 32. So what we're trying to calculate is g of f of 1.0001, and we know that that's about g of, well, what's f of 1.0001? It's about 2.0003. So what happens when I wiggle the input of g from 2 to 2.0003? Well, that should be about the output of g at 2, which is 16, plus how much I change the input by, times the derivative of g at the point where the wiggling is happening, which is 2, and that's 32. And what's 16 plus 0.0003 times 32? That's 16.0096. So g of f of 1.0001 is about 16.0096. And you can see this 96 just from the chain rule, right? The relevant thing to calculate is g prime of f of 1 times f prime of 1. This is going to tell me how wiggling the input 1 affects the output, and g prime of f of 1 is 32, f prime of 1 is 3, and 32 times 3 is 96. So that's the chain rule, and it's going to take some time for the chain rule to really sink in. But the chain rule is super important for two very different reasons. On the one hand, you've gotta know the chain rule just to be able to compute derivatives. A lot of the functions that you'll be asked to differentiate are actually compositions of differentiable functions, so you'll need to use the chain rule to finish those derivative calculations. But on the other hand, you've gotta know the chain rule just to understand how chained together changes work. In the real world, a lot of things change, and those changing things affect other things, and those changing things then go on to affect yet other things. And you've got to understand how those changes get composed together in order to really understand how the real world works.
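The numerical example can be replayed in a few lines. This is a sketch using the same g(x) = x^4 and f(x) = 1 + x^3; the variable names are chosen here for illustration.

```python
def f(x):
    return 1 + x**3

def f_prime(x):
    return 3 * x**2          # power rule; the constant 1 contributes nothing

def g(x):
    return x**4

def g_prime(x):
    return 4 * x**3

x0, dx = 1.0, 0.0001

# Chain rule: derivative of the outside at the inside, times derivative of the inside.
chain_slope = g_prime(f(x0)) * f_prime(x0)      # 32 * 3
estimate = g(f(x0)) + chain_slope * dx          # first-order estimate
exact = g(f(x0 + dx))                           # direct evaluation for comparison

print("chain rule slope:", chain_slope)
print("estimate:", estimate)
print("exact:   ", exact)
```

The estimate and the direct evaluation differ only by second-order terms in the small change `dx`.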


What is the derivative of (1+2x)^5 and sqrt(x^2 + 0.0001)?


I want to differentiate really complicated functions. As a concrete example, take a look at the function f of x equals 1 plus 2x, to the fifth power. Let's try to differentiate this function. We could approach this in a couple different ways. First of all, I could just expand it out. Alright. So I'm just going to expand this out: 1 to the fifth is just one, plus ten x, plus 40 x squared, plus 80 x cubed, plus 80 x to the fourth, plus 32 x to the 5th. Now, it's just a polynomial, so I can fearlessly differentiate it. So, f prime of x, by differentiating this: the derivative of one is zero, the derivative of ten x is ten, the derivative of 40 x squared is 80 x, and then 240 x squared, 320 x cubed, and 160 x to the fourth. Of course, if we're clever at this point, we can also see that this mess factors. It's sort of believable that there's a factor of ten here, since all of these coefficients end in a zero. This is ten times one plus eight x plus 24 x squared plus 32 x cubed plus 16 x to the fourth. What's way less obvious, I mean not obvious at all, is that this mess also factors. It happens to be one plus two x, to the fourth power. This is not an accident. What if we instead applied the chain rule to the original problem? So let's compute the derivative with the chain rule. The first step is we're going to split up the function f into a composition of two functions, g and h. g here, the outside function, is the fifth power function, and h, the inside function, is one plus two x. So if I combine those two functions, take the composition, I get back f. Now, I want to differentiate f and, by the chain rule, that's the derivative of the outside at the inside function, times the derivative of the inside function. In this case, what is the derivative of the outside function? The derivative of g is five x to the fourth. So I'm going to take that but evaluate it at h: five times h of x to the fourth, multiplied by the derivative of h. What is the derivative of h? Well, it's two.
Well, look what I got here. I've got five times, h of x is one plus two x, so one plus two x to the fourth, times two. That's ten times one plus two x to the fourth, and that's exactly what we calculated before. It's a really nice example, because it shows that we're doing the same calculation. We're calculating the derivative of the function one plus two x to the fifth power, but we're doing it in two different ways, and nevertheless we get the same answer. Somehow, mathematics is conspiring to be consistent. Okay, well, let's try another example. Here's a more complicated function, f of x equals the square root of x squared plus 0.0001. What's the derivative of f? We can't simply expand this function out, and in fact, if you graph the function, you might think that the function is not differentiable, because the graph of the function has this sharp corner at the origin. But let's zoom in and see what this actually looks like if we zoom in close enough. If we zoom in close enough, the thing doesn't look like it has a sharp corner anymore. It actually looks like it's curved, and if we zoom in any further, the thing would look more and more like a straight line. What we're really seeing is that the function is differentiable. Now, we can verify this algebraically. We can use our derivative laws, like the chain rule, to actually calculate the derivative of this function. We'll differentiate this by using the chain rule, since this is really a composition of two functions. This is a composition of the square root function and this polynomial, x squared plus 0.0001. Alright, so the derivative of f is the derivative of the outside function, which is the derivative of the square root, which is 1 over 2 times the square root. And it's the derivative of the outside function evaluated at the inside, which is x squared plus 0.0001. I have to multiply by the derivative of the inside function. What is the derivative of x squared plus 0.0001?
Well, that's just the derivative of x squared, since 0.0001 is a constant, and the derivative of x squared is two x. So the derivative of f is one over two times the square root of x squared plus 0.0001, times two x. I could make that a little bit nicer looking. I could cancel these twos and write this as x over the square root of x squared plus 0.0001. What happens at zero? So let's compute the derivative at zero. Well, if I plug in zero for x, I've got zero over the square root of zero squared plus 0.0001. The denominator is not zero, the numerator is zero, so the derivative at zero is zero, and you can see that from the graph. If I look at when x equals zero, the tangent line at that point is horizontal, the slope of that tangent line is zero. The derivative at zero is zero, and there are more awesome things that you can see by looking at the derivative. If you look at, say, the limit of the derivative as x approaches infinity, that's the limit of this quantity, which is one, and the limit of the derivative as x approaches minus infinity is negative one, and you can see this visibly on the graph of the function. If you plug in a really big number and look at the tangent line there, that tangent line has slope close to one. And if you plug in a really negative number and look at the tangent line there, that tangent line has slope close to minus one. Our derivative rules are revealing facts that are hidden. This function looks like it's got a sharp corner, but we know, by applying our differentiation rules, by using the chain rule, that this function is in fact differentiable. And we know that if we zoom in close enough, the thing looks like a straight line. The derivative rules are really revealing this structure at very small scales.
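Both examples can be spot-checked numerically. The sketch below compares the expanded derivative of (1+2x)^5 with the chain rule answer, and evaluates the derivative of the square root function at zero and at very large inputs. The function names and the test inputs are assumptions chosen for illustration.

```python
import math

def expanded_derivative(x):
    """Derivative of (1 + 2x)^5 found by expanding first."""
    return 10 + 80 * x + 240 * x**2 + 320 * x**3 + 160 * x**4

def chain_rule_derivative(x):
    """Derivative of (1 + 2x)^5 found with the chain rule: 5(1 + 2x)^4 * 2."""
    return 10 * (1 + 2 * x) ** 4

def sqrt_example(x):
    return math.sqrt(x**2 + 0.0001)

def sqrt_example_derivative(x):
    """Chain rule answer: x / sqrt(x^2 + 0.0001)."""
    return x / math.sqrt(x**2 + 0.0001)

# The two routes to the first derivative agree at every integer tried.
print(all(expanded_derivative(x) == chain_rule_derivative(x) for x in range(-5, 6)))

# No corner at the origin: the slope there is exactly zero.
print(sqrt_example_derivative(0))

# Far from the origin the slopes approach 1 and -1.
print(sqrt_example_derivative(1e6), sqrt_example_derivative(-1e6))
```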


What is implicit differentiation?


Sometimes you don't have a function, you have a relation between two variables. A classic example is x squared plus y squared equals, say, 25. The graph of the points in the plane that satisfy this equation is a circle. But that's not the graph of a function, right? This graph fails the vertical line test. For a given input value, say 4 in this case, there are multiple y values which satisfy this equation. So, I can't simply solve this equation for y. Nevertheless, if you pick a specific point like 4, 3, you might be able to find a function whose graph traces out that same curve. So yeah, if I pick 4, 3, there is a function, y equals the square root of 25 minus x squared, which traces out a piece of the whole curve, right? I'm just ignoring the rest of this, and this little tiny piece of the curve can be regarded as a function. If I had picked a different point, then I'm going to pick a different function. Instead of the square root of 25 minus x squared, if I wanted to stand down here, near the point 4, minus 3, well, then maybe I'd pick the function y equals negative the square root of 25 minus x squared. If I ignore the rest of this and I'm just looking at this curve here, yeah, this curve by itself is a function. If I ignore this, it satisfies the vertical line test. This function is picking out a piece of the curve given by this equation which is, yeah, only valid near the point 4, minus 3. But maybe that's all I care about for the time being. So, let's say there is a function, y equals f of x, that satisfies the original equation. Well then, I can write that down. Saying y equals f of x satisfies the equation just means that x squared plus f of x squared equals 25. Now, I'm not saying that this gives me all of the solutions, right? The graph of x squared plus y squared equals 25 is a circle, which fails the vertical line test. There is no function that gives me all those outputs, because there are multiple outputs for a given input.
All I'm saying is that I've got some function which traces out a piece of the whole curve. Then, I can differentiate. So, this is true for a bunch of values of x, which means I can differentiate it. The derivative of this sum is the sum of the derivatives, so it's the derivative of x squared, which is 2x, plus the derivative of f of x squared. I'm going to use the chain rule to do that. It's the derivative of the outside function at the inside times the derivative of the inside function, that's 2 times f of x times f prime of x, and all of this equals the derivative of 25, which is zero. Now I can solve. So, subtract 2x from both sides and I'm left with 2 times f of x times f prime of x equals negative 2x. And then, I'll divide both sides by 2 times f of x. And I'll find that f prime of x is minus 2x over 2 f of x, and I can cancel those 2's and just get minus x over f of x. It seems like a funny situation. The derivative depends on more than just x. It also has an f of x in it. Another way to say it is that the slope of the tangent line, dy dx, is negative x over y, right? y is f of x. And it does really seem a little bit off-putting initially that in these kinds of calculations the slope of the tangent line depends on more than just x. It's negative x over y for this particular case. But think back to the picture for this case, right? The picture's a circle. And what I'm saying is the slope of the tangent line is negative x over y. So, if you pick that point, say 4, 3, and you ask what's the slope of the tangent line to the circle at the point 4, 3, this equation is telling you the slope is negative 4 thirds. And yeah, that line is going down, the slope's negative. What's the slope of the tangent line to the curve at the point 4, negative 3? The same equation tells us that the slope there is 4 thirds. And yeah, this line's going up. The slope of the tangent line is depending on more than just the x coordinate, right? You also need to know the y coordinate in order to know exactly what function you're actually looking at near that point.
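The slope formula negative x over y can be compared against a direct numerical slope on each branch of the circle. In this sketch the branch functions `upper` and `lower`, and the centered difference with step `h`, are assumptions made for illustration.

```python
import math

def upper(x):
    """Branch of x^2 + y^2 = 25 through (4, 3)."""
    return math.sqrt(25 - x**2)

def lower(x):
    """Branch of x^2 + y^2 = 25 through (4, -3)."""
    return -math.sqrt(25 - x**2)

def implicit_slope(x, y):
    """Slope from implicit differentiation: dy/dx = -x / y."""
    return -x / y

def numeric_slope(branch, x, h=1e-6):
    """Centered difference quotient on one explicit branch."""
    return (branch(x + h) - branch(x - h)) / (2 * h)

print(implicit_slope(4, 3), numeric_slope(upper, 4))     # both near -4/3
print(implicit_slope(4, -3), numeric_slope(lower, 4))    # both near 4/3
```

The same x coordinate gives two different slopes, because the y coordinate decides which branch the point sits on.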
And that totally affects the slope of that tangent line. To do all these sorts of calculations, the trick is the chain rule. For instance, suppose you're given some relation like this, x squared plus y cubed equals 1. You've just got to make sure to think of y as a function of x. So that when you differentiate both sides, the derivative of the left hand side is 2x plus the derivative of y cubed, and that equals the derivative of 1, which is 0. But what's the derivative of y cubed? If y is a function of x, then when you differentiate this, you've got to use the chain rule. It's 3 times the inside function squared, that's the derivative of the third power function, times the derivative of the inside function. I'll just write y prime. And as long as you're careful to use the chain rule, you'll be able to do these kinds of implicit differentiation problems. And you'll eventually solve for y prime in terms of both x and y. The chain rule is our friend.
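
As a quick numerical check of the circle example above, here is a minimal Python sketch. The function names (`upper_branch`, `lower_branch`, `implicit_slope`, `central_difference`) are made up for this illustration and are not from the lecture. It compares the implicit formula, negative x over y, against a finite-difference slope on the explicit pieces of the circle near 4, 3 and near 4, minus 3.

```python
import math

def upper_branch(x):
    # The piece of the circle x^2 + y^2 = 25 through (4, 3): y = sqrt(25 - x^2).
    return math.sqrt(25 - x * x)

def lower_branch(x):
    # The piece of the circle through (4, -3): y = -sqrt(25 - x^2).
    return -math.sqrt(25 - x * x)

def implicit_slope(x, y):
    # Slope from implicit differentiation of x^2 + y^2 = 25: dy/dx = -x / y.
    return -x / y

def central_difference(f, x, h=1e-6):
    # Symmetric finite-difference approximation to f'(x).
    return (f(x + h) - f(x - h)) / (2 * h)

# Compare the implicit formula with a numerical slope on each branch.
print(implicit_slope(4, 3), central_difference(upper_branch, 4))
print(implicit_slope(4, -3), central_difference(lower_branch, 4))
```

Both lines should print two nearly equal numbers, negative on the upper branch and positive on the lower one, which is the point of the lecture: the slope depends on which piece of the curve you are standing on.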

What is the folium of Descartes?


The folium of Descartes is an algebraic curve carved out by a certain equation. By which equation? This equation, x cubed plus y cubed minus 3axy equals 0. It's the points on the plane that satisfy this equation. So, what's a folium? Well, folium is just a Latin word for leaf, you know, the sorts of things that grow on trees. So, where's the leaf? Well, here's the leaf. I've plotted the points on the plane that satisfy x cubed plus y cubed minus 9xy equals 0, which is the case a equals 3. And this is the curve that I get, and you can see it looks kind of like a leaf. This is not the graph of a function, it's really a relation. x cubed plus y cubed minus 9xy is a polynomial in two variables, in both x and y. I can't simply solve for y in terms of x. Look, this graph fails the vertical line test. For a given value of x, there are potentially multiple values of y which will satisfy this equation. So, what's the point of all this? Well, once upon a time, Descartes challenged Fermat to find the tangent line to this folium. And Descartes couldn't do it but Fermat could. And now, so can you. And you can do it with implicit differentiation. So, let's use implicit differentiation on this, thinking of y secretly as a function of x. So, the derivative of x cubed is 3x squared. The derivative of y cubed, well, that's 3y squared times dy dx, that's really the Chain rule in action. Minus, now I've got to differentiate this 9xy, which is a product. It will be minus 9 times the derivative of x, which is 1, times y, minus 9x times the derivative of y, which is dy dx, and that's equal to 0. Alright, now I can rearrange this, the things with the dy dx and the things without the dy dx, and gather them together. So, 3x squared minus 9y plus, and now the things with the dy dx term, 3y squared minus 9x, times dy dx, equals 0. Now, I'm going to subtract this from both sides. So, I'll have 3y squared minus 9x times dy dx equals minus 3x squared plus 9y. 
And I'm going to divide both sides by this, so I'll have dy dx equals minus 3x squared plus 9y over 3y squared minus 9x. And note that we're calculating dy dx but the answer involves both x and y. And you can see, it's really working. I can pick a point on this curve, like the point 4, 2, which satisfies this equation. Then, I can ask what's the slope of the tangent line to the curve through the point 4, 2? When I go back to our calculation of the derivative and I plug in 4 for x and 2 for y, I get negative 30 over negative 24, so the derivative is 5/4. And indeed, I mean, this graph is somewhat stretched, but, you know, yeah, I mean that doesn't look terribly unreasonable for the slope of this line. Problems like this one, which once stumped the smartest people on earth, can now be answered by you, by me, by lots and lots of people. Calculus is part of a human tradition of making not just impossible things possible, but things that were once really hard much easier. Well, in any case, there's plenty more questions that you can ask about different kinds of curves besides this folium of Descartes. You can write down some polynomial with x's and y's, like y squared minus x cubed minus 3x squared equals 0, and then you can ask about the points, the x comma y's, that satisfy this equation. And if you want to know the slope of the tangent line, use implicit differentiation. The trick is just to use the Chain rule and to treat y as a function of x. 
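
To double-check the folium calculation above, here is a small Python sketch. The helper names (`folium`, `folium_slope`, `nearby_y`) are hypothetical, and using Newton's method to follow the curve is my own choice for the check, not something from the lecture. It confirms that 4, 2 lies on the curve x cubed plus y cubed minus 9xy equals 0, evaluates the implicit-differentiation formula there, and compares with a slope measured directly from nearby points on the curve.

```python
def folium(x, y):
    # Left-hand side of the folium equation x^3 + y^3 - 9xy = 0.
    return x**3 + y**3 - 9 * x * y

def folium_slope(x, y):
    # dy/dx from implicit differentiation: (-3x^2 + 9y) / (3y^2 - 9x).
    return (-3 * x**2 + 9 * y) / (3 * y**2 - 9 * x)

def nearby_y(x, y_guess, steps=50):
    # Newton's method in y (x held fixed) to land back on the curve near y_guess.
    y = y_guess
    for _ in range(steps):
        y -= folium(x, y) / (3 * y**2 - 9 * x)
    return y

# Slope measured from two points on the curve just left and right of x = 4.
h = 1e-5
numeric_slope = (nearby_y(4 + h, 2.0) - nearby_y(4 - h, 2.0)) / (2 * h)

print(folium(4, 2))        # 0 means the point (4, 2) is on the curve
print(folium_slope(4, 2))  # slope from the formula
print(numeric_slope)       # slope measured along the curve
```

The formula only makes sense where 3y squared minus 9x is nonzero, which is also where this Newton step is well defined.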


How does the derivative of the inverse function relate to the derivative of the original function?


I often want to differentiate an inverse function. Say, I've got a function f. The derivative of f encodes how wiggling the input affects the output. The derivative of the inverse function would encode how changes to the output affect the input. Here's a theorem that I can use to handle this situation. Here is the inverse function theorem. I'm going to suppose that f is some differentiable function, f prime is continuous, the derivative is continuous. And the derivative, at some point, a, is nonzero. In that case, I get the following fantastic conclusion. The inverse function at y is defined for values of y near f of a. So, the function f is invertible near a. The inverse function is differentiable for inputs near f of a. And that derivative is continuous for inputs near f of a. And I've even got a formula for the derivative. The derivative of the inverse function at y is 1 over the original derivative, the derivative of the original function, evaluated at the inverse function of y. How can I justify a result like that? Why should something like that be true? One way to think about this is geometrically. Here, I've drawn the graph of just some made up function, y equals f of x. What does the graph of the inverse function look like? Well, one way to think about this is that the inverse function exchanges the roles of the x and y axes, which is the same as just flipping it over, alright? What was the y-axis is now the x-axis, and what was the x-axis is now the y-axis. And this graph here is y equals f inverse of x. This is how you graph the inverse function. Alright. So, let's go back to the original function, and if I put down a tangent line to the curve at some point, let's say that tangent line has slope m. Well, what's the slope of the tangent line to the inverse function? That would be the derivative of the inverse function. Well, if I flip over the graph again to look at the graph of the inverse function, I can put down a tangent line to the inverse function. 
And that has slope 1 over m. If m was the original slope for the tangent line to the original function, 1 over m is the new slope of the tangent line to the inverse function. Why 1 over m? Well, that makes sense because I got this graph by exchanging the roles of the x and y-axis, by flipping the paper over. And that exchanges rise for run, and run for rise. So, the slope becomes the reciprocal of the old slope. This slope business is reflected in the notation, dy dx. So, let's suppose that y is f of x, so x is f inverse of y, supposing that this is an invertible function. If y is f of x, then f prime of x could be written dy dx. And if x is f inverse of y, then the derivative of the inverse function at y, well, that's asking how changing y changes x, so I could write that as dx over dy. Well, if you really take this notation seriously, what it looks like it's saying is that dx dy, which is the derivative of the inverse function, should be 1 over dy dx, right? The derivative of the inverse function is 1 over the derivative of the original function. But you have to think about where these derivatives are being computed. So, maybe you believe that dx dy is 1 over dy dx; it makes sense that if you exchange the roles of x and y, that takes the reciprocal of the slope of the line. But where is this wiggling happening, right? dy dx is measuring how wiggling x affects y. Wiggling around where? Well, let's suppose that I'm wiggling around a. So, I'm really calculating dy dx when x, say, is at a. This is the quantity that records how wiggling x near a will affect y. Well then, where's y wiggling? Well, if x is wiggling around a, y is wiggling around f of a. So, the derivative on this side is really being calculated at y equals f of a. And it's really necessary to keep track of where this wiggling is happening in order to get a valid formula. It's actually easier to think about what's going on if we just phrase all of this in terms of the Chain rule. 
So, what do I know about the inverse function? Well, here's f inverse. f of f inverse of x is just x. Alright, what does the inverse function do? Whatever you plug into the inverse function, it outputs whatever you need to plug into f to get out the thing you plugged into the inverse function. Alright. So, this is true. Now, if I differentiate both sides, assuming that f and f inverse are differentiable, then by the Chain rule, what do I get? Well, the derivative of this composition is the derivative of the outside at the inside times the derivative of the inside. And that's equal to the derivative of the other side, and the derivative of x is just 1. Now, I'll divide both sides by f prime of f inverse of x and I get that the derivative of the inverse function at x is 1 over f prime of f inverse of x. Is that a proof? Absolutely not. The embarrassing truth is that this argument assumes the differentiability of the inverse function. If this function, f inverse, is differentiable, then the Chain rule can be applied to it. The Chain rule requires that the functions be differentiable. Now, if the function is differentiable, then this Chain rule calculation tells me that the derivative of the inverse function is this quantity. But that's all predicated on knowing that the inverse function is differentiable. How do we know that? Well, that's actually the content of this theorem, right? The content of the inverse function theorem is not really the calculation of the derivative of the inverse function. It's really just the fact that the inverse function is differentiable at all. That is a huge deal, and it's not something that we can just get from the Chain rule. Once we know that the inverse function is differentiable, then the Chain rule gives us this calculation. But actually verifying that the inverse function is differentiable is really quite deep; that's why the inverse function theorem is such a big deal. 
The Chain rule requires that the functions I'm applying the Chain rule to be differentiable. In contrast, the inverse function theorem is asserting the differentiability of the inverse function. It's really saying much more than just a computation of the derivative if the derivative exists. It's actually telling me that the derivative exists. I'm going to have to punt on saying much more about the proof of the inverse function theorem. But nevertheless, we can now apply the inverse function theorem to some concrete examples. For example, think about the function f of x equals x squared. Well, what's the inverse function to this? Let's suppose the domain is just the nonnegative real numbers. Then, the function's invertible on that domain, and we know the name of the inverse: it's the square root of x. What's the derivative of the original function? Well, we know that it's 2x, and the derivative is continuous, and the derivative is not 0 provided that x is positive. This is all the stuff that we need to apply the inverse function theorem. Then, we know that the derivative of the inverse function at x is 1 over the original derivative at the inverse of x. Now, the inverse function is the square root of x, so that's 1 over f prime of the square root of x, and what's f prime? f prime is the function that doubles its input. So, that's 1 over 2 square roots of x. So, the derivative of the inverse function, the derivative of the square root function, is 1 over 2 square roots of x, provided x is bigger than 0, right? Just like before, this is a calculation of the derivative of the square root function. We can also see this numerically. So, the square root of 10,000 is 100, and you might ask what do you have to take the square root of to get about 100.1? Say, as some numeric example. Well, think now about the functions that are involved here. There's the squaring function and the square root function. 
We saw the derivative of the square root function is 1 over 2 square root x, and the derivative of x squared, we already know, is 2x. Where are we evaluating these functions? Well, I'm evaluating the square root function at 10,000, right? This is at x equals 10,000. And if I evaluate that derivative at 10,000, that's 1 over 2 times the square root of 10,000, that's 1 over 200. Where am I evaluating the other function, the x squared function? Well there, I'm really thinking of 100 as the input, so I'll evaluate that derivative at 100, and 2x when x is 100 is 200. And it's not too surprising, right, that 1 over 200 and 200 are reciprocals of each other, because I'm calculating derivatives of a function and the inverse function at the appropriate places. Now, let's try to answer the original question. I'm trying to figure out, what do I have to take the square root of to get about 100.1? Well, the ratio here is about 200 between the input and the output. So, if I want the output to be affected by 0.1, I should try to change the input by about 200 times as much, and 200 times 0.1 is 20, so I should try to change the input by about 20. And sure enough, if you take the square root of 10,020, that's awfully close to 100.1. I hope that you'll play around with these numbers. All the conceptual stuff that we're doing, these theorems, I'm not telling you these theorems to make numbers boring, right? I'm telling you all these theorems to heighten your appreciation of the numerical examples.
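
In the spirit of playing around with these numbers, here is a minimal Python sketch of the square root example. The names `f`, `f_prime`, `f_inverse`, and `inverse_derivative` are just labels I chose for this illustration. It applies the formula from the inverse function theorem, 1 over f prime at f inverse of y, and then uses that slope to estimate which input has square root about 100.1.

```python
import math

def f(x):
    # The original function on nonnegative inputs: squaring.
    return x * x

def f_prime(x):
    # Its derivative: the function that doubles its input.
    return 2 * x

def f_inverse(y):
    # The inverse of squaring on nonnegative inputs: the square root.
    return math.sqrt(y)

def inverse_derivative(y):
    # Inverse function theorem formula: (f^{-1})'(y) = 1 / f'(f^{-1}(y)).
    return 1 / f_prime(f_inverse(y))

slope_of_sqrt = inverse_derivative(10_000)  # derivative of sqrt at 10,000
needed_change = 0.1 / slope_of_sqrt         # input change for an output change of 0.1
estimate = 10_000 + needed_change           # the input we expect to have sqrt near 100.1

print(slope_of_sqrt, f_prime(100))  # reciprocal slopes at matching points
print(estimate, math.sqrt(estimate))
```

This is a linear approximation, so the square root of the estimate should be close to 100.1 but not exactly equal to it.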

What is the derivative of log?

In our quest for a function which is its own derivative, we met e to the x. Remember, the derivative of e to the x is e to the x. What's the inverse function for e to the x? What function undoes that sort of exponentiation? Well, we really don't have a name for that function yet, so we're just going to call it log. So in symbols, if e to the x is equal to y, then log y equals x, right? Log is the inverse function for e to the x: the log of something is the power to which I must raise e to get back the thing I plugged into log. These logs or logarithms are super important for a ton of reasons. Take a look at this. e to the x plus y is e to the x times e to the y, right? This is a property of exponents, if you like. There's a corresponding statement about log. Log of a times b is log of a plus log of b. Or, a shorthand way to say that is that logarithms transform products into sums. This is a big reason why we care so much about logs. Once we've got this new function, log, we can ask what's the derivative of log? So, if f of x is e to the x, the inverse function is log. If I want to now differentiate log, I can use the inverse function theorem. So, the derivative of the inverse function is 1 over the derivative of the original function evaluated at the inverse function of x. Now, the neat thing here is that the derivative of e to the x is itself. So, f prime is just f, and in the denominator I'm left with f of f inverse of x. It's e to the log of x. But log of x tells me what I have to plug into e to the x to get out the input, right? f of the inverse function of f just gives back the same input again. So, this is 1 over x. So, the derivative of log x is just 1 over x. And you can really see this fact on the graph. Here's a graph of y equals log x. And I should warn you right off the bat that the x-axis and the y-axis have totally different scales. The x-axis goes from 1 to 100. The y-axis in this plot goes from 0 to 5. 
It's going to make it not so easy to tell the exact values of the slopes of tangent lines, but you can see from this graph the important qualitative feature: that the graph is getting less and less slopey. And if you like, it's flattening out as the input gets bigger. If I put down a tangent line and I start moving the point that I'm taking the tangent line at to the right, you can see the tangent line slope is getting closer and closer to zero. And, of course, that's reflected by knowing the derivative of log x is 1 over x. So, if x is really big, the tangent line at x is really close to zero in slope. Think about log of a really big number. For instance, what's log of a million? Log of a million is about 13.815510. And, of course, it keeps going. It's an irrational number. But now, the derivative of log, right, is 1 over its input. So, what does that tell you about what you might think log of a million and 1 is equal to? Well, the derivative tells you how much wiggling the input affects the output. So, if I wiggle the input by 1, you expect the output to change by about the derivative. And yeah, log of a million and 1 is about 13.815511, right? What's being affected here is in the millionths place after the decimal point, right? It's the 6th digit after the decimal point because it's being affected like a change of 1 over a million. All right? I'm changing the output by about a millionth. At this point, we can also handle logs with other bases. So, let's suppose I want to differentiate log of x base b, right? This is the number that I'd raise b to, to get back x. Well, there's a change of base formula for log. This is the same as the derivative of, say, the natural log of x over the log of b. But the log of b is a constant, and the derivative of a constant multiple is just that constant multiple times the derivative. So, this is 1 over log b times the derivative of, here's the natural log of x. But I know the derivative of the natural log of x, it's 1 over x. 
So, the derivative of log of x base b is 1 over log b times 1 over x. Or maybe another way to write this would be 1 over x times log b, if you prefer writing it that way. e to the x is a sort of key that unlocks how to understand the derivative of a ton of other exponential functions. For example, now that we know how to differentiate e to the x, we can also differentiate 2 to the x. So, let's suppose I want to differentiate 2 to the x. Now, you might just memorize some formula for differentiating this. But it's easier, and I think better, to just recreate this function out of the functions that you already know all the derivatives of. So, in this case, let's replace 2 by e to the log 2, so I've got e to the log 2, to the x, right? So, instead of writing 2 here, I've just written e to the log 2, and this is just 2. But I've got e to the log 2 to the x and that's the same as e to the log 2 times x. You know, this is a composition of functions that I know how to differentiate. I know how to differentiate e to a power, and I know how to differentiate a constant multiple times x. So, by the chain rule, it's the derivative of the outside function, which is itself, e to the, at the inside function, which is log 2 times x, times the derivative of the inside function, which in this case is log 2. So, I'm just going to multiply by log 2. Now, I could kind of make this look a little bit nicer, right? e to the log 2 times x, well, that's just 2 to the x, times, again, log 2. So, the derivative of 2 to the x is 2 to the x times log 2. And, of course, 2 didn't play any significant role here. I could have replaced 2 by any other positive number and I'd get the same kind of formula. What I hope you're seeing is that all of the derivative laws are connected. With practice, you'll be able to differentiate any function that you build by combining our standard library of functions and operations on those functions.
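
The three derivative facts above (log x, log base b, and 2 to the x) are easy to spot-check numerically. Here is a minimal Python sketch; the helper names and the sample points (5, 7, 3, and base 10) are arbitrary choices of mine, not values from the lecture, apart from the log of a million example.

```python
import math

def central_difference(f, x, h=1e-6):
    # Symmetric finite-difference approximation to f'(x).
    return (f(x + h) - f(x - h)) / (2 * h)

def d_log(x):
    # Derivative of the natural log: 1 / x.
    return 1 / x

def d_log_base(x, b):
    # Derivative of log base b of x: 1 / (x * log b).
    return 1 / (x * math.log(b))

def d_two_to_x(x):
    # Derivative of 2^x: 2^x * log 2.
    return 2**x * math.log(2)

# Wiggling the input of log by 1 near a million changes the output by about 1/1,000,000.
change = math.log(1_000_001) - math.log(1_000_000)
print(change, d_log(1_000_000))

# Compare each formula with a numerical slope at an arbitrary sample point.
print(d_log(5.0), central_difference(math.log, 5.0))
print(d_log_base(7.0, 10), central_difference(lambda x: math.log(x) / math.log(10), 7.0))
print(d_two_to_x(3.0), central_difference(lambda x: 2**x, 3.0))
```

Each printed pair should agree to several decimal places; the small mismatch is the finite-difference error, not a flaw in the formulas.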

What is logarithmic differentiation?


Consider the function 1 plus x squared to the fifth power, times 1 plus x cubed to the eighth power, all divided by 1 plus x to the fourth, this to the seventh power. In principle, there's nothing stopping you from plowing ahead and computing the derivative. You can totally differentiate this function. Right? What's the derivative of this function? Well, this function's a quotient, so it needs the quotient rule. The denominator of the quotient rule is the original denominator squared. So it's going to be the original denominator, now to the 14th power. And in the quotient rule, the numerator starts off with the derivative of the original numerator. Now, the original numerator is a product. So I'll be able to do this derivative by using the product rule and chain rule. So it's the derivative of the numerator times the denominator. And it keeps going, right? Then I've got to subtract the derivative of the denominator times the numerator. But, look, you can do this derivative just by careful application of the quotient rule, the product rule, the power rule, and the chain rule. There is one thing stopping you: your sense of human decency. It's just an awful calculation. Nobody would want to do that. So instead, I propose a trick. But maybe it's not a trick, because it's a trick that fits into a general theme. It's logarithms. Logarithms turn exponentiation into multiplication, and multiplication into addition. Let's see how this helps us. So here we go. Instead of calling this function f of x, I'm just going to call it y, because I'm getting ready to do a sort of implicit differentiation. I'm going to first apply log to both sides of this. And I'll get log y. And what's log of the other side? Well, it's log of a quotient, which is a difference of logs, and logs of things to powers, which is that power times log of the base. So this works out to 5 times log of 1 plus x squared, plus, since log turns multiplication into addition, 8 times log of 1 plus x cubed, and this quotient becomes a difference, so, minus 7, the 7 in the exponent, times log of 1 plus x to the fourth. Now, we differentiate. 
All right, so differentiating now, what's the derivative of log y? Remember, y is secretly a function of x, so when I differentiate log y, it's the derivative of the outside, which is 1 over y, times the derivative of the inside, which I'll write dy dx. This is really an example, if you like, of implicit differentiation. Alright, now I differentiate the other side. The 5, I just multiply by 5. The derivative of log is 1 over, so it's 1 over the inside function, 1 plus x squared, times the derivative of the inside function, which is 2x. The derivative of 1 plus x squared is 2x. All right, plus 8, and the derivative of log is 1 over, at the inside function, 1 plus x cubed, times the derivative of this inside function, which is 3 times x squared. Minus 7 over, the derivative of log is 1 over, at the inside function, 1 plus x to the fourth, and the derivative of 1 plus x to the fourth is 4 x cubed. We're almost there. So now, I just multiply both sides by y. And I get that the derivative is this thing times y, where y is this original quantity. I can write this a little bit more nicely. Alright, here 5 times 2x is 10x, 8 times 3 is 24, 7 times 4 is 28, and then I multiply by y. So the derivative is y times the quantity 10x over 1 plus x squared, plus 24 x squared over 1 plus x cubed, minus 28 x cubed over 1 plus x to the fourth. So I found the derivative, here it is. In general, this trick, logarithmic differentiation as it's called, works fantastically well for functions like these: rational functions that involve a lot of high powers. 
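
Here is a short Python sketch to sanity-check the result of logarithmic differentiation. It assumes, based on the logs taken in the calculation above, that the function is 1 plus x squared to the fifth, times 1 plus x cubed to the eighth, over 1 plus x to the fourth to the seventh. The names `y` and `dy_dx` and the test point x equals 1 are my own choices.

```python
def y(x):
    # The function being differentiated: (1+x^2)^5 (1+x^3)^8 / (1+x^4)^7.
    return (1 + x**2) ** 5 * (1 + x**3) ** 8 / (1 + x**4) ** 7

def dy_dx(x):
    # Result of logarithmic differentiation:
    # y' = y * (10x/(1+x^2) + 24x^2/(1+x^3) - 28x^3/(1+x^4)).
    return y(x) * (
        10 * x / (1 + x**2)
        + 24 * x**2 / (1 + x**3)
        - 28 * x**3 / (1 + x**4)
    )

def central_difference(f, x, h=1e-6):
    # Symmetric finite-difference approximation to f'(x).
    return (f(x + h) - f(x - h)) / (2 * h)

# Compare the formula with a numerical slope at x = 1.
print(y(1.0), dy_dx(1.0), central_difference(y, 1.0))
```

At x equals 1 every factor is a power of 2, so the values are easy to confirm by hand, and the numerical slope should land very close to the formula's value.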

How can we multiply quickly?


Here's a fundamental question. How can I multiply numbers really quickly? It's not like this is a particularly new problem. You can imagine somebody trying to multiply 10, 15, 17 by 10, 20, 30, 35, 36, 37 and getting 500, 600 and 29. But you know, how would you ever do these kinds of calculations if you were trapped in a world filled with Roman numerals? Thankfully, instead of Roman numerals, we've got place value. Place value provides an algorithm for actually computing these multiplication answers. So, here I've got 17, here I've got 37. If I want to do this multiplication problem, I just have to do these single digit multiplies. 7 times 7 is 49. 7 times 1 is 7, plus the 4 I carried is 11. 3 times 7 is 21. 3 times 1 is 3, plus the 2 I carried is 5. Add these numbers up, 629. And that's exactly what I had here in Roman numerals. What if the numbers were much, much bigger than just 2 digit numbers? What if the two numbers I wanted to multiply each had 10 digits? Well then, I've still got to do all these pairwise multiplications. Right down here, I'm going to end up writing 100 digits. At least 100 digits. Because for every pair of digits here, I've got to write down at least one digit down here. That's terrible. You know, and then I've got to add all of these things up before I'm able to get the answer. I mean that's a ton of work, right? You'd really hope that there'd be some way to speed this up. And there is a way to speed this up. There's a ton of ways to speed up multiplication. Multiplication is such an important operation that humans have given it a ton of thought. We've really got a lot of different ways to try to make this faster, but maybe the easiest way is that of quarter squares. So here's the trick. I'm going to use this table of quarter squares. This is n squared over 4, a quarter square, so here's n, here's the output. If I plug in 1, I get a quarter. If I plug in 2, 2 squared over 4 is 1. 3 squared over 4 is 2 and a quarter. 4 squared over 4 is 4. 5 squared over 4 is 6 and a quarter. 
Okay, you can imagine I've got a really big table of these quarter squares. Now, why does this help you multiply? We've also got this little algebraic fact. a times b is a plus b, squared, over 4, the quarter square of a plus b, minus a minus b, squared, over 4, the quarter square of a minus b. So, instead of multiplying a and b, I'll add them together, look it up in the table, take their difference, look it up in the table, and take the difference of those table values. For instance, let's suppose that I want to multiply 3 times 2. I mean, this is a ridiculously easy case. But just to show off how it works, let's multiply 3 times 2. I'll add 3 and 2 and I get 5. And I look it up in my table. And 5 squared over 4 is 6 and a quarter. I take the difference, 3 minus 2 is 1. And if I look up 1 in my table, I get a quarter. And 6 and a quarter minus a quarter is 6, which is the product of 3 and 2. Quarter squares convert multiplication into an addition, a subtraction, two table lookups and a final subtraction. Let's try doing this on much bigger numbers. Let's try to multiply 17 by 37 using quarter squares. So, the first thing to do is to figure out the quarter square of 17 plus 37. 17 plus 37 is 54. I've got a much bigger table of quarter squares here. Here's 54 on my table. 54 squared over 4 according to my table is 729. Now, the next step is to look at the difference of 17 and 37, which is 20, and look that up in my table of quarter squares. Here's 20. And the quarter square of 20 is 100. That's pretty clear. 20 squared is 400, divided by 4. So 100. Now what do I do to figure out the product of 17 and 37? Well, I'm going to take 729, I'm going to subtract 100, and I'm going to get 629, which is in fact the product of 17 and 37. But we did it using quarter squares. By just adding the numbers together, taking their difference, looking up those numbers in the table and then taking the difference of the numbers in the table, I got the product of these 2 numbers. 
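
The quarter-square procedure above can be sketched in a few lines of Python. This is an illustration under one stated variation: the table here stores n squared over 4 rounded down to a whole number, whereas the lecture's table keeps the extra quarter. The dropped quarters cancel in the subtraction, because a plus b and a minus b are either both even or both odd. The names `build_quarter_square_table`, `TABLE`, and `multiply` are hypothetical.

```python
def build_quarter_square_table(limit):
    # Entry n holds floor(n^2 / 4); the dropped quarter cancels in the subtraction below.
    return [n * n // 4 for n in range(limit + 1)]

# A table big enough for products of numbers up to 100.
TABLE = build_quarter_square_table(200)

def multiply(a, b):
    # a * b = quarter square of (a + b) minus quarter square of (a - b).
    # Only an addition, a subtraction, two lookups, and a final subtraction.
    return TABLE[a + b] - TABLE[abs(a - b)]

print(TABLE[54], TABLE[20])  # the two lookups used for 17 times 37
print(multiply(17, 37))
print(multiply(3, 2))
```

The inputs must be nonnegative whole numbers whose sum fits in the table; a bigger table handles bigger factors.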
Admittedly, people don't talk too much about quarter squares nowadays. What you've probably heard a lot more about is logarithms. There's this property of exponents that e to the x times e to the y is e to the x plus y. The corresponding property of logs is that log of a product is the sum of the logs. The log of a times b is log of a plus log of b. You can use this property of logs to multiply very quickly provided you have a log table. And I do have a table of logarithms. Here's my table. Let's multiply 17 by 37. So, instead of looking up 17, I'm going to look up 1.70 in my table, and I find the log of 1.70 is about 0.23045. Then, instead of looking up 37, I'm going to look up 3.70 in my table, and the log of 3.70 is about 0.56820. I'm going to add together those two logs and I get 0.79865, and I've just got to hunt for that number in my table. Mercifully, the numbers are in order and I find that 0.79865 is right here in my table. That's in the 9th column of the row which starts with 6.2. So the product of 17 and 37 must be 629. Why this works is that same basic fact about logs again. Logs convert multiplication into addition. There's another way that we can exploit this fact. If I've just got two plain old rulers, I can use the two rulers to add together numbers. Let's say I want to add 3 and 2 together. What I'll do is I'll put the 0 on the top above the 2 on the bottom so that this distance is 2 units. On the top, the distance between 0 and 3 is 3 units. So, if I'm going to add 2 units to 3 units, I just read down and the answer is 5. If your 2 rulers have a logarithmic scale, then you've just invented the slide rule. This is a logarithmic scale, so this distance, on the scale labeled D, starting at 1 and ending at this 7, this distance is log of 1.7. Now, next to that distance, I have to place a distance whose length is log of 3.7. Now I've placed the 1 on the scale labeled C just above the 7 on the D scale. 
The distance on the C scale from that one all the way over here to 3.7 on the C scale, that's a distance which is really log of 3.7. Since I've placed these two distances next to each other, the total distance is just the sum of the logs, which is the log of the product. So, here's the answer. Just below 3.7 on the top scale is what looks to be 6.3, just a little bit less than 6.3. Now, I know the last digit of 17 times 37 is going to be a 9. So, the answer must be 6.29. For hundreds of years, these slide rules were state of the art for multiplying. So that you can join into this proud tradition, I encourage you to print out your own slide rule, and try doing some multiplication problems on it. 
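
The log-table multiplication described above can be imitated in Python. This is only a sketch: the quoted table values (0.23045 and 0.56820) match five-decimal base-10 logarithms, so I assume a base-10 table, and the helper names `table_log`, `scale`, and `multiply_with_logs` are made up for this illustration.

```python
import math

def table_log(x):
    # Five-decimal base-10 logarithm, imitating an entry in a printed log table.
    return round(math.log10(x), 5)

def scale(n):
    # Write a positive whole number as mantissa * 10**exponent with 1 <= mantissa < 10,
    # e.g. 17 becomes (1.7, 1), so that it can be looked up in the table.
    exponent = len(str(n)) - 1
    return n / 10**exponent, exponent

def multiply_with_logs(a, b):
    # Add the table logs, undo the log, then restore the powers of ten.
    mantissa_a, exponent_a = scale(a)
    mantissa_b, exponent_b = scale(b)
    log_sum = table_log(mantissa_a) + table_log(mantissa_b)
    return round(10**log_sum * 10 ** (exponent_a + exponent_b))

print(table_log(1.7), table_log(3.7))
print(multiply_with_logs(17, 37))
```

With only five decimals in the table, the answer is an approximation, just as on a slide rule; for larger factors the last digits may need to be fixed up by hand, the way the lecture uses the known last digit 9.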


How do we justify the power rule?

Remember back, remember back to the power rule. Well, what does the power rule say? It's that the derivative of x to the n, with n some real number not zero, is n times x to the n minus 1. The power rule isn't just something we made up. It's a consequence of the definition of derivative. But how do we know it's actually true? Well remember, we've already worked this out when n is a positive whole number. Then, the derivative of x to the n is the limit of this difference quotient. This is the limit that calculates the derivative. If n is a positive whole number, I can expand out x plus h to the n, and I get x to the n plus n x to the n minus 1 times h plus things with lots of h's, minus x to the n, all over h. Now, the x to the n and the minus x to the n cancel, the h here cancels this h here, and I'm left with a bunch of h's divided by h, and there's still a lot of h's in this. And it's the limit of this constant, as far as h is concerned, plus a thing with h's in it, and well, that goes to zero. And I'm left with n times x to the n minus 1, which is the derivative of x to the n. And this is a completely valid argument as long as n is a positive whole number. But there's plenty of numbers which aren't positive whole numbers. What if n equaled negative 1? Let's figure out the derivative of x to the minus first power. That's just the derivative of 1 over x. This is a problem that we can attack directly using the definition of derivative. Here, I've written the limit of the function at x plus h minus the function at x, all over h. Now, to calculate this limit, I'll first put this part in the numerator over a common denominator. So, this is the limit as h goes to zero, and the whole thing's over h. But the numerator is now x minus x plus h, over the common denominator for the things in the numerator, x plus h times x. Now, what's x minus x plus h? Well, in that case, this x and this x cancel. And what I'm left with is just negative h up there. 
So, this is the limit as h goes to zero of negative h over x plus h times x, all over h. Great. Now, the h down here and the h up here cancel. What am I left with? I'm left with the limit as h goes to zero of negative 1 over x plus h times x. Now, how can I deal with this? Well, as h goes to zero, the numerator's just negative 1, and the denominator is approaching x squared. So, this limit is minus 1 over x squared. What we've calculated here is the derivative of 1 over x: the derivative of 1 over x is negative 1 over x squared. Now, I can use this fact, the fact that the derivative of 1 over x is negative 1 over x squared, to compute, using the chain rule, the derivative of 1 over x to the n. This is a composition of two functions, the composition of the 1 over function and the x to the n function. The derivative of 1 over is negative 1 over the thing squared. So, it's the derivative of the outside function at the inside times the derivative of the inside function. The derivative of x to the n, if n is a positive whole number, I already know this, it's n times x to the n minus 1. And x to the n, squared, is x to the 2n. So, I've got negative 1 over x to the 2n times n times x to the n minus 1. Now, a minor miracle happens. The x to the 2n and the x to the n minus 1, they're interacting, so that I'm left with x to the n plus 1 in the denominator, and negative 1 times n is negative n in the numerator. Now, this maybe doesn't look so great. But remember that 1 over x to the n is just another name for x to the negative nth power. And if I rewrote this as x to a power, I could rewrite this as negative n times x to the negative n minus 1 power. And look. What we've shown is that the derivative of x to the negative n is negative n times x to the negative n minus 1. This is verifying that the power rule holds even when n is a negative whole number. Pretty good. We've done it now for all nonzero whole numbers, positive and negative. But what about rational numbers? So, here's a question. 
How do we know that the derivative of x to the 21/17 power is 21/17 times x to that power minus 1, that is, x to the 4/17? Implicit differentiation to the rescue. Well, here's maybe a simpler case. Why is the derivative of x to the 1/17 equal to 1/17 times x to the negative 16/17? Well, let's set y equal to x to the 1/17, and that means y to the 17th power is x. And I can apply implicit differentiation to y to the 17th equals x. Differentiating both sides gives 17 y to the 16th times dy dx equals the derivative of x, which is 1. Divide both sides by 17 times y to the 16th power and I get that dy dx is 1/17 times 1 over y to the 16th power. But y is x to the 1/17. So, dy dx is 1/17 times 1 over y to the 16th, and y to the 16th is now x to the 1/17 to the 16th, which is x to the 16/17. But it's 1 over that, so I could write this as 1/17 times x to the negative 16/17. So now the chain rule finishes off the problem. If I want to differentiate x to the 21/17 power, well, that's the same as differentiating x to the 1/17 power, to the 21st power. It's the chain rule. So that's the same as 21 times the inside, x to the 1/17, to the 20th. That's the derivative of the outside function, which is the 21st power function, at the inside, times the derivative of the inside function. Now, good news. We just calculated the derivative of the inside function. This is 21 times x to the 1/17 to the 20th power times the derivative of x to the 1/17, which is 1/17 times x to the negative 16/17. Well, this is 21 times x to the 20/17 times 1/17 times x to the negative 16/17. And 20 minus 16 is 4. It's 21 times x to the 4/17 over 17, so it's 21/17 times x to the 4/17. That's exactly what the power rule tells you when n is 21/17. We started off just knowing the power rule was true for positive whole number exponents. And now, after doing a little bit of work, we know that the power rule holds for any rational exponent. What about the function f of x equals x to the square root of 2 power? Whoa. What does that even mean? It's a serious objection. What do I mean by a number raised to the square root of 2 power?
Well, what can I do? I can take x and I can raise it to the 1.4 power, by which I mean I take x multiplied by itself 14 times and then take the tenth root of that. I can take x to the 1.41 power, by which I mean I take x multiplied by itself 141 times, and then take the hundredth root of that. And I can keep on going, right? If I want to take x to the 1.414 power, I'd multiply x by itself 1,414 times, and then take the thousandth root of that. If I were to take x to the 1.4142 power, I'd take x and multiply it by itself 14,142 times and then take the 10,000th root of that number. And I can keep doing this, and the exponents are getting closer and closer to the square root of 2. And that's really what this function means. It really means to take a limit of these functions I actually understand, functions where I'm taking x to a rational exponent. We can handle this with a logarithm. So, let's set y equal to x to the square root of 2 power. I want to calculate dy dx. So, the trick here is log. I'm going to take the log of both sides. Log of y is log of x to the square root of 2 power. But log of something to a power is that power times log of the base. So, I've got log y is the square root of 2 times log x. Now, I differentiate both sides and I find out that the derivative of log y is 1 over y times dy dx, and the derivative of the other side is the square root of 2 times 1 over x. Multiply both sides by y, and I've got dy dx is the square root of 2 times y over x. But I know what y is. y is x to the square root of 2 power. So, this is the square root of 2 times x to the square root of 2 power, divided by x. In other words, it's the square root of 2 times x to the square root of 2 minus 1. We're using logarithms to fill in the gaps in the power rule. We're not just learning a bunch of derivative rules, we're actually learning why these rules work. Take a look. The square root of 2 here plays no essential role in this argument.
I could go back through this entire thing and replace the square root of 2, everywhere I see it, by the number n. And what I'd see is that this logarithm argument is justifying the power rule. The derivative of x to the n is n times x to the n minus 1. And so, we're really building the foundations of calculus. We are not just learning how to apply the rules to some calculations, we're learning to justify that these rules are the correct rules.
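
As a sanity check on the cases discussed above (a whole-number exponent, negative 1, 21/17, and the square root of 2), here is a minimal numerical sketch. It is not part of the lecture: the helper names `numerical_derivative` and `power_rule`, the step size `h`, and the sample point `x = 2.0` are all choices made for this illustration. It compares a difference quotient with what the power rule predicts.

```python
import math


def numerical_derivative(f, x, h=1e-6):
    """Approximate f'(x) with a symmetric difference quotient (an illustrative choice)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def power_rule(n, x):
    """The derivative of x**n as predicted by the power rule: n * x**(n - 1)."""
    return n * x ** (n - 1)


# Exponents taken from the discussion: positive whole, negative whole,
# rational, and irrational. The point x = 2.0 is an arbitrary positive input.
exponents = [3, -1, 21 / 17, math.sqrt(2)]
x = 2.0

for n in exponents:
    approx = numerical_derivative(lambda t: t ** n, x)
    predicted = power_rule(n, x)
    print(f"n = {n:.4f}: difference quotient {approx:.6f}, power rule {predicted:.6f}")
```

Agreement at one point is only evidence, not a proof; the arguments above are what actually justify the rule.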

How can logarithms help to prove the product rule?


Remember back before, when we talked about the product rule? You know, it goes like this: the derivative of f times g is the derivative of f times g plus f times the derivative of g. It's a little bit mysterious, considering that the product rule has a plus in it. But we proved this previously, just by going back to the definition of derivative in terms of limits and calculating the necessary limit to show that this product rule was in fact valid. But maybe that proof didn't speak to you, so now there's another trick that we can use. We can use logarithms to replace the product with a sum. Let's see how. So let's suppose that f of x is bigger than 0, and g of x is bigger than 0, say for all x. I just want to do this for positive functions. Okay. Now I'm going to use logs, so let's take the log of f of x times g of x. And what do I know about logs? Logs turn products into sums. So the log of f of x times g of x is the log of f of x plus the log of g of x. I'm going to differentiate both sides of this equation. So the derivative of log is 1 over, and by the chain rule, that's 1 over the inside function times the derivative of the inside function, which in this case is the derivative of f of x times g of x. That's what I'd like to compute. What's the derivative of the other side? Well, the derivative of log is 1 over, so the derivative of log of f of x is 1 over the inside function times the derivative of the inside function, plus the derivative of log of g of x, which is 1 over the inside function times the derivative of the inside function. Now if I multiply both sides by f of x times g of x, what happens? Well, if I multiply this side by f of x times g of x, I've then isolated the derivative of the product. So this is just the derivative of f of x times g of x.
If I multiply this side by f of x times g of x, f of x times 1 over f of x is just 1, but I'm left with a factor of g of x, times f prime of x. Plus, if I multiply this term by f of x times g of x, g of x times 1 over g of x is just 1, but I'm left with an f of x, so f of x times g prime of x. And look, this is the product rule. The derivative of the product is, in this case, g of x times f prime of x plus f of x times g prime of x. So, I mean, the order's a little bit different, but it is the product rule. So we've justified the product rule another way using logarithms, but that raises a question: what's the point of having multiple proofs of a single mathematical fact? It's not as if having 2 different proofs of the product rule makes the product rule any more true. What this argument has in its favor is that it's showing off a nice trick that you can do with logarithms. There's a theme that products and quotients are much more complicated than sums and differences. Armed with logarithms, we can convert difficult products and quotients into much easier sums and differences, and that's a huge win for us.
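
Written in symbols, the logarithm argument above might be summarized as follows. This is a sketch under the same assumption stated in the lecture, namely that f(x) > 0 and g(x) > 0, so that the logarithms make sense.

```latex
\begin{align*}
\log\bigl(f(x)\,g(x)\bigr) &= \log f(x) + \log g(x)
  && \text{(logs turn products into sums)} \\
\frac{1}{f(x)\,g(x)} \cdot \frac{d}{dx}\bigl(f(x)\,g(x)\bigr)
  &= \frac{1}{f(x)}\,f'(x) + \frac{1}{g(x)}\,g'(x)
  && \text{(differentiate both sides, chain rule)} \\
\frac{d}{dx}\bigl(f(x)\,g(x)\bigr)
  &= g(x)\,f'(x) + f(x)\,g'(x)
  && \text{(multiply both sides by } f(x)\,g(x)\text{)}
\end{align*}
```

The last line is the product rule with the two terms in the opposite order from the usual statement.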


How do we prove the quotient rule?


How do we prove the quotient rule? Well, first, we should remember what the quotient rule says. It says the derivative of f over g is the derivative of f times g minus f times the derivative of g, all over the value of g squared. Not the derivative of g, just g of x, squared. But we haven't actually seen a proof of the quotient rule. Why should the derivative of a quotient be governed by that crazy-looking formula? Well, one way to justify this formula is to combine the chain rule and the product rule. So, we're trying to build our way up to the quotient rule, and we can first do the simplest possible case of the quotient rule, by hand if you like. What's the derivative of 1 over x? Well, 1 over x is just x to the minus first power. And if I differentiate this, that's the power rule. We saw how to do that before. That's minus 1 times x to the minus second power. Another way to write this is minus 1 over x squared. Now, if you weren't certain why the power rule held, you could also have calculated this derivative by hand by going back to the definition of derivative. This is actually how we justified the power rule for negative exponents. This limit, you can also calculate, is minus 1 over x squared. Knowing how to differentiate 1 over x is enough for us to differentiate 1 over g of x by using the chain rule. So, we're going to use the chain rule. Let me first make up a new function, f of x, which is the function 1 over x. So then f prime of x is minus 1 over x squared, the derivative that we just calculated. Now, suppose I wanted to calculate the derivative of 1 over g of x. Well, that's the same as the derivative of f of g of x, since I defined f to be the 1 over function. And by the chain rule, the derivative of this composition is the derivative of the outside at the inside times the derivative of the inside. The derivative of f is minus 1 over its input squared. So, f prime of g of x is minus 1 over g of x squared.
That's looking good. That's like the denominator of the quotient rule. Alright, times g prime of x. So, the derivative of 1 over g of x is minus 1 over g of x squared times the derivative of g. Now, I've got the product rule. So, if I can differentiate f and I can differentiate 1 over g of x, I can differentiate their product, which happens to be the quotient, f of x over g of x. So, I want to differentiate f of x over g of x, right? I'm trying to head towards the quotient rule. But I'm going to rewrite this quotient as a product. It's the derivative of f of x times 1 over g of x. Now, this is the derivative of a product, so I can apply the product rule, which I already know. And that's the derivative of the first times the second, plus the first times the derivative of the second. But we calculated the derivative of the second just a moment ago. The derivative of 1 over g of x is minus 1 over g of x squared times g prime of x. So, I can put that in here as the derivative of 1 over g of x. It's minus 1 over g of x squared times g prime of x. By rearranging this, I can make this look like the quotient rule that we're used to. Let's aim to put this over a common denominator. So, I could write this as f prime of x times g of x over g of x squared plus, what do I have over here? Well, negative f of x times g prime of x, over g of x squared. And I can combine these two fractions: f prime of x times g of x minus f of x times g prime of x, all over g of x squared. That's the quotient rule, right? We've got to the quotient rule at this point. And how did we do it? Well, think back to what just happened. I used the power rule to differentiate 1 over x. Once I knew the derivative of that, I could use the chain rule to differentiate 1 over g of x. And once I knew the derivative of 1 over g of x, I could then use the product rule on f of x times 1 over g of x to recover the quotient rule. What's the upshot here?
Why is it important that the quotient rule can be seen as an application of the chain rule and the product rule? One reason is a pedagogical one. I think it's important for you to see that all these differentiation rules are connected together. I hope that will give you a better sense of the rules and make them more memorable.
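
The three steps recalled above (power rule, then chain rule, then product rule) can be laid out in symbols roughly like this. It is a sketch that assumes g(x) is not zero, so that the quotient is defined, and it uses f only for the numerator of the quotient.

```latex
\begin{align*}
\frac{d}{dx}\,\frac{1}{x} &= -\frac{1}{x^{2}}
  && \text{(power rule with } n = -1\text{)} \\
\frac{d}{dx}\,\frac{1}{g(x)} &= -\frac{1}{g(x)^{2}}\; g'(x)
  && \text{(chain rule)} \\
\frac{d}{dx}\,\frac{f(x)}{g(x)}
  &= f'(x)\cdot\frac{1}{g(x)} + f(x)\cdot\left(-\frac{g'(x)}{g(x)^{2}}\right)
  && \text{(product rule on } f(x)\cdot\tfrac{1}{g(x)}\text{)} \\
  &= \frac{f'(x)\,g(x) - f(x)\,g'(x)}{g(x)^{2}}
  && \text{(common denominator)}
\end{align*}
```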


How does one prove the chain rule?


The Chain rule is so important, it's worth thinking through a proof of its validity. You might be tempted to think that you can get away with just cancelling. What I mean is you might be tempted to think that something like this works. Let's say you wanted to differentiate g of f of x. You might set f of x equal to y. And then, you might think, well, what you're really trying to calculate is the derivative of g of y. And then, you might say, well, the derivative of g of y, that will be the derivative of g with respect to y times the derivative of y with respect to x, and that's really the Chain rule. I mean, this first thing is the derivative of g and this other thing is the derivative of f, just like you'd expect, and then you're trying to say that you can just cancel, alright? You're not allowed to just cancel. I mean, the upshot here is just that dy/dx is not a fraction, alright? You can't justify this equality by just canceling, because these objects, the way you're supposedly doing the canceling, are not fractions. We need a more delicate argument than that. One way to go is to give a slightly different definition of derivative. Well, here's a slightly different way of packaging up the derivative. The function f is differentiable at a point a provided there's some number, which I'm suggestively calling f prime of a, the derivative of f at a, so that the limit of this error function is equal to 0. And what's this error function? Well, it's measuring how far the approximation that I'd get using the derivative is from the actual function's output if I plug in an input value near a. In some ways, that's actually a nicer definition of derivative, since it really conveys that the derivative provides a way to approximate output values of functions. In any case, now, let's take this new definition of derivative and try to prove the Chain rule. Try to approximate g of f of the quantity x plus h. Try to discover the Chain rule.
So, I want to be able to express this in terms of the derivatives of g and f, at least approximately, and then control the error. Well, I can do that for f because I'm assuming that f is differentiable at the point x. So, this is g of, instead of f of the quantity x plus h, f of x plus the derivative of f at x times h plus an error term, which I'll call the error of f at h, times h. I'm going to play the same game with g. This is g of f of x plus a small quantity. And if I assume that g is differentiable at the point f of x, then this is g of f of x plus the derivative of g at f of x times how much I wiggle by, which, in this case, is f prime of x times h plus that error term, plus an error term for g. That's the error of g, and I have to put in how much I wiggled by, which, in this case, is f prime of x times h plus the error of f at h times h, times that same quantity, f prime of x times h plus the error of f at h times h. Alright, so that's exactly equal to g of f of the quantity x plus h, and I'm including all of the error terms. Now, we can expand out a bit. So, this is g of f of x plus, you can multiply these two terms together, g prime of f of x times f prime of x times h. That's looking really good, because that's what the Chain rule is, right? It's supposed to give me this as the derivative of g composed with f. Plus, I've got a ton of error terms now. All those error terms have an h, so I'm going to collect all the h's at the end. The first error term is g prime of f of x, this term here, times the error of f at h. The next one is plus the error of g at this complicated quantity, which I'm going to abbreviate with a hyphen, times f prime of x. I'm collecting all of these h's at the end. Plus the error of g, at that complicated quantity, times the error of f at h, and all of that is times h. Alright. Now, this is almost giving me the derivative of the composite function, provided that I can control the size of this error term, right?
What I need to show now is that the limit as h approaches 0 of this error term is really 0. And the error term, right, it's the part before the times h: it's g prime of f of x times the error of f at h, plus the error of g times f prime of x, plus the error of g times the error of f at h. Now, why do I know that that limit is equal to 0? Well, I can do it in pieces, right? It's the limit of a sum, so it's the sum of the limits. And I know that the limit of this first term is 0 because it's got an error of f at h in it, and because f is differentiable, that error term goes to 0. I likewise know the same for this last term, alright? It's also got an error of f at h in it. The most mysterious term is this one. But if you think a little bit more about it, the error of g at this hyphen thing, which is abbreviating this whole quantity here, also goes to 0 as h goes to 0, because that quantity itself goes to 0. And that's enough to know that the limit as h goes to 0 of this error quantity is 0, which in turn is enough to say that this expression for g of f of the quantity x plus h really does show that g prime of f of x times f prime of x is the derivative. So, here's what we've actually shown. Suppose that f is differentiable at a point a and g is differentiable at the point f of a. Then the composite function, g composed with f, is differentiable at a, with the derivative of g of f at the point a equal to the derivative of g at f of a times the derivative of f at a.
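
In symbols, the expansion described in this proof might be written as follows. The names E_f and E_g for the two error functions, and the letter k for the "complicated quantity" abbreviated with a hyphen, are notation introduced here for illustration; the lecture's on-screen notation is not recorded in this transcript. The sketch also assumes the convention E_g(0) = 0, so that the second line still holds when k happens to be 0.

```latex
\begin{align*}
f(x+h) &= f(x) + f'(x)\,h + E_f(h)\,h,
  & \lim_{h \to 0} E_f(h) &= 0, \\
g\bigl(f(x)+k\bigr) &= g\bigl(f(x)\bigr) + g'\bigl(f(x)\bigr)\,k + E_g(k)\,k,
  & \lim_{k \to 0} E_g(k) &= 0.
\end{align*}
% Take k to be how much f wiggles: k = f'(x) h + E_f(h) h, which goes to 0 as h does.
\begin{align*}
g\bigl(f(x+h)\bigr)
  &= g\bigl(f(x)\bigr) + g'\bigl(f(x)\bigr)\,f'(x)\,h \\
  &\quad + \Bigl[\, g'\bigl(f(x)\bigr)\,E_f(h) + E_g(k)\,f'(x) + E_g(k)\,E_f(h) \,\Bigr]\,h .
\end{align*}
```

Each of the three terms inside the bracket goes to 0 as h goes to 0, which is the claim the final step of the argument relies on.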
