Why would I care to find the derivative?
Why is sqrt(9999) so close to 99.995?
Sometime in the future, we're going to see the following derivative rule. But I want to mention it now just so we can see an example of how derivatives play out in practice. The derivative of √x is 1/(2√x). You might already believe this if you believe the power rule, right? The derivative of x^n is n*x^(n-1). So, if n is 1/2, then the derivative of x^(1/2) is n, that is 1/2, times x^(n-1). And conveniently, 1/2-1 = -1/2. This is really the same rule, just written with the square root symbol instead of with exponents. We can use this derivative rule to help explain certain numerological coincidences. Let's take a look. Look, √9999 is 99.9949998..., and it keeps on going forever; it's irrational. But this is bizarrely close to 99.995. Is this just a coincidence? No, it isn't. Look: √10000 is 100, because 100^2 is 10,000. What I'm really doing here is wiggling the input. I'm going from 10,000 to 9,999. In other words, I'm trying to calculate √(10000-1), wiggling the input down a bit. What does the derivative calculate? Well, the derivative calculates the ratio of output change to input change. So, √10000 wiggled down a little bit is about √10000 minus how much I changed the input by, times the ratio of how much I expect the output to change compared to the input change. Now, we can try to calculate the derivative at 10,000. What's the derivative at 10,000? Well, it's 1/(2√10000).
√10000 is 100, so the derivative is 1/(2*100), which is 1/200, which is 0.005. Look, √9999 is so close to 99.995 because √10000 is 100, and when I shift the input down by one, this derivative calculation is suggesting that the output should be shifted down by about 0.005, and indeed it is. This is a great example of calculus. Yes, you could have asked your calculator to compute √9999, but you couldn't have asked your calculator to tell you why. Why is that answer so mysteriously close to 99.995? In short, calculus is more than calculating. It's not about answers, it's about reasons. It's about explanations, about the stories that human beings can tell each other about why that number and not another. But that's not to say that the numbers aren't fun to play with themselves, and we can use this same trick to do other amazing feats. For example, we can try to estimate √82. I know √81 is 9, and I'm trying to say something about √82. I'm trying to wiggle the input up a little bit. Well, derivatives have something to say about that. √(81+1), that is √82, would be about √81, which is 9, plus how much I expect the output to change. I wiggled the input, so I expect the output to change by some amount. Well, the derivative measures how much I expect the output to change. So, I'm going to take the derivative of the square root function at 81, and multiply by how much I'm wiggling the input by. This will be how much I expect the output to change when I change the input. Now, in this specific case, what's the derivative at 81? Well, that's 1/(2√81), which is 1/(2*9), which is 1/18. So, I would expect √82 to be about 9 + 1/18, because I expect wiggling the input up to wiggle the output up by about 1/18. And this is pretty good. There are actually two different ways to tell that this isn't such a bad guess. Here's one way: what's 1/18? Well, it's 0.0555..., with the 5 repeating. And what's the actual value of √82? It's 9.055.... Look, that's pretty close to 9 plus 1/18. That's pretty good. Another way to see that this isn't such a bad guess is just to take 9 + 1/18 and square it. When I square 9 + 1/18, I get 9^2 + 2*9*(1/18) + (1/18)^2. And 2*9*(1/18) = 1.
So this is 81 + 1 = 82, plus (1/18)^2, which is the very small number 1/324. So, either way you look at it, we're doing pretty well to guess that √82 is about 9 + 1/18, and we're doing it with derivatives. Again, it's derivatives for the win. By relating the input change to the output change, we're able to estimate the values of functions that would be very hard to access directly.
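If you'd like to check this on a computer, here's a minimal Python sketch of the linear approximation √(a+h) is about √a + h*(1/(2√a)); the helper name sqrt_estimate is my own, not from the lecture:

```python
# A quick numerical check of the linear approximation
# sqrt(a + h) ~ sqrt(a) + h * 1/(2*sqrt(a)).
import math

def sqrt_estimate(a, h):
    """Estimate sqrt(a + h) using the derivative of sqrt at a."""
    return math.sqrt(a) + h * (1 / (2 * math.sqrt(a)))

print(sqrt_estimate(10000, -1), math.sqrt(9999))  # 99.995 vs 99.99499987...
print(sqrt_estimate(81, 1), math.sqrt(82))        # 9.0555... vs 9.05538...
```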
What information is recorded in the sign of the derivative?
In the future, we're going to have a lot of very precise statements about the derivative. But before we get there, I want us to have some intuition as to what's going on. Let's take a look at just the S, I, G, N, the sign of the derivative. The thick green line of the plot is some random function, and the thin red line is its derivative. And note that when the derivative is positive, the function is increasing, and when the derivative is negative, the function is decreasing. We can try to explain what we're seeing here formally, with a calculation on paper. So let's suppose that the derivative is positive over a whole range of values. And we also know something about how the derivative is related to the function's values: the function's output at x+h is close to the function's output at x, plus how much the derivative tells us the output should change by, which is how much the input changed times the ratio of output change to input change. In symbols, f(x+h) is approximately f(x) + h*f'(x). Alright. Now let's suppose that x+h is a bit bigger than x. Well, what that's really saying is that h is positive, right? I shift the input to the right a little bit. Well then, h*f'(x) is going to be positive, because a positive number times a positive number is positive. And that means that f(x) + h*f'(x) will be bigger than f(x); we're just adding something positive to both sides of this inequality. Now, f(x) + h*f'(x), that's about f(x+h). So, although this argument isn't entirely precise yet, what it looks like it's saying is that the function's output at x+h is bigger than the function's output at x. So, if you plug in bigger inputs, you get bigger outputs. What about when the derivative is negative? We can play the same kind of game when the derivative's negative. Here we go. So again, x+h is just a bit bigger than x, and in that case, h is positive. But now I've got a positive number times a negative number: h times the derivative of f is negative. Now, if I add f(x) to both sides, I've got that f(x) + h*f'(x) is less than f(x).
But this is approximately the output of the function at x+h. So, I've got that the function's output at x+h is a little bit less than its output at x. A bigger input is giving rise to a smaller output. Even a little bit of information, whether the derivative is positive or negative, says something about the function. And you can see the same thing in your own life. For instance, suppose that the derivative of your happiness with respect to coffee is positive. What does that really mean? Well, it means that you should be drinking more coffee, because an increase in coffee will lead to greater happiness. Of course, this is only true up to a point. After you've had a whole bunch of coffee, you might find that the derivative of your happiness with respect to coffee is zero, and you should stop drinking coffee. Now, this makes sense, because the derivative depends upon x, right? It depends upon how much coffee you've had. After not very much coffee, the derivative might be positive, but after a certain point, you might find that the derivative vanishes. This seems like a silly example, coffee and happiness. But so many things in our world are changing, and those changing things affect other things. The question is, when one of those things changes, does the other thing move in the same direction, or do they move in opposite directions? And the sign, the S, I, G, N, of the derivative records exactly that information.
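Here's a small Python sketch of that idea; the example function f(x) = x^3 - x is my own choice, not one from the lecture:

```python
# Where f'(x) > 0, nudging x up should nudge f(x) up, and vice versa.
def f(x):
    return x**3 - x

def fprime(x):
    return 3 * x**2 - 1

h = 0.001
for x in [-2.0, 0.0, 2.0]:
    sign = "+" if fprime(x) > 0 else "-"
    print(x, "derivative sign:", sign, "output went up:", f(x + h) > f(x))
# prints +/True at -2 and 2, and -/False at 0: the sign of the derivative
# matches whether the function is increasing or decreasing there.
```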
How do differentiability and continuity relate?
Why is a differentiable function necessarily continuous?
Remember, continuity is all about how nearby inputs are sent to nearby outputs. Differentiability is about how wiggling the input affects the output. In light of this, they seem related, right? Something like the following seems plausible. Here's the theorem. Theorem: if f is differentiable at a, then f is continuous at a. In other words, a differentiable function is continuous. Morally, we know that a differentiable function is continuous, but we're advanced enough at this point in the course to give a precise argument using limits. Here we go. Let's suppose that f'(a) exists. In other words, a certain limit exists. What limit? Well, the limit of (f(x)-f(a))/(x-a) as x approaches a. This limit of a difference quotient computes the derivative of the function at a. So, to say that the derivative exists is to say that this limit exists. Now, here comes the trick. What I'd like to compute is the limit of f(x)-f(a) as x approaches a, but I don't know how to do that directly. But I can rewrite the thing I'm taking the limit of as a product. Watch. Instead of taking this limit, I'm going to take the limit as x approaches a of (x-a) times this difference quotient, times (f(x)-f(a))/(x-a). Now, as long as x isn't equal to a, this product is equal to this difference. Now, why does that help? Well, this is a limit of a product. So, by one of the limit laws, the limit of a product is the product of the limits, as long as the limits exist. And in this case, they do. So, the limit of this product is the product of the limits: it's the limit of x-a as x approaches a, times the limit of (f(x)-f(a))/(x-a). I'm only allowed to use this limit law because I know both of these limits exist. Now, this first limit, the limit of x-a as x approaches a, that's 0. And this second limit exists precisely because I'm assuming differentiability, that the function is differentiable. So, this limit is calculating the derivative at a, and zero times any number is equal to zero. The upshot here is that we've shown that the limit of f(x)-f(a) as x approaches a is 0. Why would you care about this? How does that help us? We know that the limit of f(x)-f(a) as x approaches a is equal to 0. What that means is that the limit of f(x) as x approaches a is equal to f(a), but this is just the definition of continuity. So now we know that f is continuous at the point a. That's where we ended up. Remember what we started with: we started by assuming that f was differentiable at a. And after doing all this work, we ended up concluding that f is continuous at the point a. So, differentiability implies continuity.
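Here's the whole argument collected into one chain of equalities, in case it's easier to take in at a glance:

```latex
\lim_{x \to a}\bigl(f(x)-f(a)\bigr)
  = \lim_{x \to a}\left[(x-a)\cdot\frac{f(x)-f(a)}{x-a}\right]
  = \underbrace{\lim_{x \to a}(x-a)}_{=\,0}
    \cdot
    \underbrace{\lim_{x \to a}\frac{f(x)-f(a)}{x-a}}_{=\,f'(a)}
  = 0
```

And the limit of f(x)-f(a) being 0 is exactly the statement that the limit of f(x) is f(a), which is continuity at a.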
One way to keep track of arguments like this is to think about clouds and rain. Theorem: if it is rainy, then it is cloudy. A shorter way of saying this: rainy implies cloudy. Now the question is, does it go the other way? If it's cloudy, is it necessarily rainy? Can you think of a cloudy day with no rain? Yes, today. Let's look out the window. It is very cloudy, but there's no rain. Beyond clouds and rain, let's bring this back to the mathematics. A differentiable function is continuous; can you think of a continuous function which isn't differentiable? You might want to hit pause right now if you don't want the puzzle given away. Here's an example of a function which is continuous but not differentiable: the function f(x)=|x|. We recently saw that the absolute value function isn't differentiable at zero. But how do we know that the absolute value function is continuous everywhere? We know that the absolute value function is continuous; I mean, look at it, it's all one piece. But we can do better. We can use our limit knowledge to make a more precise argument. We know that f(x)=|x| is continuous for positive inputs: it's continuous on the open interval from zero to infinity, because the function x is continuous there, and the absolute value function agrees with the function x if I plug in positive numbers. Likewise, I know that the function is continuous for negative inputs, because the function -x is continuous there, and the function -x agrees with the absolute value function on that interval. The only sticking point is to check that the function is continuous at zero. And if I know it's continuous for positive inputs, for negative inputs, and at zero, then I know that it's continuous for all inputs. Now, how do I know that the absolute value function is continuous at zero? Well, that's another limit argument, right? The limit of the absolute value function when I approach from the right-hand side is the same as the limit when I approach from the left-hand side: they're both zero. And because these two one-sided limits exist and agree, the two-sided limit of the absolute value function is equal to zero, which is also the function's value at zero. And therefore, the absolute value function is continuous at zero. In the end, there's some relationship between differentiability and continuity: differentiable functions are continuous. Mathematics isn't just a sequence of unrelated concepts. It's a single unified whole. All of these ideas are connected at the deepest possible levels.
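As a sketch, you can watch both phenomena numerically in a few lines of Python (the variable names here are mine):

```python
# One-sided difference quotients for f(x) = |x| at 0 disagree,
# so |x| is not differentiable there, even though it is continuous.
h = 1e-6
slope_from_right = (abs(0 + h) - abs(0)) / h     # -> 1
slope_from_left = (abs(0 - h) - abs(0)) / (-h)   # -> -1
print(slope_from_right, slope_from_left)         # 1.0 -1.0

# Continuity at 0: |x| approaches |0| = 0 from both sides.
print(abs(h), abs(-h))                           # both tiny
```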
What is the derivative of a constant multiple of f(x)?
Here is the question that I want to address right now: what's the derivative of a constant multiple of some function? Now, in this case the constant multiple is 2, but of course that 2 could be replaced by any fixed number. If you don't like the d/dx notation, another way to ask this question is this: if you've got some new function g, and it's the constant multiple times f, again using 2 as the constant, the question is, what's the derivative of g in terms of the derivative of f? To gain some intuition, let's pick a specific example and look at a graph. So here's the graph of just some random function. Let's suppose that I stretch this graph in the y direction. So now I stretch the y-axis, and that corresponds to multiplying the function by a constant value, in this case 2. How do the tangent lines change when I do this stretching? If I double the y-axis, the function changes by twice as much for the same input change. So if I double the y-axis, the slope of the tangent line also doubles, and that makes sense numerically. Here's what I know. I know that g(x) is twice f(x); g is this constant multiple of f. I also know something about the derivative of f. The derivative encodes how input changes become output changes, or, a bit more precisely, the derivative is, in the limit, the ratio of output change to input change. So if I multiply the ratio of output change to input change by an actual input change, this at least approximately tells me how much the output should change when I move from x to x+h. Right: f's new output at the input x+h is its old output plus how much I expect the output to change. In symbols, f(x+h) is approximately f(x) + h*f'(x). This is a really nice way to summarize what the derivative is saying. I know another thing. I know that g(x+h) is twice f(x+h), just because g is twice f for any input value x, so in particular that's true when the input is x+h. These two statements are connected. Alright: g(x+h) is twice f(x+h), and f(x+h) is approximately f(x) + h*f'(x). So I can combine those two statements into this statement: g(x+h) is about 2*(f(x) + h*f'(x)), which is 2f(x) + h*(2f'(x)). I've made this a little bit nicer. Since 2f(x) is g(x), I can replace the 2f(x) with g(x). And this is really looking good. This is telling me that g's output at x+h is about g's output at x, plus how much I change the input by, times some quantity. Now, the actual derivative of g would tell me information of exactly this shape: g's output at x+h is about g's old output plus how much I change the input by, times the derivative. You're beginning to see what's going on here, right? Look, I've got the derivative of g here, and I've got twice the derivative of f here. And if you believe that these statements are connected in this way, you might then believe that the derivative of g is twice the derivative of f. We can formalize this as a rule. Here's the constant multiple rule. Let k be a constant, and suppose that f is some function which is differentiable at the point a. g is that constant multiple of f, so g(x) = k*f(x). Given this setup, what the constant multiple rule concludes is that the derivatives are related in the same way: the derivative of g at the point a is k times the derivative of f at the point a. Now, if you don't like this prime notation, you could also write it using the d/dx notation.
So here's how I write the constant multiple rule using the d/dx notation: the derivative of k times a function is k times the derivative of that function. I encourage you to keep practicing. With time, you'll be able to calculate the derivative of just a ton of different functions.
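Here's a minimal numeric sanity check of the rule in Python, using f(x) = x^2 and k = 2; the test point and step size are my own choices:

```python
# Difference quotients for f and for k*f at the same point:
# the second should be about k times the first.
def f(x):
    return x**2

k, a, h = 2, 3.0, 1e-6
df = (f(a + h) - f(a)) / h              # approximates f'(3) = 6
dkf = (k * f(a + h) - k * f(a)) / h     # approximates (k*f)'(3)
print(df, dkf)                          # about 6, and about 12 = k * 6
```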
How do I find the derivative?
Why is the derivative of x^2 equal to 2x?
We're going to calculate the derivative of x^2 with respect to x. Maybe a little bit more prosaically, I want to know how wiggling x affects x^2. There's a ton of different ways to approach this. Let's start by looking at this numerically. So let's start off by just noting that 2 squared is 4, and I'm going to wiggle the 2 and see how the 4 wiggles. Instead of plugging in 2, let's plug in 2.01. 2.01 squared is 4.0401. And let's just keep on going with some more examples. 2.02 squared is 4.0804. 2.003 squared, say, is 4.012009. Alright, so those are a few examples. I've wiggled the inputs, and I've seen how the outputs are affected. And of course, all the outputs are close to 4, alright? But they're not exactly 4. When I wiggled the input from 2 to 2.01, the output changed by about .04, plus a little bit more, but that extra bit is a lot smaller. When I wiggled the input from 2 to 2.02, the output changed by about .08, not exactly .08, but pretty close. And when I wiggled from 2 to 2.003, the output changed by about .012, and a little bit more, but, you know, it's close. Now look at the relationship between the input change and the output change. The input changed by .01, and the output changed by about 4 times as much. The input changed by .02, and the output changed by about 4 times as much. The input changed by .003, and the output changed by about 4 times as much. I'm going to summarize that: the output change is the input change magnified by 4 times. Right? The input changed by some amount, and the output changed by about 4 times that amount. Let's see this at a different input point. Instead of plugging in 2, let's plug in 3 and see what happens. So 3^2 is 9, but what's, say, 3.1^2? That's 9.61. Or what's 3.01^2? Well, that's 9.0601. Maybe wiggle down a little bit. What's 2.99 squared? That's close to 3, but wiggling down by .01. That's 8.9401. Let's see roughly how much the output changed by. When I went from 3 to 3.1, the output changed by about .6. When I went from 3 to 3.01, the output changed by about .06. And when I went from 3 down to 2.99, the output went down by about .06 again, a little bit less. Now what's the relationship between the input change and the output change? Well, here the input changed by .1 and the output changed by .6, about 6 times as much. Again, the input changed by .01 and the output changed by about six times as much. And when the input went down by .01, the output went down by about six times as much. So again, we're seeing some sort of magnification of the output change compared to the input change, but now it's magnified not by four times but by six times. So the important lesson here is that the extent to which wiggling the input affects the output depends on where you're wiggling. If you're wiggling around 2, the output is being changed by about four times as much. If you're wiggling around 3, the output is being changed by about six times as much. Instead of doing just a few numerical examples, let's generalize this by doing some algebra. So, I'm starting with x^2, and I'm going to wiggle x and see how x^2 is affected. Instead of plugging in x, I'll plug in x plus something; let's call the change in x, h. Now I want to know, how is (x+h)^2 related to x^2? Well, I can expand out (x+h)^2: that's x^2+2xh+h^2. So when I wiggle the input from x to x+h, how is the output being affected? Well, the output is the old output value, x^2, plus this change in output value, 2xh+h^2. When h is small, h^2 is really small, so I'm going to throw that away for now.
And I'll just summarize this by saying that the output change is 2xh and the input change is h. Now, the derivative is supposed to measure the relationship between the output change and the input change. So I'm going to take the ratio of the output change to the input change, and 2xh/h = 2x, as long as h isn't 0. This is the ratio of output change to input change, and that makes sense, right? Think back to what just happened a minute ago, when we were plugging in some nearby values and seeing how the outputs were affected. When I was wiggling the input around 2, the output was changing by about twice 2. When I was wiggling the input around 3, the output was changing by about twice 3, alright? 2x is the ratio of output change to input change. If the algebra's not really speaking to you, we can also do this geometrically, by drawing a picture. Here's a square of side length x. The area of this square is, not coincidentally, x^2. Now I want to know the derivative of x^2 with respect to x. I want to know how changing x would affect the area of this square. To see this, here is another square, a slightly larger square of side length x+h, where h is a small but positive number. So how does the area of this new square compare to the area of the old square? Let me put the old square on top of the new square, and you can see that when I change the input from x to x+h, I gain a bit of extra area. The derivative is recording the ratio of output change to input change. So, I want to know what's the ratio of this new area compared to the change in the input, h. So, let me pull off the extra area. That extra area is this L-shaped region. How big is this L-shaped region? Well, this short side here has length h. This side here is also h; this is the extra length that I added when I went from x to x+h. This inside edge has length x, and this other inside edge has length x. Now I want to know the area of this region. To see that, I'm going to get out my scissors and cut this region up into three pieces. Here's one of those pieces, here's another one, and here's the third piece. Two of them are long, thin rectangles, and they've both got height h and length x. I'm also left with this little tiny corner piece, and that little tiny corner piece has side length h, and the other side is also length h: it's a little tiny square. In the limit, this little tiny corner piece is negligible, so I'm going to throw it away; most of the area is in the two long, thin rectangles. If I rearrange these long, thin rectangles a bit, I can put them end to end. They've both got height h, so I can put them next to each other like this, and their bases both have length x. So how much area is in this long, thin rectangle? Well, its height is h and its width is 2x, so the area is 2x*h. Now, this is the additional area, except for that little tiny square, which we gained when I changed the size of the square from x to x+h. So the change in output is about 2*x*h. The change in input was h, so the ratio of output change to input change is 2x. Maybe what we're doing here seems a little bit wishy-washy, not really precise enough. But we can also calculate the derivative of x^2 with respect to x by just going back to the definition of derivative in terms of limits. Carefully: f(x) is x^2, and the derivative of f is, by definition, the limit as h approaches 0 of (f(x+h) - f(x))/h, the change in output divided by h, the change in input.
In this case, f(x+h) is just (x+h)^2 and f(x) is just x^2, and I'm dividing by h. I can expand this out: this is the limit as h approaches 0 of (x^2+2xh+h^2-x^2)/h. Now I've got an x^2 and a -x^2, so I can cancel those, and I'm just left with the limit, as h approaches 0, of (2xh+h^2)/h. More good news: in the limit, I'm never going to be plugging in h=0, so I can replace this with an equivalent function that agrees with it when h is close to, but not equal to, 0. In other words, maybe a little bit more simply, I'm canceling an h from the numerator and the denominator. So, 2xh over h is just 2x, and h^2 over h is just h. Now, what's the limit of this sum? Well, that's the sum of the limits. It's the limit of 2x as h approaches 0, plus the limit of h as h approaches 0. Now, as far as wiggling h is concerned, 2x is a constant, so the limit of 2x as h approaches 0 is just 2x. And what's the limit of h as h approaches 0? Well, what's h getting close to when h is close to 0? That's just 0. So, this limit is equal to 2x, and that's the derivative of x^2. What that limit is really calculating is the slope of the tangent line at the point x, and we can see that it's working. This is the graph of y=x^2. At -4, the slope of the tangent line is -8. At 2, the slope of the tangent line is 4. And at 6, the slope of the tangent line is 12. There's a ton of different perspectives here. We've been thinking about the derivative of x^2 with respect to x numerically, algebraically, geometrically, going back to the definition of derivative in terms of limits, looking at it in terms of slopes of tangent lines. What makes derivatives so much fun is that there are just so many different perspectives on this single topic. No matter how you slice it, we've shown that the derivative of x^2 with respect to x is 2x. Maybe you like algebra, maybe you like geometry, maybe you just like to play with numbers. But no matter what your interests are, derivatives have something to offer you.
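If you want to replay the numerical experiment from above in Python, here's a sketch; the helper name ratio is mine:

```python
# The ratio of output change to input change for f(x) = x^2
# settles down to 2x as the wiggle h shrinks.
def ratio(x, h):
    return ((x + h)**2 - x**2) / h

for x in [2, 3]:
    for h in [0.1, 0.01, 0.001]:
        print(x, h, ratio(x, h))
# at x = 2 the ratios approach 4; at x = 3 they approach 6.
```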
What is the derivative of x^n?
Here is the so-called power rule for differentiating x^n: the derivative of x^n is n*x^(n-1). What's n? n can be any real number except for zero. You should think about why you don't want to plug in zero for n. For the time being, we're just going to think about this when n is a positive whole number, but even there, it's pretty tricky. Admittedly, when n=1, you're probably going to be pretty unimpressed. Nevertheless, here we go. When n=1, the derivative of x^1, which is just x, is equal to 1. This should make sense, because what's the derivative measuring? The derivative is measuring output change compared to input change. And, in this case, the function is just the function that sends x to x. The input and the output are exactly the same, so the input change and the output change are exactly the same, and their ratio is just 1. Consequently, the derivative of x, the derivative of the identity function, is 1. When n=2, that means we're differentiating x^2, which we studied a little while ago. Here's the power rule: if I plug in 2 for n, I've got that the derivative of x^2 is 2*x^(2-1), or, a bit more nicely written, the derivative of x^2 is 2x. And remember, we really did study this in quite some detail: algebraically, numerically, geometrically. When n=3, we can still study the derivative of x^3 in a geometric way. So, here's the power rule: you plug in n=3, and you get that the derivative of x^3 is 3*x^(3-1), that is, 3x^2. We can see this geometrically. We start with a cube of side length x, and we're going to glue on three green slabs with dimensions x by x by h. Now, in order to actually thicken up the cube, we've got to glue on a few more pieces, these blue pieces and this red corner piece. But once we've done that, we've built a cube of side length x+h. How has the volume changed? Well, most of the change in volume happened in these three green slabs, and those three green slabs have volume 3x^2*h. The change in the side length of the cube is h. So this geometric argument is showing us that the derivative of x^3 is 3*x^2. When n=4, we're trying to differentiate x^4, but that would involve not a cube, but a hypercube. It seems a bit ridiculous to try to gain intuition about the derivative of x^4 by doing something as esoteric as studying 4-dimensional geometry. So instead, let's differentiate x^4 directly by going back to the definition of derivative. So, let's proceed directly. I want to compute the limit as h approaches 0 of ((x+h)^4 - x^4)/h.
What is this computing? This is the limit of the difference quotient; this is the derivative of x^4 at the point x. Now, to proceed, I'm going to make this a little bit smaller; it's a bit too big to work with. This is the limit I'm trying to calculate. The first step is to expand out (x+h)^4. And if I expand (x+h)^4, this is what I get: h^4 + 4h^3*x + 6h^2*x^2 + 4h*x^3 + x^4. And now, you'll notice something very exciting. I've got an x^4 and a -x^4, so I can cancel those two terms, and I'll be left with the limit of everything else: (h^4 + 4h^3*x + 6h^2*x^2 + 4h*x^3)/h.
But more good news: every single term up in the numerator here has an h in it. So, I can cancel those h's without affecting the limit, and this limit is the same as the limit of h^3 + 4h^2*x + 6h*x^2 + 4x^3. Why? Well, look: h^4/h gives me the h^3, 4h^3*x/h gives me the 4h^2*x, and so forth. Now, we're practically there. I want to evaluate this limit. Most of these terms have got an h in them, so when I take the limit, those terms all go to 0. The only term that survives is this one, which, as far as h is concerned, is a constant. It's the limit of 4x^3 as h approaches 0, and that's just 4x^3. And because this whole mess is calculating the derivative of x^4, what I've really done here is shown, from the definition of derivative, that the derivative of x^4 is 4x^3. This limit calculation is perhaps complicated enough to give us a glimpse into the whole story. What's the derivative of x^n? We're trying to show that the derivative of x^n is n*x^(n-1). And to do that, we go back to the definition of derivative and try to calculate this limit: the limit as h goes to 0 of ((x+h)^n - x^n)/h.
Just like the case when n was 4, the first step is to expand this out. But here, it's a bit trickier, right? To expand out (x+h)^n, I don't know exactly what n is; n is just some positive whole number, so I can't write down exactly what the expansion is. But I can write down enough of it to get a sense of what's going on in the story: (h^n + n*h^(n-1)*x + ... + n*h*x^(n-1) + x^n - x^n)/h, where hidden in the dot, dot, dot are all kinds of other terms that have h's in them. Just like before, I've got an x^n and a -x^n, so I can cancel those. And now, I'm left with just these terms, still a bunch of terms with h's in them. And note that every single term in the numerator has an h, so I can do the division just like before. The h^n/h becomes h^(n-1), and the n*h^(n-1)*x/h becomes n*h^(n-2)*x.
Everything in the dot, dot, dot here has at least an h^2 in it, so when I divide by h, everything that's left over still has at least one h in it. This last term, n*h*x^(n-1)/h, becomes n*x^(n-1) after I divide by h. And now, look. This is a limit. As h approaches 0, this term dies, this term dies, all of these terms with h's in them die. The only thing that's left is this term here, n*x^(n-1), and that means that this entire limit is equal to n*x^(n-1). This limit is calculating the derivative of x^n. So, what we've really managed to do is show that the derivative of x^n is n*x^(n-1).
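Here's that whole computation written out in one display, so you can see the shape of the argument at a glance:

```latex
\frac{d}{dx}\,x^n
  = \lim_{h \to 0} \frac{(x+h)^n - x^n}{h}
  = \lim_{h \to 0} \frac{h^n + n h^{n-1} x + \cdots + n h x^{n-1} + x^n - x^n}{h}
  = \lim_{h \to 0} \left( h^{n-1} + n h^{n-2} x + \cdots + n x^{n-1} \right)
  = n x^{n-1}.
```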
What is the derivative of x^3 + x^2?
You've heard of the "ask not what your country can do for you, but what you can do for your country" rule. There are chiastic rules like that for limits: the limit of the sum is the sum of the limits. The same thing is true for derivatives. Let's go to the board. Here's the rule for derivatives: the derivative of f + g is the derivative of f plus the derivative of g. In short, the derivative of the sum is the sum of the derivatives. Why does this make sense? Well, think back to what the derivative is measuring. The derivative is measuring how changing the input affects the output. In this case, I want to know how changing x affects the sum of f(x) and g(x). Well, the sum is affected by the sum of the effects: the sum of the derivative of f and the derivative of g. Let's see this in a specific case. Here's a specific case: the function f(x) = x^3 + x^2. Let's differentiate this. I'm going to calculate d/dx (x^3 + x^2).
Now, this is a derivative of a sum, which is the sum of the derivatives. So we have to figure out what's the derivative of x^3 and what's the derivative of x^2. That's the power rule: the derivative of x^3 is 3x^2, and the derivative of x^2 is 2x. And now there are no more d/dx's; we've calculated the derivative. The derivative of x^3 + x^2 is 3x^2 + 2x. Once we know this, we can figure out where the derivative is positive and where it's negative. So, the derivative was 3x^2 + 2x, and I want to know where that's positive and where it's negative: which values of x make it bigger than 0, and which values of x make it less than 0. One approach to thinking about this is to factor the 3x^2 + 2x: I can write it as x*(3x+2). And once I factor it like this, I can figure out the sign of the product by figuring out the signs of these two factors separately. I can visualize this by drawing number lines. So, for x: here's a number line. x is positive when x is bigger than 0 and negative when x is less than 0; that's not too complicated. What about 3x+2? Well, I draw a number line for that, too. The exciting point is -2/3. When x is less than -2/3, 3x+2 is negative, and when x is bigger than -2/3, 3x+2 is positive. Now, I don't really care about x and 3x+2 separately; I want to put them together, right? I want to know when their product is positive or negative. So, I write down the product x*(3x+2) and make a new number line. I'll record both of these points, -2/3 and 0, and then I can think about what happens. When x is less than -2/3, then x is negative and 3x+2 is negative, so the product is a negative times a negative, which is positive. When x is between -2/3 and 0, then x is negative but 3x+2 is positive, and a negative number times a positive number is negative. And finally, when x is bigger than 0, well, then x is positive and also 3x+2 is positive, so the product is positive. So, here on this number line, I've recorded the information about when 3x^2 + 2x is positive or negative. Now we can use this information to say something about the graph. Here's the graph of the function x^3 + x^2. It goes up, down, and up, and that's exactly what you'd expect from the derivative, right? We calculated before that if you're standing to the left of -2/3, the derivative is positive, and indeed the function's going up. Once you get to -2/3, the derivative is 0, but then over here, between -2/3 and 0, the derivative is negative, and indeed the graph is moving down, until you get to 0, where the derivative of this function is positive again and the graph is going up. Look, the sign of the derivative, positive, negative, positive, is reflected in the direction that this graph is moving: increasing, decreasing, increasing. Incredible. By being able to differentiate x^3 + x^2, we're able to gain real insight into the graph of the function. We're not just plotting a whole bunch of points and hoping that we can fill in the gaps. By looking at the derivative, we know where the function is increasing and decreasing. We're able to say something for sure.
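Here's a little Python sketch of that sign analysis, sampling one point from each interval; the sample points are my own choices:

```python
# Sample the sign of the derivative 3x^2 + 2x on the three intervals
# cut out by its roots x = -2/3 and x = 0.
def fprime(x):
    return 3 * x**2 + 2 * x

for x in [-1.0, -0.5, 1.0]:   # one test point per interval
    print(x, "+" if fprime(x) > 0 else "-")
# prints +, -, +: increasing, then decreasing, then increasing,
# matching the up-down-up shape of the graph of x^3 + x^2.
```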
Why is the derivative of a sum the sum of derivatives?
Suppose I've got two functions, f(x) and g(x), and they're both differentiable at a. Then I can define a new function, h(x), which is the sum of f and g. Alright, it's a new function: to compute h(x), I just plug x into f and I plug x into g and I add together whatever f and g give me. So that's a new function that I build from f and g. Now here's the conclusion, right? Then h'(a) is just the sum of the derivative of f at a and the derivative of g at a. And to prove something like this, well, this really is a theorem, right? This is a theorem that tells me how to compute the derivative of the sum of functions. And how do I prove something like this? I just go back to the definition of derivative. Alright. The derivative of this function h at the point a is the limit as x goes to a of (h(x)-h(a))/(x-a). Now, I know what h(x) is: h(x) is f(x) + g(x). So I can plug that in. Alright, so this is the limit as x goes to a of f(x)+g(x), minus h(a), all over x-a.
And I also know what h(a) is, right? I just plug in a for x, and I get that h(a) is f(a)+g(a). And this is all divided by the same denominator, x-a. Great. I want to calculate that limit. Well, I can rearrange the numerator. The numerator is f(x) + g(x) - f(a) - g(a), but I can rearrange it to get (f(x)-f(a)) + (g(x)-g(a)), and this is divided by x-a. Now what do I do? Well, I can split this up into two separate fractions: this is (f(x)-f(a))/(x-a) + (g(x)-g(a))/(x-a).
That's the limit as x goes to a. How do I calculate that limit? Okay, I'm just applying the rules for calculating limits, and one of those rules is that the limit of the sum is the sum of the limits, provided the limits exist. What are these two limits? Well, this one is really calculating the derivative of f at a, and this one is really calculating the derivative of g at a. And I assumed that f and g are both differentiable at a, so those limits do exist, and I can replace the limit of the sum with the sum of the limits. So this equals the limit as x goes to a of (f(x)-f(a))/(x-a), plus the limit as x goes to a of (g(x)-g(a))/(x-a), because I know those two limits exist. And I even know what they're equal to, right? I have a name for those two limits. The first limit is the derivative of f at a, and the second limit is the derivative of g at a. So this is f'(a) + g'(a), and that's exactly what I wanted to show, right? I wrote down the definition of the derivative of h at the point a, and I applied properties of limits until I concluded that that limit is equal to the derivative of f at a plus the derivative of g at a. And this is what tells me how to calculate the derivative of a sum. If I've got a sum of two functions, this tells me that as long as those two functions are both differentiable at a, I can calculate the derivative by just adding together the derivatives of f and g. And hopefully this should seem reasonable, right? Because what is the derivative measuring? It's measuring how much a change in the input changes the output. I want to know how much wiggling the input a would affect the output of h, and that's what this derivative is measuring. Well, that's really going to be connected to how wiggling the input to f changes f and how wiggling the input to g changes g, and I'm just adding those together. So it makes sense that how the output of h changes is the sum of how the two component functions change.
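Here's the whole proof compressed into one chain, assuming f'(a) and g'(a) both exist:

```latex
h'(a)
  = \lim_{x \to a} \frac{\bigl(f(x)+g(x)\bigr)-\bigl(f(a)+g(a)\bigr)}{x-a}
  = \lim_{x \to a} \left[\frac{f(x)-f(a)}{x-a} + \frac{g(x)-g(a)}{x-a}\right]
  = f'(a) + g'(a).
```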
How do we compute derivatives?
I'm gonna say a little bit about how this course as a whole is structured. We're really covering three topics: limits, derivatives, and integrals. And for each of those three topics, we're looking at three different things. We're looking at the definition, the sort of concepts behind it. We're looking at techniques, sort of how we work with those concepts. And then we're looking at applications, what we can do with it. So, take a look at limits, alright. We learned about the definitions and the concepts behind limits first, and then we learned some techniques for doing computations involving limits, say, using a little bit of algebra or thinking about infinity.
Well, the reason that we care about limits is because of the derivative, right? We used limits to define the derivative and to introduce some of the concepts of the derivative.
And now that we've introduced the concept of the derivative, we wanna go into the techniques or the computational questions. How do you actually compute a derivative? And in order to do those kinds of computations, we have to think about how would you calculate the derivative of a product of two things, if you knew the derivative of those two things. Or how would you calculate the derivative of a fraction if you knew how to differentiate the numerator and the denominator separately? Those are the kinds of questions that'll occupy us now.
What is the derivative of f(x) g(x)?
What's the derivative of a product of two functions? The derivative of a product is given by this, the product rule: the derivative of f times g is the derivative of f, times g, plus f, times the derivative of g. There are a bunch of things to be warned about here. This is the product of two functions, but the derivative involves the sum of two different products. It's the derivative of the first times the second, plus the first times the derivative of the second. Let's see an example of this rule in action. For example, let's work out the derivative of this product, the product of 1+2x and 1+x^2.
Alright, well, here we go. This is a derivative of a product, so by the product rule, I'm going to differentiate the first thing and multiply by the second, and add that to the first thing times the derivative of the second. So, it's the derivative of the first term in the product, 1+2x, times the second term, 1+x^2, plus the first function, 1+2x, times the derivative of the second. So, that's an instance of the product rule. Now, the derivative of 1+2x is the derivative of a sum, which is the sum of the derivatives. So, it's (the derivative of 1 plus the derivative of 2x) times 1+x^2, plus 1+2x
times the derivative of 1+x^2, again a derivative of a sum, which is the sum of the derivatives. Now, the derivative of 1, that's the derivative of a constant function, which is just 0, and the derivative of 2x is the derivative of a constant multiple, so I can pull that constant multiple out of the derivative. So I've got (0 plus 2 times the derivative of x) times 1+x^2, plus 1+2x times (the derivative of 1, which is 0, it's the derivative of a constant, plus the derivative of x^2, which is 2x).
Alright. Now, I've got 0+2 times the derivative of x. The derivative of x is just 1. So, that's just 2*1*(1+x^2)+(1+2x)*(0+2x).
So, there it is. I could maybe write this a little bit more neatly. 2*(1+x^2)+(1+2x)*2x. This is the derivative of our original function (1+2x)*(1+x^2).
We didn't really need the product rule to compute that derivative, though. So, instead of using the product rule, I'm going to first multiply this out and then do the differentiation. Here, watch. This is the same derivative, but I'm going to multiply everything out first, alright? So, it's 1, plus 2x^3 (which is what I get when I multiply 2x by x^2), plus x^2 (which is 1*x^2), plus 2x (which is 2x*1). So now I can differentiate this without using the product rule, right? This is the derivative of a big sum, so it's the sum of the derivatives: the derivative of 1, the derivative of 2x^3, the derivative of x^2, and the derivative of 2x. Now, the derivative of 1, that's the derivative of a constant, so that's just 0. For the derivative of this constant multiple of x^3, I can pull out the constant multiple. The derivative of x^2 is 2x, and for the derivative of 2*x, I can again pull out the constant multiple. Now, what's 2 times the derivative of x^3? That's 2 times 3x^2. Plus the 2x, plus 2 times the derivative of x, which is 2*1. And then, I could write this maybe a little bit more nicely: this is 6x^2+2x+2. So, this is the derivative of our original function. Woah. What just happened? I'm trying to differentiate (1+2x)*(1+x^2).
When I just used the product rule, I got this: 2*(1+x^2)+(1+2x)*(2x). When I expanded and then differentiated, I got this: 6x^2+2x+2. So, are these two answers the same? Yeah, these two answers are the same. Let's see how. I can expand out the first answer: 2*(1+x^2) is 2+2x^2, 1*2x is 2x, and 2x*2x is 4x^2.
Now look: 2x^2+4x^2 gives me the 6x^2, 1*2x gives me this 2x here, and there's the constant 2. These are, in fact, the same. Should we really be surprised by this? I mean, I did do these things in a different order. In the first case, I differentiated using the product rule and then I expanded what I got. In the second case, first I expanded, and after doing the expansion, then I differentiated. More succinctly: in the first case, I differentiated then expanded; in the second case, I expanded then differentiated. Look, you'd think the order would matter. Usually, the order does matter. If you take a shower and then get dressed, that's a totally different experience from getting dressed and then stepping into the shower. The order usually does matter, and you'd think that differentiating and then expanding would do something really different than expanding and then differentiating. But you've got real choices when you do these derivative calculations, and yet somehow, mathematics is conspiring so that we can all agree on the derivative, no matter what choices we might make on our way there. And I think we can also all agree that that's pretty cool.
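You can watch the two orders agree symbolically with a short sketch. I'm reaching for the sympy library here, which the lecture doesn't use, so treat this as just one possible way to check:

```python
# Differentiate-then-expand versus expand-then-differentiate.
import sympy as sp

x = sp.symbols('x')
f = (1 + 2*x) * (1 + x**2)

path1 = sp.expand(sp.diff(f, x))     # product rule first, then expand
path2 = sp.diff(sp.expand(f), x)     # expand first, then differentiate
print(path1, path2, path1 == path2)  # both are 6*x**2 + 2*x + 2
```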
Morally, why is the product rule true?
We've used the product rule to calculate some derivatives. We've even seen a proof using limits, but there's still this nagging question: why? For instance, why is there this plus sign in the product rule? I mean, really, with all those chiastic laws, the limit of a sum is the sum of the limits, the limit of a product is the product of the limits, you'd probably think the derivative of a product is the product of the derivatives; you'd think that if you differentiated a product, it'd just be the product of the derivatives. No, that's not how products work. What happens when you wiggle the terms in a product? We can explore this numerically, so let's play around. I've got a number a and another number b, and I'm multiplying them together to get some new number, ab. Initially, I've set a=2 and b=3, so ab=6. But now I can wiggle the terms and see how that affects the output. What if I take a and move it from 2 to 2.1? Well, that affects the output: the output is now 6.3. Conversely, what if I move that back down and instead move b from 3 to 3.1? Well, that moves the output from 6 to 6.2. The deal here is that wiggling one term affects the output by a magnitude that's related to the size of the other number, right? When I moved a from 2 to 2.1, the output was affected by about three times as much, the 3. When I moved b from 3 to 3.1, the output was affected by about two times as much. And these effects add together: what if I simultaneously move a from 2 to 2.1 and move b from 3 to 3.1? Then the output is 6.51, which is close to 6.5, which is what you'd guess the answer would be if you just add together these effects. We can see the same thing geometrically. Geometrically, the product is really measuring an area. So let me start with a rectangle of base f(x) and height g(x). The product of f(x) and g(x) is then the area of this rectangle. Now, I want to know how this area is affected when I wiggle from x to, say, x+h. So let's suppose that I slightly change the size of the rectangle, so that now the base isn't f(x) anymore, it's f(x+h), and the height isn't g(x) anymore, it's g(x+h). Now, how does the area change when the input goes from x to x+h? Well, that's exactly computing the area of this L-shaped region here. I can do that approximately. I know approximately how much the base changes, by using the derivative, right? What's this length here, approximately? Well, the derivative of f at x, times the input change, is an approximation to how much the output changes when I go from x to x+h. So this distance is approximately f'(x) times h. Same deal over here: when the input goes from x to x+h, the height is changed by approximately the derivative times the input change, so this length here is about g'(x) times h. Now, I'm trying to compute the area of this L-shaped region to figure out how the area, the product, changes when I go from x to x+h. Let me cut this L-shaped region up into three pieces. The corner piece is pretty small, so I'm going to end up disregarding it. But let's just look at these two big pieces here. This piece here is a rectangle, and what's its area? Well, its base is f(x) and its height is g'(x) times h. So the area of this piece is f(x) times g'(x) times h. What's the area of this rectangle over here?
Well, its base is f'(x) times h, and its height is g(x), so the area of this piece is f'(x) times g(x) times h. Now, I want to know how the area changed when I went from x to x+h. Well, that's pretty close to the sum of these two rectangles. So the change in area is about f(x)*g'(x)*h + f'(x)*g(x)*h. The derivative is the ratio of output change, which is about this, to input change, which in this case is h; I went from x to x+h. So now, I can cancel the h's, and what I'm left with is f(x)*g'(x) + f'(x)*g(x). That's the product rule. That's the change in the area of this rectangle when I went from x to x+h, divided by how much I changed the input, h. The product rule isn't something that we just made up. It's not some sort of sinister calculus plot designed to turn your mathematical dreams into nightmares. This rule, the product rule, arises for understandable reasons. If you wiggle one of the terms in a product, the effect on the product has to do with the size of the other term. You add together these two effects, and then you have some idea as to how the product changes based on how the terms change. This is more than just a rule to memorize. It's more than just an algorithm to apply. The product rule is telling you something deep about how a product is affected when its terms are changed.
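Here's the numerical wiggle experiment from the start of this section as a few lines of Python, with a = 2 and b = 3 as in the lecture:

```python
# Wiggle each factor of a product and compare against the
# product-rule prediction: the change in a*b is about b*da + a*db.
a, b, da, db = 2.0, 3.0, 0.1, 0.1

print((a + da) * b)          # 6.3: wiggling a is magnified by b = 3
print(a * (b + db))          # 6.2: wiggling b is magnified by a = 2
print((a + da) * (b + db))   # 6.51, close to 6 + b*da + a*db = 6.5
```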
How does one justify the product rule?
We can derive the product rule by just going back to the definition of derivative. So what does the definition of derivative say? It tells us that the derivative of the product of f(x) and g(x) is a limit: the limit as h approaches zero of the function at x+h, which, in this case, is the product of f and g, both evaluated at x+h, because I'm thinking of the product as the function, so I'm plugging in x+h, minus the function evaluated at x, which is just f(x)*g(x), all divided by h. So, it's this limit of this difference quotient that gives me the derivative of the product. How can I evaluate that limit? Here's the trick: I'm going to add a disguised version of zero to this limit. Instead of just calculating the limit of (f(x+h)*g(x+h) - f(x)*g(x))/h, I'm going to subtract and add the same thing. So here, I've got f(x+h)*g(x+h), just like up here. Now I'm going to subtract f(x+h)*g(x), and then add it back in: minus f(x+h)*g(x), plus f(x+h)*g(x). This is just zero; I haven't done anything. And then I subtract f(x)*g(x), right here, and I'm still dividing by h. So these are the same limits; I haven't really done anything, but I've actually done everything I need. By introducing these extra terms, I've now got a common factor of f(x+h) here and a common factor of g(x) here. So, I can collect those out, and I'll get some good things happening as a result. Let's see exactly how this happens. So this is the limit as h goes to zero of the following: I pull out the common factor of f(x+h) and multiply it by what's left over, g(x+h)-g(x), which I can put over h. So that's these two terms. Now, what's left over here? I've got a common factor of g(x), and what's left over is f(x+h)-f(x), which I'll divide by h, and then the factor I pulled out is g(x). So this limit is the same as this limit. Now, this is a limit of a sum, so that's the sum of the limits, provided the limits exist, and we'll see that they do. So this is the limit as h goes to zero of f(x+h)*(g(x+h)-g(x))/h, plus the limit as h goes to zero of ((f(x+h)-f(x))/h)*g(x).
Now, what do I have here? I've got limits of products, which are the products of the limits, provided the limits exist, and we'll see that they do. So let's rewrite these limits of products as products of limits. This is the limit as h goes to zero of f(x+h), times the limit as h goes to zero of (g(x+h)-g(x))/h, plus, and you might begin to see what's happening here, the limit as h goes to zero of (f(x+h)-f(x))/h, times the limit as h goes to zero of g(x). Okay, now we've got to check that all these limits exist, in order to justify replacing the limits of products with products of limits. But these limits do exist; let's see why. This first limit, the limit of f(x+h) as h goes to zero, is actually the hardest one of all of these to see, I think. Remember back, we showed that differentiable functions are continuous. This is really calculating the limit of f of something, as that something approaches x. And because f is continuous, because f is differentiable, this limit is actually just f(x). But I think seeing that step is probably the hardest in this whole argument. What's this thing here? Well, this is the limit that calculates the derivative of g, and g is differentiable by assumption. So, this is the derivative of g at x. Plus, what's this limit? This is the limit that calculates the derivative of f, and f is differentiable by assumption, so that's f'(x). And the last one is the limit of g(x) as h goes to zero. This is the limit of a constant; wiggling h doesn't affect it at all, so that's just g(x). And look at what we've calculated here. The limit that calculates the derivative of the product is f(x)*g'(x) + f'(x)*g(x): that is the product rule. What have we really shown here? Well, here is one way to write down the product rule very precisely. Confusingly, I'm going to define a new function that I'm calling h. So h is just the product of f and g now: h(x) = f(x)*g(x). If f and g are differentiable at some point a, then I know the derivative of their product. The derivative of their product is the derivative of f times the value of g, plus the value of f times the derivative of g. This is a precise statement of the product rule, and you can really see, for instance, where the differentiability condition was necessary. At some point in our proof, I wanted to go from a limit of a product to a product of limits. But in order to do that, I needed to know that this limit exists, and that limit is exactly calculating the derivative of g. So you can really see where these conditions play a crucial role in the proof of the product rule.
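The heart of the proof, the add-and-subtract trick, fits in one display:

```latex
\frac{f(x+h)\,g(x+h) - f(x)\,g(x)}{h}
  = f(x+h)\,\frac{g(x+h)-g(x)}{h} \;+\; \frac{f(x+h)-f(x)}{h}\,g(x)
```

And letting h go to 0 turns the right-hand side into f(x)*g'(x) + f'(x)*g(x).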
What is the quotient rule?
Given what we've done so far, we can differentiate a bunch of functions. We can differentiate sums and differences and products. But what about quotients? Given a fraction, I'd like to be able to differentiate that fraction. I'd like to be able to differentiate a really complicated-looking function like f(x) = (2x+1)/(x^2+1), for instance. But we're stuck immediately, because we don't have any way to differentiate quotients, until now. Here's the quotient rule. To state this really precisely, let's suppose I've got two functions f and g, and then I define a new function that I'm just going to call h for now: h(x) is the quotient f(x)/g(x). Now, I also want to make sure that the denominator isn't 0 at the point a, so it makes sense to evaluate this function at the point a. And I want to assume that f and g are differentiable at the point a; I'm trying to understand how h changes, so I'm going to need to know how f and g change when the input wiggles a bit. Alright, so given all this setup, I can tell you what the derivative of the quotient is: the derivative of the quotient at a is the denominator at a, times the derivative of the numerator at a, minus the numerator at a, times the derivative of the denominator at a, all divided by the square of the denominator at a. Let's use the quotient rule to differentiate the function that we saw earlier. So, the function we were thinking about is f(x) = (2x+1)/(x^2+1), and I want to calculate the derivative of that with respect to x. Now, the derivative of this quotient is given to us by the quotient rule. It's the denominator, times the derivative of the numerator, minus the numerator, times the derivative of the denominator, all divided by the denominator squared. Now, I've written the derivative of this quotient in terms of the derivatives of the numerator and denominator, so we can simplify further. It's x^2+1, times the derivative of a sum, which is the sum of the derivatives, the derivative of 2x plus the derivative of 1, minus 2x+1, times, again, the derivative of a sum, the derivative of x^2 plus the derivative of 1, all divided by the original denominator squared. I can keep going. I've got x^2+1 times, what's the derivative of 2x? It's just 2. And what's the derivative of this constant 1? Zero. Minus 2x+1 times, what's the derivative of x^2? It's 2x. And what's the derivative of 1? It's the derivative of a constant, zero. All divided by (x^2+1)^2. So, this is the derivative of the original function we were considering; there's no more differentiation to be done, and we did it using the quotient rule. We've done a ton of work on differentiation so far: we can differentiate sums, differences, products, and now quotients. What sorts of functions can we differentiate using all of these rules? Well, here's one big collection. If you've got a polynomial divided by a polynomial, these things are called rational functions, sort of by analogy with the rational numbers, which are integers over integers. A polynomial over a polynomial is, by analogy, called a rational function. Now, since this is just a quotient of two things you can differentiate, you can differentiate these rational functions. This is a huge class of functions that you can now differentiate. I encourage you to practice with the quotient rule. With some practice, you'll be able to differentiate any rational function that we can throw at you.
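If you want to double-check the computation above, here's a sketch using the sympy library (again, my choice of tool, not the lecture's):

```python
# Compare sympy's derivative of (2x+1)/(x^2+1) with the
# quotient-rule answer computed by hand above.
import sympy as sp

x = sp.symbols('x')
f = (2*x + 1) / (x**2 + 1)

by_hand = ((x**2 + 1)*2 - (2*x + 1)*(2*x)) / (x**2 + 1)**2
print(sp.simplify(sp.diff(f, x) - by_hand))  # 0: they agree
```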
What is the meaning of the derivative of the derivative?
Thus far, I've been trying to sell you on the idea that the derivative of f measures how wiggling the input affects the output. A very important point is that sensitivity to the input depends on where you're wiggling the input. Here's an example. Think about the function f(x)=x^3. f(2), which is 2^3, is 8. f(2.01), which is 2.01 cubed, is 8.120601. So, the input change of 0.01 was magnified by about 12 times in the output. Now, think about f(3), which is 3^3, which is 27. f(3.01) is 27.270901, so the input change of 0.01 was magnified by about 27 times. This input change and this input change were magnified by different amounts. You shouldn't be too surprised by that; the derivative, of course, measures this. The derivative of this function is 3x^2, so the derivative at 2 is 3*2^2, which is 3*4, which is 12, and not coincidentally, there's a 12 here and there's a 12 here, reflecting the sensitivity of the output to the input change. And the derivative of this function at 3 is 3*3^2, which is 3*9, which is 27, and again, not too surprisingly, here's a 27. The point is just that how much the output is affected depends on where you're wiggling the input. If you're wiggling around 2, the output is affected by about 12 times as much; if you're wiggling around 3, the output is affected by about 27 times as much. The derivative isn't constant everywhere; it depends on where you're plugging in. We can package together all of those ratios of output changes to input changes as a single function. What I mean by this is, f'(x) is the limit as h goes to 0 of (f(x+h)-f(x))/h. And this limit doesn't just calculate the derivative at a particular point. This is actually a rule for a function. The function is f'(x), and this tells me how to compute that function at some input x. The derivative is a function. Now, since the derivative is itself a function, I can take the derivative of the derivative. I'm often going to write the second derivative, the derivative of the derivative, this way: f''(x). There are some other notations that you'll see in the wild as well. So, here's the derivative of f, d/dx f(x). If I take the derivative of the derivative, this would be the second derivative, but I might write it a little bit differently. I could put these two d's together, so to speak, and these dx's together, and then I'll be left with d^2/dx^2 f(x), the second derivative of f(x). A subtle point here: if f were maybe called y, you might see this written down as d^2y/dx^2, and sometimes people are tempted to write dy^2 in the denominator, but that's not right; it's d^2y/dx^2 for the second derivative of y. The derivative measures the slope of the tangent line, geometrically. So, what does the second derivative measure? Well, let's think back to what the derivative is measuring. The derivative is measuring how changes to the input affect the output. The derivative of the derivative measures how changing the input changes how changing the input changes the output, and I'm not just repeating myself here; that's really what the second derivative is measuring. It's measuring how the input affects how the input affects the output. If you say it like that, it doesn't make a whole lot of sense.
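One way to get a feel for it is to recheck the numbers from the x^3 example. Here's a minimal sketch of my own, not from the lecture, computing the wiggle magnification near 2 and near 3 and comparing it with the derivative 3x^2.

```python
# Reproduce the wiggle computation from the text: how an input change of
# 0.01 to f(x) = x^3 is magnified, compared with the derivative 3x^2.
def f(x):
    return x**3

for x in (2, 3):
    magnification = (f(x + 0.01) - f(x)) / 0.01
    print(x, magnification, 3*x**2)
# near x = 2 the change is magnified about 12 times; near x = 3, about 27 times
```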
Maybe a geometric example will help convey what the second derivative is measuring. Here's a function, y=1+x^2. I've drawn this graph and I've selected three points on the graph. Let's draw the tangent line at each of those three points. So, here's the tangent line through this bottom point, the point (0,1), and the tangent line to the graph at that point is horizontal; the derivative is 0 there. If I move over here, the tangent line has positive slope, and if I move over to this third point and draw the tangent line, the derivative there is even larger; this line has more slope than the line through that middle point. What's going on here is that the derivative is different: here it's 0, here it's positive, here it's larger still. The derivative is changing, and the second derivative is measuring how quickly the derivative is changing. Contrast that with, say, this example of a perfectly straight line. Here, I've drawn 3 points on this line. If I draw the tangent line to this line, it's just itself. I mean, the tangent line to this line is just the line I started with, right? So, the slope of this tangent line isn't changing at all. And the second derivative of this function, y=x+1, really is 0; the function's derivative isn't changing at all. Here, in the example y=1+x^2, the function's derivative really is changing, and I can see that if I take the second derivative: if I differentiate this, I get 2x, and if I differentiate that again, I just get 2, which isn't 0. There's also a physical interpretation of the second derivative. Let's call p(t) the function that records your position at time t. Now, what happens if I differentiate this? What's the derivative with respect to time of p(t)? I might write that p'(t). That's asking, how quickly is your position changing? Well, that's velocity. That's how quickly you're moving. We've got a word for that. Now, I could ask the same question again. If I differentiate velocity, I am asking how quickly your velocity is changing. We've got a word for that, too. That's acceleration. That's the rate of change of your rate of change. There's also an economic interpretation of the second derivative. Maybe right now, dhappiness/ddonuts for me is equal to 0. What is this saying? This is saying how much my happiness will be affected if I change my donut eating habits. If I were really an economist, I'd be talking about the marginal utility of donuts or something, but this is a reasonable statement. It's saying that right at this moment, eating more donuts really won't make me any happier. And I probably am in this state right now, because if this weren't the case, I'd be eating donuts. So, let's suppose this is true right now. Now, something else might also be true right now: I might know something about the second derivative of my happiness with respect to donuts. What is this saying? Maybe this second derivative is positive right now. That's saying that a small change to my donut eating habits might affect how changing my donut habits affects how happy I am. If this were positive right now, should I be eating more donuts, even though dhappiness/ddonuts is equal to zero? Well, yeah: if this is positive, then a small change in my donut eating habits, just one more bite of delicious donut, would suddenly result in dhappiness/ddonuts being positive, which would be great, and then I should just keep on eating more donuts. Contrast this with the opposite situation, where the second derivative of happiness with respect to donuts isn't positive but negative.
If this is the case, I absolutely should not be eating any more donuts, because if I start eating more donuts, then I'm going to find that eating any more donuts will make me less happy. Let's think about this case geometrically. So here, I've drawn a graph of my happiness depending on how many donuts I'm eating. And here are two places that I might be standing right now on the graph. These are two places where the derivative is equal to zero. And I sort of know that I must be standing at a place where the derivative is 0, because if I were standing in the middle, I'd be eating more donuts right now. So, I know that I'm standing either right here, say, or right here. Or maybe here, or here. I'm standing some place where the derivative vanishes. Now, the question is, how can I distinguish between these two different situations? Right here, if I started eating some more donuts, I'd really be much happier. But here, if I started eating some more donuts, I'd be sadder. Well, look at this situation. This is a situation where the second derivative of happiness with respect to donuts is positive, right? When I'm standing at the bottom of this hole, a small change in my donut consumption starts to increase the extent to which a change in my donut consumption will make me happier. If I find that the second derivative of my happiness with respect to donuts is positive, I should be eating more donuts to walk up this hill to a place where I'm happier. Contrast that with the situation where I'm up here. Again, the derivative is zero, so a small change in my donut consumption doesn't really seem to affect my happiness. But the second derivative in that situation is negative. And what does that mean? That means a small change to my donut consumption starts to decrease the extent to which donuts make me happier. So, if I'm standing up here and I find that the second derivative of my happiness with respect to donuts is negative, I absolutely shouldn't be eating any more donuts. I should just realize that I'm standing in a place where, at least for small changes to my donut consumption, I'm as happy as I can possibly be, and I should be content to stay there. There's more to this graph. Look at this graph again. So, maybe I am standing here. Maybe the derivative of my happiness with respect to donuts is zero, and maybe the second derivative of my happiness with respect to donuts is negative. So, I realize that I'm as happy as I really could be for small changes in my donut consumption. But if I'm willing to make a drastic change to my life, if I'm willing to just gorge myself on donuts, things are going to get real bad, but then they're going to get really, really good, and I'm going to start climbing up this great hill. It's not just about donuts; it's also true for calculus. Look, right now you might think things are really good, and they're going to get worse. But with just a little bit more work, you're eventually going to climb up this hill, and you're going to find the immeasurable rewards that increased calculus knowledge will bring you.
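Before moving on, here's a finite-difference sketch of the physical interpretation from earlier. This is my own illustration with a made-up position function p(t), not something from the lecture: a difference quotient estimates velocity, and a difference quotient of that estimates acceleration, the derivative of the derivative.

```python
# A sketch, assuming the made-up position function p(t) = 5t^2 + 3t:
# estimate velocity p'(t) and acceleration p''(t) with difference quotients.
def p(t):
    return 5*t**2 + 3*t

h = 1e-5

def velocity(t):
    return (p(t + h) - p(t)) / h                 # difference quotient: p'(t)

def acceleration(t):
    return (velocity(t + h) - velocity(t)) / h   # difference quotient of the derivative: p''(t)

print(velocity(2))      # about 23, matching p'(t) = 10t + 3 at t = 2
print(acceleration(2))  # about 10, matching p''(t) = 10
```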
What does the sign of the second derivative encode?
Earlier, we saw how the sign, the S-I-G-N, of the derivative encoded whether the function was increasing or decreasing. Thinking back to the graph, here I've just drawn some random graph. What is the derivative encoding? Well, here at this point a, the slope of this tangent line is negative, the derivative is negative, and yeah, the function's going down here. At this point b, the slope of this tangent line is positive, and the function's increasing through here. Alright? The derivative is negative here and positive here; the function's decreasing here and increasing here. So that's what the derivative is measuring. What is the sign of the second derivative really encoding? Maybe we don't have such a good word for it, so we'll just make up a new word. The sign of the second derivative, the sign of the derivative of the derivative, measures concavity. The word's concavity, and here are the two possibilities: concave up, where the second derivative is positive, and concave down, where the second derivative is negative. And I've drawn sort of cartoony pictures of what the graphs look like in these two cases. Now, note it's not just increasing or decreasing; this concavity is recording the shape of the graph in some sense. A positive second derivative makes the graph look like this, a negative second derivative makes the graph look like this, and I'm just labeling these two things concave up and concave down.
And this makes sense if we think of the second derivative as measuring the change in the derivative. So let's think back to this graph again. Here's this graph of some random function. Look at this part of the graph right here. That looks like the concave up shape from before, where the second derivative was positive. So we might think that the second derivative is positive here. That would mean that the derivative is increasing. What that really means is that the slope of a tangent line through this region is increasing. And that's exactly what's happening. The slope is negative here, and as I move this tangent line over, the slope of that tangent line is increasing. The second derivative is positive here. You can tell yourself the same story for concave down. So look over here in our sample graph. That part of the graph looks like this concave down picture where the second derivative's negative. Now, if the second derivative is negative, that means the derivative is decreasing. And yeah, the slope of the tangent line through this region is going down, right? The slope starts off pretty positive over here, and as I move this tangent line over, the slope is zero, and now getting more and more negative.
So in this part of the graph, the second derivative is negative. What happens in between? Where does the regime change take place? Over here, the second derivative is negative. Over here, the second derivative is positive. There's a point in between, maybe it's right here, and at that point the second derivative is equal to zero. On one side it's concave down, and on the other side it's concave up. A point where the concavity actually changes is called an inflection point. It's concave down over here, it's concave up over here, and the places where the change takes place, we're just going to call those points inflection points. It's not that the terminology itself is so important, but we want words to describe the qualitative phenomena that we're seeing in these graphs. Inflection points are something you can really feel. I mean, suppose you're driving in a car and you're braking. That means the second derivative of your position is negative; you're slowing down. And then suddenly you step on the gas. Now you're accelerating; your second derivative's positive. What happened? Something big happened. You're changing regimes from concave down to concave up, and you want to denote that change somehow. We're going to call that change an inflection point.
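Here's a rough sketch of that regime change in code, my own illustration with a made-up sample function: scan for where a second-difference estimate of the second derivative changes sign.

```python
# A sketch, assuming the sample function f(x) = x^3 - 3x^2 + 2, whose second
# derivative is 6x - 6, so the concavity changes at x = 1.
def f(x):
    return x**3 - 3*x**2 + 2

def second_difference(x, h=1e-4):
    # central estimate of the second derivative f''(x)
    return (f(x + h) - 2*f(x) + f(x - h)) / h**2

xs = [i * 0.01 for i in range(-200, 300)]
for a, b in zip(xs, xs[1:]):
    # look for concave down (negative) switching to concave up (non-negative)
    if second_difference(a) < 0 <= second_difference(b):
        print("concavity changes near x =", round(b, 2))  # prints a value near 1.0
```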
What are extreme values?
The goal of computing is not numbers but insight. I don't care that f(2) is equal to four. I do care about the qualitative features of a function. Here's a graph of some random function I just made up. A significant qualitative feature of this graph is that right here is a valley in the graph, and up here is a mountaintop. Speaking of mountaintops and valleys is maybe too metaphorical, not precise enough. Let's try to work out a better definition. So, instead of calling this a mountaintop, I'm going to call it a local maximum. Let's be even a little bit more precise. Let's suppose that that maximum value occurs at the input c, where the output is f(c). I'm going to call f(c) the local maximum value of the function near the input c. Here's a precise definition: f(c) is a local maximum value for f if whenever x is near the input c, f(c) is bigger than or equal to f(x). Maybe this isn't even precise enough. A big sticking point with this is the word near. That's sort of a weaselly word. I can make this near precise just like we've been doing with limits: I'll introduce some epsilon. So f(c) is a local maximum value for the function f if there's some small number epsilon, and that's what I mean by near, so that whenever x is near c, meaning x is between c-epsilon and c+epsilon, which is really close to c if epsilon is real small, then f(c) is bigger than or equal to f(x). We can give a similar sort of definition for the valleys. Here's that same graph again, and I've highlighted a local minimum on the graph of this function. Near the input c, f(c) is the smallest output for the function. A little bit more precisely, I'm calling it a local minimum value because whenever x is near c, f(c) is less than or equal to f(x). Or, even a little bit more precisely again, just like for local maximums, I can replace near with epsilon. So f(c) is a local minimum value for the function if there's some epsilon measuring the nearness, so that whenever x is between c-epsilon and c+epsilon, f(c) is less than or equal to f(x). Sometimes, I'm going to want to talk simultaneously about local maximums and local minimums. So, we'll call either a local minimum or a local maximum a local extremum. This is kind of a pretentious word, but it's just a word that we can use to talk about both of these concepts simultaneously, because they actually share quite a few features in common. What if I want to talk about multiple extremums? Well, what if we wanted to talk about an octopus? That's not really a problem, but what if you wanted to talk about two of those things? You might call them octopuses. But you'll find some people will get angry at you if you do that, and will want you to call them octopi. I don't really agree with them, but the same problem comes up with minimums and maximums and extremums. You're going to find some people who will demand that you call these minima, maxima, and extrema if you're talking about more than one of these things, but really, either is fine. The word local here is really in contrast to the word global. We'll say, in fact everyone will say, that f(c) is a global maximum value for the function f if no output of the function is larger than f(c). Maybe that's too cute a way to say it.
Here's a more precise way to say it: f(c) is a global maximum value for the function if whenever x is in the domain of f, and I'm only going to be considering inputs in the domain, of course, then f(c) is bigger than or equal to f(x). Same deal for global minimum values: f(c) is a global minimum value for the function if whenever I've got a point x in the domain, f(c) is less than or equal to f(x). One subtle thing to point out here is that I'm not claiming, say for global maximum values, that this is the biggest output of the function. What I'm saying is that any other output isn't larger than f(c); f(c) is bigger than or equal to any other output of the function. And the same deal for this global minimum: I'm not saying this is the smallest output of the function, I'm just saying that any other output is bigger than or equal to this output. Now we can see some examples of this. Here's that same graph we've been looking at so many times. Let's try to figure out where the local extrema are. This point here is a local maximum; that's the biggest output value for nearby input values. This point here is a local minimum, right? You're sitting in that valley; that's the smallest output value among nearby inputs. And up here is another local max, and this local maximum is also a global maximum. If you really believe that this graph just continues down on the left- and right-hand sides, you should also note that this function has no global minimum. Nobody's promising you that there is a global minimum or a global maximum, and in this case, there isn't one; there's no global minimum. There can definitely be multiple local maximums, but there can also be multiple global maximums. So here, I've drawn another graph, where I've rigged it so that these two output values are the same. These are both local maximums, but they're also both global maximums. Alright? So in this case, these are both global and also local maximums. There's an even more, dare I say it, extreme version of this. Here, I've graphed the constant function y=17. The output 17 is both a global maximum value and a global minimum value for this constant function. The distinction between local and global maximums is really quite important, even in everyday life. When you're standing at a local maximum, on the top of this mountain, small changes to your situation just make things worse. And yet, if you're willing to go through this valley, you'll eventually come up here, to what at least appears to be the global maximum of this function.
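To make the epsilon definition concrete, here's a brute-force sketch of my own, not from the lecture: sample inputs within epsilon of c and check whether f(c) is at least as big as every sampled output. Sampling only gives evidence, of course, not a proof.

```python
# A sketch of the epsilon definition of a local maximum: f(c) >= f(x) for
# every x between c - epsilon and c + epsilon, checked on a finite sample.
def is_local_max(f, c, epsilon=0.1, samples=1000):
    for i in range(samples + 1):
        x = c - epsilon + (2 * epsilon) * i / samples
        if f(x) > f(c):
            return False   # found a nearby output bigger than f(c)
    return True

# sanity check on f(x) = 1 - x^2, which has a local (and global) maximum at 0
print(is_local_max(lambda x: 1 - x**2, 0))    # True
print(is_local_max(lambda x: 1 - x**2, 0.5))  # False
```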
How can I find extreme values?
If we care about extreme values, it would really help to know how to find them. We can use a theorem, a theorem of Fermat. Here's Fermat's theorem. Suppose f is a function, and it's defined on the interval between a and b. c is some point in this interval; this backwards E, the symbol ∈, means that c is in this interval. Okay, so that's the setup. Here's Fermat's theorem: if f(c) is an extreme value of the function f, and the function is differentiable at the point c, then the derivative vanishes at the point c. It's actually easier to show something slightly different. So, instead of dealing with this, I'm going to deal with a different statement. Same setup as before, but now I'm going to try to show that if f is differentiable at the point c and the derivative is nonzero, then f(c) is not an extreme value. It's worth thinking a little bit about the relationship between the original statement, where I'm starting off with the claim that f(c) is an extreme value and then concluding that the derivative vanishes, and this one, where I'm beginning by assuming the derivative doesn't vanish and then concluding that f(c) is not an extreme value. This is the thing that I want to try to prove now. Why is that theorem true? Let's first get some intuitive idea as to why this is true. Why is it that if f is differentiable at a point c with nonzero derivative there, then f(c) isn't an extreme value? Well, to get a sense of this, let's take a look at this graph. Here, I've drawn a graph of some random function, and I've picked some point c where f'(c) is not zero. The function is differentiable there, but the derivative's nonzero; it's negative. Now, what does that mean? That means if I wiggle the input a little bit, I actually do affect the output. If I increase the input a bit, the output goes down. If I decrease the input a bit, the output goes up. Consequently, that can't be an extreme value. That's not the biggest or the smallest value when I plug in inputs near c, and that's exactly what this statement is saying. If the derivative is not zero, that means I do have some control over the output if I change the input a little bit. That means this output isn't an extreme value, because I can make the output bigger or smaller with small perturbations to the input. I'd like to have a more formal, more rigorous argument for this. So, let's suppose that f is differentiable at c, and the derivative is equal to L, with L some nonzero number. Now, what that really means, from the definition of the derivative, is that the limit of (f(c+h)-f(c))/h as h approaches zero is equal to L. What does this limit say? Well, the limit's saying that if h is near enough zero, I can make this difference quotient as close as I like to L. In particular, I can guarantee that (f(c+h)-f(c))/h is between (1/2)L and (3/2)L. Notice what just happened. I started with infinitesimal information, just a limit as h approaches zero, and I've promoted this infinitesimal information to local information: now I know that this difference quotient is between these two numbers as long as h is close enough to zero. Now, we can continue the proof. I know that if h is close enough to zero, this difference quotient is between (1/2)L and (3/2)L, and I'm going to multiply all this by h. What do I find out then? I find out that if h is near enough zero, then f(c+h)-f(c), which is the difference quotient times h, is between (1/2)hL and (3/2)hL. I get from here to here just by multiplying by h. Now, what can I do? Well, I can add f(c) to all of this.
And I'll find out that if h is near enough zero, then f(c+h) is between (1/2)hL+f(c) and (3/2)hL+f(c). Okay. So, this is what we've shown: if h is small enough, f(c+h) is between these two numbers. Now, why would you care? Well, think about some possibilities. What if L is positive? If L is positive and h is some small but positive number, this is telling me that f(c+h), being between these two numbers, is in particular bigger than f(c). That means that f(c) can't be a local maximum. Same kind of game: what if L is positive, but I pick h to be negative but real close to zero? Then f(c+h), being between these two numbers, must actually be less than f(c). That means that f(c) can't be a local minimum. You play the same game when L is negative. Let's summarize what we've shown. We've shown that if a differentiable function has nonzero derivative at the point c, then f(c) is not an extreme value. And we can play this in reverse. That means that if f(c) is a local extremum, if f(c) is an extreme value, then either the derivative doesn't exist, ruling out the first condition, that f is differentiable at c, or the derivative is equal to zero, ruling out the second condition, that f'(c) is nonzero. So, this is another way to summarize what we've done: if f(c) is a local extremum, then one of these two possibilities occurs. And both of these possibilities do occur. Here's the graph of the absolute value function. There's a local and a global minimum at the point zero, and the derivative of this function isn't defined at that point. Here's another example, the graph of y=x^2. This function also has a local and a global minimum at the point zero. Here, the derivative is defined, but the derivative is equal to zero at this point. We'll often want to talk about these two cases together: the situation where the derivative doesn't exist, and the situation where the derivative vanishes. Let's give a name to this phenomenon. Here we go. If either the derivative at c doesn't exist, meaning the function's not differentiable at c, or the derivative is equal to zero, then I'm going to call the point c a critical point for the function f. This is a great definition, because it fits so well into Fermat's theorem. Here's another way to say Fermat's theorem. Suppose f is a function defined on this interval and c is contained in there. Then if f(c) is an extreme value of f, we know that one of two possibilities must occur: at that point, either the function's not differentiable, or the derivative's equal to zero. And that's exactly what we mean when we say that c is a critical point of f. So, giving a name to this phenomenon gives us a really nice way of stating Fermat's theorem. The upshot is that if you want to find extreme values for a function, you don't have to look everywhere. You only have to look at the critical points, where the derivative either doesn't exist or vanishes. And you should probably also worry about the endpoints. In other words, if you're trying to find rain, you should just be looking for clouds: you only have to check the cloudy days to see if it's raining. In this same way, if you're looking for extreme values, you only need to look at the critical points, because an extreme value gives you a critical point. So, here's a super concrete example. Here's a function, y=x^3-x. I've graphed this function.
I'm going to try to find these local extrema, this local maximum and this local minimum, using the machinery that we've set up. So, if I'm looking for local extrema, I should be looking for critical points. So, here's the function, f(x)=x^3-x.
I'm looking for critical points of this function: points where the derivative doesn't exist, meaning the function is not differentiable, or where the derivative vanishes. But this function is differentiable everywhere; its derivative is 3x^2-1. So, the only critical points are going to be where the derivative is equal to zero. I'm looking for solutions to 3x^2-1=0. I'll add 1 to both sides: 3x^2=1. Then I'll divide both sides by 3: x^2=1/3. Take the square root of both sides: x is plus or minus the square root of 1/3, which is about 0.577. And that lets me find these critical points, the places where the derivative is equal to zero and the tangent line is horizontal. Now I know the x coordinates of these two red points: here, the x coordinate is about 0.577, and here, the x coordinate is about -0.577.
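Here's the same computation done symbolically, a sketch of my own that assumes the sympy library is available; the lecture itself does this by hand.

```python
# Find the critical points of f(x) = x^3 - x by solving f'(x) = 0 symbolically.
import sympy as sp

x = sp.symbols('x')
f = x**3 - x
critical_points = sp.solve(sp.diff(f, x), x)   # solve 3x^2 - 1 = 0
print(critical_points)                          # [-sqrt(3)/3, sqrt(3)/3], about ±0.577

# the sign of the second derivative classifies each critical point:
# negative means a local maximum, positive means a local minimum
for c in critical_points:
    print(c, sp.diff(f, x, 2).subs(x, c))
```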
Do all local minimums look basically the same when you zoom in?
People have the idea that a local minimum means the function decreases and then increases. Here's a local minimum on the graph of this random function. And the misconception is that they all look like this: that every time you've got a local minimum, on one side the function's decreasing, and on the other side the function's increasing. Plenty of local minima do look exactly like that. But there are also plenty of pathological examples. For instance, consider this somewhat pathological example. I'm going to define this function f as a piecewise function. If the input is nonzero, the output is (1+sin(1/x))*x^2; the sin(1/x) makes sense since x isn't zero. And if the input is zero, the function's output will also be zero. In this case, there's a local minimum at zero. How do I know? Well, here's how I know. Let's take a look at this function. The claim is that f(x) is never negative. How do I know that? Well, what do I know about sine? Sine of absolutely anything at all, no matter what I take the sine of, is between -1 and 1. Now, if I add 1 to this, 1 plus sine of absolutely anything at all is between zero and two. That's pretty good. Now, think back to the definition of this function. Here, I've got 1 plus sine of something, and it doesn't matter what: 1 plus sine of anything is between zero and two. Now, I'm multiplying it by x^2. What do I know about x^2? Well, x^2 is not negative. It could be zero, it could be positive, but no matter what x is, x^2 is not negative. So I'm multiplying 1+sin(1/x), this number which is trapped between zero and two, by x^2, which is never negative. And that means f(x) is not negative as long as x isn't equal to zero: in this first case, it's a non-negative number times a non-negative number, so the product is also non-negative. The other possibility, of course, is that I plug in zero for x. But then f(0) is, just by definition, zero. And that means that in either case, no matter what I plug in for x, f(x) is never negative. Now, if f(x) is never negative and f(0)=0, then I know that zero must be the smallest possible output value for the function: the only numbers that are smaller than zero are negative numbers, and the output of this function is never negative. But this isn't the usual sort of local minimum where the function just decreases and then increases. Here's the graph of our function f. And there is a local minimum at zero, but if I start zooming in, no matter how much I zoom in, there's no little region on which the graph is just decreasing and then increasing. The graph is always wiggling. The upshot here is that decreasing and then increasing is one way to produce a local minimum, but it's not the definition of a local minimum, and not every local minimum arises in that exact way. What a local minimum means is just that no nearby output value is smaller than that local minimum value.
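You can probe this example numerically. Here's a minimal sketch of my own, not from the lecture, that evaluates the piecewise function ever closer to zero: the outputs stay non-negative, yet the function keeps wiggling no matter how closely we zoom in.

```python
# Probe f(x) = (1 + sin(1/x)) * x^2 for x != 0, with f(0) = 0.
import math

def f(x):
    if x == 0:
        return 0.0
    return (1 + math.sin(1/x)) * x**2

# sample ever closer to zero: values stay >= 0, but they don't settle into
# "decreasing then increasing" behavior on any small interval around 0
for k in range(1, 8):
    x = 10**(-k)
    print(x, f(x), f(-x))
```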