Thursday, June 9, 2016

Welcome to Calculus One with Jim Fowler, Part 3

Why would I care to find the derivative?


Why is sqrt(9999) so close to 99.995?


Sometime in the future, we're going to see the following derivative rule, but I want to mention it now just so we can see an example of how derivatives play out in practice. The derivative of √x is 1/(2√x). You might already believe this if you believe the power rule, right? The derivative of x^n is n*x^(n-1). So, if n is 1/2, then I've got that the derivative of x^n, now x^(1/2), is n, that's 1/2, times x^(n-1). And conveniently, 1/2 - 1 = -1/2. So this is really the same rule, just written with the square root symbol instead of with exponents. We can use this derivative rule to help explain certain numerological coincidences. Let's take a look. Look, √9999 is 99.9949998..., and it keeps on going forever. It's irrational. But this is bizarrely close to 99.995. Is this just a coincidence? This isn't a coincidence. Look. √10000 is 100, because 100^2 is 10,000. What I'm really doing here is wiggling the input. I'm going from 10,000 to 9,999. In other words, I'm trying to calculate √(10000-1), wiggling the input down a bit. What does the derivative calculate? Well, the derivative calculates the ratio of output change to input change. So, √10000 wiggled down a little bit is about √10000 minus how much I change the input by, times the ratio of how much I expect the output to change compared to the input change. Now, we can try to calculate the derivative at 10,000. What's the derivative at 10,000? Well, it's 1/(2√10000).
√10000 is 100, so the derivative is 1/(2*100). That's 1/200, which is 0.005. Look, √9999 is so close to 99.995 because √10000 is 100, and when I shift the input down by one, this derivative calculation is suggesting that the output should be shifted down by about 0.005, and indeed it is. This is a great example of calculus. Yes, you could have asked your calculator to compute √9999, but you couldn't have asked your calculator to tell you why. Why is that answer so mysteriously close to 99.995? In short, calculus is more than calculating. It's not about answers, it's about reasons. It's about explanations, about the stories that human beings can tell to each other about why that number and not another. But that's not to say that the numbers aren't fun to play with themselves, and we can use this same trick to do other amazing feats. For instance, we can try to estimate √82. I know √81 is 9. I'm trying to say something about √82. I'm trying to wiggle the input up a little bit. Well, derivatives have something to say about that. √(81+1), which is √82, would be about √81, which is 9, plus how much I expect the output to change. I wiggled the input, so I expect the output to change by some amount. Well, the derivative is measuring how much I expect the output to change by. So, I'm going to take the derivative of the square root function at 81, and I'm going to multiply by how much I'm wiggling the input by. This will be how much I expect the output to change when I change the input. Now, in this specific case, what's the derivative at 81? Well, it's 1/(2√81). √81 is 9, so that's 1/(2*9), which is 1/18. So, I would expect √82 to be about 9 + 1/18, because I expect wiggling the input up to wiggle the output up by about 1/18. And this is pretty good. There's actually two different ways to tell that this isn't such a bad guess. Here's one way to tell. What's 1/18? Well, it's 0.0555..., the 5 repeating.
And what's the actual value of √82? It's 9.0554... Look, it's pretty close to 9 plus this. That's pretty good. Another way to see that this isn't such a bad guess is just to take 9 + 1/18 and square it. When I square 9 + 1/18, I get 9^2 + 2*9*(1/18) + (1/18)^2. Now, 2*9*(1/18) = 1, so this is 81 + 1 + 1/324, which is 82 + 1/324, just a tiny bit more than 82.
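These back-of-the-envelope estimates are easy to check numerically. Here's a minimal Python sketch (the helper name `sqrt_estimate` is my own) of the approximation √(a+h) ≈ √a + h/(2√a):

```python
import math

def sqrt_estimate(a, h):
    """Linear approximation: sqrt(a + h) ≈ sqrt(a) + h * 1/(2*sqrt(a))."""
    return math.sqrt(a) + h * (1 / (2 * math.sqrt(a)))

# Wiggle the input down from 10000 to 9999.
print(sqrt_estimate(10000, -1))   # estimate: 99.995 exactly
print(math.sqrt(9999))            # actual:   99.99499987...

# Wiggle the input up from 81 to 82.
print(sqrt_estimate(81, 1))       # estimate: 9 + 1/18 = 9.0555...
print(math.sqrt(82))              # actual:   9.05538...
```

At 10000 the estimate comes out to exactly 99.995, and the actual square root agrees with it to several decimal places.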


What information is recorded in the sign of the derivative?


In the future, we're going to have a lot of very precise statements about the derivative. But before we get there, I want us to have some intuition as to what's going on. Let's take a look at just the S, I, G, N, the sign of the derivative. The thick green line of the plot is some random function, and the thin red line is its derivative. And note that when its derivative is positive, the function's increasing, and when the derivative is negative, the function's decreasing. We can try to explain what we're seeing here formally, with a calculation on paper. So let's suppose that the derivative is positive over a whole range of values. And we also know something about how the derivative is related to the function's values. The function's output at x+h is close to the function's output at x, plus how much the derivative tells us the output should change by, which is how much the input changed by times the ratio of output change to input change: f(x+h) ≈ f(x) + h*f'(x). Alright. Now let's suppose that x+h is a bit bigger than x. Well, what that's really saying is that h is positive, right? I shift the input to the right a little bit. Well then, h*f'(x) is going to be positive, because a positive number times a positive number is positive. And that means that f(x) + h*f'(x) will be bigger than f(x). We're just adding the same thing to both sides of this inequality. Now, f(x) + h*f'(x), that's about f(x+h). So, although this argument isn't entirely precise yet, what it looks like it's saying is that the function's output at x+h is bigger than the function's output at x. So, if you plug in bigger inputs, you get bigger outputs. What about when the derivative is negative? We can play the same kind of game when the derivative's negative. Here we go. So again, x+h is just a bit bigger than x, and in that case, h is positive. But now I've got a positive number times a negative number, so h times the derivative of f is negative.
Now, if I add f(x) to both sides, I've got that f(x) + h*f'(x) is less than f(x).
But this is approximately the new output value of the function at x+h. So, I've got that the function's output at x+h is a little bit less than its output at x. So, a bigger input is giving rise to a smaller output. Even a little bit of information, whether the derivative is positive or negative, says something about the function. And you can see the same thing in your own life. For instance, suppose that the derivative of your happiness with respect to coffee is positive. What does that really mean? Well, that means that you should be drinking more coffee, because an increase in coffee will lead to greater happiness. Of course, this is only true up to a point. After you've had a whole bunch of coffee, you might find that the derivative of your happiness with respect to coffee is zero. You should stop drinking coffee. Now, this makes sense because the derivative depends upon x, right? It depends upon how much coffee you've had. With not very much coffee, the derivative might be positive. But after a certain point, you might find that the derivative vanishes. This may seem like a silly example, coffee and happiness. But so many things in our world are changing, and those changing things affect other things. The question is, when one of those things changes, does the other thing move in the same direction, or do they move in opposite directions? And the sign, the S, I, G, N, of the derivative records exactly that information.
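We can watch the sign of the derivative doing this work numerically. Here's a small sketch (the sample function and step size are my own choices, not from the lecture) that estimates the derivative with a difference quotient and reports whether the function is increasing or decreasing there:

```python
def derivative_sign(f, x, h=1e-6):
    """Approximate f'(x) with a centered difference quotient and report its sign."""
    slope = (f(x + h) - f(x - h)) / (2 * h)
    return "increasing" if slope > 0 else "decreasing" if slope < 0 else "flat"

f = lambda x: x**3 - 3*x   # f'(x) = 3x^2 - 3, which is negative on (-1, 1)

print(derivative_sign(f, -2))  # increasing
print(derivative_sign(f, 0))   # decreasing
print(derivative_sign(f, 2))   # increasing
```

The reported signs match where the graph of x^3 - 3x actually rises and falls.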

How do differentiability and continuity relate?


Why is a differentiable function necessarily continuous?

Remember, continuity is all about how nearby inputs are sent to nearby outputs. Differentiability is about how wiggling the input affects the output. In light of this, they seem related, right? Something like the following seems plausible. Here's the theorem. Theorem. If f is differentiable at a, then f is continuous at a. In other words, a differentiable function is continuous. Morally, we know that a differentiable function is continuous. But we're advanced enough at this point in the course to give a precise argument using limits. Here we go. Let's suppose that f'(a) exists. In other words, that means a certain limit exists. What limit? Well, the limit of (f(x)-f(a))/(x-a) as x approaches a. This limit of difference quotients computes the derivative of the function at a. So, to say that the derivative exists is to say that this limit exists. Now, here comes the trick. What I'd like to compute is the limit of f(x)-f(a) as x approaches a, but I don't know how to do that directly. But I can rewrite this thing I'm taking the limit of as a product. Watch. Instead of taking this limit, I'm going to take the limit as x approaches a of (x-a) times this difference quotient, times (f(x)-f(a))/(x-a). Now, as long as x isn't equal to a, this product is equal to this difference. Now, why does that help? Well, this is a limit of a product. So, by one of the limit laws, the limit of a product is the product of the limits, as long as the limits exist. And in this case, they do. So, this limit of this product is the product of the limits. It's the limit of x-a as x approaches a, times the limit of (f(x)-f(a))/(x-a). I'm only allowed to use this limit law because I know both of these limits exist. Now, this first limit, the limit of x-a as x approaches a, that's 0. And this second limit, well, this limit exists precisely because I'm assuming differentiability, that the function is differentiable.
So, this limit is calculating the derivative at a, and zero times any number is equal to zero. The upshot here is that we've shown that the limit of f(x)-f(a) as x approaches a is 0. Why would you care about this? How does that help us? We know that the limit of f(x)-f(a) as x approaches a is equal to 0. What that means is that the limit of f(x) as x approaches a is equal to f(a), but this is just the definition of continuity. So now we know that f is continuous at the point a. That's where we ended up. Remember what we started with. We started by assuming that f was differentiable at a. And after doing all this work, we ended up concluding that f is continuous at the point a. So, differentiability implies continuity. One way to keep track of arguments like this is to think about clouds and rain. Theorem. If it is rainy, then it is cloudy. A shorter way of saying this: rainy implies cloudy. Now the question is, does it go the other way? If it's cloudy, is it necessarily rainy? Can you think of a cloudy day with no rain? Yes, today. Let's look out the window. It is very cloudy, but there's no rain. Beyond clouds and rain, let's bring this back to the mathematics. A differentiable function is continuous. Can you think of a continuous function which isn't differentiable? You might want to hit pause right now if you don't want the puzzle given away. Here's an example of a function which is continuous but not differentiable: the function f(x)=|x|.
We recently saw that the absolute value function wasn't differentiable at zero. But how do we know that the absolute value function is continuous everywhere? We know that the absolute value function is continuous. I mean, look at it. It's all one piece. But we can do better. We can use our limit knowledge to make a more precise argument. We know that f(x)=|x| is continuous for positive inputs. It's continuous on the open interval from zero to infinity, because the function x is continuous there, and this function, the absolute value function, agrees with the function x if I plug in positive numbers. Likewise, I know that the function is continuous on negative inputs, because the function -x is continuous there, and the function -x agrees with this function on that interval. The only sticking point is to check that the function's continuous at zero. And if I know it's continuous for positive inputs, negative inputs, and it's continuous at zero, then I know that it's continuous for all inputs. Now, how do I know that the absolute value function is continuous at zero? Well, that's another limit argument, right? The limit of the absolute value function when I push from the right-hand side is the same as the limit of the absolute value function when I push from the left-hand side; they're both zero. And because these two one-sided limits exist and agree, I know the two-sided limit of the absolute value function is equal to zero, which is also the function's value at zero. And therefore, the absolute value function is continuous. In the end, there's some relationship between differentiability and continuity. Differentiable functions are continuous. Mathematics isn't just a sequence of unrelated concepts. It's a single unified whole. All of these ideas are connected at the deepest possible levels.
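A quick numerical sketch (my own, not from the lecture) shows both halves of this story for f(x)=|x|: the one-sided difference quotients at zero disagree, so the derivative doesn't exist there, and yet nearby inputs still land on outputs near f(0)=0, which is continuity:

```python
def diff_quotient(f, a, h):
    """The difference quotient (f(a+h) - f(a)) / h."""
    return (f(a + h) - f(a)) / h

f = abs

# One-sided difference quotients at 0 disagree, so f'(0) doesn't exist:
print(diff_quotient(f, 0, 1e-8))    # 1.0  (pushing from the right)
print(diff_quotient(f, 0, -1e-8))   # -1.0 (pushing from the left)

# But nearby inputs are sent to outputs near f(0) = 0, so f is continuous there:
print(f(1e-8), f(-1e-8), f(0))
```

No matter how small h gets, the two one-sided quotients stay at +1 and -1.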

What is the derivative of a constant multiple of f(x)?


Here is the question that I want to address right now. What's the derivative of some constant multiple of some function? Now, in this case the constant multiple is 2, but of course that 2 could be replaced by any fixed number. If you don't like the d/dx notation, another way to ask this question is this. If you've got some new function g, and it's the constant multiple times f, again in this case I'm using 2 as the constant, the question is, what's the derivative of g in terms of the derivative of f? To gain some intuition, let's pick a specific example and look at a graph. So here's the graph of just some random function. Let's suppose that I stretch this graph in the y direction. So now I stretch the y-axis, and that corresponds to multiplying the function by a constant value, in this case, two. How do the tangent lines change when I do this stretching? If I double the y-axis, the function changes by twice as much for the same input change. So if I double the y-axis, the slope of the tangent line also doubles, and that makes sense numerically. Here's what I know. I know that g(x) is twice f(x). g is this constant multiple of f. I also know something about the derivative of f. The derivative encodes how input changes become output changes, or, a bit more precisely, the derivative in the limit is the ratio of output change to input change. So if I multiply the ratio of output change to input change by an actual input change, this at least approximately is telling me how much the output should change when I move from x to x+h. Right, f's new output at the input x+h is its old output plus how much I expect the output to change. This is a really nice way to summarize what the derivative's saying. I know another thing. I know that g(x+h) is twice f(x+h), just because g is twice f for any input value x, so in particular that's true when the input is x+h. These two statements are connected.
Alright, g(x+h) is twice f(x+h), and f(x+h) is approximately f(x) + h*f'(x). So I can combine those two statements together in this statement: g(x+h) is about twice that approximate value, right? 2f(x) + 2h*f'(x), which I've written as 2f(x) + h*(2f'(x)). I made this a little bit nicer. Since 2f(x) is g(x), I can replace this 2f(x) with g(x). And this is really looking good. This is telling me that g's output at x+h is about g's output at x, plus how much I change the input by, times some quantity. Now, consider that the actual derivative of g would tell me some information like this: that g's new output is about g's old output, plus how much I change the input by, times the derivative. You're beginning to see what's going on here, right? Look, I've got the derivative of g here, and I've got twice the derivative of f here. And if you sort of believe that these statements are connected in this way, you might then believe that the derivative of g is twice the derivative of f. We can formalize this as a rule. Here's the constant multiple rule. Let k be a constant, and suppose that f is just some function which is differentiable at the point a. g is that constant multiple of f, so g(x) = k*f(x). Given this setup, what the constant multiple rule is concluding is that the derivatives are related in the same way. g's derivative at the point a is k times the derivative of f at the point a. Now, if you don't like this prime notation, you could also write it using the d/dx notation. So here's how I write the constant multiple rule using the d/dx notation. The derivative of k times a function is k times the derivative of that function. I encourage you to keep practicing. With time, you'll be able to calculate the derivative of just a ton of different functions.
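A difference quotient makes it easy to spot-check the constant multiple rule. In this sketch (the function f and the point a=3 are my own choices), g = 2f, and the estimated derivative of g comes out to twice the estimated derivative of f:

```python
def approx_derivative(f, x, h=1e-6):
    """Estimate f'(x) with the difference quotient (f(x+h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h

f = lambda x: x**2 + 1
k = 2
g = lambda x: k * f(x)   # g is a constant multiple of f

a = 3.0
print(approx_derivative(f, a))   # ≈ f'(3) = 6
print(approx_derivative(g, a))   # ≈ k * f'(3) = 12
```

Changing k rescales g's estimated derivative by exactly the same factor.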

How do I find the derivative?


Why is the derivative of x^2 equal to 2x?


We're going to calculate the derivative of x^2 with respect to x. Maybe a little bit more prosaically, I want to know how wiggling x affects x^2. There's a ton of different ways to approach this. Let's start by looking at this numerically. So let's start off by just noting that 2 squared is 4, and I'm going to wiggle the 2 and see how the 4 wiggles. Instead of plugging in 2, let's plug in 2.01. 2.01 squared is 4.0401. And let's just keep on going with some more examples. 2.02 squared is 4.0804. 2.003 squared, say, is 4.012009. Alright, so those are a few examples. I've wiggled the inputs, and I've seen how the outputs are affected. And of course, all the outputs are close to 4, alright? But they're not exactly 4. When I wiggled the input from 2 to 2.01, the output changed by about 0.04, and a little bit more, but that extra bit is a lot smaller. When I wiggled the input from 2 to 2.02, the output changed by about 0.08, not exactly 0.08, but pretty close to 0.08. And when I wiggled from 2 to 2.003, the output changed by about 0.012 and a little bit more, but, you know, it's close. Now look at the relationship between the input change and the output change. The input changed by 0.01; the output changed by about 4 times as much. The input changed by 0.02; the output changed by about 4 times as much. The input changed by 0.003; the output changed by about 4 times as much. I'm going to summarize that. The output change is the input change magnified by 4 times. Right? The input changed by some amount, and the output changed by about 4 times that amount. Let's see this at a different input point. Instead of plugging in 2, let's plug in 3 and see what happens. So 3^2 is 9, but what's, say, 3.1^2? That's 9.61. Or what's 3.01^2? Well, that's 9.0601. Maybe wiggle down a little bit. What's 2.99 squared? That's close to 3, but wiggling down by 0.01. That's 8.9401. Let's see how much, roughly, the output changed by.
When I went from 3 to 3.1, the output changed by about 0.6. When I went from 3 to 3.01, the output changed by about 0.06, and when I went from 3 down to 2.99, the output went down by about 0.06 again. A little bit less, actually. Now what's the relationship between the input change and the output change? Well, here the input changed by 0.1, and the output changed by 0.6, about 6 times as much. Again, the input changed by 0.01, and the output changed by about six times as much. And when the input went down by 0.01, the output went down by about six times as much. So again, we're seeing some sort of magnification of the output change compared to the input change, but now it's magnified not by four times but by six times. So the important lesson here is that the extent to which wiggling the input affects the output depends on where you're wiggling. If you're wiggling around 2, the output is being changed by about four times as much. If you're wiggling the input around 3, the output is being changed by about six times as much. Instead of doing just a few numerical examples, let's generalize this by doing some algebra. So, I'm starting with x^2, and I'm going to wiggle x and see how x^2 is affected. So, instead of plugging in x, I'll plug in x plus something; let's call the change in x, h. Now I want to know, how is this related to x^2? Well, I can expand out (x+h)^2; that's x^2+2xh+h^2. So when I wiggle the input from x to x+h, how is the output being affected? Well, the output is the old output value plus this change in output value, 2xh+h^2. When h is small, h^2 is really small, so I'm going to throw that away for now, and just summarize this by saying that the output change is 2xh, and the input change is h. Now, the derivative is supposed to measure the relationship between the output change and the input change. So I'm going to take the ratio of the output change to the input change, and 2xh/h = 2x, as long as h isn't 0.
This is the ratio of output change to input change, and that makes sense, right? Think back to what just happened here a minute ago, when we were plugging in some nearby values and seeing how the outputs were affected. When I was wiggling the input around 2, the output was changing by about twice 2. When I was wiggling the input around 3, the output was changing by about twice 3, alright? 2x is the ratio of output change to input change. If the algebra's not really speaking to you, we can also do this geometrically, by drawing a picture. Here's a square of side length x. The area of this square is, not coincidentally, x^2. Now, I want to know the derivative of x^2 with respect to x. I want to know how changing x would affect the area of this square. To see this, here is another square. This is a slightly larger square of side length x+h; h is a small but positive number. So how does the area of this new square compare to the area of this old square? Let me put the old square on top of the new square, and you can see that when I change the input from x to x+h, I gain a bit of extra area. The derivative is recording the ratio of output change to input change. So, I want to know, what's the ratio of this new area compared to the change in the input, h? So, let me pull off the extra area. This extra area is this L-shaped region. How big is this L-shaped region? Well, this short side here has side length h. This side length here is also h. This is the extra length that I added when I went from x to x+h. This inside edge has length x, and this other inside edge also has length x. Now, I want to know the area of this region. To see that, I'm going to get out my scissors and cut this region up into 3 pieces. Here's one of those pieces. And here's another one of those pieces. And here's the third piece. So these are the 2 long, thin rectangles, and they've both got height h and length x. I'm also left with this little tiny corner piece.
And that little tiny corner piece has side length h, and the other side is also length h. It's a little tiny square. In the limit, this little tiny corner piece is negligible. I'm going to throw this piece away, and most of the area is left in these 2 long, thin rectangles. If I rearrange these long, thin rectangles a bit, I can put them end to end. They've both got height h, so I can put them next to each other like this, and their bases both have length x. So how much area is in this long, thin rectangle? Well, its height is h, and its width is 2x, so the area is 2x*h. Now, this is the additional area, except for that little tiny square, which we gained when I changed the size of the square from x to x+h. So the change in output is about 2*x*h. The change in input was h, so the ratio of output change to input change is 2*x. Maybe what we're doing here seems a little bit wishy-washy, not really precise enough. But we can also calculate the derivative of x^2 with respect to x by just going back to the definition of derivative in terms of limits. Carefully, f(x) is x^2, and the derivative of f is, by definition, the limit as h approaches 0 of (f(x+h)-f(x))/h, the change in output divided by h, the change in input. In this case, f(x+h) is just (x+h)^2, and f(x) is just x^2, and I'm dividing by h. I can expand this out. This is the limit as h approaches 0 of ((x+h)^2-x^2)/h. Now I've got an x^2-x^2, so I can cancel those, and I'm just left with the limit, as h approaches 0, of (2xh+h^2)/h. More good news: in the limit, I'm never going to be plugging in h=0, so I can replace this with an equivalent function that agrees with it when h is close to, but not equal to, 0. In other words, maybe a little bit more simply, I'm canceling an h from the numerator and the denominator. So, 2xh/h is just 2x, and h^2/h is just h. Now, what's the limit of this sum? Well, that's the sum of the limits.
It's the limit of 2x as h approaches 0, plus the limit of h as h approaches 0. Now, as far as wiggling h is concerned, 2x is a constant, so the limit of 2x as h approaches 0 is just 2x. And what's the limit of h as h approaches 0? Well, what's h getting close to when h is close to 0? That's just 0. So, this limit is equal to 2x, and that's the derivative of x^2. What that limit is really calculating is the slope of the tangent line at the point x, and we can see that it's working. This is the graph of y=x^2. At -4, the slope of the tangent line is -8. At 2, the slope of the tangent line is 4. And at 6, the slope of the tangent line is 12. There's a ton of different perspectives here. We've been thinking about the derivative of x^2 with respect to x numerically, algebraically, geometrically, going back to the definition of derivative in terms of limits, looking at it in terms of slopes of tangent lines. What makes derivatives so much fun is that there are just so many different perspectives on this single topic, no matter how you slice it. We've shown that the derivative of x^2 with respect to x is 2x. Maybe you like algebra, maybe you like geometry, maybe you just like to play with numbers. But no matter what your interests are, derivatives have something to offer you.
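The whole numerical story can be compressed into a few lines. This sketch (my own) computes the ratio of output change to input change for f(x) = x^2 and watches it approach 2x as h shrinks:

```python
def ratio(x, h):
    """Ratio of output change to input change for f(x) = x^2."""
    return ((x + h)**2 - x**2) / h

# As h shrinks, the ratio heads toward 2x: about 4 near x=2, about 6 near x=3.
for h in (0.1, 0.01, 0.001):
    print(h, ratio(2, h), ratio(3, h))
```

Algebraically the ratio is exactly 2x + h, so each tenfold shrink of h moves it ten times closer to 2x.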

What is the derivative of x^n?




Here is the so-called power rule for differentiating x^n: the derivative of x^n is n*x^(n-1). What's n? n can be any real number except for zero. You should think about why you don't want to plug in zero for n. For the time being, we're just going to think about this when n is a positive whole number. But even there, it's pretty tricky. Admittedly, when n=1, you're probably going to be pretty unimpressed. Nevertheless, here we go. When n=1, the derivative of x^1, which is just x, is equal to 1. This should make sense, because what's the derivative measuring? The derivative is measuring output change compared to input change. And, in this case, the function is just the function that sends x to x. The input and the output are exactly the same, so the input change and the output change are exactly the same; their ratio is just 1. And consequently, the derivative of x, the derivative of the identity function, is 1. When n=2, that means we're differentiating x^2, which we studied a little bit ago. Now, here is the power rule. If I plug in 2 for n, I've got the derivative of x^2 = 2*x^(2-1), or, a bit more nicely written, the derivative of x^2 = 2x. And remember, we really did study this in quite some detail, you know, algebraically, numerically, geometrically. When n=3, we can still study the derivative of x^3 in a geometric way. So, here's the power rule. You plug in n=3, and you get the derivative of x^3 = 3*x^(3-1), which is 3*x^2. We can see this geometrically. We start with a cube of side length x, and we're going to glue on three green slabs with dimensions x by x by h. Now, in order to actually thicken up the cube, we've got to glue on a few more pieces, these blue pieces and this red corner piece. But once we've done that, we've built a cube of side length x+h. How has the volume changed? Well, most of the change in volume happened in these three green slabs, and those three green slabs have volume 3x^2*h. The change in the side length of the cube is h.
This geometric argument is showing us that the derivative of x^3 is 3*x^2. When n=4, we're trying to differentiate x^4, but that would involve not a cube, but a hypercube. It seems a bit ridiculous to try to gain intuition about the derivative of x^4 by doing something as esoteric as studying 4-dimensional geometry. So instead, let's differentiate x^4 directly, by going back to the definition of derivative. So, let's proceed directly. I want to compute the limit as h approaches 0 of ((x+h)^4 - x^4)/h.
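Rather than slog through 4-dimensional geometry, we can at least check the power rule numerically for small n. This sketch (my own, with x=2 and a small h as arbitrary choices) compares the difference quotient against n*x^(n-1):

```python
def power_rule_check(n, x, h=1e-7):
    """Compare the difference quotient for x^n against the power rule's answer."""
    numeric = ((x + h)**n - x**n) / h
    exact = n * x**(n - 1)
    return numeric, exact

# For n = 1, 2, 3, 4 at x = 2, the quotient should sit near 1, 4, 12, 32.
for n in (1, 2, 3, 4):
    numeric, exact = power_rule_check(n, 2.0)
    print(n, numeric, exact)
```

The n=4 row is the hypercube case: the quotient lands near 4*2^3 = 32 without any geometry at all.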


What is the derivative of x^3 + x^2?


Now, this is the derivative of a sum, which is the sum of the derivatives. Now we have to figure out: what's the derivative of x^3, and what's the derivative of x^2? That's the power rule. The derivative of x^3 is 3x^2, and the derivative of x^2 is 2x. And now there's no more d/dx's. We've calculated the derivative. The derivative of x^3+x^2 is 3x^2+2x. Once we know this, we can figure out where the derivative is positive and where it's negative. So, the derivative was 3x^2+2x, and I want to know where that's positive and negative: which values of x make that bigger than 0, which values of x make that less than 0. One approach to thinking about this is to factor the 3x^2+2x. I can write that as x(3x+2). And once I factor it like this, I can figure out the sign of this by figuring out the sign of these two terms separately. Let's visualize this with a number line. So, for x, here's a number line. x is positive when x is bigger than 0, and negative when x is less than 0. That's not too complicated. Now look at 3x+2. I'll draw a number line for that. The exciting point is -2/3. When x is less than -2/3, 3x+2 is negative. And when x is bigger than -2/3, then 3x+2 is positive. Now, I really don't care about x and 3x+2 separately. I want to put them together, right? I want to know when their product is positive or negative. So, I write down the product x*(3x+2) and make a new number line here. I'll record both of these points, -2/3 and 0, and then I can think about what happens. When x is less than -2/3, then x is negative and 3x+2 is negative, so the product is a negative times a negative, which is positive. When x is between -2/3 and 0, then x is negative but 3x+2 is positive, and a negative number times a positive number is negative. And finally, when x is bigger than 0, well, then x is positive and also 3x+2 is positive, so the product is positive. So, here on this number line, I've recorded the information about when 3x^2+2x is positive or negative.
Now we can use this information to say something about the graph. Here's the graph of the function x^3+x^2. It goes up, down, and up, and that's exactly what you'd expect from the derivative, right? We calculated before that if you're standing to the left of -2/3, then the derivative is positive, and indeed the function's going up. Once you get to -2/3, the derivative is 0. But then over here, between -2/3 and 0, the derivative is negative, and indeed the graph is moving down, until you get to 0, when the derivative of this function is positive again and the graph is going up. Look, the sign of the derivative, positive, negative, positive, is reflected in the direction that this graph is moving: increasing, decreasing, increasing. Incredible. By being able to differentiate x^3+x^2, we're able to gain real insight into the graph of the function. We're not just plotting a whole bunch of points and hoping that we can fill them in. By looking at the derivative, we know where the function is increasing and decreasing. We're able to say something for sure.
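The number-line bookkeeping is easy to confirm by sampling one point from each interval. Here's a tiny sketch (my own; the sample points are arbitrary) evaluating the derivative 3x^2+2x in each of the three regions:

```python
def fprime(x):
    """The derivative of x^3 + x^2."""
    return 3*x**2 + 2*x

# One sample point per region: x < -2/3, -2/3 < x < 0, and x > 0.
print(fprime(-1))     # positive, so the function is increasing there
print(fprime(-1/3))   # negative, so the function is decreasing there
print(fprime(1))      # positive, so the function is increasing again
```

The signs come out positive, negative, positive, matching the up, down, up shape of the graph.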

Why is the derivative of a sum the sum of derivatives?

That's the limit, as x goes to a, of (h(x)-h(a))/(x-a). How do I calculate that limit? Okay. I'm just applying these rules for calculating limits, and one of the rules for calculating limits is that the limit of a sum is the sum of the limits, provided the limits exist. What are these 2 limits? Well, this is really the derivative of f at a, and this is really the derivative of g at a. And I assumed that f and g are both differentiable at a. So, those limits do exist, and I can apply the limit of the sum is the sum of the limits. So this equals the limit as x goes to a of (f(x)-f(a))/(x-a), plus the limit as x goes to a of (g(x)-g(a))/(x-a), because I know those 2 limits exist. And I even know what they're equal to, right? I have a name for those 2 limits. This 1st limit is the derivative of f at a, and this 2nd limit is the derivative of g at a. So this is f'(a) + g'(a), and that's exactly what I wanted to show, right? I wrote down the definition of the derivative of h at the point a, there it is, and I applied properties of limits until I concluded that that limit is equal to the derivative of f at a plus the derivative of g at a, alright? And this is what tells me how to calculate the derivative of a sum. Alright, if I've got a sum of 2 functions, this is telling me that as long as those 2 functions are both differentiable at a, I can calculate the derivative by just adding together the derivatives of f and g. And hopefully this should seem reasonable, right? Because what is the derivative measuring, right? It's measuring how much a change in the input changes the output. Right, I want to know how much wiggling the input a would affect the output of h, and that's what this derivative is measuring. Right? Well, that's really going to be, you know, somehow connected to how wiggling the input to f changes f, and how wiggling the input to g changes g, and I'm just adding them together.
So I think it makes sense that how the output of h changes would be the sum of how these two component functions change.
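The sum rule can be seen numerically: the difference quotient of f + g at a point should land near f'(a) + g'(a). A minimal sketch, where the particular functions f(x) = x^2 and g(x) = x^3 are example choices, not from the lecture:

```python
def diff_quotient(func, a, h=1e-6):
    """Approximate the derivative of func at a by a difference quotient."""
    return (func(a + h) - func(a)) / h

f = lambda x: x**2   # f'(x) = 2x
g = lambda x: x**3   # g'(x) = 3x^2

a = 2.0
lhs = diff_quotient(lambda x: f(x) + g(x), a)    # derivative of the sum
rhs = diff_quotient(f, a) + diff_quotient(g, a)  # sum of the derivatives

print(lhs, rhs)  # both near f'(2) + g'(2) = 4 + 12 = 16
```

Both quantities approximate the same number, 16, which is what the sum rule predicts.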

How do we compute derivatives?



What is the derivative of f(x) g(x)?



Morally, why is the product rule true?


We've used the product rule to calculate some derivatives. We've even seen a proof using limits, but there's still this nagging question: why? For instance, why is there this + sign in the product rule? I mean, really, with all those other laws, the limit of a sum is the sum of the limits, the limit of a product is the product of the limits, you'd probably think the derivative of a product is the product of the derivatives. No, that's not how products work. What happens when you wiggle the terms in a product? We can explore this numerically, so play around with this. I've got a number a and another number b, and I'm multiplying them together to get some new number, ab. Initially, I've set a=2 and b=3, so ab=6. But now I can wiggle the terms and see how that affects the output. So what if I take a and move it from 2 to 2.1? Well, that affects the output; the output is now 6.3. Conversely, what if I move that back down and move b from 3 to 3.1? Well, that moves the output from 6 to 6.2. The deal here is that wiggling one term affects the output by a magnitude that's related to the size of the other term, right? When I went from 2 to 2.1, the output was affected by about three times as much, because of the 3. When I moved b from 3 to 3.1, the output was affected by about two times as much, and these effects add together. What if I simultaneously move a from 2 to 2.1 and move b from 3 to 3.1? Then the output is 6.51, which is close to 6.5, which is what you'd guess the answer would be if you just added together these effects. We can see the same thing geometrically. Geometrically, the product is really measuring an area. So let me start with a rectangle of base f(x) and height g(x). The product of f(x) and g(x) is then the area of this rectangle. Now, I want to know how this area is affected when I wiggle from x to, say, x+h.
So let's suppose that I do that. Let's suppose that I slightly change the size of the rectangle, so that now the base isn't f(x) anymore, it's f(x+h), and the height isn't g(x) anymore, it's g(x+h). Now, how does the area change when the input goes from x to x+h? Well, that's exactly the area of this L-shaped region here. I can compute that approximately. I actually know how much the base changes, approximately, by using the derivative, right? What's this length here, approximately? Well, the derivative of f at x times the input change is an approximation to how much the output changes when I go from x to x+h. So this distance is approximately f'(x) times h. Same deal over here. When the input goes from x to x+h, the output is changed by approximately the derivative times the input change, so this length here is about g'(x) times h. Now, I'm trying to compute the area of this L-shaped region to figure out how the area, the product, changes when I go from x to x+h. Let me cut this L-shaped region up into three pieces. This corner piece is pretty small, so I'm going to end up disregarding that corner piece. But let's just look at these two big pieces here. This piece here is a rectangle, and what's its area? Well, its base is f(x) and its height is g'(x) times h. So the area of this piece is f(x) times g'(x) times h. What's the area of this rectangle over here? Well, its base is f'(x) times h and its height is g(x), so the area of this piece is f'(x) times g(x) times h. Now, I want to know how the area changed when I went from x to x+h. Well, that's pretty close to the sum of these two rectangles. So the change in area is about f(x) times g'(x) times h, plus f'(x) times g(x) times h. The derivative is the ratio of output change, which is about this, to input change, which in this case is h. I went from x to x+h.
So now, I can cancel these h's, and what I'm left with is f(x)g'(x) + f'(x)g(x). That's the product rule. That's the change in the area of this rectangle when I went from x to x+h, divided by how much I changed the input, h. The product rule isn't something that we just made up. It's not some sort of sinister calculus plot designed to turn your mathematical dreams into nightmares. This rule, the product rule, arises for understandable reasons. If you wiggle one of the terms in a product, the effect on the product has to do with the size of the other term. You add together these two effects, and then you have some idea as to how the product changes based on how the terms change. This is more than just a rule to memorize. It's more than just an algorithm to apply. The product rule is telling you something deep about how a product is affected when its terms are changed.
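The wiggling experiment above can be reproduced in a few lines. Starting from a = 2 and b = 3 (so ab = 6), each term is bumped by 0.1, and the combined change is compared with the add-the-two-effects estimate; the only discrepancy is the tiny corner piece da*db:

```python
a, b = 2.0, 3.0
da, db = 0.1, 0.1

exact    = (a + da) * (b + db)      # 2.1 * 3.1 = 6.51
estimate = a * b + b * da + a * db  # 6 + 0.3 + 0.2 = 6.5

print(exact, estimate)  # they differ only by the corner term da*db = 0.01
```

This is exactly the geometric picture: the two big rectangles give b·da + a·db, and the disregarded corner piece da·db is what separates 6.5 from 6.51.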

How does one justify the product rule?


We can derive the product rule by just going back to the definition of derivative. So what does the definition of derivative say? It tells us that the derivative of the product of f(x) and g(x) is a limit. It's the limit as h approaches zero of the function at x+h, which, in this case, is the product of f and g, both evaluated at x+h, because I'm thinking of this product as the function, so I'm plugging in x+h; then I subtract the function evaluated at x, which is just f(x)*g(x); and then I divide that by h. So, it's this limit of this difference quotient that gives me the derivative of the product. How can I evaluate that limit? Here's the trick: I'm going to add a disguised version of zero to this limit. Instead of just calculating the limit of (f(x+h)*g(x+h) - f(x)*g(x))/h, I'm going to subtract and add the same thing. So here, I've got f(x+h)*g(x+h), just like up here. Now I'm going to subtract f(x+h)*g(x), and then add it back in, plus f(x+h)*g(x). This is just zero; I haven't done anything. And I'm going to subtract f(x)*g(x) right here, and I'm still dividing by h. So these are the same limits. I haven't really done anything, but I've actually done everything I need. By introducing these extra terms, I've now got a common factor of f(x+h) here and a common factor of g(x) here. So, I can collect those out, and I'll get some good things happening as a result. Let's see exactly how this happens. So this is the limit as h goes to zero; I'm going to pull out that common factor of f(x+h), and I'm going to multiply by what's left over, g(x+h)-g(x), and I can put it over h. So that's these two terms. Now, what's left over here? I've got a common factor of g(x). And what's left over? f(x+h)-f(x); I'll divide this by h, and then the factor I pull out is g(x). So this limit is the same as this limit. Now this is a limit of a sum, so that's a sum of the limits, provided the limits exist, and we'll see that they do.
So this is the limit as h goes to zero of f(x+h) * (g(x+h)-g(x))/h, plus the limit as h goes to zero of (f(x+h)-f(x))/h * g(x).
Now, what do I have here? I've got limits of products, which are the products of limits provided the limits exist, and they do, as we'll see. So let's rewrite these limits of products as products of limits. This is the limit as h goes to zero of f(x+h), times the limit as h goes to zero of (g(x+h)-g(x))/h. You might begin to see what's happening here. Plus the limit as h goes to zero of (f(x+h)-f(x))/h, times the limit as h goes to zero of g(x). Okay, now we've got to check that all these limits exist, in order to justify these replacements. But these limits do exist; let's see why. This first limit, the limit of f(x+h) as h goes to zero, is actually the hardest one of all of these to see, I think. Remember back, we showed that differentiable functions are continuous. This is really calculating the limit of f of something, as the something approaches x. And because f is continuous, because f is differentiable, this limit is actually just f(x). But I think seeing that step is probably the hardest in this whole argument. What's this thing here? Well, this is the limit that calculates the derivative of g, and g is differentiable by assumption. So, this is the derivative of g at x. Plus, what's this limit? This is the limit that calculates the derivative of f, and f is differentiable by assumption, so that's f'(x). This is the limit of g(x) as h goes to zero. This is the limit of a constant; wiggling h doesn't affect this at all, so that's just g(x). And look at what we've calculated here. The limit that calculates the derivative of the product is f(x)*g'(x) + f'(x)*g(x). That is the product rule. What have we really shown here? Well, here is one way to write down the product rule very precisely. Confusingly, I'm going to define a new function that I'm calling h. So h is just the product of f and g: h(x)=f(x)*g(x).
If f and g are differentiable at some point a, then I know the derivative of their product. The derivative of their product is the derivative of f times the value of g, plus the value of f times the derivative of g. This is a precise statement of the product rule, and you can really see, for instance, where this differentiability condition was necessary. At some point in the proof here, I wanted to go from a limit of a product to the product of limits. But in order to do that, I needed to know that this limit exists. And that limit is exactly calculating the derivative of g. So you can really see where these conditions are playing a crucial role in the proof of the product rule.
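The precise statement h'(a) = f'(a)g(a) + f(a)g'(a) can be sanity-checked with a small-h difference quotient. A sketch, where the particular f and g (sine and squaring) are example choices, not from the lecture:

```python
import math

f, fp = math.sin, math.cos   # f'(x) = cos x
g  = lambda x: x**2          # g(x) = x^2
gp = lambda x: 2 * x         # g'(x) = 2x

a, h = 1.0, 1e-6
# Difference quotient of the product f(x)g(x) at a:
lhs = (f(a + h) * g(a + h) - f(a) * g(a)) / h
# Product rule prediction at a:
rhs = fp(a) * g(a) + f(a) * gp(a)

print(lhs, rhs)
```

The two numbers agree to several decimal places, as the product rule predicts.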

What is the quotient rule?


Given what we've done so far, we can differentiate a bunch of functions. We can differentiate sums and differences and products. But what about quotients? Given a fraction, I'd like to be able to differentiate that fraction. I'd like to be able to differentiate a really complicated-looking function like f(x)=(2x+1)/(x^2+1), for instance. But we're stuck immediately, because we don't have any way to differentiate quotients, until now. Here's the quotient rule. To state this really precisely, let's suppose I've got two functions f and g, and then I define a new function that I'm just going to call h for now. h(x) is the quotient f(x)/g(x). Now, I also want to make sure that the denominator isn't 0 at the point a, so it makes sense to evaluate this function at the point a. And I want to assume that f and g are differentiable at the point a. And I'm trying to understand how h changes, so I'm going to need to know how f and g change when the input wiggles a bit. Alright, so given all this setup, I can tell you what the derivative of the quotient is: the derivative of the quotient at a is the denominator at a times the derivative of the numerator at a, minus the numerator at a times the derivative of the denominator at a, all divided by the square of the denominator at a. Let's use the quotient rule to differentiate the function that we saw earlier. So, the function we were thinking about is f(x)=(2x+1)/(x^2+1). I want to calculate the derivative of that with respect to x. Now, the derivative of this quotient is given to us by the quotient rule. It's just the denominator times the derivative of the numerator, minus the numerator times the derivative of the denominator, all divided by the denominator squared. Now, I've calculated the derivative of this quotient in terms of the derivatives of the numerator and denominator. So we can simplify this further: (x^2+1) times the derivative of this sum, and the derivative of a sum is the sum of the derivatives.
It's the derivative of 2x plus the derivative of 1, minus (2x+1) times, again, the derivative of a sum: the derivative of x^2 with respect to x plus the derivative of 1. And it's all divided by the original denominator squared. I can keep going. I've got (x^2+1) times, what's the derivative of 2x? It's just 2. What's the derivative of this constant? Zero. Minus (2x+1) times, what's the derivative of x^2? It's 2x. And what's the derivative of 1? It's the derivative of a constant: zero. All divided by (x^2+1)^2. So, this is the derivative of the original function we're considering; there's no more differentiation to be done, and we did it using the quotient rule. We've done a ton of work on differentiation so far: we can differentiate sums, differences, products, and now quotients. What sorts of functions can we differentiate using all of these rules? Well, here's one big collection. If you've got a polynomial divided by a polynomial, these things are called rational functions, in analogy with rational numbers, which are integers over integers. A polynomial over a polynomial is, by analogy, called a rational function. Now, since this is just a quotient of two things you can differentiate, you can differentiate these rational functions. This is a huge class of functions that you can now differentiate. I encourage you to practice with the quotient rule. With some practice, you'll be able to differentiate any rational function that we can throw at you.
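The worked example above can be checked numerically. A sketch for f(x) = (2x+1)/(x^2+1): the quotient rule gives f'(x) = ((x^2+1)·2 - (2x+1)·2x)/(x^2+1)^2, which we compare against a difference quotient at a sample point:

```python
def f(x):
    return (2 * x + 1) / (x**2 + 1)

def f_prime(x):
    # denominator * (numerator)' - numerator * (denominator)', over denominator^2
    return ((x**2 + 1) * 2 - (2 * x + 1) * 2 * x) / (x**2 + 1) ** 2

a, h = 0.5, 1e-6
numeric = (f(a + h) - f(a)) / h  # difference-quotient approximation of f'(a)

print(f_prime(a), numeric)
```

The closed-form quotient-rule derivative and the numerical approximation agree closely at the sample point.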

What is the meaning of the derivative of the derivative?


Thus far, I've been trying to sell you on the idea that the derivative of f measures how wiggling the input affects the output. A very important point is that sensitivity to the input depends on where you're wiggling the input. Here's an example. Think about the function f(x)=x^3. f(2), which is 2^3, is 8. f(2.01), which is 2.01 cubed, is 8.120601. So, the input change of 0.01 was magnified by about 12 times in the output. Now, think about f(3), which is 3^3, which is 27. f(3.01) is 27.270901, so the input change of 0.01 was magnified by about 27 times, right? This input change and this input change were magnified by different amounts. You know, you shouldn't be too surprised by that, right? The derivative, of course, measures this. The derivative of this function is 3x^2, so the derivative at 2 is 3*2^2 = 3*4 = 12, and not coincidentally, there's a 12 here and there's a 12 here, right? That's reflecting the sensitivity of the output to the input change. And the derivative of this function at 3 is 3*3^2 = 3*9 = 27, and again, not too surprisingly, here's a 27, right? The point is just that how much the output is affected depends on where you're wiggling the input. If you're wiggling around 2, the output is affected by about 12 times as much; if you're wiggling around 3, the output is affected by about 27 times as much, right? The derivative isn't constant everywhere; it depends on where you're plugging in. We can package together all of those ratios of output changes to input changes as a single function. What I mean by this, well, f'(x) is the limit as h goes to 0 of (f(x+h)-f(x))/h. And this limit doesn't just calculate the derivative at a particular point. This is actually a rule, right, a rule for a function. The function is f'(x), and this tells me how to compute that function at some input x. The derivative is a function. Now, since the derivative is itself a function, I can take the derivative of the derivative.
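The numbers above are easy to reproduce. For f(x) = x^3, an input change of 0.01 is magnified by roughly f'(x) = 3x^2, so by about 12 near x = 2 and about 27 near x = 3:

```python
def f(x):
    return x**3

dx = 0.01
mag_at_2 = (f(2 + dx) - f(2)) / dx  # about 12 = 3 * 2^2
mag_at_3 = (f(3 + dx) - f(3)) / dx  # about 27 = 3 * 3^2

print(mag_at_2, mag_at_3)
```

The two magnification ratios, 12.0601 and 27.0901, are close to the derivative values 12 and 27, showing that the sensitivity really does depend on where you wiggle.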
I'm often going to write the second derivative, the derivative of the derivative, this way: f''(x). There are some other notations that you'll see in the wild as well. So, here's the derivative of f. If I take the derivative of the derivative, this would be the second derivative, but I might write this a little bit differently. I could put these two d's together, so to speak, and these dx's together, and then I'll be left with this: the second derivative of f, d^2f/dx^2. A subtle point here: if f were, say, y, sometimes people write this as dy^2; that's not right. It's d^2y/dx^2 that is the second derivative of y. The derivative measures the slope of the tangent line, geometrically. So, what does the second derivative measure? Well, let's think back to what the derivative is measuring. The derivative is measuring how changes to the input affect the output. The derivative of the derivative measures how changing the input changes how changing the input changes the output, and I'm not just repeating myself here; that's really what the second derivative is measuring. It's measuring how the input affects how the input affects the output. If you say it like that, it doesn't make a whole lot of sense. Maybe a geometric example will help convey what the second derivative is measuring. Here's a function, y=1+x^2. I've drawn this graph and I've selected three points on the graph. Let's look at the tangent lines through those three points. So, here's the tangent line through this bottom point, the point (0,1), and the tangent line to the graph at that point is horizontal, right? The derivative is 0 there. If I move over here, the tangent line has positive slope, and if I move over to this third point and draw the tangent line now, the derivative there is even larger. The line has more slope than the line through that middle point. What's going on here is that the derivative is different.
Here it's 0, here it's positive, here it's larger still, right? The derivative is changing, and the second derivative is measuring how quickly the derivative is changing. Contrast that with, say, this example of a perfectly straight line. Here, I've drawn three points on this line. If I draw the tangent line to this line, it's just itself. I mean, the tangent line to this line is just the line I started with, right? So, the slope of this tangent line isn't changing at all. And the second derivative of this function, y=x+1, really is 0, right? The function's derivative isn't changing at all. In the other example, the function's derivative really is changing, and I can see that if I take the second derivative: if I differentiate 1+x^2, I get 2x, and if I differentiate that again, I just get 2, which isn't 0. There's also a physical interpretation of the second derivative. So, let's call p(t) the function that records your position at time t. Now, what happens if I differentiate this? What's the derivative with respect to time of p(t)? I might write that p'(t). That's asking, how quickly is your position changing? Well, that's velocity. That's how quickly you're moving. We've got a word for that. Now, I could ask the same question again. What happens if I differentiate velocity? I'm asking how quickly your velocity is changing. We've got a word for that, too. That's acceleration. That's the rate of change of your rate of change. There's also an economic interpretation of the second derivative. So, maybe right now dhappiness/ddonuts for me is equal to 0, right? What is this saying? This is saying how much my happiness would be affected if I changed my donut-eating habits. If I were really an economist, I'd be talking about the marginal utility of donuts or something, but this is really a reasonable statement, right?
This is saying that right at this moment, you know, eating more donuts really won't make me any happier. And I probably am in this state right now, because if this weren't the case, I'd be eating donuts. So, let's suppose this is true right now. And now, something else might be true right now: I might know something about the second derivative of my happiness with respect to donuts. What is this saying? Maybe this is positive right now. This is saying that a small change to my donut-eating habits would affect how changing my donut habits affects how happy I am. If this were positive right now, should I be eating more donuts, even though dhappiness/ddonuts is equal to zero? Well, yeah. If this is positive, then a small change in my donut-eating habits, just one more bite of delicious donut, would suddenly result in dhappiness/ddonuts being positive, which would be great; then I should just keep on eating more donuts. Contrast this with the opposite situation, where the second derivative of happiness with respect to donuts isn't positive but negative. If this is the case, I absolutely should not be eating any more donuts, because if I start eating more donuts, then I'm going to find that eating any more donuts will make me less happy. Let's think about this case geometrically. So here, I've drawn a graph of my happiness depending on how many donuts I'm eating. And here are two places that I might be standing right now on the graph. These are two places where the derivative is equal to zero. And I sort of know that I must be standing at a place where the derivative is 0, because if I were standing in the middle, I'd be eating more donuts right now. So, I know that I'm standing either right here, say, or right here. Or maybe here, or here. I'm standing some place where the derivative vanishes.
Now, the question is, how can I distinguish between these two different situations? Right here, if I started eating some more donuts, I'd really be much happier. But here, if I started eating some more donuts, I'd be sadder. Well, look at this situation. This is a situation where the second derivative of happiness with respect to donuts is positive, right? When I'm standing at the bottom of this valley, a small change in my donut consumption starts to increase the extent to which a change in my donut consumption will make me happier, alright? If I find that the second derivative of my happiness with respect to donuts is positive, I should be eating more donuts, to walk up this hill to a place where I'm happier. Contrast that with the situation where I'm up here. Again, the derivative is zero, so a small change in my donut consumption doesn't really seem to affect my happiness. But the second derivative in that situation is negative. And what does that mean? That means a small change to my donut consumption starts to decrease the extent to which donuts make me happier. So, if I'm standing up here and I find that the second derivative of my happiness with respect to donuts is negative, I absolutely shouldn't be eating any more donuts. I should just realize that I'm standing in a place where, at least for small changes to my donut consumption, I'm as happy as I can possibly be, and I should just be content to stay there. There's more to this graph. Look at this graph again. So, maybe I'm standing here. Maybe the derivative of my happiness with respect to donuts is zero. Maybe the second derivative of my happiness with respect to donuts is negative. So, I realize that I'm as happy as I really could be for small changes in my donut consumption.
But if I'm willing to make a drastic change to my life, if I'm willing to just gorge myself on donuts, things are going to get real bad, but then they're going to get really, really good, and I'm going to start climbing up this great hill. It's not just about donuts; it's also true for calculus. Look, right now, you might think things are as good as they're going to get, and that they're about to get worse. But with just a little bit more work, you're eventually going to climb up this hill, and you're going to find the immeasurable rewards that increased calculus knowledge will bring you.

What does the sign of the second derivative encode?


What are extreme values?


The goal of computing is not numbers but insight. I don't care that f(2) is equal to four. I do care about the qualitative features of a function. Here's a graph of some random function I just made up. A significant qualitative feature of this graph is that right here is a valley in the graph, and up here is a mountaintop. Speaking of mountaintops and valleys is maybe too metaphorical, not precise enough. Let's try to work out a better definition. So, instead of calling this a mountaintop, I'm going to call it a local maximum. Let's be even a little bit more precise. Let's suppose that that maximum value occurs at the input c, where the output is f(c). I'm going to call f(c) the local maximum value of the function near the input c. Here's a precise definition: f(c) is a local maximum value for f if whenever x is near the input c, f(c) is bigger than or equal to f(x). Maybe this isn't even precise enough. A big sticking point with this is the word near. That's sort of a weaselly word. I can make this near precise just like we've been doing with limits. I'll introduce some epsilon. So f(c) is a local maximum value for the function f if there's some small number epsilon, and that's what I mean by near: whenever x is near c, meaning between c-epsilon and c+epsilon, which is really close to c if epsilon is real small, okay, so whenever x is contained in this interval, f(c) is bigger than or equal to f(x). We can give a similar sort of definition for the valleys. Here's that same graph again, and I've highlighted a local minimum on the graph of this function. Near the input c, f(c) is the smallest output for the function. A little bit more precisely, I'm calling it a local minimum value because whenever x is near c, f(c) is less than or equal to f(x). Or, a little bit more precisely again, just like for local maximums, I can replace near with epsilon.
So f(c) is a local minimum value for the function if there's some epsilon measuring the nearness, so that whenever x is between c-epsilon and c+epsilon, f(c) is less than or equal to f(x). Sometimes, I'm going to want to talk simultaneously about local maximums and local minimums. So, we'll call either a local minimum or a local maximum a local extremum. This is kind of a pretentious word, but it's just a word that we can use so that we can talk about both of these concepts simultaneously, because they actually share quite a few features in common. What if I want to talk about multiple extremums? Well, what if we wanted to talk about an octopus? That's not really a problem, but what if you wanted to talk about two of these things? You might call them octopuses now. But you'll find some people will get angry at you if you do that, and will want you to call these octopi. I probably don't really agree with them, but the same problem comes up with minimums and maximums and extremums. You're going to find some people who will demand that you call these minima, maxima, and extrema if you're talking about more than one of these things, but really, either is fine. The word local here is really in contrast to the word global. We'll say, in fact, everyone will say, that f(c) is a global maximum value for the function f if no output of the function is larger than f(c). Maybe that's too cute a way to say it. Here's a more precise way to say it: f(c) is a global maximum value for the function if whenever x is in the domain of f (I'm only going to be considering inputs in the domain, of course), then f(c) is bigger than or equal to f(x). Same deal for minimum values: f(c) is a global minimum value for the function if whenever I've got a point x in the domain, f(c) is less than or equal to f(x).
One subtle thing to point out here is that I'm not claiming, say for global maximum values, that this is the biggest output of the function. What I'm saying is that any other output isn't larger than f(c); f(c) is bigger than or equal to any other output of the function. And the same deal for the global minimum: I'm not saying this is the smallest output of the function, I'm just saying that any other output is bigger than or equal to this output. Now we can see some examples of this. Here's that same graph we've been looking at so many times. Let's try to figure out where the local extrema are. This point here is a local maximum; that's the biggest output value among nearby input values. This point here is a local minimum, right? You're sitting in that valley; that's the smallest output value among nearby inputs. And up here is another local max, and this local maximum is also a global maximum, if you're assuming that this graph just continues down on the left and right-hand sides. You should also note that if you really believe this thing continues down like this, this function has no global minimum. Nobody's promising you that there is a global minimum or a global maximum, and in this case, there isn't one. There's no global minimum. There can definitely be multiple local maximums, but there can also be multiple global maximums. So here, I've drawn another graph, where I've rigged it so that these two output values are the same. So these are both local maximums, but they're also both global maximums. Alright? So in this case, these are both global and also local maximums. There's an even more, dare I say it, extreme version of this. Here, I've graphed the constant function y=17; 17 is both a global maximum value and a global minimum value for this constant function. The distinction between local and global maximums is really quite important, even in everyday life.
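The "bigger than or equal to nearby outputs" definition translates directly into a crude numerical hunt for local extrema: sample the function finely and flag any interior sample that is at least as big (or as small) as both neighbors. A minimal sketch; the function x^3 - 3x is an example choice, with a local max at x = -1 and a local min at x = 1:

```python
def local_extrema(func, xs):
    """Return (maxima, minima) among interior sample points xs."""
    ys = [func(x) for x in xs]
    maxima, minima = [], []
    for i in range(1, len(xs) - 1):
        if ys[i] >= ys[i - 1] and ys[i] >= ys[i + 1]:
            maxima.append(xs[i])
        if ys[i] <= ys[i - 1] and ys[i] <= ys[i + 1]:
            minima.append(xs[i])
    return maxima, minima

xs = [i / 100 for i in range(-300, 301)]  # sample [-3, 3] at step 0.01
maxima, minima = local_extrema(lambda x: x**3 - 3 * x, xs)

print(maxima, minima)  # near x = -1 and x = 1 respectively
```

Note that this only detects local extrema: since this cubic runs off to minus and plus infinity, it has no global maximum or minimum at all, matching the caveat above.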
When you're standing at a local maximum, on the top of this mountain, small changes to your situation just make things worse. And yet, if you're willing to go through this valley, you'll eventually come up here, to what at least appears to be the global maximum of this function.

How can I find extreme values?


If we care about extreme values, it would really help to know how to find them. We can use a theorem, a theorem of Fermat. Here's Fermat's theorem. Suppose f is a function defined on the interval between a and b, and c is some point in this interval; this backwards E means that c is in this interval. Okay, so that's the setup. Here's Fermat's theorem: if f(c) is an extreme value of the function f, and the function is differentiable at the point c, then the derivative vanishes at the point c. It's actually easier to show something slightly different. So, instead of dealing with this, I'm going to deal with a different statement. Same setup as before, but now I'm going to try to show that if f is differentiable at the point c and the derivative is nonzero, then f(c) is not an extreme value. It's worth thinking a little bit about the relationship between the original statement, where I'm starting off with the claim that f(c) is an extreme value and then concluding that the derivative vanishes, and this one, where I'm beginning by assuming the derivative doesn't vanish and then concluding that f(c) is not an extreme value. This is the thing that I want to try to prove now. Why is that theorem true? Let's get some intuitive idea as to why this is true. Why is it that if f is differentiable at a point c with nonzero derivative there, then f(c) isn't an extreme value? Well, to get a sense of this, let's take a look at this graph. Here, I've drawn a graph of some random function. I've picked some point c, and f'(c) is not zero. f is differentiable there, but the derivative's nonzero; it's negative. Now, what does that mean? That means if I wiggle the input a little bit, I actually do affect the output. If I increase the input a bit, the output goes down. If I decrease the input a bit, the output goes up. Consequently, that can't be an extreme value.
That's not the biggest or the smallest value when I plug in inputs near c, and that's exactly what this statement is saying. If the derivative is not zero, that means I do have some control over the output if I change the input a little bit. That means this output isn't an extreme value, because I can make the output bigger or smaller with small perturbations to the input. I'd like to have a more formal, more rigorous argument for this. So, let's suppose that f is differentiable at c, and the derivative is equal to L, for L some nonzero number. Now, what that really means, from the definition of derivative, is that the limit of (f(c+h)-f(c))/h as h approaches zero is equal to L. Now, what does this limit say? Well, the limit's saying that if h is near enough zero, I can make this difference quotient as close as I like to L. In particular, I can guarantee that (f(c+h)-f(c))/h is between (1/2)L and (3/2)L. Notice what just happened. I started with infinitesimal information; I'm just starting with a limit as h approaches zero. And I've promoted this infinitesimal information to local information. Now I know that this difference quotient is between these two numbers as long as h is close enough to zero. Now, we can continue the proof. So, I know that if h is close enough to zero, this difference quotient is between (1/2)L and (3/2)L. I'm going to multiply all of this by h. What do I find out then? Then, I find out that if h is near enough zero, multiplying all of this by h, f(c+h)-f(c) is between (1/2)hL and (3/2)hL. I get from here to here just by multiplying by h. Now, what can I do? Well, I can add f(c) to all of this. And I'll find out that if h is near enough zero, then f(c+h) is between (1/2)hL+f(c) and (3/2)hL+f(c). Okay. So, this is what we've shown: if h is small enough, f(c+h) is between these two numbers. Now, why would you care? Well, think about some possibilities. What if L is positive?
If L is positive and h is some small but positive number, this is telling me that f(c+h), being between these two numbers, is in particular bigger than f(c). That means that f(c) can't be a local maximum. Same kind of game: what if L is positive but I pick h to be negative but real close to zero? Then, f(c+h), being between these two numbers, must actually be less than f(c). That means that f(c) can't be a local minimum. You play the same game when L is negative. Let's summarize what we've shown. We've shown that if a differentiable function has nonzero derivative at the point c, then f(c) is not an extreme value. And we can play this in reverse. That means that if f(c) is a local extremum, right? If f(c) is an extreme value, then either the derivative doesn't exist, which prevents this first case from happening, where f is differentiable at c, or the derivative's equal to zero, which prevents this second thing, f'(c) being nonzero. So, this is another way to summarize what we've done. If f(c) is a local extremum, then one of these two possibilities occurs. Both of these possibilities do occur. Here's the graph of the absolute value function. There's a local and a global minimum at the point zero. The derivative of this function isn't defined at that point. Here's another example. Here's the graph of y=x^2. This function also has a local and a global minimum at the point zero. Here, the derivative is defined, but the derivative is equal to zero at this point. We'll often want to talk about these two cases together: the situation where the derivative doesn't exist and the situation where the derivative vanishes. Let's give a name to this phenomenon. Here we go. If either the derivative at c doesn't exist, meaning the function's not differentiable at c, or the derivative is equal to zero, then I'm going to call the point c a critical point for the function f. 
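A quick numerical sketch (my addition, not part of the lecture) contrasting the two examples just mentioned, |x| and x², both of which have a minimum at zero for different reasons:

```python
# Two ways a critical point can arise at x = 0:
# x^2 has a minimum where the derivative is zero;
# |x| has a minimum where the derivative doesn't exist.

def central_diff(f, x, h=1e-6):
    """Symmetric difference quotient, an estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# For x^2, the derivative at 0 vanishes.
assert abs(central_diff(lambda x: x * x, 0.0)) < 1e-9

# For |x|, the one-sided slopes at 0 disagree (-1 from the left,
# +1 from the right), so no derivative exists there.
left = (abs(0.0) - abs(-1e-6)) / 1e-6
right = (abs(1e-6) - abs(0.0)) / 1e-6
assert left == -1.0 and right == 1.0
print("x^2: derivative vanishes; |x|: one-sided slopes disagree")
```

Either way, x = 0 qualifies as a critical point under the definition just given.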
This is a great definition because it fits so well into Fermat's theorem. Here's another way to say Fermat's theorem. Suppose f is a function defined on this interval and c is contained in there. Then, if f(c) is an extreme value of f, well, we know that one of two possibilities must occur at that point: either the function's not differentiable, or the derivative's equal to zero. And that's exactly what we mean when we say that c is a critical point of f. So, giving a name to this phenomenon gives us a really nice way of stating Fermat's theorem. The upshot is that if you want to find extreme values for a function, you don't have to look everywhere. You only have to look at the critical points, where the derivative either doesn't exist or the derivative vanishes. And you should probably also worry about the endpoints. In other words, if you're trying to find rain, you should just be looking for clouds. You just have to check the cloudy days to see if it's raining. In the same way, if you're looking for extreme values, you only need to look for the critical points, because an extreme value gives you a critical point. So, here's a super concrete example. Here's a function, y=x^3-x. I've graphed this function. I'm going to try to find these local extrema, this local maximum and this local minimum, using the machinery that we've set up. So, if I'm looking for local extrema, I should be looking for critical points. So, here's the function, f(x)=x^3-x.  
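The transcript cuts off before the derivative is worked out. As a sketch of where the computation is headed: the power rule gives f'(x) = 3x² − 1, which is defined everywhere, so the only critical points are where it vanishes, namely x = ±√(1/3):

```python
import math

# The lecture's concrete example: f(x) = x^3 - x.
def f(x):
    return x**3 - x

def fprime(x):
    return 3 * x**2 - 1  # derivative by the power rule

# Solving f'(x) = 0: 3x^2 = 1, so x = +/- sqrt(1/3) ~ +/- 0.577.
critical_points = [-math.sqrt(1 / 3), math.sqrt(1 / 3)]
for c in critical_points:
    assert abs(fprime(c)) < 1e-12  # the derivative vanishes here

# These match the graph: a local maximum on the left,
# a local minimum on the right.
print(f(-math.sqrt(1 / 3)))  # local maximum value, 2/(3*sqrt(3))
print(f(math.sqrt(1 / 3)))   # local minimum value, -2/(3*sqrt(3))
```

Since f' exists everywhere, Fermat's theorem guarantees these two points are the only candidates for local extrema.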

Do all local minimums look basically the same when you zoom in?


People have the idea that a local minimum means the function decreases and then increases. Here's a local minimum on the graph of this random function. And the misconception is that they all look like this: that every time you've got a local minimum, on one side the function's decreasing, and on the other side the function's increasing. Plenty of local minima do look exactly like that. But there's also plenty of pathological examples. For instance, consider this somewhat pathological example. I'm going to define this function f as a piecewise function. If the input's nonzero, I'm going to do this: 1+sin(1/x), which makes sense since x isn't zero, times x^2. And if the input is zero, the function's output will also be zero. In this case, there's a local minimum at zero. How do I know? Well, here's how I know. Let's take a look at this function. The claim is that f(x) is never negative. How do I know that? Well, what do I know about sine? Sine of absolutely anything at all, no matter what I take the sine of, is between -1 and 1. Now, if I add 1 to this, 1 plus sine of absolutely anything at all is between zero and two. Now, that's pretty good. Now, think back to the definition of this function. Here, I've got 1 plus sine of something, it doesn't matter what, alright? 1 plus sine of anything, this is between zero and two. Now, I'm multiplying it by x^2. What do I know about x^2? Well, x^2 is not negative. It could be zero, it could be positive. But no matter what x is, x^2 is not negative. Now, I'm multiplying 1+sin(1/x), this number which is trapped between zero and two, by x^2, which is never negative. And that means f(x) is not negative as long as x isn't equal to zero, alright? As long as x isn't equal to zero, in this first case, this is a non-negative number times a non-negative number, so the product is also non-negative. Now, the other possibility, of course, is that I plug in zero for x. 
But then, f(0) is just by definition zero. And that means, in either case, no matter what I plug in for x, f(x) is never negative. Now, if f(x) is never negative and f(0)=0, then I know that this must be the smallest possible output value for the function. The only numbers that are smaller than zero are negative numbers, and the output of this function is never negative. But this isn't the usual sort of local minimum where the function just decreases and then increases. Well, here's the graph of our function f. And there is a local minimum at zero, but if I start zooming in, no matter how much I zoom in, there's no little region on which the graph is just decreasing and then increasing. The graph is always wiggling. The upshot here is that decreasing and then increasing is one way to produce a local minimum, but it's not the definition of a local minimum. And not every local minimum arises in that exact way. What a local minimum means is just that no nearby output value is smaller than that local minimum value.  
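Both claims about this pathological function can be verified numerically. This sketch (my addition) checks that f is never negative, and that the graph keeps touching zero arbitrarily close to the origin, at every x where sin(1/x) = −1, so there's no interval on which it simply decreases and then increases:

```python
import math

# The lecture's piecewise function:
# f(x) = (1 + sin(1/x)) * x^2 for x != 0, and f(0) = 0.
def f(x):
    if x == 0:
        return 0.0
    return (1 + math.sin(1 / x)) * x**2

# f is never negative: 1 + sin(anything) is in [0, 2], and x^2 >= 0.
assert f(0) == 0.0
for i in range(1, 2000):
    x = i / 1000 - 1  # sample points across (-1, 1)
    assert f(x) >= 0

# But f touches zero again and again near the origin, wherever
# sin(1/x) = -1, i.e. at x = 1 / (3*pi/2 + 2*pi*k) for k = 0, 1, 2, ...
for k in range(5):
    x = 1 / (3 * math.pi / 2 + 2 * math.pi * k)
    assert abs(f(x)) < 1e-12
print("0 is a global minimum, yet the graph keeps wiggling near it")
```

Since those touch points march down toward zero, every neighborhood of the origin contains both zeros of f and positive values of f, which is exactly why no amount of zooming produces a plain decrease-then-increase picture.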





