3.1 Introduction to Probabilistic Models
In another module we talked about deterministic models. Deterministic models are those in which there's no uncertainty in either the inputs or the output of the model. In this module, we're going to address the situation that often arises in practice where there's uncertainty in the picture. And in particular, we're going to talk about probabilistic models. So in terms of content, we'll define a probabilistic model. We're going to talk about random variables and probability distributions, which are the building blocks of these probabilistic models. We'll look at a set of examples, so you can see instances and applications of these probabilistic models. And then, once we have some motivation and examples under our belt, we're going to have a look at some specific probability distributions and summaries of those probability distributions, in particular means, variances, and standard deviations.
We're going to look at random variables that are termed Bernoulli, Binomial, and Normal. These three types of random variables are very foundational in terms of probabilistic modeling. They're not the only ones that are out there, but they're certainly core to the process. And then, we'll finish off by looking at a rule that is termed the Empirical Rule, which is suitable when your data or underlying model is approximately normally distributed. Now, in terms of a definition of a probabilistic model, these are models that incorporate random variables and probability distributions. Now, we often have an intuitive sense of what a random variable means, but formally, a random variable represents the potential outcomes of an uncertain event. So an easy way of thinking about a random variable is as an event that has not yet happened but you know is going to. So for example, I have a die in my hand, I'm going to throw it, a number is going to come up, one through six, but I don't know what it is prior to having thrown the die. So prior to throwing the die we'd call the outcome a random variable.
Now, along with a random variable comes a probability distribution. And the probability distribution is used to assign probabilities to the potential outcomes. And so, if we're talking about a die, and it's a fair die, there are going to be six potential outcomes, and fair means that each outcome would have probability one-sixth. So that's what we mean by the probability distribution.
And we use these probabilistic models in practice because to be realistic in our decision making we often have to acknowledge that we don't have absolute certainty in the inputs, and consequently there's going to be uncertainty associated with the outputs as well. And so, being able to formulate and use probabilistic models will allow you to be more relevant and more appropriate in many business situations that you'll find yourself in.
So the key feature of the probabilistic model is that it incorporates uncertainty explicitly in the model. And because we have incorporated that uncertainty explicitly, we are able to propagate that uncertainty through the model, so that when we get an output from the model, we are able to understand the uncertainty in the output as well. So models are often used to create forecasts, but typically it's much more useful to have a range of potential outcomes rather than a single best guess. And a probabilistic model will often allow us to give a range of potential outcomes, and that's just a more realistic thing to do.
Another aspect of probabilistic models is that probability and uncertainty are typically synonymous with risk in the business setting. And businesses, if they're going to operate well, need to understand the risk of the environment that they operate in. If it's a financial firm, they need to understand risk associated with the stock market. If it's an insurance company, they'll need to understand risk perhaps associated with weather events. And so, if you're interested in risk then you are absolutely interested in uncertainty and hence probability. So anyone who's dealing with risk is going to have to be able to formulate and use probabilistic models.
Now, I wanted to start off with a couple of examples. So you can see how probabilistic thinking can ultimately be useful in the decision making process.
So the first example that I have is to think about a company that is very energy intensive in the sense that it uses a lot of energy resources. So an example of such a company, would be an airline. For the sorts of passenger airlines that we typically fly on,
the cost of jet fuel is something like 20-25% of their operating expenses. So fuel, and ultimately, oil, which the jet fuel comes from, is a key component of the entire business. And if an airline wanted to plan for the future, not tomorrow perhaps, but medium term and long term planning and that would typically involve the purchase or the leasing of new planes, then clearly the anticipated future price of oil
ultimately jet fuel, is going to be very, very important to them. If we believe that the price of oil is going to remain low, then certain types of aircraft might well still be profitable to the company, whereas if the price of oil becomes very, very high, then it might well change the mix of aircraft that they would want to fly if they want to remain profitable.
But immediately we're faced with a big problem, because we're trying to do medium or long term planning, and it would be a very bold person who would get up and say, yes, I know what the price of oil is going to be in five years' time, or ten years' time. So how do we deal with this situation? We know that this quantity, the price of oil, is a key component of our decision-making process, but at the same time we don't know what it's going to be. So the probabilistic approach to these sorts of problems is to acknowledge that we don't know exactly what the price of oil is going to be, and to use our expert knowledge and models to create, or try to create, some realistic probability distribution that captures the likelihood of the price of oil taking on certain values in the future. So it's the creation of this probability distribution, which is attempting to model the potential prices of oil in the future, that is key to incorporating the energy component into the future planning. And so, if one is able to do that, by which I mean create a realistic probability distribution, and incorporate that into the decision making process, then hopefully the company will be making more informed decisions and will certainly have a better understanding of the risk associated with the decisions that they're making. So that's an example just thinking about energy prices in the future. Here's a second example. Imagine you are an investment company, and you're considering whether or not to invest in a drug company. Drug companies typically have many compounds, potential drugs, under development, and the one that I'm thinking about has ten drugs in a development portfolio.
Now, remember, these drugs are under development, so one doesn't know whether they're going to be approved by the regulator, or whether they're even going to be successful drugs. But it's not unreasonable to estimate what sort of revenue a drug might generate if it were approved. And you could do that by looking at similar drugs within the therapeutic category. So one could say, if this drug gets approved, I anticipate its revenue is going to be a certain amount. But of course whether or not a particular drug is approved in the future is an uncertain event, in other words a random variable. Now, if we can estimate the probability of a drug being approved, and again, we might be able to do that by looking at similar drugs or the history of the company, then we're starting to build up the elements of a probabilistic model that will allow us to make a more informed investment decision.
So let's say we only want to invest in the company if the expected total revenue from this portfolio of 10 drugs is greater than $10 billion in 5 years. So that might be our investment criterion. Of course, we don't know for sure whether or not these drugs are going to get past the regulatory hurdle. But if we've got a probability estimate for whether or not they'll get past, and we've also got an estimate of the revenue that they're going to be able to generate, then we have the building blocks in place to create a probability distribution for the total revenue. And if we're able to create that probability distribution then we can use that as a part of our decision making process. So for example, we could work out the probability that the portfolio creates more than $10 billion in revenue in 5 years' time. So there's a second example, and these are realistic examples. They are activities that companies really do go through, and examples of incorporating probabilistic models.
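As a sketch of the kind of calculation involved, here's a small Python example. The approval probabilities and revenue figures below are entirely hypothetical, invented for illustration; the lecture only specifies a portfolio of 10 drugs and a $10 billion threshold. The expected total revenue is the sum of probability times revenue, and a simple simulation of the approval events estimates the probability of clearing the threshold:

```python
import random

# Hypothetical inputs: approval probability and revenue-if-approved
# (in $ billions) for each of the 10 drugs in the portfolio.
# These numbers are illustrative, not from the lecture.
approval_prob = [0.6, 0.5, 0.4, 0.7, 0.3, 0.5, 0.2, 0.6, 0.4, 0.5]
revenue_if_approved = [3.0, 2.5, 4.0, 1.5, 5.0, 2.0, 6.0, 1.0, 2.5, 3.5]

# Expected revenue of each drug is probability * revenue; the expected
# total is their sum, by linearity of expectation.
expected_total = sum(p * r for p, r in zip(approval_prob, revenue_if_approved))
print(f"Expected total revenue: ${expected_total:.2f}B")

# Simulate the approvals as independent yes/no events to estimate the
# probability that total revenue exceeds $10B.
random.seed(1)
n_sims = 100_000
hits = 0
for _ in range(n_sims):
    total = sum(r for p, r in zip(approval_prob, revenue_if_approved)
                if random.random() < p)
    hits += total > 10.0
print(f"P(total revenue > $10B) ~ {hits / n_sims:.3f}")
```

Whether the expected total clears $10 billion, and how likely the portfolio is to clear it, are exactly the quantities the investment criterion asks about.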
So now we have seen two practical examples of models in practice, and I want to, at this stage, describe some specific probability models that are frequently used in the business setting. Now the first one of these is called a regression model. And I will devote all of another module to the discussion of regression models because they're very, very fundamental to a lot of
forecasting and business analytics activities. In this module I will briefly introduce them but they are certainly an example of a probabilistic model. We're also going to have a look at probability trees. This is a structure that allows you to propagate probabilities through a set of events. They are very useful for modeling various processes and we'll have a look at a couple of examples there. We're going to see a technique that's called Monte Carlo simulation that involves, well you can think of it as a scenario
analysis where you look at lots and lots of scenarios, but the inputs of those scenarios are being created via a probabilistic model. So it's like doing an almost infinite number of scenarios. It's a very useful and very practical technique for solving a lot of very hard problems. And when I say hard problems, those are problems that it's difficult to write down specific equations for. But by doing a Monte Carlo simulation we can often get a very good sense of the uncertainty in these complicated business processes. And the final one we're going to have a look at is called a Markov model, and this is an example of a dynamic model. If you'll recall, in one of the other modules I had talked about various terms that we use for models. One was static and another was dynamic, and a Markov model is an inherently dynamic model, looking at a process moving through various states. So we'll have a look at these four examples.
3.3 Regression Models
Now, what a regression model does is work on data. So it's not deterministic, it's based on a set of data, and we use that data to reverse engineer a realistic description of a process. And so here's a regression model being applied to a set of data that is capturing the price of a diamond and its weight. So if you have a look at the graph on this slide you will see along the x-axis you have carats, in other words weight, and on the vertical y-axis you have the price of a diamond. And each of those dots on the graph is actually a diamond that has been weighed and priced. And you can see that these prices and weights are falling on something that looks approximately like a straight line, it's approximately linear. And what a regression model does is take the data as an input, and
find the best fitting line, in this instance, to the data. And I've written down the formula for the best fitting line. And that best fitting line is the blue line that you can see superimposed on the graphic. And around the blue line I've plotted a gray band. And that gray band is termed a prediction interval. And this is the key difference between a probabilistic and a deterministic model: by using this probabilistic model, we're going to get measures of uncertainty on the outputs. And you can use the gray band there to create a prediction interval for what we term a new observation. So suppose you came to me with a diamond that had come out of the same population that this regression was run against, let's say a diamond that weighs 0.25 of a carat. Then I can use this graph to predict the price of that diamond, and furthermore, I can use the gray bands around the graph to give a prediction interval that captures the range of uncertainty. And clearly you want to be able to do that, because when you look at the points, they don't lie exactly on the straight line. They're pretty close, but they're not exactly on it, so there's some noise in the system, and we're able to measure that noise and incorporate it into our prediction interval and forecast. So that's what a regression model does for you. And as I said before, this is certainly one of the techniques that is most frequently used in business analytics. So to summarize, regression models use data, and they use that data to estimate the relationship between the mean, or the average value, of an outcome, let's call that Y, and a predictor variable X. So going back to the diamonds example, what our regression model is going to do is give us the expected price of a diamond for any given weight.
The intrinsic variation in the raw data, and by that I mean those points were not lying along a straight line exactly, is incorporated in to the forecasts. It's propagated through the regression model. And we are then able to create a prediction interval for our forecast, rather than a single best guess. And the basic idea behind these prediction intervals is that the less noise that there is in the underlying data, then the more precise the forecast and the regression are going to be. So a lot of the activity in regression modeling involves trying to find a model which best explains the data.
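To make the mechanics concrete, here is a minimal sketch of a simple linear regression in Python, using made-up diamond weights and prices (the slide's actual data isn't available). It fits the least-squares line, forecasts the price of a 0.25-carat diamond, and forms a rough prediction interval from the residual noise, ignoring the small extra estimation-uncertainty terms a full regression package would include:

```python
import statistics

# Illustrative data: (carat weight, price in $) for a handful of diamonds.
# These numbers are made up to mimic the roughly linear pattern on the slide.
weights = [0.15, 0.18, 0.20, 0.22, 0.25, 0.28, 0.30, 0.33, 0.35]
prices = [380, 460, 530, 560, 640, 720, 770, 850, 890]

# Least-squares fit: slope = cov(x, y) / var(x), intercept = ybar - slope * xbar.
xbar, ybar = statistics.mean(weights), statistics.mean(prices)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(weights, prices))
sxx = sum((x - xbar) ** 2 for x in weights)
slope = sxy / sxx
intercept = ybar - slope * xbar

# Point forecast for a new 0.25-carat diamond.
forecast = intercept + slope * 0.25

# The residual standard deviation measures the noise around the line; a
# rough ~95% prediction interval for a new observation is forecast +/- 2s.
n = len(weights)
residuals = [y - (intercept + slope * x) for x, y in zip(weights, prices)]
s = (sum(e ** 2 for e in residuals) / (n - 2)) ** 0.5
lo, hi = forecast - 2 * s, forecast + 2 * s
print(f"Forecast: ${forecast:.0f}, rough interval (${lo:.0f}, ${hi:.0f})")
```

The key point mirrors the lecture: the less noise in the underlying data, the smaller `s`, and the narrower the prediction interval.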
3.5 Monte Carlo Simulations
The next example that I want to show you is called a Monte Carlo Simulation. Now Monte Carlo Simulations are very useful for modeling complicated scenarios. The example that I have here I wouldn't claim is particularly complicated, but it will certainly give you a sense of what a Monte Carlo Simulation can do for you. So I'm going to go back to the demand model, the demand model being that the quantity of a product demanded is equal to 60,000 times its price to the power -2.5. In the module where we had looked at this model, we had worked out via calculus that the optimal price given this setup was equal to $3.33, three and a third dollars. And that was based off of the equation that the optimal price was equal to c times b over 1 + b, where c was the cost, and in the example we had the cost equal to 2, and b was the elasticity, and in the example we have b equals -2.5. If you plug those numbers into the optimality equation, you will get three and a third. So that's where the three and a third came from, but we were treating this in a deterministic fashion. In other words, we were saying we know what b is: b is equal to -2.5. But in many situations, you're not actually going to know exactly what b is. b is going to have some uncertainty associated with it, and it would be good if we could propagate the uncertainty that's associated with b all the way through to some uncertainty associated with the optimal price. So what if b is not known exactly? Well, one natural thing to do is to try and put in different potential values of b. Maybe you could put b in at -2.4 and see what happens. Maybe you could put it in at -2.3 and see what happens, to start generating a range. And what Monte Carlo simulation does is take that idea, try different values of b, but draw those values of b from what we call a probability distribution. And each time it draws a new value of b, it calculates the optimal price and stores that, and we will replicate that process.
We will take hundreds or thousands, or even millions of draws, from that probability distribution and end up with an entire distribution for the optimal price. And as I said, in this particular example, there's only one unknown, which is b. But in many examples, and I've worked on examples where we might have a million different unknowns, each one has its own probability distribution. And the same idea follows: you draw each of those unknowns from its probability distribution, propagate them through the formula, the formula here being c times b over 1 + b, and get a range of uncertainty on the outcome. So let's see that working in this particular example. On this slide I'm showing you the input to a Monte Carlo simulation and the output from the simulation. So I'm going to generate the elasticity b from what's termed a uniform distribution. And my knowledge suggests that b lies somewhere between -2.9 and -2.1, and essentially each number between -2.9 and -2.1 is equally likely, so that's what we call a uniform distribution. And I've drawn a picture of that uniform probability distribution for b. And it's a straight line going along the top because every outcome is equally likely. So let's say we take a b from this particular distribution, and now drop it in to our optimality equation, which is c times b over 1 + b. Remember that c is equal to 2 in this particular instance, so we take a b, a random b, and drop it into the formula, and we save the answer. And then we keep doing that. And in this particular example, I have replicated that 100,000 times. I've drawn 100,000 bs from the uniform probability distribution, and each time I have calculated what the optimal price is. And that's what I'm showing you in the histogram at the bottom right-hand graphic on this slide. You can see it has an interesting distribution associated with it, it's not flat.
And the reason it's not flat, even though the input came from a uniform distribution, is that our formula, c times b over 1 + b, is not a linear equation. It's non-linear, and that means that we're not going to have a uniform distribution coming out. And we can see that some outcomes are more likely than others; those are the places where the histogram is higher. Once we've got this entire probability distribution for the output, remember that's the optimal price, we can do some useful things with it. On the distribution, I have drawn in where our single best guess sits, that was the three and a third. But I've also placed what one might term a range of feasible values around there. I've created an interval that captures 80% of the draws from this distribution. And that 80% interval ranges from 3.1 to 3.7. And so I could use that interval as a more realistic basis for understanding the uncertainty in the optimal price, given that I acknowledge that I don't know exactly what b is. So that's the basic idea of a Monte Carlo simulation. It's like a scenario analysis, but you're looking at potentially thousands or millions of scenarios. And those scenarios are being generated from inputs that are drawn from probabilistic models, from probability distributions. So it's a very, very common technique in many business situations.
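The whole simulation described above takes only a few lines of Python. This follows the lecture's setup exactly: c equal to 2, b drawn from a uniform distribution on (-2.9, -2.1), 100,000 replications, and an interval capturing the middle 80% of the draws:

```python
import random
import statistics

random.seed(42)
c = 2.0          # unit cost, as in the lecture example
n_draws = 100_000

# Draw the elasticity b from a Uniform(-2.9, -2.1) distribution and,
# for each draw, compute the optimal price p* = c * b / (1 + b).
optimal_prices = []
for _ in range(n_draws):
    b = random.uniform(-2.9, -2.1)
    optimal_prices.append(c * b / (1 + b))

# Summarize the output distribution: the middle 80% of the draws gives
# an interval for the optimal price that acknowledges uncertainty in b.
optimal_prices.sort()
lo = optimal_prices[int(0.10 * n_draws)]
hi = optimal_prices[int(0.90 * n_draws)]
print(f"Mean optimal price: ${statistics.mean(optimal_prices):.2f}")
print(f"80% interval: (${lo:.2f}, ${hi:.2f})")  # roughly (3.1, 3.7)
```

Note that the output histogram isn't flat even though the input is uniform, because the formula c·b/(1 + b) is non-linear in b, exactly as the lecture points out.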
3.6 Markov Chain Models
So the final sort of probabilistic model that I'm going to show you is called a Markov chain. Now, a Markov chain is a dynamic model. It's a probabilistic model and it's discrete. And what it does is model a discrete-time state-space transition. Now that's a bit of a mouthful, so I'm going to immediately give you an example, so you can understand what we mean by that. And the example that I'm thinking about is what a public policy person might do when they're trying to understand an individual's employment status. So obviously, unemployment and employment are key features of the economy. We like to understand them. We can understand them at a point in time by doing a survey and asking people whether they're employed or not employed. But we're also frequently interested in the dynamics of that process: how long do people stay unemployed, how likely are they to transition from employment to non-employment. So, in this particular example, I'm going to treat time not as a continuous variable, but as a discrete variable. And I'm going to consider time in six-month blocks, and I'm going to consider an individual's employment status as being in one of three possible categories. The first one is that you're employed, you've got a job. The second one is that you're unemployed and looking for a job. And the third one is that you're unemployed and you're not looking for a job. Now, it's quite possible to move from one of those states (that's what we mean by state; there are three states here) to another. So you could be employed and you get fired. That's going to take you to being unemployed, and maybe you're upset about being fired, and you're going to try and find a job, and so you're unemployed and you're looking. Or maybe you've said, that's enough, I'm unemployed and I'm not going to look. So you could go from state one to either state two or state three. Likewise, you could be unemployed and looking for a job and then get employed.
So you could, from one time period to the next, go from state two to state one. But if you're unemployed and not looking, well, you can't go from state three to state one, because if you're not looking for a job, you're not going to become employed. So you can see there are some transitions that you can make and others that you can't between these three states. Now, what a Markov chain does for you is model the probability of transitions between those three states. So if you have a look at the graphic on the right-hand side here, you can see the three possible states that an individual is in. So they're employed, they are unemployed and looking, that's state two, or unemployed and not looking, that's state three. And I've drawn arrows that show you the possible transitions. And so, if you are employed, you could certainly move to the unemployed and looking state, you could also move to the unemployed and not looking state. It's important to realize that you could stay employed in the next time period, which is why there's a darker blue arrow from the state back into itself. You can certainly stay in the same state.
Likewise, you could move between the looking and not looking states as well. But notice the absence of an arrow between not looking and employed, because if you're not looking, you're not going to be able to transition to the employed state. So that graphically represents the chain. Now on the left-hand side we have what is called the probability transition matrix. And what that does is provide the probability of moving from one state to another. And so let's have a look at the top row in that matrix. It corresponds to the current state being one. In other words, you're employed, and the probabilities tell you the chances of transitioning to one of the other states. And so the 0.8 is the probability that you transition to state one. In other words, you stay employed. So according to this model 80% of people retain their job over the next six months. And there's a 0.1 chance that you lose your job and move to the looking state. And likewise, a 0.1 chance that you lose your job and move to the not looking state. Notice that those probabilities across the row add up to one. Something has to happen. Likewise, I've got a set of transition probabilities from state two into states one, two, and three. And if you have a look at state three, notice there's a zero in the bottom left-hand side of the matrix, because if you are unemployed and not looking, there's a zero probability of becoming employed. If you're not looking for a job, you're not going to get a job, and so there can certainly be zeros in this matrix. So that's the idea of a probability transition matrix, and they can be very useful for modeling these probabilistic dynamic processes. Now, this model is called a Markov chain because it satisfies a certain condition called the Markov property, which more generally would be understood as a lack-of-memory, or memoryless, characterization.
And what that Markov property states is that the transition probabilities only depend on the current state, not on prior states. And the more elegant way of stating that is given the present, the future does not depend on the past. And that's definitely an assumption in this particular model, and the assumption may or may not be correct, and so one would want to check that assumption. But in this classic Markov chain that is an assumption, a simplifying assumption, that is made. So there's a fourth example of a probabilistic model. So we've talked about regression models, we've talked about tree models, we've talked about Monte Carlo approaches to solving problems, and we've seen a Markov model here at the end. And so these are all examples of probabilistic models, and have the potential to be applied to a wide number of business processes. So hopefully, you get the idea there's lots of opportunity to use these ideas in practice.
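As a sketch, here is the employment-status chain in Python. The first row of the transition matrix, (0.8, 0.1, 0.1), comes from the lecture; the other two rows are illustrative assumptions, since the slide's exact values aren't given. The code propagates a starting distribution forward through several six-month periods:

```python
# Transition matrix between states 1 (employed), 2 (unemployed, looking),
# and 3 (unemployed, not looking), over six-month periods. The first row
# comes from the lecture; the other two rows are assumed for illustration.
# Note the 0 in the bottom-left: you cannot move from "not looking"
# straight to "employed".
P = [
    [0.8, 0.1, 0.1],   # from employed
    [0.4, 0.4, 0.2],   # from unemployed, looking (assumed)
    [0.0, 0.3, 0.7],   # from unemployed, not looking (assumed)
]

def step(dist, P):
    """One period of the chain: new_dist[j] = sum_i dist[i] * P[i][j]."""
    return [sum(dist[i] * P[i][j] for i in range(3)) for j in range(3)]

# Start everyone employed and watch the distribution evolve over two years.
dist = [1.0, 0.0, 0.0]
for period in range(4):   # four six-month steps
    dist = step(dist, P)
print([round(p, 3) for p in dist])
```

Each row of `P` sums to one (something has to happen from every state), and the Markov property is baked into `step`: the next distribution depends only on the current one, not on the path that led there.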
3.7 Building Blocks of Probability Models
Now that we've seen some examples of probability models in practice, I want to talk about the building blocks of these probability models, and in particular we're going to discuss some specific random variables, both discrete and continuous. Those are terms that we've seen a lot when we've been discussing modeling, and so the random variables will come in two forms, discrete and continuous, and we'll talk about some very important probability distributions. And, if you recall, as I defined these objects, a random variable represents a potential outcome of an uncertain event, and the probability distribution assigns probabilities to the various potential outcomes. That's the basic idea.
So, let's start off by looking at a discrete random variable just to confirm our understanding of the terminology. So anticipate that you're going to roll a die, a single die. And we'll call it a fair die, which means each outcome is equally likely. Now, because I haven't rolled the die yet, I don't know what the outcome is going to be. So that's an example of a random variable, and there are various notations, but quite often we'd write the outcome of the random variable as capital X. Now, given it's a six-sided die, there are six possible outcomes, and you can see those being illustrated across the first row of the table. And beneath that you can see the probabilities that have been assigned to each of those possible outcomes, and we write, generally, those probabilities as the probability that capital X equals little x, and when you see that the first time around it looks a little bit odd. But what that's trying to say is that capital X is the random variable and little x is the realization of the random variable. So little x can take on the values one, two, three, four, five, or six. And this table displays the probability distribution for rolling a fair die where each outcome is equally likely, so each one is one-sixth. So this is what we mean by a probability model. Now, some facts about probabilities that it's useful to know. The first one is that probabilities have to lie between zero and one inclusive. If anybody ever presents you with a probability greater than one, or less than zero, something has gone horribly wrong. And the other fact about these discrete probabilities is that they have to add up to one. Something has to happen. And so here's an example of a probability distribution. Now that I've shown you a discrete random variable, I want to follow up with a continuous random variable. And as an example of a continuous random variable, I'm going to consider the percent change on the S&P 500 stock index.
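As a minimal sketch, the fair-die distribution and the two facts about probabilities can be written out in Python:

```python
from fractions import Fraction

# Probability distribution P(X = x) for a fair six-sided die:
# each of the six outcomes x = 1, ..., 6 gets probability 1/6.
die_distribution = {x: Fraction(1, 6) for x in range(1, 7)}

# Two facts every discrete probability distribution must satisfy:
# 1. every probability lies between 0 and 1 inclusive;
# 2. the probabilities add up to one -- something has to happen.
assert all(0 <= p <= 1 for p in die_distribution.values())
assert sum(die_distribution.values()) == 1
print(die_distribution)
```

Using exact fractions rather than floats makes the "adds up to one" check exact; with floating point you'd compare against 1 up to a small tolerance.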
So imagine I asked you, what do you think the S&P 500 is going to close at tomorrow? Well, you don't know the answer to that question, not exactly, and so one might be willing instead to use a probability model to get some assessment of the likelihood of various closing prices. And in fact, here I'm not going to look at the price directly, I'm going to look at the percent change, sometimes called the return. And the way you would calculate a daily return for a stock or a stock index is to say: what's the price today minus the price yesterday, over the price yesterday? That's often termed a relative return, and if we multiply that through by 100, we're going to get it on a percentage basis, and that would give us our percent return. Now if I'm talking about the percent return tomorrow, I need to look at tomorrow's price minus today's price over today's price, and that's what's in the formula there, Pt+1 - Pt over Pt. So that's my object of interest, the percent change, and technically that quantity can take any value between minus 100%, that would be a bit of a disaster where everything was lost, and infinity. I mean, that's a little bit technical, but potentially you could get any value in between. Clearly, some values are more likely than others; typically, the returns on the market vary between plus or minus 1% each day, something of that order. Now, when we want to calculate probabilities of continuous random variables, it's a little bit different. We look at what's called the probability density function. I'm going to show you one of these on the next slide. So here's a potential probability distribution of the S&P 500 daily percent changes. And what that will give you is a probability model for the daily percent change. So, notice here that we have a complete curve. And each of the values on the x-axis, the percent change axis, is a potential outcome.
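The daily percent return formula can be sketched in a couple of lines of Python; the index prices below are made up for illustration:

```python
# Daily percent return from a price series:
# return_t = 100 * (P_{t+1} - P_t) / P_t.
# The prices below are made-up closing values of a stock index.
prices = [4000.0, 4012.5, 3985.0, 4001.0]

returns = [100 * (prices[t + 1] - prices[t]) / prices[t]
           for t in range(len(prices) - 1)]
print([round(r, 3) for r in returns])
```

Notice the returns come out well under 1% in magnitude, consistent with the typical day-to-day variation the lecture mentions.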
I haven't drawn this out to plus or minus infinity, because that doesn't really make sense; those are very unlikely outcomes, so I've captured the majority of potential outcomes here. And the way that you would calculate probabilities from such a graph is by looking at the area underneath the graph. So for these continuous random variables, the probabilities are associated with areas under the graph. And if I wanted, for example, to ask the question, what's the probability that the S&P 500 falls by more than half a percent, that means a percent change of less than minus 0.5. Then the way I would do that is take this graph, identify the value minus 0.5 on the x-axis, and since we said falls by more than half a percent, that means the area to the left of minus 0.5, and so that area under the graph would give you the probability. So in summary, the probabilities associated with continuous random variables come from calculating areas. Now, in practice, you don't have to calculate these areas by hand; you're going to use software to calculate the area for you. And so in Excel and Sheets, there are going to be built-in functions that will calculate these probabilities, these areas under the curve. But the important thing to realize is that, given the model, the probability model really being the shape of this distribution here, we're going to be able to calculate various probabilities.
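As a sketch of what that software call looks like, Python's standard library can compute the area under a normal curve. The normal model here, with mean 0 and standard deviation 0.8 percent, is an illustrative assumption, not a distribution fitted to real S&P 500 data:

```python
from statistics import NormalDist

# An assumed probability model for the daily percent change: normal with
# mean 0 and standard deviation 0.8 (illustrative numbers, not fitted).
model = NormalDist(mu=0.0, sigma=0.8)

# P(market falls by more than 0.5%) is the area under the curve to the
# left of -0.5, which the cumulative distribution function computes.
p_fall = model.cdf(-0.5)
print(f"P(change < -0.5%) = {p_fall:.3f}")
```

Excel's `NORM.DIST` function computes the same area; under the hood, every such tool is just integrating the density to the left of the cutoff.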
So those are our two sorts of random variables, the discrete and the continuous random variable. Given that we've got a probability distribution, then one of the things we like to be able to do is summarize it in some fashion rather than just showing people the shape.
Or the numbers, if it's a discrete probability distribution. So what we might like to do is summarize it in some fashion, and there are some summaries that are frequently used. And one of the most basic summaries is to say, where is the center of that distribution? And the measure that we're going to talk about that captures the idea of centrality is known as the mean. And the mean, which I'm writing with the Greek letter mu here, is a measure of centrality. To get a sense of what the mean is doing for you, imagine taking the picture, which is actually a normal probability model, cutting out the shape of the graph, stamping it out in a piece of metal, then picking it up and trying to balance it on your finger. If you find the balancing point of that distribution you have found the mean; that's what it does. So the mean is a measure of centrality. In this particular example here the mean is sitting there at zero, right in the middle of the distribution, and that's because it's what's called a symmetric distribution. If you put a mirror in the middle of the distribution, you see the same thing whether you look at the distribution or at what's happening in the mirror. So that's the first measure, the first summary: centrality, mu.
And after looking at the middle of the distribution, the other feature that we like to capture is the spread of the distribution. Now there are two ways that we measure the spread. One is through what is called the variance, and we write that with the Greek letters sigma squared. And then there's the variance's very close cousin, called the standard deviation, which is just the square root of the variance, which we write as sigma. And what these measures of spread are telling you is how spread out the distribution is: is it all clumped up in the middle, or is there a lot of spread associated with it? Now these two summaries, centrality and spread, are not the only summaries of a distribution, but they're certainly the fundamental ones. And we will often use these two summaries to provide a characterization of a random variable. So for example, after having run a Monte Carlo simulation, you will have generated one of these distributions as the output from that simulation. It's useful to present someone with the whole graph, but it's also useful to present people with numerical summaries. And the two most likely numerical summaries you'd want to be able to present would be the mean mu and the standard deviation sigma. Again, the way that those will be calculated is through software.
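As a sketch, here is how you'd compute those two summaries in Python for a simulated output distribution; the draws themselves are illustrative, generated from a normal with mean 100 and standard deviation 15:

```python
import random
import statistics

# Summarizing a simulated output distribution with its two most common
# numerical summaries: the mean (centrality) and the standard deviation
# (spread). The draws here are illustrative, standing in for the output
# of a Monte Carlo simulation.
random.seed(0)
draws = [random.gauss(100.0, 15.0) for _ in range(50_000)]

mu = statistics.mean(draws)       # the balancing point of the distribution
sigma = statistics.stdev(draws)   # the square root of the variance
print(f"mean ~ {mu:.1f}, standard deviation ~ {sigma:.1f}")
```

With 50,000 draws, the sample mean and standard deviation land very close to the true values of 100 and 15, which is exactly why these two numbers make compact, reliable summaries of a simulated distribution.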