You probably studied probability in secondary school, and many of you probably found it boring, since the subject gets tricky once the problems go beyond coin flips and dice rolls. In practice, there are many situations where working out a probability analytically is almost impossible because of variability, uncertainty, and ambiguity. So how do we find a way out of those situations? One answer is to run Monte Carlo Simulations on your model. Monte Carlo Simulation, sometimes called a Monte Carlo Run, has a wide range of applications in data science, such as stock market prediction and risk analysis. In essence, it estimates how likely the possible outcomes of an event are by sampling that event at random many times.
The Meaning of Probability
Before looking at what Monte Carlo Simulation is and how it helps in finding probabilities, we first need to understand what the probability of an event really tells us. Let's take the simple example of rolling a die. When a die is rolled, what is the probability that we get the number 4? The answer is simple: it is 1/6 ≈ 0.167, or 16.67%. But what does that number really tell us? How should one interpret a 16.67% chance of getting a 4? Does it mean that if I roll a die 6 times, I will get a 4 at least once? The answers are no, yes, and sometimes. In fact, the chance of seeing at least one 4 in six rolls is 1 - (5/6)^6, which is only about 66.5%. The observed frequency of an event is only guaranteed to match the theoretical value when the experiment is performed infinitely many times. But there is no such thing as performing an experiment infinitely many times. No matter how many times you repeat your experiment, the number of iterations will always be closer to zero than to infinity. By definition, the probability is expressed as the ratio of “Number of Desired Outcomes over Total Number of Possible Outcomes.”
The Law of Large Numbers
In probability theory, the Law of Large Numbers is a theorem stating that as the number of trials increases, the observed frequency of an event approaches its theoretical probability. For the die example above, this means that when the die is rolled a large number of times, the ratio of the number of 4s to the total number of rolls will tend to 16.67%. Deep down, the Law of Large Numbers conveys that repeating the same experiment a large number of times averages out the random fluctuations in the results.
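To see this tendency numerically, here is a minimal sketch in Python. The function name `observed_fraction`, the seed, and the sample sizes are illustrative choices of mine, not part of any particular library or of the original experiment.

```python
import random

def observed_fraction(n_rolls, target=4, seed=0):
    """Roll a fair six-sided die n_rolls times and return the
    fraction of rolls that show the target face."""
    rng = random.Random(seed)  # seeded only so the run is repeatable
    hits = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == target)
    return hits / n_rolls

# The fraction should drift toward 1/6 (about 0.1667) as n grows.
for n in (10, 100, 10_000, 100_000):
    print(n, observed_fraction(n))
```

With only 10 rolls the fraction can be far from 1/6; with 100,000 rolls it is typically within a fraction of a percentage point of it.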
Monte Carlo Simulation
Monte Carlo Simulation has a wide range of applications as stated earlier. Also, there are many models based on this method. Here I’ll be explaining it with some basic examples.
A Monte Carlo Simulation can be broadly divided into 3 steps.
1- Generating random input that mimics the actual event
2- Passing the generated data through the governing model
3- Evaluating the statistics
Let's take the same example of rolling a die and find the probability using Monte Carlo Simulation with the steps mentioned above.
Generating Random Input
Let's first assume we are performing the Monte Carlo Simulation in the physical world. To generate random input for the governing model, I simply have to roll the die n times and record the outcome of each roll. When simulating on a computer, in Python, R, or MATLAB for example, we can generate the random data with a PRNG (pseudo-random number generator) instead. The data generated by a PRNG cannot be truly random, because it is produced by a deterministic software algorithm. In return, the distribution of the data is controllable by the user. Monte Carlo performed on data from a True Random Number Generator (TRNG) may give better results than with a PRNG. But because significantly large samples are used in the simulation, and individual Monte Carlo Runs can be repeated any number of times on a computer, it works well with a PRNG. In our example, each simulated roll is a single randomly generated integer from 1 to 6. Sometimes the PRNG output needs to be converted to the range of inputs that the governing model expects.
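As an illustration of this step, the sketch below uses Python's standard `random` module. The helper names `roll_die` and `roll_die_from_uniform` and the seed value are my own assumptions for the example. The second helper shows the kind of range conversion mentioned above: it maps a uniform float in [0, 1) onto the faces 1 to 6.

```python
import random

rng = random.Random(42)  # arbitrary seed, used only for repeatability

def roll_die(rng):
    """Draw a die face directly as an integer from 1 to 6."""
    return rng.randint(1, 6)

def roll_die_from_uniform(rng):
    """Convert a uniform float in [0, 1) to a die face from 1 to 6."""
    u = rng.random()
    return int(u * 6) + 1

direct = [roll_die(rng) for _ in range(10)]
scaled = [roll_die_from_uniform(rng) for _ in range(10)]
print(direct)
print(scaled)
```

Both helpers produce the same distribution; the second is only needed when the generator at hand does not offer integers in the required range.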
The governing model
The governing model is a black box that takes the randomly generated data as input and checks it against the modeled conditions to produce an output. In our case, the input to this block is the result of one die roll, which is an integer from 1 to 6. Because we want to know the probability of getting a 4, the governing model maintains two separate counters: one for the total number of events and one for the number of desired outcomes.
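The two-counter idea could be written roughly as follows. The class name `DiceModel` and its attribute names are hypothetical, chosen only for this sketch.

```python
class DiceModel:
    """Governing model for the die example: counts all events and
    the events that match the desired outcome."""

    def __init__(self, target=4):
        self.target = target
        self.total_events = 0       # every roll seen so far
        self.desired_outcomes = 0   # rolls equal to the target face

    def update(self, roll):
        """Feed one simulated roll into the model."""
        self.total_events += 1
        if roll == self.target:
            self.desired_outcomes += 1

model = DiceModel(target=4)
for roll in [4, 2, 4, 6]:  # hand-picked rolls, just to show the counters
    model.update(roll)
print(model.total_events, model.desired_outcomes)  # prints: 4 2
```

Keeping the model separate from the random input makes it easy to swap in a different condition later, such as a sum over several dice.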
Evaluate the statistics
In the last stage of the Monte Carlo Run, the probability is recalculated after each iteration from the counts accumulated so far. The estimate approaches the theoretical value as the number of iterations increases.
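Putting the three steps together, a minimal sketch of the running estimate might look like this. The function name `running_probability`, the seed, and the iteration count are assumptions made for the example.

```python
import random

def running_probability(n_iterations, target=4, seed=1):
    """Return the estimated probability of rolling the target face
    after each iteration of the simulation."""
    rng = random.Random(seed)
    desired = 0
    estimates = []
    for total in range(1, n_iterations + 1):
        if rng.randint(1, 6) == target:      # step 1 and step 2
            desired += 1
        estimates.append(desired / total)    # step 3
    return estimates

estimates = running_probability(100_000)
theoretical = 1 / 6
for i in (10, 100, 1_000, 10_000, 100_000):
    est = estimates[i - 1]
    print(f"{i:>7} iterations: estimate={est:.4f}, "
          f"error={abs(est - theoretical):.4f}")
```

The list of estimates and the corresponding errors are the kind of data one would plot against the iteration number to visualise the convergence.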
Graphical Result of Monte Carlo Simulation
As shown in Figure 1, the estimated probability of getting a 4 in the die rolling experiment approaches the theoretical probability as the number of iterations increases. Figure 2 shows the error between the theoretical and the observed probability at each iteration. As the estimate approaches the theoretical value, the error approaches zero.
We could cross-verify the results because the example was simple enough that we already knew the theoretical probability. So let’s simulate a slightly more complex example to see how Monte Carlo is useful.
Let’s assume we have 10 dice and we roll all of them simultaneously. What is the probability that the sum of all the dice equals 30? Because many different combinations add up to 30, the problem is quite difficult to solve on paper. So let's see what Monte Carlo Simulation tells us.
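One way such a simulation could be set up is sketched below. The function name `estimate_sum_probability`, the seed, and the default of 200,000 iterations are my assumptions, chosen to keep the run short in plain Python.

```python
import random

def estimate_sum_probability(n_dice=10, target_sum=30,
                             n_iterations=200_000, seed=7):
    """Estimate the probability that n_dice fair dice sum to target_sum."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iterations):
        # One iteration = one simultaneous roll of all the dice.
        total = sum(rng.randint(1, 6) for _ in range(n_dice))
        if total == target_sum:
            hits += 1
    return hits / n_iterations

print(estimate_sum_probability())
```

The structure is the same as in the single-die case: only the random input (ten faces instead of one) and the condition checked by the governing model have changed.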
After 1 million iterations the estimated probability is 0.048414, which is 4.8414%. We have not worked out the theoretical probability by hand for comparison, but the value given by one online calculator is 4.846%, which is close to the value estimated by Monte Carlo Simulation.
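For a problem of this size, the reference value can also be checked by exact counting on a computer rather than on paper. The sketch below is not part of the simulation itself; it is a cross-check that builds up, die by die, the number of ordered outcomes for every possible sum. The function name `exact_sum_probability` is hypothetical.

```python
from fractions import Fraction

def exact_sum_probability(n_dice=10, target_sum=30, sides=6):
    """Exact probability that n_dice fair dice sum to target_sum."""
    # ways[s] = number of ordered outcomes of the dice so far summing to s
    ways = {0: 1}
    for _ in range(n_dice):
        new_ways = {}
        for s, count in ways.items():
            for face in range(1, sides + 1):
                new_ways[s + face] = new_ways.get(s + face, 0) + count
        ways = new_ways
    return Fraction(ways.get(target_sum, 0), sides ** n_dice)

p = exact_sum_probability()
print(p, float(p))
```

This gives 2930455 favourable outcomes out of 6^10 = 60466176, or about 0.04846, which agrees with the 4.846% quoted above.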
In essence, Monte Carlo Simulation rests on the law of large numbers and on randomness that closely resembles the real world. It is intuitive that over a large number of physical experiments the observed probability matches the theoretical probability. In the same way, if the experiment is simulated by generating random samples, then after many iterations the observed probability should also be close to the theoretical one.