# Random Data and Sampling 

In this final section, we will talk about random data and sampling. A lot of fun can be had with random data, especially when creating Monte Carlo simulations and games. 

We will learn some basic random operations and then talk about probability distributions and simulations. 

## Why Use Random Data? 

Using randomized data comes up a lot in data science and machine learning. In stochastic gradient descent, we train machine learning models iteratively and randomly sample one or more datapoints from a dataset in a loop. This is done because traversing an entire dataset in a loop is computationally expensive. We also see randomness used in other optimization algorithms, like hill climbing and simulated annealing. 
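As a rough illustration of the random sampling step used in stochastic gradient descent, the sketch below draws a small random batch of rows from a toy dataset. The dataset, the `batch_size` value, and the use of NumPy's `default_rng()` generator are all assumptions made for this example, not part of any particular training routine.

```python
import numpy as np

# Hypothetical toy dataset: 1,000 rows with 3 features each
rng = np.random.default_rng(seed=0)
data = rng.normal(size=(1000, 3))

batch_size = 32  # assumed value, chosen only for illustration

# Pick 32 distinct row indices at random, then pull those rows
batch_idx = rng.choice(len(data), size=batch_size, replace=False)
batch = data[batch_idx]

print(batch.shape)  # (32, 3)
```

A real training loop would repeat this sampling on every iteration and update the model using only the sampled rows.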

We can also use random data for Monte Carlo simulations, meaning we use random data to model real-life events.

It is important to note that **random** does not simply mean getting a different number every time. Random means something that cannot be predicted logically. Given that computers are logical machines, it is hard to see how they can produce random numbers. In practice, they produce **pseudo-random numbers**: sequences computed by a deterministic algorithm from a starting value called a seed, which can in turn be drawn from hard-to-predict events like mouse movements, keystrokes, network data, system temperature, etc. Of course, when we use random numbers for cryptography and security there are concerns about whether the numbers are "random enough," but for our data science purposes, these will be just fine. 
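One way to see that these numbers are pseudo-random is to fix the starting value (the seed) of the generator. The sketch below uses NumPy's `default_rng()` with an arbitrary seed of 42, which is an assumption for this example. Two generators created with the same seed produce the same sequence.

```python
import numpy as np

# Two generators seeded with the same (arbitrary) value
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

first = rng_a.random(3)
second = rng_b.random(3)

# Same seed, same sequence: the numbers look random but are reproducible
print(np.array_equal(first, second))  # True
```

Fixing a seed like this is handy when you want a simulation to give the same result on every run.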

## Generating Random Numbers

To generate a random value between 0 and 1, we can use the `rand()` function in NumPy's `random` module. 

In [1]:
from numpy import random

x = random.rand()

print(x)

0.43327516096110086


You can also generate random values as arrays. 

In [5]:
random.rand(10)

array([0.86471789, 0.09232875, 0.61160001, 0.2255342 , 0.09191982,
       0.23998885, 0.96311939, 0.63955302, 0.55370427, 0.10073828])

In [6]:
random.rand(3,3)

array([[0.53058354, 0.39299023, 0.49832956],
       [0.18252139, 0.19234588, 0.6260224 ],
       [0.92645105, 0.21918684, 0.54714236]])

To use a different range between `a` and `b`, you can rescale the value from the 0-to-1 range to that range. 

In [15]:
a,b = 5,10 
a + random.rand()*(b-a)

7.868489298104468
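The same rescaling works on whole arrays at once. The short sketch below applies it to 1,000 draws and prints the smallest and largest results, which should fall between `a` and `b`. The sample count is arbitrary.

```python
from numpy import random

a, b = 5, 10

# Rescale 1,000 draws from the 0-to-1 range to the a-to-b range in one step
samples = a + random.rand(1000) * (b - a)

# Inspect the observed range of the rescaled values
print(samples.min(), samples.max())
```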

You can also just use the `uniform()` function and specify `low` and `high` as well as any dimensional arguments through the `size` parameter. 

In [20]:
random.uniform(low=5, high=10, size=(3, 3))

array([[9.71263241, 9.04219187, 8.60949481],
       [6.42117743, 5.02409632, 5.49641781],
       [7.89836869, 8.17329716, 9.61899516]])

To generate random integers, use `randint()` in a similar manner. Note that `high` is exclusive, so the call below returns integers from 5 through 9. 

In [24]:
random.randint(low=5, high=10, size=(3,3), dtype=int)

array([[7, 9, 9],
       [6, 9, 8],
       [9, 7, 6]])
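To check the bounds of `randint()` empirically, the sketch below draws a large number of integers with the same `low` and `high` and confirms that `high` itself never appears. The sample size of 10,000 is an arbitrary choice.

```python
from numpy import random

# 10,000 draws with low=5 and high=10
draws = random.randint(low=5, high=10, size=10_000)

# Smallest and largest values observed
print(draws.min(), draws.max())

# The upper bound is exclusive, so 10 is never returned
print((draws == 10).any())  # False
```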

## Generating Data from Probability Distributions

We saw earlier that we could generate data from a uniform distribution using `uniform()`, where any number in a range is equally likely. We can also use other probability distributions. These include, but are not limited to: 

* Normal
* Binomial
* Exponential
* Poisson
* Chi-square
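As a minimal sketch, each of the distributions listed above has a matching function in NumPy's `random` module. The parameter values below (mean, number of trials, rate, degrees of freedom, and so on) are arbitrary choices for illustration.

```python
from numpy import random

size = 5  # number of samples per distribution (arbitrary)

samples = {
    "normal": random.normal(loc=0.0, scale=1.0, size=size),
    "binomial": random.binomial(n=10, p=0.5, size=size),
    "exponential": random.exponential(scale=1.0, size=size),
    "poisson": random.poisson(lam=3.0, size=size),
    "chi-square": random.chisquare(df=2, size=size),
}

# Print the draws from each distribution
for name, values in samples.items():
    print(f"{name}: {values}")
```

Like `uniform()`, each function accepts a `size` argument, so a tuple such as `(3, 3)` would return a 3x3 array instead.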



## Monty Hall Problem with Monte Carlo

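A **Monte Carlo** simulation is a type of model that uses randomized data to understand something in the real world. Let's put a new angle on the classic Monty Hall Problem, where a game show host presents a contestant with three doors. One of the doors has a prize, the other two have goats. After the contestant chooses a door, the host opens one of the remaining doors to reveal a goat and offers the contestant the chance to switch to the other unopened door. We will simulate many games to see whether staying or switching wins more often. 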



In [95]:
import numpy as np 

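n = 100  # number of simulated games

# For each game, randomly place the prize and pick the contestant's door (0, 1, or 2)
prize_doors = np.random.choice(3, n)
chosen_doors = np.random.choice(3, n)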

In [96]:
prize_doors

array([2, 0, 0, 0, 2, 1, 0, 2, 2, 2, 2, 1, 0, 0, 0, 2, 1, 1, 1, 0, 0, 2,
       2, 1, 2, 2, 2, 2, 1, 0, 2, 2, 0, 1, 0, 2, 2, 0, 1, 0, 0, 2, 0, 1,
       2, 0, 1, 1, 2, 2, 1, 1, 1, 0, 1, 1, 0, 0, 1, 2, 1, 2, 2, 0, 2, 1,
       2, 2, 1, 1, 2, 1, 2, 1, 1, 1, 2, 0, 0, 2, 0, 0, 2, 2, 0, 1, 0, 0,
       2, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0])

In [97]:
chosen_doors

array([0, 1, 0, 2, 1, 1, 2, 1, 2, 1, 2, 2, 2, 2, 0, 1, 2, 0, 0, 0, 1, 0,
       2, 2, 2, 1, 0, 2, 1, 2, 0, 0, 2, 1, 1, 0, 2, 1, 2, 2, 2, 1, 2, 1,
       1, 1, 1, 2, 0, 0, 2, 0, 0, 0, 2, 1, 2, 1, 1, 2, 2, 2, 0, 2, 0, 2,
       1, 0, 1, 1, 2, 0, 2, 1, 1, 1, 0, 2, 0, 2, 1, 0, 1, 0, 0, 1, 1, 0,
       2, 0, 2, 0, 2, 1, 0, 1, 0, 1, 2, 1])

In [98]:
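# In each game, the host opens a door that is neither the prize door
# nor the contestant's chosen door (picked at random when two qualify)
opened_doors = np.zeros(n, dtype=int) 

for i in range(n): 
    doors = np.arange(0,3)
    opened_doors[i] = np.random.choice(
        doors[(doors != prize_doors[i]) & (doors != chosen_doors[i])]
    )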

opened_doors

array([1, 2, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 2, 2, 2, 2, 1,
       1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 2, 1, 0, 2, 0, 1, 1, 0, 1, 2,
       0, 2, 0, 0, 1, 1, 0, 2, 2, 1, 0, 2, 1, 2, 0, 0, 0, 1, 1, 1, 1, 0,
       0, 1, 0, 0, 1, 2, 1, 2, 0, 0, 1, 1, 1, 1, 2, 1, 0, 1, 1, 0, 2, 1,
       0, 2, 0, 2, 1, 2, 2, 2, 2, 2, 1, 2])

In [99]:
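# Switching means taking the one remaining door that is neither
# the original choice nor the door the host opened
switch_doors = np.zeros(n, dtype=int) 

for i in range(n): 
    doors = np.arange(0,3)
    switch_doors[i] = np.random.choice(
        doors[(doors != chosen_doors[i]) & (doors != opened_doors[i])]
    )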

switch_doors

array([2, 0, 2, 0, 2, 2, 0, 2, 1, 2, 1, 1, 0, 0, 2, 2, 1, 1, 1, 1, 0, 2,
       0, 1, 1, 2, 2, 0, 2, 0, 2, 2, 0, 2, 0, 2, 1, 0, 1, 0, 0, 2, 0, 0,
       2, 0, 2, 1, 2, 2, 1, 1, 1, 2, 1, 0, 0, 0, 2, 1, 1, 0, 2, 0, 2, 1,
       2, 2, 2, 2, 0, 1, 0, 0, 2, 2, 2, 0, 2, 0, 0, 2, 2, 2, 2, 2, 0, 2,
       1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0])

In [100]:
stay_wins = np.sum(chosen_doors == prize_doors) 
switch_wins = np.sum(switch_doors == prize_doors)

print(f"STAY WINS: {stay_wins}")
print(f"SWITCH WINS: {switch_wins}")

STAY WINS: 34
SWITCH WINS: 66
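With only 100 games the counts will vary from run to run. The sketch below is a shortcut version of the same experiment with many more games, so the win rates settle close to 1/3 for staying and 2/3 for switching. It does not replay the host's choice explicitly. Instead it relies on the observation that, because the host always reveals a goat, switching wins exactly when the first choice was wrong. The seed, the number of games, and the use of `default_rng()` are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed for reproducibility
n = 100_000  # number of simulated games

# Randomly place the prize and the contestant's first choice for each game
prize_doors = rng.integers(0, 3, size=n)
chosen_doors = rng.integers(0, 3, size=n)

# Staying wins only when the first choice was the prize door.
# The host always reveals a goat, so switching wins in every other game.
stay_wins = np.sum(chosen_doors == prize_doors)
switch_wins = n - stay_wins

stay_rate = stay_wins / n
switch_rate = switch_wins / n

print(f"STAY WIN RATE: {stay_rate:.3f}")
print(f"SWITCH WIN RATE: {switch_rate:.3f}")
```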
