# The numpy.random package

## Purpose of the Package

NumPy is a library for the Python programming language. It adds support for large, multi-dimensional arrays and matrices, along with high-level mathematical functions that operate on these arrays(1). NumPy was created in 2005 by Travis Oliphant and is open-source software.

An important part of the NumPy library is the numpy.random package, NumPy's built-in pseudorandom number generator(2). Generating random numbers is an important part of machine learning: randomness can help learning algorithms be more robust and ultimately produce better predictions and more accurate models(3). Machine learning algorithms use randomness when learning from a sample of data. Randomness is a feature that helps an algorithm avoid overfitting a small training set and generalize to the broader problem.

Machine learning algorithms also use randomness when evaluating a model, because only a sample of data is available rather than every observation in the domain. One example is k-fold cross-validation, which fits and evaluates the model on different subsets of the available data. We do this to see how the model performs on average rather than on one specific set of data(3).
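
The k-fold idea described above can be sketched with NumPy alone. This is a minimal illustration rather than a reference implementation: the helper name `k_fold_indices` and its parameters are my own, and the seed is fixed only so that the split is repeatable.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs (hypothetical helper, for illustration)."""
    rng = np.random.RandomState(seed)     # seeded so the split is repeatable
    indices = rng.permutation(n_samples)  # row indices in random order
    folds = np.array_split(indices, k)    # k nearly equal-sized folds
    for i in range(k):
        test_idx = folds[i]
        # every fold except fold i is used for training
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

splits = list(k_fold_indices(n_samples=10, k=5))
for train_idx, test_idx in splits:
    print("train:", np.sort(train_idx), "test:", np.sort(test_idx))
```

Each observation lands in the test set exactly once, so averaging a model's score over the five splits shows how it performs on average rather than on one particular subset.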

NumPy, and specifically the numpy.random package, is an important tool for both developers and data analysts. It lets a developer generate random numbers and sequences that mimic a real data set or scenario that their program or algorithm will have to deal with. This has helped make machine learning algorithms more robust and effective.


## Simple Random Data and Permutations
For data-intensive computing, NumPy provides a wide range of functions that make data manipulation in Python quick and easy. The random package can create several different types of random numbers. As stated previously, being able to generate large arrays of random numbers gives developers test data for debugging new code. Below I show a few of the ways numpy.random can generate these numbers. The random module also contains permutation functions that randomly shuffle the order of items in a sequence, which is useful when we want to put a list in random order.

In [3]:
# import numpy
import numpy as np
# random.randint returns a single random integer from 5 up to, but not including, 10
np.random.randint(5, 10)  

8

In [4]:
import numpy as np
# generate an array of 5 random floats in the half-open interval [0.0, 1.0)
np.random.rand(5)

array([0.02776209, 0.75146397, 0.91778701, 0.59733079, 0.93826126])

In [10]:
# create an array of the integers 0 to 9
arr = np.arange(10)
# shuffle the array in place into a random order
np.random.shuffle(arr)
# display the shuffled array
arr



array([4, 0, 8, 5, 1, 6, 9, 7, 3, 2])
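
As a complement to the cell above, here is a short sketch of the difference between `np.random.shuffle`, which reorders its argument in place, and `np.random.permutation`, which returns a shuffled copy. The variable names are illustrative, and no output is shown because it changes from run to run.

```python
import numpy as np

original = np.arange(10)

# permutation returns a shuffled copy; `original` is left untouched
shuffled_copy = np.random.permutation(original)

# shuffle reorders its argument in place and returns None
in_place = original.copy()
result = np.random.shuffle(in_place)

print(original)       # still in ascending order
print(shuffled_copy)  # same values, random order
print(in_place, result)
```

Use `permutation` when the original order is still needed afterwards, and `shuffle` when it is not.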

## Generating Pseudorandom Numbers
NumPy's random package generates pseudorandom numbers. The numbers are never truly random: a deterministic algorithm produces them from a starting value called the seed. The functions used in this notebook, such as np.random.rand, rely on an algorithm called the Mersenne Twister(4), one of the most extensively tested random number generators in existence. If no seed is supplied, NumPy seeds the generator itself, drawing on entropy from the operating system where available and otherwise falling back on the system clock. The seed is the starting point for the sequence, and the guarantee is that the same seed always produces the same sequence of numbers(5).
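
One way to check which algorithm sits behind the `np.random` functions used in this notebook is to inspect the generator's state. This is a minimal sketch, assuming the legacy global generator (the one behind `np.random.rand` and `np.random.seed`); the variable names are illustrative.

```python
import numpy as np

# get_state() describes the generator behind the legacy np.random.* functions
state = np.random.get_state()
algorithm_name = state[0]  # name of the underlying algorithm
key = state[1]             # the generator's internal state words

print(algorithm_name)      # MT19937
print(key.shape)           # (624,)
```

MT19937 is the standard name for the Mersenne Twister variant, whose internal state is 624 32-bit words.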



In [29]:
# seed() and random() come from Python's standard-library random module, not numpy.random
from random import seed
from random import random
# seed the random number generator
seed(1)
# generate some random numbers
print(random(), random(), random())
# reset the seed to the same value
seed(1)
# the same three numbers are generated again
print(random(), random(), random())

0.13436424411240122 0.8474337369372327 0.763774618976614
0.13436424411240122 0.8474337369372327 0.763774618976614


The code above imports seed and random from Python's standard-library random module, which also uses the Mersenne Twister, and prints three values(6). When the seed is reset to the same number, the same sequence of numbers appears again. This is useful for debugging, as we can recreate the same sequence of events on every run. Setting the seed is also a way to control the randomness so that your code produces the same result each time, such as in a production model.
For running experiments where randomization is used to control for confounding variables, a different seed may be used for each experimental run(6).
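
The same behaviour can be reproduced with numpy.random itself. The sketch below shows the global `np.random.seed` call and, assuming NumPy 1.17 or later, the `np.random.default_rng` interface in which each generator object carries its own seed. The seed values and variable names are arbitrary choices for illustration.

```python
import numpy as np

# legacy interface: seed the global generator, then draw
np.random.seed(1)
first = np.random.rand(3)
np.random.seed(1)
second = np.random.rand(3)
print(np.array_equal(first, second))  # True

# Generator interface (NumPy 1.17+): each generator carries its own seed
rng_a = np.random.default_rng(1)
rng_b = np.random.default_rng(1)
same_seed_equal = np.array_equal(rng_a.random(3), rng_b.random(3))
print(same_seed_equal)                # True

# a different seed per experimental run gives different draws
run_draws = [np.random.default_rng(run_seed).random(3) for run_seed in (1, 2)]
print(np.array_equal(run_draws[0], run_draws[1]))  # False
```

Passing a generator object around, instead of seeding the global state, keeps one part of a program from silently changing the random sequence that another part sees.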

## References 
1. https://en.wikipedia.org/wiki/NumPy#History 
2. https://engineering.ucsb.edu/~shell/che210d/numpy.pdf
3. https://machinelearningmastery.com/introduction-to-random-number-generators-for-machine-learning/
4. https://en.wikipedia.org/wiki/Mersenne_Twister 
5. https://stackoverflow.com/questions/14914595/what-is-a-seed-in-terms-of-generating-a-random-number
6. https://machinelearningmastery.com/how-to-generate-random-numbers-in-python/ 