# Random Sampling

(*The random number generation approach discussed in the previous class is a legacy approach. The approach discussed in this notebook is newer and recommended, and it improves the quality of the generated random numbers.*)

Random sampling is an important topic in statistical investigations.

The `random` submodule of NumPy provides functions for generating random numbers, including random numbers from various probability distributions.

It is important to understand that the random numbers generated by any computer software are essentially ***pseudo random numbers***. A sequence of pseudo random numbers is a deterministic sequence (generated by an algorithm) that possesses *almost all* the properties of a sequence of truly ***random numbers***, as verifiable by statistical tests.

## Generator 

Objects of the `Generator` class of the `random` submodule provide methods for generating random numbers. When creating a `Generator` object, we can specify the ***seed*** for the random number generator. Two generators created with the same seed produce the same sequence of numbers.

### Default random number generator

The `random` submodule provides the function `default_rng` for creating `Generator` objects. Calling `default_rng` is the most common way of creating a `Generator` object.

In [1]:
import numpy as np
from numpy.random import default_rng
rng = default_rng()     # rng is a Generator object.
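
The role of the seed can be illustrated with a minimal sketch. The seed value `42` and the names `rng_a`, `rng_b`, `draws_a` and `draws_b` are arbitrary choices for illustration only. Two generators built from the same seed should return identical draws.

```python
from numpy.random import default_rng

# Two generators created with the same (arbitrary, illustrative) seed
rng_a = default_rng(42)
rng_b = default_rng(42)

# The same method call on each generator yields the same values
draws_a = rng_a.integers(1, 10, 5)
draws_b = rng_b.integers(1, 10, 5)

print((draws_a == draws_b).all())   # True: the sequences are identical
```

Calling `default_rng()` without a seed, as in the cell above, draws fresh entropy from the operating system, so the results generally differ from run to run.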

As stated earlier, a `Generator` object provides various methods for generating random numbers.

#### `integers` method

The `integers` method generates random integers in the specified interval. By default the upper bound is excluded; pass `endpoint=True` to include it.

In [2]:
rng.integers(1, 10)   # generate a random integer r, with  1 <= r < 10

6

In [3]:
rng.integers(1, 10, endpoint = True)    # generate a random integer r, with  1 <= r <= 10

10

In [4]:
u = rng.integers(1, 10, 30, endpoint = True)    # generate a 1-D array of 30 random integers
u

array([ 4,  2,  7,  1,  5,  7,  4,  6,  5,  4, 10,  3,  5,  2,  9,  4, 10,
        5,  1,  7,  4,  1,  5,  3,  6,  5,  1,  7,  2,  4], dtype=int64)

In [5]:
A = rng.integers(1, 10, (2, 3), endpoint = True)  # generate a 2-D array of shape (2, 3)
A

array([[ 4,  2,  1],
       [ 8,  8, 10]], dtype=int64)

#### `random` method

The `random` method works similarly to the `integers` method, except that it generates random floating-point numbers in the interval [0, 1).

In [6]:
y = rng.random(10)
y

array([0.65887424, 0.18757523, 0.73168824, 0.89130102, 0.68704323,
       0.93431286, 0.42986563, 0.37761321, 0.32548543, 0.52658349])

In [7]:
a = 5
b = 10
a + (b-a)*rng.random(10)   # rescale [0, 1) to [a, b): 10 random floats between 5 and 10

array([7.42088198, 9.30207767, 9.6673131 , 7.27465732, 7.41243423,
       8.26798927, 9.77264638, 5.61388996, 8.37108744, 7.34456237])
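
The rescaling in the cell above can also be written with the `uniform` method of `Generator`, which takes the lower and upper bounds directly. This is a small sketch rather than a replacement for the cell above. The variable name `v` is introduced here for illustration.

```python
from numpy.random import default_rng

rng = default_rng()
a, b = 5, 10

# 10 floats drawn uniformly between a and b (b is nominally excluded)
v = rng.uniform(a, b, 10)
print(v)
```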

#### `normal` method

The `normal` method generates random numbers from a normal distribution. Its first two arguments are the mean and the standard deviation (not the variance).

The following command generates 100 random numbers from $N(10.5, 0.7^2)$:

In [8]:
x = rng.normal(10.5, 0.7, 100)
x[:10]

array([10.45710401, 10.85884162, 10.36142805, 10.04243082, 10.56761559,
        9.46246462, 10.02480359, 10.57529126, 11.62579925,  9.21395466])

In [9]:
print('Sample Mean =%7.3f \n'
      'Sample Variance =%7.3f'%(x.mean(), x.var()))

Sample Mean = 10.530 
Sample Variance =  0.548
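
Note that `x.var()` divides by $n$ by default (`ddof=0`). If the unbiased sample variance, which divides by $n-1$, is wanted, `ddof=1` can be passed. The sketch below draws its own sample, so its numbers will not match the output above. The names `var_n` and `var_n_minus_1` are illustrative.

```python
import numpy as np
from numpy.random import default_rng

rng = default_rng()
x = rng.normal(10.5, 0.7, 100)   # mean 10.5, standard deviation 0.7
n = x.size

var_n = x.var()                  # default ddof=0: divides by n
var_n_minus_1 = x.var(ddof=1)    # ddof=1: divides by n - 1

# The two estimates differ only by the factor n / (n - 1)
print(np.isclose(var_n_minus_1, var_n * n / (n - 1)))   # True
```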


**Homework:**   
Explore the methods for generating random numbers from other probability distributions.  
Visit https://numpy.org/doc/stable/reference/random/generator.html#numpy.random.Generator for more information.

### Random sampling

#### `choice` method

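A `Generator` object provides the `choice` method for drawing a random sample from a population contained in a 1-D array. By default the sample is drawn with replacement; pass `replace=False` to sample without replacement.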

In [10]:
popln = np.array(["Club", "Spade", "Heart", "Diamond"])
rng.choice(popln, 10)   # Draw a sample of size 10 with replacement

array(['Diamond', 'Heart', 'Club', 'Club', 'Heart', 'Heart', 'Club',
       'Spade', 'Spade', 'Spade'], dtype='<U7')

In [11]:
# Build a standard 52-card deck: each card is a suit letter followed by its rank
cards = []
for suit in ['H', 'D', 'C', 'S']:
    for val in list(range(2,11))+['A','J','Q','K']:
        cards.append(suit + str(val))
aHand = rng.choice(cards, 5, replace = False)  # Draw a 5-card hand at random without replacement
aHand

array(['S10', 'D3', 'S4', 'D7', 'D10'], dtype='<U3')
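
The examples above give every element of the population the same chance of being selected. The `choice` method also accepts a `p` argument with one selection probability per element. The sketch below uses made-up probabilities (`weights` is a hypothetical name with hypothetical values) and then tabulates the relative frequencies, which should land roughly near those probabilities.

```python
import numpy as np
from numpy.random import default_rng

rng = default_rng()
popln = np.array(["Club", "Spade", "Heart", "Diamond"])

# Hypothetical selection probabilities, one per element; they must sum to 1
weights = [0.4, 0.3, 0.2, 0.1]

sample = rng.choice(popln, 1000, p=weights)   # with replacement by default

# Relative frequency of each suit in the sample
values, freq = np.unique(sample, return_counts=True)
print(dict(zip(values, freq / sample.size)))
```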

In [12]:
data = rng.integers(1, 5, 100, endpoint = True)         # 100 random integers from 1 to 5
values, freq = np.unique(data, return_counts = True)    # distinct values and their frequencies

In [13]:
values

array([1, 2, 3, 4, 5], dtype=int64)

In [14]:
freq

array([17, 20, 16, 25, 22], dtype=int64)

In [15]:
data2 = rng.normal(0, 1, 100)      # 100 draws from the standard normal distribution
cuts = [-3, -2, -1, 0, 1, 2, 3]    # bin edges used to group the data
data2[:10]

array([ 1.13805375,  0.30763009,  0.48608498, -0.01067742,  0.91533639,
       -0.29164724, -0.27368176,  1.97177186, -0.41419451,  0.34465881])

In [16]:
np.digitize(data2, cuts)[:10]   # bin index i means cuts[i-1] <= x < cuts[i]

array([5, 4, 4, 3, 4, 3, 3, 5, 3, 4], dtype=int64)

In [17]:
freq = np.bincount(np.digitize(data2, cuts))   # freq[i] = number of values with bin index i

In [18]:
freq

array([ 0,  0,  7, 27, 44, 18,  4], dtype=int64)

In [19]:
np.unique(np.digitize(data2, cuts), return_counts = True)

(array([2, 3, 4, 5, 6], dtype=int64), array([ 7, 27, 44, 18,  4], dtype=int64))
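
The bin indices returned by `np.digitize` are easier to read when paired with the intervals they stand for. The following sketch assumes the same `cuts` as above and draws its own data with an arbitrary illustrative seed, so its counts will differ from the output shown. Passing `minlength` to `np.bincount` ensures there is one count per possible index, including the two open-ended bins below `cuts[0]` and at or above `cuts[-1]`.

```python
import numpy as np
from numpy.random import default_rng

rng = default_rng(12345)            # arbitrary seed, for illustration only
data2 = rng.normal(0, 1, 100)
cuts = [-3, -2, -1, 0, 1, 2, 3]

# Index 0: x < cuts[0]; index i: cuts[i-1] <= x < cuts[i]; index len(cuts): x >= cuts[-1]
idx = np.digitize(data2, cuts)
freq = np.bincount(idx, minlength=len(cuts) + 1)

# Print a small frequency table with interval labels
print(f"below {cuts[0]}: {freq[0]}")
for i in range(1, len(cuts)):
    print(f"[{cuts[i-1]:2d}, {cuts[i]:2d}): {freq[i]}")
print(f"{cuts[-1]} and above: {freq[-1]}")
```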

In [20]:
np.bincount?