In [1]:
%pylab inline
import random

Populating the interactive namespace from numpy and matplotlib


## Random Number Generator
A pseudo-random number generator is an object that outputs a value each time it is called.

A random number generator has a **state**:
* The state is initialized from a `seed`
* Each time the random number generator is called
   * The state is updated
   * A pseudo-random number is generated from the state.

The whole output sequence is therefore a function of the seed.
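
To make the seed/state/output cycle concrete, here is a minimal sketch of a toy linear congruential generator. The class name `ToyLCG` and the method `next_bit` are made up for illustration, and the constants are a commonly quoted textbook choice. This is not how Python's `random` module works internally (it uses the Mersenne Twister); the sketch only mirrors the steps listed above.

```python
class ToyLCG:
    """Toy linear congruential generator: state -> (a*state + c) mod m."""

    def __init__(self, seed):
        # The state is initialized from the seed.
        self.state = seed % 2**31

    def next_bit(self):
        # 1. Update the state.
        self.state = (1103515245 * self.state + 12345) % 2**31
        # 2. Derive the output from the new state (its highest-order bit).
        return (self.state >> 30) & 1


g1, g2 = ToyLCG(25), ToyLCG(25)
bits1 = [g1.next_bit() for _ in range(8)]
bits2 = [g2.next_bit() for _ in range(8)]
print(bits1 == bits2)   # True: same seed, same sequence
```

Because nothing other than the state enters the computation, two generators started from the same seed can never diverge.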


In [2]:
random.seed(25)
print(random.randrange(2),end=", ")  # 1st bit
print(random.randrange(2),end=", ")
print(random.randrange(2),end=", ")
print(random.randrange(2),end=", ")  # 4th bit
state=random.getstate()  #save internal state
print(random.randrange(2),end=", ")  # 5th bit
print(random.randrange(2),end=", ")
print(random.randrange(2),end=", ")
print(random.randrange(2),end=", ")  # 8th bit

1, 0, 0, 1, 1, 0, 1, 0, 

In [3]:
random.setstate(state) # set internal state
print(random.randrange(2),end=", ")   #5th bit
print(random.randrange(2),end=", ")   
print(random.randrange(2),end=", ")
print(random.randrange(2),end=", ")   #8th bit

1, 0, 1, 0, 

The binary sequence is determined by the seed or the state.

There are two ways of setting the seed:
* `seed(i)` : sets the initial state according to the seed `i`. Always generates the same sequence. Useful when you want to reproduce a bug.
* `seed()` : seeds from an operating-system randomness source if one is available, and from the system time otherwise. Generates a different random sequence each time. Useful in production, when you want each run to be different.
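
A short sketch of the reproducibility that `seed(i)` gives. The helper name `first_bits` is hypothetical, and it uses a private `random.Random` instance rather than the module-level functions so that the global generator is left alone.

```python
import random


def first_bits(seed, k=8):
    """Return the first k bits produced by a generator seeded with `seed`."""
    rng = random.Random(seed)   # private generator; the global one is untouched
    return [rng.randrange(2) for _ in range(k)]


a = first_bits(25)
b = first_bits(25)
print(a == b)   # True: same seed, same sequence
```

Calling `random.Random()` with no argument behaves like `seed()`: each run starts from a fresh, unpredictable state.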

## Why "pseudo-random" ?

The sequence generated is determined by the seed.

Why do we say that the generated sequence is **pseudo-random**?

Because it looks random to most statistical tests.

Suppose we generate a sequence of n elements. Each element is independent of the others and is +1 or -1 with probabilities 1/2, 1/2.

We will then try some tests.

In [4]:
def generate(n=10):
    seq=[(random.random()>0.5)*2-1 for i in range(n)]
    return array(seq)

In [5]:
random.seed()
n=100000
R=generate(n)
R[:10]

array([-1,  1, -1,  1,  1, -1,  1, -1,  1,  1])

### Pseudo-random number generators pass most tests
 * An ideal pseudo-random number generator would produce the same distribution of outcomes as a true random generator for every test.

### Test 1
The mean of the sequence should be very close to 0.

In [6]:
mean(R)

-0.00652
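
How close is "very close"? For n independent +1/-1 values the mean has standard deviation $1/\sqrt{n}$, so values within a few multiples of that are what a true random source would give. The sketch below (the helper name `z_score_of_mean` is made up for illustration) expresses a mean in those units.

```python
import math


def z_score_of_mean(seq):
    """Mean of a +1/-1 sequence, in units of its standard deviation 1/sqrt(n)."""
    n = len(seq)
    mean = sum(seq) / n
    return mean * math.sqrt(n)


n = 100000
print(1 / math.sqrt(n))          # typical size of the mean, about 0.00316
print(-0.00652 * math.sqrt(n))   # the mean reported above, about -2.06 standard deviations
```

A deviation of about two standard deviations is unremarkable for a single draw.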

### Test 2
The difference between the mean of the first half and the mean of the second half should be close to zero.

In [7]:
n2=int(n/2)
mean(R[:n2])-mean(R[n2:])

0.0056
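
To judge whether this difference is small, a short derivation helps, assuming the elements are independent with variance 1. Write $\bar{R}_1$ and $\bar{R}_2$ for the means of the two halves (notation introduced here); each is an average of $n/2$ values, so

$$\mathrm{Var}(\bar{R}_1 - \bar{R}_2) = \frac{1}{n/2} + \frac{1}{n/2} = \frac{4}{n}, \qquad \sigma = \frac{2}{\sqrt{n}} \approx 0.0063 \quad \text{for } n = 100000.$$

The observed difference of 0.0056 is below one standard deviation.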

### Test 3
In fact, the dot product of `R` with any other independently generated ±1 vector, divided by `n`, should be close to zero.

In [8]:
for i in range(100):
    S=generate(n)
    print(dot(R,S)/n,end=', ')
    if i%8==7:
        print()

0.00154, 0.00184, 0.00636, 0.00134, 0.00462, -0.00342, 6e-05, -0.0022, 
-0.00254, 0.00068, -0.00646, -0.00342, 0.00418, -0.00106, 0.00364, -0.00382, 
-0.00368, -0.00412, -0.0026, -0.0011, -0.00088, -0.00016, 0.0009, 0.00076, 
-0.00218, 0.00102, -0.00164, -0.0001, -0.00342, 0.00232, 0.0001, 0.004, 
-0.0036, 0.00198, -0.00438, 0.0004, -0.0019, 0.00748, -0.00204, -0.0086, 
0.0007, -0.00092, 0.00252, -0.0003, -0.00198, 0.00466, -0.00116, 0.0043, 
0.0012, -0.00316, 0.0014, -0.00318, -0.0046, -0.00202, -0.00072, -0.00258, 
-0.00244, 0.00466, 0.00358, -0.00406, -0.00538, 0.0041, 0.00146, -0.0018, 
0.00396, -0.00208, 0.00264, 0.00204, 0.00232, 0.00444, 0.00042, 0.00162, 
0.00116, 0.00026, -0.00048, 0.00188, 0.00674, 0.00212, 0.00086, -0.00104, 
0.00062, -0.00384, 0.00618, -0.00186, 0.00284, -0.00052, 0.00146, 0.00158, 
-0.00036, -0.00132, 0.00106, -0.00212, -0.00194, -0.00102, 0.0009, -0.00286, 
0.00074, -0.00464, -0.00346, 0.00032, 
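
Rather than inspecting the numbers by eye, one can count how many normalized dot products fall within three standard deviations, $3/\sqrt{n}$, of zero. The following is a sketch in pure Python; the names `generate_pm1`, `bound` and `inside` are hypothetical, and `n` is smaller than above only to keep the loop fast.

```python
import math
import random

rng = random.Random(0)           # private, seeded generator so the run is repeatable


def generate_pm1(n):
    """n independent values, each +1 or -1 (same idea as generate above)."""
    return [1 if rng.random() > 0.5 else -1 for _ in range(n)]


n = 10000                        # smaller than above to keep pure Python fast
R = generate_pm1(n)
bound = 3 / math.sqrt(n)         # three standard deviations of dot(R, S) / n

trials = 50
inside = 0
for _ in range(trials):
    S = generate_pm1(n)
    corr = sum(r * s for r, s in zip(R, S)) / n
    if abs(corr) <= bound:
        inside += 1

print(inside, "of", trials, "normalized dot products lie within", bound)
```

For a true random source, roughly 99.7% of the values are expected to fall inside this bound.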

### It is impossible to construct a PRNG that passes all tests
* If the tester has unlimited computational power, it can run a separate test for each possible seed and identify the seed that conforms with the sequence.
* Suppose the seed is an int32; then there are $2^{32} = 4,294,967,296 \approx 4$ billion possible seeds.
* Generate a sequence for each seed. Length of sequence = 100 bits.
* The fraction of all possible 100-bit sequences that some seed can generate is at most:
$$\frac{2^{32}}{2^{100}} = \frac{1}{2^{68}}$$
An extremely tiny fraction.
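
The counting argument can be tried at a reduced scale. The sketch below assumes 8-bit seeds and 16-bit sequences (both sizes chosen only for illustration) and collects every sequence that some seed can produce; at most $2^8$ of the $2^{16}$ possible sequences can appear.

```python
import random
from fractions import Fraction

# Exact value of the fraction in the text.
print(Fraction(2**32, 2**100) == Fraction(1, 2**68))   # True

# Scaled-down version: 8-bit seeds, sequences of 16 bits.
seed_bits, length = 8, 16
reachable = set()
for seed in range(2**seed_bits):
    rng = random.Random(seed)
    reachable.add(tuple(rng.randrange(2) for _ in range(length)))

print(len(reachable), "reachable sequences out of", 2**length)
```

A tester who sees a 16-bit sequence outside the `reachable` set knows for certain that this generator did not produce it.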

### In practice, Python's `random` module is good enough
* But not always: for cryptographic/key-chain applications, the recommendation is to use the stronger `secrets` module.
* Or a [hardware random number generator](https://en.wikipedia.org/wiki/Hardware_random_number_generator) 
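
A minimal sketch of the `secrets` alternative, using only functions from the standard library. Unlike `random`, it draws from the operating system's randomness source and cannot be seeded, so its output is not reproducible.

```python
import secrets

bit = secrets.randbelow(2)        # one unpredictable bit, 0 or 1
key = secrets.token_hex(16)       # 16 random bytes, written as 32 hex characters
print(bit, key)
```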

### Hash Function
* A **random mapping** is a function $R:\{1,\ldots,n\} \to \{1,\ldots,m\}$ such that for 
each $1 \leq i \leq n$ the value $R(i)$ is distributed uniformly over $\{1,\ldots,m\}$, independently of the other values.
* A **hash function family** is a collection of functions $H_j:\{1,\ldots,n\} \to \{1,\ldots,m\}$ such that if $j$ is chosen at random then $H_j$ is indistinguishable from a random mapping (assuming the adversary has restricted resources).

In [9]:
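def Hash(i,j,_range=100):
    """ 
    Compute a hash function
    i = index of the hash function (which member of the family)
    j = value to be mapped
    _range = the output is an integer in [0,_range-1]
    Note: the roles of i and j are swapped relative to the H_j(i) notation above.
    """
    seed=int(str(i)+'0'+str(j))   # combine index and value into one integer seed
    random.seed(seed)             # side effect: reseeds the global generator
    return random.randrange(_range)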


In [10]:
Hash(5,7)

61

### Hashing [0,9] -> [0,9]
Note that this is **not** a permutation.

Repeated values are possible.

In [11]:
i=3
Range=10
for j in range(Range):
    print(Hash(i,j,_range=Range),end=', ')

9, 6, 8, 0, 2, 2, 6, 8, 2, 1, 
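
`Hash` reseeds the module-level generator on every call, which disturbs any other code that relies on `random`. One possible variant, sketched here under the hypothetical name `hash_local`, uses a private `random.Random` instance instead. It also seeds with the string `"i,j"`, because the digit concatenation in `Hash` can map different pairs to the same seed: (10, 101) and (1001, 1) both give 100101. The values it returns differ from those of `Hash`.

```python
import random


def hash_local(i, j, _range=100):
    """Hypothetical variant of Hash: i = index of the hash function, j = value to map."""
    rng = random.Random(f"{i},{j}")   # private generator seeded with an unambiguous string
    return rng.randrange(_range)      # integer in [0, _range-1]


random.seed(25)
before = random.getstate()
values = [hash_local(3, j, _range=10) for j in range(10)]
print(values)
print(random.getstate() == before)    # True: the global generator was not disturbed
```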

###  Hash functions are very useful
They are used for implementing maps (dictionaries), comparing documents, and more.

We will cover these uses in a future class.