# Task III: NVT molecular dynamics simulations (Part II)

## Startup

Set up the kernel

<center><img src="figures/fig1.png" width=1100 height=240 /></center>

<center><img src="figures/fig2.png" width=350 height=240 /></center>

Run the following cells using `Shift` + `Enter`.

In [None]:
from test_ran import main as test_ran
from test_corr import main as test_corr
from ran_gauss import main as ran_gauss
from ran_walk import ran_walk
import numpy as np
import matplotlib.pyplot as plt
import scipy.special

## Goals
- Generate sequences of RNs using different RNGs
- Perform several statistical tests to check statistical fluctuations and correlations
- Generate normally distributed RNs
- Perform a 1D random walk and compute the diffusion coefficient

## Step 1: Generate sequences of RNs using different RNGs


The function `test_ran` calls the selected RNG the desired number of times and bins the resulting
sequence of RNs into a histogram. For the time being, specify `nexp = 1`, e.g.

```python
iran = 0 # which random number generator to select; the allowed values are 0, 1, 2, 3
ntry = 1100 # the number of tries, i.e. how many random numbers to generate
nbins2 = 11 # the number of bins
nexp = 1 # the number of experiments, for each experiment do one chi^2 point (see below)
```
(Ignore the warning about `nexp`.) In this example, the histogram will be saved in a file
named `histo-ran0_1x1100-11`.
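
The binning step can be sketched as follows. This is only an illustration, not the code of `test_ran`: it uses NumPy's default generator as a stand-in for `ran0`, and the normalization (ideal value 1 in every bin) is an assumption based on the reference line drawn in the plotting cell further down.

```python
import numpy as np

ntry = 1100   # number of random numbers
nbins = 11    # number of bins on [0, 1)

rng = np.random.default_rng(seed=1)   # stand-in generator, not ran0
r = rng.random(ntry)

# count how many numbers fall into each bin
counts, edges = np.histogram(r, bins=nbins, range=(0.0, 1.0))

# normalize so that a perfectly uniform sequence gives 1 in every bin
freq = counts / (ntry / nbins)
centers = 0.5 * (edges[:-1] + edges[1:])

for c, f in zip(centers, freq):
    print(f"{c:.3f}  {f:.3f}")
```

The printed frequencies scatter around 1; how large that scatter is depends on `ntry`.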

Details of the random number generators (RNGs) `ran0`, `ran1`, `ran2`, and `ran3` are provided in the Numerical Recipes book
(reference on the task Moodle webpage).

In [None]:
iran = 0
ntry = 1100
nbins2 = 11
nexp = 1
test_ran(iran, ntry, nbins2, nexp)

<div class="alert alert-block alert-info"><b>TODO:</b> Visualize the frequency histogram and compare two histograms built from RN sequences of different lengths (`ntry`). </div>


In [None]:
# Insert here the name of the files to plot
filef=[
    'histo-ran0_1x1100-11',
    # insert more files here
    ]

color=['black','red','blue','green']

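for i, filename in enumerate(filef):
    x=[]
    y=[]
    with open(filename,'r') as f:
        for l in f:
            cols=l.split()
            x.append(float(cols[0]))  # first column: bin position
            y.append(float(cols[1]))  # second column: frequency
    x=np.array(x)
    y=np.array(y)
    width=x[1]-x[0]
    # start at color[0] and wrap around, so that four or more files do not raise an IndexError
    plt.bar(x,y,width,facecolor='none',edgecolor=color[i % len(color)], label=filename)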


plt.plot((0,1),(1,1))
plt.xlim([0,1])
plt.legend()
plt.show()




Since the number of trials is not very large, the frequency obtained for each bin in this
numerical experiment may differ considerably from the ideal value for a uniform deviate
(i.e., constant between 0 and 1). How can we tell whether the observed fluctuations are
statistically acceptable?
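
A rough estimate of the expected size of these fluctuations follows from the binomial distribution: each RN falls into a given bin with probability `p = 1/nbins`. The sketch below evaluates this for the example parameters; it assumes that the RNs are independent and uniform.

```python
import math

ntry = 1100
nbins = 11

p = 1.0 / nbins                               # probability of hitting one given bin
mean_count = ntry * p                         # expected counts per bin
std_count = math.sqrt(ntry * p * (1.0 - p))   # binomial standard deviation
rel_fluct = std_count / mean_count            # relative fluctuation of the frequency

print(f"expected counts per bin: {mean_count:.1f}")
print(f"standard deviation:      {std_count:.2f}")
print(f"relative fluctuation:    {rel_fluct:.3f}")
```

With 1100 numbers in 11 bins, deviations of roughly 10% from the ideal value are therefore to be expected in a single experiment.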


## Step 2: Perform several statistical tests to check statistical fluctuations and correlations

### Step 2.1

Perform a $\chi^2$ statistical test (see Giordano & Nakanishi's book): use the function `test_ran` and specify a large number of experiments (at least 1000). For each experiment the $\chi^2$ is computed, and the values are then collected in a histogram. The integral of this histogram (i.e., the cumulative distribution of the $\chi^2$) should be the (regularized) incomplete Gamma function, $P(a,x)$, with $a$ equal to half the number of degrees of freedom; in our case:
$$
\begin{aligned}
 a &= (\mathrm{nbins}-1)/2, \\
 x &= \chi^2/2.
\end{aligned}
$$
The frequency of $\chi^2$ values and its integral are stored in the second and third columns of
`chisq-ran0_1000x1100-11`, respectively. Compare the cumulative distributions
obtained from different RNGs with the theoretical one, $P(5, \chi^2/2)$. Now run the following code to perform the $\chi^2$ test.
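
The procedure described above can be sketched independently of `test_ran`. The block below is a minimal illustration under stated assumptions: it uses NumPy's default generator as a stand-in for the course RNGs, and it compares the empirical cumulative distribution directly instead of writing a histogram file.

```python
import numpy as np
import scipy.special

ntry, nbins, nexp = 1100, 11, 1000
rng = np.random.default_rng(seed=2)   # stand-in generator, not ran0..ran3

expected = ntry / nbins               # ideal counts per bin
chi2 = np.empty(nexp)
for k in range(nexp):
    counts, _ = np.histogram(rng.random(ntry), bins=nbins, range=(0.0, 1.0))
    chi2[k] = np.sum((counts - expected) ** 2 / expected)

# empirical cumulative distribution of the chi^2 values
chi2_sorted = np.sort(chi2)
cdf_emp = np.arange(1, nexp + 1) / nexp

# theoretical cumulative distribution P(a, chi^2/2) with a = (nbins - 1)/2
a = (nbins - 1) / 2
cdf_theo = scipy.special.gammainc(a, chi2_sorted / 2)

print("mean chi^2:", chi2.mean())
print("largest deviation between the two curves:", np.max(np.abs(cdf_emp - cdf_theo)))
```

For a well-behaved generator the mean $\chi^2$ should be close to the number of degrees of freedom (10 here), and the two cumulative curves should nearly coincide.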

In [None]:
# example
iran = 0
ntry = 1100
nbins2 = 11
nexp = 1000
test_ran(iran, ntry, nbins2, nexp)

<div class="alert alert-block alert-info"><b>TODO:</b> Test all the random number generators (`ran0`, `ran1`, `ran2`, `ran3`) and compare their cumulative $\chi^2$ distributions with the theoretical $P(5, \chi^2/2)$.</div>

In [None]:
for i in range(4):
    iran = i
    ntry = 1100
    nbins2 = 11
    nexp = 1000
    test_ran(iran, ntry, nbins2, nexp)

In [None]:
nbins2 = 11
filef=[
    'chisq-ran0_1000x1100-11',
    'chisq-ran1_1000x1100-11',
    'chisq-ran2_1000x1100-11',
    'chisq-ran3_1000x1100-11',
    ]
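# plot the cumulative chi^2 distribution of each RNG against x = chi^2/2
a=(nbins2-1)/2  # half the number of degrees of freedom
for filename in filef:
    lx=[]
    ly=[]
    with open(filename,'r') as f:
        for l in f:
            x=l.split()
            lx.append(float(x[0]))  # first column: chi^2 value
            ly.append(float(x[2]))  # third column: cumulative distribution
    lx=np.array(lx)
    ly=np.array(ly)
    plt.plot(lx/2,ly,label=filename)
# theoretical curve P(a, x), drawn once on the same x = chi^2/2 axis
xt=np.linspace(0,lx.max(),200)
plt.plot(xt,scipy.special.gammainc(a,xt),label='Theo.')
plt.legend()
plt.xlabel(r'$\chi^2/2$')
plt.ylabel('Cumulative distribution')
plt.show()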

### Step 2.2

Perform a correlation test in k-space using `test_corr`. All the RNGs provided here
should work well (i.e. they should fill the k-dimensional space uniformly), unless you
use a linear congruential generator (LCG) with non-optimal parameters (use `iran=9`
to choose the LCG parameters `(a,c,m,i0)` yourself).
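
To see why the parameters of an LCG matter, the recurrence $i_{k+1} = (a\,i_k + c) \bmod m$ can be written out directly. The sketch below uses the parameter values of the `iran = 9` example cell; the helper `lcg` and the use of overlapping consecutive pairs are assumptions made for illustration, not the implementation inside `test_corr`.

```python
def lcg(a, c, m, i0, n):
    """Return n numbers in [0, 1) from i_{k+1} = (a*i_k + c) mod m."""
    state = i0
    out = []
    for _ in range(n):
        state = (a * state + c) % m
        out.append(state / m)
    return out

# parameters of the iran = 9 example cell
a, c, m, i0 = 10, 0, 509, 1
r = lcg(a, c, m, i0, 1000)

# consecutive pairs (x, y) = (r[k], r[k+1])
pairs = list(zip(r[:-1], r[1:]))

# number of distinct values before the sequence repeats
period = len(set(r))

# every pair satisfies y = a*x - j with an integer j, i.e. it lies on one of a few lines
lines = sorted({round(a * x - y) for x, y in pairs})

print("distinct values:", period)
print("line indices j: ", lines)
```

With these parameters the sequence repeats after 508 numbers and all pairs fall on ten parallel lines in the unit square, which is the kind of structure the 2-dimensional plots are meant to reveal.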

In [None]:
# example for iran = 0, 1, 2, 3
iran = 0
ntry = 1000
ndim = 2
test_corr(iran, ntry, ndim)

In [None]:
# example for iran = 9
iran = 9
ntry = 1000
ndim = 2
ia = 10 # for iran = 9
ic = 0 #  for iran = 9
im = 509 # for iran = 9
i0 = 1 # for iran = 9
test_corr(iran, ntry, ndim, ia=ia, ic=ic, im=im, i0=i0)

<div class="alert alert-block alert-info"><b>TODO:</b> Produce 2-dimensional plots for all RNGs from the data in the files `corr-ran?_2dim-???` (use `ndim=2`). What do you observe?</div>


In [None]:
# Files to open for the correlation plots
filef_corr=[
     # put the file names here; they should look like 'corr-ran?_2dim-????'
     'corr-ran0_2dim-1000'
    ]
for filename in filef_corr:
    x=[]
    y=[]
    for l in open(filename,'r'):
        x.append(float(l.split()[0]))
        y.append(float(l.split()[1]))
    x=np.array(x)
    y=np.array(y)
    plt.plot(x,y,'.')

plt.xlim([0,1])
plt.show()


### Step 2.3


`test_corr` also prints the average of all RNs immediately following a very small
RN (if there is at least one such event).

<div class="alert alert-block alert-info"><b>TODO:</b> Compare the average of RNs from `ran0` with `ntry=1000` and `ntry=1000000`.</div>

In [None]:
# example for iran = 0, 1, 2, 3
iran = 0
ntry = 1000000
ndim = 2
test_corr(iran, ntry, ndim)

If the number of iterations is large enough, the sequence contains random numbers smaller than $10^{-5}$. For
`iran=0`, the number that follows such a very small number is itself small on average (approximately 0.06), whereas for a
uniform distribution the average should be 0.5. Hence, `iran=0` fails this test.
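
One way to understand this behaviour is sketched below. It assumes that `ran0` is the multiplicative "minimal standard" generator described in Numerical Recipes (multiplier 16807, modulus $2^{31}-1$); the course implementation may differ, for example in how the seed is handled. For such a generator a value $x < 10^{-5}$ is followed by $16807\,x$, which can never exceed about 0.17.

```python
A = 16807          # multiplier of the "minimal standard" generator (assumed for ran0)
M = 2**31 - 1      # modulus

def successors_of_small(seed, ntry, threshold=1e-5):
    """Collect the numbers that immediately follow a value below threshold."""
    state = seed
    previous = None
    found = []
    for _ in range(ntry):
        state = (A * state) % M
        x = state / M
        if previous is not None and previous < threshold:
            found.append(x)
        previous = x
    return found

found = successors_of_small(seed=1, ntry=200_000)
print("events:", len(found))
print("average successor:", sum(found) / len(found))
print("upper bound A*threshold:", A * 1e-5)
```

Every successor stays below the bound `A*threshold`, so its average cannot approach the value 0.5 expected for independent uniform numbers.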

### Step 2.4

<div class="alert alert-block alert-info"><b>(Optional) TODO:</b> Compute the distribution of the product and of the difference of pairs of
RNs using `test_corr`; compare the data in `distprod-ran?_???` (column
2 against 1 for the difference, 4 against 3 for the product) with the corresponding theoretical distributions.</div>
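
For two independent uniform RNs on $[0,1)$, the product $z$ has the density $-\ln z$ on $(0,1)$ and the difference $d$ has the triangular density $1-|d|$ on $(-1,1)$, both normalized to unit area. The sketch below checks this with NumPy's default generator as a stand-in; the normalization used in the `distprod` files may differ from the unit-area convention assumed here.

```python
import numpy as np

rng = np.random.default_rng(seed=3)   # stand-in generator, not ran0..ran3
n = 200_000
r1 = rng.random(n)
r2 = rng.random(n)

# normalized histograms (unit area) of the product and of the difference
h_prod, e_prod = np.histogram(r1 * r2, bins=50, range=(0.0, 1.0), density=True)
h_diff, e_diff = np.histogram(r1 - r2, bins=50, range=(-1.0, 1.0), density=True)
z = 0.5 * (e_prod[:-1] + e_prod[1:])
d = 0.5 * (e_diff[:-1] + e_diff[1:])

# theoretical densities for independent uniform numbers
p_prod = -np.log(z)
p_diff = 1.0 - np.abs(d)

# skip the first product bin, where -ln(z) varies strongly within the bin
print("max deviation, product:   ", np.max(np.abs(h_prod[1:] - p_prod[1:])))
print("max deviation, difference:", np.max(np.abs(h_diff - p_diff)))
```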

In [None]:
# Insert here the name of the files to plot
filef_dist=[
        'distprod-ran0_1000',
        ]



for filename in filef_dist:
    x=[]
    y=[]
    for l in open(filename,'r'):
        x.append(float(l.split()[0]))
        y.append(float(l.split()[1]))
    x=np.array(x)
    y=np.array(y)
    plt.plot(x,y,'.')

yt=-np.log(x)
plt.xlim([0,1])
plt.ylabel('Product')
plt.plot(x,yt,label='Theo.')
plt.legend()
plt.show()

for filename in filef_dist:
    x=[]
    y=[]
    for l in open(filename,'r'):
        x.append(float(l.split()[2]))
        y.append(float(l.split()[3]))
    x=np.array(x)
    y=np.array(y)
    plt.plot(x,y,'.')

plt.xlim([-1,1])
plt.ylabel('Difference')
plt.plot((-1,0,1),(0,2,0),label='Theo.')
plt.legend()
plt.show()

### Step 2.5

Perform a two-dimensional $\chi^2$ test for pairs of RNs (you can use the script
`test_chisq2d.py`); `ran0` and `ran1` should fail the test for very long RN sequences (but shorter than the RNG period). As the script takes a very long time to run, its output
can be found in the subfolder `output_test_chisq2d`.
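
A single two-dimensional $\chi^2$ experiment can be sketched as follows. Pairs of RNs are binned on an `nbins` x `nbins` grid, which gives $11^2 - 1 = 120$ degrees of freedom and hence $a = 60$, the value used for the theoretical curve in the plotting cell of this step. The use of non-overlapping pairs and of NumPy's default generator are assumptions of this sketch, not a description of `test_chisq2d.py`.

```python
import numpy as np
import scipy.special

ntry, nbins = 110_000, 11
rng = np.random.default_rng(seed=4)   # stand-in generator, not ran0..ran3

r = rng.random(ntry)
x, y = r[0::2], r[1::2]               # non-overlapping pairs (an assumption)

counts, _, _ = np.histogram2d(x, y, bins=nbins, range=[[0, 1], [0, 1]])
expected = len(x) / nbins**2          # ideal counts per cell
chi2 = np.sum((counts - expected) ** 2 / expected)

dof = nbins**2 - 1                    # 120 degrees of freedom
prob = scipy.special.gammainc(dof / 2, chi2 / 2)   # P(60, chi^2/2)

print(f"chi^2 = {chi2:.1f} with {dof} degrees of freedom")
print(f"P(a, chi^2/2) = {prob:.3f}")
```

Repeating this for many experiments and accumulating the $\chi^2$ values gives the cumulative distribution that is compared with $P(60, \chi^2/2)$.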


In [None]:
# This code produces chisq2d for pairs of RNs.
# It takes a long time to run. You can find the output files in the folder output_test_chisq2d
import time

ran=[0,1,2,3]
tryy=[110000,1100000,11000000,110000000]
ibin=11
iexp=1000

for ir in ran:
    print('iran={}'.format(ir))
    for it in tryy:
        print('ntry={}'.format(it))
        a=time.time()
        # test_ran is the function imported in the Startup cell (main of the test_ran module);
        # importing the module again here would rebind the name and break the earlier cells
        test_ran(ir,it,ibin,iexp)
        b=time.time()
        print(str(b-a)+" seconds")



<div class="alert alert-block alert-info"><b>(Optional) TODO:</b> Run the two-dimensional $\chi^2$ test (not required, since the output files are provided) and plot the results.</div>


In [None]:
# Parameters for the theoretical curve
n=500 # Number of points for the incomplete gamma function
xmax=200 # Maximum values

# Insert here the name of the files to plot
filef=[
    './output_test_chisq2d/chisq2d-ran0_1000x110000-11',
    './output_test_chisq2d/chisq2d-ran0_1000x1100000-11',
    './output_test_chisq2d/chisq2d-ran0_1000x11000000-11',
    './output_test_chisq2d/chisq2d-ran0_1000x110000000-11',
    ]

for filename in filef:
    x=[]
    y=[]
    with open(filename,'r') as f:
        for l in f:
            cols=l.split()
            x.append(float(cols[0]))  # first column: chi^2 value
            y.append(float(cols[2]))  # third column: cumulative distribution
    x=np.array(x)
    y=np.array(y)
    plt.plot(x/2,y,label=filename)

# theoretical curve P(a, x) with a = 60, i.e. 120 degrees of freedom (11 x 11 bins minus one)
xig=np.arange(n)/n*xmax
yig=scipy.special.gammainc(60,xig)

plt.plot(xig,yig,label='Theo.')

plt.legend(loc=4)
plt.xlabel(r'$\chi^2/2$')
plt.ylabel('Cumulative distribution (2d)')
plt.xlim(0,xmax)
plt.ylim(0,1.01)
plt.show()

## Step 3: Generate normally distributed RNs

With the program `ran_gauss` you can generate
a sequence of RNs distributed according to a standard normal (Gaussian) distribution, using either the Box-Muller method or the central limit theorem. A histogram of the frequencies is computed
and saved in a file named, e.g., `histo-gaussBM_10000`.

<div class="alert alert-block alert-info"><b>TODO:</b> Visualize the histogram and compare it
with the ideal normal distribution function. </div>


The standard normal distribution function is
$$
\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right)
$$
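
The Box-Muller method turns two uniform RNs $u_1, u_2$ into a normally distributed one via $z = \sqrt{-2\ln u_1}\,\cos(2\pi u_2)$. The block below is a minimal sketch of that transformation with NumPy's default generator as the uniform source; it is not the code of `ran_gauss`, whose details (for example whether the sine partner is also used) are not assumed here.

```python
import numpy as np

rng = np.random.default_rng(seed=5)   # stand-in uniform generator
ntry = 100_000

# Box-Muller: two uniform numbers give one standard normal number
u1 = 1.0 - rng.random(ntry)           # in (0, 1], avoids log(0)
u2 = rng.random(ntry)
z = np.sqrt(-2.0 * np.log(u1)) * np.cos(2.0 * np.pi * u2)

# normalized histogram compared with the ideal density
hist, edges = np.histogram(z, bins=40, range=(-4.0, 4.0), density=True)
xc = 0.5 * (edges[:-1] + edges[1:])
ideal = np.exp(-0.5 * xc**2) / np.sqrt(2.0 * np.pi)

print("mean:", z.mean(), " std:", z.std())
print("max deviation from ideal density:", np.max(np.abs(hist - ideal)))
```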

In [None]:
# try the following code
# method 1=Box-Muller
method = 1
ntry = 100000
ran_gauss(method=method, ntry=ntry)

In [None]:
# try the following code
# method 2=C.L. theor.
method = 2
nsum = 4
ntry = 100000
ran_gauss(method=method, nsum=nsum, ntry=ntry)

In the central limit method you can choose the number of uniform random variables (`nsum`) that are
added to obtain a single Gaussian-distributed variable.
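
The idea can be sketched as follows: a uniform RN on $[0,1)$ has mean $1/2$ and variance $1/12$, so the sum of `nsum` of them has mean `nsum/2` and variance `nsum/12`. The helper `clt_gauss` below is hypothetical and only illustrates the rescaling to zero mean and unit variance; `ran_gauss` may normalize its output differently.

```python
import numpy as np

rng = np.random.default_rng(seed=6)   # stand-in uniform generator
ntry = 100_000

def clt_gauss(nsum, ntry, rng):
    """Sum nsum uniform numbers and rescale to zero mean and unit variance."""
    s = rng.random((ntry, nsum)).sum(axis=1)
    # a uniform number on [0, 1) has mean 1/2 and variance 1/12
    return (s - nsum / 2.0) / np.sqrt(nsum / 12.0)

for nsum in (1, 2, 4, 12):
    z = clt_gauss(nsum, ntry, rng)
    # fraction beyond |z| > 2; about 0.0455 for an exact Gaussian
    tail = np.mean(np.abs(z) > 2.0)
    print(f"nsum={nsum:2d}  mean={z.mean():+.3f}  std={z.std():.3f}  tail={tail:.4f}")
```

The mean and standard deviation are correct for every `nsum` by construction; it is the shape, here probed through the tail fraction, that approaches the Gaussian as `nsum` grows.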

<div class="alert alert-block alert-info"><b>TODO:</b> Check how the distribution of the
sum changes as you increase `nsum` (it should approximate a Gaussian already for a
reasonably small number). </div>


In [None]:
# method 2=C.L. theor.
method = 2
for nsum in range(1,6):  # you can change the range for nsum
    ntry = 100000
    ran_gauss(method=method, nsum=nsum, ntry=ntry)

(Optional) Modify the code to perform the $\chi^2$ test for the Gaussian distribution.

## Step 4: Perform a 1D random walk and compute the diffusion coefficient

Perform a 1D random walk and compute the diffusion coefficient using the `ran_walk`
code. 

In [None]:
nwalk = 1000
nstep = 10000
ran_walk(nwalk=nwalk, nstep=nstep)

<div class="alert alert-block alert-info"><b>TODO:</b> Compare the diffusion coefficient with the theoretical value </div>

The diffusion coefficient can be calculated from the mean squared displacement (m.s.d.) through the relation
$$
\left\langle x^2 \right\rangle = 2Dt,
$$
where $t$ is the time, which is equal to the step number in our case. The theoretical value of the diffusion coefficient
is $D = \frac{L^2}{2\Delta t}$; since in our calculations $L = \Delta t = 1$, we get $D = 1/2$.
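
The relation above can be checked with a short stand-alone sketch. It assumes that every step is $+1$ or $-1$ with equal probability and uses NumPy's default generator; the way `ran_walk` generates its steps and extracts $D$ may differ.

```python
import numpy as np

rng = np.random.default_rng(seed=7)   # stand-in generator
nwalk, nstep = 2000, 1000

# each step is +1 or -1 with equal probability (L = 1, dt = 1)
steps = rng.choice([-1, 1], size=(nwalk, nstep))
x = np.cumsum(steps, axis=1)          # position of every walker after each step

t = np.arange(1, nstep + 1)
msd = np.mean(x.astype(float) ** 2, axis=0)   # <x^2> averaged over the walkers

# least-squares slope of msd = 2*D*t through the origin
D = np.sum(t * msd) / (2.0 * np.sum(t.astype(float) ** 2))
print("estimated D:", D)
```

The estimate should fluctuate around the theoretical value $1/2$, with a statistical error that shrinks as `nwalk` increases.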