## Measuring memory latency

The purpose of this notebook is to overcome a problem in the notebook `2_measuring_performance_of_memory_hierarchy.ipynb`.

The problem is that the `time()` function is only accurate to about $10^{-7}$ seconds, so any operation that takes less time than that does not register as taking any time.
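A quick way to check the advertised resolution of the clock behind `time.time()` on a given machine is `time.get_clock_info`. The reported value is platform-dependent (it need not equal $10^{-7}$ seconds on every system), and `time.perf_counter` is the standard-library clock intended for timing short intervals:

```python
import time

# Advertised resolution (in seconds) of the wall clock used by time.time().
wall = time.get_clock_info("time")
# perf_counter is the highest-resolution clock available for short intervals.
perf = time.get_clock_info("perf_counter")

print("time():        ", wall.implementation, wall.resolution)
print("perf_counter():", perf.implementation, perf.resolution)
```

The advertised resolution is an upper bound on precision; the actual accuracy of a single reading can be worse.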

To overcome the problem, we perform many random pokes in sequence and measure the time it takes to complete all of them; dividing that total by the number of pokes gives the average time per poke.
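The batching idea can be sketched in isolation. This is a minimal sketch, not the notebook's own function: `time_per_poke` and its arguments are hypothetical names, and it uses `time.perf_counter` where the notebook's `measureRandomAccessMemBlocks` uses `time.time()`.

```python
import time
import numpy as np

def time_per_poke(A, batch, rng):
    """Average time of one random read+write, measured over `batch` pokes at once."""
    # Draw the random indices before starting the timer.
    loc = rng.integers(0, A.size, size=batch)
    t0 = time.perf_counter()
    x = A[loc]        # vectorized read of all `batch` locations
    A[loc] = x + 1    # vectorized write back to the same locations
    elapsed = time.perf_counter() - t0
    # Spreading the total over `batch` pokes lets us estimate a per-poke time
    # well below the resolution of a single clock reading.
    return elapsed / batch

rng = np.random.default_rng(0)
A = np.zeros(10_000_000, dtype=np.int8)   # 10MB buffer, one byte per element
per_poke = time_per_poke(A, batch=1000, rng=rng)
print(f"{per_poke:.2e} s per poke")
```

The larger `batch` is, the smaller the relative error contributed by the clock, at the cost of averaging away the latency of individual pokes.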

As we are interested in times shorter than $10^{-7}$ seconds, we restrict our attention to main memory rather than to files.

### Import modules

In [2]:
%pylab inline
from numpy import *

Populating the interactive namespace from numpy and matplotlib


In [3]:
import time
from matplotlib.backends.backend_pdf import PdfPages

from os.path import isfile,isdir
from os import mkdir
import os

In [43]:
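# Helper functions from the local `lib` package; this import fails unless `lib` is on the Python path.
from lib.measureRandomAccess import measureRandomAccess
from lib.PlotTime import PlotTime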

ModuleNotFoundError: No module named 'lib'

### Setting parameters
* We test access to elements of arrays whose sizes are:
   * 1MB, 10MB, 100MB, 1000MB (=1GB)
* Arrays are stored **in memory** or **on disk**.
* We perform 1 million read/write ops to random locations in the array.
* We analyze the **distribution** of the latencies.
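The bookkeeping behind these parameters is simple arithmetic. A sketch, assuming 1MB means $10^6$ bytes (as in the later `m_list`, where `10000000` is labelled `'10MB'`) and the `k=1000`, `batch=1000` values passed to the measurement function in the main loop:

```python
import numpy as np

k, batch = 1000, 1000        # experiments x pokes per experiment, as in the main loop
total_pokes = k * batch      # random read/write operations per array size

sizes_mb = [1, 10, 100, 1000]
itemsize = np.dtype(np.int8).itemsize   # an int8 element occupies 1 byte
n_elements = [mb * 10**6 // itemsize for mb in sizes_mb]

print(total_pokes)   # 1000000
print(n_elements)    # [1000000, 10000000, 100000000, 1000000000]
```

Because the buffers use `dtype=np.int8`, the number of elements equals the size in bytes.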

In [5]:
n=100  # size of a single block
m_list=[1,10,100,1000,10000]  # size of file in blocks
k=100000  # number of repeats
L=len(m_list)
print('n=%d, k=%d, m_list='%(n,k),m_list)

n=100, k=100000, m_list= [1, 10, 100, 1000, 10000]


### Set working directory
This script generates large files. We put these files in a separate directory so it is easier to delete them later.

In [6]:
log_root='./logs'
if not isdir(log_root): mkdir(log_root)
TimeStamp=str(int(time.time()))
log_dir=log_root+'/'+TimeStamp
mkdir(log_dir)
%cd $log_dir
stat=open('stats.txt','w')

def tee(line):
    """Print `line` and also append it to stats.txt."""
    print(line)
    stat.write(line+'\n')

/Users/jasminesimmons/Grad_School/Spring_Qtr_2020/DSC291/DSC291_Team4_github/dsc291team4/HW2/logs/1587409106
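The cell above never closes `stats.txt`. A hedged alternative sketch using `pathlib` and a `with` block, so the file is closed automatically; `make_log_dir` is a hypothetical helper, and unlike the cell above it does not change the working directory with `%cd`:

```python
import time
from pathlib import Path

def make_log_dir(root="./logs"):
    """Create and return a timestamped subdirectory of `root`."""
    # parents=True also creates `root` itself if it does not exist yet.
    log_dir = Path(root) / str(int(time.time()))
    log_dir.mkdir(parents=True, exist_ok=True)
    return log_dir

def tee(line, stream):
    """Print `line` and append it to the open text file `stream`."""
    print(line)
    stream.write(line + "\n")

log_dir = make_log_dir()
with open(log_dir / "stats.txt", "w") as stat:   # closed when the block ends
    tee("example line", stat)
```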


In [7]:
_mean=zeros([2,L])   #0: using disk, 1: using memory
_std=zeros([2,L])
Tmem=[]
TFile=[]

In [8]:
import numpy as np
from numpy.random import rand
import time

def measureRandomAccessMemBlocks(sz,k=1000,batch=100):
    """Measure the distribution of random-access times in computer memory.

    :param sz: size of the memory block, in bytes (one int8 element per byte).
    :param k: number of times that the experiment is repeated.
    :param batch: number of locations poked in a single experiment (the pokes are performed by numpy rather than by a Python loop).
    :returns: (_mean,_std,T):
              _mean = the mean of T
              _std = the standard deviation of T
              T = an array containing the average per-poke time of each of the k experiments
    :rtype: tuple

    """
    # Prepare buffer.
    A=np.zeros(sz,dtype=np.int8)

    # Read and write k*batch times from/to buffer.
    s=0; s2=0   # running sums of d and d**2 (named to avoid shadowing the built-in sum)
    T=np.zeros(k)
    for i in range(k):
        if (i%100==0): print('\r',i, end=' ')
        loc=np.int32(rand(batch)*sz)   # batch random indices in [0, sz)
        t=time.time()
        x=A[loc]      # read
        A[loc]=loc    # write (values are cast to int8)
        d=(time.time()-t)/batch   # average time per poke in this batch
        T[i]=d
        s += d
        s2 += d*d
    _mean=s/k; var=(s2/k)-_mean**2; _std=np.sqrt(var)
    return (_mean,_std,T)
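The running-sum formulas at the end of `measureRandomAccessMemBlocks` compute the population mean and standard deviation of `T`, so they should agree with `T.mean()` and `T.std()` (NumPy's default `ddof=0`). A small check on illustrative, made-up values:

```python
import numpy as np

# Per-poke times from a few hypothetical experiments (illustrative values only).
T = np.array([3.0e-8, 2.5e-8, 4.0e-8, 2.8e-8])
k = len(T)

# Running-sum formulas, as in measureRandomAccessMemBlocks.
s, s2 = T.sum(), (T * T).sum()
mean_running = s / k
std_running = np.sqrt(s2 / k - mean_running**2)

# Compare with NumPy; atol=0 so the tiny magnitudes do not pass trivially.
print(np.isclose(mean_running, T.mean(), rtol=1e-9, atol=0),
      np.isclose(std_running, T.std(), rtol=1e-9, atol=0))
```

The one-pass formula $\mathrm{E}[d^2] - \mathrm{E}[d]^2$ subtracts two nearly equal numbers when the spread is small relative to the mean, so it can lose precision; since `T` is returned anyway, `np.std(T)` is a reasonable alternative.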

In [9]:
m_list=[10000000, 20000000, 30000000, 40000000, 50000000, 60000000, 70000000]
m_legend=['10MB', '20MB', '30MB', '40MB', '50MB', '60MB', '70MB']

In [10]:
# Create a pandas dataframe to store the results. 
import pandas as pd

col_names = [] # Format pandas column names 
col_names.append('Time')
for i in m_list: 
    col_names.append(str(i) + '_Mean')
    col_names.append(str(i) + '_STD')

data = []

# Repeat the random-poke measurement in a loop, pausing for one minute between trials.
import time

# Run 60 trials; with a one-minute pause after each, the whole run takes more than an hour.
num_trials = 0

while num_trials < 60:   # number of trials; the pause length is the argument of time.sleep() below
    localtime = time.localtime()
    result = time.strftime("%I:%M:%S %p", localtime)
    print(result, end="", flush=True)
    print("\r", end="", flush=True)
  

    Random_pokes=[]

    L=len(m_list)
    _mean=zeros([L])   # mean per-poke time for each array size (memory only)
    _std=zeros([L])    # standard deviation for each array size
    TMem=[0]*L

    data_row = [result]
    for m_i in range(L):
        m=m_list[m_i]
        print('Memory array %d Bytes'%m)
        out = measureRandomAccessMemBlocks(m,k=1000,batch=1000)
        (_mean[m_i],_std[m_i],TMem[m_i]) = out
        TMem[m_i].sort()
        tee('\rMemory pokes _mean='+str(_mean[m_i])+', Memory _std='+str(_std[m_i]))

        Random_pokes.append({'m_i':m_i,
                            'm':m,
                            'memory__mean': _mean[m_i],
                            'memory__std': _std[m_i],
                            'memory_largest': TMem[m_i][-100:],
                    })

        data_row.append(str(_mean[m_i]))
        data_row.append(str(_std[m_i]))
       
    data.append(data_row)
    time.sleep(60)   # sleep for one minute (sleep function is in seconds)
    num_trials += 1

Memory array 10000000 Bytes
Memory pokes _mean=3.70926856994629e-08, Memory _std=1.011728783683005e-07
Memory array 20000000 Bytes
Memory pokes _mean=3.8277626037597854e-08, Memory _std=1.1869122555825732e-07
Memory array 30000000 Bytes
Memory pokes _mean=4.665207862854046e-08, Memory _std=1.4127540293992928e-07
Memory array 40000000 Bytes
Memory pokes _mean=6.257963180542028e-08, Memory _std=2.3750696775786994e-07
Memory array 50000000 Bytes
Memory pokes _mean=6.790947914123544e-08, Memory _std=2.2642404076809139e-07
Memory array 60000000 Bytes
Memory pokes _mean=7.728409767150907e-08, Memory _std=2.619228663110108e-07
Memory array 70000000 Bytes
Memory pokes _mean=1.1552405357360825e-07, Memory _std=3.966594149394396e-07
Memory array 10000000 Bytes
Memory pokes _mean=2.4815082550048704e-08, Memory _std=5.16313186693563e-09
Memory array 20000000 Bytes
Memory pokes _mean=2.7097702026367097e-08, Memory _std=2.338149218060513e-08
Memory array 30000000 Bytes
Memory pokes _mean=2.855896949

In [11]:
print(data[:2])

[['11:58:28 AM', '3.70926856994629e-08', '1.011728783683005e-07', '3.8277626037597854e-08', '1.1869122555825732e-07', '4.665207862854046e-08', '1.4127540293992928e-07', '6.257963180542028e-08', '2.3750696775786994e-07', '6.790947914123544e-08', '2.2642404076809139e-07', '7.728409767150907e-08', '2.619228663110108e-07', '1.1552405357360825e-07', '3.966594149394396e-07'], ['11:58:31 AM', '2.4815082550048704e-08', '5.16313186693563e-09', '2.7097702026367097e-08', '2.338149218060513e-08', '2.8558969497680552e-08', '1.3663221020703498e-08', '2.8179645538329997e-08', '1.1906649534513436e-08', '2.816057205200187e-08', '1.697054503482415e-08', '2.8516530990600508e-08', '1.172687551175156e-08', '2.9683351516723484e-08', '1.5647355218249535e-08']]


In [12]:
df = pd.DataFrame(data, columns=col_names)
print(df)

          Time           10000000_Mean           10000000_STD  \
0  11:58:28 AM    3.70926856994629e-08  1.011728783683005e-07   
1  11:58:31 AM  2.4815082550048704e-08   5.16313186693563e-09   
2  11:58:33 AM  2.4167299270629707e-08  1.426592420575795e-08   
3  11:58:36 AM  2.5387525558471582e-08  9.420737527426776e-09   
4  11:58:38 AM  2.3809432983398297e-08  5.395615797640838e-09   

            20000000_Mean            20000000_STD           30000000_Mean  \
0  3.8277626037597854e-08  1.1869122555825732e-07   4.665207862854046e-08   
1  2.7097702026367097e-08   2.338149218060513e-08  2.8558969497680552e-08   
2  2.5682687759399317e-08  1.5928818531307707e-08  2.5496959686279193e-08   
3   2.521109580993634e-08   1.955570599121378e-08   2.584743499755852e-08   
4   2.747559547424294e-08  1.8337124481211372e-08  2.6690483093261647e-08   

             30000000_STD           40000000_Mean            40000000_STD  \
0  1.4127540293992928e-07   6.257963180542028e-08  2.3750696775786994

In [39]:
# Convert the result columns (stored as strings) to numbers, then take the mean of each column over all trials:
value_cols = [c for c in df.columns if c != 'Time']
for c in value_cols: 
    df[c] = pd.to_numeric(df[c], downcast='float')
df_means = df[value_cols].mean(axis=0, skipna=True)

print(df_means)

10000000_Mean    2.705441e-08
10000000_STD     2.708366e-08
20000000_Mean    2.874894e-08
20000000_STD     3.917887e-08
30000000_Mean    3.064919e-08
30000000_STD     4.106517e-08
40000000_Mean    3.429169e-08
40000000_STD     5.926063e-08
50000000_Mean    3.582969e-08
50000000_STD     5.761232e-08
60000000_Mean    3.769117e-08
60000000_STD     6.408113e-08
70000000_Mean    4.596157e-08
70000000_STD     9.252914e-08
dtype: float32
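Because each `data_row` stores the numbers as strings, a conversion step is needed before averaging. A minimal alternative sketch, assuming the loop instead appended one dictionary of floats per (trial, array size) to a hypothetical `records` list, uses a long-format table and `groupby`. The values are copied from the first two trials printed above:

```python
import pandas as pd

# One row per (trial, array size), stored as floats rather than strings.
records = [
    {"Time": "11:58:28 AM", "size": 10_000_000, "mean": 3.70926856994629e-08, "std": 1.011728783683005e-07},
    {"Time": "11:58:28 AM", "size": 20_000_000, "mean": 3.8277626037597854e-08, "std": 1.1869122555825732e-07},
    {"Time": "11:58:31 AM", "size": 10_000_000, "mean": 2.4815082550048704e-08, "std": 5.16313186693563e-09},
    {"Time": "11:58:31 AM", "size": 20_000_000, "mean": 2.7097702026367097e-08, "std": 2.338149218060513e-08},
]
long_df = pd.DataFrame(records)

# Average over trials for each array size; no string-to-number conversion is needed.
summary = long_df.groupby("size")[["mean", "std"]].mean()
print(summary)
```

Note that averaging per-trial standard deviations, as both this sketch and the cell above do, is not the same as the standard deviation of all pokes pooled together.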


### Characterize random access to storage

In [41]:
# Collect the averaged means and standard deviations into a NumPy array (alternating mean, std for each array size):
means_stds = np.array(df_means)
print(means_stds)

[2.7054407e-08 2.7083658e-08 2.8748943e-08 3.9178872e-08 3.0649186e-08
 4.1065171e-08 3.4291695e-08 5.9260628e-08 3.5829689e-08 5.7612322e-08
 3.7691166e-08 6.4081128e-08 4.5961571e-08 9.2529142e-08]
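Since the array alternates mean and standard deviation, reshaping it into one row per array size separates the two, which is a possible first step toward the re-plot mentioned in the TO DO below. A sketch; `avg_mean` and `avg_std` are hypothetical names, and the values are those printed above:

```python
import numpy as np

# Column means printed above, in the order mean, std, mean, std, ...
means_stds = np.array([2.7054407e-08, 2.7083658e-08, 2.8748943e-08, 3.9178872e-08, 3.0649186e-08,
                       4.1065171e-08, 3.4291695e-08, 5.9260628e-08, 3.5829689e-08, 5.7612322e-08,
                       3.7691166e-08, 6.4081128e-08, 4.5961571e-08, 9.2529142e-08])
m_legend = ['10MB', '20MB', '30MB', '40MB', '50MB', '60MB', '70MB']

# One row per array size: column 0 is the averaged mean, column 1 the averaged std.
pairs = means_stds.reshape(-1, 2)
avg_mean, avg_std = pairs[:, 0], pairs[:, 1]

for label, mu, sd in zip(m_legend, avg_mean, avg_std):
    print(f"{label}: mean={mu:.3e} s, std={sd:.3e} s")
```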


In [None]:
# TO DO: Re-plot based on the averaged means, standard deviations (need to change code below)

In [42]:
pp = PdfPages('MemoryBlockFigure.pdf')
figure(figsize=(18.5,10.5))

Colors='bgrcmyk'  # The colors for the plot
LineStyles=['-']

for m_i in range(len(m_list)):
    Color=Colors[m_i % len(Colors)]
    PlotTime(TMem[m_i],_mean[m_i],_std[m_i],\
             Color=Color,LS='-',Legend=m_legend[m_i],\
             m_i=m_i)

grid()
legend(fontsize=18)
xlabel('delay (sec)',fontsize=18)
ylabel('1-CDF',fontsize=18)
tick_params(axis='both', which='major', labelsize=16)
tick_params(axis='both', which='minor', labelsize=12)
pp.savefig()
pp.close()

NameError: name 'PlotTime' is not defined

<Figure size 1332x756 with 0 Axes>
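`PlotTime` could not be imported, so its exact behaviour is unknown here. As a hedged sketch of what a 1-CDF (complementary CDF) curve of sorted latencies involves, the hypothetical helper `empirical_ccdf` computes the points; this is not the `PlotTime` implementation. Plotting `y` against `x`, for example with `loglog(x, y)` from the `%pylab` namespace, gives a curve matching the axis labels used above.

```python
import numpy as np

def empirical_ccdf(samples):
    """Return (x, y) where y[i] is the fraction of samples that are >= x[i]."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    # After sorting, n - i of the n samples are >= x[i] (at least that many with ties).
    y = (n - np.arange(n)) / n
    return x, y

# Example with three latencies; in the notebook this would be, e.g., TMem[m_i].
x, y = empirical_ccdf([3.0e-8, 1.0e-8, 2.0e-8])
print(x)
print(y)
```

Using $(n-i)/n$ rather than $1-(i+1)/n$ keeps the last point above zero, so it remains visible on a logarithmic y-axis.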

### Conclusions

We see that for this laptop (an Apple PowerBook), the latency of random pokes is close to $10^{-8}$ seconds for blocks of size up to 1MB. Beyond that, for sizes of 10MB, 100MB and 1GB, the delay is significantly larger.

This makes sense because the L3 cache on this machine is about 6MB, so larger arrays no longer fit entirely in the cache and more of the random pokes have to go to main memory.
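A crude back-of-the-envelope model makes this reasoning concrete. It assumes uniformly random pokes and that at most `cache_bytes` of the array can be resident in the cache at once, and it ignores cache lines, associativity, TLB misses, prefetching and other data competing for the cache. `L3_BYTES` and `expected_hit_fraction` are hypothetical names, and the 6MB figure is taken from the sentence above:

```python
L3_BYTES = 6 * 10**6   # assumed ~6MB L3 cache (decimal MB), as stated above

def expected_hit_fraction(array_bytes, cache_bytes=L3_BYTES):
    """Upper bound on the fraction of uniformly random pokes that can hit the cache."""
    # If the whole array fits, every poke can hit; otherwise at most
    # cache_bytes / array_bytes of the array can be cached at any moment.
    return min(1.0, cache_bytes / array_bytes)

for mb in [1, 10, 70, 1000]:
    print(f"{mb:>5} MB: {expected_hit_fraction(mb * 10**6):.3f}")
```

Under this model the hit fraction drops from 1 at 1MB to 0.6 at 10MB and below 0.1 at 70MB, which is at least qualitatively consistent with the gradual rise in the averaged means printed above.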

In [None]:
Saturday April 18, 2020 9:27:27pm