# File I/O Benchmarks
Loading everything into memory at once is slow, and reading and writing huge datasets with `pickle` is slow too. How do `np.loadtxt`/`np.savetxt` compare with `pickle.load`/`pickle.dump`?

In [1]:
import sys,os
import numpy as np
import pickle
import matplotlib.pyplot as plt
%matplotlib inline

First, let's create three numpy arrays.

In [2]:
array1=np.random.rand(100)
array2=np.linspace(-500,500,10000)
array3=array2**2+5*array2

## Writing to File
Let's start by timing how long it takes to write these arrays to disk.

 * `pickle.dump` with three separate arrays

In [3]:
%%timeit
with open('pickle.array','wb') as f:
    pickle.dump([array1,array2,array3],f)

1000 loops, best of 3: 1.22 ms per loop
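`%%timeit` is an IPython cell magic, so it won't run in a plain Python script. Below is a minimal sketch of a comparable measurement using the standard-library `timeit` module. It assumes the output goes to a temporary directory rather than the working directory. The batch size is arbitrary, and absolute timings will vary by machine.

```python
import os
import pickle
import tempfile
import timeit

import numpy as np

# Same three arrays as in the notebook.
array1 = np.random.rand(100)
array2 = np.linspace(-500, 500, 10000)
array3 = array2**2 + 5 * array2

def dump_three_arrays(path):
    """Pickle the three arrays as one list, as in the %%timeit cell above."""
    with open(path, 'wb') as f:
        pickle.dump([array1, array2, array3], f)

number = 100  # calls per batch (arbitrary choice for this sketch)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'pickle.array')
    # timeit.repeat returns the total time of each batch of `number` calls;
    # the fastest batch divided by `number` mirrors the "best of 3" figure.
    batches = timeit.repeat(lambda: dump_three_arrays(path), number=number, repeat=3)
    best_ms = min(batches) / number * 1e3
    size = os.path.getsize(path)

print(f"best of 3: {best_ms:.3f} ms per loop, {size} bytes written")
```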


 * `pickle.dump` with a single dictionary (inside a one-element list)

In [4]:
%%timeit
with open('pickle.dict','wb') as f:
    pickle.dump([{'a1':array1,'a2':array2,'a3':array3}],f)

1000 loops, best of 3: 1.22 ms per loop
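Because this cell wraps the dictionary in a list, `pickle.load` will return a one-element list, not the dictionary itself. The following in-memory check reproduces what the cell writes. It uses `io.BytesIO` instead of a file on disk, an assumption made only to keep the sketch self-contained:

```python
import io
import pickle

import numpy as np

array1 = np.random.rand(100)
array2 = np.linspace(-500, 500, 10000)
array3 = array2**2 + 5 * array2

# Serialize exactly what the cell above writes: a list holding one dict.
buf = io.BytesIO()
pickle.dump([{'a1': array1, 'a2': array2, 'a3': array3}], buf)
buf.seek(0)

nad = pickle.load(buf)
# nad is a one-element list; the dictionary itself is nad[0].
arrays = nad[0]
print(type(nad).__name__, len(nad), sorted(arrays))  # list 1 ['a1', 'a2', 'a3']
assert np.array_equal(arrays['a3'], array3)  # pickle round-trips the data exactly
```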


 * `np.savetxt` for three separate files

In [5]:
%%timeit
np.savetxt('numpy.a1',array1)
np.savetxt('numpy.a2',array2)
np.savetxt('numpy.a3',array3)

10 loops, best of 3: 108 ms per loop
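One plausible reason for the gap is that `np.savetxt` converts every value to a formatted string (`'%.18e'` by default), while `pickle` writes the array's raw 8-byte-per-value binary buffer. The sketch below compares the two file sizes for `array2`. The temporary files and the file name `pickle.a2` are assumptions made for the example:

```python
import os
import pickle
import tempfile

import numpy as np

array2 = np.linspace(-500, 500, 10000)

with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, 'numpy.a2')
    pkl_path = os.path.join(tmp, 'pickle.a2')  # hypothetical name

    # np.savetxt formats every float as text, one value per line for a 1-D array.
    np.savetxt(txt_path, array2)
    # pickle stores the raw float64 buffer plus a small header.
    with open(pkl_path, 'wb') as f:
        pickle.dump(array2, f)

    with open(txt_path) as f:
        first_line = f.readline().rstrip('\n')

    txt_size = os.path.getsize(txt_path)
    pkl_size = os.path.getsize(pkl_path)

print(first_line)  # -5.000000000000000000e+02
print(f"text: {txt_size} bytes, pickle: {pkl_size} bytes")
```

Each text line holds roughly 25 characters per value versus 8 bytes in the binary form, and every value also has to be formatted on write and parsed on read.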


## Reading from File
Now, more importantly, how long does it take to read from the file?

 * `pickle.load` with three separate arrays

In [7]:
%%timeit
with open('pickle.array','rb') as f:
    na1,na2,na3=pickle.load(f)

The slowest run took 10.04 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 56.7 µs per loop


 * `pickle.load` with a single dictionary (returned inside a one-element list)

In [8]:
%%timeit
with open('pickle.dict','rb') as f:
    nad=pickle.load(f)

The slowest run took 8.99 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 58.4 µs per loop


 * `np.loadtxt` for three separate files

In [10]:
%%timeit
na1,na2,na3=np.loadtxt('numpy.a1'),np.loadtxt('numpy.a2'),np.loadtxt('numpy.a3')

10 loops, best of 3: 180 ms per loop
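The same standalone approach works for the read side. Before comparing speeds, it is also worth confirming that both formats return the same data. This sketch covers `array2` only. The single-array scope, the temporary files, and the `timeit` settings are all assumptions chosen to keep it short:

```python
import os
import pickle
import tempfile
import timeit

import numpy as np

array2 = np.linspace(-500, 500, 10000)

def best_time_per_call(func, number=20, repeat=3):
    """Best per-call time in seconds over `repeat` batches of `number` calls."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number

def load_pickle(path):
    with open(path, 'rb') as f:
        return pickle.load(f)

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, 'pickle.a2')
    txt_path = os.path.join(tmp, 'numpy.a2')
    with open(pkl_path, 'wb') as f:
        pickle.dump(array2, f)
    np.savetxt(txt_path, array2)

    # Sanity check: both formats should give back the same numbers.
    from_pickle = load_pickle(pkl_path)
    from_text = np.loadtxt(txt_path)

    t_pickle = best_time_per_call(lambda: load_pickle(pkl_path))
    t_text = best_time_per_call(lambda: np.loadtxt(txt_path))

print(f"pickle.load: {t_pickle * 1e6:.1f} µs, np.loadtxt: {t_text * 1e3:.2f} ms")
print(f"ratio: {t_text / t_pickle:.0f}x")
```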


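In these runs, `pickle` is roughly 90 times faster than `np.savetxt` for writing (1.22 ms vs. 108 ms) and more than 3,000 times faster than `np.loadtxt` for reading (about 57 µs vs. 180 ms). Since we'll only be reading and writing small files (though many of them!), these costs should scale roughly linearly with the number of files. It seems that binary files only get really slow when you impose a lot of structure on them.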
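The linear-scaling expectation can be checked directly by writing a growing number of small pickle files and watching the time per file. This is a rough sketch. It assumes 100-element payloads like `array1` and uses hypothetical file names `small_<i>.pkl`. Filesystem caching can make individual runs noisy, much like the "slowest run" warnings above.

```python
import os
import pickle
import tempfile
import time

import numpy as np

small = np.random.rand(100)  # hypothetical "small file" payload, same size as array1

def write_many(folder, n_files):
    """Write n_files small pickles and return the elapsed wall time in seconds."""
    start = time.perf_counter()
    for i in range(n_files):
        with open(os.path.join(folder, f'small_{i}.pkl'), 'wb') as f:
            pickle.dump(small, f)
    return time.perf_counter() - start

results = {}
with tempfile.TemporaryDirectory() as tmp:
    for n in (100, 200, 400, 800):
        results[n] = write_many(tmp, n)

for n, t in results.items():
    # If scaling is linear, the time per file should stay roughly constant.
    print(f"{n:4d} files: {t * 1e3:7.2f} ms total, {t / n * 1e6:6.1f} µs per file")
```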