created by Ignacio Oguiza - email: timeseriesAI@gmail.com

## How to efficiently work with (very large) Numpy Arrays? 👷‍♀️

Sometimes we need to work with some very large numpy arrays that don't fit in memory. I'd like to share with you a way that works well for me.

## Import libraries 📚

In [1]:
# ## NOTE: UNCOMMENT AND RUN THIS CELL IF YOU NEED TO INSTALL/ UPGRADE TSAI
# stable = False # True: latest version from github, False: stable version in pip
# if stable: 
#     !pip install -Uqq tsai
# else:      
#     !pip install -Uqq git+https://github.com/timeseriesAI/tsai.git

# ## NOTE: REMEMBER TO RESTART YOUR RUNTIME ONCE THE INSTALLATION IS FINISHED



In [2]:
from tsai.all import *
print('tsai       :', tsai.__version__)
print('fastai     :', fastai.__version__)
print('fastcore   :', fastcore.__version__)
print('torch      :', torch.__version__)



tsai       : 0.2.15
fastai     : 2.2.5
fastcore   : 1.3.19
torch      : 1.7.0+cu101


## Introduction 🤝

I normally work with time series data. I made the decision to use numpy arrays to store my data since they can easily handle multiple dimensions, and are very efficient.

But sometimes datasets are really big (many GBs) and don't fit in memory. So I started looking around and found something that works very well: [**np.memmap**](https://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.html). Conceptually they work as arrays on disk, and that's what I often call them.

np.memmap creates a map to a numpy array you have previously saved on disk, so that you can efficiently access small segments of those (small or large) files on disk, without reading the entire file into memory. And that's exactly what we need in deep learning: to be able to quickly create a batch in memory, without reading the entire file (that is stored on disk). 

The best analogy I've found is image files. You may have a very large dataset on disk (that far exceeds your RAM). In order to create your DL datasets, what you pass are the paths to each individual file, so that you can then load a few images and create a batch on demand.

You can view np.memmap as the path collection that can be used to load numpy data on demand when you need to create a batch.

So let's see how you can work with larger than RAM arrays on disk.

On my laptop I have only 8GB of RAM.

I will try to demonstrate how you can handle a 10 GB numpy array dataset in an efficient way. 

## Create and save a larger-than-memory array 🥴

I will now create a large numpy array that doesn't fit in memory. 
Since I don't have enough RAM, I'll create an empty array on disk, and then load data in chunks that fit in memory.

⚠️ If you want to experiment with large datasets, you may uncomment and run this code. **It will create a ~10GB file on your disk** (we'll delete it at the end of this notebook).

On my laptop it took around **2 mins to create the data.**

In [49]:
# path = Path('./data')
# X = create_empty_array((100_000, 50, 512), fname='X_on_disk', path=path, mode='r+')

# chunksize = 10_000
# pbar = progress_bar(range(math.ceil(len(X) / chunksize)))
# start = 0
# for i in pbar:
#     end = start + chunksize
#     X[start:end] = np.random.rand(chunksize, X.shape[-2], X.shape[-1])
#     start = end

# # I will create a smaller array. Since this fits in memory, I don't need to use a memmap
# y_fn = path/'y_on_disk.npy'
# y = np.random.randint(0, 10, X.shape[0])
# labels = np.array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
# np.save(y_fn, labels[y])

# del X, y
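The cell above relies on tsai's `create_empty_array` helper. As a rough sketch of the same idea using only numpy, the snippet below creates a `.npy` file with `np.lib.format.open_memmap` and fills it chunk by chunk. The function name `create_array_on_disk`, the temporary directory and the small demo shape are assumptions made for illustration, not part of the original notebook.

```python
import tempfile
from pathlib import Path

import numpy as np


def create_array_on_disk(fname, shape, dtype=np.float32, chunksize=1_000, seed=0):
    """Create a .npy file with the given shape and fill it with random data in chunks.

    Hypothetical helper: dtype must be np.float32 or np.float64 because
    Generator.random only supports those two.
    """
    rng = np.random.default_rng(seed)
    # open_memmap writes a valid .npy header and returns a writable np.memmap
    X = np.lib.format.open_memmap(str(fname), mode='w+', dtype=dtype, shape=shape)
    for start in range(0, shape[0], chunksize):
        end = min(start + chunksize, shape[0])  # the last chunk may be shorter
        X[start:end] = rng.random((end - start, *shape[1:]), dtype=dtype)
    X.flush()  # make sure all chunks are written to disk
    del X
    return str(fname)


# Small demo (about 1.3 MB) written to a temporary directory
tmp_dir = Path(tempfile.mkdtemp())
demo_fname = create_array_on_disk(tmp_dir / 'X_demo.npy', shape=(1_000, 5, 64), chunksize=300)
X_demo = np.load(demo_fname, mmap_mode='r')
print(type(X_demo).__name__, X_demo.shape, X_demo.dtype)
```

Only one chunk is ever held in RAM, so the same loop should scale to arrays much larger than memory by changing `shape` and `chunksize`.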

Ok. So let's check the size of these files on disk.

In [50]:
print(f'X array: {os.path.getsize("./data/X_on_disk.npy"):12} bytes ({bytes2GB(os.path.getsize("./data/X_on_disk.npy")):3.2f} GB)')
print(f'y array: {os.path.getsize("./data/y_on_disk.npy"):12} bytes ({bytes2GB(os.path.getsize("./data/y_on_disk.npy")):3.2f} GB)')

X array:  10240000128 bytes (9.54 GB)
y array:       400128 bytes (0.00 GB)
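These sizes can be checked by hand. The sketch below assumes X is stored as float32 (as the outputs later in this notebook show), that each label is a 1-character unicode string (4 bytes in numpy), that the .npy header takes 128 bytes (the value implied by the sizes reported above), and that bytes2GB divides by 1024³.

```python
import numpy as np

n_samples, n_vars, seq_len = 100_000, 50, 512
header_bytes = 128  # assumption: header size implied by the reported file sizes

# number of elements times bytes per element, plus the .npy header
X_bytes = n_samples * n_vars * seq_len * np.dtype(np.float32).itemsize + header_bytes
y_bytes = n_samples * np.dtype('<U1').itemsize + header_bytes

print(f'X array: {X_bytes:12} bytes ({X_bytes / 2**30:3.2f} GB)')
print(f'y array: {y_bytes:12} bytes ({y_bytes / 2**30:3.2f} GB)')
```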


## Load an array on disk (np.memmap) 🧠

Remember I only have 8 GB of RAM on this laptop, so I couldn't load these datasets in memory.

☣️ Actually I accidentally loaded the "X_on_disk.npy" file, and my laptop crashed so I had to reboot it!

So let's now load data as arrays on disk (np.memmap). The way to do it is super simple, and very efficient. You just do it as you would with a normal array, but add an mmap_mode.

There are 4 modes: 

- ‘r’	Open existing file for reading only.
- ‘r+’	Open existing file for reading and writing.
- ‘w+’	Create or overwrite existing file for reading and writing.
- ‘c’	Copy-on-write: assignments affect data in memory, but changes are not saved to disk. The file on disk is read-only.

I normally use mode 'c' since I want to be able to make changes to data in memory (transforms for example), without affecting data on disk. This is the same approach you use with image files on disk, which are just read and then modified in memory, without changing the file on disk.

But if you also want to be able to modify data on disk, you can load the array with mmap_mode='r+'.

In [51]:
X_on_disk = np.load('./data/X_on_disk.npy', mmap_mode='c')
y_on_disk = np.load('./data/y_on_disk.npy', mmap_mode='c')

**Fast load**: it only takes a few ms to "load" a memory map to a 10 GB array on disk.

In fact, the only thing that is loaded is a map to the array stored on disk. That's why it's so fast.
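To make the difference between the modes concrete, here is a small sketch on a tiny temporary file (the file name and shape are made up for this example). It writes to the same file through 'c', 'r+' and 'r' maps and then checks what actually reached the disk.

```python
import tempfile
from pathlib import Path

import numpy as np

fname = str(Path(tempfile.mkdtemp()) / 'modes_demo.npy')
np.save(fname, np.zeros((4, 3), dtype=np.float32))

# 'c' (copy-on-write): the change lives in memory only
a = np.load(fname, mmap_mode='c')
a[0] = 1
del a
after_c = np.load(fname)[0]       # reloaded from disk: still zeros

# 'r+': the change is written back to the file
b = np.load(fname, mmap_mode='r+')
b[0] = 2
b.flush()
del b
after_rplus = np.load(fname)[0]   # reloaded from disk: now twos

# 'r': read-only, so assignment raises an error
c = np.load(fname, mmap_mode='r')
try:
    c[0] = 3
    read_only_error = None
except ValueError as e:
    read_only_error = e

print(after_c, after_rplus, type(read_only_error).__name__)
```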

## Arrays on disk: main features 📀

### Very limited RAM usage

In [52]:
print(X_on_disk.shape, y_on_disk.shape)

(100000, 50, 512) (100000,)


In [53]:
print(f'X array on disk: {sys.getsizeof(X_on_disk):12} bytes ({bytes2GB(sys.getsizeof(X_on_disk)):3.3f} GB)')
print(f'y array on disk: {sys.getsizeof(y_on_disk):12} bytes ({bytes2GB(sys.getsizeof(y_on_disk)):3.3f} GB)')

X array on disk:          152 bytes (0.000 GB)
y array on disk:          120 bytes (0.000 GB)


**152 bytes of RAM for a 10GB array**. This is the great benefit of arrays on disk.

Arrays on disk barely use any RAM until they are sliced and the slice is converted into a np.array or a tensor.

This is equivalent to the size of file paths in images (very limited) compared to the files themselves (actual images). 

### Types

np.memmap is a subclass of np.ndarray

In [54]:
isinstance(X_on_disk, np.ndarray)

True

In [55]:
type(X_on_disk)

numpy.memmap

### Operations

With np.memmap you can perform the same operations you would with a normal numpy array. 
The most common operations you will perform in deep learning are:

- slicing
- calculating stats: mean and std
- scaling (using normalize or standardize)
- transformation into a tensor

Once you get the array on disk slice, you'll convert it into a tensor, move it to a GPU and perform operations there.


⚠️ You need to be careful though not to convert the entire np.memmap to an array/ tensor if it's larger than your RAM. This will crash your computer, and you will have to reboot!

**DON'T DO THIS:  torch.from_numpy(X) or np.array(X)** unless you have enough RAM.

To avoid issues during testing, I created a smaller array on disk (that I can store in memory). When I want to test something I test it with that array first. It's important to always verify that the output type of your operations is np.memmap, which means data is still on disk.
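One way to protect yourself is to check how many bytes a conversion would need before doing it. The helper below is only a sketch: the name `to_array_if_fits` and the 1 GiB default limit are arbitrary choices for illustration. It relies on `nbytes`, which a memmap reports without reading the data.

```python
import tempfile
from pathlib import Path

import numpy as np


def to_array_if_fits(x, max_bytes=2**30):
    """Copy x into RAM only if it needs at most max_bytes, otherwise raise MemoryError."""
    if x.nbytes > max_bytes:
        raise MemoryError(
            f'{x.nbytes / 2**30:.2f} GiB needed, limit is {max_bytes / 2**30:.2f} GiB')
    return np.array(x)  # np.array returns a plain in-memory ndarray


# Tiny demo file so that the example is safe to run anywhere
fname = str(Path(tempfile.mkdtemp()) / 'guard_demo.npy')
np.save(fname, np.ones((100, 5, 8), dtype=np.float32))
X = np.load(fname, mmap_mode='c')

small = to_array_if_fits(X[:10])         # a 10-sample slice (1600 bytes) is fine
try:
    to_array_if_fits(X, max_bytes=1_000)  # the whole array exceeds this tiny limit
    refused = False
except MemoryError as e:
    refused = True
    print('refused:', e)
```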

#### Slicing

To ensure you don't bring the entire array into memory (which may crash your computer) you can always work with slices of data, which is by the way how fastai works.

If you use mode 'c' (as we do here) you can grab a sample and make changes to it, but this won't modify data on disk.

In [56]:
x = X_on_disk[0]
x

memmap([[0.34803748, 0.04089686, 0.7991264 , ..., 0.37770525, 0.8188598 ,
         0.17301634],
        [0.6830333 , 0.6505463 , 0.37935442, ..., 0.8085311 , 0.04008082,
         0.2903932 ],
        [0.9397006 , 0.89364505, 0.16049282, ..., 0.15007935, 0.345074  ,
         0.1874511 ],
        ...,
        [0.0391452 , 0.2724889 , 0.3282523 , ..., 0.9183814 , 0.67118037,
         0.39444962],
        [0.74068576, 0.16523811, 0.18562464, ..., 0.18377279, 0.3053631 ,
         0.3580035 ],
        [0.27491328, 0.7390123 , 0.9457232 , ..., 0.18275526, 0.6705215 ,
         0.9529823 ]], dtype=float32)

It's important to note that **when we perform a math operation on a np.memmap (add, subtract, ...) the output is a np.array, and no longer a np.memmap.**

⚠️ Remember you don't want to run this type of operation with a memmap larger than your RAM!! That's why I do it with a slice.

In [57]:
x = X_on_disk[0] + 1
x

array([[1.3480375, 1.0408969, 1.7991264, ..., 1.3777052, 1.8188598,
        1.1730163],
       [1.6830332, 1.6505463, 1.3793545, ..., 1.808531 , 1.0400808,
        1.2903932],
       [1.9397006, 1.893645 , 1.1604928, ..., 1.1500794, 1.3450739,
        1.1874511],
       ...,
       [1.0391452, 1.2724888, 1.3282523, ..., 1.9183815, 1.6711804,
        1.3944496],
       [1.7406857, 1.1652381, 1.1856246, ..., 1.1837728, 1.305363 ,
        1.3580035],
       [1.2749133, 1.7390122, 1.9457232, ..., 1.1827552, 1.6705215,
        1.9529823]], dtype=float32)

In [58]:
x = torch.from_numpy(X_on_disk[0])
x2 = x + 1
x2

tensor([[1.3480, 1.0409, 1.7991,  ..., 1.3777, 1.8189, 1.1730],
        [1.6830, 1.6505, 1.3794,  ..., 1.8085, 1.0401, 1.2904],
        [1.9397, 1.8936, 1.1605,  ..., 1.1501, 1.3451, 1.1875],
        ...,
        [1.0391, 1.2725, 1.3283,  ..., 1.9184, 1.6712, 1.3944],
        [1.7407, 1.1652, 1.1856,  ..., 1.1838, 1.3054, 1.3580],
        [1.2749, 1.7390, 1.9457,  ..., 1.1828, 1.6705, 1.9530]])

As you can see, this doesn't affect the original np.memmap

In [59]:
X_on_disk[0]

memmap([[0.34803748, 0.04089686, 0.7991264 , ..., 0.37770525, 0.8188598 ,
         0.17301634],
        [0.6830333 , 0.6505463 , 0.37935442, ..., 0.8085311 , 0.04008082,
         0.2903932 ],
        [0.9397006 , 0.89364505, 0.16049282, ..., 0.15007935, 0.345074  ,
         0.1874511 ],
        ...,
        [0.0391452 , 0.2724889 , 0.3282523 , ..., 0.9183814 , 0.67118037,
         0.39444962],
        [0.74068576, 0.16523811, 0.18562464, ..., 0.18377279, 0.3053631 ,
         0.3580035 ],
        [0.27491328, 0.7390123 , 0.9457232 , ..., 0.18275526, 0.6705215 ,
         0.9529823 ]], dtype=float32)

You can slice an array on disk by any axis, and it'll return a memmap. Slicing by any axis is very fast.

In [60]:
X_on_disk[0]

memmap([[0.34803748, 0.04089686, 0.7991264 , ..., 0.37770525, 0.8188598 ,
         0.17301634],
        [0.6830333 , 0.6505463 , 0.37935442, ..., 0.8085311 , 0.04008082,
         0.2903932 ],
        [0.9397006 , 0.89364505, 0.16049282, ..., 0.15007935, 0.345074  ,
         0.1874511 ],
        ...,
        [0.0391452 , 0.2724889 , 0.3282523 , ..., 0.9183814 , 0.67118037,
         0.39444962],
        [0.74068576, 0.16523811, 0.18562464, ..., 0.18377279, 0.3053631 ,
         0.3580035 ],
        [0.27491328, 0.7390123 , 0.9457232 , ..., 0.18275526, 0.6705215 ,
         0.9529823 ]], dtype=float32)

In [61]:
X_on_disk[:, 0]

memmap([[0.34803748, 0.04089686, 0.7991264 , ..., 0.37770525, 0.8188598 ,
         0.17301634],
        [0.9259249 , 0.514623  , 0.50216776, ..., 0.8823718 , 0.561646  ,
         0.25591376],
        [0.06298492, 0.10742943, 0.43376994, ..., 0.01061168, 0.3993792 ,
         0.5877482 ],
        ...,
        [0.36538476, 0.6251516 , 0.13214637, ..., 0.56368643, 0.03602772,
         0.02040654],
        [0.7697917 , 0.06593986, 0.12318378, ..., 0.24622898, 0.4352764 ,
         0.8795757 ],
        [0.30351886, 0.05458342, 0.18446152, ..., 0.00465104, 0.35671628,
         0.12464925]], dtype=float32)

However, bear in mind that if you index with a list of indices, as in X_on_disk[[0,1]], the output will be a regular numpy array held in memory. This is important as it will use more RAM. 

In [62]:
X_on_disk[[0,1]]

array([[[0.34803748, 0.04089686, 0.7991264 , ..., 0.37770525,
         0.8188598 , 0.17301634],
        [0.6830333 , 0.6505463 , 0.37935442, ..., 0.8085311 ,
         0.04008082, 0.2903932 ],
        [0.9397006 , 0.89364505, 0.16049282, ..., 0.15007935,
         0.345074  , 0.1874511 ],
        ...,
        [0.0391452 , 0.2724889 , 0.3282523 , ..., 0.9183814 ,
         0.67118037, 0.39444962],
        [0.74068576, 0.16523811, 0.18562464, ..., 0.18377279,
         0.3053631 , 0.3580035 ],
        [0.27491328, 0.7390123 , 0.9457232 , ..., 0.18275526,
         0.6705215 , 0.9529823 ]],

       [[0.9259249 , 0.514623  , 0.50216776, ..., 0.8823718 ,
         0.561646  , 0.25591376],
        [0.81914663, 0.79375005, 0.65909016, ..., 0.884909  ,
         0.23646063, 0.5160194 ],
        [0.99880844, 0.6775859 , 0.16700691, ..., 0.84936655,
         0.051814  , 0.20492136],
        ...,
        [0.4639562 , 0.41425797, 0.49373862, ..., 0.58005303,
         0.17869665, 0.97369766],
        [0.5

Unless you use a slice with consecutive indices like this:

In [63]:
X_on_disk[:2]

memmap([[[0.34803748, 0.04089686, 0.7991264 , ..., 0.37770525,
          0.8188598 , 0.17301634],
         [0.6830333 , 0.6505463 , 0.37935442, ..., 0.8085311 ,
          0.04008082, 0.2903932 ],
         [0.9397006 , 0.89364505, 0.16049282, ..., 0.15007935,
          0.345074  , 0.1874511 ],
         ...,
         [0.0391452 , 0.2724889 , 0.3282523 , ..., 0.9183814 ,
          0.67118037, 0.39444962],
         [0.74068576, 0.16523811, 0.18562464, ..., 0.18377279,
          0.3053631 , 0.3580035 ],
         [0.27491328, 0.7390123 , 0.9457232 , ..., 0.18275526,
          0.6705215 , 0.9529823 ]],

        [[0.9259249 , 0.514623  , 0.50216776, ..., 0.8823718 ,
          0.561646  , 0.25591376],
         [0.81914663, 0.79375005, 0.65909016, ..., 0.884909  ,
          0.23646063, 0.5160194 ],
         [0.99880844, 0.6775859 , 0.16700691, ..., 0.84936655,
          0.051814  , 0.20492136],
         ...,
         [0.4639562 , 0.41425797, 0.49373862, ..., 0.58005303,
          0.17869665, 0.9

This continues to be a memmap
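A quick way to verify which operations keep data on disk is to print the output type of each one. The sketch below does this on a tiny temporary file (created just for this example), so it is safe to run on any machine.

```python
import tempfile
from pathlib import Path

import numpy as np

fname = str(Path(tempfile.mkdtemp()) / 'types_demo.npy')
np.save(fname, np.arange(24, dtype=np.float32).reshape(4, 3, 2))
X = np.load(fname, mmap_mode='c')

results = {
    'X[0]': X[0],            # single index
    'X[:2]': X[:2],          # slice with consecutive indices
    'X[:, 0]': X[:, 0],      # slice along another axis
    'X[[0, 2]]': X[[0, 2]],  # list of indices: copies data into memory
    'X[0] + 1': X[0] + 1,    # math operation: result lives in memory
}
for expr, out in results.items():
    print(f'{expr:10} -> {type(out).__name__}')
```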

There's a trick we can use to keep the data on disk even when selecting multiple, non-consecutive items, making use of the excellent new L class in fastai. It is to **itemify** the np.memmap/s. 

In [64]:
def itemify(*x): return L(*x).zip()
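If you don't have fastai installed, a plain Python approximation of the same idea is to zip the arrays into a list of per-sample tuples. This is only a sketch: `itemify_plain` is a hypothetical name, and a plain list doesn't support L's multi-index selection, so several items are picked with a list comprehension instead.

```python
import tempfile
from pathlib import Path

import numpy as np


def itemify_plain(*arrays):
    """Return a list with one tuple per sample, holding one element from each array."""
    return list(zip(*arrays))


# Tiny demo arrays saved to a temporary directory
tmp_dir = Path(tempfile.mkdtemp())
np.save(str(tmp_dir / 'X.npy'), np.arange(24, dtype=np.float32).reshape(4, 3, 2))
np.save(str(tmp_dir / 'y.npy'), np.array(['a', 'b', 'c', 'd']))
X = np.load(str(tmp_dir / 'X.npy'), mmap_mode='c')
y = np.load(str(tmp_dir / 'y.npy'), mmap_mode='c')

items = itemify_plain(X, y)
selected = [items[i] for i in (0, 2)]  # non-consecutive samples, X part still on disk
print(len(items), [type(x).__name__ for x, _ in selected], [str(label) for _, label in selected])
```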

To itemify one or several np.memmap/s is very fast. Let's see how long it takes with a 10 GB array.

In [65]:
X_on_disk_as_items = itemify(X_on_disk)

5 seconds to return individual records on disk! Bear in mind you only need to perform this once!

So now, you can select multiple items at the same time, and they will all still be on disk:

In [66]:
X_on_disk_as_items[0,1]

(#2) [(memmap([[0.34803748, 0.04089686, 0.7991264 , ..., 0.37770525, 0.8188598 ,
         0.17301634],
        [0.6830333 , 0.6505463 , 0.37935442, ..., 0.8085311 , 0.04008082,
         0.2903932 ],
        [0.9397006 , 0.89364505, 0.16049282, ..., 0.15007935, 0.345074  ,
         0.1874511 ],
        ...,
        [0.0391452 , 0.2724889 , 0.3282523 , ..., 0.9183814 , 0.67118037,
         0.39444962],
        [0.74068576, 0.16523811, 0.18562464, ..., 0.18377279, 0.3053631 ,
         0.3580035 ],
        [0.27491328, 0.7390123 , 0.9457232 , ..., 0.18275526, 0.6705215 ,
         0.9529823 ]], dtype=float32),),(memmap([[0.9259249 , 0.514623  , 0.50216776, ..., 0.8823718 , 0.561646  ,
         0.25591376],
        [0.81914663, 0.79375005, 0.65909016, ..., 0.884909  , 0.23646063,
         0.5160194 ],
        [0.99880844, 0.6775859 , 0.16700691, ..., 0.84936655, 0.051814  ,
         0.20492136],
        ...,
        [0.4639562 , 0.41425797, 0.49373862, ..., 0.58005303, 0.17869665,
         0

You can also itemify several items at once: X and y for example. When you slice the list, you'll get tuples.

In [67]:
Xy_on_disk_as_items = itemify(X_on_disk, y_on_disk)

In [68]:
Xy_on_disk_as_items[0, 1]

(#2) [(memmap([[0.34803748, 0.04089686, 0.7991264 , ..., 0.37770525, 0.8188598 ,
         0.17301634],
        [0.6830333 , 0.6505463 , 0.37935442, ..., 0.8085311 , 0.04008082,
         0.2903932 ],
        [0.9397006 , 0.89364505, 0.16049282, ..., 0.15007935, 0.345074  ,
         0.1874511 ],
        ...,
        [0.0391452 , 0.2724889 , 0.3282523 , ..., 0.9183814 , 0.67118037,
         0.39444962],
        [0.74068576, 0.16523811, 0.18562464, ..., 0.18377279, 0.3053631 ,
         0.3580035 ],
        [0.27491328, 0.7390123 , 0.9457232 , ..., 0.18275526, 0.6705215 ,
         0.9529823 ]], dtype=float32), 'h'),(memmap([[0.9259249 , 0.514623  , 0.50216776, ..., 0.8823718 , 0.561646  ,
         0.25591376],
        [0.81914663, 0.79375005, 0.65909016, ..., 0.884909  , 0.23646063,
         0.5160194 ],
        [0.99880844, 0.6775859 , 0.16700691, ..., 0.84936655, 0.051814  ,
         0.20492136],
        ...,
        [0.4639562 , 0.41425797, 0.49373862, ..., 0.58005303, 0.17869665,
      

Slicing is very fast, even if there are 100,000 samples.

In [69]:
# axis 0
%timeit X_on_disk[0]

The slowest run took 21.11 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.93 µs per loop


In [71]:
# axis 1
%timeit X_on_disk[:, 0]

The slowest run took 16.79 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 1.96 µs per loop


In [70]:
# axis 2
%timeit X_on_disk[..., 0]

The slowest run took 19.08 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.94 µs per loop


In [72]:
# axis 0,1
%timeit X_on_disk[0, 0]

The slowest run took 20.50 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.97 µs per loop


To compare how fast you can slice a np.memmap, let's create a smaller array that I can fit in memory (X_in_memory_small). It has 10 times fewer samples than the one on disk (about 2 GB, since np.random.rand returns float64).

In [73]:
X_in_memory_small = np.random.rand(10000, 50, 512)

In [74]:
%timeit X_in_memory_small[0]

The slowest run took 46.56 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 183 ns per loop


Let's create the same array on disk. It's super simple:

In [75]:
np.save('./data/X_on_disk_small.npy', X_in_memory_small)
X_on_disk_small = np.load('./data/X_on_disk_small.npy', mmap_mode='c')

In [76]:
%timeit X_on_disk_small[0]

The slowest run took 20.41 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.95 µs per loop


This is approx. 10 times slower than slicing an array in memory, although it's still pretty fast.

However, if we use the itemified version, it's much faster:

In [77]:
%timeit X_on_disk_as_items[0]

The slowest run took 15.68 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 877 ns per loop


This is much better! So now you can access 1 or multiple items on disk with pretty good performance.

#### Calculating stats: mean and std

Another benefit of using arrays on disk is that you can calculate the mean and std deviation of the entire dataset. 

It takes a considerable time since the array is very big (10GB), but it's feasible. On my laptop:

- mean (0.4999966):  1 min 45 s
- std  (0.2886839): 11 min 43 s 

If you need them, you could calculate these stats once, and store the results (similar to ImageNet stats).
However, you usually need to calculate these metrics for labeled (train) datasets, which tend to be smaller.

In [79]:
# X_mean = X_on_disk.mean()
# X_mean

In [38]:
# X_std = X_on_disk.std()
# X_std
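If you prefer to control how much data is in RAM at any time, the stats can also be accumulated chunk by chunk. The function below is a sketch of that alternative, not what the cells above do: `chunked_mean_std` and its `chunksize` default are assumptions. It keeps a running sum and sum of squares in float64 and returns the population std (the same default as np.std). The sum-of-squares formula can lose precision when the mean is much larger than the std, which should not be a concern for data in [0, 1).

```python
import math
import tempfile
from pathlib import Path

import numpy as np


def chunked_mean_std(x, chunksize=1_000):
    """Mean and population std of x, reading only chunksize samples at a time."""
    n, total, total_sq = 0, 0.0, 0.0
    for start in range(0, len(x), chunksize):
        # only this chunk is copied into memory (as float64 for accuracy)
        chunk = np.asarray(x[start:start + chunksize], dtype=np.float64)
        n += chunk.size
        total += chunk.sum()
        total_sq += np.square(chunk).sum()
    mean = total / n
    std = math.sqrt(max(total_sq / n - mean ** 2, 0.0))
    return mean, std


# Small demo array on disk
fname = str(Path(tempfile.mkdtemp()) / 'stats_demo.npy')
np.save(fname, np.random.default_rng(0).random((1_000, 5, 16), dtype=np.float32))
X = np.load(fname, mmap_mode='r')

X_mean, X_std = chunked_mean_std(X, chunksize=300)
print(X_mean, X_std)
```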

#### Conversion into a tensor

Conversion from an array on disk slice into a tensor is also very fast:

In [80]:
torch.from_numpy(X_on_disk[0])

tensor([[0.3480, 0.0409, 0.7991,  ..., 0.3777, 0.8189, 0.1730],
        [0.6830, 0.6505, 0.3794,  ..., 0.8085, 0.0401, 0.2904],
        [0.9397, 0.8936, 0.1605,  ..., 0.1501, 0.3451, 0.1875],
        ...,
        [0.0391, 0.2725, 0.3283,  ..., 0.9184, 0.6712, 0.3944],
        [0.7407, 0.1652, 0.1856,  ..., 0.1838, 0.3054, 0.3580],
        [0.2749, 0.7390, 0.9457,  ..., 0.1828, 0.6705, 0.9530]])

In [81]:
X_on_disk_small_0 = X_on_disk_small[0]
X_in_memory_small_0 = X_in_memory_small[0]

In [82]:
%timeit torch.from_numpy(X_on_disk_small_0)

The slowest run took 44.11 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.13 µs per loop


In [83]:
%timeit torch.from_numpy(X_in_memory_small_0 )

The slowest run took 46.27 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.16 µs per loop


So converting a numpy.memmap into a tensor takes the same time as converting a np.array in memory.

#### Combined operations: slicing plus conversion to tensor

Let's now check performance of the combined process: slicing plus conversion to a tensor. Based on what we've seen there are 3 options: 

- slice np.array in memory + conversion to tensor
- slice np.memmap on disk + conversion to tensor
- slice itemified np.memmap + conversion to tensor

In [84]:
%timeit torch.from_numpy(X_in_memory_small[0])

The slowest run took 76.50 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.44 µs per loop


In [85]:
%timeit torch.from_numpy(X_on_disk_small[0])

The slowest run took 24.08 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.89 µs per loop


In [86]:
X_on_disk_small_as_items = itemify(X_on_disk_small)

In [87]:
%timeit torch.from_numpy(X_on_disk_small_as_items[0][0])

The slowest run took 23.52 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.35 µs per loop


So this last method is **almost as fast as having the array in memory**!! This is an excellent outcome, since slicing arrays in memory is a highly optimized operation. 

And we have the benefit of having access to very large datasets if needed.
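To show how this fits into batch creation, here is a minimal numpy-only sketch of a batch iterator over arrays on disk. It is not how fastai or tsai build batches: `iter_batches` is a hypothetical helper. Each batch is selected with a sorted list of indices, so only that batch is copied into RAM, and it could then be passed to torch.from_numpy.

```python
import tempfile
from pathlib import Path

import numpy as np


def iter_batches(X, y, batch_size, shuffle=True, seed=0):
    """Yield (X_batch, y_batch) as in-memory arrays, one batch at a time."""
    rng = np.random.default_rng(seed)
    idxs = rng.permutation(len(X)) if shuffle else np.arange(len(X))
    for start in range(0, len(idxs), batch_size):
        # sorted indices let the file be read in increasing order
        batch_idxs = np.sort(idxs[start:start + batch_size])
        # indexing with an array of indices copies just this batch into RAM
        yield X[batch_idxs], y[batch_idxs]


# Tiny demo: sample i is filled with the value i, and its label is also i
tmp_dir = Path(tempfile.mkdtemp())
X_demo = np.arange(10, dtype=np.float32)[:, None, None] * np.ones((1, 3, 4), dtype=np.float32)
np.save(str(tmp_dir / 'X.npy'), X_demo)
np.save(str(tmp_dir / 'y.npy'), np.arange(10))
X = np.load(str(tmp_dir / 'X.npy'), mmap_mode='c')
y = np.load(str(tmp_dir / 'y.npy'), mmap_mode='c')

batches = list(iter_batches(X, y, batch_size=4))
for xb, yb in batches:
    print(type(xb).__name__, xb.shape, yb)
```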

## Remove the arrays on disk

Don't forget to remove the arrays you have created on disk.

In [88]:
os.remove('./data/X_on_disk.npy')
os.remove('./data/X_on_disk_small.npy')
os.remove('./data/y_on_disk.npy')

## Summary ✅

We now have a very efficient way to work with very large numpy arrays.

The process is very simple:

- create and save the array on disk (as described before)
- load it with mmap_mode='c' if you want to be able to modify data in memory but not on disk, or 'r+' if you want to modify data both in memory and on disk.

So my recommendation would be:

- use numpy arrays in memory when possible (if your data fits in memory)
- use numpy memmap (arrays on disk) when data doesn't fit. You will still get great performance.