# NumPy's Native Binary File Format

In [None]:
# Imports required but not shown in the video lecture.
from numpy import array, save, load, arange, savetxt, savez, savez_compressed
from numpy.random import rand

## NumPy Binary File Format
### npy format

- The ".npy" format is the standard binary file format in NumPy for persisting a single arbitrary NumPy array on disk.
- The format stores all of the shape and dtype information necessary to reconstruct the array correctly even on another machine with a different architecture.
- See the module docstring in `numpy.lib.format` for details.
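
The round trip described above can be sketched in memory. This is a minimal illustration, not part of the original lecture: the array `original` and the use of an `io.BytesIO` buffer in place of a file on disk are assumptions made for the example.

```python
import io
import numpy as np

# Hypothetical example array; any dtype and shape would do.
original = np.arange(6, dtype=np.int16).reshape(2, 3)

# Write the array to an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
np.save(buffer, original)

# Every ".npy" file starts with the magic string b"\x93NUMPY".
magic = buffer.getvalue()[:6]

# Reading the buffer back restores values, shape, and dtype.
buffer.seek(0)
restored = np.load(buffer)
print(magic, restored.shape, restored.dtype)
```

The shape and dtype come from the header that follows the magic string, so no extra arguments are needed when loading.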

### npz format

- The ".npz" format is the standard format for persisting multiple NumPy arrays on disk. A ".npz" file is a zip file containing multiple ".npy" files, one for each array.

### `save`, `savez`, `savez_compressed`

- __`save(file, arr)`__:
  Save a single array in ".npy" format.
- __`savez(file, *args, **kwds)`__:
  Save several arrays into a single file in uncompressed ".npz" format. Creates a zipped archive of files.
- __`savez_compressed(file, *args, **kwds)`__:
  Save several arrays into a single file in compressed ".npz" format. Creates a compressed zipped archive of files.
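
How the arrays are named inside the archive depends on how they are passed. The sketch below assumes two throwaway arrays, `x` and `y`, and an in-memory buffer: arrays passed positionally are stored as `arr_0`, `arr_1`, and so on, while keyword arguments keep their keyword as the name.

```python
import io
import numpy as np

# Hypothetical arrays used only for illustration.
x = np.arange(3)
y = np.ones((2, 2))

buffer = io.BytesIO()
np.savez(buffer, x, named=y)  # x is positional, y is passed by keyword

buffer.seek(0)
with np.load(buffer) as archive:
    names = sorted(archive.files)

print(names)
```

Passing arrays by keyword is usually the easier choice, since the names then document the contents of the file.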

### `load`

- __`load(file, mmap_mode=None)`__:
  Load a ".npy" or ".npz" file. For ".npy", returns the array. For ".npz", returns a dictionary-like object.
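
The `mmap_mode` argument is not exercised elsewhere in this notebook, so here is a hedged sketch of what it does for a ".npy" file. The file name `big.npy` and the temporary directory are assumptions for the example. With `mmap_mode="r"`, `load` returns a read-only `numpy.memmap` instead of reading the whole file into memory.

```python
import os
import tempfile
import numpy as np

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "big.npy")  # hypothetical file name
    np.save(path, np.arange(1_000_000, dtype=np.float64))

    # Map the file read-only rather than loading all of it.
    mapped = np.load(path, mmap_mode="r")
    is_memmap = isinstance(mapped, np.memmap)

    # Copy a small slice into ordinary memory.
    tail = np.array(mapped[-3:])

    # Release the mapping before the temporary directory is removed.
    del mapped

print(is_memmap, tail)
```

This is mainly useful when the array is much larger than the part you need to touch.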

### Save a single array

In [None]:
a = array([[1.0, 2.0],
           [3.0, 4.0]])

Save to an uncompressed binary file.

In [None]:
fname = 'afile.npy'
save(fname, a)

And restore it.

In [None]:
aa = load(fname)
aa

### File size comparison
Create a largish array of doubles (~80,000 bytes).

In [None]:
a = arange(10000.)

First save as text.

In [None]:
import os
savetxt('a.txt', a)

Text file: 250 kilobytes (25 bytes per value with `savetxt`'s default `'%.18e'` format).

In [None]:
os.stat('a.txt').st_size

Now the binary file.

In [None]:
save('a.npy', a)

Binary file: ~80 kilobytes, roughly 1/3 the size of the text file.

In [None]:
os.stat('a.npy').st_size
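
The sizes above can be related to the raw data size. The following sketch repeats the comparison in a temporary directory (an assumption, to avoid leaving files behind) and splits the ".npy" size into the array's own bytes plus a small header.

```python
import os
import tempfile
import numpy as np

a = np.arange(10000.)

with tempfile.TemporaryDirectory() as tmpdir:
    txt_path = os.path.join(tmpdir, "a.txt")
    npy_path = os.path.join(tmpdir, "a.npy")
    np.savetxt(txt_path, a)
    np.save(npy_path, a)
    txt_size = os.path.getsize(txt_path)
    npy_size = os.path.getsize(npy_path)

raw_size = a.nbytes                # 10000 elements * 8 bytes each
header_size = npy_size - raw_size  # overhead added by the ".npy" header
print(raw_size, npy_size, header_size, txt_size)
```

The binary file is essentially the raw array bytes plus a short header, which is why it stays close to 80,000 bytes.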

### Save multiple arrays

In [None]:
a = array([[1.0, 2.0],
           [3.0, 4.0]])
b = arange(1000)

Save both `a` and `b` in one file.

In [None]:
savez('data.npz', a=a, b=b)

Look at the file structure.

In [None]:
!unzip -l data.npz
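
The `!unzip` command needs IPython and an `unzip` executable. As a portable alternative, the standard-library `zipfile` module can list the same structure. This sketch writes the archive to an in-memory buffer, which is an assumption made to keep the example self-contained.

```python
import io
import zipfile
import numpy as np

a = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.arange(1000)

buffer = io.BytesIO()
np.savez(buffer, a=a, b=b)
buffer.seek(0)

with zipfile.ZipFile(buffer) as archive:
    members = sorted(archive.namelist())
    # Each member is itself a complete ".npy" file.
    with archive.open("a.npy") as member:
        a_again = np.load(io.BytesIO(member.read()))

print(members)
```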

### Dictionary-like API
The object that `load` returns for an npz file has a dictionary-like API.

In [None]:
data = load('data.npz')

Look at the arrays available in the file.

In [None]:
data.keys()

Access the arrays by name with indexing.

In [None]:
data['a']

In [None]:
data['b'].shape
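
The object returned for an npz file keeps the underlying archive open, so it can also be used as a context manager. The sketch below assumes an in-memory archive with the same two names, `a` and `b`. It uses the `files` attribute to list the stored names and reads each array when it is indexed.

```python
import io
import numpy as np

buffer = io.BytesIO()
np.savez(buffer, a=np.eye(2), b=np.arange(1000))
buffer.seek(0)

# The archive is closed automatically when the block exits.
with np.load(buffer) as data:
    names = sorted(data.files)
    shapes = {name: data[name].shape for name in names}

print(shapes)
```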

### Compression

Compare uncompressed and compressed binary data.

In [None]:
a = arange(20000.)

Uncompressed: ~160 Kbytes

In [None]:
savez('a.npz', a=a)
os.stat('a.npz').st_size

Compressed: ~27 Kbytes

In [None]:
savez_compressed('a2.npz',
                  a=a)
os.stat('a2.npz').st_size

About 6x compression. That is pretty good, but your mileage will vary with how regular the data is.

### Compression — random data
Random data doesn't compress well.

In [None]:
a = rand(20000)

Uncompressed: ~160 Kbytes

In [None]:
savez('a.npz', a=a)
os.stat('a.npz').st_size

Compressed: ~151 Kbytes

In [None]:
savez_compressed('a2.npz',
                  a=a)
os.stat('a2.npz').st_size

~1.06x compression. Such is life with completely random data.
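
The two experiments can be folded into one helper for trying other arrays. This is a sketch only: `npz_sizes` is a hypothetical helper name, and the archives are written to in-memory buffers rather than to `a.npz` and `a2.npz`.

```python
import io
import numpy as np

def npz_sizes(arr):
    """Return (uncompressed, compressed) ".npz" sizes in bytes for arr."""
    plain, packed = io.BytesIO(), io.BytesIO()
    np.savez(plain, a=arr)
    np.savez_compressed(packed, a=arr)
    return len(plain.getvalue()), len(packed.getvalue())

ordered_sizes = npz_sizes(np.arange(20000.))
random_sizes = npz_sizes(np.random.rand(20000))

ordered_ratio = ordered_sizes[0] / ordered_sizes[1]
random_ratio = random_sizes[0] / random_sizes[1]
print(round(ordered_ratio, 2), round(random_ratio, 2))
```

The exact ratios depend on the data and on the zlib version, so treat the printed numbers as indicative rather than fixed.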

Copyright 2008-2016, Enthought, Inc.<br>Use only permitted under license.  Copying, sharing, redistributing or other unauthorized use strictly prohibited.<br>http://www.enthought.com