# Part 2B: Low level API

Goal:
-----

Explore the h5py functionality that gives direct access to the low level HDF5 API from Python.

h5py exposes the HDF5 low level API, which can be mixed with the high level interface.

[The HDF5 low level API](https://support.hdfgroup.org/HDF5/doc/RM/RM_H5Front.html) is subdivided into sections:
 * H5A : Attributes
 * H5D : Datasets
 * H5F : File interface

(...)

h5py wraps these functions in Cython modules:
 * h5py.h5a
 * h5py.h5d
 * h5py.h5f
 
(...)

Most of the wrapper functions live in proxy classes. For example, the `H5D` functions are methods of the `DatasetID` class.

h5py low level API reference: http://api.h5py.org/index.html
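As a minimal sketch of how the two interfaces mix, the cell below reaches the proxy objects through the `.id` attribute of high-level objects and calls a few low-level wrappers on them. The scratch file name and the `unit` attribute are made up for this illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "proxy_demo.h5")

with h5py.File(path, "w") as f:
    dset = f.create_dataset("data", data=np.arange(10, dtype=np.int8))
    dset.attrs["unit"] = "counts"

    # Every high-level object carries its low-level proxy in `.id`.
    file_id = f.id
    dset_id = dset.id
    file_id_type = type(file_id).__name__
    dset_id_type = type(dset_id).__name__

    # Low-level wrappers are methods/properties of the proxy objects,
    # or module-level functions that take a proxy as argument.
    shape = dset_id.shape                      # dataspace dimensions
    n_attrs = h5py.h5a.get_num_attrs(dset_id)  # wraps H5Aget_num_attrs()

print(file_id_type, dset_id_type, shape, n_attrs)
```

The type names printed here show which proxy class backs each high-level object.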

In [3]:
import h5py
import numpy as np

In [90]:
f = h5py.File('dataset.h5', 'w')
f['/data'] = np.arange(100, dtype=np.int8)
f['/data'][:]

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99], dtype=int8)

### H5Dget_storage_size() : get size of dataset on disk
```
H5Dget_storage_size
hsize_t H5Dget_storage_size( hid_t dataset_id )
```
HDF5 API reference: https://support.hdfgroup.org/HDF5/doc/RM/RM_H5D.html#Dataset-GetStorageSize

h5py API docs: http://api.h5py.org/h5d.html

In [92]:
dset = f['data']
dset

<HDF5 dataset "data": shape (100,), type "|i1">

In [93]:
dset.id

<h5py.h5d.DatasetID at 0x7f7ae425ed08>

The `DatasetID` class is a proxy object that wraps the H5D API. It is defined in the Cython module `h5d.pyx`:


In [94]:
dset.id.get_storage_size()

100L

In [95]:
f.close()
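The storage size becomes more interesting once a dataset is compressed. The following sketch (file and dataset names are placeholders) stores the same array twice, once with default settings and once chunked with gzip, and compares what `get_storage_size()` reports with the in-memory size.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "storage_demo.h5")

# Highly compressible data: 100 000 zero bytes.
data = np.zeros(100_000, dtype=np.int8)

with h5py.File(path, "w") as f:
    plain = f.create_dataset("plain", data=data)
    packed = f.create_dataset(
        "packed", data=data, chunks=(10_000,), compression="gzip"
    )
    # Make sure all chunks have been written out before measuring.
    f.flush()

    plain_size = plain.id.get_storage_size()    # wraps H5Dget_storage_size()
    packed_size = packed.id.get_storage_size()

print("in memory:", data.nbytes)
print("plain on disk:", plain_size)
print("gzip on disk:", packed_size)
```

The reported value covers only the raw data of the dataset, not the metadata HDF5 stores alongside it.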

## H5Pset_chunk_cache()
https://support.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetChunkCache

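Setting the file access properties (chunk cache size).

Note that the cells below call `set_cache()` on a file access property list, which corresponds to `H5Pset_cache()` and sets the default chunk cache for every dataset opened in the file. `H5Pset_chunk_cache()` is the per-dataset variant and applies to a dataset access property list.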

Inspired by the `h5py_cache` package by Mike Boyle: https://github.com/moble/h5py_cache
See also: https://stackoverflow.com/questions/14653259/



In [36]:
# create file access properties. H5Pcreate()
propfaid = h5py.h5p.create(h5py.h5p.FILE_ACCESS)

In [37]:
settings = list(propfaid.get_cache())
settings

[0, 521, 1048576, 0.75]

In [38]:
nslots = 521
chunk_cache_mem_size = 2 * 1024 * 1024  
w0 = 0.75
settings[1:] = (nslots, chunk_cache_mem_size, w0)

# apply the new settings to the property list. H5Pset_cache()
propfaid.set_cache(*settings)

In [32]:
# open the file
mode = h5py.h5f.ACC_TRUNC
f = h5py.File(h5py.h5f.create(b'newfile.h5', flags=mode, fapl=propfaid))

We now have a "regular" (high-level interface) `h5py.File` object:

In [33]:
f['some_dataset'] = [0, 1, 2]

In [34]:
list(f)

['some_dataset']

In [42]:
f.close()

ValueError: Not a file id (Not a file id)
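For comparison, here is a sketch of the same cache configuration done through the high-level interface, assuming h5py 2.9 or later, where `h5py.File` accepts the `rdcc_nslots`, `rdcc_nbytes` and `rdcc_w0` keyword arguments. The low-level API is then used only to check that the settings were applied. The file name is a placeholder.

```python
import os
import tempfile

import h5py

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "cache_demo.h5")

# Assumption: h5py >= 2.9, which exposes the chunk cache settings as
# keyword arguments of h5py.File.
with h5py.File(
    path, "w", rdcc_nslots=521, rdcc_nbytes=2 * 1024 * 1024, rdcc_w0=0.75
) as f:
    # Read the settings back through the low-level API:
    # H5Fget_access_plist() followed by H5Pget_cache().
    fapl = f.id.get_access_plist()
    mdc, nslots, nbytes, w0 = fapl.get_cache()

print(nslots, nbytes, w0)
```

On older h5py versions the low-level route shown above remains the way to change these values.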

# Exercise (Low level API)

* h5py low level API http://api.h5py.org
* HDF5 API reference https://support.hdfgroup.org/HDF5/doc/RM/RM_H5.html

- Create (or open, there's a difference!) a file using the HDF5 low level API.
- Investigate the objects returned by each call
- Use it in the high level API.
- Close it.
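One possible approach to the exercise is sketched below; it is not the only solution. It assumes a scratch file in a temporary directory and a dataset called `numbers`, both invented for the example. `h5f.create()` wraps `H5Fcreate()` and makes a new file, while `h5f.open()` wraps `H5Fopen()` and requires the file to exist already.

```python
import os
import tempfile

import h5py

# Hypothetical scratch file; the low-level functions take the name as bytes.
path = os.path.join(tempfile.mkdtemp(), "exercise.h5").encode()

# Create: ACC_TRUNC overwrites an existing file of the same name.
fid = h5py.h5f.create(path, flags=h5py.h5f.ACC_TRUNC)
created_type = type(fid).__name__

# Wrap the low-level identifier in a high-level File object and use it.
f = h5py.File(fid)
f["numbers"] = [0, 1, 2]
f.close()

# Open: the file must already exist; here it is opened read-only.
fid = h5py.h5f.open(path, flags=h5py.h5f.ACC_RDONLY)
f = h5py.File(fid)
names = list(f)
values = f["numbers"][:].tolist()
f.close()

print(created_type, names, values)
```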


# Exercise (Parallel HDF5)

Create a parallel version of the Mandelbrot demonstration (SWMR).

If Parallel HDF5 is not available, use multiprocessing to "roll your own".