# Part 1A Structuring Datasets

Goals:
-------

 * Use a hierarchy to structure datasets inside a single file
 * Navigate that hierarchy with h5py and PyTables

## Using the Hierarchy

In HDF5, all nodes stem from a root ("/").  A node is either a `Group` or a `Dataset` (also known as a `Leaf` in PyTables).  `Groups` are the equivalent of directories on a filesystem and can contain `Datasets` or other `Groups`.  A `Dataset` is a container for data.

In [1]:
import numpy as np

In [2]:
import os
import shutil
data_dir = "structuring"
if os.path.exists(data_dir):
    shutil.rmtree(data_dir)
os.mkdir(data_dir)

## PyTables

In [28]:
import tables

In [3]:
FILENAME = "%s/layout.h5" % data_dir
f = tables.open_file(FILENAME, "w")
group = f.create_group('/', 'a_group')
group

/a_group (Group) ''
  children := []

Inside this group we can create many datasets:

In [4]:
f.create_array(group, "my_array1", np.arange(10))
f.create_array(group, "my_array2", np.ones(100).reshape(10, 10));

or another group:

In [5]:
f.create_group('/a_group', 'another_group')

/a_group/another_group (Group) ''
  children := []

In [6]:
print(f)

structuring/layout.h5 (File) ''
Last modif.: 'Wed Jun 14 15:48:18 2017'
Object Tree: 
/ (RootGroup) ''
/a_group (Group) ''
/a_group/my_array1 (Array(10,)) ''
/a_group/my_array2 (Array(10, 10)) ''
/a_group/another_group (Group) ''



With groups and datasets as building blocks, you can give your data whatever hierarchy best fits your needs.
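As a minimal sketch of building a deeper hierarchy in one step, the example below relies on the `createparents=True` argument of `create_array`, which creates any missing intermediate groups. The file name, the `/experiment/run_01` style layout, and the array contents are assumptions made up for illustration; they are not part of the tutorial's `layout.h5` file.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical file and node names, used only for illustration.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "nested_layout.h5")

with tables.open_file(path, "w") as h5:
    # createparents=True creates the missing groups /experiment and
    # /experiment/run_XX before storing the array.
    h5.create_array("/experiment/run_01", "temperature",
                    np.linspace(20.0, 25.0, 6), createparents=True)
    h5.create_array("/experiment/run_02", "temperature",
                    np.linspace(30.0, 35.0, 6), createparents=True)

    # Walk the whole tree and collect the full path of every array.
    array_paths = sorted(node._v_pathname
                         for node in h5.walk_nodes("/", classname="Array"))

print(array_paths)
```

Grouping by run like this is only one possible layout; the same calls work for any nesting depth.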

### Natural naming

In PyTables, you may access nodes as attributes of a Python object, starting from `f.root`, for example `f.root.a_group.my_array1`.  This is known as natural naming.

In [8]:
f.root.a_group.my_array1

/a_group/my_array1 (Array(10,)) ''
  atom := Int32Atom(shape=(), dflt=0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := None
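Natural naming is a convenience on top of path-based access. The sketch below, which writes to a throwaway file with a hypothetical name, reaches the same array through attribute access and through `File.get_node`, which accepts either a full path or a parent path plus a node name. Path-based access is the one to reach for when a node name is not a valid Python identifier.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical scratch file, separate from the tutorial's layout.h5.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "natural_naming.h5")

with tables.open_file(path, "w") as h5:
    group = h5.create_group("/", "a_group")
    h5.create_array(group, "my_array1", np.arange(10))

    # Three ways to reach the same node.
    by_attribute = h5.root.a_group.my_array1
    by_path = h5.get_node("/a_group/my_array1")
    by_parts = h5.get_node("/a_group", "my_array1")

    node_paths = {by_attribute._v_pathname,
                  by_path._v_pathname,
                  by_parts._v_pathname}

    # read() loads the stored data into a NumPy array.
    values = by_path.read()

print(node_paths)
print(values)
```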

Natural naming supports `<TAB>` completion:

In [23]:
f.root.a_group  # use <TAB> completion

In [17]:
f.close()

## h5py

In [11]:
import h5py

In [12]:
f = h5py.File(FILENAME, 'r')

In [13]:
list(f)

['a_group']

The `h5py.File` object acts like a dictionary whose keys are the names of the groups and datasets it contains:

In [20]:
f['/a_group']

<HDF5 group "/a_group" (3 members)>

In [22]:
list(f['/a_group'])

['another_group', 'my_array1', 'my_array2']
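Because groups behave like dictionaries, nested nodes can be addressed with slash-separated paths, and `visititems` can walk the whole tree. The following is a minimal sketch on a scratch file with a hypothetical name and a layout loosely modelled on the one above; the `record` callback is an illustrative helper, not part of h5py.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file, separate from the tutorial's layout.h5.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "h5py_layout.h5")

with h5py.File(path, "w") as h5:
    # Intermediate groups named in the path are created automatically.
    h5.create_dataset("a_group/my_array1", data=np.arange(10))
    h5.create_dataset("a_group/another_group/my_array2",
                      data=np.ones((10, 10)))

    top_level = list(h5)             # names directly under the root
    members = sorted(h5["a_group"])  # names directly under /a_group

    # Collect the shape of every dataset, keyed by its path relative
    # to the root.
    found = {}

    def record(name, obj):
        if isinstance(obj, h5py.Dataset):
            found[name] = obj.shape

    h5.visititems(record)

    # Slicing a dataset reads its contents into a NumPy array.
    total = int(h5["a_group/my_array1"][...].sum())

print(top_level, members, found, total)
```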

Unlike in PyTables, `<TAB>` completion in h5py must be enabled explicitly:

In [26]:
h5py.enable_ipython_completer()

In [27]:
f['/a_group']  # use tab completion

<HDF5 group "/a_group" (3 members)>