# Part 1A Structuring Datasets

>Objectives:
>
> * How to use a hierarchy to structure datasets inside the same file
> * Use the hierarchy in h5py and PyTables
> * Interactive auto-completion

## Using the Hierarchy

In HDF5, all nodes stem from a root (`/`).  Each node is either a `Group` or a `Dataset` (also known as a `Leaf` in PyTables).  `Groups` are the equivalent of directories on a filesystem and can contain `Datasets` or other `Groups`.  A `Dataset` is a container for data.

In [1]:
import numpy as np

In [2]:
import os
import shutil
data_dir = "structuring"
if os.path.exists(data_dir):
    shutil.rmtree(data_dir)
os.mkdir(data_dir)

## PyTables

In [3]:
import tables

Create an HDF5 file:

In [None]:
FILENAME = os.path.join(data_dir, "layout.h5")
f = tables.open_file(FILENAME, "w")

Create a group:

In [10]:
group = f.create_group('/', 'a_group')
group

/a_group (Group) ''
  children := []

Inside this group we can create many datasets:

In [11]:
f.create_array(group, "my_array1", np.arange(10))
f.create_array(group, "my_array2", np.ones(100).reshape(10, 10));

or another group:

In [12]:
f.create_group('/a_group', 'another_group')

/a_group/another_group (Group) ''
  children := []
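
Calling `create_group` for a path that already exists raises a `NodeError`, which is easy to trigger by re-running a notebook cell. The following is a minimal sketch of one way to make group creation repeatable. The helper `ensure_group` and the file name `demo.h5` are hypothetical and not part of PyTables; the sketch only relies on `path in file`, `get_node` and `create_group`.

```python
import os
import tempfile

import tables


def ensure_group(h5file, parent, name):
    """Return the group parent/name, creating it only if it is missing.

    Hypothetical helper for illustration; not part of PyTables.
    """
    path = parent.rstrip("/") + "/" + name
    if path in h5file:  # True if a node with this path exists in the file
        return h5file.get_node(path)
    return h5file.create_group(parent, name)


with tempfile.TemporaryDirectory() as tmp:
    with tables.open_file(os.path.join(tmp, "demo.h5"), "w") as h5:
        first = ensure_group(h5, "/", "a_group")
        second = ensure_group(h5, "/", "a_group")  # no NodeError the second time
        nested = ensure_group(h5, "/a_group", "another_group")
        # Copy the paths out while the file is still open
        first_path = first._v_pathname
        second_path = second._v_pathname
        nested_path = nested._v_pathname

print(first_path, second_path, nested_path)
```

Opening the file in `"w"` mode, as this tutorial does, also avoids the error by starting from an empty file, at the cost of discarding any existing content.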

Let's look at the structure of the HDF5 file:

In [13]:
print(f)

structuring\layout.h5 (File) ''
Last modif.: 'Thu Jun 22 08:52:40 2017'
Object Tree: 
/ (RootGroup) ''
/a_group (Group) ''
/a_group/my_array1 (Array(10,)) ''
/a_group/my_array2 (Array(10, 10)) ''
/a_group/another_group (Group) ''



This way, you can organize your datasets in whatever hierarchy best fits your needs.
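
The tree that `print(f)` displays can also be traversed programmatically. The sketch below rebuilds a small hierarchy in a temporary file (the name `demo.h5` is an assumption) and collects node paths with `File.walk_nodes`, optionally restricted to one node class.

```python
import os
import tempfile

import numpy as np
import tables

with tempfile.TemporaryDirectory() as tmp:
    with tables.open_file(os.path.join(tmp, "demo.h5"), "w") as h5:
        group = h5.create_group("/", "a_group")
        h5.create_array(group, "my_array1", np.arange(10))
        h5.create_group(group, "another_group")

        # Visit every node hanging from the root, groups and arrays alike
        all_paths = sorted(node._v_pathname for node in h5.walk_nodes("/"))
        # Restrict the walk to Array nodes only
        array_paths = [
            node._v_pathname for node in h5.walk_nodes("/", classname="Array")
        ]

print(all_paths)
print(array_paths)
```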

### Natural naming

In PyTables, you can access nodes as attributes of a Python object, for example `f.root.a_group.some_data`.  This is known as natural naming.

In [14]:
f.root.a_group.my_array1

/a_group/my_array1 (Array(10,)) ''
  atom := Int32Atom(shape=(), dflt=0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := None

Natural naming supports `<TAB>` completion:

In [15]:
f.root.a_group.

/a_group (Group) ''
  children := ['my_array1' (Array), 'my_array2' (Array), 'another_group' (Group)]

In [16]:
f.close()

## h5py

In [17]:
import h5py

In [18]:
f = h5py.File(FILENAME, 'a')

In [19]:
list(f)

['a_group']

The `h5py.File` object acts as a dictionary, which exposes the groups and datasets:

In [20]:
f['/a_group']

<HDF5 group "/a_group" (3 members)>

Using the `dict`-like interface of the `h5py.File` object, we can view and access its members:

In [21]:
list(f['/a_group'])

['another_group', 'my_array1', 'my_array2']
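
The dictionary analogy extends to membership tests and key listings, and `visititems` walks the whole hierarchy. The following sketch assumes a fresh temporary file (`demo.h5` is a made-up name) rather than the `layout.h5` file used above:

```python
import os
import tempfile

import h5py
import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    with h5py.File(os.path.join(tmp, "demo.h5"), "w") as h5:
        # Missing intermediate groups in the path are created automatically
        h5.create_dataset("a_group/my_array1", data=np.arange(10))
        h5.create_group("a_group/another_group")

        has_group = "a_group" in h5                # dict-style membership test
        has_missing = "a_group/not_there" in h5    # nested paths work as keys too
        members = sorted(h5["a_group"].keys())

        # visititems calls the function with (relative name, object) for
        # every group and dataset below the starting point
        visited = []
        h5.visititems(lambda name, obj: visited.append((name, type(obj).__name__)))

print(has_group, has_missing)
print(members)
print(visited)
```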

`<TAB>` completion must be enabled in h5py:

In [22]:
h5py.enable_ipython_completer()

In [18]:
# use <TAB> completion:
f['/a_group/my_array2']

<HDF5 dataset "my_array2": shape (10, 10), type "<f8">

In [23]:
f.create_group('/a_group/YAG')

<HDF5 group "/a_group/YAG" (0 members)>
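
Running `create_group` a second time for the same path fails because the name already exists. A minimal sketch of the alternative, `require_group`, which returns the existing group or creates it if needed (the temporary file `demo.h5` is an assumption):

```python
import os
import tempfile

import h5py

with tempfile.TemporaryDirectory() as tmp:
    with h5py.File(os.path.join(tmp, "demo.h5"), "w") as h5:
        h5.create_group("a_group/YAG")

        # A second create_group on the same path raises ValueError
        try:
            h5.create_group("a_group/YAG")
            duplicate_failed = False
        except ValueError:
            duplicate_failed = True

        # require_group opens the group if it exists, otherwise creates it
        reopened_name = h5.require_group("a_group/YAG").name
        created_name = h5.require_group("a_group/new_group").name

print(duplicate_failed, reopened_name, created_name)
```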

In [24]:
f.close()