In [2]:
import numpy as np
import tables as tb

Create a new HDF5 file, opened in write mode (`'w'`), which overwrites any existing file with the same name:

In [3]:
f = tb.open_file('myfile.h5', 'w')

Define the row layout of an HDF5 table with two columns:
* The name of a city (a byte string of at most 64 bytes)
* The population (a 32-bit integer)

In [4]:
dtype = np.dtype([('city', 'S64'),
                 ('population', 'i4')])
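
As a quick check on what this layout implies, the sketch below rebuilds the same structured dtype with NumPy alone and inspects it. The names `row_dtype`, `sample`, and `long_name` are illustrative and do not appear in the original notebook.

```python
import numpy as np

# Same row layout as above: a 64-byte string field and a 32-bit integer field.
row_dtype = np.dtype([('city', 'S64'),
                      ('population', 'i4')])

# Every row has a fixed size: 64 bytes for the string plus 4 for the integer.
row_size = row_dtype.itemsize

# Strings are stored as bytes, so they come back as bytes objects.
sample = np.array([('Brussels', 1138843)], dtype=row_dtype)
city = sample['city'][0]

# A value longer than the field width is truncated to 64 bytes.
long_name = np.array([('x' * 70, 0)], dtype=row_dtype)['city'][0]
```

Because the field width is fixed, it is worth choosing it with the longest expected city name in mind.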

Create the table at `/table1`, directly under the root group:

In [5]:
table = f.create_table('/', 'table1', dtype)

Append three rows to the table:

In [6]:
table.append([('Brussels', 1138843),
             ('London', 8308369),
             ('Paris', 2243833)])

After adding rows, we need to flush the table to write the buffered changes to disk:

In [7]:
table.flush()
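
One way to make sure nothing stays buffered is to open the file in a `with` block, since closing the file also flushes it. The following is a minimal sketch of that pattern, not part of the original recipe; it writes to a temporary path (a hypothetical location chosen for this example) instead of `myfile.h5`.

```python
import os
import tempfile

import numpy as np
import tables as tb

# Hypothetical scratch location so the example does not touch 'myfile.h5'.
path = os.path.join(tempfile.mkdtemp(), 'cities.h5')
dtype = np.dtype([('city', 'S64'), ('population', 'i4')])

# Leaving the 'with' block closes the file, which also flushes pending rows.
with tb.open_file(path, 'w') as f:
    table = f.create_table('/', 'table1', dtype)
    table.append([('Brussels', 1138843),
                  ('London', 8308369),
                  ('Paris', 2243833)])
    table.flush()  # explicit flush, as in the step above

# Reopen read-only to confirm that the rows reached the disk.
with tb.open_file(path, 'r') as f:
    n_rows = int(f.root.table1.nrows)
```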

There are many ways to access the data in a table. The simplest, though not the most memory-efficient, is to load the entire table into memory as a NumPy structured array:

In [8]:
table[:]

array([(b'Brussels', 1138843), (b'London', 8308369), (b'Paris', 2243833)],
      dtype=[('city', 'S64'), ('population', '<i4')])
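
When the table is too large to load at once, a slice or a row iterator reads only part of it. The sketch below assumes the same three rows, written to a temporary file created just for this example; `first_two`, `last_row`, and `total` are illustrative names.

```python
import os
import tempfile

import numpy as np
import tables as tb

# Hypothetical scratch file holding the same three rows as above.
path = os.path.join(tempfile.mkdtemp(), 'cities.h5')
dtype = np.dtype([('city', 'S64'), ('population', 'i4')])

with tb.open_file(path, 'w') as f:
    table = f.create_table('/', 'table1', dtype)
    table.append([('Brussels', 1138843),
                  ('London', 8308369),
                  ('Paris', 2243833)])
    table.flush()

    # Slicing loads only the requested rows into a NumPy array.
    first_two = table[0:2]

    # Table.read takes explicit start/stop bounds and does the same job.
    last_row = table.read(start=2, stop=3)

    # iterrows streams the rows one at a time instead of building an array.
    total = sum(int(row['population']) for row in table.iterrows())
```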

It is also possible to load a single column (with all of its rows):

In [9]:
table.col('city')

array([b'Brussels', b'London', b'Paris'], dtype='|S64')
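
The column comes back as byte strings because the field was declared as `S64`. A short sketch of two ways to read a column, followed by decoding to regular Python strings, is given below; it assumes the city names are UTF-8 (here plain ASCII) and uses a temporary file created just for this example.

```python
import os
import tempfile

import numpy as np
import tables as tb

# Hypothetical scratch file holding the same three rows as above.
path = os.path.join(tempfile.mkdtemp(), 'cities.h5')
dtype = np.dtype([('city', 'S64'), ('population', 'i4')])

with tb.open_file(path, 'w') as f:
    table = f.create_table('/', 'table1', dtype)
    table.append([('Brussels', 1138843),
                  ('London', 8308369),
                  ('Paris', 2243833)])
    table.flush()

    # Two ways to read one column: the col() method or the cols accessor.
    cities = table.col('city')
    populations = table.cols.population[:]

# Decode the byte strings into ordinary str objects.
names = [c.decode('utf-8') for c in cities]
```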

When dealing with a large number of rows, we can run a SQL-like query on the table to load only the rows that satisfy a given condition:

In [10]:
[row['city'] for row in table.where('population>2e6')]

[b'London', b'Paris']
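
The same kind of query can return a NumPy array directly through `read_where`, and the threshold can be passed in as a variable with `condvars` rather than written into the condition string. This is a minimal sketch on the same three rows, using a temporary file created just for this example; `threshold`, `big_cities`, and `mid_sized` are illustrative names.

```python
import os
import tempfile

import numpy as np
import tables as tb

# Hypothetical scratch file holding the same three rows as above.
path = os.path.join(tempfile.mkdtemp(), 'cities.h5')
dtype = np.dtype([('city', 'S64'), ('population', 'i4')])

with tb.open_file(path, 'w') as f:
    table = f.create_table('/', 'table1', dtype)
    table.append([('Brussels', 1138843),
                  ('London', 8308369),
                  ('Paris', 2243833)])
    table.flush()

    # Pass the threshold explicitly instead of formatting it into the string.
    threshold = 2e6
    big = table.read_where('population > threshold',
                           condvars={'threshold': threshold})
    big_cities = [c.decode('utf-8') for c in big['city']]

    # Conditions combine with & and |; each comparison needs parentheses.
    # field='city' returns only that column for the matching rows.
    mid_sized = table.read_where('(population > 1e6) & (population < 5e6)',
                                 field='city')
```

With the `where` iterator, the row object is reused from one iteration to the next, so values should be extracted inside the loop, as the list comprehension above does with `row['city']`.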