# 7 Expressions 

The `tables.Expr` class evaluates expressions on array-like objects in-kernel. All internal computations are performed by the `Numexpr` package, which uses multi-threading, SIMD and blocking techniques to mitigate the starving-CPU problem. Combined with a compressor such as Blosc, this can give very high out-of-core performance for expressions on larger-than-memory arrays (tables).
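
As a rough illustration of the kind of evaluation `Numexpr` performs, the sketch below calls it directly on two in-memory arrays. The names `a` and `b` and the array size are made up for this example; it is not meant to show the exact calls `tables.Expr` makes internally.

```python
import numpy as np
import numexpr as ne

# Two hypothetical in-memory operands (names and size chosen for illustration).
rng = np.random.default_rng(0)
a = rng.random(1_000_000)
b = rng.random(1_000_000)

# Numexpr compiles the expression string and evaluates it block by block,
# so it avoids allocating full-size temporaries for a**2 and b**2.
result = ne.evaluate("a**2 + b**2", local_dict={"a": a, "b": b})

# The values agree with the plain NumPy expression.
same = np.allclose(result, a**2 + b**2)
```

Passing `local_dict` makes the operands explicit; without it, `Numexpr` looks the names up in the calling frame.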

In [1]:
import tables
import numpy as np

In [2]:
data_dir = 'expr'
import os
import shutil
if os.path.exists(data_dir):
    shutil.rmtree(data_dir)
os.mkdir(data_dir)

We create a table with four columns (four-momentum from particle physics) and store random floats:

In [3]:
FILENAME = os.path.join(data_dir, "momentum.h5")
f = tables.open_file(FILENAME, "w")

In [4]:
class FourMomentum(tables.IsDescription): 
    E = tables.Float64Col()
    p_x = tables.Float64Col()
    p_y = tables.Float64Col()
    p_z = tables.Float64Col() 

In [5]:
filters = tables.Filters(complevel=0)  # no compression

In [6]:
t = f.create_table(f.root, "mydata", FourMomentum, filters=filters)

In [7]:
dtype = t.dtype

Store 1 million rows:

In [8]:
N = int(1e6)

In [9]:
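# Note: casting a float array to the structured dtype copies each random
# value into all four fields, as the output below shows.
arr = np.random.random((N,)).astype(dtype)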
arr[:2]

array([( 0.04163101,  0.04163101,  0.04163101,  0.04163101),
       ( 0.5249018 ,  0.5249018 ,  0.5249018 ,  0.5249018 )],
      dtype=[('E', '<f8'), ('p_x', '<f8'), ('p_y', '<f8'), ('p_z', '<f8')])

In [10]:
t.append(arr)

In [11]:
t.flush()

In [12]:
t[:10]['p_x']

array([ 0.04163101,  0.5249018 ,  0.4513212 ,  0.48990016,  0.47547629,
        0.93303951,  0.48467668,  0.03413684,  0.78109578,  0.95519787])

We can access the columns using the `Cols` accessor:

In [13]:
px = t.cols.p_x
py = t.cols.p_y
pz = t.cols.p_z

Define the expression:

In [14]:
expr = tables.Expr('px**2 + py**2 + pz**2')
expr

<tables.expression.Expr at 0x1d2e034ac18>

Evaluate the expression; the result is stored in memory:

In [15]:
%timeit expr.eval()

97.4 ms ± 1.52 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


Let's compare with the pure-numpy version of the expression:

In [16]:
%timeit arr['p_x']**2 + arr['p_y']**2 + arr['p_z']**2

32.7 ms ± 219 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In this example numpy is about 3x faster (32.7 ms versus 97.4 ms). For large, in-memory arrays, pure numpy is usually about 1.5x faster.
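
The expression string refers to its operands by name, and `tables.Expr` looks those names up in the calling namespace. The sketch below passes the operands explicitly through the `uservars` argument instead, which can be clearer inside functions. The `Point` table, the file name `demo.h5` and the variable names `a` and `b` are hypothetical, chosen only for this example.

```python
import os
import tempfile

import numpy as np
import tables


class Point(tables.IsDescription):  # hypothetical two-column table
    x = tables.Float64Col()
    y = tables.Float64Col()


with tempfile.TemporaryDirectory() as tmp:
    with tables.open_file(os.path.join(tmp, "demo.h5"), "w") as h5:
        table = h5.create_table(h5.root, "points", Point)

        # Fill the fields by name so the values are easy to check by hand.
        rows = np.zeros(4, dtype=table.dtype)
        rows["x"] = [1.0, 2.0, 3.0, 4.0]
        rows["y"] = [10.0, 20.0, 30.0, 40.0]
        table.append(rows)
        table.flush()

        # Map the names used in the expression string to the table columns.
        expr = tables.Expr(
            "a**2 + b",
            uservars={"a": table.cols.x, "b": table.cols.y},
        )
        result = expr.eval()  # NumPy array: 11, 24, 39, 56
```

With `uservars` the expression no longer depends on which local variables happen to exist where `tables.Expr` is called.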

## Out-of-core

We can store results on-disk (in HDF5) so we can evaluate expressions out-of-core even if the results do not fit into memory:

In [17]:
output_array = f.create_carray(f.root, "output", atom=tables.Float64Atom(), shape=(N,), filters=filters)
output_array

/output (CArray(1000000,)) ''
  atom := Float64Atom(shape=(), dflt=0.0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := (8192,)

In [18]:
expr.set_output(output_array)

In [19]:
%timeit expr.eval()

98.3 ms ± 1.18 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


Read the output:

In [20]:
f.root.output[:]

array([ 0.00519942,  0.82656571,  0.61107247, ...,  2.92931996,
        0.00319073,  0.04385807])

In this way we can evaluate expressions with larger-than-memory results out-of-core. 
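
A minimal, self-contained sketch of the same pattern on a small array: the input and the output both live in an HDF5 file, and the result is read back in slices rather than all at once. The node names `x` and `out`, the file name and the block size are assumptions made for this example.

```python
import os
import tempfile

import numpy as np
import tables

n = 10_000
block = 1_000  # hypothetical read size; anything that fits in memory works

with tempfile.TemporaryDirectory() as tmp:
    with tables.open_file(os.path.join(tmp, "ooc.h5"), "w") as h5:
        # On-disk input and an empty on-disk array for the result.
        x = h5.create_carray(h5.root, "x", obj=np.arange(n, dtype=np.float64))
        out = h5.create_carray(
            h5.root, "out", atom=tables.Float64Atom(), shape=(n,)
        )

        expr = tables.Expr("2 * x + 1", uservars={"x": x})
        expr.set_output(out)  # results are written to disk, not returned in RAM
        expr.eval()

        # Read the result back piece by piece.
        head = out[:5]
        total = 0.0
        for start in range(0, n, block):
            total += out[start:start + block].sum()
```

Here `total` sums the first `n` odd numbers, so it should equal `n**2`, which gives a quick sanity check on the on-disk result.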

In [21]:
f.close()

## Using compression

To facilitate experimenting with dataset size, compression, etc., we define some functions:

In [22]:
fn = os.path.join(data_dir, 'momentum.h5')

In [23]:
def create_dataset(N, filename, filters):
    """Create table '/mydata' with a random four-momentum table of size N"""
    with tables.open_file(filename, "w") as f:

        t = f.create_table(f.root, "mydata", FourMomentum, filters=filters)

        dtype = t.dtype
        arr = np.random.random((N,)).astype(dtype)
        t.append(arr)
        t.flush()

        f.create_carray(f.root, "output", atom=tables.Float64Atom(), shape=(N,), filters=filters)
        

In [24]:
def create_expression(f, output=None, expr='px**2 + py**2 + pz**2'):
    """Create an expression object"""
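    t = f.root.mydata
    # These locals look unused, but tables.Expr resolves the names in the
    # expression string from the calling namespace (no uservars is passed).
    px = t.cols.p_x
    py = t.cols.p_y
    pz = t.cols.p_z

    e = tables.Expr(expr)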
    if output is not None:
        e.set_output(output)
    return e


In [25]:
filters = tables.Filters(complevel=0)
fn = os.path.join(data_dir, 'momentum-uncompressed.h5')

In [26]:
N = int(1e7)  # 1e7 rows x 4 float64 columns is about 300 MB
create_dataset(N, fn, filters)

In [27]:
!ls -lh {data_dir}

total 344M
-rw-r--r-- 1 tomkooij 197613 306M Jun 27 09:06 momentum-uncompressed.h5
-rw-r--r-- 1 tomkooij 197613  39M Jun 27 09:06 momentum.h5


In [28]:
with tables.open_file(fn, 'a') as f:
    expr = create_expression(f)
    %timeit expr.eval()

981 ms ± 16.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [29]:
with tables.open_file(fn, 'a') as f:
    expr = create_expression(f, output=f.root.output)
    %timeit expr.eval()

988 ms ± 10.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [30]:
!ls -lh {data_dir}

total 421M
-rw-r--r-- 1 tomkooij 197613 382M Jun 27 09:07 momentum-uncompressed.h5
-rw-r--r-- 1 tomkooij 197613  39M Jun 27 09:06 momentum.h5
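
The runs above all use `complevel=0`. As a possible starting point for experimenting with compression, the sketch below writes the same compressible array twice, once without compression and once with Blosc, and compares file sizes. It assumes the PyTables build includes Blosc; the data pattern, the compression level and the file names are arbitrary choices for this example, and no timings are claimed.

```python
import os
import tempfile

import numpy as np
import tables

n = 200_000
# A repeating pattern compresses well, unlike uniformly random floats.
data = (np.arange(n) % 100).astype(np.float64)


def write_carray(path, filters):
    """Write `data` to '/x' with the given filters and return the file size."""
    with tables.open_file(path, "w") as h5:
        h5.create_carray(h5.root, "x", obj=data, filters=filters)
    return os.path.getsize(path)


with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "plain.h5")
    blosc_path = os.path.join(tmp, "blosc.h5")

    size_plain = write_carray(plain_path, tables.Filters(complevel=0))
    size_blosc = write_carray(
        blosc_path, tables.Filters(complevel=5, complib="blosc")
    )

    # Expressions work the same way on the compressed array.
    with tables.open_file(blosc_path, "r") as h5:
        result = tables.Expr("x**2", uservars={"x": h5.root.x}).eval()
```

Comparing `size_plain` with `size_blosc`, and timing `eval()` on both files, shows how much the compressor helps for a given dataset.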


# Exercise

Create a (reasonably) compressible dataset and investigate the `tables.Expr` performance with and without compression.

Can you achieve reasonable performance?