# 7 Expressions 

The `tables.Expr` class evaluates (in-kernel) expressions on array-like objects. All internal computations are performed by the `Numexpr` package, which uses multi-threading, SIMD and blocking techniques to mitigate the starving-CPU problem. Combined with a compressor such as Blosc, very high out-of-core performance can be reached for expressions on larger-than-memory arrays (tables).

In [18]:
import tables
import numpy as np

In [205]:
data_dir = 'expr'
import os
import shutil
if os.path.exists(data_dir):
    shutil.rmtree(data_dir)
os.mkdir(data_dir)

We create a table with four columns (four-momentum from particle physics) and store random floats:

In [348]:
FILENAME = os.path.join(data_dir, "momentum.h5")
f = tables.open_file(FILENAME, "w")

In [349]:
class FourMomentum(tables.IsDescription): 
    E = tables.Float64Col()
    p_x = tables.Float64Col()
    p_y = tables.Float64Col()
    p_z = tables.Float64Col() 

In [350]:
filters = tables.Filters(complevel=0)  # no compression

In [351]:
t = f.create_table(f.root, "mydata", FourMomentum, filters=filters)

In [352]:
dtype = t.dtype

Store 1 million rows:

In [353]:
N = int(1e6)

In [354]:
arr = np.random.random((N,)).astype(dtype)
arr[:2]

array([( 0.47288225,  0.47288225,  0.47288225,  0.47288225),
       ( 0.68584122,  0.68584122,  0.68584122,  0.68584122)],
      dtype=[('E', '<f8'), ('p_x', '<f8'), ('p_y', '<f8'), ('p_z', '<f8')])
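
The output above shows the same value in all four fields: casting a one-dimensional float array to the table's structured dtype copies each float into every field, so `p_x`, `p_y` and `p_z` are identical in this dataset. That does not change the timings below. If independent columns are wanted, one possible sketch is to fill each field separately. The names `rng` and `independent` and the small `N` are illustrative assumptions, not part of the original notebook:

```python
import numpy as np

# Same layout as the table's dtype shown in the output above.
dtype = np.dtype([('E', '<f8'), ('p_x', '<f8'), ('p_y', '<f8'), ('p_z', '<f8')])

N = 1000  # small size, for illustration only
rng = np.random.default_rng()

# Allocate the structured array first, then fill every field
# with its own, independent batch of random floats.
independent = np.empty(N, dtype=dtype)
for name in dtype.names:
    independent[name] = rng.random(N)
```

An array built this way can be passed to `t.append(...)` in the same way as `arr`.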

In [355]:
t.append(arr)

In [356]:
t.flush()

In [357]:
t[:10]['p_x']

array([ 0.47288225,  0.68584122,  0.44403474,  0.66558961,  0.30561075,
        0.84742291,  0.56128508,  0.05560722,  0.77054002,  0.42891583])

We can access the columns using the `Cols` accessor:

In [358]:
px = t.cols.p_x
py = t.cols.p_y
pz = t.cols.p_z

Define the expression:

In [359]:
expr = tables.Expr('px**2 + py**2 + pz**2')
expr

<tables.expression.Expr at 0x24e14820358>
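
`tables.Expr` resolves the names used in the expression string (`px`, `py`, `pz`) from the variables of the calling scope, which is why the three column objects had to be defined first. As a minimal sketch, the same mapping can also be passed explicitly through the `uservars` argument. The temporary file, the small row count and the names `variables` and `reference` are assumptions made for this example:

```python
import os
import tempfile

import numpy as np
import tables


class FourMomentum(tables.IsDescription):
    E = tables.Float64Col()
    p_x = tables.Float64Col()
    p_y = tables.Float64Col()
    p_z = tables.Float64Col()


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "uservars-demo.h5")
    with tables.open_file(path, "w") as f:
        t = f.create_table(f.root, "mydata", FourMomentum)

        # Fill each column with independent random floats.
        rows = np.zeros(1000, dtype=t.dtype)
        for name in t.dtype.names:
            rows[name] = np.random.random(1000)
        t.append(rows)
        t.flush()

        # Explicit name -> column mapping instead of relying on local variables.
        variables = {"px": t.cols.p_x, "py": t.cols.p_y, "pz": t.cols.p_z}
        expr = tables.Expr("px**2 + py**2 + pz**2", uservars=variables)
        p_squared = expr.eval()

        # Cross-check against the pure-numpy result.
        reference = rows["p_x"]**2 + rows["p_y"]**2 + rows["p_z"]**2
        matches_numpy = np.allclose(p_squared, reference)
```

Passing `uservars` makes the dependency between the expression string and the data explicit, which can help when the expression is built inside a helper function.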

Evaluate the expression; the result is stored in memory:

In [360]:
%timeit expr.eval()

99.1 ms ± 1.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


Let's compare this with the pure-numpy version of the expression:

In [361]:
%timeit arr['p_x']**2 + arr['p_y']**2 + arr['p_z']**2

32.1 ms ± 144 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


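In this example numpy is about 3x faster (32.1 ms versus 99.1 ms). Note that the numpy version operates on an array that is already in memory, while `expr.eval()` reads the columns from the HDF5 file. For large, in-memory arrays, pure numpy is usually about 1.5x faster.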

## Out-of-core

We can store results on-disk (in HDF5) so we can evaluate expressions out-of-core even if the results do not fit into memory:

In [362]:
output_array = f.create_carray(f.root, "output", atom=tables.Float64Atom(), shape=(N,), filters=filters)
output_array

/output (CArray(1000000,)) ''
  atom := Float64Atom(shape=(), dflt=0.0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := (8192,)

In [363]:
expr.set_output(output_array)

In [364]:
%timeit expr.eval()

95.8 ms ± 284 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


Read the output:

In [365]:
f.root.output[:]

array([ 0.67085288,  1.41113454,  0.59150055, ...,  0.0791325 ,
        0.36949893,  0.00462901])

In this way we can evaluate expressions with larger-than-memory results out-of-core. 

In [367]:
f.close()

## Using compression

To facilitate experimenting with dataset size, compression, etc., we define some functions:

In [388]:
fn = os.path.join(data_dir, 'momentum.h5')

In [389]:
def create_dataset(N, filename, filters):
    """Create table '/mydata' with a random four-momentum table of size N"""
    with tables.open_file(filename, "w") as f:

        t = f.create_table(f.root, "mydata", FourMomentum, filters=filters)

        dtype = t.dtype
        arr = np.random.random((N,)).astype(dtype)
        t.append(arr)
        t.flush()

        f.create_carray(f.root, "output", atom=tables.Float64Atom(), shape=(N,), filters=filters)
        

In [390]:
def create_expression(f, output=None, expr='px**2 + py**2 + pz**2'):
    """Create an expression object"""
    t = f.root.mydata
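    # tables.Expr looks up the names used in the expression string
    # (px, py, pz) in the calling scope, so these locals are required
    # even though they are not referenced directly below.
    px = t.cols.p_x
    py = t.cols.p_y
    pz = t.cols.p_z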

    e = tables.Expr(expr)
    if output is not None:
        e.set_output(output)
    return e


In [396]:
# filters = tables.Filters(complevel=6, complib='blosc:lz4')  # compressed variant
filters = tables.Filters(complevel=0)  # no compression (used for this run)
fn = os.path.join(data_dir, 'momentum-uncompressed.h5')
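
The two `Filters` settings differ only in whether the HDF5 chunks are compressed on disk. How much that saves depends on the data: uniform random floats are close to incompressible. The following sketch compares on-disk sizes for a small array whose values are rounded to one decimal to make them repetitive. The rounding, the array size and the helper `stored_size` are assumptions made purely for illustration:

```python
import os
import tempfile

import numpy as np
import tables

n = 200_000
rng = np.random.default_rng(0)
data = np.round(rng.random(n), 1)  # only 11 distinct values, so highly repetitive


def stored_size(filters):
    """Write `data` to a temporary HDF5 file and return the file size in bytes."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "size-demo.h5")
        with tables.open_file(path, "w") as f:
            arr = f.create_carray(f.root, "values", atom=tables.Float64Atom(),
                                  shape=(n,), filters=filters)
            arr[:] = data
        return os.path.getsize(path)


uncompressed_size = stored_size(tables.Filters(complevel=0))
compressed_size = stored_size(tables.Filters(complevel=6, complib='blosc:lz4'))
print(f"uncompressed: {uncompressed_size} bytes, compressed: {compressed_size} bytes")
```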

In [402]:
N = int(1e8)
create_dataset(N, fn, filters)

In [403]:
!ls -lh {data_dir}

total 3.2G
-rw-r--r-- 1 tomkooij 197613 3.0G Jun 23 14:49 momentum-uncompressed.h5
-rw-r--r-- 1 tomkooij 197613 140M Jun 23 14:46 momentum.h5


In [404]:
with tables.open_file(fn, 'a') as f:
    expr = create_expression(f)
    %timeit expr.eval()

9.59 s ± 352 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [405]:
with tables.open_file(fn, 'a') as f:
    expr = create_expression(f, output=f.root.output)
    %timeit expr.eval()

9.73 s ± 56.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [406]:
!ls -lh {data_dir}

total 3.9G
-rw-r--r-- 1 tomkooij 197613 3.8G Jun 23 14:52 momentum-uncompressed.h5
-rw-r--r-- 1 tomkooij 197613 140M Jun 23 14:46 momentum.h5
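
The file sizes in the two listings are roughly consistent with simple arithmetic on the uncompressed data: four `float64` columns for the table, plus one `float64` array for the output. The output array presumably occupies disk space only once it has been written. A back-of-the-envelope check (HDF5 metadata overhead is ignored, which is an assumption):

```python
N = int(1e8)
BYTES_PER_FLOAT64 = 8
N_COLUMNS = 4          # E, p_x, p_y, p_z
GIB = 1024 ** 3

table_bytes = N * N_COLUMNS * BYTES_PER_FLOAT64   # raw size of the table
output_bytes = N * BYTES_PER_FLOAT64              # raw size of the output array
total_bytes = table_bytes + output_bytes

print(f"table:          {table_bytes / GIB:.2f} GiB")
print(f"output:         {output_bytes / GIB:.2f} GiB")
print(f"table + output: {total_bytes / GIB:.2f} GiB")
```

This gives about 2.98 GiB for the table and about 3.73 GiB in total, close to the 3.0G and 3.8G reported by `ls -lh`.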


# Exercise

Create a (reasonably) compressible dataset and investigate the `tables.Expr` performance with and without compression.

Can you achieve reasonable performance?