# ZnH5MD - High Performance Interface for H5MD Trajectories

ZnH5MD gives you easy access to properties stored in files that follow the [H5MD](https://www.nongnu.org/h5md/) standard. In addition, ZnH5MD provides generators based on the [TensorFlow Data API](https://www.tensorflow.org/guide/data_performance) that process large data files in memory-safe batches.

In [1]:
from znh5md import LammpsH5MD

In [2]:
# Use ZincHub to download an example simulation
from zinchub import DataHub
NaCl = DataHub(url="https://github.com/zincware/DataHub/tree/main/NaClH5MD")
filename = NaCl.get_file()

The trajectory object (`LammpsH5MD` for LAMMPS files) gives access to the supported properties and generators.

In [3]:
traj = LammpsH5MD(filename)

In [4]:
traj.position.shape # 201 configurations with 1000 particles each

(201, 1000, 3)

Passing `batch_size` to `get_dataset` splits the positions into batches. The base dataset yields `step`, `time` and `value`, as defined by H5MD.
If only the values are needed, call `position.value.get_dataset()` instead.

In [5]:
%%time
for position in traj.position.get_dataset(batch_size=32):
    print(position.keys())
    print(position['step'])
    print(position['time'])
    print(position['value'].shape)
    break

dict_keys(['step', 'time', 'value'])
tf.Tensor(
[    0  1000  2000  3000  4000  5000  6000  7000  8000  9000 10000 11000
 12000 13000 14000 15000 16000 17000 18000 19000 20000 21000 22000 23000
 24000 25000 26000 27000 28000 29000 30000 31000], shape=(32,), dtype=int32)
tf.Tensor(
[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15. 16. 17.
 18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31.], shape=(32,), dtype=float64)
(32, 1000, 3)
Wall time: 149 ms


In [6]:
%%time
for position in traj.position.value.get_dataset(selection=slice(500), batch_size=32):
    print(position.shape)

(32, 500, 3)
(32, 500, 3)
(32, 500, 3)
(32, 500, 3)
(32, 500, 3)
(32, 500, 3)
(9, 500, 3)
Wall time: 43 ms
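
The batch shapes printed above follow from simple slicing arithmetic: 201 configurations in batches of 32 give six full batches and one final batch of 9. The following is a minimal sketch in plain Python and not ZnH5MD's actual implementation; `batch_slices` is a hypothetical helper introduced only for illustration.

```python
def batch_slices(n_items, batch_size):
    """Yield consecutive slices that cover range(n_items) in batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_items, batch_size):
        # The last slice is shortened so it never runs past n_items.
        yield slice(start, min(start + batch_size, n_items))


# 201 configurations in batches of 32, as in the example trajectory
batch_lengths = [s.stop - s.start for s in batch_slices(201, 32)]
print(batch_lengths)  # [32, 32, 32, 32, 32, 32, 9]
```

Only one such slice needs to be held in memory at a time, which is what makes batched iteration practical for large files.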


ZnH5MD supports iterating over different axes and supports selection along the selected axis as well as the remaining ones. The returned arrays always keep the layout `(n_configurations, n_particles, n_dims)`.

In [7]:
%%time
for position in traj.position.value.get_dataset(axis=1, batch_size=128):
    print(position.shape)

(201, 128, 3)
(201, 128, 3)
(201, 128, 3)
(201, 128, 3)
(201, 128, 3)
(201, 128, 3)
(201, 128, 3)
(201, 104, 3)
Wall time: 60 ms
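
To see why batching along `axis=1` produces the shapes above, the sketch below computes the shape of every batch for a given array shape, axis and batch size. It is written in plain Python under the assumption that only the batched axis changes size; `batched_shapes` is a hypothetical helper and not part of ZnH5MD.

```python
def batched_shapes(shape, axis, batch_size):
    """Return the shape of each batch when splitting `shape` along `axis`."""
    shapes = []
    for start in range(0, shape[axis], batch_size):
        # Size of this batch along the batched axis (shorter for the last one).
        size = min(batch_size, shape[axis] - start)
        batch_shape = list(shape)
        batch_shape[axis] = size
        shapes.append(tuple(batch_shape))
    return shapes


# 1000 particles in batches of 128, with all 201 configurations per batch
shapes = batched_shapes((201, 1000, 3), axis=1, batch_size=128)
print(len(shapes), shapes[0], shapes[-1])  # 8 (201, 128, 3) (201, 104, 3)
```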


## Selection of particles

Currently, the selection is static, so the same particle indices apply to every configuration. A dynamic per-configuration selection is planned.

In [8]:
import numpy as np
species_1 = np.arange(traj.species.shape[1])[traj.species[0] == 1].tolist()

In [9]:
%%time
for position in traj.position.value.get_dataset(selection=species_1, batch_size=128):
    print(position.shape)

(128, 500, 3)
(73, 500, 3)
Wall time: 41 ms
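
The `species_1` index list built above takes the particle indices from the first configuration only and reuses them for every configuration. The sketch below reproduces that construction on a small made-up species array (illustrative data, not taken from the NaCl trajectory) and shows an equivalent form using `np.flatnonzero`.

```python
import numpy as np

# Made-up species array with shape (n_configurations, n_particles)
species = np.array([[1, 2, 1, 2, 1, 2],
                    [1, 2, 1, 2, 1, 2]])

# Same construction as above: indices where the first configuration
# contains species 1
indices_arange = np.arange(species.shape[1])[species[0] == 1].tolist()

# Equivalent form that skips the explicit index array
indices_flatnonzero = np.flatnonzero(species[0] == 1).tolist()

print(indices_arange)       # [0, 2, 4]
print(indices_flatnonzero)  # [0, 2, 4]
```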


In [10]:
traj.get_groups()

['box', 'force', 'image', 'position', 'species', 'velocity']

In [11]:
traj.not_exist[:5]

GroupNotFound: Could not load particles/all/abc/value from lammps_npt.h5
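
One way to avoid this error is to check the names returned by `get_groups()` before accessing a property. The sketch below assumes only that the trajectory object exposes `get_groups()` as shown above; `has_group` and `FakeTrajectory` are hypothetical names used for this demonstration, and the stand-in class replaces a real trajectory so the example runs without a data file.

```python
def has_group(traj, name):
    """Return True if `name` is listed by traj.get_groups()."""
    return name in traj.get_groups()


class FakeTrajectory:
    """Stand-in for a trajectory object, used only for this demonstration."""

    def get_groups(self):
        return ["box", "force", "image", "position", "species", "velocity"]


traj_stub = FakeTrajectory()
print(has_group(traj_stub, "position"))   # True
print(has_group(traj_stub, "not_exist"))  # False
```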