# Accessing data

The dataset for each topology is stored as a NumPy `.npz` archive and can be loaded in Python with `np.load(file)`. The loaded object exposes the following keys:

- `positions`: array of shape (N_configs $\times$ N_atoms $\times$ 3)

- `energies`: 1-D array of N_configs values

- `forces`: array of shape (N_configs $\times$ N_atoms $\times$ 3)

- `symbols`: 1-D array of N_atoms chemical symbols (the same for every configuration)

- `cells`: array of shape (N_configs $\times$ 3 $\times$ 3). Both NVT and NPT molecular dynamics were used, so some configurations share the same cell while others do not.
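The shape relationships in the list above can be checked programmatically before the data is used for training. The sketch below uses a hypothetical helper, `validate_topology_arrays` (not part of the dataset), that accepts any mapping with these five keys, including the object returned by `np.load`, and is demonstrated here on small synthetic arrays.

```python
import numpy as np

def validate_topology_arrays(data):
    """Check that the arrays of one topology have mutually consistent shapes.

    `data` is any mapping with the keys 'positions', 'energies', 'forces',
    'symbols' and 'cells'. Returns (n_configs, n_atoms) if all shapes match,
    otherwise raises ValueError.
    """
    n_configs, n_atoms, _ = np.shape(data["positions"])
    expected = {
        "positions": (n_configs, n_atoms, 3),
        "energies": (n_configs,),
        "forces": (n_configs, n_atoms, 3),
        "symbols": (n_atoms,),
        "cells": (n_configs, 3, 3),
    }
    for key, shape in expected.items():
        actual = np.shape(data[key])
        if actual != shape:
            raise ValueError(f"{key}: expected shape {shape}, got {actual}")
    return n_configs, n_atoms

# Synthetic stand-in: 4 configurations of 6 atoms (not real dataset values)
rng = np.random.default_rng(0)
toy = {
    "positions": rng.random((4, 6, 3)),
    "energies": rng.random(4),
    "forces": rng.random((4, 6, 3)),
    "symbols": np.array(["O"] * 4 + ["Si"] * 2),
    "cells": np.tile(np.eye(3) * 10.0, (4, 1, 1)),
}
print(validate_topology_arrays(toy))  # (4, 6)
```

On the real file, `validate_topology_arrays(np.load('data/cha.npz'))` would be expected to return `(1659, 108)` given the shapes printed below.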

```python
import numpy as np

cha_data = np.load('data/cha.npz')
```
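Calling `np.load` on an `.npz` file returns a lazy `NpzFile` that keeps the archive open until it is closed. A minimal sketch, assuming you want every array in memory at once, reads all keys listed in `.files` inside a `with` block so the file is closed afterwards. `load_topology` is a hypothetical name, and a small temporary file stands in for `data/cha.npz`.

```python
import os
import tempfile

import numpy as np

def load_topology(path):
    """Read every array from a topology .npz file into a plain dict.

    Using the NpzFile as a context manager closes the archive once all
    arrays have been copied into memory.
    """
    with np.load(path) as npz:
        return {key: npz[key] for key in npz.files}

# Demonstration on a small synthetic file standing in for data/cha.npz
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.npz")
    np.savez(path, energies=np.arange(3.0), symbols=np.array(["O", "O", "Si"]))
    arrays = load_topology(path)

print(sorted(arrays))  # ['energies', 'symbols']
```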

```python
positions = cha_data['positions']
print(positions.shape)
```

```
(1659, 108, 3)
```
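Because each configuration carries its own cell, positions can be converted to fractional coordinates with a batched matrix inverse. The sketch below assumes the positions are Cartesian and that each 3x3 cell stores the lattice vectors as rows (so that Cartesian = fractional @ cell); the dataset description does not state either convention, so check them against your data. `to_fractional` is a hypothetical helper.

```python
import numpy as np

def to_fractional(positions, cells):
    """Convert Cartesian positions (N_configs, N_atoms, 3) to fractional coordinates.

    ASSUMPTION: cell rows are lattice vectors, i.e. cartesian = fractional @ cell.
    Transpose the cells first if the data stores lattice vectors as columns.
    """
    inv_cells = np.linalg.inv(cells)   # batched inverse, shape (N_configs, 3, 3)
    return positions @ inv_cells       # batched matmul -> (N_configs, N_atoms, 3)

# Toy example: one cubic cell of side 10 and a single atom
toy_cells = np.array([np.eye(3) * 10.0])
toy_positions = np.array([[[5.0, 2.5, 10.0]]])
frac = to_fractional(toy_positions, toy_cells)
print(frac)          # approximately [[[0.5, 0.25, 1.0]]]
print(frac % 1.0)    # optionally wrap into the unit cell
```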


```python
energies = cha_data['energies']
print(energies.shape)
```

```
(1659,)
```
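Since every configuration of a topology contains the same atoms, dividing the total energy by N_atoms gives a per-atom energy that is easier to compare across topologies of different size. A short sketch on toy values (not taken from the dataset); `energy_summary` is a hypothetical helper, and the units are whatever the dataset uses.

```python
import numpy as np

def energy_summary(energies, n_atoms):
    """Return per-atom energies and their spread (max - min) across configurations."""
    per_atom = energies / n_atoms
    return per_atom, per_atom.max() - per_atom.min()

energies = np.array([-100.0, -102.0, -101.0])   # toy values, not from the dataset
per_atom, spread = energy_summary(energies, n_atoms=4)
print(per_atom)
print(spread)   # 0.5
```

For CHA, the call would be `energy_summary(cha_data['energies'], 108)`.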


```python
forces = cha_data['forces']
print(forces.shape)
```

```
(1659, 108, 3)
```
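A quick way to inspect the force labels is the norm of each atom's force vector, for instance to find the largest force in every configuration. A minimal sketch on toy values; `force_magnitudes` is a hypothetical helper name.

```python
import numpy as np

def force_magnitudes(forces):
    """Per-atom force norms of shape (N_configs, N_atoms) from forces of shape (N_configs, N_atoms, 3)."""
    return np.linalg.norm(forces, axis=-1)

# Toy forces: 2 configurations, 2 atoms each
forces = np.array([[[3.0, 4.0, 0.0], [0.0, 0.0, 1.0]],
                   [[1.0, 2.0, 2.0], [0.0, 0.0, 0.0]]])
mags = force_magnitudes(forces)
print(mags.max(axis=1))   # largest force in each configuration: 5 and 3
```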


```python
symbols = cha_data['symbols']
print(symbols)
print(symbols.shape)
```

```
['O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O'
 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O'
 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O'
 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O' 'O'
 'Si' 'Si' 'Si' 'Si' 'Si' 'Si' 'Si' 'Si' 'Si' 'Si' 'Si' 'Si' 'Si' 'Si'
 'Si' 'Si' 'Si' 'Si' 'Si' 'Si' 'Si' 'Si' 'Si' 'Si' 'Si' 'Si' 'Si' 'Si'
 'Si' 'Si' 'Si' 'Si' 'Si' 'Si' 'Si' 'Si']
(108,)
```
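Many training codes, DeePMD-kit included, describe atoms by integer type indices plus a type map rather than by symbol strings. One way to build both is `np.unique(..., return_inverse=True)`, sketched here in the hypothetical helper `symbols_to_types`; `np.bincount` on the result then gives the per-species counts. The toy array reproduces the composition printed above (72 O and 36 Si).

```python
import numpy as np

def symbols_to_types(symbols):
    """Map chemical symbols to integer type indices.

    Returns (type_map, types): type_map lists the unique symbols in sorted
    order, and types[i] is the index of symbols[i] in type_map.
    """
    type_map, types = np.unique(symbols, return_inverse=True)
    return type_map.tolist(), types

symbols = np.array(["O"] * 72 + ["Si"] * 36)   # same composition as the CHA output above
type_map, types = symbols_to_types(symbols)
print(type_map)             # ['O', 'Si']
print(np.bincount(types))   # [72 36]
```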


```python
cells = cha_data['cells']
print(cells.shape)
```

```
(1659, 3, 3)
```
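Because both NVT and NPT trajectories contribute configurations, it can help to see which frames share a cell. The sketch below (hypothetical helper `group_constant_cells`) flags the frames whose cell matches the first one within a tolerance and counts the exactly distinct cells. It assumes that frames from a constant-volume run store identical cell values, which the dataset description does not guarantee.

```python
import numpy as np

def group_constant_cells(cells, atol=1e-8):
    """Flag configurations whose cell matches the first one, and count distinct cells.

    cells has shape (N_configs, 3, 3). Distinct cells are counted by exact
    comparison of the 9 flattened entries.
    """
    same_as_first = np.all(np.isclose(cells, cells[0], atol=atol), axis=(1, 2))
    n_unique = len(np.unique(cells.reshape(len(cells), 9), axis=0))
    return same_as_first, n_unique

# Toy cells: two identical (NVT-like) frames, then two rescaled (NPT-like) frames
fixed = np.eye(3) * 10.0
toy_cells = np.stack([fixed, fixed, fixed * 1.01, fixed * 1.02])
same, n_unique = group_constant_cells(toy_cells)
print(same)
print(n_unique)   # 3
```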


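### For direct compatibility with DeePMD-kit's input, reshape the positions, forces, and cells:

DeePMD-kit expects one configuration per row, so the per-atom arrays are flattened to N_atoms $\times$ 3 columns and each cell to 9 columns.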

```python
# Flatten the per-atom (x, y, z) axis so each configuration becomes one row
dp_positions = positions.reshape(positions.shape[0], positions.shape[1] * 3)
dp_forces = forces.reshape(forces.shape[0], forces.shape[1] * 3)
# Flatten each 3 x 3 cell matrix into 9 values per configuration
dp_cells = cells.reshape(cells.shape[0], 9)

print(dp_positions.shape)
print(dp_forces.shape)
print(dp_cells.shape)
```

```
(1659, 324)
(1659, 324)
(1659, 9)
```
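The flattened arrays map onto DeePMD-kit's plain-text "raw" system layout (`coord.raw`, `force.raw`, `box.raw`, `energy.raw`, `type.raw`, `type_map.raw`, one configuration per row). A minimal sketch of writing them, under two assumptions: the dataset's units already match what DeePMD-kit expects, and atom types are assigned in sorted symbol order. `write_deepmd_raw` is a hypothetical helper, shown on toy data written to a temporary directory.

```python
import os
import tempfile

import numpy as np

def write_deepmd_raw(out_dir, positions, energies, forces, cells, symbols):
    """Write one topology as DeePMD-kit style .raw text files, one frame per row.

    ASSUMPTION: the arrays are already in the units DeePMD-kit expects;
    no unit conversion is performed here.
    """
    os.makedirs(out_dir, exist_ok=True)
    n_configs = len(energies)
    # Integer atom types plus the symbol each index stands for (sorted order)
    type_map, types = np.unique(symbols, return_inverse=True)
    np.savetxt(os.path.join(out_dir, "coord.raw"), positions.reshape(n_configs, -1))
    np.savetxt(os.path.join(out_dir, "force.raw"), forces.reshape(n_configs, -1))
    np.savetxt(os.path.join(out_dir, "box.raw"), cells.reshape(n_configs, 9))
    np.savetxt(os.path.join(out_dir, "energy.raw"), energies)
    np.savetxt(os.path.join(out_dir, "type.raw"), types, fmt="%d")
    with open(os.path.join(out_dir, "type_map.raw"), "w") as fh:
        fh.write("\n".join(type_map.tolist()) + "\n")

# Toy data: 2 configurations of 3 atoms (not real dataset values)
rng = np.random.default_rng(0)
toy_positions = rng.random((2, 3, 3))
toy_energies = rng.random(2)
toy_forces = rng.random((2, 3, 3))
toy_cells = np.tile(np.eye(3) * 10.0, (2, 1, 1))
toy_symbols = np.array(["O", "O", "Si"])

out_dir = tempfile.mkdtemp()
write_deepmd_raw(out_dir, toy_positions, toy_energies, toy_forces, toy_cells, toy_symbols)
print(sorted(os.listdir(out_dir)))
```

With the real data this would be called as `write_deepmd_raw('cha_raw', positions, energies, forces, cells, symbols)`, where `cha_raw` is an illustrative output directory name.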
