# SAPA HDF5 File Contents

This tutorial covers the basics of accessing data stored in the HDF5 format using Python.

The h5py library allows these files to be used within Python, and the SAPA output can be accessed using just this library:

In [4]:
import h5py

hdf = h5py.File("BaTiO3.hdf5", "r")  # The second argument, "r", opens the file in read-only mode.

hdf.keys()



<KeysViewHDF5 ['GM4-', 'GM5-', 'M1+', 'M2+', 'M2-', 'M3+', 'M3-', 'M4+', 'M5+', 'M5-', 'R2-', 'R3-', 'R4-', 'R5+', 'R5-', 'X1+', 'X2+', 'X3-', 'X5+', 'X5-', 'nomodes', 'r_vals', 'temps']>
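When only h5py is used, it can be convenient to open the file in a `with` block so that it is closed automatically, rather than calling `close()` by hand. The following is a minimal sketch of that pattern. It writes a tiny stand-in file first so that it runs without real SAPA output; the file name `demo.hdf5` and the values in it are made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a tiny stand-in file; the names mimic the listing above,
# but the contents are invented for this example.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.hdf5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("temps", data=np.array([15.0, 150.0]))
    f.create_group("GM4-")

# Open read-only; the file is closed automatically when the block exits.
with h5py.File(demo_path, "r") as f:
    top_level_keys = sorted(f.keys())

print(top_level_keys)
```

Anything needed after the block (here the list of keys) has to be copied out while the file is still open.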

There is also a SAPA-specific module (imaginatively named sapa_utils_hdf) that builds on h5py but contains additional functions to perform routine tasks. The rest of this tutorial will use the sapa_utils_hdf module, but the same syntax can be used with the base h5py module.

Let's close the hdf object and re-open the file with the SAPA-specific module.


In [6]:
hdf.close()

import sapa_utils_hdf as su

batio3 = su.sapa_utils_hdf('BaTiO3_iso.cif','BaTiO3.hdf5')

We can access the keys for the whole file:

In [7]:
batio3.hdf.keys()

<KeysViewHDF5 ['GM4-', 'GM5-', 'M1+', 'M2+', 'M2-', 'M3+', 'M3-', 'M4+', 'M5+', 'M5-', 'R2-', 'R3-', 'R4-', 'R5+', 'R5-', 'X1+', 'X2+', 'X3-', 'X5+', 'X5-', 'nomodes', 'r_vals', 'temps']>

And for the individual groups:

In [8]:
batio3.hdf["GM4-"].keys()

<KeysViewHDF5 ['Ba_beq', 'O_beq', 'Rwp', 'Ti_beq', 'a1', 'a2', 'a25', 'a26', 'a27', 'a3', 'a49', 'a50', 'a51', 'a52', 'a53', 'a54', 'delrwp', 'lprm_a', 'lprm_al', 'lprm_b', 'lprm_be', 'lprm_c', 'lprm_ga', 'mode_amps', 'norms', 'phase_scale', 'r2v', 'rv', 'ycalc', 'yobs']>

In the base group, the "r_vals" and "temps" keys are not subgroups, but datasets. These contain the bins for the r-range and the "temps" input when executing SAPA, respectively. 

They can be accessed as follows:

In [11]:
print(batio3.hdf["temps"])
print(batio3.hdf["r_vals"])

<HDF5 dataset "temps": shape (8,), type "<f8">
<HDF5 dataset "r_vals": shape (831,), type "<f8">


The output shows that both objects are datasets, and gives the shape and data type of each. To access the contents of a dataset, use the array slicing syntax familiar to numpy users:

In [14]:
print(batio3.hdf['temps'][:])
print(type(batio3.hdf['temps'][:]))
print(batio3.hdf['temps'][0])

[ 15. 150. 210. 250. 293. 350. 410. 500.]
<class 'numpy.ndarray'>
15.0


We can also convert this to a list:

In [15]:
print(batio3.hdf['temps'][:].tolist())

[15.0, 150.0, 210.0, 250.0, 293.0, 350.0, 410.0, 500.0]
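The distinction between groups and datasets can also be checked programmatically with plain h5py, which may be useful when looping over a file whose layout is not known in advance. The sketch below uses a small synthetic file (invented names and values, not real SAPA output) and relies only on the `h5py.Group` and `h5py.Dataset` classes.

```python
import os
import tempfile

import h5py
import numpy as np

# Synthetic stand-in: one irrep-like group plus two base-level datasets.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.hdf5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("temps", data=np.array([15.0, 150.0, 210.0]))
    f.create_dataset("r_vals", data=np.linspace(1.0, 2.0, 5))
    f.create_group("GM4-").create_dataset("Rwp", data=np.zeros((3, 4), dtype="f4"))

with h5py.File(demo_path, "r") as f:
    # Sort the top-level entries by type.
    groups = [k for k, v in f.items() if isinstance(v, h5py.Group)]
    datasets = {k: v.shape for k, v in f.items() if isinstance(v, h5py.Dataset)}
    # [()] reads the whole dataset into a numpy array, like [:] for 1D data.
    temps = f["temps"][()]

print(groups)
print(datasets)
print(temps)
```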


The group for each irrep contains datasets for all refined parameters and all lattice parameters. Two additional values are calculated: "delrwp" and "mode_amps". The first is the "Rwp" parameter minus the best Rwp from a refinement with no modes active. The "mode_amps" parameter is the square root of the sum of squares of the normalised individual mode amplitudes; the normalisation values are stored in the "norms" dataset. Finally, there are also "ycalc" and "yobs". The former contains the calculated PDF for the best refinement at each temperature.
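To make the two derived quantities concrete, here is a rough numpy sketch of the arithmetic described above. It is not the SAPA implementation: all arrays are invented, the variable names are hypothetical, and the amplitudes are assumed to be already normalised, since how the "norms" values are applied is not shown here.

```python
import numpy as np

# Made-up Rwp values with shape (n_temps, n_cycles); each row is sorted by Rwp.
rwp_irrep = np.array([[6.0, 6.5, 7.0],
                      [5.5, 5.75, 8.0]])
rwp_nomodes = np.array([[6.5, 6.75, 7.0],
                        [6.0, 6.25, 6.5]])

# The best no-mode Rwp per temperature is column 0. Slicing with :1 keeps it
# as a column so that it broadcasts across the cycles axis.
best_nomodes = rwp_nomodes[:, :1]
delrwp = rwp_irrep - best_nomodes

# Hypothetical, already-normalised amplitudes: (n_modes, n_temps, n_cycles).
normalised_amps = np.array([[[3.0, 0.0, 1.0], [0.0, 0.0, 0.0]],
                            [[4.0, 2.0, 0.0], [0.0, 1.0, 0.0]]])

# Square root of the sum of squares over the mode axis.
mode_amps = np.sqrt(np.sum(normalised_amps ** 2, axis=0))

print(delrwp)
print(mode_amps)
```

A negative "delrwp" in this sketch means the refinement with modes active fits better than the best refinement without them.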

All but the "norms", "yobs" and "ycalc" datasets can be used in the same way. If we look at the object, we can see the shape of the dataset:

In [16]:
print(batio3.hdf["GM4-/Rwp"])

<HDF5 dataset "Rwp": shape (8, 100), type "<f4">


The first of these dimensions coincides with the number of temperatures. The second corresponds to the number of cycles specified in the input file, which in this case is 100.

To access the whole dataset, we again slice using numpy syntax:

In [18]:
print(batio3.hdf["GM4-/Rwp"][:,:])

[[ 6.069599  6.071532  6.074318  6.074668  6.07689   6.077997  6.078283
   6.079223  6.079699  6.080241  6.08118   6.082473  6.082709  6.08295
   6.083234  6.084296  6.084697  6.085483  6.085484  6.086281  6.086317
   6.087003  6.087032  6.087139  6.087213  6.087238  6.087623  6.088154
   6.088422  6.088438  6.088509  6.08904   6.08918   6.089858  6.090568
   6.090839  6.091537  6.091915  6.092264  6.092633  6.093086  6.09362
   6.093734  6.09419   6.09554   6.095645  6.095904  6.097288  6.097599
   6.097658  6.097909  6.09797   6.098048  6.098846  6.099749  6.100099
   6.100449  6.100719  6.101068  6.102162  6.102389  6.102584  6.102635
   6.102794  6.103269  6.103693  6.104167  6.106257  6.106305  6.106438
   6.108171  6.1089    6.109912  6.110076  6.112211  6.112446  6.113008
   6.115432  6.116501  6.117186  6.123599  6.124258  6.12559   6.128384
   6.128739  6.130593  6.137005  6.15856  22.08456  22.12822  22.21272
  22.24209  22.31078  27.56256  27.60282  27.60835  27.84317  28.08

We can also slice this like a 2D array to access individual temperatures:

In [17]:
print(batio3.hdf["GM4-/Rwp"][0,:])

[ 6.069599  6.071532  6.074318  6.074668  6.07689   6.077997  6.078283
  6.079223  6.079699  6.080241  6.08118   6.082473  6.082709  6.08295
  6.083234  6.084296  6.084697  6.085483  6.085484  6.086281  6.086317
  6.087003  6.087032  6.087139  6.087213  6.087238  6.087623  6.088154
  6.088422  6.088438  6.088509  6.08904   6.08918   6.089858  6.090568
  6.090839  6.091537  6.091915  6.092264  6.092633  6.093086  6.09362
  6.093734  6.09419   6.09554   6.095645  6.095904  6.097288  6.097599
  6.097658  6.097909  6.09797   6.098048  6.098846  6.099749  6.100099
  6.100449  6.100719  6.101068  6.102162  6.102389  6.102584  6.102635
  6.102794  6.103269  6.103693  6.104167  6.106257  6.106305  6.106438
  6.108171  6.1089    6.109912  6.110076  6.112211  6.112446  6.113008
  6.115432  6.116501  6.117186  6.123599  6.124258  6.12559   6.128384
  6.128739  6.130593  6.137005  6.15856  22.08456  22.12822  22.21272
 22.24209  22.31078  27.56256  27.60282  27.60835  27.84317  28.08797
 28.56302 

As is clear from the above, the Rwp increases as we go from index 0 to index 99 in the array. This is by design: the refinements are sorted by Rwp, and all parameters are stored in the same order.
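Because each row is sorted, it is straightforward to check the ordering and to count how many refinements lie close to the best one. The sketch below uses a handful of values copied from the output above; the tolerance of 0.1 is an arbitrary choice for illustration, not a recommended cut-off.

```python
import numpy as np

# A few Rwp values taken from the first temperature, in stored order.
row = np.array([6.069599, 6.071532, 6.15856, 22.08456, 28.56302])

# A sorted row has no negative steps between neighbours.
is_sorted = bool(np.all(np.diff(row) >= 0))

# Count refinements within an (arbitrary) tolerance of the best Rwp.
tolerance = 0.1
n_close = int(np.count_nonzero(row - row[0] <= tolerance))

print(is_sorted, n_close)
```

The same index range, `[:n_close]`, could then be applied to any other parameter for that temperature, since all parameters share this ordering.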

Taking the first column gives the lowest-Rwp refinement at every temperature. The same slice works for any other parameter:

In [20]:
print(batio3.hdf["GM4-/Rwp"][:,0])

[6.069599 5.861676 5.961901 6.323836 6.354663 6.141548 5.870195 5.873828]
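These best-fit values line up one-to-one with the entries of "temps", so the two arrays can be paired directly. A short sketch, with the numbers copied from the outputs above rather than read from the file:

```python
import numpy as np

# Values copied from the "temps" and best-Rwp outputs shown earlier.
temps = np.array([15.0, 150.0, 210.0, 250.0, 293.0, 350.0, 410.0, 500.0])
best_rwp = np.array([6.069599, 5.861676, 5.961901, 6.323836,
                     6.354663, 6.141548, 5.870195, 5.873828])

# Map each temperature to its lowest Rwp.
best_by_temp = dict(zip(temps.tolist(), best_rwp.tolist()))

# Index of the temperature with the lowest Rwp overall.
i_min = int(np.argmin(best_rwp))

print(temps[i_min], best_rwp[i_min])
```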


Accessing the calculated PDF for a given temperature can also be done using array slicing. The PDF stored corresponds to the lowest Rwp refinement for each temperature:

In [21]:
print(batio3.hdf["GM4-/ycalc"][0,:])


[-9.1879702e-01 -1.1497070e+00 -1.3878150e+00 -1.6397200e+00
 -1.9087170e+00 -2.1995111e+00 -2.5173261e+00 -2.8667440e+00
 -3.2504270e+00 -3.6678891e+00 -4.1144900e+00 -4.5808358e+00
 -5.0527081e+00 -5.5115919e+00 -5.9358211e+00 -6.3022499e+00
 -6.5883260e+00 -6.7743440e+00 -6.8456502e+00 -6.7945452e+00
 -6.6216521e+00 -6.3365688e+00 -5.9576869e+00 -5.5111618e+00
 -5.0290771e+00 -4.5469918e+00 -4.1010561e+00 -3.7249899e+00
 -3.4472220e+00 -3.2884459e+00 -3.2598569e+00 -3.3622320e+00
 -3.5859251e+00 -3.9117999e+00 -4.3129711e+00 -4.7571869e+00
 -5.2096200e+00 -5.6357670e+00 -6.0042300e+00 -6.2890658e+00
 -6.4715562e+00 -6.5412440e+00 -6.4961810e+00 -6.3424301e+00
 -6.0929160e+00 -5.7657919e+00 -5.3825130e+00 -4.9658279e+00
 -4.5379019e+00 -4.1187029e+00 -3.7248111e+00 -3.3686781e+00
 -3.0583670e+00 -2.7977190e+00 -2.5868530e+00 -2.4229131e+00
 -2.3009231e+00 -2.2146490e+00 -2.1573880e+00 -2.1225910e+00
 -2.1043241e+00 -2.0975349e+00 -2.0981560e+00 -2.1030879e+00
 -2.1100910e+00 -2.11764