# Example euxfel_h5tools python package #

This example presents the basic functionality of the Python package euxfel_h5tools provided by the European XFEL. We will parse a run directory, extract information about the run, read the data of individual trains, and combine data from different data sources.

In [1]:
# The European XFEL specific HDF5 tools
from euxfel_h5tools import RunHandler, stack_detector_data

In [7]:
# Path to the data run we want to analyse
run_dir = '/gpfs/exfel/data/scratch/haufs/karabo_ws/r0803/'

The run directory contains many HDF5 files. In this case each of them contains a single data source (an AGIPD detector module), but a file can contain many sources, which makes it difficult to know where to find a particular parameter.

In [3]:
!ls $run_dir | grep .h5

ls: cannot access /gpfs/exfel/data/exp/XMPL/r0803/: No such file or directory
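
The same listing can be produced from Python, without relying on the shell. This is a minimal sketch using only the standard library; the helper name `list_h5_files` is hypothetical and is not part of euxfel_h5tools. Unlike `grep .h5`, where the dot matches any character, the glob pattern below matches only a literal `.h5` suffix.

```python
import tempfile
from pathlib import Path


def list_h5_files(run_dir):
    """Return the sorted names of the .h5 files directly inside run_dir."""
    return sorted(p.name for p in Path(run_dir).glob('*.h5'))


# Demonstration on a throw-away directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('b.h5', 'a.h5', 'notes.txt'):
        (Path(tmp) / name).touch()
    demo_listing = list_h5_files(tmp)

print(demo_listing)
```

In the notebook, the call would be `list_h5_files(run_dir)`.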


In [4]:
help(RunHandler)

Help on class RunHandler in module euxfel_h5tools.reader:

class RunHandler(builtins.object)
 |  Handles a 'run' generated at the European XFEL.
 |  
 |  A 'run' is a directory containing a various amount of HDF5 file recorded
 |  in the European XFEL format. This class can iterate through the data
 |  contained in the run and extract instrument data per XRAY train.
 |  
 |  Parameters
 |  ----------
 |  path: str
 |      Path to the run directory.
 |  
 |  Methods defined here:
 |  
 |  __init__(self, path)
 |  
 |  infos(self)
 |      Show information about the run.
 |  
 |  train_from_id(self, train_id, devices=None)
 |      Get Train data for specified train ID.
 |      
 |      Parameters
 |      ----------
 |      train_id: int
 |          the train ID you want to return
 |      devices: dict, optional
 |          Use to filter data by devices and by parameters, i.e., for::
 |      
 |              dev = {'xray_monitor': {'pulseEnergy', 'beamPosition'}}
 |              for id, da

Instantiating the RunHandler class parses the run directory and sorts the data it contains per train.

In [8]:
# Instantiate the run handler with the path to the run folder.
run1 = RunHandler(run_dir)

You can find basic information about the run with the method `infos()`. `instrument` devices are pulse-related (they have more than one parameter value per train); `control` devices are train-related or slower.

In [10]:
# Display general information about this run.
run1.infos()

Run information
	Duration:       0:02:48.400000
	First train ID: 1541484692
	Last train ID:  1541486376
	# of trains:    251

Devices
	Instruments
	- SPB_DET_AGIPD1M-1/DET/0CH0:xtdf
	- SPB_DET_AGIPD1M-1/DET/10CH0:xtdf
	- SPB_DET_AGIPD1M-1/DET/11CH0:xtdf
	- SPB_DET_AGIPD1M-1/DET/12CH0:xtdf
	- SPB_DET_AGIPD1M-1/DET/13CH0:xtdf
	- SPB_DET_AGIPD1M-1/DET/14CH0:xtdf
	- SPB_DET_AGIPD1M-1/DET/15CH0:xtdf
	- SPB_DET_AGIPD1M-1/DET/1CH0:xtdf
	- SPB_DET_AGIPD1M-1/DET/2CH0:xtdf
	- SPB_DET_AGIPD1M-1/DET/3CH0:xtdf
	- SPB_DET_AGIPD1M-1/DET/4CH0:xtdf
	- SPB_DET_AGIPD1M-1/DET/5CH0:xtdf
	- SPB_DET_AGIPD1M-1/DET/6CH0:xtdf
	- SPB_DET_AGIPD1M-1/DET/7CH0:xtdf
	- SPB_DET_AGIPD1M-1/DET/8CH0:xtdf
	- SPB_DET_AGIPD1M-1/DET/9CH0:xtdf
	Controls
	-
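
The reported duration is consistent with the train ID range if consecutive train IDs are 100 ms apart, which corresponds to the facility's 10 Hz train repetition rate. The sketch below redoes that arithmetic with the values printed above; whether `infos()` derives the duration this way is an assumption. It also shows that the ID range spans far more IDs than the 251 trains recorded, so the run does not contain every train in that range.

```python
from datetime import timedelta

# Values copied from the run information printed above.
first_train_id = 1541484692
last_train_id = 1541486376
trains_in_run = 251

# Assumption: one train every 100 ms (10 Hz).
TRAIN_PERIOD_MS = 100

id_span = last_train_id - first_train_id
duration = timedelta(milliseconds=id_span * TRAIN_PERIOD_MS)
ids_in_span = id_span + 1

print('duration:', duration)
print('train IDs in span:', ids_in_span, '- recorded:', trains_in_run)
```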


The RunHandler class provides a generator method, `trains()`, that iterates over trains.
Each item it yields is a tuple with 2 values: (1) the train ID and (2) the data the train contains.

In [11]:
trains = run1.trains()

# get the first train in the run by calling next().
first_train = next(trains)
print('* The train generator returns a ', type(first_train))

train_id, data = first_train

# train_id is an int (unique identifier for each XRAY train)
print('* The returned train is:', train_id)
# data is a dictionary, each item is a data source.
print('* data sources in the first train:\n', data.keys())

* The train generator returns a  <class 'tuple'>
* The returned train is: 1541484692
* data sources in the first train:
 dict_keys(['SPB_DET_AGIPD1M-1/DET/9CH0:xtdf', 'SPB_DET_AGIPD1M-1/DET/8CH0:xtdf', 'SPB_DET_AGIPD1M-1/DET/14CH0:xtdf', 'SPB_DET_AGIPD1M-1/DET/1CH0:xtdf', 'SPB_DET_AGIPD1M-1/DET/5CH0:xtdf', 'SPB_DET_AGIPD1M-1/DET/2CH0:xtdf', 'SPB_DET_AGIPD1M-1/DET/15CH0:xtdf', 'SPB_DET_AGIPD1M-1/DET/13CH0:xtdf', 'SPB_DET_AGIPD1M-1/DET/12CH0:xtdf', 'SPB_DET_AGIPD1M-1/DET/0CH0:xtdf', 'SPB_DET_AGIPD1M-1/DET/10CH0:xtdf', 'SPB_DET_AGIPD1M-1/DET/7CH0:xtdf', 'SPB_DET_AGIPD1M-1/DET/3CH0:xtdf', 'SPB_DET_AGIPD1M-1/DET/4CH0:xtdf', 'SPB_DET_AGIPD1M-1/DET/6CH0:xtdf', 'SPB_DET_AGIPD1M-1/DET/11CH0:xtdf'])
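
The source names above come back in no particular order, and a plain alphabetical sort places module 10 before module 1. A small sketch of sorting them by module number instead, assuming every name ends in `/DET/<n>CH0:xtdf` as in the output above; `module_number` is a hypothetical helper, not part of euxfel_h5tools.

```python
import re

# A few source names copied from the output above.
sources = [
    'SPB_DET_AGIPD1M-1/DET/9CH0:xtdf',
    'SPB_DET_AGIPD1M-1/DET/14CH0:xtdf',
    'SPB_DET_AGIPD1M-1/DET/1CH0:xtdf',
    'SPB_DET_AGIPD1M-1/DET/10CH0:xtdf',
    'SPB_DET_AGIPD1M-1/DET/0CH0:xtdf',
]


def module_number(source):
    """Extract the module number from a name ending in '/DET/<n>CH0:xtdf'."""
    match = re.search(r'/DET/(\d+)CH0:xtdf$', source)
    if match is None:
        raise ValueError('unexpected source name: {!r}'.format(source))
    return int(match.group(1))


# Sort by the parsed number rather than by the raw string.
sorted_numbers = [module_number(s) for s in sorted(sources, key=module_number)]
print(sorted_numbers)
```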


We want to get the parameter 'image.data' from the source 'SPB_DET_AGIPD1M-1/DET/0CH0:xtdf'.
A data source represents the output data of a device in a Karabo data pipeline.
In this case the source is the output of module 0 of the AGIPD detector at the SPB instrument.
This parameter contains the pixel values for each pulse in the train.

In [12]:
image_mod0 = data['SPB_DET_AGIPD1M-1/DET/0CH0:xtdf']['image.data']

# image.data is a numpy.array
# 1st dimension: pulse index
# 2nd and 3rd dimensions: x, y
print('data shape:', image_mod0.shape)

data shape: (64, 512, 128)


The detector contains 16 modules, each of which is an independent data source.
We can combine all of them in a single array.

In [13]:
# Combine all modules into a single array
full_detector_image = stack_detector_data(data, 'image.data')

# shape: (pulses, modules, x, y)
full_detector_image.shape

(64, 16, 512, 128)
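
The resulting shape can be reproduced conceptually with plain NumPy: stacking 16 arrays of shape (pulses, x, y) along a new axis 1 gives (pulses, modules, x, y). The sketch below illustrates that idea on small made-up arrays. It is not the implementation of `stack_detector_data`, and the helper `module_index` as well as the array sizes are assumptions chosen for the example.

```python
import numpy as np

n_pulses, n_x, n_y = 4, 8, 2  # small made-up sizes to keep the demo light

# Made-up train data: one entry per module, each filled with its module number.
data = {
    'SPB_DET_AGIPD1M-1/DET/{}CH0:xtdf'.format(m): {
        'image.data': np.full((n_pulses, n_x, n_y), m)
    }
    for m in range(16)
}


def module_index(source):
    # '.../DET/12CH0:xtdf' -> 12
    return int(source.rsplit('/', 1)[1].split('CH')[0])


# Order the sources by module number, then stack along a new axis 1.
ordered = sorted(data, key=module_index)
stacked = np.stack([data[s]['image.data'] for s in ordered], axis=1)
print(stacked.shape)
```

With this layout, `stacked[:, 12]` holds all pulses of module 12.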

We can easily iterate over all trains using our train generator. Here we iterate over the next 5 trains and combine all the detector modules into a single array.

In [11]:
from itertools import islice

# Take the next 5 trains from the generator and stack the modules of each.
for tid, data in islice(trains, 5):
    full_detector_image = stack_detector_data(data, 'image.data')
    print('train:', tid, 'det:', full_detector_image.shape)

train: 1541484693 det: (64, 16, 512, 128)
train: 1541486128 det: (64, 16, 512, 128)
train: 1541486129 det: (64, 16, 512, 128)
train: 1541486130 det: (64, 16, 512, 128)
train: 1541486131 det: (64, 16, 512, 128)
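
The train IDs printed so far are not contiguous: the ID jumps from 1541484693 to 1541486128. A small helper can make such jumps explicit. This is a sketch operating on a plain list of IDs; `find_gaps` is a hypothetical name, not a function of euxfel_h5tools.

```python
# Train IDs copied from the outputs above (first train plus the five just printed).
train_ids = [1541484692, 1541484693, 1541486128,
             1541486129, 1541486130, 1541486131]


def find_gaps(ids):
    """Return (previous_id, next_id, n_missing) for every jump in a sorted list."""
    gaps = []
    for prev, nxt in zip(ids, ids[1:]):
        if nxt - prev > 1:
            gaps.append((prev, nxt, nxt - prev - 1))
    return gaps


gaps = find_gaps(train_ids)
print(gaps)
```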


We can also retrieve a specific train contained in the run.

In [12]:
# Retrieve a specific train by its train ID
tid, data = run1.train_from_id(1541486130)
print('retrieved train 1541486130:', tid)
# Or by its index in the run (index 100 is the 101st train)
tid, data = run1.train_from_index(100)
print('retrieved 101st train:', tid)

retrieved train 1541486130: 1541486130
retrieved 101st train: 1541486226


Here we check whether data from any detector module is missing in any train. The number printed is the train's position in the run, not its train ID.

In [13]:
for i in range(len(run1.ordered_trains)):
    nb_sources = len(run1.ordered_trains[i][1])
    if nb_sources < 16:
        print('train {}: only {} modules found'.format(i, nb_sources))

train 2: only 8 modules found
train 250: only 8 modules found
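
The loop above reports how many sources a train has, but not which ones are absent. One way to find them is a set difference against the expected source names. The sketch below assumes the 16 names follow the pattern listed by `infos()`; the train contents are made up for illustration, since the notebook does not show which modules were missing.

```python
# The 16 expected source names, following the pattern shown by infos().
expected = {'SPB_DET_AGIPD1M-1/DET/{}CH0:xtdf'.format(m) for m in range(16)}

# Hypothetical train that only holds the even-numbered modules.
train_data = {'SPB_DET_AGIPD1M-1/DET/{}CH0:xtdf'.format(m): {}
              for m in range(0, 16, 2)}

# Sources that are expected but absent from this train.
missing = sorted(expected - set(train_data))
print(len(missing), 'sources missing')
for source in missing:
    print(' -', source)
```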


In [14]:
run1.train_info(1541486130)

Train [1541486130] information
Devices
	Instruments
	- SPB_DET_AGIPD1M-1/DET/0CH0:xtdf
	- SPB_DET_AGIPD1M-1/DET/10CH0:xtdf
	- SPB_DET_AGIPD1M-1/DET/11CH0:xtdf
	- SPB_DET_AGIPD1M-1/DET/12CH0:xtdf
	- SPB_DET_AGIPD1M-1/DET/13CH0:xtdf
	- SPB_DET_AGIPD1M-1/DET/14CH0:xtdf
	- SPB_DET_AGIPD1M-1/DET/15CH0:xtdf
	- SPB_DET_AGIPD1M-1/DET/1CH0:xtdf
	- SPB_DET_AGIPD1M-1/DET/2CH0:xtdf
	- SPB_DET_AGIPD1M-1/DET/3CH0:xtdf
	- SPB_DET_AGIPD1M-1/DET/4CH0:xtdf
	- SPB_DET_AGIPD1M-1/DET/5CH0:xtdf
	- SPB_DET_AGIPD1M-1/DET/6CH0:xtdf
	- SPB_DET_AGIPD1M-1/DET/7CH0:xtdf
	- SPB_DET_AGIPD1M-1/DET/8CH0:xtdf
	- SPB_DET_AGIPD1M-1/DET/9CH0:xtdf
	Controls
	-


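While retrieving train data, it is possible to keep only the data sources and parameters of interest by passing a `devices` dictionary. As the output shows, a `metadata` entry is returned alongside the requested parameters.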

In [20]:
# dict holding only what we are interested in
devs = {'SPB_DET_AGIPD1M-1/DET/5CH0:xtdf': {'image.data', 'image.gain'}}
    
for i in range(12, 15):
    tid, data = run1.train_from_index(i, devices=devs)
    print('train:', tid)
    print('sources', data.keys())
    print('parameters', data['SPB_DET_AGIPD1M-1/DET/5CH0:xtdf'].keys())
    print('***')

train: 1541486138
sources dict_keys(['SPB_DET_AGIPD1M-1/DET/5CH0:xtdf'])
parameters dict_keys(['image.gain', 'metadata', 'image.data'])
***
train: 1541486139
sources dict_keys(['SPB_DET_AGIPD1M-1/DET/5CH0:xtdf'])
parameters dict_keys(['image.gain', 'metadata', 'image.data'])
***
train: 1541486140
sources dict_keys(['SPB_DET_AGIPD1M-1/DET/5CH0:xtdf'])
parameters dict_keys(['image.gain', 'metadata', 'image.data'])
***
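
The filtering seen above can be pictured as a selection on a nested dictionary. The sketch below mimics the observed behaviour on made-up data: only the listed sources are kept, and within each source only the requested parameters plus `metadata`. It is an illustration under these assumptions, not the library's implementation; `filter_train` and the key `image.pulseId` are hypothetical names used for the example.

```python
def filter_train(data, devices, always_keep=('metadata',)):
    """Keep only the requested sources and, within each, the requested keys."""
    filtered = {}
    for source, wanted in devices.items():
        if source not in data:
            continue  # the source is absent from this train
        keep = set(wanted) | set(always_keep)
        filtered[source] = {key: value
                            for key, value in data[source].items()
                            if key in keep}
    return filtered


# Made-up train data with placeholder values instead of arrays.
train = {
    'SPB_DET_AGIPD1M-1/DET/5CH0:xtdf': {
        'image.data': 'pixels', 'image.gain': 'gain',
        'image.pulseId': 'ids', 'metadata': {},
    },
    'SPB_DET_AGIPD1M-1/DET/6CH0:xtdf': {'image.data': 'pixels', 'metadata': {}},
}
devs = {'SPB_DET_AGIPD1M-1/DET/5CH0:xtdf': {'image.data', 'image.gain'}}

result = filter_train(train, devs)
print(sorted(result['SPB_DET_AGIPD1M-1/DET/5CH0:xtdf']))
```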
