# PYHESSIO module for MC data access: training




## Introduction
The pyhessio package reads CTA Monte Carlo (MC) data generated by CORSIKA and sim_telarray.  
It is a temporary solution until the official data format is chosen.  

This training covers:  
* installation  
* how to use it  
* how it works  
* how to contribute to module development by adding a new getter

## Installation

There are two different ways to install it:  
1. with the standard *python setup.py install*
2. within a conda environment

### Standard Python installation
>  \$ *git clone https://github.com/cta-observatory/pyhessio*  
>  \$ *cd pyhessio*  
>  \$ *python setup.py install*  

This installs the pyhessio module into the site-packages directory of your Python installation 
(../lib/python3.4/site-packages).


### Within a conda environment
If you have not already created a conda environment, create one:
>  \$ *conda create -n cta python=3.4*    
  
>  \$ *source activate cta*  
>  \$ *git clone https://github.com/cta-observatory/pyhessio*   
>  \$ *conda build pyhessio*  
>  \$ *conda install --use-local pyhessio*  

This installs the pyhessio module into the site-packages directory of the cta conda environment 
(~/anaconda3/envs/cta/lib/python3.4/site-packages).

## How to use it


```python
from pyhessio import file_open, move_to_next_event, get_teldata_list, get_adc_sum

# Open an MC hessio file
file_open("/home/jacquem/CTA/pyhessio/pyhessio-extra/datasets/gamma_test.simtel.gz")

# move_to_next_event is a generator that iterates over events;
# limit=3 stops the iteration after 3 events
for run_id, event_id in move_to_next_event(limit=3):
    print(run_id, event_id)

# Get the list of triggered telescopes for the current event
tel_list = get_teldata_list()
print(tel_list)

# Get adc_sum for telescope 17 and channel 0
adc_sum = get_adc_sum(17, 0)
print(adc_sum)
```

Output of the last call:

```
[3145 2958 3321 ..., 2951 3073 3029]
```


<aside class="warning">
When working with ctapipe, do not use pyhessio directly; use the ctapipe.io.hessio module instead.
</aside>

## How does it work
### Build mechanism

There are several ways to wrap the hessio C library; I explored two of them:
1. Wrap hessio functions and data structures with SWIG.  
   This seemed to be the best solution, because all hessio data structures could be wrapped "automatically" by providing SWIG-specific directives, without writing any Python or C code.  
   Unfortunately, the hessio data format consists of C structures that contain other structures and multidimensional arrays.  
   Issue: SWIG only provides typemaps for 1D and 2D arrays, while hessio uses 3D arrays such as *uint16_t adc_sample[H_MAX_GAINS][H_MAX_PIX][H_MAX_SLICES];*  
   As a result, SWIG could not be used.  

2. Write high-level functions in C, compile them into a shared library, and call them from Python.  
   This approach uses numpy.ctypeslib.load_library to load the shared library containing the high-level C functions, which Python functions can then call directly. This is the approach pyhessio uses.

   Cons:
   - Each time a user needs to access a new element of the hess_all_data_struct structure, they must write the corresponding high-level C function and the Python function that calls it.
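
The 3D array that blocked SWIG is, in memory, a single contiguous row-major block, so it maps naturally onto a numpy array of shape (gains, pixels, slices). The toy sketch below is not pyhessio code: it rebuilds an array shaped like *adc_sample* with ctypes, using small made-up sizes in place of H_MAX_GAINS, H_MAX_PIX and H_MAX_SLICES, and views it as a numpy array without copying.

```python
import ctypes

import numpy as np

# Toy sizes standing in for H_MAX_GAINS, H_MAX_PIX and H_MAX_SLICES
# (the real hessio values are much larger).
N_GAINS, N_PIX, N_SLICES = 2, 3, 4

# Equivalent of the C declaration: uint16_t adc_sample[N_GAINS][N_PIX][N_SLICES];
# ctypes nested array types are written innermost dimension first.
AdcSampleArray = ctypes.c_uint16 * N_SLICES * N_PIX * N_GAINS
adc_sample = AdcSampleArray()

# Write one sample with C-style indexing: gain 1, pixel 2, slice 3.
adc_sample[1][2][3] = 7

# View the same memory as a numpy array (no copy is made).
view = np.ctypeslib.as_array(adc_sample)
print(view.shape, view.dtype)  # (2, 3, 4) uint16
print(view[1, 2, 3])           # 7

# Row-major layout: element [g][p][s] sits at flat offset (g*N_PIX + p)*N_SLICES + s.
flat_offset = (1 * N_PIX + 2) * N_SLICES + 3
print(view.ravel()[flat_offset])  # 7
```

Because the numpy view shares memory with the ctypes array, a later write such as `adc_sample[0][0][0] = 5` is immediately visible as `view[0, 0, 0]`.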


#### setuptools Extension

Everything happens in the setup.py file.  
It uses the [Extension](https://docs.python.org/2/extending/building.html) class from the setuptools package:  

```python
from setuptools import Extension

# C extension built from the pyhessio wrapper plus the hessio C sources;
# it is compiled when passed to setup() through ext_modules
pyhessio_module = Extension(
    'pyhessio.pyhessioc',
    sources=['pyhessio/src/pyhessio.c',
             'hessioxxx/src/atmprof.c',
             'hessioxxx/src/current.c',
             'hessioxxx/src/dhsort.c',
             'hessioxxx/src/eventio.c',
             'hessioxxx/src/eventio_registry.c',
             'hessioxxx/src/fileopen.c',
             'hessioxxx/src/histogram.c',
             'hessioxxx/src/hconfig.c',
             'hessioxxx/src/moments.c',
             'hessioxxx/src/io_histogram.c',
             'hessioxxx/src/io_history.c',
             'hessioxxx/src/io_simtel.c',
             'hessioxxx/src/io_trgmask.c',
             'hessioxxx/src/straux.c',
             'hessioxxx/src/warning.c',
             'hessioxxx/src/io_hess.c'],
    include_dirs=['hessioxxx/include', '.'],
    # Equivalent to compiling with -DCTA -DCTA_MAX_SC
    define_macros=[('CTA', None), ('CTA_MAX_SC', None)],
)
```



### Software and data format

MC data is produced by CORSIKA and sim_telarray.   
The MC data format is hessio.  
[Software for CTA MC simulations](https://www.mpi-hd.mpg.de/hfm/CTA/MC/)  
The hessio source code changes with each official production (PROD1, PROD2 and PROD3).  

pyhessio is just a Python wrapper around the hessio source code (written in C).

#### hessio data structure

The top-level data structure is defined in pyhessio/hessioxxx/include/io_hess.h:  


```c
struct hess_all_data_struct
{
  RunHeader run_header;
  MCRunHeader mc_run_header;
  CameraSettings camera_set[H_MAX_TEL];
  CameraOrganisation camera_org[H_MAX_TEL];
  PixelSetting pixel_set[H_MAX_TEL];
  PixelDisabled pixel_disabled[H_MAX_TEL];
  CameraSoftSet cam_soft_set[H_MAX_TEL];
  TrackingSetup tracking_set[H_MAX_TEL];
  PointingCorrection point_cor[H_MAX_TEL];
  FullEvent event;
  MCShower mc_shower;
  MCEvent mc_event;
  TelMoniData tel_moni[H_MAX_TEL];
  LasCalData tel_lascal[H_MAX_TEL];
  RunStat run_stat;
  MCRunStat mc_run_stat;
};
typedef struct hess_all_data_struct AllHessData;
```

#### Filling the AllHessData structure
pyhessio/src/pyhessio.c is the only C file written specifically for pyhessio (the other C sources come from hessioxxx).  
It contains a function that fills AllHessData and a set of getters to access elements of that structure.  

*fill_hsdata* fills the AllHessData structure.  
*move_to_next_event* searches for the next event (using the item header) and calls fill_hsdata.
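
The generator pattern behind move_to_next_event can be sketched in pure Python. This is not pyhessio's implementation: `read_next_event` is a hypothetical stand-in for the C code that finds the next event and calls fill_hsdata, the fake run/event ids are made up, and treating `limit=0` as "no limit" is an assumption of this sketch.

```python
from itertools import count


def move_to_next_event_sketch(read_next_event, limit=0):
    """Yield (run_id, event_id) pairs, reading one event per iteration.

    read_next_event: hypothetical callable that reads and fills the next event
    and returns (run_id, event_id), or None at end of file.
    limit: maximum number of events to yield; 0 means no limit (assumption).
    """
    for n_read in count():
        if limit and n_read >= limit:
            return  # stop before reading more events than requested
        event = read_next_event()
        if event is None:
            return  # end of file
        yield event


# Fake input standing in for a simtel file: four events from run 1.
fake_file = iter([(1, 100), (1, 200), (1, 300), (1, 400)])


def read_fake_event():
    return next(fake_file, None)


for run_id, event_id in move_to_next_event_sketch(read_fake_event, limit=3):
    print(run_id, event_id)
# prints:
# 1 100
# 1 200
# 1 300
```

Because a generator is lazy, each loop iteration reads exactly one event, so getters called inside the loop body work on the event that was just filled.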

#### How do getter functions work?

##### From the C side
A getter function has to be written for each hessio element, e.g. get_adc_sum, get_adc_sample, ...

For example, the ADC sum getter in pyhessio/src/pyhessio.c (body omitted):

```c
//----------------------------------------------------------------
// Returns adc sum for corresponding telescope and channel (HI_GAIN/LOW_GAIN)
// Returns TEL_INDEX_NOT_VALID if telescope index is not valid
//----------------------------------------------------------------
int get_adc_sum(int telescope_id, int channel, uint32_t *data);
```


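##### From the Python side
pyhessio/\_\_init\_\_.py uses numpy.ctypeslib to load the C library:

```python
import ctypes

import numpy as np

# _path: directory where the compiled pyhessioc library is installed
lib = np.ctypeslib.load_library('pyhessioc', _path)
```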

The argument types and return type of each C function must be declared, e.g.:

```python
lib.get_adc_sum.argtypes = [ctypes.c_int, ctypes.c_int,
                            np.ctypeslib.ndpointer(ctypes.c_int32, flags="C_CONTIGUOUS")]
lib.get_adc_sum.restype = ctypes.c_int
```
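
Declaring argtypes is more than documentation: ctypes converts and checks every argument on each call. The sketch below shows, without needing the compiled library, which arrays the ndpointer type used for get_adc_sum accepts; it calls `from_param`, the hook ctypes uses to convert an argument, directly. The helper name `accepted` is made up for illustration.

```python
import ctypes

import numpy as np

# Same pointer type as in the get_adc_sum argtypes declaration.
Int32Buffer = np.ctypeslib.ndpointer(ctypes.c_int32, flags="C_CONTIGUOUS")


def accepted(obj):
    """Return True if ctypes would accept obj for an Int32Buffer argument."""
    try:
        Int32Buffer.from_param(obj)
        return True
    except TypeError:
        return False


print(accepted(np.zeros(5, dtype=np.int32)))             # True
print(accepted(np.zeros(5, dtype=np.float64)))           # False: wrong dtype
print(accepted(np.zeros((5, 2), dtype=np.int32)[:, 0]))  # False: not C-contiguous
print(accepted([0, 0, 0]))                               # False: not an ndarray
```

When the real function is called with a rejected argument, ctypes raises ctypes.ArgumentError instead of handing a bad pointer to C, which is why the Python wrapper always allocates a fresh int32 numpy array.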

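Finally, a Python function is needed:

```python
def get_adc_sum(telescope_id, channel):
    ...
    npix = get_num_pixels(telescope_id)
    # Allocate the output array; the C getter fills it in place
    data = np.zeros(npix, dtype=np.int32)
    result = lib.get_adc_sum(telescope_id, channel, data)
    # result is the status code returned by the C getter (e.g. TEL_INDEX_NOT_VALID)
    return data
```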
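
Put together, the wrapper follows a fixed contract: Python allocates an output array sized from the pixel count, the C getter fills it in place and returns a status code, and Python checks that code before returning the array. The sketch below runs without the compiled library by replacing `lib.get_adc_sum` and `get_num_pixels` with hypothetical pure-Python stand-ins; the fake pixel values, the exception class, the value of TEL_INDEX_NOT_VALID and the convention that 0 means success are all made up for illustration.

```python
import numpy as np

# Hypothetical value; the real TEL_INDEX_NOT_VALID constant comes from the C side.
TEL_INDEX_NOT_VALID = -2

# Fake per-telescope ADC sums standing in for the filled AllHessData structure.
_FAKE_ADC_SUMS = {17: [10, 20, 30, 40]}


class TelescopeIndexError(LookupError):
    """Hypothetical exception for an unknown telescope id."""


def fake_get_num_pixels(telescope_id):
    """Stand-in for get_num_pixels."""
    return len(_FAKE_ADC_SUMS.get(telescope_id, []))


def fake_lib_get_adc_sum(telescope_id, channel, data):
    """Stand-in for the C getter: fills data in place, returns a status code.

    The channel (HI_GAIN/LOW_GAIN) is ignored by this stand-in.
    """
    if telescope_id not in _FAKE_ADC_SUMS:
        return TEL_INDEX_NOT_VALID
    data[:] = _FAKE_ADC_SUMS[telescope_id]
    return 0  # success (assumed convention)


def get_adc_sum(telescope_id, channel):
    npix = fake_get_num_pixels(telescope_id)
    data = np.zeros(npix, dtype=np.int32)  # Python allocates the output buffer
    result = fake_lib_get_adc_sum(telescope_id, channel, data)  # "C" fills it
    if result == TEL_INDEX_NOT_VALID:
        raise TelescopeIndexError(f"telescope id {telescope_id} is not valid")
    return data


print(get_adc_sum(17, 0))  # [10 20 30 40]
```

In this pattern numpy owns the memory, so the C getter never allocates or frees anything; it only writes into the buffer it is given.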



## How to contribute to module development by adding a new getter

1. Find the structure containing what you need in io_hess.h.  
2. Write the getter function in pyhessio/src/pyhessio.c.  
3. Declare its argument types and return type in pyhessio/\_\_init\_\_.py.  
4. Write the Python function that allocates the output data (with numpy) and calls your C getter function.  
5. Write a unit test in pyhessio/tests/test_hessio.py.  
6. Run the tests.  
7. Open a pull request.

Do not forget to also add the new getter to ctapipe.io.hessio (ctapipe/io/hessio.py).
