# Prototyping a Data Storage Model for ChiantiPy

In [None]:
import os

import numpy as np
import pandas
import matplotlib.pyplot as plt
import h5py
import astropy.units as u
import ChiantiPy.tools.util as ch_util
import ChiantiPy.tools.io as ch_io
import ChiantiPy.core as ch

%matplotlib inline

## Build HDF5 file from CHIANTI files
Build a single HDF5 file from the various CHIANTI data files. Ideally, we could read the raw data without relying on the ChiantiPy readers.
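
As a rough illustration of reading a data file without the ChiantiPy readers, the sketch below splits fixed-width records into named columns using only the standard library. The helper `parse_fixed_width`, the column names, and the assumed `.wgfa`-like layout (two 5-character integers followed by three 15-character floats, with a `-1` line ending the data) are assumptions for illustration and should be checked against the actual files. The sample records are made up.

```python
def parse_fixed_width(lines, widths, names, casts):
    """Split fixed-width records into named columns (hypothetical helper).

    Parsing stops at a line containing only '-1', which is assumed here
    to mark the end of the data section.
    """
    columns = {name: [] for name in names}
    for line in lines:
        if line.strip() == '-1':
            break
        start = 0
        for width, name, cast in zip(widths, names, casts):
            columns[name].append(cast(line[start:start + width]))
            start += width
    return columns


# Assumed layout: lower level, upper level, wavelength, gf, A-value
widths = (5, 5, 15, 15, 15)
names = ('lvl1', 'lvl2', 'wvl', 'gf', 'avalue')
casts = (int, int, float, float, float)

# Made-up records written in the assumed layout; not real CHIANTI data
record = '{:5d}{:5d}{:15.3f}{:15.3e}{:15.3e}'
sample = [
    record.format(1, 2, 1000.0, 0.1, 2.0e8),
    record.format(1, 3, 500.0, 0.03, 4.0e7),
    ' -1',
]

columns = parse_fixed_width(sample, widths, names, casts)
print(columns['wvl'])
```

Each column could then be written to the HDF5 file as its own dataset under a path such as `fe/15/wgfa`, matching the layout used in the rest of this notebook.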

## Data Access Model

CHIANTI stores several data files for each ion. The most notable are:

* `.elvlc`: energy levels (in cm$^{-1}$) along with the corresponding level configurations
* `.wgfa`: wavelengths, oscillator strengths, and Einstein A coefficients for the transitions
* `.scups`: temperatures and effective collision strengths for each transition. These replace the old `.splups` files. The `.psplups` files are still present as well
* Additional files:
  * `.fblvl`: information for calculating free-bound continuum
  * `.cilvl`, `.reclvl`: ionization and recombination rates

Essentially, we want one property per file type. Each property returns an object whose `__getitem__` method accepts the keys associated with that file and returns the corresponding data read from the HDF5 file.

Ideally, this file would be built once, the first time you download ChiantiPy, and rebuilt only when your installed CHIANTI database is updated. The filename would then be stored at the package level. For now, we use the CHIANTI HDF5 database file that we have been using in `synthesizAR`.
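
The build-once, rebuild-on-update logic could look something like the sketch below. It assumes a hypothetical sidecar "stamp" file that records the database version the HDF5 file was built from. The function names, the stamp filename, and the version strings are all placeholders, and storing the version as an attribute inside the HDF5 file would work just as well.

```python
import os
import tempfile


def needs_rebuild(db_version, stamp_path):
    """Return True if no build is recorded or it used a different database version."""
    if not os.path.isfile(stamp_path):
        return True
    with open(stamp_path) as f:
        return f.read().strip() != db_version.strip()


def record_build(db_version, stamp_path):
    """Record which database version the HDF5 file was built from."""
    with open(stamp_path, 'w') as f:
        f.write(db_version.strip())


# Usage with placeholder version strings and a temporary stamp file
with tempfile.TemporaryDirectory() as tmpdir:
    stamp = os.path.join(tmpdir, 'chianti_db.version')
    first_run = needs_rebuild('1.0', stamp)   # nothing recorded yet
    record_build('1.0', stamp)
    unchanged = needs_rebuild('1.0', stamp)   # same version recorded
    updated = needs_rebuild('2.0', stamp)     # database was updated

print(first_run, unchanged, updated)
```

Only the version comparison is shown here; the actual build step would run between `needs_rebuild` and `record_build`.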

In [None]:
chianti_hdf5_filename = '/data/datadrive1/ar_forward_modeling/systematic_ar_study/chianti_db.h5'

In [None]:
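class DataIndexer(object):
    """Read datasets for one CHIANTI file type of one ion from the HDF5 database."""
    
    def __init__(self,ion_path,key_conversion):
        # ion_path: HDF5 group path, e.g. 'fe/15/elvlc'
        # key_conversion: maps a readable key to (dataset name, astropy unit)
        self.ion_path = ion_path
        self.key_conversion = key_conversion
    
    def __getitem__(self,key):
        # Keys are strings, so an unknown key raises KeyError rather than IndexError
        if key not in self.key_conversion:
            raise KeyError('{} not a valid attribute for {}'.format(key,self.ion_path))
        dataset_name,unit = self.key_conversion[key]
        # Open the file only for the duration of the read and copy the data into memory
        with h5py.File(chianti_hdf5_filename,'r') as hf:
            data = np.array(hf['/'.join([self.ion_path,dataset_name])])*unit
        return data
    
    def __repr__(self):
        # __repr__ must return a str; the stored reference may come back as bytes or an array
        with h5py.File(chianti_hdf5_filename,'r') as hf:
            ref = hf[self.ion_path].attrs['ref']
        if isinstance(ref,bytes):
            ref = ref.decode('utf-8')
        return str(ref)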
    

In [None]:
class GenericChiantiData(object):
    """Access the CHIANTI data files of a single ion, e.g. 'fe_15', by file type."""
    
    def __init__(self,ion_name):
        self.ion_name = ion_name
        # 'fe_15' -> element 'fe', ionization stage '15'
        self.element = ion_name.split('_')[0]
        self.Z = ch_util.el2z(self.element)
        self.stage = ion_name.split('_')[-1]
        
    @property
    def elvlc(self):
        # more sensible and readable keywords, each mapped to (dataset name, unit)
        key_unit_conversion = {'energy':('ecm',1/u.cm),'level':('lvl',u.dimensionless_unscaled)}
        return DataIndexer('/'.join([self.element,self.stage,'elvlc']),key_unit_conversion)
    
    @property
    def wgfa(self):
        # readable keywords mapped to (dataset name, unit), as for elvlc
        key_unit_conversion = {'wavelength':('wvl',u.angstrom),'einstein_a':('avalue',1/u.s)}
        return DataIndexer('/'.join([self.element,self.stage,'wgfa']),key_unit_conversion)
    

In [None]:
with h5py.File(chianti_hdf5_filename,'r') as hf:
    print([a for a in hf['fe/15/scups'].attrs])
    print(hf['fe/15/elvlc'].attrs['ref'])

In [None]:
chianti_reader = GenericChiantiData('fe_15')

In [None]:
chianti_reader.wgfa

In [None]:
elvlc_data = chianti_reader.elvlc

In [None]:
elvlc_data['level']

In [None]:
GenericChiantiData('mg_12').elvlc['energy']

In [None]:
%%timeit
GenericChiantiData('h_1').wgfa['einstein_a']

In [None]:
%%timeit
ch.ion('h_1').Wgfa['avalue']

This class is specific to a single ion. We could also implement a more generic class for the datasets that are not tied to a single ion, e.g. abundances, ionization potentials, and miscellaneous continuum data.
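
One way such a generic class might look is sketched below, reusing the same key-conversion idea. To keep the sketch self-contained, a plain dictionary keyed by path stands in for the HDF5 file, and units are omitted. The class names, the `'ip'` path, and the values are hypothetical placeholders, not the actual database layout.

```python
class DatasetIndexer:
    """Look up datasets under one path by readable key (hypothetical sketch)."""

    def __init__(self, store, path, key_conversion):
        self.store = store                    # stand-in for the open HDF5 file
        self.path = path                      # e.g. 'ip' or 'fe/15/wgfa'
        self.key_conversion = key_conversion  # readable key -> stored dataset name

    def __getitem__(self, key):
        if key not in self.key_conversion:
            raise KeyError('{} not a valid attribute for {}'.format(key, self.path))
        return self.store['/'.join([self.path, self.key_conversion[key]])]


class GenericData:
    """Expose datasets that are not tied to a single ion."""

    def __init__(self, store):
        self.store = store

    @property
    def ionization_potential(self):
        # The 'ip' group and dataset names are assumptions for illustration
        return DatasetIndexer(self.store, 'ip', {'potential': 'ip'})


# Placeholder values in a dict keyed by path, mimicking hf['group/dataset'] lookups
store = {'ip/ip': [1.0, 2.0, 3.0]}
generic_data = GenericData(store)
print(generic_data.ionization_potential['potential'])
```

An ion-specific class could then build on the same indexer and only supply the ion's path prefix.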

Alternatively, when the CHIANTI HDF5 database is created, these datasets could simply be split up by ion. This would work for everything except the continuum data, which may be a special case anyway.

The main goal is to avoid indexing into the database by hand over and over again. It is simpler to refer to the data by ion name.

Since this kind of data is used in quite a few places, we could provide it as a generic object. This also makes the CHIANTI data easily accessible without the baggage of the ion object if users want to extend it in any way.