Module and notebook for the pipeline (CMFGEN to HDF5) #143

Merged: 20 commits, merged Jul 23, 2019
3 changes: 2 additions & 1 deletion carsus/io/cmfgen/__init__.py
@@ -1,4 +1,5 @@
 from carsus.io.cmfgen.base import (CMFGENOscillatorStrengthsParser,
                                    CMFGENEnergyLevelsParser,
                                    CMFGENCollisionalDataParser,
-                                   CMFGENPhotoionizationCrossSectionParser)
+                                   CMFGENPhotoionizationCrossSectionParser)
+from carsus.io.cmfgen.hdfgen import hdf_dump
55 changes: 55 additions & 0 deletions carsus/io/cmfgen/hdfgen.py
@@ -0,0 +1,55 @@
+import glob
+import logging
+
+logger = logging.getLogger(__name__)
+
+def hdf_dump(cmfgen_dir, patterns, parser, chunk_size=10, ignore_patterns=[]):
+    """Function to parse and dump the entire CMFGEN database.
+
+    Parameters
+    ----------
+    cmfgen_dir : path
+        Path to the CMFGEN atomic database
+    patterns : list of str
+        String patterns to search for
+    parser : object
+        Instance of a CMFGEN parser; a new instance of its class is built for each file
+    chunk_size : int, optional
+        Number of files to parse together, by default 10
+    ignore_patterns : list, optional
+        String patterns to ignore in addition to '.h5', by default []
+    """
+    files = []
+    ignore_patterns = ['.h5'] + ignore_patterns
+    for case in patterns:
+        path = '{0}/**/*{1}*'.format(cmfgen_dir, case)
+        files = files + glob.glob(path, recursive=True)
+
+    for i in ignore_patterns:
+        files = [f for f in files if i not in f]
+
+    n = chunk_size
+    files_chunked = [files[i:i+n] for i in range(0, len(files), n)]
+
+    # Divide read/dump in chunks for less I/O
+    for chunk in files_chunked:
+
+        _ = []
+        for fname in chunk:
+            try:
+                obj = parser.__class__(fname)
+                logger.info('Parsed {}'.format(fname))
+                _.append(obj)
+
+            except TypeError:
+                logger.error('Failed parsing {} (try checking `find_row` function)'.format(fname))
+
+            except UnboundLocalError:
+                logger.error('Failed parsing {} (try checking `to_float` function)'.format(fname))
+
+            except IsADirectoryError:
+                logger.error('Failed parsing {} (is a directory)'.format(fname))
+
+        for obj in _:
+            obj.to_hdf()
+            logger.info('Dumped {}.h5'.format(obj.fname))
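A minimal, stdlib-only sketch of the file-discovery and chunking steps of `hdf_dump`, run against a throwaway directory. `find_files` and `chunk` are hypothetical helper names (not part of carsus), and the directory layout and file names are made up for illustration; the glob pattern and the always-ignored `'.h5'` substring follow the code above.

```python
import glob
import os
import tempfile


def find_files(cmfgen_dir, patterns, ignore_patterns=None):
    """Hypothetical helper mirroring the discovery step of hdf_dump."""
    # '.h5' is always ignored so previously dumped files are not re-parsed.
    ignore = ['.h5'] + list(ignore_patterns or [])
    files = []
    for case in patterns:
        # Recursive glob: '**' matches the root and every subdirectory.
        path = '{0}/**/*{1}*'.format(cmfgen_dir, case)
        files += glob.glob(path, recursive=True)
    # Drop any path containing an ignored substring anywhere in it.
    return [f for f in files if not any(i in f for i in ignore)]


def chunk(items, n):
    """Split items into consecutive lists of at most n elements."""
    return [items[i:i + n] for i in range(0, len(items), n)]


with tempfile.TemporaryDirectory() as root:
    # Made-up layout: each ion directory holds one text file and a
    # previously dumped .h5 copy that discovery should skip.
    for ion in ('CARB/III', 'NIT/IV'):
        ion_dir = os.path.join(root, *ion.split('/'))
        os.makedirs(ion_dir)
        for name in ('osc_op_sp.dat', 'osc_op_sp.dat.h5'):
            open(os.path.join(ion_dir, name), 'w').close()

    found = sorted(find_files(root, ['osc']))
    print([os.path.relpath(f, root) for f in found])

print(chunk(list(range(5)), 2))  # [[0, 1], [2, 3], [4]]
```

Because the ignore filter is a plain substring test, a path containing `.h5` (or any ignored pattern) anywhere, including in a directory name, is dropped. The file list is also not de-duplicated, so a file matched by two patterns is parsed twice.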
3 changes: 2 additions & 1 deletion docs/io/cmfgen/index.rst
@@ -8,4 +8,5 @@ Here we show how to parse and dump files from John Hillier's CMFGEN Atomic Data

     osc_files
     col_files
-    pho_files
+    pho_files
+    pipeline
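The new `pipeline` page is linked here, but its contents are not part of this diff. As a rough, self-contained illustration of the parse-then-dump loop that `hdf_dump` runs for each chunk, the sketch below uses a hypothetical `DummyParser` in place of a real CMFGEN parser. `dump_chunk` is likewise a made-up name, and only the `TypeError` branch is reproduced to keep it short. The assumption that a parser can be built without a file name is made only so an "empty" instance can be passed in.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger('pipeline_sketch')


class DummyParser:
    """Hypothetical stand-in for a CMFGEN parser (not a carsus class).

    It parses on construction from a file name and exposes fname and
    to_hdf(), which is what the loop in hdf_dump relies on.
    """

    dumped = []  # records dump calls so the sketch can be checked

    def __init__(self, fname=None):
        self.fname = fname
        # Simulate a parse failure for files ending in '.bad'.
        if fname is not None and fname.endswith('.bad'):
            raise TypeError('unparseable file')

    def to_hdf(self):
        # A real parser would write '<fname>.h5'; here we only record it.
        DummyParser.dumped.append(self.fname + '.h5')


def dump_chunk(chunk, parser):
    """Parse every file in chunk, then dump only the successes."""
    parsed = []
    for fname in chunk:
        try:
            # parser is an instance; its class builds one object per file.
            parsed.append(parser.__class__(fname))
            logger.info('Parsed %s', fname)
        except TypeError:
            logger.error('Failed parsing %s', fname)
    for obj in parsed:
        obj.to_hdf()
    return len(parsed)


n_ok = dump_chunk(['a.dat', 'b.bad', 'c.dat'], DummyParser())
print(n_ok, DummyParser.dumped)  # 2 ['a.dat.h5', 'c.dat.h5']
```

Passing the class itself (`dump_chunk(..., DummyParser)`) does not work: `DummyParser.__class__` is `type`, so `type(fname)` simply returns `str` and the later `to_hdf()` call raises `AttributeError`. The same reasoning applies to `parser.__class__(fname)` in `hdf_dump`, which is why it needs a parser instance rather than a class.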