# Thermo Library Tools

This Jupyter notebook provides tools for manipulating the RMG thermodynamic database. It currently supports two tasks:
1. Check if a species is contained in certain thermo libraries.
2. Merge different thermo libraries together.

## 0. Initialization
### Import dependencies

In [None]:
import logging
import os

from rmgpy import settings
from rmgpy.data.rmg import RMGDatabase

from toolbox.base import read_species_from_yml, write_species_to_yml
from toolbox.thermolib import (find_thermo_libs, read_thermo_lib_by_path,
                               merge_thermo_lib, draw_free_energies)

%matplotlib inline
%load_ext autoreload
%autoreload 2

logger = logging.getLogger()
logger.setLevel(logging.INFO)

### Load an RMG database instance
Add any RMG built-in thermo libraries you need to `thermo_libraries`. The resulting `thermo_database` is used in the later sections.

In [None]:
database = RMGDatabase()
database.load(
    path=settings['database.directory'],
    thermo_libraries=[],  # Add other libraries here if necessary
    kinetics_families="default",
    reaction_libraries=[],
    kinetics_depositories=['training'],
)

thermo_database = database.thermo

### Assign a log file to record all the changes [OPTIONAL]

In [None]:
fh = logging.FileHandler('thermo_lib_tools.log', mode="a+")
fh.setLevel(logging.INFO)
fh.setFormatter(logging.Formatter('%(asctime)s - %(levelname)s: %(message)s'))
logger.addHandler(fh)
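
Re-running the cell above attaches another `FileHandler` to the root logger each time, so every message would then be written to the log file more than once. One way to avoid this is to check the existing handlers first. The helper below is a minimal sketch; the name `add_file_handler_once` is hypothetical and not part of `toolbox`.

```python
import logging
import os


def add_file_handler_once(logger, log_path, level=logging.INFO):
    """Attach a FileHandler for log_path unless one is already attached.

    Hypothetical helper for illustration; not part of `toolbox`.
    """
    target = os.path.abspath(log_path)
    for handler in logger.handlers:
        # FileHandler stores the absolute path of its file in `baseFilename`
        if isinstance(handler, logging.FileHandler) and handler.baseFilename == target:
            return handler  # already attached, reuse it
    handler = logging.FileHandler(target, mode="a")
    handler.setLevel(level)
    handler.setFormatter(logging.Formatter('%(asctime)s - %(levelname)s: %(message)s'))
    logger.addHandler(handler)
    return handler
```

Calling `add_file_handler_once(logger, 'thermo_lib_tools.log')` in place of the four lines above should then be safe to re-run.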

## 1. Check species in libraries
Check whether the species you plan to run are already contained in other thermo libraries.
- `work_dir` (str): the directory where your ARC jobs are located. You do not need to give the full path to each library file, because the script searches this directory for thermo libraries.
- `yml_file` (str): the full path to a file containing the species you want to check. It was originally designed for ARC input files.
- `disp` (bool): whether to display each species that is not found in any library.
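
The directory search can be pictured as a recursive walk over `work_dir`. The sketch below is not the implementation of `find_thermo_libs`. It assumes, purely for illustration, that each library is a `.py` file inside a folder named `thermo`. The function name `find_library_files` and its `subdir` and `suffix` defaults are hypothetical.

```python
import os


def find_library_files(work_dir, subdir="thermo", suffix=".py"):
    """Return sorted paths of files ending in `suffix` inside any folder named `subdir`.

    Hypothetical stand-in for illustration; the `subdir` and `suffix`
    defaults are assumptions, not the behavior of `find_thermo_libs`.
    """
    matches = []
    for root, _dirs, files in os.walk(work_dir):
        # Only look at files that sit directly in a folder called `subdir`
        if os.path.basename(root) != subdir:
            continue
        for name in files:
            if name.endswith(suffix):
                matches.append(os.path.join(root, name))
    return sorted(matches)
```

With a search like this, pointing `work_dir` at the top-level folder of several ARC projects would be enough to collect all of their libraries in one call.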

In [None]:
work_dir = ''
yml_file = ''
disp = True

In [None]:
# Read the species list from yml_file
spc_list = read_species_from_yml(yml_file)

# Get the thermo libraries under work_dir
thermo_lib_list = find_thermo_libs(work_dir)
for thermo_lib in thermo_lib_list:
    read_thermo_lib_by_path(thermo_lib, thermo_database)

# Find the species not contained in any of the libraries
not_include = []
for spc in spc_list:
    thermo = thermo_database.get_all_thermo_data(spc)
    if len(thermo) == 1:
        # Only the group additivity (GA) estimate is available
        not_include.append(spc)
        if disp:
            display(spc)

# Write the species that are not in any library to a new YAML file
# named 'new_' + [OLD_NAME]; skip writing if nothing is missing
if not_include:
    write_species_to_yml(not_include, yml_file, mode='backup')

## 2. Merge thermo libraries

Merge the libraries found in the working directory into the base library.
- `base_thermo_lib` (str): the full path to an existing RMG thermo library.
- `work_dir` (str): the directory containing your thermo libraries (generated by ARC). You do not need to give the full path to each file, because the script searches this directory for thermo libraries.
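
Conceptually, merging means copying entries into the base library while leaving existing entries untouched. The sketch below shows that idea with plain dictionaries keyed by species label. It does not show how `merge_thermo_lib` works: the function `merge_entries` is hypothetical, and a real merge would likely need to compare molecular structures rather than labels.

```python
def merge_entries(base, to_add):
    """Copy entries from `to_add` into `base`, skipping labels already present.

    Both arguments are plain dicts mapping a species label to its data
    (a simplification for illustration). Returns the skipped labels.
    """
    skipped = []
    for label, data in to_add.items():
        if label in base:
            skipped.append(label)  # keep the base entry unchanged
        else:
            base[label] = data
    return skipped
```

Returning the skipped labels makes it easy to log which species were already present, which fits the log file set up in the initialization section.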

In [None]:
base_thermo_lib = ''
work_dir = ''

In [None]:
# Get the base thermo library
read_thermo_lib_by_path(base_thermo_lib, thermo_database)
base_lib = thermo_database.libraries[base_thermo_lib]

# Get the thermo libraries under work_dir
thermo_lib_list = find_thermo_libs(work_dir)
for thermo_lib in thermo_lib_list:
    read_thermo_lib_by_path(thermo_lib, thermo_database)

# Combine the thermo libraries 
for thermo_lib in thermo_lib_list:
    library_to_add = thermo_database.libraries[thermo_lib]
    merge_thermo_lib(base_lib, library_to_add)
    
# Save the libs
base_lib.save(base_lib.label)
for thermo_lib in thermo_lib_list:
    lib_to_add = thermo_database.libraries[thermo_lib]
    lib_to_add.save(thermo_lib)

### Remove tags [OPTIONAL]

In [None]:
for thermo_lib in thermo_lib_list:
    lib_to_mod = thermo_database.libraries[thermo_lib]
    for spc in lib_to_mod.entries.values():
        spc.shortDesc = ''
    lib_to_mod.save(thermo_lib)