Olefin-paraffin separation

Developing low-cost, energy-efficient materials for separating gas mixtures is a top priority for the scientific community. Metal-organic frameworks (MOFs) have gained considerable momentum in this area because their hallmark properties open several new directions for gas separation. However, the rapidly growing haystack of nanoporous materials makes it highly challenging to screen for suitable candidates. There is therefore an urgent need to identify properties that indicate, at low computational cost, which materials suit a particular application. With this aim, we establish a robust and broadly applicable multistep workflow that combines supervised machine learning (ML) and active learning (AL) algorithms to build data-driven models for predicting the olefin-paraffin selectivity of over 23,000 hypothetical MOFs. Among the supervised models, random forest regression (RFR), combined with recursive feature elimination and hyper-parameter optimisation, showed the best predictive performance for olefin-paraffin selectivity, with a low mean absolute error and a high coefficient of determination. Active learning, a technique that balances the exploration-exploitation trade-off by updating our beliefs about the structure-property correlations in each iteration, was found to be *~ 29 times* more efficient than the best supervised ML models. Additionally, zinc and copper metal nodes in tfzd and hms topological arrangements were found to be prevalent in high-performing MOFs. Overall, the proposed AL/ML models not only achieve predictive performance on par with molecular simulations but also decipher the underlying factors that govern olefin-paraffin separation performance, enabling a coherent path for investigating a large number of MOF configurations.
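To make the supervised step concrete, here is a minimal sketch of random forest regression combined with recursive feature elimination and a small hyper-parameter search, using scikit-learn. It is not the code shipped in this repository: the synthetic data, the number of descriptors, the number of features kept, and the grid values are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical stand-in data: 300 "MOFs" with 8 descriptors each.
# The target depends only on the first three descriptors.
rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 8))
y = (3.0 * X[:, 0] + 2.0 * X[:, 1] ** 2 + np.sin(3.0 * X[:, 2])
     + rng.normal(scale=0.05, size=300))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Step 1: recursive feature elimination keeps the 4 most useful descriptors.
# Step 2: a random forest is fitted on the retained descriptors.
pipeline = Pipeline([
    ("rfe", RFE(RandomForestRegressor(n_estimators=30, random_state=0),
                n_features_to_select=4)),
    ("rfr", RandomForestRegressor(random_state=0)),
])

# Small illustrative grid (assumed values, not the authors' search space).
param_grid = {
    "rfr__n_estimators": [50, 100],
    "rfr__max_depth": [None, 8],
}
search = GridSearchCV(pipeline, param_grid, cv=3,
                      scoring="neg_mean_absolute_error")
search.fit(X_train, y_train)

# Evaluate with the two metrics named in the text: MAE and R^2.
y_pred = search.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
selected = search.best_estimator_.named_steps["rfe"].support_

print(f"MAE = {mae:.3f}, R2 = {r2:.3f}")
print("Selected descriptor mask:", selected)
```

Placing the feature elimination inside the pipeline means it is refitted on each cross-validation fold, so the held-out fold does not influence which descriptors are kept.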

Files, Folders and their contents

  • 1_ML : contains the code and simulation data needed to reproduce our results for all machine learning models.
  • 2_AL : contains the code and simulation data needed to reproduce our results for the active learning models.
  • code : contains the Python scripts used.
  • input : contains the data files required to run the Python notebooks.
  • output : contains all the output data generated by the Python code.
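For readers new to active learning, the loop below is a minimal sketch of the exploration-exploitation idea behind the models in 2_AL. It is not the repository's implementation: it swaps in a scikit-learn Gaussian process with an upper-confidence-bound acquisition rule, and the `simulate_selectivity` function, the two-descriptor pool, and the exploration weight of 2.0 are hypothetical choices made only for this example.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def simulate_selectivity(x):
    """Hypothetical stand-in for an expensive molecular simulation."""
    return np.exp(-((x[:, 0] - 0.7) ** 2 + (x[:, 1] - 0.3) ** 2) / 0.05)


rng = np.random.default_rng(0)
pool = rng.uniform(size=(500, 2))  # candidate pool of hypothetical descriptors

# Start from a handful of randomly chosen, "simulated" candidates.
labelled = [int(i) for i in rng.choice(len(pool), size=5, replace=False)]
y_labelled = [float(v) for v in simulate_selectivity(pool[labelled])]

for _ in range(20):
    # Update the surrogate model with everything labelled so far.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True,
                                  random_state=0)
    gp.fit(pool[labelled], np.array(y_labelled))

    # Upper confidence bound: predicted mean (exploitation)
    # plus a multiple of the predictive uncertainty (exploration).
    mean, std = gp.predict(pool, return_std=True)
    ucb = mean + 2.0 * std
    ucb[labelled] = -np.inf  # never query the same candidate twice

    # Query the most promising candidate and add it to the labelled set.
    nxt = int(np.argmax(ucb))
    labelled.append(nxt)
    y_labelled.append(float(simulate_selectivity(pool[[nxt]])[0]))

best = int(np.argmax(y_labelled))
print(f"Simulations run: {len(labelled)} of {len(pool)} candidates")
print(f"Best value found: {y_labelled[best]:.3f} at pool index {labelled[best]}")
```

Each iteration spends one simulation on the candidate the surrogate considers most promising or most uncertain, which is why such a loop can need far fewer simulations than labelling the whole pool up front.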

Dependencies
The following Python libraries have been used in the current work:

  • pandas
  • scikit-learn
  • Matplotlib
  • seaborn
  • NumPy
  • BoTorch
  • torch (PyTorch)
  • pickle (Python standard library)
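Before running the scripts, it can help to confirm that these libraries are importable. The snippet below is a small convenience sketch using only the standard library; the `REQUIRED` list and the `missing_modules` helper are names introduced here for illustration, and note that scikit-learn is imported as `sklearn`.

```python
import importlib.util

# Import names for the libraries listed above (pickle ships with Python).
REQUIRED = ["pandas", "sklearn", "matplotlib", "seaborn",
            "numpy", "botorch", "torch", "pickle"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
print("Missing:", ", ".join(missing) if missing else "none")
```

Any name reported as missing can then be installed with the package manager of your choice.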

Corresponding Authors

Varad J. Daoo, Jayant K. Singh
Contact Details
Email address: jayantks@iitk.ac.in