DyRAMO

DyRAMO (Dynamic Reliability Adjustment for Multi-objective Optimization) is a framework to perform multi-objective optimization while maintaining the reliability of multiple prediction models.

How to set up

OS Requirements

DyRAMO is supported on Linux operating systems and has been tested on AlmaLinux release 9.3.

Python Dependencies

  • python: 3.11
  • chemtsv2: 1.0.3
  • physbo: 2.0.0
  • (optional) lightgbm: 3.2.1 (for property prediction)

Installation (example)

cd YOUR_WORKSPACE
python3.11 -m venv .venv
source .venv/bin/activate
pip install chemtsv2==1.0.3 physbo==2.0.0 lightgbm==3.2.1

This installation via pip normally finishes in about 40 seconds.

How to run DyRAMO

1. Clone this repository and move into it

git clone git@github.com:ycu-iil/DyRAMO.git
cd DyRAMO

2. Prepare a reward file for molecule generation

DyRAMO employs ChemTSv2 as its molecule generator. Prepare a reward file for ChemTSv2 by following the ChemTSv2 instructions on how to define a reward function. An example reward file for DyRAMO can be found in reward/DyRAMO_reward.py.

3. Prepare a setting file

Please prepare a YAML file containing the settings for both DyRAMO and ChemTSv2. An example setting file can be found in config/setting_dyramo.yaml. The settings are described in the "Settings to run DyRAMO" section.

4. Run DyRAMO

Please execute run.py with the yaml file as an argument.

python run.py -c config/setting_dyramo.yaml

Expected outputs

  • run.log: An execution log file for DyRAMO.
  • search_history.csv: A CSV file containing the explored input variables and the corresponding objective variables.
  • search_result.npz: A binary file containing the search results. Please refer to the PHYSBO documentation for details.
  • result/: A directory containing the results of molecule generation with ChemTSv2.

Expected run time

  • Generating 10,000 molecules with a C-value of 0.01 is expected to take about 10 minutes.
  • In this case, the generation is repeated 40 times, for a total of approximately 7 hours.
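
The total is simple arithmetic. The sketch below assumes each run takes the quoted 10 minutes and that the runs execute one after another; the variable names are illustrative and not taken from DyRAMO.

```python
# Rough run-time estimate from the figures quoted above.
minutes_per_run = 10  # about 10 minutes to generate 10,000 molecules
num_runs = 40         # generation is repeated 40 times

total_minutes = minutes_per_run * num_runs
total_hours = total_minutes / 60

print(f"{total_minutes} minutes = {total_hours:.1f} hours")
# prints "400 minutes = 6.7 hours"
```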

Settings to run DyRAMO

The settings for DyRAMO and ChemTSv2 are described in a single YAML file. The ChemTSv2 settings are only partially listed here; more details can be found in the ChemTSv2 documentation. (The descriptions of the ChemTSv2 settings given here are taken from that documentation.)

| Option | Suboption | Description |
|---|---|---|
| c_val | - | An exploration parameter to balance the trade-off between exploration and exploitation. A larger value (e.g., 1.0) prioritizes exploration, and a smaller value (e.g., 0.1) prioritizes exploitation. |
| threshold_type | - | Threshold type that selects whether each run is limited by time (hours) or by the number of generated molecules (generation_num). |
| hours | - | Time for molecule generation in hours per run. |
| generation_num | - | Number of molecules to be generated per run. |
| reward_function | property | Settings for calculating the reward function, Dscore. Details on setting the Dscore parameters can be found in the following link. |
| search_range | | Search range of reliability levels for each property. Search ranges are defined by upper and lower limits and their intervals. For example, if the upper limit, lower limit, and interval are set to 0.9, 0.1, and 0.2, respectively, the search range is [0.1, 0.3, 0.5, 0.7, 0.9]. |
| | [prop].max | Upper limit of the search range. |
| | [prop].min | Lower limit of the search range. |
| | [prop].step | Interval between search points. |
| DSS | | Settings for defining the DSS score, the objective function in the Bayesian optimization process. |
| | reward.ratio | Proportion of molecules to be evaluated in DSS. The average reward of the top fraction (given by ratio) of the generated molecules is evaluated. |
| | [prop].priority | Priority of each property when adjusting reliability levels. Select one of high, middle, and low for each property. |
| BO | | Settings for Bayesian optimization with PHYSBO. |
| | num_random_search | Number of random search iterations for initialization. |
| | num_bayes_search | Number of search iterations by Bayesian optimization. |
| | score | The type of acquisition function. TS (Thompson Sampling), EI (Expected Improvement), and PI (Probability of Improvement) are available. |
| num_generation | - | Number of molecule generation runs at each search point. |
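
The reward.ratio setting describes averaging the top fraction of rewards. The sketch below illustrates that one step only, not DyRAMO's DSS implementation. The helper name top_ratio_mean, the rounding-up of the molecule count, and the sample rewards are all assumptions made for illustration.

```python
import math


def top_ratio_mean(rewards, ratio):
    """Average of the top `ratio` fraction of rewards (hypothetical helper)."""
    if not rewards:
        raise ValueError("rewards must not be empty")
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    # Assumption: round the count up so that at least one reward is kept.
    # The inner round() guards against float noise such as 7.000000000000001.
    n_top = max(1, math.ceil(round(len(rewards) * ratio, 9)))
    top = sorted(rewards, reverse=True)[:n_top]
    return sum(top) / n_top


# Made-up rewards for ten molecules; the top 20% are 1.0 and 0.9.
rewards = [0.2, 0.9, 0.5, 0.7, 0.1, 0.8, 0.3, 0.6, 0.4, 1.0]
print(top_ratio_mean(rewards, 0.2))
```

With these sample values the two highest rewards are averaged, so the result is close to 0.95.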

Note

Here, [prop] stands for the name of a property to be optimized, and . denotes nesting in the YAML structure. For example, the search_range parameter should be written in YAML as follows.

search_range:
  EGFR:
    min: 0.1
    max: 0.9
    step: 0.01
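
To see which reliability levels a search_range entry covers, the min, max, and step values can be expanded into explicit grid points. This is a minimal sketch assuming the range includes both limits, as in the [0.1, 0.3, 0.5, 0.7, 0.9] example. The function expand_search_range is hypothetical, and how DyRAMO builds its grid internally is not shown here.

```python
def expand_search_range(spec):
    """Return the grid points from min to max (inclusive) in increments of step."""
    lo, hi, step = spec["min"], spec["max"], spec["step"]
    if step <= 0 or hi < lo:
        raise ValueError("need step > 0 and max >= min")
    # Count whole intervals as an integer to avoid accumulating float error;
    # the small epsilon absorbs results such as 79.99999999999999.
    n_intervals = int((hi - lo) / step + 1e-9)
    return [round(lo + i * step, 10) for i in range(n_intervals + 1)]


# Same structure as the YAML snippet above, written as a Python dict.
search_range = {"EGFR": {"min": 0.1, "max": 0.9, "step": 0.01}}
grid = {prop: expand_search_range(spec) for prop, spec in search_range.items()}

print(len(grid["EGFR"]))  # number of candidate reliability levels for EGFR
print(expand_search_range({"min": 0.1, "max": 0.9, "step": 0.2}))
```

A step of 0.01 over [0.1, 0.9] gives 81 candidate levels for a single property, whereas a step of 0.2 gives only 5.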

How to Cite

@article{Yoshizawa2024,
  title = {Avoiding Reward Hacking in Multi-Objective Molecular Design: A Data-Driven Generative Strategy with a Reliable Design Framework},
  url = {https://doi.org/10.26434/chemrxiv-2024-dh681},
  DOI = {10.26434/chemrxiv-2024-dh681},
  journal = {ChemRxiv},
  author = {Yoshizawa, Tatsuya and Ishida, Shoichi and Sato, Tomohiro and Ohta, Masateru and Honma, Teruki and Terayama, Kei},
  year = {2024},
  month = jun
}

Note

If you would like to reproduce the results of the above article, please refer to doc/reproduction_instruction.md.

License

This package is distributed under the MIT License.

Contact
