
About the generation process #11

Closed
tszslovewanpu opened this issue Mar 26, 2024 · 4 comments

@tszslovewanpu

Hello, and great job!
1. When generating the 10K molecules in Table 1, Table 2, or Table 3, should we input some molecules? If so, are they from ZINC250K or MOSES?
2. MolGen can generate better molecules when given inputs, so the generation process is actually an optimization process. Am I right?
Thank you very much!

@ZJU-Fangyin (Collaborator)

Hi,

  1. Yes, all the experiments in the paper require input molecules, since the base model is BART, an encoder-decoder that conditions on an input sequence. A minimal sketch of this input-conditioned generation follows below.
  2. Your understanding is completely correct; this is a work on molecular optimization.
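For reference, here is a minimal sketch of input-conditioned generation with the BART-based model (assuming the zjunlp/MolGen-large checkpoint on Hugging Face; the seed SELFIES string and decoding settings are illustrative, not the paper's exact setup):

from transformers import AutoTokenizer, BartForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("zjunlp/MolGen-large")
model = BartForConditionalGeneration.from_pretrained("zjunlp/MolGen-large")

# Encode a seed molecule as a SELFIES string (benzene, purely as an example).
sf_input = tokenizer("[C][=C][C][=C][C][=C][Ring1][=Branch1]", return_tensors="pt")

# Decode several candidate molecules conditioned on the input.
outputs = model.generate(sf_input["input_ids"],
                         num_beams=5,
                         max_length=55,
                         num_return_sequences=5)
candidates = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True).replace(" ", "") for g in outputs]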

@tszslovewanpu (Author) commented Mar 27, 2024

Thank you!
3. And how does MolGen 7B generate molecules? Is any prompt given to the trained model to start the generation process?
4. Is MolGen 7B designed for the "generation from scratch" task (generating molecules, estimating the whole distribution, and comparing it with the training set), or can it also handle the optimization task?
Thanks again very much!

@ZJU-Fangyin (Collaborator)

  3. MolGen 7B is capable of generating molecules from scratch. You can input a bos_token, or input an incomplete structure for the model to complete.

De novo molecule generation example:

from transformers import AutoTokenizer, LlamaForCausalLM
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained("zjunlp/MolGen-7b")
model = LlamaForCausalLM.from_pretrained("zjunlp/MolGen-7b").to(device)

# Start from the BOS token alone, so the model generates de novo.
sf_input = tokenizer(tokenizer.bos_token, return_tensors="pt").to(device)

# Sampling settings here are illustrative; adjust them for your use case.
molecules = model.generate(input_ids=sf_input["input_ids"],
                           max_length=55,
                           do_sample=True,
                           top_k=30,
                           num_return_sequences=5)
sf_output = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True).replace(" ", "") for g in molecules]

Molecular completion example:

from transformers import AutoTokenizer, LlamaForCausalLM
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained("zjunlp/MolGen-7b")
model = LlamaForCausalLM.from_pretrained("zjunlp/MolGen-7b").to(device)

# Provide an incomplete SELFIES fragment for the model to complete.
sf_input = tokenizer("[C][N][O]", return_tensors="pt").to(device)

# Sampling settings here are illustrative; adjust them for your use case.
molecules = model.generate(input_ids=sf_input["input_ids"],
                           max_length=55,
                           do_sample=True,
                           top_k=30,
                           num_return_sequences=5)
sf_output = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True).replace(" ", "") for g in molecules]
  4. MolGen 7B is primarily designed for de novo molecule generation and for completing molecular structures. However, with appropriate modifications to the model's generate function, it can also take molecular embeddings as input and be used for optimization tasks; a hypothetical scoring sketch follows below.
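As a hypothetical illustration (not code from this repo), one lightweight way to use the completion mode for optimization is to sample many completions of a seed fragment and keep only the top-scoring ones. Here the score is Crippen logP from RDKit, and the selfies package decodes the generated SELFIES strings back to SMILES:

import selfies as sf
from rdkit import Chem
from rdkit.Chem import Crippen

def rank_by_logp(selfies_candidates):
    """Score each generated SELFIES string by logP and sort best-first."""
    scored = []
    for s in selfies_candidates:
        try:
            smiles = sf.decoder(s)      # SELFIES -> SMILES
        except sf.DecoderError:
            continue                    # skip un-decodable outputs
        mol = Chem.MolFromSmiles(smiles)
        if mol is not None:
            scored.append((Crippen.MolLogP(mol), s))
    return sorted(scored, reverse=True)

# e.g. top_candidates = rank_by_logp(sf_output), with sf_output from the completion example above.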

@tszslovewanpu (Author)

Got it!
