
About the generation process #11

Closed
tszslovewanpu opened this issue Mar 26, 2024 · 4 comments

@tszslovewanpu

Hello, and great job!
1. When generating the 10K molecules in Table 1, Table 2, or Table 3, should we input some molecules? If so, are they from ZINC250K or MOSES?
2. MolGen can generate better molecules when given inputs, so the generation process is actually an optimization process. Am I right?
Thank you very much!

@ZJU-Fangyin (Collaborator)

Hi,

  1. Yes, all the experiments in the paper require input molecules, since the base model is BART, an encoder-decoder that conditions on an input sequence. A minimal sketch of this input-conditioned generation follows below.
  2. Your understanding is completely correct; this is a work on molecular optimization.
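For reference, here is a minimal sketch of input-conditioned generation with the BART-based model (assuming the zjunlp/MolGen-large checkpoint on Hugging Face; the seed SELFIES string and decoding settings are illustrative, not the paper's exact setup):

from transformers import AutoTokenizer, BartForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("zjunlp/MolGen-large")
model = BartForConditionalGeneration.from_pretrained("zjunlp/MolGen-large")

# Encode a seed molecule as a SELFIES string (benzene, purely as an example).
sf_input = tokenizer("[C][=C][C][=C][C][=C][Ring1][=Branch1]", return_tensors="pt")

# Decode several candidate molecules conditioned on the input.
outputs = model.generate(sf_input["input_ids"],
                         num_beams=5,
                         max_length=55,
                         num_return_sequences=5)
candidates = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True).replace(" ", "") for g in outputs]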

@tszslovewanpu (Author) commented Mar 27, 2024

Thank you!
3. And how does MolGen 7B generate molecules? Is any prompt given to the trained model to start the generation process?
4. Is MolGen 7B designed for the "generation from scratch" task (generating molecules, estimating the whole distribution, and comparing it with the training set), or can it also handle the optimization task?
Thanks again very much!

@ZJU-Fangyin (Collaborator)

  3. MolGen 7B is capable of generating molecules from scratch. You can input a bos_token, or input an incomplete structure for the model to complete.

De novo molecule generation example:

from transformers import AutoTokenizer, LlamaForCausalLM
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained("zjunlp/MolGen-7b")
model = LlamaForCausalLM.from_pretrained("zjunlp/MolGen-7b").to(device)

# Start from the BOS token alone, so the model generates de novo.
sf_input = tokenizer(tokenizer.bos_token, return_tensors="pt").to(device)

# Sampling settings here are illustrative; adjust them for your use case.
molecules = model.generate(input_ids=sf_input["input_ids"],
                           max_length=55,
                           do_sample=True,
                           top_k=30,
                           num_return_sequences=5)
sf_output = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True).replace(" ", "") for g in molecules]

Molecular completion example:

from transformers import AutoTokenizer, LlamaForCausalLM
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained("zjunlp/MolGen-7b")
model = LlamaForCausalLM.from_pretrained("zjunlp/MolGen-7b").to(device)

# Provide an incomplete SELFIES fragment for the model to complete.
sf_input = tokenizer("[C][N][O]", return_tensors="pt").to(device)

# Sampling settings here are illustrative; adjust them for your use case.
molecules = model.generate(input_ids=sf_input["input_ids"],
                           max_length=55,
                           do_sample=True,
                           top_k=30,
                           num_return_sequences=5)
sf_output = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True).replace(" ", "") for g in molecules]
  4. MolGen 7B is primarily designed for de novo molecule generation and for completing molecular structures. However, with appropriate modifications to the model's generate function, it can also take molecular embeddings as input and be used for optimization tasks; a hypothetical scoring sketch follows below.
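As a hypothetical illustration (not code from this repo), one lightweight way to use the completion mode for optimization is to sample many completions of a seed fragment and keep only the top-scoring ones. Here the score is Crippen logP from RDKit, and the selfies package decodes the generated SELFIES strings back to SMILES:

import selfies as sf
from rdkit import Chem
from rdkit.Chem import Crippen

def rank_by_logp(selfies_candidates):
    """Score each generated SELFIES string by logP and sort best-first."""
    scored = []
    for s in selfies_candidates:
        try:
            smiles = sf.decoder(s)      # SELFIES -> SMILES
        except sf.DecoderError:
            continue                    # skip un-decodable outputs
        mol = Chem.MolFromSmiles(smiles)
        if mol is not None:
            scored.append((Crippen.MolLogP(mol), s))
    return sorted(scored, reverse=True)

# e.g. top_candidates = rank_by_logp(sf_output), with sf_output from the completion example above.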

@tszslovewanpu (Author)

Got it!
