
About preprocess #9

Closed
tszslovewanpu opened this issue Mar 18, 2024 · 2 comments

Comments

@tszslovewanpu

tszslovewanpu commented Mar 18, 2024

Hello, and great job!
Before running the finetuning process, you mentioned generating the candidate datasets with the pretrained model.
Is the pretrained model located at the path moldata/checkpoint/molgen.pkl?
I have downloaded the Hugging Face model but didn't find molgen.pkl.
moldata
├── checkpoint
│   ├── molgen.pkl   # pre-trained model
Thank you!

@tszslovewanpu
Author

We found it in a past commit and have applied for the download permission, thanks!

@ZJU-Fangyin
Collaborator

I have granted download permission for the model, and you can load the HuggingFace model like this:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("zjunlp/MolGen-large")
model = AutoModelForSeq2SeqLM.from_pretrained("zjunlp/MolGen-large")

sf_input = tokenizer("[C][=C][C][=C][C][=C][Ring1][=Branch1]", return_tensors="pt")
# beam search (example generation settings; adjust num_beams, lengths, and num_return_sequences as needed)
molecules = model.generate(input_ids=sf_input["input_ids"],
                           max_length=15,
                           min_length=5,
                           num_return_sequences=5,
                           num_beams=5)
sf_output = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True).replace(" ", "") for g in molecules]
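Note that MolGen works on SELFIES strings, so the decoded outputs above are SELFIES rather than SMILES. As a minimal sketch (not part of the original reply, and assuming the standard selfies package is installed), they can be converted back to SMILES like this:

import selfies as sf

# convert each generated SELFIES string back to a SMILES string
smiles_output = [sf.decoder(s) for s in sf_output]
print(smiles_output)  # e.g. the benzene example input "[C][=C][C][=C][C][=C][Ring1][=Branch1]" decodes to a benzene SMILES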
