
About preprocess #9

Closed
tszslovewanpu opened this issue Mar 18, 2024 · 2 comments

Comments

@tszslovewanpu

tszslovewanpu commented Mar 18, 2024

Hello, and great job!
Before running the finetuning process, you mentioned generating the candidate datasets with the pretrained model.
Is the pretrained model located at the path moldata/checkpoint/molgen.pkl?
I have downloaded the Hugging Face model but didn't find molgen.pkl.
moldata
├── checkpoint
│   ├── molgen.pkl   # pre-trained model
Thank you!

@tszslovewanpu
Author

We found it in a past commit and have applied for the download permission, thanks!

@ZJU-Fangyin
Collaborator

I have granted download permission for the model, and you can load the HuggingFace model like this:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("zjunlp/MolGen-large")
model = AutoModelForSeq2SeqLM.from_pretrained("zjunlp/MolGen-large")

sf_input = tokenizer("[C][=C][C][=C][C][=C][Ring1][=Branch1]", return_tensors="pt")
# beam search (example generation settings; adjust num_beams, lengths, and num_return_sequences as needed)
molecules = model.generate(input_ids=sf_input["input_ids"],
                           max_length=15,
                           min_length=5,
                           num_return_sequences=5,
                           num_beams=5)
sf_output = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True).replace(" ", "") for g in molecules]
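Note that MolGen works on SELFIES strings, so the decoded outputs above are SELFIES rather than SMILES. As a minimal sketch (not part of the original reply, and assuming the standard selfies package is installed), they can be converted back to SMILES like this:

import selfies as sf

# convert each generated SELFIES string back to a SMILES string
smiles_output = [sf.decoder(s) for s in sf_output]
print(smiles_output)  # e.g. the benzene example input "[C][=C][C][=C][C][=C][Ring1][=Branch1]" decodes to a benzene SMILES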
