Support for AutoModelForSeq2SeqLM #16

janpf · 2023-04-25T07:07:23Z

Hi,

Nice library, thanks for your work :)

As far as I understand the code, it natively supports AutoModelForCausalLM (decoder-only models) but currently does not handle AutoModelForSeq2SeqLM (encoder-decoder models), right?

Conceptually, supporting them shouldn't be that different from supporting AutoModelForCausalLM, and it would be cool for my use case. Are they on the roadmap, or could you give me some hints on which pitfalls to avoid when trying to patch it in myself, e.g. how to keep the gradients for the encoder?

Thanks!

voidful · 2023-04-25T10:17:25Z

Hi

You can check the flan-t5 example in the README; it is a Seq2SeqLM model.
https://github.com/voidful/TextRL#example---flan-t5

Colab example:
https://colab.research.google.com/drive/1DYHt0mi6cyl8ZTMJEkMNpsSZCCvR4jM1?usp=sharing

janpf · 2023-04-25T10:26:06Z

Oh sorry, I missed that. I only scanned the imports, and it looked like FlanT5 was being used as a decoder-only architecture 👀

from transformers import AutoModelForCausalLM, AutoTokenizer
[...]
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

Thanks for your quick reply, looks promising :)

janpf closed this as completed Apr 25, 2023
