
Export T5 (encoder-decoder) to ExecuTorch #36486

Open
wants to merge 1 commit into main

Conversation

guangy10
Contributor

@guangy10 guangy10 commented Mar 1, 2025

What does this PR do?

This PR enables exporting the T5 model to ExecuTorch, which many OSS users have asked for.
We need to export the T5 encoder and decoder separately (i.e. to separate .pte files when lowering to ExecuTorch) and compose the encoder and decoder for a specific task (e.g. summarization) in the ExecuTorch runtime. In this PR, I'm demonstrating the implementation in Python.

The T5 encoder is exported with the "encoder_sequence_length" dim being dynamic. The decoder is exported with the "encoder_sequence_length_dim" dim being dynamic and with cache support.
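
For reference, here is a minimal sketch (not the exact code in this PR) of how the encoder side can be exported with a dynamic sequence-length dimension via torch.export; the EncoderWrapper class and the max bound are illustrative assumptions:

import torch
from torch.export import Dim, export
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Illustrative wrapper for the sketch; the PR's Seq2SeqLMExportableModule handles this internally.
class EncoderWrapper(torch.nn.Module):
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder

    def forward(self, input_ids):
        return self.encoder(input_ids=input_ids).last_hidden_state

model = T5ForConditionalGeneration.from_pretrained("google-t5/t5-small").eval()
tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small")
input_ids = tokenizer("summarize: studies have shown ...", return_tensors="pt").input_ids

# Mark dim 1 (sequence length) of input_ids as dynamic; the upper bound is an assumption.
encoder_seq_len = Dim("encoder_sequence_length", max=4096)
exported_encoder = export(
    EncoderWrapper(model.get_encoder()),
    (input_ids,),
    dynamic_shapes={"input_ids": {1: encoder_seq_len}},
)
# The decoder is exported analogously, with its cross-attention input dynamic along
# the same encoder-sequence-length dim and with KV-cache support.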

Tests:

Test export

RUN_SLOW=1 pytest tests/models/t5/test_modeling_t5.py -s -v -k test_export

Test lower to ExecuTorch

In optimum-executorch, patch in this WIP PR: huggingface/optimum-executorch#30
Users can export the T5 model to two separate .pte files (encoder.pte and decoder.pte) and load them to perform the summarization task as simply as follows:

# Assumed import paths for this snippet: ExecuTorchModelForSeq2SeqLM from
# optimum-executorch (per the WIP PR above) and AutoTokenizer from transformers.
from optimum.executorch import ExecuTorchModelForSeq2SeqLM
from transformers import AutoTokenizer

# Export to encoder.pte / decoder.pte with the XNNPACK backend, then run summarization.
model = ExecuTorchModelForSeq2SeqLM.from_pretrained("google-t5/t5-small", recipe="xnnpack")
generated_text = model.text_generation(
    tokenizer=AutoTokenizer.from_pretrained("google-t5/t5-small"),
    prompt="summarize: Simply put, the theory of relativity states that ...",
)
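
Under the hood, composing the two exported programs follows a standard encoder-decoder generation loop: run the encoder once, then call the decoder autoregressively with the cache. A hedged sketch of that flow (greedy decoding) is below; the encoder/decoder callables and their signatures are assumptions for illustration, not the actual optimum-executorch API:

import torch

def greedy_generate(encoder, decoder, tokenizer, prompt, max_new_tokens=64):
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    # Run the exported encoder once; its sequence-length dim is dynamic.
    encoder_hidden_states = encoder(input_ids)
    # T5 starts decoding from the pad token (its decoder_start_token_id).
    next_token = torch.tensor([[tokenizer.pad_token_id]])
    generated = []
    for step in range(max_new_tokens):
        # Assumed decoder signature: one new token per step plus the cache position.
        logits = decoder(next_token, encoder_hidden_states, torch.tensor([step]))
        next_token = torch.argmax(logits[:, -1, :], dim=-1, keepdim=True)
        if next_token.item() == tokenizer.eos_token_id:
            break
        generated.append(next_token.item())
    return tokenizer.decode(generated, skip_special_tokens=True)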

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@ArthurZucker @amyeroberts @qubvel

@guangy10 guangy10 marked this pull request as ready for review March 1, 2025 01:51
@guangy10 guangy10 mentioned this pull request Mar 1, 2025
Member

@qubvel qubvel left a comment

Hi @guangy10! I'm excited to see a new model compatible with ExecuTorch! By the way, can we generalize this approach to other encoder-decoder models to avoid creating a specific *Exportable module for each model?

@guangy10
Contributor Author

guangy10 commented Mar 4, 2025

> Hi @guangy10! I'm excited to see a new model compatible with ExecuTorch! By the way, can we generalize this approach to other encoder-decoder models to avoid creating a specific *Exportable module for each model?

Starting with T5 to ensure the model works end-to-end with ExecuTorch in optimum-executorch, so the second part on the optimum side is WIP. Yes, I think this can be generalized to other encoder-decoder text models.

@guangy10
Contributor Author

guangy10 commented Mar 6, 2025

@qubvel @ArthurZucker I spent a lot of time today looking into other seq2seq LMs like BART and Pegasus. Their decoder code is implemented differently than T5's, and they don't support the Cache object. I'm implementing a different wrapper module for that kind of decoder, but I'm running into a constraint-violation issue that I need more time to look into. I think that can come in a separate PR once it's ready, basically extending the Seq2SeqLMExportableModule I added in this PR. To unblock the optimum-side work, i.e. huggingface/optimum-executorch#30, can I get this PR reviewed? WDYT?

@guangy10
Contributor Author

guangy10 commented Mar 7, 2025

@GregoryComer @tarun292 @larryliu0820 Can you help review this PR? Basically, I want to standardize the way a seq2seq LM should be exported. Later we will need a C++ runtime that can load and run any seq2seq LM as long as it is exported in this standardized way.

@guangy10
Contributor Author

@ArthurZucker @qubvel Can I get this one reviewed?

@guangy10
Contributor Author

@qubvel @ArthurZucker Sharing more research on the model Hub: https://huggingface.co/models?pipeline_tag=text2text-generation&sort=trending. It shows that pretty much all popular and recent Seq2SeqLM variants are T5-based, so IMO enabling the base T5 via this PR provides the highest ROI for users who want to pull a Seq2SeqLM on-device in their applications via ExecuTorch. Here is an example of such a request from a real-world user on Discord: https://discord.com/channels/1334270993966825602/1334270993966825605/1342205676222414908

I have renamed the module to Seq2SeqLMExportableModule, which can be extended to support other seq2seq LMs like BART and Pegasus in the future, as I proposed in my previous comment.

@guangy10
Contributor Author

cc: @tugsbayasgalan

@guangy10
Contributor Author

@ydshieh Do you mind reviewing this PR?
