
[WIP] Support export of Llama with DynamicCache and transformers>=4.48 #24291

Open
wants to merge 27 commits into main

Conversation

@xadupre xadupre (Member) commented Apr 3, 2025

Description

Replaces #24231.

LLMs cannot be exported with transformers>=4.48 using the TorchScript-based exporter; they require the new exporter (torch.onnx.export(..., dynamo=True)). This PR implements patches to work around the pieces of code in transformers that do not export, and it converts dynamic_axes into dynamic shapes.

The modifications were tested with a tiny LLM:

python -m onnxruntime.transformers.models.llama.convert_to_onnx -m arnir0/Tiny-LLM --output Tiny-LLM --precision fp16 --execution_provider cuda --small_gp --use_dynamo_export
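
To illustrate the intended export path, here is a minimal sketch that invokes the dynamo-based exporter directly on the same tiny checkpoint (the input names, example shapes, and Dim objects are assumptions, and with transformers>=4.48 a bare call like this is expected to need the torch_export_patches added in this PR):

import torch
import transformers

# Hedged sketch: call the new torch.export-based ONNX exporter directly.
model = transformers.AutoModelForCausalLM.from_pretrained("arnir0/Tiny-LLM").eval()
input_ids = torch.randint(0, model.config.vocab_size, (2, 8), dtype=torch.int64)
attention_mask = torch.ones_like(input_ids)

batch = torch.export.Dim("batch")
seq = torch.export.Dim("seq")

torch.onnx.export(
    model,
    (input_ids, attention_mask),
    "tiny_llm.onnx",
    dynamo=True,  # use the torch.export-based exporter instead of TorchScript
    dynamic_shapes={  # takes over the role of dynamic_axes in the legacy exporter
        "input_ids": {0: batch, 1: seq},
        "attention_mask": {0: batch, 1: seq},
    },
)

The dynamic_shapes argument is what the PR converts the existing dynamic_axes specification into.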

@@ -7,6 +7,7 @@

import numpy as np
import torch
import transformers

Check notice (Code scanning / CodeQL): Module is imported with 'import' and 'import from' (Note)

Module 'transformers' is imported with both 'import' and 'import from'.
Module 'onnxruntime.test.python.transformers' is imported with both 'import' and 'import from'.

Copilot Autofix (AI), 7 days ago

To fix the problem, we should remove the from transformers import AutoConfig, AutoTokenizer statement and access these components directly from the transformers module. This will make the code more consistent and easier to understand.

  • Remove the from transformers import AutoConfig, AutoTokenizer statement.
  • Update the references to AutoConfig and AutoTokenizer to use transformers.AutoConfig and transformers.AutoTokenizer, respectively.
Suggested changeset 1
onnxruntime/python/tools/transformers/models/llama/llama_inputs.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/onnxruntime/python/tools/transformers/models/llama/llama_inputs.py b/onnxruntime/python/tools/transformers/models/llama/llama_inputs.py
--- a/onnxruntime/python/tools/transformers/models/llama/llama_inputs.py
+++ b/onnxruntime/python/tools/transformers/models/llama/llama_inputs.py
@@ -10,3 +10,2 @@
 import transformers
-from transformers import AutoConfig, AutoTokenizer
 
@@ -32,3 +31,3 @@
 def get_sample_inputs(
-    config: AutoConfig,
+    config: transformers.AutoConfig,
     device: torch.device,
@@ -67,3 +66,3 @@
 def get_sample_with_past_kv_inputs(
-    config: AutoConfig,
+    config: transformers.AutoConfig,
     device: torch.device,
EOF
import torch
import transformers

Check notice (Code scanning / CodeQL): Module is imported with 'import' and 'import from' (Note)

Module 'transformers' is imported with both 'import' and 'import from'.
Module 'onnxruntime.test.python.transformers' is imported with both 'import' and 'import from'.

Copilot Autofix (AI), 7 days ago

To fix the problem, we should remove the from transformers import AutoConfig statement and access AutoConfig through the transformers module instead. This will ensure that the module is only imported once and will make the code more consistent and easier to understand.

  • Remove the from transformers import AutoConfig statement.
  • Replace all instances of AutoConfig with transformers.AutoConfig.
Suggested changeset 1
onnxruntime/python/tools/transformers/models/llama/llama_parity.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/onnxruntime/python/tools/transformers/models/llama/llama_parity.py b/onnxruntime/python/tools/transformers/models/llama/llama_parity.py
--- a/onnxruntime/python/tools/transformers/models/llama/llama_parity.py
+++ b/onnxruntime/python/tools/transformers/models/llama/llama_parity.py
@@ -28,3 +28,2 @@
 from models.torch_export_patches.cache_helper import make_dynamic_cache
-from transformers import AutoConfig
 
@@ -35,3 +34,3 @@
 
-def get_sequence_lengths(args: argparse.Namespace, config: AutoConfig):
+def get_sequence_lengths(args: argparse.Namespace, config: transformers.AutoConfig):
     past_sequence_length, curr_sequence_length = (8, 1) if args.use_past_kv else (0, 8)
@@ -41,3 +40,3 @@
 
-def get_inputs(args: argparse.Namespace, config: AutoConfig):
+def get_inputs(args: argparse.Namespace, config: transformers.AutoConfig):
     # Dummy values for parity
@@ -104,3 +103,3 @@
     pytorch_model: None | torch.nn.Module = None,
-    config: None | AutoConfig = None,
+    config: None | transformers.AutoConfig = None,
 ):
EOF
Comment on lines +10 to +19
def _catch_produce_guards_and_solve_constraints(
previous_function: Callable,
fake_mode: "FakeTensorMode",
gm: "torch.fx.GraphModule",
dynamic_shapes: dict[str, Any] | tuple[Any] | list[Any] | None,
equalities_inputs: "EqualityConstraint", # noqa: F821
original_signature: inspect.Signature,
_is_torch_jit_trace: bool = False,
verbose: int = 0,
):

Check notice (Code scanning / CodeQL): Explicit returns mixed with implicit (fall through) returns (Note)

Mixing implicit and explicit returns may indicate an error as implicit returns always return None.

Copilot Autofix (AI), 7 days ago

To fix the problem, we need to add an explicit return statement at the end of the _catch_produce_guards_and_solve_constraints function. This will ensure that the function's return value is always clear, even when an exception is caught and the if conditions are not met. The explicit return statement should return None, as this is the implicit return value when no return statement is present.

Suggested changeset 1
onnxruntime/python/tools/transformers/models/torch_export_patches/patches/patch_torch.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/onnxruntime/python/tools/transformers/models/torch_export_patches/patches/patch_torch.py b/onnxruntime/python/tools/transformers/models/torch_export_patches/patches/patch_torch.py
--- a/onnxruntime/python/tools/transformers/models/torch_export_patches/patches/patch_torch.py
+++ b/onnxruntime/python/tools/transformers/models/torch_export_patches/patches/patch_torch.py
@@ -43,3 +43,3 @@
             )
-
+        return None
 
EOF
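
The mixed-return notice above comes from the try/except wrapping pattern these patch helpers follow. A hedged sketch of that pattern (not the PR's actual body; how the arguments are forwarded to the wrapped function is an assumption made for illustration):

def _catch_produce_guards_and_solve_constraints_sketch(
    previous_function,  # the original produce_guards_and_solve_constraints
    fake_mode,
    gm,
    dynamic_shapes,
    equalities_inputs,
    original_signature,
    _is_torch_jit_trace=False,
    verbose=0,
):
    try:
        # Forwarding every argument positionally is an assumption of this sketch.
        return previous_function(
            fake_mode,
            gm,
            dynamic_shapes,
            equalities_inputs,
            original_signature,
            _is_torch_jit_trace,
        )
    except Exception as exc:
        # Swallow the constraint failure so the export can continue; report if asked.
        if verbose:
            print(f"produce_guards_and_solve_constraints failed: {exc}")
        return None  # the explicit return is what silences the CodeQL notice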
Comment on lines +46 to +52
def patch__check_input_constraints_for_graph(
previous_function: Callable,
input_placeholders: list[torch.fx.Node],
flat_args_with_path,
range_constraints,
verbose: int = 0,
) -> None:

Check notice (Code scanning / CodeQL): Explicit returns mixed with implicit (fall through) returns (Note)

Mixing implicit and explicit returns may indicate an error as implicit returns always return None.

Copilot Autofix (AI), 7 days ago

To fix the problem, we need to add an explicit return statement at the end of the function patch__check_input_constraints_for_graph. This ensures that the function always returns None explicitly when no exception is raised, making the code easier to read and understand.

  • Add an explicit return None statement at the end of the function patch__check_input_constraints_for_graph.
  • This change should be made in the file onnxruntime/python/tools/transformers/models/torch_export_patches/patches/patch_torch.py.
Suggested changeset 1
onnxruntime/python/tools/transformers/models/torch_export_patches/patches/patch_torch.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/onnxruntime/python/tools/transformers/models/torch_export_patches/patches/patch_torch.py b/onnxruntime/python/tools/transformers/models/torch_export_patches/patches/patch_torch.py
--- a/onnxruntime/python/tools/transformers/models/torch_export_patches/patches/patch_torch.py
+++ b/onnxruntime/python/tools/transformers/models/torch_export_patches/patches/patch_torch.py
@@ -66,3 +66,3 @@
             )
-
+        return None
 
EOF
Comment on lines +304 to +306
# if config.print_specializations:
# self.log.warning(
# "Specializing %s to %s", self.var_to_sources[a][0].name(), tgt

Check notice (Code scanning / CodeQL): Commented-out code (Note)

This comment appears to contain commented-out code.

Copilot Autofix (AI), 7 days ago

To fix the problem, we should remove the commented-out code entirely. This will clean up the code and eliminate any potential confusion for future developers. The specific lines to be removed are 304-308.

Suggested changeset 1
onnxruntime/python/tools/transformers/models/torch_export_patches/patches/patch_torch.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/onnxruntime/python/tools/transformers/models/torch_export_patches/patches/patch_torch.py b/onnxruntime/python/tools/transformers/models/torch_export_patches/patches/patch_torch.py
--- a/onnxruntime/python/tools/transformers/models/torch_export_patches/patches/patch_torch.py
+++ b/onnxruntime/python/tools/transformers/models/torch_export_patches/patches/patch_torch.py
@@ -303,7 +303,7 @@
 
-            # if config.print_specializations:
-            #    self.log.warning(
-            #         "Specializing %s to %s", self.var_to_sources[a][0].name(), tgt
-            #     )
-            #     self.log.debug("SPECIALIZATION", stack_info=True)
+
+
+
+
+
         assert msg != "range_refined_to_singleton", (
EOF
Comment on lines +281 to +288
# if input_ids.shape[1] == 0:
# inputs_embeds = inputs_embeds[:, -cache_position.shape[0] :]
# else:
# if cache_position[-1] >= input_ids.shape[1]:
# input_ids = input_ids[:, -cache_position.shape[0] :]
# else:
# if input_ids.shape[1] != cache_position.shape[0]:
# input_ids = input_ids[:, cache_position]

Check notice (Code scanning / CodeQL): Commented-out code (Note)

This comment appears to contain commented-out code.

Copilot Autofix (AI), 7 days ago

The best way to fix the problem is to remove the commented-out code entirely. This will improve the readability of the code and eliminate any confusion for future developers. If the commented-out code is needed for reference, it should be documented separately or included in a way that clearly indicates its purpose.

  • Remove the commented-out code block from lines 281 to 288.
  • Ensure that the remaining code is properly formatted and functional.
Suggested changeset 1
onnxruntime/python/tools/transformers/models/torch_export_patches/patches/patch_transformers.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/onnxruntime/python/tools/transformers/models/torch_export_patches/patches/patch_transformers.py b/onnxruntime/python/tools/transformers/models/torch_export_patches/patches/patch_transformers.py
--- a/onnxruntime/python/tools/transformers/models/torch_export_patches/patches/patch_transformers.py
+++ b/onnxruntime/python/tools/transformers/models/torch_export_patches/patches/patch_transformers.py
@@ -279,11 +279,3 @@
         else:
-            # This is the code we need to implemented with torch.cond.
-            # if input_ids.shape[1] == 0:
-            #     inputs_embeds = inputs_embeds[:, -cache_position.shape[0] :]
-            # else:
-            #     if cache_position[-1] >= input_ids.shape[1]:
-            #         input_ids = input_ids[:, -cache_position.shape[0] :]
-            #     else:
-            #         if input_ids.shape[1] != cache_position.shape[0]:
-            #             input_ids = input_ids[:, cache_position]
+
             def branch_1(inputs_embeds, cache_position):
EOF
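
The commented-out branching removed above is the control flow the patched forward re-expresses with torch.cond so that both paths stay traceable by the exporter. A minimal, self-contained sketch of that technique, assuming a recent torch where torch.cond is available (the predicate and branch bodies are placeholders, not the PR's logic):

import torch


class CondSketch(torch.nn.Module):
    def forward(self, inputs_embeds, cache_position):
        def branch_1(inputs_embeds, cache_position):
            # placeholder "decode"-style path; must match branch_2's output shape
            return inputs_embeds * 2.0

        def branch_2(inputs_embeds, cache_position):
            # placeholder "prefill"-style path
            return inputs_embeds + 1.0

        # Data-dependent predicate as a 0-dim bool tensor, traceable under export.
        pred = cache_position[-1] >= inputs_embeds.shape[1]
        return torch.cond(pred, branch_1, branch_2, (inputs_embeds, cache_position))


inputs_embeds = torch.randn(2, 8, 16)
cache_position = torch.arange(8)
exported = torch.export.export(CondSketch(), (inputs_embeds, cache_position))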
@github-actions github-actions bot (Contributor) left a comment

You can commit the suggested changes from lintrunner.

@github-actions github-actions bot (Contributor) left a comment

You can commit the suggested changes from lintrunner.

@xadupre xadupre changed the title [WIP] Support export of Llama with DynamicCache and transformers>=4.48 [WIP] Support export of Llama with DynamicCache and transformers>=4.51 Apr 10, 2025
@xadupre xadupre changed the title [WIP] Support export of Llama with DynamicCache and transformers>=4.51 [WIP] Support export of Llama with DynamicCache and transformers>=4.48 Apr 10, 2025
Labels: None yet
Projects: None yet
1 participant