
Add support for loading PyTorch .pt (weights/states) files directly to model's record #1085

Merged: 84 commits merged into tracel-ai:main from import-torch on Jan 25, 2024

Conversation

@antimora (Collaborator) commented Dec 19, 2023

This PR introduces a new feature for loading PyTorch .pt files using Recorder.

Key Highlights:

  1. Loading .pt Files:

    let record = PyTorchFileRecorder::<FullPrecisionSettings>::default()
        .load("mypytorchweights.pt".into())
        .expect("Failed to decode state");
    
    let model = MyModel::<B>::new_with(record);
  2. Remapping Levels/Keys:
    Aligning the source model's levels/keys with the target model (detailed explanation in the book):

    let load_args = LoadArgs::new("mypytorchweights.pt".into())
        .with_key_remap("conv\\.(.*)", "$1"); // Removes "conv" prefix, e.g., "conv.conv1" -> "conv1"
    
    let record = PyTorchFileRecorder::<FullPrecisionSettings>::default()
        .load(load_args)
        .expect("Failed to decode state");
    
    let model = MyModel::<Backend>::new_with(record);
  3. Compatibility with Burn's Modules:
    The loader uses special adapters to match Burn's NN module structure. For example, Linear's weight is stored as [in, out] in Burn, whereas PyTorch stores it as [out, in]; see the sketch after this list. This lets development in Burn proceed without needing to conform to PyTorch's format.

  4. Dynamic Loading and Deployment:
    Dynamic loading from PyTorch files is not necessary for deployments. The pytorch-import example shows how to convert a .pt file to Burn's format at build time, removing the need to link the pickle library (candle-core in our case) at runtime.
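To make point 3 above concrete, here is a minimal sketch of the adapter idea (the helper name is hypothetical; the actual adapters are internal to the recorder):

    use burn::tensor::{backend::Backend, Tensor};

    /// Hypothetical helper illustrating the Linear adapter: PyTorch stores the
    /// weight as [out, in], while Burn's Linear expects [in, out], so the
    /// tensor is transposed while the record is built.
    fn adapt_linear_weight<B: Backend>(pytorch_weight: Tensor<B, 2>) -> Tensor<B, 2> {
        pytorch_weight.transpose()
    }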

Current Limitations:

  1. Candle's pickle library does not currently function on Windows (Candle issue #1454; tracked in Burn issue #1178).
  2. Candle's pickle does not currently unpack boolean tensors (tracked in Burn issue #1179).

Pull Request Template

Checklist

  • Confirmed execution of the run-checks all script.
  • Ensured the book is updated with the changes in this PR. (TODO: I am still working on this.)

Changes

  1. Enhanced burn-import with a new pytorch feature that provides PyTorchFileRecorder.
  2. Refactored the record implementation for [T; N] to avoid conversion into Vec<T> in primitive.rs; instead, implemented serde support for the [T; N] type directly (see the sketch after this list). This is necessary to set default values for [T; N] fields, e.g., kernel_size in conv2d. It's impractical to set a default for Vec<T> and convert it to [T; N], because a Vec defaults to an empty vector, which cannot be converted to [T; N].
  3. Added a record-serde feature in burn-core that can take a NestedValue object and deserialize it into a RecordItem. This is used by PyTorchFileRecorder in burn-import and can also be utilized by other formats, e.g., Safetensors.
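As an illustration of point 2 above, here is a minimal sketch of what direct serde support for a fixed-size array can look like (illustrative only; the actual implementation in burn-core differs in detail):

    use std::fmt;
    use std::marker::PhantomData;

    use serde::de::{self, Deserialize, Deserializer, SeqAccess, Visitor};

    /// Visitor that deserializes a sequence directly into [T; N],
    /// avoiding an intermediate Vec<T>.
    struct ArrayVisitor<T, const N: usize>(PhantomData<T>);

    impl<'de, T, const N: usize> Visitor<'de> for ArrayVisitor<T, N>
    where
        T: Deserialize<'de> + Copy + Default,
    {
        type Value = [T; N];

        fn expecting(&self, f: &mut fmt::Formatter) -> fmt::Result {
            write!(f, "an array of length {}", N)
        }

        fn visit_seq<A: SeqAccess<'de>>(self, mut seq: A) -> Result<Self::Value, A::Error> {
            // Start from defaults so [T; N] always has a valid value,
            // then fill each slot from the sequence.
            let mut out = [T::default(); N];
            for (i, slot) in out.iter_mut().enumerate() {
                *slot = seq
                    .next_element()?
                    .ok_or_else(|| de::Error::invalid_length(i, &self))?;
            }
            Ok(out)
        }
    }

    /// Deserialize a fixed-size array from any serde sequence.
    fn deserialize_array<'de, D, T, const N: usize>(deserializer: D) -> Result<[T; N], D::Error>
    where
        D: Deserializer<'de>,
        T: Deserialize<'de> + Copy + Default,
    {
        deserializer.deserialize_seq(ArrayVisitor::<T, N>(PhantomData))
    }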

Testing

Added a pytorch-tests sub-crate in burn-import that thoroughly tests all NN modules and scenarios using actual PyTorch .pt files.

@antimora changed the title from "[WIP] Add support for import pytorch .pt files" to "[WIP] Add support for importing pytorch .pt files using burn-import" on Dec 19, 2023
@nathanielsimard (Member) left a comment

Great job! It's a lot of work, and I'm sure it's going to be very helpful to a lot of users!

Regarding the linear conversion: I'm unsure we should "force" our modules to store the weights the same way as PyTorch. What I'm sure about is that it would be very cool to have a way to load records of different versions and apply migration. I'm going to propose something soon regarding that.

Resolved review comment on burn-derive/src/record/codegen_struct.rs.
@antimora (Collaborator, Author) commented

@nathanielsimard @louisfd

> Regarding the linear conversion: I'm unsure we should "force" our modules to store the weights the same way as PyTorch.

Yes, I agree it's not ideal, as my solution would not scale to other formats and would constrain our design choices going forward.

> What I'm sure about is that it would be very cool to have a way to load records of different versions and apply migration. I'm going to propose something soon regarding that.

I agree, that would indeed be very cool. It would be even cooler if we could still keep build-time conversion.

I have two possible solutions (A and B):

"A" solution:

  1. A .pt file (PyTorch model file) is converted during the build, and the conversion does not have to be aware of the target model; only loading does. This design choice allows for build-time translation or a CLI tool, and it is what my current implementation accomplishes. Pre-converting, as opposed to converting on the fly, a) allows doing some work in advance, b) eliminates the .pt runtime dependency, and c) allows loading a subset of the weights (e.g., only the encoder and not the decoder).
  2. During record loading, there is additional (minimal) translation, because we can match the name and location of tensors; this is when we have the opportunity to know the target modules. It would mean implementing a custom load function, something like the following:
let record = NamedMpkFileRecorder::<FullPrecisionSettings>::default()
    .load_pytorch_translated(file_path)
    .expect("Failed to decode state");

The caveat of the minimal translation is that there is still run-time conversion (e.g., transposing).

"B" solution:

A model's record is loaded from a .pt file but could be re-saved to Burn's file format. This would let us know which parts map to which target modules (e.g., Linear). However, I am not sure this is achievable using build.rs because of the circular dependency, and a CLI tool is out of the question because the target model info would be missing.

@antimora (Collaborator, Author) commented

@nathanielsimard and I had an offline conversation, and here is the revised summary:

  1. We agreed that the primary goal is to effectively integrate PyTorch weights into the Burn framework while maintaining independence from PyTorch's structural constraints. This involves developing mechanisms for importing, patching, and handling weights and module structures in a way that aligns with Burn's unique architecture.

  2. The generated "Record" struct will provide essential information about the target module, including its hierarchical position, name, and module type (e.g., Linear or BatchNorm).

  3. For PyTorch integration, we will use a PyTorchFileRecorder that functions as follows:

    let record: MyModelRecord = PyTorchFileRecorder::<FullPrecisionSettings>::default()
        .load(file_path)
        .expect("Failed to load .pt file");
    let model = MyModel::<Backend>::new_with(record);
  4. MyModelRecord is a record type that can be saved in various Burn formats using the existing recorders, such as NamedMpkFileRecorder, PrettyJsonFileRecorder, BinFileRecorder, etc. (sketched below).
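For example, re-saving the imported record in Burn's named MessagePack format could look like this (a sketch against the existing Recorder API; it assumes the record loaded in point 3, and the output path is illustrative):

    use burn::record::{FullPrecisionSettings, NamedMpkFileRecorder, Recorder};

    // Re-save the record imported from the .pt file in Burn's own format,
    // so deployments no longer need the PyTorch loader at all.
    NamedMpkFileRecorder::<FullPrecisionSettings>::default()
        .record(record, "my_model_converted".into())
        .expect("Failed to save the record");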

This solution achieves decoupling and offers the following advantages:

  1. It enables dynamic or build-time conversion.
  2. It can be implemented for other formats besides PyTorch.
  3. It enhances accuracy as users are not required to tag module types manually.
  4. It remains flexible, allowing for changes in module names from the source.

@Luni-4 (Collaborator) left a comment

Thanks a lot for your work @antimora!

Just some advice:

  • Remove a dependency
  • Use an API that is more path-based and less str-based

Resolved review comments on burn-import/src/bin/pytorch2burn.rs, burn-import/src/pytorch/converter.rs, burn-import/src/pytorch/remapping.rs, and Cargo.toml.
@antimora (Collaborator, Author) commented

Just to update everyone: I have a solution that accomplishes what @nathanielsimard and I discussed. I researched serde extensively, and it's achievable through a custom deserializer alone; no code changes in the core or derive crates are required.

@nathanielsimard (Member) left a comment

LGTM

@Luni-4 (Collaborator) left a comment

Fine for me too! Thanks a lot for your hard work! 😃

Just a final question: will the TODO in the first message be covered in a follow-up PR, or is this an intermediate review?

@nathanielsimard nathanielsimard merged commit 0368409 into tracel-ai:main Jan 25, 2024
13 of 14 checks passed
@nathanielsimard nathanielsimard deleted the import-torch branch January 25, 2024 15:20
@antimora (Collaborator, Author) commented

> Fine for me too! Thanks a lot for your hard work! 😃
>
> Just a final question: will the TODO in the first message be covered in a follow-up PR, or is this an intermediate review?

The documentation and the filing of TODO issues will be done next, in a new PR. The TODO comment you found regarding Conv group testing was removed because I am testing similar aspects with kernel_size > 1.

/// See [Replacement](https://docs.rs/regex/latest/regex/struct.Regex.html#method.replace)
/// for the replacement syntax.
pub fn with_key_remap(mut self, pattern: &str, replacement: &str) -> Self {
    let regex = Regex::new(&format!("^{}$", pattern)).unwrap();
@Nikaidou-Shinku (Contributor) commented

I don't think it's a good idea to insert ^ and $ here; it will mislead users.
There are also many use cases where ^ and $ must be absent. For example, I tried to rename all keys like conv.0 to conv0, because I used a workaround to implement PyTorch's Sequential in Burn. I want to do this for the whole model, so it would be convenient if I could use .with_key_remap(r#"([a-z]+)\.(\d+)"#, "$1$2").
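To illustrate the difference, here is a small standalone sketch (not code from the PR; it uses the regex crate directly with the key shape from the example above):

    use regex::Regex;

    fn main() {
        let name = "conv.0.weight";

        // Anchored, as with_key_remap currently builds the pattern: it must
        // match the whole key, so "conv.0.weight" does not match at all.
        let anchored = Regex::new(r"^([a-z]+)\.(\d+)$").unwrap();
        assert!(!anchored.is_match(name));

        // Unanchored: the pattern matches the "conv.0" portion inside the key.
        let unanchored = Regex::new(r"([a-z]+)\.(\d+)").unwrap();
        assert_eq!(unanchored.replace_all(name, "$1$2"), "conv0.weight");
    }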

@antimora (Collaborator, Author) replied

Sounds good. If you could submit a quick PR or issue, we will merge/fix it.

Thanks for letting us know.

@antimora (Collaborator, Author) replied

@Nikaidou-Shinku I submitted a PR fix: #1196

let mut new_name = name.clone();
for (pattern, replacement) in &key_remap {
    if pattern.is_match(&name) {
        new_name = pattern.replace_all(&name, replacement.as_str()).to_string();
A Contributor commented

Since LoadArgs::with_key_remap inserts ^ and $, the pattern can match at most once, so the "all" in replace_all has no effect here.

@antimora antimora mentioned this pull request Jan 30, 2024