
# Improve the record type performance #1151

**Closed** · nathanielsimard opened this issue Jan 18, 2024 · 5 comments · Fixed by #1161
**nathanielsimard** (Member) commented Jan 18, 2024
The current `NamedMpkGz` is too slow in terms of serialization and deserialization, so we should investigate alternatives or improve it.

nathanielsimard added the `performance` label on Jan 18, 2024
**antimora** (Collaborator) commented:

It might be because of compression. There is an open bug where serialization does not finish if compression is enabled. We should check both compressed and uncompressed.

**laggui** (Member) commented Jan 18, 2024

It is definitely related to the compression. I collected some reference numbers while I was working on my ResNet implementation (in release mode, because compression is even slower in debug mode).

| Model | Format | Load | Save |
| --- | --- | --- | --- |
| ResNet-18 | `NamedMpkFileRecorder` | 80 ms | 261 ms |
| ResNet-50 | `NamedMpkFileRecorder` | 201 ms | 575 ms |
| ResNet-152 | `NamedMpkFileRecorder` | 463 ms | 1296 ms |
| ResNet-18 | `NamedMpkGzFileRecorder` | 634 ms | 7172 ms |
| ResNet-50 | `NamedMpkGzFileRecorder` | 1512 ms | 15345 ms |
| ResNet-152 | `NamedMpkGzFileRecorder` | 3405 ms | 36456 ms |
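(For context, timings like these can be collected with a plain wall-clock harness. The sketch below is a minimal illustration of the approach; the payload generator and the in-memory sink are hypothetical stand-ins for the real recorder I/O.)

```rust
use std::io::Write;
use std::time::Instant;

// Hypothetical stand-in for a serialized record payload.
fn dummy_payload(len: usize) -> Vec<u8> {
    (0..len).map(|i| (i % 251) as u8).collect()
}

// Time a single "save" into an in-memory sink.
fn time_write(payload: &[u8]) -> std::time::Duration {
    let start = Instant::now();
    let mut sink: Vec<u8> = Vec::with_capacity(payload.len());
    sink.write_all(payload).expect("in-memory write cannot fail");
    // Keep the sink observable so the write is not optimized away.
    assert_eq!(sink.len(), payload.len());
    start.elapsed()
}

fn main() {
    let payload = dummy_payload(1 << 20); // ~1 MiB stand-in for a serialized record
    println!("save took {} ms", time_write(&payload).as_millis());
}
```

As noted above, measurements like this should be done in release mode, since compression is even slower in debug builds.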

The difference is quite substantial, even if we switch from `Compression::default()` to `Compression::fast()` here:

| Model | Format | Load | Save |
| --- | --- | --- | --- |
| ResNet-18 | `NamedMpkGzFileRecorder` | 643 ms | 3842 ms |
| ResNet-50 | `NamedMpkGzFileRecorder` | 1397 ms | 8540 ms |
| ResNet-152 | `NamedMpkGzFileRecorder` | 3181 ms | 19746 ms |

The MessagePack format without compression is much faster, but does not provide checksum validation for corrupted files. Ideally we could provide a format that is faster than the current `NamedMpkGzFileRecorder` and that provides checksum validation.
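One way to get checksum validation without paying for compression would be to append a digest to the uncompressed payload and verify it on load. The sketch below illustrates the idea only; FNV-1a is used here purely because it is easy to inline, and a real implementation would use something like CRC32 or SHA-256. The `seal`/`unseal` names are hypothetical.

```rust
// 64-bit FNV-1a hash (illustrative digest only, not collision-resistant).
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Append an 8-byte digest to the serialized record.
fn seal(payload: &[u8]) -> Vec<u8> {
    let mut out = payload.to_vec();
    out.extend_from_slice(&fnv1a(payload).to_le_bytes());
    out
}

/// Split off and check the digest; returns the payload only if it is intact.
fn unseal(sealed: &[u8]) -> Option<&[u8]> {
    if sealed.len() < 8 {
        return None;
    }
    let (payload, digest) = sealed.split_at(sealed.len() - 8);
    let expected = u64::from_le_bytes(digest.try_into().ok()?);
    (fnv1a(payload) == expected).then_some(payload)
}

fn main() {
    let sealed = seal(b"record bytes");
    assert!(unseal(&sealed).is_some());
    let mut corrupted = sealed.clone();
    corrupted[0] ^= 0xff;
    assert!(unseal(&corrupted).is_none());
    println!("checksum round-trip ok");
}
```

This keeps the fast uncompressed MessagePack path while still detecting corrupted files.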

**laggui** (Member) commented Jan 19, 2024

So I explored some alternatives to the gzip compression today, namely zlib-ng (supported by flate2) and lz4 (with bindings from lzzzz).

For zlib-ng, which is supposed to be faster than gzip, we get better compression times, and decompression is similar at the default compression level.

Recorder implementation — add the following to `file.rs`:

```rust
use flate2::{read::ZlibDecoder, write::ZlibEncoder, Compression};

/// File recorder using the [named msgpack](rmp_serde) format with zlib-ng.
#[derive(new, Debug, Default, Clone)]
pub struct NamedMpkZlibFileRecorder<S: PrecisionSettings> {
    _settings: PhantomData<S>,
}

impl<S: PrecisionSettings> FileRecorder for NamedMpkZlibFileRecorder<S> {
    fn file_extension() -> &'static str {
        "zz"
    }
}

impl<S: PrecisionSettings> Recorder for NamedMpkZlibFileRecorder<S> {
    type Settings = S;
    type RecordArgs = PathBuf;
    type RecordOutput = ();
    type LoadArgs = PathBuf;

    fn save_item<I: Serialize>(
        &self,
        item: I,
        mut file: Self::RecordArgs,
    ) -> Result<(), RecorderError> {
        let writer = str2writer!(file)?;
        let mut writer = ZlibEncoder::new(writer, Compression::default());
        // let mut writer = ZlibEncoder::new(writer, Compression::fast());
        rmp_serde::encode::write_named(&mut writer, &item)
            .map_err(|err| RecorderError::Unknown(err.to_string()))?;

        Ok(())
    }

    fn load_item<I: DeserializeOwned>(&self, mut file: Self::LoadArgs) -> Result<I, RecorderError> {
        let reader = str2reader!(file)?;
        let reader = ZlibDecoder::new(reader);
        let state = rmp_serde::decode::from_read(reader)
            .map_err(|err| RecorderError::Unknown(err.to_string()))?;

        Ok(state)
    }
}
```

Note: requires `features = ["zlib-ng"]` for `flate2`.

| Model | Format | Load | Save |
| --- | --- | --- | --- |
| ResNet-18 | `NamedMpkZlibFileRecorder` | 703 ms | 4406 ms |
| ResNet-50 | `NamedMpkZlibFileRecorder` | 1526 ms | 9573 ms |
| ResNet-152 | `NamedMpkZlibFileRecorder` | 3638 ms | 22404 ms |

But when using the lowest compression level, it seems that zlib-ng is actually a tiny bit slower than gzip.

| Model | Format | Load | Save |
| --- | --- | --- | --- |
| ResNet-18 | `NamedMpkZlibFileRecorder` | 663 ms | 4077 ms |
| ResNet-50 | `NamedMpkZlibFileRecorder` | 1487 ms | 8741 ms |
| ResNet-152 | `NamedMpkZlibFileRecorder` | 3343 ms | 20833 ms |

For lz4 we actually use the lz4f module, as it includes content checksum validation. Compression results are a lot better (~3x) than any other contender, though load times seemed a little slower than gzip decompression.

Recorder implementation — add the following to `file.rs`:

```rust
use lzzzz::lz4f::{ContentChecksum, PreferencesBuilder, ReadDecompressor, WriteCompressor};

/// File recorder using the [named msgpack](rmp_serde) format with lz4f compression.
#[derive(new, Debug, Default, Clone)]
pub struct NamedMpkLz4fFileRecorder<S: PrecisionSettings> {
    _settings: PhantomData<S>,
}

impl<S: PrecisionSettings> FileRecorder for NamedMpkLz4fFileRecorder<S> {
    fn file_extension() -> &'static str {
        "lz4f"
    }
}

impl<S: PrecisionSettings> Recorder for NamedMpkLz4fFileRecorder<S> {
    type Settings = S;
    type RecordArgs = PathBuf;
    type RecordOutput = ();
    type LoadArgs = PathBuf;

    fn save_item<I: Serialize>(
        &self,
        item: I,
        mut file: Self::RecordArgs,
    ) -> Result<(), RecorderError> {
        let writer = str2writer!(file)?;
        let pref = PreferencesBuilder::new()
            .content_checksum(ContentChecksum::Enabled)
            // .compression_level(6)
            .build();

        let mut writer = WriteCompressor::new(writer, pref)
            .map_err(|err| RecorderError::Unknown(err.to_string()))?;

        rmp_serde::encode::write_named(&mut writer, &item)
            .map_err(|err| RecorderError::Unknown(err.to_string()))?;

        Ok(())
    }

    fn load_item<I: DeserializeOwned>(&self, mut file: Self::LoadArgs) -> Result<I, RecorderError> {
        let reader = str2reader!(file)?;
        let reader =
            ReadDecompressor::new(reader).map_err(|err| RecorderError::Unknown(err.to_string()))?;
        let state = rmp_serde::decode::from_read(reader)
            .map_err(|err| RecorderError::Unknown(err.to_string()))?;

        Ok(state)
    }
}
```

Note: requires `lzzzz = "1.0.4"` in your `Cargo.toml`.

| Model | Format | Load | Save |
| --- | --- | --- | --- |
| ResNet-18 | `NamedMpkLz4fFileRecorder` | 895 ms | 1270 ms |
| ResNet-50 | `NamedMpkLz4fFileRecorder` | 2163 ms | 2779 ms |
| ResNet-152 | `NamedMpkLz4fFileRecorder` | 5002 ms | 6752 ms |

When disabling the checksum, save and load times are slightly faster, but there is still some overhead:

| Model | Format | Load | Save |
| --- | --- | --- | --- |
| ResNet-18 | `NamedMpkLz4fFileRecorder` | 808 ms | 1109 ms |

For usability, it still leaves much to be desired when loading simple models like a ResNet.

TL;DR: as discussed offline with @nathanielsimard, the current MessagePack serialization with `NamedMpkFileRecorder` is probably the most sensible default at this time. Checksum validation is probably only relevant for models that we distribute, so we could handle it when downloading the files instead.

Edit: this experiment was done with relatively small models; I can't imagine how bad the user experience would be for even bigger ones. See also: a note on checksums for loading/saving models.
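Handling validation at download time, as suggested above, could look roughly like the sketch below: compare a digest of the downloaded bytes against a published hex string. Everything here is illustrative; the toy digest stands in for a real hash like SHA-256, and the function names are hypothetical.

```rust
// Toy digest (byte sum + length, hex-encoded); a real pipeline would
// publish a SHA-256 of the distributed model file instead.
fn byte_digest_hex(bytes: &[u8]) -> String {
    let sum: u64 = bytes.iter().map(|&b| b as u64).sum();
    format!("{:016x}{:08x}", sum, bytes.len() as u32)
}

// Accept the download only if its digest matches the published value.
fn verify_download(bytes: &[u8], published_hex: &str) -> bool {
    byte_digest_hex(bytes) == published_hex
}

fn main() {
    let file = b"model weights".to_vec();
    let published = byte_digest_hex(&file); // would ship alongside the model file
    assert!(verify_download(&file, &published));
    assert!(!verify_download(b"corrupted download", &published));
    println!("download verified");
}
```

This keeps the fast default recorder unchanged and pays the validation cost only once, at download time.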

**antimora** (Collaborator) commented:

Thanks for exploring alternatives. I agree we should make MessagePack the default, without compression. This is what I did for the default ONNX export type.

**Luni-4** (Collaborator) commented Jan 20, 2024

We also have https://github.com/memorysafety/zlib-rs as a possible alternative, but it is still in an embryonic state.
