Excessive reparsing of ambr files in tests with large snapshots #837

Open · StephanMeijer opened this issue Nov 22, 2023 · 5 comments

StephanMeijer commented Nov 22, 2023

Describe the bug

We are encountering a problem in our testing process where we handle very large DOCX/XML files. Our test setup involves using the syrupy library to create snapshots. Here's a snippet of our test code:

import lxml.etree
import pytest
from syrupy import snapshot

from nldoc_p4_docx.docx_processing.docx.document import Document
from tests.conftest import all_fixture_paths, matcher


@pytest.mark.parametrize('xml', all_fixture_paths('docx/documents'), indirect=True)
def test_loading_documents(snapshot, xml: lxml.etree.Element):
    assert Document(xml) == snapshot(matcher=matcher)

While generating large snapshot files is not a problem in itself, we are facing an issue where the .ambr file is completely reparsed for each test, even though each test only needs the snapshot for its specific fixture.

To reproduce

Run tests that use large fixtures, which results in the creation of large snapshot files; every test that shares the resulting .ambr file then pays the cost of parsing it.

Expected behavior

We expect that testing a large fixture would slow down only that specific test, rather than affecting the performance of all tests that use the same snapshot file. A potential solution might be a setting that splits snapshots with different names into separate files (see the sketch below).
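
A rough sketch of that idea using syrupy's existing use_extension hook (the fixture name is illustrative, and the stock SingleFileSnapshotExtension stores raw bytes, so this is not a drop-in fix for our Document objects):

import pytest
from syrupy.extensions.single_file import SingleFileSnapshotExtension


@pytest.fixture
def snapshot_single(snapshot):
    # One snapshot file per test, instead of one shared .ambr file per test module.
    return snapshot.use_extension(SingleFileSnapshotExtension)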

Environment (please complete the following information):

  • OS: macOS, Darwin 23.1.0
  • Syrupy Version: 4.6.0
  • Python Version: 3.11.6

Additional context

Our repository is available at: https://gitlab.com/toegang-voor-iedereen/conversion/preprocessing/docx-preprocessor

StephanMeijer changed the title from "Performance with huge snapshots and using fixtures" to "Excessive reparsing of ambr files in tests with large snapshots" on Nov 22, 2023
StephanMeijer (Author) commented:

I saw a huge increase in performance after splitting my tests into separate files, but it still seems to take a long time for syrupy to parse the snapshot files.

StephanMeijer (Author) commented:

As also mentioned in #119, with this extension I reduced the time my tests took from around 200 seconds to around 5.

from syrupy.extensions.amber.serializer import AmberDataSerializer
from syrupy.extensions.single_file import SingleFileSnapshotExtension

class SingleFileAmberSnapshotExtension(SingleFileSnapshotExtension):
    def serialize(self, data, **kwargs):
        return AmberDataSerializer.serialize(data, **kwargs).encode()
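
(The encode() call is needed because SingleFileSnapshotExtension writes in binary mode by default.)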

StephanMeijer (Author) commented:

Got things working perfectly with the following snippet:

from syrupy.extensions.amber.serializer import AmberDataSerializer
from syrupy.extensions.single_file import SingleFileSnapshotExtension, WriteMode

class SingleFileAmberSnapshotExtension(SingleFileSnapshotExtension):
    _file_extension = 'ambr'
    _write_mode = WriteMode.TEXT

    def serialize(self, data, **kwargs):
        # Text write mode, so the serialized amber output is written as-is.
        return AmberDataSerializer.serialize(data, **kwargs)
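
For completeness, a minimal sketch of how the extension is hooked up via a conftest.py fixture override (a common pytest pattern; names and wiring are illustrative and assume the imports shown above):

import pytest


@pytest.fixture
def snapshot(snapshot):
    # Override the plugin's snapshot fixture so every test reads and writes
    # its own single-file .ambr snapshot through the extension above.
    return snapshot.use_extension(SingleFileAmberSnapshotExtension)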

This means I am not going to invest any more time in this ticket. If there are no objections, this ticket can be closed.

noahnu (Collaborator) commented Nov 25, 2023

Let's keep the issue open. We shouldn't be parsing the snapshot file more than once. There's a cache. It must be getting invalidated. Not sure when I'll have time to dig into this but will loop back to this when I have a chance.
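
(To illustrate the general idea only, not syrupy's actual implementation: the intended behaviour is roughly a per-session cache keyed by file path, so each snapshot file is parsed once and reused; if that cache is cleared or keyed incorrectly between tests, every test pays the full parse cost again.)

from functools import lru_cache

@lru_cache(maxsize=None)
def parse_snapshot_file(path: str) -> str:
    # Placeholder for real .ambr parsing; the point is that repeated calls
    # with the same path must not reread or reparse the file.
    with open(path, encoding="utf-8") as f:
        return f.read()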

StephanMeijer (Author) commented:

> Let's keep the issue open. We shouldn't be parsing the snapshot file more than once. There's a cache. It must be getting invalidated. Not sure when I'll have time to dig into this but will loop back to this when I have a chance.

That is interesting, as the performance difference is about a factor of a hundred.
