Excessive reparsing of ambr files in tests with large snapshots #837

Open · StephanMeijer opened this issue Nov 22, 2023 · 5 comments

StephanMeijer commented Nov 22, 2023

Describe the bug

We are encountering a problem in our testing process where we handle very large DOCX/XML files. Our test setup involves using the syrupy library to create snapshots. Here's a snippet of our test code:

import lxml.etree
import pytest
from syrupy import snapshot

from nldoc_p4_docx.docx_processing.docx.document import Document
from tests.conftest import all_fixture_paths, matcher


@pytest.mark.parametrize('xml', all_fixture_paths('docx/documents'), indirect=True)
def test_loading_documents(snapshot, xml: lxml.etree.Element):
    assert Document(xml) == snapshot(matcher=matcher)

While generating large snapshot files is not a problem in itself, we are facing an issue where the .ambr file is completely reparsed for each test, even though each test only needs the snapshot for its specific fixture.

To reproduce

Run tests that use large fixtures, which results in the creation of large snapshot files; every test that shares the resulting .ambr file then pays the cost of parsing it.

Expected behavior

We expect that testing a large fixture would slow down only that specific test, rather than affecting the performance of all tests that use the same snapshot file. A potential solution might be a setting that splits snapshots with different names into separate files (see the sketch below).
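
A rough sketch of that idea using syrupy's existing use_extension hook (the fixture name is illustrative, and the stock SingleFileSnapshotExtension stores raw bytes, so this is not a drop-in fix for our Document objects):

import pytest
from syrupy.extensions.single_file import SingleFileSnapshotExtension


@pytest.fixture
def snapshot_single(snapshot):
    # One snapshot file per test, instead of one shared .ambr file per test module.
    return snapshot.use_extension(SingleFileSnapshotExtension)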

Environment (please complete the following information):

  • OS: macOS, Darwin 23.1.0
  • Syrupy Version: 4.6.0
  • Python Version: 3.11.6

Additional context

Our repository is available at: https://gitlab.com/toegang-voor-iedereen/conversion/preprocessing/docx-preprocessor

StephanMeijer changed the title from "Performance with huge snapshots and using fixtures" to "Excessive reparsing of ambr files in tests with large snapshots" on Nov 22, 2023
StephanMeijer (Author) commented:

I saw a huge increase in performance after splitting my tests into separate files, but it still seems to take a long time for syrupy to parse the snapshot files.

StephanMeijer (Author) commented:

As also mentioned in #119, with this extension I reduced the time my tests took from around 200 seconds to around 5.

from syrupy.extensions.amber.serializer import AmberDataSerializer
from syrupy.extensions.single_file import SingleFileSnapshotExtension

class SingleFileAmberSnapshotExtension(SingleFileSnapshotExtension):
    def serialize(self, data, **kwargs):
        return AmberDataSerializer.serialize(data, **kwargs).encode()
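
(The encode() call is needed because SingleFileSnapshotExtension writes in binary mode by default.)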

StephanMeijer (Author) commented:

Got things working perfectly with the following snippet:

from syrupy.extensions.amber.serializer import AmberDataSerializer
from syrupy.extensions.single_file import SingleFileSnapshotExtension, WriteMode

class SingleFileAmberSnapshotExtension(SingleFileSnapshotExtension):
    _file_extension = 'ambr'
    _write_mode = WriteMode.TEXT

    def serialize(self, data, **kwargs):
        # Text write mode, so the serialized amber output is written as-is.
        return AmberDataSerializer.serialize(data, **kwargs)
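
For completeness, a minimal sketch of how the extension is hooked up via a conftest.py fixture override (a common pytest pattern; names and wiring are illustrative and assume the imports shown above):

import pytest


@pytest.fixture
def snapshot(snapshot):
    # Override the plugin's snapshot fixture so every test reads and writes
    # its own single-file .ambr snapshot through the extension above.
    return snapshot.use_extension(SingleFileAmberSnapshotExtension)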

This means I am not going to invest any more time in this ticket. If there are no objections, this ticket can be closed.

noahnu (Collaborator) commented Nov 25, 2023

Let's keep the issue open. We shouldn't be parsing the snapshot file more than once. There's a cache. It must be getting invalidated. Not sure when I'll have time to dig into this but will loop back to this when I have a chance.
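
(To illustrate the general idea only, not syrupy's actual implementation: the intended behaviour is roughly a per-session cache keyed by file path, so each snapshot file is parsed once and reused; if that cache is cleared or keyed incorrectly between tests, every test pays the full parse cost again.)

from functools import lru_cache

@lru_cache(maxsize=None)
def parse_snapshot_file(path: str) -> str:
    # Placeholder for real .ambr parsing; the point is that repeated calls
    # with the same path must not reread or reparse the file.
    with open(path, encoding="utf-8") as f:
        return f.read()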

StephanMeijer (Author) commented:

> Let's keep the issue open. We shouldn't be parsing the snapshot file more than once. There's a cache. It must be getting invalidated. Not sure when I'll have time to dig into this but will loop back to this when I have a chance.

That is interesting, as the performance difference is about a factor of a hundred.
