@faridyagubbayli
Collaborator

@faridyagubbayli faridyagubbayli commented Jan 15, 2024

closes #259

Adds caching support for the generated MATLAB references. This lets us avoid regenerating the reference files on every test run, which speeds up our tests, improves the developer experience, and reduces GitHub's energy bill :)

The logic is simple: if the MATLAB reference generation files have not changed, pull the previously generated files (if they exist). I also changed how data is shared between GitHub Actions jobs. Previously, we used artifacts to share the data; now we use the cache directly. From what I observed, the cache is faster than artifacts.
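As a rough illustration of the logic described above (not the code in this PR, which lives in the workflow file), the check can be sketched in Python: derive one key from the contents of the generation files, reuse the stored references if that key has been seen before, and regenerate otherwise. The names `cache_key`, `get_references`, and the `generate` callback are hypothetical and exist only for this sketch.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_key(generator_files):
    """Derive a single key from the names and contents of all generator files."""
    digest = hashlib.sha256()
    for path in sorted(Path(p) for p in generator_files):
        digest.update(path.name.encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


def get_references(generator_files, cache_root, generate):
    """Return (directory, cache_hit); `generate(out_dir)` runs only on a miss."""
    cache_root = Path(cache_root)
    cache_root.mkdir(parents=True, exist_ok=True)
    target = cache_root / cache_key(generator_files)
    if target.is_dir():
        # Cache hit: the generator files are unchanged, reuse the old output.
        return target, True
    # Cache miss: generate into a temporary directory first, so a failed
    # run never leaves a half-filled directory that looks like a hit.
    tmp = Path(tempfile.mkdtemp(dir=cache_root))
    generate(tmp)
    tmp.rename(target)
    return target, False
```

Because the key covers every generator file, editing any one of them changes the key and forces a full regeneration, which matches the all-or-nothing behaviour described in this PR.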

Some numbers:

| Workflow | MATLAB reference generation duration | Total test duration |
|---|---|---|
| Artifacts based workflow (without caching) | 3 mins 36 secs | 8 mins 9 secs |
| Caching based workflow (cache-miss) | 2 mins 22 secs | 6 mins 53 secs |
| Caching based workflow (cache-hit) | 15 secs | 5 mins 20 secs |

^ The first and second scenarios are similar because both generate the reference files. The second is faster because it uses the cache, rather than artifacts, to share data between jobs.
^^ On a cache-hit, the pytest runs start significantly sooner than in the first two scenarios, so you can start reading test logs earlier than before.
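To make the differences concrete, the reported durations can be converted to seconds and compared. This is only arithmetic on the single-run numbers reported above, not a benchmark; the dictionary keys are labels made up for this sketch.

```python
def to_seconds(minutes, seconds):
    """Convert a 'M mins S secs' duration to seconds."""
    return minutes * 60 + seconds


# Durations as reported above (one run per scenario).
generation = {
    "artifacts": to_seconds(3, 36),
    "cache_miss": to_seconds(2, 22),
    "cache_hit": to_seconds(0, 15),
}
total = {
    "artifacts": to_seconds(8, 9),
    "cache_miss": to_seconds(6, 53),
    "cache_hit": to_seconds(5, 20),
}

# Seconds saved by a cache-hit relative to the artifacts based workflow.
generation_saved = generation["artifacts"] - generation["cache_hit"]
total_saved = total["artifacts"] - total["cache_hit"]

print(f"generation: {generation_saved} s saved")
print(f"total: {total_saved} s saved "
      f"({100 * total_saved / total['artifacts']:.0f}% of the artifacts run)")
```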

Future improvement: use file-level caching. Currently, if any of the MATLAB reference generation files is updated, all references are regenerated. In the future, we can work towards more fine-grained caching.
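One possible shape for file-level caching, sketched in Python under a strong assumption: each generation file produces its own references independently of the others. The manifest (a mapping from file name to the content hash recorded at the last generation) and the function names are hypothetical, not part of this PR.

```python
import hashlib
from pathlib import Path


def file_hash(path):
    """Content hash of a single generator file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def stale_generators(generator_files, manifest):
    """Return the generator files whose hash differs from the manifest.

    Only these would need to be re-run; references for the other
    files could be pulled from the cache unchanged.
    """
    return [
        Path(p)
        for p in generator_files
        if manifest.get(Path(p).name) != file_hash(p)
    ]


def record_generation(generator_files, manifest):
    """Store the current hash of each file after a successful generation."""
    for p in generator_files:
        manifest[Path(p).name] = file_hash(p)
    return manifest
```

If the generation files share helper code, the independence assumption breaks and the shared files would have to be folded into every per-file hash.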

@waltsims
Owner

Please see #264 for my suggested changes.

| Workflow | MATLAB reference generation duration | Total test duration |
|---|---|---|
| Artifacts based workflow (without caching) | 3 mins 36 secs | 8 mins 9 secs |
| Caching based workflow (cache-miss) | 2 mins 22 secs | 6 mins 53 secs |
| Caching based workflow (cache-hit) | 15 secs | 5 mins 20 secs |
| New caching based workflow (cache-miss) | 2 mins 45 secs | 4 mins 23 secs |
| New caching based workflow (cache-hit) | 6 secs | 4 mins 6 secs |

I don't know how indicative these numbers are, but the workflow file is simpler.

@faridyagubbayli
Collaborator Author

> I don't know how indicative these numbers are, but the workflow file is simpler.

Agreed, these are not proper performance benchmarks. I just wanted to demonstrate the speedup based on the run times reported in GitHub. I used only a single test run for each row.

@waltsims waltsims merged commit 8b94bdc into master Jan 16, 2024
@waltsims waltsims deleted the feature/cache-matlab-references-between-runs branch January 16, 2024 15:40

Development

Successfully merging this pull request may close these issues.

Cache matlab collected references
