Conversation
FIXTURE_DIR = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'test_files')
Can you do this with pathlib?
I was just following the most straightforward example from pytest-datafiles. These small details aren't important until we iron out the bigger details, which may require using a completely different way of downloading/loading data.
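For reference, a pathlib version would look something like the sketch below (just an illustration; as noted above, the loading strategy may still change entirely):

```python
from pathlib import Path

# Equivalent of os.path.join(os.path.dirname(os.path.realpath(__file__)), 'test_files')
FIXTURE_DIR = Path(__file__).resolve().parent / 'test_files'
```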
Would git-lfs or git-annex be a viable option? Why don't you want to include (small) test files directly in this repo? If they are small, I don't see the problem (the test files should not be packaged of course). In MNE, we have a separate repository just for our testing data, but the main problem is that downloading is super slow. Maybe this is faster when you attach files to a release, but isn't this difficult to manage? Do we have to move/copy all testing files for each new release then?
(1) The way the package is currently structured, the tests travel with the Python package to PyPI. We could probably put the test data in a different folder, or manually remove it from the package before uploading (which seems like silly over-engineering). This is also exactly the same test file that's in the parent repo. Attaching to a release feels like the wrong solution. The main reason I want the test files in there at all is to automatically test file loading whenever someone makes a PR. If a new feature requires a new test file, how does one verify the feature before the test file is in a release?
I totally agree that we need a test suite that automatically runs with a PR. The more files we have in there the better. I think git-lfs should work pretty well for our purposes.
Seconded. It's not always practical to clone a shallow copy, and processing every data file ever present in the repository slows everything down (or, as Clemens put it: "super slow"). With Git LFS it would be reasonable, but I think it's overkill. Also, once a big file gets pushed it stays in the repository practically forever. Once we isolate the broken part of a file, it should be possible to create a synthetic test file that triggers the bug but is much shorter. For the benchmarks we need huge files, so I have small scripts that generate big files on the fly to test the different code paths and then delete them afterwards. Those scripts could stay in the repository.
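The on-the-fly approach could look roughly like the fixture below (a sketch only; a real generator would write valid XDF chunks rather than the placeholder bytes used here, and the fixture name is made up):

```python
import pytest

@pytest.fixture
def big_synthetic_file(tmp_path):
    """Generate a large placeholder file on the fly; pytest removes tmp_path afterwards."""
    path = tmp_path / "big_synthetic.xdf"
    chunk = b"\x00" * (1024 * 1024)  # 1 MiB of placeholder bytes
    with open(path, "wb") as f:
        for _ in range(100):  # roughly 100 MiB in total
            f.write(chunk)
    return path
```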
It all depends on what the CI should test. For one job (i.e., only one operating system), it doesn't really matter. For parallel jobs on Azure, I'd create one stage to download the files and then copy them to each job, so the files get downloaded (and possibly cached) once per run, and copying them to the CI workers is fast.
I'd suggest doing it in a script, unless pytest handles downloading changed/new files efficiently. rsync would do this, but it would need a dedicated host. Alternatively, a simple folder listing served via HTTP plus wget.
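If we go the script route, a minimal sketch could look like this (the host URL and file names are placeholders, not actual locations); files are only fetched when they are not already present:

```python
import urllib.request
from pathlib import Path

BASE_URL = "https://example.org/xdf-testdata/"    # placeholder host
FILES = ["minimal.xdf", "clock_resets.xdf"]       # placeholder file names
TARGET_DIR = Path(__file__).resolve().parent / "test_files"

def fetch_missing():
    """Download each listed file, skipping those that already exist locally."""
    TARGET_DIR.mkdir(exist_ok=True)
    for name in FILES:
        dest = TARGET_DIR / name
        if not dest.exists():
            urllib.request.urlretrieve(BASE_URL + name, str(dest))

if __name__ == "__main__":
    fetch_missing()
```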
I have about 1.7 GB of test files. I'm not sure any online/cloud solution is suitable, because the CIs would need to download the data each time. Sure, we could restrict ourselves to small files, but it would still be nice to also test with real-world large files.
Not with CI systems though. If you want to include some tests in the repo that are only run on your local computer (e.g., after checking for the presence of a specific folder and/or only invoked with a special argument to pytest), then that's fine. If you want the tests to run on the CI servers, then the files should be small or generated on the fly.
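A local-only guard along those lines could be sketched like this (the folder location is hypothetical); the test is skipped automatically unless the large-file directory exists on the machine:

```python
import os
import pytest

LARGE_FILE_DIR = os.path.expanduser("~/xdf_large_test_files")  # hypothetical location

@pytest.mark.skipif(
    not os.path.isdir(LARGE_FILE_DIR),
    reason="large local test files not available",
)
def test_load_large_files():
    ...  # load and check the large recordings here
```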
Yes, obviously we can't test large files on CI, so I'm fine with only using small files there. We should still try to run our tests on large files, because these might reveal issues that small files don't. We could run our tests on all files (small and large) manually before a new release. I'm open to the implementation details of how we want to make this work, and we also still need to solve the question of where we store our large files (since I assume you are all OK with including small files in either this repo or a separate xdf-testdata repo; I'd prefer the separate repo because that way other repos can re-use the same test files).
Agreed, not within this repo. I don't have enough experience with git-lfs; does it have better performance with large binary files than normal git?
It basically uploads the files separately and replaces them with links, so it's a lot faster than normal git but only as fast as the place where the files get uploaded to.
Exactly. And we still don't know where to host the large files.
@cboulay could you go ahead and create an example-files repo?
Re how to do the actual testing, I think I'd clone the example-files repo in an Azure script, and we can then use the test files from there.
This is a first attempt. I would like to NOT include the data file in the repo and instead download it on demand. We can put the (small) data file in a GitHub release; otherwise we need to host it somewhere.
@tstenner, how do you think we should get Azure to download our test data? Within a script? Within a pytest fixture? Something else?
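To illustrate the fixture option, here is a rough sketch that downloads a file from a hypothetical release asset once per test session and loads it with pyxdf (the URL and file name are placeholders, not the actual hosting location):

```python
import urllib.request

import pytest
import pyxdf

# Hypothetical release asset URL; the real hosting location is still undecided.
RELEASE_URL = ("https://github.com/xdf-modules/example-files/"
               "releases/download/v0.1/minimal.xdf")

@pytest.fixture(scope="session")
def xdf_file(tmp_path_factory):
    """Download the test recording once per test session."""
    dest = tmp_path_factory.mktemp("data") / "minimal.xdf"
    urllib.request.urlretrieve(RELEASE_URL, str(dest))
    return dest

def test_load_xdf(xdf_file):
    streams, header = pyxdf.load_xdf(str(xdf_file))
    assert len(streams) > 0  # at least one stream should load
```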