
pytest with datafile #38

Closed
cboulay wants to merge 1 commit into master from pytest_datafile

Conversation

@cboulay (Contributor) commented Sep 20, 2019

This is a first attempt. I would like to NOT include the datafile in the repo and instead download the file on demand. We can put the (small) datafile in a GitHub release, otherwise we need to host them somewhere.

@tstenner , how do you think we should get azure to download our test data? Within a script? Or within a pytest fixture? Something else?

Review comment on the diff:

    vla(b'\x00')

    FIXTURE_DIR = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'test_files')
Contributor:

Can you do this with pathlib?
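For reference, a pathlib version of that line might look like this (a sketch, not code from the PR):

```python
from pathlib import Path

# Same idea as the os.path version: the test-data directory
# lives next to this test module.
FIXTURE_DIR = Path(__file__).resolve().parent / "test_files"
```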

Contributor (author):

I was just following the most straightforward example from pytest-datafiles. These small details aren't important until we iron out the bigger details, which may require using a completely different way of downloading/loading data.

@cbrnr (Contributor) commented Sep 20, 2019

Would git-lfs or git-annex be a viable option? Why don't you want to include (small) test files directly in this repo? If they are small, I don't see the problem (the test files should not be packaged of course).

In MNE, we have a separate repository just for our testing data, but the main problem is that downloading is super slow. Maybe this is faster when you attach files to a release, but isn't this difficult to manage? Do we have to move/copy all testing files for each new release then?

@cboulay (Contributor, author) commented Sep 20, 2019

Why don't you want to include (small) test files directly in this repo?

(1) The way the package is currently structured, the tests travel with the Python package to PyPI. We could probably put the test data in a different folder, or manually remove it from the packages before uploading (but that seems like silly over-engineering). This is also exactly the same test file that's in the parent repo.
(2) I don't want to inflate the git history with binaries. It seems not so bad right now, but it can become a problem quickly, especially because...
(3) there are plans (SCCN/Intheon) to add many more test files to the test suite, especially 'problematic' files.

Attaching to a release feels like the wrong solution. The main reason I want the test files in there at all is to automatically test file loading whenever someone makes a PR. If a new feature requires a new test file, how does one verify the feature before the test file is in a release?

@cbrnr (Contributor) commented Sep 20, 2019

I totally agree that we need a test suite that automatically runs with a PR. The more files we have in there the better. I think git-lfs should work pretty well for our purposes.

@tstenner (Contributor) commented Sep 20, 2019

I would like to NOT include the datafile in the repo and instead download the file on demand.

Seconded. It's not always practical to clone a shallow copy, and processing every data file ever present in the repository slows everything down (or, as Clemens put it: "super slow"). With Git LFS it would be reasonable, but I think it's overkill. Also, once a big file gets pushed, it's in the repository practically forever.

Also, once we isolate the broken part of a file it should be possible to create a synthetic test file that triggers the bug, but is much shorter.

For the benchmarks we need huge files, so I have small scripts that generate big files on the fly to exercise the different code paths and then delete them afterwards. Those scripts could stay in the repository.
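The generate-then-delete pattern could be sketched like this (stdlib only; the names and sizes are illustrative, not taken from the actual benchmark scripts):

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def synthetic_file(size=1024 * 1024):
    """Create a temporary file of `size` pseudo-random bytes, deleting it afterwards."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(os.urandom(size))
        yield path
    finally:
        os.remove(path)
```

In a pytest suite this would more naturally be a fixture, but the create/clean-up logic is the same.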

@tstenner , how do you think we should get azure to download our test data?

It all depends on what the CI should test. For one job (i.e., only one operating system), it doesn't really matter. For parallel jobs on Azure, I'd create one stage to download the files and then copy them to each job, so the files get downloaded (and possibly cached) once per run and copying them to the CI workers is fast.

Within a script? Or within a pytest fixture? Something else?

I'd suggest a script, unless pytest handles downloading changed/new files efficiently. rsync would do this, but it'd need a dedicated host; alternatively, a simple folder listing served via HTTP and wget.
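A minimal sketch of the script approach, skipping files that are already cached (the base URL and file names below are placeholders, not a real hosting location):

```python
import os
import urllib.request

BASE_URL = "https://example.com/xdf-test-data"  # placeholder host, not real
FILES = ["minimal.xdf", "clock_resets.xdf"]     # illustrative file names
DEST = "test_files"


def fetch_missing(base_url=BASE_URL, files=FILES, dest=DEST):
    """Download any test file that is not already present in `dest`."""
    os.makedirs(dest, exist_ok=True)
    for name in files:
        target = os.path.join(dest, name)
        if os.path.exists(target):
            continue  # cached from a previous CI run
        urllib.request.urlretrieve(f"{base_url}/{name}", target)
```

Detecting *changed* files would need checksums or HTTP caching headers on top of this; rsync gets that for free, at the cost of a dedicated host.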

@cboulay mentioned this pull request Oct 9, 2019
@cbrnr (Contributor) commented Oct 10, 2019

I have about 1.7 GB of test files. I'm not sure any online/cloud solution is suitable, because CIs would need to download the data each time. Sure, we could restrict ourselves to small files, but it would still be nice to also test with real-world large files.

@cboulay (Contributor, author) commented Oct 10, 2019

but it'd still be nice to also test with real-world large files

Not with CI systems though. If you want to include some tests in the repo that are only run on your local computer (e.g., after checking for the presence of a specific folder and/or only invoked with a special argument to pytest) then that's fine. If you want the tests to be run on the CI servers then the files should be small or generated on the fly.
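The local-only pattern described here could be sketched as follows (the directory, environment variable, and test body are hypothetical):

```python
import os

import pytest

# Hypothetical location of the large, local-only test files.
LARGE_DATA_DIR = os.environ.get(
    "XDF_LARGE_DATA", os.path.expanduser("~/xdf-large-files")
)

# Tests carrying this marker are skipped on machines (including CI
# workers) that do not have the large files checked out.
needs_large_files = pytest.mark.skipif(
    not os.path.isdir(LARGE_DATA_DIR),
    reason="large test files not available on this machine",
)


@needs_large_files
def test_load_large_file():
    # Placeholder: load one of the large files and check the result.
    assert os.path.isdir(LARGE_DATA_DIR)
```

A dedicated command-line option (via a `conftest.py` hook) would work similarly; the skip condition just changes.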

@cbrnr (Contributor) commented Oct 10, 2019

Yes, obviously we can't test large files with CIs, so I'm fine with only using small files there. We should still try to run our tests on large files, because these might reveal issues that small files don't. We could run our tests on all files (small and large) manually before a new release. I'm open to the implementation details of how we want to make this work. We also still need to decide where we store our large files, since I assume you are all OK with including small files either in this repo or in a separate xdf-testdata repo (I'd prefer the separate repo, because that way other repos can re-use the same test files).

@cboulay (Contributor, author) commented Oct 10, 2019

Agreed, not within this repo. I don't have enough experience with git-lfs. Does it have better performance with large binary files than normal git?

@tstenner (Contributor):

Does it have better performance with large binary files than normal git?

It basically uploads the files separately and replaces them with links, so it's a lot faster than normal git, but only as fast as the host the files get uploaded to.

@cbrnr (Contributor) commented Oct 10, 2019

Exactly. And we still don't know where to host the large files.

@cbrnr (Contributor) commented Oct 10, 2019

@cboulay could you go ahead and create an xdf-testdata repo within the xdf-modules organization?

@tstenner (Contributor):

https://github.com/xdf-modules/example-files

@cbrnr (Contributor) commented Oct 11, 2019

Regarding how to do the actual testing, I think pytest-datafiles is not even required. That package's main purpose is to copy files to a temp dir in order to avoid modifying the original files. We don't need this, because (1) we're pulling our test files from a git repo (and we don't push), and (2) we don't modify our test files anyway.

I'd clone the example-files repo in an Azure script, and we can then use the test files with the @pytest.mark.parametrize decorator.
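That approach might look roughly like this (the clone location is an assumption, and the actual XDF load call is left as a comment rather than invented):

```python
import glob
import os

import pytest

# Assumes CI has cloned https://github.com/xdf-modules/example-files here.
EXAMPLE_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "example-files")
XDF_FILES = sorted(glob.glob(os.path.join(EXAMPLE_DIR, "*.xdf")))


@pytest.mark.parametrize("path", XDF_FILES)
def test_file_loads(path):
    # In the real suite this would call the XDF loader (e.g. pyxdf.load_xdf(path))
    # and assert on the returned streams; here only existence is checked.
    assert os.path.exists(path)
```

Each file in the cloned repo then shows up as its own test case in the CI report.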

@cbrnr closed this in #49 on Oct 23, 2019
@cbrnr deleted the pytest_datafile branch on October 23, 2019 at 10:25
