Closed
Labels
discussion (requires active participation to reach a conclusion), p1-important (Important, aka current backlog of things to do)
Description
Opening this issue to start the discussion on how we should approach the implementation of the DVC performance tests (aka time testing, aka benchmarking, etc).
Motivation:
We should be able to see performance degradation as we go. For example, someone adds a heavy import that slows down the time the CLI takes to show something (e.g. dvc help, which should run in milliseconds). Or we implement some dvc checkout logic that hurts performance. Right now, we have to run some additional checks manually every time. That is fragile, people tend to forget to run them, etc.
Requirements:
- Cross-platform - Windows, Mac, Linux. There are some system-specific scenarios we would like to monitor and catch (e.g. 100K+ files in an NTFS dir). See "run benchmarks on Mac and Windows", dvc-bench#14.
- Supports automation - run at least nightly on more or less the same set of machines, to be able to compare with previous runs. See "nightly builds for dvc master", dvc-bench#15.
- Ideally, creates a ticket with p1/p0 automatically to investigate on a certain threshold.
- Ideally, can try a binary search to figure out which commit caused the degradation. See "Ping engineering on noticable performance degradation", dvc-bench#8.
- At least nightly reports to see changes. See "nightly builds for dvc master", dvc-bench#15.
- Support for bash/python scripts to run. Using python for now because bash is hard to run on Windows, and it is easier to profile with python.
- Easy to run a specific test/all tests locally with a specific DVC installation.
- Easy way to add new tests.
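To make a few of these requirements concrete, here is a minimal sketch in Python of three pieces: timing a CLI command, flagging a slowdown against a threshold, and a binary search for the first slow commit. All function names, the 20% default threshold, and the ordering assumptions are hypothetical and only for illustration. This is not how dvc-bench is implemented.

```python
import subprocess
import sys
import time


def time_command(cmd, repeats=5):
    """Return the best (minimum) wall-clock time in seconds over `repeats` runs.

    `cmd` is an argument list, so a specific DVC installation can be
    selected by passing the full path to its executable as cmd[0].
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises if the command fails, so a broken run is not
        # silently recorded as a (fast) timing.
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return min(timings)


def is_regression(baseline, current, threshold=0.2):
    """True if `current` is slower than `baseline` by more than `threshold`.

    `threshold` is a fraction: 0.2 means "more than 20% slower".
    The default is an arbitrary, hypothetical value.
    """
    return current > baseline * (1 + threshold)


def first_bad_commit(commits, is_bad):
    """Binary search for the first commit for which is_bad(commit) is True.

    Assumes `commits` is ordered oldest to newest, the first commit is
    good, the last commit is bad, and every commit after the first bad
    one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

A nightly job could call `time_command` with something like `["dvc", "--help"]`, compare the result with the previous night's value via `is_regression`, and, if it trips, pass the list of commits since the last good run to `first_bad_commit` with an `is_bad` callback that checks out the commit and re-runs the timing.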
Implementations:
- We should definitely take a look at other projects (cpython? databases?) and at available cloud solutions for this.
- How do we implement this: a separate repo, dvc-test, vs a directory in the main repo, tests/time or tests/bench or whatnot.
  - dvc-test seems more flexible to my mind; I don't see any downsides if it's implemented right.
  - directory approach: @Suor could you put a comment - what specific advantages do you see here that could not be implemented in the directory one?
@iterative/engineering any thoughts? any comments are welcome - I'll be editing this ticket to come up with a good summary.