Performance testing suite design #2511

Opening this issue to start the discussion on how we should approach implementing the DVC performance tests (aka time testing, benchmarking, etc.).

**Motivation:**

As we go, we should be able to see performance degradation. For example, someone adds a heavy import that slows down the time the CLI takes to show something (such as `dvc help`, which should run in ms). Or we implement some `dvc checkout` logic that affects performance. Right now, we have to run some additional checks manually every time. That is fragile: people tend to forget to run them, etc.
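To make the CLI example concrete, here is a minimal sketch of how such a check could time a command. The helper name `time_command` and the best-of-N approach are assumptions for illustration, not an agreed design, and the demo uses a stand-in command so it runs without DVC installed.

```python
import subprocess
import sys
import time


def time_command(cmd, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` runs of `cmd`.

    Hypothetical helper: taking the minimum reduces noise from other
    processes on the machine.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Discard output so terminal I/O does not skew the measurement.
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    # Stand-in command; replace with e.g. ["dvc", "help"] when dvc is on PATH.
    cmd = [sys.executable, "-c", "pass"]
    print(f"best of 5: {time_command(cmd) * 1000:.1f} ms")
```

Measuring the whole process from the outside captures import time as well, which is the kind of regression a heavy import would cause.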

**Requirements:**

- [ ] Cross-platform - Windows, Mac, Linux. There are some system-specific scenarios we would like to monitor and catch (100K+ files in NTFS dir). https://github.com/iterative/dvc-bench/issues/14
- [ ] Supports automation - run at least nightly on more or less the same stack of machines to be able to compare to previous runs. https://github.com/iterative/dvc-bench/issues/15
- [ ] Ideally, automatically creates a p1/p0 ticket to investigate when a certain threshold is exceeded.
- [ ] Ideally, can try binary search to figure out which commit caused the regression. https://github.com/iterative/dvc-bench/issues/8
- [ ] At least nightly reports to see changes. https://github.com/iterative/dvc-bench/issues/15
- [x] <s> Support for bash/python scripts to run. </s> Using Python for now because bash is hard to run on Windows, and it is easier to profile with Python.
- [x] Easy to run a specific test/all tests locally with a specific DVC installation.
- [x] Easy way to add new tests.
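The threshold and binary-search requirements above could be sketched roughly as follows. All names (`is_regression`, `first_bad_commit`), the 1.2x default threshold, and the use of medians are assumptions for illustration; the search also assumes that once a commit is slow, all later commits stay slow.

```python
from statistics import median


def is_regression(baseline, current, threshold=1.2):
    """Return True when the median of `current` timings exceeds the
    median of `baseline` timings by more than a factor of `threshold`.

    The 1.2x default is an arbitrary illustrative value.
    """
    return median(current) > threshold * median(baseline)


def first_bad_commit(commits, is_bad):
    """Binary search for the first commit for which `is_bad` returns True.

    Assumes `commits` is ordered oldest to newest, the first commit is
    good, the last commit is bad, and badness persists once introduced.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

In practice `is_bad` would check out the commit, run the benchmark, and call `is_regression` against stored baseline timings; `git bisect run` automates the same kind of search on a real repository.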

**Implementations:**

* We should definitely take a look at how other projects do this (cpython? databases?). Are there cloud solutions available for this?
* How do we implement this: separate repo `dvc-test` vs a directory in the main repo `tests/time` or `tests/bench` or whatnot.
   - `dvc-test` seems to be more flexible to my mind, I don't see any downsides if it's implemented right
   - `directory` approach: @Suor could you comment on what specific advantages you see here that could not be implemented in the `dvc-test` one?

@iterative/engineering any thoughts? any comments are welcome - I'll be editing this ticket to come up with a good summary.

