Improve regression detection with continuous testing #8496
Comments
|
I've been working on a Python version integrated with Travis here: It is not ready for a pull request yet, since this is a bit more complex than I'd have wished, but it already does the job nicely:
Warnings are treated as errors for now, since they could interfere with regression detection, but I'm not sure what to do about this yet. Any ideas? Last but not least, this splits the test suite into multiple parts, because we might otherwise run into the Travis time limit. It contains a generic solution to do that with nosetests. I originally wanted to do this work separately, but the Travis 2-hour time limit forced my hand. |
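The Travis-level split mentioned above can be sketched as a round-robin partition of the test list. Everything here is illustrative: the `TEST_GROUPS`/`TEST_GROUP` variables and the `split_tests` helper are assumptions for this sketch, not the actual branch:

```shell
#!/bin/sh
# Sketch: divide a list of test names into TEST_GROUPS round-robin
# buckets and keep only the bucket for this Travis job. Variable and
# function names are made up for illustration.

split_tests() {
    # $1 = total number of groups, $2 = this job's 0-based index
    awk -v n="$1" -v k="$2" '(NR - 1) % n == k'
}

# Example: job 1 of 3 gets the 2nd, 5th, ... test in the list.
printf 'test_A\ntest_B\ntest_C\ntest_D\ntest_E\n' | split_tests 3 1
# prints:
# test_B
# test_E
```

In .travis.yml this would map to an env matrix (`TEST_GROUP=0`, `TEST_GROUP=1`, ...), each job running only its own bucket of tests.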
|
For parallel tests, see #7267. It uses nose's built-in parallel feature. |
|
Looks nice. I had trouble using nose's processes (they did not see all tests, and weren't deterministic), so I should test your branch. Note that both solutions can be used in parallel (pun unintended): my solution splits the work at the Travis level, allowing you to have different jobs (running on different VMs), while the nose-based solution splits the work inside a single job/VM. The nose solution could be made generic, though, if Travis integrated automatic nose support like it does for RSpec, Cucumber and Minitest. |
|
This is a copy of the message in #9235: I understand that due to the nature of the project, tests depend on a third party (the website), and that website may well be quite flaky. But I want to at least eliminate the most unreliable tests. For example, I ran the full test suite (with regression detection) 75 times. Out of those runs, a few tests had more than 2 detected regressions (false positives):
Some have as many as 19(!). It even generates user issues, for example Streetvoice #9219. |
|
Well, I ran:

for i in $(seq 1 75); do
    python3.6 -Werror test/test_download.py TestDownload.test_ACast || break
done

And there were no errors. Could you give an example of the error messages? |
|
Oh, I found one: https://travis-ci.org/anisse/youtube-dl/jobs/123862117#L388
503 is indeed quite strange. |
|
streetvoice.py was updated with the new API in 4dccea8. Hopefully this fixes the 403 errors. Please leave a comment if the problem persists. |
|
I've improved the script in the regdetect branch: I've collapsed the commits and it's getting closer to ready for a merge. It now tests for reliability by going back and forth between before/after the push, to hopefully reduce the number of false positives. It also does automatic regression bisecting, so we don't need to look at the various commits in a push to guess which one introduced a regression. As you have seen, I have put automatic merging in place in my tree, and it has already found a few regressions (#9991, #10018, #10030, #10048, #10064, #10096 for example). I'm hoping the latest changes will make it even easier to use, since it will automatically pinpoint the bad commits, produce fewer false positives, and tell us which tests/websites are unreliable. |
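The automatic bisecting step described above can be approximated with git's built-in bisect machinery. A minimal sketch, where the good/bad commits and the test name are placeholders, not values from the actual branch:

```shell
#!/bin/sh
# Sketch: let git bisect pinpoint the commit that broke one test.
# GOOD, BAD and the test name below are placeholders for illustration.
GOOD=abc1234    # last commit known to pass (before the push)
BAD=HEAD        # first commit known to fail (after the push)

git bisect start "$BAD" "$GOOD"
# 'git bisect run' re-runs the command on each candidate commit:
# exit code 0 marks the commit good, anything else marks it bad.
git bisect run python3.6 test/test_download.py TestDownload.test_ACast
git bisect reset    # return to the original checkout
```

Since a push is usually only a handful of commits, this converges in one or two test runs per regression.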
It seems the youtube-dl project has abandoned the idea of using tests with travis-ci to keep all IEs always working (the last passing build was 2 years ago, #2571). Not that it's a bad idea, since it's hard to keep up with all those sites. But a side effect is that it's harder to detect test failures that are due to a new commit rather than a website change.
What I'd like to propose is to keep the current testing infrastructure as a dashboard for working/non-working IEs, and to add another "testsuite" that would test each commit for regressions: if a test fails, the previous commit is tested as well, and if it fails too, the failure is not considered a regression.
I've made a simple proof of concept in bash: https://gist.github.com/anisse/6093f8b5814ab3ce7140. This could be improved/rewritten in order to be integrated with travis-ci.
What do you guys think?
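The proposed rule can be sketched as a small shell helper. The `classify` function is the core decision; the driver lines and `run_test` are hypothetical and shown only as comments, since the real loop lives in the gist:

```shell
#!/bin/sh
# Sketch of the proposed rule (not the actual gist): a failure is a
# regression only if the same test passes on the previous commit.

classify() {
    # $1 = exit status on the new commit, $2 = exit status on its parent
    if [ "$1" -eq 0 ]; then
        echo "pass"
    elif [ "$2" -eq 0 ]; then
        echo "regression"       # broke between parent and new commit
    else
        echo "website-broken"   # fails on both: the site changed, not us
    fi
}

# Hypothetical driver: run the test, step back one commit, run it again.
# run_test() { python3.6 test/test_download.py "$1"; }
# run_test "$t"; new=$?
# git checkout -q HEAD^ && run_test "$t"; old=$?; git checkout -q -
# classify "$new" "$old"
classify 1 0
# prints:
# regression
```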