As part of the work in #8496, I am trying to make sure we don't run into regressions. The problem is that some tests are so unreliable that it is very difficult to run them twice and get the same result.
I understand that, due to the nature of the project, the tests depend on a third party (the website), and that website may well be quite flaky. But I want to at least eliminate the most unreliable tests.
For example, I ran the full test suite (with regression detection) 75 times. Out of those runs, a few tests produced more than 2 detected regressions (false positives):
Some have as many as 19(!).
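For context, a minimal way to measure this kind of flakiness is to re-run a single extractor test in a loop and count FAIL versus ERROR outcomes. The sketch below is only an illustration, not the exact harness behind the numbers above; the test name is an arbitrary example, and it assumes a youtube-dl checkout as the working directory, using the same single-test invocation the developer docs describe.

```python
# Rough sketch: re-run one extractor download test many times and count how
# often it fails, to get a crude flakiness estimate. Not the actual harness
# used for the 75-run experiment; TestDownload.test_StreetVoice is just an
# example test name.
import subprocess
import sys

RUNS = 75
TEST = 'TestDownload.test_StreetVoice'

failures = errors = 0
for _ in range(RUNS):
    proc = subprocess.run(
        [sys.executable, 'test/test_download.py', TEST],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    )
    if proc.returncode != 0:
        # unittest prints "FAILED (failures=N)" for assertion failures (FAIL)
        # and "FAILED (errors=N)" when the test raised an exception (ERROR).
        if b'errors=' in proc.stderr:
            errors += 1
        else:
            failures += 1

print('%s: %d FAIL, %d ERROR out of %d runs' % (TEST, failures, errors, RUNS))
```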
ERROR means that there was an error while running the test; maybe we should ignore those. FAIL means that a test passed and then failed on the same code revision. You can see how this translates into regression detection here:
StreetVoice: https://travis-ci.org/anisse/youtube-dl/jobs/123862153 https://travis-ci.org/anisse/youtube-dl/jobs/123289890
MWave: https://travis-ci.org/anisse/youtube-dl/jobs/123289900
GodTube: https://travis-ci.org/anisse/youtube-dl/jobs/123289914
This flakiness even generates user-reported issues, for example StreetVoice in #9219.
Some non-deterministic tests really indicate problems in youtube-dl. For a concrete example, see my comment in #9219. Another example is test_Bloomberg: sometimes m3u8 gives better-quality streams and sometimes f4m does, and currently both cases trigger an error. In that case we should force a specific format in the test. Above all, every non-deterministic test should be examined one by one; they shouldn't be removed or skipped without a reason.
Anyway, this list is quite useful. Many thanks for such a detailed investigation of the tests. Could you paste it into #8496? To make regression tests possible, I may try to attack these tests first.
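To make the Bloomberg suggestion concrete, here is a hedged sketch of pinning the format in the extractor's _TEST definition; the download tests pass the test's 'params' dict to YoutubeDL, so setting 'format' there keeps the test on one format. All concrete values (URL, id, title, format_id) are placeholders, not taken from the real Bloomberg test.

```python
# Hypothetical sketch: pin the format in an extractor's download-test
# definition so the result no longer depends on whether m3u8 or f4m wins
# format selection. Placeholder values throughout.
_TEST = {
    'url': 'http://www.bloomberg.com/news/videos/...',
    'info_dict': {
        'id': 'example-id',
        'ext': 'mp4',
        'title': 'Example title',
    },
    'params': {
        # test/test_download.py feeds this dict to YoutubeDL, so the test
        # always checks the same format.
        'format': 'hls-1080p',  # placeholder: use a format_id the extractor actually reports
    },
}
```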