
Enable test reruns on failed fragile tests #217

Merged
merged 3 commits into master from avtikhon/gh-5050-retries on Sep 24, 2020

Conversation

avtikhon
Contributor

@avtikhon avtikhon commented Jul 15, 2020

Added the ability to use the fragile list in JSON format from 'suite.ini'
files.

   fragile = {
        "retries": 10,
        "tests": {
            "bitset.test.lua": {
                "issues": [ "gh-4095" ],
                "checksums": [ "050af3a99561a724013995668a4bc71c", "f34be60193cfe9221d3fe50df657e9d3" ]
            }
        }}

Added the ability to set a per-suite 'retries' option in the suite.ini
configuration file, which sets the number of accepted reruns of
failed tests from the 'fragile' list.

Added the ability to check failed tests against the fragile list to be
sure that the current fail matches the issue mentioned in the fragile list.

Closes tarantool/tarantool#189.
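
Below is a minimal sketch of how such a JSON 'fragile' value could be read from a suite.ini file, assuming Python's standard configparser and json modules; the section name and the helper are illustrative assumptions, not test-run's actual code.

import configparser
import json

def read_fragile(ini_path):
    # Read the raw multi-line 'fragile' value and decode it as JSON.
    parser = configparser.ConfigParser()
    parser.read(ini_path)
    raw = parser.get('default', 'fragile', fallback='{}')
    fragile = json.loads(raw)
    retries = fragile.get('retries', 1)   # default value here is illustrative
    tests = fragile.get('tests', {})
    return retries, tests

# Example usage (hypothetical file and test names):
# retries, tests = read_fragile('suite.ini')
# tests.get('bitset.test.lua', {}).get('checksums', [])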

@avtikhon avtikhon requested a review from Totktonada July 15, 2020 09:04
@avtikhon avtikhon self-assigned this Jul 15, 2020
@avtikhon avtikhon force-pushed the avtikhon/gh-5050-retries branch 2 times, most recently from c6cb7a3 to 0a0c6d8 on August 6, 2020 07:39
@avtikhon avtikhon requested a review from LeonidVas August 6, 2020 10:32
@LeonidVas

It looks like this closes #189. Add this information to the commit message instead of "Part of tarantool/tarantool#5050".

@avtikhon avtikhon force-pushed the avtikhon/gh-5050-retries branch 3 times, most recently from f1cfe21 to 7869fe9 on August 12, 2020 10:42
@avtikhon
Contributor Author

It looks like this closes #189. Add this information to the commit message instead of "Part of tarantool/tarantool#5050".

Ok, changed.

@avtikhon avtikhon force-pushed the avtikhon/gh-5050-retries branch 2 times, most recently from e76a809 to 358cf41 on August 13, 2020 07:18
@LeonidVas

After some discussion with @Totktonada and @avtikhon, it was decided to try using JSON for the "fragile" configuration.
Example:

fragile =
 {
   "tests": {
     "test_name": {
       "checksums": [
         "ch1",
         "ch2"
       ]
     }
   },
   "retries": 10
 }

Useful links:
https://docs.python.org/3/library/configparser.html#supported-ini-file-structure
https://tools.ietf.org/html/rfc7159

@avtikhon
Contributor Author

After some discussion with @Totktonada and @avtikhon, it was decided to try using JSON for the "fragile" configuration.
Example:

fragile =
 {
   "tests": {
     "test_name": {
       "checksums": [
         "ch1",
         "ch2"
       ]
     }
   },
   "retries": 10
 }

Useful links:
https://docs.python.org/3/library/configparser.html#supported-ini-file-structure
https://tools.ietf.org/html/rfc7159

Implemented.

@avtikhon avtikhon force-pushed the avtikhon/gh-5050-retries branch 3 times, most recently from e40595e to 78a301d on August 13, 2020 12:53

@LeonidVas LeonidVas left a comment


As I said before, the transition to the JSON format should be implemented by the first patch in the patchset, and all new "fragile" options need to be added only to the JSON one. Introducing two new competing formats in the same patchset is a very strange decision.
Also, I think the format fragile = <basename of the test> ; gh-<issue> md5sum:<checksum> is strange (and potentially not valid), because:

  • ; / # is used for adding comments, so it looks like md5sum:<checksum> is a comment, but it is not (the sketch below illustrates the ambiguity).
  • = / : is the key/value separator, and mixing them in the same file is a strange decision.
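
A quick illustration of the comment ambiguity with Python's configparser (a sketch only, not test-run's actual parsing code): whether the md5sum part survives depends entirely on whether inline comment prefixes are enabled.

import configparser

INI = """
[default]
fragile = bitset.test.lua ; gh-4095 md5sum:050af3a99561a724013995668a4bc71c
"""

# Default parser: ';' does not start an inline comment, the whole value is kept.
plain = configparser.ConfigParser()
plain.read_string(INI)
print(plain['default']['fragile'])
# -> bitset.test.lua ; gh-4095 md5sum:050af3a99561a724013995668a4bc71c

# With inline comments enabled, everything after ';' is silently dropped.
commented = configparser.ConfigParser(inline_comment_prefixes=(';', '#'))
commented.read_string(INI)
print(commented['default']['fragile'])
# -> bitset.test.lua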

@Totktonada
Member

I didn't look thoroughly, but I would highlight several points.

  1. The approach with checksums is not universal. It does not work if time-related information appears in a result file. It requires periodic checksum updates if a line number is reported as part of an error. I guess it is okay to operate on the test level and mark a test either as stable or as fragile.
  2. We can implement reruns either within a worker or one level up: within the task management system (dispatchers-listeners-workers). Since there is nothing tarantool-specific here, I would look into the latter way. I pushed a variant to the Totktonada/rerun-fragile-test branch (0292fe2).

If you want my opinion about extracting checksums from comments: it is disgusting. Sorry.

There are more points I would highlight about the patchset, but that does not make much sense if we get rid of checksums altogether. The patchset modifies many different parts of test-run: too much for this small feature, IMHO.

Added the ability to use the fragile list in JSON format from 'suite.ini'
files:

  fragile = {
    "tests": {
        "bitset.test.lua": {
            "issues": [ "gh-4095" ]
        }
    }}

@LeonidVas LeonidVas left a comment


Looks syntactically correct to me.

@Totktonada
Member

I pushed two branches into tarantool today (and reran failed jobs several times):

  • Got several 'address already in use' fails (http_client.test.lua). It is a problem of the test-run design (Fix 'Address already in use' error with TCP sockets #141), not of the test per se.
  • Got several fails on net.box_reconnect_after_gh-3164.test.lua with the whole box.info.replication output (time-related information, UUIDs).
  • Got several fails on net.box_reconnect_after_gh-3164.test.lua with peer UUID, Unix socket path, schema version and so on (these will differ from run to run).

There are other fails, but I didn't check whether the output was predictable (at least vinyl.ddl.test.lua, I don't remember the others).

I know you're sure you can mark almost everything with checksums, but I see the opposite picture now. Maybe I'm just lucky.

At least, please, leave the ability to rerun a test that is not marked with any checksum (I didn't look at the last patchset, maybe it is there already).

Aside from this, the 'transient fail' statistics were nice: no need to scroll the screen to find out whether reruns occurred after full local testing.

@avtikhon
Contributor Author

avtikhon commented Sep 23, 2020

I pushed two branches into tarantool today (and reran failed jobs several times):

I've been running the same tests for 3 days, twice an hour, and got a really large amount of results. Having checked some of them, I got the same results as Alexander describes below, but that was expected. I've updated the table [1] with a new column 'Checksum' and set some of the results there: green marks checksums that are available, red marks tests that have changing data. As I checked before, we'll have the same issue with checksums only in some tests: I found only 9 that print box info, plus some other tests that print other changing data.

[1] - https://docs.google.com/spreadsheets/d/1LGfHM_6tn84lzlz3XIXQnOqNY_KtzRWuZg33_yuv4YI/edit#gid=1890598391

Right, we have some of these fails, but they pass when the tests are run one-by-one via the fragile list.

  • Got several fails on net.box_reconnect_after_gh-3164.test.lua with the whole box.info.replication output (time-related information, UUIDs).
  • Got several fails on net.box_reconnect_after_gh-3164.test.lua with peer UUID, Unix socket path, schema version and so on (these will differ from run to run).

Right, I've marked this test as red in the table.

There are other fails, but I didn't check whether the output was predictable (at least vinyl.ddl.test.lua, I don't remember the others).

I know you're sure you can mark almost everything with checksums, but I see the opposite picture now. Maybe I'm just lucky.

Right, we have some tests like this, but there are two ways to resolve it: fix the test or remove the box information printing if it is not really needed. I've prepared the initial patch with the checksums marked as green in the table mentioned above, and the results are really good: only 1 new real fail found [2]!

[2] - https://gitlab.com/tarantool/tarantool/-/pipelines/193120097/builds

At least, please, leave the ability to rerun a test that is not marked with any checksum (I didn't look at the last patchset, maybe it is there already).

That was the main reason why I didn't want to have this test rerun feature without checksums.

Aside from this, the 'transient fail' statistics were nice: no need to scroll the screen to find out whether reruns occurred after full local testing.

Actually, after the checksums are written to the suite.ini files, these fails will be really rare.

@Totktonada
Member

Got several 'address already in use' fails (http_client.test.lua). It is a problem of the test-run design (#141), not of the test per se.

Right, we have some of these fails, but they pass when the tests are run one-by-one via the fragile list.

It is maybe okay as a temporary workaround, but it is not so complex to fix the cause: bind httpd.py to port zero (to let the kernel choose a random one) and read the real port from its output (we already read heartbeats from there).
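
For reference, the port-zero idea looks roughly like this (a sketch under the assumption that the helper HTTP server can print its port for the harness to read; this is not httpd.py's actual code):

import socket

# Bind to port 0 so the kernel picks a free port: no 'address already in use'.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(('127.0.0.1', 0))
sock.listen(1)

# Report the real port on stdout so the test harness can read it,
# the same way it already reads heartbeats.
port = sock.getsockname()[1]
print('listening on port', port, flush=True)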

At least, please, leave the ability to rerun a test that is not marked with any checksum (I didn't look at the last patchset, maybe it is there already).

That was the main reason why I didn't want to have this test rerun feature without checksums.

To be honest, I don't get the answer.

Aside from this, the 'transient fail' statistics were nice: no need to scroll the screen to find out whether reruns occurred after full local testing.

Actually, after the checksums are written to the suite.ini files, these fails will be really rare.

But I still want to know about it when it occurs, to verify whether it was introduced by a patch I'm working on. It is okay to postpone this feature, but it would be useful.

lib/worker.py Outdated
if is_fragile:
    testname = os.path.basename(task_id[0])
    fragile_checksums = self.suite.get_test_fragile_checksums(testname)
if is_fragile and fragile_checksums:
Member


Here you block the ability to rerun a test without a checksum. It is possible that a test cannot be verified just by its result file, but we still want to allow reruns for it.

I remember the reason why you don't want to enable reruns by default when no checksums are provided: because it means that existing configs will let tests that are currently in the fragile list be retried.

However, this is not so until we set retries explicitly, so it seems we can do this without subtle behaviour changes. I mean, if we consider an empty (or non-existing) checksums list as 'rerun it anyway', everything will be good. But if you prefer an explicit '*' checksum, I don't mind.

You can also consider this out of scope (not needed right now), but to me it looks quite natural that…

fragile = {
    "retries": 2,
    "tests": {
        "bitset.test.lua": {
            "issues": ["gh-4095"]
        }
    }
}

…means 'rerun bitset.test.lua two times'.

Anyway, it is up to you. I just noted that the problem you highlighted looks not applicable to our situation.
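
A minimal sketch of the decision being discussed, with hypothetical names (not the patch's actual worker code): treat a missing or empty checksums list as 'rerun anyway', otherwise require a checksum match.

def should_rerun(is_fragile, fragile_checksums, result_checksum):
    # Only tests from the fragile list are ever retried.
    if not is_fragile:
        return False
    # No checksums configured: the test is listed as fragile, rerun it anyway.
    if not fragile_checksums:
        return True
    # Otherwise rerun only on a known fail, i.e. a matching reject-file checksum.
    return result_checksum in fragile_checksums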

Contributor Author


I commented on it at the same time as you were writing the questions here:
#217 (comment)
In short: I agree with it, but as a standalone commit for this feature.

@Totktonada
Member

Status: It looks almost ready (considering the decisions made); I just want to discuss several points before we land the patchset.

@avtikhon
Contributor Author

avtikhon commented Sep 24, 2020

Got several 'address already in use' fails (http_client.test.lua). It is a problem of the test-run design (#141), not of the test per se.

Right, we have some of these fails, but they pass when the tests are run one-by-one via the fragile list.

It is maybe okay as a temporary workaround, but it is not so complex to fix the cause: bind httpd.py to port zero (to let the kernel choose a random one) and read the real port from its output (we already read heartbeats from there).

I absolutely agree that running the tests like this is a temporary workaround and it needs to be fixed. We'll continue working on the fix in already existing standalone issues like:

tarantool/tarantool-qa#228

At least, please, leave the ability to rerun a test that is not marked with any checksum (I didn't look at the last patchset, maybe it is there already).

That was the main reason why I didn't want to have this test rerun feature without checksums.

To be honest, I don't get the answer.

I mean that a blind rerun of flaky tests is more dangerous than the manual reruns we currently have. Checking 'checksums' is a way to make these reruns partly automated and to help the testing infrastructure avoid manual reruns. Unconditional reruns would hide new flaky issues.
Right, we can add a new feature to rerun the needed tests. It can be provided with a special checksum/mark like:
"checksums": [ "temporary_mark_to_rerun_on_any_fail_NOT_TO_COMMIT" ]
and a special check in the test-run scripts for this mark, to pass a test on rerun regardless of the failed checksum. It will help to manually check some flaky tests. But I think this feature must be committed as a separate patch from the current patchset, with a standalone issue number.
Also, along these lines, I would suggest a new feature that Sergey Bronnikov proposed: a testing mode with mandatory reruns for all tests, to be able to catch flaky tests right when they first appear; but I would like to implement this feature as a standalone commit too.
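
A rough sketch of how such a mark could be honoured (the mark string comes from the proposal above; the helper itself is hypothetical and not part of the patchset):

RERUN_ON_ANY_FAIL = 'temporary_mark_to_rerun_on_any_fail_NOT_TO_COMMIT'

def checksum_accepted(result_checksum, fragile_checksums):
    # Accept the rerun either when the special 'rerun on any fail' mark is
    # present in the configured list or on an exact checksum match.
    if RERUN_ON_ANY_FAIL in fragile_checksums:
        return True
    return result_checksum in fragile_checksums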

Aside from this, the 'transient fail' statistics were nice: no need to scroll the screen to find out whether reruns occurred after full local testing.

Actually, after the checksums are written to the suite.ini files, these fails will be really rare.

But I still want to know about it when it occurs, to verify whether it was introduced by a patch I'm working on. It is okay to postpone this feature, but it would be useful.

Right, let's implement it as a next step, not now.

Added the ability to set a per-suite 'retries' option in the suite.ini
configuration file, which sets the number of accepted reruns of
failed tests from the 'fragile' list:

  fragile = {
    "retries": 10,
    "tests": {
        "bitset.test.lua": {
            "issues": [ "gh-4095" ]
        }
    }}

Part of #189
Added the ability to compute the result file checksum on a test fail and
compare it with the checksums of the known issues mentioned in the
fragile list. The fragile list should contain the result file
checksums along with their issues in the format:

  fragile = {
    "retries": 10,
    "tests": {
        "bitset.test.lua": {
            "issues": [ "gh-4095" ],
            "checksums": [ "050af3a99561a724013995668a4bc71c", "f34be60193cfe9221d3fe50df657e9d3" ]
        }
    }}

Closes #189
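
For illustration, a checksum like the ones above could be computed from a reject (result) file roughly as follows; hashlib is the standard library, while the file path and helper name are assumptions rather than the patch's actual code.

import hashlib

def reject_checksum(path):
    # md5 of the whole reject file, as a hex string matching the format above.
    with open(path, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()

known = ['050af3a99561a724013995668a4bc71c',
         'f34be60193cfe9221d3fe50df657e9d3']
if reject_checksum('app/bitset.reject') in known:
    print('known fragile fail, retrying the test')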
Member

@Totktonada Totktonada left a comment


LGTM.

@Totktonada Totktonada merged commit ec8c991 into master Sep 24, 2020
@Totktonada Totktonada deleted the avtikhon/gh-5050-retries branch September 24, 2020 21:41
Totktonada added a commit to tarantool/tarantool that referenced this pull request Sep 24, 2020
Retry a failed test when it is marked as fragile (and several other
conditions are met, see below).

test-run already allows setting a list of fragile tests. They are run
one-by-one after all the parallel ones in order to eliminate possible
resource starvation and to keep timings close to those of the runs in
which the tests pass. See [1].

In practice this approach does not help much against our problem with
flaky tests. We decided to retry failed tests when they are known to be
fragile. See [2].

The core idea is to split responsibility: known flaky fails will not
deflect a developer's attention, while each fragile test will be marked
explicitly, tracked in the issue tracker and analyzed by the quality
assurance team.

The default behaviour is not changed: each test from the fragile list
will be run once after all the parallel ones. But now it is possible to
set the retries amount.

Beware: the implementation does not allow just setting the retries
count; it also requires providing an md5sum of the failed test output
(the so-called reject file). The idea here is to ensure that we retry
the test only in case of a known fail, not some other fail within the
test.

This approach has a limitation: in case of a fail, a test may output
information that varies from run to run or depends on the base
directory. We should always verify the output before putting its
checksum into the configuration file.

Despite doubts regarding this approach, it looks simple, and we decided
to try it and revisit it if the need arises.

See the configuration example in [3].

[1]: tarantool/test-run#187
[2]: tarantool/test-run#189
[3]: tarantool/test-run#217

Part of #5050
Totktonada added three more commits to tarantool/tarantool that referenced this pull request on Sep 24, 2020, each cherry picked from commit 43482ee with the same message.
@Totktonada
Member

The test-run submodule is updated in tarantool in the following commits: 2.6.0-104-g43482eedc, 2.5.1-86-gc5bb549f6, 2.4.2-70-gef330c3b0, 1.10.7-35-g91260069c.

Totktonada pushed a commit to tarantool/tarantool that referenced this pull request Dec 26, 2020
Removed the obsolete part in the RPM spec for Travis CI, as it is no
longer in use.

---- Comments from @Totktonada ----

This change is a kind of reversion of commit
d48406d ('test: add more tests to
packaging testing'), which closed #4599.

Here I describe the story: why the change was made and why it is
reverted now.

We run testing during an RPM package build: it may catch some
distribution-specific problem. We had reduced the quantity of tests and
switched to single-thread test execution to keep the testing stable and
not break package builds and deployment due to known fragile tests.

Our CI used to run on Travis CI, but we were transitioning to GitLab CI
to use our own machines and to avoid hitting the Travis CI limit of five
jobs running in parallel.

We moved package builds to GitLab CI, but kept build+deploy jobs on
Travis CI for a while: GitLab CI was new for us and we wanted to make
this transition smooth for users of our APT / YUM repositories.

After enabling package builds on GitLab CI, we wanted to enable more
tests (to catch more problems) and parallel execution of tests to speed
up testing (and reduce the amount of time a developer waits for
results).

We observed that if we enabled more tests and parallel execution on
Travis CI, the testing results would become much less stable, and so we
would often have holes in deployed packages and a red CI.

So we decided to keep the old way of testing on Travis CI and perform
all changes (more tests, more parallelism) only for GitLab CI.

We guessed that we had enough machine resources and would be able to do
some load balancing to overcome flaky fails on our own machines, but in
fact we picked another approach later (see below).

That's the whole story behind #4599. What has changed since those days?

We moved deployment jobs to GitLab CI[^1] and have now completely
disabled Travis CI (see #4410 and #4894). All jobs were moved either to
GitLab CI or directly to GitHub Actions[^2].

We revisited our approach to improving the stability of testing.
Attempts to do some load balancing together with attempts to keep a
not-so-large execution time failed. We would have to increase
parallelism for speed, but decrease it for stability at the same time.
There is no optimal balance.

So we decided to track flaky fails in the issue tracker and restart a
test after a known fail (see details in [1]). This way we don't need to
exclude tests and disable parallelism in order to get stable and fast
testing[^3]. At least in theory. We're on the way to verifying this
guess, but hopefully we'll settle on some adequate defaults that will
work everywhere[^4].

To sum up, there are several reasons to remove the old workaround, which
was implemented in the scope of #4599: no Travis CI, and no foreseeable
reasons to exclude tests or reduce parallelism depending on the CI
provider.

Footnotes:

[^1]: This is a simplification. Travis CI deployment jobs were not moved
      as is. GitLab CI jobs push packages to the new repositories
      backend (#3380). Travis CI jobs were disabled later (as part of
      #4947), after proof that the new infrastructure works fine.
      However, this is another story.

[^2]: Now we're going to use GitHub Actions for all jobs, mainly because
      GitLab CI is poorly integrated with GitHub pull requests (when the
      source branch is in a forked repository).

[^3]: Some work in this direction is still to be done:

      First, the 'replication' test suite is still excluded from testing
      under the RPM package build. It seems we should just enable it
      back; this is tracked by #4798.

      Second, there is issue [2] about getting rid of ancient traces of
      the old attempts to keep testing stable (on the test-run side).
      This will give us more parallelism in testing.

[^4]: Of course, we investigate flaky fails and fix the code and testing
      problems this feeds us. However, it appears to be a long-term
      activity.

References:

[1]: tarantool/test-run#217
[2]: https://github.com/tarantool/test-run/issues/251
Totktonada pushed three more commits to tarantool/tarantool that referenced this pull request on Dec 26, 2020, each cherry picked from commit d9c25b7 with the same message.