
Enable test reruns on failed fragile tests #217

Merged
merged 3 commits into master from avtikhon/gh-5050-retries on Sep 24, 2020

Conversation

avtikhon
Contributor

@avtikhon avtikhon commented Jul 15, 2020

Added the ability to use the fragile list in JSON format from 'suite.ini'
files.

   fragile = {
        "retries": 10,
        "tests": {
            "bitset.test.lua": {
                "issues": [ "gh-4095" ],
                "checksums": [ "050af3a99561a724013995668a4bc71c", "f34be60193cfe9221d3fe50df657e9d3" ]
            }
        }}

Added the ability to set a per-suite 'retries' option in the suite.ini
configuration file, which sets the number of accepted reruns of
failed tests from the 'fragile' list.

Added the ability to check failed tests against the fragile list to be
sure that the current fail matches the issue mentioned in the fragile list.

Closes tarantool/tarantool#189.
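
Below is a minimal sketch of how such a JSON 'fragile' value could be read from a suite.ini file, assuming Python's standard configparser and json modules; the section name and the helper are illustrative assumptions, not test-run's actual code.

import configparser
import json

def read_fragile(ini_path):
    # Read the raw multi-line 'fragile' value and decode it as JSON.
    parser = configparser.ConfigParser()
    parser.read(ini_path)
    raw = parser.get('default', 'fragile', fallback='{}')
    fragile = json.loads(raw)
    retries = fragile.get('retries', 1)   # default value here is illustrative
    tests = fragile.get('tests', {})
    return retries, tests

# Example usage (hypothetical file and test names):
# retries, tests = read_fragile('suite.ini')
# tests.get('bitset.test.lua', {}).get('checksums', [])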

@avtikhon avtikhon requested a review from Totktonada July 15, 2020 09:04
@avtikhon avtikhon self-assigned this Jul 15, 2020
@avtikhon avtikhon force-pushed the avtikhon/gh-5050-retries branch 2 times, most recently from c6cb7a3 to 0a0c6d8 on August 6, 2020 07:39
@avtikhon avtikhon requested a review from LeonidVas August 6, 2020 10:32
@LeonidVas

It looks like this closes #189. Add this information to the commit message instead of "Part of tarantool/tarantool#5050".

@avtikhon avtikhon force-pushed the avtikhon/gh-5050-retries branch 3 times, most recently from f1cfe21 to 7869fe9 on August 12, 2020 10:42
@avtikhon
Contributor Author

It looks like this closes #189. Add this information to the commit message instead of "Part of tarantool/tarantool#5050".

Ok, changed.

@avtikhon avtikhon force-pushed the avtikhon/gh-5050-retries branch 2 times, most recently from e76a809 to 358cf41 on August 13, 2020 07:18
@LeonidVas

After some discussion with @Totktonada and @avtikhon, it was decided to try using JSON for the "fragile" configuration.
Example:

fragile =
 {
   "tests": {
     "test_name": {
       "checksums": [
         "ch1",
         "ch2"
       ]
     }
   },
   "retries": 10
 }

Useful links:
https://docs.python.org/3/library/configparser.html#supported-ini-file-structure
https://tools.ietf.org/html/rfc7159

@avtikhon
Contributor Author

After some discussion with @Totktonada and @avtikhon, it was decided to try using JSON for the "fragile" configuration.
Example:

fragile =
 {
   "tests": {
     "test_name": {
       "checksums": [
         "ch1",
         "ch2"
       ]
     }
   },
   "retries": 10
 }

Useful links:
https://docs.python.org/3/library/configparser.html#supported-ini-file-structure
https://tools.ietf.org/html/rfc7159

Implemented.

@avtikhon avtikhon force-pushed the avtikhon/gh-5050-retries branch 3 times, most recently from e40595e to 78a301d on August 13, 2020 12:53

@LeonidVas LeonidVas left a comment


As I said before, the transition to the JSON format should be implemented by the first patch in the patchset, and all new "fragile" options need to be added only to the JSON one. Introducing two new competing formats in the same patchset is a very strange decision.
Also, I think the format fragile = <basename of the test> ; gh-<issue> md5sum:<checksum> is strange (and potentially not valid), because:

  • ; / # is used for adding comments, so it looks like md5sum:<checksum> is a comment, but it is not (the sketch below illustrates the ambiguity).
  • = / : is the key/value separator, and mixing them in the same file is a strange decision.
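
A quick illustration of the comment ambiguity with Python's configparser (a sketch only, not test-run's actual parsing code): whether the md5sum part survives depends entirely on whether inline comment prefixes are enabled.

import configparser

INI = """
[default]
fragile = bitset.test.lua ; gh-4095 md5sum:050af3a99561a724013995668a4bc71c
"""

# Default parser: ';' does not start an inline comment, the whole value is kept.
plain = configparser.ConfigParser()
plain.read_string(INI)
print(plain['default']['fragile'])
# -> bitset.test.lua ; gh-4095 md5sum:050af3a99561a724013995668a4bc71c

# With inline comments enabled, everything after ';' is silently dropped.
commented = configparser.ConfigParser(inline_comment_prefixes=(';', '#'))
commented.read_string(INI)
print(commented['default']['fragile'])
# -> bitset.test.lua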

@Totktonada
Member

I didn't look thoroughly, but I would highlight several points.

  1. The approach with checksums is not universal. It does not work if time-related information appears in a result file. It requires periodic checksum updates if a line number is reported as part of an error. I guess it is okay to operate on the test level and mark a test either as stable or as fragile.
  2. We can implement reruns either within a worker or one level up: within the task management system (dispatchers-listeners-workers). Since there is nothing tarantool-specific here, I would look into the latter way. I pushed a variant to the Totktonada/rerun-fragile-test branch (0292fe2).

If you want my opinion about extracting checksums from comments: it is disgusting. Sorry.

There are more points I would highlight about the patchset, but that does not make much sense if we get rid of checksums altogether. The patchset modifies many different parts of test-run: too much for this small feature, IMHO.

Added the ability to use the fragile list in JSON format from 'suite.ini'
files:

  fragile = {
    "tests": {
        "bitset.test.lua": {
            "issues": [ "gh-4095" ]
        }
    }}

@LeonidVas LeonidVas left a comment


Looks syntactically correct to me.

@Totktonada
Member

I pushed two branches into tarantool today (and reran failed jobs several times):

  • Got several 'address already in use' fails (http_client.test.lua). It is a problem of the test-run design (Fix 'Address already in use' error with TCP sockets #141), not of the test per se.
  • Got several fails on net.box_reconnect_after_gh-3164.test.lua with the whole box.info.replication output (time-related information, UUIDs).
  • Got several fails on net.box_reconnect_after_gh-3164.test.lua with peer UUID, Unix socket path, schema version and so on (these will differ from run to run).

There are other fails, but I didn't check whether the output was predictable (at least vinyl.ddl.test.lua, I don't remember the others).

I know you're sure you can mark almost everything with checksums, but I see the opposite picture now. Maybe I'm just lucky.

At least, please, leave the ability to rerun a test that is not marked with any checksum (I didn't look at the last patchset, maybe it is there already).

Aside from this, the 'transient fail' statistics were nice: no need to scroll the screen to find out whether reruns occurred after full local testing.

@avtikhon
Contributor Author

avtikhon commented Sep 23, 2020

I pushed two branches into tarantool today (and reran failed jobs several times):

I've been running the same tests for 3 days, twice an hour, and got a really large amount of results. Having checked some of them, I got the same results as Alexander describes below, but that was expected. I've updated the table [1] with a new column 'Checksum' and set some of the results there: green marks checksums that are available, red marks tests that have changing data. As I checked before, we'll have the same issue with checksums only in some tests: I found only 9 that print box info, plus some other tests that print other changing data.

[1] - https://docs.google.com/spreadsheets/d/1LGfHM_6tn84lzlz3XIXQnOqNY_KtzRWuZg33_yuv4YI/edit#gid=1890598391

Right, we have some of these fails, but they pass when the tests are run one-by-one via the fragile list.

  • Got several fails on net.box_reconnect_after_gh-3164.test.lua with the whole box.info.replication output (time-related information, UUIDs).
  • Got several fails on net.box_reconnect_after_gh-3164.test.lua with peer UUID, Unix socket path, schema version and so on (these will differ from run to run).

Right, I've marked this test as red in the table.

There are other fails, but I didn't check whether the output was predictable (at least vinyl.ddl.test.lua, I don't remember the others).

I know you're sure you can mark almost everything with checksums, but I see the opposite picture now. Maybe I'm just lucky.

Right, we have some tests like this, but there are two ways to resolve it: fix the test or remove the box information printing if it is not really needed. I've prepared the initial patch with the checksums marked as green in the table mentioned above, and the results are really good: only 1 new real fail found [2]!

[2] - https://gitlab.com/tarantool/tarantool/-/pipelines/193120097/builds

At least, please, leave the ability to rerun a test that is not marked with any checksum (I didn't look at the last patchset, maybe it is there already).

That was the main reason why I didn't want to have this test rerun feature without checksums.

Aside from this, the 'transient fail' statistics were nice: no need to scroll the screen to find out whether reruns occurred after full local testing.

Actually, after the checksums are written to the suite.ini files, these fails will be really rare.

@Totktonada
Member

Got several 'address already in use' fails (http_client.test.lua). It is a problem of the test-run design (#141), not of the test per se.

Right, we have some of these fails, but they pass when the tests are run one-by-one via the fragile list.

It is maybe okay as a temporary workaround, but it is not so complex to fix the cause: bind httpd.py to port zero (to let the kernel choose a random one) and read the real port from its output (we already read heartbeats from there).
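
For reference, the port-zero idea looks roughly like this (a sketch under the assumption that the helper HTTP server can print its port for the harness to read; this is not httpd.py's actual code):

import socket

# Bind to port 0 so the kernel picks a free port: no 'address already in use'.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(('127.0.0.1', 0))
sock.listen(1)

# Report the real port on stdout so the test harness can read it,
# the same way it already reads heartbeats.
port = sock.getsockname()[1]
print('listening on port', port, flush=True)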

At least, please, leave the ability to rerun a test that is not marked with any checksum (I didn't look at the last patchset, maybe it is there already).

That was the main reason why I didn't want to have this test rerun feature without checksums.

To be honest, I don't get the answer.

Aside from this, the 'transient fail' statistics were nice: no need to scroll the screen to find out whether reruns occurred after full local testing.

Actually, after the checksums are written to the suite.ini files, these fails will be really rare.

But I still want to know about it when it occurs, to verify whether it was introduced by a patch I'm working on. It is okay to postpone this feature, but it would be useful.

lib/worker.py Outdated
if is_fragile:
    testname = os.path.basename(task_id[0])
    fragile_checksums = self.suite.get_test_fragile_checksums(testname)
if is_fragile and fragile_checksums:
Member


Here you block the ability to rerun a test without a checksum. It is possible that a test cannot be verified just by its result file, but we still want to allow reruns for it.

I remember the reason why you don't want to enable reruns by default when no checksums are provided: because it means that existing configs will let tests that are currently in the fragile list be retried.

However, this is not so until we set retries explicitly, so it seems we can do this without subtle behaviour changes. I mean, if we consider an empty (or non-existing) checksums list as 'rerun it anyway', everything will be good. But if you prefer an explicit '*' checksum, I don't mind.

You can also consider this out of scope (not needed right now), but to me it looks quite natural that…

fragile = {
    "retries": 2,
    "tests": {
        "bitset.test.lua": {
            "issues": ["gh-4095"]
        }
    }
}

…means 'rerun bitset.test.lua two times'.

Anyway, it is up to you. I just noted that the problem you highlighted looks not applicable to our situation.
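
A minimal sketch of the decision being discussed, with hypothetical names (not the patch's actual worker code): treat a missing or empty checksums list as 'rerun anyway', otherwise require a checksum match.

def should_rerun(is_fragile, fragile_checksums, result_checksum):
    # Only tests from the fragile list are ever retried.
    if not is_fragile:
        return False
    # No checksums configured: the test is listed as fragile, rerun it anyway.
    if not fragile_checksums:
        return True
    # Otherwise rerun only on a known fail, i.e. a matching reject-file checksum.
    return result_checksum in fragile_checksums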

Contributor Author


I commented on it at the same time as you were writing the questions here:
#217 (comment)
In short: I agree with it, but as a standalone commit for this feature.

@Totktonada
Member

Status: It looks almost ready (considering the decisions made); I just want to discuss several points before we land the patchset.

@avtikhon
Contributor Author

avtikhon commented Sep 24, 2020

Got several 'address already in use' fails (http_client.test.lua). It is a problem of the test-run design (#141), not of the test per se.

Right, we have some of these fails, but they pass when the tests are run one-by-one via the fragile list.

It is maybe okay as a temporary workaround, but it is not so complex to fix the cause: bind httpd.py to port zero (to let the kernel choose a random one) and read the real port from its output (we already read heartbeats from there).

I absolutely agree that running the tests like this is a temporary workaround and it needs to be fixed. We'll continue working on the fix in already existing standalone issues like:

tarantool/tarantool-qa#228

At least, please, leave the ability to rerun a test that is not marked with any checksum (I didn't look at the last patchset, maybe it is there already).

That was the main reason why I didn't want to have this test rerun feature without checksums.

To be honest, I don't get the answer.

I mean that a blind rerun of flaky tests is more dangerous than the manual reruns we currently have. Checking 'checksums' is a way to make these reruns partly automated and to help the testing infrastructure avoid manual reruns. Unconditional reruns would hide new flaky issues.
Right, we can add a new feature to rerun the needed tests. It can be provided with a special checksum/mark like:
"checksums": [ "temporary_mark_to_rerun_on_any_fail_NOT_TO_COMMIT" ]
and a special check in the test-run scripts for this mark, to pass a test on rerun regardless of the failed checksum. It will help to manually check some flaky tests. But I think this feature must be committed as a separate patch from the current patchset, with a standalone issue number.
Also, along these lines, I would suggest a new feature that Sergey Bronnikov proposed: a testing mode with mandatory reruns for all tests, to be able to catch flaky tests right when they first appear; but I would like to implement this feature as a standalone commit too.
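
A rough sketch of how such a mark could be honoured (the mark string comes from the proposal above; the helper itself is hypothetical and not part of the patchset):

RERUN_ON_ANY_FAIL = 'temporary_mark_to_rerun_on_any_fail_NOT_TO_COMMIT'

def checksum_accepted(result_checksum, fragile_checksums):
    # Accept the rerun either when the special 'rerun on any fail' mark is
    # present in the configured list or on an exact checksum match.
    if RERUN_ON_ANY_FAIL in fragile_checksums:
        return True
    return result_checksum in fragile_checksums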

Aside from this, the 'transient fail' statistics were nice: no need to scroll the screen to find out whether reruns occurred after full local testing.

Actually, after the checksums are written to the suite.ini files, these fails will be really rare.

But I still want to know about it when it occurs, to verify whether it was introduced by a patch I'm working on. It is okay to postpone this feature, but it would be useful.

Right, let's implement it as a next step, not now.

Added the ability to set a per-suite 'retries' option in the suite.ini
configuration file, which sets the number of accepted reruns of
failed tests from the 'fragile' list:

  fragile = {
    "retries": 10,
    "tests": {
        "bitset.test.lua": {
            "issues": [ "gh-4095" ]
        }
    }}

Part of #189
Added the ability to compute the result file checksum on a test fail and
compare it with the checksums of the known issues mentioned in the
fragile list. The fragile list should contain the result file
checksums along with their issues in the format:

  fragile = {
    "retries": 10,
    "tests": {
        "bitset.test.lua": {
            "issues": [ "gh-4095" ],
            "checksums": [ "050af3a99561a724013995668a4bc71c", "f34be60193cfe9221d3fe50df657e9d3" ]
        }
    }}

Closes #189
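
For illustration, a checksum like the ones above could be computed from a reject (result) file roughly as follows; hashlib is the standard library, while the file path and helper name are assumptions rather than the patch's actual code.

import hashlib

def reject_checksum(path):
    # md5 of the whole reject file, as a hex string matching the format above.
    with open(path, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()

known = ['050af3a99561a724013995668a4bc71c',
         'f34be60193cfe9221d3fe50df657e9d3']
if reject_checksum('app/bitset.reject') in known:
    print('known fragile fail, retrying the test')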
Member

@Totktonada Totktonada left a comment


LGTM.

@Totktonada Totktonada merged commit ec8c991 into master Sep 24, 2020
@Totktonada Totktonada deleted the avtikhon/gh-5050-retries branch September 24, 2020 21:41
Totktonada added a commit to tarantool/tarantool that referenced this pull request Sep 24, 2020
Retry a failed test when it is marked as fragile (and several other
conditions are met, see below).

test-run already allows setting a list of fragile tests. They are run
one-by-one after all the parallel ones in order to eliminate possible
resource starvation and to keep timings close to those of the runs in
which the tests pass. See [1].

In practice this approach does not help much against our problem with
flaky tests. We decided to retry failed tests when they are known to be
fragile. See [2].

The core idea is to split responsibility: known flaky fails will not
deflect a developer's attention, while each fragile test will be marked
explicitly, tracked in the issue tracker and analyzed by the quality
assurance team.

The default behaviour is not changed: each test from the fragile list
will be run once after all the parallel ones. But now it is possible to
set the retries amount.

Beware: the implementation does not allow just setting the retries
count; it also requires providing an md5sum of the failed test output
(the so-called reject file). The idea here is to ensure that we retry
the test only in case of a known fail, not some other fail within the
test.

This approach has a limitation: in case of a fail, a test may output
information that varies from run to run or depends on the base
directory. We should always verify the output before putting its
checksum into the configuration file.

Despite doubts regarding this approach, it looks simple, and we decided
to try it and revisit it if the need arises.

See the configuration example in [3].

[1]: tarantool/test-run#187
[2]: tarantool/test-run#189
[3]: tarantool/test-run#217

Part of #5050
Totktonada added three more commits to tarantool/tarantool that referenced this pull request on Sep 24, 2020, each cherry picked from commit 43482ee with the same message.
@Totktonada
Member

The test-run submodule is updated in tarantool in the following commits: 2.6.0-104-g43482eedc, 2.5.1-86-gc5bb549f6, 2.4.2-70-gef330c3b0, 1.10.7-35-g91260069c.

Totktonada pushed a commit to tarantool/tarantool that referenced this pull request Dec 26, 2020
Removed the obsolete part in the RPM spec for Travis CI, as it is no
longer in use.

---- Comments from @Totktonada ----

This change is a kind of reversion of commit
d48406d ('test: add more tests to
packaging testing'), which closed #4599.

Here I describe the story: why the change was made and why it is
reverted now.

We run testing during an RPM package build: it may catch some
distribution-specific problem. We had reduced the quantity of tests and
switched to single-thread test execution to keep the testing stable and
not break package builds and deployment due to known fragile tests.

Our CI used to run on Travis CI, but we were transitioning to GitLab CI
to use our own machines and to avoid hitting the Travis CI limit of five
jobs running in parallel.

We moved package builds to GitLab CI, but kept build+deploy jobs on
Travis CI for a while: GitLab CI was new for us and we wanted to make
this transition smooth for users of our APT / YUM repositories.

After enabling package builds on GitLab CI, we wanted to enable more
tests (to catch more problems) and parallel execution of tests to speed
up testing (and reduce the amount of time a developer waits for
results).

We observed that if we enabled more tests and parallel execution on
Travis CI, the testing results would become much less stable, and so we
would often have holes in deployed packages and a red CI.

So we decided to keep the old way of testing on Travis CI and perform
all changes (more tests, more parallelism) only for GitLab CI.

We guessed that we had enough machine resources and would be able to do
some load balancing to overcome flaky fails on our own machines, but in
fact we picked another approach later (see below).

That's the whole story behind #4599. What has changed since those days?

We moved deployment jobs to GitLab CI[^1] and have now completely
disabled Travis CI (see #4410 and #4894). All jobs were moved either to
GitLab CI or directly to GitHub Actions[^2].

We revisited our approach to improving the stability of testing.
Attempts to do some load balancing together with attempts to keep a
not-so-large execution time failed. We would have to increase
parallelism for speed, but decrease it for stability at the same time.
There is no optimal balance.

So we decided to track flaky fails in the issue tracker and restart a
test after a known fail (see details in [1]). This way we don't need to
exclude tests and disable parallelism in order to get stable and fast
testing[^3]. At least in theory. We're on the way to verifying this
guess, but hopefully we'll settle on some adequate defaults that will
work everywhere[^4].

To sum up, there are several reasons to remove the old workaround, which
was implemented in the scope of #4599: no Travis CI, and no foreseeable
reasons to exclude tests or reduce parallelism depending on the CI
provider.

Footnotes:

[^1]: This is a simplification. Travis CI deployment jobs were not moved
      as is. GitLab CI jobs push packages to the new repositories
      backend (#3380). Travis CI jobs were disabled later (as part of
      #4947), after proof that the new infrastructure works fine.
      However, this is another story.

[^2]: Now we're going to use GitHub Actions for all jobs, mainly because
      GitLab CI is poorly integrated with GitHub pull requests (when the
      source branch is in a forked repository).

[^3]: Some work in this direction is still to be done:

      First, the 'replication' test suite is still excluded from testing
      under the RPM package build. It seems we should just enable it
      back; this is tracked by #4798.

      Second, there is issue [2] about getting rid of ancient traces of
      the old attempts to keep testing stable (on the test-run side).
      This will give us more parallelism in testing.

[^4]: Of course, we investigate flaky fails and fix the code and testing
      problems this feeds us. However, it appears to be a long-term
      activity.

References:

[1]: tarantool/test-run#217
[2]: https://github.com/tarantool/test-run/issues/251
Totktonada pushed three more commits to tarantool/tarantool that referenced this pull request on Dec 26, 2020, each cherry picked from commit d9c25b7 with the same message.