Totktonada

#330 describes a problem with searching a Lua module in the consistent
mode:

```
./test/test-run.py sql-tap/collation -j -1
```

However, the same tests work in the parallel mode with one worker
process:

```
./test/test-run.py sql-tap/collation -j 1
```

First of all, the `sql-tap/collation` argument actually matches two
tests:

* `sql-tap/collation_unicode.test.lua`
* `sql-tap/collation.test.lua`

The latter is marked as fragile, while the former has no such mark.
test-run places them in two different task groups, which are served by
two different worker instances. However, the test suite object is shared
between them.
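
The selection and grouping described above can be sketched roughly as follows. This is only an illustration: the substring matching rule, the `TESTS` table, the `select1` entry, and both helper functions are assumptions made for the example, not test-run's actual code.

```python
# Hypothetical test registry: only the two collation tests come from the
# description above; select1 is an invented non-matching entry.
TESTS = {
    "sql-tap/collation_unicode.test.lua": {"fragile": False},
    "sql-tap/collation.test.lua": {"fragile": True},
    "sql-tap/select1.test.lua": {"fragile": False},
}


def select_tests(pattern):
    """Return test names that contain the pattern (assumed matching rule)."""
    return sorted(name for name in TESTS if pattern in name)


def split_into_groups(names):
    """Place stable and fragile tests into separate task groups."""
    groups = {"stable": [], "fragile": []}
    for name in names:
        key = "fragile" if TESTS[name]["fragile"] else "stable"
        groups[key].append(name)
    return groups


selected = select_tests("sql-tap/collation")
groups = split_into_groups(selected)
for group_name, members in groups.items():
    print(group_name, members)
```

Under these assumptions, one command line argument yields two task groups, and each group gets its own worker.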

A worker's vardir is calculated as `suite's vardir + worker_name`, and
the result is written back to the suite's vardir. This code is written
on the assumption that the test suite holds the main vardir before the
worker is initialized.

This assumption works for the parallel mode, because of two facts:

* The suite object is created with the main vardir as vardir.
* The suite object is copied into a worker's process and so it is never
  changed twice.

The latter property does not hold in the consistent mode. We create a
`001-sql-tap` worker for stable tests and then another `001-sql-tap`
worker for fragile tests, and in the consistent mode both of them live
in the same process.

This results in a worker's vardir like `/tmp/t/001-sql-tap/001-sql-tap`.
However, the needed Lua module resides in `/tmp/t/001-sql-tap`.
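
A minimal sketch of how the doubled path could appear. The `Suite` and `Worker` classes here are simplified stand-ins rather than test-run's real classes, and `copy.copy` only imitates the suite object being copied into a separate worker process.

```python
import copy
import posixpath


class Suite:
    """Simplified stand-in for a test suite object."""

    def __init__(self, vardir):
        self.vardir = vardir


class Worker:
    """Simplified stand-in for a worker."""

    def __init__(self, suite, name):
        # The worker's vardir is derived from the suite's current vardir...
        self.vardir = posixpath.join(suite.vardir, name)
        # ...and then written back to the suite object.
        suite.vardir = self.vardir


def consistent_mode(main_vardir):
    """Both workers share one suite object within the same process."""
    suite = Suite(main_vardir)
    stable = Worker(suite, "001-sql-tap")
    fragile = Worker(suite, "001-sql-tap")
    return stable.vardir, fragile.vardir


def parallel_mode(main_vardir):
    """Each worker gets its own copy of the suite (imitated with copy.copy)."""
    suite = Suite(main_vardir)
    stable = Worker(copy.copy(suite), "001-sql-tap")
    fragile = Worker(copy.copy(suite), "001-sql-tap")
    return stable.vardir, fragile.vardir


print(consistent_mode("/tmp/t"))
print(parallel_mode("/tmp/t"))
```

In this sketch the second worker in the consistent mode reads a suite vardir that the first worker has already overwritten, so the worker name is appended twice.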

Let's just acquire the main vardir directly from the argument list.
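
The idea can be sketched like this. The `Args`, `Suite`, and `Worker` classes and the `args.vardir` attribute are hypothetical stand-ins for illustration; the actual change in the commit may look different.

```python
import posixpath


class Args:
    """Stand-in for parsed command line arguments (assumed shape)."""

    def __init__(self, vardir):
        self.vardir = vardir


class Suite:
    """Simplified stand-in for a test suite object."""

    def __init__(self, vardir):
        self.vardir = vardir


class Worker:
    """Simplified stand-in for a worker."""

    def __init__(self, suite, name, args):
        # Take the main vardir from the arguments, not from the suite,
        # so repeated initialization never nests the worker name.
        self.vardir = posixpath.join(args.vardir, name)
        suite.vardir = self.vardir


args = Args("/tmp/t")
suite = Suite(args.vardir)
stable = Worker(suite, "001-sql-tap", args)
fragile = Worker(suite, "001-sql-tap", args)
print(stable.vardir, fragile.vardir)
```

Because the base path no longer depends on the mutable suite object, creating a second worker for the same suite in the same process gives the same vardir as the first.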

The problem was overlooked when a separate fragile test group was added.
The consistent mode assumed that one worker is created per test suite,
which is no longer the case.

This double initialization of the test suite's vardir is actually a
side effect of the parallelization of test-run. A lot of code already
existed before the parallelization was implemented, and it was not
carefully redesigned.

Fixes #330

@ylobankov left a comment

If it works properly, I am ok.

@Totktonada

> If it works properly, I am ok.

I made several runs in the parallel mode and in the consistent mode using the sql-tap test suite. The former has not regressed, and the latter now works.

I also made an attempt to run all the suites, but I was hit by #380 (comment) and postponed it.

I guess the change is safe enough.

@ylobankov merged commit aac77f5 into master on May 17, 2023
@ylobankov deleted the gh-330-fix-consistent-mode branch on May 17, 2023 17:08

Successfully merging this pull request may close these issues.

Modules listed in lua_libs section might be missing in the working directory when consistent mode is used