Reset database and servers for each test #568

mliszcz · 2019-06-15T18:44:02Z

This PR resets the database and device servers for each tests.

There is a wrapper script:

run_with_fixture.sh <test-binary> [param1] [param2] ...

The script:

starts docker with mysql and database ds
starts DevTest and FwdTest
runs <test-binary> (forwards all parameters)
stops DevTest and FwdTest
stops dockers

Note: resetting environment for each CxxTest's test_something() function is not in scope of this PR.

Changes

Fix tests that were depending on configuration set in previous tests:
- 24ece26 Correct event::att_conf_event_buffer test
- 4688e24 Correct CXX::cxx_mem_attr test
- 8b69890 Correct event::event_lock test
Docker client is needed to start mysql/database ds from wrapper script
- 840d759 Expose docker daemon in CI containers
tango_admin is needed to ping database / admin device
- ebdd4ed Install tango_admin in CI containers
The script itself
- dfb49e0 Add new test runner and server start/stop scripts
- ae2dd81 Enable new test runner script in CI
We can now run tests in parallel
- d86ffeb Run tests in parallel in CI
Update README
- 666d719 Update instructions how to run tests
Remove old scripts, they will no longer work
- b3db05b Remove old scripts for running tests

Testcase output is collected in test_results directory:

cpp_test_suite/test_results/
└── cxx_stateless_subscription_20190616.193616.465333747
    ├── conf_devtest.out
    ├── cxx_stateless_subscription_exit_code.out
    ├── cxx_stateless_subscription.out
    ├── DevTest_test.0.out
    ├── DevTest_test2.0.out
    ├── DevTest_test.out
    ├── FwdTest_test.out
    └── server_pids
        ├── 3122
        └── 3152

Results

I've compared ctest output with and without changes in this PR. It is the same (timestamps, pids, etc. ... differ of course), except CXX::cxx_old_poll, which seems to produce different output on every run.

There is ~10s overhead for each test (start/stop dockers/servers). This gives ~15m increase in sequential test execution of ~90 tests.

Before any changes (8b69890) - 1000s:
https://travis-ci.org/tango-controls/cppTango/jobs/546186693
```
Total Test time (real) = 1059.32 sec
```
With script (ae2dd81) - 1000 + 1000 = 2000s:
https://travis-ci.org/tango-controls/cppTango/jobs/546296376
```
Total Test time (real) = 1856.24 sec
```
With 8 parallel tests (d86ffeb) - 2000s / 8 = 250s:
https://travis-ci.org/tango-controls/cppTango/jobs/546399263
```
Total Test time (real) = 287.84 sec
```

If we run multiple tests in parallel, some may fail (due to timer expiration under high load).
If there are any failures in CI, I re-run failed tests sequentially and require them to pass two times in a row.

8 parallel tests is a reasonable value for Travis (0 - 4 tests usually fail). I tried with 16 parallel tests but on first run 21 out of 93 tests failed and there was no improvement in execution time.

Remarks

Codacy is failing due to changes in Dockerfiles. The bot suggest to specify version in apt like: apt install package=x.y.z. I don't think it will do any good.
I'm not sure if/how code coverage calculation is impacted. I needed to adjust CMAKE_CTEST_COMMAND for coveralls plugin, but coveralls seems to be no longer in use. @Ingvord could you comment on? Is it possible to see coverage report for this PR on sonar?

Explicitly set attribute's min_value to 0. Until now, min_value = 0 was carried from previous testcase (event::att_conf_event).

Write Short_attr_w value so that device is in ALARM state. Until now, the value = 10 was carried from previous testcase (CXX::cxx_attr_write, test_write_some_scalar_attributes).

Set abs_change property for attribute event_change_tst. Until now, property was carried from previous testcase (event::multi_event).

bourtemb · 2019-07-18T08:25:45Z

When looking at the first log of Travis, it seems like one test failed (37 - CXX::cxx_reconnection_zmq (Failed)) but Travis check status is green as if everything was fine.
So there must be a problem with the final value returned by the script ran by Travis.

mliszcz · 2019-07-18T08:51:12Z

So there must be a problem with the final value returned by the script ran by Travis.

This is by design.

We have problems with our testcases as they rely on timers and sleeps.
When we run multiple tests in parallel, the system experiences a higher load and tests fail more often.
That's why I added a second CTest run with -rerun-failed.

if ! docker exec -w "${build_dir}" cpp_tango ctest -output-on-failure -j8
then
  if ! docker exec -w "${build_dir}" cpp_tango ctest --output-on-failure --rerun-failed --repeat-until-fail 2
  then
    exit 1
  fi
fi

The test failed in the first run and then passed two times in a row on the second run, so I assumed that the test is unstable and the commit should be accepted. Is this behavior ok from your pov?

99% tests passed, 1 tests failed out of 93
Total Test time (real) = 287.51 sec
The following tests FAILED:
	 37 - CXX::cxx_reconnection_zmq (Failed)
Errors while running CTest
Test project /home/tango/src/build
    Start 37: CXX::cxx_reconnection_zmq
    Test #37: CXX::cxx_reconnection_zmq ........   Passed  103.31 sec
    Start 37: CXX::cxx_reconnection_zmq
1/1 Test #37: CXX::cxx_reconnection_zmq ........   Passed  103.49 sec
100% tests passed, 0 tests failed out of 1
Total Test time (real) = 206.81 sec

bourtemb · 2019-07-19T07:38:34Z

Indeed, you already explained that in the long description of the Pull Request...
Sorry for making you repeat yourself... I was so used that when a test fails, Travis check is marked as red...
Your proposal is a clever approach.
I think the best would be that each test never fails, but with your approach, if a test is failing, it should pass again at least twice successfully to be considered as OK.
I think this could be acceptable in the current situation.
Still the best would be that each test never fails.

Could we infer the number of available cores on the Travis machine and use that number (or a relevant number for this number of available cores) in the -j option?

mliszcz · 2019-07-19T08:47:13Z

Could we infer the number of available cores on the Travis machine and use that number (or a relevant number for this number of available cores) in the -j option?

Do you think we need to determine the number of available cores, i.e. can it change? Per travis docs, it is fixed to 2: https://docs.travis-ci.com/user/reference/overview/

Also, there is $TRAVIS_NUMCORES: travis-ci/travis-build#1079

Anyway, I did a dummy commit. According to lscpu output and /proc/cpuinfo, there are two CPUs in the machine used by travis. Judgind by the clock rate and L3 cache those may be Xeons E5-2699 v3. There is some virtualization involved (KVM) and we have just one hyper-threaded core in each CPU. This means that we have four cores in total.

The problem is to determine the relation between the cpu cores and the number of parallel tests.

Some tests are sleeping for a long time, some are waiting for I/O, etc. ... and in such cases another process may have a chance to run on the cpu.

From original post:
sequential: 1856.24 sec
8 in parallel: 287.84 sec (so speedup is almost linear, 1856.24 sec/8 = 232)
16 is too much

If we really want flexibility here, I propose to use: (num of jobs) = 2 * (num of cores).
Update: (num of cores) above includes HT sibling cores.

mliszcz · 2019-11-04T14:40:09Z

@bourtemb, @t-b, @Ingvord is there any interest in having this PR merged? If yes, could you please do a review? I can then do a rework, apply your comments and resolve any confllicts. Otherwise I can close this PR without merging.

t-b · 2019-11-04T14:46:30Z

@mliszcz I would be in favour of having this here.

Ingvord

Fine with me. Go ahead and merge!

mliszcz · 2019-11-04T15:03:37Z

@t-b thanks!. Have you had a chance to read through the changes?

@t-b, @Ingvord, @bourtemb
I think we need to discuss at least whether this feature should be

optional (cmake flag, enabled by default?)
or mandatory (with old behavior removed, including start-/stop-tango targets and pre_test / post_test scripts)?

Currently I've made it mandatory, so that you can't start the database or device servers manually. All you can (and need to) do is to run ctest. This makes running tests simpler but maybe some more complex scenarios (or manual testing) will be imposible to execute. What is your opinion?

Ingvord · 2019-11-04T15:13:13Z

I believe you can always run DevTest and co manually using e.g. CLion. Since these are just executables. Obviously one will need to run conf_devtest on a fresh database before running any test server. Would be nice to have some convenience script for that matter.

Anyway for me this seems to be enough to let's say debug server/client. I can not envision any other complex case right away. Once anyone bumps into that we will handled it.

bourtemb · 2019-11-04T15:59:36Z

Currently I've made it mandatory, so that you can't start the database or device servers manually. All you can (and need to) do is to run ctest. This makes running tests simpler but maybe some more complex scenarios (or manual testing) will be imposible to execute. What is your opinion?

I agree with Igor. We can change the behaviour if needed in the future.

t-b · 2019-11-05T13:56:48Z

@mliszcz I'll make the review after the rebase.

mliszcz · 2019-11-05T19:03:24Z

@t-b thanks! I won't be able to rebase the PR at least during the next couple of days. Also, let's merge first #608 which will bring more conflicts and will require some extra changes to start the second database just for group test.

If you have any general comments regarding the proposed solution, please tell. Otherwise I'll ask you for a review once I have the PR rebased.

t-b · 2019-11-05T21:38:17Z

@mliszcz The general direction is very nice!

t-b

@mliszcz This is a really cool PR!!

The title says only something about databaser server resetting but in fact running the tests locally now just works! It took here 240s with 12 parallel jobs. I had to cherry-pick 4b5128f (zmqeventconsumer/zmqeventsupplier: Make it compatible with all stock cppzmq versions, 2019-06-05) though. So we need to get that merged :))

mliszcz · 2019-12-11T09:17:36Z

Thanks @t-b

It will be hard to bring this whole PR up to date.
I already descoped some parts to separate PRs (which can be merged without conflicts).
Before rebasing this one on master, I'd like to have these merged: #632, #633, #634.
I see you already approved the other PRs. @Ingvord or @bourtemb could you please check them as well?

t-b · 2019-12-11T12:17:42Z

@mliszcz I've merged the mentioned PRs.

mliszcz · 2019-12-15T14:19:23Z

Obsoleted by #640. Let's continue the review in the other PR.

mliszcz added 3 commits June 13, 2019 12:39

Correct event::att_conf_event_buffer test

24ece26

Explicitly set attribute's min_value to 0. Until now, min_value = 0 was carried from previous testcase (event::att_conf_event).

Correct CXX::cxx_mem_attr test

4688e24

Write Short_attr_w value so that device is in ALARM state. Until now, the value = 10 was carried from previous testcase (CXX::cxx_attr_write, test_write_some_scalar_attributes).

Correct event::event_lock test

8b69890

Set abs_change property for attribute event_change_tst. Until now, property was carried from previous testcase (event::multi_event).

mliszcz added the Do Not Merge label Jun 15, 2019

mliszcz force-pushed the tests-reset-environment branch from 3601eb5 to c8313c7 Compare June 15, 2019 19:55