Support parallelization #235
Requirements:
Thanks for your requirements!
Let me summarize my understanding.
Key Points:
I will write about
Let me summarize my understanding. We want to finish testing as quickly as possible, so we need to assign tasks promptly to workers that appear to be idle. Here, we call the entities that generate tests producers and the entities that execute tests consumers. There are two ways of assigning tests: push style and pull style. We use pull style because:
push style:
pull style:
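The pull style described above can be sketched roughly as follows. This is only an illustration, not test-unit code: `run_pull_style` and its parameters are hypothetical names, and it assumes each task is a callable object. Each consumer asks the shared queue for the next task only when it has finished the previous one, so idle consumers pick up work without the producer tracking who is busy.

```ruby
# Hypothetical sketch: idle consumers pull the next task from a shared queue.
def run_pull_style(tasks, n_consumers: 3)
  queue = Thread::Queue.new
  tasks.each { |task| queue << task } # producer side
  queue.close                         # pop returns nil once the queue is drained

  results = Thread::Queue.new
  consumers = n_consumers.times.map do |id|
    Thread.new do
      # A consumer only asks for work when it is idle.
      while (task = queue.pop)
        results << [id, task.call]
      end
    end
  end
  consumers.each(&:join)

  # All consumers have finished, so the result queue can be drained safely.
  results.size.times.map { results.pop }
end

tasks = 10.times.map { |i| -> { i * i } }
p run_pull_style(tasks).map { |_id, value| value }.sort
```

Which consumer runs which task is not deterministic, which is why the usage line sorts the values before printing them.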
Let me summarize my understanding. test-unit calling
Let me summarize my understanding. We want the testing framework to work in various environments. External libraries can make implementation more convenient, but they also add dependencies, which makes it harder to work in various environments.
So it doesn't depend on external libraries, including default/bundled gems.
Internal workings of the
Let's implement the Thread backend! We temporarily named the producer
First, we implement a standalone script that communicates with Ruby's
First, we implement only the Consumer, without the Producer.
In general, the number of tasks is often greater than the number of consumers. Ruby's
We can also explicitly switch execution to another thread using the following:
When multiple threads execute destructive operations on the same object, race conditions occur.
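As a minimal sketch of one common way to avoid such a race, the shared update can be wrapped in a `Mutex`. The method name `count_with_mutex` and its parameters are hypothetical and exist only for this illustration; the thread does not say which synchronization approach test-unit will adopt.

```ruby
# Hypothetical sketch: serialize a read-modify-write on shared state.
def count_with_mutex(n_threads: 5, increments: 1_000)
  total = 0
  lock = Mutex.new
  threads = n_threads.times.map do
    Thread.new do
      increments.times do
        # `total += 1` is a read followed by a write; the lock keeps
        # another thread from interleaving between the two steps.
        lock.synchronize { total += 1 }
      end
    end
  end
  threads.each(&:join)
  total
end

p count_with_mutex
```

The same concern applies to counters such as `@n_tests` when several consumer threads update them.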
We implemented Consumer with Producer. Next:
We created 100 tests for testing below.

# sample-test.rb
require 'test-unit'

class SampleTest < Test::Unit::TestCase
  100.times do |i|
    define_method("test_#{i}") do
      p i
    end
  end
end

We checked the version of the local

# lib/test/unit/version.rb
module Test
  module Unit
    VERSION = "3.6.3"
  end
end

To use the local

# sample-test.rb
require 'test-unit'
require "test/unit/version"

p Test::Unit::VERSION

class SampleTest < Test::Unit::TestCase
  100.times do |i|
    define_method("test_#{i}") do
      p i
    end
  end
end

$ ruby sample-test.rb
"3.6.1"
$ ruby -I lib sample-test.rb
"3.6.3"

We added

# lib/test/unit/testcase.rb
module Test
  module Unit
    class TestCase
      def run(result)
        puts caller
        begin
        ...
      end
    end
  end
end

$ ruby -I lib sample-test.rb -n test_1
~/src/github.com/test-unit/test-unit/lib/test/unit/testsuite.rb:124:in `run_test'
~/src/github.com/test-unit/test-unit/lib/test/unit/testsuite.rb:53:in `run'
...

We found that multiple tests are executed sequentially.

# lib/test/unit/testsuite.rb
module Test
  module Unit
    class TestSuite
      def run(result, &progress_block)
        ...
        while test = @tests.shift
          @n_tests += test.size
          run_test(test, result, &progress_block)
          @passed = false unless test.passed?
        end
        ...
      end
    end
  end
end

We implemented Consumer with Producer below.

# lib/test/unit/testsuite.rb
module Test
  module Unit
    class TestSuite
      def run(result, &progress_block)
        @start_time = Time.now
        yield(STARTED, name)
        yield(STARTED_OBJECT, self)
        run_startup(result)
        n_consumers = 5
        tests = Thread::Queue.new
        producer = Thread.new do
          @tests.each do |test|
            tests << test
          end
          n_consumers.times do
            tests << nil
          end
        end
        consumers = n_consumers.times.collect do
          Thread.new do
            loop do
              test = tests.pop
              break if test.nil?
              @n_tests += test.size
              run_test(test, result, &progress_block)
              @passed = false unless test.passed?
            end
          end
        end
        producer.join
        consumers.each(&:join)
        # while test = @tests.shift
        #   @n_tests += test.size
        #   # run_test(test, result, &progress_block)
        #   @passed = false unless test.passed?
        # end
      ensure
        begin
          run_shutdown(result)
        ensure
          @elapsed_time = Time.now - @start_time
          yield(FINISHED, name)
          yield(FINISHED_OBJECT, self)
        end
      end
    end
  end
end

Next:
for future parallelization support. Part of GH-235. Co-authored-by: Sutou Kouhei <kou@clear-code.com>
We refactored the code to execute multiple tests in the
We've started abstracting the
The following methods have been migrated from
Next:
Before: `Array#shift` **removes** and returns the leading element. Because the elements were removed, the number of tests had to be accumulated into `@n_tests` as each one was shifted.

After: `Array#each` iterates over the array elements **without removing** them. Using `Array#each` instead of `Array#shift` means there is no need to accumulate the number of tests into `@n_tests`.

for future parallelization support.

Part of GH-235.

Co-authored-by: Sutou Kouhei <kou@clear-code.com>
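The difference described in this commit message can be illustrated with a small sketch. The method names and the arrays standing in for tests are hypothetical; only `Array#shift`, `Array#each`, and `Array#sum` are real Ruby API.

```ruby
# Hypothetical stand-ins: each "test" is an array whose size is its test count.

# Array#shift drains the list, so the total has to be accumulated
# while each element is still in hand.
def count_while_shifting(tests)
  n_tests = 0
  while (test = tests.shift)
    n_tests += test.size
  end
  n_tests
end

# Array#each leaves the list intact, so the total can be derived
# from the list whenever it is needed.
def count_after_each(tests)
  tests.each { |test| test.size } # stand-in for running each test
  tests.sum(&:size)
end

shifted = [[:a, :b], [:c]]
kept = [[:a, :b], [:c]]
p count_while_shifting(shifted) # the list is empty afterwards
p count_after_each(kept)        # the list still has both entries
```

Keeping the list intact also means several threads can read it without one of them consuming elements the others expect to see.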
The instance variables in
These instance variables can't be referenced from
Next:
Task list:
Use `Array#each` instead of `Array#shift` in GH-240.

Before the GH-240 change: `Array#shift` **removes** and returns the leading element. Because the elements were removed, the results of `TestCase#passed?` or `TestSuite#passed?` had to be cached in `@passed`.

After this change: `Array#each` iterates over the array elements **without removing** them. Using `Array#each` instead of `Array#shift` means there is no need to cache the results of `TestCase#passed?` or `TestSuite#passed?` in `@passed`.

Note: Since `@passed = true` was set in `TestSuite#initialize`, the result should default to true when there are no tests. `Array#all?` returns true for an empty array, as shown below, so the behavior remains unchanged and there is no issue.

```ruby
[].all? { true } # => true
```

for future parallelization support.

Part of GH-235.

Co-authored-by: Sutou Kouhei <kou@clear-code.com>
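A minimal sketch of deriving `passed?` on demand, as this commit message describes, might look like the following. `FakeTest` and `FakeSuite` are hypothetical stand-ins and not test-unit's real classes; the point is only that `Array#all?` gives `true` for an empty list and `false` as soon as one element fails.

```ruby
# Hypothetical minimal classes, not test-unit's real ones.
FakeTest = Struct.new(:ok) do
  def passed?
    ok
  end
end

class FakeSuite
  def initialize(tests)
    @tests = tests
  end

  # Derived on demand from the tests that are still in the list,
  # so no cached @passed flag is needed.
  def passed?
    @tests.all?(&:passed?)
  end
end

p FakeSuite.new([]).passed?
p FakeSuite.new([FakeTest.new(true), FakeTest.new(false)]).passed?
```

Because the suite answers `passed?` by delegating to its children, nested suites work the same way as long as each child responds to `passed?`.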
Migrate the following methods from `TestSuite` to `TestSuiteRunner`.

- `#run`
- `#run_startup`
- `#run_tests`
- `#run_test`
- `#run_shutdown`
- `#handle_exception`

Then invoke `TestSuiteRunner` from `TestSuite`.

for future parallelization support.

Part of GH-235.

Co-authored-by: Sutou Kouhei <kou@clear-code.com>
We discovered a more elegant approach in GH-242, then
We implemented GH-243.
Backend is the
Currently,
We aim to enable switching via command-line options. We see two main approaches:
We considered the following approaches:
Next:
Task list:
We want to switch the backend (`TestSuiteRunner`) with an option. Currently it runs sequentially, but we want to be able to switch it to a `Thread`-based or other parallel runner.

`TestSuiteRunner` is hardcoded inside `TestSuite#run` and cannot be changed from the outside, so we need a mechanism that allows external modification.

We see two main approaches:

* Stop invoking the runner inside and instead accept it from the outside
* Allow injection of the runner from the outside

We considered the following approaches:

1. Pass the runner as a `TestSuite#run` argument

   This seems fragile because the interface might change.

   ```ruby
   # `TestSuiteRunner#run_test`
   test.run(result) do |event_name, *args|
   ```

   Here, `test` is a `TestCase` or a `TestSuite` object. Passing one set of arguments when it is a `TestCase` and another when it is a `TestSuite` does not make sense.

2. Pass the runner as a `TestSuite#initialize` argument

   This switches the backend per `TestSuite`. When `TestCase`s are grouped, `TestSuite`s are nested, so executing a single `TestSuite` means running multiple `TestSuite`s. This does not make sense.

3. Use `TestSuiteRunner` as a system-wide global

   It's uncommon to switch test runners per test suite, so it's better to handle it globally. e.g.:

   ```ruby
   # `TestSuiteRunner.run`
   class << self
     def run(test_suite, result, &progress_block)
     end
   end
   ```

   This seems to make sense.

We've decided to implement approach 3. We've inserted an abstraction layer. Later, the backend can be switched by replacing `TestSuiteRunner.run` from the outside.

for future parallelization support.

Part of GH-235.

Co-authored-by: Sutou Kouhei <kou@clear-code.com>
`TestSuite` only invokes `TestSuiteRunner.run` (not the `.run` of parallel runners such as a `Thread`-based one). As a result, parallel runners don't need to store the default runner, so the default is kept in an instance variable rather than a class variable.

for future parallelization support.

Part of GH-235.

Co-authored-by: Sutou Kouhei <kou@clear-code.com>
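The following sketch shows one way a globally switchable default runner could be shaped, assuming the default is stored in a class-level instance variable as the commit message above describes. `SequentialRunner`, `ThreadedRunner`, and the lambda-based "suite" are hypothetical and greatly simplified; they are not test-unit's actual classes or signatures.

```ruby
# Hypothetical names throughout; only the overall shape follows the discussion.
class SequentialRunner
  class << self
    # Class-level instance variable: it belongs to this class object only
    # and is not shared with subclasses the way @@class variables are.
    attr_writer :default

    def default
      @default || self
    end

    # Single entry point used by the suite.
    def run(suite)
      default.new(suite).run
    end
  end

  def initialize(suite)
    @suite = suite
  end

  def run
    @suite.map(&:call)
  end
end

class ThreadedRunner < SequentialRunner
  def run
    @suite.map { |test| Thread.new(&test) }.map(&:value)
  end
end

suite = [-> { 1 }, -> { 2 }]
p SequentialRunner.run(suite)
SequentialRunner.default = ThreadedRunner
p SequentialRunner.run(suite)
SequentialRunner.default = nil # back to the sequential default
```

Because callers always go through the base class's `run`, only the base class needs to remember which runner is the default.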
We implemented GH-246.
We implemented GH-247.

# e.g.: switch to the thread based runner (not available yet)
Test::Unit::TestSuiteRunner.default = Test::Unit::TestSuiteThreadRunner
# e.g.: switch to the sequential runner (default)
Test::Unit::TestSuiteRunner.default = Test::Unit::TestSuiteRunner

Next:
Task list:
Add support for switching the backend, such as `Thread`. Please note that the `Thread`-based runner is not yet available (it raises `NameError`).

Examples:

* `ruby -I lib test/run-test.rb --parallel`: `Thread`
* `ruby -I lib test/run-test.rb --parallel=thread`: `Thread`
* `ruby -I lib test/run-test.rb --no-parallel`: Sequential
* `ruby -I lib test/run-test.rb` (no `--parallel` option): Sequential

Note: We considered the following other options.

1. `--runner` option
   * Already exists, but it is used to switch the UI execution
   * UI execution and internal parallelism are independent
   * Does not seem to make sense
2. `--executer` option
   * Does not exist
   * `TestSuiteRunner` has run methods (not execute)
   * Does not seem to make sense

for future parallelization support.

Part of GH-235.

Co-authored-by: Sutou Kouhei <kou@clear-code.com>
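The mapping from the command lines listed above to a runner mode can be sketched in plain Ruby as follows. `parallel_mode` is a hypothetical helper written only for illustration; it does not use or reproduce test-unit's real option parsing, and rejecting unknown backends with `ArgumentError` is an assumption.

```ruby
# Hypothetical helper: maps the command lines listed above to a runner mode.
def parallel_mode(argv)
  mode = :sequential # no option given: sequential
  argv.each do |arg|
    case arg
    when "--parallel", "--parallel=thread"
      mode = :thread
    when "--no-parallel"
      mode = :sequential
    when /\A--parallel=(.+)\z/
      # Assumption: unknown backends are rejected rather than ignored.
      raise ArgumentError, "unsupported parallel backend: #{$1}"
    end
  end
  mode
end

p parallel_mode([])
p parallel_mode(["--parallel"])
p parallel_mode(["--parallel=thread"])
p parallel_mode(["--no-parallel"])
```

When both forms appear, this sketch lets the last one win, which is a common convention for paired on/off flags.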
We implemented GH-250.
Add support for switching the backend such as
Examples:
Next:
First, we implemented a sequential

$ ruby -I lib test/run-test.rb --parallel
-----------------------------------------------------------------
471 tests, 1600 assertions, 0 failures, 0 errors, 0 pendings, 6 omissions, 0 notifications 100% passed
-----------------------------------------------------------------
514.18 tests/s, 1755.41 assertions/s

Next, we implemented a parallel

$ ruby -I lib test/run-test.rb --parallel
-----------------------------------------------------------------
471 tests, 1600 assertions, 6 failures, 14 errors, 0 pendings, 6 omissions, 0 notifications 96.9892% passed
-----------------------------------------------------------------
577.98 tests/s, 1963.40 assertions/s

We fixed them one by one. Adding the

$ ruby -I lib test/run-test.rb --parallel --stop-on-failure
~/src/github.com/test-unit/test-unit/lib/test/unit/testresult.rb:99:in `throw': uncaught throw #<Object:0x000000010da08918> (UncaughtThrowError)
      throw @stop_tag
      ^^^^^^^^^
    from /Users/zzz/src/github.com/test-unit/test-unit/lib/test/unit/testresult.rb:99:in `stop'
    from /Users/zzz/src/github.com/test-unit/test-unit/lib/test/unit/autorunner.rb:597:in `block in attach_to_mediator'

The related parts are excerpted below.

# lib/test/unit/testresult.rb
attr_accessor :stop_tag # lib/test/unit/testresult.rb:41

# Constructs a new, empty TestResult.
def initialize
  @run_count, @pass_count, @assertion_count = 0, 0, 0
  @summary_generators = []
  @problem_checkers = []
  @faults = []
  @stop_tag = nil # lib/test/unit/testresult.rb:49
  initialize_containers
end

def stop
  throw @stop_tag # lib/test/unit/testresult.rb:99
end

We found sections setting

$ grep -r 'stop_tag =' lib/
lib//test/unit/ui/testrunnermediator.rb: result.stop_tag = stop_tag
lib//test/unit/testresult.rb: @stop_tag = nil

The related parts are excerpted below.

# lib//test/unit/ui/testrunnermediator.rb
catch do |stop_tag|
  result.stop_tag = stop_tag # lib//test/unit/ui/testrunnermediator.rb:40
  with_listener(result) do
    notify_listeners(RESET, @suite.size)
    notify_listeners(STARTED, result)
    run_suite(result)
  end
end
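The `UncaughtThrowError` in the trace above is consistent with how Ruby's `catch`/`throw` behave across threads: a tag established by `catch` is only active in the thread that called `catch`. The sketch below demonstrates that behavior in isolation; the method names are hypothetical, and whether this is the exact cause in test-unit is an assumption based on the excerpts.

```ruby
# Same thread: the throw unwinds to the enclosing catch and
# the catch block returns the thrown value.
def throw_in_same_thread
  catch do |tag|
    throw tag, :stopped
  end
end

# Different thread: the tag is not active in the new thread, so the
# throw raises UncaughtThrowError there, and join re-raises it here.
def throw_from_other_thread
  catch do |tag|
    thread = Thread.new do
      Thread.current.report_on_exception = false # keep stderr quiet for the demo
      throw tag, :stopped
    end
    thread.join
  end
rescue UncaughtThrowError => error
  error.class
end

p throw_in_same_thread
p throw_from_other_thread
```

In the excerpts, `catch` runs in the thread that starts the mediator, so a `throw @stop_tag` issued from a consumer thread would hit the second case.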
Why does
We see two possible approaches:
Next:
I'm a contributor to red-data-tools/red-datasets. The tests for red-datasets use test-unit. Thanks!
As the test suite has grown, test execution has slowed down. I'll try to improve the execution time with the following:
Even so, test execution is still slow, so I would like test-unit to support parallelization.
I would like to discuss the design of parallelization support for test-unit in this issue.