-
Notifications
You must be signed in to change notification settings - Fork 724
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OPENSCAP-4118 - Add script to build tests #13029
Conversation
Skipping CI for Draft Pull Request. |
Things left to do:
|
build-scripts/build_tests.py
Outdated
return (filename.endswith('.pass.sh') | ||
or filename.endswith('.fail.sh') or | ||
filename.endswith('.notapplicable.sh')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
str.endswith()
takes a tuple, so you can probably just
return (filename.endswith('.pass.sh') | |
or filename.endswith('.fail.sh') or | |
filename.endswith('.notapplicable.sh')) | |
return filename.endswith(('.pass.sh', '.fail.sh', '.notapplicable.sh')) |
build-scripts/build_tests.py
Outdated
for i in range(n): | ||
end = start + chunk_size + (1 if i < remainder else 0) | ||
inters.append(iter(items[start:end])) | ||
start = end | ||
return inters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't mind interleaving the split, you could use python's array slicing "step" property.
Given 3 workers and a total of 10 rules, you can assign rules to them like this:
>>> a = [1,2,3,4,5,6,7,8,9,10]
>>> a[0::3]
[1, 4, 7, 10]
>>> a[1::3]
[2, 5, 8]
>>> a[2::3]
[3, 6, 9]
where the first (start
) number is worker index, and the second (step
) number is the total number of workers.
This is what we use in Contest to avoid any off-by-one problems and ensure no rule is left behind, while making things just simpler and evenly distributing rule complexity across all workers (as complex/simple rules tend to alphabetically bunch up).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some limited comments on the python code (as I am not familiar with the build system overall). Feel free to ignore any of them, as I don't know what is the level of code quality you want to maintain in this project.
Aside from the comments, consider (for an extra pedantic exercise):
typing
consistency - most of the code seems untyped, but some functions have types despite being_
(a.k.a. internal, not part of API). I'm also not sure ifdone_file: pathlib.Path = output_path / ".test_done"
does anything (with the inlined type)- Overall
"
vs'
consistency, they seem to be intermixed randomly - Lack of comments (especially in big functions like
_process_rules()
- Inconsistent indent (
ssg.jinja.process_file_with_macros
in_process_shared_file()
seems to squash two arguments at the end,ssg.environment.open_environment
inmain()
uses a different multi-line function call syntax. - f-strings for
logger.*
calls - note that you don't have to use the prehistoric%s
syntax, if you pass only one argument,logging
takes it as verbatim, so it can be an f-string like one of those you use for exceptions.
I'll test the functionality tomorrow, looking forward to it. 🙂
build-scripts/build_tests.py
Outdated
def _get_deny_templated_scenarios(test_config_path): | ||
deny_templated_scenarios = list() | ||
if test_config_path.exists(): | ||
test_config = ssg.yaml.open_raw(str(test_config_path.absolute())) | ||
if 'deny_templated_scenarios' in test_config: | ||
deny_templated_scenarios = test_config['deny_templated_scenarios'] | ||
return deny_templated_scenarios |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe
def _get_deny_templated_scenarios(test_config_path): | |
deny_templated_scenarios = list() | |
if test_config_path.exists(): | |
test_config = ssg.yaml.open_raw(str(test_config_path.absolute())) | |
if 'deny_templated_scenarios' in test_config: | |
deny_templated_scenarios = test_config['deny_templated_scenarios'] | |
return deny_templated_scenarios | |
def _get_deny_templated_scenarios(test_config_path): | |
if test_config_path.exists(): | |
test_config = ssg.yaml.open_raw(str(test_config_path.absolute())) | |
if 'deny_templated_scenarios' in test_config: | |
return set(test_config['deny_templated_scenarios']) | |
return set() |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .get()
method is always shorter and in most of the cases faster:
def _get_deny_templated_scenarios(test_config_path): | |
deny_templated_scenarios = list() | |
if test_config_path.exists(): | |
test_config = ssg.yaml.open_raw(str(test_config_path.absolute())) | |
if 'deny_templated_scenarios' in test_config: | |
deny_templated_scenarios = test_config['deny_templated_scenarios'] | |
return deny_templated_scenarios | |
def _get_deny_templated_scenarios(test_config_path): | |
deny_templated_scenarios = set() | |
if test_config_path.exists(): | |
test_config = ssg.yaml.open_raw(str(test_config_path.absolute())) | |
deny_templated_scenarios |= test_config.get('deny_templated_scenarios', []) | |
return deny_templated_scenarios | |
build-scripts/build_tests.py
Outdated
with open(shared_script_path, 'w') as file: | ||
file.write(file_contents) | ||
file.write('\n') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with open(shared_script_path, 'w') as file: | |
file.write(file_contents) | |
file.write('\n') | |
_write_path(file_contents, shared_script_path) |
?
(And in other similar cases.)
build-scripts/build_tests.py
Outdated
root_path = pathlib.Path(args.root).resolve().absolute() | ||
output_path = pathlib.Path(args.output).resolve().absolute() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.resolve()
already makes the path absolute:
$ pydoc3 pathlib.Path.resolve | cat
Help on function resolve in pathlib.Path:
pathlib.Path.resolve = resolve(self, strict=False)
Make the path absolute, resolving all symlinks on the way and also
normalizing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can clean that up.
build-scripts/build_tests.py
Outdated
if __name__ == "__main__": | ||
raise SystemExit(main()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would honestly just let python interpreter defaults handle exit code - raise an exception if something bad happens, otherwise let it finish cleanly.
If you really want an exception-less exit on resolved_rules_dir
not existing (seems like an arbitrary choice?), just do sys.exit(1)
in that one case. sys
will take care of the exit-related exception handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can move to sys.exit
. However, under the covers this is how sys.exit
works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess either way works. I remember some problems with raising SystemExit()
with non-CPython intepreters, but we don't use those here, and those problems are probably fixed by now anyway. I would still leave the 0
exit to be done by default instead of SystemExit(0)
, but that's probably bikeshedding.
@@ -0,0 +1,230 @@ | |||
#!/usr/bin/env python3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like that the PR description says in rationale "Make testing with thin data streams possible." because testing with thin data stream has been possible for many months and people are testing with thin data streams daily. Please reword the rationale.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated it.
@@ -0,0 +1,230 @@ | |||
#!/usr/bin/env python3 | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This script is slow, it takes more than 2 minutes to build scenarios for RHEL 9. We need to think to speed up.
build-scripts/build_tests.py
Outdated
benchmark_cpes = _get_benchmark_cpes(env_yaml) | ||
|
||
built_profiles_root = resolved_rules_dir.parent / "profiles" | ||
rules_in_profiles = list(_get_rules_in_profile(built_profiles_root)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a set because many rules are present in more than 1 profile so you have a lot of duplicates in this list. Converting to a set will automatically eliminate duplicates because each item can be only once in a set.
build-scripts/build_tests.py
Outdated
rule_root = rule_path.parent | ||
rule_id = rule_root.name | ||
if rule_id not in product_rules: | ||
logger.debug("Skipping %s since it is not in the product", rule_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't have to open the resolved rule yaml file if it is skipped. You can get the rule id from the file name. It will save some time if you don't open resolved rules that aren't going to be processed.
build-scripts/build_tests.py
Outdated
processes = list() | ||
for chunk in range(args.jobs): | ||
process_args = (benchmark_cpes, env_yaml, output_path, | ||
all_resolved_rules[chunk::args.jobs], templates_root, rules_in_profiles,) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would split the rules_in_profiles into chunks instead of all_resolved_rules because the all_resolved_rules contain rules that aren't going to be processed because they are not present in the built content so the subprocesses won't be load balanced. If you split rules_in_profiles to chunks each subprocess would get the same amout of rules to be processed, which would be better load balancing.
build-scripts/build_tests.py
Outdated
file_contents = ssg.jinja.process_file_with_macros(str(test.absolute()), | ||
jinja_dict) | ||
scenario = tests.ssg_test_suite.rule.Scenario(test.name, file_contents) | ||
if scenario.matches_platform(benchmark_cpes): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very inefficient.
The method "matches_platform" loads all CPEs for all platforms. It calls common.matches_platform
which calls common._get_platform_cpes
which iterates over all matched products and loads all CPEs for these products.
Therefore for many test scenarios we load many CPEs, including CPEs for other products and we repeat the loading for each test scenario. The result is that this script spends most of the time in loading CPEs.
We need to create a different and simplified way for test scenarios platform matching here.
build-scripts/build_tests.py
Outdated
_write_path(file_contents, output_file) | ||
file_contents = test.read_text() | ||
scenario = tests.ssg_test_suite.rule.Scenario(test.name, file_contents) | ||
if scenario.matches_platform(benchmark_cpes): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here we also use the problematic method
build-scripts/build_tests.py
Outdated
logger = logging.getLogger() | ||
for test in rule_tests_root.iterdir(): # type: pathlib.Path | ||
if not test.name.endswith(".sh"): | ||
logger.debug("Skipping file %s in rule %s is it doesn't end with .sh.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will skip files like for example linux_os/guide/auditing/auditd_configure_rules/audit_file_modification/audit_rules_unsuccessful_file_modification/tests/test_audit.rules
which are needed for the test scenarios to work.
There was some bad overrding of variables
build-scripts/build_tests.py
Outdated
if not line.startswith('#'): | ||
# The loop is now in the main test content, stop processing the file. | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test scenario linux_os/guide/system/accounts/accounts-session/accounts_tmout/tests/multiline_profile_d.fail.sh
which is marked as # platform = multi_platform_sle
gets build when I build RHEL 9 content. It shouldn't be there, it's for SUSE. The reason seems to be that this test scenario has empty line before the # platform = multi_platform_sle
line and therefore it's skipped by this code here.
SSG_ROOT = str(pathlib.Path(__file__).resolve().parent.parent.absolute()) | ||
JOB_COUNT = multiprocessing.cpu_count() | ||
T = TypeVar("T") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now it's really fast! It takes me less than 3 seconds on my laptop to build RHEL 9 tests. Great improvement!
build-scripts/build_tests.py
Outdated
) | ||
parser.add_argument( | ||
"--resolved-rules-dir", required=True, | ||
help="Directory with <rule-id>.yml resolved rule YAMLs" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing space after YAMLs
_write_path(content, rule_output_path / test.name) | ||
|
||
|
||
def _process_rules(env_yaml: Dict, output_path: pathlib.Path, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Climate complains about code complexity of this function. Please split it to multiple functions. You can for example extract code related to templated tests to a new function _process_templated_tests which would be consistent with _process_local_tests.
build-scripts/build_tests.py
Outdated
deny_templated_scenarios = _get_deny_templated_scenarios(test_config_path) | ||
for test in template_tests_root.iterdir(): # type: pathlib.Path | ||
if not test.name.endswith(".sh") or test.name in deny_templated_scenarios: | ||
logging.warning("Skipping %s for %s as it is a denied test scenario", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logger
macro(ssg_build_tests PRODUCT) | ||
add_custom_command( | ||
OUTPUT "${CMAKE_BINARY_DIR}/${PRODUCT}/tests/.tests_done" | ||
COMMAND env "PYTHONPATH=$ENV{PYTHONPATH}" "${PYTHON_EXECUTABLE}" "${CMAKE_SOURCE_DIR}/build-scripts/build_tests.py" --build-config-yaml "${CMAKE_BINARY_DIR}/build_config.yml" --resolved-rules-dir "${CMAKE_CURRENT_BINARY_DIR}/rules" --output "${CMAKE_CURRENT_BINARY_DIR}/tests" --product-yaml "${CMAKE_SOURCE_DIR}/products/${PRODUCT}/product.yml" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
works fine for me 👍
Code Climate has analyzed commit fd08c5a and detected 3 issues on this pull request. Here's the issue category breakdown:
The test coverage on the diff in this pull request is 100.0% (50% is the threshold). This pull request will bring the total coverage in the repository to 62.0% (0.0% change). View more on Code Climate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me now. It works for me as I expect.
@comps please check it if it's good for you
Looks good to me, builds (at least on my RHEL-9) as intended. I don't see any empty rule directories anymore (so I'm assuming rules without tests are not present inside I already used it to find quite a lot of "broken" tests,
|
Description:
This is a very rough draft a script that renders the Automatus tests for a given product. The script will be clean up once the general form is agreed upon.
Rationale:
Make running the Automatus tests possible with out Automatus script.
Review Hints:
export ADDITIONAL_CMAKE_OPTION="-DSSG_BUILT_TESTS_ENABLED:BOOL=ON"
./build_product rhel10