Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OPENSCAP-4118 - Add script to build tests #13029

Merged
merged 25 commits into from
Mar 12, 2025

Conversation

Mab879
Copy link
Member

@Mab879 Mab879 commented Feb 12, 2025

Description:

This is a very rough draft a script that renders the Automatus tests for a given product. The script will be clean up once the general form is agreed upon.

Rationale:

Make running the Automatus tests possible with out Automatus script.

Review Hints:

  1. export ADDITIONAL_CMAKE_OPTION="-DSSG_BUILT_TESTS_ENABLED:BOOL=ON"
  2. ./build_product rhel10

@Mab879 Mab879 added the Infrastructure Our content build system label Feb 12, 2025
@Mab879 Mab879 added this to the 0.1.77 milestone Feb 12, 2025
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Used by openshift-ci bot. label Feb 12, 2025
Copy link

openshift-ci bot commented Feb 12, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@Mab879
Copy link
Member Author

Mab879 commented Feb 14, 2025

Things left to do:

  1. Check on Automatus runs. They seem to be broken.
    Commands like ./automatus.py rule --datastream ../build/ssg-rhel10-ds.xml --libvirt qemu:///system automatus_rhel10 root_permissions_syslibrary_files are failing.

  2. I need to doc how to turn it on.

  3. Possible turn down the jobs number when building many products

  4. Verify the correct tests are being ran

Comment on lines 56 to 58
return (filename.endswith('.pass.sh')
or filename.endswith('.fail.sh') or
filename.endswith('.notapplicable.sh'))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

str.endswith() takes a tuple, so you can probably just

Suggested change
return (filename.endswith('.pass.sh')
or filename.endswith('.fail.sh') or
filename.endswith('.notapplicable.sh'))
return filename.endswith(('.pass.sh', '.fail.sh', '.notapplicable.sh'))

Comment on lines 71 to 75
for i in range(n):
end = start + chunk_size + (1 if i < remainder else 0)
inters.append(iter(items[start:end]))
start = end
return inters
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't mind interleaving the split, you could use python's array slicing "step" property.

Given 3 workers and a total of 10 rules, you can assign rules to them like this:

>>> a = [1,2,3,4,5,6,7,8,9,10]
>>> a[0::3]
[1, 4, 7, 10]
>>> a[1::3]
[2, 5, 8]
>>> a[2::3]
[3, 6, 9]

where the first (start) number is worker index, and the second (step) number is the total number of workers.

This is what we use in Contest to avoid any off-by-one problems and ensure no rule is left behind, while making things just simpler and evenly distributing rule complexity across all workers (as complex/simple rules tend to alphabetically bunch up).

https://github.com/RHSecurityCompliance/contest/blob/1204f812956a855eb15d26b1014cb28e5306ee05/per-rule/test.py#L93-L97

@Mab879 Mab879 marked this pull request as ready for review February 25, 2025 21:57
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Used by openshift-ci bot. label Feb 25, 2025
@Mab879 Mab879 changed the title DRAFT: Add script to build tests Add script to build tests Feb 25, 2025
@Mab879 Mab879 requested review from comps and jan-cerny February 25, 2025 22:00
@Mab879 Mab879 changed the title Add script to build tests OPENSCAP-4118: Add script to build tests Feb 28, 2025
@marcusburghardt marcusburghardt changed the title OPENSCAP-4118: Add script to build tests OPENSCAP-4118 - Add script to build tests Mar 4, 2025
Copy link
Collaborator

@comps comps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some limited comments on the python code (as I am not familiar with the build system overall). Feel free to ignore any of them, as I don't know what is the level of code quality you want to maintain in this project.

Aside from the comments, consider (for an extra pedantic exercise):

  • typing consistency - most of the code seems untyped, but some functions have types despite being _ (a.k.a. internal, not part of API). I'm also not sure if done_file: pathlib.Path = output_path / ".test_done" does anything (with the inlined type)
  • Overall " vs ' consistency, they seem to be intermixed randomly
  • Lack of comments (especially in big functions like _process_rules()
  • Inconsistent indent (ssg.jinja.process_file_with_macros in _process_shared_file() seems to squash two arguments at the end, ssg.environment.open_environment in main() uses a different multi-line function call syntax.
  • f-strings for logger.* calls - note that you don't have to use the prehistoric %s syntax, if you pass only one argument, logging takes it as verbatim, so it can be an f-string like one of those you use for exceptions.

I'll test the functionality tomorrow, looking forward to it. 🙂

Comment on lines 60 to 66
def _get_deny_templated_scenarios(test_config_path):
deny_templated_scenarios = list()
if test_config_path.exists():
test_config = ssg.yaml.open_raw(str(test_config_path.absolute()))
if 'deny_templated_scenarios' in test_config:
deny_templated_scenarios = test_config['deny_templated_scenarios']
return deny_templated_scenarios
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe

Suggested change
def _get_deny_templated_scenarios(test_config_path):
deny_templated_scenarios = list()
if test_config_path.exists():
test_config = ssg.yaml.open_raw(str(test_config_path.absolute()))
if 'deny_templated_scenarios' in test_config:
deny_templated_scenarios = test_config['deny_templated_scenarios']
return deny_templated_scenarios
def _get_deny_templated_scenarios(test_config_path):
if test_config_path.exists():
test_config = ssg.yaml.open_raw(str(test_config_path.absolute()))
if 'deny_templated_scenarios' in test_config:
return set(test_config['deny_templated_scenarios'])
return set()

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .get() method is always shorter and in most of the cases faster:

Suggested change
def _get_deny_templated_scenarios(test_config_path):
deny_templated_scenarios = list()
if test_config_path.exists():
test_config = ssg.yaml.open_raw(str(test_config_path.absolute()))
if 'deny_templated_scenarios' in test_config:
deny_templated_scenarios = test_config['deny_templated_scenarios']
return deny_templated_scenarios
def _get_deny_templated_scenarios(test_config_path):
deny_templated_scenarios = set()
if test_config_path.exists():
test_config = ssg.yaml.open_raw(str(test_config_path.absolute()))
deny_templated_scenarios |= test_config.get('deny_templated_scenarios', [])
return deny_templated_scenarios

Comment on lines 73 to 75
with open(shared_script_path, 'w') as file:
file.write(file_contents)
file.write('\n')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
with open(shared_script_path, 'w') as file:
file.write(file_contents)
file.write('\n')
_write_path(file_contents, shared_script_path)

?

(And in other similar cases.)

Comment on lines 203 to 204
root_path = pathlib.Path(args.root).resolve().absolute()
output_path = pathlib.Path(args.output).resolve().absolute()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.resolve() already makes the path absolute:

$ pydoc3 pathlib.Path.resolve | cat
Help on function resolve in pathlib.Path:

pathlib.Path.resolve = resolve(self, strict=False)
    Make the path absolute, resolving all symlinks on the way and also
    normalizing it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can clean that up.

Comment on lines 236 to 237
if __name__ == "__main__":
raise SystemExit(main())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would honestly just let python interpreter defaults handle exit code - raise an exception if something bad happens, otherwise let it finish cleanly.

If you really want an exception-less exit on resolved_rules_dir not existing (seems like an arbitrary choice?), just do sys.exit(1) in that one case. sys will take care of the exit-related exception handling.

Copy link
Member Author

@Mab879 Mab879 Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can move to sys.exit. However, under the covers this is how sys.exit works.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess either way works. I remember some problems with raising SystemExit() with non-CPython intepreters, but we don't use those here, and those problems are probably fixed by now anyway. I would still leave the 0 exit to be done by default instead of SystemExit(0), but that's probably bikeshedding.

Mab879 added a commit to Mab879/content that referenced this pull request Mar 7, 2025
Mab879 added a commit to Mab879/content that referenced this pull request Mar 7, 2025
@@ -0,0 +1,230 @@
#!/usr/bin/env python3
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like that the PR description says in rationale "Make testing with thin data streams possible." because testing with thin data stream has been possible for many months and people are testing with thin data streams daily. Please reword the rationale.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated it.

@@ -0,0 +1,230 @@
#!/usr/bin/env python3

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This script is slow, it takes more than 2 minutes to build scenarios for RHEL 9. We need to think to speed up.

benchmark_cpes = _get_benchmark_cpes(env_yaml)

built_profiles_root = resolved_rules_dir.parent / "profiles"
rules_in_profiles = list(_get_rules_in_profile(built_profiles_root))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a set because many rules are present in more than 1 profile so you have a lot of duplicates in this list. Converting to a set will automatically eliminate duplicates because each item can be only once in a set.

rule_root = rule_path.parent
rule_id = rule_root.name
if rule_id not in product_rules:
logger.debug("Skipping %s since it is not in the product", rule_id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't have to open the resolved rule yaml file if it is skipped. You can get the rule id from the file name. It will save some time if you don't open resolved rules that aren't going to be processed.

processes = list()
for chunk in range(args.jobs):
process_args = (benchmark_cpes, env_yaml, output_path,
all_resolved_rules[chunk::args.jobs], templates_root, rules_in_profiles,)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would split the rules_in_profiles into chunks instead of all_resolved_rules because the all_resolved_rules contain rules that aren't going to be processed because they are not present in the built content so the subprocesses won't be load balanced. If you split rules_in_profiles to chunks each subprocess would get the same amout of rules to be processed, which would be better load balancing.

file_contents = ssg.jinja.process_file_with_macros(str(test.absolute()),
jinja_dict)
scenario = tests.ssg_test_suite.rule.Scenario(test.name, file_contents)
if scenario.matches_platform(benchmark_cpes):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very inefficient.

The method "matches_platform" loads all CPEs for all platforms. It calls common.matches_platform which calls common._get_platform_cpes which iterates over all matched products and loads all CPEs for these products.

Therefore for many test scenarios we load many CPEs, including CPEs for other products and we repeat the loading for each test scenario. The result is that this script spends most of the time in loading CPEs.

We need to create a different and simplified way for test scenarios platform matching here.

_write_path(file_contents, output_file)
file_contents = test.read_text()
scenario = tests.ssg_test_suite.rule.Scenario(test.name, file_contents)
if scenario.matches_platform(benchmark_cpes):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here we also use the problematic method

logger = logging.getLogger()
for test in rule_tests_root.iterdir(): # type: pathlib.Path
if not test.name.endswith(".sh"):
logger.debug("Skipping file %s in rule %s is it doesn't end with .sh.",
Copy link
Collaborator

@jan-cerny jan-cerny Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will skip files like for example linux_os/guide/auditing/auditd_configure_rules/audit_file_modification/audit_rules_unsuccessful_file_modification/tests/test_audit.rules which are needed for the test scenarios to work.

@jan-cerny jan-cerny self-assigned this Mar 10, 2025
Comment on lines 102 to 104
if not line.startswith('#'):
# The loop is now in the main test content, stop processing the file.
break
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test scenario linux_os/guide/system/accounts/accounts-session/accounts_tmout/tests/multiline_profile_d.fail.sh which is marked as # platform = multi_platform_sle gets build when I build RHEL 9 content. It shouldn't be there, it's for SUSE. The reason seems to be that this test scenario has empty line before the # platform = multi_platform_sle line and therefore it's skipped by this code here.

SSG_ROOT = str(pathlib.Path(__file__).resolve().parent.parent.absolute())
JOB_COUNT = multiprocessing.cpu_count()
T = TypeVar("T")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it's really fast! It takes me less than 3 seconds on my laptop to build RHEL 9 tests. Great improvement!

)
parser.add_argument(
"--resolved-rules-dir", required=True,
help="Directory with <rule-id>.yml resolved rule YAMLs"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing space after YAMLs

_write_path(content, rule_output_path / test.name)


def _process_rules(env_yaml: Dict, output_path: pathlib.Path,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Climate complains about code complexity of this function. Please split it to multiple functions. You can for example extract code related to templated tests to a new function _process_templated_tests which would be consistent with _process_local_tests.

deny_templated_scenarios = _get_deny_templated_scenarios(test_config_path)
for test in template_tests_root.iterdir(): # type: pathlib.Path
if not test.name.endswith(".sh") or test.name in deny_templated_scenarios:
logging.warning("Skipping %s for %s as it is a denied test scenario",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logger

macro(ssg_build_tests PRODUCT)
add_custom_command(
OUTPUT "${CMAKE_BINARY_DIR}/${PRODUCT}/tests/.tests_done"
COMMAND env "PYTHONPATH=$ENV{PYTHONPATH}" "${PYTHON_EXECUTABLE}" "${CMAKE_SOURCE_DIR}/build-scripts/build_tests.py" --build-config-yaml "${CMAKE_BINARY_DIR}/build_config.yml" --resolved-rules-dir "${CMAKE_CURRENT_BINARY_DIR}/rules" --output "${CMAKE_CURRENT_BINARY_DIR}/tests" --product-yaml "${CMAKE_SOURCE_DIR}/products/${PRODUCT}/product.yml"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

works fine for me 👍

Copy link

codeclimate bot commented Mar 10, 2025

Code Climate has analyzed commit fd08c5a and detected 3 issues on this pull request.

Here's the issue category breakdown:

Category Count
Complexity 3

The test coverage on the diff in this pull request is 100.0% (50% is the threshold).

This pull request will bring the total coverage in the repository to 62.0% (0.0% change).

View more on Code Climate.

Copy link
Collaborator

@jan-cerny jan-cerny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now. It works for me as I expect.

@comps please check it if it's good for you

@comps
Copy link
Collaborator

comps commented Mar 11, 2025

Looks good to me, builds (at least on my RHEL-9) as intended.

I don't see any empty rule directories anymore (so I'm assuming rules without tests are not present inside build/rhel9/tests/), non-sh files are also included, no {{ or }} found via greps, it all looks legit.

I already used it to find quite a lot of "broken" tests, .sh scripts that don't have a shebang at the top (presumably Automatus calls them via bash script.pass.sh explicitly).

$ cd build/rhel9/tests/
$ (IFS=$'\n'; for f in $(find . -type f -name '*.sh'); do h=$(head -n1 "$f"); [[ $h != '#!/bin/bash' ]] && echo $f; done) | wc -l
266

@jan-cerny jan-cerny merged commit a3eab6b into ComplianceAsCode:master Mar 12, 2025
110 of 111 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Infrastructure Our content build system
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants