
Fix failed wazuh-db integration tests in 4.9.0 #23201

Open
MarcelKemp opened this issue Apr 30, 2024 · 10 comments · May be fixed by #23334

@MarcelKemp

MarcelKemp commented Apr 30, 2024

| Wazuh version | Component | Install type | Install method | Platform |
|---------------|-----------|------------------|----------------------|------------------|
| 4.9.0 | ITs | Manager/Agent | Packages/Sources | Linux/Windows |

Description

While executing the integration tests already migrated to this repository, we have found several failed checks that need to be fixed.
These checks can be found in the following PR:

The root cause should be determined and the necessary changes implemented so that the ITs pass.

Failed ITs

Integration tests for Wazuh_db

            assert len(result) == int(test_max_files)+1, f'Expected {test_max_files} backup creation messages, but got {result}.'
            files = os.listdir(backups_path)
            total_files = len(files)
>           assert total_files == int(test_max_files), f'Wrong backup file ammount, expected {test_max_files}' \
                                                    f' but {total_files} are present in folder: {files}'
E           AssertionError: Wrong backup file ammount, expected 1 but 3 are present in folder: ['global.db-backup-2024-04-30-14:06:18-journal', 'global.db-backup-2024-04-30-14:06:13.gz', 'global.db-backup-2024-04-30-14:06:18']
E           assert 3 == 1
E            +  where 1 = int('1')

test_wazuh_db/test_backup/test_wdb_backup_configs.py:177: AssertionError
...
=========================== short test summary info ============================
FAILED test_wazuh_db/test_backup/test_wdb_backup_configs.py::test_wdb_backup_configs[Backup Enabled Interval 5s Max_Files 1]
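The directory listing in the assertion above contains, besides the finished `.gz` backup, an uncompressed dump and its transient SQLite `-journal` file, which suggests the test listed the folder while a backup was still being written. As a minimal sketch (the helper name and filename pattern are hypothetical, inferred from the log above, and not part of qa-integration-framework), the count could be restricted to completed backups:

```python
import os
import re

# Hypothetical helper (not part of qa-integration-framework): count only
# completed, compressed backups, ignoring SQLite "-journal" side files
# and in-progress uncompressed dumps. The filename pattern is inferred
# from the failure log above.
COMPLETED_BACKUP = re.compile(r'^global\.db-backup-.*\.gz$')

def count_completed_backups(backups_path):
    return sum(1 for name in os.listdir(backups_path)
               if COMPLETED_BACKUP.match(name))
```

Applied to the listing from the failure above (one `.gz`, one uncompressed dump, one `-journal` file), this counts a single backup.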

Extra

The merge to master has produced a new failed check due to IT failures:

Integration tests for enrollment on Windows - Tier 0 and 1 / run-test (pull_request)

            try:
                # Start socket monitoring
                socket_monitor.start(timeout=10, accumulations=2, callback=lambda received_event: received_event.encode() in event)
    
>               assert socket_monitor.matches == 2
E               assert 0 == 2
E                +  where 0 = <wazuh_testing.tools.monitors.queue_monitor.QueueMonitor object at 0x00000204FB0C2EC0>.matches
=========================== short test summary info ===========================
FAILED test_enrollment/test_agent/test_agent_auth_enrollment.py::test_agent_auth_enrollment[Valid agent address]
FAILED test_enrollment/test_agent/test_agent_auth_enrollment.py::test_agent_auth_enrollment[Valid compressed agent IPv6 address]
=========== 2 failed, 59 passed, 122 warnings in 337.12s (0:05:37) ============
@LucioDonda

LucioDonda commented May 3, 2024

FAILED test_backup/test_wdb_backup_configs.py::test_wdb_backup_configs[Backup Enabled Interval 5s Max_Files 1]

Couldn't replicate it locally:

1st try:

root@kinetic:/home/vagrant/workspace/qa-integration-framework# python3 -m pytest -s -l /home/vagrant/workspace/wazuh/tests/integration/test_wazuh_db/test_backup/test_wdb_backup_configs.py
=============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.10.7, pytest-7.1.2, pluggy-1.0.0+repack
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/vagrant/workspace/wazuh/tests/integration, configfile: pytest.ini
plugins: html-3.1.1, metadata-3.1.1, benchmark-3.2.2
collected 8 items                                                                                                                                                                                                 

../wazuh/tests/integration/test_wazuh_db/test_backup/test_wdb_backup_configs.py 
Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
.Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
.Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
.Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
....Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
.

========================================================================================== 8 passed in 379.03s (0:06:19) ==========================================================================================

2nd try:

root@kinetic:/home/vagrant/workspace/qa-integration-framework# python3 -m pytest -s -l /home/vagrant/workspace/wazuh/tests/integration/test_wazuh_db/test_backup/test_wdb_backup_configs.py
=============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.10.7, pytest-7.1.2, pluggy-1.0.0+repack
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/vagrant/workspace/wazuh/tests/integration, configfile: pytest.ini
plugins: html-3.1.1, metadata-3.1.1, benchmark-3.2.2
collected 8 items                                                                                                                                                                                                 

../wazuh/tests/integration/test_wazuh_db/test_backup/test_wdb_backup_configs.py Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
.Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
.Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
.Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
....Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
.

========================================================================================== 8 passed in 299.93s (0:04:59) ==========================================================================================
root@kinetic:/home/vagrant/workspace/qa-integration-framework# python3 -m pytest -s -l /home/vagrant/workspace/wazuh/tests/integration/test_wazuh_db/test_backup/test_

FAILED test_databases/test_wazuh_db.py::test_wazuh_db_messages_agent[Synchronization value]
FAILED test_databases/test_wazuh_db.py::test_wazuh_db_messages_global[Synchronization value]

1st try:

root@kinetic:/home/vagrant/workspace/qa-integration-framework# python3 -m pytest -s -l /home/vagrant/workspace/wazuh/tests/integration/test_wazuh_db/test_databases/test_wazuh_db.py
=============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.10.7, pytest-7.1.2, pluggy-1.0.0+repack
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/vagrant/workspace/wazuh/tests/integration, configfile: pytest.ini
plugins: html-3.1.1, metadata-3.1.1, benchmark-3.2.2
collected 40 items                                                                                                                                                                                                

../wazuh/tests/integration/test_wazuh_db/test_databases/test_wazuh_db.py 2024/05/03 16:36:24 wazuh-db[53057] debug_op.c:116 at _log_function(): DEBUG: Logging module auto-initialized
2024/05/03 16:36:24 wazuh-db[53057] main.c:125 at main(): DEBUG: Wazuh home directory: /var/ossec
......................................s.

=================================================================================== 39 passed, 1 skipped in 1245.14s (0:20:45) ====================================================================================

2nd try:

root@kinetic:/home/vagrant/workspace/qa-integration-framework# python3 -m pytest -s -l /home/vagrant/workspace/wazuh/tests/integration/test_wazuh_db/test_databases/test_wazuh_db.py
=============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.10.7, pytest-7.1.2, pluggy-1.0.0+repack
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/vagrant/workspace/wazuh/tests/integration, configfile: pytest.ini
plugins: html-3.1.1, metadata-3.1.1, benchmark-3.2.2
collected 40 items                                                                                                                                                                                                

../wazuh/tests/integration/test_wazuh_db/test_databases/test_wazuh_db.py 2024/05/03 17:00:08 wazuh-db[60886] debug_op.c:116 at _log_function(): DEBUG: Logging module auto-initialized
2024/05/03 17:00:08 wazuh-db[60886] main.c:125 at main(): DEBUG: Wazuh home directory: /var/ossec
......................................s.

=================================================================================== 39 passed, 1 skipped in 1327.22s (0:22:07) ====================================================================================

All cases pass on 4.9.0 with the latest changes (last commit 4da7f2b9b4d34b3aec212e7808830e19b83b6314). These conditions may differ from those under which the tests originally failed: checking locally from the rebase commit, both tests failed. Based on that, either these tests are flaky (we should double-check what is causing that and, if so, skip them), or one of the commits in between fixed them.
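If the cases are confirmed to be flaky rather than broken, a common interim measure while the root cause is investigated is to skip them with a reason that points back to this issue. A sketch only; the decorator placement on the real test is assumed, not taken from the repository:

```python
import pytest

# Sketch only: temporarily skip the flaky case with a traceable reason
# so CI stays green while the root cause is investigated. Where the
# pytest-rerunfailures plugin is available, @pytest.mark.flaky(reruns=N)
# is an alternative that still exercises the test.
@pytest.mark.skip(reason="Flaky on CI, tracked in wazuh/wazuh#23201")
def test_wdb_backup_configs():
    ...
```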

Regarding Integration tests for enrollment on Windows - Tier 0 and 1 / run-test (pull_request) I'm still analyzing it.

c.c. @MarcelKemp @vikman90

@LucioDonda LucioDonda linked a pull request May 7, 2024 that will close this issue
@LucioDonda

Rechecking the above comment through GitHub Actions with this PR (some files were modified to force the same ITs as in this issue), the results were:

ITs for wazuhDB:

=========================== short test summary info ============================
FAILED test_wazuh_db/test_backup/test_wdb_backup_configs.py::test_wdb_backup_configs[Backup Enabled Interval 5s Max_Files 1]
============ 1 failed, 127 passed, 1 skipped in 2504.75s (0:41:44) =============

ITs for enrollment on Windows:

=========================== short test summary info ===========================
FAILED test_enrollment/test_agent/test_agent_auth_enrollment.py::test_agent_auth_enrollment[Valid agent address]
=========== 1 failed, 60 passed, 122 warnings in 328.31s (0:05:28) ============

I will continue looking for a change that could have impacted those results.

@nmkoremblum

> Rechecking of the above comment through Github actions, with this PR ( modified some files to force the same ITs as in this issue) the results were:
>
> ITs for wazuhDB:
>
> =========================== short test summary info ============================
> FAILED test_wazuh_db/test_backup/test_wdb_backup_configs.py::test_wdb_backup_configs[Backup Enabled Interval 5s Max_Files 1]
> ============ 1 failed, 127 passed, 1 skipped in 2504.75s (0:41:44) =============
>
> ITs for enrollment on Windows:
>
> =========================== short test summary info ===========================
> FAILED test_enrollment/test_agent/test_agent_auth_enrollment.py::test_agent_auth_enrollment[Valid agent address]
> =========== 1 failed, 60 passed, 122 warnings in 328.31s (0:05:28) ============
>
> Continue looking for a change that could have impacted in those results.

Following the previously mentioned steps, including the modifications introduced in #23334, I could not yet replicate the error for test_wdb_backup_configs in a local environment:

root@30-u22-manager:~# python3 -m pytest -s -l /root/wazuh/tests/integration/test_wazuh_db/test_backup/test_wdb_backup_configs.py
======================================== test session starts =========================================
platform linux -- Python 3.10.12, pytest-7.1.2, pluggy-1.4.0
rootdir: /root/wazuh/tests/integration, configfile: pytest.ini
plugins: metadata-3.1.0, html-3.1.1, testinfra-5.0.0
collected 8 items                                                                                    

wazuh/tests/integration/test_wazuh_db/test_backup/test_wdb_backup_configs.py Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
.Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
.Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
.Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
....Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
.

==================================== 8 passed in 81.69s (0:01:21) ====================================
root@30-u22-manager:~# python3 -m pytest -s -l /root/wazuh/tests/integration/test_wazuh_db/test_backup/test_wdb_backup_configs.py
======================================== test session starts =========================================
platform linux -- Python 3.10.12, pytest-7.1.2, pluggy-1.4.0
rootdir: /root/wazuh/tests/integration, configfile: pytest.ini
plugins: metadata-3.1.0, html-3.1.1, testinfra-5.0.0
collected 8 items                                                                                    

wazuh/tests/integration/test_wazuh_db/test_backup/test_wdb_backup_configs.py Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
.Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
.Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
.Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
....Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
.

==================================== 8 passed in 83.13s (0:01:23) ====================================

@nmkoremblum

Neither was I able to reproduce the errors of test_wazuh_db locally:

root@30-u22-manager:~# python3 -m pytest -s -l /root/wazuh/tests/integration/test_wazuh_db/test_databases/test_wazuh_db.py
======================================== test session starts =========================================
platform linux -- Python 3.10.12, pytest-7.1.2, pluggy-1.4.0
rootdir: /root/wazuh/tests/integration, configfile: pytest.ini
plugins: metadata-3.1.0, html-3.1.1, testinfra-5.0.0
collected 40 items                                                                                   

wazuh/tests/integration/test_wazuh_db/test_databases/test_wazuh_db.py 2024/05/14 17:06:13 wazuh-db[33139] debug_op.c:116 at _log_function(): DEBUG: Logging module auto-initialized
2024/05/14 17:06:13 wazuh-db[33139] main.c:125 at main(): DEBUG: Wazuh home directory: /var/ossec
......................................s.

============================= 39 passed, 1 skipped in 1161.88s (0:19:21) =============================
root@30-u22-manager:~# python3 -m pytest -s -l /root/wazuh/tests/integration/test_wazuh_db/test_databases/test_wazuh_db.py
======================================== test session starts =========================================
platform linux -- Python 3.10.12, pytest-7.1.2, pluggy-1.4.0
rootdir: /root/wazuh/tests/integration, configfile: pytest.ini
plugins: metadata-3.1.0, html-3.1.1, testinfra-5.0.0
collected 40 items                                                                                   

wazuh/tests/integration/test_wazuh_db/test_databases/test_wazuh_db.py 2024/05/14 17:33:18 wazuh-db[35604] debug_op.c:116 at _log_function(): DEBUG: Logging module auto-initialized
2024/05/14 17:33:18 wazuh-db[35604] main.c:125 at main(): DEBUG: Wazuh home directory: /var/ossec
......................................s.

============================= 39 passed, 1 skipped in 1169.89s (0:19:29) =============================

@nmkoremblum

nmkoremblum commented May 15, 2024

I've re-run all the tests through #23334, and there is only one integration test that is still failing, which is the test_wdb_backup_configs.

Test output
Run cd tests/integration
============================= test session starts ==============================
platform linux -- Python 3.10.12, pytest-7.1.2, pluggy-1.5.0
rootdir: /home/runner/work/wazuh/wazuh/tests/integration, configfile: pytest.ini
plugins: html-3.1.1, metadata-3.1.1
collected 129 items

test_wazuh_db/test_backup/test_db_backup.py ...........                  [  8%]
test_wazuh_db/test_backup/test_wdb_backup_configs.py .....F..            [ 14%]
test_wazuh_db/test_configuration/test_wazuhdb_getconfig.py ......        [ 19%]
test_wazuh_db/test_databases/test_agent_database_version.py .            [ 20%]
test_wazuh_db/test_databases/test_wazuh_db.py .......................... [ 40%]
............s.                                                           [ 51%]
test_wazuh_db/test_databases/test_wazuh_db_timeout.py .                  [ 51%]
test_wazuh_db/test_groups/test_get_groups_integrity.py ........          [ 58%]
test_wazuh_db/test_groups/test_set_agent_groups.py ..................... [ 74%]
..........                                                               [ 82%]
test_wazuh_db/test_groups/test_sync_agent_groups_get.py ................ [ 94%]
.......                                                                  [100%]

=================================== FAILURES ===================================
_______ test_wdb_backup_configs[Backup Enabled Interval 5s Max_Files 1] ________

test_configuration = {'metadata': {'description': None, 'enabled': 'yes', 'interval': '5s', 'max_files': '1', ...}, 'sections': [{'elements...check'}, {'attributes': [{'name': 'syscollector'}], 'elements': [{'disabled': {'value': 'yes'}}], 'section': 'wodle'}]}
test_metadata = {'description': None, 'enabled': 'yes', 'interval': '5s', 'max_files': '1', ...}
set_wazuh_configuration = None, truncate_monitored_files = None
remove_backups = None

    @pytest.mark.parametrize('test_configuration, test_metadata', zip(t_configurations, t_config_metadata), ids=t_case_ids)
    def test_wdb_backup_configs(test_configuration, test_metadata, set_wazuh_configuration,
                                truncate_monitored_files, remove_backups):
        '''
        description: Check that given different wdb backup configuration parameters, the expected behavior is achieved.
                     For this, the test gets a series of parameters for the wazuh_db_backups_conf.yaml file and applies
                     them to the manager's ossec.conf. It checks in case of erroneous configurations that the manager was
                     unable to start; otherwise it will check that after creating "max_files+1", there are a total of
                     "max_files" backup files in the backup folder.
                assert wazuh_log_monitor.callback_result, 'Did not receive expected ' \
                                                        '"Invalid value element for interval..." event'
    
                return
            elif not isinstance(test_max_files, numbers.Number) or test_max_files==0:
                wazuh_log_monitor.start(callback=callbacks.generate_callback(patterns.WRONG_MAX_FILES_CALLBACK), timeout=timeout)
                assert wazuh_log_monitor.callback_result, 'Did not receive expected ' \
                                                            '"Invalid value element for max_files..." event'
                return
            else:
                pytest.fail(f"Got unexpected Error: {err}")
    
        # Wait for backup files to be generated
        time.sleep(interval*(int(test_max_files)+1))
    
        # Manage if backup generation is not enabled - no backups expected
        if test_metadata['enabled'] == 'no':
            # Fail the test if a file or more were found in the backups_path
            if os.listdir(backups_path):
                pytest.fail("Error: A file was found in backups_path. No backups where expected when enabled is 'no'.")
        # Manage if backup generation is enabled - one or more backups expected
        else:
            wazuh_log_monitor.start(callback=callbacks.generate_callback(patterns.BACKUP_CREATION_CALLBACK),
                                            timeout=timeout, accumulations=int(test_max_files)+1)
            result = wazuh_log_monitor.callback_result
            assert result, 'Did not receive expected\
                            "Created Global database..." event'
    
    
            assert len(result) == int(test_max_files)+1, f'Expected {test_max_files} backup creation messages, but got {result}.'
            files = os.listdir(backups_path)
            total_files = len(files)
>           assert total_files == int(test_max_files), f'Wrong backup file ammount, expected {test_max_files}' \
                                                    f' but {total_files} are present in folder: {files}'
E           AssertionError: Wrong backup file ammount, expected 1 but 3 are present in folder: ['global.db-backup-2024-05-15-06:07:08', 'global.db-backup-2024-05-15-06:07:08-journal', 'global.db-backup-2024-05-15-06:07:03.gz']
E           assert 3 == 1
E            +  where 1 = int('1')

test_wazuh_db/test_backup/test_wdb_backup_configs.py:177: AssertionError
=========================== short test summary info ============================
FAILED test_wazuh_db/test_backup/test_wdb_backup_configs.py::test_wdb_backup_configs[Backup Enabled Interval 5s Max_Files 1]
============ 1 failed, 127 passed, 1 skipped in 2467.49s (0:41:07) =============
Error: Process completed with exit code 1.

As mentioned here, this test does not fail in a local environment, so further research must be done.

@nmkoremblum

nmkoremblum commented May 15, 2024

I performed several tests, and it seems that other test_backup tests have no influence on the failure of the one under study, as can be seen in the output of this job, where all other tests were suppressed:

Test output
============================= test session starts ==============================
platform linux -- Python 3.10.12, pytest-7.1.2, pluggy-1.5.0
rootdir: /home/runner/work/wazuh/wazuh/tests/integration, configfile: pytest.ini
plugins: html-3.1.1, metadata-3.1.1
collected 1 item

test_wazuh_db/test_backup/test_wdb_backup_configs.py F                   [100%]

=================================== FAILURES ===================================
_______ test_wdb_backup_configs[Backup Enabled Interval 5s Max_Files 1] ________

test_configuration = {'metadata': {'description': None, 'enabled': 'yes', 'interval': '5s', 'max_files': '1', ...}, 'sections': [{'elements...check'}, {'attributes': [{'name': 'syscollector'}], 'elements': [{'disabled': {'value': 'yes'}}], 'section': 'wodle'}]}
test_metadata = {'description': None, 'enabled': 'yes', 'interval': '5s', 'max_files': '1', ...}
set_wazuh_configuration = None, truncate_monitored_files = None
remove_backups = None

    @pytest.mark.parametrize('test_configuration, test_metadata', zip(t_configurations, t_config_metadata), ids=t_case_ids)
    def test_wdb_backup_configs(test_configuration, test_metadata, set_wazuh_configuration,
                                truncate_monitored_files, remove_backups):
        '''
        description: Check that given different wdb backup configuration parameters, the expected behavior is achieved.
                     For this, the test gets a series of parameters for the wazuh_db_backups_conf.yaml file and applies
                     them to the manager's ossec.conf. It checks in case of erroneous configurations that the manager was
                     unable to start; otherwise it will check that after creating "max_files+1", there are a total of
                     "max_files" backup files in the backup folder.
    
        wazuh_min_version: 4.4.0
    
        parameters:
            - test_configuration:
                type: dict
                brief: Configuration loaded from `configuration_templates`.
            - test_metadata:
                type: dict
                brief: Test case metadata.
            - set_wazuh_configuration:
                type: fixture
                brief: Apply changes to the ossec.conf configuration.
            - truncate_monitored_files:
                type: fixture
                brief: Truncate all the log files and json alerts files before and after the test execution.
            - remove_backups:
                type: fixture
                brief: Creates the folder where the backups will be stored in case it doesn't exist. It clears it when the
                       test yields.
        assertions:
            - Verify that manager starts behavior is correct for any given configuration.
            - Verify that the backup file has been created, wait for "max_files+1".
            - Verify that after "max_files+1" files created, there's only "max_files" in the folder.
    
        input_description:
            - Test cases are defined in the parameters and metada variables, that will be applied to the the
              wazuh_db_backup_command.yaml file. The parameters tested are: "enabled", "interval" and "max_files".
              With the given input the test will check the correct behavior of wdb automatic global db backups.
    
        expected_output:
            - f"Invalid value element for interval..."
            - f"Invalid value element for max_files..."
            - f'Did not receive expected "Created Global database..." event'
            - f'Expected {test_max_files} backup creation messages, but got {result}'
            - f'Wrong backup file ammount, expected {test_max_files} but {total_files} are present in folder.
    
        tags:
            - wazuh_db
            - wdb_socket
    
        '''
        test_interval = test_metadata['interval']
        test_max_files = test_metadata['max_files']
        wazuh_log_monitor = file_monitor.FileMonitor(WAZUH_LOG_PATH)
        try:
            control_service('restart')
        except (subprocess.CalledProcessError, ValueError) as err:
            if not validate_interval_format(test_interval):
                wazuh_log_monitor.start(callback=callbacks.generate_callback(patterns.WRONG_INTERVAL_CALLBACK), timeout=timeout)
                assert wazuh_log_monitor.callback_result, 'Did not receive expected ' \
                                                        '"Invalid value element for interval..." event'
    
                return
            elif not isinstance(test_max_files, numbers.Number) or test_max_files==0:
                wazuh_log_monitor.start(callback=callbacks.generate_callback(patterns.WRONG_MAX_FILES_CALLBACK), timeout=timeout)
                assert wazuh_log_monitor.callback_result, 'Did not receive expected ' \
                                                            '"Invalid value element for max_files..." event'
                return
            else:
                pytest.fail(f"Got unexpected Error: {err}")
    
        # Wait for backup files to be generated
        time.sleep(interval*(int(test_max_files)+1))
    
        # Manage if backup generation is not enabled - no backups expected
        if test_metadata['enabled'] == 'no':
            # Fail the test if a file or more were found in the backups_path
            if os.listdir(backups_path):
                pytest.fail("Error: A file was found in backups_path. No backups where expected when enabled is 'no'.")
        # Manage if backup generation is enabled - one or more backups expected
        else:
            wazuh_log_monitor.start(callback=callbacks.generate_callback(patterns.BACKUP_CREATION_CALLBACK),
                                            timeout=timeout, accumulations=int(test_max_files)+1)
            result = wazuh_log_monitor.callback_result
            assert result, 'Did not receive expected\
                            "Created Global database..." event'
    
    
            assert len(result) == int(test_max_files)+1, f'Expected {test_max_files} backup creation messages, but got {result}.'
            files = os.listdir(backups_path)
            total_files = len(files)
>           assert total_files == int(test_max_files), f'Wrong backup file ammount, expected {test_max_files}' \
                                                    f' but {total_files} are present in folder: {files}'
E           AssertionError: Wrong backup file ammount, expected 1 but 3 are present in folder: ['global.db-backup-2024-05-15-17:50:55', 'global.db-backup-2024-05-15-17:50:50.gz', 'global.db-backup-2024-05-15-17:50:55-journal']
E           assert 3 == 1
E            +  where 1 = int('1')

test_wazuh_db/test_backup/test_wdb_backup_configs.py:177: AssertionError
=========================== short test summary info ============================
FAILED test_wazuh_db/test_backup/test_wdb_backup_configs.py::test_wdb_backup_configs[Backup Enabled Interval 5s Max_Files 1]
============================== 1 failed in 30.88s ==============================
Error: Process completed with exit code 1.

@nmkoremblum

After disabling all other integration tests, the test in question succeeds, as demonstrated in the output of this job:

Run cd tests/integration
============================= test session starts ==============================
platform linux -- Python 3.10.12, pytest-7.1.2, pluggy-1.5.0
rootdir: /home/runner/work/wazuh/wazuh/tests/integration, configfile: pytest.ini
plugins: html-3.1.1, metadata-3.1.1
collected 1 items

test_wazuh_db/test_backup/test_wdb_backup_configs.py .                   [100%]

========================= 1 passed in 39.78s (0:00:39) =========================

When running the test in a local environment and monitoring the backup directory, the extra files are not generated by the test itself. This observation is made using the inotifywait tool:

root@30-u22-manager:~# inotifywait -m /var/ossec/backup/db/
Setting up watches.
Watches established.
/var/ossec/backup/db/ OPEN,ISDIR 
/var/ossec/backup/db/ ACCESS,ISDIR 
/var/ossec/backup/db/ DELETE global.db-backup-2024-05-16-18:44:38.gz
/var/ossec/backup/db/ CLOSE_NOWRITE,CLOSE,ISDIR 
/var/ossec/backup/db/ DELETE_SELF 

However, after the test finishes and the Manager is running, we observe the creation of "extra" files by monitoring the same directory:

root@30-u22-manager:~# inotifywait -m /var/ossec/backup/db/
Setting up watches.
Watches established.
/var/ossec/backup/db/ CREATE global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ OPEN global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ CREATE global.db-backup-2024-05-16-18:47:50-journal
/var/ossec/backup/db/ OPEN global.db-backup-2024-05-16-18:47:50-journal
/var/ossec/backup/db/ MODIFY global.db-backup-2024-05-16-18:47:50-journal
/var/ossec/backup/db/ MODIFY global.db-backup-2024-05-16-18:47:50-journal
/var/ossec/backup/db/ OPEN,ISDIR 
/var/ossec/backup/db/ CLOSE_NOWRITE,CLOSE,ISDIR 
/var/ossec/backup/db/ MODIFY global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ MODIFY global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ MODIFY global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ MODIFY global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ MODIFY global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ MODIFY global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ MODIFY global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ MODIFY global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ MODIFY global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ MODIFY global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ MODIFY global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ MODIFY global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ MODIFY global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ MODIFY global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ MODIFY global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ MODIFY global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ MODIFY global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ CLOSE_WRITE,CLOSE global.db-backup-2024-05-16-18:47:50-journal
/var/ossec/backup/db/ DELETE global.db-backup-2024-05-16-18:47:50-journal
/var/ossec/backup/db/ CLOSE_WRITE,CLOSE global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ OPEN global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ CREATE global.db-backup-2024-05-16-18:47:50.gz
/var/ossec/backup/db/ OPEN global.db-backup-2024-05-16-18:47:50.gz
/var/ossec/backup/db/ ACCESS global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ ACCESS global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ ACCESS global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ ACCESS global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ ACCESS global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ ACCESS global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ ACCESS global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ ACCESS global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ ACCESS global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ ACCESS global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ CLOSE_NOWRITE,CLOSE global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ MODIFY global.db-backup-2024-05-16-18:47:50.gz
/var/ossec/backup/db/ CLOSE_WRITE,CLOSE global.db-backup-2024-05-16-18:47:50.gz
/var/ossec/backup/db/ DELETE global.db-backup-2024-05-16-18:47:50
/var/ossec/backup/db/ OPEN,ISDIR 
/var/ossec/backup/db/ ACCESS,ISDIR 
/var/ossec/backup/db/ CLOSE_NOWRITE,CLOSE,ISDIR 
/var/ossec/backup/db/ OPEN,ISDIR 
/var/ossec/backup/db/ ACCESS,ISDIR 
/var/ossec/backup/db/ ACCESS,ISDIR 
/var/ossec/backup/db/ CLOSE_NOWRITE,CLOSE,ISDIR 
/var/ossec/backup/db/ DELETE global.db-backup-2024-05-16-18:47:45.gz

These operations are repeated approximately every 5 seconds.

So, another test is likely affecting the results, leaving unexpected files in the backup directory. Alternatively, the variations in machine performance might cause timing issues, such as the "sleep" in the test delaying the execution of assertions.
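One way to make the file-count assertion robust against catching the directory mid-backup is to count only completed backups. This is a minimal sketch (the helper name and the assumption that only `.gz` files represent finished backups are mine, not from the test suite):

```python
import os

def count_completed_backups(backups_path: str) -> int:
    """Count only finished backups. While a backup is in progress, the
    directory transiently contains the uncompressed copy and a SQLite
    -journal file (visible in the inotifywait trace above); only the
    compressed .gz file represents a completed backup."""
    return sum(1 for name in os.listdir(backups_path)
               if name.startswith('global.db-backup-') and name.endswith('.gz'))
```

With this filter, the three files listed in the failing run (the uncompressed backup, its journal, and one `.gz`) would count as a single completed backup.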

@LucioDonda

Some points on the analysis:

Issue in the test: the interval variable is not being applied; it needs something like:

root@kinetic:/home/vagrant/workspace/wazuh# git diff
diff --git a/tests/integration/test_wazuh_db/test_backup/test_wdb_backup_configs.py b/tests/integration/test_wazuh_db/test_backup/test_wdb_backup_configs.py
index 3742b4e2d9..dc0556fd5f 100644
--- a/tests/integration/test_wazuh_db/test_backup/test_wdb_backup_configs.py
+++ b/tests/integration/test_wazuh_db/test_backup/test_wdb_backup_configs.py
@@ -61,7 +61,7 @@ from wazuh_testing.utils.services import control_service
 from wazuh_testing.tools.monitors import file_monitor
 from wazuh_testing.utils import callbacks
 from wazuh_testing.constants.paths.logs import WAZUH_PATH, WAZUH_LOG_PATH
-from wazuh_testing.utils.time import validate_interval_format
+from wazuh_testing.utils.time import validate_interval_format, time_to_seconds
 from wazuh_testing.modules.wazuh_db import patterns
 from wazuh_testing.utils import configuration
 
@@ -154,6 +154,8 @@ def test_wdb_backup_configs(test_configuration, test_metadata, set_wazuh_configu
         else:
             pytest.fail(f"Got unexpected Error: {err}")
 
+    interval = time_to_seconds(test_interval)
+
     # Wait for backup files to be generated
     time.sleep(interval*(int(test_max_files)+1))

Besides that, this value is overridden elsewhere in the code, so the change alone makes no difference.
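For reference, `time_to_seconds` only needs to turn interval strings like '5s' into an integer number of seconds. A minimal sketch of such a helper (an assumption on my part — the real one lives in `wazuh_testing.utils.time` and may differ):

```python
def time_to_seconds(interval: str) -> int:
    """Convert '5s', '2m', '1h' or '3d' into seconds; a bare number is
    treated as seconds. Sketch only; see wazuh_testing.utils.time for
    the framework's actual implementation."""
    units = {'s': 1, 'm': 60, 'h': 3600, 'd': 86400}
    if interval[-1].isdigit():
        return int(interval)
    return int(interval[:-1]) * units[interval[-1]]
```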

Varying the time value does not change the results:

  • From none to 10 seconds (this amount, multiplied by max_files, is the number of seconds the test sleeps before checking the logs), the result is the same. The original hypothesis was that timing variations between a local environment and Jenkins were to blame:
0 seconds

# python3 -m pytest -s -l /home/vagrant/workspace/wazuh/tests/integration/test_wazuh_db/test_backup/test_wdb_backup_configs.py  
=============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.10.7, pytest-7.1.2, pluggy-1.0.0+repack
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/vagrant/workspace/wazuh/tests/integration, configfile: pytest.ini
plugins: html-3.1.1, testinfra-5.0.0, metadata-3.1.1, benchmark-3.2.2
collected 8 items                                                                                                                                                                                                 

../wazuh/tests/integration/test_wazuh_db/test_backup/test_wdb_backup_configs.py Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
.Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
.Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
.Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
....Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
.

=============================================================================================== 8 passed in 44.65s ================================================================================================

5 seconds - default

root@kinetic:/home/vagrant/workspace/qa-integration-framework# python3 -m pytest -s -l /home/vagrant/workspace/wazuh/tests/integration/test_wazuh_db/test_backup/test_wdb_backup_configs.py  
=============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.10.7, pytest-7.1.2, pluggy-1.0.0+repack
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/vagrant/workspace/wazuh/tests/integration, configfile: pytest.ini
plugins: html-3.1.1, testinfra-5.0.0, metadata-3.1.1, benchmark-3.2.2
collected 8 items                                                                                                                                                                                                 

../wazuh/tests/integration/test_wazuh_db/test_backup/test_wdb_backup_configs.py Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
.Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
.Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
.Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
....Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
.

========================================================================================== 8 passed in 76.21s (0:01:16) ===========================================================================================

10 seconds

# python3 -m pytest -s -l /home/vagrant/workspace/wazuh/tests/integration/test_wazuh_db/test_backup/test_wdb_backup_configs.py  
=============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.10.7, pytest-7.1.2, pluggy-1.0.0+repack
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/vagrant/workspace/wazuh/tests/integration, configfile: pytest.ini
plugins: html-3.1.1, testinfra-5.0.0, metadata-3.1.1, benchmark-3.2.2
collected 8 items                                                                                                                                                                                                 

../wazuh/tests/integration/test_wazuh_db/test_backup/test_wdb_backup_configs.py Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
.Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
.Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
.Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
....Job for wazuh-manager.service failed because the control process exited with error code.
See "systemctl status wazuh-manager.service" and "journalctl -xeu wazuh-manager.service" for details.
.

========================================================================================== 8 passed in 441.50s (0:07:21) ==========================================================================================

Is this sleep time necessary? Is it tied to some variable?

Max files:

One of the test arguments is max_files, but it is checked with an equality comparator; shouldn't we use a less-than-or-equal comparison instead?
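The relaxed check suggested above could look like this (a sketch; the variable names follow the test, and whether under-rotation should be tolerated at check time is a judgment call):

```python
def check_backup_count(total_files: int, test_max_files: str) -> None:
    """Relaxed version of the assertion: accept any non-zero number of
    backups up to max_files instead of demanding an exact match, since
    rotation may not have pruned older files yet at check time."""
    assert 0 < total_files <= int(test_max_files), (
        f'Wrong backup file amount: expected at most {test_max_files}, '
        f'found {total_files}')
```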

Time to back up a DB.

Measured through the wdb-query script, a backup takes an average of 35 milliseconds per DB:

time for i in {1..1000}; do echo 'global backup create' | wazuh-tools/utils/wdb-query.py ; done

run    real        user        sys
1      0m35.121s   0m23.918s   0m6.898s
2      0m35.778s   0m24.250s   0m7.021s
3      0m37.168s   0m25.361s   0m7.061s

cc @cborla

@LucioDonda

LucioDonda commented May 20, 2024

Over the last day I've repeated the tests in different ways, varying the time value and adding some additional messages, but sadly I wasn't able to replicate the error locally. Some remarks:

  • The backups are generated after the time defined as INTERVAL in the test, but they are checked only after INTERVAL * (max_files + 1) seconds. That leaves us with a repetitive scenario in which the check instant is a multiple of the generation period, so another backup can be generated exactly while the last one is being checked.
  • Based on that, I've modified the test and am waiting for results here
  • In the same PR I've pushed a Windows change in order to force its execution for a recheck.
  • There are some log errors that appear at the start of 4.9.0 (only on the first run):
    2024/05/20 17:08:14 wazuh-modulesd:vulnerability-scanner: ERROR: VulnerabilityScannerFacade::initEventDispatcher: [json.exception.type_error.302] type must be string, but is object
    This can cause a delay in the first test, but it is unlikely to affect the results.
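Regarding whether the fixed sleep is necessary at all: polling the directory until the expected number of completed backups appears would decouple the check from machine speed. A sketch, assuming only `.gz` files count as finished backups (the helper name is hypothetical):

```python
import os
import time

def wait_for_backup_count(backups_path, expected, interval_s, timeout=None):
    """Poll the backup directory instead of sleeping a fixed multiple of
    the interval; return the completed (.gz) backups once `expected` of
    them exist, or raise TimeoutError."""
    if timeout is None:
        timeout = interval_s * (expected + 2)
    deadline = time.monotonic() + timeout
    done = []
    while time.monotonic() < deadline:
        done = [f for f in os.listdir(backups_path) if f.endswith('.gz')]
        if len(done) >= expected:
            return done
        time.sleep(0.25)
    raise TimeoutError(f'Expected {expected} backups in {backups_path}, '
                       f'found {len(done)}')
```

This also bounds the wait: slow machines get up to the timeout, while fast runs return as soon as the backups exist.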

@cborla cborla linked a pull request May 21, 2024 that will close this issue
@cborla

cborla commented May 21, 2024

Backup Test Failure Report

Description

This test exercises the backup process with an interval of 5 seconds and a maximum of 3 retained files. The failure occurs because the number of backups found does not match the expected number at specific instants: the sleep used before validating is an exact multiple of the backup period. The backup creation and deletion cycle repeats every 5 seconds, maintaining a maximum of 3 backups at any given time.

Test Configuration

  • Name: Backup Enabled Interval 5s Max_Files 3

  • Description:

  • Configuration Parameters:

    • ENABLED: 'yes'
    • INTERVAL: '5s'
    • MAX_FILES: '3'
  • Metadata:

    • enabled: 'yes'
    • interval: '5s'
    • max_files: '3'

Calculation for Sleep Interval

The sleep interval is calculated as time.sleep(interval * (int(test_max_files) + 1)) after restarting the manager. For this configuration:

  • Sleep Interval Calculation:
    • sleep = 5s * (3 + 1) = 20 seconds

This sleep is an exact multiple of the period at which new backups are generated.

Backup Generation Process

Here is a detailed breakdown of the backup generation process:

  1. At Restart:

    • Generates 1 backup, total = 1.
  2. At 5 Seconds:

    • Generates 1 backup, total = 2.
  3. At 10 Seconds:

    • Generates 1 backup, total = 3.
  4. At 15 Seconds:

    • Generates 1 backup and deletes 1 old backup, total = 3.
  5. At 20 Seconds:

    • Generates 1 backup and deletes 1 old backup, total = 3.

Detailed Process at the 20-second Mark

At the 20-second mark, the backup manager performs the following steps:

  1. Creates backup file.
  2. Creates backup-journal file.
  3. Copies data to backup-journal.
  4. Copies data to backup.
  5. Closes and deletes backup-journal.
  6. Closes backup.
  7. Compresses and creates backup.gz.
  8. Deletes the backup file.
  9. Deletes previous backup.gz files, retaining only the most recent one.
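The timeline above can be checked with a small simulation of the schedule (a sketch, not the actual wazuh-db rotation code): backups are created at t = 0, interval, 2*interval, ..., old ones are pruned down to max_files, and the test checks at t = interval * (max_files + 1) — exactly a creation instant, when the transient uncompressed file and its journal may still be on disk.

```python
def backup_state_at_check(interval: int, max_files: int, check_at: int):
    """Simulate the rotation timeline described above. Returns the number
    of retained backups at check time and whether the check instant
    coincides with a backup-creation instant (the race window where the
    transient uncompressed file and -journal can still exist)."""
    creations = list(range(0, check_at + 1, interval))
    retained = min(len(creations), max_files)
    return retained, (check_at % interval == 0)
```

For the configuration in this report (interval 5s, max_files 3, check at 20s) this gives 3 retained backups with the check landing exactly on a creation instant.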

Updated Test Configuration

  • Name: Backup Enabled Interval 4s Max_Files 3

  • Description:

  • Configuration Parameters:

    • ENABLED: 'yes'
      • Backup enabled.
    • INTERVAL: '4s'
      • The manager generates a backup every 4 seconds.
    • MAX_FILES: '3'
      • Stores up to 3 backups in the directory.
  • Metadata:

    • enabled: 'yes'
      • Test enabled.
    • interval: '4s'
      • Time interval used to determine when to validate.
    • max_files: '3'
      • Expected number of files to be found.

Calculation for Updated Sleep Interval

With the new configuration:

  • Sleep Interval Calculation:
    • sleep = 4s * (3 + 1) = 16 seconds

Updated Backup Generation Process

  1. At Restart:

    • Generates 1 backup, total = 1.
  2. At 4 Seconds:

    • Generates 1 backup, total = 2.
  3. At 8 Seconds:

    • Generates 1 backup, total = 3.
  4. At 12 Seconds:

    • Generates 1 backup and deletes 1 old backup, total = 3.
  5. At 16 Seconds:

    • Generates 1 backup and deletes 1 old backup, total = 3.

Advantages of the New Configuration

  • Avoids synchronization issues: The sleep interval (16s) is no longer a multiple of the backup generation time (5s).

The proposed change to a 4-second interval aims to resolve the issue of having a sleep interval that is a multiple of the backup generation time. This should enhance the reliability and consistency of the backup process during tests.

Projects
Status: In progress
5 participants