Errors in Github Action Tests #873

Closed
vcerenu opened this issue Jan 10, 2023 · 9 comments · Fixed by #881

@vcerenu
Member

vcerenu commented Jan 10, 2023

In the 4.4 branch, the tests that run when a pull request is created fail with the following error:

(image attached in the original issue)

This needs to be investigated, since the error occurs when trying to start the Wazuh manager service.

@vcerenu vcerenu self-assigned this Jan 10, 2023
@vcerenu
Member Author

vcerenu commented Jan 11, 2023

Errors occur when managing services inside the Docker containers where the tests run.

   TASK [../../roles/wazuh/ansible-wazuh-manager : Ensure Wazuh Manager service is started and enabled.] ***
   task path: /home/runner/work/wazuh-ansible/wazuh-ansible/roles/wazuh/ansible-wazuh-manager/tasks/main.yml:317
   fatal: [wazuh_manager_centos7]: FAILED! => {"changed": false, "cmd": "/usr/bin/systemctl", "msg": "Failed to get D-Bus connection: No such file or directory", "rc": 1, "stderr": "Failed to get D-Bus connection: No such file or directory\n", "stderr_lines": ["Failed to get D-Bus connection: No such file or directory"], "stdout": "", "stdout_lines": []}
   fatal: [wazuh_manager_debian9]: FAILED! => {"changed": false, "cmd": "/bin/systemctl", "msg": "Failed to connect to bus: No such file or directory", "rc": 1, "stderr": "Failed to connect to bus: No such file or directory\n", "stderr_lines": ["Failed to connect to bus: No such file or directory"], "stdout": "", "stdout_lines": []}

I reviewed reports from other projects that use the same Docker images and hit the same errors. At first I found nothing useful, but we do need to manage services in these containers, because that is how the complete Wazuh stack we install gets started. After a long search I found a blog post by the creator of the images:
https://www.jeffgeerling.com/blog/2022/docker-and-systemd-getting-rid-dreaded-failed-connect-bus-error
In this post he suggests adding some parameters because of Docker updates. These changes fixed the Debian test, but the CentOS test still failed.

As a last resort, we considered building our own CentOS image for the tests. As a first attempt, I created a Dockerfile from the base image we already use and added a systemctl replacement script taken from the repository of a user who had the same problem:
https://github.com/gdraheim/docker-systemctl-images/blob/master/files/docker/systemctl.py

A first test with the new image showed that it responds to the systemctl command, but a different error appeared:

fatal: [wazuh_manager_centos7]: FAILED! => {"changed": false, "msg": "Unable to start service filebeat: "}

The investigation continues with this image.

@vcerenu
Member Author

vcerenu commented Jan 12, 2023

The systemctl script I found works, but with some problems, so I am still looking for a better version of it and reviewing additional parameters that might let us use systemd without issues.

The necessary tools could be installed, but when the indexer address was rendered into the Filebeat template, the following result was obtained:

# Send events directly to Wazuh indexer
output.elasticsearch:
  hosts:
  - i
  - n
  - d
  - e
  - x
  - e
  - r
  - _
  - c
  - e
  - n
  - t
  - o
  - s
  - 7
  - :
  - 9
  - 2
  - 0
  - 0

The investigation continues

@teddytpc1
Member

Some changes were made to the converge playbook of the default scenario to fix the filebeat.yml format. The variable filebeat_output_indexer_hosts was being interpreted as the list itself rather than as a single item in a list, so the template iterated over the string character by character and corrupted filebeat.yml.

Also, the runner image was changed from ubuntu-latest to ubuntu-20.04 and that solved an issue when trying to restart the Filebeat service.
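The filebeat.yml corruption reported earlier in this thread can be reproduced outside Ansible. The following is only a minimal sketch in plain Python, not the actual Jinja2 template from the role: `render_hosts` and `normalize_hosts` are hypothetical helpers introduced for illustration, and the host value is the one visible in the broken output.

```python
def render_hosts(hosts):
    """Mimic a template loop that emits one YAML list entry per element."""
    return "\n".join(f"  - {host}" for host in hosts)


def normalize_hosts(value):
    """Wrap a bare string in a list so it is treated as a single host."""
    return [value] if isinstance(value, str) else list(value)


# A bare string is iterated character by character, one entry per character.
broken = render_hosts("indexer_centos7:9200")

# A one-element list yields a single entry with the whole address.
fixed = render_hosts(normalize_hosts("indexer_centos7:9200"))

print(broken.splitlines()[:3])
print(fixed)
```

Passing the value as a list (or normalizing it before the loop) avoids the problem regardless of how the variable was defined in the playbook.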

@teddytpc1
Member

We have executed several tests with @vcerenu and apparently we have found a solution.
We are still testing.

Changes:

  • Removed the command for the CentOS images in the molecule test.
  • Changed the systemctl executable.
  • Modified the GitHub runner image from ubuntu-latest to ubuntu-20.04

@teddytpc1
Member

The procedure was tested with the automatic GH Actions tests and also with the Demo deployment. Both succeeded.

@vcerenu
Member Author

vcerenu commented Jan 20, 2023

Distributed test keeps failing.

The parameter cgroupns_mode: host was added, but the systemctl command still fails with the following error:

Failed to get D-Bus connection: Operation not permitted

Adding the parameter command: /usr/sbin/init, which starts systemd among other tasks, produces the following error:

Failed to get D-Bus connection: No such file or directory

I verified that there are no files inside the /run directory, which is why the command fails.
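A check like the one described above could be scripted inside the container. This is a rough sketch under assumptions: `systemd_runtime_present` and `bus_sockets` are hypothetical helper names, and the socket paths are the conventional locations rather than anything confirmed for these images. The `/run/systemd/system` directory is the marker that systemd's `sd_booted()` looks for.

```python
import os
import tempfile


def systemd_runtime_present(run_dir="/run"):
    """Return True if <run_dir>/systemd/system exists (the sd_booted() marker)."""
    return os.path.isdir(os.path.join(run_dir, "systemd", "system"))


def bus_sockets(run_dir="/run"):
    """Report which of the sockets systemctl conventionally uses are present."""
    candidates = ["systemd/private", "dbus/system_bus_socket"]
    return {name: os.path.exists(os.path.join(run_dir, name)) for name in candidates}


# Demonstration against an empty directory, mirroring an empty /run.
with tempfile.TemporaryDirectory() as empty_run:
    empty_result = systemd_runtime_present(empty_run)
    empty_sockets = bus_sockets(empty_run)

print(empty_result, empty_sockets)
```

Calling both functions with the default `/run` inside a test container would show whether systemd actually came up before any service task runs.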

I reviewed other images based on amazonlinux and fedora, but they all have the same problem.

A fix was found in a repository (https://github.com/gdraheim/docker-systemctl-images) that replaces the systemctl binary so that it works correctly. A new image was created on top of the image we used previously, with the following Dockerfile:

FROM geerlingguy/docker-centos7-ansible

# Replace the stock systemctl with the systemctl.py script
COPY files/docker/systemctl.py /usr/bin/systemctl

# Create a world-writable /run/wazuh-indexer directory
RUN mkdir /run/wazuh-indexer && chmod 777 /run/wazuh-indexer
VOLUME [ "/sys/fs/cgroup" ]
CMD ["/usr/bin/systemctl"]

With this image we were able to start the service, but for no apparent reason its status is not reported, so it is impossible to know whether the service is running correctly.

@teddytpc1 teddytpc1 self-assigned this Jan 24, 2023
@teddytpc1
Member

We decided to try to configure an EC2 instance to execute the tests instead of Docker containers.
I found this link that explains how to deploy a GH self-hosted runner using GH Actions.
I ran into several issues:

  • Misconfiguration of GH Access token.
  • Issues when configuring the Runner to connect to GH. Tried several configurations, checked the SG configuration and then tried running installdependencies.sh and generating a new AMI (link). This solved the issue and the workflow finished successfully.

Now I need to develop the test using that instance. Once this is working, we will need to define what tests and Distros we will use.
We will need to create an AMI for each OS to execute the tests.

@teddytpc1
Member

Tasks performed:

  • Added the playbooks for Wazuh Server and AIO single instance installations.
  • Added the steps in the workflow to run the playbook.
  • Added variables to set Ansible color in the logs.
  • Tested and corrected both playbooks.
  • Modified the instance type for AIO test.
  • Created a new AMI with more disk space.
  • Adapted step names.
  • Unified both tests in a single workflow to optimize resources.
  • Added tags to the EC2 instances.
  • Rearranged EC2 creation steps into a new YAML.
  • Moved the CentOS AMI to a new region.
  • Created the Ubuntu 22.04 AMI.
  • Updated the secrets.

@teddytpc1 teddytpc1 linked a pull request Jan 25, 2023 that will close this issue
@teddytpc1
Member

The EC2 AL2 AMI was created along with the workflows for that OS and tested. It works as expected.
