Agent/server communication protocol #22677

Open
4 of 6 tasks
havidarou opened this issue Mar 27, 2024 · 2 comments

havidarou commented Mar 27, 2024

Description

Wazuh's current communication setup is complex and lacks a standardized approach. This complexity causes compatibility issues. Simplifying and standardizing the protocol is crucial for better efficiency and security.

Currently, we have two models of communication:

  • Agent to server: data gathered by the agent flows to the server to be processed. The agent does not expect anything in return. The data is processed and then presented to the users for analysis. We expect this data flow to be almost constant. This communication should be event-driven and should not be connection-oriented, because we intend to scale the system by load balancing per request instead of per connection.
  • Server to agent: used when the user needs to configure or update an agent, or when the server needs the agent to take some action. We expect this data flow to be infrequent, but because the agent must react to some events, we need to be able to contact any agent at any time. This communication is always started by the agent, and it is connection-oriented so we can react to an event as soon as possible.
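To make the contrast between the two models concrete, here is a minimal in-process sketch. It is only an illustration: `Server`, `send_events`, and `CommandChannel` are hypothetical names, round-robin stands in for whatever balancing policy is eventually chosen, and a `queue.Queue` stands in for a real long-lived network connection.

```python
import itertools
import queue


class Server:
    """Stand-in for one server node behind a load balancer."""

    def __init__(self, name):
        self.name = name
        self.received = []

    def handle_event(self, event):
        self.received.append(event)


def send_events(events, servers):
    """Agent to server: every event is an independent request.

    Because no connection state is shared between requests, a balancer
    is free to pick a different server for each one (round-robin here).
    """
    balancer = itertools.cycle(servers)
    for event in events:
        next(balancer).handle_event(event)


class CommandChannel:
    """Server to agent: one long-lived channel opened by the agent.

    The server pushes commands into it whenever it needs to, and the
    agent blocks on receive() so it can react as soon as one arrives.
    """

    def __init__(self):
        self._commands = queue.Queue()

    def push(self, command):
        self._commands.put(command)

    def receive(self, timeout=1.0):
        return self._commands.get(timeout=timeout)
```

The point of the sketch is that the first flow can be spread across servers per request, while the second one ties an agent to whichever server holds its channel.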

These two models are very different, so we must address their design differences carefully. We want these two models to share some capabilities, including:

  • Use the same registration and authorization.
  • Be part of the same API.
  • Share the same balancing and networking infrastructure.
  • Share the same security protocol.

Concepts:

  • Agent comms API: This API is used for communication between agents and servers, and it is the main focus of this issue.
  • Server management API: This is the existing server API, used to manage the server configuration.
  • Agent: Any client that connects to the server via the Agent comms API.
  • Endpoint agent: This is the current Wazuh agent. It is the agent used to protect endpoints.

Registration, authorization, and authentication

All agents must be registered, authorized, and authenticated to connect to the server. This is mandatory for all agents.

The registration process will use the current Server management API. It therefore takes advantage of its RBAC (a specific policy related to agent registration should be created/reviewed).

The authorization and authentication process will use the Agent comms API. We want to use a well-known protocol to implement authorization and authentication using standard libraries and tools. We might consider OAuth 2.0 with JWTs.
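As a rough illustration of what a JWT-based session could look like, the sketch below issues and verifies an HS256-signed token using only the Python standard library. Everything specific here is an assumption made for the example: the function names, the use of a shared secret (HS256) rather than an asymmetric key, the 15-minute default lifetime, and the choice of the agent identifier as the `sub` claim. A real implementation would more likely rely on a maintained JWT library than on hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(agent_id: str, secret: bytes, ttl: int = 900,
                now: Optional[float] = None) -> str:
    """Return a signed token identifying agent_id (hypothetical claim layout)."""
    now = time.time() if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": agent_id, "iat": int(now), "exp": int(now) + ttl}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the claims if the signature and expiry are valid, else raise ValueError."""
    now = time.time() if now is None else now
    header_b64, claims_b64, signature_b64 = token.split(".")
    expected = hmac.new(
        secret, f"{header_b64}.{claims_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(claims_b64))
    if claims["exp"] <= now:
        raise ValueError("token expired")
    return claims
```

Because the token is self-contained, any server behind the balancer that knows the key can validate a request without a shared session store, which fits the per-request balancing goal described earlier.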

Agent comms API

The Agent comms API design should allow us to use third-party infrastructure such as standard load balancers, application firewalls, cache servers, etc., transparently. This means a standard communication protocol is mandatory, including standard security protocols widely adopted by the industry, so that we are compatible with current public key infrastructure deployments.

This API must have a version, and all the endpoints and behaviors must be documented so anyone can build agents and develop tests with it.

Endpoint agent changes

The endpoint agent should restart as few times as possible during its normal operation. For instance, it should not restart when receiving a remote configuration or when connecting to a different server.

It should contain the minimum number of binaries and daemons. Existing binaries and daemons will be removed/reviewed.

The deployment process will change to adopt a standardized approach. Environment variables will no longer be used for deployment, because they are not consistent across OSs.

Functional requirements

Note: The following functional requirements are expressed from the communications protocol and endpoint agent standpoints.

Agent comms API

Communication

  • Support agent-to-server communication with a standard protocol.
  • The agent-to-server communication is asynchronous and event-driven (this type of communication is not connection-oriented). This will allow for event balancing between servers.
  • Support server-to-agent push in near real time with a standard protocol.
  • The server-to-agent communication is connection-oriented. It is mandatory to accept server actions in near real time.
  • The communication protocol must provide back pressure.
  • The communication protocol must provide rate limiting.
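The last two requirements can be sketched with two small building blocks: a token bucket for rate limiting and a bounded inbox for back pressure. This is an illustrative sketch only. The class names, the token-bucket algorithm, and the "reject when full" policy are assumptions for the example, not decisions recorded in this issue.

```python
import queue
import time


class TokenBucket:
    """Rate limiter: allows `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self, cost=1.0):
        now = self._clock()
        # Refill in proportion to the elapsed time, never above capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class BoundedInbox:
    """Back pressure: a full inbox tells the sender to slow down instead of buffering forever."""

    def __init__(self, maxsize):
        self._queue = queue.Queue(maxsize=maxsize)

    def offer(self, event):
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            return False  # caller should retry later

    def take(self):
        return self._queue.get_nowait()
```

Over HTTP, a rejected request of either kind would typically be reported with a status such as 429 (Too Many Requests) or 503 (Service Unavailable), optionally with a `Retry-After` header, so that the agent can back off and retry.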

Security

  • The communication protocol must provide a ciphered connection.
  • Incorporate standard security protocols compatible with current public key infrastructure deployments.

Authorization and authentication

  • Implement a standard protocol for secure authentication and authorization.
  • Use a standard protocol to manage sessions and secure communications between agents and the server.

API design

  • Develop a unified Agent comms API for all communications between agents and the server.
  • Ensure the API supports version control and is fully documented.

Agent deployment

  • Agents must register using the Server management API.
  • Introduce or review RBAC policies for agent registration in the Server management API.
  • Endpoint agents will introduce an Agent CLI used to initialize their configuration, including their registration. This CLI will be standardized across all supported OSs.
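A command-line interface along these lines could replace the per-OS environment variables with one set of explicit flags. The sketch below is purely hypothetical: the program name `agent-cli`, the `register` subcommand, and every flag are invented for illustration, and the real Agent CLI may look quite different.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a hypothetical agent CLI with a single subcommand."""
    parser = argparse.ArgumentParser(
        prog="agent-cli",  # hypothetical program name
        description="Initialize the endpoint agent configuration.",
    )
    commands = parser.add_subparsers(dest="command", required=True)

    register = commands.add_parser(
        "register", help="Register this agent through the Server management API."
    )
    # All flags below are illustrative, not part of any agreed interface.
    register.add_argument("--server-url", required=True,
                          help="Base URL of the Server management API.")
    register.add_argument("--name", required=True,
                          help="Name to register the agent under.")
    register.add_argument("--config", default="agent.conf",
                          help="Where to write the resulting configuration.")
    return parser
```

Since the same flags are parsed the same way on every platform, a deployment script would not need OS-specific branches to pass registration settings.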

Endpoint agent behavior

  • The endpoint agent will not restart when receiving a remote configuration.
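One common way to apply a new configuration without restarting is to swap an immutable snapshot while the process keeps running. The sketch below shows the idea in Python for brevity, even though the endpoint agent itself is planned in C++. `LiveConfig` and its methods are hypothetical names, and the merge-and-swap policy is an assumption for the example.

```python
import threading
from types import MappingProxyType


class LiveConfig:
    """Holds the active configuration and replaces it atomically."""

    def __init__(self, initial: dict):
        self._lock = threading.Lock()
        self._current = MappingProxyType(dict(initial))  # read-only view
        self.version = 0

    def get(self):
        """Return the current read-only snapshot."""
        with self._lock:
            return self._current

    def apply_remote(self, new_values: dict) -> int:
        """Merge remotely received values and publish a new snapshot."""
        if not isinstance(new_values, dict):
            raise TypeError("configuration must be a mapping")
        with self._lock:
            merged = dict(self._current)
            merged.update(new_values)
            self._current = MappingProxyType(merged)
            self.version += 1
            return self.version
```

Modules that already hold an old snapshot keep a consistent view until they next call `get()`, so no module ever observes a half-applied configuration.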

Non-functional requirements

  • Scalability:

    • The system must efficiently scale to handle communications from a large number of agents without degradation in performance, both vertically and horizontally.
  • Interoperability:

    • The protocol should be interoperable with existing systems and future technologies, promoting seamless integration.
  • Security:

    • Implement industry-standard encryption for all data in transit.
    • Ensure all components comply with the latest security standards to minimize vulnerabilities.
  • Reliability:

    • The communication system must ensure high availability and fault tolerance.
    • Implement efficient error handling and recovery mechanisms.
  • Performance:

    • Minimize latency in server-to-agent communications for timely execution of actions.
    • Optimize agent-to-server communications to handle high throughput.
  • Usability:

    • The API should be designed for ease of use, allowing third parties to easily integrate and build upon.
  • Maintainability:

    • The system should be easy to update and maintain without significant downtime.
  • Metrics and statistics:

    • The implementation should provide metrics and statistics about its resources. We should rely on standard libraries to build and report metrics.

Implementation restrictions

  • There is an Agent comms API endpoint per Agent module (FIM, Logcollector, SCA, etc.). Each of these endpoints can be either stateless or stateful.
  • There are two Agent comms API endpoints for bulk requests (stateless bulks and stateful bulks).
  • The Agent comms API client for the Wazuh Endpoint agent will be developed in C++ as a replacement for the current agentd daemon.
  • The Agent comms API server for the Wazuh server will be developed in Python as a replacement for the current remoted daemon.
  • Integration with existing infrastructure like load balancers, application firewalls, and cache servers must be seamless.
  • Communication must accept not only endpoints but other sources as well (clouds, Android, iOS, containers…).
  • Wazuh agents should populate a datetime field for every event sent to the manager. This datetime reflects the time the event was generated/collected on the agent side.
  • Completely remove the current remoted, agentd, and authd daemons, along with their associated binaries.
  • The chosen protocol is HTTP.
  • We must use a standard codec library for message interchanges (protobuf).
  • The implementation must use standardized libraries and tools.
  • The agent-to-manager communication might require a REST design.
  • The manager-to-agent communication might require a WebSocket design. We must use a standard library like gRPC to model the communications.
  • Agents will generate a UUID per installation.
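Two of the restrictions above, the per-installation UUID and the agent-side datetime on every event, can be sketched as follows. This is an assumption-laden illustration: the function names, the file used to persist the identifier, and the field names are all hypothetical, and a plain dictionary stands in for the protobuf message the issue calls for.

```python
import tempfile
import uuid
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def load_or_create_agent_uuid(path: Path) -> str:
    """Return the installation UUID, generating and persisting it on first use."""
    if path.exists():
        return path.read_text(encoding="utf-8").strip()
    new_id = str(uuid.uuid4())
    path.write_text(new_id, encoding="utf-8")
    return new_id


def build_event(agent_uuid: str, module: str, data: dict,
                now: Optional[datetime] = None) -> dict:
    """Wrap module data in an envelope stamped on the agent side.

    Field names are illustrative; a dict stands in for a protobuf message.
    """
    collected_at = now if now is not None else datetime.now(timezone.utc)
    return {
        "agent_uuid": agent_uuid,
        "module": module,
        "timestamp": collected_at.isoformat(),  # time of collection, not of receipt
        "data": data,
    }


# Usage: the identifier survives across calls because it is read back from disk.
with tempfile.TemporaryDirectory() as tmp:
    id_file = Path(tmp) / "agent_uuid"
    first = load_or_create_agent_uuid(id_file)
    second = load_or_create_agent_uuid(id_file)
```

Stamping the event when it is collected means that any delay introduced later by batching, retries, or back pressure does not distort the recorded time.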

Plan

Spike. ETA 06/28/2024

MVP implementation. (6 weeks)

Feature complete implementation. (6 weeks)

  • Implement agent registration information customization.
    • Owner: @wazuh/devel-dashboard
    • Teams involved: @wazuh/devel-dashboard
  • Implement client to server batch processing and compression.
    • Owner: @wazuh/devel-agent
    • Teams involved: @wazuh/devel-agent
  • Implement server to indexer stream batching.
    • Owner: @wazuh/devel-pyserver
    • Teams involved: @wazuh/devel-pyserver
  • Implement client metrics in the server.
    • Owner: @wazuh/devel-pyserver
    • Teams involved: @wazuh/devel-pyserver
  • Implement client and server rate limiting.
    • Owner: @wazuh/devel-pyserver
    • Teams involved: @wazuh/devel-pyserver @wazuh/devel-agent
  • Implement the Endpoint agent CLI.
    • Owner: @wazuh/devel-agent
    • Teams involved: @wazuh/devel-agent
  • Implement server security measures.
    • Owner: @wazuh/devel-pyserver
    • Teams involved: @wazuh/devel-pyserver
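For the client-to-server batch processing and compression item in the list above, the general shape could be the following. It is a sketch under assumptions: the helper names are hypothetical, newline-delimited JSON stands in for the protobuf encoding mentioned in the implementation restrictions, and gzip is just one widely supported compression choice.

```python
import gzip
import json


def make_batches(events, max_events):
    """Group an iterable of events into lists of at most max_events items."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) >= max_events:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def encode_batch(batch) -> bytes:
    """Serialize a batch as newline-delimited JSON and gzip it."""
    lines = "\n".join(json.dumps(event, separators=(",", ":")) for event in batch)
    return gzip.compress(lines.encode("utf-8"))


def decode_batch(payload: bytes) -> list:
    """Inverse of encode_batch, as the receiving side would run it."""
    text = gzip.decompress(payload).decode("utf-8")
    return [json.loads(line) for line in text.splitlines()]
```

Over HTTP, a payload like this would normally be sent with a `Content-Encoding: gzip` header so that the receiver and any intermediaries know how to decode it.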

Integration (? weeks)

  • Adapt QA system tests.
  • Adapt Core integration tests.
  • Adapt Coordinator.
  • Adapt Framework integration tests.
  • Adapt Engine.
  • Adapt DevOps repositories. @wazuh/devel-devops
  • Adapt Cloud service. @wazuh/devel-cloud

Acceptance testing. (2 weeks) @wazuh/devel-qa

  • Benchmarking
  • Limit testing
  • end-to-end testing @wazuh/devel-cloud @wazuh/devel-devops

snaow commented Jun 3, 2024

nice :D

Status: In progress

5 participants