Agent/server communication protocol #22677

havidarou · 2024-03-27T11:27:44Z

Description

Wazuh's current communication setup is complex and lacks a standardized approach. This complexity causes compatibility issues. Simplifying and standardizing the protocol is crucial for better efficiency and security.

Currently, we have two models of communication:

Agent to server: data gathered by the agent flows to the server to be processed. The agent does not expect anything in return. The data is processed and then presented to the users for analysis. We expect this data flow to be almost constant. This communication should be event-driven, and it shouldn't be connection-oriented, as we intend to scale the system by load-balancing per request instead of per connection.
Server to agent: when the user needs to configure/update an agent, or the server needs the agent to take some action. We expect this data flow to be scarce, but because we need the agent to react to some events, we need to be able to contact any agent at any time. This communication is always started by the agent and it is connection-oriented so we can react as soon as possible to an event.

These two models are very different, so we must address their design differences carefully. We want these two models to share some capabilities, including:

Use the same registration and authorization.
Be part of the same API.
Share the same balancing and networking infrastructure.
Share the same security protocol.

Concepts:

Agent comms API. This API is used for communication between agents and servers, and it is the main focus of this issue.
Server management API. This is the existing server API, used to manage the server configuration.
Agent: Any client that connects to the server via the Agent comms API.
Endpoint agent: This is the current Wazuh agent. It is the agent used to protect endpoints.

Registration, authorization, and authentication

All agents must be registered, authorized, and authenticated to connect to the server. This is mandatory for all agents.

The registration process will use the current Server management API. It therefore takes advantage of its RBAC (a specific policy related to agent registration should be created/reviewed).

The authorization and authentication process will use the Agent comms API. We want to use a well-known protocol to implement authorization and authentication using standard libraries and tools. We might consider OAuth 2.0 with JWTs.
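To make the JWT idea concrete, here is a minimal sketch of issuing and verifying an HS256-signed token with only the Python standard library. It is an illustration, not the design chosen in this issue: the function names, the shared-secret scheme, and the choice of claims are assumptions, the OAuth 2.0 flow itself is not shown, and a real implementation would rely on a maintained JWT library rather than hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that was stripped on encoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(agent_uuid: str, secret: bytes, ttl_seconds: int,
                now: Optional[float] = None) -> str:
    """Hypothetical helper: build a signed token identifying one agent."""
    issued_at = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": agent_uuid, "iat": issued_at, "exp": issued_at + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Hypothetical helper: return the claims, or raise ValueError if invalid."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, claims_b64, signature_b64 = parts
    expected = _sign(f"{header_b64}.{claims_b64}", secret)
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unsupported algorithm")
    claims = json.loads(_b64url_decode(claims_b64))
    current = time.time() if now is None else now
    if current >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

With this shape, the server would verify the token on every request and read the agent identity from the `sub` claim, which keeps the agent-to-server path stateless and friendly to per-request load balancing.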

Agent comms API

The Agent comms API design should allow us to transparently use third-party infrastructure such as standard load balancers, application firewalls, and cache servers. This means a standard communication protocol is mandatory, including standard security protocols widely adopted by the industry, so we are compatible with current public key infrastructure deployments.

This API must have a version, and all the endpoints and behaviors must be documented so anyone can build agents and develop tests with it.

Endpoint agent changes

It should restart as few times as possible during its normal operation. For instance, it should not restart when receiving a remote configuration or when connecting to a different server.

It should contain the minimum number of binaries and daemons. Existing binaries and daemons will be removed/reviewed.

The deployment process will change to adopt a standardized approach. Environment variables will no longer be used for deployment (they are not consistent across OSs).

Functional requirements

Note: The following functional requirements are expressed from the communications protocol and endpoint agent standpoints.

Agent comms API

Communication

Support agent-to-server communication with a standard protocol.
The agent-to-server communication is asynchronous and event-driven (this type of communication is not connection-oriented). This will allow for event balancing between servers.
Support server-to-agent push in near real time with a standard protocol.
The server-to-agent communication is connection-oriented. It is mandatory to accept server actions in near real time.
The communication protocol must provide back pressure.
The communication protocol must provide rate limiting.
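As an illustration of the rate-limiting requirement, the following is a minimal token-bucket sketch. It is one common technique, not necessarily the one this issue will adopt; the class name, parameters, and the idea of returning a retry delay are assumptions. Returning the delay lets the server reject a request with a hint such as an HTTP 429 response, which is also one simple way to signal back pressure to the agent.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical per-agent limiter: `rate_per_second` tokens are added
    continuously, up to `capacity`, and each request spends `cost` tokens."""

    def __init__(self, rate_per_second: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_second
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested deterministically
        self.tokens = capacity
        self.updated_at = clock()

    def try_acquire(self, cost: float = 1.0) -> float:
        """Return 0.0 if the request is accepted, otherwise the number of
        seconds the caller should wait before retrying."""
        now = self.clock()
        # Refill according to the time elapsed since the last call.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated_at) * self.rate)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return 0.0
        return (cost - self.tokens) / self.rate
```

A server handler could keep one bucket per agent and, when `try_acquire()` returns a positive value, answer with that value as the retry delay instead of processing the events.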

Security

The communication protocol must provide a ciphered connection.
Incorporate standard security protocols compatible with current public key infrastructure deployments.

Authorization and authentication

Implement a standard protocol for secure authentication and authorization.
Use a standard protocol to manage sessions and secure communications between agents and the server.

API design

Develop a unified Agent comms API for all communications between agents and the server.
Ensure the API supports version control and is fully documented.

Agent deployment

Agents must register using the Server management API.
Introduce or review RBAC policies for agent registration in the Server management API.
Endpoint agents will introduce an Agent CLI used to initialize their configuration, including their registration. This CLI will be standardized for all compatible OS.

Endpoint agent behavior

The endpoint agent will not restart when receiving a remote configuration.

Non-functional requirements

Scalability:
- The system must efficiently scale to handle communications from a large number of agents without degradation in performance, both vertically and horizontally.
Interoperability:
- The protocol should be interoperable with existing systems and future technologies, promoting seamless integration.
Security:
- Implement industry-standard encryption for all data in transit.
- Ensure all components comply with the latest security standards to minimize vulnerabilities.
Reliability:
- The communication system must ensure high availability and fault tolerance.
- Implement efficient error handling and recovery mechanisms.
Performance:
- Minimize latency in server-to-agent communications for timely execution of actions.
- Optimize agent-to-server communications to handle high throughput.
Usability:
- The API should be designed for ease of use, allowing third parties to easily integrate and build upon.
Maintainability:
- The system should be easy to update and maintain without significant downtime.
Metrics and statistics:
- The implementation should provide metrics and statistics about its resources. We should rely on standard libraries to build and report metrics.

Implementation restrictions

There is one Agent comms API endpoint per Agent module (FIM, Logcollector, SCA, etc.). Each of these endpoints can be either stateless or stateful.
There are two Agent comms API endpoints for bulk requests (stateless bulks and stateful bulks).
The Agent comms API client for the Wazuh Endpoint agent will be developed in C++ as a replacement for the current agentd daemon.
The Agent comms API server for the Wazuh server will be developed in Python as a replacement for the current remoted daemon.
Integration with existing infrastructure like load balancers, application firewalls, and cache servers must be seamless.
Communication must accept not only endpoints but other sources as well (clouds, Android, iOS, containers…).
Wazuh agents should populate a datetime field for every event sent to the manager. This datetime reflects the time the event was generated/collected on the agent side.
Completely remove the current remoted, agentd, and authd daemons, along with their associated binaries.
The chosen protocol is HTTP.
We must use a standard codec library for message interchanges (protobuf).
The implementation must use standardized libraries and tools.
The agent-to-manager communication might require a REST design.
The manager-to-agent communication might require a WebSocket design. We must use a standard library like gRPC to model the communications.
Agents will generate a UUID per installation.
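Two of the restrictions above, the per-installation UUID and the datetime field on every event, can be sketched together. This is a hedged illustration only: the file-based persistence, the function names, and the envelope field names (`agent`, `module`, `timestamp`, `data`) are assumptions, and JSON-style dictionaries stand in for the protobuf messages the issue calls for.

```python
import uuid
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def load_or_create_agent_uuid(path: Path) -> str:
    """Return the UUID stored at `path`, generating and saving one on first use
    so the value stays stable for the lifetime of the installation."""
    if path.exists():
        # Parsing validates the stored value and normalizes its formatting.
        return str(uuid.UUID(path.read_text(encoding="utf-8").strip()))
    agent_uuid = str(uuid.uuid4())
    path.write_text(agent_uuid, encoding="utf-8")
    return agent_uuid


def build_event(agent_uuid: str, module: str, data: dict,
                collected_at: Optional[datetime] = None) -> dict:
    """Wrap module data in a hypothetical envelope stamped on the agent side."""
    moment = collected_at or datetime.now(timezone.utc)
    return {
        "agent": {"uuid": agent_uuid},
        "module": module,
        # ISO 8601 in UTC, reflecting when the event was generated or collected.
        "timestamp": moment.astimezone(timezone.utc).isoformat(),
        "data": data,
    }
```

Stamping the event when it is collected, rather than when it is sent or received, keeps the timestamp meaningful even if the agent batches events or retries after a failed delivery.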

Plan

Spike. ETA 06/28/2024

SPIKE - Initial registration system design #23393
- Owner: @wazuh/devel-pyserver
- Teams involved: @wazuh/devel-pyserver @wazuh/devel-indexer
SPIKE - Initial Agent comms API server design #23395
- Owner: @wazuh/devel-pyserver
- Teams involved: @wazuh/devel-pyserver
New Agent comms API endpoint client wazuh-agent#1
- Owner: @wazuh/devel-agent
- Teams involved: @wazuh/devel-agent
SPIKE - PoC implementation for agent and server #23396
- Owner: @wazuh/devel-pyserver
- Teams involved: @wazuh/devel-pyserver @wazuh/devel-agent @wazuh/devel-indexer
Spike - agent comms API UX wazuh-dashboard#209
- Owner: @wazuh/devel-dashboard
- Teams involved: @wazuh/devel-dashboard
SPIKE - Server to indexer events batching #24713
- Owner: @wazuh/devel-pyserver
- Teams involved: @wazuh/devel-pyserver

MVP implementation. (6 weeks)

Develop the framework indexer client #24615
- Owner: @wazuh/devel-pyserver
- Teams involved: @wazuh/devel-pyserver
Develop the framework engine client #24646
- Owner: @wazuh/devel-pyserver
- Teams involved: @wazuh/devel-pyserver
Develop the new registration system #24294
- Owner: @wazuh/devel-pyserver
- Teams involved: @wazuh/devel-pyserver @wazuh/devel-dashboard
Develop Agent comms API #24305
- Implement login, stateless, stateful, and AR endpoints.
- Implement authentication and authorization.
- Implement Engine communication protocol.
- Implement Indexer stream communication protocol, including stream indices initialization.
- Replace remoted with the server daemon.
- Owner: @wazuh/devel-pyserver
- Teams involved: @wazuh/devel-pyserver @wazuh/devel-indexer @wazuh/devel-cppserver
Develop the new client wazuh-agent#14
- Use login, stateless, stateful, and AR endpoints.
- Replace agentd with the client daemon.
- Owner: @wazuh/devel-agent
- Teams involved: @wazuh/devel-agent
Develop fleet management use cases #24711
- Owner: @wazuh/devel-pyserver
- Teams involved: @wazuh/devel-pyserver @wazuh/devel-dashboard

Feature complete implementation. (6 weeks)

Implement agent registration information customization.
- Owner: @wazuh/devel-dashboard
- Teams involved: @wazuh/devel-dashboard
Implement client to server batch processing and compression.
- Owner: @wazuh/devel-agent
- Teams involved: @wazuh/devel-agent
Implement server to indexer stream batching.
- Owner: @wazuh/devel-pyserver
- Teams involved: @wazuh/devel-pyserver
Implement client metrics in the server.
- Owner: @wazuh/devel-pyserver
- Teams involved: @wazuh/devel-pyserver
Implement client and server rate limiting.
- Owner: @wazuh/devel-pyserver
- Teams involved: @wazuh/devel-pyserver @wazuh/devel-agent
Implement the Endpoint agent CLI @wazuh/devel-agent
- Owner: @wazuh/devel-agent
- Teams involved: @wazuh/devel-agent
Implement server security measures.
- Owner: @wazuh/devel-pyserver
- Teams involved: @wazuh/devel-pyserver
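
The client-to-server batch processing and compression item in this phase could look roughly like the sketch below. It is only an illustration under stated assumptions: the function names and the batch size parameter are hypothetical, gzip is one possible compression choice, and newline-delimited JSON stands in for the protobuf encoding named in the implementation restrictions so that the example needs no third-party packages.

```python
import gzip
import json
from typing import Iterable, Iterator, List


def batch_events(events: Iterable[dict], max_batch_size: int) -> Iterator[List[dict]]:
    """Group events into lists of at most `max_batch_size` items."""
    if max_batch_size <= 0:
        raise ValueError("max_batch_size must be positive")
    batch: List[dict] = []
    for event in events:
        batch.append(event)
        if len(batch) >= max_batch_size:
            yield batch
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        yield batch


def encode_batch(batch: List[dict]) -> bytes:
    """Serialize one batch as newline-delimited JSON and gzip it."""
    payload = "\n".join(json.dumps(event, separators=(",", ":")) for event in batch)
    return gzip.compress(payload.encode("utf-8"))


def decode_batch(body: bytes) -> List[dict]:
    """Inverse of `encode_batch`, as the server side might apply it."""
    text = gzip.decompress(body).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]
```

Each encoded batch would travel as a single request body, so the per-request load balancing described earlier still applies at batch granularity.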

Integration (? weeks)

Adapt QA system tests.
Adapt Core integration tests.
Adapt Coordinator.
Adapt Framework integration tests.
Adapt Engine.
Adapt DevOps repositories. @wazuh/devel-devops
Adapt Cloud service. @wazuh/devel-cloud

Acceptance testing. (2 weeks) @wazuh/devel-qa

benchmarking
limit testing
end-to-end testing @wazuh/devel-cloud @wazuh/devel-devops


vikman90 · 2024-03-27T15:06:10Z

This issue may supersede:

snaow · 2024-06-03T07:44:44Z

nice :D

havidarou added level/objective type/enhancement New feature or request labels Mar 27, 2024

gdiazlo mentioned this issue Apr 12, 2024

Data persistence model redesign #22887

Open

10 tasks

havidarou changed the title ~~Agent/manager communication protocol~~ Agent/server communication protocol May 2, 2024

TomasTurina mentioned this issue May 17, 2024

New Agent comms API endpoint client wazuh/wazuh-agent#1

Closed

This was referenced May 14, 2024

SPIKE - Initial registration system design #23393

Closed

SPIKE - Initial Agent comms API server design #23395

Closed

SPIKE - PoC implementation for agent and server #23396

Closed

fdalmaup mentioned this issue May 21, 2024

Connexion 3.0 performance tests #22427

Closed

4 tasks

havidarou assigned Selutario May 23, 2024

This was referenced Jun 13, 2024

Delete deprecated endpoints #17815

Closed

Remove deprecated API endpoints #17781

Closed

Nicogp mentioned this issue Jun 13, 2024

Agent upgrade by WPK fails if any scan is running #14747

Closed

This was referenced Jun 24, 2024

Develop the new registration system #24294

Open

Develop Agent comms API #24305

Open

vikman90 mentioned this issue Jun 25, 2024

Develop the new client wazuh/wazuh-agent#14

Open

Selutario mentioned this issue Jul 10, 2024

Revert deprecation warnings in API endpoints #24522

Closed

4 tasks

fdalmaup self-assigned this Jul 15, 2024

nico-stefani mentioned this issue Jul 15, 2024

Develop the framework indexer client #24615

Open

2 tasks

GGP1 mentioned this issue Jul 16, 2024

Develop the framework engine client #24646

Open

2 tasks

fdalmaup mentioned this issue Jul 19, 2024

Develop fleet management use cases #24711

Open

3 tasks

GGP1 mentioned this issue Jul 19, 2024

SPIKE - Server to indexer events batching #24713

Open
