New Agent comms API endpoint client #1
Comments
Initial libraries investigation

There are several C++ libraries that support our requirements. The ones described in this short list are widely used in the industry and well tested. Note that no single library covers all needs: those that support high-level protocols like HTTP and HTTP/2 often lack native support for UDP, while libraries handling low-level protocols may not provide comprehensive support for higher-level abstractions. According to wazuh/wazuh#23395 the server team is also considering a number of libraries; we should encourage sharing the same technology across both Agent and Server.

Libraries for network connections:

- ZeroMQ (ØMQ): https://zeromq.org/get-started/
- gRPC:
- Boost.Asio / Boost.Beast: https://www.boost.org/users/license.html
- ASIO (Standalone): https://www.boost.org/LICENSE_1_0.txt
- CURL and http-request: CURL supports many protocols like HTTP and HTTPS, with built-in SSL/TLS for secure communication. While it lacks native WebSocket support, CURL's wide protocol support makes it suitable for HTTP-based event sending and command endpoints, especially given that we already have a CURL wrapper.

Compared to the other options, aside from the increased complexity of adding new dependencies: ZeroMQ offers high-performance messaging but limited protocol support; gRPC supports HTTP/2 and real-time communication but is more complex given the need for protocol buffer definitions. Continuing with CURL leverages existing code and avoids new dependencies. See https://curl.se/libcurl/.

Libraries for JWTs:

- jwt-cpp:
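Since the PoC's /login flow hands the client a JWT, it is worth recalling that a JWT is just three base64url segments (header, payload, signature), which is what a library like jwt-cpp parses. A minimal sketch in Python (the PoC itself is C++; the token contents here are hypothetical, and the signature is neither produced nor verified):

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    # base64url without padding, as used in JWT segments
    pad = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + pad)


def decode_jwt(token: str):
    """Split a JWT into its decoded header and payload (signature NOT verified)."""
    header_b64, payload_b64, _signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    payload = json.loads(b64url_decode(payload_b64))
    return header, payload


def make_unsigned_token(payload: dict) -> str:
    """Build an unsigned token, roughly what a mock /login could return."""
    header = {"alg": "none", "typ": "JWT"}
    enc = lambda d: base64.urlsafe_b64encode(json.dumps(d).encode()).decode().rstrip("=")
    return f"{enc(header)}.{enc(payload)}."


# Hypothetical claims; a real server would sign the token (e.g. HS256/RS256)
token = make_unsigned_token({"sub": "agent-001", "exp": 1717286400})
header, payload = decode_jwt(token)
```

In the real client the signature must of course be produced and checked by the JWT library; this only illustrates the token structure.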
New agent: new repository

Update: New repo structure

Initial draft considerations:
Additionally, we should aim to identify functionalities that could be replaced by an external library to offload the maintenance of these components. Furthermore, we need to establish a strategy for common code shared between the agent and server to avoid duplication.
For folders that are not code related to the Agent:
Missing from this initial draft are other files. Complete structure:

Option 1:
Option 2:
Communications: server -> agent /commands endpoint

Note: This is a work in progress.
Update: 03/06/2024
Update: 04/06/2024
Update: 05/06/2024
Update: 06/06/2024
Update: 07/06/2024
Update: 10/06/2024
Update: 11/06/2024
Update: 13/06/2024
Current wazuh-agentd UML diagram (PlantUML code)
New agent registration mechanism (PlantUML code)
New agent communication mechanism (PlantUML code)
Agent comms API endpoint client
PoC Summary

After an initial period of research, during which various library variants were evaluated for the Proof of Concept (PoC) implementation (see comment), it was determined that the initial approach for integrating the new agent with the Agent Communications API would use the wazuh-http-request repository as a cURL wrapper. The PoC aims to address the fundamental aspects of communication and design, based on the component diagrams detailed in this issue. This PoC specifically targets the client module and the submodules that interact with the commander and the queue of that design. The following requests, specified in the description of this issue, were successfully implemented:
For data persistence, two basic storage models were developed: one using SQLite and the other using RocksDB. These were implemented with a wrapper to abstract the design from the specific database chosen in the future. Currently, command storage utilizes RocksDB, while event storage employs SQLite. A server was developed to mock responses, using the Boost library and JWT to provide the client with a token via the /login request. Regarding events, their structure is defined as follows:
The status of requests can be set to: pending, processing, or dispatched. By default, they are inserted into the database with a pending status. Events are generated automatically through a Python script. The event_queue_monitor submodule extracts these events from the database, updates their status to "processing," and then accumulates them for a time T or until a count N is reached. Subsequently, it automatically sends a /stateless request with these events to the server. This submodule verifies the success of the request, marking the events as dispatched in the database or resetting their status to pending. This submodule features a thread that continuously searches for events in the database. For each batch of events, a new thread is launched to handle dispatch. Regarding commands, their structure is defined as follows:
The status can be set to either pending or dispatched. The command_dispatcher submodule is responsible for making continuous GET requests to the server within a thread. When no commands are available, a Timeout error is returned. If commands are available, they are received and stored in the database. Another thread then retrieves commands with a pending status from the database, marks them as dispatched, and simulates sending them to the commander. Finally, for both submodules, a message format already used by wazuh in the upgrade module was selected (see issue).
and a command would be:
The following diagram was made by @TomasTurina for a better understanding (PlantUML code):
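The event_queue_monitor flow described above (mark pending events as processing, accumulate until a count N is reached or a time T elapses, send, then mark dispatched or reset to pending on failure) can be sketched as follows. This is an illustrative Python model of the C++ PoC logic; function and field names are assumptions, not the actual PoC API:

```python
import time

PENDING, PROCESSING, DISPATCHED = "pending", "processing", "dispatched"


def flush(batch, send):
    """Send one batch; on success mark DISPATCHED, on failure reset to PENDING."""
    ok = send([ev["data"] for ev in batch])  # e.g. a POST to /events/stateless
    for ev in batch:
        ev["status"] = DISPATCHED if ok else PENDING


def monitor_events(events, send, batch_size, max_wait_s):
    """Accumulate PENDING events until batch_size or max_wait_s, then flush."""
    batch, deadline = [], time.monotonic() + max_wait_s
    for ev in events:
        if ev["status"] != PENDING:
            continue
        ev["status"] = PROCESSING  # claimed by the monitor, not yet sent
        batch.append(ev)
        if len(batch) >= batch_size or time.monotonic() >= deadline:
            flush(batch, send)
            batch, deadline = [], time.monotonic() + max_wait_s
    if batch:  # flush any leftover partial batch
        flush(batch, send)


# Successful dispatch: every event ends up DISPATCHED
events = [{"data": f"e{i}", "status": PENDING} for i in range(5)]
monitor_events(events, send=lambda payload: True, batch_size=2, max_wait_s=1.0)

# Failed dispatch: events are reset to PENDING for a later retry
failed = [{"data": "x", "status": PENDING}]
monitor_events(failed, send=lambda payload: False, batch_size=10, max_wait_s=1.0)
```

The PoC additionally runs this in a dedicated thread and launches one thread per batch for dispatch, which the sketch omits for clarity.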
Test client PoC with PyServer PoC - Stop server and run it again 🟢
Events

Server
Client

Commands

Server
Client
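The command polling flow exercised by these tests (continuous GET requests that either time out or return commands, which are stored as pending and later marked dispatched and handed to the commander) can be sketched as follows. This is a Python model of the C++ submodule; the helper names are illustrative:

```python
def poll_server(responses):
    """Simulated long-poll GET /commands: returns commands or raises TimeoutError."""
    resp = next(responses)
    if resp is None:
        raise TimeoutError("no commands available")
    return resp


def dispatch_commands(responses, store, commander, polls):
    for _ in range(polls):
        try:
            commands = poll_server(responses)
        except TimeoutError:
            continue  # long poll expired with no commands; poll again
        # Stage 1: persist received commands with a pending status
        for cmd in commands:
            store.append({"command": cmd, "status": "pending"})
        # Stage 2: pick up pending commands, mark them dispatched,
        # and simulate handing them to the commander
        for entry in store:
            if entry["status"] == "pending":
                entry["status"] = "dispatched"
                commander(entry["command"])


received = []
store = []
# Two polls time out (None), two return commands
responses = iter([None, ["restart"], None, ["upgrade", "status"]])
dispatch_commands(responses, store, commander=received.append, polls=4)
```

In the PoC the two stages run in separate threads against the RocksDB-backed store; here they are interleaved sequentially to keep the sketch deterministic.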
Client Send Events Benchmarks

Introduction

The purpose of these benchmarks is to evaluate the performance of sending pre-loaded pending events.

Benchmark Results

The results of the benchmarks are categorized based on different parameters to provide a clear and detailed analysis. 100 iterations of each test were made to average the results.

1. Varying Batch Sizes with a Fixed Event Count (1,000 events)
Analysis: As the batch size increases, the time required for dispatching events significantly decreases initially and then stabilizes. This demonstrates that larger batch sizes are more efficient for dispatching a high number of events.

2. Varying Event Counts with a Fixed Batch Size (100)
Analysis: The time required increases with the number of events, which is expected.

3. Varying Event Data Sizes with a Fixed Event Count (1,000 events) and Batch Size (100)
Analysis: As the data size increases, the time required for dispatching events increases too. This demonstrates that the requests take more time, indicating a potential limit for optimization when handling very large data sizes.

Conclusion

It will be necessary in successive versions of the client and the server to establish an optimal batch size according to the average or maximum size of the events. Although a larger batch is more efficient, when using the whole communication stack and the database, the times get worse for large volumes of information. The same tests were repeated with the whole communication stack and the database:

1. Varying Batch Sizes with a Fixed Event Count (1,000 events)
2. Varying Event Counts with a Fixed Batch Size (100)
3. Varying Event Data Sizes with a Fixed Event Count (1000 events) and Batch Size (100)
Update on Event Queue Dispatch Benchmarks

Commit 805436a adds the following benchmark tests to the PoC.

Introduction

The purpose of these benchmarks is to evaluate the performance of dispatching pre-loaded pending events.

Benchmark Results

The results of the benchmarks are categorized based on different parameters to provide a clear and detailed analysis. 100 iterations of each test were made to average the results.

1. Varying Batch Sizes with a Fixed Event Count (200,000 events)
Analysis: As the batch size increases, the time required for dispatching events significantly decreases initially and then stabilizes. This demonstrates that larger batch sizes are more efficient for dispatching a high number of events.

2. Varying Event Counts with a Fixed Batch Size (100)
Analysis: The time required increases with the number of events, which is expected. However, the increase is more pronounced as the event count grows, indicating a potential area for optimization when handling very large volumes of events.

3. Varying Event Data Sizes with a Fixed Event Count (100,000 events) and Batch Size (100)
Analysis: The dispatch time remains relatively constant regardless of the event data size, suggesting that the current implementation efficiently handles varying sizes of event data.

Conclusion

The benchmark results provide valuable insights into the performance characteristics of the chosen approach.
Overall, the results are good and shed a positive light on the chosen approach. It's also worth noting that, given these measurement results, the true bottleneck in handling events will more likely come from database transactions and HTTP requests to the server.
Report on Event Queue Dispatch Benchmarks

Introduction

The purpose of these benchmarks is to evaluate the performance of dispatching pre-loaded pending events. Asio's thread pool has been limited to 32 threads.

Benchmark Results

The results of the benchmarks are categorized based on different parameters to provide a clear and detailed analysis.

1. Varying Batch Sizes with a Fixed Event Count (200,000 events)

Using the baseline approach:
Using Asio's Thread Pool:
Analysis: Asio's thread pool performs significantly better than the baseline approach.

2. Varying Event Counts with a Fixed Batch Size (100)

Using the baseline approach:
Using Asio's Thread Pool:
Analysis: Asio's thread pool significantly outperforms the baseline approach.

3. Varying Event Data Sizes with a Fixed Event Count (100,000 events) and Batch Size (100)

Using the baseline approach:
Using Asio's Thread Pool:
Analysis: Asio's thread pool demonstrates a more consistent and lower dispatch time across different event data sizes compared to the baseline approach.

Conclusion

The benchmark results provide valuable insights into the performance characteristics of both approaches.
Overall, Asio's thread pool demonstrates superior performance and scalability compared to manual thread handling.
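The core idea behind the Asio result is that a bounded pool (here capped at 32 threads) reuses workers instead of paying thread-creation cost per batch. The same pattern, sketched with Python's standard thread pool rather than Asio (the batch-dispatch function is a placeholder for the real HTTP POST):

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 32  # mirrors the 32-thread cap applied to Asio's pool in the PoC


def dispatch_batch(batch):
    # Placeholder for sending one batch of events over HTTP;
    # returns the number of events "dispatched"
    return len(batch)


def dispatch_all(events, batch_size):
    batches = [events[i:i + batch_size] for i in range(0, len(events), batch_size)]
    # A fixed-size pool bounds concurrency and reuses threads, unlike
    # spawning one thread (or one async task) per batch.
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        return sum(pool.map(dispatch_batch, batches))


total = dispatch_all(list(range(1000)), batch_size=100)
```

With 1,000 events and a batch size of 100 this submits 10 batch tasks to the pool; in C++ the equivalent would be posting handlers to `boost::asio::thread_pool`.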
Performance comparison between RocksDB and SQLite

The test focused on basic database operations (writing, reading, and updating records) without multi-threading, aiming to provide insights into the efficiency and suitability of each database for potential use in our project.

Results

To do
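Whichever backend wins this comparison, the PoC hides it behind a wrapper so the rest of the client is independent of the database choice. A minimal sketch of that abstraction in Python, using the stdlib sqlite3 module for the SQLite side and a dict-backed stand-in for RocksDB (no RocksDB binding is assumed; all names are illustrative):

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Optional


class EventStore(ABC):
    """Minimal storage interface so client code is independent of the backend."""

    @abstractmethod
    def put(self, key: str, value: str) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[str]: ...


class SQLiteStore(EventStore):
    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

    def put(self, key: str, value: str) -> None:
        self.db.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (key, value))
        self.db.commit()

    def get(self, key: str) -> Optional[str]:
        row = self.db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else None


class InMemoryStore(EventStore):
    """Stand-in for the RocksDB backend: same interface, dict-backed."""

    def __init__(self):
        self.data = {}

    def put(self, key: str, value: str) -> None:
        self.data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)


# Client code sees only EventStore, so the backend can be swapped later
results = []
for store in (SQLiteStore(), InMemoryStore()):
    store.put("event:1", '{"status": "pending"}')
    results.append(store.get("event:1"))
```

In the C++ PoC the same role is played by the wrapper that currently routes commands to RocksDB and events to SQLite.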
Update: Performance comparison between RocksDB and SQLite

Using transactions for bulk inserts dramatically improves SQLite performance.
There's no meaningful difference in terms of performance between SQLite and RocksDB.
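The reason transactions help SQLite so much is that a single transaction commits (and syncs the journal) once for the whole batch, instead of once per row in autocommit mode. A small demonstration of the pattern with the stdlib sqlite3 module:

```python
import sqlite3


def bulk_insert(db, rows):
    # Wrapping all inserts in one transaction means a single commit
    # (one journal sync) instead of one per row; in autocommit mode
    # each INSERT would pay that cost individually.
    with db:  # the connection context manager begins and commits a transaction
        db.executemany("INSERT INTO events (data) VALUES (?)", rows)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, data TEXT)")
bulk_insert(db, [(f"event-{i}",) for i in range(10_000)])
count = db.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

The C++ equivalent is issuing `BEGIN`/`COMMIT` around the batch of prepared-statement executions.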
Conclusion
We are ready to move forward with the implementation of the MVP.
Description
As detailed in wazuh/wazuh#22677, Wazuh's current communication setup is complex and needs to be refactored.
We want to replace the current `wazuh-agentd` service in charge of communicating with the server with a new agent. Additionally, the `agent-auth` tool will also be replaced by this agent.

The new agent must be able to perform the following tasks:
- `Server management API` (it needs the login token).
- `Agent comms API`.
- `Agent comms API`.
- `Agent comms API`.

Additionally, these are the API endpoints that the agent will use to communicate with the server:
| Endpoint | Description |
| --- | --- |
| `/login` | Authenticate (request token). |
| `/events/stateless` | Send events. |
| `/events/stateful` | The same as the previous one, but it requires persistent data. |
| `/commands` | In the opposite direction: a request made by the agent to the manager. |
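One way to picture how the four endpoints differ is by the requests they produce: `/login` is the only unauthenticated call, `/commands` is a GET in the agent-to-manager direction, and the event endpoints are authenticated POSTs. The sketch below builds request descriptions only; the body shapes (`uuid`/`key`, `events`) are assumptions for illustration, not the actual API schema:

```python
import json


def build_request(endpoint, token=None, body=None):
    """Return (method, path, headers, payload) for one Agent comms API call."""
    # /commands is the agent polling the manager, hence GET; the rest are POSTs
    method = "GET" if endpoint == "/commands" else "POST"
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    payload = json.dumps(body) if body is not None else None
    return method, endpoint, headers, payload


# /login obtains the token; the other endpoints then carry it
login = build_request("/login", body={"uuid": "agent-001", "key": "secret"})
events = build_request("/events/stateless", token="jwt-token", body={"events": []})
commands = build_request("/commands", token="jwt-token")
```

In the PoC these requests are issued through the wazuh-http-request cURL wrapper rather than built by hand.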
The focus of this issue will be on the following tasks:
Implementation restrictions

- `libcurl`, `boost`, `gRPC`, etc.

Plan
- `/poc` folder in the new repository.

POC working branch: https://github.com/wazuh/wazuh-agent/tree/1-spike-new-agent-comms-api-endpoint-client