Data persistence model redesign #22887

Open · 4 of 10 tasks

havidarou opened this issue Apr 11, 2024 · 0 comments

havidarou commented Apr 11, 2024
Description

The current Wazuh persistence model doesn't work well in a cluster environment due to these limitations:

  • Scalability: because each Server holds its own data, scaling up and down forces that data to be recreated too often, leading to lower performance and bottlenecks.
  • Consistency: when an Agent connects to a different Server, its data must be recreated, leading to inconsistencies.
  • Analytics: building dashboards and analytics on top of the Server's database is cumbersome for our current dashboard solutions and results in an overly complicated RBAC implementation.
  • Storage: backup and replication options are cumbersome.

We want to leverage the Indexer as the data store. The Indexer architecture, based on OpenSearch, provides a scalable and replicated data store that should help avoid the limitations above.

Data flow

Agents acquire the data and send requests to the Agent comms API, which routes the relevant information to each Server module. Server modules process this information and update the Indexer with their results.

After the implementation of:

The load balancer will balance per request. This forces all modules to be request-driven and requires a distributed store containing all the contextual information they may need for their processing. We might need to change some processes so they adapt better to the characteristics of the Indexer (a document-oriented, distributed, non-transactional store).
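
To make the request-driven pattern concrete, here is a minimal Python sketch of what a Server module handler could look like under this model. The client setup, index names, and document fields are assumptions for illustration only, not part of the design.

```python
# Hypothetical request-driven Server module: it reads the contextual data it
# needs from the Indexer, processes the request, and writes the result back,
# so no state has to live on the Server node itself. Index names are assumed.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "indexer", "port": 9200}], use_ssl=True)

def handle_module_request(agent_id: str, payload: dict) -> None:
    # Fetch whatever context the module needs (assumed index name).
    context = client.get(index="wazuh-agents", id=agent_id)["_source"]

    # Module-specific processing; purely illustrative.
    result = {"agent": context["name"], "data": payload, "status": "processed"}

    # Persist the result in the Indexer instead of a local database.
    client.index(index="wazuh-module-results", id=agent_id, body=result)
```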

```mermaid
flowchart TD

subgraph Agents
    Endpoints
    Clouds
    Other_sources
end

subgraph Indexer["Indexer cluster"]

    subgraph Data_streams["Data streams"]
        alerts_stream["Alerts stream"]
        commands_stream["Commands stream"]
    end

    subgraph Indexer_modules["Indexer modules"]
        initialization["Initialization plugin"]
        commands_manager["Commands manager"]
    end

    subgraph Data_states["Data states"]
        agents_list["Agents list"]
        states["States"]
    end

end

subgraph Server["Server cluster"]

    subgraph Wazuh1["Server node n"]
        api1["Agent comms API"]
        server1["Server </br> management API"]
        Engine1["Engine"]
        VD1["VD"]
    end

    subgraph Wazuh2["Server node 2"]
        api2["Agent comms API"]
        server2["Server </br> management API"]
        Engine2["Engine"]
        VD2["VD"]
    end

end

subgraph Dashboard
    subgraph Dashboard1["Dashboard"]
    end
end

subgraph lb["Load Balancer"]
    lb_node["Per request"]
end

users["Wazuh users"] --> Dashboard
Agents -- HTTP --> lb
lb -- HTTP --> Wazuh1
lb -- HTTP --> Wazuh2
Dashboard -- HTTP --> Indexer
Wazuh1 -- HTTP --> Indexer
Wazuh2 -- HTTP --> Indexer

style Wazuh1 fill:#abc2eb
style Wazuh2 fill:#abc2eb
style Data_streams fill:#abc2eb
style Data_states fill:#abc2eb
style Dashboard1 fill:#abc2eb
style Indexer_modules fill:#abc2eb
```

Agent registration

The current Wazuh Agent enrollment process is replaced with a standard token-based validation protocol. During the Wazuh agent deployment process, the agent will be registered using the /registration endpoint in the Server management API.

The Wazuh Agents' information is stored in the Indexer component.

At this point, the Agent can use its credentials to obtain the authorization token used by the Agent comms API. This token is validated computationally on the server side and must be renewed before it expires via a /login request to the Agent comms API.
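
A minimal Python sketch of this flow from the Agent's perspective. Only the /registration and /login endpoint names come from the description above; the ports, payload fields, and the `token` field in the /login response are assumptions.

```python
# Minimal sketch of the enrollment flow from the Agent side (ports, payload
# fields, and response shape are assumptions based on this description).
import requests

MGMT_API = "https://server:55000"    # Server management API (assumed port)
COMMS_API = "https://server:27000"   # Agent comms API (assumed port)

# 1. Registration: performed once, during agent deployment.
creds = {"uuid": "agent-0001", "key": "pre-shared-or-generated-key"}
requests.post(f"{MGMT_API}/registration", json=creds, verify=False)

# 2. Login: exchange credentials for a short-lived authorization token.
token = requests.post(f"{COMMS_API}/login", json=creds, verify=False).json()["token"]

# 3. Authorized requests: the token is sent until it expires, then renewed.
headers = {"Authorization": f"Bearer {token}"}
requests.post(f"{COMMS_API}/events/stateless", json={"events": []},
              headers=headers, verify=False)
```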

```mermaid
flowchart TD

subgraph Agents
    Endpoints
    Clouds
    Other_sources
end

subgraph Indexer["Indexer cluster"]

    subgraph Data_states["Data states"]
        agents_list["Agents list"]
    end

end

subgraph Server["Server cluster"]

    subgraph Wazuh1["Server node n"]
        api1["Agent comms API"]
        server1["Server </br> management API"]

    end

end

subgraph Dashboard
    subgraph Dashboard1["Dashboard"]
    end
end

subgraph lb["Load Balancer"]
    lb_node["Per request"]
end

users["Wazuh users"] --> Dashboard
Agents -- 1. /register --> lb
Agents -- 2. /login --> lb
Agents -- 3. /request_with_token --> lb

lb -- 1. /register --> server1
lb -- 2. /login --> api1
lb -- 3. /request_with_token --> api1

Dashboard -- HTTP --> Indexer
server1 -- 1. Store credentials --> agents_list
api1 -- 2. Read credentials --> agents_list

style Wazuh1 fill:#abc2eb
style Data_states fill:#abc2eb
style Dashboard1 fill:#abc2eb
```

Stateful modules

Some Agent modules generate events based on state changes. This state persists across agent restarts.

Every Agent is responsible for its own state.

Every Agent stateful module uses a dedicated Agent comms API endpoint.

The currently identified stateful modules are:

  • FIM.
  • Inventory.
  • SCA.
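
As an illustration of how the Agent comms API could persist these states in the Indexer, the sketch below upserts a state document using a deterministic document id, so re-sending the same state overwrites the previous document instead of duplicating it. The index naming scheme and fields are assumptions.

```python
# Illustrative only: persist a stateful update with a deterministic _id so
# that the same agent + module + object always maps to the same document.
import hashlib
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "indexer", "port": 9200}], use_ssl=True)

def index_state(agent_id: str, module: str, state: dict) -> None:
    # Deterministic id derived from the agent, the module, and the object.
    doc_id = hashlib.sha1(f"{agent_id}:{module}:{state['path']}".encode()).hexdigest()
    client.index(index=f"wazuh-states-{module}", id=doc_id, body={
        "agent": {"id": agent_id},
        "state": state,
    })

# Example: a FIM state update for one monitored file.
index_state("agent-0001", "fim", {"path": "/etc/passwd", "sha256": "..."})
```
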
```mermaid
flowchart TD

subgraph Agents
    Endpoints
    Clouds
    Other_sources
end

subgraph Indexer["Indexer cluster"]

    subgraph data_states["Data states"]
        states["States"]
    end

end

subgraph Server["Server cluster"]

    subgraph Wazuh1["Server node n"]
        api1["Agent comms API"]

    end

end

subgraph Dashboard
    subgraph Dashboard1["Dashboard"]
    end
end

subgraph lb["Load Balancer"]
    lb_node["Per request"]
end

users["Wazuh users"] --> Dashboard
Agents -- /stateful --> lb

lb -- /stateful --> api1

Dashboard -- HTTP --> Indexer
api1 -- /index --> states

style Wazuh1 fill:#abc2eb
style Dashboard1 fill:#abc2eb
style data_states fill:#abc2eb
```

Stateless modules

Every Agent stateless module uses a specific Agent comms API endpoint.

Every Agent stateful event also generates a stateless event.
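
A hedged sketch of an Agent submitting a batch of stateless events. The /stateless path matches the diagrams in this issue, while the port, event schema, and payload wrapper are assumptions; the Engine then processes these events and indexes any resulting alerts into the alerts stream.

```python
# Sketch of an Agent sending a batch of stateless events. The Server does not
# persist these itself; the Engine processes them and indexes any resulting
# alerts into the alerts stream on the Indexer. Schema and port are assumed.
import requests

events = [
    {"module": "fim", "type": "modified", "path": "/etc/passwd"},
    {"module": "logcollector", "message": "sshd: Failed password for root"},
]
requests.post("https://server:27000/stateless",
              json={"agent_id": "agent-0001", "events": events},
              headers={"Authorization": "Bearer <token>"},
              verify=False)
```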

```mermaid
flowchart TD

subgraph Agents
    Endpoints
    Clouds
    Other_sources
end

subgraph Indexer["Indexer cluster"]

    subgraph Data_states["Data streams"]
        alerts_stream["Alerts stream"]
    end

end

subgraph Server["Server cluster"]

    subgraph Wazuh1["Server node n"]
        api1["Agent comms API"]
        engine["Engine"]

    end

end

subgraph Dashboard
    subgraph Dashboard1["Dashboard"]
    end
end

subgraph lb["Load Balancer"]
    lb_node["Per request"]
end

users["Wazuh users"] --> Dashboard
Agents -- /stateless --> lb

lb -- /stateless --> api1

Dashboard -- HTTP --> Indexer
api1 -- /process --> engine
engine -- /index --> alerts_stream

style Wazuh1 fill:#abc2eb
style Data_states fill:#abc2eb
style Dashboard1 fill:#abc2eb
```

Agent commands

Each Agent will use the Agent comms API's /commands endpoint to poll for commands. Agents must keep this polling active at all times, re-sending the /commands request whenever it returns or drops.

Commands will be stored as an event stream in the Indexer. This stream will be continuously processed by the Commands manager as described in the diagram.
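
A rough sketch of the Agent-side polling loop, assuming the /commands endpoint described above, a hypothetical `commands` array in the response, and a local `execute` dispatcher; the timeout and back-off values are illustrative.

```python
# Sketch of the Agent-side command polling loop: the request is re-issued
# immediately whenever it returns or drops, so there is always one poll in
# flight. Endpoint path, response shape, and timings are assumptions.
import time
import requests

COMMS_API = "https://server:27000"

def execute(command: dict) -> None:
    # Placeholder for the Agent's local command dispatcher.
    print("executing", command)

def poll_commands(token: str) -> None:
    while True:
        try:
            resp = requests.get(f"{COMMS_API}/commands",
                                headers={"Authorization": f"Bearer {token}"},
                                timeout=60, verify=False)
            for command in resp.json().get("commands", []):
                execute(command)
        except requests.RequestException:
            time.sleep(5)  # back off briefly, then resume polling
```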

```mermaid
flowchart TD

subgraph Agents
    Endpoints
    Clouds
    Other_sources
end

subgraph Indexer["Indexer cluster"]

    subgraph Data_states["Data streams"]
        commands_stream["Commands stream"]
    end

    subgraph indexer_modules["Indexer modules"]
        commands_manager["Commands manager"]
    end
end

subgraph Server["Server cluster"]

    subgraph Wazuh1["Server node n"]
        api1["Agent comms API"]
        engine["Engine"]
        server1["Server management </br> API"]

    end

end

subgraph Dashboard
    subgraph Dashboard1["Dashboard"]
    end
end

subgraph lb["Load Balancer"]
    lb_node["Per request"]
end

users["Wazuh users"] --> Dashboard
Agents -- /poll_commands --> lb

lb -- /poll_commands --> api1
Dashboard -- HTTP --> Indexer
commands_manager -- 1. /index --> commands_stream
commands_manager -- 2. /reads --> commands_stream
commands_manager -- 3. /respond --> api1
engine -- /send_commands --> commands_manager
server1 -- /send_commands --> commands_manager

style Wazuh1 fill:#abc2eb
style Data_states fill:#abc2eb
style Dashboard1 fill:#abc2eb
style indexer_modules fill:#abc2eb

```

Challenges

Some security modules require contextual data for their processing. Keeping this data in the Indexer will slow down their processing and increase the Indexer's workload.

Functional requirements

Agent registration

Stateless modules

Indexer initialization

  • The Wazuh Indexer must make sure that all of its requirements are ready during its initialization.
  • The currently identified requirements are:
    • Stream indices, mappings, and ingest pipelines.
    • State indices, mappings, and ingest pipelines.
    • Agent list index, mappings, and ingest pipelines.
    • RBAC by default: identify all necessary users and their minimum permissions.
    • Rollover + alias configuration for stream indices (see the sketch after this list).
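
As an example of one initialization step, the sketch below registers an index template for the alerts stream and bootstraps its first backing index with a write alias so rollover can later move the alias to new backing indices. The template, index, and alias names, and the mapping fields, are assumptions.

```python
# Hedged sketch of one bootstrap step for the alerts stream: register an
# index template and create the first backing index behind a write alias.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "indexer", "port": 9200}], use_ssl=True)

client.indices.put_index_template(name="wazuh-alerts", body={
    "index_patterns": ["wazuh-alerts-*"],
    "template": {
        "mappings": {"properties": {
            "timestamp": {"type": "date"},
            "agent": {"properties": {"id": {"type": "keyword"}}},
        }},
    },
})

# First backing index; the alias is the stable name that writers use.
client.indices.create(index="wazuh-alerts-000001", body={
    "aliases": {"wazuh-alerts": {"is_write_index": True}},
})
```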

Stateful modules

  • The Agent comms API module in the Server updates the states in the Indexer based on stateful requests sent by Agents.

Commands

  • Every Agent must poll for commands using the /poll_commands Agent comms API endpoint.
  • To send a command to an Agent, use the /send_commands Agent comms API endpoint.
  • During /send_commands processing, commands are written to the Commands stream in the Indexer (an illustrative command document follows this list).
  • The Indexer must process the Commands stream and deliver the command execution to the Server node where the Agent's /poll_commands request is made.
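
For illustration only, a possible shape for a document in the Commands stream; every field name here is an assumption about the eventual schema, not a committed format.

```python
# Purely illustrative command document: the Commands manager would read
# documents like this and hand them to the Server node where the target
# Agent is polling. All field names are assumed.
command_document = {
    "id": "cmd-7f3a",
    "target": {"type": "agent", "id": "agent-0001"},
    "action": {"name": "set-configuration", "args": {"log_level": "debug"}},
    "status": "pending",   # pending -> sent -> acknowledged/failed
    "timeout": 300,        # seconds before the command expires
}
```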

Non-functional requirements

  • Scalability. The system must handle a growing number of Agents and data without performance degradation.

  • Performance. The system must ensure low-latency data processing and communication between Agents and Servers. Indexer queries and updates should be optimized to handle high throughput.

  • Availability. The system must be highly available with minimal downtime. Indexer clusters should have redundancy and failover mechanisms.

  • Reliability. Data consistency and reliability must be maintained across all components. The system should guarantee the successful delivery and processing of data and commands.

  • Security. Access controls and authentication mechanisms must be enforced throughout the system.

  • Maintainability. Clear documentation and well-defined interfaces for components are necessary to facilitate development.

  • Manageability. The system should provide monitoring and alerting capabilities to track performance and health. Administrative tasks such as backups, scaling, and updates should be automated as much as possible.

  • Resource Efficiency. Efficient use of CPU, memory, and storage resources must be considered in the design.

Implementation restrictions

  • The Indexer initialization will be implemented via an Indexer plugin (Initialization plugin).
  • The Commands stream processing will be implemented via an Indexer plugin (Commands manager). The OpenSearch Job Scheduler plugin should be considered as a base for this.

Plan

Spike. ETA 6/27/2024

MVP implementation.

  • Indexer initialization plugin wazuh-indexer-plugins#9

    • Owner: @wazuh/devel-indexer
    • Teams involved: @wazuh/devel-indexer @wazuh/devel-agent @wazuh/devel-pyserver @wazuh/devel-cppserver
  • Data persistence model definition

    • Owner: @wazuh/devel-indexer
    • Teams involved: @wazuh/devel-indexer @wazuh/devel-agent @wazuh/devel-pyserver @wazuh/devel-cppserver
  • Commands Manager plugin

    • Owner: @wazuh/devel-indexer
    • Teams involved: @wazuh/devel-indexer @wazuh/devel-agent @wazuh/devel-pyserver @wazuh/devel-cppserver
  • Develop Agent Inventory module modifications wazuh-agent#15

    • Owner: @wazuh/devel-agent
    • Teams involved: @wazuh/devel-agent @wazuh/devel-pyserver @wazuh/devel-dashboard
  • Agent centralized configuration wazuh-agent#32

    • This requires the Commands manager plugin.
    • Owner: @wazuh/devel-agent
    • Teams involved: @wazuh/devel-agent @wazuh/devel-pyserver @wazuh/devel-indexer @wazuh/devel-dashboard

Feature-complete implementation.

  • Develop the Indexer initialization plugin with: RBAC by default (identifying all necessary users and their minimum permissions) and rollover + alias configuration for stream indices.

    • Owner: @wazuh/devel-indexer
    • Teams involved: @wazuh/devel-indexer @wazuh/devel-pyserver
  • Develop Agent FIM and SCA module modifications.

    • Owner: @wazuh/devel-agent
    • Teams involved: @wazuh/devel-agent @wazuh/devel-pyserver @wazuh/devel-dashboard
  • Develop the rest of default commands use cases.

    • Owner: @wazuh/devel-agent
    • Teams involved: @wazuh/devel-agent @wazuh/devel-pyserver @wazuh/devel-cppserver @wazuh/devel-indexer @wazuh/devel-dashboard

Integration (? weeks)

  • Adapt QA system tests.
  • Adapt Core integration tests.
  • Adapt Coordinator.
  • Adapt Framework integration tests.
  • Adapt Engine.
  • Adapt DevOps repositories. @wazuh/devel-devops
  • Adapt Cloud service. @wazuh/devel-cloud

Acceptance testing. (2 weeks) @wazuh/devel-qa
