Data persistence model redesign #22887

Open · 4 of 10 tasks

havidarou opened this issue Apr 11, 2024 · 0 comments

havidarou commented Apr 11, 2024
Description

The current Wazuh persistence model doesn't work well in a cluster environment due to these limitations:

  • Scalability: because each Server holds its own data, scaling up and down forces that data to be recreated too often, leading to lower performance and bottlenecks.
  • Consistency: when an Agent connects to a different Server, its data must be recreated, leading to inconsistencies.
  • Analytics: building dashboards and analytics on top of the Server's database is cumbersome for our current dashboard solutions and results in an overly complicated RBAC implementation.
  • Storage: backup and replication options are cumbersome.

We want to leverage the Indexer as the data store. The Indexer architecture, based on OpenSearch, provides a scalable and replicated data store that should help avoid the limitations above.

Data flow

Agents acquire the data and send requests to the Agent comms API, which routes the relevant information to each Server module. Server modules process this information and update the Indexer with their results.

After the implementation of:

The load balancer will balance per request. This forces all modules to be request-driven and requires a distributed store containing all the contextual information they may need for their processing. We might need to change some processes so they adapt better to the characteristics of the Indexer (a document-oriented, distributed, non-transactional store).
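
To make the request-driven pattern concrete, here is a minimal Python sketch of what a Server module handler could look like under this model. The client setup, index names, and document fields are assumptions for illustration only, not part of the design.

```python
# Hypothetical request-driven Server module: it reads the contextual data it
# needs from the Indexer, processes the request, and writes the result back,
# so no state has to live on the Server node itself. Index names are assumed.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "indexer", "port": 9200}], use_ssl=True)

def handle_module_request(agent_id: str, payload: dict) -> None:
    # Fetch whatever context the module needs (assumed index name).
    context = client.get(index="wazuh-agents", id=agent_id)["_source"]

    # Module-specific processing; purely illustrative.
    result = {"agent": context["name"], "data": payload, "status": "processed"}

    # Persist the result in the Indexer instead of a local database.
    client.index(index="wazuh-module-results", id=agent_id, body=result)
```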

```mermaid
flowchart TD

subgraph Agents
    Endpoints
    Clouds
    Other_sources
end

subgraph Indexer["Indexer cluster"]

    subgraph Data_streams["Data streams"]
        alerts_stream["Alerts stream"]
        commands_stream["Commands stream"]
    end

    subgraph Indexer_modules["Indexer modules"]
        initialization["Initialization plugin"]
        commands_manager["Commands manager"]
    end

    subgraph Data_states["Data states"]
        agents_list["Agents list"]
        states["States"]
    end

end

subgraph Server["Server cluster"]

    subgraph Wazuh1["Server node n"]
        api1["Agent comms API"]
        server1["Server </br> management API"]
        Engine1["Engine"]
        VD1["VD"]
    end

    subgraph Wazuh2["Server node 2"]
        api2["Agent comms API"]
        server2["Server </br> management API"]
        Engine2["Engine"]
        VD2["VD"]
    end

end

subgraph Dashboard
    subgraph Dashboard1["Dashboard"]
    end
end

subgraph lb["Load Balancer"]
    lb_node["Per request"]
end

users["Wazuh users"] --> Dashboard
Agents -- HTTP --> lb
lb -- HTTP --> Wazuh1
lb -- HTTP --> Wazuh2
Dashboard -- HTTP --> Indexer
Wazuh1 -- HTTP --> Indexer
Wazuh2 -- HTTP --> Indexer

style Wazuh1 fill:#abc2eb
style Wazuh2 fill:#abc2eb
style Data_streams fill:#abc2eb
style Data_states fill:#abc2eb
style Dashboard1 fill:#abc2eb
style Indexer_modules fill:#abc2eb
```

Agent registration

The current Wazuh Agent enrollment process is replaced with a standard token-based validation protocol. During the Wazuh agent deployment process, the agent will be registered using the /registration endpoint in the Server management API.

The Wazuh Agents' information is stored in the Indexer component.

At this point, the Agent can use its credentials to obtain the authorization token used by the Agent comms API. This token is validated computationally on the server side and must be renewed before it expires via a /login request to the Agent comms API.
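
A minimal Python sketch of this flow from the Agent's perspective. Only the /registration and /login endpoint names come from the description above; the ports, payload fields, and the `token` field in the /login response are assumptions.

```python
# Minimal sketch of the enrollment flow from the Agent side (ports, payload
# fields, and response shape are assumptions based on this description).
import requests

MGMT_API = "https://server:55000"    # Server management API (assumed port)
COMMS_API = "https://server:27000"   # Agent comms API (assumed port)

# 1. Registration: performed once, during agent deployment.
creds = {"uuid": "agent-0001", "key": "pre-shared-or-generated-key"}
requests.post(f"{MGMT_API}/registration", json=creds, verify=False)

# 2. Login: exchange credentials for a short-lived authorization token.
token = requests.post(f"{COMMS_API}/login", json=creds, verify=False).json()["token"]

# 3. Authorized requests: the token is sent until it expires, then renewed.
headers = {"Authorization": f"Bearer {token}"}
requests.post(f"{COMMS_API}/events/stateless", json={"events": []},
              headers=headers, verify=False)
```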

```mermaid
flowchart TD

subgraph Agents
    Endpoints
    Clouds
    Other_sources
end

subgraph Indexer["Indexer cluster"]

    subgraph Data_states["Data states"]
        agents_list["Agents list"]
    end

end

subgraph Server["Server cluster"]

    subgraph Wazuh1["Server node n"]
        api1["Agent comms API"]
        server1["Server </br> management API"]

    end

end

subgraph Dashboard
    subgraph Dashboard1["Dashboard"]
    end
end

subgraph lb["Load Balancer"]
    lb_node["Per request"]
end

users["Wazuh users"] --> Dashboard
Agents -- 1. /register --> lb
Agents -- 2. /login --> lb
Agents -- 3. /request_with_token --> lb

lb -- 1. /register --> server1
lb -- 2. /login --> api1
lb -- 3. /request_with_token --> api1

Dashboard -- HTTP --> Indexer
server1 -- 1. Store credentials --> agents_list
api1 -- 2. Read credentials --> agents_list

style Wazuh1 fill:#abc2eb
style Data_states fill:#abc2eb
style Dashboard1 fill:#abc2eb
```

Stateful modules

Some Agent modules generate events based on state changes. This state persists across agent restarts.

Every Agent is responsible for its own state.

Every Agent stateful module uses a dedicated Agent comms API endpoint.

The currently identified stateful modules are:

  • FIM.
  • Inventory.
  • SCA.
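
As an illustration of how the Agent comms API could persist these states in the Indexer, the sketch below upserts a state document using a deterministic document id, so re-sending the same state overwrites the previous document instead of duplicating it. The index naming scheme and fields are assumptions.

```python
# Illustrative only: persist a stateful update with a deterministic _id so
# that the same agent + module + object always maps to the same document.
import hashlib
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "indexer", "port": 9200}], use_ssl=True)

def index_state(agent_id: str, module: str, state: dict) -> None:
    # Deterministic id derived from the agent, the module, and the object.
    doc_id = hashlib.sha1(f"{agent_id}:{module}:{state['path']}".encode()).hexdigest()
    client.index(index=f"wazuh-states-{module}", id=doc_id, body={
        "agent": {"id": agent_id},
        "state": state,
    })

# Example: a FIM state update for one monitored file.
index_state("agent-0001", "fim", {"path": "/etc/passwd", "sha256": "..."})
```
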
```mermaid
flowchart TD

subgraph Agents
    Endpoints
    Clouds
    Other_sources
end

subgraph Indexer["Indexer cluster"]

    subgraph data_states["Data states"]
        states["States"]
    end

end

subgraph Server["Server cluster"]

    subgraph Wazuh1["Server node n"]
        api1["Agent comms API"]

    end

end

subgraph Dashboard
    subgraph Dashboard1["Dashboard"]
    end
end

subgraph lb["Load Balancer"]
    lb_node["Per request"]
end

users["Wazuh users"] --> Dashboard
Agents -- /stateful --> lb

lb -- /stateful --> api1

Dashboard -- HTTP --> Indexer
api1 -- /index --> states

style Wazuh1 fill:#abc2eb
style Dashboard1 fill:#abc2eb
style data_states fill:#abc2eb
```

Stateless modules

Every Agent stateless module uses a specific Agent comms API endpoint.

Every Agent stateful event also generates a stateless event.
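
A hedged sketch of an Agent submitting a batch of stateless events. The /stateless path matches the diagrams in this issue, while the port, event schema, and payload wrapper are assumptions; the Engine then processes these events and indexes any resulting alerts into the alerts stream.

```python
# Sketch of an Agent sending a batch of stateless events. The Server does not
# persist these itself; the Engine processes them and indexes any resulting
# alerts into the alerts stream on the Indexer. Schema and port are assumed.
import requests

events = [
    {"module": "fim", "type": "modified", "path": "/etc/passwd"},
    {"module": "logcollector", "message": "sshd: Failed password for root"},
]
requests.post("https://server:27000/stateless",
              json={"agent_id": "agent-0001", "events": events},
              headers={"Authorization": "Bearer <token>"},
              verify=False)
```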

```mermaid
flowchart TD

subgraph Agents
    Endpoints
    Clouds
    Other_sources
end

subgraph Indexer["Indexer cluster"]

    subgraph Data_states["Data streams"]
        alerts_stream["Alerts stream"]
    end

end

subgraph Server["Server cluster"]

    subgraph Wazuh1["Server node n"]
        api1["Agent comms API"]
        engine["Engine"]

    end

end

subgraph Dashboard
    subgraph Dashboard1["Dashboard"]
    end
end

subgraph lb["Load Balancer"]
    lb_node["Per request"]
end

users["Wazuh users"] --> Dashboard
Agents -- /stateless --> lb

lb -- /stateless --> api1

Dashboard -- HTTP --> Indexer
api1 -- /process --> engine
engine -- /index --> alerts_stream

style Wazuh1 fill:#abc2eb
style Data_states fill:#abc2eb
style Dashboard1 fill:#abc2eb
```

Agent commands

Each Agent will use the Agent comms API's /commands endpoint to poll for commands. Agents must keep this polling active at all times, re-sending the /commands request whenever it returns or drops.

Commands will be stored as an event stream in the Indexer. This stream will be continuously processed by the Commands manager as described in the diagram.
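
A rough sketch of the Agent-side polling loop, assuming the /commands endpoint described above, a hypothetical `commands` array in the response, and a local `execute` dispatcher; the timeout and back-off values are illustrative.

```python
# Sketch of the Agent-side command polling loop: the request is re-issued
# immediately whenever it returns or drops, so there is always one poll in
# flight. Endpoint path, response shape, and timings are assumptions.
import time
import requests

COMMS_API = "https://server:27000"

def execute(command: dict) -> None:
    # Placeholder for the Agent's local command dispatcher.
    print("executing", command)

def poll_commands(token: str) -> None:
    while True:
        try:
            resp = requests.get(f"{COMMS_API}/commands",
                                headers={"Authorization": f"Bearer {token}"},
                                timeout=60, verify=False)
            for command in resp.json().get("commands", []):
                execute(command)
        except requests.RequestException:
            time.sleep(5)  # back off briefly, then resume polling
```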

```mermaid
flowchart TD

subgraph Agents
    Endpoints
    Clouds
    Other_sources
end

subgraph Indexer["Indexer cluster"]

    subgraph Data_states["Data streams"]
        commands_stream["Commands stream"]
    end

    subgraph indexer_modules["Indexer modules"]
        commands_manager["Commands manager"]
    end
end

subgraph Server["Server cluster"]

    subgraph Wazuh1["Server node n"]
        api1["Agent comms API"]
        engine["Engine"]
        server1["Server management </br> API"]

    end

end

subgraph Dashboard
    subgraph Dashboard1["Dashboard"]
    end
end

subgraph lb["Load Balancer"]
    lb_node["Per request"]
end

users["Wazuh users"] --> Dashboard
Agents -- /poll_commands --> lb

lb -- /poll_commands --> api1
Dashboard -- HTTP --> Indexer
commands_manager -- 1. /index --> commands_stream
commands_manager -- 2. /reads --> commands_stream
commands_manager -- 3. /respond --> api1
engine -- /send_commands --> commands_manager
server1 -- /send_commands --> commands_manager

style Wazuh1 fill:#abc2eb
style Data_states fill:#abc2eb
style Dashboard1 fill:#abc2eb
style indexer_modules fill:#abc2eb

```

Challenges

Some security modules require contextual data for their processing. Keeping this data in the Indexer will slow down their processing and increase the Indexer's workload.

Functional requirements

Agent registration

Stateless modules

Indexer initialization

  • The Wazuh Indexer must make sure that all of its requirements are ready during its initialization.
  • The currently identified requirements are:
    • Stream indices, mappings, and ingest pipelines.
    • State indices, mappings, and ingest pipelines.
    • Agent list index, mappings, and ingest pipelines.
    • RBAC by default: identify all necessary users and their minimum permissions.
    • Rollover + alias configuration for stream indices (see the sketch after this list).
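
As an example of one initialization step, the sketch below registers an index template for the alerts stream and bootstraps its first backing index with a write alias so rollover can later move the alias to new backing indices. The template, index, and alias names, and the mapping fields, are assumptions.

```python
# Hedged sketch of one bootstrap step for the alerts stream: register an
# index template and create the first backing index behind a write alias.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "indexer", "port": 9200}], use_ssl=True)

client.indices.put_index_template(name="wazuh-alerts", body={
    "index_patterns": ["wazuh-alerts-*"],
    "template": {
        "mappings": {"properties": {
            "timestamp": {"type": "date"},
            "agent": {"properties": {"id": {"type": "keyword"}}},
        }},
    },
})

# First backing index; the alias is the stable name that writers use.
client.indices.create(index="wazuh-alerts-000001", body={
    "aliases": {"wazuh-alerts": {"is_write_index": True}},
})
```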

Stateful modules

  • The Agent comms API module in the Server updates the states in the Indexer based on stateful requests sent by Agents.

Commands

  • Every Agent must poll for commands using the /poll_commands Agent comms API endpoint.
  • To send a command to an Agent, use the /send_commands Agent comms API endpoint.
  • During /send_commands processing, commands are written to the Commands stream in the Indexer (an illustrative command document follows this list).
  • The Indexer must process the Commands stream and deliver the command execution to the Server node where the Agent's /poll_commands request is made.
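
For illustration only, a possible shape for a document in the Commands stream; every field name here is an assumption about the eventual schema, not a committed format.

```python
# Purely illustrative command document: the Commands manager would read
# documents like this and hand them to the Server node where the target
# Agent is polling. All field names are assumed.
command_document = {
    "id": "cmd-7f3a",
    "target": {"type": "agent", "id": "agent-0001"},
    "action": {"name": "set-configuration", "args": {"log_level": "debug"}},
    "status": "pending",   # pending -> sent -> acknowledged/failed
    "timeout": 300,        # seconds before the command expires
}
```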

Non-functional requirements

  • Scalability. The system must handle a growing number of Agents and data without performance degradation.

  • Performance. The system must ensure low-latency data processing and communication between Agents and Servers. Indexer queries and updates should be optimized to handle high throughput.

  • Availability. The system must be highly available with minimal downtime. Indexer clusters should have redundancy and failover mechanisms.

  • Reliability. Data consistency and reliability must be maintained across all components. The system should guarantee the successful delivery and processing of data and commands.

  • Security. Access controls and authentication mechanisms must be enforced throughout the system.

  • Maintainability. Clear documentation and well-defined interfaces for components are necessary to facilitate development.

  • Manageability. The system should provide monitoring and alerting capabilities to track performance and health. Administrative tasks such as backups, scaling, and updates should be automated as much as possible.

  • Resource Efficiency. Efficient use of CPU, memory, and storage resources must be considered in the design.

Implementation restrictions

  • The Indexer initialization will be implemented via an Indexer plugin (Initialization plugin).
  • The Commands stream processing will be implemented via an Indexer plugin (Commands manager). The OpenSearch Job Scheduler plugin should be considered as a base for this.

Plan

Spike. ETA 6/27/2024

MVP implementation.

  • Indexer initialization plugin wazuh-indexer-plugins#9

    • Owner: @wazuh/devel-indexer
    • Teams involved: @wazuh/devel-indexer @wazuh/devel-agent @wazuh/devel-pyserver @wazuh/devel-cppserver
  • Data persistence model definition

    • Owner: @wazuh/devel-indexer
    • Teams involved: @wazuh/devel-indexer @wazuh/devel-agent @wazuh/devel-pyserver @wazuh/devel-cppserver
  • Commands Manager plugin

    • Owner: @wazuh/devel-indexer
    • Teams involved: @wazuh/devel-indexer @wazuh/devel-agent @wazuh/devel-pyserver @wazuh/devel-cppserver
  • Develop Agent Inventory module modifications wazuh-agent#15

    • Owner: @wazuh/devel-agent
    • Teams involved: @wazuh/devel-agent @wazuh/devel-pyserver @wazuh/devel-dashboard
  • Agent centralized configuration wazuh-agent#32

    • This requires the Commands manager plugin.
    • Owner: @wazuh/devel-agent
    • Teams involved: @wazuh/devel-agent @wazuh/devel-pyserver @wazuh/devel-indexer @wazuh/devel-dashboard

Feature-complete implementation.

  • Develop the Indexer initialization plugin with: RBAC by default (identifying all necessary users and their minimum permissions) and rollover + alias configuration for stream indices.

    • Owner: @wazuh/devel-indexer
    • Teams involved: @wazuh/devel-indexer @wazuh/devel-pyserver
  • Develop Agent FIM and SCA module modifications.

    • Owner: @wazuh/devel-agent
    • Teams involved: @wazuh/devel-agent @wazuh/devel-pyserver @wazuh/devel-dashboard
  • Develop the rest of default commands use cases.

    • Owner: @wazuh/devel-agent
    • Teams involved: @wazuh/devel-agent @wazuh/devel-pyserver @wazuh/devel-cppserver @wazuh/devel-indexer @wazuh/devel-dashboard

Integration (? weeks)

  • Adapt QA system tests.
  • Adapt Core integration tests.
  • Adapt Coordinator.
  • Adapt Framework integration tests.
  • Adapt Engine.
  • Adapt DevOps repositories. @wazuh/devel-devops
  • Adapt Cloud service. @wazuh/devel-cloud

Acceptance testing. (2 weeks) @wazuh/devel-qa
