Amazon Security Lake integration - Architecture and requirements #113

Closed
Tracked by #128
AlexRuiz7 opened this issue Jan 9, 2024 · 4 comments
Labels: level/task (Task issue), type/research (Research issue)

Comments


AlexRuiz7 commented Jan 9, 2024

Description

Related issue: #113

To develop an integration that acts as a custom source for Amazon Security Lake, we need to investigate and understand the architecture and the requirements that such a source must meet. This issue therefore aims to answer what the integration will look like and how it will be carried out.

Requirements and good practices

  • The custom source must be able to write data to Security Lake as a set of S3 objects.
  • The custom source must be compatible with OCSF Schema 1.0.0-rc.2.
  • The custom source data must be formatted as an Apache Parquet file.
  • The same OCSF event class should apply to each record within a Parquet-formatted object.
  • For sources that contain multiple categories of data, deliver each unique Open Cybersecurity Schema Framework (OCSF) event class as a separate source.

Source: https://docs.aws.amazon.com/security-lake/latest/userguide/custom-sources.html
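
As a rough illustration of these requirements, a minimal Python sketch could group OCSF-shaped records by their class_uid and write each group as a separate Parquet object to the custom-source S3 location. The bucket name, prefix and record fields below are placeholder assumptions, not the final layout.

```python
# Illustrative sketch only: groups OCSF-shaped records by event class and
# uploads one Parquet object per class, as Security Lake custom sources
# require. Bucket, prefix and record fields are placeholder assumptions.
from collections import defaultdict
from io import BytesIO

import boto3
import pyarrow as pa
import pyarrow.parquet as pq

s3 = boto3.client("s3")
BUCKET = "security-lake-custom-source"  # placeholder name
PREFIX = "ext/wazuh"                    # placeholder prefix

def upload_ocsf_batch(records: list[dict]) -> None:
    # Each Parquet object must contain a single OCSF event class,
    # so partition the batch by class_uid first.
    by_class = defaultdict(list)
    for record in records:
        by_class[record["class_uid"]].append(record)

    for class_uid, group in by_class.items():
        table = pa.Table.from_pylist(group)
        buffer = BytesIO()
        pq.write_table(table, buffer, compression="snappy")
        key = f"{PREFIX}/class_uid={class_uid}/events.parquet"
        s3.put_object(Bucket=BUCKET, Key=key, Body=buffer.getvalue())
```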

Architecture

Overview of Security Lake
[Image: Amazon Security Lake conceptual diagram]
Source: https://docs.aws.amazon.com/security-lake/latest/userguide/what-is-security-lake.html

Looking at the conceptual diagram of Amazon Security Lake above, it is clear that our integration as a source has to go through an Amazon S3 bucket. In particular, we are looking at the relation between Amazon S3 and "Data from SaaS application, partner solutions, cloud providers and your customer data converted to OCSF".


In order to push the data from wazuh-indexer (OpenSearch) to Amazon S3, we can use either Logstash or Data Prepper. Both tools have the input and output plugins required to read data from OpenSearch and send it to an Amazon S3 bucket.
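
For orientation only, the snippet below sketches that plumbing in plain Python: read documents from a wazuh-indexer index with opensearch-py and push them to S3 with boto3. The endpoint, credentials, index pattern and bucket name are placeholder assumptions; in practice Logstash or Data Prepper handle this loop through their plugins.

```python
# Orientation only: the read-from-OpenSearch / write-to-S3 loop that the
# Logstash or Data Prepper plugins implement for us. Endpoint, credentials,
# index pattern and bucket name are placeholder assumptions.
import json

import boto3
from opensearchpy import OpenSearch

indexer = OpenSearch(
    hosts=[{"host": "wazuh-indexer", "port": 9200}],  # placeholder endpoint
    http_auth=("admin", "admin"),                     # placeholder credentials
    use_ssl=True,
    verify_certs=False,
)
s3 = boto3.client("s3")

# Pull a batch of alerts from the indexer...
response = indexer.search(
    index="wazuh-alerts-*",
    body={"query": {"match_all": {}}, "size": 1000},
)
alerts = [hit["_source"] for hit in response["hits"]["hits"]]

# ...and deliver them to an S3 bucket as newline-delimited JSON.
s3.put_object(
    Bucket="wazuh-raw-events",  # placeholder bucket
    Key="alerts/batch-0001.json",
    Body="\n".join(json.dumps(alert) for alert in alerts).encode(),
)
```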

Logstash vs Data Prepper

Both tools provide the input and output plugins needed for this integration: an OpenSearch input and an S3 output.

Comparing the two, Logstash stands out as the better choice, for the following reasons:

  • Larger set of input & output plugins: a wider plugin ecosystem results in a more flexible, scalable and evolvable integration.
  • Larger adoption and documentation: Logstash's larger community and more extensive documentation will make the integration easier to develop and maintain.
  • Maturity: compared with Data Prepper, which is a recent project, Logstash has been developed, used and evolved for longer, making it arguably more stable.

OCSF compliant data as Apache Parquet

As Amazon Security Lake requires the data to use the OCSF schema and the Parquet encoding, we need to find a way to transform our data before delivering it to Amazon Security Lake.

Several proposals have been generated:

  1. Use an auxiliary S3 bucket to store unprocessed data (as-is), transform it using an AWS Lambda function, and send it to the Amazon Security Lake S3 bucket (a sketch of such a function follows the comparison below).
  2. Pipe the Logstash pipeline to a script that transforms and uploads the data to the Amazon Security Lake S3 bucket.
  3. Implement a Logstash output plugin or codec to transform and upload the data to the Amazon Security Lake S3 bucket.

These proposals have their advantages and disadvantages.

| Proposal | Resources required |
|----------|--------------------|
| 1 | Logstash (opensearch-input + s3-output plugins) + AWS S3 bucket + AWS Lambda function + Amazon Security Lake S3 bucket |
| 2 | Logstash (opensearch-input + pipe-output plugins) + Amazon Security Lake S3 bucket |
| 3 | Logstash (opensearch-input + s3-output plugins + custom codec) + Amazon Security Lake S3 bucket |

While proposal nr.1 is the most feasible, it is also the most expensive. Proposal nr.3, on the other hand, is the least feasible, given our limited knowledge of Ruby and of Logstash's plugin ecosystem, but it is the cheapest for the end user. Proposal nr.2 is a middle ground between the two.

We will explore proposals nr.1 and nr.2, with future plans to explore proposal nr.3, depending on our success with the other two.
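
As a rough sketch of proposal nr.1, the Lambda function below is triggered by object-created events on the auxiliary bucket, maps each raw alert to an OCSF-shaped record, and delivers the batch as Parquet to the Security Lake custom-source bucket. The field mapping, event class and bucket name are placeholder assumptions, not the final implementation.

```python
# Sketch of the Lambda function in proposal nr.1: triggered when raw data
# lands in the auxiliary bucket, it maps each event to an OCSF-shaped record
# and delivers the batch as Parquet to the Security Lake custom-source
# bucket. The field mapping and bucket name are placeholder assumptions.
import json
import urllib.parse
from io import BytesIO

import boto3
import pyarrow as pa
import pyarrow.parquet as pq

s3 = boto3.client("s3")
DESTINATION_BUCKET = "security-lake-custom-source"  # placeholder name

def to_ocsf(alert: dict) -> dict:
    # Placeholder mapping: the real one must follow the OCSF 1.0.0-rc.2
    # schema of the chosen event class (e.g. Security Finding, class_uid 2001).
    rule = alert.get("rule", {})
    return {
        "class_uid": 2001,
        "time": alert.get("timestamp"),
        "message": rule.get("description"),
        "severity_id": rule.get("level"),
        "raw_data": json.dumps(alert),
    }

def lambda_handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Read the raw, newline-delimited JSON object dropped by Logstash.
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        alerts = [json.loads(line) for line in body.splitlines() if line.strip()]

        # Convert to OCSF and re-encode as Parquet for Security Lake.
        table = pa.Table.from_pylist([to_ocsf(a) for a in alerts])
        buffer = BytesIO()
        pq.write_table(table, buffer, compression="snappy")
        s3.put_object(
            Bucket=DESTINATION_BUCKET,
            Key=key.rsplit(".", 1)[0] + ".parquet",
            Body=buffer.getvalue(),
        )
    return {"status": "ok"}
```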

Conclusions

  • The latest version of Logstash (8.12.0 as of January 30th, 2024), together with the logstash-input-opensearch plugin, will be used to implement the integration.
  • Proposal nr.1 is the most promising and will be our focus. We know of existing integrations from other companies that use this method, such as PingOne's.


AlexRuiz7 commented Apr 24, 2024

Architecture diagram of Wazuh's integration with Amazon Security Lake.

[Image: wazuh-amazon-security-lake architecture diagram]

kclinden commented

@AlexRuiz7 did you consider using Kinesis Data Firehose with the Lambda as its data transformation? This would let you skip the raw-events S3 bucket and have Firehose write directly to the Security Lake custom-source bucket.

AlexRuiz7 commented

Hi @kclinden

Not really. I'm no expert in AWS, so I went for the easiest path. I remember reading about it briefly, but IIRC it would have increased the maintenance costs. Maybe I'm wrong.

How would it work in that case? Does the data flow through Kinesis Firehose straight into the Security Lake bucket? And how do you define the OCSF class of the events in that case?

kclinden commented


Firehose would send the data to the same Lambda function that you have already put together. The benefit is that it lets you skip the intermediate S3 bucket and have Logstash write directly to Firehose. Firehose handles the data transformation by invoking the Lambda, and then Firehose itself, instead of the Lambda, writes the result to the bucket.
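
For reference, a Firehose data-transformation Lambda follows a fixed contract: it receives base64-encoded records and must return each one with its recordId, a result status and the transformed base64 data. A minimal sketch, with the OCSF mapping left as a placeholder and the Parquet conversion omitted:

```python
# Minimal sketch of a Kinesis Data Firehose transformation Lambda: decode
# each incoming record, map it to OCSF (placeholder call) and return it
# re-encoded, so that Firehose writes the transformed batch to the
# destination bucket.
import base64
import json

def to_ocsf(alert: dict) -> dict:
    # Placeholder for the same OCSF mapping discussed above.
    return alert

def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        transformed = json.dumps(to_ocsf(payload)) + "\n"
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(transformed.encode()).decode(),
        })
    return {"records": output}
```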

For the Data Prepper solution I would probably try to accomplish it all in the pipeline definition similar to this -
https://github.com/ocsf/examples/blob/main/mappings/dataprepper/AWS/v1.1.0/VPC%20Flow/pipeline.yaml
