feat(new source): aws_s3 source #4779

Merged
merged 49 commits into from Nov 10, 2020
49 commits
2cb29db
Initial implemntation of aws_3 source
jszwedko Oct 5, 2020
5b47846
Add initial integration test
jszwedko Oct 15, 2020
bf534fc
Add initial compression support
jszwedko Oct 20, 2020
7aee881
Add multiline config
jszwedko Oct 22, 2020
1242459
Remove resolved comments
jszwedko Oct 22, 2020
d8df611
Import rusoto from new location
jszwedko Oct 23, 2020
45a8525
Use Arc for credentials
jszwedko Oct 23, 2020
cbcbf6b
Validate S3 event name
jszwedko Oct 23, 2020
2feaaf6
Remove a HashMap clone
jszwedko Oct 23, 2020
9039929
Update some TODOs
jszwedko Oct 23, 2020
5f6bfb7
Add descriptive comment for reading S3 object bodies
jszwedko Oct 23, 2020
cf15d4b
Handle overflow of u64 -> i64
jszwedko Oct 23, 2020
de04df8
Validate visibility_timeout_secs
jszwedko Oct 23, 2020
825224d
Validate that object is in the same region
jszwedko Oct 23, 2020
a846258
Handle empty receipt_handles
jszwedko Oct 23, 2020
6bb4023
Raise error if we cannot parse SQS message
jszwedko Oct 23, 2020
3204b5f
Avoid redundant map
jszwedko Oct 23, 2020
eadf2a5
Add internal events
jszwedko Oct 23, 2020
cab0b68
Add support for fetching queue owned by another AWS account
jszwedko Oct 26, 2020
249e049
Cast usize in internal_events
jszwedko Oct 26, 2020
04e9803
Move region configuration up
jszwedko Oct 26, 2020
93ded11
Set timestamp to S3 object timestamp
jszwedko Oct 26, 2020
dbeec99
Initial `aws_s3` source documentation
jszwedko Oct 26, 2020
55a720d
Fix docs
jszwedko Oct 27, 2020
e0ad883
Move up cue AWS configuration
jszwedko Oct 27, 2020
5b5d01a
Add examples of parsing service logs and sqs url
jszwedko Oct 27, 2020
7055679
Add to CODEOWNERS
jszwedko Oct 27, 2020
7cc983b
Fix region configuration to allow for region or endpoint
jszwedko Oct 27, 2020
fa56529
Rename RegionParseError to RegionParse
jszwedko Oct 27, 2020
1083553
Move SQS bits to a separate file
jszwedko Oct 27, 2020
a6d81e2
Move integration tests back
jszwedko Oct 27, 2020
8256bf0
Clippy
jszwedko Oct 27, 2020
ef31e3f
Scope internal events for aws_s3 source
jszwedko Oct 27, 2020
dbc0cd1
Merge remote-tracking branch 'origin/master' into aws-s3-source
jszwedko Oct 28, 2020
0431a51
Bruce PR feedback
jszwedko Oct 28, 2020
03a7083
Validate S3 event notification version
jszwedko Oct 28, 2020
d88928d
Take ownership of io read error
jszwedko Oct 29, 2020
a9fdaef
Handle pipeline send failures
jszwedko Oct 29, 2020
c90f865
Merge remote-tracking branch 'origin/master' into aws-s3-source
jszwedko Nov 3, 2020
97eeef9
PR feedback from Jean
jszwedko Nov 3, 2020
bed6acd
clippy
jszwedko Nov 3, 2020
bb20901
Readd serde defaults
jszwedko Nov 6, 2020
a1f6d05
Use custom version parsing
jszwedko Nov 9, 2020
9b6d00f
Fix integration tests
jszwedko Nov 9, 2020
84d8953
Merge remote-tracking branch 'origin/master' into aws-s3-source
jszwedko Nov 10, 2020
9a3900a
Use queue_url for config following the pattern of https://github.com/…
jszwedko Nov 10, 2020
16800f8
Fix test error assertion
jszwedko Nov 10, 2020
33fdbdb
Fix AWS S3 integration tests
jszwedko Nov 10, 2020
c5b4345
lint internal event
jszwedko Nov 10, 2020
2 changes: 2 additions & 0 deletions .github/CODEOWNERS
@@ -26,6 +26,7 @@
/docs/reference/components/sinks/tcp.cue @bruceg @lukesteensen

/docs/reference/components/sources/apache_metrics.cue @jszwedko
/docs/reference/components/sources/aws_s3.cue @jszwedko
/docs/reference/components/sources/docker.cue @fanatid
/docs/reference/components/sources/generator.cue @bruceg
/docs/reference/components/sources/host_metrics.cue @bruceg
@@ -132,6 +133,7 @@

/src/sources/apache_metrics/ @jszwedko
/src/sources/aws_kinesis_firehose/ @jszwedko
/src/sources/aws_s3.rs @jszwedko
/src/sources/docker.rs @fanatid
/src/sources/generator.rs @bruceg
/src/sources/host_metrics.rs @bruceg
60 changes: 58 additions & 2 deletions Cargo.lock

Some generated files are not rendered by default.

8 changes: 6 additions & 2 deletions Cargo.toml
@@ -79,6 +79,7 @@ metrics-tracing-context = { version = "0.1.0-alpha" }
rusoto_core = { version = "0.45.0", features = ["encoding"], optional = true }
rusoto_es = { version = "0.45.0", optional = true }
rusoto_s3 = { version = "0.45.0", optional = true }
rusoto_sqs = { version = "0.45.0", optional = true }
rusoto_logs = { version = "0.45.0", optional = true }
rusoto_cloudwatch = { version = "0.45.0", optional = true }
rusoto_kinesis = { version = "0.45.0", optional = true }
@@ -130,7 +131,7 @@ hyper-openssl = "0.8"
openssl = "0.10.30"
openssl-probe = "0.1.2"
flate2 = "1.0.19"
- async-compression = { version = "0.3.5", features = ["tokio-02", "gzip"] }
+ async-compression = { version = "0.3.5", features = ["tokio-02", "gzip", "zstd"] }
structopt = "0.3.19"
indexmap = {version = "1.5.1", features = ["serde-1"]}
http = "0.2"
@@ -145,6 +146,7 @@ headers = "0.3"
rdkafka = { version = "0.24.0", features = ["libz", "ssl", "zstd"], optional = true }
hostname = "0.3.1"
seahash = { version = "3.0.6", optional = true }
semver = { version = "0.11.0", features = ["serde"] }
jemallocator = { version = "0.3.0", optional = true }
lazy_static = "1.3.0"
rlua = { git = "https://github.com/kyren/rlua", optional = true }
@@ -297,6 +299,7 @@ api-client = [
sources = [
"sources-apache_metrics",
"sources-aws_kinesis_firehose",
"sources-aws_s3",
"sources-docker",
"sources-file",
"sources-generator",
@@ -318,6 +321,7 @@ sources = [
]
sources-apache_metrics = []
sources-aws_kinesis_firehose = ["base64", "tls", "warp"]
sources-aws_s3 = ["rusoto_core", "rusoto_credential", "rusoto_signature", "rusoto_sts", "rusoto_s3", "rusoto_sqs"]
sources-docker = ["bollard"]
sources-file = ["bytesize", "file-source"]
sources-generator = []
@@ -500,7 +504,7 @@ aws-cloudwatch-metrics-integration-tests = ["sinks-aws_cloudwatch_metrics"]
aws-ec2-metadata-integration-tests = ["transforms-aws_ec2_metadata"]
aws-kinesis-firehose-integration-tests = ["sinks-aws_kinesis_firehose", "sinks-elasticsearch", "rusoto_es"]
aws-kinesis-streams-integration-tests = ["sinks-aws_kinesis_streams"]
- aws-s3-integration-tests = ["sinks-aws_s3"]
+ aws-s3-integration-tests = ["sources-aws_s3", "sinks-aws_s3"]
clickhouse-integration-tests = ["sinks-clickhouse", "warp"]
docker-integration-tests = ["sources-docker", "unix"]
es-integration-tests = ["sinks-elasticsearch"]
4 changes: 2 additions & 2 deletions Makefile
@@ -314,7 +314,7 @@ ifeq ($(CONTAINER_TOOL),podman)
$(CONTAINER_TOOL) run -d --$(CONTAINER_ENCLOSURE)=vector-test-integration-aws --name vector_ec2_metadata \
timberiodev/mock-ec2-metadata:latest
$(CONTAINER_TOOL) run -d --$(CONTAINER_ENCLOSURE)=vector-test-integration-aws --name vector_localstack_aws \
- -e SERVICES=kinesis,s3,cloudwatch,elasticsearch,es,firehose \
+ -e SERVICES=kinesis,s3,cloudwatch,elasticsearch,es,firehose,sqs \
localstack/localstack-full:0.11.6
$(CONTAINER_TOOL) run -d --$(CONTAINER_ENCLOSURE)=vector-test-integration-aws --name vector_mockwatchlogs \
-e RUST_LOG=trace luciofranco/mockwatchlogs:latest
@@ -324,7 +324,7 @@ else
timberiodev/mock-ec2-metadata:latest
$(CONTAINER_TOOL) run -d --$(CONTAINER_ENCLOSURE)=vector-test-integration-aws --name vector_localstack_aws \
-p 4566:4566 -p 4571:4571 \
- -e SERVICES=kinesis,s3,cloudwatch,elasticsearch,es,firehose \
+ -e SERVICES=kinesis,s3,cloudwatch,elasticsearch,es,firehose,sqs \
localstack/localstack-full:0.11.6
$(CONTAINER_TOOL) run -d --$(CONTAINER_ENCLOSURE)=vector-test-integration-aws -p 6000:6000 --name vector_mockwatchlogs \
-e RUST_LOG=trace luciofranco/mockwatchlogs:latest
153 changes: 153 additions & 0 deletions docs/reference/components/aws.cue
@@ -0,0 +1,153 @@
package metadata

import (
"strings"
)

components: [Kind=string]: [Name=string]: {
if Kind == "sink" || Kind == "source" {
if strings.HasPrefix(Name, "aws_") {
configuration: {
assume_role: {
category: "Auth"
common: false
description: "The ARN of an [IAM role](\(urls.aws_iam_role)) to assume at startup."
required: false
type: string: {
default: null
examples: ["arn:aws:iam::123456789098:role/my_role"]
}
}

endpoint: {
common: false
description: "Custom endpoint for use with AWS-compatible services. Providing a value for this option will make `region` moot."
relevant_when: "region = null"
required: false
type: string: {
default: null
examples: ["127.0.0.0:5000/path/to/service"]
}
}

region: {
description: "The [AWS region](\(urls.aws_regions)) of the target service. If `endpoint` is provided, it overrides this value, since the endpoint includes the region."
required: true
relevant_when: "endpoint = null"
type: string: {
examples: ["us-east-1"]
}
}
}

env_vars: {
AWS_ACCESS_KEY_ID: {
description: "The AWS access key ID. Used for AWS authentication when communicating with AWS services."
type: string: {
default: null
examples: ["AKIAIOSFODNN7EXAMPLE"]
}
}

AWS_CONFIG_FILE: {
description: "Specifies the location of the file that the AWS CLI uses to store configuration profiles."
type: string: {
default: "~/.aws/config"
}
}

AWS_CREDENTIAL_EXPIRATION: {
description: "Expiration time in RFC 3339 format. If unset, credentials won't expire."
type: string: {
default: null
examples: ["1996-12-19T16:39:57-08:00"]
}
}

AWS_DEFAULT_REGION: {
description: "The default [AWS region](\(urls.aws_regions))."
relevant_when: "endpoint = null"
type: string: {
default: null
examples: ["us-east-1"]
}
}

AWS_PROFILE: {
description: "Specifies the name of the CLI profile with the credentials and options to use. This can be the name of a profile stored in a credentials or config file."
type: string: {
default: "default"
examples: ["my-custom-profile"]
}
}

AWS_ROLE_SESSION_NAME: {
description: "Specifies a name to associate with the role session. This value appears in CloudTrail logs for commands performed by the user of this profile."
type: string: {
default: null
examples: ["vector-session"]
}
}

AWS_SECRET_ACCESS_KEY: {
description: "The AWS secret access key. Used for AWS authentication when communicating with AWS services."
type: string: {
default: null
examples: ["wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"]
}
}

AWS_SHARED_CREDENTIALS_FILE: {
description: "Specifies the location of the file that the AWS CLI uses to store access keys."
type: string: {
default: "~/.aws/credentials"
}
}

AWS_SESSION_TOKEN: {
description: "The AWS session token. Used for AWS authentication when communicating with AWS services."
type: string: {
default: null
examples: ["AQoDYXdz...AQoDYXdz"]
}
}
}

how_it_works: {
aws_authentication: {
title: "AWS Authentication"
body: """
Vector checks for AWS credentials in the following order:

1. Environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`.
2. The [`credential_process` command](\(urls.aws_credential_process)) in the AWS config file (usually located at `~/.aws/config`).
3. The [AWS credentials file](\(urls.aws_credentials_file)) (usually located at `~/.aws/credentials`).
4. The [IAM instance profile](\(urls.iam_instance_profile)) (only works when running on an EC2 instance with an instance profile/role).

If credentials are not found, the [healthcheck](#healthchecks) will fail and an
error will be [logged][docs.monitoring#logs].
"""
sub_sections: [
{
title: "Obtaining an access key"
body: """
In general, we recommend using instance profiles/roles whenever possible. In
cases where this is not possible you can generate an AWS access key for any user
within your AWS account. AWS provides a [detailed guide](\(urls.aws_access_keys)) on
how to do this.
"""
},
{
title: "Assuming roles"
body: """
Vector can assume an AWS IAM role via the [`assume_role`](#assume_role) option. This is an
optional setting that is helpful for a variety of use cases, such as cross-account
access.
"""
},
]
}
}
}
}
}
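
The credential lookup order documented in the `aws_authentication` section of `aws.cue` can be sketched in Rust as a first-match chain. This is an illustration only: `Host`, `CredentialSource`, and `resolve` are hypothetical names, not part of Vector or rusoto, and the boolean fields stand in for the real file parsing and instance metadata lookups, which are not shown in this diff.

```rust
use std::collections::HashMap;

/// Which step of the documented lookup order supplied credentials.
/// Hypothetical type for illustration; not a Vector or rusoto type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CredentialSource {
    Environment,
    CredentialProcess,
    CredentialsFile,
    InstanceProfile,
}

/// Hypothetical snapshot of what is available on the host.
/// The booleans stand in for real config parsing and metadata queries.
#[derive(Debug, Default)]
pub struct Host {
    pub env: HashMap<String, String>,
    pub config_has_credential_process: bool,
    pub credentials_file_has_profile: bool,
    pub has_instance_profile: bool,
}

/// Returns the first source that can supply credentials, checking in the
/// documented order, or `None` when no source is available.
pub fn resolve(host: &Host) -> Option<CredentialSource> {
    // Assumption of this sketch: an empty variable counts as unset.
    let has = |key: &str| host.env.get(key).map_or(false, |v| !v.is_empty());

    // 1. Both environment variables must be present.
    if has("AWS_ACCESS_KEY_ID") && has("AWS_SECRET_ACCESS_KEY") {
        return Some(CredentialSource::Environment);
    }
    // 2. `credential_process` in the AWS config file.
    if host.config_has_credential_process {
        return Some(CredentialSource::CredentialProcess);
    }
    // 3. The AWS credentials file.
    if host.credentials_file_has_profile {
        return Some(CredentialSource::CredentialsFile);
    }
    // 4. The IAM instance profile (EC2 only).
    if host.has_instance_profile {
        return Some(CredentialSource::InstanceProfile);
    }
    None
}
```

A `None` result corresponds to the documented failure case, in which the healthcheck fails and an error is logged. Returning the matched source rather than the credentials themselves keeps the sketch focused on ordering.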