Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enhancement: azure_blob sink #6861

Merged
merged 43 commits into from Jun 25, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
43 commits
Select commit Hold shift + click to select a range
52e23be
add empty sink & required imports
ArtemTrofimushkin Mar 16, 2021
5eaf45b
implement draft version for azure_blob sink
ArtemTrofimushkin Mar 16, 2021
0819290
enhancement(azure_blob_sink): Determine content_type from compression
ArtemTrofimushkin Mar 16, 2021
5678646
enhancement(azure_blob_sink): Implement custom health check errors
ArtemTrofimushkin Mar 16, 2021
8c3b0c9
enhancement(azure_blob_sink): Implement instrumentation
ArtemTrofimushkin Mar 16, 2021
5cbbd3a
enhancement(azure_blob_sink): Implement retries
ArtemTrofimushkin Mar 16, 2021
7a7a37f
enhancement(azure_blob_sink): Format code & implement text encoding
ArtemTrofimushkin Mar 16, 2021
d8b9949
enhancement(azure_blob_sink): refactor config options
ArtemTrofimushkin Mar 22, 2021
2f734ad
enhancement(azure_blob_sink): implement append_uuid to blob_name
ArtemTrofimushkin Mar 22, 2021
67965c7
enhancement(azure_blob_sink): implement some unit tests
ArtemTrofimushkin Mar 22, 2021
785d1e7
enhancement(azure_blob_sink): setup integration tests
ArtemTrofimushkin Mar 22, 2021
86f941e
enhancement(azure_blob_sink): implement integration test for blob wit…
ArtemTrofimushkin Mar 23, 2021
c2ab937
enhancement(azure_blob_sink): add azure-integration-test make target
ArtemTrofimushkin Mar 23, 2021
a3a7a5a
enhancement(azure_blob_sink): add azure-blob target to tests & fix fo…
ArtemTrofimushkin Mar 23, 2021
5c95c94
enhancement(azure_blob_sink): fix content encoding handling
ArtemTrofimushkin Mar 23, 2021
180dc7d
enhancement(azure_blob_sink): fix formatting & add integration test f…
ArtemTrofimushkin Mar 23, 2021
ee852dd
enhancement(azure_blob_sink): fix json encoding & implement test
ArtemTrofimushkin Mar 24, 2021
e671282
enhancement(azure_blob_sink): add test for json-gzip
ArtemTrofimushkin Mar 24, 2021
348fc5a
enhancement(azure_blob_sink): format code & implement test for blob r…
ArtemTrofimushkin Mar 24, 2021
dac2b9b
enhancement(azure_blob_sink): fix make target description
ArtemTrofimushkin Mar 26, 2021
15879be
enhancement(azure_blob_sink): fix code & Cargo.lock after merge
ArtemTrofimushkin May 12, 2021
a71be3a
enhancement(azure_blob_sink): add sink metrics draft
ArtemTrofimushkin May 12, 2021
aaf9823
enhancement(azure_blob_sink): fix unit tests
ArtemTrofimushkin May 13, 2021
3024b9b
enhancement(azure_blob_sink): upgrade sdk & some dependencies
ArtemTrofimushkin May 14, 2021
a9370bf
enhancement(azure_blob_sink): update sdk usage
ArtemTrofimushkin May 14, 2021
d722e44
enhancement(azure_blob_sink): update sdk usage & fix tests
ArtemTrofimushkin May 26, 2021
53025dd
enhancement(azure_blob_sink): pin emulator version & set required opt…
ArtemTrofimushkin May 26, 2021
db70f00
enhancement(azure_blob_sink): add additional metrics
ArtemTrofimushkin May 28, 2021
ce33999
enhancement(azure_blob_sink): upgrade sdk version
ArtemTrofimushkin Jun 9, 2021
5ba96f2
enhancement(azure_blob_sink): merge upstream changes
ArtemTrofimushkin Jun 9, 2021
52f4d69
enhancement(azure_blob_sink): fix build
ArtemTrofimushkin Jun 15, 2021
8ca7d29
enhancement(azure_blob_sink): ignore failed integration tests
ArtemTrofimushkin Jun 15, 2021
982be7a
enhancement(azure_blob_sink): fix build errors
ArtemTrofimushkin Jun 15, 2021
baf3981
enhancement(azure_blob_sink): fix pr comments
ArtemTrofimushkin Jun 15, 2021
9047878
enhancement(azure_blob_sink): merge upstream changes
ArtemTrofimushkin Jun 21, 2021
b968637
enhancement(azure_blob_sink): merge upstream changes
ArtemTrofimushkin Jun 22, 2021
4ce3adc
enhancement(azure_blob_sink): fix pr comments
ArtemTrofimushkin Jun 22, 2021
3e46344
enhancement(azure_blob_sink): fix content type & integration scripts …
ArtemTrofimushkin Jun 22, 2021
eea8a28
enhancement(azure_blob_sink): add draft for docs reference & update d…
ArtemTrofimushkin Jun 22, 2021
5fb5d8f
enhancement(azure_blob_sink): fix docs & pr comments
ArtemTrofimushkin Jun 24, 2021
4e56d10
enhancement(azure_blob_sink): fix docs
ArtemTrofimushkin Jun 24, 2021
f480c7e
Merge branch 'master' into feature/azure-blob-sink
ArtemTrofimushkin Jun 25, 2021
f2ea1d9
enhancement(azure_blob_sink): fix docs
ArtemTrofimushkin Jun 25, 2021
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
194 changes: 150 additions & 44 deletions Cargo.lock

Large diffs are not rendered by default.

18 changes: 16 additions & 2 deletions Cargo.toml
Expand Up @@ -144,6 +144,11 @@ rusoto_signature = { version = "0.46.0", optional = true }
rusoto_sqs = { version = "0.46.0", optional = true }
rusoto_sts = { version = "0.46.0", optional = true }

# Azure
azure_core = { git = "https://github.com/Azure/azure-sdk-for-rust.git", rev = "16bcf0ab1bb6e380d966a69d314de1e99ede553a", default-features = false, features = ["enable_reqwest"], optional = true }
azure_storage = { git = "https://github.com/Azure/azure-sdk-for-rust.git", rev = "16bcf0ab1bb6e380d966a69d314de1e99ede553a", default-features = false, features = ["blob"], optional = true }
reqwest = { version = "0.11", optional = true }

# Tower
tower = { version = "0.4.8", default-features = false, features = ["buffer", "limit", "retry", "timeout", "util"] }
tower-layer = { version = "0.3.1", default-features = false }
Expand Down Expand Up @@ -212,7 +217,7 @@ getset = { version = "0.1.1", default-features = false }
glob = { version = "0.3.0", default-features = false }
grok = { version = "1.1.0", default-features = false, optional = true }
headers = { version = "0.3.4", default-features = false }
heim = { version = "0.1.0-rc.1", default-features = false, features = ["cpu", "disk", "host", "memory", "net"], optional = true }
heim = { git = "https://github.com/heim-rs/heim.git", rev="b292f1535bb27c03800cdb7509fa81a40859fbbb", default-features = false, features = ["cpu", "disk", "host", "memory", "net"], optional = true }
bruceg marked this conversation as resolved.
Show resolved Hide resolved
hostname = { version = "0.3.1", default-features = false }
http = { version = "0.2.4", default-features = false }
hyper = { version = "0.14.9", default-features = false, features = ["stream"] }
Expand All @@ -233,7 +238,7 @@ async-nats = { version = "0.9.18", default-features = false, optional = true }
nom = { version = "6.1.2", default-features = false, optional = true }
notify = { version = "4.0.17", default-features = false }
num_cpus = { version = "1.13.0", default-features = false }
once_cell = { version = "1.3", default-features = false }
once_cell = { version = "1.7", default-features = false }
bruceg marked this conversation as resolved.
Show resolved Hide resolved
openssl = { version = "0.10.35", default-features = false }
openssl-probe = { version = "0.1.4", default-features = false }
percent-encoding = { version = "2.1.0", default-features = false }
Expand Down Expand Up @@ -311,6 +316,8 @@ tower-test = "0.4.0"
walkdir = "2.3.2"
quickcheck = "1.0.3"
lookup = { path = "lib/lookup", features = ["arbitrary"] }
azure_core = { git = "https://github.com/Azure/azure-sdk-for-rust.git", rev = "16bcf0ab1bb6e380d966a69d314de1e99ede553a", features = ["azurite_workaround"] }
azure_storage = { git = "https://github.com/Azure/azure-sdk-for-rust.git", rev = "16bcf0ab1bb6e380d966a69d314de1e99ede553a", features = ["azurite_workaround"] }


[patch.crates-io]
Expand Down Expand Up @@ -544,6 +551,7 @@ sinks-logs = [
"sinks-aws_kinesis_streams",
"sinks-aws_s3",
"sinks-aws_sqs",
"sinks-azure_blob",
"sinks-azure_monitor_logs",
"sinks-blackhole",
"sinks-clickhouse",
Expand Down Expand Up @@ -588,6 +596,7 @@ sinks-aws_kinesis_firehose = ["rusoto", "rusoto_firehose"]
sinks-aws_kinesis_streams = ["rusoto", "rusoto_kinesis"]
sinks-aws_s3 = ["base64", "bytesize", "md-5", "rusoto", "rusoto_s3", "uuid"]
sinks-aws_sqs = ["rusoto", "rusoto_sqs"]
sinks-azure_blob = ["bytesize", "azure_core", "azure_storage", "reqwest", "uuid"]
sinks-azure_monitor_logs = ["bytesize"]
sinks-blackhole = []
sinks-clickhouse = ["bytesize"]
Expand Down Expand Up @@ -654,6 +663,10 @@ aws-integration-tests = [
"aws-sqs-integration-tests",
]

azure-integration-tests = [
"azure-blob-integration-tests"
]

aws-cloudwatch-logs-integration-tests = ["sinks-aws_cloudwatch_logs"]
aws-cloudwatch-metrics-integration-tests = ["sinks-aws_cloudwatch_metrics"]
aws-ec2-metadata-integration-tests = ["transforms-aws_ec2_metadata"]
Expand All @@ -662,6 +675,7 @@ aws-kinesis-firehose-integration-tests = ["rusoto_es", "sinks-aws_kinesis_fireho
aws-kinesis-streams-integration-tests = ["sinks-aws_kinesis_streams"]
aws-s3-integration-tests = ["sinks-aws_s3", "sources-aws_s3"]
aws-sqs-integration-tests = ["sinks-aws_sqs"]
azure-blob-integration-tests = ["sinks-azure_blob"]
clickhouse-integration-tests = ["sinks-clickhouse", "warp"]
docker-logs-integration-tests = ["sources-docker_logs", "unix"]
es-integration-tests = ["sinks-elasticsearch"]
Expand Down
14 changes: 13 additions & 1 deletion Makefile
Expand Up @@ -301,7 +301,7 @@ test-behavior: ## Runs behaviorial test

.PHONY: test-integration
test-integration: ## Runs all integration tests
test-integration: test-integration-aws test-integration-clickhouse test-integration-docker-logs test-integration-elasticsearch
test-integration: test-integration-aws test-integration-azure test-integration-clickhouse test-integration-docker-logs test-integration-elasticsearch
test-integration: test-integration-fluent test-integration-gcp test-integration-humio test-integration-influxdb test-integration-kafka
test-integration: test-integration-logstash test-integration-loki test-integration-mongodb_metrics test-integration-nats
test-integration: test-integration-nginx test-integration-postgresql_metrics test-integration-prometheus test-integration-pulsar
Expand All @@ -319,6 +319,18 @@ ifeq ($(AUTODESPAWN), true)
@scripts/setup_integration_env.sh aws stop
endif

.PHONY: test-integration-azure
test-integration-azure: ## Runs Azure integration tests
ifeq ($(AUTOSPAWN), true)
@scripts/setup_integration_env.sh azure stop
@scripts/setup_integration_env.sh azure start
sleep 5 # Many services are very slow... Give them a sec...
endif
${MAYBE_ENVIRONMENT_EXEC} cargo test --no-fail-fast --no-default-features --features azure-integration-tests --lib ::azure_ -- --nocapture
ifeq ($(AUTODESPAWN), true)
@scripts/setup_integration_env.sh azure stop
endif

.PHONY: test-integration-clickhouse
test-integration-clickhouse: ## Runs Clickhouse integration tests
ifeq ($(AUTOSPAWN), true)
Expand Down
197 changes: 197 additions & 0 deletions docs/reference/components/sinks/azure_blob.cue
@@ -0,0 +1,197 @@
package metadata

components: sinks: azure_blob: {
title: "Azure Blob Storage"

classes: {
commonly_used: true
delivery: "at_least_once"
development: "beta"
egress_method: "batch"
service_providers: ["Azure"]
stateful: false
}

features: {
buffer: enabled: true
healthcheck: enabled: true
send: {
batch: {
enabled: true
common: true
max_bytes: 10485760
timeout_secs: 300
}
compression: {
enabled: true
default: "gzip"
algorithms: ["none", "gzip"]
levels: ["none", "fast", "default", "best", 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
}
encoding: {
enabled: true
codec: {
enabled: true
default: null
enum: ["ndjson", "text"]
}
}
request: {
enabled: true
concurrency: 50
rate_limit_num: 250
headers: false
}
tls: enabled: false
to: {
service: services.azure_blob

interface: {
socket: {
api: {
title: "Azure Blob Service REST API"
url: urls.azure_blob_endpoints
}
direction: "outgoing"
protocols: ["http"]
ssl: "required"
}
}
}
}
}

support: {
targets: {
"aarch64-unknown-linux-gnu": true
"aarch64-unknown-linux-musl": true
"armv7-unknown-linux-gnueabihf": true
"armv7-unknown-linux-musleabihf": true
"x86_64-apple-darwin": true
"x86_64-pc-windows-msv": true
"x86_64-unknown-linux-gnu": true
"x86_64-unknown-linux-musl": true
}
requirements: []
warnings: []
notices: []
}

configuration: {
connection_string: {
description: "The Azure Blob Storage Account connection string. Only authentication with access key supported."
required: true
warnings: []
type: string: {
examples: ["DefaultEndpointsProtocol=https;AccountName=mylogstorage;AccountKey=storageaccountkeybase64encoded;EndpointSuffix=core.windows.net"]
syntax: "literal"
}
}
container_name: {
description: "The Azure Blob Storage Account container name."
required: true
warnings: []
type: string: {
examples: ["my-logs"]
syntax: "literal"
}
}
blob_prefix: {
category: "File Naming"
common: true
description: "A prefix to apply to all object key names. This should be used to partition your objects, and it's important to end this value with a `/` if you want this to be the root azure storage \"folder\"."
required: false
warnings: []
type: string: {
default: "blob/%F/"
examples: ["date/%F/", "date/%F/hour/%H/", "year=%Y/month=%m/day=%d/", "kubernetes/{{ metadata.cluster }}/{{ metadata.application_name }}/"]
syntax: "template"
}
}
blob_append_uuid: {
category: "File Naming"
common: false
description: "Whether or not to append a UUID v4 token to the end of the file. This ensures there are no name collisions high volume use cases."
required: false
warnings: []
type: bool: default: true
}
blob_time_format: {
category: "File Naming"
common: false
description: "The format of the resulting object file name. [`strftime` specifiers](\(urls.strptime_specifiers)) are supported."
required: false
warnings: []
type: string: {
default: "%s"
syntax: "strftime"
}
}
}

input: {
logs: true
metrics: null
}

how_it_works: {
object_naming: {
title: "Object naming"
body: """
By default, Vector will name your blobs in the following format:

<Tabs
block={true}
defaultValue="without_compression"
values={[
{ label: 'Without Compression', value: 'without_compression', },
{ label: 'With Compression', value: 'with_compression', },
]
}>

<TabItem value="without_compression">

```text
<key_prefix><timestamp>-<uuidv4>.log
```

For example:

```text
blob/2021-06-23/1560886634-fddd7a0e-fad9-4f7e-9bce-00ae5debc563.log
```

</TabItem>
<TabItem value="with_compression">

```text
<key_prefix><timestamp>-<uuidv4>.log.gz
```

For example:

```text
blob/2021-06-23/1560886634-fddd7a0e-fad9-4f7e-9bce-00ae5debc563.log.gz
```

</TabItem>
</Tabs>

Vector appends a [UUIDV4](\(urls.uuidv4)) token to ensure there are no name
conflicts in the unlikely event 2 Vector instances are writing data at the same
time.

You can control the resulting name via the `blob_prefix`, `blob_time_format`,
and `blob_append_uuid` options.
"""
}
}

telemetry: metrics: {
events_discarded_total: components.sources.internal_metrics.output.metrics.events_discarded_total
processing_errors_total: components.sources.internal_metrics.output.metrics.processing_errors_total
http_error_response_total: components.sources.internal_metrics.output.metrics.http_error_response_total
http_request_errors_total: components.sources.internal_metrics.output.metrics.http_request_errors_total
processed_bytes_total: components.sources.internal_metrics.output.metrics.processed_bytes_total
}
}
10 changes: 10 additions & 0 deletions docs/reference/services/azure_blob.cue
@@ -0,0 +1,10 @@
package metadata

services: azure_blob: {
name: "Azure Blob Storage "
thing: "a \(name) account"
url: urls.azure_blob
versions: null

description: "[Azure Blob Storage][urls.azure_blob] is Microsoft's object storage solution for the cloud. Blob storage is optimized for storing massive amounts of unstructured data. Unstructured data is data that doesn't adhere to a particular data model or definition, such as text or binary data."
}
2 changes: 2 additions & 0 deletions docs/reference/urls.cue
Expand Up @@ -79,6 +79,8 @@ urls: {
aws_sqs_api: "\(aws_docs)/AWSSimpleQueueService/latest/APIReference/Welcome.html"
aws_sqs_create: "\(aws_docs)/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-configure-create-queue.html"
aws_vpc_flow_logs: "\(aws_docs)/vpc/latest/userguide/flow-logs.html"
azure_blob: "https://azure.microsoft.com/en-us/services/storage/blobs/"
azure_blob_endpoints: "https://docs.microsoft.com/en-us/rest/api/storageservices/blob-service-rest-api"
azure_monitor: "https://azure.microsoft.com/en-us/services/monitor/"
azure_monitor_logs_endpoints: "https://docs.microsoft.com/en-us/rest/api/monitor/"
base64: "\(wikipedia)/wiki/Base64"
Expand Down
48 changes: 48 additions & 0 deletions scripts/setup_integration/azure_integration_env.sh
@@ -0,0 +1,48 @@
#!/usr/bin/env bash
set -o pipefail

# azure_integration_env.sh
#
# SUMMARY
#
# Builds and pulls down the Vector Azure Integration test environment

set -x

if [ $# -ne 1 ]; then
echo "Usage: $0 {stop|start}" 1>&2
exit 1
exit 1
fi
ACTION=$1

#
# Functions
#

start_podman() {
podman pod create --replace --name vector-test-integration-azure -p 10000-10001:10000-10001
podman run -d --pod=vector-test-integration-azure --name vector_local_azure_blob \
mcr.microsoft.com/azure-storage/azurite:3.11.0 azurite --blobHost 0.0.0.0 --loose
}

start_docker() {
docker network create vector-test-integration-azure
docker run -d --network=vector-test-integration-azure -v /var/run:/var/run -p 10000-10001:10000-10001 --name vector_local_azure_blob \
mcr.microsoft.com/azure-storage/azurite:3.11.0 azurite --blobHost 0.0.0.0 --loose
}

stop_podman() {
podman rm --force vector_local_azure_blob 2>/dev/null; true
podman pod stop vector-test-integration-azure 2>/dev/null; true
podman pod rm --force vector-test-integration-azure 2>/dev/null; true
}

stop_docker() {
docker rm --force vector_local_azure_blob 2>/dev/null; true
docker network rm vector-test-integration-azure 2>/dev/null; true
}

echo "Running $ACTION action for Azure integration tests environment"

"${ACTION}"_"${CONTAINER_TOOL}"