[awsfirehose] Bug receiving Cloudwatch logs #38433

Open
shivanshs9 opened this issue Mar 6, 2025 · 8 comments
Labels: bug, receiver/awsfirehose

shivanshs9 commented Mar 6, 2025

Component(s)

receiver/awsfirehose

What happened?

Description

I have a couple of CloudWatch log groups that I'm trying to stream to OpenTelemetry using Firehose. However, the receiver sometimes fails to process an event and logs the error below.
Some logs are processed successfully, while others hit this issue.

Steps to Reproduce

Use the Firehose receiver to receive a CloudWatch logs stream, as per the AWS doc: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SubscriptionFilters.html#FirehoseExample
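
For reference, a minimal sketch of how such a subscription filter can be created with the AWS SDK for Go v2; all names and ARNs here are placeholders, not the actual ones used:

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/cloudwatchlogs"
)

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}
	client := cloudwatchlogs.NewFromConfig(cfg)

	// Subscribe a log group to a Firehose delivery stream. The stream's HTTP
	// endpoint destination should point at the collector's awsfirehose
	// receiver (0.0.0.0:4343 in the configuration below).
	_, err = client.PutSubscriptionFilter(ctx, &cloudwatchlogs.PutSubscriptionFilterInput{
		LogGroupName:   aws.String("/aws/example/app"),  // placeholder log group
		FilterName:     aws.String("to-otel-firehose"),  // placeholder filter name
		FilterPattern:  aws.String(""),                  // empty pattern = all events
		DestinationArn: aws.String("arn:aws:firehose:us-east-1:111111111111:deliverystream/example"), // placeholder
		RoleArn:        aws.String("arn:aws:iam::111111111111:role/CWLtoFirehoseRole"),               // placeholder
	})
	if err != nil {
		log.Fatal(err)
	}
}
```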

Expected Result

Logs to be processed successfully by the receiver.

Actual Result

The following error is logged by the Firehose receiver.

Collector version

0.120.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler (if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

receivers:
  awsfirehose/cwlogs:
    endpoint: 0.0.0.0:4343
    record_type: cwlogs
exporters:
  otlp:
    endpoint: ${env:OTEL_EXPORTER_OTLP_ENDPOINT}
    headers:
      signoz-access-token: ${env:SIGNOZ_API_KEY}
    tls:
      insecure: ${env:OTEL_EXPORTER_OTLP_INSECURE}
      insecure_skip_verify: ${env:OTEL_EXPORTER_OTLP_INSECURE_SKIP_VERIFY}
processors:
  batch:
    send_batch_size: 10000
    timeout: 10s
  memory_limiter:
    check_interval: 1s
    limit_percentage: 75
    spike_limit_percentage: 15
  resource/deployenv:
    attributes:
      - action: insert
        key: deployment.environment
        value: ${env:DEPLOYMENT_ENVIRONMENT}
  resourcedetection:
    detectors:
      - env
    override: false
    timeout: 2s
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
service:
  extensions:
    - health_check
  telemetry:
    metrics:
      address: 0.0.0.0:8888
  pipelines:
    logs/cwstream:
      exporters:
        - otlp
      processors:
        - resource/deployenv
        - resourcedetection
        - memory_limiter
        - batch
      receivers:
        - awsfirehose/cwlogs

Log output

2025-03-06T15:40:56.198Z	error	cwlog/unmarshaler.go:85	Invalid log	{"otelcol.component.id": "awsfirehose/cwlogs", "otelcol.component.kind": "Receiver", "otelcol.signal": "logs", "datum_index": 0}
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awsfirehosereceiver/internal/unmarshaler/cwlog.(*Unmarshaler).UnmarshalLogs
	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awsfirehosereceiver@v0.120.1/internal/unmarshaler/cwlog/unmarshaler.go:85
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awsfirehosereceiver.(*logsConsumer).Consume
	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awsfirehosereceiver@v0.120.1/logs_receiver.go:72
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awsfirehosereceiver.(*firehoseReceiver).ServeHTTP
	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awsfirehosereceiver@v0.120.1/receiver.go:222
go.opentelemetry.io/collector/config/confighttp.(*decompressor).ServeHTTP
	go.opentelemetry.io/collector/config/confighttp@v0.120.0/compression.go:183
go.opentelemetry.io/collector/config/confighttp.(*ServerConfig).ToServer.maxRequestBodySizeInterceptor.func2
	go.opentelemetry.io/collector/config/confighttp@v0.120.0/confighttp.go:578
net/http.HandlerFunc.ServeHTTP
	net/http/server.go:2294
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*middleware).serveHTTP
	go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp@v0.59.0/handler.go:179
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.NewMiddleware.func1.1
	go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp@v0.59.0/handler.go:67
net/http.HandlerFunc.ServeHTTP
	net/http/server.go:2294
go.opentelemetry.io/collector/config/confighttp.(*clientInfoHandler).ServeHTTP
	go.opentelemetry.io/collector/config/confighttp@v0.120.0/clientinfohandler.go:26
net/http.serverHandler.ServeHTTP
	net/http/server.go:3301
net/http.(*conn).serve
	net/http/server.go:2102
2025-03-06T15:40:56.198Z	error	awsfirehosereceiver@v0.120.1/receiver.go:224	Unable to consume records	{"otelcol.component.id": "awsfirehose/cwlogs", "otelcol.component.kind": "Receiver", "otelcol.signal": "logs", "error": "record format invalid"}
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awsfirehosereceiver.(*firehoseReceiver).ServeHTTP
	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awsfirehosereceiver@v0.120.1/receiver.go:224
go.opentelemetry.io/collector/config/confighttp.(*decompressor).ServeHTTP
	go.opentelemetry.io/collector/config/confighttp@v0.120.0/compression.go:183
go.opentelemetry.io/collector/config/confighttp.(*ServerConfig).ToServer.maxRequestBodySizeInterceptor.func2
	go.opentelemetry.io/collector/config/confighttp@v0.120.0/confighttp.go:578
net/http.HandlerFunc.ServeHTTP
	net/http/server.go:2294
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*middleware).serveHTTP
	go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp@v0.59.0/handler.go:179
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.NewMiddleware.func1.1
	go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp@v0.59.0/handler.go:67
net/http.HandlerFunc.ServeHTTP
	net/http/server.go:2294
go.opentelemetry.io/collector/config/confighttp.(*clientInfoHandler).ServeHTTP
	go.opentelemetry.io/collector/config/confighttp@v0.120.0/clientinfohandler.go:26
net/http.serverHandler.ServeHTTP
	net/http/server.go:3301
net/http.(*conn).serve
	net/http/server.go:2102
2025-03-06T15:54:41.410Z	error	cwlog/unmarshaler.go:52	Expected *gzip.Reader, got *gzip.Reader	{"otelcol.component.id": "awsfirehose/cwlogs", "otelcol.component.kind": "Receiver", "otelcol.signal": "logs"}
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awsfirehosereceiver/internal/unmarshaler/cwlog.(*Unmarshaler).UnmarshalLogs
	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awsfirehosereceiver@v0.120.1/internal/unmarshaler/cwlog/unmarshaler.go:52
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awsfirehosereceiver.(*logsConsumer).Consume
	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awsfirehosereceiver@v0.120.1/logs_receiver.go:72
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awsfirehosereceiver.(*firehoseReceiver).ServeHTTP
	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awsfirehosereceiver@v0.120.1/receiver.go:222
go.opentelemetry.io/collector/config/confighttp.(*decompressor).ServeHTTP
	go.opentelemetry.io/collector/config/confighttp@v0.120.0/compression.go:183
go.opentelemetry.io/collector/config/confighttp.(*ServerConfig).ToServer.maxRequestBodySizeInterceptor.func2
	go.opentelemetry.io/collector/config/confighttp@v0.120.0/confighttp.go:578
net/http.HandlerFunc.ServeHTTP
	net/http/server.go:2294
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*middleware).serveHTTP
	go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp@v0.59.0/handler.go:179
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.NewMiddleware.func1.1
	go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp@v0.59.0/handler.go:67
net/http.HandlerFunc.ServeHTTP
	net/http/server.go:2294
go.opentelemetry.io/collector/config/confighttp.(*clientInfoHandler).ServeHTTP
	go.opentelemetry.io/collector/config/confighttp@v0.120.0/clientinfohandler.go:26
net/http.serverHandler.ServeHTTP
	net/http/server.go:3301
net/http.(*conn).serve
	net/http/server.go:2102

Additional context

No response

@shivanshs9 added the bug and needs triage labels Mar 6, 2025
@shivanshs9 changed the title from "Bug receiving Cloudwatch logs" to "[awsfirehose] Bug receiving Cloudwatch logs" Mar 6, 2025
github-actions bot commented Mar 6, 2025

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

axw (Contributor) commented Mar 7, 2025

@shivanshs9 can you please try with v0.119.0 and see if the issue reproduces there? I made some fairly large changes in v0.120.0, so this would help narrow down whether it's a regression or pre-existing issue.

axw (Contributor) commented Mar 7, 2025

Use Firehose receiver to receive cloudwatch logs stream as per the AWS doc: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SubscriptionFilters.html#FirehoseExample

Also, that example uses the S3 destination, whereas you would have had to use the HTTP endpoint destination. Could you please show how you configured the destination?

axw (Contributor) commented Mar 7, 2025

I just tested with v0.119.0, and I observe error messages related to control messages like this:

{"messageType":"CONTROL_MESSAGE","owner":"CloudwatchLogs","logGroup":"","logStream":"","subscriptionFilters":[],"logEvents":[{"id":"","timestamp":1741312971934,"message":"CWL CONTROL MESSAGE: Checking health of destination Firehose."}]}

However, I still see CloudWatch logs being delivered. I'll put up a PR to fix this.
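
For reference, that record has roughly the following shape in Go (field names inferred from the sample above, not taken from the receiver's code):

```go
// cwLogRecord mirrors the JSON document CloudWatch Logs delivers through a
// Firehose subscription (shape inferred from the sample record above).
type cwLogRecord struct {
	MessageType         string   `json:"messageType"` // "DATA_MESSAGE" or "CONTROL_MESSAGE"
	Owner               string   `json:"owner"`
	LogGroup            string   `json:"logGroup"`
	LogStream           string   `json:"logStream"`
	SubscriptionFilters []string `json:"subscriptionFilters"`
	LogEvents           []struct {
		ID        string `json:"id"`
		Timestamp int64  `json:"timestamp"` // milliseconds since the Unix epoch
		Message   string `json:"message"`
	} `json:"logEvents"`
}
```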

When you say:

I have couple of Cloudwatch log groups that I'm trying to stream to Open Telemetry using Firehose. However it's unable to process the event sometimes and logs the error.
Some logs are successfully being processed and some are facing this issue.

Are you basing that just off the error messages in the log? Or are some records from your log groups not getting delivered?

shivanshs9 (Author) commented:
@axw Sorry for the late reply.
In SigNoz, I see only some of the log groups, not all of them, so I assumed the ones I don't see are the ones causing this error.

But from your PR, it looks like the issue is with the event type delivered by CloudWatch rather than a problem with the logs themselves. Maybe the missing service simply hasn't emitted any new log events.

Is it possible for me to try out your PR build and see whether the errors still occur?

@atoulme removed the needs triage label Mar 8, 2025
axw (Contributor) commented Mar 8, 2025

Is it possible for me to try out your PR build and see whether the errors still occur?

@shivanshs9 you will need to build it locally: if you clone my branch, you can build the collector with make otelcontribcol. Otherwise, if my PR is merged by March 17, it will be available in v0.122.0 per https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/release.md#release-schedule

axw (Contributor) commented Mar 11, 2025

@shivanshs9 I just realised you can also grab a binary from CI: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/13733452339

atoulme pushed a commit that referenced this issue Mar 12, 2025
#### Description

Fix the CloudWatch logs unmarshaler so it ignores CONTROL_MESSAGE log
records, rather than returning an error. As mentioned at
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SubscriptionFilters.html,
CONTROL_MESSAGE records are produced for health-checking the
destination.
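
As a rough sketch of the intended behaviour (illustrative only, not the exact code in this PR; processRecords and emit are placeholder names, and cwLogRecord is the record shape sketched earlier in the thread):

```go
package cwlogsketch

import (
	"encoding/json"
	"fmt"
)

// cwLogRecord is the subscription-record shape sketched earlier in the thread;
// only the field needed here is repeated.
type cwLogRecord struct {
	MessageType string `json:"messageType"` // "DATA_MESSAGE" or "CONTROL_MESSAGE"
}

// processRecords is an illustrative stand-in for the unmarshaler's loop:
// CONTROL_MESSAGE records are skipped rather than treated as invalid.
func processRecords(records [][]byte, emit func(cwLogRecord)) error {
	for i, raw := range records {
		var rec cwLogRecord
		if err := json.Unmarshal(raw, &rec); err != nil {
			return fmt.Errorf("datum %d: record format invalid: %w", i, err)
		}
		if rec.MessageType == "CONTROL_MESSAGE" {
			continue // health-check record from CloudWatch Logs; carries no log data
		}
		emit(rec)
	}
	return nil
}
```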

#### Link to tracking issue

Relates to #38433

(Not sure yet if it fixes it.)

#### Testing

- Added a unit test case.
- Created a Firehose delivery stream & CloudWatch subscription filter, pointing at the collector.
- Reproduced the error without the change (#38433 (comment)).
  - Verified that the error no longer occurs with my fix.
- Verified that non-control log records are still produced by the receiver.

#### Documentation

N/A

axw (Contributor) commented Mar 12, 2025

My PR has been merged, so please test with v0.122.0 when it's out and let us know if you're still observing issues.
