Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vector sink fails when sending payloads bigger than 4MB #17926

Closed
trennepohl opened this issue Jul 10, 2023 · 9 comments · Fixed by #18186
Closed

Vector sink fails when sending payloads bigger than 4MB #17926

trennepohl opened this issue Jul 10, 2023 · 9 comments · Fixed by #18186
Labels
meta: regression This issue represents a regression sink: vector Anything `vector` sink related source: vector Anything `vector` source related type: bug A code related bug.
Milestone

Comments

@trennepohl
Copy link
Contributor

trennepohl commented Jul 10, 2023

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

Vector sink fails to push messages bigger than 4MB.

2023-07-10T11:54:59.051789Z ERROR sink{component_kind="sink" component_id=out_aggregator component_type=vector component_name=out_aggregator}:request{request_id=5159}: vector::sinks::util::retries: Non-retriable er
ror; dropping the request. error=Request failed: status: OutOfRange, message: "Error, message length too large: found 5000077 bytes, the limit is: 4194304 bytes", details: [], metadata: MetadataMap { headers: {"con
tent-type": "application/grpc", "server": "envoy", "x-envoy-upstream-service-time": "124", "content-length": "0", "date": "Mon, 10 Jul 2023 11:54:59 GMT"} } internal_log_rate_limit=true

Configuration

[sinks.out_aggregator]
  type = "vector"
  inputs = [ "add_gcp_info" ]
  address = "${AGGREGATOR_ADDRESS}"
  version = "2"
  batch.max_bytes = 5000000
  request.concurrency = "adaptive"
  request.timeout_secs = 30
  request.retry_max_duration_secs = 10
  buffer.type = "memory"
  buffer.when_full = "block"
  buffer.max_events = 30000

Version

0.31.0

Additional Context

After upgrading Vector to version 0.31.0 we started seeing this error.

After a bit of searching on the internet I bumped into this article.
https://cprimozic.net/notes/posts/rust-tonic-request-response-size-limits/

Looks like from tonic 0.9 default max size is 4mb.

References

@trennepohl trennepohl added the type: bug A code related bug. label Jul 10, 2023
@jszwedko
Copy link
Member

Thanks for opening this @trennepohl ! It seems like we need to:

  • Set the max encoding size in the Vector sink to whatever the maximum configured batch size is
  • Allow configuration of the max decoding size in the Vector source to let users set it higher than 4 MB

Alternatively to preserve backwards compatibility we could remove the limits again and re-add them later. What version were you upgrading from where it worked?

@jszwedko jszwedko added sink: vector Anything `vector` sink related source: vector Anything `vector` source related meta: regression This issue represents a regression labels Jul 10, 2023
@jszwedko jszwedko added this to the Vector 0.31.1 milestone Jul 10, 2023
@trennepohl
Copy link
Contributor Author

trennepohl commented Jul 10, 2023

Thanks for opening this @trennepohl !

No worries 👍

Set the max encoding size in the Vector sink to whatever the maximum configured batch size is

Yeah I tried to do that using max_decoding_message_size on the proto_client in Vector's sink, but didn't work 🤷
Not super familiar with Rust 😅 .

What version were you upgrading from where it worked?

0.29

@jszwedko
Copy link
Member

Thanks for the additional context @trennepohl ! I think we can probably back out the limits for now and then re-add them later; tying them to the batch sizes.

jszwedko added a commit that referenced this issue Aug 8, 2023
This was added in tonic 0.9 but caused a regression in Vector behavior. I think ideally this would
be configurable, but I think in this case we can let users ask for it since the use-case for the
`vector` source/sink is primarily such that the operator controls both sides and doesn't need to
worry about DOS from misbehaving clients.

Fixes: #17926

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>
github-merge-queue bot pushed a commit that referenced this issue Aug 8, 2023
This was added in tonic 0.9 but caused a regression in Vector behavior. I think ideally this would
be configurable, but I think in this case we can let users ask for it since the use-case for the
`vector` source/sink is primarily such that the operator controls both sides and doesn't need to
worry about DOS from misbehaving clients.

Fixes: #17926

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>
neuronull pushed a commit that referenced this issue Aug 16, 2023
This was added in tonic 0.9 but caused a regression in Vector behavior. I think ideally this would
be configurable, but I think in this case we can let users ask for it since the use-case for the
`vector` source/sink is primarily such that the operator controls both sides and doesn't need to
worry about DOS from misbehaving clients.

Fixes: #17926

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>
@jszwedko jszwedko added this to the Vector 0.32.0 milestone Aug 17, 2023
@slgero
Copy link

slgero commented Aug 18, 2023

@jszwedko Hi, I've updated vector to 0.32.0 and still get this error:

Some(Request { source: Status { code: OutOfRange, message: "Error, message length too large: found 4612000 bytes, the limit is: 4194304 bytes", metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Fri, 18 Aug 2023 09:15:09 GMT", "content-length": "0"} }, source: None } })

My configuration:

[sinks.center_vector]
    address = "${AGGREGATOR_ADDRESS}"
    compression = true
    healthcheck = false
    inputs = ["log_files", "vector_logs"]
    type = "vector"
[sinks.center_vector.batch]
    max_events = 5000
    timeout_secs = 1
[sinks.center_vector.buffer]
    max_events = 6000
    type = "memory"
    when_full = "block"

Any thoughts about this?

@jszwedko
Copy link
Member

Huh, interesting. I admittedly hadn't actually tested this 😓 I just followed the recommendations. Let me try it with your config. I'll reopen this to track.

@jszwedko jszwedko reopened this Aug 18, 2023
@jszwedko
Copy link
Member

jszwedko commented Aug 18, 2023

Hi @slgero ,

I'm having trouble reproducing this with 0.32.0 though I can with 0.31.0. Are you sure you are running v0.32.0 as the receiver (that is, the Vector instance with the vector source)?

As I was setting up the reproduction I remembered I did actually test this change locally.

@neuronull
Copy link
Contributor

I was also able to repro this with 0.31.0 but not on 0.32.0. In the A/B cases, both sender and receiver were on the same version.

@slgero
Copy link

slgero commented Aug 21, 2023

Hi, @jszwedko,
As soon as I updated the version on the centre vector too, the error disappeared. I've only updated the version at the agent before.
So you can close the issue. Thank you!

@jszwedko
Copy link
Member

Great, thanks for confirming @slgero !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
meta: regression This issue represents a regression sink: vector Anything `vector` sink related source: vector Anything `vector` source related type: bug A code related bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants