S3_SINK creates plenty of little objects #19210

@piellick

Description

Problem

Hi,

we are seeing the following behavior: the S3 sink creates millions of small objects like the ones below:

2023-11-21-1700559645-3d9d9b4d-896e-431e-bd57-7af8555d2400.log 2.28 KB
2023-11-21-1700559705-07bfca1e-10b4-41df-ac84-6e5bd9d7362f.log 2.28 KB
2023-11-21-1700559735-3fae8ed5-4a66-4b17-b0e0-0bdef23c67be.log 2.28 KB
2023-11-21-1700559915-3b1fe7ee-26ca-4be0-b3ca-4fcd5983e777.log 4.56 KB
2023-11-21-1700559946-83074ea9-95da-47e2-a3fc-6dfa99d479c3.log 3.42 KB
2023-11-21-1700560006-0397585b-7f62-4329-9c43-8103537f01f3.log 3.42 KB
2023-11-21-1700560036-b976d418-3582-4f17-82c1-6b4c3b7b4980.log 2.48 KB
....

We run a StatefulSet with 3 pods in aggregator mode. On these aggregators, the S3 sink is configured as follows:

   bucket_S3:
       type: aws_s3
       inputs: [LTL_vector_agg]
       bucket: ${BUCKET_NAME}
       region: ${BUCKET_REGION}
       storage_class: "STANDARD"
       compression: none
       endpoint: "https://${BUCKET_NAME}.s3.${BUCKET_REGION}.my_endpoint"
       key_prefix: "%Y-%m-%d-"
       filename_append_uuid: true
       encoding:
         codec: "raw_message" #keep raw format.                      
       auth:
         access_key_id: ${BUCKET_ACCESS_KEY}
         secret_access_key: ${BUCKET_SECRET_KEY}
       healthcheck:
         enabled: false
       buffer:
       - type: disk
         max_size: 10073741824  # default to 10GiB.
         when_full: drop_newest

Is this normal behavior, or is it possible to tell this sink to create bigger objects?
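A minimal sketch, assuming the `batch` options documented for Vector's `aws_s3` sink (`batch.max_bytes` and `batch.timeout_secs`). The sink writes one object per flushed batch, and a batch is flushed when it reaches its size limit or its timeout, whichever comes first. With low traffic the timeout is reached first, so each flush yields a small object; raising both limits should produce fewer, larger objects. The values below are illustrative assumptions, not tested recommendations.

```yaml
   bucket_S3:
       type: aws_s3
       # all other options as in the configuration above (illustrative fragment)
       batch:
         max_bytes: 100000000   # assumed value: flush after roughly 100 MB of events...
         timeout_secs: 900      # ...or after 15 minutes, whichever limit is hit first
```

Each of the 3 aggregator pods batches independently, so even with these settings you should still expect at least one object per pod per flush. A longer timeout also means events wait longer before they land in the bucket.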

Configuration

No response

Version

0.34

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

Metadata

Assignees

No one assigned

    Labels

    domain: config (Anything related to configuring Vector)
    sink: aws_s3 (Anything `aws_s3` sink related)
