Skip to content

feat(clickhouse sink): add query_settings to clickhouse sink #22764

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

pm5
Copy link
Contributor

@pm5 pm5 commented Mar 31, 2025

Summary

It PR adds a query_settings field in the configuration of clickhouse sink, with a few configurations for asynchronous inserts.

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

How did you test this PR?

I use the following Vector config to test it:

sources:
  demo_logs:
    type: demo_logs
    format: syslog
    count: 10
    interval: 0.0

sinks:
  clickhouse:
    inputs: ["demo_logs"]
    type: clickhouse
    endpoint: http://localhost:8123/
    database: default
    table: demo_logs
    skip_unknown_fields: true
    auth:
      user: default
      password: ''
      strategy: basic
    query_settings:
      async_insert: true
      wait_for_async_insert: true
      async_insert_max_data_size: 1000000
      async_insert_max_query_number: 1000000

Start ClickHouse, enable asynchronous logs, create a demo_logs table:

CREATE TABLE default.demo_logs
(
    `host` String,
    `message` String,
    `service` String,
    `source_type` String,
    `timestamp` String
)
ENGINE = MergeTree
ORDER BY timestamp

Run Vector. And then check ClickHouse logs for asynchronous inserts:

select now(), * from system.asynchronous_insert_log order by event_time format Vertical

You should see the queries done by Vector.

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the "no-changelog" label to this PR.

Checklist

  • Please read our Vector contributor resources.
    • make check-all is a good command to run locally. This check is
      defined here. Some of these
      checks might not be relevant to your PR. For Rust changes, at the very least you should run:
      • cargo fmt --all
      • cargo clippy --workspace --all-targets -- -D warnings
      • cargo nextest run --workspace (alternatively, you can run cargo test --all)
  • If this PR introduces changes Vector dependencies (modifies Cargo.lock), please
    run dd-rust-license-tool write to regenerate the license inventory and commit the changes (if any). More details here.

References

@pm5 pm5 requested a review from a team as a code owner March 31, 2025 14:27
@bits-bot
Copy link

bits-bot commented Mar 31, 2025

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added the domain: sinks Anything related to the Vector's sinks label Mar 31, 2025
@pm5 pm5 changed the title feat(sink: clickhouse) #22373: add query_settings to clickhouse sink feat(sink: clickhouse): add query_settings to clickhouse sink Mar 31, 2025
@pm5 pm5 force-pushed the feat-clickhouse-querysettings branch 2 times, most recently from a017958 to a7b55af Compare March 31, 2025 14:41
@pront pront added the sink: clickhouse Anything `clickhouse` sink related label Mar 31, 2025
@pm5
Copy link
Contributor Author

pm5 commented Apr 7, 2025

Hi @pront ! Thanks for updating the info about this PR. Is there anything I can add to help with the reviewing?

@pront pront changed the title feat(sink: clickhouse): add query_settings to clickhouse sink feat(clickhouse sink): add query_settings to clickhouse sink Apr 7, 2025
Copy link
Member

@pront pront left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @pm5, this looks good overall. Left a small comment.

pub query_settings: ClickHouseQuerySettingsConfig,
}

/// Query settings for the `clickhouse` sink.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the fields are related to https://clickhouse.com/docs/cloud/bestpractices/asynchronous-inserts. Maybe we can rename to async_insert_setttings and add a link to the docs.

Copy link
Contributor Author

@pm5 pm5 Apr 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is #22373 which suggests moving the three existing query settings into a query_settings option, so this was implemented with that in mind. Maybe we can instead move the existing query settings into query_settings as proposed in the future.

Also, not all of ClickHouse settings make sense in the context of Vector, but maybe things like insert_deduplicate do make sense. We can also add these into the query_settings option when there is need for them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can for sure add a link to the doc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the links above on the query_settings option.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically all these settings have the same async_insert prefix (or suffix), that's why I was thinking we might be able to avoid repeating it. But it's a minor UX concern.

Copy link
Contributor Author

@pm5 pm5 Apr 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I prefer keeping the original ClickHouse option names for now. There are just a lot of them and a lot of work if we want to bring additional organisation into them. So if you don't mind, I'm closing this thread.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I prefer keeping the original ClickHouse option names for now.

This is necessary for backwards compatibility. I was referring only to new fields. For example we now have:

query_settings:
  async_insert: true
  wait_for_async_insert: true
  async_insert_deduplicate: true

An alternative is:

query_settings:
  async_insert:
    wait_for_async_insert: true
    async_insert_deduplicate: true

I also realized there's a nuance with async_insert being omitted and other fields like wait_for_async_insert being present in the config.

Copy link
Contributor Author

@pm5 pm5 Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how this is better :) Isn't the alternative duplicating the async keyword in the subcategory name and the config names?

@pm5 pm5 force-pushed the feat-clickhouse-querysettings branch from 232c2c0 to 17e0442 Compare April 9, 2025 04:29
@pm5
Copy link
Contributor Author

pm5 commented Apr 10, 2025

Thanks @pront . Let me know what you think of the current version.

@pm5 pm5 force-pushed the feat-clickhouse-querysettings branch 2 times, most recently from 411a447 to 4440d53 Compare April 14, 2025 16:34
@pm5 pm5 force-pushed the feat-clickhouse-querysettings branch 2 times, most recently from d48b1dc to 0f93164 Compare April 14, 2025 17:57
@pm5 pm5 force-pushed the feat-clickhouse-querysettings branch from 0f93164 to 9906d10 Compare April 14, 2025 17:59
@pm5
Copy link
Contributor Author

pm5 commented Apr 15, 2025

I found a problem with bool type options and fixed it. Also asked clippy to allow more arguments on set_uri_query.

@pront
Copy link
Member

pront commented Apr 15, 2025

I found a problem with bool type options and fixed it. Also asked clippy to allow more arguments on set_uri_query.

Please do not force push because I have to review the whole PR every time 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: sinks Anything related to the Vector's sinks sink: clickhouse Anything `clickhouse` sink related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature request: ClickHouse query settings
3 participants