feat(clickhouse sink): add query_settings to clickhouse sink #22764
Hi @pront! Thanks for updating the info about this PR. Is there anything I can add to help with the reviewing?
Thank you @pm5, this looks good overall. Left a small comment.
    pub query_settings: ClickHouseQuerySettingsConfig,
}

/// Query settings for the `clickhouse` sink.
All the fields are related to https://clickhouse.com/docs/cloud/bestpractices/asynchronous-inserts. Maybe we can rename to async_insert_settings and add a link to the docs.
There is #22373, which suggests moving the three existing query settings into a query_settings option, so this was implemented with that in mind. Maybe we can instead move the existing query settings into query_settings in the future, as proposed there.
Also, not all ClickHouse settings make sense in the context of Vector, but maybe things like insert_deduplicate do. We can also add these to the query_settings option when there is a need for them.
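To make the idea concrete, here is a minimal sketch of how a grouped query_settings option could be mapped onto the URL query parameters that ClickHouse's HTTP interface accepts for settings. The type `QuerySettings` and the method `to_query_pairs` are hypothetical names chosen for illustration and are not Vector's actual implementation; only the three setting names come from this thread. Unset fields are skipped so that the server-side default applies.

```rust
/// Hypothetical grouping of ClickHouse query settings (not Vector's real type).
#[derive(Debug, Default, Clone, PartialEq)]
struct QuerySettings {
    async_insert: Option<bool>,
    wait_for_async_insert: Option<bool>,
    async_insert_deduplicate: Option<bool>,
}

impl QuerySettings {
    /// Returns `(name, value)` pairs for the fields that are explicitly set.
    /// Boolean settings are rendered as `1`/`0`, the form used in URL parameters.
    fn to_query_pairs(&self) -> Vec<(&'static str, &'static str)> {
        let fields = [
            ("async_insert", self.async_insert),
            ("wait_for_async_insert", self.wait_for_async_insert),
            ("async_insert_deduplicate", self.async_insert_deduplicate),
        ];
        fields
            .iter()
            .filter_map(|&(name, value)| value.map(|v| (name, if v { "1" } else { "0" })))
            .collect()
    }
}

fn main() {
    let settings = QuerySettings {
        async_insert: Some(true),
        wait_for_async_insert: Some(false),
        ..Default::default()
    };
    // Join the pairs into a query string fragment for an insert request.
    let query: Vec<String> = settings
        .to_query_pairs()
        .iter()
        .map(|(k, v)| format!("{}={}", k, v))
        .collect();
    println!("{}", query.join("&"));
}
```

Keeping every field optional means a new setting such as insert_deduplicate could later be added as one more field without changing how existing configs behave.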
We can for sure add a link to the doc.
I added the links above on the query_settings option.
Basically all these settings have the same async_insert prefix (or suffix), that's why I was thinking we might be able to avoid repeating it. But it's a minor UX concern.
Personally I prefer keeping the original ClickHouse option names for now. There are just a lot of them and a lot of work if we want to bring additional organisation into them. So if you don't mind, I'm closing this thread.
> Personally I prefer keeping the original ClickHouse option names for now.
This is necessary for backwards compatibility. I was referring only to new fields. For example, we now have:
query_settings:
  async_insert: true
  wait_for_async_insert: true
  async_insert_deduplicate: true
An alternative is:
query_settings:
  async_insert:
    wait_for_async_insert: true
    async_insert_deduplicate: true
I also realized there's a nuance with async_insert being omitted and other fields like wait_for_async_insert being present in the config.
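The nuance can be sketched as a small check: with the flat layout, nothing prevents a config from setting wait_for_async_insert while leaving async_insert unset. The names `FlatSettings` and `dependent_without_parent` are hypothetical, and whether Vector should warn, reject, or simply pass such a config through is not settled in this thread; the sketch only detects the situation.

```rust
/// Hypothetical flat layout: every setting is an independent optional field.
#[derive(Debug, Default)]
struct FlatSettings {
    async_insert: Option<bool>,
    wait_for_async_insert: Option<bool>,
    async_insert_deduplicate: Option<bool>,
}

impl FlatSettings {
    /// Lists settings that only affect asynchronous inserts but are set while
    /// `async_insert` is not explicitly enabled in this config. The server may
    /// still enable asynchronous inserts by default, so this is a hint only.
    fn dependent_without_parent(&self) -> Vec<&'static str> {
        if self.async_insert == Some(true) {
            return Vec::new();
        }
        let mut names = Vec::new();
        if self.wait_for_async_insert.is_some() {
            names.push("wait_for_async_insert");
        }
        if self.async_insert_deduplicate.is_some() {
            names.push("async_insert_deduplicate");
        }
        names
    }
}

fn main() {
    let config = FlatSettings {
        wait_for_async_insert: Some(true),
        ..Default::default()
    };
    for name in config.dependent_without_parent() {
        println!("`{}` is set but `async_insert` is not enabled", name);
    }
}
```

A nested layout would avoid the check altogether, because the dependent fields could only appear under their parent key, at the cost of diverging from the original ClickHouse option names.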
I'm not sure how this is better :) Isn't the alternative duplicating the async keyword in the subcategory name and the config names?
Thanks @pront. Let me know what you think of the current version.
I found a problem with
Please do not force push because I have to review the whole PR every time 😅
Summary
This PR adds a query_settings field to the configuration of the clickhouse sink, with a few settings for asynchronous inserts.
Change Type
Is this a breaking change?
How did you test this PR?
I use the following Vector config to test it:
Start ClickHouse, enable asynchronous logs, create a demo_logs table:
Run Vector. And then check ClickHouse logs for asynchronous inserts:
You should see the queries done by Vector.
Does this PR include user facing changes?
Checklist
make check-all is a good command to run locally. This check is defined here. Some of these checks might not be relevant to your PR. For Rust changes, at the very least you should run:
cargo fmt --all
cargo clippy --workspace --all-targets -- -D warnings
cargo nextest run --workspace (alternatively, you can run cargo test --all)
If this PR changes dependencies (modifies Cargo.lock), please run dd-rust-license-tool write to regenerate the license inventory and commit the changes (if any). More details here.
References