CC-7246: add ability to partition based on timestamp of a record value field #246
LGTM
one nit. thanks for taking this up, @dosvath!
...nector/src/test/java/com/wepay/kafka/connect/bigquery/config/BigQuerySinkTaskConfigTest.java
…config/BigQuerySinkTaskConfigTest.java Co-Authored-By: Arjun Satish <wicknicks@users.noreply.github.com>
LGTM. Thanks, @dosvath!
@mtagle @bingqinzhou can you please take a look at this PR? Thanks in advance!
In general, a solid PR 👍
Just one concern: timestampPartitionFieldName takes a single string. What if we have different topics with different partition fields?
kcbq-connector/src/main/java/com/wepay/kafka/connect/bigquery/BigQuerySinkTask.java
...nector/src/test/java/com/wepay/kafka/connect/bigquery/config/BigQuerySinkTaskConfigTest.java
For this PR, only one partition field name is supported, which means that if you'd like a second topic with a different partition field name, you would have to start another connector. Alternatively, you could create the table for the second topic manually and specify the field, then use the connector with the new topic added to the topics configuration. (If the table you're trying to write to already exists before starting the connector, and the schema doesn't need updating, the single [...])

In the future, functionality could be added to support a list of partition field names that correspond to the number and order of the topics given to the connector.
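The "one connector per partition field" workaround could be sketched roughly as follows. This is only an illustration: the connector names, topic names, and field names are hypothetical, and a real connector needs many more settings. timestampPartitionFieldName is the setting discussed in this PR, and topics is the standard sink connector setting.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: builds one connector config per topic so that each topic
// can use its own partition field. All names and values are hypothetical.
class PerTopicConnectorConfigs {

  static Map<String, String> configFor(String name, String topic, String partitionField) {
    Map<String, String> config = new LinkedHashMap<>();
    config.put("name", name);
    config.put("topics", topic);
    // Setting added by this PR: a single field name per connector.
    config.put("timestampPartitionFieldName", partitionField);
    return config;
  }

  public static void main(String[] args) {
    // Two connectors, each writing one topic with its own partition field.
    System.out.println(configFor("bq-sink-orders", "orders", "created_at"));
    System.out.println(configFor("bq-sink-payments", "payments", "paid_at"));
  }
}
```

Each map would be submitted as a separate connector, which is what keeps the two partition field names from conflicting.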
Codecov Report
@@             Coverage Diff              @@
##             master     #246      +/-   ##
============================================
+ Coverage     69.05%   70.94%   +1.88%
- Complexity      277      289      +12
============================================
  Files            32       32
  Lines          1477     1504      +27
  Branches        155      156       +1
============================================
+ Hits           1020     1067      +47
+ Misses          409      383      -26
- Partials         48       54       +6
Another possibility, besides @dosvath's reply, is to just add an SMT to rename the field to
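A rename along these lines might use Kafka Connect's built-in ReplaceField transform. The sketch below is an assumption about how that could look, not something confirmed in this thread: the transform alias renamePartitionField and both field names are hypothetical, and the target field name is whatever the connector is configured to partition on.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: config entries for a ReplaceField SMT that renames one
// value field. The alias and the field names are hypothetical.
class RenameFieldTransformConfig {

  static Map<String, String> renameTransform(String from, String to) {
    Map<String, String> config = new LinkedHashMap<>();
    config.put("transforms", "renamePartitionField");
    config.put("transforms.renamePartitionField.type",
        "org.apache.kafka.connect.transforms.ReplaceField$Value");
    // ReplaceField expects rename mappings in the form "oldName:newName".
    config.put("transforms.renamePartitionField.renames", from + ":" + to);
    return config;
  }

  public static void main(String[] args) {
    System.out.println(renameTransform("paid_at", "event_time"));
  }
}
```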
@dosvath Do you need a Maven Central publication as well?
@criccomini Could you release a new version with this improvement? We'd like to unblock some customers.
@NathanNam @criccomini Shall we also fix the issue mentioned below before releasing?
@apoorvmittal10 @NathanNam waiting for your guidance on whether a release is required now, or whether you want to wait to fix the issue @apoorvmittal10 linked to.
@criccomini I'd say we should go ahead with #246 while reverting PR #248.
We can't revert #248; we are seeing problems with throttling right now without that patch. I think we should fix forward.
Fix forward makes sense. Looks like @apoorvmittal10 made a PR here for the fix: #257. If that is all we need, can we prioritize reviewing it?
Yep, just pinged @bingqinzhou.
Merged #257 and released it as 1.6.1, waiting for the new version to show up on Maven Central.
Thank you all!
Replaces #214. Implements additional validation for the configs to either use decorators or field name for partitioning. Adds ability to partition BQ tables based on a field name that contains timestamp information.
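The "either decorators or field name" check could look roughly like the sketch below. It is not the PR's actual code: the key bigQueryPartitionDecorator is assumed to be the decorator setting, a missing decorator setting is treated as disabled here (which may differ from the connector's default), and a plain IllegalArgumentException stands in for whatever exception the connector throws.

```java
import java.util.Map;

// Sketch only: rejects configs that enable partition decorators and also
// name a timestamp partition field. Key names and defaults are assumptions.
class PartitionConfigValidator {

  static void validate(Map<String, String> config) {
    boolean useDecorator =
        Boolean.parseBoolean(config.getOrDefault("bigQueryPartitionDecorator", "false"));
    String field = config.get("timestampPartitionFieldName");
    boolean hasField = field != null && !field.isEmpty();

    // The two partitioning modes are mutually exclusive.
    if (useDecorator && hasField) {
      throw new IllegalArgumentException(
          "Only one partitioning mode may be used: decorators or a timestamp field name");
    }
  }
}
```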
Performed Manual Integration Tests
Tokens PROJECT_ID, DATASET_ID, TABLE_ID are used as substitutes in this example for the actual IDs.
The output will list all the rows that have different dates, with an additional column of partition_id. Records that have a timestamp within a day's range will have the same partition_id as expected.
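To illustrate why records from the same day share a partition_id, here is a small sketch that derives a day-level partition ID from a timestamp. It assumes daily partitioning with UTC day boundaries and the YYYYMMDD ID format; the class and method names are hypothetical and not part of the connector.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Sketch only: maps a record timestamp to a daily partition ID (YYYYMMDD),
// assuming partitions are split on UTC day boundaries.
class DailyPartitionId {

  private static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

  static String partitionIdFor(Instant timestamp) {
    return FORMAT.format(timestamp);
  }

  public static void main(String[] args) {
    // Two timestamps on the same UTC day map to the same partition ID.
    System.out.println(partitionIdFor(Instant.parse("2020-03-10T00:00:01Z")));
    System.out.println(partitionIdFor(Instant.parse("2020-03-10T23:59:59Z")));
    // A timestamp on the next day maps to a different one.
    System.out.println(partitionIdFor(Instant.parse("2020-03-11T00:00:00Z")));
  }
}
```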