-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add range partitioning support #174
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for creating this PR. I'll review this later. Before review this PR, I have a question. Could you check my comment?
README.md
Outdated
range_partitioning: | ||
field: customer_id | ||
range: | ||
start: '1' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this part use string
instead of integer
? As far as I know, range partition uses a number. And we can check the range is valid (start < end
) if we use integer
.
What do you think of this configuration layout?
(Do we need a range
block?).
range_partitioning:
field: customer_id # document uses `column` but this plugin uses `field`. [1]
start: 1
end: 1000
interval: 10
# [1] https://cloud.google.com/bigquery/docs/creating-partitioned-tables#create_an_integer-range_partitioned_table
(This is just my opinion, I want to ask co-maintainers this comment.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the comment! I followed the api documentation. If you prefer integer, I'll change this!
https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#RangePartitioning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a range block?
I fixed it with 49d2c36.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello, @kitagry. Thank you for waiting.
Could you use this layout? (Sorry, we decided to use the original design (except using integer
instead of string
in range values.)
range_partitioning:
field: customer_id
range:
start: 1 # integer not string.
end: 99999 # integer not string.
interval: 1 # integer not string.
and Could you check the range start
+ interval
< end
?
If you have any concern, please let me know.
I referenced the time_partition
configurations.
BigQuery API use
{
"type": string,
"expirationMs": string,
"field": string,
"requirePartitionFilter": boolean
}
embulk configuration
type: bigquery
table: table_name$20160929
time_partitioning:
type: DAY
expiration_ms: 259200000 # integer not strong, use sake case
We discussed this using this design document. (Written in Japanese)
After modification, I'll check the partition feature.
@kitagry Thank you for more work on this PR. I'll review this PR please wait. I'm not the original plugin developer. So, I need to investigate the configuration rule. If this plugin is based on the API settings, It would be better to use the original |
49d2c36
to
f8d039f
Compare
Hi @hiroyuki-sato , I changed range-partitioning fileds to be integer in f8d039f |
f8d039f
to
5398f69
Compare
@kitagry Thanks. I will test this PR later. Please wait for a while. |
This pull request introduces support for range partitioning in BigQuery.
I checked this feature with
example/config_replace_field_range_partitioned_table.yml