
CC-7246: add ability to partition based on timestamp of a record value field #214

Closed
wants to merge 2 commits

Conversation

levzem

@levzem levzem commented Nov 20, 2019

implements this behavior:
https://cloud.google.com/bigquery/docs/partitioned-tables#date_timestamp_partitioned_tables

TL;DR: BigQuery can partition a table based on a column that contains a timestamp, so by passing a field name from the record value struct to BigQuery, the connector specifies which column to partition by.
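For illustration, this is roughly how column-based (timestamp) partitioning is expressed with the google-cloud-bigquery Java client. This is only a sketch: the column name `event_ts`, the schema, and the dataset/table names are made up, and it does not claim to show how the connector itself wires this up.

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.LegacySQLTypeName;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.StandardTableDefinition;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TableInfo;
import com.google.cloud.bigquery.TimePartitioning;

public class ColumnPartitionExample {
  public static void main(String[] args) {
    BigQuery bigQuery = BigQueryOptions.getDefaultInstance().getService();

    // Partition by the value of a TIMESTAMP column instead of ingestion time.
    TimePartitioning partitioning = TimePartitioning.newBuilder(TimePartitioning.Type.DAY)
        .setField("event_ts") // hypothetical column name
        .build();

    Schema schema = Schema.of(
        Field.of("event_ts", LegacySQLTypeName.TIMESTAMP),
        Field.of("payload", LegacySQLTypeName.STRING));

    StandardTableDefinition definition = StandardTableDefinition.newBuilder()
        .setSchema(schema)
        .setTimePartitioning(partitioning)
        .build();

    // Rows streamed to this table are routed to partitions by the event_ts column.
    bigQuery.create(TableInfo.of(TableId.of("my_dataset", "my_table"), definition));
  }
}
```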

Signed-off-by: Lev Zemlyanov lev@confluent.io

Signed-off-by: Lev Zemlyanov <lev@confluent.io>
@CLAassistant

CLAassistant commented Nov 20, 2019

CLA assistant check
All committers have signed the CLA.

@levzem
Author

levzem commented Nov 20, 2019

@wicknicks @gharris1727 @aakashnshah any reviews would be appreciated

Signed-off-by: Lev Zemlyanov <lev@confluent.io>
@@ -155,6 +155,12 @@ private RowToInsert getRecordRow(SinkRecord record) {
      convertedRecord = FieldNameSanitizer.replaceInvalidKeys(convertedRecord);
    }

    if (config.useTimestampPartitioning()) {
      if (!convertedRecord.containsKey(config.getTimestampPartitionFieldName())) {


When/how would this happen? It looks like the first `if` statement accounts for the field name being non-empty. Of course that doesn't ensure the record contains the field, but I still wanted to ask.

Author


this means the record doesn't contain the field - RecordConverter returns a map of all the field names to their values, so that's how I check the struct
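For context, a minimal sketch of what the guarded branch is presumably doing. Only the two `if` conditions come from the quoted diff; the exception type and message below are assumptions, not the PR's actual code.

```java
if (config.useTimestampPartitioning()) {
  // convertedRecord is the Map of field names to values produced by RecordConverter,
  // so a missing key means the record value has no such field.
  if (!convertedRecord.containsKey(config.getTimestampPartitionFieldName())) {
    // Hypothetical error handling; the real PR may report this differently.
    throw new org.apache.kafka.connect.errors.ConnectException(
        "Record is missing the configured timestamp partition field: "
            + config.getTimestampPartitionFieldName());
  }
}
```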

Comment on lines +115 to +118
final String testTableName = "testTable";
final String testDatasetName = "testDataset";
final String testDoc = "test doc";
final TableId tableId = TableId.of(testDatasetName, testTableName);


can you put these outside of the function, since you use them multiple times?
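A quick sketch of the suggestion, assuming these become class-level constants in the test (the values are taken from the quoted lines; the constant naming is illustrative):

```java
// Hoisted to the test class so multiple test methods can share them.
private static final String TEST_TABLE_NAME = "testTable";
private static final String TEST_DATASET_NAME = "testDataset";
private static final String TEST_DOC = "test doc";
private static final TableId TABLE_ID = TableId.of(TEST_DATASET_NAME, TEST_TABLE_NAME);
```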

Comment on lines +133 to +134
com.google.cloud.bigquery.Schema fakeBigQuerySchema =
com.google.cloud.bigquery.Schema.of(Field.of("mock field", LegacySQLTypeName.STRING));


you can use the mockito-inline package :)
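A sketch of what the suggestion could look like, assuming the mockito-inline artifact is on the test classpath (it allows mocking final classes such as BigQuery's `Schema`):

```java
import static org.mockito.Mockito.mock;

import com.google.cloud.bigquery.Schema;

// With mockito-inline, the Schema class can be mocked directly instead of being
// constructed via Schema.of(Field.of("mock field", ...)); stub only the methods
// the code under test actually calls.
Schema fakeBigQuerySchema = mock(Schema.class);
```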

Author


I'm following the original tests; I want to make this a non-invasive addition

@levzem levzem changed the title from "FF-1311: add ability to partition based on timestamp of a record value field" to "CC-7246: add ability to partition based on timestamp of a record value field" on Nov 25, 2019
@levzem
Author

levzem commented Nov 25, 2019

@mtagle would love some eyes on this :)

@archy-bold

I tried this and I was getting an error: 'Streaming to metadata partition of column based partitioning table $20191127 is disallowed.' It looks like the reason is that, with column-based partitions, you shouldn't supply the partition explicitly. You just supply the table name and BigQuery sorts the partitioning itself.

When I updated the PartitionedTableId::createFullTableName() function to simply return the table, I was able to insert records into the table.

It seems to create the table fine, though.

Source: https://stackoverflow.com/a/50006560
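A sketch of the workaround described above. The method and field names follow the comment's wording, but this is not the connector's actual code:

```java
// In PartitionedTableId, for column-partitioned tables, skip the "$YYYYMMDD"
// partition decorator and let BigQuery route rows via the partitioning column.
public String createFullTableName() {
  // return table + "$" + partition;  // decorator form: rejected for column-partitioned tables
  return table;                       // plain table name: accepted
}
```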

@archy-bold

Actually, this PR does what I explained: #203

@rhauch

rhauch commented Feb 3, 2020

FYI: #203 has been superseded by #229.

@levzem
Author

levzem commented Feb 4, 2020

this PR attempts to accomplish the second half of #244 and to address issue #169 by allowing the connector to auto-create a column-partitioned table in BigQuery. It may be superseded by a follow-up PR.

@levzem
Author

levzem commented Feb 11, 2020

superseded by #246, thus closing

@levzem levzem closed this Feb 11, 2020