Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix query failure of predicate pushdown to BigQuery DATETIME type #9005

Merged
merged 1 commit into from Aug 30, 2021

Conversation

ebyhr
Copy link
Member

@ebyhr ebyhr commented Aug 27, 2021

The query having filter on BigQuery DATETIME type caused below error before this change:

trino> select * from bigquery.test.test_datetime where col = cast('0001-01-01 00:00:00.999' as timestamp(3));
Query 20210827_070648_00014_u3ikv failed: 
io.grpc.StatusRuntimeException: INVALID_ARGUMENT: request failed: 
Row filter for xxx.test.test_datetime is invalid. Filter is '(`col` = '+1967030-04-28 09:35:38.000')'
...
  • Run all BigQuery tests locally (passed)

@cla-bot cla-bot bot added the cla-signed label Aug 27, 2021
@ebyhr ebyhr added the bug Something isn't working label Aug 27, 2021
@@ -147,7 +147,7 @@ static String dateToStringConverter(Object value)

static String datetimeToStringConverter(Object value)
{
return formatTimestamp(((Long) value).longValue(), systemDefault());
return formatTimestamp(((Long) value).longValue() / NANOSECONDS_PER_MICROSECOND, UTC);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does BigQuery connector only read TIMESTAMP(3)? Otherwise the Object can be a long-timestamp type which is not backed by a long.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, this method is called only when building filter and not used for reading BigQuery data. If we specify long-timestamp in condition, this method won't be called because the precision is different.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, indeed since we only have type-mapping for timestamp/tz millis we'll always get a long here.

Can you add a checkState/verify here + a test that tries to use TIMESTAMP(6) so that once we support higher precisions the test will fail and we'll be reminded to correct this code?

Otherwise we may forget about this.

(If you are already working on higher precision timestamp support then feel free to ignore since there's a lower chance of forgetting this).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative, you can com.google.common.math.LongMath#divide(value, 1000, RoundingMode.UNNECESSARY) so that the code doesn't do silent data conversion for precision >3.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NANOSECONDS_PER_MICROSECOND -> MICROSECONDS_PER_MILLI, i guess

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hashhar Let me skip applying comments about timestamp(6) negative tests since I will fix soon.

@findepi Thanks, fixed.

@@ -147,7 +147,7 @@ static String dateToStringConverter(Object value)

static String datetimeToStringConverter(Object value)
{
return formatTimestamp(((Long) value).longValue(), systemDefault());
return formatTimestamp(((Long) value).longValue() / NANOSECONDS_PER_MICROSECOND, UTC);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative, you can com.google.common.math.LongMath#divide(value, 1000, RoundingMode.UNNECESSARY) so that the code doesn't do silent data conversion for precision >3.

@@ -147,7 +147,7 @@ static String dateToStringConverter(Object value)

static String datetimeToStringConverter(Object value)
{
return formatTimestamp(((Long) value).longValue(), systemDefault());
return formatTimestamp(((Long) value).longValue() / NANOSECONDS_PER_MICROSECOND, UTC);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NANOSECONDS_PER_MICROSECOND -> MICROSECONDS_PER_MILLI, i guess

{
String tableName = "test.test_datetime_type";

onBigQuery("DROP TABLE IF EXISTS " + tableName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can run on a shared resources, so table name should be randomized

Copy link
Member Author

@ebyhr ebyhr Aug 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember there was opposite comment (should be fixed) by other people in the past. While I can understand both opinion, BigQuery has expiration_timestamp table option. I think we can use this option to cleanup automatically. Of course we can improve by try-with-resources, but it's not perfect solution.
I will simply change to randomized names anyway in this PR.


assertQuery(
"SELECT a FROM " + tableName,
"VALUES (CAST('2021-08-27 12:34:56.123' AS TIMESTAMP(3))), (CAST('2022-09-28 01:02:03.987' AS TIMESTAMP(3)))");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CAST('2022-09-28 01:02:03.987' AS TIMESTAMP(3)) is very verbose way of writing TIMESTAMP '2022-09-28 01:02:03.987'

for this, instead of h2 , use

assertThat(query( trino query using big query )
  .matches ( trino query using values )

especially when working with date/time things.

"SELECT a FROM " + tableName + " WHERE a = CAST('2021-08-27 12:34:56.123' AS TIMESTAMP(3))",
"VALUES (CAST('2021-08-27 12:34:56.123' AS TIMESTAMP(3)))");
assertQuery(
"SELECT a FROM " + tableName + " WHERE rand() = 42 OR a = CAST('2021-08-27 12:34:56.123' AS TIMESTAMP(3))",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the case with rand() tests the engine, not the connector. i feel like you can remove it.

@hashhar hashhar self-requested a review August 27, 2021 16:05
@ebyhr ebyhr force-pushed the ebi/bigquery-datetime-pushdown branch from bffbc1a to 69a683c Compare August 30, 2021 00:52
@ebyhr
Copy link
Member Author

ebyhr commented Aug 30, 2021

@findepi @hashhar Applied comments and confirmed TestBigQueryIntegrationSmokeTest passes locally.

ImmutableList.of("'2021-08-27 12:34:56.123'", "'2022-09-28 01:02:03.987'"))) {
assertThat(query("SELECT a FROM " + table.getName()))
.matches("VALUES TIMESTAMP '2021-08-27 12:34:56.123', TIMESTAMP '2022-09-28 01:02:03.987'");
assertThat(query("SELECT a FROM " + table.getName() + " WHERE a = TIMESTAMP '2021-08-27 12:34:56.123'"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any chance to have io.trino.testing.AbstractTestDistributedQueries#testDataMappingSmokeTest run for BQ?
it would probably cover this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BQ connector doesn't support writes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it support writes for test purposes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Storage write API is in the beta preview right now

@ebyhr
Copy link
Member Author

ebyhr commented Aug 30, 2021

Let me merge PR though of course I agree adding type mapping tests eventually.

@ebyhr ebyhr merged commit 9024438 into trinodb:master Aug 30, 2021
@ebyhr ebyhr deleted the ebi/bigquery-datetime-pushdown branch August 30, 2021 11:16
@ebyhr ebyhr added this to the 362 milestone Aug 30, 2021
@ebyhr ebyhr mentioned this pull request Aug 30, 2021
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cla-signed
Development

Successfully merging this pull request may close these issues.

None yet

4 participants