Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve BigQuery metadata retrieval performance #16064

Merged
merged 3 commits into from Jun 2, 2023

Conversation

wendigo
Copy link
Contributor

@wendigo wendigo commented Feb 10, 2023

SELECT * FROM information_schema.columns WHERE table_catalog = 'bigquery' goes from 17s to 3s in my testing (for just a minor number of tables)

BaseConnectorTest.testSelectInformationSchemaColumns completes under 45s instead of 2m

Description

Additional context and related issues

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@hashhar hashhar self-requested a review February 11, 2023 09:13
@wendigo wendigo force-pushed the serafin/bigquery-metadata-faster branch 3 times, most recently from fd24d17 to 76ccef2 Compare February 11, 2023 19:58
@kokosing kokosing dismissed their stale review February 12, 2023 18:24

Per comment that I have just left

@wendigo wendigo force-pushed the serafin/bigquery-metadata-faster branch from 76ccef2 to 83e3928 Compare February 13, 2023 10:29
@wendigo
Copy link
Contributor Author

wendigo commented Feb 13, 2023

According to allAsList javadoc:

Canceling this future will attempt to cancel all the component futures, and if any of the provided futures fails or is canceled, this one is, too.

@wendigo
Copy link
Contributor Author

wendigo commented Feb 13, 2023

@ebyhr @hashhar can you run BQ tests on this PR?

@ebyhr
Copy link
Member

ebyhr commented Feb 13, 2023

/test-with-secrets sha=83e3928824af65433afeb18289424a93c744ba78

@wendigo wendigo force-pushed the serafin/bigquery-metadata-faster branch from 83e3928 to dae6c32 Compare February 13, 2023 15:47
@wendigo
Copy link
Contributor Author

wendigo commented Feb 13, 2023

@ebyhr @hashhar please run with secrets once again

@github-actions
Copy link

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/4163879507

@ebyhr
Copy link
Member

ebyhr commented Feb 13, 2023

/test-with-secrets sha=dae6c329aca88b343820ccc0c232409d9f739ce8

@github-actions
Copy link

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/4168207790

@wendigo wendigo force-pushed the serafin/bigquery-metadata-faster branch from dae6c32 to 73568d5 Compare February 14, 2023 14:04
@wendigo
Copy link
Contributor Author

wendigo commented Feb 20, 2023

@ebyhr can you run this PR once again with secrets?

@hashhar
Copy link
Member

hashhar commented Feb 20, 2023

/test-with-secrets sha=73568d5ccbbb4d71a8f8103cf392f92c223f9186

@wendigo
Copy link
Contributor Author

wendigo commented Feb 20, 2023

Thx @hashhar

@github-actions
Copy link

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/4224902334

@wendigo
Copy link
Contributor Author

wendigo commented Feb 21, 2023

@ebyhr @hashhar can you rerun? There was a 500 internal error on BQ side.

@ebyhr
Copy link
Member

ebyhr commented Feb 21, 2023

/test-with-secrets sha=73568d5ccbbb4d71a8f8103cf392f92c223f9186

@wendigo wendigo force-pushed the serafin/bigquery-metadata-faster branch from 73568d5 to e8e1030 Compare February 27, 2023 12:10
@github-actions github-actions bot added the bigquery BigQuery connector label Mar 29, 2023
@github-actions
Copy link

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/4550136213

@wendigo
Copy link
Contributor Author

wendigo commented Mar 29, 2023

Failures in databricks pt seems unrelated.

Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As follow up we should configure retries for transient failures like rate-limiting.

@wendigo wendigo force-pushed the serafin/bigquery-metadata-faster branch from 5e30ac9 to ea7f7b1 Compare May 25, 2023 09:26
@wendigo
Copy link
Contributor Author

wendigo commented May 25, 2023

@hashhar @ebyhr please review. I've rebased it after merging another BQ PR that conflicted with this one.

@hashhar
Copy link
Member

hashhar commented May 25, 2023

/test-with-secrets sha=ea7f7b102aadd05dd47f109500145acad3879f61

@github-actions
Copy link

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/5079722278

@wendigo wendigo force-pushed the serafin/bigquery-metadata-faster branch from ea7f7b1 to 1762d63 Compare May 29, 2023 06:14
@wendigo
Copy link
Contributor Author

wendigo commented May 29, 2023

@hashhar @ebyhr please run with secrets (rebased on top of the @hashhar change)

@kokosing
Copy link
Member

/test-with-secrets sha=1762d6395b97bd07eb70695f70ce33048f888079

@github-actions
Copy link

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/5110550506

`SELECT * FROM information_schema.columns WHERE table_catalog = 'bigquery'` goes from 17s to 3s in my testing (for just a minor number of tables)

BaseConnectorTest.testSelectInformationSchemaColumns completes under 45s instead of 2m
@wendigo wendigo force-pushed the serafin/bigquery-metadata-faster branch from 1762d63 to 4e90dd1 Compare May 30, 2023 12:53
@kokosing
Copy link
Member

/test-with-secrets sha=4e90dd125dfc520c80ef6b611d1cda2752e6e53c

@github-actions
Copy link

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/5121944501

@wendigo
Copy link
Contributor Author

wendigo commented May 31, 2023

@hashhar @ebyhr can we merge it as it is? The only failing test is existing flakyness: #16416

@hashhar
Copy link
Member

hashhar commented Jun 1, 2023

@hashhar @ebyhr can we merge it as it is? The only failing test is existing flakyness: #16416

RPC retries didn't help? 😞

@hashhar
Copy link
Member

hashhar commented Jun 1, 2023

can you update docs with retry multiplier if we already document other retry configs.

@wendigo
Copy link
Contributor Author

wendigo commented Jun 1, 2023

@hashhar these are not documented

@hashhar hashhar merged commit 4cb743f into trinodb:master Jun 2, 2023
19 of 20 checks passed
@github-actions github-actions bot added this to the 419 milestone Jun 2, 2023
@hashhar hashhar mentioned this pull request Jun 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bigquery BigQuery connector cla-signed
Development

Successfully merging this pull request may close these issues.

None yet

4 participants