vdk-impala: Introduce COMPUTE STATS statements #2584

Merged (2 commits) on Aug 22, 2023

sbuldeev
Collaborator

Why:
We were sporadically seeing failures on large tables when using the processing templates with quality checks specified.
The failures appear when we move the data from the staging table (where the quality checks have already been performed) into the target table. Because no "COMPUTE STATS" statement had been run against the staging table, selecting data from it failed by hitting query limits. These are the checks that were done before submitting this PR:
- A job processing a large table with quality checks defined was run 45 times; only 17 runs succeeded, and the rest failed due to the issue described above. After a "COMPUTE STATS" statement was added against the staging table before processing the data to target, the result was 100 consecutive successful runs. Based on this, we assume the enhancement will improve the execution of the templates.

More details are explained in #1361.

What:
- Adds a COMPUTE STATS statement that is executed against the staging table right before the data is moved to prod, in order to optimize subsequent selects on it.
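
The ordering described above can be sketched roughly as follows. This is a minimal illustration, not the actual vdk-impala template code: the helper name `build_load_statements`, the table names, and the exact form of the insert statement are assumptions made for the example. Only the placement of `COMPUTE STATS` before the move to the target table comes from this PR.

```python
from typing import List


def build_load_statements(staging_table: str, target_table: str) -> List[str]:
    """Return Impala statements in the order they would run.

    Hypothetical helper for illustration only; the real templates
    build their SQL differently.
    """
    return [
        # Refresh table and column statistics on the staging table so the
        # planner has up-to-date estimates before the table is read.
        f"COMPUTE STATS {staging_table}",
        # Move the already-checked rows from staging into the target table.
        # (Illustrative statement; the real templates use their own SQL.)
        f"INSERT OVERWRITE TABLE {target_table} SELECT * FROM {staging_table}",
    ]


if __name__ == "__main__":
    # Table names are placeholders.
    for statement in build_load_statements("db.stg_orders", "db.orders"):
        print(statement)
```

The point of the sketch is the ordering: statistics are computed after the quality checks have finished writing to the staging table and before anything selects from it.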

Signed-off-by: Stefan Buldeev <sbuldeev@vmware.com>

The statement is executed against the staging table right before the data is moved from staging to target.
@sbuldeev sbuldeev merged commit c72ab07 into main Aug 22, 2023
10 checks passed
@sbuldeev sbuldeev deleted the person/sbuldeev/compute-stats-on-stage-table branch August 22, 2023 11:04