Collect plan statistics for explain only by default #1866

Merged
merged 4 commits into from Nov 4, 2019

@sopel39 (Member) commented Oct 25, 2019

Plan stats collection creates considerable load on the
Hive metastore (even when CBO is disabled). It is therefore
better not to collect plan stats for non-EXPLAIN queries.

@cla-bot cla-bot bot added the cla-signed label Oct 25, 2019
@sopel39 (Member Author) commented Nov 1, 2019

@findepi AC

@sopel39 (Member Author) left a comment

StatsAndCosts statsAndCosts = StatsAndCosts.empty();
if (collectStatsAndCosts) {
    statsAndCosts = StatsAndCosts.create(root, statsProvider, costProvider);
}

Member Author

It's more lines this way
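
For readers following the thread, here is a minimal, self-contained sketch of the pattern in the snippet above: start from an empty result and consult the stats provider only when collection is requested. `StatsAndCostsSketch`, its map of row counts, and the counting supplier are hypothetical stand-ins, not the real `StatsAndCosts` API; the counter only illustrates that the provider (and, in the real system, the metastore behind it) is left alone when collection is off.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in for the real StatsAndCosts; only the control flow matters here
final class StatsAndCostsSketch
{
    private final Map<String, Double> rowCounts;

    private StatsAndCostsSketch(Map<String, Double> rowCounts)
    {
        this.rowCounts = rowCounts;
    }

    static StatsAndCostsSketch empty()
    {
        return new StatsAndCostsSketch(Map.of());
    }

    static StatsAndCostsSketch create(Supplier<Map<String, Double>> statsProvider)
    {
        // This is the only place where the (potentially expensive) provider is invoked
        return new StatsAndCostsSketch(statsProvider.get());
    }

    boolean isEmpty()
    {
        return rowCounts.isEmpty();
    }

    // Same shape as the reviewed snippet: empty by default, computed only on request
    static StatsAndCostsSketch collect(boolean collectStatsAndCosts, Supplier<Map<String, Double>> statsProvider)
    {
        StatsAndCostsSketch statsAndCosts = empty();
        if (collectStatsAndCosts) {
            statsAndCosts = create(statsProvider);
        }
        return statsAndCosts;
    }

    public static void main(String[] args)
    {
        AtomicInteger providerCalls = new AtomicInteger();
        Supplier<Map<String, Double>> provider = () -> {
            providerCalls.incrementAndGet();
            return Map.of("orders", 1500.0);
        };

        collect(false, provider);
        System.out.println("provider calls with collection off: " + providerCalls.get());
        collect(true, provider);
        System.out.println("provider calls with collection on: " + providerCalls.get());
    }
}
```

Under these assumptions the provider is never touched when the flag is false, which is the load reduction the PR description is after.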

@@ -848,7 +848,7 @@ public Plan createPlan(Session session, @Language("SQL") String sql, List<PlanOp
LogicalPlanner logicalPlanner = new LogicalPlanner(session, optimizers, new PlanSanityChecker(true), idAllocator, metadata, new TypeAnalyzer(sqlParser, metadata), statsCalculator, costCalculator, warningCollector);

Analysis analysis = analyzer.analyze(preparedQuery.getStatement());
- return logicalPlanner.plan(analysis, stage);
+ return logicalPlanner.plan(analysis, stage, true);

Member Author

LocalQueryRunner is mostly used in tests, so it's OK (and actually simpler) to have it always collect plan stats.

@@ -162,6 +163,11 @@ public Plan plan(Analysis analysis)
}

public Plan plan(Analysis analysis, Stage stage)
{
    return plan(analysis, stage, analysis.getStatement() instanceof Explain || isCollectPlanStatisticsForAllQueries(session));

Member

Since QueryExplainer passes true here, do we need the instanceof check?
I don't like instanceof.

If this is about EXPLAIN ANALYZE, we should be able to pass this information in some better way.

@sopel39 (Member Author), Nov 4, 2019

There is already a similar check at io/prestosql/execution/SqlQueryExecution.java:410.

> If this is about EXPLAIN ANALYZE, we should be able to pass this information in some better way.

Maybe there is some other way, but is it worth trading this for such a simple change?

Member

Is this check here for EXPLAIN or EXPLAIN ANALYZE?

Member Author

> Is this check here for EXPLAIN or EXPLAIN ANALYZE?

EXPLAIN (of any kind).
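
To make the outcome of this thread concrete, the sketch below isolates the decision expression from the diff: stats are collected for any EXPLAIN statement, or for every statement when the session opts in. The `Statement`, `Query`, and `Explain` types here are hypothetical, simplified stand-ins for the parser's AST classes, and the `analyze` flag is only an assumption used to show that the check does not distinguish between kinds of EXPLAIN.

```java
final class PlanStatsDecision
{
    private PlanStatsDecision() {}

    // Hypothetical, simplified statement hierarchy standing in for the parser's AST classes
    interface Statement {}

    static final class Query
            implements Statement {}

    static final class Explain
            implements Statement
    {
        private final boolean analyze;

        Explain(boolean analyze)
        {
            this.analyze = analyze;
        }

        boolean isAnalyze()
        {
            return analyze;
        }
    }

    // Same shape as the expression in the diff: any EXPLAIN statement, or the session-level opt-in
    static boolean shouldCollectPlanStatistics(Statement statement, boolean collectForAllQueries)
    {
        return statement instanceof Explain || collectForAllQueries;
    }

    public static void main(String[] args)
    {
        System.out.println(shouldCollectPlanStatistics(new Query(), false));        // plain query, default session
        System.out.println(shouldCollectPlanStatistics(new Explain(false), false)); // EXPLAIN
        System.out.println(shouldCollectPlanStatistics(new Explain(true), false));  // EXPLAIN ANALYZE
        System.out.println(shouldCollectPlanStatistics(new Query(), true));         // plain query, session opt-in
    }
}
```

The alternative raised in review, passing the flag explicitly from the caller, would move the `instanceof` test out of the planner; this sketch only shows the variant that was discussed in the diff.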

// We are not able to calculate stats for PARTIAL aggregations
.setSystemProperty(PREFER_PARTIAL_AGGREGATION, "false")
// Distributed query runner does not collect stats for non-EXPLAIN queries by default
.setSystemProperty(COLLECT_PLAN_STATISTICS_FOR_ALL_QUERIES, "true"))

Member

Thanks for the comment. I still wonder why we have to set it here, in this class.

@sopel39 (Member Author), Nov 4, 2019

Because we need them for stats tests, as the class name suggests?

Member

> as the class name suggests?

As it stands, the comment (// Distributed query runner does not collect stats for non-EXPLAIN queries by default) just states what the default is; there is no point in saying that here.

I hoped for something more direct, but let's assume it's self-explanatory.
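
As a rough illustration of why the stats tests need the override, the sketch below models a session as a plain map of system properties in which the flag defaults to false. The map-based session and both property key strings are assumptions made for this example; only the constant names `COLLECT_PLAN_STATISTICS_FOR_ALL_QUERIES` and `PREFER_PARTIAL_AGGREGATION` and the accessor name `isCollectPlanStatisticsForAllQueries` come from the diff.

```java
import java.util.Map;

final class CollectPlanStatsProperty
{
    private CollectPlanStatsProperty() {}

    // Assumed property keys, used only for this sketch
    static final String COLLECT_PLAN_STATISTICS_FOR_ALL_QUERIES = "collect_plan_statistics_for_all_queries";
    static final String PREFER_PARTIAL_AGGREGATION = "prefer_partial_aggregation";

    // Hypothetical accessor: an unset property means "off", matching the explain-only default
    static boolean isCollectPlanStatisticsForAllQueries(Map<String, String> systemProperties)
    {
        return Boolean.parseBoolean(systemProperties.getOrDefault(COLLECT_PLAN_STATISTICS_FOR_ALL_QUERIES, "false"));
    }

    public static void main(String[] args)
    {
        // A default session: nothing set, so non-EXPLAIN queries would carry no plan stats
        Map<String, String> defaultSession = Map.of();

        // A stats-test session: opts in explicitly, as the test setup in the diff does
        Map<String, String> statsTestSession = Map.of(
                PREFER_PARTIAL_AGGREGATION, "false",
                COLLECT_PLAN_STATISTICS_FOR_ALL_QUERIES, "true");

        System.out.println(isCollectPlanStatisticsForAllQueries(defaultSession));
        System.out.println(isCollectPlanStatisticsForAllQueries(statsTestSession));
    }
}
```

With the default session, a test asserting on stats for a regular query would have nothing to check, which is presumably why the test class sets the property.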

@sopel39 force-pushed the ks/stats_collection branch 2 times, most recently from 320833c to 3b0e30b, on November 4, 2019 13:05
@sopel39 merged commit e74129d into trinodb:master Nov 4, 2019
@sopel39 mentioned this pull request Nov 4, 2019
@martint added this to the 325 milestone Nov 14, 2019
@sopel39 deleted the ks/stats_collection branch December 4, 2019 09:55