[Enhancement]: Parallel Execution Plans on Data Nodes #4306

Closed
TroyDey opened this issue May 5, 2022 · 1 comment · Fixed by #4307
Labels
enhancement, multinode, performance, Query Experience Team

Comments

TroyDey commented May 5, 2022

What type of enhancement is this?

Performance

What subsystems and features will be improved?

Distributed hypertable

What does the enhancement do?

I have had pretty good luck getting Postgres to choose parallel execution plans for regular hypertables, but it is nearly impossible to get it to choose one for the remote queries executed on the data nodes. I have been experimenting by setting timescaledb.enable_remote_explain = on and by executing the remote SQL directly on a data node.
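
A minimal sketch of the workflow described above, assuming a hypothetical distributed hypertable `conditions` with a `temperature` column (neither name comes from the original report). With `timescaledb.enable_remote_explain` turned on, a verbose `EXPLAIN` on the access node should also show the plan each data node chose for its remote SQL:

```sql
-- Run on the access node.
-- `conditions` and `temperature` are hypothetical names used for illustration.
SET timescaledb.enable_remote_explain = on;

EXPLAIN (VERBOSE)
SELECT avg(temperature)
FROM conditions;
```

The remote SQL printed in that output can then be copied and run under `EXPLAIN` directly on a data node to experiment with planner settings there.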

One barrier I found was that the partialize_agg function completely prevents a parallel plan from being chosen even when force_parallel_mode is set to on.
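
A sketch of how the behavior described above might be reproduced on a data node. The table and column names are hypothetical, and the `_timescaledb_internal` schema qualification is an assumption about where `partialize_agg` lives in the TimescaleDB version in use:

```sql
-- Run on a data node.
-- `conditions` and `temperature` are hypothetical names; the schema of
-- partialize_agg is an assumption and may differ between TimescaleDB versions.
SET force_parallel_mode = on;

-- Per the report, this is not expected to produce a parallel plan:
EXPLAIN
SELECT _timescaledb_internal.partialize_agg(avg(temperature))
FROM conditions;
```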

When I set parallel_setup_cost = 1, parallel_tuple_cost = 1, and force_parallel_mode = on, and replace partialize_agg with a regular aggregate such as avg, I can get a parallel plan. However, it only ever plans one worker, even though max_worker_processes = 59, max_parallel_workers = 48, max_parallel_workers_per_gather = 24, and the table contains 25 million rows. This has become part of another discussion here.
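
The settings above can be sketched as a session script. The GUC names and values are the ones from the report; the table and column names are hypothetical:

```sql
-- Run on a data node; `conditions` and `temperature` are hypothetical names.
SET parallel_setup_cost = 1;
SET parallel_tuple_cost = 1;
SET force_parallel_mode = on;

-- Worker limits from the report. max_worker_processes can only be changed
-- with a server restart, so it is inspected rather than set here.
SHOW max_worker_processes;                 -- reported as 59
SET max_parallel_workers = 48;
SET max_parallel_workers_per_gather = 24;

-- Regular aggregate in place of partialize_agg.
EXPLAIN
SELECT avg(temperature)
FROM conditions;
-- Look at "Workers Planned" on the Gather node in the output.
```

One thing worth checking when reading the output: as far as PostgreSQL's planner is understood here, when `force_parallel_mode` is what introduces the Gather node, that node is marked `Single Copy` and plans exactly one worker, so a one-worker plan under that setting may not reflect what the cost model would choose on its own.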

I realize this is open-ended, so feel free to close. I mainly wanted to raise the issue of partialize_agg blocking parallel plans.

Implementation challenges

No response

TroyDey added the enhancement label May 5, 2022
akuzm (Member) commented May 9, 2022

A comment from Gayathri about possible implementation:

(https://iobeam.slack.com/archives/CEKV5LMK3/p1651844444404349?thread_ts=1651833518.202549&cid=CEKV5LMK3)
You can parallelize it only if the passed in function is parallel safe.
FYI: this function is used in two places: 1) distributed hypertables and 2) the current version of caggs. I would check the PG 14 code for PARTITIONWISE_AGGREGATE_PARTIAL and see how we can make changes for the distributed case. Might be able to peek into the plan and add parallel paths? (Partial aggs are being removed for caggs, so the cagg case will soon become irrelevant.)
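
The parallel-safety condition mentioned above can be inspected in the system catalog. This is only a sketch of how to look up the current markings, not of the proposed implementation. `pg_proc.proparallel` is a standard PostgreSQL catalog column:

```sql
-- proparallel values: 's' = parallel safe, 'r' = restricted, 'u' = unsafe.
SELECT p.oid::regprocedure AS function_signature,
       p.proparallel
FROM pg_proc AS p
WHERE p.proname IN ('partialize_agg', 'avg');
```

A wrapper marked `u` would rule out a parallel plan regardless of how the inner aggregate is marked, which would be consistent with the behavior reported in this issue.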


4 participants