Fix data movement task dependencies with clang-formatting #110

nicelhc13 · 2023-06-13T17:26:17Z

@yinengy found and reported this bug:

a.print_overview()
@spawn(ts[5], placement=gpu(1), inout=[(a,0)])
def check_array_evict():
    a.print_overview()

@spawn(ts[7], dependencies=[ts[5]], placement=[gpu(1)], input=[(a, 0)])
def check_array_evict2():
    a.print_overview()

@spawn(ts[8], dependencies=[ts[7]], placement=[gpu(2)], input=[(a, 0)])
def check_array_evict3():
    a.print_overview()

In this case, the data movement task for ts[8] runs before ts[5].
This happens because a data movement task's dependencies are currently computed as the intersection between the compute task's dependencies and the parray's referrers.
Here, ts[8]'s only dependency is ts[7], while a's referrer is ts[5], so the intersection is empty.
To fix this case, I make the parray's referrer the last task that accessed it with OUT or INOUT permission.
Whenever such a task is spawned, the old referrers are cleared.
For correctness, tasks should be spawned in program execution order, but this is not new and so I think it is ok.
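
To make the failure concrete, here is a minimal sketch of the two rules applied to the reported case. The function names and the string task labels are hypothetical and only model the description above; they are not Parla APIs.

```python
# Hypothetical model of the two dependency rules; none of these names are Parla APIs.

def old_datamove_deps(compute_deps, referrers):
    """Old rule: intersect the compute task's dependencies with the PArray's referrers."""
    return set(compute_deps) & set(referrers)


def last_writer_deps(last_writer):
    """Proposed rule: depend on the last task that accessed the PArray with OUT or INOUT."""
    return set() if last_writer is None else {last_writer}


# Reported case: ts[5] accesses `a` with inout; ts[7] and ts[8] only read it.
compute_deps_ts8 = {"ts7"}  # ts[8] explicitly depends on ts[7]
referrers_of_a = {"ts5"}    # referrer of `a`, as described in the report

old = old_datamove_deps(compute_deps_ts8, referrers_of_a)
new = last_writer_deps("ts5")
print(old)  # empty set: nothing orders ts[8]'s data movement after ts[5]
print(new)  # contains ts5
```

Under the old rule the data movement task for ts[8] ends up with no dependency at all, which is why it is free to run before ts[5].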

wlruys · 2023-06-13T20:22:55Z

I still have to review the bulk of the code, but I'm concerned about this comment:
"For correctness, tasks should be spawned in program execution order, but this is not new and so I think it is ok."

Tasks can be spawned in any order (it's bugged in old Parla: see ut-parla/Parla.py#146), and it's one of the reasons that explicit dependencies are more flexible than dependencies inferred from dataflow.

wlruys

Question) This means each PArray task_list will only ever have one element. Is this enough to handle all parent/child relations, slicing behavior, and dependencies? My gut says no, but I still have a hard time understanding the very few cases in which our slices can be used concurrently, and I can't think of an example that breaks. I think it looks (and probably is) okay, but I'm bringing it up in case we do find some edge case.

Comment) If this is the solution, it really can't be done at spawn time. It is not thread-safe, and things may be spawned out of order. It needs to happen after the first implicit topological sort, when tasks enter the mapping phase and become mappable. For example, consider spawning two subtrees with nested tasks where there are some sibling dependencies between the trees.

nicelhc13 · 2023-06-13T21:14:50Z

"I still have to review the bulk of the code, but I'm concerned about this comment: For correctness, tasks should be spawned in program execution order, but this is not new and so I think it is ok.

Tasks can be spawned in any order (it's bugged in old Parla: see ut-parla/Parla.py#146), and it's one of the reasons that explicit dependencies are more flexible than dependencies inferred from dataflow."

Ok, this is the reason why I was waiting for your review :-) I forgot that case. Let me think about this. In the worst case, we may need to require users to specify those dependencies, which I really don't want.

wlruys · 2023-06-13T21:22:44Z

All of the phase triggers/enqueues work as a topological sort, because the current precondition is that a task's dependencies must go through the phase first.

Within a phase there could potentially be reordering (there currently is none), but the order in which tasks enter and exit the phase is guaranteed to be a valid execution ordering.

If I remember correctly, that is checked here for mappable:

parla-experimental/src/c/backend/scheduler.cpp

Line 212 in 70a8f31

if (status.mappable && (task->get_state() < Task::MAPPED)) {
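
The ordering guarantee described in this comment can be illustrated with a small Kahn-style sketch: whatever the spawn order, a task only becomes mappable once all of its dependencies have been mapped. The function `mapping_order` and its inputs are hypothetical and assume every dependency also appears in `spawn_order`; this is not the scheduler's actual code.

```python
from collections import deque


def mapping_order(spawn_order, deps):
    """Return the order in which tasks become mappable.

    A task becomes mappable only after all of its dependencies have
    passed through the mapping phase (a Kahn-style topological sort).
    """
    remaining = {t: set(deps.get(t, ())) for t in spawn_order}
    dependents = {t: [] for t in spawn_order}
    for task, task_deps in remaining.items():
        for dep in task_deps:
            dependents[dep].append(task)

    ready = deque(t for t in spawn_order if not remaining[t])
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in dependents[task]:
            remaining[child].discard(task)
            if not remaining[child]:
                ready.append(child)
    return order


deps = {"ts7": ["ts5"], "ts8": ["ts7"]}
# Tasks are deliberately spawned in reverse program order.
order = mapping_order(["ts8", "ts7", "ts5"], deps)
print(order)
```

Even with the reversed spawn order, ts5 is mapped before ts7 and ts7 before ts8, which is the property a mapping-time fix could rely on.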

wlruys · 2023-06-13T21:25:05Z

I think the quick fix is just to delay adding tasks to parrays (and the reverse) until tasks are ordered, and then use the same logic as in this original PR. There might still be a better way, but that's what jumps out to me.

wlruys · 2023-06-13T21:27:14Z

... There might be problems with policy (and it might mess up the triggers) depending on where and when we add these extra dependencies.

nicelhc13 · 2023-06-13T21:32:06Z

I don't see any problem with your suggestion. I will rethink this problem based on it. Thank you!

wlruys · 2023-06-14T00:13:30Z

One edge case is going to be continuation tasks. They are only spawned once, but they hit all of the other phases more than once. There should already be a counter in the task metadata that tracks whether it's the first time or not, so you can just filter them out.
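
A rough sketch of that filter, assuming a per-task visit counter. `TaskMeta`, `mapping_visits`, and `register_on_first_visit` are hypothetical names chosen for illustration; the real metadata field may look different.

```python
from dataclasses import dataclass


@dataclass
class TaskMeta:
    """Hypothetical task metadata; the counter name is an assumption."""
    name: str
    mapping_visits: int = 0


def register_on_first_visit(task, parray_task_list):
    """Add the task to the PArray's list only the first time it is mapped."""
    first_visit = task.mapping_visits == 0
    task.mapping_visits += 1
    if first_visit:
        parray_task_list.append(task.name)
    return first_visit


task_list = []
continuation = TaskMeta("ts5")
# A continuation task passes through the mapping phase several times.
results = [register_on_first_visit(continuation, task_list) for _ in range(3)]
print(results, task_list)
```

Only the first pass registers the task, so repeated passes of a continuation do not add duplicate entries.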

wlruys · 2023-08-28T19:14:23Z

I forget the status of this. Does it need to be revisited?

wlruys · 2024-01-09T06:02:24Z

We need to fix this for NUWEST. I will try my best to think of a solution that changes the data movement dependency semantics.

nicelhc13 · 2024-01-12T15:48:15Z

Working on it starting today.

nicelhc13 · 2024-01-13T21:36:06Z

To sum up, here is a possible solution:
Delay adding a task to a parray's task list from the creation phase until right after the task's mapping phase completes. Because of the ordering constraint that a task's parents must be mapped before the task itself can be mapped, we can guarantee that the task list is valid.
Then, we remove the task from the list when its compute task completes.

The mapping policy actually gets more accurate information this way (that was your concern, wasn't it? Otherwise, I am missing something).

So when we create a data movement task for task 8, task 5 should be on the list, and task 5 will then be included in task 8's dependencies.
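
The lifecycle proposed above might look roughly like this. `PArrayTaskList` and its method names are hypothetical and stand in for whatever structure tracks the tasks attached to a parray.

```python
class PArrayTaskList:
    """Hypothetical stand-in for the list of tasks attached to one PArray."""

    def __init__(self):
        self.tasks = []

    def on_task_mapped(self, task):
        # Registration is delayed from spawn time to the end of mapping.
        self.tasks.append(task)

    def on_task_completed(self, task):
        # A completed compute task no longer needs to be waited on.
        self.tasks.remove(task)


a_tasks = PArrayTaskList()
a_tasks.on_task_mapped("ts5")
a_tasks.on_task_mapped("ts7")
# Candidates seen when the data movement task for ts[8] is created.
candidates_for_ts8 = list(a_tasks.tasks)
a_tasks.on_task_completed("ts5")
remaining = list(a_tasks.tasks)
print(candidates_for_ts8, remaining)
```

Because entries are appended in mapping order, the list reflects a valid execution order even when the spawn order does not.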

nicelhc13 · 2024-01-14T22:16:36Z

I changed how data movement tasks and their dependencies are created. Instead of looking up a parray's task list and computing the dependency tasks from it, each task now sets, for each parray, a dependency list for its dependent tasks, and propagates that list if the dependent task accesses the parray with read-only permission. I am not 100% sure this is correct, but the bug case, Cholesky, and sorting all seem to work correctly.
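
One possible reading of this propagation scheme, as a sketch only: a task that writes a parray hands itself to its dependents, while a read-only task forwards the list it inherited. The function `datamove_deps`, its arguments, and the "in"/"inout" mode strings are assumptions for illustration and do not mirror the merged code.

```python
def datamove_deps(order, deps, access):
    """Compute data movement dependencies per (task, parray).

    order:  tasks in a valid topological order
    deps:   task -> list of explicit dependencies
    access: task -> {parray name: "in", "out", or "inout"}
    """
    outgoing = {}  # task -> parray -> tasks handed to its dependents
    result = {}
    for task in order:
        # Collect what each dependency handed down for each parray.
        incoming = {}
        for dep in deps.get(task, ()):
            for parray, tasks in outgoing[dep].items():
                incoming.setdefault(parray, set()).update(tasks)

        outgoing[task] = {}
        for parray, mode in access.get(task, {}).items():
            inherited = incoming.get(parray, set())
            result[(task, parray)] = set(inherited)
            # Read-only access forwards the inherited list; a write replaces it.
            outgoing[task][parray] = set(inherited) if mode == "in" else {task}
    return result


deps = {"ts7": ["ts5"], "ts8": ["ts7"]}
access = {"ts5": {"a": "inout"}, "ts7": {"a": "in"}, "ts8": {"a": "in"}}
result = datamove_deps(["ts5", "ts7", "ts8"], deps, access)
print(result[("ts8", "a")])
```

In the reported case, ts7 only reads `a`, so it forwards ts5 to ts8, and the data movement task for ts[8] now waits on ts[5]. Tasks that do not access the parray at all are not covered by this sketch.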

wlruys

I think I understand this after our call. It seems like a good solution while we resolve other data semantics.

nicelhc13 requested review from wlruys and yinengy June 13, 2023 17:26

wlruys requested changes Jun 13, 2023

Fix data dependency

e17395a

nicelhc13 force-pushed the fix/datamove_dependency branch from 31b64e2 to e17395a Compare January 14, 2024 22:08

addcomments

22d8066

nicelhc13 force-pushed the fix/datamove_dependency branch from de8075e to 22d8066 Compare January 14, 2024 22:14

nicelhc13 requested a review from wlruys January 14, 2024 22:15

nicelhc13 force-pushed the fix/datamove_dependency branch from 896fe79 to 22d8066 Compare January 14, 2024 22:18

nicelhc13 and others added 2 commits January 14, 2024 22:18

style: apply ruff format

e0b22a7

remove unncessary file

f7da831

wlruys approved these changes Jan 18, 2024

wlruys and others added 2 commits January 17, 2024 21:46

Merge branch 'main' into fix/datamove_dependency

54667db

style: apply ruff format

b2c8485

wlruys merged commit 6bd3166 into main Jan 18, 2024
