Skip to content

[SYCL] Do not lock unconditionally while access queue_iml::MInOrderExternalEvent #17575

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

Alexandr-Konovalov
Copy link
Contributor

@Alexandr-Konovalov Alexandr-Konovalov commented Mar 21, 2025

queue_impl::MInOrderExternalEvent is empty on hot path, use a flag to avoid locking when empty.

queue_impl::MMissedCleanupRequests and queue_impl::MInOrderExternalEvent are
empty on hot path, check a flag instead or before the locking.
Copy link
Contributor

@aelovikov-intel aelovikov-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I'd prefer it to be two different PRs (in case something would regress, easier bisecting/reverts).

Also, I'd rather see a bit extra boilerplate code via abstracting
class Helper{ atomic flag; other_data b; methods... };

so that we could ensure that whenever b is set/unset the flag is always updated properly.

@@ -1040,6 +1047,8 @@ class queue_impl {
// the fallback implementation of profiling info
bool MFallbackProfiling = false;

// Is value presented in MInOrderExternalEvent?
std::atomic_bool MHasValueInOrderExternalEvent = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the name is confusing, maybe MInOrderExteranEventIsSet would be better?

Second, is it correct to assume that we still want the optional because default-constructed/empty event is heavy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second, is it correct to assume that we still want the optional because default-constructed/empty event is heavy?

Yep, one is heavy. I hope one day we able to implement an empty sycl::event with just empty std::shared_ptr<detail::event_impl> impl, then optional became unneeded.

@@ -801,6 +802,7 @@ void queue_impl::revisitUnenqueuedCommandsState(
else {
std::lock_guard<std::mutex> RequestLock(MMissedCleanupRequestsMtx);
MMissedCleanupRequests.push_back(CompletedHostTask->getCommandGraph());
MAreCleanupRequestsMissed.store(true, std::memory_order_release);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the following not an issue:

Thread A        Thread B
acquire mtx
                check atomic without mutex, empty
push_back
atomic_store_true

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this is exactly the question I would like to discuss with the fellow sycl experts!

According to my understanding, the change is correct, because the situation you describing is undistinguishing from a situation when Thread A just staying, say, at the beginning of queue_impl::revisitUnenqueuedCommandsState(), i.e. there is no reason for Thread B to expect something in MMissedCleanupRequests at this point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sergey-semenov , @KseniyaTikhomirova would be much more knowledgeable about this than I am.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chatted with @KseniyaTikhomirova about this today. Like Alexandr said, the case is indistinguishable from the first thread being earlier in its execution path.
With the synchronization being made less strict, it will be slightly more likely that we'll exit queue::wait with some dependency related data still present in the queue. But that's not a functional problem since that data will be handled during the next cleanup call.

@Alexandr-Konovalov
Copy link
Contributor Author

Honestly, I'd prefer it to be two different PRs (in case something would regress, easier bisecting/reverts).

Sure, will do, after (or if) the correctness issue would be resolved.

@Alexandr-Konovalov
Copy link
Contributor Author

Honestly, I'd prefer it to be two different PRs (in case something would regress, easier bisecting/reverts).

Sure, will do, after (or if) the correctness issue would be resolved.

If you are ok with the change, it's turn out 3 PR: with the template addition and 2 with enabling it for in-order event and for missed requests. Is it sound good?

@aelovikov-intel
Copy link
Contributor

If you are ok with the change, it's turn out 3 PR: with the template addition and 2 with enabling it for in-order event and for missed requests. Is it sound good?

Adding yet unused helper in a separate PR will mean a unittest needs to be introduced together with it :) I think if you'd have two different PRs that introduce the same helper in otherwise unmodified area of the code, they won't actually cause any merge conflicts, regardless of the order in which they'll go in.

@Alexandr-Konovalov
Copy link
Contributor Author

If you are ok with the change, it's turn out 3 PR: with the template addition and 2 with enabling it for in-order event and for missed requests. Is it sound good?

Adding yet unused helper in a separate PR will mean a unittest needs to be introduced together with it :) I think if you'd have two different PRs that introduce the same helper in otherwise unmodified area of the code, they won't actually cause any merge conflicts, regardless of the order in which they'll go in.

Interesting. To be honest, I have a different model here. I thought about single PR with 3 commits inside (adding CheckLockCheck and enabling for MMissedCleanupRequests and for MInOrderExternalEvent in separate commits). Then, merging to sycl should be done without squash.

But your solution with 2 PRs is also possible.

Are you ok with the code, so I can start splitting it to two PRs?

@aelovikov-intel
Copy link
Contributor

Then, merging to sycl should be done without squash.

We don't do that in this project (I'm not even sure if that's enabled).

MDataPresent.store(true, std::memory_order_release);
func(MData);
}
template <typename F> void get(F &&func) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think modifications on get might be unexpected. Would pop work as a name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Then, it might be reasonable to change put to pop.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean push?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, typo. Yes, push/pop instead of put/get.

@@ -850,12 +855,12 @@ class queue_impl {
auto EventRet = Handler.finalize();
EventImplPtr EventRetImpl = getSyclObjImpl(EventRet);
if (Type == CGType::CodeplayHostTask)
Deps.UnenqueuedCmdEvents.push_back(EventRetImpl);
Deps.UnenqueuedCmdEvents.push_back(std::move(EventRetImpl));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getSyclObjImpl(EventRet) vs std::move(EventRetImpl) is almost equal, I don't think the temporary variable is justified anymore. IMO, inline it and maybe add a comment for the future as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's turn out this was already modified on sycl branch (reference used instead). So, dropped from the patch.

std::lock_guard<std::mutex> Lock(MInOrderExternalEventMtx);
MInOrderExternalEvent = Event;
MInOrderExternalEvent.put(
[&Event](std::optional<event> &InOrderExternalEvent) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't mind [&](auto &InOrderExternalEvent) { ... } here and similarly elsewhere, but won't insist.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@aelovikov-intel
Copy link
Contributor

Are you ok with the code, so I can start splitting it to two PRs?

I am. @sergey-semenov , any objections?

@Alexandr-Konovalov
Copy link
Contributor Author

Are you ok with the code, so I can start splitting it to two PRs?

I am. @sergey-semenov , any objections?

@sergey-semenov, what do you think?

@sergey-semenov
Copy link
Contributor

No objections from me.

@Alexandr-Konovalov Alexandr-Konovalov changed the title [SYCL] Do not lock unconditionally while access queue_iml members [SYCL] Do not lock unconditionally while access queue_iml::MInOrderExternalEvent Apr 7, 2025
@Alexandr-Konovalov
Copy link
Contributor Author

Colleagues @aelovikov-intel , @sergey-semenov , what do you think?

@aelovikov-intel
Copy link
Contributor

aelovikov-intel commented Apr 9, 2025

check a flag instead or before

This might be hard to read. Maybe "use a flag to avoid locking when empty"?

@Alexandr-Konovalov
Copy link
Contributor Author

check a flag instead or before

This might be hard to read. Maybe "use a flag to avoid locking when empty"?

Sure, done.

@aelovikov-intel aelovikov-intel merged commit 10da83a into intel:sycl Apr 10, 2025
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants