[opt] Support atomic min/max in warp reduction optimization #2956
/format
      op_type == AtomicOpType::max || op_type == AtomicOpType::min;
}

AtomicOpType atomic_op_genre(AtomicOpType op_type) {
Why genre...?
See latest commit.
LGTM!
BTW, do you still remember why ND-Array wasn't supported in the beginning?
@@ -124,5 +124,57 @@ inline bool needs_grad(DataType dt) {
  return is_real(dt);
}

inline TypedConstant get_max_value(DataType dt) {
  if (dt->is_primitive(PrimitiveTypeID::i8)) {
nit: in the future maybe we can predefine something like a PER_TYPE(i8, int8), so we don't have to repeat ourselves here.
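For reference, the PER_TYPE idea could look roughly like the X-macro sketch below. Everything here is hypothetical and standalone: `PrimId`, `FOR_EACH_INT_TYPE`, and `max_value_of` are illustrative names, not Taichi's `PrimitiveTypeID`, `TypedConstant`, or any existing macro in the codebase.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a primitive type id enum (not Taichi's real one).
enum class PrimId { i8, i16, i32, i64, u8, u16, u32, u64 };

// The (type id, C++ type) pairs are listed exactly once.
#define FOR_EACH_INT_TYPE(X) \
  X(i8, std::int8_t)         \
  X(i16, std::int16_t)       \
  X(i32, std::int32_t)       \
  X(i64, std::int64_t)       \
  X(u8, std::uint8_t)        \
  X(u16, std::uint16_t)      \
  X(u32, std::uint32_t)      \
  X(u64, std::uint64_t)

// Largest representable value of the given type, widened to uint64_t
// (every integer maximum is non-negative, so it always fits).
inline std::uint64_t max_value_of(PrimId id) {
  switch (id) {
#define PER_TYPE(name, ctype) \
  case PrimId::name:          \
    return static_cast<std::uint64_t>(std::numeric_limits<ctype>::max());
    FOR_EACH_INT_TYPE(PER_TYPE)
#undef PER_TYPE
  }
  assert(false && "unknown type id");
  return 0;
}
```

A matching `min_value_of` would reuse the same list with `std::numeric_limits<ctype>::min()` and a signed return type, so the per-type branches are never written out by hand.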
@@ -134,7 +143,12 @@ void make_thread_local_offload(OffloadedStmt *offload) {
      TypeFactory::create_vector_or_scalar_type(1, data_type, true));

  auto zero = offload->tls_prologue->insert(
      std::make_unique<ConstStmt>(TypedConstant(data_type, 0)), -1);
      std::make_unique<ConstStmt>(dest.second == AtomicOpType::max
nit: get_reduction_init_value() ?
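A helper along those lines might be sketched as follows. This is a standalone illustration under assumptions: `ReduceOp`, `reduction_init_value`, and `combine` are made-up names, and the sketch uses `std::numeric_limits` rather than Taichi's `TypedConstant`. The point is only that the thread-local accumulator has to start at the identity of the reduction, which is 0 for add but the type's smallest value for max and its largest value for min.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical reduction kinds (not Taichi's AtomicOpType).
enum class ReduceOp { add, max, min };

// Identity element for each reduction: combining it with any value x yields x.
template <typename T>
T reduction_init_value(ReduceOp op) {
  switch (op) {
    case ReduceOp::add:
      return T(0);
    case ReduceOp::max:
      return std::numeric_limits<T>::lowest();  // smallest finite value
    case ReduceOp::min:
      return std::numeric_limits<T>::max();  // largest finite value
  }
  assert(false && "unknown reduction op");
  return T(0);
}

// Applies one reduction step to an accumulator and a new value.
template <typename T>
T combine(ReduceOp op, T acc, T x) {
  switch (op) {
    case ReduceOp::add:
      return acc + x;
    case ReduceOp::max:
      return std::max(acc, x);
    case ReduceOp::min:
      return std::min(acc, x);
  }
  assert(false && "unknown reduction op");
  return acc;
}
```

Starting a max reduction from 0 would silently give the wrong answer whenever every input is negative, which is why the initial value depends on the op.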
There was no alias analysis for |
Related issue = #2487, #2951, #2952
Previously, only atomic add/sub were supported in the warp reduction optimization. This PR adds support for atomic min/max as well.
Thanks @yolo2themoon for providing the following example:
Profiling results before this PR:
Profiling results after this PR:
We can see ~100x speedup for atomic max.
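For intuition about where a speedup of this kind comes from, here is a rough CPU analogue of the idea, written with `std::thread` and `std::atomic`. It is only a sketch under assumptions, not the code Taichi generates: `atomic_max` and `reduce_max` are illustrative names. Each worker reduces its slice into a private local value and issues a single atomic max at the end, instead of one atomic per element.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <limits>
#include <thread>
#include <vector>

// Atomic max built from a compare-exchange loop.
inline void atomic_max(std::atomic<int> &target, int value) {
  int current = target.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak refreshes `current`, so the loop
  // re-checks whether `value` is still larger before retrying.
  while (current < value &&
         !target.compare_exchange_weak(current, value,
                                       std::memory_order_relaxed)) {
  }
}

// Max of `data` using `num_threads` workers; returns INT_MIN for empty input.
inline int reduce_max(const std::vector<int> &data, unsigned num_threads) {
  assert(num_threads > 0);
  std::atomic<int> result{std::numeric_limits<int>::lowest()};
  const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < num_threads; ++t) {
    workers.emplace_back([&data, &result, chunk, t] {
      const std::size_t begin = std::min(data.size(), t * chunk);
      const std::size_t end = std::min(data.size(), begin + chunk);
      // Thread-local accumulator, initialized to the identity of max.
      int local = std::numeric_limits<int>::lowest();
      for (std::size_t i = begin; i < end; ++i) {
        local = std::max(local, data[i]);
      }
      atomic_max(result, local);  // one contended atomic per worker
    });
  }
  for (auto &w : workers) {
    w.join();
  }
  return result.load();
}
```

With N elements and T workers, the shared location sees T atomic updates rather than N, so contention on it drops accordingly.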