transformations: Implement stencil inlining. #2615
base: main
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
##            main    #2615   +/-   ##
==========================================
+ Coverage   89.61%   89.62%   +0.01%
==========================================
  Files         360      361       +1
  Lines       46198    46334     +136
  Branches     6985     7026      +41
==========================================
+ Hits        41399    41528     +129
- Misses       3724     3727       +3
- Partials     1075     1079       +4
How nice. I am curious about the performance numbers.
Me too! There is still some polishing to do; it does not seem to work exactly as expected on big kernels.
…from duplicated ones.
Co-authored-by: Sasha Lopoukhine <superlopuh@gmail.com>
Is this now ready to review?
Just finished my checks, it is now!
Here they are:
Nice. The performance looks good. Why do we get a relative error here? Shouldn't inlining perform exactly the same computation?
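To make the question concrete, here is a minimal 1-D sketch in plain Python of what inlining a producer stencil into its consumer means. It is an illustration only, not xDSL's stencil-dialect transformation; `laplacian_at`, `two_stage` and `inlined` are hypothetical names. The inlined version substitutes the producer expression at each access and evaluates the same operations in the same order, so here it yields bit-identical results.

```python
from typing import List


def laplacian_at(u: List[float], i: int) -> float:
    """Producer stencil: 1-D Laplacian at index i (reads i-1, i, i+1)."""
    return u[i - 1] - 2.0 * u[i] + u[i + 1]


def two_stage(u: List[float], halo: int) -> List[float]:
    """Unfused version: materialize the producer into a temporary buffer,
    then run the consumer, which reads tmp[i-1] and tmp[i+1]."""
    assert halo >= 2, "consumer + producer together reach 2 points outward"
    n = len(u)
    tmp = [0.0] * n
    # The producer must cover the interior extended by one point per side.
    for i in range(halo - 1, n - halo + 1):
        tmp[i] = laplacian_at(u, i)
    return [tmp[i - 1] + tmp[i + 1] for i in range(halo, n - halo)]


def inlined(u: List[float], halo: int) -> List[float]:
    """Inlined version: no temporary; the producer expression is
    recomputed at every point where the consumer accesses it."""
    assert halo >= 2, "consumer + producer together reach 2 points outward"
    n = len(u)
    return [laplacian_at(u, i - 1) + laplacian_at(u, i + 1)
            for i in range(halo, n - halo)]


u = [0.1 * k + 0.37 * k * k for k in range(12)]
print(two_stage(u, 2) == inlined(u, 2))  # True
```

With pure substitution the results match exactly, so differences between two toolchains may come from elsewhere, for instance from how each compiler lowers and optimizes the floating-point arithmetic; that is a hypothesis, not something measured in this thread.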
Are these seconds?
The relative error is xDSL-inlined vs. OEC-inlined. I'm not sure yet what exactly causes the slight differences; I could look into it, but it might be quite the rabbit hole, given that different versions of CUDA, MLIR and clang are at play. Regarding inlining performing the same computation: I'm not yet confident about OEC's side (or that I'm not missing something). In my tests, OEC outright crashes on some examples if I don't use inlining. On other examples that run without issue, OEC's inlining appears to change the results relatively significantly; I could look into that too, probably with higher priority than xDSL vs. old MLIR. That's why, for now, I reported the relative error of both frameworks with inlining enabled. This is unsurprising coming from OEC, and it at least demonstrates that things are consistent there. I don't mind waiting until all those grey areas are clarified if anybody prefers.
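The thread does not say which relative-error metric was used. As one common choice, here is a sketch of a normwise relative error between two flattened output buffers, e.g. an xDSL-inlined output compared against an OEC-inlined one used as the reference. The function name `relative_error` and the choice of the L2 norm are assumptions.

```python
import math
from typing import Sequence


def relative_error(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Normwise relative error ||candidate - reference||_2 / ||reference||_2.

    Assumed metric for illustration; a per-element maximum is another option.
    """
    if len(reference) != len(candidate):
        raise ValueError("buffers must have the same number of elements")
    # fsum keeps the accumulation accurate over large buffers.
    diff_norm = math.sqrt(math.fsum((c - r) ** 2 for r, c in zip(reference, candidate)))
    ref_norm = math.sqrt(math.fsum(r * r for r in reference))
    if ref_norm == 0.0:
        raise ValueError("reference buffer is all zeros; relative error undefined")
    return diff_norm / ref_norm


print(relative_error([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
```

A normwise metric stays meaningful when individual output values are close to zero, where a per-element ratio can blow up even though the absolute difference is tiny.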
Milliseconds! This is just the first thing I got working on that side; I can now fine-tune it if anyone wants to see different measurements. Those are 512 iterations over 64x64x64 domains with a halo size of 4 in all directions (i.e. 72x72x72 buffers, with the computation over the central 64x64x64). NB: these are 512 iterations without buffer swapping, just repeating the same output-buffer update from the same inputs. I'm actually not sure how this influences performance measurements on GPU 🤔 But FWIW, both frameworks are measured the same way here.
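The benchmark geometry above can be checked with a few lines of Python. This is a sketch: `buffer_shape`, `interior_slices` and `points_per_second` are hypothetical helpers, and `elapsed_ms` stands for whatever total time the benchmark harness measures (no measured value is assumed here).

```python
import math
from typing import Tuple


def buffer_shape(domain: Tuple[int, ...], halo: int) -> Tuple[int, ...]:
    """Each dimension gets `halo` extra points on both sides."""
    return tuple(d + 2 * halo for d in domain)


def interior_slices(domain: Tuple[int, ...], halo: int) -> Tuple[slice, ...]:
    """Slices selecting the computed (central) region inside the padded buffer."""
    return tuple(slice(halo, halo + d) for d in domain)


def points_per_second(domain: Tuple[int, ...], iterations: int, elapsed_ms: float) -> float:
    """Interior points updated per second, summed over all iterations."""
    return iterations * math.prod(domain) / (elapsed_ms / 1000.0)


domain = (64, 64, 64)
halo = 4
print(buffer_shape(domain, halo))             # (72, 72, 72)
print(math.prod(buffer_shape(domain, halo)))  # 373248 points allocated per buffer
print(512 * math.prod(domain))                # 134217728 interior updates in total
```

Dividing a measured total time by the 2^27 interior updates gives a throughput figure that is easier to compare across domain sizes than raw milliseconds.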
I am happy with the reported relative error, so I approve.
Apologies for the monster PR; I might be able to split it in two 🤔