transformations: Implement stencil inlining. #2615

Open
wants to merge 11 commits into
base: main

Conversation

PapyChacal
Collaborator

Apologies for the monster PR; I might be able to split it in two 🤔

@PapyChacal PapyChacal added the transformations Changes or adds a transformatio label May 21, 2024
@PapyChacal PapyChacal self-assigned this May 21, 2024

codecov bot commented May 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.62%. Comparing base (97727df) to head (13e93ad).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2615      +/-   ##
==========================================
+ Coverage   89.61%   89.62%   +0.01%     
==========================================
  Files         360      361       +1     
  Lines       46198    46334     +136     
  Branches     6985     7026      +41     
==========================================
+ Hits        41399    41528     +129     
- Misses       3724     3727       +3     
- Partials     1075     1079       +4     


@PapyChacal PapyChacal marked this pull request as ready for review May 21, 2024 16:24
@AntonLydike AntonLydike changed the title transformations: implement stencil inlining. transformations: Implement stencil inlining. May 22, 2024
@tobiasgrosser
Contributor

How nice. I am curious about the performance numbers.

@PapyChacal
Collaborator Author

> How nice. I am curious about the performance numbers.

Me too!

There is some polishing to do: it does not seem to work exactly as expected on big kernels.
Because of my KISS implementation, this pass as-is also explodes combinatorially on big kernels.
So I'll first integrate with #2623 to cut off those explosions and be able to pinpoint exactly where I am still missing something!
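
As an illustration of how a naive inliner can blow up, here is a toy model, not the implementation in this PR. It assumes that inlining a producer stencil into a consumer duplicates the producer's accesses once per consumer access, with offsets added together and no deduplication. The names `inline_accesses` and `star` are hypothetical.

```python
def inline_accesses(consumer_offsets, producer_offsets):
    """Offsets read from the producer's input after naive inlining.

    Every consumer access at offset c gets its own copy of the producer,
    so each producer offset p becomes c + p. Nothing is deduplicated.
    """
    return [
        tuple(c + p for c, p in zip(consumer, producer))
        for consumer in consumer_offsets
        for producer in producer_offsets
    ]


# A 1D three-point stencil, used as both producer and consumer.
star = [(-1,), (0,), (1,)]

accesses = star
for depth in range(2, 6):
    accesses = inline_accesses(star, accesses)
    # Total accesses grow as 3**depth; distinct offsets only as 2*depth + 1.
    print(depth, len(accesses), len(set(accesses)))
```

In this model the number of duplicated accesses grows exponentially with the length of the chain while the number of distinct offsets grows linearly, which is why some form of cut-off or deduplication helps on large kernels.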

@PapyChacal PapyChacal marked this pull request as draft May 22, 2024 12:42
Contributor

@georgebisbas georgebisbas left a comment

Is this now ready to review?

@PapyChacal
Collaborator Author

@georgebisbas

> Is this now ready to review?

Just finished my checks, it is now!

@tobiasgrosser

> How nice. I am curious about the performance numbers.

Here they are:

| kernel | OEC | xDSL | xDSL w/ inlining | Relative error w/ inlining |
|---|---|---|---|---|
| laplace | 332 | 313 | 326 | 3.19949e-11 |
| fastwavesuv | 109 | 409 | 122 | 6.31987e-08 |
| fvtp2d_flux | 76 | 525 | 72 | 4.16758e-16 |
| fvtp2d_qi | 107 | 549 | 80 | 9.55451e-13 |
| fvtp2d_qj | 137 | 987 | 131 | 0 |
| hadvuv | 91 | 585 | 90 | 3.41916e-08 |
| hadvuv5th | 109 | 570 | 109 | 4.04618e-08 |
| hdiffsa | 74 | 294 | 83 | 4.44881e-12 |
| nh_p_grad | 94 | 323 | 104 | 1.57696e-13 |
| p_grad_c | 57 | 158 | 67 | 1.34918e-14 |
| uvbke | 28 | 48 | 28 | 3.77537e-16 |
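
For a quick read of the table above, the ratios between its columns can be computed directly. This is a small sketch that uses only the numbers reported in the table; the names `timings`, `speedup_from_inlining` and `ratio_to_oec` are illustrative and not part of the PR.

```python
# Timings copied from the table above: (OEC, xDSL, xDSL w/ inlining).
timings = {
    "laplace": (332, 313, 326),
    "fastwavesuv": (109, 409, 122),
    "fvtp2d_flux": (76, 525, 72),
    "fvtp2d_qi": (107, 549, 80),
    "fvtp2d_qj": (137, 987, 131),
    "hadvuv": (91, 585, 90),
    "hadvuv5th": (109, 570, 109),
    "hdiffsa": (74, 294, 83),
    "nh_p_grad": (94, 323, 104),
    "p_grad_c": (57, 158, 67),
    "uvbke": (28, 48, 28),
}


def speedup_from_inlining(row):
    """xDSL time without inlining divided by xDSL time with inlining."""
    _oec, xdsl, xdsl_inlined = row
    return xdsl / xdsl_inlined


def ratio_to_oec(row):
    """xDSL time with inlining divided by OEC time (1.0 means parity)."""
    oec, _xdsl, xdsl_inlined = row
    return xdsl_inlined / oec


for kernel, row in timings.items():
    print(f"{kernel:12s} {speedup_from_inlining(row):6.2f}x {ratio_to_oec(row):6.2f}")
```

A speedup below 1 (as for `laplace`) means the inlined version was slower than the non-inlined one for that kernel.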

@PapyChacal PapyChacal marked this pull request as ready for review May 28, 2024 11:08
@tobiasgrosser
Contributor

Nice. The performance looks good. Why do we get a relative error here? Should inlining not perform exactly the same computation?

@georgebisbas
Contributor

Are these seconds?

@PapyChacal
Collaborator Author

> Nice. The performance looks good. Why do we get a relative error here? Should inlining not perform exactly the same computation?

The relative error is xDSL inlined vs OEC inlined.

I'm not sure yet what exactly causes the slight differences, but I could look into it; it might be quite the rabbit hole, given that different versions of CUDA, MLIR and clang are at play.

As for inlining performing the same computation: I'm not confident about OEC on this side yet, or I may be missing something. In my tests, OEC plain crashes on some examples if I don't use inlining. On other examples that run without issue, OEC's inlining appears to change the results fairly significantly; I could look into that too, most likely with higher priority than xDSL vs. old MLIR.

That's why, for now, I reported the relative error between the two frameworks with inlining enabled. That is the configuration in which OEC behaves without surprises, and it at least demonstrates that the results are consistent there.

I don't mind waiting until all those grey areas are clarified, if anybody prefers.
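
The thread does not say which definition of relative error was used for the table. As a minimal sketch, assuming a max-norm definition (largest absolute difference divided by the largest absolute reference value), two flattened output buffers could be compared as follows. The function name `relative_error` and its arguments are hypothetical.

```python
def relative_error(result, reference):
    """Max-norm relative error: max|result - reference| / max|reference|.

    Assumption: this is one common definition, not necessarily the one
    behind the numbers reported in this PR.
    """
    if len(result) != len(reference):
        raise ValueError("buffers must have the same length")
    if not reference:
        raise ValueError("buffers must not be empty")
    max_diff = max(abs(a - b) for a, b in zip(result, reference))
    max_ref = max(abs(b) for b in reference)
    if max_ref == 0.0:
        # Reference is all zeros: fall back to the absolute error.
        return max_diff
    return max_diff / max_ref


# Identical buffers give exactly 0, as reported for fvtp2d_qj.
print(relative_error([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

With this definition, a value of 0 means the two outputs are bit-for-bit identical, and values near 1e-16 are at the level of double-precision rounding.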

@PapyChacal
Collaborator Author

PapyChacal commented May 28, 2024

> Are these seconds?

Milliseconds! It was just the first thing I got working on that side; I can now fine-tune if anyone wants to see different measurements.

Those are 512 iterations over 64x64x64 domains with a halo size of 4 in all directions (i.e. 72x72x72 buffers, computation over the central 64x64x64).

NB: 512 iterations without buffer swapping, just repeating the same output buffer update from the same inputs. I'm actually not sure how this influences performance measurements on GPU 🤔 But FWIW, both frameworks are measured the same way here.
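
The measurement setup described above can be sketched as follows. This is a host-side illustration in plain Python, not the actual GPU benchmark harness; all names (`buffer_extent`, `interior_range`, `time_repeated`, `laplace_1d`) are hypothetical, and the toy kernel is 1D for brevity.

```python
import time

DOMAIN = 64       # points per dimension in the computed region
HALO = 4          # halo width on each side
ITERATIONS = 512  # repetitions, without swapping input and output


def buffer_extent(domain, halo):
    """Allocated size per dimension: the domain plus a halo on both sides."""
    return domain + 2 * halo


def interior_range(domain, halo):
    """Indices of the central region that the kernel actually updates."""
    return range(halo, halo + domain)


def time_repeated(kernel, src, dst, iterations):
    """Call kernel(src, dst) repeatedly on the same buffers; return total ms."""
    start = time.perf_counter()
    for _ in range(iterations):
        kernel(src, dst)  # same inputs and same output every time
    return (time.perf_counter() - start) * 1e3


def laplace_1d(src, dst, domain=8, halo=1):
    """Toy 1D stencil over the interior, reading one halo point per side."""
    for i in interior_range(domain, halo):
        dst[i] = src[i - 1] - 2.0 * src[i] + src[i + 1]


print(buffer_extent(DOMAIN, HALO))  # 72, matching the 72x72x72 buffers

src = [float(i * i) for i in range(buffer_extent(8, 1))]
dst = [0.0] * len(src)
elapsed_ms = time_repeated(laplace_1d, src, dst, iterations=16)
```

Because the buffers are never swapped, every iteration recomputes the same result. A real GPU measurement would additionally need to synchronize with the device before reading the clock, which this sketch does not model.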

Contributor

@georgebisbas georgebisbas left a comment

I am happy with the reported relative error, so I approve.

4 participants