Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gt-blas: sycl / mkl oneAPI blas uses 64bit index type (ILP64 only) #127

Closed
bd4 opened this issue Aug 1, 2021 · 6 comments
Closed

gt-blas: sycl / mkl oneAPI blas uses 64bit index type (ILP64 only) #127

bd4 opened this issue Aug 1, 2021 · 6 comments
Assignees

Comments

@bd4
Copy link
Contributor

bd4 commented Aug 1, 2021

MKL oneAPI doesn't currently support LP64. This shows up for getrf/getrs calls for the pivot array. Since this is allocated on device, there is not an easy way around it, especially re the Fortran interface for GENE. Some options:

  1. Always take 32bit args, for SYCL copy to a / from a 64bit array internally, so the user never sees the difference
  2. Define a gpublas_allocate_pivot_array which allocates correct size, and require users to use this

My inclination is to do (1) for now, and if it becomes a performance bottleneck we can revisit. If it's a big batch, then the computation should dominate anyway, so extra pivot array copies shouldn't be too horrible. Also, in the future LP64 may be supported better in oneAPI, then we can switch and they will all be consistent.

@bd4 bd4 self-assigned this Aug 1, 2021
@germasch
Copy link
Contributor

germasch commented Aug 2, 2021

I'm not really sure how much hassle it involves, but I think the best solution would be to define a GTBLAS_INTEGER_KIND or something, and use that on the Fortran side to create the right kind of index arrays in the first place. PETSc does something like that, I believe.

@bd4
Copy link
Contributor Author

bd4 commented Aug 2, 2021

That is basically what I do now for the C++ layer - there is a gt::blas::index_t type alias, and the getrf/rs routines take a pointer to this type for the pivot array. I could add a static global to the C layer that is accessible from Fortran, that exposes the size in bytes of the type. So this is option (3), type alias + size for Fortran interop. From an implementation in gtensor perspective, this is definitely the easiest, probably not too bad in GENE. I guess it comes down to whether ILP64 will remain the favored intel implementation, or if LP64 will reach parity, at which point we can just tell everyone to use that and all the index_t's will be the same. I'll see if I can get any sense of this.

@germasch
Copy link
Contributor

germasch commented Aug 2, 2021

A static global won't work well, it needs to be a compile time constant so that the Fortran compiler can use the correct type. Ie., it basically requires a macro to switch the type inside of Fortran (I think one might be able to make the kind parameter itself a Fortran compile time constant, but that'd still need switching based on a macro, so using a macro directly is probably simpler).

@bd4
Copy link
Contributor Author

bd4 commented Aug 2, 2021

I was imagining it could be an opaque C_PTR in Fortran, but I guess that may not be the case. I'm inclined to do this via cmake/Makefile hackery in client code based on GTENSOR_DEVICE_X. With include files, it would have to be Fortran only and couldn't reuse the existing type alias in any way I can think of, so requires ugly duplication no matter what. If it's going to be a hack, trying to make it simplest / laziest version.

@germasch
Copy link
Contributor

germasch commented Aug 2, 2021

I guess for this particular case you're right, it could be handled entirely opaquely on the Fortran side -- but I still don't much like that (e.g., we already had the case where someone wanted to actually look at the pivots). And it doesn't really handle the more general case where indices are actually calculated/used on the Fortran side (I think that's the case for the sparse solver, currently, though cusparse is a bit of a separate problem).

@bd4
Copy link
Contributor Author

bd4 commented Jul 21, 2022

The GPU API and the CPU API are largely separate, so it's possible to use LP64 for CPU and link CPU MKL appropriately, and the GPU API will still use ILP64. This will likely break if using SYCL_DEVICE_FILTER=host, but that is not something we are trying to support anyway, even if it might occasionally be useful for debugging.

@bd4 bd4 closed this as completed Jul 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants