Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consistent size and index types #191

Open
bd4 opened this issue Aug 5, 2022 · 7 comments
Open

consistent size and index types #191

bd4 opened this issue Aug 5, 2022 · 7 comments
Assignees

Comments

@bd4
Copy link
Contributor

bd4 commented Aug 5, 2022

Currently gtensor uses int based shape_type, while low level indexing like gtensor_storage index operator are gt::size_type, which is an alias for std::size_t (typically unsigned long). Furthermore, the calc_size(shape) helper returns size_type, but there are places in the code which index over a total size with int or cast it to int.

To make matters more interesting, the dim3 type in CUDA and HIP is uint32_t (actually unsigned in for CUDA but that is generally uint32_t). It seems like we should at least be consistent with dim3 types?

@bd4
Copy link
Contributor Author

bd4 commented Aug 5, 2022

FWIW, the SYCL spec landed on using size_t for all index types, and int for dimensions: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#subsec:item.class

The fact that the gpu blas APIs typically use int indexes makes things fun here. Bottom line, we don't want silent failures if overflow happens, even in release builds.

@bd4 bd4 self-assigned this Aug 5, 2022
@germasch
Copy link
Contributor

germasch commented Aug 5, 2022

I guess I'll agree that while there was some intention behind the different size for multi-d index vs 1-d index, it's really not possible to not do this in a way that doesn't cause potential subtle overflow bugs.

So probably what we should do is to use <class>::size_type consistently everywhere, get rid of int, with the former being defaulted to gt::size_type which in itself should probably default to std::size_t (but one could argue about uint).

Maybe actually ssize_t would be better, since there are times where negative indices may make some sense. I think this need some more thinking/investigation...

@bd4
Copy link
Contributor Author

bd4 commented Aug 5, 2022

yes this is subtle. As a start, I am using size_t for index calculations inside kernels and for SYCL and whenever doing interations over container.size() or calc_size(shape). This involves no change to the external interface.

@bd4
Copy link
Contributor Author

bd4 commented Aug 5, 2022

There are potentially 3 types at play here:

  1. size_type - for addressing in to memory
  2. index_type, backend - what the underlying GPU lib uses, e.g. for dim3 in CUDA (uint32_t)
  3. index_type, gt high level - currently int, as part of shape_type, not convinced this should be different from (2)

I'm inclined to make gt::index_type a standard that should be used e.g. for GT_LAMBDA (gt::index_type i), and then we have knobs to change this and aren't stuck with a bunch of code with hard coded ints. I am also inclined to change the default to uint32_t on HIP/CUDA and to size_t for SYCL, to match the backend.

@bd4
Copy link
Contributor Author

bd4 commented Aug 5, 2022

Having both size_type and index_type is a bit confusing here. One is 1d, the other is N-d, but of course 1d is just a special case of N-d, so it's wierd.

@germasch
Copy link
Contributor

germasch commented Aug 6, 2022

Having both size_type and index_type is a bit confusing here. One is 1d, the other is N-d, but of course 1d is just a special case of N-d, so it's wierd.

Right, I think I just wrote something along those same lines in another thread.

@bd4
Copy link
Contributor Author

bd4 commented Aug 7, 2022

I think an interesting experiment here, would be to change all indexes to size_type and change default size_type to uint32_t, and see if (a) tests, benchmarks etc pass, (b) gene passes tests, and (c) performance of benchmarks and/or GENE improves significantly. If the answer is yet, then I think it's worth doing. Other than some reverse iteration direction loops in bandsolver which can be rewritten, I don't see negative indexes being used anywhere. It would also be interesting to see if using uint32_t consistently works around the ROCm compiler bug with -O2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants