Skip to content

batched api / eti documentation #2643

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 48 commits into
base: develop
Choose a base branch
from

Conversation

alphataubio
Copy link

@alphataubio alphataubio commented May 16, 2025

  • batched/dense
  • batched/sparse
  • eti

@lucbv lucbv self-requested a review May 16, 2025 20:49
@alphataubio alphataubio force-pushed the alphataubio-docs branch 2 times, most recently from 355e932 to 2b36852 Compare May 16, 2025 23:03
jgfouca and others added 19 commits May 18, 2025 11:06
* Par_ilut enhancements and fixes

1) Improve documentation of parameters and confusing code segments
2) Do not let leftover junk in L_entries and U_entries to cause the
   sorted check to fail. Those are output-only parameters.
3) Allow the user to skip the residual computation by setting
   residual_norm_delta_stop to zero (or < 0). This can save some
   memory if memory is tight.

Signed-off-by: James Foucar <jgfouca@sandia.gov>

* Docs

Signed-off-by: James Foucar <jgfouca@sandia.gov>

* Fix doc build errors

Signed-off-by: James Foucar <jgfouca@sandia.gov>

---------

Signed-off-by: James Foucar <jgfouca@sandia.gov>
Signed-off-by: alphataubio <alphataubio@gmail.com>
Signed-off-by: romintomasetti <romin.tomasetti@gmail.com>
Signed-off-by: alphataubio <alphataubio@gmail.com>
* Avoid illegal use of View::stride and View::extent in BLAS 1

As the documentation says you can't legally call extent(1)/stride(1)
on a rank-1 View.

Signed-off-by: Christian Trott <crtrott@sandia.gov>

* Avoid illegal use of View::stride and View::extent in Batched

As the documentation says you can't legally call extent(1)/stride(1)
on a rank-1 View

Signed-off-by: Christian Trott <crtrott@sandia.gov>

---------

Signed-off-by: Christian Trott <crtrott@sandia.gov>
Signed-off-by: alphataubio <alphataubio@gmail.com>
Including the `std_algorithms/Kokkos_IsSorted.hpp` header is needed for Trilinos builds to compile

Addresses issue kokkos#2612

Signed-off-by: Nathan Ellingwood <ndellin@sandia.gov>
Signed-off-by: alphataubio <alphataubio@gmail.com>
This older rocm toolchain is very slow compared to the new
rocm 6.2.1 so it is disabled in favor of that newer version.

Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov>
Signed-off-by: alphataubio <alphataubio@gmail.com>
…#2615)

Signed-off-by: Daniel Arndt <arndtd@ornl.gov>
Signed-off-by: alphataubio <alphataubio@gmail.com>
Signed-off-by: alphataubio <alphataubio@gmail.com>
Signed-off-by: alphataubio <alphataubio@gmail.com>
Signed-off-by: alphataubio <alphataubio@gmail.com>
Signed-off-by: alphataubio <alphataubio@gmail.com>
Signed-off-by: alphataubio <alphataubio@gmail.com>
Signed-off-by: alphataubio <alphataubio@gmail.com>
Signed-off-by: alphataubio <alphataubio@gmail.com>
Signed-off-by: alphataubio <alphataubio@gmail.com>
Signed-off-by: Mitch Murphy <alphataubio@gmail.com>
Signed-off-by: alphataubio <alphataubio@gmail.com>
Signed-off-by: alphataubio <alphataubio@gmail.com>
Signed-off-by: alphataubio <alphataubio@gmail.com>
Signed-off-by: alphataubio <alphataubio@gmail.com>
Signed-off-by: alphataubio <alphataubio@gmail.com>
lucbv and others added 6 commits May 18, 2025 11:09
* osx workflow: removing explicit parallelism levels

Switching to implicit parallelism levels for build and
installation as it may change over time based on github
hardware and we don't want to muck with it everytime...

Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov>

* Explicitly setting the parallelism level

Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov>

* Untabify

Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov>

* Bumping cxx standard to 20 and removing deprecated_code_3

The standard should be raised ahead of the upcoming release 5.0.0
to make sure the compiler is supporting it and also to prepare the
code itself if needed. Additionally removing the old
Kokkos_ENABLE_DEPRECATED_CODE_3 configuration option as it is no
longer supported.

Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov>

---------

Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov>
Signed-off-by: alphataubio <alphataubio@gmail.com>
…ave changed (kokkos#2568)

Signed-off-by: Junchao Zhang <jczhang@anl.gov>
Signed-off-by: alphataubio <alphataubio@gmail.com>
* Update changelog for 4.6.01

Signed-off-by: Nathan Ellingwood <ndellin@sandia.gov>

* workflows: update kokkos version to 4.6.01

Signed-off-by: Nathan Ellingwood <ndellin@sandia.gov>

---------

Signed-off-by: Nathan Ellingwood <ndellin@sandia.gov>
Signed-off-by: alphataubio <alphataubio@gmail.com>
* ConjTrans support for batched team gemm

Signed-off-by: Yuuichi Asahi <y.asahi@nr.titech.ac.jp>

* fix: scalar types

Signed-off-by: Yuuichi Asahi <y.asahi@nr.titech.ac.jp>

* use rank() rather than rank to check the rank of View types

Signed-off-by: Yuuichi Asahi <y.asahi@nr.titech.ac.jp>

* Add a helper to avoid illegal use of extent and stride

Signed-off-by: Yuuichi Asahi <y.asahi@nr.titech.ac.jp>

---------

Signed-off-by: Yuuichi Asahi <y.asahi@nr.titech.ac.jp>
Co-authored-by: Yuuichi Asahi <y.asahi@nr.titech.ac.jp>
Signed-off-by: alphataubio <alphataubio@gmail.com>
Bumps [m2r2](https://github.com/crossnox/m2r2) from 0.3.2 to 0.3.4.
- [Changelog](https://github.com/CrossNox/m2r2/blob/development/CHANGES.md)
- [Commits](CrossNox/m2r2@v0.3.2...v0.3.4)

---
updated-dependencies:
- dependency-name: m2r2
  dependency-version: 0.3.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: alphataubio <alphataubio@gmail.com>
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.28.16 to 3.28.17.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@28deaed...60168ef)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 3.28.17
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: alphataubio <alphataubio@gmail.com>
dependabot bot and others added 21 commits May 18, 2025 11:09
Bumps [charset-normalizer](https://github.com/jawah/charset_normalizer) from 3.4.1 to 3.4.2.
- [Release notes](https://github.com/jawah/charset_normalizer/releases)
- [Changelog](https://github.com/jawah/charset_normalizer/blob/master/CHANGELOG.md)
- [Commits](jawah/charset_normalizer@3.4.1...3.4.2)

---
updated-dependencies:
- dependency-name: charset-normalizer
  dependency-version: 3.4.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: alphataubio <alphataubio@gmail.com>
* Lapack - SVD: Unifying SVector layout to LayoutLeft

This reduces ETI and allows users to use any valid layout and still
hit the ETI target.

Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov>

* Lapack - SVD: applying clang-format

Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov>

* Lapack - SVD: updating the documentation to reflect new requirement

Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov>

---------

Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov>
Signed-off-by: alphataubio <alphataubio@gmail.com>
Signed-off-by: Yuuichi Asahi <y.asahi@nr.titech.ac.jp>
Co-authored-by: Yuuichi Asahi <y.asahi@nr.titech.ac.jp>
Signed-off-by: alphataubio <alphataubio@gmail.com>
Signed-off-by: Brian Kelley <bmkelle@sandia.gov>
Signed-off-by: alphataubio <alphataubio@gmail.com>
* Add certifi==2025.4.26

certifi is required by requests==2.32.2, but requests does not
provide a hash. This causes our --require-hashes to fail, but we
don't catch it on github because certifi is pre-installed there.

Signed-off-by: Carl Pearson <cwpears@sandia.gov>

* Add certifi source hash to build_requirements.txt

Signed-off-by: Carl Pearson <cwpears@sandia.gov>

Co-authored-by: Luc Berger <lberge@sandia.gov>

---------

Signed-off-by: Carl Pearson <cwpears@sandia.gov>
Co-authored-by: Luc Berger <lberge@sandia.gov>
Signed-off-by: alphataubio <alphataubio@gmail.com>
* Add common test for atexit(finalize)

Make sure TPL singletons can clean themselves up
when Kokkos::finalize is run at program exit.

Signed-off-by: Brian Kelley <bmkelle@sandia.gov>

* Rework TPL singletons

- introduce simple Singleton class that reduces duplication
  in the TPL files
- this replaces global unique_ptr to store the singleton object.
  unique_ptr had a a double-free bug when Kokkos::finalize was
  called as an atexit hook

Signed-off-by: Brian Kelley <bmkelle@sandia.gov>

* Formatting

Signed-off-by: Brian Kelley <bmkelle@sandia.gov>

* Update for PR feedback

Signed-off-by: Brian Kelley <bmkelle@sandia.gov>

* Revert fix for unused var warning

Signed-off-by: Brian Kelley <bmkelle@sandia.gov>

---------

Signed-off-by: Brian Kelley <bmkelle@sandia.gov>
Signed-off-by: alphataubio <alphataubio@gmail.com>
Bumps [snowballstemmer](https://github.com/snowballstem/snowball) from 2.2.0 to 3.0.1.
- [Changelog](https://github.com/snowballstem/snowball/blob/master/NEWS)
- [Commits](snowballstem/snowball@v2.2.0...v3.0.1)

---
updated-dependencies:
- dependency-name: snowballstemmer
  dependency-version: 3.0.1
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: alphataubio <alphataubio@gmail.com>
Bumps [actions/dependency-review-action](https://github.com/actions/dependency-review-action) from 4.6.0 to 4.7.0.
- [Release notes](https://github.com/actions/dependency-review-action/releases)
- [Commits](actions/dependency-review-action@ce3cf95...38ecb5b)

---
updated-dependencies:
- dependency-name: actions/dependency-review-action
  dependency-version: 4.7.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: alphataubio <alphataubio@gmail.com>
Signed-off-by: alphataubio <alphataubio@gmail.com>
* Fixup documentation cover page

Signed-off-by: Damien L-G <dalg24@gmail.com>

* Fix Core "Getting Started" URL

Signed-off-by: Damien L-G <dalg24@gmail.com>

---------

Signed-off-by: Damien L-G <dalg24@gmail.com>
Signed-off-by: alphataubio <alphataubio@gmail.com>
* Clone H100 PR tests onto Lychee

Signed-off-by: Carl Pearson <cwpears@sandia.gov>

* h100_lychee job cleanups

Signed-off-by: Carl Pearson <cwpears@sandia.gov>

* h100_lychee: add TPL build

Signed-off-by: Carl Pearson <cwpears@sandia.gov>

---------

Signed-off-by: Carl Pearson <cwpears@sandia.gov>
Signed-off-by: alphataubio <alphataubio@gmail.com>
* Removing stride_ from blas and sparse

Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov>

* Switching stride_N() for stride(N)

Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov>

* Applying clang-format

Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov>

---------

Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov>
Signed-off-by: alphataubio <alphataubio@gmail.com>
Signed-off-by: Carl Pearson <cwpears@sandia.gov>
Signed-off-by: alphataubio <alphataubio@gmail.com>
…2640)

* Lapack - SVD: fixing build issue, will look at test tomorrow

Modifying the ETI specialization to make it consistent with a
LayoutLeft vector view. Fixing some issue in RK unit-test.

Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov>

* Lapack - svd: removing tests for layout right and guarding m < n

Cuda and all other TPLs only support layout left so no point in
testing for layout right, additionally Cuda only supports m >= n
use case so I am guarding the tests with matrices where m < n.

Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov>

* Applying clang-format

Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov>

---------

Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov>
Signed-off-by: alphataubio <alphataubio@gmail.com>
Signed-off-by: alphataubio <alphataubio@gmail.com>
Signed-off-by: alphataubio <alphataubio@gmail.com>
Signed-off-by: alphataubio <alphataubio@gmail.com>
Signed-off-by: alphataubio <alphataubio@gmail.com>
Signed-off-by: alphataubio <alphataubio@gmail.com>
Signed-off-by: alphataubio <alphataubio@gmail.com>
Signed-off-by: alphataubio <alphataubio@gmail.com>
@alphataubio
Copy link
Author

@lucbv READY FOR REVIEW

( Preview: https://alphataubio.com/kokkos-kernels/ )

@alphataubio alphataubio marked this pull request as ready for review May 18, 2025 15:16
Copy link
Contributor

@lucbv lucbv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some initial comment on the first few files, more to come.

Parameters
==========

:member: Team execution policy instance (only for team version)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's actually not a TeamPolicy instance but rather a TeamPolicy::member_type its API is documented here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is mostly wrong and misleading, it needs to be removed.

Kokkos::View<scalar_type**, Kokkos::LayoutRight, memory_space> A("A", n, n);

// Initialize matrix on host
auto A_host = Kokkos::create_mirror_view(A);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This creates a host view with a different memory allocation from that of A so you can keep it around to verify results later.

Suggested change
auto A_host = Kokkos::create_mirror_view(A);
auto Aorig_host = Kokkos::create_mirror(A);

// Copy results back to host
Kokkos::deep_copy(A_host, A);

// Verify results
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove everything below as it makes things a bit longer and more complicated. This is not meant to be a unit-test, if we are not testing this properly then we should add an appropriate test. Instead printing the values to screen could be appropriate to demonstrate that the function worked appropriately.

// Copy results back to host
Kokkos::deep_copy(A_host, A);

// Verify for each batch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, there is not need for all of this here.

- :math:`u` is the Householder vector
- :math:`\tau` is the Householder scalar
- :math:`u^T` is the transpose of :math:`u`
- :math:`u^H` is the conjugate transpose of :math:`u` (for complex matrices)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also no need for this

Parameters
==========

:member: Team execution policy instance (only for team version)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is a team policy member

==========

:member: Team execution policy instance (only for team version)
:u2: View containing the Householder vector
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really: view containing the entries of u starting at the second element, so it's size is n-1 if u is of size n

// Copy results back to host
Kokkos::deep_copy(A_host, A);

// Verify results: Manually compute H*A and compare
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not useful for an example, this is already implemented in our unit-tests with proper checks

// Copy results back to host
Kokkos::deep_copy(A_host, A);

// Verify results for a few batches
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove everything below

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.