Build and install downstream Trilinos packages against pre-installed Kokkos using native CMake build system #11545

Closed
6 tasks done
bartlettroscoe opened this issue Feb 7, 2023 · 14 comments
Labels: pkg: Kokkos, TriBITS, type: enhancement

Comments

@bartlettroscoe
Member

bartlettroscoe commented Feb 7, 2023

Description

This issue will track Trilinos efforts to pre-build/install Kokkos (using TriBITS) and then build downstream packages against the pre-built/installed Kokkos by setting TPL_ENABLE_Kokkos=ON. This uses the new TriBITS support for this in:

which is part of:

Initially we will use the TriBITS build system for Kokkos because it provides the subpackages expected by downstream Trilinos packages and provides the rest that is needed by TriBITS (since it will automatically be a TriBITS-compliant external package). Also, initially we can comment out the special logic about pulling in compiler options from Kokkos and will not try a CUDA build to make this easier.

Then we can get this working for the native Kokkos CMake build system.
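
A minimal sketch of the configuration described above, written as a CMake initial-cache file (loaded with `cmake -C`). Only `TPL_ENABLE_Kokkos=ON` comes from this issue; the file name, the install prefix, the use of `CMAKE_PREFIX_PATH` to locate the installed Kokkos, and the choice of Tpetra as the downstream package are assumptions for illustration, not a tested Trilinos configuration.

```cmake
# kokkos-external.cmake (hypothetical file name)
# Assumed usage: cmake -C kokkos-external.cmake /path/to/Trilinos

# From this issue: treat Kokkos as a pre-installed external package
# instead of building the copy bundled with Trilinos.
set(TPL_ENABLE_Kokkos ON CACHE BOOL "Use pre-installed Kokkos")

# Assumption: the installed Kokkos is located through the standard CMake
# search path. The prefix below is a placeholder.
set(CMAKE_PREFIX_PATH "/path/to/kokkos/install" CACHE PATH
  "Prefix of the pre-installed Kokkos")

# Assumption: Tpetra stands in for any downstream Trilinos package.
set(Trilinos_ENABLE_Tpetra ON CACHE BOOL "Example downstream package")
```

Keeping these settings in a cache file rather than on the command line makes it easier to reuse the same external-Kokkos setup across build directories.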

Tasks

@bartlettroscoe bartlettroscoe created this issue from a note in Trilinos TriBITS Refactor (Selected) Feb 7, 2023
@bartlettroscoe bartlettroscoe added TriBITS Issues with the TriBITS framework itself, not usage of the TriBITS framework pkg: Kokkos type: enhancement Issue is an enhancement, not a bug and removed pkg: Kokkos labels Feb 7, 2023
@bartlettroscoe bartlettroscoe moved this from Selected to In Progress in Trilinos TriBITS Refactor Feb 7, 2023
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Feb 7, 2023
…fig files (trilinos#11545)

Just need to move where tribits_package_decl() is called and call
tribits_pkg_export_cache_var() inside of kokkos_option().
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Feb 9, 2023
…iles (trilinos#11545)

Just need to move where tribits_package_decl() is called before Kokkos defines
its options and then call tribits_pkg_export_cache_var() inside of
kokkos_option().

NOTE: We don't export the variables Kokkos_ENABLE_TESTS or
Kokkos_ENABLE_EXAMPLES because those are special variables defined by TriBITS
where the project-level variable value may be different than the cache
variable value (which is on purpose) and also we don't want to export these
variables because downstream packages should not need to know this info.

ToDo: Kokkos really should differentiate what options values it exports and
which it does not to provide a better defined API (and downstream customers
don't need to grep the installed Kokkos_config.h file to figure out this
info).
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Feb 16, 2023
When Kokkos is supplied as an external package, the modern CMake imported
targets from the Kokkos<Subpkg>Config.cmake files also provide the needed
flags.  Therefore, there should be no special mention of Kokkos in the
Trilinos configure logic.
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Feb 19, 2023
…rilinos#11545)

With the TriBITS modernization refactoring (TriBITSPub/TriBITS#299) and the
generalized handling of internal and external packages
(TriBITSPub/TriBITS#63), we need packages like Kokkos to set critical compiler
options as target properties so that they will be exported in the generated
IMPORTED targets of the Kokkos<Subpkg>Targets.cmake file.

This is needed, for example, to pass some critical compiler flags from the
pre-installed Kokkos to downstream CMake configures of KokkosKernels and the
rest of Trilinos (see trilinos#11545).
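
As a rough illustration of what "compiler options as target properties" means in plain CMake (not the actual Kokkos or TriBITS code), the sketch below attaches a flag to a target so that it lands in the exported IMPORTED target. The project name, the `demo_flags` target, the `Demo::` namespace, and the `-fopenmp` flag are all hypothetical stand-ins.

```cmake
cmake_minimum_required(VERSION 3.16)
project(FlagsAsTargetProperties LANGUAGES CXX)
include(GNUInstallDirs)

# Hypothetical stand-in for a library whose consumers need a specific flag.
add_library(demo_flags INTERFACE)

# Attach the flag to the target instead of appending it to CMAKE_CXX_FLAGS.
# INTERFACE options are recorded in INTERFACE_COMPILE_OPTIONS and therefore
# written into the generated DemoTargets.cmake file.
target_compile_options(demo_flags INTERFACE -fopenmp)

# Export the target so downstream projects get Demo::demo_flags with the
# flag attached.
install(TARGETS demo_flags EXPORT DemoTargets)
install(EXPORT DemoTargets
  NAMESPACE Demo::
  DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/Demo)
```

A downstream project that links against `Demo::demo_flags` then picks up the flag without any package-specific configure logic of its own.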
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Mar 29, 2023
…s-tribits (trilinos#11545)

I resolved the one-line doc conflict in the file:

* cmake/tribits/core/package_arch/TribitsPackageMacros.cmake

by going with what is on tribits_github_snapshot.
bartlettroscoe added a commit that referenced this issue Mar 29, 2023
…#11545)

I resolved the one-line doc conflict in the file:

* cmake/tribits/core/package_arch/TribitsPackageMacros.cmake

by going with what is on tribits_github_snapshot.
@bartlettroscoe bartlettroscoe changed the title from "Build and install downstream packages against pre-installed Kokkos built with TriBITS" to "Build and install downstream Trilinos packages against pre-installed Kokkos" Mar 29, 2023
@ibaned
Contributor

ibaned commented Mar 30, 2023

Thanks for working on this! It will greatly help our Sandia production code when this is done. I'd be happy to help with making the native Kokkos build system export everything TriBITS expects.

trilinos-autotester added a commit that referenced this issue Mar 31, 2023
…03-29

Automatically Merged using Trilinos Pull Request AutoTester
PR Title: TriBITS snapshot update 2023-03-29 (#11545)
PR Author: bartlettroscoe
trilinos-autotester added a commit that referenced this issue Mar 31, 2023
…alled-kokkos-tribits

Automatically Merged using Trilinos Pull Request AutoTester
PR Title: Tweaks to Trilinos to get working with pre-installed Kokkos and KokkosKernels (with TriBITS updates) (#11545)
PR Author: bartlettroscoe
jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Apr 1, 2023
…s:develop' (28a7b37).

* trilinos-develop:
  Have Kokkos TriBITS build set compiler options as target properties (trilinos#11545)
  Update logic for TPL_ENABLE_Kokkos=ON (trilinos#11545)
  TrilinosInstallTests_find_package_Trilinos: Run in own subdir
  Move check for ParMETS version for Zoltan2 to Zoltan2 (trilinos#63)
  Have Kokkos TriBITS build properly export options to package config files (trilinos#11545)
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Apr 18, 2023
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Apr 18, 2023
… packages (trilinos#11545)

This script removes the usage of Kokkos subpackages from downstream TriBITS
packages.
@bartlettroscoe
Member Author

bartlettroscoe commented Apr 18, 2023

FYI: I am working to remove subpackages from Kokkos and usage in all downstream TriBITS packages. I am developing this refactoring as a script that you run on downstream TriBITS packages and it will make all of the needed changes automatically (so TriBITS packages outside of Trilinos can run this single script and absorb the changes).

NOTE: Removing Kokkos subpackages definitely breaks backward compatibility for both downstream TriBITS and non-TriBITS CMake packages and for users that are configuring Trilinos (as it changes the names of some of the enable vars). It should be easy to absorb the changes in both cases given the scripts I am producing, but this refactoring is not like falling off a log.

Update: 4/18/2023 While this change can break backward compatibility for some customers, it may not for many/most customers. If they are just depending on Kokkos then they may not need to change anything.

bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Apr 20, 2023
* Removed the listing of subpackages from kokkos/cmake/Dependencies.cmake

* Remove the now-unused files
  kokkos/[core,containers,algorithms,simd]/cmake/Dependencies.cmake

* Removed TriBITS macros for a package with subpackages and replace with those
  for a package with no subpackages.  Also, removed all subpackage macros.

* Changed kokkos_process_subpackage() to just call add_subdirectory().

* Changed the name of old KokkosCore tests "XXX" to "CoreXXX" because the
  prefix for all tests is now "Kokkos_" instead of "KokkosCore_"

* Changed the name of the containers/unit_test/CMakeLists.txt file test
  'TestCompileOnly' to 'ContainersTestCompileOnly' because there is now a
  'CoreTestCompileOnly' test (all prefixed with 'Kokkos_').

* Removed the usage of tribits_configure_file() and wrapper
  kokkos_configure_file() and just call configure_file().  The location of
  PACKAGE_SOURCE_DIR changed so the calls to tribits_configure_file() no
  longer worked.  (Also, these X_config.h.in files were not using any of the
  TriBITS-supported features that needed the calling of
  tribits_configure_file() so there was no reason to not just call raw
  configure_file().)
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Apr 20, 2023
…ages (trilinos#11545)

This script removes the usage of Kokkos subpackages from downstream TriBITS
packages CMake files.
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Apr 20, 2023
This is the result of running the script remove_kokkos_subpackages_r.sh to
absorb the refactoring of Kokkos to remove the usage of TriBITS subpackages.

Manual changes may need to be made after this.
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Apr 20, 2023
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Apr 20, 2023
…nos#11545)

This makes it a little more robust and reproducible to refactor the packages
downstream from Kokkos.  This script can also be used by other TriBITS and
non-TriBITS CMake packages/projects that depend on Kokkos to adjust to this
refactoring.
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Apr 20, 2023
…ilinos#11545)

TODO: Change this to remove_kokkos_subpackages_from_trilinos_packages.sh
jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Jun 7, 2023
…s:develop' (ab899a0).

* trilinos-develop: (22 commits)
  Remove non-existant subdir kokkos-kernels/common/common (trilinos#11921, trilinos#11863)
  Teuchos: Fixing cmake logic
  Teuchos: Fixing catch() issues with C++ language drift
  TrilinosSS: include <omp.h> (Fix trilinos#11867)
  MueLu hierarchical: Fix build error
  Tpetra: Changes to StaticView for Kokkos PTHREAD to THREADS change
  Teuchos: Automatically enabling Tecuhos_ENABLE_THREAD_SAFE if you have Kokkos THREADS or OPENMP for the host
  Stokhos:  Add missing KOKKOS_INLINE_FUNCTION to fix build errors on HIP
  Phalanx: Remove usage of undefined var Kokkos_INCLUDE_DIRS (trilinos#11545)
  Kokkos: Mark HWLOC as a TriBITS TPL as well (trilinos#11938)
  Update for removal of Kokkos subpackages and Kokkos test renamings (trilinos#11545, trilinos#11808)
  KokkosKernels: Remove non-existent common/src/[impl,tpls] include dirs (trilinos#11545)
  Add test simpleBuildAgainstTrilinos_by_package_build_tree_name (trilinos#11545)
  Pass in and define compilers before calling find_package(Trilinos) (trilinos#11545)
  Add `Kokkos::all_libs` alias target for compatibility with TriBITS/Trilinos (trilinos#6157)
  Export Kokkos_ENABLE_<OPTION> that are relevant
  Do not append to Kokkos_OPTIONS variables those in the do not export list
  Expand list of kokkos options not to export with cmake
  Tpetra: Don't use std::binary_function
  Tpetra: Fixing missing HIP tesT
  ...
jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Jun 7, 2023
…s:develop' (ab899a0).

* trilinos-develop: (23 commits)
  Remove non-existant subdir kokkos-kernels/common/common (trilinos#11921, trilinos#11863)
  Teuchos: Fixing cmake logic
  Teuchos: Fixing catch() issues with C++ language drift
  fastilu: Fix memory leak.
  TrilinosSS: include <omp.h> (Fix trilinos#11867)
  MueLu hierarchical: Fix build error
  Tpetra: Changes to StaticView for Kokkos PTHREAD to THREADS change
  Teuchos: Automatically enabling Tecuhos_ENABLE_THREAD_SAFE if you have Kokkos THREADS or OPENMP for the host
  Stokhos:  Add missing KOKKOS_INLINE_FUNCTION to fix build errors on HIP
  Phalanx: Remove usage of undefined var Kokkos_INCLUDE_DIRS (trilinos#11545)
  Kokkos: Mark HWLOC as a TriBITS TPL as well (trilinos#11938)
  Update for removal of Kokkos subpackages and Kokkos test renamings (trilinos#11545, trilinos#11808)
  KokkosKernels: Remove non-existent common/src/[impl,tpls] include dirs (trilinos#11545)
  Add test simpleBuildAgainstTrilinos_by_package_build_tree_name (trilinos#11545)
  Pass in and define compilers before calling find_package(Trilinos) (trilinos#11545)
  Add `Kokkos::all_libs` alias target for compatibility with TriBITS/Trilinos (trilinos#6157)
  Export Kokkos_ENABLE_<OPTION> that are relevant
  Do not append to Kokkos_OPTIONS variables those in the do not export list
  Expand list of kokkos options not to export with cmake
  Tpetra: Don't use std::binary_function
  ...
jwillenbring pushed a commit to jwillenbring/Trilinos that referenced this issue Jun 12, 2023
…iles (trilinos#11545)

Just need to move where tribits_package_decl() is called before Kokkos defines
its options and then call tribits_pkg_export_cache_var() inside of
kokkos_option().

NOTE: We don't export the variables Kokkos_ENABLE_TESTS or
Kokkos_ENABLE_EXAMPLES because those are special variables defined by TriBITS
where the project-level variable value may be different than the cache
variable value (which is on purpose) and also we don't want to export these
variables because downstream packages should not need to know this info.

ToDo: Kokkos really should differentiate what options values it exports and
which it does not to provide a better defined API (and downstream customers
don't need to grep the installed Kokkos_config.h file to figure out this
info).
jwillenbring pushed a commit to jwillenbring/Trilinos that referenced this issue Jun 12, 2023
When Kokkos is supplied as an external package, the modern CMake imported
targets from the Kokkos<Subpkg>Config.cmake files also provide the needed
flags.  Therefore, there should be no special mention of Kokkos in the
Trilinos configure logic.
jwillenbring pushed a commit to jwillenbring/Trilinos that referenced this issue Jun 12, 2023
…rilinos#11545)

With the TriBITS modernization refactoring (TriBITSPub/TriBITS#299) and the
generalized handling of internal and external packages
(TriBITSPub/TriBITS#63), we need packages like Kokkos to set critical compiler
options as target properties so that they will be exported in the generated
IMPORTED targets of the Kokkos<Subpkg>Targets.cmake file.

This is needed, for example, to pass some critical compiler flags from the
pre-installed Kokkos to downstream CMake configures of KokkosKernels and the
rest of Trilinos (see trilinos#11545).
nliber pushed a commit to nliber/kokkos that referenced this issue Jun 22, 2023
…okkos#6104)

* Kokkos: Remove TriBITS subpackages (#11545)

* Removed the listing of subpackages from kokkos/cmake/Dependencies.cmake

* Remove the now-unused files
  kokkos/[core,containers,algorithms,simd]/cmake/Dependencies.cmake

* Removed TriBITS macros for a package with subpackages and replace with those
  for a package with no subpackages.  Also, removed all subpackage macros.

* Changed kokkos_process_subpackage() to just call add_subdirectory().

* Added prefix 'Core' to several tests in
  kokkos/Core/unit_tests/CMakeLists.txt now that prefix is 'Kokkos_'

* Added prefix 'Containers' to several tests in
  kokkos/containers/unit_tests/CMakeLists.txt and
  kokkos/containers/performance_tests/CMakeLists.txt now that prefix is
  'Kokkos_'

* Change name of the kokkos/containers/performance_tests/CMakeLists.txt file
  test 'PerformanceTest_XXX' to 'ContainersPerformanceTest_XXX'.

* Added prefix 'Algorithms' to several tests in
  kokkos/algorithms/unit_tests/CMakeLists.txt now that prefix is 'Kokkos_'

* Removed the usage of tribits_configure_file() and wrapper
  kokkos_configure_file() and just call configure_file().  The location of
  PACKAGE_SOURCE_DIR changed so the calls to tribits_configure_file() no
  longer worked.  (Also, these X_config.h.in files were not using any of the
  TriBITS-supported features that needed the calling of
  tribits_configure_file() so there was no reason to not just call raw
  configure_file().)

SQUASH AGAINST: Kokkos: Remove TriBITS subpackages (#11545)

* Fix native build of Kokkos after removing subpackages (trilinos/Trilinos#11545)

This restores the building of the raw CMake build of Kokkos after the
refactoring to remove TriBITS subpackages.

* Kokkos: Remove last of subpackage stuff, fix for tests enable (trilinos/Trilinos#11545)

This gives a full passing build and tests with the Trilinos PR GenConfig
clang-11.0.1 build configuration.

* Fixup update target name in python test script that gets configured

---------

Co-authored-by: Damien L-G <dalg24@gmail.com>
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Jun 29, 2023
)

Noticed this while cleaning up from the removal of Kokkos subpackages (See PR
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Jun 29, 2023
…os#11545, trilinos#11808)

This duplication resulted from running a simple automated script that created
a commit in PR trilinos#11808.
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Jul 7, 2023
)

Noticed this while cleaning up from the removal of Kokkos subpackages (See PR
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Jul 7, 2023
…os#11545, trilinos#11808)

This duplication resulted from running a simple automated script that created
a commit in PR trilinos#11808.
bartlettroscoe added a commit that referenced this issue Jul 13, 2023
…ts-update

Base CMakeLists.txt cleanup, reduced tarball testing, CI enable tweak, TriBITS update (#11545, #11976)
@bartlettroscoe
Member Author

With the merge of PR:

This story is not fully complete

Trilinos TriBITS Refactor automation moved this from In Progress to Done Jul 14, 2023
JacobDomagala pushed a commit to NexGenAnalytics/Trilinos that referenced this issue Aug 4, 2023
)

Noticed this while cleaning up from the removal of Kokkos subpackages (See PR
JacobDomagala pushed a commit to NexGenAnalytics/Trilinos that referenced this issue Aug 4, 2023
…os#11545, trilinos#11808)

This duplication resulted from running a simple automated script that created
a commit in PR trilinos#11808.
@bartlettroscoe
Member Author

FYI: Here is the Spack PR that has the Trilinos Spack package using the Kokkos Spack package:

@bartlettroscoe
Member Author

bartlettroscoe commented Aug 18, 2023

FYI: Note that there have been some issues in the Spack packages downstream from Trilinos for this change in:

You can see these in:

These are different issues but there was some hard-coded logic that expected Kokkos to be installed along with Trilinos. There was no reason to do that (and that has been the case for at least 10 years). The possible solutions for the problems are described in:

In summary, my recommendation was/is:

Option-3: Have DTK and other downstream Spack packages stop calling find_package(Trilinos COMPONENTS Kokkos Tpetra <pkgi> ...) and instead call find_package(Kokkos), find_package(Tpetra), find_package(<pkgi>). (This would require changing any downstream CMake project that calls find_package(Trilinos COMPONENTS ... Kokkos ...).)

My advice would be to choose option-3. That is backward compatible with older versions of Trilinos and it starts to move the package ecosystem away from a Trilinos-centric world. (It should just be that different Spack packages provide different subsets of Trilinos packages as installed CMake packages with installed <Package>Config.cmake files. Trilinos should be a coordination project and provide for very efficient co-development workflows but one should stop assuming that "Trilinos" will be one big installed CMake meta-package called "Trilinos".)

and @jwillenbring agreed in xsdk-project/xsdk-issues#214 (comment).

In my opinion, that is the direction that Trilinos and the package ecosystem should be going.
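
A minimal sketch of what option-3 could look like in a downstream project's CMakeLists.txt. The `find_package(Kokkos)` and `find_package(Tpetra)` calls come from the recommendation above, and `Kokkos::all_libs` is the alias target named in the commits referenced in this thread. The project name, the `main.cpp` source file, and the `Tpetra::all_libs` target name (assumed to follow the same `<Package>::all_libs` pattern) are assumptions for illustration.

```cmake
cmake_minimum_required(VERSION 3.23)
project(DownstreamApp LANGUAGES CXX)

# Find each package on its own instead of calling
# find_package(Trilinos COMPONENTS Kokkos Tpetra ...).
find_package(Kokkos REQUIRED)
find_package(Tpetra REQUIRED)

# main.cpp is a placeholder source file for this sketch.
add_executable(app main.cpp)

# Link against the imported targets; compiler flags and include
# directories are expected to come from the targets themselves.
# Tpetra::all_libs is an assumed target name (see lead-in).
target_link_libraries(app PRIVATE Kokkos::all_libs Tpetra::all_libs)
```

With this layout the downstream project does not care whether Kokkos was installed together with Trilinos or as a separate package, as long as both install prefixes are on the CMake search path.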

@sebrowne
Contributor

I agree with this approach. I had no idea that worked (find_package(Tpetra), for instance), but that's excellent!

@bartlettroscoe
Member Author

> I agree with this approach. I had no idea that worked (find_package(Tpetra), for instance), but that's excellent!

@sebrowne, this has been supported for at least 10 years (support for which was not written by me). I have added more explicit documentation for this as part of this TriBITS/Trilinos CMake modernization work. For example, see:

and see a (tested) example specific to Trilinos at:

etphipp added a commit to sandialabs/GenTen that referenced this issue Jun 26, 2024
1a3ea28f6 Merge pull request #6231 from ndellingwood/master
3e85bd920 Fix windows symlink configure issue (#6241)
ea7b12448 CHANGELOG fixup following merge
25592c571 Update master_history.txt
adde1e6aa Merge branch 'release-candidate-4.1.00' for 4.1.00
9e8443018 Merge pull request #6228 from masterleinad/cherry_pick_6223
dd81ecb3d Merge pull request #6223 from masterleinad/fix_simd_on_gpus
5c3e68392 [4.1.00] Changelog for 4.1.00 (#6226)
cd96a740b Merge pull request #6219 from masterleinad/fix_sycl_makefile_4_1_00
23aadf490 Fix compiling SYCL with KOKKOS_IMPL_DO_NOT_USE_PRINTF_USAGE
afc192988 Update version to 4.1.00
6ca60c395 Improve OpenMP affinity warning to include MPI concerns (#6185)
e200ba117 [HIP] Improve heuristic deciding the number of blocks used in parallel_reduce (#6160)
43a797b59 Left align demangled stacktrace output. (#6191)
a40637298 Fix global fence in Kokkos::resize(DynRankView) (#6184)
8661773eb Merge pull request #6195 from fnrizzi/is_trait_v
98f9b4c62 add trait and test
e30f04011 shortcut value for is_dynamic_view
789b62c61 Weed out verbose output from `dynamic_view` container unit test (#6173)
e2a7f085d Merge pull request #6171 from rgayatri23/openmptarget_nvhpc
8266abd1b Merge pull request #6183 from ldh4/simd_replace_unavailable_loadu_storeu_instr
ad966bda0 OpenMPTarget: include desul changes.
c72615afe Merge remote-tracking branch 'upstream/develop' into openmptarget_nvhpc
7b0e378e6 Replace _mm512_loadu_epi64 and _mm512_storeu_epi64 with _mm512_loadu_si512 and _mm512_storeu_si512
18c539504 Merge pull request #5982 from masterleinad/cleanup_functor_analysis
6c134afda Merge pull request #6172 from masterleinad/remove_desul_sycl_extended_namespace
0b7bed581 Allow passing a temporary std::vector to partition_space (#6167)
65ffe4c5d Also create symlinks for CMake configuration files to cmake_packages/Kokkos for TriBITS (#6163)
915c17466 SIMD: make binary op tests to test against all data types (#5913)
62ba94c88 Merge pull request #6175 from dalg24/changelog_372
502dc03c3 Merge pull request #6176 from bartlettroscoe/tril-11938-tribits-hwloc
2bc7b96b7 Clean up FunctorAnalysis
9df5a01a8 Kokkos: Mark HWLOC as a TriBITS TPL as well (trilinos/Trilinos#11938)
1af137999 Cherry-pick v3.7.02 changelog into develop [ci skip]
bf3457349 OpenMPTarget: Restore desul changes.
925aca1b1 OpenMPTarget: Replace kokkos macros in desul.
538d18d31 OpenMPTarget: update fixme comment.
e832781a3 Remove extended_namespace template paramter for SYCLMemoryOrder/Scope
c23cfb8d0 Update Makefile.kokkos
d1ecf9acb OpenMPTarget: Add a fixme.
bbd9a7882 OpenMPTarget: Changes for OpenMPTarget backend with nvhpc compiler.
ab6f7565b Implement `HPX::in_parallel` (#6143)
e88537f62 Allow linking against build tree (#6078)
b3f9f7825 sorting: add to binsort support for strided views and reorg tests (#6081)
2a5c949c7 Add `Kokkos::all_libs` alias target for compatibility with TriBITS/Trilinos (#6157)
2a382b42b Merge pull request #6126 from masterleinad/fix_uninitialized_value_in_combined_reducer
461310de4 Merge pull request #6156 from masterleinad/fix_cuda_lambda_trilinos
12e9645e7 KokkosTools: Don't call callbacks before backends are initialized (#6114)
f8a2a8085 `BinSort`, `BinOp1D`, `BinOp3D`: mark default constructor as deleted (#6131)
d92158c69 Fix bogus warnings in nested CUDA parallel_reduce
31a5f21ae Merge pull request #6136 from masterleinad/fix_nd_builtin_reductions_with_loc
5d81422da Merge pull request #6155 from dalg24/fixup_dual_view
85b014b33 Fix Kokkos_ENABLE_CUDA_LAMBDA for Trilinos
131503d8d Revert to `DualView<class,class=void,class=void,class=void>` when deprecated code 4 is enabled
382f0bea7 Merge pull request #6150 from dalg24/drop_profiling_load_print_option
b2645f80c OpenMPTarget: Enable Cray compiler for the OpenMPTarget backend. (#5889)
6c0adb571 Merge pull request #6149 from dalg24/fixup_cuda_lambda
d74df9b66 [ci skip] Add nightly ci for spack (#6135)
8ede4a496 Merge pull request #6142 from dalg24/cleanup_exported_kokkos_options
d92988f3f Suppress bogus warning about CUDA_LAMBDA being ON
57226c978 Drop Kokkos_ENABLE_PROFILING_LOAD_PRINT option
87c7be94f Merge pull request #6047 from masterleinad/simplify_sycl_reductions
3f565bbb5 Export Kokkos_ENABLE_<OPTION> that are relevant
3c0f9a170 Merge pull request #6148 from dalg24/drop_kokkos_enable_launch_compiler
6b18c2ad5 Drop Kokkos_ENABLE_LAUNCH_COMPILER option
c93577435 Do not append to Kokkos_OPTIONS variables those in the do not export list
2bcfa5177 Expand list of kokkos options not to export with cmake
8f4fb725e Merge pull request #6137 from masterleinad/fix_sycl_bit_cast
3329989db Merge pull request #6123 from e10harvey/floating_point_wrapper
ee43d2a7d Add guards for Cuda
c67ddea40 Try running for other execution spaces
bf9c242ed Allow deprecated declarations in SYCL+Cuda CI
e8dba15a4 Improve indentation of comments
99161e0ac Disable tests for OpenMPTarget
cbc7e8878 Fix bit_cast for SYCL again
f8ed850dd Disable tests failing with NVHPC
4197fa833 Merge pull request #6120 from uliegecsm/kokkos-dual-view-template-types
02fb8d423 core/src: Move floating_point_wrapper to private header
b86d73a2b sorting an empty view should exit early and not fail (#6130)
1767bfe3e dual view: update template types (#6085)
df5681d19 Don't restrict index type in builtin reducers
766f00db0 Merge pull request #6133 from msimberg/hpx-post-apply-compat
336473d0b Merge pull request #6132 from msimberg/hpx-version-requirement-1.8.0
d13cc09ee Conditionally use hpx::post instead of hpx::apply based on HPX version
12b0c8021 Increase minimum required HPX version to 1.8.0
8a541b50b Move half traits to private header and add half/bhalf infinity trait (#6055)
3f602b6f2 Merge pull request #6129 from masterleinad/remove_unused_attach_texture_object
6422681a0 Merge pull request #6121 from masterleinad/use_sycl_bit_cast
0018848c6 Cuda: Remove unused attach_texture_object
e94b5dd36 Kokkos_BitManipulation: KOKKOS_COMPILER_GCC->KOKKOS_COMPILER_GNU (#6119)
7009a28f4 Merge pull request #6122 from masterleinad/ambiguous_bit_cast
6b2459ce6 Fix nightlies -- workaround compiler bug in GCC 9.1 and 9.2 (#6118)
5f45c3086 Qualify calls possibly ambiguous calls to bit_cast
1bc1a5194 Import sycl::bit_cast into the Kokkos namespace
c62a42e1c Allow templated functors in parallel_for, parallel_reduce and parallel_scan (#5976)
fb0c1b8ff Merge pull request #6106 from crtrott/fix-nvhpc-compilation
f15b5ab42 Merge pull request #6116 from rbberger/hpcbind_slurm_bugfix
a85923df5 Merge pull request #6110 from dalg24/fixup_cuda_lambda
531b01dce Fix macro guards in test for NVC++ as the CUDA compiler
aa7ab5fd5 hpcbind: check for correct Slurm variable
b26ee87e6 Merge pull request #6113 from fnrizzi/use_assert_eq_for_std_algo_tests
6ede77357 Merge pull request #6064 from masterleinad/sycl_improve_parallel_scan_new
41d9d0626 Reintroduce test skip for nvhpc < 23.3
ce0b78ffe Merge pull request #6111 from dalg24/drop_unused_cmake_macros
81ce338ac use ASSERT_EQ in all std algorithms tests
ef5d44707 Fixup cmake style
b82161b2d Drop unused cmake macros
417a6ee73 Work around NVHPC 23.x not dealing with __isGlobal
0954a1b0a Drop CUDA_LAMBDA guards in Cuda headers
cfbaf28e0 Reorganize ZeroMemset (#6087)
798efc5a3 Always pass -extended-lambda option to NVCC and force Kokkos_ENABLE_CUDA_LAMBDA ON
1c0e3bf3f Update the OpenACC parallel_reduce() constructs with Range/MDRange/Team (#6072)
cf82edcdd Merge pull request #6108 from dalg24/drop_algorithms_and_containers_config_files
d7c06c433 Revert "Merge pull request #5964 from PhilMiller/cuda-lambda-default"
7ef7d02a3 Drop pointless Kokkos{Algorithms,Containers}_config.h files
5fa72b590 Kokkos: Remove TriBITS Kokkos subpackages (trilinos/Trilinos#11545) (#6104)
60b982ad0 Work around NVHPC 23.x issues
ea134de48 Work around NVHPC issue with enum types
edf63b3c5 Merge pull request #6101 from dalg24/bit_cast
e24750864 Added multiple reducers support for team-level parallel reduce (#5727)
8dc8f4973 Fix typo and remove accidentally committed assertions
26ae798fd change impl of `is_sorted_until` to use reduce (#6097)
7533cb407 Disable tests that fail at runtime with NVHPC (likely not liking the class declaration within the body of the functor)
d6944df86 Merge pull request #6008 from uliegecsm/cuda-uvm-space-instance-fence
5c2d948b0 view(uvm): fence if need in allocation (#6005)
432988bcd Clang-format glitch
eff2716b8 Use Kokkos::bit_cast in SIMD instead of rolling its own
e8a44e579 Add runtime tests for bit_cast
ddf55c1d5 Add the Experimental:: builtin variant (just defer to regular bit_cast)
71ee48ffd Add compile time tests for the constraints on the bit_cast function template
ab41ef8a4 Add implementation of bit_cast in <Kokkos_BitManipulation.hpp>
945281ac0 Merge pull request #5964 from PhilMiller/cuda-lambda-default
a45cc1eff fix ternary op in subset of std algorithms not working with nvhpc (#6095)
7a166d2e4 Enable OpenMP in CUDA-11.0-NVCC-RDC to test DEPRECATED_CODE_3=ON (#5978)
4b6d971dc OpenMPTarget: Update hierarchical parallelism. (#6043)
d251954e0 Work around nvcc issue for view_mapping and add FIXME_NVCC comment
5b1f34168 Merge pull request #6098 from ndellingwood/update-changelog-4.0.01
e8067d4be [ci skip] Fixup changelog
c28472a64 Update changelog
4407f7b2e Remove various test exclusions based on KOKKOS_ENABLE_CUDA_LAMBDA
7e329998e Always expect KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA to be set
51d7c720c Don't fail to define broader 'lambdas are available' macro
447028491 Fix definitions and docs to remove CUDA Lambda option
ddded0eb2 Implement CMake messages per team decision
ca9fd2178 Change Makefile.kokkos too
a906356ca Tentative arguments switch for nvcc 12+
4846d47f5 Unconditionally enable CUDA extended lambda support
62d2b6c87 Merge pull request #6080 from ndellingwood/master
e5490e1e1 Add support for Darwin 32-bit and PPC (#5916)
56ef02c0c Disable failed bit manipulation tests when compiled by NVHPC  (#6088)
bdaa12cb9 Compiling with auto deduction of workgroup sizes
3cc9915fe Improve SYCL parallel_scan
d30b04d1b Merge pull request #6065 from masterleinad/fix_join_value_wrapper_for_neutral_element
de5c017ea Update OpenACC FunctorAdapter (#6077)
55bbd9f17 Converted a shared_ptr to a host view in UnorderedMap (#6073)
7793406c8 Merge pull request #6086 from masterleinad/fix_sycl_execution_space
d5fa56e1d Fix up SYCL execution space instance creation for Intel GPUs
0ab1f11fe Update master_history.txt
5893754f8 Update version to 4.0.1
15776f90c Merge pull request #6046 from ajpowelsnl/CHANGELOG-4.0.0/team_thread_sort
b3bb4a6cf Update changelog (#6058)
24c62bf5c Merge pull request #6074 from masterleinad/fix_sycl_cuda
220495f68 Merge pull request #5906 from masterleinad/define_kokkos_compiler_intel_llvm
4b27b7d9a Fix Kokkos_SIMD with AVX2 on 64-bit architectures (#6075)
b72984afd Merge branch 'release-candidate-4.0.01' for 4.0.01
2e51c67a7 Explicitly cast to CombinedFunctorReducerType
a92e09163 Only pass one wrapper object in SYCL reductions
c0830897c Merge pull request #6057 from cz4rs/changelog-4.0.01
06dbc151d Fix PerfTests by limiting GramSchmidt
94446348a perf_test is still not working
0c681edd0 SYCL: Use in-order queue for SYCL+Cuda
c09dd1c94 Merge pull request #6059 from Rombur/fix_ci_host
b21b1e4fd Merge pull request #6068 from crtrott/fix-makefile-4.0.01
2ac576a6b Update changelog
6d2e8994a Fix typo in Makefile.kokkos
57413bb0b Merge pull request #6063 from stanmoore1/makefile_typo
db802ac0e Fix join for ValueWrapperForNoNeutralElement
568056323 Fix bug in Makefile.kokkos
4feae9ea0 Reduce size of ScatterView test when using OpenMP
9004274b4 Merge pull request #6056 from masterleinad/partially_reverse_5504
3eaf13e66 fix based on comments
318d84ca6 Update changelog to 4.0.01 [ci skip]
079268c9d Partially reverse #5504
0d96f88da OpenMPTarget: Changes to Makefile.kokkos (#6053)
7645d6c78 Merge pull request #6052 from masterleinad/fix_unordered_map_shared_space
d26f88ce4 Don't create a shared state for size() in UnorderedMap's deep_copy
3b1afb530 Remove libnuma (#6048)
bb845f225 Merge pull request #6016 from masterleinad/use_wextra
72687d82d Merge pull request #6049 from dalg24/build_md
d0f577706 Remove (outdated) license information [ci skip]
83873a62b Remove Kokkos Keyword Listing section from BUILD.md and refer to the wiki instead
bb7ae99ec CHANGELOG.md: add threads sort
48b34def1 Desul atomics: let relocatable device code mode be part of the configuration (#5991)
07020624b Merge pull request #5504 from masterleinad/sycl_remove_enqueue_barrier_memcpy_workaround
8352a1193 Merge pull request #5855 from dalg24/num_threads_and_device_id
b24dcb4b9 Merge pull request #5990 from jczhang07/patch-1
3f6a85408 Merge pull request #6041 from ldh4/remove_unused_thread_vector_range_ctors
d80f58054 Define KOKKOS_COMPILER_INTEL_LLVM
140cbd72c Define at most one KOKKOS_COMPILER* macro
b6d1dbae9 Merge pull request #6038 from masterleinad/pgi_compiler
38c647625 Merge pull request #6036 from masterleinad/cherry_pick_trilinos
59124cee0 Merge pull request #6037 from masterleinad/cherry_pick_6036
0126dcb56 Remove unused constructors for ThreadVectorRangeBoundairesStruct that are not taking in TeamMemberType as an argument.
29826df64 Try removing _kokkos_pgi_compiler_bug_workaround
39c35a85c KOKKOS_COMPILER_PGI -> KOKKOS_COMPILER_NVHPC
c9a9ee0d5 Cherry-pick TriBITS update from Trilinos
7aabd2d6a Cherry-pick TriBITS update from Trilinos
b57c17bb2 Add -Wextra
9b644e07a Fix OMPT size compare warnings
be65fe429 Fix enum warnings
715a6ffd5 Merge pull request #6030 from masterleinad/fix_missing_field_initializers
0ce389590 Fix -Wmissing-field-initializers warning
5e574380c Relax scratch space limits for HIP reductions (#6029)
ef1ea9343 Add -Wdeprecated-copy warning and fix OMPT scan bug related to assignment operators (#6026)
9b06259fa #6027: replace remaining instances of ALL_t with Kokkos::ALL_t (#6028)
8b5881f96 Merge pull request #6022 from crtrott/4001-cp-support-ada
e86c8ea0b Merge pull request #6021 from dalg24/rc_4_0_01_support_for_amd_gpu_gfx1100
fdb089b34 Add UnorderedMapInsertOps for coo2crs (#5877)
047698529 Add half_t and bhalf_t limits (#5778)
e6b854846 Merge pull request #6018 from dalg24/rc_4_0_01_bug_desul_atomics_numeric_limits_max
82b39059b Merge pull request #6023 from dalg24/rc_4_0_01_fix_changelog
eb93bbdf4 Merge pull request #6019 from dalg24/rc_4_0_01_warning_hip_std_memcpy
a556f49f6 Merge pull request #6020 from crtrott/4001-cp-nvcc12-cpp20
ffa4f0309 Fixup 4.0 change log (#6015) [ci skip]
89bdbaad3 Fixup 4.0 change log (#6015)
f36c9aee3 Add KOKKOS_ARCH_ADA89 to print_configuration
363912114 Do not define KOKKOS_ARCH_AMPERE with Ada (compute capability 8.9)
991901bd9 add support to compile Kokkos for Ada generation (sm_89) consumer GPUs (RTX40x0)
6050076ab Merge pull request #5986 from masterleinad/cherry_pick_5981
e275a77a6 Add support for AMDGPU target NAVI31 / RX 7900 XT(X): gfx1100
8207b2ed1 Allow c++20 in nvcc_wrapper for nvcc 12 and above
eec1a5380 Allow that C++20 is passed to nvcc
1b2826343 Merge pull request #6000 from Rombur/fix_memcpy
3cbd2ec60 Desul atomics: fix bug in `desul::Impl::numeric_limits_max<uint64_t>` value
981d9c37f Merge pull request #6017 from masterleinad/fix_sycl_device_copyable
6f16f417a Fix namespace for is_device_copyable
8270db3ee Merge pull request #6003 from masterleinad/fix_team_scratch_1_queues_sycl_cuda
54da8a2bb Merge pull request #6000 from Rombur/fix_memcpy
8c3d97e6a Merge pull request #6013 from masterleinad/cherry_pick_6012
9b6a80f59 Merge pull request #6012 from aprokop/fix_version
7c7ae9abf desul: Move lock_array_copied from global scope (#5999)
a7a2d715c SYCL: Make is_device_copyable future-proof (#6009)
4e0d9c7fc CMake: update package compatibility mode when building within Trilinos
33b905be8 CMake: update package compatibility mode when building within Trilinos
79b824ed6 Merge pull request #6010 from masterleinad/fix_sycl_decorated_local_pointers
904fb32ec Fix warning in some user code when using std::memcpy
dc876ea2f Merge pull request #6011 from ldh4/release-candidate-4.0.01
69bd7bda4 Merge pull request #5995 from masterleinad/cleanup_ompy
bd6924397 Merge pull request #5996 from dalg24/desul_atomics_nvcc_warning
86b70c13d Merge pull request #6001 from dalg24/desul_atomics_warning_numeric_limits_max
a6f27bf73 Pass local_accessor directly instead
8400cbf9b simd: Fixed an incorrectly returning size for uint64_t in avx2 (#6004)
b0cc5a07b simd: Fixed an incorrectly returning size for uint64_t in avx2 (#6004)
3fc77899c Merge pull request #5948 from dalg24/kokkos_arch_nvidia_gpu_macro
b097f74ce Drive-by fix typos "fix {to -> too} many"
f0119709f Move Cuda/Kokkos_Cuda_NvidiaGpuArchitectures.hpp -> impl/Kokkos_NvidiaGpuArchitectures.hpp
a798ac7bb Explain acquire_team_scratch_space
c5d2c3dbf m_team_scratch_pool -> m_team_scratch_event
33a5d6065 Fix team_scratch_1_queues for SYCL+Cuda
19a43a647 Fix warning with NVC++
106a4a363 Fixup NVIDIA GPU arch must be defined potentially for other backends as well
762e3ce32 Desul atomics: Fix NVCC warning integer conversion resulted in a change of sign
48640d726 Fix compiling OpenMPTarget for AMD GPUs
d5244e1b3 Cleanup OpenMPTaget ParallelReduce
65aa95e82 Merge pull request #5965 from dalg24/desul_numeric_limits_max
4fde4b035 Support --compiler-options in nvcc_wrapper
be14872fe Remove workaround for submit_barrier not being enqueued properly
9480cb5f8 Merge pull request #5962 from masterleinad/host_iterate_tile_combined_functor_reducer
70f6d34c3 Fix sycl.large_team_scratch_size
b04b46af7 Merge pull request #5984 from uliegecsm/kokkos-graph-hip
fc3f7fc45 Merge pull request #5892 from aprokop/use_std_sort_within_a_bin
65bf47c49 Merge pull request #5983 from masterleinad/fix_unordered_map_m_size
bb5ef8fdc graph(hip): enable test
0eeb3a464 Merge pull request #5971 from masterleinad/fix_reducer_check_serial_hpx
3c629bebc Merge pull request #5774 from tcclevenger/refactor_scan_policy_tests
3cb200c11 Add another test case
6a8e923e5 Use (non-mutable) std::shared_ptr instead
74e2fe90c UnorderedMap: Ensure size() working in case of copies
260886dbf Merge pull request #5981 from masterleinad/fix_sycl_large_team_scratch_size
42991f104 Bit manipulation: implement `byteswap` (#5967)
22cc43312 Add to HIP tests in Makefile
bb8a96b2b Fix sycl.large_team_scratch_size
ee7576301 #5641: Fix HIP & CUDA MDRange reduce for sizeof(value_type) < sizeof(int) (#5745)
9786d576e Merge pull request #5977 from j8asic/patch-1
43b0245a2 Print Kokkos version at configuration time (#5979)
067f74aeb Allow c++20 in nvcc_wrapper for nvcc 12 and above
b000df58c Allow that C++20 is passed to nvcc
82bd4e602 Merge pull request #5963 from masterleinad/fix_partition_master_test
05f644a91 Merge pull request #5966 from dalg24/cuda_bhalf_conversions_ampere_plus
9f5f762ed Merge pull request #5973 from cz4rs/benchmark-add-git-info
63966c186 Merge pull request #5970 from mhalk/feature/add_support_gfx1100
00a24a474 Merge pull request #5972 from aprokop/rename_scoped_profile
3707be788 Merge pull request #5954 from masterleinad/pass_functor_analysis_to_parallel_reduce_ompt
9fe93d474 Merge pull request #5867 from akohlmey/add_cuda_ada_support
b10f35e0c Improve macro name KOKKOS_IMPL_{ARCH_NVIDIA_GPU_AMPERE_PLUS -> NVIDIA_GPU_ARCH_SUPPORT_BHALF}
b4de0ac9f Rename KOKKOS_{ -> IMPL_}ARCH_NVIDIA_GPU
72d39a7a5 Rename ScopedProfileRegion -> ScopedRegion
979899369 [ci skip] Add a comment
488ff103b Bring back git info to benchmarks output
651ba7874 Merge pull request #5968 from kokkos/PhilMiller-patch-1
85ab1bc8a Add support for AMDGPU target NAVI31 / RX 7900 XT(X): gfx1100
42abe3664 Convert OpenMPTarget ParallelScan
6e29e9236 Convert OpenMPTarget ParallelReduce
f670cae47 Let KOKKOS_ARCH_NVIDIA_GPU provide the Compute Capability
a7ac0453c Drop native from performance benchmark build
e0eacdd67 Drop native from macOS build
f46889d2f Drop native from HPX builds
0e302f6d8 Drop Kokkos_ARCH_NATIVE=ON because it breaks with ccache
1d26ca8d3 Make CUDA bhalf conversion code more forward compatible
4f18b1976 Desul atomics: fix bug max uint64_t value
0b2a956ae Merge pull request #5959 from aprokop/scope_guard
2b035de21 Use CombinedReducer in HostIterateTile
2e667d8c2 Fix partition_master test
9b1855061 Address review comments
0f7b7eb06 Merge pull request #5953 from masterleinad/pass_functor_analysis_to_parallel_reduce_sycl
787f94022 Merge pull request #5894 from masterleinad/pass_functor_analysis_to_parallel_reduce_threads
7fa5a75fc Merge pull request #5910 from masterleinad/fix_scan_serial_cuda
d7896e64f Add ParallelScanRangePolicy test
bab74b0ee Merge pull request #5947 from dalg24/desul_hip_rdc
543e971c8 Merge pull request #5958 from dalg24/fixup_openmptarget_concurrency
62fa442c7 Add [[nodiscard]] qualifiers
73de258fd Add ScopedProfileRegion
fb0b94cfa Fix OpenMPTarget::concurrency()
3c77f6fb7 Also convert SYCL ParallelScan
90836d29d Convert SYCL ParallelReduce
a75aa23a1 Merge pull request #5949 from masterleinad/pass_functor_analysis_to_parallel_reduce_openacc
b2ec19d9d Merge pull request #5952 from dalg24/unused_work_range
51fbd42a2 Drop unused ParallelX::WorkRange member types
5b1a0e36a Merge pull request #5950 from dalg24/4.0-changelog
952b841a3 Fix Kokkos_Threads_Parallel_MDRange.hpp
6d24bc083 Update changelog to 4.0.0
4dcb2946d Use KOKKOS_ARCH_NVIDIA_GPU macro in SYCL, OpenACC, and OpenMPTarget backends where appropriate
72271273a Convert OpenACC ParallelReduce
f967fa921 Provide another constructor in Test16_ParallelScan
5d3bcb1d7 Define KOKKOS_ARCH_NVIDIA_GPU macro when targeting an NVIDIA GPU architecture
fc4a9cecf Merge pull request #5942 from dalg24/print_config_disabled_atomics
65a6f9a3e Add comments testing for non-device-callable destructors
7b598eb1b Fix reducer result check for Threads ParallelReduce
9a33347f3 Use local "reducer" variable
1bfd0cc68 Convert Threads ParallelReduce implementations
79f81443a Fix reducer result check for Serial+HPX ParallelReduce
659baf67d Drop DESUL_HIP_RDC compile definition
554032e78 Desul atomics: prefer __CLANG_RDC__ macro
7e4665d07 Merge pull request #5944 from dalg24/drop_kokkos_enable_rfo_prefetch_macro
ee2ddaec0 Drop KOKKOS_ENABLE_RFO_PREFETCH macro
40c40a7c4 Convert OpenMP ParallelReduce (#5893)
5c5ac72da Tell when Kokkos atomics are disabled in print_configuration
d9fc6cb78 Merge pull request #5940 from dalg24/drop_kokkos_enable_atomics_macros
4bf2c5c5f RangePolicyRequire was not using require
e98766bcc Merge pull request #5936 from dalg24/drop_kokkos_arch_turing_macro
e69b7969a Remove mention of the KOKKOS_ENABLE_*_ATOMICS macros in <Kokkos_Macros.hpp> header
6d10edce7 Drop KOKKOS_ENABLE_CUDA_ASM* macros
aafe20c99 Drop `KOKKOS_ENABLE_*_ATOMICS` macros when printing configuration
08dc18059 Merge pull request #5923 from dalg24/drop_kokkos_memory_order
d303e403b Merge pull request #5935 from PhilMiller/intel-macro-cleanup
1528cd402 Merge pull request #5932 from shaomeng/improve_vector
32868fab5 Merge pull request #5931 from tcclevenger/cleanup_unit_test_cmake
537f62e5e Do not define KOKKOS_ARCH_TURING macro with generated GNU makefiles
6dd4800e6 Add KOKKOS_ARCH_ADA89 to print_configuration
0e9990230 Do not define KOKKOS_ARCH_AMPERE with Ada (compute capability 8.9)
61620e83e Revert "Revert "Fix intel hang""
f419b7303 Add missing <atomic> header include
b132b9b90 add cbegin() and cend() to Kokkos::Vector
2bbe1dfc4 Cleanup unit_test/CMakeLists.txt
5f8d0e3d2 Update clang-format CI build (#5930)
8b80bd0b2 Merge pull request #5925 from dalg24/kokkos_hip_architectures
310812b0d Remove extra double quote in CUDA and HIP allocation error messages (#5926)
b9e423e3b Export Kokkos_HIP_ARCHITECTURES variable with CMake
569a60949 Export `Kokkos_CUDA_ARCHITECTURES` variable with CMake (#5919)
1abf65388 Drop Kokkos memory oder classes
3bcf389e2 Use directly memory order from desul in Impl:: atomic funtion templates
2a7629dd6 Prefer non Impl:: atomic_{load,store} in AtomicDataElement since using relaxed memory order
416d7b743 New OpenACC backend implementation for  parallel_scan with  a range policy (#5876)
1cf890742 Use std::sort for sorting within a bin when possible
db890c979 Add test case
f93e48a3b Don't call the functor's destructor on the device for Serial and Cuda
d177f61e3 Merge pull request #5918 from PhilMiller/intel-macro-cleanup
12708a1e4 Use insertion sort for sort within a bin in BinSort (#5890)
90286ca36 Merge pull request #5911 from masterleinad/pass_functor_analysis_to_parallel_reduce_hip
33e5ef694 Revert "Fix intel hang"
54e4396dc containers: Remove workaround for Intel older than the required 19.0.5 and GCC < 5
6a3b1d60f algorithms: Remove workaround for Intel older than the required 19.0.5
1d08f6fc6 Merge pull request #5915 from dalg24/drop_host_lock_arrays
4f871bea8 Convert HIP ParallelScan
70a0af5f8 Convert HIP ParallelReduce
cba99e88d Remove misplaced and commented host lock array code in OpenMPTarget backend
63879dbbc Drop host lock array
b4655f90f Drop (unused) HBW lock array
c5fe10e7f Merge pull request #5817 from dalg24/drop_kokkos_lock_arrays
3c06ffeae Merge pull request #5907 from dalg24/bit_rotate
fcdedf75a Do not bother with sycl::rotate
771c95696 Merge pull request #5895 from masterleinad/pass_functor_analysis_to_parallel_reduce_hpx
bc1138ff2 Merge pull request #5884 from rbberger/amd_rocm_hpcbind
a2181fc62 Merge pull request #5901 from etiennemlb/fix/cmake-deduplication-issue
0691619fd Convert HPX ParallelReduce
ba195725c Use CombinedFunctorReducerType in ParallelReduce (#5874)
22ee14e0b Implement `rot{l,r}` function templates
4ec9fb665 Add AMD ROCm support to hpcbind
0f51821b0 Merge pull request #5905 from crtrott/fix_msvc_cuda
892131740 Silence unused parameter warning
66e143703 Apply clang-format
c74aa417d Split math function test further, to work around compilation issue with MSVC/CUDA
43ec33eab Work around a bug in MSVC/CUDA in a function.
40750093e Work around a failing CTAD occurance on MSVC/CUDA
47844ce3d Fix more rank style changes in MSVC/CUDA build
5b9f300bf Fix another error with MSVC where we need to use rank()
e53f22401 Cleanup prefer {traits:: -> }rank[_dynamic]
fb3d754f5 Merge pull request #4577 from dalg24/bit_manip
e146fc935 Merge pull request #5870 from dalg24/view_rank_member_function
75a3e80ef Disable uchar test to work around broken sycl::ctz on NVIDIA GPUs
b166d7751 Add `Experimental::*_builtin` counterpart to the bit manipulation template functions
c256a98a9 Merge pull request #5881 from msimberg/update-hpx-print-configuration
dff272ff1 Fix CMake deduplication issue when linking with hip::device
c4a5ad08f Update HPX::print_configuration
b3a8182ae Backport function templates from <bit> standard library header
03aae9a96 Merge branch 'develop' into view_rank_member_function
3be7ae202 Add compile-only test for View::rank[_dynamic]
948c6c69f Merge pull request #5620 from cz4rs/core-perf-tests-benchmark-conversion
af89aa748 Merge pull request #5878 from masterleinad/aligned_subview
25ff05bee Fix warning pointless comparison of unsigned integer with zero
3d2dc6ad1 Merge pull request #5887 from msimberg/nvhpc-version-macro-more-digits
2969679b7 Fix MSVC CI build
d3eac2b16 Cleanup prefer {traits:: -> }rank[_dynamic]
60ba1e1ee Add one more digit for KOKKOS_COMPILER_NVHPC version components
4ca034079 Add comment in test
c4b81ec48 Try fixing Cuda 11 CI
86bbae342 Deprecate subview overload taking a template argument for MemoryTraits
0b943434d MemoryTraits::value -> MemoryTraits::impl_value
6e36acf38 Add comment in test
c43e45ecf Remove Aligned memory trait when creating subviews
4286774d1 Fix warning comparison of integers of different signs
a7daa592b Fix printing extents and rank in error message when copying views
10fae1f38 Fixup update Kokkos::rank(View) free function and drop outdated comment
314b96614 Add View::rank[_dynamic] static constexpr data members
2e53f1c3d Add Impl::integral_constant
15989dd3a Merge pull request #5882 from dalg24/deprecate_view_rank_uppercase_r
e348b6972 Deprecate View::Rank
05416c984 View::{R -> r}ank in perf tests
8487a9669 View::{R -> r}ank in unit tests
2840e8d70 View::{R -> r}ank in algorithms and containers
9fb2bbce3 Prefer View::{R -> r}ank
2b532d1f9 Fix cache configuration in CI (#5871)
d39885a2f Merge pull request #5873 from masterleinad/fix_version_macro_develop
b6cdada5b Also test the KOKKOS_VERSION_{LESS,GREATER,EQUAL}
31750118d Add compile-only test to make sure version macros are defined
1d228fa74 Fix version macros
2caf64137 add support to compile Kokkos for Ada generation (sm_89) consumer GPUs (RTX40x0)
2272d3b7f Merge pull request #5865 from msimberg/hpx-concurrency-non-static-member-function
b6c49a9de Make HPX::concurrency() a non-static member function
b9d405ade Fix unused function warning (SYCL)
d25b94b11 Remove unused variable
204b08547 Remove obsolete warning pragmas
b4bd01d48 Use double quotes instead of <angled> include
eb18f1d36 Port Atomic tests
6ab279105 Clean up perf_test CMakeLists
1b9a67fd3 Port Mempool performance test
aa20b2b31 Avoid multiple `main()` definitions
ab55654ae Disable unsupported benchmarks in OpenMPTarget
e3324b37f Port ExecSpacePartitionig tests
a45165d09 Merge pull request #5861 from msimberg/hpx-header-to-subdir
4da9dd924 Move Kokkos_HPX.hpp header into HPX subdirectory
cd8e67fae Merge pull request #5857 from dalg24/rm_unsused_files
cd107dd4d Merge pull request #5856 from dalg24/destruct_delete
36bc91e06 Port GramSchmidt tests
62b8421c2 Remove duplicated helper
90b71cb4c Use correct license headers
b6c619ae9 Add missing tests to Atomic minmax benchmark
e250ce3b6 Move command line helpers implementation into a header
7dd33f81b Remove ported benchmarks from Makefile
e0b5846cd Measure only allocation time
5534a8ff5 Remove redundant include
076d93189 Use named constants
924600bf7 Reduce repetition in ViewFill benchmarks
b1a3135d1 Reduce repetition in ViewResize benchmarks
25876cfcc Port Custom Reduction tests
5635e1379 Use common helper for reporting results
372d03e8b Fix units - Fill
4b8e0e1cc Port Atomic MinMax tests
063fe9a14 Port HexGrad tests
7c9f640ee Port ViewAllocate tests
66e53a955 Remove redundant include
9126797fe Clean-up Benchmark_Context and hide implementation details
1b2d07a00 Port ViewResize tests
5235c89d5 Port ViewFill performance tests
bbde3b134 Remove pointless dummy source file in core
344826082 Remove unused impl/CMakeLists.txt
8b19e2de5 Drop (unused) Impl::destruct_delete utility
d8d9c58c3 Check Kokkos::num_threads and device_id in tests
a4af6f7b6 Add Kokkos::num_threads() and Kokkos::device_id()
2aa257671 Dispatch Kokkos::sort(Kokkos::View) to SYCL oneDPL (#5229)
6f12ca25d Merge pull request #5852 from rgayatri23/OpenMPTarget_intel_pvc_edits
e7aeb9bb3 Merge pull request #5816 from dalg24/tpetra_atomics_max_abs
fa54c9710 Merge pull request #5850 from crtrott/no-deprecated-3-in-makefile
446532e38 Update core/unit_test/TestNumericTraits.hpp
86a442711 Drop (deprecated) KokkosCore_UnitTest_DefaultDeviceTypeInit_* from the makefile
cfb7b2f8a Merge pull request #5854 from dalg24/house_keeping
14f9425af OpenMPTarget: Replace KOKKOS_ARCH_INTEL with KOKKOS_COMPILER_INTEL to protect declare target on Intel GPUs.
34a21cb48 Merge pull request #5847 from dalg24/fixup_omp_thread_pool_size
387de48b7 Move { -> Threads/}Kokkos_Threads.hpp
be83e9a8b Move { -> Serial/}Kokkos_Serial.hpp
743625604 Move { -> Cuda/}Kokkos_Cuda[Space].hpp
1d8dd9078 OpenMPTarget: Enable declare target for all Intel GPUs.
8b2bf33c5 Merge pull request #5849 from dalg24/hpx_asyn_dispatch_warning
f2ec98dab Fix clang+cuda compiler warning about cudaDeviceSynchronize (#5846)
4c878a0a4 OpenMPTarget: Adding declare target for constexpr variables.
568bc2cb4 Don't enable deprecated code 3 in Makefile builds anymore
c005e607a Pass *this to in_parallel in OpenMP::impl_thread_pool_size()
f68098b4e Fix CMake warning when HPX is not enabled
cba11a18f Merge pull request #5841 from dalg24/desul_atomics_source_files
5bb7e0a54 Fixup deprecated code 3 code path OpenMP::impl_thread_pool_size
5ea96bcaa Update HPX backend to use HPX's sender/receiver functionality (#5628)
97ad51b9b Fix unused parameter warning in SYCL lock array and add comment
879d60798 Make OpenMP::concurrency and impl_thread_pool_size non-static (#5836)
46185fed3 Merge pull request #5840 from dalg24/nvhpc_arch_native
153aa59cf Merge pull request #5838 from dalg24/typo_deprecared
41166e149 Merge pull request #5833 from masterleinad/sycl_device_global_static_only
43ccea6a5 Desul atomics: Drop `DESUL_HAVE_{GPU_LIKE,FORWARD}_PROGRESS` macros
1d19328ed Desul atomics: SYCL lock arrays out of sync
37bcd4129 Desul atomics: cleanup macro guards in CUDA/HIP lock guard files
23e2d85b0 Desul atomics: conditionally append the CUDA/HIP/SYCL source files
93487cf1e Fix flag passed to NVHPC when `Kokkos_ARCH_NATIVE` is `ON`
ccbfb0086 Set native flags according to CMAKE_SYSTEM_PROCESSOR (#5831)
b8603a749 Fixup typo `#ifdef KOKKOS_ENABLE_DEPRECA{R -> T}ED_CODE_3`
c10edf35f Skip Tpetra reproducer with NVHPC compiler
f9f1808a6 Merge pull request #5834 from masterleinad/fix_unprefixed_macros_kokkos_host_mdpsan
a62aa40e7 Refactor OpenMPTarget backend (#5726)
f3d9efbd5 Fix unprefixed macros on KokkosExp_Host_IterateTile.hpp
dac21c753 Add non-standard `rsqrt` math function (#5644)
073ce8b9f Try using oneAPI 2023.0.0 in SYCL+Cuda CI (#5813)
b477f998b Merge pull request #5832 from PhilMiller/fix-crs-define
d41a6df17 HIP: Drop obsolete macro definition
87535d8c7 ViewLayoutTiled: Be scrupulous about macro naming and undefining
f4c8f8d28 OpenMPTarget: Be scrupulous about macro naming and undefining
ae585b7f0 CUDA: Fix up comment
fbceafdd0 CUDA: Convert simple value macro to constexpr
71e0ecaf4 CRS: Use Kokkos device function macros rather than duplicating code when compiling for GPU targets
ba4ebc403 Restrict KOKKOS_IMPL_SYCL_DEVICE_GLOBAL_SUPPORTED feature macro detection to static libraries
52586ef1b Merge pull request #5825 from dalg24/device_ptr_to_lock_array_in_constant_memory
0130a3f2e Initial OpenACC parallel_reduce implementation for Team policy (#5610)
59067d41e Use raw literal string to avoid having to escape characters in git commit message (#5823)
333157f73 Merge pull request #5742 from rgayatri23/OpenMP_regression_fix
8103d8263 SIMD backend of ARM NEON (#5775)
fb7d9f23f SYCL: Pass Xsycl-target-backend* only to the linker (#5705)
04e3437f7 Further update to CUDA occupancy calculation (#5739)
a564953aa Desul atomics: let pointer to the device lock arrays (HIP and CUDA) be in constant memory without RDC as well
22380c7c3 Merge pull request #5819 from dalg24/deprecate_kokkos_active_execution_memory_space_macros
92895ffbc Merge pull request #5818 from masterleinad/fix_all_t_deprecations
e8381d8ce Add TODO comment to replace fully-qualified name when possible
ecd23e4ac Spell out Kokkos::ALL_t to avoid deprecation warnings
789dfa772 Merge pull request #5821 from masterleinad/fix_sycl_ci_device_global
9d7257ad9 Fixup turns out Tpetra "abs max" operation does not preserve the sign
2e1a559cc Merge pull request #5820 from crtrott/fix-intel-ice-dev
eabd0e445 Disable global device variables in SYCL+Cuda CI
cd8eb9c10 Remove Cuda and HIP lock arrays altogether
f78d87ac8 Unwire initializing/finalizing Kokkos lock arrays
bd86fe907 Change `#ifdef KOKKOS_ENABLE_DEPRECATED_CODE_{4 -> 3}`
2b5c31a45 Intel ICE Sacado: turn off support for nested OpenMP with ICPC
6701772fc Intel ICE Sacado: use new HostIterateTile API in OpenMP
6688cad1a Intel ICE Sacado: use new HostIterateTile API in HPX
b98e82477 Intel ICE Sacado: use new HostIterateTile API in Threads
80c770d4e Intel ICE Sacado: use new HostIterateTile API in Serial
6935f7054 Intel ICE Sacado: rewrite HostIterateTile
a6a023790 Deprecate `KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_*` macros
60f80e5d4 Merge pull request #5613 from masterleinad/sycl_extended_atomics
2f07a04e2 Fix initial value (identity element) for max abs
258bac69a Add unit test capturing Tpetra custom atomics use case
7cccd74f8 Merge pull request #5707 from masterleinad/sycl_update_2023
d63af25f9 Merge pull request #5814 from dalg24/scratch_locks
5d93865b0 Break lock array dependence of Cuda and HIP teams impl
5d87aa9f7 Merge pull request #5811 from dalg24/rm_desul_atomic_helper
8b8061605 Merge pull request #5810 from masterleinad/move_sycl_headers
61d8569fe Update Dockerfile used for SYCL+Cuda CI
05d008dd9 Address deprecations in oneAPI 2023.0.0
b5b05044c Update minimal compiler requirements for SYCL
0180ff565 Update architecture flags for SYCL+Cuda
b0be8e6c7 Disable tests failing with SYCL+Cuda after update to oneAPI 2023.0.0
3369267b5 Merge pull request #5800 from masterleinad/improve_comment_test_team
7a13414d1 Merge pull request #5767 from masterleinad/fix_scratch_again
84a336a52 Merge pull request #5807 from dalg24/all_t
adb3141e8 Drop desul_* helper functions in tasking
94d9c9e8b Merge pull request #5804 from dalg24/purge_legacy_atomics
ddefe6180 Issue warnings when using Kokkos::Impl::ALL_t
236e892a2 Fixup GH Actions compiler warnings (#5780)
6d90db37b Move all SYCL headers into SYCL directory
05f6a9aab Per review dropped superfluous const-qualifiers
4519e4ca8 Drop anonymous namespace around definitions of ALL, WithoutInitializing, and AllowPadding
e91f7e880 Guard using-declaration in Impl:: namespace with #ifdef KOKKOS_ENABLE_DEPRECATED_CODE_4
5304a4080 Stay off Kokkos::Impl::ALL_t
7869915c0 Move Kokkos::{Impl:: -> }::ALL_t definition and add using-declaration in Impl:: namespace for backward compatibility
1668cf432 Merge pull request #5802 from ibaned/avx512-mask-fix
aeab5bd9d Merge pull request #5805 from dalg24/fixup_rocm54_force_global_launch_launch
d745c3173 Fixup deleted wrong branch in HIP locks
796e96488 Drop `KOKKOS_ENABLE_IMPL_DESUL_ATOMICS` macro define altogether
7f5ea6009 Update diff_files (might be worth revisiting logic)
52953c824 Remove a whole bunch of Kokkos legacy atomics headers
44140f721 Get rid of #ifdef KOKKOS_ENABLE_IMPL_DESUL_ATOMICS in unit tests
c3fe1d607 Purge macro guards for desul atomics being enabled or not
c54547ef4 Fixup ROCm 5.4 ImplForceGlobalLaunch{Launch -> }_t typo in unit tests
153b4c1da remove const_cast with some code duplication
f253bc48f Print KOKKOS_IMPL_SYCL_DEVICE_GLOBAL_SUPPORTED in print_configuration
4c03c8d10 KOKKOS_SYCL_DEVICE_GLOBAL_SUPPORTED->KOKKOS_IMPL_SYCL_DEVICE_GLOBAL_SUPPORTED
05cb3f511 Purge logic around desul atomics being enabled at configuration time
cb67caf6a Warn at configuration time if attempting to disable desul atomics and force using it (#5801)
5212d90b5 Fix a bug in AVX512 simd_mask::operator[]
aa0f81e35 Replace HIP_LOCK_ARRAYS macros by functions (#5770)
a75c61300 Merge pull request #5796 from Rombur/force_global_launch
53cc29764 Improve comments in TestTeam.hpp
e4b3c8269 SYCL: Add support for arbitrary size atomics
5b3b6e7e1 Rename ImplForceGlobal to ImplForceGlobalLaunch
de37fc212 Merge pull request #5784 from masterleinad/drop_KOKKOS_IMPL_WORKAROUND_INTEL_LLVM_DEFAULT_FLOATING_POINT_MODEL
d50bdd0bb Merge pull request #5797 from cz4rs/container-options
cf6d43dc5 Merge pull request #5786 from dalg24/cleanup_rm_eliminate_warning_for_lock_array
478f087b2 Fix typo
b99fb31e3 Use GTEST_SKIP to skip test
c9929fc73 Merge pull request #5795 from dalg24/reduction_identity_char
0f8b7ca3a Skip test and add comment explaining why
29020350e Fix tests when using ROCm 5.3
d7aa278a4 Remove obsolete container configuration
13c4de2a4 Merge pull request #5793 from dalg24/fixup_jenkins_gnu_generated_makefile
cf67ab408 Force GlobalMemory launch for some Bessel tests when using ROCm 5.4
f9d95058a Add parameter to force using GlobalMemory launch mechanism using HIP
2e6c2387f Drop KOKKOS_IMPL_WORKAROUND_INTEL_LLVM_DEFAULT_FLOATING_POINT_MODEL
e8c08e2c0 Fix sycl.scratch_align test
de26b23c5 Add missing ReductionIdentity<char> specialization
2bde8fc6b Merge pull request #5792 from masterleinad/improve_assert_macros
24ef794e9 Fixup warning in Jenkins CI build with GNU generated makefile
d2a73f956 Merge pull request #5791 from dalg24/dead_omp_test_source_file
73b4ca835 Prefer ASSERT_EQ over ASSERT_TRUE with ==
aa7865e84 Remove unused OpenMPTarget test source file
7475b8929 Remove dead OpenMP test source file
c30481854 Merge pull request #5755 from Rombur/hip-fix-global-launch
9f09e2b17 Drop unused Kokkos::Impl::eliminate_warning_for_lock_array CUDA/HIP functions
7f08b95a8 Desul atomics cleanup remove unused Impl::eliminate_warning_for_lock_array()
6e73a3540 Merge pull request #5785 from masterleinad/replace_sprintf
0e2fda878 Merge pull request #5642 from cz4rs/enable-flang
20b609a9f sprintf -> snprintf
b5bd709fd Merge pull request #5779 from cz4rs/upgrade-github-actions
4bd3e8588 Upgrade GitHub actions
7652228ed Use `flang-new` for Fedora builds
48e087415 Merge pull request #5777 from junghans/patch-5
619ed2d26 Fix build on Fedora rawhide
910d43e45 OpenMP: Adding an ifdef around chunksize for static schedule for GCC compiler.
728e3d3b3 Merge pull request #5762 from masterleinad/fix_scratch_space_for_sycl
0db3bd83b Fix a typo
4829fb2e4 Add a mutex to protect scratchFunctor
8f4f31d61 Merge pull request #5764 from dalg24/desul_atomics_config
ba0ad25c9 Merge pull request #5765 from ldh4/hpx_team_reduce_sfinae
7a3bfe039 Fix macro typo used in the OpenACC backend parallel_reduce(MDRange). (#5766)
97287f674 Remove unnecessary header
9f24f5587 Merge pull request #5763 from masterleinad/fix_openmp_with_deprecated_code_3
20abee9fd Let increment be of type uintptr_t fixing warning
17581963d Generate <desul/atomics/Config.hpp> file from the generated Makefiles
51aa90411 Desul atomics configure library based what the user enabled
45acff308 Fix reviewers' comments
a9c997cea Fix ScratchSpace pointer comparison for SYCL
aad87921e Merge pull request #5757 from dalg24/desul_atomics_drop_cuda_arch_macro_guards
02941a0a1 Merge pull request #5760 from dalg24/desul_atomics_gnu_and_msvc
7f883bc4c Merge pull request #5756 from dalg24/desul_atomics_sycl_macro
e5e87423d Added missing enable_ifs to hpx team parallel_reduce
33d7fce6c Fix compiling with OpenMP and Kokkos_ENABLE_DEPRECATED_CODE_3
1f68ab49e Desul atomics cleanup enable GCC or MSVC atomics
cd0b63125 Encapsulate staging inside scratch_functor
c6d7662d7 Merge pull request #5759 from dalg24/cmake_package_version_compatibility
49b00de78 CMake: change package COMPATIBILITY mode {SameMajorVersion -> AnyNewerVersion}
0986a3a86 Desul atomics: drop unnecessary macro guard that checks for __CUDA_ARCH__ in PTX assembly code
0e3848f0b Desul atomics: drop unnecessary macro guard that checks for __CUDA_ARCH__ in compare exchange
46aae0f14 Desul atomics fixup detect use of SYCL
989d996aa Merge pull request #5751 from masterleinad/update_kokkos_version_develop
296de1291 Return host functor instead of device one
487deee1c Apply clang-format
cde661d70 Update Kokkos version on develop
b47523388 Merge pull request #5722 from dalg24/openacc_parallel_reduce_mdrange
cf4358ee9 Add more comments
d0d64043b Merge pull request #5747 from dalg24/fixup_omp_makefile
00ab7630d Fixup forgot to add new OpenMP source file in Makefile
a84c7a5de Merge pull request #5741 from ndellingwood/update-testallsandia
57504c4b2 Merge pull request #5698 from masterleinad/static_assert_reducer
761ffda65 Fix HIP Global Launch with HSA_XNACK=1
dafb57794 Merge pull request #5738 from Rombur/refactor_openmp
459e8811b Merge pull request #5740 from seyonglee/openacc_cmake_make_bugfix
cf04bb525 [ci skip] update test_all_sandia
74a7988bf Minor bug fixes on CMake and Make configurations for the OpenACC backend.
fb47be7eb Merge pull request #5730 from tkordenbrock/tkordenbrock/fix-DynamicView-deep_copy-dp-sp
fbfa01e0a Move OpenMP UniqueToken to its own file
2f7e94a8b Move OpenMP functions out of Kokkos_OpenMP_Instance.hpp
f92270b5e Move part of Kokkos_OpenMP_Instance.cpp into Kokkos_OpenMP.cpp
48e86920e Move Kokkos_OpenMP.hpp to OpenMP/Kokkos_OpenMP.hpp
5d136ccda Static asserts for reducers
d2e574ce3 Apply clang-format
77d57d2a6 Merge pull request #5731 from dalg24/cleanup_cuda_blocksize_deduction
3697d4526 Merge pull request #5735 from crtrott/remove-kokkos-cxx-standard-from-buildmd-develop
e0ebaa53a Merge pull request #5733 from ndellingwood/fix-intel19-werror
edfb1e3ab Fix -Werror with intel/19
6aa7bf618 Remove KOKKOS_CXX_STANDARD mentioning from BUILD.md
67dff628b fix broken DynamicView test case #4
1f4468bda fix src/dst Properties in deep_copy(DynamicView,View)
d4bd01277 Revert "Drop pre CUDA 11 macro guards in occupancy calculation"
1fd858961 Drop now unused `get_shmem_per_sm_prefer_l1` function
d34c75136 Drop pre CUDA 11 macro guards in occupancy calculation
4954ce289 Merge pull request #5689 from cz4rs/performance-results-visualization
a23580e8d Temporarily disable unsupported reduction tests in core/unit_test/incremental/Test14_MDRangeReduce.hpp for the OpenACC backend.
7e651ca70 Group similar options together
ef7fd6014 Configure `ccache` for benchmark builds
1134a1fb5 Simplify Kokkos configuration
64d9b44a1 Use maximum available level of build parallelism
9fd7187b5 Use correct GitHub access token
d17945316 Use correct branch for destination repo
9fbd78a01 Configure `ccache` correctly
901862190 Initial implementation of MDRange parallel_reduce
c6fae3fd5 Move definitions of `OpenACCIterate{Left,Right}` and `OpenACCMDRange{Begin,End,Tile}`
604dc86c0 Remove commented out code
327aac579 Add comment for PerformanceTest_* executables
3a1769b6b Build on pull request
176ae8bbb Use double quotes instead of <angled> include
67a92d345 Do not build tests and examples
92906bf38 Remove security options
2e0934100 Use separate .yml file for benchmarking
07b01efe5 Use correct header guards

git-subtree-dir: tpls/kokkos
git-subtree-split: 1a3ea28f6e97b4c9dd2c8ceed53ad58ed5f94dfe
etphipp added a commit to sandialabs/GenTen that referenced this issue Jun 26, 2024
25a31f881 Merge pull request #1877 from ndellingwood/master
b6a2db921 Update master_history.txt
14ad220a9 Merge branch 'release-candidate-4.1.00' for 4.1.00
1592d9ed9 Merge pull request #1874 from ndellingwood/fix-compatibility-kokkos-4.0
9620913d1 Merge pull request #1873 from kokkos/update-changelog-4.1.00
9e9351bd1 CHANGELOG: small updates
a3c07dfad CHANGELOG: organizing enhancements section
2579c4e3c CHANGELOG: reorganizing the new features section
c1176142b Update changelog for 4.1.00
a0d99bf69 Merge pull request #1868 from lucbv/MKL_INT
7871bd233 Merge pull request #1867 from bartlettroscoe/tril-11966-bad-batched-incl-dir
e624a7d3b Update to version 4.1.00
af312b9a0 Merge pull request #1850 from e10harvey/issue1764
340895119 Merge pull request #1865 from ndellingwood/update-testall
ec4a4cb09 Merge pull request #1864 from vqd8a/streams-tests-fix-small-numthreads
77745756f Add tests for nstreams=1
98eb68eda Merge branch 'develop' into streams-tests-fix-small-numthreads
4dbb5838e Check concurrency with nstream instead
c62d07442 cm_test_all_sandia: updates for blake
cec953f37 Merge pull request #1861 from cwpearson/fix/rocm-5.2.0-hang-quick
22b5f4ef1 Merge pull request #1862 from e10harvey/workaround_gnu_bug_81429
03998f350 Merge branch 'develop' into streams-tests-fix-small-numthreads
b2581bb2d Apply clang format
ba75b4b58 Remove redundant file
6a71179ab Restore orig. KokkosSparse_BsrMatrix.hpp
71f04ce8a Workaround checking OMP_NUM_THREADS with number of streams
f75ec31ce sparse/src: Add ifdef for doxygen < v1.9.7
ce8bb989f Benchmark cleanup for par_ilut and spmv (#1853)
6d79eaf5d sparse/src: Work around gnu compiler bug
478a56b53 use host pointer mode in rocBLAS scal
232b5bdac Merge pull request #1814 from e10harvey/issue1804
8b3c95135 Merge pull request #1856 from e10harvey/enable_sphinx_werror
8fae08018 Merge pull request #1783 from e10harvey/batched_gemm_eti
7865e88ac Merge pull request #1857 from e10harvey/issue1673
8b62c3851 Merge pull request #1855 from ndellingwood/issue-1749
eb92728a6 batched/unit_test: Optionally skip simd dcomplex4
558dbe4a9 docs: Update trmm. Add trtri.
24d259b0d docs: Fix blas rst files
dec2bcb8d Remove TestDeviceType
c5b2305aa docs: Enable sphinx -werror
07dc82a8d docs: Fix sphinx warnings
d88ad3523 sparse: Various doxygen fixes
9d723f6fe batched/dense: Add gesv DynRankView runtime checks
a907ca594 Merge pull request #1854 from ndellingwood/patch-match-trilinos-11921
87a384657 Address PR feedback
fea22d883 Revert ".github/workflows: Print out arch in osx CI"
341a4779f Revert ".github/workflows: Print out arch in osx CI"
91c0b606a Revert ".github/workflows: Print out arch in osx CI"
0f54c3da9 Merge pull request #1852 from e10harvey/docs_parilut_handle_fix
48d67ff62 CMakeLists.txt: Add all_libs alias
a8884845a CMakeLists.txt: Add alias to match what is exported from Trilinos
127c28198 Remove non-existent subdir kokkos-kernels/common/common (#11921, #11863)
d7c9a0771 docs/developer: Add Experimental namespace
fa2bdef62 Merge pull request #1843 from e10harvey/docs_compiler_profiling
b43d47557 Merge pull request #1844 from bartlettroscoe/remove-nonexistant-incl-dir
b3328390e KokkosKernels: Remove non-existent common/src/[impl,tpls] include dirs (trilinos/Trilinos#11545)
ef98cb76a Merge pull request #1848 from e10harvey/fix_typos
4b3bab673 Merge pull request #1849 from ndellingwood/update-cmake-option-naming
5b369abef Update cmake option naming in docs/comments
723ab23aa blas/tpls: Fix gemm include guard typo
c5302a1ca docs: Add profiling for compile times
ac60cd4e2 Merge pull request #1841 from cwpearson/fix/spot-check-tpls-rocm
9292be86d batched/dense/impl: Fix headers
407e31a99 Merge pull request #1835 from dalg24/cuda_uvm
3a1ea766b batched/dense: cleanup gemm handle
5ece26d89 batched/dense: cleanup and move ETI into spec file
90c8a5ed1 batched/eti: Use Trans from KokkosBlas
48d647966 cmake: Fix batched eti args
557002b53 batched/CMakeLists.txt: ETI valid args only
d55ba7bf3 batched/dense/unit_test: Add TEST SKIPPED prints
1c256b1a3 batched: fix eti avail and wrapper
033a75e27 Merge pull request #1820 from vqd8a/sptrsv-solve-streams
940217b31 Merge pull request #1840 from ndellingwood/update-caraway-queues
c0349db3a .github/workflows: Print out arch in osx CI
611641996 .github/workflows: Print out arch in osx CI
9d4de5dbe add rocblas and rocsparse to --spot-check-tpls
9ad25c9b9 batched: note that tpl struct is unused
f663066d6 batched: Remove empty decl ETI files
721f388f9 batched: Populate avail eti files
d55fb1054 .github/workflows: Print out arch in osx CI
f64d6361a batched/dense: Add HostLevel Gemm unification layer
62b863de5 batched/dense/impl: Remove forward decls
dca6ee561 batched/dense/src: Add KokkosBatched_HostLevel_Gemm.hpp
40d76ebc6 perf_test/blas/blas3: Add compile-time checks for BatchLayout
57bfb3f0b batched/dense/unit_test: Run tests if ETI_ONLY is disabled
7ad0ede54 Start moving into HostLevel headers
813d02967 minor cleanup
60ddbb25a Fix constexpr branch
7b6073bb9 batched/eti: ETI host-level interfaces
237597a00 cm_test_all_sandia: update to add caraway queues for MI210, MI250
3917bd320 Merge pull request #1821 from lucbv/spmv_benchmark
82d93a25c Support rocSparse in rocm 5.2.0 (#1833)
5070d87b5 Merge pull request #1824 from e10harvey/issue1823
5ea1c3c32 Update perf_test/sparse/KokkosSparse_spmv_benchmark.cpp
2b3a070c1 applying clang-format
1a69ed2ae SpMV benchmark: adding logic for spmv algorithm
29c24f2bd SpMV: applying clang-format to benchmark
e3b6eb19e SpMV: adding logic in benchmark to chose algorithm to test.
09dc9ff27 SpMV: applying clang-format to benchmark source file
f75527cd6 SpMV: adding benchmark for spmv
7df961ef9 Merge pull request #1836 from dalg24/cleanup_kokkos_enable_pthread
49b0c491d Merge pull request #1834 from dalg24/remove_dead_code
08f4a4613 Merge pull request #1828 from ndellingwood/fix-cusparse-version-check
c74db8cc0 Merge pull request #1826 from brian-kelley/FixRhelNightly
058f099e1 Merge pull request #1827 from lucbv/Kokkos_ALL_t
6f26e1527 Drop outdated workarounds for backward compatibility with now unsupported Kokkos versions
e329be8dc Do not bother querying the value of Kokkos_ENABLE_CUDA_UVM
3273a031b Do not adjust KokkosKernels_INST_MEMSPACE_CUDA[UVM]SPACE default value
ebd1406fb Remove dead code guarded by `#ifdef KOKKOSKERNELS_INST_MEMSPACE_CUDAHOSTPINNEDSPACE`
abe8558b1 Remove remaining decl.hpp files
b1e22208f Remove includes of decl.hpp files
ad541587d sparse/eti: Remove unused decl.hpp.in files
2f0ce87ca Merge pull request #1830 from ndellingwood/weaver-update
4a8667228 scripts/cm_test_all_sandia: Update cuda11 modules
09a4820b3 cm_test_all_sandia: updates for weaver
d0f4a9ca0 Merge branch 'develop' into sptrsv-solve-streams
ea3321c2f Apply clang format
bf498cd4a Remove unnecessary code
b3ef19c74 Applying clang-format
28e813086 Sparse: fixing a few issues related to coo2csr and par_ilut benchmark
f30291cd1 spmv cusparse version check modified for cuda/11.1
1424f8aef Kokkos 4 compatibility: modifying the preprocessor logic
990d7db76 Fix errors and warnings in sems-rhel nightly
2bb633d46 .github/workflows: Summarize github-DOCS errors and warnings
69d0a8b5b Add BsrMatrix SpMV in rocSparse TPL, rewrite BsrMatrix SpMV unit tests (#1769)
63eab04f5 Merge pull request #1819 from ndellingwood/fix-rocblas-build-2
86784956b Merge pull request #1816 from cwpearson/ci/KokkosKernels_PullRequest_VEGA908_Tpls_ROCM520
7c9e7b433 Merge pull request #1822 from lucbv/ger_doc
3794a36be Merge pull request #1818 from jgfouca/jgfouca/par_ilut_perf_test_refactor2
af4688919 Ger: adding documentation stubs in apidocs
5b1c1f4fa Remove unused variable
19333668c Merge branch 'develop' into sptrsv-solve-streams
89d67ff14 Apply clang format
924cdee42 Add unit test for sptrsv via streams
787c711bb Merge pull request #1686 from e10harvey/coo2crs
725b46b89 apply clang-format
b8a22cc6c blas: fixups for ger exec space instances
146ce522f blas: various rocblas execspace fixes
4f1abd794 apply clang-format
954750d0c rocblas tpl spec: add missing comma separating vars in some macros
42ef78393 Merge pull request #1756 from eeprude/ger2
6e80b37f9 formatting
b60e681da Reorganize par_ilut performance test
bf06fef9a Merge pull request #1810 from ndellingwood/fix-rocblas-build
28a0421c2 Merge pull request #1812 from lucbv/blas2_3_on_stream
98c6509eb Merge pull request #1817 from bartlettroscoe/tril-11545-kokkos-no-subpks-develop
ab0f774cd Workaround for #1777 - cusparse spgemm test hang (#1811)
6c514ff1c Merge pull request #1813 from ndellingwood/update-changelog-4.0.01
4e6c85c39 Docs: adding stubs for trsm and trmm and updating gemv and gemm
cd242ba2f New performance test for par_ilut, ginkgo::par_ilut, and spill (#1799)
0ba9eaa3a Manually remove redundant Kokkos dep (#11545)
099d05784 Run script remove_kokkos_subpackages_from_trilinos_packages_r.sh (#11545)
9d95d49d1 only enable KokkosBlas gesv test for CUDA+MAGMA and HOST+BLAS
ff664866d cm_test_all_sandia: load openblas/0.3.20/rocm/5.2.0 for TPL spot check on caraway
28254863f Apply clang format
166716a87 No need to fence after each level
415deb091 Update changelog
1ae83cf16 Update changelog
4fc4831fb Update changelog
2f78417b7 Some changes in sptrsv_solve_streams for cuSPARSE < 11.3
5d027ccec Add sptrsv_solve_streams for cuSPARSE < 11.3
db991036e BLAS2/3: applying clang-format
c00c8a6e3 BLAS2/3: fixing some TPLs issues with execution space code path
a725974a3 Minor fixes for sptrsv cuSPARSE
1331baf11 Merge pull request #1808 from ndellingwood/master
e65f61147 sparse/unit_test: Use host mirror of RandCsMatrix map
005530354 Minor compilation error. Thanks to Luc for the proper suggestion.
19903279d Formatting
7720c8199 Added explanations
0e26dd1a1 Tests passing now at blake
99cbf779d Possible corrections for test on blake
31f2b0555 Fix name mismatch with rocblas tpl spec layer
5a5a2946c sparse: Encapsulate CooMatrix. Cleanup coo2crs TODO.
c208dacae Update master_history.txt
8809e41ca Update to version 4.0.1
9cee1a3d7 sparse/unit_test: Check last entry of col_map. Improve readability.
946f29a63 Merge branch 'develop' into sptrsv-solve-streams
29034f31a Minor changes to match L solve and U solve implementations
311157f62 Merge pull request #1795 from lucbv/norms_on_stream
2087e7009 BLAS3: starting to add stream support for TPL code path of trmm/trsm
4231677db Formatting
89eab5240 Changes made for compilation in blake
961b6362a Changes for testing in blake
6e06af03c Backup
215a00692 Formatting
ab59a34cf Another typo
ac307232e Typo
5cf9c3ea9 Formatting
0453f0d02 Forgot some spots that need a template parameter for the execution space
f27e4d034 Formatting
792bd5fa8 Correcting compilation errors on blake
a368dd3cb Formatting
6742ef3bf Solving compilation issues on the automatic tests
52a2a2de2 Corrections for some automatic tests that are failing
3a91bb0e5 Proper formatting
9f49fb972 Addressing new feedbacks from Luc.
d94c0139e Minor corrections
629337c26 Needed to format two extra files in kokkos-dev-2 in order for the automatic 'check' step to pass
7ce9d9f83 The clang formatting from kokkos-dev-2 puts a space into these 3 files, which needed (the space) to be removed in my Mac in order for the compilation to work. Tests pass in my Mac.
e41861865 All files formatted with clang 8.0
414210378 Addressed all feedbacks from Luc and Kim
b21194af4 Handling compilation warnings and errors at weaver
99a3b9dac All changes again, because previous branch got changes beyond those related to ger
13c5d8633 BLAS2/3: adding proper execution space interfaces to gemv and gemm
31e00593f Merge branch 'release-candidate-4.0.01' for 4.0.01
a46ebd5e9 Merge pull request #1719 from lucbv/gmres_type_fixes
d3b8bc823 BLAS1: adding final fences for code path that return host results
cb9fc79da Merge pull request #1768 from e10harvey/more_sparse_docs
f83016589 BLAS1: applying clang-format
e36c50e4b BLAS1: nrm2w adding support for execution space overload
20463f2a4 BLAS1: nrm1/nrm2 update CUBLAS calls
a0d52184d BLAS1: nrm2(_squared) updated to have executions_space overload
f0088ab94 BLAS1: nrminf fix in the TPL layer for execution space overload
be556c08a BLAS1 nrminf: adding execution space overload
a760a1d60 BLAS nrm1: fixing issues with TPLs
4538fc446 Blas1: updating nrm1 interface to accept execution space instance
ccf8f1557 Merge pull request #1805 from lucbv/blas1_on_stream_docs
03d678724 BLAS1: clang-format for documentation... : (
6606dde03 BLAS1: documentation adding default space info and non-block statement
daf1edce6 BLAS1: updating documentation for changes in PR #1803
3ce7f2985 Merge pull request #1803 from lucbv/blas1_on_stream
6d673920c Merge branch 'develop' into sptrsv-solve-streams
5f89a772f sparse: Fix intel build error
bb0e2fef3 BLAS1: fix documentation for fill and mult and apply clang-format
bf09ba19b BLAS1: fix CUBLAS TPL layer for axpby and scal
fa03d4884 Update blas1.rst
fb6318907 Merge pull request #15 from brian-kelley/GS_Docs
ffefb5386 BLAS1: applying clang format
9d45383d2 BLAS1: fix some Host BLAS TPL issue with execution space overload
b3d73f1d0 Add doxygen for user-facing Gauss-Seidel functions
2949394c0 BLAS1: apply clang-format
93986fd68 sparse: coo2crs add RandomAccess to BmapViewType
2d3c2c4f4 Update sparse/src/KokkosSparse_par_ilut.hpp
4ad4962c5 Update docs/developer/apidocs/sparse.rst
8a35f819a Update docs/developer/contrib.rst
4ce5d2a4e sparse: coo2crs and crs2coo updates
394409fb4 docs: build_doc
4c6d55b11 docs: Update contrib
6e150ac9d sparse: CooMatrix
6016771b3 sparse: CooMatrix
82e13ca28 Update changelog
0dcbd6a17 par_ilut: make Ut_values view atomic in compute_l_u_factors (#1781)
710a2396b Jgfouca/remove par ilut limitations (#1755)
8233f7330 ParIlut: create and destroy spgemm handle for each usage (#1736)
957298552 GMRES: fixing some type issues related to memory space instantiation
49339eb3f Merge pull request #1661 from jgfouca/jgfouca/par_ilut_test
8077e640b Update changelog
4df81e5d0 Fix #1758 (#1762)
221495705 Merge pull request #1763 from lucbv/roc_tpls_upgrade
98c72b5f4 Merge pull request #1759 from tmranse/tmranse/mdfInterface
cfd5928e2 Update changelog
8928788a4 Update version to 4.0.01
229608457 Patch Trilinos #11663
48ca11b50 Fix kk_generate_diagonally_dominant_sparse_matrix hang (#1689)
99654d8cf Merge pull request #1737 from e10harvey/reduce_test_coverage
8e90d005f Remove unused variable (#1734)
db917b2f4 Merge pull request #1727 from lucbv/cuda_11_4_fixes
6cfc547ce Merge pull request #1704 from e10harvey/doc_typos
1f266de0d Merge pull request #1698 from cwpearson/fix/kk-1692
4b731c4fb Merge pull request #1801 from e10harvey/include_omp_settings
01547c447 Blas1: supporting execution space on BLAS1 kernels
1d33c6f9b scripts: Include OMP settings
f78e4eb74 sparse: specify memory space for coo2crs
ea9db31d1 Merge pull request #1800 from brian-kelley/Fix1798
40eac2958 Fix #1798
790c9f506 Blas1: adding execution space instance interface for abs
f69755715 Merge pull request #1797 from kokkos/cwpearson/docs-apt-update
81477dc0d Update docs.yml
ec611fe92 Blas1: adding execution space overload of axpy and axpby
038def615 sparse: Add coo2crs, crs2coo and CooMatrix
a2a741da2 Merge pull request #1649 from e10harvey/get_ci_back_up
6dc008e11 Merge pull request #1796 from e10harvey/fix-docs-check
0b871d129 Remove deprecated code
2b63c1a61 scripts: Fix github-DOCS
26dac2932 scripts: Final changes for clang 10
a176b931b Fix #1786: check that work array is contiguous in SVD (#1793)
03f48fae6 BLAS: fixes and testing for LayoutStride (#1794)
e3a42e418 Fix compile errors
f3ec3b464 Merge branch 'develop' into sptrsv-solve-streams
bcaa37fc8 Merge pull request #1751 from NexGenAnalytics/benchmark-blas3-tests
507c29f68 par_ilut: make Ut_values view atomic in compute_l_u_factors (#1781)
1a6f22b1c Report layouts used
c025caacd Port blas3 gemm test
5015a2cdf Merge pull request #1733 from NexGenAnalytics/5-google-benchmark-blas2-tests
e2d1a1d69 Merge pull request #1790 from kliegeois/fixUnusedVar
ec392dc43 Merge pull request #1789 from NexGenAnalytics/benchmark-openmp-context
b654dd63b Merge pull request #1784 from masterleinad/fix_sycl_printf
7c798ae97 cuSPARSE trisolve with streams
0fd4f2878 Fix unused variable warnings
b1185f3a9 Include OpenMP environment variables in benchmark context
97187c3af Allow passing additional arguments
20ad98ac6 Add execution space to policies
15d616983 Reduce duplication
5d237f8b6 Support all command line parameters
35ee9ee7e Fix formatting
332485486 Add registration wrapper
34a228689 Parse blas2 custom command line parameters
f38b56ab1 Let benchmark decide number of iterations
03728a8b8 Use CMake helper for ODE_RK benchmark
1d70e7aeb Parse common parameters
10dc298b5 Move warm-up out of benchmarking loop
24923b79e Use separate executable
6c21c4df2 Revert changes to blas1 benchmark
278d18fac Use stored time value
b3da12558 Use correct header
7336d9c2f Add a benchmark for LayoutRight
6d027010a Let benchmark calculate FLOP/s
0678b55b1 Include scalar type in the output
e87d532c2 Let benchmark decide the number of repetitions
3b8c2da3d Remove redundant output
bfc68039d #5: Create blas2 gemv benchmark test
8154037ff Merge pull request #1779 from NexGenAnalytics/8-refactor-cmake-mkl
9f12713ad Add --enable-docs option to cm_generate_makefile (#1785)
0a95fff2d Merge pull request #1776 from tmranse/mdfComplex
31ef8f6bf Initial stream interface
837bf841d Merge branch 'develop' into sptrsv-solve-streams
a645960c9 Merge pull request #1773 from brian-kelley/SortAndMergeEarlyExit
005822bcf Merge pull request #1728 from vqd8a/spiluk_numeric-streams
0564b18d3 Merge branch 'develop' into spiluk_numeric-streams
4ca54ed15 Use KOKKOS_IMPL_DO_NOT_USE_PRINTF in Test_Common_UpperBound.hpp
e2ca0694a Merge branch 'develop' into sptrsv-solve-streams
6dc2a6a53 Re-enable and clean up triangle counting perf test (#1752)
378ffb32e Merge pull request #1770 from kliegeois/device_blas2
dc6f763f3 Remove the printf inside the team kernels.
0ae0d31e1 Formatting & remove unused typedefs
17b71d2b3 Add compile-time checks for SortCrs functions
893132ccd Allowed template arg deduction for sort_, sort_and_merge
d49004f77 Remove deprecated KokkosKernels::Impl:: sort functions
f666fba99 Sort and merge improvements
47322fbe5 Merge pull request #1778 from lucbv/fix_gesv_uninitialized
ec7ce2133 Gesv: using a value-initialization after all
397a3c660 Gesv: adding small comment for clarity
2114d03b6 Merge pull request #1754 from lucbv/ode_explicit
2bd997ae3 #8 added SYCL path for MKL in FindTPLMKL.cmake file
788018fd4 Batched Gesv: initializing variable to make compiler happy
6b4b8bb17 ODE: fix small typo and rebase error
22cd43ce1 ODE: adding support for adaptive time stepping
9ff29b38d ODE: adding new component for time integration
51ac81620 use crs_matrix view traits for magnitude view
1c2105bb1 remove deprecated Rank call
8ef7d05e8 Move TeamSpmv and TeamVectorSpmv to KokkosSparse
70db534be add support for complex data types in MDF
8f3574e33 spgemm handle: check that A,B,C graphs never change (#1742)
a975fa3e0 #8 Updated FindTPLMKL.cmake to support SYCL option from kokkos
aa96a83ad Jgfouca/remove par ilut limitations (#1755)
7d6485eaa Formatting
43bf36595 Make Werror build happy
f8b2a5e5a Update docs/developer/apidocs/sparse.rst
4dd7e613c Add par_ilu numeric docs
53599f47d Fix #1758 (#1762)
6c003deb3 Fix the doc of KokkosBlas2_team_spmv.hpp
bebcf360d Using Kokkos::ArithTraits instead of Kokkos::Details::ArithTraits
24cb9017b Add calls to KokkosBlas Gemv and Spmv for team batched kernels when m==1
5edb51a45 #8 update FindTPLMKL.cmake to use find_package(MKL)
c9d22ca1b #8: made functional current version (v1) for MKL
5ece7b3dd Merge branch 'develop' into spiluk_numeric-streams
e35ed210b Merge pull request #1763 from lucbv/roc_tpls_upgrade
30bd681ff Merge pull request #1759 from tmranse/tmranse/mdfInterface
75c14cd0b Add par_ilut symbolic docs
a2b18d73e Merge pull request #1765 from e10harvey/host_level_docs
1b123b177 Merge pull request #1767 from e10harvey/update_actions_checkout
a9189f56a clang-format...
3065eb31c ROCSPARSE: fix unused variable in unit-test
01c49a8d2 docs: Add stubs for some sparse APIs
f2c217d57 .github: Update to actions/checkout@v3
3d28a4730 Merge pull request #1711 from cwpearson/feature/search
aaadaa0dd docs: Include BatchedGemm
a0a928194 Merge branch 'develop' into spiluk_numeric-streams
1491bd433 Add exec instance support to sort/sort_and_merge utils (#1744)
8e77c01cc TPLs: replicating changes made in Trilinos for ROCBLAS/ROCSPARSE
45a8d3baf address reviewer comments and run clang-format
b079a4e2d Merge pull request #1672 from brian-kelley/FixSpaddPerftest
25dbdcb9b #7 Removed V2 and V1.
f49d41ead #7: V3: simplest way to get rocsparse and rocblas
5c8d760a3 #7: V2 Added hybrid version for rocblas and rocsparse
8efb0356c #7: (v1): old way for rocsparse and rocblas
27ec2cdb8 Spgemm perf test enhancements (#1664)
a94163cbc Patch Trilinos #11663 (#1757)
0e615295f Merge pull request #1753 from kliegeois/device_blas_refact
a2c1610a8 accept r-value A matrix
f11a70ab6 Merge branch 'develop' into get_ci_back_up
6bcfac5bd Adds team- and thread-based lower-bound and upper-bound search and predicates.
9f2399310 Merge branch 'develop' into spiluk_numeric-streams
b483cfce3 Merge pull request #1732 from cwpearson/fix/kk-1731
c77395716 Add calls to KokkosBlas Dot and Axpy for team batched kernels when m==1
11d442b51 Deprecate Kokkos::Details::ArithTraits (#1748)
a3c919474 Merge pull request #1750 from NexGenAnalytics/1718-print-google-benchmark-version
5595b4a92 Leverage std library in BsrMatrix constructor
943cfc6bb add access to inv permutations to mdf handle
38789c2cc add ability to generate compile_commands.json for clangd
252fbf8a2 Clarify comments for context helper functions
d2f9e0113 Mark functions as inline where appropriate
0912b67ac Include google benchmark lib version in benchmark output
1554ee7a8 Extract benchmark CMake code into a separate file
0e507ae38 openblas is now in standard modulepath
aec946c28 Merge pull request #1737 from e10harvey/reduce_test_coverage
873e2a8b1 Merge pull request #1693 from NexGenAnalytics/5-print-get-CUSPARSE-CUBLAS-versions
2a5309b39 Use concurrency() rather than impl_thread_pool_size()
bf9ed2aee ParIlut: create and destroy spgemm handle for each usage (#1736)
fd7f6e515 cm_test_all_sandia: Add llvm/10.0.1
55f24857e perf test utils: fix device ID parsing (#1739)
a7e7bcb74 Merge pull request #1722 from NexGenAnalytics/5-add-git-info
664bfc4d3 Fix kk_generate_diagonally_dominant_sparse_matrix hang (#1689)
60881471b Remove unused variable (#1734)
2cfc5082b spadd perf test: use common infrastructure
2dff92063 Avoid errors about not finalizing Kokkos
1e0fb0249 Fix/enhance backend issues on spadd perftest
ee059d078 Improve readability
323cefa5d Do not print CUBLAS_VER_BUILD
b6f4c80e9 Rename functions
9cc9328c7 #5: added TplsVersion file and print methods
54d70dc83 Remove sample benchmark
72de68a8d Revert "Enable benchmarks in CI"
a21ce0982 Enable benchmarks in CI
e8b2d6cd0 Use constexpr variables for git info
2f9352acc Switch to header-only implementation
c32f3ad06 Include git information in benchmark context
3b466361c Generate git information during build
bc9265b0d Fix typo
ff097ec63 Merge pull request #1636 from NexGenAnalytics/5-google-bench-dot-test
be9310d97 Reduce BatchedGemm test coverage
221f7abc0 Work around instance resource limits
4e6c1d76e Merge branch 'develop' into spiluk_numeric-streams
560f37286 Fix unused-parameter nstreams error
cb11f0cff Use clang modules
950f633b7 pull in mkl
5c8067c93 More cleanup.
fa5bdf509 More cleanup
4b4e7b82f Cleanup. Need clang toolchain
f2184cf60 Use openblas tpl
3ac5a6fe1 Use stdlibc++ from gnu 8.2.1
678783275 Get a C++17 stdlibc++ in the path
b8ebb9564 scripts/cm_test_all_sandia:   - Add boiler plate for gnu/10.2.1 and intel/19.0.5.281.
afd686eb5 Merge pull request #1723 from kokkos/docs/cwpearson-html-only
26332eda6 Merge pull request #1727 from lucbv/cuda_11_4_fixes
9b0dfbd0f CUDA 11.4: fixing some failing build while trying to reproduce issue #1725
26bf33311 Merge pull request #1726 from e10harvey/ci_format_docs
ff31df01e .github: Automation reminder
5c2702283 Make Sphinix optional
a9877dc6f Install doxygen-latex for HTML docs
3ec0cb7fc #5: Rebased on develop and added kernels print_configuration call
8be303261 #5: Added better name for benchmark tests
56ef2095f #5: Added team dot benchmark test
4fc790848 #5: Fixed clang-format
e9c968cdd #5: Added dot_mv benchmark test
7be07e5a4 #5: Fixed clang-format errors
7dfe9efde #5: generalized execution space and removed unused include
0361d1d32 #5: Added benchmark dot perf test
482cc00f6 clang format
d83c123ea Add nstreams to symbolic call
08e3824f3 Apply clang format to Test_Sparse_spiluk.hpp
004c1c041 Fix undefined reference errors and clean up printf statements
d17877163 Apply clang format
1f74d4399 Add nstreams to avail_byte calculation
f055b6977 Merge branch 'develop' into spiluk_numeric-streams
f658cc4dd Add spiluk_numeric_streams interface
048155245 Merge pull request #1720 from dalg24/drop_pre_kokkos_36_workaround
f41ff478c Merge pull request #1719 from lucbv/gmres_type_fixes
d9df4fd6b Drop obsolete workaround checking whether KOKKOS_IF_ON_{HOST,DEVICE} macros are defined
e5c8da8fc Merge pull request #1710 from cwpearson/feature/iota
ba311291c Adding fix for LUPrec
3831a680a Merge pull request #1707 from lucbv/kk_config_version
b209a157c Merge pull request #1691 from cwpearson/fix/cmake-force
2f069c4fc Use the options ENABLE_PERFTEST, ENABLE_EXAMPLES (#1667)
4414f46c1 GMRES: fixing some type issues related to memory space instantiation
fa3dd4e13 Merge pull request #1717 from ndellingwood/update-changelog-4.0
b202dcbfd Merge pull request #1714 from cwpearson/ci/format-diff
abcf8d4d1 Merge pull request #1716 from ndellingwood/issue-1715
4f39a18ec Merge pull request #1698 from cwpearson/fix/kk-1692
6ce7ea4ec Merge pull request #1695 from kokkos/update-changelog-to-4.0.0
4abf2a3a8 rocsparse spmv tpl: Fix rocsparse_spmv call for rocm < 5.4.0
8ed861214 Adds KokkosKernels::Impl::Iota, a view-like where iota(i) = i + offset
50758c1b2 Merge pull request #1712 from cwpearson/tests/spmv-controls
813626471 Merge pull request #1701 from cwpearson/fix/kk-issue-1700
3a2064350 Merge pull request #1704 from e10harvey/doc_typos
fcf349d33 print the patch that clang-format-8 wants to apply
6ead86002 add explicit tests of opt-in algorithms
a3ab61082 CUSPARSE_MM_ALG_DEFAULT deprecated by 11.1
8697db1e4 Merge pull request #1709 from lucbv/comp_4_0_0
d63de38b5 Merge pull request #1707 from lucbv/kk_config_version
76968d3f7 Merge pull request #1691 from cwpearson/fix/cmake-force
7f3acf133 Compatibility upgrade: adding compatibility branch in code
8469d478f Kokkos Kernels version: need to use upper case variables
f40aabfea Merge pull request #1706 from lucbv/fix_team_mult
db0071a43 team mult: applying clang-format
562aaffd9 team mult: fix type issue in max_error calculation
d692d3585 Merge pull request #1703 from cwpearson/fix/kk-1702
f1dd58cf7 Merge pull request #1694 from lucbv/test_eti_only_off
0b88c05ed test mixed scalars: adding more comments and sending msg to cerr
31190a68c blas/blas1: Add mult docs
f46b24258 blas/blas1: Fix a couple documentation typos.
e4b324c8c test mixed scalars: incorporate Evan's comments
016384fff View::Rank -> View::rank
feb9f9ae6 use rocsparse_spmv_ex for rocm >= 5.4.0
e9ec43800 Introduce KOKKOSKERNELS_ALL_COMPONENTS_ENABLED variable
5153da336 Merge pull request #1697 from cwpearson/fix/kk-1696
8aa7fa23e cast Kokkos::Impl::integral_constant to int
602c526d7 Tested mixed scalars: removing temporary output
557e62a67 Test mixed scalars: more fixes related to mixed scalar tests
6d73c141e Merge pull request #1687 from lucbv/version_integration_fix
45ffc0849 Versions: fixing the CMake logic to export Kokkos Kernels version
5f5b9e0c5 Merge pull request #1685 from e10harvey/test_eti_only
37efc3bff Merge pull request #1665 from NexGenAnalytics/5-print-configuration
8206953f5 scripts: add --disable-test-eti-only
d2386da91 Merge pull request #1615 from lucbv/gemm_mixed_scalars
1fccf4a27 Mixed Scalars: fixing typo
31a756661 Mixed Scalars: fixing some type conversion in unit-tests
92b82ef88 Mixed Scalars: modifying one more test according to review comment
1507de8dc Mixed Scalars: modifying according to PR comments.
e9f463439 Mix Scalars: fixing the tolerance in axpby
d76e8e18a BLAS: mixed gemm
4a29cafbe Merge pull request #1683 from vqd8a/spiluk-nondeterministic-numeric
2140e99b0 #5 Fixed typo
7f579fb5c #5 rebased on develop and updated print_version method for kernels
d12158be6 #5: Fixed mistake in filename and updated Kernels version key
3ddf1dea0 #5: Fixed clang format and removed form this PR benchmark modification
8c1a89e0e #5: Added inline to avoit multiple define problem
e3c311bd7 #5: updated key verification
95b9ddcb5 #5 Updated print_configuration content format
32d58f6c3 #5: fixed previous commit mistake
634b2cad7 #5: added print_configuration file and its test
b60e9913f #5: moved print_configuration to header only file and added its test
cc11c6d7a #5: Added basis for print_configuration method
9455f6505 BLAS: fix build with KokkosKernels_TEST_ETI_ONLY=OFF
747bb9303 Merge pull request #1661 from jgfouca/jgfouca/par_ilut_test
9ff35198d Add utility KokkosSparse::removeCrsMatrixZeros(A, tol) (#1681)
c7765bc1d Merge pull request #1680 from lucbv/export_version_info
a66a5d6d6 Fix uninitialized error
a67bc42ce Apply clang format
d7ca7e7a4 Merge branch 'develop' into spiluk-nondeterministic-numeric
e2b8df3fd Make hlevel_ptr a separate allocation
6d02704ad Remove one unnecessary barrier
0b5bc7a61 Fix race condition when read and write L_values at the same k
76d9ed4ab formatting
7f78fceb1 Support alpha and beta in LUPrec::apply
304bcdaea Merge pull request #1676 from lucbv/perf_test_wrapper
fd8bf8ae4 Update perf_test/sparse/KokkosSparse_mdf.cpp
a936394f1 Merge branch 'develop' into spiluk-nondeterministic-numeric
b0965b7d4 Spgemm non-reuse: unification layer and TPLs (#1678)
d3ffe8214 Perf Tests: adding utilities and instantiation wrapper
c9e631b61 Version: applying clang-format
c284ef4ac Version: adding unit-test to verify that version info is available
c2486ab14 Merge pull request #1679 from dalg24/view_rank
6a6a51045 Fix warnings
1d19eeabb CMake: export version and subversion to config file
a05f21e3b Prefer View::{R->r}ank
91222dba2 formatting
6a4bf14ce Address GH feedback
9586dd948 Use sptrsv instead of blas::trsm
aac450bd3 Merge pull request #1624 from lucbv/MDF_alg_upgrade
9095beb5c MDF: improving performance and adding performance test
61ba79b8a Merge pull request #1677 from masterleinad/update_sycl
167ad420e Update SYCL docker file to include oneDPL
566570a87 Temporary workaround for Kokkos #5860 (#1675)
d1ee1a43e format fix
e50849b37 Fix for openmp-only
771f0f2cf Fix warnings
6615b77c0 Merge remote-tracking branch 'origin/develop' into jgfouca/par_ilut_test
08b71b3ff Fix @file tags in a few headers
d83b0649c Turn off main par_ilut+gmres test if kokkos::serial is not enabled
a89349ddb formatting
dd930a662 Fixes: trsm expects host views
b9bcc5f49 Add new assert/require macros. Other minor fixes
834a85ece Use the options ENABLE_PERFTEST, ENABLE_EXAMPLES (#1667)
6d6ed244e Merge pull request #1670 from masterleinad/update_sycl
a3ee83b55 Merge pull request #1666 from brian-kelley/FixOmpImplThreads
e72bc3859 SYCL CI: Specify the full path to the compiler
16c97ddb6 Call concurrency(), not impl_thread_pool_size()
dec1753fc Testing working in serial and openmp (IF I force determinism on parIlut)
da90033b7 Merge pull request #1654 from dalg24/clock_tic
56cdbd2c6 Merge pull request #1653 from dalg24/drop_pre_kokkos_36_workaround
55eb42008 Merge pull request #1652 from dalg24/are_integral
f0229f902 Merge pull request #1651 from masterleinad/fix_sycl_printf
5b30c5a05 Merge pull request #1662 from ndellingwood/update-version-4.x
40474b7d9 Merge pull request #1660 from masterleinad/update_sycl
03180cdf1 Merge pull request #1659 from kliegeois/fix_documentation_typo
b4d8ca8bf Update nightly SYCL setup
4df9db90a Hands off Kokkos::Impl::are_integral
f33376a54 Add Impl::are_integral_v helper variable template
4ee798d83 Drop pre Kokkos 3.6 workaround
a4bea4798 Replace printf in device code for SYCL
12e1b814b Do not use Kokkos::Impl::clock_tic, prefer std::chrono to get a random seed
839453184 Merge pull request #1647 from e10harvey/issue1571
93ecefbc9 Fix LUPrec license
3074b4b01 CMakeLists.txt: update version to 4.0.99
45630287b Merge remote-tracking branch 'origin/develop' into jgfouca/par_ilut_test
b4f3dd0eb Fix documentation regressions
51c3c5a0c Fix whitespace
cc38f32ef Add deprecated code disable to docs build.
10d155ad4 Merge branch 'develop' into issue1571
78833d6ca Merge pull request #1658 from lucbv/kokkos_deprecate_ALL_t
86edac3b1 Minor fixes
ead0712ef Merge branch 'develop' into issue1571
d2f273c02 osx-ci: adding option to disable deprecated_code_4 in Kokkos
b846db97e Apply suggestions from code review
fed582cb5 Fix an error in Krylov Handle documentation
6c5744fd6 Applying clang-format
215c6beb0 Benchmarks: for some reason the current version fails to build
11be16b61 Fixing deprecated usage of Kokkos::Impl::ALL_t in favore of Kokkos::ALL_t
e04475d55 Things building
1ea3a7b90 .github/workflows:   - Added docs.yml   - Save cycles with -DKokkos_ENABLE_TESTS=OFF
25b4fb815 Add new par_ilut test
f87b7d566 Clean up numeric and symbolic
547a6608a Clean up spiluk numeric
432c9541c Fix for VOLTA
81f77d0fb Prefer team size 32
0b4b667f1 Use atomic_add again
3adaa70ba Not use atomic_add
916100baf Initial fix

git-subtree-dir: tpls/kokkos-kernels
git-subtree-split: 25a31f8812330cec6e8ac5d8ea99bb9a2045cbab

3 participants