Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tpetra::MultiVector unit tests fail for Scalar=complex<double>, likely due to kokkos-kernels changes #3493

Closed
mhoemmen opened this issue Sep 25, 2018 · 16 comments

Comments

@mhoemmen
Copy link
Contributor

https://testing.sandia.gov/cdash/testDetails.php?test=55163267&build=3970561

The following tests FAILED:
    1. MultiVector_double_default_local_ordinal_type_default_global_ordinal_type_Kokkos_Compat_KokkosSerialWrapperNode_ComplexDotOneColumn_UnitTest ... 
    56. MultiVector_int_int_std_complex0double0_Kokkos_Compat_KokkosSerialWrapperNode_NonContigView_UnitTest ... 

Further inspection shows large relative errors, so this isn't just an issue of needing to adjust the tolerance for complex numbers. When I run the tests locally on a Mac, I get a segfault.

I'm curious if this is related to the recent kokkos-kernels update.

@trilinos/tpetra

Motivation and Context

Salinas depends on Tpetra with complex Scalar types.

@mhoemmen
Copy link
Contributor Author

Bisection: This commit passes the test:

commit 69de6b8e9c07b26be0c71afd2b6f076dacb111ed (HEAD)
Author: Mark Hoemmen <mhoemme@sandia.gov>
Date:   Wed Sep 19 12:02:20 2018 -0600

    Belos: Fix "native" Tpetra GMRES solver for Teuchos_ENABLE_COMPLEX=OFF

    @trilinos/belos

@mhoemmen
Copy link
Contributor Author

mhoemmen commented Sep 25, 2018

This commit fails; it is the next commit in develop right after the one mentioned above:

commit fbd9a3c4577a661331847a51620795e1b228e946 (HEAD)
Merge: 64396a9c7b f6d4f8a351
Author: kyungjoo-kim <kyukim@sandia.gov>
Date:   Wed Sep 19 13:54:29 2018 -0500

    Merge pull request #3458 from kyungjoo-kim/issue-3438

    KokkosKernels - remove cusparse testing in spgemm.

@kyungjoo-kim
Copy link
Contributor

Really ? As far as I remembered, I just changed testing scope in kokkoskernels testing directory. Oh.... I cannot believe how this could possibly affect other testings.

@kyungjoo-kim
Copy link
Contributor

FYI

@mhoemmen
Copy link
Contributor Author

@kyungjoo-kim It looks like there's something off with the dot products -- does kokkos-kernels test with complex scalar types normally?

@ndellingwood
Copy link
Contributor

@mhoemmen The changes made in PR #3438 (SHA fbd9a3c of failed commit above) removed cusparse testing in kokkos-kernels spgemm unit tests, there were no internal code changes.

@kyungjoo-kim
Copy link
Contributor

@mhoemmen I looked up cdash with complex keywords and I cannot find a running test with complex. Could you double check if there is any cdash test with complex for KokkosKernels ?

@crtrott Do we have KK jenkins tests running with complex ? I cannot access jenkins-son.sandia.gov to check that.

@mhoemmen
Copy link
Contributor Author

mhoemmen commented Sep 25, 2018

@kyungjoo-kim Here is a build with complex enabled:

https://testing.sandia.gov/cdash/testDetails.php?test=55163267&build=3970561

Here are the build details:

https://testing.sandia.gov/cdash/buildSummary.php?buildid=3970561

I was able to get the test to fail on my Mac laptop with Clang and OpenMPI 1.10.x, with just TpetraCore enabled and Trilinos_ENABLE_COMPLEX_DOUBLE=ON.

@kyungjoo-kim
Copy link
Contributor

@mhoemmen Yes I see the test but I did not see a complex test from KokkosKernels. It only includes Tpetra tests. Checking other tests, I cannot find a KokkosKernels complex test running on cdash.

@mhoemmen
Copy link
Contributor Author

@kyungjoo-kim Hm, are there kokkos-kernels tests at all? I'm worried about lack of complex arithmetic coverage for Trilinos tests in general, esp. given that Salinas depends on it. Does EMPIRE also care?

@mhoemmen
Copy link
Contributor Author

Is this relevant? #3336

@srajama1
Copy link
Contributor

@ndellingwood : Can we enable complex on Kokkos jenkins tests. I don't think we do.

This is the first time I hear Salinas using Tpetra and thereby Kokkos Kernels. Is this a new dependency ?

@ndellingwood
Copy link
Contributor

@srajama1 yeah, I'll work with Christian to learn how to do this later this week when he has a bit of time.

@mhoemmen
Copy link
Contributor Author

Sierra has been building with complex arithmetic for years, due only to Salinas. Also I'd really not like to have correctness regressions :)

@mhoemmen mhoemmen changed the title Tpetra::MultiVector: Unit tests fail for Scalar=complex<double> Tpetra::MultiVector unit tests fail for Scalar=complex<double>, likely due to kokkos-kernels changes Sep 26, 2018
@mhoemmen
Copy link
Contributor Author

@prwolfe @rrdrake Please don't update your version of Trilinos until we figure this out. It's likely this will break Salinas.

@micahahoward FYI; I don't think this affects your code but it may be a correctness issue.

@mhoemmen
Copy link
Contributor Author

mhoemmen commented Oct 1, 2018

@kddevin @kyungjoo-kim

I did some more investigation into the ComplexDotOneColumn test in tpetra/core/test/MultiVector/MultiVector_UnitTests.cpp. When I replace the following call to KokkosBlas::dot in multiVectorSingleColumnDot with a hand-rolled loop, I got that test to pass:

lclDot = KokkosBlas::dot (x_1d, y_1d);

I noticed the following build warnings too:

.../Trilinos/packages/kokkos-kernels/src/impl/tpls/KokkosBlas1_dot_tpl_spec_decl.hpp:54:33: warning: 'zdotu_' has C-linkage specified, but returns user-defined type 'std::complex<double>' which is
      incompatible with C [-Wreturn-type-c-linkage]
extern "C" std::complex<double> zdotu_( const int* N, const std::complex<double>* x, const int* x_inc,
                                ^
.../Trilinos/packages/kokkos-kernels/src/impl/tpls/KokkosBlas1_dot_tpl_spec_decl.hpp:56:33: warning: 'cdotu_' has C-linkage specified, but returns user-defined type 'std::complex<float>' which is
      incompatible with C [-Wreturn-type-c-linkage]
extern "C" std::complex<float>  cdotu_( const int* N, const std::complex<float>* x, const int* x_inc,
                                ^

Normally, these warnings are harmless. However, they reminded me of some issues we saw several years ago in Teuchos, relating to calling Fortran functions that return complex numbers. Fortran functions don't necessarily return complex numbers in the way that the C ABI expects. This has nothing to do with input arrays of complex; it just has to do with functions that return a complex number.

There are at least two possible fixes:

  1. Wrap the Fortran functions in a Fortran wrapper that "returns" the complex result as an output argument.
  2. Try to guess the Fortran ABI for returning complex numbers.
  3. Don't call these Fortran functions; call the generic implementations in the KokkosBlas instead.

Option (1) requires a Fortran compiler. Option (2) is risky, because there may be different conventions for different compilers and/or systems. Option (3) is easy, but may lose some performance.

mhoemmen pushed a commit to mhoemmen/Trilinos that referenced this issue Oct 1, 2018
@bartlettroscoe bartlettroscoe added stage: in progress Work on the issue has started labels Oct 1, 2018
mhoemmen added a commit that referenced this issue Oct 2, 2018
kokkos-kernels: Fix #3493 (incorrect dot product results & segfaults for complex Tpetra::MultiVector)
searhein pushed a commit to searhein/Trilinos that referenced this issue Oct 3, 2018
…GDSW-VFEM-Coarse-Spaces

* 'develop' of https://github.com/trilinos/Trilinos: (186 commits)
  Tpetra: initial commit of user's guide (trilinos#3553)
  MueLu: fix master xml list problem
  PyTrilinos: Remove includes of NOX_Version.h
  Xpetra: update threshold type
  Re-enable Belos PCG tests (trilinos#2920 & trilinos#3007) on white/ride
  Phalanx: add debug output support during DAG traversal
  Switch to new devpack/20180521/openmpi/2.1.2/gcc/7.2.0/cuda/9.2.88 env (trilinos#3290)
  MueLu: Fix gold files
  MueLu gold files: Fix bad if statement in rebase scripts
  MueLu: Xpetra: add threshold for zero diag fix
  kokkos-kernels: Patch to fix trilinos#3493
  Tpetra::MultiVector: Improve unit tests (help w/ trilinos#3493)
  running update_params.sh to fix up latex
  Disable four exodus SEACAS tests failing with Not Run on mutrino (trilinos#3496) (trilinos#3530)
  PyTrilinos: Fix case-sensitive include of Ifpack_ConfigDef.h
  Stratimikos Belos adapter: On Belos error, throw Thyra::CatastrophicSolveFailure
  Stratimikos Belos adapter: When "Timer Label" is set, use it in output
  Amesos: SuperLU_DIST version fixes
  put in a new phase 3 aggregation option that avoids singletons at all costs (even groupng vertices with non-neighbors into aggregates)
  Framework: Parameterized CTest build update to enable extra packages to be set by Jenkins parameters. (trilinos#3520)
  ...
@bartlettroscoe bartlettroscoe removed the stage: in progress Work on the issue has started label Oct 15, 2018
tjfulle pushed a commit to tjfulle/Trilinos that referenced this issue Dec 6, 2018
tjfulle pushed a commit to tjfulle/Trilinos that referenced this issue Dec 6, 2018
@trilinos/tpetra

Patch Trilinos' current snapshot of kokkos-kernels, in order to
fix trilinos#3493 and kokkos/kokkos-kernels#307.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants