Infinoid commented Sep 30, 2020

This patch fixes cuda builds on Debian bullseye (testing) with the nvidia-cuda-toolkit and libcudart10.2 packages installed.

CMake sees /usr/bin/nvcc and decides that CUDA is installed in /usr. Taco then assumes libcudart.so is in /usr/lib64/, but Debian puts it in /usr/lib/x86_64-linux-gnu/ instead, so the build fails with the following errors:

make[2]: *** No rule to make target '/usr/lib64/libcudart.so', needed by 'bin/taco'.  Stop.
make[2]: *** No rule to make target '/usr/lib64/libcudart.so', needed by 'bin/taco-tensor_times_vector'.  Stop.
make[2]: *** No rule to make target '/usr/lib64/libcudart.so', needed by 'bin/taco-test'.  Stop.

CMake sets CUDA_LIBRARIES to the CUDA runtime library, so we can use that instead of hardcoding paths. With this, the libtaco.so library links fine.
However, the executables (bin/taco, bin/taco-test, etc.) then failed to build with the following error:

/usr/bin/ld: ../bin/taco: hidden symbol `cudaFree' in /usr/lib/gcc/x86_64-linux-gnu/10/../../../x86_64-linux-gnu/libcudart_static.a(libcudart_static.a.o) is referenced by DSO

It seems libtaco.so is using symbols from libcudart.so without linking against it, but the executable uses libcudart_static.a, and they don't mix.

I was able to fix this by changing INTERFACE to PUBLIC, so that libtaco.so links to libcudart.so directly. libtaco.so then pulls in libcudart.so itself, and the applications get it for free. Since libtaco.so depends on libcudart.so anyway, I don't think there's a downside to this.

With this, the build works for me with both NVIDIA's "cuda*.run" installation script (/usr/local/cuda/lib64) and Debian's CUDA packages (/usr/lib/x86_64-linux-gnu).
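
The change described above can be sketched roughly as follows. This is a minimal illustration and not the actual diff: the target name `taco`, the commented-out "before" line, and the assumption that the target is defined elsewhere in the project are all hypothetical. Only `find_package(CUDA)`, `CUDA_LIBRARIES`, and the INTERFACE/PUBLIC keywords come from the discussion.

```cmake
# Sketch only. Assumes a shared library target named `taco` has already been
# created with add_library() elsewhere in the project (hypothetical name).
find_package(CUDA REQUIRED)

# Before (hypothetical): a path derived from the detected toolkit root.
# This matches the layout of the NVIDIA .run installer
# (/usr/local/cuda/lib64) but not Debian's /usr/lib/x86_64-linux-gnu.
# target_link_libraries(taco INTERFACE ${CUDA_TOOLKIT_ROOT_DIR}/lib64/libcudart.so)

# After: let FindCUDA report where the runtime library lives, and link it
# into the library itself (PUBLIC) so that executables linking against
# `taco` inherit the dependency instead of resolving it separately.
target_link_libraries(taco PUBLIC ${CUDA_LIBRARIES})

# Print what was detected, which helps when comparing distros.
message(STATUS "CUDA toolkit root: ${CUDA_TOOLKIT_ROOT_DIR}")
message(STATUS "CUDA libraries:    ${CUDA_LIBRARIES}")
```

With PUBLIC, the library appears both on the link line of `taco` and in its link interface, whereas INTERFACE only adds it to the link lines of consumers.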

@Infinoid changed the title from "Add a specific search for the location of libcudart" to "Don't hardcode path to libcudart" on Oct 2, 2020

Infinoid commented Oct 2, 2020

I've only tested this on very recent Linux distros. It would be good to test this on older Linux releases (Ubuntu 18.04 or earlier) and on other OSes too.

@Infinoid Infinoid marked this pull request as ready for review October 2, 2020 13:12
Hardcoded paths don't work when using Debian's packaged version of cuda,
as the library paths don't match.  CMake's find_package(CUDA) sets
CUDA_LIBRARIES to the path of libcudart, so just use that instead.

Infinoid commented Jan 4, 2021

I tested this on Ubuntu 18.04, and it works fine.

On debian testing with debian-packaged cuda:
-- Found CUDA: /usr/lib (found version "11.1")

On debian testing with nvidia .run file:
-- Found CUDA: /usr/local/cuda (found version "11.2")

On ubuntu 18.04 with nvidia .run file:
-- Found CUDA: /usr/local/cuda (found version "11.2")

The build completed successfully in all three cases.

I noticed two side effects of the patch:

  • PATH, LD_LIBRARY_PATH and LIBRARY_PATH no longer need to point at /usr/local/cuda to build taco; the CMake path detection is enough. (They are still needed to run the tests.)
  • It seems to be linking with the static CUDA runtime (libcudart_static.a) rather than libcudart.so.

Is the static linking a problem?
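
For context on the static-linking question: CMake's FindCUDA module has a `CUDA_USE_STATIC_CUDA_RUNTIME` option that defaults to ON, which would explain why `CUDA_LIBRARIES` resolves to the static runtime. If the shared runtime were preferred, a sketch along these lines might work. Whether this project wants or needs it is an assumption, and it has not been tested against this build.

```cmake
# Sketch only: ask FindCUDA for the shared runtime instead of the static one.
# The option has to be set before find_package(CUDA) runs. Without FORCE,
# a value already stored in an existing CMakeCache.txt takes precedence,
# so a fresh build directory may be the simplest way to try this.
set(CUDA_USE_STATIC_CUDA_RUNTIME OFF CACHE BOOL
    "Use the static version of the CUDA runtime library")

find_package(CUDA REQUIRED)

# Check which runtime library was actually selected.
message(STATUS "CUDA_LIBRARIES = ${CUDA_LIBRARIES}")
```

The same switch can be passed on the command line as `-DCUDA_USE_STATIC_CUDA_RUNTIME=OFF` without editing any CMake files.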

@stephenchouca stephenchouca merged commit 864b65d into tensor-compiler:master Jan 20, 2021
@Infinoid Infinoid deleted the cmake-cuda branch January 21, 2021 12:46