
Separate Framework-Agnostic CUDA Implementation #4

Open
stellarpower opened this issue Apr 19, 2024 · 2 comments

Comments

@stellarpower

Hi,

I see there are a few SDTW implementations around. I'm mainly using Keras right now, and there are a few implementations there too. The one I have used thus far is very slow to run, as I reckon it jumps back to the CPU to interface with portions written in Cython; and the rest is not very accessible to the graph compiler, so the loops aren't unrolled on the GPU and it has to be executed eagerly.

I don't have any experience thus far in interfacing directly with CUDA, but I was wondering how feasible it would be to separate the CUDA implementation of the algorithm from the particular framework that is using it, and then just interface in, be it from Torch or TensorFlow - or potentially some other use case. As they say, Don't Repeat Yourself.

This would probably apply to CPU use too. I have worked with Pybind11 a few times, and I don't know how well this would interface with either libTorch or TensorFlow tensors, but I presume that, in CPU land, whatever the graph compilers can come up with is probably no better than a hand-rolled version in C++, if such a thing is around.

I'm guessing the differentiability would be the main sticking point - do you know if something like this would be possible? I'm not exactly experienced with neural networks, but I guess if I can use cuDNN there must be some way to perform the backpropagation within the same kernel. In Ceres we can use expression templates to differentiate automatically, so I guess something must exist.

Any thoughts? I don't have time for side projects right now - but if this ends up making my network learn better, then it may well be worth the time put in.

Thanks

@toinsson
Owner

hello again @stellarpower,

The meat of pysdtw really is the two CUDA kernels: the functions compute_softdtw_cuda and compute_softdtw_backward_cuda.
If you check the code, they depend only on cuda and math. The rest of the library is just convenience code: PyTorch integration as an nn.Module, support for packed sequences, and availability on PyPI.
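For reference, the recursion those kernels parallelise is the standard soft-DTW forward pass (Cuturi & Blondel, 2017). A minimal pure-Python sketch - illustrative only, not pysdtw's actual kernel code - needs nothing beyond math, which is what makes the kernel framework-agnostic in the first place:

```python
import math

def softmin(a, b, c, gamma):
    # Soft minimum: -gamma * log(exp(-a/g) + exp(-b/g) + exp(-c/g)),
    # shifted by the hard minimum for numerical stability.
    lo = min(a, b, c)
    s = sum(math.exp(-(x - lo) / gamma) for x in (a, b, c))
    return -gamma * math.log(s) + lo

def soft_dtw(D, gamma=1.0):
    # D is an n x m matrix (list of lists) of pairwise distances.
    n, m = len(D), len(D[0])
    inf = float("inf")
    R = [[inf] * (m + 1) for _ in range(n + 1)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            R[i][j] = D[i - 1][j - 1] + softmin(
                R[i - 1][j], R[i][j - 1], R[i - 1][j - 1], gamma
            )
    return R[n][m]
```

As gamma goes to zero this approaches the hard DTW cost from below, e.g. for D = [[1, 2], [2, 1]] the diagonal path costs 2, and soft_dtw returns roughly 2 at gamma = 0.01 and a bit under 2 at gamma = 1.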

After some quick googling, it appears that PyTorch has good integration with CUDA, especially from Python - and that is what pysdtw leverages. On the other hand, I could not find any examples of calling Python CUDA kernels from TensorFlow - only C++ (https://www.tensorflow.org/guide/create_op#use_the_op_in_python).

So, a framework-agnostic Python CUDA implementation already exists (compute_softdtw_cuda and compute_softdtw_backward_cuda), but wrapping those might not be easy for frameworks other than PyTorch.

@stellarpower
Author

Hi,

Right, yes, that's more or less what I mean - there are several versions out there and, be it in Python or something else, I think it might be nice to split the kernel into a separate package, for CPU and GPU, that implements the algorithm; then Torch or TensorFlow or other bindings can live in their own package or repository and call into it.

I have been looking, and it seems there may be some ways to integrate the numba JIT code into TensorFlow, but it's not looking that likely, and the standard way does seem to be compiling an Op from C++ rather than taking the PTX output and pulling that in at runtime.

Currently I'm seeing whether implementing the backward pass explicitly in the Keras version improves performance, as I expect the autodifferentiation is going to be the problem. If not, I might implement the algorithm in C++ and also write a kernel for it, with the aim of allowing calls in from other languages/frameworks. Will keep you posted.
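For anyone attempting that hand-written backward pass: soft-DTW has an explicit gradient recursion (Algorithm 2 in Cuturi & Blondel, 2017) that avoids autodifferentiation entirely, which is presumably what compute_softdtw_backward_cuda implements on the GPU. A pure-Python sketch - the function name and list-of-lists interface are made up for this example, and a real implementation would vectorise this - computes the gradient of the soft-DTW value with respect to each entry of the distance matrix:

```python
import math

def soft_dtw_grad(D, gamma=1.0):
    # Returns (value, grad) where grad[i][j] = d soft_dtw / d D[i][j].
    n, m = len(D), len(D[0])
    inf = float("inf")

    def softmin(a, b, c):
        lo = min(a, b, c)
        s = sum(math.exp(-(x - lo) / gamma) for x in (a, b, c))
        return -gamma * math.log(s) + lo

    # Forward pass, padded to (n+2) x (m+2) for the backward boundary.
    R = [[inf] * (m + 2) for _ in range(n + 2)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            R[i][j] = D[i - 1][j - 1] + softmin(R[i - 1][j], R[i][j - 1], R[i - 1][j - 1])

    # Padded distance matrix with the paper's boundary conventions.
    Dp = [[0.0] * (m + 2) for _ in range(n + 2)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            Dp[i][j] = D[i - 1][j - 1]
    for i in range(1, n + 1):
        Dp[i][m + 1] = inf
        R[i][m + 1] = -inf
    for j in range(1, m + 1):
        Dp[n + 1][j] = inf
        R[n + 1][j] = -inf
    R[n + 1][m + 1] = R[n][m]

    # Backward pass: E accumulates the expected alignment weights.
    E = [[0.0] * (m + 2) for _ in range(n + 2)]
    E[n + 1][m + 1] = 1.0
    for j in range(m, 0, -1):
        for i in range(n, 0, -1):
            a = math.exp((R[i + 1][j] - R[i][j] - Dp[i + 1][j]) / gamma)
            b = math.exp((R[i][j + 1] - R[i][j] - Dp[i][j + 1]) / gamma)
            c = math.exp((R[i + 1][j + 1] - R[i][j] - Dp[i + 1][j + 1]) / gamma)
            E[i][j] = E[i + 1][j] * a + E[i][j + 1] * b + E[i + 1][j + 1] * c

    grad = [[E[i][j] for j in range(1, m + 1)] for i in range(1, n + 1)]
    return R[n][m], grad
```

A sanity check that's easy to run: the gradient should match central finite differences of the forward value, and the corner entries should be exactly 1, since every alignment path passes through the first and last cells.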

Thanks
