Description
cuBLAS does not have a batched variant of dgmm; it only has cublas<t>dgmm(), see https://docs.nvidia.com/cuda/cublas/#id10
However, oneMKL only supports the "batch-style" dgmm_batch interface.
This is probably one reason why the cuBLAS backend to oneMKL has no implementation of any dgmm functions. I'm not sure how widely used dgmm is, but since it is a core BLAS algorithm, I'm guessing it should probably be supported.
Is there a reason why there is only the batch-style variant of dgmm in oneMKL?
Thanks