# xianyi/OpenBLAS


# Hadamard product? #1083

Open
opened this issue Feb 4, 2017 · 23 comments

6 participants

### personalityson commented Feb 4, 2017

Elementwise vector product (like dot, but without the reduction to a single number): does it exist? In MKL it's vdMul, in cuBLAS it's DHAD (so, not part of the standard BLAS function list)... The Hadamard product is completely central in machine learning, and its absence is the only thing preventing me from using OpenBLAS...
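For reference, the operation being requested is plain elementwise multiplication. A minimal sketch in Python/NumPy (NumPy is an assumption for illustration, not something the thread uses):

```python
import numpy as np

# Hadamard (elementwise) product of two equal-length vectors:
# like dot, but without the final reduction to a scalar.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

z = x * y          # elementwise product: [4, 10, 18]
s = np.dot(x, y)   # dot = sum of the elementwise product
```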
Collaborator

### martin-frbg commented Feb 4, 2017

 Does not exist currently, but sounds interesting. Would you happen to have some rough numbers for the speedup from using MKL compared to just letting a modern compiler optimize a naive implementation?

Contributor

### brada4 commented Feb 4, 2017 • edited

 It used to be in BLAS... it is still mentioned in lapack 3.1.1. It will be about 3/2 the execution time of DOT (3 memory accesses per multiplication instead of 2 for dot), and _dot is only C. Probably a C macro will perform much better than a function call...
Collaborator

### martin-frbg commented Feb 4, 2017

 @brada4 the only trace of this function appears to be a spurious entry in the intro_blas1 manpage distributed with lapack 3.1.1
Author

### personalityson commented Feb 4, 2017

 Just learned there is a workaround using sbmv: http://stackoverflow.com/questions/7621520/element-wise-vector-vector-multiplication-in-blas An elementwise matrix*matrix product would still be nice...
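The sbmv trick from that Stack Overflow answer can be sketched via SciPy's BLAS wrappers (SciPy is an assumption here, not part of the thread): with zero super-diagonals (k = 0), the banded symmetric matrix degenerates to its main diagonal, so y = alpha * A * x is exactly the elementwise product.

```python
import numpy as np
from scipy.linalg.blas import dsbmv

v1 = np.array([1.0, 2.0, 3.0, 4.0])
v2 = np.array([5.0, 6.0, 7.0, 8.0])

# Banded storage with k = 0 super-diagonals: the "matrix" is just
# its main diagonal, stored as a single row.
band = v1.reshape(1, -1)

# y = alpha * A * x with A = diag(v1), i.e. the elementwise v1 * v2
y = dsbmv(0, 1.0, band, v2)
```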
Collaborator

### martin-frbg commented Feb 4, 2017

 Not sure abusing sbmv for this would be more efficient than a simple loop over all elements? BTW, the only references to *HAD functions in cuBLAS appear to be from people asking if there are any...
Contributor

### brada4 commented Feb 5, 2017 • edited

 Now imagine your sbmv multiplying two million-element vectors... There is a less wasteful way with _gemv or _gemm (i.e. with the free dimension(s) == 1), though still an order of magnitude slower than a loop; dot is a simple loop. I say macro because for small samples a library call would be an enormous waste of time.
Author

### personalityson commented Feb 5, 2017

 > less wasteful way with _gemv or _gemm (i.e free dimension(s) ==1)

Please explain... Do you extract the diagonal from the resulting matrix somehow? How is it less wasteful?
Contributor

### brada4 commented Feb 6, 2017

 Diagonal? It is not square.
Collaborator

### martin-frbg commented Feb 6, 2017 • edited

 Not sure I get the gemm/gemv idea either, but it could be instructive to see a quick benchmark comparison of that sbmv suggestion from stackoverflow against a simple loop that the compiler can unroll as needed. As far as I could find out, the *HAD functions were a brainchild of Cray/SGI in the mid-1990s (proposed for future inclusion in LAPACK at some netlib developer meeting in 1995) and part of their SCSF library but never universally supported.
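The benchmark comparison suggested above can be sketched as follows (SciPy/NumPy assumed; timings are machine-dependent, so only the agreement of the two results is checked here):

```python
import timeit
import numpy as np
from scipy.linalg.blas import dsbmv

n = 1_000_000
rng = np.random.default_rng(0)
v1, v2 = rng.random(n), rng.random(n)
band = v1.reshape(1, -1)   # diagonal in banded storage, k = 0

via_sbmv = dsbmv(0, 1.0, band, v2)   # the sbmv workaround
via_loop = v1 * v2                   # plain elementwise multiply

t_sbmv = timeit.timeit(lambda: dsbmv(0, 1.0, band, v2), number=10)
t_loop = timeit.timeit(lambda: v1 * v2, number=10)
print(f"sbmv: {t_sbmv:.4f}s  elementwise: {t_loop:.4f}s")
```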
Author

### personalityson commented Feb 6, 2017

 The only way I see with gemv/gemm is to multiply two vectors v1^T * v2 (of the same size), which returns a square matrix whose diagonal is actually the Hadamard product. I spent the whole weekend looking for tricks with lda/incx, but I'm too dumb to find anything. Maybe there is one? :)

sbmv allows the input matrix to be stored in banded matrix form. With the number of super-diagonals = 0 you only need to supply the diagonal, which in our case is one of the vectors, so it's not so bad...

I should explain the context of my problem: I'm on a work computer where I do not have admin rights and cannot install or compile anything. I use Excel VBA to declare function calls into libopenblas.dll in order to do some matrix crunching. (When I write that hadamard is "the only reason which prevents me from using OpenBLAS", I'm bluffing. mkl_rt.dll does not allow stdcall declarations from VBA, or maybe there is another reason; in any case I can only use OpenBLAS.)

So, my two options are sbmv or a VBA macro for hadamard, and ideally I would want to have it outside of VBA...
Contributor

### brada4 commented Feb 6, 2017

 Swap dimensions of v2 and get HAD; swap dimensions of v1 and get DOT.
Contributor

### grisuthedragon commented Feb 7, 2017

 The Hadamard product is obviously a memory-bound operation, so the only thing one has to take care of is that the matrices are not traversed orthogonally to their storage scheme. If the matrices are accessed in the right way, one has 3 loads, 1 multiplication, and 1 store per element. The only point where I see room to optimize is if we assume $C := \alpha A \circ B + \beta C$ with $\beta = 0$ and use non-temporal stores for C. In this case we only have 2 loads and one store per multiplication, but this requires aligned access to C, and C must be larger than the L3 cache of current Intel CPUs to give good results.
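The operation described above, written as a plain reference loop (a sketch for illustration only; the function name is made up and is not OpenBLAS code):

```python
import numpy as np

def dhad_ref(alpha, a, b, beta, c):
    """Reference elementwise kernel: c := alpha * (a o b) + beta * c.

    Per element: 3 loads (a, b, c), one fused update, one store.
    With beta == 0 the load of c could be skipped, which is where
    non-temporal stores would help on large arrays.
    """
    for i in range(len(c)):
        c[i] = alpha * a[i] * b[i] + beta * c[i]
    return c

out = dhad_ref(2.0,
               np.array([1.0, 2.0, 3.0]),
               np.array([4.0, 5.0, 6.0]),
               0.0,
               np.zeros(3))
```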
Contributor

### brada4 commented Feb 7, 2017

 In the common case one can treat the matrices as 1:(M*N) vectors and apply a marginal case of gemm/gemv.
Author

### personalityson commented Feb 7, 2017

 I still don't see it. Please give us a concrete example. Assume size(v1) = size(v2).
Author

### personalityson commented Feb 7, 2017

 `dgemv "N", n, 1, 1.0, v1, n, v2, 1, 1.0, res, 1`

This returns v1 multiplied by the first element of v2...
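That behaviour can be reproduced with SciPy's dgemv wrapper (SciPy assumed, for illustration): with A taken as the n-by-1 matrix v1 and x as a length-1 vector, y = A * x is just v1 scaled by v2[0], not the Hadamard product.

```python
import numpy as np
from scipy.linalg.blas import dgemv

v1 = np.array([1.0, 2.0, 3.0])
v2 = np.array([4.0, 5.0, 6.0])

# m = n, n = 1: A is an n-by-1 matrix, x has length 1,
# so y = A @ x = v1 * v2[0], as observed in the comment above.
y = dgemv(1.0, v1.reshape(-1, 1), v2[:1])
```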

### hiro4bbh commented Feb 9, 2017

 I think we can do Hadamard products using xtbmv with a diagonal matrix A, as follows: `cblas_dtbmv(CblasColMajor, CblasUpper, CblasNoTrans, CblasNonUnit, n, 0, p, 1, q, 1);` This computes the Hadamard product of vectors p and q, whose length is n, and writes the result to q.
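In NumPy terms (NumPy assumed), the dtbmv call above treats p as a triangular band matrix with zero off-diagonal bands, i.e. a diagonal matrix, so the result is diag(p) * q, which is exactly the Hadamard product:

```python
import numpy as np

p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 5.0, 6.0])

# dtbmv with k = 0 bands multiplies by the diagonal matrix diag(p):
result = np.diag(p) @ q   # equals p * q, the Hadamard product
```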

### hiro4bbh commented Feb 9, 2017 • edited

 Sorry, I recommended using xtbmv for the Hadamard product. However, according to tbmv_thread.c, using dtbmv for the Hadamard product is inefficient (it calls xaxpy for each element?), so I recommend using your own Hadamard product routine in C. Furthermore, in the optimized code on some platforms there are mysterious bugs, and I have no idea how to solve the problem, because it only happens sometimes...
Collaborator

### martin-frbg commented Feb 9, 2017

 @personalityson certainly an unexpected but interesting use case. I now understand the merit of having the *had functions in OpenBLAS, even if just for "convenience" without any particular speed benefit. Coding them as simple multiplication loops is very likely to be more efficient than "abusing" a special case of some more general function. As regards the mysterious bugs in optimized code, I trust the common goal is to get rid of them :-)
Author

### personalityson commented Feb 9, 2017

 Btw, the stdcall requirement normally needed for DLLs to work in VBA only applies to 32-bit DLLs. I saw an old feature request from 2014 also related to VBA: #423. In any case, 64-bit binaries work fine in the 64-bit version of Office.
Collaborator

### martin-frbg commented Feb 9, 2017 • edited

 Looking at the declaration of xHAD, which has alpha and beta multipliers, incx, incy, and even incz for a sparse result vector, sbmv/tbmv actually could be a good choice. The MKL-style vxMul, on the other hand, appears to support only the simple multiplication of N elements of vectors x and y to yield vector z.


### loadwiki commented Apr 26, 2018

 Hello everybody, I am looking forward to this function too! My program does a video tracking task like this OpenCV tracking algorithm. The algorithm does a lot of pointwise complex products in the Fourier domain; the implementation in OpenCV is cv::mulSpectrums(), and it is the bottleneck. My program will run on multiple platforms such as x86 and ARM. OpenBLAS is my first choice for its cross-platform support and high performance. However, it is a pity that it lacks a pointwise vector product. Eigen and MKL do have this function.
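The FFT-domain use case described above relies on the convolution theorem: circular convolution in the time domain becomes a pointwise (Hadamard) complex product of the spectra. A NumPy sketch of the idea (NumPy assumed, not OpenCV/OpenBLAS):

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.random(8)
b = rng.random(8)

# Pointwise complex product of the spectra (the role cv::mulSpectrums
# plays in the OpenCV tracker), then transform back.
spec = np.fft.fft(a) * np.fft.fft(b)
conv_fft = np.fft.ifft(spec).real

# Direct circular convolution for comparison.
conv_direct = np.array([sum(a[j] * b[(i - j) % 8] for j in range(8))
                        for i in range(8)])
```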
Contributor

### brada4 commented Apr 27, 2018

 You can accelerate parts of Eigen using BLAS (OpenBLAS, MKL, Accelerate framework): https://eigen.tuxfamily.org/dox-devel/TopicUsingBlasLapack.html The function in question in MKL is not part of the BLAS functions but of another group, which is not provided in OpenBLAS.