
Fix GEMM for A^T * A, A * A^T, A * A... operation #36

Closed
ptillet opened this issue Jun 3, 2013 · 4 comments
ptillet (Collaborator) commented Jun 3, 2013

In this case, A and A^T have different semantics in the kernel, but they refer to the same handle and are considered equal by the generator... I am really not sure how to handle this. Plus, I'm pretty sure A*A^T and A*A can be implemented using a better kernel... Should I just forbid the handles of LHS and RHS from being the same in that case (and, in a later version, dispatch to different kernels)? I will try to find a way to handle this, but the problem seems to lie deep in the generator's structure... I had really not anticipated that the same handle could refer to two different ways of accessing memory in the same kernel!
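For concreteness, the kind of statement that triggers the problem looks roughly like this (a sketch only, using the op.add()/prod()/trans() spellings from this thread; the symbolic setup for the generator is omitted and the variable names are made up):

viennacl::matrix<float> B(64, 32), A(64, 64);
// A = B * B^T: both operands refer to B's handle, yet the generated kernel
// has to read that single buffer with two different access patterns.
op.add(A = prod(B, trans(B)));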

karlrupp (Collaborator) commented Jun 3, 2013

Such collisions are usually caught at a higher level (as for x = prod(A, x); ). This needs to be checked for CUDA and the CPU backend as well, so it's reasonable to assume different arguments in the kernel. A runtime check using assert() should nevertheless be applied - just in case.
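To illustrate, a hypothetical guard along these lines could run where the kernel arguments are bound (handle() is used here as a stand-in for however the underlying buffer is exposed; this is not existing ViennaCL code):

#include <cassert>

// Hypothetical aliasing check, for illustration only.
template <typename MatrixT>
void assert_no_aliasing(MatrixT const & result, MatrixT const & lhs, MatrixT const & rhs)
{
  assert(&result.handle() != &lhs.handle() && "result aliases the left operand");
  assert(&result.handle() != &rhs.handle() && "result aliases the right operand");
  assert(&lhs.handle()    != &rhs.handle() && "GEMM operands alias each other");
}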

ptillet (Collaborator, Author) commented Jun 3, 2013

Well, concerning the backend, I plan to check it when creating a custom_operation().
Okay, I will add an assert so that people cannot write op.add(A = prod(B, B)) or anything else that implies the same handle appearing twice in the same expression.
However, computing a covariance matrix this way is a pretty common operation in statistics, so I won't close the issue until I come up with some dispatching mechanism! :) I'm pretty busy, so I'm milestoning this for 2.0.0...
As a side note, I think the generated kernel will be correct if it is generated for A = prod(B, C) and then used with C = trans(B), i.e. things should work if viennacl::linalg::prod() uses generated kernels... This is really a corner case I had not expected!

karlrupp (Collaborator) commented Jun 3, 2013

I'm not talking about the backend, I'm talking about the front-end, i.e. the point at which the user specifies the operation.
A user might write
C = prod(A, B);
D = prod(C, C);
which conceptually uses the same generated kernel, yet in the second statement the front-end first introduces a temporary to avoid the collision.
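In code, the temporary introduced for the second statement amounts to something like this (a sketch of the idea, not the actual front-end logic):

C = prod(A, B);
viennacl::matrix<float> tmp(C);   // copy C into a temporary
D = prod(tmp, C);                 // the two kernel operands no longer share a handle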

ptillet (Collaborator, Author) commented Jun 3, 2013

I see... Hmmm, the problem with the generator is that it will generate a kernel that does not compile for prod(C, C). It is also kind of impossible to introduce a temporary, considering that there is no specific overload of C = prod(A, B)... Plus, at the time someone writes op.add(D = prod(C, C)); the operation does not execute yet, so the temporary is not guaranteed to stay correct...
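To spell out the timing problem: with deferred execution, an eagerly created temporary can be stale by the time the recorded operation actually runs. A sketch, where execute() is an assumed name for whatever triggers the recorded operations (not necessarily the real custom_operation interface):

viennacl::matrix<float> tmp(C);   // snapshot of C taken while recording...
op.add(D = prod(tmp, C));         // ...but nothing executes at this point
C += C;                           // C is modified before the operation runs
op.execute();                     // assumed trigger: tmp no longer matches C, so D is wrong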

