
Fix GEMM for A^T * A, A * A^T, A * A... operation #36

Closed
ptillet opened this issue Jun 3, 2013 · 4 comments
ptillet (Collaborator) commented Jun 3, 2013

In this case, A and A^T have different semantics in the kernel, but they refer to the same handle and are considered equal by the generator... I am really not sure how to handle this. Plus, I'm pretty sure A*A^T and A*A can be implemented using a better kernel... Should I just forbid the handles of LHS and RHS from being the same in that case (and, in a later version, dispatch to different kernels)? I will try to find a way to handle this, but the problem seems to lie deep in the generator's structure... I had really not anticipated that the same handle could refer to two different ways of accessing memory in the same kernel!
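For concreteness, the kind of statement that triggers the problem looks roughly like this (a sketch only, using the op.add()/prod()/trans() spellings from this thread; the symbolic setup for the generator is omitted and the variable names are made up):

viennacl::matrix<float> B(64, 32), A(64, 64);
// A = B * B^T: both operands refer to B's handle, yet the generated kernel
// has to read that single buffer with two different access patterns.
op.add(A = prod(B, trans(B)));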

karlrupp (Collaborator) commented Jun 3, 2013

Such collisions are usually caught at a higher level (as for x = prod(A, x); ). This needs to be checked for CUDA and the CPU backend as well, so it's reasonable to assume different arguments in the kernel. A runtime check using assert() should nevertheless be applied - just in case.
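To illustrate, a hypothetical guard along these lines could run where the kernel arguments are bound (handle() is used here as a stand-in for however the underlying buffer is exposed; this is not existing ViennaCL code):

#include <cassert>

// Hypothetical aliasing check, for illustration only.
template <typename MatrixT>
void assert_no_aliasing(MatrixT const & result, MatrixT const & lhs, MatrixT const & rhs)
{
  assert(&result.handle() != &lhs.handle() && "result aliases the left operand");
  assert(&result.handle() != &rhs.handle() && "result aliases the right operand");
  assert(&lhs.handle()    != &rhs.handle() && "GEMM operands alias each other");
}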

ptillet (Collaborator, Author) commented Jun 3, 2013

Well, concerning the backend, I plan to check it when creating a custom_operation().
Okay, I will add an assert so that people cannot write op.add(A = prod(B, B)) or anything else that implies the same handle appearing twice in the same expression.
However, computing a covariance matrix this way is a pretty common operation in statistics, so I won't close the issue until I come up with some dispatching mechanism! :) I'm pretty busy, so I'm milestoning this for 2.0.0...
As a side note, I think the generated kernel will be correct if it is generated for A = prod(B, C) and then used with C = trans(B), i.e. things should work if viennacl::linalg::prod() uses generated kernels... This is really a corner case I had not expected!

karlrupp (Collaborator) commented Jun 3, 2013

I'm not talking about the backend, I'm talking about the front-end, i.e. the point at which the user specifies the operation.
A user might write
C = prod(A, B);
D = prod(C, C);
which conceptually uses the same generated kernel, yet in the second statement the front-end first introduces a temporary to avoid the collision.
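In code, the temporary introduced for the second statement amounts to something like this (a sketch of the idea, not the actual front-end logic):

C = prod(A, B);
viennacl::matrix<float> tmp(C);   // copy C into a temporary
D = prod(tmp, C);                 // the two kernel operands no longer share a handle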

ptillet (Collaborator, Author) commented Jun 3, 2013

I see... Hmmm, the problem with the generator is that it will generate a kernel that does not compile for prod(C, C). It is also kind of impossible to introduce a temporary, considering that there is no specific overload of C = prod(A, B)... Plus, at the time someone writes op.add(D = prod(C, C)); the operation does not execute yet, so the temporary is not guaranteed to stay correct...
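To spell out the timing problem: with deferred execution, an eagerly created temporary can be stale by the time the recorded operation actually runs. A sketch, where execute() is an assumed name for whatever triggers the recorded operations (not necessarily the real custom_operation interface):

viennacl::matrix<float> tmp(C);   // snapshot of C taken while recording...
op.add(D = prod(tmp, C));         // ...but nothing executes at this point
C += C;                           // C is modified before the operation runs
op.execute();                     // assumed trigger: tmp no longer matches C, so D is wrong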

