Sparse Matrix(compressed) * Dense Matrix product #37
Note: needs review!!!
Thanks, Albert! The overall structure looks really good :-) In particular, it's great that you have an implementation for all three backends now. As for the kernels, I'll comment on the commit inline. You can then either commit on top of the existing commit, or force-push an amended commit.
unsigned int row_start = sp_mat_row_indices[row];
unsigned int row_end = sp_mat_row_indices[row+1];

// load work rows to shared memory
I think you should limit the maximum amount of shared memory used. If I ramp up the number of columns in the dense matrix to e.g. 3000, the GPU easily runs out of shared memory.
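One way to bound the buffer is to process the dense matrix in column tiles of fixed width, so the scratch space no longer grows with the number of columns. The following is a minimal host-side C++ sketch of that idea, not the kernel from this pull request. The names `csr_dense_product_tiled`, `kTileCols`, `sp_mat_col_indices`, `sp_mat_elements` and `d_mat` are assumptions made for illustration, and the row-major layout of the dense matrix is assumed as well.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile width; it stands in for a fixed shared-memory budget.
constexpr std::size_t kTileCols = 4;

// result = sparse (CSR) * dense, with both dense matrices stored row-major.
// The scratch buffer 'tile' has a fixed size regardless of result_col_size.
std::vector<float> csr_dense_product_tiled(
    const std::vector<unsigned int>& sp_mat_row_indices,
    const std::vector<unsigned int>& sp_mat_col_indices,  // assumed name
    const std::vector<float>& sp_mat_elements,            // assumed name
    const std::vector<float>& d_mat,                      // assumed name
    std::size_t result_col_size)
{
    assert(!sp_mat_row_indices.empty());
    const std::size_t rows = sp_mat_row_indices.size() - 1;
    std::vector<float> result(rows * result_col_size, 0.0f);
    float tile[kTileCols];  // plays the role of the shared-memory buffer

    for (std::size_t row = 0; row < rows; ++row) {
        const unsigned int row_start = sp_mat_row_indices[row];
        const unsigned int row_end = sp_mat_row_indices[row + 1];

        // Walk over the columns in chunks of at most kTileCols.
        for (std::size_t col0 = 0; col0 < result_col_size; col0 += kTileCols) {
            const std::size_t width = std::min(kTileCols, result_col_size - col0);
            std::fill(tile, tile + width, 0.0f);

            for (unsigned int k = row_start; k < row_end; ++k) {
                const float x = sp_mat_elements[k];
                const std::size_t d_row = sp_mat_col_indices[k];
                for (std::size_t j = 0; j < width; ++j)
                    tile[j] += x * d_mat[d_row * result_col_size + col0 + j];
            }
            // Write the finished tile back to the result row.
            std::copy(tile, tile + width,
                      result.data() + row * result_col_size + col0);
        }
    }
    return result;
}
```

With 3000 dense columns the scratch buffer in this sketch still holds only `kTileCols` floats; the price is that the sparse row is traversed once per tile.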
Note: Revised version
for ( unsigned int col = get_local_id(0); col < result_col_size; col += get_local_size(0) ) {

  float r = 0;
Nice! You can optimize it a little further by interchanging the loops for 'col' and 'k', because then 'x' needs to be loaded only once. Right now 'x' might get loaded multiple times.
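To make the effect of the interchange concrete, here is a small host-side C++ sketch for a single sparse row. It is not the kernel under review: the function names, `sp_mat_col_indices`, `sp_mat_elements` and `d_mat` are assumptions, and the returned counter only illustrates how often 'x' is read in each loop order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Loop order as in the quoted snippet: 'col' outside, 'k' inside.
// Returns how many times a sparse element 'x' was read.
std::size_t row_product_col_outer(
    unsigned int row_start, unsigned int row_end,
    const std::vector<unsigned int>& sp_mat_col_indices,  // assumed name
    const std::vector<float>& sp_mat_elements,            // assumed name
    const std::vector<float>& d_mat,                      // assumed name, row-major
    std::size_t result_col_size,
    std::vector<float>& result_row)
{
    assert(result_row.size() == result_col_size);
    std::size_t x_loads = 0;
    for (std::size_t col = 0; col < result_col_size; ++col) {
        float r = 0;
        for (unsigned int k = row_start; k < row_end; ++k) {
            const float x = sp_mat_elements[k];  // re-read for every column
            ++x_loads;
            r += x * d_mat[sp_mat_col_indices[k] * result_col_size + col];
        }
        result_row[col] = r;
    }
    return x_loads;
}

// Interchanged order: 'k' outside, 'col' inside, so 'x' is read once per nonzero.
std::size_t row_product_k_outer(
    unsigned int row_start, unsigned int row_end,
    const std::vector<unsigned int>& sp_mat_col_indices,
    const std::vector<float>& sp_mat_elements,
    const std::vector<float>& d_mat,
    std::size_t result_col_size,
    std::vector<float>& result_row)
{
    assert(result_row.size() == result_col_size);
    std::size_t x_loads = 0;
    for (std::size_t col = 0; col < result_col_size; ++col)
        result_row[col] = 0.0f;
    for (unsigned int k = row_start; k < row_end; ++k) {
        const float x = sp_mat_elements[k];  // read once, reused for all columns
        ++x_loads;
        for (std::size_t col = 0; col < result_col_size; ++col)
            result_row[col] += x * d_mat[sp_mat_col_indices[k] * result_col_size + col];
    }
    return x_loads;
}
```

In this sketch the interchanged version accumulates into the result row instead of a single scalar 'r', so a kernel would need somewhere to keep one partial sum per column it handles.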
Cool, almost done. Just a few minor optimizations left. :-)
Hi Albert! Great job, really! I have some suggestions for potential performance improvements, but I think it would be wiser to first have benchmarks to measure whether performance improvements are necessary.
@PhilippeTillet For a general sparse-dense multiplication this will always be memory bandwidth limited. To make it more compute-limited we would need to introduce some block-CSR format or similar.
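A rough back-of-the-envelope estimate illustrates the bandwidth argument. The C++ sketch below assumes a worst-case traffic model that is not taken from this thread: 4-byte floats and indices, and no cache reuse of dense-matrix entries. The function name `estimated_flops_per_byte` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Estimated arithmetic intensity (flops per byte moved) of CSR * dense,
// under the assumed model: every dense entry is fetched from memory each
// time it is used, and all values and indices are 4 bytes wide.
double estimated_flops_per_byte(std::size_t nnz, std::size_t rows,
                                std::size_t dense_cols)
{
    assert(nnz > 0 && dense_cols > 0);
    const double n = static_cast<double>(nnz);
    const double m = static_cast<double>(rows);
    const double c = static_cast<double>(dense_cols);

    const double flops = 2.0 * n * c;        // one multiply and one add per term
    const double bytes = n * (4.0 + 4.0)     // sparse values + column indices
                       + (m + 1.0) * 4.0     // row index array
                       + n * c * 4.0         // dense entries read
                       + m * c * 4.0;        // result entries written
    return flops / bytes;
}
```

Under these assumptions the ratio stays below 0.5 flops per byte for any sizes, since each dense entry read (4 bytes) feeds only two flops. Reusing loaded dense entries across several sparse rows, as a blocked format allows, is what would raise that ratio.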
Conflicts: tests/CMakeLists.txt
Thanks, Albert!