Initial design of matrix type in ISPC #2470
C = matrix_mad(matrix_vertical_pack(A), matrix_vertical_pack(B));

matrix_store(C, p->mC);
Why do we need `matrix_load` and `matrix_store`? Is the dereference operator not enough?
Stdlib functions for matrix load/store give more flexibility if we need to provide more parameters (layout, for example) in the future. But you're right, the dereference operator should work as well as a stdlib function.
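To make the flexibility argument concrete, here is a minimal host-side C++ sketch (not ISPC, and not the proposed stdlib API) of a load function that accepts a layout parameter, which a plain dereference has no way to express. `Layout`, `Tile`, and `load_tile` are hypothetical names chosen for illustration only.

```cpp
#include <array>
#include <cstddef>

// Hypothetical names for illustration only; not part of ISPC or its stdlib.
enum class Layout { RowMajor, ColMajor };

template <typename T, std::size_t R, std::size_t C>
struct Tile {
    std::array<T, R * C> data{}; // the tile itself is always stored row-major
    T &at(std::size_t r, std::size_t c) { return data[r * C + c]; }
    const T &at(std::size_t r, std::size_t c) const { return data[r * C + c]; }
};

// Load an R x C tile from memory. The layout argument describes how the
// source buffer is organized; a bare dereference could not carry this.
template <typename T, std::size_t R, std::size_t C>
Tile<T, R, C> load_tile(const T *src, Layout layout = Layout::RowMajor) {
    Tile<T, R, C> t;
    for (std::size_t r = 0; r < R; r++)
        for (std::size_t c = 0; c < C; c++)
            t.at(r, c) = (layout == Layout::RowMajor) ? src[r * C + c] : src[c * R + r];
    return t;
}
```

With the default argument the call looks just like a simple load, so callers that don't care about layout pay no syntactic cost.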
Additional stdlib functions may be available on specific platforms only. For example, there may be a function
to set a tile configuration on platforms with Intel(R) AMX support.

MAD example in ISPC
Do we have some typical use cases in mind for this design? An example with the C-side code would help to understand the full picture.
C code is below
varying int X = 1;
uniform int Y = 2;
varying int xy = A[X, Y]; // returns {a23, a23, a23, a23}
Why not `A[X][Y]`?
`A[X][Y]` is used for multi-dimensional arrays. It would be good to distinguish matrix indexing from that, since matrix is a native ISPC type.
uniform int X = 2;
varying int Y = {2, 1, 3, 5};
varying int xy = A[X, Y]; // returns {a33, a32, a34, a36}
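The semantics of the two `A[X, Y]` examples can be modeled on the host. The C++ sketch below is only an emulation under stated assumptions: a gang width of 4, a row-major 4x8 matrix, zero-based indices, and hypothetical helper names (`Varying`, `broadcast`, `index`, `make_labeled`). A uniform index behaves as if it were broadcast to every lane, and each lane then reads its own element.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 4;        // assumed gang width
using Varying = std::array<int, kLanes>; // one value per program instance

// A uniform value behaves like the same value in every lane.
inline Varying broadcast(int u) {
    Varying v;
    v.fill(u);
    return v;
}

template <std::size_t R, std::size_t C>
struct Matrix {
    std::array<int, R * C> data{}; // row-major storage (an assumption)
};

// Model of A[X, Y]: lane i reads element (X[i], Y[i]).
template <std::size_t R, std::size_t C>
Varying index(const Matrix<R, C> &A, const Varying &X, const Varying &Y) {
    Varying out{};
    for (std::size_t i = 0; i < kLanes; i++)
        out[i] = A.data[static_cast<std::size_t>(X[i]) * C + static_cast<std::size_t>(Y[i])];
    return out;
}

// Fill A so that element (r, c) holds 10*(r+1) + (c+1), so "a23" is the value 23.
template <std::size_t R, std::size_t C>
Matrix<R, C> make_labeled() {
    Matrix<R, C> A;
    for (std::size_t r = 0; r < R; r++)
        for (std::size_t c = 0; c < C; c++)
            A.data[r * C + c] = static_cast<int>(10 * (r + 1) + (c + 1));
    return A;
}
```

Calling `index(A, broadcast(2), Varying{2, 1, 3, 5})` on a labeled 4x8 matrix models the second example: the row index is shared by all lanes while the column index differs per lane.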
Is the following code possible?
matrix<int, 4, 4> A;
uniform int X = 0;
int<4> row = A[X];
No, short vectors should not be mixed with matrix types.
In the future we may allow extracting data of arbitrary size from a matrix, and I think it should look more like this:
matrix<int, 1, 4> row = A[X];
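As a rough host-side analogy for that idea, the C++ sketch below extracts a row as a 1 x C matrix, so the result stays in the matrix type family instead of becoming a short vector. `Mat` and `extract_row` are hypothetical names, and row-major storage is an assumption; this is not ISPC syntax.

```cpp
#include <array>
#include <cstddef>

// Hypothetical host-side model; names are illustrative, not ISPC syntax.
template <typename T, std::size_t R, std::size_t C>
struct Mat {
    std::array<T, R * C> data{}; // row-major storage (an assumption)
};

// Extract row x as a 1 x C matrix rather than as a short vector.
template <typename T, std::size_t R, std::size_t C>
Mat<T, 1, C> extract_row(const Mat<T, R, C> &A, std::size_t x) {
    Mat<T, 1, C> row;
    for (std::size_t c = 0; c < C; c++)
        row.data[c] = A.data[x * C + c];
    return row;
}
```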

Interoperability
----------------
Matrix is an internal ISPC type. It can't be used as an argument to `export` or `extern "C"` functions, but it can be used as an argument to internal ISPC functions.
Does that mean that C/C++ code will not know anything about the layout, structure, or restrictions of ISPC matrix types? I wonder what matrix initialization code on the C/C++ side would look like.
Right. On the C/C++ side it may look like this:
// Fill a rows x cols matrix, stored row-major in a flat vector, with one value.
template <class T> void init_matrix(std::vector<T> &M, unsigned int rows, unsigned int cols, T value) {
    for (unsigned int r = 0; r < rows; r++)
        for (unsigned int c = 0; c < cols; c++) {
            M[r * cols + c] = value;
        }
}
int main() {
    ...
    std::vector<SRC_T> matrixA(M * N);
    std::vector<SRC_T> matrixB(N * K);
    std::vector<DST_T> matrixC(M * K);
    init_matrix<SRC_T>(matrixA, M, N, 1);
    init_matrix<SRC_T>(matrixB, N, K, 0.5);
    init_matrix<DST_T>(matrixC, M, K, 0);
    // If ISPCRT is used:
    ispcrt::Device device(ISPCRT_DEVICE_TYPE_GPU);
    // Element types match the host vectors they wrap
    ispcrt::Array<SRC_T> matrixA_dev(device, matrixA);
    ispcrt::Array<SRC_T> matrixB_dev(device, matrixB);
    ispcrt::Array<DST_T> matrixC_dev(device, matrixC);
    // Setup parameters structure
    Parameters<DST_T> p;
    p.mA = matrixA_dev.devicePtr();
    p.mB = matrixB_dev.devicePtr();
    p.mC = matrixC_dev.devicePtr();
    p.M = M;
    p.N = N;
    p.K = K;
    auto p_dev = ispcrt::Array<Parameters<DST_T>>(device, p);
    // Create module and kernel
    ...
    // Create task queue and execute kernel
    ispcrt::TaskQueue queue(device);
    queue.copyToDevice(p_dev);
    queue.copyToDevice(matrixA_dev);
    queue.copyToDevice(matrixB_dev);
    queue.copyToDevice(matrixC_dev);
    ...
}
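For checking the kernel result on the host, a plain reference multiply-add over the same flat buffers can be useful. The sketch below is an assumption-laden illustration, not part of the proposal: `reference_mad` is a hypothetical helper, it assumes the row-major layout used by `init_matrix` (A is M x N, B is N x K, C is M x K), and it assumes MAD accumulates into C, which for a zero-initialized C is just the product A * B.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side reference: C = C + A * B on row-major flat buffers.
// A is M x N, B is N x K, C is M x K.
template <typename SrcT, typename DstT>
void reference_mad(const std::vector<SrcT> &A, const std::vector<SrcT> &B, std::vector<DstT> &C,
                   std::size_t M, std::size_t N, std::size_t K) {
    for (std::size_t m = 0; m < M; m++)
        for (std::size_t k = 0; k < K; k++) {
            DstT acc = C[m * K + k]; // start from the existing accumulator value
            for (std::size_t n = 0; n < N; n++)
                acc += static_cast<DstT>(A[m * N + n]) * static_cast<DstT>(B[n * K + k]);
            C[m * K + k] = acc;
        }
}
```

With A filled with 1, B filled with 0.5, and C filled with 0, as in the initialization above, every element of the reference result equals 0.5 * N, which gives a simple value to compare the device output against.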