Initial design of matrix type in ISPC #2470

Open
wants to merge 1 commit into
base: main

Conversation

aneshlya
Collaborator

No description provided.

docs/design/matrix.rst (outdated)

C = matrix_mad(matrix_vertical_pack(A), matrix_vertical_pack(B));

matrix_store(C, p->mC);
Collaborator

Why do we need matrix_load and matrix_store? Isn't the dereference operator enough?

Collaborator Author

Stdlib functions for matrix load/store give us more flexibility if we need to pass more parameters (for example, a layout) in the future. But you're right, the dereference operator should work just as well as a stdlib function.
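
To illustrate the flexibility argument, here is a minimal host-side C++ sketch. It is not ISPC and not the proposed stdlib API: `Tile`, `Layout`, and `tile_load` are hypothetical names made up for this illustration. The only point is that a load function can accept a layout and a stride, which a plain dereference has no way to express.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for illustration; not part of ISPC or its stdlib.
enum class Layout { RowMajor, ColMajor };

template <typename T, std::size_t R, std::size_t C> struct Tile {
    std::array<T, R * C> data{}; // always stored row-major inside the tile
    T &at(std::size_t r, std::size_t c) { return data[r * C + c]; }
    const T &at(std::size_t r, std::size_t c) const { return data[r * C + c]; }
};

// Load an R x C tile from memory. `stride` is the distance in elements between
// consecutive rows (RowMajor) or consecutive columns (ColMajor) of the source.
template <typename T, std::size_t R, std::size_t C>
Tile<T, R, C> tile_load(const T *src, std::size_t stride, Layout layout) {
    // The stride must cover at least one full row (or column) of the tile.
    assert(stride >= (layout == Layout::RowMajor ? C : R));
    Tile<T, R, C> t;
    for (std::size_t r = 0; r < R; ++r)
        for (std::size_t c = 0; c < C; ++c)
            t.at(r, c) = (layout == Layout::RowMajor) ? src[r * stride + c] : src[c * stride + r];
    return t;
}
```

A dereference-style load would correspond to the single case where the layout is fixed and the stride equals the tile width.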

Additional stdlib functions may be available on specific platforms only. For example, there may be a function
to set the tile configuration on platforms with Intel(R) AMX support.

MAD example in ISPC
Collaborator

Do we have typical use cases in mind for this design? An example with the C-side code may help to understand the full picture.

Collaborator Author

The C code is below.


varying int X = 1;
uniform int Y = 2;
varying int xy = A[X, Y]; // returns {a23, a23, a23, a23}
Collaborator

Why not A[X][Y]?

Collaborator Author

A[X][Y] is used for multi-dimensional arrays. It would be good to distinguish matrix indexing from that, since matrix is a native ISPC type.

uniform int X = 2;
varying int Y = {2, 1, 3, 5};
varying int xy = A[X, Y]; // returns {a33, a32, a34, a36}
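
The `A[X, Y]` snippets in this thread mix a uniform index with a varying one. As a rough way to read them, the following C++ sketch emulates the per-lane behavior on the host. It is an emulation under stated assumptions, not the ISPC implementation: `Varying`, `broadcast`, `index2d`, and the gang size of 4 are all hypothetical, indices are taken to be zero-based, and the matrix is assumed to be stored row-major with at least 3 rows and 6 columns so that the indices in the examples are in range.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 4; // assumed gang size, for illustration only

// A varying value is modelled as one value per program instance (lane).
template <typename T> using Varying = std::array<T, kLanes>;

// A uniform value used as an index behaves as if broadcast to every lane.
template <typename T> Varying<T> broadcast(T v) {
    Varying<T> out;
    out.fill(v);
    return out;
}

// Emulates A[X, Y] on an R x C row-major matrix: each lane reads the element
// selected by its own (row, column) pair.
template <typename T, std::size_t R, std::size_t C>
Varying<T> index2d(const std::array<T, R * C> &A, const Varying<int> &X, const Varying<int> &Y) {
    Varying<T> out{};
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        assert(X[lane] >= 0 && static_cast<std::size_t>(X[lane]) < R);
        assert(Y[lane] >= 0 && static_cast<std::size_t>(Y[lane]) < C);
        out[lane] = A[static_cast<std::size_t>(X[lane]) * C + static_cast<std::size_t>(Y[lane])];
    }
    return out;
}
```

Under these assumptions, `index2d<int, 4, 8>(A, broadcast(2), {2, 1, 3, 5})` selects row index 2 and column indices 2, 1, 3, 5, which matches the `{a33, a32, a34, a36}` comment when the `a` subscripts are read as one-based.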

Collaborator

Is this code possible?

matrix<int, 4, 4> A;
uniform int X = 0;
int<4> row = A[X];

Collaborator Author

No, short vectors should not be mixed with matrix types.
In the future we may allow extracting data of arbitrary size from a matrix, and I think it should look more like this:
matrix<int, 1, 4> row = A[X];


Interoperability
----------------
Matrix is an internal ISPC type. It can't be used as an argument to `export` or `extern "C"` functions. It can be used as an argument to internal ISPC functions.
Collaborator

Does that mean that C/C++ code will not know anything about the layout, structure, or restrictions of ISPC matrix types? I wonder what the C/C++ matrix initialization code could look like.

Collaborator Author

Right. On the C/C++ side it may look like this:

    template <class T> void init_matrix(std::vector<T> &M, unsigned int rows, unsigned int cols, T value) {
        for (unsigned int r = 0; r < rows; r++)
            for (unsigned int c = 0; c < cols; c++) {
                M[r * cols + c] = value;
            }
    }
    int main() {
        ...
        std::vector<SRC_T> matrixA(M * N);
        std::vector<SRC_T> matrixB(N * K);
        std::vector<DST_T> matrixC(M * K);
        init_matrix<SRC_T>(matrixA, M, N, 1);
        init_matrix<SRC_T>(matrixB, N, K, 0.5);
        init_matrix<DST_T>(matrixC, M, K, 0);

        // If ISPCRT is used:
        ispcrt::Device device(ISPCRT_DEVICE_TYPE_GPU);
        ispcrt::Array<unsigned> matrixA_dev(device, matrixA);
        ispcrt::Array<unsigned> matrixB_dev(device, matrixB);
        ispcrt::Array<DST_T> matrixC_dev(device, matrixC);

        // Setup parameters structure
        Parameters<DST_T> p;

        p.mA = matrixA_dev.devicePtr();
        p.mB = matrixB_dev.devicePtr();
        p.mC = matrixC_dev.devicePtr();
        p.M = M;
        p.N = N;
        p.K = K;
        auto p_dev = ispcrt::Array<Parameters<DST_T>>(device, p);
        // Create module and kernel
        ...
        // Create task queue and execute kernel
        ispcrt::TaskQueue queue(device);

        queue.copyToDevice(p_dev);
        queue.copyToDevice(matrixA_dev);
        queue.copyToDevice(matrixB_dev);
        queue.copyToDevice(matrixC_dev);
        ...
    }
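
Since the host only sees flat buffers, one way to check a kernel's output is a plain host-side reference. The sketch below assumes that the kernel computes a multiply-add of the form C += A * B on row-major data, with A of size M x N, B of size N x K, and C of size M x K as in the host code above. That reading of `matrix_mad` is an assumption, and `reference_mad` is a hypothetical helper, not part of ISPC or ISPCRT.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference: C += A * B on flat row-major buffers.
// A is M x N, B is N x K, C is M x K.
template <typename SrcT, typename DstT>
void reference_mad(const std::vector<SrcT> &A, const std::vector<SrcT> &B, std::vector<DstT> &C,
                   std::size_t M, std::size_t N, std::size_t K) {
    assert(A.size() == M * N && B.size() == N * K && C.size() == M * K);
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t k = 0; k < K; ++k) {
            DstT acc = C[m * K + k]; // start from the existing accumulator value
            for (std::size_t n = 0; n < N; ++n)
                acc += static_cast<DstT>(A[m * N + n]) * static_cast<DstT>(B[n * K + k]);
            C[m * K + k] = acc;
        }
}
```

If `SRC_T` and `DST_T` are floating-point types, the initial values used in the host code (A filled with 1, B with 0.5, C with 0) would make every element of the reference result equal to N * 0.5.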

2 participants