
Support sparse matrix and sparse linear solvers #2906

Closed
14 of 16 tasks
FantasyVR opened this issue Sep 10, 2021 · 8 comments
Labels
feature request Suggest an idea on this project

Comments
@FantasyVR
Collaborator

FantasyVR commented Sep 10, 2021

Concisely describe the proposed feature
Many problems in computer graphics (especially physical simulation) require solving sparse linear systems. While some of these problems can be addressed with matrix-free solvers (e.g., conjugate gradients), many require explicitly constructing a sparse matrix in standard formats such as CSR and COO.

Currently, Taichi does not have first-class support for such matrices. Note that Taichi's sparse computation system is designed for spatial sparsity, not matrix sparsity.

Describe the solution you'd like (if any)

  • Use SparseMatrixBuilder to create triplets of sparse matrices.
  • SparseMatrix is built from SparseMatrixBuilder.
    • Users can configure the storage format (CSR, COO, etc.).
    • Users can configure the data type of sparse matrix elements.
  • Support basic sparse matrix operations such as +, -, *, @, transpose, etc.
  • Support sparse direct solvers such as LLT, LDLT, LU, etc.
    • Users can configure the data type and ordering of solvers.
  • Performance benchmark (SciPy, cuSolver)
  • cuSPARSE, cuSolver function loader
  • Create a sparse matrix using cuSPARSE.
  • Support GPU-based basic sparse matrix operations such as +, -, *, @, transpose, etc.
  • Sort COO-format triplets before converting to CSR format.
  • Support sparse direct solvers on GPU.
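The COO-to-CSR step in the checklist above can be sketched in plain Python (an illustrative sketch only, independent of Taichi's actual implementation): duplicate entries are merged, triplets are sorted by (row, col), and per-row counts are prefix-summed into row offsets.

```python
def coo_to_csr(rows, cols, vals, n_rows):
    """Convert COO triplets to CSR (row_ptr, col_idx, values).

    Duplicate (row, col) entries are summed, and triplets are sorted
    by (row, col) first, as the checklist item above suggests.
    """
    merged = {}
    for r, c, v in zip(rows, cols, vals):
        merged[(r, c)] = merged.get((r, c), 0.0) + v
    triplets = sorted(merged.items())  # ordered by (row, col)

    row_ptr = [0] * (n_rows + 1)
    col_idx, values = [], []
    for (r, c), v in triplets:
        row_ptr[r + 1] += 1  # count entries per row
        col_idx.append(c)
        values.append(v)
    for i in range(n_rows):  # prefix sum turns counts into offsets
        row_ptr[i + 1] += row_ptr[i]
    return row_ptr, col_idx, values
```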

Unify CPU and GPU API

  • Construct GPU sparse matrix builder triplets in ti.kernel.
  • Return an ndarray as the result of the spmv and solve functions.
  • Move solve_cu and solve_rf into the solve function.
  • Data type configuration for GPU sparse matrix/sparse solver.
  • Performance benchmark of GPU/CPU sparse matrix/solver.

Update:

  • Support more commonly used packages
    • SuiteSparse, Pardiso
@mzy2240

mzy2240 commented Sep 10, 2021

Will you consider supporting sparse matrix solvers like SuiteSparse?

@zishengye

Will you consider including some engineering-level packages like PETSc and Trilinos? These packages support not only different matrix formats but also many preconditioners.

@FantasyVR
Collaborator Author

FantasyVR commented Sep 12, 2021

Will you consider supporting sparse matrix solvers like SuiteSparse?

@mzy2240 Thanks for your suggestion. We just wrap Eigen for now; SuiteSparse is in our plan. If you are interested, you are welcome to join us and help support more features.

@FantasyVR
Collaborator Author

Will you consider including some engineering-level packages like PETSc and Trilinos? These packages support not only different matrix formats but also many preconditioners.

Our plan is to design a general sparse linear system solver, but for now we intend to implement a basic, usable version based on Eigen. Support for more packages such as SuiteSparse, Pardiso, and cuSolver is also planned, and we welcome you to join us in the process.

@zishengye

Maybe Kokkos Kernels could help with your work.

@mzy2240

mzy2240 commented Nov 22, 2021

Do you plan to support conversion to and from scipy.sparse matrices? SciPy is probably the most widely used library for dealing with sparse matrices in Python.

@Hanke98
Collaborator

Hanke98 commented Sep 1, 2022

Hi there! I am now working on implementing basic operations such as +, -, *, @, etc. on the GPU. However, I found that the + and - operations are not common cases, according to this post.

Is it OK to use the cuSPARSE routine csrgeam, which computes C = α·A + β·B, to implement the + operation?
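For context, `csrgeam` computes the general sparse matrix sum C = α·A + β·B, so + is the special case α = β = 1 and - is α = 1, β = -1. A plain-Python sketch of that semantics on {(row, col): value} maps (not the cuSPARSE call itself, which works on CSR arrays) looks like:

```python
def geam(alpha, A, beta, B):
    """C = alpha*A + beta*B on sparse matrices stored as
    {(row, col): value} dicts, mirroring what csrgeam computes
    on CSR operands."""
    C = {}
    for (r, c), v in A.items():
        C[(r, c)] = C.get((r, c), 0.0) + alpha * v
    for (r, c), v in B.items():
        C[(r, c)] = C.get((r, c), 0.0) + beta * v
    # Drop explicit zeros produced by cancellation.
    return {k: v for k, v in C.items() if v != 0.0}
```

With this, matrix addition is `geam(1.0, A, 1.0, B)` and subtraction is `geam(1.0, A, -1.0, B)`.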

FantasyVR added a commit that referenced this issue Nov 29, 2022
Issue: #2906 

### Brief Summary
Using the ndarray `read_int` function to obtain triplets is inefficient; the triplets can instead be copied to host memory using
`memcpy_device_to_host`.
In addition, the triplets stored in the map container are already ordered, yet `build_csr_from_coo` sorts the COO triplets again. To improve performance, we could use an unordered map to store the triplets.
strongoier added a commit that referenced this issue Nov 30, 2022
Issue: #2906

Co-authored-by: Olinaaaloompa <106292061+Olinaaaloompa@users.noreply.github.com>
Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>
FantasyVR added a commit to FantasyVR/taichi that referenced this issue Nov 30, 2022
Issue: taichi-dev#2906

Co-authored-by: Olinaaaloompa <106292061+Olinaaaloompa@users.noreply.github.com>
Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>
FantasyVR added a commit that referenced this issue Dec 2, 2022
Issue: #2906 

### Brief Summary
The `read_int` function of ndarray consumes more than 100 MB of GPU memory. It's better to use the `memcpy_device_to_host` function to obtain `num_triplets_`.
FantasyVR added a commit that referenced this issue Dec 20, 2022
Issue: #2906 

### Brief Summary
When solving a sparse linear system without reordering the system matrix, `analyze_pattern` and `factorize` consume a lot of GPU memory. For example, `res` in the `stable fluid` example could not be larger than 500. After reordering the sparse system matrix, we can now set `res=512`.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
FantasyVR added a commit that referenced this issue Dec 26, 2022
Issue: #2906 

### Brief Summary
To be consistent with the API on the CPU backend, this PR provides an LU sparse solver on the CUDA backend. cuSolver only provides a CPU-side API for the LU sparse solver, which is what this PR uses. cuSolverRF provides a GPU-side LU solve, but it only supports the `double` datatype, so it is not used in this PR.

Besides, `print_triplets` is refactored to resolve the ndarray `read` constraint (the `read` and `write` data must have the same datatype).

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#2906 taichi-dev#6082 

+ Support element access for sparse matrix on GPU.
+ Add tests for GPU sparse matrix operators.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#2906

### Brief Summary
The previous spmv API was `A.spmv(x, y)`, which means `y = A @ x`.
The spmv function is now removed; we can directly write `y = A @ x` to compute the result on the CUDA backend.
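For reference, the `y = A @ x` product on a CSR matrix boils down to a per-row dot product of the stored entries against `x`. A plain-Python sketch (illustrative only, not the CUDA implementation):

```python
def csr_spmv(row_ptr, col_idx, vals, x):
    """y = A @ x for a CSR matrix: for each row i, accumulate the
    stored entries vals[k] times the matching entries of x."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y
```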
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#2906 

### Brief Summary
Replace the array storing triplets in the sparse matrix builder with an ndarray. This unifies sparse matrix builder creation on the CPU and GPU.

Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
…atrix (taichi-dev#6605)

Issue: taichi-dev#2906 

### Brief Summary
When building a GPU sparse matrix, the cuSPARSE API requires three separate arrays: a row index pointer, a column index pointer, and a values pointer. However, the sparse matrix builder uses a single ndarray to store all triplets, with a memory layout like [row, col, value, row, col, value, ...]. In this PR, I retrieve all data from the ndarray and merge the triplets that fall on the same position of the sparse matrix. The triplets are then stored in three separate arrays, which are finally used to build the sparse matrix with the cuSPARSE API.
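The merge-and-split step described above can be sketched in plain Python (illustrative only; the actual PR operates on device buffers):

```python
def split_triplets(buf):
    """Split an interleaved [row, col, value, row, col, value, ...]
    buffer into the three separate arrays cuSPARSE expects, merging
    duplicate (row, col) entries along the way."""
    merged = {}
    for i in range(0, len(buf), 3):
        r, c, v = int(buf[i]), int(buf[i + 1]), buf[i + 2]
        merged[(r, c)] = merged.get((r, c), 0.0) + v
    rows, cols, vals = [], [], []
    for (r, c), v in sorted(merged.items()):
        rows.append(r)
        cols.append(c)
        vals.append(v)
    return rows, cols, vals
```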
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
…chi-dev#6651)

Issue: taichi-dev#2906 

Previously, an ndarray couldn't be used as a vector on the CPU backend in spmv and sparse linear solve operations. In this PR, the ndarray is
[mapped](https://eigen.tuxfamily.org/dox/classEigen_1_1Map.html) as an
Eigen vector and then used for the math computation.

In addition, the implicit mass spring example is modified to use ndarrays, which enables us to run this example on both the CPU and GPU backends.