Taichi Benchmarks

The Taichi programming language is known for attaining high performance with easily understandable programs. Its elegant parallel programming style has attracted many users to the Taichi community, and we improve the compiler together. The benchmark code here serves three main purposes:

Provide a target problem set
Since Taichi is a domain-specific language (DSL) focused on computer graphics and parallel computing, general-purpose benchmark cases cannot fully characterize its strengths.
Provide a multi-dimensional comparison between Taichi and other popular frameworks
Performance is not the only objective; in fact, the code in this repository is not particularly tuned for optimal performance. We also want to showcase the friendly, concise syntax that Taichi exposes to its users.
Open discussions for future performance improvements
By comparing identical algorithms implemented in different frameworks, we can learn from the entire open-source community and keep improving our language and compiler.

To serve these purposes, we built this benchmark project on the following principles:

State-of-the-art baselines
Comparing against well-performing baselines helps Taichi identify further optimization opportunities.
Reproducible results
Tests can be reproduced with the plot_benchmark.py script under each subdirectory.
Easy-to-read coding style
Elegant coding style and high performance are equally important. By comparing Taichi with manually optimized code, users can better understand Taichi's optimization techniques.

Highlights

We evaluated performance on an NVIDIA GeForce RTX 3080 graphics card. Compared with the baselines, Taichi achieves the following results while retaining its easy-to-use programming style:

Performance approaches the device's roofline in terms of both computation and memory bandwidth. [Source: Nested SAXPY, Array fill.]

Minimal coding effort with performance comparable to CUDA. [Source: MPM, 3x3 SVD, Path Tracer, Nested SAXPY.]

Easy-to-read code with strong performance compared with JAX (GPU) and Numba/NumPy (CPU). [Source: Differentiable Smoke Simulation, Poisson Disk Sampling.]
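
To make the roofline highlight concrete, here is a minimal sketch of how the achieved memory bandwidth of a bandwidth-bound kernel can be compared against a device's peak. It assumes a plain SAXPY (y = a*x + y) on 32-bit floats, which reads x, reads y, and writes y; the nested variant used in this repository may move a different amount of data. The function names, the timing, and the peak bandwidth figure are all hypothetical and chosen only for illustration; they are not measurements from these benchmarks.

```python
def saxpy_bytes_moved(n, bytes_per_element=4):
    """Memory traffic of y = a*x + y: read x, read y, write y."""
    return 3 * n * bytes_per_element


def achieved_bandwidth_gbs(n, seconds, bytes_per_element=4):
    """Effective bandwidth in GB/s (1 GB = 1e9 bytes) for one SAXPY pass."""
    return saxpy_bytes_moved(n, bytes_per_element) / seconds / 1e9


def roofline_fraction(achieved_gbs, peak_gbs):
    """Fraction of the memory-bandwidth roofline that was reached."""
    return achieved_gbs / peak_gbs


if __name__ == "__main__":
    # Illustrative inputs only (assumptions, not measured results).
    n = 2 ** 26              # number of float32 elements per array
    elapsed_seconds = 1.2e-3  # hypothetical kernel time
    peak_gbs = 760.0          # assumed peak bandwidth; check your device's spec

    achieved = achieved_bandwidth_gbs(n, elapsed_seconds)
    print(f"achieved bandwidth: {achieved:.1f} GB/s")
    print(f"fraction of peak:   {roofline_fraction(achieved, peak_gbs):.2f}")
```

A fraction close to 1.0 indicates that the kernel is limited by the memory system rather than by the generated code, which is the sense in which a result "approaches the roofline".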

Future Work

We are driving the benchmark work in two directions:

More use cases with strong baseline implementations
We are extending our benchmarks to cover more general parallel tasks. New benchmark items can be added once suitable baseline implementations are available for comparison.
More target backends
The current tests are conducted primarily on NVIDIA GPUs. We are extending the benchmarks to more devices, since Taichi is designed to be hardware neutral. Performance reports are also welcome if you have a device supported by Taichi!

