diff --git a/CMakeLists.txt b/CMakeLists.txt
index e8b99e29e35b3..d6dd64c7bec1e 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -8,7 +8,7 @@ project(taichi)
 
 SET(TI_VERSION_MAJOR 0)
 SET(TI_VERSION_MINOR 5)
-SET(TI_VERSION_PATCH 8)
+SET(TI_VERSION_PATCH 9)
 
 execute_process(
         WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}
diff --git a/README.md b/README.md
index 14cf760c3368b..8475c9ad92472 100644
--- a/README.md
+++ b/README.md
@@ -34,6 +34,26 @@ python3 -m pip install taichi-nightly-cuda-10-1
 |**PyPI**|[![Build Status](https://travis-ci.com/yuanming-hu/taichi-wheels-test.svg?branch=master)](https://travis-ci.com/yuanming-hu/taichi-wheels-test)|[![Build Status](https://travis-ci.com/yuanming-hu/taichi-wheels-test.svg?branch=master)](https://travis-ci.com/yuanming-hu/taichi-wheels-test)|[![Build status](https://ci.appveyor.com/api/projects/status/39ar9wa8yd49je7o?svg=true)](https://ci.appveyor.com/project/IteratorAdvance/taichi-wheels-test)|
 
 ## Updates
+- (Mar 28, 2020) v0.5.9 released
+  - **CPU backends**
+    - Support `bitmasked` as the leaf block structure for `1x1x1` masks (#676) (by **Yuanming Hu**)
+  - **CUDA backend**
+    - Support `bitmasked` as the leaf block structure for `1x1x1` masks (#676) (by **Yuanming Hu**)
+  - **Documentation**
+    - Updated contributor guideline (#658) (by **Yuanming Hu**)
+  - **Infrastructure**
+    - 6x faster compilation on CPU backends (#673) (by **Yuanming Hu**)
+  - **Language and syntax**
+    - Simplify dense.bitmasked to bitmasked (#670) (by **Ye Kuang**)
+    - Support break in non-parallel for statements (#583) (by **彭于斌**)
+  - **Metal backend**
+    - Changes to enable `bitmasked` on Metal! (#661) (by **Ye Kuang**)
+    - Silence compile warning with [[maybe_unused]] (#650) (by **Ye Kuang**)
+    - Add bitmasked support in MetalRuntime (#638) (by **Ye Kuang**)
+  - **Optimization**
+    - Merge adjacent if's with identical conditions (#668) (by **xumingkuan**)
+    - Dive into container statements to find local loads/stores for optimization, and optimize loads of new allocas to 0 (#662) (by **xumingkuan**)
+  - [Full log](https://github.com/taichi-dev/taichi/releases/tag/0.5.9)
 - (Mar 24, 2020) v0.5.8 released. Visible/notable changes:
   - **Language features**
     - Access out-of-bound checking on CPU backends (#572) (by **xumingkuan**)
@@ -65,77 +85,8 @@ python3 -m pip install taichi-nightly-cuda-10-1
   - Fixed infinitely looping signal handlers
   - Fixed `ti test` on release mode
   - Doc updated
-- (Mar 3, 2020) v0.5.6 released
-  - Fixed runtime LLVM bitcode loading failure on Linux
-  - Fixed a GUI bug in `ti.GUI.line` (by **Mingkuan Xu [xumingkuan]**)
-  - Fixed frontend syntax error false positive (static range-fors) (by **Mingkuan Xu [xumingkuan]**)
-  - `arch=ti.arm64` is now supported. (Please build from source)
-  - CUDA supported on NVIDIA Jetson. (Please build from source)
-- (Mar 2, 2020) v0.5.5 released: **Experimental CUDA 10.0/10.1 support on Windows. Feedbacks are welcome!**
-- (Mar 1, 2020) v0.5.4 released
-  - Metal backend now supports < 32bit args (#530) (by **Ye Kuang [k-ye]**)
-  - Added `ti.imread/imwrite/imshow` for convenient image IO (by **Yubin Peng [archibate]**)
-  - `ti.GUI.set_image` now takes all numpy unsigned integer types (by **Yubin Peng [archibate]**)
-  - Bug fix: [Make sure KernelTemplateMapper extractors's size is the same as the number of args](https://github.com/taichi-dev/taichi/issues/534) (by **Ye Kuang [k-ye]**)
-  - [Avoid duplicate evaluations in chaining comparison (such as `1 < ti.append(...) < 3 < 4`)](https://github.com/taichi-dev/taichi/issues/540) (by **Mingkuan Xu [xumingkuan]**)
-  - Frontend kernel/function structure checking (#544) (by **Mingkuan Xu [xumingkuan]**)
-  - Throw exception instead of SIGABRT to obtain RuntimeError in Python-scope (by **Yubin Peng [archibate]**)
-  - Mark sync bit only after running a kernel on GPU (by **Ye Kuang [k-ye]**)
-  - `@ti.classkernel` is deprecated. Always use `ti.kernel`, no matter you are decorating a class member function or not (by **Ye Kuang [k-ye]**)
-  - Fix ti.func AST transform (due to locals() not saving compile result) #538, #539 (by **Yubin Peng [archibate]**)
-  - Add a KernelSimplicityASTChecker to ensure grad kernel is compliant (#553) (by **Ye Kuang [k-ye]**)
-  - Fixed MSVC C++ mangling which leads to unsupported characters in LLVM NVPTX ASM printer
-  - CUDA unified memory dependency is now removed. Set `TI_USE_UNIFIED_MEMORY=0` to disable unified memory usage
-  - Improved `ti.GUI.line` performance
-  - (For developers) compiler significantly refactored and folder structure reorganized
-- (Feb 25, 2020) v0.5.3 released
-  - Better error message when try to declare tensors after kernel invocation (by **Yubin Peng [archibate]**)
-  - Logging: `ti.warning` renamed to `ti.warn`
-  - Arch: `ti.x86_64` renamed to `ti.x64`. `ti.x86_64` is deprecated and will be removed in a future release
-  - (For developers) Improved runtime bit code compilation thread safety (by **Yubin Peng [archibate]**)
-  - Improved OS X GUI performance (by **Ye Kuang [k-ye]**)
-  - Experimental support for new integer types `u8, i8, u16, i16, u32` (by **Yubin Peng [archibate]**)
-  - Update doc (by **Ye Kuang [k-ye]**)
-- (Feb 20, 2020) v0.5.2 released
-  - Gradients for `ti.pow` now supported (by **Yubin Peng [archibate]**)
-  - Multi-threaded unit testing (by **Yubin Peng [archibate]**)
-  - Fixed Taichi crashing when starting multiple instances simultaneously (by **Yubin Peng [archibate]**)
-  - Metal backend now supports `ti.pow` (by **Ye Kuang [k-ye]**)
-  - Better algebraic simplification (by **Mingkuan Xu [xumingkuan]**)
-  - `ti.normalized` now optionally takes a argument `eps` to prevent division by zero in differentiable programming
-  - Improved random number generation by decorrelating PRNG streams on CUDA
-  - Set environment variable `TI_LOG_LEVEL` to `trace`, `debug`, `info`, `warn`, `error` to filter out/increase verbosity. Default=`info`
-  - [bug fix] fixed a loud failure on differentiable programming code generation due to a new optimization pass
-  - Added `ti.GUI.triangle` [example](https://github.com/taichi-dev/taichi/blob/master/misc/test_gui.py#L11)
-  - Doc update: added `ti.cross` for 3D cross products
-  - Use environment variable `TI_TEST_THREADS` to override testing threads
-  - [For Taichi developers, bug fix] `ti.init(print_processed=True)` renamed to `ti.init(print_preprocessed=True)`
-  - Various development infrastructure improvements by **Yubin Peng [archibate]**
-  - Official Python3.6 - Python3.8 packages on OS X (by **wYw [Detavern]**)
-- (Feb 16, 2020) v0.5.1 released
-  - Keyboard and mouse events supported in the GUI system. Check out [mpm128.py](https://github.com/taichi-dev/taichi/blob/4f5cc09ae0e35a47ad71fdc582c1ecd5202114d8/examples/mpm128.py) for a interactive demo! (by **Yubin Peng [archibate] and Ye Kuang [k-ye]**)
-  - Basic algebraic simplification passes (by **Mingkuan Xu [xumingkuan]**)
-  - (For developers) `ti` (`ti.exe`) command supported on Windows after setting `%PATH%` correctly (by **Mingkuan Xu [xumingkuan]**)
-  - General power operator `x ** y` now supported in Taichi kernels (by **Yubin Peng [archibate]**)
-  - `.dense(...).pointer()` now abbreviated as `.pointer(...)`. `pointer` now stands for a dense pointer array. This leads to cleaner code and better performance. (by **Kenneth Lozes [KLozes]**)
-  - (Advanced struct-fors only) `for i in X` now iterates all child instances of `X` instead of `X` itself. Skip this if you only use `X=leaf node` such as `ti.f32/i32/Vector/Matrix`.
-  - Fixed cuda random number generator racing conditions
-- (Feb 14, 2020) **v0.5.0 released with a new Apple Metal GPU backend for Mac OS X users!** (by **Ye Kuang [k-ye]**)
-  - Just initialize your program with `ti.init(..., arch=ti.metal)` and run Taichi on your Mac GPUs!
-  - A few takeaways if you do want to use the Metal backend:
-    - For now, the Metal backend only supports `dense` SNodes and 32-bit data types. It doesn't support `ti.random()` or `print()`.
-    - Pre-2015 models may encounter some undefined behaviors under certain conditions (e.g. read-after-write). According to our tests, it seems like the memory order on a single GPU thread could go inconsistent on these models.
-    - The `[]` operator in Python is slow in the current implementation. If you need to do a large number of reads, consider dumping all the data to a `numpy` array via `to_numpy()` as a workaround. For writes, consider first generating the data into a `numpy` array, then copying that to the Taichi variables as a whole.
-    - Do NOT expect a performance boost yet, and we are still profiling and tuning the new backend. (So far we only saw a big performance improvement on a 2015 MBP 13-inch model.)
-- [Full changelog](changelog.md)
+- [Full history](changelog.md)
 
-## Short-term goals
-- (Done) Fully implement the LLVM backend to replace the legacy source-to-source C++/CUDA backends (By Dec 2019)
-  - The only missing features compared to the old source-to-source backends:
-    - Vectorization on CPUs. Given most users who want performance are using GPUs (CUDA), this is given low priority.
-    - Automatic shared memory utilization. Postponed until Feb/March 2020.
-- (Done) Redesign & reimplement (GPU) memory allocator (by the end of Jan 2020)
-- (WIP) Tune the performance of the LLVM backend to match that of the legacy source-to-source backends (Hopefully by Feb, 2020. Current progress: setting up/tuning for final benchmarks)
 
 ## Related papers
 - [**(ICLR 2020) Differentiable Programming for Physical Simulation**](https://arxiv.org/abs/1910.00935) [[Video]](https://www.youtube.com/watch?v=Z1xvAZve9aE) [[BibTex]](https://raw.githubusercontent.com/yuanming-hu/taichi/master/misc/difftaichi_bibtex.txt) [[Code]](https://github.com/yuanming-hu/difftaichi)
diff --git a/changelog.md b/changelog.md
index 0fa22640aa858..a0965e1311d8c 100644
--- a/changelog.md
+++ b/changelog.md
@@ -1,4 +1,66 @@
 # Changelog
+- (Mar 3, 2020) v0.5.6 released
+  - Fixed runtime LLVM bitcode loading failure on Linux
+  - Fixed a GUI bug in `ti.GUI.line` (by **Mingkuan Xu [xumingkuan]**)
+  - Fixed frontend syntax error false positive (static range-fors) (by **Mingkuan Xu [xumingkuan]**)
+  - `arch=ti.arm64` is now supported. (Please build from source)
+  - CUDA supported on NVIDIA Jetson. (Please build from source)
+- (Mar 2, 2020) v0.5.5 released: **Experimental CUDA 10.0/10.1 support on Windows. Feedbacks are welcome!**
+- (Mar 1, 2020) v0.5.4 released
+  - Metal backend now supports < 32bit args (#530) (by **Ye Kuang [k-ye]**)
+  - Added `ti.imread/imwrite/imshow` for convenient image IO (by **Yubin Peng [archibate]**)
+  - `ti.GUI.set_image` now takes all numpy unsigned integer types (by **Yubin Peng [archibate]**)
+  - Bug fix: [Make sure KernelTemplateMapper extractors's size is the same as the number of args](https://github.com/taichi-dev/taichi/issues/534) (by **Ye Kuang [k-ye]**)
+  - [Avoid duplicate evaluations in chaining comparison (such as `1 < ti.append(...) < 3 < 4`)](https://github.com/taichi-dev/taichi/issues/540) (by **Mingkuan Xu [xumingkuan]**)
+  - Frontend kernel/function structure checking (#544) (by **Mingkuan Xu [xumingkuan]**)
+  - Throw exception instead of SIGABRT to obtain RuntimeError in Python-scope (by **Yubin Peng [archibate]**)
+  - Mark sync bit only after running a kernel on GPU (by **Ye Kuang [k-ye]**)
+  - `@ti.classkernel` is deprecated. Always use `ti.kernel`, no matter you are decorating a class member function or not (by **Ye Kuang [k-ye]**)
+  - Fix ti.func AST transform (due to locals() not saving compile result) #538, #539 (by **Yubin Peng [archibate]**)
+  - Add a KernelSimplicityASTChecker to ensure grad kernel is compliant (#553) (by **Ye Kuang [k-ye]**)
+  - Fixed MSVC C++ mangling which leads to unsupported characters in LLVM NVPTX ASM printer
+  - CUDA unified memory dependency is now removed. Set `TI_USE_UNIFIED_MEMORY=0` to disable unified memory usage
+  - Improved `ti.GUI.line` performance
+  - (For developers) compiler significantly refactored and folder structure reorganized
+- (Feb 25, 2020) v0.5.3 released
+  - Better error message when try to declare tensors after kernel invocation (by **Yubin Peng [archibate]**)
+  - Logging: `ti.warning` renamed to `ti.warn`
+  - Arch: `ti.x86_64` renamed to `ti.x64`. `ti.x86_64` is deprecated and will be removed in a future release
+  - (For developers) Improved runtime bit code compilation thread safety (by **Yubin Peng [archibate]**)
+  - Improved OS X GUI performance (by **Ye Kuang [k-ye]**)
+  - Experimental support for new integer types `u8, i8, u16, i16, u32` (by **Yubin Peng [archibate]**)
+  - Update doc (by **Ye Kuang [k-ye]**)
+- (Feb 20, 2020) v0.5.2 released
+  - Gradients for `ti.pow` now supported (by **Yubin Peng [archibate]**)
+  - Multi-threaded unit testing (by **Yubin Peng [archibate]**)
+  - Fixed Taichi crashing when starting multiple instances simultaneously (by **Yubin Peng [archibate]**)
+  - Metal backend now supports `ti.pow` (by **Ye Kuang [k-ye]**)
+  - Better algebraic simplification (by **Mingkuan Xu [xumingkuan]**)
+  - `ti.normalized` now optionally takes a argument `eps` to prevent division by zero in differentiable programming
+  - Improved random number generation by decorrelating PRNG streams on CUDA
+  - Set environment variable `TI_LOG_LEVEL` to `trace`, `debug`, `info`, `warn`, `error` to filter out/increase verbosity. Default=`info`
+  - [bug fix] fixed a loud failure on differentiable programming code generation due to a new optimization pass
+  - Added `ti.GUI.triangle` [example](https://github.com/taichi-dev/taichi/blob/master/misc/test_gui.py#L11)
+  - Doc update: added `ti.cross` for 3D cross products
+  - Use environment variable `TI_TEST_THREADS` to override testing threads
+  - [For Taichi developers, bug fix] `ti.init(print_processed=True)` renamed to `ti.init(print_preprocessed=True)`
+  - Various development infrastructure improvements by **Yubin Peng [archibate]**
+  - Official Python3.6 - Python3.8 packages on OS X (by **wYw [Detavern]**)
+- (Feb 16, 2020) v0.5.1 released
+  - Keyboard and mouse events supported in the GUI system. Check out [mpm128.py](https://github.com/taichi-dev/taichi/blob/4f5cc09ae0e35a47ad71fdc582c1ecd5202114d8/examples/mpm128.py) for a interactive demo! (by **Yubin Peng [archibate] and Ye Kuang [k-ye]**)
+  - Basic algebraic simplification passes (by **Mingkuan Xu [xumingkuan]**)
+  - (For developers) `ti` (`ti.exe`) command supported on Windows after setting `%PATH%` correctly (by **Mingkuan Xu [xumingkuan]**)
+  - General power operator `x ** y` now supported in Taichi kernels (by **Yubin Peng [archibate]**)
+  - `.dense(...).pointer()` now abbreviated as `.pointer(...)`. `pointer` now stands for a dense pointer array. This leads to cleaner code and better performance. (by **Kenneth Lozes [KLozes]**)
+  - (Advanced struct-fors only) `for i in X` now iterates all child instances of `X` instead of `X` itself. Skip this if you only use `X=leaf node` such as `ti.f32/i32/Vector/Matrix`.
+  - Fixed cuda random number generator racing conditions
+- (Feb 14, 2020) **v0.5.0 released with a new Apple Metal GPU backend for Mac OS X users!** (by **Ye Kuang [k-ye]**)
+  - Just initialize your program with `ti.init(..., arch=ti.metal)` and run Taichi on your Mac GPUs!
+  - A few takeaways if you do want to use the Metal backend:
+    - For now, the Metal backend only supports `dense` SNodes and 32-bit data types. It doesn't support `ti.random()` or `print()`.
+    - Pre-2015 models may encounter some undefined behaviors under certain conditions (e.g. read-after-write). According to our tests, it seems like the memory order on a single GPU thread could go inconsistent on these models.
+    - The `[]` operator in Python is slow in the current implementation. If you need to do a large number of reads, consider dumping all the data to a `numpy` array via `to_numpy()` as a workaround. For writes, consider first generating the data into a `numpy` array, then copying that to the Taichi variables as a whole.
+    - Do NOT expect a performance boost yet, and we are still profiling and tuning the new backend. (So far we only saw a big performance improvement on a 2015 MBP 13-inch model.)
 - (Feb 12, 2020) v0.4.6 released.
   - (For compiler developers) An error will be raised when `TAICHI_REPO_DIR` is not a valid path (by **Yubin Peng [archibate]**)
   - Fixed a CUDA backend deadlock bug
diff --git a/docs/version b/docs/version
index 659914ae9416f..416bfb0a2212b 100644
--- a/docs/version
+++ b/docs/version
@@ -1 +1 @@
-0.5.8
+0.5.9
diff --git a/misc/make_changelog.py b/misc/make_changelog.py
index 649e8b0b9a9bb..58c16955396c1 100644
--- a/misc/make_changelog.py
+++ b/misc/make_changelog.py
@@ -25,14 +25,14 @@ def format(c):
     'cuda': 'CUDA backend',
     'doc': 'Documentation',
     'infra': 'Infrastructure',
-    'ir': 'Intermediate Representation',
-    'lang': 'Language and Syntax',
+    'ir': 'Intermediate representation',
+    'lang': 'Language and syntax',
     'metal': 'Metal backend',
     'misc': 'Miscellaneous',
     'opt': 'Optimization',
 }
 
-print(f'-(, 2020) v{ver} released')
+print(f'- (, 2020) v{ver} released')
 for i, c in enumerate(commits):
     s = format(c)
     if s.startswith('[release]'):