From 7476aa9767364d27f15c94978da4ef8a64f823a6 Mon Sep 17 00:00:00 2001
From: Ingvar Stepanyan
Date: Mon, 19 Apr 2021 13:29:42 +0100
Subject: [PATCH] Update SIMD blog post (#545)

---
 src/features/simd.md | 142 ++++++++++++++++++++++++++++++++++++-------
 1 file changed, 119 insertions(+), 23 deletions(-)

diff --git a/src/features/simd.md b/src/features/simd.md
index aac590a92..3aa7d7710 100644
--- a/src/features/simd.md
+++ b/src/features/simd.md
@@ -1,8 +1,8 @@
 ---
 title: 'Fast, parallel applications with WebAssembly SIMD'
-author: 'Deepti Gandluri ([@dptig](https://twitter.com/dptig)), Thomas Lively ([@tlively52](https://twitter.com/tlively52))'
+author: 'Deepti Gandluri ([@dptig](https://twitter.com/dptig)), Thomas Lively ([@tlively52](https://twitter.com/tlively52)), Ingvar Stepanyan ([@RReverser](https://twitter.com/RReverser))'
 date: 2020-01-30
-updated: 2020-06-09
+updated: 2021-04-19
 tags:
   - WebAssembly
 description: 'Bringing vector operations to WebAssembly'
@@ -18,28 +18,67 @@ The high-level goal of the WebAssembly SIMD proposal is to introduce vector oper

 The set of SIMD instructions is large, and varied across architectures. The set of operations included in the WebAssembly SIMD proposal consist of operations that are well supported on a wide variety of platforms, and are proven to be performant. To this end, the current proposal is limited to standardizing Fixed-Width 128-bit SIMD operations.

-The current proposal introduces a new v128 value type, and a number of new operations that operate on this type. The criteria used to determine these operations are:
+The current proposal introduces a new `v128` value type, and a number of new operations that operate on this type. The criteria used to determine these operations are:

 - The operations should be well supported across multiple modern architectures.
 - Performance wins should be positive across multiple relevant architectures within an instruction group.
 - The chosen set of operations should minimize performance cliffs if any.

-The proposal is in active development, both V8 and the toolchain have working prototype implementations for experimentation. As these are prototype implementations, they are subject to change as new operations are added to the proposal.
+The proposal is now in a [finalized state (phase 4)](https://github.com/WebAssembly/simd/issues/480), and both V8 and the toolchain have working implementations.

-## Using WebAssembly SIMD
+## Enabling SIMD support
+
+### Feature detection
+
+First of all, note that SIMD is a new feature and isn't yet available in all browsers with WebAssembly support. You can find which browsers support new WebAssembly features on the [webassembly.org](https://webassembly.org/roadmap/) website.
+
+To ensure that all users can load your application, you'll need to build two different versions - one with SIMD enabled and one without it - and load the corresponding version depending on feature detection results. To detect SIMD at runtime, you can use the [`wasm-feature-detect`](https://github.com/GoogleChromeLabs/wasm-feature-detect) library and load the corresponding module like this:
+
+```js
+import { simd } from 'wasm-feature-detect';
+
+(async () => {
+  const hasSIMD = await simd();
+  const module = await (
+    hasSIMD
+      ? import('./module-with-simd.js')
+      : import('./module-without-simd.js')
+  );
+  // …now use `module` as you normally would
+})();
+```
+
+To learn about building code with SIMD support, check the section [below](#building-with-simd-support).

 ### Enabling experimental SIMD support in Chrome

-WebAssembly SIMD support is prototyped behind a flag in Chrome, to try out the SIMD support on the browser, pass `--enable-features=WebAssemblySimd`, or toggle the "WebAssembly SIMD support" flag in `chrome://flags`. This work is bleeding edge, and continuously being worked on. To minimize the chances of breakage, please use the latest version of the toolchain as detailed below, and a recent Chrome Canary. If something doesn’t look right, please [file a bug](https://crbug.com/v8).
+WebAssembly SIMD support will be available by default from Chrome 91, while on older versions it's gated behind a flag. To try out the SIMD support in stable Chrome, pass `--enable-features=WebAssemblySimd`, or toggle the "WebAssembly SIMD support" flag in `chrome://flags`. Make sure to use the latest version of the toolchain as detailed below, and a recent Chrome Canary. If something doesn’t look right, please [file a bug](https://crbug.com/v8).

-### Building C / C++ to target SIMD
+WebAssembly SIMD is also available as an origin trial in Chrome versions 84-90. Origin trials allow developers to experiment with a feature on the chosen origin, and provide valuable feedback. Once an origin trial token has been registered, the trial users are opted into the feature for the duration of the trial period without having to update Chrome flags.
+
+To try this out, read the [origin trial developer guide](https://github.com/GoogleChrome/OriginTrials/blob/gh-pages/developer-guide.md), and [register for an origin trial token](https://developers.chrome.com/origintrials/#/view_trial/-4708513410415853567). More information about origin trials can be found in the [FAQ](https://github.com/GoogleChrome/OriginTrials/blob/gh-pages/developer-guide.md#faq). Please file a [bug](https://bugs.chromium.org/p/v8/issues/entry) if something isn't working as you expect. The origin trial is compatible with Emscripten versions 2.0.17 onwards.
+
+### Enabling experimental SIMD support in Firefox

-WebAssembly’s SIMD support depends on using a recent build of clang with the WebAssembly LLVM backend enabled. Emscripten has support for the WebAssembly SIMD proposal as well. Install and activate the latest-upstream distribution of emscripten using [emsdk](https://emscripten.org/docs/getting_started/downloads.html) to use the bleeding edge SIMD features.
+WebAssembly SIMD is available behind a flag in Firefox. Currently it's supported only on x86 and x86-64 architectures. To try out the SIMD support in Firefox, go to `about:config` and enable `javascript.options.wasm_simd`. Note that this feature is still experimental and being worked on.
+
+### Enabling experimental SIMD support in Node.js
+
+In Node.js, WebAssembly SIMD can be enabled via the `--experimental-wasm-simd` flag:

 ```bash
-./emsdk install latest-upstream
+node --experimental-wasm-simd main.js
+```

-./emsdk activate latest-upstream
+## Building with SIMD support
+
+### Building C / C++ to target SIMD
+
+WebAssembly’s SIMD support depends on using a recent build of clang with the WebAssembly LLVM backend enabled. Emscripten has support for the WebAssembly SIMD proposal as well. Install and activate the `latest` distribution of emscripten using [emsdk](https://emscripten.org/docs/getting_started/downloads.html) to use the bleeding edge SIMD features.
+
+```bash
+./emsdk install latest
+./emsdk activate latest
 ```

 There are a couple of different ways to enable generating SIMD code when porting your application to use SIMD. Once the latest upstream emscripten version has been installed, compile using emscripten, and pass the `-msimd128` flag to enable SIMD.
@@ -110,22 +149,85 @@ void multiply_arrays(int* out, int* in_a, int* in_b, int size) {

 This manually rewritten code assumes that the input and output arrays are aligned and do not alias and that size is a multiple of four. The autovectorizer cannot make these assumptions and has to generate extra code to handle the cases where they are not true, so hand-written SIMD code often ends up being smaller than autovectorized SIMD code.

+### Cross-compiling existing C / C++ projects
+
+Many existing projects already support SIMD when targeting other platforms, in particular [SSE](https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions) and [AVX](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions) instructions on x86 / x86-64 platforms and [NEON](https://en.wikipedia.org/wiki/ARM_architecture#Advanced_SIMD_(Neon)) instructions on ARM platforms. There are two ways those are usually implemented.
+
+The first one is via assembly files that take care of SIMD operations and are linked together with C / C++ during the build process. The assembly syntax and instructions are highly platform-dependent and not portable, so, to make use of SIMD, such projects need to add WebAssembly as an additional supported target and reimplement corresponding functions using either [WebAssembly text format](https://webassembly.github.io/spec/core/text/index.html) or intrinsics described [above](#building-c-/-c++-to-target-simd).
+
+Another common approach is to use SSE / SSE2 / AVX / NEON intrinsics directly from C / C++ code and here Emscripten can help. Emscripten [provides compatible headers and an emulation layer](https://emscripten.org/docs/porting/simd.html) for all those instruction sets; the emulation layer compiles them directly to Wasm intrinsics where possible, or to scalarized code otherwise.
+
+To cross-compile such projects, first enable SIMD via project-specific configuration flags, e.g. `./configure --enable-simd` so that it passes `-msse`, `-msse2`, `-mavx` or `-mfpu=neon` to the compiler and calls corresponding intrinsics. Then, additionally pass `-msimd128` to enable WebAssembly SIMD too, either by using `CFLAGS=-msimd128 make …` / `CXXFLAGS=-msimd128 make …` or by modifying the build config directly when targeting Wasm.
+
+### Building Rust to target SIMD
+
+When compiling Rust code to target WebAssembly SIMD, you'll need to enable the same `simd128` LLVM feature as in Emscripten above.
+
+If you can control `rustc` flags directly or via the environment variable `RUSTFLAGS`, pass `-C target-feature=+simd128`:
+
+```bash
+rustc … -C target-feature=+simd128 -o out.wasm
+```
+
+or
+
+```bash
+RUSTFLAGS="-C target-feature=+simd128" cargo build
+```
+
+Like in Clang / Emscripten, LLVM’s autovectorizers are enabled by default for optimized code when the `simd128` feature is enabled.
+
+For example, the Rust equivalent of the `multiply_arrays` example above
+
+```rust
+pub fn multiply_arrays(out: &mut [i32], in_a: &[i32], in_b: &[i32]) {
+    in_a.iter()
+        .zip(in_b)
+        .zip(out)
+        .for_each(|((a, b), dst)| {
+            *dst = a * b;
+        });
+}
+```
+
+would produce similar autovectorized code for the aligned part of the inputs.
+
+In order to have manual control over the SIMD operations, you can use the nightly toolchain, enable the Rust feature `wasm_simd` and invoke the intrinsics from the [`std::arch::wasm32`](https://doc.rust-lang.org/stable/core/arch/wasm32/index.html#simd) namespace directly:
+
+```rust
+#![feature(wasm_simd)]
+
+use std::arch::wasm32::*;
+
+pub unsafe fn multiply_arrays(out: &mut [i32], in_a: &[i32], in_b: &[i32]) {
+    in_a.chunks(4)
+        .zip(in_b.chunks(4))
+        .zip(out.chunks_mut(4))
+        .for_each(|((a, b), dst)| {
+            let a = v128_load(a.as_ptr() as *const v128);
+            let b = v128_load(b.as_ptr() as *const v128);
+            let prod = i32x4_mul(a, b);
+            v128_store(dst.as_mut_ptr() as *mut v128, prod);
+        });
+}
+```
+
+Alternatively, use a helper crate like [`packed_simd`](https://crates.io/crates/packed_simd_2) that abstracts over SIMD implementations on various platforms.
+
 ## Compelling use cases

 The WebAssembly SIMD proposal seeks to accelerate high compute applications like audio/video codecs, image processing applications, cryptographic applications, etc. Currently WebAssembly SIMD is experimentally supported in widely used open source projects like [Halide](https://github.com/halide/Halide/blob/master/README_webassembly.md), [OpenCV.js](https://docs.opencv.org/3.4/d5/d10/tutorial_js_root.html), and [XNNPACK](https://github.com/google/XNNPACK).

 Some interesting demos come from the [MediaPipe project](https://github.com/google/mediapipe) by the Google Research team.

-As per their description, MediaPipe is a framework for building multimodal (eg. video, audio, any time series data) applied ML pipelines. And they have a [Web version](https://mediapipe.page.link/web), too!
+As per their description, MediaPipe is a framework for building multimodal (e.g. video, audio, any time series data) applied ML pipelines. And they have a [Web version](https://developers.googleblog.com/2020/01/mediapipe-on-web.html), too!

-One of the most visually appealing demos where it’s easy to observe the difference in performance SIMD makes, is a following hand-tracking system. Without SIMD, you can get only around 3 frames per second on a modern laptop, while with SIMD enabled you get a much smoother experience at 15-16 frames per second.
+One of the most visually appealing demos, where it’s easy to observe the difference in performance SIMD makes, is a CPU-only (non-GPU) build of a hand-tracking system. [Without SIMD](https://storage.googleapis.com/aim-bucket/users/tmullen/demos_10_2019_cdc/rebuild_04_2021/mediapipe_handtracking/gl_graph_demo.html), you can get only around 14-15 FPS (frames per second) on a modern laptop, while [with SIMD enabled in Chrome Canary](https://storage.googleapis.com/aim-bucket/users/tmullen/demos_10_2019_cdc/rebuild_04_2021/mediapipe_handtracking_simd/gl_graph_demo.html) you get a much smoother experience at 38-40 FPS.
-Visit the [demo](https://pursuit.page.link/MediaPipeHandTrackingSimd) in Chrome Canary with SIMD enabled to try it!
-
 Another interesting set of demos that makes use of SIMD for smooth experience, come from OpenCV - a popular computer vision library that can also be compiled to WebAssembly. They’re available by [link](https://bit.ly/opencv-camera-demos), or you can check out the pre-recorded versions below:
@@ -143,14 +245,8 @@ Another interesting set of demos that makes use of SIMD for smooth experience, c
Emoji replacement
-## SIMD Origin Trial
-
-The WebAssembly SIMD origin trial is available for experimentation in Chrome versions 84-86. Origin trials allow developers to experiment with a feature, and provide valuable feedback. Once an origin trial token has been registered, the trial users are opted into the feature for the duration of the trial period without having to update Chrome flags.
-
-To try this out, read the [origin trial developer guide](https://github.com/GoogleChrome/OriginTrials/blob/gh-pages/developer-guide.md), and [register for an origin trial token](https://developers.chrome.com/origintrials/#/view_trial/-4708513410415853567). More information about origin trials can be found in the [FAQ](https://github.com/GoogleChrome/OriginTrials/blob/gh-pages/developer-guide.md#faq), please file a [bug](https://bugs.chromium.org/p/v8/issues/entry) if something isn't working as you expect. The origin trial is compatible with emscripten versions 1.39.15 onwards.
-
-Ongoing experimental support is available on a recent Chrome Canary as detailed [above](#using-webassembly-simd), with the use of latest-upstream Emscripten toolchain.
-
 ## Future work

-The current SIMD proposal is in [Phase 3](https://github.com/WebAssembly/meetings/blob/master/process/phases.md#3-implementation-phase-community--working-group), so the future work here is to push the proposal forward in the standardization process. Fixed width SIMD gives significant performance gains over scalar, but it doesn’t effectively leverage wider width vector operations that are available in modern hardware. As the current proposal moves forward, some future facing work here is to determine the feasibility of extending the proposal with longer width operations.
+The current fixed-width SIMD proposal is in [Phase 4](https://github.com/WebAssembly/meetings/blob/master/process/phases.md#3-implementation-phase-community--working-group), so it's considered complete.
+
+Some explorations of future SIMD extensions have started in the [Relaxed SIMD](https://github.com/WebAssembly/relaxed-simd) and [Flexible Vectors](https://github.com/WebAssembly/flexible-vectors) proposals, which, at the time of writing, are in Phase 1.