2 changes: 1 addition & 1 deletion README.md
@@ -142,7 +142,7 @@ This example outputs:

### Auto detection of the instruction set extension to be used

The same computation operating on vectors and using the most performant instruction set available:
The same computation operating on vectors and using the most performant instruction set available at compile time, based on the provided compiler flags (e.g. ``-mavx2`` for GCC and Clang to target AVX2):

```cpp
#include <cstddef>
// ...
```
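
As a rough illustration of the idea (a sketch, not the README's exact listing; the `mean` function and the use of unaligned loads are illustrative choices), the same kernel written against `xsimd::batch<double>` picks up whatever instruction set the compiler flags enable:

```cpp
#include <cstddef>
#include <vector>
#include "xsimd/xsimd.hpp"

// xsimd::batch<double> maps to the widest instruction set enabled by the
// compiler flags (e.g. batches of 4 doubles with -mavx2), so the same source
// vectorizes differently depending on how it is compiled.
void mean(const std::vector<double>& a, const std::vector<double>& b,
          std::vector<double>& res)
{
    using batch = xsimd::batch<double>;
    std::size_t size = res.size();
    std::size_t vec_size = size - size % batch::size;

    for (std::size_t i = 0; i < vec_size; i += batch::size)
    {
        batch ba = batch::load_unaligned(&a[i]);
        batch bb = batch::load_unaligned(&b[i]);
        ((ba + bb) / 2.).store_unaligned(&res[i]);
    }
    for (std::size_t i = vec_size; i < size; ++i)  // scalar tail
        res[i] = (a[i] + b[i]) / 2.;
}
```
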
23 changes: 20 additions & 3 deletions docs/source/index.rst
@@ -12,15 +12,27 @@ C++ wrappers for SIMD intrinsics.
Introduction
------------

SIMD (Single Instruction, Multiple Data) is a feature of microprocessors that has been available for many years. SIMD instructions perform a single operation
`SIMD`_ (Single Instruction, Multiple Data) is a feature of microprocessors that has been available for many years. SIMD instructions perform a single operation
on a batch of values at once, and thus provide a way to significantly accelerate code execution. However, these instructions differ between microprocessor
vendors and compilers.

`xsimd` provides a unified means for library authors to use these features. Namely, it enables manipulation of batches of scalar and complex numbers with the same arithmetic
operators and common mathematical functions as for single values.

`xsimd` makes it easy to write a single algorithm, generate one version of the algorithm per micro-architecture and pick the best one at runtime, based on the
running processor capability.
There are several ways to use `xsimd`:

- one can write a generic, vectorized algorithm and compile it as part of their
application build, with the right architecture flag;

- one can write a generic, vectorized algorithm and compile several versions of
it by just changing the architecture flags, then pick the best version at
runtime;

- one can write a vectorized algorithm specialized for a given architecture and
still benefit from the high-level abstraction provided by `xsimd`.

Of course, nothing prevents combining several of those approaches; more on this
in the :ref:`Writing vectorized code` section. A short sketch of the pattern
underlying all three approaches follows.
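
All three approaches build on the same mechanism: the target architecture is a
template parameter of ``xsimd::batch``, so a kernel templated on it can be pinned
to one architecture, instantiated several times, or left to the compile-time
default. The ``scale`` kernel below is an illustrative sketch, not part of `xsimd`:

.. code-block:: cpp

    #include <cstddef>
    #include <vector>
    #include "xsimd/xsimd.hpp"

    // Illustrative kernel: the architecture is a template parameter, so the
    // same code can target the compile-time default, be instantiated once per
    // architecture, or be pinned to a specific one such as xsimd::avx2.
    template <class Arch>
    void scale(Arch, const std::vector<float>& in, std::vector<float>& out, float factor)
    {
        using batch = xsimd::batch<float, Arch>;
        std::size_t vec_size = in.size() - in.size() % batch::size;
        for (std::size_t i = 0; i < vec_size; i += batch::size)
            (batch::load_unaligned(&in[i]) * factor).store_unaligned(&out[i]);
        for (std::size_t i = vec_size; i < in.size(); ++i)  // scalar tail
            out[i] = in[i] * factor;
    }

    // scale(xsimd::default_arch{}, in, out, 2.f);  // best ISA enabled at compile time
    // scale(xsimd::avx2{}, in, out, 2.f);          // explicit choice (needs -mavx2)
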

You can find out more about this implementation of C++ wrappers for SIMD intrinsics on `The C++ Scientist`_ blog. The mathematical functions are a
lightweight implementation of the algorithms also used in `boost.SIMD`_.
@@ -52,6 +64,10 @@ The following SIMD instruction set extensions are supported:
+--------------+---------------------------------------------------------+
| WebAssembly | WASM |
+--------------+---------------------------------------------------------+
| RISC-V | Vector ISA |
+--------------+---------------------------------------------------------+
| PowerPC | VSX |
+--------------+---------------------------------------------------------+

Licensing
---------
@@ -104,6 +120,7 @@ This software is licensed under the BSD-3-Clause license. See the LICENSE file f



.. _SIMD: https://fr.wikipedia.org/wiki/Single_instruction_multiple_data
.. _The C++ Scientist: http://johanmabille.github.io/blog/archives/
.. _boost.SIMD: https://github.com/NumScale/boost.simd

6 changes: 6 additions & 0 deletions docs/source/vectorized_code.rst
@@ -69,5 +69,11 @@ as a template parameter:

.. literalinclude:: ../../test/doc/explicit_use_of_an_instruction_set_mean_arch_independent.cpp

Then you just need to ``#include`` that file, force its instantiation for a
specific architecture, and pass the appropriate flag to the compiler. For instance:

.. literalinclude:: ../../test/doc/sum_sse2.cpp


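The included file itself is not reproduced here. As a purely hypothetical
illustration of the pattern (the ``sum`` kernel, its signature, and the choice of
SSE2 are assumptions, not the content of the file above), an explicit
instantiation compiled with ``-msse2`` could look like:

.. code-block:: cpp

    #include <cstddef>
    #include "xsimd/xsimd.hpp"

    // Architecture-generic definition, normally placed in a shared header.
    template <class Arch>
    void sum(Arch, const float* data, std::size_t size, float& res)
    {
        using batch = xsimd::batch<float, Arch>;
        batch acc(0.f);
        std::size_t vec_size = size - size % batch::size;
        for (std::size_t i = 0; i < vec_size; i += batch::size)
            acc += batch::load_unaligned(&data[i]);
        res = xsimd::reduce_add(acc);  // horizontal sum of the batch
        for (std::size_t i = vec_size; i < size; ++i)
            res += data[i];
    }

    // Forced instantiation for SSE2, placed in a translation unit compiled
    // with -msse2 (GCC/Clang):
    template void sum<xsimd::sse2>(xsimd::sse2, const float*, std::size_t, float&);
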
This can be useful to implement runtime dispatching based on the instruction set detected on the running processor. `xsimd` provides generic machinery, :cpp:func:`xsimd::dispatch()`, to implement
this pattern. Based on the above example, instead of calling ``mean{}(arch, a, b, res, tag)``, one can use ``xsimd::dispatch(mean{})(a, b, res, tag)``. More about this can be found in the :ref:`Arch Dispatching` section.
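
For concreteness, here is a sketch of what such a functor and the dispatched call
can look like; only the ``xsimd::dispatch(mean{})(a, b, res, tag)`` call shape
comes from the documentation above, the functor body is illustrative:

.. code-block:: cpp

    #include <cstddef>
    #include <vector>
    #include "xsimd/xsimd.hpp"

    // Functor whose call operator takes the architecture as its first argument,
    // which is the shape xsimd::dispatch expects.
    struct mean
    {
        template <class Arch, class Tag>
        void operator()(Arch, const std::vector<double>& a, const std::vector<double>& b,
                        std::vector<double>& res, Tag) const
        {
            using batch = xsimd::batch<double, Arch>;
            std::size_t size = res.size();
            std::size_t vec_size = size - size % batch::size;
            for (std::size_t i = 0; i < vec_size; i += batch::size)
            {
                batch ba = batch::load(&a[i], Tag{});
                batch bb = batch::load(&b[i], Tag{});
                ((ba + bb) / 2.).store(&res[i], Tag{});
            }
            for (std::size_t i = vec_size; i < size; ++i)
                res[i] = (a[i] + b[i]) / 2.;
        }
    };

    // The dispatched callable probes the running CPU and forwards the call to
    // the best available architecture:
    // xsimd::dispatch(mean{})(a, b, res, xsimd::unaligned_mode{});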