From 19a6d459c71da1641344f04703894d91db335682 Mon Sep 17 00:00:00 2001
From: serge-sans-paille
Date: Sat, 15 Nov 2025 18:18:39 +0100
Subject: [PATCH] Improve and update documentation

Update the list of supported architectures and make the various usage
scenarios more explicit.

Fix #1202
---
 README.md                       |  2 +-
 docs/source/index.rst           | 23 ++++++++++++++++++++---
 docs/source/vectorized_code.rst |  6 ++++++
 3 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 74bf0f7c4..0ff4ad6e6 100644
--- a/README.md
+++ b/README.md
@@ -142,7 +142,7 @@ This example outputs:
 
 ### Auto detection of the instruction set extension to be used
 
-The same computation operating on vectors and using the most performant instruction set available:
+The same computation operating on vectors and using the most performant instruction set available at compile time, based on the provided compiler flags (e.g. ``-mavx2`` for GCC and Clang to target AVX2):
 
 ```cpp
 #include
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 6e71a9cbe..63bdbbe64 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -12,15 +12,27 @@ C++ wrappers for SIMD intrinsics.
 Introduction
 ------------
 
-SIMD (Single Instruction, Multiple Data) is a feature of microprocessors that has been available for many years. SIMD instructions perform a single operation
+`SIMD`_ (Single Instruction, Multiple Data) is a feature of microprocessors that has been available for many years. SIMD instructions perform a single operation
 on a batch of values at once, and thus provide a way to significantly accelerate code execution. However, these instructions differ between microprocessor
 vendors and compilers.
 
 `xsimd` provides a unified means for using these features for library authors. Namely, it enables manipulation of batches of scalar and complex numbers with
 the same arithmetic operators and common mathematical functions as for single values.
 
-`xsimd` makes it easy to write a single algorithm, generate one version of the algorithm per micro-architecture and pick the best one at runtime, based on the
-running processor capability.
+There are several ways to use `xsimd`:
+
+- one can write a generic, vectorized algorithm and compile it as part of the
+  application build, with the right architecture flag;
+
+- one can write a generic, vectorized algorithm and compile several versions of
+  it just by changing the architecture flags, then pick the best version at
+  runtime;
+
+- one can write a vectorized algorithm specialized for a given architecture and
+  still benefit from the high-level abstraction provided by `xsimd`.
+
+Of course, nothing prevents combining several of these approaches; more on
+this in the :ref:`Writing vectorized code` section.
 
 You can find out more about this implementation of C++ wrappers for SIMD intrinsics at the `The C++ Scientist`_. The mathematical functions are a lightweight
 implementation of the algorithms also used in `boost.SIMD`_.
@@ -52,6 +64,10 @@ The following SIMD instruction set extensions are supported:
 +--------------+---------------------------------------------------------+
 | WebAssembly  | WASM                                                    |
 +--------------+---------------------------------------------------------+
+| RISC-V       | Vector ISA                                              |
++--------------+---------------------------------------------------------+
+| PowerPC      | VSX                                                     |
++--------------+---------------------------------------------------------+
 
 Licensing
 ---------
@@ -104,6 +120,7 @@ This software is licensed under the BSD-3-Clause license. See the LICENSE file f
 
 
 
+.. _SIMD: https://fr.wikipedia.org/wiki/Single_instruction_multiple_data
 .. _The C++ Scientist: http://johanmabille.github.io/blog/archives/
 .. _boost.SIMD: https://github.com/NumScale/boost.simd
 
diff --git a/docs/source/vectorized_code.rst b/docs/source/vectorized_code.rst
index 18fcf8524..d536fa816 100644
--- a/docs/source/vectorized_code.rst
+++ b/docs/source/vectorized_code.rst
@@ -69,5 +69,11 @@ as a template parameter:
 .. literalinclude:: ../../test/doc/explicit_use_of_an_instruction_set_mean_arch_independent.cpp
 
+Then you just need to ``#include`` that file, force its instantiation for a
+specific architecture, and pass the appropriate flag to the compiler. For instance:
+
+.. literalinclude:: ../../test/doc/sum_sse2.cpp
+
+
 This can be useful to implement runtime dispatching, based on the instruction set detected at runtime. `xsimd` provides a generic machinery
 :cpp:func:`xsimd::dispatch()` to implement this pattern. Based on the above example, instead of calling ``mean{}(arch, a, b, res, tag)``, one can use
 ``xsimd::dispatch(mean{})(a, b, res, tag)``. More about this can be found in the :ref:`Arch Dispatching` section.
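For context, here is a rough sketch of the pattern the new paragraph in `docs/source/vectorized_code.rst` points at: a generic kernel declared in a header, and a separate translation unit that forces its instantiation for one architecture and is compiled with the matching flag. The `sum.hpp` header, the kernel body and the build flags below are illustrative assumptions; only the `sum_sse2.cpp` file name comes from the patch, and the actual content of `test/doc/sum_sse2.cpp` may differ.

```cpp
// sum.hpp (hypothetical): a generic kernel templated on the target architecture.
// The operator is declared inside the class but defined below, outside the class
// body. In a full multi-architecture build one would also add an
//   extern template float sum::operator()(xsimd::sse2, float const*, unsigned);
// declaration per architecture, so that only the dedicated translation unit
// emits each architecture-specific symbol.
#ifndef SUM_HPP
#define SUM_HPP

#include "xsimd/xsimd.hpp"

struct sum
{
    template <class Arch, class T>
    T operator()(Arch, T const* data, unsigned size);
};

template <class Arch, class T>
T sum::operator()(Arch, T const* data, unsigned size)
{
    using batch = xsimd::batch<T, Arch>;
    unsigned const inc = static_cast<unsigned>(batch::size);
    unsigned const vec_size = size - size % inc;
    batch acc(static_cast<T>(0));
    // Vectorized loop: one SIMD addition per batch of values.
    for (unsigned i = 0; i < vec_size; i += inc)
        acc += batch::load_unaligned(data + i);
    // Horizontal reduction of the accumulator, then a scalar tail loop.
    T res = xsimd::reduce_add(acc);
    for (unsigned i = vec_size; i < size; ++i)
        res += data[i];
    return res;
}

#endif
```

```cpp
// sum_sse2.cpp (sketch): compiled with the matching flag (e.g. -msse2 for GCC
// and Clang); the explicit instantiation below is the SSE2 symbol the rest of
// the program links against.
#include "sum.hpp"

template float sum::operator()(xsimd::sse2, float const*, unsigned);
```

Building one such translation unit per target (`sum_sse2.cpp`, a hypothetical `sum_avx2.cpp`, and so on), each with its own architecture flag, is what makes the runtime selection through `xsimd::dispatch()` mentioned at the end of the patch possible: every architecture-specific instantiation is present in the final binary, and the dispatcher picks among them based on the CPU detected at runtime.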