tarantool · patiencedaur · Aug 24, 2021 · Jul 14, 2021 · Jul 15, 2021 · Jul 29, 2021
diff --git a/conf.py b/conf.py
@@ -62,7 +62,6 @@
     'book/connectors/__*',
     'book/replication/*_1.rst',
     'book/replication/*_2.rst',
-    'book/box/engines/vinyl.rst',
     'getting_started/using_package_manager.rst',
     'getting_started/using_docker.rst',
     'dev_guide/box_protocol.rst',

diff --git a/doc/book/box/engines/index.rst b/doc/book/box/engines/index.rst
@@ -1,44 +1,34 @@
 .. _engines-chapter:
 
-********************************************************************************
 Storage engines
-********************************************************************************
+===============
 
-A storage engine is a set of very-low-level routines which actually store and
-retrieve tuple values. Tarantool offers a choice of two storage engines:
+A storage engine is a set of low-level routines which actually store and
+retrieve :term:`tuple <tuple>` values. Tarantool offers a choice of two storage engines:
 
-* memtx (the in-memory storage engine) is the default and was the first to
-  arrive.
+*   :doc:`memtx <memtx>` is the in-memory storage engine used by default.
+*   :doc:`vinyl <vinyl>` is the on-disk storage engine.
 
-* vinyl (the on-disk storage engine) is a working key-value engine and will
-  especially appeal to users who like to see data go directly to disk, so that
-  recovery time might be shorter and database size might be larger.
+Below is a brief comparison of the two engines.
+For details on how each engine works, see the dedicated
+sections:
 
-  On the other hand, vinyl lacks some functions and options that are available
-  with memtx. Where that is the case, the relevant description in this manual
-  contains a note beginning with the words "Note re storage engine".
+.. toctree::
+   :maxdepth: 1
 
-Further in this section we discuss the details of storing data using
-the vinyl storage engine.
-
-To specify that the engine should be vinyl, add the clause ``engine = 'vinyl'``
-when creating a space, for example:
-
-.. code-block:: lua
-
-    space = box.schema.space.create('name', {engine='vinyl'})
+   memtx
+   vinyl
 
 .. _vinyl_diff:
 
-================================================================================
-Differences between memtx and vinyl storage engines
-================================================================================
+Difference between memtx and vinyl storage engines
+--------------------------------------------------
 
-The primary difference between memtx and vinyl is that memtx is an "in-memory"
-engine while vinyl is an "on-disk" engine. An in-memory storage engine is
+The primary difference between memtx and vinyl is that memtx is an in-memory
+engine while vinyl is an on-disk engine. An in-memory storage engine is
 generally faster (each query is usually run under 1 ms), and the memtx engine
-is justifiably the default for Tarantool, but on-disk engine such as vinyl is
-preferable when the database is larger than the available memory and adding more
+is justifiably the default for Tarantool. But an on-disk engine such as vinyl is
+preferable when the database is larger than the available memory, and adding more
 memory is not a realistic option.
 
 .. container:: table
@@ -69,5 +59,3 @@ memory is not a realistic option.
     | yield                                       | Does not yield on the select requests unless the     | Yields on the select requests or on its equivalents: |
     |                                             | transaction is committed to WAL                      | get() or pairs()                                     |
     +---------------------------------------------+------------------------------------------------------+------------------------------------------------------+
-
-.. include:: vinyl.rst
diff --git a/doc/book/box/engines/memtx.rst b/doc/book/box/engines/memtx.rst
@@ -0,0 +1,161 @@
+.. _engines-memtx:
+
+Storing data with memtx
+=======================
+
+The ``memtx`` storage engine is used in Tarantool by default. It keeps all data in random-access memory (RAM), and therefore has very low read latency.
+
+The obvious question here is:
+if all the data is stored in memory, how can you prevent data loss in an emergency such as an outage or a Tarantool instance failure?
+
+First of all, Tarantool persists all data changes by writing requests to the write-ahead log (WAL) that is stored on disk.
+Read more about that in the :ref:`memtx-persist` section.
+In a distributed application, you can also use synchronous replication, which keeps the data consistent on a quorum of replicas.
+Although replication is not directly a storage engine topic, it is a part of the answer regarding data safety. Read more in the :ref:`memtx-replication` section.
+
+This chapter briefly covers the following topics and refers to other chapters that explain them in detail.
+
+..  contents::
+    :local:
+    :depth: 1
+
+.. _memtx-memory:
+
+Memory model
+------------
+
+There is a fixed number of independent :ref:`execution threads <atomic-threads_fibers_yields>`.
+The threads don't share state. Instead they exchange data using low-overhead message queues.
+While this approach limits the number of cores that the instance uses,
+it removes competition for the memory bus and ensures peak scalability of memory access and network throughput.
+
+Only one thread, namely, the **transaction processor thread** (hereafter, **TX thread**)
+can access the database, and there is only one TX thread for each Tarantool instance.
+In this thread, transactions are executed in a strictly consecutive order.
+Multi-statement transactions exist to provide isolation:
+each transaction sees a consistent database state and commits all its changes atomically.
+At commit time, a yield happens and all transaction changes are written to :ref:`WAL <internals-wal>` in a single batch.
+In case of errors during transaction execution, the transaction is rolled back completely.
+Read more in the following sections: :ref:`atomic-transactions`, :ref:`atomic-transactional-manager`.
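+
+The following is a minimal sketch of a multi-statement transaction, not a definitive implementation.
+It assumes that ``box.cfg{}`` has already been called and that a hypothetical space ``accounts``
+with the fields ``{id, balance}`` and a primary index exists. The space and the ``transfer`` function
+are illustrative names only.
+
+.. code-block:: lua
+
+    -- Hypothetical helper: move `amount` between two tuples atomically.
+    local function transfer(from_id, to_id, amount)
+        box.begin()
+        -- Field 2 is assumed to hold the balance.
+        box.space.accounts:update(from_id, {{'-', 2, amount}})
+        box.space.accounts:update(to_id, {{'+', 2, amount}})
+        -- At commit, the fiber yields while the changes are written to the WAL.
+        box.commit()
+    end
+
+    -- pcall catches an error raised inside the transaction;
+    -- box.rollback() then discards the statements executed so far.
+    local ok, err = pcall(transfer, 1, 2, 100)
+    if not ok then
+        box.rollback()
+        print('transfer failed: ' .. tostring(err))
+    end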
+
+Within the TX thread, there is a memory area allocated for Tarantool to store data. It's called **Arena**.
+
+.. image:: memtx/arena2.svg
+
+Data is stored in :term:`spaces <space>`. Spaces contain database records—:term:`tuples <tuple>`.
+To access and manipulate the data stored in spaces and tuples, Tarantool builds :doc:`indexes </book/box/indexes>`.
+
+Special `allocators <https://github.com/tarantool/small>`__ manage memory allocations for spaces, tuples, and indexes within the Arena.
+The slab allocator is the main allocator used to store tuples.
+Tarantool has a built-in module called ``box.slab`` which provides the slab allocator statistics
+that can be used to monitor the total memory usage and memory fragmentation.
+For details, see the ``box.slab`` module :doc:`reference </reference/reference_lua/box_slab>`.
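+
+As a rough illustration, the slab allocator statistics can be read from the console like this.
+The sketch assumes a running instance where ``box.cfg{}`` has been called; which fields to watch
+is a matter of choice, and the ones below are only an example.
+
+.. code-block:: lua
+
+    -- Aggregated statistics for the whole Arena.
+    local info = box.slab.info()
+
+    -- Memory used for tuples and indexes versus the memory allocated for them.
+    print('arena_used:', info.arena_used, 'arena_size:', info.arena_size)
+
+    -- Share of the memory quota that is already in use.
+    print('quota_used_ratio:', info.quota_used_ratio)
+
+    -- A low items_used_ratio can be a sign of memory fragmentation.
+    print('items_used_ratio:', info.items_used_ratio)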
+
+.. image:: memtx/spaces_indexes.svg
+
+Also inside the TX thread, there is an event loop. Within the event loop, there are a number of :ref:`fibers <fiber-fibers>`.
+Fibers are cooperative primitives that allow interaction with spaces, that is, reading and writing the data.
+Fibers can interact with the event loop and with each other directly or by using special primitives called channels.
+Due to the usage of fibers and :ref:`cooperative multitasking <atomic-cooperative_multitasking>`, the ``memtx`` engine is lock-free in typical situations.
+
+.. image:: memtx/fibers-channels.svg
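+
+A minimal sketch of two fibers exchanging messages through a channel is shown below.
+It uses the built-in ``fiber`` module; the message count and the channel capacity are arbitrary values
+chosen for the example.
+
+.. code-block:: lua
+
+    local fiber = require('fiber')
+
+    -- A channel that buffers at most one message.
+    local channel = fiber.channel(1)
+
+    -- Consumer: get() yields until a message is available,
+    -- so other fibers can run in the meantime.
+    fiber.create(function()
+        for _ = 1, 3 do
+            local value = channel:get()
+            print('received', value)
+        end
+    end)
+
+    -- Producer: put() yields when the channel is full.
+    fiber.create(function()
+        for i = 1, 3 do
+            channel:put(i)
+        end
+    end)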
+
+To interact with external users, there is a separate :ref:`network thread <atomic-threads_fibers_yields>` also called the **iproto thread**.
+The iproto thread receives a request from the network, parses and checks the statement,
+and transforms it into a special structure—a message containing an executable statement and its options.
+Then the iproto thread ships this message to the TX thread and runs the user's request in a separate fiber.
+
+.. image:: memtx/iproto.svg
+
+.. _memtx-persist:
+
+Data persistence
+----------------
+
+To ensure :ref:`data persistence <index-box_persistence>`, Tarantool does two things.
+
+*   After executing data change requests in memory, Tarantool writes each such request to the :ref:`write-ahead log (WAL) <internals-wal>` files (``.xlog``)
+    that are stored on disk. Tarantool does this via a separate thread called the **WAL thread**.
+
+.. image:: memtx/wal.svg
+
+*   Tarantool periodically takes the entire :doc:`database snapshot </reference/reference_lua/box_snapshot>` and saves it on disk.
+    This speeds up instance restart: when there are too many WAL files, it can be difficult for Tarantool to restart quickly.
+
+    To save a snapshot, there is a special fiber called the **snapshot daemon**.
+    It reads the consistent content of the entire Arena and writes it to disk into a snapshot file (``.snap``).
+    Because of cooperative multitasking, Tarantool cannot write directly to disk, since that is a blocking operation.
+    That is why Tarantool interacts with disk via a separate pool of threads from the :doc:`fio </reference/reference_lua/fio>` library.
+
+.. image:: memtx/snapshot03.svg
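+
+As an illustration of the snapshot mechanism described above, the sketch below sets how often snapshots are taken
+and then requests one manually. It assumes an already configured instance; the values are examples only
+and are not a recommendation.
+
+.. code-block:: lua
+
+    -- Take a snapshot once an hour and keep the two most recent ones.
+    box.cfg{
+        checkpoint_interval = 3600,
+        checkpoint_count = 2,
+    }
+
+    -- Request a snapshot right now instead of waiting for the daemon.
+    box.snapshot()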
+
+So, even in emergency situations such as an outage or a Tarantool instance failure,
+when the in-memory database is lost, the data can be restored fully during Tarantool restart.
+
+What happens during the restart:
+
+1.  Tarantool finds the latest snapshot file and reads it.
+2.  Tarantool finds all the WAL files created after that snapshot and reads them as well.
+3.  When the snapshot and WAL files have been read, there is a fully recovered in-memory data set
+    corresponding to the state when the Tarantool instance stopped.
+4.  While reading the snapshot and WAL files, Tarantool builds the primary indexes.
+5.  When all the data is in memory again, Tarantool builds the secondary indexes.
+6.  Tarantool runs the application.
+
+.. _memtx-indexes:
+
+Accessing data
+--------------
+
+To access and manipulate the data stored in memory, Tarantool builds indexes.
+Indexes are also stored in memory within the Arena.
+
+Tarantool supports a number of :ref:`index types <index-types>` intended for different usage scenarios.
+The possible types are TREE, HASH, BITSET, and RTREE.
+
+Select queries are possible against secondary index keys as well as primary keys.
+Indexes can have multi-part keys.
+
+For detailed information about indexes, refer to the :doc:`/book/box/indexes` page.
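+
+A minimal sketch of a primary index and a multi-part secondary index follows.
+The space name ``customers``, the index names, and the tuple layout are hypothetical
+and serve only to illustrate the points above.
+
+.. code-block:: lua
+
+    -- memtx is the default engine, so no engine option is needed.
+    local s = box.schema.space.create('customers', {if_not_exists = true})
+
+    -- Primary index on the first field.
+    s:create_index('primary', {
+        type = 'TREE',
+        parts = {{field = 1, type = 'unsigned'}},
+        if_not_exists = true,
+    })
+
+    -- Non-unique secondary index with a two-part key.
+    s:create_index('name', {
+        type = 'TREE',
+        unique = false,
+        parts = {{field = 2, type = 'string'}, {field = 3, type = 'string'}},
+        if_not_exists = true,
+    })
+
+    s:replace({1, 'Ivanov', 'Ivan'})
+
+    -- Select by the primary key, then by a partial secondary key.
+    s:select({1})
+    s.index.name:select({'Ivanov'})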
+
+.. _memtx-replication:
+
+Replicating data
+----------------
+
+Although this topic is not directly related to the ``memtx`` engine, it completes the overall picture of how Tarantool works in case of a distributed application.
+
+Replication allows multiple Tarantool instances to work on copies of the same database.
+The copies are kept in sync because each instance can communicate its changes to all the other instances.
+It is implemented via WAL replication.
+
+To send data to a replica, Tarantool runs another thread called **relay**.
+Its purpose is to read the WAL files and send them to replicas.
+On a replica, the fiber called **applier** is run. It receives the changes from a remote node and applies them to the replica's Arena.
+All the changes are written to WAL files via the replica's WAL thread as if they were made locally.
+
+.. image:: memtx/replica-xlogs.svg
+
+By default, :ref:`replication <replication-architecture>` in Tarantool is asynchronous: if a transaction
+is committed locally on a master node, it does not mean it is replicated onto any
+replicas.
+
+:ref:`Synchronous replication <repl_sync>` exists to solve this problem. Synchronous transactions
+are not considered committed, and the client does not receive a response, until they are
+replicated onto some number of replicas.
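+
+As a rough sketch, a synchronous space can be declared as shown below.
+It assumes a replica set that is already configured; the space name ``sync_space``
+and the quorum and timeout values are illustrative assumptions.
+
+.. code-block:: lua
+
+    -- A synchronous transaction needs confirmation from two instances
+    -- (the master included) and is rolled back if the quorum is not
+    -- collected within five seconds.
+    box.cfg{
+        replication_synchro_quorum = 2,
+        replication_synchro_timeout = 5,
+    }
+
+    -- Every transaction that touches this space is synchronous.
+    box.schema.space.create('sync_space', {is_sync = true, if_not_exists = true})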
+
+For more information on replication, refer to the :doc:`corresponding chapter </book/replication/index>`.
+
+.. _memtx-summary:
+
+Summary
+--------
+
+The key points describing how the in-memory storage engine works can be summarized as follows:
+
+*   All data is in RAM.
+*   Data is accessed from a single thread.
+*   Tarantool writes all data change requests to the WAL.
+*   Data snapshots are taken periodically.
+*   Indexes are built to access the data.
+*   WAL can be replicated.
diff --git a/doc/book/box/engines/memtx/arena2.svg b/doc/book/box/engines/memtx/arena2.svg