Add statistics for CRUD operations on router #244

Merged: 8 commits, Feb 25, 2022

Commits on Feb 21, 2022

  1. 2c234cf

Commits on Feb 25, 2022

  1. stats: add statistics for CRUD router operations

    Add a statistics module for collecting metrics of CRUD operations on
    the router. Wrap all CRUD operation calls in the statistics collector.
    Statistics must be enabled manually with `crud.cfg`. They can be
    disabled, restarted or re-enabled later.
    
    This patch introduces `crud.cfg`, a tool to set the module
    configuration. It is similar to Tarantool `box.cfg`, although we don't
    need to call it to bootstrap the module -- it is used only to change
    configuration. `crud.cfg` is a callable table. To change configuration,
    call it: `crud.cfg{ stats = true }`. You can read the table contents as
    with an ordinary table, but do not change them directly -- call the
    table instead. The table contents are immutable and are implemented
    with a proxy approach (see [1, 2]). Iterating through `crud.cfg` with
    `pairs` is not supported yet, refer to #265.
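    
    The callable read-only table described above can be sketched in plain
    Lua roughly as follows. This is only an illustration of the proxy
    approach from [1], not the module source: the names `values` and `cfg`
    are hypothetical, and no option validation is shown.
    
    ```lua
    -- Hypothetical stand-in for the module configuration (not the crud source).
    local values = { stats = false }   -- real storage, hidden behind the proxy
    
    local cfg = setmetatable({}, {
        -- Reads fall through to the hidden storage.
        __index = values,
        -- Direct writes are rejected.
        __newindex = function(_, key)
            error(('use cfg{ %s = ... } instead of assignment'):format(tostring(key)), 2)
        end,
        -- Calling the table applies new options and returns the proxy.
        __call = function(self, opts)
            for k, v in pairs(opts or {}) do
                values[k] = v
            end
            return self
        end,
    })
    
    cfg{ stats = true }
    print(cfg.stats)                                 -- read as an ordinary table
    print(pcall(function() cfg.stats = false end))   -- direct write raises an error
    ```
    
    Because the proxy table itself stays empty, `pairs(cfg)` yields nothing
    in this sketch.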
    
    `crud.stats()` returns
    
    ---
    - spaces:
        my_space:
          insert:
            ok:
              latency: 0.002
              count: 19800
              time: 39.6
            error:
              latency: 0.000001
              count: 4
              time: 0.000004
    ...
    
    The `spaces` section contains statistics for each observed space.
    If an operation has never been called for a space, the corresponding
    field will be empty. If no requests have been made for a
    space, it will not be represented. Space data is based on
    client requests rather than the storage schema, so requests
    for non-existing spaces are also collected.
    
    Possible statistics operation labels are
    `insert` (for `insert` and `insert_object` calls),
    `get`, `replace` (for `replace` and `replace_object` calls), `update`,
    `upsert` (for `upsert` and `upsert_object` calls), `delete`,
    `select` (for `select` and `pairs` calls), `truncate`, `len`, `count`
    and `borders` (for `min` and `max` calls).
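    
    The mapping from calls to labels listed above can be written down as a
    plain Lua table. The table name `label_by_call` is hypothetical; only
    the pairs themselves come from this commit message.
    
    ```lua
    -- Call name -> statistics label, as listed in the commit message.
    local label_by_call = {
        insert = 'insert', insert_object = 'insert',
        get = 'get',
        replace = 'replace', replace_object = 'replace',
        update = 'update',
        upsert = 'upsert', upsert_object = 'upsert',
        delete = 'delete',
        select = 'select', pairs = 'select',
        truncate = 'truncate', len = 'len', count = 'count',
        min = 'borders', max = 'borders',
    }
    
    print(label_by_call.pairs, label_by_call.max)
    ```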
    
    Each operation section consists of separate collectors
    for successful calls and for errors (both thrown errors and
    `nil, err` returns). `count` is the total request count since
    instance start or stats restart. `latency` is the average request
    execution time, `time` is the total request execution time.
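    
    A minimal sketch of such a wrapper, assuming `os.clock` as the timer
    and hypothetical helper names (`observe`, `wrap`, `finish`); the actual
    module may measure and store things differently:
    
    ```lua
    local clock = os.clock   -- assumption: CPU time is good enough for a sketch
    
    local stats = {}   -- stats[space][op] = { ok = collector, error = collector }
    
    local function new_collector()
        return { count = 0, time = 0, latency = 0 }
    end
    
    local function observe(space, op, status, elapsed)
        stats[space] = stats[space] or {}
        stats[space][op] = stats[space][op]
            or { ok = new_collector(), error = new_collector() }
        local c = stats[space][op][status]
        c.count = c.count + 1
        c.time = c.time + elapsed
        c.latency = c.time / c.count   -- average time per request
    end
    
    local function finish(space, op, start, ok, res, err, ...)
        local elapsed = clock() - start
        if not ok then
            observe(space, op, 'error', elapsed)
            error(res, 0)              -- rethrow the original error
        end
        -- A `nil, err` return counts as an error too.
        local status = (res == nil and err ~= nil) and 'error' or 'ok'
        observe(space, op, status, elapsed)
        return res, err, ...
    end
    
    -- Wrap `func` so that every call is recorded under `op` for `space`.
    local function wrap(func, space, op)
        return function(...)
            local start = clock()
            return finish(space, op, start, pcall(func, ...))
        end
    end
    
    -- Usage with a toy operation in place of a real CRUD call.
    local insert = wrap(function(tuple)
        if tuple == nil then
            return nil, 'tuple is required'
        end
        return tuple
    end, 'my_space', 'insert')
    
    insert({ 1, 'a' })
    insert(nil)
    print(stats.my_space.insert.ok.count, stats.my_space.insert.error.count)
    ```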
    
    Since `pairs` request behavior differs from any other crud request, its
    statistics collection also has specific behavior. Statistics (the
    `select` section) are updated after the `pairs` cycle is finished: either
    you have iterated through all records or an error was thrown.
    If your `pairs` cycle was interrupted with `break`, statistics will
    be collected when the pairs object is cleaned up by the Lua garbage
    collector.
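    
    The garbage collection fallback can be illustrated with a small
    self-contained sketch. Everything here is an assumption made for
    illustration (`on_gc`, `observed_pairs` and `select_stats` are
    hypothetical names); it only shows how a finalizer can account for a
    loop that was left with `break`.
    
    ```lua
    -- Attach a finalizer to the table `obj`.
    local function on_gc(obj, callback)
        if newproxy then
            -- Lua 5.1 / LuaJIT: tables ignore __gc, so store a userdata
            -- sentinel that lives exactly as long as `obj`.
            local sentinel = newproxy(true)
            getmetatable(sentinel).__gc = callback
            obj.sentinel = sentinel
        else
            -- Lua 5.2+: tables support __gc directly.
            setmetatable(obj, { __gc = callback })
        end
        return obj
    end
    
    local select_stats = { count = 0 }
    
    local function observed_pairs(list)
        local finished = false
        local function finish()
            if not finished then       -- count each loop exactly once
                finished = true
                select_stats.count = select_stats.count + 1
            end
        end
        -- The generic `for` keeps this state alive while the loop runs.
        local state = on_gc({ list = list }, finish)
        local function iter(s, i)
            i = i + 1
            local v = s.list[i]
            if v == nil then
                finish()               -- normal end of iteration
                return nil
            end
            return i, v
        end
        return iter, state, 0
    end
    
    -- A full loop is counted when the iterator is exhausted.
    for _, v in observed_pairs({ 'a', 'b', 'c' }) do end
    
    -- An interrupted loop is counted only once its state is collected.
    local function interrupted()
        for _, v in observed_pairs({ 'a', 'b', 'c' }) do break end
    end
    interrupted()
    collectgarbage()
    collectgarbage()
    print(select_stats.count)
    ```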
    
    Statistics are preserved between package reloads. Statistics are
    preserved between Tarantool Cartridge role reloads [3] if CRUD Cartridge
    roles are used.
    
    1. http://lua-users.org/wiki/ReadOnlyTables
    2. tarantool/tarantool#2867
    3. https://www.tarantool.io/en/doc/latest/book/cartridge/cartridge_api/modules/cartridge.roles/#reload
    
    Part of #224
    DifferentialOrange committed Feb 25, 2022
    7ee2d7d
  2. stats: fix LuaJIT breaking pairs __gc

    In some cases a LuaJIT optimization interferes with the gc_observer
    table used to handle pairs object gc. This led to incorrect behavior
    (some pairs cycles interrupted with `break` were ignored in stats) and
    to test failures in some cases (for example, if you run only stats
    unit tests).
    
    Part of #224
    DifferentialOrange committed Feb 25, 2022
    aeb21e6
  3. stats: add detailed statistics for select/pairs

    After this patch, the `select` statistics section additionally contains
    `details` collectors.
    
    ```
    crud.stats('my_space').select.details
    ---
    - map_reduces: 4
      tuples_fetched: 10500
      tuples_lookup: 238000
    ...
    ```
    
    `map_reduces` is the count of planned map reduces (including those not
    executed successfully). `tuples_fetched` is the count of tuples fetched
    from storages during execution, `tuples_lookup` is the count of tuples
    looked up on storages while collecting responses for calls (including
    scrolls for multibatch requests). Details data is updated as part of
    request processing, so you may get new details before the
    `select`/`pairs` call is finished and observed by the count, latency
    and time collectors.
    
    Part of #224
    DifferentialOrange committed Feb 25, 2022
    af30790
  4. tests: use in-built stats instead of custom helper

    Use built-in `crud.stats()` info instead of the `storage_stat` helper
    in tests to track map reduce calls.
    
    Part of #224
    DifferentialOrange committed Feb 25, 2022
    3a38c0d
  5. stats: resolve space name from id

    `crud.len` supports using a space id instead of a name. After this
    patch, the stats wrapper supports mapping an id to a name.
    
    Since using space id is a questionable pattern (see #255), this commit
    may be reverted later.
    
    Part of #224
    DifferentialOrange committed Feb 25, 2022
    d80adc6
  6. stats: integrate with metrics rock

    If `metrics` [1] is found, you can use metrics collectors to store
    statistics. `metrics >= 0.10.0` is required to use the metrics driver.
    (`metrics >= 0.9.0` is required to use summary quantiles with
    age buckets. `metrics >= 0.5.0, < 0.9.0` is unsupported
    due to a quantile overflow bug [2]. `metrics == 0.9.0` has a bug that
    does not permit creating a summary collector without quantiles [3].
    In fact, users may use `metrics >= 0.5.0`, `metrics != 0.9.0`
    if they want to use metrics without quantiles, and `metrics >= 0.9.0`
    if they want to use metrics with quantiles. But this is confusing,
    so let's use a single restriction for both cases.)
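    
    The single `>= 0.10.0` restriction can be expressed as a numeric
    version comparison. The sketch below is only an illustration: the
    helper names are hypothetical, and how the module actually detects the
    installed `metrics` version is not shown in this commit message.
    
    ```lua
    -- Parse "major.minor.patch" into numbers; returns nil for other formats.
    local function parse_version(str)
        local major, minor, patch = str:match('^(%d+)%.(%d+)%.(%d+)')
        if major == nil then
            return nil
        end
        return { tonumber(major), tonumber(minor), tonumber(patch) }
    end
    
    -- Compare component by component, most significant first.
    local function is_at_least(version, minimum)
        for i = 1, 3 do
            if version[i] ~= minimum[i] then
                return version[i] > minimum[i]
            end
        end
        return true
    end
    
    local MIN_METRICS = { 0, 10, 0 }   -- the single restriction from this commit
    
    local function is_metrics_supported(version_string)
        local version = parse_version(version_string)
        return version ~= nil and is_at_least(version, MIN_METRICS)
    end
    
    print(is_metrics_supported('0.9.0'), is_metrics_supported('0.10.0'))
    ```
    
    Components are compared as numbers because a plain string comparison
    would order '0.10.0' before '0.9.0'.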
    
    The metrics are part of global registry and can be exported together
    (e.g. to Prometheus) with default tools without any additional
    configuration. Disabling stats destroys the collectors.
    
    Metrics collectors are used by default if supported. To explicitly set
    the driver, call `crud.cfg{ stats = true, stats_driver = driver }`,
    where `driver` is 'local' or 'metrics'. To enable quantiles, call
    ```
    crud.cfg{
        stats = true,
        stats_driver = 'metrics',
        stats_quantiles = true,
    }
    ```
    With quantiles, `latency` statistics change to the 0.99 quantile
    of request execution time (with aging). Quantile computation increases
    performance overhead by up to 10% when used in statistics.
    
    Add CI matrix to run tests with `metrics` installed. To get full
    coverage on coveralls, #248 must be resolved.
    
    1. https://github.com/tarantool/metrics
    2. tarantool/metrics#235
    3. tarantool/metrics#262
    
    Closes #224
    DifferentialOrange committed Feb 25, 2022
    112b257
  7. tests: separate performance tests

    Before this patch, performance tests ran together with unit and
    integration tests with the `--coverage` flag. Coverage analysis made
    performance test results 10-15 times worse. For metrics integration
    it resulted in timeout errors and a performance drop that is not
    reproduced with coverage disabled. Moreover, before this patch log
    capture was disabled and performance tests did not display any
    results after a run. Now performance tests also run in a separate CI
    job.
    
    After this patch, `make -C build coverage` will run a lightweight
    version of the performance test. `make -C build performance` will run
    the real performance tests.
    
    You can paste the output table into GitHub [1].
    
    This patch also reworks the current performance test. It adds new cases
    to compare module performance with and without statistics and statistics
    wrappers, compares different metrics drivers, and reports new info:
    average call time and max call time.
    
    Performance test result: overhead is 3-10% in case of `local` driver and
    5-15% in case of `metrics` driver, up to 20% for `metrics` with
    quantiles. Based on several runs on HP ProBook 440 G7 i7/16Gb/256SSD.
    
    1. https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/organizing-information-with-tables
    
    Closes #233, follows up #224
    DifferentialOrange committed Feb 25, 2022
    0ed9215