tarantool crashes on batch upsert  #4957

@nazaroid

Tarantool version:
2.5.0-0-gfef6505
OS version:
centos 7
Bug description:
I have an application where data is ingested in batches of upsert operations on vinyl (each batch is a transaction).
After some time, Tarantool began to crash on various operations: first when creating run files, then on some read operations. After digging into the Tarantool sources, I came to the conclusion that the cause could be the BOX_UPDATE_OP_CNT_MAX restriction, which is triggered when upsert operations accumulate between merges of LSM segments.

To confirm the theory, I made a small test: I launched 2 batches (2 transactions) of 3000 upserts each on the same tuple (identical key).
On the second batch Tarantool tried to run a merge (compaction) and crashed with the same error that occurred while my application was running.

Question 1: How can I keep a batch of upsert operations under control to avoid triggering BOX_UPDATE_OP_CNT_MAX?

  • The size of the batch is fixed.
  • The size of the restriction is known (BOX_UPDATE_OP_CNT_MAX = 4000).

Is there any way to control the number of upserts accumulated between merges, so that triggering BOX_UPDATE_OP_CNT_MAX is GUARANTEED to be avoided?

Question 2: What can be done with the database if more than 4000 upserts have accumulated before the merge? Reading does not work and merging does not work; either operation fails with an error.

Steps to reproduce:

  • box.begin()
  • upsert the same tuple (same key) 5000 times
  • box.commit()
  • box.snapshot() (or index:compact())

Optional (but very desirable):

  • cfg
    vinyl_run_count_per_level: 2
    feedback_host: https://feedback.tarantool.io
    readahead: 16320
    memtx_dir: memtx_dir
    checkpoint_interval: 60
    replication_anon: false
    replication_connect_timeout: 30
    coredump: false
    vinyl_run_size_ratio: 2
    replication_timeout: 1
    wal_dir_rescan_delay: 2
    checkpoint_count: 1
    too_long_threshold: 0.5
    vinyl_bloom_fpr: 0.05
    strip_core: true
    feedback_enabled: true
    wal_max_size: 1048576
    log_level: 6
    slab_alloc_factor: 1.05
    hot_standby: false
    background: false
    vinyl_dir: vinyl_dir
    vinyl_cache: 134217728
    vinyl_read_threads: 1
    replication_sync_lag: 10
    vinyl_timeout: 60
    net_msg_max: 768
    listen: '3301'
    replication_skip_conflict: false
    vinyl_max_tuple_size: 10485760
    memtx_min_tuple_size: 16
    vinyl_write_threads: 4
    force_recovery: true
    memtx_max_tuple_size: 1048576
    log_format: plain
    feedback_interval: 3600
    sql_cache_size: 5242880
    wal_mode: write
    worker_pool_threads: 4
    memtx_memory: 134217728
    read_only: false
    work_dir: /var/lib/tarantool/3301_work_dir
    wal_dir: wal_dir
    vinyl_memory: 134217728
    checkpoint_wal_threshold: 1000000000000000000
    vinyl_page_size: 8192
    replication_sync_timeout: 300
  • backtrace
    vinyl.compaction.0/102/task D> ClientError at /usr/src/tarantool/src/box/xrow_update.c:166
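
The reproduction steps above can be sketched roughly as follows. This is only a minimal sketch, not the exact test that was run: the space name `upsert_repro`, the tuple shape `{key, 0}`, and the `'+'` update operation are assumptions chosen for illustration, and `box.cfg{}` is assumed to have been called already.

```lua
-- Sketch for Tarantool (Lua). The space name, tuple shape and update
-- operation are illustrative assumptions, not taken from the report.

-- Create a vinyl space with a primary index on the first field.
local function create_space(name)
    local space = box.schema.space.create(name, {
        engine = 'vinyl',
        if_not_exists = true,
    })
    space:create_index('pk', {if_not_exists = true})
    return space
end

-- Upsert the same key `count` times inside a single transaction.
local function upsert_same_key(space, key, count)
    box.begin()
    for _ = 1, count do
        -- Insert {key, 0} if the key is absent, otherwise add 1 to field 2.
        space:upsert({key, 0}, {{'+', 2, 1}})
    end
    box.commit()
end

-- Follow the steps from the report: one batch, then a snapshot.
local function reproduce(count)
    local space = create_space('upsert_repro')
    upsert_same_key(space, 1, count)
    -- Per the report, the error shows up once the data is dumped/merged.
    box.snapshot()
    return space
end
```

Calling `reproduce(5000)` corresponds to the steps listed above; replacing `box.snapshot()` with a call to `compact()` on the primary index would be the other variant mentioned in the steps.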

Labels: bug, vinyl
