vinyl: slow start from xlogs #1900
Even if the statement read from xlog on recovery was already dumped by the vinyl engine during the previous run and hence will be ignored by vy_tx_write(), we still issue vy_get() to execute it if the space has secondary indexes. This makes recovery from xlog far too slow. Fix this by moving the check of whether a statement being recovered should be applied from vy_tx_write() to VinylSpace::execute*. Closes #1900
Even if the statement read from xlog on recovery was already dumped by the vinyl engine during the previous run and hence will be ignored by vy_tx_write(), we still issue vy_get() to execute it if the space has secondary indexes. This makes recovery from xlog far too slow. Fix this by adding an extra check to VinylSpace::execute* that decides whether a statement being recovered should be applied. Closes #1900
Even if the statement read from xlog on recovery was dumped by the vinyl engine before the shutdown and hence would be ignored by vy_tx_write(), we would still issue vy_get() on it if the space has secondary indexes. This makes recovery from xlog far too slow. Let's try to avoid such unnecessary reads by adding an extra check directly to VinylSpace::execute* that tells whether a statement being replayed is already on disk. The new check compares the statement's LSN with the LSN known to be dumped to disk in all ranges of this index, i.e. the minimum over all runs' max LSNs. Closes #1900
The Bloom filter (#1919) has been pushed; please re-check this problem with Tarantool 1.7.3-269-g4c520cf or later.
On local recovery, a statement replayed from the WAL may already have been written to a run file by the Vinyl engine before the restart to free up memory. Replaying such statements is useless; besides, it can make recovery exceed the memory quota, which isn't allowed because the scheduler isn't started until recovery is complete. Before commit a0d2768 ("vinyl: apply TX in vy_prepare and make vinyl similar to memtx") we filtered out such statements by checking the statement LSN before inserting it into the memory tree (in vy_commit()). The above-mentioned commit broke this check because it moved LSN assignment after tree insertion, but even before that commit the check was inefficient: in the presence of secondary keys we still read the primary index on recovery even if we didn't intend to apply the statement. Hence, we need to check whether the statement was applied before committing it, in vy_replace(), vy_delete(), vy_update(), and vy_upsert(). As we don't know the statement LSN there yet, we pass a pointer to recovery->vclock to Engine::beginInitialRecovery(). The vclock is incremented before applying each WAL row, so we can use it to get the LSN of a statement coming from the WAL. Closes #1900
This ticket is to replace INSERT with REPLACE and UPDATE with UPSERT during xlog recovery, i.e. to make xlog recovery avoid reads as much as possible.
We already replace INSERT with REPLACE during recovery (see src/box/vinyl.c). As for UPDATE vs UPSERT, we can't do that, because an UPSERT contains a full tuple to insert while an UPDATE doesn't.
Database was created by code:
About 25 GB of data was inserted, and no snapshot was created.
Tarantool startup takes nearly 6 hours (from 11:45 to 17:32 in the log below):
2016-11-07T11:45:08.000 tarantool@hashdb[29147]: main/101/tarantoolctl I> mapping 4294967296 bytes for tuple arena...
...
2016-11-07T17:32:17.000 tarantool@hashdb[29147]: iproto/102/iproto I> binary: started
It looks like there are unnecessary pread(2) calls during startup:
(gdb) bt
#0 0x00007ff916f3aa93 in pread64 () from /lib64/libpthread.so.0
#1 0x0000000000456f11 in vy_pread_file ()
#2 0x000000000045a12e in vy_run_iterator_load_page ()
#3 0x000000000045acbc in vy_run_iterator_start ()
#4 0x000000000045afb8 in vy_run_iterator_next_key ()
#5 0x00000000004598c2 in vy_merge_iterator_locate ()
#6 0x0000000000461b6b in vy_read_iterator_merge_next_key ()
#7 0x0000000000473641 in vy_read_iterator_next ()
#8 0x00000000004739e9 in vy_get ()
#9 0x0000000000450fa1 in vinyl_replace_all(space*, request*, vy_tx*, txn_stmt*) [clone .isra.6] ()
#10 0x00000000004513a6 in VinylSpace::executeReplace(txn*, space*, request*) ()
#11 0x0000000000489683 in process_rw(request*, tuple**) ()
#12 0x0000000000489ff6 in apply_wal_row(xstream*, xrow_header*) ()
#13 0x00000000004976c5 in recover_remaining_wals(recovery*, xstream*, vclock*) ()
#14 0x00000000004980fd in recovery_follow_local ()
#15 0x000000000048a976 in box_init() ()
#16 0x000000000048ebce in box_load_cfg ()
#17 0x000000000040f2a8 in load_cfg ()
#18 0x000000000049d5f1 in lbox_cfg_load(lua_State*) ()
#19 0x00000000004d67e7 in lj_BC_FUNCC ()
#20 0x000000000052340d in lj_cf_dofile ()
#21 0x00000000004d67e7 in lj_BC_FUNCC ()
#22 0x00000000004e70af in lua_pcall ()
#23 0x00000000004bef55 in lbox_call ()
#24 0x00000000004b9a31 in run_script_f ()
#25 0x000000000040e8cc in fiber_cxx_invoke(int (*)(__va_list_tag*), __va_list_tag*) ()
#26 0x00000000004c9310 in fiber_loop ()
#27 0x00000000005eda2f in coro_init ()