vinyl: slow start from xlogs #1900

Closed
andrew-statsenko opened this issue Nov 8, 2016 · 5 comments

@andrew-statsenko

The database was created with the following code:

local space = box.schema.create_space('hashdb', {engine='vinyl'})
space:create_index('cloud', {unique = true, parts = {1, 'string'}})
space:create_index('sha1',  {unique = true, parts = {2, 'string'}})
space:create_index('md5',   {unique = true, parts = {3, 'string'}})

About 25 GB of data was inserted, and no snapshot was created.

Tarantool startup takes more than 5 hours (about 5 h 47 min between the two log lines below):

2016-11-07T11:45:08.000 tarantool@hashdb[29147]: main/101/tarantoolctl I> mapping 4294967296 bytes for tuple arena...
...
2016-11-07T17:32:17.000 tarantool@hashdb[29147]: iproto/102/iproto I> binary: started

It looks like unnecessary pread(2) calls are made during startup:

(gdb) bt
#0 0x00007ff916f3aa93 in pread64 () from /lib64/libpthread.so.0
#1 0x0000000000456f11 in vy_pread_file ()
#2 0x000000000045a12e in vy_run_iterator_load_page ()
#3 0x000000000045acbc in vy_run_iterator_start ()
#4 0x000000000045afb8 in vy_run_iterator_next_key ()
#5 0x00000000004598c2 in vy_merge_iterator_locate ()
#6 0x0000000000461b6b in vy_read_iterator_merge_next_key ()
#7 0x0000000000473641 in vy_read_iterator_next ()
#8 0x00000000004739e9 in vy_get ()
#9 0x0000000000450fa1 in vinyl_replace_all(space*, request*, vy_tx*, txn_stmt*) [clone .isra.6] ()
#10 0x00000000004513a6 in VinylSpace::executeReplace(txn*, space*, request*) ()
#11 0x0000000000489683 in process_rw(request*, tuple**) ()
#12 0x0000000000489ff6 in apply_wal_row(xstream*, xrow_header*) ()
#13 0x00000000004976c5 in recover_remaining_wals(recovery*, xstream*, vclock*) ()
#14 0x00000000004980fd in recovery_follow_local ()
#15 0x000000000048a976 in box_init() ()
#16 0x000000000048ebce in box_load_cfg ()
#17 0x000000000040f2a8 in load_cfg ()
#18 0x000000000049d5f1 in lbox_cfg_load(lua_State*) ()
#19 0x00000000004d67e7 in lj_BC_FUNCC ()
#20 0x000000000052340d in lj_cf_dofile ()
#21 0x00000000004d67e7 in lj_BC_FUNCC ()
#22 0x00000000004e70af in lua_pcall ()
#23 0x00000000004bef55 in lbox_call ()
#24 0x00000000004b9a31 in run_script_f ()
#25 0x000000000040e8cc in fiber_cxx_invoke(int (*)(__va_list_tag*), __va_list_tag*) ()
#26 0x00000000004c9310 in fiber_loop ()
#27 0x00000000005eda2f in coro_init ()

locker added a commit that referenced this issue Nov 8, 2016
Even if the statement read from xlog on recovery was already dumped by
the vinyl engine during the previous run and hence will be ignored by
vy_tx_write(), we will still issue vy_get() to execute it in case there
are secondary indexes. This makes recovery from xlog far too slow. Fix
this by moving the check of whether a statement being recovered should
be applied from vy_tx_write() to VinylSpace::execute*.

Closes #1900
locker added a commit that referenced this issue Nov 8, 2016
Even if the statement read from xlog on recovery was already dumped by
the vinyl engine during the previous run and hence will be ignored by
vy_tx_write(), we will still issue vy_get() to execute it in case there
are secondary indexes. This makes recovery from xlog far too slow. Fix
this by adding an extra check of whether a statement being recovered
should be applied to VinylSpace::execute*.

Closes #1900
locker added a commit that referenced this issue Nov 9, 2016
Even if the statement read from xlog on recovery was dumped by the vinyl
engine before the shutdown and hence would be ignored by vy_tx_write(),
we would still issue vy_get() on it in case the space has secondary
indexes. This makes recovery from xlog far too slow. Let's try to avoid
such unnecessary reads by adding an extra check, directly in
VinylSpace::execute*, of whether a statement being replayed is already
on disk. The new check compares the statement's LSN with the LSN which
is known to be dumped to disk in all ranges of this index, i.e. the min
over all runs' max LSNs.

Closes #1900
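
The check described in the commit message above can be sketched as a small Python model. This is not Tarantool code: the `Run` class and the `dumped_lsn` and `already_on_disk` helpers are hypothetical names introduced for illustration, and the sketch takes "min over all runs' max LSNs" literally, treating the index as a flat list of runs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Run:
    """Hypothetical stand-in for an on-disk run file."""
    max_lsn: int  # highest LSN of any statement stored in the run


def dumped_lsn(runs: List[Run]) -> int:
    """Return the minimum of the runs' max LSNs.

    With no runs nothing has been dumped, so return -1 (skip nothing).
    """
    if not runs:
        return -1
    return min(run.max_lsn for run in runs)


def already_on_disk(stmt_lsn: int, runs: List[Run]) -> bool:
    """True if a replayed statement can be skipped without any read."""
    return stmt_lsn <= dumped_lsn(runs)


runs = [Run(max_lsn=120), Run(max_lsn=95), Run(max_lsn=140)]
# Only statements at or below LSN 95 are certain to be in every run.
skipped = [lsn for lsn in (90, 95, 96, 130) if already_on_disk(lsn, runs)]
print(skipped)  # [90, 95]
```

Taking the minimum is conservative: a statement is skipped only when every run already covers its LSN, so the comparison itself never touches the disk.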
@rtsisyk rtsisyk added the feature and vinyl labels Nov 22, 2016
@rtsisyk rtsisyk added this to the 1.7.3 milestone Nov 22, 2016
@rtsisyk rtsisyk added the prio2 label Nov 22, 2016
@rtsisyk
Contributor

rtsisyk commented Nov 22, 2016

Please return to this ticket after #1908 and #1919.

@rtsisyk rtsisyk modified the milestones: 1.7.4, 1.7.3 Dec 23, 2016
@rtsisyk
Contributor

rtsisyk commented Feb 20, 2017

The bloom filter (#1919) has been pushed. Please re-check this problem with Tarantool 1.7.3-269-g4c520cf or later.

@rtsisyk rtsisyk added the needs feedback label Feb 22, 2017
@rtsisyk
Contributor

rtsisyk commented Mar 14, 2017

This problem has been fixed by #1908 and #1919.

@rtsisyk rtsisyk closed this as completed Mar 14, 2017
@rtsisyk rtsisyk removed the needs feedback label Mar 14, 2017
@locker locker reopened this May 4, 2017
@locker locker assigned locker and unassigned alyapunov May 4, 2017
@locker locker modified the milestones: 1.7.5, 1.7.4 May 4, 2017
locker added a commit that referenced this issue May 5, 2017
On local recovery it can occur that a statement replayed from the WAL
was written to a run file by the Vinyl engine before restart to free up
memory. Replaying such statements is useless; besides, it can result in
exceeding the memory quota during recovery, which isn't allowed as the
scheduler isn't started until recovery is complete. Before commit
a0d2768 ("vinyl: apply TX in vy_prepare and make vinyl similar to
memtx") we filtered out such statements by checking the statement LSN
before inserting it into the memory tree (in vy_commit()). The above
mentioned commit broke this check, because it moved LSN assignment after
tree insertion, but even before the commit the check was inefficient
since in the presence of secondary keys we still read the primary index
on recovery even if we didn't intend to apply the statement.

That said, we need to check if the statement was applied before
committing it, in vy_replace(), vy_delete(), vy_update(), and
vy_upsert(). As we don't know the statement LSN there yet, we pass a
pointer to recovery->vclock to Engine::beginInitialRecovery(). The
vclock is incremented before applying each WAL row so we can use it to
get the LSN of a statement coming from WAL.

Closes #1900
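
The approach in the commit message above, filtering by LSN before any work is done, can be illustrated with a toy Python model. It is only a sketch under stated assumptions: the `Recovery` class and its fields are hypothetical, the vclock is reduced to a single integer, and one counted "disk read" stands in for the primary index lookup that a space with secondary keys performs.

```python
class Recovery:
    """Toy model of WAL replay; all names here are hypothetical."""

    def __init__(self, dumped_lsn: int) -> None:
        self.lsn = 0                  # stands in for recovery->vclock (one component)
        self.dumped_lsn = dumped_lsn  # statements up to this LSN are in run files
        self.mem = {}                 # in-memory tree: key -> tuple
        self.disk_reads = 0           # primary index lookups performed

    def replace(self, tup: tuple) -> bool:
        # Filter first: a statement that is already dumped costs nothing.
        if self.lsn <= self.dumped_lsn:
            return False
        # With secondary keys the old tuple is looked up before the write.
        self.disk_reads += 1
        self.mem[tup[0]] = tup
        return True

    def replay(self, wal_rows) -> int:
        applied = 0
        for tup in wal_rows:
            self.lsn += 1  # bumped before the row is applied
            if self.replace(tup):
                applied += 1
        return applied


wal = [("k%d" % i, i) for i in range(1, 11)]  # ten rows, LSNs 1..10
r = Recovery(dumped_lsn=7)
applied = r.replay(wal)
print(applied, r.disk_reads)  # 3 3
```

In this model only the three rows past the dumped LSN are applied and only they trigger a lookup. Without the early check, all ten rows would pay for a read, and the already dumped ones would also consume memory quota.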
locker added a commit that referenced this issue May 5, 2017
locker added a commit that referenced this issue May 8, 2017
locker added a commit that referenced this issue May 10, 2017
@kostja
Contributor

kostja commented May 11, 2017

This ticket is about replacing INSERT with REPLACE and UPDATE with UPSERT during xlog recovery, i.e. making sure xlog recovery avoids reads as much as possible.

@kostja kostja reopened this May 11, 2017
@locker
Member

locker commented May 16, 2017

We already replace INSERT with REPLACE during recovery (see src/box/vinyl.c). Regarding UPDATE vs UPSERT, we can't do that, because UPSERT contains a full tuple to insert while UPDATE doesn't.
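
The asymmetry described in the comment above can be shown with a small Python sketch. It is an illustration only, not the code in src/box/vinyl.c: the `Request` class, its fields, and `rewrite_for_recovery` are hypothetical names, and the sketch assumes that a row found in the xlog was applied successfully when it was first written.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Request:
    """Hypothetical model of a logged request."""
    type: str                           # "INSERT", "REPLACE", "UPDATE" or "UPSERT"
    key: Optional[Tuple] = None         # set for UPDATE
    full_tuple: Optional[Tuple] = None  # set for INSERT, REPLACE and UPSERT
    ops: Tuple = ()                     # update operations for UPDATE and UPSERT


def rewrite_for_recovery(req: Request) -> Request:
    """Return a request that needs as few reads as possible on replay."""
    if req.type == "INSERT":
        # Assumption: the duplicate check passed when the row was logged,
        # so a blind REPLACE of the same tuple needs no read.
        return replace(req, type="REPLACE")
    # UPDATE has only a key and operations. UPSERT also needs the full tuple
    # to insert when the key is absent, which UPDATE cannot supply.
    return req


ins = rewrite_for_recovery(Request("INSERT", full_tuple=("a", 1)))
upd = rewrite_for_recovery(Request("UPDATE", key=("a",), ops=(("+", 2, 1),)))
print(ins.type, upd.type)  # REPLACE UPDATE
```

The INSERT is rewritten because it already carries everything a REPLACE needs, while the UPDATE is left as is because its `full_tuple` is missing.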

@locker locker closed this as completed May 16, 2017