Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vinyl: Don't bloat tx manager with rows when doing space:pairs() #2534

Closed
kostja opened this issue Jun 20, 2017 · 3 comments
Closed

vinyl: Don't bloat tx manager with rows when doing space:pairs() #2534

kostja opened this issue Jun 20, 2017 · 3 comments
Assignees
Labels
bug Something isn't working
Milestone

Comments

@kostja
Copy link
Contributor

kostja commented Jun 20, 2017

Since the conflict manager still keeps every row, and doesn't have gaps, don't add every row read through a consistent read view to the conflict manager, otherwise you can easily run out of memory when iterating over a medium size space.

These rows are useless in the tx manager anyway.

@kostja kostja added the bug Something isn't working label Jun 20, 2017
@kostja kostja added this to the 1.7.5 milestone Jun 20, 2017
@rtsisyk rtsisyk added the prio2 label Jun 22, 2017
@alyapunov
Copy link
Contributor

alyapunov commented Aug 7, 2017

RO transaction were removed by 9c57ed6.
My original fix was aba925a and it worked fine with the :pairs() case but it didn't get through code review.
I think we can arise my original fix.

@alyapunov
Copy link
Contributor

Blocked by #2700

alyapunov added a commit that referenced this issue Aug 21, 2017
RO TXs was removed by 9c57ed6.
That caused uncontrolled memory overusage by simple space:pairs.

Reimplement back read-only transactions. Send select/get/pairs to
read view if read-write TX was not explicitly started before.

Unfortunatelly the patch disables cache filling by autocommit TXs.

Fixes #2534
@rtsisyk rtsisyk modified the milestones: 1.7.6, 1.7.5 Aug 22, 2017
@rtsisyk rtsisyk modified the milestones: 1.7.6, 1.7.7 Oct 24, 2017
@kostja kostja modified the milestones: 1.7.7, 1.7.8, 1.8.6, wishlist Oct 24, 2017
locker added a commit that referenced this issue Dec 14, 2017
Currently, if tx is not NULL, tx->read_view is used, otherwise the
global read view is used. For the sake of #2534 (read view for read only
autocommit statements), we will need to pass an arbitrary read view to
this function. So let's add the corresponding argument. This also allows
us to drop env from the argument list.

While we are at it, let's also inline vy_index_full_by_stmt() as it is a
trivial wrapper around vy_index_get().

Needed for #2534
locker added a commit that referenced this issue Dec 14, 2017
Every iteration over a secondary index tracks a point in the transaction
manager (due to lookup in the primary index). As a result, if the user
calls 'select' or 'pairs' over a huge data set, it will consume a lot of
memory due to this tracked points, even if the user doesn't uses
transactions.

To mitigate this, let's send all read only transactions to read view
immediately so that tracking is disabled completely during iteration.

Closes #2534
kostja pushed a commit that referenced this issue Dec 19, 2017
Currently, if tx is not NULL, tx->read_view is used, otherwise the
global read view is used. For the sake of #2534 (read view for read only
autocommit statements), we will need to pass an arbitrary read view to
this function. So let's add the corresponding argument. This also allows
us to drop env from the argument list.

While we are at it, let's also inline vy_index_full_by_stmt() as it is a
trivial wrapper around vy_index_get().

Needed for #2534
locker added a commit that referenced this issue Dec 19, 2017
Every iteration over a secondary index tracks a point in the transaction
manager (due to lookup in the primary index). As a result, if the user
calls 'select' or 'pairs' over a huge data set, it will consume a lot of
memory due to this tracked points, even if the user doesn't uses
transactions.

To mitigate this, let's send all read only transactions to read view
immediately so that tracking is disabled completely during iteration.
Note, with this patch select() called outside a transaction doesn't
populate the cache any more, but it seems to be OK as caching large
select() requests results in cache thrashing.

Closes #2534
locker added a commit that referenced this issue Dec 19, 2017
Every iteration over a secondary index tracks a point in the transaction
manager (due to lookup in the primary index). As a result, if the user
calls 'select' or 'pairs' over a huge data set, it will consume a lot of
memory due to this tracked points, even if the user doesn't uses
transactions.

To mitigate this, let's send all read only transactions to read view
immediately so that tracking is disabled completely during iteration.
Note, with this patch select() called outside a transaction doesn't
populate the cache any more, but it seems to be OK as caching large
select() requests results in cache thrashing.

Closes #2534
kostja pushed a commit that referenced this issue Dec 19, 2017
Every iteration over a secondary index tracks a point in the transaction
manager (due to lookup in the primary index). As a result, if the user
calls 'select' or 'pairs' over a huge data set, it will consume a lot of
memory due to this tracked points, even if the user doesn't uses
transactions.

To mitigate this, let's send all read only transactions to read view
immediately so that tracking is disabled completely during iteration.
Note, with this patch select() called outside a transaction doesn't
populate the cache any more, but it seems to be OK as caching large
select() requests results in cache thrashing.

Closes #2534
@locker
Copy link
Member

locker commented Dec 20, 2017

Fixed by a31c2c1

@locker locker closed this as completed Dec 20, 2017
@kostja kostja modified the milestones: wishlist, 1.7.7 Jan 9, 2018
locker added a commit that referenced this issue Mar 7, 2018
When scanning a secondary index, we actually track each tuple in the
transaction manager twice - as a part of the interval read from the
secondary index and as a point in the primary index when retrieving
the full tuple. This bloats the read set - instead of storing just one
interval for a range request, we also store each tuple returned by it,
which may count to thousands. There's no point in this extra tracking,
because whenever we change a tuple in the primary index, we also update
it in all secondary indexes. So let's remove it to save us some memory
and cpu cycles.

This is an alternative fix for #2534
It should also mitigate #3197
locker added a commit that referenced this issue Mar 7, 2018
When scanning a secondary index, we actually track each tuple in the
transaction manager twice - as a part of the interval read from the
secondary index and as a point in the primary index when retrieving
the full tuple. This bloats the read set - instead of storing just one
interval for a range request, we also store each tuple returned by it,
which may count to thousands. There's no point in this extra tracking,
because whenever we change a tuple in the primary index, we also update
it in all secondary indexes. So let's remove it to save us some memory
and cpu cycles.

This is an alternative fix for #2534
It should also mitigate #3197
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants