Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: region leaks on select #3199

Closed
Gerold103 opened this issue Feb 27, 2018 · 8 comments
Closed

sql: region leaks on select #3199

Gerold103 opened this issue Feb 27, 2018 · 8 comments
Assignees
Labels
bug Something isn't working sql
Milestone

Comments

@Gerold103
Copy link
Collaborator

box.cfg{}
box.sql.execute('CREATE TABLE test (id PRIMARY KEY, a INTEGER, b INTEGER)')
box.sql.execute('INSERT INTO test VALUES (1, 1, 1), (2, 2, 2)')
box.sql.execute('SELECT a, b, a + b FROM test ORDER BY b')

Before SELECT set a breakpoint to region_alloc and in it see region_used. Run SELECT multiple times - region_used increases. It is region leak.

@Gerold103 Gerold103 added bug Something isn't working sql labels Feb 27, 2018
@kyukhin kyukhin added this to the 2.1.0 milestone Feb 27, 2018
@Gerold103
Copy link
Collaborator Author

I have a test case:

tarantool> box.sql.execute('SELECT a, b, a + b FROM test ORDER BY b')
---
- - [1, 1, 2]
  - [2, 2, 4]
...

tarantool> fiber.info()[fiber.self().id()].memory
---
- total: 61744
  used: 82
...

tarantool> box.sql.execute('SELECT a, b, a + b FROM test ORDER BY b')
---
- - [1, 1, 2]
  - [2, 2, 4]
...

tarantool> fiber.info()[fiber.self().id()].memory
---
- total: 61744
  used: 164
...

tarantool> box.sql.execute('SELECT a, b, a + b FROM test ORDER BY b')
---
- - [1, 1, 2]
  - [2, 2, 4]
...

tarantool> fiber.info()[fiber.self().id()].memory
---
- total: 61744
  used: 246
...

tarantool> box.sql.execute('SELECT a, b, a + b FROM test ORDER BY b')
---
- - [1, 1, 2]
  - [2, 2, 4]
...

tarantool> fiber.info()[fiber.self().id()].memory
---
- total: 61744
  used: 328
...

As it seen, fiber uses more and more memory.

@Gerold103
Copy link
Collaborator Author

The same is actual for iproto - it uses the same SQL API as Lua.

@Korablev77
Copy link
Contributor

It turns out that bug was introduced by this commit: a706e97
Also, see original ticket: #3021

The idea to allocate memory on region while making record firstly seemed to be reasonable (moreover, it has passed through review and got to trunk). However, ephemeral spaces have nothing in common with txn routine. Thus, txn_commit() won't release memory which is allocated on region to hold tuples to be inserted into ephemeral spaces. So, for ephemeral spaces we MUST (and it is not kind of optimisation as we thought before) allocate memory for records using ordinary malloc.

I can suggest following solutions:

  1. Attempt at pointing out to which table (ephemeral or not) we are making record at the compilation time by setting flag in fifth register of OP_MakeRecord. It may be not so easy, since at compilation stage we are operating only with cursor numbers, so I am not sure that we always can differ them.

  2. Passing cursor number (from which we are able to fetch space id) in the fifth register to OP_MakeRecord (almost like p.1, but we don't have to guess whether cursor with given number is ephemeral or not). On the other hand, it leads to additional moves at runtime (while executing this opcode). Meanwhile, it seems to be very hot and should be optimized as much as possible.

  3. Brutally release all region memory after vdbe's halt. (Actually, it sounds like a nonsense)

  4. Simply revert this patch. Without it allocation will always be using malloc, but before insertion to ordinary space tuple will be copied to region.

  5. Contrary to p.4, always allocate on region, but before insertion to ephemeral space, tuple will be copied to malloc.

@Gerold103
Copy link
Collaborator Author

I like 3, because it allows to move more memory allocations to a region. For example, column metadata. Halt must in such a case do fiber_gc(), if there is no active transaction. But this does not work, when SQL iterators will be introduced. In any case, I very do not like, that now VDBE has many members on runtime memory.
At the second priority I like 1.

And there is a second problem with region: even when it is for insert, it can leak, if VDBE fails before box_insert. As far as I know, we do not call box_txn_begin on autocommit statements? If we do not, then there is no box_txn_rollback, and a region leaks. @Korablev77 can you please check it?

@Korablev77
Copy link
Contributor

Well, p.3 really works (at least now, even if it is unlikely to be the most elegant solution).

Also, I can't reproduce/notice memory leaks, if box_insert fails itself or any routine before (with fiber_gc() in vdbeHalt()), even if I add goto abort_due_to_error in different places among opcodes handling routine (what ordinary happens when smth goes wrong).

As far as I know, we do not call box_txn_begin on autocommit statements?

It seems that we do call:

case OP_TTransaction: {
	if (p->autoCommit) {
		rc = box_txn_begin() == 0 ? SQLITE_OK : SQL_TARANTOOL_ERROR;
	}

And all inserts/deletes/updates start execution from OP_TTransaction.

Korablev77 added a commit that referenced this issue Feb 28, 2018
When ticket #3021 was implemented, alongside with it leak of region
memory was introduced. It occured due to allocation memory for tuples at
region at once (previously, it was allocated using malloc and before
insertion to space it was copied to the region).  However, if space is
ephemeral, it has nothing in common with txn routine. As a result,
region memory for tuples to be inserted in ephemeral spaces isn't pinned
to any txn and isn't freed by txn_commit() or txn_rollback(). To avoid
this leak, now  all used region memory is released in vdbe's halt
routine.

Closes #3199
@Khatskevich
Copy link
Contributor

Is it really true, that it is hard to guess whether you are inserting to ephemeral space or not at compile time?
I insist that if the final solution, for now, is 3, then the new high priority issue is created for further investigations. (Because we are dealing with the most frequent opcode)

@Korablev77
Copy link
Contributor

Is it really true, that it is hard to guess whether you are inserting to ephemeral space or not at compile time?

I would say yes. Anyway, you are able to investigate it yourself.

@Gerold103
Copy link
Collaborator Author

Gerold103 commented Mar 1, 2018 via email

@kyukhin kyukhin closed this as completed in 2f40023 Mar 7, 2018
ImeevMA added a commit that referenced this issue Nov 17, 2018
This probleam appeared because region was cleaned twice: once in
sqlite3VdbeHalt() and once in sqlite3VdbeDelete() which was
executed during sqlite3_finalize(). Autogenerated ids that were
saved there, were fetched after sqlite3VdbeHalt() and before
sqlite3_finalize(). In this patch region cleaning in
sqlite3VdbeHalt() were removed.

Needed for #3505
Follow up #2618
Follow up #3199
Gerold103 pushed a commit that referenced this issue Nov 19, 2018
This probleam appeared because region was cleaned twice: once in
sqlite3VdbeHalt() and once in sqlite3VdbeDelete() which was
executed during sqlite3_finalize(). Autogenerated ids that were
saved there, were fetched after sqlite3VdbeHalt() and before
sqlite3_finalize(). In this patch region cleaning in
sqlite3VdbeHalt() were removed.

Follow up #2618
Follow up #3199
ImeevMA added a commit that referenced this issue Nov 19, 2018
This probleam appeared because region was cleaned twice: once in
sqlite3VdbeHalt() and once in sqlite3VdbeDelete() which was
executed during sqlite3_finalize(). Autogenerated ids that were
saved there, were fetched after sqlite3VdbeHalt() and before
sqlite3_finalize(). In this patch region cleaning in
sqlite3VdbeHalt() were removed.

Follow up #2618
Follow up #3199
ImeevMA added a commit that referenced this issue Nov 21, 2018
This probleam appeared because region was cleaned twice: once in
sqlite3VdbeHalt() and once in sqlite3VdbeDelete() which was
executed during sqlite3_finalize(). Autogenerated ids that were
saved there, were fetched after sqlite3VdbeHalt() and before
sqlite3_finalize(). In this patch region cleaning in
sqlite3VdbeHalt() were removed.

Follow up #2618
Follow up #3199
ImeevMA added a commit that referenced this issue Nov 21, 2018
Too many autogenerated ids leads to SEGFAULT. This problem
appeared because region was cleaned twice: once in
sqlite3VdbeHalt() and once in sqlite3VdbeDelete() which was
executed during sqlite3_finalize(). Autogenerated ids that were
saved there, were fetched after sqlite3VdbeHalt() and before
sqlite3_finalize(). In this patch region cleaning in
sqlite3VdbeHalt() has been removed.

Follow up #2618
Follow up #3199
kyukhin pushed a commit that referenced this issue Nov 27, 2018
Too many autogenerated ids leads to SEGFAULT. This problem
appeared because region was cleaned twice: once in
sqlite3VdbeHalt() and once in sqlite3VdbeDelete() which was
executed during sqlite3_finalize(). Autogenerated ids that were
saved there, were fetched after sqlite3VdbeHalt() and before
sqlite3_finalize(). In this patch region cleaning in
sqlite3VdbeHalt() has been removed.

Follow up #2618
Follow up #3199
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working sql
Projects
None yet
Development

No branches or pull requests

4 participants