sql: region leaks on select #3199

Gerold103 · 2018-02-27T11:49:57Z

box.cfg{}
box.sql.execute('CREATE TABLE test (id PRIMARY KEY, a INTEGER, b INTEGER)')
box.sql.execute('INSERT INTO test VALUES (1, 1, 1), (2, 2, 2)')
box.sql.execute('SELECT a, b, a + b FROM test ORDER BY b')

Before SELECT set a breakpoint to region_alloc and in it see region_used. Run SELECT multiple times - region_used increases. It is region leak.

The text was updated successfully, but these errors were encountered:

Gerold103 · 2018-02-27T15:38:31Z

I have a test case:

tarantool> box.sql.execute('SELECT a, b, a + b FROM test ORDER BY b')
---
- - [1, 1, 2]
  - [2, 2, 4]
...

tarantool> fiber.info()[fiber.self().id()].memory
---
- total: 61744
  used: 82
...

tarantool> box.sql.execute('SELECT a, b, a + b FROM test ORDER BY b')
---
- - [1, 1, 2]
  - [2, 2, 4]
...

tarantool> fiber.info()[fiber.self().id()].memory
---
- total: 61744
  used: 164
...

tarantool> box.sql.execute('SELECT a, b, a + b FROM test ORDER BY b')
---
- - [1, 1, 2]
  - [2, 2, 4]
...

tarantool> fiber.info()[fiber.self().id()].memory
---
- total: 61744
  used: 246
...

tarantool> box.sql.execute('SELECT a, b, a + b FROM test ORDER BY b')
---
- - [1, 1, 2]
  - [2, 2, 4]
...

tarantool> fiber.info()[fiber.self().id()].memory
---
- total: 61744
  used: 328
...

As it seen, fiber uses more and more memory.

Gerold103 · 2018-02-27T15:42:16Z

The same is actual for iproto - it uses the same SQL API as Lua.

Korablev77 · 2018-02-27T21:30:17Z

It turns out that bug was introduced by this commit: a706e97
Also, see original ticket: #3021

The idea to allocate memory on region while making record firstly seemed to be reasonable (moreover, it has passed through review and got to trunk). However, ephemeral spaces have nothing in common with txn routine. Thus, txn_commit() won't release memory which is allocated on region to hold tuples to be inserted into ephemeral spaces. So, for ephemeral spaces we MUST (and it is not kind of optimisation as we thought before) allocate memory for records using ordinary malloc.

I can suggest following solutions:

Attempt at pointing out to which table (ephemeral or not) we are making record at the compilation time by setting flag in fifth register of OP_MakeRecord. It may be not so easy, since at compilation stage we are operating only with cursor numbers, so I am not sure that we always can differ them.
Passing cursor number (from which we are able to fetch space id) in the fifth register to OP_MakeRecord (almost like p.1, but we don't have to guess whether cursor with given number is ephemeral or not). On the other hand, it leads to additional moves at runtime (while executing this opcode). Meanwhile, it seems to be very hot and should be optimized as much as possible.
Brutally release all region memory after vdbe's halt. (Actually, it sounds like a nonsense)
Simply revert this patch. Without it allocation will always be using malloc, but before insertion to ordinary space tuple will be copied to region.
Contrary to p.4, always allocate on region, but before insertion to ephemeral space, tuple will be copied to malloc.

Gerold103 · 2018-02-27T22:00:09Z

I like 3, because it allows to move more memory allocations to a region. For example, column metadata. Halt must in such a case do fiber_gc(), if there is no active transaction. But this does not work, when SQL iterators will be introduced. In any case, I very do not like, that now VDBE has many members on runtime memory.
At the second priority I like 1.

And there is a second problem with region: even when it is for insert, it can leak, if VDBE fails before box_insert. As far as I know, we do not call box_txn_begin on autocommit statements? If we do not, then there is no box_txn_rollback, and a region leaks. @Korablev77 can you please check it?

Korablev77 · 2018-02-28T11:09:52Z

Well, p.3 really works (at least now, even if it is unlikely to be the most elegant solution).

Also, I can't reproduce/notice memory leaks, if box_insert fails itself or any routine before (with fiber_gc() in vdbeHalt()), even if I add goto abort_due_to_error in different places among opcodes handling routine (what ordinary happens when smth goes wrong).

As far as I know, we do not call box_txn_begin on autocommit statements?

It seems that we do call:

case OP_TTransaction: {
	if (p->autoCommit) {
		rc = box_txn_begin() == 0 ? SQLITE_OK : SQL_TARANTOOL_ERROR;
	}

And all inserts/deletes/updates start execution from OP_TTransaction.

When ticket #3021 was implemented, alongside with it leak of region memory was introduced. It occured due to allocation memory for tuples at region at once (previously, it was allocated using malloc and before insertion to space it was copied to the region). However, if space is ephemeral, it has nothing in common with txn routine. As a result, region memory for tuples to be inserted in ephemeral spaces isn't pinned to any txn and isn't freed by txn_commit() or txn_rollback(). To avoid this leak, now all used region memory is released in vdbe's halt routine. Closes #3199

Khatskevich · 2018-02-28T22:44:59Z

Is it really true, that it is hard to guess whether you are inserting to ephemeral space or not at compile time?
I insist that if the final solution, for now, is 3, then the new high priority issue is created for further investigations. (Because we are dealing with the most frequent opcode)

Korablev77 · 2018-02-28T23:43:20Z

Is it really true, that it is hard to guess whether you are inserting to ephemeral space or not at compile time?

I would say yes. Anyway, you are able to investigate it yourself.

Gerold103 · 2018-03-01T00:18:31Z

"Zabey, vse ravno ne peresporish"

…

1 марта 2018 г., в 2:43, Nikita Pettik ***@***.***> написал(а): Is it really true, that it is hard to guess whether you are inserting to ephemeral space or not at compile time? I would say yes. Anyway, you are able to investigate it yourself. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#3199 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AE_WK7WiqhDZLAKJPI4BV4aTQNeElZWiks5tZeSbgaJpZM4SUzSo>.

This probleam appeared because region was cleaned twice: once in sqlite3VdbeHalt() and once in sqlite3VdbeDelete() which was executed during sqlite3_finalize(). Autogenerated ids that were saved there, were fetched after sqlite3VdbeHalt() and before sqlite3_finalize(). In this patch region cleaning in sqlite3VdbeHalt() were removed. Needed for #3505 Follow up #2618 Follow up #3199

This probleam appeared because region was cleaned twice: once in sqlite3VdbeHalt() and once in sqlite3VdbeDelete() which was executed during sqlite3_finalize(). Autogenerated ids that were saved there, were fetched after sqlite3VdbeHalt() and before sqlite3_finalize(). In this patch region cleaning in sqlite3VdbeHalt() were removed. Follow up #2618 Follow up #3199

Too many autogenerated ids leads to SEGFAULT. This problem appeared because region was cleaned twice: once in sqlite3VdbeHalt() and once in sqlite3VdbeDelete() which was executed during sqlite3_finalize(). Autogenerated ids that were saved there, were fetched after sqlite3VdbeHalt() and before sqlite3_finalize(). In this patch region cleaning in sqlite3VdbeHalt() has been removed. Follow up #2618 Follow up #3199

Gerold103 added bug Something isn't working sql labels Feb 27, 2018

kyukhin assigned Khatskevich Feb 27, 2018

kyukhin added this to the 2.1.0 milestone Feb 27, 2018

kyukhin assigned Korablev77 Feb 28, 2018

kyukhin closed this as completed in 2f40023 Mar 7, 2018

Khatskevich mentioned this issue Mar 7, 2018

Allocate memory for ephemeral spaces with malloc and on region for real spaces #3218

Open

Korablev77 mentioned this issue Oct 9, 2018

sql: fix sql/gh-3199-no-mem-leaks.test.lua #3732

Closed

ImeevMA mentioned this issue May 7, 2019

sql: test sql/gh-3199-no-mem-leaks.test.lua fails sometime #4198

Closed

Totktonada mentioned this issue Jan 29, 2020

test: fiber's region leak check is unstable #4750

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: region leaks on select #3199

sql: region leaks on select #3199

Gerold103 commented Feb 27, 2018

Gerold103 commented Feb 27, 2018

Gerold103 commented Feb 27, 2018

Korablev77 commented Feb 27, 2018

Gerold103 commented Feb 27, 2018

Korablev77 commented Feb 28, 2018

Khatskevich commented Feb 28, 2018

Korablev77 commented Feb 28, 2018

Gerold103 commented Mar 1, 2018 via email

sql: region leaks on select #3199

sql: region leaks on select #3199

Comments

Gerold103 commented Feb 27, 2018

Gerold103 commented Feb 27, 2018

Gerold103 commented Feb 27, 2018

Korablev77 commented Feb 27, 2018

Gerold103 commented Feb 27, 2018

Korablev77 commented Feb 28, 2018

Khatskevich commented Feb 28, 2018

Korablev77 commented Feb 28, 2018

Gerold103 commented Mar 1, 2018 via email