
Finished queries continue using memory pool on worker #11030

Closed
sshkvar opened this issue Feb 14, 2022 · 14 comments

Comments

@sshkvar
Contributor

sshkvar commented Feb 14, 2022

Hi, after updating to Trino 367 we found that finished queries don't release their memory pool reservations on workers.

As a result we have a lot of queries which have actually finished but still exist in the workers' memory pool.
The chart below shows the number of such queries and how it grows after a cluster restart:

[chart: number of finished queries still holding worker memory pool, growing after cluster restarts]

As you can see, we have 12,000 queries which have actually finished but are still using the memory pool on workers.
Let's look at one worker as an example:
[screenshot: worker memory pool view]

On this worker we have 166 running queries, but 98 of them have already finished and are still using the memory pool.
For example 20220210_093944_48617_446ev: when I click on this query and open it in a new tab, I see "Query not found".

[screenshot: "Query not found" page]

@losipiuk
Member

@sshkvar thanks for the report.
Can you provide some more information about the queries you are running?

  • What connectors do you use?
  • Can you quote an example problematic query?
  • Is the problem always reproducible for a query of a given shape, or is it non-deterministic?

@sshkvar
Contributor Author

sshkvar commented Feb 14, 2022

@losipiuk thanks for the quick reply,

These queries are pretty different, e.g. a simple select ... from ... or a create table ... as select, and run against different catalogs (connectors), in our case mysql, raptor, and potentially iceberg.
The problem isn't always reproducible; it occurs randomly.
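
For illustration, the query shapes were roughly like the following; the catalog, schema, and table names here are made up, not the actual queries:

  -- Illustrative sketches of the query shapes involved; schema and table
  -- names are hypothetical.
  SELECT id, name
  FROM mysql.shop.customers
  WHERE created_at > DATE '2022-01-01';

  CREATE TABLE raptor.analytics.daily_orders AS
  SELECT order_date, count(*) AS orders
  FROM mysql.shop.orders
  GROUP BY order_date;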

@davseitsev

Small update: all stuck queries have a reservation only in the ExchangeOperator:

  "queryMemoryAllocations": {
    "20220216_125714_10727_uzgkx": [
      {
        "tag": "ExchangeOperator",
        "allocation": 13218
      }
    ],
....

The amount of reserved memory is pretty small, up to a few megabytes.
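
For reference, one way to watch for such lingering reservations from SQL might be through the JMX connector. This is only a sketch: it assumes a catalog named jmx is configured and that the general memory pool MBean is exported as trino.memory:type=MemoryPool,name=general, which can differ between versions and deployments.

  -- Hedged sketch: list per-node general memory pool statistics via the JMX
  -- connector. Both the "jmx" catalog name and the MBean name below are
  -- assumptions and may differ in your Trino version.
  SELECT *
  FROM jmx.current."trino.memory:type=memorypool,name=general";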

@losipiuk
Member

@arhimondr is this a manifestation of #10950, or a different thing?

@losipiuk
Member

And @sshkvar what Trino version did you use before 367?

@sshkvar
Contributor Author

sshkvar commented Feb 17, 2022

And @sshkvar what Trino version did you use before 367?

We used Trino (Presto) 346.

@losipiuk
Member

And you are not using query/task-based retries, which were introduced in 367?

@sshkvar
Contributor Author

sshkvar commented Feb 17, 2022

No, we are not using retries. As far as I know, they are disabled by default.

@losipiuk
Member

No, we are not using retries. As far as I know, they are disabled by default.

Yeah.

Tricky - did you maybe have a chance to take a look at the logs? Any errors/stack traces logged?

@arhimondr
Contributor

@sshkvar Do you know if the queries you see holding memory reservations finished successfully or failed?

@sshkvar
Contributor Author

sshkvar commented Feb 17, 2022

Tricky - did you maybe have a chance to take a look at the logs? Any errors/stack traces logged?

We didn't find any errors.

@sshkvar Do you know if the queries you see holding memory reservations finished successfully or failed?

The queries finished successfully.

@arhimondr
Contributor

arhimondr commented Feb 17, 2022

@sshkvar Do your queries contain a LIMIT operation?

We've found a race condition that may result in memory not being released properly: #11088. But that shouldn't happen unless you have a LIMIT operation after an exchange, for example a LIMIT after a JOIN.

Also, the likelihood of this problem occurring should be fairly low. But since you are running thousands of queries, it may happen that some of them trigger the race condition mentioned.
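
For illustration, a query of roughly this shape (the table names are hypothetical) has a LIMIT above the exchange that follows the join:

  -- Hypothetical example of the affected shape: a LIMIT applied above the
  -- exchange following a JOIN, which could hit the race described in #11088.
  SELECT o.id, c.name
  FROM mysql.shop.orders o
  JOIN mysql.shop.customers c ON o.customer_id = c.id
  LIMIT 100;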

@arhimondr
Contributor

We should also improve our integration tests to verify that memory pools are empty upon completion: #11093

@losipiuk
Member

Should be fixed by #11088
