
Finished queries continue using memory pool on worker #11030

Closed
sshkvar opened this issue Feb 14, 2022 · 14 comments

Comments

@sshkvar
Contributor

sshkvar commented Feb 14, 2022

Hi, after updating to Trino 367 we found that finished queries don't release their memory pool reservations on workers.

As a result we have a lot of queries which have actually finished but still exist in the workers' memory pool.
The chart below shows the number of such queries and how it grows after a cluster restart:

[chart: number of finished queries still holding worker memory pool, growing after cluster restarts]

As you can see, we have 12,000 queries which have actually finished but are still using the memory pool on workers.
Let's look at one worker as an example:
[screenshot: worker memory pool view]

On this worker we have 166 running queries, but 98 of them have already finished and are still using the memory pool.
For example 20220210_093944_48617_446ev: when I click on this query and open it in a new tab, I see "Query not found".

[screenshot: "Query not found" page]

@losipiuk
Member

@sshkvar thanks for the report.
Can you provide some more information about the queries you are running?

  • What connectors do you use?
  • Can you quote an example problematic query?
  • Is the problem always reproducible for a query of a given shape, or is it non-deterministic?

@sshkvar
Contributor Author

sshkvar commented Feb 14, 2022

@losipiuk thanks for the quick reply,

These queries are pretty different, e.g. a simple select ... from ... or a create table ... as select, and run against different catalogs (connectors), in our case mysql, raptor, and potentially iceberg.
The problem isn't always reproducible; it occurs randomly.
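
For illustration, the query shapes were roughly like the following; the catalog, schema, and table names here are made up, not the actual queries:

  -- Illustrative sketches of the query shapes involved; schema and table
  -- names are hypothetical.
  SELECT id, name
  FROM mysql.shop.customers
  WHERE created_at > DATE '2022-01-01';

  CREATE TABLE raptor.analytics.daily_orders AS
  SELECT order_date, count(*) AS orders
  FROM mysql.shop.orders
  GROUP BY order_date;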

@davseitsev

Small update: all stuck queries have a reservation only in the ExchangeOperator:

  "queryMemoryAllocations": {
    "20220216_125714_10727_uzgkx": [
      {
        "tag": "ExchangeOperator",
        "allocation": 13218
      }
    ],
....

The amount of reserved memory is pretty small, up to a few megabytes.
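
For reference, one way to watch for such lingering reservations from SQL might be through the JMX connector. This is only a sketch: it assumes a catalog named jmx is configured and that the general memory pool MBean is exported as trino.memory:type=MemoryPool,name=general, which can differ between versions and deployments.

  -- Hedged sketch: list per-node general memory pool statistics via the JMX
  -- connector. Both the "jmx" catalog name and the MBean name below are
  -- assumptions and may differ in your Trino version.
  SELECT *
  FROM jmx.current."trino.memory:type=memorypool,name=general";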

@losipiuk
Member

@arhimondr is this a manifestation of #10950, or a different thing?

@losipiuk
Member

And @sshkvar what Trino version did you use before 367?

@sshkvar
Contributor Author

sshkvar commented Feb 17, 2022

And @sshkvar what Trino version did you use before 367?

We used Trino (Presto) 346.

@losipiuk
Member

And you are not using query/task-based retries, which were introduced in 367?

@sshkvar
Contributor Author

sshkvar commented Feb 17, 2022

No, we are not using retries. As far as I know, they are disabled by default.

@losipiuk
Member

No, we are not using retries. As far as I know, they are disabled by default.

Yeah.

Tricky - did you maybe have a chance to take a look at the logs? Any errors/stack traces logged?

@arhimondr
Contributor

@sshkvar Do you know if the queries you see holding memory reservations finished successfully or failed?

@sshkvar
Contributor Author

sshkvar commented Feb 17, 2022

Tricky - did you maybe have a chance to take a look at the logs? Any errors/stack traces logged?

We didn't find any errors.

@sshkvar Do you know if the queries you see holding memory reservations finished successfully or failed?

The queries finished successfully.

@arhimondr
Contributor

arhimondr commented Feb 17, 2022

@sshkvar Do your queries contain a LIMIT operation?

We've found a race condition that may result in memory not being released properly: #11088. But that shouldn't happen unless you have a LIMIT operation after an exchange, for example a LIMIT after a JOIN.

Also, the likelihood of this problem occurring should be fairly low. But since you are running thousands of queries, it may happen that some of them trigger the race condition mentioned.
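
For illustration, a query of roughly this shape (the table names are hypothetical) has a LIMIT above the exchange that follows the join:

  -- Hypothetical example of the affected shape: a LIMIT applied above the
  -- exchange following a JOIN, which could hit the race described in #11088.
  SELECT o.id, c.name
  FROM mysql.shop.orders o
  JOIN mysql.shop.customers c ON o.customer_id = c.id
  LIMIT 100;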

@arhimondr
Contributor

We should also improve our integration tests to verify that memory pools are empty upon completion: #11093

@losipiuk
Member

Should be fixed by #11088
