Prevent Alfred Telemetry Prometheus endpoint threads from stacking up #70

Closed
thijslemmens opened this issue Feb 16, 2021 · 4 comments · Fixed by #100

Comments

@thijslemmens
Contributor

A client had an issue where fetching some metrics became very slow, slower than the Prometheus scrape interval. After a while, the HTTP threads were exhausted.

Possible solution: wrap the fetching of metrics in a separate thread that gets killed when it takes too long, returning an error message.

@thijslemmens
Contributor Author

thijslemmens commented May 12, 2021

Or wrapping in a Future with a timeout:
https://stackoverflow.com/questions/2275443/how-to-timeout-a-thread

We just have to check whether that timeout would also stop the underlying query.
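
A minimal sketch of the Future-with-timeout idea, not the actual Alfred Telemetry code. The class name `MetricsTimeoutSketch`, the method `scrapeWithTimeout`, the error strings, and the `Callable<String>` standing in for the real metrics collection are all hypothetical. It also illustrates the open question: `cancel(true)` only interrupts the worker thread, so a query that ignores interruption keeps running.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class MetricsTimeoutSketch {

    // Daemon worker threads, so a stuck collection cannot keep the JVM alive.
    private static final ExecutorService WORKERS = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "metrics-collector");
        t.setDaemon(true);
        return t;
    });

    /** Runs the (hypothetical) collector and gives up after timeoutMillis. */
    public static String scrapeWithTimeout(Callable<String> collector, long timeoutMillis) {
        Future<String> result = WORKERS.submit(collector);
        try {
            return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // cancel(true) interrupts the worker thread; it does not forcibly
            // stop work that never checks for interruption.
            result.cancel(true);
            return "ERROR: metrics collection timed out";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            result.cancel(true);
            return "ERROR: interrupted while waiting for metrics";
        } catch (ExecutionException e) {
            return "ERROR: " + e.getCause();
        }
    }

    public static void main(String[] args) {
        // Fast collector: finishes well inside the timeout.
        System.out.println(scrapeWithTimeout(() -> "jvm_threads 42", 500));
        // Slow collector: the caller gets an error after roughly 100 ms.
        System.out.println(scrapeWithTimeout(() -> {
            Thread.sleep(2_000);
            return "late";
        }, 100));
    }
}
```

The HTTP thread is released after the timeout, but if the underlying query does not react to interruption, the worker threads could stack up in the pool instead.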

@kerkhofsd
Contributor

My main concern is: how are we going to define this timeout?

This needs to match exactly with the Prometheus poll interval? If it's less, Prometheus requests might be aborted. If it's more, threads might still stack up?

@thijslemmens
Contributor Author

> My main concern is: how are we going to define this timeout?
>
> This needs to match exactly with the Prometheus poll interval? If it's less, Prometheus requests might be aborted. If it's more, threads might still stack up?

You are right. A timeout could be an "extra". Controlling concurrency is probably the more structural fix. We can use a lock:
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/ReentrantLock.html

There is a tryLock() method that returns false immediately if another thread holds the lock. In that case Telemetry can return an error code. This way you can limit Telemetry to a single active scrape thread at a time. Would 'one' be acceptable?
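
A rough sketch of the `tryLock()` approach, assuming a hypothetical `SingleScrapeGuard` wrapper around the metrics collection. The class name, the `Supplier<String>` collector, and the error string are illustrative only; how the rejection maps to an HTTP error code is left open here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

class SingleScrapeGuard {

    private final ReentrantLock lock = new ReentrantLock();

    /** Returns the collector's output, or an error marker if another thread is scraping. */
    public String scrape(Supplier<String> collector) {
        // tryLock() does not block: it returns false right away when
        // another thread currently holds the lock.
        if (!lock.tryLock()) {
            return "ERROR: scrape already in progress";
        }
        try {
            return collector.get();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SingleScrapeGuard guard = new SingleScrapeGuard();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Simulate a slow scrape that holds the lock until released.
        Thread slow = new Thread(() -> guard.scrape(() -> {
            started.countDown();
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "slow metrics";
        }));
        slow.start();
        started.await();

        System.out.println(guard.scrape(() -> "fast metrics")); // ERROR: scrape already in progress
        release.countDown();
        slow.join();
        System.out.println(guard.scrape(() -> "fast metrics")); // fast metrics
    }
}
```

With this guard, a second Prometheus request that arrives during a slow scrape fails fast instead of occupying another HTTP thread. If more than one concurrent scrape turned out to be acceptable, a `java.util.concurrent.Semaphore` with `tryAcquire()` could play the same role with a higher limit.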
