-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
task(Observability): Introduce prometheus-flask-exporter and enable monitoring for presigned urls #864
task(Observability): Introduce prometheus-flask-exporter and enable monitoring for presigned urls #864
Conversation
…onitoring for PreSigned URLs
The style in this PR agrees with This formatting comment was generated automatically by a script in uc-cdis/wool. |
Pull Request Test Coverage Report for Build 10942
💛 - Coveralls |
@@ -0,0 +1,7 @@ | |||
#!/bin/bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comments to explain what this script does and why we need it would be useful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. Let me push a commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the work in this script be done in the Dockerfile?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This decision was based on research around best practices for prometheus + uwsgi monitoring:
https://www.metricfire.com/blog/monitoring-python-web-app/
Co-authored-by: Pauline Ribeyre <ribeyre@uchicago.edu>
…of github.com:uc-cdis/fence into chore/introduce_prometheus_counter_for_presigned_urls
…om travis config" This reverts commit af42e0f.
fence/__init__.py
Outdated
from werkzeug.middleware.dispatcher import DispatcherMiddleware | ||
|
||
# This MUST be declared before the multiprocess lib is imported/initialized | ||
# to unblock unit testing without having to explicitly declare the env. variable | ||
# More details on this awkwardness: https://github.com/prometheus/client_python/issues/250 | ||
tmp_dir = tempfile.TemporaryDirectory() | ||
os.environ["prometheus_multiproc_dir"] = tmp_dir.name | ||
|
||
from prometheus_client import CollectorRegistry, multiprocess, make_wsgi_app | ||
from prometheus_flask_exporter.multiprocess import UWsgiPrometheusMetrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this our standard process? We load prometheus in the init and wire it up below? I haven't seen how we do this yet so wasn't sure if this is a new endeavor or just repeating a common pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a new endeavor.
As far I have researched, this is the best way to accommodate a single point of configuration to track metrics around our RESTful API endpoints across all the UWSGI workers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't used prometheus at all, let alone at our org, but rather than coupling it with the code, could it trigger by watching the nginx logs?
https://docs.nginx.com/nginx-ingress-controller/logging-and-monitoring/prometheus/
I think this kind of solution might be nice in that it's agnostic from the code, and could be used to emit metrics for multiple types of services rather than just fence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took care of the nginx part already with some prometheus-exporter side card containers:
https://github.com/uc-cdis/cloud-automation/pull/1372/files
Now the next step is to introduce this new interface for us, so we can provide observability for our own functions.
(as in, not being agnostic from the code, but making observability blended into our code).
The presigned url is just the beginning.
imagine all the cool metrics we could observe to get better insights about our end-users' interactions:
https://github.com/uc-cdis/mfence/blob/master/src/mfence/blueprints/other_metrics.py#L16
@@ -1,6 +1,6 @@ | |||
[tool.poetry] | |||
name = "fence" | |||
version = "4.22.0" | |||
version = "4.29.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be 4.23.0
? Not sure why this is bumping up to 4.29.0
from 4.22.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the pyproject.toml
versioning has not been updated.
Please check the current semantic versioning sequence in https://github.com/uc-cdis/fence/releases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Thanks for updating that!
@themarcelor i cherry-picked the 3 commits into your branch as we discussed |
…of github.com:uc-cdis/fence into chore/introduce_prometheus_counter_for_presigned_urls
…le id (guid) to avoid overloading prometheus
…of github.com:uc-cdis/fence into chore/introduce_prometheus_counter_for_presigned_urls
…f the file id (guid) to avoid overloading prometheus" This reverts commit db9af96.
@@ -270,6 +279,10 @@ def app_config( | |||
_load_keys(app, root_dir) | |||
_set_authlib_cfgs(app) | |||
|
|||
app.prometheus_counters = {} | |||
# TODO: if prometheus is disabled in config, do not setup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@themarcelor @williamhaley Prometheus can be enabled by default, but should we add a setting to the fence config file? would the ability to disable it be useful for Gen3 systems which are not managed by us, or maybe compose-services?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would we ever want to disable observability? 🤷🏼♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We’re not sending metrics to prometheus, Prometheus fetches the metrics, so the absence of prometheus is completely harmless
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Jira Ticket: PXP-7422
Exporting metrics from Fence so we can observe events in our Grafana dashboards.
Result:
Enabling uwsgi stats http server.
Adding a prometheus Counter to the Presigned URL function.
Tracking the username and the file_id for each presigned url request.
Additional info:
Running unit tests:
This change is linked with uc-cdis/cloud-automation#1348.
This change is blocked by: uc-cdis/cloud-automation#1498
New Features