task(Observability): Introduce prometheus-flask-exporter and enable monitoring for presigned urls #864

Merged

@themarcelor themarcelor commented Jan 21, 2021

Jira Ticket: PXP-7422

Exporting metrics from Fence so we can observe events in our Grafana dashboards.

Result:

qa-dcp@dev_admin:~$ kc exec -it presigned-url-fence-deployment-664d476489-vt6m4 -- curl http://127.0.0.1/metrics | tail -n3
# HELP pre_signed_url_req_total Multiprocess metric
# TYPE pre_signed_url_req_total counter
pre_signed_url_req_total{file_id="b9c368cd-d681-4d42-b9d1-1778c41fc1f7",username="marcelo@email.com"} 4.0
  1. Adding new dependencies:
prometheus_client = "^0.5.0"
prometheus-flask-exporter = "^0.18.1"
  2. Enabling the uwsgi stats HTTP server.

  3. Adding a Prometheus Counter to the presigned URL function,
    tracking the username and the file_id for each presigned URL request.
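A minimal sketch of the counter step above, assuming the `prometheus_client` package is installed. The metric name and labels mirror the sample output earlier in this description; the helper `record_presigned_url_request` and the private registry are hypothetical, used only to keep the example self-contained, and are not taken from the Fence code.

```python
from prometheus_client import CollectorRegistry, Counter, generate_latest

# A private registry keeps this sketch isolated from the process-wide default.
registry = CollectorRegistry()

# prometheus_client exposes counter samples with a "_total" suffix,
# so this shows up as pre_signed_url_req_total when scraped.
presigned_url_counter = Counter(
    "pre_signed_url_req",
    "Number of presigned URL requests",
    ["username", "file_id"],
    registry=registry,
)


def record_presigned_url_request(username: str, file_id: str) -> None:
    """Hypothetical helper: count one presigned URL request."""
    presigned_url_counter.labels(username=username, file_id=file_id).inc()


if __name__ == "__main__":
    record_presigned_url_request(
        "marcelo@email.com", "b9c368cd-d681-4d42-b9d1-1778c41fc1f7"
    )
    # Render the registry in the Prometheus text exposition format.
    print(generate_latest(registry).decode())
```

Each distinct (username, file_id) pair becomes its own time series, so the number of series grows with the number of users and files requested.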

Additional info:

  • Disk space usage is negligible. The metrics and labels stay the same; the only thing that changes is the value of each metric (counters, gauges, histograms):
$ kc exec -it presigned-url-fence-deployment-594bc9f4cb-pf4d7 -- bash
...
bash-4.4# du -sh /var/www/metrics/metrics.txt
16.0K	/var/www/metrics/metrics.txt
bash-4.4# du -sh /var/tmp/uwsgi_flask_metrics/
56.0K	/var/tmp/uwsgi_flask_metrics/

Running unit tests:

fence [chore/introduce_prometheus_counter_for_presigned_urls●] % poetry run pytest -vv tests/data/test_data.py::test_indexd_download_file

This change is linked with uc-cdis/cloud-automation#1348.
This change is blocked by: uc-cdis/cloud-automation#1498

New Features

  • Exporting metrics from Fence so we can observe events in our Grafana dashboards.

@github-actions

The style in this PR agrees with black. ✔️

This formatting comment was generated automatically by a script in uc-cdis/wool.

@coveralls

coveralls commented Jan 21, 2021

Pull Request Test Coverage Report for Build 10942

  • 18 of 18 (100.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.06%) to 70.498%

Totals Coverage Status
Change from base Build 10885: 0.06%
Covered Lines: 5955
Relevant Lines: 8447

💛 - Coveralls

@@ -0,0 +1,7 @@
#!/bin/bash
Contributor

Comments explaining what this script does and why we need it would be useful.

Contributor Author

Good call. Let me push a commit.

Contributor

Could the work in this script be done in the Dockerfile?

Contributor Author

This decision was based on research around best practices for prometheus + uwsgi monitoring:
https://www.metricfire.com/blog/monitoring-python-web-app/

themarcelor and others added 2 commits January 23, 2021 01:12
Co-authored-by: Pauline Ribeyre <ribeyre@uchicago.edu>
@paulineribeyre paulineribeyre requested review from paulineribeyre and removed request for frickjack and itsJiaqi May 12, 2021 19:50
Comment on lines 11 to 20
from werkzeug.middleware.dispatcher import DispatcherMiddleware

# This MUST be declared before the multiprocess lib is imported/initialized
# to unblock unit testing without having to explicitly declare the env. variable
# More details on this awkwardness: https://github.com/prometheus/client_python/issues/250
tmp_dir = tempfile.TemporaryDirectory()
os.environ["prometheus_multiproc_dir"] = tmp_dir.name

from prometheus_client import CollectorRegistry, multiprocess, make_wsgi_app
from prometheus_flask_exporter.multiprocess import UWsgiPrometheusMetrics
Contributor

Is this our standard process? We load prometheus in the init and wire it up below? I haven't seen how we do this yet so wasn't sure if this is a new endeavor or just repeating a common pattern.

Contributor Author

It is a new endeavor.
As far as I have researched, this is the best way to provide a single point of configuration for tracking metrics around our RESTful API endpoints across all the uWSGI workers.
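A rough sketch of what aggregating metrics across workers can look like with `prometheus_client` alone, assuming that package is installed. It is not the code from this PR (which uses `UWsgiPrometheusMetrics`); the metric `demo_requests` and the helper `scrape` are hypothetical. Older client releases read the lowercase environment variable and newer ones the uppercase one, so the sketch sets both.

```python
import os
import tempfile
from wsgiref.util import setup_testing_defaults

# The multiprocess directory must be configured BEFORE prometheus_client
# is imported, otherwise the library stays in single-process mode.
_metrics_dir = tempfile.mkdtemp()
os.environ["prometheus_multiproc_dir"] = _metrics_dir
os.environ["PROMETHEUS_MULTIPROC_DIR"] = _metrics_dir

from prometheus_client import CollectorRegistry, Counter, make_wsgi_app, multiprocess

# In multiprocess mode every worker writes its samples to files in the
# shared directory; this collector merges them at scrape time.
registry = CollectorRegistry()
multiprocess.MultiProcessCollector(registry)
metrics_app = make_wsgi_app(registry)

# Hypothetical metric, registered on the default registry as usual.
requests_seen = Counter("demo_requests", "Requests handled by any worker")


def scrape() -> str:
    """Call the WSGI metrics app directly, as a scraper would over HTTP."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = "/metrics"
    body = metrics_app(environ, lambda status, headers: None)
    return b"".join(body).decode()


if __name__ == "__main__":
    requests_seen.inc()
    print(scrape())
```

In a real deployment the WSGI metrics app would be mounted next to the Flask app (for example with `DispatcherMiddleware`) so that one `/metrics` endpoint reports the merged values from every worker.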

Contributor

I haven't used prometheus at all, let alone at our org, but rather than coupling it with the code, could it trigger by watching the nginx logs?

https://docs.nginx.com/nginx-ingress-controller/logging-and-monitoring/prometheus/

I think this kind of solution might be nice in that it's agnostic from the code, and could be used to emit metrics for multiple types of services rather than just fence.

Contributor Author

I took care of the nginx part already with some prometheus-exporter sidecar containers:
https://github.com/uc-cdis/cloud-automation/pull/1372/files

Now the next step is to introduce this new interface so we can provide observability for our own functions
(that is, rather than staying agnostic of the code, building observability into it).
The presigned URL is just the beginning.

Imagine all the cool metrics we could observe to gain better insight into our end users' interactions:
https://github.com/uc-cdis/mfence/blob/master/src/mfence/blueprints/other_metrics.py#L16

@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "fence"
-version = "4.22.0"
+version = "4.29.0"
Contributor

Should this be 4.23.0? Not sure why this is bumping up to 4.29.0 from 4.22.0

Contributor Author

Looks like the pyproject.toml versioning has not been updated.
Please check the current semantic versioning sequence in https://github.com/uc-cdis/fence/releases

Contributor

👍 Thanks for updating that!

@paulineribeyre
Contributor

@themarcelor I cherry-picked the 3 commits into your branch as we discussed.

themarcelor and others added 5 commits May 17, 2021 11:04
…of github.com:uc-cdis/fence into chore/introduce_prometheus_counter_for_presigned_urls
…le id (guid) to avoid overloading prometheus
…of github.com:uc-cdis/fence into chore/introduce_prometheus_counter_for_presigned_urls
…f the file id (guid) to avoid overloading prometheus"

This reverts commit db9af96.
@@ -270,6 +279,10 @@ def app_config(
_load_keys(app, root_dir)
_set_authlib_cfgs(app)

app.prometheus_counters = {}
# TODO: if prometheus is disabled in config, do not setup
Contributor

@themarcelor @williamhaley Prometheus can be enabled by default, but should we add a setting to the fence config file? Would the ability to disable it be useful for Gen3 systems that are not managed by us, or maybe for compose-services?

Contributor Author

Why would we ever want to disable observability? 🤷🏼‍♂️

Contributor

We’re not sending metrics to Prometheus; Prometheus fetches the metrics, so the absence of Prometheus is completely harmless.
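To illustrate the pull model described above, here is a standard-library-only sketch. It is a hypothetical stand-in for a real metrics library, not Fence code: `record_request`, `metrics_app`, and `scrape` are names invented for this example. Recording a request is a local increment, and the text is rendered only when something requests the endpoint.

```python
from collections import Counter as TallyCounter
from wsgiref.util import setup_testing_defaults

# In-process tallies keyed by username (hypothetical stand-in for a metrics library).
presigned_url_requests = TallyCounter()


def record_request(username: str) -> None:
    """Recording is a local increment; no network call is made."""
    presigned_url_requests[username] += 1


def metrics_app(environ, start_response):
    """WSGI endpoint that renders the tallies only when it is requested."""
    lines = ["# TYPE pre_signed_url_req_total counter"]
    for username, count in sorted(presigned_url_requests.items()):
        # Label values are not escaped here; a real exporter must escape them.
        lines.append(
            f'pre_signed_url_req_total{{username="{username}"}} {float(count)}'
        )
    body = ("\n".join(lines) + "\n").encode()
    start_response("200 OK", [("Content-Type", "text/plain; version=0.0.4")])
    return [body]


def scrape() -> str:
    """Simulate a scraper by calling the WSGI app directly."""
    environ = {}
    setup_testing_defaults(environ)
    return b"".join(metrics_app(environ, lambda status, headers: None)).decode()


if __name__ == "__main__":
    record_request("alice")
    print(scrape())
```

If nothing ever calls the endpoint, the only cost is the in-memory increments, which is why a missing Prometheus server does not affect the service.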

@themarcelor themarcelor merged commit e5d6947 into master May 18, 2021
@themarcelor themarcelor deleted the chore/introduce_prometheus_counter_for_presigned_urls branch May 18, 2021 16:13
Contributor

@williamhaley williamhaley left a comment

👍
