charts/*: drop wireService label, use app= instead, add servicemonitor support #2413

Merged
merged 9 commits into from
Jun 10, 2022

Conversation

flokli
Contributor

@flokli flokli commented May 18, 2022

This aligns labels a bit more with how they look in other
deployments. In some cases, we were already setting the app label,
too.

There's one possible regression:
The wire-server-metrics helm chart previously configured kube-prometheus-stack to automatically scrape everything with a wireService label at port http,
path /i/metrics.

This custom configuration has been removed, and instead each chart provides the option to create ServiceMonitor resources, which add wire services to metric scraping that way.

Checklist

  • The PR Title explains the impact of the change.
  • The PR description provides context as to why the change should occur and what the code contributes to that effect. This could also be a link to a JIRA ticket or a Github issue, if there is one.
  • If this PR changes development workflow or dependencies, they have been A) automated and B) documented under docs/developer/. All efforts have been taken to minimize development setup breakage or slowdown for co-workers.
  • If HTTP endpoint paths have been added or renamed, or feature configs have changed, the endpoint / config-flag checklist (see Wire-employee only backend wiki page) has been followed.
  • If a cassandra schema migration has been added, I ran make git-add-cassandra-schema to update the cassandra schema documentation.
  • changelog.d contains the following bits of information (details):
    • A file with the changelog entry in one or more suitable sub-sections. The sub-sections are marked by directories inside changelog.d.
    • If new config options introduced: added usage description under docs/reference/config-options.md
    • If new config options introduced: recommended measures to be taken by on-premise instance operators.
    • If a cassandra schema migration is backwards incompatible (see also these docs), measures to be taken by on-premise instance operators are explained.
    • If a data migration (not schema migration) introduced: measures to be taken by on-premise instance operators.
    • If public end-points have been changed or added: does nginz need an upgrade?
    • If internal end-points have been added or changed: which services have to be deployed in a specific order?

@flokli flokli temporarily deployed to cachix May 18, 2022 14:16 Inactive
Member

@jschaul jschaul left a comment

Looks fine to me, if CI passes.

I'm not sure the wire-server-metrics chart is truly used by anyone (not whether it's installed, but whether anyone is actually looking at any of those metrics).

@flokli
Contributor Author

flokli commented May 18, 2022

Then I'd even propose removing the wire-server-metrics chart entirely going forward, and updating the docs accordingly. Once there are ServiceProbes, a default kube-prometheus stack, or grafana-agent should be able to scrape these metrics.

@flokli flokli marked this pull request as ready for review May 18, 2022 15:41
@akshaymankar
Member

The wire-server-metrics chart is used by the federation environments. So, if you do change this flow, please don't forget to update the federation environments. Hopefully, this change doesn't blow up there 😄

@flokli
Contributor Author

flokli commented May 19, 2022

The wire-server-metrics chart is used by the federation environments. So, if you do change this flow, please don't forget to update the federation environments.

Are we fine with simply no longer instantiating the wire-server-metrics chart there?

@akshaymankar
Member

Are we fine with simply no longer instantiating the wire-server-metrics chart there?

We added metrics there because it was hard for us to figure out what was wrong when things didn't work. So, I would say maybe this is still needed, but best ask people working on federation these days. /cc @smatting @pcapriotti @mdimjasevic @stephen-smith

@smatting
Contributor

smatting commented May 23, 2022

So, I would say maybe this is still needed, but best ask people working on federation these days. /cc @smatting @pcapriotti @mdimjasevic @stephen-smith

I don't think we have used these metrics for any debugging yet. @supersven have you made use of them while debugging calling? If not I think it's fine to remove it, wdyt?

Once there are ServiceProbes, a default kube-prometheus stack, or grafana-agent should be able to scrape these metrics

Would this also be eventually available for the federation environments?

@jschaul
Member

jschaul commented May 23, 2022

The wire-server-metrics chart works fine as it is right now and has no dependencies on other things; it even has up-to-date docs at https://docs.wire.com/how-to/install/monitoring.html

So I don't see any reason to break a so-far working chart without providing alternatives. If things are temporarily broken for less than a week and you intend to restore wire-server-metrics to working as before, then I have no issue with this PR. If the intention is to build a whole new monitoring capability at some point in the far future, and in the meantime things will be broken on all environments that use wire-server-metrics, then I'm not a fan. I'm not familiar enough with service probes or your overall intention to understand what you'd like to do; please explain.

@supersven
Contributor

I don't think we have used these metrics for any debugging yet. @supersven have you made use of them while debugging calling? If not I think it's fine to remove it, wdyt?

Nope. Debugging calling currently means mostly looking at logs and reasoning about Helm charts.

@flokli
Contributor Author

flokli commented May 23, 2022

I guess we can just keep this PR open until it also adds ServiceProbes, which provides an alternative.

@flokli flokli temporarily deployed to cachix June 8, 2022 15:21 Inactive
@flokli flokli temporarily deployed to cachix June 9, 2022 13:59 Inactive
@flokli flokli changed the title charts/*: drop wireService label, use app= instead charts/*: drop wireService label, use app= instead, add servicemonitor support Jun 9, 2022
@flokli
Contributor Author

flokli commented Jun 9, 2022

I went through all services in /services, and checked if they expose metrics at /i/metrics.

federator and nginz don't expose metrics at /i/metrics.

For those that do, I added the Helm template needed to create a ServiceMonitor resource, which marks the endpoint for scraping by prometheus-operator (or anything else that watches these CRs, such as grafana-agent-operator).

Because the CRDs don't ship with Kubernetes out of the box and are usually installed together with a monitoring operator, this is opt-in and disabled by default.
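
For illustration, a ServiceMonitor for a single service might look roughly like the sketch below. This is not the template from this PR: the resource name and the `app: brig` label value are assumptions chosen as an example. Only the port name `http`, the path `/i/metrics`, and the use of the `app` label come from the description above.

```yaml
# Hypothetical ServiceMonitor for one wire service (brig used as an example).
# Requires the prometheus-operator CRDs to be installed in the cluster.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: brig              # assumed name
  labels:
    app: brig             # assumed label value
spec:
  endpoints:
    - port: http          # name of the port on the Service, not a number
      path: /i/metrics
  selector:
    matchLabels:
      app: brig           # selects the Service by its `app` label
```

Depending on how the operator is configured, it may only pick up ServiceMonitors that carry particular labels or live in particular namespaces, so a real deployment may need extra labels on `metadata`.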

@flokli flokli temporarily deployed to cachix June 9, 2022 14:20 Inactive
This aligns labels a bit more with how they look in other
deployments. In some cases, we were already setting the `app` label,
too.

There's one possible regression:
The wire-server-metrics helm chart configured kube-prometheus-stack to
automatically scrape everything with a wireService label at port http,
path /i/metrics. This will be fixed in a followup, by adding
ServiceProbe resources to each workload that exposes metrics.
@flokli flokli temporarily deployed to cachix June 9, 2022 17:16 Inactive
@flokli flokli merged commit 46d5edb into develop Jun 10, 2022
@flokli flokli deleted the charts-wireService-app branch June 10, 2022 07:53
@flokli
Contributor Author

flokli commented Jun 10, 2022

Hmm, it seems it's not possible to modify the spec.selector.matchLabels field of a Deployment or StatefulSet resource, so this might require deleting and recreating these resources.
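
A sketch of the selector change that triggers this, using a hypothetical `brig` Deployment. Only the label keys `wireService` and `app` come from this PR; the workload name and label values are assumptions, and both documents are fragments rather than complete manifests. In `apps/v1`, `spec.selector` cannot be changed after creation, so applying the second form over an existing object created with the first is rejected by the API server.

```yaml
# Before (fragment): selector keyed on the old wireService label
apiVersion: apps/v1
kind: Deployment
metadata:
  name: brig              # assumed workload name
spec:
  selector:
    matchLabels:
      wireService: brig
---
# After (fragment): selector keyed on app.
# Same object name, different selector, so an in-place update fails.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: brig
spec:
  selector:
    matchLabels:
      app: brig
```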

@flokli
Contributor Author

flokli commented Jun 14, 2022

For posterity: release notes to document the upgrade were added in #2472.
