
fix(monitoring): alloy enable_compression=false + manual install runbook fallback #32

Merged
mfwolffe merged 2 commits into trunk from fix/alloy-enable-compression-false on May 10, 2026

@espadonne
Contributor

Summary

Two real issues surfaced while bringing up Grafana Cloud monitoring on shithub-prod. This PR works around both and adds a troubleshooting entry:

1. shithubd scrape returns up=0

Alloy 1.16's prometheus scraper advertises `Accept-Encoding: gzip`. shithubd's HTTP middleware honors that and returns gzipped bytes with `Content-Encoding: gzip`. Alloy's parser then tries to read the raw `0x1f` gzip magic byte as text and fails:

```
last_error: expected a valid start token, got "\x1f" ("INVALID") while parsing
```
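The failure mode can be reproduced in isolation. The following is a minimal Go sketch, not code from Alloy or shithubd: it gzips a small Prometheus text payload, shows that the first byte on the wire is `0x1f`, and shows that decoding according to `Content-Encoding` recovers the text. `gzipBytes` and `decodeBody` are hypothetical helper names introduced only for illustration.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gzipBytes compresses payload roughly the way a compression middleware would.
// Hypothetical helper, not part of shithubd.
func gzipBytes(payload []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(payload); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeBody returns the text payload, decompressing only when the
// Content-Encoding value says the body is gzipped.
// Hypothetical helper, not part of Alloy.
func decodeBody(body []byte, contentEncoding string) ([]byte, error) {
	if contentEncoding != "gzip" {
		return body, nil
	}
	zr, err := gzip.NewReader(bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	metrics := []byte("# TYPE up gauge\nup 1\n")

	wire, err := gzipBytes(metrics)
	if err != nil {
		panic(err)
	}
	// A text parser fed these bytes directly sees the gzip magic, not a metric.
	fmt.Printf("first byte on the wire: %#x\n", wire[0])

	text, err := decodeBody(wire, "gzip")
	if err != nil {
		panic(err)
	}
	fmt.Printf("decoded payload: %q\n", text)
}
```

A parser that skips the `decodeBody` step and reads `wire` as text hits `\x1f` as its first token, which matches the `last_error` above.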

Workaround: set `enable_compression = false` on the shithubd scrape block. This disables the negotiation entirely, so shithubd returns plain text. The metrics payload is small enough that the wire-size cost is irrelevant.

The actual root-cause fix belongs in either Alloy (handle Content-Encoding properly) or shithubd (skip compression on /metrics). This PR is only the operational unblock; a follow-up issue will track the shithubd-side fix.

2. Operators without ansible can't bootstrap

The role assumes you have ansible installed and a real `inventory/production`. We currently have neither: the droplet was hand-built and there is no production inventory anywhere. A new runbook section walks through applying the same actions over plain SSH so the next operator isn't blocked.

3. New troubleshooting entry

Added a "shithubd up=0 while node up=1" entry pointing at the gzip fix above, so the next person who hits this gets to the answer in seconds instead of an hour.

Test plan

  • Applied `enable_compression = false` to the live droplet's hand-written config
  • `alloy debug page` reports `health: up`, `last_error` empty
  • `up{job="shithubd"}` returns 1 in Grafana Explore
  • No code changes outside ansible template + doc

Follow-ups (separate PRs)

  • File issue: shithubd should bypass compression middleware on /metrics
  • Once that lands, `enable_compression = false` can be dropped again
