
bug: Escalating duplicate reconnections in failedPodHandler #2623

@NERLOE

Description


Provide environment information

  • Trigger.dev Version: 4.0.4
  • Deployment: Self-hosted Kubernetes (GKE Autopilot)

Describe the bug

The failedPodHandler in the supervisor creates escalating duplicate connections when the Kubernetes informer experiences disconnections. Each time the informer disconnects (typically every 5 minutes due to Kubernetes watch timeouts), multiple error events fire, and each one independently calls informer.start(), creating duplicate handlers that compound over time.
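For illustration only (this is not the actual trigger.dev source), an unguarded handler of the shape below reproduces the behavior: every error event schedules its own informer.start(), so N error events from one disconnect produce N restarts. The MinimalInformer interface only mirrors the on/start pieces of the @kubernetes/client-node informer the supervisor uses, and the function name is hypothetical.

// Illustrative TypeScript sketch of the unguarded pattern (hypothetical names).
interface MinimalInformer {
  start(): Promise<void>;
  on(event: "error", cb: (err?: unknown) => void): void;
}

export function registerUnguardedErrorHandler(informer: MinimalInformer) {
  informer.on("error", (err) => {
    console.error("error event fired", err);
    // No guard: each of the N error events from a single disconnect
    // schedules its own restart, and the restarts compound across
    // subsequent disconnects.
    setTimeout(() => informer.start(), 1_000);
  });
}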

Observed escalation pattern:

  • First disconnect: 2 errors → 2 reconnects
  • Second disconnect: 3 errors → 3 reconnects
  • Third disconnect: 5 errors → 5 reconnects
  • Fourth disconnect: 9 errors → 9 reconnects
  • Fifth disconnect: 17 errors → 33 reconnects (!)

This creates growing CPU usage, API server load, and log pollution.

Reproduction repo

N/A - Bug occurs in the main trigger.dev supervisor when self-hosted on Kubernetes

To reproduce

  1. Deploy Trigger.dev self-hosted on Kubernetes with RBAC configured
  2. Monitor supervisor logs for the failed-pod-handler component
  3. Wait for normal Kubernetes watch timeout (5-10 minutes)
  4. Observe error events and reconnection messages escalating over multiple disconnects

Expected: 1 error → 1 reconnect per disconnect
Actual: Errors and reconnects escalate: 2 → 3 → 5 → 9 → 17 → 33+

Additional information

Production Log Evidence:

Initial disconnect (5 errors):

{"timestamp":"2025-10-22T08:21:05.843Z","message":"error event fired","$name":"failed-pod-handler"...}
{"timestamp":"2025-10-22T08:21:05.844Z","message":"error event fired","$name":"failed-pod-handler"...}
[5 total errors, followed by 5 reconnects]

Later disconnect (17 errors → 33 reconnects!):

{"timestamp":"2025-10-22T08:31:07.893Z","message":"error event fired"...} // x17
{"timestamp":"2025-10-22T08:31:08.893Z","message":"informer connected"...} // x33 over 100ms

Root Cause: The error handler has no guard against concurrent reconnections. When multiple error events fire, each independently calls informer.start(), creating compounding duplicate handlers.

Proposed Fix: Add a reconnection guard flag to ensure only one reconnection happens at a time.
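A minimal sketch of that guard, assuming an informer-like object that exposes on("error", ...) and start() as in @kubernetes/client-node; names such as RECONNECT_DELAY_MS, reconnecting, and registerGuardedErrorHandler are illustrative, not taken from the codebase:

// Hypothetical TypeScript sketch of a reconnection guard flag.
interface MinimalInformer {
  start(): Promise<void>;
  on(event: "error", cb: (err?: unknown) => void): void;
}

const RECONNECT_DELAY_MS = 1_000;

export function registerGuardedErrorHandler(informer: MinimalInformer) {
  let reconnecting = false; // only one restart may be in flight at a time

  informer.on("error", (err) => {
    console.error("error event fired", err);

    if (reconnecting) {
      // A restart is already scheduled for this disconnect; ignoring the
      // extra error events stops the 2 → 3 → 5 → 9 → ... escalation.
      return;
    }
    reconnecting = true;

    setTimeout(async () => {
      try {
        await informer.start();
        console.log("informer reconnected");
      } catch (restartErr) {
        console.error("informer restart failed", restartErr);
      } finally {
        reconnecting = false;
      }
    }, RECONNECT_DELAY_MS);
  });
}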
