67 changes: 41 additions & 26 deletions docs/api/rate-limits.md
@@ -4,41 +4,56 @@ sidebar_position: 4

# Rate limits

The TrakRF API applies a per-key rate limit to protect the shared service from runaway integrations. Normal customer traffic is well below the limits; this page exists so integrators have the authoritative answer when they hit a `429`.
The TrakRF API applies a per-key rate limit to protect the shared service from runaway integrations. Normal customer traffic is well below the limits; this page is the authoritative answer for integrators who want to pace requests or who hit a `429`.

## How the limit works

Each API key has its own **token bucket**:
Each API key has its own **token bucket** described by two numbers:

- **Refill rate**: the steady-state request budget.
- **Bucket capacity**: the burst allowance. Requests can spike up to this before the steady-state budget takes over.
- **Burst ceiling — 120.** The most requests you can make in a short spike before a `429`. This is the bucket's capacity, reported live in `X-RateLimit-Limit`.
- **Sustained rate — 60 requests / 60 seconds.** The long-run budget the bucket refills at (~1 token per second), advertised in the `RateLimit-Policy` header as `60;w=60`.

Tokens replenish continuously at the refill rate. A request costs one token. When the bucket is empty, further requests receive `429 Rate Limit Exceeded` until tokens replenish.
A request costs one token; tokens refill continuously at the sustained rate. When the bucket is empty, further requests get `429` until a token refills. So you can burst up to 120 requests, but holding throughput above 60/minute drains the bucket and throttles you down to the sustained rate.
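The refill-and-spend behavior in the updated paragraph can be sketched as a small simulation. This is an illustrative model under stated assumptions, not TrakRF's server code: the `TokenBucket` class and its field names are hypothetical, and the clock is passed in explicitly so the run is deterministic. The capacity (120) and refill rate (1 token per second) are the default-tier values quoted above.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Hypothetical client-side model of the per-key bucket (not the server implementation)."""

    capacity: float = 120.0          # burst ceiling (X-RateLimit-Limit)
    refill_per_second: float = 1.0   # sustained rate: 60 requests / 60 seconds
    tokens: float = 120.0            # start with a full bucket
    last_refill: float = 0.0         # timestamp of the previous refill, in seconds

    def try_acquire(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if one is available."""
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True   # request would be served
        return False      # request would receive a 429


bucket = TokenBucket()

# 121 requests at the same instant: the first 120 drain the bucket, the 121st is refused.
burst = [bucket.try_acquire(now=0.0) for _ in range(121)]
print(sum(burst), burst[-1])  # 120 False

# Ten seconds later, ten tokens have refilled, so only ten of eleven requests pass.
later = [bucket.try_acquire(now=10.0) for _ in range(11)]
print(sum(later))  # 10
```

The second batch shows why holding throughput above the sustained rate does not help: once the burst is spent, the client only gets what the refill provides.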

## Default tier

All keys start with this allowance unless your organization's subscription specifies otherwise:

| Limit | Value |
| ------------ | -------------------- |
| Steady-state | 60 requests / minute |
| Burst | 120 requests |
| Allowance | Value | Header |
| -------------- | ------------------ | --------------------------- |
| Sustained rate | 60 requests / 60 s | `RateLimit-Policy: 60;w=60` |
| Burst ceiling | 120 requests | `X-RateLimit-Limit: 120` |

Tier-specific allowances keyed to subscription plans are on the roadmap. If your integration needs more throughput than the default tier, [contact support](mailto:support@trakrf.id).

## Response headers

Every API response on the public surface — including 4xx and 5xx errors (`401`, `403`, `404`, `409`, `415`, `429`, `500`) — includes three headers describing the current state of your bucket:
Every API response on the public surface — including 4xx and 5xx errors (`401`, `403`, `404`, `405`, `409`, `415`, `429`, `500`) — carries four rate-limit headers describing your bucket. They are real pacing signals; you can drive client-side throttling off them.

| Header | Units | Meaning |
| ----------------------- | ------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `X-RateLimit-Limit` | integer requests | Your steady-state budget per 60-second window (e.g. `60`). |
| `X-RateLimit-Remaining` | integer requests | Steady-state budget remaining, bounded by `Limit`. Decrements only after the burst margin is consumed — a value of `Limit` does not mean a full request budget, it means you are still inside the burst headroom. |
| `X-RateLimit-Reset` | Unix epoch seconds | Wall-clock time at which `Remaining` will next equal `Limit`. Equal to "now" when you already have full quota. |
| Header | Units | Meaning |
| ----------------------- | ------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `RateLimit-Policy` | `quota;w=window-seconds` | The **sustained** rate — `60;w=60` is 60 requests per 60 seconds. This is where the steady-state "60/min" lives. The burst ceiling in `X-RateLimit-Limit` is higher; throughput held above this policy is eventually throttled with a `429`. |
| `X-RateLimit-Limit` | integer requests | The **burst ceiling** — the maximum requests in a single burst before a `429` (the bucket's capacity, `120` on the default tier). The lower sustained rate is advertised separately in `RateLimit-Policy`. |
| `X-RateLimit-Remaining` | integer requests | Requests remaining before a `429`. **Decrements by one on every request**, from `X-RateLimit-Limit` down to `0`, and refills over time toward the ceiling at the sustained rate. Clients **may** pace on this value. |
| `X-RateLimit-Reset` | Unix epoch seconds | When `X-RateLimit-Remaining` will next equal `X-RateLimit-Limit` — i.e. when the bucket has fully refilled to the ceiling. Equals "now" when you are already at the ceiling. |

The headers ride on every response status the public surface emits, so clients can read them even when the request itself failed — error responses are not a blind spot for budget-tracking dashboards or observability metrics.
The headers ride on every response status the public surface emits — including `429` itself — so clients can read them even when the request failed, and budget-tracking dashboards aren't blind on errors.
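A client that wants to act on these headers first has to parse them. The sketch below shows one way to do that, assuming the four header names and the `quota;w=window-seconds` policy format from the updated table. The `RateLimitState` type, the `parse_rate_limit_headers` function, and the sample `X-RateLimit-Reset` value are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RateLimitState:
    """Hypothetical container for the four rate-limit headers."""

    sustained_quota: int   # from RateLimit-Policy, e.g. 60
    window_seconds: int    # from RateLimit-Policy w=, e.g. 60
    burst_ceiling: int     # X-RateLimit-Limit
    remaining: int         # X-RateLimit-Remaining
    reset_epoch: int       # X-RateLimit-Reset (Unix epoch seconds)


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitState:
    """Parse the rate-limit headers; header names are matched case-insensitively."""
    lowered = {name.lower(): value.strip() for name, value in headers.items()}

    # "60;w=60" -> quota 60, window 60 seconds
    quota_text, _, params = lowered["ratelimit-policy"].partition(";")
    window_seconds = None
    for param in params.split(";"):
        key, _, value = param.strip().partition("=")
        if key == "w":
            window_seconds = int(value)
    if window_seconds is None:
        raise ValueError("RateLimit-Policy has no w= window parameter")

    return RateLimitState(
        sustained_quota=int(quota_text),
        window_seconds=window_seconds,
        burst_ceiling=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset_epoch=int(lowered["x-ratelimit-reset"]),
    )


# Sample values; the reset timestamp is made up for the example.
example = parse_rate_limit_headers({
    "RateLimit-Policy": "60;w=60",
    "X-RateLimit-Limit": "120",
    "X-RateLimit-Remaining": "119",
    "X-RateLimit-Reset": "1700000001",
})
print(example.sustained_quota / example.window_seconds)  # 1.0
```

Because the headers are present on error responses too, the same parser can run on every response regardless of status.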

**Don't pace on `X-RateLimit-Remaining`.** It looks like a preemptive-pacing input but it isn't one. The bucket holds 2× `Limit` tokens (the burst margin), and `Remaining` is reported as `min(bucket, Limit)` so the header never exceeds the steady-state cap a customer was sold. The practical consequence: `Remaining` stays at `Limit` while the bucket is anywhere inside the burst margin, and only starts decrementing once the bucket has drained below the steady-state line. By the time the header moves, the burst margin is already gone and the next bad spike trips `429`. A client that watches `Remaining < threshold` to back off is using a signal that can't fire until throttling is imminent. Pace on `429` + `Retry-After` instead — that's the integrator-correct contract on this surface.
### What you'll observe

Starting from a full bucket, `X-RateLimit-Remaining` decrements one-for-one and trips `429` when it reaches zero:

```
req 1 → 200 X-RateLimit-Remaining: 119
req 2 → 200 X-RateLimit-Remaining: 118
req 120 → 200 X-RateLimit-Remaining: 0
req 121 → 429 Retry-After: 1
```

`RateLimit-Policy: 60;w=60` rides every one of those responses. After a burst, the bucket refills at ~1 token/second, so sustained throughput settles at 60/minute.

**Pacing guidance:** pace short-term bursts off `X-RateLimit-Remaining` (slow down as it approaches `0`), and size sustained throughput against `RateLimit-Policy` (stay at or below 60/60s so you don't drain the bucket). Honor `429` + `Retry-After` as the backstop.
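One possible reading of this guidance, as a pure function: send freely while burst budget remains, then fall back to one request per refill interval once `X-RateLimit-Remaining` gets low. The `pacing_delay` name and the `low_watermark` threshold are assumptions made for this sketch; the source does not prescribe a specific threshold.

```python
def pacing_delay(
    remaining: int,
    quota: int,
    window_seconds: int,
    low_watermark: int = 10,
) -> float:
    """Return how many seconds to wait before the next request.

    remaining      -- latest X-RateLimit-Remaining value
    quota, window  -- the two numbers from RateLimit-Policy (60 and 60 by default)
    low_watermark  -- hypothetical threshold below which the client slows down
    """
    if quota <= 0 or window_seconds <= 0:
        raise ValueError("quota and window_seconds must be positive")
    if remaining > low_watermark:
        return 0.0  # plenty of burst budget left
    # Near the floor: send no faster than the bucket refills.
    return window_seconds / quota


print(pacing_delay(remaining=119, quota=60, window_seconds=60))  # 0.0
print(pacing_delay(remaining=5, quota=60, window_seconds=60))    # 1.0
```

A step function keeps the example short; a smoother ramp that lengthens the delay as `remaining` falls would serve the same purpose.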

## When you hit the limit

@@ -50,7 +65,7 @@ Throttled requests get a `429` with the standard error envelope:
"type": "rate_limited",
"title": "Rate limited",
"status": 429,
"detail": "Retry after 30 seconds",
"detail": "Rate limit exceeded; retry after 1 second",
"instance": "/api/v1/assets",
"request_id": "01J..."
}
@@ -60,24 +75,24 @@ Throttled requests get a `429` with the standard error envelope:
…plus the `Retry-After` header:

```
Retry-After: 30
Retry-After: 1
```

The `Retry-After` value is an integer number of **seconds** to wait before the next request will succeed (assuming no other throttled requests in the meantime). Respect it: retrying immediately will cost another token and may extend the throttle window.
`Retry-After` is an integer number of **seconds** to wait before retrying — delta-seconds, floored at `1`, never an HTTP-date. Respect it: retrying earlier costs another token against an empty bucket and can extend the throttle window. The envelope's `request_id` is mirrored in the `X-Request-Id` response header (see [Errors → Filing support tickets](./errors#filing-support-tickets)).
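A minimal sketch of reading the wait time from a throttled response, assuming the delta-seconds format described above. The `retry_after_seconds` helper is hypothetical, and falling back to one second when the header is missing or unparseable is an assumption of this example, not documented behavior.

```python
from typing import Mapping, Optional


def retry_after_seconds(
    status: int,
    headers: Mapping[str, str],
    fallback: int = 1,
) -> Optional[int]:
    """Return the seconds to wait after a 429, or None for any other status."""
    if status != 429:
        return None
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("retry-after", "").strip()
    try:
        # The documented value is an integer floored at 1; clamp defensively anyway.
        return max(1, int(raw))
    except ValueError:
        # Assumption: a missing or malformed header falls back to a short wait.
        return fallback


print(retry_after_seconds(429, {"Retry-After": "1"}))  # 1
print(retry_after_seconds(200, {}))                    # None
```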

## Recommended client behavior

- **Back off on 429.** Wait at least `Retry-After` seconds before retrying. Exponential backoff with jitter on repeated 429s is ideal. This is the primary pacing signal on the public surface — see the warning above on why preemptive pacing against `X-RateLimit-Remaining` does not work.
- **Treat the `X-RateLimit-*` headers as observability, not pacing.** Surface them in dashboards and request-budget metrics if useful, but don't drive client-side throttling decisions off `Remaining` or `Reset` — drive throttling off `429` + `Retry-After`.
- **Pace proactively off the headers.** Watch `X-RateLimit-Remaining` for burst budget — slow down as it nears `0` — and keep sustained throughput at or under the `RateLimit-Policy` rate (60/60s on the default tier). Both are real pacing signals now: `Remaining` decrements 1:1, so it moves with every request rather than lagging.
- **Back off on 429.** Wait at least `Retry-After` seconds before retrying; exponential backoff with jitter on repeated 429s is ideal. `Retry-After` is the authoritative backstop even if header-based pacing slips.
- **Don't treat 429 as a server error.** It's a client-side signal — retry policy should differ from retry-on-500. See [Errors](./errors) for the full retry guidance.
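The "back off on 429" advice can be sketched as a delay calculation that never undercuts `Retry-After` and adds capped exponential jitter on repeated throttles. The `backoff_delay` function and its `base` and `cap` parameters are hypothetical choices for illustration; the source recommends exponential backoff with jitter but does not specify constants.

```python
import random
from typing import Optional


def backoff_delay(
    retry_after: int,
    attempt: int,
    base: float = 1.0,
    cap: float = 60.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after a 429.

    The server's Retry-After is treated as a hard floor; random jitter drawn
    from an exponentially growing, capped range is added on top so that many
    clients throttled together do not all retry at the same instant.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    rng = rng or random.Random()
    jitter_ceiling = min(cap, base * (2 ** attempt))
    return float(retry_after) + rng.uniform(0.0, jitter_ceiling)


# Seeded generator so the example is reproducible; real clients would not seed.
rng = random.Random(0)
delays = [backoff_delay(retry_after=1, attempt=n, rng=rng) for n in range(4)]
print(all(d >= 1.0 for d in delays))  # True
```

Adding the jitter on top of `Retry-After`, rather than replacing it, keeps the retry from ever landing before the server's stated wait.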

## All endpoints participate in the bucket

Every endpoint on the public surface — including `GET /api/v1/orgs/me` and every write under `/api/v1/assets` and `/api/v1/locations` — counts against your bucket and emits `X-RateLimit-Limit`, `X-RateLimit-Remaining`, and `X-RateLimit-Reset` headers. There are no carve-outs.
Every endpoint on the public surface — including `GET /api/v1/orgs/me` and every write under `/api/v1/assets` and `/api/v1/locations` — counts against your bucket and emits the `RateLimit-Policy` and `X-RateLimit-*` headers. There are no carve-outs.

At 60 requests/minute the steady-state budget comfortably covers normal integration traffic, but a few patterns are worth flagging:
At 60 requests/minute sustained, the budget comfortably covers normal integration traffic, but a few patterns are worth flagging:

- **Liveness/connectivity probes against `GET /api/v1/orgs/me`** — these count, so probe at a frequency your budget tolerates. Once per minute is the simplest pattern that always fits inside the default tier with room to spare; once every 30 seconds is fine if `/orgs/me` is the only thing the probe hits. Aggressive sub-second probes will trip throttling.
- **Liveness/connectivity probes against `GET /api/v1/orgs/me`** — these count, so probe at a frequency your budget tolerates. Once per minute always fits the default tier with room to spare; once every 30 seconds is fine if `/orgs/me` is the only thing the probe hits. Aggressive sub-second probes will drain the bucket and trip throttling.
- **Bulk writes** — every `POST` / `PATCH` / `DELETE` under `/api/v1/assets` and `/api/v1/locations` consumes one token. For ingest workloads above the default tier, [contact support](mailto:support@trakrf.id) about a custom tier rather than spreading writes across multiple keys.

`GET /api/v1/orgs/me` returns the standard `{ "data": ... }` envelope, same as every other endpoint on the public surface. See [Private endpoints → /orgs/me](./private-endpoints#orgs-me) for the full catalog entry.
@@ -88,4 +103,4 @@ Limits apply to each API key independently. An organization with three keys has

## Horizontal scaling

Rate limiting is currently implemented in-process on a single backend instance. A horizontally-scaled deployment will move the bucket state to Redis or similar; the `X-RateLimit-*` headers and `429` semantics remain identical.
Rate limiting is currently implemented in-process on a single backend instance. A horizontally-scaled deployment will move the bucket state to Redis or similar; the `RateLimit-Policy`, `X-RateLimit-*` headers, and `429` semantics remain identical.