Version
All versions of Cortex are affected.
Description
Assume a regular realtime API is deployed on a Cortex cluster. If, while a request is in flight, the connection is silently killed on the client's side (e.g. due to a network failure), the request never closes on the API's side. Over time this becomes a serious problem: each API replica with a max concurrency limit gradually exhausts all of its slots with orphaned connections that are never dropped. This is especially visible on APIs that don't cycle their replicas very often.
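To illustrate the failure mode in isolation (this is a minimal sketch, not Cortex code): a concurrency slot is held by an in-flight handler, and only a disconnect *watcher* can release it early. Without such a watcher, the slot stays occupied for the handler's full duration even though nobody is listening anymore. All names here (`handler`, `guarded`) are hypothetical.

```python
import asyncio

async def handler():
    # simulates slow request processing (e.g. a model inference call)
    await asyncio.sleep(10)
    return "ok"

async def guarded(disconnected: asyncio.Event) -> str:
    # race the handler against a disconnect signal; whichever finishes
    # first wins, and the loser is cancelled so the slot is released
    work = asyncio.create_task(handler())
    watch = asyncio.create_task(disconnected.wait())
    done, pending = await asyncio.wait(
        {work, watch}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    if watch in done:
        return "client gone, slot freed"
    return work.result()

async def main() -> str:
    gone = asyncio.Event()
    # simulate the client dying 0.1 s into the request
    asyncio.get_running_loop().call_later(0.1, gone.set)
    return await guarded(gone)

print(asyncio.run(main()))  # -> client gone, slot freed
```

The point of the sketch is the second branch: if nothing ever sets the disconnect event (a silent network failure that the server never observes), `guarded` simply awaits the handler, which is the behavior this issue describes.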
Configuration
```yaml
# cortex.yaml
- name: test-api
  kind: RealtimeAPI
  pod:
    port: 8080
    max_concurrency: 1
    max_queue_length: 128
    containers:
      - name: api
        image: <user-id>.dkr.ecr.<region>.amazonaws.com/cortexlabs/test-api:latest
        compute:
          cpu: 200m
          mem: 128Mi
  autoscaling:
    target_in_flight: 1
    max_replicas: 1
```
```dockerfile
# cpu.Dockerfile
FROM python:3.8-slim

ENV PYTHONUNBUFFERED TRUE

COPY app/requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r /app/requirements.txt

COPY app /app
WORKDIR /app/
ENV PYTHONPATH=/app
ENV CORTEX_PORT=8080

CMD uvicorn --workers 1 --host 0.0.0.0 --port $CORTEX_PORT main:app
```
```python
# app/main.py
import time
import asyncio
from datetime import datetime

from fastapi import FastAPI, Header, Request, File, UploadFile
from fastapi.responses import PlainTextResponse

app = FastAPI()


@app.middleware("http")
async def add_process_time_header(request: Request, call_next):
    if "x-request-id" in request.headers:
        request_id = request.headers["x-request-id"]
    else:
        request_id = "0"
    print(
        f"{datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]} | {request_id} | middleware start",
        flush=True,
    )
    start_time = time.time()
    response = await call_next(request)
    process_time = time.time() - start_time
    print(
        f"{datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]} | {request_id} | middleware finish: {str(process_time)}",
        flush=True,
    )
    return response


@app.get("/healthz")
def healthz():
    return PlainTextResponse("ok")


@app.post("/")
async def sleep(sleep: float = 0, x_request_id: str = Header(None), image: UploadFile = File(...)):
    print(
        f"{datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]} | {x_request_id} | request start",
        flush=True,
    )
    start_time = time.time()
    image = await image.read()
    process_time = time.time() - start_time
    print(
        f"{datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]} | {x_request_id} | downloaded image: "
        + str(process_time),
        flush=True,
    )
    # time.sleep(sleep)
    await asyncio.sleep(sleep)
    process_time = time.time() - start_time
    print(
        f"{datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]} | {x_request_id} | request finish: "
        + str(process_time),
        flush=True,
    )
    return PlainTextResponse("ok")
```

```
# app/requirements.txt
uvicorn[standard]
fastapi
python-multipart
```
Steps to reproduce
- Build and deploy the API.
- Have a client ready to run:

```shell
r=$((1 + $RANDOM % 100)); echo "request ID: $r"; SECONDS=0; curl -X POST -H "x-request-id: $r" -F image=@wp.jpg http://<api-endpoint>?sleep=1 --limit-rate 5k; echo "$SECONDS seconds"
```

- While the client is making the request, disconnect it silently (disable WiFi, unplug the Ethernet cable, pull the battery, etc.).
- Make a new request with no rate limit:

```shell
r=$((1 + $RANDOM % 100)); echo "request ID: $r"; SECONDS=0; curl -X POST -H "x-request-id: $r" -F image=@wp.jpg http://<api-endpoint>?sleep=1; echo "$SECONDS seconds"
```

Notice that this request never gets fulfilled: the API's max concurrency is 1, and its only slot is still held by the orphaned request.