Requests permanently hanging on the API's side (upstream server) on client network failure (from downstream) #2412

@RobertLucian

Version

Any version of Cortex suffers from this.

Description

Assume a regular realtime API is deployed on a Cortex cluster. If the client's connection is silently killed while a request to the API is in flight (for example, by a network failure), the request never closes on the API's side. Over time this is a serious problem: each API replica with a max concurrency limit will eventually exhaust all of its slots with orphaned connections that are never dropped. The effect is most visible on APIs that rarely cycle their replicas.
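The slot exhaustion described above can be sketched without a cluster. The following is a minimal simulation, not Cortex's actual implementation: it assumes the concurrency limit behaves like a semaphore, and the names `handle` and `simulate` are hypothetical. One "orphaned" request holds the only slot while waiting for a body that never arrives, so a healthy second request cannot start.

```python
import asyncio


async def handle(slots: asyncio.Semaphore, body_done: asyncio.Event) -> str:
    """Hold one concurrency slot until the request body has fully arrived."""
    async with slots:
        # An orphaned upload never sets this event, so the slot is never freed.
        await body_done.wait()
        return "ok"


async def simulate(max_concurrency: int = 1, wait: float = 0.2) -> str:
    """Return "ok" if a healthy request gets served, "blocked" otherwise."""
    slots = asyncio.Semaphore(max_concurrency)

    # Request 1: the client vanished mid-upload, so its body never completes.
    orphan_body = asyncio.Event()
    orphan = asyncio.create_task(handle(slots, orphan_body))
    await asyncio.sleep(0)  # let the orphan run and take a slot

    # Request 2: a healthy client whose body is already complete.
    healthy_body = asyncio.Event()
    healthy_body.set()
    try:
        return await asyncio.wait_for(handle(slots, healthy_body), timeout=wait)
    except asyncio.TimeoutError:
        return "blocked"
    finally:
        orphan.cancel()  # clean up the simulated orphan


if __name__ == "__main__":
    print(asyncio.run(simulate()))
```

With the default `max_concurrency=1` this prints `blocked`; with `max_concurrency=2` the healthy request is served, but a second orphan would exhaust the replica again.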

Configuration

# cortex.yaml

- name: test-api
  kind: RealtimeAPI
  pod:
    port: 8080
    max_concurrency: 1
    max_queue_length: 128
    containers:
    - name: api
      image: <user-id>.dkr.ecr.<region>.amazonaws.com/cortexlabs/test-api:latest
      compute:
        cpu: 200m
        mem: 128Mi
  autoscaling:
    target_in_flight: 1
    max_replicas: 1

# cpu.Dockerfile

FROM python:3.8-slim

ENV PYTHONUNBUFFERED=TRUE

COPY app/requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r /app/requirements.txt

COPY app /app
WORKDIR /app/
ENV PYTHONPATH=/app

ENV CORTEX_PORT=8080
CMD uvicorn --workers 1 --host 0.0.0.0 --port $CORTEX_PORT main:app

# app/main.py

import time
import asyncio
from datetime import datetime

from fastapi import FastAPI, Header, Request, File, UploadFile
from fastapi.responses import PlainTextResponse

app = FastAPI()


@app.middleware("http")
async def add_process_time_header(request: Request, call_next):
    if "x-request-id" in request.headers:
        request_id = request.headers["x-request-id"]
    else:
        request_id = "0"

    print(
        f"{datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]} | {request_id} | middleware start",
        flush=True,
    )
    start_time = time.time()
    response = await call_next(request)
    process_time = time.time() - start_time
    print(
        f"{datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]} | {request_id} | middleware finish: {str(process_time)}",
        flush=True,
    )
    return response


@app.get("/healthz")
def healthz():
    return PlainTextResponse("ok")


@app.post("/")
async def sleep(sleep: float = 0, x_request_id: str = Header(None), image: UploadFile = File(...)):
    print(
        f"{datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]} | {x_request_id} | request start",
        flush=True,
    )
    start_time = time.time()
    image = await image.read()
    process_time = time.time() - start_time
    print(
        f"{datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]} | {x_request_id} | downloaded image: "
        + str(process_time),
        flush=True,
    )
    # time.sleep(sleep)
    await asyncio.sleep(sleep)
    process_time = time.time() - start_time
    print(
        f"{datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]} | {x_request_id} | request finish: "
        + str(process_time),
        flush=True,
    )
    return PlainTextResponse("ok")

# app/requirements.txt

uvicorn[standard]
fastapi
python-multipart

Steps to reproduce

  1. Build and deploy the API.
  2. Have a client ready to run: r=$((1 + $RANDOM % 100)); echo "request ID: $r"; SECONDS=0; curl -X POST -H "x-request-id: $r" -F image=@wp.jpg http://<api-endpoint>?sleep=1 --limit-rate 5k; echo "$SECONDS seconds".
  3. While the client is making the request, disconnect the client silently (disabling Wi-Fi, disconnecting the Ethernet cable, taking out the battery, etc.).
  4. Make a new request with no rate limit: r=$((1 + $RANDOM % 100)); echo "request ID: $r"; SECONDS=0; curl -X POST -H "x-request-id: $r" -F image=@wp.jpg http://<api-endpoint>?sleep=1; echo "$SECONDS seconds". Notice that the request is never fulfilled: max_concurrency is set to 1 on the API, and its only slot is still held by the orphaned first request.
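If physically cutting the network is inconvenient, a stalled client can approximate step 3. The sketch below is an assumption-laden stand-in, not part of the original report: `build_partial_upload` and `stall` are hypothetical helpers, the `path` default is a guess, and any proxy in front of the API may buffer or time out differently. It declares a large multipart body, sends only the beginning, and then goes quiet. Unlike a true silent failure, the socket is closed when `stall` returns, so the server sees the disconnect at that point.

```python
import socket
import time

BOUNDARY = "stall"  # arbitrary multipart boundary chosen for this sketch


def build_partial_upload(host: str, request_id: str, declared: int,
                         filler: int, path: str = "/") -> bytes:
    """Build a POST whose body is shorter than its declared Content-Length."""
    part_head = (
        f"--{BOUNDARY}\r\n"
        'Content-Disposition: form-data; name="image"; filename="wp.jpg"\r\n'
        "Content-Type: image/jpeg\r\n"
        "\r\n"
    ).encode("ascii")
    body = part_head + b"x" * filler
    if len(body) >= declared:
        raise ValueError("declared length must exceed the bytes actually sent")
    head = (
        f"POST {path}?sleep=1 HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"x-request-id: {request_id}\r\n"
        f"Content-Type: multipart/form-data; boundary={BOUNDARY}\r\n"
        f"Content-Length: {declared}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body


def stall(host: str, port: int, hold_seconds: float,
          request_id: str = "stalled") -> None:
    """Send a truncated upload, then stay silent for hold_seconds."""
    payload = build_partial_upload(host, request_id, declared=1_000_000, filler=1024)
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(payload)
        time.sleep(hold_seconds)  # the server keeps waiting for the rest of the body
```

For example, running `stall("<api-endpoint>", 80, hold_seconds=600)` in one terminal and the unthrottled curl command from step 4 in another should show whether the second request is starved while the first one is stalled.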

Labels: bug (Something isn't working)
