Version
All versions of Cortex are affected.
Description
Assume a regular realtime API is deployed on a Cortex cluster. If, while a request is in flight, the connection is silently killed on the client's side (e.g. due to a network failure), the request never closes on the API's side. Over time this becomes a serious problem: each API replica with a max concurrency limit gradually exhausts all of its slots with orphaned connections that are never dropped. This is especially visible on APIs that don't cycle their replicas very often.
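To illustrate the failure mode in isolation (this is a minimal sketch, not Cortex code): a concurrency slot is held by an in-flight handler, and only a disconnect *watcher* can release it early. Without such a watcher, the slot stays occupied for the handler's full duration even though nobody is listening anymore. All names here (`handler`, `guarded`) are hypothetical.

```python
import asyncio

async def handler():
    # simulates slow request processing (e.g. a model inference call)
    await asyncio.sleep(10)
    return "ok"

async def guarded(disconnected: asyncio.Event) -> str:
    # race the handler against a disconnect signal; whichever finishes
    # first wins, and the loser is cancelled so the slot is released
    work = asyncio.create_task(handler())
    watch = asyncio.create_task(disconnected.wait())
    done, pending = await asyncio.wait(
        {work, watch}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    if watch in done:
        return "client gone, slot freed"
    return work.result()

async def main() -> str:
    gone = asyncio.Event()
    # simulate the client dying 0.1 s into the request
    asyncio.get_running_loop().call_later(0.1, gone.set)
    return await guarded(gone)

print(asyncio.run(main()))  # -> client gone, slot freed
```

The point of the sketch is the second branch: if nothing ever sets the disconnect event (a silent network failure that the server never observes), `guarded` simply awaits the handler, which is the behavior this issue describes.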
Configuration
```yaml
# cortex.yaml
- name: test-api
  kind: RealtimeAPI
  pod:
    port: 8080
    max_concurrency: 1
    max_queue_length: 128
    containers:
      - name: api
        image: <user-id>.dkr.ecr.<region>.amazonaws.com/cortexlabs/test-api:latest
        compute:
          cpu: 200m
          mem: 128Mi
  autoscaling:
    target_in_flight: 1
    max_replicas: 1
```
```dockerfile
# cpu.Dockerfile
FROM python:3.8-slim

ENV PYTHONUNBUFFERED TRUE

COPY app/requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r /app/requirements.txt

COPY app /app
WORKDIR /app/
ENV PYTHONPATH=/app
ENV CORTEX_PORT=8080

CMD uvicorn --workers 1 --host 0.0.0.0 --port $CORTEX_PORT main:app
```
```python
# app/main.py
import time
import asyncio
from datetime import datetime

from fastapi import FastAPI, Header, Request, File, UploadFile
from fastapi.responses import PlainTextResponse

app = FastAPI()


@app.middleware("http")
async def add_process_time_header(request: Request, call_next):
    if "x-request-id" in request.headers:
        request_id = request.headers["x-request-id"]
    else:
        request_id = "0"
    print(
        f"{datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]} | {request_id} | middleware start",
        flush=True,
    )
    start_time = time.time()
    response = await call_next(request)
    process_time = time.time() - start_time
    print(
        f"{datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]} | {request_id} | middleware finish: {str(process_time)}",
        flush=True,
    )
    return response


@app.get("/healthz")
def healthz():
    return PlainTextResponse("ok")


@app.post("/")
async def sleep(sleep: float = 0, x_request_id: str = Header(None), image: UploadFile = File(...)):
    print(
        f"{datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]} | {x_request_id} | request start",
        flush=True,
    )
    start_time = time.time()
    image = await image.read()
    process_time = time.time() - start_time
    print(
        f"{datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]} | {x_request_id} | downloaded image: "
        + str(process_time),
        flush=True,
    )
    # time.sleep(sleep)
    await asyncio.sleep(sleep)
    process_time = time.time() - start_time
    print(
        f"{datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]} | {x_request_id} | request finish: "
        + str(process_time),
        flush=True,
    )
    return PlainTextResponse("ok")
```

```
# app/requirements.txt
uvicorn[standard]
fastapi
python-multipart
```
Steps to reproduce
- Build and deploy the API.
- Have a client ready to run:

```shell
r=$((1 + $RANDOM % 100)); echo "request ID: $r"; SECONDS=0; curl -X POST -H "x-request-id: $r" -F image=@wp.jpg http://<api-endpoint>?sleep=1 --limit-rate 5k; echo "$SECONDS seconds"
```

- While the client is making the request, disconnect it silently (disable WiFi, unplug the Ethernet cable, pull the battery, etc.).
- Make a new request with no rate limit:

```shell
r=$((1 + $RANDOM % 100)); echo "request ID: $r"; SECONDS=0; curl -X POST -H "x-request-id: $r" -F image=@wp.jpg http://<api-endpoint>?sleep=1; echo "$SECONDS seconds"
```

Notice that this request never gets fulfilled: the API's max concurrency is 1, and its only slot is still held by the orphaned request.