-
-
Notifications
You must be signed in to change notification settings - Fork 8.5k
Closed
Labels
Description
Recently, we deployed a ML model with FastAPI, and encountered an issue.
The code looks like this.
from ocr_pipeline.model.ocr_wrapper import OcrWrapper
ocr_wrapper = OcrWrapper(**config.model_load_params) # loads 1.5 GB PyTorch model
...
@api.post('/')
async def predict(file: UploadFile = File(...)):
preds = ocr_wrapper.predict(file.file, **config.model_predict_params)
return json.dumps({"data": preds})The above written command consumes min. 3GB of RAM.
gunicorn --workers 2 --worker-class=uvicorn.workers.UvicornWorker app.main:apiIs there any way to scale the number of workers without consuming too much RAM?
ENVIRONMENT:
Ubuntu 18.04
Python 3.6.9
fastapi==0.61.2
uvicorn==0.12.2
gunicorn==20.0.4
uvloop==0.14.0
cosimo, inspiralpatterns, omarmhaimdat and yhs0602