DJL Serving provides a set of APIs that allow users to manage models at runtime:
- Register a model
- Increase/decrease number of workers for specific model
- Describe a model's status
- Unregister a model
- List registered models
The Management API listens on port 8080 and is only accessible from localhost by default. To change this default behavior, see DJL Serving Configuration.
The Management API is similar in design to the Inference API.
Registers a new model as a single-model workflow. The workflow name and version match the model name and version.
```
POST /models
```
- url - the model URL.
- model_name - the name of the model and workflow; this name will be used as the {workflow_name} in other API paths. If this parameter is not present, the model name will be inferred from the url.
- model_version - the version of the model.
- engine - the name of the engine to load the model. The default is MXNet if the model doesn't define its engine.
- gpu_id - the GPU device id on which to load the model. The default is CPU (`-1`).
- batch_size - the inference batch size. The default value is `1`.
- max_batch_delay - the maximum delay for batch aggregation. The default value is 100 milliseconds.
- max_idle_time - the maximum idle time before the worker thread is scaled down.
- min_worker - the minimum number of worker processes. The default value is `1`.
- max_worker - the maximum number of worker processes. The default is the same as the setting for `min_worker`.
- synchronous - whether or not the creation of workers is synchronous. The default value is `true`.
```sh
curl -X POST "http://localhost:8080/models?url=https%3A%2F%2Fresources.djl.ai%2Ftest-models%2Fmlp.tar.gz"
```

```json
{
  "status": "Model \"mlp\" registered."
}
```
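For illustration, the tuning parameters from the list above can be passed on the same request. This is just a sketch; the values below are arbitrary, not recommendations:

```sh
# Illustrative only: register the model with explicit batching and worker
# settings (parameter names come from the list above; values are arbitrary).
curl -X POST "http://localhost:8080/models?url=https%3A%2F%2Fresources.djl.ai%2Ftest-models%2Fmlp.tar.gz&batch_size=8&max_batch_delay=50&min_worker=2"
```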
Downloading and loading a model may take some time, so you can choose to make the call asynchronously and check the status later.
The asynchronous call returns with HTTP code 202 before the workers are created:
```sh
curl -v -X POST "http://localhost:8080/models?url=https%3A%2F%2Fresources.djl.ai%2Ftest-models%2Fmlp.tar.gz&synchronous=false"
```

```
< HTTP/1.1 202 Accepted
< content-type: application/json
< x-request-id: bf998daa-892f-482b-a660-6d0447aa5a7a
< Pragma: no-cache
< Cache-Control: no-cache; no-store, must-revalidate, private
< Expires: Thu, 01 Jan 1970 00:00:00 UTC
< content-length: 56
< connection: keep-alive
<
{
  "status": "Model \"mlp\" registration scheduled."
}
```
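Since the asynchronous call returns before the workers exist, one way to wait for the model is to poll the Describe Model API until it reports a healthy status. This is a minimal sketch, not part of the official API; the 5-second interval and the grep-based check against the response format shown below are assumptions:

```sh
# Minimal polling sketch: wait until the asynchronously registered "mlp"
# model reports "Healthy" via the Describe Model API.
until curl -s "http://localhost:8080/models/mlp" | grep -q '"status": "Healthy"'; do
  echo "waiting for model mlp to become healthy..."
  sleep 5
done
echo "model mlp is ready"
```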
```
POST /workflows
```
- url - the workflow URL.
- engine - the name of the engine to load the model. The default is MXNet if the model doesn't define its engine.
- gpu_id - the GPU device id on which to load the model. The default is CPU (`-1`).
- min_worker - the minimum number of worker processes. The default value is `1`.
- max_worker - the maximum number of worker processes. The default is the same as the setting for `min_worker`.
- synchronous - whether or not the creation of workers is synchronous. The default value is `true`.
```sh
curl -X POST "http://localhost:8080/workflows?url=https%3A%2F%2Fresources.djl.ai%2Ftest-workflows%2Fmlp.tar.gz"
```

```json
{
  "status": "Workflow \"mlp\" registered."
}
```
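As an illustration, the engine and device parameters from the list above can be combined on one request; the values below are arbitrary assumptions, not recommendations:

```sh
# Illustrative only: register the workflow on GPU 0 with two initial workers
# (gpu_id and min_worker come from the parameter list above).
curl -X POST "http://localhost:8080/workflows?url=https%3A%2F%2Fresources.djl.ai%2Ftest-workflows%2Fmlp.tar.gz&gpu_id=0&min_worker=2"
```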
Downloading and loading a workflow may take some time, so you can choose to make the call asynchronously and check the status later.
The asynchronous call returns with HTTP code 202 before the workers are created:
```sh
curl -v -X POST "http://localhost:8080/workflows?url=https%3A%2F%2Fresources.djl.ai%2Ftest-workflows%2Fmlp.tar.gz&synchronous=false"
```

```
< HTTP/1.1 202 Accepted
< content-type: application/json
< x-request-id: bf998daa-892f-482b-a660-6d0447aa5a7a
< Pragma: no-cache
< Cache-Control: no-cache; no-store, must-revalidate, private
< Expires: Thu, 01 Jan 1970 00:00:00 UTC
< content-length: 56
< connection: keep-alive
<
{
  "status": "Workflow \"mlp\" registration scheduled."
}
```
```
PUT /models/{model_name}
PUT /models/{model_name}/{version}
PUT /workflows/{workflow_name}
PUT /workflows/{workflow_name}/{version}
```
- batch_size - the inference batch size. The default value is `1`.
- max_batch_delay - the maximum delay for batch aggregation. The default value is 100 milliseconds.
- max_idle_time - the maximum idle time before the worker thread is scaled down.
- min_worker - the minimum number of worker processes. The default value is `1`.
- max_worker - the maximum number of worker processes. The default is the same as the setting for `min_worker`.
Use the Scale Worker API to dynamically adjust the number of workers to better serve different inference request loads.
There are two flavors of this API: synchronous and asynchronous.
The asynchronous call will return immediately with HTTP code 202:
```sh
curl -v -X PUT "http://localhost:8080/workflows/mlp?min_worker=3"
```

```
< HTTP/1.1 202 Accepted
< content-type: application/json
< x-request-id: 74b65aab-dea8-470c-bb7a-5a186c7ddee6
< content-length: 33
< connection: keep-alive
<
{
  "status": "Worker updated"
}
```
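The batching parameters can be adjusted on the same call. A hypothetical example combining them (the values below are arbitrary assumptions):

```sh
# Illustrative only: grow the worker pool to 2-4 processes and enlarge the
# batch window for the "mlp" model (parameters from the list above).
curl -X PUT "http://localhost:8080/models/mlp?min_worker=2&max_worker=4&batch_size=8&max_batch_delay=200"
```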
```
GET /models/{model_name}
GET /workflows/{workflow_name}
```
Use the Describe Model API to get the detailed runtime status of a model or workflow:
```sh
curl http://localhost:8080/models/mlp
```

```json
{
  "modelName": "mlp",
  "modelUrl": "https://resources.djl.ai/test-models/mlp.tar.gz",
  "minWorkers": 1,
  "maxWorkers": 1,
  "batchSize": 1,
  "maxBatchDelay": 100,
  "maxIdleTime": 60,
  "status": "Healthy",
  "loadedAtStartup": false,
  "workers": [
    {
      "id": 1,
      "startTime": "2021-07-14T09:01:17.199Z",
      "status": "READY",
      "gpu": false
    }
  ]
}
```
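The response fields lend themselves to simple health checks. A small sketch, assuming the jq CLI is installed (the field names match the example response above):

```sh
# Count the READY workers for the "mlp" model from the Describe Model output.
curl -s http://localhost:8080/models/mlp | jq '[.workers[] | select(.status == "READY")] | length'
```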
```
DELETE /models/{model_name}
DELETE /workflows/{workflow_name}
```
Use the Unregister Model or Workflow API to free up system resources:
```sh
curl -X DELETE http://localhost:8080/models/mlp
```

```json
{
  "status": "Workflow \"mlp\" unregistered"
}
```
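Combined with the List Models API below, every registered model can be unregistered in one pass. A hedged sketch, assuming jq is installed (see the list response shape below):

```sh
# Sketch: unregister every currently registered model, one DELETE per name.
for name in $(curl -s http://localhost:8080/models | jq -r '.models[].modelName'); do
  curl -s -X DELETE "http://localhost:8080/models/${name}"
done
```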
```
GET /models
GET /workflows
```
- limit - (optional) the maximum number of items to return. It is passed as a query parameter. The default value is `100`.
- next_page_token - (optional) queries for the next page. It is passed as a query parameter. This value is returned by a previous API call.
Use the Models or Workflows API to query the currently registered models and workflows:
curl "http://localhost:8080/workflows"
This API supports pagination:

```sh
curl "http://localhost:8080/models?limit=2&next_page_token=0"
```

```json
{
  "models": [
    {
      "modelName": "mlp",
      "modelUrl": "https://resources.djl.ai/test-models/mlp.tar.gz"
    }
  ]
}
```
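A hedged sketch for walking every page, assuming jq is installed and that the response carries the continuation token in a nextPageToken field (that field name follows the camelCase style of the other responses here and is an assumption, not confirmed by this document):

```sh
# Pagination sketch: print all model names, page by page.
token=""
while :; do
  resp=$(curl -s "http://localhost:8080/models?limit=2&next_page_token=${token}")
  echo "$resp" | jq -r '.models[].modelName'
  token=$(echo "$resp" | jq -r '.nextPageToken // empty')  # assumed field name
  [ -z "$token" ] && break
done
```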