# Sequence Classification task

Vespa has [recently implemented](https://blog.vespa.ai/stateless-model-evaluation/)
accelerated model evaluation using ONNX Runtime in the stateless cluster.
This opens up new usage areas for Vespa, such as serving model predictions.

## Define the model server

Define the task and the model to use. The [SequenceClassification](https://pyvespa.readthedocs.io/en/latest/reference-api.html#sequenceclassification) task takes a text input and returns an array of floats whose length and meaning depend on the model used to solve the task. The `model` argument can be the id of a model as defined on the Hugging Face model hub.

In [1]:
from vespa.ml import SequenceClassification

task = SequenceClassification(
    model_id="bert_tiny", 
    model="google/bert_uncased_L-2_H-128_A-2"
)

A `ModelServer` is a simplified application package focused on stateless model evaluation. It can take as many tasks as we want.

In [2]:
from vespa.package import ModelServer

model_server = ModelServer(
    name="bert_model_server",
    tasks=[task],
)

## Deploy the model server

We can either host our model server on Vespa Cloud or deploy it locally using a Docker container.

### Host it on Vespa Cloud

Check [this short guide](https://pyvespa.readthedocs.io/en/latest/deploy-vespa-cloud.html)
for detailed information about how to set up your Vespa Cloud account
and where to find the environment variables defined below.

In [3]:
import os

os.environ["TENANT_NAME"] = "vespa-team"
os.environ["APPLICATION_NAME"] = "pyvespa-integration"
if os.getenv("VESPA_CLOUD_USER_KEY"):
    with open(os.path.join(os.getenv("WORK_DIR"), "key.pem"), "w") as f:
        f.write(os.getenv("VESPA_CLOUD_USER_KEY").replace(r"\n", "\n"))
    os.environ["USER_KEY"] = os.path.join(os.getenv("WORK_DIR"), "key.pem")
    os.environ["INSTANCE_NAME"] = "test"
os.environ["DISK_FOLDER"] = os.path.join(os.getenv("WORK_DIR"), "sample_application")
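
The `.replace(r"\n", "\n")` call in the cell above turns literal backslash-n sequences into real newlines before the key is written to disk. A minimal sketch of why that matters, assuming the key was stored in the environment variable as a single line with escaped newlines (the key text below is a made-up placeholder, not a real key):

```python
# Placeholder key text, stored on a single line with literal "\n" sequences,
# as is common when a multi-line secret is passed through an environment variable.
escaped_key = r"-----BEGIN PRIVATE KEY-----\nabc123\n-----END PRIVATE KEY-----"

# Replace the two-character sequence backslash + "n" with a real newline,
# restoring the multi-line layout expected of a PEM file.
restored_key = escaped_key.replace(r"\n", "\n")

lines = restored_key.split("\n")
print(len(lines))  # number of lines after restoring the newlines
```

Without this step the file would contain a single line, which is unlikely to be readable as a PEM key.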

In [None]:
from vespa.deployment import VespaCloud

if os.getenv("VESPA_CLOUD_USER_KEY"):
    vespa_cloud = VespaCloud(
        tenant=os.getenv("TENANT_NAME"),
        application=os.getenv("APPLICATION_NAME"),
        key_location=os.getenv("USER_KEY"),
        application_package=model_server,
    )
    app = vespa_cloud.deploy(
        instance=os.getenv("INSTANCE_NAME"), disk_folder=os.getenv("DISK_FOLDER")
    )

### Deploy locally

Similarly, we can deploy the model server locally in a Docker container.

In [None]:
from vespa.deployment import VespaDocker

vespa_docker = VespaDocker(disk_folder=os.getenv("DISK_FOLDER"), port=8081)
app = vespa_docker.deploy(application_package=model_server)

## Get model information

List the available models:

In [6]:
app.get_model_endpoint()

{'bert_tiny': 'http://localhost:8081/model-evaluation/v1/bert_tiny'}

Get information about a specific model:

In [7]:
app.get_model_endpoint(model_id="bert_tiny")

{'model': 'bert_tiny',
 'functions': [{'function': 'output_0',
   'info': 'http://localhost:8081/model-evaluation/v1/bert_tiny/output_0',
   'eval': 'http://localhost:8081/model-evaluation/v1/bert_tiny/output_0/eval',
   'arguments': [{'name': 'input_ids', 'type': 'tensor(d0[],d1[])'},
    {'name': 'attention_mask', 'type': 'tensor(d0[],d1[])'},
    {'name': 'token_type_ids', 'type': 'tensor(d0[],d1[])'}]}]}
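
The response above is a plain dictionary, so the inputs a model expects can be read from it directly. The sketch below copies the output shown above into a variable (named `model_info` here for illustration only) and collects the argument names and the `eval` URL of each function; it does not call the running application.

```python
# Copy of the model information shown above; `model_info` is an illustrative name.
model_info = {
    "model": "bert_tiny",
    "functions": [
        {
            "function": "output_0",
            "info": "http://localhost:8081/model-evaluation/v1/bert_tiny/output_0",
            "eval": "http://localhost:8081/model-evaluation/v1/bert_tiny/output_0/eval",
            "arguments": [
                {"name": "input_ids", "type": "tensor(d0[],d1[])"},
                {"name": "attention_mask", "type": "tensor(d0[],d1[])"},
                {"name": "token_type_ids", "type": "tensor(d0[],d1[])"},
            ],
        }
    ],
}

# Map each function name to the input tensors it expects and its eval URL.
summary = {
    function["function"]: {
        "arguments": [argument["name"] for argument in function["arguments"]],
        "eval": function["eval"],
    }
    for function in model_info["functions"]
}

for name, details in summary.items():
    print(name, details["arguments"], details["eval"])
```

Here the single function `output_0` takes the three tensors a BERT-style tokenizer typically produces: `input_ids`, `attention_mask`, and `token_type_ids`.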

## Get predictions

Get a prediction:

In [8]:
app.predict(x="this is a test", model_id="bert_tiny")

[0.053629081696271896, -0.01650623418390751]
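
The prediction is an array of floats whose meaning depends on the model. As a minimal sketch, assuming the two values are unnormalized class scores (logits), which is common for sequence classification models but is not stated above, a softmax turns them into probabilities. The `softmax` helper is written here for illustration and is not part of pyvespa.

```python
import math


def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    # Subtract the maximum before exponentiating for numerical stability.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Values copied from the prediction shown above.
logits = [0.053629081696271896, -0.01650623418390751]

probabilities = softmax(logits)
predicted_class = probabilities.index(max(probabilities))

print(probabilities)
print(predicted_class)
```

Note that `google/bert_uncased_L-2_H-128_A-2` is a general pretrained checkpoint, so unless it has been fine-tuned for a specific classification task, the resulting class probabilities should not be expected to carry a meaningful label.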