# Zeno Build Tutorial 2: Performing Inference

In this tutorial, we'll show how to use
[Zeno Build](https://github.com/zeno-ml/zeno-build/) to perform inference with a
variety of LLMs and to visualize and compare the results.
We'll assume that you've already read the
[previous tutorial](01_visualization.ipynb) and have a basic understanding of
how to use Zeno Build to visualize results.

Specifically, we will use models from [Hugging Face](https://huggingface.co) and [OpenAI](https://openai.com) to predict the sentiment of English movie reviews from the [Stanford Sentiment Treebank](https://nlp.stanford.edu/sentiment/index.html).

## Setup

First, we import the libraries we need. Also make sure that your OpenAI API key is set in your `.env` file, as in the previous tutorial.

In [None]:
import pandas as pd
from datasets import load_dataset

from zeno_build.evaluation.text_features.exact_match import avg_exact_match, exact_match
from zeno_build.evaluation.text_features.length import input_length, output_length
from zeno_build.experiments.experiment_run import ExperimentRun
from zeno_build.models.lm_config import LMConfig
from zeno_build.models.text_generate import generate_from_text_prompt
from zeno_build.reporting.visualize import visualize

## Preparing Data

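Next, we'll load the SST-2 version of the Stanford Sentiment Treebank (the validation split distributed with the GLUE benchmark) from Hugging Face. After this step, the input sentences are in `data`, the gold labels are in `labels`, and both are stored together in the DataFrame `df` with columns `text` and `label`.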

In [None]:
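# Load the SST-2 validation split of the Stanford Sentiment Treebank (via GLUE)
dataset = load_dataset("glue", "sst2", split="validation")
# Input sentences
data = list(dataset["sentence"])
# Map integer class ids to their string names (e.g. "negative", "positive")
label_map = dataset.features["label"].names
labels = [label_map[label] for label in dataset["label"]]
# Keep inputs and gold labels together for visualization
df = pd.DataFrame({"text": data, "label": labels})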
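To make the label conversion concrete, here is a small standalone sketch of the same id-to-name mapping on two made-up reviews. It uses plain Python instead of `datasets` and `pandas`, and it assumes the SST-2 class order `["negative", "positive"]` (id 0 is negative, id 1 is positive).

```python
# Made-up stand-ins for dataset["sentence"], dataset["label"], and
# dataset.features["label"].names (assumed order: 0 = negative, 1 = positive).
sentences = ["a gripping , beautifully acted film .", "a tedious mess ."]
label_ids = [1, 0]
label_names = ["negative", "positive"]

# Map each integer class id to its string name, as the cell above does.
labels = [label_names[i] for i in label_ids]

# Pair each input with its gold label, one record per example
# (the same shape as the rows of `df`).
rows = [{"text": s, "label": l} for s, l in zip(sentences, labels)]

for row in rows:
    print(row["label"], "|", row["text"])
```

Using string labels rather than integer ids means the gold labels can be compared directly with the words the models generate.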

Now we'll define the templates used to prompt the models. We use one template for the Hugging Face models, which are mostly plain text-completion models, and a different one for the OpenAI model, which is chat-based.

In [None]:
prompt_templates = {
    "huggingface": (
        "Review: {{text}}\n\n"
        "Q: Is this review a negative or positive review?\n\nA: It is a"
    ),
    "openai_chat": (
        "Review: {{text}}\n\n"
        "Please answer with one word. "
        "Is this review a negative or positive review?"
    ),
}
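Each `{{text}}` slot is replaced with the value stored under the same key in an input dictionary. The sketch below illustrates that substitution with a hypothetical `fill_template` helper built on a regular expression. It is an assumption about the behavior, not Zeno Build's actual implementation.

```python
import re

# Copy of the "huggingface" template from the cell above.
template = (
    "Review: {{text}}\n\n"
    "Q: Is this review a negative or positive review?\n\nA: It is a"
)


def fill_template(template, values):
    """Replace every {{name}} slot with values[name].

    Hypothetical helper: a sketch of slot filling, not Zeno Build's code.
    """

    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"missing value for slot '{key}'")
        return str(values[key])

    # \w+ matches slot names such as "text"
    return re.sub(r"\{\{(\w+)\}\}", substitute, template)


prompt = fill_template(template, {"text": "A tedious mess."})
print(prompt)
```

For the Hugging Face template, the filled prompt ends with `A: It is a`, so the most likely next word is the sentiment label itself.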

Now we perform inference. We loop over three models, each described by Zeno Build's `LMConfig` class.
This class specifies a "provider" (e.g. Hugging Face or OpenAI) and a model name.

For each model, we call the `generate_from_text_prompt()` function. This function takes several arguments:
* *Inputs:* A list of dictionaries, where each dictionary contains one or more keys corresponding to the slots in the prompt to be filled in.
* *Prompt Template:* A template like the ones we defined above. It can have one or more slots (like `{{text}}`) that are filled in from elements of the input dictionaries.
* *Model Config:* The model configuration as an `LMConfig` object.
* *Generation Parameters:* Sampling parameters such as `temperature`, `max_tokens`, and `top_p`, which mirror the OpenAI API.
* *Requests Per Minute:* For API-based models such as OpenAI, the maximum number of requests to send per minute (to avoid hitting rate limits).
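To illustrate what a requests-per-minute limit means in practice, here is a minimal client-side throttle that spaces requests at least `60 / requests_per_minute` seconds apart. The `Throttle` class is hypothetical and only sketches the idea; Zeno Build's own rate limiting may work differently. A fake clock is injected so the example runs instantly.

```python
import time


class Throttle:
    """Space calls at least 60 / requests_per_minute seconds apart.

    Hypothetical helper: a sketch of the idea, not Zeno Build's implementation.
    """

    def __init__(self, requests_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / requests_per_minute
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next request may start

    def wait(self):
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            # Too early: block until the next slot opens.
            self.sleep(self.next_allowed - now)
            now = self.clock()
        self.next_allowed = now + self.interval


class FakeClock:
    """Deterministic clock whose sleep() just advances time."""

    def __init__(self):
        self.t = 0.0

    def __call__(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


clock = FakeClock()
throttle = Throttle(400, clock=clock, sleep=clock.sleep)
for _ in range(3):
    throttle.wait()  # a real client would send one API request here
print(round(clock.t, 3))  # 0.3
```

At 400 requests per minute the interval is 0.15 seconds, so the first request goes out immediately and each later one waits 0.15 seconds.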

We then wrap each model's predictions in an `ExperimentRun`, which records the model's name, its parameters, and its normalized predictions.

In [None]:
all_results = []
for lm_config in [
    LMConfig(provider="openai_chat", model="gpt-3.5-turbo"),
    LMConfig(provider="huggingface", model="gpt2"),
    LMConfig(provider="huggingface", model="gpt2-xl"),
]:
    # Fill the provider-specific template with each review and query the model
    predictions = generate_from_text_prompt(
        [{"text": x} for x in data],
        prompt_template=prompt_templates[lm_config.provider],
        model_config=lm_config,
        temperature=0.0001,  # near-deterministic decoding
        max_tokens=1,  # the label word is expected as the first token
        top_p=1.0,
        requests_per_minute=400,  # only relevant for API-based models
    )
    # Strip whitespace and lowercase so predictions can be compared to the labels
    result = ExperimentRun(
        name=lm_config.model,
        parameters={"provider": lm_config.provider, "model": lm_config.model},
        predictions=[x.strip().lower() for x in predictions],
    )
    all_results.append(result)


Generating the outputs should take several minutes.
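Because each model generates a single token, a raw output can carry a leading space, a newline, or capital letters (for example, a completion such as ` positive` after `It is a`). The loop above normalizes predictions with `x.strip().lower()`. The sketch below applies the same normalization to made-up raw outputs and flags any prediction that is not one of the two label names, which would simply count as a miss under exact match.

```python
# Made-up raw completions; real outputs depend on the model and prompt.
raw_predictions = [" positive", "Negative", " great", "\nnegative"]
label_names = {"negative", "positive"}

# Same normalization as in the generation loop.
predictions = [x.strip().lower() for x in raw_predictions]

# Outputs outside the label set can never match a gold label exactly.
invalid = [p for p in predictions if p not in label_names]

print(predictions)  # ['positive', 'negative', 'great', 'negative']
print(invalid)  # ['great']
```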


## Performing Evaluation/Visualization/Analysis

Next, we define the functions used for analysis and pass them to the `visualize` function, as we did in [the previous tutorial](01_visualization.ipynb).

In [None]:

functions = [
    output_length,
    input_length,
    exact_match,
    avg_exact_match,
]

visualize(
    df,
    labels,
    all_results,
    "text-classification",
    "text",
    functions,
    zeno_config={"cache_path": "zeno_cache"},
)
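For intuition about the `exact_match` and `avg_exact_match` metrics, the sketch below computes the same quantities in plain Python: a per-example score of 1.0 when the normalized prediction equals the gold label and 0.0 otherwise, plus the mean over all examples (i.e. accuracy). The helper names are hypothetical; Zeno Build's own functions have different signatures because they operate on Zeno's data structures.

```python
def exact_match_scores(predictions, labels):
    """Per-example 1.0/0.0 exact-match scores (hypothetical helper)."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return [1.0 if p == l else 0.0 for p, l in zip(predictions, labels)]


def average_exact_match(predictions, labels):
    """Mean exact-match score, i.e. accuracy (hypothetical helper)."""
    scores = exact_match_scores(predictions, labels)
    return sum(scores) / len(scores)


# Made-up normalized predictions and gold labels.
preds = ["positive", "negative", "great", "negative"]
gold = ["positive", "positive", "negative", "negative"]

print(exact_match_scores(preds, gold))  # [1.0, 0.0, 0.0, 1.0]
print(average_exact_match(preds, gold))  # 0.5
```

Keeping the per-example scores, rather than only the average, is what allows them to be sliced by features such as input length during analysis.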

## Next Steps

So far, we have focused on text classification. In the [next tutorial](03_text_generation.ipynb), we'll look at how to use Zeno Build to evaluate text generation.