# Generate
This notebook demonstrates the `tailwiz.generate` function. The purpose of the function is to generate a response to a `prompt`. You have the option of passing in labeled data as references that the AI will use to generate text for the unlabeled data's prompts. If no labeled data is passed in, text will still be generated in response to your data's `prompt`s, but possibly with unexpected results.

The main difference between `tailwiz.parse` and `tailwiz.generate` is that, with `tailwiz.parse`, the labels must be extracted directly from the text. By contrast, `tailwiz.generate` is able to generate labels simply given a prompt.

In [None]:
# Import required packages.
import tailwiz
import pandas as pd

## 1. Data prep
First, we read in our example data from two .csv files (one labeled, one unlabeled) using the `pandas` library.

In [None]:
labeled_data = pd.read_csv('data/tweets-labeled.csv')
unlabeled_data = pd.read_csv('data/tweets-unlabeled.csv')

Our example data is Twitter data. It consists of tweets (`text`), the tweet sentiment (`sentiment`, either positive or negative), and an excerpt that identifies the sentiment of the tweet (`selected_text`). We have 200 labeled examples and ~3K unlabeled examples. We will focus on the tweet sentiments in this notebook: our goal will be to use `tailwiz` to label the 3K unlabeled examples using our 200 labeled examples as references.

Note that in the `tailwiz.classify` example notebook (`classify.ipynb`), we treat the positive/negative labels as classes into which we classify the rest of our unlabeled data. In this notebook, we treat the positive/negative labels as responses to the prompt, `Label this tweet as either "positive" or "negative" based on its sentiment: ...`.

Also note that providing more labeled examples will generally improve performance.

Below is a preview of our data.

In [None]:
# View first 5 rows of labeled data.
labeled_data.head()

In [None]:
# View first 5 rows of unlabeled data.
unlabeled_data.head()

We must now create prompts. This might require some creativity, since you must compose a prompt specific to your desired outcome. Note that the prompt is the only text that will be passed to `tailwiz.generate`, so it should contain as much relevant information as possible. Below is an example.

In [None]:
prompt = 'Label this tweet as either "positive" or "negative" based on its sentiment: '

# We give all examples the same prompt beginning.
labeled_data['prompt'] = prompt + labeled_data.text
unlabeled_data['prompt'] = prompt + unlabeled_data.text
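
To see what a finished prompt looks like, here is a minimal sketch of the same concatenation on a toy DataFrame. The two rows in `toy` are made up for illustration and are not taken from the tweet dataset.

```python
import pandas as pd

# Toy stand-in for the tweet data; these rows are invented for illustration.
toy = pd.DataFrame({'text': ['I love this!', 'Worst day ever.']})

prompt = 'Label this tweet as either "positive" or "negative" based on its sentiment: '

# Adding a string to a Series prepends it to every row.
toy['prompt'] = prompt + toy['text']

# Print one full prompt to confirm it reads as intended.
print(toy['prompt'].iloc[0])
```

Printing one or two full prompts before a long run is a cheap way to catch a missing space or a stray quote in the prompt prefix.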

In [None]:
# Before calling tailwiz.generate with our data, we must rename our label column in accordance with `tailwiz.generate` standards.
# Specifically, the prompt column must be named 'prompt' (it already is), and the label column must be named 'label' (it is currently
# named 'sentiment').
labeled_data = labeled_data.rename(columns={'sentiment': 'label'})
unlabeled_data = unlabeled_data.rename(columns={'sentiment': 'label'})
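
A quick check that the expected column names are present can catch a naming mistake before a long run. The sketch below uses a hypothetical helper, `missing_columns`, which is not part of `tailwiz`, and a made-up one-row DataFrame.

```python
import pandas as pd

def missing_columns(df, required):
    """Return the required column names that are absent from df (hypothetical helper)."""
    return [name for name in required if name not in df.columns]

# Toy labeled data that still uses the original 'sentiment' column name.
toy_labeled = pd.DataFrame({
    'prompt': ['Label this tweet: I love this!'],
    'sentiment': ['positive'],
})

before = missing_columns(toy_labeled, ['prompt', 'label'])

# Same rename as in the cell above.
toy_labeled = toy_labeled.rename(columns={'sentiment': 'label'})
after = missing_columns(toy_labeled, ['prompt', 'label'])

print(before, after)
```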

## 2. Call `generate` function
The next step is to call `tailwiz.generate`! We set `output_metrics` to `True` to also output an estimate of the performance of our text generation job.

This may take 5-15 minutes. If this is your first time running `tailwiz.generate`, you might see some extra downloads.

In [None]:
results, performance_estimate = tailwiz.generate(
    unlabeled_data[['prompt']],
    labeled_examples=labeled_data[['prompt', 'label']],
    output_metrics=True,
)

## 3. Inspect and save results
After generating responses for our unlabeled data, we can inspect and save results.

First, let's inspect the first five rows to do a quick sanity check. Note the new column, `label_from_tailwiz`.

In [None]:
results.head()
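
Because the labels are generated text rather than fixed classes, it can be worth checking that every output is one of the values the prompt asked for. The sketch below runs that check on a made-up `toy_results` DataFrame; only the column name `label_from_tailwiz` comes from this notebook.

```python
import pandas as pd

# Made-up results; the third label deliberately deviates from the requested format.
toy_results = pd.DataFrame({
    'prompt': ['p1', 'p2', 'p3'],
    'label_from_tailwiz': ['positive', 'negative', 'Positive.'],
})

expected = {'positive', 'negative'}

# Count how often each distinct generated label occurs.
counts = toy_results['label_from_tailwiz'].value_counts()

# Keep only the rows whose label is not one of the expected values.
unexpected = toy_results[~toy_results['label_from_tailwiz'].isin(expected)]

print(counts)
print(unexpected)
```

Rows that land in `unexpected` can be normalized (for example, lowercased and stripped of punctuation) or reviewed by hand.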

We can also print out our performance estimate to gain some additional insight into our labels.

In [None]:
performance_estimate

Note that this is only an estimate based on your labeled data. We will not know for certain how the text generation job actually performed on the unlabeled data.
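
For a second opinion, one option is to hold out some labeled rows, generate labels for them, and compare against the known answers. The sketch below covers only the comparison step, using made-up lists. Exact-match accuracy is an assumption chosen for illustration and is not necessarily the metric that `tailwiz` reports.

```python
def exact_match_accuracy(predicted, actual):
    """Fraction of predictions equal to the true label, ignoring case and surrounding whitespace."""
    if len(predicted) != len(actual):
        raise ValueError('predicted and actual must have the same length')
    if not actual:
        return 0.0
    matches = sum(
        p.strip().lower() == a.strip().lower()
        for p, a in zip(predicted, actual)
    )
    return matches / len(actual)

# Made-up held-out labels and generated labels for illustration.
held_out_labels = ['positive', 'negative', 'negative', 'positive']
generated_labels = ['positive', 'negative', 'positive', 'Positive ']

score = exact_match_accuracy(generated_labels, held_out_labels)
print(score)
```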

Finally, we can save these results:

In [None]:
results.to_csv('data/tweets-unlabeled-with-generate-results-from-tailwiz.csv', index=False)  # We set index to False to avoid saving the index column added by pandas.