# Generate
This notebook demonstrates the `tailwiz.generate` function, which generates a response to a `prompt`. You can optionally pass in labeled data as references that the AI will use when generating text for the unlabeled data's prompts. If no labeled data is passed in, text will still be generated in response to your data's `prompt`s, but the results may be less predictable.

The main difference between `tailwiz.parse` and `tailwiz.generate` is that, with `tailwiz.parse`, the labels must be extracted directly from the text. By contrast, `tailwiz.generate` is able to generate labels simply given a prompt.

In [None]:
###################################################################
#######            START - Edit variables here.             #######

# Instructions or question describing your task. The prompt will be
# attached to the beginning of each of your texts.
prompt = 'Label this tweet as either "positive" or "negative" based on its sentiment:'

# Path of data (csv) to be used to generate text.
to_generate = 'data/tweets.csv'
# Column name of the text to be used to generate text.
to_generate_text_col = 'text'

# Path of labeled data (csv) that tailwiz learns from.
labeled_examples = 'data/tweets-with-labels.csv'
# Column name of the text to be learned by tailwiz.
labeled_examples_text_col = 'text'
# Column name of the label to be learned by tailwiz.
labeled_examples_label_col = 'sentiment'

# Path to where you want to save your results.
save_csv = 'data/tweets-with-tailwiz-labels.csv'

##################################################################
#######   END - Leave unedited to run with example data.   #######

The example data consists of tweets (`text`), the tweet sentiment (`sentiment`, positive or negative), and an excerpt that identifies the sentiment of the tweet (`selected_text`). We have 200 labeled examples and ~3K unlabeled examples. Our goal will be to use `tailwiz` to label the 3K unlabeled examples using our 200 labeled examples as references. Providing more labeled examples will generally improve performance.

## 1. Install `tailwiz`

In [None]:
!python -m pip install --upgrade tailwiz

In [None]:
# Import required packages.
import tailwiz
import pandas as pd

## 2. Data prep
First, we read in our example data from a .csv file using the `pandas` library.

In [None]:
# Read from the paths set in the variables cell (re-run that cell before re-running this one).
labeled_examples = pd.read_csv(labeled_examples)
to_generate = pd.read_csv(to_generate)
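
Before going further, it can help to confirm that the column names set at the top of the notebook actually exist in the loaded data. The following is a minimal sketch of such a check; `missing_columns` is a hypothetical helper written for illustration (it is not part of `tailwiz`), and the toy DataFrame stands in for the real CSV.

```python
import pandas as pd


def missing_columns(df, required):
    """Return the required column names that are absent from df, in order."""
    return [col for col in required if col not in df.columns]


# Toy frame standing in for the labeled CSV; the row is made up for illustration.
toy_labeled = pd.DataFrame({'text': ['so happy today'], 'sentiment': ['positive']})

# Both configured columns are present, so nothing is reported as missing.
print(missing_columns(toy_labeled, ['text', 'sentiment']))

# A column name that does not exist in the frame is reported back.
print(missing_columns(toy_labeled, ['text', 'no_such_column']))
```

Running the same check on the real data with the configured column names would surface a typo in a column name before the longer generation step starts.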

As described above, our example data is Twitter data with `text`, `sentiment`, and `selected_text` columns. In this notebook we focus on the tweet sentiments: our goal is to use `tailwiz` to label the ~3K unlabeled examples using our 200 labeled examples as references.

Note that in the `tailwiz.classify` example notebook (`classify.ipynb`), we treat the positive/negative labels as classes into which we classify the rest of our unlabeled data. In this notebook, we treat the positive/negative labels as responses to the prompt, "Label this tweet as either "positive" or "negative" based on its sentiment: ...".

Below is a preview of our data.

In [None]:
# View first 5 rows of labeled data.
labeled_examples.head()

In [None]:
# View first 5 rows of unlabeled data.
to_generate.head()

In [None]:
# Before calling tailwiz.generate, rename the label column to 'label',
# which is the column name `tailwiz.generate` expects.
labeled_examples = labeled_examples.rename(columns={labeled_examples_label_col: 'label'})

We must create a prompt column. `tailwiz` will attempt to follow the prompt to generate labels.

In [None]:
labeled_examples['prompt'] = prompt + ' ' + labeled_examples[labeled_examples_text_col]
to_generate['prompt'] = prompt + ' ' + to_generate[to_generate_text_col]
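
The prompt construction above can be tried on a small made-up DataFrame. This sketch also drops rows with missing text first, on the assumption that a prompt without any text is not useful; whether the real data contains such rows is not stated here, and the toy rows and the instruction string are illustrative only.

```python
import pandas as pd

instruction = 'Label this tweet as either "positive" or "negative" based on its sentiment:'

# Toy stand-in for the unlabeled data; the rows are made up for illustration.
toy = pd.DataFrame({'text': ['I love this!', None, 'Worst day ever.']})

# Drop rows with missing text so that every remaining row yields a complete prompt.
toy = toy.dropna(subset=['text']).reset_index(drop=True)

# Same construction as in the notebook: the instruction, a space, then the text.
toy['prompt'] = instruction + ' ' + toy['text']

for built_prompt in toy['prompt']:
    print(built_prompt)
```

Each resulting prompt begins with the instruction and ends with the tweet text, which is the single string that gets passed in the `prompt` column.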

## 3. Call `generate` function
The next step is to call `tailwiz.generate`! We set `output_metrics` to `True` to also output an estimate of the performance of our text generation job.

This may take 5-15 minutes. If this is your first time running `tailwiz.generate`, you might see some extra downloads.

In [None]:
results, performance_estimate = tailwiz.generate(
    to_generate[['prompt']],
    labeled_examples=labeled_examples[['prompt', 'label']],
    output_metrics=True,
)

## 4. Inspect and save results
After generating responses for our unlabeled data, we can inspect and save results.

First, let's inspect the first five rows to do a quick sanity check. A new column, `tailwiz_label`, contains the newly generated labels.

In [None]:
results.head()

We can also print out our performance estimate to gain some additional insight into our labels.

In [None]:
performance_estimate

Note that this is only an estimate based on your labeled data. We will not know for certain how the text generation job actually performed on the unlabeled data.
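
One way to check quality on the unlabeled data directly is to hand-label a small sample of the generated rows and compare. The sketch below assumes a simple exact-match comparison after trimming whitespace and lowercasing; this is an illustrative choice and not necessarily how `tailwiz` computes its own metrics. `exact_match_accuracy` is a hypothetical helper, and the labels are made up.

```python
def exact_match_accuracy(predicted, expected):
    """Fraction of predictions equal to the expected label, ignoring case and outer whitespace."""
    pairs = list(zip(predicted, expected))
    if not pairs:
        raise ValueError('need at least one prediction to score')
    hits = sum(
        str(p).strip().lower() == str(e).strip().lower()
        for p, e in pairs
    )
    return hits / len(pairs)


# Made-up generated labels and hand-assigned labels, for illustration only.
generated = ['positive', 'Negative ', 'positive', 'negative']
hand_labeled = ['positive', 'negative', 'negative', 'negative']

# Three of the four pairs match once case and whitespace are normalized.
print(exact_match_accuracy(generated, hand_labeled))
```

The function accepts any two equal-length iterables, so a pandas column such as `results['tailwiz_label']` could be passed alongside a list of hand-assigned labels for the same rows.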

Finally, we can save these results:

In [None]:
results.to_csv(save_csv, index=False)  # We set index to False to avoid saving the index column added by pandas.