In [1]:
import json
import outlines
from transformers import AutoTokenizer
import torch
from textwrap import dedent

In [3]:
model_name = "microsoft/Phi-3-medium-4k-instruct"
model = outlines.models.transformers(
    model_name,
    model_kwargs={
        'torch_dtype': torch.bfloat16,
        'trust_remote_code': True
    })
tokenizer = AutoTokenizer.from_pretrained(model_name)

`flash-attention` package not found, consider installing for better performance: No module named 'flash_attn'.
Current `flash-attenton` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`.


Loading checkpoint shards:   0%|          | 0/6 [00:00<?, ?it/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [4]:
departments = ["clothing", "electronics", "kitchen", "automotive"]

In [5]:
with open("../examples.json", "r") as fin:
    complaint_data = json.load(fin)

In [12]:
complaint_data

[{'message': 'Hi, my name is Jessica Sanders.I recently ordered a baking set from your store which contained a variety of tools for pastry preparation. However, upon arrival, I found that the silicone spatula I received was cracked and unusable.The order number is D34-5271',
  'order_number': 'D34-5271',
  'department': 'kitchen'},
 {'message': 'Hi, my name is Sarah Miller.I recently ordered a knife and received my package today. Upon opening, I discovered that the knife blade was chipped along the edge, which makes it unsafe and unusable.My order was Z456789',
  'order_number': 'Z45-6789',
  'department': 'kitchen'},
 {'message': 'Hi, my name is Jessica Wilson.I recently ordered a set of casual shirts for work, but upon arrival, I was disappointed to find that the colors were faded and not as shown in the images on your website.This is order A12-3456',
  'order_number': 'A12-3456',
  'department': 'clothing'},
 {'message': 'Hi, my name is Alex Carter.I recently ordered a stainless ste

In [6]:
def create_prompt(complaint):
    """Build a chat-formatted classification prompt for one complaint record."""
    prompt_messages = [{
        "role": "user",
        "content": dedent("""
        I'm going to provide you with a consumer complaint to analyze.
        The complaint is going to be regarding a product from one of our
        departments. Here is the list of departments:
            - "clothing"
            - "electronics"
            - "kitchen"
            - "automotive"
        Please reply with *only* the name of the department.
        """)
    },{
        "role": "assistant",
        "content": "I understand and will only answer with the department name"
    },{
        "role": "user",
        "content": f"Great! Here is the complaint: {complaint['message']}"
    }]
    prompt = tokenizer.apply_chat_template(prompt_messages, tokenize=False)
    return prompt
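
The prompt above hardcodes the four department names, so it can drift out of sync with the `departments` list. As a rough sketch, the same three-turn conversation could be built from the list itself. `build_prompt_messages` is a hypothetical helper, not part of the original notebook, and its instruction text drops the leading newline that `dedent` leaves in the original:

```python
def build_prompt_messages(complaint, departments):
    """Hypothetical helper: build the same conversation from a department list."""
    # One quoted bullet per department, mirroring the hand-written list.
    department_lines = "\n".join(f'    - "{name}"' for name in departments)
    instructions = (
        "I'm going to provide you with a consumer complaint to analyze.\n"
        "The complaint is going to be regarding a product from one of our\n"
        "departments. Here is the list of departments:\n"
        f"{department_lines}\n"
        "Please reply with *only* the name of the department.\n"
    )
    return [
        {"role": "user", "content": instructions},
        {
            "role": "assistant",
            "content": "I understand and will only answer with the department name",
        },
        {
            "role": "user",
            "content": f"Great! Here is the complaint: {complaint['message']}",
        },
    ]


# Made-up complaint used only to show the shape of the result.
example_messages = build_prompt_messages(
    {"message": "The toaster arrived dented."},
    ["clothing", "electronics", "kitchen", "automotive"],
)
```

The returned list could then be passed to `tokenizer.apply_chat_template(..., tokenize=False)` exactly as `create_prompt` does.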

In [7]:
create_prompt(complaint_data[0])

'<|user|>\n\nI\'m going to provide you with a consumer complaint to analyze.\nThe complaint is going to be regarding a product from one of our\ndepartments. Here is the list of departments:\n    - "clothing"\n    - "electronics"\n    - "kitchen"\n    - "automotive"\nPlease reply with *only* the name of the department.\n<|end|>\n<|assistant|>\nI understand and will only answer with the department name<|end|>\n<|user|>\nGreat! Here is the complaint: Hi! This is Jessica Thompson.I recently ordered a high definition television for my living room. Upon unboxing and attempting to set it up, I discovered that the screen exhibits unusual flickering patterns that donumber in various brightness settings.My order was D123456<|end|>\n<|assistant|>\n'

## Unstructured Generation

In [8]:
generator = outlines.generate.text(model)

In [9]:
for complaint in complaint_data:
    prompt = create_prompt(complaint)
    result = generator(prompt, max_tokens=12)
    print(f"LLM: {result} Actual: {complaint['department']}")

We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
You are not running the flash-attention implementation, expect numerical differences.


LLM: "electronics" Based on the details provided Actual: electronics
LLM: electronics I comprehend the challenge and will stick Actual: electronics
LLM: automotive The correct department is "automot Actual: kitchen
LLM: electronics Given the information provided in the customer compla Actual: electronics
LLM: automotive Thank you for providing the complaint Actual: automotive
LLM: clothing The complaint pertains to a Actual: clothing
LLM: automotive The manner of speech and the details Actual: automotive
LLM: automotive Based on the provided complaint, Actual: automotive
LLM: Based on the complaint regarding a knife set, the Actual: kitchen
LLM: clothing I apologize for the inconvenience Actual: clothing


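Too much yapping! The model tacks commentary onto the department name, and in one case never names a department at all, so this output can't be parsed reliably in a real production system.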
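
To see why the free-text output is awkward to consume, here is a minimal sketch of the post-processing it would need. `extract_department` is a hypothetical helper, not part of the notebook, and the sample strings are copied from three of the printed outputs above:

```python
departments = ["clothing", "electronics", "kitchen", "automotive"]


def extract_department(text, departments):
    """Hypothetical helper: return the first department named in text, or None."""
    lowered = text.lower()
    # Record where each department name first appears, if it appears at all.
    hits = [(lowered.find(name), name) for name in departments if name in lowered]
    # The earliest mention wins; no mention means there is nothing to recover.
    return min(hits)[1] if hits else None


sample_outputs = [
    '"electronics" Based on the details provided',
    'automotive The correct department is "automot',
    "Based on the complaint regarding a knife set, the",
]
extracted = [extract_department(text, departments) for text in sample_outputs]
print(extracted)  # ['electronics', 'automotive', None]
```

Even with this clean-up, the third output yields no label at all, and a response that mentions the wrong department first would be misread.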

## Structured Generation

In [10]:
generator_struct = outlines.generate.choice(model, departments)

Compiling FSM index for all state transitions: 100%|██████████████████| 34/34 [00:00<00:00, 192.43it/s]
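
The effect of the `choice` constraint can be illustrated with a plain regular expression: the only acceptable outputs are exact matches of one of the department names. This is a conceptual sketch using Python's `re` module, not a description of how Outlines builds its FSM index internally, and `is_valid_choice` is a hypothetical name:

```python
import re

departments = ["clothing", "electronics", "kitchen", "automotive"]

# An alternation that matches exactly one department name and nothing else.
choice_pattern = re.compile("|".join(re.escape(name) for name in departments))


def is_valid_choice(text):
    """Hypothetical check: True only if text is exactly one of the departments."""
    return choice_pattern.fullmatch(text) is not None


print(is_valid_choice("kitchen"))  # True
print(is_valid_choice('"electronics" Based on the details provided'))  # False
```

Structured generation enforces this kind of constraint while tokens are being sampled, rather than validating the text afterwards.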


In [11]:
for complaint in complaint_data:
    prompt = create_prompt(complaint)
    result = generator_struct(prompt, max_tokens=12)
    print(f"LLM: {result} Actual: {complaint['department']}")

LLM: electronics Actual: electronics
LLM: electronics Actual: electronics
LLM: electronics Actual: kitchen
LLM: electronics Actual: electronics
LLM: automotive Actual: automotive
LLM: clothing Actual: clothing
LLM: automotive Actual: automotive
LLM: automotive Actual: automotive
LLM: kitchen Actual: kitchen
LLM: clothing Actual: clothing
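
Because every structured output is a bare department name, scoring reduces to an equality check. The sketch below hardcodes the ten predicted and actual labels printed above; the `accuracy` helper and variable names are illustrative only:

```python
# Labels copied from the structured-generation output above, in order.
structured_predictions = [
    "electronics", "electronics", "electronics", "electronics", "automotive",
    "clothing", "automotive", "automotive", "kitchen", "clothing",
]
actual_departments = [
    "electronics", "electronics", "kitchen", "electronics", "automotive",
    "clothing", "automotive", "automotive", "kitchen", "clothing",
]


def accuracy(predictions, actual):
    """Fraction of predictions that exactly match the actual label."""
    correct = sum(p == a for p, a in zip(predictions, actual))
    return correct / len(actual)


# Zero-based positions where the prediction disagrees with the label.
misses = [
    i
    for i, (p, a) in enumerate(zip(structured_predictions, actual_departments))
    if p != a
]

print(accuracy(structured_predictions, actual_departments))  # 0.9
print(misses)  # [2]
```

The constraint guarantees a parseable answer, not a correct one: the third complaint is still labeled `electronics` instead of `kitchen`.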
