# Testing Your Dev Environment

The purpose of this notebook is to ensure that your development environment for this workshop is set up correctly.

## Basic Libraries

Let's start by making sure all of the libraries are installed correctly:

In [1]:
import json
import outlines
import torch
from transformers import AutoTokenizer
from textwrap import dedent
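If any of those imports fail, it helps to see which packages are missing and which versions are installed. Here is a small sketch that uses only the standard library. `check_environment` is a hypothetical helper, not part of the workshop code, and it assumes each distribution name matches its module name, which holds for `outlines`, `torch` and `transformers`:

```python
import importlib.metadata
import importlib.util


def check_environment(modules):
    """Map each module name to its installed version, "unknown", or None if missing."""
    report = {}
    for name in modules:
        if importlib.util.find_spec(name) is None:
            # The module cannot be found on the import path at all
            report[name] = None
            continue
        try:
            # Assumes the distribution name equals the module name
            report[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            # Importable, but no distribution with that name (e.g. stdlib modules)
            report[name] = "unknown"
    return report


if __name__ == "__main__":
    for mod, ver in check_environment(["outlines", "torch", "transformers"]).items():
        print(f"{mod:<14} {ver if ver is not None else 'MISSING'}")
```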

## Loading models

This tutorial was built using `microsoft/Phi-3-medium-4k-instruct`, but this model can be too large for most laptop/notebook environments. To make this easier, there are two good alternatives to Phi-3-medium:

- `microsoft/Phi-3-mini-4k-instruct`: A smaller (but still several GB) model that performs reasonably well.
- `Qwen/Qwen2-0.5B-Instruct`: A very small model that should run on most machines (the default for this notebook).

Unless you have a particularly powerful, high-RAM MacBook Pro that can handle the larger models, we strongly recommend sticking with Qwen2 for the exercises. The results will be fairly mediocre, but it should be easy to transfer all the skills learned in the exercises to more powerful models, whether on a desktop at home or hosted in the cloud.
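To see why model size matters, a rough rule of thumb is that the weights alone need (parameter count) × (bytes per parameter), and `torch.bfloat16` (the dtype used when loading the model) uses 2 bytes per parameter. The parameter counts below (about 14B, 3.8B and 0.5B) are approximate figures assumed from the model cards, and the estimate ignores activations, the KV cache and framework overhead, so real memory use will be higher:

```python
# Bytes needed to store one parameter in each dtype
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gb(num_params, dtype="bfloat16"):
    """Rough size of the model weights alone, in decimal gigabytes (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Approximate parameter counts (assumed, not measured here)
MODELS = {
    "microsoft/Phi-3-medium-4k-instruct": 14e9,
    "microsoft/Phi-3-mini-4k-instruct": 3.8e9,
    "Qwen/Qwen2-0.5B-Instruct": 0.5e9,
}

for name, params in MODELS.items():
    print(f"{name:<38} ~{weight_memory_gb(params):.1f} GB")
```

This gives roughly 28 GB, 7.6 GB and 1 GB of weights respectively, which is why Qwen2-0.5B is the practical default on a laptop.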

Here is the basic code to load the model and the tokenizer (used for generating instruct prompts):

In [2]:
model_name = "Qwen/Qwen2-0.5B-Instruct"
model = outlines.models.transformers(
    model_name,
    # this assumes Apple Silicon
    # Remove or change to 'cuda'/'cpu' if you
    # are using another device.
    device='mps',
    model_kwargs={
        'torch_dtype': torch.bfloat16,
        'trust_remote_code': True
    })
tokenizer = AutoTokenizer.from_pretrained(model_name)
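Rather than editing the `device` argument by hand, you could pick it automatically. This is a small sketch that prefers CUDA, then Apple's MPS, then the CPU, using PyTorch's `torch.cuda.is_available()` and `torch.backends.mps.is_available()` checks. The helper name `pick_device` is ours, not part of outlines or transformers:

```python
def pick_device():
    """Return 'cuda', 'mps' or 'cpu' depending on what this machine supports."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no accelerator to use
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in newer PyTorch builds, so guard the lookup
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(pick_device())
```

You could then pass `device=pick_device()` instead of the hard-coded `device='mps'`.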

Then we create a simple prompt:

In [3]:
messages = [
    {
        "role": "user",
        "content": "Generate a phone number"
    },
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False)
prompt

'<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nGenerate a phone number<|im_end|>\n'
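To make the output above less opaque, here is a hand-rolled sketch of the ChatML layout it follows. Each turn is wrapped in `<|im_start|>role` ... `<|im_end|>`, and Qwen2's template inserts a default system message when none is given. `to_chatml` is a hypothetical helper for illustration only. In practice, use `tokenizer.apply_chat_template`, since the real template ships with the model and may differ in details. `apply_chat_template` also accepts an `add_generation_prompt` argument. The call above leaves it at its default (`False`), so the prompt ends without opening an assistant turn. The sketch mirrors that flag:

```python
DEFAULT_SYSTEM = "You are a helpful assistant."


def to_chatml(messages, add_generation_prompt=False):
    """Approximate the ChatML string Qwen2's chat template produces (illustration only)."""
    # Qwen2's template adds a default system turn when none is supplied
    if not messages or messages[0]["role"] != "system":
        messages = [{"role": "system", "content": DEFAULT_SYSTEM}] + list(messages)
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Open the assistant turn so the model is cued to reply next
        text += "<|im_start|>assistant\n"
    return text


messages = [{"role": "user", "content": "Generate a phone number"}]
print(repr(to_chatml(messages)))
print(repr(to_chatml(messages, add_generation_prompt=True)))
```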

## Unstructured Generation

This code tests *unstructured* generation with outlines. If it runs in a reasonable amount of time, you have selected an appropriately sized model for your laptop.

In [4]:
generator = outlines.generate.text(model)

In [5]:
generator(prompt, max_tokens=24)

We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)


'茫然'
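What counts as a "reasonable amount of time" is up to you, but it helps to measure it. Below is a minimal helper that wraps any callable, such as the `generator` above, with `time.perf_counter`. `time_call` is hypothetical and not part of outlines:

```python
import time


def time_call(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# In the notebook, for example:
# text, seconds = time_call(generator, prompt, max_tokens=24)
# print(f"{seconds:.1f}s for up to 24 tokens")
```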

## Structured Generation

Finally, we run an example of *structured* generation with outlines. If the outlines cells below run, everything should work for the rest of the exercises.
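Conceptually, structured generation works by masking. At each step, any token that would make the output impossible to complete as a match for the pattern is removed before the next token is chosen. The toy sketch below shows that idea at the character level for the phone-number pattern used next, with a random scorer standing in for the LLM. It is an illustration under simplifying assumptions, not how outlines is implemented. Roughly speaking, outlines compiles the regex into a finite-state machine and indexes it against the tokenizer's multi-character vocabulary (the "Compiling FSM index" step), which this sketch does not attempt:

```python
import random
import re
import string

PHONE_RE = re.compile(r"\([0-9]{3}\) [0-9]{3}-[0-9]{4}")

# This pattern has a fixed length, so its automaton is just a chain of states,
# one per character position, each with a set of allowed characters.
DIGITS = set(string.digits)
TEMPLATE = [{"("}, DIGITS, DIGITS, DIGITS, {")"}, {" "},
            DIGITS, DIGITS, DIGITS, {"-"},
            DIGITS, DIGITS, DIGITS, DIGITS]

# Toy "vocabulary": single printable characters instead of real tokens
VOCAB = list(string.printable)


def fake_model_scores(prefix, rng):
    """Stand-in for an LLM: random scores (a real model would condition on prefix)."""
    return {tok: rng.random() for tok in VOCAB}


def constrained_generate(seed=0):
    rng = random.Random(seed)
    out = ""
    for allowed in TEMPLATE:  # current state = current position in TEMPLATE
        scores = fake_model_scores(out, rng)
        # Mask out every token the pattern forbids in this state
        legal = {tok: s for tok, s in scores.items() if tok in allowed}
        out += max(legal, key=legal.get)  # greedy choice among legal tokens
    return out


result = constrained_generate()
print(result, bool(PHONE_RE.fullmatch(result)))
```

However the scorer ranks the vocabulary, the mask guarantees the result matches the pattern. The model's scores only decide *which* valid string comes out.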

In [6]:
generator_struct = outlines.generate.regex(model, r"\([0-9]{3}\) [0-9]{3}-[0-9]{4}")

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Compiling FSM index for all state transitions: 100%|████| 15/15 [00:00<00:00, 33.31it/s]
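The `huggingface/tokenizers` warning above is harmless. As the message itself suggests, you can silence it by setting `TOKENIZERS_PARALLELISM` explicitly. Here is a minimal sketch. It is safest to run it before the tokenizer is first used, for example at the top of the notebook:

```python
import os

# Set explicitly so tokenizers does not warn after a fork;
# "false" turns tokenizer parallelism off.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```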


In [7]:
generator_struct(prompt)

'(100) 555-1234'
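As a quick sanity check, you can use Python's standard `re` module to confirm that the structured output fully matches the pattern and the unstructured output does not. `re.fullmatch` is used so the entire string has to match, not just a prefix. `matches_pattern` is a hypothetical helper:

```python
import re

# Same pattern as passed to outlines.generate.regex above
PHONE_PATTERN = r"\([0-9]{3}\) [0-9]{3}-[0-9]{4}"


def matches_pattern(text, pattern=PHONE_PATTERN):
    """True only if the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None


print(matches_pattern("(100) 555-1234"))  # structured output shown above -> True
print(matches_pattern("茫然"))             # unstructured output shown above -> False
```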