# Using LMQL with local models (e.g. HuggingFace, Llama)
Make sure you have followed the setup instructions in the `README.md` and that your GPU has enough memory for the model. This example uses the Mistral 7B models ([Mistral.ai](https://mistral.ai/news/announcing-mistral-7b/)).

### Creating a model service endpoint
Local models have to be hosted using `lmql serve-model`. Run this command in a terminal to create an endpoint that the code then connects to. Read more here: [LMQL - Local Models / Transformers](https://lmql.ai/docs/models/hf.html)

You can pass additional parameters that are normally given to the HuggingFace function `AutoModelForCausalLM.from_pretrained(...)` ([see documentation](https://huggingface.co/transformers/v3.0.2/model_doc/auto.html#transformers.AutoConfig.from_pretrained)). This allows setting parameters such as `repetition_penalty` or `load_in_4bit` (quantization). To do this, write the parameter name with two leading dashes, followed by a space and the value (e.g. `--load_in_4bit True`). Use `--port` to specify the port number and add `--cuda` to load the model on the GPU.

This is a basic example of loading the Mistral 7B Instruct model on GPU with an increased repetition penalty and quantization.

Run in the terminal:  
```
lmql serve-model mistralai/Mistral-7B-Instruct-v0.1 --repetition_penalty 1.1 --load_in_4bit True --cuda --port 9999
```
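The flag convention described above (two dashes, the parameter name, a space, then the value) can be sketched as a small helper that assembles the same command from keyword arguments. `build_serve_command` is a hypothetical name introduced here for illustration and is not part of LMQL; it only builds the command string and does not start the server.

```python
import shlex


def build_serve_command(model_name, port, cuda=True, **model_kwargs):
    """Assemble an `lmql serve-model` command line (hypothetical helper)."""
    args = ["lmql", "serve-model", model_name]
    # Each extra keyword argument becomes `--name value`.
    for name, value in model_kwargs.items():
        args += [f"--{name}", str(value)]
    if cuda:
        args.append("--cuda")
    args += ["--port", str(port)]
    # shlex.join quotes arguments where the shell would require it.
    return shlex.join(args)


command = build_serve_command(
    "mistralai/Mistral-7B-Instruct-v0.1",
    port=9999,
    repetition_penalty=1.1,
    load_in_4bit=True,
)
print(command)
```

The printed string can be pasted into a terminal. Keeping the parameters in one place like this may help when switching between models or quantization settings.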

In [1]:
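import lmql
import nest_asyncio

# LMQL runs queries on an asyncio event loop. Jupyter already has one running,
# so nest_asyncio is needed to allow nested use of the loop.
nest_asyncio.apply()

model_name = "mistralai/Mistral-7B-Instruct-v0.1"

# Connect to the endpoint started with `lmql serve-model` (same port as above).
llm = lmql.model(model_name, endpoint="localhost:9999")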


### Prompt formats and completion/chat models
With local LLMs, it's critical to distinguish between two model types: *completion models* and *chat/instruct models*. The type determines how the model should be prompted. I recommend reading this great article on the difference between the two: [Language Models - Completion and Chat-Completion](https://langroid.github.io/langroid/blog/2023/09/19/language-models-completion-and-chat-completion/)

**Prompt formats and special tokens**  
Chat models (often labeled "chat", "instruct", or similar) have been trained with special tokens that teach the model the difference between user input and assistant output. When using a model, it's important to understand which special tokens and prompt format it expects. The Mistral models use a prompt format similar to that of Llama 2. More details can be found here: [Prompt Engineering Guide - Mistral 7B LLM](https://www.promptingguide.ai/models/mistral-7b).
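As a rough illustration of such a prompt format, the sketch below assembles an instruct-style prompt from a conversation history. `format_instruct_prompt` is a hypothetical helper, and the exact whitespace around the tokens is an assumption, so check the model card of the model you are using before relying on it.

```python
BOS_TOKEN = "<s>"
EOS_TOKEN = "</s>"
BINST_TOKEN = "[INST]"
EINST_TOKEN = "[/INST]"


def format_instruct_prompt(turns, next_user_message):
    """Build an instruct-style prompt (hypothetical helper, spacing assumed).

    `turns` is a list of (user_message, assistant_reply) pairs that have
    already been exchanged; `next_user_message` is the new instruction.
    """
    prompt = BOS_TOKEN
    for user_message, assistant_reply in turns:
        # A completed turn: instruction, then the reply closed by EOS.
        prompt += f"{BINST_TOKEN} {user_message} {EINST_TOKEN} {assistant_reply}{EOS_TOKEN}"
    # The open turn the model is expected to complete.
    prompt += f"{BINST_TOKEN} {next_user_message} {EINST_TOKEN}"
    return prompt


single_turn = format_instruct_prompt([], "What emotion will my dog feel?")
multi_turn = format_instruct_prompt(
    [("Hello", "Hi, how can I help?")],
    "What emotion will my dog feel?",
)
print(single_turn)
print(multi_turn)
```

A completion model, by contrast, would be given plain text to continue, without any of these instruction markers.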

In [2]:
BOS_TOKEN = "<s>"
EOS_TOKEN = "</s>"
BINST_TOKEN = "[INST]"
EINST_TOKEN = "[/INST]"

emotions_list = ["happy", "sad", "dissapointed", "angry", "frustrated", "joyful", "anxious"]

@lmql.query(model=llm)
def meaning():
    '''lmql
    argmax
        "{BOS_TOKEN}{BINST_TOKEN} I have stolen my dog's toy. What emotion do you think my dog will feel? {EINST_TOKEN}"
        
        "Let's think about this for a second... [ANALYSIS]" where not "\n" in ANALYSIS
        
        "Based on this analysis, the emotion your dog will most likely experience is: [EMOTION]" distribution EMOTION in emotions_list
    '''

result = meaning()

In [3]:
result.prompt

"<s>[INST] I have stolen my dog's toy. What emotion do you think my dog will feel? [/INST]Let's think about this for a second...  If you stole your dog's toy, it's likely that your dog would feel emotions such as sadness, frustration or anger. They may also exhibit behaviors such as whining, pacing, or loss of appetite. It's important to remember that dogs form strong emotional bonds with their toys, and taking something they value away can cause distress.Based on this analysis, the emotion your dog will most likely experience is: dissapointed"

In [4]:
print(result.variables['ANALYSIS'])

 If you stole your dog's toy, it's likely that your dog would feel emotions such as sadness, frustration or anger. They may also exhibit behaviors such as whining, pacing, or loss of appetite. It's important to remember that dogs form strong emotional bonds with their toys, and taking something they value away can cause distress.


In [5]:
print(result.variables['P(EMOTION)'])

[('happy', 0.0005932049404840637), ('sad', 0.0865745248912329), ('dissapointed', 0.6188276273220179), ('angry', 0.07515209122145111), ('frustrated', 0.1612957475629304), ('joyful', 0.0035589946695807942), ('anxious', 0.05399780939230316)]
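The `P(EMOTION)` value above is a list of `(label, probability)` pairs. A minimal sketch of post-processing it in plain Python, using the values printed above (the variable names are chosen here for illustration):

```python
# Values copied from the output of `result.variables['P(EMOTION)']` above.
emotion_distribution = [
    ("happy", 0.0005932049404840637),
    ("sad", 0.0865745248912329),
    ("dissapointed", 0.6188276273220179),
    ("angry", 0.07515209122145111),
    ("frustrated", 0.1612957475629304),
    ("joyful", 0.0035589946695807942),
    ("anxious", 0.05399780939230316),
]

# The most likely label is the pair with the highest probability.
top_emotion, top_probability = max(emotion_distribution, key=lambda pair: pair[1])

# Sort all labels from most to least likely.
ranked = sorted(emotion_distribution, key=lambda pair: pair[1], reverse=True)

# Total probability mass assigned to the listed labels.
total = sum(probability for _, probability in emotion_distribution)

print(top_emotion, round(top_probability, 3))
for label, probability in ranked:
    print(f"{label:<14}{probability:.3f}")
```

Ranking the full list, rather than reading only the top label, shows how close the runner-up options are.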
