# LMQL Features
This notebook presents some of the features of LMQL with local models.

Create the endpoint for model inference:  
```
lmql serve-model mistralai/Mistral-7B-Instruct-v0.1 --repetition_penalty 1.1 --load_in_4bit True --cuda --port 9999
```

*See `examples/local.ipynb` for instructions on setting up a local model.*

In [8]:
import lmql
import nest_asyncio
nest_asyncio.apply()

model_name = "mistralai/Mistral-7B-Instruct-v0.1"

llm = lmql.model(model_name, endpoint="localhost:9999")

BOS_TOKEN = "<s>"
EOS_TOKEN = "</s>"
BINST_TOKEN = "[INST]"
EINST_TOKEN = "[/INST]"
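
Every prompt in this notebook wraps the user message in the same instruct tokens. As a minimal sketch, a helper like the one below could build that wrapper in one place. `format_instruct` is a hypothetical name and not part of LMQL; the layout simply mirrors the prompt strings used in the cells that follow, and the token constants are repeated so the snippet runs on its own.

```python
# Token constants repeated from the setup cell so this snippet is self-contained
BOS_TOKEN = "<s>"
EOS_TOKEN = "</s>"
BINST_TOKEN = "[INST]"
EINST_TOKEN = "[/INST]"


def format_instruct(user_message: str) -> str:
    """Wrap a user message in the instruct tokens used throughout this notebook.

    Hypothetical helper: it reproduces the layout of the prompt strings in the
    query cells, i.e. "<s>[INST] message [/INST]".
    """
    return f"{BOS_TOKEN}{BINST_TOKEN} {user_message.strip()} {EINST_TOKEN}"


prompt = format_instruct("What is the capital of Denmark?")
print(prompt)
```

The queries below keep the tokens inline, which makes it easier to see exactly what the model receives.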

### LLM Placeholders
The basic functionality of LMQL is generating tokens at specific placeholder locations in a prompt. When a query runs, the LLM generates output conditioned on the text that precedes each placeholder, and LMQL stores that output in a variable named after the placeholder. This works with both completion models and chat models (the latter still require model-specific prompt tokens), and it lets you guide the shape of the output.

LMQL can be used from Python by decorating a function with `@lmql.query()`. The function's docstring holds the query, and every top-level string inside it is treated as part of the prompt. F-string style substitution is supported, so arguments can be passed in using `{}`. Returning a variable is optional; if you do return one, the `return` statement must be placed *inside* the query docstring.

In [None]:
# Generate text at a placeholder and read it back
@lmql.query(model=llm)
def capital(country):
    '''lmql
    # Supports f-string placeholders
    "{BOS_TOKEN}{BINST_TOKEN} What is the capital of {country}? {EINST_TOKEN}"
    
    # Guide output
    "The capital city of {country} is[CAPITAL]"
    
    # A return statement, if used, must be placed inside this docstring
    # return CAPITAL
    '''

result = capital("Denmark")

# Get the entire prompt
print(f"Full prompt: {result.prompt}")

# Access placeholder variables
print(f"CAPITAL variable: {result.variables['CAPITAL']}")

### Constraints and programmatic usage
One of the main features of LMQL is that it allows you to constrain the generated output from an LLM. A detailed overview can be found in the [LMQL documentation](https://lmql.ai/docs/language/constraints.html).

In general, it is possible to use the following constraints:
* Stop generation at specific tokens (e.g. punctuation or a newline)
* Limit the length of the generated output
* Constrain the output type (e.g. enforce an integer at a specific point)
* Restrict the output to a fixed set of options (token masking)
* Custom constraints
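
To make the idea of token masking concrete, here is a small conceptual sketch in plain Python. It is not LMQL's implementation: the scores are made up, `masked_argmax` is a hypothetical helper, and real masks operate on tokenizer vocabulary IDs rather than whole words. It only illustrates the principle that disallowed candidates are removed before the best one is picked.

```python
import math


def masked_argmax(logits: dict[str, float], allowed: set[str]) -> str:
    """Return the highest-scoring candidate among those the mask allows."""
    # Candidates outside the mask get a score of -inf so they can never win
    masked = {
        token: (score if token in allowed else -math.inf)
        for token, score in logits.items()
    }
    best = max(masked, key=masked.get)
    if masked[best] == -math.inf:
        raise ValueError("the mask excludes every candidate")
    return best


# Hypothetical next-token scores; "great" scores highest but is not allowed
logits = {"great": 2.1, "positive": 1.4, "negative": 0.3, "neutral": -0.5}
choice = masked_argmax(logits, {"positive", "neutral", "negative"})
print(choice)
```

Without the mask the unconstrained choice would be "great"; with it, the best allowed option is selected instead.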

In [9]:
# Stop generation on the first newline
@lmql.query(model=llm)
def dog_joke():
    '''lmql
    "{BOS_TOKEN}{BINST_TOKEN} Tell me a joke about dogs {EINST_TOKEN}"
    
    # Stop at the first newline
    "Why did the dog[DOG_JOKE]" where STOPS_AT(DOG_JOKE, '\n')
    '''

result = dog_joke()
print(result.prompt)

<s>[INST] Tell me a joke about dogs [/INST]Why did the dog go to the gym? He wanted to get ruff


In [10]:
# Enforce a specific output type
@lmql.query(model=llm)
def planet_count():
    '''lmql
    "{BOS_TOKEN}{BINST_TOKEN} How many planets are in the solar system? {EINST_TOKEN}"
    
    # Make sure that we get an integer of at most two digits
    "There are [N] planets in the solar system" where INT(N) and len(N) < 3
    '''

result = planet_count()
print(f"Result: {result.variables['N']}, type: {type(result.variables['N'])}")

Result: 8, type: <class 'int'>


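Regular-expression constraints are also supported via `REGEX`. The next cell instead restricts the output to a fixed set of options.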

In [11]:
# Limit options to a subset of tokens
@lmql.query(model=llm)
def review_sentiment(review):
    '''lmql
    "{BOS_TOKEN}{BINST_TOKEN} Review: {review}"
    "What is the sentiment of this review? {EINST_TOKEN}"
    
    # Classify the sentiment
    "The sentiment of the review is [SENTIMENT]" where SENTIMENT in ['positive', 'neutral', 'negative']
    '''

result = review_sentiment(review='We had a lovely stay and the food was great!')
print(result.variables['SENTIMENT'])

positive


### Measuring distributions
LMQL allows you to change the decoding algorithm, which in turn gives access to token probability distributions. This means that LLMs can be used as classifiers, for example by applying argmax decoding and measuring the distribution over a token mask.

Several decoding algorithms are available, such as argmax and beam search. See the [LMQL documentation](https://lmql.ai/docs/language/decoding.html) for more.
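
As a rough illustration of what "using the LLM as a classifier" means, the sketch below turns per-option scores into a normalized distribution and picks the most likely label. The scores are invented and `option_distribution` is a hypothetical helper; this is a conceptual sketch and not how LMQL computes its distributions internally.

```python
import math


def option_distribution(scores: dict[str, float]) -> dict[str, float]:
    """Normalize per-option log-scores into probabilities (softmax)."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {option: math.exp(score - top) for option, score in scores.items()}
    total = sum(exps.values())
    return {option: value / total for option, value in exps.items()}


# Hypothetical log-scores a model might assign to each allowed option
scores = {"positive": -0.2, "neutral": -2.5, "negative": -4.0}
probs = option_distribution(scores)
label = max(probs, key=probs.get)

print(label)
print({option: round(p, 3) for option, p in probs.items()})
```

The full distribution is often more useful than the label alone, since it shows how confident the model is in its choice.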

In [None]:
# Measure the distribution over a fixed set of options
@lmql.query(model=llm)
def best_country():
    '''lmql
    argmax
        "{BOS_TOKEN}{BINST_TOKEN} What is the best country in Scandinavia? {EINST_TOKEN}"
        "The best country in Scandinavia is [REASONING]"
        "Therefore, the answer is [COUNTRY]" distribution COUNTRY in ['Denmark', 'Sweden', 'Norway', 'Finland', 'Iceland']
    '''

result = best_country()
print(f"Reasoning: {result.variables['REASONING']}")
print(f"Result: {result.variables['COUNTRY']}")
print(f"Token distributions: {result.variables['P(COUNTRY)']}")

### Tool augmentation
Implementing tools in LMQL is fairly straightforward. To use a tool, define a custom function that performs the tool interaction and call it from within the LMQL query. Coroutines (`async` and `await`) are also supported inside the query, as shown in the example below.
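
A tool like the Wikipedia search below has to pull one field out of a JSON response. The sketch here shows one way to do that with `json.loads`, which is safer than evaluating the response text as Python: JSON literals such as `true` and `null` are not valid Python names. `extract_intro` is a hypothetical helper, and the sample payload is hand-written to match the shape the tool indexes into (`query`, then `pages`, then the first page's `extract`).

```python
import json


def extract_intro(payload: str) -> str:
    """Return the first page's extract, or "No result" if it cannot be found."""
    try:
        pages = json.loads(payload)["query"]["pages"]
        first_page = next(iter(pages.values()))
        return first_page["extract"]
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError, StopIteration):
        return "No result"


# Hand-written sample payload (an assumption, not a recorded API response)
sample = (
    '{"batchcomplete": true, "query": {"pages": '
    '{"1": {"title": "Example", "extract": "An example page."}}}}'
)

print(extract_intro(sample))
print(extract_intro("not json"))
```

Returning a fixed fallback string keeps the query running even when the lookup fails, at the cost of hiding the reason for the failure.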

In [13]:
# Define a "Wikipedia search" tool using the Wikipedia API
import requests

async def wikipedia_search(term):
    try:
        term = term.strip("\n '.")
        response = requests.get(
            "https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro&explaintext&redirects=1&origin=*",
            params={"titles": term},  # let requests URL-encode the search term
        )
        # Parse the body as JSON; eval() is unsafe and fails on true/false/null
        pages = response.json()['query']['pages']
        first_page_id = list(pages.keys())[0]
        result = pages[first_page_id]['extract']
    except Exception:
        # Network errors, bad JSON, or a missing page all fall back to this
        return "No result"

    return result

# Answer a question with help from the Wikipedia search tool
@lmql.query(model=llm)
def wikipedia_qa(question):
    '''lmql
    "{BOS_TOKEN}{BINST_TOKEN} {question} {EINST_TOKEN}\n\n"
    
    "Let's search Wikipedia for the term '[TERM]\n\n" where STOPS_AT(TERM, "'")
    
    result = await wikipedia_search(TERM)

    "Background information: {result}"
    
    "Does this result answer the question? [YESNO]" where YESNO in set(['yes', 'no'])
    
    "Final answer: [ANSWER]"
    '''

result = wikipedia_qa(question='Who invented the game Monopoly?')
print(result.prompt)

<s>[INST] Who invented the game Monopoly? [/INST]

Let's search Wikipedia for the term 'Monopoly'

Result: A monopoly (from Greek μόνος, mónos, 'single, alone' and πωλεῖν, pōleîn, 'to sell'), as described by Irving Fisher, is a market with the "absence of competition", creating a situation where a specific person or enterprise is the only supplier of a particular thing. This contrasts with a monopsony which relates to a single entity's control of a market to purchase a good or service, and with oligopoly and duopoly which consists of a few sellers dominating a market. Monopolies are thus characterised by a lack of economic competition to produce the good or service, a lack of viable substitute goods, and the possibility of a high monopoly price well above the seller's marginal cost that leads to a high monopoly profit. The verb monopolise or monopolize refers to the process by which a company gains the ability to raise prices or exclude competitors. In economics, a monopoly is a single