## Initialisation
For the examples in the notebook to work as intended, the `openai` and `python-dotenv` (imported as `dotenv`) libraries need to be installed in the environment where the notebook is started, along with the other libraries imported below. If necessary, run `!pip install <library>` in a code cell to install a missing library.

Before doing anything interesting, we need to import some libraries and methods, and we need to provide a valid OpenAI API key (below, this is read from a local .env file).

In [None]:
# Libraries used in this chapter
import os
import openai
from openai import OpenAI
from dotenv import load_dotenv, find_dotenv
import pandas as pd
import numpy as np
import tiktoken
import json
from pprint import pprint

import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap, ListedColormap
import seaborn as sns
from sklearn.metrics.pairwise import cosine_similarity

# useful pandas - setting
pd.options.mode.copy_on_write = True

In [None]:
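# reading the OpenAI API key and initialising the client

_ = load_dotenv(find_dotenv())  # read variables from a local .env file
openai.api_key = os.environ['OPENAI_API_KEY']  # raises KeyError if the key is missing

client = OpenAI()  # picks up OPENAI_API_KEY from the environment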

## 1. Basic usage of the ChatCompletions endpoint

### 1.1 Calling the API and getting a response... 
To answer a natural language question using the API, we need to...  
 - initiate the client
 - choose a model using the "model" parameter (see [OpenAI model docs](https://platform.openai.com/docs/models/overview) for updated details)
 - initiate the interaction using the "messages" parameter

In [None]:
# making a first request using the "messages" parameter (the client was initialised above)

# initiating the interaction
messages = [  
{"role": "system", "content": "You are a helpful tourist information agent in Bergen, Norway."},    
{"role": "user", "content": "Hvor finner jeg det berømte akvariet?"}  
] 

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=messages
)

pprint(response.choices[0].message.content, width=80)

### 1.2 Useful chat.completions.create parameters: *max_tokens, temperature, top_p and n*   
***temperature***: adjusts the "creativity" of the responses: a higher temperature gives more imaginative and varied responses, but also a higher frequency of errors and hallucinations. Default=1, max=2, min=0. For more repeatable and focused answers use *temperature*=0; this does not by itself guarantee that the answers are factually correct.   
***top_p***: cutoff for the cumulative probability of candidate tokens (nucleus sampling). With *top_p*=0.1, only the most likely tokens that together make up 10% of the probability mass are considered. A lower value therefore acts as a stricter "filter" when adding new elements to the response, resulting in less varied and more to-the-point answers. Default=1.  
Note: OpenAI recommends tuning only one of the *temperature* and *top_p* parameters and leaving the other at its default (=1).  
***max_tokens***: upper limit for the number of tokens generated in the response. Prompt tokens are not counted here, but prompt and response together must still fit within the model's context length.  
***n***: the number of responses returned. Generating several responses to the same query can in some situations be used as input to multi-step processes that select the "best" response among several (according to some evaluation criteria).
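To make the effect of *temperature* and *top_p* more concrete, the sketch below applies both to a made-up four-token distribution. It follows the common textbook description of temperature scaling and nucleus sampling and is not OpenAI's actual implementation; the function names and the toy scores are assumptions for illustration only.

```python
import numpy as np

def apply_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution.

    Toy version: temperature must be > 0 here (the API treats 0 as "pick the most likely token").
    """
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()  # subtract the maximum for numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum()

def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative probability reaches top_p."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]  # indices sorted from most to least likely
    cumulative = np.cumsum(probs[order])
    n_keep = min(int(np.searchsorted(cumulative, top_p)) + 1, len(probs))
    filtered = np.zeros_like(probs)
    filtered[order[:n_keep]] = probs[order[:n_keep]]
    return filtered / filtered.sum()  # renormalise the surviving tokens

toy_logits = [2.0, 1.0, 0.5, 0.1]  # made-up scores for four candidate tokens
print(apply_temperature(toy_logits, 0.2))  # sharply peaked on the first token
print(apply_temperature(toy_logits, 2.0))  # much flatter
print(top_p_filter([0.5, 0.3, 0.15, 0.05], top_p=0.7))  # only the two most likely tokens survive
```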

In [None]:
# using top_p or temperature to adjust "creativity"
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=messages,
    temperature=0,
    top_p=1
)
pprint(response.choices[0].message.content, width=80)

In [None]:
# generating several responses - try adjusting the temperature
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=messages,
    temperature=1,
    n=3
)
for c in response.choices:
    pprint('Response no. ' + str(c.index+1) + ": "+  c.message.content, width=80)
    print('\n')

In [None]:
# temperature=0 does not necessarily eliminate hallucinations
# prompt adjustment to try in the system message: You are very factual. If you do not know something say you do not know.

messages = [  
{"role": "system", "content": "You are a tourist information agent in Bergen, Norway."},    
{"role": "user", "content": "Hvor finner jeg kontoret til Tryg forsikring?"}  
] 

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=messages,
    temperature=0
)

pprint(response.choices[0].message.content, width=80)

### 1.3 Some additional chat.completions.create parameters 
***frequency_penalty***: Reduces the probability of adding a token in proportion to how often it already appears in the preceding text. Scale: -2 to 2, a higher positive value means a stricter "penalty" for high frequency.<br/>
***presence_penalty***: Reduces the probability of adding a token if it has already occurred in the preceding text, regardless of how often. Scale: -2 to 2, a higher positive value indicates a stricter "penalty" for earlier occurrence.<br/>
***stop***: can (optionally) be used to let selected words or word combinations terminate the answer if they occur.<br/>
***seed***: (beta functionality) supply a seed value to make sampling as repeatable as possible; however, this functionality is in beta, and deterministic output is not guaranteed.  
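The difference between the two penalties is easier to see on numbers. The sketch below follows the adjustment described in OpenAI's parameter documentation, where a token's score is reduced by its count times the frequency penalty, plus the presence penalty once if the token has appeared at all. The function name and the toy scores are assumptions for illustration.

```python
import numpy as np

def penalise_logits(logits, counts, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the score of tokens that already occur in the text generated so far.

    logits: raw scores for each candidate token
    counts: how many times each candidate token has appeared so far
    """
    logits = np.asarray(logits, dtype=float)
    counts = np.asarray(counts, dtype=float)
    return logits - counts * frequency_penalty - (counts > 0) * presence_penalty

# three tokens with equal scores: never seen, seen once, seen three times
adjusted = penalise_logits([1.0, 1.0, 1.0], [0, 1, 3],
                           frequency_penalty=0.5, presence_penalty=0.2)
print(adjusted)
```

The unseen token keeps its score, while the frequency penalty grows with each repetition and the presence penalty is applied only once.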

## 2. The chat completion object (the response you get when calling chat.completions.create) 

The chat completion object contains many elements, but most of them are reference information that one would normally ignore. See the official [API-documentation](https://platform.openai.com/docs/api-reference/chat/object) for a general (and up-to-date) reference.

The most interesting parts here are the *choices* list and, to some extent, the *usage* object.

<img src="resources/chat_completion.png" alt="OpenAI chatCompletion" />

The **usage** object provides a count of tokens used in the prompt, in the response, and in total. These figures can be useful for cost control and in situations when context token limitations are an issue. 

In [None]:
# inspecting the "usage" object
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=messages,
    temperature=0,
    top_p=1
)
pprint(dict(response.usage))
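As a rough illustration of how the token counts can feed into cost control, the sketch below multiplies prompt and completion tokens by per-1000-token prices. The function name is hypothetical and the prices are placeholders, not actual OpenAI prices; current prices need to be looked up for the model in use.

```python
def estimate_cost(prompt_tokens, completion_tokens, price_in_per_1k, price_out_per_1k):
    """Estimate the cost of one request from its token counts and per-1000-token prices."""
    return (prompt_tokens / 1000 * price_in_per_1k
            + completion_tokens / 1000 * price_out_per_1k)

# placeholder prices (NOT real prices), in an arbitrary currency per 1000 tokens
PRICE_IN, PRICE_OUT = 0.001, 0.002

# with a real response, the counts would come from
# response.usage.prompt_tokens and response.usage.completion_tokens
print(estimate_cost(prompt_tokens=1000, completion_tokens=500,
                    price_in_per_1k=PRICE_IN, price_out_per_1k=PRICE_OUT))
```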


The ***choices*** list contains the responses generated by the model.<br/> 
Each ***choices[index]*** list element in turn contains the elements: 
 - finish_reason: indicates why the response generation was terminated - useful for identifying technical issues like problems with context length
 - logprobs: log probabilities for each token in the generated response (only populated if requested with the *logprobs* parameter)
 - message: the generated chat message, including the response text and any tool calls

In [None]:
# inspecting an element in the "choices" list
pprint(dict(response.choices[0]), depth=1, width=80)
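Checking *finish_reason* before using a response is a cheap way to catch truncated answers. The sketch below maps the documented values to a short note; the helper name and the wording of the notes are assumptions, and a stand-in object is used instead of a real API response so that the cell runs without a key.

```python
from types import SimpleNamespace

# short explanations for the finish_reason values described in the API reference
FINISH_REASON_NOTES = {
    "stop": "complete: the model finished naturally or hit a stop sequence",
    "length": "cut off: max_tokens or the context length was reached",
    "tool_calls": "the model asked for one or more tool calls",
    "content_filter": "content was omitted by a content filter",
}

def describe_finish(choice):
    """Return a human-readable note for the finish_reason of one choices element."""
    reason = choice.finish_reason
    return FINISH_REASON_NOTES.get(reason, f"unrecognised finish_reason: {reason!r}")

# stand-in for response.choices[0]
fake_choice = SimpleNamespace(finish_reason="length")
print(describe_finish(fake_choice))
```

With a real response, `describe_finish(response.choices[0])` could be called in the same way.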

The ***choices[index].message*** object contains the actual response text as well as some additional information concerning how the response was generated:  
- role: the role providing the response (normally "assistant") 
- content: the response text 
- tool_calls: describes any tool calls (see below) the model has requested in order to produce the response

Note that the content of "messages" in the call to chat.completions.create is NOT included in the response, so if you want to keep track of a multi-turn conversation you will need to implement some way of storing the conversation history. 

In [None]:
# inspecting the "message" object for a "choices" list element
pprint(dict(response.choices[0].message), depth=1, width=80)
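Since the request messages are not echoed back in the response, the calling code has to keep the conversation history itself. A minimal sketch of one way to do this is shown below; `chat_turn` and `fake_model` are hypothetical names, and `fake_model` is a stand-in for the real API call so that the example runs without a key.

```python
def chat_turn(history, user_text, ask_model):
    """Append the user turn, get a reply based on the full history, and store the reply."""
    history.append({"role": "user", "content": user_text})
    reply = ask_model(history)
    history.append({"role": "assistant", "content": reply})
    return reply

def fake_model(messages):
    """Stand-in for the API call; it only reports how many messages it received."""
    return f"(reply based on {len(messages)} messages)"

# With the real client, ask_model could instead be something like:
# lambda msgs: client.chat.completions.create(
#     model="gpt-3.5-turbo", messages=msgs).choices[0].message.content

history = [{"role": "system", "content": "You are a helpful tourist information agent in Bergen, Norway."}]
chat_turn(history, "Where is the aquarium?", fake_model)
chat_turn(history, "And how do I get there by bus?", fake_model)
for msg in history:
    print(msg["role"], ":", msg["content"])
```

Because the whole list is sent on every turn, the prompt grows with the conversation, so long dialogues eventually need truncation or summarisation of older turns.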

## 3. Tools and Tool Calls - additional super-powers for the ChatCompletions API 
Using the ChatCompletions API, it is possible to give the LLM access to additional "tools" to produce responses - for example, it can be supplied with updated information from the web. 

The API cannot call external systems on its own, but it can be provided with a formal description of available functions - tools - that it can use. When processing a request, the LLM then decides whether to use a tool. If it wants to use a tool, it returns a function call in the specified format. Actually calling the function in question has to be done in the application calling the API (with the appropriate caution and security checks).

The typical steps for using function calling in an application can be summarized as follows:
1. Call the model with the user query and a set of functions described in the *tools* parameter.
2. Based on the query and the function description(s), the model chooses whether to call one or more functions. If it does, the response message contains a *tool_calls* list, where each entry holds a function name and a JSON string with arguments matching the function description.
3. Parse the returned argument string into JSON in your application code, and call your function with the provided arguments.
4. Append the function response to the conversation as a new message (role="tool") and call the model again, letting it summarize the results back to the user.

A simple (and somewhat clumsy) practical example of this approach in action can be found below.

In [None]:
# a dummy function to call

# A hard coded "weather app" function - could be replaced with external API call or similar
def get_current_weather(location):
    """Get the current weather in a given location"""
    if "bergen" in location.lower():
        return json.dumps({"location": "Bergen", "temperature": "-10", "unit": "celsius", "weather": "heavy snow"})
    elif "san francisco" in location.lower():
        return json.dumps({"location": "San Francisco", "temperature": "85", "unit": "fahrenheit", "weather": "sunny"})
    elif "paris" in location.lower():
        return json.dumps({"location": "Paris", "temperature": "22", "unit": "celsius", "weather": "cloudy"})
    else:
        return json.dumps({"location": location, "temperature": "unknown"})

# Also, we need a json description of the "weather app" function: name, free-text description and parameters
tools = [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        }
                    },
                    "required": ["location"],
                },
            },
        }
    ]

In [None]:
# run_conversation: helper function to answer an arbitrary user query, potentially using a tool

def run_conversation(query):
    # Step 1: initiate the conversation - send the query and available functions to the model
    messages = [{"role": "user", "content": query}]
    
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        messages=messages,
        temperature=1.5,
        tools=tools, # function description 
        tool_choice="auto",  # auto is the default when tools are supplied; can be overridden
    )

    # Step 2: store the initial response message, and check if the model wanted to call a function
    response_message = response.choices[0].message
    messages.append(response_message)
    tool_calls = response_message.tool_calls

    if tool_calls:
        # Step 3 (optional): call the function(s)
        # There is only one function in this example, but you can have multiple
        available_functions = {
            "get_current_weather": get_current_weather,
        } 
        
        for tool_call in tool_calls:
            # call chosen function, with arguments as specified by the model
            function_name = tool_call.function.name
            function_to_call = available_functions[function_name]
            function_args = json.loads(tool_call.function.arguments) 
            function_response = function_to_call(
                location=function_args.get("location")
            ) 
            
            # extend conversation with function response
            messages.append(
                {
                    "tool_call_id": tool_call.id,
                    "role": "tool",
                    "name": function_name,
                    "content": function_response,
                }
            ) 
            
        # Step 4 (optional): send data about each function call and response back to the model;
        # get a new response from the model where it can see the function response
        second_response = client.chat.completions.create(
            model="gpt-3.5-turbo-1106",
            messages=messages,
        )  
        messages.append(second_response.choices[0].message)  
    return messages


In [None]:
# function calling in action

# Using the output from run_conversation, we can inspect messages generated for different queries.
# examples: weather in Bergen, The Smiths members, Python lists...
# Generally the agent is quite "trigger happy" when deciding whether to use tools.

test_messages=run_conversation("Who are the members of the Smiths?")

for msg in test_messages: 
    pprint(dict(msg))
    print('\n')
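The dispatch step in run_conversation trusts both the function name and the arguments returned by the model. A more defensive variant could look like the sketch below, which reports problems back as JSON instead of raising. `safe_dispatch` and `demo_weather` are hypothetical names, and a stand-in object replaces a real tool call so that the example runs without a key.

```python
import json
from types import SimpleNamespace

def safe_dispatch(tool_call, available_functions):
    """Run one tool call defensively; always returns a JSON string for the "tool" message."""
    name = tool_call.function.name
    func = available_functions.get(name)
    if func is None:  # the model asked for a function we never offered
        return json.dumps({"error": f"unknown function: {name}"})
    try:
        args = json.loads(tool_call.function.arguments)
    except json.JSONDecodeError:
        return json.dumps({"error": "arguments were not valid JSON"})
    if not isinstance(args, dict):
        return json.dumps({"error": "arguments must be a JSON object"})
    try:
        return func(**args)
    except TypeError as exc:  # e.g. missing or unexpected argument names
        return json.dumps({"error": f"bad arguments: {exc}"})

def demo_weather(location):
    """Stand-in tool returning a fixed answer as a JSON string."""
    return json.dumps({"location": location, "weather": "rain"})

# stand-in for an element of response.choices[0].message.tool_calls
fake_call = SimpleNamespace(
    function=SimpleNamespace(name="demo_weather", arguments='{"location": "Bergen"}')
)
print(safe_dispatch(fake_call, {"demo_weather": demo_weather}))
```

Returning the error as the tool message content gives the model a chance to correct itself in the follow-up call, rather than stopping the whole conversation with an exception.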

## 4. The Embeddings endpoint

<center><img src="resources/image resolution.jpg" alt="image resolution" width="800" /></center>

OpenAI provides a separate endpoint for converting text input to so-called embeddings, which are (long) vectors of floats. Embeddings can be understood as a dimensionality reduction technique for text data and are typically used for comparing text elements for similarity in a quick and computationally cheap way. 

In the context of LLM applications, embeddings are often used as a tool to build relevant and "dense" context information for LLM prompts. Using embeddings and various numerical similarity measures, text parts from large volumes of background information can be ranked according to relevance for a given user query. 

When using the embeddings endpoint, the maximum number of input tokens allowed and the dimensionality of the returned vectors both depend on the selected embedding model. Using the standard choice of embedding model - *text-embedding-3-small* - the input text chunks can be up to 8191 tokens long and the returned vectors are 1536-dimensional by default, i.e. they contain 1536 numbers.  
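The idea of ranking text parts by relevance can be sketched with plain numpy before involving the API. The three-dimensional vectors below are made up and merely stand in for real embeddings, and the function names are assumptions; cosine similarity is one common choice of similarity measure.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_by_similarity(query_vec, doc_vecs):
    """Return (index, score) pairs for the documents, most similar first."""
    scores = [cosine_sim(query_vec, d) for d in doc_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]

# made-up "embeddings" for three text chunks and one query
toy_docs = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
toy_query = [1.0, 0.1, 0.0]

for idx, score in rank_by_similarity(toy_query, toy_docs):
    print(f"chunk {idx}: similarity {score:.3f}")
```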

### 4.1 A simple embeddings example
Below we use a small set of text samples to illustrate some features of embeddings and the OpenAI embeddings endpoint.

#### 4.1.1 Loading and inspecting text data

In [None]:
#read and print prepared data set with text samples

# helper function to count tokens
def token_count(text, encoding_model):
    encoding=tiktoken.get_encoding(encoding_model)
    n_tokens = len(encoding.encode(text))
    return n_tokens

# read text samples from csv, and add a token count column 
df_text = pd.read_csv('data/text_samples_mat.csv', header=0, sep=';')
df_text['n_tokens']=df_text.apply(lambda row: token_count(row['quote_text'], "cl100k_base"), axis=1)
df_text

In [None]:
# inspecting a single quote
print(df_text.loc[2,'quote_text'])

#### 4.1.2 Creating embeddings using the API
Using *embeddings.create* to produce embeddings is rather straightforward - here we use a single string, but the API also accepts arrays of strings. 

In [None]:
# create and inspect embedding

# embedding helper function - returns embedding using selected model
def embed_helper(AIclient, text, model_name):
    # use the client passed in as an argument rather than the global "client"
    embedding=AIclient.embeddings.create(
          model=model_name,
          input=text,
          encoding_format="float")
    return embedding

In [None]:
# produce embedding using the API
test_embedding=embed_helper(client, df_text.loc[2,'quote_text'], "text-embedding-3-small")

# checking dimensionality and inspecting the "raw" embedding output:
print('No of elements: '+str(len(test_embedding.data[0].embedding))) 
print(test_embedding.data[0].embedding)

In [None]:
# simple visualisation of embedding vectors

# visualisation helper function (despite the name, this draws a one-row heatmap)
def barplot_embedding(embedding, label_list):
    sns.heatmap(np.array(embedding.data[0].embedding).reshape(1,-1), cmap="Greys", center=0, square=False, xticklabels=False, cbar=False)
    plt.gcf().set_size_inches(13,1)
    plt.yticks([0.5], labels=[label_list])
    plt.show()

# barplot for embedding selected quotes
test_embedding=embed_helper(client, "", "text-embedding-3-small")
barplot_embedding(test_embedding, "empty string")

test_embedding=embed_helper(client, df_text.loc[0,'quote_text'], "text-embedding-3-small")
barplot_embedding(test_embedding, df_text.loc[0,'title'])

test_embedding=embed_helper(client, df_text.loc[1,'quote_text'], "text-embedding-3-small")
barplot_embedding(test_embedding, df_text.loc[1,'title'])


#### 4.1.3 Ranking text elements by similarity 
Using a vector similarity measure, we can rank the quotes in our "library" by similarity to a given input statement. 

Note that mixing languages in these comparisons works technically, but can give strange or less precise results.

In [None]:
# Helper functions for text comparison

# naive helper function to compute cosine similarity between two strings
def similarity_helper(AIclient, text_1, text_2, embed_model):
    text_1_embedded_np=np.array(embed_helper(AIclient, text_1, embed_model).data[0].embedding).reshape(1,-1)
    text_2_embedded_np=np.array(embed_helper(AIclient, text_2, embed_model).data[0].embedding).reshape(1,-1)
    similarity=cosine_similarity(text_1_embedded_np, text_2_embedded_np)[0,0]
    return similarity

# helper function to compute similarity of data frame column to an input string
def df_add_similarity(AIclient, df_text, df_text_column, input_text, embed_model):   
    df_text['input_similarity']=df_text.apply(lambda row: similarity_helper(AIclient, row[df_text_column], input_text, embed_model), axis=1)
    return df_text

In [None]:
# ranking text elements in a DataFrame according to similarity with a given input string

# define input string here (reference for comparisons):
# input_text="Gourmetpølser har blitt vår nye nasjonalrett, jubler Pølsens Venner"
input_text=df_text.loc[1, 'quote_text']

# include reference text in DataFrame 
df_input=pd.DataFrame(data={'quote_id':100, 'quote_name':'Input', 'author': 'user', 'title':'Input' , 'quote_text': input_text}, index=[10])
df_total=pd.concat([df_text, df_input])

# calculate similarity
df_add_similarity(client, df_total, 'quote_text', input_text, "text-embedding-3-small")

In [None]:
# creating a simple similarity illustration
values=list(df_total.sort_values('input_similarity', ascending=True)['input_similarity'])
names=list(df_total.sort_values('input_similarity', ascending=True)["quote_text"].str[:50])
color_labels=list(df_total.sort_values('input_similarity', ascending=True)['author'])

cmap='tab20'
color_map = plt.colormaps[cmap].resampled(20)
color_label_set=list(set(color_labels))
colors=list()
for c in color_labels:
    colors.append(color_map(color_label_set.index(c)/len(color_label_set)))
                  
fig, ax = plt.subplots(figsize=(10,5))
ax.barh(names, values, color=colors)
plt.show()

To get a better feel for the possibilities and limitations of embeddings, feel free to repeat and expand on the example with your own quote data set.

## 5. Moderations

The Moderations endpoint scores input text against a range of different types of unwanted or inappropriate content, and can be used for basic moderation of chat dialogues. In addition to score values, the API also flags input scoring above a threshold value. 

In [None]:
moderation = client.moderations.create(input="Jeg hater alle 50-åringer fra Kolbotn og vil flå dem levende")
pprint(dict(moderation.results[0].category_scores))
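If the built-in flagging is too strict or too lenient for a given application, the category scores can be filtered with a custom threshold. The sketch below works on a plain dictionary of scores; the helper name, the threshold and the toy scores are assumptions, and with a real response the dictionary would be `dict(moderation.results[0].category_scores)`.

```python
def categories_above(scores, threshold):
    """Return the categories scoring at or above the threshold, highest score first."""
    hits = {name: score for name, score in scores.items()
            if score is not None and score >= threshold}
    return dict(sorted(hits.items(), key=lambda item: item[1], reverse=True))

# made-up scores standing in for a real moderation result
toy_scores = {"harassment": 0.91, "hate": 0.45, "violence": 0.78, "self-harm": 0.001}
print(categories_above(toy_scores, threshold=0.5))
```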