In [1]:
import os
from dotenv import load_dotenv

In [2]:
load_dotenv()

True

# 1. OpenAI API and its Cost

In [3]:
import openai

Let's look at how the OpenAI API works:

In [4]:
question = "What is the closest star to Earth?"

messages = [
    {
        "role": "user",
        "content": question
    }
]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=messages,
    temperature=0
)

In [5]:
response.keys()

dict_keys(['id', 'object', 'created', 'model', 'choices', 'usage'])

In [6]:
response.choices[0].keys()

dict_keys(['index', 'message', 'finish_reason'])

In [7]:
print(response.choices[0].message)

{
  "role": "assistant",
  "content": "The closest star to Earth is the Sun."
}


This is the answer "message". Note that the AI is called "assistant".

Another important piece of information is the number of tokens used, as that determines the cost of the API call:

In [8]:
print(response.usage)

{
  "prompt_tokens": 15,
  "completion_tokens": 9,
  "total_tokens": 24
}


Looking up the prices on OpenAI's website, we find:
- 0.0015 USD per 1000 prompt tokens
- 0.0020 USD per 1000 completion tokens

So, the total cost is:

In [9]:
prompt_cost = 0.0015
completion_cost = 0.0020

cost = (response.usage.prompt_tokens * prompt_cost + response.usage.completion_tokens * completion_cost)/1000

In [10]:
print(f"{cost:.6f} USD")

0.000041 USD
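
Written out with the token counts returned above (15 prompt tokens, 9 completion tokens), the calculation is:

$$
\text{cost} = \frac{15 \times 0.0015 + 9 \times 0.0020}{1000} = \frac{0.0225 + 0.0180}{1000} = 0.0000405 \ \text{USD}
$$

which shows up as 0.000041 USD once rounded to six decimal places.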


Let's wrap this together in a class:

In [11]:
class BaseAgent:
    
    def __init__(self, model="gpt-3.5-turbo", prices=[0.0015, 0.0020]):
        self.model = model
        self.prompt_cost = prices[0]
        self.completion_cost = prices[1]
        self.cost_list = []
        self.total_cost = 0
        
    def reply(self, question):
        
        messages = [
            {
                "role": "user",
                "content": question
            }
        ]
        
        response = openai.ChatCompletion.create(
            model=self.model,
            messages=messages,
            temperature=0
        )
        
        self.calculate_cost(response.usage)
        
        message = response.choices[0].message
        
        print(message.content)

    def calculate_cost(self, usage):
        cost = (usage.prompt_tokens * self.prompt_cost + usage.completion_tokens * self.completion_cost)/1000
        self.cost_list.append(cost)
        self.total_cost += cost

Test it:

In [12]:
agent = BaseAgent()

In [13]:
agent.reply("What is the closest star to Earth?")

The closest star to Earth is the Sun.


In [14]:
print(f"{agent.total_cost:.6f} USD")

0.000041 USD


Let's ask it another question:

In [15]:
agent.reply("What does 'USD' stand for in the forex context?")

In the forex context, 'USD' stands for United States Dollar. It is the official currency of the United States and is widely used as a reserve currency around the world.


In [16]:
print(f"{agent.total_cost:.6f} USD")

0.000139 USD


We can see that `total_cost` is accumulating the costs of all questions asked.

In [17]:
sum(agent.cost_list)

0.00013900000000000002

# 2. It's Hard to have a Conversation with Someone that doesn't Listen

In [18]:
agent = BaseAgent()

In [19]:
agent.reply("Hi, my name is Andrea.")

Hello Andrea! How can I assist you today?


In [20]:
agent.reply("What is my name?")

I'm sorry, but I don't have access to personal information about individuals unless it has been shared with me in the course of our conversation.


Problem: each call to the LLM is independent of the previous one, so the AI ignores the history of the conversation. To fix this, we need to give it a "memory":

In [21]:
class Memory:
    
    def __init__(self):
        self.messages = []
        
    def add(self, message):
        self.messages.append(dict(message))

    def __str__(self):
        return "\n".join([str(message) for message in self.messages])

Let's modify the `BaseAgent` class to make use of the memory:

In [22]:
class BaseAgent:
    
    def __init__(self, memory, model="gpt-3.5-turbo", prices=[0.0015, 0.0020]):
        self.memory = memory
        self.model = model
        self.prompt_cost = prices[0]
        self.completion_cost = prices[1]
        self.cost_list = []
        self.total_cost = 0
        
    def reply(self, question):
        
        human_message = {
            "role": "user",
            "content": question
        }
        self.memory.add(human_message)
        
        response = openai.ChatCompletion.create(
            model=self.model,
            messages=self.memory.messages,
            temperature=0
        )
        
        self.calculate_cost(response.usage)
        
        agent_message = response.choices[0].message
        self.memory.add(agent_message)
        
        print(agent_message.content)

    def calculate_cost(self, usage):
        cost = (usage.prompt_tokens * self.prompt_cost + usage.completion_tokens * self.completion_cost)/1000
        self.cost_list.append(cost)
        self.total_cost += cost

Test it:

In [23]:
memory = Memory()
agent = BaseAgent(memory=memory)

In [24]:
agent.reply("Hi, my name is Andrea.")

Hello Andrea! How can I assist you today?


In [25]:
print(agent.memory)

{'role': 'user', 'content': 'Hi, my name is Andrea.'}
{'role': 'assistant', 'content': 'Hello Andrea! How can I assist you today?'}


In [26]:
agent.reply("What is my name?")

Your name is Andrea.


In [27]:
print(agent.memory)

{'role': 'user', 'content': 'Hi, my name is Andrea.'}
{'role': 'assistant', 'content': 'Hello Andrea! How can I assist you today?'}
{'role': 'user', 'content': 'What is my name?'}
{'role': 'assistant', 'content': 'Your name is Andrea.'}


With each exchange in the conversation, the whole history has to be sent as input to the LLM. Problems:

1. LLMs have a finite input size
2. providers (such as OpenAI) charge based on the number of tokens
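
To get a feel for the second point, here is a rough sketch of how the billed prompt tokens add up when the whole history is resent on every turn. The message sizes are made-up numbers and the function name `cumulative_prompt_tokens` is only illustrative; the sketch also assumes that messages strictly alternate between user and assistant.

```python
def cumulative_prompt_tokens(message_sizes):
    """Total prompt tokens sent when the full history is resent on every turn.

    message_sizes lists token counts in the order user, assistant, user, ...
    """
    total = 0
    history = 0
    for i, size in enumerate(message_sizes):
        history += size
        if i % 2 == 0:
            # A user message triggers an API call carrying the whole history so far
            total += history
    return total


sizes = [20] * 10  # 5 exchanges of 20-token messages (made-up numbers)
print(cumulative_prompt_tokens(sizes))
```

With five exchanges of 20-token messages, the prompts add up to 500 tokens, against 100 if each question were sent on its own.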

To address both issues, we must delete the oldest messages as the memory grows. We define a `delete_history` method in the `Memory` class: it counts the tokens in the conversation and, while the total exceeds `max_tokens`, deletes the oldest message.

In [28]:
import tiktoken

In [29]:
class Memory:
    
    def __init__(self, max_tokens=3000):
        self.messages = []
        self.max_tokens = max_tokens
        self.encoding = tiktoken.get_encoding("cl100k_base")
        
    def add(self, message):
        self.messages.append(dict(message))

    def delete_history(self):
        while True:
            total_tokens = 0
            for message in self.messages:
                message_tokens = len(self.encoding.encode(message["content"]))
                total_tokens += message_tokens
            if total_tokens > self.max_tokens:
                self.messages = self.messages[1:]
            else:
                return

    def __str__(self):
        return "\n".join([str(message) for message in self.messages])

We then modify `BaseAgent` so as to delete the history every time the API is called. We do so by moving the logic to a new method `generate_response`, so that `reply` is cleaner:

In [30]:
class BaseAgent:
    
    def __init__(self, memory, model="gpt-3.5-turbo", prices=[0.0015, 0.0020]):
        self.memory = memory
        self.model = model
        self.prompt_cost = prices[0]
        self.completion_cost = prices[1]
        self.cost_list = []
        self.total_cost = 0
        
    def reply(self, question):

        human_message = {
            "role": "user",
            "content": question
        }
        self.memory.add(human_message)
        
        agent_message = self.generate_response()
        self.memory.add(agent_message)

        print(agent_message.content)

    def generate_response(self):
        
        self.memory.delete_history()
        
        response = openai.ChatCompletion.create(
            model=self.model,
            messages=self.memory.messages,
            temperature=0
        )

        self.calculate_cost(response.usage)

        agent_message = response.choices[0].message
        
        return agent_message
    
    def calculate_cost(self, usage):
        cost = (usage.prompt_tokens * self.prompt_cost + usage.completion_tokens * self.completion_cost)/1000
        self.cost_list.append(cost)
        self.total_cost += cost

Test it:

In [31]:
memory = Memory(max_tokens=30)
agent = BaseAgent(memory=memory)

In [32]:
agent.reply("My name is Andrea.")

Nice to meet you, Andrea! How can I assist you today?


In [33]:
print(agent.memory)

{'role': 'user', 'content': 'My name is Andrea.'}
{'role': 'assistant', 'content': 'Nice to meet you, Andrea! How can I assist you today?'}


In [34]:
agent.reply("What is 1+1?")

The sum of 1+1 is 2.


In [35]:
print(agent.memory)

{'role': 'user', 'content': 'My name is Andrea.'}
{'role': 'assistant', 'content': 'Nice to meet you, Andrea! How can I assist you today?'}
{'role': 'user', 'content': 'What is 1+1?'}
{'role': 'assistant', 'content': 'The sum of 1+1 is 2.'}


In [36]:
agent.reply("What is the most common word in English?")

The most common word in English is "the".


In [37]:
print(agent.memory)

{'role': 'user', 'content': 'What is 1+1?'}
{'role': 'assistant', 'content': 'The sum of 1+1 is 2.'}
{'role': 'user', 'content': 'What is the most common word in English?'}
{'role': 'assistant', 'content': 'The most common word in English is "the".'}


In [38]:
agent.reply("What is my name?")

I'm sorry, but I don't have access to personal information about individuals unless it has been shared with me in the course of our conversation.


By playing with `max_tokens`, we can control the trade-off between a larger context (and hence more reliable answers) and lower costs.
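
The effect of the budget can be explored offline with a small sketch. It is not the `Memory` class itself: `count_tokens` is a crude stand-in for tiktoken (one "token" per whitespace-separated word) and `trim_history` is a hypothetical helper, so the numbers will differ from real token counts.

```python
def count_tokens(text):
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def trim_history(messages, max_tokens):
    """Drop the oldest messages until the total fits within max_tokens."""
    messages = list(messages)  # work on a copy
    while messages and sum(count_tokens(m["content"]) for m in messages) > max_tokens:
        messages.pop(0)
    return messages


history = [
    {"role": "user", "content": "My name is Andrea."},
    {"role": "assistant", "content": "Nice to meet you, Andrea! How can I assist you today?"},
    {"role": "user", "content": "What is 1+1?"},
    {"role": "assistant", "content": "The sum of 1+1 is 2."},
]

for budget in (10, 20, 30):
    kept = trim_history(history, budget)
    print(budget, len(kept))
```

With budgets of 10, 20 and 30 word-level "tokens", the sketch keeps 2, 3 and 4 messages respectively: a smaller budget means cheaper calls but a shorter memory.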

# 3. Initial Prompt and Prompt Engineering

Until now we have not specified any "initial instruction" to the AI, so the only input it takes is the user's question. But what if we want the agent to act in a specific way, e.g., to talk only in one language, or to only focus on a specific subject?

This can be done by passing it an initial prompt, which in OpenAI is nothing but a message with the "system" role. Let's therefore add this to the `Memory` class, and modify `delete_history` so that it never deletes this first message.

In [39]:
class Memory:
    
    def __init__(self, initial_prompt=None, max_tokens=3000):
        if initial_prompt is None:
            initial_prompt = ""
        self.messages = [
            {
                "role": "system",
                "content": initial_prompt
            }
        ]
        self.max_tokens = max_tokens
        self.encoding = tiktoken.get_encoding("cl100k_base")
        
    def add(self, message):
        self.messages.append(dict(message))

    def delete_history(self):
        while True:
            total_tokens = 0
            for message in self.messages:
                message_tokens = len(self.encoding.encode(message["content"]))
                total_tokens += message_tokens
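            # Drop the oldest non-system message; stop once only the system prompt is left,
            # otherwise an oversized prompt would make this loop run forever
            if total_tokens > self.max_tokens and len(self.messages) > 1:
                self.messages[1:] = self.messages[2:]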
            else:
                return

    def __str__(self):
        return "\n".join([str(message) for message in self.messages])

Test it:

In [40]:
initial_prompt = """
You are a finance expert. Answer the user's financial question providing technical details if needed.
If a question is not about finance, politely decline to answer, as that is beyond your scope and expertise.
"""

memory = Memory(initial_prompt=initial_prompt)
agent = BaseAgent(memory=memory)

In [41]:
agent.reply("What are some of the main drivers of FX volatility?")

Foreign exchange (FX) volatility refers to the fluctuations in currency exchange rates. Several factors can drive FX volatility, including:

1. Macroeconomic Factors: Economic indicators such as GDP growth, inflation rates, interest rates, and employment data can significantly impact currency values. Changes in these factors can lead to shifts in investor sentiment and subsequent currency volatility.

2. Central Bank Policies: Monetary policy decisions by central banks, such as interest rate changes or quantitative easing measures, can have a significant impact on currency values. Central bank actions influence investor expectations and can lead to increased volatility in the FX market.

3. Political and Geopolitical Events: Political instability, elections, trade disputes, and geopolitical tensions can create uncertainty in the markets and lead to increased FX volatility. Changes in government policies or unexpected events can cause currency values to fluctuate.

4. Market Sentiment a

In [42]:
agent.reply("How many stars are there in the Milky Way?")

I'm sorry, but I am a finance expert and do not have the expertise to answer questions about astronomy or astrophysics. It would be best to consult an astronomer or refer to scientific sources for accurate information on the number of stars in the Milky Way.


This only changes the style of the AI, but you can also use the prompt to fundamentally alter the nature of the agent. For example, you can turn it into a classifier, here for the sentiment of a review:

In [43]:
initial_prompt = """
The user will provide a restaurant review.
If it's positive, output 1; if it's negative, output -1; in all other cases, output 0.
"""

memory = Memory(initial_prompt=initial_prompt)
agent = BaseAgent(memory=memory)

In [44]:
user_review = """
Our visit was perfect! The place is outstanding and comfy. People are gentle and well educated.
Food is amazing!! They serve a 5-course menu and everything is delicious! Their wine menu is one of the best!!!
We had a very good time and took home sweet memories! For sure on our next visit to the city we’ll go back there!
"""

In [45]:
agent.reply(user_review)

1


In some cases, a little prompt engineering can turn a bad answer into a good one:

In [46]:
initial_prompt = """
The user will provide a restaurant review. From it, determine if the restaurant should be recommended to a vegan person.
If yes, output 1; if no, output -1; in all other cases, output 0.
"""

memory = Memory(initial_prompt=initial_prompt)
agent = BaseAgent(memory=memory)

user_review = """
The place is so great: beautiful music and ambiance, and the food was delicious!
The best is the angus stake, but also the bratwurst and salad was something from heaven.
While there is no vegetarian option, the mushroom in my hamburger were spectacular.
"""
agent.reply(user_review)

1


In [47]:
initial_prompt = """
The user will provide a restaurant review. From it, determine if the restaurant should be recommended to a vegan person.
Provide a reasoning for your answer. Output your reasoning. Then in the new line output 1 if yes, -1 if no, and 0 in all other cases.
"""

memory = Memory(initial_prompt=initial_prompt)
agent = BaseAgent(memory=memory)

agent.reply(user_review)

Based on the review, it seems that the restaurant does not have a specific vegetarian option, but they do have a mushroom option in the hamburger. However, the review also mentions that the best dishes are the angus steak and bratwurst, which are both meat-based. 

Considering this information, it is not recommended to a vegan person as the restaurant does not have a dedicated vegetarian menu and the focus seems to be on meat-based dishes. 

Output: -1
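
Since the reply now mixes reasoning and label, the label has to be extracted before it can be used downstream. A minimal sketch, assuming the label is the first integer found on the last non-empty line of the reply (the helper `parse_label` is hypothetical and not part of the notebook):

```python
import re


def parse_label(reply):
    """Return -1, 0 or 1 from the last non-empty line of the reply, or None."""
    lines = [line.strip() for line in reply.strip().splitlines() if line.strip()]
    if not lines:
        return None
    match = re.search(r"-?\d+", lines[-1])
    if match is None:
        return None
    value = int(match.group())
    # Reject anything outside the three labels the prompt asks for
    return value if value in (-1, 0, 1) else None


reply = """Considering this information, it is not recommended to a vegan person.

Output: -1
"""
print(parse_label(reply))
```

Returning `None` for anything unexpected lets the caller decide whether to retry or to fall back to a default.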


The final use case we look at is a conversation based on a specific context. For example, you could pass a user-provided text in the prompt, which the user can then query via the agent:

In [48]:
webpage_content = """
Ebury Logo
About us
Corporates
Institutions
Partner with us
Careers
Login
Powering growth beyond borders.
From payments, collections, risk management, financing and more – we help businesses maximise their global growth potential.

OUR STORY

We make international trade more accessible, simple and personal.
We believe in a world where any business, big or small, should be able to transact globally with the same ease
and reliability they experience locally. We bring together our in-house platform,
in-depth expertise and custom solutions to help businesses go borderless and achieve their ambitions faster.


£ 27 B transacted in FY2022

1 M+ payments processed in the last 12 months

50,000 + clients served worldwide

1,600 + employees in 21 countries

Meet our team Leadership:

Peter Holmes, SVP of Client Onboarding
Richard Hughes, SVP of Credit Risk
Toby Young, Group Technology Director
Venancio Gallego, Strategic Advisor
Zafeer Ahmed, Global Head of Dealing
Ana Muñoz Fenollosa, Group Financial Director
Duane Swailes, SVP of Sales Acceleration & Marketing
Enrique Colin, SVP of Product and Data
Enrique Diaz-Alvarez, Chief Risk Officer
Fernando Pierri, Chief Commercial Officer
Juan Lobato, Founder & CEO

JOIN OUR TEAM
Join us as we build the international trade platform of the future and transform how businesses transact globally.

Company
Our story
Press room
Our global presence
Careers
Resource Hub
Blog
Podcast
Ebury Labs
Help Centre
Corporate solutions
E-commerce
NGO's and charities
Mass Payments
Corporate products
Payments and collections
Digital platforms
Business lending
FX risk management
Institutions
Ebury Institutional Solutions
Partner with us
White Label Solution
Branded affiliates
Affiliates

Get in touch with us
We’re happy to help! Contact us to learn more.

Subscribe to our blog   
Expert insights to grow your business globally.

FX Talk an Ebury podcast   
Get a breakdown of the global markets from our experts.

Join our team   
Explore open roles across 32+ offices worldwide.

Legal Privacy Notice Cookie Notice Manage cookies
United Kingdom - English

Ebury Partners UK Ltd is authorised and regulated by the Financial Conduct Authority as an Electronic Money Institution. Reference number: 900797. Ebury Partners UK Ltd is registered with the Information Commissioners Office, with registration number: ZA345828. Ebury Partners Markets Ltd is authorised and regulated by the Financial Conduct Authority as an Investment Firm to provide advice and execute trades in MiFID derivative products. Reference number: 784063. EBURY and EBURY What Borders? are trademarks.

Ebury Partners UK Ltd © 2023
"""

In [49]:
initial_prompt = f"""
Answer the user's questions based solely on the following context, which comes from Ebury's website.

CONTEXT:
'''
{webpage_content}
'''
"""

memory = Memory(initial_prompt=initial_prompt)
agent = BaseAgent(memory=memory)

In [50]:
agent.reply("What is the website about?")

The website is about Ebury, a company that provides various services to help businesses maximize their global growth potential. These services include payments, collections, risk management, financing, and more. Ebury aims to make international trade more accessible, simple, and personal for businesses of all sizes.


In [51]:
agent.reply("Who is Ebury's responsible for Client Onboarding?")

Peter Holmes is the Senior Vice President (SVP) of Client Onboarding at Ebury.


# 4. Embedding Vectors and Vector Databases

Consider a very domain-specific question, such as "My company produces fertilisers. Can I export to Russia in 2023?". Likely, the LLM does not know that:

In [52]:
memory = Memory()
agent = BaseAgent(memory=memory)

agent.reply("My company produces fertilisers. Can I export to Russia in 2023?")

As an AI language model, I don't have access to real-time data or the ability to predict future events. Therefore, I cannot provide you with a definitive answer regarding exporting fertilizers to Russia in 2023. 

To determine whether you can export fertilizers to Russia in 2023, you should consider several factors. These may include trade agreements between your country and Russia, any specific regulations or restrictions on fertilizer imports in Russia, and the current political and economic situation between the two countries.

To get accurate and up-to-date information, it is recommended to consult with trade organizations, government agencies, or legal experts specializing in international trade. They can provide you with the most relevant and reliable information regarding exporting fertilizers to Russia in 2023.


How can we use the ideas above to improve the answers that the AI gives?

Suppose we have access to Ebury's internal documentation: the answer is in there, but we cannot paste the whole corpus into the context! We then need to identify the document which contains the answer, and in it, the paragraph that contains the answer. We are then going to provide only that as context.

The way to "find the relevant paragraph in the corpus" is to use **embedding vectors**.

Let's start with the simplest type of embedding vector: a "word embedding".

In [53]:
import gensim.downloader as api

In [54]:
model = api.load("glove-wiki-gigaword-100")

In [55]:
vec = model.get_vector("beer")

len(vec)

100

In [56]:
vec[:5]

array([-0.047204,  0.56615 , -0.29627 , -0.46257 , -0.015033],
      dtype=float32)

In [57]:
model.most_similar(vec)

[('beer', 0.9999999403953552),
 ('drink', 0.8182137608528137),
 ('drinks', 0.7982838153839111),
 ('liquor', 0.7321157455444336),
 ('coffee', 0.7253385782241821),
 ('vodka', 0.7246952056884766),
 ('wine', 0.7244510054588318),
 ('bottle', 0.717643141746521),
 ('beers', 0.6993280649185181),
 ('bottled', 0.6970836520195007)]

Let's try to compose vectors, and see if we find something interesting in their meaning.

In [58]:
vec1 = model.get_vector("beer")
vec2 = model.get_vector("germany")
vec3 = model.get_vector("italy")

new_vec = vec1 - vec2 + vec3

model.most_similar(new_vec)

[('beer', 0.7892547249794006),
 ('wine', 0.7375332713127136),
 ('drink', 0.7100510597229004),
 ('champagne', 0.7068606019020081),
 ('drinks', 0.6879450082778931),
 ('bottle', 0.6631824970245361),
 ('vodka', 0.6605401039123535),
 ('coffee', 0.6547015309333801),
 ('liquor', 0.6361646056175232),
 ('tea', 0.6283336281776428)]

Word embeddings generalize to document embeddings: a sentence (and even a full text) can be transformed into a vector, which captures its semantic content.

In [59]:
from langchain.embeddings import OpenAIEmbeddings

In [60]:
embeddings = OpenAIEmbeddings()

vec = embeddings.embed_query("The dog plays with the ball.")

len(vec)

1536

In [61]:
vec2 = embeddings.embed_query("The hound plays with the ball.")
vec3 = embeddings.embed_query("The cat plays with the ball.")
vec4 = embeddings.embed_query("Spain is hereby established as a social and democratic State.")

In [62]:
import numpy as np

In [63]:
def cosine_similarity(a, b):
    a = np.array(a)
    b = np.array(b)
    return np.dot(a, b)/(np.linalg.norm(a)*np.linalg.norm(b))

In [64]:
cosine_similarity(vec, vec2)

0.9562994653334981

In [65]:
cosine_similarity(vec, vec3)

0.9496210236373911

In [66]:
cosine_similarity(vec, vec4)

0.718338264586324

Now, the idea is simple: given all the documents in the corpus, we split them into text chunks (the "value"), and compute the embedding vector of each chunk (the "key"). We save each key-value pair in a database which, given its structure, is called a **vector database**.

Then, when the user asks a question, we turn the question into an embedding vector, and look for the most similar vectors among the database keys: we then retrieve the corresponding values (the text chunks).
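
The key-value idea can be sketched in a few lines of plain Python. Everything here is a toy under stated assumptions: `toy_embed` replaces a real embedding model with a bag-of-words count vector, `ToyVectorStore` is a hypothetical class, and the three chunks are made-up examples. A real vector database uses dense embeddings and an index instead of a full sort.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in for an embedding model: a bag-of-words count vector."""
    cleaned = text.lower().replace(".", "").replace("?", "")
    return Counter(cleaned.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    def __init__(self, chunks):
        # Each entry pairs the key (embedding) with the value (text chunk)
        self.entries = [(toy_embed(chunk), chunk) for chunk in chunks]

    def similarity_search(self, query, k=1):
        query_vec = toy_embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(query_vec, e[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]


chunks = [
    "EU companies can export fertilisers to Russia.",
    "The dog plays with the ball.",
    "Sanctions are decided by the Council.",
]
store = ToyVectorStore(chunks)
print(store.similarity_search("Can I export fertilisers to Russia?", k=1))
```

The query shares five words with the first chunk and none with the others, so the first chunk is returned.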

Let's start constructing the vector database:

In [67]:
from langchain.text_splitter import TokenTextSplitter
from langchain.document_loaders import UnstructuredPDFLoader

In [68]:
text_splitter = TokenTextSplitter(chunk_size=300, chunk_overlap=0)

In [69]:
files = [
    os.path.join('..', 'data', 'sanctions-russia.pdf')
]

all_docs = []
for file in files:
    loader = UnstructuredPDFLoader(file)
    docs = loader.load_and_split(text_splitter=text_splitter)
    all_docs = all_docs + docs

In [70]:
print(all_docs[5].page_content)

 that the Commission manages a Consolidated List of all designations, which is up to date and available on the EU Sanctions map.

As regards the TARIC codes, the TARIC database is regularly updated in order to include all targeted goods.

4. Can EU nationals be sanctioned?

Last update: 8 April 2022

Sanctions adopted pursuant to Article 215 TFEU are to pursue the objectives of the Common Foreign and Security Policy. In line with these objectives, it is for the Council to decide on the scope of sanctions, including on which persons - irrespective their nationality – are subject to these measures.

6

5. What are the benefits of the sanctions for European citizens?

Last update: 8 April 2022

Since the beginning of Putin’s aggression against Ukraine, many European citizens have shared their concerns about peace in Europe, shown solidarity with Ukrainian refugees and supported the need for Ukraine to receive political, financial and humanitarian assistance. By aiming to undermine the Kre
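
A token-based splitter cuts after a fixed number of tokens, so chunks can start or end mid-sentence, as the chunk above does at its beginning. The role of `chunk_size` and `chunk_overlap` can be illustrated with a simplified sketch that works on a list of words; it is not LangChain's implementation, and `split_into_chunks` is a hypothetical helper.

```python
def split_into_chunks(tokens, chunk_size, chunk_overlap=0):
    """Cut a token list into windows of chunk_size, sharing chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), step)]


words = "a b c d e f g h".split()
print(split_into_chunks(words, chunk_size=3))
print(split_into_chunks(words, chunk_size=3, chunk_overlap=1))
```

With an overlap of 1, each chunk repeats the last word of the previous one, which reduces the chance that a sentence relevant to the query is split across two chunks, at the price of storing and embedding more text.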

In [71]:
from langchain.vectorstores import FAISS

In [72]:
embeddings = OpenAIEmbeddings()

vdb = FAISS.from_documents(all_docs, embeddings)

Test it:

In [73]:
doc_list = vdb.similarity_search("My company produces fertilisers. Can I export to Russia in 2023?", k=5)

In [74]:
doc_list[0].page_content

'. EU companies can export plant protection products, herbicides or fertilisers to the Russia (e.g. CN codes e.g. CN codes: HS38089910 for pesticides and 38089323 herbicides). Regarding the involvement of listed persons or restrictions on transactions with certain Russian companies, see questions 4.\n\nAs regards imports of fertilisers, a specific treatment applies only to potash fertilisers and fertilisers containing potash: restrictions apply to new contracts, but not to existing ones, until 10 July 2022, and from that date onwards imports would be subject to a quota limit corresponding to the volume of annual imports from Russia, in order to avoid circumvention of the bans applicable for imports from Belarus.\n\nEU companies should not be confused by certain caps and restriction on import of certain chemicals from Russia used as compounds for phytosanitary products; those restrictions prohibit the purchase and import of them from Russia into the Union and not the sale and export to 

Assuming that this piece of text contains the answer to the question, we would now like to pass this to the chatbot as context. Of course, we can do this by hand (copy-paste), but then the bot is hardly automatic.

Rather, we must give the LLM the **option** to search the vector database. In the OpenAI API, we can do so via the **function calling** feature.

# 5. Function Call for RAG

How do we integrate the vector database search seen above with our chatbot?

To see this, start from a different problem. Suppose you have a function that performs a Google search, returning the top links found:

In [75]:
from googlesearch import search

In [76]:
def google_search(query):
    return [url for url in search(query)]

In [77]:
search_result = google_search("Ebury")
search_result

['https://ebury.com/',
 'https://ebury.com/',
 'https://www.ebury.es/',
 'https://www.ebury.es/ebury-online/',
 'https://uk.linkedin.com/company/eburyfintech',
 'https://www.ebury.es/que-te-ofrecemos/',
 'https://es.linkedin.com/company/eburyfintech',
 'https://www.instagram.com/ebury_fintech/',
 'https://www.crunchbase.com/organization/ebury',
 'https://www.xataka.com/empresas-y-economia/que-hace-ebury-fintech-malaguena-que-banco-santander-ha-invertido-400-millones-euros',
 'https://www.laopiniondemalaga.es/malaga/2023/05/08/ebury-refuerza-apuesta-malaga-fintech-87011550.html']

How do we allow the LLM to "call" this function? In the OpenAI API, we do so by describing the function with the following structure:

In [78]:
google_search_string = """
{
    "name": "google_search",
    "description": "Perform a Google search for a query, obtaining a list of related urls.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The query you want to search."
            }
        },
        "required": ["query"]
    }
}
"""

In [79]:
import json

google_search_json = json.loads(google_search_string)

We now pass this parsed definition to the `openai.ChatCompletion.create` call (together with the messages):

In [80]:
question = "What is the url of Ebury?"

messages = [
    {
        "role": "user",
        "content": question
    }
]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=messages,
    functions=[google_search_json],
    temperature=0
)

Let's look at the response:

In [81]:
answer = response["choices"][0]
print(answer)

{
  "index": 0,
  "message": {
    "role": "assistant",
    "content": null,
    "function_call": {
      "name": "google_search",
      "arguments": "{\n  \"query\": \"Ebury url\"\n}"
    }
  },
  "finish_reason": "function_call"
}


The LLM did not output a message content, but rather a "function call". Inside it, it tells us the name of the function it wants to call, and the arguments.

As you can see, the LLM cannot **directly** run a function: it can only output text! But we can parse this output, run the function ourselves, and return the result to the LLM as a message with the "function" role:

In [82]:
messages = [
    {
        "role": "user",
        "content": question
    },
    answer.message,
    {
        "role": "function",
        "name": "google_search",
        "content": search_result[0]
    }
]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=messages,
    functions=[google_search_json],
    temperature=0
)

In [83]:
answer = response["choices"][0]
print(answer)

{
  "index": 0,
  "message": {
    "role": "assistant",
    "content": "The URL of Ebury is https://ebury.com/."
  },
  "finish_reason": "stop"
}
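
The parse-and-dispatch step done by hand above can be sketched as a small helper. This is only an illustration: `google_search_stub` is a hypothetical stand-in that performs no real search, and `REGISTRY` and `run_function_call` are names introduced here, not part of the OpenAI API.

```python
import json


def google_search_stub(query):
    """Hypothetical stand-in for a real search function (no network access)."""
    return [f"https://example.com/search?q={query.replace(' ', '+')}"]


# Maps the names advertised to the LLM to the Python callables behind them
REGISTRY = {"google_search": google_search_stub}


def run_function_call(function_call):
    """Execute the call requested by the LLM and wrap the result as a message."""
    name = function_call["name"]
    kwargs = json.loads(function_call["arguments"])  # arguments arrive as a JSON string
    func = REGISTRY.get(name)
    if func is None:
        content = f"Function {name} not found."
    else:
        content = json.dumps(func(**kwargs))
    return {"role": "function", "name": name, "content": content}


call = {"name": "google_search", "arguments": "{\n  \"query\": \"Ebury url\"\n}"}
message = run_function_call(call)
print(message)
```

Serialising the result with `json.dumps` keeps the `content` field a string, which is what a chat message expects.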


Now, we automate all this for the `search_documentation` function, which enables the LLM to search internal documentation -- what we call the **knowledge base**.

First, we define a new class `KnowledgeBase`:

In [84]:
import re

In [85]:
class KnowledgeBase:
    
    def __init__(self, vdb, max_chunks=5):
        self.vdb = vdb
        self.max_chunks = max_chunks
        self.function_name = "search_documentation"
        self.function = json.loads(self.search.__doc__)
        
    def search(self, query):
        """
        {
            "name": "search_documentation",
            "description": "Access information from internal documentation.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The user's query."
                    }
                },
                "required": ["query"]
            }
        }
        """
        retrieved = self.vdb.similarity_search(query, k=self.max_chunks)
        context = {}
        for i, doc in enumerate(retrieved):
            file_path = doc.metadata["source"]
            file_name = os.path.normpath(file_path).split(os.sep)[-1]
            title = f"INFORMATION {i + 1} (from {file_name})"
            content = re.sub(r"\s+", " ", doc.page_content)  # raw string avoids an invalid-escape warning
            context[title] = content
        return str(context)

The docstring of `search`, parsed as JSON into the attribute `function`, is what we pass to OpenAI's API: it tells the LLM that, if it wants to access information from internal documentation, it can do so by calling the function "search_documentation" with the query (a string) as input.
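
The docstring trick works for any function whose docstring is valid JSON. A minimal sketch, with a hypothetical `get_rate` function used purely for illustration:

```python
import json


def get_rate(currency_pair):
    """
    {
        "name": "get_rate",
        "description": "Return the exchange rate for a currency pair.",
        "parameters": {
            "type": "object",
            "properties": {
                "currency_pair": {"type": "string", "description": "For example EURUSD."}
            },
            "required": ["currency_pair"]
        }
    }
    """
    raise NotImplementedError("illustrative stub")


# json.loads ignores the surrounding whitespace and indentation of the docstring
spec = json.loads(get_rate.__doc__)
print(spec["name"], spec["parameters"]["required"])
```

One caveat of this design: running Python with the `-OO` flag strips docstrings, so `__doc__` would be `None` and the parsing would fail.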

In order for this to work, we must modify the `BaseAgent`:

In [86]:
class BaseAgent:
    
    def __init__(self, memory, knowledge_base, model="gpt-3.5-turbo", prices=[0.0015, 0.0020]):
        self.memory = memory
        self.knowledge_base = knowledge_base
        self.model = model
        self.prompt_cost = prices[0]
        self.completion_cost = prices[1]
        self.cost_list = []
        self.total_cost = 0
        
    def reply(self, question):

        human_message = {
            "role": "user",
            "content": question
        }
        self.memory.add(human_message)

        # Loop until the model returns a textual answer (an empty string also counts).
        answer = None
        while answer is None:
            
            agent_message = self.generate_response()
            self.memory.add(agent_message)
            
            if agent_message.content is not None:
                answer = agent_message.content
            
            else:
                function_call = agent_message.function_call
                function_name = function_call.name
                kwargs = json.loads(function_call.arguments)
                
                print(f"[Agent calling function {function_name} with arguments {kwargs}]\n")
                
                if function_name == self.knowledge_base.function_name:
                    function_output = self.knowledge_base.search(**kwargs)
                else:
                    function_output = "WARNING: Function not found!"

                # Always report the result (or the warning) back to the model;
                # otherwise the warning would never reach the conversation.
                function_message = {
                    "role": "function",
                    "name": function_name,
                    "content": function_output
                }
                self.memory.add(function_message)
        
        print(answer)

    def generate_response(self):
        
        self.memory.delete_history()
        
        response = openai.ChatCompletion.create(
            model=self.model,
            messages=self.memory.messages,
            functions=[self.knowledge_base.function],
            temperature=0
        )

        self.calculate_cost(response.usage)

        agent_message = response.choices[0].message
        
        return agent_message
    
    def calculate_cost(self, usage):
        cost = (usage.prompt_tokens * self.prompt_cost + usage.completion_tokens * self.completion_cost)/1000
        self.cost_list.append(cost)
        self.total_cost += cost
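The control flow of `reply` can be tried out without any API call. The following is only a sketch under stated assumptions: `fake_model` is a hypothetical stand-in for the chat endpoint (it asks for a search first and answers once it sees a function result), `search_documentation` is a stub, and messages are plain dictionaries rather than `OpenAIObject`s.

```python
import json


def fake_model(messages):
    """Hypothetical stand-in for the chat API (not a real OpenAI call)."""
    last = messages[-1]
    if last["role"] == "function":
        # A function result is available: produce a textual answer.
        return {"role": "assistant", "content": f"Based on the docs: {last['content']}"}
    # Otherwise request a function call; arguments arrive as a JSON string.
    return {
        "role": "assistant",
        "content": None,
        "function_call": {
            "name": "search_documentation",
            "arguments": json.dumps({"query": last["content"]}),
        },
    }


def search_documentation(query):
    # Stub standing in for KnowledgeBase.search.
    return f"(stub) documentation about '{query}'"


functions = {"search_documentation": search_documentation}


def reply(question):
    messages = [{"role": "user", "content": question}]
    while True:
        message = fake_model(messages)
        messages.append(message)
        if message["content"] is not None:
            return messages  # textual answer: the loop ends
        call = message["function_call"]
        kwargs = json.loads(call["arguments"])
        handler = functions.get(call["name"])
        output = handler(**kwargs) if handler else "WARNING: Function not found!"
        # Feed the function output back so the model can use it in the next turn.
        messages.append({"role": "function", "name": call["name"], "content": output})


history = reply("export rules")
roles = [m["role"] for m in history]
print(roles)  # ['user', 'assistant', 'function', 'assistant']
```

The sequence of roles is the same pattern the agent's memory shows after a question that triggers a documentation search.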

Finally, we need to modify the `Memory` class so that it can store function-call messages, whose `content` is `None`:

In [87]:
class Memory:
    
    def __init__(self, initial_prompt=None, max_tokens=3000):
        if initial_prompt is None:
            initial_prompt = ""
        self.messages = [
            {
                "role": "system",
                "content": initial_prompt
            }
        ]
        self.max_tokens = max_tokens
        self.encoding = tiktoken.get_encoding("cl100k_base")
        
    def add(self, message):
        self.messages.append(dict(message))

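    def delete_history(self):
        # Drop the oldest non-system message until the history fits in max_tokens.
        while True:
            total_tokens = 0
            for message in self.messages:
                # Function-call messages have no content and are not counted.
                if message["content"] is not None:
                    message_tokens = len(self.encoding.encode(message["content"]))
                    total_tokens += message_tokens
            # Stop when within budget, or when only the system prompt is left.
            if total_tokens > self.max_tokens and len(self.messages) > 1:
                self.messages[1:] = self.messages[2:]
            else:
                return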

    def __str__(self):
        return "\n".join([str(message) for message in self.messages])
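The trimming strategy of `delete_history` can be illustrated in isolation. This is a rough sketch, not the class above: `count_tokens` is a hypothetical whitespace-based stand-in for the `tiktoken` encoding, and `trim` works on a copy of the list instead of mutating it.

```python
def count_tokens(text):
    # Assumption: whitespace splitting as a crude stand-in for a real tokenizer.
    return len(text.split())


def trim(messages, max_tokens):
    """Drop the oldest non-system messages until the total fits in max_tokens."""
    messages = list(messages)  # work on a copy
    while len(messages) > 1:   # never remove the system prompt at index 0
        total = sum(
            count_tokens(m["content"])
            for m in messages
            if m["content"] is not None  # function-call messages have no content
        )
        if total <= max_tokens:
            break
        del messages[1]
    return messages


msgs = [
    {"role": "system", "content": "Be brief."},          # 2 "tokens"
    {"role": "user", "content": "one two three four"},   # 4 "tokens"
    {"role": "assistant", "content": None},              # not counted
    {"role": "user", "content": "five six"},             # 2 "tokens"
]

trimmed = trim(msgs, max_tokens=5)
print([m["role"] for m in trimmed])  # ['system', 'assistant', 'user']
```

Note that dropping messages one at a time can leave a function call without the user question that prompted it, as in the output above. Whether that matters depends on the model; removing whole exchanges together would be a possible refinement.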

Test it:

In [88]:
initial_prompt = """
Answer the user's question/request.
If you are unsure, you can consult the internal documentation by calling the function 'search_documentation'.
"""

memory = Memory(initial_prompt=initial_prompt)
knowledge_base = KnowledgeBase(vdb)
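# The default prices are those of gpt-3.5-turbo, so for gpt-4 we pass its own prices
# (assumed here: 0.03 / 0.06 USD per 1000 tokens; check OpenAI's pricing page).
agent = BaseAgent(memory=memory, knowledge_base=knowledge_base, model='gpt-4', prices=[0.03, 0.06])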

In [89]:
agent.reply("My company produces luxury goods. Can I export to Russia?")

[Agent calling function search_documentation with arguments {'query': 'exporting luxury goods to Russia'}]

According to the EU regulations, EU nationals or EU companies are prohibited from providing luxury goods to a person in Russia or for use in Russia. This includes the sale, supply, transfer, or export, directly or indirectly, of luxury goods. The prohibition applies not only in the territory of the Union but also to any national of a Member State, and to any legal person incorporated or constituted under the law of a Member State, irrespective of where that person or legal person is.

However, if the goods have been imported by a Russian person or company before the imposition of sanctions and are now being sold in Russia by these companies, the prohibition would not apply. There is also an exception for goods necessary for the official purposes of diplomatic or consular missions of Member States or partner countries in Russia, or of international organizations enjoying immunitie

We can take a look at the memory, to see what happened:

In [90]:
print(agent.memory)

{'role': 'system', 'content': "\nAnswer the user's question/request.\nIf you are unsure, you can consult the internal documentation by calling the function 'search_documentation'.\n"}
{'role': 'user', 'content': 'My company produces luxury goods. Can I export to Russia?'}
{'role': 'assistant', 'content': None, 'function_call': <OpenAIObject at 0x7f623bd0fef0> JSON: {
  "name": "search_documentation",
  "arguments": "{\n  \"query\": \"exporting luxury goods to Russia\"\n}"
}}
{'role': 'function', 'name': 'search_documentation', 'content': "{'INFORMATION 1 (from sanctions-russia.pdf)': 'EU nationals or EU companies are prohibited from providing luxury goods as defined in Article 3h of Regulation (EU) No 833/2014 to a person in Russia or for use in Russia even if the goods have already been imported in the country. EU operators are furthermore prohibited from participating, knowingly and intentionally, in activities the object or effect of which is to circumvent these export restrictions.

# 6. A Webapp for your LLM

We now have all the core pieces of our chatbot, so we can move out of the notebook and into an actual script. What we want to do is:

1. Tidy up the code
2. Give the chatbot an interface, i.e., build a webapp

I have already done that: all the code can be found in the `src` folder.

# 7. Competition: Can Your Chatbot Answer Our Questions Correctly?

Competition! You have 20 minutes to play around with the prompt (or, if you have other ideas, feel free to try them!). The goal is to correctly answer 3 questions that will be revealed after the 20 minutes. Hints:

- One question will be general knowledge.
- One will require a little mathematical reasoning.
- One will be related to our internal documentation.