### **Load Environment variables from .env file**

In [1]:
from langchain.llms import AzureOpenAI
import openai
import tiktoken 
# Register the encoding explicitly to avoid the error "Could not automatically map gpt-35-turbo to a tokeniser"
tiktoken.model.MODEL_TO_ENCODING["gpt-35-turbo"] = "cl100k_base"
from dotenv import load_dotenv
import os
from IPython.display import display, HTML, JSON, Markdown
import json


In [2]:
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY") 
OPENAI_DEPLOYMENT_ENDPOINT = os.getenv("OPENAI_DEPLOYMENT_ENDPOINT")
OPENAI_DEPLOYMENT_NAME = os.getenv("OPENAI_DEPLOYMENT_NAME")
OPENAI_MODEL_NAME = os.getenv("OPENAI_MODEL_NAME")
OPENAI_DEPLOYMENT_VERSION = os.getenv("OPENAI_DEPLOYMENT_VERSION")

OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME = os.getenv("OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME")
OPENAI_ADA_EMBEDDING_MODEL_NAME = os.getenv("OPENAI_ADA_EMBEDDING_MODEL_NAME")

OPENAI_DAVINCI_DEPLOYMENT_NAME = os.getenv("OPENAI_DAVINCI_DEPLOYMENT_NAME")
OPENAI_DAVINCI_MODEL_NAME = os.getenv("OPENAI_DAVINCI_MODEL_NAME")

# Configure OpenAI API
openai.api_type = "azure"
openai.api_version = OPENAI_DEPLOYMENT_VERSION
openai.api_base = OPENAI_DEPLOYMENT_ENDPOINT
openai.api_key = OPENAI_API_KEY
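
`os.getenv` returns `None` for any variable missing from `.env`, so a typo in a variable name only surfaces later as an authentication or deployment error. A minimal sketch of an early check follows; the helper name `require_env` is hypothetical and not part of the original notebook, and the example passes an explicit mapping so it runs without a real `.env` file.

```python
import os

def require_env(names, environ=None):
    """Return the requested variables as a dict, raising if any is unset or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}

# Example with an explicit mapping instead of the real environment.
settings = require_env(
    ["OPENAI_API_KEY", "OPENAI_DEPLOYMENT_NAME"],
    environ={"OPENAI_API_KEY": "dummy-key", "OPENAI_DEPLOYMENT_NAME": "my-deployment"},
)
print(settings["OPENAI_DEPLOYMENT_NAME"])
```

Calling `require_env([...])` without the `environ` argument checks the real process environment after `load_dotenv()` has run.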

### **Model initialization**

In [3]:

def init_llm(model=OPENAI_MODEL_NAME,
             deployment_name=OPENAI_DEPLOYMENT_NAME, 
             temperature=0,
             max_tokens=400,
             top_p=1,
             ):
    
    llm = AzureOpenAI(deployment_name=deployment_name,  
                  model=model,
                  temperature=temperature,
                  max_tokens=max_tokens,
                  top_p=top_p,
                  model_kwargs={"stop": ["<|im_end|>"]}
    ) 
    return llm


### **Prompt engineering techniques**
These techniques are not specific to summarization; they can be used for any task.

1. Use delimiters to clearly separate the exact text the model should summarize.

    A delimiter can be any punctuation that separates specific pieces of text:
    triple quotes, triple backticks, triple dashes, angle brackets, XML tags, etc.

2. Ask the model for structured output, e.g. JSON or HTML. If you ask for JSON, define its structure, e.g. which fields it should contain.

3. Check the assumptions required to do the task.

4. Few-shot prompting. Provide examples of the completed task, then ask the model to perform the task.
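
As an illustration of techniques 1 and 4 together, the sketch below assembles a few-shot prompt in which every input is wrapped in XML-style tags. The function name `build_few_shot_prompt`, the `<text>` tag, and the example sentences are assumptions made for this sketch; the resulting string could be passed to any LLM callable.

```python
def build_few_shot_prompt(instruction, examples, new_input, tag="text"):
    """Assemble an instruction, solved examples, and a delimited new input."""
    parts = [instruction.strip(), ""]
    for source, target in examples:
        parts.append(f"<{tag}>{source}</{tag}>")   # delimited example input
        parts.append(f"Summary: {target}")         # its solved output
        parts.append("")
    parts.append(f"<{tag}>{new_input}</{tag}>")    # the input to be solved
    parts.append("Summary:")
    return "\n".join(parts)

few_shot_prompt = build_few_shot_prompt(
    "Summarize the content of the text tags in one sentence.",
    [("The fund holds government bonds. It targets cautious investors.",
      "A low-risk bond fund for cautious investors.")],
    "The dashboard tracks investments in real time and sends alerts.",
)
print(few_shot_prompt)
```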



In [4]:
# Using delimiters to clearly separate input text.

text = f"""
**Customer**: Hello, my name is John, and I'm a customer of Imaginal Bank.
**Clerk**: Hello, John! My name is Sara, and I'm a customer service representative at Imaginal Bank. How can I assist you today?
**Customer**: Hi, Sara. I'm interested in your bank's investment programs. 
              Can you tell me more about them, especially in terms of risk management?

**Clerk**: Absolutely, John. We have a few key programs I can highlight.

First, there's our 'Balanced Growth Fund'. It's a diversified mutual fund that invests in a mix of equities and bonds to provide both growth and income, reducing risk through diversification. 

We also have the 'Index Tracker ETF', which is designed to replicate the performance of a specific market index. By spreading investments across the entire index, it inherently reduces the risk associated with individual stocks.

Additionally, for those with a lower risk tolerance, we have the 'Secure Income Bond Fund', which focuses on government and high-quality corporate bonds. 

Our financial advisors are always available to guide you in choosing the right program based on your financial goals and risk tolerance.

**Customer**: I see. Could you elaborate on how the Balanced Growth Fund manages risk?

**Clerk**: Sure. The Balanced Growth Fund mitigates risk by diversifying investments across a wide range of assets. If one investment performs poorly, it's likely to be offset by other investments that are performing well. Furthermore, our portfolio managers actively manage the fund, adjusting holdings based on changing market conditions to manage risk and enhance returns.

**Customer**: Does the bank provide any tools to monitor my investments?

**Clerk**: Yes, John. We offer an online platform called 'Imaginal Investor Dashboard'. It provides real-time tracking of your investments, balance updates, and market trends. You can also set up alerts to be notified about significant changes in your portfolio.

**Customer**: That sounds quite comprehensive. How can I get started?

**Clerk**: You can schedule an appointment with one of our financial advisors. They'll walk you through your options, help you understand your risk tolerance, and guide you in choosing the right investment program. Would you like me to arrange that for you?

**Customer**: Yes, please. That would be helpful.

**Clerk**: Fantastic, John! Let's get that set up for you...

"""

prompt = f""" Summarize the text delimited by triple backticks into a summary of 20 words. 
```{text}```

Output the summary in the form: Summary: <summary> <|im_end|>
"""


llm=init_llm()
sum = llm(prompt)
display(Markdown(sum))


Summary: Imaginal Bank offers investment programs such as Balanced Growth Fund, Index Tracker ETF, and Secure Income Bond Fund. The bank mitigates risk by diversifying investments and actively managing the fund. The Imaginal Investor Dashboard provides real-time tracking of investments, balance updates, and market trends. Customers can schedule an appointment with a financial advisor to choose the right investment program.

In [5]:
# Check the specific conditions/assumptions in the text

prompt = f""" You will be provided with a text delimited by triple quotes. 
If it contains utterances of a Customer and a Clerk, summarize the text into a single sentence.
If the text does not contain utterances of a Customer and a Clerk, 
then just write 'No customer and clerk utterances found'.
\"\"\" {text}  \"\"\"

Output the summary in the form: Summary: <summary> <|im_end|>
"""
llm=init_llm()
sum = llm(prompt)
display(Markdown(sum))
 

Summary: John is interested in Imaginal Bank's investment programs and asks Sara about them. Sara explains the Balanced Growth Fund, Index Tracker ETF, and Secure Income Bond Fund. John asks about how the Balanced Growth Fund manages risk and if the bank provides tools to monitor investments. Sara explains how the Balanced Growth Fund mitigates risk and tells John about the Imaginal Investor Dashboard. John schedules an appointment with a financial advisor.

#### Give the model time to think

A very useful technique is to give the model time to think: specify the exact steps (sub-tasks) needed to complete the task and ask for the output in a specific format.


In [6]:
prompt = f""" Your task is to perform the following actions:
1 - Summarize the text delimited by triple backticks into a single sentence.
2 - Translate the summary into Portuguese.
3 - List the Customer questions.

Use the following output format:
Summary: <summary>
Translated summary: <portuguese summary>
Customer questions: <enumerated customer questions>

Text: ```{text}``` <|im_end|>
"""

llm=init_llm()
sum = llm(prompt)
display(Markdown(sum))


Summary: The customer is interested in Imaginal Bank's investment programs and asks about risk management. The clerk explains the Balanced Growth Fund, Index Tracker ETF, and Secure Income Bond Fund, and how they manage risk. The clerk also mentions the Imaginal Investor Dashboard, which provides real-time tracking of investments and market trends. The customer schedules an appointment with a financial advisor to get started.

Translated summary: O cliente está interessado nos programas de investimento do Imaginal Bank e pergunta sobre gerenciamento de risco. O atendente explica o Balanced Growth Fund, o Index Tracker ETF e o Secure Income Bond Fund, e como eles gerenciam o risco. O atendente também menciona o Imaginal Investor Dashboard, que fornece rastreamento em tempo real de investimentos e tendências de mercado. O cliente agenda uma consulta com um consultor financeiro para começar.

Customer questions:
1. Can you tell me more about your bank's investment programs?
2. How do your investment programs manage risk?
3. Does the bank provide any tools to monitor my investments?
4. How can I get started with investing?
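
Asking for a fixed output format also makes the response easy to post-process. A minimal sketch, assuming the model follows the requested labels exactly; the helper `parse_labeled_output` and the shortened sample response are illustrative and not part of the original notebook.

```python
import re

def parse_labeled_output(output, labels):
    """Split model output into sections keyed by the expected labels."""
    # Longest labels first so one label can never shadow another.
    ordered = sorted(labels, key=len, reverse=True)
    pattern = "|".join(re.escape(label) for label in ordered)
    # Split on "<label>:" at the start of a line, keeping the label itself.
    pieces = re.split(rf"^({pattern}):", output, flags=re.MULTILINE)
    # pieces looks like [preamble, label1, body1, label2, body2, ...]
    return {label: body.strip() for label, body in zip(pieces[1::2], pieces[2::2])}

sample = """Summary: The customer asks about investment programs.
Translated summary: O cliente pergunta sobre programas de investimento.
Customer questions:
1. How can I get started?
2. Does the bank provide any tools?"""

parsed = parse_labeled_output(
    sample, ["Summary", "Translated summary", "Customer questions"]
)
questions = parsed["Customer questions"].splitlines()
print(parsed["Summary"])
print(len(questions))
```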

##### Let's do some math

In [8]:
prompt = f""" Determine if the student's solution delimited by the triple backticks is correct or not.
Question: When I was 2 years old my sister was twice my age. I'm now 40 years old how old is my sister now? 
Student's answer: ```The sister is now 80 years old.``` 
If student is correct, then write 'Correct', otherwise write 'Incorrect'.  
<|im_end|>
"""

llm=init_llm()
sum = llm(prompt)
display(HTML(sum))

##### Ask the model to work out its own solution, then compare both solutions and conclude which is correct.
 Define the task as a list of sub-tasks and ask the model to perform them in a specific order (the technique of splitting a task into sub-tasks). 
 Then ask the model to compare its own solution with the solution provided by the student and conclude which is correct.
 Ask the model to share its reasoning for the conclusion.

In [68]:
Question = """When I was 2 years old my sister was twice my age. I'm now 40 years old, how old is my sister now?"""
Student_Solution = """ The sister is now 80 years old."""

prompt = f""" Determine if the Student's Solution for the Question is correct or not.
To solve the problem do the following:
1 - First, work out your OWN solution to the problem. Evaluate your final result to make sure it is correct and adheres to the question's conditions. 
2 - Second, compare your solution to the student's solution and evaluate if the student's solution is correct or not.
Don't decide if the student's solution is correct until you have done the problem yourself.

Student's Solution: ```{Student_Solution}```
Question: ```{Question}```


Use the following output format:
Actual solution steps: <your own solution steps>
Student's solution: <student's solution>
Student's solution is correct: <true/false>
<|im_end|>

"""

llm=init_llm()
sum = llm(prompt)
display(Markdown(sum))


Actual solution steps: 
- When I was 2 years old, my sister was 4 years old (2 * 2 = 4)
- The age difference between us is 2 years (4 - 2 = 2)
- I am now 40 years old (given)
- Therefore, my sister is 40 + 2 = 42 years old now (age difference + my age)
Student's solution: The sister is now 80 years old.
Student's solution is correct: False
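
The model's reasoning can be checked independently with a few lines of arithmetic. This is only a sanity check of the expected answer, not part of the prompt; the function name `sister_age_now` is made up for the sketch.

```python
def sister_age_now(my_age_then, sister_multiple, my_age_now):
    """Age puzzle: the age gap between siblings stays constant over time."""
    sister_age_then = my_age_then * sister_multiple   # 2 * 2 = 4
    age_gap = sister_age_then - my_age_then           # 4 - 2 = 2
    return my_age_now + age_gap                       # 40 + 2 = 42

expected = sister_age_now(my_age_then=2, sister_multiple=2, my_age_now=40)
student_answer = 80
print(expected, student_answer == expected)  # prints: 42 False
```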

#### Model Hallucinations and how to avoid them

Hallucination is when the model generates text that is not supported by the input.
To reduce hallucinations, use the techniques listed above, and strictly instruct the model to find the relevant information in the input text. 
If the information is not in the input, instruct the model to generate a specific output, e.g. "I don't know the answer to this question". 
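
A minimal sketch of such a grounded prompt, assuming XML-style `<context>` tags as the delimiter. The helper names `build_grounded_prompt` and `is_fallback` are hypothetical; the fallback sentence is the one suggested above.

```python
FALLBACK = "I don't know the answer to this question"

def build_grounded_prompt(question, context, fallback=FALLBACK):
    """Restrict the model to the supplied context and give it an explicit way out."""
    return (
        "Answer the question using only the text between the <context> tags.\n"
        f"If the answer is not in that text, reply exactly: {fallback}\n\n"
        f"<context>{context}</context>\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

def is_fallback(answer, fallback=FALLBACK):
    """Detect the fallback reply, ignoring case and a trailing period."""
    return answer.strip().rstrip(".").lower() == fallback.lower()

grounded_prompt = build_grounded_prompt(
    "What is the minimum deposit?",
    "The Balanced Growth Fund invests in a mix of equities and bonds.",
)
print(grounded_prompt)
```

Checking the reply with `is_fallback` lets downstream code treat "not in the text" as a distinct case instead of displaying an invented answer.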

#### Iterative Prompt Development
Prompt engineering is an iterative process. You almost never get the prompt right the first time.
You start with an idea, implement the prompt, get the experimental results, do the error analysis, and iterate again.

**Try -> Analyze -> Clarify Instructions -> Try again**

In later iterations, refine the prompt against a batch of examples rather than a single input.
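
One way to make the loop concrete is to score each prompt variant on a small batch of inputs. The sketch below is an assumption-laden illustration: `evaluate_prompt` is a hypothetical helper, and `fake_llm` is a stand-in that lets the example run offline; with the notebook's setup, the object returned by `init_llm()` would take its place.

```python
def evaluate_prompt(llm, prompt_template, examples, check):
    """Return the fraction of examples whose output passes `check`."""
    passed = sum(1 for example in examples
                 if check(llm(prompt_template.format(text=example))))
    return passed / len(examples)

def fake_llm(prompt):
    """Stand-in model: returns the first sentence, labeled only if asked to."""
    body = prompt.split("<text>")[1].split("</text>")[0]
    first_sentence = body.split(".")[0].strip() + "."
    return "Summary: " + first_sentence if "Summary:" in prompt else first_sentence

def starts_with_label(output):
    return output.startswith("Summary:")

examples = [
    "The fund holds bonds. It is low risk.",
    "The dashboard tracks investments. It sends alerts.",
]
variants = {
    "v1": "Summarize: <text>{text}</text>",
    "v2": "Summarize: <text>{text}</text>\nOutput in the form: Summary: <summary>",
}
scores = {name: evaluate_prompt(fake_llm, template, examples, starts_with_label)
          for name, template in variants.items()}
print(scores)  # prints: {'v1': 0.0, 'v2': 1.0}
```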

### Basic Summarization 

In [5]:
# Helper function: reads a text file, inserts its contents into the prompt
# template, and returns the model's response.
def summarize_text(llm, prompt_prefix, text_file):
    
    # read the text file 
    with open(text_file, 'r') as file:
        text = file.read()
    # concatenate the prompt with the data
    prompt = prompt_prefix.format(text=text)
    return llm(prompt)

In [70]:
llm=init_llm()
prompt_prefix = """ Summarize the text delimited by triple backticks into 2-3 sentences: ```{text}``` 
<|im_end|>
"""

sum = summarize_text(llm, prompt_prefix, "./data/bank-call-center-transcript.txt")
display(Markdown(sum))

Imaginal Bank offers a range of investment programs, including the Balanced Growth Fund, Index Tracker ETF, and Secure Income Bond Fund. The Balanced Growth Fund is a diversified mutual fund that invests in a mix of equities and bonds to provide both growth and income, reducing risk through diversification. The bank's portfolio managers actively manage the fund, adjusting holdings based on changing market conditions to manage risk and enhance returns. The Imaginal Investor Dashboard provides real-time tracking of investments, balance updates, and market trends. Customers can schedule an appointment with a financial advisor to help them choose the right investment program based on their financial goals and risk tolerance.

#### More advanced prompts

In [71]:
llm=init_llm()
prompt_prefix = """ Prepare a summary for the text delimited by the triple backticks```{text}``` based on the points mentioned below. 
Please evaluate whether the clerk successfully accomplished the following tasks:

Greeting the customer politely and professionally.
Accurately understanding the customer's inquiry.
Providing clear and detailed information in response.
Asking questions as needed for clarification.
Discussing both benefits and risks with the customer.
Explaining the tools and resources available to the customer.
Inviting the customer to take further action.
Offering assistance for the next steps.
Ending the conversation on a positive note.

<|im_end|>
"""

sum = summarize_text(llm, prompt_prefix, "./data/bank-call-center-transcript.txt")
display(Markdown(sum))

The clerk successfully accomplished the following tasks:

- Greeting the customer politely and professionally.
- Accurately understanding the customer's inquiry.
- Providing clear and detailed information in response.
- Asking questions as needed for clarification.
- Discussing both benefits and risks with the customer.
- Explaining the tools and resources available to the customer.
- Inviting the customer to take further action.
- Offering assistance for the next steps.
- Ending the conversation on a positive note.

##### Output in tabular format

In [31]:
llm=init_llm()
prompt_prefix = """Prepare a summary for the text delimited by the triple backticks:```{text}``` based on the points mentioned below.
Generate the output in the HTML format, with each item on a separate row.

Assign a color code to each item based on the clerk's performance - 
items that were successfully addressed should be marked in green, 
whereas items that were not met should be highlighted in red.

Here are the evaluation criteria to consider:

Did the clerk greet the customer in a polite and professional manner?
Did the clerk begin the conversation on a negative note?
Did the clerk understand the customer's inquiry?
Did the clerk provide clear and detailed information?
Did the clerk ask questions to clarify the situation?


<|im_end|>
"""

res = summarize_text(llm, prompt_prefix, "./data/bank-call-center-transcript.txt")
display(HTML(res))

| Criterion | Result |
|---|---|
| Did the clerk greet the customer in a polite and professional manner? | Yes |
| Did the clerk begin the conversation on a negative note? | No |
| Did the clerk understand the customer's inquiry? | Yes |
| Did the clerk provide clear and detailed information? | Yes |
| Did the clerk ask questions to clarify the situation? | Yes |


##### Output as JSON

In [19]:
llm=init_llm()

prompt_prefix = """ Evaluate the clerk's performance from the text delimited by the triple backticks: ```{text}``` based on the points mentioned below.

Did the clerk greet the customer in a polite and professional manner?
Did the clerk comprehend the customer's inquiry accurately?
Did the clerk provide comprehensive and clear information?
Did the clerk ask relevant questions to clarify the customer's situation?
Did the clerk explain the benefits and potential risks to the customer?
Did the clerk detail the tools and resources available to the customer?
Did the clerk encourage the customer to take further action?
Did the clerk offer assistance with proceeding to the next steps?
Did the clerk end the interaction on a positive and upbeat note?

The output should be presented in JSON format, adhering to the following key-value pairs:

"Greet the Customer Politely and Professionally": "Yes/No",
"Understand the Customer's Inquiry": "Yes/No",
"Provide Clear and Detailed Information": "Yes/No",
"Ask Clarifying Questions": "Yes/No",
"Discuss the Benefits and Risks": "Yes/No",
"Explain Available Tools and Resources": "Yes/No",
"Invite Further Action": "Yes/No",
"Offer to Assist with Next Steps": "Yes/No",
"End on a Positive Note": "Yes/No"

<|im_end|>
"""

res = summarize_text(llm, prompt_prefix, "./data/bank-call-center-transcript.txt")
display(JSON(json.loads(res)))



<IPython.core.display.JSON object>
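
`json.loads` fails if the model wraps the JSON in extra prose or omits the braces, which a completion model may well do. A defensive sketch, assuming the response contains at most one JSON object; the helper `extract_json_object` and the sample response are illustrative only.

```python
import json

def extract_json_object(output):
    """Parse the span from the first '{' to the last '}'; return None on failure."""
    start = output.find("{")
    end = output.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(output[start:end + 1])
    except json.JSONDecodeError:
        return None

sample = (
    "Here is the evaluation:\n"
    '{"Ask Clarifying Questions": "Yes", "End on a Positive Note": "Yes"}\n'
    "Done."
)
evaluation = extract_json_object(sample)
print(evaluation)
```

A `None` result signals that the prompt needs another iteration, for example a more explicit description of the expected JSON structure.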

### Summarize large documents
To summarize larger documents, we can split the document into smaller chunks and summarize each chunk separately. We can then combine the summaries of each chunk to get the final summary.
LangChain has a built-in chain for doing that. 
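
The idea can be sketched in plain Python before turning to the library. This is not LangChain's implementation: chunks are split by word count rather than tokens, and `summarize` is any callable that maps text to a shorter text (a trivial stand-in is used here so the example runs offline).

```python
def split_into_chunks(text, max_words):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def map_reduce_summarize(text, summarize, max_words=200):
    """Summarize each chunk (map), then summarize the joined summaries (reduce)."""
    chunks = split_into_chunks(text, max_words)
    partial_summaries = [summarize(chunk) for chunk in chunks]
    return summarize("\n".join(partial_summaries))

def first_word(text):
    """Stand-in 'summarizer' that keeps only the first word."""
    return text.split()[0]

document = " ".join(f"word{i}" for i in range(10))
chunks = split_into_chunks(document, max_words=4)
final_summary = map_reduce_summarize(document, first_word, max_words=4)
print(len(chunks), final_summary)  # prints: 3 word0
```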



In [74]:
from langchain.chains.summarize import load_summarize_chain

In [75]:
from langchain.document_loaders import PyPDFLoader
llm=init_llm()

large_pdf_path ="./data/Large_language_model.pdf"
loader = PyPDFLoader(large_pdf_path)

#output type is List[Document]
pages = loader.load_and_split()

# Count the tokens in the document
total_tokens = 0
encoder = tiktoken.get_encoding("cl100k_base")

for page in pages:
    total_tokens += len(encoder.encode(page.page_content))
    
# The document has more than 13,000 tokens, too many for the LLM to process in one go.
# This is why we split the document into smaller chunks.
print(f"Total tokens in the document: {total_tokens}")


Total tokens in the document: 13420


In [76]:
#create summarization chain
summary_chain = load_summarize_chain(llm=llm, chain_type='map_reduce',verbose=False )

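##### The following chain types exist: stuff, map_reduce, refine, map_rerank
See here for the details: https://docs.langchain.com/docs/components/chains/index_related_chains. Note that `load_summarize_chain` accepts only `stuff`, `map_reduce`, and `refine`; `map_rerank` applies to question-answering chains.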

In [77]:
# Running the chain over all pages makes many calls in a loop and triggers
# RateLimitError: Too Many Requests, so only the first five pages are summarized.
#sum = summary_chain.run(pages)
sum = summary_chain.run(pages[0:5])

display(Markdown(sum))

 

The article discusses the use of language models (LLMs) in natural language processing (NLP) tasks. It explains zero-shot prompting, where the model is given no solved examples, and few-shot performance, where the model is given a small number of solved examples. The article also discusses instruction tuning, a form of fine-tuning that trains the LLM on many examples of tasks formulated as natural language instructions. The article explains various techniques for instruction tuning, including self-instruct and OpenAI's InstructGPT protocol. The article concludes by discussing perplexity, a measure of how well a model is able to predict the contents of a dataset, and the challenges of evaluating large language models. 
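
The rate-limit error mentioned in the cell above is commonly handled by retrying with exponential backoff instead of truncating the input. A minimal sketch under stated assumptions: `call_with_backoff` is a hypothetical helper, and the demonstration uses a fake function and a recorded `sleep` so it runs instantly and offline.

```python
import time

def call_with_backoff(func, *args, max_retries=5, base_delay=1.0,
                      retry_on=(Exception,), sleep=time.sleep, **kwargs):
    """Call func, retrying with exponentially growing delays on failure."""
    for attempt in range(max_retries):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)

# Offline demonstration: a function that fails twice, then succeeds.
attempts = {"count": 0}

def flaky_summarize(text):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("Too Many Requests")
    return "summary of " + text

delays = []  # record the waits instead of actually sleeping
result = call_with_backoff(flaky_summarize, "page 1", sleep=delays.append)
print(result, delays)  # prints: summary of page 1 [1.0, 2.0]
```

With the real chain, `summary_chain.run` would be passed as `func`, and `retry_on` would be narrowed to the client's rate-limit exception (for instance `openai.error.RateLimitError` in the pre-1.0 `openai` package this notebook configures) so that other errors are not retried.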