# Prompt Chunking

## Setup
#### Follow [README](https://github.com/tirtho/open-ai/blob/main/README.md) and perform setup before running the notebooks

References:
- [Azure OpenAI](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/overview)
- [NAIC P & C Insurance Industries Full Year Report, 2021](https://content.naic.org/sites/default/files/inline-files/2021%20Annual%20Property%20%26%20Casualty%20and%20Title%20Industry%20Report.pdf)

#### Load the API key and relevant Python libraries.

In [1]:
import openai
import sys

from azure_openai_setup import get_openai_client, get_config_from_os_env, get_chat_completion

THE_MODEL = 'gpt-4o'
endpoint, key, version = get_config_from_os_env()
#print(f"{endpoint}, {key}, {version}")
status, client = get_openai_client(aoai_endpoint = endpoint, 
                                   aoai_api_key = key, 
                                   aoai_version = version
                                  )
print(f"Connecting to Open AI returned status as {status}")


Got OPENAI API Key from environment variable
Connecting to Open AI returned status as True
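
The `azure_openai_setup` module is not shown in this notebook. As a rough sketch only, a helper with the same call shape as `get_chat_completion` could look like the following. It assumes `the_client` is an `openai.AzureOpenAI` client (openai Python package v1 or later) and that the helper returns total tokens, finish reason, and message text in that order. The name `get_chat_completion_sketch` is hypothetical, and the real helper may differ.

```python
def get_chat_completion_sketch(the_client, the_model, the_messages):
    """Hypothetical stand-in for azure_openai_setup.get_chat_completion.

    Assumes the_client exposes chat.completions.create(), as the
    openai.AzureOpenAI client does in openai>=1.0.
    """
    response = the_client.chat.completions.create(
        model=the_model,        # on Azure this is the deployment name
        messages=the_messages,
    )
    choice = response.choices[0]
    # finish_reason is "stop" for a complete answer and "length" when the
    # reply was cut off by the token limit.
    return response.usage.total_tokens, choice.finish_reason, choice.message.content
```

Checking `finish_reason` after each call is a cheap way to notice a summary that was cut short.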


## Chunking
Break a large document into chunks and summarize each chunk separately. Then summarize those chunk summaries to produce the final summary. This helps when a document is too long to fit in a single prompt.
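
As an illustration of the chunking step, the sketch below splits text on blank lines and packs paragraphs into chunks under a character budget. The function name `split_into_chunks` and the `max_chars` limit are assumptions for illustration; the cells that follow instead split the report by hand into two parts.

```python
def split_into_chunks(text, max_chars=2000):
    """Pack blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars becomes its own oversized chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            chunks.append(current)   # current chunk is full; start a new one
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sample = "FINANCIAL RESULTS\n\nUNDERWRITING OPERATIONS\n\nEMERGING RISKS"
print(split_into_chunks(sample, max_chars=45))
```

A character budget is only a rough proxy for the model's token limit; a token-based budget would be more precise.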

In [2]:
IndustryReportPart1 = """
FINANCIAL RESULTS 2021 (in million dollars, except for percent):
- Item | YoY Change | 2021 | 2020
- Net Premium Written | 9.2% | 719,815 | 658,913
- Net Premiums Earned | 7.4% | 693,664 | 646,014
- Net Losses Incurred | 12.8% | 432,474 | 383,308
- Loss Expenses Incurred | 1.1% | 70,638 | 69,888
- Underwriting Expenses | 5.3% | 189,487 | 179,964
- Underwriting Gain (Loss) | NM | (39) | 12,100
- Net Loss Ratio | 2.4 pts | 72.5% | 70.2%
- Expense Ratio | (1.0) pts | 26.3% | 27.3%
- Combined Ratio | 0.9 pts | 99.6% | 98.7%
- Net Investment Income Earned | 2.6% | 52,932 | 51,596  
- Net Realized Gains (Loss) | 64.5% | 18,200 | 11,064
- Net Investment Gain (Loss) | 13.5% | 71,132 | 62,660
- Investment Yield | (0.15) pts | 2.60% | 2.75% 
- Total Other Income | 240.0% | 3,514 | 1,034
- Net Income | (4.9%) | 60,537 | 59,196 
- Return on Revenue | (0.4) pts | 7.9% | 8.4%
- Policyholders' Surplus | 12.8% | 1,077,866 | 955,136
- Return on Surplus | (0.5) pts | 6.0% | 6.4% 
"""

#### Get Summary of part 1

In [3]:
prompt = f"""
Create a Summary of the Financial Results section based on the data \
provided in the NAIC Report delimited by triple backticks.

The Summary should contain Net Premium Growth between the years, \
performance in Investments, Loss Ratio, Expense Ratio and Combined Ratio

NAIC Report: ```{IndustryReportPart1}```
"""

my_prompt = [
              {
                "role": "user", 
                "content": f"{prompt}"
                }
              ]      
tokens_used, finish_reason, completion1 = get_chat_completion(
                                                the_client=client, 
                                                the_model=THE_MODEL,
                                                the_messages=my_prompt)
#print(f"Completion: {completion1}\nTokens used: {tokens_used}\nFinish Reason: {finish_reason}")
print(f'{completion1}')

The financial results for 2021, as reported in the NAIC Report, indicate several key trends and performance metrics in the insurance industry:

1. **Net Premium Growth**: There was a significant increase in net premiums written, which grew by 9.2% from $658,913 million in 2020 to $719,815 million in 2021. Similarly, net premiums earned rose by 7.4%, reaching $693,664 million compared to $646,014 million in the previous year.

2. **Investment Performance**: The net investment income earned saw a modest increase of 2.6%, totaling $52,932 million in 2021, up from $51,596 million in 2020. Notably, net realized gains surged by 64.5% to $18,200 million. Overall, the net investment gain improved by 13.5%, amounting to $71,132 million. However, the investment yield slightly decreased by 0.15 percentage points to 2.60%.

3. **Loss Ratio**: The net loss ratio increased by 2.4 percentage points, rising from 70.2% in 2020 to 72.5% in 2021. This indicates a higher proportion of premiums were used t

In [5]:
IndustryReportPart2 = """
UNDERWRITING OPERATIONS:
Catastrophes
According to the National Centers for Environmental
Information, National Oceanic and Atmospheric
Administration (NOAA), there were 20 weather/climate
disaster events with total losses of more than $1 billion in the U.S. in 2021. These events included 11 severe storms, 4
tropical cyclones, 2 floods, 1 wildfire event, and 1 winter storm. Overall costs for these events were $148.0 billion. The
costliest events are discussed below.

INVESTMENT OPERATIONS:
Bonds continued to comprise the majority of cash and
invested assets accounting for 48.6% of the total at
December 31, 2021. However, the low interest rate
environment has pressured insurers to seek investment
gains through more risky investments. The industry’s
cash and invested 

EMERGING RISKS:
Social Inflation
Social inflation is a term used to describe the potential for rising insurance claim costs resulting from increased
litigation, broader definitions of liability, more plaintiff-friendly legal decisions, and larger compensatory jury
awards. Social inflation has the potential to emerge through both traditional product liability exposures (e.g., asbestos
or opioids) as well as behavioral liability exposures (e.g., breach of privacy, sexual misconduct, or corporate
misconduct). In addition, social inflation exposures can emerge under various lines of coverage (e.g., general liability,
products liability, or workers’ compensation).
Economic Inflation
The U.S. economy has shown signs of rising inflation, with various measures spiking to their highest levels in over forty
years. Rising economic inflation in recent months has had impact on loss costs, which could impact reserve adequacy
and underwriting profitability for many lines of business. This has been more evident in property coverages as supply
chain issues have led to higher costs for building materials, replacement parts, and labor. In addition, rising inflation
has the potential to impact the value of a wide range of assets held by insurers, particularly fixed income holdings that
are not able to be held to maturity. Finally, inflation has the potential to lead to rapidly rising interest rates, which
could drive increased surrender activity, margin calls on certain derivatives, and have other significant life insurance
product impacts
"""

#### Get Summary of part 2

In [6]:
prompt = f"""
Create a Summary of the Different sections of the report based on the data \
provided in the NAIC Report delimited by triple backticks.

The Summary should contain a summary of each individual section.

NAIC Report: ```{IndustryReportPart2}```
"""

my_prompt = [
              {
                "role": "user", 
                "content": f"{prompt}"
                }
              ]      
tokens_used, finish_reason, completion2 = get_chat_completion(
                                                the_client=client, 
                                                the_model=THE_MODEL,
                                                the_messages=my_prompt)
#print(f"Completion: {completion2}\nTokens used: {tokens_used}\nFinish Reason: {finish_reason}")
print(f'{completion2}')

**Summary of the NAIC Report**

**Underwriting Operations: Catastrophes**
In 2021, the United States experienced 20 significant weather and climate disaster events, each causing over $1 billion in losses. These included 11 severe storms, 4 tropical cyclones, 2 floods, 1 wildfire, and 1 winter storm, with total costs amounting to $148 billion. The report highlights the financial impact of these events on the insurance industry.

**Investment Operations**
Bonds remain the predominant form of cash and invested assets, making up 48.6% of the total as of December 31, 2021. However, the persistent low interest rate environment has compelled insurers to pursue higher returns through riskier investments, affecting the industry's investment strategies.

**Emerging Risks**
- **Social Inflation**: This refers to the rising insurance claim costs due to increased litigation, broader liability definitions, and more favorable legal outcomes for plaintiffs. It affects various liability exposures, incl
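
Both summary cells build the prompt the same way: an instruction followed by the text wrapped in triple backticks. A small helper can make that pattern reusable. This is a sketch; `build_delimited_prompt` and `as_user_message` are hypothetical names, and removing backtick runs from the input is an added precaution (not something the notebook does) so that the text cannot close the delimiter early.

```python
FENCE = "`" * 3  # the triple-backtick delimiter used in the prompts


def build_delimited_prompt(instruction, label, text):
    """Return an instruction followed by the text wrapped in triple backticks."""
    safe_text = text.replace(FENCE, "")  # keep the text from closing the delimiter
    return f"{instruction.strip()}\n\n{label}: {FENCE}{safe_text.strip()}{FENCE}\n"


def as_user_message(prompt):
    """Wrap a prompt string in the single-message chat format used above."""
    return [{"role": "user", "content": prompt}]
```

With these, a cell would reduce to building the prompt, wrapping it with `as_user_message`, and passing the result as `the_messages`.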

#### Now get the final Summary of the two Summaries

In [7]:
prompt = f"""
Create a Summary from the contents in response1 and response2, each delimited by triple backticks.

```{completion1}```

```{completion2}```
"""

my_prompt = [
              {
                "role": "user", 
                "content": f"{prompt}"
                }
              ]      
tokens_used, finish_reason, completion3 = get_chat_completion(
                                                the_client=client, 
                                                the_model=THE_MODEL,
                                                the_messages=my_prompt)
#print(f"Completion: {completion3}\nTokens used: {tokens_used}\nFinish Reason: {finish_reason}")
print(f'{completion3}')

The 2021 NAIC Report highlights key trends and challenges in the insurance industry. There was notable growth in net premiums, with a 9.2% increase in net premiums written and a 7.4% rise in net premiums earned. Investment performance showed a modest 2.6% increase in net investment income, with significant gains in net realized investments. However, the investment yield slightly decreased. The net loss ratio increased, indicating higher losses, while the expense ratio improved, reflecting better control over underwriting expenses. The combined ratio rose slightly, suggesting a minor decline in underwriting profitability.

The report also underscores the impact of 20 major weather and climate disasters in the U.S., costing $148 billion, which significantly affected the insurance sector. Bonds remain the primary investment, but low interest rates have pushed insurers towards riskier investments. Emerging risks include social inflation, driven by increased litigation and broader liability
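
The three calls above follow a summarize-then-combine pattern that extends to any number of chunks. The sketch below expresses it with a caller-supplied `summarize` function standing in for the call to `get_chat_completion`; the names `summarize_in_chunks` and `first_sentence` are illustrative and not part of the notebook's helper module.

```python
def summarize_in_chunks(chunks, summarize):
    """Summarize each chunk, then summarize the concatenated partial summaries.

    `summarize` is any callable taking a text and returning its summary,
    for example a thin wrapper around a chat completion call.
    """
    if not chunks:
        raise ValueError("at least one chunk is required")
    partial_summaries = [summarize(chunk) for chunk in chunks]
    if len(partial_summaries) == 1:
        return partial_summaries[0]  # nothing to combine
    combined = "\n\n".join(partial_summaries)
    return summarize(combined)


# Toy stand-in for the model: keep only the first sentence of the text.
def first_sentence(text):
    return text.split(".")[0].strip() + "."


print(summarize_in_chunks(
    ["Premiums grew. Losses also grew.", "Bonds dominate. Yields fell."],
    first_sentence,
))
```

Passing the summarizer as a parameter keeps the pattern testable without a live model.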

#### Now get the Summary of the entire text in one go and then compare with the above
Embeddings could then be used to check whether the two summaries are similar.

In [8]:
IndustryReport = """
FINANCIAL RESULTS 2021 (in million dollars, except for percent):
- Item | YoY Change | 2021 | 2020
- Net Premium Written | 9.2% | 719,815 | 658,913
- Net Premiums Earned | 7.4% | 693,664 | 646,014
- Net Losses Incurred | 12.8% | 432,474 | 383,308
- Loss Expenses Incurred | 1.1% | 70,638 | 69,888
- Underwriting Expenses | 5.3% | 189,487 | 179,964
- Underwriting Gain (Loss) | NM | (39) | 12,100
- Net Loss Ratio | 2.4 pts | 72.5% | 70.2%
- Expense Ratio | (1.0) pts | 26.3% | 27.3%
- Combined Ratio | 0.9 pts | 99.6% | 98.7%
- Net Investment Income Earned | 2.6% | 52,932 | 51,596  
- Net Realized Gains (Loss) | 64.5% | 18,200 | 11,064
- Net Investment Gain (Loss) | 13.5% | 71,132 | 62,660
- Investment Yield | (0.15) pts | 2.60% | 2.75% 
- Total Other Income | 240.0% | 3,514 | 1,034
- Net Income | (4.9%) | 60,537 | 59,196 
- Return on Revenue | (0.4) pts | 7.9% | 8.4%
- Policyholders' Surplus | 12.8% | 1,077,866 | 955,136
- Return on Surplus | (0.5) pts | 6.0% | 6.4% 

UNDERWRITING OPERATIONS:
Catastrophes
According to the National Centers for Environmental
Information, National Oceanic and Atmospheric
Administration (NOAA), there were 20 weather/climate
disaster events with total losses of more than $1 billion in the U.S. in 2021. These events included 11 severe storms, 4
tropical cyclones, 2 floods, 1 wildfire event, and 1 winter storm. Overall costs for these events were $148.0 billion. The
costliest events are discussed below.

INVESTMENT OPERATIONS:
Bonds continued to comprise the majority of cash and
invested assets accounting for 48.6% of the total at
December 31, 2021. However, the low interest rate
environment has pressured insurers to seek investment
gains through more risky investments. The industry’s
cash and invested 

EMERGING RISKS:
Social Inflation
Social inflation is a term used to describe the potential for rising insurance claim costs resulting from increased
litigation, broader definitions of liability, more plaintiff-friendly legal decisions, and larger compensatory jury
awards. Social inflation has the potential to emerge through both traditional product liability exposures (e.g., asbestos
or opioids) as well as behavioral liability exposures (e.g., breach of privacy, sexual misconduct, or corporate
misconduct). In addition, social inflation exposures can emerge under various lines of coverage (e.g., general liability,
products liability, or workers’ compensation).
Economic Inflation
The U.S. economy has shown signs of rising inflation, with various measures spiking to their highest levels in over forty
years. Rising economic inflation in recent months has had impact on loss costs, which could impact reserve adequacy
and underwriting profitability for many lines of business. This has been more evident in property coverages as supply
chain issues have led to higher costs for building materials, replacement parts, and labor. In addition, rising inflation
has the potential to impact the value of a wide range of assets held by insurers, particularly fixed income holdings that
are not able to be held to maturity. Finally, inflation has the potential to lead to rapidly rising interest rates, which
could drive increased surrender activity, margin calls on certain derivatives, and have other significant life insurance
product impacts
"""

prompt = f"""
Create a Summary from the contents in IndustryReport delimited by triple backticks.

NAIC Report: ```{IndustryReport}```
"""

my_prompt = [
              {
                "role": "user", 
                "content": f"{prompt}"
                }
              ]      
tokens_used, finish_reason, completion4 = get_chat_completion(
                                                the_client=client, 
                                                the_model=THE_MODEL,
                                                the_messages=my_prompt)
#print(f"Completion: {completion4}\nTokens used: {tokens_used}\nFinish Reason: {finish_reason}")
print(f'{completion4}')

The 2021 NAIC Industry Report highlights key financial results, underwriting operations, investment operations, and emerging risks in the insurance sector.

**Financial Results:**
- Net Premium Written increased by 9.2% to $719.8 billion.
- Net Premiums Earned rose by 7.4% to $693.7 billion.
- Net Losses Incurred saw a significant increase of 12.8% to $432.5 billion.
- Underwriting Expenses grew by 5.3% to $189.5 billion, resulting in an underwriting loss of $39 million.
- The Combined Ratio slightly increased to 99.6%, indicating a marginal decline in underwriting profitability.
- Net Investment Income Earned increased by 2.6% to $52.9 billion, while Net Realized Gains surged by 64.5% to $18.2 billion.
- Overall, Net Income decreased by 4.9% to $60.5 billion, with a Return on Revenue of 7.9%.

**Underwriting Operations:**
- The U.S. experienced 20 significant weather/climate disaster events in 2021, costing $148 billion, including severe storms, tropical cyclones, floods, a wildfire, 

## To Do Exercise
Create the summary of the chunk summaries and a summary of the entire text, then use embeddings to check whether the two summaries (one direct, the other obtained via chunking) are similar.

In [10]:
prompt = f"""
Compare the two documents delimited by triple backticks and infer if both are saying the same thing. 
Answer in one word, either 'yes' or 'no'.

Document1: ```{completion3}```
Document2: ```{completion4}```
"""

my_prompt = [
              {
                "role": "user", 
                "content": f"{prompt}"
                }
              ]      
tokens_used, finish_reason, completion5 = get_chat_completion(
                                                the_client=client, 
                                                the_model=THE_MODEL,
                                                the_messages=my_prompt)
#print(f"Completion: {completion5}\nTokens used: {tokens_used}\nFinish Reason: {finish_reason}")
print(f'{completion5}')

Yes
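
The one-word answer above is a coarse check. For the embeddings comparison suggested in the exercise, the summaries would first be converted to vectors (for example with an embeddings deployment, which this notebook does not set up) and then compared by cosine similarity. The sketch below covers only the comparison step; the vectors are made-up toy values and the 0.9 threshold is an arbitrary assumption to be tuned.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for the embeddings of completion3 and completion4.
embedding_chunked = [0.12, 0.80, 0.35]
embedding_direct = [0.10, 0.82, 0.33]

score = cosine_similarity(embedding_chunked, embedding_direct)
print(f"similarity={score:.3f}, similar={score >= 0.9}")
```

Unlike the yes/no answer, a similarity score gives a graded measure that can be tracked as the chunk size or prompts change.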
