# Demo / Hands-On Session: Gen AI for Internal Productivity Increase

In [1]:
import json

import numpy as np
import pandas as pd
import urllib3
from langchain.prompts import ChatPromptTemplate

from src.models import gpt35, embedding_model, gpt4
from src.utils import cosine_similarity

## **Open the Hood - RAG & Semantic Search**

<img src="data/open-the-hood-wide.jpg" alt="image" width="800" height="auto">

For most internal use cases, Large Language Models need access to internal data. The easiest and most popular method to give LLMs access to internal documents is **Retrieval Augmented Generation (RAG)**. 

The following is a high-level overview of RAG:

<img src="data/rag.jpg" alt="image" width="600" height="auto">

1. User enters a query that is combined with a pre-configured prompt
2. The query is used to search for documents that are relevant to the query
3. The *retriever* returns the relevant documents
4. Query, prompt and the relevant context are combined and sent to the LLM
5. The LLM answers the query by incorporating the provided context
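
The five steps above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: `retrieve`, `rag_answer`, `echo_llm` and `PROMPT_TEMPLATE` are hypothetical names, the retriever ranks by word overlap instead of embeddings, and the "LLM" simply returns the prompt it receives so the example runs without API access.

```python
from typing import Callable, List

# Pre-configured prompt (hypothetical wording) that is combined with the user query
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n"
    "Context:\n{context}\n\n"
    "Question: {query}"
)


def retrieve(query: str, documents: List[str], top_n: int = 2) -> List[str]:
    """Steps 2 + 3: toy retriever that ranks documents by words shared with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_n]


def rag_answer(query: str, documents: List[str], llm: Callable[[str], str]) -> str:
    """Steps 1, 4 and 5: combine query, prompt and context, then call the LLM."""
    context = "\n".join(retrieve(query, documents))
    prompt = PROMPT_TEMPLATE.format(context=context, query=query)
    return llm(prompt)


def echo_llm(prompt: str) -> str:
    """Stand-in for a real LLM call: returns the prompt it received."""
    return prompt


docs = [
    "Collisions with animals are covered by partial casco.",
    "Damage during races is not insured.",
    "Charging stations for electric cars are insured.",
]
print(rag_answer("Are collisions with animals covered?", docs, llm=echo_llm))
```

Swapping `echo_llm` for a real model call and `retrieve` for a semantic search gives the setup used in the rest of this notebook.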

The **retrieval** (steps 2 + 3) can be implemented in various ways, as long as it identifies documents that are relevant to the query. The most popular approach, and a relatively easy one to implement, is **semantic search**.

### Semantic Search

A concrete example: Let's say we want to identify passages in our terms and conditions that are relevant for electric vehicles. A simple keyword search is hardly effective because of the different phrases you could use:

Search Phrase|Phrase in Terms and Conditions|Match
---|---|:---:
electrical vehicle|electrical vehicle| <font color="green"> ✔ </font>
electrical vehicle|electrical car|❌
electrical vehicle|electric car|❌
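
The table above can be reproduced with a naive keyword search. The helper `keyword_match` is a hypothetical name used only for this illustration; it performs a case-insensitive substring match.

```python
def keyword_match(search_phrase: str, text: str) -> bool:
    """Exact, case-insensitive substring match."""
    return search_phrase.lower() in text.lower()


phrases = ["electrical vehicle", "electrical car", "electric car"]
for phrase in phrases:
    # Only the first phrase matches, although all three mean roughly the same
    print(f"{phrase!r}: {keyword_match('electrical vehicle', phrase)}")
```
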

Semantic search and embeddings make it possible to compare the latent **meaning** of words and phrases:

Search Phrase|Phrase in Terms and Conditions|Match
---|---|:---:
electrical vehicle|electrical vehicle|<font size="3"> ⬤ </font>
electrical vehicle|electrical car|<font size="4"> ◕ </font>
electrical vehicle|electric car|<font size="4"> ◕ </font>


### Word Embeddings

* Embeddings are multidimensional vectors that represent the meaning of words or phrases
* Words or phrases with similar meanings have vectors that are close / similar to each other
* There are different ways to measure vector similarity. One popular way is cosine similarity: $\cos \varphi = \frac{\vec a \cdot \vec b}{|\vec a| \cdot |\vec b|}$


<img src="data/cosine-similarity.jpg" alt="image" width="400" height="auto">
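
The `cosine_similarity` helper imported from `src.utils` is not shown in this notebook. A minimal sketch of the formula, assuming plain Python lists as vectors (the name `cosine_similarity_sketch` is hypothetical), could look like this:

```python
import math
from typing import Sequence


def cosine_similarity_sketch(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between vectors a and b: (a . b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(cosine_similarity_sketch([1.0, 0.0], [1.0, 0.0]))   # same direction
print(cosine_similarity_sketch([1.0, 0.0], [0.0, 1.0]))   # orthogonal
print(cosine_similarity_sketch([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```

Values close to 1 indicate vectors pointing in a similar direction; the vector length does not influence the result.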

Retrieve embeddings from OpenAI embedding models and compute their similarity.

In [3]:
cosine_similarity(
    embedding_model.embed_query("electric vehicle"), 
    embedding_model.embed_query("electric car")
)

0.9746821759962818

In [4]:
cosine_similarity(
    embedding_model.embed_query("electric vehicle"), 
    embedding_model.embed_query("horse")
)

0.7720132668477968

**TASK**: Adapt the code above to compute similarities of words or longer phrases to get a better feeling for how embeddings relate to each other.

## **Cruising on AXA's freeway**

<img src="data/mustang-cruising-wide.jpg" alt="image" width="800" height="auto">

How is this relevant for AXA? One of many ways to use RAG in an insurance company is to automatically determine whether a claim is covered, based on the claim description, the individual policy and the general terms and conditions.

Prerequisites for our demo:
* Access to OpenAI's LLMs and embedding models
* [General terms and conditions (AVB) for motor vehicle insurance](data/MF-AVB.pdf)
* [Motor vehicle insurance policy](data/MF-Police.pdf)


Load terms and conditions from a file and compute embeddings for each passage

In [2]:
insurance_conditions = pd.read_csv("data/MF-AVB.csv", sep="@")
insurance_conditions["embedding"] = embedding_model.embed_documents(insurance_conditions["Text"].tolist())
insurance_conditions.head(3)


Function to do a semantic search on terms and conditions

In [29]:
def semantic_search(query, df, top_n=10):
    """Returns the top_n most relevant rows for a given query"""

    # Embed query
    query_embedding = embedding_model.embed_query(query)

    # Calculate similarity between query and row embedding
    df["similarity"] = df["embedding"].apply(
        lambda x: cosine_similarity(x, query_embedding)
    )

    # Return top_n most relevant AVB passages
    return df.sort_values(by="similarity", ascending=False).iloc[:top_n][
        ["id", "Teil", "Titel", "Untertitel", "Text", "similarity"]]
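
To try the ranking logic without API access, the same idea can be sketched with hand-made vectors. Everything in this example is hypothetical: the three-dimensional "embeddings", the passage labels and the function names are made up for illustration, and real embedding vectors have far more dimensions.

```python
import math

# Hypothetical toy "embeddings" for three passages
toy_passages = {
    "animal collision": [0.9, 0.1, 0.0],
    "charging station": [0.1, 0.9, 0.1],
    "racing events": [0.0, 0.2, 0.9],
}


def cosine(a, b):
    """Cosine similarity of two equally long vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def toy_semantic_search(query_embedding, passages, top_n=2):
    """Score every passage against the query and return the top_n best matches."""
    scored = [(label, cosine(vec, query_embedding)) for label, vec in passages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical query vector that points mostly in the "animal collision" direction
query_embedding = [0.8, 0.2, 0.1]
for label, score in toy_semantic_search(query_embedding, toy_passages):
    print(f"{label}: {score:.3f}")
```
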

Execute semantic search for different words and phrases

In [30]:
semantic_search("Elektroauto", insurance_conditions, top_n=5)

Unnamed: 0,id,Teil,Titel,Untertitel,Text,similarity
98,D5.1,Services und Zusatzleistungen,E-Mobilität Ladestation,Versicherte Sache,D5.1) E-Mobilität Ladestation/Versicherte Sach...,0.811702
102,D6.1,Services und Zusatzleistungen,E-Mobilität Batterie,Versicherte Sache,D6.1) E-Mobilität Batterie/Versicherte Sache: ...,0.80595
100,D5.3,Services und Zusatzleistungen,E-Mobilität Ladestation,Nicht versichert sind:,D5.3) E-Mobilität Ladestation/Nicht versichert...,0.795725
103,D6.2,Services und Zusatzleistungen,E-Mobilität Batterie,Versicherungsschutz,D6.2) E-Mobilität Batterie/Versicherungsschutz...,0.795004
99,D5.2,Services und Zusatzleistungen,E-Mobilität Ladestation,Versicherungsschutz,D5.2) E-Mobilität Ladestation/Versicherungssch...,0.794113


... also works for entire phrases

In [31]:
semantic_search("Ich hatte einen Zusammenstoss mit einer Wildsau", insurance_conditions, top_n=5)

Unnamed: 0,id,Teil,Titel,Untertitel,Text,similarity
62,C2.7,Kaskoversicherung: Schäden an Ihrem Fahrzeug,"Schäden durch Natur, Tiere\nund Unbekannte (Te...",Kollision mit Tieren,"C2.7) Schäden durch Natur, Tiere\nund Unbekann...",0.808831
28,A12.3,Rahmenbedingungen des Versicherungsvertrags,Schadenfall,Kasko,A12.3) Schadenfall/Kasko: Die bzw. der Anspruc...,0.803845
59,C2.4,Kaskoversicherung: Schäden an Ihrem Fahrzeug,"Schäden durch Natur, Tiere\nund Unbekannte (Te...",Naturereignisse,"C2.4) Schäden durch Natur, Tiere\nund Unbekann...",0.8005
112,D8.1,Services und Zusatzleistungen,Verletzung an Ihnen und Mitfahrenden,Versicherungsschutz,D8.1) Verletzung an Ihnen und Mitfahrenden/Ver...,0.792004
61,C2.6,Kaskoversicherung: Schäden an Ihrem Fahrzeug,"Schäden durch Natur, Tiere\nund Unbekannte (Te...",Schäden durch Marder und Nagetiere,"C2.6) Schäden durch Natur, Tiere\nund Unbekann...",0.791725


... also works (to some extent) when using a different language in the query

In [32]:
semantic_search("I crashed into a deer", insurance_conditions, top_n=5)

Unnamed: 0,id,Teil,Titel,Untertitel,Text,similarity
88,D1,Services und Zusatzleistungen,Grobfahrlässigkeit,,D1) Grobfahrlässigkeit: Bei grobfahrlässiger V...,0.7725
62,C2.7,Kaskoversicherung: Schäden an Ihrem Fahrzeug,"Schäden durch Natur, Tiere\nund Unbekannte (Te...",Kollision mit Tieren,"C2.7) Schäden durch Natur, Tiere\nund Unbekann...",0.77179
34,A12.8,Rahmenbedingungen des Versicherungsvertrags,Schadenfall,Angetrunkener und fahrunfähiger Zustand oder k...,A12.8) Schadenfall/Angetrunkener und fahrunfäh...,0.764479
59,C2.4,Kaskoversicherung: Schäden an Ihrem Fahrzeug,"Schäden durch Natur, Tiere\nund Unbekannte (Te...",Naturereignisse,"C2.4) Schäden durch Natur, Tiere\nund Unbekann...",0.763711
58,C2.3,Kaskoversicherung: Schäden an Ihrem Fahrzeug,"Schäden durch Natur, Tiere\nund Unbekannte (Te...","Glasbruch an Front-, Heck- und Seitenscheiben","C2.3) Schäden durch Natur, Tiere\nund Unbekann...",0.762035


### Demo 1: Simple coverage check based on policy and terms & conditions

Load Policy and print extract

In [13]:
with open('data/MF-Police.json', 'r') as f:
    policy = json.load(f)

print(json.dumps(policy, indent=4)[:1000] + "\n...")

{
    "Offerte": {
        "Police": {
            "Motorfahrzeugversicherung": "OPTIMA",
            "PoliceNr": "16.167.745",
            "Versicherungsnehmer": "Max Mustermann",
            "Adresse": "Pionierstrasse 3, 8400 Winterthur",
            "Beginn": "11.06.2021",
            "Ablauf": "31.12.2024",
            "Zahlungsart": "j\u00e4hrlich",
            "Jahrespr\u00e4mieF\u00e4lligAm": "01.01.",
            "Vertragsgrundlagen": "Allgemeinen Vertragsbedingungen (AVB) f\u00fcr STRADA, Ausgabe 10.2013, www.axa.ch/doc/aacew",
            "Pr\u00e4mien\u00fcbersicht": {
                "Haftpflicht": {
                    "Jahrespr\u00e4mieBruttoInCHF": 1065.3,
                    "Pr\u00e4mienstufe": 30,
                    "Jahrespr\u00e4mieNettoInCHF": 319.59,
                    "GesetzlicheAbgabenInCHF": 22.58
                },
                "Kasko": {
                    "Kollision": {
                        "Jahrespr\u00e4mieBruttoInCHF": 1324.3,
                  

Implement function for coverage check

1. Find passages from insurance conditions that are most relevant for the given claim
2. Ask an LLM whether the claim is covered given the insurance conditions, the policy and the accident description

**TASK**: Complete the given code by replacing the placeholder `...` with the actual code

In [6]:
coverage_check_prompt = ChatPromptTemplate.from_template("""
    Check whether and / or to what extent the given claim as described in the damage description is covered by the insurance.
    Policy: '''{policy} '''
    General insurance conditions: '''{insurance_conditions} '''
    Damage description: '''{damage_description} '''
""")

def check_coverage(damage_description, policy=policy, insurance_conditions=insurance_conditions, prompt=coverage_check_prompt, llm=gpt4):
    """Check if a given claim is covered by the insurance by extracting relevant insurance conditions and calling an LLM"""
    
    # Find relevant passages from general insurance conditions
    relevant_conditions = ...
    
    relevant_conditions = "\n\n".join(relevant_conditions["Text"])

    # Build chain and call LLM
    chain = prompt | llm
    return chain.invoke({"insurance_conditions": relevant_conditions, "policy": policy, "damage_description": damage_description})

#### Check coverage

In [19]:
check_coverage("I had an accident at the Hallauer Bergrennen").content

'Gemäß den Vertragsbedingungen sind Schäden, die bei der Teilnahme an Rennen und ähnlichen Fahrten entstehen, nicht versichert (B6.2, C11.3). Da der Schadenhergang beim Hallauer Bergrennen auftrat, handelt es sich um eine Rennveranstaltung, und der Schaden ist daher nicht gedeckt.\n\nDa der Schadenhergang nicht gedeckt ist, werden die Kosten für Reparatur, Rettung, Bergung und andere Leistungen des Versicherers nicht übernommen. Es ist ratsam, sich an den Versicherer zu wenden, um weitere Details und Informationen zu erhalten.'

In [20]:
check_coverage(
    "I was driving on the N1 from Winterthur to St. Gallen when a deer was crossing the street all of a sudden. "
    "I couldn't stop in time and crashed into the deer."
).content

'Gemäß den Vertragsbedingungen ist der beschriebene Schadenhergang gedeckt. Der Schaden wird gemäß den Bedingungen der Teilkasko-Versicherung als Schaden durch Zusammenstoß mit Tieren behandelt. Die Versicherung übernimmt die Kosten für die Reparatur des Fahrzeugs. Es ist jedoch erforderlich, dass der Anspruchsberechtigte der Versicherung ermöglicht, das beschädigte Fahrzeug vor der Reparatur zu besichtigen.'

In [21]:
check_coverage(
    "I parked my car at the Migros in Rosenberg. When coming back to the car, "
    "I noticed a dent that obviously had been caused by another car while I was gone."
).content

'Gemäß den Vertragsbedingungen ist der beschriebene Schadenhergang durch die Kaskoversicherung abgedeckt. Da es sich um einen Schaden handelt, der durch ein anderes Fahrzeug verursacht wurde, wird er unter der Kategorie "Schäden am parkierten Fahrzeug" behandelt. Die genauen Leistungen sind in der Police aufgeführt. Es ist jedoch zu beachten, dass die Anzahl versicherter Schadenfälle in der Police pro Versicherungsjahr begrenzt sein kann. In diesem Fall gilt die Begrenzung unabhängig von einem Fahrzeugwechsel und den Monaten, in denen der Vertrag im Kalenderjahr in Kraft war. Es ist ratsam, den Schaden unverzüglich der Versicherung zu melden und ihre Einwilligung für Reparaturen einzuholen, um sicherzustellen, dass der Schaden gedeckt ist.'

**TASK**: Come up with other damage / accident descriptions and check whether they are covered. Adapt the `coverage_check_prompt` to see how it influences the output.

### Demo 2: Agentic Workflows & ReAct

While the simple approach from above might work in simple cases, it has various limitations:
* The semantic search is always executed with the query given by the user. Sometimes this might not find all relevant passages (e.g. for long queries).
* The LLM has no access to current information such as today's date
* LLMs are notoriously bad at maths (although they might think otherwise)

One solution is to give the LLM access to external tools such as a calculator and have it decide itself when to use them.

A way to implement this is with **ReAct**. In this approach, the LLM is prompted to think at each step what it needs to do in order to solve a given query. It then can either decide that it has all required information to answer the query or that it has to use another tool at its disposal in order to get additional information:
<img src="data/react.png" alt="image" width="800" height="auto">
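
The loop behind ReAct can be sketched without any framework. This is a simplified illustration, not how LangChain implements it: `TOOLS`, `scripted_llm` and `react_loop` are hypothetical names, and the "LLM" is a scripted stand-in so that the example runs offline.

```python
import re
from typing import Callable, Dict

# Hypothetical tools for illustration only
TOOLS: Dict[str, Callable[[str], str]] = {
    "add": lambda expr: str(sum(int(part) for part in expr.split("+"))),
}


def scripted_llm(transcript: str) -> str:
    """Stand-in for a real LLM: picks the next step based on the transcript so far."""
    if "Observation:" not in transcript:
        return "Thought: I need to add the numbers.\nAction: add\nAction Input: 2+3"
    last_observation = transcript.rsplit("Observation:", 1)[1].strip()
    return f"Thought: I now know the final answer\nFinal Answer: {last_observation}"


def react_loop(question: str, llm: Callable[[str], str], max_steps: int = 5) -> str:
    """Alternate between LLM replies and tool calls until a final answer appears."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = llm(transcript)
        transcript += reply + "\n"
        final = re.search(r"Final Answer:\s*(.*)", reply)
        if final:
            return final.group(1).strip()
        action = re.search(r"Action:\s*(.*)", reply)
        action_input = re.search(r"Action Input:\s*(.*)", reply)
        if not action or not action_input:
            raise ValueError("Reply contained neither a final answer nor a tool call")
        # Run the requested tool and feed its result back as an observation
        observation = TOOLS[action.group(1).strip()](action_input.group(1).strip())
        transcript += f"Observation: {observation}\n"
    raise RuntimeError("No final answer within max_steps")


print(react_loop("What is 2+3?", scripted_llm))
```

The growing transcript plays the role of the agent's scratchpad: each new LLM call sees all previous thoughts, actions and observations.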

Import required dependencies

In [7]:
from datetime import date

from langchain.tools import tool
from langchain.agents import AgentExecutor, create_react_agent

from src.utils import Calculator

Define the available tools as Python functions. The docstrings are made available to the LLM and should therefore be as descriptive as possible.

In [16]:
@tool
def get_current_date() -> str:
    """
    Get the current date. Always use this tool when dates are relevant to answer the query.
    """
    return date.today().strftime("%Y-%m-%d")


@tool
def get_relevant_terms_and_conditions(query: str) -> str:
    """
    Get the relevant terms and conditions for a given query by doing a semantic search.
    Use this tool multiple times with different queries if you are missing relevant information.
    """
    return "\n\n".join(semantic_search(query, insurance_conditions)["Text"])


@tool
def make_calculation(expression: str) -> str:
    """
    Make a calculation by solving the given mathematical expression.
    Always use this tool when making calculations. Don't try to make calculations without it.
    """
    # Make sure the expression is well formatted
    expression = expression.replace("'", "").replace('"', '').strip()
    
    calc = Calculator()
    return calc.eval_expr(expression)


@tool
def get_current_car_value(model: str) -> int:
    """Get the value of the specified car model in CHF"""
    return 40000  # Dummy value
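
The `Calculator` class from `src.utils` is not shown in this notebook. Since the expression comes from an LLM, it should not be passed to `eval()`. One possible approach, sketched here under the assumption that only basic arithmetic is needed, is to walk the parsed expression tree and allow a fixed set of operators (`safe_eval` is a hypothetical name, not the actual `Calculator` implementation):

```python
import ast
import operator

# Whitelisted operators; anything else is rejected
_ALLOWED_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_ALLOWED_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_eval(expression: str) -> float:
    """Evaluate a plain arithmetic expression without calling eval()."""

    def _eval(node):
        # Numeric literals (booleans are excluded on purpose)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _ALLOWED_BINARY:
            return _ALLOWED_BINARY[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _ALLOWED_UNARY:
            return _ALLOWED_UNARY[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")

    return _eval(ast.parse(expression, mode="eval").body)


# Repair estimate from the example claim relative to the dummy car value
print(safe_eval("25000 / 40000"))
```
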

Define the *ReAct* Prompt. This is just a template and can be adapted if needed.

In [18]:
react_prompt = ChatPromptTemplate.from_template("""
Answer the following questions as best you can. You have access to the following tools:

{tools}
    
Use the following format:

Question: the input question you must answer

Thought: you should always think about what to do

Action: the action to take, should be one of [{tool_names}]

Action Input: the input to the action

Observation: the result of the action

... (this Thought/Action/Action Input/Observation can repeat N times)

Thought: I now know the final answer

Final Answer: the final answer to the original input question in the language of the question

Begin!

Question: Check whether and / or to what extent the given claim as described in the damage description is covered by the insurance. Policy: '''{policy} ''' Damage description: '''{damage_description} '''

Thought:{agent_scratchpad}
""")

Initialize the ReAct agent

In [19]:
tools = [get_current_date, get_relevant_terms_and_conditions, make_calculation, get_current_car_value]

agent = create_react_agent(gpt4, tools, react_prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

In [None]:
def react_coverage_check(damage_description):
    return agent_executor.invoke({"policy": policy, "damage_description": damage_description})

Now we are going to execute the coverage check with *ReAct*. Whether the given damage is considered a total loss, i.e. whether the insurance covers the replacement of the car or only the repair, depends on several aspects:
* How old is the car?
* What would repairing the car cost in comparison to the value of the car?
* Does the policy include a "purchase price guarantee"?

See section C10.2 in the [insurance conditions](data/MF-GIC.pdf) for more details.

In [20]:
react_coverage_check(
    "I crashed into a tree with my Tesla last week while driving to work. The repair cost estimate is 25000 CHF. "
    "Is this considered a total loss? In other words, will my insurance cover the costs to replace the car or only pay for the repair?"
)

**TASKs**: 
* Execute the same coverage check multiple times to see what happens
* Adapt the prompt to see if you can find a way to make the agent reason correctly
* Come up with additional claims to see how it changes the behaviour of the agent
* Change the LLM from `gpt4` to `gpt35` to see how it influences the behaviour