<br>
<b><font size="6">Knowledge graph-based RAG testing</font></b>
<br><br>
<font size="3">Date: 2024.07.02.</font><br>
<font size="3">Author: Petkovics Tamás &lt;<a href="mailto:petkow@gmail.com">petkow@gmail.com</a>&gt;</font>


# Introduction<br>

<font size="4">
    
**Business Problem**<br> 
Utilizing raw, unstructured financial news data to derive financially meaningful, accurate, and consistent signals for quantitative investors is challenging.


**Business Question**<br> 
How can we systematically process and extract accurate knowledge from unstructured financial news data?


**Proposal**<br> 
Employ fine-tuned Large Language Models (LLMs) augmented with curated knowledge from Knowledge Graphs to accurately interpret, consistently analyze, and transform raw news data into a structured format with substantial market value.
</font>

## Knowledge Graphs as a Source for Retrieval-Augmented Generation (RAG)<br> 
<font size="4">  
Constructing knowledge graphs with LLMs is well explored, but using them effectively as a retrieval source is less common. Knowledge graphs are best built primarily from factual, curated, and structured data, which is the kind of background knowledge that LLMs and laypersons often lack when interpreting news. Through ontologies and symbolic reasoning, a knowledge graph can complement the stochastic nature of LLMs with logic-based rules and implicit (inferred) knowledge, improving consistency and accuracy.
</font>
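
To make the idea of logic-based rules and implicit knowledge concrete, here is a minimal sketch in plain Python. It assumes a toy set of triples and two hand-written RDFS-style rules (subclass transitivity and type inheritance). The class names and the `infer` helper are hypothetical illustrations; a real setup would delegate this work to an RDFS/OWL reasoner.

```python
# Toy forward chaining over (subject, predicate, object) triples.
# All triples and class names here are hypothetical examples.

def infer(triples):
    """Apply two RDFS-style rules repeatedly until no new triples appear."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p != "subClassOf":
                continue
            for s2, p2, o2 in facts:
                # Rule 1: subClassOf is transitive.
                if p2 == "subClassOf" and s2 == o:
                    new.add((s, "subClassOf", o2))
                # Rule 2: an instance of a subclass is an instance of the superclass.
                if p2 == "type" and o2 == s:
                    new.add((s2, "type", o))
        if new <= facts:
            return facts
        facts |= new


explicit = {
    ("SolarSpark", "type", "RetailEnergyCompany"),
    ("RetailEnergyCompany", "subClassOf", "EnergyCompany"),
    ("EnergyCompany", "subClassOf", "Company"),
}

# Triples that were never stated but follow from the rules.
inferred = infer(explicit) - explicit
for triple in sorted(inferred):
    print(triple)
```

The inferred triples are deterministic: the same input always yields the same output, which is the property that complements a stochastic LLM.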

![Image](image3.png)

## Knowledge Graph Implementations (High-Level Architectural Considerations)<br> 
<font size="4">
    
The choice of implementation has significant implications for capabilities, particularly in terms of symbolic reasoning and data exchange. There are two main frameworks:
</font>

### Semantic Web (W3C), RDF, and OWL<br>  
<font size="4">
    
**Pros:**
- Supports symbolic reasoning and inter-organizational data exchange
- Based on open specifications, avoiding vendor lock-in

**Cons:**
- Less efficient implementations
- Rigid, with many constraints due to specifications
- Steeper learning curve
</font>

### Labeled Property Graphs (e.g., Neo4j)<br>  
<font size="4">    

**Pros:**
- Efficient, quick, and flexible
- Easier to learn

**Cons:**
- Lacks standards, leading to vendor lock-in
- No native support for symbolic reasoning or federated data exchange
</font>
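
The structural difference between the two models can be sketched in a few lines of Python. The example below assumes a hypothetical acquisition fact with a made-up edge property (`year`) and a hypothetical helper `lpg_to_triples`. It shows that a property graph attaches properties directly to an edge, while plain RDF triples need an extra statement node (reification) to say something about a relationship.

```python
# Hypothetical data: one acquisition fact in a labeled property graph.
lpg_nodes = {
    1: {"labels": ["Company"], "props": {"name": "GreenEnergy Holdings LLC"}},
    2: {"labels": ["Company"], "props": {"name": "SolarSpark Inc."}},
}
lpg_edges = [
    {"start": 2, "type": "ACQUIRES", "end": 1, "props": {"year": 2023}},
]


def lpg_to_triples(nodes, edges):
    """Flatten a property graph into (subject, predicate, object) triples."""
    triples = []
    for node_id, node in nodes.items():
        subject = f"node:{node_id}"
        for label in node["labels"]:
            triples.append((subject, "rdf:type", label))
        for key, value in node["props"].items():
            triples.append((subject, key, value))
    for index, edge in enumerate(edges):
        start, end = f"node:{edge['start']}", f"node:{edge['end']}"
        triples.append((start, edge["type"], end))
        if edge["props"]:
            # A plain triple cannot carry properties, so describe the
            # relationship through a separate statement node.
            statement = f"stmt:{index}"
            triples.append((statement, "rdf:subject", start))
            triples.append((statement, "rdf:predicate", edge["type"]))
            triples.append((statement, "rdf:object", end))
            for key, value in edge["props"].items():
                triples.append((statement, key, value))
    return triples


triples = lpg_to_triples(lpg_nodes, lpg_edges)
for triple in triples:
    print(triple)
```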

***
**Semantic web stack:**
![Semantic Web Stack](sws.png)

![Ontology](onto.svg)

## Showcasing a Small-Scale RAG Application with Agentic Designs<br> 
<font size="4">  
This section demonstrates the practical application of a small-scale Retrieval-Augmented Generation system using agentic designs.
</font>

![Image](image4.png)

# Setup and imports

In [1]:
# Claude
import anthropic
from tenacity import retry, stop_after_attempt, wait_random_exponential

# Neo4J
from py2neo import Graph

# RDF
from rdflib import Graph as RDFGraph
from rdflib.plugins.stores.sparqlstore import SPARQLStore

# RAG components

## RDF

In [2]:
fuseki_store = SPARQLStore("http://fuseki:3030/ds")
rdf_graph = RDFGraph(store=fuseki_store)

In [3]:
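def generate_search_query_rdf(search_strings):
    """Build a SPARQL DESCRIBE query for named individuals whose rdfs:label matches
    any of the search strings (case-insensitive regex), plus the named individuals
    reachable from them within two hops."""
    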
    if isinstance(search_strings, list):
        regex_pattern = '|'.join(search_strings)
    else:
        regex_pattern = search_strings
    
    sparql_template = """
        PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
        PREFIX owl: <http://www.w3.org/2002/07/owl#>
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        DESCRIBE ?subject ?obj1 ?obj2 WHERE {{

            ?subject rdfs:label ?label .
            ?subject a owl:NamedIndividual .
            
            FILTER(REGEX(STR(?label), "{regex_pattern}", "i"))
            
             OPTIONAL {{
                 ?subject ?p ?obj1 .
                 ?obj1 a owl:NamedIndividual .
                 
                       OPTIONAL {{
                         ?obj1 ?p2 ?obj2 .
                         ?obj2 a owl:NamedIndividual .
                       }}
                            
                 }}

                
        }}
    """
    return sparql_template.format(regex_pattern=regex_pattern)

def rdf_query(search_string):
    result = rdf_graph.query(generate_search_query_rdf(search_string))
        
    return result.serialize(format='ttl', encoding='utf-8').decode('utf-8')
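
The query builder above places the search strings directly into a regular expression inside a SPARQL string literal, so a term such as `Inc.` or one containing a double quote can change the meaning of the pattern or break the query. The following sketch shows one way to neutralise such characters before joining the terms. The helper names `escape_for_sparql_regex` and `build_regex_pattern` are hypothetical and not part of rdflib.

```python
import re

# Characters with special meaning in a regular expression.
_REGEX_META = re.compile(r'([\\.^$|?*+()\[\]{}])')


def escape_for_sparql_regex(term):
    """Make `term` match literally inside a double-quoted SPARQL REGEX pattern."""
    # Step 1: put a backslash in front of every regex metacharacter.
    literal = _REGEX_META.sub(r'\\\1', term)
    # Step 2: escape backslashes and double quotes for the SPARQL string literal.
    return literal.replace('\\', '\\\\').replace('"', '\\"')


def build_regex_pattern(search_strings):
    """Join one or more search strings into a single alternation pattern."""
    if isinstance(search_strings, str):
        search_strings = [search_strings]
    return '|'.join(escape_for_sparql_regex(s) for s in search_strings)


print(build_regex_pattern(['SolarSpark Inc.', 'Jane Doe']))
```

The result could be passed as `regex_pattern` to the template's `format` call in place of the plain `'|'.join(...)`.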

In [4]:
# Testing with some keywords
print(rdf_query(['SolarSpark', 'GreenEnergy Holdings', 'Jane Doe', 'retail energy services']))

@prefix : <http://www.example.org/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

:Green_Energy a :Company,
        rdfs:Resource,
        owl:NamedIndividual,
        owl:Thing ;
    rdfs:label "GreenEnergy Holdings LLC" ;
    :EBITDA_YoY_change "-40%" ;
    :companyEvent :RegulatoryEvent ;
    :customerChurnRate "25%" ;
    owl:sameAs :Green_Energy .

:SolarSpark a :Company,
        rdfs:Resource,
        owl:NamedIndividual,
        owl:Thing ;
    rdfs:label "SolarSpark Inc." ;
    :EBITDA_YoY_change "-15%" ;
    :debtToEquityRatio "2.8" ;
    :marketCapitalization "$300 million" ;
    :recentAcquisitions "3.5" ;
    :recentLayoffs "20% of workforce" ;
    owl:sameAs :SolarSpark .

:RegulatoryEvent a :Event,
        rdfs:Resource,
        owl:NamedIndividual,
        owl:Thing ;
    rdfs:label "GreenEnergy regulatory event" ;
    rdfs:comment "GreenEnergy Holdings is facing potential license revocation in two major market

In [5]:
# Testing with some keywords
print(rdf_query(['EuroProperty Advisors', 'e-commerce', 'industrial real estate', 'warehouse', 'Ruhr Area', 'Greater London', 'Paris', 'Warsaw']))

@prefix : <http://www.example.org/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

_:N4c1085ebf3404ee08b809eb2a928a8e3 a rdfs:Class,
        rdfs:Resource,
        owl:Class,
        owl:Restriction ;
    rdfs:subClassOf _:N4c1085ebf3404ee08b809eb2a928a8e3,
        _:Nd7a34725adae49d286baa0b3f9f8eced,
        :GrowingIndustry,
        :Industry,
        rdfs:Resource,
        owl:Thing ;
    owl:equivalentClass _:N4c1085ebf3404ee08b809eb2a928a8e3,
        _:Nd7a34725adae49d286baa0b3f9f8eced,
        :GrowingIndustry ;
    owl:onProperty :industryExempted ;
    owl:someValuesFrom :Regulation .

_:Nd7a34725adae49d286baa0b3f9f8eced a rdfs:Class,
        rdfs:Resource,
        owl:Class ;
    rdfs:subClassOf _:N4c1085ebf3404ee08b809eb2a928a8e3,
        _:Nd7a34725adae49d286baa0b3f9f8eced,
        :GrowingIndustry,
        :Industry,
        rdfs:Resource,
        owl:Thi

## Neo4j

In [6]:
neo4j_graph = Graph("bolt://neo4j:7687", auth=("neo4j", "password"))

### Query test

In [7]:
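def generate_company_search_query_neo4j(search_string):
    """Build a Cypher query returning paths of one or two relationships (in either
    direction) that start at nodes whose name matches any of the search strings
    (case-insensitive regex)."""
    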
    if isinstance(search_string, list):
        regex_pattern = '|'.join(search_string)
    else:
        regex_pattern = search_string
    
    cypher_template = """
    MATCH p=(source)-[r *1..2]-(target)
    WHERE source.name =~ '(?i).*({regex_pattern}).*'
    
    RETURN source AS source_node,
           target AS related_node,
           [rel IN relationships(p) | {{
               type: type(rel),
               properties: properties(rel),
               start_node: startNode(rel).name,
               end_node: endNode(rel)
           }}] AS relationships
    """
    return cypher_template.format(regex_pattern=regex_pattern)

def neo4j_query(search_string):
    results = neo4j_graph.run(generate_company_search_query_neo4j(search_string)).data()
    
    output = []
    for record in results:
        source = record['source_node']
        related = record['related_node']
        relationships = record['relationships']
        
        output.append(f"Source: {source['name']}")
        output.append(f"Related: {related}")
        for rel in relationships:
            output.append(f"Relationship: {rel['start_node']} --[{rel['type']}]--> {rel['end_node']}")
            output.append(f"Properties: {rel['properties']}")
        output.append("---")
    
    return "\n".join(output)
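
Because the Cypher text above is assembled with string formatting, a search term containing a quote or a regex metacharacter can alter the query. A common alternative is to keep the query text fixed and pass the pattern as a query parameter. The sketch below only builds the query and its parameters; `build_cypher_search` is a hypothetical helper, and the `run` call in the closing comment assumes that py2neo's `Graph.run` accepts parameters as keyword arguments, which should be checked against the installed version.

```python
import re

# Fixed query text: the pattern arrives through the $pattern parameter.
CYPHER_SEARCH = """
MATCH p=(source)-[r *1..2]-(target)
WHERE source.name =~ $pattern
RETURN source AS source_node,
       target AS related_node,
       [rel IN relationships(p) | {
           type: type(rel),
           properties: properties(rel),
           start_node: startNode(rel).name,
           end_node: endNode(rel)
       }] AS relationships
"""


def build_cypher_search(search_strings):
    """Return the fixed query text and a parameter dict for the search terms."""
    if isinstance(search_strings, str):
        search_strings = [search_strings]
    # re.escape makes every term match literally.
    alternation = '|'.join(re.escape(term) for term in search_strings)
    return CYPHER_SEARCH, {'pattern': f'(?i).*({alternation}).*'}


query, params = build_cypher_search(['GreenEnergy', 'SolarSpark Inc.'])
print(params['pattern'])
# Assumed usage: neo4j_graph.run(query, **params).data()
```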

In [8]:
# Testing with some keywords
print(neo4j_query('GreenEnergy'))

Source: GreenEnergy Holdings LLC
Related: (_2:Company {EBITDA_YoY_change: '-15%', debtToEquityRatio: '2.8', marketCapitalization: '$300 million', name: 'SolarSpark Inc.', recentAcquisitions: '3.5', recentLayoffs: '20% of workforce'})
Relationship: SolarSpark Inc. --[ACQUIRES]--> (_1:Company {EBITDA_YoY_change: '-40%', customerChurnRate: '25%', name: 'GreenEnergy Holdings LLC'})
Properties: {}
---
Source: GreenEnergy Holdings LLC
Related: (_14:Event {comment: 'GreenEnergy Holdings is facing potential license revocation in two major markets due to alleged fraudulent practices.', name: 'GreenEnergy regulatory event'})
Relationship: GreenEnergy Holdings LLC --[COMPANY_EVENT]--> (_14:Event {comment: 'GreenEnergy Holdings is facing potential license revocation in two major markets due to alleged fraudulent practices.', name: 'GreenEnergy regulatory event'})
Properties: {}
---
Source: GreenEnergy regulatory event
Related: (_1:Company {EBITDA_YoY_change: '-40%', customerChurnRate: '25%', name:

# LLM agents

In [9]:
claude_client = anthropic.Anthropic()

In [10]:
@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
def call_claude(messages, system_prompt = '', tool = False):

    if tool:
        tool_params = {
            'tool_choice': {"type": "any"},
            'tools' : [kg_search_tool_definition, submission_tool_definition]
        }
    else:
        tool_params = {}
    
    response = claude_client.messages.create(
        model='claude-3-5-sonnet-20240620',
        max_tokens=4000,
        system=system_prompt,
        messages=messages,
        temperature=0,
        **tool_params
        
    )

    return response

## KG-to-text agent

In [11]:
system_prompt_kg2txt = """You are a graph-to-text converter AI. You will receive a knowledge graph extract in RDF Turtle format or a Neo4j labeled property graph extract, and you need to translate it into natural language to support the decision making of a financial analyst.
You will also receive the news article to which the supplemental knowledge from the knowledge graph pertains, and you need to provide a textual summary of the relevant facts from the graph that might help the analyst understand the context of the news article accurately."""

In [12]:
main_prompt_kg2txt = """
Within the tags '<graph></graph>' you will receive a raw subgraph data from a knowledge graph. Within the tags '<article></article>' you will receive a financial news article.

<graph>
{}
</graph>

<article>{}</article>

Translate the graph data into a natural language format based summary to support the analysis of the news article."""

In [13]:
def kg_query_agent(news_article_str, search_strings, source='rdf'):

    if source == 'rdf':
        query_results = rdf_query(search_strings)
    elif source == 'neo4j':
        query_results = neo4j_query(search_strings)
    else:
        raise ValueError("Invalid source parameter value: should be one of these 'rdf', 'neo4j'")
    
    messages = [{'role': 'user','content':[{'type': 'text','text': main_prompt_kg2txt.format(query_results, news_article_str)}]}]
    
    response = call_claude(messages, system_prompt_kg2txt)
    
    return response.content[0].text

## Analyst agent

In [14]:
system_prompt_analyst = """You are a financial news analyst AI. Your task is to read financial news articles, identify key entities and asset classes relevant to the news, and provide a sentiment score on a scale of -10 to 10 predicting investor sentiment.

- **Sentiment Scale:**
  - -10: Extremely bearish sentiment
  - 0: Neutral sentiment
  - +10: Extremely bullish sentiment

- **Process:**
     - Base your analysis on the information provided in the news article.
     - Provide a sentiment score with a brief rationale (1-5 sentences) explaining your decision, considering financial metrics, regulatory issues, industry trends, macroeconomic events, and entity-specific events.
     - Identify key entities (e.g., companies, countries, industries, financial instruments, regions) from the news article for which additional context is needed.
     - Use the knowledge graph tool to query relevant details about these entities.
     - Extend your analysis with the new information, as it may reveal crucial details not present in the public news release.
     - Update your sentiment score and rationale based on the additional context.

- **Important Considerations:**
  - Assess how the news might affect future prospects, market positions, financial health, and macroeconomic conditions.
  - Be willing to make substantial changes to your sentiment score when presented with additional context, as the knowledge graph often contains non-public information that can alter the interpretation of the news event.

Submit your results after completing the contextual analysis."""

In [15]:
main_prompt_template_analyst = """Within the tags '<article></article>' you will receive a relevant news article.

<article>{}</article>

Provide your sentiment analysis of this news article after you have identified key entities and requested additional context from the knowledge graph, for which a tool is available to you."""

### Claude tool definitions

In [16]:
kg_search_tool_definition = {
    "name": "query_knowledge_graph",
    "description": "Retrieve supplemental information from a knowledge graph agent on companies, organizations, instruments and people. The query will result in a textual response from the agent containing additional context in a natural language format.",
    "input_schema": {
        "type": "object",
        "properties": {
                "search_strings": {
                    "type": "array",
                    "items": {
                        "type": "string",
                        "description": "List of search strings, for which every element labeled similarly should be retrieved from the knowledge graph, with a connected subgraph containing details and relations of the target entity. Provide simple stems of strings and avoid multiple words if possible as the search is based on a simple case insensitive regex matching."
                    }
                }
        },
        "required": ["search_strings"]
    }
}


# Submission tool
submission_tool_definition = {
    "name": "submit_results",
    "description": "Submit sentiment analysis results",
    "input_schema": {
           "type":"object",
   "properties":{
      "summary":{
         "description":"Short summary of the news article",
         "type":"string",
         "maxLength":600
      },
      "key_entities":{
         "type":"array",
         "items":{
            "type":"string",
            "description":"list of key entities, companies, instruments to which the article pertains to"
         }
      },
      "region":{
            "type":"string",
            "enum":[
               "US",
               "CA", 
               "UK",
               "EU",
               "AU",
               "JP",
               "CN",
               "GLOBAL"
            ],
            "description":"The economic region where this article might be relevant."
      },    
      "asset_class":{
         "type":"array",
         "items":{
            "type":"string",
            "enum":[
               "EQUITIES",
               "FIXED INCOME",
               "CASH",
               "REAL ESTATE",
               "COMMODITIES"
            ],
            "description":"List of asset classes for which the article and the resulting investment sentiment might be relevant"
         }
      },
    "sentiment_score_per_asset_class":{
         "type":"array",
         "items":{
            "type":"object",
            "properties":{
                "asset_class":{
                    "type":"string",
                    "enum":[
                       "EQUITIES",
                       "FIXED INCOME",
                       "CASH",
                       "REAL ESTATE",
                       "COMMODITIES"
                    ]},
                "sentiment_score": {
                     "type":"number",
                     "minimum":-10,
                     "maximum":10
                }
            },      
            "description":"List of objects, which provide a sentiment score from -10 to +10 for every relevant asset class."
         }
      },
      "overall_sentiment_score":{
         "description":"The overall sentiment score of the article, a value from -10 to +10.",
         "type":"number",
         "minimum":-10,
         "maximum":10
      },
      "rationale":{
         "description":"The short summary on the rationale behind choosing the particular sentiment value.",
         "type":"string",
         "maxLength":300
       },
     "confidence":{
         "description":"A confidence score on the analysis based on how much information and context is available to provide a firm analysis on the situation. Minimum number of 0 means, that the analysis is completely arbitrary, not enough information is available to provide a valid sentiment score. 100 means the analysis is very thorough, all the relevant information was available to provide a definite sentiment analysis.",
         "type":"number",
         "minimum":0,
         "maximum":100
      },
      },
        "required": ["summary", "key_entities", "asset_class", "region", "overall_sentiment_score", "sentiment_score_per_asset_class", "confidence"]
    }
}
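
Hand-written JSON schemas like the two tool definitions above are easy to get subtly wrong, for instance by listing a name under `required` that does not appear under `properties`. A small sanity check, sketched below, can catch that before the definitions are sent to the API. `missing_required` is a hypothetical helper and not part of the Anthropic SDK; the example definition is made up for the demonstration.

```python
def missing_required(tool_definition):
    """Return the names listed in `required` that are absent from `properties`."""
    schema = tool_definition["input_schema"]
    properties = schema.get("properties", {})
    return [name for name in schema.get("required", []) if name not in properties]


# Hypothetical definition with a deliberate mismatch (singular vs. plural key).
example_tool = {
    "name": "example_tool",
    "input_schema": {
        "type": "object",
        "properties": {"search_strings": {"type": "array"}},
        "required": ["search_string"],
    },
}

print(missing_required(example_tool))
```

Running the same check over `kg_search_tool_definition` and `submission_tool_definition` should return an empty list for each.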

### API Call functions

In [17]:
def claude_news_analytics(news_article_str, RAG=None):
    messages = [{'role': 'user','content':[{'type': 'text','text': main_prompt_template_analyst.format(news_article_str)}]}]

    response = call_claude(messages, system_prompt_analyst, tool = True )
    messages.append({"role": "assistant", "content": response.content})

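    # With tool_choice "any" the model may call either tool, so make sure the
    # first call really is the knowledge graph search before reading its input.
    tool_request = response.content[-1]
    if tool_request.name != 'query_knowledge_graph':
        raise ValueError(f"Expected a 'query_knowledge_graph' tool call, got '{tool_request.name}'")
    search_strings = tool_request.input['search_strings']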

    if RAG == 'rdf':
        query_results = kg_query_agent(news_article_str, search_strings, source='rdf')
    elif RAG == 'neo4j':
        query_results = kg_query_agent(news_article_str, search_strings, source='neo4j')
    elif RAG is None:
        query_results = 'No additional knowledge available on these topics.'
    else:
        raise ValueError("Invalid RAG parameter value: should be one of these 'rdf', 'neo4j' or None")

    tool_response = {
        "role": "user",
        "content": [
            {
            "type": "tool_result",
            "tool_use_id": tool_request.id,
            "content": query_results
            }
        ]
    }
    messages.append(tool_response)

    response = call_claude(messages, system_prompt_analyst, tool = True )
    
    return response.content[-1].input, messages
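
The function above picks the tool call by position (`response.content[-1]`). A slightly more defensive variant looks the block up by type and tool name. The sketch below uses `SimpleNamespace` objects as stand-ins for the SDK's content blocks, assuming they expose `type`, `name`, `id` and `input` attributes; `find_tool_use` is a hypothetical helper.

```python
from types import SimpleNamespace


def find_tool_use(content_blocks, tool_name):
    """Return the first tool_use block that calls `tool_name`, or None."""
    for block in content_blocks:
        if getattr(block, "type", None) == "tool_use" and block.name == tool_name:
            return block
    return None


# Stand-in objects mimicking a response that contains text and a tool call.
mock_content = [
    SimpleNamespace(type="text", text="Let me look up the entities."),
    SimpleNamespace(
        type="tool_use",
        id="toolu_example",
        name="query_knowledge_graph",
        input={"search_strings": ["SolarSpark", "GreenEnergy"]},
    ),
]

request = find_tool_use(mock_content, "query_knowledge_graph")
print(request.input["search_strings"])
```

A `None` result signals that the model chose a different tool, which the caller can then handle explicitly.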

# Live demo

## Mockup news 1

In [18]:
mockup_news_1 = """HOUSTON, Jan. 15, 2024 (GLOBE NEWSWIRE) -- SolarSpark Inc. ("SolarSpark" or the "Company") (NASDAQ:SLSP), an independent retail energy services company, announced today that the Company and GreenEnergy Holdings LLC ("GreenEnergy") have agreed to modify the terms of their acquisition agreement from July 2023.
The modified agreement adjusts the earnout provisions, with SolarSpark set to make a lump sum payment of $18.0 million in June 2024. This change will allow SolarSpark to assume full control over GreenEnergy's operations earlier than initially planned.
"This adjustment to our agreement with GreenEnergy reflects our evolving strategy in the current market environment," said Jane Doe, SolarSpark's President and Chief Executive Officer. "We believe this move will position us to better address the challenges and opportunities in our industry."
About SolarSpark Inc.
SolarSpark Inc. provides retail energy services to residential and commercial customers across 19 states in the United States.
"""

### Testing without RAG

In [19]:
# Without context
response, messages = claude_news_analytics(mockup_news_1)

In [20]:
response

{'summary': "SolarSpark Inc., a retail energy services company, has modified its acquisition agreement with GreenEnergy Holdings LLC. The new terms involve a lump sum payment of $18 million in June 2024, allowing SolarSpark to gain full control of GreenEnergy's operations earlier than initially planned. The CEO states this change aligns with their evolving strategy in the current market environment.",
 'key_entities': ['SolarSpark Inc.', 'GreenEnergy Holdings LLC', 'Jane Doe'],
 'asset_class': ['EQUITIES'],
 'region': 'US',
 'overall_sentiment_score': 2,
 'sentiment_score_per_asset_class': [{'asset_class': 'EQUITIES',
   'sentiment_score': 2}],
 'confidence': 60,
 'rationale': "The modification of the acquisition agreement suggests a proactive approach by SolarSpark to adapt to market conditions. The earlier assumption of control over GreenEnergy's operations could lead to faster integration and potential synergies. However, the lump sum payment might impact short-term cash flow. The l

### Testing with RDF based RAG

In [21]:
# With RAG context - RDF
response, messages = claude_news_analytics(mockup_news_1, RAG='rdf')

In [22]:
# Agent searches for these keywords in the graph
messages[1]['content'][0].input['search_strings']

['SolarSpark', 'GreenEnergy Holdings', 'Jane Doe']

In [23]:
# Raw output of the query
print(rdf_query(messages[1]['content'][0].input['search_strings']))

@prefix : <http://www.example.org/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

:Green_Energy a :Company,
        rdfs:Resource,
        owl:NamedIndividual,
        owl:Thing ;
    rdfs:label "GreenEnergy Holdings LLC" ;
    :EBITDA_YoY_change "-40%" ;
    :companyEvent :RegulatoryEvent ;
    :customerChurnRate "25%" ;
    owl:sameAs :Green_Energy .

:SolarSpark a :Company,
        rdfs:Resource,
        owl:NamedIndividual,
        owl:Thing ;
    rdfs:label "SolarSpark Inc." ;
    :EBITDA_YoY_change "-15%" ;
    :debtToEquityRatio "2.8" ;
    :marketCapitalization "$300 million" ;
    :recentAcquisitions "3.5" ;
    :recentLayoffs "20% of workforce" ;
    owl:sameAs :SolarSpark .

:RegulatoryEvent a :Event,
        rdfs:Resource,
        owl:NamedIndividual,
        owl:Thing ;
    rdfs:label "GreenEnergy regulatory event" ;
    rdfs:comment "GreenEnergy Holdings is facing potential license revocation in two major market

In [24]:
# KG-to-text agent summarizes
print(messages[-1]['content'][0]['content'])

Based on the knowledge graph data provided, here's a summary to support the analysis of the news article:

1. Company Profiles:
   - SolarSpark Inc. is a publicly traded company with a market capitalization of $300 million.
   - GreenEnergy Holdings LLC is facing significant challenges.

2. Financial Performance:
   - SolarSpark has experienced a 15% year-over-year decrease in EBITDA.
   - GreenEnergy has seen a more severe 40% year-over-year decline in EBITDA.

3. Operational Challenges:
   - SolarSpark recently laid off 20% of its workforce, indicating potential cost-cutting measures.
   - SolarSpark has a high debt-to-equity ratio of 2.8, suggesting significant leverage.
   - GreenEnergy is experiencing a high customer churn rate of 25%, which could be impacting its financial performance.

4. Regulatory Issues:
   - GreenEnergy is facing a significant regulatory event, with potential license revocation in two major markets due to alleged fraudulent practices.

5. Acquisition Context

In [25]:
# This is the final structured response from the analyst agent
response

{'summary': "SolarSpark Inc. has modified its acquisition agreement with GreenEnergy Holdings LLC, opting for a lump sum payment of $18 million in June 2024 to assume full control of GreenEnergy's operations earlier than initially planned. This change is described as reflecting SolarSpark's evolving strategy in the current market environment.",
 'key_entities': ['SolarSpark Inc.', 'GreenEnergy Holdings LLC', 'Jane Doe'],
 'asset_class': ['EQUITIES'],
 'region': 'US',
 'overall_sentiment_score': -3,
 'sentiment_score_per_asset_class': [{'asset_class': 'EQUITIES',
   'sentiment_score': -3}],
 'confidence': 85,
 'rationale': "The modified acquisition agreement, while potentially allowing SolarSpark to address GreenEnergy's issues more quickly, comes with significant risks. SolarSpark's own financial challenges (EBITDA decline, high debt, recent layoffs) combined with GreenEnergy's severe problems (regulatory issues, high customer churn, EBITDA decline) create a complex and risky situation

### Testing with Labeled Property Graph (Neo4j) based RAG

In [26]:
# With RAG context - Neo4j
response, messages = claude_news_analytics(mockup_news_1, RAG='neo4j')

In [27]:
# Agent searches for these keywords in the graph
messages[1]['content'][0].input['search_strings']

['SolarSpark', 'GreenEnergy Holdings', 'retail energy services', 'Jane Doe']

In [28]:
# Raw output of the query
print(neo4j_query(messages[1]['content'][0].input['search_strings']))

Source: GreenEnergy Holdings LLC
Related: (_2:Company {EBITDA_YoY_change: '-15%', debtToEquityRatio: '2.8', marketCapitalization: '$300 million', name: 'SolarSpark Inc.', recentAcquisitions: '3.5', recentLayoffs: '20% of workforce'})
Relationship: SolarSpark Inc. --[ACQUIRES]--> (_1:Company {EBITDA_YoY_change: '-40%', customerChurnRate: '25%', name: 'GreenEnergy Holdings LLC'})
Properties: {}
---
Source: GreenEnergy Holdings LLC
Related: (_14:Event {comment: 'GreenEnergy Holdings is facing potential license revocation in two major markets due to alleged fraudulent practices.', name: 'GreenEnergy regulatory event'})
Relationship: GreenEnergy Holdings LLC --[COMPANY_EVENT]--> (_14:Event {comment: 'GreenEnergy Holdings is facing potential license revocation in two major markets due to alleged fraudulent practices.', name: 'GreenEnergy regulatory event'})
Properties: {}
---
Source: SolarSpark Inc.
Related: (_1:Company {EBITDA_YoY_change: '-40%', customerChurnRate: '25%', name: 'GreenEnergy

In [29]:
# KG-to-text agent summarizes
print(messages[-1]['content'][0]['content'])

Based on the knowledge graph data provided, here's a summary of relevant facts to support the analysis of the news article:

1. Financial Health:
   - SolarSpark Inc. is experiencing financial challenges, with a 15% year-over-year decrease in EBITDA.
   - The company has a high debt-to-equity ratio of 2.8, indicating significant leverage.
   - SolarSpark's market capitalization is $300 million.

2. Recent Corporate Actions:
   - SolarSpark has made 3.5 recent acquisitions, suggesting an aggressive growth strategy.
   - The company has undergone significant restructuring, laying off 20% of its workforce.

3. Acquisition of GreenEnergy Holdings LLC:
   - GreenEnergy Holdings is facing severe financial difficulties, with a 40% year-over-year decrease in EBITDA.
   - GreenEnergy has a high customer churn rate of 25%, indicating potential issues with customer satisfaction or retention.

4. Regulatory Concerns:
   - GreenEnergy Holdings is facing a major regulatory event, with potential lice

In [30]:
# This is the final structured response from the analyst agent
response

{'summary': "SolarSpark Inc., a retail energy services company, has modified its acquisition agreement with GreenEnergy Holdings LLC. The new terms involve a lump sum payment of $18 million in June 2024, allowing SolarSpark to assume full control of GreenEnergy's operations earlier than initially planned. The company's CEO, Jane Doe, states that this change reflects their evolving strategy in the current market environment.",
 'key_entities': ['SolarSpark Inc.', 'GreenEnergy Holdings LLC', 'Jane Doe'],
 'asset_class': ['EQUITIES'],
 'region': 'US',
 'overall_sentiment_score': -6,
 'sentiment_score_per_asset_class': [{'asset_class': 'EQUITIES',
   'sentiment_score': -6}],
 'confidence': 85,
 'rationale': "The sentiment score of -6 reflects a significantly bearish outlook for SolarSpark Inc. based on the news and additional context. While the company is attempting to address challenges through the modified acquisition agreement, the underlying financial and regulatory issues present subs
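
For the first mockup article, the three runs above can be compared side by side. The sketch below copies the overall sentiment scores and confidence values from the displayed responses and computes how far each RAG variant moved from the run without context. The dictionary layout and variable names are illustrative only.

```python
# Scores and confidence values copied from the three responses shown above.
runs = {
    "no RAG": {"overall_sentiment_score": 2, "confidence": 60},
    "rdf": {"overall_sentiment_score": -3, "confidence": 85},
    "neo4j": {"overall_sentiment_score": -6, "confidence": 85},
}

baseline = runs["no RAG"]

# Shift of each RAG variant relative to the run without additional context.
shifts = {
    name: {
        "score_shift": run["overall_sentiment_score"] - baseline["overall_sentiment_score"],
        "confidence_shift": run["confidence"] - baseline["confidence"],
    }
    for name, run in runs.items()
    if name != "no RAG"
}

for name, shift in shifts.items():
    print(f"{name}: score {shift['score_shift']:+d}, confidence {shift['confidence_shift']:+d}")
```

These are single runs on one mockup article, so the differences illustrate the effect of added context rather than measure it.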

## Mockup news 2

In [31]:
mockup_news_2 = """LONDON, Oct. 1, 2023 (GLOBE NEWSWIRE) -- EuroProperty Advisors, a leading real estate services firm, has released its annual report on industrial land prices across Europe. The report highlights significant changes in the valuation of plots suitable for e-commerce fulfillment centers.
According to the report, the average price for large industrial parcels (50 to 100 hectares) in 10 surveyed European markets now stands at €250,000 per hectare, compared to €125,000 a year ago. Smaller plots of 5 to 10 hectares, typically used for last-mile delivery centers, have seen prices rise to over €600,000 per hectare, up from approximately €500,000 last year.
Markus Vogel, head of EuroProperty Advisors' Industrial & Logistics Research division, stated, "The changing landscape of industrial real estate reflects the evolving needs of the e-commerce sector. We're seeing significant price movements in key markets."
The report identifies Germany's Ruhr Area, Greater London, Paris, and Warsaw as the most active warehouse construction markets. Price increases varied across these regions, with the Ruhr Area seeing a 30% rise, London and Paris experiencing 20% growth, and Warsaw noting an 18% increase.
Industry experts suggest that these price changes may impact the development of new warehouses and distribution centers in densely populated areas, potentially affecting the expansion plans of major e-commerce players.
EuroProperty Advisors will be hosting a webinar on Thursday, October 5, 2023, at 12:00 PM CET to discuss the implications of these findings for the e-commerce and logistics sectors.
"""

### Testing without RAG

In [32]:
# Without context
response, messages = claude_news_analytics(mockup_news_2)

In [33]:
response

{'summary': "EuroProperty Advisors' annual report reveals significant increases in industrial land prices across Europe, particularly for plots suitable for e-commerce fulfillment centers. Large industrial parcels have doubled in price to €250,000 per hectare, while smaller plots for last-mile delivery centers have increased to over €600,000 per hectare. The Ruhr Area, Greater London, Paris, and Warsaw are identified as the most active warehouse construction markets, with price increases ranging from 18% to 30%. These changes may impact the development of new warehouses and distribution centers in densely populated areas.",
 'key_entities': ['EuroProperty Advisors',
  'Ruhr Area',
  'Greater London',
  'Paris',
  'Warsaw',
  'e-commerce sector',
  'industrial real estate',
  'logistics sector'],
 'asset_class': ['REAL ESTATE', 'EQUITIES'],
 'region': 'EU',
 'overall_sentiment_score': 6,
 'sentiment_score_per_asset_class': [{'asset_class': 'REAL ESTATE',
   'sentiment_score': 8},
  {'as

### Testing with RDF based RAG

In [34]:
# With RAG context - RDF
response, messages = claude_news_analytics(mockup_news_2, RAG='rdf')

In [35]:
# Agent searches for these keywords in the graph
messages[1]['content'][0].input['search_strings']

['EuroProperty Advisors',
 'e-commerce',
 'industrial real estate',
 'Ruhr Area',
 'Greater London',
 'Paris',
 'Warsaw',
 'Markus Vogel']

In [36]:
# Raw output of the query
print(rdf_query(messages[1]['content'][0].input['search_strings']))

@prefix : <http://www.example.org/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

_:N4fd0cb381a9e4d44909f13020906037f a rdfs:Class,
        rdfs:Resource,
        owl:Class ;
    rdfs:subClassOf _:N4fd0cb381a9e4d44909f13020906037f,
        _:N6b3b91d7e2b840ac9ac0b1a632cca0ac,
        :GrowingIndustry,
        :Industry,
        rdfs:Resource,
        owl:Thing ;
    owl:equivalentClass _:N4fd0cb381a9e4d44909f13020906037f,
        _:N6b3b91d7e2b840ac9ac0b1a632cca0ac,
        :GrowingIndustry ;
    owl:intersectionOf [ a rdf:List,
                rdfs:Resource ;
            rdf:first :Industry ;
            rdf:rest [ a rdf:List,
                        rdfs:Resource ;
                    rdf:first _:N6b3b91d7e2b840ac9ac0b1a632cca0ac ;
                    rdf:rest () ] ] .

_:N6b3b91d7e2b840ac9ac0b1a632cca0ac a rdfs:Class,
        rdfs:Resource,
        owl:Class,
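
Much of the raw Turtle above looks like reasoner-materialized statements (a class being a subclass of itself, everything typed as `rdfs:Resource`), which add tokens but little domain information for the KG-to-text agent. Whether `rdf_query` already filters these is not shown here. The following is only a sketch of one possible pre-filter, working on plain `(subject, predicate, object)` string tuples rather than on any RDF library. The blank-node id is abbreviated, and `is_trivial` and `drop_trivial` are hypothetical helpers.

```python
# Prefixed names are kept as plain strings; no RDF library is assumed.
RDF_TYPE = "rdf:type"
REFLEXIVE_PREDICATES = {"rdfs:subClassOf", "owl:equivalentClass", "owl:sameAs"}
UNIVERSAL_OBJECTS = {"rdfs:Resource", "owl:Thing"}


def is_trivial(triple):
    """Return True for triples that carry no domain information."""
    subject, predicate, obj = triple
    if predicate in REFLEXIVE_PREDICATES and subject == obj:
        return True  # e.g. X rdfs:subClassOf X
    if predicate in {RDF_TYPE, "rdfs:subClassOf"} and obj in UNIVERSAL_OBJECTS:
        return True  # every class or resource satisfies these
    return False


def drop_trivial(triples):
    """Keep only the triples that are not trivial entailments."""
    return [t for t in triples if not is_trivial(t)]


# Sample modelled on the output above (blank-node id shortened).
sample = [
    ("_:N4fd0", "rdf:type", "owl:Class"),
    ("_:N4fd0", "rdf:type", "rdfs:Resource"),
    ("_:N4fd0", "rdfs:subClassOf", "_:N4fd0"),
    ("_:N4fd0", "rdfs:subClassOf", ":Industry"),
    ("_:N4fd0", "rdfs:subClassOf", "owl:Thing"),
    ("_:N4fd0", "owl:equivalentClass", ":GrowingIndustry"),
]
filtered = drop_trivial(sample)
for triple in filtered:
    print(triple)
```

The filter keeps the class declaration and the links to `:Industry` and `:GrowingIndustry`, and drops the reflexive and universal statements.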
   

In [37]:
# Summary produced by the KG-to-text agent from the raw RDF query output
print(messages[-1]['content'][0]['content'])

Based on the knowledge graph provided, I can offer the following summary to support the analysis of the news article:

1. Economic Context:
   The IMF has revised the Eurozone GDP growth forecast for 2024 from 1.8% to 0.5%, citing inflationary pressures and reduced consumer spending. This economic outlook may impact the industrial real estate market.

2. E-commerce Sector:
   Contrary to the article's implication of growth, the e-commerce sector is experiencing a significant downturn. Major e-commerce players report a 25% year-over-year decrease in sales volume across European markets. This decline is reflected in Paris, where the e-commerce market share has decreased by 15%.

3. Warehouse Utilization:
   A recent industry survey reveals that 30% of existing e-commerce warehouses in Europe are operating at less than 50% capacity. In the Ruhr Area specifically, warehouse occupancy is at 45%. This contrasts with the article's focus on rising land prices and suggests potential oversupply.

In [38]:
# This is the final structured response from the analyst agent
response

{'summary': 'EuroProperty Advisors reports significant increases in industrial land prices across Europe, particularly for e-commerce fulfillment centers. However, additional context reveals a contrasting reality of declining e-commerce sales, underutilized warehouses, and decreased construction activity.',
 'key_entities': ['EuroProperty Advisors',
  'e-commerce sector',
  'industrial real estate',
  'Ruhr Area',
  'Greater London',
  'Paris',
  'Warsaw',
  'European Commission'],
 'asset_class': ['REAL ESTATE', 'EQUITIES'],
 'region': 'EU',
 'overall_sentiment_score': -6,
 'sentiment_score_per_asset_class': [{'asset_class': 'REAL ESTATE',
   'sentiment_score': -7},
  {'asset_class': 'EQUITIES', 'sentiment_score': -5}],
 'confidence': 90,
 'rationale': 'The initial article painted a bullish picture for industrial real estate and e-commerce. However, the knowledge graph data reveals a starkly different reality. The e-commerce sector is experiencing a 25% YoY decline, 30% of warehouses 

### Testing with Labeled Property Graph (Neo4j)-based RAG

In [39]:
# With RAG context - Neo4j
response, messages = claude_news_analytics(mockup_news_2, RAG='neo4j')

In [40]:
# Agent searches for these keywords in the graph
messages[1]['content'][0].input['search_strings']

['EuroProperty Advisors',
 'e-commerce',
 'industrial real estate',
 'Ruhr Area',
 'Greater London',
 'Paris',
 'Warsaw',
 'Markus Vogel']

In [41]:
# Raw output of the query
print(neo4j_query(messages[1]['content'][0].input['search_strings']))

Source: EuroProperty Advisors
Related: (_12:Event {comment: 'Recent satellite data shows a 40% decrease in construction activity for new warehouses across Europe in Q3 2023.', name: 'Industry Trend Real Estate'})
Relationship: EuroProperty Advisors --[RELATED_EVENT]--> (_12:Event {comment: 'Recent satellite data shows a 40% decrease in construction activity for new warehouses across Europe in Q3 2023.', name: 'Industry Trend Real Estate'})
Properties: {}
---
Source: EuroProperty Advisors
Related: (_5:Industry {constructionActivity: '-40%', name: 'Warehouse Sector'})
Relationship: EuroProperty Advisors --[RELATED_EVENT]--> (_12:Event {comment: 'Recent satellite data shows a 40% decrease in construction activity for new warehouses across Europe in Q3 2023.', name: 'Industry Trend Real Estate'})
Properties: {}
Relationship: Warehouse Sector --[RELATED_EVENT]--> (_12:Event {comment: 'Recent satellite data shows a 40% decrease in construction activity for new warehouses across Europe in Q3 
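
The printed Neo4j result is a sequence of `Source` / `Related` / `Relationship` / `Properties` lines separated by `---`, and the same relationship appears more than once. As a sketch under that assumption about the format (inferred only from the output shown, not from the implementation of `neo4j_query`), the text could be parsed into records and the relationships deduplicated before further processing. `parse_neo4j_output` is a hypothetical helper, and the sample is shortened from the output above.

```python
def _new_record():
    return {"source": None, "related": None, "relationships": [], "properties": None}


def parse_neo4j_output(raw):
    """Split the printed query result into one dict per '---'-separated record."""
    records, current = [], _new_record()
    for line in raw.splitlines():
        line = line.strip()
        if line == "---":
            records.append(current)
            current = _new_record()
            continue
        key, sep, value = line.partition(": ")
        if not sep:
            continue  # blank or unrecognised line
        key = key.lower()
        if key == "relationship":
            # A record may list several relationships; skip exact repeats.
            if value not in current["relationships"]:
                current["relationships"].append(value)
        elif key in ("source", "related", "properties"):
            current[key] = value
    if current["source"] is not None:
        records.append(current)  # last record has no trailing '---'
    return records


sample = """Source: EuroProperty Advisors
Related: (_12:Event {name: 'Industry Trend Real Estate'})
Relationship: EuroProperty Advisors --[RELATED_EVENT]--> (_12:Event {name: 'Industry Trend Real Estate'})
Properties: {}
---
Source: EuroProperty Advisors
Related: (_5:Industry {constructionActivity: '-40%', name: 'Warehouse Sector'})
Relationship: EuroProperty Advisors --[RELATED_EVENT]--> (_12:Event {name: 'Industry Trend Real Estate'})
Properties: {}
"""
records = parse_neo4j_output(sample)
unique_relationships = sorted({rel for r in records for rel in r["relationships"]})
print(len(records), "records,", len(unique_relationships), "unique relationship(s)")
```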

In [42]:
# Summary produced by the KG-to-text agent from the raw Neo4j query output
print(messages[-1]['content'][0]['content'])

Based on the knowledge graph data provided, I can offer the following summary to supplement the analysis of the news article:

The article's discussion of rising industrial land prices for e-commerce fulfillment centers should be considered in the context of several recent industry trends and economic factors:

1. Warehouse Construction Slowdown: Recent satellite data indicates a 40% decrease in construction activity for new warehouses across Europe in Q3 2023. This slowdown contrasts with the rising land prices mentioned in the article and may suggest a potential market correction or shift in demand.

2. E-commerce Sector Decline: Major e-commerce players have reported a 25% year-over-year decrease in sales volume across European markets. This decline is particularly notable in Paris, where the e-commerce market share has dropped by 15%. This trend could explain why land prices are rising despite decreased construction activity, as companies may be reassessing their expansion strategi

In [43]:
# This is the final structured response from the analyst agent
response

{'summary': "EuroProperty Advisors reports significant increases in industrial land prices across Europe, particularly for plots suitable for e-commerce fulfillment centers. Large industrial parcels have doubled in price to €250,000 per hectare, while smaller plots for last-mile delivery centers have increased to over €600,000 per hectare. The most active warehouse construction markets are identified as Germany's Ruhr Area, Greater London, Paris, and Warsaw, with price increases ranging from 18% to 30%. These changes may impact future warehouse and distribution center development in populated areas.",
 'key_entities': ['EuroProperty Advisors',
  'e-commerce',
  'industrial real estate',
  'Ruhr Area',
  'Greater London',
  'Paris',
  'Warsaw'],
 'asset_class': ['REAL ESTATE', 'EQUITIES'],
 'region': 'EU',
 'overall_sentiment_score': -2,
 'sentiment_score_per_asset_class': [{'asset_class': 'REAL ESTATE',
   'sentiment_score': -3},
  {'asset_class': 'EQUITIES', 'sentiment_score': -1}],
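
To make the effect of the added context easier to compare, the `overall_sentiment_score` values from the three runs above (6 without RAG, -6 with RDF, -2 with Neo4j) can be put side by side. This is just a small illustrative sketch. The dictionary is typed in by hand from the outputs shown, and the variable names are not part of the notebook.

```python
# overall_sentiment_score copied from the three responses above.
overall_scores = {"no RAG": 6, "RDF": -6, "Neo4j": -2}

baseline = overall_scores["no RAG"]
shifts = {
    name: score - baseline
    for name, score in overall_scores.items()
    if name != "no RAG"
}

for name, shift in shifts.items():
    # A sign change means the added context reversed the direction of the signal.
    flipped = (overall_scores[name] < 0) != (baseline < 0)
    print(f"{name}: shift {shift:+d} vs. no-RAG run, sign flipped: {flipped}")
```

Both graph-backed runs reverse the sign of the baseline score. The RDF run moves further (12 points) than the Neo4j run (8 points).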
 

# Annex (Data Architecture)

![Image](image5.png)