This notebook tests a cleaned-up graph RAG pipeline, replacing the exploratory code left in the previous notebook (work in progress).

In [15]:
import os
import time

from langchain_core.runnables import RunnablePassthrough
from pydantic import BaseModel, Field
from langchain_core.output_parsers import StrOutputParser
from langchain_neo4j import Neo4jGraph
from langchain_community.chat_models import ChatOllama
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_community.vectorstores import Neo4jVector
from langchain_core.documents import Document
from langchain_ollama import OllamaEmbeddings
from langchain_experimental.llms.ollama_functions import OllamaFunctions


In [39]:
from neo4j import GraphDatabase

# Connection settings for the local Neo4j instance.
url = "bolt://localhost:7687"
username = "neo4j"
password = "neo4j"

# Neo4jGraph is the LangChain wrapper; the raw driver is kept for direct queries.
graph = Neo4jGraph(url=url, username=username, password=password)
driver = GraphDatabase.driver(url, auth=(username, password))

In [6]:
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_ollama.llms import OllamaLLM

llm = OllamaLLM(model="llama3.1:8b-instruct-fp16")

llm_transformer = LLMGraphTransformer(llm=llm)

GraphCypherQAChain generates a Cypher query from the input question, whereas GraphQAChain extracts entities from the input question and looks them up in the graph.

In [67]:
from langchain_core.prompts import PromptTemplate
from langchain_neo4j import GraphCypherQAChain

CYPHER_GENERATION_TEMPLATE = """
You are an AI system that generates Cypher queries to retrieve data from a graph database. 
The graph schema consists of nodes and relationships related to the following ontologies:
- Adversarial Autoencoders (AAEs)
- Generative Adversarial Networks (GANs)
- Simple Classification
You will receive a question related to the domain. Based on that, generate a Cypher query that answers the question.

Here is the schema:
{schema}

For example, to retrieve the nodes whose URI follows a specific naming convention:

MATCH (n)
WHERE n.uri IS NOT NULL AND n.uri CONTAINS 'GAN'
RETURN n

Based on the above, answer the following question by creating a Cypher query to retrieve the relevant data from the graph.

Question: {question}
"""

CYPHER_GENERATION_PROMPT = PromptTemplate(
    input_variables=["schema", "question"], template=CYPHER_GENERATION_TEMPLATE
)

CYPHER_QA_TEMPLATE = """
You are an assistant that helps to provide human-readable answers to questions based on classes and relationships in the graph.
The graph is related to three ontology instances on:
- Adversarial Autoencoders (AAEs)
- Generative Adversarial Networks (GANs)
- Simple Classification

You will receive the context (results of a Cypher query) and a question. Based on that, return an understandable answer.

Context: {context}
Question: {question}
"""

CYPHER_QA_PROMPT = PromptTemplate(
    input_variables=["context", "question"], template=CYPHER_QA_TEMPLATE
)

graphCypher_chain = GraphCypherQAChain.from_llm(
    OllamaLLM(model = "llama3.1:8b-instruct-fp16", temperature=0.0), 
    graph=graph, 
    cypher_prompt=CYPHER_GENERATION_PROMPT, 
    qa_prompt=CYPHER_QA_PROMPT,
    verbose=True,
    allow_dangerous_requests = True
)

In [27]:
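def queryNeo4j(driver, query):
    """Run a single Cypher query and return its rows as a list of dicts."""
    with driver.session() as session:
        try:
            # Read the rows before the session closes; the lazy Result
            # object cannot be consumed once the session has ended.
            return session.run(query).data()
        except Exception as e:
            print(f"Error executing query: {e}")
            return []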

In [34]:
def get_schema_from_neo4j(graph):
    
    schema = graph.get_schema
    return schema

def get_answer_for_question(graph, question):
    """Answer a question with graphCypher_chain.

    The chain generates the Cypher, runs it against the graph, and
    summarises the rows in a single call, so no manual query step is needed.
    """
    # The chain reads the schema from the graph it was built with; the
    # "schema" key is passed only to mirror the prompt's input variables.
    response = graphCypher_chain.invoke({
        "schema": get_schema_from_neo4j(graph),
        "query": question,
    })

    # The output dict holds the question under "query" and the answer under "result".
    answer = response.get("result")
    if not answer:
        raise ValueError("The chain returned an empty answer.")

    return answer
    

In [68]:
question = "which nodes are related to GANs and what are their relationships?"

answer = get_answer_for_question(graph, question)
print(answer)



> Entering new GraphCypherQAChain chain...
Generated Cypher:
cypher
MATCH (n)
WHERE n.uri IS NOT NULL AND n.uri CONTAINS 'GAN'
OPTIONAL MATCH (n)-[r]-()
RETURN n, r

Full Context:
[{'n': {'uri': 'GAN_Generator_OUTP'}, 'r': ({'uri': 'GAN_Generator_OUTP'}, 'ns0__joinsDataSet', {'ns0__is_transient_dataset': [True], 'ns0__data_sample_dimensionality': [2], 'uri': 'GAN_GeneratedSet', 'ns0__data_sample_features': [784]})}, {'n': {'uri': 'GAN_Generator_OUTP'}, 'r': ({'uri': 'GAN_Generator_OUTP'}, 'ns0__joinsLayer', {'ns0__layer_num_units': [784], 'uri': 'GAN_Generator_OUT'})}, {'n': {'ns0__layer_num_units': [3000], 'uri': 'AAE_Label_GAN_L2Clone'}, 'r': ({'ns0__layer_num_units': [30], 'uri': 'AAE_Label_GAN_Y'}, 'ns0__previousLayer', {'ns0__layer_num_units': [3000], 'uri': 'AAE_Label_GAN_L2Clone'})}, {'n': {'ns0__layer_num_units': [3000], 'uri': 'AAE_Label_GAN_L2Clone'}, 'r': ({'ns0__layer_num_units': [3000], 'uri': 'AAE_Label_GAN_L2Clone'}, 'ns0__nextLaye

Notes:
Write a prompt that describes the ANNETT-O ontology.

Can we use the Bellman-Ford algorithm to find shortest paths between related documents within the communities we detected with the Leiden algorithm?
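
As a reference point for the shortest-path question, here is a minimal pure-Python sketch of Bellman-Ford. It assumes documents are numbered from 0 and that edges are `(source, target, weight)` tuples; how the weights would be derived from the communities is left open. Bellman-Ford is only needed when weights can be negative; with non-negative weights Dijkstra would do.

```python
from math import inf


def bellman_ford(num_nodes, edges, source):
    """Shortest distances from source; edges are (u, v, weight) tuples."""
    dist = [inf] * num_nodes
    dist[source] = 0.0
    # Relax every edge up to num_nodes - 1 times, stopping early once stable.
    for _ in range(num_nodes - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break
    # One more pass: any further improvement means a negative-weight cycle.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("graph contains a negative-weight cycle")
    return dist


# Hypothetical toy graph with four documents.
toy_edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1)]
print(bellman_ford(4, toy_edges, source=0))
```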

In [59]:
graphCypher_chain.invoke({"schema" : graph.get_schema, "query": "Which nodes are related to GANs and what are their relationships?"})



> Entering new GraphCypherQAChain chain...
Generated Cypher:
cypher
MATCH (n:ns0__TrainingSingleForwardOnly)-[r]-(m)
MATCH (n:ns0__ConcatLayer)-[r]-(m)
MATCH (n:ns0__CategoricalCrossEntropy)-[r]-(m)
RETURN n, r, m;

Full Context:
[]

> Finished chain.


{'schema': 'Node properties:\nResource {uri: STRING, ns0__layer_num_units: LIST, ns0__eval_score: LIST, ns0__learning_rate: LIST, owl__qualifiedCardinality: LIST, owl__minQualifiedCardinality: LIST, ns0__dropout_rate: LIST, rdfs__comment: LIST, ns0__labels_count: LIST, ns0__labels_dtype: LIST, ns0__normal_mu: LIST, ns0__normal_sigma: LIST, ns0__data_sample_dimensionality: LIST, ns0__data_sample_features: LIST, ns0__momentum: LIST, ns0__number_of_epochs: LIST, ns0__learning_rate_decay_epochs: LIST, ns0__batch_size: LIST, ns0__learning_rate_decay: LIST, ns0__data_description: LIST, ns0__data_samples: LIST, ns0__data_location: LIST, ns0__is_transient_dataset: LIST, ns0__num_of_iterations: LIST, owl__maxQualifiedCardinality: LIST, ns0__has_bias: LIST}\n_GraphConfig {_dataTypePropertyLabel: STRING, _subPropertyOfRel: STRING, _classNamePropName: STRING, _handleVocabUris: INTEGER, _applyNeo4jNaming: BOOLEAN, _relNamePropName: STRING, _domainRel: STRING, _keepLangTag: BOOLEAN, _keepCustomDataT
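
The empty context in this run is likely caused by the query itself. The three MATCH clauses reuse the same `n`, `r` and `m`, so a row survives only if one node carries all three labels. If the intent was "nodes with any of these labels", one option is a UNION with one MATCH per label. The sketch below builds such a query; `union_query_for_labels` is a hypothetical helper and the labels are copied from the generated Cypher above.

```python
def union_query_for_labels(labels):
    """Build one MATCH per label and join them with UNION (hypothetical helper)."""
    parts = []
    for label in labels:
        # Backtick-quote the label; a literal backtick is escaped by doubling it.
        safe = label.replace("`", "``")
        parts.append(f"MATCH (n:`{safe}`)-[r]-(m)\nRETURN n, r, m")
    return "\nUNION\n".join(parts)


print(union_query_for_labels([
    "ns0__TrainingSingleForwardOnly",
    "ns0__ConcatLayer",
    "ns0__CategoricalCrossEntropy",
]))
```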

In [45]:
from yfiles_jupyter_graphs import GraphWidget
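def showGraph():
    """Render every relationship in the database as an interactive widget."""
    # Fetch the graph inside a context manager so the session is closed afterwards.
    with driver.session() as session:
        neo4j_graph = session.run("MATCH (n)-[r]->(m) RETURN n, r, m").graph()
    widget = GraphWidget(graph=neo4j_graph)
    widget.node_label_mapping = 'uri'
    widget.circular_layout()
    widget.set_sidebar(start_with='Data')
    return widget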

showGraph()

GraphWidget(layout=Layout(height='800px', width='100%'))

In [58]:
print(graph.get_schema)

Node properties:
Resource {uri: STRING, ns0__layer_num_units: LIST, ns0__eval_score: LIST, ns0__learning_rate: LIST, owl__qualifiedCardinality: LIST, owl__minQualifiedCardinality: LIST, ns0__dropout_rate: LIST, rdfs__comment: LIST, ns0__labels_count: LIST, ns0__labels_dtype: LIST, ns0__normal_mu: LIST, ns0__normal_sigma: LIST, ns0__data_sample_dimensionality: LIST, ns0__data_sample_features: LIST, ns0__momentum: LIST, ns0__number_of_epochs: LIST, ns0__learning_rate_decay_epochs: LIST, ns0__batch_size: LIST, ns0__learning_rate_decay: LIST, ns0__data_description: LIST, ns0__data_samples: LIST, ns0__data_location: LIST, ns0__is_transient_dataset: LIST, ns0__num_of_iterations: LIST, owl__maxQualifiedCardinality: LIST, ns0__has_bias: LIST}
_GraphConfig {_dataTypePropertyLabel: STRING, _subPropertyOfRel: STRING, _classNamePropName: STRING, _handleVocabUris: INTEGER, _applyNeo4jNaming: BOOLEAN, _relNamePropName: STRING, _domainRel: STRING, _keepLangTag: BOOLEAN, _keepCustomDataTypes: BOOLEAN,

In [14]:
graph.get_structured_schema

{'node_props': {'Resource': [{'property': 'uri', 'type': 'STRING'},
   {'property': 'ns0__layer_num_units', 'type': 'LIST'},
   {'property': 'ns0__eval_score', 'type': 'LIST'},
   {'property': 'ns0__learning_rate', 'type': 'LIST'},
   {'property': 'owl__qualifiedCardinality', 'type': 'LIST'},
   {'property': 'owl__minQualifiedCardinality', 'type': 'LIST'},
   {'property': 'ns0__dropout_rate', 'type': 'LIST'},
   {'property': 'rdfs__comment', 'type': 'LIST'},
   {'property': 'ns0__labels_count', 'type': 'LIST'},
   {'property': 'ns0__labels_dtype', 'type': 'LIST'},
   {'property': 'ns0__normal_mu', 'type': 'LIST'},
   {'property': 'ns0__normal_sigma', 'type': 'LIST'},
   {'property': 'ns0__data_sample_dimensionality', 'type': 'LIST'},
   {'property': 'ns0__data_sample_features', 'type': 'LIST'},
   {'property': 'ns0__momentum', 'type': 'LIST'},
   {'property': 'ns0__number_of_epochs', 'type': 'LIST'},
   {'property': 'ns0__learning_rate_decay_epochs', 'type': 'LIST'},
   {'property': 'n

GraphRAG implementation using PDF documents:
https://neo4j.com/developer-blog/global-graphrag-neo4j-langchain/

To make this process more scalable and robust, you could use an LLM to:

- Extract relevant concepts (nodes) and relationships from the GAN paper.
- Parse this data into a standard format for Neo4j, automatically generating Cypher queries.
- Match and merge the extracted data with the existing RDF/OWL-based graph in Neo4j.

Here is a conceptual overview of how you might set this up using LangChain:

Step 1: Parse and Convert OWL to Neo4j: Convert the RDF/OWL data to a Cypher-compatible format (using libraries like rdflib, RDF2Neo4j, or custom scripts) and load it into Neo4j.

Step 2: Extract Entities and Relationships from the GAN Paper: Use an LLM to extract key entities and relationships from the GAN paper. This can be done using named entity recognition (NER), dependency parsing, or a model like GPT to analyze the paper and generate structured triples (subject-predicate-object).
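
To make Step 2 concrete, here is a minimal sketch of the parsing side. It assumes the LLM has been prompted to reply with one triple per line in the form `subject | predicate | object`. That reply format, the `Triple` type and `parse_triples` are illustrative assumptions, not something this notebook defines.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def parse_triples(llm_output: str) -> list[Triple]:
    """Parse 'subject | predicate | object' lines, skipping malformed ones."""
    triples = []
    for line in llm_output.splitlines():
        parts = [part.strip() for part in line.split("|")]
        # Keep only lines with exactly three non-empty fields.
        if len(parts) == 3 and all(parts):
            triples.append(Triple(*parts))
    return triples


# Hypothetical model reply, including one line that is not a triple.
reply = "Generator | produces | GeneratedSet\nSure, here are the triples:"
print(parse_triples(reply))
```

Skipping malformed lines rather than raising keeps a single chatty reply from aborting a whole paper.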

Step 3: Merge the Two Graphs Using a Cypher Chain: Use a Cypher query chain (like GraphCypherQAChain) to:

- Generate Cypher queries for both graphs.
- Merge them based on the common schema or attributes.
- Handle any conflicts (e.g., duplicate nodes or relationships).

Step 4: Continuous Integration for New Papers: As new papers are added, automate the process of extracting concepts and relationships from the new paper, transforming it into Cypher queries, and merging it with the existing graph. This can be scaled by using an LLM to process and generate the necessary Cypher queries dynamically for any new paper.
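
For the merge in Step 3, Cypher's MERGE either matches an existing pattern or creates it, which covers the duplicate-node case. The sketch below turns one triple into a parameterised MERGE statement. It assumes nodes are keyed by the `uri` property on the `Resource` label, as in the schema printed earlier; `to_rel_type` and `triple_to_merge` are hypothetical helpers. The relationship type is sanitised in Python because Cypher does not accept it as a query parameter.

```python
import re


def to_rel_type(predicate: str) -> str:
    """Turn free text into an upper-case relationship type, e.g. 'IS_PART_OF'."""
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", predicate).strip("_").upper()
    if not cleaned:
        raise ValueError(f"cannot derive a relationship type from {predicate!r}")
    return cleaned


def triple_to_merge(subject: str, predicate: str, obj: str):
    """Return (cypher, params) that upserts both nodes and the relationship."""
    rel_type = to_rel_type(predicate)
    cypher = (
        "MERGE (s:Resource {uri: $subject})\n"
        "MERGE (o:Resource {uri: $object})\n"
        f"MERGE (s)-[:`{rel_type}`]->(o)"
    )
    return cypher, {"subject": subject, "object": obj}


# Hypothetical triple; the node names are for illustration only.
cypher, params = triple_to_merge("GAN_Generator", "feeds into", "GAN_Discriminator")
print(cypher)
print(params)
```

The returned pair can be passed to a driver session as `session.run(cypher, params)`, so node names never need to be spliced into the query string.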

In [None]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.documents import Document

# Load the paper; PyPDFLoader returns one Document per page.
pdf_loader = PyPDFLoader("papers/AAE.pdf")
documents = pdf_loader.load()

# Ask the LLM to extract nodes and relationships from each page.
graph_documents = llm_transformer.convert_to_graph_documents(documents)
print(f"Nodes: {graph_documents[0].nodes}")
print(f"Relationships: {graph_documents[0].relationships}")



In [None]:
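# Write the extracted nodes and relationships into the existing graph.
# baseEntityLabel adds a shared __Entity__ label to every node;
# include_source links each node to the document it was extracted from.
graph.add_graph_documents(
    graph_documents,
    baseEntityLabel=True,
    include_source=True
)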