# Working with Memgraph and Obsidian for Knowledge Graphs and RAG (Powered by LangChain)

This notebook demonstrates how to use Memgraph and Obsidian for Knowledge Graphs and RAG (Powered by LangChain).  This is a work in progress and highly experimental, so the code you see here is subject to change as I refine (and sometimes on a whim when I want to try something new).  The goal is to create a seamless integration between Memgraph and Obsidian for creating and querying knowledge graphs.  The integration is powered by LangChain, which will be adding additional functionality with LLMs such as GPT-4o, Llama-3, and others.

## Introduction

### What is Memgraph?
Memgraph is a high-performance, in-memory graph database that is designed to be fast, scalable, and easy to use.  It is a great tool for creating and querying knowledge graphs.  You can learn more about Memgraph at [https://memgraph.com/](https://memgraph.com/).

### What is Obsidian?
Obsidian is a powerful knowledge management tool that allows you to create and organize your notes, ideas, and knowledge in a graph-like structure.  It is a great tool for creating and visualizing knowledge graphs.  You can learn more about Obsidian at [https://obsidian.md/](https://obsidian.md/).

### What is LangChain?
LangChain is a framework for developing applications powered by large language models (LLMs).

LangChain simplifies every stage of the LLM application lifecycle:

- **Development**: Build your applications using LangChain's open-source [building blocks](https://python.langchain.com/v0.2/docs/concepts/#langchain-expression-language) and [components](https://python.langchain.com/v0.2/docs/concepts/). Hit the ground running using [third-party integrations(https://python.langchain.com/v0.2/docs/integrations/platforms/)] and [Templates](https://python.langchain.com/v0.2/docs/templates/).
- **Productionization**: Use [LangSmith](https://docs.smith.langchain.com/) to inspect, monitor and evaluate your chains, so that you can continuously optimize and deploy with confidence.
- Deployment: Turn any chain into an API with [LangServe](https://python.langchain.com/v0.2/docs/langserve/).

![](https://python.langchain.com/v0.2/svg/langchain_stack_dark.svg)


Concretely, the framework consists of the following open-source libraries:

- **langchain-core**: Base abstractions and LangChain Expression Language.
- **langchain-community**: Third party integrations.
  - Partner packages (e.g. **langchain-openai**, **langchain-anthropic**, etc.): Some integrations have been further split into their own lightweight packages that only depend on langchain-core.
- langchain: Chains, agents, and retrieval strategies that make up an application's cognitive architecture.
- langgraph: Build robust and stateful multi-actor applications with LLMs by modeling steps as edges and nodes in a graph.
- langserve: Deploy LangChain chains as REST APIs.
- LangSmith: A developer platform that lets you debug, test, evaluate, and monitor LLM applications.

### Bor and ODIN

Much of this is based on **Bor** which is a backend that powers the **ODIN** or **RUNE** front-ends.  However, I am looking more than just extending Obsidian, so I will be adjusting and improving on this design for my own purposes.  I will be using the **Bor** backend as a starting point, but I will be making significant changes to it as I go along.

For more information you can view the GitHub repositories at:
- [Bor](https://github.com/memgraph/bor)
- [ODIN](https://github.com/memgraph/odin)



In [None]:
# !git clone https://github.com/memgraph/odin.git
# !git clone https://github.com/memgraph/bor.git
# %pip install -r /Users/dcarmocan/Projects/Reports/report_notebooks/bor/requirements.txt
# %pip install langchain --upgrade
# %pip install langchain-openai --upgrade
# %pip install pandasai --upgrade
# # %pip install pandas --upgrade
# %pip install datasets --upgrade
# %pip install transformers --upgrade
# %pip install torch --upgrade
# %pip install torchtext --upgrade
# %pip install sentence-transformers --upgrade
# %pip install "fsspec[http]" --upgrade

# %pip install "gqlalchemy[all]"
# %pip install "gqlalchemy[all]" --upgrade
# %pip install "gqlalchemy[torch_pyg]"
# %pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.13.0+cpu.html"

# Imports and Config

## Installing Memgraph

First things first, we need to install Memgraph.  You can download Memgraph from [https://memgraph.com/download](https://memgraph.com/download), however, for what we are doing, it is easier to run Memgraph Platform and Lab in a Docker compose environment.  You can find the instructions for this at [https://memgraph.com/docs/memgraph-lab/installation/docker-compose](https://memgraph.com/docs/memgraph-lab/installation/docker-compose).

For my settings, I just followed the standard instructions for running Memgraph Platform and Lab in a Docker compose environment. and then added some adjustments. 

This is an example of the `docker-compose.yml` file I used:

```yaml
version: "3.8"
services:
  memgraph:
    image: memgraph/memgraph-mage:latest
    container_name: memgraph-mage
    pull_policy: always
    ports:
      - "7687:7687"
      - "7444:7444"
    command: ["--log-level=TRACE"]
    networks:
      - memgraph-network

  memgraph-lab:
    image: memgraph/lab:latest
    container_name: memgraph-lab
    pull_policy: always
    ports:
      - "3000:3000"
    depends_on:
      - memgraph
    environment:
      QUICK_CONNECT_MG_HOST: memgraph
      QUICK_CONNECT_MG_PORT: 7687
    networks:
      - memgraph-network

networks:
  memgraph-network:
    external: true
    name: memgraph-platform_memgraph-network

```

The main thing that I wanted to be sure of is that the Obsidian path would be mounted as a volume in the Memgraph Lab container.  This is the line that I added to the `memgraph-lab` service:

```yaml
    volumes:
      - /path/to/obsidian:/root/.config/obsidian
```



## Imports

In [None]:
from __future__ import annotations
from typing import Optional, Dict, List, Any, Iterator
import os
import sys
import logging
import pandas as pd
import numpy as np

import requests
import json
import time
from pathlib import Path
from gqlalchemy import Memgraph


import os
import re
from typing import List
from gqlalchemy import Memgraph
from langchain.agents import Tool, initialize_agent, AgentType
from langchain.chat_models import ChatOpenAI
from langchain.prompts import MessagesPlaceholder, PromptTemplate
from langchain.memory import ConversationBufferMemory
from langchain.schema import SystemMessage


# notes_dir = Path("/Users/dcarmocan/Projects/mind-cave")
notes_dir = "/Users/dcarmocan/Projects/mind-cave"

In [None]:
notes_dir

## Our Components

In order to rebuild this, we need to understand the components that we are working with.  The main components are:

- **MemgraphManager**
- **Constants**
- **CollectionManager**
- **CollectionManager**
- **CypherQueryHandler**
- **VaultManager**
- **GeneralQueryAgent**

### The `MemgraphManager` Class


In [None]:
# import os
# import mgclient
# from markdown import Markdown
# from markdown.treeprocessors import Treeprocessor
# from markdown.extensions import Extension
# import re
# import datetime

# # Initialize the MemgraphManager
# class MemgraphManager:
#     def __init__(self):
#         try:
#             self.connection = mgclient.connect(host='localhost', port=7687, lazy=False)
#             self.connection.autocommit = True
#         except Exception as e:
#             print(f"Failed to connect to Memgraph: {e}")
#             raise

#     def execute_query(self, query: str):
#         print(f"Executing query: {query}")
#         try:
#             cursor = self.connection.cursor()
#             cursor.execute(query)
#             results = cursor.fetchall()
#             cursor.close()
#             print("Query executed successfully.")
#             return results
#         except Exception as e:
#             print(f"Failed to execute query: {e}")
#             if cursor:
#                 cursor.close()
#             raise

#     def create_node(self, label: str, properties: dict):
#         props = ", ".join([f"{k}: '{v}'" for k, v in properties.items()])
#         query = f"CREATE (n:{label} {{{props}}})"
#         self.execute_query(query)

#     def create_edge(self, from_node: str, to_node: str, relationship: str):
#         query = f"""
#         MATCH (a), (b)
#         WHERE a.name = '{from_node}' AND b.name = '{to_node}'
#         CREATE (a)-[r:{relationship}]->(b)
#         """
#         self.execute_query(query)

# # Custom Markdown extension to extract headings
# class HeadingExtractor(Treeprocessor):
#     def __init__(self, md):
#         super().__init__(md)
#         self.headings = []

#     def run(self, root):
#         for element in root:
#             if element.tag.startswith('h'):
#                 self.headings.append((element.tag, element.text))
#         return root

# class ExtractHeadings(Extension):
#     def extendMarkdown(self, md):
#         heading_extractor = HeadingExtractor(md)
#         md.treeprocessors.register(heading_extractor, 'extract_headings', 10)
#         md.heading_extractor = heading_extractor

# # Function to parse a markdown file and extract headings and links
# def parse_markdown(file_path):
#     md = Markdown(extensions=[ExtractHeadings()])
#     with open(file_path, 'r', encoding='utf-8') as f:
#         content = f.read()
#     md.convert(content)
#     links = re.findall(r'\[.*?\]\((.*?)\)', content)
#     return md.heading_extractor.headings, links

# # Function to sanitize heading text for Cypher queries
# def sanitize_text(text):
#     if text is None:
#         return "None"
#     sanitized = text.replace("'", "\\'").replace("#", "").replace("?", "")
#     return sanitized

# # Path to the directory containing markdown files (Obsidian vault)
# notes_dir = '/Users/dcarmocan/Projects/mind-cave'

# # Initialize the MemgraphManager
# mg_manager = MemgraphManager()

# # Process each markdown file in the directory
# for root, dirs, files in os.walk(notes_dir):
#     for file in files:
#         if file.endswith('.md'):
#             file_path = os.path.join(root, file)
#             print(f"Processing file: {file_path}")
            
#             try:
#                 # Extract headings and links
#                 headings, links = parse_markdown(file_path)
#                 file_name = os.path.basename(file_path)
                
#                 # Create a node for the file
#                 mg_manager.create_node('File', {'name': file_name})
                
#                 # Create nodes and edges for each heading
#                 previous_heading = file_name
#                 for tag, heading in headings:
#                     sanitized_heading = sanitize_text(heading)
#                     heading_node = f"{file_name}_{sanitized_heading}"
#                     mg_manager.create_node(tag, {'name': heading_node})
#                     mg_manager.create_edge(previous_heading, heading_node, 'CONTAINS')
#                     previous_heading = heading_node

#                 # Create edges for links to other files
#                 for link in links:
#                     linked_file = os.path.basename(link)
#                     mg_manager.create_edge(file_name, linked_file, 'LINKS_TO')
#             except Exception as e:
#                 print(f"Error processing file {file_path}: {e}")


In [None]:
# import os
# import mgclient
# from markdown import Markdown
# from markdown.treeprocessors import Treeprocessor
# from markdown.extensions import Extension
# import re
# import datetime

# # Initialize the MemgraphManager
# class MemgraphManager:
#     def __init__(self):
#         try:
#             self.connection = mgclient.connect(host='localhost', port=7687, lazy=False)
#             self.connection.autocommit = True
#         except Exception as e:
#             print(f"Failed to connect to Memgraph: {e}")
#             raise

#     def execute_query(self, query: str):
#         print(f"Executing query: {query}")
#         try:
#             cursor = self.connection.cursor()
#             cursor.execute(query)
#             results = cursor.fetchall()
#             cursor.close()
#             print("Query executed successfully.")
#             return results
#         except Exception as e:
#             print(f"Failed to execute query: {e}")
#             if cursor:
#                 cursor.close()
#             raise

#     def find_duplicates(self):
#         query = """
#         MATCH (n)
#         WITH n.name AS name, collect(n) AS nodes
#         WHERE size(nodes) > 1
#         RETURN name, nodes
#         """
#         return self.execute_query(query)

#     def merge_nodes(self, name):
#         merge_query_corrected = f"""
#         CALL merge.node(
#             ['File'], 
#             {{name: '{name}'}},
#             {{}},
#             {{}}
#         ) YIELD node
#         RETURN node
#         """
#         self.execute_query(merge_query_corrected)

#     def merge_duplicates(self):
#         duplicates = self.find_duplicates()
#         for name, _ in duplicates:
#             self.merge_nodes(name)

#     def create_node(self, label: str, properties: dict):
#         props = ", ".join([f"{k}: '{v}'" for k, v in properties.items()])
#         query = f"CREATE (n:{label} {{{props}}})"
#         self.execute_query(query)

#     def create_edge(self, from_node: str, to_node: str, relationship: str):
#         query = f"""
#         MATCH (a), (b)
#         WHERE a.name = '{from_node}' AND b.name = '{to_node}'
#         CREATE (a)-[r:{relationship}]->(b)
#         """
#         self.execute_query(query)

#     def search_nodes(self, label: str, property_name: str, property_value: str):
#         query = f"MATCH (n:{label} {{{property_name}: '{property_value}'}}) RETURN n"
#         return self.execute_query(query)

#     def search_relationships(self, relationship: str):
#         query = f"MATCH ()-[r:{relationship}]->() RETURN r"
#         return self.execute_query(query)


# # Custom Markdown extension to extract headings
# class HeadingExtractor(Treeprocessor):
#     def __init__(self, md):
#         super().__init__(md)
#         self.headings = []

#     def run(self, root):
#         for element in root:
#             if element.tag.startswith('h'):
#                 self.headings.append((element.tag, element.text))
#         return root

# class ExtractHeadings(Extension):
#     def extendMarkdown(self, md):
#         heading_extractor = HeadingExtractor(md)
#         md.treeprocessors.register(heading_extractor, 'extract_headings', 10)
#         md.heading_extractor = heading_extractor

# # Function to parse a markdown file and extract headings and links
# def parse_markdown(file_path):
#     md = Markdown(extensions=[ExtractHeadings()])
#     with open(file_path, 'r', encoding='utf-8') as f:
#         content = f.read()
#     md.convert(content)
#     links = re.findall(r'\[.*?\]\((.*?)\)', content)
#     return md.heading_extractor.headings, links

# # Function to sanitize heading text for Cypher queries
# def sanitize_text(text):
#     if text is None:
#         return "None"
#     sanitized = text.replace("'", "\\'").replace("#", "").replace("?", "")
#     return sanitized

# # Path to the directory containing markdown files (Obsidian vault)
# notes_dir = '/Users/dcarmocan/Projects/mind-cave'

In [None]:
import os
import mgclient
from markdown import Markdown
from markdown.treeprocessors import Treeprocessor
from markdown.extensions import Extension
import re

# Initialize the MemgraphManager
class MemgraphManager:
    def __init__(self):
        try:
            self.connection = mgclient.connect(host='localhost', port=7687, lazy=False)
            self.connection.autocommit = True
        except Exception as e:
            print(f"Failed to connect to Memgraph: {e}")
            raise

    def execute_query(self, query: str):
        print(f"Executing query: {query}")
        try:
            cursor = self.connection.cursor()
            cursor.execute(query)
            results = cursor.fetchall()
            cursor.close()
            print("Query executed successfully.")
            return results
        except Exception as e:
            print(f"Failed to execute query: {e}")
            if cursor:
                cursor.close()
            raise

    def find_duplicates(self):
        query = """
        MATCH (n)
        WITH n.name AS name, collect(n) AS nodes
        WHERE size(nodes) > 1
        RETURN name, nodes
        """
        return self.execute_query(query)

    def merge_nodes(self, name):
        sanitized_name = sanitize_text(name)
        merge_query_corrected = f"""
        CALL merge.node(
            ['File'], 
            {{name: '{sanitized_name}'}},
            {{}},
            {{}}
        ) YIELD node
        RETURN node
        """
        self.execute_query(merge_query_corrected)

    def merge_duplicates(self):
        duplicates = self.find_duplicates()
        for name, _ in duplicates:
            try:
                self.merge_nodes(name)
            except Exception as e:
                print(f"Failed to merge nodes with name '{name}': {e}")

    def create_node(self, label: str, properties: dict):
        props = ", ".join([f"{k}: '{sanitize_text(v)}'" for k, v in properties.items()])
        query = f"CREATE (n:{label} {{{props}}})"
        self.execute_query(query)

    def create_edge(self, from_node: str, to_node: str, relationship: str):
        query = f"""
        MATCH (a), (b)
        WHERE a.name = '{sanitize_text(from_node)}' AND b.name = '{sanitize_text(to_node)}'
        CREATE (a)-[r:{relationship}]->(b)
        """
        self.execute_query(query)

    def search_nodes(self, label: str, property_name: str, property_value: str):
        query = f"MATCH (n:{label} {{{property_name}: '{property_value}'}}) RETURN n"
        return self.execute_query(query)

    def search_relationships(self, relationship: str):
        query = f"MATCH ()-[r:{relationship}]->() RETURN r"
        return self.execute_query(query)

# Custom Markdown extension to extract headings
class HeadingExtractor(Treeprocessor):
    def __init__(self, md):
        super().__init__(md)
        self.headings = []

    def run(self, root):
        for element in root:
            if element.tag.startswith('h'):
                self.headings.append((element.tag, element.text))
        return root

class ExtractHeadings(Extension):
    def extendMarkdown(self, md):
        heading_extractor = HeadingExtractor(md)
        md.treeprocessors.register(heading_extractor, 'extract_headings', 10)
        md.heading_extractor = heading_extractor

# Function to parse a markdown file and extract headings and links
def parse_markdown(file_path):
    md = Markdown(extensions=[ExtractHeadings()])
    with open(file_path, 'r', encoding='utf-8') as f:
        content = f.read()
    md.convert(content)
    links = re.findall(r'\[.*?\]\((.*?)\)', content)
    return md.heading_extractor.headings, links

# Function to sanitize text for Cypher queries
def sanitize_text(text):
    if text is None:
        return "None"
    sanitized = re.sub(r"[^a-zA-Z0-9_\s]", "", text.replace("'", "\\'").replace("#", "").replace("?", ""))
    return sanitized

In [107]:

# Path to the directory containing markdown files (Obsidian vault)
notes_dir = '/Users/dcarmocan/Projects/mind-cave'

# Initialize the MemgraphManager
mg_manager = MemgraphManager()

# Process each markdown file in the directory
for root, dirs, files in os.walk(notes_dir):
    for file in files:
        if file.endswith('.md'):
            file_path = os.path.join(root, file)
            print(f"Processing file: {file_path}")
            
            # Extract headings and links
            headings, links = parse_markdown(file_path)
            file_name = os.path.basename(file_path)
            
            # Create a node for the file
            mg_manager.create_node('File', {'name': file_name})
            
            # Create nodes and edges for each heading
            previous_heading = file_name
            for tag, heading in headings:
                sanitized_heading = sanitize_text(heading)
                heading_node = f"{file_name}_{sanitized_heading}"
                mg_manager.create_node(tag, {'name': heading_node})
                mg_manager.create_edge(previous_heading, heading_node, 'CONTAINS')
                previous_heading = heading_node

            # Create edges for links to other files
            for link in links:
                link_file_name = os.path.basename(link)
                mg_manager.create_edge(file_name, link_file_name, 'LINKS_TO')

# Merge duplicate nodes
mg_manager.merge_duplicates()

Processing file: /Users/dcarmocan/Projects/mind-cave/SPOG.md
Executing query: CREATE (n:File {name: 'SPOG.md'})
Query executed successfully.
Executing query: CREATE (n:hr {name: 'SPOG.md_None'})
Query executed successfully.
Executing query: 
        MATCH (a:File {name: 'SPOG.md'}), (b:File {name: 'SPOG.md_None'})
        CREATE (a)-[r:CONTAINS]->(b)
        
Query executed successfully.
Executing query: CREATE (n:hr {name: 'SPOG.md_None'})
Query executed successfully.
Executing query: 
        MATCH (a:File {name: 'SPOG.md_None'}), (b:File {name: 'SPOG.md_None'})
        CREATE (a)-[r:CONTAINS]->(b)
        
Query executed successfully.
Executing query: CREATE (n:h3 {name: 'SPOG.md_SPOG'})
Query executed successfully.
Executing query: 
        MATCH (a:File {name: 'SPOG.md_None'}), (b:File {name: 'SPOG.md_SPOG'})
        CREATE (a)-[r:CONTAINS]->(b)
        
Query executed successfully.
Executing query: CREATE (n:h4 {name: 'SPOG.md_Zoom SPOG'})
Query executed successfully.
Executing q

#### Adding More Methods for Indexes and Constraints


In [None]:
import mgclient
import re

class MemgraphManager:
    def __init__(self):
        try:
            self.connection = mgclient.connect(host='localhost', port=7687, lazy=False)
            self.connection.autocommit = True
        except Exception as e:
            print(f"Failed to connect to Memgraph: {e}")
            raise

    def execute_query(self, query: str):
        print(f"Executing query: {query}")
        try:
            cursor = self.connection.cursor()
            cursor.execute(query)
            results = cursor.fetchall()
            cursor.close()
            print("Query executed successfully.")
            return results
        except Exception as e:
            print(f"Failed to execute query: {e}")
            if cursor:
                cursor.close()
            raise

    def find_duplicates(self):
        query = """
        MATCH (n:File)
        WITH n.name AS name, COUNT(n) AS count
        WHERE count > 1
        RETURN name, count
        """
        return self.execute_query(query)

    def merge_nodes(self, name):
        merge_query = f"""
        MATCH (n:File {{name: '{name}'}})
        WITH COLLECT(n) AS nodes
        CALL {{
            WITH nodes
            UNWIND nodes AS n
            WITH n LIMIT 1
            RETURN n
        }}
        WITH nodes, n AS to_keep
        UNWIND nodes AS n
        WITH n, to_keep WHERE n <> to_keep
        OPTIONAL MATCH (n)-[r]-()
        DELETE r
        DETACH DELETE n
        RETURN to_keep
        """
        self.execute_query(merge_query)

    def merge_duplicates(self):
        duplicates = self.find_duplicates()
        for name, _ in duplicates:
            self.merge_nodes(name)

    def create_node(self, label: str, properties: dict):
        props = ", ".join([f"{k}: '{v}'" for k, v in properties.items()])
        query = f"CREATE (n:{label} {{{props}}})"
        self.execute_query(query)

    def create_edge(self, from_node: str, to_node: str, relationship: str):
        query = f"""
        MATCH (a:File {{name: '{from_node}'}}), (b:File {{name: '{to_node}'}})
        CREATE (a)-[r:{relationship}]->(b)
        """
        self.execute_query(query)

    def sanitize_text(self, text):
        if text is None:
            return "None"
        sanitized = text.replace("'", "\\'").replace("#", "").replace("?", "")
        return sanitized


In [108]:
import os
import mgclient
from markdown import Markdown
from markdown.treeprocessors import Treeprocessor
from markdown.extensions import Extension
import re

class MemgraphManager:
    def __init__(self):
        try:
            self.connection = mgclient.connect(host='localhost', port=7687, lazy=False)
            self.connection.autocommit = True
        except Exception as e:
            print(f"Failed to connect to Memgraph: {e}")
            raise

    def execute_query(self, query: str):
        print(f"Executing query: {query}")
        try:
            cursor = self.connection.cursor()
            cursor.execute(query)
            results = cursor.fetchall()
            cursor.close()
            print("Query executed successfully.")
            return results
        except Exception as e:
            print(f"Failed to execute query: {e}")
            if cursor:
                cursor.close()
            raise

    def find_duplicates(self):
        query = """
        MATCH (n:File)
        WITH n.name AS name, COUNT(n) AS count
        WHERE count > 1
        RETURN name, count
        """
        return self.execute_query(query)

    def merge_nodes(self, name):
        merge_query = f"""
        MATCH (n:File {{name: '{name}'}})
        WITH COLLECT(n) AS nodes
        CALL {{
            WITH nodes
            UNWIND nodes AS n
            WITH n LIMIT 1
            RETURN n
        }}
        WITH nodes, n AS to_keep
        UNWIND nodes AS n
        WITH n, to_keep WHERE n <> to_keep
        OPTIONAL MATCH (n)-[r]-()
        DELETE r
        DETACH DELETE n
        RETURN to_keep
        """
        self.execute_query(merge_query)

    def merge_duplicates(self):
        duplicates = self.find_duplicates()
        for name, _ in duplicates:
            self.merge_nodes(name)

    def create_node(self, label: str, properties: dict):
        props = ", ".join([f"{k}: '{v}'" for k, v in properties.items()])
        query = f"CREATE (n:{label} {{{props}}})"
        self.execute_query(query)

    def create_edge(self, from_node: str, to_node: str, relationship: str):
        query = f"""
        MATCH (a:File {{name: '{from_node}'}}), (b:File {{name: '{to_node}'}})
        CREATE (a)-[r:{relationship}]->(b)
        """
        self.execute_query(query)

    def sanitize_text(self, text):
        if text is None:
            return "None"
        sanitized = text.replace("'", "\\'").replace("#", "").replace("?", "")
        return sanitized

# Custom Markdown extension to extract headings
class HeadingExtractor(Treeprocessor):
    def __init__(self, md):
        super().__init__(md)
        self.headings = []

    def run(self, root):
        for element in root:
            if element.tag.startswith('h'):
                self.headings.append((element.tag, element.text))
        return root

class ExtractHeadings(Extension):
    def extendMarkdown(self, md):
        heading_extractor = HeadingExtractor(md)
        md.treeprocessors.register(heading_extractor, 'extract_headings', 10)
        md.heading_extractor = heading_extractor

# Function to parse a markdown file and extract headings and links
def parse_markdown(file_path):
    md = Markdown(extensions=[ExtractHeadings()])
    with open(file_path, 'r', encoding='utf-8') as f:
        content = f.read()
    md.convert(content)
    links = re.findall(r'\[.*?\]\((.*?)\)', content)
    return md.heading_extractor.headings, links

# Path to the directory containing markdown files (Obsidian vault)
notes_dir = '/Users/dcarmocan/Projects/mind-cave'

# Initialize the MemgraphManager
mg_manager = MemgraphManager()

# Process each markdown file in the directory
for root, dirs, files in os.walk(notes_dir):
    for file in files:
        if file.endswith('.md'):
            file_path = os.path.join(root, file)
            print(f"Processing file: {file_path}")
            
            # Extract headings and links
            headings, links = parse_markdown(file_path)
            file_name = os.path.basename(file_path)
            
            # Create a node for the file
            mg_manager.create_node('File', {'name': file_name})
            
            # Create nodes and edges for each heading
            previous_heading = file_name
            for tag, heading in headings:
                sanitized_heading = mg_manager.sanitize_text(heading)
                heading_node = f"{file_name}_{sanitized_heading}"
                mg_manager.create_node(tag, {'name': heading_node})
                mg_manager.create_edge(previous_heading, heading_node, 'CONTAINS')
                previous_heading = heading_node
            
            # Create edges for each link
            for link in links:
                link_file_name = os.path.basename(link)
                mg_manager.create_edge(file_name, link_file_name, 'LINKS_TO')

# Merge duplicate nodes
mg_manager.merge_duplicates()

# Add indexes and constraints
def add_indexes_and_constraints(mg_manager):
    index_query = "CREATE INDEX ON :File(name)"
    mg_manager.execute_query(index_query)

    # Handle duplicates before applying the unique constraint
    mg_manager.merge_duplicates()

    # Ensure 'name' property is unique for 'File' nodes
    constraint_query = "CREATE CONSTRAINT ON (f:File) ASSERT f.name IS UNIQUE"
    mg_manager.execute_query(constraint_query)

add_indexes_and_constraints(mg_manager)


Processing file: /Users/dcarmocan/Projects/mind-cave/SPOG.md
Executing query: CREATE (n:File {name: 'SPOG.md'})
Query executed successfully.
Executing query: CREATE (n:hr {name: 'SPOG.md_None'})
Query executed successfully.
Executing query: 
        MATCH (a:File {name: 'SPOG.md'}), (b:File {name: 'SPOG.md_None'})
        CREATE (a)-[r:CONTAINS]->(b)
        
Query executed successfully.
Executing query: CREATE (n:hr {name: 'SPOG.md_None'})
Query executed successfully.
Executing query: 
        MATCH (a:File {name: 'SPOG.md_None'}), (b:File {name: 'SPOG.md_None'})
        CREATE (a)-[r:CONTAINS]->(b)
        
Query executed successfully.
Executing query: CREATE (n:h3 {name: 'SPOG.md_SPOG'})
Query executed successfully.
Executing query: 
        MATCH (a:File {name: 'SPOG.md_None'}), (b:File {name: 'SPOG.md_SPOG'})
        CREATE (a)-[r:CONTAINS]->(b)
        
Query executed successfully.
Executing query: CREATE (n:h4 {name: 'SPOG.md_Zoom SPOG'})
Query executed successfully.
Executing q

#### Memgraph Validation

In [113]:
import os
import mgclient
from markdown import Markdown
from markdown.treeprocessors import Treeprocessor
from markdown.extensions import Extension
import re
import datetime
from mgclient import DatabaseError

class MemgraphManager:
    def __init__(self):
        try:
            self.connection = mgclient.connect(host='localhost', port=7687, lazy=False)
            self.connection.autocommit = True
        except Exception as e:
            print(f"Failed to connect to Memgraph: {e}")
            raise

    def execute_query(self, query: str):
        print(f"Executing query: {query}")
        try:
            cursor = self.connection.cursor()
            cursor.execute(query)
            results = cursor.fetchall()
            cursor.close()
            print("Query executed successfully.")
            return results
        except Exception as e:
            print(f"Failed to execute query: {e}")
            if cursor:
                cursor.close()
            raise

    def find_duplicates(self):
        query = """
        MATCH (n:File)
        WITH n.name AS name, COUNT(n) AS count
        WHERE count > 1
        RETURN name, count
        """
        return self.execute_query(query)

    def merge_nodes(self, name):
        merge_query = f"""
        MATCH (n:File {{name: '{name}'}})
        WITH COLLECT(n) AS nodes
        CALL {{
            WITH nodes
            UNWIND nodes AS n
            WITH n LIMIT 1
            RETURN n
        }} AS kept_node
        WITH nodes, kept_node.n AS to_keep
        UNWIND nodes AS n
        WITH n, to_keep WHERE n <> to_keep
        OPTIONAL MATCH (n)-[r]-()
        DELETE r
        DETACH DELETE n
        RETURN to_keep
        """
        self.execute_query(merge_query)

    def merge_duplicates(self):
        duplicates = self.find_duplicates()
        for name, _ in duplicates:
            self.merge_nodes(name)

    def create_node(self, label: str, properties: dict):
        props = ", ".join([f"{k}: '{v}'" for k, v in properties.items()])
        query = f"CREATE (n:{label} {{{props}}})"
        try:
            self.execute_query(query)
        except DatabaseError as e:
            if "unique constraint violation" in str(e):
                print(f"Node {label} with properties {properties} already exists. Skipping creation.")
            else:
                raise

    def create_edge(self, from_node: str, to_node: str, relationship: str):
        query = f"""
        MATCH (a), (b)
        WHERE a.name = '{from_node}' AND b.name = '{to_node}'
        CREATE (a)-[r:{relationship}]->(b)
        """
        self.execute_query(query)

class HeadingExtractor(Treeprocessor):
    def __init__(self, md):
        super().__init__(md)
        self.headings = []

    def run(self, root):
        for element in root:
            if element.tag.startswith('h'):
                self.headings.append((element.tag, element.text))
        return root

class ExtractHeadings(Extension):
    def extendMarkdown(self, md):
        heading_extractor = HeadingExtractor(md)
        md.treeprocessors.register(heading_extractor, 'extract_headings', 10)
        md.heading_extractor = heading_extractor

def parse_markdown(file_path):
    md = Markdown(extensions=[ExtractHeadings()])
    with open(file_path, 'r', encoding='utf-8') as f:
        content = f.read()
    md.convert(content)
    links = re.findall(r'\[.*?\]\((.*?)\)', content)
    return md.heading_extractor.headings, links

def sanitize_text(text):
    if text is None:
        return "None"
    sanitized = text.replace("'", "\\'").replace("#", "").replace("?", "")
    return sanitized

def add_indexes_and_constraints(mg_manager):
    index_query = "CREATE INDEX ON :File(name)"
    mg_manager.execute_query(index_query)

    mg_manager.merge_duplicates()

    constraint_query = "CREATE CONSTRAINT ON (f:File) ASSERT f.name IS UNIQUE"
    mg_manager.execute_query(constraint_query)

# Path to the directory containing markdown files (Obsidian vault)
notes_dir = '/Users/dcarmocan/Projects/mind-cave'

# Initialize the MemgraphManager
mg_manager = MemgraphManager()

# Add indexes and constraints
add_indexes_and_constraints(mg_manager)

# Process each markdown file in the directory
for root, dirs, files in os.walk(notes_dir):
    for file in files:
        if file.endswith('.md'):
            file_path = os.path.join(root, file)
            print(f"Processing file: {file_path}")
            
            # Extract headings and links
            headings, links = parse_markdown(file_path)
            file_name = os.path.basename(file_path)
            
            # Create a node for the file
            mg_manager.create_node('File', {'name': file_name})
            
            # Create nodes and edges for each heading
            previous_heading = file_name
            for tag, heading in headings:
                sanitized_heading = sanitize_text(heading)
                if tag == 'hr':
                    print(f"Skipping creation of 'hr' node for heading: {heading}")
                    continue
                heading_node = f"{file_name}_{sanitized_heading}"
                mg_manager.create_node(tag, {'name': heading_node})
                mg_manager.create_edge(previous_heading, heading_node, 'CONTAINS')
                previous_heading = heading_node

            # Create edges for links to other files
            for link in links:
                link_file_name = os.path.basename(link)
                mg_manager.create_edge(file_name, link_file_name, 'LINKS_TO')

# Queries to validate the relationships
print("Validating relationships between File to File and File to h1")
mg_manager.execute_query("MATCH (n:File)-[r]->(m:File) RETURN n, r, m LIMIT 100;")
mg_manager.execute_query("MATCH (n:File)-[r]->(m:h1) RETURN n, r, m LIMIT 100;")
mg_manager.execute_query("MATCH (n:File)-[r]->(m) WHERE m:File OR m:h1 RETURN n, r, m LIMIT 100;")
mg_manager.execute_query("MATCH (n:File)-[r]->(m) RETURN n, r, m LIMIT 100;")


Executing query: CREATE INDEX ON :File(name)
Query executed successfully.
Executing query: 
        MATCH (n:File)
        WITH n.name AS name, COUNT(n) AS count
        WHERE count > 1
        RETURN name, count
        
Query executed successfully.
Executing query: CREATE CONSTRAINT ON (f:File) ASSERT f.name IS UNIQUE
Query executed successfully.
Processing file: /Users/dcarmocan/Projects/mind-cave/SPOG.md
Executing query: CREATE (n:File {name: 'SPOG.md'})
Failed to execute query: Unable to commit due to unique constraint violation on :File(name)
Node File with properties {'name': 'SPOG.md'} already exists. Skipping creation.
Skipping creation of 'hr' node for heading: None
Skipping creation of 'hr' node for heading: None
Executing query: CREATE (n:h3 {name: 'SPOG.md_SPOG'})
Query executed successfully.
Executing query: 
        MATCH (a), (b)
        WHERE a.name = 'SPOG.md' AND b.name = 'SPOG.md_SPOG'
        CREATE (a)-[r:CONTAINS]->(b)
        
Query executed successfully.
Execut

[(<mgclient.Node(id=21736, labels={'File'}, properties={'name': 'SPOG.md'}) at 0x17168eee0>,
  <mgclient.Relationship(start_id=21736, end_id=21737, type='CONTAINS', properties={}) at 0x177ae20f0>,
  <mgclient.Node(id=21737, labels={'hr'}, properties={'name': 'SPOG.md_None'}) at 0x1605eac70>),
 (<mgclient.Node(id=21736, labels={'File'}, properties={'name': 'SPOG.md'}) at 0x177d42cd0>,
  <mgclient.Relationship(start_id=21736, end_id=21738, type='CONTAINS', properties={}) at 0x1611d7970>,
  <mgclient.Node(id=21738, labels={'hr'}, properties={'name': 'SPOG.md_None'}) at 0x177d409f0>),
 (<mgclient.Node(id=21736, labels={'File'}, properties={'name': 'SPOG.md'}) at 0x109364e70>,
  <mgclient.Relationship(start_id=21736, end_id=24071, type='CONTAINS', properties={}) at 0x1611d7b30>,
  <mgclient.Node(id=24071, labels={'hr'}, properties={'name': 'SPOG.md_None'}) at 0x1093676f0>),
 (<mgclient.Node(id=21736, labels={'File'}, properties={'name': 'SPOG.md'}) at 0x1093652c0>,
  <mgclient.Relationship(

In [130]:
import memgraph

class MemgraphManager:
    def __init__(self, host='127.0.0.1', port=7687):
        self.connection = memgraph.connect(host=host, port=port)

    def execute_query(self, query):
        try:
            cursor = self.connection.cursor()
            cursor.execute(query)
            results = cursor.fetchall()
            cursor.close()
            return results
        except memgraph.DatabaseError as e:
            print(f"Failed to execute query: {e}")
            raise e

    def execute_and_fetch(self, query, params=None):
        try:
            cursor = self.connection.cursor()
            cursor.execute(query, params)
            results = cursor.fetchall()
            cursor.close()
            return results
        except memgraph.DatabaseError as e:
            print(f"Failed to execute query: {e}")
            raise e

    def find_duplicates(self):
        query = """
        MATCH (n:File)
        WITH n.name AS name, COUNT(n) AS count
        WHERE count > 1
        RETURN name, count
        """
        return self.execute_query(query)

    def merge_nodes(self, name):
        merge_query = f"""
        MATCH (n:File {{name: '{name}'}})
        WITH COLLECT(n) AS nodes
        CALL {{
            WITH nodes
            UNWIND nodes AS n
            WITH n LIMIT 1
            RETURN n
        }} AS kept_node
        WITH nodes, kept_node.n AS to_keep
        UNWIND nodes AS n
        WITH n, to_keep WHERE n <> to_keep
        OPTIONAL MATCH (n)-[r]-()
        DELETE r
        DETACH DELETE n
        RETURN to_keep
        """
        self.execute_query(merge_query)

    def merge_duplicates(self):
        duplicates = self.find_duplicates()
        for name, _ in duplicates:
            self.merge_nodes(name)


ModuleNotFoundError: No module named 'memgraph'

### The `Constants` Class


In [119]:
# Load environment variables from .env file
from dotenv import load_dotenv
load_dotenv()

# Set constants for LangChain from environment variables
class Constants:
    LLM_MODEL_TEMPERATURE = float(os.getenv("LLM_MODEL_TEMPERATURE"))
    OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
    LLM_MODEL_NAME = os.getenv("LLM_MODEL_NAME")


constants = Constants()

### The `CollectionManager` Class

In [128]:
class CollectionManager:
    def __init__(self, directory, memgraph):
        self.directory = directory
        self.memgraph = memgraph
        self.query_handler = CypherQueryHandler()

    def extract_links(self, content):
        """Extracts links from the content of a Markdown file."""
        links = re.findall(r'\[\[([^\]]+)\]\]', content)  # Matches [[Link]]
        return links

    def parse_markdown(self, content):
        """Parses Markdown content to extract headings and their hierarchy."""
        headings = []
        lines = content.split('\n')
        for line in lines:
            match = re.match(r'^(#+)\s+(.*)', line)
            if match:
                level = len(match.group(1))
                text = match.group(2)
                headings.append((level, text))
        return headings

    def create_nodes_and_edges(self, file_name, content):
        """Creates nodes and edges in MemGraph based on the Markdown content."""
        headings = self.parse_markdown(content)
        links = self.extract_links(content)

        # Create a top-level node for the file
        query = """
        MERGE (n:File {name: $name})
        ON CREATE SET n.content = $content
        """
        params = {"name": file_name, "content": content}
        self.memgraph.execute_and_fetch(query, params)

        last_nodes = {0: file_name}  # To keep track of last nodes at each heading level
        for level, text in headings:
            node_name = f"{file_name}::{text}"
            query = """
            MERGE (n:Heading {name: $name, level: $level, text: $text})
            """
            params = {"name": node_name, "level": level, "text": text}
            self.memgraph.execute_and_fetch(query, params)

            # Create an edge from the last node of the previous level to this node
            parent_node = last_nodes[level - 1]
            query = """
            MATCH (p {name: $parent_name}), (c {name: $child_name})
            MERGE (p)-[:CONTAINS]->(c)
            """
            params = {"parent_name": parent_node, "child_name": node_name}
            self.memgraph.execute_and_fetch(query, params)

            last_nodes[level] = node_name

        # Create edges to linked files
        for link in links:
            query = """
            MATCH (a:File {name: $file_name}), (b:File {name: $link_name})
            MERGE (a)-[:LINKS_TO]->(b)
            """
            params = {"file_name": file_name, "link_name": link}
            self.memgraph.execute_and_fetch(query, params)

    def process_file(self, file_path):
        """Process a single Markdown file and store in MemGraph."""
        try:
            with open(file_path, 'r', encoding='utf-8') as md_file:
                content = md_file.read()
                file_name = os.path.basename(file_path).replace('.md', '')
                print(f"Processing file: {file_path}")
                self.create_nodes_and_edges(file_name, content)
        except Exception as e:
            print(f"Error reading {file_path}: {e}")

    def process_directory(self):
        """Walk through the directory and process each Markdown file."""
        for root, _, files in os.walk(self.directory):
            for file in files:
                if file.endswith('.md'):
                    file_path = os.path.join(root, file)
                    self.process_file(file_path)
    
    def merge_duplicates(self):
        queries = [
            """
            MATCH (n)
            WITH n.name AS name, collect(n) AS nodes
            WHERE size(nodes) > 1
            CALL apoc.refactor.mergeNodes(nodes) YIELD node
            RETURN node
            """,
            """
            MATCH (n)-[r]->(m)
            WHERE n IS NULL OR m IS NULL
            DELETE r
            """
        ]

        for query in queries:
            self.execute_query(query)


### The `CypherQueryHandler` Class

In [124]:
from typing import List

class CypherQueryHandler:

    @staticmethod
    def get_check_if_db_empty_query() -> str:
        return ("MATCH (n) "
                "RETURN count(n) AS nodes")

    @staticmethod
    def get_export_for_repo_path_query(repo_path: str) -> str:
        return (f"MATCH p=(n {{ repo_path: '{repo_path}' }})-[r]->(m {{ repo_path: '{repo_path}' }}) "
                "WITH project(p) AS repo_specific_subgraph "
                "RETURN repo_specific_subgraph")

    @staticmethod
    def get_export_for_file_path_query(file_path: str) -> str:
        return (f"MATCH p=(n {{ file_path: '{file_path}' }})-[r]->(m {{ file_path: '{file_path}' }}) "
                "WITH project(p) AS file_specific_subgraph "
                "RETURN file_specific_subgraph")

    @staticmethod
    def get_delete_all_query() -> str:
        return ("MATCH (n) "
                "DETACH DELETE n")

    @staticmethod
    def get_delete_all_for_repo_query(repo_path: str) -> str:
        return (f"MATCH (n) "
                f"WHERE n.repo_path = '{repo_path}' "
                "DETACH DELETE n")

    @staticmethod
    def get_delete_graph_for_file_query(file_path: str) -> str:
        return (f"MATCH (n) "
                f"WHERE n.file_path = '{file_path}' "
                "DETACH DELETE n")

    @staticmethod
    def get_rename_file_query(old_file_path: str, new_file_path: str) -> str:
        return (f"MATCH (n) "
                f"WHERE n.file_path = '{old_file_path}' "
                f"SET n.file_path = '{new_file_path}'")

    @staticmethod
    def get_strings_to_embed_query(file_path: str) -> str:
        return (f"MATCH (n) "
                f"OPTIONAL MATCH (n)-[r]->(m) "
                f"WHERE n.file_path = '{file_path}' "
                "RETURN ID(n) as Node_ID, "
                "n.name as Node_Name, "
                "labels(n)[0] as Node_Type, "
                "collect({Neighbour_Name: m.name, Neighbour_Type: labels(m)[0], Relationship_Type: type(r)}) as Connections")

    @staticmethod
    def get_node_description_query(node_id: int) -> str:
        return (f"MATCH (n) "
                f"WHERE ID(n) = {node_id} "
                "OPTIONAL MATCH (n)-[r1]->(m) "
                "OPTIONAL MATCH (p)-[r2]->(n) "
                "RETURN ID(n) as Node_ID, "
                "n.name as Node_Name, "
                "labels(n)[0] as Node_Type, "
                "collect({ Neighbour_Name: m.name, Neighbour_Type: labels(m)[0], Relationship_Type: type(r1) }) as In_Connections, "
                "collect({ Neighbour_Name: p.name, Neighbour_Type: labels(p)[0], Relationship_Type: type(r2) }) as Out_Connections")

    @staticmethod
    def get_set_embeddings_query(node_id: int, embeddings: List[float]) -> str:
        return (f"MATCH (n) "
                f"WHERE ID(n) = {node_id} "
                f"SET n.embeddings = {embeddings}")

    @staticmethod
    def get_tmp_create_query(embeddings: List[float]) -> str:
        return (f"MATCH (n) "
                "WITH count(n) as N "
                "UNWIND range(1, N) as id "
                f"CREATE (:Temp {{name: 'temp_'+id, embeddings: {embeddings}}})")

    @staticmethod
    def get_tmp_delete_query() -> str:
        return ("MATCH (n:Temp) "
                "DETACH DELETE n")

    @staticmethod
    def get_schema_for_repo_query(repo_path: str) -> str:
        return (f"MATCH p=(n {{ repo_path: '{repo_path}' }})-[r]->(m {{ repo_path: '{repo_path}' }}) "
                "WITH project(p) AS repo_specific_subgraph "
                "CALL llm_util.schema(repo_specific_subgraph, 'prompt_ready') "
                "YIELD schema "
                "RETURN schema")

    @staticmethod
    def get_vector_search_query() -> str:
        return ("MATCH (m) "
                "WHERE (NOT m.name STARTS WITH 'temp') OR (m.name IS NULL) "
                "WITH COLLECT(m) as nodes "
                "MATCH (tmp) "
                "WHERE tmp.name STARTS WITH 'temp' "
                "WITH COLLECT(tmp) as temps, nodes "
                "CALL node_similarity.cosine_pairwise('embeddings', nodes, temps) YIELD node1, node2, similarity as cosine_similarity "
                "RETURN ID(node1), cosine_similarity "
                "ORDER BY cosine_similarity DESC "
                "LIMIT 3")

    @staticmethod
    def get_embeddings_for_node_query(node_id: int) -> str:
        return (f"MATCH (m) "
                f"WHERE ID(m) = {node_id} "
                "RETURN m.embeddings as embeddings")


### The `VaultManager` Class

In [125]:
class VaultManager:
    def __init__(self, directory: str, memgraph):
        self.collection_manager = CollectionManager(directory, memgraph)
        self.memgraph = memgraph
        self.query_handler = CypherQueryHandler()

    def update_vault(self):
        """Updates the entire vault by processing all Markdown files."""
        print("Updating vault...")
        self.collection_manager.process_directory()
        print("Vault update complete.")

    def clear_repo(self, repo_path: str):
        """Clears all data related to a specific repository."""
        print(f"Clearing repository: {repo_path}...")
        query = self.query_handler.get_delete_all_for_repo_query(repo_path)
        self.memgraph.execute(query)
        print(f"Repository {repo_path} cleared.")

    def clear_file(self, file_path: str):
        """Clears all data related to a specific file."""
        print(f"Clearing file: {file_path}...")
        query = self.query_handler.get_delete_graph_for_file_query(file_path)
        self.memgraph.execute(query)
        print(f"File {file_path} cleared.")

    def rename_file(self, old_file_path: str, new_file_path: str):
        """Renames a file in the vault."""
        print(f"Renaming file from {old_file_path} to {new_file_path}...")
        query = self.query_handler.get_rename_file_query(old_file_path, new_file_path)
        self.memgraph.execute(query)
        print(f"File renamed from {old_file_path} to {new_file_path}.")

    def export_repo(self, repo_path: str):
        """Exports the data for a specific repository."""
        print(f"Exporting repository: {repo_path}...")
        query = self.query_handler.get_export_for_repo_path_query(repo_path)
        result = self.memgraph.execute(query)
        print(f"Repository {repo_path} exported.")
        return result

    def export_file(self, file_path: str):
        """Exports the data for a specific file."""
        print(f"Exporting file: {file_path}...")
        query = self.query_handler.get_export_for_file_path_query(file_path)
        result = self.memgraph.execute(query)
        print(f"File {file_path} exported.")
        return result

    def get_schema_for_repo(self, repo_path: str):
        """Gets the schema for a specific repository."""
        print(f"Getting schema for repository: {repo_path}...")
        query = self.query_handler.get_schema_for_repo_query(repo_path)
        result = self.memgraph.execute(query)
        print(f"Schema for repository {repo_path} obtained.")
        return result


### The `GeneralQueryAgent` Class

In [126]:
class GeneralQueryAgent:
    def __init__(self, repo_path: str, tools: List[Tool]) -> None:
        self.repo_path = repo_path
        self.system_message = ''
        self._init_system_message()

        self.llm = ChatOpenAI(
            temperature=constants.LLM_MODEL_TEMPERATURE,
            openai_api_key=constants.OPENAI_API_KEY,
            model_name=constants.LLM_MODEL_NAME
        )

        self.agent_kwargs = {
            "extra_prompt_messages": [MessagesPlaceholder(variable_name="memory")],
            "system_message": self.system_message,
        }

        self.memory = ConversationBufferMemory(memory_key="memory", return_messages=True)

        self.run_cypher_query = Tool.from_function(
            func=MemgraphManager.select_query_tool,
            name="run_cypher_query",
            description="""Useful when you want to run Cypher queries on the knowledge graph. 
                            Note that the input has to be valid Cypher. Consult the graph schema in order to know how to write correct queries. 
                            Pay attention to the repo_path attribute.
                            Returns results of executing the query."""
        )

        self.tools = tools + [self.run_cypher_query]
        self._init_agent(self.tools)

    def _init_agent(self, tools: List[Tool]) -> None:
        self.agent = initialize_agent(
            tools,
            self.llm,
            agent=AgentType.OPENAI_FUNCTIONS,
            verbose=True,
            agent_kwargs=self.agent_kwargs,
            memory=self.memory
        )

    def _init_system_message(self) -> None:
        # Define the path to your prompt file explicitly
        prompt_name = 'system_message_query'
        prompt_path = Path('./core/knowledgebase/prompts') / prompt_name
        prompt_text = prompt_path.read_text()
        prompt_template = PromptTemplate.from_template(prompt_text)

        mm = MemgraphManager()
        schema = mm.get_schema_for_repo(self.repo_path)
        self.system_message = SystemMessage(
            content=prompt_template.format(schema=schema, repo_path=self.repo_path)
        )

    def ask(self, question: str) -> str:
        return self.agent.run(question)


# Testing Our Config

Now that we have all our components in place, we can test our configuration.

In [129]:
import os

# Assuming the following imports from previous code:
# from your_module import MemgraphManager, CollectionManager

# Define the directory path to your Obsidian vault for testing
test_vault_directory = '/Users/dcarmocan/Projects/mind-cave'

# Initialize MemgraphManager
memgraph_manager = MemgraphManager()

# Initialize CollectionManager with the test directory and MemgraphManager
collection_manager = CollectionManager(test_vault_directory, memgraph_manager)

# Perform a test by processing the directory
print("Processing the test Obsidian vault directory...")
collection_manager.process_directory()
print("Processing complete.")

# Optionally, you can run additional checks or queries to validate the results
# For example, you can print some nodes and relationships from the database to verify

# Print some nodes
print("Some nodes in the database:")
nodes_query = "MATCH (n) RETURN n LIMIT 10"
nodes_result = memgraph_manager.execute_query(nodes_query)
for node in nodes_result:
    print(node)

# Print some relationships
print("Some relationships in the database:")
relationships_query = "MATCH (n)-[r]->(m) RETURN n, r, m LIMIT 10"
relationships_result = memgraph_manager.execute_query(relationships_query)
for relationship in relationships_result:
    print(relationship)


Processing the test Obsidian vault directory...
Processing file: /Users/dcarmocan/Projects/mind-cave/SPOG.md
Error reading /Users/dcarmocan/Projects/mind-cave/SPOG.md: 'MemgraphManager' object has no attribute 'execute_and_fetch'
Processing file: /Users/dcarmocan/Projects/mind-cave/Zoom Dashboard Documentation.md
Error reading /Users/dcarmocan/Projects/mind-cave/Zoom Dashboard Documentation.md: 'MemgraphManager' object has no attribute 'execute_and_fetch'
Processing file: /Users/dcarmocan/Projects/mind-cave/Summary Week - 20230915.md
Error reading /Users/dcarmocan/Projects/mind-cave/Summary Week - 20230915.md: 'MemgraphManager' object has no attribute 'execute_and_fetch'
Processing file: /Users/dcarmocan/Projects/mind-cave/Project Plan Template.md
Error reading /Users/dcarmocan/Projects/mind-cave/Project Plan Template.md: 'MemgraphManager' object has no attribute 'execute_and_fetch'
Processing file: /Users/dcarmocan/Projects/mind-cave/Cap_Planning_Article.md
Error reading /Users/dcarmo

In [None]:
# Example usage:
# Path to the directory containing markdown files (Obsidian vault)
notes_dir = '/Users/dcarmocan/Projects/mind-cave'

# Initialize the VaultManager and update the vault
vault_manager = VaultManager(memgraph=memgraph,directory=notes_dir)
vault_manager.update_vault()

# Initialize a GeneralQueryAgent with any required tools
# For simplicity, no additional tools are added here
agent = GeneralQueryAgent(repo_path=notes_dir, tools=[])

In [None]:
# Make sure to test the API path
base_url = "http://localhost:8000"

# add_file = "/knowledge_base/notes/add_file"
notes_dir = '/Users/dcarmocan/Projects/mind-cave'
collection = CollectionManager(notes_dir, memgraph)



In [None]:
import os

# Get the current file path
current_file_path = os.path.abspath("__file__")

# Update the vault
vault_manager = VaultManager(notes_dir, memgraph)
# vault_manager.clear_vault()
vault_manager.update_vault()

# Initialize a GeneralQueryAgent with any required tools
# For simplicity, no additional tools are added here
agent = GeneralQueryAgent(repo_path=notes_dir, tools=[])

In [None]:
local_repo_init ="/knowledge_base/general/init_local_repo"
local_repo_init_url = base_url + local_repo_init

base_data = {
  "path" : "/Users/dcarmocan/Projects/mind-cave",
  "type": "Notes"
}

repo_req = requests.post(local_repo_init_url, json=base_data)

In [None]:
repo_req.text