In [1]:
import json
import logging
import os
import re
from typing import List, Optional

import instructor
from dotenv import load_dotenv
from langchain.chains import create_extraction_chain
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.prompts import ChatPromptTemplate
from openai import OpenAI
from pydantic import BaseModel, Field

# Load environment variables (such as OPENAI_API_KEY) from the .env file
load_dotenv()

# Patch the OpenAI client so chat completions accept a `response_model`
aclient = instructor.patch(OpenAI())


In [2]:
input = """ The economist J.K. Galbraith once wrote, “Faced with a choice between changing one’s mind and proving there is no need to do so, almost everyone gets busy with the proof.”

Leo Tolstoy was even bolder: “The most difficult subjects can be explained to the most slow-witted man if he has not formed any idea of them already; but the simplest thing cannot be made clear to the most intelligent man if he is firmly persuaded that he knows already, without a shadow of doubt, what is laid before him.”

What’s going on here? Why don’t facts change our minds? And why would someone continue to believe a false or inaccurate idea anyway? How do such behaviors serve us?
The Logic of False Beliefs
Humans need a reasonably accurate view of the world in order to survive. If your model of reality is wildly different from the actual world, then you struggle to take effective actions each day.

However, truth and accuracy are not the only things that matter to the human mind. Humans also seem to have a deep desire to belong.

In Atomic Habits, I wrote, “Humans are herd animals. We want to fit in, to bond with others, and to earn the respect and approval of our peers. Such inclinations are essential to our survival. For most of our evolutionary history, our ancestors lived in tribes. Becoming separated from the tribe—or worse, being cast out—was a death sentence.”

Understanding the truth of a situation is important, but so is remaining part of a tribe. While these two desires often work well together, they occasionally come into conflict.

In many circumstances, social connection is actually more helpful to your daily life than understanding the truth of a particular fact or idea. The Harvard psychologist Steven Pinker put it this way, “People are embraced or condemned according to their beliefs, so one function of the mind may be to hold beliefs that bring the belief-holder the greatest number of allies, protectors, or disciples, rather than beliefs that are most likely to be true.”

We don’t always believe things because they are correct. Sometimes we believe things because they make us look good to the people we care about.

I thought Kevin Simler put it well when he wrote, “If a brain anticipates that it will be rewarded for adopting a particular belief, it’s perfectly happy to do so, and doesn’t much care where the reward comes from — whether it’s pragmatic (better outcomes resulting from better decisions), social (better treatment from one’s peers), or some mix of the two.”

False beliefs can be useful in a social sense even if they are not useful in a factual sense. For lack of a better phrase, we might call this approach “factually false, but socially accurate.” When we have to choose between the two, people often select friends and family over facts.

This insight not only explains why we might hold our tongue at a dinner party or look the other way when our parents say something offensive, but also reveals a better way to change the minds of others.

Facts Don’t Change Our Minds. Friendship Does.
Convincing someone to change their mind is really the process of convincing them to change their tribe. If they abandon their beliefs, they run the risk of losing social ties. You can’t expect someone to change their mind if you take away their community too. You have to give them somewhere to go. Nobody wants their worldview torn apart if loneliness is the outcome.

The way to change people’s minds is to become friends with them, to integrate them into your tribe, to bring them into your circle. Now, they can change their beliefs without the risk of being abandoned socially.

The British philosopher Alain de Botton suggests that we simply share meals with those who disagree with us:

“Sitting down at a table with a group of strangers has the incomparable and odd benefit of making it a little more difficult to hate them with impunity. Prejudice and ethnic strife feed off abstraction. However, the proximity required by a meal – something about handing dishes around, unfurling napkins at the same moment, even asking a stranger to pass the salt – disrupts our ability to cling to the belief that the outsiders who wear unusual clothes and speak in distinctive accents deserve to be sent home or assaulted. For all the large-scale political solutions which have been proposed to salve ethnic conflict, there are few more effective ways to promote tolerance between suspicious neighbours than to force them to eat supper together.”

Perhaps it is not difference, but distance that breeds tribalism and hostility. As proximity increases, so does understanding. I am reminded of Abraham Lincoln’s quote, “I don’t like that man. I must get to know him better.”

Facts don’t change our minds. Friendship does.

The Spectrum of Beliefs
Years ago, Ben Casnocha mentioned an idea to me that I haven’t been able to shake: The people who are most likely to change our minds are the ones we agree with on 98 percent of topics.

If someone you know, like, and trust believes a radical idea, you are more likely to give it merit, weight, or consideration. You already agree with them in most areas of life. Maybe you should change your mind on this one too. But if someone wildly different than you proposes the same radical idea, well, it’s easy to dismiss them as a crackpot.

One way to visualize this distinction is by mapping beliefs on a spectrum. If you divide this spectrum into 10 units and you find yourself at Position 7, then there is little sense in trying to convince someone at Position 1. The gap is too wide. When you’re at Position 7, your time is better spent connecting with people who are at Positions 6 and 8, gradually pulling them in your direction.

The most heated arguments often occur between people on opposite ends of the spectrum, but the most frequent learning occurs from people who are nearby. The closer you are to someone, the more likely it becomes that the one or two beliefs you don’t share will bleed over into your own mind and shape your thinking. The further away an idea is from your current position, the more likely you are to reject it outright.

When it comes to changing people’s minds, it is very difficult to jump from one side to another. You can’t jump down the spectrum. You have to slide down it.

Any idea that is sufficiently different from your current worldview will feel threatening. And the best place to ponder a threatening idea is in a non-threatening environment. As a result, books are often a better vehicle for transforming beliefs than conversations or debates.

In conversation, people have to carefully consider their status and appearance. They want to save face and avoid looking stupid. When confronted with an uncomfortable set of facts, the tendency is often to double down on their current position rather than publicly admit to being wrong.

Books resolve this tension. With a book, the conversation takes place inside someone’s head and without the risk of being judged by others. It’s easier to be open-minded when you aren’t feeling defensive.

Arguments are like a full frontal attack on a person’s identity. Reading a book is like slipping the seed of an idea into a person’s brain and letting it grow on their own terms. There’s enough wrestling going on in someone’s head when they are overcoming a pre-existing belief. They don’t need to wrestle with you too.

Why False Ideas Persist
There is another reason bad ideas continue to live on, which is that people continue to talk about them.

Silence is death for any idea. An idea that is never spoken or written down dies with the person who conceived it. Ideas can only be remembered when they are repeated. They can only be believed when they are repeated.

I have already pointed out that people repeat ideas to signal they are part of the same social group. But here’s a crucial point most people miss:

People also repeat bad ideas when they complain about them. Before you can criticize an idea, you have to reference that idea. You end up repeating the ideas you’re hoping people will forget—but, of course, people can’t forget them because you keep talking about them. The more you repeat a bad idea, the more likely people are to believe it.

Let’s call this phenomenon Clear’s Law of Recurrence: The number of people who believe an idea is directly proportional to the number of times it has been repeated during the last year—even if the idea is false.

Each time you attack a bad idea, you are feeding the very monster you are trying to destroy. As one Twitter employee wrote, “Every time you retweet or quote tweet someone you’re angry with, it helps them. It disseminates their BS. Hell for the ideas you deplore is silence. Have the discipline to give it to them.”

Your time is better spent championing good ideas than tearing down bad ones. Don’t waste time explaining why bad ideas are bad. You are simply fanning the flame of ignorance and stupidity.

The best thing that can happen to a bad idea is that it is forgotten. The best thing that can happen to a good idea is that it is shared. It makes me think of Tyler Cowen’s quote, “Spend as little time as possible talking about how other people are wrong.”

Feed the good ideas and let bad ideas die of starvation.


"""

In [3]:
# Classify the input against the available document types

classification = {
    "Natural Language Text": {
        "type": "TEXT",
        "subclass": [
            "Articles, essays, and reports",
            "Books and manuscripts",
            "News stories and blog posts",
            "Research papers and academic publications",
            "Social media posts and comments",
            "Website content and product descriptions",
            "Personal narratives and stories"
        ]
    },
    "Structured Documents": {
        "type": "TEXT",
        "subclass": [
            "Spreadsheets and tables",
            "Forms and surveys",
            "Databases and CSV files"
        ]
    },
    "Code and Scripts": {
        "type": "TEXT",
        "subclass": [
            "Source code in various programming languages",
            "Shell commands and scripts",
            "Markup languages (HTML, XML)",
            "Stylesheets (CSS) and configuration files (YAML, JSON, INI)"
        ]
    },
    "Conversational Data": {
        "type": "TEXT",
        "subclass": [
            "Chat transcripts and messaging history",
            "Customer service logs and interactions",
            "Conversational AI training data"
        ]
    },
    "Educational Content": {
        "type": "TEXT",
        "subclass": [
            "Textbook content and lecture notes",
            "Exam questions and academic exercises",
            "E-learning course materials"
        ]
    },
    "Creative Writing": {
        "type": "TEXT",
        "subclass": [
            "Poetry and prose",
            "Scripts for plays, movies, and television",
            "Song lyrics"
        ]
    },
    "Technical Documentation": {
        "type": "TEXT",
        "subclass": [
            "Manuals and user guides",
            "Technical specifications and API documentation",
            "Helpdesk articles and FAQs"
        ]
    },
    "Legal and Regulatory Documents": {
        "type": "TEXT",
        "subclass": [
            "Contracts and agreements",
            "Laws, regulations, and legal case documents",
            "Policy documents and compliance materials"
        ]
    },
    "Medical and Scientific Texts": {
        "type": "TEXT",
        "subclass": [
            "Clinical trial reports",
            "Patient records and case notes",
            "Scientific journal articles"
        ]
    },
    "Financial and Business Documents": {
        "type": "TEXT",
        "subclass": [
            "Financial reports and statements",
            "Business plans and proposals",
            "Market research and analysis reports"
        ]
    },
    "Advertising and Marketing Materials": {
        "type": "TEXT",
        "subclass": [
            "Ad copies and marketing slogans",
            "Product catalogs and brochures",
            "Press releases and promotional content"
        ]
    },
    "Emails and Correspondence": {
        "type": "TEXT",
        "subclass": [
            "Professional and formal correspondence",
            "Personal emails and letters"
        ]
    },
    "Metadata and Annotations": {
        "type": "TEXT",
        "subclass": [
            "Image and video captions",
            "Annotations and metadata for various media"
        ]
    },
    "Language Learning Materials": {
        "type": "TEXT",
        "subclass": [
            "Vocabulary lists and grammar rules",
            "Language exercises and quizzes"
        ]
    },
    "Audio Content": {
        "type": "AUDIO",
        "subclass": [
            "Music tracks and albums",
            "Podcasts and radio broadcasts",
            "Audiobooks and audio guides",
            "Recorded interviews and speeches",
            "Sound effects and ambient sounds"
        ]
    },
    "Image Content": {
        "type": "IMAGE",
        "subclass": [
            "Photographs and digital images",
            "Illustrations, diagrams, and charts",
            "Infographics and visual data representations",
            "Artwork and paintings",
            "Screenshots and graphical user interfaces"
        ]
    },
    "Video Content": {
        "type": "VIDEO",
        "subclass": [
            "Movies and short films",
            "Documentaries and educational videos",
            "Video tutorials and how-to guides",
            "Animated features and cartoons",
            "Live event recordings and sports broadcasts"
        ]
    },
    "Multimedia Content": {
        "type": "MULTIMEDIA",
        "subclass": [
            "Interactive web content and games",
            "Virtual reality (VR) and augmented reality (AR) experiences",
            "Mixed media presentations and slide decks",
            "E-learning modules with integrated multimedia",
            "Digital exhibitions and virtual tours"
        ]
    },
    "3D Models and CAD Content": {
        "type": "3D_MODEL",
        "subclass": [
            "Architectural renderings and building plans",
            "Product design models and prototypes",
            "3D animations and character models",
            "Scientific simulations and visualizations",
            "Virtual objects for AR/VR environments"
        ]
    },
    "Procedural Content": {
        "type": "PROCEDURAL",
        "subclass": [
            "Tutorials and step-by-step guides",
            "Workflow and process descriptions",
            "Simulation and training exercises",
            "Recipes and crafting instructions"
        ]
    }
}

system_prompt = f""" Classify content based on the following categories: {str(classification)}"""
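
The model is free to return any string for the category and subgroup, so it can help to check its answer against the taxonomy before building on it. The sketch below is one way to do that, assuming a dictionary shaped like `classification` and a result dictionary with `name` and `cognitive_subgroups` keys, as `required_layers.dict()` produces later in the notebook. `SAMPLE_TAXONOMY`, `build_subclass_index`, and `validate_classification` are hypothetical names introduced for illustration; they are not part of the notebook.

```python
from typing import Dict, List, Tuple

# Hypothetical two-entry excerpt with the same shape as `classification`.
SAMPLE_TAXONOMY: Dict[str, dict] = {
    "Natural Language Text": {
        "type": "TEXT",
        "subclass": ["Articles, essays, and reports", "Books and manuscripts"],
    },
    "Audio Content": {
        "type": "AUDIO",
        "subclass": ["Podcasts and radio broadcasts"],
    },
}


def build_subclass_index(taxonomy: Dict[str, dict]) -> Dict[Tuple[str, str], str]:
    """Map each (category, subclass) pair to the category's data type."""
    index = {}
    for category, spec in taxonomy.items():
        for subclass in spec["subclass"]:
            index[(category, subclass)] = spec["type"]
    return index


def validate_classification(result: dict, taxonomy: Dict[str, dict]) -> List[str]:
    """Return a list of problems in a classification result (empty if none)."""
    index = build_subclass_index(taxonomy)
    problems = []
    if result["name"] not in taxonomy:
        problems.append(f"unknown category: {result['name']}")
    for subgroup in result.get("cognitive_subgroups", []):
        key = (result["name"], subgroup["name"])
        if key not in index:
            problems.append(f"unknown subgroup: {subgroup['name']}")
        elif index[key] != subgroup["data_type"]:
            problems.append(
                f"wrong data_type for {subgroup['name']}: {subgroup['data_type']}"
            )
    return problems
```

An empty list means the result names a known category, a subgroup that belongs to it, and the matching data type.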

In [4]:
class CognitiveLayerSubgroup(BaseModel):
    """A subgroup within a general cognitive category."""
    id: int
    name: str
    data_type: str


class CognitiveCategory(BaseModel):
    """A cognitive category and the subgroups that apply to the input."""
    name: str
    # A default factory already supplies the default; do not also pass `...`
    cognitive_subgroups: List[CognitiveLayerSubgroup] = Field(default_factory=list)

In [5]:
def classify_input(input) -> CognitiveCategory:
    """Classify the input text into a category and its subgroups."""
    model = "gpt-4-1106-preview"
    user_prompt = f"Use the given format to extract information from the following input: {input}."

    # System messages come first so the instructions precede the user content
    out = aclient.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": system_prompt,
            },
            {
                "role": "system",
                "content": "Make sure both values are returned. Incomplete results will result in termination",
            },
            {
                "role": "user",
                "content": user_prompt,
            },
        ],
        response_model=CognitiveCategory,
    )
    return out

In [6]:
required_layers = classify_input(input = input)
print(required_layers.json())

{"name":"Natural Language Text","cognitive_subgroups":[{"id":1,"name":"Articles, essays, and reports","data_type":"TEXT"}]}


In [7]:
print(required_layers.dict())

{'name': 'Natural Language Text', 'cognitive_subgroups': [{'id': 1, 'name': 'Articles, essays, and reports', 'data_type': 'TEXT'}]}


In [8]:
print(required_layers.dict()['cognitive_subgroups'][0]['data_type'])

TEXT
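
Indexing `cognitive_subgroups` with `[0]` raises `IndexError` when the model returns no subgroups, which the schema allows because the list defaults to empty. A minimal sketch of a guarded lookup follows, working on the plain dictionary form of the result. `first_subgroup_field` is a hypothetical helper, not part of the notebook.

```python
from typing import Any, Optional


def first_subgroup_field(result: dict, field: str, default: Optional[Any] = None) -> Any:
    """Return `field` from the first subgroup, or `default` if there is none."""
    subgroups = result.get("cognitive_subgroups") or []
    if not subgroups:
        return default
    return subgroups[0].get(field, default)


example = {
    "name": "Natural Language Text",
    "cognitive_subgroups": [
        {"id": 1, "name": "Articles, essays, and reports", "data_type": "TEXT"}
    ],
}
empty = {"name": "Natural Language Text", "cognitive_subgroups": []}

print(first_subgroup_field(example, "data_type"))           # TEXT
print(first_subgroup_field(empty, "data_type", "UNKNOWN"))  # UNKNOWN
```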


In [9]:


layer_info = required_layers.dict()
subgroup = layer_info['cognitive_subgroups'][0]

system_prompt = f"""
You are tasked with analyzing {subgroup['data_type']} files in a multilayer network context, for tasks such as analysis, categorization, and feature extraction. Various layers can be incorporated to capture the depth and breadth of information contained within the {subgroup['data_type']}.
These layers can help in understanding the content, context, and characteristics of the {subgroup['data_type']}.
Your objective is to extract meaningful layers of information that will contribute to constructing a detailed multilayer network or knowledge graph.
Approach this task by considering the unique characteristics and inherent properties of the data at hand.
VERY IMPORTANT: The context you are working in is {layer_info['name']} and the specific domain you are extracting data on is {subgroup['name']}.

Guidelines for Layer Extraction:

Take into account: the content type, which in this case is {subgroup['name']}, should play a major role in how you decompose the data into layers.

Based on your analysis, define and describe the layers you've identified, explaining their relevance and contribution to understanding the dataset. Your independent identification of layers will enable a nuanced and multifaceted representation of the data, enhancing applications in knowledge discovery, content analysis, and information retrieval.
"""

In [10]:
class CognitiveLayerSubgroup(BaseModel):
    """A single layer within a general cognitive layer set."""
    id: int
    name: str
    description: str


class CognitiveLayer(BaseModel):
    """The cognitive layers identified for a content category."""
    category_name: str
    # A default factory already supplies the default; do not also pass `...`
    cognitive_layers: List[CognitiveLayerSubgroup] = Field(default_factory=list)

In [11]:
def determine_layers(input) -> CognitiveLayer:
    """Determine the cognitive layers to extract from the input."""
    model = "gpt-4-1106-preview"
    user_prompt = f"Use the given format to extract information from the following input: {input}."

    # The system message comes first so the instructions precede the user content
    out = aclient.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": system_prompt,
            },
            {
                "role": "user",
                "content": user_prompt,
            },
        ],
        response_model=CognitiveLayer,
    )
    return out

In [12]:
cognitive_layers = determine_layers(input=input)

In [13]:
print(cognitive_layers)

category_name='Natural Language Text' cognitive_layers=[CognitiveLayerSubgroup(id=1, name='Thematic Content Layer', description='This layer captures the central themes, topics, or subject matter of the articles, essays, and reports. It involves analyzing the text for key messages, main arguments, and overarching ideologies, which aids in understanding the focus and direction of the content.'), CognitiveLayerSubgroup(id=2, name='Structural Composition Layer', description="This layer encompasses the organization and structure of the text, including headings, subheadings, paragraphs, and overall layout. Understanding how the text is organized can reveal the author's approach to presenting information and building their argument."), CognitiveLayerSubgroup(id=3, name='Linguistic Style Layer', description="This layer analyzes the linguistic choices and writing styles used in the text, such as vocabulary, sentence construction, and use of literary devices. It provides insights into the author

In [39]:
layer_names = [layer_subgroup.name for layer_subgroup in cognitive_layers.cognitive_layers]

print("Extracted Layer Names:", layer_names)

Extracted Layer Names: ['Thematic Content Layer', 'Structural Composition Layer', 'Linguistic Style Layer', 'Contextual Relevance Layer', 'Semantic Connectivity Layer', 'Rhetorical Strategies Layer', 'Audience Engagement Layer', 'Narrative Flow Layer', 'Cognitive Dissonance Layer', 'Social Influence Layer']


In [40]:
layer_names[:3]


['Thematic Content Layer',
 'Structural Composition Layer',
 'Linguistic Style Layer']

In [15]:
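# Note: this rebinds `system_prompt`, which earlier cells use as a plain string.
# Re-running classify_input or determine_layers after this cell would pass the
# function object as message content instead of prompt text.
def system_prompt(layer: Optional[str] = None) -> str: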
    return f"""You are a top-tier algorithm
designed for extracting information in structured formats to build a knowledge graph.
- **Nodes** represent entities and concepts. They're akin to Wikipedia nodes.
- **Edges** represent relationships between concepts. They're akin to Wikipedia links.
- The aim is to achieve simplicity and clarity in the
knowledge graph, making it accessible for a vast audience.
YOU ARE ONLY EXTRACTING DATA FOR COGNITIVE LAYER {layer}
## 2. Labeling Nodes
- **Consistency**: Ensure you use basic or elementary types for node labels.
  - For example, when you identify an entity representing a person,
   always label it as **"person"**.
  Avoid using more specific terms like "mathematician" or "scientist".
  - Include event, entity, time, or action nodes to the category.
  - Classify the memory type as episodic or semantic.
- **Node IDs**: Never utilize integers as node IDs.
    Node IDs should be names or human-readable identifiers found in the text.
## 3. Handling Numerical Data and Dates
- Numerical data, like age or other related information,
should be incorporated as attributes or properties of the respective nodes.
- **No Separate Nodes for Dates/Numbers**:
Do not create separate nodes for dates or numerical values.
 Always attach them as attributes or properties of nodes.
- **Property Format**: Properties must be in a key-value format.
- **Quotation Marks**: Never use escaped single or double quotes within property values.
- **Naming Convention**: Use camelCase for property keys, e.g., `birthDate`.
## 4. Coreference Resolution
- **Maintain Entity Consistency**:
When extracting entities, it's vital to ensure consistency.
If an entity, such as "John Doe", is mentioned multiple times
in the text but is referred to by different names or pronouns (e.g., "Joe", "he"),
always use the most complete identifier for that entity throughout the knowledge graph.
 In this example, use "John Doe" as the entity ID.
Remember, the knowledge graph should be coherent and easily understandable,
 so maintaining consistency in entity references is crucial.
## 5. Strict Compliance
Adhere to the rules strictly. Non-compliance will result in termination"""

In [16]:
class Node(BaseModel):
    """Node in a knowledge graph."""
    id: int
    description: str
    category: str
    memory_type: str
    created_at: Optional[float] = None
    summarized: Optional[bool] = None


class Edge(BaseModel):
    """Edge in a knowledge graph."""
    source: int
    target: int
    description: str
    created_at: Optional[float] = None
    summarized: Optional[bool] = None


class KnowledgeGraph(BaseModel):
    """Knowledge graph."""
    nodes: List[Node] = Field(default_factory=list)
    edges: List[Edge] = Field(default_factory=list)


def generate_graph(input, layer: Optional[str] = None) -> KnowledgeGraph:
    """Generate a knowledge graph from the input text for one cognitive layer."""
    model = "gpt-4-1106-preview"
    user_prompt = f"Use the given format to extract information from the following input: {input}."

    # System messages come first so the instructions precede the user content
    out = aclient.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": system_prompt(layer=layer),
            },
            {
                "role": "system",
                "content": "Must include both nodes and edges",
            },
            {
                "role": "user",
                "content": user_prompt,
            },
        ],
        response_model=KnowledgeGraph,
    )
    return out
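
Nothing in the schema forces an edge to point at nodes that exist, so a generated graph can contain dangling edges or repeated IDs. The following sketch checks for both. It uses plain dataclasses as stand-ins for the Pydantic `Node` and `Edge` models so that it runs without third-party packages; `SimpleNode`, `SimpleEdge`, and `find_graph_problems` are hypothetical names, not part of the notebook.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class SimpleNode:
    """Stand-in for the notebook's Node model (only the fields used here)."""
    id: int
    description: str


@dataclass
class SimpleEdge:
    """Stand-in for the notebook's Edge model (only the fields used here)."""
    source: int
    target: int
    description: str


def find_graph_problems(nodes: List[SimpleNode], edges: List[SimpleEdge]) -> List[str]:
    """Report duplicate node IDs and edges whose endpoints are not in `nodes`."""
    problems = []
    counts = Counter(node.id for node in nodes)
    for node_id, count in sorted(counts.items()):
        if count > 1:
            problems.append(f"duplicate node id {node_id}")
    known = set(counts)
    for edge in edges:
        for endpoint in (edge.source, edge.target):
            if endpoint not in known:
                problems.append(
                    f"edge '{edge.description}' references missing node {endpoint}"
                )
    return problems


sample_nodes = [SimpleNode(1, "J.K. Galbraith"), SimpleNode(2, "Leo Tolstoy")]
sample_edges = [SimpleEdge(1, 2, "quoted alongside"), SimpleEdge(2, 3, "dangling")]
print(find_graph_problems(sample_nodes, sample_edges))
# ["edge 'dangling' references missing node 3"]
```

Running a check like this before merging a layer graph into a larger graph surfaces the cases where an edge would otherwise be skipped silently or reported only at merge time.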

In [41]:
layer_graphs = []

for layer in layer_names[:3]:
    print("Layer processed is:", str(layer))
    layer_graph = generate_graph(input=input, layer= layer)
    print("Layer graph is:", str(layer_graph))
    layer_graphs.append(layer_graph)



Layer processed is: Thematic Content Layer
Layer graph is: nodes=[Node(id=1, description='Economist who wrote about the preference of people to prove existing beliefs over changing their minds.', category='person', memory_type='semantic', created_at=None, summarized=None), Node(id=2, description='Believed that unformed minds can understand difficult subjects but biased minds cannot accept even simple things.', category='person', memory_type='semantic', created_at=None, summarized=None), Node(id=3, description='Humans prioritize social belonging over accuracy, causing them to hold beliefs for social advantage rather than truth.', category='concept', memory_type='semantic', created_at=None, summarized=None), Node(id=4, description='A powerful human need which often conflicts with the pursuit of truth.', category='concept', memory_type='semantic', created_at=None, summarized=None), Node(id=5, description='The conflict between social belonging and understanding the truth.', category='conce
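
Each iteration of the loop above is a separate API call, and a single failure aborts the whole run. One possible variation, sketched below, keeps results keyed by layer name and records failures instead of raising. The extraction function is passed in as a parameter so the sketch runs without network access; `collect_layer_graphs` and `fake_extract` are hypothetical names, and in the notebook the callable would wrap `generate_graph`.

```python
from typing import Any, Callable, Dict, List, Tuple


def collect_layer_graphs(
    text: str,
    layers: List[str],
    extract: Callable[[str, str], Any],
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Run `extract(text, layer)` per layer; return (results, errors) by layer name."""
    graphs: Dict[str, Any] = {}
    errors: Dict[str, str] = {}
    for layer in layers:
        try:
            graphs[layer] = extract(text, layer)
        except Exception as exc:  # broad on purpose: one bad layer should not stop the rest
            errors[layer] = f"{type(exc).__name__}: {exc}"
    return graphs, errors


def fake_extract(text: str, layer: str) -> dict:
    """Hypothetical stand-in for an API-backed extractor."""
    if layer == "Broken Layer":
        raise ValueError("simulated failure")
    return {"layer": layer, "length": len(text)}


graphs, errors = collect_layer_graphs(
    "some text", ["Thematic Content Layer", "Broken Layer"], fake_extract
)
print(graphs)  # {'Thematic Content Layer': {'layer': 'Thematic Content Layer', 'length': 9}}
print(errors)  # {'Broken Layer': 'ValueError: simulated failure'}
```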

In [18]:
layer_decomposition

KnowledgeGraph(nodes=[Node(id=1, description='J.K. Galbraith', category='person', memory_type='semantic', created_at=None, summarized=None), Node(id=2, description='Leo Tolstoy', category='person', memory_type='semantic', created_at=None, summarized=None), Node(id=3, description='Steven Pinker', category='person', memory_type='semantic', created_at=None, summarized=None), Node(id=4, description='Kevin Simler', category='person', memory_type='semantic', created_at=None, summarized=None), Node(id=5, description='Alain de Botton', category='person', memory_type='semantic', created_at=None, summarized=None), Node(id=6, description='Abraham Lincoln', category='person', memory_type='semantic', created_at=None, summarized=None), Node(id=7, description='Ben Casnocha', category='person', memory_type='semantic', created_at=None, summarized=None), Node(id=8, description='Tyler Cowen', category='person', memory_type='semantic', created_at=None, summarized=None), Node(id=9, description='Atomic Habi

In [19]:
import networkx as nx
import uuid
from datetime import datetime

def create_user_content_graph(user_id, custom_user_properties=None, additional_categories=None, default_fields=None):
    # Define default fields for all nodes if not provided
    if default_fields is None:
        default_fields = {
            'created_at': datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
            'updated_at': datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        }

    # Merge custom user properties with default properties; custom properties take precedence
    user_properties = {**default_fields, **(custom_user_properties or {})}

    # Default content categories
    content_categories = {
        "Temporal": ["Historical events", "Schedules and timelines"],
        "Positional": ["Geographical locations", "Spatial data"],
        "Propositions": ["Hypotheses and theories", "Claims and arguments"],
        "Personalization": ["User preferences", "User information"]
    }

    # Update content categories with any additional categories provided
    if additional_categories:
        content_categories.update(additional_categories)

    # Create a new MultiDiGraph
    G = nx.MultiDiGraph()

    # Add the user node with properties
    G.add_node(user_id, **user_properties)

    # Add content category nodes and their edges
    for category, subclasses in content_categories.items():
        category_properties = {**default_fields, 'type': 'category'}
        G.add_node(category, **category_properties)
        G.add_edge(user_id, category, relationship='created')

        # Add subclass nodes and their edges
        for subclass in subclasses:
            unique_id = str(uuid.uuid4())
            subclass_node_id = f"{subclass} - {unique_id}"
            subclass_properties = {**default_fields, 'type': 'subclass', 'content': subclass}
            G.add_node(subclass_node_id, **subclass_properties)
            G.add_edge(category, subclass_node_id, relationship='includes')

    return G



In [20]:
category_name = required_layers.dict()['name']

# Extract the names of the cognitive subgroups
subgroup_names = [subgroup['name'] for subgroup in required_layers.dict()['cognitive_subgroups']]

# Construct the additional_categories structure
additional_categories = {
    category_name: subgroup_names
}



In [44]:
additional_categories

{'Natural Language Text': ['Articles, essays, and reports']}

In [21]:
# Example usage
user_id = 'user123'
custom_user_properties = {
    'username': 'exampleUser',
    'email': 'user@example.com'
}

# additional_categories = {
#     "Natural Language Text": ["Articles, essays, and reports", "Books and manuscripts"]
# }

G = create_user_content_graph(user_id, custom_user_properties, additional_categories)

# Accessing the graph
print("Nodes in the graph:")
print(G.nodes(data=True))
print("\nEdges in the graph:")
print(G.edges(data=True))

Nodes in the graph:
[('user123', {'created_at': '2024-02-25 17:25:52', 'updated_at': '2024-02-25 17:25:52', 'username': 'exampleUser', 'email': 'user@example.com'}), ('Temporal', {'created_at': '2024-02-25 17:25:52', 'updated_at': '2024-02-25 17:25:52', 'type': 'category'}), ('Historical events - fe2b09db-93cb-4763-b186-5e08aa83b2f0', {'created_at': '2024-02-25 17:25:52', 'updated_at': '2024-02-25 17:25:52', 'type': 'subclass', 'content': 'Historical events'}), ('Schedules and timelines - 9584f7c2-bb9e-4b8d-9bf5-78545e705648', {'created_at': '2024-02-25 17:25:52', 'updated_at': '2024-02-25 17:25:52', 'type': 'subclass', 'content': 'Schedules and timelines'}), ('Positional', {'created_at': '2024-02-25 17:25:52', 'updated_at': '2024-02-25 17:25:52', 'type': 'category'}), ('Geographical locations - a695710f-9a71-4949-81a4-773e7c693802', {'created_at': '2024-02-25 17:25:52', 'updated_at': '2024-02-25 17:25:52', 'type': 'subclass', 'content': 'Geographical locations'}), ('Spatial data - 8ee

In [22]:
import graphistry
import pandas as pd

# Register with Graphistry using credentials from the environment rather than
# hardcoding them in the notebook. The variable names GRAPHISTRY_USERNAME and
# GRAPHISTRY_PASSWORD are assumptions; use whatever your .env file defines.
graphistry.register(
    api=3,
    username=os.environ["GRAPHISTRY_USERNAME"],
    password=os.environ["GRAPHISTRY_PASSWORD"],
)

# Convert the NetworkX graph to a pandas edge list
edges = nx.to_pandas_edgelist(G)

# Visualize the graph
graphistry.edges(edges, 'source', 'target').plot()

In [23]:
print(G)

MultiDiGraph with 15 nodes and 14 edges
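
The node and edge counts follow directly from the tree shape built by `create_user_content_graph`: one user node, one node per category, one node per subclass, and one incoming edge for every node except the user. The short sketch below reproduces that arithmetic for the categories used in this run; `expected_graph_size` is a hypothetical helper, and it assumes category names are distinct from each other and from the user ID.

```python
from typing import Dict, List, Tuple


def expected_graph_size(content_categories: Dict[str, List[str]]) -> Tuple[int, int]:
    """Return (node_count, edge_count) for a user -> category -> subclass tree."""
    n_categories = len(content_categories)
    n_subclasses = sum(len(subs) for subs in content_categories.values())
    nodes = 1 + n_categories + n_subclasses  # the 1 is the user node
    edges = n_categories + n_subclasses      # one incoming edge per non-user node
    return nodes, edges


# The four default categories plus the one added from the classification step.
categories = {
    "Temporal": ["Historical events", "Schedules and timelines"],
    "Positional": ["Geographical locations", "Spatial data"],
    "Propositions": ["Hypotheses and theories", "Claims and arguments"],
    "Personalization": ["User preferences", "User information"],
    "Natural Language Text": ["Articles, essays, and reports"],
}
print(expected_graph_size(categories))  # (15, 14)
```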


In [35]:
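import uuid
from datetime import datetime


def append_data_to_graph(G, category_name, subclass_content, new_data):
    """Attach the nodes and edges of `new_data` under the matching subclass node of G.

    G is modified in place and also returned.
    """
    # Find the subclass node: its ID contains the subclass content as a substring.
    # Note: category_name is only used in the error message, not to narrow the search.
    subclass_node_id = None
    for node in G.nodes:
        if isinstance(node, str) and subclass_content in node:
            subclass_node_id = node
            break

    if subclass_node_id is None:
        print(f"Subclass '{subclass_content}' under category '{category_name}' not found in the graph.")
        return G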

    # Mapping from old node IDs to new node IDs
    node_id_mapping = {}

    # Add nodes from the Pydantic object, using one timestamp for the whole batch
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    for node in new_data.nodes:
        new_node_id = f"{subclass_node_id} - {uuid.uuid4()}"
        G.add_node(new_node_id,
                   created_at=timestamp,
                   updated_at=timestamp,
                   description=node.description,
                   category=node.category,
                   memory_type=node.memory_type,
                   type='detail')
        G.add_edge(subclass_node_id, new_node_id, relationship='detail')

        # Store the mapping from old node ID to new node ID
        node_id_mapping[node.id] = new_node_id

    # Add edges from the Pydantic object using the new node IDs
    for edge in new_data.edges:
        # Use the mapping to get the new node IDs
        source_node_id = node_id_mapping.get(edge.source)
        target_node_id = node_id_mapping.get(edge.target)

        if source_node_id and target_node_id:
            G.add_edge(source_node_id, target_node_id, description=edge.description, relationship='relation')
        else:
            print(f"Could not find mapping for edge from {edge.source} to {edge.target}")

    return G




# `G` is the existing graph; `new_data` must be a Pydantic KnowledgeGraph with `nodes` and `edges`.

# Use the first category and its first subclass as the attachment point:
category_name = list(additional_categories.keys())[0]
subclass_content = list(additional_categories.values())[0][0]


Updated Nodes: MultiDiGraph with 43 nodes and 52 edges


In [42]:
for layer_decomposition in layer_graphs:
    # append_data_to_graph mutates G in place, so F and G refer to the same graph
    F = append_data_to_graph(G, category_name, subclass_content, layer_decomposition)

    # Print the updated graph summary for verification
    print("Updated Nodes:", F)


Updated Nodes: MultiDiGraph with 58 nodes and 81 edges
Updated Nodes: MultiDiGraph with 74 nodes and 105 edges
Updated Nodes: MultiDiGraph with 88 nodes and 131 edges
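
Each layer decomposition appears to number its nodes locally (1, 2, 3, ...), so appending several layers to one graph only works because every node is given a fresh, globally unique ID and the edges are rewritten to match. The sketch below isolates that remapping step using only the standard library. The `Node` and `Edge` dataclasses, the `remap_ids` helper, and the sample edge descriptions are hypothetical stand-ins for illustration, not the notebook's actual Pydantic models or data.

```python
import uuid
from dataclasses import dataclass


@dataclass
class Node:
    id: int            # local ID, unique only within one decomposition
    description: str


@dataclass
class Edge:
    source: int        # local ID of the source node
    target: int        # local ID of the target node
    description: str


def remap_ids(parent_id, nodes, edges):
    """Assign each node a globally unique ID and rewrite edges to use those IDs."""
    mapping = {node.id: f"{parent_id} - {uuid.uuid4()}" for node in nodes}
    remapped, skipped = [], []
    for edge in edges:
        source = mapping.get(edge.source)
        target = mapping.get(edge.target)
        if source is None or target is None:
            # The edge points at a node that is not part of this decomposition.
            skipped.append(edge)
        else:
            remapped.append((source, target, edge.description))
    return mapping, remapped, skipped


# Made-up sample data: two nodes, one valid edge, one dangling edge.
nodes = [Node(1, "J.K. Galbraith"), Node(2, "Leo Tolstoy")]
edges = [Edge(1, 2, "example relation"), Edge(1, 99, "dangling edge")]

mapping, remapped, skipped = remap_ids("Articles, essays, and reports", nodes, edges)
print(len(mapping), len(remapped), len(skipped))
```

Because the new IDs are prefixed with the parent node's ID, two layers that both contain a local node `1` end up as distinct graph nodes under the same subclass.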


In [43]:
import os

import graphistry
import networkx as nx
import pandas as pd

# Register with Graphistry before plotting. Credentials are read from the
# environment rather than hard-coded; the variable names below are an
# assumption, so set them in your .env file or shell.
graphistry.register(
    api=3,
    username=os.environ["GRAPHISTRY_USERNAME"],
    password=os.environ["GRAPHISTRY_PASSWORD"],
)

# Convert the NetworkX graph to a pandas edge list
edges = nx.to_pandas_edgelist(F)

# Visualize the graph
graphistry.edges(edges, 'source', 'target').plot()

In [40]:
print(list(additional_categories.values())[0][0])

Articles, essays, and reports


In [42]:
list(additional_categories.keys())[0]

'Natural Language Text'
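
The two lookups above pick the first category and its first subclass, which is then matched against node IDs by substring. A minimal sketch of that selection follows, assuming `additional_categories` is a dict mapping each category name to a list of subclass names (a shape inferred from the two outputs, not confirmed by the notebook). The node IDs and the all-zero UUID are placeholders.

```python
# Hypothetical shape inferred from the outputs above: category -> list of subclasses.
additional_categories = {
    "Natural Language Text": ["Articles, essays, and reports"],
}

# Take the first category and its first subclass without building intermediate lists.
# Dicts preserve insertion order, so "first" means first inserted.
category_name, subclasses = next(iter(additional_categories.items()))
subclass_content = subclasses[0]

# Subclass node IDs in the graph look like "<content> - <uuid>",
# so a substring test is enough to locate the subclass node.
node_ids = [
    "Natural Language Text",
    "Articles, essays, and reports - 00000000-0000-0000-0000-000000000000",  # placeholder UUID
]
match = next(
    (n for n in node_ids if isinstance(n, str) and subclass_content in n),
    None,
)
print(category_name, "|", subclass_content, "|", match)
```

Substring matching returns the first node whose ID contains the subclass text, so two subclasses with overlapping names could collide; comparing against the node's `content` attribute would be a stricter alternative.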

In [37]:
def list_graph_relationships_with_node_attributes(graph):
    print("Graph Relationships with Node Attributes:")
    for source, target, data in graph.edges(data=True):
        # Get source and target node attributes
        source_attrs = graph.nodes[source]
        target_attrs = graph.nodes[target]
        relationship = data.get('relationship', 'No relationship specified')

        # Format and print source and target node attributes along with the relationship
        source_attrs_formatted = ', '.join([f"{k}: {v}" for k, v in source_attrs.items()])
        target_attrs_formatted = ', '.join([f"{k}: {v}" for k, v in target_attrs.items()])
        
        print(f"Source [{source_attrs_formatted}] -> Target [{target_attrs_formatted}]: Relationship [{relationship}]")

# Assuming 'F' is your graph instance
list_graph_relationships_with_node_attributes(F)

Graph Relationships with Node Attributes:
Source [created_at: 2024-02-25 17:25:52, updated_at: 2024-02-25 17:25:52, username: exampleUser, email: user@example.com] -> Target [created_at: 2024-02-25 17:25:52, updated_at: 2024-02-25 17:25:52, type: category]: Relationship [created]
Source [created_at: 2024-02-25 17:25:52, updated_at: 2024-02-25 17:25:52, username: exampleUser, email: user@example.com] -> Target [created_at: 2024-02-25 17:25:52, updated_at: 2024-02-25 17:25:52, type: category]: Relationship [created]
Source [created_at: 2024-02-25 17:25:52, updated_at: 2024-02-25 17:25:52, username: exampleUser, email: user@example.com] -> Target [created_at: 2024-02-25 17:25:52, updated_at: 2024-02-25 17:25:52, type: category]: Relationship [created]
Source [created_at: 2024-02-25 17:25:52, updated_at: 2024-02-25 17:25:52, username: exampleUser, email: user@example.com] -> Target [created_at: 2024-02-25 17:25:52, updated_at: 2024-02-25 17:25:52, type: category]: Relationship [created]
So

In [45]:
import os


In [46]:
print(os.getcwd())

/Users/vasa/Projects/cognee/cognee


## HOW TO QUERY THE GRAPH