<a href="https://colab.research.google.com/github/tomasonjo/blogs/blob/master/llm/ParentChildRetriever.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
!pip install neo4j openai tiktoken langchain wikipedia

Collecting wikipedia
  Downloading wikipedia-1.4.0.tar.gz (27 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: wikipedia
  Building wheel for wikipedia (setup.py) ... done
  Created wheel for wikipedia: filename=wikipedia-1.4.0-py3-none-any.whl size=11678 sha256=9f90f32a58c160130e1b271219d426d94fd6559de2b5a6cf79338727baaa27ad
  Stored in directory: /root/.cache/pip/wheels/5e/b6/c5/93f3dec388ae76edc830cb42901bb0232504dfc0df02fc50de
Successfully built wikipedia
Installing collected packages: wikipedia
Successfully installed wikipedia-1.4.0


In [3]:
from langchain.graphs import Neo4jGraph

# Connection details for the Neo4j instance; fill in your own password
url = "neo4j+s://fc4af8ee.databases.neo4j.io"
username = "neo4j"
password = ""
graph = Neo4jGraph(
    url=url,
    username=username,
    password=password
)
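
The cell above keeps the password inline as an empty placeholder. A minimal sketch of one alternative is to read the connection settings from environment-style variables. The variable names `NEO4J_URI`, `NEO4J_USERNAME` and `NEO4J_PASSWORD`, the helper `load_neo4j_settings`, and the example password are assumptions made for illustration; none of them come from the notebook.

```python
import os
from typing import Dict, Mapping

# Hypothetical variable names; pick whatever your deployment uses.
REQUIRED = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def load_neo4j_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect Neo4j connection settings, failing early if any is missing or empty."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError("Missing settings: " + ", ".join(missing))
    return {
        "url": env["NEO4J_URI"],
        "username": env["NEO4J_USERNAME"],
        "password": env["NEO4J_PASSWORD"],
    }


# Example with an explicit mapping instead of the real environment.
settings = load_neo4j_settings({
    "NEO4J_URI": "neo4j+s://fc4af8ee.databases.neo4j.io",
    "NEO4J_USERNAME": "neo4j",
    "NEO4J_PASSWORD": "example-password",  # made-up value for the demo
})
```

The resulting dictionary uses the same keyword names as the constructor call above, so it could be passed as `Neo4jGraph(**settings)`.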

In [6]:
from langchain.document_loaders import WikipediaLoader
from langchain.text_splitter import TokenTextSplitter

# Load the Wikipedia pages returned for the query
raw_documents = WikipediaLoader(query="Walt Disney").load()
# Define chunking strategy: large parent chunks give the LLM context,
# small child chunks are what gets embedded and searched
parent_splitter = TokenTextSplitter(chunk_size=1000, chunk_overlap=24)
child_splitter = TokenTextSplitter(chunk_size=100, chunk_overlap=24)

parent_documents = parent_splitter.split_documents(raw_documents)
for d in parent_documents:
    # Split each parent chunk further into its child chunks
    child_documents = child_splitter.split_documents([d])
    params = {"parent": d.page_content, "children": [c.page_content for c in child_documents]}
    # Store the parent once, then one Child node per chunk linked back to it
    graph.query("""
    CREATE (p:Parent {text: $parent})
    WITH p
    UNWIND $children AS child
    CREATE (c:Child {text: child})
    CREATE (c)-[:HAS_PARENT]->(p)
    """, params)

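To make the two-level chunking above easier to picture, here is a minimal sketch of an overlapping sliding-window split applied twice, using the same sizes (1000/24 for parents, 100/24 for children). It only approximates what `TokenTextSplitter` does and is not the library's code: the helper `split_tokens` is hypothetical, and made-up string tokens stand in for the real tokenizer output.

```python
from typing import Dict, List


def split_tokens(tokens: List[str], chunk_size: int, chunk_overlap: int) -> List[List[str]]:
    """Cut tokens into windows of at most chunk_size items,
    each starting chunk_size - chunk_overlap items after the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[List[str]] = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
        start += step
    return chunks


# Made-up tokens standing in for a tokenized article.
tokens = [f"t{i}" for i in range(2500)]

parents = split_tokens(tokens, chunk_size=1000, chunk_overlap=24)
hierarchy: List[Dict[str, object]] = [
    {"parent": p, "children": split_tokens(p, chunk_size=100, chunk_overlap=24)}
    for p in parents
]

for i, entry in enumerate(hierarchy):
    print(i, len(entry["parent"]), "parent tokens,", len(entry["children"]), "children")
```

Each entry in `hierarchy` corresponds to one `Parent` node and its `Child` nodes in the graph: the children are what the vector index searches, while the parent is what gets returned.
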
In [7]:
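import os

from langchain.vectorstores.neo4j_vector import Neo4jVector
from langchain.embeddings.openai import OpenAIEmbeddings

# Placeholder; set your own OpenAI API key before running
os.environ['OPENAI_API_KEY'] = "sk-"

# The vector search supplies `node` (a matched Child) and `score`;
# this query swaps each matched child for its parent's text
retrieval_query = """
MATCH (node)-[:HAS_PARENT]->(parent)
RETURN parent.text AS text, score, {} AS metadata
"""

# Embed the `text` property of Child nodes into `embedding`
# and search them through the vector index named 'retrieval'
vector_index = Neo4jVector.from_existing_graph(
    OpenAIEmbeddings(),
    url=url,
    username=username,
    password=password,
    index_name='retrieval',
    node_label="Child",
    text_node_properties=['text'],
    embedding_node_property='embedding',
    retrieval_query=retrieval_query
)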
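
Note that the retrieval query above returns one row per matched child. When several of the top-scoring children belong to the same parent, that parent's text comes back more than once. The following is a minimal client-side sketch of keeping each parent once with its best score. The helper `dedupe_parents` and the sample hits are made up for illustration and are not part of the notebook or of LangChain.

```python
from typing import Dict, List, Tuple


def dedupe_parents(hits: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Keep each parent text once, with the highest score among its children,
    ordered from best to worst score."""
    best: Dict[str, float] = {}
    for parent_text, score in hits:
        if parent_text not in best or score > best[parent_text]:
            best[parent_text] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Made-up (parent text, child score) rows, as the retrieval query might return them.
hits = [
    ("parent A", 0.91),
    ("parent A", 0.88),
    ("parent B", 0.90),
    ("parent A", 0.75),
]

unique = dedupe_parents(hits)
print(unique)
```

The same effect could probably be achieved inside the Cypher query by aggregating with `max(score)` per parent before the `RETURN`; that variant is not what this notebook runs.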

In [8]:
response = vector_index.similarity_search("Where was Walt Disney born?")
print(response[0].page_content)

Walter Elias Disney (; December 5, 1901 –  December 15, 1966) was an American animator, film producer, and entrepreneur. A pioneer of the American animation industry, he introduced several developments in the production of cartoons. As a film producer, he holds the record for most Academy Awards earned and nominations by an individual. He was presented with two Golden Globe Special Achievement Awards and an Emmy Award, among other honors. Several of his films are included in the National Film Registry by the Library of Congress and have also been named as some of the greatest films ever by the American Film Institute.
Born in Chicago in 1901, Disney developed an early interest in drawing. He took art classes as a boy and took a job as a commercial illustrator at the age of 18. He moved to California in the early 1920s and set up the Disney Brothers Studio (now The Walt Disney Company) with his brother Roy. With Ub Iwerks, he developed the character Mickey Mouse in 1928, his first highl

In [9]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

vector_qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(), chain_type="stuff", retriever=vector_index.as_retriever())

In [10]:
response = vector_qa.run("Where was Walt Disney born?")
print(response)

Walt Disney was born in Chicago, Illinois, United States.
