# Document Loading & Retrieval

Author: [Abdulazeez Jimoh](https://github.com/abdulazeezoj)


This implementation is based on:

- PyPDF - PDF Document Parsing
- LlamaIndex - Document Loading & Retrieval
- Weaviate - Vector Storage
- Neo4j - Graph Database
- Redis - Ingestion Cache & Document Storage
- OpenAI - Large Language Models


In [2]:
from pprint import pprint

from llama_index.agent import OpenAIAgent
from llama_index.embeddings import OpenAIEmbedding
from llama_index.indices.knowledge_graph.retrievers import KGRetrieverMode
from llama_index.llms import OpenAI
from llama_index.query_engine import SubQuestionQueryEngine
from llama_index.response.schema import Response
from llama_index.response_synthesizers.type import ResponseMode
from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index.vector_stores.types import VectorStoreQueryMode

from diplodigst.config import DiploConfig
from diplodigst.tools import DiploDocLoader, DiploDocRetriever
from diplodigst.types import DiploIndex

In [3]:
diplo_config = DiploConfig()
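
`DiploConfig` itself is not shown in this notebook. As a rough sketch, assuming it does little more than read connection settings from environment variables, a stand-in could look like the following. The class name `SketchConfig`, the helper `_env`, and every default value are assumptions made for illustration; only the attribute names are taken from the cells in this notebook.

```python
import os
from dataclasses import dataclass, field


def _env(name: str, default: str) -> str:
    """Read an environment variable, falling back to a default (hypothetical helper)."""
    return os.environ.get(name, default)


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for DiploConfig; defaults are assumed, not the project's."""

    OPENAI_API_KEY: str = field(default_factory=lambda: _env("OPENAI_API_KEY", ""))
    PROJECT_NAME: str = field(default_factory=lambda: _env("PROJECT_NAME", "diplo"))
    WEAVIATE_HOST: str = field(default_factory=lambda: _env("WEAVIATE_HOST", "localhost"))
    WEAVIATE_PORT: int = field(default_factory=lambda: int(_env("WEAVIATE_PORT", "8080")))
    REDIS_HOST: str = field(default_factory=lambda: _env("REDIS_HOST", "localhost"))
    REDIS_PORT: int = field(default_factory=lambda: int(_env("REDIS_PORT", "6379")))
    NEO4J_HOST: str = field(default_factory=lambda: _env("NEO4J_HOST", "localhost"))
    NEO4J_PORT: int = field(default_factory=lambda: int(_env("NEO4J_PORT", "7687")))
    NEO4J_USERNAME: str = field(default_factory=lambda: _env("NEO4J_USERNAME", "neo4j"))
    NEO4J_PASSWORD: str = field(default_factory=lambda: _env("NEO4J_PASSWORD", ""))


# Values are resolved when the object is created, so the environment
# must be set before instantiation.
sketch_config = SketchConfig()
```

Keeping secrets such as the API key and database password in the environment, rather than in the notebook, means they do not end up in saved cell output.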

In [4]:
# initialise LLM and embedding models
llm_model = OpenAI(api_key=diplo_config.OPENAI_API_KEY)
embed_model = OpenAIEmbedding(api_key=diplo_config.OPENAI_API_KEY)

In [5]:
# initialise document loader
doc_loader = DiploDocLoader(
    embed_model=embed_model,
    llm_model=llm_model,
    weaviate_host=diplo_config.WEAVIATE_HOST,
    weaviate_port=diplo_config.WEAVIATE_PORT,
    redis_host=diplo_config.REDIS_HOST,
    redis_port=diplo_config.REDIS_PORT,
    neo4j_host=diplo_config.NEO4J_HOST,
    neo4j_port=diplo_config.NEO4J_PORT,
    neo4j_username=diplo_config.NEO4J_USERNAME,
    neo4j_password=diplo_config.NEO4J_PASSWORD,
    name=diplo_config.PROJECT_NAME,
    chunk_size=500,
    chunk_overlap=25,
    verbose=True,
)

[ INFO ] Initializing vector store
[ INFO ] Vector store initialized
[ INFO ] Initializing graph store
[ INFO ] Graph store initialized
[ INFO ] Initializing document store
[ INFO ] Document store initialized
[ INFO ] Initializing index store
[ INFO ] Index store initialized
[ INFO ] Initializing ingestion cache
[ INFO ] Ingestion cache initialized
[ INFO ] Initializing LLM and embedding models
[ INFO ] LLM and embedding models initialized
[ INFO ] Initializing service context
[ INFO ] Service context initialized
[ INFO ] Initializing storage context
[ INFO ] Storage context initialized
[ INFO ] Initializing ingestion pipeline
[ INFO ] Ingestion pipeline initialized
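
The loader above is configured with `chunk_size=500` and `chunk_overlap=25`. How `DiploDocLoader` applies these values is not shown here; the sketch below assumes the usual sliding-window meaning, where each chunk holds up to `chunk_size` tokens and repeats the last `chunk_overlap` tokens of the previous chunk. The function `chunk_tokens` is hypothetical, and real splitters (including those in LlamaIndex) typically also respect sentence boundaries, which this sketch ignores.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int, chunk_overlap: int) -> List[List[str]]:
    """Split a token list into fixed-size windows that overlap by chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the window already reached the end of the input
    return chunks


# Toy input: 1200 placeholder tokens, split with the notebook's settings.
tokens = [f"t{i}" for i in range(1200)]
chunks = chunk_tokens(tokens, chunk_size=500, chunk_overlap=25)

print([len(c) for c in chunks])          # sizes of each chunk
print(chunks[0][-25:] == chunks[1][:25])  # consecutive chunks share 25 tokens
```

A small overlap like this helps a sentence that straddles a chunk boundary stay retrievable from at least one chunk, at the cost of storing a few tokens twice.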


In [6]:
# load the document indexes (vector and knowledge graph)
doc_index: DiploIndex = doc_loader.load()

[ INFO ] Loading vector index
[ INFO ] Vector index loaded
[ INFO ] Loading knowledge graph index
[ INFO ] Knowledge graph index loaded


In [10]:
# wrap the vector index as a tool; each query retrieves the 3 most similar chunks
context_query_engine = QueryEngineTool(
    query_engine=doc_index.vector.as_query_engine(similarity_top_k=3),
    metadata=ToolMetadata(
        name="context_query_engine",
        description=(
            "Provides information for a query from documents. "
            "Use a question as input to the tool."
        ),
    ),
)

# wrap the knowledge graph index as a tool
knowledge_query_engine = QueryEngineTool(
    query_engine=doc_index.graph.as_query_engine(similarity_top_k=3),
    metadata=ToolMetadata(
        name="knowledge_query_engine",
        description=(
            "Provides information for a query from documents. "
            "Use a question as input to the tool."
        ),
    ),
)
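
Both query engines are created with `similarity_top_k=3`. For the vector index, this conventionally means that the query is embedded and the three stored chunks with the highest similarity scores are returned. The sketch below illustrates that idea in plain Python with cosine similarity over a toy in-memory store. The names `cosine_similarity`, `top_k`, and `toy_store` are hypothetical, and the vectors are made up; a real deployment delegates this search to the vector store (Weaviate in this notebook), which may use a different metric and an approximate index.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: List[float], store: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the k best matches."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up 3-dimensional "embeddings" standing in for real chunk vectors.
toy_store = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.9, 0.1, 0.0],
    "chunk-c": [0.0, 1.0, 0.0],
    "chunk-d": [0.0, 0.0, 1.0],
    "chunk-e": [0.7, 0.7, 0.0],
}

results = top_k([1.0, 0.0, 0.0], toy_store, k=3)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

A larger `k` gives the LLM more context to synthesize from but also more tokens per call and a higher chance of including loosely related chunks.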

In [11]:
# one agent per tool, so the graph-based and vector-based answers can be compared
diplo_knowledge_agent: OpenAIAgent = OpenAIAgent.from_tools(
    tools=[knowledge_query_engine],
    verbose=True,
)

diplo_context_agent: OpenAIAgent = OpenAIAgent.from_tools(
    tools=[context_query_engine],
    verbose=True,
)

In [9]:
diplo_knowledge_agent.chat_repl()

===== Entering Chat REPL =====
Type "exit" to exit.

Added user message to memory: Hello
Assistant: Hi there! How can I assist you today?

Added user message to memory: What are the commitments of Switzerland to the global digital compact?
=== Calling Function ===
Calling function: knowledge_query_engine with args: {
  "input": "commitments of Switzerland to the global digital compact"
}
Got output: Switzerland's commitments to the global digital compact include actively supporting the development of interoperable trustworthy data spaces and exploring further processes and policies. The country is committed to making digital self-determination a reality and has supported OHCHR's B-tech project. Switzerland is also a member of the Freedom Online Coalition and attaches importance to the fight against online discrimination and the distribution of misleading content. Additionally, Switzerland chairs a committee aiming to negotiate a framework convention and supports the AI for Good Global 

In [12]:
diplo_context_agent.chat_repl()

===== Entering Chat REPL =====
Type "exit" to exit.

Added user message to memory: Hello
Assistant: Hi there! How can I assist you today?

Added user message to memory: What are the commitments of Switzerland to the global digital compact?
=== Calling Function ===
Calling function: context_query_engine with args: {
  "input": "commitments of Switzerland to the global digital compact"
}
Got output: Switzerland commits to further supporting efforts to address connectivity issues, such as through the GIGA Initiative and other efforts undertaken by ITU, UNESCO, and other UN institutions. Switzerland also commits to further international discussions and best practices on enabling environments for the deployment of broadband. Additionally, Switzerland supports Access Now's campaign to prevent internet shutdowns and the 2023 conference on digital rights, Rightscon. Switzerland attaches great importance to the fight against online discrimination and the distribution of misleading content and c