# Searchflow

### Packages needed to run this notebook
- libmagic: ```brew install libmagic```

### ChromaDB
Don't forget to start the ChromaDB server: ```docker run -p 7777:8000 chromadb/chroma```

### Run the API
https://docs.astral.sh/uv/guides/integration/fastapi/

For development purposes, you can run the API with: ```uv run fastapi dev```

In [2]:
# Load packages from the src directory
import sys
import json
sys.path.append('../src')
from vectrix_graphs import ExtractDocuments, setup_logger, ExtractMetaData


from dotenv import load_dotenv
load_dotenv()

True

### Extracting chunks of data from a document

In [3]:
# Create chunks of data from a document
extract = ExtractDocuments(
    logger=setup_logger(name="Files", level="INFO"),
    )

result = extract.extract(file_path="./files/attention_is_all_you_need.pdf")


2024-10-26 16:56:20,964 - Files - INFO - Extracting documents from ./files/attention_is_all_you_need.pdf


In [4]:
print('Metadata:')
print(json.dumps(result[1].metadata, indent=4))

print('Content:')
print(result[1].page_content)

Metadata:
{
    "file_directory": "./files",
    "filename": "attention_is_all_you_need.pdf",
    "languages": [
        "eng"
    ],
    "last_modified": "2024-10-22T21:44:24",
    "page_number": 1,
    "orig_elements": "eJx1VE2P2zYQ/SsDnS3VlrW73j0G2ObSjwB1kMNiYVDSWCJKkQpJ+aNB/nvfUFq3TZOTzSFn5s17T/PyJWPDA9t40G32RFn9uK5rVT7k3KhtXlUt5+q+2eYll6pudmtW24dsRdnAUbUqKuR8yRrnfKutihzS2airm+KhZ931EZGHxxIpS/Ss29gjeL+R4Oi0jZL18rKptsX9iqrNtihfV/R2vnu4K3Zyru4fi93/z/N7BLJwDZEHmeKDvrD5Y1QNZ19xcdSGD6323ETnr/Kg+EliIVsurRpYwipGUKGdPehwUMYcgPhgmdtibI9ZmsF2k+rSnC8Z2y57TdEQD4Nr9VFzYrFcl1W+WedluS83T1X1VFaSPSLzYKehZo9XG4EW+SIMZfueqXUDSLSRAn+e2DZM0Ssb2qkRSIQGbAIpz1SrwC0h1rhhNHwhjDZ5D+zkPIL25MwkScqQ5cmnn3h2/s9AsVeRtG3M1DIpS2iEwh5/W1LUcjoVJIBqDpFG9kfngay7ITDBSRMLPlGO/1NiKYC4d1PXS4cbqzRw0yurw1DQJ6bRu9EFgAC4MwUto7zhxJhNryM6TJ5XqcteyBAo7FcLA8EZNlch4js9wopaHUa2QbCfdexvNDWcsP6LqECS7lGtoOcLZtbyUQQpDTw0KKCxiyBGpUZRBdAZencWeJhjoSc6EEdhkiJQQ1v6PCmj45XOPbyGy5lL6DgqSIMR9F+qNjMmD+21lxdBdxaOauAIzAi3ojRQSX3A0Lag3yc/NyWBxycOVO

In [5]:
# Add additional metadata using a NER-pipeline
ner = ExtractMetaData(
    logger=setup_logger(name="ExtractMetaData", level="INFO",),
    model="llama3.1-70B",  # Options: gpt-4o-mini, llama3.1-8B, llama3.1-70B
    )
result_with_metadata = ner.extract(result, source="uploaded_file")

2024-10-26 16:56:25,890 - ExtractMetaData - INFO - Extracting metadata from 52 documents, using llama3.1-70B
2024-10-26 16:56:31,776 - ExtractMetaData - ERROR - Error during batch processing: Error code: 429 - {'error': {'message': 'Request was rejected due to request rate limiting. Your rate limits are 600 RPM (10 QPS) and 180000 TPM (3000 TPS). See details: https://docs.together.ai/docs/rate-limits', 'type': 'credit_limit', 'param': None, 'code': None}}
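
The 429 error above shows the provider rejecting the batch because of rate limiting (600 RPM, 10 QPS). One possible workaround is to send the documents in smaller batches with a pause in between. The sketch below is only an illustration: `batched` and `process_in_batches` are hypothetical helpers that are not part of `vectrix_graphs`, and the batch size and pause are assumed values that would need tuning against the actual limits.

```python
import time


def batched(items, size):
    """Yield consecutive slices of `items` containing at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_in_batches(items, process, batch_size=8, pause_s=1.0, sleep=time.sleep):
    """Run `process` on each batch, pausing between batches to stay under a rate limit.

    `process` is any callable that takes a list and returns a list.
    `sleep` is injectable so the pacing can be checked without waiting.
    """
    results = []
    batches = list(batched(items, batch_size))
    for index, batch in enumerate(batches):
        results.extend(process(batch))
        # No pause is needed after the final batch
        if index < len(batches) - 1:
            sleep(pause_s)
    return results
```

Assuming `ner.extract` accepts and returns a list of documents, as the cells here suggest, it could be wrapped as `process_in_batches(result, lambda docs: ner.extract(docs, source="uploaded_file"))`.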


In [6]:
result_with_metadata[0].metadata

{'filename': 'attention_is_all_you_need.pdf',
 'filetype': 'application/pdf',
 'author': '',
 'source': 'uploaded_file',
 'word_count': 107,
 'language': '',
 'content_type': '',
 'tags': '',
 'summary': '',
 'read_time': 0.535,
 'last_modified': '2024-10-22T21:44:24'}

### Adding the documents to a vector database (Chroma)

For this demo, the vector database is saved locally on disk; restarting the container will delete the database.
I prefer the cosine distance over the default squared L2 distance; we set this through the `hnsw:space` metadata.

$$
d = 1.0 - \frac{\sum(A_i \times B_i)}{\sqrt{\sum(A_i^2) \cdot \sum(B_i^2)}}
$$
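
To make the formula concrete, here is a minimal pure-Python sketch of the same distance. It is for illustration only: Chroma computes the distance internally, and `cosine_distance` is a hypothetical helper name, not part of Chroma or `vectrix_graphs`.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


# Orthogonal vectors are at distance 1; parallel vectors are at distance ~0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
print(cosine_distance([1.0, 2.0], [2.0, 4.0]))
```

Because the distance depends only on the angle between the vectors, scaling an embedding does not change it, which is one reason it is a common choice for text embeddings. In Chroma, this setting is typically passed as `{"hnsw:space": "cosine"}` in the collection metadata; how `vectordb.create_collection` forwards it is not shown in this notebook.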

We use Ollama to calculate the embeddings locally with BGE-M3. Since it supports more than 100 languages, it is well suited for embedding Arabic documents.

BGE-M3 is based on the XLM-RoBERTa architecture and is distinguished for its versatility in Multi-Functionality, Multi-Linguality, and Multi-Granularity:

- Multi-Functionality: It can simultaneously perform the three common retrieval functionalities of an embedding model: dense retrieval, multi-vector retrieval, and sparse retrieval.
- Multi-Linguality: It can support more than 100 working languages.
- Multi-Granularity: It is able to process inputs of different granularities, spanning from short sentences to long documents of up to 8192 tokens.

> ℹ️ So all embeddings will be calculated locally ℹ️


In [7]:
from vectrix_graphs import vectordb

vectordb.remove_collection("demo")
vectordb.create_collection("demo")

In [8]:
vectordb.add_documents(result_with_metadata)

In [None]:
# Now let's query the vector database
vectordb.similarity_search(
    query="What is the attention mechanism?",
    k=3
    )

## Asking questions to the Graph
Let's now ask questions using the LangGraph workflow.

### Example 1: Using closed source LLMs


### Example 2: Using open-source LLMs that can be self-hosted

In [None]:
# Load packages from the src directory
import sys
from IPython.display import Markdown, display, Image
sys.path.append('../src')

from langchain_core.messages import HumanMessage
from vectrix_graphs import local_slm_demo

# Display the graph
display(Image(local_slm_demo.get_graph().draw_mermaid_png()))

# Ask the question
input = [HumanMessage(content="What is the attention mechanism?")]


# Run the graph
response = await local_slm_demo.ainvoke({"messages": input})
display(Markdown(f"***Question:*** \n {input[0].content}\n"))
display(Markdown(response['messages'][-1].content))

In [None]:
# Load packages from the src directory
import sys
import json
from IPython.display import Markdown, display, Image
sys.path.append('../src')

from langchain_core.messages import HumanMessage
from vectrix_graphs import local_slm_demo

# Ask the question
input = [HumanMessage(content="What is the attention mechanism?")]

response = await local_slm_demo.ainvoke({"messages": input})
# Message objects are not JSON-serializable, so fall back to str() for them
print(json.dumps(response, indent=4, default=str))

In [None]:
print(response['messages'][-1])

In [14]:
from langchain_core.messages import HumanMessage, AIMessage
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate


human = HumanMessage(content="What is the attention mechanism?")
ai = AIMessage(content="The attention mechanism is a technique used in neural networks to enable the model to focus on relevant parts of the input data during processing.")

messages = ChatPromptTemplate.from_messages([human, ai])


llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

chain = messages | llm | StrOutputParser()

chain.invoke({})

'The attention mechanism is a technique used in neural networks, particularly in natural language processing (NLP) and computer vision, to allow models to focus on specific parts of the input data that are most relevant to the task at hand. It helps improve the performance of models by enabling them to weigh the importance of different input elements dynamically.\n\n### Key Concepts of Attention Mechanism:\n\n1. **Contextual Focus**: Instead of processing all input data uniformly, the attention mechanism allows the model to selectively concentrate on certain parts of the input. This is particularly useful in tasks like translation, where certain words in a source sentence may be more relevant to specific words in the target sentence.\n\n2. **Weights and Scores**: The attention mechanism computes a set of attention scores that determine how much focus to place on each part of the input. These scores are typically derived from the similarity between the current state of the model (e.g., 