# Vectrix Graphs

### Packages needed to run this notebook
- ***libmagic***: ```brew install libmagic```
*This is used to extract the file type from the file.*


### ChromaDB
For lightweight usage, don't forget to start the ChromaDB server: ```docker run -p 7777:8000 chromadb/chroma```

### Weaviate
If you want to use Weaviate instead of ChromaDB (which I recommend for stability and performance), don't forget to start the Weaviate server: ```docker-compose -p weaviate up -d```

It's important to run this command from the ```src/vectrix_graphs/db``` directory. You also need a .env file in this directory with the following content:
```
COHERE_API_KEY=my_cohere_api_key
```

### Run the API
See the uv guide on FastAPI integration: https://docs.astral.sh/uv/guides/integration/fastapi/

For development purposes, you can run the API with: ```uv run fastapi dev```

In [2]:
# Load packages from the src directory
import sys
import json
sys.path.append('../src')
from vectrix_graphs import ExtractDocuments, setup_logger, ExtractMetaData


from dotenv import load_dotenv
load_dotenv()



True

### Extracting chunks of data from a document

In [None]:
# Create chunks of data from a document
extract = ExtractDocuments(
    logger=setup_logger(name="Files", level="INFO"),
    )

result = extract.extract(file_path="./files/attention_is_all_you_need.pdf")


In [None]:
print('Metadata:')
print(json.dumps(result[1].metadata, indent=4))

print('Content:')
print(result[1].page_content)

In [None]:
# Add additional metadata using a NER-pipeline
ner = ExtractMetaData(
    logger=setup_logger(name="ExtractMetaData", level="INFO",),
    model="gpt-4o-mini" # Options are gpt-4o-mini, llama3.1-8B , llama3.1-70B
    )
result_with_metadata = ner.extract(result, source="uploaded_file")

In [None]:
result_with_metadata[0].metadata

### Adding the documents to a vector database (Chroma)

For this demo, the vector database is saved locally on disk, but restarting the container will delete it.
I prefer cosine distance over the default squared L2 distance; we set this through the `hnsw:space` metadata.

$$
d = 1.0 - \frac{\sum(A_i \times B_i)}{\sqrt{\sum(A_i^2) \cdot \sum(B_i^2)}}
$$
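
To make the formula concrete, here is a minimal pure-Python sketch of the same cosine distance. The function name `cosine_distance` and the constant `COSINE_COLLECTION_METADATA` are illustrative names, not part of `vectrix_graphs`. The notebook does not show how `vectordb.create_collection` forwards the setting, so treat the metadata dict as an assumption about what ends up being passed to Chroma.

```python
import math
from typing import Sequence

# Metadata asking Chroma's HNSW index to use cosine distance.
# Assumption: this dict is what gets passed as `metadata=` when the
# collection is created; the wrapper's internals are not shown here.
COSINE_COLLECTION_METADATA = {"hnsw:space": "cosine"}


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_sq = sum(x * x for x in a) * sum(y * y for y in b)
    if norm_sq == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / math.sqrt(norm_sq)
```

Vectors pointing in the same direction give a distance of 0, orthogonal vectors give 1, and opposite vectors give 2. Only the direction matters, not the length, which is why this metric is a common choice for text embeddings.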

We use Ollama to calculate the embeddings locally with BGE-M3. Because it supports more than 100 languages, it is well suited to embedding Arabic documents.

BGE-M3 is based on the XLM-RoBERTa architecture and is distinguished for its versatility in Multi-Functionality, Multi-Linguality, and Multi-Granularity:

- Multi-Functionality: It can simultaneously perform the three common retrieval functionalities of an embedding model: dense retrieval, multi-vector retrieval, and sparse retrieval.
- Multi-Linguality: It can support more than 100 working languages.
- Multi-Granularity: It is able to process inputs of different granularities, spanning from short sentences to long documents of up to 8192 tokens.

> ℹ️ So all embeddings will be calculated locally ℹ️
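
As a rough illustration of what "calculated locally" means, the sketch below requests an embedding from a local Ollama server over its REST API using only the standard library. It assumes Ollama is listening on its default address `http://localhost:11434`, that the model was pulled under the tag `bge-m3`, and that the `/api/embeddings` endpoint is available. The helper names `build_embedding_request` and `embed` are hypothetical, and how `vectrix_graphs` talks to Ollama internally is not shown in this notebook.

```python
import json
import urllib.request

# Assumption: Ollama runs on its default host and port.
OLLAMA_HOST = "http://localhost:11434"


def build_embedding_request(text: str, model: str = "bge-m3") -> urllib.request.Request:
    """Build the POST request for Ollama's /api/embeddings endpoint."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_HOST}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text: str, model: str = "bge-m3") -> list[float]:
    """Send the request and return the embedding vector from the response."""
    with urllib.request.urlopen(build_embedding_request(text, model)) as resp:
        return json.loads(resp.read())["embedding"]
```

Calling `embed("What is the attention mechanism?")` requires the Ollama server to be running. No text leaves the machine, since the request only goes to `localhost`.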


In [6]:
from vectrix_graphs import vectordb

vectordb.remove_collection("demo")
vectordb.create_collection("demo")

In [7]:
vectordb.add_documents(result_with_metadata)

In [3]:
from vectrix_graphs import vectordb
# Now let's query the vector database
vectordb.similarity_search(
    query="What is the attention mechanism?",
    k=3
    )

[Document(metadata={'language': 'EN', 'tags': "['AI', 'Machine Learning', 'Natural Language Processing']", 'read_time': 0.685, 'summary': 'The document discusses self-attention mechanisms, particularly in the context of the Transformer model, highlighting its advantages over traditional models that use RNNs or convolution. It also mentions applications of self-attention in various tasks such as reading comprehension and language modeling.', 'source': 'uploaded_file', 'filetype': 'application/pdf', 'last_modified': '2024-10-22T21:44:24', 'content_type': 'blog_post', 'filename': 'attention_is_all_you_need.pdf', 'word_count': 137.0, 'author': '', 'uuid': '98a952ae-df5b-4bf7-93f8-ee9d2ada043d'}, page_content='Self-attention, sometimes called intra-attention is an attention mechanism relating different positions of a single sequence in order to compute a representation of the sequence. Self-attention has been used successfully in a variety of tasks including reading comprehension, abstracti
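
Conceptually, `similarity_search` with `k=3` returns the three stored chunks whose embeddings are closest to the query embedding. The sketch below shows that idea as an exact brute-force search over toy vectors. It is a stand-in for illustration only: Chroma uses an approximate HNSW index rather than scanning every vector, and the names `top_k` and `cosine_distance` are hypothetical helpers, not part of `vectrix_graphs`.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


def top_k(
    query: Sequence[float], vectors: Sequence[Sequence[float]], k: int = 3
) -> list[tuple[int, float]]:
    """Return (index, distance) pairs for the k vectors closest to the query."""
    scored = [(i, cosine_distance(query, v)) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Toy 2-D "embeddings"; real BGE-M3 vectors have far more dimensions.
stored = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
nearest = top_k([1.0, 0.0], stored, k=2)
```

Here `nearest` holds the indices of the identical vector and the diagonal one, in that order, because they have the smallest cosine distance to the query.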

In [None]:
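import chromadb
from langchain_ollama import OllamaEmbeddings  # assumption: langchain-ollama is installed

# Embed the query with the same model used for the documents (BGE-M3 via Ollama);
# otherwise the query vector and the stored vectors are not comparable.
# Assumption: the model is available in Ollama under the tag "bge-m3".
embeddings = OllamaEmbeddings(model="bge-m3").embed_query("What is the attention mechanism?")

# Connect to the ChromaDB server started earlier on port 7777
client = chromadb.HttpClient(host='localhost', port=7777)

collection = client.get_collection("demo")
collection.query(
    query_embeddings=[embeddings],
    n_results=3,
    # Only return chunks whose "source" metadata equals "uploaded_file"
    where={"source": "uploaded_file"}
)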

## Asking questions to the Graph
Let's now ask questions using the LangGraph workflow.

### Example 1: Using closed source LLMs


### Example 2: Using open-source LLMs that can be self-hosted

In [None]:
# Load packages from the src directory
import sys
from IPython.display import Markdown, display, Image
sys.path.append('../src')

from langchain_core.messages import HumanMessage
from vectrix_graphs import local_slm_demo

# Display the graph
display(Image(local_slm_demo.get_graph().draw_mermaid_png()))

# Ask the question (in Arabic, to exercise the multilingual embeddings)
input = [HumanMessage(content="مامصدر؟")]


# Run the graph
response = await local_slm_demo.ainvoke({"messages": input})
display(Markdown(f"***Question:*** \n {input[0].content}\n"))
display(Markdown(response['messages'][-1].content))

In [None]:
# Load packages from the src directory
import sys
from IPython.display import Markdown, display, Image
sys.path.append('../src')

from langchain_core.messages import HumanMessage
from vectrix_graphs import local_slm_demo

# Ask the question
input = [HumanMessage(content="What is the attention mechanism?")]

# Run the graph
response = await local_slm_demo.ainvoke({"messages": input})
display(Markdown(f"***Question:*** \n {input[0].content}\n"))
display(Markdown(response['messages'][-1].content))

In [None]:
print(response['messages'][-1])

In [None]:
from langchain_core.messages import HumanMessage, AIMessage
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate


human = HumanMessage(content="What is the attention mechanism?")
ai = AIMessage(content="The attention mechanism is a technique used in neural networks to enable the model to focus on relevant parts of the input data during processing.")

messages = ChatPromptTemplate.from_messages([human, ai])


llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

chain = messages | llm | StrOutputParser()

chain.invoke({})