# Build Your Own RAG using RAGStack
This notebook walks through using the DataStax Enterprise v7 Vector Store to ground LLM responses in your own data, making them more meaningful and reducing hallucinations. The approach used here is Retrieval Augmented Generation (RAG).

You'll learn:
1. About the content in a CNN dataset
2. How to interact with the local Mistral Chat Model *without* providing this context
3. How to load this context into DataStax Enterprise v7
4. How to run a semantic similarity search on DataStax Enterprise v7
5. How to use this context *with* the local Mistral Chat Model

## Install dependencies

In [None]:
!pip install ragstack-ai sentence-transformers datasets pipdeptree

## Visualize RAGStack dependencies
RAGStack is a curated stack of the best open-source software for easing implementation of the RAG pattern in production-ready applications using DataStax Enterprise, Astra Vector DB or Apache Cassandra as a vector store.

A single command (pip install ragstack-ai) unlocks all the open-source packages required to build production-ready RAG applications with LangChain and DataStax Enterprise, Astra Vector DB or Apache Cassandra.

For each open-source project included in RAGStack, we select a version lineup and then test the combination for compatibility, performance, and security. Our extensive test suite ensures that RAGStack components work well together so you can confidently deploy them in production. We also run security scans on all components using industry-standard tools to ensure that you are not exposed to known vulnerabilities.

In [None]:
!pipdeptree -p ragstack-ai
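
If you prefer to inspect the lineup from Python rather than the shell, the standard library's `importlib.metadata` can list the requirements a package declares. This is a minimal sketch: the helper name `direct_requirements` is made up for illustration, and it only shows direct requirements, not the full tree that `pipdeptree` prints.

```python
from importlib import metadata


def direct_requirements(package: str) -> list[str]:
    """Return the requirement strings a package declares, or [] if it is not installed."""
    try:
        # metadata.requires returns None when the package declares no requirements
        return list(metadata.requires(package) or [])
    except metadata.PackageNotFoundError:
        return []


# Prints nothing if ragstack-ai is not installed in the current environment
for requirement in direct_requirements("ragstack-ai"):
    print(requirement)
```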

## Keeping it all local and within the enterprise firewall
In this notebook we'll keep all services local, so no data leaves your environment:

- For the Vector Database, [DataStax Enterprise 7](https://www.datastax.com/blog/get-started-with-the-datastax-enterprise-7-0-developer-vector-search-preview) will be used.
- For the Foundational Model we'll be using [Mistral](https://mistral.ai/).

Read more about Mistral and how it stacks up to GPT-4 [here](https://www.zdnet.com/article/what-to-know-about-mistral-ai-the-company-behind-the-latest-gpt-4-rival/).

# Get an inference engine with Mistral started
There are many inference engines to choose from. [LM Studio](https://lmstudio.ai/), for example, has a nice UI. In this notebook, we'll use [Ollama](https://ollama.com/).

1. [Download](https://ollama.com/download) Ollama
2. Install it on your machine
3. Start the inference engine and download Mistral (~4 GB) in one step by running `ollama run mistral` in a terminal

If this fails because of RAM limitations, you can use [tinyllama](https://ollama.com/library/tinyllama) as the model instead.
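
Before calling the model from the notebook, it can help to confirm that Ollama is reachable and that the model has been pulled. The sketch below queries Ollama's `/api/tags` endpoint, which lists locally available models. The helper names are hypothetical, and the URL assumes Ollama is listening on its default port on the same machine; from inside a Docker container you would use the `host.docker.internal` address that the notebook cells use.

```python
import json
import urllib.error
import urllib.request
from typing import List, Optional


def installed_models(tags_payload: dict) -> List[str]:
    """Extract model names from the JSON returned by Ollama's /api/tags endpoint."""
    return [model["name"] for model in tags_payload.get("models", [])]


def fetch_installed_models(base_url: str, timeout: float = 3.0) -> Optional[List[str]]:
    """Return the model names Ollama reports, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
            return installed_models(json.load(response))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a response that is not valid JSON
        return None


# Assumption: Ollama runs locally on its default port 11434.
models = fetch_installed_models("http://localhost:11434")
if models is None:
    print("Ollama is not reachable. Is the inference engine still running?")
else:
    print("Installed models:", models)
```

If `mistral:latest` (or `tinyllama:latest`) is missing from the printed list, the download in step 3 has probably not finished yet.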

## Call Mistral's Chat Model
In this example we'll ask what Daniel Radcliffe receives when he turns 18.

Because Mistral has no access to the CNN documents, it can only come up with a very generic answer.

In [None]:
from langchain.prompts import ChatPromptTemplate
from langchain_community.chat_models.ollama import ChatOllama
from langchain.schema.runnable import RunnableMap
from langchain.schema.output_parser import StrOutputParser

template = """
You are a philosopher that draws inspiration from great thinkers of the past
to craft well-thought answers to user questions. Use the provided context as the basis
for your answers and do not make up new reasoning paths - just mix-and-match what you are given.
Your answers must be extensively written.

QUESTION: {question}

YOUR ANSWER:"""
prompt = ChatPromptTemplate.from_messages([("system", template)])

llm = ChatOllama(
    model="mistral:latest", 
    num_ctx=4096,
    base_url="http://host.docker.internal:11434"
)

inputs = RunnableMap({
  'question': lambda x: x['question']
})
chain = inputs | prompt | llm | StrOutputParser()

chain.invoke({"question": "What kind of fortune does Daniel Radcliffe get when he turns 18?"})

## Load data from CNN

In [None]:
import datasets

def load_articles(n=5):
  dataset = datasets.load_dataset('cnn_dailymail', '3.0.0', split='train', streaming=True)
  data = dataset.take(n)
  return [d['article']
          for d in data]

articles = load_articles()

## Check out some content
In this example we can read that when Daniel Radcliffe turns 18, he'll gain access to £20 million.

In [None]:
print(articles[0])

## Generate chunks to load into the Vector Store
Now let's prepare the CNN data for loading into the DataStax Enterprise Vector Store.
1. First we'll chunk up the data so that it can be loaded in multiple pieces.
2. Then we'll create a new Vector Store on DataStax Enterprise.
3. Lastly, we'll load the documents. As part of this step, the data will be vectorized and its embeddings stored in the Vector Store.

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)

# create_documents already splits each article into chunks of at most chunk_size characters
document_chunks = splitter.create_documents(articles)

print(document_chunks[0])
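
To build intuition for what `chunk_size` and `chunk_overlap` control, here is a simplified sketch of fixed-size chunking in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and sentences); the hypothetical `chunk_text` helper only illustrates the windowing idea.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


sample = "abcdefghij"
print(chunk_text(sample, chunk_size=4))                   # ['abcd', 'efgh', 'ij']
print(chunk_text(sample, chunk_size=4, chunk_overlap=2))  # ['abcd', 'cdef', 'efgh', 'ghij']
```

With an overlap of 0, as used in this notebook, chunks never share text. A non-zero overlap repeats the tail of one chunk at the head of the next, which can help when a relevant sentence straddles a chunk boundary.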

# Now let's run DSE 7 Vector Store
Make sure you have [Docker](https://www.docker.com/) installed.

Run DSE 7 in either of these two ways from a terminal window:
1. `docker-compose up` (using the docker-compose.yml file in the root of this repository)
2. `docker run -e DS_LICENSE=accept -p 9042:9042 datastax/dse-server:7.0.0-alpha.4`
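
The database can take a while to start accepting connections after the container launches. A small polling helper like the one below can tell you when port 9042 is open. This is a sketch with a hypothetical helper name, and a successful TCP connection is only a rough readiness signal: the server may still need a few more seconds before it answers queries.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll until a TCP connection to host:port succeeds; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("host.docker.internal", 9042)` returns `True` as soon as the port accepts connections, or `False` if that has not happened within two minutes.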

And then create a default keyspace as follows:

In [None]:
from cassandra.cluster import Cluster

# Connect to DSE7
cluster = Cluster(["host.docker.internal"])
session = cluster.connect()

# Create the default keyspace
session.execute("CREATE KEYSPACE IF NOT EXISTS default_keyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}")

# Get the Vector Store
The following code creates a new Vector Store in DataStax Enterprise. For embeddings we'll use the default model from Hugging Face.

In [None]:
from langchain_community.vectorstores import Cassandra
from langchain_community.embeddings import HuggingFaceEmbeddings

# Create a new DataStax Enterprise Vector Store
vector_store = Cassandra(
    session=session,
    keyspace="default_keyspace",
    table_name="dse_vector_table",
    embedding=HuggingFaceEmbeddings()
)

In [None]:
# Load the CNN documents into the DataStax Enterprise Vector Store (only the first time)
vector_store.add_documents(document_chunks)

## Run a semantic query on the DataStax Enterprise Vector Store
Here you'll see that DataStax Enterprise retrieves relevant documents given the query.

In [None]:
query = 'What kind of fortune does Daniel Radcliffe get when he turns 18?'
vector_store.similarity_search(query, k=2)
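
Conceptually, a similarity search embeds the query and returns the stored chunks whose vectors lie closest to it. The toy sketch below shows that idea with brute-force cosine similarity over made-up three-dimensional vectors. It is an illustration only: real embeddings have hundreds of dimensions, the database uses an index rather than scanning every row, and the similarity metric it applies depends on its configuration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector: list[float], stored: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k stored items most similar to the query vector."""
    ranked = sorted(
        stored,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Made-up vectors standing in for real embeddings
toy_store = [
    ("Radcliffe gains access to his fortune", [0.9, 0.1, 0.0]),
    ("Weather report for Miami", [0.0, 0.2, 0.9]),
    ("Harry Potter star turns 18", [0.8, 0.3, 0.1]),
]
print(top_k([1.0, 0.2, 0.0], toy_store, k=2))
```

The two chunks whose vectors point in nearly the same direction as the query are returned, and the unrelated one is left out.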

## Call Mistral's Chat Model again
Now let's run the query again on the Mistral Chat Model, this time inserting the relevant context from the DataStax Enterprise Vector Store to make the response meaningful and keep the model from hallucinating.

In [None]:
from langchain.prompts import ChatPromptTemplate
from langchain_community.chat_models.ollama import ChatOllama
from langchain.schema.runnable import RunnableMap
from langchain.schema.output_parser import StrOutputParser

# Get the retriever for the Chat Model
retriever = vector_store.as_retriever(
    search_kwargs={"k": 5}
)

# Create the prompt template
template = """
You are a philosopher that draws inspiration from great thinkers of the past
to craft well-thought answers to user questions. Use the provided context as the basis
for your answers and do not make up new reasoning paths - just mix-and-match what you are given.
Your answers must be extensively written.

CONTEXT:
{context}

QUESTION: {question}

YOUR ANSWER:"""
prompt = ChatPromptTemplate.from_messages([("system", template)])

# Define the chain
inputs = RunnableMap({
  'context': lambda x: retriever.get_relevant_documents(x['question']),
  'question': lambda x: x['question']
})
chain = inputs | prompt | llm | StrOutputParser()

# Call the chain with the question
chain.invoke({"question": "What kind of fortune does Daniel Radcliffe get when he turns 18?"})
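
One optional refinement: the chain above places the list of retrieved documents directly into `{context}`, so the prompt receives their string representation, metadata included. A small helper can pass only the chunk text instead. The sketch below uses a hypothetical `format_docs` function and stand-in objects in place of real retrieved documents, since it relies only on the `page_content` attribute.

```python
from types import SimpleNamespace


def format_docs(docs) -> str:
    """Join the page_content of retrieved documents, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


# Stand-ins for retrieved documents (only page_content is used here)
fake_docs = [
    SimpleNamespace(page_content="First retrieved chunk."),
    SimpleNamespace(page_content="Second retrieved chunk."),
]
print(format_docs(fake_docs))
```

In the chain, this could be wired in by changing the `'context'` entry to `lambda x: format_docs(retriever.get_relevant_documents(x['question']))`.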