# Instructions
1. Export an OpenAI API key as an environment variable under the name "OPENAI_API_KEY". Without this, RAG won't run!
2. Click “Run All” in run options on the top bar.
3. In the last cell, pass your query as the argument to “query_engine.query”, then rerun that cell to get an answer to the new query.
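
Step 1 can be checked from Python before running the rest of the notebook. The following is a minimal sketch: the helper name `require_openai_key` is hypothetical and is not part of the notebook or of LlamaIndex.

```python
import os


def require_openai_key(env=None):
    """Return the OpenAI API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of the notebook.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the notebook."
        )
    return key


if __name__ == "__main__":
    try:
        require_openai_key()
        print("OPENAI_API_KEY found.")
    except RuntimeError as err:
        print(err)
```

Failing early with an explicit message is easier to diagnose than an authentication error raised later, during index construction.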

(Note: for this demonstration we have commented out the code that loads a persisted index. Google Colab does not keep data between sessions, so persistence would require a specific Google Drive folder layout. Because the dataset is small, we consider rebuilding the index on each run an acceptable cost.)


In [None]:
!pip install llama-index

Collecting llama-index
  Downloading llama_index-0.10.30-py3-none-any.whl (6.9 kB)
Collecting llama-index-agent-openai<0.3.0,>=0.1.4 (from llama-index)
  Downloading llama_index_agent_openai-0.2.2-py3-none-any.whl (12 kB)
Collecting llama-index-cli<0.2.0,>=0.1.2 (from llama-index)
  Downloading llama_index_cli-0.1.12-py3-none-any.whl (26 kB)
Collecting llama-index-core<0.11.0,>=0.10.30 (from llama-index)
  Downloading llama_index_core-0.10.30-py3-none-any.whl (15.4 MB)
Collecting llama-index-embeddings-openai<0.2.0,>=0.1.5 (from llama-index)
  Downloading llama_index_embeddings_openai-0.1.8-py3-none-any.whl (6.0 kB)
Collecting llama-index-indices-managed-llama-cloud<0.2.0,>=0.1.2 (from llama-index)
  Downloading llama_index_indices_managed_llama_cloud-0.1.5-py3-none-any.whl (6.7 kB)
Collecting llama-index-legacy<0.10.0,>=0.9.48 (from llama-index)
  Downloading 

In [None]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core import Settings
from llama_index.core import StorageContext, load_index_from_storage
from llama_index.core import PromptTemplate
# from google.colab import userdata, drive
import os

In [None]:
# drive.mount("/mnt/drive")

Mounted at /mnt/drive


Mount Google Drive in order to load the generated output.

In [None]:
def filelink_fn(filename):
  """Decode a data filename back into its video URL for use as document metadata."""
  basename = os.path.basename(filename)
  # Filenames encode URL characters that are unsafe in paths:
  # " " stands for "/", "_" for ":", and "," for "?".
  basename = basename.replace(" ", "/")
  basename = basename.replace("_", ":")
  basename = basename.replace(",", "?")
  # Files that do not decode to an https URL get an empty link.
  if not basename.startswith("https"):
    basename = ""
  # Drop the file extension so that only the URL remains.
  return {"video_link": os.path.splitext(basename)[0], "timestamp_format": "seconds"}

path = "/mnt/drive/MyDrive/CSCI2270/data"
documents = SimpleDirectoryReader(path, file_metadata=filelink_fn).load_data()

When running locally, replace the value of “path” with the path to the directory containing the data on your machine, instead of the Google Drive path.
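
To make the filename convention concrete, here is a small round-trip sketch. `decode_link` applies the same substitutions as `filelink_fn`; `encode_link` and the `.txt` extension are assumptions added for illustration, since the notebook does not show how the data files were named.

```python
import os


def encode_link(url, ext=".txt"):
    """Hypothetical inverse of filelink_fn: turn a URL into a path-safe filename."""
    return url.replace("/", " ").replace(":", "_").replace("?", ",") + ext


def decode_link(filename):
    """Recover the URL from a filename, using the same substitutions as filelink_fn."""
    basename = os.path.basename(filename)
    basename = basename.replace(" ", "/").replace("_", ":").replace(",", "?")
    if not basename.startswith("https"):
        return ""
    return os.path.splitext(basename)[0]


url = "https://www.youtube.com/watch?v=2vXdU19ouac"
name = encode_link(url)
print(repr(name))
print(decode_link(os.path.join("data", name)))
```

The scheme is lossy: a URL that itself contains a space, an underscore, or a comma does not survive the round trip. This matters for YouTube video IDs, which may contain `_`.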

In [None]:
# os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")

By default, LlamaIndex uses OpenAI models for both steps: gpt-3.5-turbo as the LLM and text-embedding-ada-002 for embeddings. Here, we load the OpenAI API key, which must be set as an environment variable. Note that GPT gives much stronger results, at least compared to HuggingFace LLMs.

In [None]:
persist_dir = "/mnt/drive/MyDrive/CSCI2270/storage"

index = None
# if (os.path.exists(persist_dir)):
#   storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
#   index = load_index_from_storage(storage_context)
# else:
index = VectorStoreIndex.from_documents(documents)
index.storage_context.persist(persist_dir)

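Persist the index to Google Drive, and load it from there when it already exists. Index construction is *very* expensive in both time and tokens, so we would like to avoid it when possible. (In this demonstration the loading branch is commented out, so the index is rebuilt on every run.)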
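
The load-if-present, otherwise build-and-persist pattern can be sketched without LlamaIndex or an API key. In the sketch below a JSON file stands in for the persisted index, and `load_or_build` and `fake_build` are hypothetical names. In the notebook, the load branch corresponds to `StorageContext.from_defaults(persist_dir=persist_dir)` followed by `load_index_from_storage`, and the build branch to `VectorStoreIndex.from_documents(documents)`.

```python
import json
import os
import tempfile


def load_or_build(persist_path, build):
    """Load a cached result if it exists; otherwise build it and save it.

    A JSON file stands in for the persisted index (an assumption, so that
    the sketch runs without LlamaIndex).
    """
    if os.path.exists(persist_path):
        with open(persist_path, "r", encoding="utf-8") as f:
            return json.load(f), "loaded"
    result = build()  # the expensive step we want to run only once
    os.makedirs(os.path.dirname(persist_path) or ".", exist_ok=True)
    with open(persist_path, "w", encoding="utf-8") as f:
        json.dump(result, f)
    return result, "built"


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "storage", "index.json")
    build_calls = []

    def fake_build():
        build_calls.append(1)
        return {"docs": 3}

    first = load_or_build(path, fake_build)
    second = load_or_build(path, fake_build)
    print(first[1], second[1], len(build_calls))
```

The second call finds the file written by the first, so the expensive build runs only once.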

In [None]:
query_engine = index.as_query_engine(streaming=True, similarity_top_k=3)

In [None]:
prompts_dict = query_engine.get_prompts()
custom_prompt = (
  "Context information is below.\n"
  "---------------------\n"
  "{context_str}\n"
  "---------------------\n"
  "Given the context information and not prior knowledge, answer the query.\n"
  "Some rules to follow:\n"
  "1. If the query asks for a recipe or a dish and the recipe has an associated video link, please include the relevant video link.\n"
  "2. If the query asks for a recipe, please write down the recipe as a numbered list with start and end timestamps for each step in minutes.\n"
  "3. When writing the steps for a recipe, please provide video links that jump to the video at that timestamp.\n"
  "Query: {query_str}\n"
  "Answer: "
)
custom_prompt_template = PromptTemplate(custom_prompt)
query_engine.update_prompts({"response_synthesizer:text_qa_template": custom_prompt_template})

# prompts_dict was captured before update_prompts, so this prints the default template.
print(prompts_dict["response_synthesizer:text_qa_template"].get_template())

**Prompt Key**: response_synthesizer:text_qa_template

**Text:**

Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {query_str}
Answer: 



**Prompt Key**: response_synthesizer:refine_template

**Text:**

The original query is as follows: {query_str}
We have provided an existing answer: {existing_answer}
We have the opportunity to refine the existing answer (only if needed) with some more context below.
------------
{context_msg}
------------
Given the new context, refine the original answer to better answer the query. If the context isn't useful, return the original answer.
Refined Answer: 



Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {query_str}
Answer: 
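
To see what the model receives, the default template printed above can be filled in with plain `str.format`. This is only an illustration: the chunks below are made up, and joining them with blank lines is an assumption, since the notebook does not show how LlamaIndex combines the retrieved context.

```python
default_prompt = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)

# Hypothetical retrieved chunks; in the notebook these come from the index.
chunks = [
    "Cook pancetta until crisp.",
    "Add diced onions, carrots, and celery.",
]

filled = default_prompt.format(
    context_str="\n\n".join(chunks),
    query_str="I want to make a pasta dish with a pressure cooker",
)
print(filled)
```

The custom template works the same way; it only adds the three formatting rules between the instruction line and the query.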


In [None]:
response = query_engine.query("I want to make a pasta dish with a pressure cooker")
response.print_response_stream()

Sure! Here is a recipe for Pressure Cooker Pasta Fagioli:

Video Link: [Pressure Cooker Pasta Fagioli Recipe](https://www.youtube.com/watch?v=2vXdU19ouac)

1. Start: 43.0 End: 68.0 - Cook pancetta in a dry pan over medium-high heat until it renders out fat and crisps up.
   [Jump to Step 1 in the video](https://www.youtube.com/watch?v=2vXdU19ouac&t=43s)

2. Start: 77.0 End: 103.0 - Add finely diced onions, carrots, and celery to the cooked pancetta and sauté until softened.
   [Jump to Step 2 in the video](https://www.youtube.com/watch?v=2vXdU19ouac&t=77s)

3. Start: 109.0 End: 123.0 - Add dried basil, oregano, and marjoram to the vegetable mixture for Italian flavors.
   [Jump to Step 3 in the video](https://www.youtube.com/watch?v=2vXdU19ouac&t=109s)

4. Start: 185.0 End: 229.0 - Add soaked cannellini beans and chicken stock to the pressure cooker, bring to a boil, and cook under high pressure for 17 minutes.
   [Jump to Step 4 in the video](https://www.youtube.com/watch?v=2vXdU19oua