[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/telmo-correa/nebulous-llm-experiment/blob/main/notebooks/2%20-%20RetrievalQA%20chain.ipynb)

### 2. RetrievalQA chain

This notebook loads a pre-built Chroma vector database and uses LangChain to create a retrieval QA chain on top of it.

#### Data sources

The dataset combines five mostly unstructured sources about Nebulous: Fleet Command:

| Source | Description | Link |
|---|---|---|
| nebfltcom Wiki | Pages from the fan-made wiki | [🔗](http://nebfltcom.wikidot.com/) |
| Steam guides | Contents from some of the most popular Steam guides for Nebulous: Fleet Command focusing on gameplay | [🔗](https://steamcommunity.com/app/887570/guides/) |
| `#new-players` | Text messages from the Discord channel `#new-players`, where new players ask questions | [🔗](https://discord.gg/UT6wU7TQ) |
| `#shipyard` | Text messages from the Discord channel `#shipyard`, where players ask for help with fleet and ship builds | [🔗](https://discord.gg/UT6wU7TQ) |
| in-game lore | Content from in-game lore | [🔗](https://store.steampowered.com/app/887570/NEBULOUS_Fleet_Command/) |

The precomputed embeddings are available as a Google Drive file: [data_all.tar.gz](https://drive.google.com/file/d/1-Ur30PLexcPo8mc1WPUh3Ahp6XlgptLx)

In [None]:
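## If running on Google Colab, install the dependencies.
## chromadb backs the Chroma vector store; tiktoken is needed by OpenAIEmbeddings.

%pip install openai langchain chromadb tiktoken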

Import libraries:

In [1]:
import openai
import os

import getpass

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain import OpenAI

Set up the OpenAI API key:

In [2]:
if "OPENAI_API_KEY" not in os.environ:
    print("Please enter your OpenAI API key:")
    openai_api_key = getpass.getpass()

    os.environ["OPENAI_API_KEY"] = openai_api_key
    openai.api_key = openai_api_key

Please enter your OpenAI API key:


 ········


Download the archive containing the precomputed embeddings for all data sources:

In [3]:
!wget -O /tmp/data_all.tar.gz "<dataset location here>"



Extract the archive into the current directory:

In [4]:
!tar -xvzf /tmp/data_all.tar.gz

data_all/
data_all/all_data/
data_all/all_data/chroma-embeddings.parquet.tmp
data_all/all_data/chroma-embeddings.parquet
data_all/all_data/chroma-collections.parquet
data_all/all_data/index/
data_all/all_data/index/index_d6e2a0af-e8c9-491c-b2d1-26a4451fa9af.bin
data_all/all_data/index/uuid_to_id_d6e2a0af-e8c9-491c-b2d1-26a4451fa9af.pkl
data_all/all_data/index/id_to_uuid_d6e2a0af-e8c9-491c-b2d1-26a4451fa9af.pkl
data_all/all_data/index/index_metadata_d6e2a0af-e8c9-491c-b2d1-26a4451fa9af.pkl
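
Before loading the store, it can help to confirm that the extraction produced the files listed above. The following is a minimal sketch; the helper `missing_entries` and the constant `EXPECTED_ENTRIES` are hypothetical names introduced for illustration, and the list of expected entries is an assumption based only on the tar listing shown in this notebook.

```python
from pathlib import Path

# Entry names taken from the tar listing above (assumed to be what the
# persisted Chroma directory needs; not an official Chroma specification).
EXPECTED_ENTRIES = (
    "chroma-embeddings.parquet",
    "chroma-collections.parquet",
    "index",
)


def missing_entries(persist_directory):
    """Return the expected entries that are absent from persist_directory."""
    root = Path(persist_directory)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


missing = missing_entries("data_all/all_data/")
if missing:
    print("Missing from extracted data:", ", ".join(missing))
else:
    print("Extracted data looks complete.")
```

If anything is reported missing, re-running the download and extraction cells is usually the simplest fix.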


Remove the downloaded archive:

In [5]:
!rm -rf /tmp/data_all.tar.gz

In [6]:
PERSIST_DIRECTORY = "data_all/all_data/"

In [7]:
embedding = OpenAIEmbeddings()

Load the Chroma vector store from the persisted data:

In [8]:
# Reuses the `embedding` object created in the previous cell.
docsearch = Chroma(persist_directory=PERSIST_DIRECTORY, embedding_function=embedding)


Create the retrieval QA chain:

In [9]:
## Optionally, use W&B to track responses (uncomment the lines below to enable)

# from langchain.callbacks.tracers import WandbTracer

# wandb_config = {"project": "llm_agent_test"}
# tracer = WandbTracer(wandb_config)

In [10]:
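# temperature=0 keeps the answers as repeatable as possible
llm = OpenAI(temperature=0)
chain = RetrievalQA.from_chain_type(
    llm,
    # "stuff" places all retrieved documents into a single prompt
    chain_type="stuff",
    # retrieve the 10 chunks most similar to the query
    retriever=docsearch.as_retriever(search_type="similarity", search_kwargs={"k": 10}),
    # callbacks=[tracer]
)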
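
To make the two moving parts of this chain concrete, here is a rough, library-free sketch of the idea: rank stored chunks by similarity to the query, keep the top `k`, and "stuff" them into one prompt. This is not LangChain's or Chroma's implementation. The function names, the toy 2-dimensional vectors, the placeholder texts, and the prompt wording are all invented for illustration, and cosine similarity is an assumption chosen for simplicity (the vector store's actual distance metric may differ).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, indexed_docs, k):
    """Return the k document texts whose vectors are most similar to the query."""
    ranked = sorted(
        indexed_docs,
        key=lambda doc: cosine_similarity(query_vector, doc["vector"]),
        reverse=True,
    )
    return [doc["text"] for doc in ranked[:k]]


def stuff_prompt(question, texts):
    """Concatenate all retrieved texts into a single prompt ("stuffing")."""
    context = "\n\n".join(texts)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Toy vectors and placeholder texts, invented for illustration only.
toy_docs = [
    {"text": "doc about missiles", "vector": [1.0, 0.0]},
    {"text": "doc about hulls", "vector": [0.0, 1.0]},
    {"text": "doc about missile defense", "vector": [0.9, 0.1]},
]
prompt = stuff_prompt("How do missiles work?", top_k([1.0, 0.0], toy_docs, k=2))
print(prompt)
```

Because every retrieved chunk ends up in the same prompt, a larger `k` gives the model more context but also consumes more of its context window.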

#### Query the chain

In [11]:
chain({"query": "What are the strategies for playing ANS?"}, return_only_outputs=False)

{'query': 'What are the strategies for playing ANS?',
 'result': ' When playing ANS, you should make use of their advantages such as superior electronic warfare, hybrid missile weapons, and particle beams. You should also use strategies such as utilizing the "Cruise" upgrade on missiles to waypoint them around rocks and out of LOS, and using size 2 hybrids to reach out and hit small, lone, defended targets at long range. Additionally, you should use floodlights to stay safe and AMMs to thin out hybrid strikes.'}
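
The chain returns a dictionary with the keys `query` and `result`, and the answer text begins with a space. A small sketch of pulling out a clean answer string follows; `extract_answer` is a hypothetical helper, and the dictionary below is a stand-in with placeholder answer text rather than real model output.

```python
def extract_answer(response):
    """Return the answer text from a chain response, without surrounding whitespace."""
    return response["result"].strip()


# Stand-in response with the same keys as the output above (placeholder answer text).
example_response = {
    "query": "What are the strategies for playing ANS?",
    "result": " <answer text>",
}
print(extract_answer(example_response))
```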

In [12]:
## If using W&B, finish the tracer so the run is saved

# tracer.finish()