# Yield's Question Answering (QA) bot

### High-level approach
#### Pre-processing
- Load files from the repo and chunk them on language-specific separators so that each chunk keeps enough context
- Categorize each chunk as 'general' or 'technical' based on content type, so that semantic search can filter on the query's underlying intent (general usage vs. coding/technical)
- Add chunks to vector DB
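
A minimal sketch of the categorization step, assuming the same path-based rule that `split_docs` uses further down (a `developer` substring in the file path marks technical content). The helper names and the example paths are hypothetical and only for illustration.

```python
from typing import Dict, List


def categorize_path(file_path: str) -> str:
    """Tag files under a 'developer' path as technical, everything else as general."""
    return "technical" if "developer" in file_path else "general"


def build_metadatas(file_path: str, chunks: List[str]) -> List[Dict[str, str]]:
    """Return one independent metadata dict per chunk of the given file."""
    category = categorize_path(file_path)
    return [{"file_path": file_path, "category": category} for _ in chunks]


# Hypothetical paths, not taken from the actual repo layout
metadatas = build_metadatas("developers/ladle.md", ["chunk a", "chunk b"])
print(metadatas[0]["category"])
print(categorize_path("users/borrowing.md"))
```

Each chunk carries its own metadata dict, so the vector DB can filter on `category` per chunk.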


#### Query time
- Call LLM to categorize the query as 'general' or 'technical' based on intent
- Run semantic search on the vector DB, filtered by category, to get relevant chunks for the LLM context
- Call LLM to answer user query with context
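
The query-time flow above can be sketched without any external services. This is a toy illustration, not the notebook's implementation: the bag-of-words `embed` function stands in for real embeddings, `STORE` stands in for the vector DB, and all helper names are assumptions. It also shows one possible way to handle an 'na' or malformed category by skipping the filter.

```python
import math
from collections import Counter
from typing import Dict, List, Optional

VALID_CATEGORIES = {"general", "technical"}


def normalize_category(raw: str) -> Optional[str]:
    """Clean up the classifier output; return None for 'na' or anything unexpected."""
    cleaned = raw.strip().strip("'\"").lower()
    return cleaned if cleaned in VALID_CATEGORIES else None


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(store: List[Dict[str, str]], query: str, category: Optional[str], k: int = 10) -> List[Dict[str, str]]:
    """Filter by category first (if one is known), then rank by similarity."""
    candidates = [c for c in store if category is None or c["category"] == category]
    q = embed(query)
    ranked = sorted(candidates, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    return ranked[:k]


# Made-up chunks for illustration only
STORE = [
    {"text": "add collateral then borrow an asset", "category": "general"},
    {"text": "call pour on the ladle contract to borrow", "category": "technical"},
    {"text": "lending rates are set by the pool", "category": "general"},
]

category = normalize_category(" 'General'\n")
top = search(STORE, "how do i borrow", category, k=1)
print(category, top[0]["text"])
```

Falling back to an unfiltered search when the category is unknown is a design choice; returning early with a "cannot answer" message would be an equally reasonable alternative.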


### Improvements
Parameters to explore to improve quality
- chunk size
- top-k count
- LLMs with larger context window (GPT-4 vs GPT-3.5 16k)
- Alternative code-optimized LLMs such as BigCode (https://huggingface.co/bigcode)
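
Chunk size, top-k, and context window interact: the retrieved context grows roughly as chunk size times top-k. Below is a back-of-the-envelope sketch for checking whether a combination is likely to fit. The 4-characters-per-token ratio is a rough rule of thumb rather than a measured value, and the window size and answer reserve passed in are illustrative assumptions.

```python
CHARS_PER_TOKEN = 4  # rough heuristic for English text; an assumption, not a measurement


def estimate_context_tokens(chunk_size_chars: int, top_k: int, prompt_overhead_tokens: int = 0) -> int:
    """Upper-bound estimate of prompt tokens when every retrieved chunk is full-sized."""
    return (chunk_size_chars * top_k) // CHARS_PER_TOKEN + prompt_overhead_tokens


def fits(chunk_size_chars: int, top_k: int, context_window_tokens: int, reserved_for_answer: int = 500) -> bool:
    """Check that the estimated prompt plus a reserve for the answer stays within the window."""
    needed = estimate_context_tokens(chunk_size_chars, top_k) + reserved_for_answer
    return needed <= context_window_tokens


print(estimate_context_tokens(1000, 10))  # 2500
print(fits(1000, 10, context_window_tokens=8192))
print(fits(4000, 10, context_window_tokens=8192))
```

The runs recorded in this notebook (1000-character chunks, k=10) used roughly 2,600 to 3,200 prompt tokens, so the heuristic appears to underestimate somewhat for this corpus; counting tokens with the model's tokenizer would be more reliable.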

In [None]:
# Install dependencies
!pip install langchain openai chromadb GitPython

In [347]:
import os
from langchain.document_loaders import GitLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter, Language
from langchain.text_splitter import MarkdownHeaderTextSplitter
from langchain.vectorstores import Chroma
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI  # used by ask() below
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains import LLMChain, RetrievalQAWithSourcesChain, RetrievalQA
from IPython.display import display
from IPython.display import Markdown
from getpass import getpass
from pathlib import Path
from langchain.callbacks import StdOutCallbackHandler
from langchain.prompts import PromptTemplate
from langchain.callbacks import get_openai_callback

stdout_handler = StdOutCallbackHandler() 

In [348]:
OPENAI_API_KEY = getpass()

········


In [349]:
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY

## Common Functions

### Load contents from repo

In [350]:
def load_repo(remote_repo_url, local_repo_path, branch, file_filter=None):
    local_repo_exists = Path(local_repo_path).is_dir()

    if local_repo_exists:
        loader = GitLoader(
            repo_path=local_repo_path,
            branch=branch,
            file_filter=file_filter
        ) 
    else:
        loader = GitLoader(
            clone_url=remote_repo_url,
            repo_path=local_repo_path,
            branch=branch,
            file_filter=file_filter
        )
    return loader.load()

### Split documents into chunks based on programming language separators/syntax
* Split on programming language separators/syntax
* Categorize each chunk as 'general' or 'technical' based on content type
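
As a rough illustration of what separator-based recursive splitting does, here is a simplified sketch in plain Python. It is not LangChain's implementation: the `recursive_split` helper and the separator list are assumptions made for illustration, it ignores chunk overlap, and it drops the separator at each cut point.

```python
from typing import List


def recursive_split(text: str, separators: List[str], chunk_size: int) -> List[str]:
    """Split on the coarsest separator first; recurse with finer ones for oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to fixed-width cuts
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    head, rest = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(head):
        candidate = piece if not current else current + head + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into one chunk
            continue
        if current.strip():
            chunks.append(current)
        current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            chunks.extend(recursive_split(piece, rest, chunk_size))
    if current.strip():
        chunks.append(current)
    return chunks


# Hypothetical markdown-like separators, ordered from coarse to fine
MD_SEPARATORS = ["\n## ", "\n\n", " "]
SAMPLE = "# A\n\nalpha beta\n\n## B\n\n" + "x" * 50
sample_chunks = recursive_split(SAMPLE, MD_SEPARATORS, chunk_size=20)
print([len(c) for c in sample_chunks])
```

Trying coarse separators first keeps headings and paragraphs together where possible, which is the motivation for language-aware splitting in the cell below.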

In [351]:
def split_docs(docs, language, chunk_size, chunk_overlap):
    text_splitter = RecursiveCharacterTextSplitter.from_language(language=language, chunk_size=chunk_size, chunk_overlap=chunk_overlap)

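    all_splits = []
    all_metadatas = []
    for d in docs:
        doc_file = d.page_content
        metadata = d.metadata
        splits = text_splitter.split_text(doc_file)

        # Files under a 'developer' path are treated as technical content
        if 'developer' in metadata['file_path']:
            metadata['category'] = "technical"
        else:
            metadata['category'] = "general"

        # Give each chunk its own copy so edits to one chunk's metadata don't leak into the others
        metadatas = [dict(metadata) for _ in splits]
        all_splits += splits
        all_metadatas += metadatas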
        
    return {
        'all_splits': all_splits,
        'all_metadatas': all_metadatas
    }
    

### Create Vector DB

In [352]:
def create_vector_db(all_splits, all_metadatas):
    return Chroma.from_texts(
        texts=all_splits,
        metadatas=all_metadatas,
        embedding=OpenAIEmbeddings()
    )

## QA for Yield Docs V2

In [353]:
remote_repo_url="https://github.com/yieldprotocol/docs-v2"
local_repo_path="/tmp/yield_docs_v2_repo"
branch="main"
file_filter=lambda file_path: file_path.endswith(".md")

In [354]:
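chunk_size_chars = 1000
chunk_overlap_chars = 0

docs_v2_list = load_repo(remote_repo_url, local_repo_path, branch, file_filter)
splits = split_docs(docs_v2_list, Language.MARKDOWN, chunk_size_chars, chunk_overlap_chars)

# Build the vector DB that ask() queries below
docsearch_chunks = create_vector_db(splits['all_splits'], splits['all_metadatas'])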

In [361]:
categorization_prompt_template = """
You are a Web3 expert who is able to answer any user query on Yield protocol's documentation, code, whitepapers and many other such topics.

# INSTRUCTIONS
- Classify the user's query into one of these categories - 'general' or 'technical' or 'na'
- Use the examples below as reference, do not make up any categories.
- Always return the category in plain text format.
- If you are unable to process the query, just return 'na'

QUERY: How do I borrow
ANSWER: general

QUERY: How do I borrow using code
ANSWER: technical

QUERY: How do I integrate
ANSWER: technical

QUERY: How is lending rate calculated
ANSWER: general

QUERY: {query}
ANSWER:"""

QUERY_CATEGORIZATION_PROMPT = PromptTemplate(
    template=categorization_prompt_template, input_variables=["query"]
)

In [362]:
final_answer_prompt_template = """
You are a Web3 expert who is able to answer any user query on Yield protocol's documentation, code, whitepapers and many other such topics.

# INSTRUCTIONS
- The user's query will be wrapped in triple back ticks
- Only answer the query using the provided context below which may include general information and code, do not make up any information.
- If the query is related to code or integration, provide a step-by-step explanation of the process, use code suggestions wherever relevant and always use markdown format annotated with the language to show the code.
- Yield Protocol has no JS SDK so always use ethers package for JS code suggestions.

# CONTEXT
{context}

# QUERY
```
{question}
```
"""
FINAL_ANSWER_PROMPT = PromptTemplate(
    template=final_answer_prompt_template, input_variables=["context", "question"]
)

In [363]:
def ask(query):
    display(Markdown(f"### Query\n{query}"))
    
    # 1. Call LLM to categorize query based on intent
    categorization_llm_chain = LLMChain(
        llm=ChatOpenAI(temperature=0, model="gpt-3.5-turbo"),
        prompt=QUERY_CATEGORIZATION_PROMPT
    )

    with get_openai_callback() as cb:
        result = categorization_llm_chain.run(query=query)
        # Normalize the label: drop surrounding whitespace, quotes, and casing differences
        category = result.strip().replace("'", "").lower()
        display(Markdown(f"### Query Category\n**{category}**"))
        print(cb)

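    # 2. Call LLM to answer user's query
    # Only filter by category when the classifier returned a known label;
    # 'na' (or anything unexpected) would otherwise match no chunks.
    search_kwargs = {'k': 10}
    if category in ('general', 'technical'):
        search_kwargs['filter'] = {'category': category}

    chain_type_kwargs = {"prompt": FINAL_ANSWER_PROMPT}
    qa_chain = RetrievalQA.from_chain_type(
        llm=ChatOpenAI(temperature=0, model="gpt-4"),
        chain_type="stuff",
        retriever=docsearch_chunks.as_retriever(search_kwargs=search_kwargs),
        chain_type_kwargs=chain_type_kwargs,
        return_source_documents=True
    )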

    with get_openai_callback() as cb:
        answer = qa_chain({'query': query})
        display(Markdown("### Final Answer"))
        display(Markdown(answer['result']))
        print(cb)
        # print(answer['source_documents'])

In [364]:
ask("how do i borrow, what are some of the main concepts to know about")

### Query
how do i borrow, what are some of the main concepts to know about

### Query Category
**general**

Tokens Used: 180
	Prompt Tokens: 179
	Completion Tokens: 1
Successful Requests: 1
Total Cost (USD): $0.0002705


### Final Answer

To borrow using Yield Protocol, follow these three simple steps:

1. Choose an asset you want to borrow.
2. Add collateral.
3. Review and initiate the transaction.

You can begin borrowing by accessing the [Yield v2 App](https://app.yieldprotocol.com/#/borow). Read our [Quick Start Guide](https://medium.com/yield-protocol/yield-protocol-v2-quickstart-guide-e516a955a405) for instructions.

Some main concepts to know about when borrowing on Yield Protocol are:

- Fixed interest rate: The interest rate you receive when borrowing is determined by a built-in automated market. The more you borrow, the higher your interest rate may be.
- Overcollateralization: All loans in Yield require overcollateralization, meaning you need to provide a greater value of collateral than the debt you're taking on.
- Vault: A vault is a collateralized debt position. Each vault allows you to deposit one type of collateral and borrow a single asset for a fixed term.
- Series: A series represents a single borrowable asset with a defined maturity date. Each series corresponds to a single ERC-20 fyToken.
- Maturity: At maturity, you must repay the debt to reclaim your collateral. You can also repay your debt earlier than the maturity by returning the fyTokens you have drawn. However, repaying early may result in not receiving your original fixed rate.

Remember to maintain a sufficient amount of collateral in your vault at all times to avoid liquidation.

Tokens Used: 3396
	Prompt Tokens: 3088
	Completion Tokens: 308
Successful Requests: 1
Total Cost (USD): $0.11112


In [365]:
ask("how do i borrow using code")

### Query
how do i borrow using code

### Query Category
**technical**

Tokens Used: 171
	Prompt Tokens: 170
	Completion Tokens: 1
Successful Requests: 1
Total Cost (USD): $0.000257


### Final Answer

To borrow using code, you can interact with the Yield Protocol's Ladle contract. The Ladle contract provides a function called `pour` that allows you to deposit collateral and borrow fyTokens. Here's a step-by-step guide on how to borrow using code:

1. Install the ethers package:

```bash
npm install ethers
```

2. Import the ethers package and set up your Ethereum provider:

```javascript
const { ethers } = require("ethers");

// Set up your Ethereum provider
const provider = new ethers.providers.JsonRpcProvider("YOUR_RPC_URL");
```

3. Set up your wallet and connect it to the provider:

```javascript
const walletPrivateKey = "YOUR_PRIVATE_KEY";
const wallet = new ethers.Wallet(walletPrivateKey, provider);
```

4. Define the Ladle contract address and ABI:

```javascript
const ladleAddress = "LADLE_CONTRACT_ADDRESS";
const ladleABI = [ /* LADLE_CONTRACT_ABI */ ];
```

5. Create a contract instance for the Ladle contract:

```javascript
const ladleContract = new ethers.Contract(ladleAddress, ladleABI, wallet);
```

6. Define the parameters for the `pour` function:

```javascript
const vaultId = "YOUR_VAULT_ID";
const ink = ethers.utils.parseUnits("COLLATERAL_AMOUNT", "COLLATERAL_DECIMALS");
const art = ethers.utils.parseUnits("FYTOKEN_AMOUNT", "FYTOKEN_DECIMALS");
```

7. Call the `pour` function to deposit collateral and borrow fyTokens:

```javascript
async function borrow() {
  try {
    const tx = await ladleContract.pour(vaultId, ink, art);
    const receipt = await tx.wait();
    console.log("Borrow successful:", receipt);
  } catch (error) {
    console.error("Borrow failed:", error);
  }
}

borrow();
```

Replace `YOUR_RPC_URL`, `YOUR_PRIVATE_KEY`, `LADLE_CONTRACT_ADDRESS`, `LADLE_CONTRACT_ABI`, `YOUR_VAULT_ID`, `COLLATERAL_AMOUNT`, `COLLATERAL_DECIMALS`, `FYTOKEN_AMOUNT`, and `FYTOKEN_DECIMALS` with the appropriate values for your use case.

Tokens Used: 3113
	Prompt Tokens: 2648
	Completion Tokens: 465
Successful Requests: 1
Total Cost (USD): $0.10733999999999999


In [366]:
ask("I don't understand what ladle is, explain in detail")

### Query
I don't understand what ladle is, explain in detail

### Query Category
**technical**

Tokens Used: 177
	Prompt Tokens: 176
	Completion Tokens: 1
Successful Requests: 1
Total Cost (USD): $0.000266


### Final Answer

The Ladle is a routing and asset management contract for Yield Protocol. It is designed to orchestrate contract calls throughout the protocol, providing user-oriented features in an efficient manner. The Ladle is the most complex contract in the protocol and has considerable privileges.

The Ladle is authorized to make changes to the accounting in the Cauldron, which is the core accounting contract in Yield Protocol. It is also the only contract that is authorized to create, modify, or destroy Vaults in the Cauldron.

The Ladle keeps a registry of all Joins (asset adapters) and is authorized to move assets from any Join to any account. It also moves assets from users to Joins, with allowances approved by the users.

The Ladle is authorized to mint fyToken (fixed yield tokens) at will. It also moves fyToken from users to FYToken contracts for burning, with allowances approved by the users. The Ladle knows about all the existing fyTokens through the series registry in the Cauldron.

The Ladle keeps a registry of all the Pools (liquidity pools), indexed by the id of the series traded. It also moves assets from users to Pool contracts for trading, with allowances approved by the users.

The Ladle provides various user features such as depositing collateral, borrowing fyTokens, repaying debt, withdrawing collateral, and more. Users can also perform actions like serving, rolling, repaying, redeeming, and transferring tokens.

To execute multiple actions in a single transaction, the Ladle provides a `batch` function, which is the most common method of operation. It can also execute arbitrary calls on any registered contracts using the `route` function. Additionally, the Ladle can be extended by the use of modules, which can be authorized via governance and inherit from LadleStorage to read and modify the Ladle storage.

Tokens Used: 3528
	Prompt Tokens: 3154
	Completion Tokens: 374
Successful Requests: 1
Total Cost (USD): $0.11706
