# Yield's Question Answering (QA) bot

### High-level approach
#### Pre-processing
- Load files from the repo and chunk them on language-specific separators so that each chunk keeps enough context
- Categorize each chunk as 'general' or 'technical' based on content type, so that semantic search can filter on the query's underlying intent (general usage vs. coding/technical)
- Add chunks to vector DB
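
The pre-processing steps above can be sketched without any external library. This is only an illustration under assumptions: `split_markdown` is a simplified stand-in for a language-aware splitter (it splits on Markdown headings and does not break up a single oversized section), and the file paths in the usage note are hypothetical. The category rule mirrors the path-based one used later in this notebook.

```python
import re


def split_markdown(text, chunk_size=1000):
    """Split on Markdown headings, then greedily merge sections up to chunk_size characters."""
    # Zero-width split before every line that starts with 1-6 '#' characters and a space
    sections = [s for s in re.split(r"(?m)^(?=#{1,6} )", text) if s.strip()]
    chunks, current = [], ""
    for section in sections:
        # Start a new chunk when adding this section would exceed the size budget
        if current and len(current) + len(section) > chunk_size:
            chunks.append(current)
            current = ""
        current += section
    if current:
        chunks.append(current)
    return chunks


def categorize_path(file_path):
    """Treat files under a 'developer' path as technical, everything else as general."""
    return "technical" if "developer" in file_path else "general"


def preprocess(files, chunk_size=1000):
    """Turn a {file_path: text} mapping into chunk records ready for a vector DB."""
    records = []
    for file_path, text in files.items():
        for chunk in split_markdown(text, chunk_size):
            records.append({
                "text": chunk,
                "metadata": {"file_path": file_path, "category": categorize_path(file_path)},
            })
    return records
```

For example, `preprocess({"docs/developers/ladle.md": "# Ladle\n..."})` would yield records tagged 'technical', because the (hypothetical) path contains `developer`.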


#### Query time
- Call LLM to categorize the query as 'general' or 'technical' based on intent
- Run semantic search with a category filter on the vector DB to get relevant chunks for the LLM context
- Call LLM to answer user query with context
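
The three query-time steps can be sketched as one function that takes each step as a callable. Everything here is an assumption made for illustration: `answer_query`, the `toy_*` stand-ins, and the two-record `INDEX` are hypothetical and replace the real LLM calls and vector DB so the flow can be run offline.

```python
def answer_query(query, classify, search, generate, k=10):
    """Run the three query-time steps with caller-supplied functions."""
    # 1. Categorize the query by intent ('general' or 'technical')
    category = classify(query)
    # 2. Retrieve the top-k chunks whose metadata matches that category
    chunks = search(query, k, {"category": category})
    # 3. Answer the query using only the retrieved chunks as context
    return generate(query, "\n\n".join(chunks))


# Toy stand-ins for the LLM calls and the vector DB (all hypothetical)
INDEX = [
    {"text": "Borrowing requires collateral.", "category": "general"},
    {"text": "Call pour on the Ladle.", "category": "technical"},
]


def toy_classify(query):
    return "technical" if "code" in query else "general"


def toy_search(query, k, metadata_filter):
    matches = [r["text"] for r in INDEX if r["category"] == metadata_filter["category"]]
    return matches[:k]


def toy_generate(query, context):
    return f"Q: {query}\nContext: {context}"


result = answer_query("how do i borrow using code", toy_classify, toy_search, toy_generate)
```

Passing the steps in as functions keeps the flow testable: the stand-ins can be swapped for real LLM and retriever calls without changing `answer_query`.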


### Improvements
#### Parameters to explore to improve quality
- chunk size
- top-k count
- LLMs with a larger context window (GPT-4 vs GPT-3.5 16k)
- Alternative code-optimized LLMs such as BigCode (https://huggingface.co/bigcode)
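
One way to explore these parameters systematically is to enumerate every combination and evaluate each against a fixed set of test queries. The sketch below only generates the combinations. The specific values in `PARAM_GRID` are illustrative assumptions, not recommendations from this notebook, and the evaluation step is left out.

```python
from itertools import product

# Hypothetical candidate values for each parameter listed above
PARAM_GRID = {
    "chunk_size": [500, 1000, 2000],
    "top_k": [5, 10],
    "model": ["gpt-3.5-turbo-16k", "gpt-4"],
}


def iter_configs(grid):
    """Yield one dict per combination of parameter values."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


configs = list(iter_configs(PARAM_GRID))
```

Each config changes the index as well as the query path when `chunk_size` differs, so the vector DB would need to be rebuilt per chunk size.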

#### Code suggestions
Code suggestions could be improved by indexing the [COOKBOOK](https://github.com/yieldprotocol/addendum-docs/blob/main/COOKBOOK.md), as the V2 documentation does not have extensive code examples.
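
As a rough sketch of how the COOKBOOK could fit the existing scheme, the helpers below select only that file from a cloned repo and tag it as 'technical'. Both function names are hypothetical, and treating the COOKBOOK as 'technical' is an assumption based on it being a source of code examples.

```python
from pathlib import PurePosixPath


def cookbook_file_filter(file_path):
    """Keep only the COOKBOOK file when loading the addendum-docs repo."""
    return PurePosixPath(file_path).name == "COOKBOOK.md"


def categorize_path(file_path):
    """Tag the COOKBOOK and developer docs as technical, everything else as general."""
    if cookbook_file_filter(file_path) or "developer" in file_path:
        return "technical"
    return "general"
```

A filter like `cookbook_file_filter` has the same shape as the `file_filter` lambda used for the V2 docs below: it takes a file path and returns a boolean.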

In [1]:
# Install dependencies
!pip install langchain openai chromadb GitPython ipython tiktoken





In [2]:
import os
from langchain.document_loaders import GitLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter, Language
from langchain.text_splitter import MarkdownHeaderTextSplitter
from langchain.vectorstores import Chroma
from langchain.llms import OpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains import RetrievalQAWithSourcesChain, RetrievalQA
from IPython.display import display
from IPython.display import Markdown
from getpass import getpass
from pathlib import Path
from langchain.callbacks import StdOutCallbackHandler
from langchain.prompts import PromptTemplate
from langchain.callbacks import get_openai_callback
from langchain import LLMChain
from langchain.chat_models import ChatOpenAI

stdout_handler = StdOutCallbackHandler() 

In [3]:
OPENAI_API_KEY = getpass()

········


In [4]:
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY

## Common Functions

### Load contents from repo

In [5]:
def load_repo(remote_repo_url, local_repo_path, branch, file_filter=None):
    local_repo_exists = Path(local_repo_path).is_dir()

    if local_repo_exists:
        loader = GitLoader(
            repo_path=local_repo_path,
            branch=branch,
            file_filter=file_filter
        ) 
    else:
        loader = GitLoader(
            clone_url=remote_repo_url,
            repo_path=local_repo_path,
            branch=branch,
            file_filter=file_filter
        )
    return loader.load()

### Split documents into chunks based on programming language separators/syntax
* Split on programming language separators/syntax
* Categorize each chunk as 'general' or 'technical' based on content type

In [6]:
def split_docs(docs, language, chunk_size, chunk_overlap):
    # Split on separators specific to the given language (e.g. Markdown headings)
    text_splitter = RecursiveCharacterTextSplitter.from_language(
        language=language, chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )

    all_splits = []
    all_metadatas = []
    for d in docs:
        splits = text_splitter.split_text(d.page_content)

        # Files under a 'developer' path are treated as technical, everything else as general
        category = "technical" if 'developer' in d.metadata['file_path'] else "general"

        # Give each chunk its own metadata dict so chunks do not share one mutable object
        metadatas = [{**d.metadata, 'category': category} for _ in splits]
        all_splits += splits
        all_metadatas += metadatas

    return {
        'all_splits': all_splits,
        'all_metadatas': all_metadatas
    }
    

### Create Vector DB

In [7]:
def create_vector_db(all_splits, all_metadatas):
    # Embed each chunk with OpenAI embeddings and store it with its metadata in Chroma
    return Chroma.from_texts(texts=all_splits, metadatas=all_metadatas, embedding=OpenAIEmbeddings())

## QA for Yield Docs V2

In [8]:
remote_repo_url="https://github.com/yieldprotocol/docs-v2"
local_repo_path="/tmp/yield_docs_v2_repo"
branch="main"
file_filter=lambda file_path: file_path.endswith(".md")

In [10]:
chunk_size_chars = 1000
chunk_overlap_chars = 0

docs_v2_list = load_repo(remote_repo_url, local_repo_path, branch, file_filter)
splits = split_docs(docs_v2_list, Language.MARKDOWN, chunk_size_chars, chunk_overlap_chars)
docsearch_chunks = create_vector_db(splits['all_splits'], splits['all_metadatas'])

In [11]:
categorization_prompt_template = """
You are a Web3 expert who is able to answer any user query on Yield protocol's documentation, code, whitepapers and many other such topics.

# INSTRUCTIONS
- Classify the user's query into one of these categories - 'general' or 'technical' or 'na'
- Use the examples below as reference, do not make up any categories.
- Always return the category in plain text format.
- If you are unable to process the query, just return 'na'

QUERY: How do I borrow
ANSWER: general

QUERY: How do I borrow using code
ANSWER: technical

QUERY: How do I integrate
ANSWER: technical

QUERY: How is lending rate calculated
ANSWER: general

QUERY: {query}
ANSWER:"""

QUERY_CATEGORIZATION_PROMPT = PromptTemplate(
    template=categorization_prompt_template, input_variables=["query"]
)
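
The prompt above allows the model to return 'na', but no chunk is stored with that category, so filtering on it would presumably return no context. A minimal sketch of one way to handle this is to normalize the raw output and drop the filter when the category is not one of the two indexed values. The helper names are hypothetical, and falling back to an unfiltered search is an assumption about the desired behavior.

```python
VALID_CATEGORIES = {"general", "technical"}


def normalize_category(raw):
    """Clean the LLM output; return None if it is not an indexed category."""
    cleaned = raw.strip().strip("'\"").strip().lower()
    return cleaned if cleaned in VALID_CATEGORIES else None


def build_search_kwargs(raw_category, k=10):
    """Build retriever search kwargs, adding the category filter only when it is usable."""
    search_kwargs = {"k": k}
    category = normalize_category(raw_category)
    if category is not None:
        search_kwargs["filter"] = {"category": category}
    return search_kwargs
```

The resulting dict has the same shape as the `search_kwargs` passed to `as_retriever` in the `ask` function.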

In [12]:
final_answer_prompt_template = """
You are a Web3 expert who is able to answer any user query on Yield protocol's documentation, code, whitepapers and many other such topics.

# INSTRUCTIONS
- The user's query will be wrapped in triple backticks
- Only answer the query using the provided context below which may include general information and code, do not make up any information.
- If the query is related to code or integration, provide a step-by-step explanation of the process, use code suggestions wherever relevant and always use markdown format annotated with the language to show the code.
- Yield Protocol has no JS SDK so always use ethers package for JS code suggestions.

# CONTEXT
{context}

# QUERY
```
{question}
```
"""
FINAL_ANSWER_PROMPT = PromptTemplate(
    template=final_answer_prompt_template, input_variables=["context", "question"]
)

In [13]:
def ask(query):
    display(Markdown(f"### Query\n{query}"))
    
    # 1. Call LLM to categorize query based on intent
    categorization_llm_chain = LLMChain(
        llm=ChatOpenAI(temperature=0, model="gpt-3.5-turbo"),
        prompt=QUERY_CATEGORIZATION_PROMPT
    )

    with get_openai_callback() as cb:
        result = categorization_llm_chain.run(query=query)
        # Strip whitespace and quotes so the value matches the stored 'category' metadata
        category = result.strip().replace("'", "")
        display(Markdown(f"### Query Category\n**{category}**"))
        print(cb)

    # 2. Call LLM to answer user's query
    chain_type_kwargs = {"prompt": FINAL_ANSWER_PROMPT}
    qa_chain = RetrievalQA.from_chain_type(
        llm=ChatOpenAI(temperature=0, model="gpt-4"), 
        chain_type="stuff", 
        retriever=docsearch_chunks.as_retriever(search_kwargs = {
            'k': 10,
            'filter': {'category': category}

        }), 
        chain_type_kwargs=chain_type_kwargs,
        return_source_documents=True
    )

    with get_openai_callback() as cb:
        answer = qa_chain({'query': query})
        display(Markdown("### Final Answer"))
        display(Markdown(answer['result']))
        print(cb)
        # print(answer['source_documents'])  # uncomment to inspect the retrieved chunks

In [14]:
ask("how do i borrow, what are some of the main concepts to know about")

### Query
how do i borrow, what are some of the main concepts to know about

### Query Category
**general**

Tokens Used: 180
	Prompt Tokens: 179
	Completion Tokens: 1
Successful Requests: 1
Total Cost (USD): $0.0002705


### Final Answer

To borrow on Yield Protocol, follow these steps:

1. Access the Yield v2 App at https://app.yieldprotocol.com/#/borow.
2. Choose the asset you want to borrow.
3. Add collateral to your vault.
4. Review and initiate the transaction.

Here are some main concepts to know about borrowing on Yield Protocol:

- Borrowing is done at a fixed interest rate determined by a built-in automated market.
- All loans require overcollateralization, meaning you need to deposit a greater value of collateral than the debt you're taking on.
- You can borrow at a fixed rate for a fixed term, but you may repay early if you choose. However, repaying early may result in not receiving your original fixed rate.
- If you don't close your loan at the time of maturity, floating-rate interest will be charged to keep the position open.
- When borrowing, you deposit collateral in the protocol and draw new fyTokens against the collateral. You can then sell the fyTokens for the underlying token, locking in your borrowing rate.
- At maturity, you must repay the debt to reclaim your collateral. You can also repay your debt earlier than the maturity by returning the fyTokens you have drawn, but be aware that interest rate changes may affect the amount of the borrowed asset you need to spend to obtain the needed fyTokens.

For more information, refer to the Borrowing section in the Yield Protocol documentation: https://docs.yieldprotocol.com/users/borrowing

Tokens Used: 1917
	Prompt Tokens: 1616
	Completion Tokens: 301
Successful Requests: 1
Total Cost (USD): $0.06653999999999999


In [15]:
ask("how do i borrow using code")

### Query
how do i borrow using code

### Query Category
**technical**

Tokens Used: 171
	Prompt Tokens: 170
	Completion Tokens: 1
Successful Requests: 1
Total Cost (USD): $0.000257


### Final Answer

To borrow using code, you need to follow these steps:

1. Import the required packages and contracts.
2. Set up a connection to the Ethereum network using a provider.
3. Instantiate the Cauldron, Ladle, and Join contracts.
4. Create a vault if you don't have one.
5. Approve the transfer of collateral to the Join contract.
6. Add collateral to the vault.
7. Borrow the desired amount of fyToken.

Here's a step-by-step example using ethers.js:

```javascript
// Import ethers
const { ethers } = require("ethers");

// Import contract ABIs
const CauldronABI = require("./abis/Cauldron.json");
const LadleABI = require("./abis/Ladle.json");
const JoinABI = require("./abis/Join.json");

// Set up Ethereum provider
const provider = new ethers.providers.JsonRpcProvider("https://mainnet.infura.io/v3/YOUR_INFURA_API_KEY");

// Set up your Ethereum wallet
const wallet = new ethers.Wallet("YOUR_PRIVATE_KEY", provider);

// Contract addresses
const cauldronAddress = "0x..."; // Cauldron contract address
const ladleAddress = "0x..."; // Ladle contract address
const joinAddress = "0x..."; // Join contract address for the collateral

// Instantiate contracts
const cauldron = new ethers.Contract(cauldronAddress, CauldronABI, wallet);
const ladle = new ethers.Contract(ladleAddress, LadleABI, wallet);
const join = new ethers.Contract(joinAddress, JoinABI, wallet);

// Define collateral and debt parameters
const collateralAmount = ethers.utils.parseUnits("1", 18); // 1 collateral token (assuming 18 decimals)
const debtAmount = ethers.utils.parseUnits("100", 6); // 100 fyToken (assuming 6 decimals)
const ilk = "0x..."; // Collateral type identifier
const series = "0x..."; // Debt series identifier
const userAddress = "0x..."; // Your Ethereum address

// Create a vault if you don't have one
const vaultId = await cauldron.build(ilk, series, userAddress);

// Approve the transfer of collateral to the Join contract
const collateralToken = new ethers.Contract(collateralTokenAddress, ERC20_ABI, wallet);
await collateralToken.approve(joinAddress, collateralAmount);

// Add collateral to the vault
await join.join(vaultId, collateralAmount);

// Borrow the desired amount of fyToken
await ladle.borrow(vaultId, debtAmount);
```

Replace the placeholders with the appropriate values for your use case. This example assumes you have the ABIs for the Cauldron, Ladle, and Join contracts, as well as the ERC20 ABI for the collateral token.

Tokens Used: 2083
	Prompt Tokens: 1504
	Completion Tokens: 579
Successful Requests: 1
Total Cost (USD): $0.07986


In [16]:
ask("I don't understand what ladle is, explain in detail")

### Query
I don't understand what ladle is, explain in detail

### Query Category
**technical**

Tokens Used: 177
	Prompt Tokens: 176
	Completion Tokens: 1
Successful Requests: 1
Total Cost (USD): $0.000266


### Final Answer

The Ladle is a central contract in the Yield Protocol that serves as a routing and asset management system. It is designed to facilitate user interactions with the protocol and manage various components, such as Vaults, Pools, and Joins. The Ladle can be upgraded through Modules or replaced entirely, making it a flexible and powerful component of the protocol.

Key responsibilities of the Ladle include:

1. Managing Vaults: The Ladle is the only contract authorized to create, modify, or destroy Vaults in the Cauldron, which is the core accounting system of the Yield Protocol.

2. Managing Joins: The Ladle keeps a registry of all Joins (contracts that handle token deposits and withdrawals) and is authorized to move assets from any Join to any account.

3. Managing fyTokens: The Ladle is authorized to mint fyToken (fixed yield tokens) at will and moves fyTokens from users to FYToken contracts for burning, with allowances approved by the users.

4. Execution Flow: The Ladle allows users to batch multiple actions in a single transaction using the `batch` function. It can also execute arbitrary calls on any registered contracts using the `route` function and can be extended by the use of modules through `moduleCall`.

5. User Features: The Ladle provides collateralized debt features, such as depositing collateral, borrowing fyTokens, repaying debt, and withdrawing collateral using the `pour` function. It also offers other features like serving, rolling, repaying, redeeming, transferring, and managing permits and Ether.

In summary, the Ladle is a crucial contract in the Yield Protocol that orchestrates contract calls and manages various components to provide a seamless and efficient user experience.

Tokens Used: 2147
	Prompt Tokens: 1802
	Completion Tokens: 345
Successful Requests: 1
Total Cost (USD): $0.07476
