# Preprocess data into Vector DataBase

First step is to process data into vectordb, so that we can create a retirever to load the data.

### Load the data
There are many loaders being supported in langchain. I choose PDF loader here.

In [2]:
from langchain.document_loaders import PyMuPDFLoader
path="data/道路交通安全規則.pdf"
loader = PyMuPDFLoader(path)
pdf_data = loader.load()

# Use the RecursiveCharacterTextSplitter to split the text into chunks
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=5)
all_splits = text_splitter.split_documents(pdf_data)

### Embedding Model

There are many embedding models, I choose two embedding models to conduct the experiment:
- `text-embedding-3-small`: Charged
- `all-MiniLM-L6-v2`: Free

In [3]:
embedding_name = "text-embedding-3-small"

if embedding_name == "text-embedding-3-small":
    from key import API_KEY
    from langchain.embeddings import OpenAIEmbeddings
    import os
    os.environ["OPENAI_API_KEY"] = API_KEY
    embedding = OpenAIEmbeddings(model="text-embedding-3-small")
elif embedding_name == "all-MiniLM-L6-v2":
    from langchain.embeddings import HuggingFaceEmbeddings
    model_name = "sentence-transformers/all-MiniLM-L6-v2"
    model_kwargs = {'device': 'cpu'}
    embedding = HuggingFaceEmbeddings(model_name=model_name,
                                    model_kwargs=model_kwargs)
else:
    raise NotImplementedError

  embedding = OpenAIEmbeddings(model="text-embedding-3-small")


### Create the vector database

In [4]:
from langchain.vectorstores import Chroma
persist_directory = 'db'
vectordb = Chroma.from_documents(
    documents=all_splits, 
    embedding=embedding, 
    persist_directory=persist_directory
)

## Language Model
Here we use the api service provided by `llamafactory-cli` to hold our LLM.

Use the following command to setup the service:
```
API_PORT=8000 llamafactory-cli api model_info.yaml
```

After we've create the host, we can use `ChatOpenAI` to get the output.

To combine llm with langchain, the office suggests us to use [LCEL (LangChain Expression Language)](https://python.langchain.com/v0.1/docs/expression_language/) to formualate the chain.

In [5]:
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(
    openai_api_key='None', 
    openai_api_base='http://0.0.0.0:8000/v1',
    temperature=0.5,    
)

messages = [
    (
        "system",
        "You are a helpful assistant that translates English to Traditional Chinese. Translate the user sentence.",
    ),
    ("human", "Nice to meet you! What a wonderful day!"),
]

# LECL example
chain = (
    llm # The OpenAI model
    | StrOutputParser() # Parse the output of the OpenAI model
)

print(chain.invoke(messages))

你好！真是一個美好的一天！


## Normal LLM versus RAG

First, is the chain of normal LLM.

We only apply template and output parser before and after the inference of llm.

In [6]:
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough, RunnableLambda

llm = ChatOpenAI(
    openai_api_key='None', 
    openai_api_base='http://0.0.0.0:8000/v1',
    temperature=0
)

template = """Answer the following questions. If you don't know the answer, please say "I don't know".
---
Question {question}
"""

template = ChatPromptTemplate.from_template(template)

def log_prompt(prompt):
    # Print the prompt in green
    print("\033[92m{}\033[00m".format(prompt.to_string().replace("Human: ", "")))  
    return prompt

normal_chain = (
    {"question": RunnablePassthrough()}
    | template
    | RunnableLambda(log_prompt)
    | llm
    | StrOutputParser()
)

Second, is the implementation of RAG.

Apart from what we've done, we also create a retriever when we get the user's input. 

Then we provide the infomation into the prompt, so that models can response better.

In [7]:
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough, RunnableLambda

retriever = vectordb.as_retriever()

llm = ChatOpenAI(
    openai_api_key='None', 
    openai_api_base='http://0.0.0.0:8000/v1',
    temperature=0
)

template = """Answer the following questions with the extra infomation below. If you don't know the answer, please say "I don't know".

Extra information:
```
{context}
```

---
Question {question}
"""

template = ChatPromptTemplate.from_template(template)

def log_prompt(prompt):
    # Print the prompt in green
    print("\033[92m{}\033[00m".format(prompt.to_string().replace("Human: ", "")))  
    return prompt

def doc_format(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"question": RunnablePassthrough(), "context": retriever | doc_format} 
    | template
    | RunnableLambda(log_prompt)
    | llm
    | StrOutputParser()
)

Let's take a look of the result:

In [10]:
question = "行車的時候可以使用手機嗎？"

In [11]:
print(normal_chain.invoke(question))

[92mAnswer the following questions. If you don't know the answer, please say "I don't know".
---
Question 行車的時候可以使用手機嗎？
[00m
I don't know


In [12]:
print(rag_chain.invoke(question))

[92mAnswer the following questions with the extra infomation below. If you don't know the answer, please say "I don't know".

Extra information:
```
2   慢車行駛於道路時，駕駛人不得以手持方式使用行動電話、電腦或其他相類功能裝置進行撥接、
通話、數據通訊或其他有礙駕駛安全之行為。
第 121 條
（刪除）
第 122 條

2   慢車行駛於道路時，駕駛人不得以手持方式使用行動電話、電腦或其他相類功能裝置進行撥接、
通話、數據通訊或其他有礙駕駛安全之行為。
第 121 條
（刪除）
第 122 條

並應遵守下列規定：
一、禁止操作或觀看娛樂性顯示設備。
二、禁止操作行車輔助顯示設備。
三、禁止以手持方式使用行動電話、電腦或其他相類功能裝置進行撥接、通話、數據通訊或其他
有礙駕駛安全之行為。

並應遵守下列規定：
一、禁止操作或觀看娛樂性顯示設備。
二、禁止操作行車輔助顯示設備。
三、禁止以手持方式使用行動電話、電腦或其他相類功能裝置進行撥接、通話、數據通訊或其他
有礙駕駛安全之行為。
```

---
Question 行車的時候可以使用手機嗎？
[00m
不可以。根據提供的資訊，駕駛人在行駛慢車時不得以手持方式使用行動電話、電腦或其他相類功能裝置進行撥接、通話、數據通訊或其他有礙駕駛安全之行為。


## Test on traffic law in Taiwan

In [5]:
import json
from tqdm import tqdm
with open("data/test.jsonl", "r") as f:
    data = [json.loads(line) for line in f]     

Collect the output from normal chain / finetuned chain:

In [6]:

from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough, RunnableLambda

# The backbone model is either base model or finetuned model
llm = ChatOpenAI(
    openai_api_key='None', 
    openai_api_base='http://0.0.0.0:8000/v1',
    temperature=0
)

template = """Answer the following questions. If you don't know the answer, please say "I don't know".
---
Question {question}
"""

template = ChatPromptTemplate.from_template(template)

finetuned_chain = (
    {"question": RunnablePassthrough()}
    | template
    | llm
    | StrOutputParser()
)

finetuned_chain_results = []
for item in tqdm(data):
    result = finetuned_chain.invoke(item["question"])
    finetuned_chain_results.append(result)

100%|██████████| 100/100 [01:13<00:00,  1.36it/s]


In [7]:
with open("output/finetuned_chain_ckpt_400_results.jsonl", "w") as f:
    for result, data in zip(finetuned_chain_results, data):
        f.write(json.dumps({"question": data["question"], "output": result}, ensure_ascii=False) + "\n")

Collect output from RAG model:

In [6]:
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough, RunnableLambda

# The RAG one, the only difference is the embedding model in the vectorstore
retriever = vectordb.as_retriever()

llm = ChatOpenAI(
    openai_api_key='None', 
    openai_api_base='http://0.0.0.0:8000/v1',
    temperature=0
)

template = """Answer the following questions with the extra infomation below. If you don't know the answer, please say "I don't know".

Extra information:
```
{context}
```

---
Question {question}
"""

template = ChatPromptTemplate.from_template(template)

def doc_format(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"question": RunnablePassthrough(), "context": retriever | doc_format} 
    | template
    | llm
    | StrOutputParser()
)

rag_chain_embedding_huggingface_results = []
for item in tqdm(data):
    result = rag_chain.invoke(item["question"])
    rag_chain_embedding_huggingface_results.append(result)

100%|██████████| 100/100 [02:42<00:00,  1.62s/it]


In [7]:
with open("output/rag_chain_embedding_huggingface_results.jsonl", "w") as f:
    for result, data in zip(rag_chain_embedding_huggingface_results, data):
        f.write(json.dumps({"question": data["question"], "output": result}, ensure_ascii=False) + "\n")

### Calcuate performance

In [38]:
import pandas as pd
import matplotlib.pyplot as plt
import os

files = [f"log/{file}" for file in os.listdir("log") if file.endswith(".jsonl")]
files.sort()

for file in files:
    with open(file, "r") as f:
        data = [json.loads(line) for line in f]
        
        # Calculate the accuracy of each result
        accuracy = sum([int(d["correct"]) for d in data]) / len(data)
        
        print(f"{file} {accuracy * 100}%")
        

log/scores_finetuned_chain_ckpt_200_results.jsonl 50.0%
log/scores_finetuned_chain_ckpt_400_results.jsonl 49.0%
log/scores_normal_chain_results.jsonl 19.0%
log/scores_rag_chain_embedding_huggingface_results.jsonl 31.0%
log/scores_rag_chain_embedding_openai_results.jsonl 71.0%
