# Chatbot with LangChain conversational chain and Mistral 🤖💬

In this notebook we build a chatbot that can answer questions about custom data, such as an employer's policies.

The Mistral LLM is released under the Apache-2.0 license, so it can be used freely with no usage fees.

This notebook uses [openbuddy-mistral-7B-v13-GGUF](https://huggingface.co/TheBloke/openbuddy-mistral-7B-v13-GGUF), a fine-tune of Mistral that supports multiple languages, including Korean.

The chatbot uses LangChain's `ConversationalRetrievalChain` and has the following capabilities:

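- Answers questions asked in natural language
- Runs a semantic search in Elasticsearch to find the documents that answer the question
- Uses the Mistral LLM to extract and summarize the answer
- Keeps conversational memory for follow-up questions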


## Requirements 🧰

This example requires the following:

- Python 3.6 or later
- A locally installed Elasticsearch instance

## Install packages 📦

First, `pip install` the packages required for this example.


In [1]:
%pip install -U langchain elasticsearch tiktoken sentence_transformers llama-cpp-python wget

Note: you may need to restart the kernel to use updated packages.


## Initialize clients 🔌

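Next, supply the connection credentials. `getpass` is part of the Python standard library and prompts for secrets without echoing them. The values below are hardcoded for a local test cluster; the commented-out `input` and `getpass` calls show the interactive alternative.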

In [2]:
from getpass import getpass

ES_URL = "https://localhost:9200" #input('Elasticsearch URL(ex:https://127.0.0.1:9200): ')
ES_USER = "elastic" 
ES_USER_PASSWORD = "elastic" #getpass('elastic user PW: ')
CERT_PATH = "/home/wonseop/es/8.11.1/kibana-8.11.1/data/ca_1700913435542.crt" #input('Path to the Elasticsearch pem file: ')
# How to generate the pem file: https://cdax.ch/2022/02/20/elasticsearch-python-workshop-1-the-basics/

# set OpenAI API key
# OPENAI_API_KEY = getpass("OpenAI API key")
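
If you would rather not keep credentials in the notebook, one option is to read them from environment variables and only prompt when they are missing. The helper below is a minimal sketch; `read_setting` and the environment variable names are hypothetical and not part of the original notebook.

```python
import os
from getpass import getpass


def read_setting(env_name, prompt, secret=False):
    """Return a setting from the environment, or prompt for it interactively.

    Hypothetical helper for illustration only.
    """
    value = os.environ.get(env_name)
    if value:
        return value
    # getpass hides what is typed; input echoes it, which is fine for non-secrets.
    return getpass(prompt) if secret else input(prompt)
```

For example, `ES_USER_PASSWORD = read_setting("ES_USER_PASSWORD", "elastic user PW: ", secret=True)` would use the environment variable when it is set and fall back to a hidden prompt otherwise.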


## Load and process documents 📄

Time to load the data!
We will use the Workplace Search example data, a list of employee documents and policies.


In [3]:
import json
import os

cwd = os.getcwd()
path = cwd + "/data/workplace-docs.json"

# Read the local JSON file; the context manager closes it afterwards
with open(path, encoding="utf-8") as f:
    workplace_docs = json.load(f)

print(f"Successfully loaded {len(workplace_docs)} documents")

Successfully loaded 15 documents


## Chunk documents into passages 🪓

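While you chat with the bot, it runs a semantic search on the index to find relevant documents.
For this to be accurate, the full documents must be split into small chunks, also called passages.
That way, semantic search finds the passages within a document that are most likely to answer our question.

We use LangChain's `CharacterTextSplitter` to split each document's text into chunks of 512 characters, with 256 characters of overlap between adjacent chunks. The splitter breaks text at blank lines by default, so a passage without such a break can exceed the target size; the warnings in the output below report those cases.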

In [4]:
from langchain.text_splitter import CharacterTextSplitter

metadata = []
content = []

for doc in workplace_docs:
    content.append(doc["content"])
    metadata.append({
        "name": doc["name"],
        "summary": doc["summary"]
    })

text_splitter = CharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=256
)
docs = text_splitter.create_documents(content, metadatas=metadata)

print(f"Split {len(workplace_docs)} documents into {len(docs)} passages")

Created a chunk of size 607, which is longer than the specified 512
Created a chunk of size 788, which is longer than the specified 512
Created a chunk of size 547, which is longer than the specified 512
Created a chunk of size 635, which is longer than the specified 512
Created a chunk of size 866, which is longer than the specified 512
Created a chunk of size 619, which is longer than the specified 512
Created a chunk of size 1120, which is longer than the specified 512
Created a chunk of size 567, which is longer than the specified 512


Split 15 documents into 89 passages
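
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch that cuts a string into fixed-size character windows. It is not how `CharacterTextSplitter` works internally (that class splits on a separator and then merges the pieces); `sliding_chunks` is a hypothetical helper used only to show what overlapping chunks look like.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    # Each new window starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one passage.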


In [5]:
from elasticsearch import Elasticsearch

client = Elasticsearch(
    ES_URL,
    basic_auth=(ES_USER, ES_USER_PASSWORD),
    ca_certs=CERT_PATH
)

if client.indices.exists(index="workplace-docs"):
    client.indices.delete(index="workplace-docs")

Let's generate embeddings and use them to index the documents.


In [6]:
import os
cwd = os.getcwd()

# Create the models directory if it does not exist yet
os.makedirs(cwd + "/models", exist_ok=True)

In [7]:
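os.chdir(cwd + "/models")

# os.system does not raise when the command fails, so check the exit status
# instead of wrapping the call in try/except.
# `&&` runs git clone only after `git lfs install` succeeds; a single `&`
# would start it in the background.
status = os.system("git lfs install && git clone https://huggingface.co/intfloat/multilingual-e5-base")
if status != 0:
    print("git clone did not complete; the model may already exist.")

os.chdir(cwd)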

fatal: destination path 'multilingual-e5-base' already exists and is not an empty directory.


In [8]:
from langchain.vectorstores import ElasticsearchStore
from langchain.embeddings import HuggingFaceEmbeddings

print(cwd + "/models/multilingual-e5-base")

# embeddings = HuggingFaceEmbeddings(model_name="intfloat/multilingual-e5-base", model_kwargs = {'device': 'cpu'} )
embeddings = HuggingFaceEmbeddings(model_name=cwd + "/models/multilingual-e5-base", model_kwargs = {'device': 'cpu'} )

vector_store = ElasticsearchStore.from_documents(
    docs,
    es_connection = client,
    index_name="workplace-docs",
    embedding=embeddings
)

Updated git hooks.
Git LFS initialized.
/home/wonseop/Projects/es-lab-kr/notebooks/generative-ai/models/multilingual-e5-base


  from .autonotebook import tqdm as notebook_tqdm
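
Conceptually, semantic search ranks passages by how close their embedding vectors are to the embedding of the query. The toy sketch below uses hand-made 3-dimensional vectors and exact cosine similarity; real embeddings have many more dimensions, and the search Elasticsearch performs on the index may be approximate. All names and vectors here are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_passages(query_vector, passage_vectors, k=2):
    """Return the names of the k passages most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vec), name)
        for name, vec in passage_vectors.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Hand-made vectors, not real embeddings.
toy_passages = {
    "vacation policy": [0.9, 0.1, 0.0],
    "sales organization": [0.1, 0.9, 0.2],
    "code of conduct": [0.2, 0.2, 0.9],
}
toy_query = [0.0, 1.0, 0.1]

print(rank_passages(toy_query, toy_passages))
# ['sales organization', 'code of conduct']
```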


## Chat with the chatbot 💬

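Let's initialize the chatbot.
We define Elasticsearch as the store for document retrieval and chat session history,
define Mistral as the LLM that interprets questions and summarizes answers, and then pass both to the conversational chain.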

In [9]:
import wget

model_file = cwd + "/models/openbuddy-mistral-7b-v13.Q4_K_M.gguf"

# Download the quantized model only if it is not already present
if not os.path.isfile(model_file):
    wget.download("https://huggingface.co/TheBloke/openbuddy-mistral-7B-v13-GGUF/resolve/main/openbuddy-mistral-7b-v13.Q4_K_M.gguf", out=cwd + "/models/")

In [10]:
from langchain.llms import LlamaCpp
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

n_gpu_layers = None  # Metal set to 1 is enough.
n_batch = 1  # Should be between 1 and n_ctx, consider the amount of RAM of your Apple Silicon Chip.
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# Make sure the model path is correct for your system!
llm = LlamaCpp(
    # https://huggingface.co/TheBloke/openbuddy-mistral-7B-v13-GGUF
    model_path = cwd + "/models/openbuddy-mistral-7b-v13.Q4_K_M.gguf",
    # n_gpu_layers=n_gpu_layers,
    # n_batch=n_batch,
    n_ctx=4096,

    # https://www.reddit.com/r/LocalLLaMA/comments/1343bgz/what_model_parameters_is_everyone_using/
    temperature=0.75,
    top_k=1,
    top_p=1,

    # max_tokens=2048,
    verbose=True,
    f16_kv=True,  # MUST set to True, otherwise you will run into problem after a couple of calls
    callback_manager=callback_manager,
)

llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from /home/wonseop/Projects/es-lab-kr/notebooks/generative-ai/models/openbuddy-mistral-7b-v13.Q4_K_M.gguf (version GGUF V2)
llama_model_loader: - tensor    0:                token_embd.weight q4_K     [  4096, 36608,     1,     1 ]
llama_model_loader: - tensor    1:              blk.0.attn_q.weight q4_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    2:              blk.0.attn_k.weight q4_K     [  4096,  1024,     1,     1 ]
llama_model_loader: - tensor    3:              blk.0.attn_v.weight q6_K     [  4096,  1024,     1,     1 ]
llama_model_loader: - tensor    4:         blk.0.attn_output.weight q4_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    5:            blk.0.ffn_gate.weight q4_K     [  4096, 14336,     1,     1 ]
llama_model_loader: - tensor    6:              blk.0.ffn_up.weight q4_K     [  4096, 14336,     1,     1 ]
llama_model_loader: - tensor    7:         
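
A note on the sampling settings used when creating the LLM: `temperature=0.75` is combined with `top_k=1`. Under the usual definition of top-k sampling, `top_k=1` keeps only the single most likely token, so generation is effectively greedy and the temperature should have little or no influence. The toy sampler below illustrates that idea; it is a sketch with hypothetical names and made-up logits, not the sampler llama.cpp actually runs.

```python
import math
import random


def sample_top_k(logits, top_k, temperature, rng):
    """Sample a token index from the top_k highest logits after temperature scaling."""
    # Keep only the indices of the top_k largest logits.
    candidates = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax weights over the surviving candidates, with temperature scaling.
    scaled = [logits[i] / temperature for i in candidates]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(candidates, weights=weights, k=1)[0]


toy_logits = [1.0, 3.5, 2.0, 0.5]  # made-up scores for four tokens
rng = random.Random(0)

# With top_k=1 only the highest-scoring token (index 1) can ever be chosen.
picks = {sample_top_k(toy_logits, top_k=1, temperature=0.75, rng=rng) for _ in range(100)}
print(picks)
# {1}
```

If you want more varied answers, raising `top_k` is what lets the temperature take effect.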

In [11]:
#from langchain.llms import OpenAI
from langchain.chains import ConversationalRetrievalChain
from lib.elasticsearch_chat_message_history import ElasticsearchChatMessageHistory
from uuid import uuid4

retriever = vector_store.as_retriever()

# llm = OpenAI(openai_api_key=OPENAI_API_KEY)

chat = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)

session_id = str(uuid4())
chat_history = ElasticsearchChatMessageHistory(
    client=vector_store.client,
    session_id=session_id,
    index="workplace-docs-chat-history"
)

Now we can ask the chatbot questions!

Notice how the chat history is passed as context for each question.

In [12]:
# Define a convenience function for Q&A
def ask(question, chat_history):
    result = chat({"question": question, "chat_history": chat_history.messages})
    # Names of the documents the retrieved passages came from
    sources = [d.metadata["name"] for d in result["source_documents"]]
    print(f"""[QUESTION] {question}
[ANSWER]  {result["answer"]}
          [SUPPORTING DOCUMENTS] {sources}""")
    # Persist both sides of the exchange so follow-up questions have context
    chat_history.add_user_message(result["question"])
    chat_history.add_ai_message(result["answer"])

# Chat away!
print(f"[CHAT SESSION ID] {session_id}")

[CHAT SESSION ID] 3a35cfac-ad01-433f-a045-ab81180b8df8


💡 _Try experimenting with other questions or after clearing the workplace data, and observe how the responses change._


In [13]:
ask("What does NASA stand for?", chat_history)


 NASA stands for North America South America. It is the Area Vice-President of North America and the Area Vice-President of South America in our organization's sales structure.
[QUESTION] What does NASA stand for?
[ANSWER]   NASA stands for North America South America. It is the Area Vice-President of North America and the Area Vice-President of South America in our organization's sales structure.

          [SUPPORTING DOCUMENTS] ['Sales Organization Overview', 'Code Of Conduct', 'Code Of Conduct', 'New Employee Onboarding Guide']



llama_print_timings:        load time =     693.17 ms
llama_print_timings:      sample time =      18.82 ms /    39 runs   (    0.48 ms per token,  2072.37 tokens per second)
llama_print_timings: prompt eval time =   99425.20 ms /  1157 tokens (   85.93 ms per token,    11.64 tokens per second)
llama_print_timings:        eval time =    4583.97 ms /    38 runs   (  120.63 ms per token,     8.29 tokens per second)
llama_print_timings:       total time =  104484.34 ms


In [14]:
ask("Which countries are part of it?", chat_history)


Llama.generate: prefix-match hit


 What countries are included in the North America and South America regions according to NASA's sales structure?



llama_print_timings:        load time =     693.17 ms
llama_print_timings:      sample time =      10.02 ms /    22 runs   (    0.46 ms per token,  2196.05 tokens per second)
llama_print_timings: prompt eval time =    8564.69 ms /   104 tokens (   82.35 ms per token,    12.14 tokens per second)
llama_print_timings:        eval time =    2336.84 ms /    21 runs   (  111.28 ms per token,     8.99 tokens per second)
llama_print_timings:       total time =   11016.05 ms
Llama.generate: prefix-match hit


 According to NASA's sales structure, the North America region includes the United States, Canada, Mexico, and Central and South America. The Area Vice-President of North America is Laura Martinez, while the Area Vice-President of South America is Gary Johnson.
[QUESTION] Which countries are part of it?
[ANSWER]   According to NASA's sales structure, the North America region includes the United States, Canada, Mexico, and Central and South America. The Area Vice-President of North America is Laura Martinez, while the Area Vice-President of South America is Gary Johnson.

          [SUPPORTING DOCUMENTS] ['Sales Organization Overview', 'Sales Organization Overview', 'Sales Organization Overview', 'Sales Engineering Collaboration']



llama_print_timings:        load time =     693.17 ms
llama_print_timings:      sample time =      28.57 ms /    58 runs   (    0.49 ms per token,  2030.46 tokens per second)
llama_print_timings: prompt eval time =   43655.00 ms /   520 tokens (   83.95 ms per token,    11.91 tokens per second)
llama_print_timings:        eval time =    6734.35 ms /    58 runs   (  116.11 ms per token,     8.61 tokens per second)
llama_print_timings:       total time =   50750.12 ms
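
The first streamed line in the output above ("What countries are included in the North America and South America regions...") is the follow-up question rewritten as a standalone question: before retrieving documents, the chain asks the LLM to rephrase the follow-up using the chat history. The sketch below shows roughly what such a prompt could look like. The template text and function names are hypothetical; LangChain ships its own prompt with different wording.

```python
# Hypothetical template for illustration; not LangChain's actual prompt.
CONDENSE_TEMPLATE = (
    "Given the following conversation and a follow-up question, "
    "rephrase the follow-up question to be a standalone question.\n\n"
    "Chat history:\n{history}\n\n"
    "Follow-up question: {question}\n"
    "Standalone question:"
)


def format_history(turns):
    """Render (speaker, text) pairs as one line per turn."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)


def build_condense_prompt(turns, question):
    """Combine prior turns and the new question into a single rephrasing prompt."""
    return CONDENSE_TEMPLATE.format(history=format_history(turns), question=question)


turns = [
    ("Human", "What does NASA stand for?"),
    ("Assistant", "NASA stands for North America South America."),
]
print(build_condense_prompt(turns, "Which countries are part of it?"))
```

Because the rewritten question names the subject explicitly, the retriever can find relevant passages even though the user only wrote "it".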


In [15]:
ask("Who are the team's leads?", chat_history)

Llama.generate: prefix-match hit


 Who are the team's leads in NASA's sales structure?



llama_print_timings:        load time =     693.17 ms
llama_print_timings:      sample time =       7.51 ms /    16 runs   (    0.47 ms per token,  2131.34 tokens per second)
llama_print_timings: prompt eval time =   14700.83 ms /   178 tokens (   82.59 ms per token,    12.11 tokens per second)
llama_print_timings:        eval time =    1678.28 ms /    15 runs   (  111.89 ms per token,     8.94 tokens per second)
llama_print_timings:       total time =   16486.38 ms
Llama.generate: prefix-match hit



The Area Vice-President of North America is Laura Martinez, while the Area Vice-President of South America is Gary Johnson. They lead the respective teams for the North and South America regions within the NASA sales structure.
[QUESTION] Who are the team's leads?
[ANSWER]  
The Area Vice-President of North America is Laura Martinez, while the Area Vice-President of South America is Gary Johnson. They lead the respective teams for the North and South America regions within the NASA sales structure.

          [SUPPORTING DOCUMENTS] ['Sales Organization Overview', 'Sales Engineering Collaboration', 'Sales Engineering Collaboration', 'Sales Organization Overview']



llama_print_timings:        load time =     693.17 ms
llama_print_timings:      sample time =      24.88 ms /    50 runs   (    0.50 ms per token,  2009.81 tokens per second)
llama_print_timings: prompt eval time =   32621.29 ms /   391 tokens (   83.43 ms per token,    11.99 tokens per second)
llama_print_timings:        eval time =    5588.11 ms /    49 runs   (  114.04 ms per token,     8.77 tokens per second)
llama_print_timings:       total time =   38505.83 ms


## (Optional) Clean up 🧹

When you are done, you can clear the chat history for this session.

In [16]:
chat_history.clear()

... or delete the indices.


In [17]:
vector_store.client.indices.delete(index='workplace-docs')
vector_store.client.indices.delete(index='workplace-docs-chat-history')

ObjectApiResponse({'acknowledged': True})