# LangChain in Practice: A Preschool Education Sales Chatbot

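## Building Sales-Script Data with GPT-4

Example prompt for generating the sales data with ChatGPT. It asks the model to act as a top preschool-education salesperson in China who is training new hires, and to produce 100 practical sales scripts, each formatted as a `[客户问题]` (customer question) line followed by a `[销售回答]` (sales answer) line: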

```
你是中国顶级的学前教育销售，现在培训职场新人，请给出100条实用的销售话术。

每条销售话术以如下格式给出：
[客户问题]
[销售回答]

```

## Processing the Raw Data with the Document Transformers Module


Save the output generated by ChatGPT to the file [sales_data.txt](sales_data.txt).

In [1]:
with open("sales_data.txt", encoding='utf8') as f:
    sales = f.read()

### Splitting the Text with CharacterTextSplitter

- Splits the text on a single separator string (`separator`)
- Measures chunk length in characters (`chunk_size`)

Reference example:

```python
from langchain.text_splitter import CharacterTextSplitter

text_splitter = CharacterTextSplitter(
    separator="\n\n",
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
    is_separator_regex=False,
)
```


In [2]:
from langchain.text_splitter import CharacterTextSplitter

In [3]:
text_splitter = CharacterTextSplitter(
    separator=r'\d+\.',       # regex: split on item numbers such as "1." or "42."
    chunk_size=100,
    chunk_overlap=0,
    length_function=len,
    is_separator_regex=True,  # treat `separator` as a regular expression
)

In [4]:
docs = text_splitter.create_documents([sales])

In [5]:
docs[0]

Document(page_content='[客户问题] 学前教育真的有必要吗？\n[销售回答] 当然，学前教育是孩子人生中的重要起步，能够帮助他们更好地适应小学教育，培养好习惯和社交能力。', metadata={})

In [6]:
len(docs)

100
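
To make the effect of the regex separator concrete, here is a minimal sketch in plain Python that mimics only the first step of the splitter: splitting on `\d+\.` and dropping empty pieces. It is a simplification, since the real `CharacterTextSplitter` also merges small pieces up to `chunk_size`, which this sketch ignores. The sample text and the helper name `split_numbered_items` are made up for illustration.

```python
import re
from typing import List


def split_numbered_items(text: str, pattern: str = r"\d+\.") -> List[str]:
    """Split text on a regex separator and drop empty or whitespace-only pieces."""
    pieces = re.split(pattern, text)
    return [piece.strip() for piece in pieces if piece.strip()]


# Hypothetical sample in the same format as sales_data.txt
sample = (
    "1. [客户问题] 学费贵吗？\n[销售回答] 我们提供多种套餐。\n\n"
    "2. [客户问题] 有试听课吗？\n[销售回答] 有的，可以预约。"
)

chunks = split_numbered_items(sample)
print(len(chunks))
print(chunks[0])
```

One caveat with this pattern: any digit followed by a period inside an answer (for example a decimal such as `3.5`) would also be treated as a separator, so the generated data should be checked for such cases.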

### Using Faiss as the Vector Database to Persist the Sales QA Pairs

In [None]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS

# Embed every QA-pair document with OpenAI embeddings and index the vectors in FAISS
db = FAISS.from_documents(docs, OpenAIEmbeddings())


In [None]:
query = "你们的师资力量怎么样"

In [None]:
answer_list = db.similarity_search(query)

In [None]:
for ans in answer_list:
    print(ans.page_content + "\n")
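
Conceptually, `similarity_search` embeds the query and returns the stored documents whose vectors are closest to it (4 results by default, `k=4`). The sketch below illustrates that idea with brute-force Euclidean distance over toy 2-D vectors. It is not how FAISS is implemented internally, and the vectors, texts, and the `nearest_texts` helper are invented for illustration.

```python
import math
from typing import List, Sequence, Tuple


def nearest_texts(
    query_vec: Sequence[float],
    store: List[Tuple[str, Sequence[float]]],
    k: int = 4,
) -> List[str]:
    """Return the texts of the k stored vectors closest to query_vec (Euclidean distance)."""
    ranked = sorted(store, key=lambda item: math.dist(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


# Toy "embeddings"; real ones come from an embedding model and have many more dimensions
toy_store = [
    ("teachers", (0.9, 0.1)),
    ("pricing", (0.1, 0.9)),
    ("location", (0.5, 0.5)),
]

print(nearest_texts((1.0, 0.0), toy_store, k=2))
```

With these toy vectors the query is closest to `"teachers"`, then `"location"`, so those two texts are returned in that order.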

In [None]:
db.save_local("sale")

### Retrieving Results from the Vector Database with a Retriever

#### Use the `k` parameter to set the number of results returned


In [None]:
topK_retriever = db.as_retriever(search_kwargs={"k": 3})

In [None]:
topK_retriever

In [None]:
docs = topK_retriever.get_relevant_documents(query)
for doc in docs:
    print(doc.page_content + "\n")

In [None]:
docs = topK_retriever.get_relevant_documents("有没有优惠政策？")

In [None]:
for doc in docs:
    print(doc.page_content + "\n")

#### Use `similarity_score_threshold` to set a threshold and improve result relevance

In [None]:
retriever = db.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.8}
)

In [None]:
docs = retriever.get_relevant_documents(query)
for doc in docs:
    print(doc.page_content + "\n")
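
The threshold retriever keeps only documents whose relevance score reaches `score_threshold`, so it may return fewer results than expected, or none at all. Below is a minimal sketch of that filtering step, using hand-written (text, score) pairs rather than real retrieval scores. `filter_by_threshold` is a hypothetical helper, not a LangChain function, and the use of `>=` for the comparison is an assumption.

```python
from typing import List, Tuple


def filter_by_threshold(
    scored: List[Tuple[str, float]], score_threshold: float
) -> List[str]:
    """Keep texts whose relevance score is >= score_threshold, best first."""
    kept = [(text, score) for text, score in scored if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in kept]


# Made-up scores in [0, 1]; higher means more relevant
scored_docs = [("answer A", 0.86), ("answer B", 0.78), ("answer C", 0.52)]

print(filter_by_threshold(scored_docs, 0.8))
print(filter_by_threshold(scored_docs, 0.75))
print(filter_by_threshold(scored_docs, 0.9))
```

Raising the threshold trades recall for precision: at 0.9 nothing passes, which is the situation the later fallback to the language model is meant to handle.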

### Extracting the `销售回答` (Sales Answer) from the Vector Database Results

In [None]:
docs = retriever.get_relevant_documents(query)

In [None]:
docs[0].page_content

In [None]:
docs[0].page_content.split("[销售回答] ")

In [None]:
ans = docs[0].page_content.split("[销售回答] ")[-1]

In [None]:
ans
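
The extraction relies on `str.split` returning the text after the tag as its last element. The short sketch below shows this on a made-up page content, along with the edge case where the tag is missing and `[-1]` silently returns the whole string. `extract_answer` is a hypothetical helper that makes that case explicit.

```python
from typing import Optional

ANSWER_TAG = "[销售回答] "


def extract_answer(page_content: str) -> Optional[str]:
    """Return the text after the answer tag, or None if the tag is absent."""
    _, tag, answer = page_content.partition(ANSWER_TAG)
    return answer if tag else None


# Made-up example in the same format as the generated data
content = "[客户问题] 有试听课吗？\n[销售回答] 有的，可以预约。"

print(content.split(ANSWER_TAG))      # two parts: question text, answer text
print(content.split(ANSWER_TAG)[-1])  # the answer only
print(extract_answer("no tag here"))  # None, because the tag is missing
```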

#### Trying various questions

In [None]:
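from typing import List

# Note: this definition rebinds the name `sales`, which earlier held the raw file text.
def sales(query: str, score_threshold: float = 0.8) -> List[str]:
    """Return the sales answers whose relevance score meets score_threshold."""
    retriever = db.as_retriever(
        search_type="similarity_score_threshold",
        search_kwargs={"score_threshold": score_threshold},
    )
    docs = retriever.get_relevant_documents(query)
    # Keep only the text after the "[销售回答] " tag of each retrieved QA pair
    ans_list = [doc.page_content.split("[销售回答] ")[-1] for doc in docs]

    return ans_list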

In [None]:
query = "我想离学校近点"

print(sales(query))

In [None]:
print(sales(query, 0.75))

In [None]:
query = "价格两千以内"

print(f"score:0.8 ans: {sales(query)}\n")
print(f"score:0.75 ans: {sales(query, 0.75)}\n")
print(f"score:0.5 ans: {sales(query, 0.5)}\n")

#### Falling back to the large language model when the vector database has no suitable answer

In [None]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
qa_chain = RetrievalQA.from_chain_type(llm,
                                       retriever=db.as_retriever(search_type="similarity_score_threshold",
                                                                 search_kwargs={"score_threshold": 0.8}))

In [None]:
qa_chain({"query": query})

In [None]:
qa_chain({"query": "我想离学校近点"})

In [None]:
print(sales("我想离学校近点"))
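
One way to make the fallback explicit is to return the retrieved answers when there are any and to call the model only otherwise. The sketch below shows that control flow with stand-in callables so it runs offline. `answer_with_fallback`, `fake_retrieve`, and `fake_llm` are hypothetical names; in this notebook their roles would roughly correspond to `sales` and `qa_chain`.

```python
from typing import Callable, List


def answer_with_fallback(
    query: str,
    retrieve: Callable[[str], List[str]],
    ask_llm: Callable[[str], str],
) -> List[str]:
    """Prefer retrieved answers; fall back to the LLM when retrieval is empty."""
    answers = retrieve(query)
    if answers:
        return answers
    return [ask_llm(query)]


# Stand-ins for the retriever and the LLM (not real LangChain objects)
def fake_retrieve(query: str) -> List[str]:
    known = {"pricing": ["We offer several packages."]}
    return known.get(query, [])


def fake_llm(query: str) -> str:
    return f"(LLM-generated reply to: {query})"


print(answer_with_fallback("pricing", fake_retrieve, fake_llm))
print(answer_with_fallback("location", fake_retrieve, fake_llm))
```

Keeping the two paths separate makes it easy to log which answers came from the curated QA pairs and which were generated.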

## Loading an Existing FAISS Vector Database

In [None]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS

db = FAISS.load_local("sale", OpenAIEmbeddings())

In [None]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
qa_chain = RetrievalQA.from_chain_type(llm,
                                       retriever=db.as_retriever(search_type="similarity_score_threshold",
                                                                 search_kwargs={"score_threshold": 0.8}))

In [None]:
qa_chain({"query": "我想学编程，你们有么"})

In [None]:
# Print the logs of the internal chain
qa_chain.combine_documents_chain.verbose = True

In [None]:
qa_chain({"query": "我想学编程，你们有么"})

In [None]:
# Also return the documents retrieved from the vector database
qa_chain.return_source_documents = True

In [None]:
result = qa_chain({"query": "我想学编程，你们有么"})

In [None]:
result
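
With `return_source_documents` enabled, the returned dictionary should carry the retrieved documents under `source_documents` next to the answer under `result`. The sketch below shows one way to read such a dictionary. It uses a stand-in `FakeDocument` class and placeholder values instead of real model output, and `summarize_result` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FakeDocument:
    """Stand-in for LangChain's Document so the sketch runs without the library."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def summarize_result(result: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the answer and the retrieved source texts out of a RetrievalQA-style result."""
    sources: List[str] = [
        doc.page_content for doc in result.get("source_documents", [])
    ]
    return {"answer": result["result"], "num_sources": len(sources), "sources": sources}


# Placeholder values, not real model output
mock_result = {
    "query": "我想学编程，你们有么",
    "result": "<model answer>",
    "source_documents": [FakeDocument("<retrieved QA pair>")],
}

summary = summarize_result(mock_result)
print(summary["answer"], summary["num_sources"])
```

An empty `source_documents` list is a useful signal that the answer was produced without any retrieved context.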