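## 💡 What this lesson covers

1. How to use LangChain: a tool framework built on top of large-model capabilities
2. How to build a complex AI application in a few lines of code
3. The process abstractions behind developing workflows for large models

Let's get started!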


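## Before we begin

- LangChain is a development framework for large language models
- LangChain is an exploration and prototype of software engineering for the AGI era
- LangChain is not perfect and is still iterating quickly: it was at V0.0.200 when I wrote these slides and is at V0.0.241 now
- The more important thing to learn from LangChain is its ideas; the concrete interfaces may change soon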


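## Core components of LangChain

1. Model I/O
   - LLMs: large language models
   - Chat Models: usually built on LLMs, but re-wrapped around a conversation structure
   - PromptTemplate: prompt templates
   - OutputParser: parses model output
2. Data connection
   - Document Loaders: loaders for files in various formats
   - Document Transformers: common operations on documents, such as split, filter, translate, and extract metadata
   - Text Embedding Models: vector representations of text, used for retrieval and similar operations (what does that mean? Don't worry, details come later)
   - Vector Stores: storage for vectors (geared toward retrieval)
   - Retrievers: retrieval over vectors
3. Memory
   - Memory: not physical memory here; from a text point of view, think of it as managing the "preceding context", the "history", or the model's "recall"
4. Architecture
   - Chain: implements one function or a sequence of functions combined in order
   - Agent: given user input, automatically plans the execution steps, automatically picks the tool needed at each step, and finally completes the task the user specified
     - Tools: functions that call external capabilities, for example Google search, file I/O, or the Linux shell
     - Toolkits: a set of tools for operating a particular piece of software, for example a database or Gmail
5. Callbacks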


## 1. Model I/O

### 1.1 Model APIs: LLM vs. ChatModel


In [None]:
!pip install langchain==0.0.240rc4


In [37]:
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI

llm = OpenAI()  # defaults to the text-davinci-003 model
llm.predict("你好，欢迎")


'使用我的产品！\n\n你可以在我们的网站上找到有关我们产品的详细信息，包括产品功能、价格、评论等等。我们还提供了专业的客户服务，可以为您提供有关产品的帮助和咨询。我们也提供免费的技术支持服务，如果您在使用我们的产品时遇到任何问题，可以随时联'

In [38]:
chat_model = ChatOpenAI()  # defaults to gpt-3.5-turbo
chat_model.predict("你好，欢迎")


'你好！感谢您的欢迎。有什么我可以帮助您的吗？'

In [39]:
from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)

messages = [
    SystemMessage(content="你是AGIClass的课程助理。"),
    HumanMessage(content="我来上课了")
]
chat_model(messages)


AIMessage(content='欢迎来到AGIClass！请问你是第一次来上课吗？', additional_kwargs={}, example=False)

### 1.2 Model inputs and outputs

<img src="model_io.jpg" style="margin-left: 0px" width=500px>

### PromptTemplate


In [47]:
from langchain.prompts import PromptTemplate

template = PromptTemplate.from_template("给我讲个关于{subject}的笑话")
print(template.input_variables)
print(template.format(subject='小明'))


['subject']
给我讲个关于小明的笑话


<div class="alert alert-warning">
<b>Think about it:</b> if you treat a prompt template as a function with parameters, the material below may be easier to follow
</div>
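
To make the "template as a function" view concrete, here is a minimal plain-Python sketch. It does not use LangChain; `joke_prompt` and `make_template` are hypothetical names chosen for illustration, and the template string is the one from the cell above.

```python
def joke_prompt(subject: str) -> str:
    """A prompt template written as an ordinary function of its variable."""
    return f"给我讲个关于{subject}的笑话"


def make_template(template: str):
    """Hypothetical helper: turn a template string into a function of its slots."""
    def fill(**kwargs: str) -> str:
        # str.format substitutes every {name} slot with the matching keyword.
        return template.format(**kwargs)
    return fill


tell_joke = make_template("给我讲个关于{subject}的笑话")
print(joke_prompt("小明"))
print(tell_joke(subject="小明"))
```

Both calls produce the same string: the template's input variables play the role of function parameters, and formatting plays the role of the call.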


<div class="alert alert-info">
<b>Homework:</b> learn how to use ChatPromptTemplate on your own
</div>


### OutputParser


In [71]:
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI

from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, validator
from typing import List, Dict
import json


def chinese_friendly(string):
    lines = string.split('\n')
    for i, line in enumerate(lines):
        if line.startswith('{') and line.endswith('}'):
            try:
                lines[i] = json.dumps(json.loads(line), ensure_ascii=False)
            except json.JSONDecodeError:
                pass
    return '\n'.join(lines)


model_name = 'gpt-4'
temperature = 0.0
model = OpenAI(model_name=model_name, temperature=temperature)

# Define your output format


class Command(BaseModel):
    command: str = Field(description="linux shell命令名")
    arguments: Dict[str, str] = Field(description="命令的参数 (name:value)")

    # You can add custom validation logic
    @validator('command')
    def no_space(cls, field):
        if " " in field or "\t" in field or "\n" in field:
            raise ValueError("命令名中不能包含空格或回车!")
        return field


parser = PydanticOutputParser(pydantic_object=Command)

prompt = PromptTemplate(
    template="将用户的指令转换成linux命令.\n{format_instructions}\n{query}",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

print(chinese_friendly(parser.get_format_instructions()))


query = "将系统日期设为2023-04-01"
model_input = prompt.format_prompt(query=query)

#print(parser.get_format_instructions())

print("====Prompt=====")
print(chinese_friendly(model_input.to_string()))

output = model(model_input.to_string())
print("====Output=====")
print(output)
print("====Parsed=====")
cmd = parser.parse(output)
print(cmd)


The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"command": {"title": "Command", "description": "linux shell命令名", "type": "string"}, "arguments": {"title": "Arguments", "description": "命令的参数 (name:value)", "type": "object", "additionalProperties": {"type": "string"}}}, "required": ["command", "arguments"]}
```
====Prompt=====
将用户的指令转换成linux命令.
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "st
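
Conceptually, an output parser does two things: it tells the model which format to use, and it turns the reply back into a typed object, rejecting replies that do not fit. The sketch below imitates the second half with the standard library only. It is not how `PydanticOutputParser` is implemented; the `parse_command` helper and the sample reply are assumptions made up for illustration.

```python
import json
from dataclasses import dataclass
from typing import Dict


@dataclass
class Command:
    command: str
    arguments: Dict[str, str]


def parse_command(raw: str) -> Command:
    """Parse a model reply into a Command; raise ValueError if it does not fit."""
    # Models often wrap JSON in extra text, so keep only the outermost {...} span.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(raw[start:end + 1])  # JSONDecodeError is a ValueError
    if not isinstance(data, dict) or "command" not in data:
        raise ValueError("missing required field: command")
    command = str(data["command"])
    # Same rule as the validator above: no whitespace in the command name.
    if any(ch.isspace() for ch in command):
        raise ValueError("command name must not contain whitespace")
    arguments = data.get("arguments", {})
    if not isinstance(arguments, dict):
        raise ValueError("arguments must be a JSON object")
    return Command(command, {str(k): str(v) for k, v in arguments.items()})


# Hypothetical model reply, invented for this example.
reply = 'Sure: {"command": "date", "arguments": {"-s": "2023-04-01"}}'
cmd = parse_command(reply)
print(cmd)
```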

## 2. Data connection

<img src="data_connection.jpg" style="margin-left: 0px" width=500px>

### 2.1 Document loaders: Document Loaders


In [None]:
!pip install pypdf


In [2]:
from langchain.document_loaders import PyPDFLoader

loader = PyPDFLoader("WhatisChatGPT.pdf")
pages = loader.load_and_split()

print(pages[0].page_content)


What is ChatGPT ? 
Md. Sakibul Islam Sakib  
 
ChatGPT is a conversational language model developed by OpenAI. It is part of the GPT (Generative 
Pretrained  Transformer) family of models, which are based on the Transformer architecture and trained on 
vast amounts of text data to generate human -like text.  
ChatGPT is designed to generate text in response to an input prompt, making it well suited for 
conversatio nal applications such as chatbots, customer service agents, and virtual assistants. The model has 
been trained on a diverse range of conversational data, including websites, books, and social media, 
allowing it to generate text that is coherent, contextual ly relevant, and often similar to text produced by 
humans.  
To use ChatGPT, a user provides an input prompt, such as a question or statement, which is then fed into 
the model. The model then generates a response based on its understanding of the input and i ts training 
data. The model can generate multiple responses 

### 2.2 Document transformers

Example 1: TextSplitter


In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=50,
    chunk_overlap=10,  # Think about it: why use overlap?
    length_function=len,
    add_start_index=True,
)

paragraphs = text_splitter.create_documents([pages[0].page_content])
for para in paragraphs[:5]:
    print(para.page_content)
    print('-------')


What is ChatGPT ? 
Md. Sakibul Islam Sakib
-------
ChatGPT is a conversational language model
-------
model developed by OpenAI. It is part of the GPT
-------
the GPT (Generative
-------
Pretrained  Transformer) family of models, which
-------
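
One way to see why overlap helps: with hard cuts, a word or phrase that straddles a chunk boundary never appears whole in any chunk, so it cannot be matched later. The sketch below uses a simplified fixed-width splitter (an assumption for illustration; `RecursiveCharacterTextSplitter` splits on separators rather than at fixed offsets) to show a word lost without overlap and recovered with it.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into fixed-size windows; neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


text = "ChatGPT is a conversational language model developed by OpenAI."
no_overlap = split_with_overlap(text, chunk_size=20, chunk_overlap=0)
overlapped = split_with_overlap(text, chunk_size=20, chunk_overlap=10)

word = "model"
# Without overlap the word is cut in two at a chunk boundary.
print(any(word in chunk for chunk in no_overlap))
# With overlap at least one chunk contains the whole word.
print(any(word in chunk for chunk in overlapped))
```

Overlap only protects spans shorter than the overlap itself, which is one reason the overlap size is tuned together with the chunk size.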


Example 2: Doctran


In [52]:
!pip install doctran


In [9]:
from langchain.document_transformers import DoctranTextTranslator

translator = DoctranTextTranslator(
    openai_api_model="gpt-3.5-turbo", language="chinese")

translated_document = await translator.atransform_documents(pages)

print(translated_document[0].page_content)


ChatGPT是由OpenAI开发的一种对话语言模型。它是GPT（生成预训练变压器）模型系列的一部分，基于变压器架构，并在大量文本数据上进行训练，以生成类似人类的文本。ChatGPT旨在根据输入提示生成文本，非常适用于聊天机器人、客服代理和虚拟助手等对话应用。该模型经过多样化的对话数据训练，包括网站、书籍和社交媒体，使其能够生成连贯、上下文相关且通常类似于人类生成的文本。用户提供一个输入提示，例如问题或陈述，然后将其输入模型。模型根据对输入的理解和训练数据生成响应。模型可以为单个输入生成多个响应，响应的长度、风格和内容可以根据输入的上下文而变化。ChatGPT的一个关键优势是其能够生成与输入提示相关的文本。例如，如果用户询问天气问题，ChatGPT可以生成包含与天气相关的信息（如温度、降水和风况）的响应。如果用户随后提出跟进问题，ChatGPT可以使用先前的对话作为上下文，生成与先前对话相关的响应。ChatGPT还可以生成连贯流畅的文本，使其能够参与更长的对话。该模型经过大量对话数据的训练，已经学会了如何生成语法正确、使用适当的词汇和语气，并与输入提示和整体对话一致的文本。除了生成文本，ChatGPT还可以用于其他自然语言处理任务，如问答、摘要和文本分类。该模型经过特定任务的微调，使其在这些任务上表现良好，同时仍保留生成类似人类文本的能力。ChatGPT是使用大型语言模型进行对话应用的一个重要趋势的一部分。这些模型有潜力彻底改变我们与技术互动的方式，使我们能够以更自然、更直观的方式与设备和服务进行交互。然而，需要注意的是，这些模型并不完美，有时可能生成与主题无关、冒犯性或事实错误的文本。总之，ChatGPT是一个强大的对话语言模型，能够根据输入提示生成类似人类的文本。它具有广泛的潜在应用，从聊天机器人和虚拟助手到问答和文本分类等自然语言处理任务。然而，需要仔细考虑这些模型的限制和潜在风险，并负责任地使用它们。


### 2.3 Document vectorization: Text Embeddings

<div class="alert alert-success">
<b>Embedding:</b> a way of representing a target object (a word, a sentence, a document) as a vector
</div>


In [15]:
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
text = "这是一个测试"
document = "测试文档"
query_vec = embeddings.embed_query(text)
doc_vec = embeddings.embed_documents([document])

print(len(query_vec))
print(query_vec[:10])  # print only the first 10 dimensions for readability
print(len(doc_vec[0]))
print(doc_vec[0][:10])  # print only the first 10 dimensions for readability


1536
[-0.011436891742050648, -0.012987656518816948, 0.00902028288692236, -0.01197319757193327, -0.02477347105741501, 0.01448672916740179, -0.021891633048653603, -0.005188601091504097, -0.0012567657977342606, -0.03370329365134239]
1536
[-0.0026156024105142475, 0.0007669371424757893, -0.0046187359068776655, -0.005695960389375307, -0.010613725343519495, 0.026026324865937287, -0.014497498739680675, -0.001994126675894642, -0.014295743539741444, -0.018316421352086065]
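
Embedding vectors become useful once they are compared. A common choice is cosine similarity, sketched below in plain Python. The three-dimensional vectors are toy values made up for illustration, not real embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings of a query and two documents.
query = [1.0, 0.0, 1.0]
close_doc = [0.9, 0.1, 0.8]
far_doc = [0.0, 1.0, 0.0]

print(cosine_similarity(query, close_doc))
print(cosine_similarity(query, far_doc))
```

Retrieval then amounts to ranking documents by this score against the query vector.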


### 2.4 Vector storage (and indexing): Vectorstores

<img src="vector_stores.jpg" style="margin-left: 0px" width=500px>


In [10]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS

embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(paragraphs, embeddings)

query = "What can ChatGPT do?"
docs = db.similarity_search(query)
print(docs[0].page_content)


One of the key strengths of ChatGPT is its ability to generate text that is cont extually relevant to the input


<div class="alert alert-success">
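<b>VectorDB suggestions:</b>
    <li>Without extreme performance requirements, FAISS is a common choice</li>
    <li>Pinecone (paid cloud service) is easy to use</li>
    <li>With extreme performance requirements, consider having a specialist tune ElasticSearch (or use APU acceleration)</li>
    <li>For some comparative analysis, see: https://www.modb.pro/db/516016</li>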
</div>


### 2.5 Vector retrieval: Retrievers


In [17]:
retriever = db.as_retriever()
docs = retriever.get_relevant_documents("What can ChatGPT do?")

print(docs[0].page_content)


One of the key strengths of ChatGPT is its ability to generate text that is cont extually relevant to the input


What happens if we don't use vector retrieval?


In [19]:
from langchain.retrievers import TFIDFRetriever  # the most traditional keyword-weighted retrieval

retriever = TFIDFRetriever.from_documents(paragraphs)
docs = retriever.get_relevant_documents("What can ChatGPT do?")

print(docs[0].page_content)


What is ChatGPT ? 
Md. Sakibul Islam Sakib  
 
ChatGPT is a conversational language model developed by OpenAI. It is part of the GPT (Generative


<div class="alert alert-warning">
<b>Think about it:</b> why does vector retrieval work better?
</div>
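
A hint for the question above: keyword-weighted retrieval can only reward literal word overlap. The sketch below is a simplified TF-IDF scorer written for illustration (it is not the algorithm inside `TFIDFRetriever`, and the two documents are made up). The document that merely repeats the query's words outranks the one that actually describes what the model can do.

```python
import math
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, strip basic punctuation."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return [w for w in words if w]


def tfidf_scores(query: str, docs: List[str]) -> List[float]:
    """Score each document by the TF-IDF weight of the query terms it contains."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = set(tokenize(query))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term in tf:
                idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0  # smoothed IDF
                score += (tf[term] / len(tokens)) * idf
        scores.append(score)
    return scores


docs = [
    "What is ChatGPT?",
    "The model can write essays, answer questions and translate text.",
]
scores = tfidf_scores("What can ChatGPT do?", docs)
print(scores)
```

An embedding-based retriever compares meanings rather than surface tokens, so it is not tied to the exact words of the query.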


## 3. Memory

### 3.1 Conversation context: ConversationBufferMemory


In [72]:
from langchain.memory import ConversationBufferMemory, ConversationBufferWindowMemory

history = ConversationBufferMemory()
history.save_context({"input": "你好啊"}, {"output": "你也好啊"})

print(history.load_memory_variables({}))

history.save_context({"input": "你再好啊"}, {"output": "你又好啊"})

print(history.load_memory_variables({}))

{'history': 'Human: 你好啊\nAI: 你也好啊'}
{'history': 'Human: 你好啊\nAI: 你也好啊\nHuman: 你再好啊\nAI: 你又好啊'}


Keep only a window of the context

In [73]:
from langchain.memory import ConversationBufferWindowMemory

window = ConversationBufferWindowMemory(k=2)
window.save_context({"input": "第一轮问"}, {"output": "第一轮答"})
window.save_context({"input": "第二轮问"}, {"output": "第二轮答"})
window.save_context({"input": "第三轮问"}, {"output": "第三轮答"})
print(window.load_memory_variables({}))

{'history': 'Human: 第二轮问\nAI: 第二轮答\nHuman: 第三轮问\nAI: 第三轮答'}
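
The windowing behaviour is easy to reproduce with a bounded queue. The sketch below is a plain-Python stand-in (the `WindowMemory` class is hypothetical, not LangChain's implementation) that keeps the last `k` turns and renders them in the same `Human:`/`AI:` layout as the output above.

```python
from collections import deque
from typing import Deque, Tuple


class WindowMemory:
    """Hypothetical stand-in: remember only the most recent k (human, ai) turns."""

    def __init__(self, k: int) -> None:
        # A deque with maxlen silently drops the oldest turn when full.
        self.turns: Deque[Tuple[str, str]] = deque(maxlen=k)

    def save_context(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def load(self) -> str:
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


window = WindowMemory(k=2)
window.save_context("第一轮问", "第一轮答")
window.save_context("第二轮问", "第二轮答")
window.save_context("第三轮问", "第三轮答")
print(window.load())
```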


### 3.2 Automatically summarizing the history: ConversationSummaryMemory


In [76]:
from langchain.memory import ConversationSummaryMemory
from langchain.llms import OpenAI

memory = ConversationSummaryMemory(
    llm=OpenAI(temperature=0),
    # buffer="The conversation is between a customer and a sales."
    buffer="以中文表示"
)
memory.save_context(
    {"input": "你好"}, {"output": "你好，我是你的AI助手。我能为你回答有关AGIClass的各种问题。"})

print(memory.load_memory_variables({}))


{'history': '\n人类问AI助手你好，AI助手回答你好，表示自己是人类的AI助手，可以回答有关AGIClass的各种问题。'}


## 4. Chain architecture: Chain


「Chains allow us to combine multiple components together to create a single, coherent application. For example, we can create a chain that takes user input, formats it with a PromptTemplate, and then passes the formatted response to an LLM. We can build more complex chains by combining multiple chains together, or by combining chains with other components.」

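- A Chain encapsulates a predefined workflow
- This is analogous to a function encapsulating a procedure
- It follows the Builder Pattern, decoupling the various complex components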
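
The function analogy can be sketched in a few lines of plain Python: a chain is a list of steps, each consuming the previous step's output. `make_chain`, `format_prompt`, and `fake_llm` are hypothetical names used for illustration, and `fake_llm` only echoes its input instead of calling a model.

```python
from typing import Callable, List

Step = Callable[[str], str]


def make_chain(steps: List[Step]) -> Step:
    """Compose steps into one callable that runs them in order."""
    def run(value: str) -> str:
        for step in steps:
            value = step(value)
        return value
    return run


def format_prompt(product: str) -> str:
    # Plays the role of a prompt template.
    return f"为生产{product}的公司取一个亮眼中文名字："


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call; it just echoes the prompt.
    return f"<completion for: {prompt}>"


chain = make_chain([format_prompt, fake_llm])
print(chain("电脑"))
```

The caller sees a single function, while the steps stay independent and replaceable, which is the decoupling the bullets above describe.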

### 4.1 The simplest Chain


In [78]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

llm = ChatOpenAI(model='gpt-3.5-turbo', temperature=0.9)
prompt = PromptTemplate(
    input_variables=["product"],
    template="为生产{product}的公司取一个亮眼中文名字：",
)

chain = LLMChain(llm=llm, prompt=prompt)

print(chain.run("电脑"))


智迅电子


### 4.2 Adding Memory to a Chain


In [80]:
from langchain.memory import ConversationBufferMemory, ConversationSummaryMemory
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

template = """你是聊天机器人小瓜，你可以和人类聊天。

{memory}
Human: {human_input}
AI:"""

prompt = PromptTemplate(
    input_variables=["memory", "human_input"], template=template
)

#memory = ConversationBufferMemory(memory_key="memory")

memory = ConversationSummaryMemory(llm=OpenAI(
    temperature=0), buffer="以中文表示", memory_key="memory")

llm_chain = LLMChain(
    llm=OpenAI(),
    prompt=prompt,
    verbose=True,
    memory=memory,
)

print(llm_chain.run("你是谁？"))
print("---------------")
output = llm_chain.run("我刚才问了你什么，你是怎么回答的？")
print(output)




> Entering new LLMChain chain...
Prompt after formatting:
你是聊天机器人小瓜，你可以和人类聊天。

以中文表示
Human: 你是谁？
AI:

> Finished chain.
 你好，我是聊天机器人小瓜，很高兴认识你。
---------------


> Entering new LLMChain chain...
Prompt after formatting:
你是聊天机器人小瓜，你可以和人类聊天。


以中文表示
人类问AI谁是，AI回答自己是聊天机器人小瓜，表示很高兴认识人类。
Human: 我刚才问了你什么，你是怎么回答的？
AI:

> Finished chain.
 您刚才问我是谁，我回答说我是聊天机器人小瓜，表示很高兴认识您。


### 4.3 A slightly more complex Chain


<img src="stuffdocchain.png" style="margin-left: 0px" width=500px>

In [None]:
!pip install unstructured faiss-cpu


In [81]:
from langchain.document_loaders import UnstructuredMarkdownLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA

loader = UnstructuredMarkdownLoader("ChatALL.md")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=50)
texts = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(texts, embeddings)
qa_chain = RetrievalQA.from_chain_type(llm=OpenAI(
    temperature=0), chain_type="stuff", retriever=db.as_retriever())

query = "ChatALL在哪下载"
qa_chain.run(query)


' ChatALL可以从https://github.com/sunner/ChatALL/releases 下载。'

In [36]:
print('================qa_chain===============')
print(qa_chain)
print('======combine_documents_chain==========')
print(qa_chain.combine_documents_chain.document_prompt)
print('==============llm_chain================')
print(qa_chain.combine_documents_chain.llm_chain.prompt.template)


memory=None callbacks=None callback_manager=None verbose=False tags=None metadata=None combine_documents_chain=StuffDocumentsChain(memory=None, callbacks=None, callback_manager=None, verbose=False, tags=None, metadata=None, input_key='input_documents', output_key='output_text', llm_chain=LLMChain(memory=None, callbacks=None, callback_manager=None, verbose=False, tags=None, metadata=None, prompt=PromptTemplate(input_variables=['context', 'question'], output_parser=None, partial_variables={}, template="Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\n{context}\n\nQuestion: {question}\nHelpful Answer:", template_format='f-string', validate_template=True), llm=OpenAI(cache=None, verbose=False, callbacks=None, callback_manager=None, tags=None, metadata=None, client=<class 'openai.api_resources.completion.Completion'>, model_name='text-davinci-003', temperature=0.0, max_tokens


### 4.4 Common basic Chain types: Sequential


In [82]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.chains import SimpleSequentialChain

llm = ChatOpenAI(model='gpt-3.5-turbo', temperature=0.9)
name_prompt = PromptTemplate(
    input_variables=["product"],
    template="为生产{product}的公司取一个亮眼中文名字：",
)

name_chain = LLMChain(llm=llm, prompt=name_prompt)

slogan_prompt = PromptTemplate(
    input_variables=["name"],
    template="为名为{name}的公司起一个Slogan，输出格式 name:slogan",
)

slogan_chain = LLMChain(llm=llm, prompt=slogan_prompt)

overall_chain = SimpleSequentialChain(
    chains=[name_chain, slogan_chain], verbose=True)

print(overall_chain.run("雨伞"))




> Entering new SimpleSequentialChain chain...
天降伞业
天降伞业:安全降落，信天由命

> Finished chain.
天降伞业:安全降落，信天由命


### 4.5 Common basic Chain types: Transform


In [83]:
import re
from langchain.chains import TransformChain, LLMChain, SimpleSequentialChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

# Example: mask private user data before sending the text to OpenAI


def anonymize(inputs: dict) -> dict:
    text = inputs["text"]
    # Matches 11-digit mainland China mobile numbers
    t = re.compile(
        r'1(3\d|4[4-9]|5[0-35-9]|6[67]|7[013-8]|8[0-9]|9[0-9])\d{8}')
    # Replace every match in a single pass
    text = t.sub('***********', text)
    return {"output_text": text}


transform_chain = TransformChain(
    input_variables=["text"], output_variables=["output_text"], transform=anonymize
)

llm = ChatOpenAI(model='gpt-3.5-turbo', temperature=0.9)
prompt = PromptTemplate(
    input_variables=["input"],
    template="根据下述句子，提取候选人的职业:\n{input}\n输出JSON, 以job为key",
)

task_chain = LLMChain(llm=llm, prompt=prompt)

overall_chain = SimpleSequentialChain(
    chains=[transform_chain, task_chain], verbose=True)

print(overall_chain.run("我是警察，有事随时跟我联系，打我手机13912345678"))




> Entering new SimpleSequentialChain chain...
我是警察，有事随时跟我联系，打我手机***********
{"job": "警察"}

> Finished chain.
{"job": "警察"}


### 4.6 Common basic Chain types: Router


In [84]:
from langchain.chains.router.multi_prompt_prompt import MULTI_PROMPT_ROUTER_TEMPLATE
from langchain.chains.router.llm_router import LLMRouterChain, RouterOutputParser
from langchain.prompts import PromptTemplate
from langchain.chains.llm import LLMChain
from langchain.chains import ConversationChain
from langchain.llms import OpenAI
from langchain.chains.router import MultiPromptChain
import warnings
warnings.filterwarnings("ignore")


windows_template = """
你只会写DOS或Windows Shell脚本。你不会写任何其他语言的程序。你也不会写Linux脚本。

用户问题:
{input}
"""

linux_template = """
你只会写Linux Shell脚本。你不会写任何其他语言的程序。你也不会写Windows脚本。

用户问题:
{input}
"""

prompt_infos = [
    {
        "name": "WindowsExpert",
        "description": "擅长回答Windows Shell相关问题",
        "prompt_template": windows_template,
    },
    {
        "name": "LinuxExpert",
        "description": "擅长回答Linux Shell相关问题",
        "prompt_template": linux_template,
    },
]

llm = OpenAI()

destination_chains = {}
for p_info in prompt_infos:
    name = p_info["name"]
    prompt_template = p_info["prompt_template"]
    prompt = PromptTemplate(template=prompt_template,
                            input_variables=["input"])
    chain = LLMChain(llm=llm, prompt=prompt)
    destination_chains[name] = chain
default_chain = ConversationChain(llm=llm, output_key="text")

destinations = [f"{p['name']}: {p['description']}" for p in prompt_infos]
destinations_str = "\n".join(destinations)
router_template = MULTI_PROMPT_ROUTER_TEMPLATE.format(
    destinations=destinations_str)
router_prompt = PromptTemplate(
    template=router_template,
    input_variables=["input"],
    output_parser=RouterOutputParser(),
)
router_chain = LLMRouterChain.from_llm(llm, router_prompt)

chain = MultiPromptChain(
    router_chain=router_chain,
    destination_chains=destination_chains,
    default_chain=default_chain,
    verbose=True,
)

print(chain.run("帮我写个脚本，让Windows系统每天0点自动校对时间"))

print(chain.run("帮我写个cron脚本，让系统每天0点自动重启"))




> Entering new MultiPromptChain chain...
WindowsExpert: {'input': '帮我写个脚本，让Windows系统每天0点自动校准时间'}
> Finished chain.

答案：
首先，您需要在Windows系统中打开控制台，在控制台输入以下命令：

schtasks /create /tn "Auto Sync Time" /tr schtasks /run /sc daily /st 00:00:00

接着，您可以运行这个任务，使Windows系统每天0点自动校准时间：

schtasks /run /tn "Auto Sync Time"


> Entering new MultiPromptChain chain...
LinuxExpert: {'input': '帮我写个cron脚本，让系统每天0点自动重启'}
> Finished chain.

答案:
这是一个Linux Shell脚本，可以在crontab中使用：

0 0 * * * /sbin/reboot


<div class="alert alert-warning">
<b>Think about it:</b> is Router a necessary basic type?
</div>
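
To reason about that question, it helps to strip a router down to its essence: choose a destination name, then dispatch to it or fall back to a default. The sketch below is plain Python; `keyword_router` is a hypothetical stand-in for the LLM-based routing decision, and the three handlers only tag their input.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


def windows_expert(question: str) -> str:
    return f"[WindowsExpert] {question}"


def linux_expert(question: str) -> str:
    return f"[LinuxExpert] {question}"


def default_chain(question: str) -> str:
    return f"[Default] {question}"


def keyword_router(question: str) -> str:
    """Stand-in for the LLM router: return a destination name or 'DEFAULT'."""
    lowered = question.lower()
    if "windows" in lowered or "dos" in lowered:
        return "WindowsExpert"
    if "linux" in lowered or "cron" in lowered:
        return "LinuxExpert"
    return "DEFAULT"


def route(question: str, destinations: Dict[str, Handler], default: Handler) -> str:
    # Unknown destination names fall through to the default handler.
    handler = destinations.get(keyword_router(question), default)
    return handler(question)


destinations = {"WindowsExpert": windows_expert, "LinuxExpert": linux_expert}
print(route("帮我写个cron脚本，让系统每天0点自动重启", destinations, default_chain))
```

Seen this way, a router is a conditional branch whose condition is evaluated by a model, which is a useful starting point for deciding whether it deserves to be a basic type.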


### 4.7 Wrapping API calls: APIChain

In [85]:
from langchain.chains import APIChain
from langchain.chains.api import open_meteo_docs
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)
chain_new = APIChain.from_llm_and_api_docs(llm, open_meteo_docs.OPEN_METEO_DOCS, verbose=True)
chain_new.run('北京今天气温，摄氏度')



> Entering new APIChain chain...
https://api.open-meteo.com/v1/forecast?latitude=39.9042&longitude=116.4074&hourly=temperature_2m&temperature_unit=celsius
{"latitude":39.875,"longitude":116.375,"generationtime_ms":0.11909008026123047,"utc_offset_seconds":0,"timezone":"GMT","timezone_abbreviation":"GMT","elevation":47.0,"hourly_units":{"time":"iso8601","temperature_2m":"°C"},"hourly":{"time":["2023-07-27T00:00","2023-07-27T01:00","2023-07-27T02:00","2023-07-27T03:00","2023-07-27T04:00","2023-07-27T05:00","2023-07-27T06:00","2023-07-27T07:00","2023-07-27T08:00","2023-07-27T09:00","2023-07-27T10:00","2023-07-27T11:00","2023-07-27T12:00","2023-07-27T13:00","2023-07-27T14:00","2023-07-27T15:00","2023-07-27T16:00","2023-07-27T17:00","2023-07-27T18:00","2023-07-27T19:00","2023-07-27T20:00","2023-07-27T21:00","2023-07-27T22:00","2023-07-27T23:00","2023-07-28T00:00","2023-07-28T01:00","2023-07-28T02:00","2023-07-28T03:00","2023-07-28T04:00","2023-07-28T05

' The temperature in Beijing today is 27.6°C.'

### 4.8 Getting Pydantic output via OpenAI Function Calling

In [86]:
from pydantic import BaseModel, Field
from typing import Optional
from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.schema import HumanMessage, SystemMessage

from langchain.chains.openai_functions import (
    create_openai_fn_chain,
    create_structured_output_chain,
)
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

class Contact(BaseModel):
    """Extracting information about a contact person."""

    name: str = Field(..., description="The person's name")
    address: str = Field(..., description="The person's address")
    tel: str = Field(None, description="The person's telephone/mobile number")

prompt_msgs = [
    SystemMessage(
        content="You are a world class algorithm for extracting information in structured formats."
    ),
    HumanMessage(
        content="Use the given format to extract information from the following input:"
    ),
    HumanMessagePromptTemplate.from_template("{input}"),
    HumanMessage(content="Tips: Make sure to answer in the correct format"),
]
prompt = ChatPromptTemplate(messages=prompt_msgs)
llm = ChatOpenAI(model="gpt-4-0613", temperature=0)

chain = create_structured_output_chain(Contact, llm, prompt, verbose=True)

chain.run("寄给亮马桥外交办公大楼的王卓然，13012345678")



> Entering new LLMChain chain...
Prompt after formatting:
System: You are a world class algorithm for extracting information in structured formats.
Human: Use the given format to extract information from the following input:
Human: 寄给亮马桥外交办公大楼的王卓然，13012345678
Human: Tips: Make sure to answer in the correct format

> Finished chain.


Contact(name='王卓然', address='亮马桥外交办公大楼', tel='13012345678')

In [87]:
from langchain.chains import TransformChain, LLMChain, SimpleSequentialChain
from typing import Dict

def process(inputs: Dict[str, Contact]) -> Dict[str, str]:
    person = inputs["contact"]
    return {"text":f"BEGIN:VCARD\nVERSION:2.1\nN:{person.name}\nADR:{person.address}\nTEL:{person.tel}\nEND:VCARD"}


transform_chain = TransformChain(
    input_variables=["contact"], output_variables=["text"], transform=process
)

overall_chain = SimpleSequentialChain(
    chains=[chain, transform_chain], verbose=True)

print(overall_chain.run("寄给亮马桥外交办公大楼的王卓然，13012345678"))



> Entering new SimpleSequentialChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
System: You are a world class algorithm for extracting information in structured formats.
Human: Use the given format to extract information from the following input:
Human: 寄给亮马桥外交办公大楼的王卓然，13012345678
Human: Tips: Make sure to answer in the correct format

> Finished chain.
name='王卓然' address='亮马桥外交办公大楼' tel='13012345678'
BEGIN:VCARD
VERSION:2.1
N:王卓然
ADR:亮马桥外交办公大楼
TEL:13012345678
END:VCARD

> Finished chain.
BEGIN:VCARD
VERSION:2.1
N:王卓然
ADR:亮马桥外交办公大楼
TEL:13012345678
END:VCARD


### 4.9 Document-based Chains

<img src="stuff.jpg" style="margin-left: 0px" width=500px>
<img src="refine.jpg" style="margin-left: 0px" width=500px>
<img src="map_reduce.jpg" style="margin-left: 0px" width=500px>
<img src="map_rerank.jpg" style="margin-left: 0px" width=500px>
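
The four diagrams differ in how documents are fed to the model. As a rough sketch, the code below contrasts two of them in plain Python: "stuff" concatenates all documents into one prompt, while "map_rerank" asks per document and keeps the highest-scoring answer. Both model functions are stand-ins invented for illustration (`fake_scored_llm` scores by word overlap instead of calling an LLM); "refine" and "map_reduce" are not shown.

```python
from typing import Callable, List, Tuple


def stuff(docs: List[str], question: str, llm: Callable[[str], str]) -> str:
    """'stuff': put every document into a single prompt and call the model once."""
    context = "\n\n".join(docs)
    return llm(f"{context}\n\nQuestion: {question}")


def map_rerank(
    docs: List[str],
    question: str,
    scored_llm: Callable[[str, str], Tuple[str, int]],
) -> str:
    """'map_rerank': answer per document, then keep the best-scored answer."""
    candidates = [scored_llm(doc, question) for doc in docs]
    best_answer, _ = max(candidates, key=lambda pair: pair[1])
    return best_answer


def fake_scored_llm(doc: str, question: str) -> Tuple[str, int]:
    # Stand-in for a model: score is the share of question words found in the doc.
    q_words = set(question.lower().rstrip("?").split())
    overlap = len(q_words & set(doc.lower().split()))
    return doc, round(100 * overlap / len(q_words))


docs = ["Apples are red", "Bananas are yellow"]
question = "what color are apples?"
print(map_rerank(docs, question, fake_scored_llm))
print(stuff(docs, question, llm=lambda prompt: prompt))
```

The trade-off is visible even in the sketch: "stuff" needs one call but the prompt grows with the number of documents, while "map_rerank" needs one call per document but each prompt stays small.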


In [91]:
from langchain.callbacks import StdOutCallbackHandler
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.chains.base import Chain
from langchain.document_loaders import PyPDFLoader

def set_verbose_recusively(chain):
    chain.verbose = True
    for attr in dir(chain):
        if attr.endswith('_chain') and isinstance(getattr(chain,attr),Chain):
            subchain=getattr(chain,attr)
            set_verbose_recusively(subchain)

loader = PyPDFLoader("SIGDIAL2023.pdf")
documents = loader.load_and_split()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    length_function=len,
    add_start_index=True,
)

paragraphs = text_splitter.create_documents(
    [d.page_content for d in documents])
# print(paragraphs)
embeddings = OpenAIEmbeddings(model='text-embedding-ada-002')
db = FAISS.from_documents(paragraphs, embeddings)
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(temperature=0),
    chain_type="map_rerank",
    retriever=db.as_retriever(),
    verbose=True
)
set_verbose_recusively(qa_chain)

query = "When is the regular submission deadline? When is the ARR submission deadline?"
qa_chain.run(query)




> Entering new RetrievalQA chain...


> Entering new MapRerankDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

In addition to giving an answer, also return a score of how fully it answered the user's question. This should be in the following format:

Question: [question here]
Helpful Answer: [answer here]
Score: [score between 0 and 100]

How to determine the score:
- Higher is a better answer
- Better responds fully to the asked question, with sufficient level of detail
- If you do not know the answer based on the context, that should be a score of 0
- Don't be overconfident!

Example #1

Context:
---------
Apples are red
---------
Question: what color are apples?
Helpful Answer: red
Score: 100

Example #2

Context:
---------
it was night and the witn

'The regular submission deadline is April 15, 2023. The ARR submission deadline is also April 15, 2023.'

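## 5. Agent architecture: Agent


### 5.1 What is an agent

An agent uses a large language model as a reasoning engine. Given a task, the agent automatically generates the steps needed to complete it and carries out the corresponding actions (for example, selecting and calling tools) until the task is done.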
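
The loop described above can be sketched without any framework: ask a policy for the next action, run the chosen tool, record the observation, and repeat until the policy says it is finished. Everything here is hypothetical and chosen for illustration, including `run_agent`, the `scripted_policy` that stands in for the LLM, and the two toy tools.

```python
from typing import Callable, Dict, List, Tuple

ToolFn = Callable[[str], str]
History = List[Tuple[str, str, str]]  # (tool name, tool input, observation)
Policy = Callable[[str, History], Tuple[str, str]]


def run_agent(task: str, tools: Dict[str, ToolFn], policy: Policy, max_steps: int = 5) -> str:
    """Repeat decide -> act -> observe until the policy returns a final answer."""
    history: History = []
    for _ in range(max_steps):
        action, action_input = policy(task, history)
        if action == "final":
            return action_input
        observation = tools[action](action_input)
        history.append((action, action_input, observation))
    raise RuntimeError("agent did not finish within max_steps")


def scripted_policy(task: str, history: History) -> Tuple[str, str]:
    # Stand-in for the LLM: a fixed two-step plan, then finish.
    if not history:
        return "lookup", task
    if len(history) == 1:
        return "shout", history[-1][2]
    return "final", history[-1][2]


FACTS = {"capital of France": "Paris"}
tools: Dict[str, ToolFn] = {
    "lookup": lambda key: FACTS.get(key, "unknown"),
    "shout": str.upper,
}
answer = run_agent("capital of France", tools, scripted_policy)
print(answer)
```

In a real agent the policy is a model call whose prompt contains the task, the tool descriptions, and the history so far; `max_steps` guards against loops that never terminate.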


### 5.2 First, define some tools: Tools

- A tool can be a function or a third-party API
- The run() of a Chain or an Agent can also be used as a Tool
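
What makes a function usable as a tool is mostly metadata: a name and a description that the model can read when deciding what to call. The sketch below shows that idea with the standard library only. `SimpleTool` and `describe` are hypothetical names, and this `weekday` accepts ISO dates only, which is an assumption made to avoid third-party parsers.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Callable, List


@dataclass
class SimpleTool:
    """Hypothetical minimal tool record: what the model sees plus what gets run."""
    name: str
    description: str
    func: Callable[[str], str]


def weekday(date_str: str) -> str:
    """Convert an ISO date (YYYY-MM-DD) to a weekday name."""
    return calendar.day_name[date.fromisoformat(date_str).weekday()]


def describe(tools: List[SimpleTool]) -> str:
    """Render the tool list the way it might be embedded in an agent prompt."""
    return "\n".join(f"{t.name}: {t.description}" for t in tools)


tools = [SimpleTool("weekday", weekday.__doc__ or "", weekday)]
print(describe(tools))
print(tools[0].func("1979-01-18"))
```

Because the model chooses tools from these descriptions, a vague description is a common reason for an agent picking the wrong tool.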


In [47]:
from langchain import SerpAPIWrapper
from langchain.tools import Tool

search = SerpAPIWrapper()
tools = [
    Tool.from_function(
        func=search.run,
        name="Search",
        description="useful for when you need to answer questions about current events"
    ),
]


In [92]:
from langchain.tools import Tool, tool
import calendar
import dateutil.parser as parser
from datetime import date


@tool("weekday")
def weekday(date_str: str) -> str:
    """Convert date to weekday name"""
    d = parser.parse(date_str)
    return calendar.day_name[d.weekday()]


In [93]:
from langchain.agents import load_tools

tools = load_tools(["serpapi"])
tools += [weekday]


### 5.3 Agent type: ReAct


<img src="ReAct.png" style="margin-left: 0px" width=500px>


In [None]:
!pip install google-search-results


In [94]:
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI
from langchain.agents import AgentType
from langchain.agents import initialize_agent

llm = ChatOpenAI(model_name='gpt-4', temperature=0)

agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
agent.run("周杰伦生日那天是星期几")




> Entering new AgentExecutor chain...
我需要知道周杰伦的生日是哪一天，然后我可以使用weekday工具来找出那天是星期几。
Action: Search
Action Input: 周杰伦生日是哪一天
Observation: January 18, 1979
Thought:我现在知道周杰伦的生日是1月18日，1979年。我可以使用weekday工具来找出那天是星期几。
Action: weekday
Action Input: January 18, 1979
Observation: Thursday
Thought:我现在知道周杰伦的生日那天是星期四。
Final Answer: 星期四

> Finished chain.


'星期四'

### 5.4 Implementing an agent with OpenAI Function Calling


In [95]:
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI
from langchain.agents import AgentType
from langchain.agents import initialize_agent

llm = ChatOpenAI(model_name='gpt-4-0613', temperature=0)

agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.OPENAI_FUNCTIONS,
    verbose=True,
    max_iterations=2,
    early_stopping_method="generate",
)
agent.run("周杰伦生日那天是星期几")




> Entering new AgentExecutor chain...

Invoking: `Search` with `周杰伦的生日`


January 18, 1979
Invoking: `weekday` with `{'date_str': '1979-01-18'}`


Thursday
周杰伦的生日（1979年1月18日）是星期四。

> Finished chain.


'周杰伦的生日（1979年1月18日）是星期四。'

### 5.5 Agent type: SelfAskWithSearch

In [97]:
from langchain import OpenAI, SerpAPIWrapper
from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType

llm = OpenAI(temperature=0)
search = SerpAPIWrapper()
tools = [
    Tool(
        name="Intermediate Answer",
        func=search.run,
        description="useful for when you need to ask with search",
    )
]

self_ask_with_search = initialize_agent(
    tools, llm, agent=AgentType.SELF_ASK_WITH_SEARCH, verbose=True
)
self_ask_with_search.run(
    "冯小刚的老婆演过什么电影"
)



> Entering new AgentExecutor chain...
 Yes.
Follow up: Who is the wife of Feng Xiaogang?
Intermediate answer: He married actress Xu Fan in 1999. FilmographyEdit. As directorEdit. Year, English Title ...
Follow up: What movies has Xu Fan acted in?
Intermediate answer: Xu Fan is a Chinese actress and Asian Film Awards winner. She married film director Feng Xiaogang in 1999 and has starred in a number of films and television series directed by her husband.
So the final answer is: Xu Fan has starred in a number of films and television series directed by her husband.

> Finished chain.


'Xu Fan has starred in a number of films and television series directed by her husband.'
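
The trace above follows a fixed pattern: the model emits a `Follow up:` question, the search tool supplies an `Intermediate answer:`, and this repeats until the model writes `So the final answer is:`. A toy sketch of that loop, with a scripted stand-in for the LLM and a dictionary in place of real search (`self_ask`, `fake_llm` and `canned_search` are hypothetical names, and the opening prompt line is an assumption; this is not LangChain's implementation):

```python
from typing import Callable, Dict, Optional

FOLLOW_UP = "Follow up:"
FINAL = "So the final answer is:"


def self_ask(
    question: str,
    llm: Callable[[str], str],
    search: Callable[[str], str],
    max_steps: int = 5,
) -> Optional[str]:
    """Toy self-ask loop: the model asks follow-ups, search answers them."""
    transcript = f"Question: {question}\nAre follow up questions needed here:"
    for _ in range(max_steps):
        turn = llm(transcript).strip()
        transcript += " " + turn
        lines = turn.splitlines()
        last_line = lines[-1].strip() if lines else ""
        if last_line.startswith(FINAL):
            return last_line[len(FINAL):].strip()
        if last_line.startswith(FOLLOW_UP):
            # Answer the follow-up with the search tool and feed it back in.
            answer = search(last_line[len(FOLLOW_UP):].strip())
            transcript += f"\nIntermediate answer: {answer}\n"
    return None  # no final answer within max_steps


# Scripted model turns, loosely modeled on the trace above.
scripted_turns = iter([
    "Yes.\nFollow up: Who is the wife of Feng Xiaogang?",
    "Follow up: What movies has Xu Fan acted in?",
    "So the final answer is: Films and TV series directed by her husband.",
])
fake_llm = lambda prompt: next(scripted_turns)

canned_search: Dict[str, str] = {
    "Who is the wife of Feng Xiaogang?": "He married actress Xu Fan in 1999.",
    "What movies has Xu Fan acted in?": "She has starred in films directed by her husband.",
}

answer = self_ask(
    "What movies has Feng Xiaogang's wife acted in?",
    fake_llm,
    lambda q: canned_search[q],
)
print(answer)
```

The sketch also makes the weakness in the real trace visible: the final answer is only as specific as the intermediate answers the search tool returns.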

### 5.6 Agent Type: Plan-and-Execute


<img src="PlanExec.png" style="margin-left: 0px" width=500px>
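
Plan-and-Execute splits the work in two: a planner drafts the full list of steps up front, and an executor carries them out one at a time. A toy sketch of that control flow, with stub functions standing in for the LLM planner and the tool-using executor (all names here are hypothetical; this is not the `langchain_experimental` implementation):

```python
from typing import Callable, List


def plan_and_execute(
    objective: str,
    planner: Callable[[str], List[str]],
    executor: Callable[[str, List[str]], str],
) -> str:
    """Plan all steps first, then run them in order, passing earlier results along."""
    steps = planner(objective)
    results: List[str] = []
    for step in steps:
        # Each step sees the results of the steps that ran before it.
        results.append(executor(step, list(results)))
    return results[-1] if results else ""


def stub_planner(objective: str) -> List[str]:
    """Stand-in for an LLM planner: always returns the same three steps."""
    return ["gather data", "compare cities", "write report"]


def stub_executor(step: str, previous: List[str]) -> str:
    """Stand-in for a tool-using agent: just records what it was asked to do."""
    return f"done: {step} (saw {len(previous)} earlier results)"


final = plan_and_execute("compare the weather", stub_planner, stub_executor)
print(final)
```

Because the plan is fixed before execution starts, this style spends one planning call up front and then only needs the executor to focus on one step at a time.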


In [None]:
!pip install langchain-experimental


In [None]:
from langchain import SerpAPIWrapper
from langchain.agents.tools import Tool
from langchain.chat_models import ChatOpenAI
from langchain_experimental.plan_and_execute import (
    PlanAndExecute,
    load_agent_executor,
    load_chat_planner,
)

llm = ChatOpenAI(model_name='gpt-4', temperature=0)

# Search Google (google.com.hk) with the country set to China and the language to Simplified Chinese.
search = SerpAPIWrapper(params={
    'engine': 'google',
    'gl': 'cn',
    'google_domain': 'google.com.hk',
    'hl': 'zh-cn'
})

tools = [
    Tool(
        name="Search",
        func=search.run,
        description="useful for when you need to answer questions about current events"
    )
]

# The planner drafts the step list; the executor is an agent that carries out each step with the tools.
planner = load_chat_planner(llm)
executor = load_agent_executor(llm, tools, verbose=True)
agent = PlanAndExecute(planner=planner, executor=executor, verbose=True)

# Query: "Analyze tomorrow's weather in Beijing, compare it with Shanghai's, and write a report in Chinese"
agent.run("分析北京明天天气，与上海明天天气对比，用中文写一篇报告")




> Entering new PlanAndExecute chain...
steps=[Step(value='Access a reliable weather forecasting website or API to gather the weather data for Beijing and Shanghai for tomorrow.'), Step(value='Analyze the weather data for both cities, focusing on key aspects such as temperature, humidity, wind speed, and weather conditions (sunny, cloudy, rainy, etc.).'), Step(value='Compare the weather data of the two cities, highlighting the similarities and differences.'), Step(value='Write a report in Chinese, summarizing the weather forecast for both cities and the comparison between them.'), Step(value='Review the report to ensure it is accurate and clear.'), Step(value='Given the above steps taken, please respond to the users original question.\n')]

> Entering new AgentExecutor chain...
Thought: The assistant needs to access a reliable weather forecasting website or API to gather the weather data for Beijing and Shanghai for tomorrow. However, the assistant does no

## 6. Callbacks

Callbacks are hook functions used to monitor and log what happens during a call.


In [None]:
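from typing import Any, Dict, List, Union

from langchain.schema import AgentAction, AgentFinish, BaseMessage, LLMResult


# Abridged view of the handler interface: subclass it and override only the hooks you need.
class BaseCallbackHandler:
    """Base callback handler that can be used to handle callbacks from langchain."""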

    def on_llm_start(
        self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
    ) -> Any:
        """Run when LLM starts running."""

    def on_chat_model_start(
        self, serialized: Dict[str, Any], messages: List[List[BaseMessage]], **kwargs: Any
    ) -> Any:
        """Run when Chat Model starts running."""

    def on_llm_new_token(self, token: str, **kwargs: Any) -> Any:
        """Run on new LLM token. Only available when streaming is enabled."""

    def on_llm_end(self, response: LLMResult, **kwargs: Any) -> Any:
        """Run when LLM ends running."""

    def on_llm_error(
        self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
    ) -> Any:
        """Run when LLM errors."""

    def on_chain_start(
        self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any
    ) -> Any:
        """Run when chain starts running."""

    def on_chain_end(self, outputs: Dict[str, Any], **kwargs: Any) -> Any:
        """Run when chain ends running."""

    def on_chain_error(
        self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
    ) -> Any:
        """Run when chain errors."""

    def on_tool_start(
        self, serialized: Dict[str, Any], input_str: str, **kwargs: Any
    ) -> Any:
        """Run when tool starts running."""

    def on_tool_end(self, output: str, **kwargs: Any) -> Any:
        """Run when tool ends running."""

    def on_tool_error(
        self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
    ) -> Any:
        """Run when tool errors."""

    def on_text(self, text: str, **kwargs: Any) -> Any:
        """Run on arbitrary text."""

    def on_agent_action(self, action: AgentAction, **kwargs: Any) -> Any:
        """Run on agent action."""

    def on_agent_finish(self, finish: AgentFinish, **kwargs: Any) -> Any:
        """Run on agent end."""


In [1]:
from langchain.callbacks import StdOutCallbackHandler
from langchain.callbacks.base import BaseCallbackHandler
from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from typing import List, Dict, Any


class myhandler(BaseCallbackHandler):
    def on_llm_start(
        self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
    ) -> Any:
        print(f"Feed LLM with {prompts}")

    def on_chain_start(
        self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any
    ) -> Any:
        print(f"Chain Start: {inputs}")

    def on_chain_end(self, outputs: Dict[str, Any], **kwargs: Any) -> Any:
        print("Done!")

    def on_text(self, text: str, **kwargs: Any) -> Any:
        print(f"On text: {text}")


handler = myhandler()

llm = OpenAI()
prompt = PromptTemplate.from_template("1 + {number} = ")

# Callbacks passed to the constructor fire only for events of this object itself
chain = LLMChain(llm=llm, prompt=prompt, callbacks=[handler])
x = chain.run(number=1)

# Callbacks passed at run time fire for this call and for all of its sub-calls
chain = LLMChain(llm=llm, prompt=prompt)
x = chain.run(number=2, callbacks=[handler])


Chain Start: {'number': 1}
On text: Prompt after formatting:
1 + 1 = 
Done!
Chain Start: {'number': 2}
On text: Prompt after formatting:
1 + 2 = 
Feed LLM with ['1 + 2 = ']
Done!
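
In the output, `Feed LLM with ...` appears only for the second run: a handler attached in the constructor sees the chain's own events, while a handler passed at run time is also handed down to the inner LLM call. A toy model of that propagation rule in plain Python (`ToyHandler`, `ToyLLM` and `ToyChain` are hypothetical classes written for illustration, not LangChain internals):

```python
from typing import Any, Dict, List, Sequence, Tuple


class ToyHandler:
    """Records every event it receives."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, Any]] = []

    def on_chain_start(self, inputs: Dict[str, Any]) -> None:
        self.events.append(("chain_start", inputs))

    def on_llm_start(self, prompt: str) -> None:
        self.events.append(("llm_start", prompt))

    def on_chain_end(self, output: str) -> None:
        self.events.append(("chain_end", output))


class ToyLLM:
    """Stand-in model that returns a canned completion."""

    def __call__(self, prompt: str, callbacks: Sequence[ToyHandler] = ()) -> str:
        for handler in callbacks:
            handler.on_llm_start(prompt)
        return f"<completion for {prompt!r}>"


class ToyChain:
    def __init__(self, llm: ToyLLM, template: str,
                 callbacks: Sequence[ToyHandler] = ()) -> None:
        self.llm = llm
        self.template = template
        self.callbacks = list(callbacks)  # constructor-scoped: this object only

    def run(self, callbacks: Sequence[ToyHandler] = (), **inputs: Any) -> str:
        inherited = list(callbacks)  # request-scoped: passed down to sub-calls
        for handler in self.callbacks + inherited:
            handler.on_chain_start(inputs)
        output = self.llm(self.template.format(**inputs), callbacks=inherited)
        for handler in self.callbacks + inherited:
            handler.on_chain_end(output)
        return output


constructor_handler, request_handler = ToyHandler(), ToyHandler()
ToyChain(ToyLLM(), "1 + {number} = ", callbacks=[constructor_handler]).run(number=1)
ToyChain(ToyLLM(), "1 + {number} = ").run(number=2, callbacks=[request_handler])

print([name for name, _ in constructor_handler.events])
print([name for name, _ in request_handler.events])
```

The second handler's event list includes `llm_start`, mirroring the extra `Feed LLM with ...` line in the real output.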


## How Software Is Evolving in the Era of Large Models

<img src="agent.png" style="margin-left: 0px" width=600px>

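<div class="alert alert-warning">
<b>Think about it:</b>
<ul>
<li>From a software engineering perspective, what are LangChain's current shortcomings?</li>
<li>What problems remain unsolved before agents can be deployed at scale?</li>
<li>In which directions do you think agents could be improved?</li>
</ul>
</div>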

## LangFlow

<img src="https://github.com/logspace-ai/langflow/raw/main/img/langflow-demo.gif?raw=true" style="margin-left: 0px">

https://github.com/logspace-ai/langflow


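## Homework

Build your own [ChatPDF](https://www.chatpdf.com/) (the site may require a VPN to access). Requirements:

1. Load a PDF file from local disk and chat based on its content
2. No front end is required; running from the command line is enough
3. Everything else is up to you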
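
One possible starting point for the retrieval part of the assignment, using only the standard library. PDF text extraction and the LLM call are deliberately left out; `split_text`, `top_chunks` and `build_prompt` are hypothetical helper names, and word-overlap ranking is a crude stand-in for the embedding-based retrieval covered earlier in this course:

```python
from typing import List


def split_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into character chunks of `chunk_size` that overlap by `overlap`."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += chunk_size - overlap
    return chunks


def top_chunks(question: str, chunks: List[str], k: int = 2) -> List[str]:
    """Rank chunks by how many whitespace-separated words they share with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, context: List[str]) -> str:
    """Assemble the text that would be sent to the model."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


notes = [
    "LangChain wraps large language models.",
    "Agents pick tools automatically.",
    "Callbacks log each call.",
]
best = top_chunks("how do agents pick tools", notes, k=1)
print(build_prompt("how do agents pick tools", best))
```

Whitespace splitting does not segment Chinese text, so for Chinese PDFs the ranking step is the first thing to replace.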


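## Post-Class Survey

Please click the link or scan the QR code to fill in the questionnaire and help us keep improving the course. Thank you!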

https://agiclass.feishu.cn/share/base/form/shrcnU6ywdMDS2caxf6gp7dYQ63

<img src="../survey.png" width="200" style="margin-left: 0px">
