## Using LangChain

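- Learn to call large language models through LangChain, using Alibaba's Tongyi Qianwen (Qwen) model
- Learn to use LangChain as an agent over database data, such as stock data
- Learn to use LangChain to process text data, such as books by Duan Yongping and Warren Buffett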

### 0 Prompt engineering

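* Principles:
    * Give clear and specific instructions (clear does not mean short)
        * Use delimiters such as triple double quotes, triple single quotes, triple dashes, `<>`, or XML tags
        * Ask for structured output, e.g. HTML or JSON
        * Ask the model to check whether the conditions and assumptions hold
        * Provide successful examples and describe the expected output

    * Give the model time to think
        * Spell out the steps needed to complete the task
        * Ask the model to break the task down before it gives a result

* To reduce hallucinations, first have the model find the relevant reference material, then answer based on that material
* Develop prompts iteratively
* Capabilities: summarizing, inferring, transforming, expanding
* Chatbots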
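
The first two principles (delimiters and structured output) can be sketched as a small prompt builder. This is only an illustration: `build_prompt`, the `<text>` tag name, and the `summary` key are hypothetical choices, not part of any library.

```python
def build_prompt(instruction: str, user_text: str) -> str:
    """Wrap the text to process in XML-style delimiters and ask for JSON output."""
    return (
        f"{instruction}\n"
        "The text to process is enclosed in <text> tags.\n"
        'Respond with JSON only, for example {"summary": "..."}.\n'
        # State the fallback explicitly so the model checks the precondition.
        'If the text is empty, respond with {"summary": null}.\n'
        f"<text>{user_text}</text>"
    )


example_prompt = build_prompt(
    "Summarize the text in one sentence.",
    "LangChain chains components together.",
)
print(example_prompt)
```

Keeping the user text inside delimiters makes it harder for that text to be read as part of the instructions.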

In [5]:
import os
from openai import OpenAI
os.environ["DASHSCOPE_API_KEY"] = "sk-xxx"  # placeholder; use your own Alibaba Cloud DashScope API key, never commit a real one
client = OpenAI(
    # If the environment variable is not set, replace the next line with your Bailian (Model Studio) API key: api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),  # how to get an API key: https://help.aliyun.com/zh/model-studio/developer-reference/get-api-key
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
def get_completion(prompt,model = "qwen-turbo"):
    messages=[{'role': 'user', 'content': prompt}]    
    response = client.chat.completions.create(
        model=model,
        messages=messages,  
        temperature=0,
    )
    return response.choices[0].message.content

# Example usage:

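text = """
Give me information on 3 books, including title, author, publisher, publication date, page count, ISBN, and price.
"""
prompt = f"""
Output in JSON format: {text}. The information above must be from 2024, and the books must be novels.
"""

print(get_completion(prompt))

Here is the information for three novels from 2024 that match your request:

```json
[
    {
        "title": "The Wheel of Time",
        "author": "Zhang San",
        "publisher": "Future Press",
        "publication_date": "2024-01-15",
        "pages": 380,
        "isbn": "978-7-121-32456-8",
        "price": 45.00
    },
    {
        "title": "Sea of Stars",
        "author": "Li Si",
        "publisher": "Interstellar Press",
        "publication_date": "2024-03-22",
        "pages": 420,
        "isbn": "978-7-121-32457-5",
        "price": 50.00
    },
    {
        "title": "Legends of Another World",
        "author": "Wang Wu",
        "publisher": "Fantasy Press",
        "publication_date": "2024-05-10",
        "pages": 400,
        "isbn": "978-7-121-32458-2",
        "price": 48.00
    }
]
```

Please note that this information is fictional.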
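
The reply above wraps the JSON in a Markdown fence and surrounds it with prose, so it cannot be passed to `json.loads` directly. A minimal sketch of one way to pull the payload out, assuming the reply contains at most one fenced block (`extract_json` and the sample reply are hypothetical):

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to avoid nesting fences here


def extract_json(reply: str):
    """Parse the first fenced payload in a model reply, or the whole reply if unfenced."""
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, reply, flags=re.DOTALL)
    payload = match.group(1) if match else reply
    return json.loads(payload)


sample_reply = (
    "Here are the books:\n"
    f'{FENCE}json\n[{{"title": "A", "pages": 380}}]\n{FENCE}\n'
    "Please note that this information is fictional."
)
books = extract_json(sample_reply)
print(books[0]["title"], books[0]["pages"])
```

If the model returns malformed JSON, `json.loads` raises `json.JSONDecodeError`, which is a natural point to retry or to tighten the prompt.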


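## 1. A simple LangChain application

Core LangChain components

1. PromptTemplate: a template describing the conversation, including system messages, user messages, and placeholder slots
2. LLM: a wrapper around a language model that takes the formatted prompt as input and returns the generated output
3. OutputParser: parses the model output into a form that is easier to read or to process downstream
4. Retriever: fetches content relevant to a query (for example documents from a vector store), using model-based or rule-based retrieval

Coding exercises:

1. Use the LCEL pipe operator `|` to chain the components above in order into a simple application (printing the chain graph, streaming output).
2. A simple langgraph application.
3. Use LangSmith to debug and test the conversational system.
4. Use LangServe for a simple deployment of the conversational system.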
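
To see why an expression like `prompt | model | parser` runs left to right, here is a toy sketch of pipe composition through Python's `__or__` operator. It is not LangChain's implementation; `Step`, `fake_model`, and the other names are hypothetical stand-ins.

```python
class Step:
    """A toy stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new step that runs a, then feeds its result to b.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


toy_prompt = Step(lambda d: f"Answer in {d['language']}: {d['question']}")
fake_model = Step(lambda text: {"content": text.upper()})  # stands in for an LLM call
toy_parser = Step(lambda message: message["content"])

toy_chain = toy_prompt | fake_model | toy_parser
print(toy_chain.invoke({"language": "English", "question": "hi"}))
```

Each stage only needs to accept what the previous stage returns, which is why a prompt template, a model, and an output parser compose cleanly.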

In [32]:
# Basic configuration
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"  # enable LangSmith tracing
os.environ["LANGCHAIN_API_KEY"] = "lsv2_xxx"  # placeholder; use your own LangSmith API key
os.environ["DASHSCOPE_API_KEY"] = "sk-xxx"  # placeholder; use your own Alibaba Cloud DashScope API key


from langchain_community.chat_models import ChatTongyi  # LangChain community chat-model integration
model = ChatTongyi(model="qwen-turbo")  # use the Qwen (Tongyi Qianwen) model

In [33]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.prompts import ChatPromptTemplate,MessagesPlaceholder


prompt = ChatPromptTemplate.from_messages([
    ("system", "answer me a question in {language}:"),
    MessagesPlaceholder(variable_name="messages")
    ])

parser = StrOutputParser()
chain = prompt | model | parser  # LCEL expression: here `|` chains the components (in plain Python `|` can also mean dict merge)

input_messages = [HumanMessage("tell me a joke ")]

In [None]:
# Show the chain as an ASCII graph, then invoke it
print(chain.get_graph().draw_ascii())
print(chain.invoke({"messages": input_messages, "language": "zh-CN"}))

In [None]:
# Workflow plus persistence with langgraph
from langgraph.graph import START, MessagesState, StateGraph
from langgraph.checkpoint.memory import MemorySaver
from typing_extensions import Annotated, TypedDict
from typing import Sequence

from langchain_core.messages import AIMessage, BaseMessage
from langgraph.graph.message import add_messages

class State(TypedDict):
    messages: Annotated[Sequence[BaseMessage], add_messages]
    language: str

workflow = StateGraph(state_schema=State)

def call_chain(state: State):
    # chain ends with StrOutputParser, so it returns a plain string; wrap it in
    # AIMessage, otherwise add_messages would store it as a human message
    response = chain.invoke(state)
    return {"messages": [AIMessage(content=response)]}

workflow.add_edge(start_key=START, end_key="chain")
workflow.add_node(node="chain", action=call_chain)
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)
config = {"configurable": {"thread_id": "12345"}}

output=app.invoke({"messages": input_messages,"language": "zh-CN"}, config)
output["messages"][-1].pretty_print()

# Same thread_id, so the checkpointer supplies the earlier messages
input_messages2 = [HumanMessage("What joke did you just tell?")]
output2=app.invoke({"messages": input_messages2,"language": "zh-CN"}, config)
output2["messages"][-1].pretty_print()


In [None]:
# Streaming

from langchain_core.messages import AIMessage
config = {"configurable": {"thread_id": "abc789"}}
query = "Hi I'm Todd, please tell me a long joke."
language = "chinese"

input_messages = [HumanMessage(query)]
for chunk, metadata in app.stream(
    {"messages": input_messages, "language": language},
    config,
    stream_mode="messages",
):
    if isinstance(chunk, AIMessage):  # Filter to just model responses
        print(chunk.content, end="|")

```python
# Build the server; running this inside a Jupyter notebook raises an error

from fastapi import FastAPI
from langserve import add_routes

app = FastAPI(title="LangChain API",version="1.0.0",description="API for LangChain")
add_routes(app,chain,path="/chain")


if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app,host="localhost",port=8000)

```

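## 2. RAG: retrieval-augmented generation with external data

- Vector store + vector-based retrieval. Text (web pages, PDFs, Word documents, etc.) is converted to vector representations and stored in a vector database for retrieval.
- Relational database + SQL-based retrieval. The LLM generates a SQL query, which is then run against the relational database.
- Graph database + graph-based retrieval. Vector representations of text become graph nodes, relationships between texts become graph edges, and retrieval runs over the graph.

> If you have no data, you can synthesize some for testing.

### 2.1 Vector storage and retrieval

1. First load the documents, split the text, embed the splits with an embedding model, and store the resulting vectors in a vector store.
2. Retrieve the relevant splits from the vector store; the LLM then generates an answer from a prompt that contains the question and the retrieved data.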
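
The two steps can be sketched end to end without any external service. This toy version swaps the embedding model for bag-of-words counts and the vector database for a Python list, so it only illustrates the flow; `embed`, `cosine`, `retrieve`, and the sample chunks are all hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a real embedding model returns dense vectors)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


chunks = [
    "task decomposition splits a big task into smaller steps",
    "memory stores past interactions for the agent",
    "tools let the agent call external apis",
]
index = [(chunk, embed(chunk)) for chunk in chunks]  # step 1: the "vector store"


def retrieve(question: str, k: int = 1) -> list[str]:
    """Step 2a: return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


question = "what is task decomposition"
context = "\n\n".join(retrieve(question))
# Step 2b: the prompt that would be sent to the LLM.
rag_prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(rag_prompt)
```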

![image.png](https://p.ipic.vip/m3z948.png)
![image.png](https://p.ipic.vip/ivldaz.png)

In [3]:
# Basic configuration
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"  # enable LangSmith tracing
os.environ["LANGCHAIN_API_KEY"] = "lsv2_xxx"  # placeholder; use your own LangSmith API key
os.environ["DASHSCOPE_API_KEY"] = "sk-xxx"  # placeholder; use your own Alibaba Cloud DashScope API key


from langchain_community.chat_models import ChatTongyi  # LangChain community chat-model integration
llm = ChatTongyi(model="qwen-turbo")  # use the Qwen (Tongyi Qianwen) model

from langchain_community.embeddings import DashScopeEmbeddings
dashscope_embedding = DashScopeEmbeddings(model="text-embedding-v1")  # DashScope text embedding model



In [None]:
import bs4
import langchain_community
import langchain_community.document_loaders


# Load the web page; bs4 keeps only the HTML elements with the listed classes
loader = langchain_community.document_loaders.WebBaseLoader(
    web_path=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs= dict(parse_only=bs4.SoupStrainer(class_=("post-content","post-title","post-header")))
)

docs = loader.load() # load the documents

In [None]:
import langchain_text_splitters
import langchain_chroma

# The document is too long for the model's maximum input length, so it has to be split
# Each chunk holds up to 1000 characters, with 200 characters of overlap between chunks
text_splitter = langchain_text_splitters.RecursiveCharacterTextSplitter(chunk_size=1000,chunk_overlap=200)
splits = text_splitter.split_documents(docs)
# The vector store keeps the embedded chunks for later similarity search (cosine similarity)
vectorstore = langchain_chroma.Chroma.from_documents(documents=splits,embedding = dashscope_embedding)
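
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window splitter. It is an assumption-laden sketch: `RecursiveCharacterTextSplitter` prefers to cut at separators such as paragraph breaks, while this hypothetical `split_with_overlap` cuts at exact character offsets.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters sharing chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        # Skip a trailing window that the previous chunk already covers completely.
        if start > 0 and start + chunk_overlap >= len(text):
            break
        chunks.append(text[start:start + chunk_size])
    return chunks


sample = "abcdefghijklmnopqrstuvwxy"  # 25 characters
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=2)
print(pieces)
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.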

In [None]:
import langchain
import langchain.hub
import langchain_core
import langchain_core.output_parsers
import langchain_core.runnables
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})
prompt = langchain.hub.pull("rlm/rag-prompt")  # pull a prompt template from the LangChain Hub

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": langchain_core.runnables.RunnablePassthrough()}
    | prompt | llm | langchain_core.output_parsers.StrOutputParser()
)

#print(rag_chain.get_graph().draw_ascii())

print(rag_chain.invoke("What is task decomposition?"))

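Conversational RAG adds chat history on top of the basic RAG chain:
* Use the built-in chain constructors `create_stuff_documents_chain` and `create_retrieval_chain` to simplify the code above
* Use the `langgraph` library to manage the chat history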
![image.png](https://p.ipic.vip/qfboj1.png)


In [15]:
import langchain.chains.combine_documents
import langchain_core.vectorstores
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from typing_extensions import Annotated, TypedDict
from typing import Sequence
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage
from langgraph.graph.message import add_messages
from langgraph.graph import START, StateGraph
from langgraph.checkpoint.memory import MemorySaver


vectorstore = langchain_core.vectorstores.InMemoryVectorStore.from_documents(
    documents=splits, embedding=dashscope_embedding)
retriever = vectorstore.as_retriever()

### Contextualize question ###
contextualize_q_system_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)

contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ])

history_aware_retriever = create_history_aware_retriever(llm, retriever,contextualize_q_prompt)

### Answer question ###
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

question_answerer_chain = langchain.chains.combine_documents.create_stuff_documents_chain(llm,qa_prompt)

rag_chain = langchain.chains.create_retrieval_chain(history_aware_retriever, question_answerer_chain)

### Statefully manage chat history ###

class State(TypedDict):
    input: str
    chat_history: Annotated[Sequence[BaseMessage],add_messages]
    context: str
    answer: str

def call_model(state: State):
    response = rag_chain.invoke(state)
    return {
        "chat_history": [HumanMessage(state["input"]), AIMessage(response["answer"])],
        "context": response["context"],
        "answer": response["answer"]
    }

workflow = StateGraph(state_schema=State)
workflow.add_edge(START, "model")
workflow.add_node("model", call_model)
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

In [None]:
config = {"configurable": {"thread_id": "abc123"}}
result = app.invoke({"input": "What is task decomposition?"}, config=config)
print(result["answer"])

In [None]:
result = app.invoke({"input": "How is it usually done?"}, config=config)
print(result["answer"])

In [None]:
chat_history = app.get_state(config).values["chat_history"]
for message in chat_history:
    message.pretty_print()

In [19]:
# Handle the same task with an agent
from langgraph.prebuilt import create_react_agent
from langchain.tools.retriever import create_retriever_tool


### Build retriever tool ###
tool = create_retriever_tool(
    retriever,
    "blog_post_retriever",
    "Searches and returns excerpts from the Autonomous Agents blog post.",
)
tools = [tool]


agent_executor = create_react_agent(llm, tools, checkpointer=memory)

In [None]:
query = "What is task decomposition?"

for event in agent_executor.stream(
    {"messages": [HumanMessage(content=query)]},
    config=config,
    stream_mode="values",
):
    event["messages"][-1].pretty_print()

In [None]:
query = "How is it usually done?"

for event in agent_executor.stream(
    {"messages": [HumanMessage(content=query)]},
    config=config,
    stream_mode="values",
):
    event["messages"][-1].pretty_print()

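### 2.2 SQL database queries
* Convert the question into a SQL query, execute it, and return the result.
* Data permissions: give the database connection query (read) permission only, and avoid insert and delete permissions where possible.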
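
One way to enforce the read-only rule at the connection level, sketched with the standard-library `sqlite3` module and SQLite's `PRAGMA query_only`. The in-memory `Employee` table and the `run_generated_sql` helper are hypothetical; other databases would use a read-only user or role instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY, LastName TEXT)")
conn.executemany("INSERT INTO Employee (LastName) VALUES (?)", [("Adams",), ("Edwards",)])
conn.commit()

# From here on, SQLite rejects any statement that would modify the database.
conn.execute("PRAGMA query_only = ON")


def run_generated_sql(sql: str):
    """Execute model-generated SQL on the read-only connection."""
    return conn.execute(sql).fetchall()


print(run_generated_sql("SELECT COUNT(*) AS EmployeeCount FROM Employee"))

try:
    run_generated_sql("DELETE FROM Employee")
    write_blocked = False
except sqlite3.DatabaseError as exc:
    write_blocked = True
    print("blocked:", exc)
```

Enforcing the restriction in the database, not in the prompt, means a badly generated or malicious query cannot change data even if the model ignores its instructions.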

![image.png](https://p.ipic.vip/fkgtcm.png)

In [1]:
# Basic configuration
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"  # enable LangSmith tracing
os.environ["LANGCHAIN_API_KEY"] = "lsv2_xxx"  # placeholder; use your own LangSmith API key
os.environ["DASHSCOPE_API_KEY"] = "sk-xxx"  # placeholder; use your own Alibaba Cloud DashScope API key


from langchain_community.chat_models import ChatTongyi  # LangChain community chat-model integration
llm = ChatTongyi(model="qwen-turbo")  # use the Qwen (Tongyi Qianwen) model

from langchain_community.embeddings import DashScopeEmbeddings
dashscope_embedding = DashScopeEmbeddings(model="text-embedding-v1")  # DashScope text embedding model

In [None]:
from langchain_community.utilities import SQLDatabase



db = SQLDatabase.from_uri("sqlite:///Chinook.db")
print(db.get_usable_table_names())

db.run("SELECT COUNT(*) AS EmployeeCount FROM Employee")

In [None]:
from langchain.chains import create_sql_query_chain
import langchain.chains.sql_database
import langchain.chains.sql_database.query

chain = create_sql_query_chain(llm,db)
#print(chain.get_graph().draw_ascii())

response = chain.invoke({"question":"How many employees are there"})
response

In [None]:
db.run(response)

In [None]:
from langchain_community.tools.sql_database.tool import QuerySQLDataBaseTool

execute_query = QuerySQLDataBaseTool(db=db)  # tool that runs a SQL string against db
write_query = create_sql_query_chain(llm,db)  # chain that turns a question into SQL
chain = write_query | execute_query
chain.invoke({"question":"How many employees are there"})

In [None]:
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough

answer_prompt = PromptTemplate.from_template(
    """Given the following user question, corresponding SQL query, and SQL result, answer the user question.

Question: {question}
SQL Query: {query}
SQL Result: {result}
Answer: """
)

# Write the query, run it, then let the LLM phrase the answer from question, query, and result
chain = (
    RunnablePassthrough.assign(query=write_query).assign(
        result=itemgetter("query") | execute_query
    )
    | answer_prompt | llm | StrOutputParser()
)
chain.invoke({"question": "How many employees are there"})

### 2.3 Graph database queries
* Convert the question into a graph database query
* Execute the graph database query
* Parse the result
![image](https://p.ipic.vip/hm6ymw.png)


In [1]:
# Basic configuration
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"  # enable LangSmith tracing
os.environ["LANGCHAIN_API_KEY"] = "lsv2_xxx"  # placeholder; use your own LangSmith API key
os.environ["DASHSCOPE_API_KEY"] = "sk-xxx"  # placeholder; use your own Alibaba Cloud DashScope API key


from langchain_community.chat_models import ChatTongyi  # LangChain community chat-model integration
llm = ChatTongyi(model="qwen-turbo")  # use the Qwen (Tongyi Qianwen) model

from langchain_community.embeddings import DashScopeEmbeddings
dashscope_embedding = DashScopeEmbeddings(model="text-embedding-v1")  # DashScope text embedding model

This part requires a deployed Neo4j database; to be practiced later.

### 2.3 Agents and tools
* Using state and nodes
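
Before the LangGraph version, a library-free sketch of the idea may help: nodes are functions that return partial state updates, a reducer merges each update into the state, and a conditional edge picks the next node. Everything here (`toy_add_messages`, `toy_chatbot`, `toy_tools`, `route`, `run`) is a hypothetical simplification with messages as plain strings, not LangGraph's API.

```python
from typing import Callable


def toy_add_messages(old: list, new: list) -> list:
    """Reducer: node updates are appended to the message list, not assigned over it."""
    return old + new


def toy_chatbot(state: dict) -> dict:
    """Stand-in for the LLM node: ask for a tool first, answer once results exist."""
    last = state["messages"][-1]
    if last.startswith("tool:"):
        return {"messages": [f"ai: answer based on [{last}]"]}
    return {"messages": ["ai: call_tool"]}


def toy_tools(state: dict) -> dict:
    """Stand-in for the tool node."""
    return {"messages": ["tool: search results"]}


def route(state: dict) -> str:
    """Conditional edge: go to the tool node only when the model asked for it."""
    return "tools" if state["messages"][-1] == "ai: call_tool" else "END"


nodes: dict[str, Callable[[dict], dict]] = {"chatbot": toy_chatbot, "tools": toy_tools}


def run(user_input: str) -> dict:
    state = {"messages": [f"user: {user_input}"]}
    current = "chatbot"
    while current != "END":
        update = nodes[current](state)
        state["messages"] = toy_add_messages(state["messages"], update["messages"])
        # After the tool node, control always returns to the chatbot node.
        current = route(state) if current == "chatbot" else "chatbot"
    return state


final_state = run("research LangGraph")
print("\n".join(final_state["messages"]))
```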


In [1]:
# Basic configuration
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"  # enable LangSmith tracing
os.environ["LANGCHAIN_API_KEY"] = "lsv2_xxx"  # placeholder; use your own LangSmith API key
os.environ["DASHSCOPE_API_KEY"] = "sk-xxx"  # placeholder; use your own Alibaba Cloud DashScope API key
os.environ["TAVILY_API_KEY"] = "tvly-xxx"  # placeholder; use your own Tavily API key

from langchain_community.chat_models import ChatTongyi  # LangChain community chat-model integration
llm = ChatTongyi(model="qwen-turbo")  # use the Qwen (Tongyi Qianwen) model

from langchain_community.embeddings import DashScopeEmbeddings
dashscope_embedding = DashScopeEmbeddings(model="text-embedding-v1")  # DashScope text embedding model

In [13]:
from typing import Annotated

from typing_extensions import TypedDict

from langgraph.graph import StateGraph
from langgraph.graph.message import add_messages
from langchain_community.tools.tavily_search import TavilySearchResults
from langgraph.prebuilt import ToolNode,tools_condition
from langgraph.checkpoint.memory import MemorySaver




class State(TypedDict):
    messages: Annotated[list, add_messages]

graph_builder = StateGraph(State)

tool = TavilySearchResults(max_results=2)
tools=[tool]

llm_with_tools = llm.bind_tools(tools)


def chatbot(state: State):
    return {"messages": [llm_with_tools.invoke(state["messages"])]}


graph_builder.add_node("chatbot", chatbot)
tool_node = ToolNode(tools=[tool])
graph_builder.add_node("tools", tool_node)

graph_builder.set_entry_point("chatbot")
graph_builder.add_conditional_edges("chatbot", tools_condition)
graph_builder.add_edge("tools", "chatbot")
graph_builder.set_finish_point("chatbot")

memory = MemorySaver()

graph = graph_builder.compile(checkpointer=memory,interrupt_before=["tools"])

In [None]:
print(graph.get_graph().draw_ascii())

In [16]:
user_input = "I'm learning LangGraph. Could you do some research on it for me?"
config = {"configurable": {"thread_id": "1"}}
# The config is the **second positional argument** to stream() or invoke()!
events = graph.stream(
    {"messages": [("user", user_input)]}, config, stream_mode="values"
)
for event in events:
    if "messages" in event:
        event["messages"][-1].pretty_print()


I'm learning LangGraph. Could you do some research on it for me?
Tool Calls:
  tavily_search_results_json (call_e56d760ce68b4b258bebc5)
 Call ID: call_e56d760ce68b4b258bebc5
  Args:
    query: LangGraph


In [17]:
events = graph.stream(None, config, stream_mode="values")
for event in events:
    if "messages" in event:
        event["messages"][-1].pretty_print()

Tool Calls:
  tavily_search_results_json (call_e56d760ce68b4b258bebc5)
 Call ID: call_e56d760ce68b4b258bebc5
  Args:
    query: LangGraph
Name: tavily_search_results_json

[{"url": "https://www.datacamp.com/tutorial/langgraph-tutorial", "content": "LangGraph is a library within the LangChain ecosystem that simplifies the development of complex, multi-agent large language model (LLM) applications. Learn how to use LangGraph to create stateful, flexible, and scalable systems with nodes, edges, and state management."}, {"url": "https://langchain-ai.github.io/langgraph/", "content": "LangGraph is a low-level framework that allows you to create stateful, multi-actor applications with LLMs, using cycles, controllability, and persistence. Learn how to use LangGraph with LangChain, LangSmith, and Anthropic tools to build agent and multi-agent workflows."}]

LangGraph is a library within the LangChain ecosystem. It simplifies the development of complex, multi-agent large language model (LLM) ap

### 2.4 Application: the arXiv dataset

In [None]:
from langchain import hub
from langchain.agents import AgentExecutor, create_react_agent, load_tools
from langchain_ollama import OllamaLLM  # OllamaLLM ships in the langchain-ollama package, not in langchain_community

# Initialize the language model: a DeepSeek model served by a local Ollama instance
llm = OllamaLLM(model="deepseek:1.5b")

# Load the tools
tools = load_tools(["arxiv"])
prompt = hub.pull("hwchase17/react")

# Create the agent
agent = create_react_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# Ask the agent about a paper; it fetches the paper information through the arxiv tool
response = agent_executor.invoke({"input": "What's the paper 1605.08386 about?"})
print(response["output"])

As originally written, with `from langchain_community.llms import OllamaLLM`, this cell failed with:

ImportError: cannot import name 'OllamaLLM' from 'langchain_community' (/Users/yangzhan/miniconda3/envs/note/lib/python3.11/site-packages/langchain_community/__init__.py)