<a href="https://colab.research.google.com/github/weedge/doraemon-nb/blob/main/Qwen_Langchain_Chain_of_Thought.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

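# How to Let Qwen Use Tools from LangChain

This notebook shows how to let Qwen call tools that are already implemented in the LangChain framework, such as Google Search and WolframAlpha. It relies mainly on ReAct prompting, a particular Chain-of-Thought (CoT) prompting technique.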



In [None]:
!git clone https://github.com/QwenLM/Qwen.git

Cloning into 'Qwen'...
remote: Enumerating objects: 1458, done.
remote: Counting objects: 100% (685/685), done.
remote: Compressing objects: 100% (394/394), done.
remote: Total 1458 (delta 507), reused 392 (delta 289), pack-reused 773
Receiving objects: 100% (1458/1458), 35.31 MiB | 21.24 MiB/s, done.
Resolving deltas: 100% (855/855), done.


In [None]:
!cd Qwen/ && pip install -r requirements.txt

Collecting transformers==4.32.0 (from -r requirements.txt (line 1))
  Downloading transformers-4.32.0-py3-none-any.whl (7.5 MB)
Collecting accelerate (from -r requirements.txt (line 2))
  Downloading accelerate-0.25.0-py3-none-any.whl (265 kB)
Collecting tiktoken (from -r requirements.txt (line 3))
  Downloading tiktoken-0.5.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
Collecting einops (from -r requirements.txt (line 4))
  Downloading einops-0.7.0-py3-none-any.whl (44 kB)

In [None]:
# 安装 langchain 相关依赖
!pip install langchain==0.0.288 google-search-results wolframalpha arxiv;


Collecting langchain==0.0.288
  Downloading langchain-0.0.288-py3-none-any.whl (1.7 MB)
Collecting google-search-results
  Downloading google_search_results-2.4.2.tar.gz (18 kB)
  Preparing metadata (setup.py) ... done
Collecting wolframalpha
  Downloading wolframalpha-5.0.0-py3-none-any.whl (7.5 kB)
Collecting arxiv
  Downloading arxiv-2.1.0-py3-none-any.whl (11 kB)
Collecting dataclasses-json<0.6.0,>=0.5.7 (from langchain==0.0.288)
  Downloading dataclasses_json-0.5.14-py3-none-any.whl (26 kB)
Collecting langsmith<0.1.0,>=0.0.21 (from langchain==0.0.288)
  Downloading langsmith-0.0.71-py3-none-any.whl (46 kB)
Collecting xmltodict (from wolframalpha)
  Downloading xmltodict-0.13.0-py2.py3-none-any.whl (10.0 kB)
Collecting jaraco.con

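## Step 0 - Import the LangChain tools
The following commonly used APIs are imported for demonstration:

- Google Search API https://serpapi.com/manage-api-key
- WolframAlpha https://products.wolframalpha.com/api/
- arXiv paper search
- Python shell (requires Python 3.9 or later)

Note 1: We recommend following this example and writing the tool descriptions shown to Qwen carefully and in detail.

Note 2: Google Search (SerpAPI) and WolframAlpha can only be used after you apply for your own API keys.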
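LangChain's SerpAPI and WolframAlpha wrappers can read their keys from the environment variables `SERPAPI_API_KEY` and `WOLFRAM_ALPHA_APPID` (the names set in the next cell). A minimal sketch, assuming you prefer to fail early with a readable message when a key is missing; `require_env` is a hypothetical helper, not part of LangChain:

```python
import os
from typing import Dict


def require_env(*names: str) -> Dict[str, str]:
    """Return the named environment variables, raising if any is missing or empty.

    Hypothetical helper for this notebook; it is not part of LangChain.
    """
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing API keys: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}


# Usage (before constructing SerpAPIWrapper / WolframAlphaAPIWrapper):
# require_env("SERPAPI_API_KEY", "WOLFRAM_ALPHA_APPID")
```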



In [None]:
from langchain import SerpAPIWrapper
from langchain.utilities.wolfram_alpha import WolframAlphaAPIWrapper
from langchain.utilities import ArxivAPIWrapper
from langchain.tools.python.tool import PythonAstREPLTool

from typing import Dict, Tuple
import os
import json

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig
from google.colab import userdata

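# To use Google Search (SerpAPI) and WolframAlpha, obtain your own API keys and store them
# as Colab secrets; they are read here and exported as environment variables.
os.environ['SERPAPI_API_KEY'] = userdata.get('SERPAPI_API_KEY')
os.environ['WOLFRAM_ALPHA_APPID'] = userdata.get('WOLFRAM_ALPHA_APPID')

search = SerpAPIWrapper()
WolframAlpha = WolframAlphaAPIWrapper()
arxiv = ArxivAPIWrapper()
python = PythonAstREPLTool()

# Adapt a LangChain tool to Qwen's output: the model emits its arguments as a JSON
# object such as {"query": "..."}, so extract the "query" field and pass it to tool.run.
def tool_wrapper_for_qwen(tool):
    def tool_(query):
        query = json.loads(query)["query"]
        return tool.run(query)
    return tool_

# Tool descriptions shown to Qwen: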
TOOLS = [
    {
        'name_for_human':
            'google search',
        'name_for_model':
            'Search',
        'description_for_model':
            'useful for when you need to answer questions about current events.',
        'parameters': [{
            "name": "query",
            "type": "string",
            "description": "search query of google",
            'required': True
        }],
        'tool_api': tool_wrapper_for_qwen(search)
    },
    {
        'name_for_human':
            'Wolfram Alpha',
        'name_for_model':
            'Math',
        'description_for_model':
            'Useful for when you need to answer questions about Math, Science, Technology, Culture, Society and Everyday Life.',
        'parameters': [{
            "name": "query",
            "type": "string",
            "description": "the problem to solved by Wolfram Alpha",
            'required': True
        }],
        'tool_api': tool_wrapper_for_qwen(WolframAlpha)
    },
    {
        'name_for_human':
            'arxiv',
        'name_for_model':
            'Arxiv',
        'description_for_model':
            'A wrapper around Arxiv.org Useful for when you need to answer questions about Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance, Statistics, Electrical Engineering, and Economics from scientific articles on arxiv.org.',
        'parameters': [{
            "name": "query",
            "type": "string",
            "description": "the document id of arxiv to search",
            'required': True
        }],
        'tool_api': tool_wrapper_for_qwen(arxiv)
    },
    {
        'name_for_human':
            'python',
        'name_for_model':
            'python',
        'description_for_model':
            "A Python shell. Use this to execute python commands. When using this tool, sometimes output is abbreviated - Make sure it does not look abbreviated before using it in your answer. "
            "Don't add comments to your python code.",
        'parameters': [{
            "name": "query",
            "type": "string",
            "description": "a valid python command.",
            'required': True
        }],
        'tool_api': tool_wrapper_for_qwen(python)
    }

]
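`tool_wrapper_for_qwen` assumes the model always emits valid JSON such as `{"query": "..."}`; if Qwen writes a bare string instead, `json.loads` raises. A more tolerant variant, sketched below under that assumption, falls back to passing the raw text through. `FakeTool` is a hypothetical stand-in for any object with a `run(str)` method, like the LangChain wrappers above.

```python
import json


def tolerant_tool_wrapper(tool, arg_name: str = "query"):
    """Wrap a tool so it accepts either a JSON object or a bare string from the model."""
    def tool_(action_input: str):
        try:
            parsed = json.loads(action_input)
        except json.JSONDecodeError:
            parsed = action_input.strip()  # not JSON: pass the raw text through
        if isinstance(parsed, dict):
            parsed = parsed.get(arg_name, "")
        return tool.run(str(parsed))
    return tool_


class FakeTool:
    """Hypothetical stand-in for a LangChain wrapper exposing run(str)."""
    def run(self, query: str) -> str:
        return f"searched: {query}"


wrapped = tolerant_tool_wrapper(FakeTool())
print(wrapped('{"query": "Canada population 2023"}'))  # searched: Canada population 2023
print(wrapped("Canada population 2023"))               # searched: Canada population 2023
```

The trade-off is that malformed JSON no longer surfaces as an error, so a wrong argument may silently reach the tool.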


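## Step 1: Let Qwen decide which tool to call and generate the tool's input arguments
Build the prompt from the prompt template, the query, and the tool information.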



In [None]:
TOOL_DESC = """{name_for_model}: Call this tool to interact with the {name_for_human} API. What is the {name_for_human} API useful for? {description_for_model} Parameters: {parameters} Format the arguments as a JSON object."""

REACT_PROMPT = """Answer the following questions as best you can. You have access to the following tools:

{tool_descs}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can be repeated zero or more times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {query}"""

def build_planning_prompt(TOOLS, query):
    tool_descs = []
    tool_names = []
    for info in TOOLS:
        tool_descs.append(
            TOOL_DESC.format(
                name_for_model=info['name_for_model'],
                name_for_human=info['name_for_human'],
                description_for_model=info['description_for_model'],
                parameters=json.dumps(
                    info['parameters'], ensure_ascii=False),
            )
        )
        tool_names.append(info['name_for_model'])
    tool_descs = '\n\n'.join(tool_descs)
    tool_names = ','.join(tool_names)

    prompt = REACT_PROMPT.format(tool_descs=tool_descs, tool_names=tool_names, query=query)
    return prompt

# Only the Search tool is offered here. Query: "What is Canada's population figure for 2023?"
prompt_1 = build_planning_prompt(TOOLS[0:1], query="加拿大2023年人口统计数字是多少？")
print(prompt_1)


Answer the following questions as best you can. You have access to the following tools:

Search: Call this tool to interact with the google search API. What is the google search API useful for? useful for when you need to answer questions about current events. Parameters: [{"name": "query", "type": "string", "description": "search query of google", "required": true}] Format the arguments as a JSON object.

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [Search]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can be repeated zero or more times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: 加拿大2023年人口统计数字是多少？
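`build_planning_prompt` serializes each tool's `parameters` with `json.dumps(..., ensure_ascii=False)`. The flag matters once descriptions contain Chinese (as in the second half of this notebook): by default, non-ASCII characters are escaped as `\uXXXX`, which is hard to read and likely costs more tokens. A quick check:

```python
import json

params = [{"name": "search_query", "description": "搜索关键词或短语", "required": True}]

escaped = json.dumps(params)                       # default ensure_ascii=True: \uXXXX escapes
readable = json.dumps(params, ensure_ascii=False)  # keeps the Chinese characters as-is

print(escaped)
print(readable)
```

Both strings decode to the same Python object; only the text the model sees differs.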


In [None]:
!pip install auto-gptq==0.6.0+cu118 --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
!pip install -U optimum==1.16.1


Looking in indexes: https://pypi.org/simple, https://huggingface.github.io/autogptq-index/whl/cu118/
Collecting auto-gptq
  Downloading https://huggingface.github.io/autogptq-index/whl/cu118/auto-gptq/auto_gptq-0.6.0%2Bcu118-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.0 MB)
Collecting datasets (from auto-gptq)
  Downloading datasets-2.15.0-py3-none-any.whl (521 kB)
Collecting sentencepiece (from auto-gptq)
  Downloading sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Collecting rouge (from auto-gptq)
  Downloading rouge-1.0.1-py3-none-any.whl (13 kB)
Collecting gekko 

In [None]:
!pip install -U transformers==4.36.1

Collecting transformers==4.36.1
  Downloading transformers-4.36.1-py3-none-any.whl (8.3 MB)
Collecting tokenizers<0.19,>=0.14 (from transformers==4.36.1)
  Downloading tokenizers-0.15.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.8 MB)
Installing collected packages: tokenizers, transformers
  Attempting uninstall: tokenizers
    Found existing installation: tokenizers 0.13.3
    Uninstalling tokenizers-0.13.3:
      Successfully uninstalled tokenizers-0.13.3
  Attempting uninstall: transformers
    Found existing installation: transformers 4.32.0
    Uninstalling transformers-4.32.0:
      Successfully uninstalled transformers-4.32.0
Successfully installed tokenizers-0.15.0 transformers-4.36.1


In [None]:
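import locale
# Colab workaround: after some installs Python may report a non-UTF-8 preferred
# encoding, which breaks later `!` shell commands. Force UTF-8.
def getpreferredencoding(do_setlocale=True):
    return "UTF-8"
locale.getpreferredencoding = getpreferredencoding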


In [None]:
# FlashAttention 2 only supports Ampere (e.g. A100) or newer GPUs.
!git clone https://github.com/Dao-AILab/flash-attention
!cd flash-attention && pip install .


Cloning into 'flash-attention'...
remote: Enumerating objects: 4411, done.
remote: Counting objects: 100% (2027/2027), done.
remote: Compressing objects: 100% (269/269), done.
remote: Total 4411 (delta 1835), reused 1771 (delta 1757), pack-reused 2384
Receiving objects: 100% (4411/4411), 6.91 MiB | 13.61 MiB/s, done.
Resolving deltas: 100% (3086/3086), done.
Processing /content/flash-attention
  Preparing metadata (setup.py) ... done
Collecting ninja (from flash-attn==2.3.6)
  Downloading ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl (307 kB)
Building wheels for collected packages: flash-attn
  Building wheel for flash-attn (setup.py) ... done
  Created wheel for flash-attn: filename=flash_attn-2.3.6-cp310-cp310-linux_x86_64.whl size=56477261 sha256=652ad256d0891cb2c6d7183f96f7f56ff61cdeee24388381abb35e7a0

In [None]:
# The T4 is a Turing GPU, which FlashAttention 2 does not support, so uninstall it.
!pip uninstall -y flash-attn

Found existing installation: flash-attn 2.3.6
Uninstalling flash-attn-2.3.6:
  Successfully uninstalled flash-attn-2.3.6


In [None]:
# https://huggingface.co/Qwen
# Qwen chat models that can run on a single 16 GB T4 GPU:
#checkpoint = "Qwen/Qwen-1_8B-Chat"
#checkpoint = "Qwen/Qwen-7B-Chat-Int4"
#checkpoint = "Qwen/Qwen-7B-Chat-Int8"
#checkpoint = "Qwen/Qwen-7B-Chat" # may run out of memory (OOM) during chat/generate
checkpoint = "Qwen/Qwen-14B-Chat-Int4"

TOKENIZER = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
MODEL = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", trust_remote_code=True).eval()
MODEL.generation_config = GenerationConfig.from_pretrained(checkpoint, trust_remote_code=True)
MODEL.generation_config.do_sample = True  # enables sampling; set to False for greedy decoding





In [None]:
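# Register the NVIDIA driver libraries in /usr/lib64-nvidia (e.g. libcuda.so) with the dynamic linker
!ldconfig /usr/lib64-nvidia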

/sbin/ldconfig.real: /usr/local/lib/libtbbbind_2_5.so.3 is not a symbolic link

/sbin/ldconfig.real: /usr/local/lib/libtbb.so.12 is not a symbolic link

/sbin/ldconfig.real: /usr/local/lib/libtbbmalloc_proxy.so.2 is not a symbolic link

/sbin/ldconfig.real: /usr/local/lib/libtbbbind.so.3 is not a symbolic link

/sbin/ldconfig.real: /usr/local/lib/libtbbbind_2_0.so.3 is not a symbolic link

/sbin/ldconfig.real: /usr/local/lib/libtbbmalloc.so.2 is not a symbolic link



In [None]:
!ldconfig -p | grep libcuda

	libcudart.so.12 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.12
	libcudart.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so
	libcudadebugger.so.1 (libc6,x86-64) => /usr/lib64-nvidia/libcudadebugger.so.1
	libcuda.so.1 (libc6,x86-64) => /usr/lib64-nvidia/libcuda.so.1
	libcuda.so (libc6,x86-64) => /usr/lib64-nvidia/libcuda.so


In [None]:
stop = ["Observation:", "Observation:\n"]
react_stop_words_tokens = [TOKENIZER.encode(stop_) for stop_ in stop]
response_1, _ = MODEL.chat(TOKENIZER, prompt_1, history=None, stop_words_ids=react_stop_words_tokens)
print(response_1)


Thought: 我需要使用搜索引擎查找加拿大2023年的人口统计数据。
Action: Search
Action Input: {"query": "加拿大2023年人口统计"}
Observation:
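`stop_words_ids` is a list of token-ID sequences (what `TOKENIZER.encode("Observation:")` returns). As we understand Qwen's remote `chat` code, generation stops once the generated tokens end with one of these sequences, which is why the output above ends right at `Observation:` instead of the model inventing a tool result. The pure-Python sketch below shows that check on plain integer lists; the token IDs are made up for illustration and are not Qwen's real vocabulary.

```python
from typing import List, Sequence


def ends_with_stop(generated: Sequence[int], stop_words_ids: List[List[int]]) -> bool:
    """Return True if the generated token IDs end with any stop sequence."""
    for stop_ids in stop_words_ids:
        n = len(stop_ids)
        if n and list(generated[-n:]) == list(stop_ids):
            return True
    return False


# Hypothetical token IDs: pretend "Observation:" encodes to [101, 102].
stop_words_ids = [[101, 102], [101, 102, 13]]
tokens = []
for tok in [7, 8, 9, 101, 102, 55, 66]:  # a fake stream of generated tokens
    tokens.append(tok)
    if ends_with_stop(tokens, stop_words_ids):
        break

print(tokens)  # [7, 8, 9, 101, 102]
```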


## Step 2: Parse the tool name and input arguments from Qwen's output, then call that tool


In [None]:
def parse_latest_plugin_call(text: str) -> Tuple[str, str]:
    i = text.rfind('\nAction:')
    j = text.rfind('\nAction Input:')
    k = text.rfind('\nObservation:')
    if 0 <= i < j:  # If the text has `Action` and `Action input`,
        if k < j:  # but does not contain `Observation`,
            # then it is likely that `Observation` is omitted by the LLM,
            # because the output text may have discarded the stop word.
            text = text.rstrip() + '\nObservation:'  # Add it back.
            k = text.rfind('\nObservation:')
    if 0 <= i < j < k:
        plugin_name = text[i + len('\nAction:'):j].strip()
        plugin_args = text[j + len('\nAction Input:'):k].strip()
        return plugin_name, plugin_args
    return '', ''

def use_api(tools, response):
    use_toolname, action_input = parse_latest_plugin_call(response)
    if use_toolname == "":
        return "no tool founds"

    used_tool_meta = list(filter(lambda x: x["name_for_model"] == use_toolname, tools))
    if len(used_tool_meta) == 0:
        return "no tool founds"
    print(used_tool_meta)
    api_output = used_tool_meta[0]["tool_api"](action_input)
    return api_output

api_output = use_api(TOOLS, response_1)
print(api_output)


[{'name_for_human': 'google search', 'name_for_model': 'Search', 'description_for_model': 'useful for when you need to answer questions about current events.', 'parameters': [{'name': 'query', 'type': 'string', 'description': 'search query of google', 'required': True}], 'tool_api': <function tool_wrapper_for_qwen.<locals>.tool_ at 0x7b027cb9c040>}]
根据加拿大统计局预测，加拿大人口今天（2023年6月16日）预计将超过4000万。 联邦统计局使用模型来实时估计加拿大的人口，该计数模型预计加拿大人口将在北美东部时间今天下午3点前达到4000万。 加拿大的人口增长率目前为2.7％。
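The parser above locates the markers with `rfind`. An alternative sketch using a regular expression (our own variant, not Qwen's reference code) extracts the last `Action` / `Action Input` pair and likewise tolerates a missing trailing `Observation:`:

```python
import re
from typing import Tuple

# "Action: <name>" then "Action Input: <args>", where <args> runs until the next
# "\nObservation:" or the end of the text (the stop word may have been dropped).
ACTION_RE = re.compile(
    r"Action:\s*(?P<name>[^\n]+?)\s*\nAction Input:\s*(?P<args>.*?)\s*(?:\nObservation:|$)",
    re.DOTALL,
)


def parse_last_action(text: str) -> Tuple[str, str]:
    """Return (tool_name, raw_args) for the last Action in text, or ('', '') if none."""
    matches = list(ACTION_RE.finditer(text))
    if not matches:
        return "", ""
    last = matches[-1]
    return last.group("name"), last.group("args")


sample = (
    "Thought: 我需要使用搜索引擎查找加拿大2023年的人口统计数据。\n"
    "Action: Search\n"
    'Action Input: {"query": "加拿大2023年人口统计"}\n'
    "Observation:"
)
print(parse_last_action(sample))
```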


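## Step 3: Let Qwen continue answering based on the tool's result
Concatenate the previous prompt, the model's response, and the tool output into a new prompt, then generate the final result.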



In [None]:
prompt_2 = prompt_1 + response_1 + ' ' + api_output
stop = ["Observation:", "Observation:\n"]
react_stop_words_tokens = [TOKENIZER.encode(stop_) for stop_ in stop]
response_2, _ = MODEL.chat(TOKENIZER, prompt_2, history=None, stop_words_ids=react_stop_words_tokens)
print(prompt_2, response_2)


Answer the following questions as best you can. You have access to the following tools:

Search: Call this tool to interact with the google search API. What is the google search API useful for? useful for when you need to answer questions about current events. Parameters: [{"name": "query", "type": "string", "description": "search query of google", "required": true}] Format the arguments as a JSON object.

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [Search]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can be repeated zero or more times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: 加拿大2023年人口统计数字是多少？Thought: 我需要使用搜索引擎查找加拿大2023年的人口统计数据。
Action: Search
Action Input: {"query": "加拿大2023年人口统计"}
Observation: 根据加拿大统计局预测，
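In the printed prompt above, `Thought:` ends up on the same line as the question because `prompt_1 + response_1` inserts no separator. A small helper, a hypothetical addition rather than part of the original notebook, keeps each ReAct field on its own line when appending the model's step and the tool observation (whether this changes Qwen's answers has not been measured here):

```python
def append_step(prompt: str, response: str, observation: str) -> str:
    """Append one ReAct step (model output ending in 'Observation:') plus the tool result."""
    response = response.strip()
    if not response.endswith("Observation:"):
        # The stop word may have been dropped from the decoded text; restore it.
        response += "\nObservation:"
    separator = "" if prompt.endswith("\n") else "\n"
    return f"{prompt}{separator}{response} {observation.strip()}\n"


print(append_step("Question: Q", "Thought: t\nAction: Search\nAction Input: {}\nObservation:", "result"))
```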

## Summary - Chaining the whole pipeline together


In [None]:
def main(query, choose_tools):
    prompt = build_planning_prompt(choose_tools, query) # build the prompt
    print(prompt)
    stop = ["Observation:", "Observation:\n"]
    react_stop_words_tokens = [TOKENIZER.encode(stop_) for stop_ in stop]
    response, _ = MODEL.chat(TOKENIZER, prompt, history=None, stop_words_ids=react_stop_words_tokens)

    while "Final Answer:" not in response: # stop once "Final Answer:" appears
        api_output = use_api(choose_tools, response) # extract the arguments and call the API
        api_output = str(api_output) # some tools return non-string results; convert before use
        if "no tool founds" == api_output:
            break
        # green: model output, blue: tool output
        print("\033[32m" + response + "\033[0m" + "\033[34m" + ' ' + api_output + "\033[0m")
        prompt = prompt + response + ' ' + api_output # append the API output
        response, _ = MODEL.chat(TOKENIZER, prompt, history=None, stop_words_ids=react_stop_words_tokens) # continue generating

    print("\033[32m" + response + "\033[0m")
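`main` only stops when `Final Answer:` appears or no tool can be parsed, so a model that keeps issuing actions would loop indefinitely. A self-contained sketch of the same loop with a step limit follows. `scripted_llm` and `toy_tools` are hypothetical stand-ins (a fake model returning canned replies and a toy tool) so the control flow runs without a GPU; with the notebook's objects you would pass something like `lambda p: MODEL.chat(TOKENIZER, p, history=None, stop_words_ids=react_stop_words_tokens)[0]` and a dict built from the `tool_api` entries of `TOOLS`.

```python
import re
from typing import Callable, Dict, Iterable

ACTION_RE = re.compile(
    r"Action:\s*([^\n]+?)\s*\nAction Input:\s*(.*?)\s*(?:\nObservation:|$)", re.DOTALL
)


def react_loop(prompt: str, llm: Callable[[str], str],
               tools: Dict[str, Callable[[str], str]], max_steps: int = 5) -> str:
    """Alternate model steps and tool calls until a final answer or max_steps is reached."""
    for _ in range(max_steps):
        response = llm(prompt)
        if "Final Answer:" in response:
            return response.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(response)
        if match is None or match.group(1) not in tools:
            return "no tool found"
        observation = str(tools[match.group(1)](match.group(2)))
        step = response.rstrip()
        if not step.endswith("Observation:"):
            step += "\nObservation:"  # restore the stop word if it was dropped
        prompt = f"{prompt}\n{step} {observation}"
    return "step limit reached"


def scripted_llm(replies: Iterable[str]) -> Callable[[str], str]:
    """Hypothetical fake model that returns canned replies in order, ignoring the prompt."""
    it = iter(replies)
    return lambda prompt: next(it)


SCRIPT = [
    'Thought: I should sort the list with Python.\nAction: python\n'
    'Action Input: {"query": "sorted([2, 4135, 523, 2, 3])"}\nObservation:',
    "Thought: I now know the final answer\nFinal Answer: [2, 2, 3, 523, 4135]",
]
toy_tools = {"python": lambda args: "[2, 2, 3, 523, 4135]"}  # toy tool with a fixed result

print(react_loop("Question: sort [2, 4135, 523, 2, 3]", scripted_llm(SCRIPT), toy_tools))
# [2, 2, 3, 523, 4135]
```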


In [None]:
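# Keep the number of candidate tools as small as possible
query = "加拿大2023年人口统计数字是多少？" # the question: "What is Canada's population figure for 2023?"
choose_tools = TOOLS # candidate tools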
print("=" * 10)
main(query, choose_tools)



Answer the following questions as best you can. You have access to the following tools:

Search: Call this tool to interact with the google search API. What is the google search API useful for? useful for when you need to answer questions about current events. Parameters: [{"name": "query", "type": "string", "description": "search query of google", "required": true}] Format the arguments as a JSON object.

Math: Call this tool to interact with the Wolfram Alpha API. What is the Wolfram Alpha API useful for? Useful for when you need to answer questions about Math, Science, Technology, Culture, Society and Everyday Life. Parameters: [{"name": "query", "type": "string", "description": "the problem to solved by Wolfram Alpha", "required": true}] Format the arguments as a JSON object.

Arxiv: Call this tool to interact with the arxiv API. What is the arxiv API useful for? A wrapper around Arxiv.org Useful for when you need to answer questions about Physics, Mathematics, Computer Science, Qu

In [None]:
query = "求解方程 2x+5 = -3x + 7" # the question: "Solve the equation 2x+5 = -3x + 7"
choose_tools = TOOLS # candidate tools
print("=" * 10)
main(query, choose_tools)



Answer the following questions as best you can. You have access to the following tools:

Search: Call this tool to interact with the google search API. What is the google search API useful for? useful for when you need to answer questions about current events. Parameters: [{"name": "query", "type": "string", "description": "search query of google", "required": true}] Format the arguments as a JSON object.

Math: Call this tool to interact with the Wolfram Alpha API. What is the Wolfram Alpha API useful for? Useful for when you need to answer questions about Math, Science, Technology, Culture, Society and Everyday Life. Parameters: [{"name": "query", "type": "string", "description": "the problem to solved by Wolfram Alpha", "required": true}] Format the arguments as a JSON object.

Arxiv: Call this tool to interact with the arxiv API. What is the arxiv API useful for? A wrapper around Arxiv.org Useful for when you need to answer questions about Physics, Mathematics, Computer Science, Qu

In [None]:
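#query = "编号是1605.08386的论文讲了些什么？" # "What is the paper with ID 1605.08386 about?"
query = "编号是2210.17323的论文讲了些什么？" # the question: "What is the paper with ID 2210.17323 about?"

choose_tools = TOOLS # candidate tools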
print("=" * 10)
main(query, choose_tools)



Answer the following questions as best you can. You have access to the following tools:

Search: Call this tool to interact with the google search API. What is the google search API useful for? useful for when you need to answer questions about current events. Parameters: [{"name": "query", "type": "string", "description": "search query of google", "required": true}] Format the arguments as a JSON object.

Math: Call this tool to interact with the Wolfram Alpha API. What is the Wolfram Alpha API useful for? Useful for when you need to answer questions about Math, Science, Technology, Culture, Society and Everyday Life. Parameters: [{"name": "query", "type": "string", "description": "the problem to solved by Wolfram Alpha", "required": true}] Format the arguments as a JSON object.

Arxiv: Call this tool to interact with the arxiv API. What is the arxiv API useful for? A wrapper around Arxiv.org Useful for when you need to answer questions about Physics, Mathematics, Computer Science, Qu

In [None]:
query = "使用python对下面的列表进行排序： [2, 4135, 523, 2, 3]" # "Use Python to sort the following list: [2, 4135, 523, 2, 3]"
choose_tools = TOOLS # candidate tools
print("=" * 10)
main(query, choose_tools)


Answer the following questions as best you can. You have access to the following tools:

Search: Call this tool to interact with the google search API. What is the google search API useful for? useful for when you need to answer questions about current events. Parameters: [{"name": "query", "type": "string", "description": "search query of google", "required": true}] Format the arguments as a JSON object.

Math: Call this tool to interact with the Wolfram Alpha API. What is the Wolfram Alpha API useful for? Useful for when you need to answer questions about Math, Science, Technology, Culture, Society and Everyday Life. Parameters: [{"name": "query", "type": "string", "description": "the problem to solved by Wolfram Alpha", "required": true}] Format the arguments as a JSON object.

Arxiv: Call this tool to interact with the arxiv API. What is the arxiv API useful for? A wrapper around Arxiv.org Useful for when you need to answer questions about Physics, Mathematics, Computer Science, Qu

# ReAct Prompting Example
This version refines the prompt instruction template.

In [None]:
!pip install json5 torch transformers



In [7]:
#ImportError: Loading GPTQ quantized model requires optimum library : `pip install optimum` and auto-gptq library 'pip install auto-gptq'
!pip install -U auto-gptq
!pip install -U optimum




In [9]:
!pip install -U transformers==4.36.1

Collecting transformers==4.36.1
  Downloading transformers-4.36.1-py3-none-any.whl (8.3 MB)
Collecting tokenizers<0.19,>=0.14 (from transformers==4.36.1)
  Downloading tokenizers-0.15.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.8 MB)
Installing collected packages: tokenizers, transformers
  Attempting uninstall: tokenizers
    Found existing installation: tokenizers 0.13.3
    Uninstalling tokenizers-0.13.3:
      Successfully uninstalled tokenizers-0.13.3
  Attempting uninstall: transformers
    Found existing installation: transformers 4.32.0
    Uninstalling transformers-4.32.0:
      Successfully uninstalled transformers-4.32.0
Successfully installed tokenizers-0.15.0 transformers-4.36.1


In [1]:
import json
import os

import json5
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig


In [2]:
# https://huggingface.co/Qwen
# You can restart the session to switch to a different LLM.
# Qwen chat models that can run on a single 16 GB T4 GPU:
#checkpoint = "Qwen/Qwen-1_8B-Chat"
#checkpoint = "Qwen/Qwen-7B-Chat-Int4"
#checkpoint = "Qwen/Qwen-7B-Chat-Int8"
checkpoint = "Qwen/Qwen-14B-Chat-Int4" # faster
#checkpoint = "Qwen/Qwen-7B-Chat" # may OOM during chat/generate; call gc.collect() and torch.cuda.empty_cache() to free memory


tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
generation_config = GenerationConfig.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", trust_remote_code=True).eval()
model.generation_config = generation_config
model.generation_config.top_k = 1  # keep only the most likely token, i.e. effectively greedy decoding


In [3]:
# -l 2 re-queries every 2 seconds until the cell is interrupted; drop it for a single snapshot
!nvidia-smi --query-gpu=timestamp,memory.total,memory.free,memory.used,name,utilization.gpu,utilization.memory --format=csv -l 2


timestamp, memory.total [MiB], memory.free [MiB], memory.used [MiB], name, utilization.gpu [%], utilization.memory [%]
2023/12/18 15:56:43.448, 15360 MiB, 5487 MiB, 9615 MiB, Tesla T4, 0 %, 0 %
2023/12/18 15:56:45.450, 15360 MiB, 5487 MiB, 9615 MiB, Tesla T4, 0 %, 0 %
2023/12/18 15:56:47.450, 15360 MiB, 5487 MiB, 9615 MiB, Tesla T4, 0 %, 0 %
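`--format=csv` makes the `nvidia-smi` output easy to post-process. A small sketch that computes memory usage with the standard `csv` module, using rows printed above as sample input; in the notebook you could capture the text with `subprocess.run([...], capture_output=True, text=True).stdout`, leaving out `-l` so the command returns.

```python
import csv
import io

# Sample rows copied from the nvidia-smi output above.
sample = (
    "timestamp, memory.total [MiB], memory.free [MiB], memory.used [MiB], name, "
    "utilization.gpu [%], utilization.memory [%]\n"
    "2023/12/18 15:56:43.448, 15360 MiB, 5487 MiB, 9615 MiB, Tesla T4, 0 %, 0 %\n"
    "2023/12/18 15:56:45.450, 15360 MiB, 5487 MiB, 9615 MiB, Tesla T4, 0 %, 0 %\n"
)


def mib(field: str) -> int:
    """Convert a field such as '15360 MiB' into an integer number of MiB."""
    return int(field.split()[0])


# skipinitialspace drops the blank after each comma in both header and values.
for row in csv.DictReader(io.StringIO(sample), skipinitialspace=True):
    used, total = mib(row["memory.used [MiB]"]), mib(row["memory.total [MiB]"])
    print(f"{row['name']}: {used}/{total} MiB used ({used / total:.1%})")
```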


In [4]:
# Template that joins a plugin's key information into a single piece of text.
TOOL_DESC = """{name_for_model}: Call this tool to interact with the {name_for_human} API. What is the {name_for_human} API useful for? {description_for_model} Parameters: {parameters}"""

# Instruction template for ReAct prompting; it will contain the plugins' detailed information.
PROMPT_REACT = """Answer the following questions as best you can. You have access to the following APIs:

{tools_text}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tools_name_text}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can be repeated zero or more times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {query}"""


In [5]:
# Aggregate the chat history and the plugin information into one initial prompt text
def build_input_text(chat_history, list_of_plugin_info) -> str:
    # Detailed information about the candidate plugins
    tools_text = []
    for plugin_info in list_of_plugin_info:
        tool = TOOL_DESC.format(
            name_for_model=plugin_info["name_for_model"],
            name_for_human=plugin_info["name_for_human"],
            description_for_model=plugin_info["description_for_model"],
            parameters=json.dumps(plugin_info["parameters"], ensure_ascii=False),
        )
        if plugin_info.get('args_format', 'json') == 'json':
            tool += " Format the arguments as a JSON object."
        elif plugin_info['args_format'] == 'code':
            tool += ' Enclose the code within triple backticks (`) at the beginning and end of the code.'
        else:
            raise NotImplementedError
        tools_text.append(tool)
    tools_text = '\n\n'.join(tools_text)

    # Code names of the candidate plugins
    tools_name_text = ', '.join([plugin_info["name_for_model"] for plugin_info in list_of_plugin_info])

    im_start = '<|im_start|>'
    im_end = '<|im_end|>'
    prompt = f'{im_start}system\nYou are a helpful assistant.{im_end}'
    for i, (query, response) in enumerate(chat_history):
        if list_of_plugin_info:  # if there are candidate plugins
            # Put the detailed plugin information into the last or second-to-last round; exactly where is up to you
            if (len(chat_history) == 1) or (i == len(chat_history) - 2):
                query = PROMPT_REACT.format(
                    tools_text=tools_text,
                    tools_name_text=tools_name_text,
                    query=query,
                )
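        query = query.lstrip('\n').rstrip()  # Important! Without stripping, the text differs from how the training data was constructed.
        response = response.lstrip('\n').rstrip()  # Important! Without stripping, the text differs from how the training data was constructed.
        # In text-completion mode, use the following format to separate the user and the AI: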
        prompt += f"\n{im_start}user\n{query}{im_end}"
        prompt += f"\n{im_start}assistant\n{response}{im_end}"

    assert prompt.endswith(f"\n{im_start}assistant\n{im_end}")
    prompt = prompt[: -len(f'{im_end}')]
    return prompt
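`build_input_text` writes Qwen's ChatML markers (`<|im_start|>`, `<|im_end|>`) by hand and finally drops the last `<|im_end|>` so the model continues the open assistant turn. Stripped of the plugin logic, the format reduces to the sketch below; `to_chatml` is a hypothetical helper for illustration only.

```python
from typing import List, Tuple

IM_START, IM_END = "<|im_start|>", "<|im_end|>"


def to_chatml(history: List[Tuple[str, str]],
              system: str = "You are a helpful assistant.") -> str:
    """Render (user, assistant) turns as ChatML, leaving the final assistant turn open."""
    prompt = f"{IM_START}system\n{system}{IM_END}"
    for query, response in history:
        query = query.lstrip("\n").rstrip()
        response = response.lstrip("\n").rstrip()
        prompt += f"\n{IM_START}user\n{query}{IM_END}"
        prompt += f"\n{IM_START}assistant\n{response}{IM_END}"
    if not prompt.endswith(f"\n{IM_START}assistant\n{IM_END}"):
        raise ValueError("the last assistant response must be empty")
    # Drop the closing tag so the model continues the assistant turn.
    return prompt[: -len(IM_END)]


print(to_chatml([("Who is Dong Yuhui?", "")]))
```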


In [6]:
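# The tool descriptions below are written in Chinese for the model. Roughly: google_search
# ("Google Search") is a general-purpose search engine for browsing the web, looking up
# encyclopedic knowledge, and following current news; image_gen ("text-to-image") is an AI
# painting service that takes a text description (English keywords) and returns the URL of
# the generated image.
tools = [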
    {
        'name_for_human': '谷歌搜索',
        'name_for_model': 'google_search',
        'description_for_model': '谷歌搜索是一个通用搜索引擎，可用于访问互联网、查询百科知识、了解时事新闻等。',
        'parameters': [
            {
                'name': 'search_query',
                'description': '搜索关键词或短语',
                'required': True,
                'schema': {'type': 'string'},
            }
        ],
    },
    {
        'name_for_human': '文生图',
        'name_for_model': 'image_gen',
        'description_for_model': '文生图是一个AI绘画（图像生成）服务，输入文本描述，返回根据文本作画得到的图片的URL',
        'parameters': [
            {
                'name': 'prompt',
                'description': '英文关键词，描述了希望图像具有什么内容',
                'required': True,
                'schema': {'type': 'string'},
            }
        ],
    },
]


In [7]:
prompt = build_input_text([("董宇辉是谁？", "")], tools)  # "Who is Dong Yuhui?"
print(prompt)

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Answer the following questions as best you can. You have access to the following APIs:

google_search: Call this tool to interact with the 谷歌搜索 API. What is the 谷歌搜索 API useful for? 谷歌搜索是一个通用搜索引擎，可用于访问互联网、查询百科知识、了解时事新闻等。 Parameters: [{"name": "search_query", "description": "搜索关键词或短语", "required": true, "schema": {"type": "string"}}] Format the arguments as a JSON object.

image_gen: Call this tool to interact with the 文生图 API. What is the 文生图 API useful for? 文生图是一个AI绘画（图像生成）服务，输入文本描述，返回根据文本作画得到的图片的URL Parameters: [{"name": "prompt", "description": "英文关键词，描述了希望图像具有什么内容", "required": true, "schema": {"type": "string"}}] Format the arguments as a JSON object.

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [google_search, image_gen]
Action Input: the input to the action
Observation: the result of 

In [8]:
# Use the model as a plain text-completion model that continues the prompt template
def text_completion(input_text: str, stop_words) -> str:
    im_end = '<|im_end|>'
    if im_end not in stop_words:
        stop_words = stop_words + [im_end]
    stop_words_ids = [tokenizer.encode(w) for w in stop_words]

    # TODO: add an example implementation of streaming output
    input_ids = torch.tensor([tokenizer.encode(input_text)]).to(model.device)
    output = model.generate(input_ids, stop_words_ids=stop_words_ids)
    output = output.tolist()[0]
    output = tokenizer.decode(output, errors="ignore")
    assert output.startswith(input_text)
    output = output[len(input_text) :].replace('<|endoftext|>', '').replace(im_end, '')

    for stop_str in stop_words:
        idx = output.find(stop_str)
        if idx != -1:
            output = output[: idx + len(stop_str)]
    return output  # the continuation of input_text, without input_text itself
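You can exercise the string handling at the end of `text_completion` without loading the model. The sketch below isolates it in a hypothetical `postprocess_completion` function that takes an already-decoded sequence. Like the `assert` in the original, it assumes the decoded output begins with the prompt.

```python
from typing import List

IM_END = "<|im_end|>"
ENDOFTEXT = "<|endoftext|>"


def postprocess_completion(decoded: str, prompt: str, stop_words: List[str]) -> str:
    """Model-free mirror of the post-processing in text_completion.

    decoded: the full decoded sequence (prompt + continuation).
    """
    assert decoded.startswith(prompt), "generate() output should echo the prompt"
    # Keep only the continuation and drop the special end tokens.
    output = decoded[len(prompt):].replace(ENDOFTEXT, "").replace(IM_END, "")
    # For each stop word in turn, cut right after its first occurrence,
    # keeping the stop word itself (e.g. the trailing "Observation:").
    for stop in stop_words:
        idx = output.find(stop)
        if idx != -1:
            output = output[: idx + len(stop)]
    return output
```

Keeping the `Observation:` stop word in the output matters: `parse_latest_plugin_call` relies on it to locate the end of `Action Input`.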


In [9]:
output = text_completion(prompt, stop_words=['Observation:', 'Observation:\n'])
print(output)

Thought: 我需要查找相关信息。
Action: google_search
Action Input: {"search_query": "董宇辉"}
Observation:


In [10]:
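# Parse the latest Action in the model output: return the plugin to call, its arguments, and the output text (cut before Observation) to carry into the next step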
def parse_latest_plugin_call(text):
    plugin_name, plugin_args = '', ''
    i = text.rfind('\nAction:')
    j = text.rfind('\nAction Input:')
    k = text.rfind('\nObservation:')
    if 0 <= i < j:  # If the text has `Action` and `Action input`,
        if k < j:  # but does not contain `Observation`,
            # then it is likely that `Observation` was omitted by the LLM,
            # because the output text may have discarded the stop word.
            text = text.rstrip() + '\nObservation:'  # Add it back.
        k = text.rfind('\nObservation:')
        plugin_name = text[i + len('\nAction:') : j].strip()
        plugin_args = text[j + len('\nAction Input:') : k].strip()
        text = text[:k]
    return plugin_name, plugin_args, text
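`parse_latest_plugin_call` locates the tool call using `rfind` offsets. An alternative is a regular expression that extracts the same Action / Action Input pair and decodes the arguments at parse time, so malformed JSON is caught early. `parse_action_regex` is a hypothetical helper. It uses the stdlib `json`, which is stricter than the `json5` parser used in `call_plugin`.

```python
import json
import re
from typing import Optional, Tuple

# "\nAction: <name>\nAction Input: <args>", where <args> runs until "\nObservation:"
# or the end of the text (the stop word may already have been stripped).
_ACTION_RE = re.compile(
    r"\nAction:\s*(?P<name>.+?)\s*\nAction Input:\s*(?P<args>.*?)\s*(?=\nObservation:|\Z)",
    re.S,
)


def parse_action_regex(text: str) -> Tuple[str, Optional[dict]]:
    """Return (tool name, decoded args) for the last tool call, or ("", None) if there is none."""
    # The leading "\n" lets an "Action:" on the very first line match too.
    matches = list(_ACTION_RE.finditer("\n" + text))
    if not matches:
        return "", None
    last = matches[-1]  # the most recent tool call in a multi-step transcript
    try:
        args = json.loads(last.group("args"))
    except json.JSONDecodeError:
        args = None  # malformed arguments: the caller decides how to recover
    return last.group("name"), args
```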


In [11]:
action, action_input, output = parse_latest_plugin_call(output)
print("action ===> ",action)
print("action_input ===> ", action_input)
print("output ===> ",output)

action ===>  google_search
action_input ===>  {"search_query": "董宇辉"}
output ===>  Thought: 我需要查找相关信息。
Action: google_search
Action Input: {"search_query": "董宇辉"}


In [12]:
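import json
import os

import json5
from google.colab import userdata

# To use Google Search (via SerpAPI) or WolframAlpha, apply for your own API keys and fill them in here.
# This cell only sets the SerpAPI key, read from Colab's secrets store.
os.environ['SERPAPI_API_KEY'] = userdata.get('SERPAPI_API_KEY')

#
# Input:
#   plugin_name: code name of the plugin to call; matches name_for_model.
#   plugin_args: the plugin's input arguments as a JSON string (the model's Action Input),
#                which decodes to a dict mapping parameter names to parameter values.
# Output:
#   The plugin's result, which must be a string.
#   Even if the tool natively returns JSON, serialize it with json.dumps(..., ensure_ascii=False).
#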
def call_plugin(plugin_name: str, plugin_args: str) -> str:
    #
    # Developers should complete this part themselves. This reference implementation is for demo purposes only, not for production.
    #
    if plugin_name == 'google_search':
        from langchain import SerpAPIWrapper

        return SerpAPIWrapper().run(json5.loads(plugin_args)['search_query'])
    elif plugin_name == 'image_gen':
        import urllib.parse

        prompt = json5.loads(plugin_args)["prompt"]
        prompt = urllib.parse.quote(prompt)
        return json.dumps({'image_url': f'https://image.pollinations.ai/prompt/{prompt}'}, ensure_ascii=False)
    else:
        raise NotImplementedError
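`call_plugin` raises `NotImplementedError` for unknown tools and lets parsing or tool errors propagate, which aborts the whole agent loop. One possible hardening dispatches through a dict and returns errors as a JSON string, so they reach the model as an `Observation` it can react to. The sketch below uses hypothetical names `PLUGIN_REGISTRY` and `call_plugin_safe`. It includes only the `image_gen` handler because that one needs no API key. The handler builds the same pollinations URL as above, and stdlib `json` replaces `json5`.

```python
import json
import urllib.parse
from typing import Callable, Dict


def _image_gen(args: dict) -> str:
    # Same URL scheme as call_plugin above: the prompt is URL-encoded into the path.
    prompt = urllib.parse.quote(args["prompt"])
    return json.dumps({"image_url": f"https://image.pollinations.ai/prompt/{prompt}"}, ensure_ascii=False)


# Hypothetical registry; a google_search handler would be registered the same way.
PLUGIN_REGISTRY: Dict[str, Callable[[dict], str]] = {"image_gen": _image_gen}


def call_plugin_safe(plugin_name: str, plugin_args: str) -> str:
    """Dispatch a tool call; failures are returned as a JSON error string instead of raised."""
    handler = PLUGIN_REGISTRY.get(plugin_name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {plugin_name}"})
    try:
        args = json.loads(plugin_args)
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"invalid JSON arguments: {exc}"})
    try:
        return handler(args)
    except Exception as exc:  # e.g. a missing parameter raises KeyError
        return json.dumps({"error": f"{type(exc).__name__}: {exc}"})
```

Returning the error as text keeps the conversation going. The trade-off is that bugs in a handler no longer stop the notebook, so they should still be logged.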


In [13]:
# action and action_input are the code name of the plugin to call and its input arguments
# observation is the result returned by the plugin, as a string
observation = call_plugin(action, action_input)
print(observation)


[{'title': '董宇辉成为东方甄选高级合伙人俞敏洪：老板必须心甘情愿为员工打工', 'link': 'https://wap.eastmoney.com/a/202312182935672729.html', 'source': '东方财富', 'date': '11 hours ago', 'thumbnail': 'https://serpapi.com/searches/65806c077690dc81dc285f6b/images/eba6f41efcfa4ee136bae667b0495e84f4b4d8184be1f5e1.jpeg'}, {'title': '东方甄选开盘涨超14%，董宇辉成“高级合伙人”', 'link': 'https://www.yicai.com/news/101929273.html', 'source': '第一财经', 'date': '8 hours ago', 'thumbnail': 'https://serpapi.com/searches/65806c077690dc81dc285f6b/images/eba6f41efcfa4ee1123a648c27cb6a331909ded944d5b733.png'}, {'title': '成为合伙人后更进一步，董宇辉成新东方文旅副总裁！', 'link': 'http://www.stcn.com/article/detail/1067624.html', 'source': '证券时报', 'date': '7 hours ago', 'thumbnail': 'https://serpapi.com/searches/65806c077690dc81dc285f6b/images/eba6f41efcfa4ee1736f767ce092c6bcfbd72032b8bd3515.jpeg'}]


In [14]:
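#
# Entry point of this example.
#
# Input:
#   prompt: the user's latest question.
#   history: the dialogue history between the user and the model, a list in which
#       each element is one round {"user": "user input", "bot": "model output"}.
#       The most recent round goes at the end of the list; the latest question is not included.
#   list_of_plugin_info: the candidate plugins, a list in which each element holds one plugin's key information,
#       e.g. list_of_plugin_info = [plugin_info_0, plugin_info_1, plugin_info_2],
#       where plugin_info_0, plugin_info_1 and plugin_info_2 are entries like those in `tools` defined earlier.
#
# Output:
#   A tuple (the model's answer to the user's latest question, the updated history).
#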
def llm_with_plugin(prompt: str, history, list_of_plugin_info=()):
    chat_history = [(x['user'], x['bot']) for x in history] + [(prompt, '')]
    print("chat_history ===> ",chat_history)

    # The initial text that the model is asked to continue
    planning_prompt = build_input_text(chat_history, list_of_plugin_info)
    print("planning_prompt ===> ",planning_prompt)

    text = ''
    while True:
        output = text_completion(planning_prompt + text, stop_words=['Observation:', 'Observation:\n'])
        # Parse the latest Action: the plugin to call, its arguments, and the text to carry into the next step
        action, action_input, output = parse_latest_plugin_call(output)
        print("action ===> ",action)
        print("action_input ===> ", action_input)
        print("output ===> ",output)
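        if action:  # a plugin call is needed
            # action and action_input are the code name of the plugin to call and its input arguments
            # observation is the result returned by the plugin, as a string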
            observation = call_plugin(action, action_input)
            print("observation ===> ",observation)
            output += f'\nObservation: {observation}\nThought:'
            text += output
        else:  # generation finished and no further plugin call is needed
            text += output
            break

    new_history = []
    new_history.extend(history)
    new_history.append({'user': prompt, 'bot': text})
    return text, new_history
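`llm_with_plugin` loops until the model stops emitting an `Action`, so a model that keeps calling tools never terminates. The sketch below runs the same ReAct loop with a step limit, using hypothetical names `react_loop` and `parse_action` (a condensed copy of `parse_latest_plugin_call`). The model and the tools are passed in as functions, which also makes the loop testable with a scripted fake model.

```python
from typing import Callable, Tuple


def parse_action(text: str) -> Tuple[str, str, str]:
    """Condensed copy of parse_latest_plugin_call: (tool name, raw args, text before Observation)."""
    i = text.rfind("\nAction:")
    j = text.rfind("\nAction Input:")
    k = text.rfind("\nObservation:")
    if not 0 <= i < j:
        return "", "", text  # no tool call: the model answered directly
    if k < j:  # the stop word was dropped; restore it so the slicing below works
        text = text.rstrip() + "\nObservation:"
        k = text.rfind("\nObservation:")
    name = text[i + len("\nAction:"):j].strip()
    args = text[j + len("\nAction Input:"):k].strip()
    return name, args, text[:k]


def react_loop(
    planning_prompt: str,
    complete: Callable[[str], str],
    call_tool: Callable[[str, str], str],
    max_steps: int = 5,
) -> Tuple[str, bool]:
    """Run at most max_steps tool calls; returns (transcript, finished)."""
    text = ""
    for _ in range(max_steps):
        output = complete(planning_prompt + text)
        action, action_input, output = parse_action(output)
        if not action:
            return text + output, True  # the model produced its final answer
        observation = call_tool(action, action_input)
        text += output + f"\nObservation: {observation}\nThought:"
    return text, False  # step limit reached without a final answer
```

In the notebook, `complete` would be something like `lambda p: text_completion(p, stop_words=['Observation:', 'Observation:\n'])` and `call_tool` would be `call_plugin`. The `finished` flag lets the caller decide what to do when the limit is hit.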


In [21]:
prompt = "你好"
print(f"User's Prompt:\n{prompt}\n")
response, history = llm_with_plugin(prompt=prompt, history=[], list_of_plugin_info=tools)
print(f"Qwen's Response:\n{response}\n")


User's Prompt:
你好

chat_history ===>  [('你好', '')]
planning_prompt ===>  <|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Answer the following questions as best you can. You have access to the following APIs:

google_search: Call this tool to interact with the 谷歌搜索 API. What is the 谷歌搜索 API useful for? 谷歌搜索是一个通用搜索引擎，可用于访问互联网、查询百科知识、了解时事新闻等。 Parameters: [{"name": "search_query", "description": "搜索关键词或短语", "required": true, "schema": {"type": "string"}}] Format the arguments as a JSON object.

image_gen: Call this tool to interact with the 文生图 API. What is the 文生图 API useful for? 文生图是一个AI绘画（图像生成）服务，输入文本描述，返回根据文本作画得到的图片的URL Parameters: [{"name": "prompt", "description": "英文关键词，描述了希望图像具有什么内容", "required": true, "schema": {"type": "string"}}] Format the arguments as a JSON object.

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [google_search, ima

In [22]:
prompt = "东方甄选"
print(f"User's Prompt:\n{prompt}\n")
response, history = llm_with_plugin(prompt=prompt, history=[], list_of_plugin_info=tools)
print(f"Qwen's Response:\n{response}\n")


User's Prompt:
东方甄选

chat_history ===>  [('东方甄选', '')]
planning_prompt ===>  <|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Answer the following questions as best you can. You have access to the following APIs:

google_search: Call this tool to interact with the 谷歌搜索 API. What is the 谷歌搜索 API useful for? 谷歌搜索是一个通用搜索引擎，可用于访问互联网、查询百科知识、了解时事新闻等。 Parameters: [{"name": "search_query", "description": "搜索关键词或短语", "required": true, "schema": {"type": "string"}}] Format the arguments as a JSON object.

image_gen: Call this tool to interact with the 文生图 API. What is the 文生图 API useful for? 文生图是一个AI绘画（图像生成）服务，输入文本描述，返回根据文本作画得到的图片的URL Parameters: [{"name": "prompt", "description": "英文关键词，描述了希望图像具有什么内容", "required": true, "schema": {"type": "string"}}] Format the arguments as a JSON object.

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [google_search,

In [23]:
prompt = "董宇辉"
print(f"User's Prompt:\n{prompt}\n")
response, history = llm_with_plugin(prompt=prompt, history=history, list_of_plugin_info=tools)
print(f"Qwen's Response:\n{response}\n")


User's Prompt:
董宇辉

chat_history ===>  [('东方甄选', 'Thought: 提供的工具帮助较小，我将直接回答。\nFinal Answer: 东方甄选是抖音电商平台“东方甄选”推出的直播带货栏目，由董宇辉主持。该栏目以农产品为主打商品，通过主播讲解和互动的方式吸引观众购买。节目风格幽默风趣，深受观众喜爱。'), ('董宇辉', '')]
planning_prompt ===>  <|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Answer the following questions as best you can. You have access to the following APIs:

google_search: Call this tool to interact with the 谷歌搜索 API. What is the 谷歌搜索 API useful for? 谷歌搜索是一个通用搜索引擎，可用于访问互联网、查询百科知识、了解时事新闻等。 Parameters: [{"name": "search_query", "description": "搜索关键词或短语", "required": true, "schema": {"type": "string"}}] Format the arguments as a JSON object.

image_gen: Call this tool to interact with the 文生图 API. What is the 文生图 API useful for? 文生图是一个AI绘画（图像生成）服务，输入文本描述，返回根据文本作画得到的图片的URL Parameters: [{"name": "prompt", "description": "英文关键词，描述了希望图像具有什么内容", "required": true, "schema": {"type": "string"}}] Format the arguments as a JSON object.

Use the following format:

Question: the input

In [24]:
prompt = "搜索一下'小作文'事件"
print(f"User's Prompt:\n{prompt}\n")
response, history = llm_with_plugin(prompt=prompt, history=history, list_of_plugin_info=tools)
print(f"Qwen's Response:\n{response}\n")


User's Prompt:
搜索一下'小作文'事件

chat_history ===>  [('东方甄选', 'Thought: 提供的工具帮助较小，我将直接回答。\nFinal Answer: 东方甄选是抖音电商平台“东方甄选”推出的直播带货栏目，由董宇辉主持。该栏目以农产品为主打商品，通过主播讲解和互动的方式吸引观众购买。节目风格幽默风趣，深受观众喜爱。'), ('董宇辉', 'Thought: 提供的工具帮助较小，我将直接回答。\nFinal Answer: 董宇辉是中国大陆的一位网络红人和电商主播，因在抖音电商平台“东方甄选”上的直播带货而走红。他以其独特的语言风格和幽默的表演方式吸引了大量粉丝，并被誉为“直播界的清流”。除了带货外，他还经常分享自己的生活经验和人生感悟，受到了广大网友的喜爱和关注。'), ("搜索一下'小作文'事件", '')]
planning_prompt ===>  <|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
东方甄选<|im_end|>
<|im_start|>assistant
Thought: 提供的工具帮助较小，我将直接回答。
Final Answer: 东方甄选是抖音电商平台“东方甄选”推出的直播带货栏目，由董宇辉主持。该栏目以农产品为主打商品，通过主播讲解和互动的方式吸引观众购买。节目风格幽默风趣，深受观众喜爱。<|im_end|>
<|im_start|>user
Answer the following questions as best you can. You have access to the following APIs:

google_search: Call this tool to interact with the 谷歌搜索 API. What is the 谷歌搜索 API useful for? 谷歌搜索是一个通用搜索引擎，可用于访问互联网、查询百科知识、了解时事新闻等。 Parameters: [{"name": "search_query", "description": "搜索关键词或短语", "required": true, "schema": {"type": "string
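The planning prompt above shows how earlier turns are replayed. Every turn becomes a `<|im_start|>user ... <|im_end|>` block followed by an `<|im_start|>assistant ... <|im_end|>` block (ChatML). The ReAct tool instructions are written into one of the user turns rather than into the system message.

The sketch below covers only the ChatML part, using a hypothetical `render_chatml` helper. It leaves the last assistant turn open so that the model continues from there. It does not insert the tool instructions.

```python
from typing import List, Tuple

IM_START, IM_END = "<|im_start|>", "<|im_end|>"


def render_chatml(history: List[Tuple[str, str]], system: str = "You are a helpful assistant.") -> str:
    """Hypothetical helper: render (user, bot) turns as ChatML, leaving the last assistant turn open."""
    assert history and history[-1][1] == "", "the last turn must be the pending question"
    prompt = f"{IM_START}system\n{system}{IM_END}"
    for query, response in history:
        prompt += f"\n{IM_START}user\n{query.strip()}{IM_END}"
        prompt += f"\n{IM_START}assistant\n{response.strip()}{IM_END}"
    # The pending turn has an empty response; drop its closing tag so generation continues there.
    return prompt[: -len(IM_END)]
```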

In [18]:
history = []
for query in ['你好', '搜索一下谁是周杰伦', '再搜下他老婆是谁', '给我画个可爱的小猫吧，最好是黑猫']:
    print(f"User's Query:\n{query}\n")
    response, history = llm_with_plugin(prompt=query, history=history, list_of_plugin_info=tools)
    print(f"Qwen's Response:\n{response}\n")


User's Query:
你好

chat_history ===>  [('你好', '')]
planning_prompt ===>  <|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Answer the following questions as best you can. You have access to the following APIs:

google_search: Call this tool to interact with the 谷歌搜索 API. What is the 谷歌搜索 API useful for? 谷歌搜索是一个通用搜索引擎，可用于访问互联网、查询百科知识、了解时事新闻等。 Parameters: [{"name": "search_query", "description": "搜索关键词或短语", "required": true, "schema": {"type": "string"}}] Format the arguments as a JSON object.

image_gen: Call this tool to interact with the 文生图 API. What is the 文生图 API useful for? 文生图是一个AI绘画（图像生成）服务，输入文本描述，返回根据文本作画得到的图片的URL Parameters: [{"name": "prompt", "description": "英文关键词，描述了希望图像具有什么内容", "required": true, "schema": {"type": "string"}}] Format the arguments as a JSON object.

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [google_search, imag

# Summary
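- This works much like earlier NLU-style templates but is more flexible. The prompt template still needs tuning, and the returned results are not always correct; adjusting the inference parameters can help.

  Tuning advice for inference parameters such as top_p:

  Generally, a lower top_p gives higher accuracy, at the cost of less diverse answers and a greater tendency to repeat words or phrases.

  For example, to set top_p to 0.5: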
```
model.generation_config.top_p = 0.5
```
  In particular, you can turn off top-p sampling and use greedy decoding instead, which is effectively equivalent to top_p=0 or temperature=0:
```
model.generation_config.do_sample = False  # greedy decoding
```
  In addition, the model.chat() interface also lets you adjust top_p and other generation parameters.
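To see why a lower top_p trades diversity for accuracy, here is a simplified sketch of nucleus (top-p) filtering over one next-token distribution. It keeps the most probable tokens until their cumulative probability reaches top_p, then renormalizes. `top_p_filter` is a hypothetical helper, not the transformers implementation, whose boundary handling may differ slightly. With top_p=0 only the most probable token survives, which is the greedy case mentioned above.

```python
from typing import Dict, List


def top_p_filter(probs: List[float], top_p: float) -> Dict[int, float]:
    """Return {token_index: renormalized_prob} for the smallest top set reaching top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)  # the token that crosses the threshold is still kept
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}
```

For example, with probabilities [0.5, 0.3, 0.15, 0.05], top_p=0.5 keeps only token 0, while top_p=0.7 keeps tokens 0 and 1 with weights 0.625 and 0.375. Sampling then happens only among the kept tokens.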

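- Quantization matters. Depending on the stage at which quantization is applied to compress the model, model quantization falls into three categories:
  - Quantization-Aware Training (QAT): fake-quantization operators are inserted into the model during training, and collecting the value ranges of inputs and outputs during training improves the accuracy of the quantized model. It suits scenarios with strict accuracy requirements. The quantization objective is integrated seamlessly into training, so the LLM adapts to low-precision representations while it trains and becomes better at handling the precision loss that quantization introduces. This adaptation aims to preserve higher performance after quantization.
  - Quantization-Aware Fine-tuning (QAF): the LLM is quantized during fine-tuning. The main goal is to ensure that the fine-tuned LLM keeps its performance after being quantized to a lower bit width. Building quantization awareness into fine-tuning strikes a balance between model compression and preserving performance.
  - Post-Training Quantization (PTQ): the LLM's parameters are quantized after training is complete, requiring only a small amount of calibration data. It suits scenarios that prioritize ease of use or lack training resources. The main goal is to reduce the LLM's storage and computational cost without modifying its architecture or retraining it. PTQ's main advantages are simplicity and efficiency, but it may introduce some loss of accuracy during quantization.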
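As a concrete instance of the simplest PTQ scheme, the sketch below applies symmetric per-tensor absmax round-to-nearest quantization. The scale is derived from the observed value range (the "calibration" step), and weights are rounded to integers in [-127, 127] for int8. `fake_quantize` (quantize, then dequantize) is the round trip that QAT's fake-quantization operators simulate during training. All names are hypothetical. Production LLM PTQ methods, with per-channel or group-wise scales and error-compensating schemes such as GPTQ, are more elaborate.

```python
from typing import List, Tuple


def absmax_quantize(weights: List[float], bits: int = 8) -> Tuple[List[int], float]:
    """Symmetric per-tensor round-to-nearest quantization (a simple PTQ baseline)."""
    qmax = 2 ** (bits - 1) - 1              # 127 for int8
    max_abs = max(abs(w) for w in weights)  # calibration: observe the value range
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate float values."""
    return [v * scale for v in q]


def fake_quantize(weights: List[float], bits: int = 8) -> List[float]:
    """Quantize-then-dequantize round trip, as simulated by QAT's fake-quant operators."""
    q, scale = absmax_quantize(weights, bits)
    return dequantize(q, scale)
```

Because values are rounded to the nearest step, the per-weight error of this scheme is at most half the scale. Fewer bits mean a larger scale and therefore a larger error, which is the accuracy loss the bullets above refer to.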
