<a href="https://colab.research.google.com/github/zhousanfu/machine-learning-demo/blob/master/LangChain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install transformers sentencepiece cpm_kernels

In [None]:
!pip install langchain

In [None]:
!pip install google-search-results -i http://pypi.douban.com/simple --trusted-host pypi.douban.com

## ChatGLM

In [None]:
from transformers import AutoTokenizer, AutoModel
from typing import Any, List, Mapping, Optional

class chatGLM():
    def __init__(self, model_name) -> None:
        self.tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
        self.model = AutoModel.from_pretrained(model_name, trust_remote_code=True).half().cuda().eval()

    def __call__(self, prompt, history) -> Any:
        response, history = self.model.chat(self.tokenizer, prompt, history=history)  # This demo does not use the streaming interface, stream_chat()
        return response, history

llm = chatGLM(model_name="THUDM/chatglm-6B-int4")

A new version of the following files was downloaded from https://huggingface.co/THUDM/chatglm-6B-int4:
- tokenization_chatglm.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.



In [None]:
response, history = llm(prompt="你好", history=[])
print("response: %s"%response)
response, history = llm(prompt="我最近有点失眠怎么办?", history=[])
print("response: %s"%response)

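Both calls above pass `history=[]`, so the second question is asked with no memory of the first. Below is a minimal sketch of carrying the conversation forward, assuming `history` is a list of `(query, response)` pairs as returned by ChatGLM-6B's `chat()`. `FakeChat` is a hypothetical stand-in for the `chatGLM` wrapper so that the example runs without a GPU or a model download.

```python
from typing import List, Tuple

History = List[Tuple[str, str]]

class FakeChat:
    """Hypothetical stand-in for the chatGLM wrapper; echoes instead of generating."""

    def __call__(self, prompt: str, history: History) -> Tuple[str, History]:
        response = f"(turn {len(history) + 1}) you said: {prompt}"
        # Return a new list so the caller's history is not mutated in place.
        return response, history + [(prompt, response)]

fake_llm = FakeChat()
history: History = []
for question in ["你好", "我最近有点失眠怎么办?"]:
    # Feed the history returned by the previous turn into the next one.
    response, history = fake_llm(prompt=question, history=history)
    print("response: %s" % response)

print(len(history))  # number of completed turns
```

With the real wrapper, the same loop should work by replacing `fake_llm` with `llm`, since both return a `(response, history)` pair.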
## LangChain

### Prompt
Fill in content to guide the large model's output.

In [5]:
from langchain import PromptTemplate



template = """Explain the concept of {concept} in couple of lines"""

prompt = PromptTemplate(input_variables=["concept"], template=template)
prompt = prompt.format(concept="regularization")
print("prompt=", prompt)

template = "请给我解释一下{concept}的意思"
prompt = PromptTemplate(input_variables=["concept"], template=template)
prompt = prompt.format(concept="人工智能")
print("prompt=", prompt)

prompt= Explain the concept of regularization in couple of lines
prompt= 请给我解释一下人工智能的意思

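For simple templates like the two above, formatting a prompt amounts to Python string substitution. The sketch below reproduces the idea with the standard library only. `SimpleTemplate` is a hypothetical class for illustration, not part of LangChain, and it does not attempt the validation a real `PromptTemplate` performs.

```python
from string import Formatter
from typing import List

class SimpleTemplate:
    """Hypothetical, stripped-down analogue of a prompt template."""

    def __init__(self, template: str) -> None:
        self.template = template
        # Collect the {placeholder} names that appear in the template.
        self.input_variables: List[str] = [
            name for _, name, _, _ in Formatter().parse(template) if name
        ]

    def format(self, **kwargs: str) -> str:
        missing = [v for v in self.input_variables if v not in kwargs]
        if missing:
            raise KeyError(f"missing variables: {missing}")
        return self.template.format(**kwargs)

tmpl = SimpleTemplate("Explain the concept of {concept} in couple of lines")
print(tmpl.input_variables)
print(tmpl.format(concept="regularization"))
```

Deriving the variable names from the template, rather than listing them by hand, is a design choice of this sketch; the cells above pass `input_variables` explicitly.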

### Chains
Link multiple components together to handle a specific downstream task.

In [None]:
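from langchain import PromptTemplate
from langchain.chains import LLMChain
from langchain.llms import OpenAI

# Keep the template object itself; the previous cell overwrote `prompt` with a formatted string.
promptTem = PromptTemplate(input_variables=["concept"], template="请给我解释一下{concept}的意思")

# Standard LLMChain with an OpenAI model (needs the openai package and OPENAI_API_KEY set).
chain = LLMChain(llm=OpenAI(), prompt=promptTem)
print(chain.run("你好"))

# The chatGLM object does not meet LLMChain's requirements for its llm argument, so imitate a chain by hand.
class DemoChain():
    def __init__(self, llm, prompt) -> None:
        self.llm = llm
        self.prompt = prompt

    def run(self, query) -> Any:
        prompt = self.prompt.format(concept=query)
        print("query=%s  ->prompt=%s"%(query, prompt))
        # chatGLM.__call__ takes a history list and returns (response, history).
        response, _ = self.llm(prompt, history=[])
        return response

chain = DemoChain(llm=llm, prompt=promptTem)
print(chain.run(query="天道酬勤"))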

query=天道酬勤  ->prompt=请给我解释一下天道酬勤的意思
天道酬勤是指自然界的规律认为只要一个人勤奋努力，就有可能会获得成功。这个成语的意思是说，尽管一个人可能需要付出很多努力才能取得成功，但只要他/她坚持不懈地努力，就有可能会得到回报。

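The pattern in this section is: fill the template, call the model, return the text. Here is a self-contained sketch of that flow. `MiniChain` and `echo_llm` are hypothetical names, and the stub model only reports the prompt it received, so the example runs without ChatGLM or an API key.

```python
from typing import Callable

class MiniChain:
    """Hypothetical two-step chain: format the prompt, then call the model."""

    def __init__(self, llm: Callable[[str], str], template: str) -> None:
        self.llm = llm
        self.template = template

    def run(self, query: str) -> str:
        prompt = self.template.format(concept=query)  # step 1: build the prompt
        return self.llm(prompt)                       # step 2: ask the model

def echo_llm(prompt: str) -> str:
    # Stand-in for a real model: just reports what it was asked.
    return f"[model received] {prompt}"

mini_chain = MiniChain(llm=echo_llm, template="请给我解释一下{concept}的意思")
print(mini_chain.run("天道酬勤"))
```

Because the chain only assumes that the model is a callable from string to string, any model can be plugged in by wrapping it in a function with that shape.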
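### Overriding the LLM

`TfboyLLM` inherits from the `LLM` class in `langchain.llms.base`. It needs to implement two methods:

*   `_call`: the main processing method. It analyzes the incoming prompt and returns an answer.

*   `_identifying_params`: describes the parameters of the LLM class and their values. In this example the class has no member variables, so it returns an empty dict.

The class also defines the `_llm_type` property, which returns a string naming the LLM type.


The key part is the logic implemented in `_call`:

*   Print the prompt as soon as it is received.

*   Match the question against a regular expression of the form [number][operator][number]. On a match, return the computed result; otherwise continue.

*   Check whether the prompt contains a full-width question mark `？`. If it does, replace characters in the text according to: 我 (I) -> 你 (you), 你 -> 我, 吗 (question particle) -> "", ？ -> ！.

*   If none of the above applies, return: “很抱歉，请换一种问法。比如：1+1等于几” ("Sorry, please rephrase the question. For example: what is 1+1").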

In [None]:
from typing import Any, List, Mapping, Optional
from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM
import re

class TfboyLLM(LLM):

    @property
    def _llm_type(self) -> str:
        return "custom"

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
    ) -> str:
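        print("问题:",prompt)
        # Look for the first <number><operator><number> expression anywhere in the prompt.
        # (The earlier greedy '^.*(...)' form dropped leading digits, e.g. "12+34" -> "2+34".)
        pattern = re.compile(r'(\d+[*/+-]\d+)')
        match = pattern.search(prompt)
        if match:
            # The regex admits only digits and one operator, which limits what eval sees.
            # _call must return a str, so convert the numeric result.
            result = str(eval(match.group(1)))
        elif "？" in prompt:
            # Swap the pronouns, drop the question particle, and turn ？ into ！.
            rep_args = {"我":"你", "你":"我", "吗":"", "？":"！"}
            result = ''.join(rep_args.get(c, c) for c in prompt)
        else:
            result = "很抱歉，请换一种问法。比如：1+1等于几"
        return result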

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {}



In [None]:
llm = TfboyLLM()
print("答案:",llm("我能问你问题吗？"))
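
The three branches of `_call` can be exercised without LangChain. The sketch below restates the rules as a plain function. `answer` is a hypothetical helper, and it swaps `eval` for a small operator table, which is a choice made for this sketch rather than what `TfboyLLM` does.

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
ARITH = re.compile(r"(\d+)([*/+-])(\d+)")
SWAPS = {"我": "你", "你": "我", "吗": "", "？": "！"}
FALLBACK = "很抱歉，请换一种问法。比如：1+1等于几"

def answer(prompt: str) -> str:
    # Branch 1: first <number><operator><number> expression in the prompt.
    match = ARITH.search(prompt)
    if match:
        left, op, right = match.groups()
        return str(OPS[op](int(left), int(right)))
    # Branch 2: full-width question mark, so apply the character swaps.
    if "？" in prompt:
        return "".join(SWAPS.get(ch, ch) for ch in prompt)
    # Branch 3: nothing matched.
    return FALLBACK

print(answer("12+34等于几"))       # arithmetic branch
print(answer("我能问你问题吗？"))   # character-swap branch
print(answer("今天天气不错"))       # fallback branch
```

Division by zero is not handled in this sketch and would raise `ZeroDivisionError`, as it would with `eval`.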