<a href="https://colab.research.google.com/github/thm1118/Colab_Project/blob/main/Colab_ChatGLM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


># **Based on ChatGLM, the dialogue language model released by Tsinghua University**
> Colab notebook by [Happy_WSH](https://space.bilibili.com/8417436).

>What did I do?
>
>Wrote this notebook and its tutorial; wrote the Colab-style WSH method; tested and provided a configuration suitable for Colab free-tier users.

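Model author's GitHub: [THUDM/ChatGLM-6B](https://github.com/THUDM/ChatGLM-6B)

Qiuye (秋叶) method GitHub: [Akegarasu/ChatGLM-webui](https://github.com/Akegarasu/ChatGLM-webui)

WSH method GitHub: [WSH032/ChatGLM-6B](https://github.com/WSH032/ChatGLM-6B)

When using this notebook, please comply with the Apache-2.0 license and with the licenses required by each GitHub repository.

`However, due to the small size of ChatGLM-6B, it is currently known to have considerable limitations, such as factual/mathematical logic errors, possible generation of harmful/biased content, weak contextual ability, self-awareness confusion, and generating content for English instructions that completely contradicts what it generates for Chinese instructions. Please understand these issues before use to avoid misunderstandings. A larger ChatGLM based on the 130-billion-parameter GLM-130B is currently in internal testing and development. ---model author THUDM`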


# (1) Clone from GitHub, install dependencies, configure the environment

In [None]:
#@title ##*Mount Google Drive (optional)*
from google.colab import drive
drive.mount('/content/drive')

In [None]:
#@title ##1.1 Clone Qiuye's repository and install dependencies

!git clone https://github.com/Akegarasu/ChatGLM-webui
%cd /content/ChatGLM-webui
print("Installing dependencies, please wait patiently")
!pip install --upgrade -r requirements.txt  > /dev/null 2>&1
print("Dependencies installed")

# Switch the encoding
import locale
print(locale.getpreferredencoding())
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
print(locale.getpreferredencoding())
import os
os.environ['PYTHONIOENCODING'] = 'UTF-8'

## *If you run in CPU mode, follow the steps below (optional)*

In [None]:
# Run this to locate the libcuda.so.1 file
!sudo find /usr/ -name 'libcuda.so.1'

/usr/local/cuda-11.8/compat/libcuda.so.1


In [None]:
# Run this to print the library search paths (two entries separated by ":")
!echo $LD_LIBRARY_PATH

/usr/local/nvidia/lib:/usr/local/nvidia/lib64


In [None]:
#@title Fill in the CUDA file path and the two environment paths, then run
cuda_file_path = "/usr/local/cuda-11.8/compat/libcuda.so.1" #@param {type:"string"}
environment_path1 = "/usr/local/nvidia/lib" #@param {type:"string"}
environment_path2 = "/usr/local/nvidia/lib64" #@param {type:"string"}

import os

!mkdir -p {environment_path1}
!mkdir -p {environment_path2}

!cp -p {cuda_file_path} {environment_path1}
!cp -p {cuda_file_path} {environment_path2}

print("CUDA file copied")

#@markdown **!!!After running, click `Runtime` -- `Restart runtime` to refresh!!!**
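
The three manual steps above (locate `libcuda.so.1`, read `LD_LIBRARY_PATH`, copy the file into each listed directory) could also be scripted. The following is a minimal sketch, not part of the original notebook: `find_first` and `copy_into_library_dirs` are hypothetical helper names, and the sketch assumes the entries of `LD_LIBRARY_PATH` are separated by `:` as in the output shown above.

```python
import os
import shutil
from pathlib import Path


def find_first(root, filename):
    """Return the first path under root whose file name equals filename, or None."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            return Path(dirpath) / filename
    return None


def copy_into_library_dirs(src, ld_library_path):
    """Copy src into every directory listed in a ':'-separated path string."""
    copied = []
    for entry in ld_library_path.split(":"):
        if not entry:  # skip empty entries such as a trailing ':'
            continue
        target_dir = Path(entry)
        target_dir.mkdir(parents=True, exist_ok=True)  # equivalent of `mkdir -p`
        # copy2 preserves timestamps and permission bits, similar to `cp -p`
        copied.append(Path(shutil.copy2(src, target_dir)))
    return copied


# In Colab this might be called as follows (hypothetical usage, not run here):
# lib = find_first("/usr", "libcuda.so.1")
# copy_into_library_dirs(lib, os.environ.get("LD_LIBRARY_PATH", ""))
```

As with the manual cell, the runtime would still need to be restarted afterwards.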

In [None]:
#@title Show GPU details
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
else:
  print(gpu_info)
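
The cell above relies on IPython's `!` shell capture, which only works inside a notebook. Outside one, the same check could be sketched in plain Python as follows. `gpu_summary` is a hypothetical helper name, and treating a missing `nvidia-smi` binary as "no GPU" is an assumption of this sketch rather than something the notebook states.

```python
import shutil
import subprocess


def gpu_summary():
    """Return the nvidia-smi report, or a short message when no GPU is reachable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:  # assumption: no binary on PATH means no GPU runtime
        return "Not connected to a GPU"
    result = subprocess.run([exe], capture_output=True, text=True)
    # Mirror the notebook's check: the word 'failed' in the output means no GPU
    if result.returncode != 0 or "failed" in result.stdout:
        return "Not connected to a GPU"
    return result.stdout


print(gpu_summary())
```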

# (2) Choose a method and use it

## 2.1 Qiuye method (WebUI style) `feature-rich, but gradio is unstable in Colab`

In [None]:
#@title ###2.1.1 Set parameters and start chatting

%cd /content/ChatGLM-webui

extArgs=""

#@markdown Run inference on the CPU? `In CPU mode, choose chatglm-6b-int4-qe, otherwise RAM will run out`
use_cpu = False #@param {type:"boolean"}
# Enable CPU mode
if use_cpu:
  extArgs = extArgs + "--cpu "

#@markdown Choose a model `Colab free-tier users can only use the int4 and qe models`, or enter a custom model path `which overrides the preset selection`
model_path = "THUDM/chatglm-6b-int4" #@param ["THUDM/chatglm-6b", "THUDM/chatglm-6b-int4", "THUDM/chatglm-6b-int4-qe"]
your_model_path = "" #@param {type:"string"}
# Override the preset with the custom path
if your_model_path:
  model_path = your_model_path

#@markdown Inference precision `leave empty to choose automatically; fp32 is CPU-only; int4 and int8 are GPU-only`
precision = "" #@param ["", "fp32", "fp16", "int4", "int8"]
# Set the precision
if precision:
  extArgs = extArgs + f"--precision={precision} "

# Launch
!python webui.py --model-path={model_path} --listen --share {extArgs}
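
The cell above builds the command line by string concatenation. As an alternative, a minimal sketch (not the original author's code) can assemble the arguments as a list and reject the combinations that the form notes rule out (fp32 on GPU, int4 or int8 on CPU). `build_webui_args` is a hypothetical helper; it uses only flags that already appear in the cell (`--model-path`, `--listen`, `--share`, `--cpu`, `--precision`).

```python
import shlex

GPU_ONLY = {"int4", "int8"}  # per the form note: int4 and int8 are GPU-only
CPU_ONLY = {"fp32"}          # per the form note: fp32 is CPU-only


def build_webui_args(model_path, use_cpu=False, precision=""):
    """Return the webui.py command as a list of arguments."""
    if use_cpu and precision in GPU_ONLY:
        raise ValueError(f"precision {precision!r} requires a GPU")
    if not use_cpu and precision in CPU_ONLY:
        raise ValueError(f"precision {precision!r} requires CPU mode")
    args = ["python", "webui.py", f"--model-path={model_path}", "--listen", "--share"]
    if use_cpu:
        args.append("--cpu")
    if precision:  # an empty string means: let the web UI pick automatically
        args.append(f"--precision={precision}")
    return args


# shlex.join quotes each argument safely for display or for a shell
print(shlex.join(build_webui_args("THUDM/chatglm-6b-int4")))
```

A list of arguments can be passed directly to `subprocess.run`, which avoids quoting problems when a custom model path contains spaces.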

## 2.2 WSH method (Colab style) `better compatibility in Colab`

In [None]:
#@title ###2.2.1 Choose and load the model

from transformers import AutoModel, AutoTokenizer
import gradio as gr

#@markdown Run inference on the CPU? `In CPU mode, choose chatglm-6b-int4-qe, otherwise RAM will run out`
use_cpu = False #@param {type:"boolean"}

#@markdown Choose a model `Colab free-tier users can only use the int4 and qe models`, or enter a custom model path `which overrides the preset selection`
model_path = "THUDM/chatglm-6b-int4" #@param ["THUDM/chatglm-6b", "THUDM/chatglm-6b-int4", "THUDM/chatglm-6b-int4-qe"]
your_model_path = "" #@param {type:"string"}
# Override the preset with the custom path
if your_model_path:
  model_path = your_model_path

# The tokenizer is loaded the same way for CPU and GPU
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
if use_cpu:
  # CPU inference runs in fp32
  model = AutoModel.from_pretrained(model_path, trust_remote_code=True).float()
else:
  # GPU inference runs in fp16
  model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half().cuda()

model = model.eval()

# Initialize parameters
history = []
count = 1
max_length = 2048
top_p = 0.7
temperature =  0.95
max_turns = 20
clear_history_flag = False

def clear_history_set():
  global history, count, clear_history_flag
  history = []
  count = 1
  clear_history_flag = False

In [None]:
#@title ###2.2.2 Parameter settings (can also be changed mid-conversation)
max_length = 2048 #@param {type:"number"}
top_p = 0.7 #@param {type:"slider", min:0, max:1, step:0.01}
temperature =  0.95 #@param {type:"slider", min:0, max:1, step:0.01}
max_turns = 20 #@param {type:"slider", min:1, max:256, step:1}

In [None]:
#@title ###2.2.3 Ask a question
import ipywidgets as widgets
from IPython.display import display
from ipywidgets import Layout

#@markdown Question
ask = "\u4F60\u597D" #@param {type:"string"}

# Clear the history
if clear_history_flag:
  clear_history_set()
  print("Conversation limit reached; the chat history has been cleared")

print(f"Turn {count}; once turn {max_turns} is reached, the chat history will be deleted at the start of the next turn")

# Automatically adjust the text area height to its content
def get_bigger(args):
  a = textarea_param.value.count('\n') + 1      # rows needed for explicit line breaks
  b = int(len(textarea_param.value) / 100) + 1  # rows needed at roughly 100 characters per row
  textarea_param.rows = max(a, b)

# Set the text area size
text_layout = Layout(flex='0 1 auto', height='auto', min_height='40px', width='1400px')
textarea_param = widgets.Textarea(value='', placeholder='Answer...', description='ChatGLM:', disabled=False, layout=text_layout)
textarea_param.observe(get_bigger, 'value')
display(textarea_param)

# Stream the answer into the ipywidgets text area
def ask_and_ans(tokenizer, ask, history, max_length, top_p, temperature):
  old_history = history
  old_response = ""
  # Pass the sampling parameters by keyword (as the official demo does) so they cannot bind to the wrong positional parameter
  for response, new_history in model.stream_chat(tokenizer, ask, history, max_length=max_length, top_p=top_p, temperature=temperature):
    textarea_param.value = response
    old_response = response
    old_history = new_history
  return old_response, old_history


# Ask, then increment the counter
response, history = ask_and_ans(tokenizer, ask, history, max_length=max_length, top_p=top_p, temperature=temperature)
count = count + 1

# Set the clear flag
if count > max_turns:
  clear_history_flag = True
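
The history-reset bookkeeping in this cell is spread across the globals `history`, `count`, `max_turns`, and `clear_history_flag`. A minimal sketch of the same logic, collected in one hypothetical class (`TurnTracker` is not part of the notebook), may make the order of operations easier to follow: the flag is set at the end of the turn that hits the limit, and the reset happens at the start of the following turn. Appending the `(query, response)` pair here stands in for the updated history that `stream_chat` returns.

```python
class TurnTracker:
    """Counts conversation turns and clears the history once a limit is hit."""

    def __init__(self, max_turns=20):
        self.max_turns = max_turns
        self.history = []
        self.count = 1
        self.clear_flag = False

    def begin_turn(self):
        """Reset state if the previous turn hit the limit; return True if it did."""
        if self.clear_flag:
            self.history = []
            self.count = 1
            self.clear_flag = False
            return True
        return False

    def end_turn(self, query, response):
        """Record one exchange and flag a reset when the limit is exceeded."""
        self.history.append((query, response))
        self.count += 1
        if self.count > self.max_turns:
            self.clear_flag = True


tracker = TurnTracker(max_turns=2)
for question in ["first", "second", "third"]:
    cleared = tracker.begin_turn()
    print(f"turn {tracker.count}, history cleared: {cleared}")
    tracker.end_turn(question, "answer to " + question)
```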


In [None]:
#@title ###2.2.4 Chat history

#@markdown Show the chat history
show_history = True #@param {type:"boolean"}
if show_history:
  for ask_content, ans_content in history:
    print(f"User: {ask_content}")
    print(f"Answer: {ans_content}")
    print("------------------------------------------------------")
#@markdown Manually clear the chat history
clear_history = False #@param {type:"boolean"}
if clear_history:
  clear_history_set()
  print("The chat history has been cleared")

## 2.3 Official streaming method

In [None]:
#@title ###2.3.1 Choose and load the model

from transformers import AutoModel, AutoTokenizer
import gradio as gr

#@markdown Run inference on the CPU? `In CPU mode, choose chatglm-6b-int4-qe, otherwise RAM will run out`
use_cpu = False #@param {type:"boolean"}

#@markdown Choose a model `Colab free-tier users can only use the int4 and qe models`, or enter a custom model path `which overrides the preset selection`
model_path = "THUDM/chatglm-6b-int4" #@param ["THUDM/chatglm-6b", "THUDM/chatglm-6b-int4", "THUDM/chatglm-6b-int4-qe"]
your_model_path = "" #@param {type:"string"}
# Override the preset with the custom path
if your_model_path:
  model_path = your_model_path

if use_cpu:
  tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
  model = AutoModel.from_pretrained(model_path, trust_remote_code=True).float()
else:
  tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
  model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half().cuda()

model = model.eval()


In [None]:
#@title ###2.3.2 Start the conversation

#@markdown Maximum number of conversation turns
MAX_TURNS = 20 #@param {type:"slider", min:1, max:256, step:1}
MAX_BOXES = MAX_TURNS * 2



def predict(input, max_length, top_p, temperature, history=None):
    if history is None:
        history = []
    for response, history in model.stream_chat(tokenizer, input, history, max_length=max_length, top_p=top_p,
                                               temperature=temperature):
        updates = []
        for query, response in history:
            updates.append(gr.update(visible=True, value="User: " + query))
            updates.append(gr.update(visible=True, value="ChatGLM-6B: " + response))
        if len(updates) < MAX_BOXES:
            updates = updates + [gr.Textbox.update(visible=False)] * (MAX_BOXES - len(updates))
        yield [history] + updates


with gr.Blocks() as demo:
    state = gr.State([])
    text_boxes = []
    for i in range(MAX_BOXES):
        if i % 2 == 0:
            text_boxes.append(gr.Markdown(visible=False, label="Question:"))
        else:
            text_boxes.append(gr.Markdown(visible=False, label="Reply:"))

    with gr.Row():
        with gr.Column(scale=4):
            txt = gr.Textbox(show_label=False, placeholder="Enter text and press enter", lines=11).style(
                container=False)
        with gr.Column(scale=1):
            max_length = gr.Slider(0, 4096, value=2048, step=1.0, label="Maximum length", interactive=True)
            top_p = gr.Slider(0, 1, value=0.7, step=0.01, label="Top P", interactive=True)
            temperature = gr.Slider(0, 1, value=0.95, step=0.01, label="Temperature", interactive=True)
            button = gr.Button("Generate")
    button.click(predict, [txt, max_length, top_p, temperature, state], [state] + text_boxes)
demo.queue().launch()
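
`predict` pads its list of updates up to `MAX_BOXES`, because a Gradio callback has to return one value per output component it is wired to. The padding step can be sketched without Gradio as follows. `pad_updates` is a hypothetical helper, and plain dicts stand in for the `gr.update(...)` objects; like the cell above, the sketch does not truncate a history that is longer than the number of boxes.

```python
def pad_updates(history, max_boxes):
    """Build two visible updates per exchange, then pad with hidden ones."""
    updates = []
    for query, response in history:
        updates.append({"visible": True, "value": "User: " + query})
        updates.append({"visible": True, "value": "ChatGLM-6B: " + response})
    # Hide every box that has no content yet
    hidden = [{"visible": False}] * max(0, max_boxes - len(updates))
    return updates + hidden


boxes = pad_updates([("hi", "hello")], max_boxes=4)
for box in boxes:
    print(box)
```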

# *(3) Spare code for development*

In [None]:
old_response = ""
for response, history in model.stream_chat(tokenizer, "你好", [], max_length=2048, top_p=0.7, temperature=0.95):
  print(response[len(old_response):], end="")
  old_response = response
print(end="\r")
print(old_response)

你好👋！我是人工智能助手 ChatGLM-6B，很高兴见到你，欢迎问我任何问题。
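
The loop above depends on `stream_chat` yielding the full response so far on every iteration, which is why only the new suffix `response[len(old_response):]` is printed each time. A minimal sketch of that idea with a stand-in generator follows; `fake_stream` is hypothetical and merely imitates the cumulative behaviour, it does not call the model.

```python
def fake_stream(text, step=3):
    """Yield growing prefixes of text, imitating a cumulative token stream."""
    for end in range(step, len(text) + step, step):
        partial = text[:end]
        yield partial, [("query", partial)]


def iter_deltas(stream):
    """Turn cumulative responses into the newly added pieces."""
    previous = ""
    for response, _history in stream:
        yield response[len(previous):]
        previous = response


for delta in iter_deltas(fake_stream("Hello, I am ChatGLM-6B.")):
    print(delta, end="", flush=True)
print()
```

Printing only the deltas avoids rewriting the whole line on every step, which is what the carriage-return variants in this section do instead.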


In [None]:
import sys
old_response = ""
for response, history in model.stream_chat(tokenizer, "你好", [], max_length=2048, top_p=0.7, temperature=0.95):
  old_response = response
  sys.stdout.write(response)
  sys.stdout.flush()
  sys.stdout.write("\r")
print(old_response)

你好👋！我是人工智能助手 ChatGLM-6B，很高兴见到你，欢迎问我任何问题。


In [None]:
def ask_and_ans(tokenizer, ask, history, max_length, top_p, temperature):
  old_history = history
  old_response = ""
  # Pass the sampling parameters by keyword (as the official demo does) so they cannot bind to the wrong positional parameter
  for response, new_history in model.stream_chat(tokenizer, ask, history, max_length=max_length, top_p=top_p, temperature=temperature):
    old_response = response
    old_history = new_history
    print(end="\r")
    print(response, end="", flush=True) # print the response so far
  print(end="\r")
  print(old_response)
  return old_response, old_history

In [None]:
from google.colab import output
def ask_and_ans(tokenizer, ask, history, max_length, top_p, temperature):
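  old_history = history
  old_response = ""
  # Pass the sampling parameters by keyword (as the official demo does) so they cannot bind to the wrong positional parameter
  for response, new_history in model.stream_chat(tokenizer, ask, history, max_length=max_length, top_p=top_p, temperature=temperature):
    print(response[len(old_response):], end="")
    old_response = response
    old_history = new_history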
  output.clear()
  print(old_response)
  return old_response, old_history

In [None]:
#@title ###Stop button
import ipywidgets as widgets
from IPython.display import display
from ipywidgets import Layout
import threading

ask = "\u6211\u7231\u4F60" #@param {type:"string"}

# Used to stop ask_and_ans
stop_flag = False

# Clear the history
if clear_history_flag:
  clear_history_set()
  print("Conversation limit reached; the chat history has been cleared")

print(f"Turn {count}; once turn {max_turns} is reached, the chat history will be deleted at the start of the next turn")


# Callback invoked when the button is clicked
def stop_button_clicked(b):
    # Declare stop_flag as global so the assignment changes the module-level variable
    global stop_flag
    # Setting stop_flag to True tells the worker thread to stop
    print("Stopping the answer.")
    stop_flag = True

# Create a button widget and register the callback on it
stop_button = widgets.Button(description="Stop answer")
stop_button.on_click(stop_button_clicked)
# Display the button widget
display(stop_button)


# Automatically adjust the text area height to the number of lines
def get_bigger(args):
  textarea_param.rows = textarea_param.value.count('\n') + 1

# Set the text area size
text_layout = Layout(flex='0 1 auto', height='auto', min_height='100px', width='auto')
textarea_param = widgets.Textarea(value='', placeholder='Answer...', description='ChatGLM:', disabled=False, layout=text_layout)
textarea_param.observe(get_bigger, 'value')
display(textarea_param)

# Stream the answer into the ipywidgets text area
def ask_and_ans(tokenizer, ask, history, max_length, top_p, temperature):
  old_history = history
  old_response = ""
  # Pass the sampling parameters by keyword (as the official demo does) so they cannot bind to the wrong positional parameter
  for response, new_history in model.stream_chat(tokenizer, ask, history, max_length=max_length, top_p=top_p, temperature=temperature):
    textarea_param.value = response
    old_response = response
    old_history = new_history
    if stop_flag:
      # Stop early once the button callback has set the flag
      break
  # Store the return value on the current thread object so the caller can read it after join()
  threading.current_thread().result = (old_response, old_history)
  return old_response, old_history

# Create a worker thread with ask_and_ans as its target function
stop_thread = threading.Thread(target=ask_and_ans, args=(tokenizer, ask, history, max_length, top_p, temperature))
# Start the worker thread
stop_thread.start()
# Wait for the worker thread to finish
stop_thread.join()
# Read the worker thread's return value
response, history = stop_thread.result
count = count + 1

# Set the clear flag
if count > max_turns:
  clear_history_flag = True
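
A global boolean works for signalling the worker, but `threading.Event` is the standard-library primitive for this kind of cooperative stop. The sketch below is a hypothetical variant, not the notebook's code: `slow_stream` stands in for `model.stream_chat`, `consume` plays the role of `ask_and_ans`, and a shared dict replaces the `result` attribute on the thread. It shows only the signalling pattern, not the widget wiring.

```python
import threading
import time


def slow_stream(n):
    """Yield a growing string slowly, imitating a streaming model response."""
    text = ""
    for i in range(n):
        time.sleep(0.01)
        text += str(i % 10)
        yield text, [("query", text)]


def consume(stream, stop_event, result):
    """Store the latest response and stop as soon as stop_event is set."""
    for response, history in stream:
        result["response"], result["history"] = response, history
        if stop_event.is_set():
            break


stop_event = threading.Event()
result = {"response": "", "history": []}
worker = threading.Thread(target=consume, args=(slow_stream(1000), stop_event, result))
worker.start()
time.sleep(0.05)   # let a few chunks arrive
stop_event.set()   # a button callback would call this instead
worker.join()
print(f"stopped after {len(result['response'])} chunks")
```

In this sketch the main thread sets the event itself before calling `join()`; in a notebook, something other than the blocked cell (for example a button callback) would have to call `stop_event.set()`.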
