## Lab: Hugging Face Transformers & Diffusers

#### Preparation

1. Create a virtual environment:

```
conda create -n huggingface
conda activate huggingface
conda install python=3.11 notebook ipywidgets
```

2. Point pip at a mirror index to speed up downloads:

```
pip config set global.index-url https://mirrors.bfsu.edu.cn/pypi/web/simple
```

3. Choose the PyTorch build that matches your hardware (pick one):

- Option 1: CPU build of PyTorch

```
pip3 install torch torchvision
```

- Option 2: GPU build of PyTorch (CUDA 11.8 wheels)

```
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu118
```

4. Install the Hugging Face libraries:

```
pip install "diffusers[torch]" transformers huggingface_hub
```
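
Before opening the notebook, it can help to confirm that the packages from the preparation steps are visible in the active environment. The following is a minimal sketch that uses only the standard library; the helper name `installed_version` is made up for this illustration.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Package names taken from the install commands above.
for name in ["torch", "torchvision", "diffusers", "transformers", "huggingface_hub"]:
    print(f"{name}: {installed_version(name) or 'NOT INSTALLED'}")
```

If a package is reported as missing, the most likely cause is that `pip` ran outside the `huggingface` environment.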

#### Start experimenting!

Sentiment analysis:

In [1]:
from transformers import pipeline
pipe = pipeline(task = "text-classification")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [2]:
pipe("Professor Qiu is very strict.")

[{'label': 'NEGATIVE', 'score': 0.8571763634681702}]

In [3]:
pipe("There is a new coffee shop on campus.")

[{'label': 'NEGATIVE', 'score': 0.515913188457489}]

In [4]:
pipe("There is a new coffee shop on campus. The drinks there taste not bad.")

[{'label': 'POSITIVE', 'score': 0.9982901215553284}]

In [5]:
pipe("There is a new coffee shop on campus. I will never go there again.")

[{'label': 'NEGATIVE', 'score': 0.9703740477561951}]
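
The `score` in the outputs above is the confidence of the winning label, so by itself it does not show which way a sentence leans. One way to put all sentences on a single scale is to convert each result to a polarity in [-1, 1]. The sketch below assumes a two-class model whose labels are exactly `POSITIVE` and `NEGATIVE` and whose `score` is the probability of the predicted label; `polarity` is a hypothetical helper, not part of transformers.

```python
def polarity(result):
    """Map a two-class result dict to [-1, 1]; 0 means the model is undecided."""
    sign = 1.0 if result["label"] == "POSITIVE" else -1.0
    # With two classes, P(winner) - P(loser) = 2 * score - 1.
    return sign * (2.0 * result["score"] - 1.0)


# Results copied from the cells above (scores rounded).
results = [
    {"label": "NEGATIVE", "score": 0.8572},
    {"label": "NEGATIVE", "score": 0.5159},
    {"label": "POSITIVE", "score": 0.9983},
]
for r in results:
    print(f"{r['label']:<8} {polarity(r):+.4f}")

# With the pipeline from In [1], a list of sentences can be scored in one call:
# values = [polarity(r) for r in pipe(["sentence one", "sentence two"])]
```

On this scale the neutral coffee-shop sentence lands close to zero, which reflects that the model was nearly undecided even though it reported `NEGATIVE`.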

Fill-mask (predicting a masked character):

In [6]:
fill_masker = pipeline(task = "fill-mask", model = "ethanyt/guwenbert-base")

Some weights of the model checkpoint at ethanyt/guwenbert-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [7]:
output = fill_masker("帘外雨潺潺，春意阑珊。罗[MASK]不耐五更寒。")
list(map(lambda x : (x["token_str"], x["score"]), output))

[('衾', 0.20730605721473694),
 ('[SEP]', 0.1400241106748581),
 ('，', 0.1293717622756958),
 ('枕', 0.04262242466211319),
 ('衣', 0.03205817565321922)]
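
The top predictions above include `[SEP]` and a comma, which are not useful candidates for the missing character. A small filter can drop bracketed special tokens and pure punctuation. This is a sketch under the assumption that special tokens are written in square brackets, as in the output above; `is_content_token` is a hypothetical helper.

```python
import unicodedata


def is_content_token(token):
    """True if `token` is neither a bracketed special token nor pure punctuation."""
    if token.startswith("[") and token.endswith("]"):
        return False  # e.g. [SEP], [MASK]
    # Unicode categories starting with "P" are punctuation; empty tokens are dropped too.
    return not all(unicodedata.category(ch).startswith("P") for ch in token)


# (token, score) pairs copied from the output above (scores rounded).
candidates = [("衾", 0.2073), ("[SEP]", 0.1400), ("，", 0.1294), ("枕", 0.0426), ("衣", 0.0321)]
kept = [(tok, score) for tok, score in candidates if is_content_token(tok)]
print(kept)

# Applied to the real pipeline output from In [7]:
# kept = [(x["token_str"], x["score"]) for x in output if is_content_token(x["token_str"])]
```

Because filtering shrinks the list, it may be worth requesting more candidates first through the `top_k` argument of the fill-mask pipeline call.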

Text-to-image (CPU version; this may be very slow):

In [None]:
import torch
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float32
)

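On the first run, the code downloads the model files. If you hit connection problems, you can download them manually with the command below. Note that it fetches every variant of the model files, which is very large; if you do not need the full set, go to https://aliendao.cn/models/runwayml/stable-diffusion-v1-5 and download the files you need by hand: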

```
python model_download.py --repo_id runwayml/stable-diffusion-v1-5 --mirror
```
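
As an alternative way to avoid fetching every variant, `snapshot_download` from `huggingface_hub` accepts glob patterns for files to skip. The sketch below is not the `model_download.py` script; the pattern list is an assumption about which files the diffusers pipeline can do without, and the sample file names are illustrative rather than a verified listing of the repository. `huggingface_hub` also reads the `HF_ENDPOINT` environment variable, which can point it at a mirror.

```python
from fnmatch import fnmatch

# Assumed to be unnecessary for StableDiffusionPipeline.from_pretrained:
# single-file checkpoints, duplicate .bin weights, and non-EMA weights.
IGNORE_PATTERNS = ["*.ckpt", "*.bin", "*non_ema*"]


def would_download(filename, ignore_patterns=IGNORE_PATTERNS):
    """Preview whether `filename` survives the ignore patterns (shell-style globs)."""
    return not any(fnmatch(filename, pattern) for pattern in ignore_patterns)


def download_model(repo_id, local_dir):
    """Download a filtered snapshot of `repo_id` into `local_dir` (needs network access)."""
    from huggingface_hub import snapshot_download  # imported lazily

    return snapshot_download(
        repo_id=repo_id,
        local_dir=local_dir,
        ignore_patterns=IGNORE_PATTERNS,
    )


# Illustrative file names only.
for name in ["model_index.json", "unet/diffusion_pytorch_model.safetensors", "v1-5-pruned.ckpt"]:
    print(name, would_download(name))

# download_model("runwayml/stable-diffusion-v1-5",
#                "./dataroot/models/runwayml/stable-diffusion-v1-5")
```

The preview helper only mimics the filtering locally so the patterns can be checked before starting a large download.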

Then load the local model files:

```python
import torch
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained(
    "./dataroot/models/runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float32
)
```

In [None]:
prompt = "A blue apple"
pipe.enable_attention_slicing()
image = pipe(prompt, num_inference_steps=10).images[0]

In [None]:
image

GPU version:

In [1]:
import torch
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
)
pipe.to("cuda")

AttributeError: module 'sympy' has no attribute 'Expr'

In [2]:
import torch
print(torch.__version__)
print(torch.cuda.is_available())

NameError: name '_C' is not defined
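
The CPU and GPU variants above differ only in the target device and in `torch_dtype`. A small helper can pick both from the result of `torch.cuda.is_available()`, so the same notebook runs on either kind of machine. This is a sketch; `choose_device_and_dtype` is a hypothetical name, and the dtype is returned as a string so that the helper itself does not depend on PyTorch.

```python
def choose_device_and_dtype(cuda_available):
    """Return (device, dtype_name) matching the CPU and GPU cells above."""
    if cuda_available:
        return "cuda", "float16"  # half precision, as in the GPU cell
    return "cpu", "float32"  # full precision, as in the CPU cell


print(choose_device_and_dtype(False))

# Usage with the real libraries (requires a working PyTorch install):
# import torch
# from diffusers import StableDiffusionPipeline
# device, dtype_name = choose_device_and_dtype(torch.cuda.is_available())
# pipe = StableDiffusionPipeline.from_pretrained(
#     "runwayml/stable-diffusion-v1-5", torch_dtype=getattr(torch, dtype_name)
# ).to(device)
```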

In [5]:
prompt = "A tree on the moon"
pipe.enable_attention_slicing()
image = pipe(prompt, num_inference_steps=50).images[0]

  0%|          | 0/50 [00:00<?, ?it/s]

In [None]:
image

Controlling the random seed:

In [None]:
gen = torch.Generator(device="cuda").manual_seed(123)
image = pipe(prompt, generator=gen, num_inference_steps=50).images[0]
image
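
A fixed seed is most useful when it is recorded next to the image it produced. The sketch below generates one image per seed and saves each under a file name that contains the seed, so any result can be regenerated later. Both helper names are hypothetical, and the naming scheme is only one possible choice.

```python
def seed_filename(prompt, seed):
    """Build a file name that records the prompt and the seed."""
    slug = "_".join(prompt.lower().split())
    return f"{slug}_seed{seed}.png"


def generate_with_seeds(pipe, prompt, seeds, device="cuda", steps=50):
    """Generate one image per seed and save each under a seed-tagged name."""
    import torch  # imported lazily so the naming helper works without PyTorch

    for seed in seeds:
        gen = torch.Generator(device=device).manual_seed(seed)
        image = pipe(prompt, generator=gen, num_inference_steps=steps).images[0]
        image.save(seed_filename(prompt, seed))  # the pipeline returns PIL images


print(seed_filename("A tree on the moon", 123))

# With the pipeline loaded in the cells above:
# generate_with_seeds(pipe, "A tree on the moon", seeds=[123, 124, 125])
```

A fresh `torch.Generator` is created for every seed because a generator's state advances while it is used; reusing one across calls would give different images on the second call.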