<table style="width:100%">
<tr>
<td style="vertical-align:middle; text-align:left;">
<font size="2">
Supplementary code for the <a href="http://mng.bz/orYv">Build a Large Language Model From Scratch</a> book by <a href="https://sebastianraschka.com">Sebastian Raschka</a><br>
<br>Code repository: <a href="https://github.com/rasbt/LLMs-from-scratch">https://github.com/rasbt/LLMs-from-scratch</a>
<br>Chinese translation repository: <a href="https://github.com/GoatCsu/CN-LLMs-from-scratch.git">https://github.com/GoatCsu/CN-LLMs-from-scratch.git</a>
</font>
</td>
<td style="vertical-align:middle; text-align:left;">
<a href="http://mng.bz/orYv"><img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/cover-small.webp" width="100px"></a>
</td>
</tr>
</table>


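# Generating a Preference Dataset with Llama 3.1 70B and Ollama

- **Preference finetuning** is a process for aligning an **instruction-finetuned LLM** with **human preferences**.
- There are multiple ways to create a dataset for preference finetuning:
  1. Use an instruction-finetuned LLM to generate multiple responses and have **humans** rank them according to given preference criteria.
  2. Use an instruction-finetuned LLM to generate multiple responses and have **LLMs** rank them according to given preference criteria.
  3. Use an LLM to generate **preferred** and **dispreferred** responses directly, given certain preference criteria.

- **This notebook takes approach 3.**
- It uses the **70-billion-parameter Llama 3.1 Instruct model**, run through **Ollama**, to generate preference labels for an instruction dataset.
- The expected format of the instruction dataset is as follows: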

### Input


```json
[
    {
        "instruction": "What is the state capital of California?",
        "input": "",
        "output": "The state capital of California is Sacramento."
    },
    {
        "instruction": "Provide a synonym for 'fast'.",
        "input": "",
        "output": "A synonym for 'fast' is 'quick'."
    },
    {
        "instruction": "What is the capital of Greece?",
        "input": "",
        "output": "The capital of Greece is Athens."
    },
...
]
```
The generated dataset looks as follows, where the **more polite response** is labeled **`'chosen'`** (preferred) and the **less polite response** is labeled **`'rejected'`** (dispreferred):


```json
[
    {
        "instruction": "What is the state capital of California?",
        "input": "",
        "output": "The state capital of California is Sacramento.",
        "rejected": "Look, the state capital of California is obviously Sacramento.",
        "chosen": "The state capital of California is Sacramento."
    },
    {
        "instruction": "Provide a synonym for 'fast'.",
        "input": "",
        "output": "A synonym for 'fast' is 'quick'.",
        "chosen": "A suitable alternative to 'fast' would be 'quick'.",
        "rejected": "A synonym for 'fast' is 'quick'."
    },
    {
        "instruction": "What is the capital of Greece?",
        "input": "",
        "output": "The capital of Greece is Athens.",
        "chosen": "I'd be happy to help! The capital of Greece is indeed Athens.",
        "rejected": "The capital of Greece is Athens."
    },
...
]
```
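
The target format above can also be checked programmatically. The following is a minimal sketch, not part of the original notebook: the helper name `is_valid_preference_entry` is hypothetical, and the rule that the original `output` reappears as either `chosen` or `rejected` is an assumption read off the examples shown here.

```python
# Keys that every generated record is expected to carry (taken from the example above).
REQUIRED_KEYS = {"instruction", "input", "output", "chosen", "rejected"}


def is_valid_preference_entry(entry):
    """Return True if the entry has all expected keys and reuses 'output' in one slot."""
    if not REQUIRED_KEYS.issubset(entry):
        return False
    # Assumption: the dataset response is kept verbatim as 'chosen' or 'rejected'.
    return entry["output"] in (entry["chosen"], entry["rejected"])


example = {
    "instruction": "What is the capital of Greece?",
    "input": "",
    "output": "The capital of Greece is Athens.",
    "chosen": "I'd be happy to help! The capital of Greece is indeed Athens.",
    "rejected": "The capital of Greece is Athens.",
}

print(is_valid_preference_entry(example))
```

A check like this is cheap to run over the full output file and would flag entries where the generation step was skipped or interrupted.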

### Output

- The code **does not require a GPU** and runs on a laptop, given enough RAM.


In [1]:
from importlib.metadata import version

pkgs = ["tqdm",    # Progress bar
        ]

for p in pkgs:
    print(f"{p} version: {version(p)}")

tqdm version: 4.66.4


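## Installing Ollama and Downloading Llama 3.1

- **Ollama** is an application for running LLMs efficiently.
- It is a wrapper around **[llama.cpp](https://github.com/ggerganov/llama.cpp)**, which implements LLMs in pure C/C++ to maximize inference efficiency.
- **Note** that Ollama is a tool for **LLM inference only**; it does not support training or finetuning.
- **Before running the code below**, visit **[https://ollama.com](https://ollama.com)** and follow the installation instructions (for example, click the "Download" button and download the Ollama application for your operating system).

- **macOS and Windows users**: click the downloaded Ollama application; if prompted to install the **command-line tools**, choose "yes".
- **Linux users** can use the installation command provided on the Ollama website.

- **In general, before using Ollama from the command line**, either start the Ollama application or run `ollama serve` in a terminal.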

<img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/bonus/ollama-eval/ollama-serve.webp?1">

- **With Ollama running**, execute the following command in a **separate terminal window** to try out the **70-billion-parameter Llama 3.1 model**:


```bash
# 70B model
ollama run llama3.1:70b
```


The output looks as follows:

```
$ ollama run llama3.1:70b
pulling manifest
pulling aa81b541aae6... 100% ▕████████████████▏ 39 GB
pulling 8cf247399e57... 100% ▕████████████████▏ 1.7 KB
pulling f1cd752815fc... 100% ▕████████████████▏ 12 KB
pulling 56bb8bd477a5... 100% ▕████████████████▏ 96 B
pulling 3c1c2d3df5b3... 100% ▕████████████████▏ 486 B
verifying sha256 digest
writing manifest
removing any unused layers
success
```

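- **Note**: `llama3.1:70b` refers to the **instruction-finetuned, 70-billion-parameter Llama 3.1 model**.

- **If your machine has limited resources**, you can use the **smaller 8-billion-parameter Llama 3.1 model** instead
  by replacing `llama3.1:70b` with `llama3.1`.

- **After the download completes**, you will see a **command-line prompt** where you can chat with the model.

- **Try a prompt** such as "What do llamas eat?",
  which should return output similar to the following: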


```
>>> What do llamas eat?
Llamas are ruminant animals, which means they have a four-chambered 
stomach and eat plants that are high in fiber. In the wild, llamas 
typically feed on:
1. Grasses: They love to graze on various types of grasses, including tall 
grasses, wheat, oats, and barley.
```

- Type `/bye` to end this session.

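## Using Ollama's REST API

- **Another way to interact with the model** is through its **REST API** from **Python**, as implemented in the function below.
- **Before running the code in this notebook**, make sure **Ollama is still running**. You can start it in either of these ways:
  - Run `ollama serve` in a terminal
  - Launch the **Ollama application**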
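
Whether the server is up can also be checked from Python before sending any prompts. The sketch below is not part of the original notebook: the function name `is_ollama_reachable` is hypothetical, the base URL is taken from the `localhost:11434` address used later in this notebook, and it assumes the server answers a plain GET on that base URL with HTTP 200.

```python
import urllib.error
import urllib.request


def is_ollama_reachable(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers at base_url, False otherwise."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            # Assumption: a running Ollama server replies with status 200 here.
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


print("Ollama reachable:", is_ollama_reachable())
```

If this prints `False`, start the server first; otherwise the query function would fail with a connection error.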

- **Next, run the code cell below** to query the model and retrieve its response.


- First, we call the API with **a simple example** to make sure it **works as intended**:


In [2]:
import urllib.request
import json


def query_model(prompt, model="llama3.1:70b", url="http://localhost:11434/api/chat"):
    # Create the data payload as a dictionary
    data = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ],
        "options": {
            "seed": 123,
            "temperature": 0,
        }
    }

    # Convert the dictionary to a JSON formatted string and encode it to bytes
    payload = json.dumps(data).encode("utf-8")

    # Create a request object, setting the method to POST and adding necessary headers
    request = urllib.request.Request(url, data=payload, method="POST")
    request.add_header("Content-Type", "application/json")

    # Send the request and capture the response
    response_data = ""
    with urllib.request.urlopen(request) as response:
        # Read and decode the response
        while True:
            line = response.readline().decode("utf-8")
            if not line:
                break
            response_json = json.loads(line)
            response_data += response_json["message"]["content"]

    return response_data


result = query_model("What do Llamas eat?")
print(result)

Llamas are herbivores, which means they primarily eat plants and plant-based foods. Their diet consists of:

1. **Grasses**: Various types of grasses, including timothy grass, orchard grass, and brome grass.
2. **Hay**: High-quality hay, such as alfalfa or clover hay, is a staple in a llama's diet.
3. **Leaves**: Leaves from trees and shrubs, like willow, cottonwood, and mesquite, are also eaten.
4. **Fruits and vegetables**: Llamas enjoy fruits like apples, carrots, and sweet potatoes, as well as leafy greens like kale and spinach.
5. **Grains**: In moderation, llamas can eat grains like oats, barley, and corn.

It's essential to note that llamas have a unique digestive system, with a three-part stomach and a large cecum (a specialized part of the large intestine). This allows them to break down and extract nutrients from plant material more efficiently than many other animals.

A typical llama diet might consist of:

* 1-2% of their body weight in hay per day
* 0.5-1% of their body w
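
`query_model` reads the response line by line because the endpoint, as used above, sends back a sequence of JSON objects, one per line, each carrying a fragment of the reply in `message.content`. The following sketch isolates that accumulation step so it can be run without a server. The function name `join_streamed_content` is hypothetical, and the sample lines are made up for illustration and simplified (a real response carries additional fields).

```python
import json


def join_streamed_content(lines):
    """Concatenate the 'message.content' fragments of newline-delimited JSON chunks."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between chunks
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
    return "".join(parts)


# Illustrative, simplified chunks; not captured from a real server.
sample_lines = [
    '{"message": {"role": "assistant", "content": "Llamas "}, "done": false}',
    '{"message": {"role": "assistant", "content": "eat grass."}, "done": false}',
    '{"message": {"role": "assistant", "content": ""}, "done": true}',
]

text = join_streamed_content(sample_lines)
print(text)
```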

## Load JSON Entries

- Now we move on to the **data generation** part.
- **For illustration**, we use the **`instruction-data.json`** file
  that was originally used for **instruction finetuning in chapter 7**.

In [3]:
from pathlib import Path

json_file = Path("..", "01_main-chapter-code", "instruction-data.json")

with open(json_file, "r") as file:
    json_data = json.load(file)

print("Number of entries:", len(json_data))

Number of entries: 1100


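- **The file is structured as follows**:
  - `'output'`: the **reference response** given in the dataset, which the model is trained to generate via **instruction finetuning**.
  - `'input'` and `'instruction'`: the **inputs** on which the model bases its `'output'`.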


In [4]:
json_data[0]

{'instruction': 'Evaluate the following phrase by transforming it into the spelling given.',
 'input': 'freind --> friend',
 'output': 'The spelling of the given phrase "freind" is incorrect, the correct spelling is "friend".'}

- Below is a **small utility function** that formats the **instruction** and **input**:

In [5]:
def format_input(entry):
    instruction_text = (
        f"Below is an instruction that describes a task. Write a response that "
        f"appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )

    input_text = f"\n\n### Input:\n{entry['input']}" if entry["input"] else ""

    return instruction_text + input_text
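
As a quick illustration, the helper can be applied to the first dataset entry shown above. The block repeats a compact copy of `format_input` so that it runs on its own; `sample_entry` and `no_input_entry` are names introduced only for this example.

```python
def format_input(entry):
    """Compact copy of the helper above, repeated so this example is self-contained."""
    instruction_text = (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )
    # The '### Input' section is only added when the entry has a non-empty input.
    input_text = f"\n\n### Input:\n{entry['input']}" if entry["input"] else ""
    return instruction_text + input_text


sample_entry = {
    "instruction": "Evaluate the following phrase by transforming it into the spelling given.",
    "input": "freind --> friend",
    "output": 'The spelling of the given phrase "freind" is incorrect, the correct spelling is "friend".',
}
no_input_entry = {
    "instruction": "What is the capital of Greece?",
    "input": "",
    "output": "The capital of Greece is Athens.",
}

formatted = format_input(sample_entry)
print(formatted)
print("---")
print(format_input(no_input_entry))
```

The second call shows that entries with an empty `'input'` produce a prompt without the `### Input` section.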

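- Now we use the **Ollama API** to generate a **`'chosen'`** (preferred) and a **`'rejected'`** (dispreferred) response
  for **preference tuning** a model.
- **For illustration**, the generated answers are made to differ clearly in their **level of politeness**.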

In [6]:
import random


for entry in json_data[:5]:
    
    politeness = random.choice(["polite", "impolite"])    
    prompt = (
        f"Given the input `{format_input(entry)}` "
        f"and correct output `{entry['output']}`, "
        f"slightly rewrite the output to be more {politeness}. "
        "Keep the modification minimal. "
        "Only return the generated response and nothing else."
    )
    print("\nDataset response:")
    print(">>", entry['output'])
    print(f"\n{politeness} response:")
    print(">>", query_model(prompt))    


Dataset response:
>> The spelling of the given phrase "freind" is incorrect, the correct spelling is "friend".

impolite response:
>> The spelling of the given phrase "freind" is flat out wrong, get it together, the correct spelling is "friend".

Dataset response:
>> He goes to the park every day.

polite response:
>> He goes to the park daily, if I'm not mistaken.

Dataset response:
>> 45 kilometers is 45000 meters.

polite response:
>> 45 kilometers is equivalent to 45000 meters.

Dataset response:
>> Although it was raining, they went for a walk.

polite response:
>> Although it was raining outside, they still decided to go for a walk.

Dataset response:
>> 1, 4, 9, 16, 25, 36, 49, 64, 81, 100.

impolite response:
>> Here are your precious square numbers: 1, 4, 9, 16, 25, 36, 49, 64, 81, 100.
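
Python joins adjacent string literals without inserting any separator, so each fragment of a multi-part prompt needs its own trailing space for the sentences to stay apart. The sketch below factors the prompt into a hypothetical helper, `build_rewrite_prompt`, so the final text can be printed and inspected without calling the model; the shortened `formatted_input` string is an illustrative stand-in for the output of `format_input`.

```python
def build_rewrite_prompt(formatted_input, output, politeness):
    """Assemble the rewrite prompt; each fragment ends with a space on purpose."""
    return (
        f"Given the input `{formatted_input}` "
        f"and correct output `{output}`, "
        f"slightly rewrite the output to be more {politeness}. "
        "Keep the modification minimal. "
        "Only return the generated response and nothing else."
    )


# Shortened stand-in for format_input(entry), used only for this illustration.
formatted_input = "### Instruction:\nWhat is the capital of Greece?"
prompt = build_rewrite_prompt(
    formatted_input, "The capital of Greece is Athens.", "polite"
)
print(prompt)
```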


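- **If the generated responses above look reasonable**, we can move on to the **next step** and apply the prompt to the **whole dataset**.
- **Two fields are added to each entry**:
  - **`'chosen'`**: the **preferred** response
  - **`'rejected'`**: the **dispreferred** response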

In [7]:
import random
from tqdm import tqdm

def generate_model_responses(json_data):

    for i, entry in enumerate(tqdm(json_data, desc="Writing entries")):
        politeness = random.choice(["polite", "impolite"])    
        prompt = (
            f"Given the input `{format_input(entry)}` "
            f"and correct output `{entry['output']}`, "
            f"slightly rewrite the output to be more {politeness}. "
            "Keep the modification minimal. "
            "Only return the generated response and nothing else."
        )
        response = query_model(prompt)
        
        if politeness == "polite":
            json_data[i]["chosen"] = response
            json_data[i]["rejected"] = entry["output"]
        else:
            json_data[i]["rejected"] = response
            json_data[i]["chosen"] = entry["output"]    
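
The labeling rule in the function above can be isolated and exercised without a model call. The following is a minimal sketch under stated assumptions: `assign_preference` is a hypothetical helper that returns a labeled copy instead of modifying the entry in place, the response string stands in for what `query_model` would return, and seeding `random` is an optional addition that only makes the polite/impolite draw repeatable.

```python
import random


def assign_preference(entry, response, politeness):
    """Return a copy of entry with 'chosen' and 'rejected' set by the politeness rule."""
    labeled = dict(entry)
    if politeness == "polite":
        # A more polite rewrite is preferred over the original dataset response.
        labeled["chosen"] = response
        labeled["rejected"] = entry["output"]
    else:
        # A less polite rewrite is dispreferred; the original response is kept as chosen.
        labeled["rejected"] = response
        labeled["chosen"] = entry["output"]
    return labeled


entry = {
    "instruction": "What is the capital of Greece?",
    "input": "",
    "output": "The capital of Greece is Athens.",
}
# Stand-in for a model response; no API call is made in this sketch.
stub_response = "I'd be happy to help! The capital of Greece is indeed Athens."

random.seed(123)  # assumption: fixed seed so the random draw is the same on every run
politeness = random.choice(["polite", "impolite"])
labeled = assign_preference(entry, stub_response, politeness)
print(politeness, "->", labeled["chosen"])
```

Keeping the rule in a pure function like this makes it easy to unit-test both branches before committing to a long generation run.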

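- Now we apply the generation step to the **whole dataset**; the run recorded below took about 17 minutes for 1,100 entries.
- **Note** that, at the time of writing, Ollama is **not fully deterministic across operating systems**,
  so the responses you get may **differ slightly** from the examples shown above.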

In [8]:
generate_model_responses(json_data)

Writing entries: 100%|██████████| 1100/1100 [17:20<00:00,  1.06it/s]


In [10]:
with open("instruction-data-with-preference.json", "w") as file:
    json.dump(json_data, file, indent=4)
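
After saving, it can be worth reloading the file to confirm that every entry received both preference fields. The sketch below is an illustrative addition: `count_complete_entries` is a hypothetical helper, and it is demonstrated on a two-entry sample written to a temporary directory rather than on the real output file.

```python
import json
import tempfile
from pathlib import Path


def count_complete_entries(path):
    """Return (entries with both 'chosen' and 'rejected', total entries) for a JSON file."""
    with open(path, "r", encoding="utf-8") as file:
        entries = json.load(file)
    complete = sum(1 for e in entries if "chosen" in e and "rejected" in e)
    return complete, len(entries)


# Small sample: the first entry is fully labeled, the second is not.
sample = [
    {
        "instruction": "What is the capital of Greece?",
        "input": "",
        "output": "The capital of Greece is Athens.",
        "chosen": "I'd be happy to help! The capital of Greece is indeed Athens.",
        "rejected": "The capital of Greece is Athens.",
    },
    {
        "instruction": "Provide a synonym for 'fast'.",
        "input": "",
        "output": "A synonym for 'fast' is 'quick'.",
    },
]

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "sample-with-preference.json"
    with open(sample_path, "w", encoding="utf-8") as file:
        json.dump(sample, file, indent=4)
    complete, total = count_complete_entries(sample_path)

print(f"{complete} of {total} entries have both preference fields")
```

For the file written above, the corresponding call would be `count_complete_entries("instruction-data-with-preference.json")`.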