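# Create "Passive Voice" Entries for an Instruction Dataset
- This notebook uses OpenAI's GPT-4 to generate "passive voice" entries for an instruction dataset, as in the example below.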

```python
{  
   'instruction': 'Identify the verb in the following sentence',
   'input': 'The cat sleeps on the couch.',
   'output': 'The verb in the sentence is "sleeps."',
   'output_2': 'The sentence is "sleeps."'   #  <---- Newly created entry
}  
```

```bash
pip install -r requirements-extra.txt
```

```python
from importlib.metadata import version

pkgs = ["openai",  # OpenAI API
        "tqdm",    # Progress bar
       ]

for p in pkgs:
    print(f"{p} version: {version(p)}")
```

```
openai version: 1.54.4
tqdm version: 4.66.6
```
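`requirements-extra.txt` asks for `openai>=1.30.3`. As a minimal sketch, assuming plain numeric `MAJOR.MINOR.PATCH` version strings, the installed version could be compared against that minimum with tuple comparison. The helpers `parse_version` and `meets_minimum` are hypothetical names used only for illustration; for full version-spec handling, a dedicated parser such as the `packaging` library is the more robust choice.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(v):
    """Turn '1.54.4' into (1, 54, 4); stops at the first non-numeric part (e.g. '0rc1')."""
    parts = []
    for part in v.split("."):
        if not part.isdigit():
            break  # pre-release or local tags are ignored in this simple sketch
        parts.append(int(part))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is >= the required minimum."""
    return parse_version(installed) >= parse_version(minimum)


# Minimum version taken from requirements-extra.txt (openai>=1.30.3)
minimums = {"openai": "1.30.3"}

for pkg, minimum in minimums.items():
    try:
        installed = version(pkg)
    except PackageNotFoundError:
        print(f"{pkg} is not installed")
        continue
    status = "OK" if meets_minimum(installed, minimum) else "too old"
    print(f"{pkg} {installed} (need >= {minimum}): {status}")
```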


## Test the OpenAI API
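- First, we need to check that the OpenAI API is set up correctly.
- If you don't have an account yet, you can create one on the OpenAI platform.
- Note that the GPT-4 API is not free: you need to add funds to your account (see the billing page).
- Generating roughly 200 "passive voice" entries with the code in this notebook costs about $0.13 (13 cents).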

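#### Provide the API key
- First, provide your OpenAI API key, which you can find on the API keys page.
- Make sure not to share this key with anyone.
- Add this key (it starts with "sk-...") to the `config.json` file in this folder.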

```python
import json
from openai import OpenAI

# Load API key from a JSON file.
# Make sure to replace "sk-..." with your actual API key from https://platform.openai.com/api-keys
with open("config.json", "r") as config_file:
    config = json.load(config_file)
    api_key = config["OPENAI_API_KEY"]

client = OpenAI(api_key=api_key)
```
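The cell above expects `config.json` to contain an object with an `"OPENAI_API_KEY"` field. The `openai` client can also read the key from the `OPENAI_API_KEY` environment variable, which keeps the key out of files in the project folder. Below is a minimal sketch of a loader that tries the config file first and falls back to that environment variable; `load_api_key` is a hypothetical helper, not part of the `openai` library.

```python
import json
import os


def load_api_key(path="config.json", env_var="OPENAI_API_KEY"):
    """Return the API key from a JSON config file, else from an environment variable."""
    if os.path.exists(path):
        with open(path, "r") as config_file:
            config = json.load(config_file)
        key = config.get("OPENAI_API_KEY")
        if key:
            return key
    # Fall back to the environment variable if the file is missing or has no key
    key = os.environ.get(env_var)
    if key:
        return key
    raise RuntimeError(f"No API key found in {path} or ${env_var}")
```

With this helper, the client could be created as `client = OpenAI(api_key=load_api_key())`.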

```python
# First, let's try the API with a simple example to make sure it works as intended
def run_chatgpt(prompt, client, model="gpt-4-turbo"):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    return response.choices[0].message.content


# Prepare input
sentence = "I ate breakfast"
prompt = f"Convert the following sentence to passive voice: '{sentence}'"
run_chatgpt(prompt, client)
```
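`run_chatgpt` only calls `client.chat.completions.create(...)` and reads `response.choices[0].message.content`. Because every call costs money, it can help to dry-run the later loops against a stand-in object with the same shape. The sketch below assumes exactly that shape; `FakeClient` is a hypothetical test double, not an OpenAI class, and `run_chatgpt` is repeated unchanged so the block runs on its own.

```python
from types import SimpleNamespace


class FakeClient:
    """Hypothetical stand-in exposing client.chat.completions.create(...)."""

    def __init__(self, reply_fn):
        self._reply_fn = reply_fn  # maps the prompt text to a canned reply
        self.calls = []  # record of (model, messages) for inspection
        self.chat = SimpleNamespace(
            completions=SimpleNamespace(create=self._create)
        )

    def _create(self, model, messages, temperature=1.0):
        self.calls.append((model, messages))
        content = self._reply_fn(messages[-1]["content"])
        message = SimpleNamespace(content=content)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


def run_chatgpt(prompt, client, model="gpt-4-turbo"):
    # Same function as in the notebook
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    return response.choices[0].message.content


fake = FakeClient(lambda prompt: f"[dry run] {prompt}")
print(run_chatgpt("Convert the following sentence to passive voice: 'I ate breakfast'", fake))
```

Passing `fake` instead of `client` to the loops below would exercise the bookkeeping (progress bar, `output_2` keys, saving) without any API calls.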

#### Create JSON entries
Next, we load the file that we want to modify:

```python
import json

json_file = "instruction-examples.json"

with open(json_file, "r") as file:
    json_data = json.load(file)

print("Number of entries:", len(json_data))
```
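The loops that follow read `entry["output"]` from every element, so they assume `instruction-examples.json` holds a list of dicts, each with a text `"output"` field. A minimal sketch for checking that assumption before spending API credits; `find_invalid_entries` is a hypothetical helper, and treating blank outputs as invalid is a choice of this sketch.

```python
def find_invalid_entries(entries, key="output"):
    """Return indices of entries that are not dicts with a non-blank string under `key`."""
    bad = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            bad.append(i)
        elif not isinstance(entry.get(key), str) or not entry[key].strip():
            bad.append(i)
    return bad


sample = [
    {"instruction": "Identify the verb in the following sentence",
     "input": "The cat sleeps on the couch.",
     "output": 'The verb in the sentence is "sleeps."'},
    {"instruction": "An entry without an output field"},
]
print(find_invalid_entries(sample))  # → [1]
```

In the notebook, `assert not find_invalid_entries(json_data)` would stop early if any entry lacks an output.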

```python
# Try OpenAI's chat API on a small sample first to make sure it works correctly

for entry in json_data[:5]:
    text = entry["output"]
    prompt = f"Without adding any response or explanation, convert the following text to passive voice: {text}"

    print("\nInput:")
    print(">>", text)
    print("\nOutput:")
    print(">>", run_chatgpt(prompt, client))
    print("\n-------------------------")
```
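The same passive-voice prompt string is written out in each of the conversion cells. A small sketch that keeps it in one place, so a wording change only has to be made once; `PASSIVE_PROMPT` and `make_passive_prompt` are hypothetical names, and the template text is copied from the cells.

```python
# Prompt template copied from the notebook cells
PASSIVE_PROMPT = (
    "Without adding any response or explanation, "
    "convert the following text to passive voice: {text}"
)


def make_passive_prompt(text):
    """Build the passive-voice prompt for one entry's output text."""
    return PASSIVE_PROMPT.format(text=text)


print(make_passive_prompt("I ate breakfast"))
```

`str.format` does not re-interpret braces inside the substituted `text`, so outputs that contain `{` or `}` are passed through unchanged.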

- Now we extend the code to add the generated entries to `json_data`, with a progress bar to monitor the process.

```python
from tqdm import tqdm  # a progress bar tool


for i, entry in tqdm(enumerate(json_data[:5]), total=len(json_data[:5])):
    text = entry["output"]
    prompt = f"Without adding any response or explanation, convert the following text to passive voice: {text}"
    json_data[i]["output_2"] = run_chatgpt(prompt, client)
```

If all the steps above run correctly, we can apply the passive-voice conversion to the entire JSON dataset (this takes about 3 minutes).

```python
for i, entry in tqdm(enumerate(json_data), total=len(json_data)):
    text = entry["output"]
    prompt = f"Without adding any response or explanation, convert the following text to passive voice: {text}"
    json_data[i]["output_2"] = run_chatgpt(prompt, client)
```
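The full-dataset loop also recomputes `output_2` for the first five entries, which the previous cell already filled in, and if a request fails partway through, rerunning the cell starts over from the beginning. A minimal sketch of a resumable variant that skips entries which already have a result; `add_missing_outputs` is a hypothetical helper, and `convert` stands for any function mapping one output text to its passive-voice version.

```python
def add_missing_outputs(entries, convert, src_key="output", dst_key="output_2"):
    """Set entry[dst_key] = convert(entry[src_key]) only where it is missing or empty.

    Entries are dicts and are modified in place, so results from a run that
    was interrupted by an exception are kept. Returns the number converted.
    """
    converted = 0
    for entry in entries:
        if entry.get(dst_key):  # already has a non-empty result
            continue
        entry[dst_key] = convert(entry[src_key])
        converted += 1
    return converted


# Offline demo with a fake converter instead of the API
data = [{"output": "I ate breakfast", "output_2": "Breakfast was eaten by me."},
        {"output": "The cat chased the mouse"}]
n = add_missing_outputs(data, lambda text: f"<passive> {text}")
print(n)  # → 1
```

In the notebook, `convert` could be `lambda text: run_chatgpt(prompt_for(text), client)` with a prompt-building function of your choice, and wrapping `entries` in `tqdm(...)` inside the loop would restore the progress bar.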

```python
# Save the modified data next to the original file

new_json_file = json_file.replace(".json", "-modified.json")


with open(new_json_file, "w") as file:
    json.dump(json_data, file, indent=4)  # "indent" for pretty-printing
```
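With `json_file = "instruction-examples.json"`, the data is written to `instruction-examples-modified.json`. A quick sketch for reloading the saved file and confirming that every entry received an `output_2`; `check_saved_file` is a hypothetical helper.

```python
import json


def check_saved_file(path, key="output_2"):
    """Reload a saved JSON list and return (number of entries, indices missing `key`)."""
    with open(path, "r") as file:
        data = json.load(file)
    missing = [i for i, entry in enumerate(data) if not entry.get(key)]
    return len(data), missing


# Usage in the notebook:
# n_entries, missing = check_saved_file(new_json_file)
# print(n_entries, missing)  # an empty list means every entry was converted
```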