# 第六章 文本转换

<div class="toc">
 <ul class="toc-item">
     <li><span><a href="#一引言" data-toc-modified-id="一、引言">一、引言</a></span></li>
     <li>
         <span><a href="#二文本翻译" data-toc-modified-id="二、文本翻译">二、文本翻译</a></span>
         <ul class="toc-item">
             <li><span><a href="#21-中文转西班牙语" data-toc-modified-id="2.1 中文转西班牙语">2.1 中文转西班牙语</a></span></li> 
             <li><span><a href="#22-识别语种" data-toc-modified-id="2.2 识别语种">2.2 识别语种</a></span></li>
             <li><span><a href="#23-多语种翻译" data-toc-modified-id="2.3 多语种翻译">2.3 多语种翻译</a></span></li>
             <li><span><a href="#24-同时进行语气转换" data-toc-modified-id="2.4 同时进行语气转换">2.4 同时进行语气转换</a></span></li>
             <li><span><a href="#25-通用翻译器" data-toc-modified-id="2.5 通用翻译器">2.5 通用翻译器</a></span></li>
             </ul>
         </li>
     <li><span><a href="#三语气与写作风格调整" data-toc-modified-id="三、语气与写作风格调整">三、语气与写作风格调整</a></span></li>
     <li><span><a href="#四文件格式转换" data-toc-modified-id="四、文件格式转换">四、文件格式转换</a></span></li>
     <li><span><a href="#五拼写及语法纠正" data-toc-modified-id="五、拼写及语法纠正">五、拼写及语法纠正</a></span></li>
     <li><span><a href="#六综合样例" data-toc-modified-id="六、综合样例">六、综合样例</a></span></li>
     </ul>
</div>

## 一、引言

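LLMs are very <span style="color:red">good at transforming input into a different format</span>. Typical applications include multilingual translation, spelling and grammar correction, tone adjustment, and format conversion.

This chapter shows how to implement these text transformations programmatically by calling the API.

First, we import the OpenAI package, load the API key, and define the `get_completion` function.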

In [1]:
import os
from openai import OpenAI
from dotenv import load_dotenv, find_dotenv

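# Read local/project environment variables.

# find_dotenv() locates the path of the .env file.
# load_dotenv() reads that .env file and loads its variables into the current environment.
# If the variable is already set globally, this line has no effect.
_ = load_dotenv(find_dotenv())

# Read the OPENAI_API_KEY environment variable.
openai_api_key = os.getenv("OPENAI_API_KEY")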

In [2]:
client = OpenAI(api_key=openai_api_key)

def get_completion(prompt, model="gpt-4o-mini"):
    '''
    prompt: the prompt text
    model: the model to call, gpt-4o-mini by default
    '''
    # Call the API
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    # Return the generated text
    return response.choices[0].message.content

## 二、文本翻译

### 2.1 中文转西班牙语

In [3]:
prompt = f"""
请将以下使用三个反引号标记的中文翻译成西班牙语:

注意，不要生成除了目标文本翻译之外的其他内容。

翻译文本：```您好，我想订购一个搅拌机。```
"""
response = get_completion(prompt)
print(response)

Hola, me gustaría pedir una licuadora.
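
The translation prompts in this section share one pattern: an instruction, a constraint on the output, and the text wrapped in triple backticks. A minimal sketch of a reusable builder follows. `build_translation_prompt` is a hypothetical helper that is not part of the chapter's code, and replacing backticks inside the user text is an assumption about how to keep the delimiters unambiguous.

```python
def build_translation_prompt(text: str, target_language: str) -> str:
    """Build a translation prompt (hypothetical helper, not from the chapter)."""
    # Assumption: replace backtick runs so user text cannot close the delimiter early.
    safe_text = text.replace("```", "'''")
    return (
        f"请将以下使用三个反引号标记的文本翻译成{target_language}。\n\n"
        "注意，不要生成除了目标文本翻译之外的其他内容。\n\n"
        f"翻译文本：```{safe_text}```"
    )


prompt = build_translation_prompt("您好，我想订购一个搅拌机。", "西班牙语")
print(prompt)
```

The resulting string can be passed to `get_completion(prompt)` exactly like the hand-written prompt above.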


### 2.2 识别语种

In [8]:
prompt = f"""
请告诉我以下使用三个反引号标记的文本是什么语种。

注意，不要生成除了答案之外的其他内容。

文本：```Combien coûte le lampadaire?```
"""
response = get_completion(prompt)
print(response)

法语
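
Even when the prompt asks for the answer only, a model may still append punctuation or whitespace to the language name. The sketch below normalizes such a label before it is used in later logic. `LANGUAGE_CODES` and `normalize_language` are hypothetical names introduced for illustration, and the mapping covers only the languages that appear in this chapter.

```python
# Hypothetical mapping from Chinese language names to ISO 639-1 codes.
LANGUAGE_CODES = {
    "法语": "fr",
    "西班牙语": "es",
    "意大利语": "it",
    "波兰语": "pl",
    "中文": "zh",
    "英语": "en",
}


def normalize_language(label: str):
    """Strip whitespace and trailing punctuation, then look up the code.

    Returns None when the label is not in the mapping.
    """
    cleaned = label.strip().strip("。.!！ ")
    return LANGUAGE_CODES.get(cleaned)


print(normalize_language(" 法语。\n"))
```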


### 2.3 多语种翻译

In [6]:
prompt = f"""
请将以下文本分别翻译成中文、英文、法语和西班牙语: 
```I want to order a basketball.```
"""
response = get_completion(prompt)
print(response)

中文: 我想订购一个篮球。

English: I want to order a basketball.

Français: Je veux commander un ballon de basket.

Español: Quiero pedir una pelota de baloncesto.


### 2.4 同时进行语气转换

In [7]:
prompt = f"""
请将以下文本翻译成中文，分别展示成正式与非正式两种语气: 
```Would you like to order a pillow?```
"""
response = get_completion(prompt)
print(response)

正式语气：
请问您是否需要订购枕头？

非正式语气：
你想买个枕头不？


### 2.5 通用翻译器

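With globalization and cross-border commerce, users may come from many countries and write in different languages. We therefore want a universal translator that identifies the language of each message and translates it into the target user's native language, which makes cross-border communication easier.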

In [5]:
user_messages = [
  "La performance du système est plus lente que d'habitude.",  # System performance is slower than normal         
  "Mi monitor tiene píxeles que no se iluminan.",              # My monitor has pixels that are not lighting
  "Il mio mouse non funziona",                                 # My mouse is not working
  "Mój klawisz Ctrl jest zepsuty",                             # My keyboard has a broken control key
  "我的屏幕在闪烁"                                             # My screen is flashing
]

In [6]:
for issue in user_messages:
    prompt = f"告诉我以下用三个反引号标记的文本是什么语种，请直接输出语种，如法语，无需输出标点符号: ```{issue}```"
    lang = get_completion(prompt)
    print(f"原始消息 ({lang}): {issue}\n")

    prompt = f"""
    请将以下用三个反引号标记的消息分别翻译成英文和中文。

    输出请按照以下格式：
    中文翻译：xxx
    英文翻译：yyy

    不要生成除了指定格式之外的任何内容
    
    消息：```{issue}```
    """
    response = get_completion(prompt)
    print(response, "\n=========================================")

原始消息 (法语): La performance du système est plus lente que d'habitude.

中文翻译：系统的性能比平时慢。  
英文翻译：The system's performance is slower than usual. 
原始消息 (西班牙语): Mi monitor tiene píxeles que no se iluminan.

中文翻译：我的显示器有一些像素不亮。  
英文翻译：My monitor has pixels that do not light up. 
原始消息 (意大利语): Il mio mouse non funziona

中文翻译：我的鼠标不工作  
英文翻译：My mouse is not working 
原始消息 (波兰语): Mój klawisz Ctrl jest zepsuty

中文翻译：我的Ctrl键坏了  
英文翻译：My Ctrl key is broken 
原始消息 (中文): 我的屏幕在闪烁

中文翻译：我的屏幕在闪烁  
英文翻译：My screen is flickering 
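
Because the prompt fixes the output format (`中文翻译：xxx` / `英文翻译：yyy`), the response can be turned into structured data for downstream use. The following is a minimal sketch under that assumption. `parse_translations` is a hypothetical helper, and it tolerates both the full-width and the ASCII colon because a model may use either.

```python
def parse_translations(response: str) -> dict:
    """Parse '中文翻译：…' / '英文翻译：…' lines into a dict (hypothetical helper)."""
    labels = {"中文翻译": "zh", "英文翻译": "en"}
    result = {}
    for line in response.splitlines():
        line = line.strip()
        for label, key in labels.items():
            for sep in ("：", ":"):  # full-width or ASCII colon
                prefix = label + sep
                if line.startswith(prefix):
                    result[key] = line[len(prefix):].strip()
    return result


sample = "中文翻译：系统的性能比平时慢。  \n英文翻译：The system's performance is slower than usual. "
parsed = parse_translations(sample)
print(parsed)
```

If the model ignores the requested format, the corresponding key is simply missing from the result, so callers should check for it.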


## 三、语气与写作风格调整

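The tone of a piece of writing usually depends on its audience. A work email, for example, calls for a formal tone and written-style vocabulary, whereas a WeChat conversation with friends of the same age tends to be relaxed and colloquial.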

In [13]:
prompt = f"""
将以下文本翻译成商务信函的格式: 
```小老弟，我小羊，上回你说咱部门要采购的显示器是多少寸来着？```
"""
response = get_completion(prompt)
print(response)

以下为翻译后的商务信函格式：

---
[您的姓名]  
[您的职位]  
[您的部门]  
[日期]

[收件人姓名] 先生/女士 敬启：

关于此前沟通中提及的部门显示器采购事宜，特此致信确认所需显示器的具体尺寸规格。如蒙告知相关参数要求，以便推进后续采购流程，将不胜感激。

此致  
商祺！

[您的姓名] 谨上  
[您的职位]  
[您的部门]  

---

说明：  
1. 保留了核心信息（显示器采购、尺寸确认），但采用正式措辞  
2. 使用标准商务信函结构（发件人信息、日期、尊称、事由说明、请求确认、敬辞）  
3. 去除了口语化称呼（"小老弟"）和自称（"小羊"），改为规范署名  
4. 增加了推动事务进展的正式请求表述  
5. 可根据实际情况补充采购编号、预算代码等详细信息


## 四、文件格式转换

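ChatGPT is very good at converting between formats, for example from JSON to HTML, XML, or Markdown. In the example below, we have a JSON object containing a list of restaurant employees' names and email addresses, and we want to convert it from JSON to HTML.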

In [14]:
data_json = { "resturant employees" :[ 
    {"name":"Shyam", "email":"shyamjaiswal@gmail.com"},
    {"name":"Bob", "email":"bob32@gmail.com"},
    {"name":"Jai", "email":"jai87@gmail.com"}
]}

In [20]:
prompt = f"""
请将以下用尖括号标记的Python字典转换为HTML表格，要求保留表格的标题和列名。

请注意，不要生成指定html表格之外的任何内容。
生成的html表格不要是markdown格式，不需要```html标记。
请生成标准的html格式。

<{data_json}>
"""
response = get_completion(prompt)
print(response)

<table>
<caption>resturant employees</caption>
<thead>
<tr>
<th>name</th>
<th>email</th>
</tr>
</thead>
<tbody>
<tr>
<td>Shyam</td>
<td>shyamjaiswal@gmail.com</td>
</tr>
<tr>
<td>Bob</td>
<td>bob32@gmail.com</td>
</tr>
<tr>
<td>Jai</td>
<td>jai87@gmail.com</td>
</tr>
</tbody>
</table>


In [22]:
from IPython.display import display, Markdown, Latex, HTML, JSON
display(HTML(response))

| name | email |
| --- | --- |
| Shyam | shyamjaiswal@gmail.com |
| Bob | bob32@gmail.com |
| Jai | jai87@gmail.com |
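
A model-generated table can silently drop or alter a row, so it may be worth checking the HTML against the source data. The sketch below uses only the standard library's `html.parser`. `TableRows` and `table_matches` are hypothetical names, and the check assumes a simple table whose first row holds the column names.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def table_matches(html_text, records):
    """Return True if the table's rows equal the list of dicts in `records`."""
    parser = TableRows()
    parser.feed(html_text)
    parser.close()
    if not parser.rows:
        return False
    header, *body = [[cell.strip() for cell in row] for row in parser.rows]
    return [dict(zip(header, row)) for row in body] == records


html_text = (
    "<table>\n<caption>staff</caption>\n<thead>\n<tr>\n<th>name</th>\n<th>email</th>\n</tr>\n</thead>\n"
    "<tbody>\n<tr>\n<td>Bob</td>\n<td>bob32@gmail.com</td>\n</tr>\n</tbody>\n</table>"
)
records = [{"name": "Bob", "email": "bob32@gmail.com"}]
print(table_matches(html_text, records))
```

With the chapter's data, the call would look like `table_matches(response, data_json["resturant employees"])`.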


## 五、拼写及语法纠正

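Checking and correcting spelling and grammar is a very common need, especially when writing in a non-native language. Proofreading matters, for example, when posting on a forum or publishing a paper in English.

The example below takes a list of sentences, some of which contain spelling or grammar problems. We loop over the sentences and ask the model to proofread each one: if a sentence is correct, the model should output “未发现错误” ("no errors found"); otherwise it should output the corrected text.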

In [23]:
text = [ 
  "The girl with the black and white puppies have a ball.",  # The girl has a ball.
  "Yolanda has her notebook.", # ok
  "Its going to be a long day. Does the car need it’s oil changed?",  # Homonyms
  "Their goes my freedom. There going to bring they’re suitcases.",  # Homonyms
  "Your going to need you’re notebook.",  # Homonyms
  "That medicine effects my ability to sleep. Have you heard of the butterfly affect?", # Homonyms
  "This phrase is to cherck chatGPT for spelling abilitty"  # spelling
]

In [25]:
import time

for i in range(len(text)):
    time.sleep(20)  # wait between requests (e.g. to stay within API rate limits)
    prompt = f"""请校对并更正以下文本，注意纠正文本保持原始语种，无需输出原始文本。
    如果您没有发现任何错误，请输出“未发现错误”。
    
    例如：
    输入：I are happy.
    输出：I am happy.
    ```{text[i]}```"""
    response = get_completion(prompt)
    print(i, response)

0 The girl with the black and white puppies has a ball.
1 未发现错误。
2 It's going to be a long day. Does the car need its oil changed?
3 There goes my freedom. They're going to bring their suitcases.
4 You're going to need your notebook.
5 That medicine affects my ability to sleep. Have you heard of the butterfly effect?
6 This phrase is to check ChatGPT for spelling ability.
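
The loop above waits a fixed 20 seconds before every request. An alternative is to call immediately and wait only after a failure, doubling the delay each time. The sketch below illustrates this retry-with-exponential-backoff idea. `call_with_retry` is a hypothetical helper, and catching every `Exception` is a simplifying assumption; in real use you would probably narrow it to the client's rate-limit error.

```python
import time


def call_with_retry(func, *args, retries=3, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call func; on failure wait base_delay * 2**attempt seconds and try again."""
    for attempt in range(retries):
        try:
            return func(*args, **kwargs)
        except Exception:  # assumption: any error is treated as retryable
            if attempt == retries - 1:
                raise  # out of attempts, surface the last error
            sleep(base_delay * 2 ** attempt)


# Demo with a stand-in function that fails twice, so no API key is needed.
calls = {"n": 0}
delays = []


def flaky(x):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return x * 2


result = call_with_retry(flaky, 21, sleep=delays.append)
print(result, delays)
```

In the proofreading loop this would be used as `call_with_retry(get_completion, prompt)`.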


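The following is a simple grammar-correction example (translator's note: similar to what Grammarly does). The input is a review of a panda plush toy, and the output is the corrected text. The prompt used here is fairly simple; you could also ask the model to change the tone.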

In [1]:
text = f"""
Got this for my daughter for her birthday cuz she keeps taking \
mine from my room.  Yes, adults also like pandas too.  She takes \
it everywhere with her, and it's super soft and cute.  One of the \
ears is a bit lower than the other, and I don't think that was \
designed to be asymmetrical. It's a bit small for what I paid for it \
though. I think there might be other options that are bigger for \
the same price.  It arrived a day earlier than expected, so I got \
to play with it myself before I gave it to my daughter.
"""

In [6]:
prompt = f"""
校对并更正以下用三个反引号标注的商品评论。

请满足以下要求：
1.保留原始评论的语种。
2.输出内容只包含校对后的评论内容，不包含其他任何内容。
3.保持原始评论的语言风格。
4.输出中不需要加```标记

评论：```{text}```
"""
response = get_completion(prompt)
print(response)

Got this for my daughter for her birthday because she kept taking mine from my room. Yes, adults like pandas too. She takes it everywhere with her, and it's super soft and cute. One of the ears sits a bit lower than the other, but I don't think that was intentional. It's slightly smaller than I expected for the price. There might be other options offering bigger sizes at the same price point. It arrived a day earlier than expected, so I got to play with it myself before giving it to my daughter.


We use the `redlines` package to display the corrections in detail by comparing the original text with the revised one:

In [7]:
from redlines import Redlines
from IPython.display import display, Markdown

diff = Redlines(text,response)
display(Markdown(diff.output_markdown))

Got this for my daughter for her birthday <span style='color:red;font-weight:700;text-decoration:line-through;'>cuz </span><span style='color:green;font-weight:700;'>because </span>she <span style='color:red;font-weight:700;text-decoration:line-through;'>keeps </span><span style='color:green;font-weight:700;'>kept </span>taking mine from my <span style='color:red;font-weight:700;text-decoration:line-through;'>room.  </span><span style='color:green;font-weight:700;'>room. </span>Yes, adults <span style='color:red;font-weight:700;text-decoration:line-through;'>also </span>like pandas <span style='color:red;font-weight:700;text-decoration:line-through;'>too.  </span><span style='color:green;font-weight:700;'>too. </span>She takes it everywhere with her, and it's super soft and <span style='color:red;font-weight:700;text-decoration:line-through;'>cute.  </span><span style='color:green;font-weight:700;'>cute. </span>One of the ears <span style='color:red;font-weight:700;text-decoration:line-through;'>is </span><span style='color:green;font-weight:700;'>sits </span>a bit lower than the other, <span style='color:red;font-weight:700;text-decoration:line-through;'>and </span><span style='color:green;font-weight:700;'>but </span>I don't think that was <span style='color:red;font-weight:700;text-decoration:line-through;'>designed to be asymmetrical. </span><span style='color:green;font-weight:700;'>intentional. </span>It's <span style='color:red;font-weight:700;text-decoration:line-through;'>a bit small </span><span style='color:green;font-weight:700;'>slightly smaller than I expected </span>for <span style='color:red;font-weight:700;text-decoration:line-through;'>what I paid for it though. I think there </span><span style='color:green;font-weight:700;'>the price. 
There </span>might be other options <span style='color:red;font-weight:700;text-decoration:line-through;'>that are </span><span style='color:green;font-weight:700;'>offering </span>bigger <span style='color:red;font-weight:700;text-decoration:line-through;'>for </span><span style='color:green;font-weight:700;'>sizes at </span>the same <span style='color:red;font-weight:700;text-decoration:line-through;'>price.  </span><span style='color:green;font-weight:700;'>price point. </span>It arrived a day earlier than expected, so I got to play with it myself before <span style='color:red;font-weight:700;text-decoration:line-through;'>I gave </span><span style='color:green;font-weight:700;'>giving </span>it to my daughter.
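
If the `redlines` package is not available, a rough word-level comparison can be produced with the standard library's `difflib`. This is a sketch of a substitute technique, not how `redlines` works internally. `word_diff` is a hypothetical helper that marks deletions as `[-…-]` and insertions as `{+…+}`.

```python
import difflib


def word_diff(original: str, revised: str) -> str:
    """Return a word-level diff with [-deleted-] and {+inserted+} markers."""
    a, b = original.split(), revised.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.extend(a[i1:i2])
            continue
        if i2 > i1:  # words removed from the original
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if j2 > j1:  # words added in the revision
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


print(word_diff("Yes, adults also like pandas too.", "Yes, adults like pandas too."))
```

Splitting on whitespace discards the original spacing, so differences such as double spaces after a period are not reported.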

## 六、综合样例
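The example below takes the same review and uses a single prompt to perform translation, spelling correction, style adjustment, and format conversion at once.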

In [8]:
text = f"""
Got this for my daughter for her birthday cuz she keeps taking \
mine from my room.  Yes, adults also like pandas too.  She takes \
it everywhere with her, and it's super soft and cute.  One of the \
ears is a bit lower than the other, and I don't think that was \
designed to be asymmetrical. It's a bit small for what I paid for it \
though. I think there might be other options that are bigger for \
the same price.  It arrived a day earlier than expected, so I got \
to play with it myself before I gave it to my daughter.
"""



In [9]:
prompt = f"""
针对以下三个反引号之间的英文评论文本，
首先进行拼写及语法纠错，
然后将其转化成中文，
再将其转化成优质淘宝评论的风格，从各种角度出发，分别说明产品的优点与缺点，并进行总结。
润色一下描述，使评论更具有吸引力。
输出结果格式为：
【优点】xxx
【缺点】xxx
【总结】xxx
注意，只需按照输出要求输出，不要输出其他内容。注意分段输出。
将结果输出成Markdown格式。
```{text}```
"""
response = get_completion(prompt)
display(Markdown(response))

```markdown
【优点】  
- 毛绒触感堪比云朵，女儿爱不释手到走哪带哪  
- 萌到犯规的熊猫脸连大人都想私藏  
- 物流超给力提前惊喜送达，私享把玩时间  
- 送礼自用两相宜，俘获全年龄段少女心  

【缺点】  
- 耳朵高低肩实属做工小遗憾，非个性设计  
- 体型比预期娇小，性价比有待提升  
- 同价位区间有更大尺寸竞品可选  

【总结】  
虽然存在细微做工瑕疵和尺寸落差，但无敌软萌的手感与治愈系颜值完全征服大小朋友！作为送礼佳品依旧值得推荐，期待推出加大版满足不同需求~
```
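
The response above is wrapped in a Markdown code fence even though the prompt asked only for Markdown output. The sketch below removes such a fence and splits the text into the three requested sections. `parse_review` is a hypothetical helper, and it assumes the 【…】 labels appear only as section headers.

```python
import re


def parse_review(response: str) -> dict:
    """Strip an optional code fence and split on 【label】 headers."""
    text = response.strip()
    text = re.sub(r"^```[a-zA-Z]*\n", "", text)  # opening fence, e.g. ```markdown
    text = re.sub(r"\n```$", "", text)           # closing fence
    sections = {}
    # Each section runs from its 【label】 to the next label or the end of the text.
    for label, body in re.findall(r"【(.+?)】(.*?)(?=【|\Z)", text, flags=re.S):
        sections[label] = body.strip()
    return sections


sample = "```markdown\n【优点】  \n- 软\n\n【缺点】  \n- 小\n\n【总结】  \n值得推荐\n```"
sections = parse_review(sample)
print(sections)
```

To render the result without the fence, you could join the sections back together and pass the string to `display(Markdown(...))`.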