<a href="https://colab.research.google.com/github/zinojeng/openai/blob/main/translation_agent_litellm.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Translation Agent: Agentic translation using reflection workflow

This is a Colab adaptation of https://github.com/andrewyng/translation-agent

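1. Produce an initial translation with an LLM
2. Have the LLM reflect on that initial translation
3. Produce a second translation with the LLM, guided by the reflection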

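- The original splits long texts into chunks (using LangChain's RecursiveCharacterTextSplitter) so that no single call exceeds the model's limit; this simplified version does not. The maximum text length therefore depends on the model; gpt-4o, for example, has a 128k-token context window.
- This version sends its LLM API requests through LiteLLM, which makes it easy to switch between models.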
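For texts that exceed a model's limit, one option is to split on paragraph boundaries before translating. The following is only a minimal sketch: it counts characters rather than tokens, it is not how LangChain's RecursiveCharacterTextSplitter works, and the name `split_into_chunks` and the `max_chars` default are assumptions made for illustration.

```python
def split_into_chunks(text: str, max_chars: int = 4000) -> list[str]:
    """Greedily pack paragraphs into chunks of at most max_chars characters."""
    chunks: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Try to append the paragraph to the chunk being built.
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single paragraph longer than max_chars is kept whole here.
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed through the translation steps separately and the results joined with blank lines. Translating chunks in isolation loses cross-chunk context, which the original project handles more carefully.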

Author: ihower https://ihower.tw

## 1. Set the OpenAI API key (skip this if you have already set the Colab secret)

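First add your OpenAI API key as a Google Colab secret (click the key icon in the left sidebar). Name it OPENAI_API_KEY, which is the name the code below reads; the value is your key, which starts with sk-.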

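To get an OpenAI API key, sign up at https://platform.openai.com/, add a credit card, and load USD 5 of credit.

Once this is set, you won't need to set it again for any other Colab notebook I share.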

<img src="https://ihower.tw/images/google-colab-secret.jpg" width="400">

In [4]:
from google.colab import userdata
import os

os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

## 2. Choose the source and target languages

Here I translate from English to Traditional Chinese.

In [22]:
source_lang = "English"
target_lang = "Traditional Chinese"
country = "Taiwan"

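## 3. Provide the source text

You can upload a text file named source.txt to the Colab directory (click the folder icon in the left sidebar and drag the file in), or edit the source_text variable below directly. The cells below also read a PDF if it is uploaded as source.pdf.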

In [32]:
pip install PyMuPDF

Collecting PyMuPDF
  Downloading PyMuPDF-1.24.7-cp310-none-manylinux2014_x86_64.whl (3.5 MB)
Collecting PyMuPDFb==1.24.6 (from PyMuPDF)
  Downloading PyMuPDFb-1.24.6-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (15.7 MB)
Installing collected packages: PyMuPDFb, PyMuPDF
Successfully installed PyMuPDF-1.24.7 PyMuPDFb-1.24.6


In [None]:
import os

import fitz  # PyMuPDF

def read_pdf(file_path):
    # Open the PDF; the context manager closes it when done
    with fitz.open(file_path) as pdf_document:
        # Start with an empty string that collects the text of every page
        text = ""
        # Walk through each page and extract its text
        for page in pdf_document:
            text += page.get_text()
    return text

# Read the file only if it exists
file_path = '/content/source.pdf'
if os.path.exists(file_path):
    source_text = read_pdf(file_path)
    print(source_text)
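Text extracted from a PDF usually keeps the page layout's hard line breaks, which can confuse a translation prompt. The sketch below shows one possible cleanup step, assuming prose input; `clean_pdf_text` is a hypothetical helper, not part of PyMuPDF. It would also merge genuinely hyphenated compounds that happen to break across lines, and it is unsuitable for verse, where line breaks carry meaning.

```python
import re

def clean_pdf_text(text: str) -> str:
    """Rejoin lines that PDF extraction split in the middle of a paragraph."""
    # Join words hyphenated across a line break: "trans-\nlation" -> "translation".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Treat blank lines as paragraph breaks and keep them.
    paragraphs = re.split(r"\n\s*\n", text)
    # Inside each paragraph, collapse newlines and repeated spaces into one space.
    cleaned = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)
```

It could be applied as `source_text = clean_pdf_text(source_text)` right after reading the PDF.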

In [23]:
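import os

source_path = '/content/source.txt'
if os.path.exists(source_path):
    with open(source_path, 'r', encoding='utf-8') as file:
        source_text = file.read()
elif 'source_text' not in globals():  # keep any text already loaded from source.pdf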
    source_text = """
Shall I compare thee to a summer’s day?
Thou art more lovely and more temperate.
Rough winds do shake the darling buds of May,
And summer’s lease hath all too short a date.
"""

## 4. From here on, just run every cell through to the end

In [None]:
!pip install litellm

from litellm import completion

In [34]:
# Sends a chat request to the LLM through LiteLLM (OpenAI's gpt-4o by default)
def get_completion(user_prompt, system_message = "You are a helpful assistant.", model="gpt-4o", temperature=0.3):
  messages = [
    {"role": "system", "content": system_message },
    {"role": "user", "content": user_prompt },
  ]

  # Forward temperature so that the setting actually takes effect
  response = completion(model=model, messages=messages, temperature=temperature)

  return response["choices"][0]["message"]["content"]
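API calls occasionally fail with rate limits or transient network errors, and a long translation run is annoying to restart. The following is a generic retry sketch with exponential backoff; `call_with_retries` and its parameters are hypothetical names introduced here, and catching every `Exception` is a simplification that a real notebook might narrow to specific error types.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); after a failure, wait base_delay * 2**attempt and try again."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            # Re-raise once the last allowed attempt has failed.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

A call such as `call_with_retries(lambda: get_completion(prompt))` would then retry up to three times before giving up.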

In [35]:
# Initial translation
def one_chunk_initial_translation(model, source_text):
    system_message = f"You are an expert linguist, specializing in translation from {source_lang} to {target_lang}."

    translation_prompt = f"""This is an {source_lang} to {target_lang} translation, please provide the {target_lang} translation for this text. \
Do not provide any explanations or text apart from the translation.
{source_lang}: {source_text}

{target_lang}:"""

    translation = get_completion(translation_prompt, system_message=system_message, model=model)

    return translation

In [36]:
# Reflect on the initial translation against the source text
def one_chunk_reflect_on_translation(model, source_text, translation_1):
    system_message = f"You are an expert linguist specializing in translation from {source_lang} to {target_lang}. \
You will be provided with a source text and its translation and your goal is to improve the translation."

    prompt = f"""Your task is to carefully read a source text and a translation from {source_lang} to {target_lang}, and then give constructive criticism and helpful suggestions to improve the translation. \
The final style and tone of the translation should match the style of {target_lang} colloquially spoken in {country}.

The source text and initial translation, delimited by XML tags <SOURCE_TEXT></SOURCE_TEXT> and <TRANSLATION></TRANSLATION>, are as follows:

<SOURCE_TEXT>
{source_text}
</SOURCE_TEXT>

<TRANSLATION>
{translation_1}
</TRANSLATION>

When writing suggestions, pay attention to whether there are ways to improve the translation's \n\
(i) accuracy (by correcting errors of addition, mistranslation, omission, or untranslated text),\n\
(ii) fluency (by applying {target_lang} grammar, spelling and punctuation rules, and ensuring there are no unnecessary repetitions),\n\
(iii) style (by ensuring the translations reflect the style of the source text and takes into account any cultural context),\n\
(iv) terminology (by ensuring terminology use is consistent and reflects the source text domain; and by only ensuring you use equivalent idioms {target_lang}).\n\

Write a list of specific, helpful and constructive suggestions for improving the translation.
Each suggestion should address one specific part of the translation.
Output only the suggestions and nothing else."""

    reflection = get_completion(prompt, system_message=system_message, model=model)
    return reflection

In [37]:
# Produce the second translation from the source text, the initial translation, and the reflection
def one_chunk_improve_translation(model, source_text, translation_1, reflection):
    system_message = f"You are an expert linguist, specializing in translation editing from {source_lang} to {target_lang}."

    prompt = f"""Your task is to carefully read, then edit, a translation from {source_lang} to {target_lang}, taking into
account a list of expert suggestions and constructive criticisms.

The source text, the initial translation, and the expert linguist suggestions are delimited by XML tags <SOURCE_TEXT></SOURCE_TEXT>, <TRANSLATION></TRANSLATION> and <EXPERT_SUGGESTIONS></EXPERT_SUGGESTIONS> \
as follows:

<SOURCE_TEXT>
{source_text}
</SOURCE_TEXT>

<TRANSLATION>
{translation_1}
</TRANSLATION>

<EXPERT_SUGGESTIONS>
{reflection}
</EXPERT_SUGGESTIONS>

Please take into account the expert suggestions when editing the translation. Edit the translation by ensuring:

(i) accuracy (by correcting errors of addition, mistranslation, omission, or untranslated text),
(ii) fluency (by applying {target_lang} grammar, spelling and punctuation rules and ensuring there are no unnecessary repetitions),
(iii) style (by ensuring the translations reflect the style of the source text),
(iv) terminology (inappropriate for context, inconsistent use), or
(v) other errors.

Output only the new translation and nothing else."""

    translation_2 = get_completion(prompt, system_message, model=model)

    return translation_2

In [38]:
# Combine the three functions above
def one_chunk_translate_text(model, source_text):
    print("## Initial translation ##\n")
    translation_1 = one_chunk_initial_translation(model, source_text)
    print(translation_1)
    print("\n## Reflection on the initial translation ##\n")
    reflection = one_chunk_reflect_on_translation(model, source_text, translation_1)
    print(reflection)
    print("\n## Second translation based on the reflection ##\n")
    translation_2 = one_chunk_improve_translation(model, source_text, translation_1, reflection)
    print(translation_2)
    return translation_2
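Before spending API credits, it can help to confirm that each step really receives the previous step's output. The sketch below runs the same translate, reflect, improve sequence against a stub instead of a real model. `run_reflection_workflow`, `fake_complete`, and the shortened prompts are all hypothetical stand-ins for illustration, not the notebook's actual functions or prompts.

```python
from typing import Callable

def run_reflection_workflow(source_text: str, complete: Callable[[str], str]) -> dict:
    """Run translate -> reflect -> improve, using `complete` for every LLM call."""
    translation_1 = complete(f"Translate:\n{source_text}")
    reflection = complete(
        f"Critique this translation.\nSource:\n{source_text}\nTranslation:\n{translation_1}"
    )
    translation_2 = complete(
        f"Improve the translation.\nSource:\n{source_text}\n"
        f"Translation:\n{translation_1}\nSuggestions:\n{reflection}"
    )
    return {
        "translation_1": translation_1,
        "reflection": reflection,
        "translation_2": translation_2,
    }

calls = []

def fake_complete(prompt: str) -> str:
    """Stub LLM: records the prompt and returns a numbered placeholder."""
    calls.append(prompt)
    return f"<response {len(calls)}>"

outputs = run_reflection_workflow("Hello, world.", fake_complete)
```

Inspecting `calls` shows that the second prompt embeds the first response and the third prompt embeds both earlier responses, which is the data flow the reflection workflow depends on.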

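## 5. Call the combined function to run the translation

To use a model from another provider, see my other Colab for setting up that provider's API key, then change the model argument below: https://colab.research.google.com/drive/1Ci9gbq5e9HqHK9p0kJYxmrJvF1eDrKc0?usp=sharing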
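When switching providers, a missing key usually only surfaces as an error in the middle of a run. The following sketch checks for the key up front. The prefix-to-variable mapping reflects my understanding of LiteLLM's naming conventions and should be verified against its documentation. The helper names are hypothetical, and treating every model name without a prefix as OpenAI is a deliberate simplification.

```python
import os

# Assumed mapping of LiteLLM provider prefixes to API-key variables; verify in the LiteLLM docs.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
}

def required_env_var(model: str) -> str:
    """Return the API-key variable that a model name most likely needs."""
    # Simplification: names without a "provider/" prefix are treated as OpenAI models.
    provider = model.split("/", 1)[0] if "/" in model else "openai"
    if provider not in PROVIDER_ENV_VARS:
        raise ValueError(f"Unknown provider prefix: {provider!r}")
    return PROVIDER_ENV_VARS[provider]

def has_api_key(model: str) -> bool:
    """Report whether the variable for this model is set and non-empty."""
    return bool(os.environ.get(required_env_var(model)))
```

For example, `has_api_key("anthropic/claude-3-5-sonnet-20240620")` returns `False` until `ANTHROPIC_API_KEY` is set in `os.environ`.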

In [39]:
result = one_chunk_translate_text("gpt-4o", source_text)

## Initial translation ##

特朗普遇襲事件打亂關鍵選區選戰

唐納‧川普遇襲的事件打亂了競選活動，民主黨的競選活動撤下了廣告並暫停了募款呼籲，而共和黨，包括這位前總統，在短暫停頓後繼續推進。
總統喬‧拜登的競選活動，以及許多領先的民主黨人，在週六晚上的槍擊事件後迅速停止了數字廣告和籌款信息，這次槍擊事件造成一人死亡，並讓整個國家震驚。共和黨人也短暫停止了這些活動，但在週日下午恢復了更典型的訊息傳遞。
不同的反應突顯出在這次前所未有的暴力事件之後，應對一場激烈競爭的政治競選活動的巨大挑戰。
“無論你是共和黨還是民主黨，這都是一個相當令人震驚的事件。而且它發生在競選的每週日程的非常奇怪的時刻——星期六晚上。我認為每個人都在試圖根據公眾和選民對此的反應來調整他們的回應方式，我認為目前還太早下結論，”密西根州共和黨前主席、共和黨策略師傑森‧羅說。
共和黨人迅速團結在特朗普周圍，特朗普在週日的募款呼籲中既呼籲“團結”，又誓言永不投降。
這位前總統的競選活動曾短暫地放慢了穩定的募款信息發送，但在週日下午改變了方針。幾封電子郵件和短信突出了特朗普在槍擊事件發生後立刻被美國特勤局特工包圍時舉起拳頭的標志性照片。他的競選網站也在週日晚間開始轉向一個展示該照片的募款頁面。
籌款信息的重啟與特朗普前往密爾沃基參加週一開幕的共和黨全國代表大會同時進行。在密爾沃基，特朗普的支持者也聚集起來向他致敬，而包括佛羅里達州共和黨眾議員馬特‧蓋茨在內的盟友則在其他地方計劃集會。
支持者們希望表達對“這位他們覺得一直在傳遞他們訊息、背負他們重擔的人的感激之情——這幾乎讓他付出了生命的代價，”共和黨策略師和前特朗普政府任命官員馬修‧巴特利特說。“這是一種非常強烈的情感，當你將它轉變為政治時，是最具政治說服力的情感之一。”
與特朗普的籌款推動相比，根據平台數據，拜登的競選活動迅速停止了在Facebook和Instagram上的數字廣告，並且到週日晚仍未恢復。這次競選活動還停止了籌款電子郵件和短信。包括民主黨國會競選委員會和民主黨參議員競選委員會等主要民主黨組織，也暫停了他們的籌款信息和數字廣告。
拜登“試圖樹立一個榜樣，做出總統應該做的處理方式。他選擇做總統而不是總統候選人——這是目前需要做的正確事情，”波士頓的民主黨策略師瑪麗‧安妮‧馬什說。
民主黨人的日程也被打亂了：原定於週一正式訪問德州的拜登延後了行程

## 6. Print the final translation

In [21]:
print(result)

US presidential candidate Trump was shot during a rally in Pennsylvania on the 13th, injuring his right ear, causing it to bleed. One person was killed and two others were seriously injured at the scene. The 20-year-old shooter, Thomas Matthew Crooks, was killed on the spot. Following the shooting, the shooter's identity and past photos were disclosed. Crooks, who was known to be gifted in mathematics, used a gun bought by his father, and police found explosives in his car and residence.

The shooting of Trump shocked the world! The FBI swiftly identified the shooter as Crooks, a 20-year-old man from Bethel Park, Pennsylvania. He was positioned in a building 180 meters away from Trump's rally venue and opened fire multiple times as Trump delivered his speech. However, the motive for the crime remains unclear. Witnesses reported seeing the shooter climb onto the roof with a gun and immediately informed security personnel, but the tragedy couldn't be prevented; the entire incident lasted