# Idea: Send a prompt to a language model (LM) and record its output. Translate the prompt into other languages and query the same LM with each translation. Translate those outputs back into English, then compare all outputs.

Additional Considerations:

- Use an LM that has been trained on multiple languages (such as GPT-3).
- The translation model (TM) should be unfiltered (e.g., no profanity filter), so that it does not alter the content being compared.
- Consider scoring the LM outputs with a second model (e.g., a sentiment or emotion classifier) and comparing the scores of the original output with those of the back-translated outputs.
- If the outputs are similar, the LM maintains fidelity across languages.
- If they are nearly identical (especially for API-only models), a translation step may be happening behind the scenes.
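To make "similar" measurable, one option (an assumption here, not a method the notebook fixes) is to treat each output's classifier scores as a vector over labels and compute their cosine similarity: 1.0 means identical score profiles, and values near 0 mean the two outputs put their weight on different labels. A minimal sketch with made-up scores; `cosine_similarity` is a hypothetical helper name:

```python
import math


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two label->score mappings (missing labels count as 0.0)."""
    labels = set(a) | set(b)
    dot = sum(a.get(label, 0.0) * b.get(label, 0.0) for label in labels)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # undefined for an all-zero vector; treat as no similarity
    return dot / (norm_a * norm_b)


# Toy scores for illustration only, not results from this notebook.
original = {"neutral": 0.7, "curiosity": 0.2, "approval": 0.1}
back_translated = {"neutral": 0.6, "curiosity": 0.3, "approval": 0.1}
print(round(cosine_similarity(original, back_translated), 3))  # 0.983
```

Cosine similarity ignores overall scale, so it compares the shape of the emotion profiles rather than their absolute magnitudes.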

To-Do List:
- Find a translation model without profanity filters.
    - Currently, Helsinki-NLP is being used, but it filters language and its performance is suboptimal.
    - Google's translation API was tested, but it's expensive.
- Identify instances where the model provides an answer in one language but declines to do so in another.
- Test with more or different embedding models/classifiers.
    - So far, only an emotion classifier has been tested.
    - Perhaps an embedding model could be used, even though its dimensions are not intuitively interpretable.
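For the to-do item on answers given in one language but declined in another, a crude first pass could flag refusals in the back-translated English answers with a keyword list. The phrase list and the helper names below are hypothetical, and keyword matching will miss paraphrased refusals and can fire on answers that merely mention an apology:

```python
# Hypothetical refusal phrases; a real study would need a more robust detector.
REFUSAL_MARKERS = ("i'm sorry", "i am sorry", "i can't", "i cannot", "as an ai")


def looks_like_refusal(english_text: str) -> bool:
    """Crude keyword check on a back-translated English answer."""
    # Normalise the typographic apostrophe so "I’m sorry" also matches.
    lowered = english_text.lower().replace("\u2019", "'")
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def refusal_mismatches(answers: dict[str, str], baseline: str = "eng") -> list[str]:
    """Languages whose refusal status differs from the baseline language's."""
    base = looks_like_refusal(answers[baseline])
    return [
        lang
        for lang, text in answers.items()
        if lang != baseline and looks_like_refusal(text) != base
    ]


answers = {
    "eng": "I'm sorry, but I can't help with that.",
    "deu": "Here are the steps you asked for.",
    "fra": "I cannot assist with this request.",
}
print(refusal_mismatches(answers))  # ['deu']
```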

In [29]:
import os
import time
import pandas as pd
import plotly.graph_objects as go

import openai
import requests

In [2]:
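# Read the API keys from dotfiles in the home directory
with open(os.path.expanduser("~/.hfkey"), "r") as f:
    HF_API_TOKEN = f.read().strip()
headers = {"Authorization": f"Bearer {HF_API_TOKEN}"}

with open(os.path.expanduser("~/.openaikey"), "r") as f:
    OAI_API_TOKEN = f.read().strip()

# Module-level key configuration (legacy openai<1.0 style)
openai.api_key = OAI_API_TOKEN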

In [3]:
def translate_to_eng(text: str = "This is a test, please replace with actual string in another language."):
    # Helsinki-NLP OPUS-MT model: multilingual source -> English
    HF_API_URL = "https://api-inference.huggingface.co/models/Helsinki-NLP/opus-mt-mul-en"
    payload = {
        "inputs": text,
        "parameters": {"truncation": "only_first"},  # truncate the input if it exceeds the model's maximum length
    }
    response = requests.post(HF_API_URL, headers=headers, json=payload)
    return response.json()

def get_translation_in_eng(text: str):
    response = translate_to_eng(text)

    # While the model is loading, the Inference API returns an error dict; wait and retry
    while 'error' in response and 'is currently loading' in response['error']:
        estimated_time = response.get('estimated_time', 20)
        print(f"Model is loading, waiting for {estimated_time} seconds.")
        time.sleep(estimated_time)
        response = translate_to_eng(text)

    try:
        return response[0]["translation_text"]
    except (KeyError, IndexError, TypeError):
        # Unexpected response (e.g. a different API error): show it and signal failure with None
        print(response)
        return None

print(get_translation_in_eng("Das ist ein Test."))

Model is loading, waiting for 20.0 seconds.
That's a test.
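The `while` loops in the translation helpers retry for as long as the model reports that it is loading, so a model that never finishes loading would block the notebook indefinitely. A minimal sketch of a bounded alternative, assuming the response shapes used above (a list on success, a dict with `error` and optionally `estimated_time` while loading); `call_with_retry` and its parameters are hypothetical names, not part of any library:

```python
import time
from typing import Any, Callable


def call_with_retry(
    request_fn: Callable[[], Any],
    max_attempts: int = 5,
    default_wait: float = 20.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call request_fn, waiting and retrying while the model reports it is loading.

    Assumes a loading response is a dict whose 'error' message contains
    'is currently loading' and which may carry an 'estimated_time' in seconds.
    Any other response (success or a different error) is returned as-is.
    """
    for attempt in range(1, max_attempts + 1):
        response = request_fn()
        loading = isinstance(response, dict) and "is currently loading" in str(
            response.get("error", "")
        )
        if not loading:
            return response
        if attempt < max_attempts:
            sleep(response.get("estimated_time", default_wait))
    raise TimeoutError(f"model still loading after {max_attempts} attempts")


# Offline demo with canned responses instead of real HTTP calls.
canned = iter([
    {"error": "Model is currently loading", "estimated_time": 0.0},
    [{"translation_text": "That's a test."}],
])
result = call_with_retry(lambda: next(canned), sleep=lambda seconds: None)
print(result[0]["translation_text"])  # That's a test.
```

In the notebook this could wrap an existing request, e.g. `call_with_retry(lambda: translate_to_eng(text))`; injecting `sleep` lets the retry logic be tested without actually waiting.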


In [4]:
def translate_to_mul(text: str, target_language: str):
    # Helsinki-NLP OPUS-MT model: English -> multilingual target
    HF_API_URL = "https://api-inference.huggingface.co/models/Helsinki-NLP/opus-mt-en-mul"
    # The model requires a sentence-initial target-language token such as ">>spa<<"
    text_with_token = ">>" + target_language + "<< " + text
    payload = {
        "inputs": text_with_token,
        "parameters": {"truncation": "only_first"},  # truncate the input if it exceeds the model's maximum length
    }
    response = requests.post(HF_API_URL, headers=headers, json=payload)
    return response.json()

def get_translation_in_language(text: str, target_language: str):
    response = translate_to_mul(text, target_language)

    # While the model is loading, the Inference API returns an error dict; wait and retry
    while 'error' in response and 'is currently loading' in response['error']:
        estimated_time = response.get('estimated_time', 20)
        print(f"Model is loading, waiting for {estimated_time} seconds.")
        time.sleep(estimated_time)
        response = translate_to_mul(text, target_language)

    try:
        return response[0]['translation_text']
    except (KeyError, IndexError, TypeError):
        # Unexpected response (e.g. a different API error): show it and signal failure with None
        print(response)
        return None

print(get_translation_in_language("This is a test.", "spa"))  # English to Spanish


Model is loading, waiting for 20.0 seconds.
Esto es una prueba.
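Both translation helpers pass `"truncation": "only_first"`, so an LM answer that exceeds the translation model's maximum input length is cut off before it is translated back, and the emotion classifier then sees only part of it. One workaround, sketched here under the assumption that a character budget is a rough enough proxy for the model's token limit, is to split the text at sentence boundaries and translate each chunk separately. `chunk_for_translation` and the default `max_chars=400` are illustrative choices, not values from the notebook:

```python
import re


def chunk_for_translation(text: str, max_chars: int = 400) -> list[str]:
    """Split text into sentence-aligned chunks of at most about max_chars characters.

    Sentences are split naively on '.', '!' or '?' followed by whitespace.
    A single sentence longer than max_chars is kept whole rather than cut.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = "First sentence. Second one! Is this the third? Yes."
print(chunk_for_translation(text, max_chars=30))
# ['First sentence. Second one!', 'Is this the third? Yes.']
```

Each chunk could then go through `get_translation_in_eng` and the translations be joined with spaces. The splitter ignores abbreviations and scripts that use other sentence punctuation (e.g. `。`).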


In [5]:
# Send a single-turn chat prompt to the OpenAI API and return the full response object
def complete(prompt):
    # Cap the completion at int(512 * 0.9) = 460 tokens
    max_tokens = int(512 * 0.9)
    OAI_MODEL = "gpt-3.5-turbo"
    # Legacy (openai<1.0) API; openai>=1.0 uses client.chat.completions.create instead
    return openai.ChatCompletion.create(
        model=OAI_MODEL,
        messages=[
            # {"role": "system", "content": "You are a creative assistant."},
            {"role": "user", "content": prompt},
        ],
        n=1,
        max_tokens=max_tokens,
    )

def get_answer(prompt):
    # Return only the text of the first (and only) choice
    return complete(prompt).choices[0].message.content

In [6]:
complete_output = complete("What is the meaning of life?")
answer = complete_output.choices[0].message.content
print(answer)

The meaning of life is a profound and philosophical question that has been approached differently by various cultures, religions, and individuals throughout history. Answers to this question can vary greatly depending on personal beliefs and perspectives. Some people find meaning in fulfilling personal goals, relationships, or experiences, while others seek meaning through spirituality, religion, or the pursuit of knowledge. Ultimately, the meaning of life may be subjective and unique to each individual.
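`complete` does not set `temperature`, so the API's default (non-zero) sampling temperature applies and repeated calls with the same prompt can return different answers. With a single answer per language, a difference between languages may therefore be sampling noise rather than a language effect. One mitigation, sketched under the assumption that several answers per language have each been scored and reduced to a label->score dict, is to average the scores; `average_scores` is a hypothetical helper. (Passing `temperature=0` to `ChatCompletion.create` reduces, but does not necessarily eliminate, run-to-run variation.)

```python
from statistics import mean


def average_scores(samples: list[dict[str, float]]) -> dict[str, float]:
    """Average label->score dicts obtained from several answers to the same prompt.

    A label missing from one sample counts as 0.0 for that sample.
    """
    if not samples:
        return {}
    labels = set().union(*samples)
    return {label: mean(sample.get(label, 0.0) for sample in samples) for label in sorted(labels)}


# Toy scores for two answers to the same prompt (illustration only).
runs = [
    {"neutral": 0.75, "curiosity": 0.25},
    {"neutral": 0.25, "curiosity": 0.5, "joy": 0.25},
]
print(average_scores(runs))  # {'curiosity': 0.375, 'joy': 0.125, 'neutral': 0.5}
```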


In [26]:

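def emo(text: str = "I like you. I love you"):
    # EmoRoBERTa emotion classifier; returns [[{'label': ..., 'score': ...}, ...]]
    EMO_API_URL = "https://api-inference.huggingface.co/models/arpanghoshal/EmoRoBERTa"
    payload = {
        "inputs": text,
        "parameters": {"truncation": "only_first"},  # truncate the input if it exceeds the model's maximum length
    }
    response = requests.post(EMO_API_URL, headers=headers, json=payload).json()

    # Like the translation helpers, wait and retry while the model is still loading
    while 'error' in response and 'is currently loading' in response['error']:
        estimated_time = response.get('estimated_time', 20)
        print(f"Model is loading, waiting for {estimated_time} seconds.")
        time.sleep(estimated_time)
        response = requests.post(EMO_API_URL, headers=headers, json=payload).json()
    return response

emo()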

[[{'label': 'love', 'score': 0.9666957855224609},
  {'label': 'admiration', 'score': 0.017801644280552864},
  {'label': 'joy', 'score': 0.01078831683844328},
  {'label': 'approval', 'score': 0.0012105630012229085},
  {'label': 'caring', 'score': 0.0005521224229596555},
  {'label': 'excitement', 'score': 0.0005357517511583865},
  {'label': 'amusement', 'score': 0.0005129202618263662},
  {'label': 'gratitude', 'score': 0.0004403434577398002},
  {'label': 'desire', 'score': 0.0002855785423889756},
  {'label': 'anger', 'score': 0.00021759387163911015},
  {'label': 'optimism', 'score': 0.00012963690096512437},
  {'label': 'disapproval', 'score': 0.00012925038754474372},
  {'label': 'grief', 'score': 9.996540029533207e-05},
  {'label': 'annoyance', 'score': 6.998069875407964e-05},
  {'label': 'pride', 'score': 6.350891635520384e-05},
  {'label': 'curiosity', 'score': 5.861332101630978e-05},
  {'label': 'neutral', 'score': 4.916170291835442e-05},
  {'label': 'disgust', 'score': 4.371397881186
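The classifier returns a score for every label, most of them close to zero, as in the response printed above. A small sketch, assuming that nested `[[{'label': ..., 'score': ...}, ...]]` shape, that flattens the response and keeps only the highest-scoring labels, which could make a per-language comparison easier to read; `to_score_dict` and `top_k` are hypothetical helper names:

```python
def to_score_dict(response: list[list[dict]]) -> dict[str, float]:
    """Flatten the classifier's [[{'label': ..., 'score': ...}, ...]] response."""
    return {item["label"]: item["score"] for item in response[0]}


def top_k(scores: dict[str, float], k: int = 3) -> list[tuple[str, float]]:
    """Return the k highest-scoring labels, best first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


# First entries of the response printed above.
response = [[
    {"label": "love", "score": 0.9666957855224609},
    {"label": "admiration", "score": 0.017801644280552864},
    {"label": "joy", "score": 0.01078831683844328},
    {"label": "approval", "score": 0.0012105630012229085},
]]
print([label for label, _ in top_k(to_score_dict(response), k=2)])  # ['love', 'admiration']
```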

In [27]:
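def polyglot_prompt(prompt: str, lang_list=("eng", "deu")):
    output_dict = {}

    for lang in lang_list:
        entry = {"prompt": prompt}
        if lang == "eng":
            # English baseline: query the LM directly, without a translation round trip
            entry["lang_prompt"] = prompt
            entry["lang_output"] = get_answer(prompt)
            entry["eng_output"] = entry["lang_output"]
        else:
            # Translate the prompt, query the LM, and translate the answer back into English
            entry["lang_prompt"] = get_translation_in_language(prompt, lang)
            if entry["lang_prompt"] is None:
                print(f"Skipping {lang}: prompt translation failed.")
                continue
            entry["lang_output"] = get_answer(entry["lang_prompt"])
            entry["eng_output"] = get_translation_in_eng(entry["lang_output"])
        # Score the English version of the answer (None if back-translation failed)
        entry["emotion"] = emo(entry["eng_output"]) if entry["eng_output"] else None
        output_dict[lang] = entry

    return output_dict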

lang_list = ["eng", "spa", "deu", "fra", "rus"]
output_dict = polyglot_prompt("What is the meaning of life?", lang_list=lang_list)

Model is loading, waiting for 20.0 seconds.
Model is loading, waiting for 20.0 seconds.
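With `output_dict` filled in, the languages can be ranked by how far their emotion profile drifts from the English baseline. This sketch treats each language's scores as a probability distribution (the scores printed earlier appear to sum to roughly 1) and uses total variation distance, half the sum of absolute score differences: 0 means identical profiles and 1 means no overlap. The metric choice, `rank_by_divergence`, and the toy data are assumptions for illustration:

```python
def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """Half the L1 distance between two label->score distributions."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(label, 0.0) - q.get(label, 0.0)) for label in labels)


def rank_by_divergence(output_dict: dict, baseline: str = "eng") -> list[tuple[str, float]]:
    """Rank languages by total variation distance from the baseline, largest first.

    Expects output_dict[lang]["emotion"] in the classifier's
    [[{'label': ..., 'score': ...}, ...]] shape; other entries are skipped.
    """
    scores = {
        lang: {item["label"]: item["score"] for item in content["emotion"][0]}
        for lang, content in output_dict.items()
        if isinstance(content.get("emotion"), list)
    }
    base = scores[baseline]
    distances = {
        lang: total_variation(base, dist)
        for lang, dist in scores.items()
        if lang != baseline
    }
    return sorted(distances.items(), key=lambda kv: kv[1], reverse=True)


# Toy data in the same shape as output_dict (only the "emotion" field is used).
toy_output = {
    "eng": {"emotion": [[{"label": "neutral", "score": 0.75},
                         {"label": "curiosity", "score": 0.25}]]},
    "deu": {"emotion": [[{"label": "neutral", "score": 0.5},
                         {"label": "curiosity", "score": 0.5}]]},
    "spa": {"emotion": [[{"label": "neutral", "score": 0.75},
                         {"label": "curiosity", "score": 0.25}]]},
}
print(rank_by_divergence(toy_output))  # [('deu', 0.25), ('spa', 0.0)]
```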


In [19]:
# Map each language code to a flag emoji for the plot legend
flag_emojis = {
    "eng": "🇬🇧",
    "spa": "🇪🇸",
    "deu": "🇩🇪",
    "fra": "🇫🇷",
    "rus": "🇷🇺",
}
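Each flag character in `flag_emojis` is a pair of Unicode regional indicator symbols, one per letter of an ISO 3166-1 alpha-2 country code (🇩🇪 is the indicators for D and E). A sketch that builds the flags from country codes instead of pasting them by hand; the language-to-country choices mirror the dictionary above but are otherwise arbitrary, since a language has no unique flag, and `flag_emoji`, `LANG_TO_COUNTRY`, and `generated_flags` are hypothetical names:

```python
def flag_emoji(country_code: str) -> str:
    """Build a flag emoji from a two-letter ISO 3166-1 alpha-2 country code.

    Each letter maps to a regional indicator symbol (U+1F1E6 is 'A');
    most platforms render a valid pair as the country's flag.
    """
    code = country_code.upper()
    if len(code) != 2 or not code.isascii() or not code.isalpha():
        raise ValueError(f"expected a two-letter country code, got {country_code!r}")
    return "".join(chr(0x1F1E6 + ord(ch) - ord("A")) for ch in code)


# Hypothetical language -> country choice; languages do not map to flags uniquely.
LANG_TO_COUNTRY = {"eng": "GB", "spa": "ES", "deu": "DE", "fra": "FR", "rus": "RU"}
generated_flags = {lang: flag_emoji(cc) for lang, cc in LANG_TO_COUNTRY.items()}
print(generated_flags["deu"])  # 🇩🇪
```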


In [25]:
import textwrap

df = pd.DataFrame()

# Loop over every language in the dictionary
for lang, content in output_dict.items():
    if not isinstance(content.get('emotion'), list):
        continue  # no usable emotion scores for this language (an earlier step failed)

    # The classifier returns [[{'label': ..., 'score': ...}, ...]]; map each label to its score
    emotions_dict = {item['label']: item['score'] for item in content['emotion'][0]}

    # Hover text: translated prompt and back-translated answer, wrapped at 100 characters on word boundaries
    lang_prompt = "<br>".join(textwrap.wrap(content['lang_prompt'] or "", 100))
    eng_output = "<br>".join(textwrap.wrap(content['eng_output'] or "", 100))
    emotions_dict['readable_text'] = lang_prompt + "<br><br>" + eng_output

    # One column per language, one row per emotion (plus the readable_text row)
    emotions_df = pd.DataFrame(emotions_dict, index=[lang]).T

    # Append this column to the main DataFrame
    df = pd.concat([df, emotions_df], axis=1)

fig = go.Figure()

# Separate the numeric emotion scores from the hover text (once, outside the loop)
emotion_df = df[df.index != 'readable_text'].astype(float)
text_df = df[df.index == 'readable_text']

for col in df.columns:
    fig.add_trace(
        go.Bar(
            x=emotion_df.index,
            y=emotion_df[col],
            name=flag_emojis.get(col, col),  # fall back to the language code if no flag is defined
            hovertext=text_df[col].values[0],  # same hover text for every bar of this language
        )
    )

# Set title and labels
fig.update_layout(
    title_text='Emotion Scores by Language for the prompt: ' + output_dict['eng']['prompt'],
    xaxis_title='Emotion',
    yaxis_title='Score',
    barmode='group'  # bars are grouped together by emotion
)

# Show the plot
fig.show()
