In [1]:
import os
import openai
from dotenv import load_dotenv

In [2]:
load_dotenv()  # Load environment variables from the .env file

# Fail early with a clear message if the API key is missing
api_key = os.getenv("OPENAI_API_KEY")
if api_key is None:
    raise ValueError("OPENAI_API_KEY environment variable not set")

# Pass the key explicitly so the client does not depend on module-level state
client = openai.OpenAI(api_key=api_key)

# Text to speech
The Audio API provides a speech endpoint based on our GPT-4o mini TTS (text-to-speech) model.

In [None]:
meus_exemplos = "arquivos/meus_exemplos/exemplo1.mp3"

msg = "Reality is the sum or aggregate of everything in existence; everything that is not imaginary. Different cultures and academic disciplines conceptualize it in various ways."

In [4]:
response = client.audio.speech.create(
    model="tts-1",
    voice="echo",
    input=msg
)
response.write_to_file(meus_exemplos)

Example with streaming.

In [6]:
meus_exemplos = "arquivos/meus_exemplos/exemplo2.mp3"

with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",
    input="Today is a wonderful day to build something people love!",
    instructions="Speak in a cheerful and positive tone.",
) as response:
    response.stream_to_file(meus_exemplos)

For intelligent real-time applications, use the gpt-4o-mini-tts model, the newest and most reliable text-to-speech model. You can prompt it to control aspects of speech, including:
- Accent
- Emotional range
- Intonation
- Impressions
- Speed of speech
- Tone
- Whispering

The other text-to-speech models are tts-1 and tts-1-hd. The tts-1 model provides lower latency, but at a lower quality than the tts-1-hd model.
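
The model choices above can be captured in a small request builder. This is a minimal sketch: `build_speech_request` is a hypothetical helper, not part of the OpenAI SDK, and it assumes that the `instructions` parameter is only honored by gpt-4o-mini-tts, so it is dropped for tts-1 and tts-1-hd.

```python
from typing import Optional

# Models named in this notebook. Treating only gpt-4o-mini-tts as
# promptable is an assumption of this sketch.
PROMPTABLE_TTS_MODELS = {"gpt-4o-mini-tts"}
KNOWN_TTS_MODELS = PROMPTABLE_TTS_MODELS | {"tts-1", "tts-1-hd"}


def build_speech_request(
    text: str,
    voice: str = "coral",
    model: str = "gpt-4o-mini-tts",
    instructions: Optional[str] = None,
) -> dict:
    """Build keyword arguments for client.audio.speech.create (hypothetical helper)."""
    if model not in KNOWN_TTS_MODELS:
        raise ValueError(f"Unknown TTS model: {model!r}")
    request = {"model": model, "voice": voice, "input": text}
    # Attach style instructions only when the model is assumed to honor them.
    if instructions and model in PROMPTABLE_TTS_MODELS:
        request["instructions"] = instructions
    return request


request = build_speech_request(
    "Today is a wonderful day to build something people love!",
    instructions="Whisper slowly, in a calm tone.",
)
print(request)
```

The resulting dictionary can be unpacked into the streaming call used earlier, for example `client.audio.speech.with_streaming_response.create(**request)`.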

# Speech to text
The Audio API provides two speech-to-text endpoints:
- transcriptions
- translations

Together, these endpoints can be used to:
- Transcribe audio into whatever language the audio is in.
- Translate and transcribe the audio into English.

Historically, both endpoints have been backed by our open-source Whisper model (whisper-1). The transcriptions endpoint now also supports higher-quality model snapshots, with limited parameter support:
- gpt-4o-mini-transcribe
- gpt-4o-transcribe
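
The split between the two endpoints can be sketched as a single dispatch function. `speech_to_text` is a hypothetical wrapper, not an SDK function; it assumes `client` is an `openai.OpenAI` instance and always sends translation requests with whisper-1, the only model the translations endpoint accepts.

```python
from pathlib import Path


def speech_to_text(client, audio_path, to_english=False, model="gpt-4o-mini-transcribe"):
    """Transcribe audio in its own language, or translate it into English.

    Hypothetical convenience wrapper around the two Audio API endpoints.
    """
    # The context manager closes the file once the request has been sent.
    with Path(audio_path).open("rb") as audio_file:
        if to_english:
            # Translation ignores `model` and always uses whisper-1.
            result = client.audio.translations.create(
                model="whisper-1", file=audio_file
            )
        else:
            result = client.audio.transcriptions.create(
                model=model, file=audio_file
            )
    return result.text
```

With the client created at the top of this notebook, `speech_to_text(client, "arquivos/audio/fala.mp3", to_english=True)` would request an English translation of that file.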

## Transcription
The transcriptions API takes as input the audio file you want to transcribe and the desired output file format for the transcription of the audio.

In [8]:
audio1 = "arquivos/audio/audio_asimov.mp3"

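# Open the audio in binary mode; the context manager closes it after the request
with open(audio1, "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="gpt-4o-mini-transcribe",
        file=audio_file,
        # The prompt is in Portuguese to match the audio. In English: "This is the
        # transcription of an Asimov Academy class. The instructor's name is
        # Rodrigo Soares Tadevald."
        prompt="Essa é a transcrição de uma aula da Asimov Academy. O nome do professor é Rodrigo Soares Tadevald.",
    )

print(transcription.text)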

Seja muito bem-vindo ou bem-vinda ao nosso curso completo de Python aqui da Asimov Academy. Eu e minha equipe ficamos muito felizes que vocês tenham escolhido iniciar no mundo da programação, especificamente com a linguagem Python, aqui com a gente. Pode ter certeza que a gente colocou muito carinho e muita dedicação para construir esse material. Além dos conhecimentos técnicos que a gente vai apresentar sobre a linguagem e programação em si, eu também coloquei grande parte da minha experiência e minha vivência para compartilhar com vocês ao longo desse treinamento. Para quem não me conhece ainda, meu nome é Rodrigo Soares Tadeval e eu não sou programador de origem. Na verdade, eu me formei como engenheiro e eu utilizei a programação dentro da minha carreira no mercado financeiro como analista de dados. E essa é a grande mágica da programação. Vocês não precisam utilizar ela única e exclusivamente para desenvolver software. Na verdade, ela pode ser usada para o que vocês quiserem no di
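
The transcription cell above relies on the default output format. A minimal sketch of passing the format explicitly follows; `transcribe` is a hypothetical helper, and it assumes the SDK returns a plain string when `response_format="text"` and an object with a `.text` attribute for JSON formats. Which formats each model accepts is not checked here and should be confirmed in the API reference.

```python
def transcribe(client, audio_path, model="gpt-4o-mini-transcribe", response_format="json"):
    """Return the transcript text for one audio file (hypothetical helper)."""
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model=model,
            file=audio_file,
            response_format=response_format,
        )
    # Assumption: plain-text output arrives as a str, JSON output exposes `.text`.
    return result if isinstance(result, str) else result.text
```

Normalizing both return shapes to a string keeps downstream code independent of the format that was requested.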

## Translation
The translations API takes as input an audio file in any of the supported languages and, if necessary, translates the audio into English. This differs from the transcriptions endpoint, whose output stays in the original input language; here the output is always English text. This endpoint supports only the whisper-1 model.

In [9]:
audio2 = "arquivos/audio/fala.mp3"
# Open the audio in binary mode; the context manager closes it after the request
with open(audio2, "rb") as audio_file:
    translation = client.audio.translations.create(
        model="whisper-1",
        file=audio_file,
    )

print(translation.text)

Python is a high-level programming language, interpreted as a script, imperative, oriented to objects, functional, of dynamic and strong typing. It was launched by Guido Van Rossum in 1981. It currently has a community development model, open and managed by the non-profit organization Python Software Foundation. Although several parts of the language have formal patterns and specifications, the language as a whole is not formally specified. The pattern, in practice, is the CpyPython implementation. The language was designed with the philosophy of emphasizing the importance of the programmer's effort over the computational effort. It prioritizes the legibility of the code over speed or expressiveness. It combines a concise and clear syntax with the powerful resources of its standard library and by modules and frameworks developed by third parties.
