# API Usage


This notebook is a quick reference for the third-party APIs covered below: transcription, text-to-speech, and LangChain with OpenAI. Follow the examples to see how each API is called.


In [18]:
from dotenv import load_dotenv

load_dotenv()

True

## 1. Transcription API


For audio-to-text transcription, we will go through two APIs: OpenAI Whisper and Deepgram Nova.


### 1.1 OpenAI Whisper


File types supported: `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `wav`, and `webm`. \
File size limit: 25 MB \
Operations supported by the model: `transcription` and `translation` \
Languages supported: [Click here](https://platform.openai.com/docs/guides/speech-to-text/supported-languages) \
An optional prompt can be sent with the request to guide the model, which can improve the accuracy of the transcription and correct mistakes in it.
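
The limits listed above can be checked locally before a file is uploaded. The following is a minimal sketch, not part of the OpenAI SDK: `check_whisper_input` is a hypothetical helper name, the extension set uses `m4a` for MPEG-4 audio files, and "25 MB" is assumed to mean 25 × 1024 × 1024 bytes.

```python
import os
from typing import List

# Values taken from the list above; they are assumptions and may change on the API side.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
MAX_FILE_SIZE_BYTES = 25 * 1024 * 1024  # assumes "25 MB" means 25 MiB


def check_whisper_input(path: str) -> List[str]:
    """Return a list of problems found; an empty list means the file looks acceptable."""
    problems = []
    # Compare the lower-cased extension (without the dot) against the supported set.
    extension = os.path.splitext(path)[1].lstrip(".").lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    # Only check the size if the file actually exists.
    if not os.path.isfile(path):
        problems.append("file not found")
    elif os.path.getsize(path) > MAX_FILE_SIZE_BYTES:
        problems.append("file is larger than 25 MB")
    return problems
```

Calling `check_whisper_input("../inputs/sample_audio.wav")` before the transcription request gives an early, local error message instead of a failed API call.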


In [28]:
# import required libraries
import openai
import os

openai.api_key = os.getenv("OPENAI_API_KEY")
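# open the audio file in binary mode; the context manager closes it after the request
with open("../inputs/sample_audio.wav", "rb") as audio_file:
    # returns a JSON-like response object (pre-1.0 `openai` SDK interface)
    transcript = openai.Audio.transcribe("whisper-1", audio_file)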
openai_output = transcript.text
print(openai_output)

This dynamic workshop aims to provide up-to-date information on pharmacological approaches, issues, and treatment in the geriatric population to assist in preventing medication-related problems, appropriately and effectively managing medications and compliance. The concept of polypharmacy, taking multiple types of drugs, will also be discussed, as this is a common issue that can impact adverse side effects in the geriatric population. Participants will leave with the knowledge and considerations of common drug interaction and how to minimize effects that limit function. Summit Professional Education is approved provider of continuing education. This course is offered for six units. This course contains content classified under both the domain of occupational therapy and professional issues, period.


### 1.2 Deepgram Nova


Type of transcription: `live` and `pre-recorded` \
Features supported: [Features](https://developers.deepgram.com/docs/features-overview) \
Models supported: [Models](https://developers.deepgram.com/docs/models-overview) \
These features can be passed to the API as options in a dict.


In [3]:
from deepgram import Deepgram
import os
import json

deepgram_api_key = os.getenv("DEEPGRAM_API_KEY")
dg_client = Deepgram(deepgram_api_key)

options = {"punctuate": True, "model": "general", "tier": "enhanced"}

with open("../inputs/sample_audio.wav", "rb") as audio:
    source = {"buffer": audio, "mimetype": "audio/wav"}
    # you can also pass a direct URL to the audio file as the source
    response = await dg_client.transcription.prerecorded(source, options)
    deepgram_output = response["results"]["channels"][0]["alternatives"][0][
        "transcript"
    ]
    print(deepgram_output)
    # save the full response as JSON in the outputs folder, named after the audio file
    with open("../outputs/sample_audio.json", "w") as f:
        json.dump(response, f, indent=4)

This dynamic workshop aims to provide up to date information on pharmacological approach as comma issues, comma and treatment in the geriatric population to assist in preventing medication related problems, comma appropriately and effectively managing medications and compliance period. The concept of poly pharmacy parenthesis taking multiple types of drugs parenthesis will also be discussed as this is a common issue that can impact adverse side effects in the aeriatric population period. Participants will leave with the knowledge and considerations of common drug interaction and how to minimize the effects that limit function, period. Summit professional education is approved provider of continuing education Perion. This course is offered for six units. Period. This course contains content classified under the both the domain of patiental therapy and professional issues, period.
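
The cell above reads only the first alternative of the first channel. A minimal sketch of a more defensive lookup follows, assuming the response keeps the nesting indexed above (`results` → `channels` → `alternatives` → `transcript`). `extract_transcripts` is a hypothetical helper, and `sample_response` is a hand-written stand-in rather than real API output.

```python
from typing import Any, Dict, List


def extract_transcripts(response: Dict[str, Any]) -> List[str]:
    """Return the top alternative's transcript for each channel, skipping empty channels."""
    transcripts = []
    for channel in response.get("results", {}).get("channels", []):
        alternatives = channel.get("alternatives", [])
        if alternatives:
            # The first alternative is the one the cell above uses.
            transcripts.append(alternatives[0].get("transcript", ""))
    return transcripts


# Hypothetical, hand-written response with the same nesting as the one indexed above.
sample_response = {
    "results": {
        "channels": [
            {"alternatives": [{"transcript": "hello world"}]},
            {"alternatives": []},
        ]
    }
}
print(extract_transcripts(sample_response))  # ['hello world']
```

Using `.get` with defaults means a response without any channels yields an empty list instead of raising a `KeyError` or `IndexError`.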


## 2. Text-to-Speech API


To convert text to speech, we will use two APIs: ElevenLabs and Google Cloud Text-to-Speech.


### 2.1 ElevenLabs


Models available : https://docs.elevenlabs.io/speech-synthesis/models \
Prompting : https://docs.elevenlabs.io/speech-synthesis/prompting \
Voice : https://docs.elevenlabs.io/voicelab/overview


In [29]:
import os

from elevenlabs import generate, play, set_api_key

set_api_key(os.getenv("ELEVENLABS_API_KEY"))

audio = generate(
    text=openai_output,
    voice="Bella",
    model="eleven_monolingual_v1",
)

In [30]:
play(audio)

In [31]:
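# save the audio file; generate() is expected to return MP3-encoded bytes by default,
# so the file gets an .mp3 extension rather than .wav
with open("../outputs/elevenlabs_sample_audio.mp3", "wb") as f:
    f.write(audio)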

### 2.2 Google Cloud Text-to-Speech


In [36]:
import os
from typing import Sequence

import google.cloud.texttospeech as tts

api_key_string = os.getenv("GOOGLE_CLOUD_TEXT_TO_SPEECH_API_KEY")


def unique_languages_from_voices(voices: Sequence[tts.Voice]):
    language_set = set()
    for voice in voices:
        for language_code in voice.language_codes:
            language_set.add(language_code)
    return language_set


def list_languages():
    client = tts.TextToSpeechClient(client_options={"api_key": api_key_string})
    response = client.list_voices()
    languages = unique_languages_from_voices(response.voices)

    print(f" Languages: {len(languages)} ".center(60, "-"))
    for i, language in enumerate(sorted(languages)):
        print(f"{language:>10}", end="\n" if i % 5 == 4 else "")


list_languages()

---------------------- Languages: 57 -----------------------
     af-ZA     ar-XA     bg-BG     bn-IN     ca-ES
    cmn-CN    cmn-TW     cs-CZ     da-DK     de-DE
     el-GR     en-AU     en-GB     en-IN     en-US
     es-ES     es-US     eu-ES     fi-FI    fil-PH
     fr-CA     fr-FR     gl-ES     gu-IN     he-IL
     hi-IN     hu-HU     id-ID     is-IS     it-IT
     ja-JP     kn-IN     ko-KR     lt-LT     lv-LV
     ml-IN     mr-IN     ms-MY     nb-NO     nl-BE
     nl-NL     pa-IN     pl-PL     pt-BR     pt-PT
     ro-RO     ru-RU     sk-SK     sr-RS     sv-SE
     ta-IN     te-IN     th-TH     tr-TR     uk-UA
     vi-VN    yue-HK

In [39]:
import google.cloud.texttospeech as tts


def text_to_wav(voice_name: str, text: str):
    language_code = "-".join(voice_name.split("-")[:2])
    text_input = tts.SynthesisInput(text=text)
    voice_params = tts.VoiceSelectionParams(
        language_code=language_code, name=voice_name
    )
    audio_config = tts.AudioConfig(audio_encoding=tts.AudioEncoding.LINEAR16)

    client = tts.TextToSpeechClient(client_options={"api_key": api_key_string})
    response = client.synthesize_speech(
        input=text_input,
        voice=voice_params,
        audio_config=audio_config,
    )

    filename = f"../outputs/{voice_name}.wav"
    with open(filename, "wb") as out:
        out.write(response.audio_content)
        print(f'Generated speech saved to "{filename}"')

In [40]:
text_to_wav("en-AU-Neural2-A", "What is the temperature in Sydney?")
text_to_wav("en-GB-Neural2-B", "What is the temperature in London?")
text_to_wav("en-IN-Wavenet-C", "What is the temperature in Delhi?")
text_to_wav("en-US-Studio-O", "What is the temperature in New York?")
text_to_wav("fr-FR-Neural2-A", "Quelle est la température à Paris ?")
text_to_wav("fr-CA-Neural2-B", "Quelle est la température à Montréal ?")

Generated speech saved to "../outputs/en-AU-Neural2-A.wav"
Generated speech saved to "../outputs/en-GB-Neural2-B.wav"
Generated speech saved to "../outputs/en-IN-Wavenet-C.wav"
Generated speech saved to "../outputs/en-US-Studio-O.wav"
Generated speech saved to "../outputs/fr-FR-Neural2-A.wav"
Generated speech saved to "../outputs/fr-CA-Neural2-B.wav"
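
`text_to_wav` derives the language code from the voice name by keeping its first two hyphen-separated parts. The sketch below isolates that step, assuming voice names follow the `<language>-<region>-<voice type>-<variant>` pattern seen in the calls above. `language_code_from_voice` is a hypothetical helper name.

```python
def language_code_from_voice(voice_name: str) -> str:
    """Keep the first two hyphen-separated parts, i.e. the language and the region."""
    return "-".join(voice_name.split("-")[:2])


for name in ["en-AU-Neural2-A", "en-IN-Wavenet-C", "fr-CA-Neural2-B"]:
    print(name, "->", language_code_from_voice(name))
# en-AU-Neural2-A -> en-AU
# en-IN-Wavenet-C -> en-IN
# fr-CA-Neural2-B -> fr-CA
```

A voice name that does not follow this pattern would produce a wrong language code, so it is worth checking the result against the codes printed by `list_languages()`.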


## 3. LangChain and OpenAI


In [None]:
from langchain import LLMChain, OpenAI, PromptTemplate

prompt = PromptTemplate(
    input_variables=["text", "prompt"],
    template="Prompt that was provided is: {prompt}. Text that was provided is: {text}.",
)
# render the template to inspect the final prompt text
print(
    prompt.format(
        text="I want to start a company that makes cars.",
        prompt="Suggest a name for my company.",
    )
)
llm = OpenAI(temperature=0)
llm_chain = LLMChain(llm=llm, prompt=prompt)
llm_chain.predict(
    text="I want to start a company that makes cars.",
    prompt="Suggest a name for my company.",
)