# Podcast Generator

This notebook supports the creation of a podcast .mp3 audio file.

The generated podcast is an engaging conversation between a host and a guest about content of your choice. You determine what they talk about by providing a list of web pages to take content from.

The podcast generator uses the following technique to create the .mp3:

1. Define a list of URLs to use as input for the podcast content. The generator automatically fetches the content of these web pages and converts it to Markdown.
2. Define the host and the guest.
3. For each web page, generate a podcast transcript (a conversation between the host and the guest). This uses a GPT-3.5 model deployed in Azure OpenAI.
4. Transform the podcast transcript to SSML (Speech Synthesis Markup Language).
5. Transform the SSML output to audio using the Azure Cognitive Services Speech API.
6. Combine all the .mp3 files into one output file.

Let's get started by installing the prerequisites with pip.

In [1]:
import sys
!{sys.executable} -m pip install -r requirements.txt






Now let's define the name of the host and the guest. For a full list of voices, check out https://learn.microsoft.com/en-us/azure/ai-services/speech-service/rest-text-to-speech?tabs=streaming

| LocalName             | ShortName                       | Gender | WordsPerMinute |
|-----------------------|---------------------------------|--------|----------------|
| Ava                   | en-US-AvaNeural                 | Female |                |
| Andrew                | en-US-AndrewNeural              | Male   |                |
| Emma                  | en-US-EmmaNeural                | Female |                |
| Brian                 | en-US-BrianNeural               | Male   |                |
| Jenny *               | en-US-JennyNeural               | Female | 152            |
| Guy *                 | en-US-GuyNeural                 | Male   | 215            |
| Aria *                | en-US-AriaNeural                | Female | 150            |
| Davis *               | en-US-DavisNeural               | Male   | 154            |
| Jane *                | en-US-JaneNeural                | Female | 154            |
| Jason *               | en-US-JasonNeural               | Male   | 156            |
| Sara *                | en-US-SaraNeural                | Female | 157            |
| Tony *                | en-US-TonyNeural                | Male   | 156            |
| Nancy *               | en-US-NancyNeural               | Female | 149            |
| Amber                 | en-US-AmberNeural               | Female | 152            |
| Ana                   | en-US-AnaNeural                 | Female | 135            |
| Ashley                | en-US-AshleyNeural              | Female | 149            |
| Brandon               | en-US-BrandonNeural             | Male   | 156            |
| Christopher           | en-US-ChristopherNeural         | Male   | 149            |
| Cora                  | en-US-CoraNeural                | Female | 146            |
| Elizabeth             | en-US-ElizabethNeural           | Female | 152            |
| Eric                  | en-US-EricNeural                | Male   | 147            |
| Jacob                 | en-US-JacobNeural               | Male   | 154            |
| Jenny Multilingual    | en-US-JennyMultilingualNeural   | Female | 190            |
| Jenny Multilingual V2 | en-US-JennyMultilingualV2Neural | Female | 190            |
| Michelle              | en-US-MichelleNeural            | Female | 154            |
| Monica                | en-US-MonicaNeural              | Female | 145            |
| Roger                 | en-US-RogerNeural               | Male   |                |
| Ryan Multilingual     | en-US-RyanMultilingualNeural    | Male   | 190            |
| Steffan               | en-US-SteffanNeural             | Male   | 154            |

\* Voices marked with an asterisk have styles in preview (for example, assistant, newscast, angry, ...)
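
The WordsPerMinute column gives a rough way to estimate how long a transcript will run for a given voice. A minimal sketch, assuming a steady speaking rate and ignoring pauses, jingles, and any SSML prosody changes; `estimate_minutes` is a hypothetical helper, not part of this notebook:

```python
def estimate_minutes(transcript: str, words_per_minute: int) -> float:
    """Rough spoken duration of a transcript, in minutes."""
    word_count = len(transcript.split())
    return word_count / words_per_minute


# Example: a 1,520-word transcript read by Jenny (152 words per minute in the table)
sample_transcript = " ".join(["word"] * 1520)
print(estimate_minutes(sample_transcript, 152))  # 10.0
```

Several voices, including Brian and Andrew, have no rate listed in the table, so the estimate only works for voices with a value in that column.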

In [2]:
host = "Brian"
guest = "Andrew"
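
Every ShortName in the table above follows the pattern `en-US-<LocalName>Neural`, with spaces removed from the local name. If you need the full voice name (for example, inside SSML), a small helper could derive it. This is a sketch that assumes the en-US locale and only holds for voices that follow this pattern; `voice_short_name` is a hypothetical name, not part of this notebook:

```python
def voice_short_name(local_name: str, locale: str = "en-US") -> str:
    """Build a voice ShortName such as 'en-US-BrianNeural' from a LocalName."""
    return f"{locale}-{local_name.replace(' ', '')}Neural"


print(voice_short_name("Brian"))                  # en-US-BrianNeural
print(voice_short_name("Jenny Multilingual V2"))  # en-US-JennyMultilingualV2Neural
```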

Let's define all the imports.

In [3]:
import os, fnmatch
import requests
import markdownify
import re
import json
import azure.cognitiveservices.speech as speechsdk
import shutil
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from urllib.parse import urljoin
from openai import AzureOpenAI
from pydub import AudioSegment
from pydub.playback import play

load_dotenv()

True

## Retrieve markdown text for a URL

The following function downloads the page content from the URL parameter.

- It finds the div with id `unit-inner-section`.
- Next, it removes some metadata from the HTML, along with any `code` elements.
- Finally, it converts the remaining HTML to Markdown and returns it. Markdown is easier to work with as input for a GPT model because it preserves structure such as headers and lists.
- The function also stores the Markdown content at the given save location (mainly for debugging purposes).

In [4]:
def get_markdown(url, savelocation):
    print("- Retrieving markdown from " + url)

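    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail early on HTTP errors instead of parsing an error page
    soup = BeautifulSoup(response.text, 'html.parser')

    # might need to adapt this when working with other web pages (not Microsoft Learn)
    div = soup.find(id="unit-inner-section")
    if div is None:
        raise ValueError("No element with id 'unit-inner-section' found at " + url)

    for ul in div.find_all("ul", class_="metadata"):
        ul.decompose()
    for d in div.find_all("div", class_="xp-tag"):
        d.decompose()
    for next_section in div.find_all("div", class_="next-section"):
        next_section.decompose()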
    for header in div.find_all(["h1", "h2", "h3", "h4", "h5", "h6"]):
        header.string = "\n# " + header.get_text() + "\n"
    for u in div.find_all(["li"]):
        u.string = "- " + u.get_text()
    for code in div.find_all("code"):
        code.decompose()

    markdown = markdownify.markdownify(str(div), heading_style="ATX", bullets="-")
    markdown = re.sub('\n{3,}', '\n\n', markdown)

    with open(savelocation, "w", encoding="utf-8") as file:
        file.write(markdown)

    return markdown
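
The last clean-up step in `get_markdown` collapses runs of three or more newlines into a single blank line, which keeps the Markdown compact before it is sent to the model. A small standalone illustration using only the standard library (the sample text is made up, and `collapse_blank_lines` is a hypothetical wrapper around the same `re.sub` call):

```python
import re


def collapse_blank_lines(markdown: str) -> str:
    """Replace three or more consecutive newlines with exactly two."""
    return re.sub(r"\n{3,}", "\n\n", markdown)


sample = "# Title\n\n\n\nFirst paragraph.\n\n\n- item"
print(repr(collapse_blank_lines(sample)))
# '# Title\n\nFirst paragraph.\n\n- item'
```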

## Get Azure OpenAI chat response

This function calls the Azure OpenAI GPT model. To set it up, follow these steps:

1. Deploy an Azure OpenAI Service resource (https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/create-resource?pivots=web-portal)
2. Deploy a model "gpt-35-turbo-16k". If quota is available, you can also deploy "gpt-4-32k". The larger the model's token limit, the fewer issues you will experience when calling the chat service. (https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/create-resource?pivots=web-portal#deploy-a-model)
3. Retrieve the API key and store it in the .env file as OPENAI_API_KEY

The following code makes use of some predefined prompts. The idea is that every web page (as Markdown) is attached as content when asking the GPT model to generate a podcast transcript. Since we want the opening and closing sections of the transcript to be different, we have multiple prompts.

Notice that the characters of the host and guest are defined in another template.

For troubleshooting purposes, the output of the chat completion is also stored as a file in the output folder.
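
The placeholder substitution can be tried on its own. The sketch below mirrors the order of the `replace` calls in the function that follows. The prompt files themselves are not shown in this notebook, so the two template strings here are invented stand-ins, and `fill_prompt` is a hypothetical helper:

```python
def fill_prompt(template: str, characters: str, host: str, guest: str, content: str) -> str:
    # Insert the character description first, so that any {host}/{guest}
    # placeholders inside it are filled by the replacements that follow.
    prompt = template.replace("{characters}", characters)
    prompt = prompt.replace("{host}", host)
    prompt = prompt.replace("{guest}", guest)
    # The page content goes in last, so placeholder-like text in it is left alone.
    prompt = prompt.replace("{content}", content)
    return prompt


# Invented stand-ins for prompts/prompt_characters.txt and prompts/prompt_start.txt
characters = "{host} is the host. {guest} is the guest."
template = "{characters}\nWrite a podcast about:\n{content}"

print(fill_prompt(template, characters, "Brian", "Andrew", "# Dataflows"))
```

Replacing `{characters}` before `{host}` and `{guest}` is what lets the character template refer to the speakers by placeholder.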

In [5]:
def get_chat_response(action, content, savelocation):
    print(f"- Retrieving chat response ({action})")
    # Replace azure_endpoint with the endpoint of your own Azure OpenAI resource
    client = AzureOpenAI(azure_endpoint="https://wedebolsaiopenai2.openai.azure.com/", api_version="2023-07-01-preview", api_key=os.getenv("OPENAI_API_KEY"))
    
    with open("prompts/prompt_characters.txt", "r", encoding="utf-8") as text_file:
        prompt_characters = text_file.read()

    with open(f"prompts/prompt_{action}.txt", "r", encoding="utf-8") as text_file:
        prompt = text_file.read()

    prompt = prompt.replace("{characters}", prompt_characters)
    prompt = prompt.replace("{host}", host)
    prompt = prompt.replace("{guest}", guest)
    prompt = prompt.replace("{content}", content)

    message_text = [
        {"role":"system","content":prompt},
        {"role":"user","content":"Create the podcast"}
    ]

    completion = client.chat.completions.create(
        model="gpt-35-turbo-16k",
        messages = message_text,
        temperature=0.2,
        max_tokens=13000,
        top_p=0.95,
        frequency_penalty=0,
        presence_penalty=0,
        stop=None
    )

    output = completion.choices[0].message.content

    with open(savelocation, "w", encoding="utf-8") as file:
        file.write(output)

    return output
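
The call above reserves `max_tokens=13000` for the completion, so the prompt (template plus page Markdown) has to fit in whatever remains of the model's context window. A rough sketch of that arithmetic follows. The 16,384-token window and the four-characters-per-token rule of thumb are assumptions rather than exact figures, and a real tokenizer would give a precise count; both helper names are hypothetical:

```python
def prompt_token_budget(context_window: int, max_completion_tokens: int) -> int:
    """Tokens left for the prompt after reserving room for the completion."""
    return context_window - max_completion_tokens


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate based on character count."""
    return int(len(text) / chars_per_token) + 1


CONTEXT_WINDOW = 16_384  # assumption: adjust to the limit of your deployed model
budget = prompt_token_budget(CONTEXT_WINDOW, 13_000)
print(budget)  # 3384

page = "x" * 20_000  # stand-in for a long page of Markdown
print(estimate_tokens(page) <= budget)  # False: this page would likely not fit
```

If long pages fail, lowering `max_tokens` or using a model with a larger context window leaves more room for the prompt.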

## Create MP3 audio

The following function takes the SSML transcript and uses the Azure Speech Service to transform the text into speech.

1. Deploy an Azure Speech Service resource. Check out https://learn.microsoft.com/en-us/azure/ai-services/speech-service/index-text-to-speech for more information.
2. Retrieve the API key and store it in the .env file as SPEECH_API_KEY.

As a result, an .mp3 file is created at the given save location.
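
Both keys used in this notebook (OPENAI_API_KEY and SPEECH_API_KEY) are read with `os.getenv`, which returns `None` when a variable is not set, so a missing entry in the .env file only surfaces later as an authentication error. A small check, run after `load_dotenv()`, can catch that earlier. `missing_settings` is a hypothetical helper, not part of this notebook:

```python
import os

REQUIRED_SETTINGS = ["OPENAI_API_KEY", "SPEECH_API_KEY"]


def missing_settings(names, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_settings(REQUIRED_SETTINGS)
if missing:
    print("Missing settings: " + ", ".join(missing))
```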

In [6]:
def get_audio(ssml, savelocation):
    print("- Creating audio")
    
    service_region = "eastus"
    speech_key = os.getenv("SPEECH_API_KEY")
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    speech_config.set_speech_synthesis_output_format(speechsdk.SpeechSynthesisOutputFormat.Audio24Khz96KBitRateMonoMp3)  

    file_config = speechsdk.audio.AudioOutputConfig(filename=savelocation)
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=file_config)  

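    result = speech_synthesizer.speak_ssml_async(ssml).get()
    # Report failures (for example, invalid SSML) instead of silently leaving an unusable file
    if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
        print(f"- Speech synthesis did not complete: {result.reason}")
    return result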
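
In this notebook the SSML is produced by the GPT model (the `ssml` prompt), so its exact shape depends on that prompt, which is not shown here. For orientation, this is a hand-built sketch of a two-voice SSML document. `build_ssml` and the sample lines are hypothetical:

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(turns, lang="en-US"):
    """turns: list of (voice_short_name, text) pairs, in speaking order."""
    voices = "".join(
        # escape() protects characters such as & and < in the spoken text
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        for voice, text in turns
    )
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>{voices}</speak>"
    )


ssml = build_ssml([
    ("en-US-BrianNeural", "Welcome to the show!"),
    ("en-US-AndrewNeural", "Thanks, great to be here & ready to talk."),
])
print(ssml)
```

Escaping matters because model-generated text can contain characters that make the XML invalid, which would cause synthesis to fail.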

## Append multiple mp3 files

Since we have multiple .mp3 files, we want to merge/append them together sequentially. 

In [7]:

def append_mp3_files(input_files, output_file):
    print("- Combining audio files " + str(input_files))
    # Initialize an empty AudioSegment
    combined_audio = AudioSegment.silent(duration=0)

    # Iterate through input files and append them to the combined_audio
    for input_file in input_files:
        audio_segment = AudioSegment.from_file(input_file, format="mp3")
        combined_audio += audio_segment

    # Export the combined audio to the output file
    combined_audio.export(output_file, format="mp3")


## Create the combined podcast .mp3 file

This code will first get a list of all the generated .mp3 files, and combine them with a couple of short audio tunes to indicate start, break and finish.

In [8]:
def combineAudio(templocation, savelocation, modulename):
    # os.listdir returns entries in arbitrary order, so sort to keep the units in sequence
    input_files = sorted(fnmatch.filter(os.listdir(templocation), '*.mp3'))
    input_files = [os.path.join(templocation, f) for f in input_files]
    final_files = []

    for i, input_file in enumerate(input_files):
        # Start jingle before the first unit, break jingle before every other unit
        jingle = "start.mp3" if i == 0 else "break.mp3"
        final_files.append(os.path.join("media", jingle))
        final_files.append(input_file)

    # Finish jingle after the last unit (also when there is only one unit)
    if input_files:
        final_files.append(os.path.join("media", "finish.mp3"))

    append_mp3_files(final_files, os.path.join(savelocation, modulename + ".mp3"))
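
`os.listdir` returns entries in arbitrary order, and plain alphabetical sorting would place a unit named `10-...` before `2-...`. If a module has ten or more units, sorting on the numeric prefix keeps them in sequence. A sketch, assuming unit file names start with a number as in the Microsoft Learn URLs used here; `unit_sort_key` and the `10-summary.mp3` name are hypothetical:

```python
import re


def unit_sort_key(filename: str):
    """Sort by leading number first, then by name; files without a number go last."""
    match = re.match(r"(\d+)", filename)
    number = int(match.group(1)) if match else float("inf")
    return (number, filename)


files = ["10-summary.mp3", "2-dataflows-gen-2.mp3", "1-introduction.mp3"]
print(sorted(files, key=unit_sort_key))
# ['1-introduction.mp3', '2-dataflows-gen-2.mp3', '10-summary.mp3']
```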


## Main function
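
The main cell reads `LearningPaths.json`, which is not shown in this notebook. Judging from the keys the code accesses and the names in the log output, the file appears to be a list of learning paths shaped like the structure below. The sketch also shows how each unit gets its `start`, `between`, or `finish` prompt; `action_for` is a hypothetical helper that mirrors the if/elif logic in the cell:

```python
import json

base = "https://learn.microsoft.com/en-us/training/modules/use-dataflow-gen-2-fabric/"

# Assumed shape of LearningPaths.json, inferred from the keys used in the main cell
learning_paths = [
    {
        "learning_path": "Ingest data with Microsoft Fabric",
        "learning_modules": [
            {
                "learning_module": "Ingest Data with Dataflows Gen2 in Microsoft Fabric",
                "learning_units": [
                    base + "1-introduction",
                    base + "2-dataflows-gen-2",
                    base + "3-explore-dataflows-gen-2",
                    base + "4-dataflow-pipeline",
                ],
            }
        ],
    }
]


def action_for(index: int, total: int) -> str:
    """Pick the prompt for a unit based on its position in the module."""
    if index == 0:
        return "start"
    if index == total - 1:
        return "finish"
    return "between"


units = learning_paths[0]["learning_modules"][0]["learning_units"]
plan = [(url.split("/")[-1], action_for(i, len(units))) for i, url in enumerate(units)]
for unit_name, action in plan:
    print(f"{unit_name}: {action}")

# The same structure serialized the way it would be stored on disk
print(json.dumps(learning_paths, indent=2)[:60])
```

Note that a module with a single unit gets the `start` prompt only, because the first check wins.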

In [9]:
try:
    shutil.rmtree("temp")
except FileNotFoundError:
    print("Directory not found")
finally:
    os.mkdir("temp")
    # Make sure the folder for the combined podcasts exists
    os.makedirs("output", exist_ok=True)

with open("LearningPaths.json", "r") as file:
    learning_paths = json.load(file)

for lp in learning_paths:
    os.mkdir(f"temp/{lp['learning_path']}")
    for module in lp["learning_modules"]:
        # Working folder for this module's intermediate files
        module_dir = f"temp/{lp['learning_path']}/{module['learning_module']}"
        os.mkdir(module_dir)
        units = module["learning_units"]

        for index, url in enumerate(units):
            unit_name = url.split("/")[-1]

            # The first and last units get their own opening and closing prompts
            if index == 0:
                action = "start"
            elif index == len(units) - 1:
                action = "finish"
            else:
                action = "between"

            unit_base = f"{module_dir}/{unit_name}"
            markdown = get_markdown(url, f"{unit_base}.md")
            transcript = get_chat_response(action, markdown, f"{unit_base}.transcript.txt")
            ssml = get_chat_response("ssml", transcript, f"{unit_base}.ssml.txt")
            audio = get_audio(ssml, f"{unit_base}.mp3")

        combineAudio(module_dir, "output", module['learning_module'])
        #break

print("Done!")


Directory not found
- Retrieving markdown from https://learn.microsoft.com/en-us/training/modules/use-dataflow-gen-2-fabric/1-introduction


- Retrieving chat response (start)
- Retrieving chat response (ssml)
- Creating audio
- Retrieving markdown from https://learn.microsoft.com/en-us/training/modules/use-dataflow-gen-2-fabric/2-dataflows-gen-2
- Retrieving chat response (between)
- Retrieving chat response (ssml)
- Creating audio
- Retrieving markdown from https://learn.microsoft.com/en-us/training/modules/use-dataflow-gen-2-fabric/3-explore-dataflows-gen-2
- Retrieving chat response (between)
- Retrieving chat response (ssml)
- Creating audio
- Retrieving markdown from https://learn.microsoft.com/en-us/training/modules/use-dataflow-gen-2-fabric/4-dataflow-pipeline
- Retrieving chat response (finish)
- Retrieving chat response (ssml)
- Creating audio
- Combining audio files ['media\\start.mp3', 'temp/Ingest data with Microsoft Fabric/Ingest Data with Dataflows Gen2 in Microsoft Fabric\\1-introduction.mp3', 'media\\break.mp3', 'temp/Ingest data with Microsoft Fabric/Ingest Data with Dataflows Gen2 in Microsoft Fabric\\2-d

KeyboardInterrupt: 