# Podcast Generator

This notebook supports the creation of a podcast as an .mp3 audio file.

The generated podcast is an engaging conversation between a host and a guest about content of your choice. You determine what they talk about by providing a list of web pages to take the content from.

The podcast generator uses the following steps to create the .mp3:

1. Define a list of URLs you want to use as the input for the podcast content. The generator automatically fetches the content of these web pages and converts it to Markdown.
2. Define who the host and the guest are.
3. For each web page, generate a podcast transcript (in which the host and the guest have a conversation). This uses a deployed Azure OpenAI GPT-3.5 model.
4. Transform the podcast transcript to SSML (Speech Synthesis Markup Language)
5. Transform the SSML output to audio using Azure Cognitive Service Speech API
6. Combine all the .mp3 files into one output

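Let's get started by installing the prerequisites with `pip install`. Judging by the imports used below, these are `requests`, `markdownify`, `beautifulsoup4`, `python-dotenv`, `openai`, `pydub`, and `azure-cognitiveservices-speech`.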

Now let's define the name of the host and the guest. For a full list of voices, check out https://learn.microsoft.com/en-us/azure/ai-services/speech-service/rest-text-to-speech?tabs=streaming

| LocalName             | ShortName                       | Gender | WordsPerMinute |
|-----------------------|---------------------------------|--------|----------------|
| Ava                   | en-US-AvaNeural                 | Female |                |
| Andrew                | en-US-AndrewNeural              | Male   |                |
| Emma                  | en-US-EmmaNeural                | Female |                |
| Brian                 | en-US-BrianNeural               | Male   |                |
| Jenny *               | en-US-JennyNeural               | Female | 152            |
| Guy *                 | en-US-GuyNeural                 | Male   | 215            |
| Aria *                | en-US-AriaNeural                | Female | 150            |
| Davis *               | en-US-DavisNeural               | Male   | 154            |
| Jane *                | en-US-JaneNeural                | Female | 154            |
| Jason *               | en-US-JasonNeural               | Male   | 156            |
| Sara *                | en-US-SaraNeural                | Female | 157            |
| Tony *                | en-US-TonyNeural                | Male   | 156            |
| Nancy *               | en-US-NancyNeural               | Female | 149            |
| Amber                 | en-US-AmberNeural               | Female | 152            |
| Ana                   | en-US-AnaNeural                 | Female | 135            |
| Ashley                | en-US-AshleyNeural              | Female | 149            |
| Brandon               | en-US-BrandonNeural             | Male   | 156            |
| Christopher           | en-US-ChristopherNeural         | Male   | 149            |
| Cora                  | en-US-CoraNeural                | Female | 146            |
| Elizabeth             | en-US-ElizabethNeural           | Female | 152            |
| Eric                  | en-US-EricNeural                | Male   | 147            |
| Jacob                 | en-US-JacobNeural               | Male   | 154            |
| Jenny Multilingual    | en-US-JennyMultilingualNeural   | Female | 190            |
| Jenny Multilingual V2 | en-US-JennyMultilingualV2Neural | Female | 190            |
| Michelle              | en-US-MichelleNeural            | Female | 154            |
| Monica                | en-US-MonicaNeural              | Female | 145            |
| Roger                 | en-US-RogerNeural               | Male   |                |
| Ryan Multilingual     | en-US-RyanMultilingualNeural    | Male   | 190            |
| Steffan               | en-US-SteffanNeural             | Male   | 154            |

\* These voices have styles in preview (for example, assistant, newscast, angry, ...)

In [13]:
host = "Brian"
guest = "Emma"
podcast_title = "DP600, Implementing Analytics Solutions Using Microsoft Fabric"
code = "DP-600"
learn_module = "all"
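
To make the link between the names above and the `ShortName` column concrete, here is a minimal sketch of how a voice ends up in SSML. The helper `build_ssml` and the sample dialogue are hypothetical and only for illustration; the notebook itself lets the GPT model produce the SSML from a prompt.

```python
from xml.sax.saxutils import escape

# ShortName values copied from the voice table above.
VOICES = {
    "Brian": "en-US-BrianNeural",
    "Emma": "en-US-EmmaNeural",
}


def build_ssml(turns, voices=VOICES, lang="en-US"):
    """Wrap (speaker, text) turns in one <voice> element per turn.

    Illustrative helper only; the notebook generates its SSML with a prompt.
    """
    parts = [
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{lang}">'
    ]
    for speaker, text in turns:
        # escape() protects characters such as & and < inside the spoken text.
        parts.append(f'  <voice name="{voices[speaker]}">{escape(text)}</voice>')
    parts.append("</speak>")
    return "\n".join(parts)


# Made-up dialogue, used only to show the resulting markup.
ssml_example = build_ssml([
    ("Brian", "Welcome to the show!"),
    ("Emma", "Thanks, happy to talk about Fabric & dataflows."),
])
print(ssml_example)
```

Each speaker gets its own `<voice>` element, which is what lets a single SSML document switch between the host and the guest.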

Let's define all the imports.

In [14]:
import os, fnmatch
import requests
import markdownify
import re
import json
import azure.cognitiveservices.speech as speechsdk
import shutil
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from urllib.parse import urljoin
from openai import AzureOpenAI
from pydub import AudioSegment
from pydub.playback import play

load_dotenv()

True

## Retrieve markdown text for a URL

The following function downloads the page content from the URL parameter.

- It then finds the div with id `unit-inner-section`.
- Next, it removes some metadata from the HTML, along with any `code` elements.
- Finally, it converts the remaining HTML to Markdown and returns it. Markdown is easier to work with as input for a GPT model, because it preserves structure such as headers.
- The function also stores the Markdown content in the output folder (mainly for debugging purposes).

In [3]:
def get_markdown(url, savelocation):
    print("- Retrieving markdown from " + url)

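    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail early on HTTP errors instead of parsing an error page
    soup = BeautifulSoup(response.text, 'html.parser')

    # might need to adapt this when working with other web pages (not Microsoft Learn)
    div = soup.find(id="unit-inner-section")
    if div is None:
        raise ValueError("No element with id 'unit-inner-section' found in " + url)

    # remove metadata, tags and navigation blocks
    for ul in div.find_all("ul", class_="metadata"):
        ul.decompose()
    for tag in div.find_all("div", class_="xp-tag"):
        tag.decompose()
    for next_section in div.find_all("div", class_="next-section"):
        next_section.decompose()
    # prefix the text of every heading with '# '
    for header in div.find_all(["h1", "h2", "h3", "h4", "h5", "h6"]):
        header.string = "\n# " + header.get_text() + "\n"
    # drop code elements
    for code_tag in div.find_all("code"):
        code_tag.decompose()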

    markdown = markdownify.markdownify(str(div), heading_style="ATX", bullets="-")
    markdown = re.sub('\n{3,}', '\n\n', markdown)
    markdown = markdown.replace("[Continue](/en-us/)", "")

    with open(savelocation, "w", encoding="utf-8") as file:
        file.write(markdown)

    return markdown
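
The last two clean-up steps of `get_markdown` can be tried without downloading anything. The sketch below applies them to a made-up string; the helper name `tidy_markdown` and the sample text are assumptions for illustration.

```python
import re


def tidy_markdown(markdown):
    """Apply the same two clean-up steps used at the end of get_markdown."""
    # Collapse runs of three or more newlines into a single blank line.
    markdown = re.sub(r"\n{3,}", "\n\n", markdown)
    # Remove the leftover "Continue" navigation link.
    return markdown.replace("[Continue](/en-us/)", "")


# Made-up sample input with extra blank lines and the navigation link.
sample = "# Introduction\n\n\n\nSome body text.\n\n\n[Continue](/en-us/)"
print(repr(tidy_markdown(sample)))
```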

## Get Azure OpenAI chat response

This function calls the Azure OpenAI GPT model. To set it up, follow these steps:

1. Deploy an Azure OpenAI Service resource (https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/create-resource?pivots=web-portal)
2. Deploy the model "gpt-35-turbo-16k". If quota is available, you can deploy "gpt-4-32k" instead. A larger context window leads to fewer issues when calling the chat service. Note that the code below passes "gpt-35-turbo-16k" as the deployment name. (https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/create-resource?pivots=web-portal#deploy-a-model)
3. Retrieve the API key and the endpoint of the resource and store them in the .env file as OPENAI_API_KEY and AZURE_ENDPOINT

The following code makes use of some predefined prompts. The idea is that every web page (as Markdown) is attached as content when asking the GPT model to generate a podcast transcript. Since we want the opening and closing sections of the transcript to be different, we have multiple prompts.

Notice that the characters of the host and guest are defined in another template.

For troubleshooting purposes, the output of the chat completion is also stored as a file in the output folder.

In [15]:
def get_chat_response(action, content, savelocation, maxtokens=13000, userMessage="Generate the podcast"):
    print(f"- Retrieving chat response ({action}, maxtokens={maxtokens})")
    client = AzureOpenAI(azure_endpoint=os.getenv("AZURE_ENDPOINT"), api_version="2023-07-01-preview", api_key=os.getenv("OPENAI_API_KEY"))
    
    with open("prompts/prompt_characters.txt", "r", encoding="utf-8") as text_file:
        prompt_characters = text_file.read()

    with open(f"prompts/prompt_{action}.txt", "r", encoding="utf-8") as text_file:
        prompt = text_file.read()

    prompt = prompt.replace("{characters}", prompt_characters)
    prompt = prompt.replace("{host}", host)
    prompt = prompt.replace("{guest}", guest)
    prompt = prompt.replace("{content}", content)
    prompt = prompt.replace("{podcast_title}", podcast_title)

    message_text = [
        {"role":"system","content":prompt},
        {"role":"user","content":userMessage}
    ]

    completion = client.chat.completions.create(
        model="gpt-35-turbo-16k",
        messages = message_text,
        temperature=0.1,
        # max_tokens=maxtokens,  # disabled: maxtokens is currently only logged, not sent to the API
        top_p=0.95,
        frequency_penalty=0,
        presence_penalty=0,
        stop=None
    )

    output = completion.choices[0].message.content
    print(f"- Actual total usage token={completion.usage.total_tokens}")

    with open(savelocation, "w", encoding="utf-8") as file:
        file.write(output)

    return output
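
The placeholder substitution inside `get_chat_response` can be illustrated in isolation. The template text below is hypothetical (the real prompts live in the `prompts/prompt_*.txt` files, which are not shown here), and `fill_prompt` is just a compact stand-in for the chain of `replace` calls.

```python
def fill_prompt(template, values):
    """Replace each {name} placeholder with its value, as get_chat_response does."""
    for name, value in values.items():
        template = template.replace("{" + name + "}", value)
    return template


# Hypothetical template text; the real prompt files are not part of this notebook.
template = (
    "{characters}\n"
    "Write the opening of the podcast '{podcast_title}' "
    "hosted by {host} with guest {guest}.\n"
    "Source material:\n{content}"
)

prompt = fill_prompt(template, {
    # Order matters: {characters} is expanded first, so any {host} or {guest}
    # placeholders inside it are filled by the replacements that follow.
    "characters": "{host} is curious. {guest} is the expert.",
    "host": "Brian",
    "guest": "Emma",
    "content": "# Introduction\nSome body text.",
    "podcast_title": "DP600, Implementing Analytics Solutions Using Microsoft Fabric",
})
print(prompt)
```

Plain `str.replace` is used rather than `str.format`, so literal braces in a prompt file do not need escaping.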

## Create MP3 audio

The following function takes the SSML transcript and uses the Azure Speech Service to transform the text into speech.

1. Deploy an Azure Speech Service resource. Check out https://learn.microsoft.com/en-us/azure/ai-services/speech-service/index-text-to-speech for more information.
2. Retrieve the key and store it in the .env file as SPEECH_API_KEY. The region is hard-coded to `eastus` in the function, so adjust it if your resource is deployed elsewhere.
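
Because a missing key only surfaces once a service call fails, it can help to check the settings up front. This is a small sketch, assuming the three variable names that the code reads with `os.getenv`; the helper `missing_settings` is hypothetical and not part of the notebook.

```python
import os

# Names read via os.getenv() in get_chat_response and get_audio.
REQUIRED_SETTINGS = ("AZURE_ENDPOINT", "OPENAI_API_KEY", "SPEECH_API_KEY")


def missing_settings(env=None):
    """Return the required setting names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
# The values are placeholders, not real credentials.
example_env = {
    "AZURE_ENDPOINT": "https://example.invalid/",
    "OPENAI_API_KEY": "placeholder",
}
print(missing_settings(example_env))  # ['SPEECH_API_KEY']
```

Calling `missing_settings()` without arguments after `load_dotenv()` checks the actual environment.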

As a result, an .mp3 file will be created in the output folder.

In [5]:
def get_audio(ssml, savelocation):
    print(f"- Creating audio {savelocation}")
    
    service_region = "eastus"
    speech_key = os.getenv("SPEECH_API_KEY")
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    speech_config.set_speech_synthesis_output_format(speechsdk.SpeechSynthesisOutputFormat.Audio24Khz96KBitRateMonoMp3)  

    file_config = speechsdk.audio.AudioOutputConfig(filename=savelocation)
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=file_config)  

    result = speech_synthesizer.speak_ssml_async(ssml).get()
    return result
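
Since the SSML is produced by a language model, it may be worth checking that it is well-formed XML before sending it to the speech service. The following is a minimal sketch using only the standard library; `check_ssml` is a hypothetical helper, and it verifies structure only, not whether the speech service accepts every element.

```python
import xml.etree.ElementTree as ET


def check_ssml(ssml):
    """Return (ok, message) for a basic well-formedness check of an SSML string."""
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError as error:
        return False, f"not well-formed XML: {error}"
    # Strip an optional namespace such as {http://www.w3.org/2001/10/synthesis}.
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name != "speak":
        return False, f"root element is <{local_name}>, expected <speak>"
    return True, "ok"


good_ssml = (
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
    '<voice name="en-US-BrianNeural">Hello</voice>'
    "</speak>"
)
# The <voice> element is never closed, so parsing fails.
bad_ssml = '<speak><voice name="en-US-EmmaNeural">Hi</speak>'

print(check_ssml(good_ssml))
print(check_ssml(bad_ssml)[0])
```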

## Append multiple mp3 files

Since we have multiple .mp3 files, we want to merge/append them together sequentially. 

## Create the combined podcast .mp3 file

This code will first get a list of all the generated .mp3 files, and combine them with a couple of short audio tunes to indicate start, break and finish.

In [6]:
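def combineAudio(templocation, savelocation):
    # natural sort on the file names, so that "10-..." comes after "2-..."
    def natural_key(name):
        return [int(part) if part.isdigit() else part for part in re.split(r'(\d+)', name)]

    input_files = sorted(fnmatch.filter(os.listdir(templocation), '*.mp3'), key=natural_key)
    input_files = [os.path.join(templocation, name) for name in input_files]
    final_files = []

    # build the tune paths with os.path.join so they work on any operating system
    start_tune = os.path.join("media", "start.mp3")
    break_tune = os.path.join("media", "break.mp3")
    finish_tune = os.path.join("media", "finish.mp3")

    for i in range(len(input_files)):
        if i == 0:
            final_files.append(start_tune)
            final_files.append(input_files[i])
        elif i == len(input_files) - 1:
            final_files.append(break_tune)
            final_files.append(input_files[i])
            final_files.append(finish_tune)
        elif i == 1:
            final_files.append(input_files[i]) # skip the first break (as the introduction is only a few minutes long)
        else:
            final_files.append(break_tune)
            final_files.append(input_files[i])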

    print("- Combining audio files " + str(final_files))
    
    # Initialize an empty AudioSegment
    combined_audio = AudioSegment.silent(duration=0)

    # Iterate through input files and append them to the combined_audio
    for input_file in final_files:
        audio_segment = AudioSegment.from_file(input_file, format="mp3")
        combined_audio += audio_segment

    # Export the combined audio to the output file
    combined_audio.export(savelocation, format="mp3")
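
The ordering logic of `combineAudio` is easier to follow when separated from the audio handling. The sketch below reproduces only the branch structure on plain file names; `build_playlist` is a hypothetical helper, and no audio is read or written.

```python
import os

MEDIA_DIR = "media"  # folder holding start.mp3, break.mp3 and finish.mp3


def build_playlist(unit_files, media_dir=MEDIA_DIR):
    """Return unit files interleaved with tunes, in the order combineAudio uses."""
    start = os.path.join(media_dir, "start.mp3")
    pause = os.path.join(media_dir, "break.mp3")
    finish = os.path.join(media_dir, "finish.mp3")

    playlist = []
    last = len(unit_files) - 1
    for i, unit in enumerate(unit_files):
        if i == 0:
            playlist += [start, unit]
        elif i == last:
            playlist += [pause, unit, finish]
        elif i == 1:
            playlist.append(unit)  # no break right after the short introduction
        else:
            playlist += [pause, unit]
    return playlist


units = [
    "1-introduction.mp3",
    "2-provision-computer-vision-resource.mp3",
    "3-analyze-image.mp3",
    "4-generate-smart-cropped-thumbnail.mp3",
]
playlist = build_playlist(units)
for entry in playlist:
    print(entry)
```

For four units this yields start, unit 1, unit 2, break, unit 3, break, unit 4, finish.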


In [7]:
def calculate_number_words(text): 
    nrOfWords = len(text.split())
    return nrOfWords

def calculate_approx_tokens(text):
    nrOfTokens = round(calculate_number_words(text) * 3)
    if nrOfTokens > 13000:
        nrOfTokens = 13000
    return nrOfTokens
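
As a quick worked example of the heuristic above (three tokens per word, capped at 13000), here is an equivalent compact form. The name `approx_tokens` and its keyword arguments are assumptions made for this sketch.

```python
def approx_tokens(text, tokens_per_word=3, cap=13000):
    """Estimate a token budget from the word count, limited to `cap`."""
    return min(len(text.split()) * tokens_per_word, cap)


print(approx_tokens("word " * 190))    # 570
print(approx_tokens("word " * 10000))  # 13000
```

A page of 190 words gives 190 * 3 = 570, while anything above 4333 words hits the cap.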

## Main function
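
The loop in this section reads `LearningPaths.json`, which is not shown in the notebook. The sketch below uses a hypothetical file content whose shape is inferred from the keys the loop accesses (`learning_modules`, `learning_module`, `learning_units`); the module name is made up, and the URLs are taken from the cell output further down. It also shows how the position of a unit selects the prompt.

```python
import json

# Hypothetical content; only the key names are taken from the loop below.
example_json = """
[
  {
    "learning_modules": [
      {
        "learning_module": "Example module",
        "learning_units": [
          "https://learn.microsoft.com/en-us/training/modules/use-dataflow-gen-2-fabric/1-introduction",
          "https://learn.microsoft.com/en-us/training/modules/use-dataflow-gen-2-fabric/2-dataflows-gen-2",
          "https://learn.microsoft.com/en-us/training/modules/use-dataflow-gen-2-fabric/3-explore-dataflows-gen-2"
        ]
      }
    ]
  }
]
"""


def pick_action(index, total):
    """First unit opens the podcast, last unit closes it, the rest sit in between."""
    if index == 0:
        return "start"
    if index == total - 1:
        return "finish"
    return "between"


plan = []
for lp in json.loads(example_json):
    for module in lp["learning_modules"]:
        units = module["learning_units"]
        for index, url in enumerate(units):
            unit_name = url.split("/")[-1]  # last URL segment names the output files
            plan.append((unit_name, pick_action(index, len(units))))

print(plan)
```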

In [8]:
with open("LearningPaths.json", "r") as file:
    learning_paths = json.load(file)

for lp in learning_paths:

    modules = [module for module in lp["learning_modules"] if module["learning_module"] == learn_module or learn_module == "all"]
    for module in modules:
        outputFolder_module = f"output/{code}.{module['learning_module']}"
        outputFile_module_mp3 = f"output/{code}.{module['learning_module']}.mp3"

        # also creates the parent "output" folder when it does not exist yet
        os.makedirs(outputFolder_module, exist_ok=True)

        for index, url in enumerate(module["learning_units"]):
            unit_name = url.split("/")[-1]

            if index == 0:
                action = "start"
            elif index == len(module["learning_units"]) - 1:
                action = "finish"
            else:
                action = "between"
            
            outputFile_md = f"{outputFolder_module}/{unit_name}.md"
            outputFile_transcript = f"{outputFolder_module}/{unit_name}.transcript.txt"
            outputFile_ssml = f"{outputFolder_module}/{unit_name}.ssml.xml"
            outputFile_unit_mp3 = f"{outputFolder_module}/{unit_name}.mp3"
            
            markdown = get_markdown(url, outputFile_md)
            transcript = get_chat_response(action, markdown, outputFile_transcript, calculate_approx_tokens(markdown))
            ssml = get_chat_response("ssml", transcript, outputFile_ssml)
            #audio = get_audio(ssml, outputFile_unit_mp3)

            #break

        #combineAudio(outputFolder_module , outputFile_module_mp3)

        #break

print("Done!")


- Retrieving markdown from https://learn.microsoft.com/en-us/training/modules/use-dataflow-gen-2-fabric/1-introduction
- Retrieving chat response (start, maxtokens=570)
- Actual total usage token=1190
- Retrieving chat response (ssml, maxtokens=13000)
- Actual total usage token=1651
- Retrieving markdown from https://learn.microsoft.com/en-us/training/modules/use-dataflow-gen-2-fabric/2-dataflows-gen-2
- Retrieving chat response (between, maxtokens=1800)
- Actual total usage token=1766
- Retrieving chat response (ssml, maxtokens=13000)
- Actual total usage token=1673
- Retrieving markdown from https://learn.microsoft.com/en-us/training/modules/use-dataflow-gen-2-fabric/3-explore-dataflows-gen-2
- Retrieving chat response (between, maxtokens=1206)
- Actual total usage token=1680
- Retrieving chat response (ssml, maxtokens=13000)
- Actual total usage token=1895
- Retrieving markdown from https://learn.microsoft.com/en-us/training/modules/use-dataflow-gen-2-fabric/4-dataflow-pipeline
- Re

In [7]:
with open("LearningPaths.json", "r") as file:
    learning_paths = json.load(file)

for lp in learning_paths:

    modules = [module for module in lp["learning_modules"] if module["learning_module"] == learn_module or learn_module == "all"]
    for module in modules:
        outputFolder_module = f"output/{code}.{module['learning_module']}"
        outputFile_module_mp3 = f"output/{code}.{module['learning_module']}.mp3"

        for index, url in enumerate(module["learning_units"]):
            unit_name = url.split("/")[-1]

            outputFile_ssml = f"{outputFolder_module}/{unit_name}.ssml.xml"
            outputFile_unit_mp3 = f"{outputFolder_module}/{unit_name}.mp3"

            with open(outputFile_ssml, "r", encoding="utf8") as ssml_file:
                ssml = ssml_file.read()
            
            audio = get_audio(ssml, outputFile_unit_mp3)

            #break

        combineAudio(outputFolder_module , outputFile_module_mp3)

        #break

print("Done!")


- Creating audio output/AI-3004.Analyze images/1-introduction.mp3
- Creating audio output/AI-3004.Analyze images/2-provision-computer-vision-resource.mp3
- Creating audio output/AI-3004.Analyze images/3-analyze-image.mp3
- Creating audio output/AI-3004.Analyze images/4-generate-smart-cropped-thumbnail.mp3
- Combining audio files ['media\\start.mp3', 'output/AI-3004.Analyze images\\1-introduction.mp3', 'output/AI-3004.Analyze images\\2-provision-computer-vision-resource.mp3', 'media\\break.mp3', 'output/AI-3004.Analyze images\\3-analyze-image.mp3', 'media\\break.mp3', 'output/AI-3004.Analyze images\\4-generate-smart-cropped-thumbnail.mp3', 'media/finish.mp3']
Done!


In [17]:
import glob

md_files = glob.glob('output/**/*.md', recursive=True)
for md_file in md_files:
    print(md_file)

    with open(md_file, "r", encoding="utf8") as md_file_handle:
        markdown = md_file_handle.read()

    # swap only the file extension, leaving the rest of the path untouched
    outputFile_plantuml = os.path.splitext(md_file)[0] + ".plantuml"

    plantuml = get_chat_response("plantuml", markdown, outputFile_plantuml, 13000, "Generate the plantuml code")
    print(outputFile_plantuml)


output\AI-3003.Analyze text with Azure AI Language\1-introduction.md
- Retrieving chat response (plantuml, maxtokens=13000)


- Actual total usage token=341
output\AI-3003.Analyze text with Azure AI Language\1-introduction.plantuml
output\AI-3003.Analyze text with Azure AI Language\2-provision-resource.md
- Retrieving chat response (plantuml, maxtokens=13000)
- Actual total usage token=551
output\AI-3003.Analyze text with Azure AI Language\2-provision-resource.plantuml
output\AI-3003.Analyze text with Azure AI Language\3-detect-language.md
- Retrieving chat response (plantuml, maxtokens=13000)
- Actual total usage token=882
output\AI-3003.Analyze text with Azure AI Language\3-detect-language.plantuml
output\AI-3003.Analyze text with Azure AI Language\4-extract-key-phrases.md
- Retrieving chat response (plantuml, maxtokens=13000)
- Actual total usage token=325
output\AI-3003.Analyze text with Azure AI Language\4-extract-key-phrases.plantuml
output\AI-3003.Analyze text with Azure AI Language\5-analyze-sentiment.md
- Retrieving chat response (plantuml, maxtokens=13000)
- Actual total usage token=478
output\AI-30