<a href="https://colab.research.google.com/github/tiangu1980/HighPrecisionStepperJuggler/blob/master/demo/VibeVoice_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# VibeVoice Colab — T4 Quickstart (1.5B)

This notebook is a quickstart guide for running VibeVoice on Colab with a T4 GPU. Because of memory limits, the T4 can only run the 1.5B model. In addition, on a T4 the model must use SDPA instead of flash_attention_2, which may produce less stable, lower-quality audio. For the best TTS experience, we recommend trying the 7B model on a more powerful GPU.
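
The SDPA restriction comes from the GPU architecture. FlashAttention-2 targets Ampere-class and newer GPUs (CUDA compute capability 8.0 and up), while the T4 is a Turing card with compute capability 7.5. The following minimal sketch picks an attention backend from the compute capability. The helper `pick_attn_implementation` is hypothetical and not part of VibeVoice. The returned strings are the `attn_implementation` values that Hugging Face Transformers accepts.

```python
try:
    import torch  # preinstalled on Colab
except ImportError:  # lets the sketch run where PyTorch is absent
    torch = None


def pick_attn_implementation(capability: tuple[int, int]) -> str:
    """Choose a Hugging Face Transformers `attn_implementation` value.

    Hypothetical helper: FlashAttention-2 needs compute capability 8.0+
    (Ampere or newer), so older GPUs such as the T4 (7.5) fall back to SDPA.
    """
    major, _minor = capability
    return "flash_attention_2" if major >= 8 else "sdpa"


if torch is not None and torch.cuda.is_available():
    cap = torch.cuda.get_device_capability(0)  # (7, 5) on a T4
    print(torch.cuda.get_device_name(0), cap, "->", pick_attn_implementation(cap))
else:
    print("No CUDA GPU detected")
```

Checking the compute capability rather than the device name avoids hard-coding a list of GPU models.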


## Step 1: Set Up Environment

In [None]:
# Check for T4 GPU
import torch
if torch.cuda.is_available() and "T4" in torch.cuda.get_device_name(0):
    print("✅ T4 GPU detected")
else:
    print("""
    ⚠️ WARNING: T4 GPU not detected

    The recommended runtime for this Colab notebook is "T4 GPU".

    To change the runtime type:

        1. Click on "Runtime" in the top navigation menu
        2. Click on "Change runtime type"
        3. Select "T4 GPU"
        4. Click "OK" if a "Disconnect and delete runtime" window appears
        5. Click on "Save"

    """)

# Clone the VibeVoice repository (skipped if /content/VibeVoice already exists)
![ -d /content/VibeVoice ] || git clone --quiet --branch main --depth 1 https://github.com/great-wind/MicroSoft_VibeVoice.git /content/VibeVoice
print("✅ Cloned VibeVoice repository")

# Install the project and its dependencies in editable mode
!uv pip --quiet install --system -e /content/VibeVoice
print("✅ Installed dependencies")

# Download the model weights (~3 minutes)
!HF_XET_HIGH_PERFORMANCE=1 hf download microsoft/VibeVoice-1.5B --quiet --local-dir /content/models/VibeVoice-1.5B > /dev/null
print("✅ Downloaded model: microsoft/VibeVoice-1.5B")
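
The `print` calls in the cell above run even if a shell command fails, so a success message does not guarantee that setup worked. The following minimal sketch checks that the expected files are present. It assumes that the checkpoint ships a `config.json` and `.safetensors` shards, as Hugging Face Transformers checkpoints usually do. The helper `missing_artifacts` is hypothetical.

```python
from pathlib import Path


def missing_artifacts(repo_dir: str, model_dir: str) -> list[str]:
    """List expected setup paths that are absent (hypothetical check).

    Assumes the model directory holds config.json plus *.safetensors shards.
    """
    problems = []
    script = Path(repo_dir) / "demo" / "inference_from_file.py"  # used in Step 3
    if not script.is_file():
        problems.append(str(script))
    model = Path(model_dir)
    if not (model / "config.json").is_file():
        problems.append(str(model / "config.json"))
    if not any(model.glob("*.safetensors")):
        problems.append(str(model / "*.safetensors"))
    return problems


missing = missing_artifacts("/content/VibeVoice", "/content/models/VibeVoice-1.5B")
print("✅ Setup verified" if not missing else f"⚠️ Missing: {missing}")
```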


## Step 2: Create Transcript

In [None]:
%%writefile /content/my_transcript.txt
Speaker 4: It's a cozy Saturday afternoon. Alice, Carter, and Frank are sitting at a café table, planning their big Greek adventure. Laptops open, coffee cups half-full, and Alice already has a notebook covered with doodles of olive branches and little temples.
Speaker 1: Okay, first things first: Athens! The cradle of democracy, birthplace of philosophy, land of gyros and feta cheese. We have to start there.
Speaker 2: You just sound like a travel brochure already. Did you secretly get hired by the Greek Ministry of Tourism?
Speaker 1: If they paid me in baklava, I'd take the job! Anyway, we'll start with the Acropolis. Can you imagine walking up where Socrates and Plato once debated?
Speaker 2: Yeah, I imagine Socrates would look at my sneakers and go, "Really? That's your toga alternative?"
Speaker 3: Hey, at least you're wearing shoes. Half the tourists I see in Athens are trying to climb marble stairs in flip-flops. It's like an Olympic sport called "Sprained Ankle 101."
Speaker 1: Haha. True. But besides the Acropolis, we should check out the Parthenon. It was dedicated to Athena, the goddess of wisdom. The columns are still standing after 2,500 years!
Speaker 2: Yeah, meanwhile my IKEA bookshelf didn't last two months.
Speaker 3: That's because you didn't sacrifice a goat to Athena before assembling it.
Speaker 1: Stop! Okay, but food. Carter, you'll love souvlaki. Tender grilled meat on skewers. And saganaki, which is basically cheese set on fire at your table.
Speaker 2: Flaming cheese? Okay, Greece officially understands me.
Speaker 3: Next stop: Delphi. The ancient Greeks thought it was the center of the world, right?
Speaker 1: Yes! The Oracle of Apollo. People traveled from all over just to hear cryptic advice. Like, "You'll win if you don't lose." Super helpful.
Speaker 2: So basically the same as a modern motivational poster.
Speaker 1: Heehee. Exactly. But the ruins are incredible: the Temple of Apollo, the theater with mountain views. You feel like the gods are still listening.
Speaker 3: And don't forget the food. The mountain villages nearby? Honey, nuts, homemade yogurt. You'll eat something simple like bread with olive oil, and it tastes like heaven.
Speaker 2: If the oracle told me "Your destiny is carbs," I'd be like, "Finally, some accurate prophecy."
Speaker 1: Meteora is going to blow your mind. Monasteries perched on top of giant rock pillars-like castles in the clouds.
Speaker 2: Okay but... how do monks even get groceries up there?
Speaker 3: They used to haul supplies with ropes and baskets. Imagine being the delivery guy: "Here's your pizza, please don't drop me."
Speaker 1: Haha. Those monasteries are UNESCO sites now. We'll climb up and see Byzantine frescoes that survived centuries.
Speaker 2: I'm more motivated by the "climb down and eat moussaka" part.
Speaker 3: Good choice. Moussaka: layers of eggplant, meat, and béchamel sauce. It's basically Greek lasagna, but better.
Speaker 3: Ah, Thessaloniki. My favorite food city. Street markets full of bougatsa, a sweet custard pie wrapped in flaky pastry. You'll eat one, then accidentally eat three more.
Speaker 1: And history too! Roman forums, Byzantine churches, Ottoman baths. Every empire left its mark.
Speaker 2: So it's like the European version of my old hard drive: cluttered with everything.
Speaker 3: Exactly. But tastier. And at night, the waterfront fills with music. You'll hear rebetiko songs, kind of like Greek blues.
Speaker 1: Romantic! We'll sit by the sea, eating fresh seafood, maybe grilled octopus.
Speaker 2: As long as it doesn't still look like it could hug me back.
Speaker 1: Now the iconic one: Santorini. White houses, blue domes, sunsets that make Instagram crash.
Speaker 2: I'm already seeing the photos: "nofilter living my best life."
Speaker 3: It's true. Oia at sunset is one of those bucket-list moments. But warning: crowds. Everyone's elbowing each other for that one picture.
Speaker 1: Then we'll sneak off and find a taverna. Tomato fritters, fava bean dip, local white wine. Simple, fresh, perfect.
Speaker 2: Finally, a meal that doesn't start with "fried cheese explosion."
Speaker 3: Don't worry, we'll still get fried cheese tomorrow.
Speaker 3: Crete is a whole trip by itself, but we'll squeeze in highlights. Knossos Palace, the legendary labyrinth of King Minos. Some say the Minotaur was kept there.
Speaker 2: Great, so basically "Airbnb but with a monster roommate."
Speaker 1: The palace is fascinating: Minoan frescoes of dolphins and dancers. They were advanced for 3,500 years ago!
Speaker 3: And Crete's food... oh boy. Dakos: barley rusk topped with tomato and feta. Fresh fish straight from the sea. And raki, the local spirit.
Speaker 2: What's the alcohol percentage?
Speaker 3: High enough that after two shots, you'll start believing you are the Minotaur.
Speaker 1: Final day: Mykonos. Beaches, windmills, nightlife.
Speaker 2: So this is the "party like Dionysus" portion of the trip?
Speaker 3: Exactly. We'll stroll through Little Venice, where houses sit right on the water. Then by night, the clubs go wild.
Speaker 1: But I want to try kopanisti cheese. Spicy, tangy, totally different from feta.
Speaker 2: I'm noticing a trend: every stop involves cheese.
Speaker 3: Hey, if Greece invented democracy and theater, the least they could do is also perfect dairy.
Speaker 1: So seven days: Athens, Delphi, Meteora, Thessaloniki, Santorini, Crete, Mykonos. History, food, sunsets, parties. Perfect balance.
Speaker 2: And enough cheese to keep me happy for the rest of my life.
Speaker 3: That's the spirit. Greece: come for the ruins, stay for the raki.
Speaker 4: And so, their Greek adventure is set. Laughter, myths, and plenty of flaming cheese await.
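
Step 3 passes `--speaker_names Alice Carter Frank Maya`, so each `Speaker N:` label in the transcript must resolve to one of those names. The following minimal sketch checks the transcript before generation. It assumes that the script assigns names by position, so `Speaker 1` gets the first name, `Speaker 2` the second, and so on. The helper `map_speakers` is hypothetical and not part of VibeVoice.

```python
import re
from pathlib import Path

# Matches transcript lines such as "Speaker 2: Flaming cheese?"
SPEAKER_LINE = re.compile(r"^Speaker\s+(\d+):\s*(.*)$")


def map_speakers(transcript: str, speaker_names: list[str]) -> dict[int, str]:
    """Map each speaker number to a voice name (hypothetical check).

    Assumes positional assignment: Speaker 1 -> speaker_names[0], etc.
    """
    ids = set()
    for line in transcript.splitlines():
        match = SPEAKER_LINE.match(line.strip())
        if match:
            ids.add(int(match.group(1)))
    mapping = {}
    for speaker_id in sorted(ids):
        if not 1 <= speaker_id <= len(speaker_names):
            raise ValueError(f"Speaker {speaker_id} has no entry in speaker_names")
        mapping[speaker_id] = speaker_names[speaker_id - 1]
    return mapping


path = Path("/content/my_transcript.txt")
if path.exists():
    print(map_speakers(path.read_text(encoding="utf-8"), ["Alice", "Carter", "Frank", "Maya"]))
```

Under that assumption, the transcript above uses Speakers 1 through 4. This maps Alice, Carter, and Frank to the three friends named in the narration and gives the narrator lines (Speaker 4) to Maya.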




## Step 3: Generate Audio

In [None]:
# Run Python script to generate audio from transcript
!python /content/VibeVoice/demo/inference_from_file.py \
    --model_path /content/models/VibeVoice-1.5B \
    --txt_path /content/my_transcript.txt \
    --speaker_names Alice Carter Frank Maya

# Display audio controls
from IPython.display import Audio
Audio("/content/outputs/my_transcript_generated.wav")
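
If generation fails, the `Audio(...)` call above only reports a missing file. The following minimal sketch, which uses only the standard library, confirms that the output exists and reports its length before you download it. Python's `wave` module reads only integer PCM WAV files. If the generated file turns out to use a floating-point format, a library such as `soundfile` would be needed instead. The helper `wav_summary` is hypothetical.

```python
import wave
from pathlib import Path


def wav_summary(path: str) -> dict:
    """Return channel count, sample rate, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


out = Path("/content/outputs/my_transcript_generated.wav")
if out.exists():
    print(wav_summary(str(out)))
else:
    print(f"⚠️ {out} not found; check the Step 3 log for errors")
```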


## Step 4: Download Audio

In [None]:
from google.colab import files
files.download("/content/outputs/my_transcript_generated.wav")



## Risks and Limitations

While efforts have been made to optimize the model through various techniques, it may still produce outputs that are unexpected, biased, or inaccurate. VibeVoice inherits any biases, errors, or omissions produced by its base model (specifically, Qwen2.5-1.5B in this release).

Potential for Deepfakes and Disinformation: High-quality synthetic speech can be misused to create convincing fake audio for impersonation, fraud, or spreading disinformation. Users must ensure that transcripts are reliable, check content accuracy, and avoid using generated content in misleading ways. Users are expected to use the generated content and deploy the models lawfully, in full compliance with all applicable laws and regulations in the relevant jurisdictions. It is best practice to disclose the use of AI when sharing AI-generated content.