<a href="https://colab.research.google.com/github/yellomello/Python_Practice-Basics/blob/main/Teach_O_Matic.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

[![Teach-O-Matic Channel](https://user-images.githubusercontent.com/14149230/236333210-36ad530d-af79-402c-8297-2e4eaf8b678e.png)](https://www.youtube.com/channel/UCtvf4LKQpFkrUHJ8o3bI8DA)


# 📚 Teach-O-Matic

In this notebook, we'll create instructional how-to videos on any topic, using LangChain, OpenAI, and Replicate. We created a YouTube channel featuring outputs from this notebook so you can get a taste of what you can create:

> 🍿 Watch [How to Make Spaghetti (By Shakespeare)](https://www.youtube.com/watch?v=mZoGDUckhOE&utm_source=project&utm_campaign=teachomatic) (1min)


To make your own videos, keep running the cells in the notebook!

##### Table of Contents

>[📚 Teach-O-Matic](#scrollTo=tcRrbtmSkoVR)

>>>>>[Table of Contents](#scrollTo=-FylA4F5cixH)

>[🌐 Links](#scrollTo=2LUvZTDzE-4h)

>[💾 Install Stuff](#scrollTo=OniyJ8XRWQbY)

>[⛓️ Create our LangChain](#scrollTo=Ou-40KJur3ZI)

>[❔ What should the video be about?](#scrollTo=ynqnG0SR3nwW)

>[👟 Run the Chain!](#scrollTo=LyQwZW-24M10)

>[⏳ Wait for our async predictions to complete](#scrollTo=Znto050hQCjM)

>[🪡 Stitch them all together!](#scrollTo=wkvtnprozZyQ)

>[🍿 Watch the video](#scrollTo=YWwsGXaVhgwF)

>[🧠 Things we learned:](#scrollTo=or_M30Ck7Hxb)



# 🌐 Links

- [LangChain](https://python.langchain.com/en/latest?utm_source=project&utm_campaign=teachomatic), for chaining together our model requests.
- [Replicate](https://replicate.com?utm_source=project&utm_campaign=teachomatic), for running:
  - [suno-ai/bark](https://github.com/suno-ai/bark?utm_source=project&utm_campaign=teachomatic) for creating the narrator
  - [damo-text-to-video](https://replicate.com/cjwbw/damo-text-to-video?utm_source=project&utm_campaign=teachomatic), to create the videos
  - [stable diffusion](https://replicate.com/stability-ai/stable-diffusion?utm_source=project&utm_campaign=teachomatic), for creating the title/ending screen image
  - [riffusion](https://replicate.com/riffusion/riffusion?utm_source=project&utm_campaign=teachomatic), for creating the background music
- [OpenAI Docs](https://beta.openai.com) for creating the script + video descriptions
- [MoviePy](https://zulko.github.io/moviepy/) for stitching the video and audio together
- [Teach-O-Matic YouTube Channel](https://www.youtube.com/@teach-o-matic?utm_source=project&utm_campaign=teachomatic). Tweet us [@replicatehq](https://twitter.com/replicatehq?utm_source=project&utm_campaign=teachomatic) and we'll add your how-to video to the channel!
- [GitHub Repo](https://github.com/cbh123/teach-o-matic?utm_source=project&utm_campaign=teachomatic)

# 💾 Install Stuff

In [3]:
!pip install replicate
!pip install requests
!pip install openai
!pip install langchain
!pip install moviepy
from google.colab import output
output.clear()

In [None]:
import os
import openai
import time
import numpy as np
import replicate

In [None]:
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, SimpleSequentialChain, SequentialChain, TransformChain
from langchain.llms import OpenAI, Replicate
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    AIMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)


Get your replicate token from [https://replicate.com/account](https://replicate.com/account?utm_source=project&utm_campaign=teachomatic)

In [None]:
# get your token from https://replicate.com/account
from getpass import getpass

REPLICATE_API_TOKEN = getpass()
os.environ["REPLICATE_API_TOKEN"] = REPLICATE_API_TOKEN



In [None]:
# get your key from https://platform.openai.com/account/api-keys
OPENAI_API_KEY = getpass()
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
openai.api_key = os.getenv("OPENAI_API_KEY")



# ⛓️ Create our LangChain
First let's generate the script, video descriptions, and title for our how-to video. [LangChain](https://langchain.readthedocs.io/) makes this easy, because we can separate each LLM call into a composable chain.

In this step we want to accomplish three things:
1. a script that our narrator ([Bark](https://replicate.com/suno-ai/bark)) will read
2. a title for our video
3. visual descriptions for our script that we run in the [text-to-video](https://replicate.com/cjwbw/damo-text-to-video) model

Down the road, we may want to add or change our outputs. Changing our outputs is easy with LangChain: all it requires is adding or updating an LLMChain.
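
To make the idea of composable steps concrete, here is a minimal plain-Python sketch of how a sequence of steps can share named outputs. It is an illustration only, not LangChain's implementation: `run_steps`, `fake_script_step`, and `fake_title_step` are hypothetical names, and the fake steps build strings instead of calling a model.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a chain step: it receives every value gathered
# so far and returns new key/value pairs. This is NOT LangChain's code.
Step = Callable[[Dict[str, str]], Dict[str, str]]


def run_steps(steps: List[Step], inputs: Dict[str, str]) -> Dict[str, str]:
    """Run each step in order, merging its outputs into a shared dict."""
    values = dict(inputs)
    for step in steps:
        values.update(step(values))
    return values


def fake_script_step(values: Dict[str, str]) -> Dict[str, str]:
    # A real step would call an LLM; this one builds a placeholder string.
    return {"script": f"Instructions for {values['topic']}"}


def fake_title_step(values: Dict[str, str]) -> Dict[str, str]:
    # Reads the 'script' key written by the previous step.
    return {"title": values["script"].upper()}


result = run_steps([fake_script_step, fake_title_step], {"topic": "how to boil an egg"})
print(result["title"])
```

Adding a new output in this sketch means appending one more step that reads the keys it needs and writes its own key, which mirrors why adding or updating a chain is a small change.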

In [None]:
# LLMChain to write a script for our how-to video.
topic_template = """
Write me the instructions for {topic}.

Rules
- The output should be 6 paragraphs, each no more than 20 words.
- Split each output with a \n\n
- Omit the word 'paragraph'.
- The first paragraph should introduce some background on why the problem is worth solving.

You are a {narrator_adjectives} narrator.
"""

system_message_prompt = SystemMessage(content="You are a helpful assistant that enthusiastically teaches people new topics.")
human_message_prompt = HumanMessagePromptTemplate(prompt=PromptTemplate(
                                                  template=topic_template,
                                                  input_variables=["topic", "narrator_adjectives"]))

# create the initial script
chat = ChatOpenAI(temperature=0.9, model_name="gpt-4")
chat_prompt_template = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])
script_chain = LLMChain(llm=chat, prompt=chat_prompt_template, output_key='script')

In [None]:
# LLMChain to write a title for our video
llm = OpenAI(temperature=.9)
template = """Please come up with a creative and zany title for the below how-to video script.
Puns are encouraged. Don't include quotations (") in the output.

Script:
{script}
Title: """
prompt_template = PromptTemplate(input_variables=["script"], template=template)
title_chain = LLMChain(llm=llm, prompt=prompt_template, output_key='title')

In [1]:
# LLMChain to write descriptions of each video that we'll create using a text-to-video model.
video_template = """
          Create a visual description for each of the 6 following paragraphs. Keep each one as concrete as possible.

          Rules
          - Each description should be no more than 20 words.
          - The output MUST be very specific and concrete and include a single action.
          - Do not include numbers in the output.
          - Split each output with a \n\n
          - Include common household objects

          Example input:
          Driving a car provides independence and convenience.\n\nAdjust seat, mirrors, and steering wheel for comfort.\n\nFamiliarize yourself with pedals, gearshift, and dashboard controls.\n\nStart the car, press brake, and shift gears.\n\nObserve traffic rules, signals, and road signs attentively.\n\nPractice regularly to build confidence and improve skills.

          Example output:
          Happy driver driving a car.\n\nDriver adjusting seat and mirrors in car.\n\nDriver studying pedals, gearshift, and dashboard.\n\nStarting car, pressing brake, and shifting gears.\n\nDriver obeying traffic rules and road signs.\n\nCar driving on road.

          "{script}"
          """

system_message_prompt = SystemMessage(content="You are a helpful assistant that is helping us create visual depictions of text. Sometimes it's important to see the person doing the action. For example, cooking spaghetti may involve a chef. Fixing a car may involve a mechanic.")
human_message_prompt = HumanMessagePromptTemplate(prompt=PromptTemplate(
                                                  template=video_template,
                                                  input_variables=["script"]
                                                  ))

# create the video descriptions
chat_prompt_template = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])
video_descriptions_chain = LLMChain(llm=chat, prompt=chat_prompt_template, output_key='video_descriptions')


In [None]:
# let's define a new output parser!
from langchain.output_parsers.list import ListOutputParser
from typing import List

class DoubleNewlineOutputParser(ListOutputParser):
  """Parse out \n\n separated lists."""
  def get_format_instructions(self) -> str:
      return (
          "Your response should be a list of \n\n separated values, "
          "eg: `foo\n\nbar\n\nbaz`"
      )

  def parse(self, text: str) -> List[str]:
        """Parse the output of an LLM call."""
        return text.strip().split("\n\n")
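
Model output does not always use exactly two newlines between paragraphs, so a plain split on that separator can leave empty or whitespace-padded entries. The following is a minimal sketch of a more forgiving splitter, assuming paragraphs are separated by one or more blank lines; `split_paragraphs` is a hypothetical helper, not part of LangChain.

```python
import re
from typing import List


def split_paragraphs(text: str) -> List[str]:
    """Split on one or more blank lines and drop empty pieces."""
    pieces = re.split(r"\n\s*\n", text.strip())
    return [piece.strip() for piece in pieces if piece.strip()]


# The third paragraph is preceded by an extra newline on purpose.
sample = "Fill the pot.\n\nBoil the water.\n\n\nAdd the pasta.\n"
print(split_paragraphs(sample))
```

The same logic could live inside the `parse` method of the parser class if the stricter split turns out to be too brittle.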

In [None]:
# TransformChain to create the replicate predictions for our text-to-video model
def transform_func(inputs: dict) -> dict:
  video_model = replicate.models.get('cjwbw/damo-text-to-video')
  video_version = video_model.versions.get("1e205ea73084bd17a0a3b43396e49ba0d6bc2e754e9283b2df49fad2dcf95755")
  descriptions = DoubleNewlineOutputParser().parse(inputs['video_descriptions'])

  predictions = []

  for description in descriptions:
      print(f"Creating video prediction for '{description}'...")
      video_prediction = replicate.predictions.create(version=video_version,
                                                      input={"prompt": description, "num_frames": 50, "fps": 10}) # 5s videos
      predictions.append(video_prediction)
  return {'video_predictions': predictions}

video_predictions_chain = TransformChain(input_variables=['video_descriptions'], output_variables=['video_predictions'], transform=transform_func)

In [None]:
# TransformChain to create the replicate predictions for our bark model
def transform_func(inputs: dict) -> dict:
  audio_model = replicate.models.get("suno-ai/bark")
  audio_version = audio_model.versions.get("b76242b40d67c76ab6742e987628a2a9ac019e11d56ab96c4e91ce03b79b2787")
  parsed_script = DoubleNewlineOutputParser().parse(inputs['script'])

  predictions = []

  for line in parsed_script:
      print(f"Creating audio prediction for '{line}'...")
      audio_prediction = replicate.predictions.create(version=audio_version,
                                                      input={"prompt": line, "history_prompt": "announcer"})
      predictions.append(audio_prediction)
  return {'audio_predictions': predictions}

audio_predictions_chain = TransformChain(input_variables=['script'], output_variables=['audio_predictions'], transform=transform_func)

In [None]:
# LLMChain to create the cover image
llm = OpenAI(temperature=.9)
template = """
          Create a visual description for the following script.
          "{script}"
          """
prompt_template = PromptTemplate(input_variables=["script"], template=template)
text2image = Replicate(model="stability-ai/stable-diffusion:db21e45d3f7023abc2a46ee38a23973f6dce16bb082a930b0c49861f96d1e5bf",
                       input={'image_dimensions': '512x512', "negative_prompt": "text, writing"})
title_image_chain = LLMChain(llm=text2image, prompt=prompt_template, output_key='title_image')

In [None]:
# LLMChain to write the thank you note at the end of our video
template = """Please come up with a creative and zany ending quote from our narrator.
The script is what the narrator just read. We want to close things out.

Make sure you add a "And don't forget to like and subscribe!" to the end of your output.

You are a {narrator_adjectives} narrator.

Script:
{script}
Ending quote:
"""

system_message_prompt = SystemMessage(content="You are a helpful assistant.")
human_message_prompt = HumanMessagePromptTemplate(prompt=PromptTemplate(
                                                  template=template,
                                                  input_variables=["script", "narrator_adjectives"]))
chat_prompt_template = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])
ending_quote_chain = LLMChain(llm=chat, prompt=chat_prompt_template, output_key='ending_quote')

In [None]:
# TransformChain to create the prediction that generates the audio for the thank you note
def transform_func(inputs: dict) -> dict:
  audio_model = replicate.models.get("suno-ai/bark")
  audio_version = audio_model.versions.get("b76242b40d67c76ab6742e987628a2a9ac019e11d56ab96c4e91ce03b79b2787")
  ending_quote_prediction = replicate.predictions.create(version=audio_version,
                                                      input={"prompt": inputs['ending_quote'], "history_prompt": "announcer"})
  return {'ending_quote_prediction': ending_quote_prediction}

ending_quote_prediction_chain = TransformChain(input_variables=['ending_quote'], output_variables=['ending_quote_prediction'], transform=transform_func)


In [None]:
# TransformChain to create the prediction that generates the audio for the title
def transform_func(inputs: dict) -> dict:
  audio_model = replicate.models.get("suno-ai/bark")
  audio_version = audio_model.versions.get("b76242b40d67c76ab6742e987628a2a9ac019e11d56ab96c4e91ce03b79b2787")
  title_prediction = replicate.predictions.create(version=audio_version,
                                                      input={"prompt": inputs['title'], "history_prompt": "announcer"})
  return {'title_audio_prediction': title_prediction}

title_audio_prediction_chain = TransformChain(input_variables=['title'], output_variables=['title_audio_prediction'], transform=transform_func)


In [None]:
# TransformChain to create the prediction that generates the background music
def transform_func(inputs: dict) -> dict:
  model = replicate.models.get("riffusion/riffusion")
  version = model.versions.get("8cf61ea6c56afd61d8f5b9ffd14d7c216c0a93844ce2d82ac1c9ecc9c7f24e05")
  music_prediction = replicate.predictions.create(version=version, input={"prompt": inputs['music_style']})

  return {'music_prediction': music_prediction}

music_prediction_chain = TransformChain(input_variables=['music_style'], output_variables=['music_prediction'], transform=transform_func)

# ❔ What should the video be about?
> ⬇️ Here's the fun part. Every time we make a new video, we'll want to change the variables below and re-run from here.

In [None]:
TOPIC = "how to avoid bear attacks when camping in the wilderness" # make sure it has "how to _"
NARRATOR_ADJECTIVES = "park ranger"
MUSIC_STYLE = "campy 1960s advertisement"
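
Since the script prompt expects the topic to contain "how to", a small guard can catch a mistyped topic before any API calls are made. This is a minimal sketch; `check_topic` is a hypothetical helper and the exact rule (a case-insensitive substring check) is an assumption.

```python
def check_topic(topic: str) -> str:
    """Return the topic without surrounding whitespace, or raise if it lacks 'how to'."""
    cleaned = topic.strip()
    if "how to " not in cleaned.lower():
        raise ValueError(f"Topic should contain 'how to ...', got: {cleaned!r}")
    return cleaned


print(check_topic("how to avoid bear attacks when camping in the wilderness"))
```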

# 👟 Run the Chain!
Now, let's execute the chain we created. This is relatively fast, because the chains that create long-running predictions (like the video_predictions_chain) make asynchronous calls to the Replicate API.

In [None]:
# Run the chain
overall_chain = SequentialChain(chains=[script_chain,
                                        title_chain,
                                        video_descriptions_chain,
                                        video_predictions_chain,
                                        audio_predictions_chain,
                                        ending_quote_chain,
                                        ending_quote_prediction_chain,
                                        title_image_chain,
                                        title_audio_prediction_chain,
                                        music_prediction_chain
                                        ], input_variables=['topic', 'narrator_adjectives', 'music_style'], output_variables=['script', 'video_descriptions', 'title', 'video_predictions', 'audio_predictions', 'ending_quote', 'title_image', 'ending_quote_prediction', 'title_audio_prediction', 'music_prediction'], verbose=True)
chain_output = overall_chain({"topic": TOPIC, "narrator_adjectives": NARRATOR_ADJECTIVES, "music_style": MUSIC_STYLE})


In [None]:
# unpack outputs
title = chain_output['title']
script = chain_output['script']
split_script = script.split('\n\n')
video_descriptions = chain_output['video_descriptions'].split('\n\n')
video_predictions = chain_output['video_predictions']
audio_predictions = chain_output['audio_predictions']

> 🚨 This sanity check can sometimes fail. If it does, you only need to re-run the prior two blocks (Run the Chain! onwards)

In [None]:
# sanity check
assert len(split_script) == 6
assert len(video_descriptions) == 6
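
Instead of re-running cells by hand when the check fails, the retry could be automated. The sketch below assumes a hypothetical zero-argument `run_chain` callable that returns a dict with `script` and `video_descriptions` keys; here a fake chain stands in for the real one so the example runs offline.

```python
from typing import Callable, Dict


def run_until_valid(run_chain: Callable[[], Dict[str, str]],
                    expected: int = 6,
                    max_attempts: int = 3) -> Dict[str, str]:
    """Call run_chain until both outputs split into `expected` paragraphs."""
    for attempt in range(1, max_attempts + 1):
        output = run_chain()
        script_parts = output["script"].strip().split("\n\n")
        video_parts = output["video_descriptions"].strip().split("\n\n")
        if len(script_parts) == expected and len(video_parts) == expected:
            return output
        print(f"Attempt {attempt}: got {len(script_parts)} script and "
              f"{len(video_parts)} description paragraphs, retrying...")
    raise RuntimeError(f"No valid output after {max_attempts} attempts")


calls = {"count": 0}


def fake_chain() -> Dict[str, str]:
    # Hypothetical stand-in: returns 5 paragraphs on the first call, 6 afterwards.
    calls["count"] += 1
    n = 5 if calls["count"] == 1 else 6
    text = "\n\n".join(f"Step {i}" for i in range(n))
    return {"script": text, "video_descriptions": text}


output = run_until_valid(fake_chain)
print(calls["count"])
```

Because the chain in this notebook also creates Replicate predictions, every retry starts new predictions, so keeping `max_attempts` small seems sensible.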

# ⏳ Wait for our async predictions to complete
Here's a helper to check in on our predictions. This usually takes a minute or two.

In [None]:
def all_done(predictions):
    return set([p.status for p in predictions]) == {'succeeded'}

In [None]:
all_predictions = chain_output['video_predictions'] + \
                  chain_output['audio_predictions'] + \
                  [chain_output['ending_quote_prediction']] + \
                  [chain_output['title_audio_prediction']] + \
                  [chain_output['music_prediction']]

In [None]:
done = False

while not done:
  for p in all_predictions:
    p.reload()
    print(f'https://replicate.com/p/{p.id}', p.status)
  done = all_done(all_predictions)
  time.sleep(2)
  output.clear()

print("Predictions complete")
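
The loop above only exits once every prediction has succeeded, so a single failed prediction would keep it spinning. Below is a minimal sketch of a polling helper that also stops on failure or after a timeout. It assumes predictions expose a `status` attribute and a `reload()` method as used above, and that `failed` and `canceled` are the unsuccessful end states; `FakePrediction` and `wait_for_all` are hypothetical names used so the example runs without the Replicate API.

```python
import time
from typing import List

# Assumed unsuccessful end states for a prediction.
FAILURE_STATUSES = {"failed", "canceled"}


class FakePrediction:
    """Hypothetical stand-in for a Replicate prediction, for illustration only."""

    def __init__(self, statuses: List[str]):
        self._statuses = list(statuses)
        self.status = self._statuses.pop(0)

    def reload(self) -> None:
        # Advance to the next scripted status, if any remain.
        if self._statuses:
            self.status = self._statuses.pop(0)


def wait_for_all(predictions, poll_seconds: float = 2.0,
                 timeout_seconds: float = 600.0) -> None:
    """Poll until every prediction succeeds; raise on failure or timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        for p in predictions:
            p.reload()
        failed = [p for p in predictions if p.status in FAILURE_STATUSES]
        if failed:
            raise RuntimeError(f"{len(failed)} prediction(s) did not succeed")
        if all(p.status == "succeeded" for p in predictions):
            return
        if time.monotonic() > deadline:
            raise TimeoutError("Predictions did not finish in time")
        time.sleep(poll_seconds)


ok = [FakePrediction(["starting", "processing", "succeeded"]),
      FakePrediction(["processing", "succeeded"])]
wait_for_all(ok, poll_seconds=0.01)
print([p.status for p in ok])
```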

# 🪡 Stitch them all together!

In [None]:
video_urls = [v.output for v in video_predictions]
audio_urls = [a.output['audio_out'] for a in audio_predictions]
music_url = chain_output['music_prediction'].output['audio']
subtitles = split_script
title_image_url = chain_output['title_image']
title_audio_url = chain_output['title_audio_prediction'].output['audio_out']

Thank you to GPT-4 for this code. This scary looking block stitches together our video and narrator content, and adds background music.

In [None]:
import os
import requests
import numpy as np
import moviepy.editor as mp
import moviepy.audio.fx.all as afx  # audio effects, used for audio_loop below
from moviepy.editor import AudioFileClip, CompositeAudioClip, ImageClip, concatenate_videoclips
from PIL import Image, ImageDraw
from io import BytesIO


# Download video and audio files
video_files = []
audio_files = []
for i, url in enumerate(video_urls):
    response = requests.get(url)
    video_filename = f"temp_video{i}.mp4"
    with open(video_filename, "wb") as video_file:
        video_file.write(response.content)
    video_files.append(video_filename)

for i, url in enumerate(audio_urls):
    response = requests.get(url)
    audio_filename = f"temp_audio{i}.mp3"
    with open(audio_filename, "wb") as audio_file:
        audio_file.write(response.content)
    audio_files.append(audio_filename)

# Load and process video and audio files
processed_videos = []
for i, audio_file in enumerate(audio_files):
    video = mp.VideoFileClip(video_files[i])
    audio = mp.AudioFileClip(audio_file)

    # Loop the video for the duration of the audio
    looped_video = mp.concatenate_videoclips([video] * int(audio.duration // video.duration + 1))

    # Set the audio of the video to the audio file
    video_with_audio = looped_video.set_audio(audio)
    processed_videos.append(video_with_audio)

# Concatenate all the processed videos
final_video = mp.concatenate_videoclips(processed_videos)

## The following adds the title image / narration to the video.
# Add this function to create the text image
def txt_image(img, txt, font_size, color):
    image = img.copy()
    draw = ImageDraw.Draw(image)
    draw.text((50, 50), txt, fill=(255, 255, 0))
    # font = ImageFont.load_default().font_variant(size=font_size)
    # draw.text((50, 50), txt, font=font, fill=color)
    return image

# Download and create the image clip
image_url = title_image_url
response = requests.get(image_url)
img = Image.open(BytesIO(response.content))

# Resize the image to match the video dimensions
img_resized = img.resize((1200, 900))

# Download the audio file
audio_url = chain_output['ending_quote_prediction'].output['audio_out']
response = requests.get(audio_url)
with open("temp_audio_ending.mp3", "wb") as audio_file:
    audio_file.write(response.content)

# Create the audio clip
audio_ending = AudioFileClip("temp_audio_ending.mp3")

# make title empty for now, couldn't figure out how to get it bigger
text = chain_output['title']
img_text = ImageClip(np.asarray(txt_image(img_resized, txt="", font_size=48, color="white")), duration=4)

# Hold the title image for the full length of the ending narration and attach the audio
img_text_audio_ending = img_text.set_duration(audio_ending.duration).set_audio(audio_ending)

# Download the title page audio file
audio_url = chain_output['title_audio_prediction'].output['audio_out']
response = requests.get(audio_url)
with open("temp_audio_title.mp3", "wb") as audio_file:
    audio_file.write(response.content)

# Create the audio clip
audio_beginning = AudioFileClip("temp_audio_title.mp3")

# Hold the title image for the full length of the title narration and attach the audio
img_text_audio_beginning = img_text.set_duration(audio_beginning.duration).set_audio(audio_beginning)

# Concatenate the image clip with the processed videos
width, height = processed_videos[0].size
title_video = img_text_audio_beginning.resize((width, height))
ending_video = img_text_audio_ending.resize((width, height))

processed_videos.insert(0, title_video)
processed_videos.append(ending_video)

final_video = concatenate_videoclips(processed_videos)

# Download the background audio file
bg_audio_url = music_url
response = requests.get(bg_audio_url)
with open("temp_bg_audio.mp3", "wb") as audio_file:
    audio_file.write(response.content)

# Create the background audio clip
bg_audio = AudioFileClip("temp_bg_audio.mp3")

# Calculate the duration of the final video
video_duration = final_video.duration

# Loop the background audio to match the final video's duration
bg_audio_looped = bg_audio.fx(afx.audio_loop, duration=video_duration)
bg_audio_looped = bg_audio_looped.volumex(0.5)

# Overlay the background audio with the audio from the final video
final_audio = CompositeAudioClip([final_video.audio, bg_audio_looped])

# Set the audio of the final video to the combined audio
final_video_with_bg_audio = final_video.set_audio(final_audio)

# Save the final video
final_video_with_bg_audio.write_videofile(f"how_to.mp4", codec='libx264', audio_codec='aac')

# Clean up temporary files, including the title, ending, and background audio
temp_files = video_files + audio_files + ["temp_audio_ending.mp3", "temp_audio_title.mp3", "temp_bg_audio.mp3"]
for temp_file in temp_files:
    os.remove(temp_file)
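
The stitching cell repeats each video enough times to cover its narration. The arithmetic behind that repeat count can be checked in isolation. This is a sketch with hypothetical helper names: `notebook_loop_count` mirrors the expression used in the stitching cell above, and `loop_count` is an alternative based on `math.ceil`.

```python
import math


def loop_count(audio_seconds: float, video_seconds: float) -> int:
    """Smallest number of video copies that covers the audio."""
    if video_seconds <= 0:
        raise ValueError("video_seconds must be positive")
    return max(1, math.ceil(audio_seconds / video_seconds))


def notebook_loop_count(audio_seconds: float, video_seconds: float) -> int:
    """The expression used in the stitching cell above."""
    return int(audio_seconds // video_seconds + 1)


# Compare both expressions for a 5-second video clip.
for audio_seconds in (3.0, 12.3, 10.0):
    print(audio_seconds,
          notebook_loop_count(audio_seconds, 5.0),
          loop_count(audio_seconds, 5.0))
```

With a 5-second clip the two expressions agree for 3.0 s and 12.3 s of audio, and differ only when the audio length is an exact multiple of the clip length (10.0 s here), where the notebook's expression adds one extra copy. In either case the looped video can run longer than the narration, so trimming the looped clip to the audio's duration may be worth considering.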

# 🍿 Watch the video

In [None]:
from IPython.display import HTML
from base64 import b64encode
mp4 = open(f'how_to.mp4','rb').read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML("""
<video width=400 controls>
      <source src="%s" type="video/mp4">
</video>
""" % data_url)

# 🧠 Things we learned:
- It's hard to get the open source base LLMs like dolly/stable-lm to respond with structured text (like a consistent 3 paragraph response)
- You can use SequentialChain in LangChain to handle multiple inputs and outputs
- You can use a TransformChain to run arbitrary functions in your LangChain. This is how I'm running multiple async Replicate calls
- Sometimes the damo/text-to-video output is fuzzy, especially when the video description is vague, so it's really important that our LLM produces a concrete description of what the video should show! It would be really cool if our LLM was multi-modal so we could feed back the video output and say "if this is fuzzy can you make the description more concrete and try again?"
- GPT-4 was really helpful for stitching all the clips together! The final product is combining images, videos, narration clips, and background music (at 50% volume), and it all sort of works.
