1. Problem Definition & Objective

Selected Project Track: Module E: AI in Film Editing
Problem Statement: Professional video editing is often a time-consuming and repetitive process, particularly when it comes to manual tasks like transcribing audio for subtitles, aligning text to timestamps, and applying consistent transitions between clips.
Real-world Relevance and Motivation: This project aims to solve these inefficiencies by developing an automated pipeline that uses Artificial Intelligence to analyze video content, suggest creative titles, and generate precisely timed subtitle overlays without human intervention.

2. Data Understanding & Preparation

Dataset Source: Custom Synthetic/User-Uploaded Data.
Data Loading and Exploration: The project processes high-definition MP4 video clips provided by the user to demonstrate automated editing capabilities.
Cleaning and Preprocessing: Each uploaded clip is saved to a temporary file, and its audio stream is extracted from the raw footage for transcription by the Whisper model.

3. Model / System Design

AI Technique Used: NLP and ASR (Automatic Speech Recognition) using the OpenAI Whisper model.

Architecture/Pipeline Explanation: The system is a multi-stage AI pipeline built with a "four-phase" workflow:

Phase 1 (Transcription): Extracts audio and generates precise timestamps for every spoken segment.
Phase 2 (Titling): Analyzes transcript context to programmatically suggest a thematic title.
Phase 3 (Assembly): Uses MoviePy to assemble a sequential "sandwich" of layers (title scene, main clips, closing scene) joined by overlapping transitions.
Phase 4 (Composition): Layers subtitles as TextClip objects synced via a calculated offset.
Justification of Design Choices: A Streamlit frontend gives the user direct control over transition style, duration, and text, while MoviePy's concatenate_videoclips accepts a negative padding value that overlaps consecutive clips so that transitions play smoothly across each cut.
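
The effect of negative padding on the timeline can be worked through with plain arithmetic. The sketch below is a simplified model, not MoviePy's own code: it assumes clip i starts at the sum of the previous durations plus i times the padding, and that the body of the film begins at the title length plus the padding. The names clip_start_times and subtitle_time, and the sample durations, are hypothetical.

```python
def clip_start_times(durations, padding):
    """Start time (seconds) of each clip when joined with a fixed padding.

    A negative padding makes each clip start before the previous one ends,
    which is what lets a cross-fade or slide overlap two clips.
    """
    starts, t = [], 0.0
    for d in durations:
        starts.append(t)
        t += d + padding
    return starts


def subtitle_time(clip_index, local_time, starts, title_len, padding):
    """Map a timestamp inside one clip onto the final film's timeline.

    Assumes the body is placed after a title scene of length title_len,
    joined with the same padding as the clips themselves.
    """
    body_start = title_len + padding
    return body_start + starts[clip_index] + local_time


# Hypothetical example: three 10-second clips, 1-second transitions, 4-second title.
starts = clip_start_times([10.0, 10.0, 10.0], padding=-1.0)
print(starts)                                     # [0.0, 9.0, 18.0]
print(subtitle_time(1, 2.5, starts, 4.0, -1.0))   # 14.5
```

Under these assumptions, a subtitle spoken 2.5 s into the second clip appears at 14.5 s in the final film: each overlap pulls later clips, and their subtitles, earlier by one transition length.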

4. Core Implementation

In [1]:
%%writefile ai_film_editor.py
import streamlit as st
import whisper
import os
from moviepy import VideoFileClip, concatenate_videoclips, TextClip, CompositeVideoClip, ColorClip
import moviepy.video.fx as vfx
from proglog import ProgressBarLogger

# --- CUSTOM PROGRESS LOGGER ---
class StreamlitProgressLogger(ProgressBarLogger):
    def __init__(self, streamlit_progress_bar, progress_text):
        super().__init__()
        self.bar = streamlit_progress_bar
        self.text_el = progress_text

    def bars_callback(self, bar, attr, value, old_value=None):
        # Only react to progress updates, not to changes of 'total' or 'message'
        total = self.bars[bar].get('total')
        if attr != 'index' or not total:
            return
        # Clamp the fraction to Streamlit's 0.0 - 1.0 range
        percentage = min(max(value / total, 0.0), 1.0)
        self.bar.progress(percentage)
        self.text_el.text(f"Rendering: {int(percentage * 100)}% complete")

# --- UI CONFIGURATION ---
st.set_page_config(page_title="Pro AI Film Studio", layout="wide")
st.title("🎬 AI Film Studio: Transitions, Title and Subtitles")

with st.sidebar:
    st.header("1. Transition Settings")
    trans_style = st.selectbox("Style", ["CrossFade", "Slide Left", "Slide Right", "Fade Through Black"])
    trans_duration = st.slider("Duration (s)", 0.5, 3.0, 1.0)
    
    st.header("2. Text Settings")
    user_custom_title = st.text_input("Custom Title")
    closing_text = st.text_input("Closing Message", "Thank You For Watching")

uploaded_files = st.file_uploader("Upload Clips", type=["mp4"], accept_multiple_files=True)

# --- CORE ENGINE ---
def create_safe_text_clip(text, dur, width, height, font_size=50, pos='center'):
    # Fix for text cutting: smaller box_width and auto-height
    box_width = int(width * 0.65) 
    return TextClip(
        text=text, font_size=font_size, color='white', 
        method='caption', size=(box_width, None), text_align='center'
    ).with_duration(dur).with_position(pos)

def process_video():
    # Initialize Progress Bar
    prog_text = st.empty()
    prog_bar = st.progress(0.0)
    logger = StreamlitProgressLogger(prog_bar, prog_text)

    temp_paths = []
    for i, file in enumerate(uploaded_files):
        path = f"temp_{i}.mp4"
        with open(path, "wb") as f:
            f.write(file.read())
        temp_paths.append(path)
    
    main_clips = [VideoFileClip(p) for p in temp_paths]
    w, h = main_clips[0].size

    # Pass 1: Full Transcription
    model = whisper.load_model("base")
    full_text = ""
    all_segments = []
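    current_offset = 0
    # Clips overlap by the transition length unless fading through black
    overlap = 0 if trans_style == "Fade Through Black" else trans_duration
    for path, clip in zip(temp_paths, main_clips):
        res = model.transcribe(path)
        full_text += " " + res['text']
        for s in res['segments']:
            s['start'] += current_offset
            s['end'] += current_offset
            all_segments.append(s)
        # Reuse the already-open clip instead of opening the file again
        current_offset += clip.duration - overlap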

    final_title = user_custom_title if user_custom_title else ("THEME: " + " ".join(full_text.split()[:4]).upper())

    # Pass 2: Assembly
    title_scene = CompositeVideoClip([ColorClip(size=(w, h), color=(0,0,0)).with_duration(4),
                                     create_safe_text_clip(final_title, 4, w, h, font_size=70)])
    
    end_scene = CompositeVideoClip([ColorClip(size=(w, h), color=(0,0,0)).with_duration(4),
                                   create_safe_text_clip(closing_text, 4, w, h)])

    # Setup Transitions
    effect_map = {
        "CrossFade": vfx.CrossFadeIn(trans_duration),
        "Slide Left": vfx.SlideIn(trans_duration, side="left"),
        "Slide Right": vfx.SlideIn(trans_duration, side="right")
    }
    
    processed_main = [main_clips[0]]
    for c in main_clips[1:]:
        if trans_style == "Fade Through Black":
            processed_main.append(c.with_effects([vfx.FadeIn(trans_duration), vfx.FadeOut(trans_duration)]))
        else:
            processed_main.append(c.with_effects([effect_map[trans_style]]))

    padding = -trans_duration if "Fade Through Black" not in trans_style else 0
    body = concatenate_videoclips(processed_main, padding=padding, method="compose")

    full_movie = concatenate_videoclips([
        title_scene, 
        body.with_effects([effect_map[trans_style]]) if "Fade Through Black" not in trans_style else body,
        end_scene.with_effects([effect_map[trans_style]]) if "Fade Through Black" not in trans_style else end_scene
    ], padding=padding, method="compose")

    # Subtitles: Padded at bottom (h - 150) to avoid cutting
    movie_start = 4 + padding
    subs = [create_safe_text_clip(s['text'], s['end']-s['start'], w, h, font_size=32, pos=('center', h - 150))
            .with_start(s['start'] + movie_start) for s in all_segments]

    final_prod = CompositeVideoClip([full_movie] + subs)
    
    # Render with custom logger
    final_prod.write_videofile("final_output.mp4", codec="libx264", audio_codec="aac", fps=24, logger=logger)
    st.success("Rendering Complete!")
    return "final_output.mp4"

if st.button("🎬 Render Final Film") and uploaded_files:
    out = process_video()
    st.video(out)

Writing ai_film_editor.py


In [None]:
!streamlit run ai_film_editor.py
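
The titling step in process_video uses the custom title if one is given and otherwise builds a title from the first four words of the transcript. Isolated as a standalone function, that logic might look like the sketch below. The function name suggest_title, the whitespace stripping, and the "UNTITLED" fallback for empty transcripts are illustrative assumptions and are not part of the original script.

```python
def suggest_title(transcript, custom_title="", max_words=4):
    """Return the user's title if given, else a fallback built from the transcript."""
    if custom_title.strip():
        return custom_title.strip()
    words = transcript.split()[:max_words]
    if not words:
        # Hypothetical fallback for footage with no detected speech
        return "UNTITLED"
    return "THEME: " + " ".join(words).upper()


print(suggest_title(" welcome to the harbor at dawn"))  # THEME: WELCOME TO THE HARBOR
print(suggest_title("anything at all", "My Film"))      # My Film
```

Keeping this logic in its own function makes it easy to swap the first-four-words heuristic for something richer later without touching the rendering code.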


5. Evaluation & Analysis

Quantitative Metrics: The pipeline reduced the time required for basic subtitling and transition application from hours of manual work to minutes of automated rendering.
Qualitative Analysis: By rendering text with method='caption' inside a box 65% of the frame width, the system wraps long lines and prevents "text cutting" at the screen edges.
Performance Analysis: The integration of a custom ProgressBarLogger (Proglog) provided essential real-time feedback, addressing the transparency issues often found in automated rendering scripts.

6. Ethical Considerations & Responsible AI

Bias and Fairness: The OpenAI Whisper model's performance was evaluated across diverse accents to check that transcription quality is equitable.
Responsible Use of AI: All AI-generated outputs (transcripts and titles) are manually reviewed and verified to ensure accuracy and alignment with project standards.
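
One common way to quantify the accent evaluation described under Bias and Fairness is word error rate (WER), compared per accent group. The report does not say which metric was used, so WER is an assumption here, as are the function names, the availability of human reference transcripts, and the sample sentences, which are made up for illustration and are not measured results.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def wer_by_group(samples):
    """Average WER per group; samples are (group, reference, hypothesis) tuples."""
    scores = {}
    for group, ref, hyp in samples:
        scores.setdefault(group, []).append(word_error_rate(ref, hyp))
    return {g: sum(v) / len(v) for g, v in scores.items()}


# Made-up sample data purely for illustration.
samples = [
    ("accent_a", "the quick brown fox", "the quick brown fox"),
    ("accent_b", "the quick brown fox", "the quick brown box"),
]
print(wer_by_group(samples))  # {'accent_a': 0.0, 'accent_b': 0.25}
```

A large gap between groups would indicate that subtitles for some speakers need closer manual review.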

7. Conclusion & Future Scope

Summary: Successfully implemented a functional AI-driven film editor that automates subtitle generation, cinematic transitions, and thematic titling.
Possible Improvements:
AI Auto-Trimming: Integrate silence detection to automatically remove filler words and pauses.
Multilingual Expansion: Implement AI translation for global subtitle support.
Advanced Visual Tracking: Add facial recognition for "auto-zooming" on active speakers.

References & AI Usage Disclosure

Whisper Model: OpenAI Whisper.
MoviePy.
AI Disclosure: Large Language Models (LLMs) were utilized during development to debug complex rendering issues and refine documentation structure.