# Video Pair Clip Generator

This notebook processes videos in a directory where files come in **pairs** with names like:

- `running_1.mp4` and `running_2.mp4`
- `swimming_1.MOV` and `swimming_2.MOV`

For each pair, it automatically:

- Finds all valid `_1` / `_2` pairs in the directory.
- From the `_1` video, creates:
  - `*_clip1` → first 60 seconds
  - `*_clip2` → a 60-second clip from the middle
  - `*_clip3` → last 60 seconds
- From the `_2` video, creates:
  - `*_clip4` → a 60-second clip from the middle
  - `*_clip5` → last 60 seconds

All new clips are resized and center-cropped to a 1080 × 1080 square, then saved to the **same directory** as the original videos with the same file extension.

If a video is shorter than 60 seconds, the clip will simply be as long as the video allows.
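
As a rough illustration of the naming rule, the sketch below derives an output path with `pathlib`. The helper name `clip_output_path` is hypothetical and only mirrors the "stem + suffix + original extension" scheme described above.

```python
from pathlib import Path


def clip_output_path(input_path: Path, suffix: str) -> Path:
    """Hypothetical helper: same folder, same extension, suffix appended to the stem."""
    return input_path.with_name(f"{input_path.stem}{suffix}{input_path.suffix}")


# The extension (including its original casing) is carried over unchanged.
print(clip_output_path(Path("videos/running_1.mp4"), "_clip1").name)
print(clip_output_path(Path("videos/swimming_2.MOV"), "_clip5").name)
```

Because the suffix is appended to the stem, each clip sits next to its source video and sorts alongside it in a file listing.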

We will use:

- **`pathlib`**: to handle file paths in a clean, OS-independent way.
- **`re` (regular expressions)**: to detect file name patterns like `prefix_1.ext` and `prefix_2.ext`.
- **`moviepy.editor`**: to load videos, resize and crop them, and write subclips to new files.
- **`tqdm`**: to show a progress bar while the pairs are processed.


In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
# If moviepy is not installed in your environment, run this cell once.
# If it is already installed, you can skip this cell.

!python3 -m pip install moviepy==1.0.3





In [3]:
from pathlib import Path
import re

from moviepy.editor import VideoFileClip, vfx
from moviepy.video.fx.all import crop
from tqdm.auto import tqdm

# Allowed video extensions (lowercase)
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".avi", ".m4v"}

# Desired duration of each segment in seconds
SEGMENT_LENGTH_SECONDS = 60

# Final output resolution (square)
TARGET_SIZE = 1080

## Helper functions: pair detection and segment computation

The functions below:

1. **`find_video_pairs(directory)`**  
   - Scans a directory for video files whose names match `<prefix>_<index>.<ext>`.
   - Groups them by `(prefix, ext)` and returns a list of pairs where both `_1` and `_2` exist.

2. **Segment computation functions**  
   - `compute_first_segment(duration)` → first 60 seconds.
   - `compute_middle_segment(duration)` → 60 seconds around the middle.
   - `compute_last_segment(duration)` → last 60 seconds.
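
The pairing rule can be tried out without touching the disk. The sketch below is a simplified, hypothetical variant (`pair_names` is not one of the notebook's functions) that groups plain file names by `(prefix, ext)` and keeps only groups with both index `1` and index `2`. Unlike the real function, it does not filter by video extension.

```python
import re
from pathlib import PurePath

# Same idea as described above: <prefix>_<index> matched against the file stem.
PATTERN = re.compile(r"^(?P<prefix>.+)_(?P<index>\d+)$")


def pair_names(names):
    """Hypothetical, disk-free variant: group file names into (_1, _2) pairs."""
    groups = {}
    for name in names:
        path = PurePath(name)
        match = PATTERN.match(path.stem)
        if match is None:
            continue  # e.g. "notes" or "running_1_clip1" do not end in _<digits>
        key = (match.group("prefix"), path.suffix.lower())
        groups.setdefault(key, {})[match.group("index")] = name
    return [(g["1"], g["2"]) for g in groups.values() if "1" in g and "2" in g]


sample = [
    "running_1.mp4",
    "running_2.mp4",
    "swimming_1.MOV",        # no matching _2, so no pair
    "notes.txt",
    "running_1_clip1.mp4",   # a generated clip
]
print(pair_names(sample))
```

Note that a generated clip such as `running_1_clip1.mp4` does not match the pattern, so clips written by an earlier run are not picked up as new inputs.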


In [4]:
def find_video_pairs(directory: Path):
    """
    Find pairs of videos in `directory` where filenames follow:
        <prefix>_<index>.<ext>
    and both index '1' and '2' exist for the same (prefix, ext).

    Returns:
        List of tuples: [(path_to_prefix_1, path_to_prefix_2), ...]
    """
    all_files = [
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    ]

    groups = {}
    pattern = re.compile(r"^(?P<prefix>.+)_(?P<index>\d+)$")  # match <prefix>_<index>

    for path in all_files:
        match = pattern.match(path.stem)
        if not match:
            continue

        prefix = match.group("prefix")
        index = match.group("index")
        key = (prefix, path.suffix.lower())

        if key not in groups:
            groups[key] = {}
        groups[key][index] = path

    pairs = []
    for (prefix, ext), index_map in groups.items():
        if "1" in index_map and "2" in index_map:
            pairs.append((index_map["1"], index_map["2"]))

    return pairs


def compute_first_segment(duration: float, seg_len: float = SEGMENT_LENGTH_SECONDS):
    """Return (start, end) for the first segment, up to seg_len seconds."""
    start = 0.0
    end = min(seg_len, duration)
    return start, end


def compute_middle_segment(duration: float, seg_len: float = SEGMENT_LENGTH_SECONDS):
    """
    Return (start, end) for a seg_len-second window around the middle.
    If the video is shorter than seg_len, return the whole video.
    """
    if duration <= 0:
        return 0.0, 0.0

    if duration <= seg_len:
        return 0.0, duration

    mid = duration / 2.0
    start = max(0.0, mid - seg_len / 2.0)
    end = start + seg_len

    if end > duration:
        end = duration
        start = max(0.0, end - seg_len)

    return start, end


def compute_last_segment(duration: float, seg_len: float = SEGMENT_LENGTH_SECONDS):
    """
    Return (start, end) for the last seg_len seconds.
    If the video is shorter than seg_len, return the whole video.
    """
    if duration <= 0:
        return 0.0, 0.0

    if duration <= seg_len:
        return 0.0, duration

    start = max(0.0, duration - seg_len)
    end = duration
    return start, end
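
To check the window arithmetic by hand, the three helpers above can be condensed into one function. This is only a sketch: `segment_windows` is a hypothetical name, and it assumes the same clamping behaviour for short videos as the functions above.

```python
def segment_windows(duration: float, seg_len: float = 60.0):
    """Hypothetical condensed version: (first, middle, last) as (start, end) pairs."""
    if duration <= 0:
        empty = (0.0, 0.0)
        return empty, empty, empty
    length = min(seg_len, duration)          # clamp the window for short videos
    first = (0.0, length)
    mid_start = (duration - length) / 2.0    # centre the window on the midpoint
    middle = (mid_start, mid_start + length)
    last = (duration - length, duration)
    return first, middle, last


for d in (896.90, 45.0):
    windows = segment_windows(d)
    print(d, [f"{start:.2f}-{end:.2f}" for start, end in windows])
```

For a 45-second video all three windows collapse to the whole video, which is why short inputs produce clips with identical content.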

## Helper functions: resizing and writing clips

The functions below handle resizing and saving clips:

1. **`make_square_clip(clip, target_size)`**
   - Resizes the clip so that the smaller dimension becomes at least `target_size`.
   - Center-crops to a square of size `target_size × target_size`.

2. **`write_segment_clip(clip, input_path, start, end, suffix)`**
   - Extracts a subclip between `start` and `end`.
   - Resizes and crops it to 1080 × 1080.
   - Writes it to disk with the appropriate suffix.

3. **`process_video_for_index(input_path, index)`**
   - For `_1` videos: creates `_clip1`, `_clip2`, `_clip3`.
   - For `_2` videos: creates `_clip4`, `_clip5`.
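
The resize-then-crop geometry can be worked out with plain arithmetic before any video is loaded. The sketch below does not call moviepy; `square_crop_plan` is a hypothetical helper, and it rounds the resized dimensions, which may differ by a pixel from what the video library produces.

```python
def square_crop_plan(width: int, height: int, target_size: int = 1080):
    """Hypothetical helper: return the resized size and the centered crop box."""
    scale = target_size / min(width, height)
    # Round so the smaller side lands exactly on target_size.
    new_w, new_h = round(width * scale), round(height * scale)
    x1 = (new_w - target_size) // 2
    y1 = (new_h - target_size) // 2
    return (new_w, new_h), (x1, y1, x1 + target_size, y1 + target_size)


print(square_crop_plan(1920, 1080))  # landscape input: crop the sides
print(square_crop_plan(720, 1280))   # portrait input: upscale, crop top and bottom
```

The crop box is given as `(x1, y1, x2, y2)`, the same corner convention that the `crop` effect takes as keyword arguments.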


In [5]:
def make_square_clip(clip, target_size: int = TARGET_SIZE):
    """
    Resize `clip` so that the smaller side >= target_size, then center-crop
    to (target_size x target_size). This preserves aspect ratio.
    """
    w, h = clip.size
    if w == 0 or h == 0:
        return clip

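    # Scale so that the smaller dimension is exactly target_size.
    # Passing height=/width= instead of a float factor avoids the smaller side
    # ending up one pixel short when the scaled size is truncated to an int.
    if w >= h:
        clip_resized = clip.fx(vfx.resize, height=target_size)
    else:
        clip_resized = clip.fx(vfx.resize, width=target_size)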

    w2, h2 = clip_resized.size
    x_center = w2 / 2.0
    y_center = h2 / 2.0

    x1 = int(x_center - target_size / 2.0)
    x2 = x1 + target_size
    y1 = int(y_center - target_size / 2.0)
    y2 = y1 + target_size

    return crop(clip_resized, x1=x1, y1=y1, x2=x2, y2=y2)


def write_segment_clip(clip: VideoFileClip, input_path: Path, start: float, end: float, suffix: str):
    """
    Write a subclip from `start` to `end` seconds to a new file, resizing to TARGET_SIZE x TARGET_SIZE.
    """
    if end - start <= 0:
        print(f"  [!] Skipping {input_path.name}{suffix}: non-positive segment length.")
        return

    # Extract the time window
    sub = clip.subclip(start, end)

    # Resize + center-crop to TARGET_SIZE x TARGET_SIZE
    sub_square = make_square_clip(sub, TARGET_SIZE)

    output_name = f"{input_path.stem}{suffix}{input_path.suffix}"
    output_path = input_path.with_name(output_name)

    print(f"  -> Writing {output_path.name} [{start:.2f}s - {end:.2f}s] at {TARGET_SIZE}x{TARGET_SIZE}")

    # You can adjust codec/bitrate here if needed
    sub_square.write_videofile(
        str(output_path),
        codec="libx264",
        audio_codec="aac",
        verbose=False,
        logger=None
    )


def process_video_for_index(input_path: Path, index: int):
    """
    Process a single video file depending on its index:
      - index == 1: create _clip1, _clip2, _clip3
      - index == 2: create _clip4, _clip5
    """
    input_path = Path(input_path)

    match = re.match(r"^(?P<prefix>.+)_(?P<index>\d+)$", input_path.stem)
    if not match:
        print(f"[!] Skipping {input_path.name}: does not match <prefix>_<index> pattern.")
        return

    true_index = int(match.group("index"))
    if true_index != index:
        print(f"[!] Warning: Expected index {index}, but filename has index {true_index} in {input_path.name}. Using {true_index}.")
        index = true_index

    with VideoFileClip(str(input_path)) as clip:
        duration = clip.duration or 0.0
        print(f"Processing {input_path.name} (duration = {duration:.2f} s, index = {index})")

        if duration <= 0:
            print("  [!] Duration is zero or invalid. Skipping.")
            return

        if index == 1:
            # _clip1: first 60s
            s1, e1 = compute_first_segment(duration)
            write_segment_clip(clip, input_path, s1, e1, "_clip1")

            # _clip2: middle 60s
            s2, e2 = compute_middle_segment(duration)
            write_segment_clip(clip, input_path, s2, e2, "_clip2")

            # _clip3: last 60s
            s3, e3 = compute_last_segment(duration)
            write_segment_clip(clip, input_path, s3, e3, "_clip3")

        elif index == 2:
            # _clip4: middle 60s
            s4, e4 = compute_middle_segment(duration)
            write_segment_clip(clip, input_path, s4, e4, "_clip4")

            # _clip5: last 60s
            s5, e5 = compute_last_segment(duration)
            write_segment_clip(clip, input_path, s5, e5, "_clip5")

        else:
            print(f"  [!] Index {index} not handled (only 1 and 2 are supported).")
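
The comment in `write_segment_clip` mentions adjusting codec and bitrate. One possible way to do that is to collect the encoder settings in a single dictionary and unpack it into `write_videofile`. The dictionary name and the values below are illustrative assumptions, not settings used by this notebook; the keys are keyword arguments of `write_videofile` in moviepy 1.0.3.

```python
# Hypothetical settings dict; the values are illustrative, not recommendations.
ENCODE_KWARGS = {
    "codec": "libx264",
    "audio_codec": "aac",
    "preset": "fast",                 # x264 speed/compression trade-off
    "ffmpeg_params": ["-crf", "20"],  # constant-quality mode, passed through to ffmpeg
    "threads": 2,                     # number of encoder threads
    "logger": None,                   # silence the progress output
}

# Usage inside write_segment_clip (not executed here):
# sub_square.write_videofile(str(output_path), **ENCODE_KWARGS)
print(sorted(ENCODE_KWARGS))
```

Keeping the options in one place makes it easier to apply the same encoding to all five clips of a pair.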

## Set the input directory

Update the `input_dir` path below to point to the folder that contains your videos.

All output clips will be saved in the **same directory** as their corresponding input videos.

In [6]:
input_dir = "/content/drive/MyDrive/Harvard/LS100/videos/Frame_Reduced"
video_dir = Path(input_dir).expanduser().resolve()

# is_dir() also rejects a path that exists but points to a file.
if not video_dir.is_dir():
    raise NotADirectoryError(f"Not an existing directory: {video_dir}")

print("Using directory:", video_dir)

Using directory: /content/drive/MyDrive/Harvard/LS100/videos/Frame_Reduced



## Run the pipeline with progress bars

This cell will:

1. Find all valid `(prefix_1, prefix_2)` video pairs.
2. For each pair:
   - Process the `_1` video and create clips 1–3.
   - Process the `_2` video and create clips 4–5.
3. Show an overall progress bar across all pairs.

When it finishes, your output clips (`*_clip1`–`*_clip5`) will be available in the same directory as the originals.
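
Since encoding takes a while, it can help to preview which files a pair would produce before running the cell. The sketch below is a hypothetical dry run (`planned_outputs` and `CLIP_SUFFIXES` are not part of the notebook); it only builds file names and does not open or encode any video.

```python
from pathlib import Path

# Suffixes per index, as listed in the steps above.
CLIP_SUFFIXES = {1: ["_clip1", "_clip2", "_clip3"], 2: ["_clip4", "_clip5"]}


def planned_outputs(video_1: Path, video_2: Path):
    """Hypothetical dry run: list the files a pair would produce, without encoding."""
    outputs = []
    for index, video in ((1, video_1), (2, video_2)):
        for suffix in CLIP_SUFFIXES[index]:
            outputs.append(video.with_name(f"{video.stem}{suffix}{video.suffix}"))
    return outputs


for out in planned_outputs(Path("Aryeh_1.MP4"), Path("Aryeh_2.MP4")):
    print(out.name)
```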


In [8]:
pairs = find_video_pairs(video_dir)

if not pairs:
    print("No valid (prefix_1, prefix_2) video pairs found in this directory.")
else:
    print(f"Found {len(pairs)} pair(s):\n")
    for p1, p2 in pairs:
        print(f"  - {p1.name}  &  {p2.name}")

    print("\nStarting processing...\n")

    for v1, v2 in tqdm(pairs, desc="Processing video pairs"):
        print("\n" + "=" * 70)
        print(f"Pair: {v1.name}  |  {v2.name}")

        # Process the _1 video (clips 1, 2, 3)
        process_video_for_index(v1, index=1)

        # Process the _2 video (clips 4, 5)
        process_video_for_index(v2, index=2)

    print("\nAll pairs processed.")

Found 3 pair(s):

  - Aryeh_1.MP4  &  Aryeh_2.MP4
  - Max_1.MP4  &  Max_2.MP4
  - Danny_1.MP4  &  Danny_2.MP4

Starting processing...



Processing video pairs:   0%|          | 0/3 [00:00<?, ?it/s]


Pair: Aryeh_1.MP4  |  Aryeh_2.MP4
Processing Aryeh_1.MP4 (duration = 896.90 s, index = 1)
  -> Writing Aryeh_1_clip1.MP4 [0.00s - 60.00s] at 1080x1080
  -> Writing Aryeh_1_clip2.MP4 [418.45s - 478.45s] at 1080x1080
  -> Writing Aryeh_1_clip3.MP4 [836.90s - 896.90s] at 1080x1080





Processing Aryeh_2.MP4 (duration = 846.16 s, index = 2)
  -> Writing Aryeh_2_clip4.MP4 [393.08s - 453.08s] at 1080x1080
  -> Writing Aryeh_2_clip5.MP4 [786.16s - 846.16s] at 1080x1080






Pair: Max_1.MP4  |  Max_2.MP4
Processing Max_1.MP4 (duration = 896.90 s, index = 1)
  -> Writing Max_1_clip1.MP4 [0.00s - 60.00s] at 1080x1080
  -> Writing Max_1_clip2.MP4 [418.45s - 478.45s] at 1080x1080
  -> Writing Max_1_clip3.MP4 [836.90s - 896.90s] at 1080x1080





Processing Max_2.MP4 (duration = 846.15 s, index = 2)
  -> Writing Max_2_clip4.MP4 [393.07s - 453.07s] at 1080x1080
  -> Writing Max_2_clip5.MP4 [786.15s - 846.15s] at 1080x1080






Pair: Danny_1.MP4  |  Danny_2.MP4
Processing Danny_1.MP4 (duration = 896.90 s, index = 1)
  -> Writing Danny_1_clip1.MP4 [0.00s - 60.00s] at 1080x1080
  -> Writing Danny_1_clip2.MP4 [418.45s - 478.45s] at 1080x1080
  -> Writing Danny_1_clip3.MP4 [836.90s - 896.90s] at 1080x1080





Processing Danny_2.MP4 (duration = 846.36 s, index = 2)
  -> Writing Danny_2_clip4.MP4 [393.18s - 453.18s] at 1080x1080
  -> Writing Danny_2_clip5.MP4 [786.36s - 846.36s] at 1080x1080






All pairs processed.


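## Generate clips for all pairs (without a progress bar)

This cell repeats the run above without the `tqdm` progress bar. Running both cells re-encodes every clip and overwrites the files written by the first run.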

In [9]:
if not pairs:
    print("No pairs to process.")
else:
    for v1, v2 in pairs:
        print("\n" + "="*60)
        print(f"Processing pair: {v1.name} / {v2.name}")

        process_video_for_index(v1, index=1)
        process_video_for_index(v2, index=2)

    print("\nAll pairs processed.")



Processing pair: Aryeh_1.MP4 / Aryeh_2.MP4
Processing Aryeh_1.MP4 (duration = 896.90 s, index = 1)
  -> Writing Aryeh_1_clip1.MP4 [0.00s - 60.00s] at 1080x1080
  -> Writing Aryeh_1_clip2.MP4 [418.45s - 478.45s] at 1080x1080
  -> Writing Aryeh_1_clip3.MP4 [836.90s - 896.90s] at 1080x1080
Processing Aryeh_2.MP4 (duration = 846.16 s, index = 2)
  -> Writing Aryeh_2_clip4.MP4 [393.08s - 453.08s] at 1080x1080
  -> Writing Aryeh_2_clip5.MP4 [786.16s - 846.16s] at 1080x1080

Processing pair: Max_1.MP4 / Max_2.MP4
Processing Max_1.MP4 (duration = 896.90 s, index = 1)
  -> Writing Max_1_clip1.MP4 [0.00s - 60.00s] at 1080x1080
  -> Writing Max_1_clip2.MP4 [418.45s - 478.45s] at 1080x1080
  -> Writing Max_1_clip3.MP4 [836.90s - 896.90s] at 1080x1080
Processing Max_2.MP4 (duration = 846.15 s, index = 2)
  -> Writing Max_2_clip4.MP4 [393.07s - 453.07s] at 1080x1080
  -> Writing Max_2_clip5.MP4 [786.15s - 846.15s] at 1080x1080

Processing pair: Danny_1.MP4 / Danny_2.MP4
Processing Danny_1.MP4 (dura


