# RealESRGAN Video Inference with Fine-Tuned Model on Google Colab (T4 GPU)

This notebook demonstrates how to use a fine-tuned RealESRGAN model for video super-resolution in a Google Colab environment with a T4 GPU. The script reads a video frame by frame, enhances each frame with the fine-tuned model, and writes the upscaled video with the original audio track preserved.

## Prerequisites
- Ensure you have a Google Colab environment with a T4 GPU assigned (select Runtime > Change runtime type > T4 GPU).
- Upload a video file (e.g., `test1.mp4`) to the Colab working directory or clone the Git repository containing the video.
- Obtain the fine-tuned model weights (`net_g_5000.pth`), which are stored in the Git repository as a Git LFS file. Set up Git LFS before cloning so that the actual weights are downloaded rather than a small pointer file.
- Install Git LFS in Colab if needed: `!apt-get install -y git-lfs && git lfs install`.

## Steps
1. Install required dependencies and Git LFS.
2. Clone the Git repository containing the fine-tuned model (`net_g_5000.pth`).
3. Set up paths for the input video and model weights.
4. Run the inference script to process the video.
5. Download the output video from the specified output directory.

## Notes
- The script uses FP16 precision by default (`fp32 = False`) to reduce memory use and improve throughput on the T4 GPU.
- If you encounter CUDA out-of-memory errors, reduce the `tile` size (default: 1000).
- The output video resolution is scaled by the `outscale` factor (default: 4x).
- For `.flv` videos, the script converts them to `.mp4` before processing.
- The fine-tuned model (`net_g_5000.pth`) is specific to this implementation and differs from the original RealESRGAN_x4plus model.
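
The interaction between `tile` and `outscale` in the notes above can be sketched with a little arithmetic. This is a minimal illustration, assuming the upsampler splits each frame into a grid using ceiling division (which is how `RealESRGANer` appears to tile frames); the 1920x1080 input size and the helper names `tile_grid` and `output_size` are hypothetical and not part of the notebook.

```python
import math


def tile_grid(width, height, tile):
    """Return (columns, rows) of tiles for one frame; tile <= 0 means no tiling."""
    if tile <= 0:
        return 1, 1
    return math.ceil(width / tile), math.ceil(height / tile)


def output_size(width, height, outscale):
    """Return the (width, height) of the upscaled frame."""
    return int(width * outscale), int(height * outscale)


# Hypothetical 1920x1080 input frame (the notebook does not state the input size).
cols, rows = tile_grid(1920, 1080, tile=1000)
print(cols * rows)                           # 4
print(tile_grid(1920, 1080, tile=512))       # (4, 3)
print(output_size(1920, 1080, outscale=4))   # (7680, 4320)
```

Under this assumption, a smaller `tile` means more but smaller tiles per frame, which trades speed for lower peak GPU memory.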

### If **not** using Conda:

Ensure that you have the following system-level dependencies installed via your OS package manager:

* `ffmpeg`
* `libGL`

Install the system packages first, then install the required Python packages with `pip`:

In [None]:
# !sudo apt update && sudo apt install -y libgl1 ffmpeg


In [None]:
# Install dependencies
# !pip install -q basicsr facexlib gfpgan numpy opencv-python Pillow torch torchvision tqdm realesrgan ffmpeg-python

### If using GitHub Codespaces or a Conda environment (Python 3.12):

All required Python libraries are automatically installed via the Conda environment. System dependencies are auto-configured using the `postCreateCommand` in the `devcontainer.json` file. No additional steps are required unless you want to manually verify or update packages.

#### To create a new Conda environment in GitHub Codespaces:

1. Click on the **Select Kernel** dropdown in the top-right corner.
2. Select **Another Kernel**.
3. Go to **Python Environments** > **Create Python Environment**.
4. Choose **Conda** and set the version to **Python 3.12**.

In [4]:
!pip show basicsr

Name: basicsr
Version: 1.4.2
Summary: Open Source Image and Video Super-Resolution Toolbox
Home-page: https://github.com/xinntao/BasicSR
Author: Xintao Wang
Author-email: xintao.wang@outlook.com
License: Apache License 2.0
Location: /usr/local/python/3.12.1/lib/python3.12/site-packages
Requires: addict, future, lmdb, numpy, opencv-python, Pillow, pyyaml, requests, scikit-image, scipy, tb-nightly, torch, torchvision, tqdm, yapf
Required-by: gfpgan, realesrgan


In [6]:
%%writefile dependency-fix.sh
#!/bin/bash
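# Fix the torchvision import in basicsr/data/degradations.py
# (recent torchvision releases no longer provide transforms.functional_tensor).
# NOTE: this site-packages path matches the location reported by `pip show basicsr`; adjust it for other environments.
sed -i 's/from torchvision.transforms.functional_tensor import rgb_to_grayscale/from torchvision.transforms.functional import rgb_to_grayscale/' /usr/local/python/3.12.1/lib/python3.12/site-packages/basicsr/data/degradations.py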

Writing dependency-fix.sh


In [7]:
!chmod +x dependency-fix.sh
!./dependency-fix.sh
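
The shell fix above hardcodes one site-packages path. As a possible alternative, the same rewrite can be sketched in Python so that it finds the installed package wherever it lives. The helper names `patch_import` and `find_degradations_file` are illustrative assumptions, not part of BasicSR; the sketch uses `importlib.util.find_spec`, which locates a top-level package without importing it (importing `basicsr` directly may fail before the fix is applied).

```python
import importlib.util
import os

OLD_IMPORT = "from torchvision.transforms.functional_tensor import rgb_to_grayscale"
NEW_IMPORT = "from torchvision.transforms.functional import rgb_to_grayscale"


def patch_import(file_path):
    """Rewrite the outdated import in file_path; return True if the file changed."""
    with open(file_path, "r", encoding="utf-8") as f:
        source = f.read()
    if OLD_IMPORT not in source:
        return False  # already patched, or the line is not present
    with open(file_path, "w", encoding="utf-8") as f:
        f.write(source.replace(OLD_IMPORT, NEW_IMPORT))
    return True


def find_degradations_file():
    """Locate basicsr/data/degradations.py without importing basicsr itself."""
    spec = importlib.util.find_spec("basicsr")
    if spec is None or not spec.submodule_search_locations:
        return None  # basicsr is not installed in this environment
    package_dir = list(spec.submodule_search_locations)[0]
    return os.path.join(package_dir, "data", "degradations.py")
```

Calling `patch_import(find_degradations_file())` after checking that the returned path is not `None` would apply the same substitution as the `sed` command, and running it a second time leaves the file unchanged.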

In [None]:
import os
from basicsr.archs.rrdbnet_arch import RRDBNet
from realesrgan import RealESRGANer
from tqdm import tqdm
import ffmpeg
import mimetypes
import numpy as np

class VideoReader:
    def __init__(self, video_path, ffmpeg_bin='ffmpeg'):
        self.ffmpeg_bin = ffmpeg_bin
        meta = self.get_video_meta_info(video_path)
        self.width = meta['width']
        self.height = meta['height']
        self.fps = meta['fps']
        self.audio = meta['audio']
        self.nb_frames = meta['nb_frames']
        self.stream_reader = (
            ffmpeg.input(video_path).output('pipe:', format='rawvideo', pix_fmt='bgr24', loglevel='error')
            .run_async(pipe_stdin=True, pipe_stdout=True, cmd=ffmpeg_bin)
        )
        self.idx = 0

    def get_video_meta_info(self, video_path):
        probe = ffmpeg.probe(video_path)
        video_streams = [stream for stream in probe['streams'] if stream['codec_type'] == 'video']
        has_audio = any(stream['codec_type'] == 'audio' for stream in probe['streams'])
        # avg_frame_rate is a rational string such as '30000/1001'; parse it without eval()
        num, den = video_streams[0]['avg_frame_rate'].split('/')
        return {
            'width': video_streams[0]['width'],
            'height': video_streams[0]['height'],
            'fps': float(num) / float(den),
            'audio': ffmpeg.input(video_path).audio if has_audio else None,
            'nb_frames': int(video_streams[0]['nb_frames'])
        }

    def get_frame(self):
        if self.idx >= self.nb_frames:
            return None
        img_bytes = self.stream_reader.stdout.read(self.width * self.height * 3)
        if not img_bytes:
            return None
        img = np.frombuffer(img_bytes, np.uint8).reshape([self.height, self.width, 3])
        self.idx += 1
        return img

    def get_resolution(self):
        return self.height, self.width

    def get_fps(self):
        return self.fps

    def get_audio(self):
        return self.audio

    def __len__(self):
        return self.nb_frames

    def close(self):
        self.stream_reader.stdin.close()
        self.stream_reader.wait()

class VideoWriter:
    def __init__(self, video_save_path, audio, height, width, fps, outscale, ffmpeg_bin='ffmpeg'):
        self.ffmpeg_bin = ffmpeg_bin
        out_width, out_height = int(width * outscale), int(height * outscale)
        if out_height > 2160:
            print('Warning: Output video exceeds 4K resolution, which may be slow due to I/O. Consider reducing outscale.')
        input_args = {
            'format': 'rawvideo',
            'pix_fmt': 'bgr24',
            's': f'{out_width}x{out_height}',
            'framerate': fps
        }
        output_args = {
            'pix_fmt': 'yuv420p',
            'vcodec': 'libx264',
            'loglevel': 'error'
        }
        if audio is not None:
            output_args['acodec'] = 'copy'
            self.stream_writer = (
                ffmpeg.input('pipe:', **input_args)
                .output(audio, video_save_path, **output_args)
                .overwrite_output()
                .run_async(pipe_stdin=True, pipe_stdout=True, cmd=ffmpeg_bin)
            )
        else:
            self.stream_writer = (
                ffmpeg.input('pipe:', **input_args)
                .output(video_save_path, **output_args)
                .overwrite_output()
                .run_async(pipe_stdin=True, pipe_stdout=True, cmd=ffmpeg_bin)
            )

    def write_frame(self, frame):
        frame = frame.astype(np.uint8).tobytes()
        self.stream_writer.stdin.write(frame)

    def close(self):
        self.stream_writer.stdin.close()
        self.stream_writer.wait()

def main():
    # Hardcoded parameters
    input_path = 'video/test1.mp4'
    output_dir = 'output'
    model_name = 'RealESRGAN_x4plus'
    model_path = '../../RealESRGAN/model/net_g_5000.pth'
    outscale = 4
    suffix = 'out'
    tile = 1000
    ffmpeg_bin = 'ffmpeg'
    fp32 = False  # Use FP16 by default

    # Ensure model name is RealESRGAN_x4plus
    if model_name != 'RealESRGAN_x4plus':
        raise ValueError('This script only supports RealESRGAN_x4plus model')

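    # Validate input and model path
    input_path = input_path.rstrip('/').rstrip('\\')
    # guess_type() returns None for unknown extensions, so fall back to an empty string
    mime_type = mimetypes.guess_type(input_path)[0] or ''
    if not os.path.isfile(input_path) or not mime_type.startswith('video'):
        raise ValueError('Input must be a video file')
    if not os.path.isfile(model_path):
        raise ValueError(f'Model path {model_path} does not exist')

    # Convert .flv to .mp4 if necessary
    if input_path.endswith('.flv'):
        # Swap only the extension; str.replace would also touch '.flv' elsewhere in the path
        mp4_path = os.path.splitext(input_path)[0] + '.mp4'
        # Quote the paths so file names containing spaces survive the shell
        os.system(f'{ffmpeg_bin} -i "{input_path}" -codec copy "{mp4_path}"')
        input_path = mp4_path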

    # Initialize model
    model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64, num_block=23, num_grow_ch=32, scale=4)
    upsampler = RealESRGANer(
        scale=4,
        model_path=model_path,
        model=model,
        tile=tile,
        tile_pad=10,
        pre_pad=0,
        half=not fp32
    )

    # Create output directory
    os.makedirs(output_dir, exist_ok=True)
    video_name = os.path.splitext(os.path.basename(input_path))[0]
    video_save_path = os.path.join(output_dir, f'{video_name}_{suffix}.mp4')

    # Process video
    reader = VideoReader(input_path, ffmpeg_bin)
    audio = reader.get_audio()
    height, width = reader.get_resolution()
    fps = reader.get_fps()
    writer = VideoWriter(video_save_path, audio, height, width, fps, outscale, ffmpeg_bin)

    pbar = tqdm(total=len(reader), unit='frame', desc=f'Processing {video_name}')
    while True:
        img = reader.get_frame()
        if img is None:
            break
        try:
            output, _ = upsampler.enhance(img, outscale=outscale)
            writer.write_frame(output)
        except RuntimeError as error:
            print(f'Error processing frame: {error}')
            print('Try reducing tile size if you encounter CUDA out of memory.')
        pbar.update(1)

    reader.close()
    writer.close()
    print(f'Saved: {video_save_path}')

if __name__ == '__main__':
    main()




Processing test1:   0%|          | 0/61 [00:00<?, ?frame/s]
	Tile 1/4
	Tile 2/4
	Tile 3/4
	Tile 4/4
Processing test1:   2%|▏         | 1/61 [00:14<14:39, 14.67s/frame]
[output for frames 2 to 59 omitted; each frame prints the same four tile lines at roughly 14 to 16 s/frame]
Processing test1:  98%|█████████▊| 60/61 [16:29<00:16, 16.50s/frame]

Saved: output/test1_out.mp4
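
The reader in the inference cell takes the frame rate and frame count from the metadata returned by `ffmpeg.probe`. Some containers do not report `nb_frames`, so a more defensive parser may be useful. The following is a minimal sketch under stated assumptions: `parse_video_meta` is a hypothetical helper, the duration-based fallback is an illustrative choice rather than the notebook's method, and the sample dictionary is made up to mimic the shape of ffprobe's JSON output.

```python
from fractions import Fraction


def parse_video_meta(probe):
    """Extract width, height, fps and a frame count from an ffprobe-style dict."""
    video = next(s for s in probe["streams"] if s["codec_type"] == "video")
    # avg_frame_rate is a rational string such as "30000/1001"
    fps = float(Fraction(video["avg_frame_rate"]))
    if "nb_frames" in video:
        nb_frames = int(video["nb_frames"])
    else:
        # Assumed fallback: estimate the count from the container duration
        nb_frames = int(float(probe["format"]["duration"]) * fps)
    has_audio = any(s["codec_type"] == "audio" for s in probe["streams"])
    return {
        "width": video["width"],
        "height": video["height"],
        "fps": fps,
        "nb_frames": nb_frames,
        "has_audio": has_audio,
    }


# Hypothetical probe result with no nb_frames field on the video stream
sample = {
    "streams": [{"codec_type": "video", "width": 640, "height": 360,
                 "avg_frame_rate": "25/1"}],
    "format": {"duration": "2.0"},
}
print(parse_video_meta(sample)["nb_frames"])  # 50
```

An estimated count only affects the progress bar total and the early-exit check in `get_frame`; the read loop still stops when the pipe runs out of data.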





## Usage Instructions

1. **Set Up Files**:
   - Upload your input video (e.g., `test1.mp4`) to the `video/` directory in the Colab working directory, or include it in your Git repository.
   - Clone the Git repository containing the fine-tuned model weights (`net_g_5000.pth`). Install Git LFS before cloning so that the model file itself is downloaded.

2. **Modify Paths**:
   - Update `input_path` in the `main` function to point to your video file (e.g., `video/test1.mp4`).
   - Update `model_path` to point to the fine-tuned model (e.g., `your-repo/model/net_g_5000.pth`).
   - Optionally, adjust `output_dir`, `outscale`, or `tile` as needed.

3. **Run the Notebook**:
   - Execute all cells in order.
   - Monitor the progress bar for frame processing.

4. **Download Output**:
   - The enhanced video will be saved in the `output/` directory (e.g., `output/test1_out.mp4`).
   - Download the video from Colab's file explorer.
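
To see where the output will land before starting a long run, the naming rule used in `main` can be reproduced on its own. This is a small sketch; `build_output_path` is a hypothetical helper that mirrors how the script combines `output_dir`, the input file name, and `suffix`.

```python
import os


def build_output_path(input_path, output_dir="output", suffix="out"):
    """Return the path the script would save to for a given input video."""
    video_name = os.path.splitext(os.path.basename(input_path))[0]
    return os.path.join(output_dir, f"{video_name}_{suffix}.mp4")


print(build_output_path("video/test1.mp4"))  # output/test1_out.mp4 on POSIX systems
```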

## Troubleshooting
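- **CUDA Out of Memory**: Reduce the `tile` size (e.g., to 512). Lowering `outscale` is unlikely to help here, because `RealESRGANer` runs the 4x network first and only resizes the result to `outscale` afterwards.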
- **FFmpeg Errors**: Ensure FFmpeg is installed correctly by running the dependency installation cell.
- **Model Not Found**: Verify the Git repository is cloned, Git LFS is set up, and the model path is correct.
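- **Slow Processing**: Large input resolutions dominate processing time (the run above took roughly 14 to 16 seconds per frame). Consider downscaling the input video; reducing `outscale` mainly lowers the cost of encoding and writing the output.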
- **Git LFS Issues**: Ensure Git LFS is installed and the repository is properly configured. Run `git lfs pull` in the repository directory to download the model.
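
Several of the problems above can be detected before the model is loaded. The sketch below is one possible preflight check, not part of the notebook: `is_lfs_pointer` and `preflight` are hypothetical helpers. It relies on the fact that a Git LFS pointer file is a small text file whose first line starts with `version https://git-lfs.github.com/spec/v1`, and on `shutil.which` to look up the FFmpeg binary on `PATH`.

```python
import os
import shutil

LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file looks like a Git LFS pointer instead of real data."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def preflight(model_path, ffmpeg_bin="ffmpeg"):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if shutil.which(ffmpeg_bin) is None:
        problems.append(f"'{ffmpeg_bin}' was not found on PATH; install FFmpeg first.")
    if not os.path.isfile(model_path):
        problems.append(f"Model file '{model_path}' does not exist; check the clone and the path.")
    elif is_lfs_pointer(model_path):
        problems.append(f"'{model_path}' is a Git LFS pointer; run `git lfs pull` in the repository.")
    return problems
```

Calling `preflight(model_path)` with the same `model_path` used in `main` and printing each returned message gives a quick diagnosis before any frames are processed.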