A complete final year project that colorizes grayscale images and videos using a custom Convolutional Neural Network (CNN) trained from scratch without any pretrained weights.
- Project Overview
- Project Structure
- Installation & Setup
- Training the Model
- Using the Application
- Features
- API Endpoints
- Troubleshooting
- Project Details
Build a deep learning system that converts grayscale images and videos to color using a custom CNN architecture, without relying on any pretrained models.
- Framework: PyTorch
- Color Space: LAB (works with L channel input to predict A and B channels)
- Dataset: CIFAR-10
- Backend: Flask REST API
- Frontend: React with interactive UI
- Video Processing: OpenCV
- Type: Convolutional Neural Network (CNN)
- Input: Grayscale image (L channel only)
- Output: Color channels (A and B)
- Encoder-Decoder: 8 blocks with batch normalization and ReLU activation
- Loss Function: Mean Squared Error (MSE)
Final Project/
├── backend/
│   ├── data_loader.py          # Custom Dataset class and DataLoaders
│   ├── model.py                # CNN architecture (ColorizeNet)
│   ├── train.py                # Training script with logging
│   ├── predict.py              # Inference script for images
│   ├── video_colorize.py       # Video processing and colorization
│   ├── utils.py                # Color space conversions & utilities
│   ├── app.py                  # Flask API server
│   ├── config.py               # Configuration settings
│   ├── requirements.txt        # Python dependencies
│   └── models/                 # Directory for saved models
│       └── colorize_model.pth  # Trained model weights
│
├── frontend/
│   ├── public/
│   │   └── index.html          # HTML entry point
│   ├── src/
│   │   ├── components/         # React components
│   │   │   ├── ImageUploader.js
│   │   │   ├── ImageUploader.css
│   │   │   ├── VideoUploader.js
│   │   │   ├── VideoUploader.css
│   │   │   ├── ImageComparison.js
│   │   │   └── ImageComparison.css
│   │   ├── App.js              # Main React app
│   │   ├── App.css
│   │   ├── index.js
│   │   └── index.css
│   ├── package.json
│   └── .gitignore
│
├── dataset/                    # CIFAR-10 dataset (auto-downloaded)
├── outputs/                    # Output colorized images/videos
├── .github/
│   └── copilot-instructions.md
├── .gitignore
└── README.md                   # This file
- Python 3.8+
- Node.js 14+
- pip and npm package managers
- CUDA 11.0+ (optional, for GPU support)
Backend setup:

1. Navigate to the backend directory:
   cd backend

2. Create a virtual environment (recommended):
   # Windows
   python -m venv venv
   venv\Scripts\activate

   # macOS/Linux
   python3 -m venv venv
   source venv/bin/activate

3. Install dependencies:
   pip install -r requirements.txt

Frontend setup:

1. Navigate to the frontend directory:
   cd frontend

2. Install dependencies:
   npm install
cd backend

Quick training (small run for testing):
python train.py --epochs 5 --batch_size 32 --image_size 256

Full training:
python train.py \
  --epochs 20 \
  --batch_size 64 \
  --learning_rate 0.001 \
  --image_size 256 \
  --model_save_path models/colorize_model.pth

Arguments:
- --epochs: Number of training epochs (default: 20)
- --batch_size: Batch size for training (default: 32)
- --learning_rate: Learning rate for the Adam optimizer (default: 0.001)
- --image_size: Input image resolution (default: 256)
- --model_save_path: Path to save the trained model (default: models/colorize_model.pth)
- --num_workers: Number of data-loading workers (default: 0)

After training, you'll see:
- models/colorize_model.pth: trained model weights
- outputs/training_loss.png: loss curve visualization
- outputs/training_history.json: numerical training history
- The training script downloads CIFAR-10 automatically (~170 MB)
- Uses MSE loss between predicted and ground truth AB channels
- Implements learning rate scheduling (the learning rate is multiplied by 0.5 after 5 epochs without improvement)
- Best model is saved based on validation loss
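The scheduler and best-model checkpointing described above can be sketched in PyTorch roughly as follows. This is a minimal sketch, not the project's actual train.py: the one-layer stand-in model, the constant validation loss, and the checkpoint path are all placeholders for illustration.

```python
import os
import tempfile

import torch

# Stand-in for the real ColorizeNet defined in backend/model.py.
model = torch.nn.Conv2d(1, 2, kernel_size=3, padding=1)

# Adam with weight decay (L2 regularization), as in the training details.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-5)

# Halve the learning rate after 5 epochs without validation-loss improvement.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=5
)

ckpt_path = os.path.join(tempfile.gettempdir(), "colorize_model.pth")
best_val_loss = float("inf")
for epoch in range(8):
    val_loss = 0.05  # stand-in for a real validation pass
    scheduler.step(val_loss)
    # Save a checkpoint only when validation loss improves ("best model").
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        torch.save(model.state_dict(), ckpt_path)

print(optimizer.param_groups[0]["lr"])  # 0.0005: halved once after the plateau
```

Because the validation loss never improves after the first epoch, the scheduler trips after its 5-epoch patience and the learning rate drops from 0.001 to 0.0005.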
Start the backend:
cd backend
python app.py
The Flask API will run on http://localhost:5000.

Start the frontend (in a second terminal):
cd frontend
npm start
The React app will run on http://localhost:3000.
- Open a browser to http://localhost:3000
- Select the Image or Video tab
- Adjust color intensity (0.0 - 2.0)
- Toggle denoising if desired
- Upload file (drag & drop or click to select)
- Wait for processing
- Download colorized output
Single image:
cd backend
python predict.py \
  --model_path models/colorize_model.pth \
  --image_path path/to/image.jpg \
  --output_dir outputs \
  --color_intensity 1.0 \
  --denoise

Batch of images:
python predict.py \
  --model_path models/colorize_model.pth \
  --image_dir path/to/image/folder \
  --output_dir outputs

Video:
python video_colorize.py \
  --model_path models/colorize_model.pth \
  --video_path path/to/video.mp4 \
  --output_path outputs/colorized.mp4 \
  --color_intensity 1.0 \
  --skip_frames 1

- ✅ Convert grayscale to color
- ✅ Process color images (recolorize)
- ✅ Adjustable color intensity (0.0 to 2.0)
- ✅ Optional denoising filter
- ✅ Batch processing support
- ✅ Multiple image format support (PNG, JPG, BMP, TIFF)
- ✅ Extract and colorize video frames
- ✅ Frame skipping for faster processing
- ✅ Reconstruct video from colorized frames
- ✅ Preserves original video FPS
- ✅ Multiple video format support
- 🎨 Modern, responsive web interface
- 🖱️ Interactive before/after comparison slider
- 📊 Real-time processing feedback
- 💾 One-click download of results
- 🎛️ Easy adjustment controls
- 🔧 Non-Local Means denoising
- 📉 Noise reduction that preserves details
- ⚙️ Configurable denoising strength
GET /health
Response: { "status": "ok", "message": "..." }
POST /colorize-image
Content-Type: multipart/form-data
Parameters:
- file: Image file (required)
- intensity: Color intensity 0.0-2.0 (default: 1.0)
- denoise: Apply denoising true/false (default: false)
Response: PNG image file
POST /colorize-video
Content-Type: multipart/form-data
Parameters:
- file: Video file (required)
- intensity: Color intensity 0.0-2.0 (default: 1.0)
- denoise: Apply denoising true/false (default: false)
- skip_frames: Process every nth frame (default: 1)
Response: MP4 video file
GET /model-info
Response: Model architecture details
GET /supported-formats
Response: List of supported file formats
Error: Cannot load model at 'models/colorize_model.pth'
Solution: Train the model first using train.py
Error: Access to XMLHttpRequest blocked by CORS policy
Solution: CORS is enabled in Flask app. Check if backend is running on port 5000
Error: CUDA out of memory
Solution:
1. Reduce batch_size: --batch_size 16
2. Use CPU: Set device to 'cpu'
3. Reduce image_size: --image_size 128
Problem: Video processing is slow
Solution: Use the --skip_frames parameter:
python video_colorize.py --skip_frames 2  # Process every 2nd frame
Error: Cannot GET /
Solution:
1. Ensure npm start is running
2. Check http://localhost:3000
3. Check React dev server output for errors
The LAB color space consists of:
- L channel: Lightness (0-100)
- A channel: Green-Red (-128 to 127)
- B channel: Blue-Yellow (-128 to 127)
Our model:
- Input: L channel (grayscale)
- Output: A and B channels (color)
- Advantage: Separates luminance from color, better for colorization
Input: (B, 1, 256, 256) - Grayscale L channel
ENCODER:
├─ Conv2d: 1 → 64 channels
├─ Conv2d: 64 → 64 channels
├─ MaxPool: 256 → 128
├─ Conv2d: 64 → 128 channels
├─ Conv2d: 128 → 128 channels
├─ MaxPool: 128 → 64
├─ Conv2d: 128 → 256 channels
├─ Conv2d: 256 → 256 channels
├─ MaxPool: 64 → 32
├─ Conv2d: 256 → 512 channels (bottleneck)
└─ Conv2d: 512 → 512 channels

DECODER:
├─ ConvTranspose2d: 512 → 256 (32 → 64)
├─ ConvTranspose2d: 256 → 128 (64 → 128)
├─ ConvTranspose2d: 128 → 64 (128 → 256)
├─ ConvTranspose2d: 64 → 32 (256 → 512)
└─ Conv2d: 32 → 2 channels (output: A, B)

Output: (B, 2, 256, 256) - Color AB channels
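The encoder-decoder pattern above can be sketched in PyTorch as a scaled-down model. This is an illustrative sketch with fewer blocks than the real ColorizeNet in backend/model.py; the class name MiniColorizeNet is hypothetical.

```python
import torch
import torch.nn as nn


class MiniColorizeNet(nn.Module):
    """Simplified encoder-decoder sketch: Conv -> BatchNorm -> ReLU blocks
    with MaxPool downsampling and ConvTranspose upsampling."""

    def __init__(self):
        super().__init__()

        def block(c_in, c_out):
            # Conv -> BatchNorm -> ReLU, as described in the training details.
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
            )

        self.encoder = nn.Sequential(
            block(1, 64), nn.MaxPool2d(2),     # 256 -> 128
            block(64, 128), nn.MaxPool2d(2),   # 128 -> 64
            block(128, 256), nn.MaxPool2d(2),  # 64 -> 32
            block(256, 512),                   # bottleneck at 32x32
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(512, 256, kernel_size=2, stride=2),  # 32 -> 64
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(256, 128, kernel_size=2, stride=2),  # 64 -> 128
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2),   # 128 -> 256
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 2, kernel_size=3, padding=1),             # -> A, B
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


x = torch.randn(1, 1, 256, 256)  # one grayscale L channel
y = MiniColorizeNet()(x)
print(y.shape)  # torch.Size([1, 2, 256, 256])
```

The key design point carries over: the spatial size lost in the encoder (three 2x poolings) is recovered by strided transposed convolutions, so the predicted AB map matches the input L channel pixel for pixel.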
- Loss Function: MSELoss (Mean Squared Error)
- Optimizer: Adam with weight decay (L2 regularization)
- Learning Rate: 0.001 (with ReduceLROnPlateau scheduler)
- Batch Normalization: Applied after each convolution
- Activation: ReLU throughout network
- Data Augmentation: Resize to 256×256
- Normalize RGB [0, 255] → [0, 1]
- Apply gamma correction
- Convert to XYZ using the transformation matrix
- Normalize by the reference white point
- Apply the non-linear transformation
- Calculate LAB values

AB channel scaling for the network:
A: [-128, 127] → [-1, 1] (divide by 128)
B: [-128, 127] → [-1, 1] (divide by 128)
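The conversion steps and AB scaling above can be sketched in NumPy. This is a minimal sRGB → LAB implementation assuming a D65 white point; the project's utils.py may differ in constants and details.

```python
import numpy as np


def rgb_to_lab(rgb):
    """sRGB uint8 image (H, W, 3) -> LAB floats, following the steps above."""
    # 1. Normalize RGB [0, 255] -> [0, 1]
    rgb = rgb.astype(np.float64) / 255.0
    # 2. Gamma correction (inverse sRGB companding)
    rgb = np.where(rgb > 0.04045, ((rgb + 0.055) / 1.055) ** 2.4, rgb / 12.92)
    # 3. Linear RGB -> XYZ via the sRGB transformation matrix
    m = np.array([[0.4124564, 0.3575761, 0.1804375],
                  [0.2126729, 0.7151522, 0.0721750],
                  [0.0193339, 0.1191920, 0.9503041]])
    xyz = rgb @ m.T
    # 4. Normalize by the D65 reference white point
    xyz /= np.array([0.95047, 1.0, 1.08883])
    # 5. Non-linear transformation
    f = np.where(xyz > 0.008856, np.cbrt(xyz), 7.787 * xyz + 16.0 / 116.0)
    # 6. Calculate L, A, B
    L = 116.0 * f[..., 1] - 16.0
    A = 500.0 * (f[..., 0] - f[..., 1])
    B = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([L, A, B], axis=-1)


img = np.zeros((2, 2, 3), dtype=np.uint8)  # pure black
lab = rgb_to_lab(img)
print(lab[0, 0])  # black maps to L = 0 with neutral A, B

# Scale AB into [-1, 1] for the network, as above
ab_scaled = lab[..., 1:] / 128.0
```

As a sanity check, pure black should give L = 0 and pure white L = 100, both with A and B near zero, since neutral grays carry no chroma in LAB.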
Adjusted_AB = Original_AB × Intensity_Factor
Range: 0.0 (grayscale) to 2.0 (vibrant colors)
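The intensity formula above amounts to a single scale-and-clamp over the AB channels; here is a short sketch (the clamp to the LAB AB range is an assumption, not confirmed from the source):

```python
import numpy as np


def adjust_intensity(ab, factor):
    """Scale predicted AB channels by the intensity factor and clamp
    to the assumed valid LAB range of [-128, 127]."""
    return np.clip(ab * factor, -128.0, 127.0)


ab = np.array([[10.0, -20.0], [60.0, 100.0]])
print(adjust_intensity(ab, 0.0))  # all chroma removed -> grayscale
print(adjust_intensity(ab, 2.0))  # values doubled, then clamped
```

At factor 0.0 every AB value collapses to zero (a grayscale result), while at 2.0 chroma doubles and out-of-range values such as 200 are clamped to 127.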
Typical results after training on CIFAR-10:
| Metric | Value |
|---|---|
| Best Validation MSE Loss | ~0.02 |
| Training Time (20 epochs, GPU) | 2-4 hours |
| Training Time (20 epochs, CPU) | 8-12 hours |
| Model Size | ~15 MB |
| FPS (Image prediction, GPU) | 30-50 |
| FPS (Image prediction, CPU) | 5-10 |
By working through this project, you'll learn:
1. Deep Learning Fundamentals
   - CNN architecture design
   - Encoder-decoder patterns
   - Batch normalization
   - Loss functions and optimization

2. PyTorch Skills
   - Building custom models
   - Custom Dataset classes
   - DataLoader creation
   - Training loops
   - Model checkpointing

3. Image Processing
   - Color space conversions (RGB, LAB, XYZ)
   - Image normalization
   - Denoising techniques
   - Video frame extraction

4. Full-Stack Development
   - REST API design with Flask
   - React component creation
   - CORS and API integration
   - File upload handling

5. Software Engineering
   - Project structure
   - Configuration management
   - Error handling
   - Code documentation
All code is written with detailed comments explaining:
- Function purpose and parameters
- Algorithm steps
- Mathematical transformations
- Edge cases and special handling
This project is created for educational purposes as a final year project.
For issues or questions:
- Check the Troubleshooting section
- Review code comments for implementation details
- Check Flask/React console output for error messages
Quick reference:
- Train: cd backend && python train.py --epochs 5
- Start backend: cd backend && python app.py
- Start frontend: cd frontend && npm start
- Colorize an image: python backend/predict.py --model_path backend/models/colorize_model.pth --image_path image.jpg
- Colorize a video: python backend/video_colorize.py --model_path backend/models/colorize_model.pth --video_path video.mp4

Created: 2024
Status: Complete Final Year Project
Difficulty: Intermediate to Advanced