# Video Processing Basics:

- Frame extraction
- Frame interpolation
- Frame alignment
- Video compression and decompression

Video processing in computer vision involves analyzing and manipulating video streams to extract useful information. Here's an overview of the basics:

1. **Frame:** A single image in a sequence of images that make up a video.

2. **Frame Rate (fps):** The number of frames displayed per second. Common frame rates are 24, 30, and 60 fps.

3. **Resolution:** The dimensions of the video, typically represented by width x height (e.g., 1920x1080 pixels).

4. **Color Space:** The color representation used in the video, such as RGB (Red, Green, Blue) or YUV (one luminance component, Y, and two chrominance components, U and V).

5. **Video Compression:** Reducing the size of video files by removing redundant or irrelevant information. Common video compression standards include MPEG-2, H.264 (AVC), and H.265 (HEVC).

6. **Preprocessing:** Techniques applied to frames before further analysis, such as resizing, denoising, or color correction.

7. **Feature Extraction:** Identifying key points or regions in frames that are relevant for further analysis. Common features include edges, corners, or keypoints detected using algorithms like Canny edge detection or Harris corner detection.

8. **Motion Estimation:** Analyzing the movement of objects between frames to track their trajectories. Techniques include optical flow estimation and block matching.

9. **Object Detection and Tracking:** Identifying and following objects of interest across frames. This involves detecting objects in individual frames using techniques like Haar cascades or deep learning-based methods (e.g., YOLO, SSD) and tracking their movements over time.

10. **Background Subtraction:** Separating foreground objects from the background to focus on relevant information. This is commonly used in applications like surveillance and video segmentation.

11. **Video Stabilization:** Removing unwanted camera motion or jitter from the video to produce smoother footage. Techniques include optical flow-based stabilization and gyroscopic data-based stabilization.

12. **Temporal Filtering:** Applying filters to video sequences over time to remove noise or enhance certain characteristics. Examples include temporal averaging and temporal median filtering.
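Temporal averaging and temporal median filtering both reduce each pixel over a short window of neighboring frames. The sketch below assumes the frames are already decoded into one array of shape (T, H, W) or (T, H, W, C). `temporal_filter` and its `radius` parameter are hypothetical names used only for illustration.

```python
import numpy as np

def temporal_filter(frames, radius=1, mode="median"):
    """Filter each pixel over a sliding window of 2 * radius + 1 frames.

    frames: array-like of shape (T, H, W) or (T, H, W, C).
    mode:   "mean" for temporal averaging, "median" for temporal median filtering.
    """
    stack = np.asarray(frames, dtype=np.float32)
    reduce = {"mean": np.mean, "median": np.median}[mode]
    out = np.empty_like(stack)
    for t in range(len(stack)):
        # The window is truncated at the start and end of the sequence
        window = stack[max(0, t - radius): t + radius + 1]
        out[t] = reduce(window, axis=0)
    return out
```

On a static scene, a one-frame impulse (for example sensor noise) is removed entirely by the median but only diluted by the mean. This is why the median is usually preferred for salt-and-pepper style noise, while the mean is cheaper and suits Gaussian-like noise.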

Equations are integral to many aspects of video processing, especially when it comes to algorithms like optical flow estimation or motion tracking. Here are some common equations used in these contexts:

1. **Optical Flow Equation:**
   \[
   I_x \cdot u + I_y \cdot v + I_t = 0
   \]
   where \( I_x \) and \( I_y \) are the spatial gradients of intensity in the x and y directions respectively, \( I_t \) is the temporal gradient of intensity, and \( u \) and \( v \) represent the horizontal and vertical components of optical flow.

2. **Motion Model:**
   \[
   p_{k+1} = p_k + \Delta t \cdot v_k + \frac{1}{2} \Delta t^2 \cdot a_k
   \]
   where \( p_k \) is the position at time \( k \), \( v_k \) is the velocity at time \( k \), \( a_k \) is the acceleration at time \( k \), and \( \Delta t \) is the time step.

3. **Kalman Filter Equations:**
   Prediction step:
   \[
   \hat{x}_{k|k-1} = F_k \hat{x}_{k-1|k-1} + B_k u_k
   \]
   \[
   P_{k|k-1} = F_k P_{k-1|k-1} F_k^T + Q_k
   \]
   Update step:
   \[
   K_k = P_{k|k-1} H_k^T (H_k P_{k|k-1} H_k^T + R_k)^{-1}
   \]
   \[
   \hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k(z_k - H_k \hat{x}_{k|k-1})
   \]
   \[
   P_{k|k} = (I - K_k H_k) P_{k|k-1}
   \]
   where \( \hat{x}_{k|k} \) is the updated state estimate, \( P_{k|k} \) is the updated error covariance, \( F_k \) is the state transition matrix, \( B_k \) is the control-input matrix, \( u_k \) is the control vector, \( Q_k \) is the process noise covariance, \( H_k \) is the observation matrix, \( R_k \) is the observation noise covariance, \( K_k \) is the Kalman gain, and \( z_k \) is the measurement vector.
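The three equations above can be exercised numerically with NumPy. This is an illustrative sketch, not a production tracker, and every function name in it is hypothetical:

- `lucas_kanade_window` solves the optical flow equation in the least-squares sense over one window. This is the Lucas-Kanade idea: one equation per pixel and two unknowns. The window therefore needs gradients in more than one direction; otherwise the aperture problem makes the system singular.
- `motion_model_matrices` writes the motion model as \( x_{k+1} = F x_k + B a_k \) with state \( [p_k, v_k] \). It adds the standard companion velocity update \( v_{k+1} = v_k + \Delta t \, a_k \).
- `kalman_step` follows the prediction and update equations term by term.

The noise values \( Q \) and \( R \) in `track_positions` are assumptions.

```python
import numpy as np

def lucas_kanade_window(frame0, frame1):
    """Least-squares (u, v) for one window from I_x * u + I_y * v + I_t = 0."""
    f0 = np.asarray(frame0, dtype=np.float64)
    f1 = np.asarray(frame1, dtype=np.float64)
    # Spatial gradients of the average frame; np.gradient returns the derivative
    # along axis 0 (rows, i.e. y) first, then along axis 1 (columns, i.e. x)
    I_y, I_x = np.gradient(0.5 * (f0 + f1), edge_order=2)
    I_t = f1 - f0
    # Stack one optical flow equation per pixel: A @ [u, v] = -I_t
    A = np.stack([I_x.ravel(), I_y.ravel()], axis=1)
    (u, v), *_ = np.linalg.lstsq(A, -I_t.ravel(), rcond=None)
    return u, v

def motion_model_matrices(dt):
    """Constant-acceleration model as x_{k+1} = F x_k + B a_k, x = [position, velocity]."""
    F = np.array([[1.0, dt], [0.0, 1.0]])
    B = np.array([[0.5 * dt ** 2], [dt]])
    return F, B

def kalman_step(x, P, z, F, H, Q, R, B=None, u=None):
    """One predict + update cycle of the Kalman filter."""
    # Prediction step
    x_pred = F @ x
    if B is not None and u is not None:
        x_pred = x_pred + B @ u
    P_pred = F @ P @ F.T + Q
    # Update step
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

def track_positions(measurements, dt=1.0):
    """Track a 1-D position with a constant-velocity model (no control input)."""
    F, _ = motion_model_matrices(dt)
    H = np.array([[1.0, 0.0]])      # only the position is observed
    Q = 1e-4 * np.eye(2)            # assumed process noise covariance
    R = np.array([[1.0]])           # assumed measurement noise covariance
    x = np.zeros(2)                 # initial state guess [position, velocity]
    P = np.diag([100.0, 100.0])     # large initial uncertainty (weak prior)
    for z in measurements:
        x, P = kalman_step(x, P, np.array([z]), F, H, Q, R)
    return x
```

For a point moving 2 pixels per frame, `track_positions([2.0 * k for k in range(1, 51)])` should end with a position near 100 and a velocity near 2. The velocity is estimated even though only positions are measured.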

Understanding these equations and concepts lays the foundation for more advanced video processing techniques and applications in computer vision.

```python
import os
import cv2

def extract_frames(video_path, output_dir, frame_rate=1):
    # Open the video file
    video_capture = cv2.VideoCapture(video_path)
    if not video_capture.isOpened():
        print("Error: Couldn't open input video.")
        return

    # cv2.imwrite fails silently if the output directory does not exist
    os.makedirs(output_dir, exist_ok=True)

    # Get the frame rate of the video
    fps = video_capture.get(cv2.CAP_PROP_FPS)

    # Keep one frame every `frame_interval` frames to get roughly `frame_rate` frames per second;
    # max(1, ...) avoids a modulo by zero when frame_rate >= fps or fps is unknown (0)
    frame_interval = max(1, int(round(fps / frame_rate)))

    # Initialize frame count
    frame_count = 0

    while True:
        # Read a frame from the video; stop when no more frames are available
        ret, frame = video_capture.read()
        if not ret:
            break
        frame_count += 1

        # Save the frame if it falls on the extraction interval
        if frame_count % frame_interval == 0:
            output_path = os.path.join(output_dir, f"frame_{frame_count}.jpg")
            cv2.imwrite(output_path, frame)

    # Release the video capture object
    video_capture.release()

# Example usage
video_path = "My.mp4"
output_dir = "output_frames"
extract_frames(video_path, output_dir, frame_rate=1)
```


```python
import cv2
import numpy as np

def interpolate_frames(video_path, output_path, new_frame_rate=60):
    # Open the video file
    video_capture = cv2.VideoCapture(video_path)
    if not video_capture.isOpened():
        print("Error: Couldn't open input video.")
        return

    # Get the frame rate and frame size of the input video
    fps = video_capture.get(cv2.CAP_PROP_FPS)
    width = int(video_capture.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(video_capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
    if fps <= 0:
        print("Error: Input frame rate is unknown.")
        video_capture.release()
        return

    # Output frames per input frame (e.g. 30 fps -> 60 fps gives 2)
    factor = max(1, int(round(new_frame_rate / fps)))

    # Pixel coordinate grids: cv2.remap needs absolute coordinates, not displacements
    grid_x, grid_y = np.meshgrid(np.arange(width, dtype=np.float32),
                                 np.arange(height, dtype=np.float32))

    # Write at fps * factor so the output keeps the original duration
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter(output_path, fourcc, fps * factor, (width, height))

    prev_frame, prev_gray = None, None
    while True:
        # Read a frame from the video
        ret, frame = video_capture.read()
        if not ret:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        if prev_frame is not None:
            # Dense optical flow from the previous frame to the current one.
            # Farneback ships with core OpenCV; DualTVL1 (cv2.optflow) needs opencv-contrib.
            flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            out.write(prev_frame)

            # Synthesize factor - 1 in-between frames at t = k / factor by sampling the
            # previous frame at x - t * flow(x); this approximation breaks down at
            # occlusions and large motions
            for k in range(1, factor):
                t = k / factor
                map_x = grid_x - t * flow[..., 0]
                map_y = grid_y - t * flow[..., 1]
                out.write(cv2.remap(prev_frame, map_x, map_y, cv2.INTER_LINEAR))

        # Update previous frame and grayscale image
        prev_frame, prev_gray = frame, gray

    # The last original frame has no successor to interpolate towards
    if prev_frame is not None:
        out.write(prev_frame)

    # Release video objects
    video_capture.release()
    out.release()

# Example usage
input_video_path = "My.mp4"
output_video_path = "output_video_interpolated.mp4"
interpolate_frames(input_video_path, output_video_path, new_frame_rate=60)
```


```python
import cv2
import numpy as np

def align_frames(frame1, frame2):
    # Convert frames to grayscale
    gray1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
    gray2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)

    # Detect ORB features in both frames
    orb = cv2.ORB_create()
    keypoints1, descriptors1 = orb.detectAndCompute(gray1, None)
    keypoints2, descriptors2 = orb.detectAndCompute(gray2, None)
    if descriptors1 is None or descriptors2 is None:
        raise ValueError("No features found in one of the frames.")

    # Match binary ORB descriptors using Hamming distance
    bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(bf.match(descriptors1, descriptors2), key=lambda m: m.distance)

    # A homography needs at least 4 point correspondences
    if len(matches) < 4:
        raise ValueError("Not enough matches to estimate a homography.")

    # Extract matched keypoints
    src_pts = np.float32([keypoints1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst_pts = np.float32([keypoints2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # Compute homography robustly with RANSAC (5-pixel reprojection threshold)
    homography, _ = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
    if homography is None:
        raise ValueError("Homography estimation failed.")

    # Warp frame 1 onto frame 2 using the computed homography
    aligned_frame = cv2.warpPerspective(frame1, homography, (frame2.shape[1], frame2.shape[0]))

    return aligned_frame

# Example usage: aligning an image with itself gives a near-identity warp;
# in practice, pass two different frames of the same scene
frame1 = cv2.imread('My.jpg')
frame2 = cv2.imread('My.jpg')
if frame1 is None or frame2 is None:
    raise FileNotFoundError("Could not read 'My.jpg'.")

aligned_frame = align_frames(frame1, frame2)

# Display the aligned frame
cv2.imshow('Aligned Frame', aligned_frame)
cv2.waitKey(0)
cv2.destroyAllWindows()
```


```python
import os
import cv2

def compress_video(input_video, output_video, codec='XVID', fps=None):
    # Open input video
    input_cap = cv2.VideoCapture(input_video)
    if not input_cap.isOpened():
        print("Error: Couldn't open input video.")
        return

    # Get input video properties; keep the source frame rate unless one is given
    width = int(input_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(input_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    if fps is None:
        fps = input_cap.get(cv2.CAP_PROP_FPS)

    # Define codec and create VideoWriter object. The codec should suit the container
    # (e.g. 'XVID' -> .avi, 'mp4v' -> .mp4). cv2.VideoWriter has no quality setting that
    # works across all backends, so the compression level is left to the codec defaults.
    fourcc = cv2.VideoWriter_fourcc(*codec)
    output_writer = cv2.VideoWriter(output_video, fourcc, fps, (width, height))
    if not output_writer.isOpened():
        print("Error: Couldn't open output video for writing.")
        input_cap.release()
        return

    # Decode each frame and re-encode (compress) it with the chosen codec
    while True:
        ret, frame = input_cap.read()
        if not ret:
            break
        output_writer.write(frame)

    # Release resources
    input_cap.release()
    output_writer.release()

def decompress_video(input_video, output_dir):
    # Open input video
    input_cap = cv2.VideoCapture(input_video)
    if not input_cap.isOpened():
        print("Error: Couldn't open input video.")
        return

    os.makedirs(output_dir, exist_ok=True)

    # read() decodes (decompresses) each frame; saving frames as PNG keeps the
    # decoded pixels losslessly instead of re-encoding them with a video codec
    frame_index = 0
    while True:
        ret, frame = input_cap.read()
        if not ret:
            break
        cv2.imwrite(os.path.join(output_dir, f"frame_{frame_index:05d}.png"), frame)
        frame_index += 1

    # Release resources
    input_cap.release()

# Example usage
input_video = 'My.mp4'
compressed_video = 'compressed_video.avi'   # XVID pairs with the AVI container
decompressed_dir = 'decompressed_frames'

# Compress video
compress_video(input_video, compressed_video)

# Decompress video into individual frames
decompress_video(compressed_video, decompressed_dir)
```


# List of Topics in Video Processing Basics

Video processing basics in computer vision cover a wide range of topics. Here's a list of some fundamental topics within video processing:

1. **Frame Capture**: 
   - Capturing individual frames from a video stream.

2. **Frame Display**: 
   - Displaying individual frames or sequences of frames.

3. **Frame Manipulation**: 
   - Basic operations on frames like resizing, cropping, rotating, and flipping.

4. **Color Spaces**: 
   - Understanding and working with different color spaces like RGB, HSV, YUV, etc., for better color manipulation.

5. **Frame Subtraction**: 
   - Background subtraction to detect moving objects.

6. **Temporal Filtering**: 
   - Applying filters across frames to reduce noise or enhance features temporally.

7. **Optical Flow**: 
   - Estimating the motion of objects between frames.

8. **Object Tracking**: 
   - Tracking the movement of objects across frames.

9. **Video Compression**: 
   - Techniques to reduce the size of video files for storage or transmission.

10. **Video Decompression**: 
    - Techniques to decode compressed video data back into individual frames.

11. **Motion Detection**: 
    - Detecting and analyzing motion in video sequences.

12. **Video Stabilization**: 
    - Reducing shakiness or jitter in videos caused by camera motion.

13. **Video Enhancement**: 
    - Enhancing video quality through techniques like denoising, deblurring, and contrast adjustment.

14. **Video Segmentation**: 
    - Partitioning a video into segments based on object boundaries or motion.

15. **Frame Interpolation**: 
    - Generating intermediate frames between existing frames to smooth motion or increase the frame rate.

16. **Video Summarization**: 
    - Generating concise representations of videos by selecting key frames or segments.

17. **Video Annotation**: 
    - Adding metadata or labels to video frames for analysis or visualization.

These topics provide a foundation for understanding and working with video data in computer vision applications. Each topic may have various techniques, algorithms, and tools associated with it, forming the building blocks for more advanced video processing tasks.

# Frame Capture:

Capturing individual frames from a video stream.

Frame capture refers to the process of extracting individual frames from a video stream. This process is commonly used in various applications such as video editing, computer vision, and video compression. 

Here's a breakdown of the general steps involved in frame capture:

1. **Frame Extraction**: Each frame of a video is a still image that represents a specific moment in time. To capture frames, you need to extract these images from the video stream. This can be done using software libraries or frameworks that provide functions for video processing.

2. **Frame Rate**: The frame rate of a video determines how many frames are displayed per second. Common frame rates include 24 frames per second (fps) for film, 30 fps for television, and 60 fps for many digital videos.

3. **Frame Resolution**: The resolution of a frame refers to the dimensions of the image, typically measured in pixels. Common resolutions include 1920x1080 (Full HD) and 3840x2160 (4K UHD). The resolution determines the level of detail in each frame.

4. **Sampling Rate**: When capturing frames, you may need to specify a sampling rate, which determines how often frames are extracted from the video stream. For example, you might capture every frame (1:1 sampling), or you might capture every nth frame to reduce the number of frames processed.

Equations:

1. **Total Frames**: The total number of frames in a video can be calculated by multiplying the frame rate (FPS) by the duration of the video (in seconds). 
   \[ \text{Total Frames} = \text{Frame Rate} \times \text{Duration (seconds)} \]

2. **Frame Time**: The time corresponding to a particular frame can be calculated using the frame index (starting from 0) and the frame rate.
   \[ \text{Frame Time (seconds)} = \frac{\text{Frame Index}}{\text{Frame Rate}} \]

3. **Frame Index from Time**: If you have a specific time in seconds and want to find the corresponding frame index, multiply by the frame rate and round down, because a frame stays on screen until the next one begins:
   \[ \text{Frame Index} = \lfloor \text{Frame Rate} \times \text{Time (seconds)} \rfloor \]
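The three equations translate directly into small helper functions. This is a minimal sketch that assumes a constant frame rate; variable-frame-rate files do not follow these formulas. The function names are hypothetical. With OpenCV, a computed index could be passed to `cap.set(cv2.CAP_PROP_POS_FRAMES, index)` to seek, although how precisely seeking lands depends on the codec and container.

```python
import math

def total_frames(fps, duration_s):
    """Total Frames = Frame Rate x Duration, rounded to a whole frame count."""
    return int(round(fps * duration_s))

def frame_time(frame_index, fps):
    """Timestamp in seconds of a 0-based frame index."""
    return frame_index / fps

def frame_index_at(time_s, fps):
    """0-based index of the frame shown at time_s.

    The small epsilon guards against a product that should be a whole number
    landing just below it because of floating-point rounding.
    """
    return math.floor(time_s * fps + 1e-9)
```

For a 10-second clip at 30 fps, `total_frames(30, 10)` gives 300, frame 75 starts at 2.5 s, and any time between 2.5 s and just under 2.5333 s maps back to frame 75.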

Frame capture is a fundamental operation in video processing and analysis, and understanding these concepts allows for efficient manipulation and analysis of video data.

```python
import os
import cv2

def capture_frames(video_path, output_path, sampling_rate=1):
    # Open the video file
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print("Error: Could not open video file.")
        return

    # Make sure the output directory exists (cv2.imwrite does not create it)
    output_dir = os.path.dirname(output_path)
    if output_dir:
        os.makedirs(output_dir, exist_ok=True)

    frame_count = 0

    # Read and save frames
    while True:
        ret, frame = cap.read()

        # Check if frame is read correctly
        if not ret:
            break

        # Apply sampling rate: keep every `sampling_rate`-th frame
        if frame_count % sampling_rate == 0:
            # Save the frame
            frame_output_path = output_path.format(frame_count)
            cv2.imwrite(frame_output_path, frame)

        frame_count += 1

    # Release the video capture object
    cap.release()

    print("Frames captured successfully.")

# Example usage:
video_path = 'My.mp4'
output_path = 'frames/frame_{}.jpg'  # Output path format, {} will be replaced by frame index
sampling_rate = 10  # Capture every 10th frame

capture_frames(video_path, output_path, sampling_rate)
```




# Frame Display:

Displaying individual frames or sequences of frames.

Displaying individual frames or sequences of frames involves presenting a series of images in a sequential order to convey motion, simulate movement, or illustrate a process. This concept is fundamental in various fields such as animation, film, video games, and scientific visualization. Here's an overview of the key aspects involved:

1. **Frame**: A frame is a single image in a sequence of images. In animation and video, each frame represents a specific moment in time. Frames are typically displayed at a constant rate, measured in frames per second (fps), to create the illusion of continuous motion when viewed in succession.

2. **Frame Rate (fps)**: The frame rate determines how many frames are displayed per second. Common frame rates include 24 fps (standard for film), 30 fps (common for television and online video), and 60 fps (common for video games and high-definition video). Higher frame rates result in smoother motion but require more computational resources.

3. **Resolution**: The resolution of each frame refers to the number of pixels it contains, typically measured in width x height (e.g., 1920x1080 for Full HD). Higher resolutions result in sharper and more detailed images but require more storage space and computational power.

4. **Compression**: To reduce file size and transmission bandwidth, frames are often compressed using various algorithms such as JPEG, MPEG, or H.264. Compression techniques exploit redundancies in the image data to represent it more efficiently.

5. **Display Devices**: Frames are ultimately displayed on various devices such as monitors, projectors, or screens. Different devices have different display capabilities, including resolution, color depth, refresh rate, and aspect ratio, which may affect how frames are rendered.

6. **Interpolation**: In computer graphics and animation, interpolation techniques are often used to generate intermediate frames between keyframes. This process, known as tweening, helps create smooth motion transitions by automatically generating frames based on the positions and attributes of keyframes.

7. **Equations**: Several mathematical equations and algorithms are involved in frame display and animation, including:

   - **Motion Blur**: Simulates the blurring effect that occurs when objects move quickly across the screen. Motion blur equations typically involve the velocity of the moving object and the exposure time of each frame.
   
   - **Bezier Curves**: Used in animation to define smooth paths for objects to follow between keyframes. Bezier curves are described by polynomial equations that control the shape of the curve.
   
   - **Physics Simulations**: In simulations of physical phenomena such as fluid dynamics or rigid body dynamics, equations of motion (e.g., Newton's laws) are used to calculate the position and velocity of objects over time, which are then rendered as frames.
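The cubic Bézier curve used for keyframe paths is

\[ B(t) = (1-t)^3 P_0 + 3(1-t)^2 t P_1 + 3(1-t) t^2 P_2 + t^3 P_3, \quad t \in [0, 1] \]

It can be evaluated directly to generate the in-between positions of tweening. This is a minimal NumPy sketch: `cubic_bezier` is a hypothetical helper, and spacing t linearly over the frames is an assumption, since animation tools usually add easing.

```python
import numpy as np

def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter(s) t in [0, 1].

    Points are 2-D (or n-D) coordinates; t may be a scalar or a 1-D array,
    giving a result of shape (n,) or (len(t), n).
    """
    t = np.asarray(t, dtype=float)[..., None]
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    return ((1 - t) ** 3 * p0
            + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2
            + t ** 3 * p3)

# Example: one second of motion at 24 fps between two keyframes at (0, 0) and (1, 0)
positions = cubic_bezier((0, 0), (0, 1), (1, 1), (1, 0), np.linspace(0, 1, 24))
```

The curve starts at \( P_0 \) and ends at \( P_3 \). The control points \( P_1 \) and \( P_2 \) shape the path without lying on it.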

Frame display involves a combination of artistic creativity, technical knowledge, and computational algorithms to create compelling visual experiences across various media platforms.

```python
import cv2

def display_frames(video_path):
    # Open video file
    cap = cv2.VideoCapture(video_path)
    
    if not cap.isOpened():
        print("Error: Couldn't open video file")
        return
    
    while True:
        # Read a frame from the video
        ret, frame = cap.read()
        
        # If there are no more frames to read, break the loop
        if not ret:
            break
        
        # Display the frame
        cv2.imshow('Frame', frame)
        
        # Wait for a key press and exit if 'q' is pressed
        if cv2.waitKey(25) & 0xFF == ord('q'):
            break
    
    # Release the video capture object and close all windows
    cap.release()
    cv2.destroyAllWindows()

# Path to the video file
video_path = 'My.mp4'

# Call the function to display frames
display_frames(video_path)
```




# Frame Manipulation:

Basic operations on frames like resizing, cropping, rotating, and flipping.

Frame manipulation involves basic operations on frames or images, such as resizing, cropping, rotating, and flipping. These operations are fundamental in image processing and computer vision tasks, allowing us to preprocess images before further analysis or presentation. Here's a breakdown of each operation along with relevant information and equations:

### 1. Resizing:

Resizing an image involves changing its dimensions while preserving its aspect ratio or stretching it to fit new dimensions. This operation is useful for adjusting the size of images for display, processing, or storage.

**Equations:**
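- **Nearest Neighbor Interpolation:** The simplest method: each output pixel copies the nearest input pixel. With scale factors \( s_x = W_{out}/W_{in} \) and \( s_y = H_{out}/H_{in} \), \( I'(x', y') = I(\lfloor x'/s_x \rfloor, \lfloor y'/s_y \rfloor) \).
- **Bilinear Interpolation:** A smoother method, where the output pixel value is a weighted average of the four nearest pixels in the input image. For a source position \( (x, y) \) with \( x_0 = \lfloor x \rfloor \), \( y_0 = \lfloor y \rfloor \), \( a = x - x_0 \), and \( b = y - y_0 \):
  \[ I(x, y) = (1-a)(1-b)\,I(x_0, y_0) + a(1-b)\,I(x_0+1, y_0) + (1-a)\,b\,I(x_0, y_0+1) + ab\,I(x_0+1, y_0+1) \]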
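Both interpolation rules fit in a short NumPy sketch for a 2-D grayscale image. It is written for clarity rather than speed. `bilinear_sample` and `resize` are hypothetical helpers. The half-pixel-center mapping used for the bilinear case is one common convention and is an assumption here; libraries differ in how they treat image edges.

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample a 2-D grayscale image at fractional coordinates (x, y)."""
    h, w = img.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    # Clamp the right/bottom neighbours so samples on the last row/column stay in bounds
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    a, b = x - x0, y - y0
    return ((1 - a) * (1 - b) * img[y0, x0] + a * (1 - b) * img[y0, x1]
            + (1 - a) * b * img[y1, x0] + a * b * img[y1, x1])

def resize(img, new_w, new_h, method="bilinear"):
    """Resize a 2-D image with nearest-neighbour or bilinear interpolation."""
    h, w = img.shape
    out = np.empty((new_h, new_w), dtype=np.float64)
    for j in range(new_h):
        for i in range(new_w):
            if method == "nearest":
                # I'(x', y') = I(floor(x' / s_x), floor(y' / s_y))
                out[j, i] = img[min(j * h // new_h, h - 1), min(i * w // new_w, w - 1)]
            else:
                # Map the output pixel centre back into the input and clamp to the image
                x = min(max((i + 0.5) * w / new_w - 0.5, 0.0), w - 1)
                y = min(max((j + 0.5) * h / new_h - 0.5, 0.0), h - 1)
                out[j, i] = bilinear_sample(img, x, y)
    return out
```

For example, sampling `[[0, 10], [20, 30]]` at (0.5, 0.5) averages all four pixels to 15. Upscaling with `method="nearest"` instead produces blocky copies of the original values.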

### 2. Cropping:

Cropping involves selecting a region of interest (ROI) from an image and discarding the rest. This operation is commonly used to remove unwanted parts of an image or focus on specific features.

**Equations:** No specific equations for cropping, but the process involves selecting a rectangular region defined by its coordinates (top-left and bottom-right corners) and extracting pixels within that region.

### 3. Rotating:

Rotating an image involves changing its orientation by a specified angle. This operation is useful for correcting image alignment or augmenting training data with rotated views.

**Equations:**
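- **Rotation Matrix:** Rotating a point about the origin by an angle θ (counter-clockwise in a coordinate system whose y-axis points up) uses the rotation matrix:

\[ R(\theta) = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \]

To rotate about the image center \( c = (c_x, c_y) \) instead, translate the center to the origin, rotate, and translate back: \( p' = R(\theta)(p - c) + c \). Image coordinates have the y-axis pointing down, so a rotation that appears counter-clockwise on screen corresponds to \( R(-\theta) \) in pixel coordinates. This is why the sine terms in the matrix returned by OpenCV's `cv2.getRotationMatrix2D` have the opposite signs.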
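The rotate-about-center construction can be written out as a 2x3 affine matrix. This NumPy sketch mirrors the matrix documented for `cv2.getRotationMatrix2D`: with \( \alpha = s\cos\theta \) and \( \beta = s\sin\theta \), the rows are \( [\alpha, \beta, (1-\alpha)c_x - \beta c_y] \) and \( [-\beta, \alpha, \beta c_x + (1-\alpha)c_y] \). `rotation_about_center` and `apply_affine` are hypothetical helpers.

```python
import numpy as np

def rotation_about_center(center, angle_deg, scale=1.0):
    """2x3 affine matrix rotating by angle_deg (counter-clockwise on screen) about center."""
    cx, cy = center
    a = scale * np.cos(np.deg2rad(angle_deg))
    b = scale * np.sin(np.deg2rad(angle_deg))
    return np.array([[a, b, (1 - a) * cx - b * cy],
                     [-b, a, b * cx + (1 - a) * cy]])

def apply_affine(M, pts):
    """Apply a 2x3 affine matrix to an (N, 2) array of (x, y) pixel coordinates."""
    pts = np.asarray(pts, dtype=float)
    return pts @ M[:, :2].T + M[:, 2]
```

The center stays fixed, and with scale 1 distances from the center are preserved. If OpenCV is installed, `np.allclose(rotation_about_center(c, angle), cv2.getRotationMatrix2D(c, angle, 1.0))` should hold. `cv2.warpAffine` then applies the same matrix to every pixel.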

### 4. Flipping:

Flipping an image reverses the order of its pixels horizontally (mirror left-right), vertically (upside down), or both. This operation is commonly used for data augmentation or mirroring images.

**Equations:** For an image of width \( W \) and height \( H \), a horizontal flip is \( I'(x, y) = I(W - 1 - x, y) \), a vertical flip is \( I'(x, y) = I(x, H - 1 - y) \), and flipping both axes applies both mappings.
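In NumPy these mappings are plain slicing. The sketch below also spells out the `flipCode` convention of `cv2.flip`, which is easy to get wrong: 0 flips around the x-axis (vertical flip), a positive value flips around the y-axis (horizontal flip), and a negative value flips both. `flip_like_cv2` is a hypothetical helper.

```python
import numpy as np

def flip_like_cv2(image, flip_code):
    """NumPy equivalent of cv2.flip's flipCode convention."""
    if flip_code == 0:
        return image[::-1, ...]        # around the x-axis: top <-> bottom
    if flip_code > 0:
        return image[:, ::-1, ...]     # around the y-axis: left <-> right
    return image[::-1, ::-1, ...]      # both axes
```

The slices return views, so no pixel data is copied until the result is modified or written out.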

### Implementation in Python (using OpenCV):

Here's an example Python code snippet demonstrating these operations using OpenCV:

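```python
import cv2

# Read an image
image = cv2.imread('image.jpg')

# Example parameter values (adjust to your image)
new_width, new_height = 640, 360    # target size for resizing
x1, y1, x2, y2 = 50, 50, 250, 200   # crop rectangle: top-left (x1, y1), bottom-right (x2, y2)
angle, scale = 30, 1.0              # rotation in degrees (counter-clockwise) and zoom factor
flip_code = 1                       # 0: vertical, > 0: horizontal, < 0: both

# Resize image
resized_image = cv2.resize(image, (new_width, new_height))

# Crop image (NumPy indexes rows, i.e. y, first)
cropped_image = image[y1:y2, x1:x2]

# Rotate image
(h, w) = image.shape[:2]
center = (w // 2, h // 2)
M = cv2.getRotationMatrix2D(center, angle, scale)
rotated_image = cv2.warpAffine(image, M, (w, h))

# Flip image
flipped_image = cv2.flip(image, flip_code)
```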

Make sure to replace `'image.jpg'` with the path to your input image. Adjust parameters like `new_width`, `new_height`, `x1`, `y1`, `x2`, `y2`, `angle`, `scale`, and `flip_code` according to your requirements. This code demonstrates the basic operations of resizing, cropping, rotating, and flipping frames/images using OpenCV in Python.

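```python
import cv2

# Read an image
image = cv2.imread('My.jpg')
if image is None:
    raise FileNotFoundError("Could not read 'My.jpg'.")

# Example parameters (a 5x7 resize or a 1x1 crop would show almost nothing)
new_width = 640
new_height = 480
y1, y2 = 50, 250   # crop rows
x1, x2 = 50, 350   # crop columns
angle = 20         # degrees, counter-clockwise
scale = 1.0        # 1.0 keeps the original size; 4 would zoom in 4x
flip_code = 1      # cv2.flip: 0 = vertical, > 0 = horizontal, < 0 = both

# Resize image
resized_image = cv2.resize(image, (new_width, new_height))

# Crop image
cropped_image = image[y1:y2, x1:x2]

# Rotate image
(h, w) = image.shape[:2]
center = (w // 2, h // 2)
M = cv2.getRotationMatrix2D(center, angle, scale)
rotated_image = cv2.warpAffine(image, M, (w, h))

# Flip image
flipped_image = cv2.flip(image, flip_code)
```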

```python
import cv2

def resize_frame(frame, width=None, height=None):
    # Resize while preserving the aspect ratio; give either width or height
    if width is None and height is None:
        return frame
    if width is None:
        r = height / frame.shape[0]
        dim = (int(frame.shape[1] * r), height)
    else:
        r = width / frame.shape[1]
        dim = (width, int(frame.shape[0] * r))
    resized = cv2.resize(frame, dim, interpolation=cv2.INTER_AREA)
    return resized

def crop_frame(frame, x, y, w, h):
    # Rows (y) come first in NumPy indexing
    return frame[y:y+h, x:x+w]

def rotate_frame(frame, angle):
    # Rotate about the center; corners outside the original canvas are clipped
    (h, w) = frame.shape[:2]
    center = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D(center, angle, 1.0)
    rotated = cv2.warpAffine(frame, M, (w, h))
    return rotated

def flip_frame(frame, direction):
    if direction == 'horizontal':
        return cv2.flip(frame, 1)
    elif direction == 'vertical':
        return cv2.flip(frame, 0)
    elif direction == 'both':
        return cv2.flip(frame, -1)
    else:
        return frame

# Example usage
if __name__ == "__main__":
    # Read a sample image
    image_path = 'My.jpg'
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"Could not read '{image_path}'.")

    # Resize
    resized_image = resize_frame(image, width=400)

    # Crop
    cropped_image = crop_frame(resized_image, x=100, y=50, w=200, h=300)

    # Rotate
    rotated_image = rotate_frame(cropped_image, angle=45)

    # Flip
    flipped_image = flip_frame(rotated_image, direction='horizontal')

    # Display the original and manipulated frames
    cv2.imshow('Original Image', image)
    cv2.imshow('Resized Image', resized_image)
    cv2.imshow('Cropped Image', cropped_image)
    cv2.imshow('Rotated Image', rotated_image)
    cv2.imshow('Flipped Image', flipped_image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
```


# Color Spaces:

Understanding and working with different color spaces like RGB, HSV, YUV, etc., for better color manipulation.

Color spaces are essential in video processing for representing and manipulating colors. Different color spaces have unique advantages and are suited for various tasks such as image analysis, segmentation, and manipulation. Here's an overview of some commonly used color spaces in video processing:

### 1. RGB (Red, Green, Blue):
RGB is the most common color space, where each pixel is represented by three color channels: red, green, and blue. It is widely used in digital imaging systems, displays, and cameras. RGB is additive, meaning colors are formed by combining different intensities of red, green, and blue light. Note that OpenCV loads and stores color images in BGR channel order rather than RGB.

### 2. HSV (Hue, Saturation, Value):
HSV color space represents colors based on three components: hue, saturation, and value. Hue represents the color type (e.g., red, green, blue), saturation controls the intensity or purity of the color, and value represents the brightness or intensity of the color. HSV is often used for color segmentation and detection tasks due to its intuitive representation of color.
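OpenCV stores 8-bit HSV with the hue halved so that it fits in a byte: H lies in [0, 180), while S and V lie in [0, 255]. Threshold values for color segmentation must use that scale. This sketch uses Python's standard `colorsys` module to map an 8-bit RGB color onto OpenCV's ranges. `rgb_to_opencv_hsv` is a hypothetical helper, and its rounding may differ from `cv2.cvtColor` by one unit for some colors.

```python
import colorsys

def rgb_to_opencv_hsv(r, g, b):
    """8-bit RGB -> HSV in OpenCV's 8-bit ranges (H in [0, 180), S and V in [0, 255])."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # colorsys returns h, s, v in [0, 1]; hue in degrees is h * 360, halved for OpenCV
    return round(h * 180) % 180, round(s * 255), round(v * 255)
```

Pure green maps to hue 60 and pure blue to 120. Pure red sits at 0, and reds wrap around to values just below 180, so a red mask built with `cv2.inRange` typically combines two hue ranges.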

### 3. YUV (Luma, Chrominance):
YUV color space separates the luminance (brightness) information (Y) from the chrominance (color) information (U and V). The Y component represents the grayscale image, while U and V represent the chrominance components that encode color information. YUV, in its digital form Y′CbCr, is commonly used in video compression and transmission systems such as MPEG and JPEG. Separating brightness from color improves compression because the eye is less sensitive to fine color detail, so the chrominance channels can be subsampled with little visible loss.
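The luminance/chrominance split can be written out with the BT.601 coefficients: \( Y = 0.299R + 0.587G + 0.114B \), \( U = 0.492(B - Y) \), \( V = 0.877(R - Y) \). This NumPy sketch works on floating-point values in [0, 1] and takes channels in OpenCV's BGR order. `bgr_to_yuv_bt601` is a hypothetical helper. 8-bit implementations, OpenCV's among them, additionally shift U and V to fit unsigned bytes, so expect offset values there.

```python
import numpy as np

def bgr_to_yuv_bt601(bgr):
    """Float BGR in [0, 1] (OpenCV channel order) -> analog BT.601 YUV."""
    bgr = np.asarray(bgr, dtype=np.float64)
    b, g, r = bgr[..., 0], bgr[..., 1], bgr[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luma: weighted toward green
    u = 0.492 * (b - y)                     # blue-difference chroma
    v = 0.877 * (r - y)                     # red-difference chroma
    return np.stack([y, u, v], axis=-1)
```

Any gray pixel (equal B, G, R) has zero chroma, which is why the Y channel alone is a usable grayscale image.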

### 4. LAB (Lightness, A, B):
LAB color space consists of three components: lightness (L), and two color-opponent channels (A and B). The A channel represents the green to red color spectrum, while the B channel represents the blue to yellow color spectrum. LAB is designed to be approximately perceptually uniform (equal distances correspond roughly to equal perceived color differences), making it suitable for color correction and image editing tasks.

### 5. CMYK (Cyan, Magenta, Yellow, Black):
CMYK color space is primarily used in color printing and represents colors using four components: cyan, magenta, yellow, and black (key). CMYK is subtractive, meaning colors are formed by subtracting different color components from white light. It is commonly used in printing processes to reproduce a wide range of colors.

### 6. XYZ (CIE 1931 Color Space):
XYZ color space is based on the human visual system's response to light and is defined by the International Commission on Illumination (CIE). It provides a device-independent representation of color and is used as a reference color space for color science and color matching applications.

### Conclusion:
Understanding and working with different color spaces is essential in video processing for various color manipulation tasks. Each color space has its advantages and applications, and choosing the right color space depends on the specific requirements of the task at hand. By leveraging the properties of different color spaces, video processing applications can achieve better color manipulation and analysis results.

```python
import cv2

# Read the original image
original_image = cv2.imread('My.jpg')
if original_image is None:
    raise FileNotFoundError("Could not read 'My.jpg'.")

# Convert the image (stored by OpenCV in BGR order) to different color spaces
hsv_image = cv2.cvtColor(original_image, cv2.COLOR_BGR2HSV)
yuv_image = cv2.cvtColor(original_image, cv2.COLOR_BGR2YUV)

# Display the original image and converted images. imshow treats every
# 3-channel image as BGR, so the HSV and YUV windows show false colors
cv2.imshow('Original Image', original_image)
cv2.imshow('HSV Image', hsv_image)
cv2.imshow('YUV Image', yuv_image)
cv2.waitKey(0)
cv2.destroyAllWindows()
```


```python
import cv2

# Open video file
video_capture = cv2.VideoCapture('My.mp4')
if not video_capture.isOpened():
    raise IOError("Couldn't open 'My.mp4'.")

while True:
    # Read a frame from the video
    ret, frame = video_capture.read()
    
    if not ret:
        break  # Break the loop if there are no more frames
    
    # Convert the frame to different color spaces
    hsv_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    yuv_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2YUV)
    
    # Display the frames (OpenCV frames are BGR, not RGB)
    cv2.imshow('BGR', frame)
    cv2.imshow('HSV', hsv_frame)
    cv2.imshow('YUV', yuv_frame)
    
    # Check for key press to exit
    if cv2.waitKey(25) & 0xFF == ord('q'):
        break

# Release the video capture object and close all windows
video_capture.release()
cv2.destroyAllWindows()
```


# Frame Subtraction:

Background subtraction to detect moving objects.

### Frame Subtraction: Background Subtraction for Detecting Moving Objects

Frame subtraction, also known as background subtraction, is a fundamental technique in computer vision and video processing for detecting moving objects in video sequences. It involves comparing each frame of a video to a background model to identify regions that differ significantly, indicating movement. Here's a detailed explanation of the process, including the necessary equations and methods.

#### Basic Concept

The basic idea of frame subtraction is to create a background model \( B \) and compare each new frame \( F_t \) at time \( t \) against it. The absolute difference image \( D_t \) highlights regions that have changed, such as moving objects:

\[ D_t(x, y) = |F_t(x, y) - B(x, y)| \]

where:
- \( F_t(x, y) \) is the pixel value at position \((x, y)\) in the current frame at time \( t \).
- \( B(x, y) \) is the pixel value at the same position in the background model.
- \( D_t(x, y) \) is the resulting difference value; large values indicate motion.

#### Thresholding

To determine whether a pixel has changed significantly, a threshold \( T \) is applied to the difference:

\[ M_t(x, y) = \begin{cases} 
1 & \text{if } |F_t(x, y) - B(x, y)| > T \\
0 & \text{otherwise}
\end{cases} \]

This thresholding step converts the difference image into a binary mask where moving objects are highlighted.

#### Background Model Update

The background model \( B \) needs to be updated to adapt to changes in the scene over time. One common method is the running average:

\[ B_t(x, y) = \alpha F_t(x, y) + (1 - \alpha) B_{t-1}(x, y) \]

where:
- \( \alpha \) is the learning rate, a small positive constant (e.g., 0.01).

#### Advanced Techniques

##### 1. Mixture of Gaussians (MOG)

A more advanced background subtraction technique uses a mixture of Gaussians to model the background. Each pixel is modeled as a mixture of \( K \) Gaussian distributions.

\[ P(F_t(x, y)) = \sum_{i=1}^{K} w_{i,t} \eta(F_t(x, y); \mu_{i,t}, \Sigma_{i,t}) \]

where:
- \( w_{i,t} \) is the weight of the \( i \)-th Gaussian at time \( t \).
- \( \eta \) is the Gaussian probability density function.
- \( \mu_{i,t} \) and \( \Sigma_{i,t} \) are the mean and covariance of the \( i \)-th Gaussian, respectively.

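The parameters of the Gaussians are updated iteratively. The components with high weight and low variance are taken to represent the background, and a pixel is classified as foreground if it does not match (typically, lie within about 2.5 standard deviations of) any of these background components.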
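The mixture density and the matching test can be sketched for a single grayscale pixel, where \( \Sigma \) reduces to a scalar variance. This follows the Stauffer-Grimson style of ranking components by \( w/\sigma \) and treating the top-ranked ones as background; the helper names, the 0.7 background-weight threshold and the 2.5σ match rule are assumptions for illustration, and the parameter update step is omitted. For real video, OpenCV provides a complete implementation as `cv2.createBackgroundSubtractorMOG2`.

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """Scalar Gaussian density eta(x; mu, var)."""
    return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def mixture_likelihood(value, weights, means, variances):
    """P(F_t) = sum_i w_i * eta(F_t; mu_i, sigma_i^2) for one grayscale pixel."""
    return float(np.sum(weights * gaussian_pdf(value, means, variances)))

def is_foreground(value, weights, means, variances, bg_weight=0.7, k=2.5):
    """Foreground if the value matches none of the background components."""
    # Rank components by w / sigma: frequent, stable components come first
    order = np.argsort(-(weights / np.sqrt(variances)))
    # Background = smallest set of top-ranked components whose weights reach bg_weight
    n_bg = int(np.searchsorted(np.cumsum(weights[order]), bg_weight)) + 1
    bg = order[:n_bg]
    # A match means lying within k standard deviations of a component mean
    matches = np.abs(value - means[bg]) <= k * np.sqrt(variances[bg])
    return not np.any(matches)

# Hypothetical pixel model: mostly ~100, sometimes ~180 (e.g. a swaying branch)
weights = np.array([0.6, 0.3, 0.1])
means = np.array([100.0, 180.0, 30.0])
variances = np.array([25.0, 36.0, 100.0])
```

Here the first two components cover 0.9 of the weight and form the background, so a value of 102 is background, while 30 (which only matches the rare third component) and 250 are foreground.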

##### 2. Kernel Density Estimation (KDE)

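Kernel Density Estimation is a non-parametric way to estimate the probability density function of the background. For each pixel, the density of its current value is estimated from the values that pixel took in a window of previous frames, by placing a kernel (commonly a Gaussian) on each past sample. The pixel is classified as foreground if this density falls below a threshold.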
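A minimal NumPy sketch of per-pixel KDE with a Gaussian kernel, assuming grayscale frames and a fixed bandwidth; the bandwidth of 10 and the density threshold of `1e-3` are illustrative assumptions, not tuned values.

```python
import numpy as np

def kde_background_density(history, frame, bandwidth=10.0):
    """Per-pixel density of the current values under N past frames.

    history: array of shape (N, H, W) with each pixel's recent values.
    frame:   array of shape (H, W) with the current values.
    """
    history = np.asarray(history, dtype=np.float64)
    diff = np.asarray(frame, dtype=np.float64)[None, :, :] - history
    kernels = np.exp(-diff ** 2 / (2.0 * bandwidth ** 2)) / np.sqrt(2.0 * np.pi * bandwidth ** 2)
    return kernels.mean(axis=0)  # average of the N kernels, shape (H, W)

def kde_foreground_mask(history, frame, bandwidth=10.0, threshold=1e-3):
    """1 where the current value is unlikely under the background density."""
    density = kde_background_density(history, frame, bandwidth)
    return (density < threshold).astype(np.uint8)

# Ten past frames whose pixels hover around 100 (values 95.5 ... 104.5)
history = np.full((10, 4, 4), 100.0) + (np.arange(10) - 4.5)[:, None, None]
frame = np.full((4, 4), 100.0)
frame[1, 1] = 200.0  # one pixel changes sharply
mask = kde_foreground_mask(history, frame)
```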

##### 3. Codebook Model

In the codebook approach, a pixel's history is represented by a set of codewords (a codebook). Each codeword represents a different appearance of the pixel, and a pixel is classified as foreground if its value does not match any codeword.
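The codebook idea can be illustrated for a single grayscale pixel. This is a deliberately simplified, hypothetical version: each codeword only stores the brightness range `[low, high]` it has seen, and a value matches if it lies within a tolerance of that range. The full method additionally models color distortion, brightness bounds and how recently each codeword was used, and keeps one codebook per pixel.

```python
class PixelCodebook:
    """Simplified grayscale codebook for one pixel (illustrative only)."""

    def __init__(self, tolerance=10):
        self.tolerance = tolerance
        self.codewords = []  # each codeword is a [low, high] brightness range

    def _find(self, value):
        """Return the first codeword whose widened range contains value, else None."""
        for cw in self.codewords:
            if cw[0] - self.tolerance <= value <= cw[1] + self.tolerance:
                return cw
        return None

    def train(self, value):
        """Extend a matching codeword, or start a new one for an unseen appearance."""
        cw = self._find(value)
        if cw is None:
            self.codewords.append([value, value])
        else:
            cw[0] = min(cw[0], value)
            cw[1] = max(cw[1], value)

    def is_foreground(self, value):
        return self._find(value) is None

# A pixel that alternates between two background appearances (e.g. a blinking light)
cb = PixelCodebook(tolerance=10)
for v in [50, 52, 48, 200, 198, 51, 202]:
    cb.train(v)
```

After training, the pixel has two codewords (around 50 and around 200); both appearances are background and a value such as 120 is foreground, which is how the codebook handles multimodal backgrounds.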

### Implementation Steps

1. **Initialization**: Initialize the background model \( B \). This can be done by averaging the first few frames.

2. **Foreground Detection**: For each new frame \( F_t \):
   - Compute the difference \( D_t = |F_t - B| \).
   - Apply thresholding to get the binary mask \( M_t \).
   - Optionally, apply morphological operations (e.g., dilation, erosion) to clean up the mask.

3. **Background Model Update**: Update the background model using a chosen method (e.g., running average, MOG).

### Applications

- **Surveillance**: Detecting intruders in a monitored area.
- **Traffic Monitoring**: Identifying and tracking vehicles.
- **Human-Computer Interaction**: Gesture recognition and motion capture.
- **Environmental Monitoring**: Observing changes in natural environments.

### Summary

Frame subtraction is a powerful tool for detecting moving objects in video sequences. By subtracting a background model from each frame and applying thresholding, moving objects can be highlighted effectively. Advanced techniques like Mixture of Gaussians and Kernel Density Estimation provide more robust background modeling for complex scenes. Implementing these methods involves initializing the background model, detecting the foreground in each frame, and updating the background model over time.

The appropriate background subtraction method depends on the specific application and the characteristics of the video data.

In [1]:
import cv2
import numpy as np

def main():
    # Open a video file or capture from webcam
    cap = cv2.VideoCapture('output.mp4')  # Use 0 for webcam

    # Check if video opened successfully
    if not cap.isOpened():
        print("Error: Could not open video.")
        return

    # Read the first frame to initialize the background model
    ret, frame = cap.read()
    if not ret:
        print("Error: Could not read the first frame.")
        return

    # Convert the first frame to grayscale
    background = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Use GaussianBlur to remove noise and improve background subtraction
    background = cv2.GaussianBlur(background, (21, 21), 0)

    while True:
        # Capture frame-by-frame
        ret, frame = cap.read()
        if not ret:
            break

        # Convert the frame to grayscale
        gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Blur the frame to reduce noise
        gray_frame = cv2.GaussianBlur(gray_frame, (21, 21), 0)

        # Compute the absolute difference between the current frame and the background model
        diff_frame = cv2.absdiff(background, gray_frame)

        # Apply a binary threshold to the difference image
        _, thresh_frame = cv2.threshold(diff_frame, 25, 255, cv2.THRESH_BINARY)

        # Dilate the thresholded image to fill in holes, making the object detection more robust
        thresh_frame = cv2.dilate(thresh_frame, None, iterations=2)

        # Find contours of the detected objects
        contours, _ = cv2.findContours(thresh_frame.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

        # Draw bounding boxes around the detected objects
        for contour in contours:
            if cv2.contourArea(contour) < 500:
                continue  # Ignore small contours to reduce noise
            (x, y, w, h) = cv2.boundingRect(contour)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

        # Display the resulting frame
        cv2.imshow('Frame', frame)
        cv2.imshow('Thresh', thresh_frame)
        cv2.imshow('Difference', diff_frame)

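        # Update the background model with a running average,
        # B_t = alpha * F_t + (1 - alpha) * B_{t-1}, here with alpha = 0.1;
        # a small alpha adapts slowly, so moving objects are not absorbed at once
        background = cv2.addWeighted(background, 0.9, gray_frame, 0.1, 0)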

        # Break the loop when 'q' key is pressed
        if cv2.waitKey(30) & 0xFF == ord('q'):
            break

    # Release the capture and close any OpenCV windows
    cap.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    main()


# Temporal Filtering:

Applying filters across frames to reduce noise or enhance features temporally.

Temporal filtering in video processing involves applying filters across multiple frames to reduce noise, enhance features, or achieve other effects over time. This approach leverages the temporal redundancy in video sequences to improve the quality and robustness of the resulting output.

## Key Concepts and Techniques

### 1. **Temporal Averaging**
Temporal averaging smooths the video by averaging pixel values over several frames. This helps to reduce noise but may introduce motion blur.

**Equation:**
\[ I'_t(x, y) = \frac{1}{2k+1} \sum_{i=-k}^{k} I_{t+i}(x, y) \]
where \( I_t(x, y) \) is the pixel value at position \((x, y)\) in frame \(t\), \( I'_t(x, y) \) is the filtered value, and \(2k+1\) is the number of frames averaged.

### 2. **Exponential Moving Average (EMA)**
EMA gives more weight to recent frames, making it more responsive to changes while still reducing noise.

**Equation:**
\[ I'_t(x, y) = \alpha I_t(x, y) + (1 - \alpha) I'_{t-1}(x, y) \]
where \( I'_t(x, y) \) is the filtered pixel value, and \(\alpha\) is the smoothing factor (0 < \(\alpha\) < 1).

### 3. **Kalman Filtering**
The Kalman filter is a recursive estimator that tracks the state of a dynamic system from noisy measurements; for linear systems with Gaussian noise it is the optimal (minimum mean-square error) estimator.

**State Equation:**
\[ \mathbf{x}_t = \mathbf{F} \mathbf{x}_{t-1} + \mathbf{B} \mathbf{u}_t + \mathbf{w}_t \]
**Measurement Equation:**
\[ \mathbf{z}_t = \mathbf{H} \mathbf{x}_t + \mathbf{v}_t \]
where:
- \(\mathbf{x}_t\) is the state vector at time \(t\).
- \(\mathbf{F}\) is the state transition model.
- \(\mathbf{u}_t\) is the control input vector.
- \(\mathbf{B}\) is the control-input model.
- \(\mathbf{w}_t\) is the process noise.
- \(\mathbf{z}_t\) is the measurement vector.
- \(\mathbf{H}\) is the observation model.
- \(\mathbf{v}_t\) is the measurement noise.
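The equations above define the model; the filter itself alternates a predict step and an update step. Below is a minimal NumPy sketch of those two steps for a 1-D constant-velocity model (state = position and velocity; there is no control input, so the \(\mathbf{B}\mathbf{u}_t\) term is dropped). `Q` and `R` are the covariances of \(\mathbf{w}_t\) and \(\mathbf{v}_t\), and `P` is the covariance of the state estimate; the noise levels and the simulated measurements are illustrative assumptions.

```python
import numpy as np

def kalman_predict(x, P, F, Q):
    """Predict: x = F x, P = F P F^T + Q."""
    x = F @ x
    P = F @ P @ F.T + Q
    return x, P

def kalman_update(x, P, z, H, R):
    """Update with measurement z using the Kalman gain K = P H^T (H P H^T + R)^-1."""
    y = z - H @ x                    # innovation (measurement residual)
    S = H @ P @ H.T + R              # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)   # Kalman gain
    x = x + K @ y
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])  # constant-velocity state transition
H = np.array([[1.0, 0.0]])             # only the position is measured
Q = np.eye(2) * 1e-3                   # process noise covariance
R = np.array([[4.0]])                  # measurement noise covariance (std 2)

rng = np.random.default_rng(42)
x = np.array([0.0, 0.0])               # initial guess: position 0, velocity 0
P = np.eye(2) * 100.0                  # large initial uncertainty
for t in range(50):
    z = np.array([3.0 * t + rng.normal(0.0, 2.0)])  # object moving 3 units per frame
    x, P = kalman_predict(x, P, F, Q)
    x, P = kalman_update(x, P, z, H, R)
```

After 50 noisy position measurements, the filter recovers a velocity close to 3 even though the velocity is never measured directly.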

### 4. **Gaussian Temporal Filtering**
This applies a Gaussian filter over the temporal dimension, effectively performing weighted averaging with a Gaussian kernel.

**Equation:**
\[ I'_t(x, y) = \sum_{i=-k}^{k} g(i) \, I_{t+i}(x, y) \]
where \( g(i) \) is the Gaussian weight for frame \( t+i \), normalized over the truncated window so that the weights sum to 1:
\[ g(i) = \frac{e^{-\frac{i^2}{2 \sigma^2}}}{\sum_{j=-k}^{k} e^{-\frac{j^2}{2 \sigma^2}}} \]
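A minimal NumPy sketch of the normalized Gaussian temporal filter, assuming the frames are stacked along the first axis and that the window \(t-k \dots t+k\) lies fully inside the sequence (handling the sequence ends is left out).

```python
import numpy as np

def gaussian_temporal_filter(frames, t, k=2, sigma=1.0):
    """Filtered frame I'_t = sum_i g(i) * I_{t+i} for i = -k..k.

    frames: array-like of shape (T, H, W) or (T, H, W, C); requires k <= t < T - k.
    """
    frames = np.asarray(frames, dtype=np.float64)
    if t - k < 0 or t + k >= len(frames):
        raise ValueError("window t-k..t+k must lie inside the sequence")
    offsets = np.arange(-k, k + 1)
    g = np.exp(-offsets ** 2 / (2.0 * sigma ** 2))
    g /= g.sum()  # normalize over the truncated window so the weights sum to 1
    window = frames[t - k:t + k + 1]
    return np.tensordot(g, window, axes=1)  # weighted sum over the time axis

# Brightness ramps linearly over time: every pixel of frame i equals i
frames = np.arange(10, dtype=np.float64)[:, None, None] * np.ones((10, 3, 3))
smoothed = gaussian_temporal_filter(frames, t=5, k=2, sigma=1.0)
```

Because the weights are symmetric and sum to 1, a linear brightness ramp passes through unchanged (frame 5 stays at 5), while frame-to-frame noise is attenuated.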

### 5. **Bilateral Filtering**
Bilateral filtering preserves edges while smoothing the image. It can be extended to the temporal domain by considering both spatial and temporal dimensions.

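**Equation:**
\[ I'_t(x, y) = \frac{1}{W} \sum_{i=-k}^{k} \sum_{(x', y') \in \Omega} I_{t+i}(x', y') \cdot f_d(x', y', x, y) \cdot f_r(I_{t+i}(x', y'), I_t(x, y)) \]
where:
- \( \Omega \) is the spatial neighborhood of \((x, y)\).
- \( f_d \) is the spatial distance weight (e.g., a Gaussian of the distance between \((x', y')\) and \((x, y)\)).
- \( f_r \) is the range weight, which decreases as the intensity difference grows, so pixels on the other side of an edge contribute little.
- \( W \) is the normalization factor: the sum of all the weights \( f_d \cdot f_r \) in the window.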
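A direct (unoptimized) NumPy sketch of the spatio-temporal bilateral filter above, assuming grayscale frames and Gaussian choices for both \( f_d \) and \( f_r \); the parameter names and default values are illustrative. Image borders are handled by edge padding and the temporal window is clipped at the sequence ends.

```python
import numpy as np

def temporal_bilateral(frames, t, k=1, radius=1, sigma_d=1.0, sigma_r=20.0):
    """Filter frame t of a (T, H, W) grayscale stack over frames t-k..t+k."""
    frames = np.asarray(frames, dtype=np.float64)
    T, H, W = frames.shape
    center = frames[t]
    num = np.zeros((H, W))
    den = np.zeros((H, W))  # accumulates W, the sum of the weights
    for i in range(max(0, t - k), min(T, t + k + 1)):
        padded = np.pad(frames[i], radius, mode='edge')
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                neighbor = padded[radius + dy:radius + dy + H, radius + dx:radius + dx + W]
                f_d = np.exp(-(dx ** 2 + dy ** 2) / (2.0 * sigma_d ** 2))          # spatial weight
                f_r = np.exp(-((neighbor - center) ** 2) / (2.0 * sigma_r ** 2))  # range weight
                w = f_d * f_r
                num += w * neighbor
                den += w
    return num / den

# A static vertical step edge: 0 on the left, 100 on the right
frames = np.zeros((3, 6, 6))
frames[:, :, 3:] = 100.0
out = temporal_bilateral(frames, t=1, k=1, radius=1, sigma_d=1.0, sigma_r=5.0)
```

With a small `sigma_r`, pixels across the edge receive essentially zero range weight, so the edge survives filtering unchanged; a plain temporal or spatial average would smear it.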

### 6. **Optical Flow-Based Filtering**
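This method uses optical flow to align (motion-compensate) neighboring frames before filtering them, so that averaging does not blur moving objects.

**Motion Compensation:**
\[ I_t(x + u(x, y), y + v(x, y)) \approx I_{t-1}(x, y) \]
where \( u(x, y) \) and \( v(x, y) \) are the horizontal and vertical components of the optical flow vector from frame \(t-1\) to frame \(t\) at \((x, y)\). Sampling \(I_t\) at \((x + u, y + v)\) therefore produces a version of frame \(t\) aligned with frame \(t-1\).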

## Applications
1. **Noise Reduction**: Temporal filters are widely used to reduce sensor noise in videos.
2. **Motion Deblurring**: Temporal filtering helps in reducing motion blur by leveraging multiple frames.
3. **Object Tracking**: Kalman filters and optical flow techniques are integral to tracking objects over time.
4. **Video Stabilization**: Smoothing the estimated camera motion over time removes jitter from shaky footage.
5. **Feature Enhancement**: Enhancing specific features over time, such as improving the visibility of moving objects.

## Implementation in Python (using OpenCV and NumPy)

### Temporal Averaging
```python
import cv2
import numpy as np

def temporal_averaging(frames, k):
    num_frames = len(frames)
    avg_frames = []
    for i in range(k, num_frames - k):
        avg_frame = np.mean(frames[i-k:i+k+1], axis=0)
        avg_frames.append(avg_frame.astype(np.uint8))
    return avg_frames

# Example usage
cap = cv2.VideoCapture('video.mp4')
frames = []

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    frames.append(frame)

cap.release()
k = 2
avg_frames = temporal_averaging(frames, k)

for i, frame in enumerate(avg_frames):
    cv2.imshow('Averaged Frame', frame)
    if cv2.waitKey(30) & 0xFF == ord('q'):
        break

cv2.destroyAllWindows()
```

### Exponential Moving Average
```python
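import cv2
import numpy as np

def exponential_moving_average(frames, alpha):
    # Start the running estimate from the first frame (float to avoid overflow)
    ema_frames = []
    ema_frame = frames[0].astype(np.float32)
    
    for frame in frames:
        # I'_t = alpha * I_t + (1 - alpha) * I'_{t-1}
        ema_frame = alpha * frame + (1 - alpha) * ema_frame
        ema_frames.append(ema_frame.astype(np.uint8))
        
    return ema_frames

# Example usage (reuses `frames` read in the temporal averaging example)
alpha = 0.5
ema_frames = exponential_moving_average(frames, alpha)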

for i, frame in enumerate(ema_frames):
    cv2.imshow('EMA Frame', frame)
    if cv2.waitKey(30) & 0xFF == ord('q'):
        break

cv2.destroyAllWindows()
```

### Kalman Filtering (OpenCV Kalman Filter example for object tracking)
```python
import cv2
import numpy as np

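cap = cv2.VideoCapture('video.mp4')

# Constant-velocity Kalman filter: state (x, y, vx, vy), measurement (x, y)
kalman = cv2.KalmanFilter(4, 2)
kalman.measurementMatrix = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], np.float32)
kalman.transitionMatrix = np.array([[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]], np.float32)
kalman.processNoiseCov = np.eye(4, dtype=np.float32) * 0.03
kalman.measurementNoiseCov = np.eye(2, dtype=np.float32) * 9.0
# Start from an assumed initial position with a large initial uncertainty
kalman.statePost = np.array([[50], [50], [0], [0]], np.float32)
kalman.errorCovPost = np.eye(4, dtype=np.float32) * 100.0

frame_idx = 0
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Predict the state for this frame from the motion model
    kalman.predict()

    # Simulated noisy measurement of an object moving diagonally (hypothetical;
    # replace with the position reported by a detector in a real application)
    true_x, true_y = 50 + 2 * frame_idx, 50 + frame_idx
    noise = np.random.randn(2) * 3.0
    measurement = np.array([[true_x + noise[0]], [true_y + noise[1]]], np.float32)
    frame_idx += 1

    # Correct the prediction with the measurement
    estimated = kalman.correct(measurement)

    # Extract the filtered position
    x, y = int(estimated[0, 0]), int(estimated[1, 0])

    # Draw the filtered estimate (green) and the raw measurement (red)
    cv2.circle(frame, (x, y), 5, (0, 255, 0), -1)
    cv2.circle(frame, (int(measurement[0, 0]), int(measurement[1, 0])), 3, (0, 0, 255), -1)

    cv2.imshow('Kalman Filter', frame)
    if cv2.waitKey(30) & 0xFF == ord('q'):
        break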

cap.release()
cv2.destroyAllWindows()
```

### Optical Flow-Based Filtering
```python
import cv2
import numpy as np

cap = cv2.VideoCapture('video.mp4')
ret, prev_frame = cap.read()
if not ret:
    raise SystemExit("Error: Could not read the first frame.")
prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
frames = []

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Dense flow from the previous frame to the current one:
    # prev(y, x) ~ curr(y + flow[y, x, 1], x + flow[y, x, 0])
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)

    # Warp the current frame onto the previous frame's pixel grid.
    # flow[..., 0] is the x (column) displacement, flow[..., 1] the y (row) one.
    h, w = gray.shape
    grid_x, grid_y = np.meshgrid(np.arange(w, dtype=np.float32), np.arange(h, dtype=np.float32))
    map_x = grid_x + flow[..., 0]
    map_y = grid_y + flow[..., 1]
    warped_frame = cv2.remap(frame, map_x, map_y, cv2.INTER_LINEAR, borderMode=cv2.BORDER_REPLICATE)

    # Temporal filter on aligned frames: average the previous frame with the
    # motion-compensated current frame
    filtered = cv2.addWeighted(prev_frame, 0.5, warped_frame, 0.5, 0)
    frames.append(filtered)

    prev_frame, prev_gray = frame, gray

cap.release()

for frame in frames:
    cv2.imshow('Optical Flow Filtered Frame', frame)
    if cv2.waitKey(30) & 0xFF == ord('q'):
        break

cv2.destroyAllWindows()
```

These implementations cover the basics of temporal filtering in video processing. Depending on the specific application and the type of noise or artifacts present in the video, different filtering techniques and parameters may be more suitable.


# Optical Flow:

Estimating the motion of objects between frames.

Optical flow is a crucial concept in computer vision and image processing, referring to the pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer and the scene. Estimating optical flow involves determining the motion of each pixel in an image sequence, which has various applications including video compression, object tracking, and autonomous navigation.

### Key Concepts and Equations in Optical Flow

1. **Optical Flow Constraint Equation**:
   The basic principle of optical flow is the brightness constancy assumption, which states that the intensity of a point in the image remains constant over time despite its movement. Mathematically, for a pixel at location \((x, y)\) in an image \(I\) at time \(t\), the intensity \(I(x, y, t)\) is assumed to be constant as it moves to a new position \((x + \Delta x, y + \Delta y)\) at time \(t + \Delta t\):

   \[
   I(x, y, t) = I(x + \Delta x, y + \Delta y, t + \Delta t)
   \]

   Taking the first-order Taylor expansion of \(I(x + \Delta x, y + \Delta y, t + \Delta t)\) and assuming \(\Delta x\), \(\Delta y\), and \(\Delta t\) are small, we get:

   \[
   I(x + \Delta x, y + \Delta y, t + \Delta t) \approx I(x, y, t) + \frac{\partial I}{\partial x} \Delta x + \frac{\partial I}{\partial y} \Delta y + \frac{\partial I}{\partial t} \Delta t
   \]

   Since \(I(x, y, t) = I(x + \Delta x, y + \Delta y, t + \Delta t)\), we obtain:

   \[
   \frac{\partial I}{\partial x} \Delta x + \frac{\partial I}{\partial y} \Delta y + \frac{\partial I}{\partial t} \Delta t = 0
   \]

   Dividing by \(\Delta t\), we get the optical flow constraint equation:

   \[
   I_x u + I_y v + I_t = 0
   \]

   where \(I_x = \frac{\partial I}{\partial x}\), \(I_y = \frac{\partial I}{\partial y}\), \(I_t = \frac{\partial I}{\partial t}\), \(u = \frac{\Delta x}{\Delta t}\), and \(v = \frac{\Delta y}{\Delta t}\) are the horizontal and vertical components of the optical flow velocity, respectively.

2. **Aperture Problem**:
   The optical flow constraint equation provides one equation with two unknowns (u and v), leading to an under-determined system known as the aperture problem. To resolve this, additional constraints or assumptions are needed.

3. **Methods for Optical Flow Estimation**:
   - **Lucas-Kanade Method**:
     This method assumes that the flow is essentially constant in a small neighborhood around each pixel. By combining the optical flow constraint equations for all the pixels in the neighborhood, a system of linear equations is formed, which can be solved using least squares:

     \[
     A^T A \mathbf{v} = A^T \mathbf{b}
     \]

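     where each row of \(A\) holds the spatial gradients \((I_x, I_y)\) at one pixel of the neighborhood, \(\mathbf{b}\) holds the corresponding negative temporal gradients \(-I_t\), and \(\mathbf{v} = (u, v)^T\) is the flow vector. The system can be solved reliably only when \(A^T A\) is well conditioned, i.e., when the neighborhood contains gradients in more than one direction (for example, at a corner).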

   - **Horn-Schunck Method**:
     This method introduces a global smoothness constraint, which assumes that the flow field varies smoothly over the entire image. The method minimizes an energy function that includes both the optical flow constraint and a smoothness term:

     \[
     E = \int \int \left( (I_x u + I_y v + I_t)^2 + \alpha^2 \left( \left| \nabla u \right|^2 + \left| \nabla v \right|^2 \right) \right) dx dy
     \]

     Here, \(\alpha\) is a regularization parameter balancing the data and smoothness terms. The corresponding Euler-Lagrange equations lead to iterative solutions for \(u\) and \(v\).

4. **Pyramidal Approach**:
   To handle large motions and reduce computational complexity, a pyramidal approach can be used. This involves creating a pyramid of images at multiple scales (resolutions) and computing optical flow from the coarsest to the finest level, refining the flow estimates progressively.
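A minimal NumPy sketch of the Lucas-Kanade least-squares step for a single window, assuming two grayscale images with small (sub-pixel) motion and no pyramid; gradients are approximated with central differences via `np.gradient`, and the function name is illustrative. The synthetic test pattern is chosen so that the window has gradients in both directions, avoiding the aperture problem.

```python
import numpy as np

def lucas_kanade_window(I1, I2, y, x, half=7):
    """Estimate the flow (u, v) at pixel (y, x) from a (2*half+1)^2 window."""
    I1 = np.asarray(I1, dtype=np.float64)
    I2 = np.asarray(I2, dtype=np.float64)
    Iy, Ix = np.gradient(I1)  # np.gradient returns d/drow (y) first, then d/dcol (x)
    It = I2 - I1              # temporal derivative with dt = 1 frame
    win = (slice(y - half, y + half + 1), slice(x - half, x + half + 1))
    A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)  # one row (I_x, I_y) per pixel
    b = -It[win].ravel()
    # Solve the normal equations A^T A [u, v]^T = A^T b
    u, v = np.linalg.solve(A.T @ A, A.T @ b)
    return u, v

# Synthetic pattern shifted by (u, v) = (0.4, -0.3) pixels
ys, xs = np.mgrid[0:64, 0:64].astype(np.float64)
u_true, v_true = 0.4, -0.3
I1 = 50 * np.sin(0.2 * xs) + 50 * np.cos(0.15 * ys)
I2 = 50 * np.sin(0.2 * (xs - u_true)) + 50 * np.cos(0.15 * (ys - v_true))
u, v = lucas_kanade_window(I1, I2, 32, 32)
```

Because the method linearizes the brightness constancy equation, it is accurate only for small displacements; larger motions are handled by the pyramidal approach, which is what `cv2.calcOpticalFlowPyrLK` implements.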

### Applications of Optical Flow

- **Video Compression**: Predicting motion between frames can significantly reduce the amount of data required to encode video sequences.
- **Object Tracking**: Tracking the movement of objects across frames is a direct application of optical flow.
- **Motion Detection**: Optical flow can be used to detect and segment moving objects from the background.
- **Autonomous Navigation**: Optical flow provides crucial information about the environment for navigation and obstacle avoidance in robotics and autonomous vehicles.
- **Medical Imaging**: Analyzing the movement of organs and tissues in medical scans.

### Conclusion

Optical flow is a fundamental tool in computer vision for estimating the motion of objects between frames. While the basic optical flow constraint equation lays the foundation, practical implementations often require additional constraints and methods to handle the under-determined nature of the problem and to achieve robust performance in real-world scenarios. Techniques such as the Lucas-Kanade and Horn-Schunck methods, along with multi-scale approaches, are commonly used to estimate optical flow effectively.

In [1]:
import cv2
import numpy as np

# Capture video from webcam or file
cap = cv2.VideoCapture('video.mp4')  # Change 'video.mp4' to 0 for webcam

# Parameters for Shi-Tomasi corner detection (the points to track)
feature_params = dict(maxCorners=100, qualityLevel=0.3, minDistance=7, blockSize=7)

# Parameters for Lucas-Kanade optical flow
lk_params = dict(winSize=(15, 15), maxLevel=2,
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))

# Take the first frame and detect the points to track.
# calcOpticalFlowPyrLK needs explicit float32 input points; passing None
# for them raises an assertion error.
ret, old_frame = cap.read()
old_gray = cv2.cvtColor(old_frame, cv2.COLOR_BGR2GRAY)
p0 = cv2.goodFeaturesToTrack(old_gray, mask=None, **feature_params)

# Create a mask image for drawing purposes
mask = np.zeros_like(old_frame)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Calculate optical flow for the tracked points
    p1, st, err = cv2.calcOpticalFlowPyrLK(old_gray, frame_gray, p0, None, **lk_params)

    # Select good points: new positions from p1, previous positions from p0
    good_new = p1[st == 1]
    good_old = p0[st == 1]

    # Draw the tracks
    for new, old in zip(good_new, good_old):
        a, b = map(int, new.ravel())
        c, d = map(int, old.ravel())
        mask = cv2.line(mask, (a, b), (c, d), (0, 255, 0), 2)
        frame = cv2.circle(frame, (a, b), 5, (0, 0, 255), -1)

    # Overlay the optical flow on the original frame
    img = cv2.add(frame, mask)

    # Display the resulting frame
    cv2.imshow('Frame', img)

    # Update the previous frame and previous points
    old_gray = frame_gray.copy()
    p0 = good_new.reshape(-1, 1, 2)

    # Exit the loop if 'q' is pressed
    if cv2.waitKey(30) & 0xFF == ord('q'):
        break

# Release the capture and close all windows
cap.release()
cv2.destroyAllWindows()




In [5]:
import numpy as np
import cv2

def block_matching_motion_estimation(ref_frame, curr_frame, block_size=16, search_range=7):
    """Exhaustive block matching: for each block of curr_frame, find the
    displacement (dy, dx) into ref_frame with the smallest SAD."""
    height, width = ref_frame.shape
    n_rows, n_cols = height // block_size, width // block_size
    motion_vectors = np.zeros((n_rows, n_cols, 2), dtype=int)

    # Work in a signed type so that differences of uint8 pixels do not wrap around
    ref = ref_frame.astype(np.int32)
    curr = curr_frame.astype(np.int32)

    # Only full blocks are processed (a partial block at the border is skipped)
    for by in range(n_rows):
        for bx in range(n_cols):
            y, x = by * block_size, bx * block_size
            current_block = curr[y:y + block_size, x:x + block_size]
            best_match = (0, 0)
            min_sad = float('inf')

            for dy in range(-search_range, search_range + 1):
                for dx in range(-search_range, search_range + 1):
                    ref_y, ref_x = y + dy, x + dx

                    # Skip candidate blocks that fall outside the reference frame
                    if ref_y < 0 or ref_y + block_size > height or ref_x < 0 or ref_x + block_size > width:
                        continue

                    reference_block = ref[ref_y:ref_y + block_size, ref_x:ref_x + block_size]

                    # Sum of absolute differences between the two blocks
                    sad = np.sum(np.abs(current_block - reference_block))

                    if sad < min_sad:
                        min_sad = sad
                        best_match = (dy, dx)

            motion_vectors[by, bx] = best_match

    return motion_vectors

def apply_motion_compensation(ref_frame, motion_vectors, block_size=16):
    """Predict the current frame by copying each block from its matched
    position in ref_frame (pixels outside full blocks stay 0)."""
    compensated_frame = np.zeros_like(ref_frame)
    n_rows, n_cols = motion_vectors.shape[:2]

    for by in range(n_rows):
        for bx in range(n_cols):
            y, x = by * block_size, bx * block_size
            dy, dx = motion_vectors[by, bx]
            ref_y, ref_x = y + dy, x + dx
            compensated_frame[y:y + block_size, x:x + block_size] = ref_frame[ref_y:ref_y + block_size, ref_x:ref_x + block_size]

    return compensated_frame

# Load video frames (example with OpenCV)
cap = cv2.VideoCapture('video.mp4')
ret, ref_frame = cap.read()
ref_frame_gray = cv2.cvtColor(ref_frame, cv2.COLOR_BGR2GRAY)

while cap.isOpened():
    ret, curr_frame = cap.read()
    if not ret:
        break

    curr_frame_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)

    motion_vectors = block_matching_motion_estimation(ref_frame_gray, curr_frame_gray)
    compensated_frame = apply_motion_compensation(ref_frame_gray, motion_vectors)

    # Display the original and compensated frames
    cv2.imshow('Original Frame', curr_frame_gray)
    cv2.imshow('Compensated Frame', compensated_frame)

    if cv2.waitKey(30) & 0xFF == ord('q'):
        break

    ref_frame_gray = curr_frame_gray

cap.release()
cv2.destroyAllWindows()


In [8]:
import cv2
import numpy as np

def draw_tracks(frame, tracks):
    for track in tracks:
        for i in range(1, len(track)):
            pt1 = tuple(map(int, track[i - 1]))
            pt2 = tuple(map(int, track[i]))
            cv2.line(frame, pt1, pt2, (0, 255, 0), 2)
    return frame

# Parameters for ShiTomasi corner detection
feature_params = dict(maxCorners=100, qualityLevel=0.3, minDistance=7, blockSize=7)

# Parameters for Lucas-Kanade optical flow
lk_params = dict(winSize=(15, 15), maxLevel=2,
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))

cap = cv2.VideoCapture('video.mp4')
ret, old_frame = cap.read()
old_gray = cv2.cvtColor(old_frame, cv2.COLOR_BGR2GRAY)
p0 = cv2.goodFeaturesToTrack(old_gray, mask=None, **feature_params)

# Create a mask image for drawing purposes
mask = np.zeros_like(old_frame)

# Create an array to store the tracks
tracks = [[] for _ in range(len(p0))]

while True:
    ret, frame = cap.read()
    if not ret:
        break
    frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Calculate optical flow
    p1, st, err = cv2.calcOpticalFlowPyrLK(old_gray, frame_gray, p0, None, **lk_params)

    # Select good points
    good_new = p1[st == 1]
    good_old = p0[st == 1]

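    # Keep only the tracks whose points were found in this frame, so that
    # each track stays aligned with its point in good_new / good_old
    tracks = [track for track, found in zip(tracks, st.ravel()) if found == 1]

    # Update the tracks
    for track, new, old in zip(tracks, good_new, good_old):
        a, b = new.ravel()
        c, d = old.ravel()
        track.append((a, b))
        pt1 = tuple(map(int, (a, b)))
        pt2 = tuple(map(int, (c, d)))
        mask = cv2.line(mask, pt1, pt2, (0, 255, 0), 2)
        frame = cv2.circle(frame, pt1, 5, (0, 255, 0), -1)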

    img = cv2.add(frame, mask)

    # Display the resulting frame
    cv2.imshow('Frame', img)

    k = cv2.waitKey(30) & 0xff
    if k == 27:
        break

    # Update the previous frame and previous points
    old_gray = frame_gray.copy()
    p0 = good_new.reshape(-1, 1, 2)

cap.release()
cv2.destroyAllWindows()


In [12]:
import cv2
import numpy as np

# Capture video from webcam or file
cap = cv2.VideoCapture('video.mp4')  # Change 'video.mp4' to 0 for webcam

# Parameters for ShiTomasi corner detection
feature_params = dict(maxCorners=100, qualityLevel=0.3, minDistance=7, blockSize=7)

# Parameters for Lucas-Kanade optical flow
lk_params = dict(winSize=(15, 15), maxLevel=2,
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))

# Take first frame and find corners
ret, old_frame = cap.read()
old_gray = cv2.cvtColor(old_frame, cv2.COLOR_BGR2GRAY)
p0 = cv2.goodFeaturesToTrack(old_gray, mask=None, **feature_params)

# Create a mask image for drawing purposes
mask = np.zeros_like(old_frame)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Calculate optical flow
    p1, st, err = cv2.calcOpticalFlowPyrLK(old_gray, frame_gray, p0, None, **lk_params)

    # Select good points
    good_new = p1[st == 1]
    good_old = p0[st == 1]

    # Draw the tracks
    for i, (new, old) in enumerate(zip(good_new, good_old)):
        a, b = new.ravel()
        c, d = old.ravel()
        a, b, c, d = int(a), int(b), int(c), int(d)  # Convert points to integers
        mask = cv2.line(mask, (a, b), (c, d), (0, 255, 0), 2)
        frame = cv2.circle(frame, (a, b), 5, (0, 0, 255), -1)

    # Overlay the optical flow on the original frame
    img = cv2.add(frame, mask)

    # Display the resulting frame
    cv2.imshow('Frame', img)

    # Update the previous frame and previous points
    old_gray = frame_gray.copy()
    p0 = good_new.reshape(-1, 1, 2)

    # Exit the loop if 'q' is pressed
    if cv2.waitKey(30) & 0xFF == ord('q'):
        break

# Release the capture and close all windows
cap.release()
cv2.destroyAllWindows()


# Object Tracking:

Tracking the movement of objects across frames.

Object tracking in computer vision involves locating and following a specific object or multiple objects in a video sequence. Here's a detailed overview of object tracking, including techniques, equations, and considerations:

### 1. Object Tracking Techniques:

#### 1.1. **Template Matching**:
   - Compares a template image to regions in a larger image to find matches.
   - Measures similarity using methods such as (normalized) cross-correlation or the sum of squared differences (SSD).

#### 1.2. **Optical Flow**:
   - Estimates the motion of objects by analyzing pixel intensity changes between consecutive frames.
   - Common algorithms include the Lucas-Kanade and Horn-Schunck methods.

#### 1.3. **Kalman Filters**:
   - Predicts the next state of an object based on its previous state and motion model.
   - Incorporates measurements to correct predictions and estimate the most likely state.

#### 1.4. **Particle Filters (Sequential Monte Carlo)**:
   - Represents the posterior probability density of the object's state using a set of particles.
   - Samples particles from the state space and updates their weights based on measurements.

#### 1.5. **Deep Learning-based Tracking**:
   - Utilizes deep neural networks to learn representations for tracking tasks.
   - Techniques include Siamese Networks, Fully Convolutional Networks (FCNs), and Recurrent Neural Networks (RNNs).

### 2. Equations and Concepts:

#### 2.1. **Motion Model**:
   - Represents how an object's state evolves over time.
   - Common motion models include constant velocity (CV) and constant acceleration (CA) models.
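For a single coordinate with time step \(\Delta t\), the two motion models can be written as state transition matrices (process noise omitted; a standard formulation shown for illustration):

\[ \text{CV: } \begin{bmatrix} p_t \\ v_t \end{bmatrix} = \begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix} \begin{bmatrix} p_{t-1} \\ v_{t-1} \end{bmatrix}, \qquad \text{CA: } \begin{bmatrix} p_t \\ v_t \\ a_t \end{bmatrix} = \begin{bmatrix} 1 & \Delta t & \tfrac{1}{2}\Delta t^2 \\ 0 & 1 & \Delta t \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} p_{t-1} \\ v_{t-1} \\ a_{t-1} \end{bmatrix} \]

These matrices play the role of \(\mathbf{F}\) in the Kalman state equation; the 4×4 `transitionMatrix` in the OpenCV Kalman example is the constant-velocity model applied to both \(x\) and \(y\) with \(\Delta t = 1\) frame.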

#### 2.2. **State Estimation**:
   - Estimates the current state of the object by combining a prediction (from the previous state and the motion model) with the current measurement.
   - Typically implemented using Kalman filters or particle filters.

#### 2.3. **Measurement Model**:
   - Describes how measurements are related to the object's state.
   - Maps the object's state (e.g., position and velocity) to the quantities that are actually observed (e.g., the pixel coordinates reported by a detector).

#### 2.4. **Resampling (Particle Filters)**:
   - Updates the set of particles based on their weights to ensure a representative sample.
   - High-weight particles are more likely to survive, while low-weight particles may be replaced.
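A minimal NumPy sketch of one particle filter cycle (predict, weight, resample) for a 2-D position, assuming a random-walk motion model, a Gaussian measurement likelihood and systematic resampling; the function name, noise levels and the simulated target are illustrative assumptions rather than a specific published tracker.

```python
import numpy as np

def particle_filter_step(particles, weights, measurement, rng,
                         motion_std=2.0, meas_std=2.0):
    """One predict / weight / resample cycle for a 2-D position tracker.

    particles: (N, 2) array of candidate (x, y) positions.
    weights:   (N,) array of particle weights summing to 1.
    """
    n = len(particles)
    # Predict: random-walk motion model (an illustrative assumption)
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Weight: multiply by the Gaussian likelihood of the measured position
    sq_dist = np.sum((particles - measurement) ** 2, axis=1)
    weights = weights * np.exp(-sq_dist / (2.0 * meas_std ** 2))
    weights = weights + 1e-300  # guard against all weights underflowing to 0
    weights = weights / weights.sum()
    # Resample (systematic): high-weight particles are copied, low-weight ones dropped
    positions = (rng.random() + np.arange(n)) / n
    indices = np.minimum(np.searchsorted(np.cumsum(weights), positions), n - 1)
    return particles[indices], np.full(n, 1.0 / n)

rng = np.random.default_rng(0)
n = 500
particles = rng.uniform(0.0, 100.0, size=(n, 2))  # no prior knowledge of the position
weights = np.full(n, 1.0 / n)
for t in range(30):
    true_pos = np.array([20.0 + t, 40.0 + 0.5 * t])  # hypothetical moving target
    z = true_pos + rng.normal(0.0, 2.0, 2)           # noisy measured position
    particles, weights = particle_filter_step(particles, weights, z, rng)
estimate = np.average(particles, axis=0, weights=weights)
```

Resampling after every step keeps the particle set concentrated where the measurements are likely; the random-walk prediction then spreads the copies out again so the filter can follow the moving target.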

### 3. Implementation Considerations:

#### 3.1. **Initialization**:
   - Choosing an initial bounding box or region of interest (ROI) for the object to track.

#### 3.2. **Motion Model Selection**:
   - Determining the appropriate motion model based on the object's dynamics.

#### 3.3. **Measurement Update**:
   - Incorporating measurements from the current frame to refine object state estimates.

#### 3.4. **Handling Occlusions and Track Loss**:
   - Dealing with situations where the object is temporarily invisible or leaves the frame.

#### 3.5. **Performance Optimization**:
   - Employing techniques to improve tracking speed and accuracy, such as feature selection and parallelization.

Object tracking is a fundamental task in many computer vision applications, including surveillance, human-computer interaction, and autonomous navigation. By understanding various tracking techniques and their underlying principles, developers can implement robust and efficient tracking systems tailored to specific use cases.

In [3]:
import cv2
import numpy as np

# Load the main image
main_image = cv2.imread('My.jpg')
if main_image is None:
    print("Error: Could not read the main image.")
    exit()

# Convert to grayscale
gray_main = cv2.cvtColor(main_image, cv2.COLOR_BGR2GRAY)

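# Load the template image. Here it is the same file as the main image, so the
# only match is the whole image; in practice the template is a smaller crop.
template = cv2.imread('My.jpg', 0)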
if template is None:
    print("Error: Could not read the template image.")
    exit()

# Get the width and height of the template
w, h = template.shape[::-1]

# Perform template matching using cv2.matchTemplate
res = cv2.matchTemplate(gray_main, template, cv2.TM_CCOEFF_NORMED)

# Define a threshold to find matches
threshold = 0.8
loc = np.where(res >= threshold)

# Draw rectangles around the matched regions
for pt in zip(*loc[::-1]):
    cv2.rectangle(main_image, pt, (pt[0] + w, pt[1] + h), (0, 255, 255), 2)

# Display the result
cv2.imshow('Detected', main_image)
cv2.waitKey(0)
cv2.destroyAllWindows()


In [2]:
import cv2
import numpy as np

# Read the video from file
cap = cv2.VideoCapture('video.mp4')

# Parameters for Shi-Tomasi corner detection
feature_params = dict(maxCorners=100, qualityLevel=0.3, minDistance=7, blockSize=7)

# Parameters for pyramidal Lucas-Kanade optical flow
lk_params = dict(winSize=(15, 15),
                 maxLevel=2,
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))

# Take the first frame and find corners in it
ret, old_frame = cap.read()
if not ret:
    raise SystemExit("Error: Could not read the first frame.")
old_gray = cv2.cvtColor(old_frame, cv2.COLOR_BGR2GRAY)
p0 = cv2.goodFeaturesToTrack(old_gray, mask=None, **feature_params)

# One random color per tracked point (at most maxCorners = 100 points)
color = np.random.randint(0, 255, (100, 3))

# Mask image on which the tracks are drawn
mask = np.zeros_like(old_frame)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Re-detect corners if every tracked point has been lost
    if p0 is None or len(p0) == 0:
        p0 = cv2.goodFeaturesToTrack(old_gray, mask=None, **feature_params)
        if p0 is None:
            old_gray = frame_gray
            continue

    # Calculate optical flow; st[i] == 1 where point i was found in the new frame
    p1, st, err = cv2.calcOpticalFlowPyrLK(old_gray, frame_gray, p0, None, **lk_params)

    # Keep only the successfully tracked points
    good_new = p1[st == 1]
    good_old = p0[st == 1]

    # Draw the tracks
    for i, (new, old) in enumerate(zip(good_new, good_old)):
        a, b = map(int, new.ravel())
        c, d = map(int, old.ravel())
        mask = cv2.line(mask, (a, b), (c, d), color[i].tolist(), 2)
        frame = cv2.circle(frame, (a, b), 5, color[i].tolist(), -1)
    img = cv2.add(frame, mask)

    cv2.imshow('frame', img)
    if cv2.waitKey(30) & 0xFF == 27:  # Esc to quit
        break

    # Now update the previous frame and previous points
    old_gray = frame_gray.copy()
    p0 = good_new.reshape(-1, 1, 2)

cap.release()
cv2.destroyAllWindows()


In [3]:
import cv2
import numpy as np

def horn_schunck(I1, I2, alpha, Niter):
    I1 = I1.astype(float)
    I2 = I2.astype(float)

    # Set initial values for u, v
    u = np.zeros(I1.shape)
    v = np.zeros(I1.shape)

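    # Spatial derivatives come from the first frame, the temporal derivative is the
    # frame difference. An unnormalized 5x5 Sobel kernel scales Ix and Iy strongly,
    # so alpha is relative to that scale.
    Ix = cv2.Sobel(I1, cv2.CV_64F, 1, 0, ksize=5)
    Iy = cv2.Sobel(I1, cv2.CV_64F, 0, 1, ksize=5)
    It = I2 - I1

    # Weighted average of the 8 neighbours (Horn-Schunck's Laplacian approximation)
    kernel = np.array([[1/12, 1/6, 1/12],
                       [1/6,  0,   1/6],
                       [1/12, 1/6, 1/12]], float)

    # Iterative update: pull each flow vector toward its neighbourhood average,
    # corrected along the image gradient to satisfy Ix*u + Iy*v + It = 0
    for _ in range(Niter):
        u_avg = cv2.filter2D(u, -1, kernel)
        v_avg = cv2.filter2D(v, -1, kernel)
        P = (Ix * u_avg + Iy * v_avg + It) / (alpha**2 + Ix**2 + Iy**2)
        u = u_avg - Ix * P
        v = v_avg - Iy * P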

    return u, v

# Read the video from file
cap = cv2.VideoCapture('video.mp4')

ret, frame1 = cap.read()
if not ret:
    raise SystemExit("Error: Could not read the first frame.")
frame1_gray = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)

while True:
    ret, frame2 = cap.read()
    if not ret:
        break

    frame2_gray = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)

    u, v = horn_schunck(frame1_gray, frame2_gray, alpha=1, Niter=100)

    # Visualize the flow vectors
    hsv = np.zeros((frame1_gray.shape[0], frame1_gray.shape[1], 3), dtype=np.uint8)
    hsv[..., 1] = 255

    mag, ang = cv2.cartToPolar(u, v)
    hsv[..., 0] = ang * 180 / np.pi / 2
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)
    rgb = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)

    cv2.imshow('frame', rgb)
    if cv2.waitKey(30) & 0xFF == 27:
        break

    frame1_gray = frame2_gray

cv2.destroyAllWindows()
cap.release()


In [4]:
import numpy as np

# Initial state (location and velocity)
x = np.array([[0],  # initial position
              [0]]) # initial velocity

# State transition matrix (assuming constant velocity model)
F = np.array([[1, 1],  # [1, delta_t] for position update
              [0, 1]]) # [0, 1] for velocity update

# Measurement matrix (we can only measure position)
H = np.array([[1, 0]])

# Measurement noise covariance
R = np.array([[1]])

# Process noise covariance
Q = np.array([[1, 0],
              [0, 1]])

# Initial estimation error covariance
P = np.array([[1, 0],
              [0, 1]])

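# Control input: a scalar acceleration (zero here). It must be 1x1 so that B @ u is defined.
u = np.array([[0.0]])

# Control matrix: maps acceleration a into [dt^2/2 * a, dt * a] with dt = 1
B = np.array([[0.5], 
              [1]])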

# Identity matrix
I = np.eye(2)


In [5]:
def predict(x, P, F, Q, u, B):
    x = F @ x + B @ u  # State prediction
    P = F @ P @ F.T + Q  # Covariance prediction
    return x, P


In [6]:
def update(x, P, Z, H, R):
    y = Z - H @ x  # Measurement residual
    S = H @ P @ H.T + R  # Residual covariance
    K = P @ H.T @ np.linalg.inv(S)  # Kalman gain
    x = x + K @ y  # State update
    P = (np.eye(P.shape[0]) - K @ H) @ P  # Covariance update (local identity, no global I needed)
    return x, P


In [8]:
import numpy as np

# Initial state (location and velocity)
x = np.array([[0],  # initial position
              [0]]) # initial velocity

# State transition matrix (assuming constant velocity model)
F = np.array([[1, 1],  # [1, delta_t] for position update
              [0, 1]]) # [0, 1] for velocity update

# Measurement matrix (we can only measure position)
H = np.array([[1, 0]])

# Measurement noise covariance
R = np.array([[1]])

# Process noise covariance
Q = np.array([[1, 0],
              [0, 1]])

# Initial estimation error covariance
P = np.array([[1, 0],
              [0, 1]])

# Control input (acceleration, for example)
u = np.array([[0]])  # assuming no control input

# Control matrix
B = np.array([[0], 
              [0]])  # assuming no control input effect on position and velocity

# Identity matrix
I = np.eye(2)

def predict(x, P, F, Q, u, B):
    x = F @ x + B @ u  # State prediction
    P = F @ P @ F.T + Q  # Covariance prediction
    return x, P

def update(x, P, Z, H, R):
    y = Z - H @ x  # Measurement residual
    S = H @ P @ H.T + R  # Residual covariance
    K = P @ H.T @ np.linalg.inv(S)  # Kalman gain
    x = x + K @ y  # State update
    P = (I - K @ H) @ P  # Covariance update
    return x, P

# Simulated measurements (for example, noisy position measurements)
measurements = [1, 2, 3, 4, 5]

# Run Kalman Filter
for z in measurements:
    x, P = predict(x, P, F, Q, u, B)
    x, P = update(x, P, np.array([[z]]), H, R)
    print("State estimate: \n", x)
    print("Covariance estimate: \n", P)


State estimate: 
 [[0.75]
 [0.25]]
Covariance estimate: 
 [[0.75 0.25]
 [0.25 1.75]]
State estimate: 
 [[1.8 ]
 [0.65]]
Covariance estimate: 
 [[0.8  0.4 ]
 [0.4  1.95]]
State estimate: 
 [[2.9009009 ]
 [0.88288288]]
Covariance estimate: 
 [[0.81981982 0.42342342]
 [0.42342342 1.95495495]]
State estimate: 
 [[3.96153846]
 [0.97435897]]
Covariance estimate: 
 [[0.82211538 0.42307692]
 [0.42307692 1.94871795]]
State estimate: 
 [[4.98858773]
 [1.00142653]]
Covariance estimate: 
 [[0.82196862 0.42225392]
 [0.42225392 1.94721826]]


In [9]:
import numpy as np

class ParticleFilter:
    def __init__(self, num_particles, state_dim, process_noise_std, measurement_noise_std, state_transition_fn, measurement_fn):
        self.num_particles = num_particles
        self.state_dim = state_dim
        self.particles = np.random.randn(num_particles, state_dim)  # Initialize particles randomly
        self.weights = np.ones(num_particles) / num_particles  # Initialize weights uniformly
        self.process_noise_std = process_noise_std
        self.measurement_noise_std = measurement_noise_std
        self.state_transition_fn = state_transition_fn
        self.measurement_fn = measurement_fn

    def predict(self):
        # Predict the next state of each particle
        for i in range(self.num_particles):
            noise = np.random.randn(self.state_dim) * self.process_noise_std
            self.particles[i] = self.state_transition_fn(self.particles[i]) + noise

    def update(self, measurement):
        # Update the weights based on the measurement likelihood
        for i in range(self.num_particles):
            predicted_measurement = self.measurement_fn(self.particles[i])
            self.weights[i] = self.measurement_likelihood(measurement, predicted_measurement)
        self.weights += 1.e-300  # Avoid division by zero
        self.weights /= np.sum(self.weights)  # Normalize the weights

    def measurement_likelihood(self, measurement, predicted_measurement):
        # Assuming Gaussian noise
        error = measurement - predicted_measurement
        return np.exp(-0.5 * np.dot(error, error) / (self.measurement_noise_std**2)) / (np.sqrt(2 * np.pi) * self.measurement_noise_std)

    def resample(self):
        # Resample particles based on their weights
        indices = np.random.choice(self.num_particles, size=self.num_particles, p=self.weights)
        self.particles = self.particles[indices]
        self.weights.fill(1.0 / self.num_particles)

    def estimate(self):
        # Estimate the state as the mean of the particles
        return np.average(self.particles, weights=self.weights, axis=0)

# Example usage:
# Define state transition and measurement functions
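def state_transition(state):
    # Random-walk model: x_next = x (process noise is added in predict()).
    # There is no velocity term, so the estimate lags behind the steadily moving
    # measurements, as the output below shows.
    return state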

def measurement_fn(state):
    # Measurement model: z = x
    return state

# Create a particle filter
num_particles = 1000
state_dim = 2  # For example, 2D state [x, y]
process_noise_std = 0.1
measurement_noise_std = 1.0

pf = ParticleFilter(num_particles, state_dim, process_noise_std, measurement_noise_std, state_transition, measurement_fn)

# Simulate a series of measurements
measurements = [np.array([i, i]) for i in range(10)]

for measurement in measurements:
    pf.predict()
    pf.update(measurement)
    pf.resample()
    estimate = pf.estimate()
    print(f"Estimated state: {estimate}")


Estimated state: [ 0.0454986  -0.02005369]
Estimated state: [0.41256888 0.28438928]
Estimated state: [0.82182439 0.76469088]
Estimated state: [1.39775968 1.39647948]
Estimated state: [1.8895624  1.69085101]
Estimated state: [2.24594535 1.66451034]
Estimated state: [2.49969525 1.71447346]
Estimated state: [2.71463496 1.82491497]
Estimated state: [2.90542744 1.97404892]
Estimated state: [3.08366087 2.14263308]


In [10]:
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
import cv2
import numpy as np

class TrackingDataset(Dataset):
    def __init__(self, video_path, transform=None):
        self.video_path = video_path
        self.transform = transform
        self.frames = self.load_video_frames(video_path)
        self.pairs, self.labels = self.create_pairs(self.frames)

    def load_video_frames(self, video_path):
        cap = cv2.VideoCapture(video_path)
        frames = []
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            frames.append(frame)
        cap.release()
        return frames

    def create_pairs(self, frames):
        pairs = []
        labels = []
        num_frames = len(frames)
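        for i in range(num_frames - 1):
            # Consecutive frames form a "similar" pair. The contrastive loss used below
            # treats label 0 as similar and 1 as dissimilar; no dissimilar pairs are
            # generated here, so a real setup would also need negative pairs.
            pairs.append([frames[i], frames[i + 1]])
            labels.append(0)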
        return pairs, labels

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        img1, img2 = self.pairs[idx]
        if self.transform:
            img1 = self.transform(img1)
            img2 = self.transform(img2)
        label = self.labels[idx]
        return img1, img2, label

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize((128, 128)),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

dataset = TrackingDataset('video.mp4', transform=transform)
dataloader = DataLoader(dataset, batch_size=16, shuffle=True)


In [11]:
import torch.nn as nn
import torch.nn.functional as F

class SiameseNetwork(nn.Module):
    def __init__(self):
        super(SiameseNetwork, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=10)
        self.conv2 = nn.Conv2d(64, 128, kernel_size=7)
        self.conv3 = nn.Conv2d(128, 128, kernel_size=4)
        self.conv4 = nn.Conv2d(128, 256, kernel_size=4)
        # For 128x128 inputs the feature map after conv4 + pooling is 256 x 4 x 4
        # (128 -> 119 -> 59 -> 53 -> 26 -> 23 -> 11 -> 8 -> 4)
        self.fc1 = nn.Linear(256*4*4, 4096)
        self.fc2 = nn.Linear(4096, 1)  # not used by forward()

    def forward_once(self, x):
        # Shared embedding branch applied to each input image
        x = F.relu(F.max_pool2d(self.conv1(x), (2, 2)))
        x = F.relu(F.max_pool2d(self.conv2(x), (2, 2)))
        x = F.relu(F.max_pool2d(self.conv3(x), (2, 2)))
        x = F.relu(F.max_pool2d(self.conv4(x), (2, 2)))
        x = x.view(x.size()[0], -1)
        x = F.relu(self.fc1(x))
        return x

    def forward(self, input1, input2):
        output1 = self.forward_once(input1)
        output2 = self.forward_once(input2)
        euclidean_distance = F.pairwise_distance(output1, output2)
        return euclidean_distance


In [17]:
from torch.optim import Adam

class ContrastiveLoss(nn.Module):
    """Contrastive loss: label 0 = similar pair (pulled together),
    label 1 = dissimilar pair (pushed apart until at least `margin` away)."""
    def __init__(self, margin=1.0):
        super(ContrastiveLoss, self).__init__()
        self.margin = margin

    def forward(self, output1, output2, label):
        euclidean_distance = F.pairwise_distance(output1, output2)
        loss_contrastive = torch.mean((1-label) * torch.pow(euclidean_distance, 2) +
                                      (label) * torch.pow(torch.clamp(self.margin - euclidean_distance, min=0.0), 2))
        return loss_contrastive

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = SiameseNetwork().to(device)
criterion = ContrastiveLoss()
optimizer = Adam(model.parameters(), lr=1e-4)

num_epochs = 10

for epoch in range(num_epochs):
    for img1, img2, label in dataloader:
        img1, img2, label = img1.to(device), img2.to(device), label.to(device).float()

        optimizer.zero_grad()
        # Embed both frames with the shared branch, then compare the embeddings
        output1 = model.forward_once(img1)
        output2 = model.forward_once(img2)
        loss = criterion(output1, output2, label)
        loss.backward()
        optimizer.step()

    print(f'Epoch {epoch+1}/{num_epochs}, Loss: {loss.item()}')


# Video Compression:

Techniques to reduce the size of video files for storage or transmission.

## Video Compression: Techniques and Principles

### Introduction

Video compression is the process of reducing the size of video files for storage or transmission while maintaining an acceptable level of quality. This involves encoding the video data in such a way that it requires fewer bits. Video compression techniques are essential for streaming services, video conferencing, and digital broadcasting. The main goal is to minimize the bit rate while preserving the perceptual quality of the video.

### Key Concepts in Video Compression

1. **Spatial Redundancy**: Refers to the redundancy within a single frame. Compression techniques exploit the fact that neighboring pixels in a frame are often similar.

2. **Temporal Redundancy**: Refers to the redundancy between consecutive frames. Compression techniques utilize the similarity between frames to reduce the data required to represent them.

3. **Psycho-visual Redundancy**: Takes advantage of the human visual system’s insensitivity to certain changes, allowing for lossy compression that removes less noticeable details.

### Basic Components of Video Compression

1. **Encoder and Decoder (Codec)**: A codec is a device or software that encodes and decodes digital video. Popular codecs include H.264, H.265 (HEVC), and VP9.

2. **Bit Rate**: The amount of data processed per unit of time, typically measured in bits per second (bps). Lower bit rates reduce file size but can affect quality.

3. **Frame Rate**: The number of frames displayed per second (fps). Common frame rates are 24, 30, and 60 fps.

4. **Resolution**: The width and height of each frame in pixels, e.g., 1920x1080 (Full HD).
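A back-of-the-envelope calculation shows why compression is unavoidable. This sketch computes the uncompressed bit rate from resolution, frame rate, and bit depth, assuming 8-bit YUV 4:2:0 sampling (1.5 samples per pixel) and an assumed 5 Mbps streaming target; `raw_bit_rate` is a hypothetical helper.

```python
def raw_bit_rate(width, height, fps, bits_per_sample=8, samples_per_pixel=1.5):
    """Uncompressed video bit rate in bits per second.

    samples_per_pixel=1.5 models YUV 4:2:0 (a full-resolution luma plane plus two
    chroma planes subsampled by 2 horizontally and vertically); use 3.0 for RGB or 4:4:4.
    """
    return width * height * samples_per_pixel * bits_per_sample * fps

raw = raw_bit_rate(1920, 1080, 30)   # 1080p at 30 fps, 8-bit 4:2:0
target = 5_000_000                   # assumed streaming bit rate: 5 Mbps
print(f"raw: {raw / 1e6:.1f} Mbps, compression ratio needed: {raw / target:.0f}:1")
# raw: 746.5 Mbps, compression ratio needed: 149:1
```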

### Compression Techniques

#### 1. **Lossless Compression**

Lossless compression reduces file size without losing any information, allowing the original video to be perfectly reconstructed. It’s less effective in reducing file size compared to lossy compression.

- **Run-Length Encoding (RLE)**: Encodes sequences of identical values by storing the value and its count.
- **Huffman Coding**: Uses variable-length codes for different symbols, with more common symbols using shorter codes.
- **Lempel-Ziv-Welch (LZW)**: Builds a dictionary of commonly occurring patterns to replace repeated patterns with shorter codes.
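LZW is the least self-explanatory of these three, so here is a minimal byte-oriented sketch of it. `lzw_encode` and `lzw_decode` are hypothetical names; the dictionary starts with all 256 single bytes and grows without limit, whereas practical implementations cap its size and pack codes into variable-width bit fields.

```python
def lzw_encode(data: bytes) -> list:
    """Emit dictionary indices for the longest already-seen prefixes."""
    dictionary = {bytes([i]): i for i in range(256)}   # all single bytes
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in dictionary:
            current = candidate                         # keep extending the match
        else:
            codes.append(dictionary[current])           # emit the longest known prefix
            dictionary[candidate] = len(dictionary)     # learn the new pattern
            current = bytes([value])
    if current:
        codes.append(dictionary[current])
    return codes

def lzw_decode(codes: list) -> bytes:
    """Rebuild the same dictionary on the fly to invert lzw_encode."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            entry = previous + previous[:1]             # code defined by this very step
        else:
            raise ValueError(f"invalid LZW code {code}")
        output.append(entry)
        dictionary[len(dictionary)] = previous + entry[:1]
        previous = entry
    return b"".join(output)

row = bytes([10] * 6 + [20] * 4)    # a run-heavy row of pixel values
codes = lzw_encode(row)
assert lzw_decode(codes) == row
print(len(row), "bytes ->", len(codes), "codes")
# 10 bytes -> 6 codes
```

Each code needs at least 9 bits once the dictionary exceeds 256 entries, so the gain only appears on longer inputs with many repeated patterns.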

#### 2. **Lossy Compression**

Lossy compression significantly reduces file size by removing some information, which can lead to a loss in quality. It exploits psycho-visual redundancy.

- **Transform Coding**: Converts spatial domain data to frequency domain data using transforms like the Discrete Cosine Transform (DCT) or Discrete Wavelet Transform (DWT).
  
  **DCT Equation** (for an \( N \times N \) block):
  \[
  F(u,v) = \frac{2}{N} C(u)C(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y) \cos \left[\frac{(2x+1)u\pi}{2N}\right] \cos \left[\frac{(2y+1)v\pi}{2N}\right]
  \]
  where \( C(0) = 1/\sqrt{2} \) and \( C(k) = 1 \) for \( k > 0 \) are normalization factors, and \( f(x,y) \) is the pixel value at coordinates \( (x,y) \). For the common \( 8 \times 8 \) block size, \( 2/N = 1/4 \).

- **Quantization**: Reduces the precision of the transformed coefficients, which is the primary source of loss in lossy compression.

  **Quantization Equation**:
  \[
  F_q(u,v) = \text{round}\left( \frac{F(u,v)}{Q(u,v)} \right)
  \]
  where \( Q(u,v) \) is the quantization matrix (a step size per frequency) and \( F_q(u,v) \) is the quantized coefficient.

- **Entropy Coding**: Further compresses the quantized coefficients using methods like Huffman coding or Arithmetic coding.
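The transform and quantization steps can be checked numerically. This sketch builds the orthonormal 8×8 DCT-II as a matrix `D`, so the 2-D transform is `D @ f @ D.T` and its inverse is `D.T @ F @ D`. It applies the transform to a hypothetical smooth ramp block (level-shifted by 128, as JPEG does for 8-bit samples) and quantizes with an assumed flat step of 16; real codecs use frequency-dependent quantization matrices.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis: D[u, x] = sqrt(2/n) * C(u) * cos((2x+1) u pi / 2n)."""
    x = np.arange(n)
    u = x.reshape(-1, 1)
    d = np.sqrt(2.0 / n) * np.cos((2 * x + 1) * u * np.pi / (2 * n))
    d[0, :] /= np.sqrt(2.0)            # C(0) = 1/sqrt(2), C(u) = 1 otherwise
    return d

D = dct_matrix(8)

def dct2(block):
    return D @ block @ D.T             # separable 2-D DCT

def idct2(coeffs):
    return D.T @ coeffs @ D            # D is orthonormal, so its inverse is D^T

# A smooth 8x8 block: every row is the ramp 0, 32, ..., 224, shifted by -128
block = np.tile(np.arange(0, 256, 32, dtype=float), (8, 1)) - 128
Q = np.full((8, 8), 16.0)              # assumed flat quantization matrix
coeffs = dct2(block)
quantized = np.round(coeffs / Q)       # the lossy step
reconstructed = idct2(quantized * Q)   # what a decoder would recover

print("non-zero coefficients:", np.count_nonzero(quantized), "of 64")
print("max abs error:", np.abs(reconstructed - block).max())
```

Because the block varies only horizontally, its energy lands in a handful of coefficients in the first row; that concentration is what makes the subsequent entropy coding effective.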

#### 3. **Predictive Coding**

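Exploits temporal redundancy by predicting each frame from previously coded reference frames.

- **Intra-frame Coding**: Encodes a frame using only its own content (used for keyframes, or I-frames).
- **Inter-frame Coding**: Encodes only the difference between the current frame and a prediction built from reference frames.

  **Motion Compensation**:
  \[
  R_t = F_t - M(F_{t-1}, v)
  \]
  where \( R_t \) is the prediction residual, \( F_t \) is the current frame, \( F_{t-1} \) is the previous (reference) frame, \( v \) is the motion vector field, and \( M(F_{t-1}, v) \) is the motion-compensated prediction obtained by displacing blocks of \( F_{t-1} \) by \( v \).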
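Motion compensation can be sketched with full-search block matching: for one block of the current frame, try every displacement in a small window of the previous frame and keep the one with the smallest sum of absolute differences (SAD). The helper `block_match`, the 8×8 block, the ±4 pixel search range, and the SAD criterion are illustrative choices, not taken from a particular codec. The returned vector points from the current block back to its best match in the previous frame, and the residual is the current block minus that motion-compensated prediction.

```python
import numpy as np

def block_match(prev, curr, top, left, size=8, search=4):
    """Full search: return the (dx, dy) minimising SAD, and that SAD."""
    target = curr[top:top + size, left:left + size].astype(np.int32)
    best, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > prev.shape[0] or x + size > prev.shape[1]:
                continue                                   # candidate outside the frame
            candidate = prev[y:y + size, x:x + size].astype(np.int32)
            sad = np.abs(target - candidate).sum()
            if best_sad is None or sad < best_sad:
                best, best_sad = (dx, dy), sad
    return best, best_sad

rng = np.random.default_rng(0)
prev = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
curr = np.roll(prev, shift=(1, 2), axis=(0, 1))           # content moves down 1, right 2

(dx, dy), sad = block_match(prev, curr, top=12, left=12)
# Motion-compensated prediction: the displaced block from the previous frame
prediction = prev[12 + dy:20 + dy, 12 + dx:20 + dx].astype(np.int32)
# Residual: current block minus its prediction (all zeros for a pure shift)
residual = curr[12:20, 12:20].astype(np.int32) - prediction
print("motion vector:", (dx, dy), "SAD:", sad, "residual energy:", np.abs(residual).sum())
```

For a pure translation the residual is zero, so only the motion vector needs to be transmitted for this block.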

### Advanced Techniques

- **Motion Estimation and Compensation**: Identifies and compensates for the motion of objects between frames.
  
  **Motion Vector Calculation**:
  \[
  MV = (dx, dy)
  \]
  where \( dx \) and \( dy \) represent the displacement in horizontal and vertical directions, respectively.

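- **Rate-Distortion Optimization**: Balances the trade-off between bit rate and distortion. Encoders typically choose, for each block, the coding option that minimizes the Lagrangian cost \( J = D + \lambda R \), where \( \lambda \) sets how much distortion one extra bit is worth.
  
  **Rate-Distortion Model** (a simple exponential approximation):
  \[
  D(R) = D_0 e^{-a R}
  \]
  where \( D \) is the distortion, \( R \) is the bit rate, \( D_0 \) is the distortion at zero rate, and \( a > 0 \) is a constant controlling how quickly distortion falls as the rate increases. For example, a Gaussian source with variance \( \sigma^2 \) under mean-squared error has \( D(R) = \sigma^2 2^{-2R} \), i.e. \( a = 2\ln 2 \) with \( R \) in bits per sample.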
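Rate-distortion optimization in an encoder usually reduces to a simple comparison: evaluate the cost \( J = D + \lambda R \) for each candidate coding mode and keep the cheapest. In this sketch, `lam` is that Lagrange multiplier (not the constant in the exponential \( D(R) \) model above), and the mode names and (distortion, rate) numbers are hypothetical.

```python
def choose_mode(candidates, lam):
    """Pick the candidate minimising the Lagrangian cost J = D + lam * R."""
    return min(candidates, key=lambda c: c["distortion"] + lam * c["rate"])

# Hypothetical (distortion, rate in bits) for three ways of coding one block
candidates = [
    {"mode": "skip",  "distortion": 900.0, "rate": 1},
    {"mode": "inter", "distortion": 200.0, "rate": 40},
    {"mode": "intra", "distortion": 120.0, "rate": 120},
]

for lam in (0.5, 5.0, 50.0):
    print(lam, choose_mode(candidates, lam)["mode"])
# 0.5 intra
# 5.0 inter
# 50.0 skip
```

A small \( \lambda \) makes bits cheap, so the most accurate mode wins; a large \( \lambda \) favours the mode that spends the fewest bits.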

### Popular Video Compression Standards

1. **H.264/AVC**: Widely used for its high compression efficiency and good quality.
2. **H.265/HEVC**: Designed to reach quality similar to H.264 at roughly half the bit rate, which makes it popular for high-resolution content such as 4K.
3. **VP9**: An open and royalty-free codec used mainly for web video streaming.
4. **AV1**: A newer open-source codec designed to succeed VP9, offering better compression rates.

### Conclusion

Video compression is a vital technology enabling efficient storage and transmission of video content. By leveraging techniques that reduce spatial, temporal, and psycho-visual redundancy, significant reductions in file size can be achieved. Both lossy and lossless methods have their applications, with advanced techniques like motion compensation and rate-distortion optimization playing crucial roles in modern codecs. Understanding these principles and equations helps in appreciating how video compression works to deliver high-quality video at manageable data rates.

In [2]:
import os
import subprocess

import cv2

def compress_video(input_video_path, output_video_path, codec='XVID', fps=None, resolution=None, bit_rate=None):
    """Re-encode a video with OpenCV, optionally resizing it and capping the bit rate via FFmpeg."""
    cap = cv2.VideoCapture(input_video_path)
    if not cap.isOpened():
        print(f"Error opening video file: {input_video_path}")
        return

    # Use the source frame rate unless an explicit fps is given
    out_fps = fps if fps is not None else cap.get(cv2.CAP_PROP_FPS)
    original_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    original_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

    # Target resolution (width, height); defaults to the source resolution
    width, height = resolution if resolution else (original_width, original_height)

    # OpenCV's VideoWriter selects the codec (FourCC) but does not expose the bit rate
    fourcc = cv2.VideoWriter_fourcc(*codec)
    out = cv2.VideoWriter(output_video_path, fourcc, out_fps, (width, height))
    if not out.isOpened():
        print(f"Error creating output file: {output_video_path}")
        cap.release()
        return

    while True:
        ret, frame = cap.read()
        if not ret:
            break

        # Frames must match the size given to VideoWriter, otherwise they are dropped
        if (frame.shape[1], frame.shape[0]) != (width, height):
            frame = cv2.resize(frame, (width, height))

        out.write(frame)

    cap.release()
    out.release()

    if bit_rate:
        # Re-encode with FFmpeg to enforce the target bit rate (requires ffmpeg on PATH).
        # splitext keeps the temp name distinct from the output for any extension.
        root, ext = os.path.splitext(output_video_path)
        temp_file = f"{root}_temp{ext}"
        os.replace(output_video_path, temp_file)
        subprocess.run(['ffmpeg', '-y', '-i', temp_file, '-b:v', bit_rate,
                        '-bufsize', bit_rate, output_video_path], check=True)
        os.remove(temp_file)

# Example usage
compress_video('video.mp4', 'output_compressed.avi', codec='XVID', resolution=(640, 480), bit_rate='500k')


# Video Decompression:

Techniques to decode compressed video data back into individual frames.

Video decompression (decoding) converts compressed video data back into a sequence of individual frames, which is what players and editing tools actually display and manipulate. Because the decoder must undo each step the encoder applied, the techniques below mirror the lossless and lossy methods used during compression; equations are given where they help.

### Lossless Compression Techniques:

1. **Run-Length Encoding (RLE):**
   - RLE is a simple form of compression that replaces sequences of identical pixels with a count value and the pixel value itself.
   - Representation: each run is stored as a pair \( (C, P) \), where \( C \) is the number of repeated pixels and \( P \) is the pixel value; decoding simply writes \( P \) out \( C \) times.

2. **Huffman Coding:**
   - Huffman coding assigns variable-length codes to different symbols (e.g., pixel values) based on their frequencies in the data. More frequent symbols are assigned shorter codes.
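   - Equation: with symbol probabilities \( p_i \) and code lengths \( \ell_i \), the average code length \( \bar{L} = \sum_i p_i \ell_i \) of a Huffman code satisfies \( H \le \bar{L} < H + 1 \), where \( H = -\sum_i p_i \log_2 p_i \) is the source entropy in bits per symbol. Decoding reads bits until they match a code word; because no code word is a prefix of another, this is unambiguous.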
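How close Huffman coding gets to the entropy of a source can be checked directly. This sketch computes only the Huffman code lengths (each merge of the two rarest groups adds one bit to every symbol in them) and compares the average length with the entropy \( H \). The helper `huffman_code_lengths` is hypothetical, and the input reuses the string from the Huffman example later in this section.

```python
import heapq
import math
from collections import Counter

def huffman_code_lengths(freqs):
    """Return {symbol: code length} by repeatedly merging the two rarest groups."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}             # a lone symbol still needs one bit
    # heap entries: (total frequency, unique tie-breaker, symbols in the subtree)
    heap = [(f, i, [s]) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in freqs}
    counter = len(heap)
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1                       # each merge adds one bit
        heapq.heappush(heap, (f1 + f2, counter, s1 + s2))
        counter += 1
    return lengths

data = "ABBCCCDDDDEEEEE"
freqs = Counter(data)
n = len(data)
lengths = huffman_code_lengths(freqs)
avg_len = sum(freqs[s] / n * lengths[s] for s in freqs)
entropy = -sum(f / n * math.log2(f / n) for f in freqs.values())
print(f"H = {entropy:.3f} bits/symbol, Huffman average = {avg_len:.3f} bits/symbol")
# H = 2.149 bits/symbol, Huffman average = 2.200 bits/symbol
```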

### Lossy Compression Techniques:

1. **Discrete Cosine Transform (DCT):**
   - DCT is widely used in lossy compression techniques like JPEG and MPEG. It transforms blocks of pixels from the spatial domain to the frequency domain.
   - Equation: 
     \[ F(u,v) = \frac{2}{N}C(u)C(v)\sum_{x=0}^{N-1}\sum_{y=0}^{N-1}f(x,y)\cos\left[\frac{(2x+1)u\pi}{2N}\right]\cos\left[\frac{(2y+1)v\pi}{2N}\right] \]
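     where \( F(u,v) \) is the DCT coefficient at frequency \( (u,v) \), \( f(x,y) \) is the pixel value at location \( (x,y) \), and \( C(0) = 1/\sqrt{2} \), \( C(k) = 1 \) for \( k > 0 \) are normalization factors. The decoder applies the inverse transform:
     \[ f(x,y) = \frac{2}{N}\sum_{u=0}^{N-1}\sum_{v=0}^{N-1}C(u)C(v)F(u,v)\cos\left[\frac{(2x+1)u\pi}{2N}\right]\cos\left[\frac{(2y+1)v\pi}{2N}\right] \]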

2. **Quantization:**
   - After DCT, quantization is applied to reduce the precision of DCT coefficients based on a quantization matrix.
   - Equation: with \( Q(u,v) \) the quantization matrix, the quantized coefficient is \( F_q(u,v) = \text{round}\left(\frac{F(u,v)}{Q(u,v)}\right) \). The decoder can only recover the approximation \( \hat{F}(u,v) = F_q(u,v)\,Q(u,v) \); the rounding error is the information that is lost.

3. **Motion Compensation (for Interframe Compression):**
   - In video compression standards like MPEG, motion compensation is used to exploit temporal redundancy between consecutive frames.
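   - Equation: the encoder estimates a motion vector per block, typically by block matching, e.g. \( (dx, dy) = \arg\min_{(dx,dy)} \sum_{(x,y) \in B} |F_t(x,y) - F_{t-1}(x+dx, y+dy)| \) (sum of absolute differences over block \( B \)). The decoder does not repeat this search; it reads the motion vectors from the bitstream and uses them to build the prediction.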

4. **Entropy Coding:**
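   - Entropy coding techniques such as variable-length (Huffman-style) codes or arithmetic coding (for example, CAVLC and CABAC in H.264) are used to further compress the quantized coefficients, motion vectors, and other syntax elements.
   - Equation: for a memoryless source, no lossless code can average fewer than \( H = -\sum_i p_i \log_2 p_i \) bits per symbol, so the achievable compression depends on how skewed the symbol statistics are.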

### Overall Decompression Process:

1. **Bitstream Parsing:**
   - The compressed video data is parsed to extract compressed frames, motion vectors, and other necessary information.

2. **Entropy Decoding:**
   - Entropy-coded data is decoded using the corresponding decoding algorithm to recover the quantized DCT coefficients or other transformed data.

3. **Inverse Quantization:**
   - The quantized DCT coefficients are multiplied by the quantization matrix to obtain approximate DCT coefficients.

4. **Inverse DCT:**
   - Inverse DCT is applied to convert the approximate DCT coefficients back to spatial domain blocks.

5. **Motion Compensation (for Interframe Compression):**
   - Motion vectors are used to predict the current frame from previously decoded frames.

6. **Frame Reconstruction:**
   - Predicted frames and residual information (if any) are combined to reconstruct the original frames.
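Steps 3 to 6 can be illustrated for a single 8×8 block. This sketch assumes entropy decoding (step 2) has already produced the quantized coefficients; `decode_block`, the flat quantization matrix, and the flat prediction block are hypothetical. For an intra block in codecs such as H.264, the prediction would come from already-decoded neighbouring pixels rather than from a previous frame.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix (rows are basis functions)."""
    x = np.arange(n)
    d = np.sqrt(2.0 / n) * np.cos((2 * x + 1) * x.reshape(-1, 1) * np.pi / (2 * n))
    d[0, :] /= np.sqrt(2.0)
    return d

def decode_block(quantized, qmatrix, prediction):
    """Dequantize, inverse-DCT, and add the prediction for one block."""
    D = dct_matrix(quantized.shape[0])
    coeffs = quantized * qmatrix                 # step 3: inverse quantization
    residual = D.T @ coeffs @ D                  # step 4: inverse DCT
    block = prediction + residual                # steps 5-6: prediction + residual
    return np.clip(np.round(block), 0, 255).astype(np.uint8)

prediction = np.full((8, 8), 100.0)              # hypothetical motion-compensated block
quantized = np.zeros((8, 8))
quantized[0, 0] = 3                              # only a DC coefficient survived
qmatrix = np.full((8, 8), 16.0)

print(decode_block(quantized, qmatrix, prediction)[0, :4])
# [106 106 106 106]
```

The DC coefficient 3 × 16 = 48 spreads evenly over the 64 pixels as 48 / 8 = 6 (the orthonormal DC basis value is 1/8 per pixel), raising every pixel of the prediction from 100 to 106.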

Video decompression combines these techniques in the order defined by the specific compression standard (e.g., MPEG-2, H.264, H.265). The quality of the reconstructed frames is determined mainly by the encoder's choices (quantization step sizes, motion vectors, and so on); the decoder's job is to invert those steps faithfully and fast enough for playback.

In [3]:
class VideoDecompressor:
    def __init__(self, compressed_data):
        self.compressed_data = compressed_data
        self.decompressed_frames = []

    def decompress(self):
        for compressed_frame in self.compressed_data:
            decompressed_frame = self._rle_decode(compressed_frame)
            self.decompressed_frames.append(decompressed_frame)

    def _rle_decode(self, compressed_frame):
        decompressed_frame = []
        for count, pixel_value in compressed_frame:
            decompressed_frame.extend([pixel_value] * count)
        return decompressed_frame

# Example compressed video data (RLE-encoded)
compressed_data = [
    [(3, 100), (2, 150), (1, 100)],
    [(1, 50), (3, 200), (1, 100)],
    [(2, 80), (2, 120), (1, 200)]
]

# Create a VideoDecompressor instance and decompress the video
decompressor = VideoDecompressor(compressed_data)
decompressor.decompress()

# Print decompressed frames
for i, frame in enumerate(decompressor.decompressed_frames):
    print(f"Frame {i+1}: {frame}")


Frame 1: [100, 100, 100, 150, 150, 100]
Frame 2: [50, 200, 200, 200, 100]
Frame 3: [80, 80, 120, 120, 200]


In [4]:
def run_length_encoding(data):
    # Encode a sequence as (count, value) pairs; an empty input has no runs
    if not data:
        return []
    encoded_data = []
    count = 1
    for i in range(1, len(data)):
        if data[i] == data[i - 1]:
            count += 1
        else:
            encoded_data.append((count, data[i - 1]))
            count = 1
    encoded_data.append((count, data[-1]))  # Add the last pixel sequence
    return encoded_data

def run_length_decoding(encoded_data):
    decoded_data = []
    for count, pixel in encoded_data:
        decoded_data.extend([pixel] * count)
    return decoded_data

# Example usage:
original_data = [1, 1, 1, 2, 2, 3, 3, 3, 3]
encoded_data = run_length_encoding(original_data)
decoded_data = run_length_decoding(encoded_data)

print("Original Data:", original_data)
print("Encoded Data:", encoded_data)
print("Decoded Data:", decoded_data)


Original Data: [1, 1, 1, 2, 2, 3, 3, 3, 3]
Encoded Data: [(3, 1), (2, 2), (4, 3)]
Decoded Data: [1, 1, 1, 2, 2, 3, 3, 3, 3]


In [5]:
import cv2

def decompress_video(input_file, output_file):
    # Open the compressed video file
    cap = cv2.VideoCapture(input_file)
    
    # Check if the video file opened successfully
    if not cap.isOpened():
        print("Error: Could not open video file.")
        return
    
    # Get video properties (frame width, height, frame rate)
    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = cap.get(cv2.CAP_PROP_FPS)
    
    # cap.read() below does the actual decoding into raw BGR frames. XVID is itself a
    # lossy codec, so the output file is re-compressed; the frames are uncompressed
    # only in memory (the `frame` arrays).
    fourcc = cv2.VideoWriter_fourcc(*'XVID')
    out = cv2.VideoWriter(output_file, fourcc, fps, (frame_width, frame_height))
    
    # Decompress each frame and write it to the output video file
    while cap.isOpened():
        ret, frame = cap.read()
        if ret:
            # Display frame or process it
            # (You can perform any processing here before writing it to the output file)
            
            # Write the frame to the output video file
            out.write(frame)
            
            # Display the decompressed frame
            cv2.imshow('Decompressed Video', frame)
            
            # Exit on pressing 'q' key
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
        else:
            break
    
    # Release the video objects
    cap.release()
    out.release()
    cv2.destroyAllWindows()

# Example usage:
input_file = 'video.mp4'
output_file = 'decompressed_video.avi'
decompress_video(input_file, output_file)


In [6]:
import heapq
from collections import Counter

class HuffmanNode:
    def __init__(self, symbol, frequency):
        self.symbol = symbol
        self.frequency = frequency
        self.left = None
        self.right = None

    def __lt__(self, other):
        return self.frequency < other.frequency

def build_huffman_tree(freq_dict):
    priority_queue = [HuffmanNode(symbol, freq) for symbol, freq in freq_dict.items()]
    heapq.heapify(priority_queue)
    
    while len(priority_queue) > 1:
        left = heapq.heappop(priority_queue)
        right = heapq.heappop(priority_queue)
        
        merged = HuffmanNode(None, left.frequency + right.frequency)
        merged.left = left
        merged.right = right
        
        heapq.heappush(priority_queue, merged)
    
    return priority_queue[0]

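def build_code_table(node, code='', code_table=None):
    # Use a fresh dict per call; a mutable default ({}) would be shared across calls
    if code_table is None:
        code_table = {}
    if node is not None:
        if node.symbol is not None:
            # A tree with a single symbol still needs a 1-bit code
            code_table[node.symbol] = code or '0'
        build_code_table(node.left, code + '0', code_table)
        build_code_table(node.right, code + '1', code_table)
    return code_table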

def huffman_encode(data):
    freq_dict = Counter(data)
    huffman_tree = build_huffman_tree(freq_dict)
    code_table = build_code_table(huffman_tree)
    encoded_data = ''.join(code_table[symbol] for symbol in data)
    return encoded_data, code_table

def huffman_decode(encoded_data, code_table):
    reverse_code_table = {code: symbol for symbol, code in code_table.items()}
    decoded_data = ''
    code = ''
    for bit in encoded_data:
        code += bit
        if code in reverse_code_table:
            decoded_data += reverse_code_table[code]
            code = ''
    return decoded_data

# Example usage:
data = "ABBCCCDDDDEEEEE"
encoded_data, code_table = huffman_encode(data)
decoded_data = huffman_decode(encoded_data, code_table)

print("Original Data:", data)
print("Encoded Data:", encoded_data)
print("Decoded Data:", decoded_data)


Original Data: ABBCCCDDDDEEEEE
Encoded Data: 010011011000000101010101111111111
Decoded Data: ABBCCCDDDDEEEEE


# Motion Detection:

Detecting and analyzing motion in video sequences.

## Motion Detection: An Overview

### Introduction
Motion detection is a process of identifying a change in position of an object relative to its surroundings or the change in the surroundings relative to an object. In the context of video sequences, motion detection involves analyzing a sequence of images to determine the presence, location, and nature of moving objects within the scene. This technology is widely used in various fields such as surveillance, automotive safety, human-computer interaction, and video compression.

### Key Concepts

#### 1. **Frame Differencing**
This is the simplest method for motion detection. It involves subtracting consecutive frames in a video sequence to detect changes.

**Equation:**
\[ D(x,y,t) = |I(x,y,t) - I(x,y,t-1)| \]
where \(D(x,y,t)\) is the difference image, \(I(x,y,t)\) is the intensity of the pixel at position \((x,y)\) in frame \(t\), and \(I(x,y,t-1)\) is the intensity of the same pixel in the previous frame.

#### 2. **Background Subtraction**
This method involves creating a model of the background and then detecting moving objects as those that deviate significantly from this background.

**Steps:**
- **Background Modeling:** Create a background model \(B(x,y)\).
- **Foreground Detection:** Identify moving objects by comparing the current frame \(I(x,y,t)\) with the background model.

**Equation:**
\[ F(x,y,t) = |I(x,y,t) - B(x,y)| \]
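where \(F(x,y,t)\) is the absolute difference from the background model. Thresholding it, \(F(x,y,t) > T\) for a chosen threshold \(T\), gives the binary foreground mask.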
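The two steps can be sketched in NumPy under simple assumptions:

- the background model is an exponential running average \(B \leftarrow (1-\alpha)B + \alpha I\);
- a pixel counts as foreground when \(|I - B|\) exceeds a threshold \(T\).

The learning rate `alpha = 0.02`, the threshold `T = 25` and the synthetic clip are illustrative choices, not values from the text.

```python
import numpy as np

def update_background(background, frame, alpha=0.02):
    """Exponential running average: B <- (1 - alpha) * B + alpha * I."""
    return (1.0 - alpha) * background + alpha * frame

def foreground_mask(frame, background, T=25.0):
    """Binary foreground mask: True where |I - B| > T."""
    return np.abs(frame - background) > T

# Synthetic grayscale clip: background level 100 with mild noise, and a bright
# 10x5 square (value 200) that appears at t = 5 and moves 1 px right per frame
rng = np.random.default_rng(0)
frames = []
for t in range(20):
    f = 100.0 + rng.normal(0.0, 2.0, size=(40, 40))
    if t >= 5:
        f[10:20, t:t + 5] = 200.0
    frames.append(f)

background = frames[0].copy()            # initialise B from an object-free frame
for f in frames[1:]:
    mask = foreground_mask(f, background)
    background = update_background(background, f)

print("foreground pixels in the last frame:", int(mask.sum()))
```

A small learning rate keeps a moving object from being absorbed into \(B\) too quickly. With a larger one, pixels the object has just left still differ from the contaminated background and show up as a "ghost" trail.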

#### 3. **Optical Flow**
Optical flow is a method that estimates the motion of objects based on the apparent motion of brightness patterns in the image sequence.

**Equation:**
\[ I_x u + I_y v + I_t = 0 \]
where \(I_x\), \(I_y\), and \(I_t\) are the partial derivatives of the image intensity with respect to \(x\), \(y\), and time \(t\) respectively, and \(u\) and \(v\) are the components of the optical flow vector.
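The constraint comes from brightness constancy. A point is assumed to keep its intensity while it moves by \((\delta x, \delta y)\) during \(\delta t\). Linearizing that assumption with a first-order Taylor expansion and dividing by \(\delta t\) gives:

\[
\begin{aligned}
I(x + \delta x,\, y + \delta y,\, t + \delta t) &= I(x, y, t) \\
I(x, y, t) + I_x\,\delta x + I_y\,\delta y + I_t\,\delta t &\approx I(x, y, t) \\
I_x \frac{\delta x}{\delta t} + I_y \frac{\delta y}{\delta t} + I_t &\approx 0
\quad\Longrightarrow\quad I_x u + I_y v + I_t = 0
\end{aligned}
\]

Here \(u = \delta x / \delta t\) and \(v = \delta y / \delta t\). This gives one equation per pixel for two unknowns (the aperture problem). Methods such as Lucas-Kanade therefore add an extra assumption to make the system solvable.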

### Techniques and Algorithms

#### 1. **Lucas-Kanade Method**
This is a popular optical flow algorithm that assumes that the flow is essentially constant in a local neighborhood of the pixel under consideration.

**Equation:**
\[ \sum_{i} \sum_{j} \left[ I_x(i,j) I_x(i,j) \right] u + \sum_{i} \sum_{j} \left[ I_x(i,j) I_y(i,j) \right] v = - \sum_{i} \sum_{j} \left[ I_x(i,j) I_t(i,j) \right] \]
\[ \sum_{i} \sum_{j} \left[ I_x(i,j) I_y(i,j) \right] u + \sum_{i} \sum_{j} \left[ I_y(i,j) I_y(i,j) \right] v = - \sum_{i} \sum_{j} \left[ I_y(i,j) I_t(i,j) \right] \]
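Below is a minimal NumPy sketch of one Lucas-Kanade step for a single window. It builds the 2×2 system above from finite-difference gradients and solves it.

The following are assumptions for illustration:
- the gradient scheme: `np.gradient` for \(I_x, I_y\) and a plain frame difference for \(I_t\);
- the window bounds;
- the synthetic test pattern.

Practical implementations such as OpenCV's `calcOpticalFlowPyrLK` add image pyramids and iterative refinement to cope with larger motions.

```python
import numpy as np

def lucas_kanade_window(prev, curr, y0, y1, x0, x1):
    """Estimate one flow vector (u, v) for the window prev[y0:y1, x0:x1]."""
    prev = prev.astype(np.float64)
    curr = curr.astype(np.float64)
    Iy, Ix = np.gradient(prev)          # derivatives along rows (y) and columns (x)
    It = curr - prev                    # temporal derivative for a unit time step
    ix = Ix[y0:y1, x0:x1].ravel()
    iy = Iy[y0:y1, x0:x1].ravel()
    it = It[y0:y1, x0:x1].ravel()
    # Normal equations from the text: sums of gradient products over the window
    A = np.array([[ix @ ix, ix @ iy],
                  [ix @ iy, iy @ iy]])
    b = -np.array([ix @ it, iy @ it])
    u, v = np.linalg.solve(A, b)
    return u, v

def pattern(x, y):
    """Smooth synthetic intensity pattern."""
    return np.sin(x / 5.0) + np.cos(y / 7.0)

yy, xx = np.mgrid[0:64, 0:64].astype(np.float64)
prev = pattern(xx, yy)
curr = pattern(xx - 0.3, yy + 0.2)      # content moves by u = +0.3, v = -0.2

u, v = lucas_kanade_window(prev, curr, 16, 48, 16, 48)
print(f"u = {u:.3f}, v = {v:.3f}")
```

The window must contain gradients in more than one direction. Otherwise the 2×2 matrix is (nearly) singular, which is the aperture problem in matrix form.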

#### 2. **Gaussian Mixture Model (GMM)**
This is used for background subtraction, where the background is modeled using a mixture of Gaussians.

**Equation:**
\[ P(x) = \sum_{i=1}^{K} \omega_i \cdot \eta(x; \mu_i, \Sigma_i) \]
where \(P(x)\) is the probability of pixel value \(x\), \(\omega_i\) is the weight, \(\mu_i\) is the mean, \(\Sigma_i\) is the covariance matrix of the \(i\)-th Gaussian component, and \(\eta\) is the Gaussian probability density function.
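To make the mixture concrete, here is a small sketch that evaluates \(P(x)\) for a grayscale pixel value using one-dimensional Gaussians, so each \(\Sigma_i\) reduces to a variance \(\sigma_i^2\). The three weights, means and variances are made-up illustrative values. A background subtractor such as MOG2 learns and updates these parameters per pixel.

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """1-D Gaussian density eta(x; mu, var)."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def mixture_pdf(x, weights, means, variances):
    """P(x) = sum_i w_i * eta(x; mu_i, var_i), vectorised over x."""
    x = np.asarray(x, dtype=np.float64)[..., None]   # broadcast over components
    return np.sum(weights * gaussian_pdf(x, means, variances), axis=-1)

# Illustrative 3-component model for one pixel (e.g. a flickering background)
weights = np.array([0.6, 0.3, 0.1])                  # weights must sum to 1
means = np.array([80.0, 120.0, 200.0])
variances = np.array([25.0, 36.0, 100.0])

xs = np.arange(256)
p = mixture_pdf(xs, weights, means, variances)
print("P(80) =", p[80], " P(160) =", p[160])
print("sum over 0..255:", p.sum())                   # near 1: the mass lies inside [0, 255]
```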

### Applications

1. **Surveillance Systems:** Detecting intruders, monitoring activity, and tracking objects.
2. **Autonomous Vehicles:** Detecting pedestrians, other vehicles, and obstacles.
3. **Human-Computer Interaction:** Gesture recognition and activity monitoring.
4. **Video Compression:** Identifying regions of interest for efficient encoding.

### Challenges

1. **Illumination Changes:** Variations in lighting can affect detection accuracy.
2. **Complex Backgrounds:** Dynamic backgrounds can cause false detections.
3. **Object Occlusion:** Objects that are partially or fully occluded can be challenging to detect and track.
4. **Real-Time Processing:** High computational demands for processing video in real-time.

### Advanced Techniques

#### 1. **Deep Learning**
Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are used for more robust motion detection and tracking by learning spatial and temporal features directly from data.

#### 2. **Motion History Images (MHI)**
MHIs are a way of encoding motion information into a single image by accumulating binary motion images over time.

**Equation:**
\[ H(x,y,t) = 
\begin{cases} 
\tau & \text{if } D(x,y,t) = 1 \\
\max(0, H(x,y,t-1) - 1) & \text{otherwise}
\end{cases} \]
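where \(H(x,y,t)\) is the MHI, \(D(x,y,t)\) here denotes a binary motion mask (for example, the thresholded frame difference), and \(\tau\) is the maximum duration, in frames, for which motion is recorded.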
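The recurrence is a per-pixel update, so it vectorizes directly. The NumPy sketch below assumes \(D\) is already a binary motion mask and uses an illustrative \(\tau = 5\) on a one-row toy image.

```python
import numpy as np

def update_mhi(mhi, motion_mask, tau):
    """H <- tau where motion is detected, otherwise decay by 1 (floored at 0)."""
    return np.where(motion_mask, tau, np.maximum(0, mhi - 1))

tau = 5
mhi = np.zeros((1, 8), dtype=np.int32)
# A single "moving pixel" that walks right by one column per frame
for t in range(4):
    mask = np.zeros((1, 8), dtype=bool)
    mask[0, t] = True
    mhi = update_mhi(mhi, mask, tau)

print(mhi)   # [[2 3 4 5 0 0 0 0]]
```

Larger values mark more recent motion, so the gradient of \(H\) also encodes the direction in which the object moved.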

### Conclusion
Motion detection is a crucial aspect of video analysis, enabling numerous applications from security to automation. The choice of technique depends on the specific requirements and constraints of the application, with each method offering different strengths and addressing various challenges. Understanding the underlying principles and equations allows for the effective implementation and improvement of motion detection systems.

In [1]:
import cv2

# Open the video capture
cap = cv2.VideoCapture('video.mp4')

# Check if video opened successfully
if not cap.isOpened():
    print("Error: Could not open video.")
    exit()

# Read the first frame
ret, prev_frame = cap.read()
if not ret:
    print("Error: Could not read video frame.")
    exit()

# Convert the frame to grayscale
prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)

while True:
    # Read the next frame
    ret, frame = cap.read()
    if not ret:
        break

    # Convert the frame to grayscale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Compute the absolute difference between current frame and previous frame
    diff = cv2.absdiff(prev_gray, gray)

    # Apply a binary threshold to get a binary image
    _, thresh = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)

    # Display the results
    cv2.imshow('Frame', frame)
    cv2.imshow('Motion Detection', thresh)

    # Update the previous frame
    prev_gray = gray

    # Exit on 'q' key press
    if cv2.waitKey(30) & 0xFF == ord('q'):
        break

# Release the video capture and close windows
cap.release()
cv2.destroyAllWindows()


In [2]:
import cv2

# Open the video capture
cap = cv2.VideoCapture('video.mp4')

# Check if video opened successfully
if not cap.isOpened():
    print("Error: Could not open video.")
    exit()

# Create the background subtractor
fgbg = cv2.createBackgroundSubtractorMOG2()

while True:
    # Read the next frame
    ret, frame = cap.read()
    if not ret:
        break

    # Apply the background subtractor to get the foreground mask
    fgmask = fgbg.apply(frame)

    # Display the results
    cv2.imshow('Frame', frame)
    cv2.imshow('Foreground Mask', fgmask)

    # Exit on 'q' key press
    if cv2.waitKey(30) & 0xFF == ord('q'):
        break

# Release the video capture and close windows
cap.release()
cv2.destroyAllWindows()


# Video Stabilization:

Reducing shakiness or jitter in videos caused by camera motion.

Video stabilization is a technique used to reduce the effects of camera motion, such as shaking or jitter, in videos. It aims to produce smoother and more visually pleasing footage by compensating for undesired motion. Here's a comprehensive overview of video stabilization, including the methods, techniques, and equations involved:

### 1. Types of Camera Motion:
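   - **Global Motion**: The entire frame moves because the camera moves. This can be intentional (panning, tilting, zooming) or unintentional (hand shake, vibration). Stabilization aims to keep the intentional component and remove the unintended jitter.
   - **Local Motion**: Independent movement of objects within the scene. This is not camera motion and should not be "corrected" by stabilization.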

### 2. Techniques for Video Stabilization:

#### 2.1. Global Motion Compensation:
   - **Translation Model**: Estimates global motion parameters (translation vectors) to compensate for panning or tilting.
   - **Homography Model**: Computes a homography matrix to handle more complex global transformations like zooming or rotation.

#### 2.2. Local Motion Compensation:
   - **Feature-Based Methods**: Tracks keypoints or feature points across frames and computes transformations to align them.
   - **Block-Based Methods**: Divides frames into blocks and estimates motion vectors for each block to stabilize.

### 3. Equations:

#### 3.1. Global Motion Compensation:
   - **Translation Model**: 
     - \(x' = x + \Delta x\)
     - \(y' = y + \Delta y\)
   - **Homography Model**: 
     - \(s \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = H \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}\), where \(H\) is a \(3 \times 3\) matrix defined up to scale and \(s\) is a nonzero scale factor.
     - For a camera that only rotates (a good approximation when the scene is far away), \(H = K R K^{-1}\). Here \(K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}\) is the intrinsic matrix and \(R\) is the rotation between the two frames.
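Applying a homography means working in homogeneous coordinates and dividing by the scale \(s\) at the end. The NumPy sketch below does this for a camera that only rotates, using the rotation-induced homography \(H = K R K^{-1}\). The intrinsic values (focal length 500 px, principal point (320, 240)) and the 2° roll are illustrative assumptions.

```python
import numpy as np

def apply_homography(H, pts):
    """Map Nx2 points through H: s * [x', y', 1]^T = H [x, y, 1]^T."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])   # to homogeneous coordinates
    mapped = pts_h @ H.T                                 # each row is s * [x', y', 1]
    return mapped[:, :2] / mapped[:, 2:3]                # divide out the scale s

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
theta = np.deg2rad(2.0)                  # small roll about the optical axis
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
H = K @ R @ np.linalg.inv(K)             # homography induced by a pure rotation

pts = np.array([[320.0, 240.0], [420.0, 240.0]])
out = apply_homography(H, pts)
print(out)   # principal point stays fixed; the other point rotates around it
```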

#### 3.2. Local Motion Compensation:
   - **Optical Flow**: 
     - \(I_x u + I_y v + I_t = 0\)
     - Where \(I_x\), \(I_y\) are partial derivatives of intensity with respect to x and y, and \(I_t\) is the derivative with respect to time.
   - **Lucas-Kanade Algorithm**:
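     - \(A^T A\,\mathbf{d} = A^T \mathbf{b}\)
     - Where:
       - each row of \(A\) is the spatial gradient \([I_x, I_y]\) at one pixel of the window;
       - \(\mathbf{d} = (u, v)^T\) is the motion vector;
       - \(\mathbf{b}\) stacks the negated temporal derivatives \(-I_t\) (the frame-to-frame intensity differences with the sign flipped).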

### 4. Common Challenges:
   - **Noise**: Noisy frames or unreliable motion estimates (e.g., in low-texture regions) can leave residual jitter or introduce unnatural wobble.
   - **Computational Complexity**: Some methods require significant computational resources, making real-time stabilization challenging.
   - **Ghosting Artifacts**: Improper motion estimation can introduce ghosting or smearing artifacts in stabilized videos.

### 5. Practical Implementation:
   - **Pre-processing**: Convert videos to appropriate formats, resize for efficiency, and apply noise reduction if necessary.
   - **Motion Estimation**: Choose suitable algorithms based on the type and intensity of motion.
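   - **Filtering**: Smooth the estimated camera trajectory (e.g., with a moving average or a Kalman filter) so that intentional motion is kept while high-frequency jitter is removed. Then warp each frame by the difference between the smoothed and the original trajectory.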
   - **Post-processing**: Adjust parameters, crop borders, and apply additional effects for better aesthetics.

### 6. Available Libraries and Tools:
   - **OpenCV**: Provides the building blocks for stabilization: feature detection, optical flow, robust transform estimation and image warping.
   - **FFmpeg**: Offers video stabilization filters and command-line tools for batch processing and integration into video pipelines.
   - **Adobe Premiere Pro, Final Cut Pro, etc.**: Commercial video editing software often includes built-in stabilization features with user-friendly interfaces.

Video stabilization is an essential tool in modern video production, enabling filmmakers, content creators, and researchers to produce high-quality, professional-looking videos even under challenging conditions. It combines mathematical principles, computer vision techniques, and signal processing algorithms to achieve smooth and stable footage.

In [3]:
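import cv2
import numpy as np

def moving_average(curve, radius):
    """Smooth a 1-D curve with a box filter of width 2*radius + 1 (edges replicated)."""
    window = 2 * radius + 1
    kernel = np.ones(window) / window
    padded = np.pad(curve, (radius, radius), mode='edge')
    return np.convolve(padded, kernel, mode='valid')

def stabilize_video(input_video_path, output_video_path, smoothing_radius=30):
    cap = cv2.VideoCapture(input_video_path)
    
    # Check if the video file is opened successfully
    if not cap.isOpened():
        print("Error: Could not open video file.")
        return
    
    # Get the first frame
    ret, prev_frame = cap.read()
    if not ret:
        print("Error: Could not read the first frame.")
        return
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    fps = cap.get(cv2.CAP_PROP_FPS)
    frame_size = (prev_frame.shape[1], prev_frame.shape[0])  # (width, height)
    
    # Pass 1: estimate the frame-to-frame camera motion (dx, dy, rotation angle)
    transforms = []
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        
        # Re-detect features on every frame so the tracked set does not dwindle
        prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200, qualityLevel=0.01, minDistance=30, blockSize=3)
        dx, dy, da = 0.0, 0.0, 0.0  # assume no motion if estimation fails
        if prev_pts is not None:
            # Track the features into the current frame (pyramidal Lucas-Kanade)
            curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, prev_pts, None)
            good = status.ravel() == 1
            if np.count_nonzero(good) >= 3:
                # Rotation + translation + uniform scale, estimated with RANSAC (the default method)
                M, _ = cv2.estimateAffinePartial2D(prev_pts[good], curr_pts[good])
                if M is not None:
                    dx, dy = M[0, 2], M[1, 2]
                    da = np.arctan2(M[1, 0], M[0, 0])
        transforms.append([dx, dy, da])
        prev_gray = gray
    cap.release()
    
    # Camera trajectory: cumulative motion with frame 0 as the origin
    # (adding up shifts and angles is a small-motion approximation)
    transforms = np.array(transforms, dtype=np.float64).reshape(-1, 3)
    trajectory = np.vstack([np.zeros((1, 3)), np.cumsum(transforms, axis=0)])
    
    # Smooth each component to keep intentional motion (pans) and drop high-frequency jitter
    smoothed = np.column_stack([moving_average(trajectory[:, i], smoothing_radius) for i in range(3)])
    corrections = smoothed - trajectory  # moves each frame from the raw path onto the smoothed path
    
    # Pass 2: re-open the video and warp every frame by its correction
    cap = cv2.VideoCapture(input_video_path)
    fourcc = cv2.VideoWriter_fourcc(*'XVID')
    out = cv2.VideoWriter(output_video_path, fourcc, fps, frame_size)
    for dx, dy, da in corrections:
        ret, frame = cap.read()
        if not ret:
            break
        # Rigid correction: rotation by da (about the image origin) plus translation (dx, dy)
        M = np.array([[np.cos(da), -np.sin(da), dx],
                      [np.sin(da),  np.cos(da), dy]])
        stabilized_frame = cv2.warpAffine(frame, M, frame_size)
        
        # Write the stabilized frame to the output video
        out.write(stabilized_frame)
    
    # Release VideoCapture and VideoWriter objects
    cap.release()
    out.release()

# Example usage
input_video_path = 'video.mp4'
output_video_path = 'output_stabilized_video.avi'
stabilize_video(input_video_path, output_video_path)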


# Video Enhancement:

Enhancing video quality through techniques like denoising, deblurring, and contrast adjustment.

Video enhancement refers to the process of improving the quality of a video by applying various techniques to enhance its visual appearance. This typically involves reducing noise, removing blur, adjusting contrast, and enhancing details. Below, I'll cover the main techniques used in video enhancement, along with equations where applicable:

### 1. Denoising:
   - **Objective**: Reduce noise in the video caused by low light conditions, high ISO settings, or electronic interference.
   - **Techniques**: 
     - **Temporal Filtering**: Use temporal information across consecutive frames to distinguish noise from actual content.
     - **Spatial Filtering**: Apply spatial filters such as Gaussian, median, or bilateral filtering to suppress noise within each frame. Median and bilateral filters preserve edges better than a Gaussian blur.

   - **Equations**:
     - **Gaussian Filter**:
       \[ G(x, y) = \frac{1}{2\pi\sigma^2} \exp{\left(-\frac{x^2 + y^2}{2\sigma^2}\right)} \]
     - **Median Filter**:
       \[ I_{\text{med}}(x, y) = \text{median}(I(x + i, y + j)) \quad \text{for} \quad i, j \in \{-k, ..., k\} \]
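Both filters fit in a few lines of NumPy.

- **Gaussian kernel:** the sketch samples \(G(x, y)\) on a \((2k+1)\times(2k+1)\) grid and renormalizes it so the weights sum to 1. The continuous normalization constant no longer holds exactly after sampling and truncation.
- **Median filter:** implemented with a sliding window.

Assumptions: `sliding_window_view` requires NumPy 1.20 or newer, and reflecting the image at its borders is an illustrative choice of edge handling.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gaussian_kernel(k, sigma):
    """Sample G(x, y) on a (2k+1)x(2k+1) grid and normalize the weights to sum to 1."""
    ax = np.arange(-k, k + 1, dtype=np.float64)
    x, y = np.meshgrid(ax, ax)
    g = np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    return g / g.sum()

def filter2d(image, kernel):
    """Weighted sum over each (2k+1)x(2k+1) neighbourhood, reflecting at the borders."""
    k = kernel.shape[0] // 2
    windows = sliding_window_view(np.pad(image, k, mode='reflect'), kernel.shape)
    return np.einsum('ijkl,kl->ij', windows, kernel)

def median_filter(image, k):
    """I_med(x, y) = median of I(x+i, y+j) for i, j in {-k, ..., k}."""
    windows = sliding_window_view(np.pad(image, k, mode='reflect'), (2 * k + 1, 2 * k + 1))
    return np.median(windows, axis=(-2, -1))

# Flat image corrupted by two isolated "salt" pixels (impulse noise)
img = np.full((9, 9), 50.0)
img[2, 3] = 255.0
img[6, 6] = 255.0
print(median_filter(img, 1).max())                    # impulses removed entirely
print(filter2d(img, gaussian_kernel(1, 1.0)).max())   # impulses only spread out
```

The median filter removes isolated impulses, while the Gaussian smears them over the neighbourhood. This is why median filtering is the usual choice for salt-and-pepper noise.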

### 2. Deblurring:
   - **Objective**: Remove blur caused by camera motion or out-of-focus conditions.
   - **Techniques**: 
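     - **Wiener Filtering**: Given an estimate of the blur kernel (point spread function) and of the noise level, invert the blur in the frequency domain while limiting noise amplification.
     - **Lucy-Richardson Deconvolution**: Iteratively refine an estimate of the sharp image with multiplicative updates. Each update compares the re-blurred estimate with the observed frame, again using a known blur kernel.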
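The following is a minimal frequency-domain sketch of Wiener deconvolution. It rests on these assumptions:

- the blur kernel (PSF) is known;
- the blur is circular (periodic boundaries), which keeps the FFT model exact;
- the constant `k` stands in for the noise-to-signal power ratio and is an illustrative value;
- the random test image and the 5×5 box PSF are made up.

```python
import numpy as np

def psf_to_otf(psf, shape):
    """Zero-pad the PSF to `shape`, move its centre to (0, 0), and take the 2-D FFT."""
    padded = np.zeros(shape)
    kh, kw = psf.shape
    padded[:kh, :kw] = psf
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(padded)

def circular_blur(image, psf):
    """Degradation model g = h * f with periodic boundaries."""
    return np.real(np.fft.ifft2(np.fft.fft2(image) * psf_to_otf(psf, image.shape)))

def wiener_deconvolve(blurred, psf, k=1e-3):
    """Wiener estimate in the frequency domain: F = conj(H) / (|H|^2 + k) * G."""
    H = psf_to_otf(psf, blurred.shape)
    G = np.fft.fft2(blurred)
    return np.real(np.fft.ifft2(np.conj(H) / (np.abs(H) ** 2 + k) * G))

def rms_error(a, b):
    """Root-mean-square difference between two images."""
    return np.sqrt(np.mean((a - b) ** 2))

rng = np.random.default_rng(0)
image = rng.random((64, 64))
psf = np.ones((5, 5)) / 25.0                 # 5x5 box blur, assumed known
blurred = circular_blur(image, psf)
restored = wiener_deconvolve(blurred, psf, k=1e-3)
print(f"RMS error blurred: {rms_error(blurred, image):.3f}, restored: {rms_error(restored, image):.3f}")
```

Raising `k` trades sharpness for robustness. With real sensor noise, a tiny `k` would amplify noise at frequencies where \(|H|\) is close to zero.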

### 3. Contrast Adjustment:
   - **Objective**: Enhance the contrast of the video to improve visibility and make it more visually appealing.
   - **Techniques**:
     - **Histogram Equalization**: Spread out the intensity values across the histogram to utilize the full dynamic range.
     - **Contrast Stretching**: Linearly expand the intensity range between the minimum and maximum values.
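Both contrast techniques can be sketched for 8-bit grayscale frames in NumPy:

- **Contrast stretching** applies a linear map from \([\min, \max]\) to \([0, 255]\).
- **Global histogram equalization** uses the cumulative histogram as a lookup table.

The low-contrast test image is synthetic. OpenCV's `equalizeHist` and CLAHE are the usual library routes.

```python
import numpy as np

def contrast_stretch(img):
    """Linearly map [min, max] of a uint8 image onto [0, 255]."""
    lo, hi = int(img.min()), int(img.max())
    if hi == lo:                                   # flat image: nothing to stretch
        return img.copy()
    out = (img.astype(np.float64) - lo) * 255.0 / (hi - lo)
    return np.round(out).astype(np.uint8)

def equalize_histogram(img):
    """Global histogram equalization of a uint8 image via its cumulative histogram."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist)
    cdf_min = cdf[cdf > 0][0]                      # CDF at the darkest occupied level
    if cdf[-1] == cdf_min:                         # a single gray level
        return img.copy()
    lut = np.round((cdf - cdf_min) * 255.0 / (cdf[-1] - cdf_min))
    lut = np.clip(lut, 0, 255).astype(np.uint8)    # unused (empty) levels clip to 0
    return lut[img]

# Low-contrast test image whose values only span 100..139
rng = np.random.default_rng(1)
img = rng.integers(100, 140, size=(64, 64), dtype=np.uint8)
stretched = contrast_stretch(img)
equalized = equalize_histogram(img)
print(stretched.min(), stretched.max(), equalized.min(), equalized.max())
```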

### 4. Detail Enhancement:
   - **Objective**: Enhance fine details and textures in the video to make it appear sharper.
   - **Techniques**:
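     - **Unsharp Masking**: Subtract a blurred version of the image from the original to obtain a detail (high-pass) mask, then add a scaled copy of that mask back: \(I_{\text{sharp}} = I + \alpha\,(I - I_{\text{blur}})\).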
     - **High-Pass Filtering**: Accentuate high-frequency components using filters like Laplacian or Sobel.
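Unsharp masking, \(I + \alpha\,(I - I_{\text{blur}})\), can be sketched in NumPy on a single soft edge. Two simplifications are assumed:

- a 3×3 box blur stands in for the Gaussian blur usually used, to keep the code short;
- `amount` plays the role of \(\alpha\) and is set to an illustrative 1.0.

```python
import numpy as np

def box_blur(img, k=1):
    """Mean over a (2k+1)x(2k+1) neighbourhood, replicating edge pixels."""
    padded = np.pad(img, k, mode='edge')
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(2 * k + 1):
        for dx in range(2 * k + 1):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (2 * k + 1) ** 2

def unsharp_mask(img, amount=1.0, k=1):
    """Sharpened = I + amount * (I - blur(I)), clipped to the 8-bit range."""
    img = img.astype(np.float64)
    detail = img - box_blur(img, k)          # the high-pass "mask"
    return np.clip(img + amount * detail, 0, 255)

# Vertical edge: 100 on the left, 150 on the right
img = np.tile(np.array([100, 100, 100, 150, 150, 150], dtype=np.float64), (5, 1))
row = unsharp_mask(img)[2]
print(row)   # undershoot left of the edge, overshoot right of it
```

The undershoot and overshoot next to the edge raise its local contrast, which the eye reads as extra sharpness.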

### 5. Color Correction:
   - **Objective**: Adjust the color balance and tone of the video to achieve a desired look or correct for lighting conditions.
   - **Techniques**:
     - **White Balance Adjustment**: Scale the intensity of color channels to remove color casts.
     - **Color Grading**: Apply creative color adjustments to enhance the overall aesthetic of the video.
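One simple way to choose the per-channel scale factors for white balance is the gray-world assumption: the average color of a scene should be neutral gray. The NumPy sketch below implements that heuristic. It is one choice among several and it fails on scenes dominated by a single color. The warm-cast test scene is synthetic.

```python
import numpy as np

def gray_world_white_balance(img):
    """Scale each channel so that all channel means equal the overall mean."""
    img = img.astype(np.float64)
    channel_means = img.reshape(-1, img.shape[-1]).mean(axis=0)
    gains = channel_means.mean() / channel_means
    return np.clip(img * gains, 0, 255), gains

# Neutral gray scene with a warm cast (channel order R, G, B)
scene = np.full((4, 4, 3), 120.0)
cast = scene * np.array([1.25, 1.0, 0.75])
balanced, gains = gray_world_white_balance(cast)
print("gains:", gains, "pixel:", balanced[0, 0])
```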

### 6. Video Super-Resolution:
   - **Objective**: Increase the spatial resolution of the video to improve clarity and sharpness.
   - **Techniques**:
     - **Single-Image Super-Resolution**: Use deep learning models to infer high-resolution details from low-resolution frames.
     - **Optical Flow-based Methods**: Utilize motion information across frames to enhance spatial resolution.

### 7. Artifact Removal:
   - **Objective**: Remove visual artifacts such as compression artifacts, flickering, or banding.
   - **Techniques**:
     - **Temporal Filtering**: Apply temporal averaging or interpolation to smooth out artifacts over time.
     - **Frequency Filtering**: Use frequency-domain techniques like Fourier or wavelet transforms to identify and remove specific artifacts.

### 8. Video Fusion:
   - **Objective**: Combine multiple video streams or frames to generate a high-quality output.
   - **Techniques**:
     - **Multi-Exposure Fusion**: Blend differently exposed frames to create an image with balanced brightness and contrast.
     - **Multi-Frame Super-Resolution**: Combine information from multiple frames to enhance spatial resolution and reduce noise.

### 9. Motion Compensation:
   - **Objective**: Compensate for motion in the video to stabilize or align frames.
   - **Techniques**:
     - **Optical Flow-based Stabilization**: Estimate motion vectors between frames and warp or interpolate frames to align them.
     - **Global Motion Estimation**: Model and compensate for large-scale motion such as camera panning or rotation.

### Conclusion:
Video enhancement techniques play a crucial role in improving the quality and visual appeal of videos across various applications, including surveillance, entertainment, and scientific imaging. By applying a combination of denoising, deblurring, contrast adjustment, detail enhancement, and other techniques, it's possible to significantly enhance the clarity, sharpness, and overall quality of video content.

In [4]:
import cv2

def enhance_video(input_video_path, output_video_path):
    cap = cv2.VideoCapture(input_video_path)
    
    # Check if the video file is opened successfully
    if not cap.isOpened():
        print("Error: Could not open video file.")
        return
    
    # Create a VideoWriter object to save the enhanced video
    fps = cap.get(cv2.CAP_PROP_FPS)
    frame_size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    fourcc = cv2.VideoWriter_fourcc(*'XVID')
    out = cv2.VideoWriter(output_video_path, fourcc, fps, frame_size)
    
    # Process each frame
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        
        # Apply denoising using bilateral filter
        denoised_frame = cv2.bilateralFilter(frame, d=9, sigmaColor=75, sigmaSpace=75)
        
        # Apply contrast adjustment with CLAHE (contrast-limited adaptive histogram equalization) on the lightness channel only, so colors are not shifted
        lab_frame = cv2.cvtColor(denoised_frame, cv2.COLOR_BGR2LAB)
        l_channel, a_channel, b_channel = cv2.split(lab_frame)
        clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8, 8))
        enhanced_l_channel = clahe.apply(l_channel)
        enhanced_lab_frame = cv2.merge([enhanced_l_channel, a_channel, b_channel])
        enhanced_frame = cv2.cvtColor(enhanced_lab_frame, cv2.COLOR_LAB2BGR)
        
        # Write the enhanced frame to the output video
        out.write(enhanced_frame)
    
    # Release VideoCapture and VideoWriter objects
    cap.release()
    out.release()
    cv2.destroyAllWindows()

# Example usage
input_video_path = 'video.mp4'
output_video_path = 'output_enhanced_video.avi'
enhance_video(input_video_path, output_video_path)


# Video Segmentation:

Partitioning a video into segments based on object boundaries or motion.

Video segmentation involves partitioning a video into segments, either by identifying object boundaries or by tracking motion. This process is crucial for various applications, such as object detection, activity recognition, video editing, and autonomous driving.

### Types of Video Segmentation

1. **Temporal Segmentation**:
   - Divides a video into distinct temporal segments, often corresponding to different scenes or events.
   - Techniques include shot boundary detection and scene change detection.

2. **Spatial Segmentation**:
   - Segments individual frames into regions based on spatial characteristics like color, texture, and edges.
   - Often used as a precursor to object recognition.

3. **Spatio-Temporal Segmentation**:
   - Considers both spatial and temporal dimensions to segment moving objects across multiple frames.
   - Combines motion information with spatial features to improve segmentation accuracy.

### Methods of Video Segmentation

#### 1. Manual Annotation
   - Human annotators label the objects or regions in each frame.
   - Extremely accurate but time-consuming and impractical for large datasets.

#### 2. Semi-Automatic Methods
   - Combine manual input with automated algorithms.
   - Human operators correct or refine segments generated by algorithms.

#### 3. Fully Automatic Methods
   - Rely on algorithms to segment videos without human intervention.
   - Include classical methods (e.g., clustering, thresholding) and modern methods (e.g., deep learning).

### Techniques and Algorithms

1. **Thresholding**:
   - Simple method based on pixel intensity.
   - Binary threshold: \( B(x,y) = 1 \) if \( I(x,y) > T \), otherwise \( 0 \), where \( I(x,y) \) is the pixel intensity and \( T \) is the threshold.

2. **Clustering**:
   - Groups similar pixels together.
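   - K-means clustering is commonly used. It partitions the pixel feature vectors \( x_j \) (e.g., colors) into \( k \) clusters \( S_1, \dots, S_k \) with centroids \( \mu_i \) by minimizing the within-cluster sum of squared distances:
     \[ J = \sum_{i=1}^{k} \sum_{x_j \in S_i} \| x_j - \mu_i \|^2 \]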

3. **Graph-Based Methods**:
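   - Pixels are graph nodes, and edge weights encode the similarity between pairs of pixels.
   - Normalized cut (Ncut) splits the graph into segments \( A \) and \( B \) by minimizing the cut cost relative to each segment's total connection to the whole graph \( V \):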
     \[ \text{Ncut}(A,B) = \frac{\text{cut}(A,B)}{\text{assoc}(A,V)} + \frac{\text{cut}(A,B)}{\text{assoc}(B,V)} \]

4. **Optical Flow**:
   - Estimates motion between consecutive frames.
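   - The Horn-Schunck method combines the brightness constancy constraint
     \[ I_x u + I_y v + I_t = 0 \]
     with a global smoothness term, minimizing \( \iint \left[ (I_x u + I_y v + I_t)^2 + \alpha^2 \left( \|\nabla u\|^2 + \|\nabla v\|^2 \right) \right] dx\,dy \).
     Here \( I_x, I_y, I_t \) are the partial derivatives of the image intensity, \( u, v \) are the flow components, and \( \alpha \) weights the smoothness term.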

5. **Superpixel Segmentation**:
   - Groups pixels into perceptually meaningful atomic regions.
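   - Simple Linear Iterative Clustering (SLIC) assigns each pixel to the nearby cluster center that minimizes the distance \( D \):
     \[ D = \sqrt{ \frac{d_c^2}{N_c^2} + \frac{d_s^2}{N_s^2} } \]
     where:
     - \( d_c \) is the color distance (in CIELAB) and \( d_s \) is the spatial distance;
     - \( N_c \) and \( N_s \) normalize the two terms: \( N_s \) is the grid interval \( S \) between initial centers, and \( N_c \) plays the role of the compactness parameter \( m \).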

6. **Deep Learning Methods**:
   - Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) for feature extraction and temporal coherence.
   - Fully Convolutional Networks (FCNs) for pixel-wise classification.
   - Example architectures: Mask R-CNN, U-Net, SegNet.
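Two of the classical techniques listed above can be sketched in a few lines of NumPy:

- **k-means:** Lloyd's algorithm, which lowers the objective \(J\) by alternating nearest-centroid assignment with centroid updates.
- **Ncut:** evaluating \(\text{Ncut}(A,B)\) for a given two-way partition, with \(\text{cut}(A,B)=\sum_{i\in A, j\in B} w_{ij}\) and \(\text{assoc}(A,V)=\sum_{i\in A, j\in V} w_{ij}\).

The toy frame, the affinity matrix, the random initialization and the fixed iteration count are illustrative assumptions. Searching for the minimizing Ncut partition (in practice via a relaxed eigenvector problem) is not shown.

```python
import numpy as np

# --- k-means (Lloyd's algorithm) on pixel colors ---
def assign(pixels, centroids):
    """Index of the nearest centroid for each pixel (squared Euclidean distance)."""
    d2 = ((pixels[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def kmeans(pixels, k, n_iter=20, seed=0):
    """Cluster an (N, C) array of pixel features; returns labels, centroids and J."""
    rng = np.random.default_rng(seed)
    centroids = pixels[rng.choice(len(pixels), size=k, replace=False)].astype(np.float64)
    for _ in range(n_iter):
        labels = assign(pixels, centroids)            # assignment step
        for i in range(k):                            # update step
            members = pixels[labels == i]
            if len(members) > 0:                      # leave an empty cluster's centroid as is
                centroids[i] = members.mean(axis=0)
    labels = assign(pixels, centroids)
    J = ((pixels - centroids[labels]) ** 2).sum()     # within-cluster sum of squares
    return labels, centroids, J

# --- Ncut value of a given two-way partition ---
def ncut_value(W, in_A):
    """Ncut(A, B) for a symmetric affinity matrix W and a boolean mask selecting A."""
    in_B = ~in_A
    cut = W[np.ix_(in_A, in_B)].sum()                 # weight of edges crossing A-B
    return cut / W[in_A].sum() + cut / W[in_B].sum()  # divide by assoc(A, V), assoc(B, V)

# Toy 20x20 frame: reddish left half, bluish right half, Gaussian noise (RGB)
rng = np.random.default_rng(0)
img = np.empty((20, 20, 3))
img[:, :10] = [200.0, 30.0, 30.0]
img[:, 10:] = [30.0, 30.0, 200.0]
img += rng.normal(0.0, 10.0, img.shape)
labels, centroids, J = kmeans(img.reshape(-1, 3), k=2)
label_map = labels.reshape(20, 20)
print(label_map[0], "J =", round(float(J), 1))

# Toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by one weak edge (2, 3)
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
                (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 0.1)]:
    W[i, j] = W[j, i] = w
good = np.array([True, True, True, False, False, False])
bad = np.array([True, True, False, False, False, False])
print("Ncut good:", ncut_value(W, good), "Ncut bad:", ncut_value(W, bad))
```

Cutting the weak edge gives a much lower Ncut than slicing through a triangle. The normalization also penalizes tiny, isolated segments that a plain minimum cut would favor.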

### Applications of Video Segmentation

1. **Object Tracking**:
   - Track objects across frames for surveillance and autonomous driving.

2. **Video Editing**:
   - Automatic editing tools for content creation.

3. **Activity Recognition**:
   - Identify human activities by analyzing segmented motion patterns.

4. **Medical Imaging**:
   - Segment anatomical structures in medical videos for diagnosis and treatment planning.

### Challenges

- **Complex Motion**: Handling complex and non-rigid motion patterns.
- **Occlusion**: Dealing with objects that partially or fully occlude each other.
- **Real-Time Processing**: Achieving real-time performance for applications like autonomous driving.
- **Generalization**: Ensuring models work well across diverse video types and environments.

### Evaluation Metrics

- **Intersection over Union (IoU)**:
  \[ \text{IoU} = \frac{\text{Area of Overlap}}{\text{Area of Union}} \]

- **Precision and Recall**:
  \[ \text{Precision} = \frac{TP}{TP + FP} \]
  \[ \text{Recall} = \frac{TP}{TP + FN} \]

- **Boundary F1 Score**:
  - Measures the accuracy of the predicted boundary against the ground truth.
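The pixel-level metrics follow directly from the counts of true positives, false positives and false negatives. The NumPy sketch below computes them for boolean masks. Scoring two empty masks as IoU 1 is an assumed convention.

```python
import numpy as np

def mask_metrics(pred, gt):
    """IoU, precision and recall for boolean masks (True = object pixel)."""
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    iou = tp / union if union else 1.0            # convention: two empty masks agree
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return iou, precision, recall

gt = np.zeros((10, 10), dtype=bool)
gt[2:6, 2:6] = True          # 16-pixel ground-truth square
pred = np.zeros((10, 10), dtype=bool)
pred[3:7, 3:7] = True        # prediction shifted by one pixel diagonally
print(mask_metrics(pred, gt))
```

A one-pixel shift of a 4×4 square already drops IoU to 9/23 (about 0.39). This shows how strict IoU is for small objects.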

In summary, video segmentation is a multifaceted problem involving temporal, spatial, and spatio-temporal aspects. It leverages various algorithms ranging from simple thresholding to advanced deep learning techniques to partition videos into meaningful segments for further analysis and application.

In [1]:
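import cv2
import numpy as np
import torch
from torchvision import models, transforms

# Load a pretrained instance-segmentation model (Mask R-CNN trained on COCO)
model = models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

# Define the transformation: HWC uint8 image -> CHW float tensor in [0, 1]
transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.ToTensor(),
])

# Load the video
video_path = 'video.mp4'
cap = cv2.VideoCapture(video_path)

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Preprocess the frame (OpenCV decodes BGR; the model expects RGB)
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    input_frame = transform(rgb).unsqueeze(0)

    # Perform segmentation
    with torch.no_grad():
        outputs = model(input_frame)

    # Post-process: keep confident detections and overlay their masks in green.
    # Each output dict holds 'boxes', 'labels', 'scores' and soft 'masks' of shape [N, 1, H, W];
    # the 0.5 score and mask thresholds are illustrative choices.
    result = outputs[0]
    keep = result['scores'] > 0.5
    masks = result['masks'][keep, 0] > 0.5
    overlay = frame.copy()
    if masks.shape[0] > 0:
        combined = masks.any(dim=0).numpy()
        overlay[combined] = (0.5 * overlay[combined] + 0.5 * np.array([0, 255, 0])).astype(np.uint8)

    # Display the resulting frame
    cv2.imshow('Frame', overlay)

    if cv2.waitKey(30) & 0xFF == ord('q'):
        break

# Release the video capture object and close windows
cap.release()
cv2.destroyAllWindows()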




In [2]:
import cv2
import numpy as np

def calculate_histogram(frame):
    """
    Calculate the histogram of a frame.
    """
    hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8], [0, 256, 0, 256, 0, 256])
    cv2.normalize(hist, hist)
    return hist.flatten()

def detect_shot_boundaries(video_path, threshold=0.5):
    """
    Detects shot boundaries in a video using histogram differences.
    Args:
        video_path (str): Path to the video file.
        threshold (float): Threshold for detecting shot boundaries.
    Returns:
        List of frame indices where shot boundaries occur.
    """
    cap = cv2.VideoCapture(video_path)
    ret, prev_frame = cap.read()
    
    if not ret:
        print("Failed to read video")
        return []
    
    prev_hist = calculate_histogram(prev_frame)
    frame_idx = 0
    shot_boundaries = []

    while True:
        ret, frame = cap.read()
        if not ret:
            break
        
        curr_hist = calculate_histogram(frame)
        hist_diff = cv2.compareHist(prev_hist, curr_hist, cv2.HISTCMP_BHATTACHARYYA)
        
        if hist_diff > threshold:
            shot_boundaries.append(frame_idx + 1)  # index of the first frame of the new shot
        
        prev_hist = curr_hist
        frame_idx += 1
    
    cap.release()
    return shot_boundaries

# Example usage
video_path = 'video.mp4'
shot_boundaries = detect_shot_boundaries(video_path, threshold=0.5)
print("Detected shot boundaries at frames:", shot_boundaries)


Detected shot boundaries at frames: []


In [3]:
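def calculate_edge_change_ratio(prev_frame, curr_frame):
    """
    Simplified edge-change measure between two BGR frames: the fraction of pixels
    whose Canny edge label differs. (The classic Edge Change Ratio instead compares
    entering and exiting edge pixels after dilating the edge maps.)
    """
    prev_edges = cv2.Canny(cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY), 100, 200)
    curr_edges = cv2.Canny(cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY), 100, 200)
    
    diff = cv2.absdiff(prev_edges, curr_edges)
    non_zero_diff = np.count_nonzero(diff)
    
    ecr = non_zero_diff / (prev_frame.shape[0] * prev_frame.shape[1])
    return ecr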

def detect_shot_boundaries_with_ecr(video_path, hist_threshold=0.5, ecr_threshold=0.02):
    cap = cv2.VideoCapture(video_path)
    ret, prev_frame = cap.read()
    
    if not ret:
        print("Failed to read video")
        return []
    
    prev_hist = calculate_histogram(prev_frame)
    frame_idx = 0
    shot_boundaries = []

    while True:
        ret, frame = cap.read()
        if not ret:
            break
        
        curr_hist = calculate_histogram(frame)
        hist_diff = cv2.compareHist(prev_hist, curr_hist, cv2.HISTCMP_BHATTACHARYYA)
        ecr = calculate_edge_change_ratio(prev_frame, frame)
        
        if hist_diff > hist_threshold and ecr > ecr_threshold:
            shot_boundaries.append(frame_idx + 1)  # index of the first frame of the new shot
        
        prev_hist = curr_hist
        prev_frame = frame
        frame_idx += 1
    
    cap.release()
    return shot_boundaries

# Example usage
video_path = 'video.mp4'
shot_boundaries = detect_shot_boundaries_with_ecr(video_path, hist_threshold=0.5, ecr_threshold=0.02)
print("Detected shot boundaries at frames:", shot_boundaries)


Detected shot boundaries at frames: []


In [4]:
import cv2
import numpy as np
from skimage.segmentation import slic, mark_boundaries

def segment_image_slic(image, n_segments=400, compactness=10):
    """
    Segments an image using the SLIC superpixel algorithm.
    Args:
        image (numpy.ndarray): Input image in OpenCV's BGR channel order.
        n_segments (int): The approximate number of superpixels.
        compactness (float): Balances color proximity and space proximity.
    Returns:
        segments (numpy.ndarray): Label map assigning a superpixel index to every pixel.
    """
    # slic expects RGB and converts it to CIELAB internally, so pass RGB rather than Lab
    rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    segments = slic(rgb_image, n_segments=n_segments, compactness=compactness)
    return segments

def apply_edge_detection(image):
    """
    Applies edge detection on the image.
    Args:
        image (numpy.ndarray): Input image.
    Returns:
        edges (numpy.ndarray): Image with edges detected.
    """
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)
    return edges

def draw_contours(image, segments):
    """
    Draws contours on the segmented image.
    Args:
        image (numpy.ndarray): Input image.
        segments (numpy.ndarray): Binary 8-bit mask to outline (the example passes the Canny edge map).
    Returns:
        image_with_contours (numpy.ndarray): Image with contours drawn.
    """
    contours, _ = cv2.findContours(segments, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    image_with_contours = image.copy()
    cv2.drawContours(image_with_contours, contours, -1, (0, 255, 0), 2)
    return image_with_contours

# Example usage
image_path = 'My.jpg'
image = cv2.imread(image_path)
segments = segment_image_slic(image, n_segments=400, compactness=10)

edges = apply_edge_detection(image)
image_with_contours = draw_contours(image, edges)

# Display the results
cv2.imshow('Original Image', image)
cv2.imshow('Superpixels', mark_boundaries(image, segments))
cv2.imshow('Edges', edges)
cv2.imshow('Contours', image_with_contours)
cv2.waitKey(0)
cv2.destroyAllWindows()


In [5]:
import cv2
import numpy as np
from skimage.segmentation import slic, mark_boundaries

def segment_image_slic(frame, n_segments=400, compactness=10):
    """
    Segments an image using the SLIC superpixel algorithm.
    Args:
        frame (numpy.ndarray): Input frame in OpenCV's BGR channel order.
        n_segments (int): The approximate number of superpixels.
        compactness (float): Balances color proximity and space proximity.
    Returns:
        segments (numpy.ndarray): Label map assigning a superpixel index to every pixel.
    """
    # slic expects RGB and converts it to CIELAB internally, so pass RGB rather than Lab
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    segments = slic(rgb_frame, n_segments=n_segments, compactness=compactness)
    return segments

def apply_background_subtraction(frame, back_subtractor):
    """
    Applies background subtraction to detect motion.
    Args:
        frame (numpy.ndarray): Input frame.
        back_subtractor (cv2.BackgroundSubtractor): Background subtractor object.
    Returns:
        fg_mask (numpy.ndarray): Foreground mask.
    """
    fg_mask = back_subtractor.apply(frame)
    return fg_mask

def combine_segmentation_and_motion(segments, fg_mask):
    """
    Combines spatial segmentation with motion detection.
    Args:
        segments (numpy.ndarray): Segmented frame.
        fg_mask (numpy.ndarray): Foreground mask.
    Returns:
        combined_mask (numpy.ndarray): Combined segmentation mask.
    """
    combined_mask = np.zeros_like(fg_mask)
    for segment_val in np.unique(segments):
        segment_mask = (segments == segment_val).astype(np.uint8)
        # MOG2 with detectShadows=True marks shadows as 127; count only definite foreground (255)
        if np.any(fg_mask[segment_mask == 1] == 255):
            combined_mask[segment_mask == 1] = 255
    return combined_mask

# Example usage
video_path = 'video.mp4'
cap = cv2.VideoCapture(video_path)
back_subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16, detectShadows=True)

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    
    # Apply SLIC superpixel segmentation
    segments = segment_image_slic(frame, n_segments=400, compactness=10)
    
    # Apply background subtraction
    fg_mask = apply_background_subtraction(frame, back_subtractor)
    
    # Combine segmentation with motion detection
    combined_mask = combine_segmentation_and_motion(segments, fg_mask)
    
    # Visualize the results
    frame_with_boundaries = mark_boundaries(frame, segments)
    combined_visual = cv2.bitwise_and(frame, frame, mask=combined_mask)
    
    cv2.imshow('Original Frame', frame)
    cv2.imshow('Superpixels', frame_with_boundaries)
    cv2.imshow('Foreground Mask', fg_mask)
    cv2.imshow('Combined Segmentation', combined_visual)
    
    if cv2.waitKey(30) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()


# Frame Interpolation:

Generating intermediate frames between existing frames to smooth motion or increase the frame rate.

Frame interpolation is a technique used in video processing to generate intermediate frames between existing frames, with the goal of smoothing motion or increasing the frame rate of a video. This is particularly useful when converting videos from lower frame rates (e.g., 24 fps) to higher frame rates (e.g., 60 fps), or for slow-motion video, where additional frames are needed to fill the gaps between the original frames.

There are several methods for frame interpolation, ranging from simple linear blending to more sophisticated algorithms involving motion estimation and compensation. One common approach is optical flow-based interpolation, which estimates the motion between adjacent frames and generates intermediate frames based on this motion information.

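The basic constraint for optical flow estimation follows from the brightness constancy assumption, \(I(x + u, y + v, t + 1) = I(x, y, t)\), linearized with a first-order Taylor expansion: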

\[I_x \cdot u + I_y \cdot v + I_t = 0\]

Where:
- \(I_x\), \(I_y\): Spatial gradients of the image intensity in the x and y directions.
- \(u\), \(v\): Horizontal and vertical components of the optical flow vector.
- \(I_t\): Temporal gradient of the image intensity.

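This is a single equation in two unknowns per pixel (the aperture problem), so optical flow algorithms add further assumptions to estimate the motion vectors (\(u\), \(v\)) between consecutive frames: Lucas-Kanade assumes the flow is constant within a small window and solves the resulting over-determined system by least squares, while Horn-Schunck adds a global smoothness term. Once the motion vectors are obtained, intermediate frames can be generated by warping pixels from one frame to another based on the estimated motion.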
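To make the least-squares step concrete, here is a minimal NumPy sketch of the Lucas-Kanade solve for a single window. It assumes the gradients `Ix`, `Iy`, `It` inside the window have already been computed (e.g. by finite differences); the function name `lucas_kanade_window` and the conditioning threshold are illustrative choices, not part of any library.

```python
import numpy as np

def lucas_kanade_window(Ix, Iy, It, max_cond=1e6):
    """Least-squares flow (u, v) for one window, assuming constant motion inside it."""
    # One constraint Ix*u + Iy*v = -It per pixel in the window
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)  # shape (n_pixels, 2)
    b = -It.ravel()                                  # shape (n_pixels,)
    # Normal equations: (A^T A) [u, v]^T = A^T b, where A^T A is the 2x2 structure tensor
    ATA = A.T @ A
    if np.linalg.cond(ATA) > max_cond:
        # Nearly singular: flat region or a single edge (aperture problem)
        return None
    u, v = np.linalg.solve(ATA, A.T @ b)
    return float(u), float(v)

# Synthetic check: gradients consistent with a known motion (u, v) = (1.5, -0.5)
rng = np.random.default_rng(0)
Ix = rng.normal(size=(15, 15))
Iy = rng.normal(size=(15, 15))
It = -(Ix * 1.5 + Iy * -0.5)
print(lucas_kanade_window(Ix, Iy, It))  # approximately (1.5, -0.5)
```

Returning `None` for an ill-conditioned window mirrors the aperture problem: with no gradient in one direction, the motion along that direction cannot be recovered from this window alone.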

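A common way to generate an intermediate frame is to warp both neighboring frames to the target time and then take a weighted average. Given two frames \(I_1\) (at \(t = 0\)) and \(I_2\) (at \(t = 1\)) and the optical flow \((u, v)\) from \(I_1\) to \(I_2\), and assuming linear motion, a pixel at \((x, y)\) in the intermediate frame at time \(t \in [0, 1]\) came from \((x - u \cdot t, y - v \cdot t)\) in \(I_1\) and moves to \((x + u \cdot (1 - t), y + v \cdot (1 - t))\) in \(I_2\). The intermediate frame \(I_{\text{interpolated}}\) can therefore be computed as:

\[I_{\text{interpolated}}(x, y, t) = (1 - \alpha) \cdot I_1(x - u \cdot t, y - v \cdot t) + \alpha \cdot I_2(x + u \cdot (1 - t), y + v \cdot (1 - t))\]

Where:
- \(x\), \(y\): Spatial coordinates.
- \(t\): Normalized time of the intermediate frame between \(I_1\) and \(I_2\).
- \(\alpha\): Interpolation (blending) factor in \([0, 1]\), usually set to \(\alpha = t\) so that the temporally closer frame contributes more.

This equation blends pixels from the two neighboring frames after compensating for their estimated motion. Because the flow is estimated on the pixel grid of \(I_1\) rather than that of the intermediate frame, practical implementations often approximate \((u, v)\) at \((x, y)\) by the flow at the same position in \(I_1\).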

More advanced interpolation techniques may involve temporal filtering, motion-compensated prediction, or deep learning-based approaches for better quality and accuracy.

Overall, frame interpolation is a powerful tool for enhancing video quality, improving motion smoothness, and enabling various video processing applications. However, it is computationally demanding, and it can introduce artifacts such as ghosting or distorted edges around occlusions and fast motion, where the estimated flow is unreliable.

In [6]:
import cv2
import numpy as np

def interpolate_frames(frame1, frame2, flow, alpha):
    """
    Interpolate an intermediate frame using optical-flow-based warping and linear blending.

    Args:
    - frame1: First frame (numpy array, time 0).
    - frame2: Second frame (numpy array, time 1).
    - flow: Dense optical flow from frame1 to frame2 (numpy array of shape [height, width, 2]).
    - alpha: Interpolation factor in [0, 1], used as the normalized time of the new frame.

    Returns:
    - interpolated_frame: Interpolated frame (numpy array, same dtype as frame1).
    """
    # Get dimensions of frames
    height, width = frame1.shape[:2]

    # Pixel coordinate grids of the intermediate frame
    grid_x, grid_y = np.meshgrid(np.arange(width, dtype=np.float32),
                                 np.arange(height, dtype=np.float32))
    # Approximation: reuse the flow at (x, y) as the motion through the intermediate pixel
    u = flow[..., 0].astype(np.float32)
    v = flow[..., 1].astype(np.float32)

    # Backward-warp frame1 by sampling it at (x - alpha*u, y - alpha*v)
    warped1 = cv2.remap(frame1, grid_x - alpha * u, grid_y - alpha * v,
                        interpolation=cv2.INTER_LINEAR, borderMode=cv2.BORDER_REPLICATE)
    # Backward-warp frame2 by sampling it at (x + (1 - alpha)*u, y + (1 - alpha)*v)
    warped2 = cv2.remap(frame2, grid_x + (1 - alpha) * u, grid_y + (1 - alpha) * v,
                        interpolation=cv2.INTER_LINEAR, borderMode=cv2.BORDER_REPLICATE)

    # Linear blending in float to avoid uint8 overflow, then convert back (assumes 8-bit frames)
    blended = (1 - alpha) * warped1.astype(np.float32) + alpha * warped2.astype(np.float32)
    interpolated_frame = np.clip(blended, 0, 255).astype(frame1.dtype)

    return interpolated_frame


In [11]:
import cv2
import numpy as np

def compute_optical_flow(frame1, frame2):
    """
    Compute dense optical flow using Farneback's method.

    Sparse Lucas-Kanade (cv2.calcOpticalFlowPyrLK) only tracks a given set of
    points, so it cannot produce the per-pixel flow field that interpolate_frames needs.

    Args:
    - frame1: First frame (numpy array, BGR).
    - frame2: Second frame (numpy array, BGR).

    Returns:
    - flow: Optical flow vectors from frame1 to frame2 (numpy array of shape [height, width, 2]).
    """
    # Convert frames to grayscale
    gray1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
    gray2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)

    # Positional Farneback parameters:
    # pyr_scale=0.5, levels=3, winsize=15, iterations=3, poly_n=5, poly_sigma=1.2, flags=0
    flow = cv2.calcOpticalFlowFarneback(gray1, gray2, None, 0.5, 3, 15, 3, 5, 1.2, 0)

    return flow

# Example usage: read two consecutive frames from a video
cap = cv2.VideoCapture('video.mp4')
ret1, frame1 = cap.read()
ret2, frame2 = cap.read()
cap.release()
if not (ret1 and ret2):
    raise RuntimeError('Could not read two frames from video.mp4')

# Compute optical flow
flow = compute_optical_flow(frame1, frame2)

# Interpolation factor (0.5 = midway between the two frames)
alpha = 0.5

# Interpolate frames
interpolated_frame = interpolate_frames(frame1, frame2, flow, alpha)

# Display or save interpolated frame
cv2.imshow('Interpolated Frame', interpolated_frame)
cv2.waitKey(0)
cv2.destroyAllWindows()


# Video Summarization:

Generating concise representations of videos by selecting key frames or segments.

Video summarization is a process that involves creating a concise and informative summary of a video by selecting key frames or segments that capture the most important content. This process is particularly useful for efficiently browsing large video collections, enabling quick access to the most relevant parts of a video, and reducing the time required to view the entire content.

### Types of Video Summarization

1. **Static Video Summarization (Keyframe Extraction):**
   - **Objective:** Select a set of representative frames (keyframes) from the video that best capture the main content.
   - **Approach:** Techniques can include clustering, edge detection, motion analysis, and more.

2. **Dynamic Video Summarization (Video Skimming):**
   - **Objective:** Create a shorter video that includes the most important segments.
   - **Approach:** Techniques often involve identifying significant scenes based on various criteria such as motion, audio, and semantic content.

### Key Techniques in Video Summarization

#### 1. Clustering-based Methods
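- **K-means Clustering:** Frames are grouped into clusters based on visual features (e.g., color histograms). Since a centroid is an average feature vector rather than an actual frame, the frame closest to each centroid is typically chosen as the keyframe.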
- **Equation:**
  \[
  J = \sum_{i=1}^{k} \sum_{x \in C_i} \| x - \mu_i \|^2
  \]
  where \( J \) is the objective function, \( k \) is the number of clusters, \( x \) is a data point (frame), and \( \mu_i \) is the centroid of cluster \( C_i \).
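A minimal NumPy sketch of the two quantities above, assuming per-frame feature vectors `X` (for instance the flattened color histograms used in the summarization code in this section), cluster labels, and centroids from any K-means implementation. The helper names are illustrative.

```python
import numpy as np

def kmeans_objective(X, labels, centroids):
    """J: sum over all frames of the squared distance to their cluster centroid."""
    return float(np.sum((X - centroids[labels]) ** 2))

def nearest_to_centroid(X, labels, centroids):
    """For each cluster, return the index of the frame closest to its centroid."""
    keyframe_indices = []
    for k in range(len(centroids)):
        members = np.flatnonzero(labels == k)
        if members.size == 0:
            continue  # empty cluster: no keyframe
        dists = np.linalg.norm(X[members] - centroids[k], axis=1)
        keyframe_indices.append(int(members[np.argmin(dists)]))
    return sorted(keyframe_indices)  # temporal order

# Toy data: six 2-D "frame features" forming two obvious clusters
X = np.array([[0, 0], [0, 1], [0, 2], [10, 10], [10, 11], [10, 12]], dtype=float)
labels = np.array([0, 0, 0, 1, 1, 1])
centroids = np.array([[0, 1], [10, 11]], dtype=float)
print(kmeans_objective(X, labels, centroids))     # 4.0
print(nearest_to_centroid(X, labels, centroids))  # [1, 4]
```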

#### 2. Graph-based Methods
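- **Shot Boundary Detection:** The video is first divided into shots; a graph is then constructed where nodes represent shots (or individual frames) and edge weights represent their similarity.
- **Dominant Set Clustering:** Finds a maximally coherent subset of nodes in the graph, whose members serve as the most representative frames.
- **Equation:**
  \[
  W(i,j) = \text{similarity}(i,j)
  \]
  where \( W(i,j) \) is the weight (similarity) between nodes \( i \) and \( j \). A common choice is a Gaussian kernel on feature distance, \( W(i,j) = \exp\left(-\|f_i - f_j\|^2 / (2\sigma^2)\right) \), with \( W(i,i) = 0 \).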
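As a sketch of dominant-set extraction, the code below builds a Gaussian affinity matrix from per-node feature vectors and runs discrete-time replicator dynamics, \( x_i \leftarrow x_i (Wx)_i / (x^\top W x) \), from the center of the simplex. The feature vectors, `sigma`, iteration limits, and the support threshold are illustrative assumptions; nodes whose weight stays above the threshold form the dominant set, and the weights themselves can rank representativeness.

```python
import numpy as np

def gaussian_affinity(features, sigma=1.0):
    """W(i, j) = exp(-||f_i - f_j||^2 / (2 sigma^2)) with zero self-similarity."""
    diff = features[:, None, :] - features[None, :, :]
    W = np.exp(-np.sum(diff ** 2, axis=-1) / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def dominant_set(W, n_iter=1000, tol=1e-8, support_eps=1e-4):
    """Extract one dominant set with discrete-time replicator dynamics."""
    n = W.shape[0]
    x = np.full(n, 1.0 / n)  # start at the barycenter of the simplex
    for _ in range(n_iter):
        Wx = W @ x
        x_new = x * Wx / (x @ Wx)  # stays on the simplex: entries sum to 1
        converged = np.abs(x_new - x).sum() < tol
        x = x_new
        if converged:
            break
    members = np.flatnonzero(x > support_eps)
    return members, x

# Four near-identical shots plus two outliers
feats = np.array([[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1], [5, 5], [-5, 5]])
members, weights = dominant_set(gaussian_affinity(feats, sigma=1.0))
print(members)  # [0 1 2 3]
```

In a full summarizer, one would typically remove the extracted set from the graph and repeat to obtain further clusters.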

#### 3. Semantic Analysis
- **Object and Scene Recognition:** Use of deep learning models to recognize objects and scenes, which helps in identifying important segments.
- **Text and Speech Analysis:** Transcription and analysis of spoken content to find important segments.
- **Equation:**
  \[
  \text{Importance}(f) = \alpha \cdot \text{VisualScore}(f) + \beta \cdot \text{AudioScore}(f) + \gamma \cdot \text{TextScore}(f)
  \]
  where \( \alpha, \beta, \gamma \) are weights, and \( f \) is a frame or segment.
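A minimal sketch of this weighted fusion, assuming each modality already yields one raw score per segment. Scores are min-max normalized first so the weights are comparable; the weights 0.5/0.3/0.2 and the top-k selection rule are illustrative assumptions, not prescribed values.

```python
import numpy as np

def minmax(scores):
    """Rescale scores to [0, 1]; a constant score vector maps to all zeros."""
    s = np.asarray(scores, dtype=float)
    span = s.max() - s.min()
    return np.zeros_like(s) if span == 0 else (s - s.min()) / span

def importance(visual, audio, text, alpha=0.5, beta=0.3, gamma=0.2):
    """Importance(f) = alpha*Visual + beta*Audio + gamma*Text on normalized scores."""
    return alpha * minmax(visual) + beta * minmax(audio) + gamma * minmax(text)

def select_top_segments(scores, k):
    """Indices of the k highest-scoring segments, returned in temporal order."""
    top = np.argsort(scores)[::-1][:k]
    return sorted(int(i) for i in top)

scores = importance(visual=[0.1, 0.9, 0.5, 0.3], audio=[1, 5, 3, 2], text=[0, 0, 1, 0])
print(scores)                          # approximately [0.0, 0.8, 0.6, 0.2]
print(select_top_segments(scores, 2))  # [1, 2]
```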

#### 4. Attention Mechanisms
- **Deep Learning Models:** Use of attention mechanisms to focus on important parts of the video. Models such as transformers can weigh the importance of different parts of the input.
- **Equation:**
  \[
  \text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V
  \]
  where \( Q \) is the query, \( K \) is the key, \( V \) is the value, and \( d_k \) is the dimension of the key.
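The attention formula translates directly into a few lines of NumPy. This sketch omits the learned projections that produce \( Q \), \( K \), and \( V \) in a real transformer; using raw frame features for all three (self-attention) is an illustrative assumption.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, returning the output and the attention weights."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Subtracting the row maximum keeps exp() from overflowing; softmax is unchanged
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Self-attention over 5 frames with 8-dimensional features
rng = np.random.default_rng(0)
frames = rng.normal(size=(5, 8))
out, w = scaled_dot_product_attention(frames, frames, frames)
print(out.shape, w.shape)  # (5, 8) (5, 5)
```

Each row of `w` describes how much one frame attends to every other frame; summarization models typically feed `out` into a scoring head to predict per-frame importance.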

### Evaluation Metrics
- **Precision and Recall:** Measure the accuracy of the selected keyframes or segments compared to a ground truth.
  \[
  \text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}
  \]
  \[
  \text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}
  \]
- **F-Score:** Harmonic mean of precision and recall.
  \[
  F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
  \]
- **User Studies:** Subjective evaluation by human viewers to assess the quality and usefulness of the summaries.
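A minimal sketch of the three formulas for keyframe summaries. It assumes predictions and ground truth are sets of frame indices matched exactly; published benchmarks often allow a temporal tolerance or match keyframes by visual similarity instead.

```python
def precision_recall_f1(predicted, ground_truth):
    """Precision, recall, and F1 for sets of selected frame indices (exact matching)."""
    predicted, ground_truth = set(predicted), set(ground_truth)
    tp = len(predicted & ground_truth)  # selected and in the ground truth
    fp = len(predicted - ground_truth)  # selected but not in the ground truth
    fn = len(ground_truth - predicted)  # in the ground truth but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

print(precision_recall_f1([10, 20, 30, 40], [20, 30, 50]))
# precision 0.5, recall about 0.667, F1 about 0.571
```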

### Applications
- **Content Browsing:** Quickly navigate through large video datasets.
- **Video Surveillance:** Identify key events from security footage.
- **Media and Entertainment:** Generate trailers or highlights for movies and sports.
- **Education:** Create concise educational content.

### Challenges
- **Diversity:** Ensuring the summary captures a wide range of content.
- **Relevance:** Selecting the most relevant frames or segments.
- **Temporal Coherence:** Maintaining a logical flow in the summarized video.
- **Scalability:** Efficiently summarizing videos with varying lengths and content types.

### Future Directions
- **Multimodal Summarization:** Combining visual, auditory, and textual information for richer summaries.
- **Personalization:** Tailoring summaries based on user preferences and viewing history.
- **Real-time Summarization:** Developing systems that can summarize live video streams.

In summary, video summarization leverages a variety of techniques from machine learning, computer vision, and natural language processing to create concise and informative representations of videos. By addressing the challenges and improving the techniques, the field continues to evolve towards more efficient and effective summarization methods.

In [1]:
import cv2
import numpy as np
from sklearn.cluster import KMeans
import os

def extract_frames(video_path, interval=30):
    """Extract frames from a video at regular intervals."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    success, frame = cap.read()
    count = 0

    while success:
        if count % interval == 0:
            frames.append(frame)
        success, frame = cap.read()
        count += 1

    cap.release()
    return frames

def compute_histograms(frames):
    """Compute color histograms for a list of frames."""
    histograms = []
    for frame in frames:
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8], [0, 256, 0, 256, 0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        histograms.append(hist)
    return histograms

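def cluster_histograms(histograms, n_clusters=5):
    """Cluster histograms using K-means."""
    # K-means needs at least as many samples as clusters (short videos may yield few frames)
    n_clusters = min(n_clusters, len(histograms))
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
    labels = kmeans.fit_predict(np.array(histograms))
    return labels

def select_key_frames(frames, labels):
    """Select one key frame from each cluster (its temporally middle member)."""
    key_frame_indices = []
    for cluster in np.unique(labels):
        indices = np.where(labels == cluster)[0]
        key_frame_indices.append(indices[len(indices) // 2])
    # Return key frames in temporal order rather than cluster-label order
    return [frames[i] for i in sorted(key_frame_indices)]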

def save_frames(frames, output_dir):
    """Save key frames to the specified directory."""
    # exist_ok=True makes a separate existence check unnecessary
    os.makedirs(output_dir, exist_ok=True)
    
    for i, frame in enumerate(frames):
        output_path = os.path.join(output_dir, f"key_frame_{i+1}.jpg")
        cv2.imwrite(output_path, frame)

def summarize_video(video_path, output_dir, frame_interval=30, n_clusters=5):
    """Summarize the video by selecting key frames."""
    frames = extract_frames(video_path, frame_interval)
    if not frames:
        raise ValueError(f"No frames could be read from {video_path}")
    histograms = compute_histograms(frames)
    labels = cluster_histograms(histograms, n_clusters)
    key_frames = select_key_frames(frames, labels)
    save_frames(key_frames, output_dir)

# Example usage
video_path = 'video.mp4'
output_dir = 'path/to/output/directory'
summarize_video(video_path, output_dir)




# Video Annotation:

Adding metadata or labels to video frames for analysis or visualization.

Video annotation is the process of adding metadata or labels to video frames to facilitate analysis, visualization, and various applications like object detection, activity recognition, and machine learning model training. This process involves identifying and marking elements within the video frames, such as objects, actions, and events, to create a structured dataset.

### Types of Video Annotation

1. **Bounding Boxes**:
   - Rectangular boxes are drawn around objects to define their position and size.
   - **Equation**: 
     \[
     \text{Bounding Box} = (x, y, w, h)
     \]
     where \((x, y)\) is the top-left corner, and \(w, h\) are the width and height.

2. **Polygonal Segmentation**:
   - Polygons are drawn around objects to capture their precise shape.
   - **Equation**:
     \[
     \text{Polygon} = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}
     \]
     where \((x_i, y_i)\) are the vertices of the polygon.

3. **Key Points**:
   - Specific points on an object are marked, often used for facial features or body joints.
   - **Equation**:
     \[
     \text{Key Points} = \{(x_1, y_1), (x_2, y_2), \ldots, (x_k, y_k)\}
     \]
     where \((x_i, y_i)\) are the coordinates of key points.

4. **Lines and Splines**:
   - Lines or curves are drawn to annotate boundaries or paths.
   - **Equation** (for a straight line segment; polylines and splines extend this to an ordered list of points):
     \[
     \text{Line} = \{(x_1, y_1), (x_2, y_2)\}
     \]

5. **Semantic Segmentation**:
   - Each pixel is labeled with a class, creating a mask for the object.
   - **Equation**:
     \[
     M \in \{0, 1, \ldots, C - 1\}^{H \times W}
     \]
     where \(M_{ij}\) is the class label for the pixel at position \((i, j)\), \(C\) is the number of classes, and \(H \times W\) is the frame size.
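The geometric representations above are easy to convert between. This sketch shows conversion between the \((x, y, w, h)\) box format and two-corner format, plus the area of a polygon annotation via the shoelace formula. The helper names are illustrative, and image coordinates are assumed to grow rightward and downward.

```python
def xywh_to_xyxy(box):
    """(x, y, w, h) with (x, y) the top-left corner -> (x1, y1, x2, y2)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)

def xyxy_to_xywh(box):
    """Two opposite corners in any order -> (x, y, w, h)."""
    x1, y1, x2, y2 = box
    # min/abs make the result independent of which corner comes first (e.g. drag direction)
    return (min(x1, x2), min(y1, y2), abs(x2 - x1), abs(y2 - y1))

def polygon_area(points):
    """Shoelace formula: area enclosed by vertices [(x1, y1), ..., (xn, yn)]."""
    n = len(points)
    s = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

print(xywh_to_xyxy((10, 20, 30, 40)))                 # (10, 20, 40, 60)
print(xyxy_to_xywh((40, 60, 10, 20)))                 # (10, 20, 30, 40)
print(polygon_area([(0, 0), (4, 0), (4, 3), (0, 3)]))  # 12.0
```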

### Applications of Video Annotation

1. **Object Detection and Tracking**:
   - Used in autonomous driving, surveillance, and robotics to identify and follow objects.

2. **Action Recognition**:
   - Understanding and classifying actions in sports, security, and entertainment.

3. **Healthcare**:
   - Analyzing medical videos for diagnosis and research.

4. **Training Machine Learning Models**:
   - Creating labeled datasets to train models for computer vision tasks.

### Process of Video Annotation

1. **Data Collection**:
   - Gather videos from various sources such as cameras, drones, or synthetic data.

2. **Annotation Tools**:
   - Use tools like CVAT, Labelbox, or VATIC to add annotations.
   
3. **Annotation**:
   - Human annotators or automated systems mark the frames with the required labels.
   
4. **Quality Control**:
   - Review and correct annotations to ensure accuracy and consistency.
   
5. **Exporting Annotations**:
   - Save the annotations in formats like JSON, XML, or CSV for further use.

### Example of a Simple Annotation Workflow

1. **Load Video**:
   ```python
   import cv2
   
   video_path = "video.mp4"
   cap = cv2.VideoCapture(video_path)
   ```

2. **Annotate Frames**:
   ```python
   annotations = []
   
   while cap.isOpened():
       ret, frame = cap.read()
       if not ret:
           break
       
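       # detect_objects is a placeholder for any detector that returns
       # (x, y, width, height) boxes; it is not defined in this example
       bounding_boxes = detect_objects(frame)
       
       # After read(), CAP_PROP_POS_FRAMES points at the next frame,
       # so this is the 1-based index of the frame just read
       frame_number = int(cap.get(cv2.CAP_PROP_POS_FRAMES))
       for box in bounding_boxes:
           # int() casts keep NumPy scalars JSON-serializable
           annotations.append({
               'frame': frame_number,
               'x': int(box[0]),
               'y': int(box[1]),
               'width': int(box[2]),
               'height': int(box[3])
           })
   
   cap.release()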
   ```

3. **Save Annotations**:
   ```python
   import json
   
   with open("annotations.json", "w") as f:
       json.dump(annotations, f)
   ```

### Mathematical Models and Algorithms

1. **Intersection over Union (IoU)**:
   - Measures the overlap between two bounding boxes.
   - **Equation**:
     \[
     \text{IoU} = \frac{\text{Area of Overlap}}{\text{Area of Union}}
     \]

2. **Non-Maximum Suppression (NMS)**:
   - Reduces redundant bounding boxes by selecting the best ones.
   - **Algorithm**:
     1. Sort detected boxes by confidence score.
     2. Select the highest score box and remove boxes with IoU above a threshold.
     3. Repeat until no more boxes are left.
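A plain-Python sketch of both algorithms, assuming boxes in \((x_1, y_1, x_2, y_2)\) corner format with continuous coordinates (no +1 pixel convention) and an illustrative IoU threshold of 0.5. Detection libraries ship optimized versions; this only illustrates the logic.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero if the boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    # 1. Sort box indices by confidence score, highest first
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        # 2. Keep the best remaining box and drop boxes overlapping it too much
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
        # 3. The loop repeats until no boxes are left
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
print(nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]
```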

### Conclusion

Video annotation is a crucial step in the preparation of data for video analysis and computer vision applications. By accurately labeling video frames, we can train and evaluate models that perform tasks like object detection, tracking, and action recognition. The choice of annotation method and tools depends on the specific requirements of the project and the nature of the video data.

In [3]:
import cv2
import json

# Global state shared with the mouse callback
drawing = False
ix, iy = -1, -1
frame = None
frame_idx = 0
annotations = []  # one dict per box, tagged with the frame it belongs to

# Mouse callback: press, drag, and release the left button to draw a box
def draw_rectangle(event, x, y, flags, param):
    global ix, iy, drawing

    if event == cv2.EVENT_LBUTTONDOWN:
        drawing = True
        ix, iy = x, y

    elif event == cv2.EVENT_MOUSEMOVE:
        if drawing:
            # Preview on a copy so the frame is only modified on release
            frame_temp = frame.copy()
            cv2.rectangle(frame_temp, (ix, iy), (x, y), (0, 255, 0), 2)
            cv2.imshow('frame', frame_temp)

    elif event == cv2.EVENT_LBUTTONUP:
        drawing = False
        cv2.rectangle(frame, (ix, iy), (x, y), (0, 255, 0), 2)
        # Store as (x, y, w, h) with (x, y) the top-left corner, whatever the drag direction
        annotations.append({'frame': frame_idx,
                            'x': min(ix, x), 'y': min(iy, y),
                            'width': abs(x - ix), 'height': abs(y - iy)})

def annotate_video(video_path, output_path='annotations.json'):
    global frame, frame_idx
    cap = cv2.VideoCapture(video_path)
    frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frame_idx = 0

    # Create the window and register the callback once, not on every frame
    cv2.namedWindow('frame')
    cv2.setMouseCallback('frame', draw_rectangle)

    quit_requested = False
    while cap.isOpened() and not quit_requested:
        ret, frame = cap.read()
        if not ret:
            break
        frame_idx += 1

        while True:
            cv2.imshow('frame', frame)
            key = cv2.waitKey(1) & 0xFF
            if key == ord('n'):  # Next frame; boxes drawn so far are kept
                break
            if key == ord('q'):  # Quit early; annotations are still saved below
                quit_requested = True
                break

        print(f'Processed frame {frame_idx}/{frame_count}')

    cap.release()
    cv2.destroyAllWindows()

    # Save the annotations of all processed frames
    with open(output_path, 'w') as f:
        json.dump(annotations, f, indent=2)

if __name__ == "__main__":
    video_path = 'video.mp4'
    annotate_video(video_path)

