<a href="https://colab.research.google.com/github/vamshishashikrishna/LearnAnalytics/blob/main/Bike_Lane_Detection_with_Mask2Former.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# IMPORTANT: RUN THIS CELL IN ORDER TO IMPORT YOUR KAGGLE DATA SOURCES,
# THEN FEEL FREE TO DELETE THIS CELL.
# NOTE: THIS NOTEBOOK ENVIRONMENT DIFFERS FROM KAGGLE'S PYTHON
# ENVIRONMENT SO THERE MAY BE MISSING LIBRARIES USED BY YOUR
# NOTEBOOK.
import kagglehub
andandand_roads_of_berlin_path = kagglehub.dataset_download('andandand/roads-of-berlin')

print('Data source import complete.')


Downloading from https://www.kaggle.com/api/v1/datasets/download/andandand/roads-of-berlin?dataset_version_number=2...


100%|██████████| 175M/175M [00:03<00:00, 60.9MB/s]

Extracting files...





Data source import complete.


# Bike Lane Detection using Mask2Former

## 1. Introduction and Setup

This notebook demonstrates how to use the Mask2Former model, pretrained on [Mapillary Vistas](https://www.mapillary.com/dataset/vistas),
to detect bike lanes in urban scenes. We'll learn how to:
- Load and prepare images for semantic segmentation
- Use a pretrained model for inference
- Visualize and interpret the results


## 2. Model and Processor Setup

Mask2Former is a semantic segmentation model. The Mapillary Vistas
dataset includes 66 classes of street-level objects, including bike lanes (class 7).
To load our model and processor we can write a function such as:

```python
def setup_model():
    # Initialize model and processor
    processor = AutoImageProcessor.from_pretrained(
        "facebook/mask2former-swin-large-mapillary-vistas-semantic"
    )
    mask2former = Mask2FormerForUniversalSegmentation.from_pretrained(
        "facebook/mask2former-swin-large-mapillary-vistas-semantic"
    )
    
    # Move to GPU if available
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"Using device: {device}")
    mask2former.to(device)
    
    return processor, mask2former, device
```

## 3. Image Loading and Batch Processing

We'll create functions to load and process images in batches. This is more efficient
than processing single images, especially when using a GPU.

```python
def load_batch_images(image_paths, batch_size=4):
    """
    Load a batch of images from given paths
    """
    batch_images = []
    for img_path in image_paths[:batch_size]:
        try:
            image = Image.open(img_path).convert('RGB')
            batch_images.append(image)
        except Exception as e:
            print(f"Error loading {img_path}: {e}")
    return batch_images
```

## 4. Model Inference

Here's a sample function to run inference on our batch of images.
The model will identify all semantic segments in the images, but we'll
focus specifically on bike lanes (class 7).

```python
def run_inference(processor, mask2former, batch_images, device):
    # Prepare batch for processing
    inputs = processor(images=batch_images, return_tensors="pt")
    inputs = {k: v.to(device) for k, v in inputs.items()}
    
    # Run inference
    with torch.no_grad():
        outputs = mask2former(**inputs)
    
    # Post-process results
    predicted_maps = processor.post_process_semantic_segmentation(
        outputs,
        target_sizes=[[img.size[1], img.size[0]] for img in batch_images]
    )
    
    return predicted_maps
```


## 5. Visualization

To understand our results, we'll create a visualization function that shows
the original images alongside the detected bike lanes.

```python
def visualize_bike_lanes(images, segmentation_maps, alpha=0.6):
    """
    Visualize one batch of images with bike lanes highlighted
    """
    batch_size = len(images)
    
    fig, axes = plt.subplots(batch_size, 2, figsize=(12, 4*batch_size))
    if batch_size == 1:
        axes = axes.reshape(1, -1)
    
    for idx in range(batch_size):
        # Original image
        axes[idx, 0].imshow(images[idx])
        axes[idx, 0].set_title('Original Image')
        axes[idx, 0].axis('off')
        
        # Create bike lane overlay
        image_array = np.array(images[idx])
        seg_map = segmentation_maps[idx].cpu().numpy()
        bike_lane_mask = seg_map == 7
        
        # Create overlay
        overlay = image_array.copy()
        overlay[bike_lane_mask] = [255, 0, 0]  # Red color for bike lanes
        blended = (alpha * overlay + (1-alpha) * image_array).astype(np.uint8)
        
        # Display overlay
        axes[idx, 1].imshow(blended)
        axes[idx, 1].set_title('Bike Lane Detection')
        axes[idx, 1].axis('off')
        
        # Add detection status
        has_bike_lanes = np.any(bike_lane_mask)
        text_color = 'green' if has_bike_lanes else 'red'
        status_text = 'Bike Lanes Detected' if has_bike_lanes else 'No Bike Lanes Detected'
        axes[idx, 1].text(0.5, -0.1, status_text,
                         color=text_color,
                         ha='center',
                         transform=axes[idx, 1].transAxes,
                         fontsize=10,
                         fontweight='bold')
    
    plt.tight_layout()
    plt.show()
```

## 6. Inference

In [2]:
import torch
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation
from PIL import Image
import os
from tqdm import tqdm

def process_training_set(train_dir):
    """
    Process all images in training set to detect bike lanes using Mask2Former

    Args:
        train_dir (str): Path to training directory containing images

    Returns:
        dict: Dictionary mapping image paths to boolean indicating bike lane presence
    """
    # Initialize model and processor with correct classes
    processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-large-mapillary-vistas-semantic")
    mask2former = Mask2FormerForUniversalSegmentation.from_pretrained(
        "facebook/mask2former-swin-large-mapillary-vistas-semantic"
    )

    # Move model to GPU if available
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    mask2former.to(device)

    # Dictionary to store results
    results = {}

    # Get all image files recursively
    image_files = []
    for root, _, files in os.walk(train_dir):
        for file in files:
            if file.lower().endswith(('.png', '.jpg', '.jpeg')):
                image_files.append(os.path.join(root, file))

    # Process images in batches
    batch_size = 4  # Adjust based on your GPU memory

    for i in tqdm(range(0, len(image_files), batch_size)):
        batch_paths = image_files[i:i + batch_size]
        batch_images = []

        # Load and prepare batch
        for img_path in batch_paths:
            try:
                image = Image.open(img_path).convert('RGB')
                batch_images.append(image)
            except Exception as e:
                print(f"Error loading {img_path}: {e}")
                continue

        if not batch_images:
            continue

        # Prepare batch for processing
        inputs = processor(images=batch_images, return_tensors="pt")
        inputs = {k: v.to(device) for k, v in inputs.items()}

        # Run inference
        with torch.no_grad():
            outputs = mask2former(**inputs)

        # Post-process results
        predicted_maps = processor.post_process_semantic_segmentation(
            outputs,
            target_sizes=[[img.size[1], img.size[0]] for img in batch_images]
        )

        # Check for bike lanes in each image
        for img_path, pred_map in zip(batch_paths, predicted_maps):
            # Mapillary Vistas class 7 corresponds to bike lanes
            contains_bike_lane = torch.any(pred_map == 7).item()
            results[img_path] = contains_bike_lane

    return results

def analyze_results(results):
    """
    Analyze and print summary statistics of bike lane detection results

    Args:
        results (dict): Dictionary mapping image paths to bike lane detection results
    """
    total_images = len(results)
    images_with_lanes = sum(1 for v in results.values() if v)

    print(f"\nAnalysis Results:")
    print(f"Total images processed: {total_images}")
    print(f"Images with bike lanes: {images_with_lanes} ({(images_with_lanes/total_images)*100:.2f}%)")
    print(f"Images without bike lanes: {total_images - images_with_lanes} ({((total_images-images_with_lanes)/total_images)*100:.2f}%)")

## 7. Compute How Many of the Images Show a Bike Lane

In [3]:
# Example usage
# You can also try images from the test folder, the model is pretrained.
# We are not really using the training set to 'train' we only do inference
train_dir = "/kaggle/input/roads-of-berlin/train"

print("Starting bike lane detection on training set...")
results = process_training_set(train_dir)

# Analyze results
analyze_results(results)

Starting bike lane detection on training set...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


preprocessor_config.json:   0%|          | 0.00/536 [00:00<?, ?B/s]

  return func(*args, **kwargs)


config.json:   0%|          | 0.00/79.5k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/866M [00:00<?, ?B/s]

0it [00:00, ?it/s]



Analysis Results:
Total images processed: 0


ZeroDivisionError: division by zero

In [None]:
# Save results to file
import json
with open("bike_lane_detection_results.json", "w") as f:
    json.dump(results, f, indent=4)

## 8. Visualize a batch of predictions

In [None]:
import torch
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
import os

def visualize_bike_lanes(images, segmentation_maps, alpha=0.6):
    """
    Visualize one batch of images with bike lanes highlighted

    Args:
        images: List of PIL images
        segmentation_maps: List of segmentation maps
        alpha: Transparency of the overlay (0-1)
    """
    batch_size = len(images)

    # Create figure
    fig, axes = plt.subplots(batch_size, 2, figsize=(12, 4*batch_size))
    if batch_size == 1:
        axes = axes.reshape(1, -1)

    for idx in range(batch_size):
        # Original image
        axes[idx, 0].imshow(images[idx])
        axes[idx, 0].set_title('Original Image')
        axes[idx, 0].axis('off')

        # Create bike lane overlay
        image_array = np.array(images[idx])
        seg_map = segmentation_maps[idx].cpu().numpy()

        # Create mask for bike lanes (class 7)
        bike_lane_mask = seg_map == 7

        # Create overlay
        overlay = image_array.copy()
        # Add red highlight for bike lanes
        overlay[bike_lane_mask] = [255, 0, 0]  # Red color for bike lanes

        # Blend with original image
        blended = (alpha * overlay + (1-alpha) * image_array).astype(np.uint8)

        # Display overlay
        axes[idx, 1].imshow(blended)
        axes[idx, 1].set_title('Bike Lane Detection')
        axes[idx, 1].axis('off')

        # Add text indicating if bike lanes were detected
        has_bike_lanes = np.any(bike_lane_mask)
        text_color = 'green' if has_bike_lanes else 'red'
        status_text = 'Bike Lanes Detected' if has_bike_lanes else 'No Bike Lanes Detected'
        axes[idx, 1].text(0.5, -0.1, status_text,
                         color=text_color,
                         ha='center',
                         transform=axes[idx, 1].transAxes,
                         fontsize=10,
                         fontweight='bold')

    plt.tight_layout()
    plt.show()

def process_and_visualize_batch(image_paths, batch_size=4):
    # Initialize model and processor
    processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-large-mapillary-vistas-semantic")
    mask2former = Mask2FormerForUniversalSegmentation.from_pretrained(
        "facebook/mask2former-swin-large-mapillary-vistas-semantic"
    )

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    mask2former.to(device)

    # Load images
    batch_images = []
    for img_path in image_paths[:batch_size]:
        image = Image.open(img_path).convert('RGB')
        batch_images.append(image)

    # Process batch
    inputs = processor(images=batch_images, return_tensors="pt")
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Run inference
    with torch.no_grad():
        outputs = mask2former(**inputs)

    # Post-process results
    predicted_maps = processor.post_process_semantic_segmentation(
        outputs,
        target_sizes=[[img.size[1], img.size[0]] for img in batch_images]
    )

    # Visualize results
    visualize_bike_lanes(batch_images, predicted_maps)

    return batch_images, predicted_maps

In [None]:
train_dir = "/kaggle/input/roads-of-berlin/train"

# Get first few images from directory
image_paths = []
for root, _, files in os.walk(train_dir):
    for file in files:
        if file.lower().endswith(('.png', '.jpg', '.jpeg')):
            image_paths.append(os.path.join(root, file))
    if len(image_paths) >= 4:  # Just get enough for one batch
        break

# Process and visualize batch
batch_images, predicted_maps = process_and_visualize_batch(image_paths)

## 9. Tasks



### **Beginner Level**
1. **Understand the Model and its Output**:
   - Explain what [Mask2Former](https://arxiv.org/abs/2112.01527) is and its use in segmentation tasks.
   - Visualize a mask that shows all classes that the model can label, not only bike lanes.
     

2. **Explore the Output on More Images**:
   - Modify the code that shows the segmentations overlaid on the images so that you can see predictions on the next batches.
   - Use the pretrained Mask2Former model on provided images from the test folder and visualize the output.

---

### **Intermediate Level**
1. **Evaluate the Transformations of Preprocessing**:
   - Visualize the output of the preprocessing transformations and explain the role that they have.

2. **Quantify Percentage of Bike Lanes on Images**:
   - Modify the plots to show the percentage of pixels in the image that show a bike lane

3. **Experiment with Confidence Thresholds**:
   - Experiment with model inference settings like confidence thresholds impact on results.
  Consider using the following function to manipulate the outputs:

```
def run_inference_with_threshold(processor, mask2former, batch_images, device, threshold=0.5):
    """
    Run inference with adjustable confidence threshold
    
    Args:
        processor: Mask2Former processor
        mask2former: Mask2Former model
        batch_images: List of input images
        device: torch device
        threshold: Confidence threshold for predictions (default: 0.5)
    """
    # Prepare batch for processing
    inputs = processor(images=batch_images, return_tensors="pt")
    inputs = {k: v.to(device) for k, v in inputs.items()}
    
    # Run inference
    with torch.no_grad():
        outputs = mask2former(**inputs)
    
    # Get logits for semantic segmentation
    logits = outputs.logits  # Shape: (batch_size, num_classes, height, width)
    
    # Apply softmax to get probabilities
    probs = torch.softmax(logits, dim=1)
    
    # Get bike lane probabilities (class 7)
    bike_lane_probs = probs[:, 7, :, :]
    
    # Apply threshold to bike lane predictions
    bike_lane_mask = (bike_lane_probs > threshold).int()
    
    # Create final segmentation maps
    predicted_maps = []
    for idx in range(len(batch_images)):
        # Create empty prediction map
        pred_map = torch.zeros_like(bike_lane_mask[idx])
        # Set bike lane pixels based on threshold
        pred_map[bike_lane_mask[idx] == 1] = 7
        predicted_maps.append(pred_map)
    
    return predicted_maps, bike_lane_probs
```

4. **Visualize Outputs with a Widget**:
   - Create a small utility to visually compare the output of different prediction thresholds.

5. **Incorporate Bike Lane Masks into an Image Embedding Pipeline**:
   - Overlay bike lane masks into images before feeding them to a segmentation model. What effect does this have on image similarity queries?