# ZOD (Zenseact Open Dataset) - Complete Dataset Overview

This notebook gives an overview of the **Zenseact Open Dataset (ZOD)**, a large-scale multimodal autonomous driving dataset. We inspect a local copy of the dataset, analyze its directory structure, and look at the data types it contains: camera images, LiDAR point clouds, and annotations.

### 📊 Dataset Highlights:
- **Scale**: 100,000 frames (1.4TB extracted)
- **Sensors**: Front camera (`camera_front_blur`) + Velodyne LiDAR (`lidar_velodyne`)
- **Annotations**: Rich 3D bounding boxes and object detection
- **Location**: European driving conditions and scenarios (each frame records a `country_code` in its metadata)
- **Quality**: High-resolution sensor data with precise calibration

### 🎯 Analysis Goals:
1. **File Structure Analysis**: Understand downloaded and extracted data organization
2. **Data Type Exploration**: Camera images, LiDAR point clouds, annotations
3. **Sample Visualizations**: Display various sensor modalities
4. **Dataset Statistics**: Comprehensive size and content analysis
5. **Usage Examples**: Practical code for working with ZOD data

## 1. Import Required Libraries

In [1]:
import os
import glob
import json
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
from PIL import Image
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings
warnings.filterwarnings('ignore')

# Set plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette("Set2")

print("✅ Libraries imported successfully!")
print("🚗 Ready for ZOD dataset analysis")

✅ Libraries imported successfully!
🚗 Ready for ZOD dataset analysis


## 2. Dataset Paths and Configuration

Let's set up the paths to our ZOD dataset and configure basic parameters.

In [13]:
# Dataset paths
ZOD_DATA_PATH = "/media/tom/ml/zod-data"
SINGLE_FRAMES_PATH = os.path.join(ZOD_DATA_PATH, "single_frames")
DOWNLOADS_PATH = os.path.join(ZOD_DATA_PATH, "downloads")

# Project paths
PROJECT_PATH = "/media/tom/ml/projects/clft-zod"
OUTPUT_PATH = os.path.join(PROJECT_PATH, "output")
ANALYSIS_PATH = os.path.join(OUTPUT_PATH, "analysis")

print(f"🗂️  ZOD Data Path: {ZOD_DATA_PATH}")
print(f"📁 Single Frames: {SINGLE_FRAMES_PATH}")
print(f"📥 Downloads: {DOWNLOADS_PATH}")
print(f"🔬 Analysis Output: {ANALYSIS_PATH}")

# Check if paths exist
paths_status = {
    "ZOD Data": os.path.exists(ZOD_DATA_PATH),
    "Single Frames": os.path.exists(SINGLE_FRAMES_PATH),
    "Downloads": os.path.exists(DOWNLOADS_PATH),
    "Analysis Output": os.path.exists(ANALYSIS_PATH)
}

print("\n📊 Path Status:")
for path_name, exists in paths_status.items():
    status = "✅" if exists else "❌"
    print(f"{status} {path_name}: {'Found' if exists else 'Not found'}")

🗂️  ZOD Data Path: /media/tom/ml/zod-data
📁 Single Frames: /media/tom/ml/zod-data/single_frames
📥 Downloads: /media/tom/ml/zod-data/downloads
🔬 Analysis Output: /media/tom/ml/projects/clft-zod/output/analysis

📊 Path Status:
✅ ZOD Data: Found
✅ Single Frames: Found
✅ Downloads: Found
✅ Analysis Output: Found
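
The paths above are hard-coded for one machine. As a minimal sketch of one way to make the same layout portable, the helper below builds the four paths from two root directories with `pathlib`. The function name `resolve_paths` and the environment variables `ZOD_DATA_PATH` and `ZOD_PROJECT_PATH` are assumptions made for this illustration; they are not part of ZOD or its tooling.

```python
import os
from pathlib import Path

def resolve_paths(data_root=None, project_root=None):
    """Build the notebook's path layout from two root directories.

    ZOD_DATA_PATH and ZOD_PROJECT_PATH are hypothetical environment variable
    names chosen for this sketch; the fallbacks mirror the cell above.
    """
    data_root = Path(
        data_root or os.environ.get("ZOD_DATA_PATH", "/media/tom/ml/zod-data")
    )
    project_root = Path(
        project_root
        or os.environ.get("ZOD_PROJECT_PATH", "/media/tom/ml/projects/clft-zod")
    )
    return {
        "ZOD Data": data_root,
        "Single Frames": data_root / "single_frames",
        "Downloads": data_root / "downloads",
        "Analysis Output": project_root / "output" / "analysis",
    }

# Report which of the resolved paths exist on this machine.
for name, path in resolve_paths().items():
    print(f"{name}: {path} ({'found' if path.exists() else 'not found'})")
```

Passing explicit roots, for example `resolve_paths("/data/zod", "/work/clft-zod")`, overrides both the environment variables and the defaults.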


## 3. Dataset Size Analysis

Let's analyze the size and structure of our downloaded ZOD dataset.

In [14]:
def get_directory_size(path):
    """Calculate the total size of a directory in GiB (1024**3 bytes), printed as "GB" below"""
    total_size = 0
    if os.path.exists(path):
        for dirpath, dirnames, filenames in os.walk(path):
            for filename in filenames:
                filepath = os.path.join(dirpath, filename)
                try:
                    total_size += os.path.getsize(filepath)
                except OSError:  # also covers FileNotFoundError
                    pass
    return total_size / (1024**3)  # Convert bytes to GiB

def count_directories_and_files(path):
    """Count directories and files in a path"""
    dirs = 0
    files = 0
    if os.path.exists(path):
        for dirpath, dirnames, filenames in os.walk(path):
            dirs += len(dirnames)
            files += len(filenames)
    return dirs, files

# Analyze dataset sizes
if os.path.exists(ZOD_DATA_PATH):
    print("📊 ZOD Dataset Analysis")
    print("=" * 50)
    
    # Downloads analysis
    downloads_size = get_directory_size(DOWNLOADS_PATH)
    downloads_dirs, downloads_files = count_directories_and_files(DOWNLOADS_PATH)
    
    # Single frames analysis  
    frames_size = get_directory_size(SINGLE_FRAMES_PATH)
    frames_dirs, frames_files = count_directories_and_files(SINGLE_FRAMES_PATH)
    
    # Total dataset size
    total_size = get_directory_size(ZOD_DATA_PATH)
    
    print(f"📥 Downloads folder: {downloads_size:.1f} GB")
    print(f"   └── Directories: {downloads_dirs:,}")
    print(f"   └── Files: {downloads_files:,}")

    print(f"📁 Single frames: {frames_size:.1f} GB")
    print(f"   └── Directories: {frames_dirs:,}")
    print(f"   └── Files: {frames_files:,}")

    print(f"🔢 Total dataset: {total_size:.1f} GB")
    
    # Compression ratio
    if downloads_size > 0 and frames_size > 0:
        compression_ratio = downloads_size / frames_size
        print(f"📦 Compression ratio: {compression_ratio:.2f} ({frames_size/downloads_size:.1f}x expansion)")
        
else:
    print("❌ ZOD dataset path not found!")

📊 ZOD Dataset Analysis
📥 Downloads folder: 1041.1 GB
   └── Directories: 1
   └── Files: 34
📁 Single frames: 1333.1 GB
   └── Directories: 400,000
   └── Files: 1,231,936
🔢 Total dataset: 2374.6 GB
📦 Compression ratio: 0.78 (1.3x expansion)
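
The cell above walks each directory tree twice (once for size, once for counts) and the dataset root a third time, which is slow for more than a million files. The sketch below is one possible single-pass alternative; `scan_tree` and `gib_to_gb` are names introduced here for illustration. It also makes the unit explicit: dividing by `1024**3` yields GiB, so the 1333.1 "GB" reported for the single frames is about 1.43 decimal TB, consistent with the 1.4 TB quoted in the highlights.

```python
import os

def scan_tree(path):
    """Return (total_bytes, n_dirs, n_files) for `path` in a single os.walk pass."""
    total_bytes = n_dirs = n_files = 0
    for dirpath, dirnames, filenames in os.walk(path):
        n_dirs += len(dirnames)
        n_files += len(filenames)
        for filename in filenames:
            try:
                total_bytes += os.path.getsize(os.path.join(dirpath, filename))
            except OSError:
                pass  # skip files that vanish or cannot be read
    return total_bytes, n_dirs, n_files

def gib_to_gb(gib):
    """Convert binary gibibytes (1024**3 bytes) to decimal gigabytes (10**9 bytes)."""
    return gib * 1024**3 / 10**9

# Convert the single-frames size reported above from GiB to decimal GB.
print(f"{gib_to_gb(1333.1):.0f} GB")
```

A nonexistent path simply yields `(0, 0, 0)`, because `os.walk` ignores listing errors by default.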


## 4. Frame Structure Exploration

Let's explore the structure of individual frames in the ZOD dataset.

In [23]:
# Get list of frame directories
if os.path.exists(SINGLE_FRAMES_PATH):
    frame_dirs = [d for d in os.listdir(SINGLE_FRAMES_PATH) 
                  if os.path.isdir(os.path.join(SINGLE_FRAMES_PATH, d))]
    frame_dirs.sort()
    
    print(f"🗂️  Found {len(frame_dirs)} frame directories")
    print(f"📋 Sample frame IDs: {frame_dirs[:10]}")
    
    # Analyze structure of first frame
    if frame_dirs:
        sample_frame = frame_dirs[0]
        sample_path = os.path.join(SINGLE_FRAMES_PATH, sample_frame)
        
        print(f"\n🔍 Analyzing frame structure: {sample_frame}")
        print("=" * 50)
        
        # List contents
        contents = os.listdir(sample_path)
        contents.sort()
        
        for item in contents:
            item_path = os.path.join(sample_path, item)
            if os.path.isdir(item_path):
                # Count files in subdirectory
                sub_contents = os.listdir(item_path)
                print(f"📁 {item}/ ({len(sub_contents)} files)")
            else:
                # Get file size
                size = os.path.getsize(item_path)
                if size > 1024*1024:  # > 1MB
                    size_str = f"{size/(1024*1024):.1f} MB"
                elif size > 1024:  # > 1KB
                    size_str = f"{size/1024:.1f} KB"
                else:
                    size_str = f"{size} bytes"
                print(f"📄 {item} ({size_str})")
                
else:
    print("❌ Single frames path not found!")

🗂️  Found 100000 frame directories
📋 Sample frame IDs: ['000000', '000001', '000002', '000003', '000004', '000005', '000006', '000007', '000008', '000009']

🔍 Analyzing frame structure: 000000
📁 annotations/ (4 files)
📄 calibration.json (1.1 KB)
📁 camera_front_blur/ (1 files)
📄 ego_motion.json (10.9 KB)
📄 info.json (4.6 KB)
📁 lidar_velodyne/ (3 files)
📄 metadata.json (479 bytes)
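
The listing above covers only frame `000000`. To check whether other frames share the same layout, a sketch like the following could tally the top-level entries over a sample of frames. The function name `summarize_frame_layouts` is hypothetical, and nothing here asserts that all ZOD frames are in fact identical.

```python
import os
from collections import Counter

def summarize_frame_layouts(frames_root, frame_ids):
    """Count how many of the given frames share each top-level layout.

    A layout is the sorted tuple of entry names in the frame directory,
    with a trailing "/" marking subdirectories.
    """
    layouts = Counter()
    for frame_id in frame_ids:
        frame_path = os.path.join(frames_root, frame_id)
        with os.scandir(frame_path) as entries:
            layout = tuple(
                sorted(e.name + "/" if e.is_dir() else e.name for e in entries)
            )
        layouts[layout] += 1
    return layouts
```

In this notebook it could be called as `summarize_frame_layouts(SINGLE_FRAMES_PATH, frame_dirs[::1000])` to sample every thousandth frame; more than one key in the result would indicate frames with differing contents.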


## 5. Sample Data Loading and Visualization

Let's load and visualize sample data from the ZOD dataset to understand the data structure.

In [31]:
def load_sample_frame_data(frame_id):
    """Load sample data from a ZOD frame"""
    frame_path = os.path.join(SINGLE_FRAMES_PATH, frame_id)
    
    if not os.path.exists(frame_path):
        print(f"❌ Frame {frame_id} not found!")
        return None
        
    data = {}
    
    # Load JSON files from the frame root.
    # Note: 'object_detection.json' is not present at the frame root of the
    # sample frame (see the output below); annotation files sit in annotations/.
    json_files = ['metadata.json', 'info.json', 'ego_motion.json', 
                  'object_detection.json', 'calibration.json']
    
    for json_file in json_files:
        json_path = os.path.join(frame_path, json_file)
        if os.path.exists(json_path):
            try:
                with open(json_path, 'r') as f:
                    data[json_file.replace('.json', '')] = json.load(f)
                print(f"✅ Loaded {json_file}")
            except Exception as e:
                print(f"❌ Error loading {json_file}: {e}")
        else:
            print(f"⚠️  {json_file} not found")
    
    # Check for camera images
    camera_dir = os.path.join(frame_path, 'camera_front_blur')
    if os.path.exists(camera_dir):
        camera_files = [f for f in os.listdir(camera_dir) if f.endswith(('.jpg', '.png'))]
        data['camera_files'] = camera_files
        print(f"📷 Found {len(camera_files)} camera images")
    
    # Check for LiDAR data.
    # Note: for the sample frame this '.bin' filter matches none of the 3 files
    # in lidar_velodyne/ (see the output below), so they use another extension.
    lidar_dir = os.path.join(frame_path, 'lidar_velodyne')
    if os.path.exists(lidar_dir):
        lidar_files = [f for f in os.listdir(lidar_dir) if f.endswith('.bin')]
        data['lidar_files'] = lidar_files
        print(f"📡 Found {len(lidar_files)} LiDAR files")
        
    # Check for annotations
    annotations_dir = os.path.join(frame_path, 'annotations')
    if os.path.exists(annotations_dir):
        annotation_files = [f for f in os.listdir(annotations_dir)]
        data['annotation_files'] = annotation_files
        print(f"🏷️  Found {len(annotation_files)} annotation files")
    
    return data

# Load sample frame
if 'frame_dirs' in locals() and frame_dirs:
    sample_frame_id = frame_dirs[0]
    print(f"🔍 Loading sample frame: {sample_frame_id}")
    print("=" * 50)
    
    sample_data = load_sample_frame_data(sample_frame_id)
else:
    print("❌ No frame directories available")

🔍 Loading sample frame: 000000
✅ Loaded metadata.json
✅ Loaded info.json
✅ Loaded ego_motion.json
⚠️  object_detection.json not found
✅ Loaded calibration.json
📷 Found 1 camera images
📡 Found 0 LiDAR files
🏷️  Found 4 annotation files
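
Two lines of this output point to mismatches between the loader and the actual layout: `object_detection.json` is not at the frame root, and the `.bin` filter finds 0 LiDAR files although Section 4 lists 3 entries in `lidar_velodyne/`. Rather than guessing the right names, a small inventory of file extensions per subdirectory shows what is actually on disk. The sketch below uses only the standard library; `extension_inventory` is a hypothetical helper name.

```python
import os
from collections import Counter

def extension_inventory(frame_path):
    """Map each subdirectory of a frame to a Counter of the file extensions it holds."""
    inventory = {}
    for entry in sorted(os.listdir(frame_path)):
        sub_path = os.path.join(frame_path, entry)
        if os.path.isdir(sub_path):
            inventory[entry] = Counter(
                # Lower-case the extension; files without one are grouped as "<none>".
                os.path.splitext(name)[1].lower() or "<none>"
                for name in os.listdir(sub_path)
            )
    return inventory
```

Calling `extension_inventory(os.path.join(SINGLE_FRAMES_PATH, "000000"))` would report the extensions found in `annotations/`, `camera_front_blur/`, and `lidar_velodyne/`, which can then replace the hard-coded filters in `load_sample_frame_data`.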


### 5.1 Metadata Analysis

Let's examine the metadata structure from our sample frame.

In [38]:
# Analyze metadata structure
if 'sample_data' in locals() and sample_data and 'metadata' in sample_data:
    metadata = sample_data['metadata']
    
    print("📊 Metadata Structure Analysis")
    print("=" * 50)
    
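    # Display key metadata fields.
    # Note: of these names only 'road_type' exists in this metadata; the closest
    # listed keys are 'time', 'scraped_weather', 'time_of_day' and
    # 'latitude'/'longitude' (see the field list in the output below).
    key_fields = ['timestamp', 'weather', 'timeofday', 'road_type', 'location']
    
    for field in key_fields:
        if field in metadata:
            value = metadata[field]
            print(f"🔹 {field}: {value}")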
    
    # Show all available fields
    print(f"\n📋 All metadata fields ({len(metadata)} total):")
    for key in sorted(metadata.keys()):
        value_type = type(metadata[key]).__name__
        print(f"   • {key} ({value_type})")
        
    # Display weather and time information.
    # Note: this branch is skipped for the sample frame because the keys are
    # named 'scraped_weather' and 'time_of_day', not 'weather' and 'timeofday'.
    if 'weather' in metadata and 'timeofday' in metadata:
        print(f"\n🌤️  Scene conditions:")
        print(f"   Weather: {metadata['weather']}")
        print(f"   Time of day: {metadata['timeofday']}")
        
else:
    print("❌ No metadata available in sample data")

📊 Metadata Structure Analysis
🔹 road_type: city

📋 All metadata fields (17 total):
   • collection_car (str)
   • country_code (str)
   • frame_id (str)
   • latitude (float)
   • longitude (float)
   • num_lane_instances (int)
   • num_pedestrians (int)
   • num_traffic_lights (int)
   • num_traffic_signs (int)
   • num_vehicles (int)
   • num_vulnerable_vehicles (int)
   • road_condition (str)
   • road_type (str)
   • scraped_weather (str)
   • solar_angle_elevation (float)
   • time (str)
   • time_of_day (str)
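
Only `road_type` was printed as a key field because the other requested names (`timestamp`, `weather`, `timeofday`, `location`) do not appear among the 17 keys listed. A minimal sketch of a lookup that uses the listed key names and reports anything missing, instead of skipping it silently, might look like this. The helper `select_fields` is hypothetical, and the example record is a placeholder: only the key names and the `road_type` value come from the output above.

```python
def select_fields(metadata, wanted):
    """Split `wanted` into fields present in `metadata` and fields that are missing."""
    found = {key: metadata[key] for key in wanted if key in metadata}
    missing = [key for key in wanted if key not in metadata]
    return found, missing

# Key names taken from the field listing in the output above.
KEY_FIELDS = ["time", "scraped_weather", "time_of_day", "road_type",
              "country_code", "latitude", "longitude"]

# Placeholder record: apart from road_type, the values are made up for illustration.
example = {"road_type": "city", "time_of_day": "day", "scraped_weather": "clear"}

found, missing = select_fields(example, KEY_FIELDS)
print("found:", found)
print("missing:", missing)
```

With the real data, `select_fields(sample_data['metadata'], KEY_FIELDS)` should return an empty `missing` list if the key names above are correct for every frame.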
