# Keyframe Scene Detection with Labellerr SDK

This notebook demonstrates how to use the Labellerr SDK to upload a video dataset, create annotation templates and projects, download the indexed videos, and detect scene changes (keyframes) with several algorithms. The extracted keyframes are then used to build an image annotation project, whose labels can be uploaded back to the video project as pre-annotations.


In [1]:
from labellerr.client import LabellerrClient
from labellerr.core.datasets import create_dataset_from_local, LabellerrDataset
from labellerr.core.annotation_templates import create_template
from labellerr.core.projects import create_project
from labellerr.core.schemas.annotation_templates import AnnotationQuestion, QuestionType, CreateTemplateParams, DatasetDataType
from labellerr.core.schemas import DatasetConfig
from labellerr.core.schemas.projects import CreateProjectParams, RotationConfig

import uuid
from pathlib import Path
import os


---
## ***Authentication Setup***

Before using the Labellerr SDK, you need to set up your authentication credentials. These credentials ensure secure access to the Labellerr platform and its services.

### Required Credentials:

1. **API Key & API Secret**
   - Log in to your Labellerr account
   - Navigate to the "Get API" tab
   - Copy your unique API key and secret

2. **Client ID**
   - This is a unique identifier for your application
   - Contact Labellerr support to obtain your client ID


In [2]:
from dotenv import dotenv_values
config = dotenv_values(".env")

api_key = config["API_KEY"]
api_secret = config["API_SECRET"]
client_id = config["CLIENT_ID"]
email = config["EMAIL"]

client = LabellerrClient(api_key, api_secret, client_id)
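Indexing `config[...]` directly fails on the first missing key with a bare `KeyError`. The helper below is a minimal sketch (not part of the Labellerr SDK) that reports every missing or empty value at once. It assumes the four key names used in the cell above; `require_keys` and `REQUIRED_KEYS` are hypothetical names introduced here for illustration.

```python
from typing import Mapping, Optional

# Hypothetical constant: the .env keys this notebook reads in the cell above.
REQUIRED_KEYS = ("API_KEY", "API_SECRET", "CLIENT_ID", "EMAIL")


def require_keys(config: Mapping[str, Optional[str]], keys=REQUIRED_KEYS) -> dict:
    """Return the required entries, raising one error that lists every missing/empty key."""
    # dotenv_values() maps keys without a value to None, so treat None/"" as missing.
    missing = [key for key in keys if not config.get(key)]
    if missing:
        raise KeyError(f"Missing or empty values in .env: {', '.join(missing)}")
    return {key: config[key] for key in keys}
```

For example, call `creds = require_keys(config)` and then pass `creds["API_KEY"]`, `creds["API_SECRET"]` and `creds["CLIENT_ID"]` to `LabellerrClient`.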

---
## ***Kaggle Dataset Download***

Before downloading the dataset from Kaggle, you need to:

1. Install the `kagglehub` package using pip
2. Authenticate with Kaggle
3. Download a video dataset (the small `mistag/short-videos` dataset by default, or the larger `yashsuman/cctv-footage` CCTV dataset)

The `kagglehub` package provides a simple interface for downloading datasets directly from Kaggle. Make sure you have a Kaggle account and API credentials set up before proceeding.

Note: If you haven't set up Kaggle authentication before, you'll need to:
1. Create a Kaggle account at https://www.kaggle.com
2. Go to "Account" settings
3. Scroll to API section and click "Create New API Token"
4. This will download a kaggle.json file with your credentials
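To check whether credentials are already present before calling `kagglehub.login()`, the sketch below looks for the `KAGGLE_USERNAME`/`KAGGLE_KEY` environment variables and for the conventional `~/.kaggle/kaggle.json` token location. The function name and the "environment first" lookup order are assumptions of this helper, not a statement about how `kagglehub` resolves credentials; it also only checks that credentials exist locally and does not validate them with Kaggle.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional, Tuple


def find_kaggle_credentials(
    env: Optional[Mapping[str, str]] = None, home: Optional[Path] = None
) -> Optional[Tuple[str, str]]:
    """Return (username, key) from the environment or ~/.kaggle/kaggle.json, else None."""
    env = os.environ if env is None else env
    # 1. This helper prefers environment variables when both are set.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return env["KAGGLE_USERNAME"], env["KAGGLE_KEY"]
    # 2. Otherwise fall back to the kaggle.json token downloaded from the Kaggle website.
    token = (Path.home() if home is None else Path(home)) / ".kaggle" / "kaggle.json"
    if token.is_file():
        data = json.loads(token.read_text(encoding="utf-8"))
        if data.get("username") and data.get("key"):
            return data["username"], data["key"]
    return None
```

If `find_kaggle_credentials()` returns `None`, run the interactive `kagglehub.login()` cell below.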

In [None]:
# !pip install kagglehub ipywidgets

In [None]:
import kagglehub

kagglehub.login()

In [None]:
# Choose ONE of the datasets below.

# Large dataset (1000 CCTV videos, ~1 min)
# path_to_dataset = kagglehub.dataset_download("yashsuman/cctv-footage")

# Small dataset of a few short clips (used in this notebook)
path_to_dataset = kagglehub.dataset_download("mistag/short-videos")

print("Path to dataset files:", path_to_dataset)

---
## ***Video Project Creation***

Create a Labellerr video project from the downloaded Kaggle dataset.

In [3]:
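# Path returned by kagglehub.dataset_download() above.
# If you skipped that cell, point this at your local kagglehub cache instead, e.g.
# Path(r"..\..\..\.cache\kagglehub\datasets\mistag\short-videos\versions\4")
KAGGLE_DATASET_PATH = Path(path_to_dataset)

KAGGLE_DATASET_PATH.exists()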

True
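Before uploading, it can help to confirm how many videos the folder actually contains, so the numbers can be compared with the `Total file count` / `Total file size` lines the SDK logs during upload. This is a hypothetical helper; the set of extensions treated as video is an assumption, and sizes are reported in MiB, so they may differ slightly from the SDK's "MB" figure.

```python
from pathlib import Path
from typing import Tuple

# Assumed set of video extensions; extend it if your dataset uses other containers.
VIDEO_SUFFIXES = {".mp4", ".mov", ".avi", ".mkv", ".webm"}


def summarize_videos(folder) -> Tuple[int, float]:
    """Return (video_count, total_size_in_MiB) for video files found recursively under folder."""
    videos = [
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in VIDEO_SUFFIXES
    ]
    total_mib = sum(p.stat().st_size for p in videos) / (1024 * 1024)
    return len(videos), round(total_mib, 1)
```

Usage: `summarize_videos(KAGGLE_DATASET_PATH)`.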

### Create Labellerr Dataset

In [None]:
# import logging

# logging.basicConfig(level=logging.DEBUG)
# logger = logging.getLogger(__name__)


# Upload the local video folder as a new Labellerr video dataset.
dataset = create_dataset_from_local(
    client=client,
    dataset_config=DatasetConfig(dataset_name="SDK VIDEO DATASET", data_type="video"),
    folder_to_upload=KAGGLE_DATASET_PATH,
)

INFO:root:Total file count: 2
INFO:root:Total file size: 42.2 MB
INFO:root:CPU count: 24, Batch Count: 2

In [7]:
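dataset.status()      # check the dataset's processing status
dataset.dataset_id    # ID used to re-attach to this dataset later
dataset.files_count   # number of uploaded files (only this last value is displayed)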


2

In [4]:
# Re-attach to an existing dataset by ID (e.g. after restarting the kernel).
dataset = LabellerrDataset(client=client,
                           dataset_id="354681d3-034a-4d66-b070-365f4bd11d8a")

### Create Labellerr Annotation Template

In [6]:
template = create_template(
    client=client,
    params=CreateTemplateParams(
        template_name="SDK VIDEO TEMPLATE",
        data_type=DatasetDataType.video,
        questions=[
            AnnotationQuestion(
                question_number=1,
                question="Class polygon",
                question_id=str(uuid.uuid4()),
                question_type=QuestionType.polygon,
                required=True,
                color="#FF0000"
            )
        ]
    )
)

In [None]:
template.annotation_template_id

In [5]:
# Re-attach to an existing annotation template by ID.
from labellerr.core.annotation_templates import LabellerrAnnotationTemplate
template = LabellerrAnnotationTemplate(client=client,
                                       annotation_template_id='35d44c7d-9b02-4eb0-9dee-9a7ff1165331')

### Create Labellerr Project

In [None]:
video_project = create_project(
    client=client,
    params=CreateProjectParams(
        project_name="SDK VIDEO PROJECT",
        data_type=DatasetDataType.video,
        rotations=RotationConfig(
            annotation_rotation_count=1,
            review_rotation_count=1,
            client_review_rotation_count=1
        )
    ),
    datasets=[dataset],
    annotation_template=template
)

In [None]:
video_project.project_id

'gusella_late_marmoset_23922'

In [None]:
# Re-attach to an existing project by ID.
from labellerr.core.projects import LabellerrProject
video_project = LabellerrProject(client=client,
                                 project_id='gusella_late_marmoset_23922')

---
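## ***Download Labellerr Indexed Dataset***

Download the dataset's videos and their frames to a local folder so they can be processed by the scene detectors below.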

In [7]:
dataset.download()


######################################################################
# Starting batch video processing for dataset: 354681d3-034a-4d66-b070-365f4bd11d8a
######################################################################

Fetching files for dataset: 354681d3-034a-4d66-b070-365f4bd11d8a

[{'status': 'success',
  'file_id': '2a8d96ca-9161-4dee-ad3b-a5faf301bc6c',
  'dataset_id': '354681d3-034a-4d66-b070-365f4bd11d8a',
  'video_path': './Labellerr_datastets\\354681d3-034a-4d66-b070-365f4bd11d8a\\2a8d96ca-9161-4dee-ad3b-a5faf301bc6c.mp4',
  'output_folder': './Labellerr_datastets\\354681d3-034a-4d66-b070-365f4bd11d8a',
  'frames_downloaded': 1572,
  'frames_failed': 0,
  'failed_frames_info': []},
 {'status': 'success',
  'file_id': '7db3f60c-f6e5-4d3d-a63b-cb38530ee265',
  'dataset_id': '354681d3-034a-4d66-b070-365f4bd11d8a',
  'video_path': './Labellerr_datastets\\354681d3-034a-4d66-b070-365f4bd11d8a\\7db3f60c-f6e5-4d3d-a63b-cb38530ee265.mp4',
  'output_folder': './Labellerr_datastets\\354681d3-034a-4d66-b070-365f4bd11d8a',
  'frames_downloaded': 389,
  'frames_failed': 0,
  'failed_frames_info': []}]
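The return value shown above is a list with one dict per video. Assuming that shape (the keys `status`, `file_id`, `frames_downloaded` and `frames_failed` as printed above), a small hypothetical helper can flag any video that did not download cleanly. The notebook calls `dataset.download()` without keeping the result, so you would first capture it, e.g. `results = dataset.download()`.

```python
from typing import Any, Dict, List


def summarize_downloads(results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Aggregate the per-video dicts returned by dataset.download()."""
    # A video is a "problem" if its status is not success or any frame failed.
    problems = [
        r.get("file_id")
        for r in results
        if r.get("status") != "success" or r.get("frames_failed", 0) > 0
    ]
    return {
        "videos": len(results),
        "frames_downloaded": sum(r.get("frames_downloaded", 0) for r in results),
        "problem_file_ids": problems,
    }
```

For the two videos above this would report 2 videos, 1961 frames and no problem file IDs.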

---
## ***Scene Change Detection on Dataset***

Labellerr SDK provides multiple algorithms for scene detection in videos:

1. **PySceneDetect**: 
   - Based on the open-source PySceneDetect library
   - Uses content-aware detection (a cut is flagged when consecutive frames differ strongly in content)
   - A good default for general-purpose scene detection

2. **SSIMSceneDetect**:
   - Compares consecutive frames using the Structural Similarity Index (SSIM)
   - More sensitive to subtle scene changes
   - More computationally intensive, in exchange for finer-grained detection

3. **FFMPEGSceneDetect**:
   - Uses FFmpeg for scene detection
   - Typically the fastest option
   - Good for quick analysis of large video files

Choose the method that best balances accuracy and processing speed for your data.
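To make the idea behind content-aware detection concrete, here is a deliberately simplified sketch: it flags a new scene whenever the mean absolute pixel difference between consecutive frames exceeds a threshold. This is an illustration of the general technique only, not how `PySceneDetect`, `SSIMSceneDetect` or `FFMPEGSceneDetect` are implemented (real detectors use more robust scores); the function name, `threshold` and `min_scene_len` are hypothetical.

```python
import numpy as np


def detect_scene_starts(frames, threshold=30.0, min_scene_len=1):
    """Return the indices of frames that start a new scene.

    frames: iterable of equally sized uint8 arrays (H x W or H x W x 3).
    A cut is declared when the mean absolute difference between a frame and
    the previous frame exceeds `threshold` (on the 0-255 scale) and the
    current scene is already at least `min_scene_len` frames long.
    """
    starts = []
    prev = None
    for i, frame in enumerate(frames):
        cur = np.asarray(frame, dtype=np.float32)
        if prev is None:
            starts.append(i)  # the first frame always opens a scene
        else:
            score = float(np.mean(np.abs(cur - prev)))
            if score > threshold and i - starts[-1] >= min_scene_len:
                starts.append(i)
        prev = cur
    return starts
```

The frame at each returned index can serve as that scene's keyframe. Raising `threshold` makes the detector ignore gradual changes such as lighting shifts, while `min_scene_len` suppresses bursts of cuts from flashes or fast motion.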

In [None]:
# !pip install opencv-python pillow scenedetect scikit-image

In [None]:
from labellerr.services.video_sampling import PySceneDetect



### Scene Detection Implementation


In [8]:
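# Local folder written by dataset.download(); it should match the
# `output_folder` reported in the download output above.
dataset_dir = Path("Labellerr_datasets") / dataset.dataset_id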

if dataset_dir.exists():
    print("Path exists ✅")
else:
    print("Path does not exist ❌")


Path exists ✅


In [14]:
detector = PySceneDetect()

In [15]:
for filename in os.listdir(dataset_dir):
    file_path = os.path.join(dataset_dir, filename)
    
    if os.path.isfile(file_path):
        detector.detect_and_extract(file_path)

JSON mapping saved to: PyScene_detects\354681d3-034a-4d66-b070-365f4bd11d8a\2a8d96ca-9161-4dee-ad3b-a5faf301bc6c\2a8d96ca-9161-4dee-ad3b-a5faf301bc6c_mapping.json
JSON mapping saved to: PyScene_detects\354681d3-034a-4d66-b070-365f4bd11d8a\7db3f60c-f6e5-4d3d-a63b-cb38530ee265\7db3f60c-f6e5-4d3d-a63b-cb38530ee265_mapping.json
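Each video gets a `<file_id>_mapping.json` under `PyScene_detects/<dataset_id>/<file_id>/`, as the log above shows. The sketch below collects those files into one dict keyed by `file_id`. The JSON schema is not documented here, so the helper (a hypothetical name) only loads the content and makes no assumptions about its fields.

```python
import json
from pathlib import Path
from typing import Any, Dict

SUFFIX = "_mapping.json"


def load_scene_mappings(root="PyScene_detects") -> Dict[str, Any]:
    """Load every *_mapping.json under root, keyed by the video file_id in its file name."""
    mappings = {}
    for path in sorted(Path(root).rglob(f"*{SUFFIX}")):
        file_id = path.name[: -len(SUFFIX)]  # strip the suffix to recover the file_id
        with path.open(encoding="utf-8") as fh:
            mappings[file_id] = json.load(fh)
    return mappings
```

Usage: `load_scene_mappings(Path("PyScene_detects") / dataset.dataset_id)`.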


---
## ***Image Project Creation***

Create an image project from the keyframes extracted from the videos.

### Create Labellerr Dataset of Keyframes

In [None]:
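# Upload the extracted keyframes (not the original videos) as an image dataset.
# Assumption: detect_and_extract() writes keyframe images next to the mapping
# JSON under PyScene_detects/<dataset_id>/; adjust the path if yours differ.
keyframe_dir = Path("PyScene_detects") / dataset.dataset_id

dataset = create_dataset_from_local(
    client=client,
    dataset_config=DatasetConfig(dataset_name="SDK VIDEO KEYFRAME DATASET", data_type="image"),
    folder_to_upload=keyframe_dir,
)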

### Create Annotation Template for the Keyframe Image Project

In [None]:
template = create_template(
    client=client,
    params=CreateTemplateParams(
        template_name="SDK VIDEO KEYFRAME TEMPLATE",
        data_type=DatasetDataType.image,
        questions=[
            AnnotationQuestion(
                question_number=1,
                question="Class polygon",
                question_id=str(uuid.uuid4()),
                question_type=QuestionType.polygon,
                required=True,
                color="#FF0000"
            )
        ]
    )
)

### Create Image Annotation Project

In [None]:
img_project = create_project(
    client=client,
    params=CreateProjectParams(
        project_name="SDK VIDEO KEYFRAME PROJECT",
        data_type=DatasetDataType.image,
        rotations=RotationConfig(
            annotation_rotation_count=1,
            review_rotation_count=1,
            client_review_rotation_count=1
        )
    ),
    datasets=[dataset],
    annotation_template=template
)

---
## ***Annotating the Keyframe Image Project***

In [None]:
# Annotate the keyframe images on the Labellerr platform (manual step in the web app).

### Downloading the Annotation

---
## ***Uploading Keyframe Pre-Annotations to the Video Project***

### Converting Annotation JSON to required format
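The core of this conversion is mapping each keyframe image back to the video frame it came from. The sketch below shows only that re-keying step under loudly stated assumptions: both input shapes (`{image_name: [annotation, ...]}` and `{image_name: frame_number}`) are hypothetical simplifications, not the actual Labellerr export format or the schema of the `*_mapping.json` files, and the output still has to be converted into whatever format `upload_preannotations` expects.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def rekey_by_frame(
    image_annotations: Dict[str, List[Any]],  # hypothetical: {image_name: [annotation, ...]}
    image_to_frame: Dict[str, int],           # hypothetical: {image_name: frame_number}
) -> Tuple[Dict[int, List[Any]], List[str]]:
    """Group keyframe annotations by source video frame number.

    Returns (annotations_by_frame sorted by frame, image names with no known frame).
    """
    by_frame: Dict[int, List[Any]] = defaultdict(list)
    unmatched: List[str] = []
    for image_name, annotations in image_annotations.items():
        frame = image_to_frame.get(image_name)
        if frame is None:
            unmatched.append(image_name)  # keep track instead of silently dropping
            continue
        by_frame[frame].extend(annotations)
    return dict(sorted(by_frame.items())), unmatched
```

Reporting unmatched images separately makes it easy to spot keyframes whose source frame could not be resolved before anything is uploaded.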

### Uploading pre-annotation

In [None]:
VIDEO_JSON_PATH = r"path_to_your_video_preannotation_file.json"

video_project.upload_preannotations(video_json_file_path=VIDEO_JSON_PATH)
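A wrong path or a malformed file is easier to diagnose locally than from an upload error. This hypothetical helper fails fast if the file is missing or is not valid JSON; it does not check the file against Labellerr's pre-annotation schema.

```python
import json
from pathlib import Path


def load_preannotation_json(path):
    """Return the parsed JSON, or raise a clear error if the file is missing or invalid."""
    json_path = Path(path)
    if not json_path.is_file():
        raise FileNotFoundError(f"Pre-annotation file not found: {json_path}")
    try:
        with json_path.open(encoding="utf-8") as fh:
            return json.load(fh)
    except json.JSONDecodeError as exc:
        raise ValueError(f"{json_path} is not valid JSON: {exc}") from exc
```

Call `load_preannotation_json(VIDEO_JSON_PATH)` before `video_project.upload_preannotations(...)`.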