<a href="https://colab.research.google.com/github/thunwaaa/sign_language/blob/main/examples/hand_landmarker/python/hand_landmarker.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##### Copyright 2023 The MediaPipe Authors. All Rights Reserved.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Hand Landmarks Detection with MediaPipe Tasks

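This notebook shows you how to use the MediaPipe Tasks Python API to detect hand landmarks in video frames, export them to JSON, and preprocess the JSON files into training data for sign-language gesture recognition.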

## Preparation

Let's start by installing MediaPipe and the other packages this notebook uses.

In [1]:
!pip install "protobuf>=5.26.1,<6.0"

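The quotes matter: unquoted, the shell treats `>` and `<` as redirection operators, which produces a `No such file or directory` error instead of installing protobuf. The specifier itself means "at least 5.26.1 and below 6.0". The sketch below spells that rule out with the standard library. It is a simplified illustration rather than pip's full PEP 440 logic: pre-release suffixes are simply truncated.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn a release string such as "5.26.1" into a tuple of ints.

    Simplification: each dot-separated part keeps only its leading digits,
    so "5.27.0rc1" becomes (5, 27, 0); pip's real rules (PEP 440) are richer.
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def satisfies_range(installed, lower, upper):
    """True if lower <= installed < upper, i.e. the specifier ">=lower,<upper"."""
    versions = [parse_version(v) for v in (installed, lower, upper)]
    width = max(len(v) for v in versions)
    # Pad with zeros so that "6.0" compares like "6.0.0".
    current, low, high = (v + (0,) * (width - len(v)) for v in versions)
    return low <= current < high


# Check the protobuf version that is actually installed, if any.
try:
    print("protobuf OK:", satisfies_range(version("protobuf"), "5.26.1", "6.0"))
except PackageNotFoundError:
    print("protobuf is not installed")
```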

In [2]:
!pip install opencv-python



In [3]:
!pip install mediapipe --upgrade



In [4]:
!pip install numpy==1.26.0



Then download an off-the-shelf model bundle. Check out the [MediaPipe documentation](https://developers.google.com/mediapipe/solutions/vision/hand_landmarker#models) for more information about this model bundle.

In [5]:
!wget -q https://storage.googleapis.com/mediapipe-models/hand_landmarker/hand_landmarker/float16/1/hand_landmarker.task

## Visualization utilities

In [6]:
#@markdown We implemented some functions to visualize the hand landmark detection results. <br/> Run the following cell to activate the functions.

import cv2
import numpy as np
import mediapipe as mp
from mediapipe import solutions
from mediapipe.framework.formats import landmark_pb2

# Setup MediaPipe solutions (used here for the drawing helpers and HAND_CONNECTIONS)
mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles

# Constants
MARGIN = 10  # pixels
FONT_SIZE = 1
FONT_THICKNESS = 1
HANDEDNESS_TEXT_COLOR = (88, 205, 54)  # vibrant green

def draw_landmarks_on_image(rgb_image, detection_result):
    """Draw landmarks and a handedness label for every hand in a HandLandmarkerResult."""
    # The Tasks API returns plain lists (empty when no hand is found), unlike
    # the legacy solutions API (multi_hand_landmarks / multi_handedness).
    hand_landmarks_list = detection_result.hand_landmarks
    handedness_list = detection_result.handedness
    annotated_image = np.copy(rgb_image)

    # Loop through the detected hands to visualize.
    for idx in range(len(hand_landmarks_list)):
        hand_landmarks = hand_landmarks_list[idx]
        handedness = handedness_list[idx]

        # draw_landmarks() expects a NormalizedLandmarkList proto, so convert
        # the Tasks landmarks before drawing.
        hand_landmarks_proto = landmark_pb2.NormalizedLandmarkList()
        hand_landmarks_proto.landmark.extend([
            landmark_pb2.NormalizedLandmark(x=landmark.x, y=landmark.y, z=landmark.z)
            for landmark in hand_landmarks
        ])
        mp_drawing.draw_landmarks(
            annotated_image,
            hand_landmarks_proto,
            mp_hands.HAND_CONNECTIONS,
            mp_drawing_styles.get_default_hand_landmarks_style(),
            mp_drawing_styles.get_default_hand_connections_style())

        # Get the top left corner of the detected hand's bounding box.
        height, width, _ = annotated_image.shape
        x_coordinates = [landmark.x for landmark in hand_landmarks]
        y_coordinates = [landmark.y for landmark in hand_landmarks]
        text_x = int(min(x_coordinates) * width)
        text_y = int(min(y_coordinates) * height) - MARGIN

        # Draw handedness (left or right hand) on the image.
        cv2.putText(annotated_image, f"{handedness[0].category_name}",
                    (text_x, text_y), cv2.FONT_HERSHEY_DUPLEX,
                    FONT_SIZE, HANDEDNESS_TEXT_COLOR, FONT_THICKNESS, cv2.LINE_AA)

    return annotated_image
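The label placement in `draw_landmarks_on_image` is plain arithmetic on normalized coordinates: MediaPipe reports `x` and `y` as fractions of the image width and height, so multiplying by the pixel size gives pixel positions. A dependency-free sketch of that step, using a hypothetical `Landmark` tuple in place of MediaPipe's landmark objects:

```python
from typing import NamedTuple, Sequence, Tuple


class Landmark(NamedTuple):
    """Hypothetical stand-in for a MediaPipe normalized landmark."""
    x: float  # fraction of the image width, usually in [0, 1]
    y: float  # fraction of the image height, usually in [0, 1]
    z: float = 0.0


def label_anchor(landmarks: Sequence[Landmark], width: int, height: int,
                 margin: int = 10) -> Tuple[int, int]:
    """Pixel position for the handedness text, as in draw_landmarks_on_image."""
    # Top-left corner of the hand's bounding box, in pixels ...
    text_x = int(min(lm.x for lm in landmarks) * width)
    # ... moved up by `margin` pixels so the text sits above the hand.
    text_y = int(min(lm.y for lm in landmarks) * height) - margin
    return text_x, text_y


print(label_anchor([Landmark(0.25, 0.5), Landmark(0.5, 0.75)], 640, 480))  # (160, 230)
```

Because `text_y` is shifted up by the margin, a hand that touches the top edge of the frame yields a negative `y`, which places the label outside the visible image.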

## Download test image

Let's grab a test image that we'll use later. The image is from [Unsplash](https://unsplash.com/photos/mt2fyrdXxzk).

In [None]:
#

Optionally, you can upload your own videos. Run the cell below to upload one or more video files from your computer.

In [None]:
# @title
import cv2
import mediapipe as mp
import numpy as np
import time
import os

from google.colab import files
# Upload one or more video files
uploaded = files.upload()

# List the names of the uploaded video files
print("Uploaded video files:")
for filename in uploaded.keys():
    print(f"- {filename}")

# Keep the paths of all uploaded videos
video_paths = list(uploaded.keys())
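`google.colab.files` only exists inside Colab. When running the notebook locally, one way to build the same `video_paths` list is to scan a folder instead. The helper name `find_videos`, the folder, and the extension set below are all assumptions for this sketch.

```python
from pathlib import Path
from typing import List

# Assumed set of extensions; extend it for other containers OpenCV can read.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv"}


def find_videos(folder: str) -> List[str]:
    """Return sorted paths of the video files directly inside `folder`."""
    return sorted(
        str(path)
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in VIDEO_EXTENSIONS
    )
```

For example, `video_paths = find_videos("videos")` for a hypothetical `videos/` folder next to the notebook. The extension check is case-insensitive, so `clip.MOV` is picked up as well.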

## Running inference and visualizing the results

Here are the steps to run hand landmark detection using MediaPipe.

Check out the [MediaPipe documentation](https://developers.google.com/mediapipe/solutions/vision/hand_landmarker/python) to learn more about configuration options that this solution supports.
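In `VIDEO` running mode, `HandLandmarker.detect_for_video()` takes the frame's timestamp in milliseconds, and the timestamps must increase from one call to the next. A small sketch of deriving them from the frame index and the FPS reported by OpenCV, assuming a constant frame rate (the helper name is hypothetical):

```python
from typing import List


def frame_timestamps_ms(fps: float, num_frames: int) -> List[int]:
    """Millisecond timestamps for frames 0..num_frames-1 of a constant-FPS video."""
    if fps <= 0:
        # OpenCV can report 0 when the container does not store a frame rate.
        raise ValueError("fps must be positive")
    # Frame i starts at i / fps seconds; truncate to whole milliseconds.
    return [int(index * 1000 / fps) for index in range(num_frames)]


print(frame_timestamps_ms(30.0, 4))  # [0, 33, 66, 100]
```

Passing a plain frame counter (0, 1, 2, ...) also satisfies the ordering rule, but then the timestamps no longer reflect the real time between frames.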


In [None]:
# @title
# import urllib.request

# # MediaPipe initialization
# BaseOptions = mp.tasks.BaseOptions
# HandLandmarker = mp.tasks.vision.HandLandmarker
# HandLandmarkerOptions = mp.tasks.vision.HandLandmarkerOptions
# VisionRunningMode = mp.tasks.vision.RunningMode

# # Create a hand landmarker instance with the video mode
# options = HandLandmarkerOptions(
#     base_options=BaseOptions(model_asset_path='hand_landmarker.task'),
#     running_mode=VisionRunningMode.VIDEO,
#     num_hands=2)

# # Open the video file
# cap = cv2.VideoCapture(input_video_path)
# if not cap.isOpened():
#     print(f"Error: Could not open video file {input_video_path}")
#     exit()

# # Get video properties
# width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
# height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
# fps = cap.get(cv2.CAP_PROP_FPS)
# total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

# # Create video writer for output
# fourcc = cv2.VideoWriter_fourcc(*'mp4v')
# out = cv2.VideoWriter(output_video_path, fourcc, fps, (width, height))

# with HandLandmarker.create_from_options(options) as landmarker:
#     # Initialize timestamp
#     timestamp = 0
#     frame_count = 0

#     while cap.isOpened():
#         success, frame = cap.read()
#         if not success:
#             print("End of video or error reading frame.")
#             break

#         frame_count += 1
#         # Optional: Print progress
#         if frame_count % 10 == 0:
#             print(f"Processing frame {frame_count}/{total_frames} ({frame_count/total_frames*100:.1f}%)")

#         # Convert to RGB (MediaPipe requirement)
#         frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
#         mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=frame_rgb)

#         # Process the frame
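#         results = landmarker.detect_for_video(mp_image, timestamp)
#         # detect_for_video() expects milliseconds, so advance to the next frame's start time
#         timestamp = int(frame_count * 1000 / fps)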

#         # Create a black canvas instead of using the original frame
#         canvas = np.zeros((height, width, 3), dtype=np.uint8)

#         # Draw hand landmarks
#         if results.hand_landmarks:
#             for idx, hand_landmarks in enumerate(results.hand_landmarks):
#                 # Get hand label ("Left" or "Right")
#                 handedness = results.handedness[idx][0].category_name

#                 # Get center of hand for text placement
#                 x_values = [landmark.x for landmark in hand_landmarks]
#                 y_values = [landmark.y for landmark in hand_landmarks]
#                 center_x = int(sum(x_values) / len(x_values) * width)
#                 center_y = int(sum(y_values) / len(y_values) * height)

#                 # Draw connections with white color
#                 for connection in mp.solutions.hands.HAND_CONNECTIONS:
#                     start_idx = connection[0]
#                     end_idx = connection[1]

#                     start_point = (int(hand_landmarks[start_idx].x * width),
#                                   int(hand_landmarks[start_idx].y * height))
#                     end_point = (int(hand_landmarks[end_idx].x * width),
#                                 int(hand_landmarks[end_idx].y * height))

#                     cv2.line(canvas, start_point, end_point, (255, 255, 255), 2)  # White lines

#                 # Draw landmarks with light blue color
#                 for landmark in hand_landmarks:
#                     landmark_point = (int(landmark.x * width),
#                                      int(landmark.y * height))
#                     cv2.circle(canvas, landmark_point, 5, (255, 200, 0), -1)  # Light blue dots

#                 # Display hand label
#                 color = (255, 100, 100) if handedness == "Left" else (100, 100, 255)  # Different colors for left/right
#                 text_position = (center_x, center_y - 30)
#                 cv2.putText(canvas, handedness, text_position,
#                            cv2.FONT_HERSHEY_SIMPLEX, 1, color, 2)

#         # Write the canvas to output video
#         out.write(canvas)

#         # Optional: Display the frame (comment out for faster processing)
#         # cv2.imshow('Processing Video', canvas)
#         # if cv2.waitKey(1) & 0xFF == ord('q'):
#         #     break

# # Release resources
# cap.release()
# out.release()
# cv2.destroyAllWindows()

# print(f"Processing complete. Output saved to {output_video_path}")

# files.download(output_video_path)
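The commented-out renderer draws the skeleton by hand: for every `(start_idx, end_idx)` pair in `HAND_CONNECTIONS`, it converts both landmarks to pixels and draws a line between them. The same mapping without OpenCV or MediaPipe, using a hypothetical four-edge thumb subset (`THUMB_CONNECTIONS`) instead of the full connection set:

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[int, int]

# Hypothetical subset of the hand skeleton: wrist (0) through the thumb (1-4),
# assuming MediaPipe's 21-point hand layout. The notebook uses the full
# mp.solutions.hands.HAND_CONNECTIONS set instead.
THUMB_CONNECTIONS = [(0, 1), (1, 2), (2, 3), (3, 4)]


def to_pixel(x: float, y: float, width: int, height: int) -> Point:
    """Convert normalized coordinates to integer pixel coordinates."""
    return int(x * width), int(y * height)


def connection_segments(landmarks: Sequence[Tuple[float, float]],
                        connections: Iterable[Tuple[int, int]],
                        width: int, height: int) -> List[Tuple[Point, Point]]:
    """Pixel endpoints of each (start, end) landmark pair, ready for cv2.line()."""
    return [
        (to_pixel(*landmarks[start], width, height),
         to_pixel(*landmarks[end], width, height))
        for start, end in connections
    ]


# A straight vertical "thumb" in a 200 x 400 image.
thumb = [(0.5, 1.0), (0.5, 0.75), (0.5, 0.5), (0.5, 0.25), (0.5, 0.0)]
for start_point, end_point in connection_segments(thumb, THUMB_CONNECTIONS, 200, 400):
    print(start_point, "->", end_point)
```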

In [23]:
import os
import cv2
import json
import mediapipe as mp
from google.colab import files

def process_video_to_json(input_video_path, gesture_label):
    """
    Process a video and extract hand landmarks to a JSON file

    Args:
        input_video_path: Path to the input video file
        gesture_label: Label for the gesture being performed in the video
    """
    # MediaPipe initialization
    BaseOptions = mp.tasks.BaseOptions
    HandLandmarker = mp.tasks.vision.HandLandmarker
    HandLandmarkerOptions = mp.tasks.vision.HandLandmarkerOptions
    VisionRunningMode = mp.tasks.vision.RunningMode

    # Create output JSON path
    output_json_path = os.path.splitext(input_video_path)[0] + "_landmarks.json"

    # Create a hand landmarker instance with the video mode
    options = HandLandmarkerOptions(
        base_options=BaseOptions(model_asset_path='hand_landmarker.task'),
        running_mode=VisionRunningMode.VIDEO,
        num_hands=2)

    # Open the video file
    cap = cv2.VideoCapture(input_video_path)
    if not cap.isOpened():
        print(f"Error: Could not open video file {input_video_path}")
        return None

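    # Get video properties
    fps = cap.get(cv2.CAP_PROP_FPS)
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    if fps <= 0:
        # Frame timestamps are derived from the FPS, so it must be known
        print(f"Error: Could not read the frame rate of {input_video_path}")
        cap.release()
        return None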

    # Initialize data structure for JSON
    video_landmarks = {
        'video_id': os.path.splitext(os.path.basename(input_video_path))[0],
        'gesture_label': gesture_label,
        'fps': fps,
        'total_frames': total_frames,
        'frames': []
    }

    with HandLandmarker.create_from_options(options) as landmarker:
        # Initialize timestamp
        timestamp = 0
        frame_count = 0

        while cap.isOpened():
            success, frame = cap.read()
            if not success:
                break

            frame_count += 1
            # Optional: Print progress (skipped when the container reports no frame count)
            if frame_count % 10 == 0 and total_frames > 0:
                print(f"Processing {input_video_path} - frame {frame_count}/{total_frames} ({frame_count/total_frames*100:.1f}%)")

            # Convert to RGB (MediaPipe requirement)
            frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=frame_rgb)

            # Process the frame
            results = landmarker.detect_for_video(mp_image, timestamp)

            # Calculate timestamp in seconds
            timestamp_seconds = frame_count / fps

            # Frame landmark data
            frame_data = {
                'frame_number': frame_count,
                'timestamp': timestamp_seconds,
                'hands': []
            }

            # Write hand landmarks to JSON
            if results.hand_landmarks:
                for idx, hand_landmarks in enumerate(results.hand_landmarks):
                    # Get hand label ("Left" or "Right")
                    handedness = results.handedness[idx][0].category_name

                    # Prepare hand landmarks
                    hand_data = {
                        'hand_type': handedness,
                        'landmarks': []
                    }

                    # Process each landmark
                    for landmark_idx, landmark in enumerate(hand_landmarks):
                        hand_data['landmarks'].append({
                            'landmark_id': landmark_idx,
                            'x': landmark.x,
                            'y': landmark.y,
                            'z': landmark.z
                        })

                    frame_data['hands'].append(hand_data)

            # Add frame data if landmarks were detected
            if frame_data['hands']:
                video_landmarks['frames'].append(frame_data)

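            # detect_for_video() expects timestamps in milliseconds, so advance
            # to the start time of the next frame instead of counting frames.
            timestamp = int(frame_count * 1000 / fps)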

    # Release resources
    cap.release()

    # Save to JSON
    with open(output_json_path, 'w', encoding='utf-8') as jsonfile:
        json.dump(video_landmarks, jsonfile, ensure_ascii=False, indent=4)

    print(f"Processing complete. Landmarks saved to {output_json_path}")
    return output_json_path

# Upload one or more video files
uploaded = files.upload()

# Ask for one gesture label that applies to every uploaded video
gesture_label = input("Enter the gesture name (e.g. hello, thank_you): ")

# Collect the paths of the JSON files
json_paths = []

# Process each video file
for video_path in uploaded.keys():
    json_path = process_video_to_json(video_path, gesture_label)
    if json_path:
        json_paths.append(json_path)

# Download every JSON file
for json_path in json_paths:
    files.download(json_path)

Saving 1.2.1.mp4 to 1.2.1 (4).mp4
Saving output_hand_detection.mp4 to output_hand_detection (3).mp4
Enter the gesture name (e.g. hello, thank_you): hi
Processing 1.2.1 (4).mp4 - frame 10/217 (4.6%)
Processing 1.2.1 (4).mp4 - frame 20/217 (9.2%)
Processing 1.2.1 (4).mp4 - frame 30/217 (13.8%)
Processing 1.2.1 (4).mp4 - frame 40/217 (18.4%)
Processing 1.2.1 (4).mp4 - frame 50/217 (23.0%)
Processing 1.2.1 (4).mp4 - frame 60/217 (27.6%)
Processing 1.2.1 (4).mp4 - frame 70/217 (32.3%)
Processing 1.2.1 (4).mp4 - frame 80/217 (36.9%)
Processing 1.2.1 (4).mp4 - frame 90/217 (41.5%)
Processing 1.2.1 (4).mp4 - frame 100/217 (46.1%)
Processing 1.2.1 (4).mp4 - frame 110/217 (50.7%)
Processing 1.2.1 (4).mp4 - frame 120/217 (55.3%)
Processing 1.2.1 (4).mp4 - frame 130/217 (59.9%)
Processing 1.2.1 (4).mp4 - frame 140/217 (64.5%)
Processing 1.2.1 (4).mp4 - frame 150/217 (69.1%)
Processing 1.2.1 (4).mp4 - frame 160/217 (73.7%)
Processing 1.2.1 (4).mp4 - frame 170/217 (78.3%)
Processing 1.2.1 (4).


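`process_video_to_json` writes only the frames in which at least one hand was detected, so `frames` can be much shorter than `total_frames`. Before training, a quick check of each exported file can catch videos where detection mostly failed. A minimal sketch: `summarize_landmark_json` and its summary keys are made up for this example, while the field names it reads follow the dictionary built in `process_video_to_json`.

```python
import json
from collections import Counter
from typing import Any, Dict


def summarize_landmark_json(path: str) -> Dict[str, Any]:
    """Quick sanity check of a file written by process_video_to_json()."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    frames = data["frames"]
    # How many frames contain one hand, two hands, ...
    hands_per_frame = Counter(len(frame["hands"]) for frame in frames)
    # How often each handedness label ("Left" / "Right") appears
    hand_types = Counter(hand["hand_type"] for frame in frames for hand in frame["hands"])
    return {
        "gesture_label": data["gesture_label"],
        "frames_with_hands": len(frames),
        "total_frames": data["total_frames"],
        "hands_per_frame": dict(hands_per_frame),
        "hand_types": dict(hand_types),
    }
```

For example, `summarize_landmark_json(json_paths[0])` once the export cell has run.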
# Data Processing

Combine the landmark data from every JSON file in the folder to increase the diversity and coverage of the dataset.
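The preprocessing cell turns each video into a fixed-size array of 30 frames x 42 points x 3 coordinates (21 landmarks per hand, up to two hands), zero-padding missing hands and short videos. It fills the 42 rows in detection order, so nothing guarantees that rows 0-20 always hold the same hand. A possible alternative, not part of the original pipeline and sketched here with plain lists, gives each hand a fixed slot based on its `hand_type`; the slot order in `HAND_SLOTS` is an assumption.

```python
from typing import Any, Dict, List, Sequence

NUM_FRAMES = 30       # same sequence length as process_video_landmarks
POINTS_PER_HAND = 21  # landmarks per hand in MediaPipe's hand model
HAND_SLOTS = {"Left": 0, "Right": 1}  # assumed slot order for this sketch


def empty_frame() -> List[List[float]]:
    """42 rows of [x, y, z] zeros: room for two hands."""
    return [[0.0, 0.0, 0.0] for _ in range(2 * POINTS_PER_HAND)]


def frame_to_slots(hands: Sequence[Dict[str, Any]]) -> List[List[float]]:
    """Left hand in rows 0-20, right hand in rows 21-41; missing hands stay zero."""
    rows = empty_frame()
    for hand in hands:
        slot = HAND_SLOTS.get(hand["hand_type"])
        if slot is None:  # unexpected label: skip the hand
            continue
        for i, lm in enumerate(hand["landmarks"][:POINTS_PER_HAND]):
            rows[slot * POINTS_PER_HAND + i] = [lm["x"], lm["y"], lm["z"]]
    return rows


def video_to_sequence(video_data: Dict[str, Any]) -> List[List[List[float]]]:
    """Nested list of shape (NUM_FRAMES, 42, 3); short videos are zero-padded."""
    frames = [f for f in video_data["frames"] if f["hands"]][:NUM_FRAMES]
    sequence = [frame_to_slots(f["hands"]) for f in frames]
    while len(sequence) < NUM_FRAMES:
        sequence.append(empty_frame())
    return sequence
```

`np.array(video_to_sequence(video_data))` then has the same (30, 42, 3) shape that `process_video_landmarks` produces. If both detected hands carry the same `hand_type`, the second one overwrites the first in this sketch.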

In [25]:
import os
import json
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from pathlib import Path

def preprocess_sign_language_data(data_directory):
    """
    Preprocess sign language data from JSON files

    Args:
        data_directory: Path to directory containing JSON files

    Returns:
        X: Processed feature data
        y: Corresponding labels
    """
    # Collect every sample and its label
    all_landmarks = []
    all_labels = []

    # Validate the input before doing any work
    if not os.path.exists(data_directory):
        print(f"Error: Directory {data_directory} does not exist!")
        return [], [], None # Return empty lists and None instead of None, None, None

    # Loop over the JSON files directly (sorted for a reproducible order)
    for json_file in sorted(os.listdir(data_directory)):
        if json_file.endswith('.json'):
            file_path = os.path.join(data_directory, json_file)

            try:
                # Load the JSON data (written as UTF-8 by process_video_to_json)
                with open(file_path, 'r', encoding='utf-8') as f:
                    video_data = json.load(f)

                # Extract features
                processed_landmarks = process_video_landmarks(video_data)

                # Store the sample
                all_landmarks.append(processed_landmarks)
                # Use the gesture label stored in the JSON; fall back to the
                # file-name prefix for files that do not contain one
                all_labels.append(video_data.get('gesture_label') or json_file.split('_')[0])

            except Exception as e:
                print(f"Error processing file {file_path}: {e}")

    # Stop if no JSON file could be processed
    if not all_landmarks:
        print(f"No usable JSON files found in directory {data_directory}")
        return [], [], None # Return empty lists and None instead of None, None, None

    # Convert the list to a NumPy array of shape (num_samples, 30, 42, 3)
    X = np.array(all_landmarks)

    # Encode labels
    label_encoder = LabelEncoder()
    y = label_encoder.fit_transform(all_labels)

    # Print debug information
    print(f"Processed {len(X)} samples")
    print(f"Labels: {label_encoder.classes_}")

    return X, y, label_encoder

def process_video_landmarks(video_data):
    """
    แปลงข้อมูลแลนด์มาร์คจาก JSON เป็นชุดข้อมูลที่เหมาะสำหรับโมเดล

    Args:
        video_data: ข้อมูล JSON ของวิดีโอ

    Returns:
        processed_landmarks: อาร์เรย์ของแลนด์มาร์ค
    """
    # เลือกเฟรมที่มีการตรวจจับมือ
    hand_frames = [frame for frame in video_data['frames'] if frame['hands']]

    # เลือกเฟรมทั้งหมด (หรือจำกัดจำนวนเฟรม)
    selected_frames = hand_frames[:30]  # จำกัดที่ 30 เฟรม

    # เตรียมอาร์เรย์เก็บแลนด์มาร์ค
    landmarks_sequence = []

    for frame in selected_frames:
        # สำหรับแต่ละมือในเฟรม
        frame_landmarks = []
        for hand in frame['hands']:
            # สกัด x, y, z ของแต่ละจุด
            hand_landmarks = [
                [landmark['x'], landmark['y'], landmark['z']]
                for landmark in hand['landmarks']
            ]
            frame_landmarks.extend(hand_landmarks)

        # padding หากมีจุดไม่ครบ
        while len(frame_landmarks) < 42:  # 21 จุด * 2 มือ
            frame_landmarks.append([0, 0, 0])

        landmarks_sequence.append(frame_landmarks[:42])

    # padding sequence ให้มีความยาวคงที่
    while len(landmarks_sequence) < 30:
        landmarks_sequence.append([[0, 0, 0]] * 42)

    # เพิ่มในฟังก์ชัน process_video_landmarks เพื่อดีบัก
    print("Video data keys:", video_data.keys())
    print("Number of frames:", len(video_data['frames']))
    print("First frame hands:", video_data['frames'][0]['hands'])

    return np.array(landmarks_sequence)


# Example usage
data_directory = '/content/hi'  # Make sure this path is correct
# Check if the directory exists
if not os.path.exists(data_directory):
    print(f"Error: Directory '{data_directory}' does not exist. Please create it and add your JSON files.")
else:
    # Check if the directory contains any JSON files
    json_files = [f for f in os.listdir(data_directory) if f.endswith('.json')]
    if not json_files:
        print(f"Error: Directory '{data_directory}' does not contain any JSON files. Please add your JSON files.")
    else:
        X, y, label_encoder = preprocess_sign_language_data(data_directory)

        if len(X) > 0 and len(y) > 0 and label_encoder is not None:

            # Inspect the data before splitting it
            print("X shape:", X.shape)
            print("y shape:", y.shape)
            print("X data type:", X.dtype)
            print("First sample landmarks:\n", X[0])
            print("Labels:", y)
            # Split the data into training (80%) and test (20%) sets
            X_train, X_test, y_train, y_test = train_test_split(
                X, y, test_size=0.2, random_state=42
            )

            print("Shape of training data:", X_train.shape)
            print("Unique labels:", label_encoder.classes_)
        else:
            print("No data to process. Check your JSON files.")

Video data keys: dict_keys(['video_id', 'gesture_label', 'fps', 'total_frames', 'frames'])
Number of frames: 48
First frame hands: [{'hand_type': 'Right', 'landmarks': [{'landmark_id': 0, 'x': 0.3443603515625, 'y': 0.8842741250991821, 'z': -1.2120239034629776e-07}, {'landmark_id': 1, 'x': 0.3644331097602844, 'y': 0.8599238991737366, 'z': -0.011832674033939838}, {'landmark_id': 2, 'x': 0.3963460624217987, 'y': 0.8404296040534973, 'z': -0.024631474167108536}, {'landmark_id': 3, 'x': 0.4157256782054901, 'y': 0.817276656627655, 'z': -0.03736870363354683}, {'landmark_id': 4, 'x': 0.4262501895427704, 'y': 0.7945399284362793, 'z': -0.051433876156806946}, {'landmark_id': 5, 'x': 0.3917095363140106, 'y': 0.8760520219802856, 'z': -0.028685227036476135}, {'landmark_id': 6, 'x': 0.39549970626831055, 'y': 0.8848051428794861, 'z': -0.04091654717922211}, {'landmark_id': 7, 'x': 0.3912796974182129, 'y': 0.8867673873901367, 'z': -0.050275832414627075}, {'landmark_id': 8, 'x': 0.3873821794986725, 'y': 0