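# Model Experiments by Scenario

## Detecting Dangerous Scenarios
**[Danger]**

- Grabbing the door handle
- The front door opening (together with grabbing the handle → e.g., by detecting a change in the handle's position)
- Trying to push the handle down with a wire (an attempt to open the door)
- Shining a flashlight on the door lock to trace fingerprints<br>
→ All of these are actions near the door lock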

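## Experiment 1: IoU calculation? Using an ROI?
- Define a region around the door lock → if a wrist enters that region and stays there for a certain time, treat it as a dangerous situation

1. Door lock detection: set an ROI
    - Problem: there is no door lock label → experiment with a roughly drawn box over the area
    - Handle-type door lock: test whether the handle is recognized + set an ROI over that area
    - Designs where the door lock itself is the handle: set the ROI manually
2. Wrist detection
   - Use a pose/skeleton model to find the wrist keypoints
3. Test in the order **image** → **video** → **live video**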
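
The "stays there for a certain time" condition from the plan above can be tracked with a small timer. The following is only a minimal sketch: the class name `DwellTimer` and the 2-second threshold are hypothetical choices, not values from this experiment.

```python
class DwellTimer:
    """Tracks how long a condition (e.g. wrist inside the ROI) has held continuously."""

    def __init__(self, threshold_sec=2.0):
        # threshold_sec is an assumed value, not taken from the experiment
        self.threshold_sec = threshold_sec
        self.entered_at = None  # timestamp at which the wrist entered the ROI

    def update(self, inside, now_sec):
        """Return True once `inside` has been continuously True for threshold_sec."""
        if not inside:
            self.entered_at = None  # wrist left the ROI: reset the timer
            return False
        if self.entered_at is None:
            self.entered_at = now_sec
        return (now_sec - self.entered_at) >= self.threshold_sec


timer = DwellTimer(threshold_sec=2.0)
# (timestamp in seconds, wrist inside ROI?) for a few hypothetical frames
observations = [(0.0, True), (1.0, True), (1.5, False), (2.0, True), (4.5, True)]
alerts = [timer.update(inside, t) for t, inside in observations]
print(alerts)
```

Leaving the ROI resets the timer, so only an uninterrupted stay raises the alert.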

### 1) Setting the ROI
- Use OpenCV
  > cv2.selectROI(win_name, img, showCrosshair=True, fromCenter=False)
- Reference: [blog](https://alpaca-gt.tistory.com/80)
  

In [2]:
import cv2
import numpy as np

In [3]:
# Load the test image
img = cv2.imread('pose_test_data/test_img1.png')
x, y, w, h = cv2.selectROI('img', img, False)  # opens a window showing the original image
if w and h:
    roi = img[y:y+h, x:x+w]
    # cv2.imshow('cropped', roi)  # show the selected ROI in a new window

cv2.waitKey(0)  # close window when a key press is detected
cv2.destroyAllWindows()
cv2.waitKey(1) 

-1

### 2) Extracting the wrist keypoint
- Use the [yolov8n-pose](https://docs.ultralytics.com/tasks/pose/) model
    - The model output has no names for the keypoints, only their x, y coordinates
    - Plan to add keypoint names as follows ([reference](https://github.com/Alimustoofaa/YoloV8-Pose-Keypoint-Classification))

In [4]:
# yolo v8 pose test

from PIL import Image
import glob
import os
from ultralytics import YOLO

# Load a model
pose_model = YOLO('yolov8n-pose.pt')  # load an official pose model

In [5]:
# Get the test images
current_path = os.getcwd()
img_path = current_path + '/pose_test_data'
imgs = glob.glob(img_path + '/*.png')

len(imgs)

6

In [3]:
pose_folder_path = "./pose_result"

# Create the folder if it does not exist
if not os.path.exists(pose_folder_path):
    os.makedirs(pose_folder_path)

# Run batched inference on a list of images
pose_results = pose_model(imgs, stream=True)  # return a generator of Results objects

# Process results generator
for i, pose_result in enumerate(pose_results):
    boxes = pose_result.boxes  # Boxes object for bounding box outputs
    masks = pose_result.masks  # Masks object for segmentation masks outputs
    keypoints = pose_result.keypoints  # Keypoints object for pose outputs
    probs = pose_result.probs  # Probs object for classification outputs
    # pose_result.show()  # display to screen
    pose_result.save(filename=os.path.join(pose_folder_path, f'pose_result_{i+1}.jpg'))  # save to disk


0: 640x640 1 person, 109.5ms
1: 640x640 1 person, 109.5ms
2: 640x640 1 person, 109.5ms
3: 640x640 1 person, 109.5ms
4: 640x640 1 person, 109.5ms
5: 640x640 1 person, 109.5ms
Speed: 3.1ms preprocess, 109.5ms inference, 3.3ms postprocess per image at shape (1, 3, 640, 640)


In [6]:
# Extract keypoint
# Pick an image where the wrist keypoint is easy to capture in a door-lock-pressing pose

# Predict with the model
pose_results = pose_model('./pose_test_data/test_img1.png')

# Process results generator
for i, pose_result in enumerate(pose_results):
    boxes = pose_result.boxes  # Boxes object for bounding box outputs
    masks = pose_result.masks  # Masks object for segmentation masks outputs
    keypoints = pose_result.keypoints  # Keypoints object for pose outputs
    probs = pose_result.probs  # Probs object for classification outputs
    # pose_result.show()  # display to screen
    # pose_result.save(filename=os.path.join(pose_folder_path, f'pose_result_{i+1}.jpg'))  # save to disk


image 1/1 /Users/seullee/Documents/STUDY-AI/AIFFEL/cozy_house/ss_lab/pose_test_data/test_img1.png: 384x640 1 person, 151.1ms
Speed: 4.7ms preprocess, 151.1ms inference, 29.7ms postprocess per image at shape (1, 3, 384, 640)


In [7]:
print(keypoints)

ultralytics.engine.results.Keypoints object with attributes:

conf: tensor([[0.3611, 0.0732, 0.4092, 0.2473, 0.8563, 0.9219, 0.9878, 0.6475, 0.9658, 0.5326, 0.8874, 0.9865, 0.9944, 0.9825, 0.9929, 0.9610, 0.9774]])
data: tensor([[[0.0000e+00, 0.0000e+00, 3.6106e-01],
         [0.0000e+00, 0.0000e+00, 7.3235e-02],
         [0.0000e+00, 0.0000e+00, 4.0920e-01],
         [0.0000e+00, 0.0000e+00, 2.4734e-01],
         [9.8182e+02, 2.1188e+02, 8.5630e-01],
         [8.8804e+02, 2.5682e+02, 9.2194e-01],
         [9.8083e+02, 2.8153e+02, 9.8781e-01],
         [8.7086e+02, 3.4776e+02, 6.4748e-01],
         [1.0067e+03, 3.7130e+02, 9.6585e-01],
         [9.3763e+02, 3.7669e+02, 5.3262e-01],
         [1.0686e+03, 3.5442e+02, 8.8741e-01],
         [8.9290e+02, 4.3181e+02, 9.8648e-01],
         [9.5620e+02, 4.4372e+02, 9.9443e-01],
         [8.8858e+02, 5.4339e+02, 9.8247e-01],
         [9.5626e+02, 5.6408e+02, 9.9286e-01],
         [8.6905e+02, 6.5733e+02, 9.6100e-01],
         [9.4230e+02, 6.878

- There is no way to tell which body part each coordinate belongs to
- Use a pydantic class so that each keypoint can be looked up by name

In [8]:
from pydantic import BaseModel

class GetKeypoint(BaseModel):
    NOSE:           int = 0
    LEFT_EYE:       int = 1
    RIGHT_EYE:      int = 2
    LEFT_EAR:       int = 3
    RIGHT_EAR:      int = 4
    LEFT_SHOULDER:  int = 5
    RIGHT_SHOULDER: int = 6
    LEFT_ELBOW:     int = 7
    RIGHT_ELBOW:    int = 8
    LEFT_WRIST:     int = 9
    RIGHT_WRIST:    int = 10
    LEFT_HIP:       int = 11
    RIGHT_HIP:      int = 12
    LEFT_KNEE:      int = 13
    RIGHT_KNEE:     int = 14
    LEFT_ANKLE:     int = 15
    RIGHT_ANKLE:    int = 16

# example 
get_keypoint = GetKeypoint()

In [9]:
result_keypoint = keypoints.xyn.cpu().numpy()[0]
result_keypoint

array([[          0,           0],
       [          0,           0],
       [          0,           0],
       [          0,           0],
       [    0.51136,     0.19619],
       [    0.46252,      0.2378],
       [    0.51085,     0.26067],
       [    0.45357,       0.322],
       [    0.52431,     0.34379],
       [    0.48835,     0.34879],
       [    0.55656,     0.32817],
       [    0.46505,     0.39982],
       [    0.49802,     0.41085],
       [     0.4628,     0.50314],
       [    0.49805,      0.5223],
       [    0.45263,     0.60864],
       [    0.49078,     0.63687]], dtype=float32)

In [10]:
right_wrist_x, right_wrist_y = result_keypoint[get_keypoint.RIGHT_WRIST]
right_wrist_x, right_wrist_y

(0.5565609, 0.32816875)
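
To inspect every keypoint at once instead of one index at a time, the rows can be paired with their names. This is a minimal sketch: `KEYPOINT_NAMES` and `name_keypoints` are hypothetical helpers, and the coordinates are copied from the array printed above.

```python
# COCO 17-keypoint order, matching the indices defined in GetKeypoint
KEYPOINT_NAMES = [
    "NOSE", "LEFT_EYE", "RIGHT_EYE", "LEFT_EAR", "RIGHT_EAR",
    "LEFT_SHOULDER", "RIGHT_SHOULDER", "LEFT_ELBOW", "RIGHT_ELBOW",
    "LEFT_WRIST", "RIGHT_WRIST", "LEFT_HIP", "RIGHT_HIP",
    "LEFT_KNEE", "RIGHT_KNEE", "LEFT_ANKLE", "RIGHT_ANKLE",
]


def name_keypoints(rows):
    """Pair each (x, y) row with its keypoint name (hypothetical helper)."""
    if len(rows) != len(KEYPOINT_NAMES):
        raise ValueError(f"expected {len(KEYPOINT_NAMES)} rows, got {len(rows)}")
    return {name: (float(x), float(y)) for name, (x, y) in zip(KEYPOINT_NAMES, rows)}


# Normalized coordinates copied from the printed result_keypoint array
rows = [
    [0, 0], [0, 0], [0, 0], [0, 0],
    [0.51136, 0.19619], [0.46252, 0.2378], [0.51085, 0.26067],
    [0.45357, 0.322], [0.52431, 0.34379], [0.48835, 0.34879],
    [0.55656, 0.32817], [0.46505, 0.39982], [0.49802, 0.41085],
    [0.4628, 0.50314], [0.49805, 0.5223], [0.45263, 0.60864],
    [0.49078, 0.63687],
]
named = name_keypoints(rows)
print(named["RIGHT_WRIST"])
print(named["LEFT_WRIST"])
```

The same function should also accept the NumPy array directly, since it only iterates over rows.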

- Check that the coordinate lands in the right place on the image ([OpenCV docs](https://docs.opencv.org/4.x/dc/da5/tutorial_py_drawing_functions.html), [blog](https://inhovation97.tistory.com/52))
> cv2.circle(img, center, radius, color[, thickness, lineType])<br>
>
> - img: image to draw on (NumPy array)<br>
> - center: center coordinates of the circle (x, y)<br>
> - radius: radius of the circle<br>
> - color: color as (B, G, R), 0~255 (note the BGR order)<br>
> - thickness: line thickness (-1: filled)<br>
> - lineType
>   - line type, same as in cv2.line()<br>
>     cv2.LINE_4: 4-connected line<br>
>     cv2.LINE_8: 8-connected line<br>
>     cv2.LINE_AA: anti-aliased line (no staircase effect)<br>

In [11]:
image_path = './pose_test_data/test_img1.png'
image = cv2.imread(image_path)

# get image dimensions
height, width, _ = image.shape

# Convert float coordinates to integer coordinates
right_wrist_x_int = int(right_wrist_x * width)
right_wrist_y_int = int(right_wrist_y * height)

# Draw circles on the image at the wrist coordinates
cv2.circle(image, (right_wrist_x_int, right_wrist_y_int), 5, (0, 255, 0), -1)  # Green circle for right wrist
# cv2.imshow('marked_image', image) # show marked image
cv2.imwrite('./marked_image.jpg', image) # save marked image 

True

- The keypoint is placed correctly on the image.
  - <img src=https://github.com/thetjswo/PJ_AIFFELthon/assets/140625136/8d1bfa05-6712-4a0d-acc9-ff509965643f width="50%">

### 3) Checking whether the wrist keypoint is inside the ROI

In [14]:
# Draw ROI and get coordinates
x, y, w, h = cv2.selectROI('roi_img', image, False)

if w and h:
    roi = image[y:y+h, x:x+w]
    
    # Convert float coordinates to integer coordinates
    right_wrist_x_int = int(right_wrist_x * width)
    right_wrist_y_int = int(right_wrist_y * height)
    
    # Check if wrist keypoint is within ROI
    if x <= right_wrist_x_int <= x + w and y <= right_wrist_y_int <= y + h:
        dangerous_state = True
        print("Wrist keypoint is inside the ROI. Dangerous state detected.")
    else:
        dangerous_state = False
        print("Wrist keypoint is outside the ROI. Safe state.")


cv2.waitKey(0) # waits for a key event infinitely : close window when a key press is detected
cv2.destroyAllWindows()
cv2.waitKey(1) # wait for delay milliseconds, when it is positive

Wrist keypoint is inside the ROI. Dangerous state detected.


-1
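
The containment check can be factored into a small function so that it can be tested without opening a window. This is a minimal sketch: `wrist_in_roi` is a hypothetical helper, the 1920x1080 frame size is inferred from the ratio between the pixel and normalized coordinates printed earlier, and the ROI values are example numbers. Treating (0, 0) as "not detected" is also an assumption, based on the undetected face keypoints showing up as zeros.

```python
def wrist_in_roi(xn, yn, frame_w, frame_h, roi):
    """Return True if a normalized keypoint falls inside roi = (x, y, w, h) in pixels."""
    if xn == 0 and yn == 0:
        return False  # assumption: (0, 0) means the keypoint was not detected
    x, y, w, h = roi
    if w <= 0 or h <= 0:
        return False  # no valid ROI was selected
    # Convert normalized coordinates to pixel coordinates
    px, py = int(xn * frame_w), int(yn * frame_h)
    return x <= px <= x + w and y <= py <= y + h


example_roi = (1011, 298, 69, 148)  # example (x, y, w, h) values
print(wrist_in_roi(0.55656, 0.32817, 1920, 1080, example_roi))
print(wrist_in_roi(0.0, 0.0, 1920, 1080, example_roi))
```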

### 4) PoseDetector class
- Set the ROI first
- Then pass the image to the model and treat the case where either wrist falls inside the ROI as a dangerous situation

In [1]:
import cv2
from ultralytics import YOLO
from pydantic import BaseModel

class GetKeypoint(BaseModel):
    NOSE:           int = 0
    LEFT_EYE:       int = 1
    RIGHT_EYE:      int = 2
    LEFT_EAR:       int = 3
    RIGHT_EAR:      int = 4
    LEFT_SHOULDER:  int = 5
    RIGHT_SHOULDER: int = 6
    LEFT_ELBOW:     int = 7
    RIGHT_ELBOW:    int = 8
    LEFT_WRIST:     int = 9
    RIGHT_WRIST:    int = 10
    LEFT_HIP:       int = 11
    RIGHT_HIP:      int = 12
    LEFT_KNEE:      int = 13
    RIGHT_KNEE:     int = 14
    LEFT_ANKLE:     int = 15
    RIGHT_ANKLE:    int = 16


class PoseDetector:
    def __init__(self, model_name):
        # Load the YOLOv8 pose model
        self.model = YOLO(model_name)
        # Initialize the ROI (Region of Interest) to None
        self.roi = None
        # Initialize the dangerous state to False
        self.dangerous_state = False
        # Get keypoint with name
        self.get_keypoint = GetKeypoint()

    def detect_pose(self, image_path):
        # Read the input image
        image = cv2.imread(image_path)
        # Run the YOLOv8 pose model on the input image
        results = self.model(image)

        for result in results:
            # Get the keypoint coordinates for the detected pose
            keypoints = result.keypoints.xyn.cpu().numpy()[0]
            
            # Extract the wrists keypoint coordinates
            left_wrist_x, left_wrist_y  = keypoints[self.get_keypoint.LEFT_WRIST]
            right_wrist_x, right_wrist_y = keypoints[self.get_keypoint.RIGHT_WRIST]

            # Convert the wrists keypoint coordinates to integer values
            left_wrist_x_int = int(left_wrist_x * image.shape[1])
            left_wrist_y_int = int(left_wrist_y * image.shape[0])
            right_wrist_x_int = int(right_wrist_x * image.shape[1])
            right_wrist_y_int = int(right_wrist_y * image.shape[0])


            # If the ROI is not set, select the ROI using cv2.selectROI
            if self.roi is None:
                self.roi = cv2.selectROI('roi_img', image, False)

                cv2.waitKey(0)  # close window when a key press is detected
                cv2.destroyAllWindows()
                cv2.waitKey(1) 

            # Check if a valid ROI was selected
            if self.roi[2] > 0 and self.roi[3] > 0:
                # Unpack the ROI coordinates
                x_roi, y_roi, w_roi, h_roi = self.roi
                # Check if the wrist keypoint is within the ROI
                if (x_roi <= left_wrist_x_int <= x_roi + w_roi and y_roi <= left_wrist_y_int <= y_roi + h_roi) or \
                (x_roi <= right_wrist_x_int <= x_roi + w_roi and y_roi <= right_wrist_y_int <= y_roi + h_roi):
                    self.dangerous_state = True
                    print("Wrist keypoint is inside the ROI. Dangerous state detected.")
                else:
                    self.dangerous_state = False
                    print("Wrist keypoint is outside the ROI. Safe state.")

# Usage
pose_detector = PoseDetector("yolov8n-pose.pt")
pose_detector.detect_pose("./pose_test_data/test_img2.png") # photo of a person leaning their face against the door


0: 384x640 1 person, 103.4ms
Speed: 7.1ms preprocess, 103.4ms inference, 9.2ms postprocess per image at shape (1, 3, 384, 640)
Wrist keypoint is outside the ROI. Safe state.


In [2]:
pose_detector.detect_pose("./pose_test_data/test_img3.png") # photo of a person touching the door lock


0: 384x640 1 person, 151.2ms
Speed: 82.9ms preprocess, 151.2ms inference, 46.4ms postprocess per image at shape (1, 3, 384, 640)
Wrist keypoint is inside the ROI. Dangerous state detected.


- With the same PoseDetector instance, the ROI only needs to be set once<br> => if the image array changes entirely, the ROI should be set again
- Check whether this also works for video
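
One way to re-select the ROI when the input changes is to remember the frame size the ROI was drawn on. This is a minimal sketch under the assumption that a change in frame size is a good enough signal of "a different input"; `RoiStore` and `select_fn` are hypothetical names, and in practice `select_fn` would wrap `cv2.selectROI`.

```python
class RoiStore:
    """Remembers an ROI together with the frame size it was drawn on (hypothetical helper)."""

    def __init__(self):
        self.roi = None
        self.frame_size = None

    def get(self, frame_size, select_fn):
        """Return the cached ROI, asking for a new one if the frame size changed."""
        if self.roi is None or self.frame_size != frame_size:
            self.roi = select_fn()
            self.frame_size = frame_size
        return self.roi


calls = []

def fake_select():
    # Stand-in for: lambda: cv2.selectROI('roi_img', image, False)
    calls.append(1)
    return (10, 20, 30, 40)

store = RoiStore()
store.get((1080, 1920), fake_select)  # first call: ROI is selected
store.get((1080, 1920), fake_select)  # same size: cached ROI is reused
store.get((720, 1280), fake_select)   # different size: ROI is selected again
print(len(calls))
```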

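### 5-1) Setting the ROI on video
- selectROI blocks on a single still frame, so it cannot be used while the video keeps playing; `setMouseCallback` seems to work instead, so test it
- Reference: [Stack Overflow](https://stackoverflow.com/questions/68969235/select-roi-on-video-stream-while-its-playing)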

In [7]:
# import cv2

# Test with a saved video first
video_path = "./C021_A18_SY17_P07_S06_02DAS.mp4"
cap = cv2.VideoCapture(video_path)
cv2.namedWindow('Frame')
if not cap.isOpened():
    exit()

# Our ROI, defined by two points
p1, p2 = (0, 0), (0, 0)
drawing = False  # True while ROI is actively being drawn by mouse
show_drawing = False  # True while ROI is drawn but is pending use or cancel
blue = (255, 0, 0)


def on_mouse(event, x, y, flags, userdata):
    global p1, p2, drawing, show_drawing

    if event == cv2.EVENT_LBUTTONDOWN:
        # Left click down (select first point)
        drawing = True
        show_drawing = True
        p1 = x, y
        p2 = x, y
    elif event == cv2.EVENT_MOUSEMOVE:
        # Drag to second point
        if drawing:
            p2 = x, y
    elif event == cv2.EVENT_LBUTTONUP:
        # Left click up (select second point)
        drawing = False
        p2 = x, y


cv2.setMouseCallback('Frame', on_mouse)

while True:
    val, fr = cap.read()
    if not val:
        break

    if show_drawing:
        # Fix p2 to be always within the frame
        p2 = (
            0 if p2[0] < 0 else (p2[0] if p2[0] < fr.shape[1] else fr.shape[1]),
            0 if p2[1] < 0 else (p2[1] if p2[1] < fr.shape[0] else fr.shape[0])
        )
        cv2.rectangle(fr, p1, p2, blue, 2)
        avg_y = (p1[1] + p2[1]) // 2
        cv2.line(fr, (p1[0], avg_y), (p2[0], avg_y), blue, 2)  # Middle horizontal line
        avg_x = (p1[0] + p2[0]) // 2
        cv2.line(fr, (avg_x, p1[1]), (avg_x, p2[1]), blue, 2)  # Middle vertical line

    cv2.imshow('Frame', fr)

    pressed = cv2.waitKey(1)
    if pressed in [13, 32]:
        # Pressed Enter or Space to use ROI
        drawing = False
        show_drawing = False
        # here do something with ROI points values (p1 and p2)
        print(p1, p2)
    elif pressed in [ord('c'), ord('C'), 27]:
        # Pressed C or Esc to cancel ROI
        drawing = False
        show_drawing = False
        print(p1, p2)
    elif pressed in [ord('q'), ord('Q')]:
        # Pressed Q to exit
        break

cap.release()
cv2.destroyAllWindows()
cv2.waitKey(1)

(978, 268) (1112, 506)
(775, 124) (1141, 713)
(508, 377) (657, 547)
(1250, 274) (1462, 722)
(796, 144) (1172, 837)


-1

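- Redrawing the ROI on a playing video is possible, but since the pose model receives input and returns keypoints frame by frame, it is unclear whether this adds anything.
- Adapt the PoseDetector class built above into a video version and test it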
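
The mouse-callback approach yields two corner points, while the PoseDetector code expects the `(x, y, w, h)` tuple that `cv2.selectROI` returns. A minimal sketch of the conversion follows; `points_to_roi` is a hypothetical helper, and the sample points come from the output printed above.

```python
def points_to_roi(p1, p2):
    """Convert two opposite corner points into an (x, y, w, h) tuple.

    Works regardless of drag direction, since the top-left corner is
    taken as the minimum of each coordinate.
    """
    x1, y1 = p1
    x2, y2 = p2
    x, y = min(x1, x2), min(y1, y2)
    return (x, y, abs(x2 - x1), abs(y2 - y1))


print(points_to_roi((978, 268), (1112, 506)))
print(points_to_roi((1112, 506), (978, 268)))  # dragged from bottom-right to top-left
```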

### 5-2) Testing the PoseDetector class with video input
- Modify it to take a video as input

In [9]:
import cv2
from ultralytics import YOLO
from pydantic import BaseModel

class GetKeypoint(BaseModel):
    NOSE:           int = 0
    LEFT_EYE:       int = 1
    RIGHT_EYE:      int = 2
    LEFT_EAR:       int = 3
    RIGHT_EAR:      int = 4
    LEFT_SHOULDER:  int = 5
    RIGHT_SHOULDER: int = 6
    LEFT_ELBOW:     int = 7
    RIGHT_ELBOW:    int = 8
    LEFT_WRIST:     int = 9
    RIGHT_WRIST:    int = 10
    LEFT_HIP:       int = 11
    RIGHT_HIP:      int = 12
    LEFT_KNEE:      int = 13
    RIGHT_KNEE:     int = 14
    LEFT_ANKLE:     int = 15
    RIGHT_ANKLE:    int = 16


class PoseDetector:
    def __init__(self, model_name):
        # Load the YOLOv8 pose model
        self.model = YOLO(model_name)
        # Initialize the ROI (Region of Interest) to None
        self.roi = None
        # Initialize the dangerous state to False
        self.dangerous_state = False
        # Get keypoint with name
        self.get_keypoint = GetKeypoint()
        
    def detect_pose_video(self, video_path=0):
        # Open the given video path; the default (0) uses the webcam
        cap = cv2.VideoCapture(video_path)

        while True:
            ret, frame = cap.read()

            if not ret:
                break

            # Run the YOLOv8 pose model on the input image
            results = self.model(frame)

            for result in results:
                # Get the keypoint coordinates for the detected pose
                keypoints = result.keypoints.xyn.cpu().numpy()[0]
                
                # Extract the wrists keypoint coordinates
                left_wrist_x, left_wrist_y  = keypoints[self.get_keypoint.LEFT_WRIST]
                right_wrist_x, right_wrist_y = keypoints[self.get_keypoint.RIGHT_WRIST]
    
                # Convert the wrists keypoint coordinates to integer values
                left_wrist_x_int = int(left_wrist_x * frame.shape[1])
                left_wrist_y_int = int(left_wrist_y * frame.shape[0])
                right_wrist_x_int = int(right_wrist_x * frame.shape[1])
                right_wrist_y_int = int(right_wrist_y * frame.shape[0])

                # If the ROI is not set, select the ROI using cv2.selectROI
                if self.roi is None:
                    self.roi = cv2.selectROI('roi_img', frame, False)
    
                # Check if a valid ROI was selected
                if self.roi[2] > 0 and self.roi[3] > 0:
                    # Unpack the ROI coordinates
                    x_roi, y_roi, w_roi, h_roi = self.roi
                    # Check if the wrist keypoint is within the ROI
                    if (x_roi <= left_wrist_x_int <= x_roi + w_roi and y_roi <= left_wrist_y_int <= y_roi + h_roi) or \
                    (x_roi <= right_wrist_x_int <= x_roi + w_roi and y_roi <= right_wrist_y_int <= y_roi + h_roi):
                        self.dangerous_state = True
                        print("Wrist keypoint is inside the ROI. Dangerous state detected.")
                    else:
                        self.dangerous_state = False
                        print("Wrist keypoint is outside the ROI. Safe state.")

            # Show the frame
            cv2.imshow('frame', frame)
            # Press 'q' to quit
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

        # Release the capture and close the video window
        cap.release()
        cv2.destroyAllWindows()

In [10]:
# Usage
pose_detector = PoseDetector("yolov8n-pose.pt")
pose_detector.detect_pose_video("C021_A18_SY17_P07_S06_02DAS.mp4")


0: 384x640 1 person, 150.6ms
Speed: 17.8ms preprocess, 150.6ms inference, 21.4ms postprocess per image at shape (1, 3, 384, 640)
Wrist keypoint is outside the ROI. Safe state.

0: 384x640 1 person, 126.8ms
Speed: 72.3ms preprocess, 126.8ms inference, 1.2ms postprocess per image at shape (1, 3, 384, 640)
Wrist keypoint is outside the ROI. Safe state.

0: 384x640 1 person, 65.9ms
Speed: 1.2ms preprocess, 65.9ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)
Wrist keypoint is outside the ROI. Safe state.

0: 384x640 1 person, 71.1ms
Speed: 2.5ms preprocess, 71.1ms inference, 1.0ms postprocess per image at shape (1, 3, 384, 640)
Wrist keypoint is outside the ROI. Safe state.

0: 384x640 1 person, 86.8ms
Speed: 1.8ms preprocess, 86.8ms inference, 0.4ms postprocess per image at shape (1, 3, 384, 640)
Wrist keypoint is outside the ROI. Safe state.

0: 384x640 1 person, 50.3ms
Speed: 1.3ms preprocess, 50.3ms inference, 0.4ms postprocess per image at shape (1, 3, 384, 640)
Wr

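- Processing every frame is too slow, and because a message is printed for each frame the output grows without bound.
- Change how often frames are processed, and print a message only in the dangerous case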

In [3]:
import cv2

video_path = "C021_A18_SY17_P07_S06_02DAS.mp4"
cap = cv2.VideoCapture(video_path)
frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))  # total number of frames in the video
print(frame_count)

3598


- Divide the total frame count by 200 to decide how many frames to skip between checks

In [8]:
frame_skip = max(1, frame_count // 200) # Skip frames so that roughly 200 frames are processed
frame_skip

17

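- Change the code to check only every 17th frame
  - Keypoints are checked on frames 17, 34, 51, ...
  - Frames 1\~16 and 18\~33 are skipped
- Cut the parts with irrelevant motion out of the video C021_A18_SY17_P07_S06_02DAS.mp4 -> pose_video_sample.mp4
- The OpenCV window does not close properly and the kernel keeps dying -> use signal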
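
The frame-skip arithmetic can be checked in isolation. This is a minimal sketch using the frame count printed above; `processed_frames` is a hypothetical helper that mirrors the `frame_num % frame_skip == 0` test with a counter that starts at 1.

```python
def processed_frames(frame_count, target=200):
    """Return the skip interval and the 1-based frame numbers that get processed."""
    frame_skip = max(1, frame_count // target)
    frames = [n for n in range(1, frame_count + 1) if n % frame_skip == 0]
    return frame_skip, frames


skip, frames = processed_frames(3598)
print(skip, frames[:3], len(frames))
```

Because of the integer division, the number of processed frames ends up slightly above the target rather than exactly 200.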

In [2]:
import cv2
from ultralytics import YOLO
from pydantic import BaseModel
import signal
import sys

class GetKeypoint(BaseModel):
    NOSE:           int = 0
    LEFT_EYE:       int = 1
    RIGHT_EYE:      int = 2
    LEFT_EAR:       int = 3
    RIGHT_EAR:      int = 4
    LEFT_SHOULDER:  int = 5
    RIGHT_SHOULDER: int = 6
    LEFT_ELBOW:     int = 7
    RIGHT_ELBOW:    int = 8
    LEFT_WRIST:     int = 9
    RIGHT_WRIST:    int = 10
    LEFT_HIP:       int = 11
    RIGHT_HIP:      int = 12
    LEFT_KNEE:      int = 13
    RIGHT_KNEE:     int = 14
    LEFT_ANKLE:     int = 15
    RIGHT_ANKLE:    int = 16


class PoseDetector:
    def __init__(self, model_name):
        # Load the YOLOv8 pose model
        self.model = YOLO(model_name)
        # Initialize the ROI (Region of Interest) to None
        self.roi = None
        # Initialize the dangerous state to False
        self.dangerous_state = False
        # Get keypoint with name
        self.get_keypoint = GetKeypoint()
        
    def detect_pose_video(self, video_path=0):
        # Open the given video path; the default (0) uses the webcam
        cap = cv2.VideoCapture(video_path)
        frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))  # total number of frames in the video
        
        frame_skip = max(1, frame_count // 200) # Skip frames to process 200 frames maximum

        frame_num = 0 # Keep track of the current frame number

        # On an interrupt signal, release the capture and close all open OpenCV windows
        def signal_handler(signum, stack_frame):  # triggered by Ctrl+C (SIGINT)
            cap.release()
            cv2.destroyAllWindows()
            sys.exit(0)

        signal.signal(signal.SIGINT, signal_handler)
        
        while True:
            ret, frame = cap.read()

            if not ret:
                break
            
            frame_num += 1

            if frame_num % frame_skip == 0:
                # Run the YOLOv8 pose model on the input image
                results = self.model(frame)
    
                for result in results:
                    # Get the keypoint coordinates for the detected pose
                    keypoints = result.keypoints.xyn.cpu().numpy()[0]
                    
                    # Extract the wrists keypoint coordinates
                    left_wrist_x, left_wrist_y  = keypoints[self.get_keypoint.LEFT_WRIST]
                    right_wrist_x, right_wrist_y = keypoints[self.get_keypoint.RIGHT_WRIST]
        
                    # Convert the wrists keypoint coordinates to integer values
                    left_wrist_x_int = int(left_wrist_x * frame.shape[1])
                    left_wrist_y_int = int(left_wrist_y * frame.shape[0])
                    right_wrist_x_int = int(right_wrist_x * frame.shape[1])
                    right_wrist_y_int = int(right_wrist_y * frame.shape[0])
    
                    # If the ROI is not set, select the ROI using cv2.selectROI
                    if self.roi is None:
                        self.roi = cv2.selectROI('roi_img', frame, False)

        
                    # Check if a valid ROI was selected
                    if self.roi[2] > 0 and self.roi[3] > 0:
                        # Unpack the ROI coordinates
                        x_roi, y_roi, w_roi, h_roi = self.roi
                        # Check if the wrist keypoint is within the ROI
                        if (x_roi <= left_wrist_x_int <= x_roi + w_roi and y_roi <= left_wrist_y_int <= y_roi + h_roi) or \
                        (x_roi <= right_wrist_x_int <= x_roi + w_roi and y_roi <= right_wrist_y_int <= y_roi + h_roi):
                            self.dangerous_state = True
                            print("Wrist keypoint is inside the ROI. Dangerous state detected.")
                        else:
                            self.dangerous_state = False
                            # print("Wrist keypoint is outside the ROI. Safe state.")

            # Show the frame
            cv2.imshow('frame', frame)
            # Press 'q' to quit
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

        # Release the capture and close the video window
        cap.release()
        cv2.destroyAllWindows()
        cv2.waitKey(1)

In [4]:
# Usage
pose_detector = PoseDetector("yolov8n-pose.pt")
pose_detector.detect_pose_video("pose_video_sample.mp4")


0: 384x640 1 person, 100.1ms
Speed: 8.5ms preprocess, 100.1ms inference, 17.2ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 84.2ms
Speed: 6.8ms preprocess, 84.2ms inference, 1.0ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 55.9ms
Speed: 1.6ms preprocess, 55.9ms inference, 0.4ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 57.7ms
Speed: 1.7ms preprocess, 57.7ms inference, 0.8ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 53.2ms
Speed: 1.4ms preprocess, 53.2ms inference, 0.4ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 48.9ms
Speed: 1.4ms preprocess, 48.9ms inference, 0.4ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 57.6ms
Speed: 1.7ms preprocess, 57.6ms inference, 0.4ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 51.9ms
Speed: 1.5ms preprocess, 51.9ms inference, 0.3ms postprocess per image at shape (1, 3,

- The danger message is printed only when a wrist is inside the selected ROI
- It would also be nice to get rid of the lines the YOLO model prints on every run..

In [11]:
import cv2
from ultralytics import YOLO
from pydantic import BaseModel
import signal
import sys
import os

class GetKeypoint(BaseModel):
    NOSE:           int = 0
    LEFT_EYE:       int = 1
    RIGHT_EYE:      int = 2
    LEFT_EAR:       int = 3
    RIGHT_EAR:      int = 4
    LEFT_SHOULDER:  int = 5
    RIGHT_SHOULDER: int = 6
    LEFT_ELBOW:     int = 7
    RIGHT_ELBOW:    int = 8
    LEFT_WRIST:     int = 9
    RIGHT_WRIST:    int = 10
    LEFT_HIP:       int = 11
    RIGHT_HIP:      int = 12
    LEFT_KNEE:      int = 13
    RIGHT_KNEE:     int = 14
    LEFT_ANKLE:     int = 15
    RIGHT_ANKLE:    int = 16


class PoseDetector:
    def __init__(self, model_name):
        # # Suppress YOLOv8 output
        # self.null_device = open(os.devnull, 'w')
        # self.original_stdout = sys.stdout
        # sys.stdout = self.null_device
        
        # Load the YOLOv8 pose model
        self.model = YOLO(model_name)
        # Initialize the ROI (Region of Interest) to None
        self.roi = None
        # Initialize the dangerous state to False
        self.dangerous_state = False
        # Get keypoint with name
        self.get_keypoint = GetKeypoint()

    # def __del__(self):
    #     # Restore original stdout
    #     sys.stdout = self.original_stdout
    #     self.null_device.close()
        
    def detect_pose_video(self, video_path=0):
        # Open the given video path; the default (0) uses the webcam
        cap = cv2.VideoCapture(video_path)
        frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))  # total number of frames in the video
        
        frame_skip = max(1, frame_count // 200) # Skip frames to process 200 frames maximum

        frame_num = 0 # Keep track of the current frame number

        # On an interrupt signal, release the capture and close all open OpenCV windows
        def signal_handler(signum, stack_frame):
            cap.release()
            cv2.destroyAllWindows()
            sys.exit(0)

        signal.signal(signal.SIGINT, signal_handler)
        
        while True:
            ret, frame = cap.read()

            if not ret:
                break
            
            frame_num += 1

            if frame_num % frame_skip == 0:
                # Run the YOLOv8 pose model on the input image
                results = self.model(frame)
    
                for result in results:
                    # Get the keypoint coordinates for the detected pose
                    keypoints = result.keypoints.xyn.cpu().numpy()[0]
                    
                    # Extract the wrists keypoint coordinates
                    left_wrist_x, left_wrist_y  = keypoints[self.get_keypoint.LEFT_WRIST]
                    right_wrist_x, right_wrist_y = keypoints[self.get_keypoint.RIGHT_WRIST]
        
                    # Convert the wrists keypoint coordinates to integer values
                    left_wrist_x_int = int(left_wrist_x * frame.shape[1])
                    left_wrist_y_int = int(left_wrist_y * frame.shape[0])
                    right_wrist_x_int = int(right_wrist_x * frame.shape[1])
                    right_wrist_y_int = int(right_wrist_y * frame.shape[0])
    
                    # If the ROI is not set, select the ROI using cv2.selectROI
                    if self.roi is None:
                        print("Selecting ROI...")
                        self.roi = cv2.selectROI('roi_img', frame, False)
                        print(f"Selected ROI: {self.roi}")
        
                    # Check if a valid ROI was selected
                    if self.roi[2] > 0 and self.roi[3] > 0:
                        # Unpack the ROI coordinates
                        x_roi, y_roi, w_roi, h_roi = self.roi
                        # Check if the wrist keypoint is within the ROI
                        if (x_roi <= left_wrist_x_int <= x_roi + w_roi and y_roi <= left_wrist_y_int <= y_roi + h_roi) or \
                        (x_roi <= right_wrist_x_int <= x_roi + w_roi and y_roi <= right_wrist_y_int <= y_roi + h_roi):
                            self.dangerous_state = True
                            print("Wrist keypoint is inside the ROI. Dangerous state detected.")
                        else:
                            self.dangerous_state = False
                            # print("Wrist keypoint is outside the ROI. Safe state.")

            # Show the frame
            cv2.imshow('frame', frame)
            # Press 'q' to quit
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

        # Release the capture and close the video window
        cap.release()
        cv2.destroyAllWindows()
        cv2.waitKey(1)

In [12]:
# Usage
pose_detector = PoseDetector("yolov8n-pose.pt")
pose_detector.detect_pose_video("pose_video_sample.mp4")


0: 384x640 1 person, 86.3ms
Speed: 4.5ms preprocess, 86.3ms inference, 2.3ms postprocess per image at shape (1, 3, 384, 640)
Selecting ROI...
Selected ROI: (1011, 298, 69, 148)

0: 384x640 1 person, 113.7ms
Speed: 2.1ms preprocess, 113.7ms inference, 1.2ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 58.2ms
Speed: 1.6ms preprocess, 58.2ms inference, 0.4ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 73.9ms
Speed: 2.1ms preprocess, 73.9ms inference, 0.8ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 47.9ms
Speed: 2.1ms preprocess, 47.9ms inference, 0.3ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 51.3ms
Speed: 1.9ms preprocess, 51.3ms inference, 0.3ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 56.1ms
Speed: 1.3ms preprocess, 56.1ms inference, 0.4ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 50.4ms
Speed: 1.4ms preprocess, 50.4ms inf

In [8]:
# Usage - with model stdout suppressed
pose_detector = PoseDetector("yolov8n-pose.pt")
pose_detector.detect_pose_video("pose_video_sample.mp4")


0: 384x640 1 person, 96.4ms
Speed: 12.7ms preprocess, 96.4ms inference, 14.2ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 101.3ms
Speed: 2.5ms preprocess, 101.3ms inference, 1.1ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 51.8ms
Speed: 1.7ms preprocess, 51.8ms inference, 0.9ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 56.8ms
Speed: 1.4ms preprocess, 56.8ms inference, 0.3ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 61.9ms
Speed: 1.6ms preprocess, 61.9ms inference, 0.4ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 47.4ms
Speed: 1.4ms preprocess, 47.4ms inference, 0.3ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 46.5ms
Speed: 1.4ms preprocess, 46.5ms inference, 0.3ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 43.5ms
Speed: 1.8ms preprocess, 43.5ms inference, 0.6ms postprocess per image at shape (1, 3

- With stdout redirected, the model's default output is still printed, while the danger message is no longer printed
  - Try removing the model's default output later!
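
Ultralytics prediction calls accept a `verbose` argument, so passing `verbose=False` may be a simpler way to silence the per-frame lines than redirecting `sys.stdout`. The sketch below has not been run against this notebook: `run_quiet` is a hypothetical wrapper, and `FakeModel` is a stand-in so the example runs without downloading weights.

```python
def run_quiet(model, frame):
    """Run pose inference without the per-frame log line.

    `model` is expected to be an ultralytics YOLO instance; `verbose=False`
    is forwarded as a prediction argument.
    """
    return model(frame, verbose=False)


class FakeModel:
    """Stand-in for YOLO that only records the keyword arguments it receives."""

    def __init__(self):
        self.last_kwargs = None

    def __call__(self, frame, **kwargs):
        self.last_kwargs = kwargs
        return []


fake = FakeModel()
run_quiet(fake, frame=None)
print(fake.last_kwargs)
```

Inside `detect_pose_video`, the equivalent change would be `results = self.model(frame, verbose=False)`, which leaves the danger message on stdout untouched.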
    
---

The first round of model experiments ends here; the second round will proceed based on what was discussed in the lab meeting!