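### Object Detection

#### Overview
- A technique that uses deep learning algorithms such as CNNs (and variants like R-CNN) to recognize objects and mark them in an image
- Applications include license plate recognition, fire alarms, traffic accident detection, and abnormal behavior detection
- Very often deployed together with CCTV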

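#### Required libraries
- OpenCV - an open-source real-time computer vision library originally developed by Intel
    - Written for C/C++; cross-platform
    - Became widely used once Python bindings were available
    - Used throughout most of the camera-recognition industry
    - A basic task that takes roughly 200 to 300 lines of C/C++ can be done in about 10 lines of Python

- YOLO(PyTorch)
    - Not You Only Live Once, but You Only Look Once!
    - An easy-to-use real-time object detection system
    - First released in 2015; as of 2024 this notebook uses v8
    - Work that used to be done with OpenCV alone is increasingly moving to YOLO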

In [1]:
!pip install opencv-python

Collecting opencv-python


  Downloading opencv_python-4.10.0.84-cp37-abi3-win_amd64.whl.metadata (20 kB)
Downloading opencv_python-4.10.0.84-cp37-abi3-win_amd64.whl (38.8 MB)

In [1]:
## No difference between Windows and macOS
## On Raspberry Pi, usage changed in the latest version.
import cv2

In [2]:
## Load an image
## The picture is a fennec fox
img = cv2.imread('./fennec_fox.png')  # returns None if the file cannot be read

cv2.imshow('Fox', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

In [5]:
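## The webcam is not working at the moment, so a video file is used instead
video_path = './Mumbai_traffic.mp4'

cap = cv2.VideoCapture(video_path) # pass an integer (0, 1, ...) to select a camera instead
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)   # property id 3; usually has no effect on video files
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)  # property id 4

while cap.isOpened():
    ret, img = cap.read() # ret: True if a frame was read, img: the current frame
    if ret:
        cv2.imshow('youtube mpeg', img) ## shown in an OpenCV HighGUI window

        if cv2.waitKey(1) == ord('q'): # press q to quit
            break
    else:
        break

cap.release() # release the capture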
cv2.destroyAllWindows()
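
The read loop above can be wrapped in a small generator so that later cells only deal with frames. This is a sketch rather than part of the original notebook: `iter_frames` is a hypothetical helper name, and it only assumes the capture object exposes `isOpened()`, `read()` and `release()` the way `cv2.VideoCapture` does.

```python
def iter_frames(cap):
    """Yield frames from a capture-like object until reading fails.

    `cap` is assumed to behave like cv2.VideoCapture: isOpened() returns a
    bool, read() returns (ok, frame), and release() frees the source.
    """
    try:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:  # end of file or a dropped camera
                break
            yield frame
    finally:
        cap.release()  # free the source when the loop ends or the generator is closed
```

With OpenCV installed, this could be used as `for frame in iter_frames(cv2.VideoCapture(video_path)): ...`, keeping the display and key handling in the loop body.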

##### Image processing

In [8]:
img = cv2.imread('./fennec_fox.png')

cv2.imshow('Original', img) ## original image
# cv2.waitKey(0)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
height, width = img.shape[0], img.shape[1]
## resize needs integer sizes; width/2 alone would be a float
half_img = cv2.resize(gray, (int(width/2), int(height/2)))
# cv2.imshow('Gray', gray) ## grayscale conversion
cv2.imshow('half', half_img)
cv2.waitKey(0)

cv2.destroyAllWindows()
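
One detail that is easy to get wrong in the cell above is ordering: `img.shape` is `(height, width, ...)`, while `cv2.resize` takes its target size as `(width, height)`. The following sketch isolates that conversion; `half_size` is a hypothetical helper, not an OpenCV function.

```python
def half_size(shape):
    """Return (width, height) at half resolution for an image shape.

    `shape` is a NumPy-style image shape: (height, width) or
    (height, width, channels). The result is ordered (width, height),
    which is the order cv2.resize expects for its size argument.
    """
    height, width = shape[:2]
    # Integer division keeps the result an int; max() avoids a zero-sized image.
    return max(width // 2, 1), max(height // 2, 1)

print(half_size((720, 1280, 3)))  # (640, 360)
```

It would then be called as `cv2.resize(gray, half_size(img.shape))`.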

In [4]:
video_path = './Mumbai_traffic.mp4'

cap = cv2.VideoCapture(video_path) # pass an integer (0, 1, ...) to select a camera instead
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)   # property id 3
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)  # property id 4

while cap.isOpened():
    ret, img = cap.read() # ret: True if a frame was read, img: the current frame
    if ret:
        # cv2.imshow('youtube mpeg', img)
        height, width = img.shape[0], img.shape[1]
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  ## color -> grayscale
        half = cv2.resize(gray, (int(width/2), int(height/2))) ## shrink to half size
        cv2.imshow('youtube gray', half)

        if cv2.waitKey(1) == ord('q'): # press q to quit
            break
    else:
        break

cap.release() # release the capture
cv2.destroyAllWindows()

- OpenCV includes almost all of the features found in image editors such as Photoshop and video editors such as Premiere

In [6]:
video_path = './mbc_news.mp4'

faceCascade = cv2.CascadeClassifier('./haarcascade_frontalface_default.xml')
cap = cv2.VideoCapture(video_path) # pass an integer (0, 1, ...) to select a camera instead

while cap.isOpened():
    ret, img = cap.read() # ret: True if a frame was read, img: the current frame
    if ret:
        height, width = img.shape[0], img.shape[1]

        half = cv2.resize(img, (int(width/2), int(height/2)))

        # Face detection
        faces = faceCascade.detectMultiScale(
            half,
            scaleFactor=2.0,   # coarse but fast; the OpenCV default is 1.1
            minNeighbors=5,
            minSize=(10,10)
        )
        ## Mark each detected face
        for (x,y,w,h) in faces:
            cv2.rectangle(half,(x,y),(x+w,y+h),(0,255,255), 2)
            roi_color = half[y:y+h, x:x+w]  # face region (not used further here)

        cv2.imshow('youtube mpeg', half) ## shown in an OpenCV HighGUI window

        if cv2.waitKey(1) == ord('q'): # press q to quit
            break

    else:
        break

cap.release() # release the capture
cv2.destroyAllWindows()
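
The `scaleFactor` argument of `detectMultiScale` controls how much the search scale changes between passes, so it trades speed against the chance of missing a face whose size falls between two steps. The sketch below is a simplified model of that size schedule, not OpenCV's actual implementation; `window_sizes` is a hypothetical helper, and the 360-pixel upper bound is only an assumed frame height.

```python
def window_sizes(min_size, max_size, scale_factor):
    """List the window side lengths visited when growing by scale_factor.

    Simplified model: start at min_size and multiply by scale_factor until
    the window would exceed max_size.
    """
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    sizes = []
    size = float(min_size)
    while size <= max_size:
        sizes.append(int(size))
        size *= scale_factor
    return sizes

coarse = window_sizes(10, 360, 2.0)  # the value used in the face detection cell
fine = window_sizes(10, 360, 1.1)    # OpenCV's default scaleFactor
print(coarse)                        # [10, 20, 40, 80, 160, 320]
print(len(coarse), len(fine))
```

Under this model a factor of 2.0 visits only a handful of sizes, which is why it runs quickly but can skip faces of intermediate size; a smaller factor checks many more sizes at a higher cost per frame.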

##### YOLO
- You Only Look Once - a CNN-based object detection library
- https://www.ultralytics.com/ko
- https://github.com/ultralytics/ultralytics

In [7]:
# Install YOLO (the ultralytics package)
!pip install ultralytics

Collecting ultralytics


  Downloading ultralytics-8.2.76-py3-none-any.whl.metadata (41 kB)
Collecting pyyaml>=5.3.1 (from ultralytics)
  Downloading PyYAML-6.0.2-cp311-cp311-win_amd64.whl.metadata (2.1 kB)
Collecting py-cpuinfo (from ultralytics)
  Downloading py_cpuinfo-9.0.0-py3-none-any.whl.metadata (794 bytes)
Collecting ultralytics-thop>=2.0.0 (from ultralytics)
  Downloading ultralytics_thop-2.0.0-py3-none-any.whl.metadata (8.5 kB)
Downloading ultralytics-8.2.76-py3-none-any.whl (865 kB)

In [8]:
## Testing from the command line with a pretrained model
!yolo predict model=yolov8n.pt source='https://ultralytics.com/images/bus.jpg'

Downloading https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8n.pt to 'yolov8n.pt'...
Ultralytics YOLOv8.2.76 🚀 Python-3.11.5 torch-2.4.0+cu121 CUDA:0 (NVIDIA GeForce GTX 1650, 4096MiB)
YOLOv8n summary (fused): 168 layers, 3,151,904 parameters, 0 gradients, 8.7 GFLOPs

Downloading https://ultralytics.com/images/bus.jpg to 'bus.jpg'...
image 1/1 c:\Sources\Iot-bigdata-2024\day8\bus.jpg: 640x480 4 persons, 1 bus, 1 stop sign, 82.3ms
Speed: 9.6ms preprocess, 82.3ms inference, 28.6ms postprocess per image at shape (1, 3, 640, 480)
Results saved to C:\Sources\Iot-bigdata-2024\runs\detect\predict
💡 Learn more at https://docs.ultralytics.com/modes/predict





In [9]:
from ultralytics import YOLO

In [13]:
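# Object detection on an image, displayed with OpenCV
model = YOLO(model='./yolov8n.pt')

result = model('./20190417_194709.jpg')  # a list with one Results object per image
plots = result[0].plot()                 # BGR array with boxes and labels drawn
height, width = plots.shape[0], plots.shape[1]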
last = cv2.resize(plots, (800, 450))
cv2.imshow('yolo', last)
cv2.waitKey(0)
cv2.destroyAllWindows()
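
The cell above resizes the annotated image to a fixed 800x450, which would stretch any image that is not 16:9. A minimal sketch of computing a display size that keeps the aspect ratio instead; `fit_within` is a hypothetical helper and the sizes in the example are made up for illustration.

```python
def fit_within(width, height, max_width, max_height):
    """Scale (width, height) down to fit inside the box, keeping aspect ratio."""
    # Use the tighter of the two constraints; cap at 1.0 so nothing is upscaled.
    scale = min(max_width / width, max_height / height, 1.0)
    return max(int(round(width * scale)), 1), max(int(round(height * scale)), 1)

print(fit_within(4032, 3024, 800, 450))  # (600, 450)
print(fit_within(640, 360, 800, 450))    # (640, 360)
```

The result could be passed straight to `cv2.resize(plots, fit_within(width, height, 800, 450))`.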


image 1/1 c:\Sources\Iot-bigdata-2024\day8\20190417_194709.jpg: 384x640 1 bottle, 1 cup, 1 bowl, 1 tv, 1 mouse, 22.9ms
Speed: 7.5ms preprocess, 22.9ms inference, 3.0ms postprocess per image at shape (1, 3, 384, 640)


In [23]:
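## Works on live streams as well as video files
## Class names indexed by class id; model.names holds the model's own names,
## which differ for a few entries (for example "tv" instead of "tvmonitor")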
classNames = [
                "person", "bicycle", "car", "motorbike", "airplane", "bus", "train", "truck", "boat",
                "traffic light", "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat",
                "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella",
                "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite", "baseball bat",
                "baseball glove", "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup",
                "fork", "knife", "spoon", "bowl", "banana", "apple", "sandwich", "orange", "broccoli",
                "carrot", "hot dog", "pizza", "donut", "cake", "chair", "sofa", "pottedplant", "bed",
                "diningtable", "toilet", "tvmonitor", "laptop", "mouse", "remote", "keyboard", "cell phone",
                "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors",
                "teddy bear", "hair drier", "toothbrush"
              ]
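
The log lines Ultralytics prints (for example `1 person, 19 cars, 1 bus`) are per-class counts. The sketch below produces a similar summary from a list of class ids, assuming the ids index into a name list like `classNames`. `summarize` and the short `names` list are hypothetical and only for illustration, and the pluralization is deliberately naive.

```python
from collections import Counter

def summarize(class_ids, names):
    """Return a string like '2 persons, 1 car' from detected class ids."""
    counts = Counter(class_ids)
    parts = []
    for cls_id in sorted(counts):  # stable order by class id
        n = counts[cls_id]
        label = names[cls_id] + ("s" if n > 1 else "")
        parts.append(f"{n} {label}")
    return ", ".join(parts)

names = ["person", "bicycle", "car"]  # first three entries of the class list
print(summarize([2, 0, 2, 0, 2], names))  # 2 persons, 3 cars
```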

In [17]:
import math

In [32]:
video_path = './Mumbai_traffic.mp4'

cap = cv2.VideoCapture(video_path) # pass an integer instead for a live source such as CCTV or a webcam

while (cap.isOpened()):
    ret, img = cap.read()
    if ret == True:
        height, width = img.shape[0], img.shape[1]
        half = cv2.resize(img, (int(width/2), int(height/2)))
        # Run YOLO object detection on the frame
        results = model(half, stream=True)

        ## Draw the results, like the OpenCV face detection above
        for result in results:
            ## The next cell gets the same output more simply with result.plot()
            boxes = result.boxes

            for box in boxes:
                x1, y1, x2, y2 = box.xyxy[0]
                x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)

                cv2.rectangle(half, (x1,y1), (x2,y2), (0,255,255), 2) # draw the bounding box

                ## Confidence score (0 to 1) and class name
                confidence = float(box.conf[0])
                index = int(box.cls[0])
                # print(f'ClassName : {classNames[index]} / Confidence : {confidence:.2f}') # console output omitted
                title = f'{classNames[index]}, {confidence:.2f}'
                ## Draw the class name and confidence above the box
                org = (x1, y1)
                font = cv2.FONT_HERSHEY_SIMPLEX
                fontScale = 0.6
                color = (0,255,255)
                thickness = 2

                cv2.putText(half, title, org, font, fontScale, color, thickness)

        cv2.imshow('YOLOv8', half)

        if cv2.waitKey(1) == ord('q'):
            break
    else:
        break

cap.release()
cv2.destroyAllWindows()
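
Two small issues with drawing captions directly at `(x1, y1)` are that the text overlaps the box border and that it is cut off when a box touches the top of the frame. The sketch below separates caption building from drawing so both cases can be handled; `label_for`, `margin` and `text_height` are hypothetical names and values, not part of OpenCV or Ultralytics.

```python
def label_for(box_xyxy, cls_id, conf, names, margin=5, text_height=15):
    """Build the caption and its anchor point for one detection.

    box_xyxy is (x1, y1, x2, y2); conf is a confidence score in [0, 1].
    Returns (text, (x, y)) where (x, y) is the bottom-left corner of the text.
    """
    x1, y1, _, _ = (int(v) for v in box_xyxy)
    # Fall back to a generic label if the id is outside the name list.
    name = names[cls_id] if 0 <= cls_id < len(names) else f"class {cls_id}"
    text = f"{name} {conf:.2f}"
    # Put the text just above the box, or just inside it when the box
    # is too close to the top edge of the frame.
    y = y1 - margin if y1 - margin > text_height else y1 + text_height
    return text, (x1, y)

names = ["person", "bicycle", "car"]
print(label_for((10.7, 100.2, 50, 200), 2, 0.5, names))  # ('car 0.50', (10, 95))
```

The returned pair maps directly onto the `text` and `org` arguments of `cv2.putText`.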


0: 384x640 1 person, 19 cars, 1 bus, 171.9ms
Speed: 3.0ms preprocess, 171.9ms inference, 5.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 19 cars, 1 bus, 19.3ms
Speed: 2.0ms preprocess, 19.3ms inference, 4.0ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 19 cars, 1 bus, 17.2ms
Speed: 2.0ms preprocess, 17.2ms inference, 4.0ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 19 cars, 1 bus, 16.3ms
Speed: 2.0ms preprocess, 16.3ms inference, 3.1ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 18 cars, 1 bus, 15.2ms
Speed: 2.0ms preprocess, 15.2ms inference, 4.0ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 16 cars, 1 bus, 18.5ms
Speed: 2.6ms preprocess, 18.5ms inference, 2.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 18 cars, 1 bus, 15.4ms
Speed: 2.4ms preprocess, 15.4ms inference, 3.0ms postprocess per image at shape (1, 3, 384, 640)

0

In [37]:
video_path = './Mumbai_traffic.mp4'

cap = cv2.VideoCapture(video_path) # pass an integer instead for a live source such as CCTV or a webcam

while (cap.isOpened()):
    ret, img = cap.read()
    if ret == True:
        height, width = img.shape[0], img.shape[1]
        half = cv2.resize(img, (int(width/2), int(height/2)))
        results = model(half, stream=True)

        for result in results:
            last = result.plot()

        cv2.imshow('YOLOv8', last)

        if cv2.waitKey(1) == ord('q'):
            break
    else:
        break

cap.release()
cv2.destroyAllWindows()


0: 384x640 1 person, 19 cars, 1 bus, 175.9ms
Speed: 4.0ms preprocess, 175.9ms inference, 4.0ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 19 cars, 1 bus, 23.1ms
Speed: 2.0ms preprocess, 23.1ms inference, 6.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 19 cars, 1 bus, 14.7ms
Speed: 3.0ms preprocess, 14.7ms inference, 2.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 19 cars, 1 bus, 18.4ms
Speed: 1.5ms preprocess, 18.4ms inference, 3.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 18 cars, 1 bus, 15.5ms
Speed: 2.0ms preprocess, 15.5ms inference, 5.0ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 16 cars, 1 bus, 19.1ms
Speed: 1.0ms preprocess, 19.1ms inference, 4.0ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 18 cars, 1 bus, 15.9ms
Speed: 2.0ms preprocess, 15.9ms inference, 4.0ms postprocess per image at shape (1, 3, 384, 640)

0

In [38]:
video_path = './Mumbai_traffic.mp4'

cap = cv2.VideoCapture(video_path)
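# 1446269005 is the integer form of the 'MP4V' codec code; frameSize must match the frames written (640x360 here)
out = cv2.VideoWriter('./Mumbai_traffic_result.mp4', fourcc=cv2.VideoWriter_fourcc(*'MP4V'), fps=30, frameSize=(640,360))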

while (cap.isOpened()):
    ret, img = cap.read()
    if ret == True:
        height, width = img.shape[0], img.shape[1]
        half = cv2.resize(img, (int(width/2), int(height/2)))
        results = model(half, stream=True)

        for result in results:
            last = result.plot()

        cv2.imshow('YOLOv8', last)
        out.write(last)

        if cv2.waitKey(1) == ord('q'):
            break
    else:
        break

cap.release()
out.release()
cv2.destroyAllWindows()
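
The `fourcc` argument of `cv2.VideoWriter` is a four-character codec code packed into one integer, and the value 1446269005 corresponds to `'MP4V'`. A plain-Python sketch of that packing follows, with `fourcc` and `fourcc_to_str` as hypothetical helper names. In practice `cv2.VideoWriter_fourcc(*'MP4V')` does the same job.

```python
def fourcc(code):
    """Pack a four-character codec code into an integer, first character in the lowest byte."""
    if len(code) != 4:
        raise ValueError("a FourCC code has exactly four characters")
    value = 0
    for i, ch in enumerate(code):
        value |= (ord(ch) & 0xFF) << (8 * i)
    return value

def fourcc_to_str(value):
    """Inverse of fourcc(): recover the four characters from the integer."""
    return "".join(chr((value >> (8 * i)) & 0xFF) for i in range(4))

print(fourcc("MP4V"))             # 1446269005
print(fourcc_to_str(1446269005))  # MP4V
```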


0: 384x640 1 person, 19 cars, 1 bus, 51.9ms
Speed: 2.0ms preprocess, 51.9ms inference, 4.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 19 cars, 1 bus, 14.5ms
Speed: 3.1ms preprocess, 14.5ms inference, 2.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 19 cars, 1 bus, 16.3ms
Speed: 1.4ms preprocess, 16.3ms inference, 5.2ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 19 cars, 1 bus, 17.1ms
Speed: 2.5ms preprocess, 17.1ms inference, 2.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 18 cars, 1 bus, 18.7ms
Speed: 3.0ms preprocess, 18.7ms inference, 3.0ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 16 cars, 1 bus, 16.5ms
Speed: 3.0ms preprocess, 16.5ms inference, 4.0ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 18 cars, 1 bus, 15.5ms
Speed: 3.5ms preprocess, 15.5ms inference, 3.0ms postprocess per image at shape (1, 3, 384, 640)

0: 