# SlowFast

**SlowFast networks pretrained on the Kinetics 400 dataset**

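This notebook preprocesses data for crime recognition from CCTV footage.

It first installs the required packages and libraries.
The PyAV library is installed and used for video frame extraction.

**Kaggle data download**

Download the 'crimeucfdataset' dataset from Kaggle.
The dataset is downloaded as a ZIP archive, so it is extracted afterwards.

**Frame extraction from videos**

`video_to_frame`: extracts frames from a video with OpenCV. (Not used in this notebook.)
`extract_frames`: extracts frames from a video with PyAV.

**Anomaly data frame extraction**

Iterate over every video in the two parts of the Anomaly dataset and extract the frames of each video.
Each frame is saved as a JPEG file.

**Normal data frame extraction**

Iterate over the videos in the Normal dataset and extract their frames.

**Data preprocessing**

`preprocess_data`:
Iterates over the image data under the given path.
For each video, frames are selected at a fixed interval and combined into a single image. Each combined image holds 16 frames.
Each video produces 10 of these combined images. The starting point of the frame selection is shifted from 0 to 9, which yields 10 times as many samples.

In summary, this code extracts frames from the given CCTV video dataset and augments the image data by selecting specific frames.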
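
The frame selection described in the overview can be sketched in plain Python. This is a minimal illustration, not the notebook's implementation: the helper name `select_frame_indices` is hypothetical, and it assumes that sample `m` starts at frame `m` and then steps by `n_frames // seq_length`.

```python
def select_frame_indices(n_frames, seq_length=16, n_samples=10):
    """Return one list of frame indices per sample.

    Assumption: sample m starts at frame m and then steps by
    n_frames // seq_length, following the overview's description.
    """
    # Use a stride of at least 1 so the index always advances
    stride = max(1, n_frames // seq_length)
    samples = []
    for start in range(n_samples):
        indices = [start + step * stride for step in range(seq_length)]
        # Keep only samples whose last index is a real frame
        if indices[-1] < n_frames:
            samples.append(indices)
    return samples


samples = select_frame_indices(n_frames=320)
print(len(samples))      # 10
print(samples[0][:4])    # [0, 20, 40, 60]
print(samples[9][-1])    # 309
```

With 320 frames the stride is 20, so all 10 start offsets fit. A video with exactly 16 frames yields a single sample, because any later start offset would run past the last frame.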

### Environment setup [Mount Google Drive]

In [None]:
from google.colab import drive

In [None]:
drive.mount('/content/drive')

ValueError: ignored

In [None]:
%cd "/content/drive/MyDrive/ds_study/Final_team_3/Code/Taebin2"

/content/drive/MyDrive/ds_study/Final_team_3/Code/Taebin2



### Install libraries

In [None]:
!pip install fvcore
# albumentations.pytorch is a submodule of albumentations, so it needs no separate install
!pip install albumentations
!pip install transformers
!pip install tokenizers
!pip install pytorchvideo


### Imports

Load the model:

In [None]:
import torch
import json
import random
import numpy as np
import glob2
import cv2
import os
import math
import pandas as pd
from PIL import Image
from argparse import Namespace
from tqdm.auto import tqdm

import torch.nn as nn
from torch.utils.data import Dataset, DataLoader, IterableDataset
import albumentations
import albumentations.pytorch
from transformers.optimization import AdamW, get_cosine_schedule_with_warmup
from transformers import set_seed
import pytorchvideo.models.hub as pyvideo

# Choose the `slowfast_r50` model
model = torch.hub.load('facebookresearch/pytorchvideo', 'slowfast_r50', pretrained=True)

Using cache found in /root/.cache/torch/hub/facebookresearch_pytorchvideo_main


### Download the 'crimeucfdataset' dataset from Kaggle

In [None]:
#Upload kaggle.json file
!pip install -i https://test.pypi.org/simple/ supportlib
import supportlib.gettingdata as getdata
getdata.kaggle()
!kaggle datasets download -d mission-ai/crimeucfdataset


Looking in indexes: https://test.pypi.org/simple/


Saving kaggle.json to kaggle.json
Downloading crimeucfdataset.zip to /content/drive/MyDrive/ds_study/Final_team_3/Code/Taebin2
100% 32.9G/32.9G [09:06<00:00, 74.0MB/s]
100% 32.9G/32.9G [09:06<00:00, 64.6MB/s]


In [None]:
getdata.zipextract('/content/drive/MyDrive/ds_study/Final_team_3/Code/Taebin2/crimeucfdataset.zip')

### Imports

In [None]:
!sudo apt-get install -y python3-dev pkg-config
!sudo apt-get install -y \
    libavformat-dev libavcodec-dev libavdevice-dev \
    libavutil-dev libswscale-dev libswresample-dev libavfilter-dev
!pip install av

import av
import glob
import os
import time
import datetime
import argparse
import cv2
from tqdm.autonotebook import tqdm
from concurrent.futures import ThreadPoolExecutor


### Extract frames from videos

In [None]:
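def video_to_frame(path, out_path):
    """Save every frame of the video at path into out_path as JPEG files, using OpenCV."""
    vidcap = cv2.VideoCapture(path)
    success, image = vidcap.read()
    count = 0
    while success:
        cv2.imwrite(os.path.join(out_path, "{}.jpg".format(count)), image)
        success, image = vidcap.read()
        count += 1
    # Release the capture handle once all frames have been read
    vidcap.release()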

In [None]:
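def extract_frames(video_path):
    """Yield each frame of the first video stream as a PIL image, using PyAV."""
    # The context manager closes the container once decoding finishes
    with av.open(video_path) as video:
        # Select the first video stream explicitly; stream 0 may be an audio stream
        for frame in video.decode(video=0):
            yield frame.to_image()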

In [None]:
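def process_videos(video_folder, result_folder):
    """Write the frames of video_folder/<class>/<video> to result_folder/<class>/<video>/ as JPEG files."""
    for i in tqdm(os.listdir(video_folder)):
        p1 = os.path.join(video_folder, i)
        r1 = os.path.join(result_folder, i)
        # Skip classes whose output folder already exists (assumed to be processed)
        if os.path.exists(r1):
            continue
        os.makedirs(r1, exist_ok=True)
        for j in os.listdir(p1):
            vid_path = os.path.join(p1, j)
            # Name the frame folder after the video file, without its extension
            r2 = os.path.join(r1, os.path.splitext(j)[0])
            os.makedirs(r2, exist_ok=True)
            for k, frame in enumerate(extract_frames(vid_path)):
                frame.save(os.path.join(r2, f"{k}.jpg"))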

In [None]:
# Anomaly videos part1
process_videos('/content/drive/MyDrive/ds_study/Final_team_3/Code/Taebin2/Anomaly_Dataset/Anomaly_Videos/Anomaly-Videos-Part-1',
               '/content/drive/MyDrive/ds_study/Final_team_3/Code/Taebin2/Dataset')

  0%|          | 0/4 [00:00<?, ?it/s]

In [None]:
# Anomaly videos part2
process_videos('/content/drive/MyDrive/ds_study/Final_team_3/Code/Taebin2/Anomaly_Dataset/Anomaly_Videos/Anomaly-Videos-Part-2',
               '/content/drive/MyDrive/ds_study/Final_team_3/Code/Taebin2/Dataset')

  0%|          | 0/3 [00:00<?, ?it/s]

In [None]:
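# Normal class
# NOTE: the input path below is the same Anomaly-Videos-Part-1 folder used in the
# first call; point it at the folder that holds the Normal videos before running.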
process_videos('/content/drive/MyDrive/ds_study/Final_team_3/Code/Taebin2/Anomaly_Dataset/Anomaly_Videos/Anomaly-Videos-Part-1',
               '/content/drive/MyDrive/ds_study/Final_team_3/Code/Taebin2/Dataset/normal')

  0%|          | 0/4 [00:00<?, ?it/s]

In [None]:
path = '/content/drive/MyDrive/ds_study/Final_team_3/Code/Taebin2/Dataset'
res = '/content/drive/MyDrive/ds_study/Final_team_3/Code/Taebin2/crime16'
# Number of frames combined into each image
seq_length = 16

In [None]:
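def preprocess_data(seq_length, path, res):
    """Combine seq_length evenly spaced frames of each video into one wide image, up to 10 per video."""
    for i in tqdm(os.listdir(path)):
        p1 = os.path.join(path, i)
        r1 = os.path.join(res, i)
        os.makedirs(r1, exist_ok=True)
        for j in os.listdir(p1):
            p2 = os.path.join(p1, j)
            r2 = os.path.join(r1, j)
            # Create the output folder; cv2.imwrite does not create missing directories
            os.makedirs(r2, exist_ok=True)

            n_frames = len(os.listdir(p2))
            # Stride between selected frames (at least 1 so the index always advances)
            skip_length = max(1, n_frames // seq_length)

            for m in range(10):
                # Start offset 0-9, as described in the overview at the top
                k = m
                l = 0
                img1 = None
                # Stop at the last frame so a missing file cannot loop forever
                while l < seq_length and k < n_frames:
                    p3 = os.path.join(p2, str(k) + ".jpg")
                    img = cv2.imread(p3)

                    # Check that the image loaded
                    if img is None:
                        print(f"Failed to load {p3}")
                    else:
                        img = cv2.resize(img, (128, 128))
                        if img1 is None:
                            img1 = img
                        else:
                            # Append the frame to the right of the combined image
                            img1 = np.hstack((img1, img))
                        l += 1

                    k += skip_length

                # Save only complete sequences so every output has the same width
                if l == seq_length:
                    cv2.imwrite(os.path.join(r2, f"{m}.jpg"), img1)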
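
The layout of a combined image can be checked with a small NumPy sketch. It uses synthetic stand-in frames rather than real video frames, and the step that splits the image back into a clip is an assumption about how the output might be consumed later, not something this notebook does.

```python
import numpy as np

seq_length = 16
# Stand-in frames: 16 images of 128x128x3, each filled with its own index
frames = [np.full((128, 128, 3), i, dtype=np.uint8) for i in range(seq_length)]

# Same result as the repeated np.hstack calls in preprocess_data
combined = np.hstack(frames)
print(combined.shape)   # (128, 2048, 3)

# Recover the individual frames by cutting the width into seq_length equal parts
clip = np.stack(np.split(combined, seq_length, axis=1))
print(clip.shape)       # (16, 128, 128, 3)
```

Because the combined images are stored as JPEG, frames recovered from disk will be close to, but not bit-identical with, the resized originals.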


Import remaining functions:

In [None]:
from typing import Dict
import json
import urllib.request
from torchvision.transforms import Compose, Lambda
from torchvision.transforms._transforms_video import (
    CenterCropVideo,
    NormalizeVideo,
)
from pytorchvideo.data.encoded_video import EncodedVideo
from pytorchvideo.transforms import (
    ApplyTransformToKey,
    ShortSideScale,
    UniformTemporalSubsample,
    UniformCropVideo
)



#### Setup

Set the model to eval mode and move to desired device.

In [None]:
# Set to GPU or CPU
device = "cpu"
model = model.eval()
model = model.to(device)

Download the id to label mapping for the Kinetics 400 dataset on which the torch hub models were trained. This will be used to get the category label names from the predicted class ids.

In [None]:
json_url = "https://dl.fbaipublicfiles.com/pyslowfast/dataset/class_names/kinetics_classnames.json"
json_filename = "kinetics_classnames.json"
urllib.request.urlretrieve(json_url, json_filename)

In [None]:
with open(json_filename, "r") as f:
    kinetics_classnames = json.load(f)

# Create an id to label name mapping
kinetics_id_to_classname = {}
for k, v in kinetics_classnames.items():
    kinetics_id_to_classname[v] = str(k).replace('"', "")
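
The inversion performed by the loop can be illustrated on a tiny dictionary. The entries below are a hypothetical excerpt with illustrative ids, assuming the label names carry literal double quotes, which is what the `replace` call suggests.

```python
# Hypothetical excerpt in the same shape as the class-name file:
# label names (wrapped in literal quotes) mapped to class ids
sample_classnames = {'"archery"': 5, '"abseiling"': 0}

# Invert the mapping and strip the literal quotes from each label
id_to_classname = {
    class_id: str(name).replace('"', "")
    for name, class_id in sample_classnames.items()
}
print(id_to_classname[5])   # archery
```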

#### Define input transform

In [None]:
side_size = 256
mean = [0.45, 0.45, 0.45]
std = [0.225, 0.225, 0.225]
crop_size = 256
num_frames = 32
sampling_rate = 2
frames_per_second = 30
slowfast_alpha = 4
num_clips = 10
num_crops = 3

class PackPathway(torch.nn.Module):
    """
    Transform for converting video frames as a list of tensors.
    """
    def __init__(self):
        super().__init__()

    def forward(self, frames: torch.Tensor):
        fast_pathway = frames
        # Perform temporal sampling from the fast pathway.
        slow_pathway = torch.index_select(
            frames,
            1,
            torch.linspace(
                0, frames.shape[1] - 1, frames.shape[1] // slowfast_alpha
            ).long(),
        )
        frame_list = [slow_pathway, fast_pathway]
        return frame_list

transform = ApplyTransformToKey(
    key="video",
    transform=Compose(
        [
            UniformTemporalSubsample(num_frames),
            Lambda(lambda x: x/255.0),
            NormalizeVideo(mean, std),
            ShortSideScale(
                size=side_size
            ),
            CenterCropVideo(crop_size),
            PackPathway()
        ]
    ),
)

# The duration of the input clip is also specific to the model.
clip_duration = (num_frames * sampling_rate)/frames_per_second
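
To make the sampling in `PackPathway` and the clip length concrete, the arithmetic can be reproduced on its own. This sketch uses NumPy as a stand-in for `torch.linspace(...).long()`, on the assumption that both truncate the evenly spaced positions toward zero.

```python
import numpy as np

num_frames = 32
sampling_rate = 2
frames_per_second = 30
slowfast_alpha = 4

# The fast pathway keeps all 32 frames; the slow pathway keeps 32 // 4 = 8 of them
slow_indices = np.linspace(
    0, num_frames - 1, num_frames // slowfast_alpha
).astype(np.int64)
print(slow_indices.tolist())    # [0, 4, 8, 13, 17, 22, 26, 31]

# 32 frames sampled every 2nd frame at 30 fps span about 2.13 seconds
clip_duration = (num_frames * sampling_rate) / frames_per_second
print(round(clip_duration, 3))  # 2.133
```

The slow-pathway indices are not a strict every-fourth-frame pattern: `linspace` spreads 8 positions across the full range so that both the first and the last frame are included.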

#### Run Inference

Download an example video.

In [None]:
url_link = "https://dl.fbaipublicfiles.com/pytorchvideo/projects/archery.mp4"
video_path = 'archery.mp4'
urllib.request.urlretrieve(url_link, video_path)

Load the video and transform it to the input format required by the model.

In [None]:
# Select the duration of the clip to load by specifying the start and end duration
# The start_sec should correspond to where the action occurs in the video
start_sec = 0
end_sec = start_sec + clip_duration

# Initialize an EncodedVideo helper class and load the video
video = EncodedVideo.from_path(video_path)

# Load the desired clip
video_data = video.get_clip(start_sec=start_sec, end_sec=end_sec)

# Apply a transform to normalize the video input
video_data = transform(video_data)

# Move the inputs to the desired device
inputs = video_data["video"]
inputs = [i.to(device)[None, ...] for i in inputs]

#### Get Predictions

In [None]:
# Pass the input clip through the model (no gradients are needed for inference)
with torch.no_grad():
    preds = model(inputs)

# Get the predicted classes
post_act = torch.nn.Softmax(dim=1)
preds = post_act(preds)
pred_classes = preds.topk(k=5).indices[0]

# Map the predicted classes to the label names
pred_class_names = [kinetics_id_to_classname[int(i)] for i in pred_classes]
print("Top 5 predicted labels: %s" % ", ".join(pred_class_names))

Top 5 predicted labels: archery, throwing axe, playing paintball, disc golfing, riding or walking with horse
