# Step 8: Machine Learning/Deep Learning Prototype

## Learning Objective

- Find a machine learning or deep learning approach that works for the problem to be solved.
The approach taken was:
1. Collect the dataset videos and their annotations.
2. Overlay the annotation bounding boxes on the dataset videos in order to learn more about the dataset.
3. Use the YOLO algorithm with Darknet's ANN weights to detect objects in an image.
4. Use the Deepsort algorithm to track the detected objects across frames, giving each one a unique ID.
5. Compare YOLO's detection bounding boxes with the ground-truth annotations using IoU and a mean Average Precision indication.
6. Add a velocity computation for the objects in the image, based on the motion of their bounding boxes between video frames.

- Implement a prototype of the approach in a Jupyter notebook. 
This notebook

- Demonstrate the viability of the approach.
The output of this notebook shows that the approach is viable: the Yolo/Deepsort algorithms detect and track objects in the sample video, and the comparison against the ground-truth vehicle boxes at the end of the notebook scores about 0.75.

### Prerequisites
- Python 3.9 
- TensorFlow, installed via Anaconda (https://www.anaconda.com/products/individual)
- Download the yolov4.weights file from https://drive.google.com/open?id=1cewMfusmPjYWbrnuJRuKhPMwRe_b9PaT
and place it in a "../yolo/weights/" folder
- OpenCV
- This notebook calls files/folders that are one folder above it. For it to work, the entire git repo (https://github.com/zyerusha/video_velocity_finder) must be checked out, because it uses submodules containing forked algorithms.

If needed, set "get_requirements = True" to install the packages listed in requirements.txt:

In [16]:
get_requirements = False
if get_requirements:
    !pip install -r ../requirements.txt

Changing directory to top directory of this project:

In [17]:
import os
from sys import path

# move up to the parent of the first sys.path entry (the project's top directory)
os.chdir(os.path.dirname(path[0]))

Copy the folders and files needed to run this project into a tmp folder, then change the working directory to tmp:

In [18]:
# using custom class for this project:
from utils.folder_utils import FolderUtils
from shutil import copy2

# copy folders that will be used by the algorithm
FolderUtils.CopyFolders('tensorflow_yolov4_tflite/data/', 'tmp/data') 
FolderUtils.CopyFolders('tensorflow_yolov4_tflite/core/', 'tmp/core') 
FolderUtils.CopyFolders('yolov4-deepsort/model_data/','tmp/model_data') 
FolderUtils.CopyFolders('utils','tmp/utils') 
FolderUtils.CopyFolders('deep_sort/deep_sort','tmp/deep_sort/deep_sort') 
FolderUtils.CopyFolders('deep_sort/tools','tmp/tools') 

# remember to download the weights from:
# https://drive.google.com/open?id=1cewMfusmPjYWbrnuJRuKhPMwRe_b9PaT
# and place them here: 'yolo/weights/yolov4.weights'
FolderUtils.CopyFolders('yolo/weights','tmp/weights') 

copy2('yolo/darknet/data/coco.names', 'tmp/coco.names')
copy2('tensorflow_yolov4_tflite/save_model.py', 'tmp/save_model.py')
os.chdir('tmp')

Copied folder tensorflow_yolov4_tflite/data/ --> tmp/data
Copied folder tensorflow_yolov4_tflite/core/ --> tmp/core
Copied folder yolov4-deepsort/model_data/ --> tmp/model_data
Copied folder utils --> tmp/utils
Copied folder deep_sort/deep_sort --> tmp/deep_sort/deep_sort
Copied folder deep_sort/tools --> tmp/tools
Copied folder yolo/weights --> tmp/weights
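
FolderUtils.CopyFolders is a custom class from this repo and its implementation is not shown in this notebook. As a rough stand-in, the same kind of copy can be sketched with the standard library. The function name `copy_folder` is hypothetical, and the overwrite-on-rerun behavior is an assumption rather than a description of what FolderUtils does.

```python
import shutil
from pathlib import Path


def copy_folder(src: str, dst: str) -> Path:
    """Copy the directory tree at src into dst (hypothetical stand-in for FolderUtils.CopyFolders)."""
    src_path, dst_path = Path(src), Path(dst)
    if not src_path.is_dir():
        raise FileNotFoundError(f"Source folder not found: {src_path}")
    # dirs_exist_ok=True (Python 3.8+) lets a rerun copy over an existing tmp folder
    shutil.copytree(src_path, dst_path, dirs_exist_ok=True)
    print(f"Copied folder {src} --> {dst}")
    return dst_path
```

Failing early on a missing source folder makes a forgotten submodule checkout or a missing weights download visible at the copy step instead of later during model loading.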


Importing necessary libraries:

In [19]:

import os

import cv2
import pandas as pd
from yaml import SafeLoader, load


# custom classes developed for this project:
from utils.deepsort_yolo import DeepsortYolo
from utils.video_utils import VideoUtils
from utils.bbox_utils import Bbox

Variables Setup: 

In [20]:
start_time = 0  # start time [sec] of the video segment to use
video_duration = None  # None processes the entire length of the video
# Comment out the next line, or set it to the desired duration, to limit processing
video_duration = 1  # [sec]
video_name = 'VIRAT_S_050000_07_001014_001126' # set which dataset to use

In [21]:
# setup
dataset_dir_path = '../../datasets/VIRAT/'  # top directory where the dataset is located 
video_src_path = dataset_dir_path + 'Videos/Ground/' # location where videos are stored

video_ext = '.mp4'  
video_name_orig = video_name + video_ext
video_dest_path = './../processed/' +  video_name + '/'  # location where to place processed videos/data


using_yml = True

# annotations
saved_csv = video_dest_path + 'df_bbox.csv'

yml_video_name = 'gt'
yolo_video_name = 'yolo'

annotations_path = dataset_dir_path + 'viratannotations/train/' + video_name +'/'
# annotations_path = dataset_dir_path + 'viratannotations/validate/' + video_name +'/'
ann_activities_file = annotations_path + video_name + '.activities.yml'
ann_geom_file = annotations_path + video_name + '.geom.yml'
ann_regions_file = annotations_path + video_name + '.regions.yml'
ann_types_file = annotations_path + video_name + '.types.yml'


src_video = os.path.join(video_src_path, video_name_orig)
gt_video_name = yml_video_name + '_' 
yolo_video_name = yolo_video_name + '_'
video_max_frames = 2000

The following code loads a dataframe with annotation data from either the yaml annotation files or a csv file. The csv is created after the first read of the yaml data, which makes loading faster when this notebook is rerun.

In [22]:
# using annotations:
print(f"Loading annotations for {video_name_orig}...")


def add_category_type(row):
    # look up this row's object in type_df (built below from the types file)
    object_id = row['object_id']
    return type_df.loc[type_df['id'] == object_id, 'category'].iloc[0]


if os.path.exists(saved_csv):
    # fast path: reuse the csv written by a previous run
    df_bbox = pd.read_csv(saved_csv)
else:
    # Read categories from annotation file
    with open(ann_types_file) as yaml_file:
        yaml_contents = load(yaml_file, Loader=SafeLoader)
    yaml_df = pd.json_normalize(yaml_contents)
    for col in yaml_df.columns:
        type_name = col.split('.')[-1]
        if type_name != 'id1':
            # replace the 1 flag with the category name taken from the column label
            yaml_df.loc[yaml_df[col] == 1, col] = type_name

    yaml_df = yaml_df[yaml_df['types.id1'].notna()].reset_index().dropna(axis=1, how='all')
    type_df = yaml_df.ffill(axis=1).iloc[:, -1].to_frame(name='category')
    type_df.insert(0, "id", yaml_df['types.id1'])

    # Read bounding boxes from annotation file
    with open(ann_geom_file) as yaml_file:
        yaml_contents = load(yaml_file, Loader=SafeLoader)
    yaml_df = pd.json_normalize(yaml_contents)

    df_bbox = yaml_df[['geom.id1', 'geom.ts0', 'geom.ts1', 'geom.g0']].dropna().reset_index(drop=True)
    df_bbox = df_bbox.rename(columns={'geom.id1': 'object_id', 'geom.ts0': 'frame_id',
                                      'geom.ts1': 'time_sec', 'geom.g0': 'bbox'})
    # 'bbox' holds four whitespace-separated values: left, top, right, bottom
    df_bbox['bbox'] = df_bbox['bbox'].str.split()
    df_tmp = pd.DataFrame(df_bbox['bbox'].to_list(), columns=['bb_left', 'bb_top', 'bb_right', 'bb_bottom'])
    df_bbox = pd.concat([df_bbox, df_tmp], axis=1).drop(columns=['bbox'])

    df_bbox['category'] = df_bbox.apply(add_category_type, axis=1)
    df_bbox.to_csv(saved_csv, index=False)


df_bbox.head(10)

Loading annotations for VIRAT_S_050000_07_001014_001126.mp4...


Unnamed: 0,object_id,frame_id,time_sec,bb_left,bb_top,bb_right,bb_bottom,category
0,0.0,0.0,0.0,485,743,653,914,Vehicle
1,0.0,1.0,0.033333,489,748,657,919,Vehicle
2,0.0,2.0,0.066667,488,747,656,918,Vehicle
3,0.0,3.0,0.1,488,747,656,918,Vehicle
4,0.0,4.0,0.133333,488,747,656,918,Vehicle
5,0.0,5.0,0.166667,488,746,656,917,Vehicle
6,0.0,6.0,0.2,488,746,656,917,Vehicle
7,0.0,7.0,0.233333,488,746,656,917,Vehicle
8,0.0,8.0,0.266667,488,745,656,916,Vehicle
9,0.0,9.0,0.3,488,745,656,916,Vehicle
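
The load-from-csv-or-parse pattern used above can be isolated into a small helper. This is only a sketch: `load_or_build` and its `build` callback are hypothetical names, not part of this repo.

```python
import os
from typing import Callable

import pandas as pd


def load_or_build(csv_path: str, build: Callable[[], pd.DataFrame]) -> pd.DataFrame:
    """Return the cached csv if it exists; otherwise build the dataframe, cache it, and return it."""
    if os.path.exists(csv_path):
        return pd.read_csv(csv_path)
    df = build()  # the slow path, e.g. parsing the yaml annotation files
    os.makedirs(os.path.dirname(csv_path) or ".", exist_ok=True)
    df.to_csv(csv_path, index=False)
    return df
```

One caveat of this pattern: column types can differ between the first run and a rerun. Here the bounding box columns come from a string split and stay strings on the first run, while pd.read_csv infers numeric types on later runs, so any arithmetic on them should convert explicitly.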


Instantiating the classes that will be used to process the video:

In [23]:
vUtils = VideoUtils() 
deepsortYolo  = DeepsortYolo()

In [24]:
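video_in = cv2.VideoCapture(src_video)
if not video_in.isOpened():
    # fail here instead of hitting undefined fps/frame counts in later cells
    raise IOError(f"Could not open video: {src_video}")
fps, total_frames, frame_size = VideoUtils.GetVideoData(video_in)
start_count, end_count = VideoUtils.GetStartEndCount(fps, total_frames, start_time, video_duration)
video_duration = int(end_count / fps)  # [sec], rounded down
video_in.release()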


Total frames in video: 3351 @ 30 frames/sec
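
VideoUtils.GetStartEndCount is a custom helper whose code is not shown here. The sketch below illustrates one plausible way to turn a start time and duration into a frame range; the name `frame_range` and the clamping rules are assumptions, not the helper's actual behavior.

```python
from typing import Optional, Tuple


def frame_range(fps: float, total_frames: int, start_sec: float = 0.0,
                duration_sec: Optional[float] = None) -> Tuple[int, int]:
    """Return (start_frame, end_frame) indices, clamped to the frames available in the video."""
    if total_frames <= 0 or fps <= 0:
        raise ValueError("fps and total_frames must be positive")
    last = total_frames - 1
    start = min(max(int(start_sec * fps), 0), last)
    if duration_sec is None:
        # no duration given: process to the end of the video
        return start, last
    end = min(start + int(duration_sec * fps), last)
    return start, end
```

With the values printed above (3351 frames at 30 frames/sec), a start of 0 sec and a duration of 1 sec would give frames 0 through 30 under these assumptions.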


Using https://github.com/zyerusha/tensorflow-yolov4-tflite to convert YOLO weights to tensorflow:


In [25]:
# If not done already, convert the weights to a TensorFlow model
if not os.path.exists("checkpoints/yolov4"):
    !python save_model.py --model yolov4 --weights ./weights/yolov4.weights --output ./checkpoints/yolov4

The video is processed with YOLO first and then with Deepsort. The resulting bounding boxes and per-object data are stored in a csv file, so there is no need to run the video processing again if this csv exists.

In [26]:
yolo_filename = os.path.join(video_dest_path, VideoUtils.AddTimestampToName(yolo_video_name, start_time, video_duration))
gt_filename = os.path.join(video_dest_path, VideoUtils.AddTimestampToName(gt_video_name, start_time, video_duration))

tracker_file_csv = yolo_filename + '.csv'
tracker_file_mp4 = yolo_filename + '.mp4'
gt_file_mp4 = gt_filename + '.mp4'
if not os.path.exists(tracker_file_csv):
    model_filename = 'model_data/mars-small128.pb'
    yolo_weights_filename = './checkpoints/yolov4'
    tracker_video_out, trk_bbox = deepsortYolo.ProcessVideo(yolo_weights_filename, model_filename, src_video, video_dest_path, tracker_file_mp4, start_time_sec=start_time, duration_sec=video_duration, save_images=False)
    trk_bbox.to_csv(tracker_file_csv, index=False)

Reload the dataframe that was previously stored in the csv file.

In [27]:
trk_bbox = pd.read_csv(tracker_file_csv)
trk_bbox.head(10)

Unnamed: 0,Frame,bb_left,bb_top,bb_right,bb_bottom,category,object_id
0,2,921.0,276.0,1028.0,364.0,car,1.0
1,2,797.0,121.0,908.0,195.0,car,2.0
2,2,1069.0,457.0,1180.0,545.0,car,3.0
3,2,1468.0,720.0,1496.0,803.0,person,4.0
4,2,1220.0,647.0,1421.0,830.0,truck,5.0
5,2,505.0,772.0,648.0,908.0,truck,6.0
6,2,979.0,348.0,1126.0,474.0,car,7.0
7,2,453.0,106.0,623.0,293.0,truck,8.0
8,2,873.0,198.0,993.0,290.0,car,9.0
9,2,1144.0,350.0,1170.0,410.0,person,10.0
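
A quick way to sanity-check the tracker output is to summarize each track: its category, the frames it spans, and how many frames it appears in. The sketch below assumes only the column names shown in the table above; `summarize_tracks` is a hypothetical helper.

```python
import pandas as pd


def summarize_tracks(trk: pd.DataFrame) -> pd.DataFrame:
    """One row per object_id: category, first/last frame, and number of frames seen."""
    return (
        trk.groupby("object_id")
        .agg(
            category=("category", "first"),
            first_frame=("Frame", "min"),
            last_frame=("Frame", "max"),
            n_frames=("Frame", "nunique"),
        )
        .reset_index()
    )
```

Tracks that appear in only a handful of frames, or whose ID changes partway through an object's motion, are worth inspecting in the annotated video before trusting velocities derived from them.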


Adding velocities to dataframe:

In [28]:
from utils.velocity_utils import VelocityUtils
velUtils = VelocityUtils()
df = pd.DataFrame(data=trk_bbox)

# Experimenting with finding the scale factor that maps pixel movement to meters:
scale = 1/15
# df_person = df[(df['object_id'] == 4)] 
# human_height_avg = 1.75
# human_pixel_height =(df_person['bb_bottom'].mean() - df_person['bb_top'].mean()) 
# scale = human_height_avg / human_pixel_height
# print(f'human pixel high: {human_pixel_height}, avg human height: {human_height_avg}, scale: {scale}')


df = df.reset_index(drop=True)
for id in df['object_id'].unique():
  df = velUtils.AddVelocity(df, id, fps, scale, ['bb_left', 'bb_top', 'bb_right', 'bb_bottom'])
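
The commented-out lines in the cell above estimate the pixel-to-meter scale from a person's bounding box height. That idea can be sketched as a standalone function; `meters_per_pixel` is a hypothetical name, and the 1.75 m average height is the value used in the commented code.

```python
import pandas as pd


def meters_per_pixel(df: pd.DataFrame, object_id: float, real_height_m: float = 1.75) -> float:
    """Estimate a scale [m/pixel] from the mean bounding box height of a reference object."""
    ref = df[df["object_id"] == object_id]
    if ref.empty:
        raise ValueError(f"No rows found for object_id {object_id}")
    pixel_height = (ref["bb_bottom"] - ref["bb_top"]).mean()
    if pixel_height <= 0:
        raise ValueError("Reference object has a non-positive pixel height")
    return real_height_m / pixel_height
```

A scale obtained this way is only a local approximation: because of camera perspective, the same real-world height covers fewer pixels farther from the camera, so a single factor will over- or under-estimate speeds in other parts of the frame.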

In [29]:
object_id = 13 # select an object id to see its data
df_sub = pd.DataFrame(df[(df['object_id'] == object_id)])
df_sub.head(10)

Unnamed: 0,Frame,bb_left,bb_top,bb_right,bb_bottom,category,object_id,x,y,dx,dy,vx,vy,vel
24,3,1378.0,1021.0,1658.0,1080.0,car,13.0,1518.0,1050.0,-3.0,-1.0,-6.0,-2.0,6.32
37,4,1370.0,1019.0,1660.0,1080.0,car,13.0,1515.0,1049.0,-3.0,-1.0,-6.0,-2.0,6.32
49,5,1361.0,1017.0,1656.0,1080.0,car,13.0,1508.0,1048.0,-7.0,-1.0,-14.0,-2.0,14.14
60,6,1357.0,1016.0,1656.0,1080.0,car,13.0,1506.0,1048.0,-2.0,0.0,-4.0,0.0,4.0
71,7,1351.0,1014.0,1658.0,1080.0,car,13.0,1504.0,1047.0,-2.0,-1.0,-4.0,-2.0,4.47
83,8,1346.0,1012.0,1661.0,1080.0,car,13.0,1503.0,1046.0,-1.0,-1.0,-2.0,-2.0,2.83
95,9,1343.0,1009.0,1667.0,1080.0,car,13.0,1505.0,1044.0,2.0,-2.0,4.0,-4.0,5.66
108,10,1333.0,1006.0,1666.0,1080.0,car,13.0,1499.0,1043.0,-6.0,-1.0,-12.0,-2.0,12.17
121,11,1322.0,1004.0,1657.0,1080.0,car,13.0,1489.0,1042.0,-10.0,-1.0,-20.0,-2.0,20.1
134,12,1317.0,1002.0,1658.0,1080.0,car,13.0,1487.0,1041.0,-2.0,-1.0,-4.0,-2.0,4.47
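
VelocityUtils.AddVelocity is a custom class and its code is not shown in this notebook. The sketch below is an inferred reconstruction that is consistent with the table above: x and y are the box center rounded down, dx and dy are the frame-to-frame differences per object (with the first row of a track reusing the next row's value), velocities are displacement times fps times scale, and vel is their magnitude. The function name and the exact handling of edge cases are assumptions.

```python
import numpy as np
import pandas as pd


def add_velocity(df: pd.DataFrame, fps: float, scale: float) -> pd.DataFrame:
    """Add center position, per-frame displacement, and velocity columns per tracked object."""
    out = df.sort_values(["object_id", "Frame"]).copy()
    # box center, rounded down to whole pixels
    out["x"] = np.floor((out["bb_left"] + out["bb_right"]) / 2)
    out["y"] = np.floor((out["bb_top"] + out["bb_bottom"]) / 2)
    # displacement since the previous row of the same object
    out["dx"] = out.groupby("object_id")["x"].diff()
    out["dy"] = out.groupby("object_id")["y"].diff()
    # the first row of each track has no predecessor; reuse the next displacement
    out[["dx", "dy"]] = out.groupby("object_id")[["dx", "dy"]].bfill()
    # pixels/frame * frames/sec * meters/pixel = meters/sec
    out["vx"] = (out["dx"] * fps * scale).round(2)
    out["vy"] = (out["dy"] * fps * scale).round(2)
    out["vel"] = np.hypot(out["vx"], out["vy"]).round(2)
    return out
```

For the first row of object 13 above, dx = -3 and dy = -1 at 30 frames/sec with scale 1/15 give vx = -6, vy = -2 and vel ≈ 6.32. Note that this sketch treats consecutive rows as consecutive frames; if the tracker skips frames for an object, the displacement would need to be divided by the frame gap.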


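Comparing the ground-truth bounding boxes to those produced by YOLO. Each ground-truth box is paired with a tracker box from the same frame via Bbox.GetMaxCorrelation, and the score printed at the end is the average IoU of those pairs, which this notebook labels mean Average Precision (mPA).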

In [30]:
selected_category = 'Vehicle' # set a specific category to look for in the ground truth data
df_bbox_filtered = df_bbox[df_bbox['category'] == selected_category]
gt_video_out, bbox_gt = vUtils.AnnotateVideo(video_dest_path, tracker_file_mp4, gt_file_mp4, df_bbox_filtered, start_time_sec=start_time, duration_sec=video_duration, save_images=False)
rows = []
for i_frame in range(len(bbox_gt)):  # scan all frames in the video
    df_sub = trk_bbox[trk_bbox["Frame"] == i_frame]  # tracker boxes for this frame
    for bb_gt in bbox_gt[i_frame]:
        found, info = Bbox.GetMaxCorrelation(bb_gt, df_sub, i_frame)
        if found:
            rows.append(info)

# build the dataframe once instead of concatenating inside the loop
df = pd.DataFrame(rows, columns=['frame idx', 'gt bbox', 'pred bbox', 'iou'])

# average IoU over the matched ground-truth/prediction pairs,
# reported by this notebook under the label "mean Average Precision (mPA)"
mPA = df['iou'].sum() / len(df)
print(f'mean Average Precision(mPA): {mPA}')
df.head(10)

Total frames in video: 31 @ 30 frames/sec
31 0 30
Created frame id  0, 0.00 sec in video; completed:  0.0 %
Created frame id 25, 0.83 sec in video; completed:  83.3 %
Done: Created video: ./../processed/VIRAT_S_050000_07_001014_001126/0-1_gt_.mp4
mean Average Precision(mPA): 0.7506638894597608


Unnamed: 0,frame idx,gt bbox,pred bbox,iou
0,2,"[488, 747, 656, 918]","[505.0, 772.0, 648.0, 908.0]",0.678684
1,2,"[433, 104, 644, 332]","[453.0, 106.0, 623.0, 293.0]",0.66219
2,2,"[792, 117, 914, 202]","[797.0, 121.0, 908.0, 195.0]",0.794101
3,2,"[872, 203, 994, 298]","[873.0, 198.0, 993.0, 290.0]",0.85781
4,2,"[917, 277, 1032, 365]","[921.0, 276.0, 1028.0, 364.0]",0.911043
5,2,"[981, 352, 1128, 469]","[979.0, 348.0, 1126.0, 474.0]",0.905212
6,2,"[1055, 441, 1187, 557]","[1069.0, 457.0, 1180.0, 545.0]",0.640576
7,2,"[1108, 510, 1240, 626]","[1113.0, 524.0, 1238.0, 613.0]",0.728745
8,2,"[1235, 658, 1419, 814]","[1220.0, 647.0, 1421.0, 830.0]",0.781452
9,3,"[488, 747, 656, 918]","[505.0, 773.0, 648.0, 909.0]",0.678684
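
For reference, the IoU of two boxes in [left, top, right, bottom] form can be sketched as follows. Bbox.GetMaxCorrelation's implementation is not shown in this notebook; the inclusive pixel convention (+1 on widths and heights) is an inference, chosen because it reproduces values in the table above, for example 0.911043 for the pair [917, 277, 1032, 365] and [921, 276, 1028, 364]. The names `iou` and `best_match` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Box = Sequence[float]  # [left, top, right, bottom] in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union, counting both edge pixels of each box (inclusive convention)."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0]) + 1
    inter_h = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # the boxes do not overlap
    inter = inter_w * inter_h
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def best_match(gt: Box, preds: List[Box]) -> Tuple[Optional[Box], float]:
    """Return the predicted box with the highest IoU against gt, or (None, 0.0) if none overlap."""
    best_box, best_iou = None, 0.0
    for pred in preds:
        score = iou(gt, pred)
        if score > best_iou:
            best_box, best_iou = pred, score
    return best_box, best_iou
```

Averaging these best-match IoU values over all ground-truth boxes gives a mean IoU. A full mean Average Precision would additionally rank detections by confidence and integrate precision over recall per class, which this comparison does not do.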
