Welcome to my first ML project using Google Cloud!

I'm looking to build an action recognition model that detects the different swimming strokes. I'll build two models of my own, a CNN-GRU and a Long Short-Term Memory (LSTM) model, using published research to make each one as strong as possible, and then compare and contrast them with models built using transfer learning.

The purpose of this project is to exercise my ML knowledge and get outside my comfort zone.

Here's what I hope to learn:
1. Building data pipelines and APIs
2. Merging two datasets and modeling the data appropriately
3. Building my own computer vision models from research papers and learning to evaluate each against pre-built models
4. Learning the basic functionality of Google Cloud as it pertains to machine learning
5. Sharpening my overall skills as a data scientist and overcoming any bumps along the way


Steps to success:
1. Download and store the data in Google Cloud
2. Clean and model the data for success
3. Research how to build proper LSTM and CNN-GRU models
4. Find the appropriate transfer learning models to compare with
5. Evaluate the models and optimize parameters to find the most efficient models

In [1]:
!pip install tensorflow

Collecting protobuf<3.20,>=3.9.2
  Using cached protobuf-3.19.6-cp39-cp39-macosx_10_9_x86_64.whl (980 kB)
Installing collected packages: protobuf
  Attempting uninstall: protobuf
    Found existing installation: protobuf 3.20.3
    Uninstalling protobuf-3.20.3:
      Successfully uninstalled protobuf-3.20.3
Successfully installed protobuf-3.19.6


In [36]:
pip install git+https://github.com/pytube/pytube

Collecting git+https://github.com/pytube/pytube
  Cloning https://github.com/pytube/pytube to /private/var/folders/kb/fxsdzvkd3tn7qtds435g0l880000gn/T/pip-req-build-egc7_mtx
  Running command git clone --filter=blob:none --quiet https://github.com/pytube/pytube /private/var/folders/kb/fxsdzvkd3tn7qtds435g0l880000gn/T/pip-req-build-egc7_mtx
  Resolved https://github.com/pytube/pytube to commit d3d18691b3e99b2d3b4d620446b088a1c32896c6
  Preparing metadata (setup.py) ... done
Note: you may need to restart the kernel to use updated packages.


In [37]:
import tensorflow as tf
from tensorflow import keras

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

import os
import pytube

In [18]:
#Import csv files for downloading the data
train_df = pd.read_csv('train.csv')
val_df = pd.read_csv('validate.csv')

In [21]:
# Labels for the four swimming strokes
swimming_labels = ['swimming backstroke', 'swimming butterfly stroke',
                   'swimming breast stroke', 'swimming front crawl']

# Keep only the swimming videos; .copy() avoids pandas' SettingWithCopyWarning
# when columns are modified in a later cell
train_df = train_df[train_df.label.isin(swimming_labels)].copy()
val_df = val_df[val_df.label.isin(swimming_labels)].copy()

In [39]:
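# Strip leading dashes from the video IDs.
# Caution: YouTube IDs can legitimately begin with '-', so this step may
# turn a valid ID into an invalid one and cause that download to fail.
train_df['youtube_id'] = train_df['youtube_id'].str.lstrip('-')
val_df['youtube_id'] = val_df['youtube_id'].str.lstrip('-')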

df = pd.concat([train_df, val_df])
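
Because the IDs are edited before the URLs are built, a quick sanity check on their shape can catch problems before any download is attempted. The sketch below is only an illustration: it assumes that YouTube IDs are 11 characters drawn from letters, digits, '-' and '_', which matches IDs seen in practice but is not an officially documented guarantee. The helper names and the sample IDs are hypothetical.

```python
import re

# Assumption: video IDs are 11 characters from the URL-safe base64
# alphabet (letters, digits, '-' and '_'). This is the commonly observed
# shape, not a documented guarantee.
YOUTUBE_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{11}")


def looks_like_youtube_id(video_id):
    """Return True if video_id has the commonly observed 11-character shape."""
    return (isinstance(video_id, str)
            and YOUTUBE_ID_PATTERN.fullmatch(video_id) is not None)


def find_suspect_ids(video_ids):
    """Return the IDs that do not look like valid YouTube IDs."""
    return [vid for vid in video_ids if not looks_like_youtube_id(vid)]


# Hypothetical sample: the second entry mimics an ID whose leading dash
# was stripped, leaving only 10 characters.
sample_ids = ["-abcDEF1234", "abcDEF1234", "a_b-c_d-e_f"]
print(find_suspect_ids(sample_ids))
```

In the notebook, the same check could be run as `find_suspect_ids(df['youtube_id'])` to list the IDs worth a second look.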

In [None]:
# Create one output directory per split
os.makedirs("Training Videos", exist_ok=True)
os.makedirs("Validation Videos", exist_ok=True)

for index, row in df.iterrows():
    #get the youtube id
    video_id = row['youtube_id']
    
    #create youtube video url
    url = f"https://www.youtube.com/watch?v={video_id}"
    
    if row['split'] == 'train':
        directory = "Training Videos"
    else:
        directory = "Validation Videos"
    
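    try:
        youtube = pytube.YouTube(url)
        # Take the first available stream (not necessarily the best quality)
        video = youtube.streams.first()
        # Include the extension explicitly; recent pytube versions use the
        # filename exactly as given
        video.download(directory, filename=f"{row['label']}_{index}.{video.subtype}")
        print(f"Downloaded video {index} to {directory}")
    except Exception as e:
        # Catch Exception rather than using a bare except so that
        # interrupting the kernel still stops the loop
        print(f"Failed to download video {index}: {e}")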
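
The loop above only prints failures, so they are lost once the cell output is cleared. One possible refinement, sketched here under assumptions, is to collect the failed rows and write them to a CSV so they can be retried or inspected later. `download_all`, `split_directory`, the `failed_downloads.csv` name, and the `download_one` callable are all hypothetical; in the notebook, `download_one` would wrap the pytube calls, while the example uses a fake downloader so it runs without network access.

```python
import csv
import os
import tempfile


def split_directory(split):
    """Map a split name to the folder names used in the download loop."""
    return "Training Videos" if split == "train" else "Validation Videos"


def download_all(rows, download_one, failure_log="failed_downloads.csv"):
    """Try every (index, youtube_id, label, split) row and log the failures.

    download_one is a hypothetical callable taking (url, directory, stem)
    that raises an exception when the download fails.
    """
    failures = []
    for index, video_id, label, split in rows:
        url = f"https://www.youtube.com/watch?v={video_id}"
        directory = split_directory(split)
        try:
            download_one(url, directory, f"{label}_{index}")
        except Exception as error:
            failures.append(
                {"index": index, "youtube_id": video_id, "error": str(error)}
            )

    # Write the failures out so they survive after the cell output is gone.
    with open(failure_log, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["index", "youtube_id", "error"])
        writer.writeheader()
        writer.writerows(failures)
    return failures


def fake_download(url, directory, stem):
    """Stand-in downloader: fails for any URL ending in 'bad'."""
    if url.endswith("bad"):
        raise RuntimeError("video unavailable")


rows = [
    (0, "good", "swimming backstroke", "train"),
    (1, "bad", "swimming front crawl", "validate"),
]
with tempfile.TemporaryDirectory() as tmp:
    failed = download_all(rows, fake_download, os.path.join(tmp, "failed_downloads.csv"))
print(failed)
```

Passing the downloader in as an argument keeps the bookkeeping separate from pytube, which makes the failure handling easy to test on its own.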