<a href="https://colab.research.google.com/github/wildlifeai/pepeketua_zooniverse/blob/main/frog_zooniverse.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook contains the scripts to upload photos of Archey's frogs to a Zooniverse project, where researchers label landmarks on the frogs to train machine learning (ML) algorithms.

# Download frog photos

### Add shortcuts to the compressed photos

To download the photos of the frogs into this Google Colab, you first need to add shortcuts in your Google Drive to the [five zipped folders](https://drive.google.com/file/d/1XXSrATFX1l-J0CUE4m6UfoOBp9zv3XOr/view?usp=sharing) with the photos.

To add the shortcuts:
* go to the "Shared with me" section in your Google Drive,
* find the five zipped folders,
* click on "Add shortcut to Drive" and
* save the shortcuts (we created a folder called "frog_photos" and saved them there).

Specify the folder in your Google Drive where you saved the shortcuts to the photos (in our case "frog_photos").

In [1]:
dir_shortcuts = "/content/drive/My Drive/frog_photos/"

*If you can't access the five zipped folders, please [email Victor](mailto:victor@wildlife.ai).
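
Before unzipping, it can help to confirm that all five shortcuts are visible in the folder you specified. The sketch below assumes the archive names used in the unzip cell of this notebook; the helper name `missing_archives` is hypothetical.

```python
from pathlib import Path

# Archive names taken from the unzip cell of this notebook
EXPECTED_ARCHIVES = [
    "whareorino_a.zip",
    "whareorino_b.zip",
    "whareorino_c.zip",
    "whareorino_d.zip",
    "pukeokahu.zip",
]


def missing_archives(folder, expected=EXPECTED_ARCHIVES):
    """Return the expected archive names that are not present in folder."""
    folder = Path(folder)
    return [name for name in expected if not (folder / name).is_file()]
```

Once Google Drive is mounted, `missing_archives(dir_shortcuts)` should return an empty list if every shortcut was saved in that folder.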

### Download the photos

To download and unzip the five folders with the photos, you will need to grant access to Google Drive File Stream.

The photos will be temporarily stored in your Google Colab disk and will not count towards your Google Drive storage.

*Unzipping all five folders might take a while; comment out the archives you don't need.

In [4]:
import zipfile
from google.colab import drive

drive.mount('/content/drive/')

# Names of the five zipped folders saved as shortcuts in dir_shortcuts
zip_names = [
  "whareorino_a",
  "whareorino_b",
  "whareorino_c",
  "whareorino_d",
  "pukeokahu",
]

# Extract each archive into /tmp; the context manager closes the file afterwards
for zip_name in zip_names:
  with zipfile.ZipFile(dir_shortcuts + zip_name + ".zip", 'r') as zip_file:
    zip_file.extractall("/tmp")

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).


KeyboardInterrupt: ignored
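
If only some of the folders are needed, the extraction can be wrapped in a small helper that takes the list of archives to unzip. This is a sketch rather than the notebook's own code: `extract_archives` is a hypothetical name, and it assumes the archives sit directly in the shortcuts folder.

```python
import os
import zipfile


def extract_archives(zip_dir, names, dest="/tmp"):
    """Extract each named .zip found in zip_dir into dest.

    Returns the names that were extracted; missing archives are skipped.
    """
    extracted = []
    for name in names:
        zip_path = os.path.join(zip_dir, name + ".zip")
        if not os.path.isfile(zip_path):
            print(f"Skipping {name}: {zip_path} not found")
            continue
        # The context manager closes the archive even if extraction fails
        with zipfile.ZipFile(zip_path, "r") as archive:
            archive.extractall(dest)
        extracted.append(name)
    return extracted
```

For example, `extract_archives(dir_shortcuts, ["whareorino_a"])` would unzip only the first folder.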

### Create SQLite database

Create a database to keep track of the photos uploaded to Zooniverse and the annotations provided by the researchers.

In [2]:
# Create the SQLite database and the required tables
import sqlite3

conn = sqlite3.connect('archey_frog.db')
print("Opened database successfully")

# Create a table for information related to each photo 
conn.execute('''
CREATE TABLE IF NOT EXISTS photo(id integer PRIMARY KEY AUTOINCREMENT,
                      filename text NULL,
                      file_path text NULL,
                      capture varchar(255) NULL,
                      frog_id varchar(255) NULL,
                      grid text NULL,
                      zoo_subject varchar(255) NULL);''')

conn.commit()

print("Table created successfully")

conn.close()

Opened database successfully
Table created successfully
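
To confirm that the table was created with the expected columns, SQLite's `PRAGMA table_info` can be queried. A minimal sketch, using an in-memory database so it does not touch `archey_frog.db`; the helper name `table_columns` is an assumption, not part of the notebook.

```python
import sqlite3


def table_columns(conn, table):
    """Return the column names of table, in declaration order."""
    # PRAGMA table_info yields one row per column: (cid, name, type, ...)
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [row[1] for row in rows]


# Demonstration on an in-memory database with the same schema as above
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE IF NOT EXISTS photo(id integer PRIMARY KEY AUTOINCREMENT,
                      filename text NULL,
                      file_path text NULL,
                      capture varchar(255) NULL,
                      frog_id varchar(255) NULL,
                      grid text NULL,
                      zoo_subject varchar(255) NULL);""")
print(table_columns(conn, "photo"))
conn.close()
```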


In [3]:
import os
import pandas as pd

# '../tmp/' resolves to '/tmp' from Colab's default working directory (/content)
tmp_dir = '../tmp/'

# Create a df of the photos found in the tmp folder
data = []
# Loop through each folder in the tmp directory
for grid in os.listdir(tmp_dir):
  if 'Grid' in grid:
    grid_path = os.path.join(tmp_dir, grid)
    # Loop through each subfolder in the 'Grid' directories
    for subfolder in os.listdir(grid_path):
      if 'Individual' in subfolder:
        subfolder_path = os.path.join(grid_path, subfolder)
        # Loop through each individual frog in the "individual frog" directory
        for ind in os.listdir(subfolder_path):
          # Skip files ending in 'db' (e.g. Thumbs.db)
          if not ind.endswith('db'):
            ind_path = os.path.join(subfolder_path, ind)
            # Loop through each photo of the "individual" frog
            for doc in os.listdir(ind_path):
              # Save information about the photo and the frog
              if not doc.endswith('db'):
                fpath = os.path.join(ind_path, doc)
                # The capture number is the text after the last '-' or '_' in the file stem
                capt = doc.split(".",1)[0].replace('_', '-').rsplit("-",1)[1]
                data.append((doc, fpath, capt, ind, grid))

df = pd.DataFrame(data, columns = ['filename', 'file_path', 'capture', 'frog_id', 'grid'])

<bound method NDFrame.head of          filename  ...    grid
0    1_00-756.JPG  ...  Grid A
1    1100-353.jpg  ...  Grid A
2    1100-493.jpg  ...  Grid A
3    1100-221.jpg  ...  Grid A
4    1100-533.jpg  ...  Grid A
..            ...  ...     ...
802  1100-500.jpg  ...  Grid A
803  1001-190.jpg  ...  Grid A
804   1_01-91.JPG  ...  Grid A
805  1_01-154.jpg  ...  Grid A
806  1_01-162.jpg  ...  Grid A

[807 rows x 5 columns]>
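
The capture number is taken from the part of the filename after the last `-` or `_`. The same rule can be written as a small function, which makes it easier to check against filenames like those listed above. `parse_capture` is a hypothetical name, and returning `None` for filenames without a separator is an assumption rather than the notebook's behaviour (the original expression would raise an `IndexError`).

```python
def parse_capture(filename):
    """Return the text after the last '-' or '_' in the file stem, or None."""
    stem = filename.split(".", 1)[0]  # drop the extension
    parts = stem.replace("_", "-").rsplit("-", 1)
    if len(parts) < 2:
        return None  # no separator in the name
    return parts[1]


for name in ["1_00-756.JPG", "1100-353.jpg", "1_01-91.JPG"]:
    print(name, "->", parse_capture(name))
```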


In [59]:
data

[('1_00-756.JPG',
  '../tmp/Grid A/Individual Frogs/220/1_00-756.JPG',
  '756',
  '220',
  'Grid A'),
 ('1100-353.jpg',
  '../tmp/Grid A/Individual Frogs/83/1100-353.jpg',
  '353',
  '83',
  'Grid A'),
 ('1100-493.jpg',
  '../tmp/Grid A/Individual Frogs/83/1100-493.jpg',
  '493',
  '83',
  'Grid A'),
 ('1100-221.jpg',
  '../tmp/Grid A/Individual Frogs/83/1100-221.jpg',
  '221',
  '83',
  'Grid A'),
 ('1100-533.jpg',
  '../tmp/Grid A/Individual Frogs/83/1100-533.jpg',
  '533',
  '83',
  'Grid A'),
 ('1100-460.jpg',
  '../tmp/Grid A/Individual Frogs/83/1100-460.jpg',
  '460',
  '83',
  'Grid A'),
 ('1100-522.jpg',
  '../tmp/Grid A/Individual Frogs/83/1100-522.jpg',
  '522',
  '83',
  'Grid A'),
 ('1100-393.jpg',
  '../tmp/Grid A/Individual Frogs/83/1100-393.jpg',
  '393',
  '83',
  'Grid A'),
 ('1100-602.jpg',
  '../tmp/Grid A/Individual Frogs/83/1100-602.jpg',
  '602',
  '83',
  'Grid A'),
 ('1100-208.jpg',
  '../tmp/Grid A/Individual Frogs/83/1100-208.jpg',
  '208',
  '83',
  'Grid A')
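
Tuples in this shape can be written to the `photo` table by naming the target columns explicitly, which leaves `id` to auto-increment and `zoo_subject` empty. The following is a sketch under those assumptions, not the notebook's own upload step; `insert_photos` is a hypothetical helper and the demo uses an in-memory database.

```python
import sqlite3


def insert_photos(conn, rows):
    """Insert (filename, file_path, capture, frog_id, grid) tuples into photo."""
    conn.executemany(
        "INSERT INTO photo (filename, file_path, capture, frog_id, grid) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()


# In-memory database with the same columns as the photo table
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE photo(id integer PRIMARY KEY AUTOINCREMENT, filename text, "
    "file_path text, capture varchar(255), frog_id varchar(255), grid text, "
    "zoo_subject varchar(255))"
)
sample = [
    ("1_00-756.JPG", "../tmp/Grid A/Individual Frogs/220/1_00-756.JPG",
     "756", "220", "Grid A"),
]
insert_photos(conn, sample)
print(conn.execute("SELECT id, filename, zoo_subject FROM photo").fetchall())
```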

In [None]:
# db_utils, movies_file_id, movies_path and db_path must be defined before running this cell
movies_df = db_utils.download_csv_from_google_drive(movies_file_id)

# Include server's path of the movie files
movies_df["Fpath"] = movies_path + "/" + movies_df["FilenameCurrent"] + ".mov"

# Standardise the filename
movies_df["FilenameCurrent"] = movies_df["FilenameCurrent"].str.normalize("NFD")

# Set up sites information
sites_db = movies_df[
    ["SiteDecription", "CentroidLat", "CentroidLong"]
].drop_duplicates("SiteDecription")

# Add values to sites table
db_utils.add_to_table(
    db_path, "sites", [(None,) + tuple(i) + (None,) for i in sites_db.values], 5
)
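
The list comprehension in this cell pads each site row with a leading and a trailing `None` so that it matches the five fields passed to `add_to_table`. A small standalone illustration follows; the site names and coordinates are made up, and treating the first field as an auto-generated id is an assumption.

```python
# Hypothetical site rows: (description, latitude, longitude)
site_rows = [
    ("Site A", -36.0, 175.0),
    ("Site B", -37.5, 174.5),
]

# Pad each row to five fields: None for the first column and None for the last
padded = [(None,) + tuple(row) + (None,) for row in site_rows]

for row in padded:
    print(len(row), row)
```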

In [11]:
# Utility functions for common database operations
import sqlite3

def create_connection(db_file):
    """ create a database connection to the SQLite database
        specified by db_file
    :param db_file: database file
    :return: Connection object or None
    """
    conn = None
    try:
        conn = sqlite3.connect(db_file)
        conn.execute("PRAGMA foreign_keys = 1")
        return conn
    except sqlite3.Error as e:
        print(e)

    return conn


def insert_many(conn, data, table, count):
    """
    Insert multiple rows into table
    :param conn: the Connection object
    :param data: data to be inserted into table
    :param table: table of interest
    :param count: number of fields
    :return:
    """

    # Build one "?" placeholder per field, e.g. "(?, ?, ?)" for count=3
    values = "(" + ", ".join(["?"] * count) + ")"

    cur = conn.cursor()
    cur.executemany(f"INSERT INTO {table} VALUES {values}", data)


def retrieve_query(conn, query):
    """
    Execute SQL query and return its output
    :param conn: the Connection object
    :param query: a SQL query
    :return: list of rows, empty if the query fails
    """
    try:
        cur = conn.cursor()
        cur.execute(query)
        return cur.fetchall()
    except sqlite3.Error as e:
        print(e)
        return []


def execute_sql(conn, sql):
    """ Execute multiple SQL statements without return
    :param conn: Connection object
    :param sql: a string of SQL statements
    :return:
    """
    try:
        c = conn.cursor()
        c.executescript(sql)
    except sqlite3.Error as e:
        print(e)


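def add_to_table(db_path, table_name, values, num_fields):
    """
    Open the database at db_path and insert rows into table_name
    :param db_path: path to the SQLite database file
    :param table_name: table of interest
    :param values: list of tuples to insert
    :param num_fields: number of fields in each tuple
    :return:
    """
    conn = create_connection(db_path)

    try:
        insert_many(
            conn, values, table_name, num_fields,
        )
        conn.commit()
        print(f"Updated {table_name}")
    except sqlite3.Error as e:
        print(e)
    finally:
        # Release the connection whether or not the insert succeeded
        conn.close()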
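
The `count` argument of `insert_many` controls how many `?` placeholders appear in the generated `INSERT` statement. The sketch below shows one way to build that statement with `str.join`; `build_insert_sql` is a hypothetical helper, not part of the notebook, and it assumes `table` comes from trusted code because table names cannot be bound as parameters.

```python
import sqlite3


def build_insert_sql(table, count):
    """Return an INSERT statement with count positional placeholders."""
    placeholders = ", ".join(["?"] * count)
    return f"INSERT INTO {table} VALUES ({placeholders})"


# Demonstration on an in-memory table with three fields
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sites(id integer PRIMARY KEY, name text, lat real)")
sql = build_insert_sql("sites", 3)
conn.executemany(sql, [(None, "Site A", -36.0), (None, "Site B", -37.5)])
print(sql)
print(conn.execute("SELECT * FROM sites").fetchall())
```

Passing `None` for the `id` field lets SQLite assign the primary key itself.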