<a href="https://colab.research.google.com/github/wildlifeai/pepeketua_zooniverse/blob/main/zooniverse_upload.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook contains the scripts to upload photos of Archey's frogs to a Zooniverse project where they can be manually annotated.

# Requirements

### Install required packages

We use the "panoptes_client" package to communicate with Zooniverse. If you don't have it installed, run the command below.

In [None]:
!pip install panoptes_client

### Load required libraries

Load generic libraries

In [None]:
import io, os
import zipfile
import json
import getpass
import pandas as pd
import numpy as np

from google.colab import drive
from datetime import date
from panoptes_client import (
    SubjectSet,
    Subject,
    Project,
    Panoptes,
) 

### Connect to Zooniverse

You need to specify your Zooniverse username and password. Only users with access to the project can upload information to, or download information from, Zooniverse.

In [None]:
# Your user name and password for Zooniverse. 
zoo_user = getpass.getpass('Enter your Zooniverse user')
zoo_pass = getpass.getpass('Enter your Zooniverse password')

# Connect to Zooniverse with your username and password
auth = Panoptes.connect(username=zoo_user, password=zoo_pass)

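# AuthenticationError is not a Python built-in, so define it before raising it
class AuthenticationError(Exception):
    pass

if not auth.logged_in:
    raise AuthenticationError("Your credentials are invalid. Please try again.")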

# Connect to the Zooniverse project (our frog project # is 13355)
project = Project(13355)

# Upload photos to Zooniverse

## Temporarily download frog photos

### Add shortcuts to the compressed photos

To download the frog photos into this Colab notebook, first add shortcuts in your Google Drive to the [five zipped folders](https://drive.google.com/file/d/1XXSrATFX1l-J0CUE4m6UfoOBp9zv3XOr/view?usp=sharing) that contain the photos.

To add the shortcuts:
* go to the "Shared with me" section in your Google Drive,
* find the five zipped folders,
* click on "Add shortcut to Drive" and
* save the shortcuts (we created a folder called "frog_photos" and saved them there).

*Specify* the folder in your Google Drive where you saved the shortcuts to the photos (in our case "frog_photos").

In [9]:
dir_shortcuts = "/content/drive/MyDrive/Conservation/Projects/frog_id/frog_photos/"

If you can't access the five zipped folders, please [email Victor](mailto:victor@wildlife.ai).
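
Once the drive has been mounted (this happens in the next section), a quick check can confirm that all five shortcuts are reachable before the archives are opened. The following is a minimal sketch: `missing_zips` and `EXPECTED_ZIPS` are hypothetical names that are not part of this notebook, and the file names are assumed to match the ones used when the zipped files are loaded.

```python
from pathlib import Path

# File names assumed to match the archives loaded later in this notebook
EXPECTED_ZIPS = [
    "whareorino_a.zip",
    "whareorino_b.zip",
    "whareorino_c.zip",
    "whareorino_d.zip",
    "pukeokahu.zip",
]


def missing_zips(folder, expected=EXPECTED_ZIPS):
    """Return the expected zip file names that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in expected if not (folder / name).is_file()]
```

Calling `missing_zips(dir_shortcuts)` should return an empty list when every shortcut was saved in the specified folder.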

### Load the zipped files

To read the five zipped folders with the photos, you will need to grant Colab access to your Google Drive when prompted.



In [None]:
# Mount the drive in colab
drive.mount('/content/drive/')

# Load the five zipped files
whareorino_a = zipfile.ZipFile(dir_shortcuts + "whareorino_a.zip", 'r')
whareorino_b = zipfile.ZipFile(dir_shortcuts + "whareorino_b.zip", 'r')
whareorino_c = zipfile.ZipFile(dir_shortcuts + "whareorino_c.zip", 'r')
whareorino_d = zipfile.ZipFile(dir_shortcuts + "whareorino_d.zip", 'r')
pukeokahu = zipfile.ZipFile(dir_shortcuts + "pukeokahu.zip", 'r')

# Extract the filepath of the photos of individual frogs
zips = [whareorino_a, whareorino_b, whareorino_c, whareorino_d, pukeokahu]
pdList = []

for zip_file in zips:
  zip_pd = pd.DataFrame(
      [x for x in zip_file.namelist() if 'Individual Frogs' in x and not x.endswith(('.db','/','Store'))]
      )
  pdList.append(zip_pd)

# Combine the file paths of the five grids into a single data frame
frog_df = pd.concat(pdList)
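
To see what the filter in the loop above keeps and discards, it can be tried on a small in-memory archive. This is only a sketch: `individual_frog_paths` is a hypothetical helper, the archive entries are made up, and the reading of the `.db` and `Store` suffixes as `Thumbs.db` and `.DS_Store` files is an assumption.

```python
import io
import zipfile


def individual_frog_paths(zip_file):
    """List members under 'Individual Frogs', skipping folders and system files."""
    # Assumed targets: Thumbs.db files, directory entries, and .DS_Store files
    skip_suffixes = (".db", "/", "Store")
    return [
        name
        for name in zip_file.namelist()
        if "Individual Frogs" in name and not name.endswith(skip_suffixes)
    ]


# Build a tiny archive with made-up entries to illustrate the filter
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("Grid A/Individual Frogs/80/1100-263.jpg", b"")
    zf.writestr("Grid A/Individual Frogs/80/Thumbs.db", b"")
    zf.writestr("Grid A/Individual Frogs/80/.DS_Store", b"")
    zf.writestr("Grid A/Whole Grid/overview.jpg", b"")

with zipfile.ZipFile(buffer, "r") as zf:
    print(individual_frog_paths(zf))
    # ['Grid A/Individual Frogs/80/1100-263.jpg']
```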


### Create a data frame with frog information

Create a data frame to keep track of the photos uploaded to Zooniverse

#### Prepare information related to the photos

In [11]:
# Rename the column of df
frog_df = frog_df.rename(columns={0: "zip_path"})

# Add new columns using directory and filename information
directories = frog_df['zip_path'].str.split("/", n = 4, expand = True)

# Add the grid, frog_id, filename, and capture cols 
frog_df["grid"] = directories[0]
frog_df["frog_id"] = directories[2] 
frog_df["filename"] = directories[3] 
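# Drop the extension, unify the separators and keep the text after the last "-"
# (n is passed as a keyword because recent pandas versions require it)
frog_df["capture"] = frog_df["filename"].str.split(".", n=1, expand=True)[0].str.replace('_', '-').str.rsplit("-", n=1, expand=True)[1]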


# Manually filter out non-standard photos
frog_df = frog_df[~frog_df['filename'].str.contains(("Picture|IMG|#"))]
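
The parsing in the cell above can be hard to follow as chained pandas calls, so here is the same logic applied to a single path with plain string methods. It is a simplified sketch: `parse_zip_path` is a hypothetical helper, and it assumes the `grid/Individual Frogs/frog_id/filename` layout seen in the archive paths listed in this notebook.

```python
def parse_zip_path(zip_path):
    """Split an archive path into the metadata fields used in this notebook."""
    parts = zip_path.split("/", 4)
    filename = parts[3]
    # Drop the extension, unify the separators, keep the text after the last "-"
    stem = filename.split(".", 1)[0].replace("_", "-")
    capture = stem.rsplit("-", 1)[1] if "-" in stem else None
    return {
        "grid": parts[0],
        "frog_id": parts[2],
        "filename": filename,
        "capture": capture,
    }


print(parse_zip_path("Grid A/Individual Frogs/80/1100-263.jpg"))
# {'grid': 'Grid A', 'frog_id': '80', 'filename': '1100-263.jpg', 'capture': '263'}
```

Filenames without a `-` or `_` separator yield no capture value, which is why they need to be excluded before upload.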


#### Prepare information related to Zooniverse subjects

In [12]:
# Get info of subjects uploaded to the project
export = project.get_export("subjects")

# Save the subjects info as pandas data frame
subjects_df = pd.read_csv(
    io.StringIO(export.content.decode("utf-8")),
    usecols=[
        "subject_id",
        "metadata",
    ],
)

# Reset index of df
subj_df = subjects_df.reset_index(drop=True).reset_index()

# Flatten the metadata from the uploaded subjects
meta_df = pd.json_normalize(subj_df.metadata.apply(json.loads))

# Drop metadata and index columns from original df
subj_df = subj_df.drop(columns=["metadata", "index",]).rename(
    columns={"id": "subject_id"}
)

# Combine the flattened metadata with the subjects df
subj_df = pd.concat([subj_df, meta_df], axis=1)

# Add the subject_id of photos already uploaded to Zooniverse
frog_df = pd.merge(frog_df, subj_df, 
                   how="left", on=["grid", "capture", "frog_id", "filename"])

# Exclude photos whose capture value is missing or not numeric
frog_df = frog_df[frog_df['capture'].str.isnumeric() & (~frog_df['capture'].isnull())]
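
The matching between local photos and already-uploaded subjects relies on four metadata fields having identical values on both sides. The sketch below shows the same idea without pandas: `uploaded_lookup` is a hypothetical helper, the example row is made up, and comparing every field as a string is an assumption made here to avoid type mismatches (for example `80` versus `"80"`).

```python
import json


def uploaded_lookup(subject_rows):
    """Map (grid, capture, frog_id, filename) to the subject_id already uploaded."""
    keys = ("grid", "capture", "frog_id", "filename")
    lookup = {}
    for row in subject_rows:
        meta = json.loads(row["metadata"])
        # Compare everything as text so that 80 and "80" count as the same frog
        lookup[tuple(str(meta.get(k)) for k in keys)] = row["subject_id"]
    return lookup


# Made-up row shaped like the two columns read from the subjects export
rows = [
    {
        "subject_id": 1,
        "metadata": json.dumps(
            {"grid": "Grid A", "capture": "263", "frog_id": 80, "filename": "1100-263.jpg"}
        ),
    }
]
lookup = uploaded_lookup(rows)
print(lookup.get(("Grid A", "263", "80", "1100-263.jpg")))  # 1
print(lookup.get(("Grid A", "264", "80", "1100-264.jpg")))  # None
```

A photo whose key is absent from the lookup has no `subject_id` yet, which corresponds to the missing values produced by the left merge.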

### Temporarily download photos to Colab

Specify the directory where the photos will be saved and the number of photos to upload.

In [56]:
# Specify the directory in colab to temporarily save the photos
tmp_dir = 'photos_upload/'

# Specify the number of photos to upload
n_photos = 1000

In [82]:
# List photos that can't be downloaded because of:
# "Bad magic number for file header"
faulty_images = ['Grid A/Individual Frogs/80/1100-263.jpg',
                'Grid B/Individual Frogs/499/1100-3257.JPG',
                'Grid B/Individual Frogs/195/1000-2564.jpg',
                'Grid B/Individual Frogs/325/1100-2875.jpg',
                'Grid C/Individual Frogs/120/0100-4190.JPG',
                'Grid D/Individual Frogs/156/1100-6428.jpg',
                'Grid D/Individual Frogs/518/1100-7484.jpg',
                'Grid D/Individual Frogs/685/1101-7727.JPG',
                'Grid D/Individual Frogs/940/1000-10378.JPG',
                'Grid D/Individual Frogs/789/0001-10472.JPG',
                'Pukeokahu Frog Monitoring/Individual Frogs/364/1100-8589.JPG',
                'Pukeokahu Frog Monitoring/Individual Frogs/88/1100-88.jpg']
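
A list like the one above could also be generated programmatically by trying to read each member and collecting the ones that fail. This is a sketch under assumptions: `unreadable_members` is a hypothetical helper, it only catches `zipfile.BadZipFile` (the exception behind "Bad magic number for file header"), and reading every photo in the five archives may be slow.

```python
import io
import zipfile


def unreadable_members(zip_file, names):
    """Return the members in `names` that raise BadZipFile when read."""
    faulty = []
    for name in names:
        try:
            zip_file.read(name)
        except zipfile.BadZipFile:
            faulty.append(name)
    return faulty


# Illustration: damage the local header of the first member of a small archive
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("a.jpg", b"first photo")
    zf.writestr("b.jpg", b"second photo")
data = bytearray(buffer.getvalue())
data[0:4] = b"XXXX"  # overwrite the file header signature of a.jpg

with zipfile.ZipFile(io.BytesIO(bytes(data))) as zf:
    print(unreadable_members(zf, zf.namelist()))  # ['a.jpg']
```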

Temporarily download the photos to Colab

In [None]:
# Create the folder to store the photos if it does not exist
if not os.path.exists(tmp_dir):
    os.mkdir(tmp_dir)

# Select photos that haven't been uploaded 
photos_upload = frog_df[frog_df['subject_id'].isnull()]

# Filter out faulty images
photos_upload = photos_upload[~photos_upload['zip_path'].isin(faulty_images)]

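# Select up to n_photos photos at random to upload to Zooniverse
# (sampling more rows than are available would raise an error)
photos_upload = photos_upload.sample(min(n_photos, len(photos_upload)))
# Start with an empty (object) column so file paths can be filled in later
photos_upload["photo_path"] = None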

for zip_file in zips:
  # Get a list of all archived file names from the zip
  listOfFileNames = zip_file.namelist()
  # Iterate over the file names
  for fileName in listOfFileNames:
      # Check whether the file is one of the photos selected for upload
      if fileName in photos_upload['zip_path'].values:
          # Extract a single file from zip
          zip_file.extract(fileName, tmp_dir) 
          # Include the colab path of the photo in the df
          photos_upload.loc[photos_upload['zip_path'].eq(fileName),'photo_path'] = tmp_dir + fileName
                                
print(len(photos_upload.index), "photos have been temporarily downloaded to", tmp_dir)

Check that the metadata makes sense before uploading the photos

In [None]:
from google.colab import data_table
data_table.DataTable(photos_upload)

## Upload photos to Zooniverse

In [None]:
# Select the photo_path and other columns that will be used as metadata
photos_upload = photos_upload[
                            [
                             "photo_path",
                             "filename",
                             "capture" ,
                             "frog_id",
                             "grid",
                             ]
                            ]
        
# Save the df as the subject metadata
subject_metadata = photos_upload.set_index('photo_path').to_dict('index')

# Create a subject set in Zooniverse to host the photos
subject_set = SubjectSet()

subject_set.links.project = project
subject_set.display_name = "training_1000" + date.today().strftime("_%d_%m_%Y")

subject_set.save()

print("Zooniverse subject set created")


# Upload the photos to Zooniverse (with metadata)
new_subjects = []

for photo_path, metadata in subject_metadata.items():
    subject = Subject()

    subject.links.project = project
    subject.add_location(photo_path)

    subject.metadata.update(metadata)

    subject.save()
    new_subjects.append(subject)

# Add the new subjects to the subject set
subject_set.add(new_subjects)

print("Subjects uploaded to Zooniverse")
