<a href="https://colab.research.google.com/github/wildlifeai/pepeketua_zooniverse/blob/main/frog_zooniverse.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook contains the scripts to upload photos of Archey's frogs to a Zooniverse project and download labels of the landmarks of the frogs to train ML algorithms.

#Requirements

We use the "panoptes_client" package to communicate with Zooniverse. If you don't have it installed, run the command below.

In [None]:
!pip install panoptes_client

Load generic libraries

In [2]:
import zipfile
import pandas as pd
import numpy as np

from google.colab import drive
from datetime import date
from panoptes_client import (
    SubjectSet,
    Subject,
    Project,
    Panoptes,
) 

Broken libmagic installation detected. The python-magic module is installed but can't be imported. Please check that both python-magic and the libmagic shared library are installed correctly. Uploading media other than images may not work.


# Download frog photos

###Add shortcuts to the compressed photos

To download the photos of the frogs into this Google Colab you first need to add shortcuts in your Google drive to the [five zipped folders](https://drive.google.com/file/d/1XXSrATFX1l-J0CUE4m6UfoOBp9zv3XOr/view?usp=sharing) with the photos. 

To add the shortcuts:
* go to the "Shared with me" section in your Google drive,
* find the five zipped folders,
* click on "Add shorcut to Drive" and
* save the shortcuts (we created a folder called "frog_photos" and saved them there).

*Specify* the folder in your Google drive where you saved the shortcuts to the photos (in our case "frog_photos").

In [3]:
dir_shortcuts = "/content/drive/My Drive/frog_photos/"

*If you can't access the five zipped folders please [email Victor](victor@wildlife.ai). 

###Download the compressed photos

To download the five zip folders with the photos you will need to grant access to the Google file stream. 



In [23]:
#Mount the drive in colab
drive.mount('/content/drive/')

#Load the five zipped files
whareorino_a = zipfile.ZipFile(dir_shortcuts + "whareorino_a.zip", 'r')
whareorino_b = zipfile.ZipFile(dir_shortcuts + "whareorino_b.zip", 'r')
whareorino_c = zipfile.ZipFile(dir_shortcuts + "whareorino_c.zip", 'r')
whareorino_d = zipfile.ZipFile(dir_shortcuts + "whareorino_d.zip", 'r')
pukeokahu = zipfile.ZipFile(dir_shortcuts + "pukeokahu.zip", 'r')

# Extract the filepath of the photos of individual frogs
zips = [whareorino_a, whareorino_b, whareorino_c, whareorino_d, pukeokahu]
pdList = []

for zip_file in zips:
  zip_pd = pd.DataFrame(
      [x for x in zip_file.namelist() if 'Individual Frogs' in x and not x.endswith(('.db','/','Store'))]
      )
  zip_pd["zipfile"]=zip_file
  pdList.append(zip_pd)

# Combine the file paths of the five grids into a single data frame
frog_df = pd.concat(pdList)

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).


#Create a data frame with frog information

Create a data frame to keep track of the photos uploaded to Zooniverse

###Prepare information related to the photos

In [24]:
#Rename the column
frog_df = frog_df.rename(columns={0: "zip_path"})

#Add new columns based on the directory and filename of the photos
directories = frog_df['zip_path'].str.split("/", n = 4, expand = True)

# making separate first name column from new data frame 
frog_df["grid"] = directories[0]
frog_df["frog_id"] = directories[2] 
frog_df["filename"] = directories[3] 

frog_df["capture"] = frog_df["filename"].str.split(".",1, expand = True)[0].str.replace('_', '-').str.rsplit("-",1, expand = True)[0] 
                                
frog_df["subject_id"] = np.nan #Uncomment if no subjects uploaded
#list(frog_df.columns)
#frog_df.iloc[1442]['filename']


In [None]:
frog_df

###Prepare information related to Zooniverse subjects

You need to specify your Zooniverse username and password. Uploading and downloading information from Zooniverse is only accessible to those user with access to the project.

In [26]:
zoo_user = "user"
zoo_pass = "pass"

In [None]:
# Connect to Zooniverse with your username and password
auth = Panoptes.connect(username=zoo_user, password=zoo_pass)

if not auth.logged_in:
    raise AuthenticationError("Your credentials are invalid. Please try again.")

# Connect to the Zooniverse project (our frog project # is 13355)
project = Project(13355)

# Get info of subjects uploaded to the project
export = project.get_export("subjects")

# Save the subjects info as pandas data frame
subjects_df = pd.read_csv(
    io.StringIO(export.content.decode("utf-8")),
    usecols=[
        "subject_id",
        "metadata",
    ],
)

# Reset index of df
subj_df = subjects_df.reset_index(drop=True).reset_index()

# Flatten the metadata from the uploaded subjects
meta_df = pd.json_normalize(subj_df.metadata.apply(json.loads))

# Drop metadata and index columns from original df
subj_df = subj_df.drop(columns=["metadata", "index",]).rename(
    columns={"id": "subject_id"}
)

# Add the subject_id of photos already uploaded to Zooniverse
frog_df = pd.merge(frog_df, subj_df, how="left", on="movie_filename")


all


PanoptesAPIException: ignored

#Upload new photos to Zooniverse

In [29]:
#Select n number of photos to upload to Zooniverse
photos_upload = frog_df[frog_df['subject_id'].isnull()].sample(n = 3)

photos_upload["zipfile"]

902     <zipfile.ZipFile filename='/content/drive/My D...
216     <zipfile.ZipFile filename='/content/drive/My D...
1325    <zipfile.ZipFile filename='/content/drive/My D...
Name: zipfile, dtype: object

In [13]:
#Select n number of photos to upload to Zooniverse
photos_upload = frog_df[frog_df['subject_id'].isnull()].sample(n = 3)

#Uncompress the photos that will be uploaded
zip_path

#Select the file_path and other columns that will be used as metadata
photos_upload = photos_upload[
                            [
                             "file_path",
                             "filename",
                             "capture" ,
                             "frog_id",
                             "grid",
                             ]
                            ]
        
# Save the df as the subject metadata
subject_metadata = photos_upload.set_index('file_path').to_dict('index')

# Create a subjet set in Zooniverse to host the photos
subject_set = SubjectSet()

subject_set.links.project = project
subject_set.display_name = "sample" + date.today().strftime("_%d_%m_%Y")

subject_set.save()

print("Zooniverse subject set created")


# Upload the photos to Zooniverse (with metadata)
new_subjects = []

for file_path, metadata in subject_metadata.items():
    subject = Subject()

    subject.links.project = project
    subject.add_location(file_path)

    subject.metadata.update(metadata)

    subject.save()
    new_subjects.append(subject)

# Upload frames
subject_set.add(new_subjects)

print("Subjects uploaded to Zooniverse")


Zooniverse subject set created


FileNotFoundError: ignored

#Download Zooniverse annotations

In [None]:
import os 
import pandas as pd

# Create a df of the photos found in the tmp folder
data = []
# Loop through each folder in the tmp directory
for grid in os.listdir('../tmp/'):
  if 'Grid' in grid:
    grid_path = '../tmp/' + grid
    # Loop through each subfolder in the 'Grid' directories
    for subfolder in os.listdir(grid_path):
      if 'Individual' in subfolder:
        subfolder_path = grid_path + "/" + subfolder
        # Loop through each individual frog in the "individual frog" directoy
        for ind in os.listdir(subfolder_path):
          if not ind.endswith('db'):
            ind_path = subfolder_path + "/" + ind
            # Loop through each photo of the "individual" frog
            for doc in os.listdir(ind_path):
              #Save information about the photo and the frog
              if not doc.endswith('db'):
                fpath = ind_path + "/" + doc
                capt = doc.split(".",1)[0].replace('_', '-').rsplit("-",1)[1]
                data.append((doc, fpath, capt, ind, grid))

df = pd.DataFrame(data,columns = ['filename', 'file_path', 'capture', 'frog_id', 'grid'])