<a href="https://colab.research.google.com/github/wildlifeai/pepeketua_zooniverse/blob/main/zooniverse_classifications.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook downloads the landmark annotations of Archey's frogs from Zooniverse and reshapes them so they can be used to train ML algorithms.

# Requirements

### Install required packages

We use the "panoptes_client" package to communicate with Zooniverse. If you don't have it installed, run the command below.

In [None]:
!pip install panoptes_client

### Load required libraries

Load generic libraries

In [None]:
import io
import zipfile
import json
import getpass
import pandas as pd
import numpy as np

from google.colab import drive
from datetime import date
from panoptes_client import (
    SubjectSet,
    Subject,
    Project,
    Panoptes,
) 

### Connect to Zooniverse

You need to specify your Zooniverse username and password. Uploading information to and downloading it from Zooniverse is only available to users with access to the project.

In [None]:
# Your user name and password for Zooniverse. 
zoo_user = getpass.getpass('Enter your Zooniverse user')
zoo_pass = getpass.getpass('Enter your Zooniverse password')


# Connect to Zooniverse with your username and password
auth = Panoptes.connect(username=zoo_user, password=zoo_pass)

if not auth.logged_in:
    raise RuntimeError("Your credentials are invalid. Please try again.")

# Connect to the Zooniverse project (our frog project # is 13355)
project = Project(13355)

# Download Zooniverse annotations

In [None]:
# Get the export classifications
export = project.get_export("classifications")

# Save the response as a pandas data frame
classifications = pd.read_csv(
    io.StringIO(export.content.decode("utf-8")),
    usecols=[
        "user_name",
        "subject_ids",
        "subject_data",
        "classification_id",
        # "workflow_id",
        # "workflow_version",
        "annotations",
    ],
)
# Convert JSON strings into Python dictionaries, providing access to key-value pairs.
classifications['annotations'] = [json.loads(q) for q in classifications.annotations]

# Flatten annotations: one row per landmark marked in task T0
x = []
y = []
label = []
classification_id = []

for _, row in classifications.iterrows():
    class_id = row['classification_id']

    for task in row['annotations']:
        # Select task T0
        if task['task'] == 'T0':
            if len(task['value']) > 0:
                for mark in task['value']:
                    x.append(mark['x'])
                    y.append(mark['y'])
                    label.append(mark['tool_label'])
                    classification_id.append(class_id)
            else:
                # No landmarks were marked: keep the classification with
                # missing coordinates so that x and y stay numeric
                x.append(np.nan)
                y.append(np.nan)
                label.append('')
                classification_id.append(class_id)

# Combine all the annotations into a data frame
annotations = pd.DataFrame(
    {'x': x, 'y': y, 'label': label, 'classification_id': classification_id}
)

# Drop the raw annotations column from the original df (it is now flattened)
classifications = classifications.drop(columns=["annotations"])

# Add metadata information based on the classification id
flat_anotations = pd.merge(annotations, classifications, 
                           how="left", on=["classification_id"])

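To make the shape of the flattened table concrete, here is a minimal, self-contained sketch of the same flattening step applied to two made-up classifications. The landmark names ("Left eye", "Right eye"), the coordinates, and the helper `flatten_t0` are hypothetical and only illustrate the structure that the code above assumes for task T0 (a list of marks with `x`, `y`, and `tool_label`).

```python
import json
import pandas as pd

# Hypothetical export rows: "annotations" is a JSON string, as in the CSV export
raw = pd.DataFrame({
    "classification_id": [1, 2],
    "annotations": [
        json.dumps([{"task": "T0", "value": [
            {"x": 10.5, "y": 20.25, "tool_label": "Left eye"},
            {"x": 30.0, "y": 40.0, "tool_label": "Right eye"},
        ]}]),
        # A classification where the user marked no landmarks
        json.dumps([{"task": "T0", "value": []}]),
    ],
})


def flatten_t0(df):
    """Return one row per T0 mark; empty classifications keep one NaN row."""
    empty_mark = {"x": float("nan"), "y": float("nan"), "tool_label": ""}
    records = []
    for class_id, tasks in zip(df["classification_id"], df["annotations"]):
        for task in json.loads(tasks):
            if task["task"] != "T0":
                continue
            # Fall back to a single placeholder mark when the list is empty
            for mark in task["value"] or [empty_mark]:
                records.append({
                    "x": mark["x"],
                    "y": mark["y"],
                    "label": mark["tool_label"],
                    "classification_id": class_id,
                })
    return pd.DataFrame(records, columns=["x", "y", "label", "classification_id"])


flat_example = flatten_t0(raw)
print(flat_example)
```

Each mark becomes its own row, and the classification without marks is kept as a single row with missing coordinates, so it can still be merged with the classification metadata.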

Display the data frame with the flattened annotations as an interactive table.

In [None]:
from google.colab import data_table
data_table.DataTable(flat_anotations)

## Analyse classifications

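Compare how the landmark positions differ between three different users. The cell below keeps the landmarks that were annotated more than once for the same subject and lists their coordinates by user.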

In [None]:
duplicated_annotations = flat_anotations.groupby(['subject_ids','label']).filter(lambda x: len(x) > 1)

duplicated_annotations.sort_values(by=['label','subject_ids'])[['x','y','label','user_name','subject_ids']].round({'x': 1, 'y': 1})

#duplicated_annotations.groupby(['subject_ids','label']).agg({'x':['max','min'],'y':['max','min']})

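One possible way to turn this side-by-side listing into a number is to summarise, for each subject and landmark, how far apart the users' marks are. The sketch below is an assumption about what such a summary could look like, not part of the original analysis: the data, the landmark name, the user names, and the helper `landmark_spread` are all made up, and the spread is measured as the diagonal of the bounding box around the marks, in the same units as `x` and `y`.

```python
import numpy as np
import pandas as pd

# Hypothetical flattened annotations: two subjects, one landmark, several users
example = pd.DataFrame({
    "subject_ids": [101, 101, 101, 102, 102],
    "label": ["Left eye"] * 5,
    "user_name": ["user_a", "user_b", "user_c", "user_a", "user_b"],
    "x": [10.0, 13.0, 10.0, 50.0, 50.0],
    "y": [20.0, 24.0, 20.0, 60.0, 60.0],
})


def landmark_spread(df):
    """Summarise how far apart users placed each landmark on each subject."""
    summary = df.groupby(["subject_ids", "label"]).agg(
        n_users=("user_name", "nunique"),
        x_median=("x", "median"),
        y_median=("y", "median"),
        x_range=("x", lambda s: s.max() - s.min()),
        y_range=("y", lambda s: s.max() - s.min()),
    )
    # Diagonal of the bounding box around all marks for that landmark;
    # this is an upper bound on the distance between any two marks
    summary["bbox_diagonal"] = np.hypot(summary["x_range"], summary["y_range"])
    return summary.reset_index()


spread = landmark_spread(example)
print(spread)
```

A small `bbox_diagonal` suggests the users agree on where the landmark is, while a large value flags subjects worth inspecting by eye. The per-landmark medians could also serve as a simple consensus position, although whether that is appropriate depends on how many users annotated each subject.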