<img align="left" src="https://panoptes-uploads.zooniverse.org/project_avatar/86c23ca7-bbaa-4e84-8d8a-876819551431.png" type="image/png" height=100 width=100>
</img>


<h1 align="right">KSO Tutorials #12: Analyse Zooniverse classifications</h1>
<h3 align="right">Written by @jannesgg and @vykanton</h3>
<h5 align="right">Last updated: Sept 15th, 2021</h5>

# Set up and requirements

### Import Python packages

In [1]:
# Set the directory of the libraries
import sys
sys.path.append('..')

# Set to display dataframes as interactive tables
from itables import init_notebook_mode
init_notebook_mode(all_interactive=True)

# Import required modules
import utils.t12_utils as t12
from utils.zooniverse_utils import retrieve_zoo_info, populate_subjects, populate_annotations
import getpass

print("Packages loaded successfully")

Packages loaded successfully


### Initiate SQL database and populate sites, movies and species

In [2]:
# Specify the path of the movies 
movies_path = "/uploads"

# Specify the path of the SQL database
db_path = "koster_lab.db"

# Initiate the SQL database 
%run -i "../db_starter/starter.py" --movies_path $movies_path --db_path $db_path

There is no folder with initial information about the sites, movies and species.
 Please enter the ID of a Google Drive zipped folder with the inital database information. 
 For example, the ID of the template information is: 1PZGRoSY_UpyLfMhRphMUMwDXw4yx1_Fn
ID of Google Drive zipped folder········
Retrieving the file from  https://drive.google.com/uc?id=11XQIKdwX9wHrzOcrDlRrs9Oq5uA2o95k


Downloading...
From: https://drive.google.com/uc?id=11XQIKdwX9wHrzOcrDlRrs9Oq5uA2o95k
To: C:\Users\Victor\koster_data_management\tutorials\db_csv_info.zip
100%|██████████| 15.0k/15.0k [00:00<00:00, 3.27MB/s]


Updated sites
Updated movies
Updated species
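As a quick sanity check after this step, the tables in the database can be listed. The sketch below assumes the database is a SQLite file. It uses an in-memory database with hypothetical `sites`, `movies` and `species` tables as a stand-in; the real schema is created by `starter.py` and may differ.

```python
import sqlite3

def list_tables(conn):
    """Return the names of all tables in a SQLite database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]

# Stand-in database (assumption): three empty tables named after the log messages above
conn = sqlite3.connect(":memory:")
for table in ("sites", "movies", "species"):
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY)")

tables = list_tables(conn)
print(tables)
```

To inspect the real database, connecting with `sqlite3.connect(db_path)` instead of `":memory:"` should work, provided the file is indeed SQLite.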


### Retrieve Zooniverse information

In [3]:
# Save your user name, password and Zooniverse project number.
zoo_user = getpass.getpass('Enter your Zooniverse user')
zoo_pass = getpass.getpass('Enter your Zooniverse password')
project_n = getpass.getpass('Enter the number of the Zooniverse project')

Enter your Zooniverse user········
Enter your Zooniverse password········
Enter the number of the Zooniverse project········


In [4]:
# Specify the Zooniverse information required throughout the tutorial
zoo_info = ["subjects", "workflows", "classifications"]

# Retrieve and store the Zooniverse information required throughout the tutorial in a dictionary
zoo_info_dict = retrieve_zoo_info(zoo_user, zoo_pass, project_n, zoo_info)

Connecting to the Zooniverse project
Retrieving subjects from Zooniverse
subjects were retrieved successfully
Retrieving workflows from Zooniverse
workflows were retrieved successfully
Retrieving classifications from Zooniverse
classifications were retrieved successfully


In [5]:
# Populate the SQL database with the subjects already uploaded to Zooniverse
subjects = populate_subjects(zoo_info_dict["subjects"], project_n, db_path)

Updated subjects
The database has a total of 2342 frame subjects and 7362 clip subjects have been updated


### Step 1: Specify the Zooniverse workflow id and version of interest

**Note:** A manual export in Zooniverse is required to get the most up-to-date classifications here.

Make sure your workflows in Zooniverse have different names. The workflow id is looked up by name, so duplicate names can select the wrong workflow.
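One way to check for clashing workflow names before selecting one is to count the distinct workflow ids behind each name. This is a minimal sketch on a hypothetical table; the column names `workflow_id` and `display_name` mirror those used in this tutorial, but the values are made up.

```python
import pandas as pd

# Hypothetical workflows table (illustrative values only)
example_workflows = pd.DataFrame({
    "workflow_id": [101, 101, 202, 303],
    "display_name": ["Species identification", "Species identification", "test", "test"],
})

# Count the distinct workflow ids behind each display name
ids_per_name = example_workflows.groupby("display_name")["workflow_id"].nunique()

# Names that map to more than one workflow id are ambiguous
ambiguous_names = ids_per_name[ids_per_name > 1].index.tolist()
print(ambiguous_names)
```

Here "Species identification" appears twice with the same id (two versions of one workflow), which is harmless, while "test" points to two different workflows and would need renaming.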

In [6]:
# Display a selectable list of workflow names and a list of versions of the workflow of interest
workflows_df = zoo_info_dict["workflows"]
workflow_name, workflow_version, subj_type = t12.choose_workflows(workflows_df)

Dropdown(description='Workflow name:', options=('Species identification', 'test ', 'Species location (intermed…

Dropdown(description='Minimum workflow version:', options=(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, …

Dropdown(description='Subject type:', index=1, options=('frame', 'clip'), value='clip')

In [7]:
# Select the workflow id based on the workflow name
workflow_id = workflows_df[workflows_df.display_name==workflow_name.value].workflow_id.unique()[0]

# Retrieve classifications from the workflow of interest
total_df, class_df = t12.get_classifications(workflow_id,
                                             workflow_version.value, 
                                             subj_type.value, 
                                             zoo_info_dict["classifications"], 
                                             zoo_info_dict["subjects"])

Zooniverse classifications have been retrieved
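The internals of `t12.get_classifications` are not shown in this tutorial. As a rough illustration of what selecting a workflow and a minimum version means, the sketch below filters a hypothetical classifications table; the function name, column names and values are assumptions made for the example.

```python
import pandas as pd

# Hypothetical classifications table (illustrative values only)
example_classifications = pd.DataFrame({
    "classification_id": [1, 2, 3, 4],
    "workflow_id": [101, 101, 101, 202],
    "workflow_version": [1.0, 5.0, 9.0, 9.0],
})

def filter_classifications(df, workflow_id, min_version):
    """Keep rows from one workflow at or above a minimum version."""
    mask = (df["workflow_id"] == workflow_id) & (df["workflow_version"] >= min_version)
    return df[mask].reset_index(drop=True)

filtered = filter_classifications(example_classifications, 101, 5.0)
print(filtered["classification_id"].tolist())
```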


### Step 2: Aggregate classifications received on the workflow of interest
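Aggregation in this step is controlled by an agreement threshold and a minimum number of users. The sketch below shows one plausible reading of those two parameters: a label is kept for a subject only if enough users classified the subject and a large enough fraction of them chose that label. It is not the implementation in `t12`; the function name, the column names (`subject_id`, `user_name`, `label`) and the data are assumptions for illustration.

```python
import pandas as pd

def aggregate_by_agreement(class_df, agg_threshold=0.8, min_users=3):
    """Keep (subject, label) pairs that meet the agreement and user-count thresholds."""
    # Number of distinct users who classified each subject
    n_users = class_df.groupby("subject_id")["user_name"].nunique().rename("n_users")
    # Number of distinct users who chose each label for each subject
    votes = (
        class_df.groupby(["subject_id", "label"])["user_name"]
        .nunique()
        .rename("votes")
        .reset_index()
    )
    votes = votes.merge(n_users.reset_index(), on="subject_id")
    # Fraction of the subject's users who agreed on this label
    votes["agreement"] = votes["votes"] / votes["n_users"]
    keep = (votes["n_users"] >= min_users) & (votes["agreement"] >= agg_threshold)
    return votes[keep].reset_index(drop=True)

# Hypothetical classifications: subject 1 has strong agreement,
# subject 2 has too few users, subject 3 is split evenly
example_df = pd.DataFrame({
    "subject_id": [1] * 5 + [2] * 2 + [3] * 4,
    "user_name": list("abcde") + list("ab") + list("abcd"),
    "label": ["cod"] * 4 + ["crab"] + ["crab"] * 2 + ["cod", "cod", "crab", "crab"],
})

result = aggregate_by_agreement(example_df)
print(result)
```

Only subject 1 with label "cod" survives: four of its five users agree, which meets the 0.8 threshold.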

In [8]:
# Specify the agreement threshold required among citizen scientists
agg_params = t12.choose_agg_parameters(subj_type.value)

FloatSlider(value=0.8, continuous_update=False, description='Aggregation threshold:', max=1.0, readout_format=…

IntSlider(value=3, continuous_update=False, description='Min numbers of users:', max=20, min=1, style=SliderSt…

FloatSlider(value=0.8, continuous_update=False, description='Object threshold:', max=1.0, readout_format='.1f'…

FloatSlider(value=0.5, continuous_update=False, description='IOU Epsilon:', max=1.0, readout_format='.1f', sty…

FloatSlider(value=0.8, continuous_update=False, description='Inter user agreement:', max=1.0, readout_format='…
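The "IOU Epsilon" slider presumably refers to intersection over union (IoU), a standard measure of how much two bounding boxes overlap. How `t12` applies it is not shown in this tutorial, so the sketch below only illustrates the measure itself. It assumes boxes are given as `(x, y, width, height)` tuples, which is a choice made for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping region (zero if the boxes do not overlap)
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0

# Two 10x10 boxes shifted by 5 pixels horizontally
overlap = iou((0, 0, 10, 10), (5, 0, 10, 10))
print(round(overlap, 3))
```

Boxes drawn by different users around the same animal would score close to 1, while boxes around different animals would score close to 0.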

In [10]:
agg_class_df = t12.aggregrate_classifications(class_df, subj_type.value, subjects, agg_params)

Aggregrating the classifications


AttributeError: 'DataFrame' object has no attribute 'start_frame'

In [None]:
# Populate the SQL database with the aggregated annotations
populate_annotations(subjects, agg_class_df, db_path)

### Step 3: Summarise the number of classifications based on the agreement specified

In [None]:
agg_class_df.groupby("label")["subject_id"].agg("count")
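To illustrate what this summary computes, the same `groupby` call is applied below to a small hypothetical dataframe. The labels and subject ids are made up; only the column names match the cell above.

```python
import pandas as pd

# Hypothetical aggregated classifications (illustrative values only)
example_agg = pd.DataFrame({
    "subject_id": [1, 2, 3, 4],
    "label": ["cod", "cod", "crab", "empty"],
})

# Number of subjects assigned to each label
label_counts = example_agg.groupby("label")["subject_id"].agg("count")
print(label_counts.to_dict())
```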

### Step 4: Display the aggregated classifications in a table

In [None]:
# Display the dataframe as a table
agg_class_df

### Step 5: Use the subject explorer widget to get more information about specific subjects and their classifications

In [None]:
# Get all classified subjects from specified workflows
subject_df = t12.process_clips(class_df) if subj_type.value == "clip" else t12.process_frames(class_df)

t12.launch_viewer(total_df, subject_df)

In [None]:
# END