# Simple Aggregation of Zwicky's Quirky Transients Project

Using: Python 3

This notebook demonstrates a simple aggregation of the initial workflow of the ZTF Zooniverse project, Zwicky's Quirky Transients.

In [1]:
from aggregate_ztf import aggregate_ztf

project_name = 'zwickys-quirky-transients'
# the input file
class_file             = project_name + '-classifications.csv'
# the output file
aggregated_file        = project_name + '-aggregated.csv'

# the workflow version is a string because it's actually 2 integers separated by a "."
# (major).(minor)
# i.e. 6.1 is NOT the same version as 6.10
# though note we only care about the major version, so this is just being careful
workflow_id      = 8368
workflow_version = "6.16"

The settings above are the only things you actually *need* to do the aggregation. You can also specify, for example, a file in which to save classification counts by user (useful if you want to make a Tree diagram, and also useful if you are weighting users and want to record the classifier weights somewhere).
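
The comment about workflow versions in the cell above is easy to trip over, so here is a minimal sketch of why the version is kept as a string. The helper `split_workflow_version` is hypothetical and is not part of `aggregate_ztf`; it only illustrates how a "(major).(minor)" string could be split.

```python
def split_workflow_version(version):
    """Split a '(major).(minor)' version string into two integers.

    Hypothetical helper for illustration only; not part of aggregate_ztf.
    """
    major_str, minor_str = version.split(".")
    return int(major_str), int(minor_str)


# As floats, 6.1 and 6.10 cannot be told apart ...
floats_equal = float("6.1") == float("6.10")

# ... but as (major, minor) pairs they are clearly different versions.
v_a = split_workflow_version("6.1")
v_b = split_workflow_version("6.10")
v_used = split_workflow_version("6.16")

print(floats_equal)  # True
print(v_a, v_b)      # (6, 1) (6, 10)
print(v_used[0])     # 6, the major version
```

Since only the major version matters for this aggregation, any version string starting with `6.` would presumably select the same classifications.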

Once you've done this, all that's left is to run the aggregation.

In [2]:
class_agg = aggregate_ztf(classfile_in=class_file, wfid=workflow_id, wfv=workflow_version, 
              outfile=aggregated_file, counts_out=False, verbose=True)

Reading classifications from zwickys-quirky-transients-classifications.csv ...
... 35442 classifications selected from workflow 8368, version 6.
Getting subject info...
35442 classifications from 593 users, 340 registered and 253 unregistered.

Mean n_class per user 59.8, median 12.0.
Classification leaderboard:
user_name
770120179                             2568
mitch                                 1903
graham_d                              1610
morganlf                              1450
LeusaneLordelo                        1397
ElisabethB                            1086
bgreiner                              1073
jiipee                                 709
Quivira1541                            544
planetaryscience                       525
nilium                                 500
not-logged-in-a8bc9af51de9e7ff2707     444
eimueller                              390
not-logged-in-fe04f474585e216d2a9e     366
LisaV                                  346
jeffrodabro                    

With `verbose` turned on, the program prints a leaderboard (sharing it publicly is not recommended; it's just for your own information), the project's Gini coefficient, and various basic project statistics. With about 35,000 classifications to aggregate, the whole process takes a few seconds.
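
For readers unfamiliar with the Gini coefficient, the sketch below computes it from a list of per-user classification counts using one standard rank-based formula. This notebook does not show how `aggregate_ztf` computes the value, so the function `gini` and the toy counts here are assumptions for illustration, not the program's own code.

```python
def gini(counts):
    """Gini coefficient of non-negative counts (0.0 means perfectly even)."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending-sorted values (ranks start at 1).
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Toy data: four users contributing equally, then one user doing everything.
even = gini([10, 10, 10, 10])
skewed = gini([0, 0, 0, 40])

print(even)    # 0.0
print(skewed)  # 0.75
```

A value near 0 means the classifications are spread evenly across users, while a value near 1 means a few users contributed most of them.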

Note that the program assumes the aggregation is somewhat time-sensitive, so you may be aggregating project data before all the subjects are fully classified. In addition to the full aggregation file, it therefore saves a subset of subjects that have collected enough classifications to make the vote fractions useful. The default minimum is 5 classifications, but you can change this at the prompt/program input.
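
To make the idea of vote fractions and the minimum-classification cut concrete, here is a small standard-library sketch. The input layout (a list of `(subject_id, answer)` pairs), the made-up data, and the name `MIN_CLASS` are all assumptions for illustration; only the output column names are taken from the aggregated table in this notebook.

```python
from collections import Counter, defaultdict

# Toy classifications as (subject_id, answer) pairs; this is made-up data.
classifications = [
    (1, "Real"), (1, "Real"), (1, "Bogus"), (1, "Real"), (1, "Skip"),
    (2, "Bogus"), (2, "Real"),
]

MIN_CLASS = 5  # hypothetical name; mirrors the default of 5 described above

# Tally the answers given for each subject.
votes = defaultdict(Counter)
for subject_id, answer in classifications:
    votes[subject_id][answer] += 1

# Convert the tallies into unweighted vote fractions.
aggregated = {}
for subject_id, tally in votes.items():
    n = sum(tally.values())
    aggregated[subject_id] = {
        "p_Real": tally["Real"] / n,
        "p_Bogus": tally["Bogus"] / n,
        "p_Skip": tally["Skip"] / n,
        "count_unweighted": n,
    }

# Keep only subjects with enough classifications for useful fractions.
well_classified = {
    sid: row for sid, row in aggregated.items()
    if row["count_unweighted"] >= MIN_CLASS
}

print(sorted(well_classified))  # [1]
```

Subject 2 has only two classifications, so its vote fractions are still recorded in `aggregated` but it is left out of the filtered subset.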

The function returns the aggregated dataframe, with all rows and columns, ranked by `p_Real` (or the weighted version, if you have weighted the classifications), with ties broken by the classification count. Let's look at the subset with at least 5 classifications.

In [4]:
class_agg[class_agg.count_unweighted >= 5].head(10)

| subject_ids | p_Real | p_Bogus | p_Skip | count_unweighted | subject_filename | link |
|---|---|---|---|---|---|---|
| 29060993 | 1.0 | 0.0 | 0.0 | 10 | zoo722318781915015037.png | https://www.zooniverse.org/projects/rswcit/zwi... |
| 29007754 | 1.0 | 0.0 | 0.0 | 9 | zoo712284970915015022.png | https://www.zooniverse.org/projects/rswcit/zwi... |
| 29055894 | 1.0 | 0.0 | 0.0 | 9 | zoo729120633415015025.png | https://www.zooniverse.org/projects/rswcit/zwi... |
| 29059069 | 1.0 | 0.0 | 0.0 | 9 | zoo728133944615015011.png | https://www.zooniverse.org/projects/rswcit/zwi... |
| 29007772 | 1.0 | 0.0 | 0.0 | 8 | zoo712286850215015015.png | https://www.zooniverse.org/projects/rswcit/zwi... |
| 29008427 | 1.0 | 0.0 | 0.0 | 8 | zoo717126822915015034.png | https://www.zooniverse.org/projects/rswcit/zwi... |
| 29008655 | 1.0 | 0.0 | 0.0 | 8 | zoo701542024115015007.png | https://www.zooniverse.org/projects/rswcit/zwi... |
| 29010295 | 1.0 | 0.0 | 0.0 | 8 | zoo726484051915015023.png | https://www.zooniverse.org/projects/rswcit/zwi... |
| 29010723 | 1.0 | 0.0 | 0.0 | 8 | zoo719426335515015202.png | https://www.zooniverse.org/projects/rswcit/zwi... |
| 29010820 | 1.0 | 0.0 | 0.0 | 8 | zoo717114622115015084.png | https://www.zooniverse.org/projects/rswcit/zwi... |
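
The ordering visible in the output above (highest `p_Real` first, then highest classification count among ties) can be sketched with a plain two-key sort. The rows and subject IDs below are made up, and this is only an illustration of the ranking rule, not the code `aggregate_ztf` uses.

```python
# Toy rows using the same column names as the aggregated table; made-up data.
rows = [
    {"subject_id": "a", "p_Real": 1.0, "count_unweighted": 8},
    {"subject_id": "b", "p_Real": 0.9, "count_unweighted": 12},
    {"subject_id": "c", "p_Real": 1.0, "count_unweighted": 10},
]

# Sort by p_Real first, then by count, both descending.
ranked = sorted(
    rows,
    key=lambda r: (r["p_Real"], r["count_unweighted"]),
    reverse=True,
)
ranked_ids = [r["subject_id"] for r in ranked]

print(ranked_ids)  # ['c', 'a', 'b']
```

On a pandas dataframe, the equivalent would be something like `df.sort_values(["p_Real", "count_unweighted"], ascending=False)`.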


The `subject_filename` is extracted from the subject metadata (the information from the original manifest). The `link` column exists so that, should you load this CSV file in e.g. Google Drive, you can click that column to visit the subject page on Zooniverse Talk.