Analysis pipeline for Radio Galaxy Zoo.
Requires:
- MongoDB
- Python 2.7.x
To set up the pipeline, you also need a download of the raw RGZ database. This is natively stored on the Amazon S3 servers. Daily dumps can be obtained via an emailed link; contact the Zooniverse team at Adler/Oxford for access. As of early 2016, the database (the raw version is a set of JSON files generated by mongoexport) is ~250 MB zipped and ~6 GB when imported into Mongo. The standard versions used by science team members are "sanitized" to make sure identifying information on users is kept secure.
Once the database is downloaded, run the pipeline as follows:
- run a mongod session on your local machine
- run mongoimport on all three JSON collection files from RGZ. Example:

      mongoimport --db radio --drop --collection radio_subjects sanitized_radio_2016-01-01/radio_subjects.json
      mongoimport --db radio --drop --collection radio_classifications sanitized_radio_2016-01-01/radio_classifications.json
      mongoimport --db radio --drop --collection radio_groups sanitized_radio_2016-01-01/radio_groups.json

- execute make_rgz_catalog.sh
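
The three import commands differ only in the collection name, so they can be generated rather than typed by hand. The following is a minimal sketch, not part of the pipeline itself: the helper names build_import_commands and import_dump are hypothetical, and it assumes that each JSON file is named after its collection (as in the example above) and that mongoimport is on your PATH with a mongod session already running.

```python
import os
import subprocess

# Collection names taken from the example mongoimport commands above.
COLLECTIONS = ("radio_subjects", "radio_classifications", "radio_groups")


def build_import_commands(dump_dir, db="radio"):
    """Return one mongoimport argument list per RGZ collection.

    Hypothetical helper: assumes each file is <dump_dir>/<collection>.json.
    """
    commands = []
    for name in COLLECTIONS:
        json_path = os.path.join(dump_dir, name + ".json")
        commands.append([
            "mongoimport", "--db", db, "--drop",
            "--collection", name, json_path,
        ])
    return commands


def import_dump(dump_dir, db="radio"):
    """Run the imports in order; raises CalledProcessError on the first failure."""
    for cmd in build_import_commands(dump_dir, db):
        subprocess.check_call(cmd)
```

With a dump unpacked in the working directory, calling import_dump("sanitized_radio_2016-01-01") would issue the same three commands as the example. Building the argument lists separately from running them makes it easy to print and inspect the commands before touching the database.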