Analysis and visualization code for the Natural History of Song project (Mehr et al., 2019, Science)

Natural History of Song

This repo contains code for the Natural History of Song project (Mehr et al., 2019, Science). The Data and Materials Availability statement from the paper is copied below. To reproduce our analyses, you will need some or all of these files.

All Natural History of Song data and materials are publicly archived at, with the exception of the full audio recordings in the NHS Discography, which are available via the Harvard Dataverse, at

All analysis scripts are available at

Human Relations Area Files data and the eHRAF World Cultures database are available via licensing agreement at; the document- and paragraph-wise word histograms from the Probability Sample File were provided by the Human Relations Area Files under a Data Use Agreement.

The Global Summary of the Year corpus is maintained by the National Oceanic and Atmospheric Administration, United States Department of Commerce, and is publicly available at

If you are replicating analyses that use the eHRAF Probability Sample File data, you will need to build the .rds files using the code in script2_compare_psf_final.R. If you run into issues, please contact us.


All analyses in the paper can be reproduced with the code posted here, in R and Python. The visualization pipeline takes CSV output from R, processes it in Stata, and then produces the visualizations in R. Some figure elements were augmented manually (e.g., added labels) and/or include illustrations, so your reproduced figures will not match those in the paper exactly.
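The three-stage handoff described above (analysis writes CSV, a second tool processes it, the plotting step reads the result) can be sketched as follows. This is only an illustration of the CSV-handoff pattern, written in Python for brevity; it is not the repo's R or Stata code. The file names, the column names (`song_type`, `score`, `mean_score`), the per-group mean used as the "processing" step, and the toy values are all hypothetical assumptions.

```python
import csv
import os
import tempfile
from collections import defaultdict
from statistics import mean


def export_results(rows, path):
    """Stage 1 (hypothetical stand-in for the R analysis step): write results to CSV."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["song_type", "score"])
        writer.writeheader()
        writer.writerows(rows)


def summarize(in_path, out_path):
    """Stage 2 (hypothetical stand-in for the Stata step): mean score per song type."""
    groups = defaultdict(list)
    with open(in_path, newline="") as f:
        for row in csv.DictReader(f):
            groups[row["song_type"]].append(float(row["score"]))
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["song_type", "mean_score"])
        for song_type in sorted(groups):
            writer.writerow([song_type, mean(groups[song_type])])


def load_for_plotting(path):
    """Stage 3 (hypothetical stand-in for the R plotting step): read the processed CSV."""
    with open(path, newline="") as f:
        return {row["song_type"]: float(row["mean_score"]) for row in csv.DictReader(f)}


def run_pipeline(rows):
    """Run all three stages, passing data between them only through CSV files."""
    with tempfile.TemporaryDirectory() as tmp:
        raw_path = os.path.join(tmp, "raw.csv")          # hypothetical file name
        summary_path = os.path.join(tmp, "summary.csv")  # hypothetical file name
        export_results(rows, raw_path)
        summarize(raw_path, summary_path)
        return load_for_plotting(summary_path)


if __name__ == "__main__":
    # Toy values for illustration only; these are not NHS data.
    toy_rows = [
        {"song_type": "dance", "score": 1.0},
        {"song_type": "dance", "score": 3.0},
        {"song_type": "lullaby", "score": -2.0},
    ]
    print(run_pipeline(toy_rows))
```

Because each stage communicates only through files on disk, any one stage can be rerun or swapped for another tool without touching the others. That is the practical benefit of a CSV handoff between R and Stata.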
