Knowledge Curation in a Developer Community: A Study of Stack Overflow and Mailing Lists
This is the dataset that accompanies the following publications:
- “Knowledge Curation in a Developer Community: A Study of Stack Overflow and Mailing Lists” by Carlos Gómez Teshima. MSc. Thesis, University of Victoria, 2015 (view publication)
- “How the R Community Creates and Curates Knowledge: A Comparative Study of Stack Overflow and Mailing Lists” by Alexey Zagalsky, Carlos Gómez Teshima, Daniel M. German, Margaret-Anne Storey, Germán Poo-Caamaño. The 13th International Conference on Mining Software Repositories, 2016 (view publication)
Raw data used for the study
R-ML-and-StackOverflow-psql.xz-part05 This is a partitioned SQL dump (postgresql) of the raw data used for the study. The file was compressed using xz and split (due to github restrictions regarding file size). To re-create the database run the following: (using Linux)
cat R-ML-and-StackOverflow-psql.xz-part* | xzcat | pgsql template1
This will create a database called
MLandSOF Under Windows, first concatenate the files (in order, -part00 to -part05), then decompress. The decompressed file is a SQL script that can be read by pgsql.
Coded sample data
R-ML-and-StackOverflow-psql-sample.tar.xz This is an SQL database dump (postgresql) of the emails and Stackoverflow posts that were analyzed and coded as part of the study (referred to as the sample data). The file was compressed using xz. To create the database run the following: (using Linux)
cat R-ML-and-StackOverflow-psql-sample.tar.xz | xzcat > r_ml_so.dump
And then load the dump using pg_restore:
pg_restore -h localhost -p 5432 -U postgres -d sampledb -v "r_ml_so.dump"
LimeSurveyReport-surveyAnswers.pdf This file contains the survey answers. The report is a printed version of the online report generated by Lime Survey.