Modeling and Analysis of Tagging Networks in Stack Exchange Communities

This code and data repository accompanies the paper:

For questions, please email Shangdi at sy543@cornell.edu.

The code for analyzing Stack Exchange communities, as well as the code that generates the synthetic graphs for Section 4, is written in Python 3.

We used the following versions of external Python libraries:

Reproducing results and figures

To reproduce the results presented in the paper, run the Python files in the order below; each script writes its results to the files listed beneath it. Running everything for all 168 Stack Exchange networks takes hours on an ordinary computer. The scripts that calculate clustering coefficients take significantly longer than the rest.

python sx01cotagging_network.py

Generate: "./cotag_data/*_cotag.csv" for each stack-exchange community "./parsed/*.txt".

Each ".csv" file describes the cotagging network of the corresponding stack-exchange community.
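As a rough illustration of what a cotagging network is, the sketch below counts how often two tags appear on the same post. It is not the code in sx01cotagging_network.py: the input layout (one list of tags per post), the function name `build_cotag_edges`, and the choice of co-occurrence count as the edge weight are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def build_cotag_edges(posts):
    """Count, for every unordered tag pair, the posts on which both tags appear.

    `posts` is a hypothetical input format: an iterable of tag lists.
    """
    edges = Counter()
    for tags in posts:
        # Deduplicate and sort so each pair has a single canonical orientation.
        for a, b in combinations(sorted(set(tags)), 2):
            edges[(a, b)] += 1
    return edges


# Toy data invented for this example.
posts = [
    ["python", "pandas"],
    ["python", "numpy", "pandas"],
    ["numpy"],
]
edges = build_cotag_edges(posts)
for (a, b), weight in sorted(edges.items()):
    print(f"{a},{b},{weight}")
```

Posts with a single tag contribute no edges, so a tag can exist in a community without appearing in its cotagging network.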

python sx02number-of-tags-on-posts_data.py

Generate: "./results/tag_num.csv".

python sx03clustering_coefficient_data.py

Generate: "./results/clustering.csv" and "./results/clustering2.csv".
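For readers unfamiliar with the quantity, here is a minimal sketch of the standard local clustering coefficient on an unweighted graph. The definition actually used by sx03clustering_coefficient_data.py may differ; the adjacency-dictionary format and the function name `local_clustering` are assumptions for illustration only.

```python
from itertools import combinations


def local_clustering(adjacency, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        # With fewer than two neighbors there are no pairs to check.
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    return 2.0 * links / (k * (k - 1))


# Toy undirected graph invented for this example: a triangle a-b-c plus a leaf d.
adjacency = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
for node in sorted(adjacency):
    print(node, local_clustering(adjacency, node))
```

In this naive form every pair of neighbors is checked, so the work per node grows quadratically with its degree, which is one plausible reason this step is slow on large networks.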

python sx04statistics_data.py

Generate: "./results/stats.csv"

python sx05tag-frequency.py

Generate: "./results/tagfreq.csv" and "./results/patents_is_different.csv"

python sx06polynomial-fit_data.py

Generate: "./results/poly_params.csv" and "./results/param_PC.npy".

python sx07generative-model-theoretical-statistics.py

Generate: "./results/theory_cotag_u.csv", "./results/var_zero_post.csv", and "./files/theory_unique/*.csv" for each stack-exchange community "./parsed/*.txt".

python sx08generative-model.py

Generate: "./files/generated/gen_*.csv" and "./files/generated/cotag_files/gen_*_cotag.csv" for each stack-exchange community "./parsed/*.txt".

python sx09number-of-tags-on-posts_model.py

Generate: "./gen_results/gen_tag_num.csv".

python sx10clustering_coefficient_model.py

Generate: "./gen_results/gen_clustering.csv", "./gen_results/gen_clustering2.csv", and "./gen_results/gen_param_PC.npy".

python sx11statistics_model.py

Generate: "./gen_results/gen_stats.csv"

python sx12polynomial-fit_model.py

Generate: "./gen_results/gen_poly_params.csv" and "./gen_results/gen_param_PC.npy".
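The twelve steps above can also be driven from one small script. The following is a convenience sketch, not part of the repository: the names `PIPELINE` and `run_pipeline` are hypothetical, and it assumes it is run from the repository root with the same interpreter that has the required libraries installed.

```python
import subprocess
import sys

# Script names copied from the steps above, in the order given there.
PIPELINE = [
    "sx01cotagging_network.py",
    "sx02number-of-tags-on-posts_data.py",
    "sx03clustering_coefficient_data.py",
    "sx04statistics_data.py",
    "sx05tag-frequency.py",
    "sx06polynomial-fit_data.py",
    "sx07generative-model-theoretical-statistics.py",
    "sx08generative-model.py",
    "sx09number-of-tags-on-posts_model.py",
    "sx10clustering_coefficient_model.py",
    "sx11statistics_model.py",
    "sx12polynomial-fit_model.py",
]


def run_pipeline(scripts=PIPELINE, runner=subprocess.run):
    """Run each script in order, stopping at the first failure."""
    for script in scripts:
        print(f"Running {script}", flush=True)
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on incomplete inputs.
        runner([sys.executable, script], check=True)


# Uncomment to run the whole pipeline (this can take hours):
# run_pipeline()
```

Stopping at the first failure matters here because later scripts appear to read the files that earlier ones write.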

Now all data files should be generated correctly, and we are ready to make plots.
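Before opening the notebooks, it can help to confirm that the result files exist. The check below is a sketch and not part of the repository: `EXPECTED_FILES` and `missing_outputs` are hypothetical names, the paths are copied from the "Generate:" lines above, and outputs described with wildcards (such as "./cotag_data/*_cotag.csv") are left out because their exact names depend on the communities in "./parsed".

```python
from pathlib import Path

# Fixed-name outputs taken from the "Generate:" lines of this README.
EXPECTED_FILES = [
    "results/tag_num.csv",
    "results/clustering.csv",
    "results/clustering2.csv",
    "results/stats.csv",
    "results/tagfreq.csv",
    "results/patents_is_different.csv",
    "results/poly_params.csv",
    "results/param_PC.npy",
    "results/theory_cotag_u.csv",
    "results/var_zero_post.csv",
    "gen_results/gen_tag_num.csv",
    "gen_results/gen_clustering.csv",
    "gen_results/gen_clustering2.csv",
    "gen_results/gen_param_PC.npy",
    "gen_results/gen_stats.csv",
    "gen_results/gen_poly_params.csv",
]


def missing_outputs(root=".", expected=EXPECTED_FILES):
    """Return the expected files that do not exist under `root`."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_file()]


missing = missing_outputs()
if missing:
    print("Missing result files:")
    for name in missing:
        print(f"  {name}")
else:
    print("All expected result files are present.")
```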

Run the Jupyter notebook data_description_plot.ipynb for Figure 1.

Run make_plots_data.ipynb (listed in this repository as make_plots_data[6th].ipynb) for data-related plots.

Run make_plots_model.ipynb (listed in this repository as make_plots_model[6th].ipynb) for model-related plots.