This repo focuses on analyzing GitHub history data.
We use two scripts to download GitHub history data from GitHub Archive:

    # Take year 2016 as example
    mkdir 2016
    python generate_url.py 2016
    python download_url.py url2016 2016 120

We preprocess GitHub history data with the following steps:
- Extract [event, user, repo] triples from the original data.

      python preprocess.py

- Filter important users and repos.
- Find the strong relationships between users and repos.
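
To make the three steps concrete, here is a minimal sketch in Python. It is not the actual `preprocess.py`: the function names, the thresholds, and the choice of "event count" as the measure of importance and relationship strength are all assumptions made for illustration. It also assumes each line of an archive file is a JSON object with `type`, `actor.login`, and `repo.name` fields.

```python
import gzip
import json
from collections import Counter


def extract_triples(lines):
    """Yield (event, user, repo) from an iterable of JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        try:
            # Assumed field layout of one archived event record.
            yield record["type"], record["actor"]["login"], record["repo"]["name"]
        except (KeyError, TypeError):
            continue  # skip records that lack the assumed fields


def read_archive(path):
    """Yield triples from one gzip-compressed archive file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from extract_triples(handle)


def filter_important(triples, min_user_events=2, min_repo_events=2):
    """Keep triples whose user and repo both reach a hypothetical event threshold."""
    triples = list(triples)
    user_counts = Counter(user for _, user, _ in triples)
    repo_counts = Counter(repo for _, _, repo in triples)
    return [
        t for t in triples
        if user_counts[t[1]] >= min_user_events and repo_counts[t[2]] >= min_repo_events
    ]


def strong_relationships(triples, min_interactions=2):
    """Return (user, repo) pairs with at least min_interactions events."""
    pair_counts = Counter((user, repo) for _, user, repo in triples)
    return {pair: n for pair, n in pair_counts.items() if n >= min_interactions}


# Tiny in-memory sample instead of a downloaded file.
sample = [
    '{"type": "PushEvent", "actor": {"login": "alice"}, "repo": {"name": "alice/tool"}}',
    '{"type": "WatchEvent", "actor": {"login": "alice"}, "repo": {"name": "alice/tool"}}',
    '{"type": "ForkEvent", "actor": {"login": "bob"}, "repo": {"name": "alice/tool"}}',
]
triples = list(extract_triples(sample))
kept = filter_important(triples)
strong = strong_relationships(kept)
print(strong)  # {('alice', 'alice/tool'): 2}
```

For real data, replace the sample with `read_archive(path)` for each downloaded file. Note that `filter_important` holds all triples in memory, so a full year of data would likely need a two-pass or streaming variant.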