yilinjuang/GitHub-Repo-Recommender

Octomender

Octomender = Octopus (GitHub) + Recommender

GitHub repo recommender system.

2017 Network Science Final Project with J. C. Liang.

Requirements

  • Python 3
  • NetworkX: High-productivity software for complex networks.
  • NumPy
  • SciPy
  • OpenMP>=4.0: C/C++ API that supports multi-platform shared memory multiprocessing programming.

Dataset

GitHub Archive

Preprocessing

Parse raw JSON data files into three pickle data files:

  • output-data-basename.user: map of user id (str) to user name (str)
  • output-data-basename.repo: map of repo id (int) to repo name (str)
  • output-data-basename.edge: list of tuples of user-repo edge (str, int)
Usage: parse.py {-m|--member|-w|--watch} {<input-json-directory>|<input-json-file>} <output-data-basename>
  -m, --member      parse MemberEvent.
  -w, --watch       parse WatchEvent.
Ex:    parse.py -m 2017-06-01-0.json data
Ex:    parse.py --watch json/2017-05/ data/2017-05

For the raw JSON data format, refer to the GitHub API v3.
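The parsing step can be sketched roughly as follows. This is a minimal illustration, not the code in parse.py: it assumes each input line is one JSON event with top-level `type`, `actor` (`id`, `login`) and `repo` (`id`, `name`) fields, as in GitHub Archive dumps. The function name `parse_watch_events` and the sample events are hypothetical, and MemberEvent handling is not shown.

```python
import json


def parse_watch_events(lines):
    """Collect users, repos and user-repo edges from JSON event lines."""
    users = {}   # user id (str) -> user name (str)
    repos = {}   # repo id (int) -> repo name (str)
    edges = []   # (user id, repo id) tuples
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # Assumption: the event kind is stored in the top-level "type" field.
        if event.get("type") != "WatchEvent":
            continue
        user_id = str(event["actor"]["id"])
        repo_id = int(event["repo"]["id"])
        users[user_id] = event["actor"]["login"]
        repos[repo_id] = event["repo"]["name"]
        edges.append((user_id, repo_id))
    return users, repos, edges


# Hypothetical sample input: one WatchEvent and one event that is skipped.
sample = [
    '{"type": "WatchEvent", "actor": {"id": 1, "login": "alice"}, '
    '"repo": {"id": 10, "name": "alice/demo"}}',
    '{"type": "PushEvent", "actor": {"id": 2, "login": "bob"}, '
    '"repo": {"id": 10, "name": "alice/demo"}}',
]
users, repos, edges = parse_watch_events(sample)
```

Each of the three objects could then be written with `pickle.dump` to the matching `.user`, `.repo` and `.edge` file.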

Same as above, but run with multiprocessing. The default number of processes is 16.

Usage: parse.py {-m|--member|-w|--watch} {<input-json-directory>|<input-json-file>} <output-data-basename> [n-process]
  -m, --member      parse MemberEvent.
  -w, --watch       parse WatchEvent.
  n-process         number of processes when multiprocessing.
Ex:    parse.py -m 2017-06-01-0.json data
Ex:    parse.py --watch json/2017-05/ data/2017-05 32
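One way the multiprocessing variant could be organized is to give each worker process one input file and concatenate the results. The sketch below uses `multiprocessing.Pool` from the standard library. The function names (`parse_lines`, `parse_file`, `parse_parallel`) are hypothetical, the event field layout is the same assumption as for GitHub Archive dumps, and only WatchEvent edges are collected to keep it short.

```python
import json
from multiprocessing import Pool


def parse_lines(lines):
    """Return (user id, repo id) edges for the WatchEvents in some JSON lines."""
    edges = []
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        # Assumption: events carry "type", "actor.id" and "repo.id" fields.
        if event.get("type") == "WatchEvent":
            edges.append((str(event["actor"]["id"]), int(event["repo"]["id"])))
    return edges


def parse_file(path):
    """Worker: parse one JSON file, one event per line."""
    with open(path, encoding="utf-8") as handle:
        return parse_lines(handle)


def parse_parallel(paths, n_process=16):
    """Parse files in a pool of worker processes and concatenate the edges."""
    with Pool(processes=n_process) as pool:
        per_file = pool.map(parse_file, paths)
    return [edge for edges in per_file for edge in edges]
```

`parse_parallel` should be called from under an `if __name__ == "__main__":` guard, which `multiprocessing` requires on platforms that spawn rather than fork worker processes.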

Merge multiple pickle data files into one.

Usage: mergedata.py <input-data-dir> <output-data-basename>
Ex:    mergedata.py data/2016-010203/ data/2016-Q1
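Merging amounts to combining the id-to-name maps and the edge lists of several parsed data sets. A minimal sketch, assuming each data set is a `(users, repos, edges)` triple as described above. The name `merge_data` is hypothetical, and dropping duplicate edges is an assumption about the desired behavior, not something the README states.

```python
def merge_data(parts):
    """Merge an iterable of (users, repos, edges) triples into one triple."""
    users, repos, edges = {}, {}, []
    seen = set()
    for part_users, part_repos, part_edges in parts:
        # Later data sets overwrite earlier names for the same id.
        users.update(part_users)
        repos.update(part_repos)
        for edge in part_edges:
            # Assumption: a repeated (user, repo) edge is kept only once.
            if edge not in seen:
                seen.add(edge)
                edges.append(edge)
    return users, repos, edges


# Hypothetical data sets standing in for two months of parsed events.
january = ({"1": "alice"}, {10: "alice/demo"}, [("1", 10)])
february = ({"1": "alice", "2": "bob"}, {10: "alice/demo"}, [("1", 10), ("2", 10)])
merged_users, merged_repos, merged_edges = merge_data([january, february])
```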

Generate a bipartite graph and, optionally, project it to a unipartite graph.

Usage: generate.py <input-data-basename> <output-graph-basename> [-p|--project]
  -p, --project     project to unipartite graph (multigraph).
Ex:    generate.py data/2017-05 graph/2017-05
Ex:    generate.py data/2016-Q1 graph/2016-Q1 -p

For the bipartite graph implementation, refer to algorithms.bipartite in NetworkX.
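As an illustration of this step, the sketch below builds a small user-repo bipartite graph and projects it with `bipartite.projected_graph(..., multigraph=True)` from NetworkX. The edge data is made up, and this is not necessarily how generate.py is written. User ids are strings and repo ids are integers, as in the `.edge` tuples, so the two node sets cannot collide.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical (user id, repo id) edges.
edges = [("u1", 1), ("u1", 2), ("u2", 1), ("u2", 2), ("u3", 2)]
users = {user for user, _ in edges}
repos = {repo for _, repo in edges}

# Bipartite graph: users on one side, repos on the other.
B = nx.Graph()
B.add_nodes_from(users, bipartite=0)
B.add_nodes_from(repos, bipartite=1)
B.add_edges_from(edges)

# Multigraph projection: two users get one parallel edge per shared repo,
# and two repos get one parallel edge per shared user.
user_graph = bipartite.projected_graph(B, users, multigraph=True)
repo_graph = bipartite.projected_graph(B, repos, multigraph=True)

# Multiplicity of a pair = number of parallel edges between its nodes.
shared_by_u1_u2 = user_graph.number_of_edges("u1", "u2")
```

Here `u1` and `u2` share both repos, so the user projection holds two parallel edges between them.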

Filter a multigraph down to a simple graph using one of three modes.

Usage: filter.py {-m|-t|-p} <input-unipartite-nxgraph> <output-filtered-nxgraph>
  -m                filtering mode: Multiplicity > 1.
  -t                filtering mode: Top % of multiplicity.
  -p                filtering mode: Multiplicity proportion > threshold.
Ex:    filter.py -m graph/2017-05_user.nxgraph graph/2017-05_user_m.nxgraph
Ex:    filter.py -t graph/2016-Q1_repo.nxgraph graph/2016-Q1_repo_t.nxgraph
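The first two modes can be sketched in plain Python on a list of parallel edges, where the multiplicity of a node pair is the number of parallel edges between the two nodes. The function names, the `percent` parameter and the tie handling are assumptions. The third mode is left out because the README does not define how the multiplicity proportion is computed.

```python
import math
from collections import Counter


def multiplicities(multi_edges):
    """Count parallel edges per unordered node pair."""
    return Counter(tuple(sorted(pair)) for pair in multi_edges)


def filter_multiplicity(mult, minimum=2):
    """Mode -m (assumed): keep pairs whose multiplicity is greater than 1."""
    return {pair for pair, count in mult.items() if count >= minimum}


def filter_top_percent(mult, percent):
    """Mode -t (assumed): keep the top `percent` of pairs by multiplicity."""
    ranked = sorted(mult, key=mult.get, reverse=True)
    keep = math.ceil(len(ranked) * percent / 100)
    # Ties at the cut-off are broken by insertion order (sorted is stable).
    return set(ranked[:keep])


# Hypothetical multigraph edges between users a, b, c and d.
multi_edges = [
    ("a", "b"), ("a", "b"), ("a", "b"),
    ("b", "c"), ("b", "c"),
    ("a", "c"),
    ("c", "d"),
]
mult = multiplicities(multi_edges)
kept_m = filter_multiplicity(mult)
kept_t = filter_top_percent(mult, 25)
```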

Convert NetworkX Graph object (.nxgraph) to edge list.

Usage: nxgraph2edgelist.py <input-nxgraph> <output-edgelist-basename>
Ex:    nxgraph2edgelist.py graph/2017-05_bi.nxgraph graph/2017-05_bi
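For reference, NetworkX can serialize a graph as plain-text edge lines with `nx.generate_edgelist` (or `nx.write_edgelist` to write straight to a file). The sketch below is only an illustration on a made-up graph. The space delimiter and the absence of edge data are assumptions, since the README does not specify the format octomender expects.

```python
import networkx as nx

# Hypothetical bipartite graph with string user ids and integer repo ids.
graph = nx.Graph()
graph.add_edges_from([("u1", 1), ("u2", 1), ("u2", 2)])

# One "node node" line per edge, without edge attributes.
lines = list(nx.generate_edgelist(graph, delimiter=" ", data=False))
```

The order of the two nodes within a line follows NetworkX's edge iteration, so the user is not guaranteed to come first. A converter that needs a fixed column order has to arrange the pair itself.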

SVD Predictor

Octomender

Build

make

Run

Usage: ./octomender <input-edgelist>
Ex:    ./octomender graph/2017-05_bi.edgelist

Or redirect the output to a file.

Usage: ./octomender <input-edgelist> > output.log
Ex:    ./octomender graph/2017-05_bi.edgelist > log/2017-05.log

Convert the log file to a readable format, translating repo ids to repo names.

Usage: whatsthisrepoid.py <input-log-file> <input-repo-data-file>
Ex:    whatsthisrepoid.py log/2017-05.log data/2017-05.repo

Look up a user or repo by id to get its name, or by name to get its id.

Usage: lookup.py <input-data-file> <query>
Ex:    lookup.py data/2017-05.user frankyjuang
Ex:    lookup.py data/2017-05.user 6175880
Ex:    lookup.py data/2017-05.repo tensorflow/tensorflow
Ex:    lookup.py data/2017-05.repo 45717250
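A two-way lookup over one of the id-to-name maps could look like the sketch below. It is an assumption about the behavior rather than the code in lookup.py: the query arrives as a string, is first compared against the ids, and is then compared against the names. The function name `lookup` and the sample maps are hypothetical.

```python
def lookup(mapping, query):
    """Return the name for an id, the id for a name, or None if not found."""
    # Ids may be str (users) or int (repos), so compare their string forms.
    for key, name in mapping.items():
        if str(key) == query:
            return name
    # Otherwise treat the query as a name and return the first matching id.
    for key, name in mapping.items():
        if name == query:
            return key
    return None


# Hypothetical maps in the shape of the .user and .repo data files.
users = {"1": "alice", "2": "bob"}
repos = {10: "alice/demo"}

name_of_user = lookup(users, "2")
id_of_repo = lookup(repos, "alice/demo")
```

In a script, `mapping` would be loaded from the data file with `pickle.load` and `query` taken from the command line.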
