This repository contains the code for the Semi-Supervised Factor Graph Model (SSFGM).
If you have any problems with the code or data format, please contact the author at yujieq@csail.mit.edu.
In the ./py folder, we present our Python implementation of the Two-Chain Sampling (TCS) algorithm, based on the TensorFlow framework (newest version).
In the ./cpp folder, we present our C++ implementation of the Loopy Belief Propagation (LBP), SampleRank, and TCS with Metropolis-Hastings sampling algorithms.
In the ./misc folder, there are some other scripts for baseline methods and evaluations.
Please see each folder for details.
Please download the preprocessed feature files using the following links:
- Twitter (World) (2.2G)
- Twitter (USA) (139M)
- Weibo (616M)
We cannot release the raw data of the Twitter datasets due to some limitations. Original data for Weibo and Facebook can be found at:
Weibo: https://aminer.org/influencelocality
Facebook: http://snap.stanford.edu/data/egonets-Facebook.html
The training file consists of two parts: nodes and edges.
The first part lists the nodes. Each line represents a node (instance), and the format is defined as follows:
[+/*/?]label featname_1:val featname_2:val ... [#id]
where +, *, and ? stand for training, validation, and testing data, respectively. Labels and feature names can be strings (length<32). Each value can be real-valued or 0/1. We suggest normalizing the input features to [0,1].
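As an illustration of the node format, here is a minimal Python sketch of a line parser. It is not part of the released code. The names `SPLITS`, `Node`, and `parse_node_line` are hypothetical, and the sketch assumes that the +/*/? marker is attached directly to the label, that tokens are whitespace-separated, and that an optional final token starting with # carries the node id.

```python
from typing import Dict, NamedTuple, Optional

# Marker-to-split mapping, following the format description above.
SPLITS = {"+": "train", "*": "valid", "?": "test"}


class Node(NamedTuple):
    split: str
    label: str
    features: Dict[str, float]
    node_id: Optional[str]


def parse_node_line(line: str) -> Node:
    """Parse one node line: [+/*/?]label name:val ... [#id]."""
    tokens = line.split()
    if not tokens or tokens[0][0] not in SPLITS:
        raise ValueError(f"not a node line: {line!r}")
    split, label = SPLITS[tokens[0][0]], tokens[0][1:]

    # Assumption: an optional trailing token beginning with '#' is the id.
    node_id = None
    if len(tokens) > 1 and tokens[-1].startswith("#"):
        node_id = tokens[-1][1:]
        tokens = tokens[:-1]

    features = {}
    for tok in tokens[1:]:
        # Split on the last ':' so feature names may themselves contain ':'.
        name, value = tok.rsplit(":", 1)
        features[name] = float(value)
    return Node(split, label, features, node_id)


print(parse_node_line("+1 age:0.25 verified:1 #42"))
```

Lines beginning with #edge are rejected by this parser, since # is not one of the three split markers.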
The second part lists the edges. Each line represents an edge (a correlation between two instances). The format is:
#edge line_a line_b edgetype
where line_a and line_b correspond to two nodes in the first part, with lines counted starting from 0. edgetype is a string indicating the type of this edge. Currently the code supports only one edge type.
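The edge section can be checked with a short script before training. The following Python sketch is not part of the released code. The function name `read_edges` and the example feature names are hypothetical, and it assumes that all node lines precede the edge lines, that blank lines are not counted, and that line_a and line_b are 0-based indices into the node lines.

```python
from typing import Iterable, List, Tuple


def read_edges(lines: Iterable[str]) -> Tuple[int, List[Tuple[int, int, str]]]:
    """Count node lines and collect (line_a, line_b, edgetype) tuples."""
    num_nodes = 0
    edges = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        if line.startswith("#edge"):
            _, a, b, edge_type = line.split()
            edges.append((int(a), int(b), edge_type))
        else:
            num_nodes += 1

    # Every endpoint must refer to an existing node line (0-based).
    for a, b, _ in edges:
        if not (0 <= a < num_nodes and 0 <= b < num_nodes):
            raise ValueError(f"edge ({a}, {b}) refers to a missing node")
    return num_nodes, edges


example = [
    "+1 f1:0.2 f2:1",
    "?0 f1:0.9 f2:0",
    "*1 f1:0.5 f2:1",
    "#edge 0 1 friend",
    "#edge 1 2 friend",
]
num_nodes, edges = read_edges(example)
print(num_nodes, edges)
```

Since only one edge type is currently supported, a check such as `len({t for _, _, t in edges}) <= 1` could be added to catch files that mix types.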