Active Learning for Graph Embedding
This program (AGE) implements an active learning for graph embedding framework, as proposed in the following paper. If you use it for scientific experiments, please cite this paper:

@article{DBLP:journals/corr/CaiZC17,
  author    = {HongYun Cai and Vincent Wenchen Zheng and Kevin Chen{-}Chuan Chang},
  title     = {Active Learning for Graph Embedding},
  journal   = {CoRR},
  volume    = {abs/1705.05085},
  year      = {2017},
  url       = {https://arxiv.org/abs/1705.05085},
  timestamp = {Mon, 15 May 2017 06:49:04 GMT}
}

The code has been tested under Ubuntu 16.04 LTS with an Intel Xeon(R) CPU E5-1620 @ 3.50GHz*8 and 16G memory.

==============
*** Installation ***
==============

python setup.py install

==============
*** Requirements ***
==============

tensorflow (>0.12)
networkx
Graph convolutional network (Kipf and Welling, ICLR 2017): https://github.com/tkipf/gcn

==============
*** Data ***
==============

In order to use your own data, you have to provide an N by N adjacency matrix (N is the number of nodes), an N by D feature matrix (D is the number of features per node), and an N by E binary label matrix (E is the number of classes). Have a look at the load_data() function in utils.py for an example; an illustrative sketch of this input format is given at the end of this file.

In this example, we load citation network data (Cora, Citeseer or Pubmed). The original datasets can be found here: http://linqs.cs.umd.edu/projects/projects/lbc/. In our version (see the data folder) we use the dataset splits provided by https://github.com/kimiyoung/planetoid (Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov, Revisiting Semi-Supervised Learning with Graph Embeddings, ICML 2016) to load the whole dataset, and we use the same test data as theirs.

The validation node instances are randomly sampled from the non-test nodes. We randomly generate 10 validation sets for each dataset; the node indexes are stored in "source/datasetname/val_idxa.txt" (where a is the validation set id, range within [0,10]).

The initially labeled nodes are randomly sampled from the non-test and non-train node set. Given C (the number of classes in the dataset) and a predefined L, AGE randomly samples L nodes from each class as the initially labeled nodes, so there are C*L initially labeled nodes in total (see the sketch at the end of this file).

==============
*** Run the Program ***
==============

1. First, generate the graph centrality score for each node.

Command:
python get_graph_centrality.py datasetname
e.g., python get_graph_centrality.py citeseer

Parameters:
datasetname: the dataset to process

Output:
The centrality score of each node (in the same order as in the graph) is stored in "res/datasetname/graphcentrality/normcen".

Note:
We adopt PageRank centrality in this work. You can try other centrality measures by modifying the function "centralissimo()" in "get_graph_centrality.py"; a sketch of the PageRank computation is given at the end of this file.

2. Run the AGE algorithm to actively select nodes to label during the graph embedding process, and record the Macro-F1 and Micro-F1 for node classification.

Command:
python train_entropy_density_graphcentral_ts.py validation_id nb_initial_labelled_nodes_per_class class_nb datasetname
e.g., python train_entropy_density_graphcentral_ts.py 0 4 6 citeseer

Parameters:
validation_id: the validation set id, referring to the id listed in "source/datasetname/val_idxa.txt"
nb_initial_labelled_nodes_per_class: number of initially labelled nodes per class (we use 4 in this work)
class_nb: the number of classes in the dataset
datasetname: the name of the dataset to process
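
==============
*** Illustrative Examples ***
==============

The following sketch illustrates the input format described in the Data section: an N by N adjacency matrix, an N by D feature matrix, and an N by E binary label matrix. It is a toy example assuming numpy and scipy; the variable names (adj, features, labels) are illustrative and not taken from utils.py.

# Toy example of the three inputs expected by a load_data()-style loader.
# Assumption: numpy and scipy are available; names are illustrative only.
import numpy as np
import scipy.sparse as sp

N, D, E = 4, 3, 2  # 4 nodes, 3 features per node, 2 classes

# N by N adjacency matrix (symmetric, no self-loops), stored sparse
adj = sp.csr_matrix(np.array([[0, 1, 0, 0],
                              [1, 0, 1, 0],
                              [0, 1, 0, 1],
                              [0, 0, 1, 0]], dtype=np.float32))

# N by D feature matrix
features = sp.lil_matrix(np.random.rand(N, D).astype(np.float32))

# N by E binary (one-hot) label matrix
labels = np.zeros((N, E), dtype=np.int32)
labels[np.arange(N), [0, 1, 0, 1]] = 1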
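
The next sketch illustrates how an initial labeled set of C*L nodes could be drawn, L per class, from a candidate pool, as described in the Data section. It is a hypothetical helper written for this README only; it is not the sampling code used in train_entropy_density_graphcentral_ts.py.

# Hypothetical helper: sample L initially labeled nodes per class (C*L in total)
# from a candidate pool of node indices. Not the repository's actual code.
import numpy as np

def sample_initial_labels(labels, candidate_idx, L, seed=0):
    # labels: N by C binary matrix; candidate_idx: node indices eligible for labeling
    rng = np.random.RandomState(seed)
    chosen = []
    for c in range(labels.shape[1]):
        pool = [i for i in candidate_idx if labels[i, c] == 1]
        chosen.extend(rng.choice(pool, size=L, replace=False))
    return np.array(chosen)  # C*L node indices, L per class

For example, with L = 4 on a 6-class dataset such as Citeseer, this returns 24 initially labeled nodes, matching the "python train_entropy_density_graphcentral_ts.py 0 4 6 citeseer" example above.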
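
Finally, since the Note on step 1 says PageRank centrality is used, here is a minimal sketch of that kind of computation with networkx. The normalization is an assumption for illustration; the exact behaviour of centralissimo() in get_graph_centrality.py may differ.

# Sketch of a PageRank-based centrality score per node, normalized so the maximum is 1.
# Assumption: networkx is available; not necessarily identical to centralissimo().
import networkx as nx

def pagerank_centrality(G):
    pr = nx.pagerank(G)                  # dict mapping node -> PageRank score
    scores = [pr[n] for n in G.nodes()]  # keep the same node order as the graph
    m = max(scores)
    return [s / m for s in scores]       # normalized centrality scores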