GitHub - vwz/ProxEmbed: Semantic Proximity Search on Heterogeneous Graph by Proximity Embedding

vwz / ProxEmbed Public

Notifications You must be signed in to change notification settings
Fork 3
Star 15

Semantic Proximity Search on Heterogeneous Graph by Proximity Embedding

Notifications

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
code		code
toy_data		toy_data
ReadMe		ReadMe

Repository files navigation

This directory contains the source code of ProxEmbed model.
========================================================================================================
@inproceedings{LiuZZZCWY17,
 author = {Liu, Zemin and Zheng, Vincent W. and Zhao, Zhou and Zhu, Fanwei and Chang, Kevin Chen-Chuan and Wu, Minghui and Ying, Jing},
 title = {Semantic Proximity Search on Heterogeneous Graph by Proximity Embedding},
 booktitle = {Proc. of the 31st AAAI Conference on Artificial Intelligence},
 series = {AAAI '17},
 year = {2017}
} 

Please cite the above reference for using our code.

For inqueries, please contact:
Zemin Liu (liuzemin@zju.edu.cn)
Vincent Zheng (vincent.zheng@adsc.com.sg)
========================================================================================================

This experiment consists of two parts, symmetric and asymmetric. 
The symmetric part is used to compute facebook and linkedin dataset, whose relation is symmetric.
While the asymmetric part is used to compute dblp dataset, whose relation is asymmetric.

In each part, such as this symmetric directory, there are two directories, "java - prepare data for model" and "python - model".
The "java - prepare data for model" directory is used to prepare input data for ProxEmbed model, while "python - model" directory is used to generate and assess ProxEmbed model.

In "java - prepare data for model", 
Main.java is the entry class. And the program will read parameters from file javaParams.properties.

In "python - model", the entry file is experimentForOneFileByParams.py. This file provides us a method to get parameters from file pythonParamsConfig, and then trains the model and then tests it.
For instance, in file pythonParamsConfig, if we set 
	root_dir = ./toy_data/,
	dataset_name = linkedin,
	suffix = 4,
	class_name = school,
	index = 1,
then this model will use ./toy_data/linkedin.splits/train.4/train_school_1 as the training data, 
	and will use ./toy_data/linkedin.splits/test/test_school_1 as test data,
	and will use ./toy_data/linkedin.splits/ideal/ideal_school_1 as ideal data (ground-truth).
	
So at last, this model then will output the results of NDCG and MAP.

The "java - prepare data for model" will read data from main folder of the dataset (which contains graph.node and graph.edge), such as "linkedin" folder, and then output intermediate dataset into the same folder.
Then "python - model" will take these data as one part of the input.