-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add neighborhood op #37
Conversation
out.push_back(ver_list[i]); | ||
mp[ver_list[i]] = true; | ||
if (out.size() >= max_num_neighbor) { | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this sampling method actually samples vertices exactly following the given probability. The vertices at the beginning of ver_list have a higher chance of being selected. random_shuffle is also expensive.
An easier way is to compute the accumulative sum on the probability list and do a binary search on the accumulative probability list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After using random_shuffle, I think it is mathematically equal to the binary search. Actually, if we use binary search, we need to do N times of binary search and at each time we need to remove the last vertex we sampled from the list, it is also expensive. Anyway, I think the random_shuffle is not a good way either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
binary search does require to remove the last sampled vertex and adjust the accumulative sum.
However, is this really mathematically equivalent? Could you show me how to calculate the probability a vertex is sampled? I'm not sure how to calculate the probability when there is random_shuffle.
while (!node_queue.empty() && | ||
vertices_count < max_num_vertices) { | ||
ver_node& cur_node = node_queue.front(); | ||
node_queue.pop(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can potentially cause memory error. cur_node references to the first element in the queue. after pop, the memory of the first element is no longer held by the queue. this can lead to unexpected behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should remove the node_queue.pop() to the end of the while loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please fix this bug.
GetSample(col_row_list[src_id], // ver_list | ||
prob_array, // probability | ||
num_neighbor, // max_num_neighbor | ||
sampled_vec); // output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, there is an easier case we should support first, i.e., neighbors are sampled uniformly. I think we should distinguish the easier case with the general case for the sake of performance. However we implement the general case, it's going to be much more expensive. We care about the performance of subgraph sampling a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zheng-da Do we need to change the API or we can just give a uniform probability to this operator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to change the interface of the operator. One option is to give an NDArray with one element.
} | ||
|
||
// BFS traverse the graph and sample vertices | ||
std::vector<dgl_id_t> sub_ver_mp(vertices_num, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not a good idea to allocate a vector for all nodes in the input graph. A graph can be super large.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Here we can just use the unordered_map to store the flag. Here I want to speedup the memory access but the memory cost could be high.
* add csr_neighborhood op * update neighborhood sample * Update csr_neighborhood_sample-inl.h * Update csr_neighborhood_sample-inl.h * Update csr_neighborhood_sample.cc
* edge_id op csr forward on CPU (zheng-da#34) * add node subgraph generator. (zheng-da#35) * create DGLSubgraph. * fix. * return old eids in node_subgraph. * accelerate subgraph construction. * Add neighborhood op (zheng-da#37) * add csr_neighborhood op * update neighborhood sample * Update csr_neighborhood_sample-inl.h * Update csr_neighborhood_sample-inl.h * Update csr_neighborhood_sample.cc * add graph compact operator. * fix a bug in dgl_subgraph. * fix a bug in dgl_graph_compact. * Update csr sample op (zheng-da#39) * add csr_neighborhood op * update neighborhood sample * Update csr_neighborhood_sample-inl.h * Update csr_neighborhood_sample-inl.h * Update csr_neighborhood_sample.cc * Update csr_neighborhood_sample-inl.h * Update csr_neighborhood_sample.cc * Update csr_neighborhood_sample-inl.h * remove space. * move to dgl_graph to contrib. * move code. * move edge id. * fix compilation error. * add test for subgraph. * cleanup. * fix. * fix. * fix compile error. * fix compile error. * fix compile error. * fix. * add operator doc. * remove graph_compact * update doc. * address comments. * retrigger. * address comments. * retrigger * fix a bug in test. * retrigger * add check_format
* add csr_neighborhood op * update neighborhood sample * Update csr_neighborhood_sample-inl.h * Update csr_neighborhood_sample-inl.h * Update csr_neighborhood_sample.cc
Add neighborhood op