Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add neighborhood op #37

Merged
merged 5 commits into from
Nov 6, 2018
Merged

Add neighborhood op #37

merged 5 commits into from
Nov 6, 2018

Conversation

aksnzhy
Copy link

@aksnzhy aksnzhy commented Oct 31, 2018

Add neighborhood op

out.push_back(ver_list[i]);
mp[ver_list[i]] = true;
if (out.size() >= max_num_neighbor) {
return;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this sampling method actually samples vertices exactly following the given probability. The vertices at the beginning of ver_list have a higher chance of being selected. random_shuffle is also expensive.

An easier way is to compute the accumulative sum on the probability list and do a binary search on the accumulative probability list.

Copy link
Author

@aksnzhy aksnzhy Oct 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After using random_shuffle, I think it is mathematically equal to the binary search. Actually, if we use binary search, we need to do N times of binary search and at each time we need to remove the last vertex we sampled from the list, it is also expensive. Anyway, I think the random_shuffle is not a good way either.

Copy link
Owner

@zheng-da zheng-da Oct 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

binary search does require to remove the last sampled vertex and adjust the accumulative sum.
However, is this really mathematically equivalent? Could you show me how to calculate the probability a vertex is sampled? I'm not sure how to calculate the probability when there is random_shuffle.

while (!node_queue.empty() &&
vertices_count < max_num_vertices) {
ver_node& cur_node = node_queue.front();
node_queue.pop();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can potentially cause memory error. cur_node references to the first element in the queue. after pop, the memory of the first element is no longer held by the queue. this can lead to unexpected behavior.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should remove the node_queue.pop() to the end of the while loop.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please fix this bug.

GetSample(col_row_list[src_id], // ver_list
prob_array, // probability
num_neighbor, // max_num_neighbor
sampled_vec); // output
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, there is an easier case we should support first, i.e., neighbors are sampled uniformly. I think we should distinguish the easier case with the general case for the sake of performance. However we implement the general case, it's going to be much more expensive. We care about the performance of subgraph sampling a lot.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zheng-da Do we need to change the API or we can just give a uniform probability to this operator?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to change the interface of the operator. One option is to give an NDArray with one element.

}

// BFS traverse the graph and sample vertices
std::vector<dgl_id_t> sub_ver_mp(vertices_num, 0);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not a good idea to allocate a vector for all nodes in the input graph. A graph can be super large.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Here we can just use the unordered_map to store the flag. Here I want to speedup the memory access but the memory cost could be high.

@zheng-da zheng-da merged commit a54ebbc into zheng-da:dgl_graph Nov 6, 2018
zheng-da pushed a commit that referenced this pull request Nov 15, 2018
* add csr_neighborhood op

* update neighborhood sample

* Update csr_neighborhood_sample-inl.h

* Update csr_neighborhood_sample-inl.h

* Update csr_neighborhood_sample.cc
zheng-da added a commit to BullDemonKing/incubator-mxnet that referenced this pull request Nov 25, 2018
* edge_id op csr forward on CPU (zheng-da#34)

* add node subgraph generator. (zheng-da#35)

* create DGLSubgraph.

* fix.

* return old eids in node_subgraph.

* accelerate subgraph construction.

* Add neighborhood op (zheng-da#37)

* add csr_neighborhood op

* update neighborhood sample

* Update csr_neighborhood_sample-inl.h

* Update csr_neighborhood_sample-inl.h

* Update csr_neighborhood_sample.cc

* add graph compact operator.

* fix a bug in dgl_subgraph.

* fix a bug in dgl_graph_compact.

* Update csr sample op (zheng-da#39)

* add csr_neighborhood op

* update neighborhood sample

* Update csr_neighborhood_sample-inl.h

* Update csr_neighborhood_sample-inl.h

* Update csr_neighborhood_sample.cc

* Update csr_neighborhood_sample-inl.h

* Update csr_neighborhood_sample.cc

* Update csr_neighborhood_sample-inl.h

* remove space.

* move to dgl_graph to contrib.

* move code.

* move edge id.

* fix compilation error.

* add test for subgraph.

* cleanup.

* fix.

* fix.

* fix compile error.

* fix compile error.

* fix compile error.

* fix.

* add operator doc.

* remove graph_compact

* update doc.

* address comments.

* retrigger.

* address comments.

* retrigger

* fix a bug in test.

* retrigger

* add check_format
zheng-da pushed a commit that referenced this pull request Nov 25, 2018
* add csr_neighborhood op

* update neighborhood sample

* Update csr_neighborhood_sample-inl.h

* Update csr_neighborhood_sample-inl.h

* Update csr_neighborhood_sample.cc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants