
radius graph defaults #21

Closed
nec4 opened this issue May 31, 2021 · 11 comments
nec4 (Contributor) commented May 31, 2021

Hello everyone! The tools are great so far. I noticed a small but important corner case in the usage of radius_graph from torch_cluster. When searching for neighbors, the behavior is governed by the cutoff distance r and max_num_neighbors (see the docs here: https://github.com/rusty1s/pytorch_cluster#radius-graph). The latter defaults to a maximum of 32 neighbors per node. If, for example, the user passes a large cutoff distance intending to return all neighbors, the result will still be truncated at 32 even if they expect more. Furthermore, I'm not sure how radius_graph decides which extra neighbors to reject, or how example shuffling during training affects this; for my use case it seems to make a big difference in both training and inference. Because the SchNet philosophy is to operate on the notion of cutoff distances, not maximum neighbor counts, would it make sense to add a kwarg to TorchMD_GN.__init__() to raise the limit on the maximum number of neighbors for this operation?

Of course, most users probably will not run into this problem if they stick to small cutoffs because they will never hit the upper neighbor ceiling. However, I would be happy to branch, implement this, write tests, and make a PR if it seems like a good idea.
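A quick NumPy sketch of the effect (a brute-force stand-in for illustration, not the actual torch_cluster kernel):

```python
import numpy as np

def count_neighbors(pos, r, max_num_neighbors):
    """Count neighbors within cutoff r for each atom, then truncate at
    max_num_neighbors -- mimicking radius_graph's cap, not its kernel."""
    n = len(pos)
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    within = (dist < r) & ~np.eye(n, dtype=bool)  # exclude self-pairs
    true_counts = within.sum(axis=1)
    return true_counts, np.minimum(true_counts, max_num_neighbors)

# 50 atoms in a 5 A box: with a huge cutoff every atom sees all 49 others,
# but the default cap silently truncates the neighbor list at 32.
rng = np.random.default_rng(0)
pos = rng.uniform(0.0, 5.0, size=(50, 3))
true_counts, kept_counts = count_neighbors(pos, r=100.0, max_num_neighbors=32)
print(true_counts.max())  # 49 -- the physically correct neighbor count
print(kept_counts.max())  # 32 -- what the default max_num_neighbors keeps
```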

raimis (Collaborator) commented May 31, 2021

@nec4 thanks for reporting!

@PhilippThoelke, we should look at this. I have observed unexpected behavior with respect to the cutoff distance too, but haven't dug very deep into the code.

PhilippThoelke (Collaborator) commented

@nec4 I agree, the current behavior is not very intuitive. Unfortunately, pytorch_cluster doesn't provide a way to find all neighbors inside a cutoff, so adding an argument for the max number of neighbors, as you suggested, seems the best option here. Feel free to open a PR with an extra argument to TorchMD_GN and TorchMD_T.

@raimis can you elaborate on that? What did you observe and do you think it is related to the max_num_neighbors in radius_graph?

By the way, the CUDA version of radius_graph in pytorch_cluster seems to simply stop looking for neighbors once max_num_neighbors is reached (see this). I also don't think the order is affected by shuffling, as only molecules (samples) are shuffled, not the order of atoms within each molecule, which makes this even more problematic. It might be different for the CPU version of radius_graph, which relies on a KDTreeVectorOfVectorsAdaptor from the nanoflann library and might reorder atoms in some way.
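If the kernel really keeps the first max_num_neighbors it encounters rather than the nearest ones, the retained edges depend on atom ordering. Here is a small NumPy sketch of that order dependence (an illustration of the behavior described above, not the CUDA code):

```python
import numpy as np

def kept_neighbors(pos, i, r, cap):
    """Neighbors of atom i within cutoff r, truncated in index order --
    mimicking a kernel that stops once `cap` neighbors are found."""
    d = np.linalg.norm(pos - pos[i], axis=1)
    idx = np.flatnonzero((d < r) & (np.arange(len(pos)) != i))
    return set(idx[:cap].tolist())

rng = np.random.default_rng(1)
pos = rng.uniform(0.0, 5.0, size=(50, 3))  # all pairs fall within a huge cutoff

before = kept_neighbors(pos, 0, 100.0, 32)
perm = np.concatenate(([0], np.arange(1, 50)[::-1]))  # reorder the other atoms
after = kept_neighbors(pos[perm], 0, 100.0, 32)
mapped = {int(perm[j]) for j in after}  # map back to original atom ids

print(before == mapped)  # False: a different neighbor set survives the cap
```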

raimis (Collaborator) commented May 31, 2021

@raimis can you elaborate on that? What did you observe and do you think it is related to the max_num_neighbors in radius_graph?

I tried increasing the cutoff, but it had no effect on accuracy.

nec4 (Contributor, Author) commented May 31, 2021

I think that might be the case if the number of neighbors exceeds the default 32 for your given choice of physical cutoff distance.

nec4 self-assigned this May 31, 2021
raimis (Collaborator) commented May 31, 2021

I'm trying to retrain with a more generous max_num_neighbors value to check.

nec4 (Contributor, Author) commented May 31, 2021

For me, it definitely makes a difference, although my use case is probably different. I will write a PR later tonight.

PhilippThoelke (Collaborator) commented

If you are using expnorm radial basis functions, you might also run into the problem that larger distances are represented by fewer RBFs. The expnorm parameters are initialized as recommended by PhysNet, which yields the following distribution for cutoff=5 and cutoff=15:
(figure: RBF distributions for cutoff=5 and cutoff=15)

If your test with an increased max_num_neighbors does not change anything, maybe try different cutoff values with Gaussian RBFs as they are evenly spaced between the lower and upper cutoff. However, we found that using expnorm significantly improved results over Gaussians on QM9 so it might be worth looking into a better initialization of the expnormals to cover a broader range for high cutoff values.
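To make the coverage issue concrete, here is a NumPy sketch of the two basis types (my reading of the PhysNet-style initialization described above; the library's exact parametrization may differ). With cutoff=15 the expnorm feature exp(-d) is nearly flat beyond d ≈ 8, so distant atoms receive almost identical feature vectors, while evenly spaced Gaussians still resolve them:

```python
import numpy as np

def expnorm_rbf(d, num_rbf, cutoff):
    # Means evenly spaced between exp(-cutoff) and 1, shared beta
    # (a sketch of the PhysNet-style initialization described above).
    means = np.linspace(np.exp(-cutoff), 1.0, num_rbf)
    beta = (2.0 / num_rbf * (1.0 - np.exp(-cutoff))) ** -2
    return np.exp(-beta * (np.exp(-d[:, None]) - means[None, :]) ** 2)

def gauss_rbf(d, num_rbf, cutoff):
    # Gaussians evenly spaced in distance, width tied to the spacing.
    means = np.linspace(0.0, cutoff, num_rbf)
    beta = (cutoff / num_rbf) ** -2
    return np.exp(-beta * (d[:, None] - means[None, :]) ** 2)

far_e = expnorm_rbf(np.array([10.0, 14.0]), 50, 15.0)
far_g = gauss_rbf(np.array([10.0, 14.0]), 50, 15.0)
print(np.abs(far_e[0] - far_e[1]).max())  # tiny: expnorm barely separates 10 vs 14
print(np.abs(far_g[0] - far_g[1]).max())  # large: Gaussians still distinguish them
```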

giadefa (Contributor) commented May 31, 2021 via email

nec4 (Contributor, Author) commented May 31, 2021

That's a good point. The ExpNorm basis functions are pretty coarse at the upper end of the basis if you use a larger upper cutoff. In the CGSchNet package, we introduced another parameter, alpha, to modulate the transition between sharply peaked basis functions and more broadly shaped ones:

https://github.com/coarse-graining/cgnet/blob/934209b467047e1fdd0ceb3ade782abd71c1b275/cgnet/feature/utils.py#L241
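As a rough, hypothetical rendering of that idea (not the cgnet code itself; here alpha simply rescales the exponential feature):

```python
import numpy as np

def expnorm_rbf_alpha(d, num_rbf, cutoff, alpha=1.0):
    # Hypothetical sketch: alpha < 1 makes exp(-alpha * d) decay more
    # slowly, so the basis keeps resolution at larger distances instead
    # of saturating well before the cutoff.
    means = np.linspace(np.exp(-alpha * cutoff), 1.0, num_rbf)
    beta = (2.0 / num_rbf * (1.0 - np.exp(-alpha * cutoff))) ** -2
    return np.exp(-beta * (np.exp(-alpha * d[:, None]) - means[None, :]) ** 2)

# With alpha = 1/3 and cutoff = 15, distances of 10 and 14 no longer map
# to nearly identical feature vectors.
far = expnorm_rbf_alpha(np.array([10.0, 14.0]), 50, 15.0, alpha=1.0 / 3.0)
print(np.abs(far[0] - far[1]).max())  # clearly nonzero separation
```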

PhilippThoelke (Collaborator) commented

For larger cutoffs it seems weird. Maybe we can make the initialization parameters trainable?

These parameters are trainable if trainable_rbf is set to true. However, I don't think this really fixes the problem because these parameters usually don't change a lot during training. I have not looked into how the parameters evolve in applications where a large cutoff is required though.

PhilippThoelke (Collaborator) commented

The PR to set the max_num_neighbors argument is merged now (#22). Let me know if there is still inconsistent behavior related to the number of neighbors.
