Further Details on Non-Local Geometric Message Passing Implementation #11

Open
mouthful opened this issue Aug 2, 2023 · 4 comments

@mouthful

mouthful commented Aug 2, 2023

Hello,

Thank you for sharing this valuable codebase. So3krates is insightful work and has inspired me a lot.

While going through the code, I could not find the implementation of non-local geometric message passing (corresponding to Eq. 14 in the paper). This concept is of particular interest to me. Could you provide any guidance on where I can find the relevant implementation details?

Thank you for your kind sharing and support. I look forward to hearing from you.

@thorben-frank
Owner

Hey,

you are right, the current version does not implement it yet, since we are working on an improved mechanism that will replace the one proposed in the paper.

However, you can still find the mechanism in the v0.1 branch. Eq. (14) from the paper is essentially implemented here:

alpha_s_ij = safe_scale(alpha_ij, scale=pair_mask[:, None]*phi_chi_cut[:, None]) # shape: (n_pairs,n_heads)

where the cutoff and the construction of the neighborhood are done from

d_chi_ij = safe_scale(to_scalar_feature_norm(chi_ij, axis=-1), pair_mask) # shape: (n_pairs)

to

phi_chi_cut = safe_scale(self.chi_cut_fn(segment_softmax(d_chi_ij)), scale=pair_mask)
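For readers without the v0.1 branch at hand, the three lines referenced above can be read roughly as follows. This is a minimal, dependency-free sketch and not the repository code: the padding mask (`pair_mask`) is left out, `to_scalar_feature_norm` is simplified to a plain Euclidean distance between per-atom SPHC vectors, and the cosine shape used for `chi_cut_fn`, the helper names, and the toy data are all assumptions made for illustration.

```python
import math

def segment_softmax(values, segment_ids, num_segments):
    """Softmax over all entries that share a segment id (here: the same central atom i)."""
    seg_max = [-math.inf] * num_segments
    for v, s in zip(values, segment_ids):
        seg_max[s] = max(seg_max[s], v)
    exps = [math.exp(v - seg_max[s]) for v, s in zip(values, segment_ids)]
    seg_sum = [0.0] * num_segments
    for e, s in zip(exps, segment_ids):
        seg_sum[s] += e
    return [e / seg_sum[s] for e, s in zip(exps, segment_ids)]

def cosine_cutoff(x, x_cut):
    """Assumed cutoff shape: 1 at x = 0, decaying smoothly to 0 for x >= x_cut."""
    if x >= x_cut:
        return 0.0
    return 0.5 * (math.cos(math.pi * x / x_cut) + 1.0)

def scale_attention(alpha_ij, chi, idx_i, idx_j, x_cut):
    """Scale per-pair, per-head attention coefficients by a cutoff in SPHC space."""
    # Distance between the SPHC vectors of atoms i and j, one value per pair.
    d_chi_ij = [math.dist(chi[i], chi[j]) for i, j in zip(idx_i, idx_j)]
    # Normalize the distances within the neighborhood of each central atom i.
    d_soft = segment_softmax(d_chi_ij, idx_i, num_segments=len(chi))
    # Pairs that are far apart in SPHC space get a large softmax value and are cut.
    phi_chi_cut = [cosine_cutoff(d, x_cut) for d in d_soft]
    return [[a * phi for a in heads] for heads, phi in zip(alpha_ij, phi_chi_cut)]

# Toy data: 3 atoms with 2-dimensional SPHC vectors, all ordered pairs, 2 heads.
chi = [(0.0, 0.0), (0.0, 0.1), (3.0, 0.0)]
idx_i = [0, 0, 1, 1, 2, 2]
idx_j = [1, 2, 0, 2, 0, 1]
alpha_ij = [[1.0, 1.0] for _ in idx_i]
alpha_s_ij = scale_attention(alpha_ij, chi, idx_i, idx_j, x_cut=0.5)
```

In this toy setup, atoms 0 and 1 are close in SPHC space, so their mutual attention coefficients are almost unchanged, while the pairs pointing from them to atom 2 are scaled to zero.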

Best,
Thorben

@mouthful
Author

mouthful commented Aug 2, 2023

Hi,

Thank you very much for your prompt response and for addressing my concerns. As I am planning to utilize the non-local message passing module to investigate out-of-scale extrapolation in related tasks, I am curious about the rationale behind omitting the non-local geometric message passing in your implementation. From my understanding, this correction is significant for capturing non-local effects, which are essential for generalizing to larger molecules. Certain statements in your paper also seem to support this notion.

Additionally, I noticed that the attention coefficients are linked to a filter-generating function, $w_{ij}$, comprising a radial filter and a spherical filter. Based on this observation, would it be right to conclude that merely uncoupling geometric information from atomic features within the attention mechanism is adequate for capturing non-local effects, even when the receptive field is constrained by the radial neighborhood? Please let me know if my understanding is correct.

I appreciate your time and assistance in clarifying these points.

Best,
He Zhang

@thorben-frank
Owner

thorben-frank commented Aug 2, 2023

Hi He Zhang,

Indeed, capturing non-local effects is an important part of describing larger molecules. However, generalization to larger structures is usually achieved by building purely local representations, which can then be re-used on much larger structures by summing up local contributions. In our paper, we show that generalization to larger structures can still be achieved even though we use non-local interactions. From an ML point of view, though, non-local corrections are not necessary for generalizing to larger structures, since your model might still give you a low error on some test set. Important effects that happen on large length scales will nevertheless remain unaccounted for. Note that this also holds for a model with non-local corrections if your data does not include the length scales that you want to generalize to. I think generalization to larger structures in a physically meaningful way is still an open question, so it is nice to hear that you are investigating it. To that end, be aware that the mechanism proposed in the paper specifically targets long-range interactions that depend on the relative orientations of neighborhoods, as is the case e.g. for electronically delocalized effects. Other long-range effects, e.g. electrostatics, which depend on pairwise distances, are not captured by the mechanism. Since we are interested in describing all effects at once, we omitted it from our current code and are working towards a more general solution. I hope this explains the rationale behind our decision.

Since you said you are looking into out-of-scale extrapolation, I would recommend starting off with a fully local model. Often, these models are already quite well suited for that task. The interesting questions are then whether a non-local model can generalize at all and, if so, whether it captures the non-local parts correctly. As far as I know, no answer to this exists yet.
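The point about summing local contributions can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not So3krates: the pair term, the cutoff radius `r_cut`, and the coordinates are all made up. It only shows that an energy built from contributions inside a cutoff transfers trivially to a larger system and, at the same time, cannot see anything that happens beyond the cutoff.

```python
import math

def local_energy(positions, r_cut=3.0):
    """Total energy as a sum of atomic contributions that only see neighbors within r_cut."""
    total = 0.0
    for i, p in enumerate(positions):
        e_i = 0.0
        for j, q in enumerate(positions):
            if i == j:
                continue
            r = math.dist(p, q)
            if r < r_cut:
                # Hypothetical pair term: 1/r damped smoothly to zero at the cutoff.
                e_i += 0.5 * (math.cos(math.pi * r / r_cut) + 1.0) / r
        total += e_i
    return total

fragment = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.2, 0.0)]
copy_near = [(x + 50.0, y, z) for x, y, z in fragment]
copy_far = [(x + 100.0, y, z) for x, y, z in fragment]

e_single = local_energy(fragment)
e_double = local_energy(fragment + copy_near)
e_double_far = local_energy(fragment + copy_far)
```

Here `e_double` is twice `e_single` (up to floating-point rounding), which is the re-use of local contributions on a larger structure. `e_double_far` takes the same value as `e_double`, so any interaction between the two fragments that depends on their separation is invisible to such a model.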

Best,
Thorben

@mouthful
Author

mouthful commented Aug 2, 2023

Hi Thorben,

Thank you for your detailed response. I understand that the locality assumption based on Kohn's nearsightedness is reasonable for generalizing ML models to larger structures. The primary motivation for So3krates is to account for non-local effects that are crucial for specific molecular systems and are often neglected by purely local representations. So3krates addresses this issue by utilizing non-local geometric corrections. Consequently, So3krates models need to be trained on exactly these systems and are used for in-scale, across-conformation generalization. Thus, So3krates does not guarantee good out-of-scale generalization performance.

Regarding long-range effects (e.g., electrostatics) that depend on pairwise distances, I came across a model named Ewald-MP, which employs a nonlocal Fourier space scheme to capture these interactions. This paper might cover some scenarios you mentioned.
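To make "Fourier space scheme" concrete, here is a generic sketch of a reciprocal-space sum of the kind such approaches build on. It is not the Ewald-MP implementation: the one-dimensional periodic cell, the fixed Gaussian filter `psi` (a trained model would typically use a learned filter), the function name, and the toy inputs are all assumptions. Each atom first contributes to a structure factor per wave number `k`, and every atom then reads from all structure factors, so all pairs are coupled regardless of their distance.

```python
import cmath
import math

def fourier_messages(x, h, cell_length, n_max, sigma=2.0):
    """Long-range messages m_i = sum_k psi(k) * exp(i k x_i) * s(k) in a 1D periodic cell."""
    # Symmetric set of wave numbers k = 2 pi n / L for n = -n_max .. n_max.
    ks = [2.0 * math.pi * n / cell_length for n in range(-n_max, n_max + 1)]
    # Assumed fixed Gaussian filter over wave numbers.
    psi = [math.exp(-0.5 * (k / sigma) ** 2) for k in ks]
    # Structure factor s(k) = sum_j h_j * exp(-i k x_j): one pass over all atoms per k.
    s = [sum(h_j * cmath.exp(-1j * k * x_j) for x_j, h_j in zip(x, h)) for k in ks]
    # Each atom reads back from every structure factor.
    return [
        sum(p * cmath.exp(1j * k * x_i) * s_k for k, p, s_k in zip(ks, psi, s))
        for x_i in x
    ]

x = [0.5, 1.0, 9.0]   # atom positions inside a cell of length 20
h = [1.0, -0.5, 2.0]  # scalar per-atom features
m = fourier_messages(x, h, cell_length=20.0, n_max=8)
```

Because the wave numbers come in `+k`/`-k` pairs with equal filter values, the imaginary parts cancel and each message equals the direct pairwise sum over `h_j * psi(k) * cos(k * (x_i - x_j))`. The cost of the two-step form grows with the number of atoms times the number of wave numbers rather than with the number of pairs.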

As you mentioned, you are working towards a more general solution to encode all effects and you start with a purely local model. I am curious whether you plan to re-incorporate all possible non-local correction modules (for covering all potential long-range effects) in the future. For instance, would you consider combining the SPHC-related module and the Ewald-MP modules?

Looking forward to your thoughts.

Best,
He
