-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
First Pass at Fully Implementing the PropertyEstimator Calls #7
First Pass at Fully Implementing the PropertyEstimator Calls #7
Conversation
Cleaned up the config dictionary into a full object. Minor reformatting and code renaming to be more comprehensible.
Thanks a lot for the PR. I think the changes overall looks great, and a few minor issues can be fixed quickly. In general, I have two suggestions:
|
Made gradient calculation optional. Refactored code to reduce redundency. Temporarily made the estimation code blocking. Turned off uncertainty checks. Added support for JSON property files. Moved imports into try / except block.
if AGrad is False: | ||
return estimated_data, estimated_gradients | ||
|
||
jacobian = self._build_pvals_jacobian(mvals) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's nice to see this implementation leads to using gradients in PE. I have three questions:
- Are these gradients validated against the numerical gradients of mvals? If so, can we keep the numerical gradients code in here as a sanity check option?
- The
pvals
does not contain all changing parameters in the ForceBalance parameterization. These are the "freely moving" ones. ForceBalance also supports "parameter_eval" and "parameter_repeat", and when those are defined the one-to-one mapping_gradient_key_mappings
will lead to gradients of the evaluated parameters, (the ones not in pvals) to be lost. You can see here, thatpfields
have all fields in the Force field file that is changing. And Lee-Ping recently implemented OpenFF API calls to modify OpenFF Forcefield object. (If you merge the master branch you will get them.) The point is we want to collect gradients for allpfields
intomvals
, not justpvals
. - I think the
_build_pvals_jacobian()
gives the same result so shall we store that and skip re-computing it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Currently I'm comparing them against the values of the native FB implementation - my preference would be to use FB as the reference in this case (as opposed to the initial PE implementation) as it's a good sign when two independent code implementations are in good agreement!
-
Apologies I'm not sure that I understand this point fully - would you be able to give a few examples of cases where this would occur?
If these options aren't set frequently (or more pressing if they don't get set in the first round of fitting) it might be good for now to just raise a
NotImplementedException
in those cases, and revisit this at a later date just to help ease the time constraints in this first sprint (although if the effort to add this is minimal I'm open to trying to fix it now rather than postpone it until later 🙂)! -
I think the latency of re-calculating the jacobian is well hidden by the cost of evaluating the physical properties so it shouldn't really contribute to the performance, but happy to add the caching if you'd prefer that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For 2, I also had the same idea at first when implementing other components, but then I realized that we do need those features. An example is this test proposed by Mobley group: https://github.com/openforcefield/openforcefield/pull/384/files
If you look at line 300 in that file, the parameter_eval
rule is set to use a special sqrtk1
parameter to set the k1
parameter. Here you will get the value of sqrtk1
in the pvals
list, but actually k1
is the "real" parameter. The parameter_eval
feature can also be used to set relations between different terms, i.e. constraining k1
of two different SMIRKs to be the same.
For 3, I agree the cost of evaluating the physical properties is higher than other parts of the code, so we designed the code to distribution the evaluation cost to many "worker nodes". In comparison, the _build_pvals_jacobian
takes serial execution time on the host. Although it is cheap for small number of parameters, it's not exactly clear how mush time cost it will add eventually when the parameter relations are all evaluated in this function. Because this time cost can be easily eliminated by caching the result, I think it's better to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For 1, after think more about it, I think I can understand your reason. The general idea is that, we have (a) ForceBalance liquid target as reference, and we implemented (b) numerical gradients, it matches the reference, then we implement (c) analytical gradients and it also matches the reference. I think you're saying as long as (c) matches (a), we don't need to keep (b) in the code.
I agree with this. However I think (b) is still valuable, because when (c) does not match (a), we will have a middle step to be able to tell which part goes wrong. We have found this very helpful, and you can see examples in the implementation of (a) where Lee-Ping still keep the FDCheck code just in case. https://github.com/leeping/forcebalance/blob/8cbafff745b416b513045b1eacbd58e14af22c16/src/data/npt.py#L526
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For 2, I also had the same idea at first when implementing other components, but then I realized that we do need those features. An example is this test proposed by Mobley group: https://github.com/openforcefield/openforcefield/pull/384/files
I was just summarizing the Torsions meeting, and I think we've decided to skip improper fitting for release-1. We are probably OK to have it raise a NotImplementedError if this comes along.
I think this PR is very close to be done now! Great job. Here are a few things left:
|
improve assign_openff_parameter performance by caching parameter search
Improve optgeo target
… property_estimator
Temporarily disabled the ab initio target.
Although the numerical mvals gradients as a sanity check is a good feature to have, we agreed that we don't have time for re-adding that now, so we will revisit that later. Everything else looks good and I will merge now. |
Description
This PR is the result of a first pass of integrating the propertyestimator into the template provided by Yudong.
Status