Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First Pass at Fully Implementing the PropertyEstimator Calls #7

Merged
merged 68 commits into from
Aug 21, 2019
Merged

First Pass at Fully Implementing the PropertyEstimator Calls #7

merged 68 commits into from
Aug 21, 2019

Conversation

SimonBoothroyd
Copy link

@SimonBoothroyd SimonBoothroyd commented Jun 24, 2019

Description

This PR is the result of a first pass of integrating the propertyestimator into the template provided by Yudong.

Status

  • Ready to go

SimonBoothroyd added 2 commits June 24, 2019 19:03
Cleaned up the config dictionary into a full object.
Minor reformatting and code renaming to be more comprehensible.
@SimonBoothroyd SimonBoothroyd changed the title [WIP] First Pass at Fully Implementing the PropertyEstimator Calls First Pass at Fully Implementing the PropertyEstimator Calls Jun 25, 2019
@yudongqiu
Copy link
Owner

yudongqiu commented Jun 25, 2019

Thanks a lot for the PR. I think the changes overall looks great, and a few minor issues can be fixed quickly. In general, I have two suggestions:

  • Customize the input file and simplify the API calls to property estimator.

  • Reproduce the ForceBalance run in studies/023_property_estimator. (Sorry that I didn't include the targets folder before, I just committed a compressed file targets.tar.gz there).

SimonBoothroyd added 9 commits June 27, 2019 18:39
Made gradient calculation optional.
Refactored code to reduce redundency.
Temporarily made the estimation code blocking.
Turned off uncertainty checks.
Added support for JSON property files.
Moved imports into try / except block.
if AGrad is False:
return estimated_data, estimated_gradients

jacobian = self._build_pvals_jacobian(mvals)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's nice to see this implementation leads to using gradients in PE. I have three questions:

  1. Are these gradients validated against the numerical gradients of mvals? If so, can we keep the numerical gradients code in here as a sanity check option?
  2. The pvals does not contain all changing parameters in the ForceBalance parameterization. These are the "freely moving" ones. ForceBalance also supports "parameter_eval" and "parameter_repeat", and when those are defined the one-to-one mapping _gradient_key_mappings will lead to gradients of the evaluated parameters, (the ones not in pvals) to be lost. You can see here, that pfields have all fields in the Force field file that is changing. And Lee-Ping recently implemented OpenFF API calls to modify OpenFF Forcefield object. (If you merge the master branch you will get them.) The point is we want to collect gradients for all pfields into mvals, not just pvals.
  3. I think the _build_pvals_jacobian() gives the same result so shall we store that and skip re-computing it?

Copy link
Author

@SimonBoothroyd SimonBoothroyd Jul 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Currently I'm comparing them against the values of the native FB implementation - my preference would be to use FB as the reference in this case (as opposed to the initial PE implementation) as it's a good sign when two independent code implementations are in good agreement!

  2. Apologies I'm not sure that I understand this point fully - would you be able to give a few examples of cases where this would occur?

    If these options aren't set frequently (or more pressing if they don't get set in the first round of fitting) it might be good for now to just raise a NotImplementedException in those cases, and revisit this at a later date just to help ease the time constraints in this first sprint (although if the effort to add this is minimal I'm open to trying to fix it now rather than postpone it until later 🙂)!

  3. I think the latency of re-calculating the jacobian is well hidden by the cost of evaluating the physical properties so it shouldn't really contribute to the performance, but happy to add the caching if you'd prefer that?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 2, I also had the same idea at first when implementing other components, but then I realized that we do need those features. An example is this test proposed by Mobley group: https://github.com/openforcefield/openforcefield/pull/384/files
If you look at line 300 in that file, the parameter_eval rule is set to use a special sqrtk1 parameter to set the k1 parameter. Here you will get the value of sqrtk1 in the pvals list, but actually k1 is the "real" parameter. The parameter_eval feature can also be used to set relations between different terms, i.e. constraining k1 of two different SMIRKs to be the same.

For 3, I agree the cost of evaluating the physical properties is higher than other parts of the code, so we designed the code to distribution the evaluation cost to many "worker nodes". In comparison, the _build_pvals_jacobian takes serial execution time on the host. Although it is cheap for small number of parameters, it's not exactly clear how mush time cost it will add eventually when the parameter relations are all evaluated in this function. Because this time cost can be easily eliminated by caching the result, I think it's better to do that.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 1, after think more about it, I think I can understand your reason. The general idea is that, we have (a) ForceBalance liquid target as reference, and we implemented (b) numerical gradients, it matches the reference, then we implement (c) analytical gradients and it also matches the reference. I think you're saying as long as (c) matches (a), we don't need to keep (b) in the code.
I agree with this. However I think (b) is still valuable, because when (c) does not match (a), we will have a middle step to be able to tell which part goes wrong. We have found this very helpful, and you can see examples in the implementation of (a) where Lee-Ping still keep the FDCheck code just in case. https://github.com/leeping/forcebalance/blob/8cbafff745b416b513045b1eacbd58e14af22c16/src/data/npt.py#L526

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 2, I also had the same idea at first when implementing other components, but then I realized that we do need those features. An example is this test proposed by Mobley group: https://github.com/openforcefield/openforcefield/pull/384/files

I was just summarizing the Torsions meeting, and I think we've decided to skip improper fitting for release-1. We are probably OK to have it raise a NotImplementedError if this comes along.

@yudongqiu
Copy link
Owner

yudongqiu commented Jul 10, 2019

I think this PR is very close to be done now! Great job.

Here are a few things left:

  • Include updated forcebalance/studies/023_property_estimator/optimize.out

  • keep numerical mvals gradients option as a sanity check.

  • Try improve jacobian calculation to support parameter_eval and parameter_repeat features. If it's too much work, we can revisit this later, and the sanity check with numerical gradients will be very helpful validating this.

@yudongqiu
Copy link
Owner

Although the numerical mvals gradients as a sanity check is a good feature to have, we agreed that we don't have time for re-adding that now, so we will revisit that later.

Everything else looks good and I will merge now.

@yudongqiu yudongqiu merged commit 5511216 into yudongqiu:property_estimator Aug 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants