Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable coordinate modification in the EEXE framework #15

Closed
wehs7661 opened this issue Jun 12, 2023 · 4 comments · Fixed by #16 or #21
Closed

Enable coordinate modification in the EEXE framework #15

wehs7661 opened this issue Jun 12, 2023 · 4 comments · Fixed by #16 or #21
Assignees
Labels
documentation Improvements or additions to documentation enhancement New feature or request

Comments

@wehs7661
Copy link
Owner

wehs7661 commented Jun 12, 2023

To expand the usage of EEXE, we want to enable coordinate manipulation at exchanges between replicas, which is most likely to be useful for estimating the free energy of multiple serial mutations using expanded ensemble simulations, such as mutating methane into ethane and then propane.

For example, we can have an EEXE simulation composed of two replicas mutating methane into ethane and ethane into propane, respectively, and only exchange the coordinates between replicas when they are at the end states, i.e., replica 1 being at λ=1 and replica 2 being at λ=0. In this example, we will have the following end states:

  • Replica 1: Mutating methane to ethane
    • State a: At λ = 0, we have methane with a dummy methyl group.
    • State b: At λ = 1, we have ethane with a dummy H atom.
  • Replica 2: Mutating ethane to propane
    • State c: At λ = 0, we have ethane with a dummy methyl group.
    • State d: At λ = 1, we have propane with a dummy H atom.

At exchanges, we will have two output gro files respectively from replicas 1 and 2, namely rep1.gro (state b, ethane with a dummy H atom at the first carbon) and rep2.gro (state c, ethane with a dummy ethyl group at the second carbon).

Note that in EEXE, each replica is bound to the transformation for its assigned alchemical range. In our case, this means that replica 1 will only be responsible for the mutation of a methane to an ethane, and replica 2 will only be responsible for mutating an ethane to a propane. Normally, we would just swap the gro files as is, so in the next iteration, replica 1 will be initialized with rep2.gro and sample the intermediate states along the mutation path between methane and ethane. However, rep2.gro is an ethane with a dummy methyl group, not an ethane with a dummy H atom that we need for such sampling. The same thing would happen when trying to initialize the next iteration of replica 2 using rep1.gro.

To address this issue, we can modify rep2.gro as follows and use it to proceed to the next iteration of replica 1:

  • Remove the dummy methyl group at the second carbon atom from rep2.gro.
  • Attach a dummy H atom to the first carbon atom in rep2.gro. Specifically, the coordinate of the dummy H atom can just be the coordinates of the second carbon atom. There won't be clashes since the dummy H atoms have no interactions with the rest of the system.

Similarly, we can modify rep1.gro as follows for the next iteration of replica 2:

  • Remove the dummy H atom at the first carbon from rep1.gro.
  • Attach a dummy methyl group at the second carbon atom in rep1.gro. Specifically, we can take the internal coordinates of the methyl group in rep2.gro, treat the group as rigid, rotate, and attach the group to the second carbon atom in rep1.gro.

Importantly, we can make the two modified gro files have the same potential energy, so the proposed exchange will always be adopted.

Here, we are not going to implement functions for coordinate manipulation in EEXE but modify the CLI run_EEXE (and the function run_grompp in ensemble_EXE.py, if necessary) to allow the flexibility of calling a user-defined function for coordinate manipulation from an input python module (where the user-defined function is defined).

@wehs7661 wehs7661 self-assigned this Jun 12, 2023
@wehs7661 wehs7661 added documentation Improvements or additions to documentation enhancement New feature or request labels Jun 12, 2023
@wehs7661 wehs7661 changed the title Enable coordinate modification in the EEXE framework Enable chemistry transformation in the EEXE framework Jul 4, 2023
@wehs7661
Copy link
Owner Author

wehs7661 commented Jul 4, 2023

To better keep track of all efforts relevant to enabling chemistry transformation in the EEXE framework, I've expanded the scope of the original issue. Specifically, at this point, we have enabled coordinate manipulation, but the following two features still need to be enabled in the near term. For details on each of these tasks, please refer to the corresponding issues or PRs.

  • Enable specifying different GRO and TOP files for different replicas.
  • Enable specifying different sets of weights for different replicas.
  • Implement the CLI run_EEXE_mpi.

@wehs7661 wehs7661 linked a pull request Jul 4, 2023 that will close this issue
@wehs7661 wehs7661 pinned this issue Jul 5, 2023
@wehs7661 wehs7661 changed the title Enable chemistry transformation in the EEXE framework Enable coordinate modification in the EEXE framework Jul 5, 2023
@wehs7661
Copy link
Owner Author

wehs7661 commented Jul 5, 2023

It might make more sense to use a GitHub project to keep track of relevant efforts (instead of using an issue to track other issues), so I'm changing the title of the issue back to its original and closing the issue now.

This issue is a part of the work in the project EEXE for serial mutations.

@wehs7661
Copy link
Owner Author

wehs7661 commented Jul 8, 2023

I'm re-opening the issue since the original implementation failed would fail to import the external module using importlib. This is because the working directory where run_EEXE is executed is not in sys.path that importlib looks for the module. Appending the current working directory to sys.path should solve the problem.

Also, the current implementation assumes that the function for modifying coordinates takes only one input GRO file, but it seems more common that such a function takes in two input GRO files. For example, in the case described above, to modify rep1.gro, we'll also need coordinates of the methyl group from rep2.gro. This has to be dealt with and we should probably explore options that allow higher flexibility when possible.

@wehs7661
Copy link
Owner Author

Closing the issue since the abovementioned problems have been fixed in PR #21.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request
Projects
1 participant