How to choose parameter when plotting free energy surfaces? #16

Open
yusowa0716 opened this issue Jan 25, 2024 · 2 comments

Comments

@yusowa0716

Hello,

Thank you for the awesome work provided.

After reading both the paper and the tutorial, I noticed that the tutorial focuses exclusively on a single protein. While constructing the Free Energy Surface (FES), I've identified several parameters that need to be defined, including the TICA lag time, the TICA dimensionality, the number of k-means clusters, the MSM lag time, and the number of MSM macrostates. For the reference MD simulations, these parameters are indicated in the filenames. However, for the coarse-grained MD simulations, the tutorial only supplies details for protein G.

It would be immensely helpful if a table could be compiled detailing these parameters for all 12 proteins, for both the REFERENCE and CG MD simulations. Including the 'skip' values used in the CG MD would also be of great assistance. If possible, I would greatly appreciate any guidance or shared experience on choosing these parameters for novel proteins or simulations.

Thank you once again for your time and the valuable resources you have provided. I look forward to any assistance you can offer on this matter.

Warm regards,
XJTUNR

@AdriaPerezCulubret
Collaborator

Hi!

For the reference MSMs, you can use a TICA lag time of 20 steps for all models, and project onto the first 3 TICA dimensions. The number of k-means clusters varies between 600 and 1200 depending on the system, but you should get similar results with any number of clusters. The MSM lag time and the number of macrostates also depend on the system.
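To make the TICA step concrete, here is a minimal NumPy sketch of a TICA projection with a lag of 20 steps and 3 output dimensions. It is a generic textbook formulation (symmetrized covariances, whitening, eigendecomposition), not the code used in this repository, and the function name `tica_project` and its arguments are hypothetical.

```python
import numpy as np


def tica_project(features, lag=20, n_dims=3):
    """Project a (n_frames, n_features) array onto its slowest TICA components.

    Generic sketch, assuming one continuous trajectory sampled at a fixed interval.
    """
    x = np.asarray(features, dtype=float)
    x = x - x.mean(axis=0)              # remove the mean of every feature
    x0, xt = x[:-lag], x[lag:]          # frame pairs separated by `lag` steps
    n = x0.shape[0]

    # Symmetrized instantaneous and time-lagged covariance matrices.
    c0 = (x0.T @ x0 + xt.T @ xt) / (2 * n)
    ct = (x0.T @ xt + xt.T @ x0) / (2 * n)

    # Whiten with C0, dropping numerically degenerate directions.
    s, u = np.linalg.eigh(c0)
    keep = s > 1e-10 * s.max()
    w = u[:, keep] / np.sqrt(s[keep])

    # Eigenvectors of the whitened lagged covariance, slowest first.
    vals, vecs = np.linalg.eigh(w.T @ ct @ w)
    order = np.argsort(vals)[::-1][:n_dims]
    components = w @ vecs[:, order]
    return x @ components, vals[order]
```

With several trajectories, the covariances would have to be accumulated per trajectory before the eigendecomposition, which this sketch does not do.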

For the CG models, we used more or less the same hyperparameters for all models. The TICA lag is the same as for the reference, since we are using the same covariances. We projected onto 3 TICA dimensions, clustered into 200 k-means clusters, and used an MSM lag time of 0.01 ns and between 3 and 5 macrostates. The only exception, I believe, is chignolin, where we used a lag time of 0.001 ns.
Here's the exact command: cgmodel.markovModel(0.01, 5, units='ns')

The CG models where we skipped frames are lambda-repressor and protein G, where we skipped every 2 frames.
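Until exact per-protein values are available, the CG guidelines above can be collected in a small lookup. This is only a sketch: the dictionary keys and the function `cg_hyperparameters` are hypothetical names, `skip=1` for the other proteins is an assumption (no skipping was mentioned for them), and reading "skipped every 2 frames" as `skip=2` is also an assumption.

```python
# General CG guidelines from this thread; not exact per-protein values.
CG_DEFAULTS = {
    "tica_lag_steps": 20,        # same as the reference models
    "tica_dims": 3,
    "kmeans_clusters": 200,
    "msm_lag_ns": 0.01,
    "macrostates_range": (3, 5),  # system dependent, choose within this range
    "skip": 1,                   # assumption: no frame skipping unless stated
}

# Exceptions mentioned in the thread (protein labels are illustrative keys).
CG_OVERRIDES = {
    "chignolin": {"msm_lag_ns": 0.001},
    "lambda-repressor": {"skip": 2},
    "protein g": {"skip": 2},
}


def cg_hyperparameters(protein):
    """Return the guideline hyperparameters for a CG model of `protein`."""
    params = dict(CG_DEFAULTS)
    params.update(CG_OVERRIDES.get(protein.strip().lower(), {}))
    return params
```

For example, `cg_hyperparameters("Protein G")` returns the defaults with `skip` set to 2.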

If I can find time I'll upload all the exact values in the repo both for reference and CG, but these are the general guidelines.
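For readers who want to see how these choices feed into the surface itself, here is a minimal NumPy sketch of a 2D free energy surface computed from two projected coordinates via F = -kT ln p. It is not the plotting code of this repository. The function name, the bin count, and the 300 K default temperature are placeholders, and the optional `weights` argument assumes you already have per-frame weights (for instance from the MSM equilibrium distribution).

```python
import numpy as np

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def free_energy_surface(x, y, weights=None, bins=80, temperature=300.0):
    """Return (fes, xedges, yedges) with fes in kcal/mol, shifted so min = 0.

    x, y: the two coordinates per frame (e.g. the first two TICA dimensions).
    weights: optional per-frame weights; unweighted counts are used if None.
    """
    hist, xedges, yedges = np.histogram2d(x, y, bins=bins, weights=weights)
    prob = hist / hist.sum()
    with np.errstate(divide="ignore"):
        # Empty bins have zero probability and get an infinite free energy.
        fes = -KB_KCAL_PER_MOL_K * temperature * np.log(prob)
    fes -= fes[np.isfinite(fes)].min()
    return fes, xedges, yedges
```

The returned array can be passed to a contour plot (transposed, since `histogram2d` puts x along the first axis); bins that were never visited stay at infinity and show up as blank regions.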

@yusowa0716
Author

Thanks for your general guidelines.

I'm looking forward to the exact values for each protein.
