-
Notifications
You must be signed in to change notification settings - Fork 720
Description
I'm just passing along an anonymous report that the net charge of amino acids read in using some AMBER-related topologies appears to cause some confusion (apparently this came up in a peer review criticism). I don't know if we are just "working as intended here," if there might be a small casting/data type issue, or if we just need to improve the docs a little. They didn't provide me with a programmatic reproducer, but I did get a brief verbal description on the whiteboard. I've tried to reconstruct into a minimum viable reproducer below, where some charges look a bit more sensible in float32
/rounded/truncated format, though something may still be off in some cases.
This might normally appear to be a fairly minor matter, but in the context of certain machine learning workflows the net charge on the residues can be a problematic feature apparently.
import numpy as np
from MDAnalysisTests.datafiles import PRM
import MDAnalysis as mda
u = mda.Universe(PRM)
for res in u.residues:
manual_charge = 0
for atom in res.atoms:
manual_charge += np.float32(atom.charge)
print(res.resname, res.charge, manual_charge)
ALA 1.0000000558793545 0.9999998
GLU -0.9999999655410647 -1.0
PHE 3.888271749019623e-08 5.9604645e-08
HIE 1.3504177331924438e-08 5.9604645e-08
ARG 0.9999999335850589 1.0
TRP 2.0954757928848267e-08 5.9604645e-08
SER 1.862645149230957e-08 0.0
SER 1.862645149230957e-08 0.0
TYR 7.799826562404633e-09 0.0
MET 3.8533471524715424e-08 5.9604645e-08
VAL 4.0978193283081055e-08 5.9604645e-08
HIE 1.3504177331924438e-08 5.9604645e-08
TRP 2.0954757928848267e-08 5.9604645e-08
LYS -6.891787052154541e-08 -1.1920929e-07
Are we "ok" here, or do we need a small tweak somewhere?
Activity
MAINT: rounding residue partial charges
orbeckst commentedon May 9, 2025
This looks all the same to me within machine precision for float32 (see below) so if the intended precision is float32 then I don't see a fundamental issue.
The PRM file specs https://ambermd.org/FileFormats.php#topology say that these are
E16.8
numbers. With 8 decimals and charges not larger than ±1 this implies intent for something closer to IEE754 single (~np.float32) than double although there's obviously no direct connection.All of this suggests to me that in principle we're doing nothing wrong by treating AMBER charges as float32.
However, if there's a better/more consistent way to present the data then I am not opposed.
tylerjereddy commentedon May 9, 2025
Are you for or against rounding per the matching PR?
orbeckst commentedon May 9, 2025
Just commented on the PR #5035 (comment) – in short: against rounding by default, would support option to do so, ultimately give the users tools.
orbeckst commentedon May 9, 2025
And would like to hear others chime in!
IAlibay commentedon May 13, 2025
My suggestion for the AMBER topology parser is to round at 8 decimals, this might fix this in this case.
This is a really hard one, which is why I've been dragging my feet on giving a response. I think I'm weakly in favour of rounding but at 1e-7 or something like that.