Skip to content

Query: amino acid residue charges #5032

@tylerjereddy

Description

@tylerjereddy
Member

I'm just passing along an anonymous report that the net charge of amino acids read in using some AMBER-related topologies appears to cause some confusion (apparently this came up in a peer review criticism). I don't know if we are just "working as intended here," if there might be a small casting/data type issue, or if we just need to improve the docs a little. They didn't provide me with a programmatic reproducer, but I did get a brief verbal description on the whiteboard. I've tried to reconstruct into a minimum viable reproducer below, where some charges look a bit more sensible in float32/rounded/truncated format, though something may still be off in some cases.

This might normally appear to be a fairly minor matter, but in the context of certain machine learning workflows the net charge on the residues can be a problematic feature apparently.

import numpy as np

from MDAnalysisTests.datafiles import PRM
import MDAnalysis as mda


u = mda.Universe(PRM)
for res in u.residues:
    manual_charge = 0
    for atom in res.atoms:
        manual_charge += np.float32(atom.charge)
    print(res.resname, res.charge, manual_charge)
ALA 1.0000000558793545 0.9999998
GLU -0.9999999655410647 -1.0
PHE 3.888271749019623e-08 5.9604645e-08
HIE 1.3504177331924438e-08 5.9604645e-08
ARG 0.9999999335850589 1.0
TRP 2.0954757928848267e-08 5.9604645e-08
SER 1.862645149230957e-08 0.0
SER 1.862645149230957e-08 0.0
TYR 7.799826562404633e-09 0.0
MET 3.8533471524715424e-08 5.9604645e-08
VAL 4.0978193283081055e-08 5.9604645e-08
HIE 1.3504177331924438e-08 5.9604645e-08
TRP 2.0954757928848267e-08 5.9604645e-08
LYS -6.891787052154541e-08 -1.1920929e-07

Are we "ok" here, or do we need a small tweak somewhere?

Activity

added a commit that references this issue on Apr 18, 2025
36662df
linked a pull request that will close this issue on Apr 18, 2025
orbeckst

orbeckst commented on May 9, 2025

@orbeckst
Member

This looks all the same to me within machine precision for float32 (see below) so if the intended precision is float32 then I don't see a fundamental issue.

>>> np.finfo(np.float32).eps
1.1920929e-07

>>> np.finfo(np.float64).eps
2.220446049250313e-16

The PRM file specs https://ambermd.org/FileFormats.php#topology say that these are E16.8 numbers. With 8 decimals and charges not larger than ±1 this implies intent for something closer to IEE754 single (~np.float32) than double although there's obviously no direct connection.

%FLAG CHARGE
%FORMAT(5E16.8)  (CHARGE(i), i=1,NATOM)
  CHARGE : the atom charges.  Amber internally uses units of charge such
           that E = q1*q2/r, where E is in kcal/mol, r is in Angstrom, 
           and q1,q2 are the values found in this section of the prmtop file.

           Codes that write prmtop files need to ensure that the values
           entered in this section satisfy the rule given above; similarly,
          codes that read prmtop files need to interpret these values in
           the same way that Amber does.  (This is, after all, an "Amber"
           file format!)

           [Aside: Conversion of these internal values to units of electron 
           charge depends of what values are chosen for various fundamental
           constants.  For reasons lost to history, Amber force field 
           development has been based on the formula: q(internal units) =
           q(electron charge units)*18.2223.  Gromacs and some other programs
           currently [2015] use a conversion factor of 18.222615, which
           more closely agrees with currently-accepted values for fundamental
           constants.  Conversions to or from the prmtop format may need 
           to take account of such small differences, if they are important.]

All of this suggests to me that in principle we're doing nothing wrong by treating AMBER charges as float32.

However, if there's a better/more consistent way to present the data then I am not opposed.

tylerjereddy

tylerjereddy commented on May 9, 2025

@tylerjereddy
MemberAuthor

Are you for or against rounding per the matching PR?

orbeckst

orbeckst commented on May 9, 2025

@orbeckst
Member

Just commented on the PR #5035 (comment) – in short: against rounding by default, would support option to do so, ultimately give the users tools.

orbeckst

orbeckst commented on May 9, 2025

@orbeckst
Member

And would like to hear others chime in!

IAlibay

IAlibay commented on May 13, 2025

@IAlibay
Member

these are E16.8 numbers. With 8 decimals and charges not larger than ±1

My suggestion for the AMBER topology parser is to round at 8 decimals, this might fix this in this case.

Are you for or against rounding

This is a really hard one, which is why I've been dragging my feet on giving a response. I think I'm weakly in favour of rounding but at 1e-7 or something like that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      Participants

      @orbeckst@tylerjereddy@IAlibay

      Issue actions

        Query: amino acid residue charges · Issue #5032 · MDAnalysis/mdanalysis