Add protonation functions #99

Open: wants to merge 5 commits into base v1.0/dev

Conversation

jonwzheng
Collaborator

Motivation or Problem

This PR adds several helper functions related to protonation & ionization.

Description of Changes

New "big" functions:

  • uncharge_mol(mol, method): Input = charged molecule (ion or zwitterion), output = uncharged form. Provides two uncharging algorithms; by default it tries the RDKit algorithm first and falls back to the other one if that fails.
  • is_symmetric_to_substructure(mol, substructure): Check whether a mol is symmetric with respect to a provided substructure, e.g. return True when comparing ethylene glycol to an "OH" substructure.
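A minimal sketch of the fallback logic described for uncharge_mol, assuming RDKit's rdMolStandardize.Uncharger is the "rdkit" method. The PR's second algorithm is not shown here, so this sketch swaps in the "Neutralizing Molecules" recipe from the RDKit Cookbook as a stand-in. The method names "rdkit", "cookbook" and "auto" are hypothetical, not the PR's API.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

# SMARTS from the RDKit Cookbook "Neutralizing Molecules" recipe: charged atoms
# that are not directly bonded to an oppositely charged atom.
_NEUTRALIZE_PATTERN = Chem.MolFromSmarts(
    "[+1!h0!$([*]~[-1,-2,-3,-4]),-1!$([*]~[+1,+2,+3,+4])]"
)


def _uncharge_rdkit(mol):
    """Neutralize with RDKit's built-in Uncharger (returns a new mol)."""
    return rdMolStandardize.Uncharger().uncharge(mol)


def _uncharge_cookbook(mol):
    """Neutralize by adjusting the H count of each matched charged atom."""
    mol = Chem.Mol(mol)  # work on a copy so the input is not modified
    for (idx,) in mol.GetSubstructMatches(_NEUTRALIZE_PATTERN):
        atom = mol.GetAtomWithIdx(idx)
        charge = atom.GetFormalCharge()
        n_h = atom.GetTotalNumHs()
        atom.SetFormalCharge(0)
        atom.SetNumExplicitHs(n_h - charge)  # anion gains H, cation loses H
        atom.UpdatePropertyCache()
    return mol


def _is_neutral(mol):
    return all(atom.GetFormalCharge() == 0 for atom in mol.GetAtoms())


def uncharge_mol(mol, method="auto"):
    """Hypothetical sketch: 'auto' tries the RDKit Uncharger first and falls
    back to the Cookbook recipe if it raises or leaves charged atoms."""
    if method == "rdkit":
        return _uncharge_rdkit(mol)
    if method == "cookbook":
        return _uncharge_cookbook(mol)
    if method != "auto":
        raise ValueError(f"unknown method: {method!r}")
    try:
        result = _uncharge_rdkit(mol)
        if _is_neutral(result):
            return result
    except Exception:  # e.g. a sanitization error raised during uncharging
        pass
    return _uncharge_cookbook(mol)
```

Checking for leftover formal charges, rather than only catching exceptions, is one way to decide when to fall back. Note that a permanently charged group such as a quaternary ammonium still comes back charged from both methods.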

Helper functions:

  • protonate_at_site(mol, site): Add a proton to a mol at a given atom index and adjust formal charges
  • deprotonate_at_site(mol, site): Remove a proton from a mol at a given atom index and adjust formal charges
  • is_implicit(mol): Infer whether a mol object uses implicit or explicit hydrogens
  • find_symmetry_classes(mol): Provides a set of symmetry classes for the atoms in a mol object, based on code by Greg Landrum.
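A sketch of how the protonation helpers could branch on the hydrogen representation. is_implicit here is a hypothetical heuristic (any hydrogen stored as a graph atom means "explicit"); the PR's own inference and its handling of atom maps may differ.

```python
from rdkit import Chem


def is_implicit(mol):
    """Heuristic: True if no hydrogen is stored as an explicit graph atom."""
    return not any(atom.GetAtomicNum() == 1 for atom in mol.GetAtoms())


def protonate_at_site(mol, site):
    """Add H+ to atom `site`: formal charge +1 and one more hydrogen."""
    rw = Chem.RWMol(mol)
    atom = rw.GetAtomWithIdx(site)
    n_h = atom.GetTotalNumHs()  # hydrogens not stored as graph atoms
    atom.SetFormalCharge(atom.GetFormalCharge() + 1)
    if is_implicit(mol):
        # implicit-H mol: raise the H count stored on the atom itself
        atom.SetNumExplicitHs(n_h + 1)
        atom.SetNoImplicit(True)
    else:
        # explicit-H mol: add a real H atom bonded to the site
        h_idx = rw.AddAtom(Chem.Atom(1))
        rw.AddBond(site, h_idx, Chem.BondType.SINGLE)
    Chem.SanitizeMol(rw)
    return rw.GetMol()


def deprotonate_at_site(mol, site):
    """Remove H+ from atom `site`: formal charge -1 and one fewer hydrogen."""
    rw = Chem.RWMol(mol)
    atom = rw.GetAtomWithIdx(site)
    if is_implicit(mol):
        n_h = atom.GetTotalNumHs()
        if n_h == 0:
            raise ValueError(f"atom {site} has no hydrogen to remove")
        atom.SetNumExplicitHs(n_h - 1)
        atom.SetNoImplicit(True)
        atom.SetFormalCharge(atom.GetFormalCharge() - 1)
    else:
        h_neighbors = [n.GetIdx() for n in atom.GetNeighbors() if n.GetAtomicNum() == 1]
        if not h_neighbors:
            raise ValueError(f"atom {site} has no hydrogen neighbor to remove")
        atom.SetFormalCharge(atom.GetFormalCharge() - 1)
        rw.RemoveAtom(h_neighbors[0])  # also removes the bond to the site
    Chem.SanitizeMol(rw)
    return rw.GetMol()
```

Keeping one code path per representation means an explicit-H input stays explicit (a real H atom is added or removed), which is also the behavior the implicit_h branches would need tests for.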

Testing

I included pytest modules for uncharge_mol and is_symmetric_to_substructure.
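A pytest-style sketch of what symmetry tests could look like. find_symmetry_classes below uses Chem.CanonicalRankAtoms(..., breakTies=False), one common RDKit approach, and is not necessarily the code this PR adapts. The semantics of is_symmetric_to_substructure are an assumption: it returns True only when the substructure occurs at least twice and every occurrence maps onto the same symmetry classes.

```python
from rdkit import Chem


def find_symmetry_classes(mol):
    """Return the topological symmetry classes of `mol` as a set of frozensets."""
    # breakTies=False leaves symmetry-equivalent atoms with the same rank
    ranks = Chem.CanonicalRankAtoms(mol, breakTies=False)
    classes = {}
    for idx, rank in enumerate(ranks):
        classes.setdefault(rank, set()).add(idx)
    return {frozenset(members) for members in classes.values()}


def is_symmetric_to_substructure(mol, substructure):
    """Hypothetical semantics: True if `substructure` occurs at least twice and
    every occurrence maps onto the same sequence of symmetry ranks."""
    ranks = list(Chem.CanonicalRankAtoms(mol, breakTies=False))
    matches = mol.GetSubstructMatches(substructure)
    if len(matches) < 2:
        return False
    signatures = {tuple(ranks[i] for i in match) for match in matches}
    return len(signatures) == 1


HYDROXYL = Chem.MolFromSmarts("[OX2H]")  # an -OH oxygen


def test_ethylene_glycol_is_symmetric_to_oh():
    assert is_symmetric_to_substructure(Chem.MolFromSmiles("OCCO"), HYDROXYL)


def test_propylene_glycol_is_not_symmetric_to_oh():
    assert not is_symmetric_to_substructure(Chem.MolFromSmiles("CC(O)CO"), HYDROXYL)


def test_symmetry_classes_of_ethylene_glycol():
    classes = find_symmetry_classes(Chem.MolFromSmiles("OCCO"))
    assert classes == {frozenset({0, 3}), frozenset({1, 2})}
```

Ethylene glycol passes because its two hydroxyl oxygens share a rank, while 1,2-propanediol fails because its primary and secondary OH groups do not.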

Other notes

The two uncharging methods behave differently with respect to explicit hydrogens.

Chem.MolToSmiles(uncharge_mol(mol_from_smiles("[C:1]([C:2]([C:3]([C:4](=[O:5])[O-:6])([H:12])[H:13])([H:10])[H:11])([H:7])([H:8])[H:9]"), method="rdkit"))
>> '[CH3:1][CH2:2][CH2:3][C:4](=[O:5])[OH:6]'

vs.

Chem.MolToSmiles(uncharge_mol(mol_from_smiles("[C:1]([C:2]([C:3]([C:4](=[O:5])[O-:6])([H:12])[H:13])([H:10])[H:11])([H:7])([H:8])[H:9]"), method="nocharge"))
>> '[H][O:6][C:4]([C:3]([C:2]([C:1]([H:7])([H:8])[H:9])([H:10])[H:11])([H:12])[H:13])=[O:5]'

Is the desired behavior to re-number?

They still return the same SMILES if you use mol_to_smiles, though.
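That both outputs describe the same molecule can be checked with plain RDKit. heavy_atom_smiles is a hypothetical helper, not rdtools' mol_to_smiles; it assumes "same SMILES" means the same connectivity once atom maps and explicit H atoms are dropped. It parses the two outputs quoted above with hydrogens kept, then folds them back with Chem.RemoveHs.

```python
from rdkit import Chem


def heavy_atom_smiles(smiles):
    """Canonical SMILES without atom maps or explicit H atoms (hypothetical helper)."""
    params = Chem.SmilesParserParams()
    params.removeHs = False  # keep [H] atoms so both inputs are parsed as written
    mol = Chem.MolFromSmiles(smiles, params)
    for atom in mol.GetAtoms():
        atom.SetAtomMapNum(0)  # drop atom maps so only connectivity is compared
    return Chem.MolToSmiles(Chem.RemoveHs(mol))


rdkit_out = "[CH3:1][CH2:2][CH2:3][C:4](=[O:5])[OH:6]"
nocharge_out = (
    "[H][O:6][C:4]([C:3]([C:2]([C:1]([H:7])([H:8])[H:9])"
    "([H:10])[H:11])([H:12])[H:13])=[O:5]"
)
print(heavy_atom_smiles(rdkit_out) == heavy_atom_smiles(nocharge_out))
```

Both strings reduce to butanoic acid, so the difference is only in how hydrogens and atom indices are carried, not in the chemistry.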

jonwzheng changed the base branch from main to v1.0/dev on June 4, 2024 17:20

codecov bot commented Jun 4, 2024

Codecov Report

Attention: Patch coverage is 75.78947% with 23 lines in your changes missing coverage. Please review.

Project coverage is 60.93%. Comparing base (a1db373) to head (7d20e94).

File             Patch %   Lines
rdtools/mol.py   63.49%    12 missing and 11 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           v1.0/dev      #99      +/-   ##
============================================
+ Coverage     60.62%   60.93%   +0.30%     
============================================
  Files            67       67              
  Lines          4874     4969      +95     
  Branches       1199     1227      +28     
============================================
+ Hits           2955     3028      +73     
- Misses         1779     1790      +11     
- Partials        140      151      +11     


@jonwzheng
Collaborator Author

Based on the Codecov report, I may need to add more tests to properly cover the implicit_h logic.

@xiaoruiDong
Owner

@jonwzheng, thank you for this addition. I will work on reviewing this PR shortly.
