Description
Hi all,
I've been using OpenMM to run simulations for the past couple of years. Unlike other tools, OpenMM does not write a 'topology' file (e.g. .gro, .prmtop) but instead creates a Topology object on the fly when loading structures. Because simulation systems tend to be quite large, I've taken to using PDBx/mmCIF files as my default file format for writing the structures that I use as topologies.
I'd like to start using MDAnalysis, but right now this involves jumping through a bunch of hoops to get my topologies into a format that is parseable. It'd be much easier if I could just load a PDBx/mmCIF file as a topology, especially since it's now the default file format for structures in the PDB.
To this end, I've started working on a simple PDBxParser class. I wouldn't mind extending it to a PDBxReader/Writer class, but that is not necessarily a priority (especially the writer). Would this be an interesting feature to add, in your opinion? See the code here; I modeled the class after PDBParser.
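As a rough illustration of what such a topology parser has to do (this is not the linked code), the sketch below pulls per-atom fields out of the _atom_site loop of an mmCIF file using only the Python standard library. The function name parse_atom_site and the whitespace-only tokenization are assumptions made for the example; real files can contain quoted values with spaces and multi-line text fields, which this does not handle.

```python
def parse_atom_site(text):
    """Return one dict per atom from the first _atom_site loop in `text`.

    Assumption: values contain no quoted whitespace, so str.split() is enough.
    """
    columns = []      # item names of the _atom_site loop, in file order
    atoms = []        # one dict per data row
    state = "search"  # search -> header -> rows
    for raw in text.splitlines():
        line = raw.strip()
        if state == "search":
            if not line.startswith("_atom_site."):
                continue
            state = "header"
        if state == "header":
            if line.startswith("_atom_site."):
                # "_atom_site.label_atom_id" -> "label_atom_id"
                columns.append(line.split(".", 1)[1])
                continue
            state = "rows"
        # state == "rows": stop at the end of the loop
        if not line or line.startswith(("#", "_", "loop_", "data_")):
            break
        values = line.split()
        if len(values) != len(columns):
            raise ValueError(f"expected {len(columns)} values, got {len(values)}")
        atoms.append(dict(zip(columns, values)))
    return atoms


# Hand-written example input (hypothetical, not taken from a real entry).
EXAMPLE = """\
data_demo
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM 1 N  MET A 26.981 53.977 40.085
ATOM 2 CA MET A 26.091 52.849 39.889
#
"""

atoms = parse_atom_site(EXAMPLE)
names = [a["label_atom_id"] for a in atoms]
resnames = [a["label_comp_id"] for a in atoms]
```

The resulting lists (atom names, residue names, chain IDs) are the kind of per-atom attributes a topology parser would hand over to the Topology object.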
Thanks for the great work with the library so far!
EDIT: Since I cannot label issues, I'm just editing the title of the issue for now to make it clear!
Activity
orbeckst commented on Oct 17, 2019
Hi @JoaoRodrigues , new formats are a great addition. Can you just create a PR and then we can comment directly on the code, run tests, etc?
JoaoRodrigues commented on Oct 17, 2019
Sounds good. Will do. Thanks!
orbeckst commented on Jul 30, 2020
Pure Python implementation: https://github.com/Electrostatics/mmcif_pdbx
orbeckst commented on Jul 30, 2020
Also, chemfiles reads mmCIF and can be used inside MDAnalysis, see Reading trajectories with chemfiles.
orbeckst commented on Jul 8, 2021
Note on trying to use chemfiles:
With the 2.0.0b,
fails with
I can't just read it with
either as it gives the same ValueError.
Just trying something stupid with the topology as a PDB also fails
with
Conclusion: I couldn't get it to work with chemfiles. Maybe @Luthaf has some ideas, but a key problem seems to be that our (EDIT) chemfiles converter does not work for topologies?
Luthaf commented on Jul 8, 2021
I think that topology reading is not registered with MDA, since the chemfiles adapter is implemented as a coordinate reader/writer. The conversion from chemfiles to MDA topologies is already implemented, though, so it should mostly be a question of adding a new ChemfilesParser class. I'll have a look at this next week!
This is a bit strange and probably a bug. Is this 1ake from the wwPDB?
Luthaf commented on Jul 8, 2021
Ok, I understand this part. The core of the issue is that two fairly different formats want to use the same extension: mmCIF and crystallography CIF. While they both use the same STAR format, they specify data in different ways. Unfortunately, chemfiles associates the .cif extension with crystallography CIF files, and uses .mmcif for mmCIF.
There is a simple workaround though, since you can specify the format to use manually, with something like
Unfortunately, this still fails in the case of 1AKE (but should work for other files). I'll fix the 1AKE issue, it should be working in the next patch release.
I would also like to introduce better format guessing, to decide between mmCIF and crystallography CIF on the fly instead of having the user specify it manually.
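One way such a guess could work, sketched here with only the Python standard library: mmCIF data names have the form _category.item (with a dot), whereas core crystallography CIF names such as _cell_length_a use underscores throughout. The function name guess_cif_flavor and the simple majority vote are assumptions for illustration; this is not how chemfiles does or will do it.

```python
def guess_cif_flavor(text):
    """Guess 'mmCIF', 'CIF' or 'unknown' from the style of the data names.

    Heuristic (an assumption of this sketch): count data names that contain
    a dot (mmCIF style) versus those that do not (core CIF style).
    """
    dotted = 0
    plain = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("_"):
            continue  # not a data name line
        name = line.split(None, 1)[0]  # drop any value on the same line
        if "." in name:
            dotted += 1
        else:
            plain += 1
    if dotted == 0 and plain == 0:
        return "unknown"
    # Ties fall back to crystallography CIF.
    return "mmCIF" if dotted > plain else "CIF"


# Hand-written snippets in each style (hypothetical values).
MMCIF_SNIPPET = """\
data_demo
_cell.length_a 61.0
loop_
_atom_site.id
_atom_site.Cartn_x
1 26.981
"""

CIF_SNIPPET = """\
data_demo
_cell_length_a 5.64
loop_
_atom_site_label
_atom_site_fract_x
Na1 0.0
"""

flavors = (guess_cif_flavor(MMCIF_SNIPPET), guess_cif_flavor(CIF_SNIPPET))
```

A production version would need to skip multi-line text fields, where a line may start with an underscore without being a data name.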
orbeckst commented on May 17, 2022
The mmCIF/PDBx format would also be needed for alphafold #3377.
razvanmarinescu commented on Aug 16, 2022
Does anyone have a solution for writing an MDAnalysis universe as PDBx/mmCIF?
joaomcteixeira commented on Aug 16, 2022
You could use the pdb_tocif tool from pdb-tools, applied to all PDB files from a trajectory, and then pdb_mkensemble to merge them.
razvanmarinescu commented on Aug 16, 2022
I just tried pdb_tocif, but it screws up the segids. It should have a segid of B0, or B1, ... It only keeps the chain (B). See the question marks below:
JoaoRodrigues commented on Aug 16, 2022
Sorry to cross-post on a different project, but could you share an input file? pdb_tocif is not designed to handle multi-character chain identifiers. Feel free to open an issue in the pdb-tools repo.
marinegor commented on May 15, 2024
I wonder if it'd be ok to use the gemmi library (link) as a dependency to read mmcif and other cif-like formats consistently?
IMO it's one of the best supported crystallography-related libraries, is maintained by ccp4 and globalphasing, and allows very detailed cif parsing and writing.
For example, reading all atom IDs and coordinates could be done as simply as:
the whole discussion with devs here
hmacdope commented on May 17, 2024
@marinegor I think @richardjgowers has some ideas in this area
marinegor commented on May 20, 2024
@richardjgowers could you share them (here, if it's an appropriate place, or somewhere on discord)?
richardjgowers commented on May 23, 2024
@marinegor I'd done this at a hackathon: #4303
The problem with this approach is that it doesn't do the "table join" on conect records that mmcif relies on, so you won't get some data (bonds).
I've since done this: https://github.com/OpenFreeEnergy/pdbinf which does do the "table join" to get bonds. It outputs rdkit format, but conversion is trivial...
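To illustrate what a "table join" for bonds can look like: the coordinates table (_atom_site) says which atoms exist in each residue, while a per-component bond table (such as _chem_comp_bond, with comp_id, atom_id_1 and atom_id_2 columns) says which atom names are bonded within a residue type. The sketch below joins the two on residue name and atom name. It is a simplified illustration with hand-written rows, shortened dictionary keys and a hypothetical function name, not the pdbinf implementation, and it ignores bonds between residues.

```python
from collections import defaultdict


def intra_residue_bonds(atoms, comp_bonds):
    """Join atom rows with per-component bond templates.

    Returns sorted (i, j) pairs of indices into `atoms`. Template bonds whose
    atoms are absent from a residue (e.g. hydrogens) are silently skipped.
    """
    # Group atoms by residue instance: {(chain, seq, resname): {atom name: index}}
    residues = defaultdict(dict)
    for index, atom in enumerate(atoms):
        key = (atom["asym_id"], atom["seq_id"], atom["comp_id"])
        residues[key][atom["atom_id"]] = index

    # Index the bond templates by residue name.
    templates = defaultdict(list)
    for bond in comp_bonds:
        templates[bond["comp_id"]].append((bond["atom_id_1"], bond["atom_id_2"]))

    # The join: for each residue, look up its template and map names to indices.
    bonds = []
    for (_, _, comp_id), name_to_index in residues.items():
        for name_1, name_2 in templates.get(comp_id, []):
            if name_1 in name_to_index and name_2 in name_to_index:
                bonds.append((name_to_index[name_1], name_to_index[name_2]))
    return sorted(bonds)


# Two glycine residues in chain A, heavy atoms only (hypothetical rows).
ATOMS = [
    {"asym_id": "A", "seq_id": seq, "comp_id": "GLY", "atom_id": name}
    for seq in (1, 2)
    for name in ("N", "CA", "C", "O")
]

# Bond template for GLY; OXT and H are not present in ATOMS.
GLY_BONDS = [
    {"comp_id": "GLY", "atom_id_1": a, "atom_id_2": b}
    for a, b in (("N", "CA"), ("CA", "C"), ("C", "O"), ("C", "OXT"), ("N", "H"))
]

bonds = intra_residue_bonds(ATOMS, GLY_BONDS)
```

Without this join, a reader only sees the atom table and has no bond information to put into the topology.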
gemmi-based mmcif reader (with easy extension to PDB/PDBx and mmJSON) #4712