Fix `dump_one` and `dump_many` for SDF file format #266
Comments
Thanks for noticing these issues. I'm a bit confused by your second point. When connectivity information is present in the IOData instance, it should be dumped. Can you give an example showing this problem? That may clarify the issue.
Yeah, good point; I didn't make it clear. When connectivity information is not available, as with the XYZ file format, no such information is dumped to the SDF file. So I think we need to figure out a good way of generating the connectivity. I know Open Babel handles this very well.
It's a bit of mission creep... We can dump connectivity when we have it, but defining it would be a utility external to IOData, I think. It's implicit in GOpt, and we could use that to define connectivity to the extent we need it. Perhaps that would require splitting off a utility from GOpt for connectivity.
I agree with Paul. IOData does not (at least for now) attempt to guess where bonds are, because that goes beyond the original scope of reading and writing data. If we decide to extend the scope, there should also be some discussion of how far we'd like to go. I'll try to make a few guesses. Just detecting connectivity (without trying to guess the types of bonds) can be done with relatively little code (~15 lines) and a table of covalent radii. For PDB files, that would be fine. It would not be enough for the SDF format, however, because SDF also describes the type of each bond in order to represent a Lewis structure. Trying to guess a Lewis structure from the connectivity is quite complex, and existing algorithms tend to break on exotic molecules. (Even humans don't always agree.) Such an algorithm would go quite far beyond the scope of IOData. Open Babel, RDKit and OpenEye have advanced solutions for this. You can also try to use variations in bond length to detect the bond order, but that would require well-optimized geometries. Effects from the level of theory, the basis set or just internal strain may be enough to break the algorithm. In any case, I'd suggest fixing one thing at a time. If you can make a PR fixing the
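For reference, the covalent-radii check mentioned above can be sketched roughly as follows. This is a minimal sketch and not part of IOData: the function name `guess_bonds`, the `tolerance` factor of 1.2 and the small radii table (approximate values in angstrom) are assumptions made for illustration. It only detects which atoms are connected and makes no attempt to assign bond orders.

```python
from math import dist

# Approximate single-bond covalent radii in angstrom.
# Illustrative subset only; a real table would cover all elements.
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def guess_bonds(symbols, coords, tolerance=1.2):
    """Return (i, j) index pairs of atoms that are likely bonded.

    Two atoms count as bonded when their distance is below
    ``tolerance`` times the sum of their covalent radii.
    Coordinates are assumed to be in angstrom (hypothetical helper).
    """
    bonds = []
    for i in range(len(symbols)):
        for j in range(i + 1, len(symbols)):
            cutoff = tolerance * (
                COVALENT_RADII[symbols[i]] + COVALENT_RADII[symbols[j]]
            )
            if dist(coords[i], coords[j]) < cutoff:
                bonds.append((i, j))
    return bonds


# Usage: a water molecule, coordinates in angstrom.
water_symbols = ["O", "H", "H"]
water_coords = [
    (0.0, 0.0, 0.1173),
    (0.0, 0.7572, -0.4692),
    (0.0, -0.7572, -0.4692),
]
print(guess_bonds(water_symbols, water_coords))
```

The double loop scales quadratically with the number of atoms, which is fine for small molecules; larger systems would need a neighbor list or similar.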
So to be clear, I wouldn't be averse to having a stand-alone utility that had the functionality:
I wouldn't want to include this in It is a fascinating problem, though. I thought a little bit about the problem of generating atom/bond types this morning (for fun) and, wow, what a mess. Especially as we are interested in structures that are not necessarily equilibrium structures, coming up with anything sensible would be very difficult, except maybe for relatively simple organic compounds and inorganic molecules involving only elements from Groups 1, 2, 16, 17 and 18. Even in such easy cases, what does one do with things like sulfur hexafluoride? One would almost need to run a semiempirical calculation (or minimal basis set HF) and then post-process the data to be reliable, and then one is really truly in the Also, (for now)
Thanks for the comments, @PaulWAyers @tovrstra! That makes things very clear for now. I will fix the missing tag issues shortly and make a new PR.
Fix missing `M END` tag for SDF format, close #266
The current implementation of `iodata` has some problems with saving SDF files. I don't know a quick fix for it yet, but it would be nice to have SDF dump functionality.
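To illustrate the missing-tag problem discussed in this thread, here is a minimal sketch of how one SDF record could be laid out. It is not IOData's implementation: the function name `format_sdf_record` and its arguments are hypothetical, and the fixed-width layout follows my reading of the V2000 molfile format, where the connection table is closed by `M  END` (two spaces) and each record in a multi-structure file ends with `$$$$`. Bond orders must be supplied by the caller; nothing here guesses them.

```python
def format_sdf_record(title, symbols, coords, bonds):
    """Format one SDF record as a string (hypothetical helper).

    ``coords`` holds (x, y, z) tuples in angstrom and ``bonds`` holds
    (i, j, order) tuples with zero-based atom indices.
    """
    # Header block: title line, program/timestamp line, comment line.
    lines = [title, "", ""]
    # Counts line: number of atoms, number of bonds, unused fields, version.
    lines.append(
        f"{len(symbols):3d}{len(bonds):3d}" + "  0" * 8 + "999 V2000"
    )
    # Atom block: coordinates, element symbol, then twelve unused fields.
    for symbol, (x, y, z) in zip(symbols, coords):
        lines.append(
            f"{x:10.4f}{y:10.4f}{z:10.4f} {symbol:<3s} 0" + "  0" * 11
        )
    # Bond block: one-based atom indices, bond order, stereo flag.
    for i, j, order in bonds:
        lines.append(f"{i + 1:3d}{j + 1:3d}{order:3d}  0")
    # Terminate the connection table and the record.
    lines.append("M  END")
    lines.append("$$$$")
    return "\n".join(lines) + "\n"


# Usage: a water molecule with two single O-H bonds.
record = format_sdf_record(
    "water",
    ["O", "H", "H"],
    [(0.0, 0.0, 0.1173), (0.0, 0.7572, -0.4692), (0.0, -0.7572, -0.4692)],
    [(0, 1, 1), (0, 2, 1)],
)
print(record)
```

Writing several structures to one file then amounts to concatenating such records, since every record carries its own `$$$$` delimiter.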