CIF File non-iterable NoneType object Error #79

Open
aydinmirac opened this issue Nov 16, 2022 · 5 comments

@aydinmirac

Dear All,

I am training ALIGNN on CIF files. To improve performance, I tried augmenting my CIF files with the AugLiChem library. Here are snippets from an original file and from its augmented counterpart:

Original file:

data_image0
_chemical_formula_structural       H10C14S2N2O2
_chemical_formula_sum              "H10 C14 S2 N2 O2"
_cell_length_a       4.3258
_cell_length_b       8.982
_cell_length_c       8.4721
_cell_angle_alpha    90
_cell_angle_beta     90.594
_cell_angle_gamma    90

_space_group_name_H-M_alt    "P 1"
_space_group_IT_number       1

loop_
  _space_group_symop_operation_xyz
  'x, y, z'

loop_
  _atom_site_type_symbol
  _atom_site_label
  _atom_site_symmetry_multiplicity
  _atom_site_fract_x
  _atom_site_fract_y
  _atom_site_fract_z
  _atom_site_occupancy
  H   H1        1.0  0.83700  0.83300  0.55800  1.0000
  H   H2        1.0  0.16300  0.33300  0.44200  1.0000
.
.
.
(continues)

Augmented file:

# generated using pymatgen
data_H5C7SNO
_symmetry_space_group_name_H-M   'P 1'
_cell_length_a   4.32580000
_cell_length_b   8.98200000
_cell_length_c   8.47210000
_cell_angle_alpha   90.00000000
_cell_angle_beta   90.59400000
_cell_angle_gamma   90.00000000
_symmetry_Int_Tables_number   1
_chemical_formula_structural   H5C7SNO
_chemical_formula_sum   'H10 C14 S2 N2 O2'
_cell_volume   329.16012678
_cell_formula_units_Z   2
loop_
 _symmetry_equiv_pos_site_id
 _symmetry_equiv_pos_as_xyz
  1  'x, y, z'
loop_
 _atom_site_type_symbol
 _atom_site_label
 _atom_site_symmetry_multiplicity
 _atom_site_fract_x
 _atom_site_fract_y
 _atom_site_fract_z
 _atom_site_occupancy
  H  H0  1  0.83420779  0.82961480  0.54722879  1.0
  H  H1  1  0.15729856  0.32521300  0.43626670  1.0
.
.
.
(continues)

ALIGNN works fine with the original CIF files, but whenever I try to train it on the augmented files, I get the following error:

Using backend: pytorch
Traceback (most recent call last):
  File "/raid/apps/alignn/2021/bin/train_folder.py", line 195, in <module>
    train_for_folder(
  File "/raid/apps/alignn/2021/bin/train_folder.py", line 103, in train_for_folder
    atoms = Atoms.from_cif(file_path)
  File "/raid/apps/alignn/2021/lib/python3.8/site-packages/jarvis/core/atoms.py", line 537, in from_cif
    cif_atoms = cif_atoms.get_primitive_atoms
  File "/raid/apps/alignn/2021/lib/python3.8/site-packages/jarvis/core/atoms.py", line 710, in get_primitive_atoms
    return Spacegroup3D(self).primitive_atoms
  File "/raid/apps/alignn/2021/lib/python3.8/site-packages/jarvis/analysis/structure/spacegroup.py", line 240, in primitive_atoms
    lattice, scaled_positions, numbers = spglib.find_primitive(
TypeError: cannot unpack non-iterable NoneType object

I cannot see a problem in the augmented files. Do you have any suggestions?

Best regards,
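
For readers following the traceback: its last frame unpacks the return value of spglib.find_primitive into three names. spglib releases of that period document that the function returns None when the search fails, and unpacking None raises exactly this TypeError. The sketch below reproduces that failure mode with a stand-in function and shows one possible guard. `find_primitive_stub` and `primitive_or_original` are hypothetical names for illustration, not part of spglib or jarvis-tools, and the thread does not establish why the search fails for the augmented structures.

```python
def find_primitive_stub(cell, fail=False):
    """Hypothetical stand-in for spglib.find_primitive.

    Returns a (lattice, positions, numbers) tuple, or None when the
    primitive-cell search fails.
    """
    return None if fail else cell


def primitive_or_original(cell, finder=find_primitive_stub, **kwargs):
    """Fall back to the input cell when the finder returns None."""
    result = finder(cell, **kwargs)
    if result is None:
        return cell
    lattice, positions, numbers = result
    return lattice, positions, numbers


# Toy one-atom cell, not the structure from this thread.
cell = ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [[0.0, 0.0, 0.0]], [1])

# Unguarded unpacking of None reproduces the error from the traceback.
try:
    lattice, positions, numbers = find_primitive_stub(cell, fail=True)
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

# The guarded version returns the original cell instead of raising.
fallback = primitive_or_original(cell, fail=True)
```

Whether falling back to the unreduced cell is acceptable depends on the downstream use; the point here is only that the None return has to be handled before unpacking.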

@bdecost
Collaborator

bdecost commented Nov 16, 2022

Hi @miracaydin1, can you please attach the full CIF file and share the versions you're using for at least jarvis-tools and spglib?

This looks like a potential issue with https://github.com/usnistgov/jarvis. Have you tried any other CIF parsers?
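
One way to collect the requested version information is the standard-library importlib.metadata module (available from Python 3.8, which matches the interpreter in the traceback). This is a minimal sketch; `installed_versions` is an illustrative helper name, and the package list is an assumption about which distributions are relevant here.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    names = ["alignn", "jarvis-tools", "spglib", "pymatgen", "cif2cell"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting the printed lines into the issue gives maintainers the exact environment without relying on what the installer was expected to pull in.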

@bdecost
Collaborator

bdecost commented Nov 16, 2022

A possible workaround might be to parse the structures with pymatgen and convert them to jarvis Atoms, since your augmented file seems to have been written by the pymatgen CIF io module.
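
A minimal sketch of that workaround, under some assumptions: the structure is ordered (one species per site), pymatgen and jarvis-tools are installed, and the helper names `pymatgen_to_jarvis_kwargs` and `load_cif_via_pymatgen` are made up for illustration. It builds the jarvis Atoms object through its constructor rather than through Atoms.from_cif, which is the call that reaches the primitive-cell step in the traceback.

```python
from types import SimpleNamespace


def pymatgen_to_jarvis_kwargs(structure):
    """Collect jarvis Atoms constructor arguments from a pymatgen-like structure.

    Assumes ordered sites, so that every site exposes `specie.symbol`.
    """
    return {
        "lattice_mat": [[float(v) for v in row] for row in structure.lattice.matrix],
        "coords": [[float(v) for v in xyz] for xyz in structure.frac_coords],
        "elements": [site.specie.symbol for site in structure.sites],
        "cartesian": False,  # coords above are fractional
    }


def load_cif_via_pymatgen(path):
    """Parse a CIF with pymatgen and return a jarvis Atoms object."""
    # Third-party imports kept local so the helper above stays usable without them.
    from pymatgen.core import Structure
    from jarvis.core.atoms import Atoms

    return Atoms(**pymatgen_to_jarvis_kwargs(Structure.from_file(path)))


# Duck-typed stand-in (toy cell, not the structure from this thread) so the
# conversion helper can be exercised without pymatgen installed.
demo_structure = SimpleNamespace(
    lattice=SimpleNamespace(matrix=[[4.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 8.0]]),
    frac_coords=[[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]],
    sites=[
        SimpleNamespace(specie=SimpleNamespace(symbol="H")),
        SimpleNamespace(specie=SimpleNamespace(symbol="C")),
    ],
)
demo_kwargs = pymatgen_to_jarvis_kwargs(demo_structure)
```

In a training script, `load_cif_via_pymatgen(file_path)` would stand in for the `Atoms.from_cif(file_path)` call shown in the traceback; whether that is enough for the rest of the pipeline has not been verified here.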

@knc6
Collaborator

knc6 commented Nov 16, 2022

Can you try installing cif2cell (pip install cif2cell==2.0.0a3)? jarvis-tools will use it by default, which should hopefully resolve the issue: https://github.com/usnistgov/jarvis/blob/master/jarvis/core/atoms.py#L304
Also, as @bdecost mentioned, please feel free to send the CIF file via email or attach it here.

@aydinmirac
Author

Hi @bdecost,

I installed the latest version of ALIGNN (2022.11.06), which comes with spglib=1.16.2 and jarvis-tools=2021.10.03.

As I said, I augmented my CIF files with AugLiChem (https://github.com/BaratiLab/AugLiChem). As far as I understand, this library uses pymatgen.

Here is another pair of original and augmented CIF files, this time in full.

Original file:

data_image0
_chemical_formula_structural       H10C14S2N2O2
_chemical_formula_sum              "H10 C14 S2 N2 O2"
_cell_length_a       4.3258
_cell_length_b       8.982
_cell_length_c       8.4721
_cell_angle_alpha    90
_cell_angle_beta     90.594
_cell_angle_gamma    90

_space_group_name_H-M_alt    "P 1"
_space_group_IT_number       1

loop_
  _space_group_symop_operation_xyz
  'x, y, z'

loop_
  _atom_site_type_symbol
  _atom_site_label
  _atom_site_symmetry_multiplicity
  _atom_site_fract_x
  _atom_site_fract_y
  _atom_site_fract_z
  _atom_site_occupancy
  H   H1        1.0  0.83700  0.83300  0.55800  1.0000
  H   H2        1.0  0.16300  0.33300  0.44200  1.0000
  H   H3        1.0  0.44200  0.88800  0.82500  1.0000
  H   H4        1.0  0.55800  0.38800  0.17500  1.0000
  H   H5        1.0  0.17000  0.75200  0.00700  1.0000
  H   H6        1.0  0.83000  0.25200  0.99300  1.0000
  H   H7        1.0  0.09300  0.49900  0.98000  1.0000
  H   H8        1.0  0.90700  0.99900  0.02000  1.0000
  H   H9        1.0  0.36800  0.35600  0.77600  1.0000
  H   H10       1.0  0.63200  0.85600  0.22400  1.0000
  C   C1        1.0  0.86010  0.62540  0.49070  1.0000
  C   C2        1.0  0.13990  0.12540  0.50930  1.0000
  C   C3        1.0  0.58150  0.70380  0.69700  1.0000
  C   C4        1.0  0.41850  0.20380  0.30300  1.0000
  C   C5        1.0  0.42910  0.78120  0.81420  1.0000
  C   C6        1.0  0.57090  0.28120  0.18580  1.0000
  C   C7        1.0  0.25580  0.69710  0.92090  1.0000
  C   C8        1.0  0.74420  0.19710  0.07910  1.0000
  C   C9        1.0  0.24010  0.54200  0.90470  1.0000
  C   C10       1.0  0.75990  0.04200  0.09530  1.0000
  C   C11       1.0  0.38720  0.46420  0.78700  1.0000
  C   C12       1.0  0.61280  0.96420  0.21300  1.0000
  C   C13       1.0  0.55200  0.54980  0.68570  1.0000
  C   C14       1.0  0.44800  0.04980  0.31430  1.0000
  S   S1        1.0  0.08567  0.61384  0.33595  1.0000
  S   S2        1.0  0.91433  0.11384  0.66405  1.0000
  N   N1        1.0  0.77000  0.74630  0.57380  1.0000
  N   N2        1.0  0.23000  0.24630  0.42620  1.0000
  O   O1        1.0  0.72840  0.50336  0.55790  1.0000
  O   O2        1.0  0.27160  0.00336  0.44210  1.0000

Augmented file:

# generated using pymatgen
data_H5C7SNO
_symmetry_space_group_name_H-M   'P 1'
_cell_length_a   4.32580000
_cell_length_b   8.98200000
_cell_length_c   8.47210000
_cell_angle_alpha   90.00000000
_cell_angle_beta   90.59400000
_cell_angle_gamma   90.00000000
_symmetry_Int_Tables_number   1
_chemical_formula_structural   H5C7SNO
_chemical_formula_sum   'H10 C14 S2 N2 O2'
_cell_volume   329.16012678
_cell_formula_units_Z   2
loop_
 _symmetry_equiv_pos_site_id
 _symmetry_equiv_pos_as_xyz
  1  'x, y, z'
loop_
 _atom_site_type_symbol
 _atom_site_label
 _atom_site_symmetry_multiplicity
 _atom_site_fract_x
 _atom_site_fract_y
 _atom_site_fract_z
 _atom_site_occupancy
  H  H0  1  0.83420779  0.82961480  0.54722879  1.0
  H  H1  1  0.15729856  0.32521300  0.43626670  1.0
  H  H2  1  0.44203270  0.87915717  0.82747758  1.0
  H  H3  1  0.55019103  0.38184614  0.18319968  1.0
  H  H4  1  0.17514122  0.75619369  0.00136765  1.0
  H  H5  1  0.83553055  0.24418767  0.99477299  1.0
  H  H6  1  0.09609248  0.49658218  0.98098265  1.0
  H  H7  1  0.89645711  0.00334474  0.01640165  1.0
  H  H8  1  0.36614542  0.34817874  0.77735952  1.0
  H  H9  1  0.62119004  0.84995189  0.22153261  1.0
  C  C10  1  0.84699770  0.61782486  0.49019123  1.0
  C  C11  1  0.14011910  0.13232519  0.51033348  1.0
  C  C12  1  0.59049700  0.70509916  0.69616131  1.0
  C  C13  1  0.42211903  0.20489082  0.30292231  1.0
  C  C14  1  0.42748631  0.78231425  0.81204122  1.0
  C  C15  1  0.57196073  0.28928330  0.18326057  1.0
  C  C16  1  0.25796312  0.69827142  0.92115724  1.0
  C  C17  1  0.74538351  0.19733637  0.07035087  1.0
  C  C18  1  0.25245680  0.54381040  0.90293595  1.0
  C  C19  1  0.76391822  0.04290571  0.09075583  1.0
  C  C20  1  0.38889531  0.46937274  0.79015752  1.0
  C  C21  1  0.61097097  0.96269025  0.21394540  1.0
  C  C22  1  0.55416748  0.55167165  0.68179287  1.0
  C  C23  1  0.42629221  0.04735866  0.31177248  1.0
  S  S24  1  0.07796286  0.61757333  0.32790173  1.0
  S  S25  1  0.89490833  0.11503795  0.66120287  1.0
  N  N26  1  0.75230103  0.74561484  0.57227018  1.0
  N  N27  1  0.22614752  0.23622076  0.43074555  1.0
  O  O28  1  0.71180393  0.49850082  0.56406201  1.0
  O  O29  1  0.27605354  0.00896080  0.44516716  1.0
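
To see how far the augmentation moved each atom, the two listings can be compared directly. The sketch below is a minimal reader for the simple whitespace-separated `_atom_site` loop used in both files above; it is not a general CIF parser, the function names are illustrative, and it assumes both files list the atoms in the same order. Shifts are in fractional coordinates and are wrapped across the periodic boundary.

```python
def read_fract_coords(cif_text):
    """Return [(type_symbol, (x, y, z)), ...] from the _atom_site loop."""
    columns, atoms = [], []
    for raw in cif_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "loop_":
            if atoms:  # the atom-site loop has already been read
                break
            columns = []  # start collecting a new loop header
        elif line.startswith("_"):
            if atoms:
                break
            columns.append(line.split()[0])
        elif "_atom_site_fract_x" in columns:
            fields = line.split()
            if len(fields) != len(columns):
                continue  # skips filler such as "." or "(continues)"
            symbol = fields[columns.index("_atom_site_type_symbol")]
            xyz = tuple(
                float(fields[columns.index(f"_atom_site_fract_{axis}")])
                for axis in "xyz"
            )
            atoms.append((symbol, xyz))
    return atoms


def max_fractional_shift(original, augmented):
    """Largest per-axis fractional displacement between two atom lists."""
    if len(original) != len(augmented):
        raise ValueError("atom counts differ")
    worst = 0.0
    for (sym_a, xyz_a), (sym_b, xyz_b) in zip(original, augmented):
        if sym_a != sym_b:
            raise ValueError(f"element mismatch: {sym_a} vs {sym_b}")
        for a, b in zip(xyz_a, xyz_b):
            delta = b - a
            delta -= round(delta)  # wrap across the periodic boundary
            worst = max(worst, abs(delta))
    return worst


HEADER = """loop_
  _atom_site_type_symbol
  _atom_site_label
  _atom_site_symmetry_multiplicity
  _atom_site_fract_x
  _atom_site_fract_y
  _atom_site_fract_z
  _atom_site_occupancy
"""
# First two hydrogen rows of each file, copied from the listings above.
ORIGINAL = HEADER + """  H   H1        1.0  0.83700  0.83300  0.55800  1.0000
  H   H2        1.0  0.16300  0.33300  0.44200  1.0000
"""
AUGMENTED = HEADER + """  H  H0  1  0.83420779  0.82961480  0.54722879  1.0
  H  H1  1  0.15729856  0.32521300  0.43626670  1.0
"""

shift = max_fractional_shift(read_fract_coords(ORIGINAL), read_fract_coords(AUGMENTED))
```

For these two rows the largest shift is roughly 0.011 in fractional units (the z coordinate of the first hydrogen). Running the same comparison on the full files would show whether any atom moved much further than that.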

Hi @knc6, I will try installing that package and let you know as soon as possible.

@aydinmirac
Author

Hi @knc6, I installed the package.

Then I checked the "config.json" file. Setting the split sizes to the exact length of the dataset gives an error, so I reduced the "n_val", "n_test" and "n_train" values:

{
        "version": "1859f483ea2b41f6163b845eb689f0f1afdd1e8f",
        "dataset": "user_data",
        "target": "target",
        "atom_features": "cgcnn",
        "neighbor_strategy": "k-nearest",
        "random_seed": 123,
        "classification_threshold": null,
        "n_val": 5350,
        "n_test": 5350,
        "n_train": 42000,
        "train_ratio": 0.8,
        "val_ratio": 0.1,
        "test_ratio": 0.1,
        "target_multiplication_factor": null,
        "epochs": 300,

After I reduced the values by around 10 molecules, the model started to work. I think these numbers are not detected automatically from the length of the dataset. Assigning them manually can sometimes cause an error if you give a value larger than the length of the dataset.
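
A quick sanity check before launching training can catch this kind of mismatch. The sketch below is not ALIGNN's own logic; `split_sizes` and `check_split` are hypothetical helpers, and `n_total` stands for however many samples actually reach the splitter. It derives the three counts from the ratios in the config and rejects manual counts that add up to more than the dataset holds.

```python
def split_sizes(n_total, train_ratio=0.8, val_ratio=0.1):
    """Derive n_train, n_val and n_test from the dataset length and ratios."""
    if not 0 <= train_ratio + val_ratio <= 1:
        raise ValueError("train_ratio + val_ratio must lie in [0, 1]")
    # The small epsilon guards against float results such as 799.9999999999999.
    n_train = int(n_total * train_ratio + 1e-9)
    n_val = int(n_total * val_ratio + 1e-9)
    n_test = n_total - n_train - n_val  # the remainder goes to the test set
    return {"n_train": n_train, "n_val": n_val, "n_test": n_test}


def check_split(n_total, n_train, n_val, n_test):
    """Raise if manually chosen counts ask for more samples than exist."""
    requested = n_train + n_val + n_test
    if requested > n_total:
        raise ValueError(
            f"requested {requested} samples but only {n_total} are available"
        )
    return True


# Counts from the config above add up to 52700; a dataset with exactly that
# many usable samples passes, one with a few fewer does not.
ok = check_split(52700, n_train=42000, n_val=5350, n_test=5350)
sizes = split_sizes(1000)
```

Deriving the counts from the ratios, rather than typing them by hand, keeps the config valid when the number of usable structures changes.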
