Refactor database#166

Open
msricher wants to merge 52 commits into theochem:master from enjyashraf18:master

@msricher
Collaborator

@msricher msricher commented Apr 15, 2026

@enjyashraf18 @FarnazH I rebased from atomdb/master and enjyashraf18/refacor-species so that they can be merged. atomdb/master can be rebased on top of this, or we can use this PR. We just need to fix the tests. @enjyashraf18 can you help with this? Otherwise I'll try later this week when I can type more again.

Tests are saying I can't import "sph_harm" from scipy.special, but I grepped every file and "sph_harm" doesn't even appear. I'm not sure what to do about this. Every test is failing due to some import issue or another.

msricher and others added 30 commits April 15, 2026 07:26
- Included nbasis in fields
- Replaced hardcoded radial points with imported NPOINTS
- Placed datasets_data.h5 under datasets folder
- Added docstrings
enjyashraf18 and others added 15 commits April 15, 2026 11:18
1. Updated run module for numeric dataset
2. Created customized HDF5 file creator for numeric
3. Migrated all old files from msgpack to HDF5

- Handle wildcard case while loading the element
1. Updated run module for nist dataset
2. Created customized HDF5 file creator for nist
3. Migrated all old files from msgpack to HDF5
- Versioned file naming: datasets_data-v{version:03d}.h5 pattern (v000, v001, etc.)
- Kept the latest version HDF5 file open by default.
- Opens/closes specific version files only when explicitly requested.
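
The versioned naming scheme above can be sketched as follows. This is a minimal illustration, not the code from the PR: only the `datasets_data-v{version:03d}.h5` pattern comes from the commit message, while `versioned_name` and `latest_version` are hypothetical helper names and the temporary folder stands in for the real datasets directory.

```python
import re
import tempfile
from pathlib import Path

# Pattern taken from the commit message; the helper names are hypothetical.
NAME_TEMPLATE = "datasets_data-v{version:03d}.h5"
NAME_REGEX = re.compile(r"^datasets_data-v(\d{3})\.h5$")


def versioned_name(version):
    """Build the file name for a given integer version (0 -> v000)."""
    return NAME_TEMPLATE.format(version=version)


def latest_version(folder):
    """Return the highest version number found in folder, or None."""
    versions = []
    for path in Path(folder).iterdir():
        match = NAME_REGEX.match(path.name)
        if match:
            versions.append(int(match.group(1)))
    return max(versions, default=None)


# Demonstration in a throwaway directory with three versions and one
# unrelated file that the regex should ignore.
with tempfile.TemporaryDirectory() as tmp:
    for v in (0, 1, 2):
        (Path(tmp) / versioned_name(v)).touch()
    (Path(tmp) / "notes.txt").touch()
    newest = latest_version(tmp)
    newest_name = versioned_name(newest)
```

Keeping the version zero-padded to three digits means a plain lexicographic sort of the file names also gives version order, at least up to v999.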
1. test_api: skip when dataset is numeric, since numeric datasets do not support orbital gradients
2. gaussian: updated grid.shape to 1000
3. test_nist: treat None and NaN as equivalent for the energy field, as empty values in the HDF5 file are assigned NaN
4. test_wfn_slater: treat None and NaN as equivalent for the mu and eta fields
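
A comparison that treats None and NaN as the same "missing" value, as described for test_nist and test_wfn_slater, might look like the sketch below. The helper names `is_missing` and `same_or_both_missing` are hypothetical and are not taken from the test suite.

```python
import math


def is_missing(value):
    """True for None and for a floating-point NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def same_or_both_missing(expected, actual):
    """Compare two scalar fields, treating None and NaN as equivalent."""
    if is_missing(expected) or is_missing(actual):
        # Equal only if both sides are missing.
        return is_missing(expected) and is_missing(actual)
    return expected == actual


# An energy stored as NaN in the HDF5 file matches a reference value of None.
nan_matches_none = same_or_both_missing(None, float("nan"))
```

The explicit check is needed because NaN compares unequal to everything, including itself, so a plain `==` would fail even when both sides are empty.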
- Updated `load`, `compile_species`, and `datafile` to work with the new global database structure.
- Each dataset has its own HDF5 file.
- Enables more efficient memory usage by avoiding loading all datasets into one monolithic file.
@msricher
Collaborator Author

My best attempt at merging it all together is now in branch master-all-prs. The data files from LFS are gone though.... @enjyashraf18 could you add them back in? I don't know which version is the most up-to-date.

@msricher
Collaborator Author

This is ready for review. Failing tests are due to package qc-grid being broken. Can try to get that fixed tomorrow.

@msricher
Collaborator Author

Issue #162 still exists, but aside from that every feature works now.

@PaulWAyers
Member

@marco-2023 can you look at the qc-grid issue? I think we already deal with the spherical harmonic issue in grid.

The sph_harm error occurs because scipy changed its API, and sph_harm is no longer supported in the latest version. The new scipy uses
https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.sph_harm_y.html
and another version (that computes all spherical harmonics up to some order)
https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.sph_harm_y_all.html

One has to be careful because the arguments have changed; as I recall the $\theta$ and $\phi$ variables swapped (maybe also the indices specifying the degree/type).
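
For reference, a small compatibility shim along these lines could hide the change from calling code. It is a sketch under stated assumptions: the old call is `sph_harm(m, n, theta, phi)` with `theta` the azimuthal angle and `phi` the polar angle, and the new call is `sph_harm_y(n, m, theta, phi)` with `theta` the polar angle and `phi` the azimuthal angle, so both the index order and the angle order are swapped. The wrapper name `sph_harm_compat` is hypothetical; it is not what qc-grid or atomdb uses.

```python
import math

try:
    # Newer SciPy: degree n comes first, then order m, then polar, then azimuth.
    from scipy.special import sph_harm_y

    def sph_harm_compat(m, n, azimuth, polar):
        """Old-style argument order, evaluated with the new function."""
        return sph_harm_y(n, m, polar, azimuth)

except ImportError:
    # Older SciPy: order m comes first, then degree n, then azimuth, then polar.
    from scipy.special import sph_harm

    def sph_harm_compat(m, n, azimuth, polar):
        """Old-style argument order, evaluated with the legacy function."""
        return sph_harm(m, n, azimuth, polar)


# Y_0^0 is the constant 1 / (2 * sqrt(pi)) for any angles.
y00 = sph_harm_compat(0, 0, 0.3, 0.7)
y00_reference = 1.0 / (2.0 * math.sqrt(math.pi))
```

Checking a harmonic that depends on the polar angle only, such as Y_1^0, is a quick way to catch a swapped pair of angles after migrating.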

@msricher
Collaborator Author

I fixed the sph_harm issue for now by pinning the scipy version.

@msricher
Collaborator Author

@enjyashraf18 The compile_species function isn't working for me:

Traceback (most recent call last):
  File "/home/michelle/Git/atomdb/run.py", line 5, in <module>
    compile_species(element_symbol(i), i, 1, dataset="gaussian")
    ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/michelle/Git/atomdb/atomdb/species.py", line 789, in compile_species
    dump(DATASET_H5FILE, fields, dataset, mult)
    ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/michelle/Git/atomdb/atomdb/species.py", line 804, in dump
    element_folder_creator.create_hdf5_file(DATASET_H5FILE, fields, dataset, mult)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/michelle/Git/atomdb/atomdb/datasets/gaussian/h5file_creator.py", line 342, in create_hdf5_file
    DATASETS_H5FILE.create_group("/Datasets", dataset, f"{dataset} Data")
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/michelle/Git/atomdb/venv/lib/python3.14/site-packages/tables/file.py", line 975, in create_group
    parentnode = self._get_or_create_path(where, createparents)
  File "/home/michelle/Git/atomdb/venv/lib/python3.14/site-packages/tables/file.py", line 915, in _get_or_create_path
    return self.get_node(path)
           ~~~~~~~~~~~~~^^^^^^
  File "/home/michelle/Git/atomdb/venv/lib/python3.14/site-packages/tables/file.py", line 1798, in get_node
    node = self._get_node(nodepath)
  File "/home/michelle/Git/atomdb/venv/lib/python3.14/site-packages/tables/file.py", line 1741, in _get_node
    node = self._node_manager.get_node(nodepath)
  File "/home/michelle/Git/atomdb/venv/lib/python3.14/site-packages/tables/file.py", line 454, in get_node
    node = self.node_factory(key)
  File "/home/michelle/Git/atomdb/venv/lib/python3.14/site-packages/tables/group.py", line 1197, in _g_load_child
    node_type = self._g_check_has_child(childname)
  File "/home/michelle/Git/atomdb/venv/lib/python3.14/site-packages/tables/group.py", line 421, in _g_check_has_child
    raise NoSuchNodeError(
    ...<2 lines>...
    )
tables.exceptions.NoSuchNodeError: group ``/`` does not have a child named ``/Datasets``

I can't figure it out. Do you know whether you missed something while porting everything to the GLOBAL_DB?
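
Judging only from the traceback, one possible cause is that the HDF5 file being written has no `/Datasets` group yet: PyTables' `create_group(where, name, title)` looks up the parent `where` and raises `NoSuchNodeError` when it is missing, unless `createparents=True` is passed. The sketch below reproduces that behavior on a fresh file and shows one way around it. The helper `ensure_dataset_group` and the temporary file are hypothetical, and whether this matches the actual bug in `h5file_creator.py` is an assumption.

```python
import os
import tempfile

import tables
from tables.exceptions import NoSuchNodeError


def ensure_dataset_group(h5file, dataset):
    """Return /Datasets/<dataset>, creating it and /Datasets if needed."""
    path = f"/Datasets/{dataset}"
    if path in h5file:  # File supports "path in file" membership tests
        return h5file.get_node(path)
    # createparents=True also creates the missing /Datasets parent group.
    return h5file.create_group(
        "/Datasets", dataset, f"{dataset} Data", createparents=True
    )


tmpdir = tempfile.mkdtemp()
h5path = os.path.join(tmpdir, "datasets_data-v000.h5")

with tables.open_file(h5path, mode="w") as h5file:
    # Same call shape as in the traceback, on a file without /Datasets.
    try:
        h5file.create_group("/Datasets", "gaussian", "gaussian Data")
        reproduced = False
    except NoSuchNodeError:
        reproduced = True

    group = ensure_dataset_group(h5file, "gaussian")
    group_path = group._v_pathname
    # A second call finds the existing group instead of failing.
    same_path = ensure_dataset_group(h5file, "gaussian")._v_pathname
```

If the data file was recreated empty at some point, this would explain why the call started failing only afterwards.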

@PaulWAyers
Member

@marco-2023 or @gabrielasd can you see if it's clear what's going on here?

@enjyashraf18
Collaborator

Hi @msricher, as I mentioned, unfortunately I had to switch back to Windows (just temporarily), so I can’t really debug properly right now.

I checked the commits to see the latest changes. Do you know when this issue started? Was it right after recreating datasets_data-v000.h5, or later?

@msricher
Collaborator Author

msricher commented Apr 23, 2026 via email
