CFOUR molden files #273

shivupa · 2021-06-14T17:12:28Z

Hi,
We have noticed the MOs printed in the CFOUR MOLDEN output are not normalized as expected. We were interested in standardizing this.

We summarize the quirks below (relevant files are at this gist):

CFOUR writes "[Molden Format]" followed by trailing whitespace rather than a bare "[Molden Format]", so the header line needs a .strip().
CFOUR also writes "[Molden Format]" in the file twice.
CFOUR converts the AO basis back to the Cartesian basis, regardless of whether the calculation was run in the spherical basis.
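
The first two quirks can be handled before any real parsing happens. The following is only a minimal sketch of that idea, not the fix that was merged into IOData: the helper name `clean_molden_lines` is hypothetical, and it assumes the file has already been read into a list of lines.

```python
def clean_molden_lines(lines):
    """Strip trailing whitespace and drop repeated [Molden Format] headers.

    Hypothetical helper for illustration; not part of IOData.
    """
    cleaned = []
    seen_header = False
    for line in lines:
        # Compare the fully stripped, lower-cased line against the header tag.
        if line.strip().lower() == "[molden format]":
            if seen_header:
                continue  # skip the duplicate header
            seen_header = True
        # Only trailing whitespace is removed, so leading indentation survives.
        cleaned.append(line.rstrip())
    return cleaned


raw = ["[Molden Format]   \n", "[Molden Format]\n", "[Atoms] AU\n"]
print(clean_molden_lines(raw))
```

With the sample input above, the padded header is stripped and the second header is dropped, leaving two lines.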

We have run zero electron calculations with d,f,g,h functions individually to assist with the normalization and ordering. Could you recommend a path forward to get the properly normalized MOLDENs out? We are willing to help as much as we can since this is critical for our research.

Thanks,
Shiv Upadhyay (@shivupa) and Amanda Dumi (@amandadumi)

PaulWAyers · 2021-06-14T17:36:22Z

Thanks for pointing this out. @tovrstra is probably the best person to respond in detail. As you've surmised, molden files are darn tricky. Just off hand, however, I don't think it will be that difficult to support CFOUR molden files too.

tovrstra · 2021-06-15T08:34:59Z

@shivupa Thanks for bringing this up. We'd love to update our code so it can handle CFOUR files. How familiar are you with changing the source code and making a pull request? We typically review PRs and make suggestions for code improvements until it all looks good. A good start is to create a PR with unit tests with CFOUR files, which just show that the current code fails. Then you can add commits to fix the problems. A technically detailed outline can be found here: https://github.com/theochem/iodata/blob/master/CONTRIBUTING.rst

I took a quick look at the molden file in the gist, and it is certainly useful for unit testing, i.e. it is not too big. It will take a bit of reverse engineering to find out which non-standard conventions CFOUR Molden files use. By loading a small calculation like the one you provide with IOData and computing the overlap matrix of the MOs, you can detect where the issues are. This should obviously be an identity matrix, but it is not when there is a mismatch with the conventions. An example of such a test can be found here:

iodata/iodata/test/test_molden.py

Lines 263 to 281 in 1de4661

    
    @pytest.mark.parametrize("case", ["zn", "mn", "cuh"])
    def test_load_molden_high_am_psi4(case):
        # The file tested here is created with PSI4 1.3.2.
        # This is a special case because it contains higher angular momenta than
        # officially supported by the Molden format. Most virtual orbitals were removed.
        with path('iodata.test.data', f'psi4_{case}_cc_pvqz_pure.molden') as fn_molden:
            with pytest.warns(FileFormatWarning) as record:
                mol = load_one(str(fn_molden))
        assert len(record) == 1
        assert "unnormalized" in record[0].message.args[0]
        # Check normalization
        olp = compute_overlap(mol.obasis, mol.atcoords)
        if mol.mo.kind == "restricted":
            check_orthonormal(mol.mo.coeffs, olp)
        elif mol.mo.kind == "unrestricted":
            check_orthonormal(mol.mo.coeffsa, olp)
            check_orthonormal(mol.mo.coeffsb, olp)
        else:
            raise NotImplementedError

The function check_orthonormal performs that test. The @pytest.mark.parametrize decorator is also convenient for writing a single test that covers multiple files.
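
To make the idea behind such a check concrete: for MO coefficients C (one column per orbital) and AO overlap matrix S, the product C^T S C should be the identity. The sketch below illustrates this on a two-function toy basis. It is not IOData's check_orthonormal; the helper `is_orthonormal` is hypothetical and uses plain Python lists so that it runs without dependencies.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def is_orthonormal(coeffs, overlap, tol=1e-8):
    """Return True if C^T S C equals the identity within tol (hypothetical helper)."""
    product = matmul(matmul(transpose(coeffs), overlap), coeffs)
    return all(
        abs(value - (1.0 if i == j else 0.0)) <= tol
        for i, row in enumerate(product)
        for j, value in enumerate(row)
    )


# Toy example: two normalized basis functions with overlap 0.5.
overlap = [[1.0, 0.5], [0.5, 1.0]]
a = 1.0 / math.sqrt(3.0)  # normalization of the symmetric combination
b = 1.0                   # normalization of the antisymmetric combination
good = [[a, b], [a, -b]]

# Mimic a convention mismatch: the second basis function is scaled by 2.
bad = [[a, b], [2.0 * a, -2.0 * b]]

print(is_orthonormal(good, overlap))
print(is_orthonormal(bad, overlap))
```

The first set of coefficients passes, while the rescaled set fails, which is the kind of signal that points to a normalization mismatch for a particular shell type.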

The function with all the fixes for buggy molden files can be found here:

iodata/iodata/formats/molden.py

Line 552 in 1de4661

def _fix_molden_from_buggy_codes(result: dict, lit: LineIterator):

Depending on the type of issues in the Molden file, you may need to fix the sign convention, the ordering of the basis functions (within each shell) and/or their normalization. Non-standard signs and orderings can be defined in a conventions dictionary. Normalization must be fixed (for now) by rescaling the MO coefficients.
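
As a rough illustration of these three kinds of fixes, the sketch below maps the coefficients of a single shell from a file's convention to a target convention. Everything in it is an assumption made for illustration: `convert_shell` is a hypothetical helper, the permutation, signs and scale factors are arbitrary numbers rather than CFOUR's actual conventions, and IOData's conventions dictionary has its own format that is not reproduced here.

```python
def convert_shell(file_coeffs, permutation, signs, scales):
    """Map one shell's MO coefficients from a file convention to a target one.

    permutation[i] is the index in file_coeffs of the function that belongs at
    position i of the target ordering; signs[i] and scales[i] are then applied
    to that coefficient. Hypothetical helper, for illustration only.
    """
    return [
        signs[i] * scales[i] * file_coeffs[permutation[i]]
        for i in range(len(permutation))
    ]


# Arbitrary example values, not taken from any real program.
file_coeffs = [0.1, 0.2, 0.3]
permutation = [2, 0, 1]    # reorder the functions within the shell
signs = [1, -1, 1]         # flip the sign of the second target function
scales = [1.0, 2.0, 0.5]   # rescale to the target normalization

print(convert_shell(file_coeffs, permutation, signs, scales))
```

Running an orthonormality check on the MOs after such a conversion is a direct way to confirm that a candidate set of orderings, signs and scale factors is the right one.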

Does this answer your question?

amandadumi · 2021-06-15T13:59:29Z

@tovrstra thanks for orienting us and giving us some guidance, it's very helpful. Your response has answered all of the questions we have at this point. We are familiar with Python development with git and GitHub, so this will be fine for us. We will open a WIP pull request and get started!

shivupa · 2021-10-04T15:55:50Z

Closed via #276

amandadumi mentioned this issue Jul 11, 2021

Cfour molden #276

Merged

shivupa closed this as completed Oct 4, 2021
