
Predicting other values from the QM9 dataset #3

Closed
PratyushAvi opened this issue Jun 14, 2021 · 6 comments

Comments


PratyushAvi commented Jun 14, 2021

Hello!

The QM9 dataset also contains information regarding the atomization energies of molecules. Can we predict those by simply adding to the targets list in main.py?

Also, can you please explain why you add 5 to the target values in [7,8,9,10]?

zetayue (Owner) commented Jun 14, 2021

Hi!

> The QM9 dataset also contains information regarding the atomization energies of molecules. Can we predict those by simply adding to the targets list in main.py?

Yes, we are actually predicting those atomization energies instead of the original U_0, U, H, G values, as mentioned in Appendix Section 7.1 of our work; this is also the common approach used in related works.

> Also, can you please explain why you add 5 to the target values in [7,8,9,10]?

Since we are not predicting the original values of the U_0, U, H, G targets (target indices 7 to 10 in QM9), we directly add 5 to those target indices to load the corresponding atomization energy targets (target indices 12 to 15 in QM9); a short sketch of this shift follows the table below.

MXMNet/qm9_dataset.py

Lines 78 to 94 in 265db78

| 7  | :math:`U_0`                 | Internal energy at 0K              | :math:`\textrm{eV}`                         |
| 8  | :math:`U`                   | Internal energy at 298.15K         | :math:`\textrm{eV}`                         |
| 9  | :math:`H`                   | Enthalpy at 298.15K                | :math:`\textrm{eV}`                         |
| 10 | :math:`G`                   | Free energy at 298.15K             | :math:`\textrm{eV}`                         |
| 11 | :math:`c_{\textrm{v}}`      | Heat capacity at 298.15K           | :math:`\frac{\textrm{cal}}{\textrm{mol K}}` |
| 12 | :math:`U_0^{\textrm{ATOM}}` | Atomization energy at 0K           | :math:`\textrm{eV}`                         |
| 13 | :math:`U^{\textrm{ATOM}}`   | Atomization energy at 298.15K      | :math:`\textrm{eV}`                         |
| 14 | :math:`H^{\textrm{ATOM}}`   | Atomization enthalpy at 298.15K    | :math:`\textrm{eV}`                         |
| 15 | :math:`G^{\textrm{ATOM}}`   | Atomization free energy at 298.15K | :math:`\textrm{eV}`                         |
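
For reference, here is a minimal sketch of that index shift (illustrative only; the variable names are assumed and may not match main.py exactly):

target = 7                    # e.g. the U_0 target requested on the command line
if target in [7, 8, 9, 10]:   # U_0, U, H, G
    target += 5               # -> 12, 13, 14, 15: the atomization energy columns above
print(target)                 # 12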

An alternative way is to call the atomref function to subtract the atomic reference energies from the total energies when loading the dataset:

MXMNet/qm9_dataset.py

Lines 138 to 143 in 265db78

def atomref(self, target):
    if target in atomrefs:
        out = torch.zeros(100)
        out[torch.tensor([1, 6, 7, 8, 9])] = torch.tensor(atomrefs[target])
        return out.view(-1, 1)
    return None
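
As a rough illustration of that alternative (a hedged sketch, not code from the repo; it assumes dataset is the loaded QM9 dataset and data is a single molecule whose atomic numbers are stored in data.z):

# Hedged sketch: subtract per-element reference energies from a total-energy target.
# atomref(target) returns a [100, 1] lookup table indexed by atomic number,
# or None when no reference energies are defined for that target.
ref = dataset.atomref(target)
if ref is not None:
    # Sum the reference energies of the atoms in this molecule and subtract them
    # from the molecule's total energy to obtain an atomization-style target.
    data.y[:, target] = data.y[:, target] - ref[data.z].sum()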

PratyushAvi (Author) commented

Thank you so much!!

PratyushAvi (Author) commented

Is there a way to predict multiple values at once? For example, if we wanted our targets to be u_298, h_298, and cv, is there a way to modify MXMNet to do that?

zetayue (Owner) commented Jun 23, 2021

Yes, you can just change the single-target input for QM9 to multiple targets and let the model output multiple values for each molecule.

For example, to predict u_298, h_298, and cv, the corresponding indices are 13, 14, and 11. The MyTransform function has to be changed in order to load the multiple ground-truth targets: data.y = data.y[:, 13, 14, 11]

MXMNet/main.py

Lines 65 to 68 in fa1dbc5

class MyTransform(object):
    def __call__(self, data):
        data.y = data.y[:, target]
        return data

Then the model should output a tensor of size [N, 3] for N molecules with 3 targets. The following line should be changed to self.y_W = nn.Linear(self.dim, 3):

self.y_W = nn.Linear(self.dim, 1)

With these changes, both the output and data.y will have size [N, 3] for N molecules, so the L1 loss can be computed between them:

MXMNet/main.py

Line 117 in fa1dbc5

loss = F.l1_loss(output, data.y)
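
As a quick sanity check on the shapes (illustrative values only, not code from the repo):

import torch
import torch.nn.functional as F

N = 32                        # a batch of N molecules
output = torch.randn(N, 3)    # model predictions for the 3 targets
y = torch.randn(N, 3)         # ground truth, i.e. data.y after the transform
loss = F.l1_loss(output, y)   # scalar: mean absolute error over all N * 3 entries
print(loss.shape)             # torch.Size([])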

PratyushAvi (Author) commented Jun 23, 2021

I tried the modifications that you suggested. However, I don't think it recognizes the change in the tensor's size. I get the following error:

  File "/Users/pratyushavi/Developer/Projects/AAMP-UP/QM9/MXMNet/main.py", line 111, in <module>
    for data in train_loader:
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 557, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.9/site-packages/torch_geometric/data/dataset.py", line 188, in __getitem__
    data = data if self.transform is None else self.transform(data)
  File "/Users/pratyushavi/Developer/Projects/AAMP-UP/QM9/MXMNet/main.py", line 67, in __call__
    data.y = data.y[:, 13, 14, 11]
IndexError: too many indices for tensor of dimension 2

How do you suggest we deal with this?

EDIT: I had initially posted a testing version that I had made for myself. The line numbers would've been harder to trace because of that, so I updated the error message with what I got from running main.py with the modification.

zetayue (Owner) commented Jun 23, 2021

You can try data.y = data.y[:, [13, 14, 11]] or torch.index_select(data.y, 1, torch.tensor([13, 14, 11])). The goal is to get a tensor of size [N, 3] from those indices.
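
Putting the pieces together, a hedged sketch of the adjusted transform (it follows the snippets above; the only other change assumed is the nn.Linear output size mentioned earlier):

import torch

class MyTransform(object):
    def __call__(self, data):
        # Select u_298 (13), h_298 (14), and cv (11). Indexing with a list keeps
        # the column dimension, so data.y is [1, 3] per molecule and [N, 3] per batch.
        data.y = data.y[:, [13, 14, 11]]
        # Equivalent: data.y = torch.index_select(data.y, 1, torch.tensor([13, 14, 11]))
        return data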

zetayue closed this as completed Jul 16, 2021