
Predicting other values from the QM9 dataset #3

Closed
PratyushAvi opened this issue Jun 14, 2021 · 6 comments

Comments


PratyushAvi commented Jun 14, 2021

Hello!

The QM9 dataset also contains information regarding the atomization energies of molecules. Can we predict those by simply adding to the targets list in main.py?

Also, can you please explain why you add 5 to the target values in [7,8,9,10]?

zetayue (Owner) commented Jun 14, 2021

Hi!

> The QM9 dataset also contains information regarding the atomization energies of molecules. Can we predict those by simply adding to the targets list in main.py?

Yes, we are actually predicting those atomization energies instead of the original U_0, U, H, G values, as mentioned in Appendix Section 7.1 of our work; this is also the common approach used in related works.

> Also, can you please explain why you add 5 to the target values in [7,8,9,10]?

Since we are not predicting the original values of the U_0, U, H, G targets (target indices 7 to 10 in QM9), we directly add 5 to those target indices to load the corresponding atomization energy targets (target indices 12 to 15 in QM9); a short sketch of this shift follows the table below.

MXMNet/qm9_dataset.py

Lines 78 to 94 in 265db78

| 7  | :math:`U_0`                 | Internal energy at 0K              | :math:`\textrm{eV}`                         |
| 8  | :math:`U`                   | Internal energy at 298.15K         | :math:`\textrm{eV}`                         |
| 9  | :math:`H`                   | Enthalpy at 298.15K                | :math:`\textrm{eV}`                         |
| 10 | :math:`G`                   | Free energy at 298.15K             | :math:`\textrm{eV}`                         |
| 11 | :math:`c_{\textrm{v}}`      | Heat capacity at 298.15K           | :math:`\frac{\textrm{cal}}{\textrm{mol K}}` |
| 12 | :math:`U_0^{\textrm{ATOM}}` | Atomization energy at 0K           | :math:`\textrm{eV}`                         |
| 13 | :math:`U^{\textrm{ATOM}}`   | Atomization energy at 298.15K      | :math:`\textrm{eV}`                         |
| 14 | :math:`H^{\textrm{ATOM}}`   | Atomization enthalpy at 298.15K    | :math:`\textrm{eV}`                         |
| 15 | :math:`G^{\textrm{ATOM}}`   | Atomization free energy at 298.15K | :math:`\textrm{eV}`                         |
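
For reference, here is a minimal sketch of that index shift (illustrative only; the variable names are assumed and may not match main.py exactly):

target = 7                    # e.g. the U_0 target requested on the command line
if target in [7, 8, 9, 10]:   # U_0, U, H, G
    target += 5               # -> 12, 13, 14, 15: the atomization energy columns above
print(target)                 # 12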

An alternative way is to call the atomref function to subtract the atomic reference energies from the total energies when loading the dataset:

MXMNet/qm9_dataset.py

Lines 138 to 143 in 265db78

def atomref(self, target):
    if target in atomrefs:
        out = torch.zeros(100)
        out[torch.tensor([1, 6, 7, 8, 9])] = torch.tensor(atomrefs[target])
        return out.view(-1, 1)
    return None
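
As a rough illustration of that alternative (a hedged sketch, not code from the repo; it assumes dataset is the loaded QM9 dataset and data is a single molecule whose atomic numbers are stored in data.z):

# Hedged sketch: subtract per-element reference energies from a total-energy target.
# atomref(target) returns a [100, 1] lookup table indexed by atomic number,
# or None when no reference energies are defined for that target.
ref = dataset.atomref(target)
if ref is not None:
    # Sum the reference energies of the atoms in this molecule and subtract them
    # from the molecule's total energy to obtain an atomization-style target.
    data.y[:, target] = data.y[:, target] - ref[data.z].sum()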

PratyushAvi (Author) commented

Thank you so much!!

PratyushAvi (Author) commented

Is there a way to predict multiple values at once? For example, if we wanted our targets to be u_298, h_298, and cv, is there a way to modify MXMNet to do that?

zetayue (Owner) commented Jun 23, 2021

Yes, you can just change the single-target input for QM9 to multiple targets and let the model output multiple values for each molecule.

For example, to predict u_298, h_298, and cv, the corresponding indices are 13, 14, and 11. The MyTransform function has to be changed in order to load the multiple ground-truth targets: data.y = data.y[:, 13, 14, 11]

MXMNet/main.py

Lines 65 to 68 in fa1dbc5

class MyTransform(object):
    def __call__(self, data):
        data.y = data.y[:, target]
        return data

Then the model should output a tensor of size [N, 3] for N molecules with 3 targets. The following line should be changed to self.y_W = nn.Linear(self.dim, 3):

self.y_W = nn.Linear(self.dim, 1)

With these changes, both the output and data.y will have size [N, 3] for N molecules, so the L1 loss can be computed between them:

MXMNet/main.py

Line 117 in fa1dbc5

loss = F.l1_loss(output, data.y)
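
As a quick sanity check on the shapes (illustrative values only, not code from the repo):

import torch
import torch.nn.functional as F

N = 32                        # a batch of N molecules
output = torch.randn(N, 3)    # model predictions for the 3 targets
y = torch.randn(N, 3)         # ground truth, i.e. data.y after the transform
loss = F.l1_loss(output, y)   # scalar: mean absolute error over all N * 3 entries
print(loss.shape)             # torch.Size([])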

PratyushAvi (Author) commented Jun 23, 2021

I tried the modifications that you suggested. However, I don't think it recognizes the change in the tensor's size. I get the following error:

  File "/Users/pratyushavi/Developer/Projects/AAMP-UP/QM9/MXMNet/main.py", line 111, in <module>
    for data in train_loader:
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 557, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.9/site-packages/torch_geometric/data/dataset.py", line 188, in __getitem__
    data = data if self.transform is None else self.transform(data)
  File "/Users/pratyushavi/Developer/Projects/AAMP-UP/QM9/MXMNet/main.py", line 67, in __call__
    data.y = data.y[:, 13, 14, 11]
IndexError: too many indices for tensor of dimension 2

How do you suggest we deal with this?

EDIT: I had initially posted a testing version that I had made for myself. The line numbers would've been harder to trace because of that, so I updated the error message with what I got from running main.py with the modification.

zetayue (Owner) commented Jun 23, 2021

You can try data.y = data.y[:, [13, 14, 11]] or torch.index_select(data.y, 1, torch.tensor([13, 14, 11])). The goal is to get a tensor of size [N, 3] from those indices.
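
Putting the pieces together, a hedged sketch of the adjusted transform (it follows the snippets above; the only other change assumed is the nn.Linear output size mentioned earlier):

import torch

class MyTransform(object):
    def __call__(self, data):
        # Select u_298 (13), h_298 (14), and cv (11). Indexing with a list keeps
        # the column dimension, so data.y is [1, 3] per molecule and [N, 3] per batch.
        data.y = data.y[:, [13, 14, 11]]
        # Equivalent: data.y = torch.index_select(data.y, 1, torch.tensor([13, 14, 11]))
        return data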

zetayue closed this as completed Jul 16, 2021