Atomic Energy Selection and Kernel Selection Methodologies Usage #104

Open
ankur56 opened this issue May 11, 2023 · 9 comments
Assignees
Labels
documentation Improvements or additions to documentation question Further information is requested

Comments

@ankur56

ankur56 commented May 11, 2023

Hello Maintainers,

I am working with a dataset of around 5000 molecular configurations, which are not necessarily generated through MD simulations. I am interested in employing the "Atomic Energy Selection" methodology, as outlined in this paper, to assemble a dataset for training my machine learning model.

In addition to this, the other methods available in this repository, such as the kernel selection method, seem to align well with my use-case requirements.

To better understand the implementation and to ensure I am utilizing these methodologies correctly, it would be greatly beneficial if you could provide an example code for the Atomic Energy Selection and Kernel Selection methodologies.

I have also spoken about this issue to Samuel Tovey via email.

@SamTov
Member

SamTov commented May 11, 2023

@PythonFZ is atomic energy selection on classically generated data possible in IPSuite at the moment?

In any case, an example script for MMK data selection might be sufficient here, as descriptor-based data selection has been shown to be quite effective.

@PythonFZ PythonFZ self-assigned this May 15, 2023
@PythonFZ PythonFZ added documentation Improvements or additions to documentation question Further information is requested awaiting-response labels May 15, 2023
@PythonFZ
Member

@ankur56 I created a repository that shows how the kernel selection method can be used:
https://dagshub.com/PythonFZ/IPS-Examples/src/ConfigurationSelection/main.ipynb

You can easily reproduce the example with the following code:

pip install git+https://github.com/zincware/IPSuite
git clone https://github.com/PythonFZ/IPS-Examples.git
# enter the cloned repository before switching branches
cd IPS-Examples
git checkout ConfigurationSelection
dvc pull

# if you make some changes you can use the following to execute them
dvc repro

The examples from the README might also be helpful for the general setup.

We are currently working on documentation for IPSuite, but I don't know when it will be finished.
If you encounter any issues or have further questions, feel free to ask here.

More information on how IPSuite works can be found at
https://zntrack.readthedocs.io/en/latest/
https://dvc.org/
https://github.com/zincware/ZnFlow

@ankur56
Author

ankur56 commented May 16, 2023

Hi @PythonFZ,
Thank you for providing such a detailed response. I attempted to use the Jupyter notebook example, but encountered an error with the dvc command. The error log is as follows:

DVCProcessError                           Traceback (most recent call last)
Cell In[2], line 12
      8     mmk_selection = ips.configuration_selection.KernelSelection(kernel=kernel, data=data, initial_configurations=None, n_configurations=10)
      9     uniform_energetic_selection = ips.configuration_selection.UniformEnergeticSelection(
     10         data=data, n_configurations=10
     11     )
---> 12 project.run(repro=False)

File ~/miniconda3/envs/ipsuite/lib/python3.10/site-packages/zntrack/project/zntrack_project.py:163, in Project.run(self, eager, repro, optional, save, environment)
    161         cmd = get_dvc_cmd(node, **optional.get(node.name, {}))
    162         for x in cmd:
--> 163             run_dvc_cmd(x)
    164         node.save(results=False)
    165 if not eager and repro:

File ~/miniconda3/envs/ipsuite/lib/python3.10/site-packages/zntrack/utils/__init__.py:114, in run_dvc_cmd(script)
    112 return_code = dvc.cli.main(script)
    113 if return_code != 0:
--> 114     raise DVCProcessError(
    115         f"DVC CLI failed ({return_code}) for cmd: \n \"{' '.join(script)}\" "
    116     )
    117 # fix for https://github.com/iterative/dvc/issues/8631
    118 for logger_name, logger in logging.root.manager.loggerDict.items():

DVCProcessError: DVC CLI failed (255) for cmd: 
 "stage add --quiet --name AddData --force --outs nodes/AddData/atoms.h5 --deps /home/ankur/Documents/ipsuite/IPS-Examples/KCl1650K.extxyz --params params.yaml:AddData zntrack run ipsuite.data_loading.add_data_ase.AddData --name AddData" 

I'm currently trying to understand the params.yaml file, and it appears that I may need to modify some of the parameters to meet my specific needs. Since my dataset is non-periodic, I might have to use the periodic: false option in the SOAP kernel or select a different kernel entirely. Which kernels are currently available in IPSuite?

@ankur56
Author

ankur56 commented May 16, 2023

@SamTov and @PythonFZ, I was also hoping you could provide some clarification on the kernel plot displayed in the kernel_selection.gif figure. It appears that the x-axis represents the number of configurations in the training set, while the y-axis depicts the kernel value. Additionally, it seems that the number of local maxima is increasing with each iteration. Can you please explain what is happening in this plot?

I am also curious about the Minimum Membership Kernel (MMK) selection method, but I haven't been able to find much information online. Could you please direct me to the relevant paper? Thank you.

@PythonFZ
Member

The MMK value represents how similar a given configuration is to a set of other configurations.
The kernel_selection.gif displays the MMK value. The configuration with the smallest MMK value is selected, thereby increasing the number of maxima by one with each selection. This enables us to create training data sets with high dissimilarity between the configurations.
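The selection loop described above can be sketched in a few lines of plain Python. This is only an illustration, not IPSuite's implementation: the Gaussian kernel, the one-dimensional toy descriptors, and the choice to score a candidate by its largest kernel value against the already selected set are all assumptions, and the function names are hypothetical.

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    """Toy similarity in (0, 1]; 1.0 means identical descriptors (assumed kernel)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma**2))


def greedy_dissimilar_selection(descriptors, n_select, start=0):
    """Greedily pick indices so that each new pick is the least similar to the picks so far."""
    n_select = min(n_select, len(descriptors))
    selected = [start]
    while len(selected) < n_select:
        best_index, best_score = None, float("inf")
        for i, descriptor in enumerate(descriptors):
            if i in selected:
                continue
            # Similarity to the selected set: largest kernel value against
            # any selected member (an assumption made for this sketch).
            score = max(gaussian_kernel(descriptor, descriptors[j]) for j in selected)
            if score < best_score:
                best_index, best_score = i, score
        # The candidate with the smallest similarity joins the selected set.
        selected.append(best_index)
    return selected


# Toy 1-D "descriptors": two tight clusters plus one far-away point.
descriptors = [(0.0,), (0.1,), (5.0,), (5.2,), (10.0,)]
picked = greedy_dissimilar_selection(descriptors, n_select=3)
print(picked)
```

Each pick becomes a point of maximal similarity to the selected set, which would match the growing number of maxima in the plot, while near-duplicates of already selected configurations are chosen last.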

For your DVC error:

What happens if you run dvc stage add --name AddData --force --outs nodes/AddData/atoms.h5 --deps /home/ankur/Documents/ipsuite/IPS-Examples/KCl1650K.extxyz --params params.yaml:AddData zntrack run ipsuite.data_loading.add_data_ase.AddData --name AddData ?

The periodic flag in the params.yaml should not be there and will be removed when #111 is fixed. If you build the graph with non-periodic data it should change. Otherwise, please change it manually.

@ankur56
Author

ankur56 commented May 23, 2023

@PythonFZ Thank you for explaining the MMK method.

When I run the command,

dvc stage add --name AddData --force --outs /home/ankur/Documents/ipsuite/IPS-Examples/nodes/AddData/atoms.h5 --deps /home/ankur/Documents/ipsuite/IPS-Examples/KCl1650K.extxyz --params params.yaml:AddData zntrack run ipsuite.data_loading.add_data_ase.AddData --name AddData

I get the following error,

ERROR: unexpected error - module 'platformdirs' has no attribute 'site_cache_dir'

Seems to be related to this issue https://discuss.dvc.org/t/module-platformdirs-has-no-attribute-site-cache-dir/1636/4

The code works fine with platformdirs version 3.1.1, as described in that issue. I will now test this code on my dataset.
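For anyone hitting the same error, a quick way to see which version of a package is installed and whether it exposes the attribute named in the message is sketched below. The helper name describe_module is hypothetical, and the snippet is demonstrated on a standard-library module so that it runs without platformdirs installed.

```python
import importlib
from importlib import metadata


def describe_module(module_name, attribute):
    """Return (installed version or None, whether the module exposes `attribute`)."""
    module = importlib.import_module(module_name)
    try:
        # Assumes the distribution name equals the module name,
        # which holds for platformdirs but not for every package.
        version = metadata.version(module_name)
    except metadata.PackageNotFoundError:
        version = None  # e.g. standard-library modules carry no distribution metadata
    return version, hasattr(module, attribute)


# Demonstrated on a standard-library module; for the error above one would
# call describe_module("platformdirs", "site_cache_dir") instead.
print(describe_module("json", "dumps"))
```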

@ankur56
Author

ankur56 commented May 24, 2023

I am now trying to run the code in a different directory on my dataset. I also changed the periodic flag to periodic: false before running the code. My code is as follows,

#!/usr/bin/env python3

import ipsuite as ips

kernel = ips.configuration_comparison.MMKernel()

with ips.Project(automatic_node_names=True, remove_existing_graph=True) as project:
    data = ips.AddData(file="/home/ankur/Documents/ipsuite/new/data.xyz")
    random_selection = ips.configuration_selection.RandomSelection(
        data=data, n_configurations=20
    )
    mmk_selection = ips.configuration_selection.KernelSelection(kernel=kernel, data=data, initial_configurations=None, n_configurations=10)
    uniform_energetic_selection = ips.configuration_selection.UniformEnergeticSelection(
        data=data, n_configurations=10
    )
project.run(repro=False)

mmk_selection.load()
print(mmk_selection.selected_configurations)

random_selection.load()
print(random_selection.selected_configurations)

I got the following output from this code,

2023-05-23 17:34:26,283 (DEBUG): Welcome to IPS - the Interatomic Potential Suite!
2023-05-23 17:34:26.476373: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-05-23 17:34:26.511040: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-05-23 17:34:26.511368: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-23 17:34:27.055499: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-05-23 17:34:27,645 (WARNING): Please run 'dvc add /home/ankur/Documents/ipsuite/new/data.xyz' to track the file with DVC. Otherwise, it might end up being git tracked.
Running DVC command: 'stage add --name AddData --force ...'
Running DVC command: 'stage add --name ConfigurationSelection --force ...'
Running DVC command: 'stage add --name ConfigurationSelection_1 --force ...'
Running DVC command: 'stage add --name ConfigurationSelection_1_kernel --force ...'
Could not create .gitignore entry in /home/ankur/Documents/ipsuite/new/nodes/ConfigurationSelection_1_kernel/.gitignore. DVC will attempt to create .gitignore entry again when the stage is run.
Running DVC command: 'stage add --name ConfigurationSelection_2 --force ...'
Could not load field selected_configurations for node ConfigurationSelection_1.
<class 'zntrack.utils.LazyOption'>
Could not load field selected_configurations for node ConfigurationSelection.
<class 'zntrack.utils.LazyOption'>

Am I missing something?

P.S. After code execution, the periodic flag automatically reverts back to periodic: true.

@PythonFZ
Member

You seem to have fixed the DVC issue. You are running the code with project.run(repro=False), i.e. the graph is written but not executed. There are many ways to run the code; the easiest is to call project.run() instead.
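To make the effect of the flag concrete, here is a toy model of the non-eager code path, reconstructed only from the traceback excerpt earlier in this thread (zntrack_project.py, lines 161 to 166). It is a simplification, not ZnTrack's actual code, and the function name planned_dvc_commands is hypothetical.

```python
def planned_dvc_commands(stage_names, repro=True):
    """Toy model: list the DVC commands a non-eager run would issue.

    Every node is always registered as a stage; the stages are only
    executed when `repro` is true.
    """
    # "..." stands for the remaining arguments, abbreviated as in the log output.
    commands = [f"stage add --name {name} --force ..." for name in stage_names]
    if repro:
        # Only this step actually runs the stages and produces results.
        commands.append("repro")
    return commands


stages = ["AddData", "ConfigurationSelection", "ConfigurationSelection_1"]
for command in planned_dvc_commands(stages, repro=False):
    print(command)
```

With repro=False no stage is ever executed, which would explain why loading selected_configurations afterwards only yields a LazyOption placeholder: there are no results on disk yet.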

@ankur56
Author

ankur56 commented May 24, 2023

@PythonFZ Thank you for your prompt response.

I ran the code with project.run(), but got an error: got an unexpected keyword argument 'rcut'. I also tried removing the r_cut entry from the params file, but still got the same error. My params file and the traceback are as follows,

AddData:
    lines_to_read: null
ConfigurationSelection:
    n_configurations: 20
    seed: 1234
ConfigurationSelection_1:
    correlation_time: 1
    n_configurations: 20
    points_per_cycle: 1
    seed: 1234
ConfigurationSelection_1_kernel:
    soap:
        _type: soap_parameter_dataclass
        value:
            l_max: 7
            n_jobs: -1
            n_max: 7
            periodic: false
            r_cut: 9.0
            rbf: gto
            sigma: 1.0
            weighting: null
ConfigurationSelection_2:
    n_configurations: 20

/home/ankur/miniconda3/envs/ipsuite/lib/python3.10/site-packages/ipsuite/configuration_comparison/base.py:285 in run

  282         configurations and save the result as a csv file.
  283         """
  284         self.result = []
❱ 285         self.save_representation()
  286         if self.reference is None:
  287             with h5py.File(self.soap_file, "r") as representation_file:
  288                 with trange(

locals:
    self = <ipsuite.configuration_comparison.MMKernel.MMKernel object at 0x145546df5f60>

/home/ankur/miniconda3/envs/ipsuite/lib/python3.10/site-packages/ipsuite/configuration_comparison/base.py:207 in save_representation

  204          and save them ordered in a hdf5 file.
  205         """
  206         species = [int(x) for x in set(self.analyte[0].get_atomic_numbers())]
❱ 207         _soap = SOAP(
  208             species=species,
  209             periodic=self.soap.periodic,
  210             rcut=self.soap.r_cut,

locals:
    self = <ipsuite.configuration_comparison.MMKernel.MMKernel object at 0x145546df5f60>
    species = [1, 6, 7]

TypeError: SOAP.__init__() got an unexpected keyword argument 'rcut'
Traceback (most recent call last):
  File "/home/ankur/Documents/ipsuite/new/test_mmk_ips.py", line 20, in <module>
    project.run()
  File "/home/ankur/miniconda3/envs/ipsuite/lib/python3.10/site-packages/zntrack/project/zntrack_project.py", line 166, in run
    run_dvc_cmd(["repro"])
  File "/home/ankur/miniconda3/envs/ipsuite/lib/python3.10/site-packages/zntrack/utils/__init__.py", line 114, in run_dvc_cmd
    raise DVCProcessError(
zntrack.utils.DVCProcessError: DVC CLI failed (255) for cmd: 
 "repro --quiet" 
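A note on reading this traceback: the keyword rcut is written directly in base.py (line 210, rcut=self.soap.r_cut), so editing params.yaml cannot change it. The TypeError suggests that the installed SOAP class does not accept that spelling. One way to check which spelling a constructor accepts is sketched below. The two classes are hypothetical stand-ins, not the real descriptor class, and the helper names are made up for illustration.

```python
import inspect


class OldStyleSOAP:
    """Hypothetical stand-in whose constructor takes `rcut`."""

    def __init__(self, species, periodic, rcut):
        self.species, self.periodic, self.cutoff = species, periodic, rcut


class NewStyleSOAP:
    """Hypothetical stand-in whose constructor takes `r_cut`."""

    def __init__(self, species, periodic, r_cut):
        self.species, self.periodic, self.cutoff = species, periodic, r_cut


def cutoff_keyword(cls, candidates=("r_cut", "rcut")):
    """Return the first candidate keyword that cls.__init__ declares, else None."""
    parameters = inspect.signature(cls.__init__).parameters
    for name in candidates:
        if name in parameters:
            return name
    return None  # also the result when the constructor only takes **kwargs


def build_descriptor(cls, species, periodic, cutoff):
    """Construct cls with whichever cutoff keyword its signature declares."""
    keyword = cutoff_keyword(cls)
    if keyword is None:
        raise TypeError(f"{cls.__name__} declares no known cutoff keyword")
    return cls(species=species, periodic=periodic, **{keyword: cutoff})


print(cutoff_keyword(OldStyleSOAP), cutoff_keyword(NewStyleSOAP))
soap = build_descriptor(NewStyleSOAP, species=[1, 6, 7], periodic=False, cutoff=9.0)
print(soap.cutoff)
```

If the installed class only declares the other spelling, the mismatch lies between the IPSuite version and the version of the descriptor library in the environment rather than in the params file.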


3 participants