# Adaptive sampling

This script shows how to get geomtries out of the files produced with SchNarc using the adaptive sampling mode, an example to run the calculations with SHARC and parser to get the data out of SHARC-output files including a function to extend the data base.

In [1]:
path = "./"
import ase.io
import ase
import numpy as np
import schnetpack as spk
from ase.units import Bohr
print(Bohr)
import os
SHARC="/user/julia/software/SchNarc/sharc/source/../bin/"
from schnarc.utils import read_QMout

0.5291772105638411


## Running adaptive sampling

For adaptive sampling we need two models. These are provided in the folder "Models". 

For adaptive sampling you need 4 files:
* geom
* veloc
* input
* run.sh


The geom, veloc and input files for the dynamics with SchNarc can be generated with SHARC. Please have a look at the SHARC tutorial on how to set up trajectories. You will have 10 folders already provided for the purpose of this tutorial that contain these files.
The run.sh file contains the commands for adaptive sampling.
You need at least two models to do adaptive sampling.

Go into each of the folders and execute the run.sh file. It will terminate when the error of the models is larger than 1 eV.

Alternatively, below you can see how to run a trajectory.
You need the arguments --modelpaths PATH --adaptive --threhsolds 1 1 1 1 1.
The latter are errors for energy, forces, dipoles, socs, and nacs in a.u. that should not be exceeded by the two models. PATH should point to the path of the second ML model.
The dynamics will be run using the mean of all properties from the specified models. Note that you can use as many models as you like, just add more paths after the argument "--modelpaths".

If you set --print_uncertainty, the deviation of the model predictions will be print in a.u. into the file "NN.log".
Note that you can also use a gpu for dynamics, then please add "--cuda".

#### Attention
Running dynamics from this notebook takes much longer than executing the file "run.sh" in each folder using bash.

In [5]:
SCHNARC = "../../../src/scripts/"
os.system("cd TRAJ_1/ ; python %s/schnarc_md.py pred ../../DBs/Fulvene.db ../../Models/Model1 --modelpaths ../../Models/Model2 --adaptive --thresholds 0.1 100 100 100 1 --print_uncertainty >> NN.log 2>> NN.err "%SCHNARC)

2


First, use the data extractor from SHARC to generate the output.xyz files with the geometries.
Read the geometries with ase.
We have chosen to take the last and the last-5 geometry for the adaption of the training set

## Get geometries 
### Trajectories from ML model trained on energies and forces (Hessian approximation)

In [9]:
geoms=[]
ntraj = 1
ind=[]
for i in range(ntraj):
    
    #generate output.xyz 
    os.system("cd %s/TRAJ_%i/ ; %s/data_extractor_NetCDF.x -xyz output.dat; cd ../.."%(path,i+1,SHARC))
    
    geometries = ase.io.read("%s/TRAJ_%i/output.xyz"%(path,i+1),":")
    
    #this happens if the first geometry is predicted with a too large error
    if len(geometries)==0:
        if i+1 not in ind:
            ind.append(i+1)
    else:
        #append geometries with last geometry of trajectory file
        geoms.append(geometries[-1])
print(len(geoms))


1


## Make folders for calculation

In this example we will generate X folder (X=number of geometries for expanding the training set).
We will copy all relevant files to make QM calculations with SHARC.
Make sure you have downloaded the "Inputfile" folder and make sure to adapt the "run.sh" file for your cluster. 
For the sake of the tutorial, we will also provide the reference calculations in the zip-file "AdaptiveSampling-Calculations.zip". So you don't need to run the calculations, but you can copy the results and go on with adapting the training set.
Note that this zip only contains all file in the first folder, and in all other folders only the inputs and outputs are saved.

Note that you additionally have to copy the SAVEDIR folder and to adapt the "QMstring"-file to carry out phasecorrection as it was done for generating the initial training set.

In [10]:
#make adaptive sampling folders
inputs = "/../Inputfiles/"
for i in range(len(geoms)):
    os.system("mkdir QM")
    os.system("mkdir QM/Geom_%05d"%i)
    ase.io.write("QM/Geom_%05d/geometry.xyz"%i,geoms[i])
    os.system("cp QM/Geom_%05d/geometry.xyz QM/Geom_%05d/QM.in"%(i,i))
    os.system("cat %s/QMstring >> QM/Geom_%05d/QM.in" %(inputs,i))
    os.system(" cp %s/MOLPRO* QM/Geom_%05d/"%(inputs,i))
    #os.system(" mkdir QM/Geom_%05d/SAVEDIR"%i)
    #os.system(" cp %s/wf.1 QM/Geom_%05d/SAVEDIR"%(inputs,i))
    os.system(" cp %s/run.sh QM/Geom_%05d/" %(inputs,i))


In [25]:
print("Copy the submission script for the cluster you use into every folder and submit the calculations")

Copy the submission script for the cluster you use into every folder and submit the calculations


## Adapting the training set

For now we copy a dummy-QMout file from the Initial training set generation into every QM-folder for demonstration.
Note that units are all given in a.u. and that the geometries need to be saved in a.u. as well

In [13]:
for i in range(len(geoms)):
    os.system("cp ../1_TrainingSet/InitialConditions/ICOND_00001/QM.out QM/Geom_%05d/"%i)


## Read QMout

In [17]:
#iterating over all files
nfiles = len(geoms)
#define number of states
ntriplets = 0
nsinglets = 2
nstates = nsinglets + 3 * ntriplets

#define number of atoms
natoms = 12
#number of nacs
nnacs = int(nsinglets*(nsinglets-1)/2) + int(ntriplets*(ntriplets-1)/2)
#number of dipole moment values
ndipoles = int(nsinglets+ntriplets+nnacs)


#for conversion of atoms into bohr
from ase.units import Bohr

#data dictionary for updating the data base
data = {}

filename="./QM/Geom_00000/QM.out"
#we don't have spin-orbit couplings
socs=False
#read properties
data=read_QMout(filename,natoms,socs,nsinglets,ntriplets,0.5)
atoms = ase.io.read("TRAJ_1/output.xyz","-1")
#convert to bohr
from ase.units import Bohr
atoms = ase.atoms.Atoms(atoms.get_atomic_numbers(),atoms.get_positions()/Bohr)




### Add data to the data base

In [28]:
# load the old data base
from ase.db import connect
old_db = connect("../DBs/Fulvene.db")
for i in range(len(data)):
    old_db.write(atoms[i],data=data[i])

In [31]:
print("We have added",len(atoms), "data points to the data set and now have a total number of",len(old_db), "data points.")

We have added 20 data points to the data set and now have a total number of 120 data points.


Now retrain the models and redo the sampling until the training set is large enough.
For fulvene and the 40 fs we compute, the training set is large enough.

If you want to make production runs and use a second model to compare the energies, but not to compute forces or hessians, then use " --emodel2 PATH/ " instead of "--adaptive --modelpaths PATH".
The threshold is set to 1 eV per default. You can change the threshold via "--thresholds 1 1 1 1 1". (thresholds are always given for every possible property).