# Database Exploration

This Notebook outlines the Python functions for searching the Interatomic Potentials Repository database and retrieving potentials from it.

This is meant to be a moderately deep guide to the search capabilities for those wishing to integrate them into their own projects. If you simply want to search the content in the database, see the "Search..." Notebooks instead.


## Colab setup

*If you are running this Notebook in Colab, you will need to install potentials:*

1. Run the next cell to download and install the potentials package.
2. Restart the runtime: Runtime -> Restart runtime (ctrl+M .)

Otherwise, skip to the next section.

In [1]:
!pip install potentials

print('RESTART TERMINAL AFTER RUNNING!!!!')
print('Runtime -> Restart runtime')

Collecting potentials
  Downloading https://files.pythonhosted.org/packages/bf/6e/2f0bae6dde31cbc5b1b92546c0707cd91f4df84ec75fd1880a3b1bd3653e/potentials-0.1.0-py3-none-any.whl (68kB)
Installing collected packages: potentials
Successfully installed potentials-0.1.0
RESTART TERMINAL AFTER RUNNING!!!!
Runtime -> Restart runtime


## 1. Import potentials and initialize the database

Library imports

In [1]:
from pathlib import Path

import potentials

import pandas as pd

# display and HTML are part of the public IPython.display module
from IPython.display import display, HTML

### Database initialization

Initializing a potentials.Database() object without any parameters will set the default access parameters for [potentials.nist.gov](https://potentials.nist.gov) so users can search records.

In [2]:
potdb = potentials.Database()

### Optional: Load records

The records associated with one or more schemas may be loaded prior to searching, either by specifying load parameters during initialization or by calling load methods afterwards.  A load operation downloads all records of a given schema from the database at once and stores them locally in memory. Loading all records of a given schema takes a few seconds, but searches on loaded records are faster than searches on the remote records. Whether loading saves time overall therefore depends on the intended usage.

- Simple operations, like single searches for a small number of records, are overall faster *without* loading as only the matching records will be downloaded.

- Complex operations, like interactive tools with repeated search calls, will likely benefit from loading.
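
The trade-off between the two cases above can be estimated with simple arithmetic. The sketch below is not part of potentials: the function name is made up for illustration and the timing values are hypothetical placeholders, so measure your own before deciding.

```python
import math

def break_even_searches(load_s, remote_s, local_s):
    """Return the smallest number of searches for which loading first is faster.

    load_s   : one-time cost of loading all records (seconds)
    remote_s : cost of one search against the remote database (seconds)
    local_s  : cost of one search against loaded records (seconds)
    """
    if remote_s <= local_s:
        raise ValueError('loading never pays off if local searches are not faster')
    # Loading wins once load_s + n * local_s < n * remote_s
    return math.floor(load_s / (remote_s - local_s)) + 1

# Hypothetical timings, for illustration only
print(break_even_searches(load_s=5.0, remote_s=0.5, local_s=0.01))
```

With these placeholder numbers, loading only pays off after roughly a dozen searches, which matches the guidance that one-off searches are better left unloaded.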

**Note:** Ideally, the search methods below return the same results whether or not records have been loaded. However, the two cases use quite different underlying query mechanisms, so this is not yet guaranteed.

In [3]:
#potdb.load_potentials(verbose=True)

In [4]:
#potdb.load_potential_LAMMPS(verbose=True)

## 2. Potential metadata

The records with the "Potential" schema define metadata associated with the potential, including citation information, usage notes, and a list of known implementations with files, parameter values, and/or external links.

The Database.get_potentials() method will search for matching potentials based on the specified parameters.

Parameters

- __id__ (*str or list, optional*) Potential ID(s) to search for.  These are unique identifiers derived from the publication information and the elemental system being modeled.

- __key__ (*str or list, optional*) UUID4 key(s) to search for.  Each entry has a unique, randomly generated UUID4 key.

- __author__ (*str or list, optional*) Author string(s) to search for.

- __year__ (*int or list, optional*) Publication year(s) to search for.

- __element__ (*str or list, optional*) Element model(s) to search for.

- __localpath__ (*str, optional*) If specified, any Potential records stored locally in localpath will be included in the search.

- __verbose__ (*bool, optional*) Indicates if informative print statements will be generated during the method operation.  Default value is False.

Returns

- __Potentials__ (*numpy.ndarray*) potentials.Potential objects for all matching Potential records.


In [5]:
pots = potdb.get_potentials(
    year = [2003, 2007],
    author = 'Mendelev',
    element = 'Fe',
)

# Show potentials
for pot in pots:
    print(pot)

Potential 2003--Mendelev-M-I-Han-S-Srolovitz-D-J-et-al--Fe-2
Potential 2003--Mendelev-M-I-Han-S-Srolovitz-D-J-et-al--Fe-5
Potential 2007--Mendelev-M-I-Han-S-Son-W-et-al--V-Fe
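
The IDs printed above appear to follow a `year--authors--model` layout separated by double hyphens. The sketch below splits an ID on that assumption. The helper name is hypothetical and the three-part layout is inferred only from the IDs shown here, not from a documented guarantee.

```python
def split_potential_id(potid):
    """Split a potential ID into its year, author, and model parts.

    Assumes the ID contains exactly three fields separated by '--',
    as in the search results shown in this Notebook.
    """
    year, authors, model = potid.split('--')
    return {'year': int(year), 'authors': authors, 'model': model}

info = split_potential_id('2003--Mendelev-M-I-Han-S-Srolovitz-D-J-et-al--Fe-2')
print(info)
```

An ID that does not have exactly three fields raises a ValueError during unpacking, which is a reasonable signal that the assumed layout does not hold.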


### 2.1. Explore metadata for the potentials

The search returns Potential objects that describe the metadata associated with the potentials.  This information can be accessed in a variety of ways.

#### html

The Potential.html() method builds HTML content for the potential consistent with how the entry appears on the Interatomic Potentials Repository.

In [6]:
potential = pots[0]
display(HTML(potential.html()))

#### asdict

The Potential.asdict() method builds a dictionary of the terms used by the Potential object.

In [7]:
potential.asdict()

{'key': '06bf7ccd-ea6a-4744-bfbc-adefa5bcdd60',
 'id': '2003--Mendelev-M-I-Han-S-Srolovitz-D-J-et-al--Fe-2',
 'recorddate': datetime.date(2018, 8, 15),
 'notes': "This listing is for the reference's Fe #2 interaction parameters.",
 'fictional': False,
 'elements': ['Fe'],
 'othername': None,
 'modelname': '2',
 'citations': [<potentials.Citation.Citation at 0x19135fd1f08>],
 'implementations': [<potentials.Implementation.Implementation at 0x19136e7c6c8>,
  <potentials.Implementation.Implementation at 0x19136eca2c8>,
  <potentials.Implementation.Implementation at 0x19136eca348>]}

The asdict() method is also useful for building DataFrame objects for comparing multiple records.

In [8]:
# Build one row per potential from its dictionary representation
df = pd.DataFrame([pot.asdict() for pot in pots])
df

| | key | id | recorddate | notes | fictional | elements | othername | modelname | citations | implementations |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 06bf7ccd-ea6a-4744-bfbc-adefa5bcdd60 | 2003--Mendelev-M-I-Han-S-Srolovitz-D-J-et-al--... | 2018-08-15 | This listing is for the reference's Fe #2 inte... | False | [Fe] | | 2.0 | [<potentials.Citation.Citation object at 0x000... | [<potentials.Implementation.Implementation obj... |
| 1 | 3e8333b6-bc3a-470e-99cc-abb8a422403e | 2003--Mendelev-M-I-Han-S-Srolovitz-D-J-et-al--... | 2018-08-15 | This listing is for the reference's Fe #5 inte... | False | [Fe] | | 5.0 | [<potentials.Citation.Citation object at 0x000... | [<potentials.Implementation.Implementation obj... |
| 2 | 6b558c94-3b96-4bc1-840f-48521e8770b6 | 2007--Mendelev-M-I-Han-S-Son-W-et-al--V-Fe | 2018-08-15 | | False | [V, Fe] | | | [<potentials.Citation.Citation object at 0x000... | [<potentials.Implementation.Implementation obj... |
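
Because asdict() returns plain dictionaries, the same kind of comparison can also be done without pandas. The sketch below filters records by element. The records are stand-in dictionaries holding only two fields, copied by hand from the output above, and the helper name is hypothetical.

```python
# Stand-in dictionaries with two of the fields returned by asdict(),
# copied by hand from the search results shown in this Notebook
records = [
    {'id': '2003--Mendelev-M-I-Han-S-Srolovitz-D-J-et-al--Fe-2', 'elements': ['Fe']},
    {'id': '2003--Mendelev-M-I-Han-S-Srolovitz-D-J-et-al--Fe-5', 'elements': ['Fe']},
    {'id': '2007--Mendelev-M-I-Han-S-Son-W-et-al--V-Fe', 'elements': ['V', 'Fe']},
]

def with_all_elements(records, required):
    """Keep the records whose 'elements' list contains every required element."""
    required = set(required)
    return [rec for rec in records if required.issubset(rec['elements'])]

for rec in with_all_elements(records, ['V', 'Fe']):
    print(rec['id'])
```

With real search results, the stand-in list could be replaced by `[pot.asdict() for pot in pots]`.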


## 3. LAMMPS Potential Implementations

### 3.1. Search LAMMPS Potentials

You can also search the potential implementations that work in LAMMPS with the Database.get_potential_LAMMPS() method.

Parameters

- __id__ (*str or list, optional*) LAMMPS potential ID(s) to search for. These are unique for each version of a potential.

- __key__ (*str or list, optional*) LAMMPS potential UUID4 key(s) to search for.  Each LAMMPS potential record has its own unique key.

- __potid__ (*str or list, optional*) Potential ID(s) to search for.  These are unique identifiers derived from the publication information and the elemental system being modeled. This should match the id for one of the Potential records.

- __potkey__ (*str or list, optional*) Potential UUID4 key(s) to search for.  This should match the key for one of the Potential records.

- __status__ (*str or None, optional*) Implementation status ('active', 'superceded', 'retracted') to limit search by.  If None is given, then all LAMMPS potential versions will be searched.  Default value is 'active', i.e. only the current versions.

- __pair_style__ (*str or list, optional*) LAMMPS pair style(s) to limit search by.

- __element__ (*str or list, optional*) Elements modeled by the potential to limit search by.

- __symbol__ (*str or list, optional*) Model symbols defined by the potential to limit search by.

- __verbose__ (*bool, optional*) Indicates if informative print statements will be generated during the method operation.  Default value is False.

Returns

- __lammps_potentials__ (*list*) PotentialLAMMPS objects.

Get the active LAMMPS potentials associated with the potentials found above:

In [9]:
# Collect the keys of the Potential records found above
potkeys = [pot.key for pot in pots]

lmppots = potdb.get_potential_LAMMPS(
    potkey = potkeys
)

# Show potentials
for lmppot in lmppots:
    print(lmppot)

2003--Mendelev-M-I--Fe-2--LAMMPS--ipr3
2003--Mendelev-M-I--Fe-5--LAMMPS--ipr1
2007--Mendelev-M-I--V-Fe--LAMMPS--ipr1


Get all LAMMPS potential versions, regardless of status, associated with the first potential found above:

In [10]:
lmppots = potdb.get_potential_LAMMPS(
    potkey = pots[0].key,
    status = None
)

# Show potentials
for lmppot in lmppots:
    print(lmppot)

2003--Mendelev-M-I--Fe-2--LAMMPS--ipr1
2003--Mendelev-M-I--Fe-2--LAMMPS--ipr2
2003--Mendelev-M-I--Fe-2--LAMMPS--ipr3
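
The IDs above differ only in their trailing `iprN` suffix. The sketch below extracts that number so versions can be sorted or compared. The helper name is hypothetical and the suffix pattern is an assumption based on the IDs printed in this Notebook. The `status` field, not the number, is what marks a version as active.

```python
import re

def version_number(lammps_id):
    """Return N from an ID ending in '--iprN' (assumed naming pattern)."""
    match = re.search(r'--ipr(\d+)$', lammps_id)
    if match is None:
        raise ValueError(f'no ipr version suffix in {lammps_id!r}')
    return int(match.group(1))

# IDs copied from the output above
ids = [
    '2003--Mendelev-M-I--Fe-2--LAMMPS--ipr1',
    '2003--Mendelev-M-I--Fe-2--LAMMPS--ipr2',
    '2003--Mendelev-M-I--Fe-2--LAMMPS--ipr3',
]
newest = max(ids, key=version_number)
print(newest)
```

In the results shown here, the highest-numbered version of this potential is also the one returned by the default 'active' search.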


In [11]:
# Get active LAMMPS potentials with the bop pair_style and elements Al and Cu
lmppots = potdb.get_potential_LAMMPS(
    pair_style='bop',
    element=['Al', 'Cu']
)

# Show potentials
for lmppot in lmppots:
    print(lmppot)

2016--Zhou-X-W--Al-Cu--LAMMPS--ipr1
2016--Zhou-X-W--Al-Cu--LAMMPS--ipr2
2018--Zhou-X-W--Al-Cu-H--LAMMPS--ipr1


### 3.2. Download parameter files

The parameter files associated with LAMMPS potentials can be downloaded using the Database.download_LAMMPS_files() method.

Parameters

- __potential_LAMMPS__ (*PotentialLAMMPS or list*) The LAMMPS potential(s) to download parameter files for.

- __targetdir__ (*str, optional*) The root directory where all parameter files are to be saved. Default value is the current working directory.

To avoid naming conflicts, the downloaded files will be saved in subdirectories matching the id for each potential LAMMPS object.


Download the parameter files for the bop potentials identified above

In [12]:
potdb.download_LAMMPS_files(lmppots, targetdir='testdir')

Show the names of the downloaded files

In [13]:
for paramfile in Path('testdir').glob('*/*'):
    print(paramfile)

testdir\2016--Zhou-X-W--Al-Cu--LAMMPS--ipr1\AlCu.bop.table
testdir\2016--Zhou-X-W--Al-Cu--LAMMPS--ipr2\AlCu.bop.table
testdir\2018--Zhou-X-W--Al-Cu-H--LAMMPS--ipr1\AlCuH_Dec3.t
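
Since each potential gets its own subdirectory, the downloaded files can be grouped by potential ID with pathlib alone. The sketch below uses a hypothetical helper and builds a throwaway directory with empty placeholder files that mimic the layout above, so it runs without downloading anything.

```python
import tempfile
from collections import defaultdict
from pathlib import Path

def files_by_potential(targetdir):
    """Map each subdirectory name in targetdir to the sorted file names inside it."""
    grouped = defaultdict(list)
    for path in sorted(Path(targetdir).glob('*/*')):
        if path.is_file():
            grouped[path.parent.name].append(path.name)
    return dict(grouped)

# Build a temporary directory that mimics the layout shown above
with tempfile.TemporaryDirectory() as tmp:
    for potid, fname in [
        ('2016--Zhou-X-W--Al-Cu--LAMMPS--ipr1', 'AlCu.bop.table'),
        ('2016--Zhou-X-W--Al-Cu--LAMMPS--ipr2', 'AlCu.bop.table'),
    ]:
        subdir = Path(tmp, potid)
        subdir.mkdir()
        # Placeholder content only, not real parameters
        (subdir / fname).write_text('placeholder\n')
    layout = files_by_potential(tmp)

print(layout)
```

Calling `files_by_potential('testdir')` after the download cell above would give the same kind of mapping for the real files.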


### 3.3. Generate LAMMPS input commands

The PotentialLAMMPS objects returned by get_potential_LAMMPS can be used to generate LAMMPS commands.

The PotentialLAMMPS.pair_info() method will dynamically generate LAMMPS command lines for the potential based on the atom type symbols used.

In [14]:
lmppot = lmppots[0]
print(lmppot.symbols)

['Al', 'Cu']


In [15]:
print(lmppot.pair_info(['Al', 'Al', 'Cu']))

mass 1 26.98
mass 2 26.98
mass 3 63.55

pair_style bop
pair_coeff * * AlCu.bop.table Al Al Cu
comm_modify cutoff 14.7
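
To make the mapping from atom type symbols to command lines explicit, the sketch below rebuilds the mass and pair lines of the output above by hand. It is not the library's implementation: the function name is hypothetical, the `pair_coeff * *` layout is only one of several used by LAMMPS pair styles, and style-specific extras such as the comm_modify line are omitted.

```python
def sketch_pair_commands(symbols, masses, pair_style, paramfile):
    """Build mass, pair_style, and pair_coeff lines for a list of atom-type symbols."""
    lines = []
    # LAMMPS atom types are numbered from 1 in the order the symbols are given
    for atype, symbol in enumerate(symbols, start=1):
        lines.append(f'mass {atype} {masses[symbol]}')
    lines.append('')
    lines.append(f'pair_style {pair_style}')
    lines.append(f"pair_coeff * * {paramfile} {' '.join(symbols)}")
    return '\n'.join(lines)

# Masses copied from the output above
masses = {'Al': 26.98, 'Cu': 63.55}
commands = sketch_pair_commands(['Al', 'Al', 'Cu'], masses, 'bop', 'AlCu.bop.table')
print(commands)
```

Repeating a symbol, as with 'Al' here, assigns the same element to more than one atom type, which is why the output above contains two identical mass values.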



If you downloaded files as above, you can also point the generated commands at them by setting the PotentialLAMMPS.pot_dir property.

In [16]:
lmppot.pot_dir = Path('testdir', lmppot.id)

print(lmppot.pair_info(['Cu', 'Al']))

mass 1 63.55
mass 2 26.98

pair_style bop
pair_coeff * * testdir\2016--Zhou-X-W--Al-Cu--LAMMPS--ipr1\AlCu.bop.table Cu Al
comm_modify cutoff 14.7



## 4. Custom searches

The Database class also provides access to the underlying tools used to perform the database searches.

- __Database.cdcs__ accesses the cdcs.CDCS API client, allowing custom-built queries and REST calls to be constructed.

- __Database.potentials__ if load_potentials() is called, this is a numpy array containing all the loaded Potential objects.

- __Database.potentials_df__ if load_potentials() is called, this is a pandas.DataFrame constructed from all the loaded Potentials' asdict() representations.

- __Database.potential_LAMMPS__ if load_potential_LAMMPS() is called, this is a numpy array containing all the loaded PotentialLAMMPS objects.

- __Database.potential_LAMMPS_df__ if load_potential_LAMMPS() is called, this is a pandas.DataFrame constructed from all the loaded PotentialLAMMPS' asdict() representations. 

File cleanup

In [18]:
import shutil
shutil.rmtree('testdir')