# Database class

This notebook outlines the Database class, which allows users to interact with the https://potentials.nist.gov database.

Currently, the Database class provides in-depth support for three primary record types:
- __citations__, which contain bibliographic information for the articles associated with interatomic potentials,
- __potentials__, which provide descriptions of the different interatomic models as they appear on the NIST Interatomic Potentials Repository, and
- __lammps_potentials__, which provide the metadata needed to generate LAMMPS input command lines for the LAMMPS-compatible potential implementations.

There is also basic support for generic searching and downloading of any other record types hosted there.

Library imports

In [1]:
try:
    # Check if potentials has been installed
    import potentials
except ImportError:
    # Install if need be and print message
    !pip install potentials
    print('!!!!! RESTART NOTEBOOK KERNEL TO USE POTENTIALS !!!!!')

else:
    # Other imports
    import time
    from pathlib import Path
    from IPython.display import display, HTML

## 1. Database initialization

The potentials.Database() class serves as the primary means of accessing records from the potentials database. Depending on the initialization settings, the class can access records from a CDCS instance and/or from a local directory. 

### Option #1: Default 

The default initialization with no parameters interacts only with https://potentials.nist.gov and loads no records into memory. This suits lightweight applications that perform only a few searches, since every search requires an HTTP request to the database.

In [2]:
s = time.time()

db_remote_no_load = potentials.Database()

print(f'init took {time.time()-s} seconds')

init took 0.0 seconds


### Option #2: Loading records from remote

Specifying load=True or load=[styles] downloads all records of the indicated style(s) from https://potentials.nist.gov and stores them in memory. Loading can take a while (about 54 seconds in the run below), but all subsequent searches are much faster because they operate on the loaded records. This is best for involved database explorations of the most current hosted potentials.

Records can also be (re)loaded after initialization with the load_all(), load_citations(), load_potentials() and load_lammps_potentials() methods.

In [3]:
s = time.time()

db_remote_load = potentials.Database(load=True, verbose=True)

print(f'init took {time.time()-s} seconds')

Loaded 276 remote citations
Loaded 289 remote potentials
Loaded 273 remote LAMMPS potentials
init took 53.98292279243469 seconds


### Option #3: Remote plus local

Giving a localpath allows the database to interact with a local directory as well as https://potentials.nist.gov. This allows for

- the creation of a local copy of https://potentials.nist.gov by downloading all records, and
- user-defined/modified potentials to be integrated with the hosted ones.

NOTE: Searching the local records requires that they first be loaded. During loading, the local records are accessed before the remote ones. If the names of any records from the two locations are the same, the local versions are retained in the loaded set.
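
The precedence rule in the note above can be illustrated with a small standalone sketch. This is not the library's code: `merge_records` and the record names are hypothetical, and records are reduced to plain `(name, content)` pairs. It only shows the "local first, local wins on a name clash" behavior described in the note.

```python
def merge_records(local, remote):
    """Combine (name, content) records, keeping the local copy on name clashes."""
    merged = {}
    # Local records are read first, so they claim their names first
    for name, content in local:
        merged[name] = content
    # Remote records are added only when the name is not already present
    num_new = 0
    for name, content in remote:
        if name not in merged:
            merged[name] = content
            num_new += 1
    return merged, num_new

# Hypothetical stand-in records
local = [('pot-A', 'user-modified A'), ('pot-B', 'local B')]
remote = [('pot-A', 'hosted A'), ('pot-C', 'hosted C')]

merged, num_new = merge_records(local, remote)
print(merged['pot-A'])   # the local version is retained
print(f' - {num_new} new')
```

The `num_new` counter is intended to loosely mirror the " - 0 new" lines printed during loading, on the assumption that those lines count remote records not already present locally.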

In [4]:
# cwd/testdb used here for testing purposes. SET TO REAL PATH FOR REAL OPERATIONS!
localpath = Path('testdb')

In [5]:
s = time.time()

db_remote_local = potentials.Database(localpath=localpath, load=True, verbose=True)

print(f'init took {time.time()-s} seconds')

Loaded 276 local citations
Loaded 276 remote citations
 - 0 new
Loaded 289 local potentials
Loaded 289 remote potentials
 - 0 new
Loaded 273 local LAMMPS potentials
Loaded 273 remote LAMMPS potentials
 - 0 new
init took 60.50048542022705 seconds


A local copy of the database (or a limited set of records) can easily be built using the download_all(), download_citations(), download_potentials() and download_lammps_potentials() methods. These methods will save all downloaded records to localpath.

In [6]:
db_remote_local.download_all(verbose=True)

276 citation records copied to localpath
Downloaded 289 of Potential
Downloaded 273 of potential_LAMMPS
Files for 273 LAMMPS potentials downloaded


### Option #4: Local only

If the records are stored locally, then the remote options can be turned off with remote=False. This avoids the need for any HTTP requests during normal operations. As such, it is much faster and can be used on resources that do not have internet access.

In [7]:
s = time.time()

db_local = potentials.Database(localpath=localpath, load=True, remote=False, verbose=True)

print(f'init took {time.time()-s} seconds')

Loaded 276 local citations
Loaded 289 local potentials
Loaded 273 local LAMMPS potentials
init took 10.940054416656494 seconds


### Option #5: Authorized user

Uploading new content to https://potentials.nist.gov is restricted to those who have accounts and write permissions. Please contact [potentials@nist.gov](mailto:potentials@nist.gov) if you are interested in contributing content to the database.

In [None]:
db_auth = potentials.Database(username='lmh1')

## 2. Get methods

The available records can be searched for matches using the various get methods.

### 2.1 get_citation()

Allows for searching the citations. get_citation() returns a single citation for a given DOI. The function first checks the loaded citations, then checks https://potentials.nist.gov, then checks CrossRef for a match.

In [12]:
cit = db_remote_load.get_citation('10.1016/j.actamat.2003.11.026', verbose=True)

Citation retrieved from loaded citations


In [11]:
cit = db_remote_no_load.get_citation('10.1016/j.actamat.2003.11.026', verbose=True)

Citation retrieved from remote database
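
The lookup order described for get_citation() (loaded citations, then the remote database, then CrossRef) is a fallback chain. The following is a minimal sketch of that pattern only, not the library's implementation: `first_match` is a hypothetical helper, and the three sources are plain dictionaries standing in for the real ones.

```python
def first_match(doi, sources):
    """Return (source_name, record) from the first source that knows the DOI."""
    for source_name, lookup in sources:
        record = lookup(doi)
        if record is not None:
            return source_name, record
    raise ValueError(f'No citation found for DOI {doi}')

# Stand-in data: dictionaries instead of real databases (hypothetical content)
loaded = {}
remote = {'10.1016/j.actamat.2003.11.026': 'Mishin_2004'}
crossref = {'10.1000/example': 'Example_2000'}

# Sources are tried in order, cheapest first
sources = [
    ('loaded citations', loaded.get),
    ('remote database', remote.get),
    ('CrossRef', crossref.get),
]

source_name, record = first_match('10.1016/j.actamat.2003.11.026', sources)
print(f'Citation retrieved from {source_name}')
```

Because nothing is loaded in this sketch, the lookup falls through to the second source, which corresponds to the db_remote_no_load case shown above.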


The returned Citation object allows the content to be retrieved as a BibTeX string, a Python dictionary, or formatted HTML.

In [15]:
print(cit.bibtex)

@article{Mishin_2004,
 abstract = {A new embedded-atom potential has been developed for Ni3Al by fitting to experimental and first-principles data. The potential describes lattice properties of Ni3Al, point defects, planar faults, as well as the γ and γ′ fields on the Ni–Al phase diagram. The potential is applied to calculate the energies of coherent Ni/Ni3Al interphase boundaries with three different crystallographic orientations. Depending on the orientation, the interface energy varies between 12 and 46 mJ/m2. Coherent γ/γ′ interfaces existing at high temperatures are shown to be more diffuse and are likely to have a lower energy than Ni/Ni3Al interfaces.},
 author = {Y. Mishin},
 doi = {10.1016/j.actamat.2003.11.026},
 journal = {Acta Materialia},
 month = {apr},
 number = {6},
 pages = {1451--1467},
 publisher = {Elsevier BV},
 title = {Atomistic modeling of the γ and γ'-phases of the Ni-Al system},
 url = {https://doi.org/10.1016%2Fj.actamat.2003.11.026},
 volume = {52},
 year = 

In [17]:
cit.asdict()

{'ENTRYTYPE': 'article',
 'ID': 'Mishin_2004',
 'abstract': 'A new embedded-atom potential has been developed for Ni3Al by fitting to experimental and first-principles data. The potential describes lattice properties of Ni3Al, point defects, planar faults, as well as the γ and γ′ fields on the Ni–Al phase diagram. The potential is applied to calculate the energies of coherent Ni/Ni3Al interphase boundaries with three different crystallographic orientations. Depending on the orientation, the interface energy varies between 12 and 46 mJ/m2. Coherent γ/γ′ interfaces existing at high temperatures are shown to be more diffuse and are likely to have a lower energy than Ni/Ni3Al interfaces.',
 'author': 'Y. Mishin',
 'doi': '10.1016/j.actamat.2003.11.026',
 'journal': 'Acta Materialia',
 'month': 'apr',
 'number': '6',
 'pages': '1451--1467',
 'publisher': 'Elsevier BV',
 'title': "Atomistic modeling of the γ and γ'-phases of the Ni-Al system",
 'url': 'https://doi.org/10.1016%2Fj.actamat.200

In [19]:
display(HTML(cit.html()))

### 2.2 get_potentials(), get_potential()

Allows for searching the hosted potentials and viewing the descriptions hosted on the NIST Interatomic Potentials Repository. get_potentials() always returns a list of matches, while get_potential() returns a single match if exactly one is found and raises an error otherwise. The search parameters are:

- __id__ (*str or list, optional*) Potential ID(s) to search for.  These are unique identifiers derived from the publication information and the elemental system being modeled.

- __key__ (*str or list, optional*) UUID4 key(s) to search for.  Each entry has a unique random-generated UUID4 key.

- __author__ (*str or list, optional*) Author string(s) to search for.

- __year__ (*int or list, optional*) Publication year(s) to search for.

- __element__ (*str or list, optional*) Element model(s) to search for.

- __localpath__ (*str, optional*) Path to a local directory to check for records first.  If not given, will check localpath value set during object initialization.  If not given or set during initialization, then only the remote database will be loaded.

- __verbose__ (*bool, optional*) If True, info messages will be printed during operations.  Default value is False.
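
The difference between the plural and singular get methods, and the "str or list" form of the search parameters, can be sketched with plain Python. Everything here is hypothetical and simplified: `find_all` and `find_one` are illustrative names, records are small dictionaries, and matching uses exact equality. The sketch assumes that a list of values matches any of its entries and that separate parameters must all match; the real methods clearly match more loosely for fields such as author.

```python
def as_list(value):
    """Wrap a single value in a list so single and list inputs are handled alike."""
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]

def find_all(records, **criteria):
    """Return every record that satisfies all given criteria (None = no limit)."""
    matches = []
    for record in records:
        keep = True
        for field, wanted in criteria.items():
            if wanted is not None and record[field] not in as_list(wanted):
                keep = False
                break
        if keep:
            matches.append(record)
    return matches

def find_one(records, **criteria):
    """Return the single matching record, or raise if there is not exactly one."""
    matches = find_all(records, **criteria)
    if len(matches) != 1:
        raise ValueError(f'Expected exactly one match, found {len(matches)}')
    return matches[0]

# Hypothetical stand-in records
records = [
    {'name': 'a', 'year': 2003, 'element': 'Fe'},
    {'name': 'b', 'year': 2007, 'element': 'Fe'},
    {'name': 'c', 'year': 2004, 'element': 'Fe'},
    {'name': 'd', 'year': 2003, 'element': 'Al'},
]

several = find_all(records, element='Fe', year=[2003, 2007])
single = find_one(records, element='Al')
print([r['name'] for r in several])
print(single['name'])
```

Calling `find_one(records, element='Fe')` in this sketch raises a ValueError because three records match, which is the behavior to expect from the singular get methods when a search is not specific enough.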


In [30]:
# Get all Mendelev Fe potentials from 2003 and 2007
pots = db_local.get_potentials(author='Mendelev', element='Fe', year=[2003, 2007])
for pot in pots:
    print(pot.id)

2003--Mendelev-M-I-Han-S-Srolovitz-D-J-et-al--Fe-2
2003--Mendelev-M-I-Han-S-Srolovitz-D-J-et-al--Fe-5
2007--Mendelev-M-I-Han-S-Son-W-et-al--V-Fe


In [33]:
# Get the Mishin Al potential from 1999
pot = db_local.get_potential(author='Mishin', element='Al', year=1999)
print(pot.id)

1999--Mishin-Y-Farkas-D-Mehl-M-J-Papaconstantopoulos-D-A--Al


The returned Potential object(s) can be viewed as HTML.

In [34]:
display(HTML(pot.html()))

Individual values can also be accessed as attributes. All associated attributes can be seen using asdict().

In [35]:
print(pot.asdict())

{'key': '0c2ddffb-e644-4e5c-8e53-9bf722bb5dee', 'id': '1999--Mishin-Y-Farkas-D-Mehl-M-J-Papaconstantopoulos-D-A--Al', 'recorddate': datetime.date(2018, 8, 15), 'notes': None, 'fictional': False, 'elements': ['Al'], 'othername': None, 'modelname': None, 'citations': [<potentials.Citation.Citation object at 0x000001A017F99108>], 'implementations': [<potentials.Implementation.Implementation object at 0x000001A0183DC708>, <potentials.Implementation.Implementation object at 0x000001A0183DC4C8>]}


In [36]:
print(pot.recorddate)

2018-08-15


### 2.3 get_lammps_potentials(), get_lammps_potential()

Allows for searching the LAMMPS potential records that assist with building LAMMPS input command lines, as well as downloading/locating the LAMMPS potential parameter files. get_lammps_potentials() always returns a list of matches, while get_lammps_potential() returns a single match if exactly one is found and raises an error otherwise.

- __id__ (*str or list, optional*) The id value(s) to limit the search by.
        
- __key__ (*str or list, optional*) The key value(s) to limit the search by.
        
- __potid__ (*str or list, optional*) The potid value(s) to limit the search by.
        
- __potkey__ (*str or list, optional*) The potkey value(s) to limit the search by.
        
- __status__ (*str or list, optional*) The status value(s) to limit the search by.
        
- __pair_style__ (*str or list, optional*) The pair_style value(s) to limit the search by.
        
- __element__ (*str or list, optional*) The included elemental model(s) to limit the search by.
        
- __symbol__ (*str or list, optional*) The included symbol model(s) to limit the search by.
        
- __verbose__ (*bool, optional*) If True, informative print statements will be used.
        
- __get_files__ (*bool, optional*) If True, then the parameter files for the matching potentials will also be retrieved and copied to the working directory. If False (default) and the parameter files are in the library, then the returned objects' pot_dir path will be set appropriately.

In [37]:
# Get all potentials with the bop pair style
lmppots = db_local.get_lammps_potentials(pair_style='bop')
for lmppot in lmppots:
    print(lmppot.id)

2006--Murdick-D-A--Ga-As--LAMMPS--ipr1
2012--Ward-D-K--Cd-Te--LAMMPS--ipr1
2012--Ward-D-K--Cd-Te-Zn--LAMMPS--ipr1
2013--Ward-D-K--Cd-Te-Zn--LAMMPS--ipr1
2014--Zhou-X-W--Cd-Te-Se--LAMMPS--ipr1
2015--Zhou-X-W--C--LAMMPS--ipr1
2015--Zhou-X-W--C-Cu--LAMMPS--ipr1
2015--Zhou-X-W--Cu-H--LAMMPS--ipr1
2016--Zhou-X-W--Al-Cu--LAMMPS--ipr2
2018--Zhou-X-W--Al-Cu-H--LAMMPS--ipr1


In [38]:
# Get the LAMMPS potential associated with the 1999 Mishin Al potential found above
lmppot = db_local.get_lammps_potential(potkey=pot.key)
print(lmppot.id)

1999--Mishin-Y--Al--LAMMPS--ipr1


The returned LAMMPSPotential object(s) can be used to generate LAMMPS command lines for the potential.  If the parameter files for the LAMMPS potential are located in the localpath directory, then the command lines will point to the correct location.

In [39]:
print(lmppot.pair_info())

mass 1 26.982

pair_style eam/alloy
pair_coeff * * testdb\potential_LAMMPS\1999--Mishin-Y--Al--LAMMPS--ipr1\Al99.eam.alloy Al



In [40]:
print(lmppot.pair_info(symbols=['Al', 'Al']))

mass 1 26.982
mass 2 26.982

pair_style eam/alloy
pair_coeff * * testdb\potential_LAMMPS\1999--Mishin-Y--Al--LAMMPS--ipr1\Al99.eam.alloy Al Al
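
To make the structure of the generated text explicit, here is a minimal sketch of how command lines of this shape could be assembled for the eam/alloy pair style. It is not how pair_info() is implemented: `eam_alloy_commands` is a hypothetical function, it covers only eam/alloy, and the parameter file path and mass table are supplied by hand.

```python
def eam_alloy_commands(paramfile, symbols, masses):
    """Build LAMMPS mass, pair_style and pair_coeff lines for eam/alloy."""
    lines = []
    # One mass line per atom type; atom types are numbered from 1
    for i, symbol in enumerate(symbols, start=1):
        lines.append(f'mass {i} {masses[symbol]}')
    lines.append('')
    lines.append('pair_style eam/alloy')
    # eam/alloy maps every atom type to an element name in the parameter file
    lines.append(f"pair_coeff * * {paramfile} {' '.join(symbols)}")
    return '\n'.join(lines) + '\n'

commands = eam_alloy_commands('Al99.eam.alloy', ['Al', 'Al'], {'Al': 26.982})
print(commands)
```

Each entry in symbols becomes one LAMMPS atom type, which is why passing `['Al', 'Al']` yields two mass lines and two element names on the pair_coeff line.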



If the parameter files have not been downloaded, or you want to copy them to the working directory, then you can use the get_files setting.

In [41]:
lmppot = db_local.get_lammps_potential(potkey=pot.key, get_files=True, verbose=True)
print(lmppot.pair_info())

1 matching LAMMPS potentials found from loaded records
Files for 1 LAMMPS potentials copied
mass 1 26.982

pair_style eam/alloy
pair_coeff * * 1999--Mishin-Y--Al--LAMMPS--ipr1\Al99.eam.alloy Al



## 3. Custom searches

The Database class also provides access to the underlying tools used to perform the database searches.

- __Database.citations_df, Database.potentials_df, and Database.lammps_potentials_df__: if records of the given style have been loaded, then these are pandas.DataFrame objects containing the dictionary representations of all the loaded record objects.

- __Database.citations, Database.potentials, and Database.lammps_potentials__: if records of the given style have been loaded, then these are numpy arrays containing all the loaded record objects.  The order of the records in these arrays is the same as in the corresponding DataFrames, so any conditional searches on the DataFrame values can be directly applied to these arrays.

- __Database.cdcs__: the underlying cdcs.CDCS API client. Accessing this directly allows custom-built queries and REST calls to be constructed.
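
Because the record arrays and their DataFrames share the same order, a boolean condition built on a DataFrame column can be used to index the matching array. The sketch below demonstrates the pattern with stand-in data only: the `Record` class and its values are hypothetical, and no Database is involved.

```python
import numpy as np
import pandas as pd

class Record:
    """Hypothetical stand-in for a loaded record object."""
    def __init__(self, id, pair_style):
        self.id = id
        self.pair_style = pair_style

    def asdict(self):
        return {'id': self.id, 'pair_style': self.pair_style}

# Object array of records, analogous to Database.lammps_potentials
records = np.array([
    Record('rec-1', 'eam'),
    Record('rec-2', 'sw'),
    Record('rec-3', 'eam/fs'),
    Record('rec-4', 'eam'),
], dtype=object)

# DataFrame of dictionary representations, built in the same order
records_df = pd.DataFrame([record.asdict() for record in records])

# Build the condition on the DataFrame, then apply it to the array
mask = (records_df['pair_style'] == 'eam').to_numpy()
matches = records[mask]
ids = [record.id for record in matches]
print(ids)
```

With a loaded Database, the same pattern would presumably use `db_local.lammps_potentials_df` to build the mask and `db_local.lammps_potentials` as the array being indexed.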

In [42]:
db_local.lammps_potentials_df

Unnamed: 0,id,key,potid,potkey,units,atom_style,allsymbols,pair_style,status,symbols,elements,masses,charges
0,1985--Foiles-S-M--Ni-Cu--LAMMPS--ipr1,062d2ba7-3903-40ae-a772-daa471d107c6,1985--Foiles-S-M--Ni-Cu,301f04ce-9082-4542-8590-489300cd19e8,metal,atomic,False,eam,active,"[Cu, Ni]","[Cu, Ni]","[63.55, 58.71]","[0.0, 0.0]"
1,1985--Stillinger-F-H--Si--LAMMPS--ipr1,d085648c-b3ef-4be8-824b-7093fd22770a,1985--Stillinger-F-H-Weber-T-A--Si,edc31ad6-2b9a-455c-9b5f-e888a672ecbd,metal,atomic,False,sw,active,[Si],[Si],[28.085],[0.0]
2,1986--Foiles-S-M--Ag--LAMMPS--ipr1,76a265fc-45ff-49d7-8c64-2044f12402f2,1986--Foiles-S-M-Baskes-M-I-Daw-M-S--Ag,672d54f8-9f48-4200-af56-8a7378ebbc4a,metal,atomic,False,eam,active,[Ag],[Ag],[107.87],[0.0]
3,1986--Foiles-S-M--Ag-Au-Cu-Ni-Pd-Pt--LAMMPS--ipr1,c5afa7e8-6b3b-49cd-ad1c-ae3e4329363a,1986--Foiles-S-M-Baskes-M-I-Daw-M-S--Ag-Au-Cu-...,7a1302de-59cf-4efb-900e-cad845b68ee5,metal,atomic,False,eam,active,"[Ag, Au, Cu, Ni, Pd, Pt]","[Ag, Au, Cu, Ni, Pd, Pt]","[107.87, 196.97, 63.55, 58.71, 106.4, 195.09]","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0]"
4,1986--Foiles-S-M--Au--LAMMPS--ipr1,c588810a-b96d-4871-bfe2-cff8a5a7c709,1986--Foiles-S-M-Baskes-M-I-Daw-M-S--Au,ffb66faa-319d-4556-8363-dad3959cd553,metal,atomic,False,eam,active,[Au],[Au],[196.97],[0.0]
...,...,...,...,...,...,...,...,...,...,...,...,...,...
268,2019--Aslam-I--Fe-Mn-Si-C--LAMMPS--ipr1,7e0ed24c-5ad9-4a3a-a5c9-19f6be618bb1,2019--Aslam-I-Baskes-M-I-Dickel-D-E-et-al--Fe-...,2f94611f-e4a3-4cf5-a2f1-fa15673a5ad7,metal,atomic,False,meam,active,"[Fe, Mn, Si, C]","[Fe, Mn, Si, C]","[55.847, 54.938, 28.0855, 12.0111]","[0.0, 0.0, 0.0, 0.0]"
269,2019--Byggmastar-J--Fe-O--LAMMPS--ipr1,957b3de5-bc9d-413f-a117-4a5f929431e8,2019--Byggmastar-J-Nagel-M-Albe-K-et-al--Fe-O,521acab0-00b4-4534-884b-218db3cb92ff,metal,atomic,False,tersoff/zbl,active,"[Fe, O]","[Fe, O]","[55.845, 15.9994]","[0.0, 0.0]"
270,2019--Mendelev-M-I--Cu-Zr--LAMMPS--ipr1,73d03486-1b34-4a24-b1d2-4da253abdda0,2019--Mendelev-M-I--Cu-Zr,2e5ed18c-5fb4-46e4-ae7a-5f50d9429d95,metal,atomic,False,eam/fs,active,"[Cu, Zr]","[Cu, Zr]","[63.546, 91.224]","[0.0, 0.0]"
271,2019--Mendelev-M-I--Fe-Ni-Cr--LAMMPS--ipr1,43aa1b40-dce4-48cf-8e4f-fb0e079180b6,2019--Mendelev-M-I--Fe-Ni-Cr,6c5d342a-7e25-467f-a234-d937301f6bc4,metal,atomic,False,eam/fs,active,"[Fe, Ni, Cr]","[Fe, Ni, Cr]","[55.845, 58.6934, 51.9961]","[0.0, 0.0, 0.0]"


## 4. Other database records

https://potentials.nist.gov contains record types other than the three primary potentials-centric styles. The Database class has some basic methods supporting these other record types as well.

- __download_records()__ allows all records of a given template (style) to be downloaded to the localpath.

- __get_record()__ allows a single record of a given template to be retrieved by name, either from localpath or from the remote database. The retrieved record is returned as a DataModelDict.DataModelDict object.

