# Database class

This Notebook outlines the Database class, which allows users to interact with the https://potentials.nist.gov database.  It contains methods for interacting with any type of record in the database, as well as more in-depth support for:
- __citations__, which contain bibliographic information for the articles associated with interatomic potentials.
- __potentials__, which provide the descriptions of the different interatomic models as they appear on the NIST Interatomic Potentials Repository.
- __LAMMPS potentials__, which provide the metadata needed to generate LAMMPS input command lines for the LAMMPS-compatible potential implementations.
- __openKIM models__, which are integrated with the LAMMPS potentials based on the KIM models available.

Library imports

In [1]:
try:
    # Check if potentials has been installed
    import potentials
except ImportError:
    # Install if need be and print message
    !pip install potentials
    print('!!!!! RESTART NOTEBOOK KERNEL TO USE POTENTIALS !!!!!')
else:
    # Other imports
    import time
    from pathlib import Path
    # display and HTML are provided by IPython.display (IPython.core.display is deprecated for this)
    from IPython.display import display, HTML

## 1. Database initialization

The potentials.Database() class serves as the primary means of accessing records from the potentials database. Depending on the initialization settings, the class can access records from a CDCS instance and/or from a local directory. 

Parameters

- __host__ (*str, optional*) CDCS site to access.  Default value is 'https://potentials.nist.gov/'.
- __username__ (*str, optional*) User name to use to access the host site.  Default value of '' will access the site as an anonymous visitor.
- __password__ (*str, optional*) Password associated with the given username.  Not needed for anonymous access.
- __certification__ (*str, optional*) File path to certification file if needed for host.
- __localpath__ (*str, optional*) Path to the local library directory to use.  If not given, the library_directory value from the settings will be used.
- __verbose__ (*bool, optional*) If True, info messages will be printed during operations.  Default value is False.
- __local__ (*bool, optional*) Indicates if the load operations will check localpath for records. Default value is controlled by settings.
- __remote__ (*bool, optional*) Indicates if the load operations will download records from the remote database.  Default value is controlled by settings.  If a local copy exists, then setting this to False is considerably faster.
- __load__ (*bool, str or list, optional*) If True, citations, potentials and lammps_potentials will all be loaded during initialization. If False (default), none will be loaded.  Alternatively, a str or list can be given to specify which of the three record types to load.
- __status__ (*str, list or None, optional*) Only potential_LAMMPS records with the given status(es) will be loaded.  Allowed values are 'active' (default), 'superseded', and 'retracted'.  If None is given, then all potentials will be loaded.
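
The accepted forms of the load parameter can be pictured with a small stand-alone helper. This is only a sketch of the behavior described in the list above: normalize_load is a hypothetical function, not part of the potentials package, and the three style names are taken from the parameter description.

```python
# Hypothetical helper illustrating the documented forms of `load`.
# The real Database class may handle this differently.
ALL_STYLES = ['citations', 'potentials', 'lammps_potentials']

def normalize_load(load=False):
    """Return the list of record styles that a given `load` value selects."""
    if load is True:
        return list(ALL_STYLES)
    if load is False or load is None:
        return []
    # A single str selects one style; any other iterable selects several
    styles = [load] if isinstance(load, str) else list(load)
    unknown = [style for style in styles if style not in ALL_STYLES]
    if unknown:
        raise ValueError(f'unknown record style(s): {unknown}')
    return styles

print(normalize_load(True))
print(normalize_load('citations'))
print(normalize_load(['citations', 'potentials']))
```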

### Option #1: Default, no pre-loading

By default, no records will be pre-loaded into memory.  This makes the initialization step fast, but makes all subsequent searches much slower.  This option is best suited to simple web-style apps where only a small number of record searches are expected.

__NOTE__: the get_potential(s) and get_lammps_potential(s) methods will only perform remote queries and will *not* explore local records if those types of records are not pre-loaded.

In [2]:
s = time.time()

db = potentials.Database()

print(f'init took {time.time()-s} seconds')

init took 0.003998756408691406 seconds


### Option #2: Pre-loading records

Specifying load=True or load=[styles] builds a list of loaded records for citations, potentials and/or lammps_potentials, first by checking the local directory path and then by querying the remote database.  For records in both locations, the local versions are given precedence to allow for user-defined changes.

The loading may take about a minute, but all subsequent searches will be much faster as they operate on records stored in memory. This is best for users who wish to combine local records with remote records, or who want to perform more involved explorations involving multiple queries to the associated records.

__NOTE__: Records can also be (re)loaded after initialization by calling the load_all(), load_citations(), load_potentials() and load_lammps_potentials() methods.
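
The local-over-remote precedence described above can be illustrated with plain dictionaries. This is a toy sketch only: merge_records and the record ids are hypothetical, and the real class works with record objects rather than strings.

```python
# Toy records keyed by id; the values stand in for record content.
def merge_records(local, remote):
    """Combine local and remote records, keeping the local copy on conflicts."""
    merged = dict(local)
    # Only remote records whose id is not already present locally are added
    new_ids = [rid for rid in remote if rid not in local]
    for rid in new_ids:
        merged[rid] = remote[rid]
    return merged, len(new_ids)

local = {'pot-A': 'local, user-edited', 'pot-B': 'local'}
remote = {'pot-A': 'remote', 'pot-B': 'remote', 'pot-C': 'remote'}

merged, num_new = merge_records(local, remote)
print(f'Loaded {len(local)} local records')
print(f'Loaded {len(remote)} remote records')
print(f' - {num_new} new')
print(merged['pot-A'])
```

In this sketch the user-edited local copy of pot-A survives, and only pot-C is counted as new.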

In [3]:
s = time.time()

db_loaded = potentials.Database(load=True, verbose=True)

print(f'init took {time.time()-s} seconds')

Loaded 334 local citations
Loaded 334 remote citations
 - 0 new
Loaded 295 local potentials
Loaded 295 remote potentials
 - 0 new
Loaded 280 local LAMMPS potentials
Loaded 280 remote LAMMPS potentials
 - 0 new
init took 43.56632971763611 seconds


If you wish, you can download all of the records from the remote database to the local directory using download_all().

In [6]:
# Use the above database to download records to the library directory
#db.download_all(indent=4, format='json', verbose=True, getfiles=False)

333 citations saved to localpath
 - 333 duplicate citations skipped
294 potentials saved to localpath
 - 294 duplicate potentials skipped
279 LAMMPS potentials saved to localpath
 - 279 duplicate potentials skipped


### Option #3: Local-only

Remote queries can be turned off by setting the remote parameter to False, either in the default settings or during database initialization.  The class will then use only the local copy of the database, which suits those who wish to run offline as much as possible.

Once the database has been copied locally, turning off remote makes loading faster (about 8 seconds versus about 44 seconds in the runs shown in this Notebook) and keeps all operations offline.

In [4]:
s = time.time()

db_local = potentials.Database(load=True, remote=False, verbose=True)

print(f'init took {time.time()-s} seconds')

Loaded 334 local citations
Loaded 295 local potentials
Loaded 280 local LAMMPS potentials
init took 7.809008598327637 seconds


## 2. Get methods

The available records can be searched through to find matches using various get methods.

### 2.1 get_citation()

Allows for searching the citations. get_citation() returns a single citation for a given DOI. The function first checks the loaded citations, then checks https://potentials.nist.gov, then checks CrossRef for a match.

In [5]:
cit = db.get_citation('10.1016/j.actamat.2003.11.026', verbose=True)

Citation retrieved from local file 10.1016_j.actamat.2003.11.026.bib


In [6]:
cit = db_loaded.get_citation('10.1016/j.actamat.2003.11.026', verbose=True)

Citation retrieved from loaded citations
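
The lookup order described for get_citation() (loaded citations, then the remote database, then CrossRef) follows a common "first source that answers wins" pattern. A minimal sketch of that pattern, using plain dictionaries as stand-ins for the three sources; first_match and the dictionaries are hypothetical and do not reflect the package's internals.

```python
def first_match(doi, sources):
    """Try each (name, lookup) pair in order and return the first hit."""
    for name, lookup in sources:
        result = lookup(doi)
        if result is not None:
            return name, result
    raise ValueError(f'no citation found for {doi}')

# Stand-in lookups: plain dicts play the role of each source
loaded = {}
remote = {'10.1016/j.actamat.2003.11.026': 'Mishin_2004'}
crossref = {}

sources = [
    ('loaded citations', loaded.get),
    ('remote database', remote.get),
    ('CrossRef', crossref.get),
]

name, citation = first_match('10.1016/j.actamat.2003.11.026', sources)
print(f'Citation retrieved from {name}')
```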


The returned Citation object allows for the content to be retrieved as a bibtex string, a Python dictionary, or as formatted html.

In [7]:
print(cit.bibtex)

@article{Mishin_2004,
 abstract = {A new embedded-atom potential has been developed for Ni3Al by fitting to experimental and first-principles data. The potential describes lattice properties of Ni3Al, point defects, planar faults, as well as the γ and γ′ fields on the Ni–Al phase diagram. The potential is applied to calculate the energies of coherent Ni/Ni3Al interphase boundaries with three different crystallographic orientations. Depending on the orientation, the interface energy varies between 12 and 46 mJ/m2. Coherent γ/γ′ interfaces existing at high temperatures are shown to be more diffuse and are likely to have a lower energy than Ni/Ni3Al interfaces.},
 author = {Y. Mishin},
 doi = {10.1016/j.actamat.2003.11.026},
 journal = {Acta Materialia},
 month = {apr},
 number = {6},
 pages = {1451--1467},
 publisher = {Elsevier BV},
 title = {Atomistic modeling of the γ and γ'-phases of the Ni-Al system},
 url = {https://doi.org/10.1016%2Fj.actamat.2003.11.026},
 volume = {52},
 year = 

In [8]:
cit.asdict()

{'ENTRYTYPE': 'article',
 'ID': 'Mishin_2004',
 'abstract': 'A new embedded-atom potential has been developed for Ni3Al by fitting to experimental and first-principles data. The potential describes lattice properties of Ni3Al, point defects, planar faults, as well as the γ and γ′ fields on the Ni–Al phase diagram. The potential is applied to calculate the energies of coherent Ni/Ni3Al interphase boundaries with three different crystallographic orientations. Depending on the orientation, the interface energy varies between 12 and 46 mJ/m2. Coherent γ/γ′ interfaces existing at high temperatures are shown to be more diffuse and are likely to have a lower energy than Ni/Ni3Al interfaces.',
 'author': 'Y. Mishin',
 'doi': '10.1016/j.actamat.2003.11.026',
 'journal': 'Acta Materialia',
 'month': 'apr',
 'number': '6',
 'pages': '1451--1467',
 'publisher': 'Elsevier BV',
 'title': "Atomistic modeling of the γ and γ'-phases of the Ni-Al system",
 'url': 'https://doi.org/10.1016%2Fj.actamat.200

In [9]:
display(HTML(cit.html()))

### 2.2 get_potentials(), get_potential()

Allows for searching the hosted potentials and viewing the descriptions hosted on the NIST Interatomic Potentials Repository.  get_potentials() always returns a list of matches, while get_potential() returns a single match if exactly one is found and throws an error otherwise.

- __id__ (*str or list, optional*) Potential ID(s) to search for.  These are unique identifiers derived from the publication information and the elemental system being modeled.

- __key__ (*str or list, optional*) UUID4 key(s) to search for.  Each entry has a unique random-generated UUID4 key.

- __author__ (*str or list, optional*) Author string(s) to search for.

- __year__ (*int or list, optional*) Publication year(s) to search for.

- __elements__ (*str or list, optional*) Element(s) to search for.

- __localpath__ (*str, optional*) Path to a local directory to check for records first.  If not given, will check localpath value set during object initialization.  If not given or set during initialization, then only the remote database will be loaded.

- __verbose__ (*bool, optional*) If True, info messages will be printed during operations.  Default value is False.
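
The difference between the plural and singular methods, and the way a list value matches any of its entries, can be sketched with toy records. The helper names (matches, get_many, get_one) and the flat author/year fields are hypothetical simplifications; only the ids are taken from this Notebook.

```python
records = [
    {'id': '2003--Mendelev-M-I-Han-S-Srolovitz-D-J-et-al--Fe-2', 'author': 'Mendelev', 'year': 2003},
    {'id': '2003--Mendelev-M-I-Han-S-Srolovitz-D-J-et-al--Fe-5', 'author': 'Mendelev', 'year': 2003},
    {'id': '2007--Mendelev-M-I-Han-S-Son-W-et-al--V-Fe', 'author': 'Mendelev', 'year': 2007},
    {'id': '1999--Mishin-Y-Farkas-D-Mehl-M-J-Papaconstantopoulos-D-A--Al', 'author': 'Mishin', 'year': 1999},
]

def matches(record, **criteria):
    """True if every given field equals the wanted value, or one of a list of them."""
    for field, wanted in criteria.items():
        allowed = wanted if isinstance(wanted, list) else [wanted]
        if record[field] not in allowed:
            return False
    return True

def get_many(**criteria):
    """Plural form: always returns a list, possibly empty."""
    return [record for record in records if matches(record, **criteria)]

def get_one(**criteria):
    """Singular form: returns the only match, raises otherwise."""
    found = get_many(**criteria)
    if len(found) != 1:
        raise ValueError(f'expected exactly one match, found {len(found)}')
    return found[0]

print(len(get_many(author='Mendelev', year=[2003, 2007])))
print(get_one(author='Mishin', year=1999)['id'])
```

Calling get_one(author='Mendelev') in this sketch raises an error because three records match.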


In [10]:
# Get all Mendelev Fe potentials from 2003 and 2007
pots = db_loaded.get_potentials(author='Mendelev', elements='Fe', year=[2003, 2007])
for pot in pots:
    print(pot.id)

2003--Mendelev-M-I-Han-S-Srolovitz-D-J-et-al--Fe-2
2003--Mendelev-M-I-Han-S-Srolovitz-D-J-et-al--Fe-5
2007--Mendelev-M-I-Han-S-Son-W-et-al--V-Fe


In [11]:
# Get the Mishin Al potential from 1999
pot = db_loaded.get_potential(author='Mishin', elements='Al', year=1999)
print(pot.id)

1999--Mishin-Y-Farkas-D-Mehl-M-J-Papaconstantopoulos-D-A--Al


The returned Potential object(s) can be viewed as HTML.

In [12]:
display(HTML(pot.html()))

Individual values can also be accessed as attributes. All associated attributes can be seen using asdict().

In [13]:
print(pot.asdict())

{'key': '0c2ddffb-e644-4e5c-8e53-9bf722bb5dee', 'id': '1999--Mishin-Y-Farkas-D-Mehl-M-J-Papaconstantopoulos-D-A--Al', 'recorddate': datetime.date(2018, 8, 15), 'notes': None, 'fictional': False, 'elements': ['Al'], 'othername': None, 'modelname': None, 'citations': [<potentials.Citation.Citation object at 0x00000217B5B67550>], 'implementations': [<potentials.Implementation.Implementation object at 0x00000217B5A36C18>, <potentials.Implementation.Implementation object at 0x00000217B5620C18>]}


In [14]:
print(pot.recorddate)

2018-08-15


### 2.3 get_lammps_potentials(), get_lammps_potential()

Allows for searching the LAMMPS potential records that assist with building LAMMPS input command lines, as well as the downloading/locating of LAMMPS potential parameter files.  get_lammps_potentials() always returns a list of matches, while get_lammps_potential() returns a single match if exactly one is found and throws an error otherwise.

- __id__ (*str or list, optional*) The id value(s) to limit the search by.
        
- __key__ (*str or list, optional*) The key value(s) to limit the search by.
        
- __potid__ (*str or list, optional*) The potid value(s) to limit the search by.
        
- __potkey__ (*str or list, optional*) The potkey value(s) to limit the search by.
        
- __status__ (*str or list, optional*) The status value(s) to limit the search by.
        
- __pair_style__ (*str or list, optional*) The pair_style value(s) to limit the search by.
        
- __elements__ (*str or list, optional*) The included element(s) to limit the search by.
        
- __symbols__ (*str or list, optional*) The included symbol model(s) to limit the search by.
        
- __verbose__ (*bool, optional*) If True, informative print statements will be used.
        
- __getfiles__ (*bool, optional*) If True, then the parameter files for the matching potentials will also be retrieved and copied to the working directory. If False (default) and the parameter files are in the library, then the returned objects' pot_dir path will be set appropriately.

In [15]:
# Get all potentials with the bop pair style
lmppots = db_loaded.get_lammps_potentials(pair_style='bop')
for lmppot in lmppots:
    print(lmppot.id)

2006--Murdick-D-A--Ga-As--LAMMPS--ipr1
2012--Ward-D-K--Cd-Te--LAMMPS--ipr1
2012--Ward-D-K--Cd-Te-Zn--LAMMPS--ipr1
2013--Ward-D-K--Cd-Te-Zn--LAMMPS--ipr1
2014--Zhou-X-W--Cd-Te-Se--LAMMPS--ipr1
2015--Zhou-X-W--C--LAMMPS--ipr1
2015--Zhou-X-W--C-Cu--LAMMPS--ipr1
2015--Zhou-X-W--Cu-H--LAMMPS--ipr1
2016--Zhou-X-W--Al-Cu--LAMMPS--ipr2
2018--Zhou-X-W--Al-Cu-H--LAMMPS--ipr1


In [16]:
# Get the LAMMPS potential associated with the 1999 Mishin Al potential found above
lmppot = db_loaded.get_lammps_potential(potkey=pot.key)
print(lmppot.id)

1999--Mishin-Y--Al--LAMMPS--ipr1


The returned LAMMPSPotential object(s) can be used to generate LAMMPS command lines for the potential.  If the parameter files for the LAMMPS potential are located in the localpath directory, then the command lines will point to the correct location.

In [17]:
print(lmppot.pair_info())

mass 1 26.982

pair_style eam/alloy
pair_coeff * * C:\Users\lmh1\Documents\library\potential_LAMMPS\1999--Mishin-Y--Al--LAMMPS--ipr1\Al99.eam.alloy Al



In [18]:
print(lmppot.pair_info(symbols=['Al', 'Al']))

mass 1 26.982
mass 2 26.982

pair_style eam/alloy
pair_coeff * * C:\Users\lmh1\Documents\library\potential_LAMMPS\1999--Mishin-Y--Al--LAMMPS--ipr1\Al99.eam.alloy Al Al
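
The two outputs above show how the symbols list maps onto LAMMPS atom types: each entry gets its own mass line, numbered from 1, and is repeated at the end of the pair_coeff line. A rough sketch that reproduces this layout for the eam/alloy case shown here; pair_info_sketch is a hypothetical stand-in, not the package's implementation, and other pair styles use different command formats.

```python
def pair_info_sketch(pair_style, param_file, model_masses, symbols):
    """Build mass/pair_style/pair_coeff lines in the layout shown above."""
    lines = []
    # One mass line per atom type, numbered from 1 in the order given
    for atom_type, symbol in enumerate(symbols, start=1):
        lines.append(f'mass {atom_type} {model_masses[symbol]}')
    lines.append('')
    lines.append(f'pair_style {pair_style}')
    # The symbols are listed again so each atom type is tied to a model
    lines.append(f'pair_coeff * * {param_file} ' + ' '.join(symbols))
    return '\n'.join(lines) + '\n'

info = pair_info_sketch('eam/alloy', 'Al99.eam.alloy', {'Al': 26.982}, ['Al', 'Al'])
print(info)
```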



If the parameter files have not been downloaded, or you want to copy them to the working directory, then you can use the getfiles parameter.

In [19]:
lmppot = db_loaded.get_lammps_potential(potkey=pot.key, getfiles=True, verbose=True)
print(lmppot.pair_info())

1 matching LAMMPS potentials found from loaded records
Files for 1 LAMMPS potentials copied
Files for 0 LAMMPS potentials downloaded
mass 1 26.982

pair_style eam/alloy
pair_coeff * * 1999--Mishin-Y--Al--LAMMPS--ipr1\Al99.eam.alloy Al



## 3. Custom searches

The Database class also provides access to the underlying tools used to perform the database searches.

- __Database.citations_df, Database.potentials_df, and Database.lammps_potentials_df__: if records of the given style have been loaded, then these are pandas.DataFrames containing the dictionary representations of all the loaded record objects.

- __Database.citations, Database.potentials, and Database.lammps_potentials__: if records of the given style have been loaded, then these are numpy arrays containing all the loaded record objects.  The order of the records in these arrays is the same as in the corresponding DataFrames, so any conditional searches on the DataFrame values can be directly applied to these arrays.

- __Database.cdcs__: the underlying cdcs.CDCS API client. Accessing this directly allows for custom-built queries and REST calls to be constructed.
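
Because the DataFrames and the record arrays share the same order, a condition evaluated on the tabular data yields a boolean mask that selects the matching record objects. The sketch below mimics that idea with the standard library only: a list of dicts stands in for the DataFrame rows, a list of strings stands in for the record objects, and itertools.compress applies the mask. With the real attributes, the analogous step would presumably be indexing the numpy array with a boolean condition built from the DataFrame.

```python
from itertools import compress

# Stand-in for a few DataFrame rows (values copied from the table in this Notebook)
rows = [
    {'id': '1985--Foiles-S-M--Ni-Cu--LAMMPS--ipr1', 'pair_style': 'eam', 'elements': ['Cu', 'Ni']},
    {'id': '1985--Stillinger-F-H--Si--LAMMPS--ipr1', 'pair_style': 'sw', 'elements': ['Si']},
    {'id': '1986--Foiles-S-M--Ni--LAMMPS--ipr1', 'pair_style': 'eam', 'elements': ['Ni']},
    {'id': '1987--Ackland-G-J--Ag--LAMMPS--ipr1', 'pair_style': 'eam/fs', 'elements': ['Ag']},
]

# Stand-in for the record objects, kept in the same order as the rows
records = [f'<record {row["id"]}>' for row in rows]

# Condition evaluated on the tabular data: eam pair style that includes Ni
mask = [row['pair_style'] == 'eam' and 'Ni' in row['elements'] for row in rows]

# The same mask selects the corresponding record objects
selected = list(compress(records, mask))
for record in selected:
    print(record)
```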

In [20]:
db_loaded.lammps_potentials_df

,allsymbols,atom_style,charges,elements,id,key,masses,pair_style,potid,potkey,status,symbols,units
0,False,atomic,"[0.0, 0.0]","[Cu, Ni]",1985--Foiles-S-M--Ni-Cu--LAMMPS--ipr1,062d2ba7-3903-40ae-a772-daa471d107c6,"[63.55, 58.71]",eam,1985--Foiles-S-M--Ni-Cu,301f04ce-9082-4542-8590-489300cd19e8,active,"[Cu, Ni]",metal
1,False,atomic,[0.0],[Si],1985--Stillinger-F-H--Si--LAMMPS--ipr1,d085648c-b3ef-4be8-824b-7093fd22770a,[28.085],sw,1985--Stillinger-F-H-Weber-T-A--Si,edc31ad6-2b9a-455c-9b5f-e888a672ecbd,active,[Si],metal
2,False,atomic,[0.0],[Ag],1986--Foiles-S-M--Ag--LAMMPS--ipr1,76a265fc-45ff-49d7-8c64-2044f12402f2,[107.87],eam,1986--Foiles-S-M-Baskes-M-I-Daw-M-S--Ag,672d54f8-9f48-4200-af56-8a7378ebbc4a,active,[Ag],metal
3,False,atomic,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0]","[Ag, Au, Cu, Ni, Pd, Pt]",1986--Foiles-S-M--Ag-Au-Cu-Ni-Pd-Pt--LAMMPS--ipr1,c5afa7e8-6b3b-49cd-ad1c-ae3e4329363a,"[107.87, 196.97, 63.55, 58.71, 106.4, 195.09]",eam,1986--Foiles-S-M-Baskes-M-I-Daw-M-S--Ag-Au-Cu-...,7a1302de-59cf-4efb-900e-cad845b68ee5,active,"[Ag, Au, Cu, Ni, Pd, Pt]",metal
4,False,atomic,[0.0],[Au],1986--Foiles-S-M--Au--LAMMPS--ipr1,c588810a-b96d-4871-bfe2-cff8a5a7c709,[196.97],eam,1986--Foiles-S-M-Baskes-M-I-Daw-M-S--Au,ffb66faa-319d-4556-8363-dad3959cd553,active,[Au],metal
5,False,atomic,[0.0],[Cu],1986--Foiles-S-M--Cu--LAMMPS--ipr1,380d3b47-51e9-4590-8a59-8313dd8fb018,[63.55],eam,1986--Foiles-S-M-Baskes-M-I-Daw-M-S--Cu,7991d330-58cd-43ac-bba9-ff6a58dcf617,active,[Cu],metal
6,False,atomic,[0.0],[Ni],1986--Foiles-S-M--Ni--LAMMPS--ipr1,8e9ae12d-5034-418a-a168-fb5499ecffcd,[58.71],eam,1986--Foiles-S-M-Baskes-M-I-Daw-M-S--Ni,15a085b9-d9d6-404f-9c2f-3ed40e8ff7a4,active,[Ni],metal
7,False,atomic,[0.0],[Pd],1986--Foiles-S-M--Pd--LAMMPS--ipr1,fa189ff3-4a27-4c36-b8b7-3821443a4edd,[106.4],eam,1986--Foiles-S-M-Baskes-M-I-Daw-M-S--Pd,16dde7ea-c8cf-4a23-95b3-494a2b252e9b,active,[Pd],metal
8,False,atomic,[0.0],[Pt],1986--Foiles-S-M--Pt--LAMMPS--ipr1,0a74d2a8-ec49-459c-903a-44bd3e50969a,[195.09],eam,1986--Foiles-S-M-Baskes-M-I-Daw-M-S--Pt,87657840-e5e6-4378-94ea-381a90608142,active,[Pt],metal
9,False,atomic,[0.0],[Ag],1987--Ackland-G-J--Ag--LAMMPS--ipr1,59a266da-922c-429c-8633-8d3a8de4cd70,[107.8682],eam/fs,1987--Ackland-G-J-Tichy-G-Vitek-V-Finnis-M-W--Ag,dc4149ce-3592-4131-8683-ecf654d5a519,active,[Ag],metal


## 4. Other database records

https://potentials.nist.gov contains record types beyond the three primary potentials-centric styles. The Database class has some basic methods supporting these other record types as well.

- __download_records()__ allows all records of a given template (style) to be downloaded to the localpath.

- __get_record()__ allows a single record of a given template to be retrieved by name, either from localpath or from the remote database.  The retrieved record is returned as a DataModelDict.DataModelDict object.

