# Database class 

This Notebook outlines the core behavior and design of the potentials.Database class.

Library imports

In [1]:
# Standard Python libraries
from pathlib import Path
import time

# https://github.com/lmhale99/potentials
import potentials
    
print('Notebook tested for potentials version', potentials.__version__)

Notebook tested for potentials version 0.3.6


## 1. Database initialization

Initializing a potentials.Database object defines the interaction settings for two database locations:

- A local location, which by default is set to a directory called library within the current settings directory.
- A remote location, which by default is the CDCS database at https://potentials.nist.gov/.

The default values for most of the parameters below can be changed/set using the package's settings.  See the [2. Change default settings Notebook](2. Change default settings Notebook.ipynb) for more details.

Parameters for local.  

- __local__ (*bool, optional*) Indicates if the load operations will check for local records. Default value is controlled by settings.  If False, then the local interactions will not be set.
- __localpath__ (*str, optional*) The path to a directory where a local-style directory is to be found. This is an alias for local_host, with a local_style of "local" and is retained for backwards compatibility.
- __local_name__ (*str, optional*) The name assigned to a pre-defined database to use for the local interactions.  Cannot be given with local_style, local_host or local_terms.
- __local_style__ (*str, optional*) The database style to use for the local interactions.
- __local_host__ (*str, optional*) The URL/file path where the local database is hosted.
- __local_terms__ (*dict, optional*) Any other keyword parameters defining necessary access/settings information for using the local database.  Allowed keywords are database style-specific.

Parameters for remote.

- __remote__ (*bool, optional*) Indicates if the load operations will check for remote records. Default value is controlled by settings.  If False, then the remote interactions will not be set.
- __remote_name__ (*str, optional*) The name assigned to a pre-defined database to use for the remote interactions.  Cannot be given with remote_style, remote_host or remote_terms.
- __remote_style__ (*str, optional*) The database style to use for the remote interactions.
- __remote_host__ (*str, optional*) The URL/file path where the remote database is hosted.
- __remote_terms__ (*dict, optional*) Any other keyword parameters defining necessary access/settings information for using the remote database.  Allowed keywords are database style-specific.
            
Parameters for identifying OpenKIM models.  See the [5.3. openKIM models Notebook](5.3. openKIM models Notebook.ipynb) for more details.

- __kim_models__ (*str or list, optional*) Allows for the list of installed_kim_models to be explicitly given. Cannot be given with the other kim parameters.
- __kim_api_directory__ (*path-like object, optional*) The directory containing the kim api to use to build the list. Cannot be given with the other kim parameters.
- __kim_models_file__ (*path-like object, optional*) The path to a whitespace-delimited file listing full kim ids. Cannot be given with the other kim parameters.


In [2]:
potdb = potentials.Database()
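
The parameter lists above state that local_name cannot be combined with local_style, local_host or local_terms (and likewise for the remote parameters). A minimal sketch of that rule follows, assuming a hypothetical helper check_location_args that is not part of the potentials package. It only illustrates the documented constraint and does not reproduce how the package validates its arguments.

```python
def check_location_args(name=None, style=None, host=None, terms=None):
    """Hypothetical helper: reject name combined with style, host or terms."""
    explicit = {"style": style, "host": host, "terms": terms}
    given = [key for key, value in explicit.items() if value is not None]
    if name is not None and given:
        raise ValueError(f"name cannot be given with: {', '.join(given)}")
    if name is not None:
        # A pre-defined database is identified by its name alone
        return {"name": name}
    # Otherwise pass along only the explicit settings that were supplied
    return {key: explicit[key] for key in given}


print(check_location_args(name="master"))
print(check_location_args(style="local", host="C:/Users/lmh1/Documents/library"))
try:
    check_location_args(name="master", style="mongo")
except ValueError as err:
    print("rejected:", err)
```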

## 2. Checking and changing the database interactions

Interactions with the local and remote locations are handled by the local_database and remote_database attributes of the potentials.Database object.  These depend on the settings given during the potentials.Database initialization, or on the default values saved to the package settings if none are given.

In [3]:
print(potdb.local_database)
print(potdb.remote_database)

database style local at C:\Users\lmh1\Documents\library
database style cdcs at https://potentials.nist.gov/


Note that if local or remote is False during initialization, then the corresponding database interaction object will be None.  
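
Because a disabled location leaves its attribute set to None, a quick check of the two attributes shows which locations a given object will use. The sketch below assumes a hypothetical helper active_locations and uses a SimpleNamespace stand-in instead of a real potentials.Database object, so it can run without the package installed.

```python
from types import SimpleNamespace


def active_locations(potdb):
    """Hypothetical helper: list the locations whose interaction object is set."""
    active = []
    if potdb.local_database is not None:
        active.append("local")
    if potdb.remote_database is not None:
        active.append("remote")
    return active


# Stand-in for a Database object created with remote=False
offline_db = SimpleNamespace(local_database=object(), remote_database=None)
print(active_locations(offline_db))
```

The same function should work on an actual potentials.Database object, since it reads only the local_database and remote_database attributes described above.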

If you wish to change the settings for local or remote for an already existing potentials.Database object, you can use the set_local_database() and set_remote_database() methods.  Each one of these accepts the associated parameters listed above for the given interaction (except for the boolean local and remote terms).  

In [4]:
# Change local to previously saved database settings named "master"
potdb.set_local_database(name='master')
print(potdb.local_database)

# Change it back to the default
potdb.set_local_database()
print(potdb.local_database)

database style mongo at localhost:27017.iprPy
database style local at C:\Users\lmh1\Documents\library


## 3. Download all

All records in the NIST Interatomic Potentials Repository that the potentials package interacts with can be downloaded using the download_all() method.  This provides a simple means of copying all remote entries to the local location, allowing for future offline usage of the repository.

- __status__ (*str, list or None, optional*) Only potential_LAMMPS records with the given status(es) will be downloaded.  Allowed values are 'active', 'superseded', and 'retracted'. If None (default) is given, then all potentials will be downloaded.
- __downloadfiles__ (*bool, optional*) If True, the parameter files associated with the potential_LAMMPS record will also be downloaded.
- __overwrite__ (*bool, optional*) Flag indicating if any existing local records with names matching remote records are updated (True) or left unchanged (False).  Default value is False.
- __verbose__ (*bool, optional*) If True, info messages will be printed during operations.  Default value is False.

As an alternative, all records in the NIST database can be downloaded at once from the https://github.com/lmhale99/potentials-library GitHub repository.  This provides version control handling of the records and also includes the structure-based records used by atomman.
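
The download_all() parameters listed in this section can be combined in a single call. The sketch below wraps such a call in a hypothetical timed_download helper and exercises it with a FakeDatabase stub, both of which are illustrative names and not part of the potentials package. With a real potentials.Database object in place of the stub, the same keyword arguments would be forwarded to download_all().

```python
import time


def timed_download(potdb, **kwargs):
    """Hypothetical helper: call potdb.download_all(**kwargs), return elapsed minutes."""
    start = time.perf_counter()
    potdb.download_all(**kwargs)
    return (time.perf_counter() - start) / 60


class FakeDatabase:
    """Stub standing in for potentials.Database; records the arguments it receives."""

    def __init__(self):
        self.calls = []

    def download_all(self, **kwargs):
        self.calls.append(kwargs)


fake = FakeDatabase()
minutes = timed_download(
    fake, status="active", downloadfiles=True, overwrite=False, verbose=True
)
print(f"Took {minutes:.4f} minutes")
```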


In [5]:
# Set a different localpath to test that records are downloaded
localpath = Path('C:/Users/lmh1/Documents/testlibrary')

potdb = potentials.Database(localpath=localpath)

In [7]:
s = time.time()
potdb.download_all()
e = time.time()
print(f'Took {(e-s)/60} minutes')

 44%|███████████████████████████████████▏                                            | 170/387 [03:51<05:04,  1.40s/it]

KeyboardInterrupt: 

Note that unless overwrite=True is specified, any records that already exist locally are skipped, which makes subsequent calls faster.

In [None]:
s = time.time()
potdb.download_all()
e = time.time()
print(f'Took {(e-s)/60} minutes')
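
The effect of the overwrite flag can be illustrated without touching the real repository. The following is a minimal sketch, assuming records are matched by name only and stored in plain dictionaries. The sync_records function and the record names are hypothetical and do not reflect the package's internal implementation.

```python
def sync_records(remote, local, overwrite=False):
    """Copy remote records into local by name; return (added, updated, skipped)."""
    added = updated = skipped = 0
    for name, content in remote.items():
        if name not in local:
            local[name] = content
            added += 1
        elif overwrite:
            # Existing local record is replaced by the remote version
            local[name] = content
            updated += 1
        else:
            # Existing local record is left unchanged
            skipped += 1
    return added, updated, skipped


remote = {"rec-A": "v2", "rec-B": "v1"}
local = {"rec-A": "v1"}
print(sync_records(remote, local))                  # first pass
print(sync_records(remote, local))                  # second pass, nothing new
print(sync_records(remote, local, overwrite=True))  # force an update
```

In this sketch the second pass does no copying at all, which mirrors why a repeated download_all() call without overwrite=True finishes sooner.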

## 4. General record interaction methods

The Database class also defines common generic interactions that work with all of the supported record styles.

__NOTE__: Most users are strongly encouraged to use the record-specific variations of these methods rather than the generic ones.  The record-specific variations add functionality and more descriptive parameter information.


### 4.1. get_records

Retrieves all matching records from the local and/or remote locations.  If records with the same record name are retrieved from both locations, then the local versions of those records are given.

- __style__ (*str, optional*) The record style to search. If not given, a prompt will ask for it.
- __name__ (*str or list, optional*) The name(s) of records to limit the search by.
- __local__ (*bool, optional*) Indicates if the local location is to be searched.  Default value matches the value set when the database was initialized.
- __remote__ (*bool, optional*) Indicates if the remote location is to be searched.  Default value matches the value set when the database was initialized.
- __verbose__ (*bool, optional*) If True, info messages will be printed during operations.  Default value is False.
- __refresh_cache__ (*bool, optional*) If the local database is of style "local", indicates if the metadata cache file is to be refreshed.  If False, metadata for new records will be added but the old record metadata fields will not be updated.  If True, then the metadata for all records will be regenerated, which is needed to update the metadata for modified records.
- __return_df__ (*bool, optional*) If True, then the corresponding pandas.DataFrame of metadata will also be returned.
- __\*\*kwargs__ (*any, optional*) Any extra keyword arguments supported by the record style.
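
The rule that local versions take precedence over remote ones with the same name can be sketched as a dictionary merge. The example below is an illustration under stated assumptions: records are represented as plain dictionaries with a "name" key, the result is sorted by name for readability, and merge_by_name is a hypothetical function that is not part of the potentials package.

```python
def merge_by_name(local_records, remote_records):
    """Combine two record lists, keeping the local copy when names clash."""
    merged = {rec["name"]: rec for rec in remote_records}
    # Local entries are applied last, so they replace same-named remote entries
    merged.update({rec["name"]: rec for rec in local_records})
    return [merged[name] for name in sorted(merged)]


local_records = [{"name": "rec-A", "source": "local"}]
remote_records = [
    {"name": "rec-A", "source": "remote"},
    {"name": "rec-B", "source": "remote"},
]
for rec in merge_by_name(local_records, remote_records):
    print(rec["name"], rec["source"])
```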

### 4.2. get_record

Retrieves a single matching record from the local and/or remote locations. If local is True and the record is found there, then the local copy of the record is returned without searching the remote.

- __style__ (*str, optional*) The record style to search. If not given, a prompt will ask for it.
- __name__ (*str or list, optional*) The name(s) of records to limit the search by.
- __local__ (*bool, optional*) Indicates if the local location is to be searched.  Default value matches the value set when the database was initialized.
- __remote__ (*bool, optional*) Indicates if the remote location is to be searched.  Default value matches the value set when the database was initialized.
- __prompt__ (*bool, optional*) If prompt=True (default) then a screen input will ask for a selection if multiple matching potentials are found.  If prompt=False, then an error will be raised if multiple matches are found.
- __promptfxn__ (*function, optional*) A function that generates the prompt selection list.  If not given, the prompt will be a list of "id" values. 
- __verbose__ (*bool, optional*) If True, info messages will be printed during operations.  Default value is False.
- __refresh_cache__ (*bool, optional*) If the local database is of style "local", indicates if the metadata cache file is to be refreshed.  If False, metadata for new records will be added but the old record metadata fields will not be updated.  If True, then the metadata for all records will be regenerated, which is needed to update the metadata for modified records.
- __\*\*kwargs__ (*any, optional*) Any extra keyword arguments supported by the record style.
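
The local-first lookup described for get_record can be sketched as a search that only falls through to the remote location on a local miss. The example below covers the non-interactive case corresponding to prompt=False, where multiple matches are an error. The find_one function, the dictionary-based records, and the error raised when nothing matches are all assumptions made for illustration.

```python
def find_one(name, local=None, remote=None):
    """Return (record, source), checking local first and remote only on a miss."""
    for source, records in (("local", local), ("remote", remote)):
        if records is None:
            # This location is disabled, so skip it
            continue
        matches = [rec for rec in records if rec["name"] == name]
        if len(matches) == 1:
            return matches[0], source
        if len(matches) > 1:
            raise ValueError(f"multiple matching records found in {source}")
    raise ValueError("no matching record found")


local = [{"name": "rec-A", "version": 1}]
remote = [{"name": "rec-A", "version": 2}, {"name": "rec-B", "version": 1}]

print(find_one("rec-A", local=local, remote=remote))
print(find_one("rec-B", local=local, remote=remote))
```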

### 4.3. remote_query

Allows for custom Mongo-style or keyword search queries to be performed on records from the remote database.

- __style__ (*str, optional*) The record style to search. If not given, a prompt will ask for it.
- __name__ (*str or list, optional*) The name(s) of records to limit the search by.
- __return_df__ (*bool, optional*) If True, then the corresponding pandas.DataFrame of metadata will also be returned.
- __query__ (*dict, optional*) A custom-built CDCS Mongo-style query to use for the record search. Cannot be given with keyword.
- __keyword__ (*str, optional*) Allows for a search of records whose contents contain a keyword. Cannot be given with query.

### 4.4. download_records

Retrieves all matching records from the remote location and saves them to the local location.

- __style__ (*str, optional*) The record style to search. If not given, a prompt will ask for it.
- __name__ (*str or list, optional*) The name(s) of records to limit the search by.
- __overwrite__ (*bool, optional*) Flag indicating if any existing local records with names matching remote records are updated (True) or left unchanged (False).  Default value is False.
- __query__ (*dict, optional*) A custom-built CDCS-style query to use for the record search. Alternative to passing in the record-specific metadata kwargs. Note that name can be given with query.
- __keyword__ (*str, optional*) Allows for a search of records whose contents contain a keyword. Alternative to giving query or kwargs.
- __verbose__ (*bool, optional*) If True, info messages will be printed during operations.  Default value is False.
- __return_records__ (*bool, optional*) If True, the retrieved record objects are also returned.  Default value is False.
- __\*\*kwargs__ (*any, optional*) Any extra keyword arguments supported by the record style.

### 4.5. save_record

Saves a record to the local database.
    
- __record__ (*Record, optional*) The record to save.  If not given, then style and model are required.
- __style__ (*str, optional*) The record style to save.  Required if record is not given.
- __model__ (*str, DataModelDict, or file-like object, optional*) The contents of the record to save.  Required if record is not given.
- __name__ (*str, optional*) The name to assign to the record.  Required if record is not given and model is not a file name.
- __overwrite__ (*bool, optional*) Indicates what to do when a matching record is found in the local location.  If False (default), then the record is not updated.  If True, then the record is updated.
- __verbose__ (*bool, optional*) If True, info messages will be printed during operations.  Default value is False.

### 4.6. upload_record

Uploads a record to the remote database.  Requires an account for the remote location with write permissions.

- __record__ (*Record, optional*) The record to upload.  If not given, then style and model are required.
- __style__ (*str, optional*) The record style to upload.  Required if record is not given.
- __model__ (*str, DataModelDict, or file-like object, optional*) The contents of the record to upload.  Required if record is not given.
- __name__ (*str, optional*) The name to assign to the record.  Required if record is not given and model is not a file name.
- __workspace__ (*str, optional*) The workspace to assign the record to. If not given, no workspace will be assigned (the record is only accessible to the user who submitted it).
- __overwrite__ (*bool, optional*) Indicates what to do when a matching record is found in the remote location.  If False (default), then the record is not updated.  If True, then the record is updated.
- __verbose__ (*bool, optional*) If True, info messages will be printed during operations.  Default value is False.

### 4.7. delete_record

Deletes a record from the local and/or remote locations.  

- __record__ (*Record, optional*) The record to delete.  If not given, then style and name are required.
- __style__ (*str, optional*) The style of the record to delete.  Required if record is not given.
- __name__ (*str, optional*) The name of the record to delete.  Required if record is not given.
- __local__ (*bool, optional*) Indicates if the record will be deleted from the local location. Default value is True.
- __remote__ (*bool, optional*) Indicates if the record will be deleted from the remote location. Default value is False.  If True, requires an account for the remote location with write permissions.
- __verbose__ (*bool, optional*) If True, info messages will be printed during operations.  Default value is False.