# CDCS Data Management

This notebook describes the functions for managing data in a CDCS instance.  Managing data requires an account on the database so that you can log in to create and modify records.

Notes on CDCS design:

- Each record is assigned to a user and can be assigned to a workspace.
- Any records without a workspace are only accessible to the assigned user.
- A workspace serves as a group of records that can be made accessible to specific users.
- Each CDCS instance has a "Global Public Workspace" that is meant for all users to be able to see.
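The design notes above can be modeled in a few lines of plain Python. This is only an illustrative sketch of the access rules as stated, not how CDCS implements permissions; the `Record` class, the `can_read` function, and the set of workspaces a user may read are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Optional, Set

GLOBAL_PUBLIC = 'Global Public Workspace'


@dataclass
class Record:
    """Hypothetical stand-in for a CDCS record's ownership metadata."""
    title: str
    owner: str
    workspace: Optional[str] = None  # None means no workspace assigned


def can_read(record: Record, user: str, user_workspaces: Set[str]) -> bool:
    """Hypothetical visibility check mirroring the design notes."""
    if record.owner == user:
        return True                  # the assigned user always has access
    if record.workspace is None:
        return False                 # no workspace: assigned user only
    if record.workspace == GLOBAL_PUBLIC:
        return True                  # meant to be visible to all users
    return record.workspace in user_workspaces


private = Record('draft', owner='alice')
public = Record('faq', owner='alice', workspace=GLOBAL_PUBLIC)
shared = Record('team-notes', owner='alice', workspace='team-a')

print(can_read(private, 'bob', set()))       # False
print(can_read(public, 'bob', set()))        # True
print(can_read(shared, 'bob', {'team-a'}))   # True
```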

In [1]:
from pathlib import Path

import cdcs
from cdcs import CDCS

print('Notebook executed for cdcs version', cdcs.__version__)

Notebook executed for cdcs version 0.2.0


## 1. Class initialization

A CDCS client manager can be initialized by passing it the host URL and authentication information.

Parameters

- __host__: (*str*) URL for the database's server.
- __username__: (*str, optional*) Username of desired account on the server. A prompt will ask for the username if not given.
- __password__: (*str, optional*) Password of desired account on the server.  A prompt will ask for the password if not given.
- __auth__: (*tuple, optional*) Auth tuple to enable Basic/Digest/Custom HTTP Auth.  Alternative to giving username and password separately.
- __cert__: (*str or tuple, optional*) If a str, the path to an SSL client cert file (.pem). If a tuple, a ('cert', 'key') pair.
- __certification__: (*str, optional*) Alias for cert. Retained for compatibility.
- __verify__: (*bool or str, optional*) Either a boolean, in which case it controls whether we verify the server’s TLS certificate, or a string, in which case it must be a path to a CA bundle to use. Defaults to True.
- __cdcsversion__: (*str, optional*) For CDCS versions 2.X.X, this allows for specifying the full CDCS version to ensure the class methods perform the correct REST calls.  This can be specified as "#.#.#", or if None is given will default to "2.15.0".  For CDCS versions 3.X.X, this is ignored as version info is obtained directly from the database.
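The cdcsversion value is printed as a tuple of integers in the cells that follow. A minimal sketch of how a "#.#.#" string could be turned into such a tuple is shown here; `parse_cdcsversion` is a hypothetical helper written for illustration and is not part of the cdcs package.

```python
from typing import Optional, Tuple


def parse_cdcsversion(cdcsversion: Optional[str] = None) -> Tuple[int, int, int]:
    """Hypothetical helper: convert "#.#.#" to a tuple of ints.

    None falls back to "2.15.0", the default named in the parameter list.
    """
    if cdcsversion is None:
        cdcsversion = '2.15.0'
    parts = cdcsversion.split('.')
    if len(parts) != 3:
        raise ValueError('cdcsversion must have the form "#.#.#"')
    return tuple(int(part) for part in parts)


print(parse_cdcsversion('2.21.0'))   # (2, 21, 0)
print(parse_cdcsversion())           # (2, 15, 0)

# Tuples compare element by element, which makes version checks simple
print(parse_cdcsversion('2.21.0') >= (2, 15, 0))   # True
```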

In [2]:
curator_v2 = CDCS('https://potentials.nist.gov/', username='lmh1', cdcsversion='2.21.0')
print(curator_v2.cdcsversion)

Enter password for lmh1 @ https://potentials.nist.gov:········
(2, 21, 0)


In [3]:
curator_v3 = CDCS('https://test-potentials.nist.gov/', username='lmh1', verify=False)
print(curator_v3.cdcsversion)

Enter password for lmh1 @ https://test-potentials.nist.gov:········
(3, 0, 1)


In [4]:
curator = curator_v3

## 2. Query data

The query() method will return *all* matching records that you have access to.

Parameters

- __template__: (*list, str, pandas.Series or pandas.DataFrame, optional*) One or more templates or template titles to limit the search by.
- __title__: (*str, optional*) Record title to limit the search by.
- __keyword__: (*str or list, optional*) Keyword(s) to use for a string-based search of record content.  Only records containing all keywords will be returned. 
- __mongoquery__: (*str or dict, optional*) Mongodb find query to use in limiting searches by record element fields.  Note: only record parsing is supported, not field projection.
- __page__: (*int or None, optional*) If an int, then will return results only for that page of 10 records.  If None (default), then results for all pages will be compiled and returned.
- __parse_dates__: (*bool, optional*) If True (default) then date fields will automatically be parsed into pandas.Timestamp objects.  If False they will be left as str values.
- __progress_bar__: (*bool, optional*) If True (default) a progress bar will be displayed for multi-page query results.

Returns

- (*pandas.DataFrame*) All records matching the search request
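The page parameter can be pictured with a small paging loop. The sketch below is an assumption-laden illustration, not the library's implementation: `collect_pages`, `fake_fetch`, and the 1-based page numbering are hypothetical, and a plain list stands in for the server.

```python
from typing import Callable, List, Optional

PAGE_SIZE = 10  # page size given in the parameter description


def collect_pages(fetch_page: Callable[[int], List[dict]],
                  page: Optional[int] = None) -> List[dict]:
    """Return one page if page is an int, otherwise compile all pages."""
    if page is not None:
        return fetch_page(page)

    results: List[dict] = []
    current = 1  # assumption: pages are numbered from 1
    while True:
        batch = fetch_page(current)
        results.extend(batch)
        if len(batch) < PAGE_SIZE:
            break  # a short (or empty) page means there is nothing left
        current += 1
    return results


# Stand-in for a server holding 23 records
fake_db = [{'id': i} for i in range(23)]


def fake_fetch(page_number: int) -> List[dict]:
    start = (page_number - 1) * PAGE_SIZE
    return fake_db[start:start + PAGE_SIZE]


print(len(collect_pages(fake_fetch)))           # 23
print(len(collect_pages(fake_fetch, page=3)))   # 3
```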

Specify a template in the database to interact with

In [5]:
# Note: template should be the name of a template in the database you are accessing!
template = 'FAQ'

Use query to fetch records

In [6]:
records = curator.query(template=template)
records

Unnamed: 0,id,template,workspace,user_id,title,xml_content,creation_date,last_modification_date,last_change_date,template_title
0,3497,11,1,5,graphs,"<faq xmlns:xsi=""http://www.w3.org/2001/XMLSch...",2021-08-26 13:15:59.839000+00:00,2021-08-26 13:15:59.839000+00:00,2021-08-26 13:16:00.155000+00:00,FAQ
1,3498,11,1,5,lammps,"<faq xmlns:xsi=""http://www.w3.org/2001/XMLSch...",2021-08-26 13:15:59.365000+00:00,2021-08-26 13:15:59.365000+00:00,2021-08-26 13:15:59.674000+00:00,FAQ
2,3500,11,1,5,manuscript,"<faq xmlns:xsi=""http://www.w3.org/2001/XMLSch...",2021-08-26 13:15:58.878000+00:00,2021-08-26 13:15:58.878000+00:00,2021-08-26 13:15:59.197000+00:00,FAQ
3,3494,11,1,5,ref,"<faq xmlns:xsi=""http://www.w3.org/2001/XMLSch...",2021-08-26 13:15:58.407000+00:00,2021-08-26 13:15:58.407000+00:00,2021-08-26 13:15:58.715000+00:00,FAQ
4,3496,11,1,5,formats,"<faq xmlns:xsi=""http://www.w3.org/2001/XMLSch...",2021-08-26 13:15:57.919000+00:00,2021-08-26 13:15:57.919000+00:00,2021-08-26 13:15:58.236000+00:00,FAQ
5,3495,11,1,5,faq,"<faq xmlns:xsi=""http://www.w3.org/2001/XMLSch...",2021-08-26 13:15:57.438000+00:00,2021-08-26 13:15:57.438000+00:00,2021-08-26 13:15:57.750000+00:00,FAQ
6,3499,11,1,5,submit,"<faq xmlns:xsi=""http://www.w3.org/2001/XMLSch...",2021-08-26 13:15:56.939000+00:00,2021-08-26 13:15:56.939000+00:00,2021-08-26 13:15:57.264000+00:00,FAQ


Pick the first record, and see its xml contents

In [7]:
record = records.iloc[0]
content = record.xml_content
print(content)

<faq  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ><question>I've downloaded a file and plotted it, but the graphs don't look like the figures in the paper.  Why?</question><answer><![CDATA[There can be several reasons for this.  One is file formatting.   Different developers and software packages use different data and file formats to present their interatomic potentials.</p>  <p>Even with the same file/data format, invariant transformations in the EAM format mean that different parameterizations of the interatomic potentials can look completely different but yield the same physical properties.  This is especially true for alloys.  The issue of invariant transformations is discussed in several places.  Among them are: </p> <UL> <LI>Y. Mishin, "Interatomic potentials for metals," in <em>Handbook of Materials Modeling</em>, edited by S. Yip (Springer, Dordrect, The Netherlands, 2005), Chap. 2.2, pp. 459-478. <LI>A.E. Carlsson, "Beyond pair potentials in elemental transition me

## 3. Manage records

The methods associated with managing records require an account on the corresponding database with the correct privileges.

### 3.1. Upload a new record

New records can be uploaded to the database using the upload_record() method.

Parameters

- **template** (*str or pandas.Series*) The template or template title to associate with the record.
- **filename** (*str, optional*) Name of an XML file whose contents are to be uploaded.  Either filename or content required.
- **content** (*str or bytes, optional*) String content to upload. Either filename or content required.
- **title** (*str, optional*) Title to save the record as.  Optional if filename is given (title will be taken as filename without ext).
- **workspace** (*str or pandas.Series, optional*) If given, the record will be assigned to this workspace after successfully being uploaded.
- **duplicatecheck** (*bool, optional*) If True (default), then a ValueError will be raised if a record already exists with the same template and title.  If False, no check is performed possibly allowing for multiple records with the same title to exist in the database.  Note: this check only searches the records that you have access to, so duplicates are possible from other users.
- **verbose** (*bool, optional*) Setting this to True will print extra status messages.  Default value is False.
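Two of the rules above, taking the title from the filename and checking for duplicates, can be sketched without a server. The helpers `resolve_title` and `check_duplicate` are hypothetical and only mimic the documented behavior; the error messages are made up for this example.

```python
from pathlib import Path
from typing import Iterable, Optional


def resolve_title(filename: Optional[str] = None,
                  title: Optional[str] = None) -> str:
    """Hypothetical helper: an explicit title wins, else use the file stem."""
    if title is not None:
        return title
    if filename is None:
        raise ValueError('title is required when no filename is given')
    return Path(filename).stem  # drops both the directory and the extension


def check_duplicate(title: str, visible_titles: Iterable[str]) -> None:
    """Hypothetical check against the titles the user can see."""
    if title in set(visible_titles):
        raise ValueError(f'record {title} already exists')


print(resolve_title(filename='data/testrecord2.xml'))   # testrecord2

existing = ['graphs', 'lammps', 'testrecord2']
try:
    check_duplicate('testrecord2', existing)
except ValueError as err:
    print('duplicate rejected:', err)
```

Because the check only sees titles visible to the current user, a record with the same title owned by another user would not be caught, as the duplicatecheck note says.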

Use the content from the record queried above to upload a "test" record.

In [8]:
title = 'testrecord1'
curator.upload_record(template=template, title=title, content=content, verbose=True)

record testrecord1 (4884) successfully uploaded.


Alternatively, the names of any local XML files can be specified and the method will automatically read the contents.  The file name without path and extension will be used for the title if no title is given.

Also, note that the workspace is being set to 'Global Public Workspace'.

In [9]:
# Save content to local file
filename = 'testrecord2.xml'
with open(filename, 'w') as f:
    f.write(content)
    
# Upload from file
curator.upload_record(template=template, filename=filename, verbose=True,
                      workspace='Global Public Workspace')

# Delete local file (keep working directory clean)
Path(filename).unlink()

record testrecord2 (4885) successfully uploaded.
record 4885 assigned to workspace 1


### 3.2. Retrieve data records

The records that you own can be accessed using the get_records() and get_record() methods. get_records() will fetch all matching records, while get_record() will fetch a single record if exactly one match is found and will throw an error otherwise.

__NOTE__: The behavior of get_records() is different for CDCS versions 2.X.X and 3.X.X.  For versions 2.X.X, get_records() will attempt to return all matching records at once.  For versions 3.X.X, get_records() behaves similarly to query() in that matching records are returned in page batches of ten.

Parameters

- __template__ (*str or pandas.Series, optional*) The template or template title to limit the search by.
- __title__ (*str, optional*) The data record title to limit the search by.
- __page__ (*int or None, optional*) If an int, then will return results only for that page of 10 records. If None (default), then results for all pages will be compiled and returned.  Only used for CDCS versions 3.X.X. 
- __parse_dates__ (*bool, optional*) If True (default) then date fields will automatically be parsed into pandas.Timestamp objects.  If False they will be left as str values.
- __progress_bar__ (*bool, optional*) If True (default) a progress bar will be displayed for multi-page query results. Only used for CDCS versions 3.X.X.
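The difference between get_records() and get_record() comes down to an "exactly one match" rule. A minimal sketch of that rule follows; `exactly_one` is a hypothetical helper, and the use of ValueError is an assumption, since the text only says an error is thrown.

```python
from typing import List, TypeVar

T = TypeVar('T')


def exactly_one(matches: List[T]) -> T:
    """Return the single match, or raise if there are zero or several."""
    if len(matches) == 0:
        raise ValueError('No matching records found')
    if len(matches) > 1:
        raise ValueError('Multiple matching records found')
    return matches[0]


titles = ['testrecord1', 'testrecord2', 'graphs']

# One match: the single item is returned
print(exactly_one([t for t in titles if t == 'testrecord2']))

# Several matches: an error is raised instead of picking one silently
try:
    exactly_one([t for t in titles if t.startswith('testrecord')])
except ValueError as err:
    print('error:', err)
```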


In [10]:
curator.get_records(template=template, parse_dates=False)

Unnamed: 0,id,template,workspace,user_id,title,xml_content,checksum,creation_date,last_modification_date,last_change_date
0,4885,11,1.0,5,testrecord2,"<faq xmlns:xsi=""http://www.w3.org/2001/XMLSch...",,2023-01-19T20:51:36.364217Z,2023-01-19T20:51:36.364217Z,2023-01-19T20:51:36.943279Z
1,4884,11,,5,testrecord1,"<faq xmlns:xsi=""http://www.w3.org/2001/XMLSch...",,2023-01-19T20:51:35.236297Z,2023-01-19T20:51:35.236297Z,2023-01-19T20:51:35.236297Z
2,3497,11,1.0,5,graphs,"<faq xmlns:xsi=""http://www.w3.org/2001/XMLSch...",,2021-08-26T13:15:59.839000Z,2021-08-26T13:15:59.839000Z,2021-08-26T13:16:00.155000Z
3,3498,11,1.0,5,lammps,"<faq xmlns:xsi=""http://www.w3.org/2001/XMLSch...",,2021-08-26T13:15:59.365000Z,2021-08-26T13:15:59.365000Z,2021-08-26T13:15:59.674000Z
4,3500,11,1.0,5,manuscript,"<faq xmlns:xsi=""http://www.w3.org/2001/XMLSch...",,2021-08-26T13:15:58.878000Z,2021-08-26T13:15:58.878000Z,2021-08-26T13:15:59.197000Z
5,3494,11,1.0,5,ref,"<faq xmlns:xsi=""http://www.w3.org/2001/XMLSch...",,2021-08-26T13:15:58.407000Z,2021-08-26T13:15:58.407000Z,2021-08-26T13:15:58.715000Z
6,3496,11,1.0,5,formats,"<faq xmlns:xsi=""http://www.w3.org/2001/XMLSch...",,2021-08-26T13:15:57.919000Z,2021-08-26T13:15:57.919000Z,2021-08-26T13:15:58.236000Z
7,3495,11,1.0,5,faq,"<faq xmlns:xsi=""http://www.w3.org/2001/XMLSch...",,2021-08-26T13:15:57.438000Z,2021-08-26T13:15:57.438000Z,2021-08-26T13:15:57.750000Z
8,3499,11,1.0,5,submit,"<faq xmlns:xsi=""http://www.w3.org/2001/XMLSch...",,2021-08-26T13:15:56.939000Z,2021-08-26T13:15:56.939000Z,2021-08-26T13:15:57.264000Z


In [11]:
record = curator.get_record(template=template, title='testrecord2', parse_dates=False)
print(record)

id                                                                     4885
template                                                                 11
workspace                                                                 1
user_id                                                                   5
title                                                           testrecord2
xml_content               <faq  xmlns:xsi="http://www.w3.org/2001/XMLSch...
checksum                                                               None
creation_date                                   2023-01-19T20:51:36.364217Z
last_modification_date                          2023-01-19T20:51:36.364217Z
last_change_date                                2023-01-19T20:51:36.943279Z
Name: 0, dtype: object


In [12]:
print(record.xml_content)

<faq  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ><question>I've downloaded a file and plotted it, but the graphs don't look like the figures in the paper.  Why?</question><answer><![CDATA[There can be several reasons for this.  One is file formatting.   Different developers and software packages use different data and file formats to present their interatomic potentials.</p>  <p>Even with the same file/data format, invariant transformations in the EAM format mean that different parameterizations of the interatomic potentials can look completely different but yield the same physical properties.  This is especially true for alloys.  The issue of invariant transformations is discussed in several places.  Among them are: </p> <UL> <LI>Y. Mishin, "Interatomic potentials for metals," in <em>Handbook of Materials Modeling</em>, edited by S. Yip (Springer, Dordrect, The Netherlands, 2005), Chap. 2.2, pp. 459-478. <LI>A.E. Carlsson, "Beyond pair potentials in elemental transition me

### 3.3. Update an existing record

The content of a record in the database can be changed using the update_record() method.  Note that this only changes the record's content and all other metadata (user, workspace, database id,...) will remain unchanged.

Parameters

- **record** (*pandas.Series, optional*) A previously identified record to update.  As this uniquely defines a record, the template and title parameters are ignored if given. Can contain the new conte
- **template** (*str or pandas.Series, optional*) The template or template title associated with the record.  template + title values must uniquely identify one record.
- **title** (*str, optional*) Title of the record to update.  template + title values must uniquely identify one record.
- **filename** (*str or Path, optional*) Path to file containing the new record content to upload. Either filename or content required.
- **content** (*str or bytes, optional*) New content to upload. Either filename or content required.
- **workspace** (*str or pandas.Series, optional*) If given, the record will be assigned to this workspace after successfully being updated.
- **verbose** (*bool, optional*) Setting this to True will print extra status messages.  Default value is False.

Change the content of testrecord2 by replacing "download" with "find"

In [13]:
new_content = record.xml_content.replace('download', 'find')
print(new_content)

<faq  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ><question>I've finded a file and plotted it, but the graphs don't look like the figures in the paper.  Why?</question><answer><![CDATA[There can be several reasons for this.  One is file formatting.   Different developers and software packages use different data and file formats to present their interatomic potentials.</p>  <p>Even with the same file/data format, invariant transformations in the EAM format mean that different parameterizations of the interatomic potentials can look completely different but yield the same physical properties.  This is especially true for alloys.  The issue of invariant transformations is discussed in several places.  Among them are: </p> <UL> <LI>Y. Mishin, "Interatomic potentials for metals," in <em>Handbook of Materials Modeling</em>, edited by S. Yip (Springer, Dordrect, The Netherlands, 2005), Chap. 2.2, pp. 459-478. <LI>A.E. Carlsson, "Beyond pair potentials in elemental transition metals

In [14]:
curator.update_record(record=record, content=new_content, verbose=True)

record testrecord2 (4885) has been updated.


Retrieve the record again from the database to show that only the content and the modification and change dates have changed.

In [15]:
record = curator.get_record(template=template, title='testrecord2')
print(record)

id                                                                     4885
template                                                                 11
workspace                                                                 1
user_id                                                                   5
title                                                           testrecord2
xml_content               <faq  xmlns:xsi="http://www.w3.org/2001/XMLSch...
checksum                                                               None
creation_date                              2023-01-19 20:51:36.364217+00:00
last_modification_date                     2023-01-19 20:51:39.051503+00:00
last_change_date                           2023-01-19 20:51:39.057506+00:00
Name: 0, dtype: object


In [16]:
print(record.xml_content)

<faq  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ><question>I've finded a file and plotted it, but the graphs don't look like the figures in the paper.  Why?</question><answer><![CDATA[There can be several reasons for this.  One is file formatting.   Different developers and software packages use different data and file formats to present their interatomic potentials.</p>  <p>Even with the same file/data format, invariant transformations in the EAM format mean that different parameterizations of the interatomic potentials can look completely different but yield the same physical properties.  This is especially true for alloys.  The issue of invariant transformations is discussed in several places.  Among them are: </p> <UL> <LI>Y. Mishin, "Interatomic potentials for metals," in <em>Handbook of Materials Modeling</em>, edited by S. Yip (Springer, Dordrect, The Netherlands, 2005), Chap. 2.2, pp. 459-478. <LI>A.E. Carlsson, "Beyond pair potentials in elemental transition metals

### 3.4. Assign records to a workspace

The upload_record() and update_record() methods give the option to assign each record to a workspace if wanted.  After a record is added, the workspace can also be changed using the assign_records() method.

Parameters

- __workspace__ (*str or pandas.Series*) The workspace or workspace title to assign the records to.
- __records__ (*pandas.Series or pandas.DataFrame, optional*) Pre-selected records to assign to the workspace.  Cannot be given with ids, template, or title.
- __ids__ (*str or list, optional*) The ID(s) of the records to assign to the workspace.  Selecting records using ids has the least overhead. Cannot be given with records, template, or title.
- __template__ (*str or pandas.Series, optional*) The template or template title of records to assign to the workspace.  Cannot be given with records or ids.
- __title__ (*str, optional*) The title of a record to assign to the workspace. Cannot be given with records or ids.
- __verbose__ (*bool, optional*) Setting this to True will print extra status messages.  Default value is False.
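The "cannot be given with" rules above amount to choosing one of three ways to select records. The sketch below shows one way such a check could look; `check_selectors` is hypothetical, and raising an error when no selector is given is a choice made for this sketch only, since the text does not say what the library does in that case.

```python
def check_selectors(records=None, ids=None, template=None, title=None) -> str:
    """Hypothetical validation of the mutually exclusive selection options.

    Returns which selection mode applies: 'records', 'ids' or 'query'.
    """
    by_query = template is not None or title is not None
    given = [records is not None, ids is not None, by_query]

    if sum(given) > 1:
        raise ValueError('records, ids, and template/title cannot be combined')
    if records is not None:
        return 'records'
    if ids is not None:
        return 'ids'
    if by_query:
        return 'query'
    # Assumption of this sketch: require at least one selector
    raise ValueError('no selector given')


print(check_selectors(title='testrecord1', template='FAQ'))   # query
print(check_selectors(ids=['4884', '4885']))                  # ids

try:
    check_selectors(ids='4884', title='testrecord1')
except ValueError as err:
    print('error:', err)
```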

In [17]:
curator.assign_records(workspace='Global Public Workspace', title='testrecord1', template=template, verbose=True)

record 4884 assigned to workspace 1


### 3.5. Delete a record

A record can also be deleted by the assigned user using the delete_record() method.

Parameters

- __record__ (*pandas.Series, optional*) A previously identified record to delete.  As this uniquely defines a record, the other parameters are ignored.
- __template__ (*str or pandas.Series, optional*) The template or template title associated with the record.  template + title values must uniquely identify one record.
- __title__ (*str, optional*) Title of the record to delete.  template + title values must uniquely identify one record.
- __verbose__ (*bool, optional*) Setting this to True will print extra status messages.  Default value is False.

In [18]:
# Delete by identifying with title + template
curator.delete_record(title='testrecord1', template=template, verbose=True)

record testrecord1 (4884) has been deleted.


In [19]:
# Get a record first
record = curator.get_record(title='testrecord2', template=template)

# Delete by passing Series
curator.delete_record(record, verbose=True)

record testrecord2 (4885) has been deleted.


## 4. Manage blobs (raw files)

The database can also store any non-record files as blob files that can later be retrieved and downloaded all at once.

Create blob file for testing

In [20]:
filename = 'test_blob.txt'

with open(filename, 'w') as f:
    f.write('This is my blob for testing')

### 4.1. Upload blob

Blobs can be added to the database using the upload_blob() method.  When a blob is uploaded, it will be associated with the given filename (minus the path).

Parameters

- __filename__ (*str or Path*) The path/name of the file to upload.
- __blobbytes__ (*bytesIO, optional*) Pre-loaded file contents.  Allows for the contents of open file-like objects to be passed in and associated with the filename.  If not given, then the file named by filename will be read instead.
- __workspace__ (*str or pandas.Series, optional*) If given, the blob will be assigned to this workspace after successfully being uploaded.
- __verbose__ (*bool, optional*) Setting this to True will print extra status messages.  Default value is False.

Returns

- __handle__ (*str*) The URL that can be used to retrieve the blob's contents.

In [21]:
handle = curator.upload_blob(filename=filename, verbose=True)
print(handle)

File "test_blob.txt" uploaded as blob "test_blob.txt" (3)
https://test-potentials.nist.gov/rest/blob/download/3/


### 4.2. Find blob metadata

The list of all blobs currently in the database, or all with an associated filename can be explored with the get_blobs() method.

Parameters

- __filename__ (*str, optional*) The name of the file to limit the search by.

A single blob can also be retrieved using the get_blob() method based on a filename, if it is unique in the database, or a database id.

Parameters

- __id__ (*str, optional*) The unique ID associated with the blob. Cannot be combined with filename
- __filename__ (*str, optional*) The name of the file to limit the search by, which must be unique.  Cannot be combined with id.

__NOTE__: Exploring blob metadata appears to behave differently for some versions of CDCS: filename searches only show blobs that the user owns, while id searches show any blobs available to the user, either by being owned or by being in an accessible workspace.

- Identifying by filename only works for user blobs
- Identifying by id works for all blobs available to you (user + allowed workspaces)

In [22]:
curator.get_blobs()

Unnamed: 0,id,user_id,filename,handle,blob,checksum,upload_date,pid
0,3,5,test_blob.txt,https://test-potentials.nist.gov/rest/blob/dow...,https://test-potentials.nist.gov/rest/blob/use...,,2023-01-19 20:51:44.108819+00:00,


In [23]:
blobdata = curator.get_blob(filename=filename)
print(blobdata)

id                                                             3
user_id                                                        5
filename                                           test_blob.txt
handle         https://test-potentials.nist.gov/rest/blob/dow...
blob           https://test-potentials.nist.gov/rest/blob/use...
checksum                                                    None
upload_date                     2023-01-19 20:51:44.108819+00:00
pid                                                         None
Name: 0, dtype: object


In [24]:
curator.get_blob(id=blobdata.id)

id                                                             3
user_id                                                        5
filename                                           test_blob.txt
handle         https://test-potentials.nist.gov/rest/blob/dow...
blob           https://test-potentials.nist.gov/rest/blob/3/u...
checksum                                                    None
upload_date                     2023-01-19 20:51:44.108819+00:00
pid                                                         None
dtype: object

### 4.3. Retrieve blob contents

The contents of a blob can then be retrieved using the get_blob_contents() method.

Parameters

- __blob__ (*pandas.Series, optional*) The blob metadata for a blob, as received from get_blobs() or get_blob().  Cannot be combined with id or filename.
- __id__ (*str, optional*) The unique ID associated with the blob. Cannot be combined with blob or filename.
- __filename__ (*str, optional*) The name of the file to limit the search by, which must be unique.  Cannot be combined with blob or id.

Returns

- __content__ (*bytes*) The blob's bytes contents.

In [25]:
print(curator.get_blob_contents(filename=filename))

b'This is my blob for testing'
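Since the contents come back as bytes, text blobs need to be decoded before they can be used as str. A short illustration using the bytes value printed above; the UTF-8 encoding is an assumption that holds for this test blob but not necessarily for arbitrary files.

```python
# Bytes value copied from the output above
blob_bytes = b'This is my blob for testing'

# Assumption: the blob holds UTF-8 encoded text
text = blob_bytes.decode('utf-8')
print(text)          # This is my blob for testing
print(type(text))    # <class 'str'>
```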


Alternatively, the blob can be saved directly to a local file based on its associated filename with the download_blob() method.

Parameters

- __blob__ (*pandas.Series, optional*) The blob metadata for a blob, as received from get_blobs() or get_blob().  Cannot be combined with id or filename.
- __id__ (*str, optional*) The unique ID associated with the blob. Cannot be combined with blob or filename.
- __filename__ (*str, optional*) The name of the file to limit the search by, which must be unique.  Cannot be combined with blob or id.
- __savedir__ (*str or Path, optional*) The directory to save the file to.  Default value uses the current working directory.
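Conceptually, downloading a blob is equivalent to fetching its bytes and writing them to a file named after the blob inside savedir. The sketch below illustrates that idea with a hypothetical `save_blob` function and a temporary directory; it does not contact a server and is not the library's implementation.

```python
import tempfile
from pathlib import Path


def save_blob(content: bytes, filename: str, savedir=None) -> Path:
    """Hypothetical stand-in: write blob bytes to savedir/filename."""
    # Default to the current working directory, as the parameter list states
    target_dir = Path.cwd() if savedir is None else Path(savedir)
    target = target_dir / Path(filename).name  # keep only the file name
    target.write_bytes(content)
    return target


with tempfile.TemporaryDirectory() as tmp:
    saved = save_blob(b'This is my blob for testing', 'test_blob.txt', savedir=tmp)
    saved_text = saved.read_text()
    print(saved.name)     # test_blob.txt
    print(saved_text)     # This is my blob for testing
```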

In [26]:
curator.download_blob(filename=filename)

### 4.4. Assign blobs to a workspace

Just like records, blobs can also be assigned to workspaces.  This can be done during uploading, or it can be done after uploading using the assign_blobs() method.

Parameters

- __workspace__ (*str or pandas.Series*) The workspace or workspace title to assign the blobs to.
- __blobs__ (*pandas.Series or pandas.DataFrame, optional*) Pre-selected blobs to assign to the workspace.  Cannot be given with ids or filename.
- __ids__ (*str or list, optional*) The ID(s) of the blobs to assign to the workspace.  Selecting blobs using ids has the least overhead. Cannot be given with blobs or filename.
- __filename__ (*str, optional*) The name of the blob file to assign to the workspace.  Cannot be given with blobs or ids.
- __verbose__ (*bool, optional*) Setting this to True will print extra status messages.  Default value is False.

In [27]:
curator.assign_blobs(workspace='Global Public Workspace', filename=filename, verbose=True)

blob 3 assigned to workspace 1


### 4.5. Delete blob

Finally, blobs can also be deleted using the delete_blob() method.

Parameters

- __blob__ (*pandas.Series, optional*) The blob metadata for a blob, as received from get_blobs() or get_blob().  Cannot be combined with id or filename.
- __id__ (*str, optional*) The unique ID associated with the blob. Cannot be combined with blob or filename.
- __filename__ (*str, optional*) The name of the file to limit the search by, which must be unique.  Cannot be combined with blob or id.
- __verbose__ (*bool, optional*) Setting this to True will print extra status messages.  Default value is False.

In [28]:
curator.delete_blob(filename=filename, verbose=True)

Successfully deleted blob "test_blob.txt" (3)


In [29]:
Path(filename).unlink()

## 5. Managing workspaces

Workspaces are used to manage access to content stored in a CDCS database. Records and blobs can be assigned to a workspace, and access control is then set at the workspace level for all associated content.  Workspace assignment typically falls into three categories:

1. Content not assigned to a workspace will only be accessible to the user who created it.
2. Content assigned to the "Global Public Workspace" can be accessible to anyone who has read access to the database.
3. Alternate workspaces can be used to limit access to some content for only specific users.

### 5.1. Finding workspace metadata

Metadata for the available workspaces can be explored using the get_workspaces() and get_workspace() methods.

Parameters

- __title__ (*str, optional*) The workspace title to limit the search by.

In [30]:
curator.get_workspaces()

Unnamed: 0,id,title,owner,is_public
0,1,Global Public Workspace,,True


In [31]:
curator.get_workspace(title='Global Public Workspace')

id                                 1
title        Global Public Workspace
owner                           None
is_public                       True
Name: 0, dtype: object

The attribute global_workspace also retrieves the Global Public Workspace 

In [32]:
workspace = curator.global_workspace
print(workspace)

id                                 1
title        Global Public Workspace
owner                           None
is_public                       True
Name: 0, dtype: object


## 6. Managing templates

Templates are used to categorize the types of records that a database stores and to validate uploaded record contents against a schema.  Templates are managed differently than other types of CDCS content in that they are version controlled; changes to template contents do not overwrite the current contents but rather get uploaded as a new template version.  Separate entries called template managers then specify which template versions exist, are active/disabled, and which version is the current version. 
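The version-control idea can be sketched as a small data structure in which new content is appended as a new version rather than overwriting the old one. `TemplateManagerSketch` is a hypothetical class; its field names follow the columns of the template manager metadata shown later in this section, but the id 27 and the rule that only the first version becomes current automatically are assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TemplateManagerSketch:
    """Hypothetical model of a template manager entry."""
    title: str
    versions: List[int] = field(default_factory=list)
    disabled_versions: List[int] = field(default_factory=list)
    current: Optional[int] = None

    def add_version(self, template_id: int) -> None:
        """Append a new version; existing versions are never overwritten."""
        self.versions.append(template_id)
        if self.current is None:
            self.current = template_id  # assumption: first version is current

    def set_current(self, template_id: int) -> None:
        """Mark one of the existing versions as the current version."""
        if template_id not in self.versions:
            raise ValueError('unknown template version')
        self.current = template_id


manager = TemplateManagerSketch('Citation')
manager.add_version(13)   # first upload creates version 13
manager.add_version(27)   # changed content arrives as a new version
manager.set_current(27)

print(manager.versions, manager.current)   # [13, 27] 27
```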

### 6.1. Finding template metadata

A list of all available templates can be retrieved using the template_titles attribute.

In [33]:
curator.template_titles

['Action',
 'calculation_bond_angle_scan',
 'calculation_diatom_scan',
 'stacking_fault',
 'Request',
 'PotentialProperties',
 'potential_LAMMPS_KIM',
 'potential_LAMMPS',
 'Potential',
 'free_surface',
 'FAQ',
 'crystal_prototype',
 'Citation',
 'calculation_isolated_atom',
 'relaxed_crystal']

Metadata for the available templates can be explored using the get_templates() and get_template() methods.

Parameters

- __title__ (*str, optional*): The template title to limit the search by.
- __is_disabled__ (*bool, optional*): If True, then disabled templates will be returned.  If False (default), then active templates will be returned.
- __current__ (*bool, optional*): If True (default), only current template versions will be returned.
- __useronly__ (*bool, optional*): If True, only a user's templates are returned. If False (default), then all global templates are returned.

In [34]:
curator.get_templates()

Unnamed: 0,id,user,filename,checksum,content,hash,dependencies,title
0,1,,record-interatomic-potential-action.xsd,,"<?xml version=""1.0"" encoding=""UTF-8"" standalon...",f1c3de8a64553d81bd10cfc0cb24dd3cc129c710,[],Action
1,2,,record-calculation-bond-angle-scan.xsd,,"<?xml version=""1.0"" encoding=""UTF-8"" standalon...",16a76faca187dea47c6089707cd377eb4163ac7b,[],calculation_bond_angle_scan
2,3,,record-calculation-diatom-scan.xsd,,"<?xml version=""1.0"" encoding=""UTF-8"" standalon...",46ce4c17314f9aeb3f5bb14642baf42e3da3d1e2,[],calculation_diatom_scan
3,4,,record-stacking-fault.xsd,,"<?xml version=""1.0""?>\n<xsd:schema xmlns:xsd=""...",aaa696ec7723f68e5be8280c8cafab69e08cf3f3,[],stacking_fault
4,5,,Request3,,"<xsd:schema xmlns:xsd=""http://www.w3.org/2001/...",d108bacceb44a2789544ade71018b7f5ceacca1b,[],Request
5,6,,record-per-potential-properties.xsd,,"<?xml version=""1.0"" encoding=""UTF-8"" standalon...",ea7eeed57e0cea9b15dbda4b32c08266201a7e5a,[],PotentialProperties
6,7,,potential-LAMMPS-KIM.xsd,,"<?xml version=""1.0"" encoding=""UTF-8"" standalon...",1e331780cdb7d3382474e91844e5f4b95b12f1fa,[],potential_LAMMPS_KIM
7,8,,potential_LAMMPS.xsd,,"<?xml version=""1.0""?>\r\n<xsd:schema xmlns:xsd...",2881ca54c5f3fe29ee36b06308f98e9901e8c8fd,[],potential_LAMMPS
8,9,,Potential4.xsd,,"<?xml version=""1.0"" encoding=""UTF-8"" standalon...",c35b164c2947bdab2c6dd8566f84584a4a3eef7f,[],Potential
9,10,,record-free-surface.xsd,,"<?xml version=""1.0""?>\r\n<xsd:schema xmlns:xsd...",8d2ff690d7e5558b851769f75bad8d90af47102d,[],free_surface


In [35]:
curator.get_template('Citation')

id                                                             13
user                                                         None
filename                record-interatomic-potential-citation.xsd
checksum                                                     None
content         <?xml version="1.0" encoding="UTF-8" standalon...
hash                     893099854c8ac392bad133ce71ac34f60407ab6f
dependencies                                                   []
title                                                    Citation
Name: 0, dtype: object
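
The keyword parameters listed above can be combined to narrow a query. The sketch below is a hypothetical helper (not part of the cdcs package) that assumes `client` is an initialized CDCS object such as `curator`. It uses only the get_templates() parameters documented in this section, and it relaxes the `current` filter for disabled templates on the assumption that disabled versions are usually not the current ones.

```python
def templates_by_state(client, title=None):
    """Query active and disabled templates separately.

    Hypothetical helper, not part of cdcs. `client` is assumed to be an
    initialized CDCS object such as `curator`.
    """
    # Active templates: keep the default behavior of current versions only.
    active = client.get_templates(title=title, is_disabled=False, current=True)

    # Disabled templates: assumption that these are rarely the current
    # version, so all versions are requested.
    disabled = client.get_templates(title=title, is_disabled=True, current=False)

    return {'active': active, 'disabled': disabled}
```

With a live connection this could be called as `templates_by_state(curator, title='Citation')`, with each value holding whatever get_templates() returns for that query.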

The get_template_managers() method fetches metadata associated with the various versions of templates in the database. 

Parameters

- __title__ (*str, optional*) The template title to limit the search by.
- __is_disabled__ (*bool, optional*) If True, then disabled templates will be returned.  If False (default), then active templates will be returned.
- __useronly__ (*bool, optional*) If True, only a user's templates are returned. If False (default), then all global templates are returned.

In [36]:
curator.get_template_managers('Citation')

Unnamed: 0,id,versions,current,disabled_versions,title,user,is_disabled,_cls,creation_date
0,13,[13],13,[],Citation,,False,VersionManager.TemplateVersionManager,2022-11-10T15:01:36.345654Z


### 6.2. Adding and modifying templates

__NOTE__: The upload and update operations for templates differ from other types of CDCS contents due to the template version control system.  Notably, the content associated with an uploaded template cannot be modified or replaced; instead, new versions are uploaded and managed.

New templates can be added using upload_template(). This will create a template manager for the template title as well as the first version of that template.

Parameters

- __filename__ (*str, optional*) Name of the XSD schema file to upload for the template.  Optional if title is given (filename will be taken as "title".xsd).
- __content__ (*str or bytes, optional*) String contents of an XSD schema file to upload for the template.  Optional if filename is given as a full path to the XSD file.
- __title__ (*str, optional*) Title to save the template as.  Optional if filename is given (title will be taken as filename without ext).
- __useronly__ (*bool, optional*) If True, the template will be associated only with the user. If False (default), it will be made a global template.
- __verbose__ (*bool, optional*) Setting this to True will print extra status messages.  Default value is False.
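
As a rough illustration of these parameters, the following sketch uploads every XSD file in a local directory as a new template. The helper name and the directory layout are assumptions made for this example; only the upload_template() parameters listed above are used, and `client` is assumed to be an initialized CDCS object such as `curator`.

```python
from pathlib import Path


def upload_schema_directory(client, directory, useronly=False):
    """Upload every .xsd file in a directory as a new template.

    Hypothetical helper, not part of cdcs. Each title is the file name
    without its extension, which matches the documented default.
    """
    uploaded = []
    for path in sorted(Path(directory).glob('*.xsd')):
        # filename is given as a full path, so content can be omitted.
        client.upload_template(
            filename=str(path),
            title=path.stem,
            useronly=useronly,
            verbose=True,
        )
        uploaded.append(path.stem)
    return uploaded
```

Since upload_template() creates a new template manager for each title, this sketch is intended for titles that do not yet exist in the database; new versions of existing titles go through update_template() instead.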

Calling update_template() will create a new version of a template and associate it with an existing template title.  Options allow for some automatic version management of the new/old templates.

Parameters
    
- __filename__ (*str, optional*) Name of the XSD schema file to upload for the template.  Optional if title is given (filename will be taken as "title".xsd).
- __content__ (*str or bytes, optional*) String contents of an XSD schema file to upload for the template.  Optional if filename is given as a full path to the XSD file.
- __title__ (*str, optional*) Title to save the template as.  Optional if filename is given (title will be taken as filename without ext).
- __template_manager__ (*pandas.Series, optional*) Can be given instead of title if the template_manager info has already been retrieved from the database.
- __set_current__ (*bool, optional*) If True (default), will set the uploaded version of the template to be the current active version.
- __disable_old__ (*bool, optional*) If True, any active versions of the template besides the newly created and the current version (if different) will be disabled.  Default value is False.
- __verbose__ (*bool, optional*) Setting this to True will print extra status messages.  Default value is False.
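
A minimal sketch of publishing a new version is shown below. The helper is hypothetical and assumes that the XSD file name (without extension) matches an existing template title and that `client` is an initialized CDCS object such as `curator`. It relies only on the update_template() parameters listed above.

```python
from pathlib import Path


def publish_new_version(client, xsd_path, disable_old=False):
    """Upload an XSD file as a new version of an existing template.

    Hypothetical helper, not part of cdcs. The template title is assumed
    to equal the file name without its extension.
    """
    path = Path(xsd_path)
    client.update_template(
        filename=str(path),       # full path, so content is read from the file
        title=path.stem,          # existing template title to add a version to
        set_current=True,         # make the new version the current one
        disable_old=disable_old,  # optionally disable other active versions
        verbose=True,
    )
    return path.stem
```

Leaving `disable_old` as False keeps older versions active, which is the more conservative choice when existing records are still associated with them.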

### 6.3. Template version management operations

As templates cannot be modified or deleted, there are a number of operations that allow for template versions and managers to be made less accessible.

Individual template versions can be set as disabled or returned to active status using disable_template() and restore_template(), respectively.  Disabling a template version prevents users from adding records to that template version.  For each template title, a single template version is considered the "current" version. The set_current_template() method changes which template version is the current one.  All three of these methods rely on parameters that identify a specific version of a template.

Parameters
    
- __title__ (*str, optional*) The template title.
- __version__ (*int, optional*) The version of the template to make current.  Required unless template_id is given.  Note that version numbers start at 1.
- __template_manager__ (*pandas.Series, optional*) Template manager information for the template.  This can be given instead of title to avoid querying for the template manager.
- __template_id__ (*str, optional*) The database id for the template to set as current.  If given, then no other parameters are allowed (or needed).
- __verbose__ (*bool, optional*) Setting this to True will print extra status messages.  Default value is False.
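
The rules for identifying a template version can be checked locally before making a request. The function below is a hypothetical helper that mirrors the parameter descriptions above; it is not the validation that cdcs itself performs, and the requirement that at least one of title or template_manager accompanies a version is an assumption drawn from those descriptions.

```python
def template_version_kwargs(title=None, version=None,
                            template_manager=None, template_id=None):
    """Build keyword arguments that identify one template version.

    Hypothetical helper, not part of cdcs.
    """
    if template_id is not None:
        # template_id identifies the version on its own.
        if (title is not None or version is not None
                or template_manager is not None):
            raise ValueError('template_id cannot be combined with other identifiers')
        return {'template_id': template_id}

    if version is None:
        raise ValueError('version is required unless template_id is given')
    if version < 1:
        raise ValueError('version numbers start at 1')
    if title is None and template_manager is None:
        # Assumption: one of these is needed to find the template family.
        raise ValueError('give title or template_manager with version')

    kwargs = {'version': version}
    if title is not None:
        kwargs['title'] = title
    if template_manager is not None:
        kwargs['template_manager'] = template_manager
    return kwargs
```

The returned dictionary can then be unpacked into any of the three methods, for example `curator.disable_template(**template_version_kwargs(title='Citation', version=1))`.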

Additionally, all versions of a given template can be disabled or restored using disable_template_manager() and restore_template_manager(), respectively.  The parameters for these methods are used to identify the template family to disable/restore. 

Parameters
    
- __title__ (*str, optional*) The template title.
- __template_manager__ (*pandas.Series, optional*) Template manager information for the template.  This can be given instead of title to avoid querying for the template manager.
- __verbose__ (*bool, optional*) Setting this to True will print extra status messages.  Default value is False.

### 6.4. Template migration and validation

Each record is associated with a specific template. Migration allows for records to be associated with a different template instead, as long as the record contents can be validated against the new template's schema.  This type of operation is typically done when a new version of a template is added that extends the current template.  Validation performs the schema validation checks for the records against the new template without migrating the records to the new template.

These operations will hopefully be added soon; the proper use of the corresponding REST calls is still being worked out.

## 7. Managing XSLTs 

XSLT files allow for the XML content of records to be transformed. This is used by CDCS for a variety of purposes:
- Transformations to HTML can be used to render the contents of record entries into user-friendly web pages.
- Transformations can also be specified to convert the record contents into other text-based formats that users may wish to download.
- A transformation can also be used for data migration when a new template version is not backwards compatible with the current template version.  Here, the transformation would be XML to XML so that the content is valid with the new template schema. 

### 7.1. Exploring existing XSLTs

Similar to other content types, get_xslts() fetches all matching XSLT entries from the database, while get_xslt() fetches a single matching entry.  Each XSLT can be identified by its name or filename.

Parameters
 
- __name__ (*str, optional*) The xslt name to limit the search by.
- __filename__ (*str, optional*) The xslt filename to limit the search by.

### 7.2. Uploading a new XSLT

New XSLTs can be uploaded using upload_xslt().

Parameters
    
- __name__ (*str, optional*) The name to associate with the XSLT file.  Optional if filename is given, in which case name will be taken as the filename without its extension.
- __filename__ (*str, optional*) The filename to associate with the XSLT file.  Optional if name and content are given.  If not given, filename will be set to name + '.xsl'. Will read the file contents if the file exists and content is not given.
- __content__ (*str or bytes, optional*) XSLT file contents.  Optional if filename is given and points to a file that exists.
- __verbose__ (*bool, optional*) Setting this to True will print extra status messages.  Default value is False.
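
The way name and filename default to one another can be sketched as follows. This is a hypothetical standalone function that mirrors the parameter descriptions above, not the code that upload_xslt() actually runs.

```python
from pathlib import Path


def resolve_xslt_names(name=None, filename=None):
    """Fill in a missing name or filename following the documented defaults.

    Hypothetical helper, not part of cdcs.
    """
    if name is None and filename is None:
        raise ValueError('at least one of name or filename is required')
    if name is None:
        # name is taken as the filename without its extension.
        name = Path(filename).stem
    if filename is None:
        # filename is set to name + '.xsl'.
        filename = name + '.xsl'
    return name, filename
```

For instance, `resolve_xslt_names(filename='render.xsl')` and `resolve_xslt_names(name='render')` both resolve to the same name and filename pair, where 'render' is only a placeholder name.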

### 7.3. Modifying an existing XSLT

Any uploaded XSLT entries can be modified and updated using update_xslt(). Note that there are additional parameters compared to other update operations as the name and filename fields can be updated in addition to the content.

Parameters
    
- __name__ (*str, optional*) An xslt name. Will be used to identify the existing xslt entry to update if neither xslt nor xslt_id parameters are given. If either xslt or xslt_id are given, this can be used to assign a new name to the entry.
- __filename__ (*str, optional*) An xslt filename. Will be used to identify the existing xslt entry to update if neither xslt nor xslt_id parameters are given. If either xslt or xslt_id are given, this can be used to assign a new filename to the entry.
- __content__ (*str, optional*) New xsl content to assign to the entry.
- __newname__ (*str, optional*) New name to assign to the entry.
- __newfilename__ (*str, optional*) New filename to assign to the entry.
- __xslt__ (*pd.Series, optional*) The xslt entry information for the entry that is being updated.  
- __xslt_id__ (*str, optional*) The database id that uniquely identifies the xslt entry. 
- __verbose__ (*bool, optional*) Setting this to True will print extra status messages.  Default value is False.

### 7.4. Deleting an existing XSLT

Any XSLT files can also be removed from the database using delete_xslt().

Parameters
    
- __name__ (*str, optional*) Values for name and filename can be specified to try to uniquely identify the XSLT entry to delete.  Cannot be combined with xslt or xslt_id as they uniquely identify the XSLT on their own.
- __filename__ (*str, optional*) Values for name and filename can be specified to try to uniquely identify the XSLT entry to delete.  Cannot be combined with xslt or xslt_id as they uniquely identify the XSLT on their own.
- __xslt__ (*pd.Series, optional*) The xslt entry information for the entry that is to be deleted.  Cannot be combined with name, filename or xslt_id.
- __xslt_id__ (*str, optional*) The database id that uniquely identifies the xslt entry to delete.  Cannot be combined with name, filename or xslt.
- __verbose__ (*bool, optional*) Setting this to True will print extra status messages.  Default value is False.

## 8. Managing PID XPATHs

To help make data in a CDCS more compliant with FAIR practices, a PID (persistent identifier) can be assigned to each record entry and blob.  Without a PID, the URL for accessing a record/blob contains the database ID that was assigned to that content. The database ID is not persistent as deleting and reuploading the content will result in it being assigned a different database ID. This is especially relevant when restoring a database from a backup as the database IDs are not retained.

The PID XPATH operations listed here make it easy for PIDs to be automatically assigned to records of a given schema.  The steps involved are:

1. The CDCS database needs to be set up to allow for automatically assigning PIDs, and one or more domains need to be defined.  Contact whoever is in charge of hosting your CDCS instance as these options either require elevated administrative privileges or are not accessible with REST commands or from the web interface.
2. In order for a PID to be assigned to a record, the PID must be included in the record's XML content as a full URL.  The database automatically finds the PID URL field from the XML content based on an XPATH that is set on a per-template basis.  The methods described in this section allow for these template-specific PID XPATHs to be managed.
3. Each PID URL will be of the form {host}/pid/rest/{provider}/{domain}/{record}.  Host is the host URL for the database, provider is typically "local", domain is one of the domains defined in step #1, and record then is some record-specific ID that makes the full URL unique.  As long as this PID URL is in the appropriate XML field specified by the PID XPATH for the template, is consistent with the form described, and is unique, then the record can be accessed from the PID URL.
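
The URL form described in step 3 can be assembled with a small function. This is a sketch only: the function name is hypothetical, and the host, domain and record values used with it are placeholders rather than values from any real database.

```python
def build_pid_url(host, domain, record, provider='local'):
    """Assemble a PID URL of the form {host}/pid/rest/{provider}/{domain}/{record}.

    Hypothetical helper, not part of cdcs.
    """
    # Drop any trailing slash on the host so the URL has no doubled slash.
    return f"{host.rstrip('/')}/pid/rest/{provider}/{domain}/{record}"


# Placeholder values for illustration only.
pid_url = build_pid_url('https://example.org/', 'mydomain', 'record-001')
print(pid_url)
```

The resulting string is what would be placed in the XML field that the template's PID XPATH points to.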

### 8.1. Exploring existing PID XPATHs


The currently set PID XPATHs can be retrieved and explored with get_pid_xpaths() and get_pid_xpath().

Parameters

- __template__ (*str or pandas.Series, optional*) The template or template title to limit the search by.

### 8.2. Uploading and updating PID XPATHs

PID XPATHs can be set and changed with upload_pid_xpath() and update_pid_xpath(), respectively.
    
Parameters
    
- __template__ (*str or pandas.Series*) The template or template title.
- __xpath__ (*str*) The xpath to use for the pid field for the template.
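
When several templates need a PID XPATH, the calls can be grouped in a loop. The helper below is a hypothetical sketch that assumes `client` is an initialized CDCS object such as `curator` and that none of the listed templates has a PID XPATH set yet (otherwise update_pid_xpath() would be the appropriate call).

```python
def set_pid_xpaths(client, xpaths):
    """Set a PID XPATH for each template in a mapping.

    Hypothetical helper, not part of cdcs. `xpaths` maps template titles
    to the XPATH string to use for that template's PID field.
    """
    done = []
    for template, xpath in xpaths.items():
        client.upload_pid_xpath(template=template, xpath=xpath)
        done.append(template)
    return done
```

The keys of the mapping are template titles and the values are the XPATH strings for those templates; the exact XPATH syntax expected depends on the CDCS instance and is not assumed here.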

### 8.3. Deleting PID XPATHs

A set PID XPATH can be deleted with delete_pid_xpath().
    
Parameters
    
- __template__ (*str or pandas.Series*) The template or template title.