# Record Classes

This Notebook provides a general overview of the Record subclasses and outlines the common methods and attributes that they share.

The potentials package is designed to interact with different types of data, which are referred to as "record styles".  Each style has its own Record subclass and is associated with a particular data model schema.  Because each style, subclass and schema is uniquely linked to the others, the three terms are used interchangeably in this documentation even though they are not technically equivalent.

The Record subclasses let users interact with the data in three primary representations:

- The "python" representation interprets data fields as class attributes that can be directly accessed and (for most styles) modified.  This allows for data entries to be examined and created without the user needing to know the underlying schema.  It also allows for the Record class to have built-in methods and attributes that use the data in meaningful ways.
- The "data model" representation corresponds to the database entry schema associated with that style of record, which is equivalently represented in JSON and XML.  Each Record class can load data model records, store the loaded data model internally, and (re)build the data model based on the current class attributes.
- The "metadata" representation corresponds to a dictionary of simple metadata fields extracted from the record.  This is meant to provide a simple overview of each record which can be used for comparing and parsing.


In [1]:
# Standard Python libraries
from pathlib import Path

import pandas as pd

# https://github.com/lmhale99/potentials
import potentials

print('Notebook tested for potentials version', potentials.__version__)

Notebook tested for potentials version 0.3.0


## 1. Loading and managing record styles

The different record styles are managed in a modular way, which makes it easy to add new record styles in the future.

### 1.1. potentials.recordmanager.check_styles()

The recordmanager is an object that manages the different Record classes and allows for them to be imported, found and accessed according to their record styles.  

Calling recordmanager.check_styles() prints which record styles were imported successfully and which failed, along with any error messages.  Typically, an error message indicates that a style requires additional Python packages to be installed before it can be used.

In [2]:
potentials.recordmanager.check_styles()

Record styles that passed import:
- Citation
- Potential
- potential_LAMMPS
- potential_LAMMPS_KIM
- Action
- Request
- FAQ
Record styles that failed import:
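
The pass/fail report above can be pictured as a registry that attempts each import and remembers what went wrong. The following is a minimal sketch of that pattern only, not the recordmanager implementation: `StyleRegistry`, `import_style`, and the style names used here are hypothetical and exist only for illustration.

```python
import importlib


class StyleRegistry:
    """Hypothetical registry mapping style names to importable modules."""

    def __init__(self):
        self.loaded = {}   # style name -> imported module
        self.failed = {}   # style name -> error message

    def import_style(self, style, module_name):
        """Try to import the module backing a style, recording any failure."""
        try:
            self.loaded[style] = importlib.import_module(module_name)
        except ImportError as err:
            self.failed[style] = str(err)

    def check_styles(self):
        """Print which styles imported cleanly and which did not."""
        print('Styles that passed import:')
        for style in self.loaded:
            print('-', style)
        print('Styles that failed import:')
        for style, message in self.failed.items():
            print(f'- {style}: {message}')


registry = StyleRegistry()
registry.import_style('json_style', 'json')                   # stdlib module, always present
registry.import_style('missing_style', 'no_such_module_xyz')  # deliberately absent
registry.check_styles()
```

Catching the import error per style means that one style with a missing dependency does not prevent the others from being used.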



### 1.2. potentials.load_record()

load_record() will create an object of a Record subclass based on a given record style.  This can be used to load a single local record or to start creating a new record.

- __style__ (*str*) The record style.
- __name__ (*str, optional*) The name to give to the specific record.
- __model__ (*str, DataModelDict, optional*) Data model content to load for the given record style.
- __\*\*kwargs__ (*any, optional*) Any extra keyword parameter supported by the record style.  Typically, these are attributes of the data that are set directly rather than loading from a model.

In [3]:
# Create a new FAQ record
record = potentials.load_record('FAQ', name='test')

## 2. Python representation

Most of the methods and attributes of the various record styles are designed to access and use the particular data that the record entry represents.  As such, the different record styles only share a few common properties.

- __style__ (*str*) The record style.  
- __name__ (*str*) The name assigned to the record.  This is used to save the record, so it should be unique at least for all records of the same style.
- __set_values(name=None, \*\*kwargs)__ Allows for the values of multiple attributes to be set at the same time.  The allowed kwargs depend on the style.

In [4]:
print('record.style:', record.style)
print('record.name: ', record.name)

record.style: FAQ
record.name:  test


FAQ records only have question and answer values.  These can be set individually through the record.question and record.answer attributes, or both at the same time using record.set_values().

In [5]:
question = "How much wood would a woodchuck chuck if a woodchuck could chuck wood?"
answer = "Six sycamore sticks"

record.set_values(question=question, answer=answer)

## 3. Data model representation

### 3.1. Model operations

- __model__ (*DataModelDict*) The data model contents that were loaded/built.
- __modelroot__ (*str*) The name of the root element in the data model.
- __load_model(model, name=None)__ Loads in new data model contents and updates the Python attributes accordingly.
- __build_model()__ Builds data model content based on the current values of the Python attributes.  Any existing model will be replaced by the newly built contents.
- __reload_model()__ Loads the current value of model and updates the Python attributes accordingly.  Allows for direct changes to be made to the model representation.

In [6]:
print('record.modelroot:', record.modelroot)

record.modelroot: faq


__NOTE__: Because the values were set directly on the object rather than loaded from a data model, no model exists yet!

In [7]:
# Model is not set yet as values were directly set above
try:
    record.model
except Exception as e:
    errortype = type(e).__name__
    print(f'{errortype}: {e}')

AttributeError: model content has not been loaded or built


After build_model() is called, model exists and can be accessed as a DataModelDict or converted to JSON or XML.

In [8]:
record.build_model()

print('Python DataModelDict of the data model:')
print(record.model)
print()

print('JSON of the data model:')
print(record.model.json(indent=4))
print()

print('XML of the data model:')
print(record.model.xml(indent=2))

Python DataModelDict of the data model:
DataModelDict([('faq', DataModelDict([('question', 'How much wood would a woodchuck chuck if a woodchuck could chuck wood?'), ('answer', 'Six sycamore sticks')]))])

JSON of the data model:
{
    "faq": {
        "question": "How much wood would a woodchuck chuck if a woodchuck could chuck wood?",
        "answer": "Six sycamore sticks"
    }
}

XML of the data model:
<?xml version="1.0" encoding="utf-8"?>
<faq>
  <question>How much wood would a woodchuck chuck if a woodchuck could chuck wood?</question>
  <answer>Six sycamore sticks</answer>
</faq>
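
The JSON and XML outputs above carry the same nested content. As a rough illustration of why the two forms are equivalent, the sketch below serializes one plain nested dict both ways using only the standard library. It does not use DataModelDict, and `dict_to_element` is a hypothetical helper that handles only nested dicts of simple values.

```python
import json
import xml.etree.ElementTree as ET


def dict_to_element(tag, value):
    """Recursively convert a nested dict of simple values into an XML element."""
    element = ET.Element(tag)
    if isinstance(value, dict):
        for child_tag, child_value in value.items():
            element.append(dict_to_element(child_tag, child_value))
    else:
        element.text = str(value)
    return element


model = {'faq': {'question': 'Example question?', 'answer': 'Example answer.'}}

# JSON form: the nesting maps directly onto JSON objects
json_text = json.dumps(model, indent=4)

# XML form: the single top-level key becomes the root element
root_tag, root_value = next(iter(model.items()))
xml_text = ET.tostring(dict_to_element(root_tag, root_value), encoding='unicode')

print(json_text)
print(xml_text)
```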


The values can be updated by changing the record attributes, after which build_model() must be called again for model to reflect the changes.

Alternatively, the values can be changed directly in the model, after which reload_model() must be called for the attributes to be updated.

In [9]:
print('record.answer:', record.answer)

# Correct the answer directly in the model
record.model['faq']['answer'] = 'A woodchuck would chuck as much wood as a woodchuck could chuck if a woodchuck could chuck wood.'

# Note that record.answer is currently unchanged
print('record.answer:', record.answer)

# But, after calling reload_model() it is
record.reload_model()
print('record.answer:', record.answer)

record.answer: Six sycamore sticks
record.answer: Six sycamore sticks
record.answer: A woodchuck would chuck as much wood as a woodchuck could chuck if a woodchuck could chuck wood.
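
The behavior shown in this section (no model until one is built, and attributes that only change after a reload) can be mimicked with a small stand-alone class. This is a sketch under the assumption that the attributes and the model are stored separately. `ToyRecord` is hypothetical and is not how the Record classes are actually written.

```python
class ToyRecord:
    """Hypothetical record with one attribute and a lazily built model."""

    def __init__(self, answer):
        self.answer = answer
        self._model = None  # nothing loaded or built yet

    @property
    def model(self):
        if self._model is None:
            raise AttributeError('model content has not been loaded or built')
        return self._model

    def build_model(self):
        """Replace any existing model with one built from the attributes."""
        self._model = {'faq': {'answer': self.answer}}

    def reload_model(self):
        """Copy values from the current model back into the attributes."""
        self.answer = self.model['faq']['answer']


toy = ToyRecord('first answer')
try:
    toy.model
except AttributeError as err:
    print('AttributeError:', err)

toy.build_model()
toy.model['faq']['answer'] = 'edited answer'
print(toy.answer)   # attribute has not been updated yet
toy.reload_model()
print(toy.answer)   # attribute now matches the edited model
```

Keeping the two representations separate is what makes the explicit build and reload calls necessary: neither side is updated automatically when the other changes.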


### 3.2. XSD schema

The XSD schema describes the fields that a record of the given style is allowed to have.

- __xsd_filename__ (*tuple*) The module location and filename for the record style's XSD schema.
- __xsd__ (*bytes*) The contents of the XSD schema file.
- __valid_xml(xml_content=None)__ Returns True or False depending on whether model, or the given XML content, is valid according to the XSD schema file.

In [10]:
print('record.xsd_filename:', record.xsd_filename)
print('record.valid_xml():', record.valid_xml())
print('record.xsd:')
print(record.xsd.decode('UTF-8'))


record.xsd_filename: ('potentials.xsd', 'FAQ.xsd')
record.valid_xml(): True
record.xsd:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" attributeFormDefault="unqualified" elementFormDefault="unqualified">
  <xsd:element name="faq">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="question" type="xsd:string"/>
        <xsd:element name="answer" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>


### 3.3. XSL transformation to HTML

The XSL transformation file converts the XML representation of the model to HTML.

- __xsl_filename__ (*tuple*) The module location and filename for an XSL transformation file that transforms the XML representation of the data model to HTML.
- __xsl__ (*bytes*) The contents of the XSL transformation file.
- __html(render=False)__ Generates HTML content based on model and the XSL transformation file.

In [11]:
print('record.xsl_filename:', record.xsl_filename)
print('record.xsl:')
print(record.xsl.decode('UTF-8'))

record.xsl_filename: ('potentials.xsl', 'FAQ.xsl')
record.xsl:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
  xmlns="http://www.w3.org/TR/xhtml1/strict">
  <xsl:output method="html" encoding="utf-8" indent="yes" />
  
  <xsl:template match="faq">
    <div>
      <b>
        <xsl:text>Question: </xsl:text>
        <xsl:value-of select="question" disable-output-escaping="yes"/>
      </b>
      <br/>
      <xsl:text>Answer: </xsl:text>
      <xsl:value-of select="answer" disable-output-escaping="yes"/>
    </div>
  </xsl:template>
</xsl:stylesheet>


With render=False, html() returns the HTML content as a string.

In [12]:
print(record.html())

<div xmlns="http://www.w3.org/TR/xhtml1/strict"><b>Question: How much wood would a woodchuck chuck if a woodchuck could chuck wood?</b><br/>Answer: A woodchuck would chuck as much wood as a woodchuck could chuck if a woodchuck could chuck wood.</div>


With render=True, html() renders the HTML content directly when called in a Jupyter notebook.

In [13]:
record.html(render=True)

### 3.4. Query methods

Query methods are defined for each record that convert simple function inputs into database-specific queries.  Ideally, these query methods and the pandasfilter() method discussed below should identify the same matching records for the same kwarg inputs.

- __mongoquery(\*\*kwargs)__ Builds a mongo-style query based on the given kwargs values for style-specific attributes.
- __cdcsquery(\*\*kwargs)__ Builds a mongo-style query for CDCS databases based on the given kwargs values for style-specific attributes.

In [14]:
# For FAQs, the query built will search for question and/or answer fields containing specified strings
record.mongoquery(question="woodchuck")

{'$and': [{'content.faq.question': {'$regex': 'woodchuck'}}]}
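
To make the shape of this query concrete, the sketch below builds a dictionary of the same form and evaluates it against a nested dict with Python's `re` module. Both `build_contains_query` and `matches` are hypothetical helpers written for this illustration. They cover only the `$and` and `$regex` pieces seen in the output above and are not a substitute for a real MongoDB query engine.

```python
import re


def build_contains_query(**kwargs):
    """Build an '$and' list of '$regex' conditions, one per keyword given."""
    conditions = [
        {f'content.faq.{field}': {'$regex': pattern}}
        for field, pattern in kwargs.items()
        if pattern is not None
    ]
    return {'$and': conditions}


def matches(document, query):
    """Evaluate the tiny query subset above against a nested dict."""
    for condition in query['$and']:
        for path, test in condition.items():
            # Walk the dotted path down into the nested document
            value = document
            for key in path.split('.'):
                value = value[key]
            if re.search(test['$regex'], value) is None:
                return False
    return True


query = build_contains_query(question='woodchuck')
doc = {'content': {'faq': {'question': 'How much wood would a woodchuck chuck?',
                           'answer': 'Plenty.'}}}
print(query)
print(matches(doc, query))
```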

## 4. Metadata representation

The Record classes can only generate a metadata dictionary based on the current attributes and cannot load information in this form.  This is because the metadata representation is purposefully limited to simple data types and therefore may not contain all of the record's information.  Limiting the metadata output to simple data types ensures that it can be easily parsed and compared across different records.

- __metadata()__ Returns the metadata dictionary for the record.
- __pandasfilter(dataframe, \*\*kwargs)__ Generates a boolean index for a given pandas.DataFrame of metadata fields based on the given kwargs values for style-specific attributes.  Ideally, this should identify the same matching records as the query methods discussed above do for the same kwarg inputs.

In [15]:
record.metadata()

{'name': 'test',
 'question': 'How much wood would a woodchuck chuck if a woodchuck could chuck wood?',
 'answer': 'A woodchuck would chuck as much wood as a woodchuck could chuck if a woodchuck could chuck wood.'}

Build a single-row DataFrame based on the record.

In [16]:
df = pd.DataFrame([record.metadata()])

In [17]:
record.pandasfilter(df, question='woodchuck')

0    True
dtype: bool

In [18]:
record.pandasfilter(df, question='bull moose')

0    False
dtype: bool