# Translate the AM Bench metadata in Excel tables to XML documents

The goal of this notebook is to generate XML documents describing AM Bench 2022 series data.

The AM Bench XML schema implements a data model that describes the Additive Manufacturing Benchmark 2022 series data. The data model provides a robust set of metadata for the build processes, their resulting specimens, and the measurements made on them in the context of the AM Bench 2022 project. Project scientists enter the metadata in Excel spreadsheets. This notebook translates the metadata in those Excel files into XML documents compliant with the schema, using the Python scripts defined in <code>ambench.mapping</code>. Both the schema and the XML documents are uploaded to a <b>private</b> AM Bench CDCS database instance (henceforth called CDCS) using the REST API provided by <code>pycdcs</code> (https://github.com/usnistgov/pycdcs). This is described in <code>XML Schema to Template Loader.ipynb</code>.
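
As a rough illustration of the translation step, the sketch below turns one spreadsheet row (represented as a plain dict) into a small XML string using only the standard library. The helper `row_to_xml`, the root tag `ExampleSpecimen`, and the column names are all hypothetical; the real mapping is done by `ambench.mapping` and follows the AM Bench schema, which this example does not attempt to reproduce.

```python
import xml.etree.ElementTree as ET

def row_to_xml(root_tag, row):
    """Build an XML string from one spreadsheet row given as {column: value}.

    Hypothetical helper for illustration only; not part of ambench.mapping.
    """
    root = ET.Element(root_tag)
    for column, value in row.items():
        if value is None:
            continue  # skip empty cells rather than emit empty elements
        child = ET.SubElement(root, column)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")

# Hypothetical row; real column names come from the project spreadsheets.
example_row = {"name": "specimen-001", "material": "alloy-A", "notes": None}
xml_text = row_to_xml("ExampleSpecimen", example_row)
print(xml_text)
```

A real mapper also has to honor element order, types, and links required by the schema, which is why the generated files are validated in a later step.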

This notebook covers the following:

* Generating XML files.
* Validating the generated XML files against the XML schema.
* Optionally uploading the XML files to CDCS.

Generating AM Bench XML files requires querying the CDCS database (see <b>Persistent Identifier</b> below). Anyone can query the public CDCS AM Bench database as an anonymous user, so anyone can generate the XML documents.

_Uploading documents to the private CDCS database requires permission._ To request permission, please contact Lyle E. Levine (lyle.levine@nist.gov).


# Persistent Identifier

Every AM Bench XML document has a unique identifier called a _Persistent Identifier_ (<b>PID</b>). CDCS assigns the PID when the document is first uploaded, and the PID stays the same afterwards even if the document's content changes. Generating XML documents therefore requires querying the CDCS AM Bench database to check whether each document already exists in CDCS; if it does, its existing PID is assigned to the updated document.
The PIDs can be obtained from the public CDCS by using the REST API provided by <code>pycdcs</code>.
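
The PID reuse rule can be sketched with plain dictionaries. The sketch assumes that documents are matched by title and that the existing PIDs have already been fetched into a dict; both the matching key and the helper name `assign_pids` are assumptions, and the actual CDCS query is left out.

```python
def assign_pids(generated_titles, existing_pids):
    """Pair each generated document title with its existing PID, if any.

    generated_titles: iterable of document titles produced in this run.
    existing_pids: dict {title: pid} obtained beforehand from a CDCS query
                   (how the query is made is outside this sketch).
    Returns {title: (pid_or_None, is_new)}.
    """
    assignments = {}
    for title in generated_titles:
        pid = existing_pids.get(title)
        # A document without a PID has never been uploaded, so it is new.
        assignments[title] = (pid, pid is None)
    return assignments

existing = {"doc-A.xml": "pid-0001"}  # hypothetical query result
result = assign_pids(["doc-A.xml", "doc-B.xml"], existing)
print(result)
```

Here `doc-A.xml` keeps its existing PID, while `doc-B.xml` has none yet and would receive one from CDCS on its first upload.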


# Relationship between AM Bench XML files

The AM Bench XML documents depend on one another through hyperlinks that refer from one XML file to another. For example, an XML document for a measurement refers to the XML documents describing the specimens used in that measurement, and an XML document for a specimen may refer to the document for its source build part or its source material. The XML files must therefore be generated in the proper order by document type, as given in <code>ALL_DOC_TYPES</code> below in this notebook.
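
One way to respect this ordering when only some document types are requested is to sort the request by position in `ALL_DOC_TYPES`. The list below is copied from the cell later in this notebook; the helper `in_generation_order` is a hypothetical addition, not part of the notebook or of `ambench`.

```python
ALL_DOC_TYPES = ['Material', 'AMPowder', 'AMBuildPlate', 'AMBuildPart', 'AMBSpecimen',
                 'AMComposition', 'AMLaserAbsorptivity', 'AMRadiography', 'AMMechanicalTesting']

def in_generation_order(requested):
    """Return the requested document types sorted by their position in ALL_DOC_TYPES."""
    unknown = [t for t in requested if t not in ALL_DOC_TYPES]
    if unknown:
        raise ValueError(f"unknown document types: {unknown}")
    # set() drops duplicates; the list index defines the dependency order.
    return sorted(set(requested), key=ALL_DOC_TYPES.index)

ordered = in_generation_order(['AMRadiography', 'Material', 'AMBSpecimen'])
print(ordered)
```

Sorting this way ensures that a document type is never generated before the types it may link to, provided the order in `ALL_DOC_TYPES` is itself correct.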

# Steps for generating and uploading XML files

Before running this notebook:
* Create a configuration file in JSON format. An example is given in <code>default_config.json</code>.

0. Import Python libraries (Step 0).
1. Instantiate the <code>__CONFIG</code> class (Step 1).
2. Connect to CDCS (Step 2).
3. Create the XML validator (Step 3).
4. Define the functions used in this notebook (Step 4). The function <code>map_them</code> acts as the application controller.

5. Generate, validate, and upload XML files (Step 5).
    1. Select the document types to generate and pass them as the argument <code>DOC_TYPES2LOAD</code> of <code>map_them</code>.
    2. Once the XML files are generated, validate them using <code>validate_amdocs</code>, which is called in <code>map_them</code>.
    3. Optionally, upload the XML files to CDCS.

## Step 0. Import Python modules
Import the Python modules required to run this notebook.

In [None]:
import sys
import os
import io
import pprint
import lxml.etree as ET
import xml.dom.minidom
import importlib
import glob
import json
import string
import datetime
import pandas
import getpass
import matplotlib.pyplot as plt
import requests
from PIL import Image, ImageDraw, ImageFont

In [None]:
# Do not require documents to be valid while they are being generated.
# Instead, validate them after all documents have been generated.

import pyxb
pyxb.RequireValidWhenGenerating(False);

## Step 1. Instantiate __CONFIG class 
- To run this notebook, create your own configuration file in JSON format. See the example given in <code>default_config.json</code>.
- Pass the path of your JSON file as the argument of the constructor of the <code>__CONFIG</code> class defined in <code>config.py</code>. If no argument is passed to the constructor, <code>default_config.json</code> is used.
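
Before instantiating the class, it can help to check that a configuration file defines everything this notebook reads from `CONFIG`. The sketch below assumes the JSON keys have the same names as the `CONFIG` attributes used in this notebook, which should be confirmed against `default_config.json`; the helper `missing_keys` and all values shown are placeholders.

```python
import json
import os
import tempfile

# Attribute names read from CONFIG elsewhere in this notebook; whether the JSON
# file uses exactly these keys is an assumption (check default_config.json).
REQUIRED_KEYS = ["USER", "PASS", "pyUTILS_path", "TEMPLATE",
                 "AMBENCH_URL", "XSD", "ROOT_SCHEMA", "LOADING"]

def missing_keys(path):
    """Return the required keys that are absent from the JSON file at `path`."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    return [k for k in REQUIRED_KEYS if k not in data]

# Placeholder values only; none of these are real paths or URLs.
# "LOADING" is left out on purpose to show what the check reports.
example = {"USER": None, "PASS": None, "pyUTILS_path": "<path to Python utilities>",
           "TEMPLATE": "<template name>", "AMBENCH_URL": "<CDCS URL>",
           "XSD": "<schema folder>/", "ROOT_SCHEMA": "<root schema file>.xsd"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "your_config.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(example, fh, indent=2)
    missing = missing_keys(path)

print(missing)
```

Note that the notebook builds the schema path as `f'{CONFIG.XSD}{CONFIG.ROOT_SCHEMA}'`, so the `XSD` value presumably needs to end with a path separator.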

In [None]:
# Import config and instantiate __CONFIG class.
import config
from config import __CONFIG

CONFIG = __CONFIG(conf_json = "./your_config.json")


In [None]:
# If USER or PASS is null in your configuration, you are prompted to enter it interactively.
# To connect as an anonymous user, leave the prompts empty when you run this cell.

if CONFIG.USER is None:
    CONFIG.USER = input('username: ')
if CONFIG.PASS is None:
    CONFIG.PASS = getpass.getpass('enter password ')

AUTH=(CONFIG.USER, CONFIG.PASS)    


In [None]:
# Include the directory path for the required Python modules.

sys.path.insert(0, CONFIG.pyUTILS_path)
import ambench.cdcs_utils
from ambench.cdcs_utils import AMBench2022, xmlschema
from ambench.mapping import new_mapper


## Step 2. Create AMBench2022 instance
* AMBench2022 is a wrapper class whose base class is <code>CDCS</code> from <code>pycdcs</code>.

In [None]:
ambench2022=AMBench2022(CONFIG.TEMPLATE,CONFIG.AMBENCH_URL,auth=AUTH)

## Step 3. Create XML Validator 
Create an XML validator from the schema files using <code>xmlschema</code>.

In [None]:
# xsd_filename is the file path of the root schema (ROOT_SCHEMA)

xsd_filename=f'{CONFIG.XSD}{CONFIG.ROOT_SCHEMA}'
VALIDATOR=xmlschema.XMLSchema(xsd_filename,build=False)
VALIDATOR.build()
VALIDATOR.validity 

## Step 4. Define functions 

In [None]:
def validate_amdocs(amdocs):
    '''
    Validate generated XML files against the schema.
    
    amdocs: a dict mapping each generated XML file to a flag indicating 
            whether the XML file is new or not.

    Returns True if every file is valid; otherwise prints the first
    validation error and re-raises it.
    '''
    for xmlfile in amdocs:
        if not VALIDATOR.is_valid(xmlfile):
            try:
                # validate() raises with a detailed message for the invalid file
                VALIDATOR.validate(xmlfile)
            except Exception as e:
                print(xmlfile,"\n",e,"\n=====\n")
                raise
    return True

In [None]:
def load_amdocs_cdcs(amdocs):
    '''
    Load XML documents to CDCS.
    
    amdocs: a dict of XML files and their flags indicating whether each XML file is new or not.
    '''
    uploaded={}
    errors={}
    for f,is_new in amdocs.items():
        fn=os.path.basename(f)
        if is_new:
            print('upload new:', f)
            response=ambench2022.upload_data(f)
        else:
            print('update existing:', f)
            response=ambench2022.update_data(f)

        if response.ok:
            uploaded[fn]=response.json()    
        else:
            errors[fn]=response.json()
    return uploaded,errors
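
The two dicts returned by `load_amdocs_cdcs` can be condensed into a short report so that failed uploads are easy to spot. The helper `summarize_upload` is a hypothetical addition, and the example values are made up; the real values are whatever JSON bodies CDCS returns.

```python
def summarize_upload(uploaded, errors):
    """Return a short text report for the two dicts returned by load_amdocs_cdcs."""
    lines = [f"uploaded: {len(uploaded)}, failed: {len(errors)}"]
    for filename, detail in sorted(errors.items()):
        lines.append(f"  FAILED {filename}: {detail}")
    return "\n".join(lines)

# Hypothetical results; real values are the JSON bodies returned by CDCS.
report = summarize_upload({"a.xml": {"id": 1}}, {"b.xml": {"message": "rejected"}})
print(report)
```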

In [None]:
# The list of all document types.
# XML files are generated by document type in the order in which the
# document types appear in this list.
# More document types will be added here as they become available.
ALL_DOC_TYPES=['Material','AMPowder','AMBuildPlate','AMBuildPart','AMBSpecimen',
               'AMComposition', 'AMLaserAbsorptivity', 'AMRadiography',  'AMMechanicalTesting']

In [None]:
def map_them(DOC_TYPES2LOAD,doLoad=False,breakOnError=True,folder='TEMP'):
    '''
    Generate and validate XML files. If doLoad is True, upload them to the CDCS AM Bench database. 
    
    DOC_TYPES2LOAD: List of document types to generate (and optionally upload).
    doLoad: Whether to upload the generated documents to the CDCS instance.
    breakOnError: Whether to stop processing the remaining document types if an error occurs.
    folder: Name of the local folder where the generated documents are staged. The full folder 
            path depends on the document type; see <outfolder> below.
    '''
    all_handled={}
    for DOC_TYPE in DOC_TYPES2LOAD:
        print("DOC_TYPE:",DOC_TYPE)
        
        outfolder=f"{CONFIG.LOADING}/{folder}/{DOC_TYPE}/"
        os.makedirs(outfolder,exist_ok=True)
        
        MAPPER=new_mapper(ambench2022, DOC_TYPE, CONFIG)
        try:
            amdocs=MAPPER.map_from_excel(outfolder,verbose=True) 
            # validate_amdocs raises on the first invalid document.
            ok = validate_amdocs(amdocs)
            if doLoad and ok:
                uploaded,errors=load_amdocs_cdcs(amdocs)
                # Report failed uploads instead of silently discarding them.
                for fn,detail in errors.items():
                    print('upload failed:',fn,detail)
            all_handled[DOC_TYPE]=amdocs
        except Exception as e:
            print(e)
            if breakOnError:
                break
    return all_handled

## Step 5. Generate XML files
Generate XML files for the document types specified in <code>doc_types</code>. Optionally, upload the generated files to the private CDCS.

In [None]:
%%time
# Specify the list of AM Bench document types to map
doc_types=ALL_DOC_TYPES

r=map_them(doc_types,doLoad=False)
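
The value returned by `map_them` maps each document type to its `amdocs` dict (file path to "is new" flag). A quick way to inspect a run is to count new and existing documents per type. The helper `summarize_handled` is hypothetical, and the example data stands in for `r` with made-up file names.

```python
def summarize_handled(all_handled):
    """Return {doc_type: (n_new, n_existing)} from the dict returned by map_them."""
    summary = {}
    for doc_type, amdocs in all_handled.items():
        n_new = sum(1 for is_new in amdocs.values() if is_new)
        summary[doc_type] = (n_new, len(amdocs) - n_new)
    return summary

# Hypothetical stand-in for `r`; file names are made up.
example_r = {"Material": {"m1.xml": True, "m2.xml": False}, "AMPowder": {"p1.xml": True}}
summary = summarize_handled(example_r)
print(summary)
```

Document types that are missing from the summary either failed during generation or validation, or were skipped because `breakOnError` stopped the loop.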