# DOC_TYPE: Mapping Excel to XML

The goal of this notebook is to generate XML documents for AMDoc types from the metadata entered in Excel and, optionally, to upload them to the <b>private</b> CDCS AM Bench project site. CDCS assigns a persistent ID (<b>PID</b>) to an XML document the first time the document is uploaded. Afterwards, the PID of the document remains the same regardless of whether its content changes.

To generate XML documents only, you can use the public CDCS site as an anonymous user. Uploading documents requires permission. To request it, please contact Lyle E. Levine (lyle.levine@nist.gov).

In [None]:
import sys
import os
import io
import pprint
import lxml.etree as ET
import xml.dom.minidom
import importlib
import glob
import json
import string
import datetime
import pandas
import getpass
import matplotlib.pyplot as plt
import requests
from PIL import Image, ImageDraw, ImageFont

In [None]:
# Do not require XML documents to be valid while they are being generated.
# Validate them after all documents are generated instead.

import pyxb
pyxb.RequireValidWhenGenerating(False);

# Instantiate __CONFIG class
- To run this notebook, create your own configuration file in JSON format. See the example given in default_config.json.
- Pass the path of your JSON file as the argument to the constructor of the <code>__CONFIG</code> class defined in config.py. If no argument is passed to the constructor, default_config.json is used.
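
Before instantiating the class, it can help to check that your JSON file defines every setting this notebook reads from `CONFIG`. The sketch below is only an illustration: it assumes the JSON keys carry the same names as the `CONFIG` attributes used in this notebook, which should be confirmed against default_config.json. The helper `missing_keys` and all values in `sample` are hypothetical placeholders.

```python
import json

# Attributes of CONFIG that this notebook reads. That the JSON keys use
# exactly these names is an assumption; check default_config.json.
REQUIRED = ['USER', 'PASS', 'pyUTILS_path', 'TEMPLATE', 'AMBENCH_URL',
            'XSD', 'ROOT_SCHEMA', 'LOADING']

def missing_keys(conf_text):
    '''Return the required keys absent from a JSON configuration string (hypothetical helper).'''
    conf = json.loads(conf_text)
    return [key for key in REQUIRED if key not in conf]

# Placeholder configuration. USER and PASS are null so that the notebook
# prompts for them interactively. Every value here is made up.
sample = json.dumps({
    'USER': None,
    'PASS': None,
    'pyUTILS_path': './placeholder-utils/',
    'TEMPLATE': 'placeholder-template',
    'AMBENCH_URL': 'https://example.org/',
    'XSD': './placeholder-xsd/',
    'ROOT_SCHEMA': 'placeholder.xsd',
    'LOADING': './placeholder-loading',
})

print(missing_keys(sample))
```

An empty list means that no expected key is missing. A key whose value is null still counts as present.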


In [None]:
# Import config and instantiate __CONFIG class.
import config
from config import __CONFIG

CONFIG = __CONFIG(conf_json = "./myconfig-sciserver.json")


In [None]:
# If USER or PASS is null in your configuration settings, enter it interactively.
# For an anonymous user, enter nothing.

if CONFIG.USER is None:
    CONFIG.USER = input('username: ')
if CONFIG.PASS is None:
    CONFIG.PASS = getpass.getpass('enter password ')

AUTH=(CONFIG.USER, CONFIG.PASS)    


In [None]:
# Include the directory path for the required Python modules.

sys.path.insert(0, CONFIG.pyUTILS_path)
import ambench.cdcs_utils
from ambench.cdcs_utils import AMBench2022, xmlschema
from ambench.mapping import new_mapper


# Create AMBench2022 instance
* AMBench2022 is a wrapper class whose base class is CDCS from pycdcs. It adds methods for querying and uploading XML schemas and documents in the CDCS instance.

In [None]:
ambench2022=AMBench2022(CONFIG.TEMPLATE,CONFIG.AMBENCH_URL,auth=AUTH)

# Create XML Validator from in-memory schema files using xmlschema.

In [None]:
# xsd_filename is the file path of ROOT_SCHEMA.

xsd_filename=f'{CONFIG.XSD}{CONFIG.ROOT_SCHEMA}'
VALIDATOR=xmlschema.XMLSchema(xsd_filename,build=False)
VALIDATOR.build()
VALIDATOR.validity 

# Generate, Validate and Load XML files
* Use the function <code>map_them</code> to map the metadata for samples or measurements from the AM Bench project, entered in Excel spreadsheets, to XML files.
    * Select the types of samples or measurements to map and pass them as the argument <code>DOC_TYPES2LOAD</code> of <code>map_them</code>.
    * There are dependencies between the XML documents because hyperlinks refer from one document to another. For example, measurement documents refer to the documents describing the specimens used in those measurements, and a specimen document may refer to the document for a source build part, a material, and so on. Therefore, the documents must be generated in the proper order by type, as given in <code>ALL_DOC_TYPES</code> below.
* Validate the XML documents using <code>validate_amdocs</code>.
* Upload them using <code>load_amdocs_cdcs</code>.
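
If you map only a subset of document types, one way to keep the dependency order is to sort the selection by its position in the full list. The following is a minimal sketch; `in_dependency_order` is a hypothetical helper, not part of the ambench package, and the list is copied from `ALL_DOC_TYPES` in this notebook.

```python
# Copy of the dependency order used by this notebook.
ALL_DOC_TYPES = ['Material', 'AMPowder', 'AMBuildPlate', 'AMBuildPart', 'AMBSpecimen',
                 'AMComposition', 'AMLaserAbsorptivity', 'AMRadiography', 'AMMechanicalTesting']

def in_dependency_order(selected):
    '''Return the selected doc types sorted into ALL_DOC_TYPES order (hypothetical helper).'''
    unknown = [t for t in selected if t not in ALL_DOC_TYPES]
    if unknown:
        raise ValueError(f"unknown document types: {unknown}")
    # Duplicates are dropped; the position in ALL_DOC_TYPES is the sort key.
    return sorted(set(selected), key=ALL_DOC_TYPES.index)

ordered = in_dependency_order(['AMRadiography', 'Material', 'AMBSpecimen'])
print(ordered)  # ['Material', 'AMBSpecimen', 'AMRadiography']
```

Note that sorting a subset only fixes the order. It does not add the document types that the selected ones depend on, so those must already exist or be included in the selection.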

In [None]:
def validate_amdocs(amdocs):
    '''
    Validate generated XML files.
    
    amdocs: a dict mapping each generated XML file to a flag indicating
            whether the XML file is new or not.
    '''
    for xmlfile,is_new in amdocs.items():
        v=VALIDATOR.is_valid(xmlfile)
        if not(v):
            try:
                VALIDATOR.validate(xmlfile)
            except Exception as e:
                print(xmlfile,"\n",e,"\n=====\n")
                raise e
    return True

In [None]:
def load_amdocs_cdcs(amdocs):
    '''
    Load XML documents to CDCS.
    
    amdocs: a dict of XML files and their flags indicating whether each XML file is new or not.
    '''
    uploaded={}
    errors={}
    for f,is_new in amdocs.items():
        fn=os.path.basename(f)
        if is_new:
            print('upload new:', f)
            response=ambench2022.upload_data(f)
        else:
            print('update existing:', f)
            response=ambench2022.update_data(f)

        if response.ok:
            uploaded[fn]=response.json()    
        else:
            errors[fn]=response.json()
    return uploaded,errors

In [None]:
# The order of loading doc types.
ALL_DOC_TYPES=['Material','AMPowder','AMBuildPlate','AMBuildPart','AMBSpecimen',
               'AMComposition', 'AMLaserAbsorptivity', 'AMRadiography',  'AMMechanicalTesting']

def map_them(DOC_TYPES2LOAD,doLoad=False,breakOnError=True,folder='TEMP'):
    '''
    Generate and validate XML files. If doLoad is True, upload them to the CDCS AM Bench database.
    
    DOC_TYPES2LOAD: List of document types to generate and upload.
    doLoad: Whether to upload the generated documents to the CDCS instance.
    breakOnError: Whether to break out of the loop if an error occurs while handling a document type.
    folder: Name of the folder in which to stage the generated documents on the local file system.
            The full folder path depends on the document type; see <outfolder> below.
    '''
    all_handled={}
    for DOC_TYPE in DOC_TYPES2LOAD:
        print("DOC_TYPE:",DOC_TYPE)
        
        outfolder=f"{CONFIG.LOADING}/{folder}/{DOC_TYPE}/"
        os.makedirs(outfolder,exist_ok=True)
        
        MAPPER=new_mapper(ambench2022, DOC_TYPE, CONFIG)
        try :
            amdocs=MAPPER.map_from_excel(outfolder,verbose=True) 
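            # validate_amdocs raises on the first invalid document, so ok is True here.
            ok = validate_amdocs(amdocs)
            if doLoad and ok:
                uploaded,errors=load_amdocs_cdcs(amdocs)
                # Report failed uploads instead of silently discarding them.
                if errors:
                    print("upload errors:",errors)
            all_handled[DOC_TYPE]=amdocs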
        except Exception as e:
            print(e)
            if breakOnError:
                break
    return all_handled

In [None]:
%%time
# Specify the list of AM Bench document types to map.
doc_types=ALL_DOC_TYPES

r=map_them(doc_types,doLoad=False)
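
The value returned by `map_them` maps each document type to its dict of generated files and new-or-existing flags. A quick way to inspect it is to count new and existing documents per type. The sketch below assumes the flags are truthy for new documents; `summarize` is a hypothetical helper, and the sample file names are made up so that the block runs on its own.

```python
def summarize(all_handled):
    '''Count new and existing documents per doc type (hypothetical helper).'''
    summary = {}
    for doc_type, amdocs in all_handled.items():
        n_new = sum(1 for is_new in amdocs.values() if is_new)
        summary[doc_type] = {'new': n_new, 'existing': len(amdocs) - n_new}
    return summary

# Made-up result with the same shape as the return value of map_them.
sample = {
    'Material': {'TEMP/Material/a.xml': True, 'TEMP/Material/b.xml': False},
    'AMPowder': {'TEMP/AMPowder/c.xml': True},
}

summary = summarize(sample)
for doc_type, counts in summary.items():
    print(doc_type, counts)
```

In the notebook itself you would call `summarize(r)` on the result of the cell above.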