# Lecture 11
## Input File Interlude

### Wednesday, October 11th 2017

# Input Files and Parsing

We usually want to read data into our software:
* Input parameters to the code (e.g. time step, linear algebra solvers, physical parameters, etc.)
* Input fields (e.g. fields to visualize)
* Calibration data
* $\vdots$

This data can be provided by us, or the client, or come from a database somewhere.

There are *many* ways of reading in and parsing data.  In fact, this is often a non-trivial exercise depending on the quality of the data as well as its size.

Our immediate concern will be with how to read chemical reaction information into our chemical kinetics code.

Many kinetics codes read reaction information in from files in `.xml` format.

## XML Intro

```xml
<?xml version="1.0"?>

<ctml>

    <reactionData id="test_mechanism">

        <!-- reaction 01  -->
        <reaction reversible="yes" type="Elementary" id="reaction01">
            <equation>H + O2 [=] OH + O</equation>
            <rateCoeff>
                <Kooij>
                    <A units="cm3/mol/s">3.52e+16</A>
                    <b>-0.7</b>
                    <E units="kJ/mol">71.4</E>
                </Kooij>
            </rateCoeff>
            <reactants>H:1 O2:1</reactants>
            <products>OH:1 O:1</products>
        </reaction>

        <!-- reaction 02 -->
        <reaction reversible="yes" type="Elementary" id="reaction02">
            <equation>H2 + O [=] OH + H</equation>
            <rateCoeff>
                <Kooij>
                    <A units="cm3/mol/s">5.06e+4</A>
                    <b>2.7</b>
                    <E units="kJ/mol">26.3</E>
                </Kooij>
            </rateCoeff>
            <reactants>H2:1 O:1</reactants>
            <products>OH:1 H:1</products>
        </reaction>

    </reactionData>

</ctml>
```

## What is XML?

**Note:** The material presented here is taken from the following sources:
* [w3schools XML tutorial](https://www.w3schools.com/xml/default.asp)
* [`Python` `xml.etree.ElementTree` documentation](https://docs.python.org/3.6/library/xml.etree.elementtree.html?highlight=xml%20etree)
* [`XML` Documentation](https://www.w3.org/TR/2008/REC-xml-20081126/)
* [`XML` Wikipedia Page](https://en.wikipedia.org/wiki/XML)

Some basic facts about `XML`:
* XML stands for `Extensible Markup Language`
* XML is just information wrapped in tags
* It doesn't *do* anything per se
* Its format is both machine- and human-readable

## What is our business with `XML`?

We need to know enough about `XML` to be able to read in chemical reactions to our chemical kinetics library.

To accomplish this, we must know a little bit about the structure of `XML` and what `Python` libraries are out there to help us actually do the parsing.

## Some Basic `XML` Anatomy

```xml
<?xml version="1.0" encoding="UTF-8"?> <!-- This is the optional XML prolog; if present, it must come first -->
<!-- This is an XML comment -->

<dogshelter> <!-- This is the root element -->
    <dog id="dog1"> <!-- This is the first child element.
                         It has an `id` attribute -->
        <name> Cloe </name> <!-- First subchild element -->
        <age> 3 </age> <!-- Second subchild element -->
        <breed> Border Collie </breed>
        <playgroup> Yes </playgroup>
    </dog>
    <dog id="dog2"> 
        <name> Karl </name> 
        <age> 7 </age>
        <breed> Beagle </breed>
        <playgroup> Yes </playgroup>
    </dog>
</dogshelter>
```

Note that every `XML` element must be closed, either with a matching closing tag or, for an empty element, with the self-closing form `<tag/>`.
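
A parser will reject a document that breaks this rule. As a minimal sketch using `Python`'s standard `xml.etree.ElementTree` module (the `XML` strings and the helper `is_well_formed` are made up for illustration), `ET.fromstring` raises `ET.ParseError` when a tag is left unclosed:

```python
import xml.etree.ElementTree as ET

well_formed = "<dog><name>Cloe</name></dog>"
self_closing = "<dog><playgroup/></dog>"  # shorthand for <playgroup></playgroup>
missing_close = "<dog><name>Cloe</dog>"   # <name> is never closed


def is_well_formed(xml_text):
    """Return True if xml_text parses, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(well_formed))    # True
print(is_well_formed(self_closing))   # True
print(is_well_formed(missing_close))  # False
```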

## Some More Basic `XML` Anatomy
See [w3schools XML tutorial](https://www.w3schools.com/xml/default.asp) for a very nice summary of the essential `XML` rules.

`XML` elements:  a few things to be aware of:
* Elements can contain text, attributes, and other elements
* `XML` names are case sensitive and cannot contain spaces
* Be consistent in your naming convention

`XML` attributes:  a few things to be aware of:
* `XML` attribute values must always be in quotes
* There are no rules about when to use elements or attributes
  - You could make an attribute an element and it might read better
* Rule of thumb:  Data should be stored as elements.  Metadata should be stored as attributes.
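
To make the element-versus-attribute choice concrete, here is a small sketch using `Python`'s standard `xml.etree.ElementTree` module. Both `XML` strings are hypothetical variations on the dog shelter example (including the `units` attribute); they only illustrate how each form is read back, and how case sensitivity shows up in practice.

```python
import xml.etree.ElementTree as ET

# Two hypothetical ways to record the same age; neither is from the lecture's files.
as_attribute = '<dog id="dog1" age="3"><name>Cloe</name></dog>'
as_element = '<dog id="dog1"><name>Cloe</name><age units="years">3</age></dog>'

dog_a = ET.fromstring(as_attribute)
dog_e = ET.fromstring(as_element)

# Attribute values are always strings and are read with get().
age_from_attribute = int(dog_a.get('age'))

# Element content is read through .text, and the element can carry its own metadata.
age_elem = dog_e.find('age')
age_from_element = int(age_elem.text)
units = age_elem.get('units')

print(age_from_attribute, age_from_element, units)  # 3 3 years

# Names are case sensitive: there is a <name> element but no <Name> element.
print(dog_e.find('Name'))  # None
```

Storing the age as an element leaves room for metadata such as units, which is one reason the rule of thumb favors elements for data.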

## Python and `XML`
We will use the `xml.etree.ElementTree` module (imported below as `ET`) to read in and parse `XML` input files in `Python`.

A very nice tutorial can be found in the 
[`Python` `ElementTree` documentation](https://docs.python.org/3.6/library/xml.etree.elementtree.html?highlight=xml%20etree).

We'll work with the `shelterdogs.xml` file to start.

In [1]:
import xml.etree.ElementTree as ET
tree = ET.parse('shelterdogs.xml')
dogshelter = tree.getroot()


print(dogshelter)
print(dogshelter.tag)
print(dogshelter.attrib)

<Element 'dogshelter' at 0x103caf728>
dogshelter
{}
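
For quick experiments it can be convenient to parse `XML` from a string rather than from a file. The sketch below uses an abbreviated copy of the dog shelter example; `ET.fromstring` returns the root `Element` directly, whereas `ET.parse` returns an `ElementTree` object whose root is obtained with `getroot()`.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the dog shelter example, embedded as a string.
xml_text = """<dogshelter>
    <dog id="dog1"><name> Cloe </name></dog>
    <dog id="dog2"><name> Karl </name></dog>
</dogshelter>"""

root = ET.fromstring(xml_text)  # already the root Element; no getroot() needed

print(root.tag)   # dogshelter
print(len(root))  # 2, the number of direct children
```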


### Looping Over Child Elements

In [2]:
for child in dogshelter:
    print(child.tag, child.attrib)

dog {'id': 'dog1'}
dog {'id': 'dog2'}


### Accessing Children by Index

In [3]:
print(dogshelter[0][0].text)

 Cloe 


In [4]:
print(dogshelter[1][0].text)

 Karl 


In [5]:
print(dogshelter[0][2].text)

 Border Collie 
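
The outputs above keep the spaces that surround the text inside each tag, because `.text` returns the element content exactly as written. A minimal sketch of cleaning this up, using a shortened version of the first `dog` element:

```python
import xml.etree.ElementTree as ET

dog = ET.fromstring('<dog id="dog1"><name> Cloe </name><age> 3 </age></dog>')

raw_name = dog[0].text
clean_name = raw_name.strip()  # drop leading and trailing whitespace

print(repr(raw_name))    # ' Cloe '
print(repr(clean_name))  # 'Cloe'

# int() and float() ignore surrounding whitespace, so numeric fields need no strip().
age = int(dog[1].text)
print(age)  # 3
```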


### The `Element.iter()` Method
From the documentation:
> Creates a tree iterator with the current element as the root. The iterator iterates over this element and all elements below it, in document (depth first) order. 

In [6]:
for age in dogshelter.iter('age'):
    print(age.text)

 3 
 7 


### The `Element.findall()` Method
From the documentation:
> Finds all matching subelements, by tag name or path. Returns a list containing all matching elements in document order.

In [7]:
print(dogshelter.findall('dog'))

[<Element 'dog' at 0x103caf908>, <Element 'dog' at 0x103d5c138>]
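
One difference between the two methods is worth seeing side by side: `findall()` with a bare tag name searches only the direct children of the element, while `iter()` searches the whole subtree. The sketch below uses an abbreviated copy of the dog shelter example and shows that a path is needed for `findall()` to reach the `age` elements.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<dogshelter>
    <dog id="dog1"><name> Cloe </name><age> 3 </age></dog>
    <dog id="dog2"><name> Karl </name><age> 7 </age></dog>
</dogshelter>""")

# A bare tag matches direct children only, and <age> is a grandchild of the root.
direct = root.findall('age')

# A path reaches deeper: 'dog/age' goes one level down, './/age' matches at any depth.
by_path = [a.text.strip() for a in root.findall('dog/age')]
any_depth = [a.text.strip() for a in root.findall('.//age')]

# iter() walks the whole subtree in document order.
by_iter = [a.text.strip() for a in root.iter('age')]

print(direct)                       # []
print(by_path, any_depth, by_iter)  # ['3', '7'] ['3', '7'] ['3', '7']
```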


In [8]:
for dog in dogshelter.findall('dog'): # Iterate over each child
    print('ID:  {}'.format(dog.get('id'))) # Use the get() method to get the attribute of the child
    print('----------')
    
    print('Name:  {}'.format(dog.find('name').text)) # Use the find() method to find a specific subchild

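    age_elem = dog.find('age')
    age = float(age_elem.text)
    # attrib is a dict, so look the attribute up by name ('units' is an assumed attribute name)
    if age_elem.get('units') == 'months':
        years = age / 12.0
        print('Age: {} years'.format(years))
    else:
        print('Age: {} years'.format(age))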
    
    print('Breed: {}'.format(dog.find('breed').text))
    
    if (dog.find('playgroup').text.split()[0] == 'Yes'):
        print('PLAYGROUP')
    else:
        print('NO PLAYGROUP')
    print('\n::::::::::::::::::::\n')

ID:  dog1
----------
Name:   Cloe 
Age: 3.0 years
Breed:  Border Collie 
PLAYGROUP

::::::::::::::::::::

ID:  dog2
----------
Name:   Karl 
Age: 7.0 years
Breed:  Beagle 
PLAYGROUP

::::::::::::::::::::
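
The same methods carry over to the reaction file from the XML Intro section. The following is only a sketch of one possible approach, not a prescribed design for the kinetics library: it embeds the first reaction as a string, the helper `parse_species` and the dictionary layout are hypothetical, stoichiometric coefficients are assumed to be integers, and the `units` attributes are ignored.

```python
import xml.etree.ElementTree as ET

# First reaction from the XML Intro example, embedded as a string.
ctml_text = """<ctml>
    <reactionData id="test_mechanism">
        <reaction reversible="yes" type="Elementary" id="reaction01">
            <equation>H + O2 [=] OH + O</equation>
            <rateCoeff>
                <Kooij>
                    <A units="cm3/mol/s">3.52e+16</A>
                    <b>-0.7</b>
                    <E units="kJ/mol">71.4</E>
                </Kooij>
            </rateCoeff>
            <reactants>H:1 O2:1</reactants>
            <products>OH:1 O:1</products>
        </reaction>
    </reactionData>
</ctml>"""


def parse_species(text):
    """Turn 'H:1 O2:1' into {'H': 1, 'O2': 1} (hypothetical helper)."""
    pairs = (item.split(':') for item in text.split())
    return {name: int(coeff) for name, coeff in pairs}


root = ET.fromstring(ctml_text)
reactions = []
for rxn in root.find('reactionData').findall('reaction'):
    kooij = rxn.find('rateCoeff/Kooij')  # path relative to the reaction element
    reactions.append({
        'id': rxn.get('id'),
        'reversible': rxn.get('reversible') == 'yes',
        'equation': rxn.find('equation').text,
        'A': float(kooij.find('A').text),
        'b': float(kooij.find('b').text),
        'E': float(kooij.find('E').text),
        'reactants': parse_species(rxn.find('reactants').text),
        'products': parse_species(rxn.find('products').text),
    })

print(reactions[0]['id'], reactions[0]['A'], reactions[0]['reactants'])
```

A real reader would also need to handle other rate coefficient types and check the units, which this sketch leaves out.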

