# Homework 9: Getting Familiar with NASA Polynomials
## Due Date:  Tuesday, November 7th at 11:59 PM

Read the raw NASA polynomial dataset, parse it, and store the data in an .xml file.

### Review of the NASA Polynomials
You can find the NASA Polynomial file in `thermo.txt`.

You can find some details on the NASA Polynomials [at this site](http://combustion.berkeley.edu/gri_mech/data/nasa_plnm.html) in addition to the Lecture 16 notes.


The NASA polynomials for species $i$ have the form:
$$
    \frac{C_{p,i}}{R}= a_{i1} + a_{i2} T + a_{i3} T^2 + a_{i4} T^3 + a_{i5} T^4
$$

$$
    \frac{H_{i}}{RT} = a_{i1} + a_{i2} \frac{T}{2} + a_{i3} \frac{T^2}{3} + a_{i4} \frac{T^3}{4} + a_{i5} \frac{T^4}{5} + \frac{a_{i6}}{T}
$$

$$
    \frac{S_{i}}{R}  = a_{i1} \ln(T) + a_{i2} T + a_{i3} \frac{T^2}{2} + a_{i4} \frac{T^3}{3} + a_{i5} \frac{T^4}{4} + a_{i7}
$$

where $a_{i1}$, $a_{i2}$, $a_{i3}$, $a_{i4}$, $a_{i5}$, $a_{i6}$, and $a_{i7}$ are the numerical coefficients supplied in NASA thermodynamic files. 
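As a minimal sketch of how the three expressions above can be evaluated, the functions below take a list `a` of the seven coefficients and a temperature `T` in kelvin. The function names are hypothetical, and the coefficient values are made-up illustrative numbers rather than values read from `thermo.txt`.

```python
import math

def cp_over_R(a, T):
    """Dimensionless heat capacity C_p / R."""
    return a[0] + a[1]*T + a[2]*T**2 + a[3]*T**3 + a[4]*T**4

def h_over_RT(a, T):
    """Dimensionless enthalpy H / (R T)."""
    return (a[0] + a[1]*T/2 + a[2]*T**2/3 + a[3]*T**3/4
            + a[4]*T**4/5 + a[5]/T)

def s_over_R(a, T):
    """Dimensionless entropy S / R."""
    return (a[0]*math.log(T) + a[1]*T + a[2]*T**2/2 + a[3]*T**3/3
            + a[4]*T**4/4 + a[6])

# Illustrative coefficients only (assumption): a constant heat capacity,
# so a_2 through a_5 are zero.
a_demo = [2.5, 0.0, 0.0, 0.0, 0.0, -745.375, 4.366]
T_demo = 300.0

cp = cp_over_R(a_demo, T_demo)   # equals a_1 = 2.5 because a_2..a_5 are zero
h = h_over_RT(a_demo, T_demo)
s = s_over_R(a_demo, T_demo)
```

Note that the list index is offset by one from the subscript in the formulas: `a[0]` holds $a_{i1}$ and `a[6]` holds $a_{i7}$.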

### Some Notes on `thermo.txt`
The first seven numbers starting on the second line of each species entry (five on the second line and the first two on the third line) are the seven coefficients ($a_{i1}$ through $a_{i7}$, respectively) for the high-temperature range (above 1000 K; the upper boundary is specified on the first line of the species entry).

The next seven numbers (the last three on the third line and the four on the fourth line) are the coefficients ($a_{i1}$ through $a_{i7}$, respectively) for the low-temperature range (below 1000 K; the lower boundary is specified on the first line of the species entry).
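The layout described above can be sketched on a single entry. The four-line entry below is hypothetical: the species name `XX` and every coefficient are made up for illustration, and only the arrangement of the numbers is meant to mirror `thermo.txt`. The regular expression is one possible way to pick out numbers in scientific notation, which also works when a negative number directly follows the previous one without a space.

```python
import re

# Hypothetical entry with made-up values (assumption, not data from thermo.txt).
entry = [
    "XX                demo  X   1               G   200.000  3500.000  1000.000    1",
    " 1.00000000E+00 2.00000000E-03-3.00000000E-06 4.00000000E-09-5.00000000E-13    2",
    " 6.00000000E+02 7.00000000E+00 1.10000000E+00 1.20000000E-03-1.30000000E-06    3",
    " 1.40000000E-09-1.50000000E-13 1.60000000E+02 1.70000000E+00                   4",
]

# Numbers in scientific notation; the trailing line index (1-4) has no
# exponent and is therefore not matched.
SCI = re.compile(r'-?\d+\.\d*E[+-]\d+')

# Collect the numbers from lines 2-4 in reading order.
numbers = [x for line in entry[1:] for x in SCI.findall(line)]

high = numbers[:7]    # first seven: high-temperature coefficients
low = numbers[7:14]   # next seven: low-temperature coefficients

# The three temperatures sit just before the line index on the first line.
t_low, t_high, t_mid = entry[0].split()[-4:-1]
```

Keeping the values as strings at this stage preserves the exact formatting of the source file; they can be converted with `float` when numerical work is needed.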

### Additional Specifications
Your final .xml file should contain the following specifications:

1. A `speciesArray` field that contains a space-separated list of all of the species present in the file.
2. A `species` field for each species, with the species name as its `name` attribute.

    1. For each temperature range, use a sub-field with the minimum and maximum temperature as attributes.
    2. Within each temperature range, use a `floatArray` field that contains the coefficients as a comma-separated string.
    
You can reference the `example_thermo.xml` file for an example .xml output.
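A minimal sketch of the nesting described in the specifications, using `xml.etree.ElementTree` from the standard library. The species names `XX` and `YY`, the temperatures, and the coefficient strings are made-up placeholders, and the element names other than `speciesArray`, `species`, and `floatArray` (here `ctml`, `phase`, `speciesData`, `thermo`, `NASA`) are assumptions for illustration; `example_thermo.xml` remains the reference for the exact layout.

```python
import xml.etree.ElementTree as ET

# Made-up coefficient strings for one temperature range (assumption).
coeffs = ["1.0E+00", "2.0E-03", "-3.0E-06", "4.0E-09",
          "-5.0E-13", "6.0E+02", "7.0E+00"]

root = ET.Element("ctml")

# Space-separated list of all species names.
phase = ET.SubElement(root, "phase")
ET.SubElement(phase, "speciesArray").text = " ".join(["XX", "YY"])

# One <species> element per species, identified by its name attribute.
species_data = ET.SubElement(root, "speciesData")
species = ET.SubElement(species_data, "species", name="XX")
thermo = ET.SubElement(species, "thermo")

# One sub-element per temperature range, with the bounds as attributes.
nasa = ET.SubElement(thermo, "NASA", Tmin="200.000", Tmax="1000.000")
float_array = ET.SubElement(nasa, "floatArray", name="coeffs",
                            size=str(len(coeffs)))
float_array.text = ", ".join(coeffs)

xml_text = ET.tostring(root, encoding="unicode")
```

Attribute values must be strings, which is why the temperatures and the `size` attribute are passed as text rather than as numbers.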

**Hint**: First parse the file into a Python dictionary. 

In [1]:
import xml.etree.ElementTree as ET
import re
from copy import deepcopy

class PolynomialsParser:
    def __init__(self, file_name, new_file_name=None):
        self.file_name = file_name
        if file_name.endswith('.txt'):
            with open(file_name, 'r') as f:
                self.raw = f.read().strip()
            self.data_s = self.parse_raw(self.raw)
            self.data = self.to_float(self.data_s)
            if new_file_name is None:
                new_file_name = file_name[:-4] + '.xml'
            self.save_xml(self.data_s, new_file_name)
    
    def parse_raw(self, raw, i_start=5, lines=4):
        raw = raw.split('\n')
        species = [raw[i:i+lines] for i in range(i_start, len(raw), lines)][:-1]

        # Matches numbers in scientific notation, e.g. -1.23456789E+03
        sci_number = re.compile(r'-?[0-9]+\.?[0-9]*E[+-]?[0-9]*')
        def find_sci_numbers(s):
            return sci_number.findall(s)
        
        def parse_specie(specie):
            name = specie[0].split()[0].strip()
            # The three temperatures on the first line: [T_low, T_high, T_mid]
            Ts = specie[0].split()[-4:-1]
            line_vals = [find_sci_numbers(s) for s in specie[1:]]
            coeff_high = line_vals[0] + line_vals[1][:2]
            coeff_low = line_vals[1][2:] + line_vals[2]
            return name, Ts, coeff_high, coeff_low
        
        data = dict()
        data['speciesArray'] = []
        data['speciesData'] = dict()
        
        for specie in species:
            name, Ts, coeff_high, coeff_low = parse_specie(specie)
            data['speciesArray'].append(name)
            data['speciesData'][name] = {'Ts':Ts, 'coeff_high':coeff_high, 'coeff_low':coeff_low}
        
        return data
    
    def save_xml(self, data, file_name):
        root = ET.Element('ctml')
        
        root.append(ET.Comment('phase gri30'))
        
        phase = ET.SubElement(root, 'phase', id='gri30')
        ET.SubElement(phase, 'speciesArray', datasrc='#species_data').text = ' '.join(data['speciesArray'])
        
        root.append(ET.Comment('species definitions'))
        
        def add_speciesData(speciesData, data, p0="100000.0"):
            for name in data['speciesArray']:
                _data = data['speciesData'][name]
                speciesData.append(ET.Comment('species {}'.format(name)))
                specie = ET.SubElement(speciesData, 'species', name=name)
                thermo = ET.SubElement(specie, 'thermo')
                # Low-temperature range: from T_low up to the shared middle temperature
                low = ET.SubElement(thermo, 'NASA', Tmax=_data['Ts'][-1], Tmin=_data['Ts'][0], p0=p0)
                low_coeffs = ET.SubElement(low, 'floatArray', name='coeffs', size=str(len(_data['coeff_low'])))
                low_coeffs.text = ', '.join(_data['coeff_low'])
                # High-temperature range: from the shared middle temperature up to T_high
                high = ET.SubElement(thermo, 'NASA', Tmax=_data['Ts'][1], Tmin=_data['Ts'][-1], p0=p0)
                high_coeffs = ET.SubElement(high, 'floatArray', name='coeffs', size=str(len(_data['coeff_high'])))
                high_coeffs.text = ', '.join(_data['coeff_high'])
        
        speciesData = ET.SubElement(root, 'speciesData', id='species_data')
        add_speciesData(speciesData, data)
        
        tree = ET.ElementTree(root)
        tree.write(file_name)
        
    def to_float(self, data_s):
        data = deepcopy(data_s)
        for name, _data in data['speciesData'].items():
            _data['Ts'] = [float(s) for s in _data['Ts']]
            _data['coeff_high'] = [float(s) for s in _data['coeff_high']]
            _data['coeff_low'] = [float(s) for s in _data['coeff_low']]
        return data

In [2]:
PolynomialsParser('thermo.txt');
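One way to sanity-check a file written by the cell above is to read the coefficients back and convert them to floats. The helper `read_coeffs` is hypothetical and assumes the element names used in `save_xml` (`species`, `NASA`, `floatArray`). To stay self-contained, the sketch reads a small in-memory sample with made-up values instead of the real output file.

```python
import io
import xml.etree.ElementTree as ET

def read_coeffs(source):
    """Map each species name to a list of coefficient lists, one per NASA range.

    `source` may be a file path or an open file object.
    """
    root = ET.parse(source).getroot()
    result = {}
    for species in root.iter('species'):
        ranges = []
        for nasa in species.iter('NASA'):
            text = nasa.find('floatArray').text
            # float() tolerates the space that follows each comma.
            ranges.append([float(x) for x in text.split(',')])
        result[species.get('name')] = ranges
    return result

# In-memory sample with made-up values (assumption, not real thermo data).
sample = io.StringIO(
    "<ctml><speciesData id='species_data'><species name='XX'><thermo>"
    "<NASA Tmin='200.000' Tmax='1000.000' p0='100000.0'>"
    "<floatArray name='coeffs' size='7'>"
    "1.0E+00, 2.0E-03, 0.0E+00, 0.0E+00, 0.0E+00, 6.0E+02, 7.0E+00"
    "</floatArray></NASA></thermo></species></speciesData></ctml>"
)

coeffs_by_species = read_coeffs(sample)
```

Passing the path of the generated .xml file instead of `sample` would apply the same check to the real output; every inner list should then contain seven values.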