## Different ways to parse XML using Python
1. ElementTree module - for document-oriented files

Python documentation: https://docs.python.org/3/library/xml.etree.elementtree.html#module-xml.etree.ElementTree

XML - Extensible Markup Language
XSL - Extensible Stylesheet Language
XSLT - XSL Transformations

In [4]:
import xml.etree.ElementTree as ET
import pprint

In [5]:
# file link: http://content.udacity-data.com/ud032/exampleResearchArticle.xml
file = 'exampleResearchArticle.xml' 

In [8]:
# loading the entire XML file as a tree
tree = ET.parse(file)

In [9]:
root = tree.getroot()
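When the XML is already in memory as a string, `ET.fromstring` can be used instead of `ET.parse`. The sketch below uses a made-up miniature document (`SAMPLE_XML` is hypothetical; only the tag names echo the article file) so it runs without downloading anything.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document: the tag names mirror the ones used in this
# notebook, but the content is invented for illustration.
SAMPLE_XML = """<art>
  <ui>sample-id</ui>
  <fm>
    <bibl>
      <title><p>A sample title</p></title>
    </bibl>
  </fm>
</art>"""

# ET.parse(file) returns an ElementTree and needs .getroot();
# ET.fromstring(text) returns the root Element directly.
sample_root = ET.fromstring(SAMPLE_XML)

print(sample_root.tag)   # tag of the root element
print(len(sample_root))  # number of direct children (whitespace is not a child)
```

Either way the result is an `Element`, so everything shown for `root` in the following cells applies to `sample_root` as well.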

In [20]:
#As an Element, root has a tag and a dictionary of attributes
root.tag

'art'

In [22]:
root.attrib

{}

In [25]:
for child in root:
    print("Tag {0}, attrib {1}".format(child.tag, child.attrib))

Tag ui, attrib {}
Tag ji, attrib {}
Tag fm, attrib {}
Tag bdy, attrib {}
Tag bm, attrib {}


In [30]:
# Elements support indexing: root[2] is the third child
root[2]

<Element 'fm' at 0x1069cacc8>
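Iterating over an element and indexing into it only reach its direct children. A minimal sketch on a hypothetical skeleton document (`doc` is invented, reusing the top-level tag names printed above) contrasts that with `iter()`, which walks the whole subtree:

```python
import xml.etree.ElementTree as ET

# Hypothetical skeleton with the same top-level tags as the article file.
doc = ET.fromstring("<art><ui/><ji/><fm><bibl/></fm><bdy/><bm/></art>")

# Iterating an Element yields its direct children only.
child_tags = [child.tag for child in doc]

# Indexing works like a list of children.
third = doc[2]

# iter() walks the element itself and all descendants in document order.
all_tags = [el.tag for el in doc.iter()]

print(child_tags)
print(third.tag)
print(all_tags)
```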

In [34]:
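# Each <p> child of <title> holds part of the title text
title = root.find('./fm/bibl/title')
title_text = ""
for p in title:
    title_text += p.text or ""  # p.text is None for an empty element
print(title_text)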

Standardization of the functional syndesmosis widening by dynamic U.S examination
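Concatenating `p.text` only picks up the text that comes before the first nested tag inside each `<p>`. If a title contained inline markup, `itertext()` would be one way to collect all of it. The fragment below is hypothetical (the `<it>` tag is an assumption, not taken from the article file):

```python
import xml.etree.ElementTree as ET

# Hypothetical title with nested inline markup.
title_el = ET.fromstring("<title><p>Dynamic <it>U.S</it> examination</p></title>")

# .text stops at the first child element of <p>.
text_only = "".join((p.text or "") for p in title_el)

# itertext() yields every text and tail string in document order.
full_text = "".join(title_el.itertext())

print(repr(text_only))
print(repr(full_text))
```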


In [36]:
for element in root.findall('./fm/bibl/aug/au'):
    print(element.find('email').text)

omer@extremegate.com
mcarmont@hotmail.com
laver17@gmail.com
nyska@internet-zahav.net
kammarh@gmail.com
gideon.mann.md@gmail.com
barns.nz@gmail.com
eukots@gmail.com
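The loop above works because every author in this file has an `<email>` element. If one were missing, `find('email')` would return `None` and `.text` would raise an `AttributeError`. A sketch of a more defensive variant using `findtext` with a default, run on invented data:

```python
import xml.etree.ElementTree as ET

# Hypothetical author group: the second author has no <email> element.
aug = ET.fromstring(
    "<aug>"
    "<au><snm>Doe</snm><email>doe@example.com</email></au>"
    "<au><snm>Roe</snm></au>"
    "</aug>"
)

# findtext returns the text of the first match, or the default if none is found.
emails = [au.findtext("email", default="") for au in aug.findall("au")]
print(emails)
```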


In [62]:
# Get all the authors from the XML file
def get_authors(root):
    """Return one dict per <au> element: first name, surname, email, institution ids."""
    authors = []
    for author in root.findall('./fm/bibl/aug/au'):
        data = {
            "fnm": author.find('fnm').text,
            "snm": author.find('snm').text,
            "email": author.find('email').text,
            "insr": []
        }
        # An author may reference several institutions via <insr iid="...">
        for insr in author.findall('insr'):
            data['insr'].append(insr.attrib['iid'])
        authors.append(data)
    return authors

In [63]:
authors = get_authors(root)

In [64]:
authors

[{'email': 'omer@extremegate.com',
  'fnm': 'Omer',
  'insr': ['I1'],
  'snm': 'Mei-Dan'},
 {'email': 'mcarmont@hotmail.com',
  'fnm': 'Mike',
  'insr': ['I2'],
  'snm': 'Carmont'},
 {'email': 'laver17@gmail.com',
  'fnm': 'Lior',
  'insr': ['I3', 'I4'],
  'snm': 'Laver'},
 {'email': 'nyska@internet-zahav.net',
  'fnm': 'Meir',
  'insr': ['I3'],
  'snm': 'Nyska'},
 {'email': 'kammarh@gmail.com',
  'fnm': 'Hagay',
  'insr': ['I8'],
  'snm': 'Kammar'},
 {'email': 'gideon.mann.md@gmail.com',
  'fnm': 'Gideon',
  'insr': ['I3', 'I5'],
  'snm': 'Mann'},
 {'email': 'barns.nz@gmail.com',
  'fnm': 'Barnaby',
  'insr': ['I6'],
  'snm': 'Clarck'},
 {'email': 'eukots@gmail.com', 'fnm': 'Eugene', 'insr': ['I7'], 'snm': 'Kots'}]
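As a variation on `get_authors`, the sketch below tolerates missing child elements by using `findtext` and `Element.get`. The name `get_authors_safe` and the sample document are hypothetical, not part of the original exercise.

```python
import xml.etree.ElementTree as ET

def get_authors_safe(root):
    """Like get_authors, but missing fields become None instead of raising."""
    authors = []
    for author in root.findall("./fm/bibl/aug/au"):
        authors.append({
            "fnm": author.findtext("fnm"),
            "snm": author.findtext("snm"),
            "email": author.findtext("email"),  # None when <email> is absent
            "insr": [insr.get("iid") for insr in author.findall("insr")],
        })
    return authors

# Hypothetical document with a single author who has no <email> element.
sample = ET.fromstring("""<art><fm><bibl><aug>
  <au><fnm>Jane</fnm><snm>Doe</snm><insr iid="I1"/><insr iid="I2"/></au>
</aug></bibl></fm></art>""")

result = get_authors_safe(sample)
print(result)
```

For the article file used in this notebook, where every author has all three fields, this should produce the same list as `get_authors(root)`.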