## XML

### AGENDA

* Introduction to XML and Parsing
* Python XML Parsing Modules
* xml.etree.ElementTree Module
    * Parsing
    * Finding Elements
    * Modifying Elements

* xml.dom.minidom Module
    * Parsing
    * Finding Elements
    * Finding Length

### Introduction

* XML stands for **Extensible Markup Language**
* Similar to HTML in its appearance
* XML is used to represent data: it describes the data's structure, not how the data is displayed
* XML is designed for storing and exchanging (sending and receiving) data

### Python XML Parsing Modules

* **xml.etree.ElementTree**     
    Represents XML data as a tree of elements, which is the most natural representation of hierarchical data.  
* **xml.dom.minidom**      
    Used by people who are proficient with the DOM (Document Object Model). DOM applications often start by **parsing XML into a DOM**
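
To make the difference between the two modules concrete, here is a minimal sketch that reads the same value with each of them. The `xml_text` string is a made-up sample used only for this comparison; it is not the `sample.xml` file used later.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical sample document, used only for this comparison.
xml_text = "<menu><item name='breakfast'>Idly</item></menu>"

# ElementTree: fromstring() returns the root Element;
# children are reached with find() or by indexing.
root = ET.fromstring(xml_text)
et_item = root.find("item")
print(et_item.get("name"), et_item.text)

# minidom: parseString() returns a Document;
# nodes are reached through DOM methods such as getElementsByTagName().
doc = minidom.parseString(xml_text)
dom_item = doc.getElementsByTagName("item")[0]
print(dom_item.getAttribute("name"), dom_item.firstChild.data)
```

Both approaches reach the same attribute and text. ElementTree usually needs less code, while minidom follows the DOM API that may be familiar from other languages.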
   
### xml.etree.ElementTree

![Screen Shot 2021-08-17 at 4.24.28 PM.png](attachment:2775b7f3-246c-4eb5-ad8a-d57d89f5f6e7.png)    

#### Parsing

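`ET.parse()` reads XML from a file and returns an `ElementTree`; call `getroot()` on it to get the root element. `ET.fromstring()` parses XML held in a string and returns the root element directly.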

Example:   

In [2]:
import xml.etree.ElementTree as ET
mytree = ET.parse('sample.xml')
myroot = mytree.getroot()
print(myroot)

<Element 'metadata' at 0x7fd654ba5450>


In [3]:
data = """<?xml version="1.0" encoding="UTF-8"?>
<metadata>
<food>
    <item name="breakfast">Idly</item>
    <price>$2.5</price>
    <description>
        Two idly's with chutney
    </description>
    <calories>553</calories>
</food>
</metadata>"""
myroot = ET.fromstring(data)
print(myroot.tag)

metadata
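
Parsing fails with `ET.ParseError` when the input is not well-formed. The sketch below shows one way to handle that; `load_root` is a hypothetical helper name, and `io.StringIO` objects stand in for real files so the example runs without `sample.xml`.

```python
import io
import xml.etree.ElementTree as ET

def load_root(source):
    """Return the root element, or None if the XML is not well-formed.

    `source` can be a filename or a file-like object, as accepted by ET.parse().
    """
    try:
        return ET.parse(source).getroot()
    except ET.ParseError as err:
        print(f"Could not parse XML: {err}")
        return None

# In-memory stand-ins for files (assumed sample content).
good = io.StringIO("<metadata><food/></metadata>")
bad = io.StringIO("<metadata><food></metadata>")  # <food> is never closed

good_root = load_root(good)
bad_root = load_root(bad)
print(good_root.tag)
print(bad_root)
```

Returning `None` is only one possible design; re-raising the error may be preferable when a missing or broken file should stop the program.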


#### Finding Elements

You can inspect elements and sub-elements through their `tag`, `attrib`, and `text` attributes, and locate them by indexing, iterating, or calling `find()` and `findall()`.   

Example:

In [6]:
myroot[0]

<Element 'food' at 0x7fd655302f40>

In [7]:
myroot[0].tag

'food'

In [5]:
for x in myroot[0]:
    print(x)
    print('-'*30)
    print(x.tag, x.attrib)

<Element 'item' at 0x7fd65547d860>
------------------------------
item {'name': 'breakfast'}
<Element 'price' at 0x7fd65547d7c0>
------------------------------
price {}
<Element 'description' at 0x7fd65547d6d0>
------------------------------
description {}
<Element 'calories' at 0x7fd65547d680>
------------------------------
calories {}


In [9]:
for x in myroot[0]:
    print(x.text)

Idly
$2.5

        Two idly's with chutney
    
553


In [10]:
import xml.etree.ElementTree as ET
mytree = ET.parse('sample.xml')
myroot = mytree.getroot()

for x in myroot.findall('food'):
    item = x.find('item').text
    price = x.find('price').text
    print(item, price)

Idly $2.5
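
A few other lookups are often useful alongside `find()` and `findall()`. The sketch below works on a shortened copy of the XML string shown earlier (the `description` element is left out), and the `discount` tag is a deliberately missing element chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample document used above.
data = """<metadata>
<food>
    <item name="breakfast">Idly</item>
    <price>$2.5</price>
    <calories>553</calories>
</food>
</metadata>"""
root = ET.fromstring(data)

food = root.find("food")

# get() reads a single attribute value.
name = food.find("item").get("name")

# findtext() returns the text of the first match, or a default if none exists.
price = food.findtext("price")
discount = food.findtext("discount", default="n/a")  # no <discount> element

# A path expression reaches nested elements in one call.
calories = int(root.find("./food/calories").text)

# iter() walks the whole tree in document order, starting at the root.
tags = [el.tag for el in root.iter()]

print(name, price, discount, calories)
print(tags)
```

Using `findtext()` with a default avoids the `AttributeError` that `x.find('tag').text` raises when the element is absent.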


#### Modifying Elements

An XML tree can be modified with calls such as `Element.set()` and `ET.SubElement()`, or by assigning to an element's `text` directly.

Example:

In [11]:
import xml.etree.ElementTree as ET
mytree = ET.parse('sample.xml')
myroot = mytree.getroot()

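for x in myroot.iter('description'):
    # x.text is None for an empty element, so fall back to ''
    x.text = (x.text or '') + 'Description has been added'
    # Mark the element as changed by adding an attribute
    x.set('updated', 'yes')
# Write the modified tree to a new file; sample.xml itself is not changed
mytree.write('new.xml')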
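
`SubElement()` is mentioned above but not shown. Here is a minimal sketch that works in memory; the `rating` element and the new attribute value are invented for illustration and are not part of `sample.xml`.

```python
import xml.etree.ElementTree as ET

# Small in-memory document (assumed sample content).
root = ET.fromstring(
    "<metadata><food><item name='breakfast'>Idly</item></food></metadata>"
)
food = root.find("food")

# SubElement() creates a child element and appends it to the parent in one step.
rating = ET.SubElement(food, "rating")  # hypothetical element
rating.text = "4.5"

# set() adds a new attribute or overwrites an existing one.
food.find("item").set("name", "dinner")

# tostring() with encoding="unicode" returns the serialized tree as a str.
result = ET.tostring(root, encoding="unicode")
print(result)
```

Serializing with `ET.tostring()` is a convenient way to check a change before writing it to disk with `write()`.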