
# Mining the social web 
## Workout 2. Understanding XML

- CityU COM5507 201819A - Unit 2: Web data collection
- 24 Oct 2018, Week 8: Mining the social web - data formats 


- Course Instructor: [Dr. Xinzhi Zhang](www.drxinzhizhang.com)  (JOUR, Hong Kong Baptist University) 
  - xzzhang2@gmail.com


- The code in this notebook is adapted from various sources. All code is for educational purposes only and released under the CC1.0. 

In [None]:
# XML
# tutorial: https://stackabuse.com/reading-and-writing-xml-files-in-python/
# official tutorial: https://docs.python.org/3.7/library/xml.etree.elementtree.html 

## Finding the XML elements 

In [None]:
import xml.etree.ElementTree as ET # always import this when parsing XML 

In [None]:
data = '''
<person>
  <name>Chuck</name>
  <phone type="intl">
     +1 734 303 4456
   </phone>
   <email hide="yes"/>
</person>
'''

In [None]:
tree = ET.fromstring(data)
print('Name:', tree.find('name').text)
# what happens if we print the element itself: print('Name:', tree.find('name'))? 
print('Phone:', tree.find('phone').text) 
# compare: print('Phone:', tree.find('phone').text.strip()) 
print('Attrib', tree.find('email').get('hide'))
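The two questions in the comments above can be checked directly. The following is a minimal sketch that reuses the same sample XML; the variable names (`person`, `name_element`, `raw_phone`, `clean_phone`) are chosen here for illustration only. It shows that `find()` returns an `Element` object rather than a string, and that `.text` keeps the surrounding whitespace until `.strip()` is applied.

```python
import xml.etree.ElementTree as ET

data = '''
<person>
  <name>Chuck</name>
  <phone type="intl">
     +1 734 303 4456
   </phone>
   <email hide="yes"/>
</person>
'''

person = ET.fromstring(data)

name_element = person.find('name')      # an Element object, not a string
print(type(name_element).__name__)
print(name_element.text)                # the text between <name> and </name>

raw_phone = person.find('phone').text   # keeps line breaks and indentation
clean_phone = raw_phone.strip()         # leading/trailing whitespace removed
print(repr(raw_phone))
print(repr(clean_phone))

# Attributes are read with .get() on any element, not only <email>.
print(person.find('phone').get('type'))
```

Printing with `repr()` makes the hidden newlines and spaces visible, which is why the stripped version is usually what you want to store.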

In [None]:
# named xml_input because `input` would shadow Python's built-in input() function
xml_input = '''
<stuff>
    <users>
        <user x="2">
            <id>001</id>
            <name>Chuck</name>
        </user>
        <user x="7">
            <id>009</id>
            <name>Brent</name>
        </user>
    </users>
</stuff>'''

In [None]:
stuff = ET.fromstring(xml_input)
lst = stuff.findall('users/user')
print('User count:', len(lst))

In [None]:
for item in lst:
    print('Name', item.find('name').text)
    print('Id', item.find('id').text)
    print('Attribute', item.get("x"))
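The loop above assumes every `<user>` has both an `<id>` and a `<name>`. When a child element is missing, `find()` returns `None` and `.text` raises an `AttributeError`. One possible way around this is `findtext()` with a default value. The sketch below uses a hypothetical variation of the sample in which the second user has no `<name>`; the names `users_xml`, `users_root` and `records` are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical variation of the sample above: the second user has no <name>.
users_xml = '''
<stuff>
    <users>
        <user x="2">
            <id>001</id>
            <name>Chuck</name>
        </user>
        <user x="7">
            <id>009</id>
        </user>
    </users>
</stuff>'''

users_root = ET.fromstring(users_xml)

records = []
for user in users_root.findall('users/user'):
    records.append({
        # findtext() returns the default instead of failing on a missing child
        'id': user.findtext('id', default=''),
        'name': user.findtext('name', default='(missing)'),
        'x': user.get('x'),
    })

print(records)
```

Collecting each user as a dictionary also makes it easy to hand the result to pandas later on.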

## A real example: the Hong Kong LegCo voting
- Open data: https://www.legco.gov.hk/general/chinese/open-legco/cm-201819.html#n1a
- Data source: https://www.legco.gov.hk/yr18-19/chinese/counmtg/voting/cm_vote_20181010.xml
- The voting records published on the Hong Kong LegCo website are a very good example of open data, and good practice material for data analysis skills. 

In [None]:
import xml.etree.ElementTree as ET  
tree = ET.parse('cm_vote_20181010.xml')  
root = tree.getroot()
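`ET.parse()` expects `cm_vote_20181010.xml` to be saved in the same folder as the notebook. To try the same calls without the file, `ET.parse()` also accepts a file-like object, so a small in-memory sample can stand in for it. In the sketch below only the `member` element, its `name-ch` attribute and the `vote` child are taken from the cells in this notebook; the root tag, the member names and the vote labels are invented placeholders and may not match the real file.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature file: only <member>, its name-ch attribute and the
# <vote> child come from this notebook; everything else is made up.
sample_xml = '''<sample-votes>
  <member name-ch="Member A"><vote>Yes</vote></member>
  <member name-ch="Member B"><vote>No</vote></member>
  <member name-ch="Member C"><vote>Yes</vote></member>
</sample-votes>'''

# parse() accepts a file name or a file object; StringIO acts as the file here
sample_tree = ET.parse(io.StringIO(sample_xml))
sample_root = sample_tree.getroot()

print(sample_root.tag)

# iter() walks the whole tree, so it finds <member> at any depth
sample_pairs = [(m.get('name-ch'), m.find('vote').text)
                for m in sample_root.iter('member')]
print(sample_pairs)
```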

In [None]:
root.tag

In [None]:
root.attrib

In [None]:
for child in root:
    print(child.tag, child.attrib) 

In [None]:
for member in root.iter('member'):
    print(member.attrib) 

In [None]:
names = []
votes = []
for member in root.iter('member'):
    vote = member.find('vote').text
    name = member.get('name-ch') 
    #print(name, vote)
    votes.append(vote)
    names.append(name)

In [None]:
import pandas as pd

In [None]:
df_vote = {
    'names': names,
    'votes': votes
}

In [None]:
pd_vote = pd.DataFrame.from_dict(df_vote) 
print(pd_vote.info())

In [None]:
pd_vote
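Once the votes are in a DataFrame, a natural first step is to count how many members cast each kind of vote and to list the members behind one of them. This is a minimal sketch on a hypothetical stand-in for `pd_vote`; the member names and vote labels are placeholders, not values from the LegCo file.

```python
import pandas as pd

# Hypothetical stand-in for pd_vote; names and vote labels are made up.
sample_vote = pd.DataFrame({
    'names': ['Member A', 'Member B', 'Member C', 'Member D'],
    'votes': ['Yes', 'No', 'Yes', 'Absent'],
})

# How many members fall into each vote category
vote_counts = sample_vote['votes'].value_counts()
print(vote_counts)

# Which members cast a particular vote
yes_names = sample_vote.loc[sample_vote['votes'] == 'Yes', 'names'].tolist()
print(yes_names)
```

The same two lines should work on `pd_vote` itself, provided its columns are named `names` and `votes` as in the cells above.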

## Challenges: 
1. Use the pandas package to explore this voting record and look for patterns in the voting, if any. 