# JSON and XML - Lab

## Introduction

In this lab, you'll practice navigating JSON and XML data structures.

## Objectives
You will be able to:
* Effectively use the JSON module to load and parse JSON documents
* Read and access data stored in JSON and XML
* Compare and contrast JSON and XML as data interchange formats


## XML

```python
import xml.etree.ElementTree as ET
```

### Create an XML tree and retrieve the root tag.

```python
# Your code here
# Parse the file into an ElementTree, then grab its root element
finance_xml = ET.parse('nyc_2001_campaign_finance.xml')
root = finance_xml.getroot()
```
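To experiment without the campaign-finance file, the same module can parse an in-memory string. The sample below is a hypothetical, heavily trimmed stand-in: the `response > row > row` layout and every value in it are assumptions for illustration, with only the tag names borrowed from the tag counts later in this lab.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a trimmed stand-in for the real file. The layout
# (response > row > row) and all values are assumptions for illustration.
SAMPLE_XML = """
<response>
  <row>
    <row><candid>X1</candid><candname>Doe, Jane</candname><totalpay>100.00</totalpay></row>
    <row><candid>X2</candid><candname>Roe, Richard</candname><totalpay>0</totalpay></row>
  </row>
</response>
"""

# ET.fromstring() parses text and returns the root Element directly,
# while ET.parse() reads a file and returns an ElementTree, hence .getroot().
root = ET.fromstring(SAMPLE_XML.strip())
print(root.tag)   # response
print(len(root))  # 1
```

Use `ET.parse()` for files on disk and `ET.fromstring()` when the XML already sits in a string, for example a web response body.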

### How many direct descendants does the root tag have?

```python
# Answer: 1
len(root)  # len() on an Element counts only its direct children
```

```
1
```
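The distinction between direct children and all descendants matters here, because `len()` only looks one level down. A minimal sketch on a hypothetical two-level sample (not the real file):

```python
import xml.etree.ElementTree as ET

# Hypothetical two-level sample: one wrapper row holding three inner rows.
root = ET.fromstring("<response><row><row/><row/><row/></row></response>")

direct_children = len(root)                        # immediate subelements only
all_descendants = sum(1 for _ in root.iter()) - 1  # iter() also yields root itself
print(direct_children, all_descendants)  # 1 4
```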

### How many different types of tags are there within the entire XML file?

```python
# Your code here
# root.iter() walks every element in the tree, including root itself
tags = [element.tag for element in root.iter()]
len(set(tags))
```

```
13
```
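Because `root.iter()` yields the root element first, the count of distinct tags includes the root's own tag. A set comprehension collects the distinct names in one pass; the sample below is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: 1 response, 3 row, 2 candid and 2 totalpay elements.
root = ET.fromstring(
    "<response><row>"
    "<row><candid>X1</candid><totalpay>10</totalpay></row>"
    "<row><candid>X2</candid><totalpay>0</totalpay></row>"
    "</row></response>"
)

# The set drops duplicate tag names; root's tag is included because
# iter() starts at root.
distinct_tags = {element.tag for element in root.iter()}
print(sorted(distinct_tags))  # ['candid', 'response', 'row', 'totalpay']
print(len(distinct_tags))     # 4
```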

### Create a DataFrame listing the number of each type of tag. 
Sort the DataFrame in descending order by the tag count. The first entry should demonstrate there are 286 row tags in the XML file.   
(Your DataFrame will be a single column, so could also be thought of as a Series.)

```python
import pandas as pd
from collections import Counter
```

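```python
# Your code here
tags_count = Counter(tags)
# One row per tag; reset_index() moves the tag names into a column named 'index'
pd.DataFrame.from_dict(tags_count, orient='index').reset_index().sort_values(by=[0], ascending=False)
```

| | index | 0 |
|---|---|---|
| 1 | row | 286 |
| 2 | candid | 285 |
| 3 | candname | 285 |
| 5 | canclass | 285 |
| 6 | election | 284 |
| 7 | officecd | 284 |
| 9 | primarypay | 284 |
| 10 | generalpay | 284 |
| 11 | runoffpay | 284 |
| 12 | totalpay | 284 |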
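`Counter` can already do the sorting: `most_common()` with no argument returns every `(tag, count)` pair from highest to lowest count. A minimal standard-library sketch on a hypothetical sample:

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical sample, not the real file.
root = ET.fromstring(
    "<response><row>"
    "<row><candid>X1</candid><totalpay>10</totalpay></row>"
    "<row><candid>X2</candid></row>"
    "</row></response>"
)

tag_counts = Counter(element.tag for element in root.iter())

# most_common() lists (tag, count) pairs sorted by count, descending.
for tag, count in tag_counts.most_common():
    print(tag, count)
```

For the DataFrame form the lab asks for, `pd.DataFrame(tag_counts.most_common(), columns=['tag', 'count'])` gives named columns instead of `index` and `0`, and `pd.Series(tag_counts).sort_values(ascending=False)` gives a Series indexed by tag.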


## JSON

### Open the same dataset from json

```python
# Your code here
import json

# A with-block closes the file as soon as it has been parsed
with open('nyc_2001_campaign_finance.json') as f:
    fin_json = json.load(f)
fin_json
```

```
{'meta': {'view': {'id': '8dhd-zvi6',
   'name': '2001 Campaign Payments',
   'attribution': 'Campaign Finance Board (CFB)',
   'averageRating': 0,
   'category': 'City Government',
   'createdAt': 1315950830,
   'description': 'A listing of public funds payments for candidates for City office during the 2001 election cycle',
   'displayType': 'table',
   'downloadCount': 1470,
   'hideFromCatalog': False,
   'hideFromDataJson': False,
   'indexUpdatedAt': 1536596254,
   'newBackend': False,
   'numberOfComments': 0,
   'oid': 4140996,
   'provenance': 'official',
   'publicationAppendEnabled': False,
   'publicationDate': 1371845179,
   'publicationGroup': 240370,
   'publicationStage': 'published',
   'rowClass': '',
   'rowsUpdatedAt': 1371845177,
   'rowsUpdatedBy': '5fuc-pqz2',
   'tableId': 932968,
   'totalTimesRated': 0,
   'viewCount': 233,
   'viewLastModified': 1536605717,
   'viewType': 'tabular',
   'columns': [{'id': -1,
     'name': 'sid',
     'dataTypeName': 'meta_data
```
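`json.load()` reads from a file object, while `json.loads()` parses a string that is already in memory; both produce the same Python objects. In this sketch `io.StringIO` stands in for an open file, and the document is a hypothetical miniature that only imitates the `meta`/`data` layout seen above.

```python
import io
import json

# Hypothetical miniature: a 'meta' block describing columns and a 'data'
# block holding one list of values per row.
raw = ('{"meta": {"view": {"columns": [{"name": "CANDID"}, {"name": "TOTALPAY"}]}},'
       ' "data": [["X1", "10.00"]]}')

from_text = json.loads(raw)              # parse a str
from_file = json.load(io.StringIO(raw))  # parse any file-like object
print(from_text == from_file)  # True
```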

### What is the root data type of the json file?

```python
# Your code here
type(fin_json)
```

```
dict
```
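A JSON object becomes a Python `dict`, an array becomes a `list`, and `null` becomes `None`. The sketch below parses a hypothetical miniature that uses the same two top-level keys this lab reads, `meta` and `data`, and shows each mapping.

```python
import json

# Hypothetical miniature with the two top-level keys used in this lab.
doc = json.loads('{"meta": {"view": {}}, "data": [[1, "2001", null]]}')

print(type(doc).__name__)          # dict  (JSON object)
print(list(doc))                   # ['meta', 'data']
print(type(doc["data"]).__name__)  # list  (JSON array)
print(doc["data"][0][2])           # None  (JSON null)
```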

```python
fin_json['data']
```

```
[[1,
  'E3E9CC9F-7443-43F6-94AF-B5A0F802DBA1',
  1,
  1315925633,
  '392904',
  1315925633,
  '392904',
  '{\n  "invalidCells" : {\n    "1519001" : "TOTALPAY",\n    "1518998" : "PRIMARYPAY",\n    "1519000" : "RUNOFFPAY",\n    "1518999" : "GENERALPAY",\n    "1518994" : "OFFICECD",\n    "1518996" : "OFFICEDIST",\n    "1518991" : "ELECTION"\n  }\n}',
  None,
  'CANDID',
  'CANDNAME',
  None,
  'OFFICEBORO',
  None,
  'CANCLASS',
  None,
  None,
  None,
  None],
 [2,
  '9D257416-581A-4C42-85CC-B6EAD9DED97F',
  2,
  1315925633,
  '392904',
  1315925633,
  '392904',
  '{\n}',
  '2001',
  'B4',
  'Aboulafia, Sandy',
  '5',
  None,
  '44',
  'P',
  '45410.00',
  '0',
  '0',
  '45410.00'],
 [3,
  'B80D7891-93CF-49E8-86E8-182B618E68F2',
  3,
  1315925633,
  '392904',
  1315925633,
  '392904',
  '{\n}',
  '2001',
  '445',
  'Adams, Jackie R',
  '5',
  None,
  '7',
  'P',
  '11073.00',
  '0',
  '0',
  '11073.00'],
 [4,
  'BB012003-78F5-406D-8A87-7FF8A425EE3F',
  4,
  1315925633,
  '392904',
  1315
```

### Navigate to the 'data' key of your loaded json object. What data type is this?

```python
# Your code here
type(fin_json['data'])
```

```
list
```

### Preview the first entry from the value returned by the 'data' key above.

```python
# Your code here
fin_json['data'][0]
```

```
[1,
 'E3E9CC9F-7443-43F6-94AF-B5A0F802DBA1',
 1,
 1315925633,
 '392904',
 1315925633,
 '392904',
 '{\n  "invalidCells" : {\n    "1519001" : "TOTALPAY",\n    "1518998" : "PRIMARYPAY",\n    "1519000" : "RUNOFFPAY",\n    "1518999" : "GENERALPAY",\n    "1518994" : "OFFICECD",\n    "1518996" : "OFFICEDIST",\n    "1518991" : "ELECTION"\n  }\n}',
 None,
 'CANDID',
 'CANDNAME',
 None,
 'OFFICEBORO',
 None,
 'CANCLASS',
 None,
 None,
 None,
 None]
```
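The eighth value of each entry (index 7, the `meta` column) is not a nested object but a JSON document stored as a string, so it needs a second `json.loads()` call. The string below is a trimmed copy of the first entry's value, and `'{\n}'` is the value seen in the ordinary rows.

```python
import json

# Trimmed copy of the first entry's meta string (two of its seven cells).
meta_cell = '{\n  "invalidCells" : {\n    "1519001" : "TOTALPAY",\n    "1518998" : "PRIMARYPAY"\n  }\n}'

decoded = json.loads(meta_cell)   # string -> dict
print(decoded["invalidCells"])    # {'1519001': 'TOTALPAY', '1518998': 'PRIMARYPAY'}

# Ordinary rows carry an empty JSON object in this position.
print(json.loads('{\n}'))         # {}
```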

### Preview the entry under meta -> view -> columns (the keys of three successively nested dictionaries)

```python
fin_json['meta']['view']['columns']
```

```
[{'id': -1,
  'name': 'sid',
  'dataTypeName': 'meta_data',
  'fieldName': ':sid',
  'position': 0,
  'renderTypeName': 'meta_data',
  'format': {},
  'flags': ['hidden']},
 {'id': -1,
  'name': 'id',
  'dataTypeName': 'meta_data',
  'fieldName': ':id',
  'position': 0,
  'renderTypeName': 'meta_data',
  'format': {},
  'flags': ['hidden']},
 {'id': -1,
  'name': 'position',
  'dataTypeName': 'meta_data',
  'fieldName': ':position',
  'position': 0,
  'renderTypeName': 'meta_data',
  'format': {},
  'flags': ['hidden']},
 {'id': -1,
  'name': 'created_at',
  'dataTypeName': 'meta_data',
  'fieldName': ':created_at',
  'position': 0,
  'renderTypeName': 'meta_data',
  'format': {},
  'flags': ['hidden']},
 {'id': -1,
  'name': 'created_meta',
  'dataTypeName': 'meta_data',
  'fieldName': ':created_meta',
  'position': 0,
  'renderTypeName': 'meta_data',
  'format': {},
  'flags': ['hidden']},
 {'id': -1,
  'name': 'updated_at',
  'dataTypeName': 'meta_data',
  'fieldName': ':updated_at'
```

```python
# Pull just the 'name' field out of each column descriptor
[i['name'] for i in fin_json['meta']['view']['columns']]
```

```
['sid',
 'id',
 'position',
 'created_at',
 'created_meta',
 'updated_at',
 'updated_meta',
 'meta',
 'ELECTION',
 'CANDID',
 'CANDNAME',
 'OFFICECD',
 'OFFICEBORO',
 'OFFICEDIST',
 'CANCLASS',
 'PRIMARYPAY',
 'GENERALPAY',
 'RUNOFFPAY',
 'TOTALPAY']
```
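Since the column names and each data entry line up by position, `zip()` can pair them into a per-row dictionary without pandas. The inputs below are a hypothetical trimmed subset: four of the 19 columns, with values copied from the second data entry above.

```python
# Hypothetical trimmed inputs: four column descriptors and the matching
# values from the second data entry previewed above.
columns = [{"name": "ELECTION"}, {"name": "CANDID"}, {"name": "CANDNAME"}, {"name": "TOTALPAY"}]
row = ["2001", "B4", "Aboulafia, Sandy", "45410.00"]

names = [col["name"] for col in columns]
# zip() pairs each value with the column name at the same position
record = dict(zip(names, row))
print(record["CANDNAME"], record["TOTALPAY"])  # Aboulafia, Sandy 45410.00
```

Note that `zip()` stops at the shorter input, so a row with a missing value would be silently truncated rather than flagged.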

### Create a DataFrame from your json data
The previous two questions previewed one entry from the data object within the json file, as well as the column details associated with that data from the meta entry within the json file. Both should have 19 entries. Create a DataFrame of the data. Be sure to use the information from the meta entry to add appropriate column names to your DataFrame.
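Before building the DataFrame, a quick check that every entry has exactly as many values as there are column names (19 here) catches misaligned rows early. A minimal sketch on hypothetical data, with one row deliberately short:

```python
# Hypothetical miniature: column names plus rows, one of which is short.
names = ["ELECTION", "CANDID", "CANDNAME", "TOTALPAY"]
rows = [
    ["2001", "B4", "Aboulafia, Sandy", "45410.00"],
    ["2001", "445", "Adams, Jackie R"],  # deliberately missing a value
]

# Collect the positions of rows whose length disagrees with the header
bad_rows = [i for i, row in enumerate(rows) if len(row) != len(names)]
print(bad_rows)  # [1]
```

On the lab data the equivalent check would compare each entry of `fin_json['data']` against the list of column names.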

```python
# Your code here
cols = [i['name'] for i in fin_json['meta']['view']['columns']]
json_df = pd.DataFrame(fin_json['data'], columns=cols)
json_df
```

| | sid | id | position | created_at | created_meta | updated_at | updated_meta | meta | ELECTION | CANDID | CANDNAME | OFFICECD | OFFICEBORO | OFFICEDIST | CANCLASS | PRIMARYPAY | GENERALPAY | RUNOFFPAY | TOTALPAY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | E3E9CC9F-7443-43F6-94AF-B5A0F802DBA1 | 1 | 1315925633 | 392904 | 1315925633 | 392904 | {\n "invalidCells" : {\n "1519001" : "TOTA... | | CANDID | CANDNAME | | OFFICEBORO | | CANCLASS | | | | |
| 1 | 2 | 9D257416-581A-4C42-85CC-B6EAD9DED97F | 2 | 1315925633 | 392904 | 1315925633 | 392904 | {\n} | 2001 | B4 | Aboulafia, Sandy | 5 | | 44 | P | 45410.00 | 0 | 0 | 45410.00 |
| 2 | 3 | B80D7891-93CF-49E8-86E8-182B618E68F2 | 3 | 1315925633 | 392904 | 1315925633 | 392904 | {\n} | 2001 | 445 | Adams, Jackie R | 5 | | 7 | P | 11073.00 | 0 | 0 | 11073.00 |
| 3 | 4 | BB012003-78F5-406D-8A87-7FF8A425EE3F | 4 | 1315925633 | 392904 | 1315925633 | 392904 | {\n} | 2001 | HF | Addabbo, Joseph P | 5 | | 32 | P | 75350.00 | 73970.00 | 0 | 149320.00 |
| 4 | 5 | 945825F9-2F5D-47C2-A16B-75B93E61E1AD | 5 | 1315925633 | 392904 | 1315925633 | 392904 | {\n} | 2001 | IR | Alamo-Estrada, Agustin | 5 | | 14 | P | 25000.00 | 2400.00 | 0 | 27400.00 |
| 5 | 6 | 9546F502-39D6-4340-B37E-60682EB22274 | 6 | 1315925633 | 392904 | 1315925633 | 392904 | {\n} | 2001 | BR | Allen, William A | 5 | | 9 | P | 62990.00 | 0 | 0 | 62990.00 |
| 6 | 7 | 4B6C74AD-17A0-4B7E-973A-2592D68A687D | 7 | 1315925633 | 392904 | 1315925633 | 392904 | {\n} | 2001 | 454 | Alleyne, Alithia | 5 | | 40 | P | 0 | 0 | 0 | 0 |
| 7 | 8 | ABD22A5E-B8DA-446F-82BC-93AA11AF99DF | 8 | 1315925633 | 392904 | 1315925633 | 392904 | {\n} | 2001 | 375 | Alonso, Miguel | 5 | | 37 | P | 36121.00 | 680.00 | 0 | 36801.00 |
| 8 | 9 | 7CD36FB5-600F-44F5-A10C-CB3434B6805F | 9 | 1315925633 | 392904 | 1315925633 | 392904 | {\n} | 2001 | 241 | Andrews, Jr., Anthony D | 5 | | 28 | P | 68580.00 | 20156.00 | 0 | 88736.00 |
| 9 | 10 | 7C977007-BC4A-4A04-AE0E-B2946AFE7193 | 10 | 1315925633 | 392904 | 1315925633 | 392904 | {\n} | 2001 | 504 | Ariola, JoAnn | 5 | | 32 | P | 0 | 69816.00 | 0 | 69816.00 |


### What's wrong with the first row of the DataFrame?

```python
# Your code here
json_df.head(1)
```

| | sid | id | position | created_at | created_meta | updated_at | updated_meta | meta | ELECTION | CANDID | CANDNAME | OFFICECD | OFFICEBORO | OFFICEDIST | CANCLASS | PRIMARYPAY | GENERALPAY | RUNOFFPAY | TOTALPAY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | E3E9CC9F-7443-43F6-94AF-B5A0F802DBA1 | 1 | 1315925633 | 392904 | 1315925633 | 392904 | {\n "invalidCells" : {\n "1519001" : "TOTA... | | CANDID | CANDNAME | | OFFICEBORO | | CANCLASS | | | | |


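The first row is not a real record. Where a candidate's data should be, it holds the column names themselves (`CANDID`, `CANDNAME`, `OFFICEBORO`, `CANCLASS`), every other field is empty, and its `meta` cell lists exactly those empty columns as `invalidCells`. It is effectively a partial duplicate of the header and should be dropped before any analysis.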
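One way to handle this is to filter out any row whose `CANDID` value is the literal column name, then convert the payment fields, which arrive as strings such as `'45410.00'`, to numbers. A minimal pure-Python sketch on hypothetical trimmed rows (values copied from the entries previewed above):

```python
# Hypothetical trimmed rows: the first mimics the header-like record,
# the rest are ordinary candidates copied from the preview above.
names = ["CANDID", "CANDNAME", "PRIMARYPAY", "TOTALPAY"]
rows = [
    ["CANDID", "CANDNAME", None, None],
    ["B4", "Aboulafia, Sandy", "45410.00", "45410.00"],
    ["445", "Adams, Jackie R", "11073.00", "11073.00"],
]

records = [dict(zip(names, row)) for row in rows]
# Keep only rows whose CANDID is a real ID, not the literal column name
clean = [r for r in records if r["CANDID"] != "CANDID"]
# Payment fields are strings, so convert them before doing arithmetic
total = sum(float(r["TOTALPAY"]) for r in clean)
print(len(clean), total)  # 2 56483.0
```

In pandas the same steps would be `json_df[json_df['CANDID'] != 'CANDID']` for the filter and `pd.to_numeric(...)` on the payment columns.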

## Summary

Congratulations! You've started exploring some of the more complex data structures used on the web, and practiced data munging along the way!