# Working with JSON Files in Python
JSON (JavaScript Object Notation) is a lightweight, human-readable format for storing and exchanging data. JSON files are easy for machines to parse and generate, and the format is derived from the object syntax of the JavaScript programming language.

JSON files typically store data as key-value pairs within {}, similar to how a dictionary stores it in Python. Their major benefit is that they are language-independent, meaning they can be used with any programming language, be it Python, C or even Java!

This is how a JSON file looks:
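As a minimal sketch, the snippet below prints a small JSON document. The records reuse values from the marks tables shown later in this notebook; the exact layout of the Marks.json file itself is an assumption.

```python
import json

# Two records shaped like the marks data used later (the file layout is assumed)
sample = [
    {"id": "A001", "name": "RAHUL", "math": 60, "physics": 66, "chemistry": 61},
    {"id": "A002", "name": "NIBTA", "math": 89, "physics": 76, "chemistry": 51},
]

# indent=2 pretty-prints the JSON text so its structure is easy to read
json_text = json.dumps(sample, indent=2)
print(json_text)
```

The printed text starts with [ because the top-level value is an array, and each record appears as an object wrapped in {}.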




# Serialization and Deserialization
Serialization: The process of converting an object into a format suitable for transmitting over a network or storing in a file or database.

Deserialization: The reverse of serialization. It converts the serialized format back into a usable object.

In the case of JSON, serializing converts a Python object into a JSON string, and deserializing rebuilds the Python object from its JSON string representation.

Python provides a built-in module called json for serializing and deserializing objects. To use the json module, import it with the statement import json.

The json module mainly provides the following functions for serializing and deserializing:

- dump(obj, fileobj)
- dumps(obj)
- load(fileobj)
- loads(s)
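The four functions pair up: dumps() and loads() work with strings, while dump() and load() work with file-like objects. A minimal sketch, using an in-memory io.StringIO buffer in place of a real file (the sample record is made up for illustration):

```python
import io
import json

record = {"id": "A001", "math": 60}   # hypothetical sample data

# String-based pair: dumps() serializes, loads() deserializes
text = json.dumps(record)
from_text = json.loads(text)

# File-based pair: dump() writes to a file-like object, load() reads from one
buffer = io.StringIO()
json.dump(record, buffer)
buffer.seek(0)                        # rewind before reading back
from_file = json.load(buffer)

print(text)
print(from_text == record, from_file == record)
```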

In [None]:
#Load

# Deserializing with load()
The load() function deserializes a JSON document from a file-like object and returns the resulting Python object.

Its syntax is as follows:

load(fp) -> a Python object

In [1]:
import json
import pandas as pd

# open json file
with open('C:\\Users\\hcluser1\\Data_J_X\\Marks.json','r') as file:
    data = json.load(file)

# the top-level JSON value in this file is an array, so data is a list
print(type(data))

# loading into a DataFrame
df_json = pd.DataFrame(data)
df_json

<class 'list'>


     id    name  math  physics  chemistry
0  A001   RAHUL    60       66         61
1  A002   NIBTA    89       76         51
2  A003  SURESH    79       90         78


In [2]:
df_json.columns

Index(['id', 'name', 'math', 'physics', 'chemistry'], dtype='object')

You can also load the JSON file directly into a DataFrame using the pandas.read_json() function, as shown below:

In [4]:
# reading directly into a DataFrame using pd.read_json()
path = 'C:\\Users\\hcluser1\\Data_J_X\\Marks.json'
df = pd.read_json(path)
df

     id    name  math  physics  chemistry
0  A001   RAHUL    60       66         61
1  A002   NIBTA    89       76         51
2  A003  SURESH    79       90         78


In [3]:
# Flattening a nested list from a JSON object

In [5]:
df = pd.read_json('C:\\Users\\hcluser1\\Data_J_X\\Marks_Details_Nested.json')
df

          school_name   class                                           students
0  ABC primary school  Year 1  {'id': 'A001', 'name': 'RAHUL', 'math': 60, 'p...
1  ABC primary school  Year 1  {'id': 'A002', 'name': 'NIBITA', 'math': 89, '...
2  ABC primary school  Year 1  {'id': 'A003', 'name': 'SURESH', 'math': 79, '...


# loads()
The loads() function is the same as load(), but instead of deserializing JSON from a file, it deserializes it from a string.

In [6]:
import json
# load data using Python JSON module
with open('C:\\Users\\hcluser1\\Data_J_X\\Marks_Details_Nested.json','r') as f:
    data = json.loads(f.read())
# Flatten data
df_nested_list = pd.json_normalize(data, record_path =['students'])

df_nested_list

     id    name  math  physics  chemistry
0  A001   RAHUL    60       66         61
1  A002  NIBITA    89       76         51
2  A003  SURESH    79       90         78


In [17]:
# To include school_name and class
df_nested_list = pd.json_normalize(
    data, 
    record_path =['students'], 
    meta=['school_name', 'class']
)

df_nested_list

     id    name  math  physics  chemistry         school_name   class
0  A001   RAHUL    60       66         61  ABC primary school  Year 1
1  A002  NIBITA    89       76         51  ABC primary school  Year 1
2  A003  SURESH    79       90         78  ABC primary school  Year 1
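Conceptually, record_path selects the nested list to expand into rows, and meta copies the listed top-level fields onto every row. The plain-Python sketch below mimics that result for a document shaped like the nested file above. It only illustrates the idea and is not how pandas implements json_normalize; the inline document structure is an assumption inferred from the tables.

```python
# Assumed shape of the nested document (values taken from the tables above)
data = {
    "school_name": "ABC primary school",
    "class": "Year 1",
    "students": [
        {"id": "A001", "name": "RAHUL", "math": 60, "physics": 66, "chemistry": 61},
        {"id": "A002", "name": "NIBITA", "math": 89, "physics": 76, "chemistry": 51},
    ],
}

record_path = "students"
meta = ["school_name", "class"]

# One flat row per nested record, with the meta fields appended to each row
rows = [
    {**student, **{key: data[key] for key in meta}}
    for student in data[record_path]
]

for row in rows:
    print(row)
```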


# json_string and parsed_json

One of the most common tasks performed on JSON is converting it to a Python object. The json library provides the loads() function for this. In the example below, we take a JSON string (json_string) and convert it to a Python object (parsed_json).

In [20]:
import json
# XML equivalent of json_string
""" 
<user>
    <first_name>Guido</first_name>
    <last_name>Rossum</last_name>
</user>
"""
json_string = '{"first_name": "Guido", "last_name":"Rossum"}'

parsed_json = json.loads(json_string)
print(type(parsed_json), parsed_json)

<class 'dict'> {'first_name': 'Guido', 'last_name': 'Rossum'}


In [None]:
# We used the loads() function to convert a JSON string to a Python object. As parsed_json is a dictionary,
# let's read its individual elements.

In [21]:
print(parsed_json['first_name'],
      parsed_json['last_name'])

Guido Rossum


In [24]:
# We can also traverse the dictionary using a for loop.

for key, value in parsed_json.items():
    print(key, "=>", value)

first_name => Guido
last_name => Rossum
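loads() maps each JSON type to a Python counterpart: objects become dict, arrays become list, strings become str, numbers become int or float, true and false become bool, and null becomes None. A short sketch with a made-up JSON string:

```python
import json

# Hypothetical JSON text that uses every JSON value type
json_string = (
    '{"name": "Guido", "age": 27, "score": 9.5, '
    '"tags": ["a", "b"], "active": true, "nickname": null}'
)

parsed = json.loads(json_string)

# Print each key with the Python type its value was converted to
for key, value in parsed.items():
    print(key, "=>", type(value).__name__)
```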


In [7]:
#Import from URL

import pandas as pd

url = 'https://raw.githubusercontent.com/werowe/logisticRegressionBestModel/master/ct1.json'

dfct = pd.read_json(url, lines=True)

In [8]:
dfct

    state  postcode              street  district  unit                                           location     region  number        city
0      CT      6457     Country Club Rd                  {'type': 'Point', 'coordinates': [-72.7277847,...  Middlesex    1111  Middletown
1      CT      6037           Parish Dr                  {'type': 'Point', 'coordinates': [-72.7738706,...   Hartford      51      Berlin
2      CT      6037  Stockings Brook Rd                  {'type': 'Point', 'coordinates': [-72.8102478,...   Hartford      90      Berlin
3      CT      6037      Lamentation Dr                  {'type': 'Point', 'coordinates': [-72.7450054,...   Hartford      99      Berlin
4      CT      6037      Lamentation Dr                  {'type': 'Point', 'coordinates': [-72.7406975,...   Hartford     207      Berlin
..    ...       ...                 ...       ...   ...                                                ...        ...     ...         ...
95     CT      6037         Four Rod Rd                  {'type': 'Point', 'coordinates': [-72.7676758,...   Hartford     160      Berlin
96     CT      6037        Percival Ave                  {'type': 'Point', 'coordinates': [-72.7812195,...   Hartford     147      Berlin
97     CT      6037        Robindale Dr                  {'type': 'Point', 'coordinates': [-72.7849181,...   Hartford     131      Berlin
98     CT      6037      Sugar Maple Ln                  {'type': 'Point', 'coordinates': [-72.7689817,...   Hartford      43      Berlin


In [27]:
#Notice that Pandas did not unwind the location JSON object.

In [28]:
# We turn the elements in location into a list and then construct a DataFrame from it


pd.DataFrame(list(dfct['location']))

     type                coordinates
0   Point  [-72.7277847, 41.5692709]
1   Point  [-72.7738706, 41.6332836]
2   Point  [-72.8102478, 41.5992734]
3   Point  [-72.7450054, 41.5991937]
4   Point  [-72.7406975, 41.5992867]
..    ...                        ...
95  Point  [-72.7676758, 41.6267272]
96  Point  [-72.7812195, 41.6267528]
97  Point  [-72.7849181, 41.6266921]
98  Point  [-72.7689817, 41.6266608]
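The lines=True argument used above tells pandas that the source holds one JSON object per line (often called JSON Lines). The sketch below parses such text with the json module alone and unpacks the nested location object by hand. The two sample lines are trimmed-down, hypothetical versions of the first rows shown above, and the code assumes the coordinates are ordered [longitude, latitude], as is conventional for GeoJSON points.

```python
import json

# Hypothetical JSON Lines text: one complete JSON object per line
jsonl_text = (
    '{"street": "Country Club Rd", "city": "Middletown", '
    '"location": {"type": "Point", "coordinates": [-72.7277847, 41.5692709]}}\n'
    '{"street": "Parish Dr", "city": "Berlin", '
    '"location": {"type": "Point", "coordinates": [-72.7738706, 41.6332836]}}\n'
)

rows = []
for line in jsonl_text.splitlines():
    record = json.loads(line)                      # parse each line independently
    location = record.pop("location")              # remove the nested object
    longitude, latitude = location["coordinates"]  # assumed [lon, lat] order
    record.update(type=location["type"], longitude=longitude, latitude=latitude)
    rows.append(record)

for row in rows:
    print(row)
```

Each resulting row is a flat dictionary, so the nested values end up as ordinary columns if the rows are later passed to a DataFrame.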


# Parse JSON from URL
You can get JSON objects directly from the web and convert them to Python objects. This is typically done by requesting an API endpoint.

In [45]:
import json
import urllib.request

# download raw json object
url = "https://api.gdax.com/products/BTC-EUR/ticker"
data = urllib.request.urlopen(url).read().decode()

# parse json object
obj = json.loads(data)

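# output some object attributes
# the BTC-EUR price is quoted in euros and the volume is in BTC
print('Price (EUR): ' + obj['price'])
print('Volume (BTC): ' + obj['volume'])

Price (EUR): 47174.02
Volume (BTC): 1542.2826956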
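Because the code above concatenates obj['price'] with a string, the API evidently returns its numeric fields as JSON strings. A minimal offline sketch of converting such fields to numbers follows; the payload is a hypothetical stand-in that reuses the values printed above, not a real response.

```python
import json
from decimal import Decimal

# Hypothetical payload shaped like the ticker output above
payload = '{"price": "47174.02", "volume": "1542.2826956"}'

obj = json.loads(payload)

# Decimal keeps the exact decimal digits, which suits monetary values
price = Decimal(obj["price"])
volume = Decimal(obj["volume"])

print(type(obj["price"]).__name__, price, volume)
```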


# Serializing with dump()
The dump() function is used to serialize data. It takes a Python object, serializes it, and writes the output (a JSON string) to a file-like object.

The syntax of dump() function is as follows:

Syntax: dump(obj, fp)

In [36]:
import json

person = {
    "first_name": "John",
    "isAlive": True,
    "age": 27,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021-3100"
    },
    "hasMortgage": None
}

with open('C:\\Users\\hcluser1\\Data_J_X\\person.json', 'w') as f:  # writing JSON object
    json.dump(person, f)

The dump() function writes the dictionary directly to a file as JSON text, without you having to convert it into a JSON string first.

In [37]:
open('C:\\Users\\hcluser1\\Data_J_X\\person.json', 'r').read()   # reading JSON object as string

'{"first_name": "John", "isAlive": true, "age": 27, "address": {"streetAddress": "21 2nd Street", "city": "New York", "state": "NY", "postalCode": "10021-3100"}, "hasMortgage": null}'

In [38]:
type(open('C:\\Users\\hcluser1\\Data_J_X\\person.json', 'r').read())

str
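The output above also shows how Python values are translated: True becomes true and None becomes null. dump() additionally accepts formatting options such as indent and sort_keys. A small sketch that writes to a temporary file (the temporary path and the shortened dictionary are placeholders, not the notebook's files):

```python
import json
import os
import tempfile

person = {"first_name": "John", "isAlive": True, "hasMortgage": None}

# Write to a throwaway file so the example does not depend on a fixed path
path = os.path.join(tempfile.mkdtemp(), "person.json")
with open(path, "w") as f:
    json.dump(person, f, indent=4, sort_keys=True)   # pretty-printed, keys sorted

with open(path, "r") as f:
    written = f.read()

print(written)
```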

# dumps()
The dumps() function works exactly like dump() but instead of sending the output to a file-like object, it returns the output as a string.

In [39]:
import json

person = {
    "first_name": "John",
    "isAlive": True,
    "age": 27,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021-3100"
    },
    "hasMortgage": None
}

In [46]:
json_object = json.dumps(person)   # serialize

In [47]:
json_object

'{"first_name": "John", "isAlive": true, "age": 27, "address": {"streetAddress": "21 2nd Street", "city": "New York", "state": "NY", "postalCode": "10021-3100"}, "hasMortgage": null}'

In [48]:
person = json.loads(json_object)  # deserialize from string

In [49]:
person

{'first_name': 'John',
 'isAlive': True,
 'age': 27,
 'address': {'streetAddress': '21 2nd Street',
  'city': 'New York',
  'state': 'NY',
  'postalCode': '10021-3100'},
 'hasMortgage': None}

In [44]:
type(person)

dict
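A round trip through dumps() and loads() returns an equal object here because the dictionary only uses types that JSON can represent directly. That is not always the case: JSON has no tuple type, and its object keys are always strings. A short sketch with made-up data:

```python
import json

original = {1: "one", "point": (3, 4)}   # int key and tuple value

restored = json.loads(json.dumps(original))

print(restored)                 # {'1': 'one', 'point': [3, 4]}
print(restored == original)     # False
```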

In [50]:
# Writing to sample.json
with open("C:\\Users\\hcluser1\\Data_J_X\\sample.json", "w") as outfile:
    outfile.write(json_object)

# Convert JSON file to CSV


In [9]:
import pandas as pd
df = pd.read_json('C:\\Users\\hcluser1\\Data_J_X\\Marks.json')
# index=False omits the row index; to_csv returns None when given a path, so nothing is assigned
df.to_csv('C:\\Users\\hcluser1\\Data_J_X\\New_Marks_J_C_1.csv', index=False, header=True)

# Convert CSV file to JSON

In [51]:
import pandas as pd
df = pd.read_csv('C:\\Users\\hcluser1\\Data_J_X\\New_Marks.csv')
df.to_json('C:\\Users\\hcluser1\\Data_J_X\\Marks_C_J.json')

In [52]:
import pandas as pd
data = pd.read_csv("C:\\Users\\hcluser1\\Data_J_X\\New_Marks.csv", sep=",")
print(data.head(2))
with open('C:\\Users\\hcluser1\\Data_J_X\\New_Marks_C_J.json', 'w') as f:
    f.write(data.to_json(orient='records', lines=True))

# check
data = pd.read_json("C:\\Users\\hcluser1\\Data_J_X\\New_Marks_C_J.json", lines=True)
print(data.head(2))

     id   name  math  physics  chemistry
0  A001  RAHUL    60       66         61
1  A002  NIBTA    89       76         51
     id   name  math  physics  chemistry
0  A001  RAHUL    60       66         61
1  A002  NIBTA    89       76         51
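The same conversion can be sketched with only the standard library, which may help when pandas is not available. This is an alternative to the pandas approach above, not a replacement for it, and the CSV text is an inline sample mirroring the rows printed above. Note that csv.DictReader yields every value as a string, so the numeric columns are converted explicitly.

```python
import csv
import io
import json

# Inline sample mirroring the first rows of the marks data
csv_text = "id,name,math,physics,chemistry\nA001,RAHUL,60,66,61\nA002,NIBTA,89,76,51\n"

numeric_columns = ("math", "physics", "chemistry")

records = []
for row in csv.DictReader(io.StringIO(csv_text)):
    for column in numeric_columns:
        row[column] = int(row[column])      # CSV values arrive as strings
    records.append(row)

# One JSON object per line, similar in spirit to orient='records', lines=True
json_lines = "\n".join(json.dumps(record) for record in records)
print(json_lines)
```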
