# How to convert JSON into a Pandas DataFrame

##### Based on the article written by B. Chen (https://towardsdatascience.com/how-to-convert-json-into-a-pandas-dataframe-100b2ae1e0d8).

## Begin by reading simple JSON from a local file

In [4]:
import pandas as pd
df = pd.read_json('data/simple.json')
df

     id   name  math  physics  chemistry
0  A001    Tom    60       66         61
1  A002  James    89       76         51
2  A003  Jenny    79       90         78


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   id         3 non-null      object
 1   name       3 non-null      object
 2   math       3 non-null      int64 
 3   physics    3 non-null      int64 
 4   chemistry  3 non-null      int64 
dtypes: int64(3), object(2)
memory usage: 248.0+ bytes
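
To make the expected input shape concrete, here is a minimal sketch that builds the same table from an in-memory string. The JSON text is an assumption reconstructed from the output above (the actual contents of `data/simple.json` are not shown here), and `io.StringIO` stands in for the file on disk.

```python
from io import StringIO

import pandas as pd

# Assumed contents of a flat JSON file: a list of records with scalar values.
simple_json = """
[
  {"id": "A001", "name": "Tom",   "math": 60, "physics": 66, "chemistry": 61},
  {"id": "A002", "name": "James", "math": 89, "physics": 76, "chemistry": 51},
  {"id": "A003", "name": "Jenny", "math": 79, "physics": 90, "chemistry": 78}
]
"""

# read_json accepts a path, a URL, or a file-like object such as StringIO.
df_simple = pd.read_json(StringIO(simple_json))
print(df_simple)
```

Each record becomes one row and each key becomes one column, which is why no flattening step is needed for this kind of file.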


## Reading simple JSON from a URL

In [6]:
URL = 'http://raw.githubusercontent.com/BindiChen/machine-learning/master/data-analysis/027-pandas-convert-json/data/simple.json'

df = pd.read_json(URL)
df

     id   name  math  physics  chemistry
0  A001    Tom    60       66         61
1  A002  James    89       76         51
2  A003  Jenny    79       90         78


In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   id         3 non-null      object
 1   name       3 non-null      object
 2   math       3 non-null      int64 
 3   physics    3 non-null      int64 
 4   chemistry  3 non-null      int64 
dtypes: int64(3), object(2)
memory usage: 248.0+ bytes


## Flattening nested list from JSON object

In [11]:
df = pd.read_json('data/nested_list.json')
df

          school_name   class                                           students
0  ABC primary school  Year 1  {'id': 'A001', 'name': 'Tom', 'math': 60, 'phy...
1  ABC primary school  Year 1  {'id': 'A002', 'name': 'James', 'math': 89, 'p...
2  ABC primary school  Year 1  {'id': 'A003', 'name': 'Jenny', 'math': 79, 'p...


In [12]:
import json

# Load the data with Python's json module (the file lives in the data/ folder)
with open('data/nested_list.json', 'r') as f:
    data = json.load(f)

# Flatten the list under 'students' into one row per student
df_nested_list = pd.json_normalize(data, record_path=['students'])
df_nested_list

This gives one row per student, but the result doesn’t include school_name and class. To include them, pass the meta argument with a list of the metadata fields to carry into each row.

In [13]:
# To include school_name and class in every student row
df_nested_list = pd.json_normalize(
    data,
    record_path=['students'],
    meta=['school_name', 'class']
)
df_nested_list
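
The effect of `record_path` and `meta` can also be checked without the file. The sketch below uses an inline dictionary whose structure is an assumption inferred from the output shown earlier (it is not a copy of `nested_list.json`), and the variable names `students_only` and `with_meta` are only illustrative.

```python
import pandas as pd

# Assumed structure: top-level metadata plus a list of student records.
data = {
    "school_name": "ABC primary school",
    "class": "Year 1",
    "students": [
        {"id": "A001", "name": "Tom", "math": 60, "physics": 66, "chemistry": 61},
        {"id": "A002", "name": "James", "math": 89, "physics": 76, "chemistry": 51},
        {"id": "A003", "name": "Jenny", "math": 79, "physics": 90, "chemistry": 78},
    ],
}

# record_path alone: one row per student, no top-level fields.
students_only = pd.json_normalize(data, record_path=["students"])

# Adding meta repeats the listed top-level fields on every student row.
with_meta = pd.json_normalize(
    data,
    record_path=["students"],
    meta=["school_name", "class"],
)

print(list(students_only.columns))
print(list(with_meta.columns))
```

The metadata columns are appended after the record columns, so the student fields come first in the flattened frame.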

## Flattening nested list and dict from JSON object

Trying to read this JSON with read_json() raises an error, so the call below is left commented out.

In [14]:
#read_error = pd.read_json('data/nested_mix.json')

In [15]:
# Use json_normalize() instead
import json

# Load the data with Python's json module
with open('data/nested_mix.json', 'r') as f:
    data = json.load(f)

# Flatten the students list and pull in top-level and nested metadata;
# a nested field is given as a list of keys forming the path to it
df = pd.json_normalize(
    data,
    record_path=['students'],
    meta=[
        'class',
        ['info', 'president'],
        ['info', 'contacts', 'tel']
    ]
)
df

     id   name  math  physics  chemistry   class  info.president  info.contacts.tel
0  A001    Tom    60       66         61  Year 1     John Kasich          123456789
1  A002  James    89       76         51  Year 1     John Kasich          123456789
2  A003  Jenny    79       90         78  Year 1     John Kasich          123456789
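
As the column names above show, each nested meta path is joined with a dot. The following sketch illustrates that behaviour on inline data and uses the `sep` parameter of `json_normalize()` to change the separator. The dictionary is an assumed, trimmed-down stand-in for `nested_mix.json`, containing only the fields that appear in the output.

```python
import pandas as pd

# Assumed structure: a nested 'info' dict next to the list of students.
data_mix = {
    "class": "Year 1",
    "info": {
        "president": "John Kasich",
        "contacts": {"tel": "123456789"},
    },
    "students": [
        {"id": "A001", "name": "Tom", "math": 60, "physics": 66, "chemistry": 61},
        {"id": "A002", "name": "James", "math": 89, "physics": 76, "chemistry": 51},
        {"id": "A003", "name": "Jenny", "math": 79, "physics": 90, "chemistry": 78},
    ],
}

df_mix = pd.json_normalize(
    data_mix,
    record_path=["students"],
    meta=[
        "class",
        ["info", "president"],         # path into a nested dict
        ["info", "contacts", "tel"],   # path two levels deep
    ],
    sep="_",  # join path segments with '_' instead of the default '.'
)

print(list(df_mix.columns))
```

Using an underscore as the separator can be convenient when the columns are later accessed as attributes, since names containing dots cannot be used that way.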


## Extracting a single value from deeply nested JSON

Pandas json_normalize() does most of the work when flattening nested data from a JSON file. However, it flattens the entire structure even when the goal is to extract a single value.
For that case, a more direct approach is to combine read_json() with glom, a third-party library that looks up a value by a dotted path such as 'grade.math'.

In [18]:
from glom import glom

df = pd.read_json('data/nested_deep.json')
df['students'].apply(lambda row: glom(row, 'grade.math'))

0    60
1    89
2    79
Name: students, dtype: int64
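
If installing glom is not an option, a similar lookup can be sketched with the standard library alone. `get_path` below is a hypothetical helper, not part of pandas or glom, and the inline records are an assumed shape for the `students` column of `nested_deep.json`. It only handles nested dictionaries, which is far less than glom supports.

```python
from functools import reduce

import pandas as pd


def get_path(obj, dotted_path):
    """Walk a dotted path such as 'grade.math' through nested dicts."""
    return reduce(lambda current, key: current[key], dotted_path.split("."), obj)


# Assumed shape of the 'students' column: one nested dict per row.
students = pd.Series(
    [
        {"id": "A001", "name": "Tom", "grade": {"math": 60, "physics": 66, "chemistry": 61}},
        {"id": "A002", "name": "James", "grade": {"math": 89, "physics": 76, "chemistry": 51}},
        {"id": "A003", "name": "Jenny", "grade": {"math": 79, "physics": 90, "chemistry": 78}},
    ],
    name="students",
)

# Extract a single nested value per row without flattening everything.
math_scores = students.apply(lambda row: get_path(row, "grade.math"))
print(math_scores)
```

A missing key raises `KeyError` in this sketch, so data with optional fields would need extra handling.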

### Revision

The Pandas read_json() function is a quick and convenient way to convert simple, flat JSON into a Pandas DataFrame. For nested JSON, use the built-in json_normalize() function, and consider glom when only a single deeply nested value is needed.