# Change JSON to JSON-in-CSV Format
The target file must contain one JSON document per line, with each document stored as a quoted string.

## Load libraries

In [12]:
import json
import csv
import os

## Load data

In [2]:
f = open('./rawJSON.json')
jsonString = json.load(f)

Loading the file with `json.load()` returns a Python list of dictionaries, one per JSON object.
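A minimal sketch of what that looks like, using a tiny inline document instead of `rawJSON.json`. The two records and the names `raw_text` and `records` are made up for illustration.

```python
import json

# Hypothetical stand-in for the contents of rawJSON.json
raw_text = (
    '[{"_id": "a1", "isActive": false, "tags": ["x", "y"]},'
    ' {"_id": "b2", "isActive": true, "tags": []}]'
)

# json.loads parses a string; json.load(f) does the same for an open file
records = json.loads(raw_text)

print(type(records).__name__)     # the top-level JSON array becomes a list
print(type(records[0]).__name__)  # each JSON object becomes a dict
print(records[0]["isActive"])     # JSON false becomes Python False
```

Note that JSON `true`/`false` turn into Python `True`/`False` here, which matters when the dictionaries are converted back to text.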

In [10]:
print(jsonString[0])

{'_id': '622a246457385c7597960660', 'index': 0, 'guid': '3d628606-766f-4dfa-abff-e61750547b12', 'isActive': False, 'balance': '$3,405.09', 'picture': 'http://placehold.it/32x32', 'age': 24, 'eyeColor': 'brown', 'name': 'Terry Harvey', 'gender': 'male', 'company': 'TROLLERY', 'email': 'terryharvey@trollery.com', 'phone': '+1 (892) 485-3715', 'address': '919 High Street, Enetai, Virgin Islands, 3063', 'about': 'Consequat consectetur mollit nulla cupidatat. Est aliquip cupidatat mollit non in voluptate deserunt irure veniam voluptate amet reprehenderit est irure. Ullamco id ullamco eu deserunt consectetur. Veniam deserunt qui elit ipsum Lorem non do dolor commodo mollit do pariatur.\r\n', 'registered': '2017-08-14T08:43:35 +05:00', 'latitude': 68.666876, 'longitude': -130.039902, 'tags': ['id', 'anim', 'consequat', 'cupidatat', 'qui', 'pariatur', 'sint'], 'friends': [{'id': 0, 'name': 'Hillary Brooks'}, {'id': 1, 'name': 'Johns Vega'}, {'id': 2, 'name': 'Sexton Chambers'}], 'greeting': 'H

## Write to csv
With the JSON loaded as a list of dictionaries, all we need to do is go through the list item by item, convert each item into a string, and write it to a CSV file, with a few small adjustments.

For the CSV to treat the JSON as a single field, we wrap it in double quotes.

For the CSV to treat each JSON string as a new row, we use `os.linesep` to add a line break at the end of the JSON string.

For BigQuery to read the string as JSON, every quote inside it must be a double quote. Within the quoted CSV field, each of those is written as two double quotes.
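The quoting rules above can be sketched for a single record as follows. This is an illustration under assumptions: the helper name `to_csv_field` and the sample record are hypothetical, and `json.dumps` is used here to produce the double-quoted JSON text.

```python
import csv
import io
import json

def to_csv_field(record):
    """Serialize one record as JSON and quote it as a single CSV field."""
    json_text = json.dumps(record)          # valid JSON: double quotes, true/false/null
    escaped = json_text.replace('"', '""')  # CSV escapes a double quote by doubling it
    return '"' + escaped + '"'

# Hypothetical record, not taken from the data set
record = {"name": "Pat O'Neil", "isActive": False, "tags": ["a", "b"]}
line = to_csv_field(record)
print(line)

# Round trip: a CSV reader should see exactly one field holding the original JSON
row = next(csv.reader(io.StringIO(line + "\r\n")))
assert len(row) == 1
assert json.loads(row[0]) == record
```

The round trip at the end is a cheap way to confirm that commas and apostrophes inside the data do not split or corrupt the field.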

In [59]:
with open('./csvJSON.csv', 'w', newline='', encoding='utf-8') as csvFile:
    for i in jsonString:
        print(i)
        # Serialize the dictionary as valid JSON (double quotes, true/false/null).
        # str(i) would give Python syntax and break on apostrophes inside values.
        jsonStr = json.dumps(i)
        # Escape every double quote for CSV by doubling it
        jsonStrQ = jsonStr.replace('"', '""')
        # Wrap the field in double quotes so the CSV treats it as one text value
        csvFile.write('"' + jsonStrQ + '"' + os.linesep)
# Close the file connection to the original JSON
f.close()

{'_id': '622a246457385c7597960660', 'index': 0, 'guid': '3d628606-766f-4dfa-abff-e61750547b12', 'isActive': False, 'balance': '$3,405.09', 'picture': 'http://placehold.it/32x32', 'age': 24, 'eyeColor': 'brown', 'name': 'Terry Harvey', 'gender': 'male', 'company': 'TROLLERY', 'email': 'terryharvey@trollery.com', 'phone': '+1 (892) 485-3715', 'address': '919 High Street, Enetai, Virgin Islands, 3063', 'about': 'Consequat consectetur mollit nulla cupidatat. Est aliquip cupidatat mollit non in voluptate deserunt irure veniam voluptate amet reprehenderit est irure. Ullamco id ullamco eu deserunt consectetur. Veniam deserunt qui elit ipsum Lorem non do dolor commodo mollit do pariatur.\r\n', 'registered': '2017-08-14T08:43:35 +05:00', 'latitude': 68.666876, 'longitude': -130.039902, 'tags': ['id', 'anim', 'consequat', 'cupidatat', 'qui', 'pariatur', 'sint'], 'friends': [{'id': 0, 'name': 'Hillary Brooks'}, {'id': 1, 'name': 'Johns Vega'}, {'id': 2, 'name': 'Sexton Chambers'}], 'greeting': 'H
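Before uploading, it can help to read the file back and confirm that every row is a single field containing parseable JSON. The following is a sketch only: `validate_json_csv` is a hypothetical helper, and the demonstration runs on a throwaway file with two hand-written rows rather than on `csvJSON.csv`.

```python
import csv
import json
import os
import tempfile

def validate_json_csv(path):
    """Return the row count; raise ValueError if a row is not one JSON field."""
    count = 0
    with open(path, newline='', encoding='utf-8') as csv_file:
        for row_no, row in enumerate(csv.reader(csv_file), start=1):
            if len(row) != 1:
                raise ValueError(f"row {row_no}: expected 1 field, found {len(row)}")
            try:
                json.loads(row[0])
            except json.JSONDecodeError as err:
                raise ValueError(f"row {row_no}: invalid JSON ({err})") from None
            count += 1
    return count

# Demonstration on a temporary file
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
with open(demo_path, 'w', newline='', encoding='utf-8') as demo:
    demo.write('"{""_id"": ""a1"", ""isActive"": false}"\r\n')
    demo.write('"{""_id"": ""b2"", ""tags"": [""x"", ""y""]}"\r\n')

row_count = validate_json_csv(demo_path)
print(row_count)
```

Calling `validate_json_csv('./csvJSON.csv')` on the real output would apply the same check to the file written above.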

# Loading to BigQuery

The instructions use the bq command-line tool. Unfortunately, creating a schema from JSON with the command-line tool is a pain, especially on Windows: bq pulls the JSON neither in UTF-8 nor in newline-delimited form, which is a requirement for loading a schema to BigQuery.

Therefore, it's much easier to create an empty table with ``create table if not exists `<project>.<dataset>.csv_json_dummy_users` (json JSON)``

This creates a table with a single JSON column. Now we can use either the console or the bq command-line tool to upload the data.

With the command-line tool: `bq load --replace=false <project>:<dataset>.csv_json_dummy_users ./csvJSON.csv`


In [8]:
from google.cloud import bigquery

# Replace with your values

project = 'project'
dataset = 'dataset'
table = 'csv_json_dummy_users'


# Construct a BigQuery client object.
client = bigquery.Client(project=project)

query = f'''
create table if not exists `{project}.{dataset}.{table}` (json JSON)
'''

client.query(query)


<google.cloud.bigquery.job.QueryJob at 0x2835f7bdb00>

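`client.query()` returns a `QueryJob` object rather than data. Calling `.result()` on that object waits for the statement to finish and raises an error if it failed.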

Now we can upload the data as indicated above.

After upload, run the queries shown in the article.

In [20]:
import pandas

query = f'''
    select
        # Removing quotes
        json._id as quoted_json,
        json_value(json._id) as unquoted_json,
    from `{project}.{dataset}.{table}`
'''

query_df = client.query(query).to_dataframe()

In [23]:
query_df.head(5)

| | quoted_json | unquoted_json |
|---|---|---|
| 0 | "622a246457385c7597960660" | 622a246457385c7597960660 |
| 1 | "622a2464b45b63d2d38e8315" | 622a2464b45b63d2d38e8315 |
| 2 | "622a246429e1871909511ea1" | 622a246429e1871909511ea1 |
| 3 | "622a246495e8b42ee77cf3e9" | 622a246495e8b42ee77cf3e9 |
| 4 | "622a246455d1fcda294a7b38" | 622a246455d1fcda294a7b38 |
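The two columns differ in the same way a JSON-encoded string differs from the value it encodes. As a local analogy in plain Python (this does not call BigQuery, and the variable names are only illustrative):

```python
import json

value = "622a246457385c7597960660"  # the _id of the first record

quoted = json.dumps(value)     # JSON text of a string keeps its double quotes
unquoted = json.loads(quoted)  # decoding yields the bare scalar, similar to json_value

print(quoted)
print(unquoted)
```

The first print shows the value with its surrounding double quotes, the second shows it without them.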


In [27]:
query = f'''
    select
        # Parsing nested data
        json.friends as friends_json,
        json.friends.name, # returns null
        array((select f.name from unnest(json_query_array(json.friends)) as f)) as friends_array
    from `{project}.{dataset}.{table}`
'''

query_df = client.query(query).to_dataframe()

query_df.head(5)

| | friends_json | name | friends_array |
|---|---|---|---|
| 0 | [{"id":0,"name":"Hillary Brooks"},{"id":1,"nam... | | ["Hillary Brooks", "Johns Vega", "Sexton Chamb... |
| 1 | [{"id":0,"name":"Lillian Sims"},{"id":1,"name"... | | ["Lillian Sims", "Harvey Mcgowan", "Lindsey Co... |
| 2 | [{"id":0,"name":"Nelson Day"},{"id":1,"name":"... | | ["Nelson Day", "Fran Knapp", "Luisa Jenkins"] |
| 3 | [{"id":0,"name":"Rodgers Nunez"},{"id":1,"name... | | ["Rodgers Nunez", "Jerry Ramos", "Schultz Schu... |
| 4 | [{"id":0,"name":"Jarvis Ball"},{"id":1,"name":... | | ["Jarvis Ball", "Rhonda Ferguson", "Colon Kell... |
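`json.friends` is an array, so there is no `name` field on it directly; the names live on its elements, which is why the query unnests the array before selecting `f.name`. The same shape in plain Python, as a rough analogy rather than a description of how BigQuery evaluates the query (`get_field` is a hypothetical helper):

```python
# The friends of the first record, as shown in the raw data
friends = [
    {"id": 0, "name": "Hillary Brooks"},
    {"id": 1, "name": "Johns Vega"},
    {"id": 2, "name": "Sexton Chambers"},
]
record = {"friends": friends}

def get_field(node, key):
    """Field access that yields None unless node is an object holding that key."""
    return node.get(key) if isinstance(node, dict) else None

# Asking the array itself for "name" finds nothing
direct = get_field(record["friends"], "name")

# Visiting each element first, like unnest, finds every name
names = [get_field(friend, "name") for friend in record["friends"]]

print(direct)
print(names)
```

The same helper also returns `None` for a key that is missing from an object, which mirrors the empty cells a query produces for fields that some rows lack.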


In [33]:
query = f'''
    select
        json_value(json._id) as id,
        json_value(json._foo) as foo,
        json_value(json.name) as name,

        # Parsing nested data
        array((select f.name from unnest(json_query_array(json.friends)) as f)) as friend_array
    from `{project}.{dataset}.{table}`
    order by 2 desc
'''

query_df = client.query(query).to_dataframe()

query_df.head(10)

| | id | foo | name | friend_array |
|---|---|---|---|---|
| 0 | | 622a2464b45b63d2d38e8315 | Talley Herring | [] |
| 1 | | 622a246495e8b42ee77cf3e9 | Mendez Gill | ["Courtney Leblanc", "Camacho Emerson", "Wilco... |
| 2 | | 622a24646e014f88bb009a47 | Mabel Yates | ["Lola Bender", "Ethel Robertson", "Snow Myers"] |
| 3 | | 622a246457385c7597960660 | Terry Harvey | ["Hillary Brooks", "Johns Vega", "Sexton Chamb... |
| 4 | | 622a246455d1fcda294a7b38 | Christy Walsh | ["Lillian Sims", "Harvey Mcgowan", "Lindsey Co... |
| 5 | | 622a246429e1871909511ea1 | Hammond Ellis | [] |
| 6 | | 622a246404e82ab3113b7659 | Silvia Wiggins | ["Jarvis Ball", "Rhonda Ferguson", "Colon Kell... |
| 7 | 622a246457385c7597960660 | | Terry Harvey | ["Hillary Brooks", "Johns Vega", "Sexton Chamb... |
| 8 | 622a246455d1fcda294a7b38 | | Christy Walsh | ["Lillian Sims", "Harvey Mcgowan", "Lindsey Co... |
| 9 | 622a2464b45b63d2d38e8315 | | Talley Herring | ["Nelson Day", "Fran Knapp", "Luisa Jenkins"] |
