---
# Reading and Writing Data to Different Sources

Data is stored in many different formats. This notebook covers loading data into pandas and writing it back out to different file types.

---

In [1]:
import pandas as pd
import numpy as np
from IPython.display import display


In [2]:
# Function for printing a horizontal line. For display purpose
def printhr(s: str = None, n: int = 40):
    """Print a horizontal rule of the character "=" of length n.

    Args:
        s (str, optional): Header message. Defaults to None.
        n (int, optional): Number of characters. Defaults to 40.
    """

    if s:
        print("=" * int(n / 2), s, "=" * int(n / 2))
    else:
        print("=" * n)


---
## Comma-Separated Values - .csv

A CSV is a plain text file in which each row is a line and the columns within it are separated by a delimiter (a comma by default).

`.read_csv()` is used to load in a CSV as a DataFrame.  
`.to_csv()` is used to write the DataFrame into a CSV file.
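
As a minimal sketch of the round trip, the snippet below reads a small in-memory CSV and writes it back out. The sample rows are made up for illustration (only the column names mirror the survey data), and `io.StringIO` stands in for a file on disk.

```python
import io

import pandas as pd

# Made-up sample rows; io.StringIO stands in for a real .csv file
csv_text = "ResponseId,Country,YearsCode\n1,Japan,5\n2,Germany,12\n"

# index_col turns the named column into the DataFrame's index
survey = pd.read_csv(io.StringIO(csv_text), index_col="ResponseId")
print(survey)

# With no path given, to_csv returns the CSV text instead of writing a file.
# The index is written as the first column unless index=False is passed.
with_index = survey.to_csv()
without_index = survey.to_csv(index=False)
print(with_index)
print(without_index)
```

Passing `index=False` is worth considering when the index is just the default row counter, since writing it out adds a column that has to be dealt with again on the next read.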

---

In [3]:
# Load in csv as df and make the ResponseId the index
df = pd.read_csv("data/survey_results_public_2022.csv", index_col="ResponseId")

df.head(3)

ResponseId,MainBranch,Employment,RemoteWork,CodingActivities,EdLevel,LearnCode,LearnCodeOnline,LearnCodeCoursesCert,YearsCode,YearsCodePro,...,TimeSearching,TimeAnswering,Onboarding,ProfessionalTech,TrueFalse_1,TrueFalse_2,TrueFalse_3,SurveyLength,SurveyEase,ConvertedCompYearly
1,None of these,,,,,,,,,,...,,,,,,,,,,
2,I am a developer by profession,"Employed, full-time",Fully remote,Hobby;Contribute to open-source projects,,,,,,,...,,,,,,,,Too long,Difficult,
3,"I am not primarily a developer, but I write co...","Employed, full-time","Hybrid (some remote, some in-person)",Hobby,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",Books / Physical media;Friend or family member...,Technical documentation;Blogs;Programming Game...,,14.0,5.0,...,,,,,,,,Appropriate in length,Neither easy nor difficult,40205.0


In [4]:
# Write to csv

# Create new df
filt = df["Country"] == "Japan"
japan_df = df.loc[filt]

# Write to csv file. This will create a file named csv_file.csv
# inside a folder named new_files (the folder must already exist)
japan_df.to_csv("new_files/csv_file.csv")

---
### Delimiters

Since CSVs are just plain text files, they can be delimited with different characters. The separator (delimiter) can be any single character; common choices are the comma, tab, and colon.  

The **sep** parameter specifies the delimiter when it is something other than a comma, both when reading (`.read_csv`) and when writing (`.to_csv`). It defaults to a comma ( , ).
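
To illustrate why **sep** has to match on both sides, here is a small sketch that writes semicolon-delimited text and reads it back twice, once with the right separator and once with the default. The data and the choice of semicolon are assumptions made for the example.

```python
import io

import pandas as pd

# Made-up sample data for illustration
people = pd.DataFrame({"name": ["Ana", "Ben"], "city": ["Lima", "Oslo"]})

# Write with a semicolon delimiter (returned as text because no path is given)
semicolon_text = people.to_csv(sep=";", index=False)
print(semicolon_text)

# Reading with the matching separator restores both columns
restored = pd.read_csv(io.StringIO(semicolon_text), sep=";")
print(restored.columns.tolist())

# Reading with the default comma finds no commas to split on,
# so every line ends up in a single column
wrong = pd.read_csv(io.StringIO(semicolon_text))
print(wrong.columns.tolist())
```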

---

In [5]:
# Write to tab-separated value (TSV) file.
# TSV is a variation of CSV that uses a tab as its delimiter.

# Create new df
filt = df["Country"] == "Germany"
germany_df = df.loc[filt]
display(germany_df.head(3))

# Write
germany_df.to_csv("new_files/tsv_file.tsv", sep="\t")


ResponseId,MainBranch,Employment,RemoteWork,CodingActivities,EdLevel,LearnCode,LearnCodeOnline,LearnCodeCoursesCert,YearsCode,YearsCodePro,...,TimeSearching,TimeAnswering,Onboarding,ProfessionalTech,TrueFalse_1,TrueFalse_2,TrueFalse_3,SurveyLength,SurveyEase,ConvertedCompYearly
6,"I am not primarily a developer, but I write co...","Student, full-time",,,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)","Books / Physical media;School (i.e., Universit...",,,15,,...,,,,,,,,Appropriate in length,Easy,
26,I am a developer by profession,"Employed, full-time","Hybrid (some remote, some in-person)",Hobby,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",Books / Physical media;Other online resources ...,Technical documentation;Blogs;Written Tutorial...,Coursera;Udemy;Codecademy;edX;Udacity,16,9.0,...,30-60 minutes a day,60-120 minutes a day,Somewhat short,DevOps function;Microservices;Continuous integ...,Yes,No,Yes,Appropriate in length,Neither easy nor difficult,90647.0
49,I am a developer by profession,"Employed, full-time","Hybrid (some remote, some in-person)",Hobby;Contribute to open-source projects,Some college/university study without earning ...,Books / Physical media;Other online resources ...,Technical documentation;Blogs;Written Tutorial...,,40,25.0,...,15-30 minutes a day,Less than 15 minutes a day,Just right,Continuous integration (CI) and (more often) c...,Yes,No,Yes,Appropriate in length,Easy,106644.0


---
Loading with a different separator

---

In [6]:
# Load in tab separated values (tsv) to a DataFrame
df = pd.read_csv("new_files/tsv_file.tsv", sep="\t", index_col="ResponseId")
display(df.head(3))

ResponseId,MainBranch,Employment,RemoteWork,CodingActivities,EdLevel,LearnCode,LearnCodeOnline,LearnCodeCoursesCert,YearsCode,YearsCodePro,...,TimeSearching,TimeAnswering,Onboarding,ProfessionalTech,TrueFalse_1,TrueFalse_2,TrueFalse_3,SurveyLength,SurveyEase,ConvertedCompYearly
6,"I am not primarily a developer, but I write co...","Student, full-time",,,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)","Books / Physical media;School (i.e., Universit...",,,15,,...,,,,,,,,Appropriate in length,Easy,
26,I am a developer by profession,"Employed, full-time","Hybrid (some remote, some in-person)",Hobby,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",Books / Physical media;Other online resources ...,Technical documentation;Blogs;Written Tutorial...,Coursera;Udemy;Codecademy;edX;Udacity,16,9.0,...,30-60 minutes a day,60-120 minutes a day,Somewhat short,DevOps function;Microservices;Continuous integ...,Yes,No,Yes,Appropriate in length,Neither easy nor difficult,90647.0
49,I am a developer by profession,"Employed, full-time","Hybrid (some remote, some in-person)",Hobby;Contribute to open-source projects,Some college/university study without earning ...,Books / Physical media;Other online resources ...,Technical documentation;Blogs;Written Tutorial...,,40,25.0,...,15-30 minutes a day,Less than 15 minutes a day,Just right,Continuous integration (CI) and (more often) c...,Yes,No,Yes,Appropriate in length,Easy,106644.0


---
## Excel - .xlsx and .xls

Excel files are Microsoft's proprietary spreadsheet files. XLSX is the newer Excel file format and is supported natively by Excel 2007 and later. XLS is the older file format and can be read by all versions.  

`.read_excel()` is used to load an Excel file as a DataFrame.  
`.to_excel()` is used to write the DataFrame into an Excel file.

### Dependencies

Unlike CSV files, reading and writing Excel files requires additional packages:  
`openpyxl` - for reading and writing xlsx, and for writing to xls*  
`xlrd` - for reading old xls  

pip can install several packages in one command if you want both:  
`pip install openpyxl xlrd`

\* **Note**: xlwt support has been deprecated since pandas 1.2.0 for writing XLS files. I can't find a replacement for writing XLS files, and the pandas docs don't mention one. openpyxl and xlsxwriter can write to a file with an .xls extension, but the result is actually just an XLSX file saved with an .xls extension.
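
Before calling the Excel functions, it can help to check whether the optional packages are importable. The helper below is a hypothetical convenience function (not part of pandas) that uses only the standard library to report which of the named packages are missing.

```python
from importlib.util import find_spec


def missing_excel_engines(packages=("openpyxl", "xlrd")):
    """Return the names from `packages` that cannot be imported.

    Hypothetical helper for this notebook; it only checks importability
    and does not verify package versions.
    """
    return [name for name in packages if find_spec(name) is None]


missing = missing_excel_engines()
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("openpyxl and xlrd are both available")
```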

---

---
### Reading and Writing .xlsx

---

In [7]:
# Reading
excel_df = pd.read_excel("data/excel_new.xlsx", index_col=0)
display(excel_df.head())


0,First Name,Last Name,Gender,Country,Age,Date,Id
1,Dulce,Abril,Female,United States,32,15/10/2017,1562
2,Mara,Hashimoto,Female,Great Britain,25,16/08/2016,1582
3,Philip,Gent,Male,France,36,21/05/2015,2587
4,Kathleen,Hanner,Female,United States,25,15/10/2017,3549
5,Nereida,Magwood,Female,United States,58,16/08/2016,2468


In [8]:
# Writing
display(df.head(3))
df.to_excel("new_files/new_excel.xlsx")


ResponseId,MainBranch,Employment,RemoteWork,CodingActivities,EdLevel,LearnCode,LearnCodeOnline,LearnCodeCoursesCert,YearsCode,YearsCodePro,...,TimeSearching,TimeAnswering,Onboarding,ProfessionalTech,TrueFalse_1,TrueFalse_2,TrueFalse_3,SurveyLength,SurveyEase,ConvertedCompYearly
6,"I am not primarily a developer, but I write co...","Student, full-time",,,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)","Books / Physical media;School (i.e., Universit...",,,15,,...,,,,,,,,Appropriate in length,Easy,
26,I am a developer by profession,"Employed, full-time","Hybrid (some remote, some in-person)",Hobby,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",Books / Physical media;Other online resources ...,Technical documentation;Blogs;Written Tutorial...,Coursera;Udemy;Codecademy;edX;Udacity,16,9.0,...,30-60 minutes a day,60-120 minutes a day,Somewhat short,DevOps function;Microservices;Continuous integ...,Yes,No,Yes,Appropriate in length,Neither easy nor difficult,90647.0
49,I am a developer by profession,"Employed, full-time","Hybrid (some remote, some in-person)",Hobby;Contribute to open-source projects,Some college/university study without earning ...,Books / Physical media;Other online resources ...,Technical documentation;Blogs;Written Tutorial...,,40,25.0,...,15-30 minutes a day,Less than 15 minutes a day,Just right,Continuous integration (CI) and (more often) c...,Yes,No,Yes,Appropriate in length,Easy,106644.0


---
### Reading and Writing .xls

When writing XLS files, we need to explicitly pass an **engine** argument. This can either be `openpyxl` or `xlsxwriter`.  

**Note**: xlwt support has been deprecated since pandas 1.2.0 for writing XLS files. I can't find a replacement for writing XLS files, and the pandas docs don't mention one. openpyxl and xlsxwriter can write to a file with an .xls extension, but the result is actually just an XLSX file saved with an .xls extension.
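
One way to check what a file really is, regardless of its extension, is to look at its first bytes: XLSX files are ZIP archives, while legacy XLS files use the older OLE2 compound container. The sketch below uses a hypothetical helper, `sniff_excel_container`, and a stand-in ZIP built in memory rather than a real workbook.

```python
import io
import zipfile

ZIP_MAGIC = b"PK\x03\x04"  # start of a ZIP archive (used by .xlsx)
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE2 container (used by legacy .xls)


def sniff_excel_container(data: bytes) -> str:
    """Guess the container format from the leading bytes (hypothetical helper)."""
    if data.startswith(ZIP_MAGIC):
        return "zip (xlsx-style)"
    if data.startswith(OLE2_MAGIC):
        return "ole2 (legacy xls-style)"
    return "unknown"


# Build a stand-in ZIP archive in memory; it is not a real workbook,
# but it starts with the same signature an .xlsx file would
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("placeholder.txt", "not a real workbook")

print(sniff_excel_container(buffer.getvalue()))
print(sniff_excel_container(OLE2_MAGIC + b"rest of file"))
```

For a file on disk, reading the first eight bytes in binary mode and passing them to the helper is enough, so a file written with `engine="xlsxwriter"` and an .xls extension would be expected to report the ZIP container.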

---

In [9]:
# Reading
excel_old_df = pd.read_excel("data/excel_old.xls", index_col=0)
display(excel_old_df.sample(5))


0,First Name,Last Name,Gender,Country,Age,Date,Id
34,Judie,Claywell,Female,France,35,16/08/2016,7569
45,Weston,Martina,Male,United States,26,21/05/2015,6540
2,Mara,Hashimoto,Female,Great Britain,25,16/08/2016,1582
35,Dewitt,Borger,Male,United States,36,21/05/2015,8514
23,Many,Cuccia,Female,Great Britain,46,21/05/2015,5489


In [10]:
# Writing
display(df.head(3))
df.to_excel("new_files/old_excel22.xls", engine="xlsxwriter")


ResponseId,MainBranch,Employment,RemoteWork,CodingActivities,EdLevel,LearnCode,LearnCodeOnline,LearnCodeCoursesCert,YearsCode,YearsCodePro,...,TimeSearching,TimeAnswering,Onboarding,ProfessionalTech,TrueFalse_1,TrueFalse_2,TrueFalse_3,SurveyLength,SurveyEase,ConvertedCompYearly
6,"I am not primarily a developer, but I write co...","Student, full-time",,,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)","Books / Physical media;School (i.e., Universit...",,,15,,...,,,,,,,,Appropriate in length,Easy,
26,I am a developer by profession,"Employed, full-time","Hybrid (some remote, some in-person)",Hobby,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",Books / Physical media;Other online resources ...,Technical documentation;Blogs;Written Tutorial...,Coursera;Udemy;Codecademy;edX;Udacity,16,9.0,...,30-60 minutes a day,60-120 minutes a day,Somewhat short,DevOps function;Microservices;Continuous integ...,Yes,No,Yes,Appropriate in length,Neither easy nor difficult,90647.0
49,I am a developer by profession,"Employed, full-time","Hybrid (some remote, some in-person)",Hobby;Contribute to open-source projects,Some college/university study without earning ...,Books / Physical media;Other online resources ...,Technical documentation;Blogs;Written Tutorial...,,40,25.0,...,15-30 minutes a day,Less than 15 minutes a day,Just right,Continuous integration (CI) and (more often) c...,Yes,No,Yes,Appropriate in length,Easy,106644.0


---
### Reading and Writing to Excel Sheets

`.read_excel()` and `.to_excel()` both have a **sheet_name** parameter that lets them work with individual sheets. 

**.read_excel's** sheet_name selects the sheet(s) to load. It can take a str, int, list, or None, and defaults to 0 (the first sheet).  
- Strings refer to the sheet's name.  
- Integers refer to the sheet's position (zero-indexed; chart sheets do not count).  
- Lists mixing strs and ints read multiple sheets. If a list is passed, a dict of DataFrames is returned, keyed by the elements of the passed list.
- None reads all sheets, also returned as a dict of DataFrames.  

**.to_excel's** sheet_name determines the name of the sheet to be created. It defaults to *Sheet1*.
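
The sketch below exercises each form of **sheet_name** on a small two-sheet workbook built in memory. It assumes `openpyxl` is installed; the sheet names and data are made up for the example, and `pd.ExcelWriter` is used only because writing more than one sheet to the same workbook needs a shared writer.

```python
import io

import pandas as pd

# Made-up sample data for illustration
people = pd.DataFrame({"name": ["Ana", "Ben"], "age": [31, 45]})
cities = pd.DataFrame({"city": ["Lima", "Oslo"]})

# Write both DataFrames into one in-memory workbook, one sheet each
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    people.to_excel(writer, sheet_name="people", index=False)
    cities.to_excel(writer, sheet_name="cities", index=False)
workbook_bytes = buffer.getvalue()

# str: load one sheet by name
by_name = pd.read_excel(io.BytesIO(workbook_bytes), sheet_name="cities")

# list of str and int: returns a dict keyed by the list elements
mixed = pd.read_excel(io.BytesIO(workbook_bytes), sheet_name=["people", 1])

# None: returns a dict of every sheet, keyed by sheet name
all_sheets = pd.read_excel(io.BytesIO(workbook_bytes), sheet_name=None)

print(by_name)
print(list(mixed.keys()))
print(list(all_sheets.keys()))
```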



---

---
Loading sheets as DataFrames  

`excel_new.xlsx` has 3 sheets in the order:  
"Sheet1", "another sheet", "third_sheet3"

---

In [11]:
# Check sheet names using the sheet_names attribute of ExcelFile objects
xl_file = pd.ExcelFile("data/excel_new.xlsx")
xl_file.sheet_names


['Sheet1', 'another sheet', 'third_sheet3']

In [12]:
# Load 2nd sheet
excel_df = pd.read_excel("data/excel_new.xlsx", sheet_name="another sheet", index_col=0)
excel_df.head()

0,First Name,Last Name,Gender,Country,Age,Date,Id
1,Belinda,Partain,Female,United States,37,15/10/2017,2564
2,Holly,Eudy,Female,United States,52,16/08/2016,8561
3,Many,Cuccia,Female,Great Britain,46,21/05/2015,5489
4,Libbie,Dalby,Female,France,42,21/05/2015,5489
5,Lester,Prothro,Male,France,21,15/10/2017,6574


In [13]:
# Load 1st and 3rd sheets
excel_dfs = pd.read_excel("data/excel_new.xlsx", sheet_name=["Sheet1", 2], index_col=0)

# This creates a dict of DataFrames
display(excel_dfs)

{'Sheet1':    First Name  Last Name  Gender        Country  Age        Date    Id
 0                                                                     
 1       Dulce      Abril  Female  United States   32  15/10/2017  1562
 2        Mara  Hashimoto  Female  Great Britain   25  16/08/2016  1582
 3      Philip       Gent    Male         France   36  21/05/2015  2587
 4    Kathleen     Hanner  Female  United States   25  15/10/2017  3549
 5     Nereida    Magwood  Female  United States   58  16/08/2016  2468
 6      Gaston      Brumm    Male  United States   24  21/05/2015  2554
 7        Etta       Hurn  Female  Great Britain   56  15/10/2017  3598
 8     Earlean     Melgar  Female  United States   27  16/08/2016  2456
 9    Vincenza    Weiland  Female  United States   40  21/05/2015  6548
 10     Fallon    Winward  Female  Great Britain   28  16/08/2016  5486
 11    Arcelia     Bouska  Female  Great Britain   39  21/05/2015  1258
 12   Franklyn     Unknow    Male         France   38 

In [14]:
# Load individual DataFrames
xl_df1 = excel_dfs["Sheet1"]
xl_df2 = excel_dfs[2]

display(xl_df1.head(3))
printhr()
display(xl_df2.head(3))


0,First Name,Last Name,Gender,Country,Age,Date,Id
1,Dulce,Abril,Female,United States,32,15/10/2017,1562
2,Mara,Hashimoto,Female,Great Britain,25,16/08/2016,1582
3,Philip,Gent,Male,France,36,21/05/2015,2587




0,First Name,Last Name,Gender,Country,Age,Date,Id
1,Nena,Hacker,Female,United States,29,15/10/2017,8563
2,Kelsie,Wachtel,Female,France,27,16/08/2016,8642
3,Sau,Pfau,Female,United States,25,21/05/2015,9536


---
Specifying sheet name when writing Excel file.

---

In [15]:
# Let's use the previously loaded DataFrame and write it to an .xlsx file
display(excel_df.head(2))
printhr()

# Write into a sheet named "Sample Data"
excel_df.to_excel("new_files/excel_sheet.xlsx", sheet_name="Sample Data")

# Confirm
xl_file = pd.ExcelFile("new_files/excel_sheet.xlsx")
xl_file.sheet_names


0,First Name,Last Name,Gender,Country,Age,Date,Id
1,Belinda,Partain,Female,United States,37,15/10/2017,2564
2,Holly,Eudy,Female,United States,52,16/08/2016,8561




['Sample Data']

---------
## WIP FROM THIS POINT ON
--------

---
## JavaScript Object Notation - .json
 
JSON is a text-based format whose top level can be an array or an object structured like Python's **dict** type. An object contains key-value pairs where the keys are strings and the values can be a variety of data types.

`.read_json()` is used to load in a JSON as a DataFrame.  
`.to_json()` is used to write the DataFrame into a JSON file.

---

In [17]:
# Write JSON file. 
display(excel_df.head())

# We'll use the previously loaded excel_df as our DataFrame
json_df = excel_df

# Write to JSON using default parameters
json_df.to_json("new_files/json_file.json")

0,First Name,Last Name,Gender,Country,Age,Date,Id
1,Belinda,Partain,Female,United States,37,15/10/2017,2564
2,Holly,Eudy,Female,United States,52,16/08/2016,8561
3,Many,Cuccia,Female,Great Britain,46,21/05/2015,5489
4,Libbie,Dalby,Female,France,42,21/05/2015,5489
5,Lester,Prothro,Male,France,21,15/10/2017,6574


In [20]:
# Load JSON

# We'll use the JSON file we created earlier,
# loading it with the default parameters
json_df = pd.read_json("new_files/json_file.json")

display(json_df.head())

Unnamed: 0,First Name,Last Name,Gender,Country,Age,Date,Id
1,Belinda,Partain,Female,United States,37,2017-10-15,2564
2,Holly,Eudy,Female,United States,52,2016-08-16,8561
3,Many,Cuccia,Female,Great Britain,46,2015-05-21,5489
4,Libbie,Dalby,Female,France,42,2015-05-21,5489
5,Lester,Prothro,Male,France,21,2017-10-15,6574
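
Notice that the Date values come back in a different form than they were written. By default `.read_json()` tries to parse columns with date-like names (its `convert_dates` parameter defaults to True), which is the likely cause here. The sketch below, using made-up rows and an in-memory string instead of a file, shows how passing `convert_dates=False` keeps the original text.

```python
import io

import pandas as pd

# Made-up rows with day-first date strings, as in the Excel sample
raw = pd.DataFrame({"Date": ["15/10/2017", "16/08/2016"], "Id": [2564, 8561]})
json_text = raw.to_json()

# convert_dates=False leaves the Date column as plain strings
kept = pd.read_json(io.StringIO(json_text), convert_dates=False)
print(kept["Date"].tolist())
```

Whether to keep the strings or let pandas parse them depends on what happens next; parsing explicitly afterwards with a known format is one way to avoid ambiguity between day-first and month-first dates.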


---
### JSON Orientation

Data in JSON files can be structured in different ways, so to help pandas parse it properly, an **orient** parameter is available for both reading and writing JSON.  

**orient** is a str parameter that tells pandas which JSON layout to expect (or produce). Values that can be passed:

    'split' : dict like {index -> [index], columns -> [columns], data -> [values]}
    'records' : list like [{column -> value}, ... , {column -> value}]
    'index' : dict like {index -> {column -> value}}
    'columns' : dict like {column -> {index -> value}}
    'values' : just the values array
    'table' : dict like {schema -> {schema}, data -> {data}}
              Describing the data, where data component is records

\
Available orientations vary according to the data being read/written:  

**Series**: split, records, index -- defaults to index  
**DataFrame**: split, records, index, columns, values, table -- defaults to columns  
**dicts**: split, records, index, columns, values
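
To make the layouts above concrete, the following sketch serializes one tiny DataFrame with several orientations and prints the resulting text. The DataFrame is made up for the example; calling `.to_json()` without a path returns the JSON as a string.

```python
import json

import pandas as pd

# Made-up DataFrame with a non-default index so the index is visible in the output
frame = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]}, index=[10, 20])

# Serialize the same data once per orientation
layouts = {
    orient: frame.to_json(orient=orient)
    for orient in ("columns", "records", "index", "split", "values")
}

for orient, text in layouts.items():
    print(f"{orient:>8}: {text}")

# The "records" layout drops the index entirely
print(json.loads(layouts["records"]))
```

Note that "records" and "values" do not store the index, so a DataFrame read back from them gets a fresh default index.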

---

---
#### Exploring Orientations

Let us see what some of these orientations look like.

---

In [30]:
# First write the JSON files with different orientations
json_df = excel_df.head(2)

# Write in "columns". Default DF orientation 
json_df.to_json("new_files/json_columns.json")
# "records" orientation
json_df.to_json("new_files/json_records.json", orient="records")

# "records" orientation, with lines parameter
json_df.to_json("new_files/json_records_lines.json", orient="records", lines=True)
# The lines parameter writes each entry (row) on its own line.
# This turns the single array into individual dicts instead,
# i.e. from [entry1, entry2, entry3] to entry1 \n entry2 \n entry3
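
The difference between the two "records" outputs can be seen without touching the disk. This sketch, using made-up rows, compares the array form with the line-delimited form and then reads the latter back.

```python
import io
import json

import pandas as pd

# Made-up sample data for illustration
rows = pd.DataFrame({"name": ["Ana", "Ben"], "age": [31, 45]})

# One JSON array containing every row
as_array = rows.to_json(orient="records")

# One standalone JSON object per line
as_lines = rows.to_json(orient="records", lines=True)
print(as_array)
print(as_lines)

# Each non-empty line parses on its own as a dict
parsed = [json.loads(line) for line in as_lines.splitlines() if line]

# Reading it back requires lines=True as well
restored = pd.read_json(io.StringIO(as_lines), orient="records", lines=True)
print(restored)
```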

In [31]:
# Load the JSON files back in, matching the orientation each was written with
jdf1 = pd.read_json("new_files/json_columns.json")
jdf2 = pd.read_json("new_files/json_records.json", orient="records")
jdf3 = pd.read_json("new_files/json_records_lines.json", orient="records", lines=True)
display(jdf1)
display(jdf2)
display(jdf3)

Unnamed: 0,First Name,Last Name,Gender,Country,Age,Date,Id
1,Belinda,Partain,Female,United States,37,2017-10-15,2564
2,Holly,Eudy,Female,United States,52,2016-08-16,8561


Unnamed: 0,First Name,Last Name,Gender,Country,Age,Date,Id
0,Belinda,Partain,Female,United States,37,2017-10-15,2564
1,Holly,Eudy,Female,United States,52,2016-08-16,8561


Unnamed: 0,First Name,Last Name,Gender,Country,Age,Date,Id
0,Belinda,Partain,Female,United States,37,2017-10-15,2564
1,Holly,Eudy,Female,United States,52,2016-08-16,8561


In [60]:
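# Print the raw text of each JSON file to compare the layouts
fpaths = [
    "new_files/json_columns.json",
    "new_files/json_records.json",
    "new_files/json_records_lines.json",
]
for fpath in fpaths:
    # Read each file once. Calling read() and then readlines() on the same
    # handle returns an empty list, because read() already consumed the file
    with open(fpath) as jf:
        contents = jf.read()
    print(contents)

    # Separate the files with a horizontal rule, except after the last one
    if fpath != fpaths[-1]:
        print()
        printhr()
        print()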


In [None]:
#TODO
# Parsing JSON Draft

# df preview
display(df.sample(3))




## Sample Series

# Series orient 1:  
# Sample DataFrame

csv
xlsx
json
    read json link: i.e. read_json(link)

https://raw.githubusercontent.com/CoreyMSchafer/code_snippets/master/Python/Flask_Blog/snippets/posts.json

sql db
