---
# Reading and Writing Data to Different Sources

In this section, we use a time series dataset, `eth_1h.csv`, because the Stack Overflow survey does not contain a datetime column.

---

In [2]:
import pandas as pd
import numpy as np
from IPython.display import display

In [3]:
# Helper for printing a horizontal rule, used for display purposes only
def printhr(s: str = None, n: int = 40):
    """Print a horizontal rule made of "=" characters.

    Without a header, the rule is n characters long. With a header,
    n // 2 characters are printed on each side of the message.

    Args:
        s (str, optional): Header message. Defaults to None.
        n (int, optional): Number of characters. Defaults to 40.
    """

    if s:
        print("=" * int(n / 2), s, "=" * int(n / 2))
    else:
        print("=" * n)

---
## CSV

---

In [4]:
# Load the CSV into a DataFrame and use ResponseId as the index
df = pd.read_csv("data/survey_results_public_2022.csv", index_col="ResponseId")

df.head()

ResponseId,MainBranch,Employment,RemoteWork,CodingActivities,EdLevel,LearnCode,LearnCodeOnline,LearnCodeCoursesCert,YearsCode,YearsCodePro,...,TimeSearching,TimeAnswering,Onboarding,ProfessionalTech,TrueFalse_1,TrueFalse_2,TrueFalse_3,SurveyLength,SurveyEase,ConvertedCompYearly
1,None of these,,,,,,,,,,...,,,,,,,,,,
2,I am a developer by profession,"Employed, full-time",Fully remote,Hobby;Contribute to open-source projects,,,,,,,...,,,,,,,,Too long,Difficult,
3,"I am not primarily a developer, but I write co...","Employed, full-time","Hybrid (some remote, some in-person)",Hobby,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",Books / Physical media;Friend or family member...,Technical documentation;Blogs;Programming Game...,,14.0,5.0,...,,,,,,,,Appropriate in length,Neither easy nor difficult,40205.0
4,I am a developer by profession,"Employed, full-time",Fully remote,I don’t code outside of work,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)","Books / Physical media;School (i.e., Universit...",,,20.0,17.0,...,,,,,,,,Appropriate in length,Easy,215232.0
5,I am a developer by profession,"Employed, full-time","Hybrid (some remote, some in-person)",Hobby,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)","Other online resources (e.g., videos, blogs, f...",Technical documentation;Blogs;Stack Overflow;O...,,8.0,3.0,...,,,,,,,,Too long,Easy,
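
As a small self-contained illustration of the `index_col` argument used above, the sketch below reads a tiny in-memory CSV instead of the survey file. The rows are invented for the example; only the `ResponseId`, `MainBranch`, and `Country` column names come from the survey data.

```python
import io

import pandas as pd

# Hypothetical miniature stand-in for the survey file (values are invented).
csv_text = (
    "ResponseId,MainBranch,Country\n"
    "1,I code primarily as a hobby,Japan\n"
    "2,I am a developer by profession,Germany\n"
)

# index_col makes the named column the row index instead of a regular column.
small_df = pd.read_csv(io.StringIO(csv_text), index_col="ResponseId")

print(small_df.index.name)          # name of the index
print(list(small_df.columns))       # remaining data columns
print(small_df.loc[2, "Country"])   # label-based lookup by ResponseId
```

Because `ResponseId` is the index, `.loc` looks rows up by response ID rather than by position.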


In [5]:
# Write to CSV

# Create a new DataFrame containing only the rows where Country is Japan
filt = df["Country"] == "Japan"
japan_df = df.loc[filt]

# Uncommenting the line below creates a file named csv_file.csv
# inside a folder named new_files (the folder must already exist)
# japan_df.to_csv("new_files/csv_file.csv")

---
Values can also be separated by characters other than commas, such as tabs.  
The separator is specified with the **sep** parameter.
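
A minimal sketch of a tab-separated round trip, kept in memory so it does not depend on the `new_files` folder. The `toy_df` frame and its values are made up for illustration; `sep` and `index_col` are the same parameters used in this notebook.

```python
import io

import pandas as pd

# Hypothetical toy frame; the values are invented for illustration.
toy_df = pd.DataFrame(
    {"Country": ["Germany", "Germany"], "YearsCode": [15, 16]},
    index=pd.Index([6, 26], name="ResponseId"),
)

# With no path argument, to_csv returns the text instead of writing a file.
tsv_text = toy_df.to_csv(sep="\t")

# Reading back needs the same separator; with the default comma,
# each line would be parsed as a single column.
round_trip = pd.read_csv(io.StringIO(tsv_text), sep="\t", index_col="ResponseId")
print(round_trip)
```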

---

In [6]:
# Write to TSV

# Create a new DataFrame containing only the rows where Country is Germany
filt = df["Country"] == "Germany"
germany_df = df.loc[filt]

# Write to a tab-separated file. This will create a file named tsv_file.tsv
# inside a folder named new_files (the folder must already exist)
germany_df.to_csv("new_files/tsv_file.tsv", sep="\t")

---
Loading a file that uses a different separator

---

In [7]:
# Load tab-separated values (TSV) using the sep parameter.
# Note that this overwrites df with the Germany-only rows written above.
df = pd.read_csv("new_files/tsv_file.tsv", sep="\t", index_col="ResponseId")
display(df)


ResponseId,MainBranch,Employment,RemoteWork,CodingActivities,EdLevel,LearnCode,LearnCodeOnline,LearnCodeCoursesCert,YearsCode,YearsCodePro,...,TimeSearching,TimeAnswering,Onboarding,ProfessionalTech,TrueFalse_1,TrueFalse_2,TrueFalse_3,SurveyLength,SurveyEase,ConvertedCompYearly
6,"I am not primarily a developer, but I write co...","Student, full-time",,,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)","Books / Physical media;School (i.e., Universit...",,,15,,...,,,,,,,,Appropriate in length,Easy,
26,I am a developer by profession,"Employed, full-time","Hybrid (some remote, some in-person)",Hobby,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",Books / Physical media;Other online resources ...,Technical documentation;Blogs;Written Tutorial...,Coursera;Udemy;Codecademy;edX;Udacity,16,9,...,30-60 minutes a day,60-120 minutes a day,Somewhat short,DevOps function;Microservices;Continuous integ...,Yes,No,Yes,Appropriate in length,Neither easy nor difficult,90647.0
49,I am a developer by profession,"Employed, full-time","Hybrid (some remote, some in-person)",Hobby;Contribute to open-source projects,Some college/university study without earning ...,Books / Physical media;Other online resources ...,Technical documentation;Blogs;Written Tutorial...,,40,25,...,15-30 minutes a day,Less than 15 minutes a day,Just right,Continuous integration (CI) and (more often) c...,Yes,No,Yes,Appropriate in length,Easy,106644.0
50,I am a developer by profession,"Employed, full-time",Fully remote,Hobby,"Secondary school (e.g. American high school, G...","Other online resources (e.g., videos, blogs, f...",Technical documentation;Blogs;Stack Overflow,,7,4,...,30-60 minutes a day,15-30 minutes a day,Somewhat long,Continuous integration (CI) and (more often) c...,No,No,Yes,Appropriate in length,Easy,51192.0
60,I am a developer by profession,"Employed, full-time","Hybrid (some remote, some in-person)",Hobby,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)","Other online resources (e.g., videos, blogs, f...",Technical documentation;Blogs;Written Tutorial...,Codecademy,4,2,...,Over 120 minutes a day,30-60 minutes a day,Somewhat long,None of these,No,No,No,Too short,Easy,63986.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
73186,I am a developer by profession,"Employed, full-time",Fully remote,Hobby;Contribute to open-source projects,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",Books / Physical media;Other online resources ...,Technical documentation;Blogs;Stack Overflow;O...,,12,11,...,,,,,,,,Appropriate in length,Neither easy nor difficult,
73199,I am a developer by profession,"Employed, full-time","Hybrid (some remote, some in-person)",Hobby;Contribute to open-source projects,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)","Other online resources (e.g., videos, blogs, f...",Other (Please specify):,,12,3,...,Less than 15 minutes a day,Less than 15 minutes a day,Somewhat long,None of these,No,No,Yes,Appropriate in length,Easy,
73204,I code primarily as a hobby,"Student, part-time;Not employed, and not looki...",,,"Secondary school (e.g. American high school, G...",Books / Physical media;Other online resources ...,Written Tutorials;Stack Overflow;Online books;...,,3,,...,,,,,,,,Appropriate in length,Easy,
73248,I code primarily as a hobby,I prefer not to say,,,Something else,"Other online resources (e.g., videos, blogs, f...",Technical documentation;Blogs;Written Tutorial...,Other,4,,...,,,,,,,,Appropriate in length,Easy,


---
## Excel
Reading and writing Excel files requires additional packages:  
xlwt - for writing old xls files  
xlrd - for reading old xls files  
openpyxl - for writing and reading xlsx files
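
The sketch below is one way to check which of these packages are importable and, if openpyxl is available, to round-trip a small frame through an in-memory xlsx file. The `toy_df` frame and its values are made up for illustration, and the availability check is an assumption about how one might guard the call, not part of the original notebook.

```python
import importlib.util
import io

import pandas as pd

# Report which of the optional Excel packages can be imported here.
for package in ("openpyxl", "xlrd", "xlwt"):
    installed = importlib.util.find_spec(package) is not None
    print(f"{package}: {'installed' if installed else 'missing'}")

# Hypothetical toy frame; the values are invented for illustration.
toy_df = pd.DataFrame(
    {"Country": ["Japan", "Japan"]},
    index=pd.Index([1, 2], name="ResponseId"),
)

excel_round_trip = None
try:
    buffer = io.BytesIO()
    # Write the xlsx bytes into memory instead of a file on disk.
    toy_df.to_excel(buffer, engine="openpyxl")
    buffer.seek(0)
    # The first column of the sheet holds the index that to_excel wrote out.
    excel_round_trip = pd.read_excel(buffer, index_col=0, engine="openpyxl")
    print(excel_round_trip)
except ImportError:
    # pandas raises ImportError when the requested engine is unavailable.
    print("openpyxl is not available, skipping the xlsx round trip")
```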

---

In [8]:
japan_df.to_excel("new_files/excel_file.xlsx")