# Reading and Writing Data Lab

### This lab introduces the idea of “persistent” programs that keep data in permanent storage, and shows how to use different kinds of permanent storage, like files.

### This lab will cover:
1. Persistence
2. Filenames and paths
3. Reading and writing CSV files using Pandas
4. Reading and writing Excel files using Pandas
5. Reading and writing pickle (binary) files using Pandas


## 1. Persistence



#### Most of the programs we have seen so far are transient in the sense that they run for a short time and produce some output, but when they end, their data disappears. If you run the program again, it starts with a clean slate.

#### Other programs are persistent: they run for a long time (or all the time); they keep at least some of their data in permanent storage (a hard drive, for example); and if they shut down and restart, they pick up where they left off.

#### Examples of persistent programs are operating systems, which run pretty much whenever a computer is on, and web servers, which run all the time, waiting for requests to come in on the network.

#### One of the simplest ways for programs to maintain their data is by reading and writing text files. We have already seen programs that read text files; in this lab we will see programs that write them.

#### An alternative is to store the state of the program in a database. In this lab we will learn about a module, pickle, that makes it easy to store program data.
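
#### As a minimal sketch of what "picking up where it left off" can look like, the cell below uses the standard library pickle module to keep a small run counter on disk. The file name state.pkl, the helper names load_state and save_state, and the temporary directory are assumptions made for this illustration only; they are not part of the lab's data.

```python
import os
import pickle
import tempfile

# Hypothetical location for the saved state (illustration only).
state_path = os.path.join(tempfile.mkdtemp(), "state.pkl")

def load_state(path):
    """Return the saved state, or a fresh one if no file exists yet."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"runs": 0}

def save_state(state, path):
    """Write the state to permanent storage."""
    with open(path, "wb") as f:
        pickle.dump(state, f)

# Simulate three separate runs of the same program.
for _ in range(3):
    state = load_state(state_path)
    state["runs"] += 1
    save_state(state, state_path)

print(load_state(state_path))  # {'runs': 3}
```

#### Each pass through the loop stands in for one run of the program: it reads the previous state from the file, updates it, and writes it back, so the count survives between runs.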

## 2. Filenames and paths

#### Files are organized into directories (also called “folders”). Every running program has a “current directory”, which is the default directory for most operations. For example, when you open a file for reading, Python looks for it in the current directory.

#### The os module provides functions for working with files and directories (“os” stands for “operating system”). os.getcwd returns the name of the current directory:

In [3]:
import os
cwd = os.getcwd()


In [4]:
cwd

'/content'

#### cwd stands for “current working directory”. The result in this example is '/content', which is the current directory being used by this lab.
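
#### The os.path submodule can combine the current directory with a file name to form a full path. The sketch below assumes a file called covidDataFrame.csv (the name used later in this lab); the file does not need to exist for the path to be built.

```python
import os

cwd = os.getcwd()

# Join the current directory and a file name into one path.
path = os.path.join(cwd, "covidDataFrame.csv")

print(os.path.isabs(path))     # True, because cwd is an absolute path
print(os.path.basename(path))  # covidDataFrame.csv
print(os.path.exists(path))    # True only if the file has already been written
```

#### Using os.path.join instead of gluing strings together with "/" lets Python pick the right separator for the operating system the notebook runs on.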

## 3. Reading and writing CSV files with Pandas

#### A text file is a sequence of characters stored on a permanent medium like a hard drive, flash memory, or CD-ROM. A CSV (comma-separated values) file is a text file that stores a table, one row per line.
#### To open an existing CSV file using Pandas we use the function read_csv, which accepts either a local path or a URL.
#### For instance, the file "covid19_cases.csv" is available at the URL: https://raw.githubusercontent.com/thousandoaks/Python4DS103/main/data/covid19_cases.csv

In [5]:
import pandas as pd

In [12]:
covidDataFrame = pd.read_csv('https://raw.githubusercontent.com/thousandoaks/Python4DS103/main/data/covid19_cases.csv')

In [13]:
covidDataFrame.head()

Unnamed: 0,dateRep,day,month,year,cases,deaths,countriesAndTerritories,geoId,countryterritoryCode,popData2019,continentExp,Cumulative_number_for_14_days_of_COVID-19_cases_per_100000
0,19/09/2020,19,9,2020,47,1,Afghanistan,AF,AFG,38041757.0,Asia,1.616645
1,18/09/2020,18,9,2020,0,0,Afghanistan,AF,AFG,38041757.0,Asia,1.535155
2,17/09/2020,17,9,2020,17,0,Afghanistan,AF,AFG,38041757.0,Asia,1.653446
3,16/09/2020,16,9,2020,40,10,Afghanistan,AF,AFG,38041757.0,Asia,1.708649
4,15/09/2020,15,9,2020,99,6,Afghanistan,AF,AFG,38041757.0,Asia,1.627159


#### To write an existing DataFrame to a CSV file using Pandas we use the method to_csv.
#### For instance, given the DataFrame "covidDataFrame" we can write it as a local file in the current working directory "./"


In [14]:
covidDataFrame.to_csv('./covidDataFrame.csv')

#### We can read the file from the current working directory back into memory:

In [16]:
yetAnotherCovidDataFrame = pd.read_csv('./covidDataFrame.csv')

In [17]:
yetAnotherCovidDataFrame.head()

Unnamed: 0.1,Unnamed: 0,dateRep,day,month,year,cases,deaths,countriesAndTerritories,geoId,countryterritoryCode,popData2019,continentExp,Cumulative_number_for_14_days_of_COVID-19_cases_per_100000
0,0,19/09/2020,19,9,2020,47,1,Afghanistan,AF,AFG,38041757.0,Asia,1.616645
1,1,18/09/2020,18,9,2020,0,0,Afghanistan,AF,AFG,38041757.0,Asia,1.535155
2,2,17/09/2020,17,9,2020,17,0,Afghanistan,AF,AFG,38041757.0,Asia,1.653446
3,3,16/09/2020,16,9,2020,40,10,Afghanistan,AF,AFG,38041757.0,Asia,1.708649
4,4,15/09/2020,15,9,2020,99,6,Afghanistan,AF,AFG,38041757.0,Asia,1.627159
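
#### The extra column "Unnamed: 0.1" in the output above appears because to_csv writes the DataFrame index as an unnamed first column by default, and read_csv then reads it back as ordinary data. A minimal sketch of two ways to avoid this, using a small made-up DataFrame and temporary file names that are assumptions for illustration only:

```python
import os
import tempfile

import pandas as pd

# Small stand-in DataFrame (not the real COVID data).
df = pd.DataFrame({"cases": [47, 0, 17], "deaths": [1, 0, 0]})
tmp_dir = tempfile.mkdtemp()

# Default behaviour: the index is written as an unnamed first column.
with_index = os.path.join(tmp_dir, "with_index.csv")
df.to_csv(with_index)
cols_default = pd.read_csv(with_index).columns.tolist()
print(cols_default)  # ['Unnamed: 0', 'cases', 'deaths']

# Option 1: do not write the index at all.
without_index = os.path.join(tmp_dir, "without_index.csv")
df.to_csv(without_index, index=False)
cols_no_index = pd.read_csv(without_index).columns.tolist()
print(cols_no_index)  # ['cases', 'deaths']

# Option 2: keep the index in the file and restore it when reading.
cols_index_col = pd.read_csv(with_index, index_col=0).columns.tolist()
print(cols_index_col)  # ['cases', 'deaths']
```

#### Option 1 is usually the simpler choice when the index is just the default row numbers, as it is here.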


## 4. Reading and writing Excel files with Pandas

#### To open an existing Excel file using Pandas we use the function read_excel.
#### For instance, the Excel file "mini.xlsx" is available at the URL: https://github.com/thousandoaks/Python4DS103/blob/main/data/mini.xlsx?raw=true

In [19]:
excelDataFrame = pd.read_excel('https://github.com/thousandoaks/Python4DS103/blob/main/data/mini.xlsx?raw=true')

In [20]:
excelDataFrame.head()

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,project_title,region,funding_ratio,badge,tag,language,remark,start_date,end_date,date_remark,sentiment,anger,fear,joy,love,optimism
0,6976,6976,EasyTouch: Turn your world into a touch sensor,CA,3.8548,Backer,Hardware,en,"Hi! I had not receive my rewards util today, t...",2014-10-21 14:30:02,2014-11-20 15:30:12,2015-07-02 11:00:27,NEGATIVE,YES,NO,NO,NO,NO
1,6977,6977,EasyTouch: Turn your world into a touch sensor,CA,3.8548,Backer,Hardware,en,I have just received mine yesterday! Yipee! An...,2014-10-21 14:30:02,2014-11-20 15:30:12,2015-05-12 13:33:46,POSITIVE,NO,NO,YES,NO,YES
2,6978,6978,EasyTouch: Turn your world into a touch sensor,CA,3.8548,Backer,Hardware,en,"I haven't received mine. Winnipeg, Canada. For...",2014-10-21 14:30:02,2014-11-20 15:30:12,2015-05-11 16:11:59,NEUTRAL,NO,NO,NO,NO,NO
3,6979,6979,EasyTouch: Turn your world into a touch sensor,CA,3.8548,Backer,Hardware,en,When are you going to fix your website?,2014-10-21 14:30:02,2014-11-20 15:30:12,2015-05-07 23:15:33,NEUTRAL,NO,NO,NO,NO,NO
4,6980,6980,EasyTouch: Turn your world into a touch sensor,CA,3.8548,Superbacker,Hardware,en,"Just received mine on May 6th. Vancouver, Canada.",2014-10-21 14:30:02,2014-11-20 15:30:12,2015-05-07 01:46:24,NEUTRAL,NO,NO,NO,NO,NO


#### To write an existing DataFrame to an Excel file using Pandas we use the method to_excel.
#### For instance, given the DataFrame "excelDataFrame" we can write it as a local file in the current working directory "./"


In [21]:
excelDataFrame.to_excel('./superFile.xlsx')

## 5. Reading and writing pickle files with Pandas
#### Pickle is Python's binary serialization format. Pickle files often load faster and can occupy less space than their CSV counterparts, and they preserve column data types, but they are meant to be read back from Python.
#### We can save the COVID DataFrame loaded earlier as a pickle file by way of the method to_pickle

In [22]:
covidDataFrame.to_pickle('./PickleFile.pkl')

#### and read it back again:

In [23]:
YetAnotherDataFrame=pd.read_pickle('./PickleFile.pkl')

In [24]:
YetAnotherDataFrame.head()

Unnamed: 0,dateRep,day,month,year,cases,deaths,countriesAndTerritories,geoId,countryterritoryCode,popData2019,continentExp,Cumulative_number_for_14_days_of_COVID-19_cases_per_100000
0,19/09/2020,19,9,2020,47,1,Afghanistan,AF,AFG,38041757.0,Asia,1.616645
1,18/09/2020,18,9,2020,0,0,Afghanistan,AF,AFG,38041757.0,Asia,1.535155
2,17/09/2020,17,9,2020,17,0,Afghanistan,AF,AFG,38041757.0,Asia,1.653446
3,16/09/2020,16,9,2020,40,10,Afghanistan,AF,AFG,38041757.0,Asia,1.708649
4,15/09/2020,15,9,2020,99,6,Afghanistan,AF,AFG,38041757.0,Asia,1.627159
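
#### One practical difference between the two formats is that a pickle file keeps each column's data type, while a CSV file stores everything as text. The sketch below illustrates this with a small made-up DataFrame holding a datetime column; the file names and the temporary directory are assumptions for illustration only.

```python
import os
import tempfile

import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Small stand-in DataFrame with a real datetime column.
df = pd.DataFrame({
    "dateRep": pd.to_datetime(["2020-09-19", "2020-09-18"]),
    "cases": [47, 0],
})

tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "sample.csv")
pkl_path = os.path.join(tmp_dir, "sample.pkl")

df.to_csv(csv_path, index=False)
df.to_pickle(pkl_path)

from_csv = pd.read_csv(csv_path)
from_pkl = pd.read_pickle(pkl_path)

# The CSV round trip turns the dates into plain text.
print(is_datetime64_any_dtype(from_csv["dateRep"]))  # False
# The pickle round trip keeps the datetime type.
print(is_datetime64_any_dtype(from_pkl["dateRep"]))  # True
print(from_pkl.equals(df))                           # True
```

#### With CSV, the dates can be recovered by passing parse_dates=["dateRep"] to read_csv. Note also that loading a pickle file can execute arbitrary code, so only read pickle files that come from a source you trust.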
