
# Reading and Writing Data Lab

### This lab introduces the idea of “persistent” programs that keep data in permanent storage, and shows how to use different kinds of permanent storage, like files.

### This lab will cover:
1. Persistence
2. Filenames and paths
3. Reading and writing CSV files using Pandas
4. Reading and writing pickle (binary) files using Pandas
5. Reading text files using Python


## 1. Persistence



#### Most of the programs we have seen so far are transient in the sense that they run for a short time and produce some output, but when they end, their data disappears. If you run the program again, it starts with a clean slate.

#### Other programs are persistent: they run for a long time (or all the time); they keep at least some of their data in permanent storage (a hard drive, for example); and if they shut down and restart, they pick up where they left off.

#### Examples of persistent programs are operating systems, which run pretty much whenever a computer is on, and web servers, which run all the time, waiting for requests to come in on the network.

#### One of the simplest ways for programs to maintain their data is by reading and writing text files. We have already seen programs that read text files; in this lab we will see programs that write them.

#### An alternative is to store the state of the program in a database. In this lab we will learn about a module, pickle, that makes it easy to store program data.
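
#### As a minimal sketch of the idea, the cell below stores a small dictionary with the standard-library pickle module and reads it back. The dictionary contents and the file name "state.pkl" are illustrative assumptions, and the file is written to a temporary directory so the example runs anywhere.

```python
import os
import pickle
import tempfile

# Hypothetical program state that should survive a restart.
state = {"visits": 3, "last_user": "ada"}

# Pickle files are binary, so the file is opened in "wb" mode.
path = os.path.join(tempfile.mkdtemp(), "state.pkl")
with open(path, "wb") as fout:
    pickle.dump(state, fout)

# Later (for example after a restart) the state can be read back in "rb" mode.
with open(path, "rb") as fin:
    restored = pickle.load(fin)

print(restored == state)
```

#### Only unpickle files that come from a source you trust, because loading a pickle can execute arbitrary code.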

## 2. Filenames and paths

#### Files are organized into directories (also called “folders”). Every running program has a “current directory”, which is the default directory for most operations. For example, when you open a file for reading, Python looks for it in the current directory.

#### The os module provides functions for working with files and directories (“os” stands for “operating system”). os.getcwd returns the name of the current directory:

In [3]:
import os
cwd = os.getcwd()


In [4]:
cwd

'/content'

#### cwd stands for “current working directory”. The result in this example is '/content', which is the current directory being used by this lab.
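
#### The os.path submodule offers further helpers for building and checking paths. The sketch below is only an illustration: the folder name "data" and the file name "words.txt" are assumed, and the file does not need to exist for the code to run.

```python
import os

cwd = os.getcwd()

# Build a relative path in a way that works on any operating system.
# "data" and "words.txt" are illustrative names only.
relative_path = os.path.join("data", "words.txt")

# An absolute path starts from the root of the file system.
absolute_path = os.path.abspath(relative_path)

print(absolute_path)
print(os.path.exists(absolute_path))  # True only if the file really exists
print(os.path.isdir(cwd))             # the current directory is a directory
```

#### Checking os.path.exists before opening a file is one simple way to avoid a FileNotFoundError.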

## 3. Reading and writing CSV files with Pandas

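#### A text file is a sequence of characters stored on a permanent medium like a hard drive, flash memory, or CD-ROM. A CSV (comma-separated values) file is a text file in which each line is a row and the values in a row are separated by commas.
#### To load an existing CSV file into a DataFrame, we use the Pandas function read_csv.
#### For instance, to read the file "covid19_cases.csv" we pass its path to read_csv. The path must point to where the file is actually stored; otherwise Pandas raises a FileNotFoundError: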


In [5]:
import pandas as pd

In [9]:
covidLongFormatDataFrame = pd.read_csv('./covid19_cases.csv')

FileNotFoundError: ignored

In [None]:
covidLongFormatDataFrame.head()

Unnamed: 0,dateRep,day,month,year,cases,deaths,countriesAndTerritories,geoId,countryterritoryCode,popData2019,continentExp,Cumulative_number_for_14_days_of_COVID-19_cases_per_100000
0,19/09/2020,19,9,2020,47,1,Afghanistan,AF,AFG,38041757.0,Asia,1.616645
1,18/09/2020,18,9,2020,0,0,Afghanistan,AF,AFG,38041757.0,Asia,1.535155
2,17/09/2020,17,9,2020,17,0,Afghanistan,AF,AFG,38041757.0,Asia,1.653446
3,16/09/2020,16,9,2020,40,10,Afghanistan,AF,AFG,38041757.0,Asia,1.708649
4,15/09/2020,15,9,2020,99,6,Afghanistan,AF,AFG,38041757.0,Asia,1.627159
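
#### The column "Unnamed: 0" in the output above typically appears when a CSV file was saved together with its row index, so the first column has no header. The sketch below reproduces this with a few rows and a subset of columns copied from the output, held in memory with io.StringIO as a stand-in for the real file, and shows how the index_col parameter of read_csv handles it.

```python
import io
import pandas as pd

# Three rows and a subset of columns copied from the output above,
# standing in for the real file. Note the empty first header field.
csv_text = """,dateRep,cases,deaths,countriesAndTerritories
0,19/09/2020,47,1,Afghanistan
1,18/09/2020,0,0,Afghanistan
2,17/09/2020,17,0,Afghanistan
"""

# Default behaviour: the unnamed first column is kept as "Unnamed: 0".
default_df = pd.read_csv(io.StringIO(csv_text))
print(default_df.columns.tolist())

# index_col=0 uses that first column as the row index instead.
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(indexed_df.columns.tolist())
```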


#### To write a DataFrame to a CSV file, we use the DataFrame method to_csv.
#### For instance, we can write the DataFrame covidLongFormatDataFrame to a new file in the folder "/home/jovyan/data/":


In [None]:
covidLongFormatDataFrame.to_csv('/home/jovyan/data/NewFile.csv')
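
#### By default to_csv also writes the row index as an extra first column. The sketch below, which uses a tiny stand-in DataFrame and a temporary directory (both assumptions made so the example runs anywhere), passes index=False to leave the index out and then reads the file back to check the result.

```python
import os
import tempfile
import pandas as pd

# A tiny stand-in DataFrame; the values are copied from the output shown earlier.
df = pd.DataFrame({
    "dateRep": ["19/09/2020", "18/09/2020"],
    "cases": [47, 0],
    "countriesAndTerritories": ["Afghanistan", "Afghanistan"],
})

# Hypothetical output location inside a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "NewFile.csv")

# index=False leaves the row index out of the file.
df.to_csv(out_path, index=False)

# Reading the file back gives the same three columns and no extra index column.
round_trip = pd.read_csv(out_path)
print(round_trip.columns.tolist())
```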

## 4. Reading and writing pickle files with Pandas
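#### The pickle file format is a binary alternative to CSV. Pickle files usually load faster and often occupy less space than their CSV counterparts, and they preserve the data type of each column. On the other hand, they can only be read from Python and should only be loaded from trusted sources.
#### We can save the previous DataFrame as a pickle file with the DataFrame method to_pickle: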

In [None]:
covidLongFormatDataFrame.to_pickle('/home/jovyan/data/PickleFile.pkl')

#### and read it back again:

In [None]:
pd.read_pickle('/home/jovyan/data/PickleFile.pkl')

Unnamed: 0,dateRep,day,month,year,cases,deaths,countriesAndTerritories,geoId,countryterritoryCode,popData2019,continentExp,Cumulative_number_for_14_days_of_COVID-19_cases_per_100000
0,19/09/2020,19,9,2020,47,1,Afghanistan,AF,AFG,38041757.0,Asia,1.616645
1,18/09/2020,18,9,2020,0,0,Afghanistan,AF,AFG,38041757.0,Asia,1.535155
2,17/09/2020,17,9,2020,17,0,Afghanistan,AF,AFG,38041757.0,Asia,1.653446
3,16/09/2020,16,9,2020,40,10,Afghanistan,AF,AFG,38041757.0,Asia,1.708649
4,15/09/2020,15,9,2020,99,6,Afghanistan,AF,AFG,38041757.0,Asia,1.627159
...,...,...,...,...,...,...,...,...,...,...,...,...
43713,25/03/2020,25,3,2020,0,0,Zimbabwe,ZW,ZWE,14645473.0,Africa,
43714,24/03/2020,24,3,2020,0,1,Zimbabwe,ZW,ZWE,14645473.0,Africa,
43715,23/03/2020,23,3,2020,0,0,Zimbabwe,ZW,ZWE,14645473.0,Africa,
43716,22/03/2020,22,3,2020,1,0,Zimbabwe,ZW,ZWE,14645473.0,Africa,
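
#### One practical difference between the two formats is that a pickle round trip keeps column data types, while a CSV file stores everything as text. The sketch below illustrates this with a small stand-in DataFrame whose dateRep column has been converted to dates; the temporary directory and file names are assumptions made so the example runs anywhere.

```python
import os
import tempfile
import pandas as pd

# Small stand-in DataFrame with a real datetime column.
df = pd.DataFrame({
    "dateRep": pd.to_datetime(["19/09/2020", "18/09/2020"], format="%d/%m/%Y"),
    "cases": [47, 0],
})

folder = tempfile.mkdtemp()
pickle_path = os.path.join(folder, "PickleFile.pkl")
csv_path = os.path.join(folder, "NewFile.csv")

df.to_pickle(pickle_path)
df.to_csv(csv_path, index=False)

from_pickle = pd.read_pickle(pickle_path)
from_csv = pd.read_csv(csv_path)

# The pickle copy is identical to the original, including data types.
print(from_pickle.equals(df))

# The CSV copy holds the dates as text unless they are parsed again.
print(from_csv["dateRep"].dtype)
```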


## 5. Reading text files using Python
#### The built-in function open takes the name of the file as a parameter and returns a file object you can use to read the file.



#### For instance, to open a file named "words.txt" stored in /home/jovyan/data/:

In [None]:
fin = open('/home/jovyan/data/words.txt')

In [None]:
## We read one line 
fin.readline()

'aa\n'

In [None]:
## We read another line 
fin.readline()

'aah\n'

In [None]:
## And another line 
fin.readline()

'aahed\n'
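
#### Each call to readline returns the next line including its trailing newline character, and returns an empty string once the end of the file is reached. The sketch below shows this on a small stand-in for words.txt that holds only the three words above; the temporary file is an assumption made so the example runs anywhere.

```python
import os
import tempfile

# Create a small stand-in for words.txt holding the three words shown above.
path = os.path.join(tempfile.mkdtemp(), "words.txt")
with open(path, "w") as fout:
    fout.write("aa\naah\naahed\n")

with open(path) as fin:
    first = fin.readline()            # includes the trailing newline
    second = fin.readline().strip()   # strip() removes the newline
    third = fin.readline().strip()
    at_end = fin.readline()           # empty string: no lines are left

print(repr(first), repr(second), repr(third), repr(at_end))
```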

#### You can also use a file object as part of a for loop. This program reads words.txt and prints each word, one per line, stopping when it reaches the word "abbey":

In [None]:
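# The with statement closes the file automatically when the block ends
with open('/home/jovyan/data/words.txt') as fin:
    for line in fin:
        word = line.strip()   # remove the trailing newline
        if word == 'abbey':   # stop before printing 'abbey'
            break
        print(word)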

aa
aah
aahed
aahing
aahs
aal
aalii
aaliis
aals
aardvark
aardvarks
aardwolf
aardwolves
aas
aasvogel
aasvogels
aba
abaca
abacas
abaci
aback
abacus
abacuses
abaft
abaka
abakas
abalone
abalones
abamp
abampere
abamperes
abamps
abandon
abandoned
abandoning
abandonment
abandonments
abandons
abas
abase
abased
abasedly
abasement
abasements
abaser
abasers
abases
abash
abashed
abashes
abashing
abasing
abatable
abate
abated
abatement
abatements
abater
abaters
abates
abating
abatis
abatises
abator
abators
abattis
abattises
abattoir
abattoirs
abaxial
abaxile
abbacies
abbacy
abbatial
abbe
abbes
abbess
abbesses
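
#### The built-in function open can also write text files, which the introduction mentions as one of the simplest forms of persistence. The sketch below writes a few of the words above to a new file and reads them back; the file name "short_words.txt" and the temporary directory are assumptions made so the example runs anywhere.

```python
import os
import tempfile

words = ["aa", "aah", "aahed"]

# Hypothetical output file in a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "short_words.txt")

# Mode "w" creates the file, or overwrites it if it already exists.
# write() does not add a newline, so we add one after each word.
with open(out_path, "w") as fout:
    for word in words:
        fout.write(word + "\n")

# Read the file back, stripping the newline from each line.
with open(out_path) as fin:
    read_back = [line.strip() for line in fin]

print(read_back)
```

#### Opening the file with mode "a" instead of "w" appends to the end of an existing file rather than overwriting it.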
