# Opening and writing to files

## Course: Programming and Data Management (EDI 3400)

### *Vegard H. Larsen (Department of Data Science and Analytics)*

## File paths

- To access a file on an operating system, you need a file path to that file. 

- The file path is a string that represents the location of a file. 

- The path consists of three parts:
    1. **Folder path**: the file folder location where the folders are separated by a forward slash / (Mac and Linux) or backslash \ (Windows)
    2. **File Name**: the name of the file
    3. **Extension**: the part of the file name after the final `.`, which indicates the file type. Some examples:
        - .py for a Python script
        - .ipynb for a Jupyter Notebook
        - .txt for a text file
        - .db for a database file
        - .xlsx for an Excel file
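The three parts of a path can also be pulled apart in Python. The following is a minimal sketch using `pathlib` from the Python Standard Library. The path itself is hypothetical and only used for illustration; `PurePosixPath` is chosen so the example behaves the same on Windows, Mac and Linux.

```python
from pathlib import PurePosixPath

# Hypothetical path, used only for illustration (the file does not need to exist)
path = PurePosixPath('/Users/student/course/files/Hamlet.txt')

folder_path = str(path.parent)   # the folder path
file_name = path.stem            # the file name without the extension
extension = path.suffix          # the extension, including the dot

print(folder_path)   # /Users/student/course/files
print(file_name)     # Hamlet
print(extension)     # .txt
```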

## Command prompt/terminal/system commands

- `pwd` is a command that can be used to print the directory where you are currently standing
- `ls` lists the content of the working directory
- `less` shows the content of a file

**Note:** *These commands are not Python code but 'system' commands. You may have to run them on a line of their own at the top of a code cell. You can run any 'system' command by starting the line with the `!` symbol. Windows and Unix have different system commands, but some standard Unix commands such as `pwd`, `ls` and `less` are built into the Jupyter Notebook.*
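If you prefer to stay in pure Python, the `os` module in the Python Standard Library offers rough equivalents of `pwd` and `ls`. This is a small sketch; the variable names are our own, and the output depends on where the Notebook is running.

```python
import os

# Roughly the same information as pwd: the current working directory
current_folder = os.getcwd()

# Roughly the same information as ls: the names in that directory
content = sorted(os.listdir(current_folder))

print(current_folder)
print(content)
```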

## Put the files we use in the same folder as the Notebook
- In Ed, the files are stored together with the Notebook

In [1]:
pwd

'/Users/vegard/Github/Programming-and-data-management/Lecture_notebooks'

In [2]:
ls

01_Introduction.ipynb
02_Variables_expressions_and_statements.ipynb
03_Built_in_functions_and_containers.ipynb
04a_Conditional_execution_and_loops.ipynb
04b_Opening_and_writing_to_files.ipynb
a_new_play_copy.txt
files/


In [3]:
less files/Hamlet.txt

In [4]:
!python files/testScript.py

This is an example of a Python script. This file is stored in the folder Lecture4


## The built-in function `open()`

- Opens a file and returns a corresponding file object that allows us to interact with the content of the file
- Syntax: `f = open('file', mode='rt')`
- Some possible values of mode:
    - `w` - open for writing
    - `r` - open for reading (default)
    - `b` - binary mode
    - `t` - text mode (default)
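To see how the mode values combine, here is a minimal sketch that writes a small file and reads it back in both text mode and binary mode. The file `example.txt` is hypothetical and is created in a temporary folder, so none of the course files are touched.

```python
import os
import tempfile

# Hypothetical file in a temporary folder, used only for illustration
folder = tempfile.mkdtemp()
file_path = os.path.join(folder, 'example.txt')

# 'wt': open for writing in text mode
f = open(file_path, mode='wt')
f.write('ACT I\n')
f.close()

# 'rt': open for reading in text mode (the default), gives a string
f = open(file_path, mode='rt')
as_text = f.read()
f.close()

# 'rb': open for reading in binary mode, gives a bytes object
f = open(file_path, mode='rb')
as_bytes = f.read()
f.close()

print(type(as_text))    # <class 'str'>
print(type(as_bytes))   # <class 'bytes'>
```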

## Let's open `Hamlet.txt`

In [5]:
# The built-in function open() opens the file
# We can access the file content using the reader variable

reader = open('files/Hamlet.txt')

- To see the content we have to read the now open file

In [6]:
# Read all the lines of Hamlet at the same time

text_of_hamlet = reader.readlines()

# Print the first five lines of Hamlet

text_of_hamlet[0:5]

['ACT I\n',
 '\n',
 'SCENE I. Elsinore. A platform before the castle.\n',
 '\n',
 'FRANCISCO at his post. Enter to him BERNARDO\n']

- Then we must close the file

In [7]:
reader.close() # We can no longer access the content of Hamlet.txt

In [8]:
reader.readlines()

ValueError: I/O operation on closed file.

## It's smart to use exception handling when interacting with files

In [9]:
# Make sure that the file is closed, even if something crashes

reader2 = open('files/Hamlet.txt')
try:
    text_hamlet2 = reader2.readlines()
except Exception:
    print('Something went wrong!')
finally:
    reader2.close()

In [10]:
# Print the first five lines of Hamlet

text_hamlet2[0:5]

['ACT I\n',
 '\n',
 'SCENE I. Elsinore. A platform before the castle.\n',
 '\n',
 'FRANCISCO at his post. Enter to him BERNARDO\n']

In [11]:
reader2.readlines()

ValueError: I/O operation on closed file.

## The best way to interact with files is to use the `with` statement

* It makes exception handling cleaner and more readable: the file is closed automatically when the block ends
* Usually used when opening files that are stored on disk or online
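The following sketch illustrates that a file opened with `with` is closed as soon as the block ends, even if something crashes inside the block. The file `example.txt` and the simulated crash are hypothetical and only serve as an illustration; `closed` is an attribute of the file object that tells us whether the file has been closed.

```python
import os
import tempfile

# Hypothetical file in a temporary folder, used only for illustration
file_path = os.path.join(tempfile.mkdtemp(), 'example.txt')

with open(file_path, mode='wt') as writer:
    writer.write('ACT I\n')

# The file was closed when the with block ended
closed_after_with = writer.closed

try:
    with open(file_path) as reader:
        first_line = reader.readline()
        raise RuntimeError('simulated crash')   # something goes wrong
except RuntimeError:
    pass

# The file was closed even though the block crashed
closed_after_error = reader.closed

print(closed_after_with, closed_after_error)   # True True
```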

In [12]:
with open('files/Hamlet.txt') as reader3:
    text_hamlet3 = reader3.readlines()

In [13]:
# Print the last five lines of Hamlet 
text_hamlet3[-5:]

['Speak loudly for him.\n',
 'Take up the bodies: such a sight as this\n',
 'Becomes the field, but here shows much amiss.\n',
 'Go, bid the soldiers shoot.\n',
 'A dead march. Exeunt, bearing off the dead bodies; after which a peal of ordnance is shot off']

In [14]:
#reader3.readlines()

## Writing to a file 

- Opening a file in mode `w` overwrites any existing content of the file

In [21]:
less files/a_new_play.txt

In [16]:
our_play_text = 'ACT I\n\t\n\tSCENE I. In the course EDI3400.\n\t\n\tFRANCISCO is in his seat.\n'

In [17]:
our_play = our_play_text.split('\t')

In [18]:
our_play

['ACT I\n',
 '\n',
 'SCENE I. In the course EDI3400.\n',
 '\n',
 'FRANCISCO is in his seat.\n']

In [19]:
with open('a_new_play.txt', mode='wt') as writer:
    writer.writelines(our_play)
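A short sketch of what overwriting means in practice. It uses a hypothetical file in a temporary folder. Note that mode `a` (append) is not in the list of modes given earlier; it is shown here only as a contrast to `w`.

```python
import os
import tempfile

# Hypothetical file in a temporary folder, used only for illustration
file_path = os.path.join(tempfile.mkdtemp(), 'example_play.txt')

with open(file_path, mode='wt') as writer:
    writer.writelines(['ACT I\n', 'SCENE I\n'])

# Opening with 'w' again removes the old content before writing
with open(file_path, mode='wt') as writer:
    writer.writelines(['ACT II\n'])

with open(file_path) as reader:
    after_overwrite = reader.readlines()

# Opening with 'a' keeps the old content and adds to the end
with open(file_path, mode='at') as writer:
    writer.writelines(['SCENE I\n'])

with open(file_path) as reader:
    after_append = reader.readlines()

print(after_overwrite)   # ['ACT II\n']
print(after_append)      # ['ACT II\n', 'SCENE I\n']
```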

## Read a CSV (Comma Separated Values) file:

In [22]:
ls files

Hamlet.txt
a_new_play.txt
a_new_play_copy.txt
daily_media_consumption_norway_percentage.csv
example_sales.xlsx
hello_world.py
notebook_user_interface.png
testScript.py


In [23]:
# We use a module from the Python Standard Library to handle the CSV file
# We will talk more about the Python Standard Library in lecture 5
import csv

# link to data https://www.ssb.no/en/statbank/table/04487
# now we will look at radio, tv and internet consumption in Norway

media_consumption = []
with open('files/daily_media_consumption_norway_percentage.csv', newline='') as csvfile:
    csv_reader = csv.reader(csvfile, delimiter=';')
    for row in csv_reader:
        media_consumption.append(row)

In [24]:
# The data show the percentage of Norwegians who report using each medium on a given day
 
media_consumption

[['Date', ' Radio', ' Television', ' Internet'],
 ['1991', '71', '81', ''],
 ['1992', '65', '80', ''],
 ['1994', '67', '82', ''],
 ['1995', '66', '84', ''],
 ['1996', '59', '82', ''],
 ['1997', '61', '84', '7'],
 ['1998', '59', '83', '10'],
 ['1999', '60', '84', '18'],
 ['2000', '57', '82', '27'],
 ['2001', '56', '85', '35'],
 ['2002', '58', '85', '35'],
 ['2003', '58', '84', '42'],
 ['2004', '58', '83', '44'],
 ['2005', '55', '85', '55'],
 ['2006', '54', '83', '60'],
 ['2007', '53', '82', '66'],
 ['2008', '54', '80', '71'],
 ['2009', '53', '80', '73'],
 ['2010', '56', '82', '77'],
 ['2011', '55', '81', '80'],
 ['2012', '60', '77', '80'],
 ['2013', '59', '74', '85'],
 ['2014', '64', '74', '88'],
 ['2015', '59', '67', '87'],
 ['2016', '59', '67', '89'],
 ['2017', '54', '62', '90'],
 ['2018', '50', '60', '91'],
 ['2019', '48', '48', '90'],
 ['2020', '49', '48', '92'],
 ['2021', '47', '46', '93']]
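Two details in the output above are worth a closer look: the column names start with a space, and a missing value becomes an empty string. The sketch below reproduces both on a tiny, hypothetical excerpt that mimics the layout of the file (the real file is not read here). It also shows the `skipinitialspace` parameter of `csv.reader`, which was not used above and which ignores spaces directly after the delimiter.

```python
import csv
import io

# Hypothetical excerpt that mimics the layout of the CSV file
sample_text = 'Date; Radio; Television; Internet\n1991;71;81;\n1997;61;84;7\n'

rows = []
csvfile = io.StringIO(sample_text)   # behaves like an open text file
csv_reader = csv.reader(csvfile, delimiter=';', skipinitialspace=True)
for row in csv_reader:
    rows.append(row)

print(rows[0])   # ['Date', 'Radio', 'Television', 'Internet']
print(rows[1])   # ['1991', '71', '81', '']
print(rows[2])   # ['1997', '61', '84', '7']
```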

## There are better ways of importing tabular data in Python

- We will use `pandas` for this later in the course

- The data in the list `media_consumption` is not easy to work with

- Let's give it a try and make four lists containing the data in numerical form

## Extracting the dates

In [25]:
# The number of rows/lines in the data set

len(media_consumption)

31

In [26]:
date = []          # This list will store the dates

for i in range(31-1):
    row = media_consumption[i+1]
    date.append(int(row[0]))

In [27]:
print(date)

[1991, 1992, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021]
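An alternative to counting rows by hand is to loop directly over all rows except the header, using a slice. This is a sketch on a small, hypothetical sample with the same structure as `media_consumption`; it keeps working if rows are added to the data later.

```python
# Hypothetical sample with the same structure as media_consumption
sample = [['Date', ' Radio', ' Television', ' Internet'],
          ['1991', '71', '81', ''],
          ['1992', '65', '80', '']]

date = []   # this list will store the dates

# sample[1:] is every row except the first one (the header)
for row in sample[1:]:
    date.append(int(row[0]))

print(date)   # [1991, 1992]
```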


## Adding radio and TV consumption 

In [28]:
date = []     # this list will store the dates
radio = []    # this list will store the radio consumption
tv = []       # this list will store the tv consumption

for i in range(31-1):
    row = media_consumption[i+1]
    date.append(int(row[0]))
    radio.append(float(row[1]))
    tv.append(float(row[2]))

In [29]:
print(tv)

[81.0, 80.0, 82.0, 84.0, 82.0, 84.0, 83.0, 84.0, 82.0, 85.0, 85.0, 84.0, 83.0, 85.0, 83.0, 82.0, 80.0, 80.0, 82.0, 81.0, 77.0, 74.0, 74.0, 67.0, 67.0, 62.0, 60.0, 48.0, 48.0, 46.0]


## And the internet consumption ...

In [30]:
date = []     # this list will store the dates
radio = []    # this list will store the radio consumption
tv = []       # this list will store the tv consumption
internet = [] # this list will store the internet consumption

for i in range(31-1):
    row = media_consumption[i+1]
    date.append(int(row[0]))
    radio.append(float(row[1]))
    tv.append(float(row[2]))
    internet.append(float(row[3]))

ValueError: could not convert string to float: ''

In [32]:
# We can solve this by using a try statement

date = []     # this list will store the dates
radio = []    # this list will store the radio consumption
tv = []       # this list will store the tv consumption
internet = [] # this list will store the internet consumption

for i in range(31-1):
    row = media_consumption[i+1]
    date.append(int(row[0]))
    radio.append(float(row[1]))
    tv.append(float(row[2]))
    try:
        internet.append(float(row[3]))
    except ValueError:
        internet.append(None)   # or we can add zeros?

In [33]:
print(internet)

[None, None, None, None, None, 7.0, 10.0, 18.0, 27.0, 35.0, 35.0, 42.0, 44.0, 55.0, 60.0, 66.0, 71.0, 73.0, 77.0, 80.0, 80.0, 85.0, 88.0, 87.0, 89.0, 90.0, 91.0, 90.0, 92.0, 93.0]
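If the same conversion is needed for several columns, the try statement can be moved into a small function. This is one possible sketch; the function name `to_float_or_none` and the sample values are our own and not part of the data set.

```python
def to_float_or_none(text):
    """Convert text to a float, or return None if the text is not a number."""
    try:
        return float(text)
    except ValueError:
        return None

# Hypothetical values, similar to the last column in the data
values = ['', '7', '10']

internet = []
for value in values:
    internet.append(to_float_or_none(value))

print(internet)   # [None, 7.0, 10.0]
```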


## Is Radio consumption ever larger than TV consumption?

In [34]:
# Let's iterate through the data and check whether 
# there are some years where radio consumption is 
# larger than tv consumption

for i in range(31-1):
    if radio[i] > tv[i]:
        print(date[i])

2020
2021


In [35]:
# What about internet consumption vs tv consumption

for i in range(31-1):
    try:
        if internet[i] > tv[i]:
            print(date[i])
    except TypeError:   # internet[i] is None for years without data
        pass

2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
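The same comparison can be written without a try statement by checking for `None` explicitly, and without index numbers by using the built-in function `zip()`, which walks through several lists in parallel. A sketch on three hypothetical rows taken from the data above:

```python
# Three years taken from the data above, used only for illustration
date = [1996, 2011, 2012]
tv = [82.0, 81.0, 77.0]
internet = [None, 80.0, 80.0]

years = []
for year, tv_value, internet_value in zip(date, tv, internet):
    # Skip years where the internet consumption is missing
    if internet_value is not None and internet_value > tv_value:
        years.append(year)

print(years)   # [2012]
```

Checking for `None` explicitly makes it clear why some years are skipped, whereas a broad `except` can hide unrelated errors.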
