# 07 Python: Reading and Writing Files

Many data processing tasks require reading from and writing to files. `open()` is a built-in function for creating, reading, and writing files.  This function takes two main parameters: `filename` and `mode`.  N.B. there are other optional parameters, which we can see by typing `help(open)`.
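
As a minimal sketch of the `filename` and `mode` parameters (plus `encoding`, one of the optional parameters listed by `help(open)`), the following writes a small file and reads it back. The file name `demo.txt` and the temporary directory are illustrative assumptions, not part of the course files; the `with` statement used here is covered later in these notes.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched (illustrative only)
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.txt")  # hypothetical file name

    # mode "w" creates the file (or overwrites it) for writing
    with open(path, "w", encoding="utf-8") as fout:
        fout.write("hello\n")

    # mode "r" opens the same file for reading
    with open(path, "r", encoding="utf-8") as fin:
        text = fin.read()

print(repr(text))
```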


# Reading text files into Python

In [2]:
infile = open("input.txt", "r")
print(infile)
print(type(infile))

<_io.TextIOWrapper name='input.txt' mode='r' encoding='UTF-8'>
<class '_io.TextIOWrapper'>


`open()` returns a *file object* which we have saved to the variable called `infile`.  Common naming conventions for file input/file output objects are `fin` and `fout`, respectively. This object has methods to read and write data. To read the entire contents of the file, use the `read` method.  Notice that the `read` method returns a *string*:

In [3]:
this = infile.read()
print(type(this))
print(this)

<class 'str'>
1
2
3
4
5
6
7
8
9
10



Alternatively, we can specify the number of characters we want to return as an argument:

In [10]:
infile = open("input.txt", "r")
infile.read(4) # if we use 'print(infile.read(4))' the \n will be shown as new lines

'1\n2\n'

`read` will keep track of where we are in the file so if we call it again it will read the *next* 4 characters.  We can repeatedly call this function until there are no more characters to be read:

In [11]:
print(infile.read(4)) # since we are using print() the \n get converted to new lines.

3
4



In [12]:
print(infile.read(4)) 
print(infile.read(4))
print(infile.read(5)) # this reaches to the end of the file

5
6

7
8

9
10



In [50]:
infile.read(1) # there are no characters left to be read

''
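
The repeated `read(4)` calls above can be sketched as a loop that stops once `read` returns the empty string. This is only an illustration: it recreates a file like `input.txt` in a temporary directory (an assumption, so the example is self-contained), and the name `chunks` is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "input.txt")

    # Recreate a file like input.txt: the numbers 1 to 10, one per line
    with open(path, "w") as fout:
        for n in range(1, 11):
            fout.write(str(n) + "\n")

    chunks = []
    with open(path, "r") as infile:
        while True:
            chunk = infile.read(4)  # read up to 4 characters
            if chunk == "":         # the empty string signals the end of the file
                break
            chunks.append(chunk)

print(chunks)
```

Joining the pieces with `"".join(chunks)` gives back the full contents of the file.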

Alternatively, we can read the file one line at a time using the `readline` method:

In [51]:
infile = open("input.txt", "r")
print(infile.readline())

1



As before, `readline` will keep track of where we are in the file until there are no more lines to be read. 

In [52]:
print(infile.readline()) # 2
print(infile.readline()) # 3
print(infile.readline()) # 4
print(infile.readline()) # 5
print(infile.readline()) # 6
print(infile.readline()) # 7
print(infile.readline()) # 8
print(infile.readline()) # 9
print(infile.readline()) # 10
print(infile.readline()) # no more lines to be read

2

3

4

5

6

7

8

9

10




In [53]:
infile.readline() # nothing left to print

''

It is good practice to close a file once you no longer need it to free up resources.  Once the file is closed, any further attempts to read from or write to the file object will raise a `ValueError`.

In [55]:
fin = open("input.txt", "r") 
print(fin.read(5))
fin.close() # close the file
#fin.read(5)# we will no longer be able to access this file once closed

1
2
3
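
To see what happens when a closed file is used, here is a small sketch that catches the resulting error instead of letting the program stop. The temporary file and the variable name `message` are assumptions made for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "input.txt")
    with open(path, "w") as fout:
        fout.write("1\n2\n3\n")

    fin = open(path, "r")
    first = fin.read(2)  # works: the file is still open
    fin.close()

    try:
        fin.read(2)      # fails: the file has been closed
        message = "no error"
    except ValueError as err:
        message = str(err)

print(fin.closed)
print(message)
```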


Notice how we can save the entire file as a string object to a variable.  If we do that, we can access the information regardless of whether or not the file object has been closed:

In [1]:
fin = open("input.txt", "r") 
val = fin.read() # read file as one string
fin.close()
print(val)

1
2
3
4
5
6
7
8
9
10



To get the first 4 characters of `val`, we use slicing as we would for any other string.

In [61]:
print(val[0:4]) # first 4 characters
print(val[4:8]) # next 4 characters

1
2

3
4



You can check whether the file is closed using `<fileobject>.closed`, which returns `True` or `False`.

In [83]:
print(fin.closed)

True


## Iteration with files

Alternatively, we could read every line using the `readlines` method.  Notice that this method returns a Python list.  We know that we can iterate through a Python list in a `for` loop:

In [62]:
fin = open("input.txt", "r")
alllines = fin.readlines()
print(type(alllines))
print(alllines) 

<class 'list'>
['1\n', '2\n', '3\n', '4\n', '5\n', '6\n', '7\n', '8\n', '9\n', '10\n']


In [65]:
for lines in alllines:
    print(lines)

1

2

3

4

5

6

7

8

9

10



You may have noticed the extra blank line between lines.
This is because the `print` function adds a newline character by default, so we are actually printing `1\n\n` for the first line, for example.  To remedy this, we can use the `strip` method.
[strip()](https://www.geeksforgeeks.org/python-string-strip-2/) removes any of the characters given as its argument from both the left and right ends of a string, stopping on each side at the first character that is not in that set.  If no argument is given, leading and trailing whitespace is removed instead.


In [67]:
# example usage of strip()
print("acabbaaccbbcac".strip("ac"))

bbaaccbb


In [69]:
for lines in alllines:
    print(lines.strip('\n'))

1
2
3
4
5
6
7
8
9
10
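
The behaviour of `strip` described above, together with its one-sided relatives `lstrip` and `rstrip`, can be compared side by side. This is a small sketch on a made-up string; the variable names are illustrative only.

```python
line = "  10 \n"

both_ends = line.strip()         # no argument: whitespace removed from both ends
right_only = line.rstrip("\n")   # only trailing newline characters removed
left_only = line.lstrip()        # only leading whitespace removed

# With an argument, strip treats it as a set of characters, not as a prefix/suffix
chars = "acabbaaccbbcac".strip("ca")

print(repr(both_ends))
print(repr(right_only))
print(repr(left_only))
print(chars)
```

Using `rstrip("\n")` is a slightly more targeted choice than `strip("\n")` when only the end-of-line character should go.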


In [13]:
fin = open("input.txt", "r")
for x in fin:
    print(x)

1

2

3

4

5

6

7

8

9

10



In [14]:
fin = open("input.txt", "r")
for x in fin:
    line = x.strip('\n')
    print(line)
    if (int(line) == 4):
        break
fin.close()

1
2
3
4


In [75]:
fin = open("input.txt", "r")
for x in fin:
    print(x.strip('\n'))
    if '4' in x:
        break
fin.close()

1
2
3
4


### for and while loops for iterations
We can accomplish similar things using either the `for` or `while` loops:

In [78]:
# iterate through input.txt line by line using a for loop:
infile = open("input.txt", "r")
for line in infile:
    print(line.strip('\n'))
infile.close()

1
2
3
4
5
6
7
8
9
10


In [79]:
# iterate through input.txt line by line using a while loop:
infile = open("input.txt", "r")
line = infile.readline()
while line != "":
    # print the current line first, then read the next one
    print(line.strip('\n'))
    line = infile.readline()
infile.close()

1
2
3
4
5
6
7
8
9
10


## with for closing files
To avoid writing programs which forget to close the file, we could also use [`with`](https://effbot.org/zone/python-with-statement.htm). The `with` statement will automatically close the file after the suite is exited.  Hence we never have to write `infile.close()`.

In [84]:
# The following will auto-close file
with open("input.txt", "r") as infile:
    for line in infile:
        print(line.strip('\n'))
print(infile.closed)

1
2
3
4
5
6
7
8
9
10
True
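
One reason `with` is preferred is that the file is closed even if an error interrupts the block. The sketch below simulates a failure with a deliberately raised `RuntimeError`; the temporary file and the variable names are assumptions for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "input.txt")
    with open(path, "w") as fout:
        fout.write("1\n2\n3\n")

    try:
        with open(path, "r") as infile:
            first_line = infile.readline()
            # simulated failure part-way through reading
            raise RuntimeError("something went wrong mid-read")
    except RuntimeError:
        pass

    # the with statement closed the file on the way out, despite the error
    closed_after_error = infile.closed

print(closed_after_error)
```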


# Writing Text to a File

Selecting the write `'w'` mode will allow us to *write* (rather than read) text to a file.

In [6]:
outfile = open("output.txt", "w")
# writes the numbers 1 through 10 on new lines
for n in range(1,11):
    # notice how we need to convert numbers to strings before we write them to files
    outfile.write(str(n)+"\n")

**Warning 1**: The above method will overwrite the file `output.txt` if it exists; otherwise, the file will be created.<br>
**Warning 2**: The contents are not guaranteed to be written to the file until we close it, because output is buffered. <br>
**Warning 3**: Numbers need to be converted to strings before writing.

In [7]:
# not guaranteed to be written to the file until we run the following:
outfile.close() 

The preferred method is to use the `with` clause: (this way it is impossible for us to forget to close the file)

In [19]:
with open('output.txt', 'w') as fout:
    for n in range(1,11):
        fout.write(str(n) + "\n")

Notice how `output.txt` gets overwritten after the following chunk is run:

In [91]:
with open("output.txt", "w") as f:
    f.write("Test")

### creating an empty file:
There are a few different ways we can do this:

In [89]:
# creates an empty file with "pass"
with open("test2.txt", "w") as f:
    pass

In [90]:
# creates an empty file named "filename.txt"
open("filename.txt", 'w').close()

Like the `read` functions, `write` will remember its place within the document and will pick up where it left off:

In [92]:
with open("test.txt", "w") as fout:
    fout.write("Test")
    fout.write("Test again")

In [94]:
# this overwrites the file created above
with open("test.txt", "w") as fout:
    fout.write("Test\n")
    fout.write("Test again")

## appending to an existing file
To add text to the end of an existing file we need to use `open` with the mode `a` (for append).

In [8]:
outfile = open("output.txt", "a")
for n in range(11,20):
    outfile.write(str(n) + "\n")
outfile.close() 


In [9]:
# if newfile.txt doesn't exist already, the following code will create it
outfile = open("newfile.txt", "a")
for n in range(11,20):
    outfile.write(str(n) + "\n")
outfile.close() 
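
The difference between the `'w'` and `'a'` modes can be checked by reading the file back after each step. This is a minimal sketch in a temporary directory; the strings written and the variable names are made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "output.txt")

    with open(path, "w") as fout:   # creates the file
        fout.write("first\n")
    with open(path, "a") as fout:   # adds to the end of the existing file
        fout.write("second\n")
    with open(path, "r") as fin:
        after_append = fin.read()

    with open(path, "w") as fout:   # discards the previous contents
        fout.write("third\n")
    with open(path, "r") as fin:
        after_overwrite = fin.read()

print(repr(after_append))
print(repr(after_overwrite))
```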

## reading CSV files
A common type of file you may want to read into your program is a comma separated value (CSV) file.
We can read csv files by iterating over the file object and using `strip` and `split`.

In [10]:
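with open("data.csv", "r") as infile:
    for line in infile:
        # remove surrounding spaces and the trailing newline
        line = line.strip(" \n")
        # split the row into a list of cell values (still strings)
        fields = line.split(",")
        # optionally strip whitespace around each cell:
        #for i in range(0,len(fields)):
            #fields[i] = fields[i].strip()
        #print(fields)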

In [11]:
fields

['1.7702', '1.1211', '-0.6032', '-0.6982', '0.4066']

After the loop finishes, `line` and `fields` hold the values from the last row of the file:

* `line` is a string (with the end of line character `\n` removed)
* `fields` is a list containing the individual cell values for the corresponding row.


In [103]:
print(type(line))
print(line)
print(type(fields))
print(fields)

<class 'str'>
1.7702,1.1211,-0.6032,-0.6982,0.4066
<class 'list'>
['1.7702', '1.1211', '-0.6032', '-0.6982', '0.4066']
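
Since every entry of `fields` is still a string, a natural next step is converting the entries to numbers. A minimal sketch, using the last row shown above as a literal string so that no file is needed; the names `values` and `largest` are illustrative.

```python
# the last row of data.csv as shown above, written out as a string
line = "1.7702,1.1211,-0.6032,-0.6982,0.4066\n"

fields = line.strip(" \n").split(",")        # list of strings
values = [float(field) for field in fields]  # list of floats

largest = max(values)  # numeric operations now behave as expected
print(values)
print(largest)
```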


Alternatively, you could use the `csv` module for reading in CSV files.  In an effort to reduce the number of lines printed, let's filter the results to only rows having the first element greater than 1.  Remember that each element in this list is currently being treated as a string.  Before we do any calculations on these numeric values, we need to convert them using `float`.

In [3]:
import csv
# only print the rows that start with a number> 1 
with open("data.csv", "r") as infile:
    csvfile = csv.reader(infile)
    for row in csvfile:
        if float(row[0]) > 1:
            print(row)

['2.3358', '0.0044', '0.3163', '0.8698', '1.4817']
['3.1387', '-0.1494', '1.1793', '2.1482', '-0.2141']
['1.3228', '0.1419', '0.6433', '2.5167', '0.9105']
['2.9412', '-1.9048', '-1.328', '0.3225', '-0.2039']
['1.5379', '2.7226', '-0.0049', '-3.8528', '-0.4739']
['1.3476', '-0.0039', '-0.8244', '0.2143', '0.0362']
['1.0519', '-1.3884', '1.0226', '-1.0947', '1.3978']
['1.7878', '1.8082', '-0.694', '0.6162', '-0.9046']
['2.1175', '0.8949', '-1.765', '0.6082', '0.8375']
['2.653', '0.0148', '0.4559', '-0.0419', '1.2743']
['3.008', '-0.271', '0.4868', '0.4959', '0.1369']
['1.896', '1.0309', '1.1718', '2.3715', '1.6846']
['1.1017', '-0.5897', '-0.3399', '1.2663', '1.6784']
['1.0155', '-0.2549', '1.2958', '0.6724', '0.484']
['1.1547', '0.2841', '0.3959', '-0.2621', '1.2498']
['1.2275', '2.3317', '-1.3622', '-0.9929', '-1.5922']
['1.9564', '-0.6527', '0.4776', '1.3519', '-0.9619']
['2.0049', '-0.6503', '0.0042', '-0.3649', '1.1627']
['1.7067', '-0.4797', '-0.2498', '1.1692', '0.5081']
['2.8036'

In [4]:
print(type(csvfile))
print(type(row))
print(row)

<class '_csv.reader'>
<class 'list'>
['1.7702', '1.1211', '-0.6032', '-0.6982', '0.4066']


In [5]:
# only print the rows that start with a number> 1 
# save it to a new list
data = []
with open("data.csv", "r") as infile:
    csvfile = csv.reader(infile)
    for row in csvfile:
        if float(row[0]) > 1:
            data.append(row)
            
print(data)

[['2.3358', '0.0044', '0.3163', '0.8698', '1.4817'], ['3.1387', '-0.1494', '1.1793', '2.1482', '-0.2141'], ['1.3228', '0.1419', '0.6433', '2.5167', '0.9105'], ['2.9412', '-1.9048', '-1.328', '0.3225', '-0.2039'], ['1.5379', '2.7226', '-0.0049', '-3.8528', '-0.4739'], ['1.3476', '-0.0039', '-0.8244', '0.2143', '0.0362'], ['1.0519', '-1.3884', '1.0226', '-1.0947', '1.3978'], ['1.7878', '1.8082', '-0.694', '0.6162', '-0.9046'], ['2.1175', '0.8949', '-1.765', '0.6082', '0.8375'], ['2.653', '0.0148', '0.4559', '-0.0419', '1.2743'], ['3.008', '-0.271', '0.4868', '0.4959', '0.1369'], ['1.896', '1.0309', '1.1718', '2.3715', '1.6846'], ['1.1017', '-0.5897', '-0.3399', '1.2663', '1.6784'], ['1.0155', '-0.2549', '1.2958', '0.6724', '0.484'], ['1.1547', '0.2841', '0.3959', '-0.2621', '1.2498'], ['1.2275', '2.3317', '-1.3622', '-0.9929', '-1.5922'], ['1.9564', '-0.6527', '0.4776', '1.3519', '-0.9619'], ['2.0049', '-0.6503', '0.0042', '-0.3649', '1.1627'], ['1.7067', '-0.4797', '-0.2498', '1.1692', 

In [7]:
with open("data2.csv", "w", newline="") as writeFile:
    writer = csv.writer(writeFile)
    writer.writerows(data)
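
The `csv` documentation recommends opening files with `newline=""` when using `csv.writer` or `csv.reader`, which avoids extra blank rows on some platforms. The following sketch writes two of the rows shown earlier and reads them back; the temporary directory and the name `rows_back` are assumptions for illustration.

```python
import csv
import os
import tempfile

rows = [
    ["2.3358", "0.0044", "0.3163", "0.8698", "1.4817"],
    ["3.1387", "-0.1494", "1.1793", "2.1482", "-0.2141"],
]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data2.csv")

    # writerows writes one CSV line per inner list
    with open(path, "w", newline="") as write_file:
        csv.writer(write_file).writerows(rows)

    # reading the file back gives the same lists of strings
    with open(path, "r", newline="") as read_file:
        rows_back = list(csv.reader(read_file))

print(rows_back == rows)
```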

### Sidenote: other useful modules
A useful function in case you forget the name of your file is `os.listdir`, which lists all files in a directory.

In [108]:
import os
print(os.listdir("."))

['.DS_Store', '.ipynb_checkpoints', '07Python.ipynb', '07PythonII.ipynb', 'clickersols.ipynb', 'data.csv', 'filename.txt', 'input.txt', 'Lecture 7 demo.ipynb', 'ls_orchid.fasta', 'ls_orchid.fasta.rtf', 'output.txt', 'province_population.csv', 'provincepop.csv', 'PythonDA.ipynb', 'PythonRW.ipynb', 'random.fasta.rtf', 'rocketman.txt', 'sequence.fasta.rtf', 'test.txt', 'whatever.csv']


We can also use the module `pprint` (for pretty print) to make this output a little neater:

In [109]:
from pprint import pprint
pprint(os.listdir("."))

['.DS_Store',
 '.ipynb_checkpoints',
 '07Python.ipynb',
 '07PythonII.ipynb',
 'clickersols.ipynb',
 'data.csv',
 'filename.txt',
 'input.txt',
 'Lecture 7 demo.ipynb',
 'ls_orchid.fasta',
 'ls_orchid.fasta.rtf',
 'output.txt',
 'province_population.csv',
 'provincepop.csv',
 'PythonDA.ipynb',
 'PythonRW.ipynb',
 'random.fasta.rtf',
 'rocketman.txt',
 'sequence.fasta.rtf',
 'test.txt',
 'whatever.csv']
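
When a directory holds many files, the list returned by `os.listdir` can be filtered like any other list. A small sketch that keeps only the CSV files; it runs in a temporary directory with made-up empty files so that it does not depend on the folder contents shown above.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # create a few empty files to list (illustrative names)
    for name in ["data.csv", "input.txt", "provinces.csv"]:
        open(os.path.join(tmpdir, name), "w").close()

    # keep only the names ending in .csv, sorted alphabetically
    csv_files = sorted(
        name for name in os.listdir(tmpdir) if name.endswith(".csv")
    )

print(csv_files)
```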


#### Example 1
Write a Python program that writes to the file `test.txt` the numbers from 20 to 10, each on its own line, in descending order.

In [10]:
with open("test.txt", "w") as outfile:
    for n in range(20,9,-1):
        outfile.write(str(n)+"\n")

#### Example 2
Write a Python program that reads your newly created `test.txt` file line by line and only prints out the value if it is even.

In [12]:
with open("test.txt", "r") as infile:
    for line in infile:
        if int(line) % 2 == 0:
            #print(line)
            print(line.strip("\n"))

20
18
16
14
12
10


#### Example 3
Print out the contents of the census file  [provinces.csv](https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1710000501) available on Canvas.  You may use the `csv` module if you wish.

In [23]:
import csv

with open("provinces.csv", "r") as infile:
    csvfile = csv.reader(infile)
    for row in csvfile:   
        # row is a list
        print(row)

['', '2013', '2014', '2015', '2016', '2017']
['Canada', '35,152,370', '35,535,348', '35,832,513', '36,264,604', '36,708,083']
['Newfoundland and Labrador', '527,399', '528,386', '528,815', '530,305', '528,817']
['Prince Edward Island', '145,198', '145,915', '146,791', '149,472', '152,021']
['Nova Scotia', '943,049', '942,209', '941,545', '948,618', '953,869']
['New Brunswick', '755,710', '754,700', '753,944', '757,384', '759,655']
['Quebec', '8,151,331', '8,210,533', '8,254,912', '8,321,888', '8,394,034']
['Ontario', '13,555,754', '13,680,425', '13,789,597', '13,976,320', '14,193,384']
['Manitoba', '1,265,588', '1,280,912', '1,295,422', '1,318,115', '1,338,109']
['Saskatchewan', '1,104,825', '1,120,639', '1,131,150', '1,148,588', '1,163,925']
['Alberta', '3,997,950', '4,108,416', '4,177,527', '4,236,376', '4,286,134']
['British Columbia', '4,590,081', '4,646,462', '4,694,699', '4,757,658', '4,817,160']
['Yukon', '36,298', '36,817', '37,289', '38,086', '38,459']
['Northwest Territories'

#### Example 4
Try to print out only the provinces with population > 1 million people in 2015 from the data in **Example 3**.  Hint: You will need to remove the commas from the numbers (e.g. 44214 instead of 44,214) using the `replace()` method.

In [26]:
int(row[3].replace(",",""))

36608

In [27]:
with open("provinces.csv", "r") as infile:
    csvfile = csv.reader(infile)
    for row in csvfile:   
        # need to remove the commas before feeding to int
        if (row[0] != '' and row[0] != 'Canada') and int(row[3].replace(",","")) > 1000000:
            print(row[0],row[3])
            #print(row[0],int(row[3].replace(",","")))


Quebec 8,254,912
Ontario 13,789,597
Manitoba 1,295,422
Saskatchewan 1,131,150
Alberta 4,177,527
British Columbia 4,694,699
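
The comma-removal step can be wrapped in a small helper so the filter reads more clearly. This is only a sketch: `parse_population` is a hypothetical name, and the three rows are copied from the 2015 column of the output in **Example 3** rather than read from `provinces.csv`.

```python
def parse_population(text):
    """Convert a string such as '1,295,422' to the integer 1295422."""
    return int(text.replace(",", ""))

# (province, 2015 population) pairs taken from the output above
rows = [
    ["Manitoba", "1,295,422"],
    ["Yukon", "37,289"],
    ["Ontario", "13,789,597"],
]

large = [name for name, pop in rows if parse_population(pop) > 1000000]
print(large)
```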


# Handling Exceptions in Python
An **exception** is an error situation that must be handled or the program will fail. **Exception handling** is how your program deals with these errors.


In [28]:
10 * (1/0) # ZeroDivisionError: division by zero

ZeroDivisionError: division by zero

In [2]:
4 + spam*3 # NameError: name 'spam' is not defined

NameError: name 'spam' is not defined

In [3]:
'2' + 2 # TypeError: can only concatenate str (not "int") to str

TypeError: can only concatenate str (not "int") to str

Exception handling is useful in the context of reading files: if we try to read a file that does not exist, the entire program need not fail. Instead, we can catch the exception in the following manner:

In [4]:
filename = 'nonexistingfile.txt'
try:
    with open(filename, 'r') as f:
        for line in f:
            print(line)
    
except FileNotFoundError:
    print("Could not read file:", filename)

Could not read file: nonexistingfile.txt
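
The same pattern can be placed inside a function so that the caller decides what to do when the file is missing. A minimal sketch, assuming a hypothetical helper named `read_lines` that returns `None` for a missing file; the temporary directory only keeps the example self-contained.

```python
import os
import tempfile

def read_lines(path):
    """Return the lines of a file without newlines, or None if it is missing."""
    try:
        with open(path, "r") as f:
            return [line.rstrip("\n") for line in f]
    except FileNotFoundError:
        return None

with tempfile.TemporaryDirectory() as tmpdir:
    # this file was never created, so open() raises FileNotFoundError
    missing = read_lines(os.path.join(tmpdir, "nonexistingfile.txt"))

    existing_path = os.path.join(tmpdir, "exists.txt")
    with open(existing_path, "w") as fout:
        fout.write("a\nb\n")
    existing = read_lines(existing_path)

print(missing)
print(existing)
```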


Note that without the try-except our program would error out:

In [5]:
filename = 'nonexistingfile.txt'
with open(filename, 'r') as f:
    for line in f:
        print(line)

FileNotFoundError: [Errno 2] No such file or directory: 'nonexistingfile.txt'

In [7]:
try:
    # try block, exit if error
    num = int(input("Enter a number:"))
    print("You entered:",num)
except ValueError:
    # only executed if exception
    print("Error: Invalid number")
else:
    # only executed if no exception
    print("Thank you for the number")
finally:
    # always executed
    print("Always do finally block")

Enter a number:5
You entered: 5
Thank you for the number
Always do finally block
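
To see which blocks run in each case without typing input by hand, the same structure can be wrapped in a function that records the path taken. This is an illustrative sketch; `describe_flow` is a hypothetical name.

```python
def describe_flow(text):
    """Return the names of the blocks that run when converting text to int."""
    steps = []
    try:
        int(text)
        steps.append("try")       # reached only if int() succeeded
    except ValueError:
        steps.append("except")    # reached only if int() failed
    else:
        steps.append("else")      # reached only if no exception occurred
    finally:
        steps.append("finally")   # always reached
    return steps

print(describe_flow("5"))
print(describe_flow("five"))
```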


Note that we can always generate an error using the `raise` statement.

In [8]:
def raiseHell():
    try:
        raise ValueError
    except ValueError:
        print("You raised Hell!")

In [9]:
raiseHell()

You raised Hell!
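
In practice, `raise` is often given a message describing what went wrong, and the `except` clause can capture the exception object with `as`. A small sketch with a hypothetical function `require_positive`:

```python
def require_positive(n):
    """Return n unchanged, or raise ValueError if n is not positive."""
    if n <= 0:
        raise ValueError(f"expected a positive number, got {n}")
    return n

try:
    require_positive(-3)
    message = "no error"
except ValueError as err:
    # str(err) gives the message passed to ValueError
    message = str(err)

print(message)
```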


**Example** Write a Python program that reads two numbers and converts them to integers, prints both numbers, and then divides the first number by the second number and prints the result.

In [31]:
try:
    num1 = int(input("Enter a number:"))
    print(num1)
    num2 = int(input("Enter a number:"))
    print(num2)
    print(num1,"/",num2,"=",(num1/num2))
except ValueError:
    print("Invalid")
except ZeroDivisionError:
    print("Cannot divide by 0!")

Enter a number:8
8
Enter a number:0
0
Cannot divide by 0!


# Reading URLs with Python
`urllib` is a package that collects several modules for working with
URLs:

- `urllib.request` for opening and reading URLs [read more about it here](https://docs.python.org/3/library/urllib.request.html#module-urllib.request)
- `urllib.parse` for parsing URLs [read more about it here](https://docs.python.org/3/library/urllib.parse.html#module-urllib.parse)

In [13]:
import urllib.request
# don't forget the protocol http://
loc="http://google.com"
site = urllib.request.urlopen(loc)
contents = site.read()
print(type(site))
print(type(contents))
print(contents)
site.close()

<class 'http.client.HTTPResponse'>
<class 'bytes'>
b'<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en-CA"><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title><script nonce="Ud5Xf0FW85LJsQXM1D3tSg==">(function(){window.google={kEI:\'5yizXbz7EcXy-gTojKeIBQ\',kEXPI:\'0,18167,1335580,5662,731,223,510,1065,3152,377,207,904,113,53,1884,250,10,713,271,67,97,93,24,325,890,11,40,13,66,154,236,25,4,95,1130393,143,1197698,462,302802,26305,1294,12383,4855,32692,15247,864,18547,854,2481,2,2,6801,364,3319,5505,224,2215,5945,1119,2,204,375,727,2431,1362,4323,4968,773,2256,1401,3337,1146,9,1966,6193,1719,1496,312,1978,2042,8911,5295,897,1121,956,873,37,1180,2975,2736,3061,2,631,3240,7446,620,1139,1744,21,317,1119,904,1150,975,1,368,2778,919,992,1285,8,2142,1,653,967,601,25,1279,2212,202,323,5,1252,840,324,193,1263,203,8,48,820,3438,108,1

In [14]:
import urllib.parse
import urllib.request

url = 'https://www.ask.com/web'
# Build and encode data
values = {'q' : 'data analysis'}

# URL-encode the dictionary into a query string, then convert it to bytes;
# supplying data makes urlopen send a POST request
data = urllib.parse.urlencode(values)
data = data.encode('utf-8')
req = urllib.request.Request(url, data)
with urllib.request.urlopen(req) as response:
    page = response.read()
    print(page)

b'<!DOCTYPE html>\n<html lang="en">\n<head>\n    <meta charset="UTF-8"/>\n    <meta name="viewport" content="width=device-width,initial-scale=1"/>\n\n    \n\n\n<meta property="og:image" content="//www.ask.com/logo.png" />\n    <meta property="og:description" content="Ask.com is the #1 question answering service that delivers the best answers from the web and real people - all in one place."/>\n    <meta property="og:title" content="Ask.com - What\'s Your Question?"/>\n    <meta property="og:url" content="www.ask.com"/>\n    <meta property="og:site_name" content="Ask.com"/>\n    <meta property="fb:page_id" content="123118179545" />\n    <meta name="twitter:card" content="summary" />\n    <meta name="twitter:site" content="@askdotcom" />\n\n    <link rel="canonical" href="//www.ask.com"/>\n\n\n<link REL="search" type="application/opensearchdescription+xml" HREF="//www.ask.com/ask-search.xml" title="Ask Search">\n\n<title>Ask.com - What\'s Your Question?</title>\n\n\n\n<link rel="styleshe
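
The encoding step in the cell above can be inspected on its own, without making a network request. This sketch shows what `urlencode` produces for the same dictionary and uses `parse_qs` to reverse it; the variable names `query` and `parsed` are illustrative.

```python
import urllib.parse

values = {"q": "data analysis"}

query = urllib.parse.urlencode(values)  # dictionary -> query string
data = query.encode("utf-8")            # string -> bytes, as urlopen expects
parsed = urllib.parse.parse_qs(query)   # query string -> dictionary of lists

print(query)
print(data)
print(parsed)
```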

## Example 
**Question 1** Write a Python program that connects to any web page
and prints its contents.

In [16]:
import urllib.request
loc="https://people.ok.ubc.ca/ivrbik/"
site = urllib.request.urlopen(loc)
contents = site.read()
print(contents)
site.close()

b'<!DOCTYPE html>\n<html>\n\n<head>\n<title>Irene Vrbik</title>\n<link rel="stylesheet" type="text/css" href="css/global.css"></link>\n<link href=\'https://fonts.googleapis.com/css?family=Open+Sans:400,300,600,800,700,100\' rel=\'stylesheet\' type=\'text/css\'></link>\n<link href=\'https://fonts.googleapis.com/css?family=Ovo\' rel=\'stylesheet\' type=\'text/css\'></link> \n<link href=\'https://fonts.googleapis.com/css?family=Vollkorn:400,700\' rel=\'stylesheet\' type=\'text/css\'>\n\n<meta name="description" content="Irene Vrbik\'s personal website.">\n<meta name="keywords" content="Irene Vrbik, Classification, Clustering, Biostatistics">\n<meta name="author" content="Irene Vrbik">\n</head>\n\n<body>\n\n    <div id="Wrapper">\n        <div id="Logo"></div>\n        <div id="Name">Irene Vrbik</div><br>\n        <div id="Uni">>>University of British Columbia Okanagan</div>\n        <div id="NavBar">\n            <ul>\n                <li><a href="index.html">home</a></li>\n              

**Question 2** Write a Python program that connects to: http://www.sharecsv.com/dl/ab69f200ce5071b27e4af626da293d27/province_population.csv and outputs the CSV data.
Modify your program to print each province and its 2015 population in descending sorted order.

In [34]:
# extract the information
import urllib.request
import csv

url = "http://www.sharecsv.com/dl/ab69f200ce5071b27e4af626da293d27/province_population.csv"

with urllib.request.urlopen(url) as site:
    data = site.read().decode()

data

',2011,2012,2013,2014,2015,\r\nCanada,"34,342.80","34,751.50","35,155.50","35,543.70","35,851.80",\r\nNewfoundland and Labrador,525,526.9,528,529.1,527.8,\r\nPrince Edward Island,144,145.3,145.4,146.2,146.4,\r\nNova Scotia,944.5,944.8,943,942.4,943,\r\nNew Brunswick,755.5,756.8,755.7,754.6,753.9,\r\nQuebec,"8,007.70","8,084.80","8,154.80","8,214.90","8,263.60",\r\nOntario,"13,263.50","13,409.60","13,551.00","13,677.70","13,792.10",\r\nManitoba,"1,233.70","1,250.40","1,265.30","1,280.20","1,293.40",\r\nSaskatchewan,"1,066.30","1,087.20","1,106.10","1,122.30","1,133.60",\r\nAlberta,"3,790.20","3,888.60","4,007.70","4,120.90","4,196.50",\r\nBritish Columbia,"4,499.10","4,542.60","4,582.60","4,638.40","4,683.10",\r\nYukon,35.4,36.2,36.4,37,37.4,\r\nNorthwest Territories,43.5,43.6,43.9,44,44.1,\r\nNunavut,34.2,34.7,35.4,36.1,36.9,\r\n'

In [35]:
# write the downloaded text to a local csv file
outfile = open("province_population.csv", "w")
outfile.write(data)
outfile.close()

Now we can manipulate this csv data within Python:

In [36]:
# read in the csv document and save it to a list
provinces = [] 
with open("province_population.csv", "r") as infile:
    csvfile = csv.reader(infile)
    for row in csvfile:         
        print(row)
        if (len(row) > 0 and row[0] != '' and row[0] != 'Canada'):                          
            val = row[5].replace(",","")
            # since the population is stored in thousands, we will multiply by
            # 1000 to store the population in "ones"
            provinces.append([float(val)*1000, row[0]])

['', '2011', '2012', '2013', '2014', '2015', '']
['Canada', '34,342.80', '34,751.50', '35,155.50', '35,543.70', '35,851.80', '']
['Newfoundland and Labrador', '525', '526.9', '528', '529.1', '527.8', '']
['Prince Edward Island', '144', '145.3', '145.4', '146.2', '146.4', '']
['Nova Scotia', '944.5', '944.8', '943', '942.4', '943', '']
['New Brunswick', '755.5', '756.8', '755.7', '754.6', '753.9', '']
['Quebec', '8,007.70', '8,084.80', '8,154.80', '8,214.90', '8,263.60', '']
['Ontario', '13,263.50', '13,409.60', '13,551.00', '13,677.70', '13,792.10', '']
['Manitoba', '1,233.70', '1,250.40', '1,265.30', '1,280.20', '1,293.40', '']
['Saskatchewan', '1,066.30', '1,087.20', '1,106.10', '1,122.30', '1,133.60', '']
['Alberta', '3,790.20', '3,888.60', '4,007.70', '4,120.90', '4,196.50', '']
['British Columbia', '4,499.10', '4,542.60', '4,582.60', '4,638.40', '4,683.10', '']
['Yukon', '35.4', '36.2', '36.4', '37', '37.4', '']
['Northwest Territories', '43.5', '43.6', '43.9', '44', '44.1', '']
[

In [37]:
# each element in provinces list is a list containing the province's population and name
for row in provinces:
    print(row)

[527800.0, 'Newfoundland and Labrador']
[146400.0, 'Prince Edward Island']
[943000.0, 'Nova Scotia']
[753900.0, 'New Brunswick']
[8263600.0, 'Quebec']
[13792100.0, 'Ontario']
[1293400.0, 'Manitoba']
[1133600.0, 'Saskatchewan']
[4196500.0, 'Alberta']
[4683100.0, 'British Columbia']
[37400.0, 'Yukon']
[44100.0, 'Northwest Territories']
[36900.0, 'Nunavut']


In [39]:
# calling .sort on this list will sort according to that first element (population)
provinces.sort(reverse=True)
for row in provinces:
    #print(row)
    print(row[1], "", row[0])

Ontario  13792100.0
Quebec  8263600.0
British Columbia  4683100.0
Alberta  4196500.0
Manitoba  1293400.0
Saskatchewan  1133600.0
Nova Scotia  943000.0
New Brunswick  753900.0
Newfoundland and Labrador  527800.0
Prince Edward Island  146400.0
Northwest Territories  44100.0
Yukon  37400.0
Nunavut  36900.0
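
As an alternative sketch, the downloaded text could be parsed directly with `io.StringIO` instead of being written to disk first, and the sort could use an explicit `key`. The string below is a shortened copy of a few rows from the output above (an assumption made so the example runs without a network connection); the tuple layout `(name, population)` is an illustrative choice.

```python
import csv
import io

# a few rows copied from the downloaded text shown above
data = (
    ',2011,2012,2013,2014,2015,\r\n'
    'Canada,"34,342.80","34,751.50","35,155.50","35,543.70","35,851.80",\r\n'
    'Quebec,"8,007.70","8,084.80","8,154.80","8,214.90","8,263.60",\r\n'
    'Yukon,35.4,36.2,36.4,37,37.4,\r\n'
    'Ontario,"13,263.50","13,409.60","13,551.00","13,677.70","13,792.10",\r\n'
)

provinces = []
# StringIO lets csv.reader treat a string like an open file
for row in csv.reader(io.StringIO(data, newline="")):
    if row and row[0] not in ("", "Canada"):
        # populations are in thousands, with commas as thousands separators
        population = float(row[5].replace(",", "")) * 1000
        provinces.append((row[0], population))

# sort by population (second item of each tuple), largest first
provinces.sort(key=lambda pair: pair[1], reverse=True)
names = [name for name, _ in provinces]
print(names)
```

Sorting with `key` keeps the province name first in each tuple, rather than relying on the population being the first element.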
