# Welcome to Session 4 - Working with CSV Files

Numerical data are often stored in spreadsheets, and CSV (comma-separated values) is one of the most common file formats for them.

We'll learn how to open, read, and write CSV files.

## Open, Read, and Write CSV Files

### Importing Python Modules

When Python starts, a number of core functions are immediately available to use.

We've already used some, including print() and type().

For the sake of efficiency, Python doesn't load every function at startup. Other functions are grouped into modules.

We must import a module in order to use its functions.

In [None]:
# Import the csv module

import csv

### Question 1

How do we know what functions are available in a module?

In [None]:
dir(csv)
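
The list returned by dir() includes internal names that start with an underscore. As a minimal sketch (the variable name public_names is our own choice), we can filter those out and then look at the documentation of one function:

```python
import csv

# dir() returns every name defined in the module, including internal ones
# that start with an underscore. Keep only the public names.
public_names = [name for name in dir(csv) if not name.startswith('_')]
print(public_names)

# Every function carries a short description of itself in its docstring
print(csv.reader.__doc__)
```

In a notebook you can also run help(csv.reader) to see the same documentation in a more readable layout.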

### Reading CSV Files

We use the open() function within a 'with' statement, which closes the file automatically when the indented block ends.

Otherwise we'd have to explicitly open and close the file we are reading or writing.

In [None]:
# A file called fishcounts.csv is available in the data folder.

with open('data/fishcounts.csv', 'r', newline='', encoding='utf-8-sig') as fishfile: # Open the file in read ('r') mode
    reader = csv.reader(fishfile) # reader gives us the rows one at a time as we iterate over it
    
    # The reader is a csv reader object
    print(reader)
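
To see the difference between closing a file ourselves and letting 'with' do it, here is a small sketch. It uses a throwaway file in a temporary folder (the file name and contents are made up for illustration) so it runs without the data folder:

```python
import os
import tempfile

# Create a small throwaway file so the example runs anywhere
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'example.csv')

# Manual approach: we must remember to close the file ourselves
manual_file = open(path, 'w', newline='', encoding='utf-8')
manual_file.write('species,count\n')
manual_file.close()

# 'with' approach: the file is closed automatically when the block ends
with open(path, 'r', newline='', encoding='utf-8') as example_file:
    first_line = example_file.readline()
    print(example_file.closed)  # False: still open inside the block

print(example_file.closed)  # True: closed as soon as the block ended
print(first_line)
```

Because the file is closed at the end of the block, any reading (including iterating over a csv reader) has to happen inside the indented block.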


In [None]:
# We can iterate over a reader object
with open('data/fishcounts.csv', 'r', newline='', encoding='utf-8-sig') as fishfile:
    reader = csv.reader(fishfile)
    for row in reader: # iterate over the reader object to get each row of the data
        print(row)

### Question 1

What do you observe about the data that are printed?
1. What structure are the data contained within?
2. What has happened to the fish counts which were integers in the spreadsheet?

In [None]:
# We can also easily turn it into a list object with the list() function.

with open('data/fishcounts.csv', 'r', newline='', encoding='utf-8-sig') as fishfile:
    reader = csv.reader(fishfile)
    fishlist = list(reader) # Convert the reader object to a list object
    print(fishlist) # Now we've got a list of lists (representing a list of spreadsheet rows)
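
Once the rows are in a list, ordinary list indexing and slicing apply. The sketch below uses made-up CSV text held in memory (via io.StringIO) in place of a file on disk, and separates the heading row from the data rows:

```python
import csv
import io

# Made-up CSV text standing in for a file on disk
sample_text = "species,count\nbass,12\ntrout,7\n"

reader = csv.reader(io.StringIO(sample_text))
rows = list(reader)

header = rows[0]      # first row: the column headings
data_rows = rows[1:]  # everything after the heading row

print(header)
print(data_rows)
print(type(data_rows[0][1]))  # every cell comes back as a string
```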

### Activity 1

Open the following CSV file, turn the reader contents into a list object, and print it.
Store the list in the variable fishlist, because the cells below use that name.
data/2020-sc-fishery.csv

In [None]:
#Tackle Activity 1 here


We have a list of 137 lists, the first list being the column headings from the spreadsheet.

Look over the first list within the list of data you printed.
* The second column (index #1) is the weight of fish landed in pounds
* The fourth column (index #3) is the dollar value
* Some rows have no data and are described in the fifth column as 'Confidential'
* Some rows that are not marked 'Confidential' are missing data.

### Question 1

What is the value and type of an empty cell?

In [None]:
# The first empty cell is the second item in row (list) 2. It is listed as ''
# We know how to access the second list already
print(fishlist[1])

# To access the second item in the second list, we have to add the location of the item
print(fishlist[1][1])

# To view the type of the empty item, we use the type() function
print(type(fishlist[1][1]))

In [None]:
# To test the value of the empty item (in this case we know it is an empty string):
print(fishlist[1][1] == '') # is fishlist[1][1] equal to '' ?

### Activity 2

1. Iterate over the master list to access each row individually
2. Test if the fifth item in the row (index #?) is equal to 'Public'
3. For the rows that are public, test whether both item 2 and item 4 in the list (index #s?) are not equal to '' (empty string)
4. For those rows, print the weight in pounds, the dollar value, and the price per pound (calculate it)

In [None]:
# Tackle Activity 2 here



### Question 2

What happened?

### Casting a Variable as a Different Type of Variable

Remember the type() function?
Test the type of the number in the fourth row (index ?), second column (index ?).

In [None]:
# Test it here


So the 'number' is actually a string. We want to do math with it, so we need it to be either an integer (whole number) or a float (decimal).
It looks like only whole numbers are represented, so let's change the number string to an integer.

#### Cast a Variable as an Integer with int()

In [None]:
asinteger = int(fishlist[3][1])
print(asinteger)

### Question 3

What happened?

The comma in the text string can't be interpreted as part of an integer.
We need to remove it before we can cast the string as an integer.

#### Replacing Parts of a String With the .replace() Method

In [None]:
cleanstring = fishlist[3][1].replace(',', '')
print(cleanstring)

In [None]:
asinteger = int(cleanstring)
print(asinteger, type(asinteger))

So to cast our string as an integer, we first cleaned it and then changed it to an integer.
This can be done in one line of code by nesting, but it's harder to read.

In [None]:
myinteger = int(fishlist[3][1].replace(',', ''))
print(myinteger, type(myinteger))
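
If the nested version feels hard to read, one option is to wrap the two steps in a small function. This is only a sketch; the name to_int is our own and is not part of Python or the csv module:

```python
def to_int(text):
    """Convert a string such as '1,234' to the integer 1234."""
    return int(text.replace(',', ''))

# Without cleaning, int() raises a ValueError because of the comma
try:
    int('1,234')
except ValueError as error:
    print('Could not convert:', error)

# With cleaning, the conversion works
print(to_int('1,234'))
print(to_int('721,788'))
```

Note that to_int('') would still raise a ValueError, so empty cells need to be checked for before calling it.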

### Back to Our Loop!

In [None]:
for row in fishlist:
    if row[4]=='Public':
        if row[1]!='' and row[3]!='':
            weight = int(row[1].replace(',','')) # fix our weight data and assign it to variable 'weight'
            dollars = int(row[3].replace(',','')) # fix our dollar data and assign it to variable 'dollars'
            print(weight, dollars, dollars/weight) # use the weight and dollar integers when printing

Great! But the calculation for price per pound resulted in long floats. It looks horrible.
But there's a built-in function for rounding floats: round(number, ndigits), where ndigits is the number of decimal places to keep.

In [None]:
# e.g.
round(0.585336172801947,2)

In [None]:
# So let's round our price per pound

for row in fishlist:
    if row[4]=='Public':
        if row[1]!='' and row[3]!='':
            weight = int(row[1].replace(',','')) # fix our weight data and assign it to variable 'weight'
            dollars = int(row[3].replace(',','')) # fix our dollar data and assign it to variable 'dollars'
            print(weight, dollars, round(dollars/weight, 2)) # round the price per pound to two decimal places
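
One thing to be aware of: round() gives back a number, so trailing zeros are not shown. If you want prices to always display two decimal places, string formatting is an alternative. A short sketch comparing the two:

```python
price_per_pound = 0.585336172801947

rounded = round(price_per_pound, 2)  # still a float
print(rounded, type(rounded))

# round() does not keep trailing zeros, because the result is just a number
print(round(5.2, 2))

# An f-string format keeps two decimal places, but gives back a string
formatted = f"{5.2:.2f}"
print(formatted, type(formatted))
```

Use round() when you want to keep working with the number, and formatting when you only care about how it is displayed.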

### Preparing to Write to a CSV File

Now that we've extracted the data we can use, cleaned it up, and performed a calculation, let's write it to a CSV file.

The problem is, we haven't actually permanently changed anything. We've only been printing it on the screen.

We need to prepare a list of lists (like we first extracted) to use to write to a CSV file.

In [None]:
newlist = [] # our empty new master list to populate

for row in fishlist:
    if row[4]=='Public':
        if row[1]!='' and row[3]!='':
            weight = int(row[1].replace(',','')) # fix our weight data and assign it to variable 'weight'
            dollars = int(row[3].replace(',','')) # fix our dollar data and assign it to variable 'dollars'
            # print(weight, dollars, round(dollars/weight, 2)) # round the price per pound to two decimal places
            newlist.append([weight, dollars, round(dollars/weight, 2)]) # add a list of the three data items we want.
print(newlist)
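
The same filter-and-clean logic can also be packaged as a function so it is easy to try on a few rows. The sketch below is one possible arrangement: the function name clean_rows and the sample rows are made up for illustration, and only the column positions (index 1 for weight, index 3 for dollars, index 4 for confidentiality) follow the fishery file described above.

```python
def clean_rows(rows):
    """Return [weight, dollars, price per pound] for each usable row."""
    cleaned = []
    for row in rows:
        # Skip rows that are not public or that are missing a value
        if row[4] != 'Public' or row[1] == '' or row[3] == '':
            continue
        weight = int(row[1].replace(',', ''))
        dollars = int(row[3].replace(',', ''))
        cleaned.append([weight, dollars, round(dollars / weight, 2)])
    return cleaned

# Made-up rows: a heading row, one usable row, and two rows to skip
sample_rows = [
    ['name', 'pounds', 'other', 'dollars', 'confidentiality'],
    ['A', '1,000', '', '2,500', 'Public'],
    ['B', '', '', '', 'Confidential'],
    ['C', '', '', '300', 'Public'],
]
print(clean_rows(sample_rows))
```

The heading row is skipped automatically because its fifth item is not 'Public'.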

### Activity 3

Almost there, but it sure would be nice to include the fish name too. Adjust the code to include the fish name.

In [43]:
newlist = [] # our empty new master list to populate

for row in fishlist:
    if row[4]=='Public':
        if row[1]!='' and row[3]!='':
            # Assign the fish name to a variable called fishname

            weight = int(row[1].replace(',',''))
            dollars = int(row[3].replace(',',''))
            
            # Modify the append statement to include the fish name in the column where you want it to appear
            newlist.append([weight, dollars, round(dollars/weight, 2)]) # after your change, this should add a list of the four data items we want.

print(newlist)

[['SPOT', 5164, 5106, 0.99], ['COBIA', 1588, 5910, 3.72], ['SCAMP', 34460, 200943, 5.83], ['WAHOO', 9028, 31937, 3.54], ['ESCOLAR', 1879, 2940, 1.56], ['HOGFISH', 7718, 41008, 5.31], ['CONCHS **', 208, 245, 1.18], ['MUMMICHOG', 35111, 216015, 6.15], ['SQUIDS **', 5029, 4998, 0.99], ['SWORDFISH', 721788, 2143642, 2.97], ['BARRELFISH', 4399, 22886, 5.2], ['CRAB, BLUE', 3421731, 5641831, 1.65], ['HIND, ROCK', 3487, 17694, 5.07], ['PORGY, RED', 37063, 102862, 2.78], ['TRIPLETAIL', 12, 42, 3.5], ['DOLPHINFISH', 156543, 446617, 2.85], ['GROUPER, GAG', 129487, 768305, 5.93], ['GROUPER, RED', 1341, 6910, 5.15], ['GRUNT, WHITE', 16285, 26926, 1.65], ['JACK, ALMACO', 39801, 49845, 1.25], ['SNAPPER, RED', 6140, 37003, 6.03], ['TUNA, BIGEYE', 27165, 87933, 3.24], ['KINGFISHES **', 43458, 51453, 1.18], ['LIONFISH, RED', 1625, 8028, 4.94], ['SHAD, GIZZARD', 176774, 55270, 0.31], ['SNAPPER, GRAY', 2700, 9019, 3.34], ['SNAPPER, SILK', 947, 3347, 3.53], ['TUNA, BLUEFIN', 11973, 50130, 4.19], ['DOLPHINF

### Writing a CSV File

The method to open the file for writing is pretty similar to opening a file for reading.

In [44]:
with open('data/new-fish-file.csv', 'w', newline='') as newfile: # Specify the existing/new file to open/create in write ('w') mode
    writer = csv.writer(newfile) # Create a writer object
    # To write a single row we use the writerow method on the writer object.
    # Let's do that to add column headings
    writer.writerow(["Fish Name", "Catch in Pounds", "Value in USD", "Price per Pound"])
    
    # To write our list of data lists, we use the writerows method on the writer object
    writer.writerows(newlist)
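
A quick way to check the result is to read the file straight back in. The sketch below writes two of the rows from the output above to a throwaway file in a temporary folder (the path and the file name check.csv are made up for illustration) and then reads them again:

```python
import csv
import os
import tempfile

rows_to_write = [['SPOT', 5164, 5106, 0.99], ['COBIA', 1588, 5910, 3.72]]

# A throwaway location so the example does not touch the data folder
path = os.path.join(tempfile.mkdtemp(), 'check.csv')

# Write the heading row and the data rows
with open(path, 'w', newline='', encoding='utf-8') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(['Fish Name', 'Catch in Pounds', 'Value in USD', 'Price per Pound'])
    writer.writerows(rows_to_write)

# Read the file back to confirm what was written
with open(path, 'r', newline='', encoding='utf-8') as infile:
    rows_read = list(csv.reader(infile))

for row in rows_read:
    print(row)
```

Notice that the numbers come back as strings again, and keep in mind that opening an existing file in 'w' mode replaces its previous contents.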

### Quiz

[hyperlink to a quiz here. Perhaps we can use Google Forms quizzes with multi-choice questions to help solidify learning]

### Challenge [this is homework or to do if the class finishes early]

Challenge description [challenge is to consolidate and practice content learned during this session]

In [None]:
#Tackle the challenge here [code space]




### Resources [web resources for reference or reading to help expand knowledge on this section's content]

[Python Documentation - Reading and Writing CSV Files](https://docs.python.org/3/library/csv.html?highlight=csv#module-csv)