In [2]:
# Use IPython.display.HTML to define css classes for formatting text in Markdown cells, based on
# www.dataquest.io/blog/advanced-jupyter-notebooks-tutorial
from IPython.display import HTML
HTML(
    "<style>\
    span.str {color:#BA2121; font-style:italic; font-weight:bold;}\
    span.num {color:#080; font-style:italic; font-weight:bold;}\
    span.bltn {color:#080; font-weight:bold;}\
    span.op {color:#AA22FF;}\
    span.func {color:#00F; font-weight:bold;}\
    h3.yt {color:#009900; font-style:italic;}\
    span.yt {color:#009900; font-style:italic;}</style>"
)

<h2><b><u>Case Study - Port Authority of New York and New Jersey - Part 2</u></b></h2>

In this case study you will practice what you've learned from DataCamp's Data Science Toolbox Part 1 and the DASC 2113 lecture. This will include writing and working with different types of <b><i>functions</i></b>. 

<h3><b><i>Background</i></b></h3>
You will continue to work with and examine the PANYNJ <b><i>Air Passenger Traffic per Month</i></b> dataset introduced in the previous Port Authority of New York and New Jersey case study. 

<h3><b><i>Goal</i></b></h3>
The goal of this exercise is to examine the Port Authority of NY NJ's <b><i>Air Passenger Traffic per Month</i></b> dataset. You will examine different aspects of the data and use it to answer questions in the <span class="yt">"Your Turn"</span> section of this Notebook. 

You will practice topics covered in DataCamp's <a href = "https://learn.datacamp.com/courses/intermediate-python">Intermediate Python</a> and <a href = "https://learn.datacamp.com/courses/python-data-science-toolbox-part-1">Python Data Science Toolbox (Part 1)</a>, such as creating and using <b><i>user-defined functions</i></b>, using <b><i>error handling</i></b> in a user-defined function, creating and using <b><i>lambda functions</i></b>, and becoming more familiar with the <b><i>scope</i></b> of variables.

<h3><b><i>Data</i></b></h3>
The <a href = "https://data.ny.gov/Transportation/Air-Passenger-Traffic-per-Month-Port-Authority-of-/8pkr-4b7t">PANYNJ Air Passenger Traffic per Month dataset</a> collects data on passengers using the five airport facilities managed by the agency. The dataset covers the years 1977 to 2015, with monthly passenger counts for each airport [5]. The monthly counts aggregate passenger arrivals and departures for both domestic and international flights [5]. In this exercise you will be working with a selected subset of the full data. 


<h3> Creating your own functions </h3>
In the code blocks below you will work on writing <b><i>user-defined functions</i></b>. You will be presented with various scenarios and will create functions that can be used as solutions. 

<h4><u> Scenario 1:</u></h4>

You have been asked to check that the total number of passengers reported monthly is correct. To do this, check that <span class="num">'Total Passengers'</span> <span class="op">==</span> <span class="num">'Domestic Passengers'</span> <span class="op">+</span> <span class="num">'International Passengers'</span> for each entry in the data. You will need to do this every time new monthly data is reported, and you can assume that the data will maintain the same order and format. <b><i>Due to specifications by your boss you cannot use pandas.</i></b>

Create a function <span class="func">checkTotalPass()</span> in the cell below that checks the correctness of the <i>Total Passengers</i> column for each row in the data. If a row is incorrect <span class="bltn">print</span> that row. 

In [3]:
# NOTE: Code in this cell is complete. Make sure you understand the code before running the cell.
# Code to load csv using csv module. For more information on the csv module: https://docs.python.org/3/library/csv.html
import csv

airport_data= []

# Read in csv file to a list.
with open('Data/ACY_EWR_JFK_2015_2013.csv', newline='') as csvfile:
    csv_reader = csv.reader(csvfile, delimiter=',', quotechar='|') # quotechar for string markers
    for row in csv_reader: # the reader is an iterator that yields one row at a time
        airport_data.append(list(row)) # store each row as a list
        
# Data you will work with        
air_subset = airport_data[0:11][:] # [0:11] selects the first 11 sublists (the header and 10 data rows); [:] copies that slice
print(air_subset)

[['Airport Code', 'Year', 'Month', 'Domestic Passengers', 'International Passengers', 'Total Passengers'], ['ACY', '2015', 'Feb', '96431', '65', '96496'], ['ACY', '2015', 'Aug', '108700', '1932', '110632'], ['ACY', '2015', 'Sep', '82268', '315', '82583'], ['ACY', '2015', 'Jan', '98177', '90', '98267'], ['ACY', '2015', 'Jul', '109247', '1894', '111141'], ['ACY', '2015', 'Dec', '101600', '28', '101628'], ['ACY', '2015', 'Jun', '96259', '1102', '97361'], ['ACY', '2015', 'May', '103668', '425', '104093'], ['ACY', '2015', 'Oct', '80901', '443', '81344'], ['ACY', '2015', 'Nov', '94358', '0', '94358']]


<h4><u> Creating a User-Defined Function </u></h4>
In the code above the csv file was imported into a <span class="bltn">list</span> called <b><i>airport_data</i></b>, which was then subset into a smaller version, <b><i>air_subset</i></b>, that contains the header row and the first 10 data rows of the file. 
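
As a small illustration of the slicing used above, the sketch below works on a made-up three-row nested list (the name `sample` and its values are hypothetical, not the course data). It suggests that `sample[0:2][:]` returns the same rows as `sample[0:2]`: the trailing `[:]` only makes a copy of the outer list and does not select columns, because a nested list is not a true 2-d array.

```python
# Hypothetical sample data: a header row followed by two data rows.
sample = [
    ['Airport Code', 'Total Passengers'],
    ['ACY', '96496'],
    ['ACY', '110632'],
]

# Slice the first two sublists (the header and the first data row).
first_two = sample[0:2]

# Adding [:] makes a shallow copy of that slice; the contents are identical.
first_two_copy = sample[0:2][:]

print(first_two == first_two_copy)  # compares the two slices row by row
print(len(first_two))               # number of sublists in the slice

# To pick out one column, index each row explicitly.
codes = [row[0] for row in sample[1:]]
print(codes)
```

Selecting a column therefore needs a loop or a list comprehension over the rows, which is what the functions in this Notebook do.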

Below is the code for the function <span class="func">checkTotalPass</span>. It is declared with the <span class="bltn">def</span> keyword and has one <span class="bltn">parameter</span> <b><i>air_data</i></b> which is a nested list. You will fill in various lines of the <span class="func">checkTotalPass</span> function based on the comments as hints. 

In [4]:
# A docstring (documentation string) goes on the first lines inside the function body.
# It gives a description of the function, the arguments expected, and the return value.

def checkTotalPass(air_data): 
    """ Function to check if the total number of passengers is recorded correctly for each month. If it is incorrect, print
    the incorrect data. 

    Args: air_data (nested list): Nested list. Each sublist contains a row of data. 

    Returns: None
    """ 
    
    # Create a new list that does not contain the sublist with the data header. Store it in a variable called data_only
    # Hint: Access the 2-d air_data elements beginning at the 1st row and include all columns [1:][:]. 
    data_only =air_data[1:][:] # removes header

    # Empty List to store incorrect results 
    incor_list = []
    
    # Loop through data only. Hint: Use a for loop 
    for val in data_only:

        # Store the 'Total Passenger' value in a variable total_p. Hint: Extract the last element of val. 
        # Make sure to cast to an int using int()
        total_p = int(val[-1]) # string converted to int
        
        # Store the 'Domestic Passenger' value in a variable d_pass. Hint: Extract the element of val at position 3.
        # Make sure to cast to an int using int()
        d_pass = int(val[3]) # convert to int

        # Store the 'International Passenger' value in a variable i_pass. Hint: Extract the second to last element from val. 
        # Make sure to cast to an int using int()
        i_pass = int(val[-2]) # convert to int

        # Calculate a variable sum_pass by adding d_pass + i_pass
        sum_pass = d_pass + i_pass
    
        # Check if sum_pass == total_p. If not append val to incor_list. 
        if (sum_pass != total_p):
            incor_list.append(val)
            
    # Check if any incorrect results by checking if incor_list contains > 0 elements. 
    if len(incor_list) > 0:

        # Print the header. Hint: print the first row in air_data, i.e. element 0. Include all columns
        print(air_data[0][:])

        # Print each incorrect value in incor_list
        for incor_val in incor_list:
            print(incor_val)
            

<h4><u> Calling a User-Defined Function </u></h4>

Now that the code for <span class="func">checkTotalPass</span> has been completed, the function can be called by passing a <span class="bltn">nested list</span> to it. In the Code cell below, call the <span class="func">checkTotalPass</span> function with <b><i>air_subset</i></b> as the function argument. 

In [5]:
# Call checkTotalPass with air_subset
checkTotalPass(air_subset) # checks if every row checks out

The <span class="func">checkTotalPass</span> function did not print any output so we know that the <span class="num">'Total Passengers'</span> <span class="op">==</span> <span class="num">'Domestic Passengers'</span> <span class="op">+</span> <span class="num">'International Passengers'</span> for each entry in the data.

Now let's append a new list to the <b><i>air_subset</i></b> nested list that represents a new row in the data, then run the <span class="func">checkTotalPass</span> function again. 

In [6]:
# NOTE: Code in this cell is complete. Make sure you understand the code before running the cell.
# Add an incorrect entry to air_subset
air_subset.append(['ACY', '2015', 'Apr', '105539', '161', '105000'])

# Run again
checkTotalPass(air_subset)

['Airport Code', 'Year', 'Month', 'Domestic Passengers', 'International Passengers', 'Total Passengers']
['ACY', '2015', 'Apr', '105539', '161', '105000']


We can see from the output above that the newly added row (<span class="bltn">list</span>) has an error in the <i>Total Passengers</i> value. Since we made the code for checking this into the user-defined function <span class="func">checkTotalPass</span> it was easy to check the newly added data. 
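
A function that prints is convenient in a Notebook, but returning the incorrect rows makes the result reusable elsewhere in a program. The sketch below is one possible variant, not part of the original exercise: the name `find_bad_totals` is hypothetical, and it assumes the same column order as <b><i>air_subset</i></b> (domestic at index 3, international at index 4, total at index 5). The sample rows are copied from the output shown above.

```python
def find_bad_totals(air_data):
    """Return the data rows whose total does not equal domestic + international.

    Hypothetical helper for illustration; assumes the header is the first
    sublist and the column order matches air_subset.
    """
    bad_rows = []
    for row in air_data[1:]:  # skip the header row
        domestic, international, total = int(row[3]), int(row[4]), int(row[5])
        if domestic + international != total:
            bad_rows.append(row)
    return bad_rows


# One correct row and one incorrect row, taken from the outputs above.
sample = [
    ['Airport Code', 'Year', 'Month', 'Domestic Passengers',
     'International Passengers', 'Total Passengers'],
    ['ACY', '2015', 'Feb', '96431', '65', '96496'],
    ['ACY', '2015', 'Apr', '105539', '161', '105000'],
]
bad = find_bad_totals(sample)
print(bad)
```

The caller can then decide whether to print the rows, count them, or write them to a file.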

<h3>Scenario 2: Import CSV Function</h3>
In a couple of the case study Notebooks you have seen code to <span class="bltn">import</span> a csv file without using external libraries such as NumPy or pandas. The structure of the code is the same each time; only the variable names differ. This is a good opportunity to create a user-defined function to import the csv file instead of having to duplicate the code over and over. 


<h4><u> Creating a User-Defined Function </u></h4>
Below is the code for the function <span class="func">readCSV2List</span>. It is declared with the <span class="bltn">def</span> keyword and has one <span class="bltn">parameter</span> <b><i>file_info</i></b> which is a <span class="str">string</span>. Read through the lines of the <span class="func">readCSV2List</span> function. This function contains much of the same code you've seen before, but adds a <span class="bltn">try and except</span> block to handle <b>errors</b> that might occur while reading the file.


In [7]:
# NOTE: Code in this cell is complete. Make sure you understand the code before running the cell.
# Error handling based on: https://docs.python.org/3/library/csv.html

""" Read in a csv file using Python's csv module. 

Args: file_info (string): Full file path and file name of csv to read in. 

Returns: Nested list of the data in the csv file. Each row is stored as a sublist. 
"""
def readCSV2List(file_info): 
    # Import Python's built in csv module 
    import csv
    
    # Import the sys module for error handling (not used in this version)
    import sys

    # Create a list to store the sublists
    data= []

    # Handle errors
    try:
        # Read in csv file to a list.
        with open(file_info, newline='') as csvfile:
            csv_reader = csv.reader(csvfile, delimiter=',', quotechar='|')
            for row in csv_reader:
                data.append(list(row))
        return data
    except csv.Error as e: 
        print(e)

<h4><u> Calling a User-Defined Function </u></h4>
Let's call the <span class="func">readCSV2List</span> function to open the same csv file <span class="str">"Data/ACY_EWR_JFK_2015_2013.csv"</span> that was opened previously, and store the result in a variable <b><i>test_read</i></b>. Then let's make sure that the function is working correctly by checking that the contents of <b><i>test_read</i></b> are equal to the contents of <b><i>airport_data</i></b>.

In [8]:
# Check that the readCSV2List function reads the data the same as previous code. 

#Read in the 'Data/ACY_EWR_JFK_2015_2013.csv' 
# using readCSV2List as test_read. 
test_read = readCSV2List('./Data/ACY_EWR_JFK_2015_2013.csv') # './' makes the path relative to the current working directory
# good practice to store data in a folder that is separate from code
# starting with '../' refers to the parent directory, one level up the file structure

# Test if test_read's contents are equal to airport_data's contents. 
# If the contents are equal the output should show True
test_read == airport_data

True

Now that we've seen the <span class="func">readCSV2List</span> function working, let's see what happens when it tries to open a csv file that doesn't exist. The function will <span class="bltn">try</span> to open and read in the file. Note that the <span class="bltn">except</span> block only catches <b><i>csv.Error</i></b>, so the <b><i>FileNotFoundError</i></b> raised when the file cannot be opened is not handled inside the function and is reported by Python instead. 

In [9]:
# Call readCSV2List function and store the returned list in a variable error_read.
# Read a file that does not exist called 'test_file.csv'.
error_read = readCSV2List('Data/test_file.csv')


FileNotFoundError: [Errno 2] No such file or directory: 'Data/test_file.csv'

The call to <span class="func">readCSV2List</span> fails with the <b><i>FileNotFoundError</i></b> raised in the <span class="bltn">try</span> block, so no list is returned. Only the final line of the error is shown above; the full stack trace is not shown. For the purposes of this course you could use the <span class="func">readCSV2List</span> function to <span class="bltn">import</span> a provided csv file used with this class's Jupyter Notebooks as a nested list. However, you will most often use a pre-written function from a library such as <b><i>pandas</i></b> to read in csv files. 
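
If the function itself should report a missing file, one option is to catch <b><i>OSError</i></b> in addition to <b><i>csv.Error</i></b>, since <b><i>FileNotFoundError</i></b> is a subclass of <b><i>OSError</i></b> and is not covered by <b><i>csv.Error</i></b>. The sketch below is a hypothetical variant (the name `read_csv_safe` and the choice to return `None` on failure are assumptions for illustration), not the version used in this Notebook.

```python
import csv


def read_csv_safe(file_info):
    """Read a csv file into a nested list, or return None if reading fails.

    Hypothetical variant of readCSV2List for illustration only.
    """
    data = []
    try:
        with open(file_info, newline='') as csvfile:
            for row in csv.reader(csvfile, delimiter=',', quotechar='|'):
                data.append(row)
        return data
    except (OSError, csv.Error) as e:
        # OSError covers FileNotFoundError and other file access problems.
        print(e)
        return None


# The path below is assumed not to exist.
result = read_csv_safe('Data/test_file.csv')
print(result)
```

Returning `None` lets the caller test whether the read succeeded before using the data.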

<h3>Lambda Functions</h3>
In the code blocks below you will practice with lambda functions. 

<h4><u>Creating and Using a Function</u></h4>
The code block below contains a function <span class="func">addPassengers</span> that takes two <span class="str">strings</span> representing passenger values as parameters. The <span class="func">addPassengers</span> function converts these values to <span class="num">integers</span> and <span class="op">adds</span> them together, returning the sum. 

In [None]:
# Let's write a function that adds together two passenger values from the air_subset variable and returns the result. 
def addPassengers(p1, p2): 
    """ Add two passenger counts given as strings and return their sum as an int. """
    # Convert p1 and p2 to ints, add them, and store the result in the variable sum_pass
    sum_pass = int(p1) + int(p2)
    
    # return sum_pass
    return sum_pass

Now that we've written <span class="func">addPassengers</span> let's extract data from <b><i>air_subset</i></b> to use with the function. The variable <b><i>d_1</i></b> stores the domestic passenger count from the first data row and <b><i>i_1</i></b> the international passenger count from the same row. <b><i>d_1 and i_1</i></b> will be passed as arguments to <span class="func">addPassengers</span> and the return value stored in <b><i>t_1</i></b>. <span class="bltn">Print</span> the value of <b><i>t_1</i></b>. 

In [None]:
# Extract the domestic passenger value for the first row of data in air_subset and store this as d_1. 
#Remember that air_subset[0][:] contains the data headers and that the domestic passenger column is located at index 3.
d_1 = air_subset[1][3]

# Extract the international passenger value for the first row of data in air_subset and store this as i_1. 
# Remember that the international passenger column is located at index 4. 
i_1 = air_subset[1][4]

# Call the addPassengers function with d_1 and i_1 as arguments store the result as t_1
t_1 = addPassengers(d_1, i_1)

# print t_1
print(t_1)

96496


<h4><u>Creating and Using a Lambda Function</u></h4>
Let's create a <span class="bltn">lambda function</span> that produces the same result as the <span class="func">addPassengers</span> function and store it in the variable <b><i>add_pass</i></b>. Call <b><i>add_pass</i></b> with <b><i>d_1</i></b> and <b><i>i_1</i></b>, store the result in <b><i>t_2</i></b>, and <span class="bltn">print</span> <b><i>t_2</i></b>. 

In [None]:
# Write addPassengers as a lambda function called add_pass. Hint: (lambda parameter1, parameter2: int(parameter1) + int(parameter2))
add_pass = (lambda p1, p2: int(p1) + int(p2))

# Call add pass and store result in variable t_2
t_2 = add_pass(d_1, i_1)

# print t_2
print(t_2)

96496
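
Lambda functions are most often passed directly to another function rather than stored in a variable. As a rough sketch, the example below sorts three rows copied from the <b><i>air_subset</i></b> output by their total passenger count, using a lambda as the `key` argument of Python's built-in `sorted`. The variable names `rows`, `by_total`, and `months` are made up for this illustration.

```python
# Three data rows copied from the air_subset output (header omitted).
rows = [
    ['ACY', '2015', 'Feb', '96431', '65', '96496'],
    ['ACY', '2015', 'Aug', '108700', '1932', '110632'],
    ['ACY', '2015', 'Sep', '82268', '315', '82583'],
]

# Sort by total passengers (index 5), converting each string to an int first.
by_total = sorted(rows, key=lambda row: int(row[5]))

# Collect the month of each row in sorted order.
months = [row[2] for row in by_total]
print(months)
```

Without the `int()` conversion the totals would be compared as strings, which orders them alphabetically rather than numerically.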


<h3> Variable Scope </h3>
In the code blocks below you will practice with the <b><i>scope</i></b> of variables. This section will not use data from the Air Passenger Traffic per Month data, but will instead use user-defined variables.

<h4><u>Global and Local Scope</u></h4>
As you work through completing the code in the cell below, think about the following questions: 

Why does the variable <b><i>x</i></b> <span class="bltn">print</span> <span class="num">1, 2, 3, 3</span> in the <span class="func">print3Ints</span> function but then <span class="bltn">print</span> <span class="num">0</span> in the last <span class="bltn">print</span> statement? What does this show about the <b><i>scope</i></b> of the variable <b><i>x</i></b>? 

In [None]:
# NOTE: Code in this cell is complete. Make sure you understand the code before running the cell.

# Create a variable x and store in it an integer value 0
x = 0

# Create a function to loop through a list of integers
def print3Ints(): 
    # Create a list containing the values 1 through 3. Store it in a variable called three_ints.
    three_ints = [1, 2, 3]
    

    # Loop through the three_ints list
    for an_int in three_ints:
        # Create a variable x. Set it equal to  each value looped through (an_int)
        x = an_int
        print(x) # prints inside loop and inside function
    print(x)     # prints outside loop and inside function

# Call print3Ints
print3Ints()

# Print x. Does it print 0 or 3? 
print("\n", x)  # prints outside function - var x outside function is different var than the var x inside the function

1
2
3
3

 0


Why does the variable <b><i>x</i></b> <span class="bltn">print</span> <span class="num">1, 2, 3, 3</span> in the <span class="func">print3Ints</span> function but then <span class="bltn">print</span> <span class="num">0</span> in the last <span class="bltn">print</span> statement? What does this show about the <b><i>scope</i></b> of the variable <b><i>x</i></b>? 

Type your answers here.


Possible Answer: The first assignment, x = 0, creates a variable in the global scope. It can be read from anywhere in the program after it is created. The assignment x = an_int inside the print3Ints function creates a separate variable with local scope, which is accessible only within the function and does not change the global x. The last print statement prints the global x, which retains the value 0.
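
A related detail is that an assignment anywhere in a function makes that name local for the whole function body. The sketch below, with hypothetical function names `read_only` and `assign_before_read`, illustrates this: reading an outer variable works, but reading it in a function that also assigns to it raises an `UnboundLocalError`.

```python
x = 0  # variable defined outside the functions


def read_only():
    # Only reads x, so Python looks it up in the enclosing scope.
    return x


def assign_before_read():
    # The assignment makes x local for the whole function body,
    # so reading it on the right-hand side fails.
    x = x + 1
    return x


read_value = read_only()

try:
    assign_before_read()
    error_name = None
except UnboundLocalError as e:
    error_name = type(e).__name__

print(read_value, error_name, x)
```

Passing the value in as a parameter and returning the result is usually the clearest way to avoid this situation.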

<h4><u>Global and Local Scope - Example 2</u></h4>
As you work through completing the code in the cell below, think about the following questions: 

Why is the <span class="func">innerInts</span> function able to <span class="bltn">print</span> the values of <b><i>x</i></b> in the <span class="func">print3Ints</span> function? 


In [None]:
# Create a variable x and store in it an integer value 0
x = 0

print("Value of x: ", x)

# Modify the print3Ints function to include an inner function
def print3Ints(): 
    def innerInts():
        print("Inner function: ", x)
        
    three_ints = [1, 2, 3]
    

    # Loop through the three_ints list
    for an_int in three_ints:
        # Create a variable x. Set it equal to  each value looped through
        x = an_int
        print("Outer function: ", x)
        innerInts()
    print("Final value stored in x variable in print3Ints function: ", x)
    
print3Ints()
    
# Print x. Does it print 0 or 3? 
print("\nValue of x: ",x)    

Value of x:  0
Outer function:  1
Inner function:  1
Outer function:  2
Inner function:  2
Outer function:  3
Inner function:  3
Final value stored in x variable in print3Ints function:  3

Value of x:  0


Why is the <span class="func">innerInts</span> function able to <span class="bltn">print</span> the values of <b><i>x</i></b> in the <span class="func">print3Ints</span> function? 

Type your answers here.

Possible Answer: 
The innerInts() function is defined inside the print3Ints function, so it can read variables from the enclosing function's scope, including x.

In the Code cell below, try calling the <span class="func">innerInts</span> function declared in the previous code block. 

In [None]:
# Try calling the innerInts() function.
innerInts()

# cannot call this function outside of the print3Ints() function.

NameError: name 'innerInts' is not defined

Why did the <span class="func">innerInts</span> function call not work?

Type your answer here. 

Possible Answer: The innerInts() function's scope is enclosed within the print3Ints() function. It is not available from outside of it. 
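
An inner function can still be used outside its enclosing function if the enclosing function returns it. The sketch below uses hypothetical names (`make_printer`, `inner`, `label`) to show one way this might look; the returned function keeps access to the variable `prefix` from the enclosing scope.

```python
def make_printer(prefix):
    # inner is defined in make_printer's scope and can read prefix.
    def inner(value):
        return prefix + str(value)

    # Returning the function object makes it usable outside make_printer.
    return inner


label = make_printer("Inner function: ")
message = label(3)
print(message)
```

The name `inner` itself is still not defined outside `make_printer`; only the returned function object, stored here in `label`, is available.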

<h3 class="yt">Your Turn: 1</h3>
In this Markdown Cell, describe the purpose of each line in the <span class="func">readCSV2List</span> function. Number each line starting with <span class="bltn">def</span> <span class="func">readCSV2List</span> as <b>#1</b>.

<b><u>Answer:</u></b>
Line 1: This is the function header with one parameter. It gives the name of the function.
Line 2: Comment
Line 3: This is the command to import the csv module, which has Python's csv file handling functions.
Line 4: Blank
Line 5: Comment
Line 6: This is the command to import the Python sys module, which has functions to handle parts of the runtime environment.
Line 7: Blank
Line 8: Comment
Line 9: The variable data is assigned an empty list to store the sublists.
Line 10: Blank
Line 11: Comment
Line 12: Start of a 'try' block to test the code section for errors.
Line 13: Comment
Line 14: Open the csv file in read mode with the file object named csvfile. Setting newline='' turns off newline translation so that the csv module can handle line endings itself.
Line 15: Create a csv.reader object from csvfile and save it as csv_reader, with ',' used to separate fields and '|' to quote fields.
Line 16: Begins a for loop to extract data one row at a time.
Line 17: Add each newly read row to the list 'data'.
Line 18: Return the list, data
Line 19: Start of the 'except' block, which is executed if the code in the 'try' block raises a csv.Error, for example when a line of data cannot be parsed.
Line 20: Print the error message

<h3 class="yt">Your Turn: 2</h3>
Think of a common task you might encounter when working with data. Create pseudocode for a <span class="func">function</span> to handle this task. Identify a <b>name</b> for the function, the <b><i>parameters</i></b> it would take, the <b>steps needed to implement it</b>, and any <b><i>value</i></b> returned by the function. 

Type your answer in this Markdown cell. 

<b><u>Answer: </u></b>
Common Task: Convert distances in a csv file from miles to metric kilometers
Pseudocode:
Define function - name: 'Convert_mi_to_km' and parameter: 'file_info'
    import module csv to handle csv files
    import module sys to handle errors
    create an empty list called data
    start error checking with 'try'
        open the csv file as a file object called csvfile with newline=''
            create new object csv_reader from csvfile and set field separator and field quotes
            start 'for' loop to iterate through csv_reader
                convert miles to kilometers and add modified rows to 'data' list
        return the modified 'data' list
    set error handling for errors found
        print error


<h3 class="yt">Your Turn: 3</h3>
Fill in the missing code in the <span class="func">code2LC</span> function. The function should take a <span class="bltn">nested list</span> as its <b><i>parameter</i></b> and return a <span class="bltn">list</span> of <span class="str">lower case airport codes</span>. The comments can be used as hints. Call the function with <b><i>air_subset</i></b> as the argument. 

In [11]:
## see: Creating a User Defined Function

# Function to return the airport codes as lower case
def code2LC(air_data):
    # Create a list lower_case 
    lower_case = []
    
    # Remove the header row from air_data and store the remaining rows in aList.
    aList = air_data[1:][:]

    # Loop through the values in aList
    for val in aList:
        
        # Store the airport code in the variable a_code
        a_code = val[0]
        
        # Append a_code as lowercase with .lower()
        lower_case.append(a_code.lower())

    # Return lower_case
    return lower_case
              
# Call the code2LC function with air_subset as the argument. Store the results in the variable lc and print it. 
lc = code2LC(air_subset)
print(lc)

['acy', 'acy', 'acy', 'acy', 'acy', 'acy', 'acy', 'acy', 'acy', 'acy', 'acy']


<h3 class="yt">Your Turn: 4</h3>
Create the <span class="func">code2LC</span> function as a <span class="bltn">lambda function</span>. Store the result in the variable <b><i>a_2_lower</i></b>. It should <span class="bltn">print</span> the same results as your function in Your Turn 3.

In [25]:
# Hint: use map() with the lambda function. 
a_2_lower = map(lambda row: row[0].lower(), air_subset[1:][:])

# Convert a_2_lower to a list then print
print(list(a_2_lower))

['acy', 'acy', 'acy', 'acy', 'acy', 'acy', 'acy', 'acy', 'acy', 'acy', 'acy']
