# Functions

For this tutorial, we will use a modified version of the US Social Security Administration baby name dataset.

> US Social Security Administration (2022). "National data on the relative frequency of given names in the population of U.S. births where the individual has a Social Security Number." <https://www.ssa.gov/oact/babynames/limits.html>

The dataset has been modified from the source to include column names and to limit the date range to 2010-2021.

First, a review of concepts:

* lists
* dictionaries
* loops
* conditionals
* Jupyter commands

For this we will refer back to tutorial 1.

## Setup

Begin by having everyone download "names.zip" from the repository. Make the link available via Zoom and in person.

Navigate to the extracted names directory and create notebooks there.

In [22]:
# functions - syntax and example
# note the workflow here is based on the Carpentries

"""
# syntax
def function_name(arg1, arg2, ...):
    do stuff here
    return
"""

'\n# syntax\ndef function_name(arg1, arg2, ...):\n    do stuff here\n    return\n'

In [23]:
def kilograms_to_pounds(weight_k):
    weight_lbs = weight_k * 2.2
    return weight_lbs

In [24]:
print("my dog weighs", kilograms_to_pounds(15), 'pounds')

my dog weighs 33.0 pounds


### Exercise

Write a function that converts pounds to kilograms. A call to the function should look like:

```
pounds_to_kilograms(33)
```

And the output in the above case would be approximately 15 (floating-point arithmetic gives a value just below it):

```
14.999999999999998
```

In [25]:
def pounds_to_kilograms(weight_pounds):
    weight_k = weight_pounds / 2.2
    return weight_k

In [26]:
pounds_to_kilograms(33)

14.999999999999998
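
The result is not exactly 15 because 2.2 cannot be represented exactly as a binary floating-point number. A minimal sketch of one way to tidy the value for display, using the built-in `round` (the function is repeated here so the example runs on its own, and `rounded_weight` is just an illustrative variable name):

```python
def pounds_to_kilograms(weight_pounds):
    # same conversion as above: divide by the approximate factor 2.2
    weight_k = weight_pounds / 2.2
    return weight_k

# round to two decimal places for display; the function itself is unchanged
rounded_weight = round(pounds_to_kilograms(33), 2)
print(rounded_weight)
```

This prints `15.0`. Rounding only at the point of display keeps the full precision available for any later calculation.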

In [27]:
# we can calculate our weight on the moon

"""
Formula for this calculation is

weight_moon = (weight_earth/9.81m/s^2) * 1.622m/s^2
"""

def moon_weight_kilograms(earth_weight_k):
    moon_weight_k = (earth_weight_k/9.81) * 1.622
    rounded_moon_weight = round(moon_weight_k, 2)
    return rounded_moon_weight

In [28]:
moon_weight_kilograms(15)

2.48

In [29]:
# my dog is overweight - should she get more exercise or move to the moon?
# let's get her weight in pounds by _composing_ functions using the output of other functions

# all the work in the next function is done by our other functions
def moon_weight_pounds(earth_weight_k):
    moon_weight_k = moon_weight_kilograms(earth_weight_k)
    moon_weight_lbs = kilograms_to_pounds(moon_weight_k)
    return moon_weight_lbs

In [30]:
print('on the moon my dog would weigh', moon_weight_pounds(15), 'pounds')

on the moon my dog would weigh 5.456 pounds
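
Composition can also be written as a single nested call, without a wrapper function. The sketch below repeats shortened versions of the two functions defined earlier so that it runs on its own:

```python
def kilograms_to_pounds(weight_k):
    return weight_k * 2.2

def moon_weight_kilograms(earth_weight_k):
    return round((earth_weight_k / 9.81) * 1.622, 2)

# nested call: the inner function runs first and its return value
# becomes the argument of the outer function
moon_weight_lbs = kilograms_to_pounds(moon_weight_kilograms(15))
print('on the moon my dog would weigh', moon_weight_lbs, 'pounds')
```

A named wrapper such as `moon_weight_pounds` is usually easier to read and reuse. The nested form is handy for a quick one-off calculation.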


### Exercise

So far all of our functions have returned something - in most cases a weight that has been calculated by the function.

It is not necessary for a function to explicitly return a value in Python. In fact, the **return** statement itself is optional.

Given the following function definitions:

```
def hello(name):
    print("Hello", name, "!")

def greet(name):
    greeting = "Greetings " + str(name) + "!"
    return greeting

def farewell(name):
    goodbye = "Goodbye " + str(name) + "!"
    return
```

What is the output of the following:

```
hello('Sam')

print(hello('Sam'))

greet('Donna')

print(greet('Donna'))

farewell('Minerva')

print(farewell('Minerva'))
```

Are any results different from what we expect? Why?

How can we change the `farewell` function so that we get an output?

In [62]:
def hello(name):
    print("Hello", name, "!")

def greet(name):
    greeting = "Greetings " + str(name) + "!"
    return greeting

def farewell(name):
    goodbye = "Goodbye " + str(name) + "!"
    return

In [63]:
hello('Sam')

Hello Sam !


In [64]:
print(hello("Sam"))  # hello() prints its message but returns nothing, so print() shows None

Hello Sam !
None


In [65]:
greet('Donna')

'Greetings Donna!'

In [66]:
print(greet('Donna'))

Greetings Donna!


In [67]:
farewell('Minerva')

In [68]:
print(farewell('Minerva'))  # the bare return gives back None, so None is printed

None
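
One possible answer to the last question, as a sketch: `farewell` builds the string, but its bare `return` hands back `None`. Returning `goodbye` is enough to get an output.

```python
def farewell(name):
    goodbye = "Goodbye " + str(name) + "!"
    # return the string we built instead of using a bare `return`
    return goodbye

print(farewell('Minerva'))
```

This prints `Goodbye Minerva!`.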


In [69]:
# using baby name data without functions
# let's get the most popular baby names for a given year
# note we are using only the standard library here - pandas is recommended for tabular data

import glob
import csv

In [70]:
# get a list of file names and also just the first filename

# sort the list: glob does not guarantee the order of its results
baby_name_files = sorted(glob.glob('./names/*'))
print('all files:', baby_name_files)

one_year_names = baby_name_files[0]
print('\nthe first file in the list is:', one_year_names)

all files: ['./names\\2010', './names\\2011', './names\\2012', './names\\2013', './names\\2014', './names\\2015', './names\\2016', './names\\2017', './names\\2018', './names\\2019', './names\\2020', './names\\2021']

the first file in the list is: ./names\2010


In [71]:
# most popular names - each file lists girls' names first, most popular first,
# so we could eyeball the top girl's name, but we'd have to search for the top boy's name

# read csv data into a list
# (newline='' is what the csv module documentation recommends when opening files)
name_data = []
with open(one_year_names, 'r', newline='') as f:
    reader = csv.DictReader(f)
    for row in reader:
        name_data.append(row)

In [72]:
# note the structure - table is a list of dictionaries
# each row is a dictionary, with column names as keys

# we can open the CSV file in Jupyter and compare

name_data[:5]

[{'name': 'Isabella', 'sex': 'F', 'count': '22925'},
 {'name': 'Sophia', 'sex': 'F', 'count': '20648'},
 {'name': 'Emma', 'sex': 'F', 'count': '17354'},
 {'name': 'Olivia', 'sex': 'F', 'count': '17030'},
 {'name': 'Ava', 'sex': 'F', 'count': '15436'}]

In [73]:
# comparing values is a bit clumsy with CSV library
# this is one way

f_max_c = 0
f_popular = None
m_max_c = 0
m_popular = None

In [74]:
for name in name_data:
    if name['sex'] == 'F':
        if int(name['count']) > f_max_c:
            f_max_c = int(name['count'])
            f_popular = name['name']
    elif name['sex'] == 'M':
        if int(name['count']) > m_max_c:
            m_max_c = int(name['count'])
            m_popular = name['name']

In [75]:
# clean up the filename a little
# os.path.basename keeps only the last path component, whichever separator the OS uses
import os
y = os.path.basename(one_year_names)

# output results
print(y, 'most popular girl name:', f_popular, '(', f_max_c, ')')
print(y, 'most popular boy name:', m_popular, '(', m_max_c, ')')

2010 most popular girl name: Isabella ( 22925 )
2010 most popular boy name: Jacob ( 22139 )
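
The loop above is one way to find the largest count. As a rough alternative sketch, the built-in `max` accepts a `key` function that tells it what to compare. The rows below are a hand-typed sample in the same shape as `name_data`, taken from the output shown earlier. `most_popular` is a hypothetical helper name, not part of the tutorial.

```python
sample_rows = [
    {'name': 'Isabella', 'sex': 'F', 'count': '22925'},
    {'name': 'Sophia', 'sex': 'F', 'count': '20648'},
    {'name': 'Emma', 'sex': 'F', 'count': '17354'},
    {'name': 'Jacob', 'sex': 'M', 'count': '22139'},
]

def most_popular(rows, sex):
    # keep only the rows for the requested sex
    matching = [row for row in rows if row['sex'] == sex]
    # counts are stored as strings, so convert to int before comparing
    return max(matching, key=lambda row: int(row['count']))

top_girl = most_popular(sample_rows, 'F')
top_boy = most_popular(sample_rows, 'M')
print(top_girl['name'], top_girl['count'])
print(top_boy['name'], top_boy['count'])
```

Note that `max` raises a `ValueError` when no rows match, whereas the loop version simply leaves the result as `None`.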


In [76]:
# our full code for the above is

name_data = []
with open(one_year_names, 'r', newline='') as f:
    reader = csv.DictReader(f)
    for row in reader:
        name_data.append(row)
f_max_c = 0
f_popular = None
m_max_c = 0
m_popular = None
for name in name_data:
    if name['sex'] == 'F':
        if int(name['count']) > f_max_c:
            f_max_c = int(name['count'])
            f_popular = name['name']
    elif name['sex'] == 'M':
        if int(name['count']) > m_max_c:
            m_max_c = int(name['count'])
            m_popular = name['name']
# clean up the filename a little (basename works with either path separator)
import os
y = os.path.basename(one_year_names)

# output results
print(y, 'most popular girl name:', f_popular, '(', f_max_c, ')')
print(y, 'most popular boy name:', m_popular, '(', m_max_c, ')')

2010 most popular girl name: Isabella ( 22925 )
2010 most popular boy name: Jacob ( 22139 )


In [77]:
# that's okay but it is a bit hard to read, and doesn't scale well
# to get the results for a different year we need the file's index position in the file list
# also, we're using the same logic twice - once for girls' names and a second time for boys'
# these considerations (easy to break and repetitive) make this a good candidate for some functions

# simplify the counting process first
def popular_name(name_data, sex):
    name_c = 0
    popular = None
    for name in name_data:
        if name['sex'] == sex:
            if int(name['count']) > name_c:
                name_c = int(name['count'])
                popular = name['name']
    return popular, name_c

In [78]:
# test the function
# we have replaced 13 lines of code with only 9 lines of code! (it adds up!)

pop_girl_name, count_named = popular_name(name_data, 'F')
print('most popular girl name in the first file is:', pop_girl_name)
print('there were', count_named, 'girls named', pop_girl_name, 'in the file', one_year_names)

most popular girl name in the first file is: Isabella
there were 22925 girls named Isabella in the file ./names\2010


In [79]:
# something we repeat is reading files, so that also makes a good function
# in this case we're not really reducing the number of lines of code, but
# we are modularizing our code for reuse - from here on, any time we need to
# read a csv file we only have to call this function

def csv_to_list(file_pointer):
    csv_data = []
    with open(file_pointer, 'r', newline='') as f:
        reader = csv.DictReader(f)
        for row in reader:
            csv_data.append(row)
    return csv_data
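
Because a `csv.DictReader` is iterable, the append loop can also be collapsed into a single `list()` call. The sketch below shows that shorter form under a hypothetical name, `csv_to_list_short`. It writes a tiny sample file to a temporary directory so that it runs without the names data.

```python
import csv
import os
import tempfile

def csv_to_list_short(file_path):
    # newline='' is what the csv module documentation recommends when opening files
    with open(file_path, 'r', newline='') as f:
        # list() consumes the reader, giving one dictionary per row
        return list(csv.DictReader(f))

# write a tiny sample file so the example runs without the names data
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'sample')
    with open(sample_path, 'w', newline='') as f:
        f.write('name,sex,count\nIsabella,F,22925\nJacob,M,22139\n')
    sample_data = csv_to_list_short(sample_path)

print(sample_data)
```

Both versions return the same structure: a list with one dictionary per row, keyed by column name.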

In [80]:
# here is our above process using functions

name_data = csv_to_list(one_year_names)
pop_girl_name, count_girls_named = popular_name(name_data, 'F')
pop_boy_name, count_boys_named = popular_name(name_data, 'M')

# take the year from the file name; basename works with either path separator
import os
y = os.path.basename(one_year_names)

print(y, 'most popular girl name:', pop_girl_name, '(', count_girls_named, ')')
print(y, 'most popular boy name:', pop_boy_name, '(', count_boys_named, ')')

2010 most popular girl name: Isabella ( 22925 )
2010 most popular boy name: Jacob ( 22139 )


In [81]:
# we can refine this further
# it would be nice to enter a year and get the info for that year

import os

def fnames_years(data_dir, fname_pattern):
    year_data = {}
    # sort the matches: glob does not guarantee the order of its results
    fnames = sorted(glob.glob(os.path.join(data_dir, fname_pattern)))
    # each file is named for its year; basename works with either path separator
    for f in fnames:
        data_year = os.path.basename(f)
        year_data[data_year] = f
    return year_data

In [82]:
print(fnames_years('./names', "*"))

{'2010': './names\\2010', '2011': './names\\2011', '2012': './names\\2012', '2013': './names\\2013', '2014': './names\\2014', '2015': './names\\2015', '2016': './names\\2016', '2017': './names\\2017', '2018': './names\\2018', '2019': './names\\2019', '2020': './names\\2020', '2021': './names\\2021'}
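
Mapping file names to paths by editing strings depends on which path separator the operating system uses. As a hedged sketch of an alternative, the standard library's `pathlib` exposes the last path component as `.name`. The function name `fnames_years_pathlib` is hypothetical, and the example creates empty placeholder files in a temporary directory rather than reading the real dataset.

```python
import tempfile
from pathlib import Path

def fnames_years_pathlib(data_dir, fname_pattern):
    year_data = {}
    # Path.glob yields Path objects; .name is the final path component
    for path in sorted(Path(data_dir).glob(fname_pattern)):
        year_data[path.name] = str(path)
    return year_data

# build a throwaway directory with empty files named like the dataset
with tempfile.TemporaryDirectory() as tmp_dir:
    for year in ('2010', '2011', '2012'):
        (Path(tmp_dir) / year).touch()
    sample_years = fnames_years_pathlib(tmp_dir, '*')

print(list(sample_years))
```

This prints `['2010', '2011', '2012']`. The dictionary keys are the file names and the values are the full paths, as before.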


In [83]:
annual_names = fnames_years('./names', "*")

In [84]:
# now we can pick an arbitrary year

y = 2022
if str(y) in annual_names:
    file_pointer = annual_names[str(y)]
    name_data = csv_to_list(file_pointer)
    pop_girl_name, count_girls_named = popular_name(name_data, 'F')
    pop_boy_name, count_boys_named = popular_name(name_data, 'M')
    print(y, 'most popular girl name:', pop_girl_name, '(', count_girls_named, ')')
    print(y, 'most popular boy name:', pop_boy_name, '(', count_boys_named, ')')
else:
    print("Data for selected year not available.")

Data for selected year not available.


In [85]:
# we could make a function of the above
# then use it to process lists of years, single years, whatever makes sense

def favorite_baby_names_year(y, file_dict):
    if str(y) in file_dict:
        file_pointer = file_dict[str(y)]
        name_data = csv_to_list(file_pointer)
        pop_girl_name, count_girls_named = popular_name(name_data, 'F')
        pop_boy_name, count_boys_named = popular_name(name_data, 'M')
        print(y, 'most popular girl name:', pop_girl_name, '(', count_girls_named, ')')
        print(y, 'most popular boy name:', pop_boy_name, '(', count_boys_named, ')')
    else:
        print("Data for selected year not available.")

In [86]:
# one year

favorite_baby_names_year(2015, annual_names)

2015 most popular girl name: Emma ( 20468 )
2015 most popular boy name: Noah ( 19654 )


In [87]:
# a list of years

year_list = [2008, 2012, 2018, 2024]
for y in year_list:
    print(y)
    favorite_baby_names_year(y, annual_names)
    print('\n')

2008
Data for selected year not available.


2012
2012 most popular girl name: Sophia ( 22322 )
2012 most popular boy name: Jacob ( 19091 )


2018
2018 most popular girl name: Emma ( 18786 )
2018 most popular boy name: Liam ( 19940 )


2024
Data for selected year not available.
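
`favorite_baby_names_year` prints its results, so, as the earlier exercise showed, it returns `None` and the caller cannot reuse the values. Below is a hedged sketch of a variant that returns them instead. `favorite_names` is a hypothetical name, `popular_name` is repeated from above in slightly condensed form, and the rows are a hand-typed sample rather than a real file.

```python
def popular_name(name_data, sex):
    name_c = 0
    popular = None
    for name in name_data:
        if name['sex'] == sex and int(name['count']) > name_c:
            name_c = int(name['count'])
            popular = name['name']
    return popular, name_c

def favorite_names(name_data):
    # return the results so the caller decides whether to print or store them
    return {
        'F': popular_name(name_data, 'F'),
        'M': popular_name(name_data, 'M'),
    }

sample_rows = [
    {'name': 'Isabella', 'sex': 'F', 'count': '22925'},
    {'name': 'Sophia', 'sex': 'F', 'count': '20648'},
    {'name': 'Jacob', 'sex': 'M', 'count': '22139'},
]

favorites = favorite_names(sample_rows)
print(favorites)
```

Returning a dictionary makes it straightforward to collect results for several years in a list or another dictionary and format them later.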


