# Functions

For this tutorial, we will use a modified version of the US Social Security Administration baby names dataset.

> US Social Security Administration (2022). "National data on the relative frequency of given names in the population of U.S. births where the individual has a Social Security Number." <https://www.ssa.gov/oact/babynames/limits.html>

The dataset has been modified from the source to include column names and to limit the date range to 2010-2021.

First, a review of concepts:

* lists
* dictionaries
* loops
* conditionals

For this we will refer back to tutorial 1.

In [2]:
# functions - syntax and example
# note the workflow here is based on the Carpentries

"""
# syntax
def function_name(arg1, arg2, ...):
    do stuff here
    return
"""

'\n# syntax\ndef function_name(arg1, arg2, ...):\n    do stuff here\n    return\n'

In [13]:
def kilograms_to_pounds(weight_k):
    weight_lbs = weight_k * 2.2
    return weight_lbs

In [14]:
print("my dog weighs", kilograms_to_pounds(15), 'pounds')

my dog weighs 33.0 pounds
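The factor 2.2 used above is a rounded conversion. As a minimal sketch, the same function can be written with a named constant and a docstring so the conversion factor is easy to find and change. The names `LBS_PER_KG` and `kilograms_to_pounds_precise` are introduced here for illustration only and are not part of the tutorial code.

```python
# Assumption: LBS_PER_KG and kilograms_to_pounds_precise are illustrative names.
LBS_PER_KG = 2.20462  # pounds per kilogram, rounded to five decimal places


def kilograms_to_pounds_precise(weight_k):
    """Convert a weight in kilograms to pounds."""
    return weight_k * LBS_PER_KG


# round only when displaying, so the function itself keeps full precision
print("my dog weighs", round(kilograms_to_pounds_precise(15), 2), "pounds")
```

Rounding at display time rather than inside the function keeps the returned value usable in further calculations.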


In [24]:
# we can calculate our weight on the moon

"""
Formula for this calculation is

weight_moon = (weight_earth / 9.81 m/s^2) * 1.622 m/s^2
"""

def moon_weight_kilograms(earth_weight_k):
    moon_weight_k = (earth_weight_k/9.81) * 1.622
    return round(moon_weight_k, 2)

In [25]:
moon_weight_kilograms(15)

2.48
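The moon formula divides by Earth's gravitational acceleration and multiplies by the moon's. A possible generalization, sketched here under assumptions, names the two constants and makes the target gravity a parameter with a default value. `EARTH_GRAVITY`, `MOON_GRAVITY`, and `scale_weight` are hypothetical names, and the Mars value of 3.71 m/s^2 is an approximate figure supplied only for the example.

```python
# Assumption: these names are illustrative and not part of the tutorial code.
EARTH_GRAVITY = 9.81   # m/s^2, same value as in the formula above
MOON_GRAVITY = 1.622   # m/s^2, same value as in the formula above


def scale_weight(earth_weight_k, target_gravity=MOON_GRAVITY):
    """Scale an Earth weight to another body's surface gravity."""
    return round(earth_weight_k / EARTH_GRAVITY * target_gravity, 2)


print(scale_weight(15))        # uses the default (moon) gravity
print(scale_weight(15, 3.71))  # approximate Mars gravity, an assumed value
```

With the default argument, `scale_weight(15)` behaves like `moon_weight_kilograms(15)`, while the second parameter lets the same function serve other cases.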

In [28]:
# my dog is overweight - should she get more exercise or move to the moon?
# let's get her weight in pounds by _composing_ functions using the output of other functions

# all the work in the next function is done by our other functions
def moon_weight_pounds(earth_weight_k):
    moon_weight_k = moon_weight_kilograms(earth_weight_k)
    moon_weight_lbs = kilograms_to_pounds(moon_weight_k)
    return moon_weight_lbs

In [29]:
print('on the moon my dog would weigh', moon_weight_pounds(15), 'pounds')

on the moon my dog would weigh 5.456 pounds
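Composition can also be written as a nested call, where the return value of one function is passed directly as the argument of another. The sketch below repeats the two helper functions so that it runs on its own, and checks that the step-by-step and nested forms agree.

```python
def kilograms_to_pounds(weight_k):
    return weight_k * 2.2


def moon_weight_kilograms(earth_weight_k):
    return round((earth_weight_k / 9.81) * 1.622, 2)


# step by step, with an intermediate variable
step_k = moon_weight_kilograms(15)
step_lbs = kilograms_to_pounds(step_k)

# nested: the inner call runs first and its result feeds the outer call
nested_lbs = kilograms_to_pounds(moon_weight_kilograms(15))

print(step_lbs == nested_lbs)
```

The intermediate-variable form is usually easier to read and debug; the nested form is more compact.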


In [30]:
# using baby name data without functions
# let's get the most popular baby names for a given year
# note we are using standard libraries - pandas is recommended for tabular data

import glob
import csv

In [32]:
# get a list of file names and also just the first filename

baby_name_files = glob.glob('./names/*')
print('all files:', baby_name_files)

one_year_names = baby_name_files[0]
print('\nthe first file in the list is:', one_year_names)

all files: ['./names\\2010', './names\\2011', './names\\2012', './names\\2013', './names\\2014', './names\\2015', './names\\2016', './names\\2017', './names\\2018', './names\\2019', './names\\2020', './names\\2021']

the first file in the list is: ./names\2010
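Two details of the file listing are worth a hedged illustration. `glob.glob` returns matches in arbitrary order, so indexing with `[0]` is only reliable after sorting, and the path separator differs between Windows and other systems. The sketch below builds a temporary stand-in for the `./names` folder (an assumption made so the example runs anywhere) and uses `os.path.basename` to pull out the year regardless of separator.

```python
import glob
import os
import tempfile

# Assumption: a throwaway folder with empty files stands in for ./names
with tempfile.TemporaryDirectory() as tmp:
    names_dir = os.path.join(tmp, 'names')
    os.mkdir(names_dir)
    for year in ('2011', '2010', '2012'):
        open(os.path.join(names_dir, year), 'w').close()

    # sort so that index 0 is always the earliest year
    files = sorted(glob.glob(os.path.join(names_dir, '*')))

    # basename drops the folder part on any operating system
    years = [os.path.basename(path) for path in files]

print(years)
```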


In [34]:
# most popular names - the files list the most popular names first,
# so we could just eyeball the top girl name, but we'd have to search
# for the most popular boy name

# read csv data into a list
name_data = []
with open(one_year_names, 'r') as f:
    reader = csv.DictReader(f)
    for row in reader:
        name_data.append(row)

In [37]:
# note the structure - table is a list of dictionaries
# each row is a dictionary, with column names as keys

name_data[:5]

[{'name': 'Isabella', 'sex': 'F', 'count': '22925'},
 {'name': 'Sophia', 'sex': 'F', 'count': '20648'},
 {'name': 'Emma', 'sex': 'F', 'count': '17354'},
 {'name': 'Olivia', 'sex': 'F', 'count': '17030'},
 {'name': 'Ava', 'sex': 'F', 'count': '15436'}]
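Note that `csv.DictReader` returns every value as a string, including `count`. The following sketch shows why the counts need converting with `int()` before they are compared. The row named `Example` is made up purely to demonstrate the pitfall and is not in the dataset.

```python
rows = [
    {'name': 'Isabella', 'sex': 'F', 'count': '22925'},
    {'name': 'Sophia', 'sex': 'F', 'count': '20648'},
    # hypothetical row, added only to show the string-comparison pitfall
    {'name': 'Example', 'sex': 'F', 'count': '9'},
]

# strings compare character by character, so '9' sorts after '2...'
string_result = rows[2]['count'] > rows[0]['count']

# converting to int gives the numeric comparison we actually want
numeric_result = int(rows[2]['count']) > int(rows[0]['count'])

print('as strings:', string_result)
print('as numbers:', numeric_result)
```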

In [38]:
# comparing values is a bit clumsy with the csv library because every value is read as a string
# this is one way: track the highest count seen so far for each sex

f_max_c = 0
f_popular = None
m_max_c = 0
m_popular = None

In [39]:
for name in name_data:
    if name['sex'] == 'F':
        if int(name['count']) > int(f_max_c):
            f_max_c = int(name['count'])
            f_popular = name['name']
    elif name['sex'] == 'M':
        if int(name['count']) > int(m_max_c):
            m_max_c = int(name['count'])
            m_popular = name['name']

In [41]:
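# clean up the filename a little
# note: this pattern matches the Windows-style paths shown above;
# on macOS or Linux the separator is '/' rather than '\\'
y = one_year_names.replace('./names\\', '')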

# output results
print(y, 'most popular girl name:', f_popular, '(', f_max_c, ')')
print(y, 'most popular boy name:', m_popular, '(', m_max_c, ')')

2010 most popular girl name: Isabella ( 22925 )
2010 most popular boy name: Jacob ( 22139 )


In [42]:
# our full code for the above is

name_data = []
with open(one_year_names, 'r') as f:
    reader = csv.DictReader(f)
    for row in reader:
        name_data.append(row)
f_max_c = 0
f_popular = None
m_max_c = 0
m_popular = None
for name in name_data:
    if name['sex'] == 'F':
        if int(name['count']) > int(f_max_c):
            f_max_c = int(name['count'])
            f_popular = name['name']
    elif name['sex'] == 'M':
        if int(name['count']) > int(m_max_c):
            m_max_c = int(name['count'])
            m_popular = name['name']
# clean up the filename a little
y = one_year_names.replace('./names\\', '')

# output results
print(y, 'most popular girl name:', f_popular, '(', f_max_c, ')')
print(y, 'most popular boy name:', m_popular, '(', m_max_c, ')')

2010 most popular girl name: Isabella ( 22925 )
2010 most popular boy name: Jacob ( 22139 )
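As an alternative to tracking the running maximum by hand, Python's built-in `max()` accepts a `key` function. The sketch below is one possible shortcut, not the approach this tutorial builds on; it uses a few rows copied from the 2010 output above so that it runs without the data files.

```python
# a few rows taken from the 2010 results shown above
name_data = [
    {'name': 'Isabella', 'sex': 'F', 'count': '22925'},
    {'name': 'Sophia', 'sex': 'F', 'count': '20648'},
    {'name': 'Jacob', 'sex': 'M', 'count': '22139'},
]

# split the rows by sex with list comprehensions
girls = [row for row in name_data if row['sex'] == 'F']
boys = [row for row in name_data if row['sex'] == 'M']

# key converts the count to int so rows are compared numerically
top_girl = max(girls, key=lambda row: int(row['count']))
top_boy = max(boys, key=lambda row: int(row['count']))

print('most popular girl name:', top_girl['name'], '(', top_girl['count'], ')')
print('most popular boy name:', top_boy['name'], '(', top_boy['count'], ')')
```

One caveat: `max()` raises a `ValueError` on an empty list, whereas the loop version simply leaves the result as `None`.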


In [49]:
# that's okay, but it is a bit hard to read and doesn't scale well
# to get the results for a different year we need that file's index position in the file list
# also, we're using the same logic twice: once for girls' names and once for boys'
# these considerations (easy to break and repetitive) make this a good candidate for some functions

# simplify the counting process first
def popular_name(name_data, sex):
    name_c = 0
    popular = None
    for name in name_data:
        if name['sex'] == sex:
            if int(name['count']) > int(name_c):
                name_c = int(name['count'])
                popular = name['name']
    return popular, name_c

In [51]:
# test the function
# we have replaced 13 lines of code with only 9 lines of code! (it adds up!)

pop_girl_name, count_named = popular_name(name_data, 'F')
print('most popular girl name in the first file is:', pop_girl_name)
print('there were', count_named, 'girls named', pop_girl_name, 'in the file', one_year_names)

most popular girl name in the first file is: Isabella
there were 22925 girls named Isabella in the file ./names\2010
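A function like this is easy to check against small, hand-written inputs, including edge cases. The sketch below repeats `popular_name` so it runs on its own, then tries it on two rows from the data, on a sex value with no matching rows, and on an empty list.

```python
def popular_name(name_data, sex):
    name_c = 0
    popular = None
    for name in name_data:
        if name['sex'] == sex:
            if int(name['count']) > int(name_c):
                name_c = int(name['count'])
                popular = name['name']
    return popular, name_c


sample = [
    {'name': 'Isabella', 'sex': 'F', 'count': '22925'},
    {'name': 'Sophia', 'sex': 'F', 'count': '20648'},
]

print(popular_name(sample, 'F'))  # ('Isabella', 22925)
print(popular_name(sample, 'M'))  # (None, 0) because no rows match
print(popular_name([], 'F'))      # (None, 0) because there are no rows
```

Because the comparison uses `>` rather than `>=`, the first name encountered wins if two names share the highest count.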


In [53]:
# something we repeat is reading files, so that also makes a good function
# in this case we're not really reducing the number of lines of code, but
# we are modularizing our code for reuse - from here on, any time we need
# to read a csv file we only have to call this function

def csv_to_list(file_path):
    # file_path is the path to the csv file, given as a string
    csv_data = []
    with open(file_path, 'r') as f:
        reader = csv.DictReader(f)
        for row in reader:
            csv_data.append(row)
    return csv_data

In [54]:
# here is our above process using functions

name_data = csv_to_list(one_year_names)
pop_girl_name, count_girls_named = popular_name(name_data, 'F')
pop_boy_name, count_boys_named = popular_name(name_data, 'M')

y = one_year_names.replace('./names\\', '')

print(y, 'most popular girl name:', pop_girl_name, '(', count_girls_named, ')')
print(y, 'most popular boy name:', pop_boy_name, '(', count_boys_named, ')')

2010 most popular girl name: Isabella ( 22925 )
2010 most popular boy name: Jacob ( 22139 )
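With both steps wrapped in functions, processing every file becomes a short loop. The following is a sketch under stated assumptions: it writes two small csv files to a temporary folder instead of reading `./names`, the `2010` rows are copied from the output above, the `demo` file contains made-up placeholder rows, and `summarize` is a hypothetical helper name.

```python
import csv
import glob
import os
import tempfile


def csv_to_list(file_path):
    # newline='' is what the csv module documentation recommends for csv files
    with open(file_path, 'r', newline='') as f:
        return list(csv.DictReader(f))


def popular_name(name_data, sex):
    name_c = 0
    popular = None
    for name in name_data:
        if name['sex'] == sex and int(name['count']) > name_c:
            name_c = int(name['count'])
            popular = name['name']
    return popular, name_c


def summarize(file_path):
    """Return (file label, top girl result, top boy result) for one file."""
    name_data = csv_to_list(file_path)
    label = os.path.basename(file_path)
    return label, popular_name(name_data, 'F'), popular_name(name_data, 'M')


# Assumption: '2010' rows come from the output above; 'demo' rows are made up
sample_files = {
    '2010': [('Isabella', 'F', 22925), ('Sophia', 'F', 20648), ('Jacob', 'M', 22139)],
    'demo': [('Alpha', 'F', 3), ('Beta', 'M', 5)],
}

results = []
with tempfile.TemporaryDirectory() as tmp:
    # write the sample files with the same header as the tutorial data
    for label, rows in sample_files.items():
        with open(os.path.join(tmp, label), 'w', newline='') as f:
            writer = csv.writer(f)
            writer.writerow(['name', 'sex', 'count'])
            writer.writerows(rows)

    # one call per file replaces the whole block of repeated code
    for path in sorted(glob.glob(os.path.join(tmp, '*'))):
        results.append(summarize(path))

for label, (girl, girl_c), (boy, boy_c) in results:
    print(label, 'most popular girl name:', girl, '(', girl_c, ')')
    print(label, 'most popular boy name:', boy, '(', boy_c, ')')
```

Pointing the `glob` pattern at the real `./names` folder would apply the same loop to every year without needing to know any file's index position.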
