In this guided project, we'll work with a dataset on births in the U.S., compiled by FiveThirtyEight.

The dataset contains the following columns:

 - year: Year (1994 to 2003).
 - month: Month (1 to 12).
 - date_of_month: Day number of the month (1 to 31).
 - day_of_week: Day of week (1 to 7, where 1 is Monday and 7 is Sunday).
 - births: Number of births that day.
 

### 1. Introduction To The Dataset

In [62]:
file = "data/US_births_1994-2003_CDC_NCHS.csv"

In [63]:
# splitlines() handles "\r", "\n", and "\r\n" line endings alike
with open(file) as f:
    data = f.read().splitlines()
data[0:10]

['year,month,date_of_month,day_of_week,births',
 '1994,1,1,6,8096',
 '1994,1,2,7,7772',
 '1994,1,3,1,10142',
 '1994,1,4,2,11248',
 '1994,1,5,3,11053',
 '1994,1,6,4,11406',
 '1994,1,7,5,11251',
 '1994,1,8,6,8653',
 '1994,1,9,7,7910']

### 2. Converting Data Into A List Of Lists

In [72]:
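def read_csv(csv_file):
    # splitlines() handles "\r", "\n", and "\r\n" line endings alike
    with open(csv_file) as f:
        data = f.read().splitlines()
    string_list = data[1:]  # skip the header row
    final_list = []
    for item in string_list:
        if not item:
            continue  # ignore blank lines
        int_fields = []
        string_fields = item.split(",")
        for value in string_fields:
            int_fields.append(int(value))
        final_list.append(int_fields)
    return final_list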

cdc_list = read_csv(file)
cdc_list[0:10]

[[1994, 1, 1, 6, 8096],
 [1994, 1, 2, 7, 7772],
 [1994, 1, 3, 1, 10142],
 [1994, 1, 4, 2, 11248],
 [1994, 1, 5, 3, 11053],
 [1994, 1, 6, 4, 11406],
 [1994, 1, 7, 5, 11251],
 [1994, 1, 8, 6, 8653],
 [1994, 1, 9, 7, 7910],
 [1994, 1, 10, 1, 10498]]
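
As an alternative, the standard library's `csv` module can do the splitting and also copes with different line endings. The following is a minimal sketch, not the project's own method: the function name `read_csv_with_module` is hypothetical, and the in-memory sample simply repeats the first rows shown earlier.

```python
import csv
import io

def read_csv_with_module(file_obj):
    """Parse an open CSV file object into a list of integer lists, skipping the header."""
    reader = csv.reader(file_obj)
    next(reader)  # skip the header row
    # csv.reader yields an empty list for blank lines, so filter those out
    return [[int(value) for value in row] for row in reader if row]

# Small in-memory sample copied from the first rows of the dataset.
sample = io.StringIO(
    "year,month,date_of_month,day_of_week,births\n"
    "1994,1,1,6,8096\n"
    "1994,1,2,7,7772\n"
)
print(read_csv_with_module(sample))
```

With a real file on Python 3, the `csv` documentation recommends opening it with `newline=""`, for example `open(file, newline="")`, and passing the resulting file object to the function.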

### 3. Calculating Number Of Births Each Month

In [78]:
def month_births(data_list):
    births_per_month = {}
    for item in data_list:
        month = item[1]
        births = item[4]
        if month not in births_per_month:
            births_per_month[month] = births
        else:
            births_per_month[month] += births
    return births_per_month

cdc_month_births = month_births(cdc_list)
print(cdc_month_births)

{1: 3232517, 2: 3018140, 3: 3322069, 4: 3185314, 5: 3350907, 6: 3296530, 7: 3498783, 8: 3525858, 9: 3439698, 10: 3378814, 11: 3171647, 12: 3301860}


### 4. Calculating Number Of Births Each Day Of Week

In [79]:
def dow_births(data_list):
    births_per_dow = {}
    for item in data_list:
        dow = item[3]
        births = item[4]
        if dow not in births_per_dow:
            births_per_dow[dow] = births
        else:
            births_per_dow[dow] += births
    return births_per_dow

cdc_day_births = dow_births(cdc_list)
print(cdc_day_births)

{1: 5789166, 2: 6446196, 3: 6322855, 4: 6288429, 5: 6233657, 6: 4562111, 7: 4079723}
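
To make the day-of-week totals easier to read, the numeric keys can be mapped to names. The sketch below assumes that 1 is Monday and 7 is Sunday, which matches the first row of the data (1 January 1994, a Saturday, is coded 6). `DAY_NAMES` and `label_days` are illustrative names, not part of the original project.

```python
# Assumption: day_of_week uses 1 = Monday ... 7 = Sunday.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def label_days(births_per_dow):
    """Return a new dict keyed by day name instead of day number."""
    return {DAY_NAMES[dow - 1]: births for dow, births in births_per_dow.items()}

# Totals copied from the output printed above.
cdc_day_births = {1: 5789166, 2: 6446196, 3: 6322855, 4: 6288429,
                  5: 6233657, 6: 4562111, 7: 4079723}

labelled = label_days(cdc_day_births)
busiest = max(labelled, key=labelled.get)
quietest = min(labelled, key=labelled.get)
print(busiest, labelled[busiest])    # Tuesday 6446196
print(quietest, labelled[quietest])  # Sunday 4079723
```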


### 5. Creating A More General Function

In [81]:
def calc_counts(data_list, column):
    births_per_col = {}
    for item in data_list:
        col = item[column]
        births = item[4]
        if col not in births_per_col:
            births_per_col[col] = births
        else:
            births_per_col[col] += births
    return births_per_col

cdc_year_births = calc_counts(cdc_list, 0)
cdc_month_births = calc_counts(cdc_list, 1)
cdc_dom_births = calc_counts(cdc_list, 2)
cdc_dow_births = calc_counts(cdc_list, 3)
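
The same accumulation can be written a little more compactly with `dict.get`, which supplies a default of 0 the first time a key is seen. This is only a sketch of an equivalent variant: the name `calc_counts_compact` is hypothetical, and `sample_rows` repeats the ten rows shown in step 2 so the block can run on its own.

```python
def calc_counts_compact(data_list, column):
    """Sum births (index 4) grouped by the value found at index `column`."""
    totals = {}
    for item in data_list:
        key = item[column]
        # get() returns 0 for a key that has not been seen yet
        totals[key] = totals.get(key, 0) + item[4]
    return totals

# The first ten rows of the dataset, as listed in step 2.
sample_rows = [
    [1994, 1, 1, 6, 8096],
    [1994, 1, 2, 7, 7772],
    [1994, 1, 3, 1, 10142],
    [1994, 1, 4, 2, 11248],
    [1994, 1, 5, 3, 11053],
    [1994, 1, 6, 4, 11406],
    [1994, 1, 7, 5, 11251],
    [1994, 1, 8, 6, 8653],
    [1994, 1, 9, 7, 7910],
    [1994, 1, 10, 1, 10498],
]

print(calc_counts_compact(sample_rows, 1))  # totals by month
print(calc_counts_compact(sample_rows, 3))  # totals by day of week
```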

### 6. Next Steps

That's it for the guided steps. Here are some suggestions for next steps:

- Write a function that can calculate the min and max values for any dictionary that's passed in.

- Write a function that extracts the same values across years and calculates the differences between consecutive values to show whether the number of births is increasing or decreasing.
  - For example, how did the number of births on Saturday change each year between 1994 and 2003?

- Find a way to combine the CDC data with the SSA data, which you can find here. Specifically, brainstorm ways to deal with the overlapping time periods in the datasets.
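
The first two suggestions could be approached along the following lines. This is one possible sketch rather than a reference solution: the function names `min_max` and `yearly_differences` are hypothetical, the monthly totals are copied from the output in step 3, and `toy_rows` contains made-up numbers used only to show the shape of the result.

```python
def min_max(counts):
    """Return ((key, value) of the smallest value, (key, value) of the largest)."""
    min_key = min(counts, key=counts.get)
    max_key = max(counts, key=counts.get)
    return (min_key, counts[min_key]), (max_key, counts[max_key])

def yearly_differences(data_list, column, value):
    """Total births per year for rows where item[column] == value,
    then return the change from each year to the next."""
    totals = {}
    for item in data_list:
        if item[column] == value:
            totals[item[0]] = totals.get(item[0], 0) + item[4]
    years = sorted(totals)
    return {years[i]: totals[years[i]] - totals[years[i - 1]]
            for i in range(1, len(years))}

# Monthly totals copied from the output in step 3.
cdc_month_births = {1: 3232517, 2: 3018140, 3: 3322069, 4: 3185314,
                    5: 3350907, 6: 3296530, 7: 3498783, 8: 3525858,
                    9: 3439698, 10: 3378814, 11: 3171647, 12: 3301860}
print(min_max(cdc_month_births))  # ((2, 3018140), (8, 3525858))

# Made-up rows (NOT real data), in the same column layout as cdc_list.
toy_rows = [
    [1994, 1, 1, 6, 100],
    [1994, 1, 8, 6, 110],
    [1994, 1, 2, 7, 50],
    [1995, 1, 7, 6, 230],
    [1996, 1, 6, 6, 200],
]
# Change in Saturday (day_of_week == 6) births from one year to the next.
print(yearly_differences(toy_rows, 3, 6))  # {1995: 20, 1996: -30}
```

With the full dataset, `yearly_differences(cdc_list, 3, 6)` would give the year-over-year change in Saturday births; a positive value means the total rose compared with the previous year.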