# Gun Deaths in the USA 2012 to 2014

### Baby's first data science project
***

This notebook explores gun death data helpfully provided by [fivethirtyeight](https://github.com/fivethirtyeight/guns-data).

Each row in the dataset represents a single gun-related fatality. The columns contain demographic and other information about the victim, and are organized as follows:

| # | year | month | intent | police | sex | age | race | hispanic | place | education |
|---|------|-------|--------|--------|-----|-----|------|----------|-------|-----------|
| 0 | 1    | 2     | 3      | 4      | 5   | 6   | 7    | 8        | 9     | 10        |

***

The first row is the header and represents the following data:

- \# (0) : Unique identifier
- year (1) : The year in which the fatality occurred
- month (2) : The month in which the fatality occurred
- intent (3) : The intent of the perpetrator. This can be one of the following:

    - Suicide
    - Accidental
    - NA
    - Homicide
    - Undetermined


- police (4) : Whether a police officer was involved. Either 1 (True) or 0 (False)
- sex (5) : The gender of the victim. M for Male or F for Female
- age (6) : The age of the victim
- race (7) : The race of the victim. This can be one of the following:

    - White
    - Asian/Pacific Islander
    - Hispanic
    - Native American/Native Alaskan
    - Black
    
    
- hispanic (8) : A code that indicates the Hispanic origin of the victim
- place (9) : Where the shooting occurred
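- education (10) : Education level of the victim, coded as follows (the file used here stores these as text labels such as `Less than HS`, `HS/GED`, `Some college` and `BA+` rather than as numbers):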

    - 1 : Less than High School
    - 2 : Graduated from High School or equivalent
    - 3 : Some College
    - 4 : At least graduated from college
    - 5 : N/A

***

First things first, let's read in and clean up the data...


We are going to use the *csv* module to read it in and make a list of lists.

In [2]:
import csv

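# Use a context manager so the file is closed once it has been read.
# newline='' is what the csv module recommends when opening files for it.
with open('C:/Users/Manuel/Documents/Python/ipy/Gun Deaths/gun_data.csv', 'r', newline='') as f:
    data = list(csv.reader(f))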

print(data[:10])

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', 'BA+'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', 'Some college'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', 'BA+'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', 'BA+'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', 'HS/GED'], ['6', '2012', '02', 'Suicide', '0', 'M', '17', 'Native American/Native Alaskan', '100', 'Home', 'Less than HS'], ['7', '2012', '02', 'Undetermined', '0', 'M', '48', 'White', '100', 'Home', 'HS/GED'], ['8', '2012', '03', 'Suicide', '0', 'M', '41', 'Native American/Native Alaskan', '100', 'Home', 'HS/GED'], ['9', '2012', '02', 'Accidental', '0', 'M', '50', 'White', '100', 'Other specified', 'Some college']]


***

As you can see, the data needs a bit of cleanup before we can work with it: the header is mixed in with the rows, and every value is a string. Let's start by removing the header row and preserving it in its own variable, *headers*. Printing both lets us check that everything went correctly.

***

In [3]:
headers = data[0]
print(headers)

data = data[1:]
print(data[:5])

['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']
[['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', 'BA+'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', 'Some college'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', 'BA+'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', 'BA+'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', 'HS/GED']]
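
With the header row in its own variable, one option (not used in the rest of this notebook) is to pair it with a row so that values can be looked up by column name instead of by position. Here is a minimal sketch using the header and the first data row printed above. The name `record` is introduced here only for illustration:

```python
# Header and first data row, copied from the output above
headers = ['', 'year', 'month', 'intent', 'police', 'sex', 'age',
           'race', 'hispanic', 'place', 'education']
row = ['1', '2012', '01', 'Suicide', '0', 'M', '34',
       'Asian/Pacific Islander', '100', 'Home', 'BA+']

# zip pairs each column name with the value in the same position
record = dict(zip(headers, row))

print(record['year'], record['intent'], record['age'])
```

Note that the values are still strings at this point, and that the identifier column ends up under the empty-string key because its header is blank.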


***

Now that the data is cleaned up a bit, we can start pulling out useful information. We will start by making a dictionary of deaths per year.

***

In [4]:
# Small list comprehension to put the years in their own list

years = [death[1] for death in data]

# Create an empty dictionary, then loop over the years. If a year is already a key in the dictionary, add 1 to its count.
# Otherwise create a new key for that year and set its value to 1.

year_counts = {}

for year in years:
    if year in year_counts:
        year_counts[year] += 1
    else:
        year_counts[year] = 1

# Print it to make sure everything looks okay

print(year_counts)

{'2012': 33563, '2013': 33636, '2014': 33599}
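
To put a number on how much the yearly totals move, the counts printed above can be compared from one year to the next. This is a small sketch rather than part of the original analysis. The name `pct_change` is introduced here for illustration:

```python
# Yearly totals copied from the output above
year_counts = {'2012': 33563, '2013': 33636, '2014': 33599}

years_sorted = sorted(year_counts)

# Percentage change of each year relative to the year before it
pct_change = {}
for previous, current in zip(years_sorted, years_sorted[1:]):
    difference = year_counts[current] - year_counts[previous]
    pct_change[current] = 100 * difference / year_counts[previous]

for year, change in pct_change.items():
    print(year, round(change, 2))
```

Both year-over-year changes come out well under one percent.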


***

Well, the good news is that the totals look fairly stable across those years. Next, let's see if there is any significant variance between months. We are going to import the *datetime* module for this and create a *datetime object* from the year, the month and a fixed day of our choice (since the day isn't provided in the data). The year and month are stored as strings, so they will be converted to integers first.

Once the dates are ready, we will build a *date_counts* dictionary with a for loop similar to the one that created *year_counts*.

***

In [9]:
import datetime

# This list comprehension creates a new datetime object for each row in the data. It converts the value at index 1
# to an integer and passes it as the year, converts the value at index 2 to an integer and passes it as the month,
# and uses 3 for the day just as a placeholder

dates = [datetime.datetime(year=int(death[1]), month=int(death[2]), day=3) for death in data]

print(dates[:5])

date_counts = {}

# The datetime .strftime method converts a datetime object to a string using a format string we provide, in this case
# "%b %d, %Y", which produces labels like "Jan 03, 2012" (abbreviated month, day, year). Otherwise this follows the same
# logic as the year_counts loop above. In fact I should probably just make a function for these.

for date in dates:
    # Format each date once and reuse the result as the dictionary key
    key = date.strftime("%b %d, %Y")
    if key in date_counts:
        date_counts[key] = date_counts[key] + 1
    else:
        date_counts[key] = 1

print(date_counts)

[datetime.datetime(2012, 1, 3, 0, 0), datetime.datetime(2012, 1, 3, 0, 0), datetime.datetime(2012, 1, 3, 0, 0), datetime.datetime(2012, 2, 3, 0, 0), datetime.datetime(2012, 2, 3, 0, 0)]
{'Jan 03, 2012': 2758, 'Feb 03, 2012': 2357, 'Mar 03, 2012': 2743, 'Apr 03, 2012': 2795, 'May 03, 2012': 2999, 'Jun 03, 2012': 2826, 'Jul 03, 2012': 3026, 'Aug 03, 2012': 2954, 'Sep 03, 2012': 2852, 'Oct 03, 2012': 2733, 'Nov 03, 2012': 2729, 'Dec 03, 2012': 2791, 'Jan 03, 2013': 2864, 'Feb 03, 2013': 2375, 'Mar 03, 2013': 2862, 'Apr 03, 2013': 2798, 'May 03, 2013': 2806, 'Jun 03, 2013': 2920, 'Jul 03, 2013': 3079, 'Aug 03, 2013': 2859, 'Sep 03, 2013': 2742, 'Oct 03, 2013': 2808, 'Nov 03, 2013': 2758, 'Dec 03, 2013': 2765, 'Jan 03, 2014': 2651, 'Feb 03, 2014': 2361, 'Mar 03, 2014': 2684, 'Apr 03, 2014': 2862, 'May 03, 2014': 2864, 'Jun 03, 2014': 2931, 'Jul 03, 2014': 2884, 'Aug 03, 2014': 2970, 'Sep 03, 2014': 2914, 'Oct 03, 2014': 2865, 'Nov 03, 2014': 2756, 'Dec 03, 2014': 2857}
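
Because each key carries its year, the same calendar month shows up three times in this dictionary. One way to compare months directly is to add the three years together. The sketch below does that on a subset of the counts printed above (January, February and July only). The names `sample_counts` and `month_totals` are introduced here for illustration, and parsing `%b` assumes English month abbreviations in the current locale:

```python
import datetime

# Subset of the date_counts output above
sample_counts = {
    'Jan 03, 2012': 2758, 'Feb 03, 2012': 2357, 'Jul 03, 2012': 3026,
    'Jan 03, 2013': 2864, 'Feb 03, 2013': 2375, 'Jul 03, 2013': 3079,
    'Jan 03, 2014': 2651, 'Feb 03, 2014': 2361, 'Jul 03, 2014': 2884,
}

month_totals = {}

for label, count in sample_counts.items():
    # strptime reverses strftime: it parses the label back into a datetime
    month = datetime.datetime.strptime(label, "%b %d, %Y").month
    month_totals[month] = month_totals.get(month, 0) + count

print(month_totals)
```

In this subset February has the lowest total and July the highest, although February is also the shortest month, so some of that gap is to be expected.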


***

Remember how I said I should make a function for those? Going to do that now.

***