# Gun Deaths in the USA 2012 to 2014

### Exploring the data
***

This notebook explores gun death data helpfully provided by [fivethirtyeight](https://github.com/fivethirtyeight/guns-data).

Each row in the dataset represents a single gun-related fatality. The columns contain demographic and other information about the victim and are organized as follows:

| # | year | month | intent | police | sex | age | race | hispanic | place | education |
|---|------|-------|--------|--------|-----|-----|------|----------|-------|-----------|
| 0 | 1    | 2     | 3      | 4      | 5   | 6   | 7    | 8        | 9     | 10        |
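The code cells below index rows by position, for example `death[7]` for race. Here is a minimal sketch of a name-to-index lookup built from the header row, so those positions don't have to be memorized. The name `col` is hypothetical and not part of the original notebook. The header and row values are copied from the output printed further down.

```python
# Header row as printed later in the notebook; the first column has an empty name.
headers = ['', 'year', 'month', 'intent', 'police', 'sex', 'age',
           'race', 'hispanic', 'place', 'education']

# Hypothetical lookup: map each column name to its position in a row.
col = {name: i for i, name in enumerate(headers)}

# One data row copied from the printed output.
row = ['1', '2012', '01', 'Suicide', '0', 'M', '34',
       'Asian/Pacific Islander', '100', 'Home', 'BA+']

print(row[col['race']])       # Asian/Pacific Islander
print(row[col['education']])  # BA+
```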

***

The first row of the file is the header. The columns hold the following data:

- \# (0) : Unique identifier
- year (1) : The year in which the fatality occurred
- month (2) : The month in which the fatality occurred
- intent (3) : The intent of the perpetrator; one of the following:

    - Suicide
    - Accidental
    - NA
    - Homicide
    - Undetermined


- police (4) : Whether a police officer was involved. Either 1 (True) or 0 (False)
- sex (5) : The gender of the victim. M for Male or F for Female
- age (6) : The age of the victim
- race (7) : The race of the victim; one of the following:

    - White
    - Asian/Pacific Islander
    - Hispanic
    - Native American/Native Alaskan
    - Black
    
    
- hispanic (8) : A code that indicates the Hispanic origin of the victim
- place (9) : Where the shooting occurred
- education (10) : Education level of the victim. In this file the first four levels appear as text labels rather than numeric codes (see the output below):

    - Less than HS (1) : Less than high school
    - HS/GED (2) : Graduated from high school or equivalent
    - Some college (3) : Some college
    - BA+ (4) : At least graduated from college
    - (5) : N/A

***

First things first, let's read in and clean up the data...


We are going to use the *csv* module to read it in and make a list of lists.

In [1]:
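```python
import csv

# Path relative to the notebook; adjust it to wherever gun_data.csv lives.
# "with" closes the file automatically; newline='' is what the csv module expects.
with open('gun_data.csv', 'r', newline='') as f:
    data = list(csv.reader(f))

print(data[:10])
```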

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', 'BA+'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', 'Some college'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', 'BA+'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', 'BA+'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', 'HS/GED'], ['6', '2012', '02', 'Suicide', '0', 'M', '17', 'Native American/Native Alaskan', '100', 'Home', 'Less than HS'], ['7', '2012', '02', 'Undetermined', '0', 'M', '48', 'White', '100', 'Home', 'HS/GED'], ['8', '2012', '03', 'Suicide', '0', 'M', '41', 'Native American/Native Alaskan', '100', 'Home', 'HS/GED'], ['9', '2012', '02', 'Accidental', '0', 'M', '50', 'White', '100', 'Other specified', 'Some college']]


***

As you can see, it needs a bit of cleanup before we can analyze it. Let's start by removing the header row and saving it in its own variable, *headers*. Printing both confirms that everything went as expected.

***

In [2]:
```python
# Keep the header row separately, then drop it from the data
headers = data[0]
print(headers)

data = data[1:]
print(data[:5])
```

['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']
[['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', 'BA+'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', 'Some college'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', 'BA+'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', 'BA+'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', 'HS/GED']]
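An alternative worth knowing: `csv.DictReader` consumes the header row itself and yields one dictionary per row, so columns can be read by name. A minimal sketch, assuming an in-memory `io.StringIO` sample (two rows copied from the output above) in place of the real `gun_data.csv`:

```python
import csv
import io

# Stand-in for the real file: the header plus two rows copied from the output above.
sample = io.StringIO(
    ',year,month,intent,police,sex,age,race,hispanic,place,education\n'
    '1,2012,01,Suicide,0,M,34,Asian/Pacific Islander,100,Home,BA+\n'
    '2,2012,01,Suicide,0,F,21,White,100,Street,Some college\n'
)

# DictReader reads the first line as field names; each later row becomes a dict.
reader = csv.DictReader(sample)
rows = list(reader)

print(reader.fieldnames)  # the header row, including the unnamed first column
print(rows[1]['place'])   # Street
```

The trade-off is memory and speed: a dict per row is heavier than a list per row, which matters little at roughly 100,000 rows.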


***

Now that the data is cleaned up a bit, we can start pulling out useful information. We will start by making a dictionary of deaths per year.

***

In [3]:
```python
# A small list comprehension puts the year column (index 1) in its own list
years = [death[1] for death in data]

# Count the years: if a year is already a key, add 1 to its count;
# otherwise create a new key for the year with a count of 1
year_counts = {}

for year in years:
    if year in year_counts:
        year_counts[year] += 1
    else:
        year_counts[year] = 1

# Print it to make sure everything looks okay
print(year_counts)
```

{'2012': 33563, '2013': 33636, '2014': 33599}
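The standard library's `collections.Counter` does the same "add 1 or start at 1" bookkeeping in a single call. A minimal sketch on a short illustrative list of years (not the real dataset):

```python
from collections import Counter

# Illustrative sample; in the notebook this would be the full `years` list.
years = ['2012', '2012', '2013', '2014', '2014', '2014']

# Counter builds the same year -> count mapping as the loop above.
year_counts = Counter(years)

print(year_counts)                 # Counter({'2014': 3, '2012': 2, '2013': 1})
print(year_counts.most_common(1))  # [('2014', 3)]
```

A `Counter` is a `dict` subclass, so the rest of the notebook could use it unchanged; it also returns 0 for missing keys instead of raising `KeyError`.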


***

The good news is that the yearly totals are fairly stable across those three years. Next, let's see whether there is any significant variation between months. We'll import the *datetime* module and create a *datetime object* from the year, the month and a fixed placeholder day (the data doesn't include the day). Because the year and month are stored as strings, they are converted to integers first.

With the dates ready, a for loop like the one that built *year_counts* creates a *date_counts* dictionary.

***

In [4]:
```python
import datetime

# For each row, build a datetime from the year (index 1) and month (index 2),
# both converted to integers; day=3 is only a placeholder because the data has no day
dates = [datetime.datetime(year=int(death[1]), month=int(death[2]), day=3) for death in data]

print(dates[:5])

date_counts = {}

# strftime turns a datetime into a string using the format we give it;
# "%b %d, %Y" gives e.g. "Jan 03, 2012" (abbreviated month, day, year).
# Otherwise this follows the same logic as year_counts above.
# In fact, I should probably just make a function for these.
for date in dates:
    key = date.strftime("%b %d, %Y")
    if key in date_counts:
        date_counts[key] += 1
    else:
        date_counts[key] = 1

print(date_counts)
```

[datetime.datetime(2012, 1, 3, 0, 0), datetime.datetime(2012, 1, 3, 0, 0), datetime.datetime(2012, 1, 3, 0, 0), datetime.datetime(2012, 2, 3, 0, 0), datetime.datetime(2012, 2, 3, 0, 0)]
{'Jan 03, 2012': 2758, 'Feb 03, 2012': 2357, 'Mar 03, 2012': 2743, 'Apr 03, 2012': 2795, 'May 03, 2012': 2999, 'Jun 03, 2012': 2826, 'Jul 03, 2012': 3026, 'Aug 03, 2012': 2954, 'Sep 03, 2012': 2852, 'Oct 03, 2012': 2733, 'Nov 03, 2012': 2729, 'Dec 03, 2012': 2791, 'Jan 03, 2013': 2864, 'Feb 03, 2013': 2375, 'Mar 03, 2013': 2862, 'Apr 03, 2013': 2798, 'May 03, 2013': 2806, 'Jun 03, 2013': 2920, 'Jul 03, 2013': 3079, 'Aug 03, 2013': 2859, 'Sep 03, 2013': 2742, 'Oct 03, 2013': 2808, 'Nov 03, 2013': 2758, 'Dec 03, 2013': 2765, 'Jan 03, 2014': 2651, 'Feb 03, 2014': 2361, 'Mar 03, 2014': 2684, 'Apr 03, 2014': 2862, 'May 03, 2014': 2864, 'Jun 03, 2014': 2931, 'Jul 03, 2014': 2884, 'Aug 03, 2014': 2970, 'Sep 03, 2014': 2914, 'Oct 03, 2014': 2865, 'Nov 03, 2014': 2756, 'Dec 03, 2014': 2857}
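February has the lowest count in all three years, but it is also the shortest month. Here is a minimal sketch of two adjustments. It keys on an integer `(year, month)` tuple instead of a datetime with a placeholder day, and it divides each monthly total by the number of days in that month using `calendar.monthrange`. The sample rows and the helper name `per_day` are hypothetical. The two monthly totals are copied from the output above.

```python
import calendar
from collections import Counter

# Illustrative (year, month) string pairs, as they would come from columns 1 and 2.
rows = [('2012', '01'), ('2012', '01'), ('2012', '02'), ('2014', '02')]

# Integer tuples sort chronologically and need no placeholder day.
month_counts = Counter((int(y), int(m)) for y, m in rows)

def per_day(year, month, count):
    """Hypothetical helper: average fatalities per day in a given month."""
    days = calendar.monthrange(year, month)[1]  # number of days in that month
    return count / days

# Monthly totals taken from the date_counts output above.
print(round(per_day(2012, 1, 2758), 1))  # 89.0 (January 2012, 31 days)
print(round(per_day(2012, 2, 2357), 1))  # 81.3 (February 2012, a leap year, 29 days)
```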


***

Remember how I said I should make a function for those? Let's do that now.

***

In [7]:
```python
def counter_dict(little_list):
    """Return a dictionary mapping each item in little_list to how often it appears."""
    counts = {}
    for thing in little_list:
        if thing in counts:
            counts[thing] += 1
        else:
            counts[thing] = 1
    return counts
```

In [8]:
```python
# Use a list comprehension to make a list of the sex column (index 5),
# then make a counting dictionary with our new function
sex = [death[5] for death in data]

sex_counts = counter_dict(sex)
print(sex_counts)

# Same again for the race column (index 7)
race = [death[7] for death in data]

race_counts = counter_dict(race)
print(race_counts)
```

{'M': 86349, 'F': 14449}
{'Asian/Pacific Islander': 1326, 'White': 66237, 'Native American/Native Alaskan': 917, 'Black': 23296, 'Hispanic': 9022}
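Raw counts are easier to read as shares of the total. A small sketch of a hypothetical `shares` helper (not part of the original notebook), applied to the sex counts copied from the output above:

```python
# Counts copied from the output above.
sex_counts = {'M': 86349, 'F': 14449}

def shares(counts):
    """Hypothetical helper: convert raw counts into percentages of the total."""
    total = sum(counts.values())
    return {key: 100 * n / total for key, n in counts.items()}

for key, pct in shares(sex_counts).items():
    print(f"{key}: {pct:.1f}%")  # M: 85.7%, then F: 14.3%
```

The same helper works unchanged on `race_counts`.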


***

There is some great information here, but raw counts need more context. Next we'll bring in census data that includes the total number of people of each race in the United States. This dataset is stored in the *census.csv* file, in the following format:

| ID | Year | Id.1 | Sex | Id.2 | Hispanic Origin | Id.3 | Id2 | Geography | Total | Race Alone - White | Race Alone - Hispanic | Race Alone - Black or African American | Race Alone - American Indian and Alaska Native | Race Alone - Asian | Race Alone - Native Hawaiian and Other Pacific Islander | Two or More Races |
|----|------|------|-----|------|-----------------|------|-----|-----------|-------|--------------------|-----------------------|----------------------------------------|------------------------------------------------|--------------------|---------------------------------------------------------|-------------------|
| 0  | 1    | 2    | 3   | 4    | 5               | 6    | 7   | 8         | 9     | 10                 | 11                    | 12                                     | 13                                             | 14                 | 15                                                      | 16                |

***

The four important columns:
- Year : The year the census data is from
- Total : The total population of the United States
- Race Alone - XXX : The total population of XXX race
- Two or More Races : The total population of people of 2 or more races


***


In [13]:
```python
# Read in the csv file and use the csv module to make a list of lists
# (path relative to the notebook; adjust as needed)
with open('census.csv', 'r', newline='') as cf:
    census = list(csv.reader(cf))

print(census)
```

[['Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2', 'Geography', 'Total', 'Race Alone - White', 'Race Alone - Hispanic', 'Race Alone - Black or African American', 'Race Alone - American Indian and Alaska Native', 'Race Alone - Asian', 'Race Alone - Native Hawaiian and Other Pacific Islander', 'Two or More Races'], ['cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp', 'Total', '0100000US', '', 'United States', '308745538', '197318956', '44618105', '40250635', '3739506', '15159516', '674625', '6984195']]


In [19]:
```python
# Put this data into a dictionary: pair each header name (first row)
# with its value (second row).
# Note: 'Id' appears four times in the header, so only its last value ('0100000US') survives.
col1 = census[0]
col2 = census[1]

census_dict = dict(zip(col1, col2))

print(census_dict)
```

{'Id': '0100000US', 'Year': 'April 1, 2010 Census', 'Sex': 'Both Sexes', 'Hispanic Origin': 'Total', 'Id2': '', 'Geography': 'United States', 'Total': '308745538', 'Race Alone - White': '197318956', 'Race Alone - Hispanic': '44618105', 'Race Alone - Black or African American': '40250635', 'Race Alone - American Indian and Alaska Native': '3739506', 'Race Alone - Asian': '15159516', 'Race Alone - Native Hawaiian and Other Pacific Islander': '674625', 'Two or More Races': '6984195'}
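Because the header repeats `'Id'` four times, the dictionary above keeps only the last value (`'0100000US'`) and silently drops `'cen42010'`, `'totsex'` and `'tothisp'`. Those columns aren't needed here, but the following sketch shows one way to keep them. It renames repeats as `Id.1`, `Id.2`, `Id.3`, matching the column names in the census table above. `dedupe` is a hypothetical helper, and only the first ten columns are included for brevity.

```python
# First ten header names and values, copied from the census output above.
header = ['Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2',
          'Geography', 'Total']
values = ['cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp',
          'Total', '0100000US', '', 'United States', '308745538']

def dedupe(names):
    """Hypothetical helper: append .1, .2, ... to repeated names so keys stay unique."""
    seen = {}
    result = []
    for name in names:
        if name in seen:
            seen[name] += 1
            result.append(f"{name}.{seen[name]}")
        else:
            seen[name] = 0
            result.append(name)
    return result

census_dict = dict(zip(dedupe(header), values))
print(census_dict['Id'], census_dict['Id.1'], census_dict['Id.3'])
# cen42010 totsex 0100000US
```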


***

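The racial categories don't match between the two datasets, so we will map them by hand. The census reports Asian and Native Hawaiian/Pacific Islander separately, so those two columns are added together to match *Asian/Pacific Islander*. The *Two or More Races* column has no counterpart in the gun data and is left out.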

***

In [31]:
```python
# Map the census columns onto the race categories used in the gun data
race_pops = {
    "White": int(census_dict["Race Alone - White"]),
    "Asian/Pacific Islander": (int(census_dict["Race Alone - Asian"])
                               + int(census_dict["Race Alone - Native Hawaiian and Other Pacific Islander"])),
    "Hispanic": int(census_dict["Race Alone - Hispanic"]),
    "Native American/Native Alaskan": int(census_dict["Race Alone - American Indian and Alaska Native"]),
    "Black": int(census_dict["Race Alone - Black or African American"]),
}

print(race_pops)
```

{'White': 197318956, 'Asian/Pacific Islander': 15834141, 'Hispanic': 44618105, 'Native American/Native Alaskan': 3739506, 'Black': 40250635}
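Before dividing one dictionary by the other, a few quick sanity checks are cheap. They confirm that the census columns add up to the total, that the Asian/Pacific Islander sum is right, and that both dictionaries share exactly the same keys (a missing key would raise a `KeyError` later). A minimal sketch with values copied from the outputs above; `census_columns` and `census_total` are hypothetical names.

```python
# Census figures copied from census_dict above (hypothetical variable names).
census_total = 308745538
census_columns = {
    'White': 197318956,
    'Hispanic': 44618105,
    'Black or African American': 40250635,
    'American Indian and Alaska Native': 3739506,
    'Asian': 15159516,
    'Native Hawaiian and Other Pacific Islander': 674625,
    'Two or More Races': 6984195,
}

# Dictionaries copied from the race_counts and race_pops outputs.
race_counts = {'Asian/Pacific Islander': 1326, 'White': 66237,
               'Native American/Native Alaskan': 917, 'Black': 23296, 'Hispanic': 9022}
race_pops = {'White': 197318956, 'Asian/Pacific Islander': 15834141, 'Hispanic': 44618105,
             'Native American/Native Alaskan': 3739506, 'Black': 40250635}

# Check 1: in this file the census categories add up exactly to the total.
print(sum(census_columns.values()) == census_total)  # True

# Check 2: Asian + Native Hawaiian/Pacific Islander matches the combined key.
print(census_columns['Asian'] + census_columns['Native Hawaiian and Other Pacific Islander']
      == race_pops['Asian/Pacific Islander'])  # True

# Check 3: both dictionaries use the same keys.
print(set(race_counts) == set(race_pops))  # True

# Percentage of the population dropped along with 'Two or More Races'.
print(round(100 * census_columns['Two or More Races'] / census_total, 2))  # 2.26
```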


***

Now we have two dictionaries: *race_counts*, with the number of gun fatalities by race, and *race_pops*, with the total population by race. Both use the same keys, which makes them easy to combine.

First we will calculate the gun death rate for each race.

***

In [42]:
```python
def death_rate(race):
    """Gun deaths divided by population for one race (both values are already ints)."""
    return race_counts[race] / race_pops[race]
```

In [43]:
```python
# Test to make sure our function works
white_rate = death_rate('White')
print(white_rate)
asian_rate = death_rate('Asian/Pacific Islander')
print(asian_rate)
hispanic_rate = death_rate('Hispanic')
print(hispanic_rate)
native_rate = death_rate('Native American/Native Alaskan')
print(native_rate)
black_rate = death_rate('Black')
print(black_rate)
```

0.0003356849303419181
8.374309664161763e-05
0.00020220491210910907
0.0002452195557381109
0.000578773477735196


***

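These raw fractions are tiny and hard to compare at a glance. Expressing them as fatalities per 100,000 people gives a clearer picture. Keep in mind that *race_counts* covers all three years (2012 to 2014) while the population figures come from the 2010 census, so these are three-year totals per 100,000 people rather than annual rates.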

***

In [44]:
```python
def deaths_per_100(race):
    """Gun deaths per 100,000 people for one race (the name is shorthand for per 100k)."""
    return race_counts[race] / race_pops[race] * 100000
```

In [52]:
```python
# Test to make sure our function works
white_per_hundred = deaths_per_100('White')
print("White fatalities per 100,000 people: ", white_per_hundred)
asian_per_hundred = deaths_per_100('Asian/Pacific Islander')
print("Asian fatalities per 100,000 people: ", asian_per_hundred)
hispanic_per_hundred = deaths_per_100('Hispanic')
print("Hispanic fatalities per 100,000 people: ", hispanic_per_hundred)
native_per_hundred = deaths_per_100('Native American/Native Alaskan')
print("Native fatalities per 100,000 people: ", native_per_hundred)
black_per_hundred = deaths_per_100('Black')
print("Black fatalities per 100,000 people: ", black_per_hundred)
```

White fatalities per 100,000 people:  33.56849303419181
Asian fatalities per 100,000 people:  8.374309664161762
Hispanic fatalities per 100,000 people:  20.220491210910907
Native fatalities per 100,000 people:  24.521955573811088
Black fatalities per 100,000 people:  57.8773477735196
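The per-100,000 formula is deaths ÷ population × 100,000. Here is a sketch of a variation that takes both dictionaries as parameters and returns every rate at once. It can also divide by the number of years the death counts span (2012 to 2014 is three years), so the result reads as an approximate annual rate. `rates_per_100k` and its `years` parameter are hypothetical names, not part of the original notebook, and two groups are shown for brevity.

```python
def rates_per_100k(counts, pops, years=1):
    """Hypothetical helper: deaths per 100,000 people for every group.

    counts: deaths per group; pops: population per group;
    years: how many years the death counts span (1 keeps the raw total).
    """
    return {group: counts[group] / pops[group] * 100_000 / years for group in counts}

# Values copied from race_counts and race_pops above.
race_counts = {'White': 66237, 'Black': 23296}
race_pops = {'White': 197318956, 'Black': 40250635}

three_year = rates_per_100k(race_counts, race_pops)
annual = rates_per_100k(race_counts, race_pops, years=3)

print(round(three_year['White'], 2))  # 33.57
print(round(annual['White'], 2))      # 11.19
```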


In [55]:
```python
# We could iterate for this if we wanted... and we should!
for thing in race_counts:
    print(thing, "gun fatalities per 100,000: ", deaths_per_100(thing))
```

Asian/Pacific Islander gun fatalities per 100,000:  8.374309664161762
White gun fatalities per 100,000:  33.56849303419181
Native American/Native Alaskan gun fatalities per 100,000:  24.521955573811088
Black gun fatalities per 100,000:  57.8773477735196
Hispanic gun fatalities per 100,000:  20.220491210910907
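Dictionaries iterate in insertion order, so the loop above prints races in whatever order they first appeared in the data. Sorting by rate makes the comparison easier to read. A sketch using the printed values:

```python
# Rates per 100,000 copied from the output above.
rates = {
    'Asian/Pacific Islander': 8.374309664161762,
    'White': 33.56849303419181,
    'Native American/Native Alaskan': 24.521955573811088,
    'Black': 57.8773477735196,
    'Hispanic': 20.220491210910907,
}

# Sort (race, rate) pairs by rate, highest first.
ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)

for race, rate in ranked:
    print(f"{race:<32}{rate:6.1f}")
```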


***

Let's keep pulling out data before making inferences. Next, let's look at gun fatalities by intent, starting with the homicide rate for each race.

***

In [66]:
```python
# Lists of the intent (index 3) and race (index 7) columns, in the same row order
intents = [death[3] for death in data]
races = [death[7] for death in data]

# Count gun homicides by race (despite the name, these are raw counts, not rates);
# zip walks both lists together instead of indexing by position
homicide_by_race_hundred = {}

for intent, race in zip(intents, races):
    if intent == "Homicide":
        if race in homicide_by_race_hundred:
            homicide_by_race_hundred[race] += 1
        else:
            homicide_by_race_hundred[race] = 1

# A function to calculate the gun homicide rate per 100,000 people
def murders_per_100(race):
    return homicide_by_race_hundred[race] / race_pops[race] * 100000

# Display the homicide rate per 100k by race
for race in race_counts:
    print(race, "gun homicides per 100,000: ", murders_per_100(race))
```

Asian/Pacific Islander gun homicides per 100,000:  3.530346230970155
White gun homicides per 100,000:  4.6356417981453335
Native American/Native Alaskan gun homicides per 100,000:  8.717729026240365
Black gun homicides per 100,000:  48.471284987180944
Hispanic gun homicides per 100,000:  12.627161104219914
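`murders_per_100` would raise a `KeyError` for any race with zero homicides, because that race never becomes a key. A sketch of a more forgiving version builds the counts with `collections.Counter`, which returns 0 for missing keys. The sample rows and populations below are illustrative only, not taken from the dataset.

```python
from collections import Counter

# Illustrative intent/race pairs; in the notebook these come from columns 3 and 7.
intents = ['Homicide', 'Suicide', 'Homicide', 'Accidental']
races = ['Black', 'White', 'Hispanic', 'White']

# Count homicides per race in one pass.
homicide_counts = Counter(race for intent, race in zip(intents, races)
                          if intent == 'Homicide')

print(homicide_counts['White'])  # 0, instead of a KeyError
print(homicide_counts['Black'])  # 1

# Rates per 100,000 work for every race, including ones with no homicides.
race_pops = {'Black': 40250635, 'White': 197318956, 'Hispanic': 44618105}
for race, pop in race_pops.items():
    print(race, homicide_counts[race] / pop * 100_000)
```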


### Reserved for later analysis

#### Possible avenues

- Link between month and homicides
- Link between education and suicide
- Link between education and homicide
- Differences between genders
- Accidental deaths
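Most of these avenues come down to counting combinations of two columns, such as education and intent or month and intent. Here is a sketch of a hypothetical `crosstab` helper that does this with `collections.Counter`, looking columns up by header name. It uses three rows copied from the printed data.

```python
from collections import Counter

def crosstab(rows, headers, first, second):
    """Hypothetical helper: count every (first, second) value pair across rows."""
    i, j = headers.index(first), headers.index(second)
    return Counter((row[i], row[j]) for row in rows)

headers = ['', 'year', 'month', 'intent', 'police', 'sex', 'age',
           'race', 'hispanic', 'place', 'education']

# Three rows copied from the printed output near the top of the notebook.
rows = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', 'BA+'],
    ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', 'HS/GED'],
    ['7', '2012', '02', 'Undetermined', '0', 'M', '48', 'White', '100', 'Home', 'HS/GED'],
]

by_education = crosstab(rows, headers, 'education', 'intent')
by_month = crosstab(rows, headers, 'month', 'intent')

print(by_education[('HS/GED', 'Suicide')])  # 1
print(by_month[('02', 'Undetermined')])     # 1
```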