In [2]:
import csv
import datetime

The dataset came from FiveThirtyEight, and can be found here. The dataset is stored in the guns.csv file. It contains information on gun deaths in the US from 2012 to 2014. Each row in the dataset represents a single fatality. The columns contain demographic and other information about the victim.

In this project, we'll explore the dataset, and try to find patterns in the demographics of the victims. Our first step is to read the data in and take a look at it.

In [3]:
with open("data/guns.csv","r") as f:
    csvreader = csv.reader(f)
    data = list(csvreader)

headers = data[0]
data = data[1:]
print(data[:5])

[['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2']]
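Indexing rows by position (`row[1]`, `row[7]`) works, but it is easy to pick the wrong column. `csv.DictReader` keys each row by the names in the header line instead. Below is a minimal sketch on a two-row in-memory sample. The header names are assumptions for illustration, so check `headers` for the actual column names in guns.csv.

```python
import csv
import io

# Hypothetical sample in the same shape as guns.csv; the header names are
# assumed for illustration, not confirmed from the file.
sample = io.StringIO(
    ",year,month,intent,police,sex,age,race,hispanic,place,education\n"
    "1,2012,01,Suicide,0,M,34,Asian/Pacific Islander,100,Home,4\n"
    "2,2012,01,Suicide,0,F,21,White,100,Street,3\n"
)

# DictReader consumes the first line as field names and yields one dict per row.
rows = list(csv.DictReader(sample))

for row in rows:
    print(row["year"], row["sex"], row["race"])
```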


The year column contains information on the year in which gun deaths occurred. We can use this column to calculate how many gun deaths happened in each year.

We can do this by building a dictionary that maps each year to the number of times it occurs in the `year` column.

In [4]:
years = []
for row in data:
    years.append(row[1])

year_counts = {}
for year in years:
    if year not in year_counts:
        year_counts[year] = 1
    else:
        year_counts[year] += 1
        
print(year_counts)

{'2012': 33563, '2013': 33636, '2014': 33599}
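The manual "add the key if missing, otherwise increment" loop is common enough that the standard library provides `collections.Counter` for it. Here is a small sketch on a toy list of year strings standing in for the `year` column.

```python
from collections import Counter

# Toy input: a handful of year strings standing in for the `year` column.
years = ["2012", "2013", "2012", "2014", "2012"]

# Counter does the "add the key if missing, otherwise increment" bookkeeping.
year_counts = Counter(years)

print(year_counts)                 # Counter({'2012': 3, '2013': 1, '2014': 1})
print(year_counts.most_common(1))  # [('2012', 3)]
```

Applied to the notebook, `Counter(row[1] for row in data)` should give the same counts as the loop above. It is still a dict subclass, so the rest of the code can use it unchanged.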


It looks like gun deaths didn't change much by year from 2012 to 2014. Let's see if gun deaths in the US change by month and year.

In [5]:
dates = []
for row in data:
    dates.append(datetime.datetime(year=int(row[1]), month=int(row[2]), day=1))

date_counts = {}
for date in dates:
    if date not in date_counts:
        date_counts[date] = 1
    else:
        date_counts[date] += 1

for date, count in date_counts.items():
    print(date, count)

2012-01-01 00:00:00 2758
2012-02-01 00:00:00 2357
2012-03-01 00:00:00 2743
2012-04-01 00:00:00 2795
2012-05-01 00:00:00 2999
2012-06-01 00:00:00 2826
2012-07-01 00:00:00 3026
2012-08-01 00:00:00 2954
2012-09-01 00:00:00 2852
2012-10-01 00:00:00 2733
2012-11-01 00:00:00 2729
2012-12-01 00:00:00 2791
2013-01-01 00:00:00 2864
2013-02-01 00:00:00 2375
2013-03-01 00:00:00 2862
2013-04-01 00:00:00 2798
2013-05-01 00:00:00 2806
2013-06-01 00:00:00 2920
2013-07-01 00:00:00 3079
2013-08-01 00:00:00 2859
2013-09-01 00:00:00 2742
2013-10-01 00:00:00 2808
2013-11-01 00:00:00 2758
2013-12-01 00:00:00 2765
2014-01-01 00:00:00 2651
2014-02-01 00:00:00 2361
2014-03-01 00:00:00 2684
2014-04-01 00:00:00 2862
2014-05-01 00:00:00 2864
2014-06-01 00:00:00 2931
2014-07-01 00:00:00 2884
2014-08-01 00:00:00 2970
2014-09-01 00:00:00 2914
2014-10-01 00:00:00 2865
2014-11-01 00:00:00 2756
2014-12-01 00:00:00 2857
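Months differ in length, so part of February's consistently low total may simply reflect that it is the shortest month. The sketch below normalizes monthly counts to deaths per day using the standard library's `calendar.monthrange`. The three counts are copied from the output above, and this per-day view is only one possible normalization.

```python
import calendar
import datetime

# Monthly counts for the first quarter of 2012, copied from the output above.
date_counts = {
    datetime.datetime(2012, 1, 1): 2758,
    datetime.datetime(2012, 2, 1): 2357,
    datetime.datetime(2012, 3, 1): 2743,
}

# monthrange(year, month) returns (weekday of day 1, number of days in month);
# 2012 is a leap year, so February has 29 days.
per_day = {}
for date, count in sorted(date_counts.items()):
    days_in_month = calendar.monthrange(date.year, date.month)[1]
    per_day[date] = round(count / days_in_month, 1)

for date, rate in per_day.items():
    print(date.strftime("%Y-%m"), rate)
```

Sorting the items also guarantees chronological output, rather than relying on the order in which rows happened to appear in the file.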


The `sex` and `race` columns contain potentially interesting information on how gun deaths in the US vary by gender and race. Both columns can be explored with the same dictionary-counting technique we used earlier.

In [6]:
sex_counts = {}
for row in data:
    if row[5] not in sex_counts:
        sex_counts[row[5]] = 1
    else:
        sex_counts[row[5]] += 1

race_counts = {}
for row in data:
    if row[7] not in race_counts:
        race_counts[row[7]] = 1
    else:
        race_counts[row[7]] += 1
        
for sex, count in sex_counts.items():
    print(sex, count)

for race, count in race_counts.items():
    print(race, count)

M 86349
F 14449
Asian/Pacific Islander 1326
White 66237
Native American/Native Alaskan 917
Black 23296
Hispanic 9022
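The two counting loops above differ only in the column index. A small helper removes that duplication. This is a minimal sketch. `count_column` is a hypothetical name, and it uses `dict.get` with a default of 0 in place of the `if`/`else` branch.

```python
def count_column(rows, index):
    """Count how often each value appears at position `index` across `rows`."""
    counts = {}
    for row in rows:
        value = row[index]
        # dict.get returns 0 for values we have not seen yet
        counts[value] = counts.get(value, 0) + 1
    return counts

# The first five records printed earlier, in the same layout as `data`.
rows = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
    ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'],
    ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'],
    ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2'],
]

sex_counts = count_column(rows, 5)   # column 5: sex
race_counts = count_column(rows, 7)  # column 7: race
print(sex_counts)   # {'M': 4, 'F': 1}
print(race_counts)  # {'Asian/Pacific Islander': 1, 'White': 4}
```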


The data does not suggest a strong relationship between the month of the year and the number of gun deaths. Monthly totals stay between roughly 2,350 and 3,080, with February the lowest month in each year. The yearly totals also barely changed from 2012 to 2014 (33,563, 33,636, and 33,599).

However, the counts differ sharply by `sex` and by `race`:

* The vast majority of gun victims are male (86,349), while female victims are far less common (14,449).
* Most gun victims are white (66,237), followed by Black victims (23,296). Together they account for 89,533 of the 100,798 recorded deaths.
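Raw counts are easier to compare as shares of the total. The following sketch converts the counts printed above into percentages. The dictionaries are copied from that output, and `shares` is a hypothetical helper name.

```python
# Totals copied from the sex and race counts printed above.
sex_counts = {"M": 86349, "F": 14449}
race_counts = {
    "Asian/Pacific Islander": 1326,
    "White": 66237,
    "Native American/Native Alaskan": 917,
    "Black": 23296,
    "Hispanic": 9022,
}

def shares(counts):
    """Convert raw counts to percentages of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {key: round(100 * value / total, 1) for key, value in counts.items()}

print(shares(sex_counts))   # {'M': 85.7, 'F': 14.3}
print(shares(race_counts))
print(race_counts["White"] + race_counts["Black"])  # 89533
```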

From here on, it might be interesting to analyze the cause (intent) of gun deaths and check whether it varies by sex or race. Another, highly politicized question is whether police involvement correlates with the race of the victim.
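One way to start on those questions is a cross-tabulation: count rows for every combination of two columns. This is a minimal sketch with a hypothetical `crosstab` helper. It relies on the column positions used elsewhere in this notebook (3 for intent, 5 for sex, 7 for race) and runs on the five sample records printed earlier.

```python
def crosstab(rows, row_index, col_index):
    """Build a nested dict: table[row_value][col_value] -> number of rows."""
    table = {}
    for row in rows:
        # setdefault creates the inner dict the first time a row value appears
        inner = table.setdefault(row[row_index], {})
        key = row[col_index]
        inner[key] = inner.get(key, 0) + 1
    return table

# The first five records printed earlier, in the same layout as `data`.
rows = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
    ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'],
    ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'],
    ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2'],
]

race_intent = crosstab(rows, 7, 3)  # race vs. intent
sex_race = crosstab(rows, 5, 7)     # sex vs. race
print(race_intent)  # {'Asian/Pacific Islander': {'Suicide': 1}, 'White': {'Suicide': 4}}
print(sex_race)     # {'M': {'Asian/Pacific Islander': 1, 'White': 3}, 'F': {'White': 1}}
```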

We counted gun deaths by race above. However, those are only the total numbers of gun deaths per race in the US. Unless we know how large each racial group is, we can't meaningfully compare those numbers. What we really want is a rate of gun deaths per 100,000 people of each race. To compute it, we need data on how many people in the US fall into each racial category. Luckily, we can import some census data to help us out.
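The rate can be written as a simple formula, where $r$ stands for a racial category (notation introduced here for illustration). As a worked check, plugging in the Black victim count from above and the Black population figure used in this notebook gives:

$$
\text{rate}_r = \frac{\text{gun deaths}_r}{\text{population}_r} \times 100{,}000
$$

$$
\text{rate}_{\text{Black}} = \frac{23{,}296}{40{,}250{,}635} \times 100{,}000 \approx 57.88
$$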

In [7]:
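with open("data/census.csv", "r") as f:
    csvreader = csv.reader(f)
    # parse with the CSV reader; list(f) would only return raw text lines
    census = list(csvreader)

# census.csv holds a header row followed by a single row of values
census_header = census[0]
census = census[1]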
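Pairing each header with its value makes the census row easy to look up by name, rather than copying population figures into code by hand. The sketch below uses a hypothetical two-line file. Its column names are made up for illustration, and the two values are the White and Black population figures used in this notebook.

```python
import csv
import io

# Hypothetical file in the same shape as census.csv (one header row, one
# data row); the column names are assumptions for illustration.
sample = io.StringIO(
    "White,Black\n"
    "197318956,40250635\n"
)

reader = csv.reader(sample)
census_header = next(reader)
census_row = next(reader)

# zip pairs each column name with the value in the same position.
population = {name: int(value) for name, value in zip(census_header, census_row)}
print(population)  # {'White': 197318956, 'Black': 40250635}
```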


In [8]:
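# Population of each racial category, keyed by the race labels used in guns.csv.
# The Asian/Pacific Islander figure adds two census counts, presumably the
# separate Asian and Native Hawaiian/Pacific Islander categories.
mapping = {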
    "Asian/Pacific Islander": 15159516 + 674625,
    "Black": 40250635,
    "Native American/Native Alaskan": 3739506,
    "Hispanic": 44618105,
    "White": 197318956
}

race_per_hundredk = {}

for race in race_counts:
    race_per_hundredk[race] = round((race_counts[race] / mapping[race]) * 100000, 3)

race_per_hundredk

{'Asian/Pacific Islander': 8.374,
 'Black': 57.877,
 'Hispanic': 20.22,
 'Native American/Native Alaskan': 24.522,
 'White': 33.568}

Next, we restrict our results to rows whose intent is `Homicide`. This gives the gun homicide rate per 100,000 people in each racial category. To do this, we redo the work of generating `race_counts`, but only count rows where the intent was `Homicide`.

In [9]:
intents = [row[3] for row in data]
races = [row[7] for row in data]

# Count gun homicides per race
homicide_race_counts = {}
for intent, race in zip(intents, races):
    if intent == "Homicide":
        homicide_race_counts[race] = homicide_race_counts.get(race, 0) + 1

# Convert to rates per 100,000 people, kept in a separate dict so the raw counts survive
homicide_race_per_hundredk = {}
for race in homicide_race_counts:
    homicide_race_per_hundredk[race] = round((homicide_race_counts[race] / mapping[race]) * 100000, 2)

homicide_race_per_hundredk

{'Asian/Pacific Islander': 3.53,
 'Black': 48.47,
 'Hispanic': 12.63,
 'Native American/Native Alaskan': 8.72,
 'White': 4.64}
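The overall and homicide-only rates follow the same recipe, so one function with an optional filter can produce both. This is a minimal sketch. `rate_per_100k` and its `keep` predicate are hypothetical names, and the toy rows and populations are made up purely to exercise the code.

```python
def rate_per_100k(rows, population, race_index=7, keep=None):
    """Deaths per 100,000 people for each race, optionally filtered.

    `keep` is an optional predicate that takes a row and returns True to count it.
    """
    counts = {}
    for row in rows:
        if keep is not None and not keep(row):
            continue
        race = row[race_index]
        counts[race] = counts.get(race, 0) + 1
    return {
        race: round(count / population[race] * 100_000, 2)
        for race, count in counts.items()
    }

# Made-up toy rows: only column 3 (intent) and column 7 (race) matter here.
toy_rows = [
    ["", "", "", "Homicide", "", "", "", "Black"],
    ["", "", "", "Homicide", "", "", "", "Black"],
    ["", "", "", "Suicide", "", "", "", "Black"],
    ["", "", "", "Homicide", "", "", "", "White"],
    ["", "", "", "Suicide", "", "", "", "White"],
]
toy_population = {"Black": 1000, "White": 2000}

all_rates = rate_per_100k(toy_rows, toy_population)
homicide_rates = rate_per_100k(
    toy_rows, toy_population, keep=lambda row: row[3] == "Homicide"
)
print(all_rates)       # {'Black': 300.0, 'White': 100.0}
print(homicide_rates)  # {'Black': 200.0, 'White': 50.0}
```

With `data` and `mapping` from above, `rate_per_100k(data, mapping, keep=lambda row: row[3] == "Homicide")` should reproduce the homicide rates shown in the previous output.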

Normalizing homicides by population makes the picture clear. Although white people account for the most gun deaths overall, Black Americans have by far the highest gun homicide rate: 48.47 per 100,000, more than ten times the rate for white Americans (4.64).
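Expressing each rate as a multiple of the rate for white victims makes those gaps explicit. The rates below are copied from the output above.

```python
# Gun homicide rates per 100,000, copied from the output above.
homicide_rates = {
    "Asian/Pacific Islander": 3.53,
    "Black": 48.47,
    "Hispanic": 12.63,
    "Native American/Native Alaskan": 8.72,
    "White": 4.64,
}

# Divide each group's rate by the rate for white victims.
relative_to_white = {
    race: round(rate / homicide_rates["White"], 1)
    for race, rate in homicide_rates.items()
}
print(relative_to_white)
# {'Asian/Pacific Islander': 0.8, 'Black': 10.4, 'Hispanic': 2.7,
#  'Native American/Native Alaskan': 1.9, 'White': 1.0}
```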