# Homework 0 : verify _fivethirtyeight_ conclusions using the same dataset

We will be focusing on the following statements made from the [dataset](https://github.com/fivethirtyeight/guns-data/blob/master/interactive_data.csv) :

1. Nearly two-thirds of gun deaths are suicides.
2. More than 85 percent of suicide victims are male.
3. Around a third of all gun deaths are homicides.
4. Around two-thirds of homicide victims who are males in the age-group of 15--34 are black.
5. Women constitute only 15 percent of the total homicide victims.




## Setting up our dataset

### First we need to find a way to display tables in Jupyter notebooks :

From [this](https://stackoverflow.com/a/42323522/7360943):

In [1]:
from IPython.display import HTML, display
import tabulate
table = [["Sun",696000,1989100000],
         ["Earth",6371,5973.6],
         ["Moon",1737,73.5],
         ["Mars",3390,641.85]]
display(HTML(tabulate.tabulate(table, tablefmt='html')))

0,1,2
Sun,696000,1989100000.0
Earth,6371,5973.6
Moon,1737,73.5
Mars,3390,641.85


### Now we will figure out how to read csv into tables :

Following [this example](https://realpython.com/python-csv/), and knowing that the columns are "Intent", "Gender", "Age", "Race", "Deaths", "Population" and "Rate", we loop over the rows in order to fill a list with each row's values :

In [2]:
import csv

with open('deaths.csv', mode='r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    line_count = 0
    deaths_table = [["Intent","Gender","Age","Race","Deaths","Population","Rate"]]
    INTENT = 0
    GENDER = 1
    AGE = 2
    RACE = 3
    DEATHS = 4
    POPULATION = 5
    RATE = 6
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        deaths_table.append([row["Intent"],row["Gender"],row["Age"],row["Race"],row["Deaths"],row["Population"],row["Rate"]])
        line_count += 1
    print(f'Processed {line_count} lines.')

Column names are , Intent, Gender, Age, Race, Deaths, Population, Rate
Processed 541 lines.
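
Since every later cell calls `int()` on the Deaths column, one option is to convert it once while loading. The sketch below only illustrates the idea on a small in-memory sample: the sample rows and the name `load_deaths` are hypothetical, and only the column names come from the dataset.

```python
import csv
import io

# Hypothetical sample in the same layout as deaths.csv (the values are made up).
SAMPLE_CSV = """,Intent,Gender,Age,Race,Deaths,Population,Rate
1,None selected,None selected,None selected,None selected,100,1000000,10.0
2,Suicide,Male,None selected,None selected,60,500000,12.0
"""

COLUMNS = ["Intent", "Gender", "Age", "Race", "Deaths", "Population", "Rate"]
DEATHS = COLUMNS.index("Deaths")


def load_deaths(csv_file):
    """Return a header row followed by the data rows, with Deaths as an int."""
    table = [COLUMNS]
    for row in csv.DictReader(csv_file):
        values = [row[name] for name in COLUMNS]
        values[DEATHS] = int(values[DEATHS])  # convert once, at load time
        table.append(values)
    return table


sample_table = load_deaths(io.StringIO(SAMPLE_CSV))
print(sample_table[1])
```

With the real file, the same function could presumably be called as `load_deaths(csv_file)` inside the `with open('deaths.csv', mode='r') as csv_file:` block used above.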


Now we can display the results in an HTML table (the call is commented out here because the table is too big) :

In [3]:
#display(HTML(tabulate.tabulate(deaths_table, tablefmt='html')))

## Learning more about the data

By looking at the data, and following a suggestion made in class to look at [how the data is generated](https://github.com/fivethirtyeight/guns-data/blob/master/interactive_prep.R), we found out that _None selected_ in a cell means the row is not restricted on that category: it is an aggregate that duplicates the deaths already listed in the rows with a specific value in that cell. To check this, we can sum the deaths of the rows with _None selected_ in all categories, and compare with the sum of the rows that have a specific value in every category :

In [16]:
deaths_none_selected = 0
deaths_except_none_selected = 0
for death_row in deaths_table[1:]:
    if death_row[INTENT:RACE+1] == ["None selected"]*4:
        deaths_none_selected+=int(death_row[DEATHS])
    elif all(cell != "None selected" for cell in death_row[INTENT:RACE+1]):
        deaths_except_none_selected+=int(death_row[DEATHS])
print(f"There are {deaths_none_selected} deaths with none selected as cause, and {deaths_except_none_selected} with causes selected")


There are 33599 deaths with none selected as cause, and 33595 with causes selected


We can see the two totals are **almost the same**: they differ by only 4 deaths out of about 33 600. We could run the same check for other combinations and should find a similar agreement.
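
As an illustration of checking other combinations, here is a minimal sketch that compares, for each intent, the aggregate row with the sum of the fully specified rows. It runs on a hypothetical miniature table with made-up values, and `compare_intent_totals` is a name introduced only for this example.

```python
NONE = "None selected"

# Hypothetical miniature table in the same column order as deaths_table
# (the values are made up; Population and Rate are left out).
sample_table = [
    ["Intent", "Gender", "Age", "Race", "Deaths"],
    ["Suicide", NONE, NONE, NONE, "30"],
    ["Suicide", "Male", "15 - 34", "White", "20"],
    ["Suicide", "Female", "15 - 34", "White", "10"],
    ["Homicide", NONE, NONE, NONE, "12"],
    ["Homicide", "Male", "15 - 34", "Black", "7"],
    ["Homicide", "Female", "15 - 34", "Black", "4"],
]


def compare_intent_totals(table):
    """Map each intent to (aggregate-row deaths, sum of fully specified rows)."""
    totals = {}
    for row in table[1:]:
        intent, gender, age, race = row[:4]
        deaths = int(row[4])
        if intent == NONE:
            continue  # rows without a specific intent are not part of this check
        aggregate, detailed = totals.get(intent, (0, 0))
        if [gender, age, race] == [NONE] * 3:
            aggregate += deaths
        elif NONE not in (gender, age, race):
            detailed += deaths
        totals[intent] = (aggregate, detailed)
    return totals


for intent, (aggregate, detailed) in compare_intent_totals(sample_table).items():
    print(f"{intent}: aggregate row {aggregate}, detailed rows {detailed}, difference {aggregate - detailed}")
```

Because the function only reads the first five columns, it should also accept the full `deaths_table` built earlier, where a small non-zero difference like the one seen above would not be surprising.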

## 1. Nearly two-thirds of gun deaths are suicides.

We will now add up all deaths, and separately those that are suicides, to calculate this ratio :

In [18]:
total_suicides = 0
total_deaths = 0
for death_row in deaths_table[1:]:
    if death_row[INTENT] == "Suicide" and all(cell != "None selected" for cell in death_row[GENDER:RACE+1]):  # NB: this sums every fully specified suicide row (gender, age and race all given)
        total_suicides+=int(death_row[DEATHS])
    elif death_row[INTENT:RACE+1] == ["None selected"]*4:
        total_deaths+=int(death_row[DEATHS])
        
print("There were %d suicides for %d deaths in total" %(total_suicides ,total_deaths)) #old print syntax which I like less
print(f"This makes the percent ratio be {round(100*total_suicides/total_deaths, 1)}%")
print(f"Two thirds as they suggested makes {round(100*2/3, 1)} %")

There were 21058 suicides for 33599 deaths in total
This makes the percent ratio be 62.7%
Two thirds as they suggested makes 66.7 %


At 62.7 %, the measured share is about four percentage points below two thirds, so it is fair to say that nearly two thirds of all gun deaths are suicides :

>1. Nearly two-thirds of gun deaths are suicides.  ✅
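
Words such as "nearly" and "around" leave room for interpretation, so it can help to make the tolerance explicit. The sketch below is one possible way to do that: the helper name `is_around` and the tolerance of 5 percentage points are assumptions chosen for illustration, not something taken from the original article.

```python
def is_around(measured_percent, claimed_percent, tolerance=5.0):
    """True when the measured value is within `tolerance` percentage points of the claim."""
    return abs(measured_percent - claimed_percent) <= tolerance


# The counts are the ones printed by the cell above.
measured = round(100 * 21058 / 33599, 1)
claimed = round(100 * 2 / 3, 1)
print(measured, claimed, is_around(measured, claimed))
```

With a stricter tolerance, for example 1 percentage point, the same statement would no longer pass, which shows how much the verdict depends on that choice.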

## 2. More than 85 percent of suicide victims are male.

For this, we need the count of all suicide victims (already computed above) and the count of those who are male, in order to calculate the ratio :

In [20]:
male_suicides = 0
for death_row in deaths_table[1:]:
    if death_row[INTENT:RACE+1] == ["Suicide", "Male", "None selected", "None selected"]:
        male_suicides+=int(death_row[DEATHS])
print(f"We have {male_suicides} male suicides, for {total_suicides} suicides overall, making {round(100*male_suicides/total_suicides,1)} %")

We have 18162 male suicides, for 21058 suicides overall, making 86.2 %


At 86.2 %, this is indeed over 85 % :

> 2. More than 85 percent of suicide victims are male. ✅

## 3. Around a third of all gun deaths are homicides.

We can reuse the total number of gun deaths counted above, and now count only the homicides. This is very simple: it works like counting the male suicide victims before, with the appropriate selection :

In [22]:
homicide_deaths = 0
for death_row in deaths_table[1:]:
    if death_row[INTENT:RACE+1] == ["Homicide"] + 3*["None selected"]:
        homicide_deaths+=int(death_row[DEATHS])
print(f"We have {homicide_deaths} homicide deaths, for {total_deaths} deaths overall, making {round(100*homicide_deaths/total_deaths,1)} %")

We have 11726 homicide deaths, for 33599 deaths overall, making 34.9 %


At 34.9 %, this is close to one third (33.3 %), so the statement is correct :

> 3. Around a third of all gun deaths are homicides. ✅

## 4. Around two-thirds of homicide victims who are males in the age-group of 15--34 are black.

We must count all male homicide victims in the age-group 15--34, then only those who are black :


In [31]:
male_homicide_victimes_15_34 = 0
black_male_homicide_victimes_15_34 = 0
for death_row in deaths_table[1:]:
    if death_row[INTENT:RACE+1] == ["Homicide", "Male", "15 - 34"] + ["None selected"]:
        male_homicide_victimes_15_34+=int(death_row[DEATHS])
    elif death_row[INTENT:RACE+1] == ["Homicide", "Male", "15 - 34", "Black"]:
        black_male_homicide_victimes_15_34+=int(death_row[DEATHS])
print(f"We have {black_male_homicide_victimes_15_34} black male homicide victims in age-group 15 - 34, for {male_homicide_victimes_15_34} male homicide victims in age-group 15 - 34 overall, making {round(100*black_male_homicide_victimes_15_34/male_homicide_victimes_15_34,1)} %")

We have 4312 black male homicide victims in age-group 15 - 34, for 6520 male homicide victims in age-group 15 - 34 overall, making 66.1 %


At 66.1 %, this is very close to the 66.7 % that two thirds implies :

> 4. Around two-thirds of homicide victims who are males in the age-group of 15--34 are black. ✅

## 5. Women constitute only 15 percent of the total homicide victims.

Now we need to count only the women homicide victims, reusing the previous total of homicide deaths.

We can actually write a function to get values out of our list more easily, which would have simplified all of the previous cases as well.

In [39]:
def get_deaths_value(selector_list: list, deaths_table: list):
    deaths_value = 0
    for death_row in deaths_table[1:]:
        if death_row[INTENT:RACE+1] == selector_list:
            deaths_value+=int(death_row[DEATHS])
    return deaths_value

We then use this function to get the number of women homicide victims :

In [44]:
women_homicides = get_deaths_value(["Homicide", "Female", "None selected", "None selected"], deaths_table)
print(f"We have {women_homicides} women homicides, as opposed to {homicide_deaths} homicides overall, which is a {round(100*women_homicides/homicide_deaths,1)} % ratio")

We have 1791 women homicides, as opposed to 11726 homicides overall, which is a 15.3 % ratio


At 15.3 %, this is indeed as stated :

> 5. Women constitute only 15 percent of the total homicide victims. ✅

## Summing up and thoughts !

We have found their statements to be correct, through our tedious work. Of course, it would be far more efficient to run through the whole list only once, instead of running a separate for loop for each of the 5 statements as we do here. Also, using the function defined above for all verifications would improve readability and save page space.
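
As a rough sketch of the single-pass idea, the rows can be indexed once by their (intent, gender, age, race) selector, after which each statement becomes a pair of dictionary lookups. The death counts below are the ones printed earlier in this notebook; the names `deaths_by_selector` and `percent` are introduced only for this example.

```python
NONE = "None selected"

# Aggregate rows with the death counts printed earlier in this notebook
# (Population and Rate columns are left out for brevity).
rows = [
    [NONE, NONE, NONE, NONE, "33599"],
    ["Homicide", NONE, NONE, NONE, "11726"],
    ["Homicide", "Female", NONE, NONE, "1791"],
    ["Homicide", "Male", "15 - 34", NONE, "6520"],
    ["Homicide", "Male", "15 - 34", "Black", "4312"],
]

# Single pass over the rows: accumulate deaths per selector, so that
# repeated selectors would be summed just like in the loops above.
deaths_by_selector = {}
for row in rows:
    key = tuple(row[:4])
    deaths_by_selector[key] = deaths_by_selector.get(key, 0) + int(row[4])


def percent(part, whole):
    """Share of `part` deaths among `whole` deaths, in percent, rounded to 0.1."""
    return round(100 * deaths_by_selector[part] / deaths_by_selector[whole], 1)


homicide_share = percent(("Homicide", NONE, NONE, NONE), (NONE, NONE, NONE, NONE))
black_share = percent(("Homicide", "Male", "15 - 34", "Black"), ("Homicide", "Male", "15 - 34", NONE))
women_share = percent(("Homicide", "Female", NONE, NONE), ("Homicide", NONE, NONE, NONE))
print(homicide_share, black_share, women_share)
```

With the full `deaths_table`, the same dictionary could be built from `deaths_table[1:]`, and the suicide statements would follow the same pattern.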

In the end, this was fun, and the analysis itself did not need any more tools than the Python standard library (tabulate and IPython were only used for display) ! Easy :D