# Suicide Trend Analysis (1985-2016)
## -- By Economy, Gender, Age, Generation, and Country

(Tableau Project)

## Introduction
This is a Tableau Project published on [my Tableau Public](https://public.tableau.com/profile/linjing7424#!/vizhome/SuicideRateAnalysis2/1).

The goal of this project is to support suicide prevention. It analyses the age, gender, generation, and nationality of people who died by suicide, together with the economy of their countries.

## Data

The data for this project consists of a single file, originally from [Kaggle](https://www.kaggle.com/russellyates88/suicide-rates-overview-1985-to-2016):

- `suicide.csv`: 27820 data points (rows) with 12 columns

The columns include:

1. `country`
2. `year`
3. `sex`
4. `age`
5. `suicides_no`
6. `population`
7. `suicides/100k pop`
8. `country-year`
9. `HDI for year`
10. `gdp_for_year ($)`
11. `gdp_per_capita ($)`
12. `generation`

## Summary

The graphs show that suicide rates differ markedly across age, gender, and generation groups. Males, older people, and earlier generations have higher suicide rates.
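
As a rough cross-check outside Tableau, the group comparison can be sketched in pandas. This is a minimal sketch, not the method used for the dashboards: the helper `rate_per_100k` and the toy numbers are made up for illustration, and only the column names come from `suicide.csv`. It pools suicides and population within each group rather than averaging the per-row `suicides/100k pop` values.

```python
import pandas as pd


def rate_per_100k(df: pd.DataFrame, by: str) -> pd.Series:
    """Suicides per 100k population, pooled over all rows of each group."""
    totals = df.groupby(by)[["suicides_no", "population"]].sum()
    rate = totals["suicides_no"] / totals["population"] * 100_000
    return rate.sort_values(ascending=False)


# Made-up rows that reuse the column names of suicide.csv (illustration only).
toy = pd.DataFrame({
    "sex": ["male", "female", "male", "female"],
    "age": ["15-24 years", "15-24 years", "75+ years", "75+ years"],
    "suicides_no": [30, 10, 40, 20],
    "population": [100_000, 100_000, 50_000, 100_000],
})

by_sex = rate_per_100k(toy, "sex")
by_age = rate_per_100k(toy, "age")
print(by_sex)
print(by_age)
```

On the real data the same call would be `rate_per_100k(df, "generation")`, and likewise for `"sex"` and `"age"`.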

GDP also matters: as the global GDP per capita (USD) increases, the global suicide rate decreases.
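
The relation between GDP per capita and the suicide rate can be sketched per year in pandas. This sketch makes two assumptions that are not from the Tableau workbook: the "global" GDP per capita is taken as the unweighted mean over countries, and the global rate is pooled suicides divided by pooled population. The helper `yearly_summary` and the toy values are hypothetical.

```python
import pandas as pd


def yearly_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-year pooled suicide rate and mean GDP per capita across countries."""
    totals = df.groupby("year")[["suicides_no", "population"]].sum()
    rate = totals["suicides_no"] / totals["population"] * 100_000
    # GDP per capita repeats on every sex/age row of a country-year,
    # so keep one row per country-year before averaging across countries.
    gdp = (
        df.drop_duplicates(["country", "year"])
        .groupby("year")["gdp_per_capita ($)"]
        .mean()
    )
    return pd.DataFrame({"rate_per_100k": rate, "mean_gdp_per_capita": gdp})


# Made-up rows with the dataset's column names (illustration only).
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B", "A", "A", "B", "B"],
    "year": [2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001],
    "sex": ["male", "female"] * 4,
    "suicides_no": [20, 10, 30, 20, 18, 9, 27, 18],
    "population": [100_000] * 8,
    "gdp_per_capita ($)": [1000, 1000, 3000, 3000, 1200, 1200, 3200, 3200],
})

summary = yearly_summary(toy)
corr = summary["rate_per_100k"].corr(summary["mean_gdp_per_capita"])
print(summary)
print(corr)
```

A negative `corr` on the real data would be consistent with the trend in the dashboard, though it only describes how the two series move together.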

In addition, the rates vary widely among countries.



## Design


- This dataset includes geographic data, so I used a world map to show how suicide rates differ among countries and regions.

- It also includes years, which makes it easy to show how the rates change over time across age, gender, and generation groups.

- Bar charts compare the rates across age, gender, and generation groups.

- To compare the trends of the world economy and suicide rates, I used bars for the economic indicators and lines for the rates to separate them visually.

- Readers may want to explore the trends of specific countries, so I put a multi-select filter beside the graphs.


## Feedback

*@Eileen* gave me some feedback on [my first sketch](https://public.tableau.com/profile/linjing7424#!/vizhome/SuicideRateAnalysis/1):

> A very interesting story! For the different generations, you should explain the terms "Generation Z", "Millenials", ... etc. with years. You cannot assume that all readers know when to place them.

> For the last slide, it might be nice to additionally see the global statistics divided by gender.

### Changes after collecting feedback
The original source did not give specific start and end years for each generation, so I looked them up online. However, for most generations (except baby boomers) there are no agreed start and end years; only decades are available. In the end, I used decades to describe the generations for readers who might not be familiar with them.

In my initial sketch, I did include the global statistics divided by gender, but I put the graph at the beginning of the story, far from the dashboard of gender differences among countries. The two are now next to each other, so readers can easily compare them.


## Resources
N/A

In [1]:
# All the documents are stored on Google Drive.
# Authorisation is required before they can be used.
# This cell handles the authorisation.

# Mounting Google Drive locally
# https://colab.research.google.com/notebooks/io.ipynb#scrollTo=u22w3BFiOveA

from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [0]:
!pwd

/content/gdrive/My Drive/Colab Notebooks/dand_p8_tableau


In [0]:
# Google Colab (CPU runtime): change the working directory
# to the project folder on Google Drive

import os 
os.chdir('gdrive/My Drive/Colab Notebooks/dand_p8_tableau')
!pwd

/content/gdrive/My Drive/Colab Notebooks/dand_p8_tableau


In [0]:
!ls

 data					'Project Overview.gdoc'
 flight_delay_analysis_2008_2018.ipynb	 suicide_trends_analysis.ipynb


In [0]:
import pandas as pd

In [0]:
df = pd.read_csv('data/suicide.csv')

In [0]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27820 entries, 0 to 27819
Data columns (total 12 columns):
country               27820 non-null object
year                  27820 non-null int64
sex                   27820 non-null object
age                   27820 non-null object
suicides_no           27820 non-null int64
population            27820 non-null int64
suicides/100k pop     27820 non-null float64
country-year          27820 non-null object
HDI for year          8364 non-null float64
 gdp_for_year ($)     27820 non-null object
gdp_per_capita ($)    27820 non-null int64
generation            27820 non-null object
dtypes: float64(2), int64(4), object(6)
memory usage: 2.5+ MB
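
The `info()` output above hints at two clean-up steps: the `gdp_for_year ($)` column name appears to carry stray whitespace and is stored as `object`, and `HDI for year` is non-null in only 8364 of 27820 rows. Below is a minimal sketch of one way to handle both. The helper `tidy` is hypothetical, and it assumes the GDP values are digit strings that may contain thousands separators; the toy frame is made up.

```python
import pandas as pd


def tidy(df: pd.DataFrame) -> pd.DataFrame:
    """Strip whitespace from column names and make gdp_for_year numeric."""
    out = df.copy()
    out.columns = out.columns.str.strip()
    # Assumption: values look like "2,156,624,900" or "10621824000000".
    gdp = out["gdp_for_year ($)"].astype(str).str.replace(",", "", regex=False)
    out["gdp_for_year ($)"] = pd.to_numeric(gdp, errors="coerce")
    return out


# Made-up rows that mimic the problematic columns (illustration only).
toy = pd.DataFrame({
    "country": ["A", "B"],
    " gdp_for_year ($) ": ["2,156,624,900", "10621824000000"],
    "HDI for year": [None, 0.9],
})

clean = tidy(toy)
missing_share = clean["HDI for year"].isna().mean()
print(clean.dtypes)
print(missing_share)
```

On the real data, `missing_share` would be roughly 0.70 given the counts shown by `info()`, which is worth keeping in mind before using HDI in any chart.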


In [0]:
df.sample()

Unnamed: 0,country,year,sex,age,suicides_no,population,suicides/100k pop,country-year,HDI for year,gdp_for_year ($),gdp_per_capita ($),generation
27050,United States,2001,male,5-14 years,214,21032860,1.02,United States2001,,10621824000000,40018,Millenials


In [0]:
df['age'].value_counts()

35-54 years    4642
55-74 years    4642
75+ years      4642
25-34 years    4642
15-24 years    4642
5-14 years     4610
Name: age, dtype: int64

In [0]:
df['year'].value_counts()

2009    1068
2001    1056
2010    1056
2007    1032
2011    1032
2002    1032
2003    1032
2000    1032
2006    1020
2008    1020
2005    1008
2004    1008
1999     996
2012     972
2013     960
1998     948
2014     936
1995     936
1996     924
1997     924
1994     816
1992     780
1993     780
1990     768
1991     768
2015     744
1987     648
1989     624
1988     588
1986     576
1985     576
2016     160
Name: year, dtype: int64
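
The counts above show that 2016 has only 160 rows, and the `5-14 years` group has 32 fewer rows than the other age groups. One possible explanation is that some country-years lack an age group. The sketch below checks for that instead of assuming it: with 6 age groups and 2 sexes, a complete country-year should have 12 rows. The helper `incomplete_country_years` and the toy data are hypothetical.

```python
import pandas as pd


def incomplete_country_years(df: pd.DataFrame, expected: int = 12) -> pd.Series:
    """Row counts of country-years with fewer rows than expected."""
    counts = df.groupby(["country", "year"]).size()
    return counts[counts < expected]


ages = ["5-14 years", "15-24 years", "25-34 years",
        "35-54 years", "55-74 years", "75+ years"]
sexes = ["male", "female"]

# Made-up data: 2015 is complete, 2016 lacks the youngest age group.
rows = [("A", 2015, s, a) for s in sexes for a in ages]
rows += [("A", 2016, s, a) for s in sexes for a in ages[1:]]
toy = pd.DataFrame(rows, columns=["country", "year", "sex", "age"])

short = incomplete_country_years(toy)
print(short)
```

Country-years flagged this way, or whole years with few reporting countries, could then be excluded before comparing trends over time.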

In [0]:
df['sex'].value_counts()

male      13910
female    13910
Name: sex, dtype: int64