In [1]:
import pandas as pd
import plotly.graph_objs as go
import plotly.express as px

# **Suicide Rate vs GDP per Capita**
## **Selected Asian and North American Countries**
### In this study, the data was collected from [World Development Explorer](http://www.worlddev.xyz/) and centers on GDP (Gross Domestic Product) per capita, total population, and suicide mortality rate. GDP is normally calculated in a country's domestic currency, but cross-country comparisons need a common unit. The GDP figures used here (World Bank indicator NY.GDP.PCAP.PP.CD) are GDP per capita converted at purchasing power parity into current international dollars. A high GDP per capita is commonly read as a sign of a high standard of living. The suicide mortality rate (SH.STA.SUIC.P5) is the number of suicide deaths in a year per 100,000 population. Total population (SP.POP.TOTL) is considered alongside both measures.
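The rate definition above is simple arithmetic. A minimal sketch, assuming deaths and population are plain counts (the helper name `rate_per_100k` is hypothetical and not part of the original notebook):

```python
def rate_per_100k(deaths: float, population: float) -> float:
    """Deaths per 100,000 people, the unit used by SH.STA.SUIC.P5."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so round inputs give an exact result
    return deaths * 100_000 / population


# Hypothetical example: 1,300 deaths in a population of 10 million
print(rate_per_100k(1_300, 10_000_000))  # 13.0
```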


In [2]:
# Read the CSV file from GitHub and show the first 5 rows
URL = "https://raw.githubusercontent.com/tmarissa/DATA-690-WANG/main/world_development_explorer/wdi_data.csv"
df = pd.read_csv(URL)
df.head(5)

| | Unnamed: 0 | Year | NY.GDP.PCAP.PP.CD | SP.POP.TOTL | SH.STA.SUIC.P5 | Country Code | Country Name | Region | Income Group | Lending Type |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2000 | 29265.251318 | 30685730 | 13.0 | CAN | Canada | North America | High income | Not classified |
| 1 | 1 | 2005 | 36211.084598 | 32243753 | 13.2 | CAN | Canada | North America | High income | Not classified |
| 2 | 2 | 2010 | 40019.92625 | 34004889 | 13.0 | CAN | Canada | North America | High income | Not classified |
| 3 | 3 | 2015 | 44670.080539 | 35702908 | 12.5 | CAN | Canada | North America | High income | Not classified |
| 4 | 4 | 2016 | 46472.340249 | 36109487 | 12.5 | CAN | Canada | North America | High income | Not classified |


### The dataset originally has 10 columns and 35 rows. The first column, `Unnamed: 0`, only repeats the row index, so it is dropped, and the first 5 rows of the resulting DataFrame are printed again.

In [3]:
# Show column names, non-null counts, and dtypes
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35 entries, 0 to 34
Data columns (total 10 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Unnamed: 0         35 non-null     int64  
 1   Year               35 non-null     int64  
 2   NY.GDP.PCAP.PP.CD  35 non-null     float64
 3   SP.POP.TOTL        35 non-null     int64  
 4   SH.STA.SUIC.P5     35 non-null     float64
 5   Country Code       35 non-null     object 
 6   Country Name       35 non-null     object 
 7   Region             35 non-null     object 
 8   Income Group       35 non-null     object 
 9   Lending Type       35 non-null     object 
dtypes: float64(2), int64(3), object(5)
memory usage: 2.9+ KB


In [4]:
# There is an unnamed column that needs to be dropped
df.drop(columns=["Unnamed: 0"], inplace=True)
df.head(5)

| | Year | NY.GDP.PCAP.PP.CD | SP.POP.TOTL | SH.STA.SUIC.P5 | Country Code | Country Name | Region | Income Group | Lending Type |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2000 | 29265.251318 | 30685730 | 13.0 | CAN | Canada | North America | High income | Not classified |
| 1 | 2005 | 36211.084598 | 32243753 | 13.2 | CAN | Canada | North America | High income | Not classified |
| 2 | 2010 | 40019.92625 | 34004889 | 13.0 | CAN | Canada | North America | High income | Not classified |
| 3 | 2015 | 44670.080539 | 35702908 | 12.5 | CAN | Canada | North America | High income | Not classified |
| 4 | 2016 | 46472.340249 | 36109487 | 12.5 | CAN | Canada | North America | High income | Not classified |
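An alternative to dropping the column after loading is to tell `read_csv` that the first column is the index. The sketch below assumes the raw CSV header starts with an empty field (which is what makes pandas invent the name `Unnamed: 0`) and uses two rows copied from the output above so it runs without a network connection:

```python
import io

import pandas as pd

# Two rows copied from the head() output above; the header's leading
# empty field is assumed to match the raw file
csv_text = """,Year,NY.GDP.PCAP.PP.CD,SP.POP.TOTL,SH.STA.SUIC.P5,Country Code,Country Name,Region,Income Group,Lending Type
0,2000,29265.251318,30685730,13.0,CAN,Canada,North America,High income,Not classified
1,2005,36211.084598,32243753,13.2,CAN,Canada,North America,High income,Not classified
"""

# index_col=0 turns the unnamed first column into the row index instead of
# creating an "Unnamed: 0" data column that has to be dropped later
df_sample = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_sample.columns.tolist())
```

On the notebook's data this would read `pd.read_csv(URL, index_col=0)`, making the later `drop` unnecessary.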


## **Description of the Data**

- The object (string) columns contain seven country names, each matched by one of seven country codes. The countries belong to two regions, three income groups, and two lending types. Each country appears five times, giving 35 rows in total.

- The numeric columns (year, GDP per capita, total population, and suicide mortality rate) are summarized with count, mean, standard deviation, minimum, 25th, 50th (median), and 75th percentiles, and maximum.
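The `freq` of 5 per country and the `Year` quartiles suggest a balanced panel: seven countries, each observed in 2000, 2005, 2010, 2015, and 2016. A cross-tabulation is a quick way to confirm that. The sketch below rebuilds only the country/year skeleton, which is inferred from the summaries rather than read from the file, so it runs on its own:

```python
import pandas as pd

# Country/year skeleton inferred from the summaries: 7 countries x 5 years = 35 rows
countries = ["Canada", "China", "Indonesia", "Japan",
             "Philippines", "Singapore", "United States"]
years = [2000, 2005, 2010, 2015, 2016]
panel = pd.DataFrame([(c, y) for c in countries for y in years],
                     columns=["Country Name", "Year"])

# One row per country, one column per year; a balanced panel has 1 in every cell
coverage = pd.crosstab(panel["Country Name"], panel["Year"])
print(coverage.shape)                     # (7, 5)
print(bool((coverage == 1).all().all()))  # True
```

On the real data the check is simply `pd.crosstab(df["Country Name"], df["Year"])`.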

In [5]:
df.describe(include='O')

| | Country Code | Country Name | Region | Income Group | Lending Type |
|---|---|---|---|---|---|
| count | 35 | 35 | 35 | 35 | 35 |
| unique | 7 | 7 | 2 | 3 | 2 |
| top | JPN | United States | East Asia & Pacific | High income | Not classified |
| freq | 5 | 5 | 25 | 20 | 20 |


In [6]:
df.describe()

| | Year | NY.GDP.PCAP.PP.CD | SP.POP.TOTL | SH.STA.SUIC.P5 |
|---|---|---|---|---|
| count | 35.0 | 35.0 | 35.0 | 35.0 |
| mean | 2009.2 | 30878.481814 | 305092900.0 | 10.962857 |
| std | 6.134761 | 24567.019378 | 437113800.0 | 6.32135 |
| min | 2000.0 | 2920.560791 | 4027887.0 | 2.5 |
| 25% | 2005.0 | 8104.751132 | 35906200.0 | 3.85 |
| 50% | 2010.0 | 31663.453099 | 127141000.0 | 11.6 |
| 75% | 2015.0 | 44392.41416 | 288839500.0 | 13.2 |
| max | 2016.0 | 89386.0794 | 1378665000.0 | 24.9 |


## **A Comparative Study from 2000 to 2016**
### To see how the measures change over time, the discussion compares the two endpoints of the data, 2000 and 2016 (the dataset also includes 2005, 2010, and 2015). The countries in this study are grouped into two regions, North America and East Asia & Pacific. The North American countries are the United States and Canada; the East Asia & Pacific countries are the Philippines, China, Singapore, Indonesia, and Japan.

### The countries were chosen for the income groups of their citizens, as a representation of their wealth. The United States and Canada are the high-income countries in North America, and their high-income counterparts in East Asia & Pacific are Singapore and Japan. Indonesia and China, also in East Asia & Pacific, are upper middle income, while the Philippines is lower middle income. Several East Asia & Pacific countries are compared with only two North American countries because North America's income groups are more stable (both countries are high income), whereas East Asia & Pacific mixes established high-income economies with emerging ones.
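The grouping described above can be listed with a `groupby`. This sketch builds the country-to-group mapping from the text itself; the exact labels "Upper middle income" and "Lower middle income" are assumed to follow the World Bank's naming (only "High income" is visible in the output above):

```python
import pandas as pd

# Country -> region and income group, as described in the text above
groups = pd.DataFrame({
    "Country Name": ["United States", "Canada", "Singapore", "Japan",
                     "China", "Indonesia", "Philippines"],
    "Region": ["North America"] * 2 + ["East Asia & Pacific"] * 5,
    "Income Group": ["High income"] * 4
                    + ["Upper middle income"] * 2
                    + ["Lower middle income"],
})

# Sorted list of countries in each (region, income group) cell
summary = groups.groupby(["Region", "Income Group"])["Country Name"].apply(sorted)
print(summary)
```

On the notebook's data, the same `groupby` line can be run on `df`; it then lists each country once per year unless duplicates are dropped first (for example with `.unique()` in place of `sorted`).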

## **Asia's Optimistic Descent** 
### In 2000, China had the lowest GDP per capita yet a suicide mortality rate of 13.2, close to the rates of high-GDP countries such as Singapore, the United States, and Canada. Indonesia and the Philippines, on the other hand, had low GDP per capita and the lowest suicide mortality rates, both below 4. Japan, with a high GDP per capita, had the highest suicide mortality rate at 24.7.

### By 2016, China's GDP per capita had leapt from slightly below \$3,000 to slightly above \$13,000, while its suicide mortality rate fell to 9.7. Japan's GDP per capita rose from about \$26.83 thousand in 2000 to about \$39.96 thousand in 2016, and its suicide mortality rate fell to 18.5. The United States grew its GDP per capita by almost 50%, yet its suicide mortality rate rose from 11.3 in 2000 to 15.3 in 2016. Canada, the other North American country, grew its GDP per capita by roughly 59% (from \$29,265 to \$46,472), but its suicide mortality rate fell by a meager 0.5, from 13.0 to 12.5. Singapore roughly doubled its GDP per capita from 2000 to 2016 and saw its suicide mortality rate fall from 12.4 to 9.9.
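The percentage changes quoted above follow from (value in 2016 minus value in 2000) divided by the value in 2000. A minimal sketch using Canada's five rows from the `head()` output; the helper `pct_change_between` is hypothetical:

```python
import pandas as pd

# Canada's rows, copied from the head() output earlier in the notebook
canada = pd.DataFrame({
    "Year": [2000, 2005, 2010, 2015, 2016],
    "NY.GDP.PCAP.PP.CD": [29265.251318, 36211.084598, 40019.92625,
                          44670.080539, 46472.340249],
    "SH.STA.SUIC.P5": [13.0, 13.2, 13.0, 12.5, 12.5],
})


def pct_change_between(frame, column, start, end):
    """Percent change in `column` from year `start` to year `end`."""
    series = frame.set_index("Year")[column]
    return (series.loc[end] - series.loc[start]) / series.loc[start] * 100


print(round(pct_change_between(canada, "NY.GDP.PCAP.PP.CD", 2000, 2016), 1))  # 58.8
print(round(pct_change_between(canada, "SH.STA.SUIC.P5", 2000, 2016), 1))     # -3.8
```

For all countries at once, pivoting first (`df.pivot(index="Year", columns="Country Name", values="NY.GDP.PCAP.PP.CD")`) and applying the same formula to the 2016 and 2000 rows gives one percentage per country.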






In [7]:
fig = px.scatter(df, x='NY.GDP.PCAP.PP.CD', y="SH.STA.SUIC.P5", animation_frame="Year",
           size="SP.POP.TOTL", color="Country Name", hover_name="Country Name",
           log_x=True, size_max=55, range_x=[1000,100000],
           range_y=[0, 30])

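# Older Plotly Express versions name traces "Country Name=Canada";
# newer ones use just "Canada". Keeping the text after the last "="
# works for both.
for trace in fig.data:
    trace.name = trace.name.split('=')[-1]

fig.update_layout(
    title="Suicide Mortality Rate vs GDP per Capita (2000-2016)",
    xaxis_title="GDP per Capita, PPP (current international $, log scale)",
    yaxis_title="Suicide Mortality Rate (per 100,000 population)")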
  
fig.show()

### The bar chart below shows GDP per capita for each year. In 2000, the countries ranked from highest to lowest GDP per capita were Singapore, the United States, Canada, Japan, Indonesia, the Philippines, and China. China had the lowest GDP per capita in this study, but over the sixteen-year span it moved above Indonesia, leaving the Philippines with the lowest GDP per capita in 2016, followed by Indonesia.

### Comparing 2000 with 2016, total population keeps growing in both East Asia & Pacific and North America, and GDP per capita rises in both regions. The suicide mortality rate, however, falls in several East Asia & Pacific countries, rises steadily in the United States, and stays roughly flat in Canada.

In [18]:
fig = px.bar(df, x='Country Name', y='NY.GDP.PCAP.PP.CD',
             hover_data=['SH.STA.SUIC.P5', 'NY.GDP.PCAP.PP.CD'], color='Region',
             animation_frame="Year", range_y=[0, 95000], height=500)  

for trace in fig.data:
    trace.name = trace.name.split('=')[-1]  # works with or without a "Region=" prefix

fig.update_layout(
    title="GDP per Capita by Country (2000-2016)",
    xaxis_title="Countries",
    yaxis_title="GDP per Capita, PPP (current international $)")
  

fig.show()

## **North America's Alarming Direction**

### As total population grows in both North America and East Asia & Pacific, so does GDP per capita, and the two regions grow at a remarkably similar pace on both measures. The similarity ends there: over the years, the suicide mortality rates of the two regions head in opposite directions.

### The graphs below separate the countries into their two regions, averaging the countries within each region (a simple, unweighted mean) for every year.
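Because `groupby(...).mean()` gives each country the same weight regardless of population, the regional suicide line is an average of country rates, not the rate of the region as a whole. A population-weighted mean, sum(rate × population) / sum(population), is the alternative. A minimal sketch with toy numbers, which are illustrative only and not taken from the dataset:

```python
import pandas as pd

# Toy numbers for illustration only; these are NOT values from the dataset
toy = pd.DataFrame({
    "Region": ["A", "A", "B"],
    "SP.POP.TOTL": [1_000_000, 3_000_000, 2_000_000],
    "SH.STA.SUIC.P5": [10.0, 20.0, 12.0],
})

# Unweighted mean: every country counts once, as in df.groupby(...).mean()
unweighted = toy.groupby("Region")["SH.STA.SUIC.P5"].mean()

# Population-weighted mean: sum(rate * population) / sum(population)
rate_times_pop = toy["SH.STA.SUIC.P5"] * toy["SP.POP.TOTL"]
weighted = (rate_times_pop.groupby(toy["Region"]).sum()
            / toy.groupby("Region")["SP.POP.TOTL"].sum())

print(unweighted["A"], weighted["A"])  # 15.0 17.5
```

Applied to East Asia & Pacific, a weighted average would lean heavily toward China's rate, since China accounts for most of that region's population in this dataset.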

In [9]:
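# Average the numeric columns over the countries in each region, per year
# (numeric_only=True skips the text columns, which pandas 2.x would reject)
df_region = df.groupby(["Region", "Year"]).mean(numeric_only=True).reset_index()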
df_region.sample(2)

| | Region | Year | NY.GDP.PCAP.PP.CD | SP.POP.TOTL | SH.STA.SUIC.P5 |
|---|---|---|---|---|---|
| 5 | North America | 2000 | 32800.080047 | 156424070.5 | 12.15 |
| 1 | East Asia & Pacific | 2005 | 21211.920863 | 349674896.4 | 10.92 |


In [10]:
# pivot() takes keyword-only arguments in pandas 2.x
df_GDP_region = df_region.pivot(index="Year", columns="Region", values="NY.GDP.PCAP.PP.CD")
fig = go.Figure()
fig.add_trace(go.Scatter(x=df_GDP_region.index, y=df_GDP_region["North America"], mode="lines", name="North America"))
fig.add_trace(go.Scatter(x=df_GDP_region.index, y=df_GDP_region["East Asia & Pacific"], mode="lines", name="East Asia & Pacific"))

fig.update_layout(
    title="Average GDP per Capita by Region (2000-2016)",
    xaxis_title="Years",
    yaxis_title="GDP per Capita, PPP (current international $)")


fig.show()

In [11]:
df_population_region = df_region.pivot(index="Year", columns="Region", values="SP.POP.TOTL")

fig = go.Figure()
fig.add_trace(go.Scatter(x=df_population_region.index, y=df_population_region["North America"], mode="lines", name="North America"))
fig.add_trace(go.Scatter(x=df_population_region.index, y=df_population_region["East Asia & Pacific"], mode="lines", name="East Asia & Pacific"))

# df_region holds means, so this is the average population per country
fig.update_layout(
    title="Average Population per Country by Region (2000-2016)",
    xaxis_title="Years",
    yaxis_title="Average Population per Country")


fig.show()

In [12]:
df_suicide_region = df_region.pivot(index="Year", columns="Region", values="SH.STA.SUIC.P5")

fig = go.Figure()
fig.add_trace(go.Scatter(x=df_suicide_region.index, y=df_suicide_region["North America"], mode="lines", name="North America"))
fig.add_trace(go.Scatter(x=df_suicide_region.index, y=df_suicide_region["East Asia & Pacific"], mode="lines", name="East Asia & Pacific"))

fig.update_layout(
    title="Average Suicide Mortality Rate by Region (2000-2016)",
    xaxis_title="Years",
    yaxis_title="Suicide Mortality Rate (per 100,000 population)")


fig.show()

### The United States' and Canada's suicide mortality rates hover above 10 and do not seem able to drop lower. Japan is different: its suicide mortality rate started at a high of 24.7 in 2000 but saw large drops, reaching 18.5 in 2016. Japan's drop in suicide mortality rate in 2010 may also account for part of the region's descent.

### The line chart below shows that the United States' suicide mortality rate is high and keeps increasing, while the other countries either see dips or, like Indonesia and the Philippines, stay below 4 throughout the 16-year span.
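Claims such as "keeps increasing" or "large drops" can be checked by differencing each country's series between consecutive observations (the gaps are five years apart, except 2015 to 2016). A minimal sketch using Canada's suicide mortality rates from the `head()` output; the column name `change` is hypothetical:

```python
import pandas as pd

# Canada's suicide mortality rate, copied from the head() output earlier
rates = pd.DataFrame({
    "Country Name": ["Canada"] * 5,
    "Year": [2000, 2005, 2010, 2015, 2016],
    "SH.STA.SUIC.P5": [13.0, 13.2, 13.0, 12.5, 12.5],
})

# Change from the previous observation, computed within each country
rates = rates.sort_values(["Country Name", "Year"])
rates["change"] = rates.groupby("Country Name")["SH.STA.SUIC.P5"].diff().round(2)

# A series declines continuously only if every change after the first is negative
always_falling = rates.groupby("Country Name")["change"].apply(
    lambda s: bool((s.dropna() < 0).all()))
print(rates["change"].tolist())  # [nan, 0.2, -0.2, -0.5, 0.0]
print(always_falling["Canada"])  # False
```

Run on `df`, the same two `groupby` lines give one verdict per country.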

In [13]:
# Keep the seven study countries (every country in the dataset, so df2 has the same rows as df)
df2 = df[df["Country Name"].isin(["Canada", "China", "Indonesia", "Japan", "Philippines", "Singapore", "United States"])]

In [21]:
fig = px.line(df2, x='Year', y='SH.STA.SUIC.P5',
             hover_data=['SH.STA.SUIC.P5', 'NY.GDP.PCAP.PP.CD'], color="Country Name")  

for trace in fig.data:
    trace.name = trace.name.split('=')[-1]  # works with or without a "Country Name=" prefix

fig.update_layout(
    title="Suicide Mortality Rate by Country (2000-2016)",
    xaxis_title="Year",
    yaxis_title="Suicide Mortality Rate (per 100,000 population)")
  

fig.show()