# Homework 3 - Immigration, the stock market, and GDP

The objective of this homework is to practice working with Pandas Dataframes. To successfully complete this homework, you may use any resources available to you. 

Answer the following question: What has a higher correlation with the GDP in the US: stock market returns or immigration?

You need to accomplish the following tasks:
1. Install the [wbdata](http://wbdata.readthedocs.io/en/latest/) package for API access to Worldbank data.
2. Explore the databases `Population estimates and projections`, `Global Financial Development`, and `World Development Indicators`.
3. Get the data on `GDP per capita growth (annual %)` as a dataframe.
4. Get the data on `Net immigration` as a dataframe (Make sure that you also have a percentage value for this). 
5. Get the data on `Stock market return (%, year-on-year)` as a dataframe.
6. Explore the data and note the issues.
7. Clean and combine the data.
8. What is the correlation between the GDP and net immigration, and between the GDP and stock market returns?

In [2]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [70]:
import wbdata
import pandas as pd

## 1) Check the database for available resources:<br>
- Sources #40, #32, and #2 are the databases we need.

In [4]:
wbdata.get_source()

11	Africa Development Indicators
36	Statistical Capacity Indicators
31	Country Policy and Institutional Assessment
41	Country Partnership Strategy for India (FY2013 - 17)
1 	Doing Business
30	Exporter Dynamics Database – Indicators at Country-Year Level
12	Education Statistics
13	Enterprise Surveys
28	Global Financial Inclusion
33	G20 Financial Inclusion Indicators
14	Gender Statistics
15	Global Economic Monitor
27	Global Economic Prospects
32	Global Financial Development
21	Global Economic Monitor Commodities
55	Commodity Prices- History and Projections
34	Global Partnership for Education
29	The Atlas of Social Protection: Indicators of Resilience and Equity
16	Health Nutrition and Population Statistics
39	Health Nutrition and Population Statistics by Wealth Quintile
40	Population estimates and projections
18	IDA Results Measurement System
45	Indonesia Database for Policy and Economic Research
6 	International Debt Statistics
54	Joint External Debt Hub
25	Jobs
37	LAC Equity Lab
19	M

## 2) Select data: <br>
- We can then list the indicators of each selected source to find the indicator codes needed to generate the dataframes.

In [120]:
# wbdata.get_indicator(source=40)
# wbdata.get_indicator(source=2)
# wbdata.get_indicator(source=32)

## 3) Generate Dataframes: <br>
 - We can now generate dataframes from the data selected; we can also set the country code to 'USA'.<br>
 - For the 'Net Immigration' dataframe, we need an additional percentage column; this is calculated below using the total population of the same year.

In [68]:
# Total population
indicators = {"SP.POP.TOTL":"Population estimates and projections"}
totalPop = wbdata.get_dataframe(indicators, country = 'USA')
#totalPop

In [119]:
# Net immigration
indicators = {"SM.POP.NETM":"Population estimates and projections"}
df1 = wbdata.get_dataframe(indicators,  country = 'USA')

# check total NaN values in dataframe
df1.isnull().sum()


Population estimates and projections    47
dtype: int64

In [121]:
joinPop = pd.merge(totalPop, df1, left_index=True, right_index=True)
joinPop = joinPop.rename(columns={'Population estimates and projections_x': 'Total Population', 'Population estimates and projections_y': 'Net Immigration'})


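- A new column called 'Percent Value' is created by dividing the 'Net Immigration' column by the 'Total Population' column. Note that the result is a fraction of the population (for example, 0.014 corresponds to 1.4%), not a value already scaled to percent.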

In [122]:
joinPop['Percent Value'] = joinPop['Net Immigration'] / joinPop['Total Population']
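
The merge-and-divide step can be tried on a small stand-in dataset. The sketch below is only an illustration: the population and migration numbers are made up (not World Bank data), the string year labels are an assumption about the index, and the `Net Immigration (%)` column is a hypothetical addition that scales the fraction to a true percentage.

```python
import pandas as pd

# Illustrative values only (not World Bank data); the index mimics the 'date' index.
total_pop = pd.DataFrame(
    {"Total Population": [200_000_000.0, 250_000_000.0]},
    index=pd.Index(["1972", "1987"], name="date"),
)
net_migration = pd.DataFrame(
    {"Net Immigration": [3_000_000.0, 2_500_000.0]},
    index=pd.Index(["1972", "1987"], name="date"),
)

# Align the two frames on their shared index.
joined = pd.merge(total_pop, net_migration, left_index=True, right_index=True)

# Fraction of the population, as computed in the notebook.
joined["Percent Value"] = joined["Net Immigration"] / joined["Total Population"]

# Hypothetical extra column: the same ratio scaled to a true percentage.
joined["Net Immigration (%)"] = 100 * joined["Percent Value"]
print(joined)
```

Because the two stand-in frames use different column names, the merge does not add the `_x` and `_y` suffixes that the notebook has to rename.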


- All NaN values are dropped

In [96]:
df0 = joinPop.dropna()
netImmigration = df0.drop('Total Population', axis=1)


In [150]:
# GDP per capita growth (annual %)
indicators = {"NY.GDP.PCAP.KD.ZG":"World Development Indicators"}
gdpGrowth = wbdata.get_dataframe(indicators, country = 'USA')
gdpGrowth = gdpGrowth.rename(columns={'World Development Indicators': 'GDP per Capita Growth (annual %)'})

# check total NaN values in dataframe
gdpGrowth.isnull().sum()

GDP per Capita Growth (annual %)    2
dtype: int64

In [149]:
# Stock Market Return (%, year on year)
indicators = {"GFDD.OM.02":"Global Financial Development"}
stockReturn = wbdata.get_dataframe(indicators, country = 'USA')
stockReturn = stockReturn.rename(columns={'Global Financial Development': 'Stock Market Return'})

# check total NaN values in dataframe
stockReturn.isnull().sum()

Stock Market Return    1
dtype: int64

## 4) Cleaning up:<br>
- The majority of NaN values are in the Net Immigration dataframe. By dropping them from that dataframe first and then joining the GDP and Stock Market dataframes to it, we ignore the years for which there is no immigration data (the dataframes are joined on the index, which in this case is the date).

- Here, we join the cleaned Net Immigration dataframe with the other two dataframes, GDP growth and Stock Market Return.

In [140]:
merge1 = netImmigration.join(gdpGrowth)


In [145]:
merge2 = merge1.join(stockReturn)
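
The effect of dropping NaN rows first and then joining on the index can be seen on a small example. This is a minimal sketch with made-up values and shortened, hypothetical column names; it relies on `DataFrame.join` defaulting to a left join on the index.

```python
import pandas as pd

# Illustrative values only; NaN marks a year with no immigration estimate.
immigration = pd.DataFrame(
    {"Net Immigration": [4.5e6, float("nan"), 5.0e6]},
    index=pd.Index(["2012", "2011", "2007"], name="date"),
)
gdp = pd.DataFrame(
    {"GDP growth": [1.5, 0.8, 0.8]},
    index=pd.Index(["2012", "2011", "2007"], name="date"),
)

# Drop the years without immigration data first ...
immigration_clean = immigration.dropna()

# ... then join: DataFrame.join defaults to a left join on the index,
# so only the years present in immigration_clean survive.
combined = immigration_clean.join(gdp)
print(combined)
```

A left join carries through any NaN values from the right-hand frame for the surviving years, so a final `dropna()` on the combined frame would be needed if those should be removed as well.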


- Below is the combined dataframe. The rows without immigration data were dropped (47 NaN values in Net Immigration; GDP growth had 2 and Stock Market Return had 1), leaving the data below. There is a 5-year gap between rows because the immigration data is collected over five-year periods.

In [146]:
merge2

date,Net Immigration,Percent Value,GDP per Capita Growth (annual %),Stock Market Return
2012,4500000.0,0.014331,1.463851,8.81
2007,5033689.0,0.01671,0.815188,12.72
2002,5206538.0,0.018102,0.846126,-16.77
1997,8612074.0,0.031586,3.236587,30.27
1992,4516808.0,0.017608,2.129114,10.52
1987,3428740.0,0.014151,2.541097,21.41
1982,3402260.0,0.014686,-2.841549,-6.51
1977,3926811.0,0.01783,3.561747,-3.78
1972,2940497.0,0.014009,4.142406,11.0
1967,1549465.0,0.007798,1.389951,7.96


### Data Analysis:<br>
- We first take a look at the max Stock Market Return value; this happened in 1997, the same year we had the max value for immigration. This suggests that Stock Market Return is positively related to the immigration rate, although a single coinciding maximum is weak evidence on its own.<br><br>

- The GDP per Capita Growth, on the other hand, appears to be negatively related to immigration and stock market return. We can see this from the max and min of each column: when GDP growth is at its max (4.48 in 1962), the immigration share (0.0098) and the stock market return (-5.96) are on the lower end, closer to each column's min. The mean can also serve as a reference point here: when GDP growth is at its max, both the immigration and stock market return values are below their means (0.0161 and 6.33).
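
The observations above rest on individual extremes. A more direct check is a correlation coefficient computed over all rows. The sketch below is one way to do that with `DataFrame.corr()`, which computes pairwise Pearson correlations by default. It re-enters the ten rows shown in the combined table by hand, so it is an illustration rather than the notebook's own cell, and with so few observations the result should be read with caution.

```python
import pandas as pd

# The ten rows shown in the combined table above, typed in by hand.
data = pd.DataFrame(
    {
        "Percent Value": [0.014331, 0.01671, 0.018102, 0.031586, 0.017608,
                          0.014151, 0.014686, 0.01783, 0.014009, 0.007798],
        "GDP per Capita Growth (annual %)": [1.463851, 0.815188, 0.846126, 3.236587, 2.129114,
                                             2.541097, -2.841549, 3.561747, 4.142406, 1.389951],
        "Stock Market Return": [8.81, 12.72, -16.77, 30.27, 10.52,
                                21.41, -6.51, -3.78, 11.0, 7.96],
    },
    index=pd.Index([2012, 2007, 2002, 1997, 1992, 1987, 1982, 1977, 1972, 1967], name="date"),
)

# Pairwise Pearson correlation between all columns.
corr = data.corr()

# Correlation of GDP growth with each of the other two series.
gdp_col = "GDP per Capita Growth (annual %)"
gdp_corr = corr[gdp_col].drop(gdp_col)
print(gdp_corr)
```

Comparing the two values in `gdp_corr` answers the homework question for the rows that are available.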


In [158]:
# Return the row with max value for stock market
merge2.loc[merge2["Stock Market Return"].idxmax()]

Net Immigration                     8.612074e+06
Percent Value                       3.158574e-02
GDP per Capita Growth (annual %)    3.236587e+00
Stock Market Return                 3.027000e+01
Name: 1997, dtype: float64

In [161]:
# Return the row with max value for GDP
merge2.loc[merge2['GDP per Capita Growth (annual %)'].idxmax()]

Net Immigration                     1.829274e+06
Percent Value                       9.806442e-03
GDP per Capita Growth (annual %)    4.480669e+00
Stock Market Return                -5.960000e+00
Name: 1962, dtype: float64

In [165]:
# Return the row with max value for immigration %
merge2.loc[merge2['Percent Value'].idxmax()]

Net Immigration                     8.612074e+06
Percent Value                       3.158574e-02
GDP per Capita Growth (annual %)    3.236587e+00
Stock Market Return                 3.027000e+01
Name: 1997, dtype: float64

In [167]:
merge2['Stock Market Return'].max()
merge2['Stock Market Return'].min()
merge2['Stock Market Return'].mean()

30.27

-16.77

6.333636363636364

In [166]:
merge2['Percent Value'].max()
merge2['Percent Value'].min()
merge2['Percent Value'].mean()

0.03158574325984662

0.007797541165103265

0.016056212909710706

In [168]:
merge2['GDP per Capita Growth (annual %)'].max()
merge2['GDP per Capita Growth (annual %)'].min()
merge2['GDP per Capita Growth (annual %)'].mean()

4.48066935423344

-2.84154866453498

1.9786534390025714
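
The max, min, and mean lookups above can also be collected in a single call. The following is a minimal sketch using three of the rows from the combined table as stand-in data; `sample`, `summary`, and `peak_year` are names chosen for illustration.

```python
import pandas as pd

# Three of the rows shown in the combined table, enough to illustrate the calls.
sample = pd.DataFrame(
    {
        "Percent Value": [0.014331, 0.031586, 0.007798],
        "Stock Market Return": [8.81, 30.27, 7.96],
    },
    index=pd.Index([2012, 1997, 1967], name="date"),
)

# One call for the whole frame instead of three calls per column.
summary = sample.agg(["max", "min", "mean"])
print(summary)

# idxmax() must be called (note the parentheses) to obtain the index label.
peak_year = sample["Stock Market Return"].idxmax()
print(sample.loc[peak_year])
```

`agg` returns a dataframe with one row per statistic, which makes it easier to compare a given row against each column's range and mean.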

### Sources:<br>

http://wbdata.readthedocs.io/en/latest/ - wbdata documentation <br>

https://stackoverflow.com/questions/40468069/python-pandas-merge-two-dataframes-by-index - join on index <br>

https://stackoverflow.com/questions/11346283/renaming-columns-in-pandas - renaming columns <br>

https://stackoverflow.com/questions/34023918/make-new-column-in-panda-dataframe-by-adding-values-from-other-columns/34023971 - create new column with values from existing columns <br>

https://stackoverflow.com/questions/26266362/how-to-count-the-nan-values-in-the-column-in-panda-data-frame - count NaN values in columns <br>

https://stackoverflow.com/questions/13413590/how-to-drop-rows-of-pandas-dataframe-whose-value-in-certain-columns-is-nan - dropping NaN values (see middle page examples) <br>