**Diether**  
**Fin 585R**  
**Homework**  
**Intro to Merging Data**

**Overview**

Merging data is a critical empirical skill. Modern empirical finance research often requires merging data from multiple sources; it is not unusual to merge 5 to 10 fairly large datasets for a single empirical paper. Fortunately, `pandas` has very good merging capabilities, and its merging idioms come mostly from `SQL`. Please read the `pandas` documentation on merging before completing this homework:

[Pandas' Docs: Merging](http://pandas.pydata.org/pandas-docs/stable/merging.html)

The tasks below walk you through the fundamental merging capabilities of `pandas`. Note that the column and row order of your output does not need to match mine.
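As a quick orientation before the tasks, here is a minimal sketch of how the `how` argument of `pd.merge` controls which keys survive a merge. The frames `left` and `right` and their columns are hypothetical toy data, not part of the homework.

```python
import pandas as pd

# Hypothetical toy frames, not part of the homework data.
left = pd.DataFrame({'key': ['a', 'b', 'c'], 'lval': [1, 2, 3]})
right = pd.DataFrame({'key': ['b', 'c', 'd'], 'rval': [20, 30, 40]})

# Collect the (sorted) keys that survive each join type.
keys_by_how = {
    how: sorted(pd.merge(left, right, on='key', how=how)['key'])
    for how in ['inner', 'left', 'right', 'outer']
}

for how, keys in keys_by_how.items():
    print(how, keys)
```

The names mirror the `SQL` join types: `inner` keeps only keys found in both frames, `left` and `right` keep every key of one side, and `outer` keeps the union.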

In [1]:
import numpy as np
import pandas as pd

**Task 1**  

Merge the two dataframes (df1 and df2) so the resulting output looks like the following:

```
   id    x    y   v   w
0  i0   x0   y0  v0  w0
1  i2   x2   y2  v2  w2
2  i3   x3   y3  v3  w3
3  i4  NaN  NaN  v4  w4
4  i6  NaN  NaN  v6  w6
```

In [2]:
df1 = pd.DataFrame({'id': ['i0', 'i1', 'i2', 'i3', 'i5'],
                    'x': ['x0', 'x1', 'x2', 'x3', 'x5'],
                    'y': ['y0', 'y1', 'y2', 'y3', 'y5']})
df1

|   | id | x | y |
|---|----|----|----|
| 0 | i0 | x0 | y0 |
| 1 | i1 | x1 | y1 |
| 2 | i2 | x2 | y2 |
| 3 | i3 | x3 | y3 |
| 4 | i5 | x5 | y5 |


In [3]:
df2 = pd.DataFrame({'id': ['i0','i2', 'i3','i4','i6'],
                    'w': ['w0', 'w2', 'w3','w4','w6'],
                    'v': ['v0', 'v2', 'v3','v4','v6']})
df2

|   | id | w | v |
|---|----|----|----|
| 0 | i0 | w0 | v0 |
| 1 | i2 | w2 | v2 |
| 2 | i3 | w3 | v3 |
| 3 | i4 | w4 | v4 |
| 4 | i6 | w6 | v6 |


In [6]:
pd.merge(df1, df2, on='id', how='right')

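|   | id | x | y | w | v |
|---|----|-----|-----|----|----|
| 0 | i0 | x0 | y0 | w0 | v0 |
| 1 | i2 | x2 | y2 | w2 | v2 |
| 2 | i3 | x3 | y3 | w3 | v3 |
| 3 | i4 | NaN | NaN | w4 | v4 |
| 4 | i6 | NaN | NaN | w6 | v6 |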
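A right merge keeps every row of the second frame. As a hedged sketch, the same rows can also be obtained by swapping the arguments and using a left merge; only the column order should differ. The names `right_merge`, `left_merge`, and `same` are introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['i0', 'i1', 'i2', 'i3', 'i5'],
                    'x': ['x0', 'x1', 'x2', 'x3', 'x5'],
                    'y': ['y0', 'y1', 'y2', 'y3', 'y5']})
df2 = pd.DataFrame({'id': ['i0', 'i2', 'i3', 'i4', 'i6'],
                    'w': ['w0', 'w2', 'w3', 'w4', 'w6'],
                    'v': ['v0', 'v2', 'v3', 'v4', 'v6']})

# Keep every row of df2; attach x and y from df1 where the id matches.
right_merge = pd.merge(df1, df2, on='id', how='right')

# Swap the arguments: df2 is now the left frame, so its columns come first.
left_merge = pd.merge(df2, df1, on='id', how='left')

# Align the column order before comparing the two results.
same = right_merge.equals(left_merge[right_merge.columns])
print(same)
```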


**Task 2**  

Continue to use the dataframes from Task 1. Merge the two dataframes (df1 and df2) so that the resulting output looks like the following:

```
   id    x    y    v    w
0  i0   x0   y0   v0   w0
1  i1   x1   y1  NaN  NaN
2  i2   x2   y2   v2   w2
3  i3   x3   y3   v3   w3
4  i5   x5   y5  NaN  NaN
5  i4  NaN  NaN   v4   w4
6  i6  NaN  NaN   v6   w6
```

In [7]:
pd.merge(df1, df2, on='id', how='outer')

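|   | id | x | y | w | v |
|---|----|-----|-----|-----|-----|
| 0 | i0 | x0 | y0 | w0 | v0 |
| 1 | i1 | x1 | y1 | NaN | NaN |
| 2 | i2 | x2 | y2 | w2 | v2 |
| 3 | i3 | x3 | y3 | w3 | v3 |
| 4 | i5 | x5 | y5 | NaN | NaN |
| 5 | i4 | NaN | NaN | w4 | v4 |
| 6 | i6 | NaN | NaN | w6 | v6 |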
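When checking an outer merge, it can help to see which side each row came from. The sketch below uses the `indicator` argument of `pd.merge`, which adds a `_merge` column with the values `left_only`, `right_only`, or `both`. The names `merged` and `counts` are introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['i0', 'i1', 'i2', 'i3', 'i5'],
                    'x': ['x0', 'x1', 'x2', 'x3', 'x5'],
                    'y': ['y0', 'y1', 'y2', 'y3', 'y5']})
df2 = pd.DataFrame({'id': ['i0', 'i2', 'i3', 'i4', 'i6'],
                    'w': ['w0', 'w2', 'w3', 'w4', 'w6'],
                    'v': ['v0', 'v2', 'v3', 'v4', 'v6']})

# indicator=True adds a `_merge` column recording each row's origin.
merged = pd.merge(df1, df2, on='id', how='outer', indicator=True)

# Count how many ids matched and how many appear on only one side.
counts = merged['_merge'].astype(str).value_counts().to_dict()
print(counts)
```

Rows tagged `left_only` or `right_only` are exactly the ones that carry `NaN` in the columns coming from the other frame.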


**Task 3**

Merge the two dataframes (`df` and `extra`) so that the resulting output looks like the following:

```
          me       ret stock  year  analysts  earnings
0  me_0,2014  r_0,2014    S0  2014  a_0,2014  e_0,2014
1  me_0,2015  r_0,2015    S0  2015       NaN       NaN
2  me_1,2014  r_1,2014    S1  2014  a_1,2014  e_1,2014
3  me_1,2015  r_1,2015    S2  2015  a_2,2015  e_2,2015
```

In [8]:
df = pd.DataFrame({'stock': ['S0', 'S0', 'S1', 'S2'],
                   'year': ['2014', '2015', '2014', '2015'],
                   'ret': ['r_0,2014', 'r_0,2015', 'r_1,2014', 'r_1,2015'],
                   'me': ['me_0,2014', 'me_0,2015', 'me_1,2014', 'me_1,2015']})
df

|   | stock | year | ret | me |
|---|-------|------|----------|-----------|
| 0 | S0 | 2014 | r_0,2014 | me_0,2014 |
| 1 | S0 | 2015 | r_0,2015 | me_0,2015 |
| 2 | S1 | 2014 | r_1,2014 | me_1,2014 |
| 3 | S2 | 2015 | r_1,2015 | me_1,2015 |


In [9]:
extra = pd.DataFrame({'stock': ['S0', 'S1', 'S1', 'S2'],
                      'year' : ['2014', '2014', '2015', '2015'],
                      'earnings': ['e_0,2014', 'e_1,2014', 'e_1,2015', 'e_2,2015'],
                      'analysts': ['a_0,2014', 'a_1,2014', 'a_1,2015', 'a_2,2015']})
extra

|   | stock | year | earnings | analysts |
|---|-------|------|----------|----------|
| 0 | S0 | 2014 | e_0,2014 | a_0,2014 |
| 1 | S1 | 2014 | e_1,2014 | a_1,2014 |
| 2 | S1 | 2015 | e_1,2015 | a_1,2015 |
| 3 | S2 | 2015 | e_2,2015 | a_2,2015 |


In [14]:
pd.merge(df, extra, on=['stock', 'year'], how='left')[['me', 'ret', 'stock', 'year', 'analysts', 'earnings']]

|   | me | ret | stock | year | analysts | earnings |
|---|-----------|----------|-------|------|----------|----------|
| 0 | me_0,2014 | r_0,2014 | S0 | 2014 | a_0,2014 | e_0,2014 |
| 1 | me_0,2015 | r_0,2015 | S0 | 2015 | NaN | NaN |
| 2 | me_1,2014 | r_1,2014 | S1 | 2014 | a_1,2014 | e_1,2014 |
| 3 | me_1,2015 | r_1,2015 | S2 | 2015 | a_2,2015 | e_2,2015 |
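Merging on both `stock` and `year` matters here because neither column identifies a row on its own. One way to check this, sketched below, is the `validate` argument of `pd.merge`, which raises `pd.errors.MergeError` when the keys are not unique in the way you claim. The names `merged` and `stock_only_is_unique` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'stock': ['S0', 'S0', 'S1', 'S2'],
                   'year': ['2014', '2015', '2014', '2015'],
                   'ret': ['r_0,2014', 'r_0,2015', 'r_1,2014', 'r_1,2015'],
                   'me': ['me_0,2014', 'me_0,2015', 'me_1,2014', 'me_1,2015']})
extra = pd.DataFrame({'stock': ['S0', 'S1', 'S1', 'S2'],
                      'year': ['2014', '2014', '2015', '2015'],
                      'earnings': ['e_0,2014', 'e_1,2014', 'e_1,2015', 'e_2,2015'],
                      'analysts': ['a_0,2014', 'a_1,2014', 'a_1,2015', 'a_2,2015']})

# (stock, year) pairs are unique in both frames, so this check passes.
merged = pd.merge(df, extra, on=['stock', 'year'], how='left',
                  validate='one_to_one')

# `stock` alone repeats within df, so the same check fails.
try:
    pd.merge(df, extra, on='stock', how='left', validate='one_to_one')
    stock_only_is_unique = True
except pd.errors.MergeError:
    stock_only_is_unique = False

print(merged.shape, stock_only_is_unique)
```

With several large datasets, a failed `validate` check is usually easier to diagnose than a merged frame that silently gained duplicate rows.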
