### Joining DataFrames

pandas provides many ways to merge and join DataFrames based on different conditions.

In [2]:
import numpy as np
import pandas as pd


df1 = pd.DataFrame({'name': ['John', 'George', 'Ringo'],
                    'color': ['Blue', 'Blue', 'Purple']})
df2 = pd.DataFrame({'name': ['Paul', 'George', 'Ringo'],
                    'carcolor': ['Red', 'Blue', np.nan]},
                    index=[3, 1, 2])

df1

Unnamed: 0,name,color
0,John,Blue
1,George,Blue
2,Ringo,Purple


In [3]:
df2

Unnamed: 0,name,carcolor
3,Paul,Red
1,George,Blue
2,Ringo,


### Adding Rows to DataFrames

The `concat()` function in pandas provides a convenient way to add rows to a DataFrame. It accepts a list of DataFrames to combine, finds any columns that share a name, and uses a single column for each of them. Columns that are missing from one of the DataFrames are filled with NaN:

In [4]:
pd.concat([df1, df2])

Unnamed: 0,name,color,carcolor
0,John,Blue,
1,George,Blue,
2,Ringo,Purple,
3,Paul,,Red
1,George,,Blue
2,Ringo,,


The `concat()` function preserves the index values of the DataFrames it combines. You can pass `verify_integrity=True` to raise an exception if any of the DataFrames have overlapping index values. Alternatively, pass `ignore_index=True` to have the function create new index values for the combined DataFrame:

In [5]:
pd.concat([df1, df2], ignore_index=True)

Unnamed: 0,name,color,carcolor
0,John,Blue,
1,George,Blue,
2,Ringo,Purple,
3,Paul,,Red
4,George,,Blue
5,Ringo,,
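
As a small sketch of the other option mentioned above, the cell below passes `verify_integrity=True` to the same `concat()` call. Because `df1` and `df2` share the index values 1 and 2, pandas is expected to raise a `ValueError`. The `overlap_detected` flag is only an illustrative name used here to record what happened.

```python
import numpy as np
import pandas as pd

# Same example frames as at the top of this section
df1 = pd.DataFrame({'name': ['John', 'George', 'Ringo'],
                    'color': ['Blue', 'Blue', 'Purple']})
df2 = pd.DataFrame({'name': ['Paul', 'George', 'Ringo'],
                    'carcolor': ['Red', 'Blue', np.nan]},
                   index=[3, 1, 2])

try:
    # Index values 1 and 2 appear in both frames, so this should fail
    pd.concat([df1, df2], verify_integrity=True)
    overlap_detected = False
except ValueError as exc:
    overlap_detected = True
    print(f"concat refused to combine the frames: {exc}")
```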


### Adding Columns to DataFrames

The `concat()` function can also add columns to a DataFrame; all we need to do is pass `axis=1`. Rows are then aligned on their index values:

In [6]:
pd.concat([df1, df2], axis=1)

Unnamed: 0,name,color,name.1,carcolor
0,John,Blue,,
1,George,Blue,George,Blue
2,Ringo,Purple,Ringo,
3,,,Paul,Red


### Joining DataFrames

Based on set theory, pandas supports four types of merging/joining of DataFrames:

* Inner join: Intersection of the keys of the two DataFrames
* Outer join: Union of the keys of the two DataFrames
* Left join: All rows of the "left" DataFrame, plus matching data from the "right" one
* Right join: All rows of the "right" DataFrame, plus matching data from the "left" one

There are two functions/methods in pandas for this. The `.join()` method joins two DataFrames based on index values by default, while the `.merge()` method/function is a more general version that can merge on index values or columns (mixing the two is possible).
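
To illustrate the index-based behaviour of `.join()`, here is a minimal sketch that first moves `name` into the index of both example frames. The names `left`, `right`, and `joined` are introduced only for this illustration. `.join()` performs a left join unless told otherwise, so every row of `left` is kept.

```python
import numpy as np
import pandas as pd

# Same example frames as at the top of this section
df1 = pd.DataFrame({'name': ['John', 'George', 'Ringo'],
                    'color': ['Blue', 'Blue', 'Purple']})
df2 = pd.DataFrame({'name': ['Paul', 'George', 'Ringo'],
                    'carcolor': ['Red', 'Blue', np.nan]},
                   index=[3, 1, 2])

# Use 'name' as the index so that .join() can match rows on it
left = df1.set_index('name')
right = df2.set_index('name')

# Default is how='left': keep all rows of `left`, add matching columns of `right`
joined = left.join(right)
print(joined)
```

Calling `df1.join(df2)` directly would match on the integer index instead, and it would need `lsuffix=`/`rsuffix=` because both frames have a `name` column.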

Let's try an inner join:

In [7]:
df1.merge(df2) # Inner join is the default

Unnamed: 0,name,color,carcolor
0,George,Blue,Blue
1,Ringo,Purple,


An outer join fills in NaN values for entries that do not exist in one of the DataFrames being merged:

In [8]:
df1.merge(df2, how='outer')

Unnamed: 0,name,color,carcolor
0,John,Blue,
1,George,Blue,Blue
2,Ringo,Purple,
3,Paul,,Red
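
The left and right joins from the list above can be sketched in the same way by changing the `how=` argument. The variable names `left_joined` and `right_joined` are illustrative only.

```python
import numpy as np
import pandas as pd

# Same example frames as at the top of this section
df1 = pd.DataFrame({'name': ['John', 'George', 'Ringo'],
                    'color': ['Blue', 'Blue', 'Purple']})
df2 = pd.DataFrame({'name': ['Paul', 'George', 'Ringo'],
                    'carcolor': ['Red', 'Blue', np.nan]},
                   index=[3, 1, 2])

# Left join: every name in df1 is kept; John has no match, so carcolor is NaN
left_joined = df1.merge(df2, how='left')

# Right join: every name in df2 is kept; Paul has no match, so color is NaN
right_joined = df1.merge(df2, how='right')

print(left_joined)
print(right_joined)
```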


Using the parameters `left_on=`, `right_on=`, `left_index=`, and `right_index=`, we can specify which columns or index to join on for each DataFrame.
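
A minimal sketch of these parameters follows. The frames `owners` (where the key column is renamed to `owner`) and `cars` (where `name` is moved into the index) are hypothetical variations of `df2`, created only to give the two sides different key locations.

```python
import numpy as np
import pandas as pd

# Same example frames as at the top of this section
df1 = pd.DataFrame({'name': ['John', 'George', 'Ringo'],
                    'color': ['Blue', 'Blue', 'Purple']})
df2 = pd.DataFrame({'name': ['Paul', 'George', 'Ringo'],
                    'carcolor': ['Red', 'Blue', np.nan]},
                   index=[3, 1, 2])

# Case 1: the key columns have different names in the two frames
owners = df2.rename(columns={'name': 'owner'})  # hypothetical variant of df2
by_columns = df1.merge(owners, left_on='name', right_on='owner')

# Case 2: the key is a column on the left and the index on the right
cars = df2.set_index('name')  # hypothetical variant of df2
by_index = df1.merge(cars, left_on='name', right_index=True)

print(by_columns)
print(by_index)
```

Both calls use the default inner join, so only George and Ringo, who appear in both frames, should remain.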