## Merging and Appending

In a real-world scenario, you will rarely have all of your data stored in a single table that can be loaded into one dataframe. You will usually have to __load data into Python as multiple dataframes__ and then find a way to bring everything together. This is why __merging and appending are among the most common operations__ performed. First, let us import the necessary libraries.

In [1]:
import numpy as np
import pandas as pd

Great! Now, let us start with merging DataFrames.

### Merging DataFrames

Merging DataFrames works in a manner similar to joins in an RDBMS (Relational Database Management System). We pick a column that is common to both dataframes and use it to match rows and return the merged dataframe.

We can perform 4 kinds of merging, which are also called __`joins`__. These are -

- __`inner join`__ - In this scenario, we keep only the entries whose value in the joining column appears in both dataframes.
- __`outer join`__ - In this scenario, we keep all the entries from both dataframes, whether or not they have a match in the joining column.
- __`left join`__ - In this scenario, we keep all the entries from the left / first dataframe, along with any matching data from the right one.
- __`right join`__ - In this scenario, we keep all the entries from the right / second dataframe, along with any matching data from the left one.

This merge operation is performed using the __`merge()`__ method in Pandas. This method takes the right dataframe as its first argument, along with 2 keyword arguments - `on`, specifying the common column, and `how`, specifying the type of join (`'inner'`, `'outer'`, `'left'` or `'right'`; the default is `'inner'`). It has the following syntax -

__`leftdf.merge(rightdf,how=,on=)`__
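
Before turning to real data, the four join types can be compared on a toy example. This is only a minimal sketch: the frames `left_df` and `right_df` and their columns are made up for illustration and are not part of the sales workbook used below.

```python
import pandas as pd

# Hypothetical toy data, not taken from the sales workbook
left_df = pd.DataFrame({'key': ['a', 'b', 'c'], 'left_val': [1, 2, 3]})
right_df = pd.DataFrame({'key': ['b', 'c', 'd'], 'right_val': [20, 30, 40]})

# Collect the keys that survive each kind of join
keys_by_join = {
    how: left_df.merge(right_df, on='key', how=how)['key'].tolist()
    for how in ['inner', 'outer', 'left', 'right']
}

for how, keys in keys_by_join.items():
    print(how, keys)
```

Here the inner join keeps only `b` and `c`, which appear in both frames, while the left join keeps `a`, `b` and `c` and fills `right_val` with `NaN` for `a`.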

Let us see an example.


In [73]:
sales=pd.read_excel('https://github.com/yashj1301/Python3-UpGrad-UMich/blob/master/Python%203.x/Upgrad/Modules/Module%203%20-%20Python%20for%20Data%20Science/Session%203%20-%20Pandas/Data/sales_returns.xlsx?raw=true',
                    sheet_name='Orders')
sales.head()

Unnamed: 0,Order ID,Market,Profit,Sales
0,AG-2012-AA6453-41020,Africa,53.76,298.68
1,AG-2012-AC4203-40915,Africa,14.58,91.38
2,AG-2012-AH2103-41133,Africa,11.04,276.96
3,AG-2012-AJ7803-40978,Africa,7.17,35.97
4,AG-2012-AS2853-41235,Africa,15.36,54.9


In [74]:
sales_return=pd.read_excel('https://github.com/yashj1301/Python3-UpGrad-UMich/blob/master/Python%203.x/Upgrad/Modules/Module%203%20-%20Python%20for%20Data%20Science/Session%203%20-%20Pandas/Data/sales_returns.xlsx?raw=true',
                           sheet_name='Returns')
sales_return.head()

Unnamed: 0,Returned,Order ID
0,Yes,CA-2012-SA20830140-41210
1,Yes,IN-2012-PB19210127-41259
2,Yes,CA-2012-SC20095140-41174
3,Yes,IN-2015-JH158207-42140
4,Yes,IN-2014-LC168857-41747


We can clearly see that the column `Order ID` is common to both dataframes. Now, let us use it to merge the two dataframes. We will first return all the entries from `sales` that were returned. For this, we will use the `inner join`. Let us see it in action.

In [76]:
ret=sales.merge(sales_return,on='Order ID', how='inner')
ret.head()

Unnamed: 0,Order ID,Market,Profit,Sales,Returned
0,AG-2013-PO88653-41634,Africa,191.25,1932.24,Yes
1,AG-2014-CM21603-41755,Africa,10.32,43.05,Yes
2,AG-2014-CP20853-41889,Africa,14.1,84.72,Yes
3,AG-2014-RD95853-41712,Africa,21.03,64.38,Yes
4,AO-2013-JE57454-41544,Africa,106.59,499.23,Yes


We have merged the 2 dataframes. Now, let us check whether our `Returned` column contains any value other than `Yes`.

In [77]:
ret['Returned'].value_counts()

Yes    1079
Name: Returned, dtype: int64

As we can see, our DataFrame contains only those orders that were returned. Now, if we want to keep all the orders from the `sales` dataframe, regardless of whether they were returned or not, then we will use the `left join`. Let us see it in action.

In [114]:
left=sales.merge(sales_return,on='Order ID',how='left')
left.head()

Unnamed: 0,Order ID,Market,Profit,Sales,Returned
0,AG-2012-AA6453-41020,Africa,53.76,298.68,
1,AG-2012-AC4203-40915,Africa,14.58,91.38,
2,AG-2012-AH2103-41133,Africa,11.04,276.96,
3,AG-2012-AJ7803-40978,Africa,7.17,35.97,
4,AG-2012-AS2853-41235,Africa,15.36,54.9,


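We can see that the column `Returned` contains missing values (`NaN`) for the orders that have no match in `sales_return`; they show up as blanks above. To change this, let us first convert the column values to strings, which turns `NaN` into the string `'nan'`, and then replace `'nan'` with `No`.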

In [115]:
left['Returned']=left['Returned'].astype(str).replace('nan','No')
left['Returned'].value_counts()

No     24649
Yes     1079
Name: Returned, dtype: int64
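
As an aside, the same result can usually be reached without the string conversion by calling `fillna()`, which replaces only the missing entries. The sketch below uses a small hypothetical series in place of the real `Returned` column, so the counts differ from the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the `Returned` column after a left join
returned = pd.Series(['Yes', np.nan, np.nan, 'Yes', np.nan], name='Returned')

# fillna() replaces only the missing entries and leaves the others untouched
filled = returned.fillna('No')
print(filled.value_counts())
```

This avoids relying on the `'nan'` string that `astype(str)` produces for missing values.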

See? This is how we can use the `merge()` method to merge two dataframes on a common column. Next, we will look at concatenating dataframes.

### Concatenating DataFrames

In merging, we combined two dataframes that share a common column. Concatenation applies when we have __two (or more) dataframes with exactly the same columns, but different entries__.

In this case, __we are just adding entries to an existing dataframe using another dataframe__ with the same columns. This can be achieved using the __`pd.concat()`__ function, which takes a list of dataframes to be joined and an `axis` (`0`, the default, to stack the dataframes as rows, or `1` to place them side by side as columns). It has the following syntax -

__`pd.concat([list-of-df],axis=)`__

Let us see an example.




In [122]:
gold = pd.DataFrame({'Country': ['USA', 'France', 'Russia'],
                         'Medals': [15, 13, 9]}
                    )
silver = pd.DataFrame({'Country': ['India', 'Germany', 'Australia'],
                        'Medals': [29, 20, 16]}
                    )
bronze = pd.DataFrame({'Country': ['Singapore', 'UK', 'Canada'],
                        'Medals': [40, 28, 27]}
                    )

Let us view them one by one. We use the `display()` function, which is available in Jupyter notebooks, for this purpose. Let us see it in action.

In [127]:
display(gold,silver,bronze)

Unnamed: 0,Country,Medals
0,USA,15
1,France,13
2,Russia,9


Unnamed: 0,Country,Medals
0,India,29
1,Germany,20
2,Australia,16


Unnamed: 0,Country,Medals
0,Singapore,40
1,UK,28
2,Canada,27


Now, let us concatenate these dataframes. First, we will stack them row-wise (the default, `axis=0`), then place them side by side column-wise (`axis=1`).

In [123]:
pd.concat([gold,silver,bronze])

Unnamed: 0,Country,Medals
0,USA,15
1,France,13
2,Russia,9
0,India,29
1,Germany,20
2,Australia,16
0,Singapore,40
1,UK,28
2,Canada,27
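
Notice that the row labels 0, 1, 2 repeat in the result above, because each dataframe keeps its own index. If a fresh running index is preferred, `ignore_index=True` can be passed. A minimal sketch that rebuilds two of the frames so it runs on its own:

```python
import pandas as pd

gold = pd.DataFrame({'Country': ['USA', 'France', 'Russia'],
                     'Medals': [15, 13, 9]})
silver = pd.DataFrame({'Country': ['India', 'Germany', 'Australia'],
                       'Medals': [29, 20, 16]})

# ignore_index=True discards the original labels and renumbers the rows from 0
stacked = pd.concat([gold, silver], ignore_index=True)
print(stacked)
```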


In [130]:
pd.concat([gold,silver,bronze],axis=1)

Unnamed: 0,Country,Medals,Country.1,Medals.1,Country.2,Medals.2
0,USA,15,India,29,Singapore,40
1,France,13,Germany,20,UK,28
2,Russia,9,Australia,16,Canada,27


See? This is how we concatenate dataframes using `pd.concat()`. With `axis=1`, it can also be used to combine dataframes side by side, which is most useful when their columns are different and their row indexes line up.
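
To illustrate what happens when the row indexes do not fully line up, here is a small sketch with made-up frames (`medals` and `extra` are hypothetical and only serve this example). Column-wise concatenation matches rows by index label, and labels missing from one frame are filled with `NaN`.

```python
import pandas as pd

# Hypothetical frames with different columns and partly different row labels
medals = pd.DataFrame({'Country': ['USA', 'France', 'Russia'],
                       'Gold': [15, 13, 9]})
extra = pd.DataFrame({'Silver': [10, 8, 7]}, index=[0, 1, 3])

# Rows are aligned on the index; unmatched labels get NaN in the other columns
side_by_side = pd.concat([medals, extra], axis=1)
print(side_by_side)
```

Row `2` exists only in `medals`, so its `Silver` value is `NaN`, and row `3` exists only in `extra`, so its `Country` and `Gold` values are `NaN`.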