## Merge 

###### We can use join and merge to combine two DataFrames. The join method works best when we are joining DataFrames on their indexes (though you can specify a column of the left DataFrame to join on instead). The merge method is more versatile: it lets us specify columns other than the index to join on for both DataFrames.

### merge() for combining data on common columns or indices
### .join() for combining data on a key column or an index
### concat() for combining DataFrames across rows or columns
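As a minimal sketch of the index-based case described above, the snippet below joins two small frames on their index and then repeats the operation with merge. The frame names (`temps`, `humid`, `cities`) and their values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: the city names act as the index on both sides.
temps = pd.DataFrame({'temperature': [32, 45]}, index=['mumbai', 'delhi'])
humid = pd.DataFrame({'humidity': [68, 65]}, index=['mumbai', 'delhi'])

# join() aligns on the index by default (as a left join).
joined = temps.join(humid)

# merge() gives the same columns when told to use both indexes.
merged = pd.merge(temps, humid, left_index=True, right_index=True)

# join() can also match a column of the left frame against the right frame's index.
cities = pd.DataFrame({'city': ['mumbai', 'delhi'], 'temperature': [32, 45]})
joined_on = cities.join(humid, on='city')
```

In this sketch `joined` and `merged` hold the same values, and `joined_on` keeps `city` as an ordinary column next to the looked-up `humidity`.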

#### At a high level:

#### .concat() simply stacks multiple DataFrames together vertically, or stitches them together horizontally after aligning on the index.
#### .merge() first aligns the selected common column(s) or index of two DataFrames, and then picks up the remaining columns from the aligned rows of each DataFrame.
#### More specifically, .concat():

##### Is a top-level pandas function
##### Combines two or more pandas DataFrames vertically or horizontally
##### Aligns only on the index when combining horizontally
##### Can raise an error when combining horizontally if any of the DataFrames has duplicate index values
##### Defaults to outer join with the option for inner join
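The outer and inner options for horizontal concatenation could be sketched as follows. The frames `a` and `b` are hypothetical and share only the label `'delhi'` in their indexes.

```python
import pandas as pd

# Hypothetical frames whose indexes overlap only on 'delhi'.
a = pd.DataFrame({'temperature': [32, 45]}, index=['mumbai', 'delhi'])
b = pd.DataFrame({'humidity': [65, 10]}, index=['delhi', 'manali'])

# Default (outer): keep every index label, filling gaps with NaN.
outer = pd.concat([a, b], axis=1)

# join='inner': keep only the labels present in both frames.
inner = pd.concat([a, b], axis=1, join='inner')
```

Here `outer` has three rows (one per distinct city) while `inner` keeps only the `'delhi'` row.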
##### And .merge():

##### Exists both as a top-level pandas function and a DataFrame method (as of pandas 1.0)
##### Combines exactly two DataFrames horizontally
##### Aligns the calling DataFrame's column(s) or index with the other DataFrame's column(s) or index
##### Handles duplicate values in the joining columns or index by performing a Cartesian product
##### Defaults to inner join with options for left, outer, and right
##### Note that when performing pd.merge(left, right), if left has two rows with the same values in the joining columns or index, each of those rows combines with the corresponding row(s) of right, resulting in a Cartesian product. On the other hand, if .concat() is used to combine columns, we need to make sure that neither DataFrame has duplicate index values.
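The Cartesian product on duplicate keys can be seen in a small sketch like the one below. The frames `left` and `right` and their values are invented for illustration.

```python
import pandas as pd

# Hypothetical data: 'mumbai' appears twice on each side.
left = pd.DataFrame({'city': ['mumbai', 'mumbai'],
                     'temperature': [32, 33]})
right = pd.DataFrame({'city': ['mumbai', 'mumbai', 'delhi'],
                      'humidity': [68, 70, 65]})

# Every 'mumbai' row in left pairs with every 'mumbai' row in right:
# 2 x 2 = 4 rows. 'delhi' has no match in left, so the inner join drops it.
result = pd.merge(left, right, on='city')
print(len(result))  # 4
```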

##### Practically speaking:

##### Consider .concat() first when combining homogeneous DataFrames, and consider .merge() first when combining complementary DataFrames.
##### If you need to combine vertically, go with .concat(). If you need to combine horizontally via columns, go with .merge(), which by default merges on the columns the two DataFrames have in common.
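A rough sketch of that rule of thumb, using hypothetical monthly tables (`jan`, `feb`) that share the same columns and a complementary `humidity` table:

```python
import pandas as pd

# Homogeneous frames: same columns, different observations.
jan = pd.DataFrame({'city': ['mumbai', 'delhi'], 'temperature': [25, 14]})
feb = pd.DataFrame({'city': ['mumbai', 'delhi'], 'temperature': [27, 17]})

# Stack vertically; ignore_index=True renumbers the rows 0..n-1.
stacked = pd.concat([jan, feb], ignore_index=True)

# Complementary frame: same cities, a different measurement.
humidity = pd.DataFrame({'city': ['mumbai', 'delhi'], 'humidity': [68, 65]})

# With no `on` argument, merge joins on the shared column(s), here 'city'.
combined = pd.merge(jan, humidity)
```

`stacked` ends up with four rows and the original two columns, while `combined` keeps two rows and gains the `humidity` column.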

In [1]:
import pandas as pd

In [2]:
df1 = pd.DataFrame({'city': ['mumbai','delhi','banglore'],
                               'temperatur':[32,45,30],})

In [4]:
df2 = pd.DataFrame({'city': ['mumbai','delhi','banglore'],
                        'humidity':[68,65,75]})

In [5]:
df3 = pd.merge(df1,df2, on='city')

In [6]:
df3


Unnamed: 0,city,temperatur,humidity
0,mumbai,32,68
1,delhi,45,65
2,banglore,30,75


In [None]:
# suppose each DataFrame has a city the other lacks; how will they merge?

In [7]:
df1 = pd.DataFrame({'city': ['mumbai','delhi','banglore', 'pune'],
                               'temperatur':[32,45,30,28]})

In [8]:
df2 = pd.DataFrame({'city': ['mumbai','delhi','manali'],
                        'humidity':[68,65,10]})

In [10]:
df3 = pd.merge(df1,df2, on='city')

In [11]:
df3

Unnamed: 0,city,temperatur,humidity
0,mumbai,32,68
1,delhi,45,65


In [12]:
# the merge above keeps only the cities common to both DataFrames; this is an inner join

In [13]:
# if we want to include every city from both DataFrames, we have to perform an outer join

In [15]:
df3 = pd.merge(df1,df2, on='city', how='outer')

In [16]:
df3

Unnamed: 0,city,temperatur,humidity
0,mumbai,32.0,68.0
1,delhi,45.0,65.0
2,banglore,30.0,
3,pune,28.0,
4,manali,,10.0


In [17]:
# inner join is the default

In [18]:
# merge is the tool to reach for when combining two DataFrames on a column

In [19]:
# if we want the cities common to both DataFrames plus every city from df1, we can use a left join

In [20]:
df3 = pd.merge(df1,df2, on='city', how='left')

In [21]:
df3

Unnamed: 0,city,temperatur,humidity
0,mumbai,32,68.0
1,delhi,45,65.0
2,banglore,30,
3,pune,28,


In [22]:
# if we want the cities common to both DataFrames plus every city from df2, we can use a right join

In [24]:
df3 = pd.merge(df1,df2, on='city', how='right')

In [25]:
df3

Unnamed: 0,city,temperatur,humidity
0,mumbai,32.0,68
1,delhi,45.0,65
2,manali,,10


In [26]:
# with indicator=True, an outer join shows which DataFrame each row came from

In [27]:
df3 = pd.merge(df1,df2, on='city', how='outer', indicator=True)

In [28]:
df3

Unnamed: 0,city,temperatur,humidity,_merge
0,mumbai,32.0,68.0,both
1,delhi,45.0,65.0,both
2,banglore,30.0,,left_only
3,pune,28.0,,left_only
4,manali,,10.0,right_only
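One possible use of the `_merge` column is to pick out the rows that had no match on the other side. The sketch below rebuilds the same `df1` and `df2` and filters on the indicator; the variable names `only_in_df1` and `only_in_df2` are just illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'city': ['mumbai', 'delhi', 'banglore', 'pune'],
                    'temperatur': [32, 45, 30, 28]})
df2 = pd.DataFrame({'city': ['mumbai', 'delhi', 'manali'],
                    'humidity': [68, 65, 10]})

merged = pd.merge(df1, df2, on='city', how='outer', indicator=True)

# Cities that appear only in df1 (no humidity reading available).
only_in_df1 = merged.loc[merged['_merge'] == 'left_only', 'city'].tolist()

# Cities that appear only in df2 (no temperature reading available).
only_in_df2 = merged.loc[merged['_merge'] == 'right_only', 'city'].tolist()
```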


In [31]:
df1 = pd.DataFrame({'city': ['mumbai','delhi','banglore', 'pune'],
                               'temperatur':[21,14,35,18],
                                'humidity':[68,65,71,75]})

In [32]:
df2 = pd.DataFrame({'city': ['mumbai','delhi','manali'],
                        'humidity':[68,65,10],
                        'temperatur':[21,14,15]})

In [33]:
df1

Unnamed: 0,city,temperatur,humidity
0,mumbai,21,68
1,delhi,14,65
2,banglore,35,71
3,pune,18,75


In [34]:
df2

Unnamed: 0,city,humidity,temperatur
0,mumbai,68,21
1,delhi,65,14
2,manali,10,15


In [35]:
# non-key columns present in both DataFrames get the suffixes _x and _y by default
df3 = pd.merge(df1,df2, on='city')

In [36]:
df3

Unnamed: 0,city,temperatur_x,humidity_x,humidity_y,temperatur_y
0,mumbai,21,68,68,21
1,delhi,14,65,65,14


In [38]:
# suffixes= replaces the default _x/_y with names of our choosing
df3 = pd.merge(df1,df2, on='city', suffixes = ('_left', '_right'))

In [39]:
df3

Unnamed: 0,city,temperatur_left,humidity_left,humidity_right,temperatur_right
0,mumbai,21,68,68,21
1,delhi,14,65,65,14
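If the overlapping columns are meant to hold the same measurements, one alternative (sketched here, not necessarily what the notebook intends) is to leave out `on` so that merge joins on every shared column. No suffixes are needed then, and only the rows that agree on all three columns survive the inner join.

```python
import pandas as pd

df1 = pd.DataFrame({'city': ['mumbai', 'delhi', 'banglore', 'pune'],
                    'temperatur': [21, 14, 35, 18],
                    'humidity': [68, 65, 71, 75]})
df2 = pd.DataFrame({'city': ['mumbai', 'delhi', 'manali'],
                    'humidity': [68, 65, 10],
                    'temperatur': [21, 14, 15]})

# No `on`: merge uses all columns common to both frames
# ('city', 'temperatur', 'humidity') as the join keys.
df_all_keys = pd.merge(df1, df2)
```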
