# Working with data
## Joining Pandas DataFrames

***
<br>

## Why do we want to join DataFrames?

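* Related data is often spread across multiple files or tables, and we need to combine it before we can analyse it together.
* Datasets can be related in different ways, for example through a shared key column such as a name or an ID. Not every row in one dataset necessarily has a match in the other, so how completely the data can be joined varies from case to case.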

## DataFrame join operation

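* In SQL, such an operation is called a 'join'.
* In Pandas, the `merge` function joins tables; its `how` parameter selects one of four main join types:
    * only rows whose key appears in both tables - inner join (`how='inner'`, the default)
    * all rows from the left table, plus the matching rows from the right table - left outer join (`how='left'`)
    * all rows from the right table, plus the matching rows from the left table - right outer join (`how='right'`)
    * all rows from both tables, matched where possible - full outer join (`how='outer'`)
* When a row has no match, the columns coming from the other table are filled with `NaN`.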
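The four join types can be compared side by side. The sketch below runs the same `pd.merge` call once per `how` value on the sample tables used in the cells that follow and records how many rows each join returns; the `row_counts` dictionary is just an illustrative name.

```python
import pandas as pd

# same sample data as in the cells that follow
df_a = pd.DataFrame(
    [['Anna', 24], ['Michael', 9], ['John', 40], ['Eve', 43]],
    columns=['Name', 'Age'],
)
df_b = pd.DataFrame({
    'Name': ['Eve', 'Michael', 'George', 'Catherine', 'Diana'],
    'City': ['Warsaw', 'Krakow', 'Gdansk', 'Poznan', 'Lodz'],
})

# run the same merge with each join type and record the number of result rows
row_counts = {}
for how in ['inner', 'left', 'right', 'outer']:
    merged = pd.merge(df_a, df_b, on='Name', how=how)
    row_counts[how] = len(merged)

print(row_counts)  # {'inner': 2, 'left': 4, 'right': 5, 'outer': 7}
```

These counts hold because every `Name` is unique in both tables. With duplicate keys, a join produces one row per matching pair, so the result can have more rows than either input.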

```python
import pandas as pd
```

```python
# create the first DataFrame from a list of rows, naming the columns directly
a = [['Anna', 24], ['Michael', 9], ['John', 40], ['Eve', 43]]
df_a = pd.DataFrame(a, columns=['Name', 'Age'])
df_a
```

|   | Name | Age |
|---|------|-----|
| 0 | Anna | 24 |
| 1 | Michael | 9 |
| 2 | John | 40 |
| 3 | Eve | 43 |


```python
# create the second DataFrame from a dictionary of columns
b = {
    'Name': ['Eve', 'Michael', 'George', 'Catherine', 'Diana'],
    'City': ['Warsaw', 'Krakow', 'Gdansk', 'Poznan', 'Lodz']
}
df_b = pd.DataFrame(b)
df_b
```

|   | Name | City |
|---|------|------|
| 0 | Eve | Warsaw |
| 1 | Michael | Krakow |
| 2 | George | Gdansk |
| 3 | Catherine | Poznan |
| 4 | Diana | Lodz |


```python
# inner join (the default), use intersection of keys (Name column) from both frames
pd.merge(df_a, df_b, on='Name')
```

|   | Name | Age | City |
|---|------|-----|------|
| 0 | Michael | 9 | Krakow |
| 1 | Eve | 43 | Warsaw |
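To check that an inner join really keeps the intersection of keys, the sketch below compares the names in the result with the set intersection of both `Name` columns. It also shows `DataFrame.merge`, the method form of `pd.merge`; `common_keys` and `same` are illustrative names.

```python
import pandas as pd

df_a = pd.DataFrame(
    [['Anna', 24], ['Michael', 9], ['John', 40], ['Eve', 43]],
    columns=['Name', 'Age'],
)
df_b = pd.DataFrame({
    'Name': ['Eve', 'Michael', 'George', 'Catherine', 'Diana'],
    'City': ['Warsaw', 'Krakow', 'Gdansk', 'Poznan', 'Lodz'],
})

inner = pd.merge(df_a, df_b, on='Name')  # how='inner' is the default

# the keys kept by an inner join are exactly the keys found in both frames
common_keys = set(df_a['Name']) & set(df_b['Name'])
print(sorted(common_keys))                # ['Eve', 'Michael']
print(set(inner['Name']) == common_keys)  # True

# DataFrame.merge is the method form of pd.merge and gives the same result
same = df_a.merge(df_b, on='Name')
print(same.equals(inner))                 # True
```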


```python
# left outer join - use only keys (Name column) from left (df_a) frame
pd.merge(df_a, df_b, on='Name', how='left')
```

|   | Name | Age | City |
|---|------|-----|------|
| 0 | Anna | 24 | NaN |
| 1 | Michael | 9 | Krakow |
| 2 | John | 40 | NaN |
| 3 | Eve | 43 | Warsaw |
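A left join also makes it easy to find rows that have no partner in the other table (often called a left anti-join). The sketch below relies on the `indicator=True` option of `pd.merge`, which adds a `_merge` column marking each row as `'both'`, `'left_only'` or `'right_only'`; `no_city` is an illustrative name.

```python
import pandas as pd

df_a = pd.DataFrame(
    [['Anna', 24], ['Michael', 9], ['John', 40], ['Eve', 43]],
    columns=['Name', 'Age'],
)
df_b = pd.DataFrame({
    'Name': ['Eve', 'Michael', 'George', 'Catherine', 'Diana'],
    'City': ['Warsaw', 'Krakow', 'Gdansk', 'Poznan', 'Lodz'],
})

left = pd.merge(df_a, df_b, on='Name', how='left', indicator=True)

# keep only the left rows that found no match in df_b
no_city = left.loc[left['_merge'] == 'left_only', ['Name', 'Age']]
print(no_city)
```

Here `no_city` holds Anna and John, the two people whose `City` is `NaN` in the left join above.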


```python
# right outer join - use only keys (Name column) from right (df_b) frame
pd.merge(df_a, df_b, on='Name', how='right')
```

|   | Name | Age | City |
|---|------|-----|------|
| 0 | Eve | 43.0 | Warsaw |
| 1 | Michael | 9.0 | Krakow |
| 2 | George | NaN | Gdansk |
| 3 | Catherine | NaN | Poznan |
| 4 | Diana | NaN | Lodz |
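In the right join the `Age` values appear as `43.0` and `9.0`. `NaN` is a floating-point value, so once missing ages are introduced the integer column is converted to `float64`. A minimal sketch, assuming whole-number ages, shows the dtype change and one way to get integers back with Pandas' nullable `'Int64'` dtype, which stores missing values as `<NA>`:

```python
import pandas as pd

df_a = pd.DataFrame(
    [['Anna', 24], ['Michael', 9], ['John', 40], ['Eve', 43]],
    columns=['Name', 'Age'],
)
df_b = pd.DataFrame({
    'Name': ['Eve', 'Michael', 'George', 'Catherine', 'Diana'],
    'City': ['Warsaw', 'Krakow', 'Gdansk', 'Poznan', 'Lodz'],
})

right = pd.merge(df_a, df_b, on='Name', how='right')

# missing ages are NaN, so the integer Age column is upcast to float64
age_dtype_after_merge = right['Age'].dtype
print(age_dtype_after_merge)  # float64

# the nullable integer dtype keeps whole numbers and marks missing values as <NA>
right['Age'] = right['Age'].astype('Int64')
print(right['Age'].dtype)     # Int64
```

The same upcasting happens in the full outer join below, where `Age` is missing for the people who appear only in `df_b`.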


```python
# full outer join - use union of keys (Name column) from both frames
pd.merge(df_a, df_b, on='Name', how='outer')
```

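|   | Name | Age | City |
|---|------|-----|------|
| 0 | Anna | 24.0 | NaN |
| 1 | Michael | 9.0 | Krakow |
| 2 | John | 40.0 | NaN |
| 3 | Eve | 43.0 | Warsaw |
| 4 | George | NaN | Gdansk |
| 5 | Catherine | NaN | Poznan |
| 6 | Diana | NaN | Lodz |

This row order comes from an older Pandas version. Since Pandas 2.2, an outer join sorts the keys lexicographically, so newer versions list the same rows in alphabetical order of `Name`.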
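After a full outer join it is often useful to know how many rows were matched and how many came from only one side. The sketch below counts the values of the `_merge` column added by `indicator=True`, then sorts by `Name` explicitly so the row order does not depend on the Pandas version; `merge_counts` is an illustrative name.

```python
import pandas as pd

df_a = pd.DataFrame(
    [['Anna', 24], ['Michael', 9], ['John', 40], ['Eve', 43]],
    columns=['Name', 'Age'],
)
df_b = pd.DataFrame({
    'Name': ['Eve', 'Michael', 'George', 'Catherine', 'Diana'],
    'City': ['Warsaw', 'Krakow', 'Gdansk', 'Poznan', 'Lodz'],
})

outer = pd.merge(df_a, df_b, on='Name', how='outer', indicator=True)

# how many rows came from both frames, and how many from only one of them
merge_counts = outer['_merge'].value_counts()
print(merge_counts)

# sort explicitly so the row order is the same in every Pandas version
outer = outer.sort_values('Name', ignore_index=True)
print(outer['Name'].tolist())
# ['Anna', 'Catherine', 'Diana', 'Eve', 'George', 'John', 'Michael']
```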


## --- Exercise ---

Based on the given `technologies1` and `technologies2` dictionaries, create two DataFrames and merge them with `merge` using all four join types (inner, left, right and full outer), with the `Courses` column as the key.

```python
technologies1 = {
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': [20000, 25000, 22000, 30000],
    'Duration': ['30days', '40days', '35days', '50days']
}

technologies2 = {
    'Courses': ["Spark", "Java", "Python", "Go"],
    'Discount': [2000, 2300, 1200, 2000]
}

# write your code here
```