# Concatenation & Joins
Create the first DataFrame, `df1`.

In [318]:
import pandas as pd

In [319]:
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                    index=[0, 1, 2, 3])
df1

Unnamed: 0,A,B,C,D
0,A0,B0,C0,D0
1,A1,B1,C1,D1
2,A2,B2,C2,D2
3,A3,B3,C3,D3


Create the second DataFrame, `df2`.

In [320]:
df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                    'B': ['B4', 'B5', 'B6', 'B7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D': ['D4', 'D5', 'D6', 'D7']},
                    index=[4, 5, 6, 7])
df2

Unnamed: 0,A,B,C,D
4,A4,B4,C4,D4
5,A5,B5,C5,D5
6,A6,B6,C6,D6
7,A7,B7,C7,D7


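Let's concatenate `df1` and `df2`. By default `pd.concat` stacks the frames row-wise (`axis=0`) and keeps each frame's original index labels.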

In [321]:
result = pd.concat([df1, df2])
result

Unnamed: 0,A,B,C,D
0,A0,B0,C0,D0
1,A1,B1,C1,D1
2,A2,B2,C2,D2
3,A3,B3,C3,D3
4,A4,B4,C4,D4
5,A5,B5,C5,D5
6,A6,B6,C6,D6
7,A7,B7,C7,D7
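
The row-wise concatenation above works cleanly because `df1` and `df2` have different index labels. As a minimal sketch of what happens when labels overlap, the example below uses two small illustrative frames (`top` and `bottom` are names made up for this example, not part of the notebook) together with the `ignore_index` and `keys` parameters of `pd.concat`.

```python
import pandas as pd

# Two small illustrative frames that both use the default index labels 0 and 1.
top = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
bottom = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})

# Default behaviour: original labels are kept, so 0 and 1 each appear twice.
stacked = pd.concat([top, bottom])

# ignore_index=True discards the old labels and renumbers the rows from 0.
renumbered = pd.concat([top, bottom], ignore_index=True)

# keys=... adds an outer index level recording which frame each row came from.
labelled = pd.concat([top, bottom], keys=['top', 'bottom'])

print(stacked.index.tolist())
print(renumbered.index.tolist())
print(labelled.loc['bottom'])
```

`ignore_index=True` is probably the simpler choice when the old labels carry no meaning, while `keys` is useful when you need to trace rows back to their source frame.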


In [322]:
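# axis=1 places the frames side by side, aligning rows on their index labels.
# df1 and df2 share no labels, so every row is missing one half (NaN).
result_axis1 = pd.concat([df1, df2], axis=1)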
result_axis1

Unnamed: 0,A,B,C,D,A,B,C,D
0,A0,B0,C0,D0,,,,
1,A1,B1,C1,D1,,,,
2,A2,B2,C2,D2,,,,
3,A3,B3,C3,D3,,,,
4,,,,,A4,B4,C4,D4
5,,,,,A5,B5,C5,D5
6,,,,,A6,B6,C6,D6
7,,,,,A7,B7,C7,D7
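
The empty cells above come from index alignment: with `axis=1`, `pd.concat` takes the union of the index labels by default and fills the gaps with NaN. The sketch below, assuming two small illustrative frames with partly overlapping labels (`left` and `right` are hypothetical names), contrasts the default with `join='inner'`, which keeps only the labels present in every frame.

```python
import pandas as pd

# Illustrative frames whose index labels only partly overlap (1 and 2 are shared).
left = pd.DataFrame({'A': ['A0', 'A1', 'A2']}, index=[0, 1, 2])
right = pd.DataFrame({'B': ['B1', 'B2', 'B3']}, index=[1, 2, 3])

# Default join='outer': union of labels, missing cells become NaN.
outer = pd.concat([left, right], axis=1)

# join='inner': only the labels found in both frames survive.
inner = pd.concat([left, right], axis=1, join='inner')

print(outer)
print(inner)
```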


You can concatenate a mix of Series and DataFrames. Each Series is converted to a single-column DataFrame whose column name is the name of the Series.

In [323]:
x1 = pd.Series(['extra0', 'extra1', 'extra2', 'extra3'], name='X')

pd.concat([df1, x1], axis=1)

Unnamed: 0,A,B,C,D,X
0,A0,B0,C0,D0,extra0
1,A1,B1,C1,D1,extra1
2,A2,B2,C2,D2,extra2
3,A3,B3,C3,D3,extra3
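
The column name `X` above comes from the `name` of the Series. As a small sketch of the other case, the example below concatenates a Series that has no name; pandas then labels the new column with an integer instead. The frame and Series here are illustrative stand-ins, not objects from the notebook.

```python
import pandas as pd

# A small illustrative frame and a Series created without a name.
frame = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
unnamed = pd.Series(['extra0', 'extra1'])

# The unnamed Series gets an integer column label.
combined = pd.concat([frame, unnamed], axis=1)

# Naming the Series first gives a readable column label.
named = pd.concat([frame, unnamed.rename('X')], axis=1)

print(combined.columns.tolist())
print(named.columns.tolist())
```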


Create the 3rd DataFrame named `df3`.

In [324]:
df3 = pd.DataFrame({'E': ['E0', 'E1', 'E2', 'E3'],
                    'F': ['F0', 'F1', 'F2', 'F3'],
                    'G': ['G0', 'G1', 'G2', 'G3'],
                    'H': ['H0', 'H1', 'H2', 'H3']},
                    index=[0, 1, 2, 3])
df3


Unnamed: 0,E,F,G,H
0,E0,F0,G0,H0
1,E1,F1,G1,H1
2,E2,F2,G2,H2
3,E3,F3,G3,H3


In [339]:
# df1 and df3 share the same index labels (0-3), so the columns line up with no missing values.
pd.concat([df1, df3], axis=1)

Unnamed: 0,A,B,C,D,E,F,G,H
0,A0,B0,C0,D0,E0,F0,G0,H0
1,A1,B1,C1,D1,E1,F1,G1,H1
2,A2,B2,C2,D2,E2,F2,G2,H2
3,A3,B3,C3,D3,E3,F3,G3,H3


## Append

In [326]:
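# Note: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0.
# On newer versions this line raises AttributeError; use pd.concat([df1, df2]) instead.
result_append = df1.append(df2)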
result_append

Unnamed: 0,A,B,C,D
0,A0,B0,C0,D0
1,A1,B1,C1,D1
2,A2,B2,C2,D2
3,A3,B3,C3,D3
4,A4,B4,C4,D4
5,A5,B5,C5,D5
6,A6,B6,C6,D6
7,A7,B7,C7,D7


Does it produce the same results as `pd.concat([df1, df2])`?
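
One way to answer the question is to compare the two results with `DataFrame.equals`. The sketch below uses small stand-in frames and guards the `append` call with `hasattr`, on the assumption that the notebook may be run on a pandas version where `DataFrame.append` no longer exists; in that case the comparison is skipped and `same` stays `None`.

```python
import pandas as pd

# Small stand-ins for df1 and df2 with non-overlapping index labels.
df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']}, index=[0, 1])
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']}, index=[2, 3])

concat_result = pd.concat([df1, df2])

if hasattr(pd.DataFrame, 'append'):
    # Older pandas: compare append against concat directly.
    same = df1.append(df2).equals(concat_result)
else:
    # Newer pandas: append is gone, so there is nothing to compare.
    same = None

print(same)
```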

# Joins


Let's create a small DataFrame named `df_a` and use it to try left, right, inner and outer joins.


In [327]:
raw_data = {
        'subject_id': ['1', '2', '3', '4', '5'],
        'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'], 
        'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches']}
df_a = pd.DataFrame(raw_data, columns = ['subject_id', 'first_name', 'last_name'])
df_a

Unnamed: 0,subject_id,first_name,last_name
0,1,Alex,Anderson
1,2,Amy,Ackerman
2,3,Allen,Ali
3,4,Alice,Aoni
4,5,Ayoung,Atiches


Create a 2nd DataFrame named `df_b`.

In [328]:
raw_data = {
        'subject_id': ['4', '5', '6', '7', '8'],
        'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'], 
        'last_name': ['Bonder', 'Black', 'Balwner', 'Brice', 'Btisan']}
df_b = pd.DataFrame(raw_data, columns = ['subject_id', 'first_name', 'last_name'])
df_b

Unnamed: 0,subject_id,first_name,last_name
0,4,Billy,Bonder
1,5,Brian,Black
2,6,Bran,Balwner
3,7,Bryce,Brice
4,8,Betty,Btisan


Create a 3rd DataFrame named `df_n`.

In [329]:
raw_data = {
        'subject_id': ['1', '2', '3', '4', '5', '7', '8', '9', '10', '11'],
        'test_id': [51, 15, 15, 61, 16, 14, 15, 1, 61, 16]}
df_n = pd.DataFrame(raw_data, columns = ['subject_id','test_id'])
df_n

Unnamed: 0,subject_id,test_id
0,1,51
1,2,15
2,3,15
3,4,61
4,5,16
5,7,14
6,8,15
7,9,1
8,10,61
9,11,16


## Merge with a *left* join

A left outer join returns every record from `df_a`, together with the matching records from `df_b` where they exist. Where there is no match, the columns that come from `df_b` contain NaN.

In [330]:
pd.merge(df_a, df_b, on='subject_id', how='left')

Unnamed: 0,subject_id,first_name_x,last_name_x,first_name_y,last_name_y
0,1,Alex,Anderson,,
1,2,Amy,Ackerman,,
2,3,Allen,Ali,,
3,4,Alice,Aoni,Billy,Bonder
4,5,Ayoung,Atiches,Brian,Black
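
The `_x` and `_y` suffixes in the output are the defaults pandas adds when both frames have a non-key column with the same name. As a minimal sketch, the example below rebuilds trimmed copies of `df_a` and `df_b` (only `subject_id` and `first_name`) and uses the `suffixes` and `indicator` parameters of `pd.merge` to make the origin of each column and row explicit. The suffix strings `_a` and `_b` are arbitrary choices for this example.

```python
import pandas as pd

# Trimmed copies of df_a and df_b, kept short for illustration.
df_a = pd.DataFrame({'subject_id': ['1', '2', '3', '4', '5'],
                     'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung']})
df_b = pd.DataFrame({'subject_id': ['4', '5', '6', '7', '8'],
                     'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty']})

# suffixes replaces the default _x/_y; indicator=True adds a _merge column
# that says whether each row matched ('both') or came from the left only.
left = pd.merge(df_a, df_b, on='subject_id', how='left',
                suffixes=('_a', '_b'), indicator=True)

print(left)
```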


Observe the effect of swapping `df_a` and `df_b`. The merge is still a left join, but `df_b` is now the left frame, so all of its records are kept and the unmatched `df_a` columns contain NaN.

In [331]:
pd.merge(df_b, df_a, on='subject_id', how='left')

Unnamed: 0,subject_id,first_name_x,last_name_x,first_name_y,last_name_y
0,4,Billy,Bonder,Alice,Aoni
1,5,Brian,Black,Ayoung,Atiches
2,6,Bran,Balwner,,
3,7,Bryce,Brice,,
4,8,Betty,Btisan,,


## Merge with a *right* join 
A right outer join returns every record from `df_b`, together with the matching records from `df_a` where they exist. Where there is no match, the columns that come from `df_a` contain NaN.


In [332]:
pd.merge(df_a, df_b, on='subject_id', how='right')

Unnamed: 0,subject_id,first_name_x,last_name_x,first_name_y,last_name_y
0,4,Alice,Aoni,Billy,Bonder
1,5,Ayoung,Atiches,Brian,Black
2,6,,,Bran,Balwner
3,7,,,Bryce,Brice
4,8,,,Betty,Btisan


Observe the effect of swapping `df_a` and `df_b`. The merge is still a right join, but `df_a` is now the right frame, so all of its records are kept and the unmatched `df_b` columns contain NaN.

In [333]:
pd.merge(df_b, df_a, on='subject_id', how='right')

Unnamed: 0,subject_id,first_name_x,last_name_x,first_name_y,last_name_y
0,4,Billy,Bonder,Alice,Aoni
1,5,Brian,Black,Ayoung,Atiches
2,1,,,Alex,Anderson
3,2,,,Amy,Ackerman
4,3,,,Allen,Ali
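
The left and right join outputs suggest that a right join is a left join with the two frames swapped, apart from column order. The sketch below checks this on trimmed copies of `df_a` and `df_b`, using explicit suffixes (`_a` and `_b`, chosen for this example) so that the same columns get the same names in both results.

```python
import pandas as pd

# Trimmed copies of df_a and df_b, kept short for illustration.
df_a = pd.DataFrame({'subject_id': ['1', '2', '3', '4', '5'],
                     'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung']})
df_b = pd.DataFrame({'subject_id': ['4', '5', '6', '7', '8'],
                     'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty']})

# Right join: keep every row of df_b.
right = pd.merge(df_a, df_b, on='subject_id', how='right',
                 suffixes=('_a', '_b'))

# Left join with the frames swapped: also keeps every row of df_b.
swapped_left = pd.merge(df_b, df_a, on='subject_id', how='left',
                        suffixes=('_b', '_a'))

# Put the columns in the same order before comparing the two results.
same_rows = swapped_left[right.columns].equals(right)
print(same_rows)
```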


## Merge with an *outer* join
An outer join produces the set of all records in `df_a` and `df_b`, with matching records from both sides where available. If there is no match, the missing side will contain null.

In [334]:
pd.merge(df_a, df_b, on='subject_id', how='outer')

Unnamed: 0,subject_id,first_name_x,last_name_x,first_name_y,last_name_y
0,1,Alex,Anderson,,
1,2,Amy,Ackerman,,
2,3,Allen,Ali,,
3,4,Alice,Aoni,Billy,Bonder
4,5,Ayoung,Atiches,Brian,Black
5,6,,,Bran,Balwner
6,7,,,Bryce,Brice
7,8,,,Betty,Btisan


Observe the effect of swapping `df_a` and `df_b`. The merge is still an outer join and returns the same set of records, but in the output below the `_x` and `_y` columns trade sides and the rows appear in a different order.

In [335]:
pd.merge(df_b, df_a, on='subject_id', how='outer')

Unnamed: 0,subject_id,first_name_x,last_name_x,first_name_y,last_name_y
0,4,Billy,Bonder,Alice,Aoni
1,5,Brian,Black,Ayoung,Atiches
2,6,Bran,Balwner,,
3,7,Bryce,Brice,,
4,8,Betty,Btisan,,
5,1,,,Alex,Anderson
6,2,,,Amy,Ackerman
7,3,,,Allen,Ali
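
With an outer join it can be hard to see at a glance which rows matched. As a minimal sketch, the example below uses the `indicator=True` option of `pd.merge` on trimmed copies of `df_a` and `df_b` and counts how many rows fall into each category.

```python
import pandas as pd

# Trimmed copies of df_a and df_b, kept short for illustration.
df_a = pd.DataFrame({'subject_id': ['1', '2', '3', '4', '5'],
                     'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung']})
df_b = pd.DataFrame({'subject_id': ['4', '5', '6', '7', '8'],
                     'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty']})

# indicator=True adds a _merge column with the values
# 'left_only', 'right_only' or 'both' for every row.
outer = pd.merge(df_a, df_b, on='subject_id', how='outer', indicator=True)

# Count the rows in each category.
counts = outer['_merge'].value_counts()
print(counts)
```

The rows marked `both` are exactly the ones an inner join would return.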


## Merge with an *inner* join
An inner join returns only the records whose key appears in both `df_a` and `df_b`, that is, the intersection of the two sets of keys.


In [336]:
pd.merge(df_a, df_b, on='subject_id', how='inner')

Unnamed: 0,subject_id,first_name_x,last_name_x,first_name_y,last_name_y
0,4,Alice,Aoni,Billy,Bonder
1,5,Ayoung,Atiches,Brian,Black


In [337]:
# Swapping the frames returns the same two records; only the _x and _y sides trade places.
pd.merge(df_b, df_a, on='subject_id', how='inner')

Unnamed: 0,subject_id,first_name_x,last_name_x,first_name_y,last_name_y
0,4,Billy,Bonder,Alice,Aoni
1,5,Brian,Black,Ayoung,Atiches
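
`df_n` was created earlier but does not appear in any of the joins above. As a sketch of one way it might be used, the example below rebuilds `df_n` and trimmed copies of `df_a` and `df_b`, then attaches each person's `test_id` with an inner join. Note that `subject_id` is stored as strings in all three frames; the keys need a matching type for the join to find matches.

```python
import pandas as pd

# Trimmed copies of df_a and df_b, plus df_n as defined earlier.
df_a = pd.DataFrame({'subject_id': ['1', '2', '3', '4', '5'],
                     'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung']})
df_b = pd.DataFrame({'subject_id': ['4', '5', '6', '7', '8'],
                     'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty']})
df_n = pd.DataFrame({'subject_id': ['1', '2', '3', '4', '5', '7', '8', '9', '10', '11'],
                     'test_id': [51, 15, 15, 61, 16, 14, 15, 1, 61, 16]})

# Every df_a subject has a test_id, so all five rows survive the inner join.
scores_a = pd.merge(df_a, df_n, on='subject_id', how='inner')

# Subject '6' has no entry in df_n, so that row is dropped from df_b's result.
scores_b = pd.merge(df_b, df_n, on='subject_id', how='inner')

print(scores_a)
print(scores_b)
```

Because the only shared column is the key, no `_x` or `_y` suffixes are added here.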
