# Concat, Merge and Join

- merge combines DataFrames based on the values of common columns (the indices can be used as well, via left_index=True and/or right_index=True)
- concat appends DataFrames one below the other (axis=0) or side by side (axis=1)
- join combines DataFrames based on their index; it can be used instead of merge with the option left_index=True
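The relationship between the three functions can be sketched on two small frames. The names `left_idx` and `right_idx` are hypothetical and only serve this illustration; the sketch assumes both frames carry the join key in their index.

```python
import pandas as pd

# Hypothetical frames that share the key "K0" in their index
left_idx = pd.DataFrame({"A": ["A0", "A1"]}, index=["K0", "K1"])
right_idx = pd.DataFrame({"B": ["B0", "B1"]}, index=["K0", "K2"])

# merge on the index of both frames (inner join by default)
merged = pd.merge(left_idx, right_idx, left_index=True, right_index=True)

# join works on the index directly; how="inner" matches merge's default
joined = left_idx.join(right_idx, how="inner")

# concat stacks frames: rows with axis=0, columns with axis=1
stacked = pd.concat([left_idx, left_idx], axis=0)

print(merged.equals(joined))
print(stacked.shape)
```

Note that join defaults to how="left" while merge defaults to how="inner", so the two calls only agree when the join type is set explicitly.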

In [48]:
import numpy as np
np.random.seed(0)
import pandas as pd

## Recap: Concatenation of np.ndarrays

In [49]:
x = [[1, 2],
     [3, 4]]

print(np.concatenate([x, x], axis=0))

[[1 2]
 [3 4]
 [1 2]
 [3 4]]


In [50]:
x = [[1, 2],
     [3, 4]]

print(np.concatenate([x, x], axis=1))

[[1 2 1 2]
 [3 4 3 4]]


## DataFrames for the Examples

In [51]:
df1 = pd.DataFrame(
    {
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
    },
    index=[0, 1, 2, 3]
)

df2 = pd.DataFrame(
    {
        "A": ["A4", "A5", "A6", "A7"],
        "B": ["B4", "B5", "B6", "B7"],
    },
    index=[0, 1, 2, 5]
)

## Append with Pandas

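The append() instance methods on Series and DataFrame are a shortcut to concat(). They concatenate along axis=0, namely the index. Note that append() was deprecated in pandas 1.4 and removed in pandas 2.0; on current versions, use pd.concat() instead.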

```python
DataFrame.append(
    other,
    ...
)
```

- other: DataFrame or Series/dict-like object, or list of these. The data to append.
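On pandas versions that no longer ship append() (2.0 and later), the same result can be obtained with pd.concat(). The following is a minimal sketch; the frames `a` and `b` are hypothetical and not part of the examples in this notebook.

```python
import pandas as pd

a = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
b = pd.DataFrame({"A": ["A2"], "B": ["B2"]})

# Counterpart of a.append(b): rows of b are placed below a, index labels are kept
appended = pd.concat([a, b])

# Counterpart of a.append(b, ignore_index=True): the result gets a fresh 0..n-1 index
renumbered = pd.concat([a, b], ignore_index=True)

print(appended)
print(renumbered)
```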

In [52]:
print(df1)
print(df2)

    A   B
0  A0  B0
1  A1  B1
2  A2  B2
3  A3  B3
    A   B
0  A4  B4
1  A5  B5
2  A6  B6
5  A7  B7


In [53]:
# Requires pandas < 2.0; on newer versions use pd.concat([df1, df2])
df12_append = df1.append(df2)

print(df12_append)

    A   B
0  A0  B0
1  A1  B1
2  A2  B2
3  A3  B3
0  A4  B4
1  A5  B5
2  A6  B6
5  A7  B7


## Concatenation with Pandas

Like its sibling function on ndarrays, numpy.concatenate, pandas.concat takes a list or dict of homogeneously typed objects and concatenates them, with configurable handling of “what to do with the other axes”.

```python
pd.concat(
    objs,
    axis=0,
    join="outer",
    ...
)
```

- objs : a sequence or mapping of Series or DataFrame objects.
- axis : {0, 1, …}, default 0. The axis to concatenate along.
- join : {‘inner’, ‘outer’}, default ‘outer’. How to handle indexes on other axis(es). Outer for union and inner for intersection.
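The effect of join only becomes visible when the frames differ on the other axis. A minimal sketch, using two hypothetical frames `p` and `q` that share only column B:

```python
import pandas as pd

p = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
q = pd.DataFrame({"B": ["B2", "B3"], "C": ["C2", "C3"]})

# outer: union of the columns; cells without data are filled with NaN
outer = pd.concat([p, q], join="outer")

# inner: intersection of the columns; only the shared column B survives
inner = pd.concat([p, q], join="inner")

print(list(outer.columns))
print(list(inner.columns))
```

When concatenating along axis=0, join applies to the columns; with axis=1 it applies to the row index instead.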

In [54]:
print(df1)
print(df2)

    A   B
0  A0  B0
1  A1  B1
2  A2  B2
3  A3  B3
    A   B
0  A4  B4
1  A5  B5
2  A6  B6
5  A7  B7


In [55]:
df_concat = pd.concat(
    [df1, df2]
)

print(df_concat)

    A   B
0  A0  B0
1  A1  B1
2  A2  B2
3  A3  B3
0  A4  B4
1  A5  B5
2  A6  B6
5  A7  B7


In [56]:
df_concat = pd.concat(
    [df1, df2],
    join="inner"
)

print(df_concat)

    A   B
0  A0  B0
1  A1  B1
2  A2  B2
3  A3  B3
0  A4  B4
1  A5  B5
2  A6  B6
5  A7  B7


In [57]:
df_concat = pd.concat(
    [df1, df2],
    join="inner",
    axis=1
)

print(df_concat)

    A   B   A   B
0  A0  B0  A4  B4
1  A1  B1  A5  B5
2  A2  B2  A6  B6


In [58]:
try:
    df_concat = pd.concat(
        [df1, df2],
        join="inner",
        verify_integrity=True
    )
except ValueError:  # raised because the index labels 0, 1, 2 occur in both frames
    df_concat = None

print(df_concat)

None


In [59]:
df_concat = pd.concat(
    [df1, df2],
    join="inner",
    ignore_index=True
)

print(df_concat)

    A   B
0  A0  B0
1  A1  B1
2  A2  B2
3  A3  B3
4  A4  B4
5  A5  B5
6  A6  B6
7  A7  B7


## Merge

Pandas provides a single function, merge(), as the entry point for all standard database join operations between DataFrame or named Series objects.

```python
pd.merge(
    left,
    right,
    how="inner",
    on=None,
    ...
)
```

- left: A DataFrame or named Series object.
- right: Another DataFrame or named Series object.
- how: {‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}, default ‘inner’
    - left: use only keys from left frame
    - right: use only keys from right frame
    - outer: use union of keys from both frames
    - inner: use intersection of keys from both frames
    - cross: creates the cartesian product from both frames
- on: Column or index level names to join on. Must be found in both the left and right DataFrame and/or Series objects.
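The remaining values of how can be sketched as follows. The frames `lhs` and `rhs` are hypothetical stand-ins with partially overlapping keys; how="cross" assumes pandas 1.2 or later.

```python
import pandas as pd

lhs = pd.DataFrame({"key": ["K0", "K1", "K2"], "A": ["A0", "A1", "A2"]})
rhs = pd.DataFrame({"key": ["K0", "K1", "K3"], "B": ["B0", "B1", "B2"]})

# left: keep every key of lhs (K0, K1, K2); B is NaN where rhs has no match
left_merge = pd.merge(lhs, rhs, on="key", how="left")

# right: keep every key of rhs (K0, K1, K3); A is NaN where lhs has no match
right_merge = pd.merge(lhs, rhs, on="key", how="right")

# cross: every row of lhs paired with every row of rhs; no `on` is given,
# and the two "key" columns are disambiguated with the suffixes _x and _y
cross_merge = pd.merge(lhs, rhs, how="cross")

print(left_merge)
print(right_merge)
print(cross_merge.shape)
```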

In [60]:
left = pd.DataFrame(
    {
        "key": ["K0", "K1", "K2"],
        "A": ["A0", "A1", "A2"]
    }
)


right = pd.DataFrame(
    {
        "key": ["K0", "K1", "K2"],
        "B": ["B0", "B1", "B2"]
    }
)

In [61]:
print(left)
print(right)

  key   A
0  K0  A0
1  K1  A1
2  K2  A2
  key   B
0  K0  B0
1  K1  B1
2  K2  B2


In [62]:
print(pd.merge(left, right, on="key"))

  key   A   B
0  K0  A0  B0
1  K1  A1  B1
2  K2  A2  B2


In [63]:
left2 = pd.DataFrame(
    {
        "key": ["K0", "K1", "K2"],
        "A": ["A0", "A1", "A2"]
    }
)


right2 = pd.DataFrame(
    {
        "key": ["K0", "K1", "K3"],
        "B": ["B0", "B1", "B2"]
    }
)

In [64]:
print(left2)
print(right2)

  key   A
0  K0  A0
1  K1  A1
2  K2  A2
  key   B
0  K0  B0
1  K1  B1
2  K3  B2


In [65]:
print(pd.merge(left2, right2, on="key"))

  key   A   B
0  K0  A0  B0
1  K1  A1  B1


In [66]:
print(pd.merge(left2, right2, on="key", how="outer"))

  key    A    B
0  K0   A0   B0
1  K1   A1   B1
2  K2   A2  NaN
3  K3  NaN   B2


## Join

DataFrame.join() is a convenient method for combining the columns of two potentially differently-indexed DataFrames into a single result DataFrame.

```python
DataFrame.join(
    other,
    on=None,
    how='left',
    ...
)
```

- other: DataFrame. Its index should be similar to one of the columns in this one.
- on: Column or index level name(s)
- how: {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘left’
    - left: use calling frame’s index
    - right: use other’s index.
    - outer: form union of calling frame’s index with other’s index
    - inner: form intersection of calling frame’s index with other’s index

![join](../media/join.png)
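The on parameter listed above lets the calling frame be matched through one of its columns instead of its index. A minimal sketch, with hypothetical frames `records` and `lookup`:

```python
import pandas as pd

records = pd.DataFrame(
    {
        "key": ["K0", "K1", "K0", "K2"],
        "A": ["A0", "A1", "A2", "A3"],
    }
)

lookup = pd.DataFrame(
    {
        "B": ["B0", "B1"],
    },
    index=["K0", "K1"]
)

# Match the "key" column of records against the index of lookup.
# The default how="left" keeps all rows (and the index) of records.
result = records.join(lookup, on="key")

print(result)
```

Rows whose key has no counterpart in the index of lookup (here K2) receive NaN in column B.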

In [67]:
left = pd.DataFrame(
    {
        "A": ["A0", "A1", "A2"],
    },
    index=["K0", "K1", "K2"]
)


right = pd.DataFrame(
    {
        "B": ["B0", "B1", "B2"],
    },
    index=["K0", "K1", "K3"]
)

In [68]:
print(left)
print(right)

     A
K0  A0
K1  A1
K2  A2
     B
K0  B0
K1  B1
K3  B2


In [69]:
print(left.join(right, how="outer"))

      A    B
K0   A0   B0
K1   A1   B1
K2   A2  NaN
K3  NaN   B2


In [70]:
print(left.join(right, how="inner"))

     A   B
K0  A0  B0
K1  A1  B1


In [71]:
print(left.join(right, how="left"))

     A    B
K0  A0   B0
K1  A1   B1
K2  A2  NaN


In [72]:
print(left.join(right, how="right"))

      A   B
K0   A0  B0
K1   A1  B1
K3  NaN  B2
