# Concat, Join and Compare with Pandas

Pandas provides several ways to combine Series and DataFrame objects: set logic (union or intersection) for the indexes when concatenating, and relational-algebra functionality for join / merge-type operations.

Note: Adding a column to a DataFrame is relatively fast. However, adding a row requires a copy, and may be expensive.

In [250]:
import numpy as np
np.random.seed(0)
import pandas as pd

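To illustrate the note above, here is a minimal sketch of one common way to avoid repeated row additions: collect the pieces in a plain Python list and concatenate once. The `records` data and the names `pieces` and `df_rows` are made up for this illustration.

```python
import pandas as pd

# Hypothetical records arriving one at a time (illustrative data only).
records = [{"A": f"A{i}", "B": f"B{i}"} for i in range(4)]

# Collect one-row frames in a list first ...
pieces = [pd.DataFrame([rec]) for rec in records]

# ... then concatenate once, instead of growing a DataFrame row by row.
df_rows = pd.concat(pieces, ignore_index=True)

# Adding a column, by contrast, is a single assignment.
df_rows["C"] = [f"C{i}" for i in range(len(df_rows))]
print(df_rows)
```
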
## Recap: Concatenation of np.ndarrays

In [251]:
x = [1, 2, 3]
y = [4, 5, 6]
z = [7, 8, 9]

print(np.concatenate([x, y, z]))

[1 2 3 4 5 6 7 8 9]


In [252]:
x = [[1, 2],
     [3, 4]]

print(np.concatenate([x, x], axis=1))

[[1 2 1 2]
 [3 4 3 4]]

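For comparison, a small sketch of the same 2×2 input concatenated along both axes; the variable names `rows` and `cols` are chosen only for this example.

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4]])

# axis=0 stacks the arrays on top of each other (more rows).
rows = np.concatenate([x, x], axis=0)

# axis=1 places them side by side (more columns).
cols = np.concatenate([x, x], axis=1)

print(rows.shape)  # (4, 2)
print(cols.shape)  # (2, 4)
```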

## Append with Pandas

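The append() instance methods on Series and DataFrame are a shortcut for concat(). They concatenate along axis=0, namely the index. Note that append() was deprecated in pandas 1.4 and removed in pandas 2.0, so on recent versions use pd.concat() instead.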

```python
DataFrame.append(
    other,
    ...
)
```

- other : DataFrame or Series/dict-like object, or list of these. The data to append.

In [253]:
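# df1 and df3 are defined in the "Concatenation with Pandas" section below.
# DataFrame.append() only exists in pandas < 2.0.
df_append = df1.append(df3)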

print(df_append)

    A   B
0  A0  B0
1  A1  B1
2  A2  B2
3  A3  B3
0  A4  B4
1  A5  B5
2  A6  B6
5  A7  B7

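The same result can be obtained with pd.concat, which works regardless of whether append() is available in the installed pandas version. The sketch below rebuilds the two frames so that it runs on its own; `df_reset` is a name introduced here to show the optional `ignore_index=True` variant, which replaces the duplicated labels with a fresh 0..n-1 index.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"A": ["A0", "A1", "A2", "A3"], "B": ["B0", "B1", "B2", "B3"]},
    index=[0, 1, 2, 3],
)
df3 = pd.DataFrame(
    {"A": ["A4", "A5", "A6", "A7"], "B": ["B4", "B5", "B6", "B7"]},
    index=[0, 1, 2, 5],
)

# Equivalent of df1.append(df3): original index labels are kept,
# so labels 0, 1 and 2 appear twice.
df_append = pd.concat([df1, df3])
print(df_append)

# Discard the original labels and renumber the rows.
df_reset = pd.concat([df1, df3], ignore_index=True)
print(df_reset)
```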

## Concatenation with Pandas

Like its sibling function on ndarrays, numpy.concatenate, pandas.concat takes a list or dict of homogeneously-typed objects and concatenates them with some configurable handling of “what to do with the other axes”.

```python
pd.concat(
    objs,
    axis=0,
    join="outer",
    ...
)
```

- objs : a sequence or mapping of Series or DataFrame objects.
- axis : {0, 1, …}, default 0. The axis to concatenate along.
- join : {‘inner’, ‘outer’}, default ‘outer’. How to handle indexes on other axis(es). Outer for union and inner for intersection.

In [254]:
s1 = pd.Series([1, 2, 3], index=['A', 'B', 'C'])
s2 = pd.Series([4, 5, 6], index=['D', 'E', 'F'])

In [255]:
print(pd.concat([s1, s2]))

A    1
B    2
C    3
D    4
E    5
F    6
dtype: int64


In [256]:
df1 = pd.DataFrame(
    {
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
    },
    index=[0, 1, 2, 3]
)

df2 = pd.DataFrame(
    {
        "A": ["A4", "A5", "A6", "A7"],
        "B": ["B4", "B5", "B6", "B7"],
    },
    index=[4, 5, 6, 7]
)

df3 = pd.DataFrame(
    {
        "A": ["A4", "A5", "A6", "A7"],
        "B": ["B4", "B5", "B6", "B7"],
    },
    index=[0, 1, 2, 5]
)

In [257]:
df_concat1 = pd.concat(
    [df1, df2],
    join="outer",
    axis="rows"
)

print(df_concat1)

    A   B
0  A0  B0
1  A1  B1
2  A2  B2
3  A3  B3
4  A4  B4
5  A5  B5
6  A6  B6
7  A7  B7


In [258]:
df_concat2 = pd.concat(
    [df1, df2],
    join="inner",
    axis=0
)

print(df_concat2)

    A   B
0  A0  B0
1  A1  B1
2  A2  B2
3  A3  B3
4  A4  B4
5  A5  B5
6  A6  B6
7  A7  B7

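In the two examples above, `join="outer"` and `join="inner"` give the same result because df1 and df2 have identical columns. The difference only shows when the other axis differs. A minimal sketch, using two small frames (`df_ab` and `df_bc`) invented for this illustration:

```python
import pandas as pd

# Two frames that share only column "B" (illustrative data).
df_ab = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
df_bc = pd.DataFrame({"B": ["B2", "B3"], "C": ["C2", "C3"]})

# Outer: union of the columns, missing cells are filled with NaN.
outer = pd.concat([df_ab, df_bc], join="outer", axis=0)
print(outer)

# Inner: intersection of the columns, only "B" survives.
inner = pd.concat([df_ab, df_bc], join="inner", axis=0)
print(inner)
```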

In [259]:
df_concat3 = pd.concat(
    [df1, df3],
    join="outer",
    axis=1
)

print(df_concat3)

     A    B    A    B
0   A0   B0   A4   B4
1   A1   B1   A5   B5
2   A2   B2   A6   B6
3   A3   B3  NaN  NaN
5  NaN  NaN   A7   B7


In [260]:
df_concat4 = pd.concat(
    [df1, df3],
    join="inner",
    axis=1
)

print(df_concat4)

    A   B   A   B
0  A0  B0  A4  B4
1  A1  B1  A5  B5
2  A2  B2  A6  B6

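Since `objs` may also be a mapping, a dict can be passed to pd.concat; its keys then become an outer index level that records which input each row came from. The sketch below uses two small frames and the labels "first" and "second", all chosen only for illustration. Passing the `keys` argument together with a list should give the same labelling.

```python
import pandas as pd

df_a = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
df_b = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]})

# Dict keys become the outer level of a two-level index.
labelled = pd.concat({"first": df_a, "second": df_b})
print(labelled)

# The same labelling expressed with a list plus the keys argument.
labelled_keys = pd.concat([df_a, df_b], keys=["first", "second"])

# Select the rows that came from the second input.
print(labelled.loc["second"])
```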

## Merge

Pandas provides a single function, merge(), as the entry point for all standard database join operations between DataFrame or named Series objects.

```python
pd.merge(
    left,
    right,
    how="inner",
    on=None,
    ...
)
```

- left: A DataFrame or named Series object.
- right: Another DataFrame or named Series object.
- how: {‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}, default ‘inner’
    - left: use only keys from left frame
    - right: use only keys from right frame
    - outer: use union of keys from both frames
    - inner: use intersection of keys from both frames
    - cross: creates the cartesian product from both frames
- on: Column or index level names to join on. Must be found in both the left and right DataFrame and/or Series objects.

In [261]:
left = pd.DataFrame(
    {
        "key": ["K0", "K1", "K2"],
        "A": ["A0", "A1", "A2"]
    }
)


right = pd.DataFrame(
    {
        "key": ["K0", "K1", "K2"],
        "B": ["B0", "B1", "B2"]
    }
)

print(left)
print(right)

  key   A
0  K0  A0
1  K1  A1
2  K2  A2
  key   B
0  K0  B0
1  K1  B1
2  K2  B2


In [262]:
print(pd.merge(left, right, on="key"))

  key   A   B
0  K0  A0  B0
1  K1  A1  B1
2  K2  A2  B2

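In the example above both frames contain the same keys, so every `how` option would give the same rows. The following sketch uses frames whose keys only partly overlap (`left_demo` and `right_demo` are names made up here) to show how the options differ. It also uses the `indicator=True` option of merge, which adds a `_merge` column reporting where each row came from.

```python
import pandas as pd

left_demo = pd.DataFrame({"key": ["K0", "K1", "K2"], "A": ["A0", "A1", "A2"]})
right_demo = pd.DataFrame({"key": ["K0", "K1", "K3"], "B": ["B0", "B1", "B3"]})

# Compare which keys survive under each join type.
results = {}
for how in ["inner", "left", "right", "outer"]:
    results[how] = pd.merge(left_demo, right_demo, on="key", how=how)
    print(how, sorted(results[how]["key"]))

# indicator=True marks each row as 'both', 'left_only' or 'right_only'.
outer = pd.merge(left_demo, right_demo, on="key", how="outer", indicator=True)
print(outer)
```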

## Join

DataFrame.join() is a convenient method for combining the columns of two potentially differently-indexed DataFrames into a single result DataFrame.

```python
DataFrame.join(
    other,
    on=None,
    how='left',
    ...
)
```

- other: DataFrame. Index should be similar to one of the columns in this one.
- on: Column or index level name(s) in the calling frame to join on the index of other. If omitted, the join is index-on-index.
- how: {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘left’
    - left: use calling frame’s index
    - right: use other’s index.
    - outer: form union of calling frame’s index with other’s index
    - inner: form intersection of calling frame’s index with other’s index

In [263]:
left = pd.DataFrame(
    {
        "A": ["A0", "A1", "A2"],
    },
    index=["K0", "K1", "K2"]
)


right = pd.DataFrame(
    {
        "B": ["B0", "B1", "B2"],
    },
    index=["K0", "K1", "K3"]
)

print(left)
print(right)

     A
K0  A0
K1  A1
K2  A2
     B
K0  B0
K1  B1
K3  B2


In [264]:
print(left.join(right, how="outer"))

      A    B
K0   A0   B0
K1   A1   B1
K2   A2  NaN
K3  NaN   B2


In [265]:
print(left.join(right, how="inner"))

     A   B
K0  A0  B0
K1  A1  B1

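The examples above join index-on-index. As a sketch of the `on` parameter, the frame below keeps its keys in an ordinary column and joins that column against the index of the other frame, using the default `how="left"`. The names `left_col`, `right_idx`, `joined` and `merged` are introduced only for this illustration; the pd.merge call shows the equivalent spelled out with `left_on` and `right_index`.

```python
import pandas as pd

# Keys live in a regular column on the left ...
left_col = pd.DataFrame({"key": ["K0", "K1", "K2"], "A": ["A0", "A1", "A2"]})

# ... and in the index on the right.
right_idx = pd.DataFrame({"B": ["B0", "B1", "B2"]}, index=["K0", "K1", "K3"])

# Match left_col["key"] against right_idx's index; K2 has no match, so B is NaN.
joined = left_col.join(right_idx, on="key")
print(joined)

# The same operation written as a merge.
merged = pd.merge(left_col, right_idx, left_on="key", right_index=True, how="left")
print(merged)
```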