# 🐍 Python + Pandas 🐼

#### Utilizing the Validate Parameter in the Pandas Merge Function

What if we want to ensure the integrity of our data merges? 🤔

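The `validate` parameter checks whether a merge of two datasets follows a given rule. Each rule states whether the merge keys (the column(s) we merge on; in our case, `PersonID`) must be unique in the left dataset, the right dataset, or both. If the data breaks the rule, pandas raises a `MergeError` instead of returning a merged DataFrame.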
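To see what a rule actually inspects, a minimal sketch: check each side's key column with `Series.is_unique`. The two tiny frames below reuse only the `PersonID` values from the example that follows; `left_unique` and `right_unique` are illustrative names.

```python
import pandas as pd

# Only the key column matters for validation: PersonID is unique on the left,
# but PersonID 3 appears twice on the right.
left = pd.DataFrame({"PersonID": [1, 2, 3]})
right = pd.DataFrame({"PersonID": [3, 3, 4]})

# Series.is_unique is True when no value in the column repeats.
left_unique = left["PersonID"].is_unique
right_unique = right["PersonID"].is_unique

print(left_unique, right_unique)  # True False
```

A one-to-one rule needs both values to be `True`; a one-to-many rule only needs the left one.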

In [2]:
import pandas as pd

In [3]:
# Left dataset: one row per person, so PersonID is unique
person_data = [
    {"PersonID": 1, "Name": "Wikus"},
    {"PersonID": 2, "Name": "Thandi"},
    {"PersonID": 3, "Name": "Xeki"},
]

person_df = pd.DataFrame(person_data)

In [4]:
# Right dataset: PersonID 3 made two purchases; PersonID 4 has no matching person
purchase_data = {
    "PersonID": [3, 3, 4],
    "Item": ["Egg", "Potato", "Petrol"],
    "Quantity": [5, 10, 15],
    "Price": [10.50, 29.99, 5.50],
}

purchase_df = pd.DataFrame(purchase_data)

Notice the error raised because the data breaks the validate rule: `PersonID` 3 appears twice in `purchase_df`, so the merge cannot be one-to-one.

In [5]:
# "1:1" requires PersonID to be unique in both datasets
final_df = pd.merge(
    person_df,
    purchase_df,
    on="PersonID",
    how="outer",
    indicator=True,
    validate="1:1",
)

<class 'pandas.errors.MergeError'>: Merge keys are not unique in right dataset; not a one-to-one merge
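In a real pipeline you will often want to catch this error rather than let it stop the notebook. A minimal sketch, assuming the same two frames (trimmed to the columns that matter): catch `pandas.errors.MergeError`, then use `Series.duplicated(keep=False)` to list every row whose key breaks the rule. The names `message` and `dupes` are illustrative.

```python
import pandas as pd

person_df = pd.DataFrame({"PersonID": [1, 2, 3], "Name": ["Wikus", "Thandi", "Xeki"]})
purchase_df = pd.DataFrame({
    "PersonID": [3, 3, 4],
    "Item": ["Egg", "Potato", "Petrol"],
})

message = ""
dupes = None
try:
    final_df = pd.merge(person_df, purchase_df, on="PersonID", how="outer", validate="1:1")
except pd.errors.MergeError as err:
    message = str(err)
    print(f"Merge rejected: {message}")
    # keep=False marks every copy of a repeated key, not just the later ones
    dupes = purchase_df[purchase_df["PersonID"].duplicated(keep=False)]
    print(dupes)
```

Here `dupes` should hold the Egg and Potato rows, both with `PersonID` 3.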

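Now the validate rule is met: `PersonID` is unique in `person_df` (the left dataset) and may repeat in `purchase_df` (the right dataset), which is exactly what `"1:m"` allows.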

In [6]:
# "1:m" requires PersonID to be unique in the left dataset only
final_df = pd.merge(
    person_df,
    purchase_df,
    on="PersonID",
    how="outer",
    indicator=True,
    validate="1:m",
)

In [7]:
display(final_df)

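|   | PersonID | Name | Item | Quantity | Price | _merge |
|---|---|---|---|---|---|---|
| 0 | 1 | Wikus | NaN | NaN | NaN | left_only |
| 1 | 2 | Thandi | NaN | NaN | NaN | left_only |
| 2 | 3 | Xeki | Egg | 5.0 | 10.5 | both |
| 3 | 3 | Xeki | Potato | 10.0 | 29.99 | both |
| 4 | 4 | NaN | Petrol | 15.0 | 5.5 | right_only |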
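The rule is always read as left:right, so it depends on the order of the frames passed to `pd.merge`. As a sketch with the same data, swapping the frames turns this into a many-to-one merge: `validate="m:1"` passes, while keeping `"1:m"` is rejected because the repeated keys are now on the left. `swapped_df` and `rejected_1m` are illustrative names.

```python
import pandas as pd

person_df = pd.DataFrame({"PersonID": [1, 2, 3], "Name": ["Wikus", "Thandi", "Xeki"]})
purchase_df = pd.DataFrame({
    "PersonID": [3, 3, 4],
    "Item": ["Egg", "Potato", "Petrol"],
})

# purchase_df is now the LEFT dataset, so its repeated keys make this many-to-one
swapped_df = pd.merge(purchase_df, person_df, on="PersonID", how="outer", validate="m:1")
print(swapped_df)

# Keeping "1:m" with the new order fails: the left keys are no longer unique
rejected_1m = False
try:
    pd.merge(purchase_df, person_df, on="PersonID", how="outer", validate="1:m")
except pd.errors.MergeError as err:
    rejected_1m = True
    print(f"Rejected: {err}")
```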


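The `validate` parameter accepts four rules, each with a long and a short spelling:
- `"one_to_one"` or `"1:1"`: keys must be unique in both the left and right datasets
- `"one_to_many"` or `"1:m"`: keys must be unique in the left dataset
- `"many_to_one"` or `"m:1"`: keys must be unique in the right dataset
- `"many_to_many"` or `"m:m"`: allowed, but performs no uniqueness check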
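To make the list concrete, here is a sketch that restates each rule as a uniqueness check on each side and compares it with what pandas itself accepts. `REQUIRED_UNIQUE`, `rule_holds`, and `pandas_accepts` are hypothetical names for illustration, not pandas API, and the helper assumes a single key column.

```python
import pandas as pd

# Hypothetical table (not pandas API): (left must be unique, right must be unique)
REQUIRED_UNIQUE = {
    "1:1": (True, True), "one_to_one": (True, True),
    "1:m": (True, False), "one_to_many": (True, False),
    "m:1": (False, True), "many_to_one": (False, True),
    "m:m": (False, False), "many_to_many": (False, False),
}


def rule_holds(left, right, on, rule):
    """Return True if the uniqueness required by `rule` holds for a single key column."""
    need_left, need_right = REQUIRED_UNIQUE[rule]
    left_ok = left[on].is_unique or not need_left
    right_ok = right[on].is_unique or not need_right
    return bool(left_ok and right_ok)


def pandas_accepts(left, right, on, rule):
    """Ask pandas directly: does merge(..., validate=rule) succeed?"""
    try:
        pd.merge(left, right, on=on, validate=rule)
        return True
    except pd.errors.MergeError:
        return False


person_df = pd.DataFrame({"PersonID": [1, 2, 3]})
purchase_df = pd.DataFrame({"PersonID": [3, 3, 4]})

results = {
    rule: (rule_holds(person_df, purchase_df, "PersonID", rule),
           pandas_accepts(person_df, purchase_df, "PersonID", rule))
    for rule in REQUIRED_UNIQUE
}
for rule, (ours, theirs) in results.items():
    print(f"{rule:>12}: helper={ours}, pandas={theirs}")
```

For the example data, the helper and pandas should agree: only `"1:m"` and `"m:m"` (and their long spellings) succeed.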