# Data Cleaning with Pandas
In this notebook, we will explore various data cleaning techniques using `pandas` in Python.
We will detect and handle missing values, remove duplicate rows, and standardize inconsistent date formats.

In [None]:
import pandas as pd
import numpy as np

# Example data
data = {'Name': ['John Doe', 'Jane Smith', 'Alice Johnson', None, 'Michael Brown'],
        'Age': [28, 34, None, 45, 22],
        'Date of Visit': ['2022-01-10', '2022/02/15', '15-03-2022', '2022-04-20', None],
        'Visitor Count': [5, None, 8, 12, 3]}

# Creating a DataFrame
df = pd.DataFrame(data)
df

## Step 1: Handling Missing Values
Let's detect missing values in our dataset and explore different ways to handle them.

In [None]:
# Detect missing values
df.isnull()
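`isnull()` returns a full boolean table, which gets hard to read on larger data. Below is a minimal sketch of two common summaries on the same sample records: counting missing cells per column, and selecting the rows that have at least one gap. `isna()` is pandas' alias for `isnull()`.

```python
import pandas as pd

# Same sample records as above, rebuilt so this block runs on its own
data = {'Name': ['John Doe', 'Jane Smith', 'Alice Johnson', None, 'Michael Brown'],
        'Age': [28, 34, None, 45, 22],
        'Date of Visit': ['2022-01-10', '2022/02/15', '15-03-2022', '2022-04-20', None],
        'Visitor Count': [5, None, 8, 12, 3]}
df = pd.DataFrame(data)

# Summing the boolean mask counts True values, i.e. missing cells, per column
missing_per_column = df.isna().sum()

# any(axis=1) is True for each row that has at least one missing cell
rows_with_missing = df[df.isna().any(axis=1)]

print(missing_per_column)
print(rows_with_missing)
```

In this sample every column has exactly one missing value, and only the first row is complete.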

In [None]:
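# Fill missing values in the 'Age' column with the mean age.
# Assign back rather than calling fillna(inplace=True) on df['Age']: that chained form
# triggers a FutureWarning in recent pandas and stops updating df under Copy-on-Write
df['Age'] = df['Age'].fillna(df['Age'].mean())
df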
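Filling with the mean is only one option. The sketch below shows three other common approaches on the same sample records. Which one fits depends on the column, and treating `Name` as a required field is an assumption made only for illustration.

```python
import pandas as pd

# Same sample records as above, rebuilt so this block runs on its own
data = {'Name': ['John Doe', 'Jane Smith', 'Alice Johnson', None, 'Michael Brown'],
        'Age': [28, 34, None, 45, 22],
        'Date of Visit': ['2022-01-10', '2022/02/15', '15-03-2022', '2022-04-20', None],
        'Visitor Count': [5, None, 8, 12, 3]}
df = pd.DataFrame(data)

# Option 1: drop rows missing a field we assume is required (here 'Name')
named = df.dropna(subset=['Name'])

# Option 2: fill a numeric column with its median, which is less sensitive to outliers than the mean
filled = df.copy()
filled['Visitor Count'] = filled['Visitor Count'].fillna(filled['Visitor Count'].median())

# Option 3: give each column its own fill value in a single call; unlisted columns are left as-is
by_column = df.fillna({'Name': 'Unknown', 'Visitor Count': 0})
```

Dropping rows loses the other fields on those rows, while filling keeps every row but introduces values that were never observed, so the choice is a trade-off rather than a rule.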

## Step 2: Removing Duplicates
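Datasets sometimes contain duplicate records. `duplicated()` flags every row that repeats an earlier row in all columns, and `drop_duplicates()` removes those rows. Our sample data has no duplicates, so here the check returns `False` for every row.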

In [None]:
# Check for duplicates
df.duplicated()

In [None]:
# Removing duplicates if any exist
df = df.drop_duplicates()
df
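Because the sample data contains no duplicates, the cells above don't show what a match looks like. This small sketch uses a hypothetical visit log (the `visits` frame is invented for illustration) to show the `keep` and `subset` parameters of `duplicated()` and `drop_duplicates()`.

```python
import pandas as pd

# Hypothetical log: row 2 exactly repeats row 0; row 3 repeats only the name
visits = pd.DataFrame({
    'Name': ['John Doe', 'Jane Smith', 'John Doe', 'John Doe'],
    'Date of Visit': ['2022-01-10', '2022-02-15', '2022-01-10', '2022-03-01'],
    'Visitor Count': [5, 7, 5, 2],
})

# Number of rows that repeat an earlier row in every column
n_exact = visits.duplicated().sum()

# keep=False marks every member of a duplicate group, not just the later copies
all_copies = visits[visits.duplicated(keep=False)]

# Compare only the 'Name' column and keep the last occurrence of each name
latest_per_name = visits.drop_duplicates(subset=['Name'], keep='last')
```

The default `keep='first'` retains the earliest row of each group. Passing `subset` matters when records count as "the same" even though some columns differ.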

## Step 3: Standardizing Date Formats
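Inconsistent date formats can lead to errors in analysis. The 'Date of Visit' column stores dates as strings in three layouts (`YYYY-MM-DD`, `YYYY/MM/DD`, and `DD-MM-YYYY`), so let's convert it to a proper datetime column. A single `pd.to_datetime(..., errors='coerce')` call is not enough here: since pandas 2.0 it infers one format from the first value and silently turns values in the other layouts into `NaT`.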

In [None]:
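# Parse each of the three date layouts explicitly; missing or unrecognized dates become NaT
raw = df['Date of Visit']
df['Date of Visit'] = (pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')
                       .fillna(pd.to_datetime(raw, format='%Y/%m/%d', errors='coerce'))
                       .fillna(pd.to_datetime(raw, format='%d-%m-%Y', errors='coerce')))
df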
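Parsing each known layout explicitly generalizes to a list of formats tried in order. `parse_dates_multi` is a hypothetical helper, not a pandas function. The format list assumes that any `DD-MM-YYYY` value is day-first, which is unambiguous for `15-03-2022` but would not be for a value like `05-03-2022`.

```python
import pandas as pd

def parse_dates_multi(values, formats):
    """Hypothetical helper: try each strptime format in order; the first match wins."""
    # Start from the first format so the result keeps pandas' own datetime dtype
    result = pd.to_datetime(values, format=formats[0], errors='coerce')
    for fmt in formats[1:]:
        # fillna only touches positions that earlier formats left as NaT
        result = result.fillna(pd.to_datetime(values, format=fmt, errors='coerce'))
    return result

raw = pd.Series(['2022-01-10', '2022/02/15', '15-03-2022', '2022-04-20', None])
dates = parse_dates_multi(raw, ['%Y-%m-%d', '%Y/%m/%d', '%d-%m-%Y'])

# If uniform text is needed (e.g. for a CSV export), render back to ISO strings; NaT stays missing
iso_strings = dates.dt.strftime('%Y-%m-%d')
```

Keeping the column as datetime64 is usually preferable for analysis, since it supports date arithmetic and `.dt` accessors. Converting back to strings is mainly useful at the output stage.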

## Summary
In this notebook, we have demonstrated:
- How to handle missing values
- How to identify and remove duplicates
- How to standardize date formats
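The three steps above can also be bundled into one function, so the same cleaning can be applied to new batches of data. `clean_visits` is a hypothetical name, and the date formats it tries are simply the three observed in the sample.

```python
import pandas as pd

def clean_visits(df):
    """Hypothetical sketch: the notebook's three cleaning steps as one function."""
    out = df.copy()  # leave the caller's DataFrame untouched
    # Step 1: fill missing ages with the column mean
    out['Age'] = out['Age'].fillna(out['Age'].mean())
    # Step 2: drop rows that repeat an earlier row in every column
    out = out.drop_duplicates()
    # Step 3: parse the mixed date layouts; anything unrecognized becomes NaT
    raw = out['Date of Visit']
    parsed = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')
    for fmt in ('%Y/%m/%d', '%d-%m-%Y'):
        parsed = parsed.fillna(pd.to_datetime(raw, format=fmt, errors='coerce'))
    out['Date of Visit'] = parsed
    return out

data = {'Name': ['John Doe', 'Jane Smith', 'Alice Johnson', None, 'Michael Brown'],
        'Age': [28, 34, None, 45, 22],
        'Date of Visit': ['2022-01-10', '2022/02/15', '15-03-2022', '2022-04-20', None],
        'Visitor Count': [5, None, 8, 12, 3]}
cleaned = clean_visits(pd.DataFrame(data))
```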

Next, we will explore real-time data processing in the following notebook.