# Data Transformation & Manipulation Techniques in Python
This notebook covers commonly used Python idioms for transforming and manipulating data during analysis.
We will use Pandas as the primary library for these operations.

### 1. Importing Libraries and Loading Data
Before starting, let's import the necessary libraries and load the dataset.

In [None]:
import pandas as pd
import numpy as np

# Small example dataset; np.nan marks missing values in columns A, B and C
data = {'A': [1, 2, np.nan, 4, 5],
        'B': [5, np.nan, np.nan, 8, 10],
        'C': ['foo', 'bar', 'foo', 'bar', np.nan],
        'D': ['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04', '2020-01-05']}

df = pd.DataFrame(data)
df['D'] = pd.to_datetime(df['D'])
df
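In practice the data usually comes from a file rather than an inline dictionary. The sketch below assumes a CSV source. The `csv_text` string is a hypothetical stand-in for a file on disk, with the same values as the dictionary above. Passing `parse_dates` to `pd.read_csv` converts column D while reading, which replaces the separate `pd.to_datetime` call. Empty fields are read as NaN.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for a real file on disk
csv_text = """A,B,C,D
1,5,foo,2020-01-01
2,,bar,2020-01-02
,,foo,2020-01-03
4,8,bar,2020-01-04
5,10,,2020-01-05
"""

# parse_dates converts column 'D' to datetime64 while the file is read
df_csv = pd.read_csv(io.StringIO(csv_text), parse_dates=['D'])

# Empty fields become NaN, so a quick missing-value count is a useful first check
print(df_csv.dtypes)
print(df_csv.isna().sum())
```

With a real file you would pass its path instead of `io.StringIO(...)`.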

### 2. Handling Missing Data
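Handling missing data is a crucial step in any data transformation process: element-wise arithmetic on NaN produces NaN, and many modeling libraries reject NaN input. The cell below fills numeric gaps with the mean or median and categorical gaps with the mode.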

In [None]:
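# Fill missing values with the mean/median for numeric columns.
# Assign the result back rather than calling fillna(..., inplace=True) on a
# column selection: that is chained assignment, which warns in recent pandas
# and no longer modifies df under Copy-on-Write.
df['A'] = df['A'].fillna(df['A'].mean())
df['B'] = df['B'].fillna(df['B'].median())

# Fill missing values with the mode (most frequent value) for categorical columns;
# mode() can return several tied values, and [0] takes the first one
df['C'] = df['C'].fillna(df['C'].mode()[0])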
df
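Filling with a summary statistic is only one option. The sketch below works on a fresh copy of the example data, named `raw` (a name chosen here for illustration), so the cell above is unaffected. It shows counting missing values first, dropping incomplete rows with `dropna`, filling several columns in one call by passing a dictionary to `fillna`, and linear interpolation for ordered numeric data. Which approach fits depends on why the values are missing.

```python
import numpy as np
import pandas as pd

# Fresh copy of the example data (without the date column)
raw = pd.DataFrame({'A': [1, 2, np.nan, 4, 5],
                    'B': [5, np.nan, np.nan, 8, 10],
                    'C': ['foo', 'bar', 'foo', 'bar', np.nan]})

# Count missing values per column before deciding how to handle them
missing_counts = raw.isna().sum()

# Option 1: drop rows that have a missing value in specific columns
dropped = raw.dropna(subset=['A', 'C'])

# Option 2: fill several columns at once with a {column: fill value} mapping
filled = raw.fillna({'A': raw['A'].mean(),
                     'B': raw['B'].median(),
                     'C': raw['C'].mode()[0]})

# Option 3: linear interpolation between neighbouring rows (numeric columns only)
interpolated = raw[['A', 'B']].interpolate()
```

Interpolation assumes the row order is meaningful, for example a time series; for unordered records a summary statistic or dropping rows is usually the safer choice.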

### 3. Feature Engineering & Transformation
Creating and transforming features is often essential to enhance predictive power.

In [None]:
# Creating a new feature based on existing ones
df['E'] = df['A'] * df['B']  # Multiply two columns

# Log-transform column 'A' with a custom function (non-positive values map to 0)
df['F'] = df['A'].apply(lambda x: np.log(x) if x > 0 else 0)
df
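`apply` with a lambda calls Python once per row, which gets slow on large frames. The sketch below, using illustrative values that include non-positive entries, compares it with a vectorized equivalent built from `Series.where` and `np.log`, then uses `DataFrame.assign` and `pd.cut` to add a product column and a binned version of A. The frame name `feat`, the bin edges and the labels are assumptions chosen for this example.

```python
import numpy as np
import pandas as pd

# Illustrative values, including non-positive entries that log() cannot handle
feat = pd.DataFrame({'A': [-1.0, 0.0, 1.0, 2.0, 5.0],
                     'B': [5.0, 8.0, 8.0, 8.0, 10.0]})

# Row-by-row version, as in the cell above
feat['F_apply'] = feat['A'].apply(lambda x: np.log(x) if x > 0 else 0)

# Vectorized version: where() turns non-positive values into NaN so np.log
# never sees them, then fillna(0) applies the same fallback value
feat['F_vec'] = np.log(feat['A'].where(feat['A'] > 0)).fillna(0)

# assign() adds several derived columns in one expression;
# pd.cut places each value of A into a right-inclusive bin: (-2, 0], (0, 2], (2, 6]
feat = feat.assign(
    E=feat['A'] * feat['B'],
    A_bin=pd.cut(feat['A'], bins=[-2, 0, 2, 6],
                 labels=['non-positive', 'low', 'high']),
)
```

Both log columns hold the same values; the vectorized form simply avoids the per-row Python call.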

### 4. Grouping and Aggregation
Grouping data and applying aggregation functions is key to summarizing information.

In [None]:
# Grouping by column 'C' and aggregating
grouped_df = df.groupby('C').agg({'A': 'mean', 'B': 'sum'})
grouped_df
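Two related groupby patterns are often handy. Named aggregation (`output_name=(column, function)`) lets you choose readable output column names and apply several functions in one call, and `transform` returns a result aligned with the original rows instead of one row per group. The sketch below uses a small illustrative frame called `toy`; the output names such as `mean_A` and `B_share` are arbitrary choices.

```python
import pandas as pd

# Small illustrative frame
toy = pd.DataFrame({'C': ['foo', 'bar', 'foo', 'bar', 'bar'],
                    'A': [1.0, 2.0, 3.0, 4.0, 5.0],
                    'B': [5.0, 8.0, 8.0, 8.0, 10.0]})

# Named aggregation: output_column=(input_column, aggregation_function)
summary = toy.groupby('C').agg(mean_A=('A', 'mean'),
                               total_B=('B', 'sum'),
                               n_rows=('A', 'count'))

# transform() broadcasts each group's total back to that group's rows,
# so each row's B can be expressed as a share of its group total
toy['B_share'] = toy['B'] / toy.groupby('C')['B'].transform('sum')
```

`summary` has one row per group (`bar`, `foo`), while `toy` keeps all five rows with the new `B_share` column.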

### 5. Merging and Joining DataFrames
Merging multiple datasets together is often required during data analysis.

In [None]:
# Creating another DataFrame for the merge example
df2 = pd.DataFrame({'C': ['foo', 'bar'], 'G': [100, 200]})

# Merging df with df2 on column 'C'
merged_df = pd.merge(df, df2, on='C', how='left')
merged_df
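A left join keeps every row of the left frame, so keys with no match silently receive NaN. `pd.merge` can make this visible: `indicator=True` adds a `_merge` column recording where each row came from, and `validate` raises `pandas.errors.MergeError` when the key relationship is not what you expect. The sketch below uses illustrative frames `left` and `right`, with a `'baz'` key that has no partner.

```python
import pandas as pd

left = pd.DataFrame({'C': ['foo', 'bar', 'baz'], 'A': [1, 2, 3]})
right = pd.DataFrame({'C': ['foo', 'bar'], 'G': [100, 200]})

# indicator=True adds a '_merge' column ('both', 'left_only' or 'right_only');
# validate='many_to_one' checks that the keys in `right` are unique
merged = pd.merge(left, right, on='C', how='left',
                  indicator=True, validate='many_to_one')

# Rows with no match in `right` have NaN in G and are flagged 'left_only'
unmatched = merged[merged['_merge'] == 'left_only']

# Duplicated keys on the right would multiply rows; validate catches this
dup_right = pd.DataFrame({'C': ['foo', 'foo'], 'G': [1, 2]})
try:
    pd.merge(left, dup_right, on='C', how='left', validate='many_to_one')
    validation_failed = False
except pd.errors.MergeError:
    validation_failed = True
```

Checking `unmatched` after a join is a cheap way to spot key typos or missing reference data.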

### 6. Pivoting and Reshaping Data
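Pivoting reshapes data, for example from long to wide format, and `pivot_table` also aggregates values that share the same index. With only an index and a single value column, as below, it gives the same numbers as a groupby mean.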

In [None]:
# Example of pivoting a DataFrame
pivot_df = df.pivot_table(index='C', values='A', aggfunc='mean')
pivot_df
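Reshaping usually involves a second dimension. The sketch below assumes a hypothetical `month` column (not in the dataset above) to show a true long-to-wide pivot with `columns=`, followed by `melt`, which turns the wide table back into long format.

```python
import pandas as pd

# Long format: one row per (C, month) observation; 'month' is hypothetical
long_df = pd.DataFrame({'C': ['foo', 'bar', 'foo', 'bar'],
                        'month': ['Jan', 'Jan', 'Feb', 'Feb'],
                        'A': [1.0, 2.0, 3.0, 4.0]})

# Long -> wide: one row per C, one column per month
wide = long_df.pivot_table(index='C', columns='month', values='A', aggfunc='mean')

# Wide -> long: melt() stacks the month columns back into rows
back_to_long = wide.reset_index().melt(id_vars='C', var_name='month', value_name='A')
```

Wide tables are convenient for reading and plotting, while long tables are usually easier to group, filter and merge.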

### 7. Sorting and Filtering Data
Sorting and filtering are essential operations to arrange and subset the data.

In [None]:
# Sorting by column 'A' in descending order
sorted_df = df.sort_values(by='A', ascending=False)

# Filtering rows based on condition
filtered_df = df[df['A'] > 2]
filtered_df
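Real filters usually combine several conditions. In the sketch below, on an illustrative frame `df_f`, conditions are combined with `&` and `|` (each wrapped in parentheses, because these operators bind more tightly than comparisons), membership is tested with `isin`, the same filter is expressed as a `query` string, and rows are sorted by two columns with a per-column sort direction.

```python
import pandas as pd

# Illustrative frame
df_f = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0, 5.0],
                     'B': [5.0, 8.0, 8.0, 8.0, 10.0],
                     'C': ['foo', 'bar', 'foo', 'bar', 'bar']})

# Combine conditions with & (and) / | (or); parentheses are required
both = df_f[(df_f['A'] > 2) & (df_f['C'] == 'bar')]

# isin() keeps rows whose value appears in a list of allowed values
in_list = df_f[df_f['B'].isin([5.0, 10.0])]

# query() expresses the same filter as `both` in a string
queried = df_f.query("A > 2 and C == 'bar'")

# Sort by B descending, then by A ascending to break ties
multi_sorted = df_f.sort_values(by=['B', 'A'], ascending=[False, True])
```

Boolean masks and `query` select the same rows here; `query` can read more naturally for long conditions, while masks work with any expression that produces a boolean Series.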