## Section 1: Combining Data

For additional examples, see the [pandas data wrangling cheat sheet](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf).

### `pd.concat()`
- **Purpose**: Combines multiple DataFrames into one by stacking them along an axis (rows by default).

- **Key Parameter**:
    - `ignore_index`: When set to `True`, discards the original indexes so that the resulting DataFrame has a new continuous index (0, 1, 2, ...).

In [None]:
import pandas as pd

# Department 1 data
df_dept1 = pd.DataFrame({
    'EmployeeID': [101, 102, 103],
    'Name': ['Alice', 'Bob', 'Michael'],
    'Department': ['Sales', 'Sales', 'Sales'],
    'Salary': [60000, 62000, 61000]
})

# Department 2 data
df_dept2 = pd.DataFrame({
    'EmployeeID': [104, 105],
    'Name': ['David', 'Henry'],
    'Department': ['Marketing', 'Marketing'],
    'Salary': [65000, 67000]
})
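
One possible way to combine the two department DataFrames is sketched below. The result name `df_all` is an assumption made for illustration; the data is repeated so the snippet runs on its own.

```python
import pandas as pd

# Same department data as in the cell above, repeated so this runs standalone
df_dept1 = pd.DataFrame({
    'EmployeeID': [101, 102, 103],
    'Name': ['Alice', 'Bob', 'Michael'],
    'Department': ['Sales', 'Sales', 'Sales'],
    'Salary': [60000, 62000, 61000]
})
df_dept2 = pd.DataFrame({
    'EmployeeID': [104, 105],
    'Name': ['David', 'Henry'],
    'Department': ['Marketing', 'Marketing'],
    'Salary': [65000, 67000]
})

# Stack the rows of both DataFrames; ignore_index=True relabels the rows 0..4
# instead of keeping the original labels 0, 1, 2, 0, 1
df_all = pd.concat([df_dept1, df_dept2], ignore_index=True)  # df_all is a hypothetical name
print(df_all)
```

Without `ignore_index=True`, the combined DataFrame would keep each input's own row labels, so the labels 0 and 1 would each appear twice.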

### `rename()`
- **Purpose**: Changes column names to more meaningful or standardized labels.
- **Key Parameter**:
    - `columns`: A dictionary mapping old column names to new names.

In [None]:
# Rename columns to more descriptive names
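
A minimal sketch of one possible answer follows. The new name `Annual_Salary` is taken from the sorting step later in this notebook; `Employee_Name`, `df_employees`, and `df_renamed` are assumptions chosen for illustration only.

```python
import pandas as pd

# Small stand-in for the combined employee data (hypothetical variable name)
df_employees = pd.DataFrame({
    'EmployeeID': [101, 104],
    'Name': ['Alice', 'David'],
    'Department': ['Sales', 'Marketing'],
    'Salary': [60000, 65000]
})

# Map old column names to new ones; columns not listed keep their names
df_renamed = df_employees.rename(columns={
    'Name': 'Employee_Name',    # assumed new name, not given in the notebook
    'Salary': 'Annual_Salary'   # name used by the sorting step below
})
print(df_renamed.columns.tolist())
```

`rename()` returns a new DataFrame by default, so `df_employees` still has its original column names after this call.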


### `sort_values()`
- **Purpose**: Sorts the DataFrame by the values in one or more columns.
- **Key Parameters**:
    - `by`: Specifies the column(s) to sort by.
    - `ascending`: When set to `False`, sorts the data in descending order (the default is `True`).

In [None]:
# Sort by Annual_Salary in descending order
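
One way this step could look is sketched below, assuming the `Salary` column has already been renamed to `Annual_Salary`. The names `df_employees` and `df_sorted` are hypothetical.

```python
import pandas as pd

# Stand-in for the combined, renamed employee data (hypothetical variable name)
df_employees = pd.DataFrame({
    'EmployeeID': [101, 102, 103, 104, 105],
    'Name': ['Alice', 'Bob', 'Michael', 'David', 'Henry'],
    'Annual_Salary': [60000, 62000, 61000, 65000, 67000]
})

# Highest salary first; the original row labels travel with their rows
df_sorted = df_employees.sort_values(by='Annual_Salary', ascending=False)
print(df_sorted)
```

The row labels are not renumbered by the sort. If a fresh 0-based index is wanted, `reset_index(drop=True)` can be chained after `sort_values()`.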


## Section 2: Method Chaining

### **Concept**: Combining several DataFrame operations into one continuous expression.

### **Advantages**:
- Improves readability and conciseness.
- Reduces the need for intermediate variables.

### Steps in this chain:
- **Concatenation**: Merges the two department DataFrames.
- **Renaming**: Standardizes column names.
- **Sorting**: Orders the DataFrame by annual salary in descending order.

In [None]:
import pandas as pd

# Department 1 data
df_dept1 = pd.DataFrame({
    'EmployeeID': [101, 102, 103],
    'Name': ['Alice', 'Bob', 'Michael'],
    'Department': ['Sales', 'Sales', 'Sales'],
    'Salary': [60000, 62000, 61000]
})

# Department 2 data
df_dept2 = pd.DataFrame({
    'EmployeeID': [104, 105],
    'Name': ['David', 'Henry'],
    'Department': ['Marketing', 'Marketing'],
    'Salary': [65000, 67000]
})

# Method chaining: Combine, rename, and sort in descending order of salary.
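
The three steps listed above can be sketched as a single chained expression. This is one possible solution rather than the only one; `df_combined` is a hypothetical name, and only the `Salary` to `Annual_Salary` rename is taken from the notebook. The data is repeated so the snippet runs on its own.

```python
import pandas as pd

df_dept1 = pd.DataFrame({
    'EmployeeID': [101, 102, 103],
    'Name': ['Alice', 'Bob', 'Michael'],
    'Department': ['Sales', 'Sales', 'Sales'],
    'Salary': [60000, 62000, 61000]
})
df_dept2 = pd.DataFrame({
    'EmployeeID': [104, 105],
    'Name': ['David', 'Henry'],
    'Department': ['Marketing', 'Marketing'],
    'Salary': [65000, 67000]
})

# Wrapping the chain in parentheses lets each step sit on its own line
df_combined = (
    pd.concat([df_dept1, df_dept2], ignore_index=True)   # 1. concatenate
    .rename(columns={'Salary': 'Annual_Salary'})         # 2. rename
    .sort_values(by='Annual_Salary', ascending=False)    # 3. sort, highest first
)
print(df_combined)
```

Each method returns a new DataFrame, which is what allows the next method to be called on the result without storing an intermediate variable.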

## Group Activity: Cleaning an Untidy Sales Dataset Using Method Chaining

### Method Chaining Instructions:
- Remove duplicates.
- Fill missing values with 0.
- Reshape the DataFrame from wide to long format.
- Sort the final DataFrame.

In [None]:
import pandas as pd

df_sales = pd.DataFrame({
    'Product': [
        'Widget A', 'Widget B', 'Widget A', 'Widget C',
        'Widget B', 'Widget A', 'Widget D', 'Widget E',
        'Widget C', 'Widget D', 'Widget B', 'Widget E'
    ],
    'Region': [
        'North', 'South', 'North', 'East',
        'South', 'North', 'West', 'East',
        'Central', 'North', 'West', 'South'
    ],
    'Sales_Q1': [100, 200, 100, 150, None, 100, 180, 210, 140, 190, 205, 220],
    'Sales_Q2': [110, None, 110, 160, 210, 110, 185, 220, 150, 200, 215, 230],
    'Sales_Q3': [105, 205, 105, None, 215, 105, 175, 205, 145, 195, 210, 225],
    'Sales_Q4': [115, 215, 115, 165, 225, None, 190, 215, 155, 205, 220, 235]
})

print("Expanded df_sales DataFrame:")
print(df_sales)


# Method chaining: Clean the dataset in one pipeline.
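
A minimal sketch of the four cleaning steps follows, run on a four-row subset of `df_sales` so it stays short and runs on its own. The names `df_sample` and `df_tidy`, the output column names `Quarter` and `Sales`, and the choice of sort columns are assumptions; the instructions do not specify them.

```python
import pandas as pd

# First four rows of df_sales: row 2 is an exact duplicate of row 0
df_sample = pd.DataFrame({
    'Product': ['Widget A', 'Widget B', 'Widget A', 'Widget C'],
    'Region': ['North', 'South', 'North', 'East'],
    'Sales_Q1': [100, 200, 100, 150],
    'Sales_Q2': [110, None, 110, 160],
    'Sales_Q3': [105, 205, 105, None],
    'Sales_Q4': [115, 215, 115, 165]
})

df_tidy = (
    df_sample
    .drop_duplicates()                      # 1. remove fully identical rows
    .fillna(0)                              # 2. replace missing sales with 0
    .melt(                                  # 3. wide to long: one row per quarter
        id_vars=['Product', 'Region'],
        var_name='Quarter',                 # assumed column name
        value_name='Sales'                  # assumed column name
    )
    .sort_values(by=['Product', 'Region', 'Quarter'])  # 4. assumed sort order
)
print(df_tidy)
```

`drop_duplicates()` only removes rows that match in every column, so in the full `df_sales` a row that differs only by a missing value is kept.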


In [None]:
# Create a pivot table that calculates the sum of sales per Product per Region
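
One possible approach is sketched below with `pd.pivot_table()`, assuming the data is already in long format with columns named `Quarter` and `Sales`. Those column names, the variable names `df_long` and `sales_pivot`, and the use of `fill_value=0` are assumptions; the rows are a hand-picked subset of the cleaned sales data.

```python
import pandas as pd

# Small long-format sample (hypothetical variable and column names)
df_long = pd.DataFrame({
    'Product': ['Widget A', 'Widget A', 'Widget B', 'Widget B', 'Widget B'],
    'Region': ['North', 'North', 'South', 'South', 'West'],
    'Quarter': ['Sales_Q1', 'Sales_Q2', 'Sales_Q1', 'Sales_Q2', 'Sales_Q1'],
    'Sales': [100, 110, 200, 0, 205]
})

# One row per Product, one column per Region, each cell the summed Sales
sales_pivot = pd.pivot_table(
    df_long,
    values='Sales',
    index='Product',
    columns='Region',
    aggfunc='sum',
    fill_value=0   # Product/Region pairs with no rows show 0 instead of NaN
)
print(sales_pivot)
```

Here Widget A in the North region sums to 210 (100 + 110), and pairs that never occur, such as Widget A in the West, are filled with 0.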
