# Introduction to Pandas

Pandas is a widely used open-source Python library for data analysis and manipulation. It provides fast, easy-to-use data structures and analysis tools built on top of NumPy.

The name "pandas" is derived from "panel data" - an econometrics term for multidimensional structured datasets.

## Why Pandas?

### Advantages:
1. **Easy Data Handling**: Work with structured data intuitively
2. **Data Cleaning**: Handle missing data, duplicates, and inconsistencies
3. **Data Transformation**: Reshape, merge, join, and pivot data easily
4. **Time Series**: Built-in support for time series data
5. **Integration**: Works seamlessly with NumPy, Matplotlib, and other libraries
6. **I/O Operations**: Read/write CSV, Excel, SQL, JSON, and more

### Key Features:
- DataFrame and Series data structures
- Intelligent data alignment
- Flexible grouping and aggregation
- Built-in plotting (via Matplotlib)
- Efficient indexing and selection
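
To make "intelligent data alignment" concrete, here is a minimal sketch with two small Series whose labels only partly overlap. The names `q1` and `q2` and their values are made up for illustration.

```python
import pandas as pd

# Two Series whose index labels only partly overlap (illustrative data)
q1 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
q2 = pd.Series([1, 2, 3], index=['b', 'c', 'd'])

# Arithmetic matches values by label, not by position
total = q1 + q2
print(total)

# Labels found in only one Series give NaN; fill_value treats the missing side as 0
total_filled = q1.add(q2, fill_value=0)
print(total_filled)
```

The result index is the union of both indexes, so no row is silently dropped.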

## Installation and Import

Install pandas if you haven't already:
```bash
pip install pandas
```

In [None]:
# Import pandas with the standard alias
import pandas as pd
import numpy as np

# Check pandas version
print("Pandas version:", pd.__version__)

## Core Data Structures

Pandas has two main data structures:

1. **Series**: One-dimensional labeled array (like a column)
2. **DataFrame**: Two-dimensional labeled data structure (like a table)

## Pandas Series

A Series is a one-dimensional array in which each value is paired with a label; the labels together form the index.

In [None]:
# Create a Series from a list
series_from_list = pd.Series([10, 20, 30, 40, 50])
print("Series from list:")
print(series_from_list)
print("\nData type:", type(series_from_list))

# Create a Series with custom index
series_with_index = pd.Series([10, 20, 30, 40, 50], 
                               index=['a', 'b', 'c', 'd', 'e'])
print("\nSeries with custom index:")
print(series_with_index)

# Create a Series from a dictionary
data_dict = {'apple': 5, 'banana': 3, 'orange': 8}
series_from_dict = pd.Series(data_dict)
print("\nSeries from dictionary:")
print(series_from_dict)

In [None]:
# Series attributes and methods
series = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

print("Values:", series.values)
print("Index:", series.index)
print("Shape:", series.shape)
print("Size:", series.size)
print("Data type:", series.dtype)

# Accessing elements
print("\nAccess by index label:", series['c'])
print("Access by position:", series.iloc[2])  # use .iloc for positions; series[2] is deprecated with a label index
print("Access multiple:", series[['a', 'c', 'e']])

# Basic statistics
print("\nMean:", series.mean())
print("Sum:", series.sum())
print("Max:", series.max())
print("Min:", series.min())

## Pandas DataFrame

A DataFrame is a two-dimensional labeled data structure whose columns can hold different types. Think of it as a spreadsheet or SQL table.

In [None]:
# Create DataFrame from dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 28, 32],
    'City': ['New York', 'London', 'Paris', 'Tokyo', 'Berlin'],
    'Salary': [50000, 60000, 55000, 65000, 58000]
}

df = pd.DataFrame(data)
print("DataFrame from dictionary:")
print(df)

# Create DataFrame from list of lists
data_list = [
    ['Alice', 25, 'New York', 50000],
    ['Bob', 30, 'London', 60000],
    ['Charlie', 35, 'Paris', 55000]
]
df_from_list = pd.DataFrame(data_list, 
                             columns=['Name', 'Age', 'City', 'Salary'])
print("\nDataFrame from list:")
print(df_from_list)

In [None]:
# DataFrame attributes and methods
print("Shape:", df.shape)
print("\nColumn names:", df.columns.tolist())
print("\nIndex:", df.index.tolist())
print("\nData types:\n", df.dtypes)
print("\nInfo:")
df.info()
print("\nFirst 3 rows:")
print(df.head(3))
print("\nLast 2 rows:")
print(df.tail(2))
print("\nDescriptive statistics:")
print(df.describe())

## Selecting Data

Pandas provides multiple ways to select and access data:

In [None]:
# Create sample DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 28, 32],
    'City': ['New York', 'London', 'Paris', 'Tokyo', 'Berlin'],
    'Salary': [50000, 60000, 55000, 65000, 58000]
})

# Select single column (returns Series)
print("Select 'Name' column:")
print(df['Name'])
print("\nType:", type(df['Name']))

# Select multiple columns (returns DataFrame)
print("\nSelect multiple columns:")
print(df[['Name', 'Age']])

# Select rows by index position (iloc)
print("\nFirst row (iloc):")
print(df.iloc[0])

print("\nFirst 3 rows:")
print(df.iloc[0:3])

# Select rows by index label (loc); unlike iloc, the end label is included
print("\nSelect by label (loc):")
print(df.loc[1:3, ['Name', 'City']])

# Select specific cells
print("\nSelect specific cell:")
print(df.loc[2, 'Name'])
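
The difference between `loc` and `iloc` slicing is easy to miss, so here is a small sketch of it. The frame `df_demo` is illustrative and separate from the `df` used in the cell.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 28]
})

# loc slices by label and includes both endpoints: rows 1, 2 and 3
label_slice = df_demo.loc[1:3, 'Name']

# iloc slices by position and excludes the end: rows 1 and 2
position_slice = df_demo.iloc[1:3, 0]

print(label_slice.tolist())
print(position_slice.tolist())

# With a non-numeric index, loc slices between labels (still inclusive)
df_named = df_demo.set_index('Name')
named_slice = df_named.loc['Bob':'Charlie']
print(named_slice)
```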

In [None]:
# Boolean indexing (filtering)
print("People older than 30:")
print(df[df['Age'] > 30])

print("\nPeople in New York or London:")
print(df[df['City'].isin(['New York', 'London'])])

# Multiple conditions (AND)
print("\nAge > 25 AND Salary > 55000:")
print(df[(df['Age'] > 25) & (df['Salary'] > 55000)])

# Multiple conditions (OR)
print("\nAge < 30 OR Salary > 60000:")
print(df[(df['Age'] < 30) | (df['Salary'] > 60000)])

# String methods
print("\nNames starting with 'A' or 'B':")
print(df[df['Name'].str.startswith(('A', 'B'))])
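
As a sketch of two related filtering idioms: a mask can be stored in a variable and inverted with `~`, and `DataFrame.query` expresses the same condition as a string. The frame `people` repeats the sample data under a different name so the block runs on its own.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 28, 32],
    'Salary': [50000, 60000, 55000, 65000, 58000]
})

# Store the boolean mask so it can be reused
mask = (people['Age'] > 25) & (people['Salary'] > 55000)
filtered_mask = people[mask]

# The same filter written as a query string
filtered_query = people.query('Age > 25 and Salary > 55000')

# ~ inverts the mask: rows that do NOT satisfy the condition
not_matching = people[~mask]

print(filtered_mask)
print(not_matching)
```

Each condition needs its own parentheses in the mask form because `&` and `|` bind more tightly than comparison operators.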

## Adding and Modifying Data

In [None]:
# Create sample DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 55000]
})

print("Original DataFrame:")
print(df)

# Add a new column
df['City'] = ['New York', 'London', 'Paris']
print("\nAfter adding 'City' column:")
print(df)

# Add calculated column
df['Salary_k'] = df['Salary'] / 1000
print("\nAfter adding calculated column:")
print(df)

# Modify existing column
df['Age'] = df['Age'] + 1
print("\nAfter incrementing Age:")
print(df)

# Add new row using loc
df.loc[3] = ['David', 29, 65000, 'Tokyo', 65]
print("\nAfter adding new row:")
print(df)

# Add row using concat
new_row = pd.DataFrame([['Eve', 33, 58000, 'Berlin', 58]], 
                       columns=df.columns)
df = pd.concat([df, new_row], ignore_index=True)
print("\nAfter concatenating new row:")
print(df)

In [None]:
# Drop column
df_dropped_col = df.drop('Salary_k', axis=1)
print("After dropping 'Salary_k' column:")
print(df_dropped_col)

# Drop multiple columns
df_dropped_cols = df.drop(['City', 'Salary_k'], axis=1)
print("\nAfter dropping multiple columns:")
print(df_dropped_cols)

# Drop row
df_dropped_row = df.drop(3, axis=0)
print("\nAfter dropping row 3:")
print(df_dropped_row)

# Drop rows by condition (keep only the rows that pass the filter)
df_filtered = df[df['Age'] >= 30]
print("\nKeep only rows where Age >= 30:")
print(df_filtered)

## Handling Missing Data

Real-world data often contains missing values. Pandas provides tools to handle them:

In [None]:
# Create DataFrame with missing values
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, np.nan, 35, 28, np.nan],
    'City': ['New York', 'London', None, 'Tokyo', 'Berlin'],
    'Salary': [50000, 60000, np.nan, 65000, 58000]
})

print("DataFrame with missing values:")
print(df)

# Check for missing values
print("\nMissing values per column:")
print(df.isnull().sum())

print("\nAny missing values?:", df.isnull().any().any())

# Boolean mask of missing values (True where a value is missing)
print("\nMissing data mask:")
print(df.isnull())

In [None]:
# Drop rows with any missing values
df_dropped = df.dropna()
print("After dropping rows with NaN:")
print(df_dropped)

# Drop columns with any missing values
df_dropped_cols = df.dropna(axis=1)
print("\nAfter dropping columns with NaN:")
print(df_dropped_cols)

# Drop rows only if all values are missing
df_dropped_all = df.dropna(how='all')
print("\nAfter dropping rows where all values are NaN:")
print(df_dropped_all)

# Drop rows with missing values in specific columns
df_dropped_subset = df.dropna(subset=['Age'])
print("\nAfter dropping rows with NaN in 'Age':")
print(df_dropped_subset)

In [None]:
# Fill missing values with a constant
df_filled = df.fillna(0)
print("Fill NaN with 0:")
print(df_filled)

# Fill with different values per column
df_filled_dict = df.fillna({'Age': df['Age'].mean(), 
                            'City': 'Unknown', 
                            'Salary': df['Salary'].median()})
print("\nFill with different values per column:")
print(df_filled_dict)

# Forward fill (propagate last valid observation)
df_ffill = df.ffill()  # fillna(method='ffill') is deprecated in recent pandas
print("\nForward fill:")
print(df_ffill)

# Backward fill
df_bfill = df.bfill()  # fillna(method='bfill') is deprecated in recent pandas
print("\nBackward fill:")
print(df_bfill)

# Interpolate (for numerical data)
df_interpolated = df.copy()
df_interpolated['Age'] = df_interpolated['Age'].interpolate()
print("\nInterpolated Age:")
print(df_interpolated)
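
The fill strategies behave differently at the edges of the data. The sketch below uses a made-up Series, `readings`, to show which gaps each method can and cannot fill.

```python
import numpy as np
import pandas as pd

readings = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0, np.nan])

# Forward fill: the leading NaN has no earlier value, so it stays NaN
forward = readings.ffill()

# Backward fill: the trailing NaN has no later value, so it stays NaN
backward = readings.bfill()

# limit caps how many consecutive NaNs are filled
limited = readings.ffill(limit=1)

# Linear interpolation fills the gap between 1.0 and 4.0 evenly
linear = readings.interpolate()

print(pd.DataFrame({'raw': readings, 'ffill': forward, 'bfill': backward,
                    'ffill_limit1': limited, 'linear': linear}))
```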

## Sorting Data

In [None]:
# Create sample DataFrame
df = pd.DataFrame({
    'Name': ['Charlie', 'Alice', 'Eve', 'Bob', 'David'],
    'Age': [35, 25, 32, 30, 28],
    'Salary': [55000, 50000, 58000, 60000, 65000]
})

print("Original DataFrame:")
print(df)

# Sort by single column
df_sorted = df.sort_values('Age')
print("\nSorted by Age:")
print(df_sorted)

# Sort in descending order
df_sorted_desc = df.sort_values('Salary', ascending=False)
print("\nSorted by Salary (descending):")
print(df_sorted_desc)

# Sort by multiple columns
df_sorted_multi = df.sort_values(['Age', 'Salary'], ascending=[True, False])
print("\nSorted by Age (asc) then Salary (desc):")
print(df_sorted_multi)

# Sort by index
df_sorted_index = df.sort_index()
print("\nSorted by index:")
print(df_sorted_index)

## Grouping and Aggregation

GroupBy allows you to split data into groups and apply functions to each group:

In [None]:
# Create sample DataFrame
df = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'IT', 'IT', 'HR', 'HR', 'Sales'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace'],
    'Age': [25, 30, 35, 28, 32, 45, 27],
    'Salary': [50000, 60000, 55000, 65000, 58000, 52000, 54000]
})

print("Employee DataFrame:")
print(df)

# Group by single column
grouped = df.groupby('Department')

# Calculate mean per group
print("\nMean salary by department:")
print(grouped['Salary'].mean())

# Multiple aggregations
print("\nMultiple statistics by department:")
print(grouped['Salary'].agg(['mean', 'min', 'max', 'count']))

# Group by and aggregate different columns differently
print("\nDifferent aggregations per column:")
print(grouped.agg({
    'Age': ['mean', 'min', 'max'],
    'Salary': ['mean', 'sum']
}))
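
Aggregating with a dictionary of lists produces two-level column names. One alternative is named aggregation, sketched below, which gives flat column names of your choosing. The frame `staff` repeats the sample data, and the output column names are arbitrary.

```python
import pandas as pd

staff = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'IT', 'IT', 'HR', 'HR', 'Sales'],
    'Age': [25, 30, 35, 28, 32, 45, 27],
    'Salary': [50000, 60000, 55000, 65000, 58000, 52000, 54000]
})

# Each keyword is an output column: name=(source column, aggregation)
summary = staff.groupby('Department').agg(
    avg_age=('Age', 'mean'),
    avg_salary=('Salary', 'mean'),
    total_salary=('Salary', 'sum'),
    headcount=('Salary', 'count'),
)
print(summary)
```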

In [None]:
# Multiple grouping columns
df['Experience'] = ['Junior', 'Senior', 'Senior', 'Junior', 'Senior', 'Senior', 'Junior']

print("DataFrame with Experience:")
print(df)

# Group by multiple columns
grouped_multi = df.groupby(['Department', 'Experience'])
print("\nMean salary by Department and Experience:")
print(grouped_multi['Salary'].mean())

# Reset index to make it a regular DataFrame
print("\nWith reset index:")
print(grouped_multi['Salary'].mean().reset_index())

# Size of each group
print("\nCount of employees per group:")
print(grouped_multi.size())
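
Besides `reset_index()`, a two-level group result can be turned into a table with `unstack()`, which moves the inner index level into columns. This sketch rebuilds the relevant columns under the illustrative name `staff`.

```python
import pandas as pd

staff = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'IT', 'IT', 'HR', 'HR', 'Sales'],
    'Experience': ['Junior', 'Senior', 'Senior', 'Junior', 'Senior', 'Senior', 'Junior'],
    'Salary': [50000, 60000, 55000, 65000, 58000, 52000, 54000]
})

mean_salary = staff.groupby(['Department', 'Experience'])['Salary'].mean()

# Departments become rows, experience levels become columns
wide = mean_salary.unstack()
print(wide)
```

Combinations that never occur in the data (here, a junior in HR) show up as NaN.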

## Merging and Joining DataFrames

Combine multiple DataFrames like SQL joins:

In [None]:
# Create sample DataFrames
employees = pd.DataFrame({
    'EmployeeID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'DepartmentID': [101, 102, 101, 103]
})

departments = pd.DataFrame({
    'DepartmentID': [101, 102, 103, 104],
    'Department': ['Sales', 'IT', 'HR', 'Marketing']
})

print("Employees:")
print(employees)
print("\nDepartments:")
print(departments)

# Inner join (only matching rows)
merged_inner = pd.merge(employees, departments, on='DepartmentID', how='inner')
print("\nInner join:")
print(merged_inner)

# Left join (all rows from left DataFrame)
merged_left = pd.merge(employees, departments, on='DepartmentID', how='left')
print("\nLeft join:")
print(merged_left)

# Right join
merged_right = pd.merge(employees, departments, on='DepartmentID', how='right')
print("\nRight join:")
print(merged_right)

# Outer join (all rows from both)
merged_outer = pd.merge(employees, departments, on='DepartmentID', how='outer')
print("\nOuter join:")
print(merged_outer)
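
When a join returns an unexpected number of rows, it helps to see where each row came from. As a sketch, `indicator=True` adds a `_merge` column and `validate` checks the expected key relationship. The variable `audit` is just an illustrative name.

```python
import pandas as pd

employees = pd.DataFrame({
    'EmployeeID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'DepartmentID': [101, 102, 101, 103]
})
departments = pd.DataFrame({
    'DepartmentID': [101, 102, 103, 104],
    'Department': ['Sales', 'IT', 'HR', 'Marketing']
})

# indicator=True labels each row as 'both', 'left_only' or 'right_only'
# validate='many_to_one' raises if DepartmentID is not unique in departments
audit = pd.merge(employees, departments, on='DepartmentID',
                 how='outer', indicator=True, validate='many_to_one')
print(audit)

# Departments with no employees
unmatched = audit.loc[audit['_merge'] == 'right_only', 'Department'].tolist()
print(unmatched)
```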

In [None]:
# Concatenate DataFrames vertically (stack rows)
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

print("DataFrame 1:")
print(df1)
print("\nDataFrame 2:")
print(df2)

concatenated = pd.concat([df1, df2], ignore_index=True)
print("\nConcatenated vertically:")
print(concatenated)

# Concatenate horizontally (side by side)
df3 = pd.DataFrame({'C': [9, 10], 'D': [11, 12]})
concatenated_h = pd.concat([df1, df3], axis=1)
print("\nConcatenated horizontally:")
print(concatenated_h)

## Pivot Tables and Reshaping

Transform data between wide and long formats:

In [None]:
# Create sample sales data
sales = pd.DataFrame({
    'Date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02', '2024-01-03', '2024-01-03'],
    'Product': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Sales': [100, 150, 120, 180, 110, 160],
    'Region': ['East', 'East', 'West', 'West', 'East', 'East']
})

print("Sales data:")
print(sales)

# Create pivot table
pivot = sales.pivot_table(values='Sales', 
                          index='Date', 
                          columns='Product', 
                          aggfunc='sum')
print("\nPivot table (Sales by Date and Product):")
print(pivot)

# Pivot with multiple aggregations
pivot_multi = sales.pivot_table(values='Sales', 
                                index='Date', 
                                columns='Product', 
                                aggfunc=['sum', 'mean'])
print("\nPivot with multiple aggregations:")
print(pivot_multi)

In [None]:
# Melt (wide to long format)
df_wide = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Math': [90, 85, 95],
    'English': [88, 92, 89]
})

print("Wide format:")
print(df_wide)

df_long = pd.melt(df_wide, 
                  id_vars=['Name'], 
                  value_vars=['Math', 'English'],
                  var_name='Subject', 
                  value_name='Score')
print("\nLong format (melted):")
print(df_long)

# Pivot (long to wide format)
df_wide_again = df_long.pivot(index='Name', columns='Subject', values='Score')
print("\nBack to wide format:")
print(df_wide_again)
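
`pivot` only reshapes, so it needs each index/column pair to be unique, while `pivot_table` aggregates duplicates. The sketch below uses a made-up `scores` frame with a repeated (Name, Subject) pair to show the difference.

```python
import pandas as pd

scores = pd.DataFrame({
    'Name': ['Alice', 'Alice', 'Bob'],
    'Subject': ['Math', 'Math', 'Math'],
    'Score': [90, 70, 85]
})

# pivot cannot decide which of Alice's two Math scores to keep
try:
    scores.pivot(index='Name', columns='Subject', values='Score')
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print("pivot failed:", exc)

# pivot_table combines the duplicates with an aggregation function
averaged = scores.pivot_table(index='Name', columns='Subject',
                              values='Score', aggfunc='mean')
print(averaged)
```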

## Reading and Writing Data

Pandas can read and write data in many formats, including CSV, Excel, and JSON:

In [None]:
# Create sample DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 28],
    'City': ['New York', 'London', 'Paris', 'Tokyo'],
    'Salary': [50000, 60000, 55000, 65000]
})

# Write to CSV
df.to_csv('employees.csv', index=False)
print("Written to employees.csv")

# Read from CSV
df_from_csv = pd.read_csv('employees.csv')
print("\nRead from CSV:")
print(df_from_csv)

# Write to Excel (requires openpyxl)
# df.to_excel('employees.xlsx', index=False, sheet_name='Employees')

# Read from Excel
# df_from_excel = pd.read_excel('employees.xlsx', sheet_name='Employees')

# Write to JSON
df.to_json('employees.json', orient='records', indent=2)
print("\nWritten to employees.json")

# Read from JSON
df_from_json = pd.read_json('employees.json')
print("\nRead from JSON:")
print(df_from_json)
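
Reading functions take options that control how columns are parsed. As a small sketch, the block below reads CSV text from an in-memory buffer (so no file is needed) and parses a date column on load. The column `Joined` and the sample rows are invented for the example.

```python
import io
import pandas as pd

csv_text = """Name,Joined,Salary
Alice,2024-01-15,50000
Bob,2024-02-20,
"""

# parse_dates converts the named column to datetime while reading;
# the empty Salary field is read as NaN
df_csv = pd.read_csv(io.StringIO(csv_text), parse_dates=['Joined'])
print(df_csv)
print(df_csv.dtypes)

# to_csv can also write to a buffer instead of a file path
buffer = io.StringIO()
df_csv.to_csv(buffer, index=False)
print(buffer.getvalue())
```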

## String Operations

Pandas provides vectorized string methods through the `.str` accessor:

In [None]:
# Create DataFrame with string data
df = pd.DataFrame({
    'Name': ['alice smith', 'BOB JONES', 'Charlie Brown', 'david LEE'],
    'Email': ['alice@example.com', 'BOB@EXAMPLE.COM', 'charlie@test.org', 'david@sample.net']
})

print("Original:")
print(df)

# Convert to lowercase
df['Name_lower'] = df['Name'].str.lower()

# Convert to uppercase
df['Name_upper'] = df['Name'].str.upper()

# Title case
df['Name_title'] = df['Name'].str.title()

print("\nWith case conversions:")
print(df[['Name', 'Name_lower', 'Name_upper', 'Name_title']])

# Extract domain from email
df['Domain'] = df['Email'].str.split('@').str[1]

# Check if contains substring (case-sensitive, so 'BOB@EXAMPLE.COM' gives False)
df['Has_example'] = df['Email'].str.contains('example')

# Replace text (regex=False treats '@' as a literal string)
df['Email_masked'] = df['Email'].str.replace('@', ' [at] ', regex=False)

print("\nWith string operations:")
print(df[['Email', 'Domain', 'Has_example', 'Email_masked']])

# String length
df['Name_length'] = df['Name'].str.len()

# Strip whitespace
df['Name_stripped'] = df['Name'].str.strip()

print("\nString length and stripped:")
print(df[['Name', 'Name_length', 'Name_stripped']])
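
String methods can be chained, and some of them return several columns at once. The following sketch, on an illustrative `contacts` frame, splits names into two columns, extracts the domain with a regular expression, and does a case-insensitive substring check.

```python
import pandas as pd

contacts = pd.DataFrame({
    'Name': ['alice smith', 'BOB JONES'],
    'Email': ['alice@example.com', 'BOB@EXAMPLE.COM']
})

# Normalize case, then split on the first space into two new columns
contacts[['First', 'Last']] = (
    contacts['Name'].str.title().str.split(' ', n=1, expand=True)
)

# Capture everything after '@' with a regex group
contacts['Domain'] = contacts['Email'].str.lower().str.extract(r'@(.+)$', expand=False)

# case=False makes the substring check ignore upper/lower case
contacts['Has_example'] = contacts['Email'].str.contains('example', case=False)

print(contacts)
```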

## Date and Time Operations

Pandas parses dates into a dedicated datetime type and exposes their components through the `.dt` accessor:

In [None]:
# Create DataFrame with date strings
df = pd.DataFrame({
    'Date': ['2024-01-15', '2024-02-20', '2024-03-25', '2024-04-30'],
    'Sales': [1000, 1500, 1200, 1800]
})

print("Original:")
print(df)
print("\nDate dtype:", df['Date'].dtype)

# Convert to datetime
df['Date'] = pd.to_datetime(df['Date'])
print("\nAfter conversion:")
print(df)
print("Date dtype:", df['Date'].dtype)

# Extract date components
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day
df['DayOfWeek'] = df['Date'].dt.dayofweek  # Monday=0, Sunday=6
df['DayName'] = df['Date'].dt.day_name()
df['MonthName'] = df['Date'].dt.month_name()

print("\nWith extracted components:")
print(df)
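
Real date columns often contain values that cannot be parsed. A minimal sketch of one way to handle that: pass an explicit `format` and `errors='coerce'`, so invalid entries become `NaT` instead of raising. The sample strings are invented.

```python
import pandas as pd

raw = pd.Series(['2024-01-15', '2024-02-30', 'not a date'])

# February 30 does not exist and the last entry is not a date;
# errors='coerce' turns both into NaT
parsed = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')
print(parsed)

# NaT counts as missing, so the usual tools apply
print("Unparseable entries:", parsed.isna().sum())
```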

In [None]:
# Date range
date_range = pd.date_range(start='2024-01-01', end='2024-01-10', freq='D')
print("Date range:")
print(date_range)

# Create DataFrame with date range
df_dates = pd.DataFrame({
    'Date': pd.date_range('2024-01-01', periods=10, freq='D'),
    'Value': np.random.randint(100, 200, 10)
})

print("\nDataFrame with date range:")
print(df_dates)

# Set date as index
df_dates.set_index('Date', inplace=True)
print("\nWith date as index:")
print(df_dates)

# Select by date
print("\nData for 2024-01-05:")
print(df_dates.loc['2024-01-05'])

# Date arithmetic
print("\nDates + 7 days:")
print(df_dates.index + pd.Timedelta(days=7))
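
With a datetime index in place, rows can be sliced by date strings and aggregated into coarser time bins with `resample`. This sketch uses fixed values (1 to 10) rather than random ones so the results are easy to check by hand. The name `daily` is illustrative.

```python
import pandas as pd

daily = pd.DataFrame(
    {'Value': range(1, 11)},
    index=pd.date_range('2024-01-01', periods=10, freq='D')
)

# Slice by date strings; both endpoints are included
first_week = daily.loc['2024-01-01':'2024-01-07']
print(first_week)

# Sum the values in consecutive 5-day bins
five_day_totals = daily['Value'].resample('5D').sum()
print(five_day_totals)
```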

## Practical Example: Sales Data Analysis

Let's analyze a synthetic sales dataset generated with NumPy:

In [None]:
# Create sample sales data
np.random.seed(42)
dates = pd.date_range('2024-01-01', periods=100, freq='D')
products = ['Laptop', 'Phone', 'Tablet', 'Monitor', 'Keyboard']
regions = ['North', 'South', 'East', 'West']

sales_data = pd.DataFrame({
    'Date': np.random.choice(dates, 200),
    'Product': np.random.choice(products, 200),
    'Region': np.random.choice(regions, 200),
    'Quantity': np.random.randint(1, 20, 200),
    'Price': np.random.randint(100, 2000, 200)
})

sales_data['Revenue'] = sales_data['Quantity'] * sales_data['Price']
sales_data = sales_data.sort_values('Date').reset_index(drop=True)

print("Sales data (first 10 rows):")
print(sales_data.head(10))

print("\nDataset info:")
print(f"Shape: {sales_data.shape}")
print(f"Columns: {sales_data.columns.tolist()}")
print(f"\nTotal revenue: ${sales_data['Revenue'].sum():,}")

In [None]:
# Analysis 1: Revenue by product
revenue_by_product = sales_data.groupby('Product')['Revenue'].sum().sort_values(ascending=False)
print("Revenue by product:")
print(revenue_by_product)

# Analysis 2: Average price and quantity by product
product_stats = sales_data.groupby('Product').agg({
    'Price': 'mean',
    'Quantity': 'mean',
    'Revenue': 'sum'
}).round(2)
print("\nProduct statistics:")
print(product_stats)

# Analysis 3: Best performing region
revenue_by_region = sales_data.groupby('Region')['Revenue'].sum().sort_values(ascending=False)
print("\nRevenue by region:")
print(revenue_by_region)

# Analysis 4: Monthly revenue
sales_data['Month'] = pd.to_datetime(sales_data['Date']).dt.to_period('M')
monthly_revenue = sales_data.groupby('Month')['Revenue'].sum()
print("\nMonthly revenue:")
print(monthly_revenue)

In [None]:
# Analysis 5: Top 5 days by revenue
top_days = sales_data.groupby('Date')['Revenue'].sum().nlargest(5)
print("Top 5 days by revenue:")
print(top_days)

# Analysis 6: Product-Region performance matrix
pivot_table = sales_data.pivot_table(
    values='Revenue',
    index='Product',
    columns='Region',
    aggfunc='sum',
    fill_value=0
)
print("\nRevenue by Product and Region:")
print(pivot_table)

# Analysis 7: Best selling product per region
best_per_region = sales_data.groupby(['Region', 'Product'])['Revenue'].sum().reset_index()
best_per_region = best_per_region.loc[best_per_region.groupby('Region')['Revenue'].idxmax()]
print("\nBest selling product per region:")
print(best_per_region)

## Basic Visualization

Pandas has built-in plotting methods that use Matplotlib, which must be installed separately:

In [None]:
import matplotlib.pyplot as plt

# Create sample data
df = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'Sales': [1000, 1500, 1200, 1800, 2100, 1900],
    'Expenses': [800, 900, 850, 950, 1100, 1000]
})

# Line plot
df.plot(x='Month', y=['Sales', 'Expenses'], kind='line', figsize=(10, 5))
plt.title('Sales vs Expenses')
plt.ylabel('Amount ($)')
plt.grid(True)
plt.show()

# Bar plot
df.plot(x='Month', y='Sales', kind='bar', figsize=(10, 5), color='skyblue')
plt.title('Monthly Sales')
plt.ylabel('Sales ($)')
plt.xticks(rotation=0)
plt.show()

# Multiple columns bar plot
df.plot(x='Month', y=['Sales', 'Expenses'], kind='bar', figsize=(10, 5))
plt.title('Sales and Expenses by Month')
plt.ylabel('Amount ($)')
plt.xticks(rotation=0)
plt.legend()
plt.show()

## Common Pandas Operations Cheat Sheet

### Selection:
- `df['column']` - Select single column
- `df[['col1', 'col2']]` - Select multiple columns
- `df.iloc[0]` - Select by position
- `df.loc[0, 'col']` - Select by label
- `df[df['col'] > value]` - Filter rows

### Aggregation:
- `df.groupby('col').agg(func)` - Group and aggregate (`func` can be a name such as `'mean'`, a list, or a dict)
- `df.sum()`, `df.mean()`, `df.count()` - Statistics
- `df.describe()` - Summary statistics

### Transformation:
- `df.sort_values('col')` - Sort
- `df.drop('col', axis=1)` - Drop column
- `df.fillna(value)` - Fill missing values
- `df.dropna()` - Drop missing values

### Merging and Reshaping:
- `pd.merge(df1, df2, on='key')` - Join DataFrames
- `pd.concat([df1, df2])` - Concatenate
- `df.pivot_table()` - Create pivot table

## Exercises

Practice with these exercises:

1. Create a DataFrame with student names, grades for 3 subjects, and calculate average grade
2. Load a CSV file and find the top 5 rows with the highest values in a specific column
3. Group data by category and calculate sum, mean, and count for each group
4. Handle missing values by filling them with the column mean
5. Merge two DataFrames (students and their courses) using a common ID
6. Create a pivot table showing average grades by student and subject
7. Extract year, month, and day from a date column
8. Find duplicate rows and remove them

Good luck!

## Summary

Pandas is essential for:
- Data manipulation and cleaning
- Exploratory data analysis
- Data preparation for machine learning
- Working with structured data

**Key Takeaways:**
- DataFrame and Series are the core data structures
- Powerful selection, filtering, and indexing capabilities
- GroupBy enables split-apply-combine operations
- Easy handling of missing data
- Built-in support for merging, joining, and reshaping
- Excellent time series functionality
- Direct integration with visualization libraries

**Next Steps:**
- Practice with real datasets (Kaggle, UCI ML Repository)
- Learn data visualization with Matplotlib and Seaborn
- Explore advanced pandas features (MultiIndex, window functions)
- Study data cleaning and preprocessing techniques
- Combine with NumPy for numerical operations