# Introduction to Pandas 🐼
## Data Analysis Made Easy

**Duration: 50 minutes**  
**Target Audience: Beginners with little to no pandas experience**

---

### What We'll Learn Today:
1. **What is Pandas?** (5 minutes)
2. **Setting Up & First Steps** (5 minutes)
3. **Series: The Building Block** (10 minutes)
4. **DataFrames: Your New Best Friend** (15 minutes)
5. **Data Exploration & Basic Operations** (10 minutes)
6. **Hands-on Practice** (5 minutes)

---

> **💡 Tip:** Run each code cell as we go through the lesson. Press `Shift + Enter` to execute a cell!

## 1. What is Pandas? 🤔

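**Pandas** is an open-source Python library for data analysis. Its name comes from "panel data" and doubles as a play on "Python data analysis". It works like a Swiss Army knife for tabular data. Think of it as Excel on steroids!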

### Why Pandas?
- 📊 **Easy data manipulation**: Clean, transform, and analyze data effortlessly
- 📈 **Handles multiple data types**: Numbers, text, dates, and more
- 🔗 **Integrates beautifully**: Works seamlessly with other Python libraries
- 🚀 **Performance**: Built on NumPy for speed

### Real-world Applications:
- 🏥 **Healthcare**: Analyzing patient data and treatment outcomes
- 💰 **Finance**: Stock market analysis and risk assessment
- 🛒 **E-commerce**: Customer behavior and sales trends
- 🌱 **Research**: Scientific data analysis and visualization

---
**Think of pandas as your data assistant that never gets tired of organizing spreadsheets!**

## 2. Setting Up & First Steps 🚀

Let's start our pandas journey! First, we need to import the library.

In [None]:
# Import pandas (the standard convention is to use 'pd' as an alias)
import pandas as pd
import numpy as np  # NumPy, the array library pandas is built on

# Let's check which version of pandas we're using
print(f"Pandas version: {pd.__version__}")
print("🎉 Pandas is ready to use!")

## 3. Series: The Building Block 🧱

A **Series** is like a single column in a spreadsheet. It's a one-dimensional labeled array: every value is paired with an index label, and the values can be of any data type.

### Think of it as:
- A list with superpowers
- A single column from Excel
- A 1D array with labels (index)

In [None]:
# Creating our first Series - Student scores
scores = pd.Series([85, 92, 78, 96, 88])
print("Student Scores:")
print(scores)
print(f"\nType: {type(scores)}")

In [None]:
# Creating a Series with custom labels (index)
student_names = ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve']
named_scores = pd.Series([85, 92, 78, 96, 88], index=student_names)

print("Scores with Student Names:")
print(named_scores)
print(f"\nAlice's score: {named_scores['Alice']}")
print(f"Highest score: {named_scores.max()}")
print(f"Average score: {named_scores.mean():.1f}")
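
The labels on a Series give you two ways to reach a value: by label or by position. Here is a minimal sketch of both, plus a simple filter, reusing the same scores as above. The variable names `bob_score`, `first_score`, and `top_scores` are just illustrative.

```python
import pandas as pd

# Same data as the lesson's named_scores Series
named_scores = pd.Series(
    [85, 92, 78, 96, 88],
    index=['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
)

# .loc selects by label, .iloc selects by integer position
bob_score = named_scores.loc['Bob']
first_score = named_scores.iloc[0]

# A comparison produces a boolean Series that can be used as a filter
top_scores = named_scores[named_scores >= 90]

print(f"Bob (by label): {bob_score}")
print(f"First entry (by position): {first_score}")
print("Scores of 90 or more:")
print(top_scores)
```

Using `.loc` and `.iloc` explicitly tends to be clearer than plain square brackets once your labels are numbers themselves.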

### 🔍 Quick Exercise: Try it yourself!
Create a Series with the temperatures for a week. Use the days of the week as labels.

In [None]:
# Your turn! Create a temperature series for the week
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
temperatures = [72, 75, 78, 74, 71, 69, 73]  # Temperatures in Fahrenheit

weekly_temps = pd.Series(temperatures, index=days)
print("Weekly Temperatures:")
print(weekly_temps)
print(f"\nWarmest day: {weekly_temps.idxmax()} ({weekly_temps.max()}°F)")
print(f"Coolest day: {weekly_temps.idxmin()} ({weekly_temps.min()}°F)")
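
One of the "superpowers" of a Series is that arithmetic applies to every value at once, with no loop needed. As a small sketch, the snippet below converts the same weekly temperatures to Celsius. The name `weekly_temps_c` is an illustrative choice, not part of the exercise.

```python
import pandas as pd

days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
weekly_temps = pd.Series([72, 75, 78, 74, 71, 69, 73], index=days)  # Fahrenheit

# The formula runs on all seven values at once; the day labels are preserved
weekly_temps_c = ((weekly_temps - 32) * 5 / 9).round(1)

print("Weekly Temperatures in Celsius:")
print(weekly_temps_c)
```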

## 4. DataFrames: Your New Best Friend 📊

A **DataFrame** is like a complete spreadsheet: it has multiple rows and columns. Think of it as several Series that share the same index, combined side by side!

### Analogy:
- 📋 **Series** = Single column in Excel
- 📊 **DataFrame** = Complete Excel worksheet with multiple columns

### Key Features:
- 2-dimensional (rows and columns)
- Each column can have different data types
- Built-in indexing and labeling
- Perfect for real-world datasets
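
To make the "multiple Series combined" idea concrete, here is a small sketch that builds a DataFrame from two Series sharing the same labels. The names `ages`, `scores`, and `students` are made up for this illustration.

```python
import pandas as pd

names = ['Alice', 'Bob', 'Charlie']
ages = pd.Series([20, 21, 19], index=names)
scores = pd.Series([85, 92, 78], index=names)

# Each dictionary key becomes a column; rows are aligned on the shared index
students = pd.DataFrame({'Age': ages, 'Score': scores})
print(students)

# Pulling a column back out gives a Series again
print(type(students['Score']))
```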

In [None]:
# Creating our first DataFrame - Student information
student_data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [20, 21, 19, 22, 20],
    'Grade': ['A', 'B+', 'B', 'A+', 'A-'],
    'Score': [85, 92, 78, 96, 88]
}

df = pd.DataFrame(student_data)
print("Student DataFrame:")
print(df)

In [None]:
# Let's explore our DataFrame
print("DataFrame Shape (rows, columns):", df.shape)
print("\nColumn Names:", df.columns.tolist())
print("\nData Types:")
print(df.dtypes)
print("\nFirst 3 rows:")
print(df.head(3))

### Selecting Data from DataFrames

Just like in Excel, we often want to look at specific columns or rows. Pandas makes this super easy!

In [None]:
# Selecting a single column (returns a Series)
print("Just the names:")
print(df['Name'])
print(f"\nType: {type(df['Name'])}")

print("\n" + "="*30)

# Selecting multiple columns (returns a DataFrame)
print("Names and Scores:")
print(df[['Name', 'Score']])
print(f"\nType: {type(df[['Name', 'Score']])}")
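
Columns are only half the story: rows can be selected too. The sketch below shows one common way, using `.iloc` for positions and `.loc` for labels, on a trimmed-down copy of the student table. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [20, 21, 19, 22, 20],
    'Score': [85, 92, 78, 96, 88],
})

# One row by position, returned as a Series
first_row = df.iloc[0]

# Position slice: rows 0 and 1 (the end position is excluded)
first_two = df.iloc[:2]

# Label slice: rows labeled 1 through 3 (the end label is included)
by_label = df.loc[1:3, ['Name', 'Score']]

print(first_row)
print(first_two)
print(by_label)
```

Note the difference in the slices: `.iloc` follows normal Python slicing, while `.loc` includes the last label.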

In [None]:
# Filtering data (like using filters in Excel)
print("Students with score >= 90:")
high_scorers = df[df['Score'] >= 90]
print(high_scorers)

print("\nStudents aged 20:")
age_20 = df[df['Age'] == 20]
print(age_20[['Name', 'Age', 'Score']])
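
Filters can also be combined, much like stacking several filters in Excel. Here is a minimal sketch using `&` (and), `|` (or), and `~` (not) on a small copy of the student data. Each condition needs its own parentheses. The result names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [20, 21, 19, 22, 20],
    'Score': [85, 92, 78, 96, 88],
})

# AND: both conditions must hold
young_high = df[(df['Age'] <= 20) & (df['Score'] >= 85)]

# OR: at least one condition must hold
top_or_youngest = df[(df['Score'] >= 95) | (df['Age'] == 19)]

# NOT: invert a condition
below_80 = df[~(df['Score'] >= 80)]

print(young_high)
print(top_or_youngest)
print(below_80)
```

Python's plain `and` / `or` keywords do not work element by element on a Series, which is why the symbol operators are used here.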

### Adding New Columns

Just like adding a new column in Excel, we can easily add new data to our DataFrame!

In [None]:
# Adding a new column
df['Pass'] = df['Score'] >= 80  # Boolean column: True if score >= 80
df['Score_Category'] = df['Score'].apply(lambda x: 'Excellent' if x >= 90 
                                        else 'Good' if x >= 80 
                                        else 'Needs Improvement')

print("Updated DataFrame:")
print(df)
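
The nested `lambda` above works, but it can get hard to read as categories grow. One possible alternative, sketched here with NumPy's `np.select`, lists the conditions and labels side by side. This is a substitute technique for illustration, not the only way to do it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Score': [85, 92, 78, 96, 88],
})

# Conditions are checked in order; the first one that is True wins
conditions = [df['Score'] >= 90, df['Score'] >= 80]
labels = ['Excellent', 'Good']

# Rows matching no condition get the default label
df['Score_Category'] = np.select(conditions, labels, default='Needs Improvement')
print(df)
```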

## 5. Data Exploration & Basic Operations 🔍

Now let's learn some essential operations that you'll use all the time in data analysis!

In [None]:
# Basic statistics - like having a built-in calculator!
print("📊 BASIC STATISTICS")
print("="*30)
print(f"Average score: {df['Score'].mean():.1f}")
print(f"Median score: {df['Score'].median()}")
print(f"Highest score: {df['Score'].max()}")
print(f"Lowest score: {df['Score'].min()}")
print(f"Score range: {df['Score'].max() - df['Score'].min()}")

print("\n📈 SUMMARY STATISTICS")
print("="*30)
print(df['Score'].describe())
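
The same statistics can be computed for several columns in one go. As a rough sketch, `agg` with a list of function names returns a small table with one row per statistic. The name `summary` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [20, 21, 19, 22, 20],
    'Score': [85, 92, 78, 96, 88],
})

# One row per statistic, one column per selected numeric column
summary = df[['Age', 'Score']].agg(['mean', 'min', 'max'])
print(summary)
```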

In [None]:
# Sorting data (like sorting in Excel)
print("📋 STUDENTS SORTED BY SCORE (Highest first):")
print(df.sort_values('Score', ascending=False)[['Name', 'Score', 'Grade']])

print("\n📋 STUDENTS SORTED BY NAME (Alphabetical):")
print(df.sort_values('Name')[['Name', 'Age', 'Score']])
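
Sorting can use more than one column, just like adding sort levels in Excel. The sketch below sorts by age first and, within each age, by score from highest to lowest. The name `by_age_then_score` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [20, 21, 19, 22, 20],
    'Score': [85, 92, 78, 96, 88],
})

# One ascending flag per sort column: Age ascending, Score descending
by_age_then_score = df.sort_values(['Age', 'Score'], ascending=[True, False])
print(by_age_then_score)
```

`sort_values` returns a new DataFrame, so `df` itself keeps its original order.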

In [None]:
# Grouping data (like pivot tables in Excel)
print("📊 PERFORMANCE BY AGE:")
age_groups = df.groupby('Age')['Score'].agg(['mean', 'count'])
print(age_groups)

print("\n📊 GRADE DISTRIBUTION:")
grade_counts = df['Grade'].value_counts()
print(grade_counts)
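
When a grouped table feeds into a report, descriptive column names help. Here is a small sketch using named aggregation, where each output column is written as `new_name=(column, function)`. The names `avg_score` and `students` are invented for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [20, 21, 19, 22, 20],
    'Score': [85, 92, 78, 96, 88],
})

# Group by age, then name each result column explicitly
age_summary = df.groupby('Age').agg(
    avg_score=('Score', 'mean'),
    students=('Score', 'count'),
)
print(age_summary)
```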

### Working with Real Data: Reading Files

In the real world, data often comes from files. Let's create and read a CSV file!

In [None]:
# Save our DataFrame to a CSV file
df.to_csv('students.csv', index=False)
print("✅ Saved students.csv")

# Read it back
df_from_file = pd.read_csv('students.csv')
print("\n📂 Data read from CSV file:")
print(df_from_file)

# Check if they're the same (equals() compares both the values and the column dtypes)
print(f"\nAre they identical? {df.equals(df_from_file)}")
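
Why `index=False`? The sketch below shows what happens with and without it. To keep it self-contained, it writes the CSV text into an in-memory buffer (`io.StringIO`) instead of a file. That buffer is an assumption of this example, not part of the lesson's workflow.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 92]})

# Default (index=True): the row labels are written as an extra, unnamed first column
with_index = pd.read_csv(io.StringIO(df.to_csv()))

# index=False: only the data columns are written
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

print(with_index.columns.tolist())
print(without_index.columns.tolist())
```

The first list includes an extra `Unnamed: 0` column, which is the old index coming back as data.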

## 6. Hands-on Practice Challenge! 🎯

**Scenario:** You're analyzing data for a small coffee shop. Let's create and analyze some sales data!

In [None]:
# Create coffee shop sales data
coffee_data = {
    'Day': ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'],
    'Espresso': [25, 30, 28, 35, 45, 60, 40],
    'Latte': [40, 45, 42, 50, 65, 80, 55],
    'Cappuccino': [20, 25, 23, 30, 35, 45, 32],
    'Temperature': [68, 72, 70, 75, 73, 78, 76]  # Weather temperature
}

coffee_df = pd.DataFrame(coffee_data)
print("☕ COFFEE SHOP SALES DATA")
print("="*40)
print(coffee_df)

In [None]:
# Let's analyze the coffee shop data!

# 1. Calculate total daily sales
coffee_df['Total_Sales'] = coffee_df['Espresso'] + coffee_df['Latte'] + coffee_df['Cappuccino']

# 2. Find the best and worst sales days
best_day = coffee_df.loc[coffee_df['Total_Sales'].idxmax(), 'Day']
worst_day = coffee_df.loc[coffee_df['Total_Sales'].idxmin(), 'Day']

print(f"🏆 Best sales day: {best_day} ({coffee_df['Total_Sales'].max()} drinks)")
print(f"📉 Worst sales day: {worst_day} ({coffee_df['Total_Sales'].min()} drinks)")

# 3. Most popular drink overall
drink_totals = {
    'Espresso': coffee_df['Espresso'].sum(),
    'Latte': coffee_df['Latte'].sum(),
    'Cappuccino': coffee_df['Cappuccino'].sum()
}

most_popular = max(drink_totals, key=drink_totals.get)
print(f"☕ Most popular drink: {most_popular} ({drink_totals[most_popular]} total sales)")

# 4. Average sales per day
print(f"📊 Average daily sales: {coffee_df['Total_Sales'].mean():.1f} drinks")

print("\n" + "="*50)
print("UPDATED DATAFRAME WITH TOTAL SALES:")
print(coffee_df)
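
The analysis above spells out each drink column by hand. As the menu grows, a more compact sketch is to keep the drink names in a list and let pandas sum across them. The list name `drink_cols` is an illustrative choice.

```python
import pandas as pd

coffee_df = pd.DataFrame({
    'Day': ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'],
    'Espresso': [25, 30, 28, 35, 45, 60, 40],
    'Latte': [40, 45, 42, 50, 65, 80, 55],
    'Cappuccino': [20, 25, 23, 30, 35, 45, 32],
})

drink_cols = ['Espresso', 'Latte', 'Cappuccino']

# axis=1 sums across the columns, giving one total per row (day)
coffee_df['Total_Sales'] = coffee_df[drink_cols].sum(axis=1)

# The default sum goes down each column, giving one total per drink
drink_totals = coffee_df[drink_cols].sum()
most_popular = drink_totals.idxmax()

print(coffee_df[['Day', 'Total_Sales']])
print(f"Most popular drink: {most_popular} ({drink_totals[most_popular]} total sales)")
```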

### 🎯 Your Turn: Practice Challenge!

**Task:** Answer these questions using pandas operations:

1. Which days had sales above the average?
2. Is there a relationship between temperature and total sales?
3. What percentage of total sales does each drink type represent?

Try to solve these in the cell below!

In [None]:
# EXAMPLE SOLUTION - Try the challenges yourself first, then compare!

# 1. Days with above-average sales
avg_sales = coffee_df['Total_Sales'].mean()
above_avg_days = coffee_df[coffee_df['Total_Sales'] > avg_sales]['Day'].tolist()
print(f"1️⃣ Days with above-average sales ({avg_sales:.1f}): {above_avg_days}")

# 2. Temperature vs Sales relationship (basic correlation)
correlation = coffee_df['Temperature'].corr(coffee_df['Total_Sales'])
print(f"2️⃣ Temperature-Sales correlation: {correlation:.3f}")
print("   (1.0 = perfect positive, 0 = no relationship, -1.0 = perfect negative)")

# 3. Drink type percentages
total_all_sales = coffee_df[['Espresso', 'Latte', 'Cappuccino']].sum().sum()
espresso_pct = (coffee_df['Espresso'].sum() / total_all_sales) * 100
latte_pct = (coffee_df['Latte'].sum() / total_all_sales) * 100
cappuccino_pct = (coffee_df['Cappuccino'].sum() / total_all_sales) * 100

print("3️⃣ Sales by drink type:")
print(f"   ☕ Espresso: {espresso_pct:.1f}%")
print(f"   ☕ Latte: {latte_pct:.1f}%")
print(f"   ☕ Cappuccino: {cappuccino_pct:.1f}%")
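
The three percentage lines follow the same formula, so they can also be computed in a single step. In this sketch, dividing the Series of drink totals by its own sum gives every share at once. The name `share_pct` is illustrative.

```python
import pandas as pd

coffee_df = pd.DataFrame({
    'Espresso': [25, 30, 28, 35, 45, 60, 40],
    'Latte': [40, 45, 42, 50, 65, 80, 55],
    'Cappuccino': [20, 25, 23, 30, 35, 45, 32],
})

# Total per drink, then each total as a percentage of all drinks sold
drink_totals = coffee_df.sum()
share_pct = drink_totals / drink_totals.sum() * 100

for drink, pct in share_pct.items():
    print(f"{drink}: {pct:.1f}%")
```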

## 🎉 Congratulations! You've Completed Pandas Basics!

### What You've Learned Today:
✅ **Pandas fundamentals**: Series and DataFrames  
✅ **Data creation**: From dictionaries and lists  
✅ **Data selection**: Columns, rows, and filtering  
✅ **Data manipulation**: Adding columns and transforming data  
✅ **Data analysis**: Statistics, sorting, and grouping  
✅ **File operations**: Reading and writing CSV files  
✅ **Real-world practice**: Coffee shop sales analysis  

### Next Steps in Your Pandas Journey:
🚀 **Intermediate Topics:**
- Data cleaning and handling missing values
- Merging and joining DataFrames
- Advanced filtering and querying
- Time series analysis
- Data visualization with pandas

🚀 **Advanced Topics:**
- Performance optimization
- Working with large datasets
- Integration with other libraries (matplotlib, seaborn, scikit-learn)
- Database connections

### Key Takeaways:
💡 **Remember**: Pandas is like Excel with superpowers  
💡 **Practice**: The more you use it, the more natural it becomes  
💡 **Explore**: Don't be afraid to try new operations  
💡 **Documentation**: `help()` and the pandas documentation are your friends  

---
**Happy Data Analyzing! 🐼📊**