# 🚀 Beginner Level (0 to Basics)



# 🔹 Step 1: Introduction to pandas

### What is pandas?
pandas is an open-source Python library for data analysis and manipulation, built around labeled, table-like data structures.

### Installation
`pip install pandas`

### Importing pandas
```python
import pandas as pd
```

### Code Practice


```python
import pandas as pd
```

# 🔹 Step 2: Understanding pandas Data Structures

### Series (1D Data)
A Series is a one-dimensional labeled array: a sequence of values paired with an index.

### DataFrame (2D Data)
A DataFrame is a two-dimensional, table-like data structure with labeled rows and columns; each column is a Series.

### Creating Series and DataFrames
```python
# From Lists
s = pd.Series([1, 2, 3, 4])

# From Dictionaries
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# From NumPy Arrays
import numpy as np
df = pd.DataFrame(np.random.rand(4,3), columns=['A', 'B', 'C'])

# From CSV Files
df = pd.read_csv('data.csv')
```

### Code Practice


```python
df = pd.read_csv("data.csv")
df.head(2)
```

| Unnamed: 0 | Region | Country | Item Type | Sales Channel | Order Priority | Order Date | Order ID | Ship Date | Units Sold | Unit Price | Unit Cost | Total Revenue | Total Cost | Total Profit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Sub-Saharan Africa | South Africa | Fruits | Offline | M | 7/27/2012 | 443368995 | 7/28/2012 | 1593 | 9.33 | 6.92 | 14862.69 | 11023.56 | 3839.13 |
| 1 | Middle East and North Africa | Morocco | Clothes | Online | M | 9/14/2013 | 667593514 | 10/19/2013 | 4611 | 109.28 | 35.84 | 503890.08 | 165258.24 | 338631.84 |


# 🔹 Step 3: Basic DataFrame Operations

### Checking Data
```python
df.head()
df.tail()
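df.info                  # no parentheses: shows the bound method, not the summary
df.info()                # prints column names, dtypes, and non-null counts
df.describe              # no parentheses: shows the bound method, not the statistics
df.describe().round(1)   # summary statistics for numeric columns, rounded to 1 decimal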
df.columns
df['Region'].unique()
```

### Accessing Columns and Rows
```python
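df['city']                      # a single column, returned as a Series
df.loc[0:3, ['city', 'type']]   # label-based: rows labeled 0 through 3, end included
df.iloc[0:3, [1, 2]]            # position-based: rows 0, 1, 2 and the columns at positions 1 and 2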
```
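The difference between `loc` and `iloc` slicing is easy to miss: `loc` slices by label and includes the end label, while `iloc` slices by position and excludes the end. A minimal sketch, assuming a small made-up table (the `city`, `type`, and `sales` values are illustrative and not taken from `data.csv`):

```python
import pandas as pd

# Hypothetical sample data; the column names follow the snippet above.
df = pd.DataFrame({
    "city": ["Pune", "Lima", "Oslo", "Cairo", "Quito"],
    "type": ["A", "B", "A", "C", "B"],
    "sales": [10, 20, 30, 40, 50],
})

by_label = df.loc[0:3, ["city", "type"]]   # labels 0, 1, 2, 3 (end included)
by_position = df.iloc[0:3, [1, 2]]         # positions 0, 1, 2 (end excluded)

print(len(by_label), len(by_position))
print(list(by_position.columns))
```

With the default integer index the labels and positions coincide, which is why the two calls look alike but return a different number of rows.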

### Adding & Removing Columns
```python
df['new_col'] = values                   # values: a scalar, or a list/array with one entry per row
df.drop(columns=['col'], inplace=True)   # modifies df in place and returns None
```

### Columns
```
Index(['Region', 'Country', 'Item Type', 'Sales Channel', 'Order Priority',
       'Order Date', 'Order ID', 'Ship Date', 'Units Sold', 'Unit Price',
       'Unit Cost', 'Total Revenue', 'Total Cost', 'Total Profit'],
      dtype='object')
```

### Code Practice


```python
# df.head(2)
# df.tail(2)
# df.info
# df.info()
# df.describe
# df.describe().round(1)
# df.columns
df['Region'].unique()
```

```
array(['Sub-Saharan Africa', 'Middle East and North Africa',
       'Australia and Oceania', 'Europe', 'Asia',
       'Central America and the Caribbean', 'North America'], dtype=object)
```

# 📊 Intermediate Level (Core pandas Operations)



# 🔹 Step 4: Data Cleaning & Preprocessing

### Handling Missing Values
```python
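df.isnull().sum()   # number of missing values in each column
df.fillna(0)        # returns a new DataFrame with missing values replaced by 0
df.dropna()         # returns a new DataFrame without rows that contain missing values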
```

### Changing Data Types
```python
df['price'] = df['price'].astype(float)
```

### Code Practice
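One possible practice cell for this step, as a sketch: it counts missing values, fills one column, drops rows that still lack a price, and converts the price strings to floats. The sample rows are made up for illustration and do not come from `data.csv`.

```python
import pandas as pd

# Hypothetical sample data with gaps.
df = pd.DataFrame({
    "item": ["Fruits", "Clothes", "Snacks", None],
    "price": ["9.33", "109.28", None, "152.58"],
})

missing_per_column = df.isnull().sum()             # missing values per column
filled = df.fillna({"item": "Unknown"})            # new DataFrame; df itself is unchanged
cleaned = filled.dropna(subset=["price"]).copy()   # drop rows that still lack a price
cleaned["price"] = cleaned["price"].astype(float)  # text to float

print(missing_per_column)
print(cleaned.dtypes)
```

Passing a dictionary to `fillna` limits the fill to the named columns, so a missing price is not silently turned into a placeholder value.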


# 🔹 Step 5: Data Selection & Filtering

### Conditional Selection
```python
df[df['Age'] > 30]
df[df['Age'] < 30]
df[df['Age'] == 30]
```

### Multiple Conditions
```python
df[(df['Age'] > 30) & (df['Salary'] > 60000)]
```

### Code Practice
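A short sketch of combining conditions, assuming a made-up employee table with the `Age` and `Salary` columns used above. Each condition is wrapped in parentheses because `&`, `|`, and `~` bind more tightly than comparison operators.

```python
import pandas as pd

# Hypothetical employee data.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Chen", "Dara"],
    "Age": [28, 35, 42, 30],
    "Salary": [52000, 61000, 58000, 75000],
})

older_and_high_paid = df[(df["Age"] > 30) & (df["Salary"] > 60000)]   # both must hold
young_or_high_paid = df[(df["Age"] < 30) | (df["Salary"] > 70000)]    # either may hold
not_thirty = df[~(df["Age"] == 30)]                                   # negation

print(older_and_high_paid["Name"].tolist())
print(young_or_high_paid["Name"].tolist())
```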


# 🔹 Step 6: Data Aggregation & Grouping

### GroupBy Operations
```python
df.groupby('Region')[['Units Sold', 'Total Cost']].sum()
```

### Code Practice
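A possible practice cell for grouping, as a sketch. The column names match the sales dataset shown earlier, but the rows are invented so the totals are easy to check by hand.

```python
import pandas as pd

# Made-up rows using the sales dataset's column names.
df = pd.DataFrame({
    "Region": ["Europe", "Asia", "Europe", "Asia", "Europe"],
    "Units Sold": [100, 200, 300, 400, 500],
    "Total Cost": [10.0, 20.0, 30.0, 40.0, 50.0],
})

totals = df.groupby("Region")[["Units Sold", "Total Cost"]].sum()   # one row per region
mean_units = df.groupby("Region")["Units Sold"].mean()              # a Series indexed by region
flat = totals.reset_index()                                         # Region back as a regular column

print(totals)
print(mean_units)
```

The grouping key becomes the index of the result; `reset_index()` is one way to turn it back into an ordinary column before exporting or merging.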


# 🔹 Step 7: Sorting & Indexing

```python
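df.sort_values(by='Salary', ascending=False)   # returns a new DataFrame, highest salary first
df.set_index('Name')                           # returns a new DataFrame indexed by the Name column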
```

### Code Practice
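A minimal sketch of sorting and indexing on a made-up table. Both methods return new DataFrames, so the results are assigned to new names here.

```python
import pandas as pd

# Hypothetical data using the Name and Salary columns from the snippet above.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Chen"],
    "Salary": [50000, 60000, 55000],
})

by_salary = df.sort_values(by="Salary", ascending=False)   # highest salary first
indexed = df.set_index("Name")                             # rows keyed by name
bob_salary = indexed.loc["Bob", "Salary"]                  # label lookup through the new index
restored = indexed.reset_index()                           # Name becomes a column again

print(by_salary["Name"].tolist())
print(bob_salary)
```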


# 🔹 Step 8: Merging & Joining Data

```python
df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [1, 2], 'Salary': [50000, 60000]})
merged_df = pd.merge(df1, df2, on='ID')
```

### Code Practice
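The `how` argument controls which keys survive a merge. The sketch below extends the two small tables above with one unmatched ID on each side (an assumption made for illustration) so the join types give visibly different results.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Alice", "Bob", "Chen"]})
df2 = pd.DataFrame({"ID": [1, 2, 4], "Salary": [50000, 60000, 70000]})

inner = pd.merge(df1, df2, on="ID")                 # default how="inner": IDs present in both
left = pd.merge(df1, df2, on="ID", how="left")      # every row of df1; missing salaries are NaN
outer = pd.merge(df1, df2, on="ID", how="outer", indicator=True)  # all IDs, with a _merge column

print(len(inner), len(left), len(outer))
print(outer[["ID", "_merge"]])
```

The `indicator=True` column is a quick way to see which rows had no partner in the other table.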


# 📈 Advanced Level (Performance Optimization & Advanced Features)



# 🔹 Step 9: Handling Dates & Time Series Analysis

```python
df['date'] = pd.to_datetime(df['date'])
df['year'] = df['date'].dt.year
```

### Code Practice
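A sketch of parsing dates written in the month/day/year style seen in the sales dataset's `Order Date` column. The three date strings are illustrative, and the explicit `format` argument is a design choice here so pandas does not have to guess the layout.

```python
import pandas as pd

# Illustrative date strings in month/day/year order.
df = pd.DataFrame({"date": ["7/27/2012", "9/14/2013", "10/19/2013"]})

df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")   # text to datetime64
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["weekday"] = df["date"].dt.day_name()

print(df)
```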


# 🔹 Step 10: Working with Large Datasets (Performance Optimization)

```python
df.to_parquet('data.parquet')   # compressed columnar format; requires pyarrow or fastparquet
```

### Code Practice
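One common way to handle a file that does not fit comfortably in memory is to read it in chunks and aggregate as you go. The sketch below uses an in-memory string as a stand-in for a large CSV file; the data and the chunk size of 4 are arbitrary choices for illustration.

```python
import io

import pandas as pd

# In-memory stand-in for a large CSV file (made-up values).
csv_text = "Region,Units Sold\n" + "\n".join(
    f"{'Europe' if i % 2 else 'Asia'},{i}" for i in range(1, 11)
)

total_units = 0
rows_seen = 0
# With chunksize set, read_csv yields DataFrames of at most that many rows.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total_units += int(chunk["Units Sold"].sum())
    rows_seen += len(chunk)

print(rows_seen, total_units)
```

Only one chunk is held in memory at a time, so this pattern suits totals, counts, and filters that can be computed incrementally.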


# 🔹 Step 11: Data Visualization with pandas

```python
df.plot(kind='line')   # requires matplotlib; draws each numeric column against the index
```

### Code Practice


# 🔹 Step 12: Exporting & Saving Data

```python
df.to_csv('output.csv', index=False)   # index=False leaves the row index out of the file
```

### Code Practice
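By default `to_csv` also writes the row index, and reading such a file back produces an extra column that pandas names `Unnamed: 0`. A small sketch of the difference, writing to strings instead of files so it runs without touching the disk (the two sample rows are made up):

```python
import io

import pandas as pd

df = pd.DataFrame({"Region": ["Europe", "Asia"], "Units Sold": [100, 200]})

with_index = df.to_csv()                  # no path given: the CSV text is returned as a string
without_index = df.to_csv(index=False)

round_trip_with = pd.read_csv(io.StringIO(with_index))
round_trip_without = pd.read_csv(io.StringIO(without_index))

print(list(round_trip_with.columns))
print(list(round_trip_without.columns))
```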


# 🎯 Expert Level (Real-World Applications & Integration)



# 🔹 Step 13: Connecting pandas with Databases

```python
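import mysql.connector
# Placeholder credentials and table name; note that `table` itself is a reserved word in MySQL.
conn = mysql.connector.connect(host='localhost', user='user', password='pass', database='db')
df = pd.read_sql('SELECT * FROM table_name', conn)
conn.close()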
```

### Code Practice
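To practice without a MySQL server, the same read pattern can be tried against an in-memory SQLite database from Python's standard library. This is a stand-in, not the MySQL setup above; the `employees` table and its columns are hypothetical.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database used as a stand-in for a real server.
conn = sqlite3.connect(":memory:")
pd.DataFrame({"id": [1, 2, 3], "salary": [50000, 60000, 70000]}).to_sql(
    "employees", conn, index=False
)

# Values are bound through params rather than formatted into the SQL string.
high_paid = pd.read_sql_query(
    "SELECT id, salary FROM employees WHERE salary > ?", conn, params=(55000,)
)
conn.close()

print(high_paid)
```

The `?` placeholder is SQLite's parameter style; other drivers use different markers, so check the driver's documentation before reusing the query text.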


# 🔹 Step 14: Automating Tasks with pandas

```python
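import requests
response = requests.get('API_URL', timeout=10)   # 'API_URL' is a placeholder
response.raise_for_status()                      # stop early on an HTTP error status
df = pd.json_normalize(response.json())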
```

### Code Practice
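The flattening step can be practiced offline by feeding `pd.json_normalize` a literal that stands in for `response.json()`. The field names below are hypothetical and do not describe any particular API.

```python
import pandas as pd

# Stand-in for response.json(); the structure is invented for illustration.
payload = [
    {"id": 1, "user": {"name": "Alice", "address": {"city": "Pune"}}},
    {"id": 2, "user": {"name": "Bob", "address": {"city": "Lima"}}},
]

df = pd.json_normalize(payload)                       # nested keys become dotted column names
df_underscored = pd.json_normalize(payload, sep="_")  # same data, "_" as the separator

print(list(df.columns))
print(list(df_underscored.columns))
```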


# 🔹 Step 15: Machine Learning Integration

```python
df = pd.get_dummies(df, columns=['Category'])
```

### Code Practice
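A minimal sketch of one-hot encoding with `pd.get_dummies`, assuming a made-up `Category` column. Passing `dtype=int` is a choice made here to get 0/1 columns, which many machine learning libraries expect; without it, recent pandas versions produce boolean columns.

```python
import pandas as pd

# Hypothetical data with one categorical and one numeric column.
df = pd.DataFrame({"Category": ["A", "B", "A", "C"], "Value": [1, 2, 3, 4]})

encoded = pd.get_dummies(df, columns=["Category"], dtype=int)

print(encoded)
```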


# 🔹 Step 16: Real-World Projects & Case Studies

Analyze datasets for Sales, HR, and Healthcare.

### Code Practice


## 🔹 Additional Advanced Topics

### 1️⃣ MultiIndexing (Hierarchical Indexing)
- Creating multi-level indexes (`df.set_index([col1, col2])`)
- Accessing data with multiple index levels (`df.loc[(index1, index2)]`)
- Unstacking & Stacking Data (`df.unstack()`, `df.stack()`)
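The three MultiIndex operations listed above can be sketched on a small made-up table. The `Region`, `Channel`, and `Units` names are illustrative, and sorting the index is a precaution taken here so that label lookups stay fast and warning-free.

```python
import pandas as pd

# Hypothetical data with two key columns.
df = pd.DataFrame({
    "Region": ["Europe", "Europe", "Asia", "Asia"],
    "Channel": ["Online", "Offline", "Online", "Offline"],
    "Units": [10, 20, 30, 40],
})

indexed = df.set_index(["Region", "Channel"]).sort_index()   # two-level index
asia_online = indexed.loc[("Asia", "Online"), "Units"]       # select with a tuple of labels
wide = indexed["Units"].unstack()                            # inner level becomes columns
long_again = wide.stack()                                    # columns back into an index level

print(wide)
```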

### 2️⃣ Advanced String Operations
- Using `.str` methods (`df['column'].str.upper()`, `.str.strip()`, `.str.replace()`)
- Extracting patterns with regex (`df['column'].str.extract()`)
- Splitting strings into multiple columns (`df['column'].str.split()`)
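A short sketch of the string methods listed above, applied to three made-up names. The regular expression and the choice to split on the first space are assumptions for this example only.

```python
import pandas as pd

# Hypothetical messy text values.
df = pd.DataFrame({"column": ["  alice smith ", "BOB JONES", "chen-li wu"]})

cleaned = df["column"].str.strip().str.title()               # trim, then capitalize each word
no_hyphen = cleaned.str.replace("-", " ", regex=False)       # literal replacement, no regex
first_word = cleaned.str.extract(r"^(\w+)", expand=False)    # leading run of word characters
parts = cleaned.str.split(" ", n=1, expand=True)             # split once into two columns
parts.columns = ["first", "last"]

print(cleaned.tolist())
print(parts)
```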

### 3️⃣ Window Functions (Rolling & Expanding)
- Rolling Window Analysis (`df.rolling(window=3).mean()`)
- Expanding Window (`df.expanding().sum()`)
- Exponential Moving Average (`df.ewm(span=3).mean()`)
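The three window types differ in which past values they include. A minimal sketch on a five-value Series chosen so the results are easy to verify by hand:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

rolling_mean = s.rolling(window=3).mean()   # last 3 values; NaN until 3 are available
expanding_sum = s.expanding().sum()         # everything seen so far: a running total
ewm_mean = s.ewm(span=3).mean()             # all past values, with recent ones weighted more

print(pd.DataFrame({"rolling": rolling_mean, "expanding": expanding_sum, "ewm": ewm_mean}))
```

On a rising series like this one, the exponentially weighted mean ends above the 3-value rolling mean because it leans toward the newest observations.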

### 4️⃣ Time Zone Handling
- Localizing naive timestamps to a time zone (`df['date'].dt.tz_localize()`)
- Converting between time zones (`df['date'].dt.tz_convert()`)
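`tz_localize` attaches a time zone to timestamps that have none, while `tz_convert` re-expresses already zone-aware timestamps in another zone. A sketch with two illustrative timestamps, one in winter and one in summer, assuming the source data was recorded in UTC:

```python
import pandas as pd

# Naive timestamps (no zone attached); assumed here to represent UTC.
df = pd.DataFrame({"date": pd.to_datetime(["2024-01-15 12:00", "2024-07-15 12:00"])})

df["utc"] = df["date"].dt.tz_localize("UTC")                   # same clock time, now zone-aware
df["new_york"] = df["utc"].dt.tz_convert("America/New_York")   # same instants, different clock time

print(df[["utc", "new_york"]])
```

The converted hours differ between the two rows because New York observes daylight saving time in July.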

### 5️⃣ Custom Aggregation Functions
- Using `.agg()` with multiple functions
- Writing custom aggregation functions
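Built-in aggregations (given by name) and your own functions can be mixed in a single `.agg()` call. In the sketch below, `value_range` is a hypothetical helper written for this example, and the rows are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["Europe", "Europe", "Asia", "Asia"],
    "Units Sold": [100, 300, 200, 600],
})


def value_range(series):
    """Custom aggregation: spread between the largest and smallest value."""
    return series.max() - series.min()


# One output column per function; the custom function's name becomes its column label.
summary = df.groupby("Region")["Units Sold"].agg(["sum", "mean", value_range])

print(summary)
```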

### 6️⃣ Categorical Data Optimization
- Converting object columns to categorical (`df['col'] = df['col'].astype('category')`)
- Performance benefits of categorical data
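The memory benefit is largest when a column holds few distinct values repeated many times. A rough sketch that compares the two representations on 10,000 made-up values; exact byte counts depend on the pandas version and platform, so none are quoted here.

```python
import pandas as pd

# Two distinct labels repeated 5,000 times each, stored as Python objects.
s = pd.Series(["Online", "Offline"] * 5000, dtype="object")

as_category = s.astype("category")   # small integer codes plus a lookup table of labels

object_bytes = s.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(object_bytes, category_bytes)
print(list(as_category.cat.categories))
```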

### 7️⃣ Sparse Data Handling
- Working with sparse data (`pd.arrays.SparseArray()`)
- Memory optimization techniques
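Sparse storage keeps only the values that differ from a fill value. A minimal sketch, assuming a made-up Series in which 10 of 10,000 entries are non-zero:

```python
import pandas as pd

# Mostly zeros: 9,990 zeros followed by 10 ones.
dense = pd.Series([0.0] * 9990 + [1.0] * 10)

# Only entries that differ from fill_value are stored.
sparse = dense.astype(pd.SparseDtype("float64", fill_value=0.0))

print(sparse.sparse.density)                        # fraction of stored (non-fill) values
print(dense.memory_usage(), sparse.memory_usage())
```

The saving shrinks as the density rises, so this representation mainly pays off for data dominated by a single value.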

### 8️⃣ Pandas with JSON Data
- Reading and normalizing JSON (`pd.json_normalize()`)
- Flattening nested JSON structures
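When each record contains a nested list, `record_path` expands that list into rows and `meta` carries parent fields along. The order data below is invented for illustration.

```python
import pandas as pd

# Hypothetical orders, each with a nested list of items.
payload = [
    {"order_id": 1, "region": "Europe",
     "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]},
    {"order_id": 2, "region": "Asia",
     "items": [{"sku": "C", "qty": 5}]},
]

# One row per item, with the parent order's fields repeated on each row.
items = pd.json_normalize(payload, record_path="items", meta=["order_id", "region"])

print(items)
```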

### 9️⃣ Advanced SQL Integration
- Fetching large datasets efficiently (`chunksize` in `pd.read_sql()`)
- Using SQLAlchemy with pandas
- Query optimization for pandas & SQL
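With `chunksize` set, the SQL readers return an iterator of DataFrames instead of one large result. The sketch below uses an in-memory SQLite database as a stand-in for a production server; the `orders` table and the chunk size of 3 are assumptions for illustration.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database with a small hypothetical table.
conn = sqlite3.connect(":memory:")
pd.DataFrame(
    {"order_id": range(1, 8), "units": [5, 3, 8, 1, 9, 2, 4]}
).to_sql("orders", conn, index=False)

chunk_sizes = []
total_units = 0
# Each iteration yields a DataFrame of at most 3 rows.
for chunk in pd.read_sql_query("SELECT * FROM orders", conn, chunksize=3):
    chunk_sizes.append(len(chunk))
    total_units += int(chunk["units"].sum())
conn.close()

print(chunk_sizes, total_units)
```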

### 🔟 Pandas Profiling for Automated EDA
- Using `pandas-profiling` (since renamed `ydata-profiling`) for automatic data analysis
- Generating a summary report of dataset statistics

## 🚀 Expert Level: Advanced Pandas for Data Analysts

This section covers advanced data manipulation techniques for expert-level data analysis.

### 🔹 Advanced Data Cleaning & Preprocessing

- Handling missing values: `fillna()`, `interpolate()`, `dropna()`
- Detecting and removing duplicates: `drop_duplicates()`
- Handling outliers using IQR method
- Data type conversions with `astype()`

```python
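# Handling missing values: forward-fill from the previous row
# (fillna(method='ffill') is deprecated in recent pandas versions)
df.ffill()

# Detecting duplicates
df.duplicated().sum()

# Removing outliers with the IQR rule, using numeric columns only
num = df.select_dtypes(include='number')
Q1 = num.quantile(0.25)
Q3 = num.quantile(0.75)
IQR = Q3 - Q1
df = df[~((num < (Q1 - 1.5 * IQR)) | (num > (Q3 + 1.5 * IQR))).any(axis=1)]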
```
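To see the IQR rule act on concrete numbers, here is a small worked sketch with one deliberate outlier. The values are made up; for this sample the quartiles are 11.25 and 12.75, so the accepted range is 9.0 to 15.0.

```python
import pandas as pd

df = pd.DataFrame({
    "Item Type": ["Fruits", "Clothes", "Snacks", "Meat", "Cereal", "Baby Food"],
    "Units Sold": [10, 12, 11, 13, 12, 500],   # 500 is a deliberate outlier
})

numeric = df.select_dtypes(include="number")   # quantiles only make sense for numbers
q1 = numeric.quantile(0.25)
q3 = numeric.quantile(0.75)
iqr = q3 - q1

# A row is an outlier if any numeric value falls outside [q1 - 1.5*iqr, q3 + 1.5*iqr].
is_outlier = ((numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)).any(axis=1)
filtered = df[~is_outlier]

print(filtered)
```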


### 🔹 Performance Optimization

- Using the `category` dtype for memory efficiency
- Checking memory usage with `df.info(memory_usage='deep')` or `df.memory_usage(deep=True)`
- Downcasting numerical types to optimize RAM usage

```python
# Converting columns to category type
df['category_column'] = df['category_column'].astype('category')

# Downcasting numerical columns
df['int_column'] = pd.to_numeric(df['int_column'], downcast='integer')
df['float_column'] = pd.to_numeric(df['float_column'], downcast='float')
```
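The effect of downcasting can be checked by comparing dtypes and byte counts before and after. A sketch on a tiny made-up frame; `pd.to_numeric` picks the smallest type that can hold the values, so the outcome depends on the data.

```python
import pandas as pd

df = pd.DataFrame({"int_column": [1, 200, 30000], "float_column": [0.5, 1.5, 2.5]})

bytes_before = df.memory_usage(index=False).sum()

df["int_column"] = pd.to_numeric(df["int_column"], downcast="integer")   # smallest fitting integer type
df["float_column"] = pd.to_numeric(df["float_column"], downcast="float") # float32 where values allow

bytes_after = df.memory_usage(index=False).sum()

print(df.dtypes)
print(bytes_before, bytes_after)
```

Downcast floats lose precision beyond roughly seven significant digits, so this is best kept for columns where that is acceptable.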


### 🔹 Time Series Analysis

- Converting columns to `datetime`
- Extracting date parts (year, month, day)
- Resampling time-series data

```python
# Convert column to datetime
df['date_column'] = pd.to_datetime(df['date_column'])

# Extract year, month, day
df['year'] = df['date_column'].dt.year
df['month'] = df['date_column'].dt.month

# Resampling data
df.resample('M', on='date_column').sum()   # 'M' = month end; pandas 2.2+ renames this alias to 'ME'
```
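A small worked sketch of monthly resampling on made-up data. It uses the `'MS'` (month start) alias instead of `'M'`, a swap made here because the month-end alias was renamed to `'ME'` in pandas 2.2 while `'MS'` behaves the same across versions; the only difference is that each group is labeled by the first day of its month.

```python
import pandas as pd

# Hypothetical daily records spread over three months.
df = pd.DataFrame({
    "date_column": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-10", "2024-03-01"]),
    "units": [10, 20, 30, 40],
})

# Group rows by calendar month and total the units in each group.
monthly = df.resample("MS", on="date_column")["units"].sum()

print(monthly)
```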


### 🔹 Merging, Joining & Concatenation

- `merge()` for SQL-style joins
- Concatenating multiple DataFrames
- `join()` for combining index-based datasets

```python
# Merging two DataFrames
merged_df = df1.merge(df2, on='common_column', how='inner')

# Concatenating multiple DataFrames
concat_df = pd.concat([df1, df2], axis=0)
```
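Concatenation stacks tables, while `join()` aligns them on the index. A sketch that combines both on made-up data; the `salaries` table, indexed by `ID`, is a hypothetical addition for this example.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2], "Name": ["Alice", "Bob"]})
df2 = pd.DataFrame({"ID": [3, 4], "Name": ["Chen", "Dara"]})
salaries = pd.DataFrame({"Salary": [50000, 70000]}, index=pd.Index([1, 3], name="ID"))

# Stack the rows of df2 under df1 and renumber the index from 0.
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

# join() matches on the index, so ID is moved into the index first.
joined = stacked.set_index("ID").join(salaries, how="left")

print(joined)
```

Without `ignore_index=True`, the stacked result would keep the original labels 0, 1, 0, 1, which makes later label-based lookups ambiguous.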
