## 🐼 PANDAS COMPLETE COURSE (2025 EDITION)


## 1️⃣ Introduction to Pandas
### 🐍 What is Pandas?

Pandas (the name derives from "panel data" and doubles as shorthand for "Python data analysis") is an open-source Python library built on top of NumPy, designed for:

- Data manipulation (cleaning, transforming, reshaping)
- Data analysis (aggregating, grouping, summarizing)
- Data visualization (basic charts and exploration through its Matplotlib-backed `plot` methods)
- Integration with many formats: CSV, Excel, SQL, JSON, etc.

It provides fast, flexible, and expressive data structures to work with labeled and relational data (like a spreadsheet or SQL table).

👉 In short:

Pandas = NumPy + SQL + Excel (combined, but in Python syntax)

### 🧠 Why Use Pandas?
🔹 Before Pandas:

Working with raw Python lists, dictionaries, or NumPy arrays for data analysis was hard because:

- You needed explicit loops even for simple operations.
- Data often came from CSV files or databases and was tedious to clean by hand.
- NumPy arrays don't store labels (column names or row indices).

🔹 With Pandas:

- Data is organized like a table with rows and columns.
- You can easily filter, aggregate, merge, clean, and transform data.
- Performance-critical internals are written in C and Cython on top of NumPy, so vectorized operations avoid slow Python-level loops.
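To make the contrast concrete, here is a minimal sketch that totals sales per city twice: once with a plain Python loop and once with pandas. The `records` data and the variable names are invented for this illustration.

```python
import pandas as pd

# Hypothetical sample records, invented for this illustration.
records = [
    {"city": "Delhi", "sales": 120},
    {"city": "Mumbai", "sales": 80},
    {"city": "Delhi", "sales": 60},
]

# Plain Python: total sales per city needs an explicit loop.
totals_loop = {}
for row in records:
    totals_loop[row["city"]] = totals_loop.get(row["city"], 0) + row["sales"]

# pandas: the same aggregation as a single labeled, table-style operation.
df = pd.DataFrame(records)
totals_pandas = df.groupby("city")["sales"].sum()

print(totals_loop)
print(totals_pandas.to_dict())
```

Both approaches produce the same totals; the pandas version keeps the column labels and scales to more columns without extra loops.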

| Feature               | Python Lists | NumPy Arrays | Pandas                          |
| --------------------- | ------------ | ------------ | ------------------------------- |
| Labeled Data          | ❌            | ❌            | ✅                               |
| Heterogeneous Data    | ✅            | ❌            | ✅                               |
| Missing Data Handling | ❌            | ⚠️ (float `NaN` only) | ✅                        |
| SQL-like Operations   | ❌            | ❌            | ✅                               |
| File I/O Support      | ❌            | ✅ (limited)  | ✅ (CSV, Excel, JSON, SQL, etc.) |


### 🧩 Pandas vs NumPy

| **Aspect**    | **NumPy**                               | **Pandas**                                     |
| ------------- | --------------------------------------- | ---------------------------------------------- |
| Structure     | Homogeneous (same data type)            | Heterogeneous (different types)                |
| Main Object   | ndarray                                 | Series, DataFrame                              |
| Label Support | Integer positions only                  | Custom row/column labels                       |
| Missing Data  | Limited handling (NaN for float only)   | Full support with `NaN`, `isna()`, `fillna()`  |
| Functionality | Mathematical operations                 | Data analysis, manipulation, grouping, joining |
| Performance   | Slightly faster for numeric computation | Slightly slower (adds abstraction)             |
| Use Case      | Scientific computing                    | Data analysis / manipulation                   |


###### 👉 Pandas actually uses NumPy under the hood, so you often use both together.
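As a rough illustration of the two libraries working together, the sketch below builds a small Series containing one missing value, pulls out its underlying NumPy array, and uses the `isna()` and `fillna()` methods named in the table. The values and labels are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
s = pd.Series([1.5, np.nan, 3.0], index=["a", "b", "c"])

values = s.to_numpy()    # the underlying data as a NumPy ndarray
missing = s.isna()       # boolean mask marking the missing entries
filled = s.fillna(0.0)   # copy with missing entries replaced by 0.0

print(type(values))
print(missing.tolist())
print(filled.tolist())
```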

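## Install Pandas

The leading `!` runs a shell command from inside a notebook cell; in a terminal, run `pip install pandas` without it.

In [5]:
!pip install pandas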
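One simple way to check that the installation worked is to import the library and print its version. The variable name `version` is just for this sketch, and the exact version string depends on your environment.

```python
import pandas as pd

# If the import succeeds, pandas is installed in the active environment.
version = pd.__version__
print("pandas version:", version)
```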



## 🧱 Core Data Structures in Pandas
Pandas introduces two primary data structures:


### 1️⃣ Series: 1D Labeled Array

A Series is a one-dimensional array that holds data together with its labels (the index).

In [10]:
import pandas as pd

# Default index: integer labels 0 through 5
s = pd.Series([1, 2, 3, 4, 5, 6])
print(s)

# Custom index: one string label per value
s_index = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
print(s_index)

0    1
1    2
2    3
3    4
4    5
5    6
dtype: int64
a    1
b    2
c    3
dtype: int64
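Once a Series has labels, values can be looked up by label or by position. The sketch below reuses the `s_index` Series from the cell above and shows the standard `.loc` (label-based) and `.iloc` (position-based) accessors; the result variable names are chosen only for this example.

```python
import pandas as pd

s_index = pd.Series([1, 2, 3], index=["a", "b", "c"])

by_label = s_index.loc["b"]        # label-based lookup
by_position = s_index.iloc[0]      # position-based lookup (first element)
subset = s_index.loc[["a", "c"]]   # several labels at once returns a Series

print(by_label, by_position)
print(subset.tolist())
```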


###### 🔹 Key Points:

- Each element has an index label.
- A Series works like a single column in Excel, or like a NumPy array with labels attached.

Vectorized operations are supported, so arithmetic applies to every element without a loop:

In [13]:
print(s * 2)   # multiplies each element
print(s + 5)   # adds 5 to each element


0     2
1     4
2     6
3     8
4    10
5    12
dtype: int64
0     6
1     7
2     8
3     9
4    10
5    11
dtype: int64
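Vectorization covers comparisons as well as arithmetic, and arithmetic between two Series matches values by label rather than by position. The following sketch illustrates both behaviors; the Series `a` and `b` and their labels are invented for the example.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6])

# A vectorized comparison yields a boolean mask, which then filters the Series.
evens = s[s % 2 == 0]

# Arithmetic between two Series aligns on index labels.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])
total = a + b  # labels present on only one side produce NaN

print(evens.tolist())
print(total)
```

Only the labels `y` and `z` appear in both Series, so only those two positions hold numeric sums; `x` and `w` come out as `NaN`.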


### 2️⃣ DataFrame: 2D Labeled Table

A tabular data structure with rows and columns, like an Excel sheet or SQL table.

In [16]:
import pandas as pd

# Each dictionary key becomes a column; every list must have the same length.
data = {
    'name': ["vineeth", "jagan", "jagadeesh"],
    'languages': ['python', 'c', 'java']
}

df = pd.DataFrame(data)


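⚠️ `ValueError: All arrays must be of the same length` is raised only when the lists in the dictionary differ in length. Both lists above contain three items, so this DataFrame is created without error.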
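The length rule for dictionary input can be checked with a short experiment. This sketch builds one DataFrame from equal-length lists and then deliberately drops a value from one list to trigger the `ValueError`; the names `good` and `error_message` are chosen only for this illustration.

```python
import pandas as pd

# Equal-length lists: each key becomes a column, each position a row.
good = pd.DataFrame({
    "name": ["vineeth", "jagan", "jagadeesh"],
    "languages": ["python", "c", "java"],
})
print(good.shape)           # (rows, columns)
print(list(good.columns))

# Unequal-length lists: pandas cannot line the rows up and raises ValueError.
error_message = None
try:
    pd.DataFrame({
        "name": ["vineeth", "jagan", "jagadeesh"],
        "languages": ["python", "c"],   # deliberately one value short
    })
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)
```

If a source really has columns of different lengths, one option is to pad the shorter list with `None` before building the DataFrame, so the gap shows up as a missing value.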