# 🔢 NumPy: Numerical Python

**Definition:** NumPy is the core library for numerical and scientific computing in Python.  
It introduces the `ndarray` object (N-dimensional array), which stores elements of a single data type compactly and is therefore faster and more memory-efficient than Python lists for numerical work.

### Why NumPy?
- Efficient storage and fast operations on arrays of numbers.
- Powerful tools for linear algebra, Fourier transforms, and random number generation.
- Supports broadcasting, which lets arrays of compatible shapes be combined without explicit Python loops.
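As a rough illustration of the points above, the sketch below doubles a few numbers first with a plain Python list and then with an `ndarray`. The `prices` values are made up for this example.

```python
import numpy as np

# Hypothetical sample values, used only for illustration
prices = [10.0, 20.0, 30.0]

# Pure-Python approach: visit every element in an explicit loop
doubled_list = [p * 2 for p in prices]

# NumPy approach: one expression, applied element-wise in compiled code
doubled_arr = np.array(prices) * 2

print("List result: ", doubled_list)
print("Array result:", doubled_arr)
```

Both give the same numbers; the array version avoids the Python-level loop, which is where most of the speed difference on large inputs comes from.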

💡 **Real-world Use Cases:**
- ML preprocessing (normalize and scale features)
- Simulations in physics or finance
- Image manipulation (arrays of pixel values)


In [None]:
import numpy as np

### Example: Creating Arrays
Here we create 1D and 2D arrays using NumPy. Arrays are the foundation for all numerical operations.

In [None]:
# Basic array creation
arr = np.array([1,2,3,4,5])
print("1D Array:", arr)

matrix = np.array([[1,2,3],[4,5,6],[7,8,9]])
print("\n2D Matrix:\n", matrix)

print("\nShape:", matrix.shape)
print("Size:", matrix.size)
print("Datatype:", matrix.dtype)
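Besides `np.array()`, NumPy has constructor helpers that build arrays without a Python list. A minimal sketch of a few common ones (the variable names are only illustrative):

```python
import numpy as np

zeros = np.zeros((2, 3))                         # 2x3 array filled with 0.0
steps = np.arange(0, 10, 2)                      # start, stop (exclusive), step
grid = np.linspace(0.0, 1.0, 5)                  # 5 evenly spaced points, both ends included
floats = np.array([1, 2, 3], dtype=np.float64)   # force a specific dtype

print("zeros:\n", zeros)
print("steps:", steps)
print("grid:", grid)
print("floats dtype:", floats.dtype)
```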

### Example: Indexing & Masks
Boolean masks filter elements (like even numbers), while fancy indexing selects specific indices.

In [None]:
# Indexing, slicing, fancy indexing, boolean masks
arr = np.arange(10)
print("Original:", arr)
print("Even numbers:", arr[arr % 2 == 0])
indices = [1, 3, 5, 7]
print("Picked elements:", arr[indices])
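Slicing is not shown in the cell above, so here is a short sketch of it. It also shows one difference that often surprises newcomers: a basic slice is a view on the original data, while fancy indexing returns a copy.

```python
import numpy as np

arr = np.arange(10)

# Basic slicing uses start:stop:step, and stop is exclusive
print("First three:", arr[:3])
print("Every other:", arr[::2])
print("Reversed:", arr[::-1])

# A slice is a view, so writing through it changes the original array
view = arr[2:5]
view[0] = 99
print("arr after editing the view:", arr)

# Fancy indexing returns a copy, so the original stays untouched
picked = arr[[0, 1]]
picked[0] = -1
print("arr[0] after editing the copy:", arr[0])
```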

### Example: Reshaping, Transpose, Stacking, Splitting
Reshape changes dimensions, transpose flips rows/cols, stacking combines arrays, splitting divides them.

In [None]:
# Reshaping, transpose, stacking, splitting
arr = np.arange(1,13).reshape(3,4)
print("Matrix:\n", arr)
print("Transpose:\n", arr.T)

a = np.array([1,2,3])
b = np.array([4,5,6])
print("Horizontal Stack:", np.hstack((a,b)))
print("Vertical Stack:\n", np.vstack((a,b)))

split1, split2 = np.split(np.arange(1,11), 2)
print("First half:", split1)
print("Second half:", split2)
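Two related conveniences, sketched under the same setup: passing `-1` to `reshape` lets NumPy infer one dimension, and `np.array_split` handles lengths that do not divide evenly (where `np.split` would raise an error).

```python
import numpy as np

flat = np.arange(12)

# -1 asks NumPy to infer that dimension from the total size (12 / 3 = 4)
grid = flat.reshape(3, -1)
print("Inferred shape:", grid.shape)

# 10 elements cannot be split into 3 equal parts; array_split allows a remainder
parts = np.array_split(np.arange(10), 3)
print("Part sizes:", [len(p) for p in parts])
```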

### Example: Mathematical Functions
NumPy provides fast vectorized math functions such as sin, cos, exp. Useful for scientific and engineering computations.

In [None]:
# Mathematical functions
x = np.linspace(0, 2*np.pi, 5)
print("x:", x)
print("sin(x):", np.sin(x))
print("cos(x):", np.cos(x))
print("exp(x):", np.exp(x))

### Example: Sorting & Searching
Sort values, find max index, or locate elements using `np.where()`.

In [None]:
# Sorting & Searching
arr = np.array([10, 2, 8, 4, 7])
print("Original:", arr)
print("Sorted:", np.sort(arr))
print("Index of max:", np.argmax(arr))
print("Where arr > 5:", np.where(arr > 5))
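The output of `np.where(arr > 5)` above is a tuple of index arrays, which can be confusing at first. The sketch below unpacks it, and also shows the three-argument form of `np.where` and `np.argsort`, two closely related tools.

```python
import numpy as np

arr = np.array([10, 2, 8, 4, 7])

# With only a condition, np.where returns a tuple with one index array per axis
(idx,) = np.where(arr > 5)
print("Indices where arr > 5:", idx)
print("Values at those indices:", arr[idx])

# With three arguments, it picks element-wise between two alternatives
capped = np.where(arr > 5, 5, arr)
print("Capped at 5:", capped)

# argsort returns the indices that would sort the array
order = np.argsort(arr)
print("Sort order:", order)
```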

### Example: Broadcasting
Apply operations across an array without writing loops. Here every salary is multiplied by 0.9 to deduct a 10% tax.

In [None]:
# Broadcasting & vectorized operations
salaries = np.array([1000,2000,3000,4000])
print("After 10% tax:", salaries * 0.9)
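The scalar case above is the simplest form of broadcasting. The same rule also applies between arrays of different but compatible shapes. A minimal sketch, using made-up pay data:

```python
import numpy as np

# Hypothetical data: one row per employee, columns are base pay and bonus
pay = np.array([[1000, 100],
                [2000, 200],
                [3000, 300]])

# One rate per column, shape (2,)
keep_rate = np.array([0.9, 0.5])

# (3, 2) * (2,): keep_rate is stretched across all three rows
net = pay * keep_rate
print("Net pay:\n", net)

# (3, 2) - (2,): subtract each column's mean from that column
centered = pay - pay.mean(axis=0)
print("Centered:\n", centered)
```

Shapes are compared from the trailing dimension backwards; two dimensions are compatible when they are equal or one of them is 1.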

### Example: Linear Algebra
Perform matrix multiplication, essential for ML algorithms and simulations.

In [None]:
# Linear Algebra
A = np.array([[1,2],[3,4]])
B = np.array([[5,6],[7,8]])
print("Matrix Multiplication:\n", np.dot(A,B))
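For 2D arrays, the `@` operator gives the same result as `np.dot` and is often easier to read. The sketch below also contrasts it with element-wise `*` and solves a small linear system with `np.linalg.solve`; the vector `b` is an arbitrary example.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# @ is the matrix-multiplication operator
print("A @ B:\n", A @ B)

# * multiplies element by element, which is a different operation
print("A * B:\n", A * B)

# Solve the system A x = b for x
b = np.array([5, 11])
x = np.linalg.solve(A, b)
print("Solution x:", x)
```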

### Example: Random Numbers & Statistics
Generate random numbers for simulations or ML initialization, and compute basic stats like mean and std.

In [None]:
# Random numbers & statistics
rand_matrix = np.random.randint(1,100,(3,3))
print("Random Matrix:\n", rand_matrix)

data = np.random.normal(50, 10, 1000)
print("Mean:", data.mean(), " Std Dev:", data.std())
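The calls above give different numbers on every run. When results need to be reproducible, one option is NumPy's `Generator` API with an explicit seed. A sketch, with the seed value 42 chosen arbitrarily:

```python
import numpy as np

# A seeded generator produces the same sequence on every run
rng = np.random.default_rng(seed=42)
rand_matrix = rng.integers(1, 100, size=(3, 3))   # upper bound is exclusive
data = rng.normal(loc=50, scale=10, size=1000)

print("Random Matrix:\n", rand_matrix)
print("Mean:", data.mean(), " Std Dev:", data.std())

# A second generator with the same seed repeats the first draw exactly
rng_again = np.random.default_rng(seed=42)
same_matrix = rng_again.integers(1, 100, size=(3, 3))
print("Reproducible:", np.array_equal(rand_matrix, same_matrix))
```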

# 🐼 Pandas: Data Manipulation

**Definition:** Pandas provides high-performance data structures and data analysis tools.

### Core Data Structures
- **Series**: 1D labeled array (like a column in Excel)
- **DataFrame**: 2D table of rows and columns

💡 **Real-world Use Cases:**
- Business analytics (sales, HR, finance)
- Data cleaning before ML model training
- Handling time-series data


### Example: Creating a Pandas DataFrame
We build an employee dataset for analysis.

In [None]:
import pandas as pd

# Sample Employee dataset
data = {
    'ID': [1,2,3,4,5,6,7,8],
    'Name': ['Alice','Bob','Charlie','David','Eva','Frank','Grace','Helen'],
    'Age': [25,30,35,40,29,50,28,32],
    'Department': ['HR','IT','Finance','IT','HR','Finance','IT','Marketing'],
    'Salary': [50000,60000,70000,80000,None,95000,62000,72000],
    'JoiningDate': ['2019-03-01','2018-07-15','2020-01-12','2017-06-23','2019-11-05','2015-05-30','2021-01-10','2018-09-25'],
    'Bonus': [5000,None,7000,8000,4500,10000,None,6000]
}
df = pd.DataFrame(data)
df['JoiningDate'] = pd.to_datetime(df['JoiningDate'])
df

### Example: Inspecting Data
`info()` shows structure and nulls, `describe()` provides summary statistics.

In [None]:
# Inspecting
df.info()  # prints its report directly and returns None, so no print() is needed
print(df.describe())

### Example: Handling Missing Data
Fill missing values with averages or drop rows with missing entries.

In [None]:
# Handling missing values
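# Assign the result back: fillna(inplace=True) on a selected column is unreliable in recent pandas
df['Salary'] = df['Salary'].fillna(df['Salary'].mean())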
df = df.dropna(subset=['Bonus'])
df
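Before filling or dropping anything, it usually helps to count the missing values per column. The sketch below does that on a small made-up frame (named `toy` so it does not overwrite `df`) and uses the median instead of the mean as one possible alternative fill value.

```python
import pandas as pd

# Hypothetical miniature dataset, independent of the employee DataFrame
toy = pd.DataFrame({
    'Salary': [50000, None, 70000],
    'Bonus': [5000, 4500, None],
})

# Count missing values per column
print(toy.isna().sum())

# Fill Salary with the median, returning a new frame instead of modifying in place
filled = toy.assign(Salary=toy['Salary'].fillna(toy['Salary'].median()))

# Drop rows that still lack a Bonus
cleaned = filled.dropna(subset=['Bonus'])
print(cleaned)
```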

### Example: Selecting & Filtering
Select specific columns or rows based on conditions.

In [None]:
# Selecting & filtering
print(df[['Name','Department','Salary']])
print(df[(df['Department']=='IT') & (df['Salary']>65000)])
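The same kind of filter can be written in a few other ways. The sketch below uses a small stand-in frame called `staff` (an illustrative name) to show `.loc`, `query()`, and `isin()`.

```python
import pandas as pd

# Hypothetical stand-in for the employee data
staff = pd.DataFrame({
    'Name': ['Bob', 'David', 'Grace', 'Alice'],
    'Department': ['IT', 'IT', 'IT', 'HR'],
    'Salary': [60000, 80000, 62000, 50000],
})

# .loc takes a row condition and a column list in one step
mask = (staff['Department'] == 'IT') & (staff['Salary'] > 65000)
high_paid_it = staff.loc[mask, ['Name', 'Salary']]
print(high_paid_it)

# query() expresses the same condition as a string
same_rows = staff.query("Department == 'IT' and Salary > 65000")
print(same_rows)

# isin() matches against several allowed values
hr_or_it = staff[staff['Department'].isin(['HR', 'IT'])]
print(hr_or_it)
```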

### Example: Sorting
Arrange rows by Salary, here in descending order. `sort_values()` returns a new DataFrame and leaves `df` unchanged.

In [None]:
# Sorting
df.sort_values(by='Salary', ascending=False)

### Example: Apply Functions
Use `.apply()` with `lambda` to compute new columns, e.g., Bonus as % of Salary.

In [None]:
# Apply functions
df['BonusPercent'] = df.apply(lambda row: (row['Bonus']/row['Salary'])*100, axis=1)
df[['Name','Salary','Bonus','BonusPercent']]
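For simple arithmetic like this, plain column operations give the same result as a row-wise `.apply()` and are typically much faster on large frames. A sketch on a two-row stand-in frame (`staff` is an illustrative name):

```python
import pandas as pd

# Hypothetical stand-in data
staff = pd.DataFrame({'Salary': [50000, 80000], 'Bonus': [5000, 8000]})

# Row-wise apply: calls the lambda once per row
by_apply = staff.apply(lambda row: (row['Bonus'] / row['Salary']) * 100, axis=1)

# Vectorized: operates on whole columns at once
vectorized = (staff['Bonus'] / staff['Salary']) * 100

print(by_apply)
print(vectorized)
```

`.apply()` remains useful when the per-row logic cannot be expressed with column operations.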

### Example: String Operations
Handle text data with functions like uppercase and substring checks.

In [None]:
# String operations
df['NameUpper'] = df['Name'].str.upper()
df['IsIT'] = df['Department'].str.contains('IT')
df[['Name','NameUpper','IsIT']]

### Example: Date/Time Handling
Extract parts of datetime like year and month.

In [None]:
# Date/time handling
df['JoinYear'] = df['JoiningDate'].dt.year
df['JoinMonth'] = df['JoiningDate'].dt.month
df[['Name','JoiningDate','JoinYear','JoinMonth']]

### Example: Grouping & Aggregation
Summarize data per category, e.g., average Salary per Department.

In [None]:
# Grouping & aggregation
print(df.groupby('Department')['Salary'].mean())
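`groupby` can also compute several statistics in one pass by handing a list of function names to `.agg()`. A minimal sketch on made-up data:

```python
import pandas as pd

# Hypothetical stand-in data
staff = pd.DataFrame({
    'Department': ['HR', 'IT', 'IT', 'Finance'],
    'Salary': [50000, 60000, 80000, 70000],
})

# One row per department, one column per statistic
summary = staff.groupby('Department')['Salary'].agg(['mean', 'max', 'count'])
print(summary)
```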

### Example: Pivot Tables
Reshape and summarize data, similar to Excel Pivot Tables.

In [None]:
# Pivot tables
pd.pivot_table(df, values='Salary', index='Department', aggfunc='mean')

### Example: Joins & Merging
Combine two DataFrames on a shared key column. A left join keeps every employee row and adds the department head where a match exists.

In [None]:
# Joins
dept_extra = pd.DataFrame({
    'Department': ['HR','IT','Finance','Marketing','Legal'],
    'Head': ['Anna','John','Sam','Rose','Tom']
})
pd.merge(df, dept_extra, on='Department', how='left')
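The `how` argument decides which rows survive the merge. The sketch below compares left, inner, and outer joins on two tiny made-up frames, and uses `indicator=True` to show where each row of the outer join came from.

```python
import pandas as pd

# Hypothetical frames: 'Sales' has no head, and 'Legal' has no employees
employees = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Department': ['HR', 'Sales']})
heads = pd.DataFrame({'Department': ['HR', 'Legal'], 'Head': ['Anna', 'Tom']})

# Left join: every employee is kept; unmatched heads become NaN
left = pd.merge(employees, heads, on='Department', how='left')
print(left)

# Inner join: only departments present in both frames
inner = pd.merge(employees, heads, on='Department', how='inner')
print(inner)

# Outer join: all rows from both sides; _merge records the origin of each row
outer = pd.merge(employees, heads, on='Department', how='outer', indicator=True)
print(outer)
```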

### Example: File I/O
Save DataFrames to CSV and load them back.

In [None]:
# File I/O
df.to_csv("employees.csv", index=False)
df_loaded = pd.read_csv("employees.csv")
df_loaded.head()
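One thing to watch for in a CSV round trip: CSV stores dates as text, so a datetime column comes back as plain strings unless `parse_dates` is passed to `read_csv`. The sketch below uses an in-memory buffer instead of a file on disk, purely to keep the example self-contained.

```python
import io
import pandas as pd

# Hypothetical one-row frame with a datetime column
original = pd.DataFrame({
    'Name': ['Alice'],
    'JoiningDate': pd.to_datetime(['2019-03-01']),
})

# Write to an in-memory text buffer (stands in for a file path)
buffer = io.StringIO()
original.to_csv(buffer, index=False)

# Default read: the date column is loaded as text
buffer.seek(0)
plain = pd.read_csv(buffer)
print("Without parse_dates:", plain['JoiningDate'].dtype)

# With parse_dates: the column is converted back to datetime
buffer.seek(0)
parsed = pd.read_csv(buffer, parse_dates=['JoiningDate'])
print("With parse_dates:", parsed['JoiningDate'].dtype)
```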

# 🎨 Matplotlib: Visualization

**Definition:** Matplotlib is the base library for creating static, animated, and interactive visualizations.

### Features
- Highly customizable
- Supports line, bar, scatter, histogram
- Works with Pandas and NumPy

💡 **Use Cases:** Sales dashboards, performance monitoring, scientific visualization.


### Example: Line Plot
Used to show trends over categories (e.g., employee salaries).

In [None]:
import matplotlib.pyplot as plt

# Line plot
plt.plot(df['Name'], df['Salary'], marker='o', color='blue')
plt.title("Employee Salaries")
plt.xlabel("Employee")
plt.ylabel("Salary")
plt.xticks(rotation=45)
plt.show()

### Example: Histogram
Shows distribution of continuous variables (e.g., Age distribution).

In [None]:
# Histogram
plt.hist(df['Age'], bins=5, color='orange')
plt.title("Age Distribution")
plt.xlabel("Age")
plt.ylabel("Frequency")
plt.show()

### Example: Styled Plot
Customize lines, markers, colors, and add grid and legend.

In [None]:
# Styled plots
x = [1,2,3,4,5]
y = [10,20,25,30,40]
plt.plot(x, y, 'r--', marker='o', label='Trend')
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Styled Line Plot")
plt.legend()
plt.grid(True)
plt.show()

### Example: Subplots & Pie Chart
Display multiple plots in one figure. Pie charts show proportion of categories.

In [None]:
# Multiple subplots & pie chart
plt.figure(figsize=(10,5))

plt.subplot(1,2,1)
plt.scatter(df['Age'], df['Salary'], color='blue')
plt.title("Age vs Salary")

plt.subplot(1,2,2)
dept_counts = df['Department'].value_counts()  # compute once, reuse for sizes and labels
plt.pie(dept_counts, labels=dept_counts.index, autopct='%1.1f%%')
plt.title("Employees by Department")

plt.tight_layout()
plt.show()
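The same kind of two-panel figure can also be built with Matplotlib's object-oriented interface, where `plt.subplots()` returns the figure and its axes explicitly. A sketch with made-up numbers (it does not use `df`):

```python
import matplotlib.pyplot as plt

# Hypothetical sample values, used only for illustration
ages = [25, 30, 35, 40]
salaries = [50000, 60000, 70000, 80000]

# One row, two columns of axes
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))

ax_left.scatter(ages, salaries, color='blue')
ax_left.set_title("Age vs Salary")
ax_left.set_xlabel("Age")
ax_left.set_ylabel("Salary")

ax_right.bar(['HR', 'IT'], [2, 3])
ax_right.set_title("Employees by Department")

fig.tight_layout()
plt.show()
```

Working with named axes objects tends to be clearer than `plt.subplot()` once a figure has more than a couple of panels.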

### Example: Saving Figures
Export plots to image files for reports.

In [None]:
# Saving figures
plt.bar(df['Name'], df['Salary'])
plt.title("Employee Salaries")
plt.xticks(rotation=45)
plt.savefig("employee_salaries.png", dpi=150, bbox_inches="tight")  # "tight" helps keep rotated labels from being clipped
plt.show()

# 🌈 Seaborn: Statistical Visualization

**Definition:** Seaborn is built on Matplotlib and provides a high-level interface for attractive statistical graphics.

### Features
- Easy integration with Pandas
- Specialized plots for distributions and categories
- Beautiful styles by default

💡 **Use Cases:** Correlation heatmaps, EDA, categorical comparisons.


### Example: Distribution Plot with KDE
A histogram shows how values are distributed; setting `kde=True` overlays a smooth kernel density estimate.

In [None]:
import seaborn as sns

# Distribution plots
sns.histplot(df['Salary'], kde=True, bins=6)
plt.title("Salary Distribution with KDE")
plt.show()

### Example: Scatterplot
Shows the relationship between two variables (e.g., Age vs Salary), with points colored by Department and sized by Bonus.

In [None]:
# Scatterplot
sns.scatterplot(x='Age', y='Salary', hue='Department', size='Bonus', data=df)
plt.title("Age vs Salary by Department")
plt.show()

### Example: Boxplot & Violin Plot
A boxplot shows the median, quartiles, and outliers per category; a violin plot adds the estimated density of the distribution.

In [None]:
# Boxplot & Violin plot
plt.figure(figsize=(12,5))

plt.subplot(1,2,1)
sns.boxplot(x='Department', y='Salary', data=df)
plt.title("Boxplot: Salary by Department")

plt.subplot(1,2,2)
sns.violinplot(x='Department', y='Salary', data=df, palette="muted")
plt.title("Violinplot: Salary by Department")

plt.tight_layout()
plt.show()

### Example: Countplot & Barplot
A countplot shows the frequency of categorical values (e.g., number of employees per Department); a barplot shows an aggregate such as the mean Salary per Department.

In [None]:
# Countplot & Barplot
sns.countplot(x='Department', data=df)
plt.title("Department Count")
plt.show()

sns.barplot(x='Department', y='Salary', data=df, estimator=np.mean, errorbar=None)  # errorbar=None replaces the deprecated ci=None (seaborn >= 0.12)
plt.title("Average Salary by Department")
plt.show()

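### Example: Heatmap & Pairplot
A heatmap visualizes correlations between numeric columns; a pairplot draws pairwise scatterplots with each variable's distribution on the diagonal.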

In [None]:
# Heatmap & Pairplot
plt.figure(figsize=(8,6))
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="coolwarm")
plt.title("Correlation Heatmap")
plt.show()

sns.pairplot(df[['Age','Salary','Bonus']], diag_kind='kde')  # only columns that exist in df
plt.show()

### Example: Lineplot & FacetGrid
A lineplot traces Salary against Age; `FacetGrid` draws one Age histogram per Department.

In [None]:
# Lineplot & FacetGrid
sns.lineplot(x='Age', y='Salary', data=df, marker='o')
plt.title("Lineplot: Age vs Salary")
plt.show()

g = sns.FacetGrid(df, col="Department")
g.map(sns.histplot, "Age")
plt.show()

### Example: Jointplot
Combines scatterplot with histograms for joint distributions.

In [None]:
# Jointplot
sns.jointplot(x='Age', y='Salary', data=df, kind='scatter')
plt.show()

# 🎯 Wrap-Up

- **NumPy** → Fast numerical operations (arrays, math, linear algebra, random)
- **Pandas** → Rich data manipulation (filtering, grouping, pivot, joins)
- **Matplotlib** → Customizable plots
- **Seaborn** → Beautiful statistical plots

👉 Together, these libraries form the **foundation of Data Science in Python**.

---
✅ Next Step: Try applying these techniques to your own dataset (CSV/Excel/SQL).