# Supervised Learning

## Introduction to Supervised Learning
- Definition of supervised learning
- Difference between supervised, unsupervised, and reinforcement learning
- Goal: predict a target variable from input features
- Variables: features (X) and target (y)
- Types of target:
  - Continuous → regression
  - Categorical → classification

## Fundamental Concepts
- Dataset and samples
- Training set vs. test set
- Overfitting and underfitting
- Bias and variance
- Model evaluation
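
To make the training/test distinction concrete, here is a minimal plain-Python sketch of a shuffled hold-out split. The function name `train_test_split`, the seeded shuffle, and the 25% default test fraction are illustrative assumptions. scikit-learn ships a function with the same name and a richer signature; this is not that implementation.

```python
import random

def train_test_split(X, y, test_size=0.25, seed=0):
    """Shuffle sample indices once, then hold out a `test_size` fraction as the test set."""
    if len(X) != len(y):
        raise ValueError("X and y must contain the same number of samples")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_test = max(1, round(len(X) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Toy data: y = 2x + 1
X = [[float(i)] for i in range(20)]
y = [2.0 * row[0] + 1.0 for row in X]
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(len(X_train), len(X_test))  # 15 5
```

The model is fit on the training part only. The test part is used once, at the end, to estimate performance on unseen data. A training score far above the test score is the usual symptom of overfitting.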

## Data Preprocessing
- Data cleaning
- Handling missing values
- Normalization and standardization
- Encoding categorical variables:
  - One-hot encoding
  - Label encoding
- Feature scaling
- Feature selection and feature engineering
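
A minimal plain-Python sketch of several preprocessing steps above, assuming numeric columns arrive as lists and categories as strings. All helper names are hypothetical. The statistics (mean, standard deviation, min, max, category list) are learned from training data and then reused unchanged on new data. This is what keeps test data from leaking into preprocessing.

```python
import statistics

def fit_standardizer(column):
    """Learn mean and population standard deviation from the training column only."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column) or 1.0  # avoid dividing by zero on a constant column
    return mean, std

def standardize(column, mean, std):
    """z = (x - mean) / std, using statistics learned on the training data."""
    return [(v - mean) / std for v in column]

def min_max_normalize(column, lo, hi):
    """Rescale to [0, 1] using the training min and max."""
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in column]

def one_hot(values, categories):
    """One 0/1 column per category, in a fixed order."""
    return [[1 if v == c else 0 for c in categories] for v in values]

def label_encode(values, categories):
    """Map each category to an integer code (this implies an order the model may exploit)."""
    codes = {c: i for i, c in enumerate(categories)}
    return [codes[v] for v in values]


train_ages = [20.0, 30.0, 40.0, 50.0]
mean, std = fit_standardizer(train_ages)
z_ages = standardize(train_ages, mean, std)
scaled_ages = min_max_normalize(train_ages, min(train_ages), max(train_ages))

colors = ["red", "green", "red", "blue"]
categories = sorted(set(colors))  # ['blue', 'green', 'red']
print(one_hot(colors, categories))       # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(label_encode(colors, categories))  # [2, 1, 2, 0]
```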

## Regression
### Linear Regression
- Formula and interpretation of coefficients
- Parameter estimation with OLS (Ordinary Least Squares)
- Assumptions of linear regression
- Evaluation metrics: MSE, RMSE, R²
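
With a single feature, OLS has a closed-form solution:

- slope = Σ(x - x_mean)(y - y_mean) / Σ(x - x_mean)²
- intercept = y_mean - slope · x_mean

The slope reads as the expected change in y per one-unit increase in x. The sketch below fits this model in plain Python on made-up toy data. It then computes MSE, RMSE, and R² = 1 - SS_res / SS_tot.

```python
import math

def fit_ols(x, y):
    """Closed-form OLS for one feature: slope = Sxy / Sxx, intercept = y_mean - slope * x_mean."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope

def regression_metrics(y_true, y_pred):
    """MSE, RMSE and R^2 = 1 - SS_res / SS_tot."""
    n = len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    y_mean = sum(y_true) / n
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    mse = ss_res / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "R2": 1 - ss_res / ss_tot}


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_ols(x, y)  # roughly 0.05 and 1.99
preds = [b0 + b1 * xi for xi in x]
print(regression_metrics(y, preds))
```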

### Polynomial Regression
- Using polynomials for non-linear models
- Overfitting and regularization

### Regularized Regression
- Ridge regression (L2)
- Lasso regression (L1)
- Elastic Net
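
For intuition about the two penalties, consider a single feature with an unpenalized intercept. Let Sxy and Sxx be the centered sums.

- **Ridge** minimizes Σ(y - b - w·x)² + λ·w², which gives w = Sxy / (Sxx + λ).
- **Lasso** minimizes Σ(y - b - w·x)² + λ·|w|, which soft-thresholds Sxy by λ/2. This sets w exactly to 0 once λ/2 ≥ |Sxy|.

The one-feature reduction and the toy data are assumptions made for illustration. With many features, Lasso and Elastic Net (which combines both penalties) have no closed form. They are usually fit iteratively, for example by coordinate descent.

```python
def _centered_sums(x, y):
    """Return x_mean, y_mean, Sxy = sum((x-x_mean)(y-y_mean)), Sxx = sum((x-x_mean)^2)."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    return x_mean, y_mean, sxy, sxx

def ridge_1d(x, y, lam):
    """Minimize sum((y - b - w*x)^2) + lam * w^2 (intercept b is not penalized)."""
    x_mean, y_mean, sxy, sxx = _centered_sums(x, y)
    w = sxy / (sxx + lam)
    return y_mean - w * x_mean, w

def lasso_1d(x, y, lam):
    """Minimize sum((y - b - w*x)^2) + lam * |w|: soft-threshold Sxy by lam / 2."""
    x_mean, y_mean, sxy, sxx = _centered_sums(x, y)
    shrunk = max(abs(sxy) - lam / 2, 0.0)
    w = (shrunk if sxy >= 0 else -shrunk) / sxx
    return y_mean - w * x_mean, w


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
for lam in (0.0, 10.0, 40.0):
    # lam = 0 reproduces OLS; larger lam shrinks the slope toward 0
    print(lam, ridge_1d(x, y, lam)[1], lasso_1d(x, y, lam)[1])
```

Ridge only shrinks the slope. Lasso can zero it out entirely, which is why L1 regularization also acts as feature selection.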

### Other Regression Models
- Support Vector Regression (SVR)
- Decision Tree Regressor
- Random Forest Regressor
- Gradient Boosting Regressor

## Classification
### Binary Classification
- Logistic Regression
- Interpretation of coefficients
- Sigmoid function
- Decision boundary
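
Logistic regression passes a linear score through the sigmoid σ(z) = 1 / (1 + e^(-z)) to obtain P(y = 1 | x).

- The decision boundary is where the score equals 0, i.e. where the probability is 0.5. With one feature this is x = -b0 / b1.
- exp(b1) is the factor by which the odds multiply per unit increase in x.

A minimal sketch, assuming one feature, plain batch gradient descent on the mean log-loss, and hand-picked toy data, learning rate, and epoch count:

```python
import math

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(x, y, lr=0.1, epochs=5000):
    """Batch gradient descent on the mean log-loss for P(y=1|x) = sigmoid(b0 + b1*x)."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = sigmoid(b0 + b1 * xi) - yi  # derivative of one sample's log-loss w.r.t. its score
            g0 += err
            g1 += err * xi
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

def predict_proba(xi, b0, b1):
    return sigmoid(b0 + b1 * xi)


# Toy data with overlapping classes, so the fitted coefficients stay finite
x = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
y = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(x, y)
boundary = -b0 / b1  # where b0 + b1*x = 0, i.e. P(y=1|x) = 0.5
odds_ratio = math.exp(b1)  # odds multiply by this per unit increase in x
print(round(boundary, 2), round(odds_ratio, 2))
```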

### Multiclass Classification
- One-vs-Rest
- Softmax
- Multinomial Logistic Regression
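
Softmax generalizes the sigmoid to K classes: p_k = exp(z_k) / Σ_j exp(z_j).

- **Multinomial logistic regression** computes one linear score per class and applies softmax.
- **One-vs-Rest** instead trains K independent binary classifiers and predicts the class whose classifier is most confident.

The sketch below covers the softmax side. The weights and biases are made-up stand-ins for a trained multinomial model.

```python
import math

def softmax(scores):
    """Turn raw class scores (logits) into probabilities that sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow and leaves the result unchanged
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def multinomial_logistic_proba(x, weights, biases):
    """One linear score per class, then softmax. `weights`/`biases` stand in for a trained model."""
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b for w, b in zip(weights, biases)]
    return softmax(scores)


probs = softmax([2.0, 1.0, 0.1])
predicted_class = max(range(len(probs)), key=probs.__getitem__)  # argmax -> 0

# Hypothetical 3-class model over 2 features
weights = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
biases = [0.0, 0.0, 0.0]
print(multinomial_logistic_proba([1.0, 0.0], weights, biases))
```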

### Classification Algorithms
- K-Nearest Neighbors (KNN)
- Support Vector Machine (SVM)
- Decision Tree Classifier
- Random Forest Classifier
- Gradient Boosting Classifier (XGBoost, LightGBM, CatBoost)
- Naive Bayes
- Neural Networks for classification
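
KNN has no training step beyond storing the data: a query gets the majority label among its k nearest training points. A minimal sketch under three assumptions:

- Euclidean distance.
- An odd k, which avoids ties in this two-class toy example.
- No feature scaling. In practice, features should be scaled first, since distances depend on feature scale.

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, query, k=3):
    """Majority vote among the k training points closest to `query` (Euclidean distance)."""
    nearest = sorted(zip(X_train, y_train), key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


X_train = [[1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [6.0, 6.0], [6.0, 7.0], [7.0, 6.0]]
y_train = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X_train, y_train, [1.5, 1.5]))  # a
print(knn_predict(X_train, y_train, [6.5, 6.5]))  # b
```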

## Model Evaluation
### Regression Metrics
- MSE, RMSE, MAE
- R² (Coefficient of determination)
- Adjusted R²
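
MAE averages absolute errors, so large errors weigh on it less than on MSE. Adjusted R² corrects R² for the number of predictors p given n samples:

adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1)

As a result, adding a useless feature can lower it. A small sketch; the example numbers are arbitrary:

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def adjusted_r2(r2, n_samples, n_features):
    """1 - (1 - R^2) * (n - 1) / (n - p - 1); requires n > p + 1."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


print(mae([1.0, 2.0, 3.0], [2.0, 2.0, 5.0]))  # 1.0
print(adjusted_r2(0.80, n_samples=50, n_features=5))  # slightly below 0.80
```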

### Classification Metrics
- Accuracy
- Precision, Recall, F1-score
- Confusion Matrix
- ROC curve and AUC
- Log-loss
- Matthews Correlation Coefficient (MCC)
- Cohen’s Kappa
- Balanced accuracy
- F-beta score
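
Most of the binary metrics listed above are simple functions of the four confusion-matrix counts (TP, TN, FP, FN). The sketch rests on three assumptions:

- Labels are coded 0/1, with 1 as the positive class.
- ROC AUC is computed through its rank interpretation rather than by tracing the curve. It is the probability that a random positive scores above a random negative, with ties counting half.
- Log-loss clips probabilities with an assumed `eps` to avoid log(0).

```python
import math

def binary_metrics(y_true, y_pred, beta=1.0):
    """Confusion-matrix-based metrics for 0/1 labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # a.k.a. sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    b2 = beta ** 2  # beta > 1 favors recall, beta < 1 favors precision
    f_denom = b2 * precision + recall
    f_beta = (1 + b2) * precision * recall / f_denom if f_denom else 0.0
    accuracy = (tp + tn) / n
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    # Cohen's kappa: observed agreement corrected for agreement expected by chance
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0
    return {
        "confusion": [[tn, fp], [fn, tp]],  # rows = true class (0, 1), columns = predicted
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_beta": f_beta,
        "balanced_accuracy": (recall + specificity) / 2,
        "mcc": mcc,
        "kappa": kappa,
    }

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of 0/1 labels under predicted P(y=1)."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)

def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
m = binary_metrics(y_true, y_pred)
print(m["confusion"], m["f_beta"], round(m["mcc"], 3))
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```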

### Cross-validation
- K-Fold Cross-Validation
- Leave-One-Out Cross-Validation (LOOCV)
- Stratified K-Fold
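
K-fold cross-validation partitions the samples into k folds and uses each fold once as the validation set. Leave-One-Out is the special case k = n. Stratified K-Fold also keeps each fold's class proportions close to those of the full dataset.

A plain-Python sketch of the index generation only; model fitting is left out. The round-robin dealing used for stratification is one simple way to do it, not the only one.

```python
import random
from collections import defaultdict

def _folds_to_splits(folds):
    """Yield (train_indices, validation_indices), using each fold once for validation."""
    for i, val in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val

def k_fold_indices(n_samples, k=5, seed=0):
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return _folds_to_splits([indices[i::k] for i in range(k)])

def stratified_k_fold_indices(labels, k=5, seed=0):
    """Deal each class's shuffled indices round-robin so folds keep class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for idx in members:
            folds[position % k].append(idx)
            position += 1
    return _folds_to_splits(folds)


splits = list(k_fold_indices(10, k=5))
loo = list(k_fold_indices(4, k=4))  # Leave-One-Out: k equals the number of samples
labels = [0] * 8 + [1] * 2
strat = list(stratified_k_fold_indices(labels, k=2))
for train_idx, val_idx in strat:
    print(len(train_idx), len(val_idx), sum(labels[i] for i in val_idx))
```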

## Advanced Techniques
- Ensemble methods
  - Bagging (Random Forest)
  - Boosting (AdaBoost, Gradient Boosting)
  - Stacking
- Feature importance
- Hyperparameter tuning
  - Grid Search
  - Random Search
  - Bayesian Optimization
- Learning curves and validation curves

## Advanced Topics / Algorithms
- Probabilistic models:
  - Bayesian regression
  - Gaussian Naive Bayes
- Distance-based methods beyond KNN:
  - Metric learning concepts
- Tree-based advanced techniques:
  - Extra Trees
  - Gradient Boosting variants (CatBoost specifics)
- Neural networks for tabular data (basic MLP)
- Calibration of probabilistic classifiers

## Practical Data Handling
- Feature transformation:
  - Log transformation, polynomial features
  - Interaction terms
- Advanced categorical variable encoding:
  - Target encoding
  - Frequency encoding
- Advanced handling of missing data:
  - Imputation techniques (mean, median, k-NN, MICE)
- Data leakage prevention
- Pipeline automation (scikit-learn pipelines)
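
Data leakage often creeps in when encoders or imputers are fit on the full dataset. The sketch below fits frequency encoding, target encoding, and median imputation on training rows only. It then reuses the learned mappings for new rows. This is the same fit/transform discipline that scikit-learn pipelines enforce.

Two choices here are assumptions of this sketch: smoothing the target encoder toward the global mean, and falling back to a default value for unseen categories. Production target encoders often also use out-of-fold estimates on the training set.

```python
import statistics
from collections import Counter, defaultdict

def fit_frequency_encoder(train_values):
    """Category -> share of training rows with that category."""
    counts = Counter(train_values)
    return {cat: c / len(train_values) for cat, c in counts.items()}

def fit_target_encoder(train_values, train_targets, smoothing=1.0):
    """Category -> target mean, shrunk toward the global mean by `smoothing` pseudo-rows."""
    global_mean = sum(train_targets) / len(train_targets)
    sums, counts = defaultdict(float), defaultdict(int)
    for v, t in zip(train_values, train_targets):
        sums[v] += t
        counts[v] += 1
    encoding = {v: (sums[v] + smoothing * global_mean) / (counts[v] + smoothing) for v in counts}
    return encoding, global_mean

def fit_median_imputer(train_column):
    """Median of the observed (non-None) training values."""
    return statistics.median([v for v in train_column if v is not None])

def transform(values, mapping, default):
    """Apply a fitted mapping; categories unseen during fitting get `default`."""
    return [mapping.get(v, default) for v in values]

def impute(column, fill_value):
    return [fill_value if v is None else v for v in column]


# Fit on training rows only...
train_city = ["A", "A", "B", "B", "B", "C"]
train_churn = [1, 0, 1, 1, 1, 0]
train_income = [30.0, None, 50.0, 40.0, None, 60.0]
freq = fit_frequency_encoder(train_city)
target_enc, global_mean = fit_target_encoder(train_city, train_churn)
income_fill = fit_median_imputer(train_income)

# ...then reuse the fitted values on test rows (city "D" never appeared in training)
test_city = ["B", "D"]
test_income = [None, 45.0]
print(transform(test_city, freq, 0.0))
print(transform(test_city, target_enc, global_mean))
print(impute(test_income, income_fill))
```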

## Evaluation / Metrics
- Learning curves (training vs validation performance)
- Validation curves (hyperparameter impact)
- Bootstrapping and Monte Carlo evaluation
- Nested Cross-validation
- Confounding variables and collinearity
- Concept drift handling
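
Bootstrapping estimates the uncertainty of a test-set metric by resampling the test predictions with replacement. Below is a minimal percentile-bootstrap sketch for accuracy. It assumes per-sample scores of 1 (correct) or 0 (wrong). The 2,000 resamples and the 95% level are arbitrary defaults.

```python
import random

def bootstrap_ci(per_sample_scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of per-sample scores."""
    rng = random.Random(seed)
    n = len(per_sample_scores)
    means = sorted(
        sum(rng.choices(per_sample_scores, k=n)) / n  # resample n scores with replacement
        for _ in range(n_resamples)
    )
    lo = means[int(alpha / 2 * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


correct = [1] * 80 + [0] * 20  # e.g. 80 of 100 test predictions correct
print(bootstrap_ci(correct))
```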

## Practical Considerations
- Imbalanced datasets
  - Oversampling (SMOTE)
  - Undersampling
- Handling outliers
- Model interpretability
- Model deployment
- Scalability and optimization
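
Random undersampling discards majority-class rows. SMOTE instead creates synthetic minority rows by interpolating between a minority sample and one of its k nearest minority-class neighbors.

Below is a simplified plain-Python sketch of both ideas. The function names and defaults are hypothetical, and library implementations such as imbalanced-learn offer more options. Apply resampling to the training split only, never to the test set.

```python
import math
import random
from collections import Counter

def random_undersample(X, y, seed=0):
    """Keep a random subset of each class equal in size to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        members = [i for i, t in enumerate(y) if t == label]
        keep.extend(rng.sample(members, target))
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]

def smote_like(minority, n_new, k=3, seed=0):
    """Synthetic points on segments between a minority sample and one of its k nearest minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # uniform in [0, 1): how far to move toward the neighbor
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbor)])
    return synthetic


X = [[float(i), 0.0] for i in range(10)]
y = [0] * 8 + [1] * 2
X_small, y_small = random_undersample(X, y)
print(Counter(y_small))

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(smote_like(minority, n_new=3, k=2))
```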

## Applications
- Sales prediction (regression)
- Image or text classification
- Fraud detection
- Churn prediction
- Medical diagnosis


# Unsupervised Learning

## Introduction to Unsupervised Learning
- Definition of unsupervised learning
- Difference between supervised, unsupervised, and reinforcement learning
- Goal: find patterns, structures, or groupings in data
- No target variable (y)
- Common tasks:
  - Clustering
  - Dimensionality reduction
  - Anomaly detection

## Fundamental Concepts
- Dataset and features
- Distance and similarity measures
  - Euclidean distance
  - Manhattan distance
  - Cosine similarity
  - Correlation
- Overfitting and underfitting in unsupervised learning
- Evaluation challenges (lack of ground truth)
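
The four measures listed above, sketched in plain Python for two equal-length numeric vectors:

- **Euclidean and Manhattan** are distances, where 0 means identical.
- **Cosine similarity** compares direction only, ignoring vector length.
- **Pearson correlation** is the cosine similarity of the mean-centered vectors.

Euclidean and Manhattan distances depend on feature scale, so scaling matters before any distance-based method.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1 = same direction, 0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def pearson_correlation(a, b):
    """Cosine similarity of the mean-centered vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


print(euclidean([0, 0], [3, 4]))  # 5.0
print(manhattan([0, 0], [3, 4]))  # 7
print(cosine_similarity([1, 2], [2, 4]))  # ~1.0: same direction, different length
print(pearson_correlation([1, 2, 3], [3, 2, 1]))  # ~-1.0
```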

## Data Preprocessing
- Data cleaning
- Handling missing values
- Normalization and standardization
- Encoding categorical variables
- Feature scaling
- Feature selection and feature engineering
- Dimensionality reduction before clustering (optional)

## Clustering
### Partitioning Methods
- K-Means
  - Algorithm overview
  - Choosing the number of clusters (elbow method, silhouette score)
  - Limitations: sensitive to initialization and to outliers
- K-Medoids / PAM
- Mini-Batch K-Means
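
K-Means (Lloyd's algorithm) alternates two steps until the assignments stop changing:

1. Assign each point to its nearest centroid.
2. Move each centroid to the mean of its assigned points.

The sketch initializes the centroids with k randomly chosen data points. scikit-learn defaults to the smarter k-means++ initialization instead. The function also returns the inertia, i.e. the within-cluster sum of squares that the elbow method plots against k. The toy data and seed are made up.

```python
import math
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns labels, centroids and inertia (within-cluster sum of squares)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # init: k distinct data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, inertia


points = [[1.0, 1.0], [1.5, 2.0], [1.0, 1.5], [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]]
labels, centroids, inertia = kmeans(points, k=2)
# Elbow method: inertia generally falls as k grows; look for where the drop levels off
for k in range(1, 5):
    print(k, round(kmeans(points, k)[2], 3))
```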

### Hierarchical Clustering
- Agglomerative clustering
  - Linkage methods: single, complete, average, Ward
- Divisive clustering
- Dendrogram visualization
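
Agglomerative clustering starts with every point as its own cluster and repeatedly merges the two closest clusters. The linkage defines "closest":

- **single:** the minimum pairwise distance.
- **complete:** the maximum pairwise distance.
- **average:** the mean pairwise distance.

The recorded merge distances are the heights a dendrogram draws. This naive sketch runs in roughly cubic time, so it suits only small data. It omits Ward linkage, which merges the pair that least increases within-cluster variance.

```python
import math

def agglomerative(points, n_clusters, linkage="single"):
    """Merge the closest pair of clusters until `n_clusters` remain; also return the merge history."""
    def cluster_distance(a, b):
        dists = [math.dist(points[i], points[j]) for i in a for j in b]
        if linkage == "single":
            return min(dists)
        if linkage == "complete":
            return max(dists)
        if linkage == "average":
            return sum(dists) / len(dists)
        raise ValueError(f"unsupported linkage: {linkage}")

    clusters = [[i] for i in range(len(points))]  # start: every point is its own cluster
    merges = []  # (cluster_a, cluster_b, distance): the heights a dendrogram would draw
    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j], cluster_distance(clusters[i], clusters[j])))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters, merges


points = [[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0], [20.0, 0.0]]
clusters, merges = agglomerative(points, n_clusters=3)
print(clusters)  # [[0, 1], [2, 3], [4]]
```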

### Density-Based Clustering
- DBSCAN
- OPTICS
- HDBSCAN
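
DBSCAN grows clusters from core points, i.e. points with at least `min_samples` neighbors within radius `eps`, counting themselves. Non-core points reachable from a core point become border points. Everything else is labeled noise (-1). A brute-force O(n²) sketch with hand-picked toy parameters:

```python
import math

def dbscan(points, eps, min_samples):
    """Cluster ids per point (0, 1, ...), or -1 for noise."""
    n = len(points)
    # eps-neighborhoods; each point counts as its own neighbor
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps] for i in range(n)]
    labels = [None] * n  # None = not visited yet
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # noise for now; may later be claimed as a border point
            continue
        cluster += 1  # i is an unvisited core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:  # j is also core: keep expanding
                queue.extend(neighbors[j])
    return labels


points = [[0, 0], [0, 0.5], [0.5, 0], [10, 10], [10, 10.5], [10.5, 10], [50, 50]]
print(dbscan(points, eps=1.0, min_samples=2))  # [0, 0, 0, 1, 1, 1, -1]
```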

### Model-Based Clustering
- Gaussian Mixture Models (GMM)
- Expectation-Maximization algorithm
- Choosing number of components (BIC/AIC)
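
A GMM models the data as a weighted sum of Gaussians and is fit with EM:

- **E-step:** compute each component's responsibility for each point.
- **M-step:** re-estimate the weights, means, and variances from those responsibilities.

The sketch is restricted to one-dimensional data and takes the initial means as input. It also adds an assumed variance floor to avoid collapse. Model size is chosen with BIC = p·ln(n) - 2·ln L, where p = 3k - 1 free parameters in 1-D; lower is better.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def gmm_em_1d(data, init_means, n_iter=100, min_var=1e-6):
    """Fit a 1-D Gaussian mixture by EM; returns weights, means, variances, log-likelihood."""
    k, n = len(init_means), len(data)
    means = list(init_means)
    overall_mean = sum(data) / n
    start_var = max(sum((x - overall_mean) ** 2 for x in data) / n, min_var)
    variances = [start_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: resp[i][c] = P(component c | x_i) under the current parameters
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: responsibility-weighted re-estimates
        for c in range(k):
            nc = sum(r[c] for r in resp)
            weights[c] = nc / n
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / nc
            variances[c] = max(
                sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data)) / nc, min_var
            )
    log_lik = sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)))
        for x in data
    )
    return weights, means, variances, log_lik

def bic(log_lik, k, n):
    """BIC = p ln(n) - 2 ln L, with p = 3k - 1 free parameters for a 1-D mixture."""
    return (3 * k - 1) * math.log(n) - 2 * log_lik


data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
for k, init in [(1, [3.0]), (2, [0.0, 6.0])]:
    weights, means, variances, ll = gmm_em_1d(data, init)
    print(k, [round(m, 2) for m in means], round(bic(ll, k, len(data)), 1))
```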

### Other Clustering Techniques
- Spectral Clustering
- Self-Organizing Maps (SOM)
- Mean-Shift clustering

## Dimensionality Reduction
- Principal Component Analysis (PCA)
- Kernel PCA
- Independent Component Analysis (ICA)
- t-Distributed Stochastic Neighbor Embedding (t-SNE)
- Uniform Manifold Approximation and Projection (UMAP)
- Linear Discriminant Analysis (LDA; supervised, since it uses class labels)
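
PCA finds orthogonal directions of maximal variance: the eigenvectors of the feature covariance matrix, ordered by eigenvalue. As a simplification, the sketch recovers only the first principal component, via power iteration (repeated multiplication by the covariance matrix, then renormalization). Libraries typically use the SVD instead. Dividing the eigenvalue by the total variance gives that component's explained-variance ratio.

```python
import math
import random

def first_principal_component(X, n_iter=200, seed=0):
    """Top eigenvector of the sample covariance matrix via power iteration."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in X]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # random positive start vector
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]  # w = cov @ v
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
    explained_ratio = eigenvalue / sum(cov[a][a] for a in range(d))
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]  # 1-D projection of each row
    return v, explained_ratio, scores


X = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8], [5.0, 5.1]]
v, ratio, scores = first_principal_component(X)
print([round(c, 3) for c in v], round(ratio, 3))  # direction close to the y = x diagonal
```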

## Anomaly Detection
- Z-score and statistical methods
- Isolation Forest
- One-Class SVM
- Local Outlier Factor (LOF)
- Autoencoder-based anomaly detection
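
A z-score flags points lying more than a threshold number of standard deviations from the mean; 3 is a common but arbitrary choice. The outliers themselves inflate the mean and standard deviation, so a robust variant replaces them with the median and the median absolute deviation (MAD). The 0.6745 factor and the 3.5 cutoff follow a common convention for this modified z-score. A sketch on made-up data:

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Indices whose |x - mean| / std exceeds `threshold`."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]

def modified_zscore_outliers(values, threshold=3.5):
    """Robust variant based on 0.6745 * (x - median) / MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []
    return [i for i, v in enumerate(values) if abs(0.6745 * (v - med) / mad) > threshold]


values = [10.0, 11.0] * 10 + [50.0]  # 20 typical readings plus one anomaly at index 20
print(zscore_outliers(values), modified_zscore_outliers(values))  # [20] [20]
```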

## Evaluation of Unsupervised Models
- Internal metrics:
  - Silhouette score
  - Davies-Bouldin index
  - Calinski-Harabasz index
- External metrics (if ground truth available):
  - Adjusted Rand Index (ARI)
  - Normalized Mutual Information (NMI)
  - Fowlkes-Mallows score
- Visual inspection:
  - Scatter plots, cluster plots
  - Heatmaps
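
For each point, the silhouette compares two quantities:

- a, the mean distance to the other members of its own cluster.
- b, the mean distance to the members of the nearest other cluster.

The point's score is s = (b - a) / max(a, b), which ranges from -1 to 1. The overall score is the mean over all points, and singletons get 0 by convention. A brute-force O(n²) sketch, assuming Euclidean distance:

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette s = (b - a) / max(a, b) over all points (Euclidean distance)."""
    cluster_ids = set(labels)
    if len(cluster_ids) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        same = [math.dist(p, q) for j, (q, lab) in enumerate(zip(points, labels))
                if lab == own and j != i]
        if not same:  # singleton cluster: silhouette is 0 by convention
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # cohesion
        b = min(  # separation: mean distance to the closest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in cluster_ids if other != own
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
print(silhouette_score(points, [0, 0, 1, 1]))  # close to 1: compact, well separated
print(silhouette_score(points, [0, 1, 0, 1]))  # negative: points sit nearer the other cluster
```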

## Advanced Techniques
- Ensemble clustering
- Consensus clustering
- Subspace clustering
- Feature learning with autoencoders
- Self-supervised learning (modern approach)

## Practical Considerations
- Choosing the right number of clusters/components
- Handling high-dimensional data
- Handling categorical features
- Handling outliers
- Scaling for large datasets
- Interpretability of clusters or latent features

## Applications
- Customer segmentation
- Market basket analysis
- Anomaly/fraud detection
- Image compression or embedding
- Topic modeling in text
- Dimensionality reduction for visualization
