# General approach
1. **Define the business question clearly**
   * “Do we want to predict churn (classification), forecast revenue (regression), segment customers (clustering), or optimize actions (reinforcement learning, RL)?”
2. **Identify the type of problem**
   * Classification, regression, clustering, time series, anomaly detection, RL, etc.
3. **Collect and prepare data**
   * Clean, preprocess, encode features, handle missing values.
4. **Choose candidate algorithms**
   * Start simple (Logistic Regression, Decision Tree).
   * Scale up if needed (Random Forest, Gradient Boosting, Neural Networks).
5. **Split data and evaluate**
   * Train/test (or cross-validation).
   * Use the right metrics (AUC, RMSE, log-loss, etc. depending on purpose).
6. **Interpret results & refine**
   * Check for overfitting (train vs validation curves).
   * Use feature importance / SHAP for business insight.
7. **Deploy & monitor**
   * Push to production, track drift, retrain when necessary.
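
Steps 4 to 6 above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not a recommended production setup: the data is synthetic (generated with `make_classification` as a stand-in for a churn table), the two candidate models and their settings are arbitrary choices, and the `candidates` and `results` names are made up for this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic, imbalanced binary data standing in for a real churn table.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, weights=[0.8, 0.2], random_state=0
)

# Step 5: hold out a test set, stratified so both splits keep the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 4: a simple baseline first, then a more flexible model.
candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

results = {}
for name, model in candidates.items():
    # 5-fold cross-validated AUC, computed on the training split only.
    cv_auc = cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc").mean()
    model.fit(X_train, y_train)
    train_auc = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
    test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    results[name] = {"cv_auc": cv_auc, "train_auc": train_auc, "test_auc": test_auc}
    # Step 6: a large gap between train and test AUC hints at overfitting.
    print(f"{name}: cv={cv_auc:.3f} train={train_auc:.3f} test={test_auc:.3f}")
```

Putting the scaler inside the pipeline means it is refit within each cross-validation fold, so no statistics from the validation fold leak into training.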

## Modeling Techniques with Typical Questions

| Category                         | Focus / Target                     | Typical Day-to-Day Questions                              | Typical Use Cases                            | Algorithms                                    |
|----------------------------------|------------------------------------|-----------------------------------------------------------|----------------------------------------------|-----------------------------------------------|
| **Discriminative Modeling**      | Learn boundaries that separate classes or predict outcomes | *“Will this customer churn or stay?”*<br>*“Is this transaction fraudulent?”* | Churn prediction, fraud detection, credit scoring | Logistic Regression, SVM, Decision Trees, Random Forests, Gradient Boosting |
| **Generative Modeling**          | Model how data is generated (joint distribution P(X, Y), or P(X) when there are no labels), then infer class probabilities via Bayes' rule or sample new data | *“Which class most likely generated this example?”*<br>*“If it’s spam, how do the words usually look?”* | Spam filtering, text classification, image generation | Naive Bayes, Gaussian Mixture Models (GMM), Hidden Markov Models (HMM), GANs, VAEs |
| **Regression (Continuous Outcomes)** | Predict numeric values (continuous target variable) | *“What will our sales be next quarter?”*<br>*“How much revenue will this customer generate?”* | Revenue forecasting, demand prediction, customer lifetime value | Linear Regression, Polynomial Regression, Regression Trees, Gradient Boosting, Neural Nets |
| **Unsupervised Structure Discovery** | No target → find hidden structure or reduce dimensionality | *“How many natural customer segments do we have?”*<br>*“Can we reduce features but keep most information?”* | Customer segmentation, anomaly detection, visualization | K-Means, DBSCAN, Hierarchical Clustering, PCA, t-SNE |
| **Sequential Decision-Making (Reinforcement Learning)** | Learn policies that maximize cumulative reward over time | *“What’s the best pricing strategy over time?”*<br>*“Which product should we recommend next?”* | Dynamic pricing, recommender systems, robotics, supply chain optimization | Q-Learning, Deep Reinforcement Learning, Actor-Critic Methods |
| **Time-to-Event / Risk Modeling (Survival Analysis)** | Predict time until an event occurs, handling censoring | *“When will this customer likely churn?”*<br>*“How long until a machine fails?”* | Churn timing, equipment failure, patient survival | Kaplan-Meier Estimator, Cox Proportional Hazards, Survival Forests |
| **Time Series Forecasting**      | Predict future values along a timeline with temporal dependencies | *“What will our daily sales look like next month?”*<br>*“How much energy will be consumed tomorrow?”* | Sales prediction, financial forecasting, demand planning, energy usage | ARIMA, Exponential Smoothing (ETS), Prophet, LSTM, Transformers for sequences |
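
To make "handling censoring" in the Time-to-Event row concrete, here is a minimal pure-Python sketch of the Kaplan-Meier estimator. The function name `kaplan_meier` and the sample data are hypothetical; in practice a survival-analysis library would normally be used instead of hand-rolled code.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each time with at least one event.

    durations: time until the event or until censoring, one entry per subject
    observed:  True if the event happened, False if the subject was censored
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # events and censorings leaving the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Censored subjects counted as at risk at t, then dropped afterwards.
        at_risk -= exits[t]
    return curve


# Hypothetical data: months until churn; False = still a customer when observation ended.
months = [2, 3, 3, 5, 6, 8, 8, 10]
churned = [True, True, False, True, False, True, True, False]

for t, s in kaplan_meier(months, churned):
    print(f"month {t}: S(t) = {s:.3f}")
```

Censored customers never trigger a drop in the curve, but they do shrink the risk set for later time points, which is how the estimator uses partial information instead of discarding it.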

# Machine Learning Algorithms — Comprehensive Overview

| Algorithm / Family       | Type              | Typical Use Cases                           | Advantages                                                                 | Disadvantages                                                                 |
|---------------------------|------------------|---------------------------------------------|----------------------------------------------------------------------------|-------------------------------------------------------------------------------|
| **Linear Regression**     | Supervised (Regression) | Price prediction, sales forecasting         | Simple, interpretable                                                      | Assumes linearity, sensitive to outliers                                      |
| **Polynomial Regression** | Supervised (Regression) | Modeling nonlinear trends                   | Captures curves with few features                                          | Overfitting risk at high degree                                               |
| **Ridge / Lasso Regression** | Supervised (Regression) | Feature selection, collinearity handling    | Regularization prevents overfitting, handles multicollinearity             | Still assumes linearity                                                       |
| **Logistic Regression**   | Supervised (Classification) | Churn prediction, credit scoring           | Probabilistic output, interpretable, fast                                  | Only linear decision boundaries                                               |
| **Naive Bayes**           | Supervised (Classification) | Spam detection, text classification        | Very fast, works well with high-dimensional text                           | Strong independence assumption                                                |
| **Support Vector Machine (SVM)** | Supervised (Classification/Regression) | Text/image classification                  | Works well in high dimensions, robust margin-based optimization            | Computationally heavy, hard to tune                                           |
| **k-Nearest Neighbors (KNN)** | Supervised (Classification/Regression) | Recommendations, anomaly detection         | No explicit training phase, simple, intuitive                              | Slow predictions on large datasets, sensitive to noise and feature scaling    |
| **Decision Tree**         | Supervised (Classification/Regression) | Segmentation, fraud detection              | Easy to interpret, handles non-linear relationships                        | Prone to overfitting                                                          |
| **Random Forest**         | Supervised (Ensemble) | General-purpose classification/regression   | Robust, reduces overfitting relative to a single tree, handles missing values in some implementations | Slower prediction, less interpretable                                         |
| **Gradient Boosting (XGBoost, LightGBM, CatBoost)** | Supervised (Ensemble) | Tabular data, ranking, Kaggle competitions | Very high accuracy, handles non-linearity, built-in regularization          | Needs tuning, longer training times                                           |
| **Neural Networks (MLP, Deep Learning)** | Supervised | NLP, image/speech recognition, demand forecasting | Captures complex nonlinear patterns                                       | Requires lots of data, compute-heavy, black box                               |
| **K-Means**               | Unsupervised (Clustering) | Customer segmentation, image compression    | Simple, scalable                                                           | Assumes spherical clusters, must choose k, sensitive to initialization        |
| **Hierarchical Clustering** | Unsupervised (Clustering) | Taxonomy, document clustering              | Dendrograms interpretable                                                  | Poor scalability                                                              |
| **DBSCAN**                | Unsupervised (Clustering) | Anomaly detection, spatial data             | Finds arbitrary cluster shapes, no need for k                              | Struggles with varying densities and high-dimensional data                    |
| **Gaussian Mixture Models (GMM)** | Unsupervised (Clustering) | Soft clustering, anomaly detection         | Probabilistic cluster membership                                           | Assumes Gaussian-shaped clusters                                              |
| **PCA (Principal Component Analysis)** | Unsupervised (Dim. Reduction) | Visualization, noise reduction             | Reduces dimensionality, speeds computation                                 | Components are hard to interpret                                              |
| **t-SNE**                 | Unsupervised (Dim. Reduction) | Visualizing embeddings (2D/3D)             | Reveals local structure/clusters                                           | Computationally expensive, not predictive                                     |
| **Apriori / FP-Growth**   | Unsupervised (Association Rule Mining) | Market basket analysis                     | Generates interpretable association rules                                  | Combinatorial explosion with many items                                       |
| **ARIMA / ETS**           | Time Series      | Sales/finance forecasting                   | Interpretable, widely used                                                 | ARIMA assumes stationarity after differencing; both struggle with nonlinear patterns |
| **Prophet (Facebook)**    | Time Series      | Business-friendly forecasting               | Handles trends, holidays automatically                                     | Less accurate on very noisy data                                              |
| **LSTM / RNN**            | Time Series (Deep Learning) | Demand patterns, sequential data           | LSTMs capture longer-term dependencies than plain RNNs                     | Data-hungry, hard to tune                                                     |
| **Ensemble Methods (Bagging, Boosting, Stacking)** | Meta-approach | Fraud detection, competitions              | Improves accuracy and robustness                                           | Less interpretable, longer training                                           |
| **Reinforcement Learning (Q-Learning, DQN, Actor-Critic)** | RL | Robotics, recommender systems, pricing     | Learns optimal policies by trial & error                                   | Data- and compute-intensive, slow convergence                                 |
| **CNN (Convolutional Neural Network)** | Computer Vision | Image classification, object detection     | Great for spatial data, transfer learning possible                         | Requires large datasets, GPU resources                                        |
| **Transformers (BERT, GPT)** | NLP (Deep Learning) | Text classification, translation, NER       | State-of-the-art performance, handles context well                         | Huge compute cost, less interpretable                                         |
| **Isolation Forest**      | Anomaly Detection | Fraud, rare-event detection                 | Works well for high-dimensional data                                       | May miss subtle anomalies                                                     |
| **One-Class SVM**         | Anomaly Detection | Intrusion detection                         | Effective for small datasets                                               | Scales poorly, kernel tuning required                                         |
| **Autoencoders**          | Anomaly Detection (DL) | Outlier detection, dimensionality reduction| Learns compressed representation                                           | Needs tuning, less interpretable                                              |
| **Collaborative Filtering** | Recommendation Systems | Product/movie recommendation               | Leverages user–item interactions                                          | Cold start problem, sparse data issues                                        |
| **Content-Based Filtering** | Recommendation Systems | News, personalized feeds                   | Works with item metadata                                                   | Limited diversity, needs good features                                        |
| **Hybrid Recommenders**   | Recommendation Systems | E-commerce, streaming services              | Combines collaborative + content-based                                     | More complex to implement                                                     |
| **GANs (Generative Adversarial Networks)** | Generative Models | Image synthesis, data augmentation         | Generates realistic synthetic data                                         | Hard to train, instability issues                                             |
| **Causal Inference (e.g., Uplift Modeling, Propensity Scoring)** | Specialized | Marketing experiments, healthcare          | Answers “what if” causal questions                                         | Strong assumptions, tricky validation                                         |
| **Survival Analysis (CoxPH, Kaplan-Meier)** | Specialized | Customer churn, medical outcomes           | Handles time-to-event with censoring                                       | Needs careful interpretation                                                  |
| **Explainable AI (XAI, SHAP, LIME)** | Meta-approach | Any ML application needing transparency    | Makes black-box models more interpretable                                  | Adds complexity; explanations are approximations and may be inexact           |
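
As one illustration of a disadvantage listed above, the sketch below shows why KNN is sensitive to feature scaling. It uses only the Python standard library; the customer data and the helper names (`nearest`, `fit_scaler`, `transform`) are hypothetical and chosen for this example.

```python
import math
from statistics import mean, pstdev


def nearest(query, points):
    """Index of the point closest to query under Euclidean distance."""
    return min(range(len(points)), key=lambda i: math.dist(query, points[i]))


def fit_scaler(rows):
    """Per-column mean and population standard deviation."""
    cols = list(zip(*rows))
    return [mean(c) for c in cols], [pstdev(c) for c in cols]


def transform(row, mus, sigmas):
    """Z-score one row with previously fitted statistics."""
    return tuple((v - m) / s for v, m, s in zip(row, mus, sigmas))


# Hypothetical customers: (age in years, annual income in dollars).
customers = [(25, 50_000), (60, 51_000), (27, 58_000)]
query = (26, 52_000)

# Raw features: income differences dwarf age differences.
raw_idx = nearest(query, customers)

# Standardized features: both columns contribute on a comparable scale.
mus, sigmas = fit_scaler(customers)
scaled_customers = [transform(c, mus, sigmas) for c in customers]
scaled_idx = nearest(transform(query, mus, sigmas), scaled_customers)

print("nearest on raw features:   ", customers[raw_idx])
print("nearest on scaled features:", customers[scaled_idx])
```

On the raw features the 60-year-old is the nearest neighbor of the 26-year-old query because income dominates the distance; after standardization the 25-year-old is nearest.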