# General approach
1. **Define the business question clearly**
   * “Do we want to predict churn (classification), forecast revenue (regression), segment customers (clustering), or optimize actions (RL)?”
2. **Identify the type of problem**
   * Classification, regression, clustering, time series, anomaly detection, RL, etc.
3. **Collect and prepare data**
   * Clean, preprocess, encode features, handle missing values.
4. **Choose candidate algorithms**
   * Start simple (Logistic Regression, Decision Tree).
   * Scale up if needed (Random Forest, Gradient Boosting, Neural Networks).
5. **Split data and evaluate**
   * Use a train/test split or cross-validation.
   * Use metrics that match the task: AUC or log-loss for classification, RMSE for regression.
6. **Interpret results & refine**
   * Check for overfitting by comparing training and validation curves.
   * Use feature importance / SHAP for business insight.
7. **Deploy & monitor**
   * Push to production, track drift, retrain when necessary.
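
Steps 4 to 6 can be sketched in a few lines. The example below is a minimal illustration, not a recommended pipeline: it uses only the Python standard library, the data is synthetic (an assumption made purely for the demo), and the two models (a least-squares line as the "simple" baseline and a 1-nearest-neighbour memorizer as the "flexible" model) stand in for the algorithms named above. The helper names `fit_line`, `fit_1nn`, and `cross_validate` are hypothetical; in practice a library such as scikit-learn would normally handle this.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error, a common regression metric."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_line(xs, ys):
    """Simple baseline: ordinary least squares on a single feature."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_1nn(xs, ys):
    """Flexible model: 1-nearest-neighbour, which memorizes the training set."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def cross_validate(fit, xs, ys, k=5):
    """Return (mean training RMSE, mean validation RMSE) over k folds."""
    n = len(xs)
    train_scores, val_scores = [], []
    for fold in range(k):
        val_idx = set(range(fold, n, k))  # every k-th row is held out
        tr = [i for i in range(n) if i not in val_idx]
        va = sorted(val_idx)
        model = fit([xs[i] for i in tr], [ys[i] for i in tr])
        train_scores.append(rmse([ys[i] for i in tr], [model(xs[i]) for i in tr]))
        val_scores.append(rmse([ys[i] for i in va], [model(xs[i]) for i in va]))
    return sum(train_scores) / k, sum(val_scores) / k


# Synthetic data (assumption for illustration): y = 2x + Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2 * x + rng.gauss(0, 1) for x in xs]

results = {
    "linear": cross_validate(fit_line, xs, ys),
    "1-NN": cross_validate(fit_1nn, xs, ys),
}
for name, (train_rmse, val_rmse) in results.items():
    print(f"{name:>6}: train RMSE={train_rmse:.2f}  validation RMSE={val_rmse:.2f}")
```

The 1-NN model reproduces its training data exactly, so its training error is zero while its validation error is not. That gap between training and validation scores is the overfitting signal mentioned in step 6; the linear baseline should show nearly equal scores on both.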

## Modeling Techniques with Typical Questions

| Category                         | Focus / Target                     | Typical Day-to-Day Questions                              | Typical Use Cases                            | Algorithms                                    |
|----------------------------------|------------------------------------|-----------------------------------------------------------|----------------------------------------------|-----------------------------------------------|
| **Discriminative Modeling**      | Learn boundaries that separate classes or predict outcomes | *“Will this customer churn or stay?”*<br>*“Is this transaction fraudulent?”* | Churn prediction, fraud detection, credit scoring | Logistic Regression, SVM, Decision Trees, Random Forests, Gradient Boosting |
| **Generative Modeling**          | Model how data is generated (joint distribution P(X, Y), or P(X) when there are no labels) → infer probabilities or sample new data | *“Which class most likely generated this example?”*<br>*“If it’s spam, what do the words usually look like?”* | Spam filtering, text classification, image generation | Naive Bayes, Gaussian Mixture Models (GMM), Hidden Markov Models (HMM), GANs, VAEs |
| **Regression (Continuous Outcomes)** | Predict numeric values (continuous target variable) | *“What will our sales be next quarter?”*<br>*“How much revenue will this customer generate?”* | Revenue forecasting, demand prediction, customer lifetime value | Linear Regression, Polynomial Regression, Regression Trees, Gradient Boosting, Neural Nets |
| **Unsupervised Structure Discovery** | No target → find hidden structure or reduce dimensionality | *“How many natural customer segments do we have?”*<br>*“Can we reduce features but keep most information?”* | Customer segmentation, anomaly detection, visualization | K-Means, DBSCAN, Hierarchical Clustering, PCA, t-SNE |
| **Sequential Decision-Making (Reinforcement Learning)** | Learn policies that maximize cumulative reward over time | *“What’s the best pricing strategy over time?”*<br>*“Which product should we recommend next?”* | Dynamic pricing, recommender systems, robotics, supply chain optimization | Q-Learning, Deep Reinforcement Learning, Actor-Critic Methods |
| **Time-to-Event / Risk Modeling (Survival Analysis)** | Predict time until an event occurs, handling censoring | *“When will this customer likely churn?”*<br>*“How long until a machine fails?”* | Churn timing, equipment failure, patient survival | Kaplan-Meier Estimator, Cox Proportional Hazards, Survival Forests |
| **Time Series Forecasting**      | Predict future values along a timeline with temporal dependencies | *“What will our daily sales look like next month?”*<br>*“How much energy will be consumed tomorrow?”* | Sales prediction, financial forecasting, demand planning, energy usage | ARIMA, Exponential Smoothing (ETS), Prophet, LSTM, Transformers for sequences |
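
Of the rows above, "handling censoring" in the survival analysis entry is probably the least self-explanatory, so here is a small sketch of the Kaplan-Meier estimator in plain Python. The function name `kaplan_meier` and the customer tenures are made up for illustration; a dedicated survival analysis library would be the usual choice for real work. The sketch follows the common convention that a subject censored at time t still counts as at risk for events at t.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each distinct event time.

    durations: how long each subject was followed.
    observed:  True if the event (e.g. churn) happened, False if censored.
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # every subject leaves the risk set at its own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk subjects that survive time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # events and censored subjects both drop out
    return curve


# Hypothetical customer tenures in months; False means still active (censored).
durations = [2, 3, 3, 5, 8, 8, 12, 12]
observed = [True, True, False, True, True, False, False, False]

curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(f"S({t}) = {s:.3f}")
# S(2) = 0.875
# S(3) = 0.750
# S(5) = 0.600
# S(8) = 0.450
```

Censored customers never trigger a drop in the curve, but they do shrink the risk set afterwards. Treating them as churned, or discarding them, would bias the estimate, which is why this family of methods is listed separately from plain regression.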