📌 START
   │
   ├──▶ 🧩 1. Understand the Problem Type
   │       ├── Classification?
   │       │     ├── Binary → Use Sigmoid + Binary Crossentropy
   │       │     └── Multi-class → Use Softmax + Categorical Crossentropy
   │       │
   │       └── Regression?
   │             └── Use Linear Output + MSE / MAE / Huber
   │
   ├──▶ 🧠 2. Input Shape and Preprocessing
   │       ├── Number of Features?
   │       ├── Normalization/Standardization needed?
   │       └── Reshape input if needed (especially for images, sequences)
   │
   ├──▶ 🔧 3. Choose Network Type
   │       ├── Tabular → Use MLP (Dense NN)
   │       ├── Images → Use CNN
   │       ├── Text/Sequences → Use RNN / LSTM / GRU / Transformers
   │       └── Time Series → RNN/GRU + Sequence handling
   │
   ├──▶ 🧱 4. Define Number of Layers & Neurons
   │       ├── Simple Patterns → Fewer Layers
   │       ├── Complex Patterns → More Layers (Deep Learning)
   │       ├── Start with: [Input] → [Hidden Layer 1: 4–64 neurons] → ... → [Output]
   │       └── Rule of thumb: decrease neurons layer by layer or keep balanced
   │
   ├──▶ 🔋 5. Choose Activation Functions
   │       ├── Hidden Layers
   │       │     ├── ReLU (default)
   │       │     ├── Leaky ReLU / ELU → If ReLU neurons are dying (stuck at zero output)
   │       │     └── Tanh → If data is centered around 0
   │       └── Output Layer
   │             ├── Sigmoid → Binary classification
   │             ├── Softmax → Multi-class classification
   │             └── Linear → Regression
   │
   ├──▶ 🧮 6. Choose Loss Function
   │       └── Match with output activation
   │             ├── Sigmoid → Binary Crossentropy
   │             ├── Softmax → Categorical Crossentropy
   │             └── Linear → MSE / MAE / Huber
   │
   ├──▶ ⚙️ 7. Optimizer & Learning Rate
   │       ├── Adam (Most used)
   │       ├── SGD with momentum → When training from scratch with a carefully tuned LR
   │       └── LR Scheduling → Reduce LR on plateau or exponential decay
   │
   ├──▶ 🧪 8. Training Strategy
   │       ├── Use Validation Set
   │       ├── Early Stopping
   │       ├── Dropout / BatchNorm if needed
   │       └── Track Metrics (accuracy, loss, F1, etc.)
   │
   ├──▶ 🧠 9. Testing & Generalization
   │       ├── Evaluate on Test Set
   │       └── Avoid overfitting → Regularization, dropout, early stopping
   │
   └──▶ 🚀 10. Fine-Tune & Optimize
           ├── Tune hidden layer sizes
           ├── Try different activations/loss combos
           ├── Try batch sizes, LR, optimizer variations
           └── Profile training speed and performance
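
Step 8 names early stopping without saying how it decides when to stop. The sketch below shows one common patience-based variant in plain Python. The class name `EarlyStopping`, the parameters `patience` and `min_delta`, and the list of validation losses are all illustrative assumptions, not part of any particular framework.

```python
class EarlyStopping:
    """Stop training when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # how many non-improving epochs to tolerate
        self.min_delta = min_delta    # smallest decrease that counts as improvement
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # New best: remember it and reset the counter.
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses, one per epoch, standing in for a real training loop.
val_losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.63, 0.65]

stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break

print(f"best epoch: {stopper.best_epoch}, stopped at epoch: {stopped_at}")
```

In a real loop you would typically also save the model weights whenever a new best loss is recorded, then restore them after stopping.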


START
  |
  |--> What type of problem are you solving?
        |
        |--> 🧠 Classification?
        |        |
        |        |--> Binary? (e.g., cat vs dog)
        |        |       |
        |        |       |--> Output Layer: 1 neuron
        |        |       |--> Output Activation: Sigmoid
        |        |       |--> Loss Function: BCELoss (after Sigmoid) / BCEWithLogitsLoss (raw logits, no Sigmoid layer)
        |        |       |
        |        |       |--> Hidden Layers: ReLU / LeakyReLU
        |
        |        |--> Multi-class? (e.g., dog, cat, panda)
        |        |       |
        |        |       |--> Output Layer: n neurons (n = num classes)
        |        |       |--> Output Activation: None (logits)
        |        |       |--> Loss Function: CrossEntropyLoss (expects raw logits; applies log-softmax internally)
        |        |       |
        |        |       |--> Hidden Layers: ReLU / Swish / GELU
        |
        |        |--> Multi-label? (e.g., image with cat & dog)
        |                |
        |                |--> Output Layer: n neurons (n = labels)
        |                |--> Output Activation: None (logits); apply Sigmoid only at inference
        |                |--> Loss Function: BCEWithLogitsLoss (applies Sigmoid internally)
        |                |
        |                |--> Hidden Layers: ReLU / ELU
        |
        |
        |--> 📏 Regression? (e.g., predict house price)
        |        |
        |        |--> Output Layer: 1 neuron (or more for multi-output)
        |        |--> Output Activation: None
        |        |--> Loss Function: MSELoss / L1Loss (MAE) / HuberLoss
        |        |
        |        |--> Hidden Layers: ReLU / LeakyReLU / Swish
        |
        |
        |--> 🎯 Embedding / Feature Learning?
        |        |
        |        |--> Output Layer: n-d vector
        |        |--> Activation: Often None
        |        |--> Loss Function: TripletMarginLoss / contrastive loss (custom) / other custom loss
        |        |
        |        |--> Hidden Layers: Tanh / ReLU / GELU

  |
  |--> Choose Number of Hidden Layers
        |
        |--> Simple data? 1–2 layers
        |--> Tabular data? 2–5 layers
        |--> Complex data (images, text)? 5–50+ layers (CNNs, RNNs, Transformers)
  |
  |--> Choose Hidden Neurons (per layer)
        |
        |--> Start big, shrink down (e.g., 64 → 32 → 16)
        |--> Use trial & error or hyperparameter tuning
  |
  |--> Choose Hidden Activations
        |
        |--> Default: ReLU
        |--> Dead Neurons? → LeakyReLU / ELU
        |--> Smoother learning? → Swish / GELU
        |--> Avoid Sigmoid/Tanh in deep networks (they saturate, causing vanishing gradients)
  |
  |--> Optional Add-ons:
        |
        |--> BatchNorm → More stable training
        |--> Dropout → Prevent overfitting
        |--> Residuals / Skip Connections → For very deep nets
  |
  |--> Done! Train your model 🚀
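
The "start big, shrink down (e.g., 64 → 32 → 16)" rule for hidden neurons can be sketched as a small helper. This is only one way to read the rule, assuming the width halves at each layer. The function names `hidden_widths` and `count_parameters`, the `min_width` floor, and the 10-input, 1-output example are hypothetical choices for illustration.

```python
def hidden_widths(first_width, num_layers, min_width=4):
    """Return layer widths that halve at each step, never dropping below min_width."""
    widths = []
    width = first_width
    for _ in range(num_layers):
        widths.append(max(width, min_width))
        width //= 2
    return widths


def count_parameters(n_inputs, widths, n_outputs):
    """Count weights and biases of a fully connected net with the given layer sizes."""
    sizes = [n_inputs, *widths, n_outputs]
    # Each dense layer has fan_in * fan_out weights plus fan_out biases.
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in zip(sizes, sizes[1:]))


widths = hidden_widths(64, 3)
print(widths)                            # [64, 32, 16]
print(count_parameters(10, widths, 1))   # 3329
```

Counting parameters this way gives a quick sense of model capacity before training, which helps when comparing candidate layer layouts during trial and error or hyperparameter tuning.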
