# Neural networks from scratch in Python
References: http://103.203.175.90:81/fdScript/RootOfEBooks/E%20Book%20collection%20-%202024%20-%20G/CSE%20%20IT%20AIDS%20ML/Neural%20Network.pdf

## Chapter 12: Validation data

### Hyperparameter Tuning and Validation – Summary

- **Do not use the test dataset for hyperparameter tuning**  
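  → Tuning against the test data fits the hyperparameters to that data, so the final test score becomes an overly optimistic estimate of real-world performance.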

- **Test dataset usage**  
  → Only for final performance evaluation, never for model adjustment.

- **Use a validation dataset for tuning**  
  → A separate dataset held out from the training data and not used to fit the model.  
  → Helps find the best hyperparameters without touching the test data.

- **If limited data is available**:
  1. **Split training data temporarily**  
     → Hold out part of the training data for validation while tuning, then retrain on all of the training data with the chosen hyperparameters.
  2. **Use cross-validation**  
     → Split training data into k parts.  
     → Train on (k−1) parts, validate on the remaining part.  
     → Repeat k times, changing the validation part each time (k-fold cross-validation).

- **Cross-validation benefits**  
  → No training data is wasted: every sample is used for training in some folds and for validation in one.  
  → More reliable validation.  
  → Especially useful for small datasets.

- **Hyperparameter search process**:
  - Loop over different sets of parameters.
  - Avoid checking all combinations unless training is fast.
  - Try likely good settings first, refine iteratively.
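
The k-fold procedure and the search loop described above can be combined in a short sketch. This is a minimal illustration in plain Python, not the book's code: `k_fold_indices`, `train_and_validate`, and `cross_validated_loss` are hypothetical names, and `train_and_validate` is a stand-in formula where real code would train a model on the training indices and return its loss on the validation indices. In practice the samples would usually be shuffled before splitting.

```python
import itertools


def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) once for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


def train_and_validate(train_idx, val_idx, learning_rate, n_neurons):
    # Hypothetical stand-in: real code would train on train_idx and
    # return the validation loss measured on val_idx.
    return (learning_rate - 0.01) ** 2 * 1e4 + (n_neurons - 64) ** 2 / 1e3


def cross_validated_loss(n_samples, k, **params):
    """Average the validation loss over all k folds."""
    losses = [train_and_validate(train_idx, val_idx, **params)
              for train_idx, val_idx in k_fold_indices(n_samples, k)]
    return sum(losses) / len(losses)


# A small, hand-picked set of likely good settings (assumed values)
learning_rates = [0.1, 0.01, 0.001]
neuron_counts = [32, 64, 128]

best_params, best_loss = None, float("inf")
for lr, n in itertools.product(learning_rates, neuron_counts):
    loss = cross_validated_loss(n_samples=10, k=5,
                                learning_rate=lr, n_neurons=n)
    if loss < best_loss:
        best_params = {"learning_rate": lr, "n_neurons": n}
        best_loss = loss

print(best_params, best_loss)
```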
---

## Chapter 13: Training data

### Summary of Data Preprocessing for Neural Networks

**Purpose of Preprocessing**
- Neural networks tend to train better when input values are scaled to a small range, typically -1 to 1 or 0 to 1.
- Centering the data around 0 helps keep the weights from being biased in one direction and improves training stability.
- Scaling also prevents issues such as floating-point overflow and helps control training dynamics.

**Preprocessing Requirements**
- The same preprocessing must be applied to:
  - Training data  
  - Validation data  
  - Testing data  
  - Prediction data  
- Use the **same scaler** for all datasets to ensure consistency.

**Example of Scaling**
- For image data with values from 0 to 255:
  - Divide by 255 → scales to range [0, 1]
  - Subtract 127.5 and divide by 127.5 → scales to range [-1, 1]
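
As a quick check of the two formulas above, here is a minimal NumPy sketch. The tiny `image` array is made-up sample data, and the variable names are illustrative only.

```python
import numpy as np

# Hypothetical 2x3 grayscale image with 8-bit pixel values
image = np.array([[0, 64, 128],
                  [191, 255, 32]], dtype=np.uint8)

# Convert to float first so the arithmetic is not carried out in uint8
pixels = image.astype(np.float32)

scaled_01 = pixels / 255.0             # range [0, 1]
scaled_pm1 = (pixels - 127.5) / 127.5  # range [-1, 1], centered on 0

print(scaled_01.min(), scaled_01.max())
print(scaled_pm1.min(), scaled_pm1.max())
```

The second form gives both the small range and the centering around 0 mentioned earlier.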

**Important Considerations**
- Slightly out-of-range values (just above 1 or below -1) are usually not problematic.
- Use only training data to determine scaling parameters to avoid data leakage.
- Save the scaler and apply it during future predictions.
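
These considerations can be sketched as follows, assuming one simple kind of scaler: dividing each feature by its largest absolute value in the training data. The arrays, the `scale` helper, and the use of JSON for saving are all illustrative assumptions, not a prescribed method.

```python
import json

import numpy as np

# Made-up sample data: rows are samples, columns are features
X_train = np.array([[2.0, -50.0],
                    [4.0, 100.0],
                    [-8.0, 25.0]])
X_test = np.array([[1.0, 120.0]])

# The scaling parameter is computed from the training data only
max_abs = np.abs(X_train).max(axis=0)


def scale(X, max_abs):
    """Apply the same training-derived scaling to any dataset."""
    return X / max_abs


X_train_scaled = scale(X_train, max_abs)  # lies within [-1, 1]
X_test_scaled = scale(X_test, max_abs)    # may land slightly outside [-1, 1]

# Save the scaler parameters so they can be reused at prediction time
saved = json.dumps({"max_abs": max_abs.tolist()})
restored = np.array(json.loads(saved)["max_abs"])

print(X_test_scaled)
print(scale(X_test, restored))
```

Here the second test feature exceeds the training maximum, so its scaled value is a little above 1, which is the mildly out-of-range case noted in the list.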

**Data Augmentation**
- Used when training data is limited.
- Common methods: rotation, cropping, flipping (especially for images).
- Augmentations should reflect real-world variations relevant to the task.
- Inappropriate augmentation (e.g., rotating road signs for a road-sign model, where rotated signs rarely occur in practice) can harm performance.
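
A minimal NumPy sketch of two of these methods, random horizontal flipping and random cropping, is shown below. The `augment` function, the `crop` parameter, and the sample `image` are hypothetical, and whether flipping is appropriate depends on the task, as noted above.

```python
import numpy as np

rng = np.random.default_rng(0)


def augment(image, rng, crop=2):
    """Return a randomly flipped and randomly shifted copy of a 2D image."""
    out = image
    # Flip left-right with probability 0.5
    if rng.random() < 0.5:
        out = np.fliplr(out)
    # Random crop: pad the border, then cut a window of the original
    # size at a random offset within the padded image
    padded = np.pad(out, crop, mode="edge")
    top = rng.integers(0, 2 * crop + 1)
    left = rng.integers(0, 2 * crop + 1)
    height, width = image.shape
    return padded[top:top + height, left:left + width]


# Made-up 6x6 "image" used only to demonstrate the shapes
image = np.arange(36, dtype=np.float32).reshape(6, 6)
augmented = [augment(image, rng) for _ in range(4)]

print([a.shape for a in augmented])
```

Each call produces a new variant with the same shape as the original, so the augmented samples can be added to the training set without changing the model's input size.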
