#### 1. **Reading CSVs and Feature Extraction**
- Reading the features CSV: O(n·d), where n = number of samples, d = number of features per sample.
- Column selection and checking: O(d).

#### 2. **Shapiro-Wilk Normality Test**
- Outer loop: 10 classes (constant, call it c).
- Inner loop: d features.
- For each (class, feature) pair, `shapiro` runs on the n_c samples of that class (n_c ≤ n). With k = n/c the average class size, the loops make c·d calls on about k samples each, so the work is **O(d·c·k)** = O(n·d), up to the log k factor from the sort inside each call.

#### 3. **Kruskal-Wallis Test**
- Loop over d features.
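- For each, group by class (O(n)), then `kruskal` ranks all n values together and compares the c groups (each ~k samples).
- Complexity per feature: O(n) to create groups plus O(n log n) to rank, so the total is O(n·d·log n), i.e. O(n·d) up to a logarithmic factor.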
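
To make the per-feature cost concrete, here is a minimal pure-Python sketch of the Kruskal-Wallis H statistic for a single feature. It is a simplified stand-in for `scipy.stats.kruskal`, not the script's own code: the function name `kruskal_h` is hypothetical and the tie correction factor is deliberately left out. The sort is the O(n log n) step; the grouping is one O(n) pass.

```python
from collections import defaultdict


def kruskal_h(values, labels):
    """Kruskal-Wallis H statistic for one feature (no tie correction factor)."""
    n = len(values)
    # Ranking all n values together is the O(n log n) step.
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Tied values share the average of the ranks they span.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    # Grouping the ranks by class label is a single O(n) pass.
    rank_sums = defaultdict(float)
    counts = defaultdict(int)
    for rank, label in zip(ranks, labels):
        rank_sums[label] += rank
        counts[label] += 1
    weighted = sum(rank_sums[g] ** 2 / counts[g] for g in counts)
    return 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)
```

Calling this once per feature column gives the d outer iterations described above.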

#### 4. **Category Means and Overall Means**
- Groupby and mean over n samples and d features: O(n·d).

#### 5. **Scatter Matrices (Within- and Between-class)**
- Within-class: For each class (c), difference and dot products over d features for each of n samples: O(n·d^2).
- Between-class: For each class, difference and dot products over d: O(c·d^2).
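
The two costs can be read directly off a NumPy sketch of the scatter matrices. This is an illustration under assumptions, not the script's code: the function name `scatter_matrices` is hypothetical, and weighting the between-class term by the class size is one common convention that the original script may or may not follow.

```python
import numpy as np


def scatter_matrices(X, y):
    """Return (S_w, S_b) for samples X of shape (n, d) and labels y of shape (n,)."""
    d = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for label in np.unique(y):
        X_c = X[y == label]
        mean_c = X_c.mean(axis=0)
        centered = X_c - mean_c
        # (d x n_c) @ (n_c x d) costs n_c * d^2; summed over classes this is n * d^2.
        S_w += centered.T @ centered
        diff = (mean_c - overall_mean).reshape(-1, 1)
        # One d x d outer product per class: c * d^2 in total.
        # Weighting by the class size is an assumption of this sketch.
        S_b += X_c.shape[0] * (diff @ diff.T)
    return S_w, S_b
```

With this weighting, S_w + S_b equals the total scatter of X around the overall mean, which is a convenient sanity check.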

#### 6. **Fisher's Discriminant Ratio**
- Matrix inversion of d×d matrix: O(d^3).
- Matrix multiplication: O(d^3).
- For small d (few features), this is fast; for large d, this can dominate.
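
As a sketch of where the O(d^3) comes from, the snippet below computes one common form of the criterion, trace(S_w⁻¹ S_b). Whether the script uses exactly this form is an assumption; the function name `fisher_ratio` and the small `ridge` term (added so a singular S_w does not raise) are also hypothetical. Solving the linear system has the same cubic cost as an explicit inverse followed by a matrix product.

```python
import numpy as np


def fisher_ratio(S_w, S_b, ridge=1e-8):
    """trace(S_w^{-1} S_b) for d x d scatter matrices S_w and S_b."""
    d = S_w.shape[0]
    # The ridge term is an assumption of this sketch; it keeps S_w invertible.
    regularized = S_w + ridge * np.eye(d)
    # Solving S_w Z = S_b is O(d^3), the same order as inverting S_w.
    Z = np.linalg.solve(regularized, S_b)
    return float(np.trace(Z))
```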

#### 7. **AUC Calculation**
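- For each class (c) and feature (d): computes AUC over n samples. Each AUC sorts the n scores, which adds a log n factor.
- O(c·d·n) up to that log factor; with c constant this is O(n·d).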
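
The c·d loop structure can be sketched with a rank-based AUC (the Mann-Whitney U formulation). This is an illustration rather than the script's code: the names `rank_auc` and `one_vs_rest_auc` are hypothetical, a one-vs-rest setup with each raw feature used as the score is assumed, and tied scores are not handled.

```python
import numpy as np


def rank_auc(scores, positive):
    """AUC of `scores` against a boolean `positive` mask; assumes no tied scores."""
    scores = np.asarray(scores, dtype=float)
    positive = np.asarray(positive, dtype=bool)
    n_pos = int(positive.sum())
    n_neg = positive.size - n_pos
    # argsort is the O(n log n) step; ranks start at 1.
    ranks = np.empty(scores.size)
    ranks[np.argsort(scores)] = np.arange(1, scores.size + 1)
    # Mann-Whitney U statistic of the positive class.
    u = ranks[positive].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def one_vs_rest_auc(X, y):
    """Map each class label to a list of d per-feature AUCs (c * d calls)."""
    result = {}
    for label in np.unique(y):
        mask = y == label
        result[label.item()] = [rank_auc(X[:, j], mask) for j in range(X.shape[1])]
    return result
```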

#### 8. **Filtering Classes with <2 Samples**
- Value counts and masking: O(n).

#### 9. **Train/Test Split and Random Forest**
- `train_test_split`: O(n).
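- Training RandomForestClassifier: let t be the number of trees, m the number of training samples, f the number of features considered at each split, and h the tree depth. For roughly balanced trees h ≈ log m and each tree costs about O(f·m·log m), so the total is O(t·f·m·log m).
- Prediction: O(t·m_test·h) ≈ O(t·m_test·log m), where m_test is the test set size. Each test sample follows one root-to-leaf path per tree, so f does not appear.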

#### 10. **Metrics, Report, DataFrame Export**
- All metrics: O(n).
- Exporting CSVs: O(n·d).

#### 11. **KDE and Boxplots**
- For each feature (d), plotting KDE/boxplot:
    - Seaborn's `kdeplot` evaluates the density on a fixed-size grid, so it is O(n) for that feature when the grid size is treated as a constant.
    - So, O(n·d) for all plots.
- Saving plots: O(1) per plot.

#### **Overall Complexity**
- **O(n·d^2 + d^3 + t·f·m·log m)** (the O(n·d) steps are absorbed by the n·d^2 scatter-matrix term), where
    - n: samples
    - d: features
    - t: trees (random forest)
    - f: features considered per split (sqrt(d) by default for `RandomForestClassifier`, or d)
    - m: train set size

- For small d (common in feature extraction), the linear passes over the samples and the forest training dominate the runtime.
- For large d (high-dimensional data), Fisher's discriminant and scatter matrices can dominate.
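
A quick back-of-the-envelope check of these two regimes, as a sketch: the helper names `cost_terms` and `dominant` are hypothetical, constants and lower-order log factors are ignored, and the parameter values in the usage note are made-up illustrations rather than measurements.

```python
import math


def cost_terms(n, d, t, f, m):
    """Rough operation counts for the main terms; constant factors are ignored."""
    return {
        "linear passes (n*d)": n * d,
        "scatter matrices (n*d^2)": n * d ** 2,
        "Fisher ratio (d^3)": d ** 3,
        "forest training (t*f*m*log2 m)": t * f * m * math.log2(m),
    }


def dominant(terms):
    """Return the name of the largest term."""
    return max(terms, key=terms.get)
```

For example, `dominant(cost_terms(n=10_000, d=20, t=100, f=4, m=8_000))` picks forest training, while `dominant(cost_terms(n=10_000, d=5_000, t=100, f=70, m=8_000))` picks the scatter matrices.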

#### **Summary Table**

| Step                   | Big O             |
|------------------------|-------------------|
| CSV Reading            | O(n·d)            |
| Shapiro/Kruskal        | O(n·d)            |
| Means/Groupby          | O(n·d)            |
| Scatter Matrices       | O(n·d^2)          |
| Fisher Ratio           | O(d^3)            |
| AUC                    | O(n·d)            |
| Filtering              | O(n)              |
| Train/Test Split       | O(n)              |
| Random Forest Train    | O(t·f·m·log m)    |
| Metrics/Export         | O(n·d)            |
| KDE/Boxplots           | O(n·d)            |

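- **Dominant terms for practical cases** (small d): O(n·d) or O(t·f·m·log m). Logarithmic factors from sorting (Kruskal-Wallis ranking, AUC) are omitted from the table.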

