test: test whether CV is effective #649

Closed
wants to merge 10 commits
add pseudocode for CV
WinstonLiyt committed Mar 4, 2025
commit bfaa601e5f0a208c2d00ce1df8c114fddaced58e
21 changes: 21 additions & 0 deletions rdagent/components/coder/data_science/raw_data_loader/prompts.yaml
@@ -273,6 +273,27 @@ spec:
- The dataset returned by `load_data` is not pre-split. After calling `feat_eng`, split the data into training and test sets.
- [Notice] Apply cross-validation (e.g. KFold) on the training set (`X_transformed`, `y_transformed`) to ensure a reliable assessment of model performance.
- Keep the test set (`X_test_transformed`) unchanged, as it is only used for generating the final predictions.
- Pseudocode logic for reference:
```
Set number of splits and initialize KFold cross-validator.

Create dictionaries for validation and test predictions.

For each model file:
Import the model dynamically.
Initialize arrays for out-of-fold (OOF) and test predictions.

For each fold in KFold:
Split data into training and validation sets.
Run model workflow to get validation and test predictions.
Validate shapes.
Store validation and test predictions.

Compute average test predictions across folds.
Save OOF and averaged test predictions.

Ensemble predictions from all models and print the final shape.
```
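The pseudocode above can be sketched as runnable Python. This is a minimal illustration, not the repository's actual implementation: it assumes scikit-learn and NumPy, and it stands in a simple `Ridge` model for the dynamically imported model workflow. All variable names here (`oof`, `test_preds`, etc.) are illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

# Toy stand-in data; in the real workflow these come from load_data/feat_eng.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)
X_test = rng.normal(size=(20, 5))

# Set number of splits and initialize the KFold cross-validator.
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# Arrays for out-of-fold (OOF) and per-fold test predictions.
oof = np.zeros(len(y))
test_preds = np.zeros((kf.get_n_splits(), len(X_test)))

for fold, (train_idx, val_idx) in enumerate(kf.split(X)):
    # Split data into training and validation sets, then fit the model
    # (a Ridge regressor stands in for the model workflow here).
    model = Ridge().fit(X[train_idx], y[train_idx])

    # Get validation and test predictions, and validate shapes.
    val_pred = model.predict(X[val_idx])
    assert val_pred.shape == y[val_idx].shape

    # Store validation and test predictions.
    oof[val_idx] = val_pred
    test_preds[fold] = model.predict(X_test)

# Average test predictions across folds; X_test itself was never touched.
test_pred_avg = test_preds.mean(axis=0)
print(oof.shape, test_pred_avg.shape)
```

In a multi-model setup, this loop would run once per model file, with the per-model `oof` and `test_pred_avg` arrays then ensembled (e.g. averaged) into the final predictions.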

4. Submission File:
- Save the final predictions as `submission.csv`, ensuring the format matches the competition requirements (refer to `sample_submission` in the Folder Description for the correct structure).
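A minimal sketch of the submission step, assuming pandas and a hypothetical `sample_submission` with `id` and `target` columns (the real column names come from the competition's sample file):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the competition's sample_submission file.
sample_submission = pd.DataFrame({"id": range(5), "target": 0.0})
final_preds = np.arange(5, dtype=float)  # e.g. the ensembled test predictions

# Copy the sample so the column names and row order match exactly,
# then overwrite only the prediction column.
submission = sample_submission.copy()
submission["target"] = final_preds
submission.to_csv("submission.csv", index=False)
```

Reusing the sample file as a template is the simplest way to guarantee the required format.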