test: test whether CV is effective #649

Closed
wants to merge 10 commits into from
test whether CV is effective
WinstonLiyt committed Feb 27, 2025
commit 4dd24a102ecb7cbc21379c5bdfec7c8a51d5f772
Original file line number Diff line number Diff line change
@@ -271,7 +271,7 @@ spec:

3. Dataset Splitting
- The dataset returned by `load_data` is not pre-split. After calling `feat_eng`, split the data into training and test sets.
- - If feasible, apply cross-validation on the training set (`X_transformed`, `y_transformed`) to ensure a reliable assessment of model performance.
+ - [Notice] Apply cross-validation on the training set (`X_transformed`, `y_transformed`) to ensure a reliable assessment of model performance.
- Keep the test set (`X_test_transformed`) unchanged, as it is only used for generating the final predictions.

4. Submission File:
Original file line number Diff line number Diff line change
@@ -109,6 +109,7 @@ workflow_eval:
[Note]
1. The individual components (data loading, feature engineering, model tuning, etc.) have already been evaluated by the user. You should only evaluate and improve the workflow code, unless there are critical issues in the components.
2. Model performance is NOT a concern in this evaluation—only correct execution and formatting matter.
+ 3. As long as the execution does not exceed the time limit, ensure that the code uses cross-validation to split the training data and train the model. If cross-validation is not used, mention it in the execution section and set `final_decision` to `false`.

## Evaluation Criteria
You will be given the workflow execution output (`stdout`) to determine correctness.
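
For reference, the Dataset Splitting rule changed in this commit (cross-validate on the training set, leave the test set untouched) could look roughly like the sketch below. It is a minimal illustration using only the standard library, not the project's actual workflow code: `k_fold_indices` and `cross_validate_mean_baseline` are hypothetical helpers, the data are toy stand-ins for `y_transformed` and `X_test_transformed`, and a predict-the-mean baseline stands in for a real model.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, n_splits=5, seed=0):
    """Yield (train_idx, val_idx) pairs that partition range(n_samples)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [
        n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
        for i in range(n_splits)
    ]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


def cross_validate_mean_baseline(y, n_splits=5, seed=0):
    """Return the per-fold MSE of a baseline that predicts the training mean."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(y), n_splits, seed):
        prediction = mean(y[i] for i in train_idx)
        scores.append(mean((y[i] - prediction) ** 2 for i in val_idx))
    return scores


# Toy stand-ins (assumptions) for the spec's y_transformed and X_test_transformed.
y_transformed = [float(i % 7) for i in range(23)]
X_test_transformed = [[0.5], [1.5], [2.5]]

# Only the training targets are split into folds; the test set is never touched here.
fold_scores = cross_validate_mean_baseline(y_transformed, n_splits=5)
print(f"CV MSE per fold: {[round(s, 3) for s in fold_scores]}")
print(f"Mean CV MSE: {mean(fold_scores):.3f}")
```

Printing the per-fold scores to `stdout` also gives the workflow evaluator something concrete to check when deciding whether cross-validation was actually used.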