refactor: back2onehypo #805

Draft · wants to merge 1 commit into base: main
82 changes: 82 additions & 0 deletions rdagent/scenarios/data_science/proposal/exp_gen/prompts_v2.yaml
@@ -212,3 +212,85 @@ output_format:
}
},
}

hypothesis_v2: |-
{
"component": "The component name that the hypothesis {% if pipeline %}mainly {% endif %}focuses on. Must be one of ('DataLoadSpec', 'FeatureEng', 'Model', 'Ensemble', 'Workflow').",
"hypothesis": "Start with [Ablate with Trial xx]. A concise, testable statement derived from previous experimental outcomes. Limit it to one or two sentences that clearly specify the expected change or improvement in the <component>'s performance.",
"reason": "A brief explanation, also in one or two sentences, outlining the rationale behind the hypothesis. It should reference specific trends or failures from past experiments and explain how the proposed approach may address these issues.",
}




hypo_task_gen:
system: |-
{% include "scenarios.data_science.share:scen.role" %}
The user is iteratively improving a Kaggle competition implementation through a trace of experiments, where each new experiment is modified from the current SOTA in the trace, not necessarily from its immediate predecessor.

You will be given a competition scenario, a description of the trace history, and the current SOTA implementation. You need to carefully analyze the past experiments and their corresponding results to determine which techniques are appropriate for the current competition context. Then, based on the current experiment, propose a new hypothesis (which must be grounded in previous shortcomings and designed to produce an ablation effect). Finally, formulate a detailed experimental task that aligns with the hypothesis and addresses the targeted areas for improvement.

## Step 1: Analyze Previous Experiments and Feedback
Carefully review prior experiments and their results; identify which techniques have been tried, which have failed, and which gaps remain.
Do not output this analysis explicitly.

## Step 2: Hypothesis Proposal
### Hypothesis Definition
A hypothesis is a precise, testable, and actionable statement that proposes a specific modification or improvement to address an identified problem in a Kaggle competition implementation. Be careful not to use vague phrases like "xx technique"—you must be specific and clearly state the exact method or approach being proposed.
{% if not pipeline %}
Each hypothesis should focus on one of the following 5 components of an implementation:
{{ component_desc }}
{% else %}
Although we don't require each hypothesis to focus on a single specific component, you should still respond with the main component that the hypothesis focuses on.
Candidate components are:
{{ component_desc }}
{% endif %}

### Hypothesis Specification
{{ hypothesis_spec }}

### Hypothesis Output Format
{{ hypothesis_output_format }}

## Step 3: Task Design
### Task Design Definition
Your first task is to generate a new solution based on the proposed hypothesis. The task description should be very detailed, with specific steps and instructions. The task should be specific and fine-grained, avoiding general or vague statements.

### Task Design Specification
{{ task_specification }}

### Task Design Guidelines
The task should be concise, with several steps, each only a few sentences long.
DO NOT repeat details that are already included in the SOTA code. If the SOTA code already covers a step, do not repeat it in detail.
DO NOT write any code in the task description!
Observe the reasons behind failed experiments and their feedback to avoid repeating similar mistakes in analogous situations.

### [Partial Response Format 1] Task Output Format:
{{ task_output_format }}

{% if workflow_check %}
## Step 4: Workflow Update
Since components have dependencies, your second task is to update the workflow to reflect the changes made to the target component. Please also decide whether the workflow needs to be updated and provide a brief description of the change task.
{{ component_desc }}
[Partial Response Format 2] Your generated workflow description should be plain text; a following agent will do the implementation. If you think the workflow should not be updated, just respond with "No update needed".
{% endif %}

Your final output should strictly adhere to the following JSON format.
{
"hypothesis": ---The dict corresponding to Hypothesis Output Format---,
"task_design": ---The dict corresponding to Task Output Format---,
{% if workflow_check %}"workflow_update": ---A string corresponding to workflow description--- {% endif %}
}

user: |-
# Scenario Description
{{ scenario_desc }}

# Previous Experiments and Feedbacks:
{{ exp_and_feedback_list_desc }}

# Current SOTA Implementation
{{ sota_exp_desc }}

# Feedback from Previous Failed Experiments (e.g., experiments that did not pass evaluation, encountered bugs, or failed to surpass SOTA performance):
{{ failed_exp_and_feedback_list_desc }}
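Both templates above instruct the model to return a single JSON object (a hypothesis dict plus a task design, and optionally a workflow update). A minimal sketch of how a caller might validate such a response before using it; the function name `parse_hypo_task_response` and the validation details are hypothetical illustrations, not part of the RD-Agent codebase:

```python
import json

# Allowed component names, mirroring the hypothesis_v2 schema above.
VALID_COMPONENTS = {"DataLoadSpec", "FeatureEng", "Model", "Ensemble", "Workflow"}


def parse_hypo_task_response(raw: str, workflow_check: bool = False) -> dict:
    """Parse and sanity-check the combined hypothesis/task JSON response.

    Hypothetical helper: key names mirror the "final output" schema in the
    hypo_task_gen system prompt; everything else is an assumption.
    """
    resp = json.loads(raw)
    required = {"hypothesis", "task_design"}
    if workflow_check:
        required.add("workflow_update")
    missing = required - resp.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    component = resp["hypothesis"].get("component")
    if component not in VALID_COMPONENTS:
        raise ValueError(f"invalid component: {component!r}")
    return resp
```

A malformed or incomplete model response then fails fast with a `ValueError` instead of propagating into task generation.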
23 changes: 23 additions & 0 deletions rdagent/scenarios/data_science/proposal/exp_gen/proposal.py
@@ -354,6 +354,8 @@ def task_gen(
task_spec = sota_exp.experiment_workspace.file_dict[component_info["spec_file"]]
else:
task_spec = T(f"scenarios.data_science.share:component_spec.{hypothesis.component}").r()


sys_prompt = T(".prompts_v2:task_gen.system").r(
targets=component_info["target_name"],
task_specification=task_spec,
@@ -430,6 +432,27 @@ def gen(self, trace: DSTrace, pipeline: bool = False) -> DSExperiment:
type="failed",
)

if pipeline:
component_info = COMPONENT_TASK_MAPPING["Pipeline"]
else:
component_info = COMPONENT_TASK_MAPPING.get(hypothesis.component)


sys_prompt = T(".prompts_v2:hypo_task_gen.system").r(
component_desc=component_desc,
hypothesis_spec=T(".prompts_v2:specification.hypothesis").r(),
hypothesis_output_format=T(".prompts_v2:output_format.hypothesis_v2").r(pipeline=pipeline),
task_specification=task_spec,
task_output_format=component_info["task_output_format"],
workflow_check=not pipeline and hypothesis.component != "Workflow",
)
user_prompt = T(".prompts_v2:hypo_task_gen.user").r(
scenario_desc=scenario_desc,
exp_and_feedback_list_desc=exp_feedback_list_desc,
sota_exp_desc=sota_exp_desc,
failed_exp_and_feedback_list_desc=failed_exp_feedback_list_desc,
)

# Step 1: Identify problems
scen_problems = self.identify_scenario_problem(
scenario_desc=scenario_desc,
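The `workflow_check=not pipeline and hypothesis.component != "Workflow"` argument in the hunk above gates the optional Step 4 of the prompt: a workflow update is only requested when a single component (other than the workflow itself) is being edited. The same rule isolated into a standalone helper for illustration; the function name is hypothetical:

```python
def needs_workflow_check(pipeline: bool, component: str) -> bool:
    # A workflow update prompt is only needed when editing a single
    # component (not the whole pipeline) and that component is not
    # itself "Workflow".
    return not pipeline and component != "Workflow"
```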