camel-ai/backtranslated-tir


Agent-Distilled Math Reasoning (TIR+CoT) Dataset

This dataset contains mathematical problems paired with both tool-integrated reasoning (TIR) traces and corresponding chain-of-thought (CoT) traces, distilled via agent-based pipelines. It is designed for fine-tuning large language models on step-by-step mathematical reasoning and tool-augmented problem solving.

Training Data

We generate SFT data from multiple sources to ensure diverse and challenging coverage across mathematical domains. The initial training set consolidates examples from several established benchmarks, as collected by ToRL (Li et al., 2025), including:

  • NuminaMATH (Jia et al., 2024)
  • MATH (Hendrycks et al., 2021)
  • DeepScaleR (Luo et al., 2025)

To mitigate data leakage, we further clean the corpus by filtering out any training example whose question text shares a 10-gram subsequence with any question in our test sets. This deduplication step ensures a fair and reliable assessment of generalization performance. In total, we collected 25,000 math problems. After filtering TIR traces with the Solver Agent, we obtain 11.6k TIR traces, with an overall accuracy of around 46%.
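
The 10-gram deduplication described above can be sketched as follows. This is an illustrative implementation, not the exact pipeline code: the function names (`ngrams`, `filter_leaked`) and the word-level tokenization are assumptions.

```python
def ngrams(text, n=10):
    """Return the set of word-level n-grams in a question string.

    Word-level splitting is an assumption; the actual pipeline may
    tokenize differently.
    """
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def filter_leaked(train_questions, test_questions, n=10):
    """Drop training questions that share any n-gram with a test question."""
    test_grams = set()
    for q in test_questions:
        test_grams |= ngrams(q, n)
    return [q for q in train_questions
            if ngrams(q, n).isdisjoint(test_grams)]
```

Questions shorter than 10 tokens contain no 10-grams and therefore always pass the filter; in practice such fragments are rare for full problem statements.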

Data Format

The dataset is split into two files: train_part1.jsonl and train_part2.jsonl. Each entry in these files is a JSON object with at least the following fields:

  • problem: The text of the math problem statement.
  • TIR trace: The tool-integrated reasoning (TIR) trace generated by the Solver Agent. This field contains the step-by-step, interleaved plan, tool calls, and intermediate reasoning as executed by the agent using external tools.
  • CoT trace: The solution trace, rephrased from the corresponding TIR trace by the Rephrase Agent. This field provides clear, step-by-step reasoning and is suitable for use as the target in supervised fine-tuning.
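
The two JSONL shards can be loaded with a few lines of standard-library Python. This is a minimal sketch; the helper name `load_dataset` is an assumption, while the file names and field names come from this README.

```python
import json


def load_dataset(paths):
    """Load dataset entries from JSONL shards, one JSON object per line.

    Expected usage: load_dataset(["train_part1.jsonl", "train_part2.jsonl"]).
    """
    entries = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines defensively
                    entries.append(json.loads(line))
    return entries
```

Each returned dict should expose at least the `problem`, `TIR trace`, and `CoT trace` keys described above.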

Intended Use

  • Supervised Fine-Tuning: The problem field can be used as the model input and CoT trace as the model output for training models to solve math problems with detailed reasoning.
  • Evaluation: The dataset can also be used to benchmark reasoning and solution generation capabilities of language models.
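
For the supervised fine-tuning use case, entries can be mapped to prompt/completion pairs. The sketch below is one plausible formatting, assuming a simple instruction-style prompt template; the template wording is an illustration, not part of the dataset specification.

```python
def to_sft_examples(entries,
                    prompt_template="Solve the following problem step by step.\n\n{problem}"):
    """Convert dataset entries into prompt/completion pairs for SFT.

    The prompt template is a hypothetical choice; any chat or instruction
    format used by the target model would work equally well. The input is
    the `problem` field and the target is the `CoT trace` field.
    """
    return [
        {
            "prompt": prompt_template.format(problem=e["problem"]),
            "completion": e["CoT trace"],
        }
        for e in entries
    ]
```

Models trained for tool use could instead target the `TIR trace` field, though its interleaved tool calls may require format-specific handling.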

Example Entry

{
  "problem": "For each positive number x, let f(x) = ((x + 1/x)^6 - (x^6 + 1/x^6) - 2) / ((x + 1/x)^3 + (x^3 + 1/x^3)). Find the minimum value of f(x).",
  "TIR trace": "<message>\nStep-by-step plan:\n\n1. Simplify the expression: Use algebraic manipulation to rewrite the numerator and denominator in terms of a simpler variable, such as t = x + 1/x.\n2. Express powers in terms of t: Use the binomial theorem or known identities to express (x + 1/x)^n and x^n + 1/x^n in terms of t.\n3. Rewrite f(x) as a function of t: After substitution, f(x) becomes a rational function f(t).\n4. Find the minimum value of f(t): Use calculus (derivative) or algebraic methods to find critical points of f(t) in the domain t ≥ 2.\n\n...\n\nFinal answer: The minimum value is 6.\n</message>",
  "CoT trace": "Step 1: Introduce the substitution t = x + 1/x. Step 2: Express powers in terms of t using identities. Step 3: Rewrite f(x) in terms of t and simplify. Step 4: Analyze f(t) and find the minimum. Final answer: 6."
}
