trl-lab/open-domain-query-classification

Classifications of Open-Domain Queries for Tabular Data Analysis

This repository contains the code, prompts, and data used to build and evaluate the classifiers presented in the paper "Are We Asking the Right Questions? On Ambiguity in Natural Language Queries for Tabular Data Analysis". The work was accepted to the AI for Tabular Data workshop at EurIPS 2025.

Usage

Setup

First, install the required dependencies. The repository ships dependency configurations for both pip and uv.

Install requirements with pip:

pip install -r requirements.txt

Or sync the uv project:

uv sync

The classifiers call the OpenAI API, so make sure your API key is set in your environment:

export OPENAI_API_KEY="your_openai_api_key"
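
Before launching the notebooks, it can help to confirm that the key is actually visible to the Python process. The following is a minimal sketch and not part of the repository: the helper name `require_openai_key` is hypothetical, and it only checks that the variable is non-empty, not that the key is valid.

```python
import os


def require_openai_key(env=None):
    """Return the OpenAI API key from the environment or raise a clear error.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    # Allow a custom mapping to be passed in (useful for testing);
    # fall back to the real process environment otherwise.
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the notebooks."
        )
    return key
```

Calling `require_openai_key()` in the first cell of a notebook would fail fast with a readable message instead of surfacing an authentication error midway through a classification run.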

Running Classifiers

The classifiers can be run step by step in their respective Jupyter notebooks, which are located in the root directory.

Analysis and Visualization

The results of the classifiers are analyzed in the notebook 03_Analysis.ipynb.

Organization of the Repository

Code for reproducing results: Find the main code in the Jupyter notebooks in the root directory.

Shared Code: Find the shared code for processing, classification, and prompt management in the common/ directory.

Data: Find the input, development, and output data in the data/ directory.

Prompts: Find the prompt templates used for classification in the prompts/ directory. Its history subdirectories contain the full history of the prompt engineering process.
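
As a quick sanity check after cloning, the layout described above can be verified from Python. This is a hedged sketch rather than repository code: the directory names come from the list above, while the function names `missing_dirs` and `list_notebooks` are hypothetical.

```python
from pathlib import Path

# Top-level directories named in the repository organization above.
EXPECTED_DIRS = ("common", "data", "prompts")


def missing_dirs(repo_root="."):
    """Return the expected top-level directories that are absent under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


def list_notebooks(repo_root="."):
    """Return the file names of Jupyter notebooks in the repository root, sorted."""
    return sorted(p.name for p in Path(repo_root).glob("*.ipynb"))
```

Run from the repository root, `missing_dirs()` should return an empty list for a complete checkout, and `list_notebooks()` shows which notebooks are available to run.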

Citation

If you find this work useful in your research, please consider citing the following paper:

@inproceedings{gommAreWeAsking2025,
  title = {Are {{We Asking}} the {{Right Questions}}? {{On Ambiguity}} in {{Natural Language Queries}} for {{Tabular Data Analysis}}},
  shorttitle = {Are {{We Asking}} the {{Right Questions}}?},
  booktitle = {{{AI}} for {{Tabular Data}} Workshop at {{EurIPS}} 2025},
  author = {Gomm, Daniel and Wolff, Cornelius and Hulsebos, Madelon},
  year = 2025,
  url = {https://arxiv.org/abs/2511.04584}
}
