This repository contains the code, prompts, and data used to build and evaluate the classifiers presented in the paper "Are We Asking the Right Questions? On Ambiguity in Natural Language Queries for Tabular Data Analysis". The work was accepted at the AI for Tabular Data Workshop at EurIPS 2025.
First, install the required dependencies. The repository provides requirements configurations that can be installed with pip or synced with uv.
Install the requirements with pip:

    pip install -r requirements.txt

Or sync the uv project:
    uv sync

Since the classifiers use the OpenAI API, make sure your API key is set in your environment:

    export OPENAI_API_KEY="your_openai_api_key"

The classifiers can then be run step by step in the respective notebooks:
01_DataPrivilegeClassification.ipynb: Classifies whether a query requires privileged data access.
02_QuerySpecificationClassification.ipynb: Classifies the specification of the query.
The results of the classifiers are analyzed in the notebook 03_Analysis.ipynb.
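For orientation, the sketch below shows the general pattern the notebooks follow: a prompt template is filled with a natural language query and sent to the OpenAI API, and the model's response is used as the classification. The model name, template filename, and placeholder are illustrative assumptions only; the actual templates live in prompts/ and the shared helpers in common/.

    # Minimal sketch of the classification pattern; the notebooks and the
    # helpers in common/ are the authoritative implementation.
    from pathlib import Path

    from openai import OpenAI  # picks up OPENAI_API_KEY from the environment

    client = OpenAI()

    # Hypothetical template file and placeholder; see prompts/ for the real templates.
    template = Path("prompts/example_template.txt").read_text()
    query = "What was the average revenue per region last quarter?"

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice, not the paper's configuration
        messages=[{"role": "user", "content": template.format(query=query)}],
    )
    print(response.choices[0].message.content)  # the predicted label / explanation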
Code for reproducing results: Find the main code in the Jupyter notebooks in the root directory.
Shared Code: Find the shared code for processing, classification, and prompt management in the common/
directory.
Data: Find the input, development, and output data in the data/ directory.
Prompts: Find the prompt templates used for classification in the prompts/ directory. In this
directory you can also find a full history of the prompt engineering process in the history subdirectories.
If you find this work useful in your research, please consider citing the following paper:
@inproceedings{gommAreWeAsking2025,
title = {Are {{We Asking}} the {{Right Questions}}? {{On Ambiguity}} in {{Natural Language Queries}} for {{Tabular Data Analysis}}},
shorttitle = {Are {{We Asking}} the {{Right Questions}}?},
booktitle = {{{AI}} for {{Tabular Data}} Workshop at {{EurIPS}} 2025},
author = {Gomm, Daniel and Wolff, Cornelius and Hulsebos, Madelon},
year = 2025,
url = {https://arxiv.org/abs/2511.04584}
}