A method and corresponding code for automatic description generation for Text-to-SQL

XGenerationLab/XiYan-DBDescGen

Automatic database description generation for Text-to-SQL

Important Links

🤖 Arxiv | 📖 XiYan-SQL

Introduction

This repository provides a method for automatically generating effective database descriptions when explicit descriptions are unavailable. The method uses a dual-process approach: a coarse-to-fine process followed by a fine-to-coarse process. Experimental results on the Bird benchmark indicate that using descriptions generated by the proposed method improves SQL generation accuracy by 0.93% compared to not using descriptions, and achieves 37% of human-level performance. We support three common database dialects: SQLite, MySQL, and PostgreSQL.

Read more: Arxiv

Requirements

  • Python >= 3.9

You can install the required packages with the following command:

pip install -r requirements.txt

Quick Start

  1. Create a database connection.

Connect to SQLite:

import os
from sqlalchemy import create_engine

db_path = "path_to_sqlite"
abs_path = os.path.abspath(db_path)
db_engine = create_engine(f'sqlite:///{abs_path}')
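
The step above shows only SQLite, while MySQL and PostgreSQL are also listed as supported. As a minimal sketch, the helper below builds a SQLAlchemy-style connection URL for each of the three dialects. The function name `build_db_url`, the choice of the `pymysql` and `psycopg2` drivers, and the default ports are assumptions made for illustration; they are not taken from this repository, and the drivers must be installed separately.

```python
import os
from urllib.parse import quote_plus


def build_db_url(dialect, database, user="", password="", host="localhost", port=None):
    """Return a SQLAlchemy-style connection URL (hypothetical helper)."""
    if dialect == "sqlite":
        # SQLite creates an empty database file if the path does not exist,
        # so fail early instead of describing an empty database.
        abs_path = os.path.abspath(database)
        if not os.path.isfile(abs_path):
            raise FileNotFoundError(abs_path)
        return f"sqlite:///{abs_path}"

    # Driver names and default ports are assumptions, not repository settings.
    drivers = {
        "mysql": ("mysql+pymysql", 3306),
        "postgresql": ("postgresql+psycopg2", 5432),
    }
    if dialect not in drivers:
        raise ValueError(f"unsupported dialect: {dialect}")
    scheme, default_port = drivers[dialect]

    # URL-encode credentials so characters such as '@' or '/' do not break the URL.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    return f"{scheme}://{credentials}@{host}:{port or default_port}/{database}"
```

The returned string can be passed to `create_engine` in place of the SQLite URL, for example `create_engine(build_db_url("mysql", "your_db_name", "user", "password"))`.
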

  2. Set llama-index LLM.

Take dashscope as an example:

from llama_index.llms.dashscope import DashScope, DashScopeGenerationModels
dashscope_llm = DashScope(model_name=DashScopeGenerationModels.QWEN_PLUS, api_key='YOUR API KEY HERE.')

  3. Generate the database description and build M-Schema.

from schema_engine import SchemaEngine

db_name = 'your_db_name'
comment_mode = 'generation'
schema_engine_instance = SchemaEngine(db_engine, llm=dashscope_llm, db_name=db_name,
                                      comment_mode=comment_mode)
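# Categorize the fields, then generate the table and column descriptions.
schema_engine_instance.fields_category()
schema_engine_instance.table_and_column_desc_generation()

# Save the resulting M-Schema as JSON and render it as a string.
mschema = schema_engine_instance.mschema
mschema.save(f'./{db_name}.json')
mschema_str = mschema.to_mschema()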
print(mschema_str)
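
Once `mschema_str` is available, it can be placed in the prompt of a Text-to-SQL model. The sketch below shows one way to do that. The function `build_text_to_sql_prompt` and the prompt wording are hypothetical, written only for illustration; they are not the prompt used by XiYan-SQL or by this repository.

```python
def build_text_to_sql_prompt(mschema_str, question, dialect="SQLite"):
    """Combine a schema description and a question into one prompt (hypothetical template)."""
    return (
        f"You are a {dialect} expert. Use the database schema below, "
        "including its table and column descriptions, to answer the question.\n\n"
        f"[Schema]\n{mschema_str.strip()}\n\n"
        f"[Question]\n{question.strip()}\n\n"
        f"Return a single {dialect} query and nothing else."
    )
```

The resulting string could then be sent to the LLM configured in step 2, for example with `dashscope_llm.complete(prompt).text`, assuming the standard llama-index completion interface.
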

Citation

If you find our work helpful, please consider citing it.

@article{description_generation,
      title={Automatic database description generation for Text-to-SQL}, 
      author={Yingqi Gao and Zhiling Luo},
      year={2025},
      eprint={2502.20657},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2502.20657}, 
}

@article{xiyansql,
      title={A Preview of XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL}, 
      author={Yingqi Gao and Yifu Liu and Xiaoxia Li and Xiaorong Shi and Yin Zhu and Yiming Wang and Shiqi Li and Wei Li and Yuntao Hong and Zhiling Luo and Jinyang Gao and Liyu Mou and Yu Li},
      year={2024},
      journal={arXiv preprint arXiv:2411.08599},
      url={https://arxiv.org/abs/2411.08599},
      primaryClass={cs.AI}
}
