This repository provides a method for automatically generating effective database descriptions when explicit descriptions are unavailable. The proposed method employs a dual-process approach: a coarse-to-fine process, followed by a fine-to-coarse process. Experimental results on the Bird benchmark indicate that using descriptions generated by the proposed improves SQL generation accuracy by 0.93% compared to not using descriptions, and achieves 37% of human-level performance. We support three common database dialects: SQLite, MySQL and PostgreSQL.
Read more: Arxiv
- python >= 3.9
You can install the required packages with the following command:
pip install -r requirements.txt
- Create a database connection.
Connect to SQLite:
import os
from sqlalchemy import create_engine
db_path = "path_to_sqlite"
abs_path = os.path.abspath(db_path)
db_engine = create_engine(f'sqlite:///{abs_path}')
- Set llama-index LLM.
Take dashscope as an example:
from llama_index.llms.dashscope import DashScope, DashScopeGenerationModels
dashscope_llm = DashScope(model_name=DashScopeGenerationModels.QWEN_PLUS, api_key='YOUR API KEY HERE.')
- Generate the database description and build M-Schema.
from schema_engine import SchemaEngine
db_name = 'your_db_name'
comment_mode = 'generation'
schema_engine_instance = SchemaEngine(db_engine, llm=dashscope_llm, db_name=db_name,
comment_mode=comment_mode)
schema_engine_instance.fields_category()
schema_engine_instance.table_and_column_desc_generation()
mschema = schema_engine_instance.mschema
mschema.save(f'./{db_name}.json')
mschema_str = mschema.to_mschema()
print(mschema_str)
If you find our work helpful, feel free to give us a cite.
@article{description_generation,
title={Automatic database description generation for Text-to-SQL},
author={Yingqi Gao and Zhiling Luo},
year={2025},
eprint={2502.20657},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2502.20657},
}
@article{xiyansql,
title={A Preview of XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL},
author={Yingqi Gao and Yifu Liu and Xiaoxia Li and Xiaorong Shi and Yin Zhu and Yiming Wang and Shiqi Li and Wei Li and Yuntao Hong and Zhiling Luo and Jinyang Gao and Liyu Mou and Yu Li},
year={2024},
journal={arXiv preprint arXiv:2411.08599},
url={https://arxiv.org/abs/2411.08599},
primaryClass={cs.AI}
}