💡 In the world of data science, tables are one of the most fundamental data structures. Understanding and extracting meaningful insights from tabular data is crucial across various domains such as finance, healthcare, and marketing. This repository aims to be a comprehensive collection of resources, research papers, tools, and tutorials focused on Table Understanding.
✨ Awesome-Table-Understanding is a curated list of resources dedicated to the field of Table Understanding.
🔥 This project is currently under development. Feel free to ⭐ (STAR) and 🔭 (WATCH) it to stay updated on the latest developments.
If you notice any papers missing from the list, please feel free to email me or submit a pull request. I will gladly add them! Additionally, if you find any miscategorized items, please let me know.
- [SIGMOD'23] Table Discovery in Data Lakes: State-of-the-art and Future Direction.
- [ACL'23] Transformers for Tabular Data Representation: A Survey of Models and Applications.
- [TWEB'24] DaCo: Matching Tabular Data to Knowledge Graph with Effective Core Column Set Discovery.
- [arXiv'24] HGT: Leveraging Heterogeneous Graph-enhanced Large Language Models for Few-shot Complex Table Understanding.
- [VLDB'24] ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models. [code]
- [VLDB'24] Observatory: Characterizing Embeddings of Relational Tables. [code] ⭐ [Must Read]
- [VLDB'24] Chorus: Foundation Models for Unified Data Discovery and Exploration.
- [VLDB'24 TaDA] ALT-GEN: Benchmarking Table Union Search using Large Language Models.
- [NAACL'24] TableLlama: Towards Open Large Generalist Models for Tables.
- [ICDE'24] KGLink: A Column Type Annotation Method that Combines Knowledge Graph and Pre-trained Language Model. [code]
- [SIGMOD'24] Watchog: A Light-weight Contrastive Learning based Framework for Column Annotation. [code]
- [SIGMOD'24] Table-GPT: Table Fine-tuned GPT for Diverse Table Tasks. [code]
- [arXiv'23] AdaTyper: Adaptive Semantic Column Type Detection.
- [NeurIPS'23] HYTREL: Hypergraph-enhanced Tabular Data Representation Learning. [code]
- [VLDB'23] RECA: Related Tables Enhanced Column Semantic Type Annotation Framework. [code]
- [VLDB'23] DeepJoin: Joinable Table Discovery with Pre-trained Language Models. [code]
- [VLDB'23] Starmie: Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning. [code]
- [SIGMOD'23] Steered Training Data Generation for Learned Semantic Type Detection.
- [SIGMOD'23] SANTOS: Relationship-based Semantic Table Union Search. [code]
- [ICDE'23] Towards Explainable Table Interpretation Using Multi-view Explanations. [code]
- [ACL'23 Findings] Automatic Table Union Search with Tabular Representation Learning.
- [SIGMOD'22] DODUO: Annotating Columns with Pre-trained Language Models. [code] ⭐ [Must Read]
- [NAACL'21] TABBIE: Pretrained Representations of Tabular Data. [code]
- [WWW'21] TCN: Table Convolutional Network for Web Table Interpretation.
- [VLDB'20] Sato: Contextual Semantic Type Detection in Tables. [code] ⭐ [Must Read]
- [VLDB'20] TURL: Table Understanding through Representation Learning. [code] ⭐ [Must Read]
- [KDD'19] Sherlock: A Deep Learning Approach to Semantic Data Type Detection.
- WikiTables-TURL: Table Understanding through Representation Learning. [website]
- GitTables: A Large-Scale Corpus of Relational Tables. [paper] [website]
- SOTAB: Web Data Commons - Schema.org Table Annotation Benchmark. [website]
- 2nd International Workshop on Tabular Data Analysis (TaDA) [website]