DVL: Decisive Vector Learning for Column Annotation

DVL is a deep-learning approach to column annotation with noisy labels: it trains DNN models on noisily labeled data and applies the trained model to annotate the columns of unseen tables with column labels such as name and address. Column annotation is useful for data security, data cleaning, schema matching, data discovery, and data governance. This repository provides the data and source code needed to use DVL and to replicate the results in the paper ().
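The evaluation command in this README loads a checkpoint named DVL_pairflip_0.45_0.2.pt, which suggests experiments with synthetic pair-flip label noise at rate 0.45 (the meaning of the trailing 0.2 is not stated here). Assuming the common pair-flip scheme, where each true label y is replaced by the next class (y + 1) mod C with probability r, noisy training labels of that kind could be produced as below. This is an illustrative sketch, not the repository's own noise-injection code: the helper pairflip_noise is hypothetical, and the 78-class toy data only borrows the count hinted at by the results/type78 path.

```python
import random
from typing import List


def pairflip_noise(labels: List[int], num_classes: int, rate: float, seed: int = 0) -> List[int]:
    """Hypothetical helper (not part of the DVL repo): apply pair-flip label noise.

    Each label y is replaced by (y + 1) % num_classes with probability `rate`
    and kept otherwise. Requiring rate < 0.5 keeps the true class the most
    likely observed label for every class.
    """
    if not 0.0 <= rate < 0.5:
        raise ValueError("rate must lie in [0, 0.5)")
    rng = random.Random(seed)  # seeded so the corruption is reproducible
    return [(y + 1) % num_classes if rng.random() < rate else y for y in labels]


# Toy data (made up): 78 classes, 100 columns per class.
clean = [t for t in range(78) for _ in range(100)]
noisy = pairflip_noise(clean, num_classes=78, rate=0.45)
flipped = sum(a != b for a, b in zip(clean, noisy)) / len(clean)
print(f"fraction of flipped labels: {flipped:.3f}")  # close to the requested 0.45
```

Because every corrupted label moves to one specific neighbouring class, pair-flip noise is harder than uniform noise at the same rate: at 0.45 the wrong class is nearly as frequent as the true one.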

Dependencies and Installation

Install the dependencies:

pip install -r requirements.txt

Download the extracted distributed representations of WebTables from SATO [http://sato-data.s3.amazonaws.com/tmp.zip] and put the extracted feature files in ./features.

To train a model, run train_test_DVL.py with the path to a config file:

python train_test_DVL.py -c=./configs/sherlock+LDA.txt
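The feature download step can also be scripted, for example to run it once before the first training run. This is a minimal sketch using only the Python standard library; fetch_features and extract_features are hypothetical helpers, not part of the repository. It assumes the archive's contents should be unpacked directly into ./features; the internal layout of tmp.zip is not documented here, so check the result.

```python
import urllib.request
import zipfile
from pathlib import Path
from typing import List

# URL taken from this README; the helpers below are hypothetical, not part of DVL.
FEATURES_URL = "http://sato-data.s3.amazonaws.com/tmp.zip"


def extract_features(zip_path: Path, dest: Path = Path("./features")) -> List[str]:
    """Unpack the feature archive into `dest` and return the archive member names."""
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        return zf.namelist()


def fetch_features(dest: Path = Path("./features"), url: str = FEATURES_URL) -> List[str]:
    """Download the archive next to `dest` (skipped if already present), then unpack it."""
    archive = dest.parent / "tmp.zip"
    if not archive.exists():
        urllib.request.urlretrieve(url, archive)
    return extract_features(archive, dest)
```

Run fetch_features() from the repository root. If the archive unpacks into a subfolder, move the feature files so they sit directly under ./features.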

To evaluate a trained model, run the same script in eval mode and pass the checkpoint via --model_list:

python train_test_DVL.py -c=./configs/sherlock+LDA.txt --multi_col_only=False --mode=eval --model_list=./results/type78/sherlock+LDA/DVL_pairflip_0.45_0.2.pt
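The training and evaluation commands differ only in their flags, so a small wrapper can build both in one place, for example when evaluating several checkpoints. This is a hypothetical convenience sketch: dvl_command and run are not part of the repository, and only the flags that appear in the commands above (-c, --multi_col_only, --mode, --model_list) are emitted. Values are passed through as strings exactly as in the README; how train_test_DVL.py interprets them (for example the literal False) is up to the script.

```python
import subprocess
import sys
from typing import List, Optional


def dvl_command(config: str, mode: Optional[str] = None,
                model_list: Optional[str] = None,
                multi_col_only: Optional[bool] = None) -> List[str]:
    """Hypothetical helper: build a train_test_DVL.py invocation.

    Flags follow the order and --flag=value style of the documented commands.
    """
    cmd = [sys.executable, "train_test_DVL.py", f"-c={config}"]
    if multi_col_only is not None:
        cmd.append(f"--multi_col_only={multi_col_only}")  # renders as True/False
    if mode is not None:
        cmd.append(f"--mode={mode}")
    if model_list is not None:
        cmd.append(f"--model_list={model_list}")
    return cmd


def run(cmd: List[str]) -> None:
    """Execute a built command (call from the repository root); raise on a non-zero exit."""
    subprocess.run(cmd, check=True)


CONFIG = "./configs/sherlock+LDA.txt"
train_cmd = dvl_command(CONFIG)
eval_cmd = dvl_command(
    CONFIG, mode="eval", multi_col_only=False,
    model_list="./results/type78/sherlock+LDA/DVL_pairflip_0.45_0.2.pt",
)
```

From the repository root, run(train_cmd) followed by run(eval_cmd) reproduces the two commands above. Using sys.executable instead of a bare python ensures the script runs under the same interpreter (and virtual environment) as the wrapper.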

Contact

To get help with problems using DVL or replicating our results, please submit a GitHub issue.

Acknowledgement

The code is based on SATO [https://github.com/megagonlabs/sato].

Repository: xiaoboCBSR/DVL
