
xiaoboCBSR/DVL


DVL: Decisive Vector Learning for Column Annotation

DVL is a deep-learning approach to column annotation with noisy labels. It trains DNN models on noisily labeled tables and applies the trained model to annotate the columns of unseen tables with semantic types such as name or address. Column annotation is useful for data security, data cleaning, schema matching, data discovery and data governance. This repository provides the data and source code needed to use DVL and to replicate the results in the paper ().
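Noisy labels in experiments like these are usually produced by corrupting clean labels synthetically. The checkpoint name DVL_pairflip_0.45_0.2.pt used in the evaluation command suggests pair-flip noise, and we assume 0.45 is the noise rate (we do not interpret the 0.2). As a hedged illustration only, not DVL's own code, the sketch below implements the standard pair-flip scheme from the noisy-label literature: each true label y is replaced by (y + 1) mod C with probability noise_rate. The function name pairflip_noise and its parameters are hypothetical; C = 78 is assumed from the type78 results directory.

```python
import random
from typing import List, Sequence


def pairflip_noise(labels: Sequence[int], num_classes: int,
                   noise_rate: float, seed: int = 0) -> List[int]:
    """Hypothetical helper: pair-flip label noise.

    Each label y in [0, num_classes) is flipped to (y + 1) % num_classes
    with probability noise_rate and kept unchanged otherwise.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    rng = random.Random(seed)  # seeded so a noisy split can be reproduced
    noisy = []
    for y in labels:
        if not 0 <= y < num_classes:
            raise ValueError(f"label {y} outside [0, {num_classes})")
        # rng.random() is in [0, 1), so noise_rate=1.0 flips every label
        noisy.append((y + 1) % num_classes if rng.random() < noise_rate else y)
    return noisy
```

For example, pairflip_noise(train_labels, 78, 0.45) corrupts roughly 45% of the labels. Because every corrupted label moves to the same neighbouring class, a rate of 0.45 leaves the true class only slightly more frequent (55%) than its neighbour (45%).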

Dependencies and Installation

  1. Install dependencies using pip install -r requirements.txt.
  2. Download the pre-extracted distributed representations of WebTables provided by SATO [http://sato-data.s3.amazonaws.com/tmp.zip] and extract the feature files into ./features.
  3. To train a model, run train_test_DVL.py with the path to a config file:
python train_test_DVL.py -c=./configs/sherlock+LDA.txt
  4. To evaluate a trained model:
python train_test_DVL.py -c=./configs/sherlock+LDA.txt --multi_col_only=False --mode=eval --model_list=./results/type78/sherlock+LDA/DVL_pairflip_0.45_0.2.pt
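Step 2 can also be scripted. The following is a minimal sketch that is not part of the repository: the function names extract_archive and fetch_sato_features are hypothetical, and it assumes the archive's contents can be unpacked directly into ./features. If the archive contains a top-level folder, check the paths your config expects and move the files accordingly.

```python
import os
import urllib.request
import zipfile
from typing import List

# URL taken from the installation steps; archive layout is not verified here.
SATO_FEATURES_URL = "http://sato-data.s3.amazonaws.com/tmp.zip"


def extract_archive(zip_path: str, dest: str = "./features") -> List[str]:
    """Unpack zip_path into dest (created if missing); return member names."""
    os.makedirs(dest, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        return zf.namelist()


def fetch_sato_features(dest: str = "./features",
                        zip_path: str = "tmp.zip") -> List[str]:
    """Download the SATO feature archive unless zip_path already exists,
    then extract it into dest."""
    if not os.path.exists(zip_path):
        urllib.request.urlretrieve(SATO_FEATURES_URL, zip_path)
    return extract_archive(zip_path, dest)
```

Calling fetch_sato_features() once from the repository root, then running the training command from step 3, reproduces the manual steps; reusing an existing tmp.zip avoids downloading the archive again.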

Contact

To get help with problems using DVL or replicating our results, please submit a GitHub issue.

Acknowledgement

The code is based on SATO [https://github.com/megagonlabs/sato].
