
wbsg-uni-mannheim/wdc-sotab


sotab-benchmark

This repository contains the code for recreating the Schema.org Table Annotation Benchmark (SOTAB).

Schema.org Table Corpus

SOTAB is built from the Schema.org Table Corpus. To run the code that creates SOTAB, download all zip files from the top100 and minimum3 subsets of the Schema.org Table Corpus and place them in the directory data/stc_zip_files/.
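Before running the notebooks, it can help to confirm that the corpus files are in place. The following Python sketch is not part of the repository. It assumes the zip files sit directly in data/stc_zip_files/ (no subfolders) and only checks that at least one `.zip` file is present; it does not verify that both the top100 and minimum3 subsets are complete. The names `STC_ZIP_DIR` and `find_stc_zips` are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical constant; assumption: the zip files are stored directly in this folder.
STC_ZIP_DIR = Path("data/stc_zip_files")


def find_stc_zips(directory: Path = STC_ZIP_DIR) -> list[Path]:
    """Return the .zip files found directly in `directory`, sorted by name.

    Raises FileNotFoundError if the directory does not exist and
    ValueError if it exists but contains no zip files.
    """
    if not directory.is_dir():
        raise FileNotFoundError(f"Expected corpus directory {directory} is missing")
    zips = sorted(p for p in directory.iterdir() if p.is_file() and p.suffix == ".zip")
    if not zips:
        raise ValueError(f"No .zip files found in {directory}")
    return zips


if __name__ == "__main__":
    # List the detected corpus archives so a missing subset is easy to spot.
    for path in find_stc_zips():
        print(path.name)
```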

Run download.sh to download the processed datasets for the VizNet corpus. The script also creates the data directory.

$ bash download.sh

SOTAB creation

To create the SOTAB datasets for Column Type Annotation (CTA) and Column Property Annotation (CPA), run the notebooks in the following order:

  1. Language Detection
  2. MatchColumnNamesToSchema.org
  3. Expand properties-CreateTables
  4. AnnotatingTables
  5. TableSelection-CPA
  6. Different-Formats-CPA
  7. RandomColumns-CPA
  8. TableSelection-CTA
  9. CreatingSplits-CPA
  10. CreatingSplits-CTA
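The notebooks can also be executed non-interactively in the order listed above. The sketch below is a hypothetical helper, not part of the repository. It assumes each notebook is saved as `<name>.ipynb` in the repository root (`NOTEBOOK_DIR` is an assumption; adjust it and the file names to match the actual repository) and that Jupyter with `nbconvert` is installed. It calls `jupyter nbconvert --to notebook --execute --inplace` on each notebook and stops at the first failure, so later steps never run on incomplete intermediate data.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Notebook names in the order given in this README.
# Assumption: each one is stored as "<name>.ipynb" in NOTEBOOK_DIR.
NOTEBOOK_ORDER = [
    "Language Detection",
    "MatchColumnNamesToSchema.org",
    "Expand properties-CreateTables",
    "AnnotatingTables",
    "TableSelection-CPA",
    "Different-Formats-CPA",
    "RandomColumns-CPA",
    "TableSelection-CTA",
    "CreatingSplits-CPA",
    "CreatingSplits-CTA",
]
NOTEBOOK_DIR = Path(".")  # hypothetical location of the notebooks


def build_commands(
    names: list[str] = NOTEBOOK_ORDER, directory: Path = NOTEBOOK_DIR
) -> list[list[str]]:
    """Build one `jupyter nbconvert --execute` command per notebook, in order."""
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace",
         str(directory / f"{name}.ipynb")]
        for name in names
    ]


def run_pipeline(dry_run: bool = False) -> None:
    """Execute the notebooks sequentially, stopping at the first failure."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a notebook fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline()
```

Call `run_pipeline(dry_run=True)` to print the commands without executing anything. Executing with `--inplace` overwrites the outputs stored in each notebook; without it, nbconvert writes a separate `<name>.nbconvert.ipynb` copy instead.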
