ws-garcia/CSVsniffer

CSVsniffer

Companion repository for the paper:

Detecting CSV File Dialects by Table Uniformity Measurement and Data Type Inference (PDF)

by W. García.

An application of the new methodology outlined in the paper can be found in the CSV interface repository.

Introduction

The results of the research can be reproduced by running the RunTests method in the macro-enabled Excel workbook CSVsniffer.xlsm. To review the CleverCSV results, run the scripts in the clevercsv_test.py file. The text files with the output results are stored in the Current research and cleverCSV folders.

Data

The CSV folder contains the files copied from the Pollock framework and other collected test files. The dataset used for the CSV wrangling research is also available, in the CSV_Wranglin folder. Note that in this last case only links to the files can be provided, because the authors hold the copyright.

The expected configuration for each set of CSV files tested is saved in the Dialect_annotations.txt and Manual_dialect_annotation.txt files.
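As a rough illustration of what checking a detected dialect against an expected configuration can look like, here is a minimal Python sketch. It is not the method from the paper, nor the CleverCSV test script: it uses the `csv.Sniffer` class from the Python standard library as a stand-in detector. The layout of the annotation files is not described in this README, so the expected dialect is hard-coded, and the names `ExpectedDialect`, `detect_dialect`, and `matches` are hypothetical.

```python
import csv
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpectedDialect:
    """Hypothetical container for one annotated dialect (assumption:
    only the delimiter and quote character are compared here)."""
    delimiter: str
    quotechar: str


def detect_dialect(sample: str) -> ExpectedDialect:
    """Detect the dialect of a text sample with the standard library sniffer.

    csv.Sniffer is a stand-in detector, not the paper's methodology.
    """
    sniffed = csv.Sniffer().sniff(sample, delimiters=",;\t|")
    return ExpectedDialect(sniffed.delimiter, sniffed.quotechar)


def matches(sample: str, expected: ExpectedDialect) -> bool:
    """Return True when the detected dialect equals the expected one."""
    return detect_dialect(sample) == expected


# Illustrative data only; a real run would read a file from the CSV folder.
sample = 'id;name;city\n1;"Ana";"Lima"\n2;"Luis";"Quito"\n'
expected = ExpectedDialect(delimiter=";", quotechar='"')
print(matches(sample, expected))
```

A full evaluation would repeat this comparison for every annotated file and report the share of files whose detected dialect agrees with the annotation.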

Requirements

Below are the requirements for reproducing the experiments.

  • Microsoft Office Excel.
  • CleverCSV and all its dependencies.

About

Robust CSV dialect detection methodology for VBA that outperforms existing state-of-the-art solutions by roughly 10%.
