Companion repository for the paper "Detecting CSV File Dialects by Table Uniformity Measurement and Data Type Inference" (PDF) by W. García.
An application of the new methodology outlined in the paper can be found in the CSV interface repository.
The results of the research can be reproduced by running the RunTests method from the macro-enabled Excel workbook CSVsniffer.xlsm. To review the results for CleverCSV, run the scripts in the clevercsv_test.py file. The text files with the output results are stored in the Current research and cleverCSV folders.
The CSV folder contains the files copied from the Pollock framework and other collected test files. The dataset used for the CSV wrangling research is also available in the CSV_Wranglin folder. Note that, in this last case, only links to the files can be provided because the authors hold the copyright.
The expected configuration for each set of CSV files tested is saved in the Dialect_annotations.txt and Manual_dialect_annotation.txt files.
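To illustrate what checking a detected dialect against an expected configuration can look like, here is a minimal Python sketch. It uses the standard library's csv.Sniffer as a stand-in detector; it is neither the table uniformity method from the paper nor CleverCSV. The sample text, the EXPECTED dictionary, and the helper names detect_dialect and matches are hypothetical, and the sketch does not reproduce the actual format of the annotation files.

```python
import csv

# Hypothetical sample and expected dialect, for illustration only.
# The real annotation files in this repository may use a different format.
SAMPLE = "name;age;city\nAna;31;Lima\nLuis;27;Quito\n"
EXPECTED = {"delimiter": ";", "quotechar": '"'}


def detect_dialect(text, candidates=",;|\t"):
    """Detect the delimiter and quote character with csv.Sniffer.

    csv.Sniffer is only a stand-in here, not the method from the paper.
    """
    dialect = csv.Sniffer().sniff(text, delimiters=candidates)
    return {"delimiter": dialect.delimiter, "quotechar": dialect.quotechar}


def matches(detected, expected):
    """Return True when every expected field equals the detected one."""
    return all(detected.get(key) == value for key, value in expected.items())


detected = detect_dialect(SAMPLE)
print(detected, matches(detected, EXPECTED))
```

A detection counts as correct in this sketch only when every annotated field matches, which is a strict criterion chosen for simplicity.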
Below are the requirements for reproducing the experiments.
- Microsoft Office Excel.
- CleverCSV and all its dependencies.