Implement new API for handling malformed rows #66

vincentlaucsb · 2019-12-13T00:45:30Z

Background

Currently, CSVReader rejects every row whose length differs from that of the predetermined header row. This causes issues when parsing CSV files which are not quite up to spec.

Although it is possible to handle weird rows by creating a subclass of CSVReader and overriding CSVReader::bad_row_handler, that's kind of annoying.

Solution

CSVFormat will get a new method called allow_variable_lengths(false). CSVReader will then simply not perform row length checking until read_row() is called. This may even lead to performance improvements as the nested if/else branches in CSVReader::write_record will no longer be necessary.

For the default case (reject different length rows), CSVReader will behave as it has before, i.e. bad rows are tossed out and ignored with no user intervention.
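To make the two modes concrete, here is a minimal sketch of the opt-in length check. Only the method name allow_variable_lengths comes from the proposal above; FormatSketch, filter_rows, and the Row alias are hypothetical stand-ins written for illustration, not the library's CSVFormat or CSVReader.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Hypothetical stand-in for CSVFormat; only the proposed flag is modeled.
class FormatSketch {
public:
    FormatSketch& allow_variable_lengths(bool allow = false) {
        allow_variable_ = allow;
        return *this;
    }
    bool variable_lengths_allowed() const { return allow_variable_; }

private:
    bool allow_variable_ = false;  // default: reject rows of a different length
};

// Returns the rows a reader would hand to the user under the given format.
// n_cols is the length of the header row.
std::vector<Row> filter_rows(const FormatSketch& format,
                             std::size_t n_cols,
                             const std::vector<Row>& parsed) {
    std::vector<Row> kept;
    for (const Row& row : parsed) {
        // With the flag set, the length check is skipped entirely.
        if (format.variable_lengths_allowed() || row.size() == n_cols) {
            kept.push_back(row);
        }
    }
    return kept;
}
```

With the default format, short and long rows are dropped silently; after allow_variable_lengths(true), every parsed row is kept and it is up to the caller to inspect row lengths.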

Behavior for Variable Length Rows

If a user wants to keep rows of different lengths but still use CSVReader's format guessing ability, then the library will provide a size() method on each row (and potentially others such as is_weird_length(), is_shorter(), etc.) so that the user can tell which rows are malformed while iterating over the read rows.
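As a rough illustration of how those helpers might behave, the sketch below models a row that knows the header length. The method names size(), is_weird_length(), and is_shorter() are taken from the proposal; RowInfoSketch, its expected field, and count_malformed are assumptions made for this example only.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed row; not the library's CSVRow.
struct RowInfoSketch {
    std::vector<std::string> fields;
    std::size_t expected;  // number of columns in the header row (assumed field)

    std::size_t size() const { return fields.size(); }
    bool is_weird_length() const { return size() != expected; }
    bool is_shorter() const { return size() < expected; }
};

// Counts the rows whose length differs from the header length.
std::size_t count_malformed(const std::vector<RowInfoSketch>& rows) {
    std::size_t n = 0;
    for (const RowInfoSketch& row : rows) {
        if (row.is_weird_length()) {
            ++n;
        }
    }
    return n;
}
```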

Indexing Operator

If "foobar" is the name of the 16th column, and some malformed row has <16 columns, then row["foobar"] shall result in an error being thrown.

If a CSV mostly has 16 columns but some row has >16 columns, then the extra columns should only be retrieved using operator[](size_t) and not operator[](string). The CSVRow iterator should iterate through all entries of shorter and longer rows without crashing.
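The indexing rules above could look roughly like the following sketch. RowSketch is a hypothetical stand-in for CSVRow, and the choice of std::out_of_range as the thrown error is an assumption; the proposal only says that an error is thrown.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for CSVRow; not the library's real class.
class RowSketch {
public:
    typedef std::vector<std::string>::const_iterator const_iterator;

    RowSketch(std::vector<std::string> col_names, std::vector<std::string> fields)
        : col_names_(std::move(col_names)), fields_(std::move(fields)) {}

    std::size_t size() const { return fields_.size(); }

    // Positional access reaches every field, including extras beyond the header.
    const std::string& operator[](std::size_t i) const {
        return fields_.at(i);  // at() throws std::out_of_range on a bad index
    }

    // Named access only covers header columns and throws if the row is too short.
    const std::string& operator[](const std::string& name) const {
        for (std::size_t i = 0; i < col_names_.size(); ++i) {
            if (col_names_[i] == name) {
                if (i >= fields_.size()) {
                    throw std::out_of_range("row has no field for column: " + name);
                }
                return fields_[i];
            }
        }
        throw std::out_of_range("unknown column: " + name);
    }

    // Iteration covers exactly the fields present, whether the row is short or long.
    const_iterator begin() const { return fields_.begin(); }
    const_iterator end() const { return fields_.end(); }

private:
    std::vector<std::string> col_names_;
    std::vector<std::string> fields_;
};
```

In this sketch, a field past the last header column has no name, so it can only be reached by position or by iterating over the row.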

vincentlaucsb · 2020-03-12T06:37:18Z

Implemented by #80

vincentlaucsb added the enhancement label Dec 13, 2019

vincentlaucsb self-assigned this Dec 13, 2019

vincentlaucsb added this to the Bug Fixes & Features milestone Dec 13, 2019

vincentlaucsb mentioned this issue Dec 13, 2019

try catch not working in CSVReader object if strict parsing is there #62

Closed

bangusi mentioned this issue Jan 24, 2020

This csv file crashes the parser #71

Closed

vincentlaucsb closed this as completed Mar 12, 2020
