Implement new API for handling malformed rows #66

Closed
vincentlaucsb opened this issue Dec 13, 2019 · 1 comment

@vincentlaucsb
Owner

Background

Currently, CSVReader rejects every row whose length differs from that of the predetermined header row. This causes issues when parsing CSV files which are not quite up to spec.

Although it is possible to handle weird rows by creating a subclass of CSVReader and overriding CSVReader::bad_row_handler, that's kind of annoying.

Solution

CSVFormat will get a new method called allow_variable_lengths(false). CSVReader will then simply not perform row length checking until read_row() is called. This may even lead to performance improvements as the nested if/else branches in CSVReader::write_record will no longer be necessary.

For the default case (reject different length rows), CSVReader will behave as it has before, i.e. bad rows are tossed out and ignored with no user intervention.
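
To make the proposal concrete, here is a minimal, self-contained sketch of the intended behavior. It does not use the library's real classes: SketchFormat, Row, and accept_rows are hypothetical stand-ins invented for illustration, and only the name allow_variable_lengths comes from this issue.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for CSVFormat; only the proposed setter is modeled.
class SketchFormat {
public:
    // Chainable setter named after the method proposed in this issue.
    SketchFormat& allow_variable_lengths(bool allow = false) {
        allow_variable_ = allow;
        return *this;
    }

    bool variable_lengths_allowed() const { return allow_variable_; }

private:
    bool allow_variable_ = false;  // default: reject rows of a different length
};

using Row = std::vector<std::string>;

// Hypothetical stand-in for the reader's row-acceptance step.
// With the default format, rows whose length differs from the header are dropped;
// with variable lengths allowed, every parsed row is kept.
std::vector<Row> accept_rows(const std::vector<Row>& parsed,
                             std::size_t header_size,
                             const SketchFormat& format) {
    std::vector<Row> kept;
    for (const Row& row : parsed) {
        if (format.variable_lengths_allowed() || row.size() == header_size) {
            kept.push_back(row);
        }
    }
    return kept;
}
```

With a three-column header, a default SketchFormat would keep only the three-field rows, while one configured with allow_variable_lengths(true) would keep shorter and longer rows as well.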

Behavior for Variable Length Rows

If a user wants to keep rows of different lengths but still use CSVReader's format guessing ability, then the library will provide a size() method (and potentially others such as is_weird_length(), is_shorter(), etc.) on each row, so that the user can tell which rows are malformed while iterating over the rows that were read.
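
As a rough illustration of how such helpers might be used, the sketch below models a row that knows how many columns the header has. LengthAwareRow, its expected_columns field, and malformed_row_indices are hypothetical names made up for this example; only size(), is_weird_length(), and is_shorter() are taken from the text above, and their exact semantics here are an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical row type: the fields of one row plus the header's column count.
struct LengthAwareRow {
    std::vector<std::string> fields;
    std::size_t expected_columns;

    // Number of fields actually present in this row.
    std::size_t size() const { return fields.size(); }

    // Assumed meaning: the row has a different number of fields than the header.
    bool is_weird_length() const { return size() != expected_columns; }

    // Assumed meaning: the row has fewer fields than the header.
    bool is_shorter() const { return size() < expected_columns; }
};

// Collects the positions of rows whose length differs from the header's.
std::vector<std::size_t> malformed_row_indices(const std::vector<LengthAwareRow>& rows) {
    std::vector<std::size_t> indices;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        if (rows[i].is_weird_length()) {
            indices.push_back(i);
        }
    }
    return indices;
}
```

A caller could use the returned indices to log, repair, or skip the malformed rows instead of having them silently discarded.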

Indexing Operator

If "foobar" is the name of the 16th column, and some malformed row has <16 columns, then row["foobar"] shall result in an error being thrown.

If a CSV mostly has 16 columns but some row has >16 columns, then the extra columns should only be retrieved using operator[](size_t) and not operator[](string). The CSVRow iterator should iterate through all entries of shorter and longer rows without crashing.
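
The indexing rules above can be sketched with a small stand-alone class. IndexedRow is a hypothetical type, not the library's CSVRow, and the choice of std::out_of_range as the error is an assumption, since the issue only says that an error is thrown.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row type illustrating the proposed indexing rules.
class IndexedRow {
public:
    IndexedRow(std::vector<std::string> col_names, std::vector<std::string> fields)
        : col_names_(std::move(col_names)), fields_(std::move(fields)) {}

    // Positional access reaches every field, including extras beyond the header.
    // at() throws std::out_of_range for positions the row does not have.
    const std::string& operator[](std::size_t i) const { return fields_.at(i); }

    // Named access only works for header columns that this row actually has.
    const std::string& operator[](const std::string& name) const {
        for (std::size_t i = 0; i < col_names_.size(); ++i) {
            if (col_names_[i] == name) {
                if (i >= fields_.size()) {
                    // Short row: the column exists in the header but not in this row.
                    throw std::out_of_range("row has no value for column: " + name);
                }
                return fields_[i];
            }
        }
        throw std::out_of_range("unknown column: " + name);
    }

    std::size_t size() const { return fields_.size(); }

    // Iteration covers exactly the fields present, for short and long rows alike.
    std::vector<std::string>::const_iterator begin() const { return fields_.begin(); }
    std::vector<std::string>::const_iterator end() const { return fields_.end(); }

private:
    std::vector<std::string> col_names_;
    std::vector<std::string> fields_;
};
```

In this sketch, extra fields have no name, so they are reachable only by position or by iteration, which matches the behavior described above.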

@vincentlaucsb
Owner Author

Implemented by #80
