Background
Currently, CSVReader rejects every row whose length differs from that of the predetermined header row. This causes issues when parsing CSV files that are not quite up to spec.
Although it is possible to handle weird rows by creating a subclass of CSVReader and overriding CSVReader::bad_row_handler, that's kind of annoying.
Solution
CSVFormat will get a new method called allow_variable_lengths(false), which stays off by default. When it is turned on, CSVReader will simply not perform row length checking until read_row() is called. This may even lead to performance improvements, as the nested if/else branches in CSVReader::write_record will no longer be necessary.
For the default case (reject different length rows), CSVReader will behave as it has before, i.e. bad rows are tossed out and ignored with no user intervention.
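As a rough illustration of the two modes, here is a minimal standalone sketch. It does not use the library itself: SketchFormat, variable_lengths_allowed() and keep_row() are hypothetical names, and the bool parameter on allow_variable_lengths() is an assumption about how the proposed setter might look.

```cpp
#include <cstddef>

// Hypothetical stand-in for CSVFormat; only the proposed flag is modeled.
class SketchFormat {
public:
    // Chainable setter. The bool parameter is an assumption; the false
    // default mirrors the "reject different length rows" default case.
    SketchFormat& allow_variable_lengths(bool allow) {
        allow_ = allow;
        return *this;
    }

    bool variable_lengths_allowed() const { return allow_; }

private:
    bool allow_ = false;
};

// Decide whether a parsed row is kept, given how many fields it has
// (n_fields) and how many columns the header has (n_cols).
bool keep_row(const SketchFormat& fmt, std::size_t n_fields, std::size_t n_cols) {
    // Default: rows that do not match the header are dropped.
    // With the flag on: every row is kept, whatever its length.
    return fmt.variable_lengths_allowed() || n_fields == n_cols;
}
```

With a default-constructed SketchFormat, keep_row(fmt, 2, 3) is false; after fmt.allow_variable_lengths(true) it is true.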
Behavior for Variable Length Rows
If a user wants to keep rows of different lengths but still use CSVReader's format guessing ability, then the library will provide a size() method on each row (and potentially others such as is_weird_length(), is_shorter(), etc.) so that the user can tell which rows are malformed while iterating over the rows that were read.
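One way user code could pick out malformed rows from size() alone is sketched below. Row is a hypothetical stand-in rather than the library's row type, and the free-function form and signatures of is_shorter(), is_longer() and is_weird_length() are assumptions; the proposal only floats the names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed row; not the library's CSVRow.
struct Row {
    std::vector<std::string> fields;
    std::size_t size() const { return fields.size(); }
};

// Helpers named after the ones floated in the proposal (is_longer is an
// extra added for symmetry). "expected" is the header's column count.
bool is_shorter(const Row& row, std::size_t expected) { return row.size() < expected; }
bool is_longer(const Row& row, std::size_t expected) { return row.size() > expected; }
bool is_weird_length(const Row& row, std::size_t expected) { return row.size() != expected; }

// Return the positions of all rows whose length differs from the header's.
std::vector<std::size_t> malformed_rows(const std::vector<Row>& rows, std::size_t n_cols) {
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        if (is_weird_length(rows[i], n_cols)) out.push_back(i);
    }
    return out;
}
```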
Indexing Operator
If "foobar" is the name of the 16th column, and some malformed row has <16 columns, then row["foobar"] shall result in an error being thrown.
If a CSV mostly has 16 columns but some row has >16 columns, then the extra columns should only be retrieved using operator[](size_t) and not operator[](string). The CSVRow iterator should iterate through all entries of shorter and longer rows without crashing.
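The lookup rules above can be modeled with a small standalone class. This is only a sketch under stated assumptions: SketchRow is a hypothetical type, not the library's CSVRow, and std::out_of_range is an assumed choice of exception, since the proposal does not say which error is thrown.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the proposed lookup rules; not the library's CSVRow.
class SketchRow {
public:
    SketchRow(std::vector<std::string> col_names, std::vector<std::string> fields)
        : col_names_(std::move(col_names)), fields_(std::move(fields)) {}

    std::size_t size() const { return fields_.size(); }

    // Positional access reaches every field, including extras past the header.
    const std::string& operator[](std::size_t i) const { return fields_.at(i); }

    // Name-based access only covers header columns, and throws when this
    // row is too short to contain the requested column.
    const std::string& operator[](const std::string& name) const {
        for (std::size_t i = 0; i < col_names_.size(); ++i) {
            if (col_names_[i] == name) {
                if (i >= fields_.size())
                    throw std::out_of_range("row has no field for column " + name);
                return fields_[i];
            }
        }
        throw std::out_of_range("unknown column " + name);
    }

    // Iteration walks whatever fields the row actually has.
    std::vector<std::string>::const_iterator begin() const { return fields_.begin(); }
    std::vector<std::string>::const_iterator end() const { return fields_.end(); }

private:
    std::vector<std::string> col_names_;
    std::vector<std::string> fields_;
};
```

In this model, a row with fewer fields than the header throws on row["foobar"] when "foobar" is a trailing column, while a row with extra fields exposes them only through the size_t overload and through iteration.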