There are 3 commits in my branch with associated tests to handle various scenarios when the data values within a file contain carriage return characters.
The first caters for quoted data values that contain the row separator carriage return character (which previously resulted in a CSV::MalformedCSVError). When reading the lines of data, if the current line contains an odd number of quote characters, the content of the next line is appended to it.
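As a rough sketch of that idea (not the actual patch; `read_record` and its parameters are hypothetical names chosen for illustration), the stitching could look something like this, assuming a single-character quote character:

```ruby
require "stringio"

# Hypothetical helper, not part of the gem's API: reads one logical CSV
# record from io, joining physical lines while a quoted field is still open.
def read_record(io, row_sep: "\n", quote_char: '"')
  line = io.gets(row_sep)
  return nil if line.nil?

  # An odd number of quote characters means a quoted field has not been
  # closed yet, so the row separator just read was data, not a record end.
  # Escaped quotes ("") add two to the count and do not change the parity.
  while line.count(quote_char).odd?
    continuation = io.gets(row_sep)
    break if continuation.nil? # unterminated quote at end of file
    line << continuation
  end

  line.chomp(row_sep)
end

# Usage: a file that uses "\r" as row separator and has a multi-line note.
io = StringIO.new("id,notes\r1,\"first\rsecond\"\r2,plain\r")
records = []
while (record = read_record(io, row_sep: "\r"))
  records << record
end
# records now holds three records; the second keeps its embedded "\r".
```

Counting quotes keeps the reader simple, at the cost of assuming that quote characters only ever appear as field delimiters or doubled escapes.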
The second commit caters for carriage return characters within the data when guessing line endings, by ignoring any such characters that appear between quote characters. I made this change to resolve an issue whereby a file contained more carriage return characters inside quoted data than line ending characters, which caused the wrong line ending to be guessed.
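To illustrate the approach (again only a sketch under my own assumptions, not the code in the commit; `guess_row_sep` is a hypothetical name), the guess can be made by counting candidate line endings only while outside a quoted section:

```ruby
# Hypothetical helper, not the gem's API: guesses the row separator of a
# CSV sample by counting "\r", "\n" and "\r\n" outside quoted fields only.
def guess_row_sep(sample, quote_char: '"')
  counts = Hash.new(0)
  in_quotes = false
  previous = nil

  sample.each_char do |c|
    if c == quote_char
      # Each quote toggles the state; a doubled quote toggles twice.
      in_quotes = !in_quotes
    elsif !in_quotes
      if c == "\n" && previous == "\r"
        # The "\r" just counted was the first half of a "\r\n" pair.
        counts["\r"] -= 1
        counts["\r\n"] += 1
      elsif c == "\r" || c == "\n"
        counts[c] += 1
      end
    end
    previous = c
  end

  # Fall back to "\n" when the sample contains no line endings at all.
  best = counts.max_by { |_sep, n| n }
  best ? best.first : "\n"
end

# Two "\n" line endings, but three "\r" characters inside a quoted value.
sample = "a,b\n1,\"x\ry\rz\rw\"\n"
row_sep = guess_row_sep(sample)
```

With a naive count over the whole sample, the three quoted `"\r"` characters would outnumber the two real `"\n"` line endings; skipping quoted content avoids that.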
The third commit caters for double carriage return characters when removing empty values so that such values are not deemed to be empty.
Handle presence of row separator carriage return character within csv…
Cater for carriage return characters within csv data when guessing li…
Cater for double carriage return characters when removing empty values
thanks, I'll have a look
What is the source of your CSV-files? I'd argue that the source program contains a bug if it writes carriage return characters other than at the end-of-line.
Thank you for sharing your modifications, but this looks too much like a rare corner-case to me. I'm not sure if many people would benefit from this.
There will always be quoted carriage return characters present within the file if a cell value consists of multi-line text. I'd argue that is a common occurrence if you're attempting to read files containing description field data.
As per Wikipedia, under "Basic Rules and Examples":
"A record ends at a line terminator. However, line-terminators can be embedded as data within fields, so software must recognize quoted line-separators (see below) in order to correctly assemble an entire record from perhaps multiple lines."
OK, thanks for the input!
This is a rather common occurrence when dealing with any data that contains something along the lines of 'notes' that users could enter regarding the other data fields.
I agree that this would be a nice feature, especially since FasterCSV handles this properly and losing that feature is frustrating.
Merge pull #31 from @chrismhilton to support carriage returns
I've temporarily forked and merged for my own use
@tilo These features seem well worth adding.
I was in desperate need of this feature.
I had thought of abandoning smarter_csv until I found this patch.
I've now incorporated it in a monkey-patching way into my smarter_csv.
@chrismhilton thanks for your contribution! nice work! sorry I didn't have time to look at this project for a while.
@sunito @wyaeld I'm merging this into the project and will release a new version
merging pull request for issue #31, reving-up to 1.0.18
@wyaeld @chrismhilton @sunito @robly Sorry for the delay! It's been super-busy at work :-P