Add option to ignore quote character #83

Closed
bangusi opened this issue Apr 5, 2020 · 5 comments

Comments

@bangusi

bangusi commented Apr 5, 2020

The quote character appears to default to '"', which works in many cases, but I have run into files that do not use a quote character at all. In such files, when a '"' is encountered the parser produces incorrect results, and it can even crash the program.

@vincentlaucsb
Owner

What happens if you set the quote character to a random character?

CSVFormat format;
format.quote('#');

I know this fix is not ideal, but if it works then I could just allow setting nothing as the quote character.

@bangusi
Author

bangusi commented Apr 9, 2020

Using some random character works, and that is the workaround I used.
The problem is that you have to know the character does not appear in the file, which means trial and error.
Also, in my case I didn't discover the problem until the program kept producing wrong results and sometimes crashing. The file had a double quote that was never terminated.

I think it would be better to make the quote character optional.

@vincentlaucsb
Owner

It is 100% my intention to make the quoting character optional.

Can you give me a sample file or snippet of a file that crashes the parser? I'm trying to see if there are ways to deal with this without the parser crashing.

@bangusi
Author

bangusi commented Apr 9, 2020

I can't share the actual data file since I don't own the data.
But roughly it looks like this:

column1~column2~column3~column4
value1~value2~"somevalue3~value4
value1~value2~somevalue3~value4
value1~value2~somevalue3~value4
value1~value2~somevalue3~value4
value1~value2~somevalue3~value4
value1~value2~"somevalue3~value4

It is a large file with over 2 million rows and more than 170 columns.
There is a large number of rows between the first occurrence of the quote and the next. The program would sometimes combine content from multiple rows into a single token, i.e. one spanning newlines.

I hope that helps.
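
The merging described above can be reproduced with a toy tokenizer. This is a minimal sketch of the general quote-toggling technique, not the library's actual parsing code; `split_rows` and its parameters are hypothetical names made up for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not the csv-parser implementation): splits `text` into
// rows of fields. When `use_quote` is true, each quote character toggles a
// "quoted" state in which delimiters and newlines are kept as field data.
std::vector<std::vector<std::string>> split_rows(
        const std::string& text, char delim, bool use_quote, char quote = '"') {
    std::vector<std::vector<std::string>> rows;
    std::vector<std::string> row;
    std::string field;
    bool in_quotes = false;

    for (char c : text) {
        if (use_quote && c == quote) {
            in_quotes = !in_quotes;          // opening or closing quote
        } else if (!in_quotes && c == delim) {
            row.push_back(field);            // end of field
            field.clear();
        } else if (!in_quotes && c == '\n') {
            row.push_back(field);            // end of field and end of row
            field.clear();
            rows.push_back(row);
            row.clear();
        } else {
            field += c;                      // ordinary data, or anything inside quotes
        }
    }
    // Flush a final row that has no trailing newline.
    if (!field.empty() || !row.empty()) {
        row.push_back(field);
        rows.push_back(row);
    }
    return rows;
}
```

With quoting enabled and '~' as the delimiter, a stray '"' in one row keeps the quoted state open until the next '"' is seen, so every row in between ends up inside a single field. With `use_quote` set to false, the same input splits one row per line and the '"' is kept as ordinary data.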

@vincentlaucsb
Owner

vincentlaucsb commented May 16, 2020

I've added the ability to turn off quoting in 1.3.3.

CSVFormat format;
format.quote(false);

2 participants