Use super CSV for writing CSV files #416
…essary quotes which makes the files much smaller
Any speed results here?
@@ -42,11 +45,14 @@ public CsvTupleWriter(final OutputStream out) {
 * @param out the {@link OutputStream} to which the data will be written.
 */
public CsvTupleWriter(final char separator, final OutputStream out) {
csvWriter = new CSVWriter(new BufferedWriter(new OutputStreamWriter(out)), separator);
final CsvPreference sepratorPreference =
typo in name
Is there a reason not to expose the quote char and/or the EOL char, etc., as options too? I guess right now CSV really means CSV, but in the future I could imagine us customizing it.
I think we should extend it when we need it.
Running: Super CSV:
Open CSV (master):
It looks like the variation is much larger for Open CSV in my small sample, but it doesn't look significantly slower or faster, whereas the file size for downloads can be significantly smaller.
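To illustrate why the files get smaller, here is a minimal sketch of the difference between quoting every field and quoting only when necessary. It is not Super CSV's or Open CSV's actual implementation; the class `QuotingSketch` and its method names are hypothetical, and the rule used (quote only fields containing the separator, a quote character, or a line break) is the conventional CSV one, assumed here for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; not taken from Super CSV or Open CSV.
final class QuotingSketch {

    /** Always wraps the field in quotes, doubling any embedded quote characters. */
    static String quoteAlways(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Quotes only when the field contains the separator, a quote, or a line break. */
    static String quoteIfNeeded(String field, char separator) {
        boolean needsQuotes = field.indexOf(separator) >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        return needsQuotes ? quoteAlways(field) : field;
    }

    /** Joins a row with every field quoted. */
    static String rowAlways(List<String> fields, char separator) {
        return fields.stream()
                .map(QuotingSketch::quoteAlways)
                .collect(Collectors.joining(String.valueOf(separator)));
    }

    /** Joins a row with only the fields that need it quoted. */
    static String rowMinimal(List<String> fields, char separator) {
        return fields.stream()
                .map(f -> quoteIfNeeded(f, separator))
                .collect(Collectors.joining(String.valueOf(separator)));
    }

    public static void main(String[] args) {
        List<String> row = List.of("42", "plain text", "needs, quoting");
        System.out.println(rowAlways(row, ','));   // every field quoted
        System.out.println(rowMinimal(row, ','));  // only the last field quoted
    }
}
```

For rows that are mostly numbers or short identifiers, the always-quote form adds two bytes per field, which is where the download size difference would come from.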
FYI: My experience on a bigger dataset implies that Open CSV is faster than the
Using Java file scan to ingest Twitter data (about 20GB) takes around 1hr. Although this is not a fair comparison, consider that Freebase is much
On Fri, Feb 21, 2014 at 8:18 PM, Dominik Moritz notifications@github.com wrote:
Use super CSV for writing CSV files
Instead of:
we will write