
Add support for configurable NULL encoding #14

Closed
dankasak opened this issue Jun 18, 2018 · 7 comments

dankasak commented Jun 18, 2018

Current NULL encoding options are limited. They work in some cases, where the consumer can handle what we produce. In other cases, e.g. MySQL 'load data infile', the database is unable to correctly identify NULLs using our encoding method (e.g. ,, ). The docs here:
http://search.cpan.org/~hmbrand/Text-CSV_XS-1.35/CSV_XS.pm#csv
... suggest you can produce output that databases can parse by doing:

while (my $row = $sth->fetch) {
  $csv->print ($fh, [ map { $_ // "\\N" } @$row ]);
  }

... but this is absolutely not the case. Given the data:

[ "blah", undef, 3 ]

... the required output for importing into MySQL or other DBs would be:

"blah",\N,3

... but the above hack instead gives us:

"blah","\\N",3

There are 2 problems with this:

  1. Text::CSV_XS is escaping the \N, giving us \\N. DBs won't parse this correctly.
  2. Text::CSV_XS is quoting the \\N. DBs won't parse this correctly either.

What we really need is a way to pass in any string sequence that can be used to encode a NULL value. Additionally, this string sequence should not be quoted.
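
For illustration, the behavior requested above can be sketched in core Perl. `encode_row` is a hypothetical helper, not part of Text::CSV_XS: undef becomes a caller-chosen token written verbatim, while real values are backslash-escaped and quoted when they contain a comma, quote, backslash or line break. The second rule also explains the output seen above: when backslash is the escape character, a genuine string `\N` has to come out as `"\\N"`, otherwise it would be indistinguishable from NULL.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Text::CSV_XS API). Encodes one row:
# - undef         -> $null_token, written verbatim (never quoted or escaped)
# - special chars -> backslash-escape " and \, then wrap the field in quotes
# - anything else -> written as-is
sub encode_row {
    my ($row, $null_token) = @_;
    my @fields;
    for my $value (@$row) {
        if (!defined $value) {
            push @fields, $null_token;
        }
        elsif ($value =~ /[",\\\r\n]/) {
            (my $escaped = $value) =~ s/(["\\])/\\$1/g;
            push @fields, qq{"$escaped"};
        }
        else {
            push @fields, $value;
        }
    }
    return join (",", @fields) . "\n";
}

print encode_row ([ "blah", undef, 3 ], "\\N");   # blah,\N,3
print encode_row ([ "\\N" ], "\\N");              # "\\N"  (a real string, not NULL)
```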

Tux (Owner) commented Jun 19, 2018

Sorry, but you did not mention one critical detail that causes the behavior you see: in the default setup, it works exactly as documented:

$ perl -MText::CSV_XS -e'Text::CSV_XS->new->say(*STDOUT,[map{$_//"\\N"}"blah",undef,3])'
blah,\N,3

But I am sure you have set up your instance with { escape => "\\" }, which makes it necessary to escape the escape character, and fields that contain escapes are automatically quoted:

$ perl -MText::CSV_XS -e'Text::CSV_XS->new({escape=>"\\"})->say(*STDOUT,[map{$_//"\\N"}"blah",undef,3])'
blah,"\\N",3

I'll see whether a more explicit attribute can achieve this, as callbacks cannot make this combination work.

Tux (Owner) commented Jun 19, 2018

Try a pull/clone from here. I added the undef_str attribute:

$ perl -Mblib -MText::CSV_XS -e'Text::CSV_XS->new({escape=>"\\",undef_str=>"\\N"})->say(*STDOUT,["blah",undef,3])'
blah,\N,3

See https://github.com/Tux/Text-CSV_XS/blob/master/doc/CSV_XS.md#undef_str
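
A slightly fuller sketch of the same fix, assuming an installed Text::CSV_XS release that ships undef_str (the thread itself only runs it from a blib build). Rows are written with backslash escaping and NULL as a bare \N; the sample rows are hypothetical and stand in for `$sth->fetch`, and output goes to an in-memory string only to make it easy to inspect.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;

# Backslash escaping plus NULL written as a bare, unquoted \N.
my $csv = Text::CSV_XS->new ({
    binary      => 1,
    escape_char => "\\",   # same setting as { escape => "\\" } in the thread
    undef_str   => "\\N",  # emitted verbatim, unquoted, for undef fields
    eol         => "\n",
    }) or die "" . Text::CSV_XS->error_diag ();

# Hypothetical sample rows; real code would loop over $sth->fetch.
my @rows = (
    [ "blah", undef, 3     ],
    [ "a,b",  "x",   undef ],
    );

open my $fh, ">", \my $out or die "cannot open in-memory handle: $!";
$csv->print ($fh, $_) for @rows;
close $fh;

print $out;
# blah,\N,3
# "a,b",x,\N
```

Written to a real file instead, the output should suit an import along the lines of `LOAD DATA INFILE 'rows.csv' INTO TABLE t FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '\\'`, where the file and table names are placeholders.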

dankasak (Author) commented

Wow, that's great! Thank you so much for the fast response and patch.

Tux (Owner) commented Jun 19, 2018

Feedback would be more than welcome, BTW.
I saw it failed tests on 5.18.x and below, so I cannot release yet.
(And I just pushed a patch that allows UTF-8 values for undef_str, where ∅ (U+2205) springs to mind as useful :)

Tux (Owner) commented Jun 20, 2018

@dankasak would you be so kind as to pull/clone again? I changed a lot of code to make it work on 5.18.x and below. If my tests do not cover all issues, your code might.
Any other feedback for now?

dankasak (Author) commented

I've pulled and re-tested, using a couple of options for supporting different databases. Looks good to me :)

Tux (Owner) commented Jun 21, 2018

Thanks for the feedback. I'll start the big test-to-release process.

This issue was closed.