This repository has been archived by the owner on Jan 8, 2021. It is now read-only.

Third argument of mb_convert_encoding() can be an array #51

Closed
leofeyer opened this issue Oct 7, 2015 · 5 comments

@leofeyer
Contributor

leofeyer commented Oct 7, 2015

According to the PHP manual, the third argument of mb_convert_encoding() can be either a comma-separated list or an array. The Mbstring class only handles the first case.

@nicolas-grekas
Contributor

Can you please provide a patch?

@leofeyer
Contributor Author

leofeyer commented Oct 7, 2015

I have already tried to fix it, but it does not seem to be trivial. Here are two examples where the conversion works without problems:

// ISO-8859-1 to UTF-8
mb_convert_encoding(utf8_decode('déjà'), 'UTF-8', 'ISO-8859-1');

// ISO-2022-JP to UTF-8
mb_convert_encoding(mb_convert_encoding('漢字', 'ISO-2022-JP', 'UTF-8'), 'UTF-8', 'ISO-2022-JP');

We are using mb_convert_encoding() to convert file and folder names, which can be encoded in various ways all over the world. Therefore, we do not know the exact source charset in advance.

The PHP mbstring extension supports the following in this case:

// <unknown> to UTF-8
mb_convert_encoding($filename, 'UTF-8', 'ASCII,ISO-2022-JP,UTF-8,EUC-JP,ISO-8859-1');

// the third argument can also be an array
mb_convert_encoding($filename, 'UTF-8', array('ASCII', 'ISO-2022-JP', 'UTF-8', 'EUC-JP', 'ISO-8859-1'));

This still works fine with our two test cases from above:

// ISO-8859-1 to UTF-8
mb_convert_encoding(utf8_decode('déjà'), 'UTF-8', 'ASCII,ISO-2022-JP,UTF-8,EUC-JP,ISO-8859-1');

// ISO-2022-JP to UTF-8
mb_convert_encoding(mb_convert_encoding('漢字', 'ISO-2022-JP', 'UTF-8'), 'UTF-8', 'ASCII,ISO-2022-JP,UTF-8,EUC-JP,ISO-8859-1');

But it does not work with the compatibility layer:

1) Patchwork\Tests\PHP\Shim\MbstringTest::testmb_convert_encoding
iconv(): Wrong charset, conversion from `ascii,iso-2022-jp,utf-8,euc-jp,iso-8859-1' to `utf-8//IGNORE' is not allowed

Any idea how to fix this?
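
One possible first step, sketched under assumptions: reduce both accepted forms of the third argument (comma-separated string or array) to a single list, so that later detection logic never passes the whole list to iconv() as one charset name. The helper name normalize_from_encodings() is hypothetical and not part of the Mbstring shim, trimming whitespace around entries is an assumption of this sketch, and this is not the change from #52.

```php
<?php
// Hypothetical helper for illustration only; not part of Patchwork\PHP\Shim\Mbstring.
// Turns the third argument of mb_convert_encoding() into a flat list of names.
function normalize_from_encodings($fromEncoding)
{
    // A string may hold one encoding or a comma-separated list of candidates.
    if (!is_array($fromEncoding)) {
        $fromEncoding = explode(',', (string) $fromEncoding);
    }

    // Assumption: surrounding whitespace is insignificant, empty entries are dropped.
    $names = array_map('trim', $fromEncoding);
    $names = array_filter($names, function ($name) {
        return '' !== $name;
    });

    // Re-index so callers can rely on consecutive integer keys.
    return array_values($names);
}
```

With a list in hand, the shim could try or detect each candidate on its own. Which candidate wins, and how closely that matches the detection order of the mbstring extension, is a separate question that this sketch does not answer.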

@leofeyer
Contributor Author

leofeyer commented Oct 8, 2015

I have found a proper solution (see #52).

@nicolas-grekas
Contributor

@leofeyer Note that HHVM doesn't accept an array as the last argument.

@leofeyer
Contributor Author

Does it accept a comma-separated list?
