Allow file helper perform character set conversion #4322

Closed
gpoehl opened this issue Jul 15, 2014 · 10 comments
Labels
status:to be verified Needs to be reproduced and validated.
Comments

@gpoehl

gpoehl commented Jul 15, 2014

Using non-ASCII characters in file or directory names leads to unwanted string conversion.

Allowing a character set conversion via

iconv("UTF-8", "ISO-8859-1//TRANSLIT", 'file_or_directory_name_with_chars_like_äöü')

could solve most of the pain.

File helper functions could be enhanced to accept the output charset as a parameter or via application config.

@samdark
Member

samdark commented Jul 15, 2014

What's your use case?

@gpoehl
Author

gpoehl commented Jul 15, 2014

I'm creating PDF reports based on user input and/or the values of database fields. I know it's possible to use some (numeric) key as the file name and substitute it later with a stored string. But then you can't use extensions that display (and download) folder contents.

@samdark
Member

samdark commented Jul 15, 2014

Most file systems nowadays are Unicode-aware, so characters like äöü are valid. I'm not sure why you want to convert these, but it looks like an application concern, not a framework one.

@samdark samdark closed this as completed Jul 15, 2014
@gpoehl
Author

gpoehl commented Jul 15, 2014

@samdark, I don't know what exactly goes wrong, but the following code

$dir = 'Test Characters aöü';
$directory = '../../'. $dir;
\yii\helpers\FileHelper::createDirectory($directory);

produces a directory like this:
Test Characters aöü

The content of written files is OK.

I checked php.ini, which has default_charset = "UTF-8",
and Apache's httpd.conf, which has
AddDefaultCharset UTF-8

Did I miss something in the configuration of my environment, or is it required to convert non-ASCII characters?

@samdark samdark reopened this Jul 15, 2014
@samdark samdark added this to the 2.0 RC milestone Jul 15, 2014
@samdark samdark self-assigned this Jul 15, 2014
@qiangxue
Member

I don't think this has anything to do with the file helper. The helper should not attempt to do charset conversion.

@samdark
Member

samdark commented Jul 15, 2014

Yes. I just want to make sure it doesn't contain any surprises...

@samdark
Member

samdark commented Jul 15, 2014

OK, verified that it's a Windows issue and that createDirectory itself doesn't make it worse. You can solve it in multiple ways, and it's up to you and your application to deal with it.

@samdark samdark closed this as completed Jul 15, 2014
@gpoehl
Author

gpoehl commented Jul 15, 2014

You're right. It is a Windows issue. Plain PHP mkdir or the DOS cmd mkdir has the same result.

I searched a lot, but the only solution I found is the charset conversion.

I would love to see how other developers solved this issue.

PS: I'm running an NTFS filesystem, which has the UTF-8 character set as default.

@samdark
Member

samdark commented Jul 15, 2014

You have 2 ways of dealing with it:

  1. urlencode/urldecode if you want to use original file names.
  2. Your way but with intl's translit instead of iconv. We have Inflector::slug that should work for file names as well.
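
As a minimal sketch of the first option, assuming the names are UTF-8 strings and that an ASCII-only name on disk is acceptable: the two helper functions below are hypothetical and not part of Yii. They only wrap PHP's built-in urlencode() and urldecode().

```php
<?php
// Hypothetical helpers (not part of Yii): keep only ASCII on disk and
// restore the original UTF-8 name when it has to be displayed.
function encodeFileName(string $name): string
{
    // urlencode() turns spaces into "+" and percent-encodes every byte
    // other than letters, digits, "-", "_" and ".".
    return urlencode($name);
}

function decodeFileName(string $stored): string
{
    return urldecode($stored);
}

$original = 'Test Characters äöü';          // assumes this file is saved as UTF-8
$stored = encodeFileName($original);
echo $stored, PHP_EOL;                      // Test+Characters+%C3%A4%C3%B6%C3%BC
echo decodeFileName($stored), PHP_EOL;      // Test Characters äöü
```

The encoded form is what would be passed to mkdir or FileHelper::createDirectory; the decoded form is only for display, so directory listings outside the application still show the encoded names.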

@gpoehl
Author

gpoehl commented Jul 21, 2014

@samdark, thanks for your answer.
Just some comments which might help others facing the same problem.
PHP talks to the Windows file system using ISO code pages, not Unicode (UTF-8 or something close to it). A solution is expected for PHP version 6, whenever that might be.

So there are basically 3 choices.

  1. use only Latin characters
  2. urlencode / urldecode when file and folder names don't matter
  3. translate Unicode strings to an appropriate ISO code page.

A good explanation is given here:
http://stackoverflow.com/questions/1525830/how-do-i-use-filesystem-functions-in-php-using-utf-8-strings

The following code might help to investigate which kind of translation meets most users' needs. Combining them with file helper functions might be handy, but I accept that this should be handled by the developer.
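
A rough sketch of the third choice, assuming the mbstring extension is available and that ISO-8859-1 is the code page the file system expects (that depends on the machine). The function toCodePage is a hypothetical name, not part of PHP or Yii. It refuses names that would lose characters, such as the € sign, instead of silently writing a "?".

```php
<?php
// Hypothetical helper (not part of PHP or Yii): convert a UTF-8 name to a
// single-byte code page, or return null if a character has no equivalent.
function toCodePage(string $utf8Name, string $codePage = 'ISO-8859-1'): ?string
{
    $converted = mb_convert_encoding($utf8Name, $codePage, 'UTF-8');
    if (!is_string($converted)) {
        return null;
    }
    // mbstring replaces unmappable characters (with "?" by default), so a
    // round trip that does not reproduce the input means data was lost.
    $roundTrip = mb_convert_encoding($converted, 'UTF-8', $codePage);
    return $roundTrip === $utf8Name ? $converted : null;
}

var_dump(toCodePage('Test characters äöü') !== null);   // bool(true)
var_dump(toCodePage('Test characters € äöü'));          // NULL
```

A caller would create the directory only when the result is not null and fall back to one of the other choices otherwise.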

use yii\helpers\FileHelper;
use yii\helpers\Inflector;

// $dir = 'Test characters € äöü';   // variant that includes the € sign
$dir = 'Test characters äöü';
// Each entry: [converted directory name, label of the conversion used]
$dirs = [
    [$dir, 'string'],
    [utf8_decode($dir), 'utf8_decode'],           // fails on € sign
    [urlencode($dir), 'urlencode'],
    [urldecode($dir), 'urldecode'],
    [htmlentities($dir), 'htmlentities'],
    [iconv("UTF-8", "ISO-8859-1//TRANSLIT", $dir), 'iconv'],
    [iconv("UTF-8", "ISO-8859-1//TRANSLIT", urldecode($dir)), 'iconv urldecoded'],
//  [mb_convert_encoding($dir, 'ISO-8859-15', 'UTF-8'), 'mbConvert'],     // mkdir error
    [UConverter::transcode($dir, 'ISO-8859-1', 'UTF-8'), 'transcodeFromUtf8'],           // fails on € sign
    [transliterator_transliterate('Any-Latin', $dir), 'transliterate any latin'],
    [transliterator_transliterate('Any-Latin; Latin-ASCII', $dir), 'transliterate any latin and latin ascii'],
    [Inflector::slug($dir), 'Inflector Slug'],
];
// Create one test directory per conversion so the results can be compared on disk
foreach ($dirs as $key => $entry) {
    $directory = '../../test/' . $key . ' - ' . $entry[0] . ' --- ' . $entry[1];
    echo '<br>' . $directory;
    if (!is_dir($directory)) {
        FileHelper::createDirectory($directory);
    }
}
