Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enhance Inflector helper with ascii function #464

Merged
merged 15 commits into from
Dec 28, 2013

Conversation

tonydspaniard
Copy link
Contributor

closes #364

Edit:
Seems to me that transliteration without PECL or the new Transliterator class (PHP >= 5.4), is something that a simple array won't do.

I think is ok on the first round, but if we wish to adopt transliteration the right way, we should consider some good rework on this matter.

References:

https://drupal.org/project/transliteration
http://php.net/manual/es/transliterator.transliterate.php
https://doc.wikimedia.org/mediawiki-core/master/php/html/UtfNormal_8php_source.html

* @param array $replace the characters to be replaced by spaces.
* @return string the translated
*/
public static function ascii($string, $replace = array())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need this $replace because it does something that has nothing to do with ascii.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it is true, but when you translate something like: I'll be back, you get Ill be back which will make a wrong slug: Ill-be-back instead of I-ll-be-back if we use replacement tokens

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why I'll be back becomes Ill-be-back? Isn't ' an ASCII char already?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I get your point, that should be work of the slug function.

Antonio Ramirez added 4 commits June 1, 2013 17:25
* upstream:
  Fixed build break.
  Fixes issue yiisoft#194: Added Application::catchAll.
  added missing default to getValue in boostrap tabs
  \yii\widgets\Menu improvement
  Fixes issue yiisoft#467: allow view file to be absent as long as the themed version exists.
  better auto scrolling.
  Fixes issue yiisoft#446: automatically scroll to first error.
*/
public static $transliteration = array(
protected static $transliteration = array(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why turning this into protected?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo they should all be customizable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also transliteration prop have incorrect map. Please check this http://iamseanmurphy.com/creating-seo-friendly-urls-in-php-with-url-slug/ .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have inspired my self on that link, how come is incorrect?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tonydspaniard For example there is no rule for cyrilic а. Also for example rule '/ъ|ь|Ъ|Ы|Ь/' => '' incorrect because Ы should be Y. If you inspired by this link we should take these rules carefully ;)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tonydspaniard I think its good idea to make some kind of converter to automatic convert rules from link to this format.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @cebe, working on this

* upstream/master: (91 commits)
  fixed init.bat paths
  renamed backstage → backend
  Fixed test break.
  Response WIP
  coding style fix.
  [1] Redone missing code. [2] Added empty line at the end of file [3] Removed exception.
  Updated code style. braces on the same line for control statements.
  Removed unused columsn from find constraint sql. Fixed typo. Added extra schema check for when a foreign table is not in the same schema. Updated indentation to conform to other classes.
  Multilevel Items
  Fixes issue yiisoft#514.
  Removed the config setting that should not have been commited.
  Removed false exception catching.
  Removed custom pgsql PDO and added defaultSchema as public property.
  CS fixes.
  Minor refactoring of DbMessageSource.
  Response WIP
  Minor refactoring of t().
  Added Jsonable support.
  Fixes yiisoft#13. Implement DB message source.
  Yii::t() minor fix.
  ...
@cebe
Copy link
Member

cebe commented Oct 22, 2013

@tonydspaniard as we are on PHP 5.4 now, can you check if/whats obsolete here now?

@samdark
Copy link
Member

samdark commented Oct 23, 2013

@cebe transliteration is now available in 5.4 intl: http://www.php.net/manual/en/class.transliterator.php

@iJackUA
Copy link
Contributor

iJackUA commented Oct 23, 2013

@samdark , @cebe it is availabe in Intl, but as I understand it is supposed that Intl could be not installed (as for Messages you are making a fallback functionality) so it seems that fallback for translitaration is also (still) needed.

@samdark
Copy link
Member

samdark commented Oct 23, 2013

We're creating a fallback for English only where transliteration isn't needed.

@iJackUA
Copy link
Contributor

iJackUA commented Oct 23, 2013

I understand it, so why not to make slug function with both options something like

    public static function slug($string, $replacement = '-')
    {
        if(extension_loaded(intl)) {

            $options = "Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC; [:Punctuation:] Remove;";
            //best options should be evaluated, just a quick copy-paste            
            $string = transliterator_transliterate($options, $string);
            $string = preg_replace('/[-\s]+/', '-', $string);
            return trim($string, '-');

        } else {
        $map = static::$transliteration + [
                '/[^\w\s]/' => ' ',
                '/\\s+/' => $replacement,
                '/(?<=[a-z])([A-Z])/' => $replacement . '\\1',
                str_replace(':rep', preg_quote($replacement, '/'), '/^[:rep]+|[:rep]+$/') => ''
            ];
        return preg_replace(array_keys($map), array_values($map), $string);
        }
    }

@tonydspaniard
Copy link
Contributor Author

@cebe back on track with this, will review again all what is needed and what is not. @iJackUA fallback proposal seems good to me.

* upstream: (2012 commits)
  doc fix.
  Changed the signature of `urlCreator` and button creators for `yii\gridview\ActionColumn`
  parameter adjustment.
  The signature for `yii\gridview\ActionColumn::urlCreator` is changed - the `$action` parameter is moved to the first
  Fixed the signature of Schema::findUniqueIndexes().
  reverted yiisoft#1598 and added a test for it
  Fix wrong array index in unique indexes for MySql
  Making accesors public
  Get DB unique indexes information
  Fixes yiisoft#1610: `Html::activeCheckboxList()` and `Html::activeRadioList()` will submit an empty string if no checkbox/radio is selected
  Gii should keep horizontal layout
  Documentation at "yii\authclient" updated.
  Doc comments at "yii\authclient" updated.
  Auth clients "Choice" doc comments updated.
  Auth clients "Choice" widget javascript advanced.
  Bootstrap's dropdown encodes also trailing caret tag
  Auth clients "Choice" widget markup updated.
  Gii should keep horizontal layout
  extended from codeception testcase, added docs
  Auth clients for Facebook, GitHub, LinkedIn added.
  ...

Conflicts:
	framework/yii/helpers/BaseInflector.php
	tests/unit/framework/helpers/InflectorTest.php
@tonydspaniard
Copy link
Contributor Author

After reviewing and checking lots of sources out there, I couldn't find a really good and simple method to transliterate text. iconv fails on cyrillic so I thought about Unicode char code replacements... The best solution so far was the one created on Drupal that was based on UtfNormal from media wiki.

I have added the TransliteratorHelper class for you guys to review it. It could be that the data to transliterate is at the wrong location transliteration/data or is too much for the framework. Please, any feedback is greatly appreciated @yiisoft/core-developers

Edit: TransliteratorHelper fallsback into intl extension if installed.

@qiangxue
Copy link
Member

I just reviewed the code. I think it's a bit too much including transliteration/data for fallback purpose. The situation is a bit similar to the message formatting where we use intl if possible and the fallback just makes things work but not perfectly correct.

It seems to me @iJackUA's approach is good enough.

What do you think?

@tonydspaniard
Copy link
Contributor Author

Yeah... I thought about transliteration/data was too much, but was the best approach so far. I will rollback to previous and use @iJackUA's approach updating $transliteration values to fit the replacements as good as possible according to UTF-8 tables referred before. We can always update those values.

Thanks @qiangxue

@iJackUA
Copy link
Contributor

iJackUA commented Dec 27, 2013

if I understand correctly $transliteration currently will work only with Latin character
It does not contain Cyrillic for example - so it returns nothing (empty string) if I will do something like ...::slufigy('Привет Йии фрэймворк')
and

$options = "Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC; [:Punctuation:] Remove;";
$string = transliterator_transliterate($options, $string);

makes much more right now - it transliterate any language (at least I have tried with Russian) to Latin characters

in my test with string
Привет Hello Йии Framework ! Как дела ? How it goes ?
I have such output (the first one is expected and considered correct)
INTL : Privet-Hello-Jii-Framework-Kak-dela-How-it-goes
Inflector::slug : Hello-Framework-How-it-goes

btw should there be an option for lowercasing the string in slug ? As in most cases it is used for Seo Urls and to remove inconsistency I prefere to make url all lowercase

@samdark
Copy link
Member

samdark commented Dec 27, 2013

Lowercasing it is a good idea. transliterator_transliterate is only available when there's intl installed. Since we're requiring intl for any advanced message translation (such as plurals) anyway I think the good solution is to use intl if available and fallback to English-only it not.

@tonydspaniard
Copy link
Contributor Author

@iJackUA I am going to try to include the Cyrillic reference but do not expect great results as with the helper i have to remove. For those interested on using a more robust transliteration fallback, please check (done versions for both Yii1 + Yi2): https://github.com/2amigos/transliteration-helper

@samdark I guess add a bool parameter to the function won't hurt.

* upstream: (21 commits)
  Fixes yiisoft#1643: Added default value for `Captcha::options`
  Fixes yiisoft#1654: Fixed the issue that a new message source object is generated for every new message being translated
  Allow dash char in ActionColumn’s button names.
  Added SecurityTest.
  fixed functional test when enablePrettyUrl is false.
  fixed composer.json
  minor doc fix.
  Fixes yiisoft#1634: Use masked CSRF tokens to prevent BREACH exploits
  Use better random CSRF token.
  GII unique indexes avoid autoIncrement columns
  updated debug retry params.
  Added sleep().
  Added unit test for ActiveRecord::updateAttributes().
  Fixes yiisoft#1641: Added `BaseActiveRecord::updateAttributes()`
  Fixed yiisoft#1504: Debug toolbar isn't loaded successfully in some environments when xdebug is enabled
  Mongo README.md updated.
  Fixes yiisoft#1611: Added `BaseActiveRecord::markAttributeDirty()`
  Number validator was missing
  Fixes yiisoft#1638: prevent table names from being enclosed within curly brackets twice.
  Unique indexes rules for single columns into array
  ...
@tonydspaniard
Copy link
Contributor Author

@qiangxue please review new approach. Tested both with intl and no intl extension. Both returned the same results.

Updated char map and fallback transliteration approach.

];
return preg_replace(array_keys($map), array_values($map), $string);
// ensure UTF-8 and remove invalid UTF-8 chars.
$string = mb_convert_encoding((string) $string, 'UTF-8', mb_list_encodings());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really needed? Or is it correct to do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to be very defensive always on code. Maybe it is not needed. I can remove it.

* upstream:
  Fixed CSRF token masking issue.
  improved error message of calling invalid scope method.
  Fixed repo URL
  Fixes yiisoft#1650: Added Connection::pdoClass.
  code style fix.
  added changelog
  codestyle fix
  improved checkIntegrity method
  Modified extension guidlines
  fix sphinx command signature
  fixed bug with forgotten param, fixed behavior for one table integrity
  fixed sequence reset
  added postgresql features to reset seq/check integrity
qiangxue added a commit that referenced this pull request Dec 28, 2013
enhance Inflector  helper with ascii function
@qiangxue qiangxue merged commit f18e530 into yiisoft:master Dec 28, 2013
@qiangxue
Copy link
Member

Thanks!

@tonydspaniard tonydspaniard deleted the 364-toAscii branch February 5, 2014 13:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

StringHelper: new proposed methods
8 participants