[2.1] Feature/unidecoder #2105

wants to merge 66 commits into
Zend Framework member


Note: opening the FILES CHANGED tab on GitHub takes a long while to display and can hang your browser.


This PR adds Zend\Text\UniDecoder to zf2. It is independent, has no requirements and is a utility class.

UniDecoder converts UTF-8 strings into plain ASCII, transliterating accents and special characters into their ASCII equivalents. The component is loosely based on Python Unidecoder, and the transliteration relies on a database in the form of conversion tables.

I want UniDecoder to find its way into ZF 2.0.0 because I need it for Console transliteration in case the console is not UTF-8 compatible and the application tries to output multibyte strings. Without transliteration, the whole output becomes unreadable, and certain special characters can modify remote session settings, clear the screen or throw the cursor around the screen.

Basic usage

The class is a static class, similar to Text\Multibyte and offers a single ::decode() method. Below is an example usage:

use Zend\Text\UniDecoder\UniDecoder;

$text = 'привет, здравствуйте';
echo UniDecoder::decode($text); // outputs: privet, zdravstvuite
DASPRiD and others added some commits Jun 3, 2012
@DASPRiD DASPRiD Initial slugifier commit 11008a3
@Thinkscape Thinkscape Remove Slugifier, update php docblocks. 54bd9b9
@Thinkscape Thinkscape Add tests, add handling of broken multibyte strings, refactor to static.
- Text\UniDecoder\UniDecoder is now a "static" class, with static ::decode() method
- UniDecoder::decode() now accepts second parameter which is a placeholder for broken or unknown characters.
- UniDecoder::decode() now checks if it is provided with a scalar value and will throw an exception otherwise.
- UniDecoder now handles broken utf-8 strings and will attempt to repair them.
- Add tests for all UniDecoder functionality.
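
The behaviour listed in the commit notes above (table lookup, a placeholder for unknown or broken characters, a scalar check) can be sketched roughly as follows. This is not the actual Zend\Text\UniDecoder code: the function name `sketchDecode`, the tiny inline table `SKETCH_TABLE` and the byte-level regular expression are all assumptions made for illustration, and the sketch only replaces broken bytes rather than attempting to repair them. It assumes PHP 7 or later.

```php
<?php
// Hypothetical, heavily reduced conversion table. The real component is
// described as shipping its tables as a large set of separate files.
const SKETCH_TABLE = [
    'п' => 'p', 'р' => 'r', 'и' => 'i', 'в' => 'v', 'е' => 'e', 'т' => 't',
    'é' => 'e', 'ü' => 'u', 'ß' => 'ss',
];

/**
 * Illustrative stand-in for a table-driven decode(); not the real API.
 */
function sketchDecode($text, string $placeholder = '?'): string
{
    // Mirror the scalar check mentioned in the commit notes.
    if (!is_scalar($text)) {
        throw new InvalidArgumentException('Expected a scalar value');
    }

    // Split into UTF-8 sequences at byte level (no /u flag), so that a
    // stray byte is captured on its own by the final "." alternative
    // instead of making the whole match fail. Overlong forms are not
    // rejected here; this is only a sketch.
    $pattern = '/[\x00-\x7F]|[\xC2-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2}'
             . '|[\xF0-\xF4][\x80-\xBF]{3}|./s';
    preg_match_all($pattern, (string) $text, $matches);

    $out = '';
    foreach ($matches[0] as $char) {
        if (strlen($char) === 1 && ord($char) < 0x80) {
            $out .= $char;                              // plain ASCII passes through
        } else {
            $out .= SKETCH_TABLE[$char] ?? $placeholder; // unknown or broken
        }
    }

    return $out;
}
```

With this sketch, a character missing from the table and a stray byte such as `"\xFF"` both come out as the placeholder, which is the role the commit notes give to the second parameter.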
Zend Framework member

@weierophinney @DASPRiD

This has been discussed before. It's a utility component which will be used by Zend\Console.
Please merge this PR before RC3.


This pull request fails (merged 45204a8 into 984d799).


This pull request fails (merged 5b59aac into 984d799).

Zend Framework member

I've noticed that PREG behavior changed between PHP 5.4.4 (which I tested the component against) and PHP 5.4.5 (which is used by Travis). This makes one of the tests fail. I will investigate tomorrow and either work around it or modify the test to match; transliterating broken UTF-8 strings is a secondary goal.

Zend Framework member

I tagged this as 2.1 because any new feature will be released in that version.

Zend Framework member

@Maks3w no. I've discussed it before with @DASPRiD and @weierophinney. It's not a feature, it's a utility class. Please read description above. I need it for console to work properly and console is a 2.0 feature.

Zend Framework member

@marc-mabe Not really :-) Notice that it does not require or depend on any php extension, instead it shuffles bytes and depends on translit tables.

Zend Framework member

@Thinkscape: Sorry, I can't look into the code currently, but doesn't ext/iconv help to transliterate special characters?
Is Zend\Text the right place for it? I integrated Zend\Text\MultiByte into Zend\StdLib\StringUtils to make it more readily available to other components (minimal dependencies). It's not ready and the discussion isn't finished because it was postponed to 2.1. Sure, it's not the same functionality, but there are some overlaps in where it would be used.
-> Could we discuss it before putting it into 2.0?

Zend Framework member


  1. iconv() with //TRANSLIT does not do what this component does.
  2. Zend\Text is the best place, because Zend\Filter and Zend\Validator are due for a cleanup and refactor, but we did not have enough time before RC1.
  3. I know Multibyte and I'm against moving that into Stdlib. Especially if it were to be merged with UniDecoder and possibly a slugifier (if we ported that too from @DASPRiD).
  4. Stdlib\ArrayUtils and StringUtils work on arrays and strings, which are built-in datatypes in PHP. Multibyte strings are an external standard. PHP uses string for byte data, which is why multibyte strings in PHP are held in a string, but the similarities end there. There are multiple standards for multibyte strings, tens of rules and thousands of characters across different tables. PHP 6 was rumored to introduce built-in, native multibyte string support, but until then it's still an external, complex and expensive standard. It must not be squeezed into Stdlib just because many components might use it.
  5. UniDecoder consists of 190 new files. That's the main reason it lives inside its own namespace under Zend\Text. This makes it easy for people to exclude it, or include it on demand.
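
For reference on point 1, the iconv route looks roughly like the snippet below. The helper name `translitWithIconv` is made up for illustration. What `//TRANSLIT` produces depends on the iconv implementation and the active locale, so no expected output is shown: depending on the platform, characters without a mapping may be replaced, dropped, or cause the call to fail.

```php
<?php
/**
 * Hypothetical helper: try to transliterate UTF-8 text to ASCII via ext/iconv.
 * Returns the converted string, or false when iconv cannot convert the input.
 */
function translitWithIconv(string $text)
{
    // The "//TRANSLIT" suffix asks iconv to approximate characters that
    // do not exist in the target charset. The "@" hides the notice that
    // iconv raises when it hits a character it cannot convert.
    return @iconv('UTF-8', 'ASCII//TRANSLIT', $text);
}

// Result varies by platform and locale; inspect it rather than rely on it.
var_dump(translitWithIconv('привет, здравствуйте'));
```

That variability is the practical difference from a table-driven approach, where the tables ship with the code and the result does not depend on the host system.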
Zend Framework member

@Thinkscape: Only one note: strings in PHP are a sequence of bytes. They become text once you add a character set. PHP 6 only adds a unicode data type, which is the same as using string+Unicode, except that PHP now knows about Unicode ;)
-> So every text is an "external standard", and since PHP 6 every text not using Unicode is an external standard, too.
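
The "sequence of bytes" point is easy to see with the example string from the PR description. A small sketch, assuming the source file is saved as UTF-8 (the helper name `countUtf8Chars` is made up for illustration):

```php
<?php
/**
 * Hypothetical helper: count characters by telling PCRE to treat the
 * subject as UTF-8 (the "u" modifier) and matching one character at a time.
 */
function countUtf8Chars(string $text): int
{
    return (int) preg_match_all('/./su', $text);
}

$word = 'привет';

echo strlen($word), "\n";         // 12: strlen() counts bytes
echo countUtf8Chars($word), "\n"; // 6: characters, once UTF-8 is assumed
```

The string itself carries no charset; the 6 only appears because the caller tells PCRE to interpret the bytes as UTF-8.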

Zend Framework member


We have renamed the folder for tests from Zend to ZendTest.

Can you rebase your PR to catch this change?

Thanks in advance.

sasezaki and others added some commits Jul 26, 2012
@sasezaki sasezaki remove old Zend\Service scripts from demos 2921413
@ZeinEddin ZeinEddin arabic translation for captcha ad57f2c
@ZeinEddin ZeinEddin arabic translation for zend validate d031696
@froschdesign froschdesign Orthographic mistakes fixed
- orthographic mistakes fixed
- hyphens added
tr form annotation builder - if isRequired is true, this will automatica…
…lly populate the required attribute for the input
tr adding test to confirm that required attribute is added to element dfb18c5
tr fixing for php 5.3.3 fd65d88
@waltertamboer waltertamboer Updated the docblocks and made sure the bridge is initialized when ne…
@Maks3w Maks3w [Tests] Rename Zend subdir to ZendTest for PSR-0 compliant 17e9735
tr adding check for no required value set on password 70d71e5
tr need to check that the variables aren't already an array. otherwise, …
…for example, forms become an array of elements.
@davidwindell davidwindell [Auth] Allow basic resolver to return AuthResult 97c8ebd
@davidwindell davidwindell typofix 3017e0e
@RWOverdijk RWOverdijk Added suggestion 78639b7
@RWOverdijk RWOverdijk Added where it is being used b513213
Juha Suni fixes ZF2-439 ee19ad6
@RWOverdijk RWOverdijk Added to Stdlib composer.json as well and removed space 1dc66bc
@x3ak x3ak Extract name only when we need it 15920d9
@x3ak x3ak Some small PHPDoc changes 9f18fa3
@juriansluiman juriansluiman Fix ViewModel::setVariables with objects as vars
A typo in the check to cast objects to array accessible variables
caused the tests to pass, but causes problems with a set of
variables in an array to be casted to arrays.
@juriansluiman juriansluiman Create test for objects casted to empty array
Objects in variables array implementing Traversable, but not ArrayAccess
are converted into an empty array. The test shows this and the ViewModel
is patched to specifically cast only the right objects to an array.
@juriansluiman juriansluiman Fix CS for Variable test asset e75fc32
@Thinkscape Thinkscape Fix Console\Charset NS import after one of previous refactors. bc16af6
@Thinkscape Thinkscape Add a workaround for Windows adapter to read a single char.
Reading a single char, validated against a char mask now works on Windows. The adapter will attempt to use either "choice" command (known since Windows 95) or Windows PowerShell, to obtain a single keystroke from user and validate it against a mask of allowed characters. This method works with both low and high ASCII chars, which enables capturing keys such as cursors, tab, escape, backspace and enter.
@Thinkscape Thinkscape Add convenience method to statically create prompts.
It is now possible to create prompts by calling static prompt() method, i.e. `$key = Prompt\Char::prompt("Press any key")`.
@weierophinney weierophinney [#2083] Better fix for issue, and CS cleanup
- Revert to previous constructor behavior, but instead have doWrite() call
  getFirePhp() to lazy-load the bridge instance.
- Do not import classes from a subnamespace of the current namespace
- trailing whitespace
@davidwindell davidwindell added tests e8d99aa
@weierophinney weierophinney [#2096] CS cleanup
- Fixed whitespace issues
- Moved interfaces into their subnamespaces
- Do not use phpdoc-style docblocks as in-body comments
@weierophinney weierophinney [#2096] Update classes referencing Console
- ensure they refer to interfaces correctly
@Thinkscape Thinkscape Fix fatal errors after recent changes, fix handling of space and 0 in…
… Posix adapter.
@Rovak Rovak First refactoring 989895c
@Thinkscape Thinkscape Fix class imports fatal errors. f167db2
@Rovak Rovak tests passing bb1c666
@Rovak Rovak PHPCS Fixer f5c770c
@Rovak Rovak Revert comment spacing dde1097
tr fixing parse method to set a path of at least / b6c28ec
tr forcing socket to use a path of '/' if no path is returned from uri 194bd39
@pborreli pborreli Fixed typos 4b44562
@marc-mabe marc-mabe Wddx: libxml_disable_entity_loader(true) needs to be reset with previ…
…ous value
@padraic padraic Added trim() to XML input when importing an XML or HTML string. Allow…
…s for slight errors in output XML found in the wild where extra space can be introduced in error.
tr moved path check to http by overriding parse method ca025b8
tr forgot to return Http object 47f0b73
tr correct docblock for overriden parse method ad733f2
@cgmartin cgmartin Cleanup unnecessary override in FormCheckbox view helper 1a2d769
@ethanhann ethanhann When calling Ldap->search, ErrorHandle::start is called in Ldap->sear…
…ch, and then again in Ldap->connect. This results in an "ErrorHandler already started" exception. Moving the $this->getResource call (which is ultimately responsible for the Ldap->connect call) before the ErrorHandle::start call in the Ldap->search function resolves this issue.
@padraic padraic Set SSL Content used by HTTP Client's Socket adapter to set verify_pe…
…er to TRUE and allow_self_signed to FALSE by default
tr fixing desending typo in db select. af20dd6
@prolic prolic changed from "file_exists" to "class_exists" in Zend\Filter\Encrypt c03e48f
@prolic prolic fix PSR2 in Zend\Filter\Encrypt 6e672e6
@weierophinney weierophinney [#2084] Cleanup
- A few test assets had snuck into the tree since this PR was added;
  moved those under the ZendTest directory
- phpunit.xml.dist needed to point to ZendTest directory
- likewise with run-tests.sh
- Also, if you have run a composer install, you still need access to
  TestAsset files under ZendTest -- as such, I've made autoloading of
  that namespace happen in all situations, and register the Zend
  namespace only if the composer autoloader is not present.
@weierophinney weierophinney [#2098] s/execute/onDispatch/
- Done for internal consistency; we recommend using "onEventName" for
  methods that are event listeners.
@weierophinney weierophinney [#2099] Fixes issue in Wildcard route
- Wildcard route was making the assumption that getPath() could return
  an empty string. Added a logic path to reset $path to '' when a single
  slash value is discovered.
@Maks3w Maks3w [Tests] Use Zend_Autoloader when Composer's autoloader is not present.
Note: If your tests are failing after this patch, you probably need to run "php composer.phar update" to update Composer's autoloader with the new list of namespaces
@weierophinney weierophinney Fixes to tests based on URI changes
- Modified tests that omitted "/" path from generated URIs to use them.
@weierophinney weierophinney [2110] Added README.md entry
- Noted change to use verify_peer by default
@weierophinney weierophinney [#2079] Move "required" logic to listener
- Moved to a listener, and made to update the attributes array only if
  it is boolean true
@weierophinney weierophinney CS fixes 0734af2
@stefankleff stefankleff Added unit test for ZF2-440 -> set data with null values 53c90fe
@weierophinney weierophinney [ZF2-440][#2095] Resolves case of null values
- null values passed to elements marked as not required should validate
@weierophinney weierophinney [#2119] CS fixes
- trailing whitespace
@weierophinney weierophinney [#2119] micro-opt
- Test for null before testing is_string/strlen
@Thinkscape Thinkscape Rebase to latest master, remove redundant test. 3a4e7e7
Zend Framework member

@Maks3w I've rebased and moved test case to ZendTest.

I've installed PHP 5.4.5, but the single preg test that failed before seems to be a heisenbug.

@Thinkscape Thinkscape closed this Aug 8, 2012
@Thinkscape Thinkscape reopened this Aug 8, 2012
Zend Framework member

You have a problem with the git history. Use git reflog to rescue your old state and retry the rebase.

Zend Framework member

What's the status of this PR? @Thinkscape do you want me to look into the preg issue?

Zend Framework member

@DASPRiD Yeah, fire away. Works for me.
I've checked again under 5.4.4 and 5.4.5. Tests pass, so I don't know what the earlier failure was about.

Zend Framework member

I really, really want to merge this. However, I've spent about 45 minutes trying to resolve merge commits, and it keeps looking like I'm on the verge of breaking something.

If you want to see this in 2.1, can I ask you to please rebase?

@weierophinney weierophinney reopened this Sep 14, 2012

I cherry picked the relevant commits (please review that!) of this PR and applied them on top of a new branch. This should make it easier to merge. The new PR is #2399

Zend Framework member

Closing this PR, as @juriansluiman made a new one.

@DASPRiD DASPRiD closed this Sep 21, 2012