Skip to content
This repository has been archived by the owner on Jan 8, 2020. It is now read-only.

[2.1] Feature/unidecoder #2105

Closed
wants to merge 66 commits into from

Conversation

Thinkscape
Copy link
Member

WARNING!!!

When opening FILES CHANGED tab on GitHub it takes a longer while to display an can hang your browser**

Rationale

This PR adds Zend\Text\UniDecoder to zf2. It is independent, has no requirements and is an utility class.

UniDecoder is used to convert utf-8 strings into plain-ASCII equivalents, transliterating accents and special characters into their ASCII equivalents. The component is loosely-based on Python Unidecoder and the transliteration is based on a database in form of conversion tables.

I want UniDecoder to find it's way into ZF 2.0.0 because I need it for Console transliteration in case the console is non-utf8 compatible and the Application tries to output multibyte strings. Without transliteration, the whole output will become unreadable and certain special characters can modify remote session settings, clear the screen or throw cursor around the screen.

Basic usage

The class is a static class, similar to Text\Multibyte and offers a single ::decode() method. Below is an example usage:

<?php
use Zend\Text\UniDecoder\UniDecoder;

$text = 'привет, здравствуйте';
echo UniDecoder::decode($text); // outputs: privet, zdravstvuite

DASPRiD and others added 3 commits August 2, 2012 17:40
- Text\UniDecoder\UniDecoder is now a "static" class, with static ::decode() method
- UniDecoder::decode() now accepts second parameter which is a placeholder for broken or unknown characters.
- UniDecoder::decode() now checks if it is provided with a scalar value and will throw an exception otherwise.
- UniDecoder now handles broken utf-8 strings and will attempt to repair them.
- Add tests for all UniDecoder functionality.
@Thinkscape
Copy link
Member Author

@weierophinney @DASPRiD

This has been discussed before. It's an utility component which will be used by Zend\Console.
Please merge this PR before RC3.

@travisbot
Copy link

This pull request fails (merged 45204a8 into 984d799).

@travisbot
Copy link

This pull request fails (merged 5b59aac into 984d799).

@Thinkscape
Copy link
Member Author

I've noticed that PREG behavior changed between PHP 5.4.4 (I've tested the component against) and PHP 5.4.5 (which is used by travis). This makes one of the tests to fail. I will investigate tomorrow and work around it or modify test to match - transliterating broken utf8 strings is a secondary goal.

@Maks3w
Copy link
Member

Maks3w commented Aug 3, 2012

I tagged this as 2.1 because any new feature will be released in that version

@marc-mabe
Copy link
Member

@Thinkscape It looks similar to my work on Zend\Stdlib\StringUtils : https://github.com/marc-mabe/zf2/blob/string/library/Zend/Stdlib/StringUtils.php

@Thinkscape
Copy link
Member Author

@Maks3w no. I've discussed it before with @DASPRiD and @weierophinney. It's not a feature, it's a utility class. Please read description above. I need it for console to work properly and console is a 2.0 feature.

@Thinkscape
Copy link
Member Author

@marc-mabe Not really :-) Notice that it does not require or depend on any php extension, instead it shuffles bytes and depends on translit tables.

@marc-mabe
Copy link
Member

@Thinkscape: Sry, can't look into the code currently but doesn't ext/iconv help to transliterate special characters?
Is Zend\Text the right place for it ? - I integrated Zend\Text\MultiByte into Zend\StdLib\StringUtils to make it better available on other components (min dependencies) - It's not ready and not finished discussed because it was postponed to 2.1. Sure it's not the same functionality but there are some overlaps were to use it.
-> Could we discus about it before putting it into 2.0

@Thinkscape
Copy link
Member Author

@marc-mabe

  1. iconv('\\TRANSLIT') does not do what this component does.
  2. Zend\Text is the best place, because Zend\Filter and Zend\Validator are due for a cleanup and refactor, but we did not have enough time before RC1.
  3. I know Multibyte and I'm against moving that into Stdlib. Especially if it was to merged with UniDecoder and possibly slugifier (if we ported that too from @DASPRiD)
  4. Stdlib\ArrayUtils and StringUtils work on arrays and strings. Notice that these are built-in datatypes in PHP. Multibyte strings are an external standard. Because PHP uses string for byte data, that's why multibyte strings in PHP are held in string but similarities end here. There are multiple standards for multibyte strings, tens of rules and thousands of characters across different tables. PHP 6 was rumored to introduce built-in, native multibyte string support, but until that time, it's still an external, complex and expensive standard. It must not be squeezed in Stdlib just because there might be many components using it.
  5. UniDecoder consists of 190 new files. That's the main reason it lives inside it's NS under Zend\Text. This allows for easily excluding it, or including on-demand by people.

@marc-mabe
Copy link
Member

@Thinkscape: Only one note: strings in PHP are a sequence of bytes. This gets a text if you add a character set. PHP 6 only ads a data-type unicode and thats the same as using string+Unicode but now PHP nows about unicode ;)
-> So each text is an "external standard" and since PHP 6 each text not using unicode is an external standard, too.

@Maks3w
Copy link
Member

Maks3w commented Aug 6, 2012

Hi,

We have renamed the folder for tests from Zend to ZendTest.

Can you rebase your PR to catch this change?

Thanks in advance.

tr and others added 16 commits August 8, 2012 13:28
- A few test assets had snuck into the tree since this PR was added;
  moved those under the ZendTest directory
- phpunit.xml.dist needed to point to ZendTest directory
- likewise with run-tests.sh
- Also, if you have run a composer install, you still need access to
  TestAsset files under ZendTest -- as such, I've made autoloading of
  that namespace happen in all situations, and register the Zend
  namespace only if the composer autoloader is not present.
- Done for internal consistency; we recommend using "onEventName" for
  methods that are event listeners.
- Wildcard route was making the assumption that getPath() could return
  an empty string. Added a logic path to reset $path to '' when a single
  slash value is discovered.
Note: If your tests are failing after this patch probably you need to do "php composer.phat update" to update Composer's autoloader with the new lists of namespaces
- Modified tests that omitted "/" path from generated URIs to use them.
- Noted change to use verify_peer by default
- Moved to a listener, and made to update the attributes array only if
  it is boolean true
- null values passed to elements marked as not required should validate
- trailing whitespace
- Test for null before testing is_string/strlen
@Thinkscape
Copy link
Member Author

@Maks3w I've rebased and moved test case to ZendTest.

I've installed php 5.4.5 but this single preg tests that failed before seems like a heisenbug.

@Thinkscape Thinkscape closed this Aug 8, 2012
@Thinkscape Thinkscape reopened this Aug 8, 2012
@Maks3w
Copy link
Member

Maks3w commented Aug 11, 2012

You have a problem with the git history. Use git reflog to rescue your old state and retry the rebase.

@DASPRiD
Copy link
Member

DASPRiD commented Aug 20, 2012

What's the status of this PR? @Thinkscape do you want me to look into the preg issue?

@Thinkscape
Copy link
Member Author

@DASPRiD Yeah, fire away. Works for me.
I've checked again under 5.4.4 and 5.4.5. Tests pass, so I don't know what was the previous behavior about.

@weierophinney
Copy link
Member

I really, really want to merge this. However, I've spent about 45 minutes trying to resolve merge commits, and it keeps looking like I'm on the verge of breaking something.

If you want to see this in 2.1, can I ask you to please rebase?

@juriansluiman
Copy link
Contributor

I cherry picked the relevant commits (please review that!) of this PR and applied them on top of a new branch. This should make it easier to merge. The new PR is #2399

@Thinkscape
Copy link
Member Author

Just make sure to include the commit with my unit tests.
If they all pass, everything's jolly.

On Fri, Sep 21, 2012 at 2:31 PM, Jurian Sluiman notifications@github.comwrote:

I cherry picked the relevant commits (please review that!) of this PR and
applied them on top of a new branch. This should make it easier to merge.
The new PR is #2399 #2399


Reply to this email directly or view it on GitHubhttps://github.com//pull/2105#issuecomment-8763061.

@DASPRiD
Copy link
Member

DASPRiD commented Sep 21, 2012

Closing this PR, as @juriansluiman made a new one.

@DASPRiD DASPRiD closed this Sep 21, 2012
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet