Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Stdlib\StringUtils #3110

Merged
merged 54 commits into from

3 participants

@marc-mabe

This class provides some basic handling of strings of different character encodings.
It comes with string wrappers for iconv, mbstring, intl (grapheme_* functions / UConverter) and a wrapper for native string functions. Wrapped functions are: strlen, strpos, substr, strpad, wordwrap, convert.

So it will be up to the user which PHP extension will be used supporting the required character encoding and which one is the best for a given character encoding.

Wrapper usage:

The following command returns an instance of the best available string wrapper supporting the given character encoding and if given with support to convert a string of the encoding into the other encoding.
If no wrapper was found an exception will be thrown.

StringUtils::getWrapper('encoding'[, 'encoding to convert to']) : StringWrapperInterface

The returned StringWrapperInterface simply wrappes the string functions -- there will be no error handling because of the heavy use of string functions (like in a loop) this should be up the consumer.

More helpful methods:

StringUtils::isSingleByteEncoding(<string encoding>) : boolean
StringUtils::getSingleByteEncodings() : string[]
StringUtils::isValidUtf8(<string>) : boolean // using preg_match
StringUtils::registerWrapper(<StringWrapperInterface>)
StringUtils::getRegisteredWrappers() : StringWrapperInterface[]

PS:

  • Character encodings are case-insensitive and will be handled in upper-case internally.
  • The default order of registered wrappers are: intl, mbstring, iconv, native
  • Zend\Text\MultiByte will be deprecated and redirected using the string wrapper (all tests has been moved)
marc-mabe added some commits
@marc-mabe marc-mabe initial StringUtils cd09a59
@marc-mabe marc-mabe Merge branch 'master' of git://github.com/zendframework/zf2 into string 6c0f698
@marc-mabe marc-mabe Native string adapter don't need ext/mbstring 906b5c3
@marc-mabe marc-mabe StringUtils: tests, no component deps 2098c00
@marc-mabe marc-mabe Merge branch 'master' of git://github.com/zendframework/zf2 into string 696fbec
@marc-mabe marc-mabe adapter -> wrapper 49d8ec4
@marc-mabe marc-mabe intl string wrapper and some small other changes 29e0da2
@marc-mabe marc-mabe Merge branch 'develop' of git://github.com/zendframework/zf2 into string 4adbb3d
@marc-mabe marc-mabe ZendTest namespace 6b42747
@marc-mabe marc-mabe StringUtils: phpdoc + cs 3dd2d06
@marc-mabe marc-mabe StringUtils: phpdoc + cs b17b3de
@marc-mabe marc-mabe StringUtils: added tests 0819967
@marc-mabe marc-mabe StringUtilsTest: updated phpdoc 1ebab45
@marc-mabe marc-mabe StringUtils: cs 5c89903
@marc-mabe marc-mabe StringUtils: tests + fixes + supported encodings for iconv and mbstring 35b91e6
@marc-mabe marc-mabe StringUtils: wording: charset -> encoding dde5cf5
@marc-mabe marc-mabe fixed wrong typed variable in StringUtils::getWrapper 1d800eb
@marc-mabe marc-mabe StringUtils: cs bdeddca
@marc-mabe marc-mabe StringUtils: implemented basic functionality into AbstractStringWrapp…
…er::convert
9a44514
@marc-mabe marc-mabe StringUtils: hopefully a little better encoding list b831a53
@marc-mabe marc-mabe StringUtils: hopefully a little better encoding list 5f42457
@marc-mabe marc-mabe StringUtils: optimations 1c82b00
@marc-mabe marc-mabe StringUtils: cs 0c998cb
@marc-mabe marc-mabe Updated Zend\Validator to used StringUtils c16b3ef
@marc-mabe marc-mabe Updated Zend\Mvc to used StringUtils 954950d
@marc-mabe marc-mabe StringUtils: MbString wrapper use of 'mb_list_encodings' 7ba3c5f
@marc-mabe marc-mabe Updated Zend\Text to use StringUtils and deprecated Zend\Text\MultiByte 6b82989
@marc-mabe marc-mabe Updated Zend\Feed to use StringUtils d0fa0ad
@marc-mabe marc-mabe Zend\Feed: replaced one iconv_strlen with a string wrapper 0a17816
@marc-mabe marc-mabe FIXME: Converting the euro sign from UTF-8 to ISO-8859-16 using the m…
…bstring extension gives a wrong result
90b0367
@marc-mabe marc-mabe removed trailing spaces 46ae440
@marc-mabe

With PHP-5.5 the intl adapter could handle more encodings supported by UConverter (https://wiki.php.net/rfc/uconverter) but strings need to be encoded to UTF-8 to run methods like strlen, substr ... using the grapheme_* functions

@weierophinney

@marc-mabe I see the PR is still marked WIP -- what's the current status, and when do you foresee completion for review?

@marc-mabe

It's ready for review!

The only one missing is the intl-UConverter but PHP-5.5 is still alpha ;)

@weierophinney
@marc-mabe

already done

...y/Zend/Stdlib/StringWrapper/AbstractStringWrapper.php
((233 lines not shown))
+ if ($padType === \STR_PAD_BOTH) {
+ $lastStringLeft = '';
+ $lastStringRight = '';
+ $repeatCountLeft = $repeatCountRight = ($repeatCount - $repeatCount % 2) / 2;
+
+ $lastStringLength = $lengthOfPadding - 2 * $repeatCountLeft * $padStringLength;
+ $lastStringLeftLength = $lastStringRightLength = floor($lastStringLength / 2);
+ $lastStringRightLength += $lastStringLength % 2;
+
+ $lastStringLeft = $this->substr($padString, 0, $lastStringLeftLength);
+ $lastStringRight = $this->substr($padString, 0, $lastStringRightLength);
+
+ return str_repeat($padString, $repeatCountLeft) . $lastStringLeft
+ . $input
+ . str_repeat($padString, $repeatCountRight) . $lastStringRight;
+ } else {
@weierophinney Owner

This else isn't needed, as the previous block returns on completion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
...y/Zend/Stdlib/StringWrapper/AbstractStringWrapper.php
((238 lines not shown))
+ $lastStringLength = $lengthOfPadding - 2 * $repeatCountLeft * $padStringLength;
+ $lastStringLeftLength = $lastStringRightLength = floor($lastStringLength / 2);
+ $lastStringRightLength += $lastStringLength % 2;
+
+ $lastStringLeft = $this->substr($padString, 0, $lastStringLeftLength);
+ $lastStringRight = $this->substr($padString, 0, $lastStringRightLength);
+
+ return str_repeat($padString, $repeatCountLeft) . $lastStringLeft
+ . $input
+ . str_repeat($padString, $repeatCountRight) . $lastStringRight;
+ } else {
+ $lastString = $this->substr($padString, 0, $lengthOfPadding % $padStringLength);
+
+ if ($padType === \STR_PAD_LEFT) {
+ return str_repeat($padString, $repeatCount) . $lastString . $input;
+ } else {
@weierophinney Owner

This else isn't needed, as the previous block returns on completion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@weierophinney weierophinney commented on the diff
library/Zend/Stdlib/StringWrapper/Iconv.php
((13 lines not shown))
+use Zend\Stdlib\Exception;
+
+/**
+ * @category Zend
+ * @package Zend_Stdlib
+ * @subpackage StringWrapper
+ */
+class Iconv extends AbstractStringWrapper
+{
+ /**
+ * List of supported character sets (upper case)
+ *
+ * @var string[]
+ * @link http://www.gnu.org/software/libiconv/
+ */
+ protected static $encodings = array(
@weierophinney Owner

From what I understand, these encodings could vary from release to release. How do you plan to address that?

The defined encodings are from http://www.gnu.org/software/libiconv/
I couldn't find a way to detect supported encodings. There isn't a function and there isn't informational output by phpinfo.
Do you know a way ?

-> For now the consumer have to extend this class to support more compiled encodings by iconv.

@weierophinney Owner

That's fine -- we should likely detail this in the docs, however, so that individuals can understand that the encodings are hardcoded, and extension is the only way to change the set.

I'll start working on documentation next week.

@DASPRiD Collaborator
DASPRiD added a note

About the formatting:
Even though it looks okay, but if you apply the "parameter" formatting from PSR-2 here, you have to keep one value per line, if all values aren't on the same line.

done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
library/Zend/Stdlib/StringWrapper/MbString.php
((99 lines not shown))
+ * @return int|false
+ */
+ public function strpos($haystack, $needle, $offset = 0)
+ {
+ return mb_strpos($haystack, $needle, $offset, $this->encoding);
+ }
+
+ /**
+ * Convert a string from one character encoding to another
+ *
+ * @param string $str
+ * @param string $toEncoding
+ * @param string $fromEncoding
+ * @return string|false
+ */
+ public function convert($str, $backword = false)
@weierophinney Owner

I think you mean "backward", not "backword" -- but more importantly, it seems like the parameters in the docblock do not match those actually accepted. This is true both in this class and the iconv variant -- please clarify, and indicate the proper arguments in each, as well as clarify what is meant by "backward" (I think you may actually mean something like "reverse").

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
.../Zend/Stdlib/StringWrapper/StringWrapperInterface.php
((27 lines not shown))
+ public static function isSupported($encoding, $convertEncoding = null);
+
+ /**
+ * Get a list of supported character encodings
+ *
+ * @return string[]
+ */
+ public static function getSupportedEncodings();
+
+ /**
+ * Constructor
+ *
+ * @param string $encoding Character encoding working on
+ * @param string|null $convertEncoding Character encoding to convert to
+ */
+ public function __construct($encoding, $convertEncoding = null);
@weierophinney Owner

I'm not a big fan of defining constructors inside interfaces; it makes it more difficult to mock implementations, create decorators, etc. I'd argue also that since you have a setEncoding() method already, this requirement can be removed. That said, I'd argue for adding a setConvertEncoding method to the interface, as it's clear that you intend for that functionality in most implementations.

Thats because it requires the encoding property set to a supported encoding. Else the wrapper needs to check if the encodingproperty is empty / supported on every string function.

I defined setting the encoding and convert encoding properties within one method to be free if a wrapper supports some encodings for basic string functions but not to convert all to all. -> For example the native string wrapper is marked to support all defined single byte encodings but it can't convert all encodings.

@weierophinney Owner

Why not have getEncoding() raise an exception when the value is empty? That would allow you to remove it from the constructor, and resolve the need to check for an encoding each time it's needed (you'd simply say $enc = $this->getEncoding(), and it would raise an exception if not set).

OK after reflecting it a little bit more I'll add getters for both encodings but the getter should only return (no check) because this methods could be needed outside to get the current object state. I'll add the check after getting the value - it's only one if + throw.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
.../Zend/Stdlib/StringWrapper/StringWrapperInterface.php
((73 lines not shown))
+ * Find the position of the first occurrence of a substring in a string
+ *
+ * @param string $haystack
+ * @param string $needle
+ * @param int $offset
+ * @param string $encoding
+ * @return int|false
+ */
+ public function strpos($haystack, $needle, $offset = 0);
+
+ /**
+ * Convert a string from one character encoding to another
+ *
+ * @param string $str
+ * @param boolean $backward
+ * @return string|false
@weierophinney Owner

Find a better word than "backward" -- what you mean to do is reverse encoding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@marc-mabe marc-mabe else isn't needed, as the previous block returns on completion + remo…
…ved call to strlen if length of padding is less than or equal 0
54d97d5
@weierophinney weierophinney commented on the diff
tests/ZendTest/Stdlib/StringUtilsTest.php
((43 lines not shown))
+ array('8bit'),
+ array('ISo-8859-1'),
+ array('ISo-8859-2'),
+ array('ISo-8859-3'),
+ array('ISo-8859-4'),
+ array('ISo-8859-5'),
+ array('ISo-8859-6'),
+ array('ISo-8859-7'),
+ array('ISo-8859-8'),
+ array('ISo-8859-9'),
+ array('ISo-8859-10'),
+ array('ISo-8859-11'),
+ array('ISo-8859-13'),
+ array('ISo-8859-14'),
+ array('ISo-8859-15'),
+ array('ISo-8859-16'),
@weierophinney Owner

I'm guessing you intended for the "o" in the above to be "O" (capitalized). :-)

no, it's to make sure the encodings are case-insensitive - I'll add a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@weierophinney weierophinney commented on the diff
library/Zend/Text/MultiByte.php
((29 lines not shown))
- . $input
- . str_repeat($padString, $repeatCountRight) . $lastStringRight;
- } else {
- $lastString = iconv_substr($padString, 0, $lengthOfPadding % $padStringLength, $charset);
-
- if ($padType === STR_PAD_LEFT) {
- $return = str_repeat($padString, $repeatCount) . $lastString . $input;
- } else {
- $return = $input . str_repeat($padString, $repeatCount) . $lastString;
- }
- }
- }
+ trigger_error(sprintf(
+ "This method is deprecated, please use '%s' instead",
+ 'Zend\Stdlib\StringUtils::getWrapper(<charset>)->strPad'
+ ), E_USER_DEPRECATED);
@weierophinney Owner

If you're going to do this, I'd also add an @deprecated annotation to the class.

wordWrap already had one - I added it for strPad, too

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@weierophinney weierophinney was assigned
@weierophinney

@marc-mabe Looking good -- address the comments I've made above, and I think we can review for inclusion.

@marc-mabe

@weierophinney: I have fixed or commented your notes

@marc-mabe

Added the methods getEncoding() and getConvertEncoding and removed the constructor from interface.
Now the wrappers has a supported default encoding UTF-8 for all wrappers else Native - there it's ASCII

Because the recommended way to instantiate a wrapper is to use StringUtils::getWrapper() which defines the given character encodings. The encoding will be used but unconventional instantiations doesn't have issues with an unset encoding because of the default encodings defined.

library/Zend/Feed/Writer/Extension/ITunes/Entry.php
@@ -34,6 +36,18 @@ class Entry
protected $encoding = 'UTF-8';
/**
+ * The used string wrapper supporting encoding
+ *
+ * @var StringWrapperInterface
+ */
+ protected $stringWrapper;
+
+ public function __construct()
+ {
+ $this->stringWrapper = StringUtils::getWrapper($this->getEncoding());
@weierophinney Owner

Considering there's a default, and getEncoding() simply returns the property, I'd just use $this->encoding here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
library/Zend/Feed/Writer/Extension/ITunes/Feed.php
@@ -34,6 +36,21 @@ class Feed
protected $encoding = 'UTF-8';
/**
+ * The used string wrapper supporting encoding
+ *
+ * @var StringWrapperInterface
+ */
+ protected $stringWrapper;
+
+ /**
+ * Constructor
+ */
+ public function __construct()
+ {
+ $this->stringWrapper = StringUtils::getWrapper($this->getEncoding());
@weierophinney Owner

Same comment here as for the feed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
library/Zend/ProgressBar/Adapter/Console.php
@@ -438,7 +439,12 @@ public function notify($current, $max, $percent, $timeTaken, $timeRemaining, $te
break;
case self::ELEMENT_TEXT:
- $renderedElements[] = \Zend\Text\MultiByte::strPad(substr($text, 0, $this->textWidth), $this->textWidth, ' ', STR_PAD_RIGHT, $this->charset);
+ $renderedElements[] = StringUtils::getWrapper($this->charset)->strPad(
+ substr($text, 0, $this->textWidth),
+ $this->textWidth,
+ ' ',
+ \STR_PAD_RIGHT
@weierophinney Owner

You don't need to do the global namespace prefix here; constants are dereferenced from the global namespace automagically.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@weierophinney weierophinney commented on the diff
...Zend/Stdlib/Exception/ExtensionNotLoadedException.php
@@ -0,0 +1,7 @@
+<?php
+
@weierophinney Owner

Needs file and class level docblocks.

done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@weierophinney weierophinney commented on the diff
library/Zend/Stdlib/Exception/RuntimeException.php
@@ -0,0 +1,7 @@
+<?php
+
@weierophinney Owner

Needs file and class level docblocks.

done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@weierophinney weierophinney commented on the diff
library/Zend/Stdlib/StringUtils.php
((82 lines not shown))
+ public static function registerWrapper($wrapper)
+ {
+ $wrapper = (string) $wrapper;
+ if (!in_array($wrapper, static::$wrapperRegistry, true)) {
+ static::$wrapperRegistry[] = $wrapper;
+ }
+ }
+
+ /**
+ * Unregister a string wrapper class
+ *
+ * @param string $wrapper
+ * @return void
+ */
+ public static function unregisterWrapper($wrapper)
+ {
@weierophinney Owner

We should likely also have a method to clear the wrapper registry (for testing purposes).

added StringUtils::resetRegisteredWrappers()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
library/Zend/Text/MultiByte.php
((5 lines not shown))
*/
- public static function strPad($input, $padLength, $padString = ' ', $padType = STR_PAD_RIGHT, $charset = 'utf-8')
+ public static function strPad($input, $padLength, $padString = ' ', $padType = \STR_PAD_RIGHT, $charset = 'utf-8')
@weierophinney Owner

The global qualifier for the constant is not necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@weierophinney

@marc-mabe Almost there!

...Test/Stdlib/StringWrapper/CommonStringWrapperTest.php
((126 lines not shown))
+ $this->markTestSkipped("Encoding {$encoding} or {$convertEncoding} not supported");
+ }
+
+ $result = $wrapper->convert($str);
+ $this->assertSame($expected, $result);
+
+ // backword
+ $result = $wrapper->convert($expected, true);
+ $this->assertSame($str, $result);
+ }
+
+ public function wordWrapProvider()
+ {
+ return array(
+ // Standard cut tests
+ array('utf-8', 'äbüöcß', 2, ' ', true,
@DASPRiD Collaborator
DASPRiD added a note

Within data providers, it helps to give every data-set a descriptive string key (see route tests for instance). Those will be given in case of an error, instead of just the numeric index. Since these are converted tests, it's fine to use the original test names (lowercase-dashed).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
...y/Zend/Stdlib/StringWrapper/AbstractStringWrapper.php
((152 lines not shown))
+ * @return string|false
+ */
+ public function wordWrap($string, $width = 75, $break = "\n", $cut = false)
+ {
+ $string = (string) $string;
+ if ($string === '') {
+ return '';
+ }
+
+ $break = (string) $break;
+ if ($break === '') {
+ throw new Exception\InvalidArgumentException('Break string cannot be empty');
+ }
+
+ $width = (int) $width;
+ $cut = (bool) $cut;
@DASPRiD Collaborator
DASPRiD added a note

This is not directly neccessary, as we are not doing any strict comparision on $cut.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@DASPRiD DASPRiD commented on the diff
...y/Zend/Stdlib/StringWrapper/AbstractStringWrapper.php
((60 lines not shown))
+ * @param string $encoding The character encoding to work with
+ * @param string|null $convertEncoding The character encoding to convert to
+ * @return StringWrapperInterface
+ */
+ public function setEncoding($encoding, $convertEncoding = null)
+ {
+ $supportedEncodings = static::getSupportedEncodings();
+
+ $encodingUpper = strtoupper($encoding);
+ if (!in_array($encodingUpper, $supportedEncodings)) {
+ throw new Exception\InvalidArgumentException(
+ 'Wrapper doesn\'t support character encoding "' . $encoding . '"'
+ );
+ }
+
+
@DASPRiD Collaborator
DASPRiD added a note

Redundant blank line.

removed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
library/Zend/Stdlib/StringWrapper/Iconv.php
((191 lines not shown))
+ {
+ $encoding = $this->getEncoding();
+ $convertEncoding = $this->getConvertEncoding();
+ if ($convertEncoding === null) {
+ throw new Exception\LogicException(
+ 'No convert encoding defined'
+ );
+ }
+
+ if ($encoding === $convertEncoding) {
+ return $str;
+ }
+
+ $fromEncoding = $reverse ? $convertEncoding : $encoding;
+ $toEncoding = $reverse ? $encoding : $convertEncoding;
+ return iconv($fromEncoding, $toEncoding, $str);
@DASPRiD Collaborator
DASPRiD added a note

What about //translit and //ignore?

Such functionality isn't supported yet because it needs to be available by all wrappers. Same with strto[lower|upper].

@DASPRiD Collaborator
DASPRiD added a note

So in cases those are required we fallback to direct iconv usage in the core?

//TRANSLIT - than you need iconv explicit
//IGNORE - I tested converting an utf-8 string with one invalid character into iso-8859-1:

  • mbstring ignores the invalid character and doesn't trigger an error
  • iconv stops converting on the invalid character and triggers a notice. With //IGNORE it ignores the invalid character but triggers a notice, too

So I could add //IGNORE automatically on every convert because truncating the string to convert helps nobody.

What do you think ?

@weierophinney Owner

Sounds reasonable.

added //IGNORE to the destination encoding on iconv wrapper + test to not truncate on converting for all wrappers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@marc-mabe

hopefully done now

library/Zend/Stdlib/StringWrapper/Iconv.php
((143 lines not shown))
+ 'UCS-4',
+ 'UCS-4BE',
+ 'UCS-4LE',
+ 'UTF-16',
+ 'UTF-16BE',
+ 'UTF-16LE',
+ 'UTF-32',
+ 'UTF-32BE',
+ 'UTF-32LE',
+ 'UTF-7',
+ 'C99',
+ 'JAVA',
+
+ // Full Unicode, in terms of uint16_t or uint32_t (with machine dependent endianness and alignment)
+ // 'UCS-2-INTERNAL',
+ // 'UCS-4-INTERNAL',
@DASPRiD Collaborator
DASPRiD added a note

It should either be noted why those are commented out or those lines should be removed completely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@DASPRiD DASPRiD commented on the diff
...Test/Stdlib/StringWrapper/CommonStringWrapperTest.php
((124 lines not shown))
+ $wrapper = $this->getWrapper($encoding, $convertEncoding);
+ if (!$wrapper) {
+ $this->markTestSkipped("Encoding {$encoding} or {$convertEncoding} not supported");
+ }
+
+ $result = $wrapper->convert($str);
+ $this->assertSame($expected, $result);
+
+ // backword
+ $result = $wrapper->convert($expected, true);
+ $this->assertSame($str, $result);
+ }
+
+ public function wordWrapProvider()
+ {
+ return array(
@DASPRiD Collaborator
DASPRiD added a note

For some reason, my comment wasn't saved here yesterday:

Use string keys for all data sets (e.g. the original test names, lowercase-dashed). Those will be used for reporting failures, instead of the numeric index, which helps a lot to also understand what the data set is actually testing.

Now I understand what you mean - I already added the orig. test method names but not as array keys -> updated

@DASPRiD Collaborator
DASPRiD added a note

Since this is already the word wrap data provider, I'd personally remove the redundant leading "word-wrap-" from the data set names :)

removed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@weierophinney

@marc-mabe I think with the addition of the //IGNORE pragma, this will be ready; I'd add it myself, but I'm not sure I understand exactly which argument and/or which wrappers to add it to. Also, as @DASPRiD notes, remove the "word-wrap-" prefix from the one set of data providers.

Many thanks in advance -- this looks like it will be quite useful!

@marc-mabe

@weierophinney: I removed the word-wrap- prefix as @DASPRiD noted and also added //IGNORE to the destination encoding of the iconv wrapper + tests

@weierophinney

I get 2 failures when I run tests; details are below. These are with PHP 5.4.9, using iconv v2.15 and libmbfl v1.3.2.

1) ZendTest\Stdlib\StringWrapper\IconvTest::testConvertDontSubstringsOnInvalidCharacter
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-foo x bar
+

tests/ZendTest/Stdlib/StringWrapper/CommonStringWrapperTest.php:151

2) ZendTest\Stdlib\StringWrapper\MbStringTest::testConvertDontSubstringsOnInvalidCharacter
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-foo x bar
+foo ?x bar

tests/ZendTest/Stdlib/StringWrapper/CommonStringWrapperTest.php:151

Travis doesn't report errors, so I'm going to go ahead and merge, but wanted to note the potential conflicts in versions.

@weierophinney weierophinney merged commit 0c8ad71 into from
@marc-mabe

@weierophinney:

  • mbstirng sets the configured substitution character - it looks like this character has been changed
    • the used character could be detected with mb_substitute_character to fix the test
    • I this the behavior is up to mbstring and the wrapper itself don't needs a change
  • iconv completely breaks conversion on an invalid character of input string
    • it looks like the //IGNORE addition is for unsupported character by output encoding and the behavior on invalid input characters has been changed to return an empty string now :(
    • This can't simply changed or handled without a version check
    • Zend\Escaper could be affected, too - I'm not sure 100%

I added the //IGNORE addition and the test to make sure converting will not be truncated on an invalid input character. Now this is the case for iconv and I don't see a workaround.

Possibilities:

  • remove the test
  • remove the test and the //IGNORE addition
  • changing the behavior to throw an exception or return false in this case (on all wrappers) but this requires additional checks and workarounds which slows down the function and we could ran into other issues not visible now

I personally prefer to simply remove the test, leave the //IGNORE addition and describe it on docs.

@weierophinney

@marc-mabe Makes sense -- give me a new PR, and I'll merge.

@marc-mabe marc-mabe referenced this pull request from a commit in marc-mabe/zf2
Marc Bennewitz removed test failing since PHP>=5.4
noted in #3110 (which added the test)
ceb37fa
@katalonec katalonec referenced this pull request from a commit in katalonec/zf2
@weierophinney weierophinney Merge branch 'feature/3110' into develop
Close #3110
1ca5cb9
@ghost Unknown referenced this pull request from a commit
@weierophinney weierophinney Merge branch 'feature/3110' into develop
Close #3110
548ff6a
@ghost Unknown referenced this pull request from a commit
Marc Bennewitz removed test failing since PHP>=5.4
noted in #3110 (which added the test)
2fa7578
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Jun 15, 2012
  1. @marc-mabe

    initial StringUtils

    marc-mabe authored
Commits on Jul 7, 2012
  1. @marc-mabe
Commits on Jul 8, 2012
  1. @marc-mabe
  2. @marc-mabe
  3. @marc-mabe
  4. @marc-mabe

    adapter -> wrapper

    marc-mabe authored
Commits on Jul 13, 2012
  1. @marc-mabe
Commits on Nov 26, 2012
  1. @marc-mabe
  2. @marc-mabe

    ZendTest namespace

    marc-mabe authored
  3. @marc-mabe

    StringUtils: phpdoc + cs

    marc-mabe authored
Commits on Nov 27, 2012
  1. @marc-mabe

    StringUtils: phpdoc + cs

    marc-mabe authored
  2. @marc-mabe

    StringUtils: added tests

    marc-mabe authored
  3. @marc-mabe
  4. @marc-mabe

    StringUtils: cs

    marc-mabe authored
Commits on Nov 29, 2012
  1. @marc-mabe
  2. @marc-mabe
  3. @marc-mabe
  4. @marc-mabe

    StringUtils: cs

    marc-mabe authored
  5. @marc-mabe
  6. @marc-mabe
  7. @marc-mabe
  8. @marc-mabe

    StringUtils: optimations

    marc-mabe authored
  9. @marc-mabe

    StringUtils: cs

    marc-mabe authored
  10. @marc-mabe
  11. @marc-mabe
  12. @marc-mabe
  13. @marc-mabe
  14. @marc-mabe
Commits on Nov 30, 2012
  1. @marc-mabe
  2. @marc-mabe

    FIXME: Converting the euro sign from UTF-8 to ISO-8859-16 using the m…

    marc-mabe authored
    …bstring extension gives a wrong result
  3. @marc-mabe

    removed trailing spaces

    marc-mabe authored
Commits on Dec 31, 2012
  1. @marc-mabe

    Changed API to only check given encoding once and to better support t…

    marc-mabe authored
    …he new Intl-UConverter of PHP 5.5
  2. @marc-mabe
Commits on Jan 2, 2013
  1. @marc-mabe

    It's 2013

    marc-mabe authored
Commits on Jan 3, 2013
  1. @marc-mabe

    else isn't needed, as the previous block returns on completion + remo…

    marc-mabe authored
    …ved call to strlen if length of padding is less than or equal 0
  2. @marc-mabe
  3. @marc-mabe

    phpdoc

    marc-mabe authored
  4. @marc-mabe

    Added missing @deprecated

    marc-mabe authored
  5. @marc-mabe
  6. @marc-mabe
  7. @marc-mabe
  8. @marc-mabe

    fixed phpdoc

    marc-mabe authored
Commits on Jan 6, 2013
  1. @marc-mabe

    psr

    marc-mabe authored
  2. @marc-mabe
  3. @marc-mabe
  4. @marc-mabe

    just use $this->encoding

    marc-mabe authored
  5. @marc-mabe
  6. @marc-mabe
  7. @marc-mabe
  8. @marc-mabe
  9. @marc-mabe
  10. @marc-mabe
Commits on Jan 7, 2013
  1. @marc-mabe
  2. @marc-mabe
Something went wrong with that request. Please try again.