Stdlib\StringUtils #3110

Merged
merged 54 commits into from Jan 7, 2013

Conversation

Projects
None yet
3 participants
Member

marc-mabe commented Nov 30, 2012

This class provides some basic handling of strings of different character encodings.
It comes with string wrappers for iconv, mbstring, intl (grapheme_* functions / UConverter) and a wrapper for native string functions. Wrapped functions are: strlen, strpos, substr, strpad, wordwrap, convert.

So it will be up to the user which PHP extension will be used supporting the required character encoding and which one is the best for a given character encoding.

Wrapper usage:

The following command returns an instance of the best available string wrapper supporting the given character encoding and if given with support to convert a string of the encoding into the other encoding.
If no wrapper was found an exception will be thrown.

StringUtils::getWrapper('encoding'[, 'encoding to convert to']) : StringWrapperInterface

The returned StringWrapperInterface simply wrappes the string functions -- there will be no error handling because of the heavy use of string functions (like in a loop) this should be up the consumer.

More helpful methods:

StringUtils::isSingleByteEncoding(<string encoding>) : boolean
StringUtils::getSingleByteEncodings() : string[]
StringUtils::isValidUtf8(<string>) : boolean // using preg_match
StringUtils::registerWrapper(<StringWrapperInterface>)
StringUtils::getRegisteredWrappers() : StringWrapperInterface[]

PS:

  • Character encodings are case-insensitive and will be handled in upper-case internally.
  • The default order of registered wrappers are: intl, mbstring, iconv, native
  • Zend\Text\MultiByte will be deprecated and redirected using the string wrapper
    (all tests has been moved)

marc-mabe added some commits Jun 15, 2012

@marc-mabe marc-mabe initial StringUtils cd09a59
@marc-mabe marc-mabe Merge branch 'master' of git://github.com/zendframework/zf2 into string 6c0f698
@marc-mabe marc-mabe Native string adapter don't need ext/mbstring 906b5c3
@marc-mabe marc-mabe StringUtils: tests, no component deps 2098c00
@marc-mabe marc-mabe Merge branch 'master' of git://github.com/zendframework/zf2 into string 696fbec
@marc-mabe marc-mabe adapter -> wrapper 49d8ec4
@marc-mabe marc-mabe intl string wrapper and some small other changes 29e0da2
@marc-mabe marc-mabe Merge branch 'develop' of git://github.com/zendframework/zf2 into string 4adbb3d
@marc-mabe marc-mabe ZendTest namespace 6b42747
@marc-mabe marc-mabe StringUtils: phpdoc + cs 3dd2d06
@marc-mabe marc-mabe StringUtils: phpdoc + cs b17b3de
@marc-mabe marc-mabe StringUtils: added tests 0819967
@marc-mabe marc-mabe StringUtilsTest: updated phpdoc 1ebab45
@marc-mabe marc-mabe StringUtils: cs 5c89903
@marc-mabe marc-mabe StringUtils: tests + fixes + supported encodings for iconv and mbstring 35b91e6
@marc-mabe marc-mabe StringUtils: wording: charset -> encoding dde5cf5
@marc-mabe marc-mabe fixed wrong typed variable in StringUtils::getWrapper 1d800eb
@marc-mabe marc-mabe StringUtils: cs bdeddca
@marc-mabe marc-mabe StringUtils: implemented basic functionality into AbstractStringWrapp…
…er::convert
9a44514
@marc-mabe marc-mabe StringUtils: hopefully a little better encoding list b831a53
@marc-mabe marc-mabe StringUtils: hopefully a little better encoding list 5f42457
@marc-mabe marc-mabe StringUtils: optimations 1c82b00
@marc-mabe marc-mabe StringUtils: cs 0c998cb
@marc-mabe marc-mabe Updated Zend\Validator to used StringUtils c16b3ef
@marc-mabe marc-mabe Updated Zend\Mvc to used StringUtils 954950d
@marc-mabe marc-mabe StringUtils: MbString wrapper use of 'mb_list_encodings' 7ba3c5f
@marc-mabe marc-mabe Updated Zend\Text to use StringUtils and deprecated Zend\Text\MultiByte 6b82989
@marc-mabe marc-mabe Updated Zend\Feed to use StringUtils d0fa0ad
@marc-mabe marc-mabe Zend\Feed: replaced one iconv_strlen with a string wrapper 0a17816
@marc-mabe marc-mabe FIXME: Converting the euro sign from UTF-8 to ISO-8859-16 using the m…
…bstring extension gives a wrong result
90b0367
@marc-mabe marc-mabe removed trailing spaces 46ae440
Member

marc-mabe commented Dec 11, 2012

With PHP-5.5 the intl adapter could handle more encodings supported by UConverter (https://wiki.php.net/rfc/uconverter) but strings need to be encoded to UTF-8 to run methods like strlen, substr ... using the grapheme_* functions

Owner

weierophinney commented Jan 2, 2013

@marc-mabe I see the PR is still marked WIP -- what's the current status, and when do you foresee completion for review?

Member

marc-mabe commented Jan 2, 2013

It's ready for review!

The only one missing is the intl-UConverter but PHP-5.5 is still alpha ;)

Owner

weierophinney commented Jan 2, 2013

Okay, so update the title to remove the WIP designation. :-)

On Wednesday, January 2, 2013, Marc Bennewitz wrote:

It's ready for review!

The only one missing is the intl-UConverter but PHP-5.5 is still alpha ;)


Reply to this email directly or view it on GitHubhttps://github.com/zendframework/zf2/pull/3110#issuecomment-11817181.

Matthew Weier O'Phinney
matthew@weierophinney.net
http://mwop.net/

Member

marc-mabe commented Jan 2, 2013

already done

@weierophinney weierophinney commented on an outdated diff Jan 3, 2013

...y/Zend/Stdlib/StringWrapper/AbstractStringWrapper.php
+ if ($padType === \STR_PAD_BOTH) {
+ $lastStringLeft = '';
+ $lastStringRight = '';
+ $repeatCountLeft = $repeatCountRight = ($repeatCount - $repeatCount % 2) / 2;
+
+ $lastStringLength = $lengthOfPadding - 2 * $repeatCountLeft * $padStringLength;
+ $lastStringLeftLength = $lastStringRightLength = floor($lastStringLength / 2);
+ $lastStringRightLength += $lastStringLength % 2;
+
+ $lastStringLeft = $this->substr($padString, 0, $lastStringLeftLength);
+ $lastStringRight = $this->substr($padString, 0, $lastStringRightLength);
+
+ return str_repeat($padString, $repeatCountLeft) . $lastStringLeft
+ . $input
+ . str_repeat($padString, $repeatCountRight) . $lastStringRight;
+ } else {
@weierophinney

weierophinney Jan 3, 2013

Owner

This else isn't needed, as the previous block returns on completion.

@weierophinney weierophinney commented on an outdated diff Jan 3, 2013

...y/Zend/Stdlib/StringWrapper/AbstractStringWrapper.php
+ $lastStringLength = $lengthOfPadding - 2 * $repeatCountLeft * $padStringLength;
+ $lastStringLeftLength = $lastStringRightLength = floor($lastStringLength / 2);
+ $lastStringRightLength += $lastStringLength % 2;
+
+ $lastStringLeft = $this->substr($padString, 0, $lastStringLeftLength);
+ $lastStringRight = $this->substr($padString, 0, $lastStringRightLength);
+
+ return str_repeat($padString, $repeatCountLeft) . $lastStringLeft
+ . $input
+ . str_repeat($padString, $repeatCountRight) . $lastStringRight;
+ } else {
+ $lastString = $this->substr($padString, 0, $lengthOfPadding % $padStringLength);
+
+ if ($padType === \STR_PAD_LEFT) {
+ return str_repeat($padString, $repeatCount) . $lastString . $input;
+ } else {
@weierophinney

weierophinney Jan 3, 2013

Owner

This else isn't needed, as the previous block returns on completion.

@weierophinney weierophinney commented on the diff Jan 3, 2013

library/Zend/Stdlib/StringWrapper/Iconv.php
+use Zend\Stdlib\Exception;
+
+/**
+ * @category Zend
+ * @package Zend_Stdlib
+ * @subpackage StringWrapper
+ */
+class Iconv extends AbstractStringWrapper
+{
+ /**
+ * List of supported character sets (upper case)
+ *
+ * @var string[]
+ * @link http://www.gnu.org/software/libiconv/
+ */
+ protected static $encodings = array(
@weierophinney

weierophinney Jan 3, 2013

Owner

From what I understand, these encodings could vary from release to release. How do you plan to address that?

@marc-mabe

marc-mabe Jan 3, 2013

Member

The defined encodings are from http://www.gnu.org/software/libiconv/
I couldn't find a way to detect supported encodings. There isn't a function and there isn't informational output by phpinfo.
Do you know a way ?

-> For now the consumer have to extend this class to support more compiled encodings by iconv.

@weierophinney

weierophinney Jan 3, 2013

Owner

That's fine -- we should likely detail this in the docs, however, so that individuals can understand that the encodings are hardcoded, and extension is the only way to change the set.

@marc-mabe

marc-mabe Jan 3, 2013

Member

I'll start working on documentation next week.

@DASPRiD

DASPRiD Jan 6, 2013

Member

About the formatting:
Even though it looks okay, but if you apply the "parameter" formatting from PSR-2 here, you have to keep one value per line, if all values aren't on the same line.

@weierophinney weierophinney commented on an outdated diff Jan 3, 2013

library/Zend/Stdlib/StringWrapper/MbString.php
+ * @return int|false
+ */
+ public function strpos($haystack, $needle, $offset = 0)
+ {
+ return mb_strpos($haystack, $needle, $offset, $this->encoding);
+ }
+
+ /**
+ * Convert a string from one character encoding to another
+ *
+ * @param string $str
+ * @param string $toEncoding
+ * @param string $fromEncoding
+ * @return string|false
+ */
+ public function convert($str, $backword = false)
@weierophinney

weierophinney Jan 3, 2013

Owner

I think you mean "backward", not "backword" -- but more importantly, it seems like the parameters in the docblock do not match those actually accepted. This is true both in this class and the iconv variant -- please clarify, and indicate the proper arguments in each, as well as clarify what is meant by "backward" (I think you may actually mean something like "reverse").

@weierophinney weierophinney and 1 other commented on an outdated diff Jan 3, 2013

.../Zend/Stdlib/StringWrapper/StringWrapperInterface.php
+ public static function isSupported($encoding, $convertEncoding = null);
+
+ /**
+ * Get a list of supported character encodings
+ *
+ * @return string[]
+ */
+ public static function getSupportedEncodings();
+
+ /**
+ * Constructor
+ *
+ * @param string $encoding Character encoding working on
+ * @param string|null $convertEncoding Character encoding to convert to
+ */
+ public function __construct($encoding, $convertEncoding = null);
@weierophinney

weierophinney Jan 3, 2013

Owner

I'm not a big fan of defining constructors inside interfaces; it makes it more difficult to mock implementations, create decorators, etc. I'd argue also that since you have a setEncoding() method already, this requirement can be removed. That said, I'd argue for adding a setConvertEncoding method to the interface, as it's clear that you intend for that functionality in most implementations.

@marc-mabe

marc-mabe Jan 3, 2013

Member

Thats because it requires the encoding property set to a supported encoding. Else the wrapper needs to check if the encodingproperty is empty / supported on every string function.

I defined setting the encoding and convert encoding properties within one method to be free if a wrapper supports some encodings for basic string functions but not to convert all to all. -> For example the native string wrapper is marked to support all defined single byte encodings but it can't convert all encodings.

@weierophinney

weierophinney Jan 3, 2013

Owner

Why not have getEncoding() raise an exception when the value is empty? That would allow you to remove it from the constructor, and resolve the need to check for an encoding each time it's needed (you'd simply say $enc = $this->getEncoding(), and it would raise an exception if not set).

@marc-mabe

marc-mabe Jan 3, 2013

Member

OK after reflecting it a little bit more I'll add getters for both encodings but the getter should only return (no check) because this methods could be needed outside to get the current object state. I'll add the check after getting the value - it's only one if + throw.

@weierophinney weierophinney commented on an outdated diff Jan 3, 2013

.../Zend/Stdlib/StringWrapper/StringWrapperInterface.php
+ * Find the position of the first occurrence of a substring in a string
+ *
+ * @param string $haystack
+ * @param string $needle
+ * @param int $offset
+ * @param string $encoding
+ * @return int|false
+ */
+ public function strpos($haystack, $needle, $offset = 0);
+
+ /**
+ * Convert a string from one character encoding to another
+ *
+ * @param string $str
+ * @param boolean $backward
+ * @return string|false
@weierophinney

weierophinney Jan 3, 2013

Owner

Find a better word than "backward" -- what you mean to do is reverse encoding.

@marc-mabe marc-mabe else isn't needed, as the previous block returns on completion + remo…
…ved call to strlen if length of padding is less than or equal 0
54d97d5

@weierophinney weierophinney commented on the diff Jan 3, 2013

tests/ZendTest/Stdlib/StringUtilsTest.php
+ array('8bit'),
+ array('ISo-8859-1'),
+ array('ISo-8859-2'),
+ array('ISo-8859-3'),
+ array('ISo-8859-4'),
+ array('ISo-8859-5'),
+ array('ISo-8859-6'),
+ array('ISo-8859-7'),
+ array('ISo-8859-8'),
+ array('ISo-8859-9'),
+ array('ISo-8859-10'),
+ array('ISo-8859-11'),
+ array('ISo-8859-13'),
+ array('ISo-8859-14'),
+ array('ISo-8859-15'),
+ array('ISo-8859-16'),
@weierophinney

weierophinney Jan 3, 2013

Owner

I'm guessing you intended for the "o" in the above to be "O" (capitalized). :-)

@marc-mabe

marc-mabe Jan 3, 2013

Member

no, it's to make sure the encodings are case-insensitive - I'll add a comment

@weierophinney weierophinney commented on the diff Jan 3, 2013

library/Zend/Text/MultiByte.php
- . $input
- . str_repeat($padString, $repeatCountRight) . $lastStringRight;
- } else {
- $lastString = iconv_substr($padString, 0, $lengthOfPadding % $padStringLength, $charset);
-
- if ($padType === STR_PAD_LEFT) {
- $return = str_repeat($padString, $repeatCount) . $lastString . $input;
- } else {
- $return = $input . str_repeat($padString, $repeatCount) . $lastString;
- }
- }
- }
+ trigger_error(sprintf(
+ "This method is deprecated, please use '%s' instead",
+ 'Zend\Stdlib\StringUtils::getWrapper(<charset>)->strPad'
+ ), E_USER_DEPRECATED);
@weierophinney

weierophinney Jan 3, 2013

Owner

If you're going to do this, I'd also add an @deprecated annotation to the class.

@marc-mabe

marc-mabe Jan 3, 2013

Member

wordWrap already had one - I added it for strPad, too

weierophinney was assigned Jan 3, 2013

Owner

weierophinney commented Jan 3, 2013

@marc-mabe Looking good -- address the comments I've made above, and I think we can review for inclusion.

Member

marc-mabe commented Jan 3, 2013

@weierophinney: I have fixed or commented your notes

Member

marc-mabe commented Jan 3, 2013

Added the methods getEncoding() and getConvertEncoding and removed the constructor from interface.
Now the wrappers has a supported default encoding UTF-8 for all wrappers else Native - there it's ASCII

Because the recommended way to instantiate a wrapper is to use StringUtils::getWrapper() which defines the given character encodings. The encoding will be used but unconventional instantiations doesn't have issues with an unset encoding because of the default encodings defined.

@weierophinney weierophinney commented on an outdated diff Jan 4, 2013

library/Zend/Feed/Writer/Extension/ITunes/Entry.php
@@ -34,6 +36,18 @@ class Entry
protected $encoding = 'UTF-8';
/**
+ * The used string wrapper supporting encoding
+ *
+ * @var StringWrapperInterface
+ */
+ protected $stringWrapper;
+
+ public function __construct()
+ {
+ $this->stringWrapper = StringUtils::getWrapper($this->getEncoding());
@weierophinney

weierophinney Jan 4, 2013

Owner

Considering there's a default, and getEncoding() simply returns the property, I'd just use $this->encoding here.

@weierophinney weierophinney commented on an outdated diff Jan 4, 2013

library/Zend/Feed/Writer/Extension/ITunes/Feed.php
@@ -34,6 +36,21 @@ class Feed
protected $encoding = 'UTF-8';
/**
+ * The used string wrapper supporting encoding
+ *
+ * @var StringWrapperInterface
+ */
+ protected $stringWrapper;
+
+ /**
+ * Constructor
+ */
+ public function __construct()
+ {
+ $this->stringWrapper = StringUtils::getWrapper($this->getEncoding());
@weierophinney

weierophinney Jan 4, 2013

Owner

Same comment here as for the feed.

@weierophinney weierophinney commented on an outdated diff Jan 4, 2013

library/Zend/ProgressBar/Adapter/Console.php
@@ -438,7 +439,12 @@ public function notify($current, $max, $percent, $timeTaken, $timeRemaining, $te
break;
case self::ELEMENT_TEXT:
- $renderedElements[] = \Zend\Text\MultiByte::strPad(substr($text, 0, $this->textWidth), $this->textWidth, ' ', STR_PAD_RIGHT, $this->charset);
+ $renderedElements[] = StringUtils::getWrapper($this->charset)->strPad(
+ substr($text, 0, $this->textWidth),
+ $this->textWidth,
+ ' ',
+ \STR_PAD_RIGHT
@weierophinney

weierophinney Jan 4, 2013

Owner

You don't need to do the global namespace prefix here; constants are dereferenced from the global namespace automagically.

@weierophinney weierophinney commented on the diff Jan 4, 2013

...Zend/Stdlib/Exception/ExtensionNotLoadedException.php
@@ -0,0 +1,7 @@
+<?php
+
@weierophinney

weierophinney Jan 4, 2013

Owner

Needs file and class level docblocks.

@weierophinney weierophinney commented on the diff Jan 4, 2013

library/Zend/Stdlib/Exception/RuntimeException.php
@@ -0,0 +1,7 @@
+<?php
+
@weierophinney

weierophinney Jan 4, 2013

Owner

Needs file and class level docblocks.

@weierophinney weierophinney commented on the diff Jan 4, 2013

library/Zend/Stdlib/StringUtils.php
+ public static function registerWrapper($wrapper)
+ {
+ $wrapper = (string) $wrapper;
+ if (!in_array($wrapper, static::$wrapperRegistry, true)) {
+ static::$wrapperRegistry[] = $wrapper;
+ }
+ }
+
+ /**
+ * Unregister a string wrapper class
+ *
+ * @param string $wrapper
+ * @return void
+ */
+ public static function unregisterWrapper($wrapper)
+ {
@weierophinney

weierophinney Jan 4, 2013

Owner

We should likely also have a method to clear the wrapper registry (for testing purposes).

@marc-mabe

marc-mabe Jan 6, 2013

Member

added StringUtils::resetRegisteredWrappers()

@weierophinney weierophinney commented on an outdated diff Jan 4, 2013

library/Zend/Text/MultiByte.php
*/
- public static function strPad($input, $padLength, $padString = ' ', $padType = STR_PAD_RIGHT, $charset = 'utf-8')
+ public static function strPad($input, $padLength, $padString = ' ', $padType = \STR_PAD_RIGHT, $charset = 'utf-8')
@weierophinney

weierophinney Jan 4, 2013

Owner

The global qualifier for the constant is not necessary.

Owner

weierophinney commented Jan 4, 2013

@marc-mabe Almost there!

@DASPRiD DASPRiD commented on an outdated diff Jan 6, 2013

...Test/Stdlib/StringWrapper/CommonStringWrapperTest.php
+ $this->markTestSkipped("Encoding {$encoding} or {$convertEncoding} not supported");
+ }
+
+ $result = $wrapper->convert($str);
+ $this->assertSame($expected, $result);
+
+ // backword
+ $result = $wrapper->convert($expected, true);
+ $this->assertSame($str, $result);
+ }
+
+ public function wordWrapProvider()
+ {
+ return array(
+ // Standard cut tests
+ array('utf-8', 'äbüöcß', 2, ' ', true,
@DASPRiD

DASPRiD Jan 6, 2013

Member

Within data providers, it helps to give every data-set a descriptive string key (see route tests for instance). Those will be given in case of an error, instead of just the numeric index. Since these are converted tests, it's fine to use the original test names (lowercase-dashed).

@DASPRiD DASPRiD commented on an outdated diff Jan 6, 2013

...y/Zend/Stdlib/StringWrapper/AbstractStringWrapper.php
+ * @return string|false
+ */
+ public function wordWrap($string, $width = 75, $break = "\n", $cut = false)
+ {
+ $string = (string) $string;
+ if ($string === '') {
+ return '';
+ }
+
+ $break = (string) $break;
+ if ($break === '') {
+ throw new Exception\InvalidArgumentException('Break string cannot be empty');
+ }
+
+ $width = (int) $width;
+ $cut = (bool) $cut;
@DASPRiD

DASPRiD Jan 6, 2013

Member

This is not directly neccessary, as we are not doing any strict comparision on $cut.

@DASPRiD DASPRiD commented on the diff Jan 6, 2013

...y/Zend/Stdlib/StringWrapper/AbstractStringWrapper.php
+ * @param string $encoding The character encoding to work with
+ * @param string|null $convertEncoding The character encoding to convert to
+ * @return StringWrapperInterface
+ */
+ public function setEncoding($encoding, $convertEncoding = null)
+ {
+ $supportedEncodings = static::getSupportedEncodings();
+
+ $encodingUpper = strtoupper($encoding);
+ if (!in_array($encodingUpper, $supportedEncodings)) {
+ throw new Exception\InvalidArgumentException(
+ 'Wrapper doesn\'t support character encoding "' . $encoding . '"'
+ );
+ }
+
+
@DASPRiD

DASPRiD Jan 6, 2013

Member

Redundant blank line.

@DASPRiD DASPRiD and 2 others commented on an outdated diff Jan 6, 2013

library/Zend/Stdlib/StringWrapper/Iconv.php
+ {
+ $encoding = $this->getEncoding();
+ $convertEncoding = $this->getConvertEncoding();
+ if ($convertEncoding === null) {
+ throw new Exception\LogicException(
+ 'No convert encoding defined'
+ );
+ }
+
+ if ($encoding === $convertEncoding) {
+ return $str;
+ }
+
+ $fromEncoding = $reverse ? $convertEncoding : $encoding;
+ $toEncoding = $reverse ? $encoding : $convertEncoding;
+ return iconv($fromEncoding, $toEncoding, $str);
@DASPRiD

DASPRiD Jan 6, 2013

Member

What about //translit and //ignore?

@marc-mabe

marc-mabe Jan 6, 2013

Member

Such functionality isn't supported yet because it needs to be available by all wrappers. Same with strto[lower|upper].

@DASPRiD

DASPRiD Jan 6, 2013

Member

So in cases those are required we fallback to direct iconv usage in the core?

@marc-mabe

marc-mabe Jan 6, 2013

Member

//TRANSLIT - than you need iconv explicit
//IGNORE - I tested converting an utf-8 string with one invalid character into iso-8859-1:

  • mbstring ignores the invalid character and doesn't trigger an error
  • iconv stops converting on the invalid character and triggers a notice. With //IGNORE it ignores the invalid character but triggers a notice, too

So I could add //IGNORE automatically on every convert because truncating the string to convert helps nobody.

What do you think ?

@weierophinney

weierophinney Jan 7, 2013

Owner

Sounds reasonable.

@marc-mabe

marc-mabe Jan 7, 2013

Member

added //IGNORE to the destination encoding on iconv wrapper + test to not truncate on converting for all wrappers

Member

marc-mabe commented Jan 6, 2013

hopefully done now

@DASPRiD DASPRiD commented on an outdated diff Jan 6, 2013

library/Zend/Stdlib/StringWrapper/Iconv.php
+ 'UCS-4',
+ 'UCS-4BE',
+ 'UCS-4LE',
+ 'UTF-16',
+ 'UTF-16BE',
+ 'UTF-16LE',
+ 'UTF-32',
+ 'UTF-32BE',
+ 'UTF-32LE',
+ 'UTF-7',
+ 'C99',
+ 'JAVA',
+
+ // Full Unicode, in terms of uint16_t or uint32_t (with machine dependent endianness and alignment)
+ // 'UCS-2-INTERNAL',
+ // 'UCS-4-INTERNAL',
@DASPRiD

DASPRiD Jan 6, 2013

Member

It should either be noted why those are commented out or those lines should be removed completely.

@DASPRiD DASPRiD commented on the diff Jan 6, 2013

...Test/Stdlib/StringWrapper/CommonStringWrapperTest.php
+ $wrapper = $this->getWrapper($encoding, $convertEncoding);
+ if (!$wrapper) {
+ $this->markTestSkipped("Encoding {$encoding} or {$convertEncoding} not supported");
+ }
+
+ $result = $wrapper->convert($str);
+ $this->assertSame($expected, $result);
+
+ // backword
+ $result = $wrapper->convert($expected, true);
+ $this->assertSame($str, $result);
+ }
+
+ public function wordWrapProvider()
+ {
+ return array(
@DASPRiD

DASPRiD Jan 6, 2013

Member

For some reason, my comment wasn't saved here yesterday:

Use string keys for all data sets (e.g. the original test names, lowercase-dashed). Those will be used for reporting failures, instead of the numeric index, which helps a lot to also understand what the data set is actually testing.

@marc-mabe

marc-mabe Jan 6, 2013

Member

Now I understand what you mean - I already added the orig. test method names but not as array keys -> updated

@DASPRiD

DASPRiD Jan 7, 2013

Member

Since this is already the word wrap data provider, I'd personally remove the redundant leading "word-wrap-" from the data set names :)

Owner

weierophinney commented Jan 7, 2013

@marc-mabe I think with the addition of the //IGNORE pragma, this will be ready; I'd add it myself, but I'm not sure I understand exactly which argument and/or which wrappers to add it to. Also, as @DASPRiD notes, remove the "word-wrap-" prefix from the one set of data providers.

Many thanks in advance -- this looks like it will be quite useful!

Member

marc-mabe commented Jan 7, 2013

@weierophinney: I removed the word-wrap- prefix as @DASPRiD noted and also added //IGNORE to the destination encoding of the iconv wrapper + tests

Owner

weierophinney commented Jan 7, 2013

I get 2 failures when I run tests; details are below. These are with PHP 5.4.9, using iconv v2.15 and libmbfl v1.3.2.

1) ZendTest\Stdlib\StringWrapper\IconvTest::testConvertDontSubstringsOnInvalidCharacter
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-foo x bar
+

tests/ZendTest/Stdlib/StringWrapper/CommonStringWrapperTest.php:151

2) ZendTest\Stdlib\StringWrapper\MbStringTest::testConvertDontSubstringsOnInvalidCharacter
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-foo x bar
+foo ?x bar

tests/ZendTest/Stdlib/StringWrapper/CommonStringWrapperTest.php:151

Travis doesn't report errors, so I'm going to go ahead and merge, but wanted to note the potential conflicts in versions.

@weierophinney weierophinney merged commit 0c8ad71 into zendframework:develop Jan 7, 2013

1 check failed

default The Travis build failed
Details
Member

marc-mabe commented Jan 7, 2013

@weierophinney:

  • mbstirng sets the configured substitution character - it looks like this character has been changed
    • the used character could be detected with mb_substitute_character to fix the test
    • I this the behavior is up to mbstring and the wrapper itself don't needs a change
  • iconv completely breaks conversion on an invalid character of input string
    • it looks like the //IGNORE addition is for unsupported character by output encoding and the behavior on invalid input characters has been changed to return an empty string now :(
    • This can't simply changed or handled without a version check
    • Zend\Escaper could be affected, too - I'm not sure 100%

I added the //IGNORE addition and the test to make sure converting will not be truncated on an invalid input character. Now this is the case for iconv and I don't see a workaround.

Possibilities:

  • remove the test
  • remove the test and the //IGNORE addition
  • changing the behavior to throw an exception or return false in this case (on all wrappers) but this requires additional checks and workarounds which slows down the function and we could ran into other issues not visible now

I personally prefer to simply remove the test, leave the //IGNORE addition and describe it on docs.

Owner

weierophinney commented Jan 7, 2013

@marc-mabe Makes sense -- give me a new PR, and I'll merge.

@weierophinney weierophinney added a commit to zendframework/zend-stdlib that referenced this pull request May 15, 2015

@weierophinney weierophinney Merge branch 'feature/3110' into develop b0e848a

@gianarb gianarb pushed a commit to zendframework/zend-stdlib that referenced this pull request May 15, 2015

Marc Bennewitz removed test failing since PHP>=5.4
noted in zendframework/zendframework#3110 (which added the test)
bcf6ba9

@weierophinney weierophinney added a commit to zendframework/zend-text that referenced this pull request May 15, 2015

@weierophinney weierophinney Merge branch 'feature/3110' into develop 527fcd1

@weierophinney weierophinney added a commit to zendframework/zend-progressbar that referenced this pull request May 15, 2015

@weierophinney weierophinney Merge branch 'feature/3110' into develop 3c20bc3

@weierophinney weierophinney added a commit to zendframework/zend-validator that referenced this pull request May 15, 2015

@weierophinney weierophinney Merge branch 'feature/3110' into develop 3a31898

@weierophinney weierophinney added a commit to zendframework/zend-feed that referenced this pull request May 15, 2015

@weierophinney weierophinney Merge branch 'feature/3110' into develop e20a954
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment