Stdlib\StringUtils #3110

marc-mabe · 2012-11-30T08:50:10Z

This class provides some basic handling of strings of different character encodings.
It comes with string wrappers for iconv, mbstring, intl (grapheme_* functions / UConverter) and a wrapper for native string functions. Wrapped functions are: strlen, strpos, substr, strpad, wordwrap, convert.

So it is up to the user to decide which PHP extension is used to support the required character encoding, and which one is best for a given character encoding.

Wrapper usage:

The following call returns an instance of the best available string wrapper that supports the given character encoding and, if a second encoding is given, also supports converting strings from the first encoding into the second.
If no wrapper is found, an exception is thrown.

StringUtils::getWrapper('encoding'[, 'encoding to convert to']) : StringWrapperInterface

The returned StringWrapperInterface simply wraps the string functions. There is no error handling, because string functions are often called heavily (for example in a loop); error handling is up to the consumer.
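
To illustrate the selection described above, here is a minimal, self-contained sketch. It is not the code from this PR: SketchWrapper, SketchMbString, SketchNative and sketchGetWrapper() are hypothetical stand-ins, and the native encoding list is an assumed subset. It only shows the idea of walking an ordered wrapper list and throwing if nothing supports the encoding.

```php
<?php
// Hypothetical stand-ins for illustration; not the classes added by this PR.
interface SketchWrapper
{
    public static function isSupported(string $encoding): bool;
    public function strlen(string $str);
}

final class SketchMbString implements SketchWrapper
{
    private $encoding;

    public function __construct(string $encoding)
    {
        // Encodings are handled in upper-case internally.
        $this->encoding = strtoupper($encoding);
    }

    public static function isSupported(string $encoding): bool
    {
        if (!extension_loaded('mbstring')) {
            return false;
        }
        $known = array_map('strtoupper', mb_list_encodings());
        return in_array(strtoupper($encoding), $known, true);
    }

    public function strlen(string $str)
    {
        return mb_strlen($str, $this->encoding);
    }
}

final class SketchNative implements SketchWrapper
{
    public function __construct(string $encoding)
    {
        // Nothing to store: native functions work on bytes.
    }

    public static function isSupported(string $encoding): bool
    {
        // Assumed subset of single-byte encodings, for the sketch only.
        return in_array(strtoupper($encoding), ['ASCII', 'ISO-8859-1'], true);
    }

    public function strlen(string $str)
    {
        return strlen($str);
    }
}

// Return the first wrapper in the given order that supports the encoding.
function sketchGetWrapper(string $encoding, array $wrapperClasses): SketchWrapper
{
    foreach ($wrapperClasses as $class) {
        if ($class::isSupported($encoding)) {
            return new $class($encoding);
        }
    }
    throw new RuntimeException("No wrapper found supporting encoding '$encoding'");
}
```

With this sketch, `sketchGetWrapper('iso-8859-1', ['SketchMbString', 'SketchNative'])->strlen('abc')` yields 3 whichever wrapper is picked, and no error handling is added around the wrapped call.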

More helpful methods:

StringUtils::isSingleByteEncoding(<string encoding>) : boolean
StringUtils::getSingleByteEncodings() : string[]
StringUtils::isValidUtf8(<string>) : boolean // using preg_match
StringUtils::registerWrapper(<StringWrapperInterface>)
StringUtils::getRegisteredWrappers() : StringWrapperInterface[]
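
As a rough sketch of how two of these helpers could work (the function names and the encoding list below are assumptions for illustration, not the PR's code): a preg_match call with the /u modifier makes PCRE validate the subject as UTF-8, and the single-byte check is a case-insensitive lookup done in upper-case.

```php
<?php
// Hypothetical helper: UTF-8 validation via preg_match, as hinted in the list above.
function sketchIsValidUtf8(string $str): bool
{
    // With /u, PCRE checks that the subject is valid UTF-8 before matching.
    // The empty pattern matches any valid string (returns 1); on invalid
    // UTF-8, preg_match() returns false instead.
    return preg_match('//u', $str) === 1;
}

// Hypothetical helper: the list is an illustrative subset, not the PR's list.
function sketchIsSingleByteEncoding(string $encoding): bool
{
    static $singleByte = ['ASCII', 'ISO-8859-1', 'ISO-8859-15', 'WINDOWS-1252'];

    // Encoding names are compared case-insensitively by upper-casing them.
    return in_array(strtoupper($encoding), $singleByte, true);
}
```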

PS:

Character encodings are case-insensitive and will be handled in upper-case internally.
The default order of registered wrappers is: intl, mbstring, iconv, native
Zend\Text\MultiByte will be deprecated and will redirect to the string wrapper
(all tests have been moved)

DASPRiD · 2013-01-06T01:09:18Z

library/Zend/Stdlib/StringWrapper/AbstractStringWrapper.php

+            );
+        }
+
+


Redundant blank line.

marc-mabe · 2013-01-06T15:06:09Z

hopefully done now

DASPRiD · 2013-01-06T16:53:48Z

library/Zend/Stdlib/StringWrapper/Iconv.php

+
+        // Full Unicode, in terms of uint16_t or uint32_t (with machine dependent endianness and alignment)
+        // 'UCS-2-INTERNAL',
+        // 'UCS-4-INTERNAL',


It should either be noted why those are commented out or those lines should be removed completely.

DASPRiD · 2013-01-06T18:59:35Z

tests/ZendTest/Stdlib/StringWrapper/CommonStringWrapperTest.php

+
+    public function wordWrapProvider()
+    {
+        return array(


For some reason, my comment wasn't saved here yesterday:

Use string keys for all data sets (e.g. the original test names, lowercase-dashed). Those will be used for reporting failures, instead of the numeric index, which helps a lot to also understand what the data set is actually testing.

Now I understand what you mean. I had already added the original test method names, but not as array keys. Updated.

Since this is already the word wrap data provider, I'd personally remove the redundant leading "word-wrap-" from the data set names :)
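
For reference, a small sketch of what string-keyed data sets look like. The provider name, keys and data below are made up for illustration and are not the ones from CommonStringWrapperTest; the loop stands in for PHPUnit so the snippet runs on its own.

```php
<?php
// Hypothetical provider: each key names what the data set is testing.
function sketchWordWrapProvider(): array
{
    return [
        // name => [input, width, break, cut, expected]
        'long-word-is-cut'        => ['abcdefgh', 4, "\n", true, "abcd\nefgh"],
        'short-text-is-untouched' => ['abc', 10, "\n", false, 'abc'],
        'break-at-space'          => ['foo bar', 3, "\n", false, "foo\nbar"],
    ];
}

// Stand-in for the test runner: report each data set by its name.
foreach (sketchWordWrapProvider() as $name => [$input, $width, $break, $cut, $expected]) {
    $actual = wordwrap($input, $width, $break, $cut);
    echo $name, ': ', ($actual === $expected ? 'ok' : 'FAILED'), PHP_EOL;
}
```

PHPUnit uses the array key in its failure message in the same way, so a failing set is reported by name rather than by a numeric index.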

weierophinney · 2013-01-07T17:16:06Z

@marc-mabe I think with the addition of the //IGNORE pragma, this will be ready; I'd add it myself, but I'm not sure I understand exactly which argument and/or which wrappers to add it to. Also, as @DASPRiD notes, remove the "word-wrap-" prefix from the one set of data providers.

Many thanks in advance -- this looks like it will be quite useful!

marc-mabe · 2013-01-07T19:33:39Z

@weierophinney: I removed the word-wrap- prefix as @DASPRiD noted and also added //IGNORE to the destination encoding of the iconv wrapper + tests

weierophinney · 2013-01-07T21:44:04Z

I get 2 failures when I run tests; details are below. These are with PHP 5.4.9, using iconv v2.15 and libmbfl v1.3.2.

1) ZendTest\Stdlib\StringWrapper\IconvTest::testConvertDontSubstringsOnInvalidCharacter
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-foo x bar
+

tests/ZendTest/Stdlib/StringWrapper/CommonStringWrapperTest.php:151

2) ZendTest\Stdlib\StringWrapper\MbStringTest::testConvertDontSubstringsOnInvalidCharacter
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-foo x bar
+foo ?x bar

tests/ZendTest/Stdlib/StringWrapper/CommonStringWrapperTest.php:151

Travis doesn't report errors, so I'm going to go ahead and merge, but wanted to note the potential conflicts in versions.

marc-mabe · 2013-01-07T23:00:37Z

@weierophinney:

mbstring inserts the configured substitution character, and it looks like this character has been changed.
- The character in use could be detected with mb_substitute_character to fix the test.
- I think this behavior is up to mbstring and the wrapper itself doesn't need a change.

iconv completely aborts the conversion on an invalid character in the input string.
- It looks like the //IGNORE suffix only covers characters unsupported by the output encoding, and the behavior on invalid input characters has been changed to return an empty string now :(
- This can't simply be changed or handled without a version check.
- Zend\Escaper could be affected, too. I'm not 100% sure.
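
A minimal sketch of the mb_substitute_character idea, assuming ext/mbstring is loaded and PHP 7.2+ for mb_chr(). The helper name sketchSubstituteString() is hypothetical, and the "long"/"entity" modes are deliberately not modelled.

```php
<?php
// Hypothetical helper: returns what mbstring is configured to insert
// for a character it cannot convert.
function sketchSubstituteString(): string
{
    // Without arguments this returns the current setting: an integer
    // code point, or a string mode such as "none" or "long".
    $sub = mb_substitute_character();

    if (is_int($sub)) {
        return mb_chr($sub, 'UTF-8');
    }

    // "none" drops the character; other string modes emit longer
    // markers that this sketch does not try to reproduce.
    return '';
}
```

A test could then build its expectation as `'foo ' . sketchSubstituteString() . 'x bar'` instead of hard-coding one substitution character.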

I added the //IGNORE suffix and the test to make sure a conversion is not truncated at an invalid input character. That truncation is now exactly what happens with iconv, and I don't see a workaround.
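
To make the iconv case concrete, here is a small sketch, assuming ext/iconv is loaded. The function name is hypothetical, and what iconv returns for invalid input depends on the PHP and iconv versions, so the sketch only normalizes the failure cases and does not claim a specific result.

```php
<?php
// Hypothetical helper: convert with //IGNORE appended to the destination
// encoding and treat both failure shapes (false or an unexpectedly empty
// string) as "conversion failed".
function sketchConvertIgnore(string $str, string $from, string $to)
{
    // @ suppresses the notice iconv may raise on illegal input.
    $result = @iconv($from, $to . '//IGNORE', $str);

    if ($result === false || ($result === '' && $str !== '')) {
        return false;
    }

    return $result;
}

// \xFF can never occur in valid UTF-8; the outcome is version-dependent.
var_dump(sketchConvertIgnore("foo \xFFx bar", 'UTF-8', 'ASCII'));
```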

Possibilities:

- remove the test
- remove the test and the //IGNORE suffix
- change the behavior to throw an exception or return false in this case (on all wrappers), but this requires additional checks and workarounds, which slows down the function, and we could run into other issues not visible now

I personally prefer to simply remove the test, keep the //IGNORE suffix and describe the behavior in the docs.

weierophinney · 2013-01-07T23:09:40Z

@marc-mabe Makes sense -- give me a new PR, and I'll merge.

marc-mabe added 30 commits June 15, 2012 23:33

initial StringUtils

cd09a59

Merge branch 'master' of git://github.com/zendframework/zf2 into string

6c0f698

Native string adapter don't need ext/mbstring

906b5c3

StringUtils: tests, no component deps

2098c00

Merge branch 'master' of git://github.com/zendframework/zf2 into string

696fbec

adapter -> wrapper

49d8ec4

intl string wrapper and some small other changes

29e0da2

Merge branch 'develop' of git://github.com/zendframework/zf2 into string

4adbb3d

ZendTest namespace

6b42747

StringUtils: phpdoc + cs

3dd2d06

StringUtils: phpdoc + cs

b17b3de

StringUtils: added tests

0819967

StringUtilsTest: updated phpdoc

1ebab45

StringUtils: cs

5c89903

StringUtils: tests + fixes + supported encodings for iconv and mbstring

35b91e6

StringUtils: wording: charset -> encoding

dde5cf5

fixed wrong typed variable in StringUtils::getWrapper

1d800eb

StringUtils: cs

bdeddca

StringUtils: implemented basic functionality into AbstractStringWrapp…

9a44514

…er::convert

StringUtils: hopefully a little better encoding list

b831a53

StringUtils: hopefully a little better encoding list

5f42457

StringUtils: optimations

1c82b00

StringUtils: cs

0c998cb

Updated Zend\Validator to used StringUtils

c16b3ef

Updated Zend\Mvc to used StringUtils

954950d

StringUtils: MbString wrapper use of 'mb_list_encodings'

7ba3c5f

Updated Zend\Text to use StringUtils and deprecated Zend\Text\MultiByte

6b82989

Updated Zend\Feed to use StringUtils

d0fa0ad

Zend\Feed: replaced one iconv_strlen with a string wrapper

0a17816

FIXME: Converting the euro sign from UTF-8 to ISO-8859-16 using the m…

90b0367

…bstring extension gives a wrong result

DASPRiD reviewed Jan 6, 2013

marc-mabe added 7 commits January 6, 2013 15:16

psr

b394227

added StringUtils::resetRegisteredWrappers() for testing purposes

5ef114b

Global namespace not needed for constants

adb6827

just use $this->encoding

170b863

File and class level docblocks

9811ee1

Removed not neccessary variable comversion as there is no strict comp…

c5ec8cd

…arison on it

Added short describtions on tests using data providers

e6d514f

DASPRiD reviewed Jan 6, 2013

added comment for a commented out block

4b8897a

DASPRiD reviewed Jan 6, 2013

marc-mabe added 2 commits January 6, 2013 20:44

Use array keys as description for data provider

7f0751a

There is no encoding argument of strlen

ad9bf3d

marc-mabe added 2 commits January 7, 2013 20:28

Make sure StringWrapper::convert don't substring return value on inva…

c3034a1

…lid character + test

removed unneccissary 'word-wrap-' prefix

0c8ad71

weierophinney merged commit 0c8ad71 into zendframework:develop Jan 7, 2013

marc-mabe mentioned this pull request Jan 7, 2013

removed test failing since PHP>=5.4 #3377

Merged

weierophinney added a commit to zendframework/zend-stdlib that referenced this pull request May 15, 2015

Merge branch 'feature/3110' into develop

b0e848a

Close zendframework/zendframework#3110

gianarb pushed a commit to zendframework/zend-stdlib that referenced this pull request May 15, 2015

removed test failing since PHP>=5.4

bcf6ba9

noted in zendframework/zendframework#3110 (which added the test)

weierophinney added a commit to zendframework/zend-text that referenced this pull request May 15, 2015

Merge branch 'feature/3110' into develop

527fcd1

Close zendframework/zendframework#3110

weierophinney added a commit to zendframework/zend-progressbar that referenced this pull request May 15, 2015

Merge branch 'feature/3110' into develop

3c20bc3

Close zendframework/zendframework#3110

weierophinney added a commit to zendframework/zend-validator that referenced this pull request May 15, 2015

Merge branch 'feature/3110' into develop

3a31898

Close zendframework/zendframework#3110

weierophinney added a commit to zendframework/zend-feed that referenced this pull request May 15, 2015

Merge branch 'feature/3110' into develop

e20a954

Close zendframework/zendframework#3110
