Skip to content
This repository has been archived by the owner on Jan 8, 2020. It is now read-only.

Stdlib\StringUtils #3110

Merged
merged 54 commits into from Jan 7, 2013
Merged

Stdlib\StringUtils #3110

merged 54 commits into from Jan 7, 2013

Conversation

marc-mabe
Copy link
Member

This class provides some basic handling of strings of different character encodings.
It comes with string wrappers for iconv, mbstring, intl (grapheme_* functions / UConverter) and a wrapper for native string functions. Wrapped functions are: strlen, strpos, substr, strpad, wordwrap, convert.

So it will be up to the user which PHP extension will be used supporting the required character encoding and which one is the best for a given character encoding.

Wrapper usage:

The following command returns an instance of the best available string wrapper supporting the given character encoding and if given with support to convert a string of the encoding into the other encoding.
If no wrapper was found an exception will be thrown.

StringUtils::getWrapper('encoding'[, 'encoding to convert to']) : StringWrapperInterface

The returned StringWrapperInterface simply wrappes the string functions -- there will be no error handling because of the heavy use of string functions (like in a loop) this should be up the consumer.

More helpful methods:

StringUtils::isSingleByteEncoding(<string encoding>) : boolean
StringUtils::getSingleByteEncodings() : string[]
StringUtils::isValidUtf8(<string>) : boolean // using preg_match
StringUtils::registerWrapper(<StringWrapperInterface>)
StringUtils::getRegisteredWrappers() : StringWrapperInterface[]

PS:

  • Character encodings are case-insensitive and will be handled in upper-case internally.
  • The default order of registered wrappers are: intl, mbstring, iconv, native
  • Zend\Text\MultiByte will be deprecated and redirected using the string wrapper
    (all tests has been moved)

);
}


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant blank line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

@marc-mabe
Copy link
Member Author

hopefully done now


// Full Unicode, in terms of uint16_t or uint32_t (with machine dependent endianness and alignment)
// 'UCS-2-INTERNAL',
// 'UCS-4-INTERNAL',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should either be noted why those are commented out or those lines should be removed completely.


public function wordWrapProvider()
{
return array(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason, my comment wasn't saved here yesterday:

Use string keys for all data sets (e.g. the original test names, lowercase-dashed). Those will be used for reporting failures, instead of the numeric index, which helps a lot to also understand what the data set is actually testing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I understand what you mean - I already added the orig. test method names but not as array keys -> updated

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is already the word wrap data provider, I'd personally remove the redundant leading "word-wrap-" from the data set names :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

@weierophinney
Copy link
Member

@marc-mabe I think with the addition of the //IGNORE pragma, this will be ready; I'd add it myself, but I'm not sure I understand exactly which argument and/or which wrappers to add it to. Also, as @DASPRiD notes, remove the "word-wrap-" prefix from the one set of data providers.

Many thanks in advance -- this looks like it will be quite useful!

@marc-mabe
Copy link
Member Author

@weierophinney: I removed the word-wrap- prefix as @DASPRiD noted and also added //IGNORE to the destination encoding of the iconv wrapper + tests

@weierophinney
Copy link
Member

I get 2 failures when I run tests; details are below. These are with PHP 5.4.9, using iconv v2.15 and libmbfl v1.3.2.

1) ZendTest\Stdlib\StringWrapper\IconvTest::testConvertDontSubstringsOnInvalidCharacter
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-foo x bar
+

tests/ZendTest/Stdlib/StringWrapper/CommonStringWrapperTest.php:151

2) ZendTest\Stdlib\StringWrapper\MbStringTest::testConvertDontSubstringsOnInvalidCharacter
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-foo x bar
+foo ?x bar

tests/ZendTest/Stdlib/StringWrapper/CommonStringWrapperTest.php:151

Travis doesn't report errors, so I'm going to go ahead and merge, but wanted to note the potential conflicts in versions.

@weierophinney weierophinney merged commit 0c8ad71 into zendframework:develop Jan 7, 2013
@marc-mabe
Copy link
Member Author

@weierophinney:

  • mbstirng sets the configured substitution character - it looks like this character has been changed
    • the used character could be detected with mb_substitute_character to fix the test
    • I this the behavior is up to mbstring and the wrapper itself don't needs a change
  • iconv completely breaks conversion on an invalid character of input string
    • it looks like the //IGNORE addition is for unsupported character by output encoding and the behavior on invalid input characters has been changed to return an empty string now :(
    • This can't simply changed or handled without a version check
    • Zend\Escaper could be affected, too - I'm not sure 100%

I added the //IGNORE addition and the test to make sure converting will not be truncated on an invalid input character. Now this is the case for iconv and I don't see a workaround.

Possibilities:

  • remove the test
  • remove the test and the //IGNORE addition
  • changing the behavior to throw an exception or return false in this case (on all wrappers) but this requires additional checks and workarounds which slows down the function and we could ran into other issues not visible now

I personally prefer to simply remove the test, leave the //IGNORE addition and describe it on docs.

@weierophinney
Copy link
Member

@marc-mabe Makes sense -- give me a new PR, and I'll merge.

weierophinney added a commit to zendframework/zend-stdlib that referenced this pull request May 15, 2015
gianarb pushed a commit to zendframework/zend-stdlib that referenced this pull request May 15, 2015
weierophinney added a commit to zendframework/zend-text that referenced this pull request May 15, 2015
weierophinney added a commit to zendframework/zend-progressbar that referenced this pull request May 15, 2015
weierophinney added a commit to zendframework/zend-validator that referenced this pull request May 15, 2015
weierophinney added a commit to zendframework/zend-feed that referenced this pull request May 15, 2015
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants