Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some character disappear on mac os x #4896

Merged
merged 1 commit into from Jul 17, 2014
Merged

Conversation

sportica
Copy link
Contributor

std::isspace() of mac os x recognize 0xa0 as whitespace.
I think that it's a bug.
http://en.cppreference.com/w/cpp/string/byte/isspace

'죠' is 0xec, 0xa3, 0xa0
'고' 0xea, 0xb3, 0xa0
Both '죠' and '고' have 0xa0.

Perhaps more character will be affected.

@jmarshallnz
Copy link
Contributor

A better bet might be to just hit the ones we know we want?

@Karlson2k ?

@MartijnKaijser MartijnKaijser added this to the Pending for inclusion milestone Jun 12, 2014
@Karlson2k
Copy link
Member

Don't break things that already broken. :)
Actually it's incorrect to use any single-byte char function with utf-8 bytes.
You can fix it by:

  • detecting UTF-8 char length and checking only first byte of each UTF-8 char by ::isspace() in TrimX functions
  • convert string to wide character and use ::iswspace() in TrimX functions
  • use our CRegExp class in UTF-8 mode in TrimX funtions

@sportica
Copy link
Contributor Author

Only single byte within UTF-8 characters could be whitespace.
Because UTF-8 was designed for backward compatibility with ASCII.
http://en.wikipedia.org/wiki/UTF-8

How about this commit, @Karlson2k ?

epoke added a commit to epoke/xbmc that referenced this pull request Jun 16, 2014
@Karlson2k
Copy link
Member

@sportica UTF-8 is compatible backward with US-ASCII only, not ASCII in general. Unicode has much more whitespace characters than US-ASCII.
As temporal quick fix, I'd remove this helper and rewrite TrimX functions with direct tests like if ((c & 0x80) == 0 && ::isspace(c)) with comment like //TODO: support all UNICODE whitespace chars

@sportica
Copy link
Contributor Author

Okay. Thank you Karlson2k.

@sportica sportica closed this Jun 17, 2014
@sportica
Copy link
Contributor Author

sportica commented Jul 8, 2014

Now use std::ctype functions instead of isspace(). std::ctype functions determine white spaces by system locale. How about this way?

@sportica sportica reopened this Jul 8, 2014
@sportica
Copy link
Contributor Author

sportica commented Jul 9, 2014

How about writing own function, @Karlson2k ?

@sportica
Copy link
Contributor Author

sportica commented Jul 9, 2014

@Karlson2k It is based on your second suggestion. But only convert necessary amount, not whole string. Convert one character at once, and compare it.

@jmarshallnz
Copy link
Contributor

IMO I'd just implement using if ((c & 0x80) == 0 && ::isspace(c)) and keep the find_if as it is. We really only want to trim space, tab, linefeeds and carriage returns in these functions, not every possible whitespace character under the sun.

@sportica
Copy link
Contributor Author

Please let me know if you know suitable STL. It seems that there is no template for our case. Or we should keep helper function.

@jmarshallnz
Copy link
Contributor

The original? With the simple conditional on a char you have a function that takes a char and returns a bool. Exactly what the isspace_c() function does.

@sportica
Copy link
Contributor Author

Please check again.

@jmarshallnz
Copy link
Contributor

@Karlson2k mind taking a look?

@Karlson2k
Copy link
Member

Looks fine.
(unsigned char) is not required with high bit off.
OK to get in.

@jmarshallnz
Copy link
Contributor

jenkins build this please

@Karlson2k
Copy link
Member

jenkins build this please again, without strange fails this time

jmarshallnz added a commit that referenced this pull request Jul 17, 2014
Some character disappear on mac os x
@jmarshallnz jmarshallnz merged commit f18207d into xbmc:master Jul 17, 2014
@jmarshallnz jmarshallnz added Gotham and removed Gotham labels Jul 17, 2014
jmarshallnz added a commit that referenced this pull request Jul 19, 2014
Some character disappear on mac os x
@MartijnKaijser MartijnKaijser modified the milestones: Helix 14.0-alpha1, Pending for inclusion Jul 19, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants