This repository has been archived by the owner on Jan 8, 2021. It is now read-only.

Fix for Turkish "small dotless i" problem in Utf8::toAscii #2

Closed

navruzm commented Mar 19, 2013

Currently, when a string contains an "ı" (small dotless i), Utf8::toAscii does not convert this character properly: "ı" is converted to "?" instead of "i".

This is a known Unicode CLDR bug that was opened 3 years ago and does not seem to be fixed. They said "it should be fixed next release, in bug #3335", but that bug was itself opened 2 years ago.
I don't know when they will fix it, but it causes problems like this one: laravel/framework#552
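
A minimal way to observe the behavior, as a sketch only. It assumes that plain iconv transliteration stands in for what Utf8::toAscii does internally (the discussion below points at iconv as the data source); `translitToAscii` is a hypothetical helper name, and the result depends on the iconv implementation and locale of the machine.

```php
<?php
// Hypothetical helper, not part of the library: transliterate a UTF-8
// string to ASCII with iconv, returning false when that is not possible.
function translitToAscii($s)
{
    if (!function_exists('iconv')) {
        return false; // iconv extension not available
    }
    // @ silences the notice some iconv implementations raise on unmappable input.
    return @iconv('UTF-8', 'ASCII//TRANSLIT', $s);
}

// On the affected setups this is reported to give "?" rather than "i".
var_dump(translitToAscii('ı'));
```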

@nicolas-grekas
Contributor

Hi, thanks for this pull request. I would replace your line of code with this one:

if (false !== strpos($s, 'ı')) $s = str_replace('ı', 'i', $s);

The small dotless i is listed in:
http://unicode.org/repos/cldr/trunk/common/transforms/Latin-ASCII.xml
but that is not the data source used by iconv.

We should compare the mappings in this XML file with the ones done by iconv and see whether other characters differ...
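
The comparison could be sketched roughly as follows. This is an illustration under assumptions, not the script actually used here: `findMappingDifferences` is a hypothetical helper, the reference map is a tiny hand-picked sample rather than the parsed XML file, and the iconv call assumes the extension is installed.

```php
<?php
// Hypothetical helper: report reference mappings that a transliterator
// does not reproduce.
// $reference: UTF-8 character => expected ASCII replacement.
// $translit:  callable taking a string and returning a string or false.
function findMappingDifferences(array $reference, callable $translit): array
{
    $differences = [];
    foreach ($reference as $char => $expected) {
        $actual = $translit((string) $char);
        if ($actual !== $expected) {
            $differences[$char] = ['expected' => $expected, 'actual' => $actual];
        }
    }
    return $differences;
}

// Small sample only; a real run would load every rule from Latin-ASCII.xml.
$sample = ['ı' => 'i', 'Ø' => 'O', 'þ' => 'th'];

// iconv-based transliterator; the result depends on the platform's iconv.
$iconvTranslit = function ($s) {
    return function_exists('iconv') ? @iconv('UTF-8', 'ASCII//TRANSLIT', $s) : false;
};

// Print each character iconv handles differently, with the expected mapping.
foreach (findMappingDifferences($sample, $iconvTranslit) as $char => $diff) {
    printf("%s ? %s\n", $char, $diff['expected']);
}
```

Passing the transliterator as a callable keeps the comparison independent of iconv, so the same helper could be pointed at any other conversion routine.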

@nicolas-grekas
Contributor

Here are all the characters that are mapped to something in Latin-ASCII.xml but are not mapped by iconv on Ubuntu (the small dotless i is in the list).

Not sure what to do with this list now...

Ð ? D
Ø ? O
Þ ? TH
ð ? d
ø ? o
þ ? th
Đ ? D
đ ? d
Ħ ? H
ħ ? h
ı ? i
ĸ ? q
Ŋ ? N
ŋ ? n
Ŧ ? T
ŧ ? t
ƀ ? b
Ɓ ? B
Ƃ ? B
ƃ ? b
Ƈ ? C
ƈ ? c
Ɖ ? D
Ɗ ? D
Ƌ ? D
ƌ ? d
Ɛ ? E
Ƒ ? F
ƒ ? f
Ɠ ? G
ƕ ? hv
Ɩ ? I
Ɨ ? I
Ƙ ? K
ƙ ? k
ƚ ? l
Ɲ ? N
ƞ ? n
Ƣ ? OI
ƣ ? oi
Ƥ ? P
ƥ ? p
ƫ ? t
Ƭ ? T
ƭ ? t
Ʈ ? T
Ʋ ? V
Ƴ ? Y
ƴ ? y
Ƶ ? Z
ƶ ? z
Ǥ ? G
ǥ ? g
ȡ ? d
Ȥ ? Z
ȥ ? z
ȴ ? l
ȵ ? n
ȶ ? t
ȷ ? j
ȸ ? db
ȹ ? qp
Ⱥ ? A
Ȼ ? C
ȼ ? c
Ƚ ? L
Ⱦ ? T
ȿ ? s
ɀ ? z
Ƀ ? B
Ʉ ? U
Ɇ ? E
ɇ ? e
Ɉ ? J
ɉ ? j
Ɍ ? R
ɍ ? r
Ɏ ? Y
ɏ ? y
ɓ ? b
ɕ ? c
ɖ ? d
ɗ ? d
ɛ ? e
ɟ ? j
ɠ ? g
ɡ ? g
ɢ ? G
ɦ ? h
ɧ ? h
ɨ ? i
ɪ ? I
ɫ ? l
ɬ ? l
ɭ ? l
ɱ ? m
ɲ ? n
ɳ ? n
ɴ ? N
ɶ ? OE
ɼ ? r
ɽ ? r
ɾ ? r
ʀ ? R
ʂ ? s
ʈ ? t
ʉ ? u
ʋ ? v
ʏ ? Y
ʐ ? z
ʑ ? z
ʙ ? B
ʛ ? G
ʜ ? H
ʝ ? j
ʟ ? L
ʠ ? q
ʣ ? dz
ʥ ? dz
ʦ ? ts
ʪ ? ls
ʫ ? lz
ᴀ ? A
ᴁ ? AE
ᴃ ? B
ᴄ ? C
ᴅ ? D
ᴆ ? D
ᴇ ? E
ᴊ ? J
ᴋ ? K
ᴌ ? L
ᴍ ? M
ᴏ ? O
ᴘ ? P
ᴛ ? T
ᴜ ? U
ᴠ ? V
ᴡ ? W
ᴢ ? Z
ᵫ ? ue
ᵬ ? b
ᵭ ? d
ᵮ ? f
ᵯ ? m
ᵰ ? n
ᵱ ? p
ᵲ ? r
ᵳ ? r
ᵴ ? s
ᵵ ? t
ᵶ ? z
ᵺ ? th
ᵻ ? I
ᵽ ? p
ᵾ ? U
ᶀ ? b
ᶁ ? d
ᶂ ? f
ᶃ ? g
ᶄ ? k
ᶅ ? l
ᶆ ? m
ᶇ ? n
ᶈ ? p
ᶉ ? r
ᶊ ? s
ᶌ ? v
ᶍ ? x
ᶎ ? z
ᶏ ? a
ᶑ ? d
ᶒ ? e
ᶓ ? e
ᶖ ? i
ᶙ ? u
ẜ ? s
ẝ ? s
ẞ ? SS
Ỻ ? LL
ỻ ? ll
Ỽ ? V
ỽ ? v
Ỿ ? Y
ỿ ? y
₠ ? CE
₢ ? Cr
₣ ? Fr.
₤ ? L.
₧ ? Pts
₹ ? Rs
℞ ? Rx
〇 ? 0
′ ? '
〝 ? "
〞 ? "
‖ ? ||
⁅ ? [
⁆ ? ]
⁎ ? *
、 ? ,
。 ? .
〈 ? <
〉 ? >
《 ? <<
》 ? >>
〔 ? [
〕 ? ]
〘 ? [
〙 ? ]
〚 ? [
〛 ? ]
︑ ? ,
︒ ? .
︹ ? [
︺ ? ]
︽ ? <<
︾ ? >>
︿ ? <
﹀ ? >
÷ ? /
∥ ? ||
⦅ ? ((
⦆ ? ))
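
One possible use of such a list, sketched under assumptions: apply it as a fallback table with `strtr()` before handing the string to iconv. The `$fallback` array below holds only a few entries copied from the list above, `applyAsciiFallback` is a hypothetical name, and this is not necessarily what the library ended up doing.

```php
<?php
// A few entries copied from the list above (character => ASCII mapping).
$fallback = [
    'ı' => 'i',
    'Ø' => 'O',
    'ø' => 'o',
    'Þ' => 'TH',
    'þ' => 'th',
    'Đ' => 'D',
    'đ' => 'd',
];

// Hypothetical helper: replace every character that has a fallback entry.
function applyAsciiFallback(string $s, array $map): string
{
    // With an array argument, strtr() tries the longest keys first and
    // never rescans text it has already replaced.
    return strtr($s, $map);
}

// Characters without an entry (such as "ş") are left for a later step.
echo applyAsciiFallback('Kırşehir, Tromsø', $fallback), "\n"; // prints "Kirşehir, Tromso"
```

A single `strtr()` call covers the whole table in one pass, which generalizes the one-character `str_replace` suggested earlier.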

@nicolas-grekas
Contributor

Fixed in nicolas-grekas@cc3f2ea

@navruzm
Author

navruzm commented Apr 18, 2013

Thanks
