Why does re.findall('[%s]' % zhon.hanzi.characters, 'I broke a plate: 我打破了一个盘子.') return an empty list? #20

legend0011 · 2015-05-08T10:57:35Z

Why do I get an empty list ([]) when I run re.findall('[%s]' % zhon.hanzi.characters, 'I broke a plate: 我打破了一个盘子.')?

I'm using Python 2.7.3 on Ubuntu 12.04.

tsroten · 2015-05-08T15:21:04Z

@legend0011 It looks like the string you are using is not a Unicode string, yet it contains Unicode characters. Here are two ways you can fix it.

Prefix the string with u:

>>> import re
>>> import zhon.hanzi
>>> # Notice the "u" in the next line.
... characters = re.findall('[%s]' % zhon.hanzi.characters, u'I broke a plate: 我打破了一个盘子.')
>>> characters
[u'\u6211', u'\u6253', u'\u7834', u'\u4e86', u'\u4e00', u'\u4e2a', u'\u76d8', u'\u5b50']
>>> for character in characters:
...    print character
我
打
破
了
一
个
盘
子

Make all the strings in your code Unicode by default.

>>> from __future__ import unicode_literals
>>> import re
>>> import zhon.hanzi
>>> characters = re.findall('[%s]' % zhon.hanzi.characters, 'I broke a plate: 我打破了一个盘子.')
>>> characters
[u'\u6211', u'\u6253', u'\u7834', u'\u4e86', u'\u4e00', u'\u4e2a', u'\u76d8', u'\u5b50']
>>> for character in characters:
...    print character
我
打
破
了
一
个
盘
子
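To see why the undecoded string finds nothing, it can help to compare the byte string with its Unicode counterpart. The sketch below is only an illustration: it uses the range `\u4e00-\u9fff` as a stand-in for `zhon.hanzi.characters` (an assumption, so that it runs without zhon installed; the real constant covers more characters), and it assumes the original literal was UTF-8 encoded, as it would be on a typical Ubuntu terminal.

```python
# -*- coding: utf-8 -*-
import re

# Stand-in for zhon.hanzi.characters (assumption): the basic
# CJK Unified Ideographs block only.
HANZI_RANGE = u'\u4e00-\u9fff'

text = u'I broke a plate: 我打破了一个盘子.'
# Roughly what a plain '...' literal holds in Python 2 (assuming UTF-8).
raw = text.encode('utf-8')

# Each Chinese character becomes three separate bytes in UTF-8, so the
# byte string is longer than the Unicode string and contains no single
# element equal to a Chinese code point.
char_count = len(text)
byte_count = len(raw)

# Matching against the Unicode string finds every Chinese character.
matches = re.findall(u'[%s]' % HANZI_RANGE, text)
```

Here `byte_count` exceeds `char_count` by two for every Chinese character, which is why a character class built from Unicode code points has nothing to match in the byte string.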

Does that make sense?

legend0011 · 2015-05-09T01:46:57Z

Yes, that's very helpful! Thank you very much! I still have one problem, though: if the string is passed in as an argument s, how can I convert it to Unicode? I tried re.findall('[%s]' % zhon.hanzi.characters, s.decode('unicode-escape')), but it doesn't work.

legend0011 · 2015-05-09T01:49:32Z

Oh, I found a solution: re.findall('[%s]' % zhon.hanzi.characters, unicode(s, "utf-8")) seems to work. Is that a good way?

tsroten · 2015-05-09T02:31:08Z

@legend0011 Yes, that's a good way. You can also do s.decode('utf-8').
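As a rough sketch of that approach, the helper below decodes its argument only when it is a byte string, so callers can pass either kind. The name `find_hanzi` and the `encoding` parameter are hypothetical, and `\u4e00-\u9fff` again stands in for `zhon.hanzi.characters` (an assumption, to keep the example runnable without zhon). The earlier attempt with `'unicode-escape'` fails because that codec reads the bytes as Latin-1 text with backslash escapes rather than as UTF-8.

```python
# -*- coding: utf-8 -*-
import re

# Stand-in for zhon.hanzi.characters (assumption).
HANZI_RANGE = u'\u4e00-\u9fff'


def find_hanzi(s, encoding='utf-8'):
    """Return the Chinese characters in s (hypothetical helper)."""
    # bytes is an alias of str in Python 2, so this catches plain
    # Python 2 strings as well as Python 3 bytes objects.
    if isinstance(s, bytes):
        s = s.decode(encoding)
    return re.findall(u'[%s]' % HANZI_RANGE, s)


from_bytes = find_hanzi(u'我打破了一个盘子'.encode('utf-8'))
from_text = find_hanzi(u'我打破了一个盘子')
```

Both calls should return the same list of characters. If the input might not be UTF-8, the encoding has to be known in advance; the sketch does not try to detect it.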

tsroten added the question label May 8, 2015

tsroten closed this as completed Jun 15, 2015

tsroten added a commit that referenced this issue Jun 24, 2023

Fixes #20. Add doc note about combining diacritical marks.

d144bbf

tsroten mentioned this issue Jun 24, 2023

Add doc note about combining diacritical marks. #38

Merged
