fix json decode #154

Merged
merged 1 commit into from Jun 1, 2016

Conversation

Contributor

@faceair faceair commented Jun 1, 2016

One of the user profiles on my side looks like this: {"subscribe":1,"openid":"*","nickname":"monk","sex":1,"language":"zh_CN","city":"Y'Qt","province":"S�N¬","country":"","headimgurl":"http:\/\/wx.qlogo.cn\/mmopen\/ajNVdqHZLLBGnfR2C0W8cSLBbkeQASsMaSQsOKPwL9vIGr1Zen9zj9Jwibt06kpicNvH1NU7uWFZQ8rG4CrPD7uA\/0","subscribe_time":1376237570,"unionid":"*","remark":"","groupid":0,"tagid_list":[]}

The last character of the province field has Unicode code point 172 and cannot be decoded normally, so the strict parameter needs to be added.

Other places in the code may have the same problem.

Member

messense commented Jun 1, 2016

https://docs.python.org/2/library/json.html#json.JSONDecoder

If strict is False (True is the default), then control characters will be allowed inside strings. Control characters in this context are those with character codes in the 0-31 range, including '\t' (tab), '\n', '\r' and '\0'.

Does that apply to 172 as well?
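
For reference, a minimal sketch of what the documented strict flag changes. It is written for Python 3 (the thread itself runs Python 2.7), and the two payloads are made up for illustration; they are not the real API response.

```python
import json

# Hypothetical payloads, for illustration only.
# A raw newline (code 10) inside a JSON string is a control character.
with_newline = '{"province": "\n61.151.217.163"}'
# U+00AC has code 172, which is outside the 0-31 control range.
with_char_172 = '{"province": "N\u00ac"}'


def loads_strict_ok(text):
    """Return True if json.loads accepts text with the default strict=True."""
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


print(loads_strict_ok(with_newline))   # the raw newline is rejected in strict mode
print(loads_strict_ok(with_char_172))  # code 172 is not a control character
print(repr(json.loads(with_newline, strict=False)["province"]))
```

Under these assumptions, strict=False only matters for characters in the 0-31 range, so a failure on a character with code 172 would have to come from somewhere else, such as the byte-to-text decoding step.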


codecov-io commented Jun 1, 2016

Current coverage is 76.86%

Merging #154 into master will decrease coverage by <.01%

@@             master       #154   diff @@
==========================================
  Files            88         88          
  Lines          3258       3258          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits           2505       2504     -1   
- Misses          753        754     +1   
  Partials          0          0          

Powered by Codecov. Last updated by 5a58d78...a22cca5

Contributor Author

faceair commented Jun 1, 2016

Hold on, let me confirm.

@@ -113,7 +113,7 @@ def _request(self, method, url_or_endpoint, **kwargs):
     def _decode_result(self, res):
         res.encoding = 'utf-8'
         try:
-            result = res.json()
+            result = json.loads(res.content, strict=False)
Member

@messense messense Jun 1, 2016

result = json.loads(res.content.decode('utf-8', errors='ignore'))

Consider decoding to UTF-8 first.
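
A small sketch of how the error handlers differ when decoding bytes, written for Python 3. The byte string is a made-up stand-in for a response body containing a stray 0xAC byte; whether the real response looks like this is an assumption.

```python
import json

# Hypothetical body: a lone 0xAC byte is not valid UTF-8.
raw = b'{"province": "S\xacN"}'


def decode_modes(data):
    """Decode data as UTF-8 with the 'strict', 'ignore' and 'replace' handlers."""
    try:
        strict_text = data.decode("utf-8")  # default handler is 'strict'
    except UnicodeDecodeError:
        strict_text = None
    ignored = data.decode("utf-8", errors="ignore")    # drops the bad byte
    replaced = data.decode("utf-8", errors="replace")  # substitutes U+FFFD
    return strict_text, ignored, replaced


strict_text, ignored, replaced = decode_modes(raw)
print(strict_text, repr(ignored), repr(replaced))
print(json.loads(ignored)["province"])
```

Note that 'ignore' silently discards data, so 'replace' may be preferable when it matters that something was lost.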

Contributor Author

faceair commented Jun 1, 2016

It really is an encoding problem during JSON parsing. The traceback is:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "*/eggs/requests-2.9.1-py2.7.egg/requests/models.py", line 809, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/local/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/usr/local/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/lib/python2.7/json/decoder.py", line 382, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Invalid control character at: line 1 column 128 (char 127)

@@ -113,7 +113,7 @@ def _request(self, method, url_or_endpoint, **kwargs):
     def _decode_result(self, res):
         res.encoding = 'utf-8'
         try:
-            result = res.json()
+            result = json.loads(res.content.decode('utf-8', strict='ignore'))
Member

@messense messense Jun 1, 2016

It should be errors='ignore'; I remembered it wrong.

Member

In [12]: s.decode?
Docstring:
S.decode([encoding[,errors]]) -> object

Decodes S using the codec registered for encoding. encoding defaults
to the default encoding. errors may be given to set a different error
handling scheme. Default is 'strict' meaning that encoding errors raise
a UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
as well as any other name registered with codecs.register_error that is
able to handle UnicodeDecodeErrors.
Type:      builtin_function_or_method

Member

messense commented Jun 1, 2016

So many of the Python 2.6 tests failed, which is strange.

Member

messense commented Jun 1, 2016

Under py26, this change may make code that used to work raise TypeError/ValueError.

Contributor Author

faceair commented Jun 1, 2016

I don't quite follow. Is there anything else I need to change?

Member

messense commented Jun 1, 2016

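It may be a problem with the mock data in the tests. I'll merge this first and sort it out afterwards, then publish a new release once it's fixed.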

@messense messense merged commit 80e0ed9 into wechatpy:master Jun 1, 2016
Member

messense commented Jun 1, 2016

TypeError('decode() takes no keyword arguments',)

It looks like the string decode method in Python 2.6 cannot take extra keyword arguments...
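
One way around that TypeError, sketched here as an assumption rather than the fix the project actually shipped, is to pass the error handler positionally, which matches the S.decode([encoding[,errors]]) signature quoted earlier. The helper name decode_utf8_lenient is hypothetical.

```python
def decode_utf8_lenient(data):
    """Decode bytes as UTF-8, dropping byte sequences that are not valid.

    The error handler is passed positionally, so the call does not depend
    on decode() accepting keyword arguments.
    """
    return data.decode("utf-8", "ignore")


# Hypothetical input with a stray 0xAC byte.
print(decode_utf8_lenient(b"S\xacN"))
```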

Contributor Author

faceair commented Jun 1, 2016

Some bad news: I've just hit a new error on my side.

{"subscribe":1,"openid":"*","nickname":"啊","sex":1,"language":"zh_CN","city":"","province":"\n61.151.217.163","country":"","headimgurl":"http:\/\/wx.qlogo.cn\/mmopen\/O1dAhMERUwUicLOeVG9fRAYmesYlfSZPcaphVhmCDuxKEA7LK9tnngjackWxlFDJS6XaryjOYic4dnWGeBfDmo4A\/0","subscribe_time":1360801037,"unionid":"*","remark":"","groupid":0,"tagid_list":[]}

Traceback (most recent call last):
  File "bin/python", line 98, in <module>
    exec(compile(__file__f.read(), __file__, "exec"))
  File "scripts/fetch_user.py", line 51, in <module>
    fetch_user('*', *)
  File "scripts/fetch_user.py", line 38, in fetch_user
    user_data = json.loads(user_data.content.decode('utf-8', errors='ignore'))
  File "/usr/local/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/usr/local/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/lib/python2.7/json/decoder.py", line 382, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Invalid control character at: line 1 column 122 (char 121)

province contains a \n, so this probably needs to be changed to json.loads(res.content.decode('utf-8', errors='ignore'), strict=False)
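
The combined approach could be sketched as a standalone helper like the one below (Python 3). The function name decode_result and the sample bytes are hypothetical, and the real method in the library operates on a requests response object rather than raw bytes.

```python
import json


def decode_result(content):
    """Parse a response body leniently.

    Invalid UTF-8 bytes are dropped first, then control characters inside
    JSON strings are tolerated via strict=False.
    """
    text = content.decode("utf-8", "ignore")
    return json.loads(text, strict=False)


# Hypothetical body with a raw newline in one field and a stray byte in another.
sample = b'{"province": "\n61.151.217.163", "city": "Y\xacQt"}'
result = decode_result(sample)
print(repr(result["province"]), repr(result["city"]))
```

Both steps are needed in this sketch because they address different failures: the decode step handles bytes that are not valid UTF-8, while strict=False handles control characters that survive decoding.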

Member

messense commented Jun 1, 2016

What kind of bizarre data is this...

Contributor Author

faceair commented Jun 1, 2016

😰 You may want to hold off on the fix; my run hasn't finished yet...

Contributor Author

faceair commented Jun 4, 2016

I didn't run into any other problems after that.

messense added a commit to messense/wechatpy that referenced this pull request Jun 5, 2016
View more information about this in pull request wechatpy#154
Member

messense commented Jun 5, 2016

The strict=False parameter has now been added everywhere.
