api cannot handle unicode #3

ragesoss · 2015-09-03T21:12:54Z

For example: http://tools.wmflabs.org/eranbot/plagiabot/api.py?action=suspected_diffs&page_title=%27Ch%C3%A2teau%20Laroque%27

ragesoss · 2015-09-03T21:45:13Z

As far as I can tell, there's no way to query for articles with titles like this. Here's the entry returned by the database for the general 50-limit query:

{'page_ns': '0', 'page_title': 'Ch\xc3\xa2teau_Laroque', 'diff_timestamp': '20150831161600', 'ithenticate_id': '19202899', 'project': 'wikipedia', 'diff': '678784050'}

But passing that same page_title yields no results: http://tools.wmflabs.org/eranbot/plagiabot/api.py?action=suspected_diffs&lang=en&page_title=Ch\xc3\xa2teau_Laroque
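
For comparison, a client would normally percent-encode the UTF-8 bytes of the title rather than put the `\xc3\xa2` escape sequence in the URL literally. A minimal sketch in Python 3, using only the endpoint and parameters already shown in this thread (the variable names are illustrative):

```python
from urllib.parse import urlencode

# Endpoint and parameter names are taken from the URLs in this thread.
BASE_URL = "http://tools.wmflabs.org/eranbot/plagiabot/api.py"

params = {
    "action": "suspected_diffs",
    "lang": "en",
    "page_title": "Château_Laroque",  # a real unicode string, not an escaped byte repr
}

# urlencode percent-encodes non-ASCII characters as UTF-8 by default.
query = urlencode(params)
url = BASE_URL + "?" + query
print(url)
```

This produces `page_title=Ch%C3%A2teau_Laroque`. Whether the API then returns results still depends on how the server decodes that value.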

valhallasw · 2015-09-04T07:48:30Z

What happens is the following:

%27Ch%C3%A2teau%20Laroque%27 is passed through the URL
parse_qs parses this into "'Ch\xc3\xa2teau Laroque'"
this is a non-ASCII byte string, and it still needs to be decoded from UTF-8 into a unicode string before passing it to oursql.
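
The steps above can be sketched roughly as follows. This is written for Python 3, where `parse_qs` already returns decoded strings, so `unquote_to_bytes` stands in for the byte string that Python 2's `parse_qs` would produce. It is an illustration of the decode step, not the actual fix applied to api.py.

```python
from urllib.parse import unquote_to_bytes

encoded = "%27Ch%C3%A2teau%20Laroque%27"

# Step 1: percent-decoding yields raw bytes (what Python 2 calls a str).
raw = unquote_to_bytes(encoded)
print(repr(raw))    # the byte string, still UTF-8 encoded

# Step 2: the bytes must be decoded from UTF-8 to get a unicode string
# that can safely be passed on to the database layer.
title = raw.decode("utf-8")
print(title)
```

Note that the decoded value still carries the surrounding apostrophes from `%27` and a space rather than an underscore, so it would not match the stored `page_title` without further normalization.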

ragesoss · 2015-09-06T16:58:54Z

I'm not sure how tied to pywikibot this still is, but if the api portion (if not all of it) could be switched to python3, that'd probably involve a lot less unicode pain in the long run.

fhocutt · 2016-01-26T21:38:33Z

This is causing issues with the integration into @earwig's Copyvios Detector: earwig/copyvios#25
The API should at least handle errors and output a documented warning message instead of serving the HTML stack trace.
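
One possible shape for that, as a sketch only: wrap each action in a function that always returns JSON. The names `run_action` and `suspected_diffs`, the error codes, and the response layout are all hypothetical here and are not taken from api.py.

```python
import json


def run_action(handler, params):
    """Call handler(params) and always return a JSON string (hypothetical wrapper)."""
    try:
        return json.dumps({"status": "ok", "result": handler(params)})
    except UnicodeError as exc:
        # Covers UnicodeDecodeError raised for titles that are not valid UTF-8.
        return json.dumps(
            {"status": "error", "code": "bad_encoding", "info": str(exc)}
        )
    except Exception as exc:
        # Anything else: report the error type without exposing a stack trace.
        return json.dumps(
            {"status": "error", "code": "internal_error", "info": type(exc).__name__}
        )


def suspected_diffs(params):
    """Stand-in handler: decodes the title and echoes it back (no database)."""
    title = params["page_title"]
    if isinstance(title, bytes):
        title = title.decode("utf-8")
    return {"page_title": title}


print(run_action(suspected_diffs, {"page_title": b"Ch\xc3\xa2teau_Laroque"}))
print(run_action(suspected_diffs, {"page_title": b"\xff"}))
```

A consumer such as Copyvios Detector could then check the `status` field instead of trying to parse an HTML error page as JSON.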

eranroz · 2016-01-26T23:00:04Z

Solved using an ugly decode.

Using python3 is a good idea, but the instance in labs uses python2 and I'm too lazy to move the instance to use python3.

eranroz · 2016-01-26T23:04:52Z

tested:
http://tools.wmflabs.org/eranbot/plagiabot/api.py?action=suspected_diffs&lang=en&page_title=Ch%C3%A2teau_Laroque

earwig · 2016-01-27T08:31:42Z

Cheers!

kaldari · 2016-03-29T17:50:58Z

Is this ticket resolved now?

eranroz · 2016-03-29T20:17:52Z

yes

earwig mentioned this issue Jan 27, 2016

SyntaxError: invalid syntax earwig/copyvios#25

Closed

ragesoss closed this as completed Mar 25, 2019
