api cannot handle unicode #3

Closed
ragesoss opened this issue Sep 3, 2015 · 9 comments

Comments

@ragesoss
Contributor

ragesoss commented Sep 3, 2015

For example: http://tools.wmflabs.org/eranbot/plagiabot/api.py?action=suspected_diffs&page_title=%27Ch%C3%A2teau%20Laroque%27

@ragesoss
Contributor Author

ragesoss commented Sep 3, 2015

As far as I can tell, there's no way to query for articles with titles like this. Here's the entry returned by the database for the general 50-limit query:

{'page_ns': '0', 'page_title': 'Ch\xc3\xa2teau_Laroque', 'diff_timestamp': '20150831161600', 'ithenticate_id': '19202899', 'project': 'wikipedia', 'diff': '678784050'}

But passing that same page_title value yields no results: http://tools.wmflabs.org/eranbot/plagiabot/api.py?action=suspected_diffs&lang=en&page_title=Ch\xc3\xa2teau_Laroque

@valhallasw
Owner

What happens is the following:

  • %27Ch%C3%A2teau%20Laroque%27 is passed through the URL
  • parse_qs parses this into "'Ch\xc3\xa2teau Laroque'"
  • this is a non-ASCII byte string, and it still needs to be decoded from UTF-8 into a unicode string before passing it to oursql.
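
The steps above can be sketched roughly as follows. This is an illustration only, not the code in api.py: it uses Python 3's `urllib.parse` (the tool itself ran on Python 2, where `parse_qs` returns byte strings), and the variable names are made up for the example.

```python
from urllib.parse import parse_qs, unquote_to_bytes

# The percent-encoded value exactly as it appears in the request URL.
raw_value = "%27Ch%C3%A2teau%20Laroque%27"

# Step 1: percent-decoding alone yields a UTF-8 *byte* string,
# which is what the Python 2 parse_qs hands back.
title_bytes = unquote_to_bytes(raw_value)

# Step 2: the bytes still have to be decoded from UTF-8 to text
# before they are used as a query parameter.
title_text = title_bytes.decode("utf-8")

# In Python 3, parse_qs does both steps itself (UTF-8 is its default encoding).
query = "action=suspected_diffs&page_title=" + raw_value
parsed = parse_qs(query)

print(repr(title_bytes))
print(title_text)
print(parsed["page_title"][0])
```

Note that the decoded value still contains the literal quote characters and a space, whereas the database entry shown earlier stores the title with an underscore, so some title normalization is probably needed as well.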

@ragesoss
Contributor Author

ragesoss commented Sep 6, 2015

I'm not sure how tied to pywikibot this still is, but if the api portion (if not all of it) could be switched to python3, that'd probably involve a lot less unicode pain in the long run.

@fhocutt

fhocutt commented Jan 26, 2016

This is causing issues with the integration into @earwig's Copyvios Detector: earwig/copyvios#25
The API should at least handle errors and output a documented warning message instead of serving the html stack trace.
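
One possible shape for that, as a minimal sketch: wrap the request handler so that every failure becomes a small JSON error object. The names `lookup` and `respond`, the error codes, and the status numbers are all hypothetical; none of them come from the plagiabot code.

```python
import json


def lookup(params):
    # Hypothetical handler: decode the raw title bytes, then stand in for the
    # real database query by returning an empty list of diffs.
    title = params["page_title"].decode("utf-8")
    return {"page_title": title, "diffs": []}


def respond(handler, params):
    """Run handler(params) and return (status, body) with a JSON body in every case."""
    try:
        return 200, json.dumps({"result": handler(params)})
    except UnicodeDecodeError:
        # The client sent bytes that are not valid UTF-8.
        return 400, json.dumps({
            "error": "invalid_encoding",
            "message": "page_title must be valid UTF-8",
        })
    except Exception:
        # Anything else: report a generic error instead of a stack trace.
        return 500, json.dumps({
            "error": "internal_error",
            "message": "unexpected failure while handling the request",
        })


ok_status, ok_body = respond(lookup, {"page_title": b"Ch\xc3\xa2teau_Laroque"})
bad_status, bad_body = respond(lookup, {"page_title": b"Ch\xe2teau"})
print(ok_status, ok_body)
print(bad_status, bad_body)
```

A consumer such as the Copyvios Detector could then check for the `error` key instead of trying to parse an HTML page.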

@eranroz
Collaborator

eranroz commented Jan 26, 2016

Solved using ugly decode.

Using python3 is a good idea, but the instance in labs uses python2 and I'm too lazy to move the instance to use python3.

@earwig

earwig commented Jan 27, 2016

Cheers!

@kaldari

kaldari commented Mar 29, 2016

Is this ticket resolved now?

@eranroz
Collaborator

eranroz commented Mar 29, 2016

yes
