Request params encoded using system encoding #39

Open
gazpachoking opened this issue Feb 6, 2014 · 8 comments

Comments

@gazpachoking
Contributor

Perhaps I'm misunderstanding how this is supposed to work, but it looks like all request parameters are encoded using the system locale encoding (https://github.com/wagnerrp/pytmdb3/blob/master/tmdb3/request.py#L70). This causes problems when the system locale cannot encode all the characters in the parameters. Besides, I have no idea how tmdb is expected to know which encoding was used to encode the parameters; I suspect the library should be using a constant encoding defined by the tmdb api.
Portion of a relevant traceback:

File "/usr/local/lib/python2.7/dist-packages/flexget/plugins/api_tmdb.py", line 293, in lookup
    result = _first_result(tmdb3.tmdb_api.searchMovie(title.lower(), adult=True, year=year))
  File "/usr/local/lib/python2.7/dist-packages/tmdb3/tmdb_api.py", line 128, in searchMovie
    return MovieSearchResult(Request('search/movie', **kwargs), locale=locale)
  File "/usr/local/lib/python2.7/dist-packages/tmdb3/request.py", line 71, in __init__
    kwargs[k] = locale.encode(v)
  File "/usr/local/lib/python2.7/dist-packages/tmdb3/locales.py", line 110, in encode
    return dat.encode(self.encoding)
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-13: ordinal not in range(256)

Downstream ticket: http://flexget.com/ticket/2392
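
The failure in the traceback can be reproduced without tmdb3 at all. The following is a minimal sketch, assuming a system whose locale encoding is latin-1; the title string is only an illustrative input containing a Cyrillic character:

```python
title = u'Generation \u041f'  # ends with CYRILLIC CAPITAL LETTER PE

# Encoding with a locale-derived codec fails when the codec lacks the character.
try:
    title.encode('latin-1')
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# UTF-8 can represent this text regardless of the system locale.
utf8_bytes = title.encode('utf-8')
```

Here `latin1_ok` ends up `False`, while the UTF-8 encoding succeeds.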

@gazpachoking
Contributor Author

Did a bit of testing, looks like tmdb is expecting utf-8 encoding. Did a bit of a hack to get things working again:

# Before. Broken
>>> tmdb3.tmdb_api.searchMovie(u'Generation П')[0]
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "build\bdist.win32\egg\tmdb3\tmdb_api.py", line 128, in searchMovie
    return MovieSearchResult(Request('search/movie', **kwargs), locale=locale)
  File "build\bdist.win32\egg\tmdb3\request.py", line 70, in __init__
    kwargs[k] = locale.encode(v)
  File "build\bdist.win32\egg\tmdb3\locales.py", line 110, in encode
    return dat.encode(self.encoding)
  File "C:\Users\chase.sterling\PycharmProjects\Flexget\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u041f' in position 11: character maps to <undefined>

# Hack to fix encoding
>>> tmdb3.locales.set_locale("en", "us", True)
>>> tmdb3.locales.syslocale.encoding = 'utf-8'

# After. Working.
>>> tmdb3.tmdb_api.searchMovie(u'Generation П')[0]
<Movie 'Generation P' (2011)>
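
Instead of patching `syslocale.encoding` at runtime, the parameters could be encoded with a fixed codec before the query string is built. The sketch below is not tmdb3's code: `encode_params` is a hypothetical helper, and the choice of UTF-8 rests on the testing described above.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2


def encode_params(params):
    """Hypothetical helper: UTF-8 encode text values, then build a query string."""
    text_type = type(u'')  # unicode on Python 2, str on Python 3
    encoded = {}
    for key, value in params.items():
        if isinstance(value, text_type):
            # Always UTF-8, never the system locale's encoding.
            value = value.encode('utf-8')
        encoded[key] = value
    # Sort the items so the resulting query string is deterministic.
    return urlencode(sorted(encoded.items()))


query = encode_params({u'query': u'Generation \u041f', u'year': 2011})
```

The resulting query string percent-encodes the UTF-8 bytes of the title, so it is the same on every platform.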

@wagnerrp
Owner

wagnerrp commented Feb 6, 2014

If the user is going to be accessing unicode content, such as movies with the character "П" in the title, the library expects the user to have configured their system to handle unicode content. Specifically, that means configuring a UTF-8 locale in their environment.

# unconfigured default
> locale
LANG=
LC_CTYPE="C"
LC_COLLATE="C"
LC_TIME="C"
LC_NUMERIC="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=
# Bourne users
> export LANG="en_US.UTF-8"
# C-shell users
> setenv LANG en_US.UTF-8
# confirmation
> locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=

The tmdb3 library will then pull that encoding from the environment using the locale library.

> projects/pytmdb3/scripts/pytmdb3.py
PyTMDB3 Interactive Shell. TAB completion available.
>>> import locale
>>> locale.getdefaultlocale()
('en_US', 'UTF-8')
>>> get_locale().encoding
'UTF-8'
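
As a rough illustration of that lookup, the standard `locale` module can report the environment's encoding, which is then used to encode text. This is a sketch only, not tmdb3's actual implementation; `encode_with_system_locale` is a hypothetical name.

```python
import locale

# Encoding reported by the current environment. The value varies between
# machines and depends on variables such as LANG and LC_ALL.
system_encoding = locale.getpreferredencoding(False)


def encode_with_system_locale(text):
    """Hypothetical helper: encode text using the environment's encoding."""
    return text.encode(system_encoding)


# ASCII-only text encodes under any common locale encoding.
ascii_bytes = encode_with_system_locale(u'Generation P')
```

Text containing characters outside the reported encoding would raise `UnicodeEncodeError` at the `encode` call.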

@wagnerrp wagnerrp closed this as completed Feb 6, 2014
@gazpachoking
Contributor Author

The problem is, we can't just pick an arbitrary encoding when sending requests to tmdb. They are expecting utf-8.

@gazpachoking
Contributor Author

The encoding the api expects has nothing to do with the platform we are running on.

@gazpachoking
Contributor Author

Here is some more evidence that just picking a codec that supports all unicode codepoints still isn't correct. The parameters have to be in the encoding tmdb is expecting for it to be able to decode them again:


>>> tmdb3.locales.syslocale.encoding = 'utf-8'
>>> tmdb3.tmdb_api.searchMovie(u'Generation П')[0]
<Movie 'Generation P' (2011)>
>>> tmdb3.locales.syslocale.encoding = 'utf-16'
>>> tmdb3.tmdb_api.searchMovie(u'Generation П')[0]
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "C:\Users\chase.sterling\PycharmProjects\Flexget\lib\site-packages\tmdb3\tmdb_api.py", line 128, in searchMovie
    return MovieSearchResult(Request('search/movie', **kwargs), locale=locale)
  File "C:\Users\chase.sterling\PycharmProjects\Flexget\lib\site-packages\tmdb3\tmdb_api.py", line 157, in __init__
    lambda x: Movie(raw=x, locale=locale))
  File "C:\Users\chase.sterling\PycharmProjects\Flexget\lib\site-packages\tmdb3\pager.py", line 106, in __init__
    super(PagedRequest, self).__init__(self._getpage(1), 20)
  File "C:\Users\chase.sterling\PycharmProjects\Flexget\lib\site-packages\tmdb3\pager.py", line 59, in __init__
    self._data = list(iterable)
  File "C:\Users\chase.sterling\PycharmProjects\Flexget\lib\site-packages\tmdb3\pager.py", line 110, in _getpage
    res = req.readJSON()
  File "C:\Users\chase.sterling\PycharmProjects\Flexget\lib\site-packages\tmdb3\cache.py", line 118, in __call__
    data = self.func(*args, **kwargs)
  File "C:\Users\chase.sterling\PycharmProjects\Flexget\lib\site-packages\tmdb3\request.py", line 125, in readJSON
    raise e
TMDBHTTPError: HTTP Error 500: Internal Server Error
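
The mismatch can be shown locally without contacting the server. This sketch assumes the receiving side decodes as UTF-8; whether that is exactly what triggers the HTTP 500 on TMDb's end is an assumption, since the server code is not visible here.

```python
title = u'Generation \u041f'

utf8_bytes = title.encode('utf-8')
utf16_bytes = title.encode('utf-16')  # begins with a byte order mark

# The two codecs produce different byte sequences for the same text.
differs = utf8_bytes != utf16_bytes

# A receiver that decodes as UTF-8 cannot read the UTF-16 bytes:
# the byte order mark (0xFF 0xFE or 0xFE 0xFF) is never valid UTF-8.
try:
    utf16_bytes.decode('utf-8')
    decodable = True
except UnicodeDecodeError:
    decodable = False
```

Both codecs can represent the title, but only bytes produced with the encoding the receiver expects can be turned back into the original text.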

@wagnerrp
Owner

wagnerrp commented Feb 6, 2014

The environment does need to be configured for unicode to receive unicode responses from TMDb, due to the behavior of Python 2 itself. However, I'll need to look at this again to figure out how to handle non-bytecode encodings.

@wagnerrp wagnerrp reopened this Feb 6, 2014
@gazpachoking
Contributor Author

This should be entirely independent of the environment. Unicode is unicode no matter what locale a user has set. Tmdb declares what encoding they accept and send for byte strings, and the python library should only expose and accept strings as unicode objects to the user. The only time an error should be raised is when the user queries the library with a bytestring (str, python 2) containing non-ascii characters.
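
The boundary described here could look roughly like the following. It is a sketch under assumptions: `to_api_bytes` and `API_ENCODING` are hypothetical names, not part of tmdb3, and UTF-8 is taken from the testing earlier in this thread.

```python
API_ENCODING = 'utf-8'  # assumption from this thread: TMDb expects UTF-8
text_type = type(u'')   # unicode on Python 2, str on Python 3


def to_api_bytes(value):
    """Hypothetical helper: convert a query value to bytes for the request."""
    if isinstance(value, text_type):
        # Text is encoded with the API's encoding, never the system locale's.
        return value.encode(API_ENCODING)
    if isinstance(value, bytes):
        # Byte strings are accepted only when they are plain ASCII;
        # otherwise their encoding is unknown and we refuse to guess.
        try:
            value.decode('ascii')
        except UnicodeDecodeError:
            raise ValueError('non-ASCII byte string; pass text instead')
        return value
    # Other values (e.g. an integer year) are converted to text first.
    return text_type(value).encode(API_ENCODING)
```

With this shape, the system locale never enters the picture: unicode input always succeeds, and the only error case is a byte string whose encoding the library cannot know.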

@gregorvolkmann

gregorvolkmann commented Oct 2, 2018

Setting tmdb3.locales.syslocale.encoding = 'utf-8' also fixed the error "TMDbError Internal error - Something went wrong. Contact TMDb." on tmdb3.MovieSearch('some string with äüö').
Thanks @gazpachoking!
