Encoding problem #44

Closed
csill1634 opened this issue Feb 4, 2016 · 12 comments

Comments

@csill1634

Hi!
I get the following exception:

Traceback (most recent call last):
  File "/usr/bin/urlwatch", line 376, in <module>
    main(parser.parse_args())
  File "/usr/bin/urlwatch", line 343, in main
    report.finish()
  File "/usr/lib/python3.5/site-packages/urlwatch/handler.py", line 128, in finish
    ReporterBase.submit_all(self, self.job_states, duration)
  File "/usr/lib/python3.5/site-packages/urlwatch/reporters.py", line 81, in submit_all
    cls(report, cfg, job_states, duration).submit()
  File "/usr/lib/python3.5/site-packages/urlwatch/reporters.py", line 298, in submit
    print(self._red(line))
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2019' in position 63: ordinal not in range(256)

I traced it back to the following content of the website:

European Union’s
Please note the ’ (U+2019) between n and s, which causes the hiccup. Maybe all string operations must be moved to UTF-8?

Thanks!
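The failure in the traceback can be reproduced without urlwatch. The following is a minimal sketch, assuming only the sample string quoted above; the helper name `can_encode` is made up for illustration and is not part of urlwatch.

```python
# The character between "n" and "s" is U+2019 (RIGHT SINGLE QUOTATION MARK),
# which has no code point in latin-1 (ISO-8859-1).
text = "European Union\u2019s"


def can_encode(s, encoding):
    """Return True if every character of s is representable in encoding."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


latin1_ok = can_encode(text, "latin-1")
utf8_ok = can_encode(text, "utf-8")
print(latin1_ok, utf8_ok)  # False True
```

print() performs the same encoding step using the encoding of sys.stdout, which is why the error only shows up at output time.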

@thp
Owner

thp commented Feb 4, 2016

For which URL does this happen?

@csill1634
Author

http://ecrime-project.eu/

the text that causes the error is in the footer of the webpage

@thp
Owner

thp commented Feb 8, 2016

OK, I've tested with this page and it properly specifies UTF-8 both in the HTTP header and in the meta http-equiv tag. So I guess you have some filters enabled for this URL? If so, which ones?

@csill1634
Author

yes, html2text

@thp
Owner

thp commented Feb 12, 2016

What is your system locale set to? You can check with the following snippet:

$ python3
[...]
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'

@csill1634
Author

also utf8

>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
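A note on this check: in Python 3, sys.getdefaultencoding() returns 'utf-8' regardless of the locale, so it cannot show a locale problem. The sketch below prints the values that do depend on the environment, using only standard-library calls; the variable names are illustrative.

```python
import locale
import sys

# Always 'utf-8' on Python 3; independent of LANG / LC_ALL / LC_CTYPE.
default_enc = sys.getdefaultencoding()

# Encoding derived from the user's locale settings.
preferred_enc = locale.getpreferredencoding(False)

# Encoding actually used when print() writes to standard output.
stdout_enc = sys.stdout.encoding

print("default:  ", default_enc)
print("preferred:", preferred_enc)
print("stdout:   ", stdout_enc)
```

If the last two lines report something like ISO-8859-1 or ANSI_X3.4-1968, characters outside that encoding will fail when printed.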

@marbon87

Do you have the problem only when running urlwatch as a cron job?
If so, look for the solution here:
#48

@ngld

ngld commented Feb 16, 2016

It looks like Python's stdout is set to latin-1 for some reason. Can you please run the following?

$ python3
[...]
>>> import sys, os
>>> sys.stdout.encoding
'UTF-8'
>>> os.environ.get('PYTHONIOENCODING', '')
''

@csill1634
Author

$ python3
>>> import sys, os
>>> sys.stdout.encoding
'ISO-8859-1'
>>> os.environ.get('PYTHONIOENCODING', '')
''

@ngld

ngld commented Feb 17, 2016

Try running urlwatch with PYTHONIOENCODING="UTF-8" urlwatch. That should solve your problem. I just wonder why your Python uses latin-1 as stdout encoding...
What distribution are you using?

@csill1634
Author

Arch Linux

@csill1634
Author

PYTHONIOENCODING="UTF-8" urlwatch does the trick for me! Thanks
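To see why the environment variable helps, the sketch below runs a child Python interpreter twice, once with PYTHONIOENCODING set to latin-1 and once with UTF-8, and prints the problematic string from the child. It does not call urlwatch; `CHILD_CODE` and `run_child` are names invented for this illustration.

```python
import os
import subprocess
import sys

# Code executed by the child interpreter. The raw string keeps the \u2019
# escape intact so that the child, not this script, interprets it.
CHILD_CODE = r"print('European Union\u2019s')"


def run_child(io_encoding):
    """Run CHILD_CODE in a new interpreter with PYTHONIOENCODING set."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    return subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


latin1 = run_child("latin-1")
utf8 = run_child("UTF-8")

# The latin-1 child exits with a UnicodeEncodeError traceback on stderr;
# the UTF-8 child exits cleanly and emits the UTF-8 bytes of the string.
print("latin-1 exit code:", latin1.returncode)
print("UTF-8 exit code:  ", utf8.returncode)
print("UTF-8 output:     ", utf8.stdout.strip().decode("utf-8"))
```

Setting the variable only for one command, as in the thread, leaves the locale itself unchanged; fixing the locale (for example in the shell profile or crontab) would be the more permanent route.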
