Unicode tests fail with Python 2 #596

Closed
qauge opened this issue Apr 6, 2016 · 6 comments

qauge commented Apr 6, 2016

I'm having trouble getting the tests to pass on a repository freshly cloned from master.

Some of the Unicode tests fail under Python 2 with a UnicodeDecodeError: test_ndjson for csvjson, test_unicode for csvlook, and test_sort_date and test_sort_string_reverse for csvsort.

Sample output:

ERROR: test_ndjson (tests.test_utilities.test_csvjson.TestCSVJSON)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/quentin/work/csvkit/tests/test_utilities/test_csvjson.py", line 108, in test_ndjson
    '{"text": "Unicode! Σ", "float": null, "datetime": null, "boolean": null, "time": null, "date": null, "integer": null}',
  File "/home/quentin/work/csvkit/tests/utils.py", line 56, in assertLines
    lines = self.get_output_as_list(args)
  File "/home/quentin/work/csvkit/tests/utils.py", line 44, in get_output_as_list
    return self.get_output(args).split('\n')
  File "/home/quentin/work/csvkit/tests/utils.py", line 37, in get_output
    utility.main()
  File "/home/quentin/work/csvkit/csvkit/utilities/csvjson.py", line 168, in main
    table = agate.Table.from_csv(self.input_file, sniff_limit=self.args.sniff_limit, column_types=self.get_column_types(), **self.reader_kwargs)
  File "/home/quentin/work/csvkit/.tox/pypy/site-packages/agate/table/__init__.py", line 362, in from_csv
    contents = contents.encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 295: ordinal not in range(128)

Is it something on my side only?

@jpmckinney
Member

This started erroring in Python 2 between agate 1.3.0 and 1.3.1 (i.e. tests pass using agate 1.3.0). @onyxfish @nbedi Any idea what the problem could be?

@nbedi
Member

nbedi commented Apr 6, 2016

This seems to be related to commit wireservice/agate@6fea5a0.

It added this bit to agate Table initialization:

if six.PY2:
    contents = contents.encode('utf-8')

I don't know much about that commit and issue but @onyxfish will know more.
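To illustrate why that line can raise a UnicodeDecodeError rather than an encode error, here is a minimal sketch. It assumes `contents` arrives as a byte string, in which case Python 2 first decodes it implicitly with the default ASCII codec before encoding. The helper `py2_style_encode` is hypothetical and only emulates that behavior so the snippet also runs on Python 3; it is not agate code.

```python
# -*- coding: utf-8 -*-
# Bytes as they would be read from a file opened without an encoding.
raw = u"Unicode! Σ".encode("utf-8")


def py2_style_encode(data, encoding="utf-8"):
    """Hypothetical helper: mimic Python 2's str.encode() on a byte string,
    which performs an implicit ASCII decode before the requested encode."""
    return data.decode("ascii").encode(encoding)


try:
    py2_style_encode(raw)
    error = None
except UnicodeDecodeError as exc:
    # The first byte of the UTF-8 encoded sigma is not valid ASCII.
    error = exc

# Decoding with the actual encoding first avoids the failure.
fixed = raw.decode("utf-8").encode("utf-8")
```

If this reading is right, the failure depends on whether the caller hands over bytes or already-decoded text, not on the encode call itself.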

@nbedi
Member

nbedi commented Jul 8, 2016

So I think the issue is that agate's from_csv now expects a file-like object to have been opened with the proper encoding. But in Python 2, _open_input_file doesn't use the encoding argument when it opens the file. I may be wrong; does that seem like the issue, @jpmckinney?
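A rough sketch of what opening the file with its encoding could look like, using io.open, which accepts an encoding argument on both Python 2 and 3. The function `open_decoded` is a hypothetical stand-in and not csvkit's actual _open_input_file; the temporary file and its contents are made up for the demonstration.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def open_decoded(path, encoding="utf-8"):
    """Hypothetical opener: return a file object that yields decoded text
    on both Python 2 and Python 3."""
    return io.open(path, mode="r", encoding=encoding, newline="")


# Write a small UTF-8 encoded CSV to a temporary file for the demo.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as f:
    f.write(u"text\nUnicode! Σ\n".encode("utf-8"))

with open_decoded(path) as f:
    contents = f.read()  # text (unicode), not bytes
os.remove(path)

# Encoding decoded text is safe; no implicit ASCII decode is involved.
reencoded = contents.encode("utf-8")
```

Whether this belongs in csvkit or in agate is a separate question; the sketch only shows that a text-mode file object sidesteps the implicit ASCII decode.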

@nbedi
Member

nbedi commented Jul 8, 2016

These tests seem to pass if I give from_csv the path argument instead of the LazyFile object (and let agate figure out the encoding). Would that solution affect performance? I'm not sure whether a LazyFile helps performance when using from_csv.

@jpmckinney
Member

@nbedi Can you create a patch? I'm not sure what it looks like based only on your description.

@jpmckinney
Member

LazyFile was implemented to resolve #178.
