json output utf8 error #6

Open
glensc opened this issue Apr 9, 2017 · 10 comments

@glensc
Member

glensc commented Apr 9, 2017

Invoking with JSON output (--format=json), I get this error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 454: invalid start byte

The traceback is not useful, so I'm not posting it here.

As you may already know, JSON is strict about UTF-8; no single-byte encodings are allowed. Perhaps allow specifying the encoding, or just try to convert from latin1?
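
For reference, a minimal Python 3 sketch of why byte 0x92 trips the UTF-8 decoder and what two single-byte fallbacks make of it. The sample bytes are made up, and whether the offending export data was really Windows-1252 text (rather than binary attachment content) is an assumption.

```python
# Hypothetical sample: "it's" written with a Windows-1252 curly apostrophe (byte 0x92).
raw = b"it\x92s"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # 0x92 is a UTF-8 continuation byte, so it cannot start a character.
    utf8_ok = False

as_cp1252 = raw.decode("cp1252")   # 0x92 -> U+2019 RIGHT SINGLE QUOTATION MARK
as_latin1 = raw.decode("latin-1")  # never fails, but 0x92 -> U+0092 (a control character)
```

Decoding as latin1 always succeeds because every byte maps to a code point, which also means it silently accepts binary data that was never text in the first place.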

@glensc
Member Author

glensc commented Apr 9, 2017

Although for some reason I'm sure the error comes from binary attachments, not actual text data.

@nazavode
Member

nazavode commented Apr 9, 2017

Thanks for the report. Yes, the errors are related to attachments. I've encountered a lot of issues trying to serialize base64 data coming from xmlrpc into JSON, so I have now switched from json to pickle as the default export format. My plan is to fix migration using a format that is known to work (pickle) and then figure out how to fix all these serialization/deserialization problems. Allowing the user to specify the encoding could be useful, but I'm not 100% sure: how can someone know in advance whether their binary attachment content would be byte-serializable in a particular encoding? I was thinking about a more general approach, but I need to think it over.
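
One possible shape for such a general approach, sketched in Python 3: instead of guessing a text encoding for attachment bytes, base64-encode them through the `default` hook of `json.dumps`. The helper name `json_default` and the `__base64__` wrapper key are illustrative assumptions, not part of tracboat. Note that on Python 2 a plain `str` never reaches the `default` hook, so there this only helps for values still wrapped in `xmlrpclib.Binary`.

```python
import base64
import json
import xmlrpc.client
from datetime import datetime


def json_default(value):
    """Fallback for objects json.dumps cannot serialize natively (hypothetical helper)."""
    if isinstance(value, xmlrpc.client.Binary):
        value = value.data  # unwrap the raw bytes carried by the XML-RPC wrapper
    if isinstance(value, (bytes, bytearray)):
        # Base64 output is pure ASCII, so it is always valid inside a JSON string.
        return {"__base64__": base64.b64encode(bytes(value)).decode("ascii")}
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError("cannot serialize %r" % type(value))


# Made-up attachment containing bytes that are not valid UTF-8.
attachment = {"name": "logo.png", "data": xmlrpc.client.Binary(b"\x89PNG\x92\xe4")}
encoded = json.dumps(attachment, default=json_default, sort_keys=True)
```

The importer would then have to recognize the wrapper key and call `base64.b64decode` to get the original bytes back.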

@glensc
Member Author

glensc commented Apr 9, 2017

My first guess was that maybe some legacy svn commits used non-UTF-8 commit messages and that messed up the export. But yes, indeed, the error comes from binary issue and wiki attachments.

I was looking for some text version of the output to be able to see what goes into the export.

So far I've just written a little wrapper for myself:

# cat pickledump.py
#!/usr/bin/python
import pickle
from pprint import pprint

from_export_file = 'comment.pickle'
# Pickle data is binary, so open the file in binary mode.
with open(from_export_file, 'rb') as f:
    data = pickle.load(f)

# Uncomment to leave the bulky sections out of the dump:
#del data['tickets']
#del data['wiki']

pprint(data)

@nazavode
Member

nazavode commented Apr 9, 2017

Good point. If I remember correctly, the --format=python option does something similar.

@glensc
Member Author

glensc commented Apr 9, 2017

--format=python is not mentioned in the README, but it is indeed listed in the --help output:

$ tracboat export --help
Usage: tracboat export [OPTIONS]

  export a complete Trac instance

Options:
  --trac-uri <uri>                uri of the Trac instance XMLRpc endpoint  [default: http://localhost/xmlrpc]
  --ssl-verify / --no-ssl-verify  Enable/disable SSL certificate verification  [default: True]
  --format [json|python|pickle]   export format  [default: pickle]
  --out-file <path>               Output file. If not specified, result will be written to stdout.
  --help                          Show this message and exit.

@nazavode
Member

nazavode commented Apr 9, 2017

Yeah I know, README is lacking :)

@nazavode nazavode added this to Backlog in First usable release Apr 9, 2017
@glensc
Member Author

glensc commented Apr 21, 2017

This decode error is hitting me at every turn: I can't use export, can't use import, and can't disable wiki importing without hacking the code. The stack trace, as already noted, is not very useful:

(VENV)root@soa:~/tracboat# tracboat --config-file=comment.toml export
INFO:export:crawling Trac instance: https://trac.example.org/trac/comment/login/xmlrpc
INFO:export:writing export to comment.pickle
Traceback (most recent call last):
  File "/home/glen/tracboat/VENV/bin/tracboat", line 9, in <module>
    load_entry_point('tracboat==0.2.0a0', 'console_scripts', 'tracboat')()
  File "/home/glen/tracboat/src/tracboat/cli.py", line 423, in main
    cli(obj={})  # pylint: disable=unexpected-keyword-arg,no-value-for-parameter
  File "/home/glen/tracboat/VENV/local/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/glen/tracboat/VENV/local/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/glen/tracboat/VENV/local/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/glen/tracboat/VENV/local/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/glen/tracboat/VENV/local/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/glen/tracboat/src/tracboat/cli.py", line 118, in wrapper
    return func(*args, **kwargs)
  File "/home/glen/tracboat/VENV/local/lib/python2.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/glen/tracboat/src/tracboat/cli.py", line 272, in export
    out_f.write(project)
  File "/home/glen/tracboat/VENV/lib/python2.7/codecs.py", line 688, in write
    return self.writer.write(data)
  File "/home/glen/tracboat/VENV/lib/python2.7/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 677560: ordinal not in range(128)

Oh, and there's no info on how to use logging; I also modified the code to dump the log to a file and tail -F it.
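
A minimal Python 3 sketch of the kind of change described, using only the standard `logging` module. The helper name `enable_file_logging` is hypothetical and this is not how tracboat itself configures logging.

```python
import logging


def enable_file_logging(path, level=logging.DEBUG):
    """Attach a file handler to the root logger (hypothetical helper, not part of tracboat)."""
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s:%(name)s:%(message)s")
    )
    root = logging.getLogger()
    root.setLevel(level)
    root.addHandler(handler)
    return handler
```

Calling it once at startup, for example `enable_file_logging("tracboat.log")`, leaves a file that can be followed with tail -F while the export runs.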

@glensc
Member Author

glensc commented Apr 21, 2017

Regarding creating the export file: you force UTF-8 there, but the result of a pickle dump is not UTF-8, is it?

def export(ctx, trac_uri, ssl_verify, format, out_file):  # pylint: disable=redefined-builtin
    """export a complete Trac instance"""
    LOG = logging.getLogger(ctx.info_name)
    #
    LOG.info('crawling Trac instance: %s', _sanitize_url(trac_uri))
    source = trac.connect(trac_uri, encoding='UTF-8', use_datetime=True, ssl_verify=ssl_verify)
    project = trac.project_get(source, collect_authors=True)
    project = _dumps(project, fmt=format)
    if out_file:
        LOG.info('writing export to %s', out_file)
        with codecs.open(out_file, 'wb', encoding='utf-8') as out_f:
            out_f.write(project)
    else:
        click.echo(project)
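
A sketch of one way around this, written for Python 3: pick the file mode per format, so pickle bytes are written untouched and only text formats pass through an encoder. The function name `write_export` is hypothetical, and on Python 2 the JSON branch would additionally need the serialized result to be `unicode`.

```python
import io
import json
import pickle


def write_export(path, project, fmt="pickle"):
    """Serialize project and write it with a mode matching the format (hypothetical helper)."""
    if fmt == "pickle":
        # Pickle output is arbitrary bytes: write it as-is, with no text encoding.
        with io.open(path, "wb") as out_f:
            out_f.write(pickle.dumps(project))
    elif fmt == "json":
        # JSON is text: serialize to a str, then encode exactly once on write.
        with io.open(path, "w", encoding="utf-8") as out_f:
            out_f.write(json.dumps(project, ensure_ascii=False, sort_keys=True))
    else:
        raise ValueError("unsupported format: %r" % fmt)
```

Reading back follows the same split: `'rb'` plus `pickle.load` for pickle, `encoding='utf-8'` plus `json.load` for JSON.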

As it seems, I currently can't save the .pickle or .json format (the data is not UTF-8), and can't load the .python format, which gives a "malformed string" error:

  File "/home/glen/tracboat/src/tracboat/cli.py", line 118, in wrapper
    return func(*args, **kwargs)
  File "/home/glen/tracboat/src/tracboat/cli.py", line 168, in wrapper
    return func(*args, **kwargs)
  File "/home/glen/tracboat/VENV/local/lib/python2.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/glen/tracboat/src/tracboat/cli.py", line 374, in migrate
    project = _loads(content, fmt=fmt)
  File "/home/glen/tracboat/src/tracboat/cli.py", line 72, in _loads
    return ast.literal_eval(content)
  File "/usr/lib/python2.7/ast.py", line 80, in literal_eval
    return _convert(node_or_string)
  File "/usr/lib/python2.7/ast.py", line 63, in _convert
    in zip(node.keys, node.values))
  File "/usr/lib/python2.7/ast.py", line 62, in <genexpr>
    return dict((_convert(k), _convert(v)) for k, v
  File "/usr/lib/python2.7/ast.py", line 63, in _convert
    in zip(node.keys, node.values))
  File "/usr/lib/python2.7/ast.py", line 62, in <genexpr>
    return dict((_convert(k), _convert(v)) for k, v
  File "/usr/lib/python2.7/ast.py", line 63, in _convert
    in zip(node.keys, node.values))
  File "/usr/lib/python2.7/ast.py", line 62, in <genexpr>
    return dict((_convert(k), _convert(v)) for k, v
  File "/usr/lib/python2.7/ast.py", line 79, in _convert
    raise ValueError('malformed string')
ValueError: malformed string
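
A likely cause, shown as a Python 3 sketch: `ast.literal_eval` only accepts literals, so a pretty-printed export stops being loadable as soon as it contains any object whose repr is a constructor call. That the export holds `datetime` values is an assumption here, suggested by `use_datetime=True` in the connect call; the sample data is made up. Python 3 words the error slightly differently, but it is still a `ValueError`.

```python
import ast
from datetime import datetime
from pprint import pformat

plain = {"summary": "fix login", "id": 42}
with_datetime = {"summary": "fix login", "time": datetime(2017, 4, 21, 10, 30)}

# Strings, numbers, tuples, lists, dicts, booleans and None survive the round trip.
plain_round_trips = ast.literal_eval(pformat(plain)) == plain

try:
    ast.literal_eval(pformat(with_datetime))
    datetime_round_trips = True
except ValueError:
    # The repr 'datetime.datetime(2017, 4, 21, 10, 30)' is a call, not a literal.
    datetime_round_trips = False
```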

And I can't run migrate without an export file because gitlab_direct can't connect to the db, and there's no info on how to configure that...

  File "/home/glen/tracboat/VENV/local/lib/python2.7/site-packages/peewee.py", line 3698, in _create_connection
    return self._connect(self.database, **self.connect_kwargs)
  File "/home/glen/tracboat/VENV/local/lib/python2.7/site-packages/peewee.py", line 4092, in _connect
    conn = psycopg2.connect(database=database, **kwargs)
  File "/home/glen/tracboat/VENV/local/lib/python2.7/site-packages/psycopg2/__init__.py", line 130, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
peewee.OperationalError: FATAL:  Peer authentication failed for user "gitlab"

@Sukender

Sukender commented Jun 4, 2018

I've been struggling with this and got some results (I can write the JSON output). I don't know if everything is okay, but I'm leaving my hints here for future reference:
I added ftfy==4.4.3 (the latest version for Python 2). Then in cli.py, I added some imports:

from ftfy import fix_encoding
import collections

And then a new function, inspired by SO (with added unicode support):

def convert(data):
    # Using Python2 str/unicode difference (WILL NOT WORK with Python3! str is always unicode and basestring disappeared)
    if isinstance(data, str):
        return data.decode('utf-8', 'replace')
    elif isinstance(data, unicode):
        return fix_encoding(data)
    elif isinstance(data, collections.Mapping):
        return dict(map(convert, data.iteritems()))
    elif isinstance(data, collections.Iterable):
        return type(data)(map(convert, data))
    else:
        return data

And finally added a call to that function in _dumps:

def _dumps(obj, fmt=None):
    if fmt == 'toml':
        return toml.dumps(obj)
    elif fmt == 'json':
        return json.dumps(convert(obj), sort_keys=True, indent=2, default=json_util.default)
    elif fmt == 'python':
        return pformat(obj, indent=2)
    elif fmt == 'pickle':
        return pickle.dumps(obj)
    else:
        return str(obj)

Note that convert() may need to be called for other formats too (I just focused on JSON).
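
Since the function above is Python 2 only, here is a rough Python 3 counterpart as a sketch. It leaves out the ftfy `fix_encoding` step, the name `convert_py3` is made up, and turning sets and tuples into lists is a simplification chosen so the result is always JSON-serializable.

```python
from collections.abc import Mapping


def convert_py3(data):
    """Recursively turn bytes into str so json.dumps accepts the result (sketch)."""
    if isinstance(data, bytes):
        # Undecodable bytes become U+FFFD instead of raising UnicodeDecodeError.
        return data.decode("utf-8", "replace")
    if isinstance(data, str):
        # Checked before the container cases, because str is itself iterable.
        return data
    if isinstance(data, Mapping):
        return {convert_py3(k): convert_py3(v) for k, v in data.items()}
    if isinstance(data, (list, tuple, set)):
        return [convert_py3(item) for item in data]
    return data
```

As with the original, decoding with `'replace'` is lossy for real binary attachments: it makes the export serializable but does not preserve the attachment bytes.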

Hope this will be useful.

@don-vip

don-vip commented May 25, 2019

I faced this problem too and can confirm that @Sukender's patch resolves it. Could it be applied?

don-vip added a commit to don-vip/tracboat that referenced this issue May 26, 2019