json output utf8 error #6
Comments
although for some reason i'm sure the error comes from binary attachments, not actual text data. |
Thanks for the report. Yes, the errors are related to attachments. I've run into a lot of issues trying to serialize base64 data coming from xmlrpc into json, so I have now switched from json to pickle as the default export format. My plan is to fix migration using a format that is known to work for sure (pickle) and then figure out how to fix all these serialization/deserialization problems. Allowing the user to specify the encoding could be useful, but I'm not 100% sure: how can someone know in advance whether the content of their binary attachments will be byte-serializable in a particular encoding? I was thinking about a more general approach, but I have to think about it. |
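For reference, one possible "more general approach" is to stop guessing at encodings and wrap binary values in base64 before they reach `json`. The following is only a sketch under assumptions, not what the project does: it targets Python 3, and the helper names `encode_binary` / `decode_binary` and the `__base64__` marker key are made up for illustration.

```python
import base64
import json
import xmlrpc.client


def encode_binary(obj):
    """json.dumps 'default' hook: wrap binary payloads as base64 text."""
    if isinstance(obj, xmlrpc.client.Binary):
        obj = obj.data  # raw bytes carried by the XML-RPC wrapper
    if isinstance(obj, (bytes, bytearray)):
        # '__base64__' is a hypothetical marker key, chosen for this sketch
        return {'__base64__': base64.b64encode(bytes(obj)).decode('ascii')}
    raise TypeError('cannot serialize %r' % type(obj))


def decode_binary(dct):
    """json.loads 'object_hook': turn marked dicts back into bytes."""
    if set(dct) == {'__base64__'}:
        return base64.b64decode(dct['__base64__'])
    return dct


attachment = xmlrpc.client.Binary(b'\x89PNG\r\n\xff\xfe')
exported = json.dumps({'name': 'logo.png', 'data': attachment},
                      default=encode_binary, sort_keys=True)
restored = json.loads(exported, object_hook=decode_binary)
```

The round trip is lossless because the bytes never have to be valid in any text encoding; the cost is roughly a third more space for each attachment.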
my first guess was that maybe some legacy svn commits used non-utf8 commit messages and that messed up the export. but yes, indeed the error comes from binary issue and wiki attachments. i was looking for some text version of the output to be able to see what goes into the export. so far i wrote just a little wrapper for myself:
# cat pickledump.py
#!/usr/bin/python
import pickle
from pprint import pprint

from_export_file = 'comment.pickle'
# pickle data is binary, so read the file in binary mode
with open(from_export_file, 'rb') as f:
    content = f.read()
data = pickle.loads(content)
# uncomment to hide the large sections
#del data['tickets']
#del data['wiki']
pprint(data) |
Nice point. If I remember well, the |
|
Yeah I know, README is lacking :) |
this decode error is kicking me at every corner. can't use export, can't use import. can't disable wiki importing without hacking the code. the stacktrace, as already known, is not very useful
oh, there's no info how to use logging, i also modified code to dump log to file and |
regarding creating the export file: you force utf-8 there, but the result of pickle dump is not utf8, is it?
def export(ctx, trac_uri, ssl_verify, format, out_file):  # pylint: disable=redefined-builtin
    """export a complete Trac instance"""
    LOG = logging.getLogger(ctx.info_name)
    #
    LOG.info('crawling Trac instance: %s', _sanitize_url(trac_uri))
    source = trac.connect(trac_uri, encoding='UTF-8', use_datetime=True, ssl_verify=ssl_verify)
    project = trac.project_get(source, collect_authors=True)
    project = _dumps(project, fmt=format)
    if out_file:
        LOG.info('writing export to %s', out_file)
        with codecs.open(out_file, 'wb', encoding='utf-8') as out_f:
            out_f.write(project)
    else:
        click.echo(project)
as it seems, currently i can't save .pickle or .json format (data is not utf-8), and can't load .python format, which gives a "Malformed string" error:
and can't run migrate without export file because the gitlab_direct can't connect to db, and no info how to configure that...
|
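To illustrate the point about pickle output not being UTF-8: a pickled payload is arbitrary bytes, so pushing it through a UTF-8 text codec can fail or corrupt it. Below is a minimal sketch, assuming Python 3, of writing the export without a text codec. `write_export` is a hypothetical helper, not a function from the project.

```python
import os
import pickle
import tempfile


def write_export(payload, out_file):
    """Write an export payload: bytes go out raw, text is encoded as UTF-8."""
    if isinstance(payload, bytes):
        data = payload  # e.g. pickle output, must not be re-encoded
    else:
        data = payload.encode('utf-8')  # e.g. json or pformat output
    with open(out_file, 'wb') as out_f:
        out_f.write(data)


# sample data with a value that is not valid UTF-8
project = {'attachment': b'\xff\xfe\x00binary', 'summary': 'caf\u00e9'}
path = os.path.join(tempfile.mkdtemp(), 'export.pickle')
write_export(pickle.dumps(project), path)

with open(path, 'rb') as in_f:
    restored = pickle.load(in_f)
```

Opening the file in plain binary mode keeps the decision about encoding in one place: only text formats are encoded, and binary formats pass through untouched.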
I'm struggling with this and had some results (I can write the JSON output). I don't know if everything is okay, but at least I'm giving my hints here for future reference:
from ftfy import fix_encoding
import collections
And then a new function, inspired from SO (with added unicode support):
def convert(data):
    # Using Python2 str/unicode difference (WILL NOT WORK with Python3! str is always unicode and basestring disappeared)
    if isinstance(data, str):
        return data.decode('utf-8', 'replace')
    elif isinstance(data, unicode):
        return fix_encoding(data)
    elif isinstance(data, collections.Mapping):
        return dict(map(convert, data.iteritems()))
    elif isinstance(data, collections.Iterable):
        return type(data)(map(convert, data))
    else:
        return data
And finally added a call to that function in
def _dumps(obj, fmt=None):
    if fmt == 'toml':
        return toml.dumps(obj)
    elif fmt == 'json':
        return json.dumps(convert(obj), sort_keys=True, indent=2, default=json_util.default)
    elif fmt == 'python':
        return pformat(obj, indent=2)
    elif fmt == 'pickle':
        return pickle.dumps(obj)
    else:
        return str(obj)
Note that the
Hope this will be useful. |
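Since the function above is Python 2 only, here is a rough sketch of the same idea for Python 3. It is an assumption-laden port, not the patch from this thread: it uses only the standard library (no `ftfy`), returns lists for all sequence types, and, like the original, replaces undecodable bytes, so binary attachments are not preserved exactly.

```python
import json
from collections.abc import Mapping


def convert(data):
    """Recursively turn bytes into text so json.dumps accepts the structure."""
    if isinstance(data, bytes):
        # lossy: invalid UTF-8 bytes become U+FFFD replacement characters
        return data.decode('utf-8', 'replace')
    if isinstance(data, str):
        return data
    if isinstance(data, Mapping):
        return {convert(key): convert(value) for key, value in data.items()}
    if isinstance(data, (list, tuple, set, frozenset)):
        return [convert(item) for item in data]
    return data


sample = {'title': b'caf\xc3\xa9', 'blob': b'\xff\xfe', 'tags': ('a', b'b')}
encoded = json.dumps(convert(sample), sort_keys=True)
```

This is enough to get a readable JSON dump for inspection, but an export meant to be imported again would need a lossless treatment of attachments.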
I faced this problem too and can confirm that @Sukender's patch resolves it. Could it be applied? |
invoking with json output
--format=json
i get error:
the traceback is not useful, so i'm not posting it here.
as you may already know, json is utf8 strict, no single byte encodings allowed. perhaps allow specifying the encoding, or just try to convert from latin1?
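The latin1 suggestion can be sketched as a fallback chain: try UTF-8 first, then latin-1, which maps every byte to a character and therefore never fails. This is only an illustration assuming Python 3. `decode_text` is a hypothetical helper, and it is only meaningful for real text, not for binary attachments.

```python
def decode_text(raw, encodings=('utf-8', 'latin-1')):
    """Decode bytes with the first encoding that accepts them."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # only reached if the caller passed a list without a catch-all encoding
    return raw.decode('utf-8', 'replace')


utf8_message = decode_text(b'caf\xc3\xa9')  # valid UTF-8
legacy_message = decode_text(b'caf\xe9')    # latin-1 bytes, invalid as UTF-8
```

Because latin-1 accepts anything, a wrong guess produces mojibake silently instead of an error, which is the trade-off of this approach.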