Python 3 compatibility issue -- pickle.load(), encoding argument #12

jmoseyko · 2018-07-23T01:21:21Z

Line 96 in predict.py:
model = pickle.load(f,encoding = 'latin1')

Error:
TypeError: load() got an unexpected keyword argument 'encoding'

Temporary workaround:
For Python 3, remove the encoding='latin1' argument entirely. The documentation says the optional arguments only control compatibility with pickles written by Python 2.x.
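Because pickle.load() on Python 2 accepts only the file object, while Python 3 also accepts encoding (used when reading pickles written by Python 2), a loader that works on both needs to branch on the interpreter version. The following is a minimal sketch, not CliNER's actual code; load_model is a hypothetical helper name.

```python
import io
import pickle
import sys


def load_model(f):
    """Hypothetical helper: load a pickled model from binary file f on Python 2 or 3."""
    if sys.version_info[0] >= 3:
        # Python 3: decode Python 2 8-bit strings as latin-1, which maps every
        # byte 0-255 to the same code point, so no byte is rejected or lost.
        return pickle.load(f, encoding='latin1')
    # Python 2: pickle.load() takes only the file object.
    return pickle.load(f)


if __name__ == '__main__':
    # Smoke test: round-trip a small object through an in-memory file.
    buf = io.BytesIO(pickle.dumps({'labels': ['problem', 'test']}, protocol=2))
    print(load_model(buf))
```

In predict.py this would replace the direct call, e.g. model = load_model(f) inside the existing with open(...) block.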

alexmloveless · 2018-07-27T12:34:00Z

This breaks on Python 2.7 too: pickle.load() on 2.x accepts no encoding argument.

$ /usr/local/anaconda2/bin/python cliner predict --txt data/examples/ex_doc.txt --out data/predictions --model models/silver.crf --format i2b2
Traceback (most recent call last):
  File "cliner", line 60, in <module>
    main()
  File "cliner", line 52, in main
    predict.main()
  File "/data/CliNER-master/code/predict.py", line 79, in main
    predict(files, args.model, args.output, format=format)
  File "/data/CliNER-master/code/predict.py", line 96, in predict
    model = pickle.load(f,encoding = 'latin1')
TypeError: load() got an unexpected keyword argument 'encoding'

Removing the encoding argument results in a different error (same command):

  File "/data/CliNER-master/code/machine_learning/crf.py", line 181, in predict
    clf_byte = bytearray(clf, 'latin1')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf4 in position 6: ordinal not in range(128)

I'm not sure if this is a fault with the code or the encoding in the silver.crf pickle.

EDIT:
Temporary workaround. Remove the encoding as above, then on line 181 of code/machine_learning/crf.py remove the encoding from the bytearray() arguments:

before:

clf_byte = bytearray(clf, 'latin1')

after:

clf_byte = bytearray(clf)
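To help decide whether the problem lies in the code or in the silver.crf pickle, one option is to check whether the file contains the opcodes Python 2 uses for its 8-bit str type (STRING, BINSTRING, SHORT_BINSTRING); as far as I know, Python 3's pickler does not emit these. If they are present, the model was most likely written by Python 2 and needs an encoding when loaded on Python 3. This is a sketch using the standard-library pickletools.genops(); has_py2_strings is a hypothetical name.

```python
import pickletools

# Opcodes that Python 2 uses for 8-bit str objects.
PY2_STRING_OPCODES = {'STRING', 'BINSTRING', 'SHORT_BINSTRING'}


def has_py2_strings(data):
    """Hypothetical helper: True if pickle data (bytes or binary file) has Python 2 str opcodes."""
    # genops() walks the opcode stream without executing the pickle.
    return any(op.name in PY2_STRING_OPCODES
               for op, _arg, _pos in pickletools.genops(data))
```

Since genops() also accepts a file object, the model can be checked with something like: with open('models/silver.crf', 'rb') as f: print(has_py2_strings(f)).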

simius · 2018-08-16T19:22:26Z

Changes have been made to model.py, predict.py, and crf.py to address the encoding issues.

rajgupt · 2018-12-12T07:54:44Z

@simthyrearch For Python 3.7, this issue is still there.

(cliner) C:\Users\rgupta98\github\CliNER\code>python
Python 3.7.1 (default, Dec 10 2018, 22:54:23) [MSC v.1915 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>

(cliner) C:\Users\rgupta98\github\CliNER\code>python ..\cliner predict --txt ../data/examples/ex_doc.txt --out ../data/predictions --model ../models/silver.crf --format i2b2
Traceback (most recent call last):
  File "..\cliner", line 60, in <module>
    main()
  File "..\cliner", line 52, in main
    predict.main()
  File "C:\Users\rgupta98\github\CliNER\code\predict.py", line 79, in main
    predict(files, args.model, args.output, format=format)
  File "C:\Users\rgupta98\github\CliNER\code\predict.py", line 96, in predict
    model = pickle.load(f, encoding='utf8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf4 in position 6: invalid continuation byte
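The utf8 failure is expected if the model was pickled under Python 2: its str objects are raw bytes, and a byte such as 0xf4 followed by a non-continuation byte is not valid UTF-8. latin-1, by contrast, decodes every byte, and encoding='bytes' keeps them as bytes. The sketch below demonstrates this with a tiny hand-written protocol-2 pickle of a Python 2 str (an assumption standing in for silver.crf, not its real contents); try_decode is a hypothetical helper.

```python
import pickle

# Hand-written protocol-2 pickle of a Python 2 str holding bytes 0xF4 0x01:
# PROTO 2, SHORT_BINSTRING of length 2, STOP.
PY2_STR_PICKLE = b'\x80\x02U\x02\xf4\x01.'


def try_decode(encoding):
    """Unpickle PY2_STR_PICKLE with the given encoding; return the object or the error."""
    try:
        return pickle.loads(PY2_STR_PICKLE, encoding=encoding)
    except UnicodeDecodeError as exc:
        return exc


if __name__ == '__main__':
    for enc in ('utf8', 'latin1', 'bytes'):
        print(enc, repr(try_decode(enc)))
```

Only utf8 fails; the latin1 result encodes back to the original two bytes, which is why latin1 is the usual choice for Python 2 pickles.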

(cliner) C:\Users\rgupta98\github\CliNER\code>python ..\cliner predict --txt ../data/examples/ex_doc.txt --out ../data/predictions --model ../models/silver.crf --format i2b2
C:\Users\rgupta98\AppData\Local\Continuum\anaconda3\envs\cliner\lib\site-packages\sklearn\base.py:251: UserWarning: Trying to unpickle estimator DictVectorizer from version 0.19.1 when using version 0.20.1. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)

        1 of 1
        ../data/examples/ex_doc.txt

        vectorizing words all
        predicting  labels all
Traceback (most recent call last):
  File "..\cliner", line 60, in <module>
    main()
  File "..\cliner", line 52, in main
    predict.main()
  File "C:\Users\rgupta98\github\CliNER\code\predict.py", line 79, in main
    predict(files, args.model, args.output, format=format)
  File "C:\Users\rgupta98\github\CliNER\code\predict.py", line 179, in predict
    labels = model.predict_classes_from_document(note)
  File "C:\Users\rgupta98\github\CliNER\code\model.py", line 282, in predict_classes_from_document
    return self.predict_classes(tokenized_sents)
  File "C:\Users\rgupta98\github\CliNER\code\model.py", line 313, in predict_classes
    hyperparams = hyperparams)
  File "C:\Users\rgupta98\github\CliNER\code\model.py", line 707, in generic_predict
    predictions =   crf.predict(clf, X)
  File "C:\Users\rgupta98\github\CliNER\code\machine_learning\crf.py", line 181, in predict
    clf_byte = bytearray(clf)
TypeError: string argument without an encoding
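This TypeError is the Python 3 side of the Python 2 workaround: with pickle.load(..., encoding='latin1'), clf comes back as a str, and bytearray() on a str requires an encoding. Assuming clf is either raw bytes (Python 2 str or Python 3 bytes) or a latin-1-decoded str, as these tracebacks suggest, a version-agnostic conversion for crf.py line 181 could look like the following sketch; to_bytearray is a hypothetical name.

```python
def to_bytearray(clf):
    """Hypothetical helper: return the serialized CRF model clf as a bytearray."""
    if isinstance(clf, (bytes, bytearray)):
        # Python 2 str (an alias of bytes) or Python 3 bytes: copy as-is.
        return bytearray(clf)
    # Python 3 str from pickle.load(..., encoding='latin1'): latin-1 maps
    # each code point 0-255 back to the original byte.
    return bytearray(clf, 'latin1')
```

Line 181 would then become clf_byte = to_bytearray(clf), which covers both the Python 2 UnicodeDecodeError and this Python 3 TypeError.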

suryacaprice · 2018-12-17T11:53:52Z

Same here, the issue is still there.

simius self-assigned this Aug 16, 2018

simius added the bug label Aug 16, 2018

simius closed this as completed Aug 16, 2018

correlator mentioned this issue Dec 17, 2018

ensure that the model has a default encoding of latin #16

Merged
