This repository has been archived by the owner on Aug 15, 2020. It is now read-only.

Python 3 compatibility issue -- pickle.load(), encoding argument #12

Closed
jmoseyko opened this issue Jul 23, 2018 · 4 comments · Fixed by #16

jmoseyko commented Jul 23, 2018

Line 96 in predict.py:
model = pickle.load(f,encoding = 'latin1')

Error:
TypeError: load() got an unexpected keyword argument 'encoding'

Temporary work-around:
For Python 3, remove the encoding argument entirely. The documentation says the optional arguments (fix_imports, encoding, errors) only control compatibility with pickle streams generated by Python 2.
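
The TypeError above is what Python 2's pickle.load() raises when given a keyword, while Python 3 accepts encoding. So one alternative to deleting the argument is to pass it only where it is supported. A minimal sketch, assuming the model file was pickled under Python 2; `load_model` is a hypothetical helper, not a function in predict.py:

```python
import pickle
import sys


def load_model(path):
    """Load a pickle on Python 2 or 3 (hypothetical helper, not part of CliNER)."""
    with open(path, 'rb') as f:
        if sys.version_info[0] >= 3:
            # Python 3: 'latin1' maps every byte 0-255 to a code point, so
            # 8-bit str objects from a Python 2 pickle decode without error.
            return pickle.load(f, encoding='latin1')
        # Python 2: pickle.load() accepts only the file object.
        return pickle.load(f)
```

With this, line 96 of predict.py could become something like `model = load_model(path)`, where `path` stands for whatever model file name is in scope there.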

alexmloveless commented Jul 27, 2018

This breaks for Python 2.7 too. pickle.load() takes no encoding argument on 2.x.

$ /usr/local/anaconda2/bin/python cliner predict --txt data/examples/ex_doc.txt --out data/predictions --model models/silver.crf --format i2b2
Traceback (most recent call last):
  File "cliner", line 60, in <module>
    main()
  File "cliner", line 52, in main
    predict.main()
  File "/data/CliNER-master/code/predict.py", line 79, in main
    predict(files, args.model, args.output, format=format)
  File "/data/CliNER-master/code/predict.py", line 96, in predict
    model = pickle.load(f,encoding = 'latin1')
TypeError: load() got an unexpected keyword argument 'encoding'

Removing the encoding argument results in a different error (same command):

File "/data/CliNER-master/code/machine_learning/crf.py", line 181, in predict
clf_byte = bytearray(clf, 'latin1')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf4 in position 6: ordinal not in range(128)

I'm not sure if this is a fault with the code or the encoding in the silver.crf pickle.

EDIT:
Temporary workaround. Remove the encoding as above, then on line 181 of code/machine_learning/crf.py remove the encoding from the bytearray() arguments:

before:

clf_byte = bytearray(clf, 'latin1')

after:
clf_byte = bytearray(clf)
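
The two forms of the call behave differently across versions: on Python 3, bytearray() requires an encoding when given a str and rejects one when given bytes, while on Python 2 passing an encoding with an 8-bit str triggers an implicit ASCII decode first. A sketch of a conversion that tolerates both cases; `to_bytearray` is a hypothetical helper, and whether `clf` is text or raw bytes at that point in crf.py is an assumption:

```python
def to_bytearray(clf, encoding='latin1'):
    """Return clf as a bytearray (hypothetical helper, not part of CliNER)."""
    if isinstance(clf, (bytes, bytearray)):
        # Python 2 str and Python 3 bytes are already raw bytes.
        return bytearray(clf)
    # Text (Python 3 str, Python 2 unicode) must be encoded explicitly.
    return bytearray(clf, encoding)
```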

@simius simius self-assigned this Aug 16, 2018
@simius simius added the bug label Aug 16, 2018
Collaborator

simius commented Aug 16, 2018

Changes to model.py, predict.py, and crf.py have been made to address the encoding issues.

@simius simius closed this as completed Aug 16, 2018
rajgupt commented Dec 12, 2018

@simthyrearch - for python 3.7 this issue is still there.

(cliner) C:\Users\rgupta98\github\CliNER\code>python
Python 3.7.1 (default, Dec 10 2018, 22:54:23) [MSC v.1915 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>

(cliner) C:\Users\rgupta98\github\CliNER\code>python ..\cliner predict --txt ../data/examples/ex_doc.txt --out ../data/predictions --model ../models/silver.crf --format i2b2
Traceback (most recent call last):
  File "..\cliner", line 60, in <module>
    main()
  File "..\cliner", line 52, in main
    predict.main()
  File "C:\Users\rgupta98\github\CliNER\code\predict.py", line 79, in main
    predict(files, args.model, args.output, format=format)
  File "C:\Users\rgupta98\github\CliNER\code\predict.py", line 96, in predict
    model = pickle.load(f, encoding='utf8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf4 in position 6: invalid continuation byte
(cliner) C:\Users\rgupta98\github\CliNER\code>python ..\cliner predict --txt ../data/examples/ex_doc.txt --out ../data/predictions --model ../models/silver.crf --format i2b2
C:\Users\rgupta98\AppData\Local\Continuum\anaconda3\envs\cliner\lib\site-packages\sklearn\base.py:251: UserWarning: Trying to unpickle estimator DictVectorizer from version 0.19.1 when using version 0.20.1. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)

        1 of 1
        ../data/examples/ex_doc.txt

        vectorizing words all
        predicting  labels all
Traceback (most recent call last):
  File "..\cliner", line 60, in <module>
    main()
  File "..\cliner", line 52, in main
    predict.main()
  File "C:\Users\rgupta98\github\CliNER\code\predict.py", line 79, in main
    predict(files, args.model, args.output, format=format)
  File "C:\Users\rgupta98\github\CliNER\code\predict.py", line 179, in predict
    labels = model.predict_classes_from_document(note)
  File "C:\Users\rgupta98\github\CliNER\code\model.py", line 282, in predict_classes_from_document
    return self.predict_classes(tokenized_sents)
  File "C:\Users\rgupta98\github\CliNER\code\model.py", line 313, in predict_classes
    hyperparams = hyperparams)
  File "C:\Users\rgupta98\github\CliNER\code\model.py", line 707, in generic_predict
    predictions =   crf.predict(clf, X)
  File "C:\Users\rgupta98\github\CliNER\code\machine_learning\crf.py", line 181, in predict
    clf_byte = bytearray(clf)
TypeError: string argument without an encoding
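
Both errors in the tracebacks above can be reproduced in isolation on Python 3, which may help narrow down where the fix belongs. A small sketch; the byte string is made up for illustration and is not taken from silver.crf:

```python
# Illustrative bytes only: 0xf4 sits at index 6, followed by a space.
raw = b'model \xf4 data'

# 1) Decoding as UTF-8 fails, because 0xf4 starts a multi-byte sequence
#    and the next byte is not a valid continuation byte.
try:
    raw.decode('utf8')
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# 2) Decoding as latin1 cannot fail: every byte maps to one code point.
text = raw.decode('latin1')

# 3) On Python 3, bytearray() refuses a str unless an encoding is given.
try:
    bytearray(text)
    bytearray_error = None
except TypeError as exc:
    bytearray_error = exc

# Supplying the same encoding restores the original bytes exactly.
round_trip = bytearray(text, 'latin1')
```

This suggests that on Python 3 the load in predict.py and the bytearray() call in crf.py need to agree on latin1, rather than one using utf8 or omitting the encoding.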

@suryacaprice

Same here, the issue is still there.
