Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

code changes via PM from u/fio_smiles #26

Closed
wants to merge 10 commits into from
Closed

code changes via PM from u/fio_smiles #26

wants to merge 10 commits into from

Conversation

torbengb
Copy link
Member

@torbengb torbengb commented Oct 20, 2017

code changes via PM from u/fio_smiles to address issue #25: "I don't know if that will work with Python 3+". Let's review and test it, see if we can build on it.
I will test it on my Linux tonight.

"I don't know if that will work with Python 3+"
Let's review and test it, see if we can build on it.
Copy link
Member

@nocalla nocalla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'd consider here is doing a Python version check before each of the relevant steps and doing what needs to be done only then. Wasn't there another error you were getting? Also need to check before importing config parser which version to import.

@nocalla
Copy link
Member

nocalla commented Oct 20, 2017

PM him back and get him to join Github so we can credit him properly!

@nocalla
Copy link
Member

nocalla commented Oct 20, 2017

Or her! Accidental sexism, sorry!

@torbengb
Copy link
Member Author

I'm working on it via PM :-) S/he is already intrigued by GitHub and might sign up. Would be cool!

@torbengb
Copy link
Member Author

Okay I'm having trouble running this modified code. As-is I got this error:

Traceback (most recent call last):
  File "newtest.py", line 27, in <module>
    from future import unicode_literals # issue #25
ImportError: No module named future

So I tried installing future with the usual command sudo python -m pip install future but then it said:

Could not import setuptools which is required to install from a source distribution.
Please install setuptools.

So I did: sudo python -m pip install setuptools followed by sudo python -m pip install future. Then I got:

Traceback (most recent call last):
  File "newtest.py", line 27, in <module>
    from future import unicode_literals # issue #25
ImportError: cannot import name unicode_literals

I don't know what unicode_literals is but I tried installing that like above, but failed ("No matching distribution found for unicode_literals").

Without a better understanding of Python, I'm going to have to pass on this code change. Sorry.

@nocalla
Copy link
Member

nocalla commented Oct 20, 2017 via email

@torbengb torbengb removed their assignment Oct 20, 2017
@fiosmiles
Copy link

She :)
Not a developer and just hacked about with this to make it work, but I'll take a look next week and see if I can help.

It's future which may have been lost in formatting, see this link:
http://python-future.org/unicode_literals.html

@fiosmiles
Copy link

Ahhh, sorry the automatically formating is ruining it, there's two underscores before and after the word future, if you follow the link it should make more sense.

@torbengb
Copy link
Member Author

torbengb commented Oct 21, 2017

Welcome @fiosmiles!! Great to see you on board!

I've added the underscores (see new commit linked above) and now there's just a single issue left, but I can't figure out how to solve it:

  File "bank2ynab.py", line 81, in clean_data
    transaction_reader = csv.reader(transaction_file, delimiter = delim)
TypeError: "delimiter" must be string, not unicode

I've tried appending .encode('utf-8') or .decode('utf-8') to the assignment a few lines above, but it doesn't seem to make a difference. The specific line being referenced puzzles me because there's nothing in there:

 output_data = []

That's when I force the script to run as Python 2 (using the linux command python2 bank2ynab.py). When I run it as Python 3 (python3 bank2ynab.py), it works fine.

@nocalla
Copy link
Member

nocalla commented Oct 21, 2017

I haven't played with this yet. That future thing is interesting!

Edit: I see what you meant about auto formatting now @fiosmiles!

Does the CSV without the newline output correctly on python 3? The reason I had it was because I was getting extra blank rows between each row without.

@torbengb
Copy link
Member Author

Does the CSV without the newline output correctly on python 3?

Yes it does. I think the "wb" setting in line 125 did the trick.

@nocalla
Copy link
Member

nocalla commented Oct 21, 2017 via email

@nocalla
Copy link
Member

nocalla commented Oct 21, 2017

I'm getting errors.
For the write_data change to "wb":
Traceback (most recent call last): File "bank2ynab.py", line 174, in <module> main() File "bank2ynab.py", line 167, in main write_data(file, output) File "bank2ynab.py", line 129, in write_data writer.writerow(row) TypeError: a bytes-like object is required, not 'str'

And for encoding the delimiter:
Traceback (most recent call last): File "bank2ynab.py", line 174, in <module> main() File "bank2ynab.py", line 166, in main output = clean_data(file) File "bank2ynab.py", line 84, in clean_data transaction_reader = csv.reader(transaction_file, delimiter = delim) TypeError: "delimiter" must be string, not bytes

@fiosmiles
Copy link

@nocalla is there a sample file that everyone is using? If so, can you point me at it and I'll try that file, or if not can you share your file or some version of it and I'll try it on my end.

@fiosmiles
Copy link

Also, this is something I was looking at when I hacked this to work for me on 2.7, maybe the switch is worth implementing if it's just not compatible with 3+

https://www.reddit.com/r/ynab/comments/77d7rt/tools_bank2ynab_heres_a_script_that_helps_with/

@nocalla
Copy link
Member

nocalla commented Oct 21, 2017 via email

@torbengb
Copy link
Member Author

Here's a test file, feel free to add more in that folder: https://github.com/torbengb/bank2ynab/tree/master/test-data
I hope I anonymized it sufficiently :-)

@nocalla
Copy link
Member

nocalla commented Oct 22, 2017

I like this solution for the CSV issue.

@nocalla
Copy link
Member

nocalla commented Oct 22, 2017

Also, we have to account for the renamed configparser! I think both of you have installed the Python 3 version as a workaround, but we should make it dependent on the pre-installed modules only. I think the best way to do that is import ConfigParser as configparser near the top of the code!

nocalla and others added 4 commits October 22, 2017 10:24
Also remove delimiter encoding as shouldn't be needed with unicode_literals and was causing me errors
Also move future to top of file to prevent error

SyntaxError: from __future__ imports must occur at the beginning of the file
TypeError: "delimiter" must be string, not bytes
Add python version check for csv writing
If running 2.x, import ConfigParser but rename it to configparser. I don't think this should cause any issues with the features we're using of the config parser.
Add support for 2.x version of configparser
@nocalla
Copy link
Member

nocalla commented Oct 22, 2017

This all works fine at my end, but I need @fiosmiles and @torbengb to test it out, because I know it won't work properly at your end! If we can pinpoint the errors, then we can track down what to focus on, so please paste your lovely exceptions.

@torbengb
Copy link
Member Author

I'll test your code on my Linux (tonight?) and reply then.

@torbengb torbengb self-assigned this Oct 22, 2017
@torbengb
Copy link
Member Author

torbengb commented Oct 22, 2017

Sorry to bring bad news, but it doesn't run against my normal CSV file (very similar to the uploaded test file).

With Python 2 I get this error:

Traceback (most recent call last):
  (...)
  File "bank2ynab.py", line 41, in get_configs
    config.read(conf_files, encoding = "utf-8")
TypeError: read() got an unexpected keyword argument 'encoding'

And with Python 3 I get this error:

Traceback (most recent call last):
  (...)
  File "bank2ynab.py", line 89, in clean_data
    transaction_data = list(transaction_reader)
  File "/usr/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 541: invalid continuation byte

I then tried a search&replace for all the German umlaut characters in both lowercase&uppercase, but I still got the errors. I noted that the VISA CSV export has a LF line ending while the bank CSV export has CR+LF line endings. This is from the same bank, from the same online banking portal. I didn't try messing around with these line endings though.

I then used Notepad++ to convert the CSV file to ANSI file format. Note that I only converted the bank CSV and not the VISA CSV. After that, the error in Python 2 remains but the script ran successfully in Python 3!

I can't make much out of this but I'm sure a detailed feedback might help you identify something I've missed. Mail me at torben@g-b.dk and I can email you the actual files back. I don't want to post them here.

Greetings from rainy Austria,
Ben

@nocalla
Copy link
Member

nocalla commented Oct 23, 2017

After merging issue #29 into this, do you still get the same error here, @torbengb?

@torbengb
Copy link
Member Author

@nocalla I will test on my home Linux home tonight and respond here.

@torbengb
Copy link
Member Author

Test result with Python3: success! No errors found.

Test result with Python2:

Traceback (most recent call last):
  File "bank2ynab.py", line 197, in <module>
    main()
  File "bank2ynab.py", line 170, in main
    all_configs = get_configs()
  File "bank2ynab.py", line 34, in get_configs
    config.read(conf_files, encoding = "utf-8")
TypeError: read() got an unexpected keyword argument 'encoding'

@nocalla
Copy link
Member

nocalla commented Oct 24, 2017 via email

@torbengb
Copy link
Member Author

Well, if I leave out the , encoding = "utf-8" from line 34 then I get a different error:

Traceback (most recent call last):
  File "bank2ynab.py", line 196, in <module>
    main()
  File "bank2ynab.py", line 177, in main
    g_config = fix_conf_params(all_configs[section])
AttributeError: ConfigParser instance has no attribute '__getitem__'

But it still works in Python3!

@nocalla
Copy link
Member

nocalla commented Oct 24, 2017 via email

@torbengb
Copy link
Member Author

Hang on, I downloaded new CSV files and also downloaded a fresh copy of the bank2ynab.py in this pull request.

Test result with Python3: success! No errors found.

Test result with Python2:

Traceback (most recent call last):
  File "bank2ynab.py", line 196, in <module>
    main()
  File "bank2ynab.py", line 170, in main
    all_configs = get_configs()
  File "bank2ynab.py", line 34, in get_configs
    config.read(conf_files, encoding = "utf-8")
TypeError: read() got an unexpected keyword argument 'encoding'

Then I removed the , encoding = "utf-8"from the file and ran it again with Python 2:

Traceback (most recent call last):
  File "bank2ynab.py", line 196, in <module>
    main()
  File "bank2ynab.py", line 177, in main
    g_config = fix_conf_params(all_configs[section])
AttributeError: ConfigParser instance has no attribute '__getitem__'

Does this help you?

@nocalla
Copy link
Member

nocalla commented Oct 24, 2017

Need to do some reading about ConfigParser! But thanks.

@nocalla nocalla mentioned this pull request Oct 25, 2017
3 tasks
@toyg
Copy link
Collaborator

toyg commented Oct 25, 2017

@torbengb

AttributeError: ConfigParser instance has no attribute '__getitem__'

I handled this exact error in the py2 branch, the ConfigParser interface is different in 2 and 3, so that's done.

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4

Eh, this is bitchy. Basically the encoding for these files is incompatible with utf-8 somewhere in its space. This is unsolvable with precision unless we include something like chardet.

A possible strategy would be to do dummy csv runs through each file, surrounded by try/except, with the most popular charsets (utf-8, utf-16, iso-8859 variants etc) until one completes without error.

def detect_encoding(filepath):
    for enc in ['utf-8',...]:
        try:
            with open(filepath, "rb", encoding=enc):
                for row in  csv.reader(transaction_file): continue
            return enc
        except ValueError:   # parent of UnicodeDecodeError and all that
           continue

This has obvious performance implications, but it might be acceptable in most cases - considering people will use this once a month or so, waiting a few seconds more should not be a biggie.

@torbengb
Copy link
Member Author

I like your idea of iterating through several charsets, it's actually an elegant solution. I hope it's worth the trouble!

As for performance I'm sure it doesn't matter at all, given how the script already runs in the blink of an eye. I'd be surprised if it takes two seconds to complete this loop.

@nocalla
Copy link
Member

nocalla commented Oct 25, 2017

I quite like the try except way also. Interesting approach. I still feel it's a bit weird that I'm not getting that error on Windows though.

@nocalla
Copy link
Member

nocalla commented Oct 25, 2017

Made a pull request to merge @toyg 's branch into this so we can compare approaches. I think the py2 branch effectively quashes most errors but this one has most discussion attached! #54

@nocalla
Copy link
Member

nocalla commented Oct 25, 2017

Merged into the py2 branch. #54

@nocalla nocalla closed this Oct 25, 2017
torbengb added a commit that referenced this pull request Mar 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants