Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added a special parser for unpaid charges #16

Open
wants to merge 44 commits into
base: master
Choose a base branch
from

Conversation

fedex1
Copy link

@fedex1 fedex1 commented Jun 2, 2016

It appears that the current parse.py is very much intertwined with special cases. It may be better to have individual scripts to parse out particular parts of the data. For example unpaid charges could be it's own script. So I created one.

Also I'd like to run this against all the data. It appears taxbills.nyc already has all the data. How big is the entire data directory?

@talos
Copy link
Owner

talos commented Jun 2, 2016

Thanks for opening the PR!

I noticed that parse_unpaid.py appears to be a copy of parse.py with modifications. Instead of copying the file and modifying it, could you please apply the modifications to the original? Otherwise it is very difficult for me to see what you added.

I understand what you're saying about having separate functionality in separate scripts. In that case, you should confine all your new functionality to a separate file and import the requirements from parse.py, instead of copying everything over. If I accepted this PR as is, I would be adding ~500 lines of duplicate code.

The data directory is several hundred GB. I can't remember off the top of my head.

@fedex1
Copy link
Author

fedex1 commented Jun 2, 2016

Yes will make parse and parse_unpaid more modular.

The reason I did it this way is there a lot of special cases such as:


        if i == 0:
                continue

that do not apply in all cases.

For the data directory could we zip up only the .TXT files. I would volunteer to do that if you give me read access to the files on the machine. It would take a long time doing it over the internet (I believe)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants