Added a special parser for unpaid charges #16

fedex1 · 2016-06-02T03:46:53Z

It appears that the current parse.py is very much intertwined with special cases. It may be better to have individual scripts that each parse out a particular part of the data. For example, unpaid charges could be its own script, so I created one.

Also I'd like to run this against all the data. It appears taxbills.nyc already has all the data. How big is the entire data directory?

talos · 2016-06-02T11:25:41Z

Thanks for opening the PR!

I noticed that parse_unpaid.py appears to be a copy of parse.py with modifications. Instead of copying the file and modifying it, could you please apply the modifications to the original? Otherwise it is very difficult for me to see what you added.

I understand what you're saying about having separate functionality in separate scripts. In that case, you should confine all your new functionality to a separate file and import the requirements from parse.py, instead of copying everything over. If I accepted this PR as is, I would be adding ~500 lines of duplicate code.

The data directory is several hundred GB. I can't remember off the top of my head.
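To get an exact answer, and to see how much of the total is the .TXT output, one option is to sum file sizes by extension. The sketch below is hypothetical and uses only the Python standard library. The default `data` path and the helper names `size_by_extension` and `human` are assumptions, not part of the repository.

```python
import os
import sys
from collections import Counter
from pathlib import Path


def size_by_extension(root):
    """Return a Counter mapping lowercase file extension to total bytes under root."""
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                totals[path.suffix.lower()] += path.stat().st_size
            except OSError:
                # Skip files that disappear or cannot be stat'ed during the walk.
                continue
    return totals


def human(nbytes):
    """Format a byte count using binary units (B, KiB, MiB, GiB, TiB)."""
    size = float(nbytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024


if __name__ == "__main__":
    # Hypothetical default location of the tax bill data; pass a path to override.
    root = sys.argv[1] if len(sys.argv) > 1 else "data"
    totals = size_by_extension(root)
    for ext, nbytes in totals.most_common():
        print(f"{ext or '(none)':>8}  {human(nbytes)}")
    print(f"{'total':>8}  {human(sum(totals.values()))}")
```

Running it on the server (for example `python sizes.py /path/to/data`) only reads file metadata, so it should be quick compared with copying the data.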

fedex1 · 2016-06-02T11:50:42Z

Yes, I will make parse.py and parse_unpaid.py more modular.

The reason I did it this way is that there are a lot of special cases, such as:


        if i == 0:
                continue

that do not apply in all cases.
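One way to keep such special cases without copying the whole file is to turn them into parameters of a shared loop. In the repository, the shared loop would live in parse.py, and parse_unpaid.py would import it (e.g. `from parse import iter_records`) and pass its own options. This is a hypothetical sketch: `iter_records`, `skip_header`, `keep`, and `is_unpaid_line` are illustrative names, not functions from parse.py, and the search terms are assumptions.

```python
def iter_records(lines, skip_header=False, keep=None):
    """Yield (index, stripped line) pairs, applying optional special cases.

    skip_header: drop the first line (the `if i == 0: continue` case).
    keep: optional predicate; lines for which it returns False are skipped.
    """
    for i, line in enumerate(lines):
        if skip_header and i == 0:
            continue
        text = line.strip()
        if not text:
            continue  # skip blank lines
        if keep is not None and not keep(text):
            continue
        yield i, text


# Hypothetical filter for an unpaid-charges script; the terms are illustrative.
UNPAID_TERMS = ("unpaid", "outstanding charges")


def is_unpaid_line(text):
    """Return True if the line mentions one of the (assumed) unpaid-charge terms."""
    lowered = text.lower()
    return any(term in lowered for term in UNPAID_TERMS)


if __name__ == "__main__":
    # Made-up sample lines, only to show how the options combine.
    sample = [
        "Header line",
        "Unpaid charges, if any   $1,234.56",
        "",
        "Annual property tax      $9,876.00",
    ]
    for i, text in iter_records(sample, skip_header=True, keep=is_unpaid_line):
        print(i, text)
```

With this shape, parse.py can call `iter_records(lines)` with no special cases, while an unpaid-charges script passes `skip_header=True` and its own `keep` predicate, so only the new logic lives in the new file.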

For the data directory, could we zip up only the .TXT files? I would volunteer to do that if you give me read access to the files on the machine. I believe it would take a long time to do it over the internet.
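Archiving only the .TXT files could be done on the server with Python's standard `zipfile` module, so that only the resulting archive needs to cross the network. This is a minimal sketch under assumptions: the function name `zip_txt_files` is hypothetical, and the extension match is case-insensitive so both `.TXT` and `.txt` are included.

```python
import zipfile
from pathlib import Path


def zip_txt_files(root, archive_path, suffix=".txt"):
    """Write every file under root whose suffix matches (case-insensitive) into a zip.

    Paths inside the archive are stored relative to root so the directory
    layout is preserved. Returns the number of files added.
    """
    root = Path(root)
    count = 0
    # ZIP_DEFLATED compresses the text; ZipFile enables ZIP64 by default,
    # which is needed once the archive grows beyond 4 GiB.
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            if not path.is_file() or path.suffix.lower() != suffix:
                continue
            zf.write(path, arcname=path.relative_to(root).as_posix())
            count += 1
    return count
```

For example, `zip_txt_files("data", "txt_only.zip")` would walk a (hypothetical) `data` directory and return how many files were archived.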


fedex1 and others added 6 commits June 1, 2016 14:04

add setup script

8bbb83c

add parser for unpaid charges

2a7c317

add test script

57b3fbc

Add brew macos

0e7888a

script

68df4e2

Add previous charges

acdd7ff

fedex1 and others added 22 commits June 2, 2016 14:50

add searchfor

086353e

add simpler search

2d30f35

Add pdftotext

78d354b

Add training data

fc76c86

add big unpaid items

63c7df0

add big unpaid items

6644f27

add big unpaid items

075deca

add big unpaid items

14cd2b8

add big unpaid items

eb31183

add big unpaid items

be9595d

add outstanding charges

48cd929

add more unpaid terms

759bd1c

add data

973a4b5

add more information links

0e0f839

add more examples

a2489d3

Merge branch 'master' of https://github.com/civic-data/nyc-stabilizat…ion-unit-counts

d8f95ec

add more examples

6a87df2

add more

bf90d32

add searchfor links

64ebda2

add more data

84b83d9

add new

67ffb4d

nyc link

6cf878c

fedex1 and others added 16 commits June 9, 2016 21:52

add uniq

5f4a0d8

add uniq

377ce94

add uniq

0755c12

Add bbl

d6ed7b7

add data

63c6b2e

speed up

c7b94e7

add header

2b1d603

Add property mapping

ab74a2e

Add property mapping

c345400

Add property mapping

1cd45fd

add driver

7ac97c9

remove some type to speed it up

ed5f710

add subset2

2c17aed

add script

99a04c2

Add 2015

58154a6

add report

61124df
