Fix missing rows from GeoNames cities1000.txt. #4

Merged
merged 1 commit into from Mar 28, 2015

bdon
Contributor

bdon commented Mar 28, 2015

This fixes the parsing of the GeoNames CSV. You can reproduce the problem like this:

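import csv
import sys

# Lift csv's per-field size limit so very long fields do not raise an error.
csv.field_size_limit(sys.maxsize)

# 1. Raw line count of the file.
with open('cities1000.txt', 'rb') as f:
    count1 = sum(1 for _ in f)

# 2. Rows parsed with the default quoting (csv.QUOTE_MINIMAL).
with open('cities1000.txt', newline='', encoding='utf-8') as f:
    count2 = sum(1 for _ in csv.reader(f, delimiter='\t'))

# 3. Rows parsed with quoting disabled, so '"' is treated as ordinary data.
with open('cities1000.txt', newline='', encoding='utf-8') as f:
    count3 = sum(1 for _ in csv.reader(f, delimiter='\t', quoting=csv.QUOTE_NONE))

print(count1, count2, count3)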

which outputs:

144348 138630 144348

Adding the quoting option (quoting=csv.QUOTE_NONE) recovers the 5,718 missing rows (144348 - 138630) in the final CSV.
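A minimal in-memory reproduction of the same effect, assuming (as the counts above suggest) that the problem comes from fields containing a literal double quote. The three sample lines are invented for illustration and are not taken from cities1000.txt. With the default quoting (csv.QUOTE_MINIMAL), a field that begins with " opens a quoted field, so the reader keeps consuming tabs and newlines until it finds a closing quote or reaches the end of input, folding several lines into one row. With csv.QUOTE_NONE the quote is ordinary data and each line becomes exactly one row.

```python
import csv
import io

# Invented sample data (NOT real GeoNames rows): three tab-separated lines,
# where the second field of the first line happens to start with a literal quote.
SAMPLE = (
    '1\t"Quoted Town\t10\n'
    '2\tPlainville\t20\n'
    '3\tLastburg\t30\n'
)


def parse(text, **reader_kwargs):
    """Parse tab-separated text with csv.reader and return all rows as lists."""
    return list(csv.reader(io.StringIO(text), delimiter='\t', **reader_kwargs))


line_count = len(SAMPLE.splitlines())

# Default quoting: the stray quote swallows the following lines into one field.
default_rows = parse(SAMPLE)

# QUOTE_NONE: '"' is kept as a plain character, one row per line.
raw_rows = parse(SAMPLE, quoting=csv.QUOTE_NONE)

print(line_count, len(default_rows), len(raw_rows))
print(raw_rows[0])
```

The mismatch between line_count and len(default_rows) is the same signal as count1 versus count2 in the script above, just on a tiny input.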

bdon mentioned this pull request Mar 28, 2015
thampiman added a commit that referenced this pull request Mar 28, 2015
Fix missing rows from GeoNames cities1000.txt.
thampiman merged commit 7ab5241 into thampiman:master Mar 28, 2015
@bdon
Contributor Author

bdon commented Mar 28, 2015

Forgot to mention: the committed CSV should probably be regenerated as well.
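One way such a regeneration could look, as a sketch only: the project's actual generation script is not shown in this thread, and tsv_to_csv is a hypothetical helper that copies every column. It reads the TSV with quoting=csv.QUOTE_NONE so a literal " is ordinary data, then writes with csv.writer's default QUOTE_MINIMAL, which wraps fields containing quotes or commas in quotes and doubles any embedded quote. Reading the output back with default settings should then give the same rows. The sample input is invented.

```python
import csv
import io


def tsv_to_csv(src, dst):
    """Hypothetical helper: copy every row of a GeoNames-style TSV into CSV.

    Returns the number of rows written.
    """
    # QUOTE_NONE: treat '"' as plain data, one row per input line.
    reader = csv.reader(src, delimiter='\t', quoting=csv.QUOTE_NONE)
    # Default writer (QUOTE_MINIMAL) quotes only fields that need it
    # and doubles embedded quote characters.
    writer = csv.writer(dst)
    count = 0
    for row in reader:
        writer.writerow(row)
        count += 1
    return count


# Invented sample input (NOT real GeoNames data).
tsv = io.StringIO('1\t"Quoted Town\t10\n2\tPlainville, Est\t20\n')
out = io.StringIO()
written = tsv_to_csv(tsv, out)

# Reading the generated CSV back with default settings recovers the same rows.
round_trip = list(csv.reader(io.StringIO(out.getvalue())))
print(written, round_trip)
```

With real files, src and dst would be file objects opened via open(path, newline='', encoding='utf-8'), as the csv module documentation recommends. The committed CSV may keep only a subset of the GeoNames columns, which this sketch does not try to reproduce.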

@thampiman
Owner

Excellent. Thanks for that fix. I've regenerated the CSV and v1.1 is now up on PyPI. README is updated as well. Thanks again!
