
osm loader -- support for very large queries #41

Closed
knaaptime opened this issue May 20, 2015 · 8 comments

@knaaptime

Right now, very large OSM queries fail because Python runs out of memory before the network object can be written to disk.

Would it be possible to add an option in the OSM loader to stream the Overpass request to disk, allowing for really big networks?

@jiffyclub
Member

If you have a network that large won't you be unable to load it into memory anyway, even if it does make it to disk?

@knaaptime
Author

Well, the final network shouldn't be that big.

Before the OSM loader was added, @fscottfoti built me a network that covers the state of MD (an HDF5 file of about 50 MB). I want to create a similar network from scratch using the OSM loader, but when I try, e.g.:

from pandana.loaders import osm

# the result needs to be bound to `network` for the later calls to work
network = osm.network_from_bbox(37.8856, -79.4872, 39.7905, -74.9852,
                                network_type='walk', two_way=True)
lcn = network.low_connectivity_nodes(10000, 10, imp_name='distance')

network.save_hdf5('input/osmnetwork.h5', rm_nodes=lcn)

Python eats up all of the machine's RAM, then quits with a memory error. The same thing happens on an Amazon server with 80 GB of RAM. There's no way the network is actually that big. Is it possible that something weird is happening during the Overpass query?

Maybe something like this? http://stackoverflow.com/questions/16694907/how-to-download-large-file-in-python-with-requests-py
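
A minimal sketch of the streaming approach the linked answer describes, assuming the raw Overpass response is written straight to disk (the endpoint, query string, and filename here are placeholders for illustration, not the exact ones pandana uses):

import requests

# Placeholder Overpass endpoint and query.
url = 'http://overpass-api.de/api/interpreter'
query = '[out:json];...'

# stream=True keeps the response body out of memory; iter_content
# then copies it to disk in fixed-size chunks.
response = requests.post(url, data=query, stream=True)
response.raise_for_status()
with open('overpass_response.json', 'wb') as f:
    for chunk in response.iter_content(chunk_size=1024 * 1024):
        f.write(chunk)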

@jiffyclub
Member

The overpass request response is JSON so there's really no way to stream it. You have to have the entire document in order to parse it.

When I get a chance to work on this the first step will be to figure out exactly what's using all the memory, whether it's the query data or some subsequent step.
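
As a sketch of how that could be narrowed down, using the standard-library tracemalloc module (Python 3; on the Python 2.7 setup shown in the traceback below, memory_profiler would be the analogous tool, and run_suspect_step is a hypothetical stand-in for whichever stage is being measured):

import tracemalloc

tracemalloc.start()

# Run the stage under suspicion, e.g. the Overpass download or the
# DataFrame construction (hypothetical placeholder call).
result = run_suspect_step()

# Report the ten largest allocation sites after the stage completes.
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics('lineno')[:10]:
    print(stat)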

@fscottfoti
Contributor

BTW, I tried this out. First I tried a network covering the Bay Area: all nine counties, out roughly to Sacramento. It ran fine in only about 5 GB of memory and about 24 minutes. So that was great.

So I tried @knaaptime's query and did get a memory error on a 32 GB machine. The place it errored was very strange, though: in the pandas from_records call?

Traceback (most recent call last):
  File "go.py", line 3, in <module>
    network = osm.network_from_bbox(37.8856, -79.4872, 39.7905, -74.9852)
  File "/home/ubuntu/pandana/pandana/loaders/osm.py", line 312, in network_from_bbox
    lat_min, lng_min, lat_max, lng_max, network_type)
  File "/home/ubuntu/pandana/pandana/loaders/osm.py", line 202, in ways_in_bbox
    lat_min, lng_min, lat_max, lng_max, network_type=network_type)))
  File "/home/ubuntu/pandana/pandana/loaders/osm.py", line 178, in parse_network_osm_query
    pd.DataFrame.from_records(nodes, index='id'),
  File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 888, in from_records
    columns)
  File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 4808, in _arrays_to_mgr
    return create_block_manager_from_arrays(arrays, arr_names, axes)
  File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 3555, in create_block_manager_from_arrays
    blocks = form_blocks(arrays, names, axes)
  File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 3645, in form_blocks
    object_items, np.object_)
  File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 3677, in _simple_blockify
    values, placement = _stack_arrays(tuples, dtype)
  File "/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 3741, in _stack_arrays
    stacked = np.empty(shape, dtype=dtype)
MemoryError
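
For context, the failure here is in pandas consolidating the per-record arrays into blocks, which needs one more full-size allocation on top of the already-parsed records. A hedged sketch of a lower-memory alternative, assuming nodes is the parsed list of Overpass node dicts with 'id', 'lat', and 'lon' keys (an assumption about the parser's intermediate format):

import numpy as np
import pandas as pd

# Build each column as a typed array up front instead of handing
# from_records a list of dicts; this avoids the intermediate object
# arrays that pandas would otherwise have to re-stack into blocks.
n = len(nodes)
ids = np.fromiter((node['id'] for node in nodes), dtype=np.int64, count=n)
lats = np.fromiter((node['lat'] for node in nodes), dtype=np.float64, count=n)
lons = np.fromiter((node['lon'] for node in nodes), dtype=np.float64, count=n)

nodes_df = pd.DataFrame({'lat': lats, 'lon': lons},
                        index=pd.Index(ids, name='id'))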

@Eh2406

Eh2406 commented Mar 10, 2017

Note: the newish https://github.com/UDST/osmnet may be relevant here.

@sablanchard
Collaborator

Thanks @Eh2406, that is correct: the new OSMnet package fixes the issue with large bounding box queries: https://github.com/UDST/osmnet. The PR to replace the functions inside Pandana is still pending but will be merged soon, so in the meantime anyone can use OSMnet to extract the nodes and edges from OSM and then place them inside a Pandana network object. Will keep this issue open until the PR is merged and ready.
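
As a sketch of that interim workflow, assuming OSMnet's network_from_bbox returns node and edge tables with x/y, from/to, and distance columns (column names per the OSMnet docs; treat them as assumptions here):

import pandana
from osmnet.load import network_from_bbox

# Download and process the OSM street network with OSMnet instead of
# the built-in pandana loader.
nodes, edges = network_from_bbox(lat_min=37.8856, lng_min=-79.4872,
                                 lat_max=39.7905, lng_max=-74.9852,
                                 network_type='walk')

# Hand the resulting tables to a pandana network object.
network = pandana.Network(nodes['x'], nodes['y'],
                          edges['from'], edges['to'],
                          edges[['distance']])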

@knaaptime
Author

Thanks for the heads up. I noticed OSMnet when I saw UrbanAccess was released, and figured it would solve this issue.

I saw the PR get merged on Friday, so I ran the query again using OSMnet. It finished in about an hour on my current-generation MacBook Pro:

Downloaded OSM network data within bounding box from Overpass API in 40 request(s) and 907.23 seconds
657946 duplicate records removed. Took 160.33 seconds
Returning OSM data with 8,415,342 nodes and 774,600 ways...
Edge node pairs completed. Took 2,394.01 seconds
Returning processed graph with 986,809 nodes and 1,316,755 edges...
Completed OSM data download and Pandana node and edge table creation in 3,599.82 seconds

All looks good. Thanks for your great work.

sablanchard added a commit that referenced this issue Mar 14, 2017
@sablanchard
Collaborator

Fixed with #63
