Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance Issues #72

Open
yeesian opened this issue Sep 24, 2015 · 12 comments
Open

Performance Issues #72

yeesian opened this issue Sep 24, 2015 · 12 comments

Comments

@yeesian
Copy link
Collaborator

yeesian commented Sep 24, 2015

I've found the streaming parser to be too slow, mostly because it's maintaining a complicated state machine, and generating lots of objects during the parsing.

I think it's better to just slurp in the data, first, before dealing with the logic within julia. Here's an example:

With this package:

julia> MAP_FILENAME = "tech_square.osm"
"tech_square.osm"

julia> @time nodes, hwys, builds, feats = getOSMData(MAP_FILENAME);
  1.001549 seconds (340.25 k allocations: 1.888 GB, 8.55% gc time)

Under my proposal (implemented here):

julia> @time nodes1 = parseNodes(MAP_FILENAME);
  0.032125 seconds (163.66 k allocations: 10.621 MB, 17.01% gc time)

julia> @time ways1 = parseWays(MAP_FILENAME);
  0.022908 seconds (141.77 k allocations: 8.979 MB)

The differences become significant when processing the following osm file in julia v0.4:

Before:

julia> @time nodes, hwys, builds, feats = getOSMData(MAP_FILENAME);
  2161.987085 seconds (570.21 M allocations: 3.354 TB, 13.85% gc time)

After:

julia> @time nodes = OpenStreetMapParser.parseNodes(MAP_FILENAME);
  49.350339 seconds (244.94 M allocations: 14.512 GB, 9.32% gc time)

julia> @time ways = OpenStreetMapParser.parseWays(MAP_FILENAME);
  50.805148 seconds (209.75 M allocations: 11.937 GB, 23.16% gc time)
@yeesian
Copy link
Collaborator Author

yeesian commented Sep 24, 2015

I don't really know how to deal with it in this package though. I'm also leaning towards many other major changes (such as replacing Graphs with LightGraphs, modeling OSM elements more faithfully, and interoperability with DataFrames). Would you prefer for me to get the other package up to speed with most of the functionality here before dropping it in as a replacement, or to maintain a separate fork going forward?

@garborg
Copy link
Collaborator

garborg commented Sep 24, 2015

That all sounds great to me -- I'd guess with the major switches, it could
be easier to implement and benchmark them in the pared down package at
first.

On Thursday, September 24, 2015, Yeesian Ng notifications@github.com
wrote:

I don't really know how to deal with it in this package though. I'm also
leaning towards many other major changes (such as replacing Graphs with
LightGraphs, modeling OSM elements more faithfully, and interoperability
with DataFrames). Would you be interested for me to get the other package
https://github.com/yeesian/OpenStreetMapParser.jl up to speed with most
of the functionality here before dropping it in as a replacement, or to
maintain a separate fork going forward?


Reply to this email directly or view it on GitHub
#72 (comment)
.

@tedsteiner
Copy link
Owner

I feel like I missed something big here. Are you planning to replace this package entirely?

If there is a faster parser available, why not just incorporate it here instead? I can't see anyone arguing against a faster parser that just reads the nodes or ways. The same goes for Graphs vs. LightGraphs -- I think everyone's only preference would be towards something that works well.

I guess my question is, why do you need to fork or start fresh?

@garborg
Copy link
Collaborator

garborg commented Sep 24, 2015

I'm on the road, so I haven't followed the link, but it sounded to me like
a set of refactorings that would be easier to do (and Api easier to
experiment with) with a small core, and this is a bit of a sprawling
package. I assumed that would be temporary. Aside: the one item I still
wouldn't be in favor of is having data frames interop in the core
openstreetmap package.

On Thursday, September 24, 2015, Ted Steiner notifications@github.com
wrote:

I feel like I missed something big here. Are you planning to replace this
package entirely?

If there is a faster parser available, why not just incorporate it here
instead? I can't see anyone arguing against a faster parser that just reads
the nodes or ways. The same goes for Graphs vs. LightGraphs -- I think
everyone's only preference would be towards something that works well.

I guess my question is, why do you need to fork or start fresh?


Reply to this email directly or view it on GitHub
#72 (comment)
.

@yeesian
Copy link
Collaborator Author

yeesian commented Sep 24, 2015

Sorry I was in a seminar.

No, I don't have any intentions of replacing this package by registering a different package in METADATA. Any changes I'd like to see will still come in incremental PRs to this package.

But I did need to start afresh, for the reasons sean mentioned: to experiment with a smaller core, and see how changes (that are beyond the scope of a PR) would work out in practice. It will be really draining to do multiple major-but-incremental PRs (I have tried) that change both the parsers and the object models simultaneously, while (i) making sure everything else works, (ii) demonstrating why the changes might be necessary, and (iii) maintaining a tidy commit history.

@yeesian
Copy link
Collaborator Author

yeesian commented Sep 24, 2015

Aside: the one item I still wouldn't be in favor of is having data frames interop in the core openstreetmap package.

There won't be any dataframes operations in the core openstreetmap package. The OSM package/objects should be designed only to reflect the nature of OSM objects, rather than providing sophisticated data operations. The allusion to dataframes is only to allow users to export/convert OSM objects to dataframes (for functionality that OSM will not provide) if they so desire.

@garborg
Copy link
Collaborator

garborg commented Sep 24, 2015

Oh, I misread. That makes sense.

On Thursday, September 24, 2015, Yeesian Ng notifications@github.com
wrote:

Aside: the one item I still wouldn't be in favor of is having data frames
interop in the core openstreetmap package.

There wouldn't be any dataframes operations in the core openstreetmap
package. The OSM package/objects should be designed only to reflect the
nature of OSM objects, rather than providing sophisticated data operations.
The allusion to dataframes is to allow users to export/convert OSM objects
to dataframes (for functionality that OSM will not provide) if they so
desire.


Reply to this email directly or view it on GitHub
#72 (comment)
.

@tedsteiner
Copy link
Owner

Cool, I'm all for it. Thanks for explaining, @yeesian! I was just a little thrown off by some of the text in the scope section your readme. If there's anything I can do to help make your life easier just let me know!

@garborg
Copy link
Collaborator

garborg commented Sep 25, 2015

I'm back in front of a computer, and after reading @yeesian's readme, I agree that the package structure he outlines is what we should have in the long term.

Since I think @tedsteiner wants this package to remain a one-stop shop, which makes sense to have, I think that, once the design (apis and boundaries) settles (long term), making those individual packages offshoots / dependencies of this package, and registering those individual packages (for those that aren't served by always having graphing + plotting + ... dependencies in production) is the right goal.

@tedsteiner
Copy link
Owner

Sounds good to me.

@sbromberger
Copy link

Let me know how I can help with LightGraphs. Specifically, if there's functionality you need that doesn't exist, or you have questions relating to existing functionality, please feel free to open up an issue there so we can discuss the best way to proceed.

@yeesian
Copy link
Collaborator Author

yeesian commented Nov 2, 2015

So I've got stuff working (with LightGraphs) and running on OpenStreetMapParser, and am using that for my research at the moment. I'm really busy at the moment, and won't have time to start doing PRs to this package for awhile though

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants