Roadmap #62

Open
11 tasks
yeesian opened this issue Jan 21, 2015 · 16 comments

Comments

@yeesian
Collaborator

yeesian commented Jan 21, 2015

I've seen the milestones, and think it's worth laying out what I think is left, for present and future contributors, before we release (the ever elusive) 1.0? Feedback welcome!

  1. OSM Elements: I think we can take our cue from the OSM Concepts adopted by imposm-parser
    • Provide Support for Relations
    • Query API (for filtering of osm-elements/tags/etc - previous discussion here)
    • Data Structures -- my preference is towards reasonable and concrete data types, and to add abstractions only when necessary (some discussion here and here)
  2. Interfaces for interoperability with other Packages

Beyond 1.0, since this has always been an experimental kind of repository, I think it's worth having a look at our python neighbours for inspiration:

I think it makes sense for Buildings/Features/Highways to return DataFrames, with columns corresponding to tags/etc, rather than the existing Dicts that we use.

@kjordahl @jwass @fscottfoti

@garborg
Collaborator

garborg commented Jan 21, 2015

This looks great.

PBF as another possible format (it would be especially nice right now, with strings being costly, but that should be improved at least in 0.4)?

Speed limits.

@tedsteiner
Owner

I agree these look like good features to add. A couple comments on the features:

  • I would also like speed limits, and also elevation data (I used LLA format, but OSM doesn't provide the altitude). These would be really valuable, but they don't seem easily available.
  • I'm not really sure that I see the need for other visualizers. I see value in more detailed map rendering, but I'm not sure why we should duplicate capability.
  • I have thought about adding an interface to the Overpass API, but for my own use cases, using the XML files works better (I don't want my map pulls to change). That certainly doesn't mean we shouldn't have it, I just don't have a need for it, personally.
  • Interfacing to a real routing engine like OSRM sounds cool and valuable, but I think it should be in a separate add-on package, so that OpenStreetMap.jl doesn't require additional dependencies or compiling.
  • I haven't used DataFrames. I'm not opposed to the switch, but I'm also not sure of their value here.

In general, I would prefer for OpenStreetMap.jl to be an easy way to parse OSM data and perform basic tasks with it, to serve as a starting point for anyone wanting to work with OSM data and a central hub for OSM-related packages by defining common data representations. Then fancier routing could be performed with an additional package, or detailed map tile rendering with a separate package. But OpenStreetMap.jl will help you get your data loaded, converted into the coordinate system you prefer, cropped to a region and filtered, combined (e.g., add speed limits and elevations), allow you to perform basic capabilities like find the drive time regions or find a driving route, and give you a quick visualization of the results.

I think I have a slightly different view on versioning than others, but in my view, many of the features you mentioned could be released as versions 1.1, 1.2, etc, leading up to version 2.0. The only thing that I really think is worth waiting for on version 1.0 is making sure we have settled on our data formats and function interfaces for the core tasks. I'm not convinced we're there yet, but we're close. Also, I'm not sure whether it's considered to be in poor form to have version 1.0 when Julia hasn't even hit 1.0 yet. But I think that as soon as we hit version 1.0, we will be making a declaration about stability, and more people will be willing to use the package. Basically, I don't want Version 1.0 to be ever-elusive. I also think that once we get the core functionality, the package can probably be mostly "done," and most additional features will probably fit better in add-on packages.

@garborg
Collaborator

garborg commented Jan 21, 2015

I have to say, I'm not as convinced we're close to 1.0. I'm not saying the interfaces are bad, or that we have to wait until Julia and all the packages we rely on are 1.0 to go there ourselves. Just that:

  • it would be strange to go 1.0 before Julia or any of the packages we rely on go 1.0
    • we often return the data structures they create / make available, so the stability promise that we would not change any of those things without making a major version bump seems both unkeepable, and like an unacceptable burden on development of this and complementary packages
  • none of the most popular packages, packages with 1000 commits, packages with dozens of contributors, are at 1.0, or likely near it
  • from the last couple comments (and I'm in the same place) there's a lot in the air about what should end up in what packages even
  • I think the interfaces will likely benefit if they get a couple rounds of updates over time

My hope would be for the roadmap to be the goal and not the number. I don't think forcing the package to have multiple development branches, with new or split out packages having to keep in sync with the more obscure branches rather than release (and be hard to release themselves as a result), would be a good situation, especially with only a few developers.

The conversation around functionality and scope of packages seems great to me, and I think we should flesh out priorities and timelines, just perhaps without 1.0 presumed to fall so early in it. (I'm also not saying robustness and stability and full-featured-ness and improved APIs should be pushed off.)

@yeesian
Collaborator Author

yeesian commented Jan 21, 2015

Rather than editing the original comment, I think I'll just reply here, so that your replies still make sense to subsequent viewers --

Yeah, I agree with the both of you: let's frame the discussion as goals for subsequent "minor versions" (increment by 0.1), rather than 1.0, then? I think feature releases (that don't break/deprecate anything) should actually be released as patches (see the example by JuMP), which we haven't been following so far. Our previous milestones should really have been patches instead?


Regarding features:

  • we should include speed limits and elevation data then (are they provided in OSM though?)
  • I'm not familiar with the PBF format, but sure! There are also lots of other formats: WKT, GTFS. The emphasis on GeoJSON is because it seems to be the de facto format for web maps. I included shapefiles because I commonly work with them too, and Julia has an existing package for parsing them.
  • just thought I should mention:
    • it took me quite a while to figure out how to download the osm extracts that I'm interested in (especially for people who have not worked with OSM data before),
    • the interface to the Overpass API is a big deal for people who want the latest changes/features in OSM, and
    • we could perhaps provide convenience functions for people to fetch metro extracts from mapzen too (see their github repo)
  • regarding the use of DataFrames, I think their value is better understood through the examples by geopandas and pandana. We don't seem to use features/buildings much, so we're ambivalent either way. But for people doing urban studies/simulations, I think it makes sense to "query" (i.e. what we've been doing so far by providing keyword arguments and filtering) OSM data, and get results back in the form of a table (hence a DataFrame), for subsequent processing in the data-analysis pipeline.
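
To make the "query and get a table back" idea concrete, here is a minimal sketch. It is written in plain Python for brevity and is not OpenStreetMap.jl's API: the feature IDs, the tag values, and the helper names `to_table` and `query` are all hypothetical, and a real implementation would presumably use a DataFrame type rather than a list of rows.

```python
# Hypothetical feature records: an ID mapped to free-form OSM tags.
# The IDs and tag values are invented for illustration.
features = {
    101: {"building": "yes", "name": "Library", "levels": "3"},
    102: {"highway": "residential", "maxspeed": "30"},
    103: {"building": "school", "name": "North High"},
}


def to_table(features):
    """Flatten {id: tags} into rows sharing one column set (missing tags become None)."""
    columns = sorted({key for tags in features.values() for key in tags})
    rows = [
        {"id": fid, **{col: tags.get(col) for col in columns}}
        for fid, tags in sorted(features.items())
    ]
    return columns, rows


def query(rows, **conditions):
    """Keep only the rows whose columns equal every requested value."""
    return [r for r in rows if all(r.get(k) == v for k, v in conditions.items())]


columns, rows = to_table(features)

# Tags become columns, so "all buildings" is a column test rather than
# per-object dictionary logic.
buildings = [r for r in rows if r["building"] is not None]
residential = query(rows, highway="residential")
```

The point of the sketch is only that once tags are columns, filtering and handing results to later analysis steps stops depending on the shape of each individual Dict.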

All the stuff on "interoperability with other Packages" is not a priority (to me) at this point, so they might seem like over-engineering at this point. I'll try to explain the motivation for them, using plotting/visualizations as a running example:

I like the default styles we have, but it's not clear to me how to conditionally plot objects (buildings/highways) in a modular/composable way, apart from

  • "removing those objects and re-plotting" (which btw might be made easier if we work with DataFrames, rather than Dicts),
  • "write the logic of selecting the objects you want to plot, and then plotting them (which is what we do right now, but would get more and more complicated when we deal with more osm attributes/objects)" or
  • "export those objects (since you have the data right?), and plot it yourself (and lose all the nice styling defaults in OSM)"

I had thought of re-writing it in Compose/Gadfly (as an exercise), and there are immediate benefits, like the ability to export to a lot more formats:

"PNG, Postscript, PDF, SVG. The SVG backend uses embedded javascript, powered by Snap.svg to add interactivity like panning, zooming, and toggling"

and would be helpful for web mappers. Which made me wonder about maintaining support for Winston, and the possibility of allowing for other plotting engines (if we want to support both Gadfly and Winston, might as well get it right the first time?)

We've gone through the same experience with xml parsing, which is why I think it's worth bringing it up again. The immediate way forward would just be to be a lot more careful about reducing the internal coupling of the functions we write with the libraries that we use (which wasn't the case with XML parsing, and isn't the case with plotting). That way, if we move towards a model of allowing extensions to be built upon OSM, it'll become easier to define abstract interfaces for the various extensions to implement.
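
As a rough illustration of that kind of decoupling, here is a sketch in Python (not the package's actual design). Everything in it is an assumption made for the example: the `Backend` interface, the `RecordingBackend` stand-in, the `DEFAULT_STYLES` values, and the `select` hook. A real backend would wrap Winston, Gadfly, or similar.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical minimal interface a plotting backend would implement."""

    @abstractmethod
    def draw_line(self, points, style):
        ...

    @abstractmethod
    def draw_polygon(self, points, style):
        ...


class RecordingBackend(Backend):
    """Stand-in backend that only records calls instead of drawing."""

    def __init__(self):
        self.calls = []

    def draw_line(self, points, style):
        self.calls.append(("line", tuple(points), style))

    def draw_polygon(self, points, style):
        self.calls.append(("polygon", tuple(points), style))


# Styling defaults stay with the map package, independent of any backend.
DEFAULT_STYLES = {
    "highway": {"color": "gray", "width": 2},
    "building": {"color": "tan", "width": 1},
}


def plot_map(backend, highways, buildings, select=lambda kind, obj: True):
    """Draw the selected objects through whichever backend is passed in."""
    for way in highways:
        if select("highway", way):
            backend.draw_line(way["points"], DEFAULT_STYLES["highway"])
    for building in buildings:
        if select("building", building):
            backend.draw_polygon(building["points"], DEFAULT_STYLES["building"])


# Usage: plot only the buildings, keeping the default styling.
backend = RecordingBackend()
highways = [{"points": [(0, 0), (1, 1)], "tags": {"highway": "residential"}}]
buildings = [{"points": [(0, 0), (0, 1), (1, 1)], "tags": {"building": "yes"}}]
plot_map(backend, highways, buildings, select=lambda kind, obj: kind == "building")
```

With this shape, conditional plotting is a predicate passed to `plot_map`, and supporting a second plotting engine means writing another `Backend` subclass rather than touching the selection or styling logic.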

@garborg
Collaborator

garborg commented Jan 21, 2015

@yeesian I agree about what you say about decoupling, and what we should think hard about, but I think rather than making interoperability not a priority, it means it's a bigger deal.

Being able to factor things out into packages at will gets more important, as does making sure experimental packages can be based off our master branch.

For example, I think we're on the same page about this, but the last PR would have had to wait until 2.0 if we were at 1.0 right now, in a branch that creates double work to keep up with bugfixes and features on master, and other functionality developers wanted to be based on the current/future state of the ecosystem rather than the past, would have to interoperate with an unpublished, untagged branch of our package. It also would have been a pain if we weren't willing to deprecate APIs quickly in the name of coherent building blocks.

More relevant, various functionality for updating, subsetting, analyzing, and representing geospatial data will likely move across package boundaries as we experiment with the roadmap items you and Ted mentioned, and updating the interfaces at function and package boundaries as new use cases come up seems critical.

Anyway, that doesn't counter anything you said -- I just wanted to bring it up in case it spurs any discussion about what belongs in and out of this package, and how to get there with the least friction given we probably don't know yet. Specifically because Ted mentioned not everything belongs inside this package, and you didn't seem to be thinking about complementary packages, though your interfaces approach seems right for it.

Maybe there will be a lot of things too tightly coupled to OSM's specific data format to move them out, and what we'll be moving out are more of the building blocks, like how to draw generic points and ways and features to any backend in a composable way, with an OSM wrapper staying in the package.

P.S. Thanks for linking so heavily in recent issues -- I'm learning about a lot of new projects, and remembering discussions I had forgotten about, thanks to you.

@garborg
Collaborator

garborg commented Jan 21, 2015

Oh, agreed on releasing more patchlevel versions.

Speed limits exist for a significant minority of roads. I think Ted knows more about elevation, but sounds like it's a very small minority, and people integrate other data sources when they need it?

@tedsteiner
Owner

Versioning

First, I want to say I agree with Sean on versioning. And I'm uncomfortable with going to version 1.0 before Julia does. However, just because we don't have 1000 commits doesn't mean, to me, we are versioning too fast. It just means that the functionality we're providing probably isn't as complex. And I think the interfaces need time to settle.

I think that the versioning so far has been accurate, and I disagree that the releases so far should have been patches. I've been going off the Semantic Versioning Guidelines, which was suggested in my original Metadata pull request. It looks like JuMP is following this, as well, but maybe we should also have a news page at some point. I agree we should frame the milestones in terms of minor versions. I think we should push out patches when necessary, but never as a milestone. I would love to nail down the core API and release version 1.0, but I think that's a way off. But in my mind, the only thing required of version 1.0 is stabilizing the API, not any additional features.

@garborg I see you posted as I was writing. But I definitely get what you mean about the multiple branches, and I don't want to have to keep up with that. All I really meant was that if we know for certain we can lock down the API, then I think we're ready for version 1.0. But we're a ways away from knowing that yet. But if Julia, LibExpat, Winston, and Graphs all bumped to version 1.0 tomorrow, I'd suggest we focus more on API stabilization than we are currently and work towards version 1.0.

Additional Features

  • Speed limits are only occasionally available in OSM. We would need to get this from another database if we were going to use them for routing, but I haven't looked into it very much. There are global elevation databases, but they are pretty big files so I ended up not dealing with them. I don't know of a way to query a database to give us the elevation profile of just a Bounds object, for example. As a navigation person, I'll admit it drives me nuts that elevation changes aren't incorporated when we compute distances between nodes.
  • I certainly agree that integration with Overpass would be a valuable inclusion. I had a ton of difficulty learning that the first time around, but I tried to include simple directions in our own documentation. Did you happen to see those, and did they help?
  • DataFrames do sound nice, thank you for explaining.

I like that Julia packages tend to be focused, and the awesome repository system makes it easy to have dependencies. I think we could basically have "modules" that exist as separate packages for specific tasks, and link to them all from the OpenStreetMap.jl main page, or Geodesy.jl. I think that anything requiring additional source dependencies should be modularized, so the average user doesn't have to worry about compilation issues, etc.

Map Plotting

  • Winston can export in JPEG (barf), PNG, PDF, and EPS. I use EPS for everything. I'm not sure if it can do SVG or not.
  • Plotting is slow, which I'm not crazy about, but I'm hoping Winston continues to speed up as it matures.
  • I really wish we could pan and zoom, but not enough so for me to have done anything about it. :)
  • I have always seen the map plotting capabilities as just a convenience, since visualizing your data is so important in really any field. I'm all for more detailed map viewing, but I question at what point it should move into its own package.
  • I think what you say about abstracting the packages is a great point. If there's an easy way to do this, then I think it's probably worthwhile in the long term.

Other

To put my point of view in a little more context, I'm very busy right now trying to hopefully graduate sometime this year, and hopefully sooner rather than later. Right now this package does everything that I personally need it to do for my work, so while I focus on writing a thesis I probably won't be adding any additional "extraneous" features (from the point of view of my own work). I obviously am quite attached to the package and will keep working on it and trying to make it the best that it can be, especially after graduation, but I also think that there will be a limit to how many features need to be added by us to make this package worthwhile to the larger community, and I'm not all that interested in surpassing that limit.

I'm absolutely thrilled that you guys have been continuing to add features, and you shouldn't worry about breaking compatibility for me, etc., since I can always just pin a specific version. I don't say it enough, but thanks for all your help. All your code speed ups and improvements have really sped up my work for me and helped me learn Julia much better than I otherwise would have, and have also turned this package into something that's useful for a much wider audience.

@garborg
Collaborator

garborg commented Jan 21, 2015

That all sounds good.

Versioning:
Pre 1.0, semantic versioning leaves the meaning of minor and patch levels up to the developer. I see some Julia packages tagging patchlevel versions for minor increases in functionality, bug fixes, maybe minor breaks in compatibility, etc., and saving the minor versions for when enough of the major items have been ticked off, but it's ad hoc and certainly not a rule, and we don't necessarily have to tag that often because our user group is up to date with current development on master.

We could probably tag more patches after bugfixes to be a little friendlier to outsiders, or to give us versions to pin that are without previous bugs and without later compatibility breaks? For the latter, manual git checkouts work, too, and there's DeclarativePackages.jl.

You're welcome for the help -- thanks, first for releasing the package and for being so open to contributions! It has been a great intro to geospatial work for me, and contributing has been helping me become a better programmer, too.

@tedsteiner
Owner

@yeesian @garborg

I haven't made any changes to this package in a while, but we have two big changes that still haven't made it into a release: XML streaming and moving the coordinate systems into Geodesy.jl. I've been quiet lately because I've been working on my thesis (and I also just don't have anything else I needed to add), but I think it would be good to get those changes into a release for others to use (if anyone else is using the package).

For my thesis, I'd like to give the release number that I used to generate my results. Does anyone have any objections to me pushing a new release or any changes that are about to be committed? I had wanted to wait until we figured out the Travis testing issues, but those don't seem to have worked themselves out in the last couple of months like I had hoped.

@yeesian
Collaborator Author

yeesian commented Apr 27, 2015

I'm okay with that! I have friends from my office who might be using this package for their own work as well, so it'll be great to have a release number.

@garborg
Collaborator

garborg commented Apr 27, 2015

@tedsteiner I am, too. If you're not in a hurry, I can put the package through the motions on 0.4 tonight, and try to clear up any compatibility issues (or give an ETA), but don't let that hold you up if you want to get something out the door.

@tedsteiner
Owner

Nope, I'm not in a big hurry, I'd just like to push out a release sometime in the next week or so. Thanks, guys!

@garborg
Collaborator

garborg commented Apr 28, 2015

No problem. I ran into an issue running the package on 0.4 (JuliaIO/LibExpat.jl#30), but it just requires a naming decision, so it shouldn't take long to resolve.

@yeesian
Collaborator Author

yeesian commented May 11, 2015

Perhaps we should push for the release soon, in light of #70?

@tedsteiner
Owner

Yes, I definitely agree. I should be able to get to it later this week (my thesis defense is this afternoon, so after today I should have more time).

@garborg Do you happen to know if LibExpat's naming decision has been resolved yet?

@garborg
Collaborator

garborg commented May 11, 2015

Nothing yet, just pinged Amit.

3 participants