OSM XML Support #877

Open
hallahan opened this Issue Jul 27, 2016 · 10 comments

Contributor

hallahan commented Jul 27, 2016

Because Mapzen currently supports vector tiles in 3 formats: TopoJSON, GeoJSON and Mapbox vector tiles, it is relatively straightforward to begin adding support for additional data sources. In working on OpenMapKit, I have spent some time thinking about the internals involved with editing and working with OpenStreetMap data. We have been searching long and hard for a new map renderer, and my desire is to do it in a way that OSM data really is a first-class citizen in the mapping library. My thought is that adding OSM XML support is the first step toward creating an OSM editor with Tangram.

I like that Tangram's data pipeline revolves around the notion of a tile. Although JOSM does not think that way, iD actually does. Individual GET requests are made to the OSM Editing API (0.6) for the bbox of a given tile. It turns out we can do the same thing in Tangram ES.
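A minimal sketch of the tile-to-bbox request described above, assuming standard Web Mercator (slippy map) tile addressing and the public `api.openstreetmap.org` host. The function names are hypothetical and are not taken from the branch.

```cpp
#include <cassert>
#include <cmath>
#include <iomanip>
#include <sstream>
#include <string>

// Geographic bounding box in degrees (WGS84).
struct BBox {
    double left, bottom, right, top;
};

// Longitude of the western edge of tile column x at zoom z.
double tileToLon(int x, int z) {
    return x / std::pow(2.0, z) * 360.0 - 180.0;
}

// Latitude of the northern edge of tile row y at zoom z (Web Mercator).
double tileToLat(int y, int z) {
    const double pi = 3.14159265358979323846;
    const double n = pi - 2.0 * pi * y / std::pow(2.0, z);
    return 180.0 / pi * std::atan(std::sinh(n));
}

// Bounding box of tile (x, y) at zoom z; row y + 1 gives the southern edge.
BBox tileBounds(int x, int y, int z) {
    assert(x >= 0 && y >= 0 && z >= 0);
    return { tileToLon(x, z), tileToLat(y + 1, z),
             tileToLon(x + 1, z), tileToLat(y, z) };
}

// Builds an OSM API 0.6 "map" request URL (bbox=left,bottom,right,top).
std::string osmMapUrl(int x, int y, int z) {
    const BBox b = tileBounds(x, y, z);
    std::ostringstream url;
    url << std::fixed << std::setprecision(7)
        << "https://api.openstreetmap.org/api/0.6/map?bbox="
        << b.left << ',' << b.bottom << ',' << b.right << ',' << b.top;
    return url.str();
}
```

The API rejects bounding boxes that are too large, so a scheme like this is only practical at high zoom levels.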

I'm working on a proof-of-concept that takes the approach of directly requesting OSM XML from the OSM Editing API. This XML is then parsed into an OSM data model, similar to what you will see in JOSM and OpenMapKit. We have an in-memory data set for a given tile with access to the OSM elements themselves (nodes, ways, and relations). In addition, we can create a Tangram::Layer from the dataset, allowing tiles to be rendered in a similar manner to vector tiles. We loop through the standalone nodes for points, open ways for lines, and closed ways for polygons.
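The point/line/polygon split could be sketched roughly as follows. The `Node` and `Way` structs are simplified, hypothetical stand-ins rather than the types in the branch, and "standalone" is assumed here to mean "not referenced by any way". Real OSM data also needs tag checks, since not every closed way is an area.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical, stripped-down OSM elements (not the types from the branch).
struct Node { std::int64_t id; double lon, lat; };
struct Way  { std::int64_t id; std::vector<std::int64_t> nodeRefs; };

enum class GeometryKind { Line, Polygon };

// A way is treated as closed when its first and last node references match
// and it has at least four references (three corners plus the closing one).
GeometryKind classifyWay(const Way& way) {
    assert(!way.nodeRefs.empty());
    const bool closed = way.nodeRefs.size() >= 4 &&
                        way.nodeRefs.front() == way.nodeRefs.back();
    return closed ? GeometryKind::Polygon : GeometryKind::Line;
}

// Nodes that no way references; these would be rendered as points.
std::vector<Node> standaloneNodes(const std::vector<Node>& nodes,
                                  const std::vector<Way>& ways) {
    std::set<std::int64_t> referenced;
    for (const Way& way : ways) {
        referenced.insert(way.nodeRefs.begin(), way.nodeRefs.end());
    }
    std::vector<Node> points;
    for (const Node& node : nodes) {
        if (referenced.count(node.id) == 0) {
            points.push_back(node);
        }
    }
    return points;
}
```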

Right now I'm using rapidxml to parse the XML, mainly because you are using rapidjson for your JSON, and it is a header-only library that is easy to include. It is a DOM parser only, and I'm wondering if you'd prefer that we switch to something more mainstream, like libxml2 or expat? The current lib is probably fine on a tile-by-tile basis, but if we are trying to populate a DB later on with more data, we might want a streaming parser.

This direct OSM XML support makes sense online, but the real goal for me is to make this work from an offline data store (SQLite). With that in mind, I'd like to make the MemoryDataSet a subclass of an abstract DataSet class. That way we can later create a SQLiteDataSet class that queries a database for OSM objects, as an alternative to hitting an online REST endpoint. We can break that off into a different issue when it's time.
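A rough sketch of that class hierarchy, assuming a lookup-by-id interface. The method names (`findNode`, `nodeCount`, `addNode`) are hypothetical and only illustrate the shape. A SQLiteDataSet would derive from the same base and answer the same calls with database queries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

struct Node { std::int64_t id; double lon, lat; };

// Abstract interface: callers do not need to know where the elements live.
class DataSet {
public:
    virtual ~DataSet() = default;
    // Returns nullptr when no node with that id is known.
    virtual const Node* findNode(std::int64_t id) const = 0;
    virtual std::size_t nodeCount() const = 0;
};

// In-memory implementation, e.g. filled from one parsed OSM XML tile.
class MemoryDataSet : public DataSet {
public:
    void addNode(const Node& node) {
        assert(node.id != 0);  // illustrative sanity check only
        m_nodes[node.id] = node;
    }

    const Node* findNode(std::int64_t id) const override {
        auto it = m_nodes.find(id);
        if (it == m_nodes.end()) {
            return nullptr;
        }
        return &it->second;
    }

    std::size_t nodeCount() const override { return m_nodes.size(); }

private:
    std::map<std::int64_t, Node> m_nodes;
};
```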

I've got a branch going that renders OSM XML, though some of the tiles don't come through yet. The colored buildings are using the data: { source: osmApi }. In the scene.yaml, all of the OSM XML data is being treated as a single layer, osmXml, from which tags (properties) are being filtered and styled.

[Image: screenshot 2016-07-10 20 01 34]

https://github.com/hallahan/tangram-es/tree/OSM_XML

The beginnings of an OSM model:

https://github.com/hallahan/tangram-es/tree/OSM_XML/core/src/osm

We'll probably want to start a new branch that will make its way into being a pull request. Any input on how to make that happen is appreciated!

cc/ @bcamper @tallytalwar

matteblair Jul 27, 2016

Member

This is super cool :)

I'm looking through your changes now. I think we have a few possible directions we could go with this, and I'll have more to say on this tomorrow.

nvkelso Jul 28, 2016

Member

Wow! Great work :)

hallahan Jul 28, 2016

Contributor

After a bit of debugging, I've found that the "missing tiles" are basically the rapidxml parser failing. Looking at the char* that is passed to the parser, the OSM XML looks totally valid. It chokes, saying it is missing an expected <. I've used libxml2 in the past, and it seems to be one of the most widely used XML libraries. It looks like CGImap uses that lib as well.

I'm new to CMake. How hard would it be to add libxml2 as a dependency?

matteblair Aug 15, 2016

Member

Hey again! Sorry for the delay on getting back to this. Thoughts with regards to incorporating an OSM XML data source:

  • This is a pretty awesome feature for a somewhat specific use case. The main use scenario that I imagine for an XML data source is an editor application (like the one you're building); for other uses, it seems that more rendering-optimized formats would be preferable. Considering also that the XML data model requires quite a bit more code than the other supported formats, it seems to me that we should not require every library user to build and link the code for XML data source support (the way that MVT and GeoJSON are required, for example).
  • The tangram team constantly strives to make the feature sets of tangram-es and tangram-js as close as possible. In general, a scene file that can load on one platform can also load on the other and produce the same results. Wherever this isn't true, we have plans to bring the engines to parity. To support an XML data format in the same way as, say, GeoJSON, we would want to also commit to supporting it in the JS engine - and I don't think that's on a roadmap right now.

That said, we'd still love to support this in some way! Some options:

  1. We can add the OSM XML code to the tangram-es core library and build it conditionally based on a compile-time flag (e.g. #define TANGRAM_BUILD_OSM_XML).
  2. We can maintain a separate project containing just the DataSource and data model files. Since the tangram-es C++ interface allows adding any data source that conforms to the base class, a user would only need to compile both projects and add an OSM_XML data source at run time.

There are tradeoffs to both of these and I'm sure these aren't the only paths we could take, but either seems plausible to me. Thoughts?
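Option 1 could look roughly like the sketch below. The `TANGRAM_BUILD_OSM_XML` flag is the one named above. The `supportedFormats` and `isSupported` helpers are hypothetical and only illustrate how the OSM XML code would drop out of builds that do not define the flag.

```cpp
#include <cassert>
#include <string>
#include <vector>

// The flag would normally come from the build system, for example a
// -DTANGRAM_BUILD_OSM_XML compiler option. Without it, the OSM XML
// entry is never compiled in.

// Names of the data source formats compiled into this build.
std::vector<std::string> supportedFormats() {
    std::vector<std::string> formats = {"GeoJSON", "TopoJSON", "MVT"};
#ifdef TANGRAM_BUILD_OSM_XML
    formats.push_back("OSM_XML");
#endif
    return formats;
}

// True when the named format is available in this build.
bool isSupported(const std::string& format) {
    assert(!format.empty());
    for (const std::string& candidate : supportedFormats()) {
        if (candidate == format) {
            return true;
        }
    }
    return false;
}
```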

hallahan Aug 15, 2016

Contributor

I agree that it makes sense to make the OSM support optional. In addition to the OSM editor use case, support for OSM XML will be useful for browsing data that is guaranteed to be the original source. For example, OpenStreetMap.org has a data view where you can see Leaflet-rendered vectors of the data on top of the map. OSM elements can be selected, and tags can be seen in a side view. If a future phase of OpenStreetMap.org wanted to adopt Tangram as the renderer, you could use this functionality to provide seamless rendering and introspection on fresh data--something that is currently lacking.

I totally agree that parity with tangram-js is a great idea. ES makes the most sense for a mobile editor, but JS makes a lot of sense for casually viewing data without requiring an app. For example, it would be quite useful to have a map style that renders based on the user attribute of OSM elements. You could then see who has modified what in a mapathon. It could also be useful for Missing Maps leaderboards, where the cartography shows which data the top users have edited. Also, the Overpass API is a great source of OSM XML that can be derived from super intricate queries.

Because Tangram is well suited to gain support for any type of spatial data format, I think this could be a huge technical edge over other modern map renderers out there. Taking a step back, how would we do this if we also wanted to add Shapefile support? GPX, KML, Esri Geodatabase, CSV, etc.? How would we do this if we want an individual repo for a specific data format?

Some sort of data source plugin architecture?

matteblair Aug 16, 2016

Member

Those are some really good ideas for OSM XML rendering that hadn't even occurred to me - well noted!

A plugin-like architecture does seem like a natural path. The DataSource abstract class is a minimal prototype of what a "plugin" could be. Fully specifying a plugin interface and maintaining separate repos for plugins would be the cost, but the benefits seem pretty great: tangram-es can keep a streamlined set of features for apps that just need efficient rendering and developers are free to implement or modify data source plugins for their specific needs.
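One possible shape for such a plugin interface is a registry that maps a format name to a factory. This is a sketch only: `TileSource`, `OsmXmlSource`, and `SourceRegistry` are hypothetical names, and `TileSource` is a minimal stand-in, not Tangram's actual DataSource class.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Minimal stand-in for a data source base class (hypothetical).
class TileSource {
public:
    virtual ~TileSource() = default;
    virtual std::string formatName() const = 0;
};

// Example plugin that would live in its own repository.
class OsmXmlSource : public TileSource {
public:
    std::string formatName() const override { return "OSM_XML"; }
};

// Maps a format name to a factory that creates a source of that format.
class SourceRegistry {
public:
    using Factory = std::function<std::unique_ptr<TileSource>()>;

    void registerFormat(const std::string& name, Factory factory) {
        assert(factory != nullptr);
        m_factories[name] = std::move(factory);
    }

    // Returns nullptr for formats that no plugin has registered.
    std::unique_ptr<TileSource> create(const std::string& name) const {
        auto it = m_factories.find(name);
        if (it == m_factories.end()) {
            return nullptr;
        }
        return it->second();
    }

private:
    std::map<std::string, Factory> m_factories;
};
```

With something like this, the core library only needs to know about the registry, and each format registers itself at start-up.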

I'll look around for examples of this sort of architecture and see what might work for us.

hallahan Aug 25, 2016

Contributor

I made some headway with OSM XML support today. I ended up finding a better XML parser, pugixml, that was easy to include in the project. Not only do the benchmarks look good, but the docs are fantastic, and the error reporting is good.

Speaking of which, I've figured out why sometimes the XML doesn't parse...

The char* buffer isn't always correct. Maybe the EOF delimiter isn't quite in the right place?

For example, here is the contents of task.rawTileData->data() that went into OSM::XmlParser:

https://gist.github.com/hallahan/c6a0a1f14fb7bb900f8232bc462c7a58

The trailing contents vary. Since I'm developing on my MacBook, seeing that odd <plist... XML suggests it's from the memory of my tangram process on my laptop.

</osm>

        <key>HSTS Host</key>
        <true/>
        <key>Include Subdomains</key>
        <true/>
    </dict>
    <key>za.search.yahoo.com</key>
    <dict>
...

Does anyone have any insight as to why task.rawTileData->data() may be the wrong size?

https://github.com/hallahan/tangram-es/blob/57cab4c6c1f7a195a02370f90332bf6eb6b6584f/core/src/data/osmXmlSource.cpp#L32

hallahan Aug 25, 2016

Contributor

I'm noticing that DownloadTileTask has this public member:

// Raw tile data that will be processed by DataSource.
std::shared_ptr<std::vector<char>> rawTileData;

We're seeing a .data() from the different data sources, and that seems to give a pointer to the underlying vector's array. I wonder... Is the size of this always correct?

http://en.cppreference.com/w/cpp/container/vector/data

Why are we doing this instead of having something like the following?

std::shared_ptr<std::string> rawTileData;

cc/ @hjanetzek

matteblair Aug 26, 2016

Member

Nice work! I may be able to shed some light on this issue with buffer length. The reason we store "raw data" in a vector of bytes instead of a string is that this must also support binary formats like MVT, which can contain null characters in the body and therefore can't be treated as strings. Consequently, rawTileData is not guaranteed to be null-terminated, so if you are using it as a string you should also use the length of the vector to limit the number of bytes read.
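A small sketch of reading the buffer with an explicit length, assuming only that the raw data is a `std::vector<char>` as described above. The helper names are hypothetical. Passing `data()` alone to an API that expects a C string would keep reading until it happens to hit a null byte, which matches the stray memory contents seen after `</osm>`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Copies exactly raw.size() bytes. Never reads past the end of the vector,
// and embedded '\0' bytes (as in binary formats) are preserved.
std::string toBoundedString(const std::vector<char>& raw) {
    return std::string(raw.data(), raw.size());
}

// Checks that the payload really ends with the given suffix, e.g. "</osm>".
bool endsWith(const std::vector<char>& raw, const std::string& suffix) {
    assert(!suffix.empty());
    if (raw.size() < suffix.size()) {
        return false;
    }
    const auto offset = static_cast<std::ptrdiff_t>(suffix.size());
    return std::equal(suffix.begin(), suffix.end(), raw.end() - offset);
}
```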

hallahan Aug 26, 2016

Contributor

Ah, I see, makes sense. Thanks!

Explicitly making a string with the size of the vector included fixes the problem.

xmlParser.parse(std::string(rawTileData->data(), rawTileData->size()));

Though, I bet I can do this without having to construct a string using a different load function in pugixml...

hallahan@38630e9
