Enforce limits for tags and relation members #174

Open · wants to merge 3 commits into base: master

Conversation
mmd-osm (Collaborator) commented Feb 17, 2019

Corresponds to openstreetmap/openstreetmap-website#1711

This pull request introduces fairly liberal limits for relation members and tags per object. Without those limits, relations with millions of members could be uploaded, which is clearly something we don't want.

const int RELATION_MAX_MEMBERS = 100000;
const int OSM_ELEMENT_MAX_TAGS = 5000;
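
For illustration, here is a minimal sketch of how such limits could be enforced while building a relation from an uploaded payload. This is not the actual cgimap code: Relation, RelationMember, and the error texts are hypothetical names used only for the example.

```cpp
// Sketch only: rejects oversized member lists / tag sets at construction
// time. All type names are illustrative, not taken from the cgimap sources.
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

const int RELATION_MAX_MEMBERS = 100000;
const int OSM_ELEMENT_MAX_TAGS = 5000;

struct RelationMember {
  std::string type;  // "node", "way" or "relation"
  std::int64_t ref;  // id of the referenced element
  std::string role;
};

struct Relation {
  std::vector<RelationMember> members;
  std::vector<std::pair<std::string, std::string>> tags;

  void add_member(RelationMember m) {
    if (members.size() >= static_cast<std::size_t>(RELATION_MAX_MEMBERS))
      throw std::runtime_error("Relation exceeds member limit of " +
                               std::to_string(RELATION_MAX_MEMBERS));
    members.push_back(std::move(m));
  }

  void add_tag(std::string key, std::string value) {
    if (tags.size() >= static_cast<std::size_t>(OSM_ELEMENT_MAX_TAGS))
      throw std::runtime_error("Element exceeds tag limit of " +
                               std::to_string(OSM_ELEMENT_MAX_TAGS));
    tags.emplace_back(std::move(key), std::move(value));
  }
};
```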
Zverik commented Feb 18, 2019

I suggest 32000 members, since that's near the osm2pgsql limit. I assume we have only a couple relations that exceed this limit, and they can be easily fixed.

pnorman (Contributor) commented Feb 19, 2019

+1 to unsigned short int being a limit. I'd prefer a lower limit like we have for way nodes, but ~32k is an easily justified limit based on other parts of the OSM toolchain.

Commit: Reduce relation max members to 32k
This should be more in line with what other tools in the OSM ecosystem typically handle.
mmd-osm (Collaborator, Author) commented Feb 23, 2019

I believe we should be good with 32k as the maximum number of relation members. In the latest commit I reduced the limit accordingly and deployed the new code to the dev instance for testing.

I created one example before this new 32k limit was in place (see https://upload.apis.dev.openstreetmap.org/changeset/1022 -> relation 5106). When trying to upload a new version, you now get the following error message:

[Screenshot: error message shown when exceeding the 32k relation member limit]

Most likely this pull request will be one of the last changes before releasing cgimap version 0.7.0. Due to semantic versioning, the version number jumps from 0.6.2 to 0.7.0 this time, while still complying with OSM API 0.6.

simonpoole commented Feb 24, 2019

The limit seems rather arbitrary, and so does the reasoning. Suppose I write a popular OSM tool that only processes the first 1000 members of a relation and then pops up a message stating that the relation is too large? Are you then going to reduce the maximum member count again?

IMHO it would make more sense to find a value that is technically defined (which might be tricky) and then require conformant apps to handle relations up to that size gracefully.

mmd-osm (Collaborator, Author) commented Feb 24, 2019

@simonpoole: you raise a valid point here. I think it's probably quite difficult to express such a limit in purely technical terms. A mobile application is much more constrained by the available memory and CPU, and might already have a hard time dealing with 1000 relation members (ignoring the UX implications for a moment).

Now, even editors like iD struggle with 32k relation members, to the point where, in the best case, it takes minutes for the UI to respond to any user interaction (luckily, JOSM handles them flawlessly).

The bottom line is that even though some editors may be unhappy, users still have options for dealing with those large relations, even if that means switching editors in some cases.

I must say that my motivation for proposing such limits was somewhat different. You can think of them as a kind of fuse, like in an electrical circuit: most of the time you won't notice it, but when really bad things happen it is there to protect the API (and downstream processors) from insane amounts of data.

It shouldn't be seen as an invitation to data providers to create relations as large as will still somehow fit within the limit. Likewise, it shouldn't be seen as a hard requirement for clients to meet.

simonpoole commented Feb 24, 2019

With relations it is a bit more complex than that.

-every- editor that allows geometry edits (which can be as simple as deleting an object) has to be able to deal with implicit edits of relations with any number of members (or up to a future maximum number); there is simply no way around that. And as we saw, Vespucci was quite happy editing the 21'000-member relation in New York, so happy that, when the bug that caused member duplication was tickled, it was still happy editing the relation at 42'000 members (and didn't crash).

Having a UI that can handle such a large number of members, with direct editing of the members, is a different question; obviously, depending on specifics, there are going to be limits driven by the per-member UI element size. (In Vespucci it is entirely possible to directly edit a relation with more than 1000 members, and that could probably be increased substantially by re-using UI objects; there just isn't much of an argument for doing that work.)

The other point, which I've already made on the other related issue, is that given the way editors work, having an upload fail because of an error in the edits is really bad (we have to live with that for conflicts, but those are at least in general rather straightforward to fix). So any such limit needs to be made available to the editors, and the code to enforce it needs to be written, -before- it is put into force.

mmd-osm (Collaborator, Author) commented Feb 24, 2019

> -every- editor that allows geometry edits (which can be as simple as deleting an object) has to be able to deal with implicit edits of relations with any number of members (or up to a future maximum number)

Maybe I'm missing something obvious here, but I don't really see why we have to treat an explicit limit any differently from the implicit limit we already have today, which may well be in the 100'000+ relation member range (I haven't tested it and don't know the exact number). We're not requiring that every editor support this implicit limit as of today, although there's basically nothing preventing an app from creating such humongous relations.

As I see it, you're basically suggesting a new requirement for every editor, which is to handle relations of whatever size the API happens to return (that would be the 'as is' situation), and in the future a more relaxed requirement due to the guaranteed maximum number of relation members.

> any such limit needs to be made available to the editors, and the code to enforce it needs to be written, -before- it is put into force

Yes, I agree with this one, although I won't be much help in making that happen on the Rails port. Volunteers are very welcome to take over this task!

simonpoole commented Feb 24, 2019

> Maybe I'm missing something obvious here, but I don't really see why we have to treat an explicit limit any differently from the implicit limit we already have today, which may well be in the 100'000+ relation member range (I haven't tested it and don't know the exact number). We're not requiring that every editor support this implicit limit as of today, although there's basically nothing preventing an app from creating such humongous relations.

I'm assuming that, in practical terms, there is currently no API-side limit beyond disk space and the high probability that something will go wrong in transfer, so I would expect to be able to add at least millions, if not billions, of members (as each member is just a string plus a 64-bit id, space limits are likely the least of the things that will go wrong). As a corollary, adding a limit creates a new error condition that cannot be retried, and editors need to cater for that (or wing it, whatever fits the philosophy of the respective devs).

mmd-osm (Collaborator, Author) commented Feb 24, 2019

> adding a limit creates a new error condition that cannot be retried, and editors need to cater for that

Agreed. Avoiding a failing upload and providing some meaningful error message when it's clear that the API will reject the data definitely makes sense. That's a slightly different story than asking editors to keep up with 32k relation members, though.

Looking at how JOSM handles the max way_nodes API limit today, this boils down to implementing

  1. some logic in getCapabilities to parse and manage two new properties in /api/capabilities.
  2. new validator checks
  3. final checks right before the upload starts

(+ test cases)

Apart from that, there doesn't seem to be much logic involved, and I don't expect anything more complex for oversized relations.

https://github.com/openstreetmap/josm/blob/mirror/src/org/openstreetmap/josm/data/validation/tests/ApiCapabilitiesTest.java#L44-L56

[Screenshot of the JOSM capabilities-parsing code linked above]

https://github.com/openstreetmap/josm/blob/mirror/src/org/openstreetmap/josm/actions/upload/ApiPreconditionCheckerHook.java#L88-L101

[Screenshot of the JOSM pre-upload precondition check linked above]
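
For illustration, here is a rough C++ transposition of the pre-upload check from point 3 above. The names are hypothetical; a real editor would take the limit from the parsed /api/capabilities response rather than hard-coding it.

```cpp
// Sketch only: a client-side precondition check run right before upload,
// analogous to JOSM's handling of the existing max way-nodes limit.
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Relation {
  std::int64_t id;
  std::size_t member_count;
};

// Returns an error message for the first oversized relation in the upload
// set, or an empty string if the upload may proceed.
std::string check_relation_limits(const std::vector<Relation>& modified,
                                  std::size_t max_relation_members) {
  for (const auto& rel : modified) {
    if (rel.member_count > max_relation_members) {
      return "Relation " + std::to_string(rel.id) + " has " +
             std::to_string(rel.member_count) +
             " members, exceeding the server limit of " +
             std::to_string(max_relation_members) + ".";
    }
  }
  return "";  // all relations within the limit
}
```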

simonpoole commented Feb 24, 2019

Now, outside of that, the way JOSM handles this is already massively user-hostile, if not outright silly; simply aborting the upload becomes even worse once we introduce a relation member limit.

Imagine you've been working for an hour on a project, say adding lanes, and have been busy splitting ways; you're finished now and want to upload your work. BANG: your upload fails because relation X now has 32'100 members instead of just 31'999. So to get your upload to work and save what you were doing, you now have to fix a 32'100-member relation, which even for JOSM users is going to be beyond the grasp of all but a select few.

That's why any sane implementation of the constraint is going to have to monitor relation sizes operation by operation, raise errors when necessary, and potentially roll back more complicated compound operations (think, for example, of auto-splitting when adding turn restrictions), and that is a substantial amount of work.

mmd-osm (Collaborator, Author) commented Feb 24, 2019

Just a quick data point to put 32k relation members into perspective and give a first idea of the expected cost vs. benefit of more sophisticated logic.

Currently, we have exactly 2 relations (out of ~6.6 million relations in total) with more than 10k members, the larger of the two having 18'555 members. This leaves us with quite some headroom until we hit the 32k limit. Or to phrase it another way, hitting that limit could be seen as quite a strong indication that you're doing something wrong.

In any case, JOSM still allows you to save the data to a local file, regardless of what the validator complains about, i.e. hitting the limit won't necessarily result in immediate loss of data.

(Source: https://overpass-turbo.eu/s/GpB)

simonpoole commented Feb 24, 2019

If you remember, we just had a case where that limit would have been exceeded; because it was not in place, the user in question was able to save their work, have somebody else fix the relation after the fact, and lose exactly zero percent of their work. None of which would have been possible in your scenario.

> Or to phrase it another way, hitting that limit could be seen as quite a strong indication that you're doing something wrong.

That is completely wrong: the vast majority of relation editing is indirect and automatic, without direct user interaction with the relation. A user can hit the limit without doing anything wrong at all; that is why this is different from exceeding the 2000-node limit on a way.

mmd-osm (Collaborator, Author) commented Feb 25, 2019

I think it's a good thing to have some mechanism in place to contain the effects of a software bug rather than letting them spill over to all data consumers. That's exactly my motivation for this limit.

In your particular case, I'd assume that a Vespucci user would still have been able to save their data to a local file (I haven't tested this, but I vaguely remember it being possible) and ask for support.

> A user can hit the limit without doing anything wrong at all

Can you elaborate a bit on how this would happen in real life, assuming the user isn't inadvertently triggering some software bug and actually intends to create such a relation, either directly or indirectly? Given that all of our relations are so far from the limit, maybe I'm lacking some insight here.

simonpoole commented Feb 25, 2019

First, a couple of numbers: the largest route relation has 7'019 members (no duplicates; with duplicates there is one just shy of 10'000), the largest multipolygon 18'581 members. So we're not even talking about one order of magnitude of headroom.

I thought I had already made this clear, but in any case: in no modern editor do splitting, merging, and other similar operations require the user to interact with, or even be aware of, the relations that the edited object is a member of; adding and removing relation members will in general be automatic and silent. In other words, an afternoon of lane editing in a city with some public transport infrastructure will lead to the user touching dozens of relations without even noticing.

mmd-osm (Collaborator, Author) commented Feb 25, 2019

> In other words, an afternoon of lane editing in a city with some public transport infrastructure will lead to the user touching dozens of relations without even noticing.

Yes, that's totally clear. My point was that splitting ways (and updating relations under the hood) is really commonplace, yet it hasn't resulted in a larger number of relations with more than 8000 members. Hence my motivating question: what kind of activity would typically produce such huge relations?

I think we need to take a closer look at the following two or three relations in particular, as they might hit the 32k limit in the future:

The expressway relation includes both motorway directions, is rather long, and has lots of tunnels and bridges. Islands and lakes might see further splitting of ways over time, contributing even more relation members.

I find it rather unlikely that this will happen over the course of an afternoon mapping session. Also, I'm not convinced that public transport mapping will be affected. However, this may be a different story over a period of several months or even years.

As you've probably noticed in my first post, the initial proposal was to go with 100k relation members. Would that be something you'd be more comfortable with?

simonpoole commented Mar 2, 2019

> I find it rather unlikely that this will happen over the course of an afternoon mapping session. Also, I'm not convinced that public transport mapping will be affected. However, this may be a different story over a period of several months or even years.

Last try: to blow things up, the relation only needs to have (max member limit) members -before- the unsuspecting user splits a way that is a member of it. How the relation got to that point, and the fact that it sits at the limit, can't be a concern of our hypothetical user. So again, any limit that is not way out there requires more than just punting at upload time.

mmd-osm (Collaborator, Author) commented Mar 7, 2019

> to blow things up, the relation only needs to have (max member limit) members -before- the unsuspecting user splits a way that is a member of it.

Agreed, finding out about your huge relation only during upload won't be much help to a user. That's why there's the other issue, which exposes the limit via /api/capabilities; the expectation is that editors take care of this limit in whatever way suits their app.
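
For context, the advertised limit might then appear in the capabilities document along these lines. This is an illustrative fragment only; the relationmembers element name simply mirrors the existing waynodes element and is an assumption, not a confirmed format.

```xml
<api>
  <version minimum="0.6" maximum="0.6"/>
  <waynodes maximum="2000"/>
  <!-- hypothetical new element, mirroring waynodes -->
  <relationmembers maximum="32000"/>
</api>
```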

Now, back to the question of the actual relation member limit. As you've probably noticed, I updated my previous post with some additional analysis results. Huge relations in CN tend to be the result of invalid or corrupted data; in reality those relations would have 2k-3k members, not the 10k they have today. So that's really a non-issue.

Quite interesting is the distribution of way-node counts across all relation members in Japan or the US: quite a lot of ways seem to have only very few nodes. Relation 6677259 is one such example. One explanation could be that this is somehow related to the way the data was originally imported from an external source.

In the case of a coastline, there's probably quite some potential to merge some of those ways. I'm not saying that each way should have 2k nodes, but a 32k member limit certainly won't be a hard constraint for such relations: 32'000 members at up to 2'000 nodes per way would allow up to 64 million nodes, although that is more of a theoretical figure.
