
Increase (or remove) the rate limit for administrators/moderators #290

Closed
Zaczero opened this issue Oct 13, 2023 · 32 comments · Fixed by #294
Zaczero commented Oct 13, 2023

Currently, Thanos (a revert tool used by the DWG) runs into difficulties with mass reverts because it quickly hits the rate limit. This results in considerable delays for large revert tasks, sometimes requiring several hours of waiting. Ideally, the rate-limit behavior should be adjusted for OSM moderation purposes.

Zaczero changed the title from "Increase (or remove) rate limit for administrators/moderators" to "Increase (or remove) the rate limit for administrators/moderators" on Oct 13, 2023
@tomhughes (Contributor)

To be clear: while this repository is the place to ask for the ability to set a higher rate limit for moderators, the actual decision of what rate limits to set is an operational matter that is out of scope here.

mmd-osm (Collaborator) commented Oct 14, 2023

@tomhughes: In case you need an immediate workaround, you could delete the respective keys directly in memcached. The syntax is as follows, where 1, 2, 3 correspond to the user ids. This could be part of a scheduled script.

delete cgimap:user:1
delete cgimap:user:2
delete cgimap:user:3
quit
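
If that were automated, a minimal Python sketch of such a scheduled script might look like the following; the host/port, the pymemcache dependency, and the user ids are all assumptions that would need to be adapted to the production setup.

# Hypothetical scheduled script: reset rate-limit debt for selected users by
# deleting their cgimap keys from memcached (assumes memcached on
# localhost:11211 and the pymemcache library).
from pymemcache.client.base import Client

MODERATOR_IDS = [1, 2, 3]  # placeholder user ids

client = Client(("localhost", 11211))
for uid in MODERATOR_IDS:
    client.delete(f"cgimap:user:{uid}")  # same effect as "delete cgimap:user:N"
client.close()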

mmd-osm added this to the v0.9.0 milestone on Oct 16, 2023
DavidKarlas added a commit to DavidKarlas/openstreetmap-cgimap that referenced this issue Oct 28, 2023
This change removes transfer-size rate limiting for users with the moderator role, a.k.a. DWG members.
DavidKarlas (Contributor) commented Oct 28, 2023

I opened a PR in an attempt to fix this issue: #294
Let me know if you would prefer increasing the limit instead of removing it; I can change the PR to increase the limit instead...

mmd-osm (Collaborator) commented Oct 28, 2023

@Zaczero : I was wondering a bit why we're even hitting the rate limit here.

The way rate limiting works today is that we count the size of each API response towards the user's quota. In the case of /map or some full-history calls, users will be throttled fairly quickly.

However, in the case of changeset uploads, we send back a tiny diffResult XML document with about 100 bytes per created/changed/deleted object. A user would have to upload more than a million objects before hitting the rate limit.

Presumably, changeset upload is not the only endpoint you're calling. It would be good to get some idea of what kind of data volume to expect for moderators. Maybe we have to revisit our 32-bit field here.
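
To make the "more than a million objects" figure concrete, here is a quick back-of-envelope check; the 250 MB maxdebt is the production value cited later in this thread, and the 100 bytes per object is the rough estimate above.

# Objects a user could upload before exhausting the quota, assuming the
# whole budget is spent on diffResult responses only.
maxdebt_bytes = 250 * 1000 * 1000   # assumed production maxdebt (250 MB)
bytes_per_object = 100              # rough diffResult size per object
print(maxdebt_bytes // bytes_per_object)  # 2,500,000 objects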

Zaczero (Author) commented Oct 28, 2023

We're dealing with hundreds of changeset XMLs, each with a size of 5,000-10,000. I'm not sure what else to add here, but the rate limit is being hit, and it's significantly impacting mass-revert operations.

mmd-osm (Collaborator) commented Oct 28, 2023

More specifically, I'm trying to find out if we're likely running into the issues mentioned here: #294 (comment)

This is only feasible if we better understand how your tool works, what kind of API requests you're triggering, and so on.

@DavidKarlas (Contributor)

I believe they are hitting it on upload, because of

fmt.start_document(generator, "diffResult");

which responds with a diffResult document:
[screenshot of the diffResult response]

mmd-osm (Collaborator) commented Oct 28, 2023

I'm not really sure that's the case. diffResult is a fairly small message; as mentioned earlier, that's about 100 bytes per object, if not less. Is the DWG really reverting millions of objects before hitting the rate limit?

What I'm looking for is some kind of list of API requests, along with their execution counts, from the point when rate limiting happens. We really need a clear picture of whether we're good to go with our 32-bit fields here.

@DavidKarlas (Contributor)

From my test, uploading 1,000 objects results in a 54,255-byte response.

To put things into perspective (I don't have DWG members' numbers at hand): today's vandal took 25 minutes from their first to their last changeset, and in that time modified 292,370 nodes, 6,739 ways, and 1,656 relations.

Let's assume a DWG member banned them and wants to revert it as fast as possible: 300,765 objects were modified, so roughly 300 × 54 KB ≈ 16 MB of diffResult responses. It is very unlikely this is the problem, unless the production --ratelimit and --maxdebt are much smaller than the defaults in the code.

What is probably happening is that the tool first fetches objects before doing the revert; I will take a look at the Thanos code...
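
A quick sanity check of that estimate in Python, using the numbers from the comment above:

# Estimated diffResult volume for reverting today's vandalism
objects = 292_370 + 6_739 + 1_656       # nodes + ways + relations = 300,765
bytes_per_1000 = 54_255                 # measured response size per 1,000 objects
total_bytes = objects / 1000 * bytes_per_1000
print(f"{total_bytes / 1e6:.1f} MB")    # ~16.3 MB, far below a 250 MB maxdebt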

mmd-osm (Collaborator) commented Oct 28, 2023

Thank you for double-checking the numbers, much appreciated.

I'd like to mention one other topic that comes to mind. Assuming Thanos and osm-revert all run on the same server, even read-only API requests should use the authenticated user's details, so that rate limiting applies to the single user performing the revert rather than to the server's IP address. I haven't checked whether that's already the case today.

Zerebubuth had added this feature years ago: https://lists.openstreetmap.org/pipermail/dev/2016-October/029544.html
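
As an illustration of what an authenticated read-only call might look like, here is a minimal Python sketch; the endpoint path is a standard API 0.6 changeset download, but the OAuth 2.0 bearer-token scheme and the placeholder token are assumptions about how the revert tools authenticate.

# Sketch: read-only API call authenticated as the reverting user, so the
# response size counts against that user's quota instead of the server IP.
import requests

TOKEN = "..."  # hypothetical OAuth token for the reverting user
resp = requests.get(
    "https://api.openstreetmap.org/api/0.6/changeset/1234/download",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
print(len(resp.content), "bytes counted towards this user's rate limit")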

DavidKarlas (Contributor) commented Oct 28, 2023

Looking at the code at osm-revert/main.py#L188, as far as I understand, requests are authenticated, but the changesets that need reverting are downloaded first; going back to our vandal from today, downloading all of their changesets amounts to 76 MB.

I now have a different theory, though: moderators get really busy when vandals are active, jumping between osm.org and JOSM, loading huge relations, with osm-revert kicked off and running in the background at the same time. All of that together, authenticated from multiple locations and hitting the API all at once, and boom, the limit is exceeded...

@SomeoneElseOSM

today's vandal

Just in case people are not aware: the revert of that isn't being done from a moderator account, so it is unrelated to this issue.

mmd-osm (Collaborator) commented Oct 28, 2023

@SomeoneElseOSM: today we're not (yet) differentiating between normal users and moderators when it comes to rate limiting. That differentiation is only being introduced by @DavidKarlas's new pull request, which will be merged shortly.

Zaczero (Author) commented Oct 28, 2023

More specifically, I'm trying to find out if we're likely running into the issues mentioned here: #294 (comment)

This is only feasible if we better understand how your tool works, what kind of API requests you're triggering, and so on.

The osm-revert tool primarily retrieves changeset XMLs from OSM; element data is bulk-downloaded from Overpass. The only additional requests create and upload the changesets. All OSM requests are authenticated as the reverting user (or moderator).

Zaczero (Author) commented Oct 28, 2023

Looking at the code at osm-revert/main.py#L188, as far as I understand, requests are authenticated, but the changesets that need reverting are downloaded first; going back to our vandal from today, downloading all of their changesets amounts to 76 MB.

I now have a different theory, though: moderators get really busy when vandals are active, jumping between osm.org and JOSM, loading huge relations, with osm-revert kicked off and running in the background at the same time. All of that together, authenticated from multiple locations and hitting the API all at once, and boom, the limit is exceeded...

I have personally reverted some instances of mass vandalism, and I was not performing any other actions with my account at the time; the rate limit was being hit by the revert alone.

mmd-osm (Collaborator) commented Oct 28, 2023

Thanks a lot for your feedback. Looking at the settings for cgimap in production (link), a user would have to wait a bit more than 21 minutes for a maxdebt of 250 MB to become fully consumable again. If you've downloaded e.g. some large changesets shortly before kicking off the revert job, chances are you'll hit the rate limit more easily.

Regarding the original problem I was trying to get an idea of: I think we should be good with a 2-3 GB maxdebt. If needed, we could bump the rate-limit parameter to minimize any potential wait time for moderators.
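
For reference, the 21-minute figure follows from a simple debt model; the ~200 KiB/s replenishment rate below is an assumption inferred from those numbers, not a documented production setting.

# Time until a fully spent maxdebt becomes consumable again
maxdebt = 250 * 1024 * 1024   # 250 MiB of accumulated debt
rate = 200 * 1024             # assumed replenishment rate in bytes/second
print(maxdebt / rate / 60)    # ~21.3 minutes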

mmd-osm modified the milestones: v0.9.0, v0.8.9 on Oct 28, 2023
mmd-osm linked a pull request on Oct 28, 2023 that will close this issue
@tomhughes (Contributor)

The only thing we need here is the configuration options; what limit to set is an operational matter that can be adjusted as needed in response to observed conditions.

mmd-osm (Collaborator) commented Oct 28, 2023

You won't be able to set an arbitrarily high value from an operations point of view, but the maximum limits in place should still be OK.

mmd-osm (Collaborator) commented Oct 29, 2023

@tomhughes : by the way, do we have memcached deployed on the dev instance at the moment? Could we run some tests there as well? How would we set up a moderator user there?

@tomhughes (Contributor)

Yes, it looks like the dev instances are using memcached, and I can make people moderators.

mmd-osm added a commit that referenced this issue Oct 29, 2023
Fix #290: Add moderators specific rate limiting settings
mmd-osm (Collaborator) commented Oct 29, 2023

@tomhughes : sounds good. I have created a new user https://master.apis.dev.openstreetmap.org/user/mmd2mod to try out moderator rate limits.

@DavidKarlas , @Zaczero : if you want to do a bit of testing with a "mod hat on", please sign up on https://master.apis.dev.openstreetmap.org and post your user name in this issue.

A while ago, I uploaded a fair amount of data in Halifax, Nova Scotia, which could be used to test rate limiting: https://master.apis.dev.openstreetmap.org/#map=17/44.66416/-63.60977

@tomhughes (Contributor)

I've made that user a moderator.

@DavidKarlas (Contributor)

I tested it on my local machine a few times, so I will pass on the chance to test on the test server. @mmd-osm, thank you for testing on a separate system.
Btw, it might be a bit nicer/easier to test with #295 merged.

mmd-osm (Collaborator) commented Oct 29, 2023

I tested both scenarios on the dev instance. The results look good to me. I had a bit of a hard time hitting the rate limit as a moderator, and even then, the retry limit lasted only a few seconds.

Also, including the Retry-Limit in the error message helps JOSM users better understand what is going on and how long they need to wait.

@DavidKarlas (Contributor)

Thank you for testing this!

mmd-osm (Collaborator) commented Oct 29, 2023

@tomhughes: I've triggered a new PPA build. Packages should be available soon; publication is still pending: https://launchpad.net/~mmd-osm/+archive/ubuntu/cgimap-809-jammy/+packages

mmd-osm (Collaborator) commented Oct 30, 2023

The change is in production now.

@DavidKarlas (Contributor)

Yay, tested:
[screenshot]
Thank you for all the support!

@DavidKarlas (Contributor)

Now we just need someone from the DWG to see if they have a harder time hitting the limits...

@SomeoneElseOSM

someone from the DWG

Thanks; as I mentioned earlier, I don't normally use a moderator account for reverts. Some people do, though, so we'll see what happens.

@DavidKarlas (Contributor)

I assume adding the moderator role to _revert accounts is a big no-no?

@tomhughes (Contributor)

I'm happy to add it to revert accounts of DWG members. I already added it to one of @woodpeck's repair accounts.
