
[Perf] Mass prefetch user details for faster /map call #142

Merged: 1 commit into zerebubuth:master on Jul 20, 2018

Conversation

@mmd-osm (Collaborator) commented Apr 29, 2018

Fixes #122, and should speed up pretty much any other API call based on the same extraction logic.

I hope the general approach to tackling this issue is OK, with only minor code cleanups needed.
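Roughly, the idea is to collect the distinct ids while extracting elements and then resolve them with a single query, instead of one select per element. Here is a minimal sketch of that pattern (not the actual patch; it assumes libpqxx and uses illustrative table/column names):

```cpp
// Minimal sketch of the prefetch idea, not the actual patch: instead of one
// SELECT per referenced changeset while streaming elements, gather the
// distinct ids first and resolve them in a single round trip.
// Table and column names here are illustrative.
#include <cstdint>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <pqxx/pqxx>

std::map<int64_t, std::string>
prefetch_changeset_users(pqxx::work &txn, const std::set<int64_t> &changeset_ids) {
  std::map<int64_t, std::string> result;
  if (changeset_ids.empty())
    return result;

  // Build "1,2,3,..." from the distinct ids (plain integers, so no quoting needed).
  std::ostringstream ids;
  for (auto it = changeset_ids.begin(); it != changeset_ids.end(); ++it) {
    if (it != changeset_ids.begin())
      ids << ",";
    ids << *it;
  }

  // One query instead of N single selects.
  pqxx::result rows = txn.exec(
      "SELECT c.id, u.display_name "
      "FROM changesets c JOIN users u ON u.id = c.user_id "
      "WHERE c.id IN (" + ids.str() + ")");

  for (const auto &row : rows)
    result[row[0].as<int64_t>()] = row[1].as<std::string>();

  return result;
}
```

The element-writing code can then look up the cached details in the map rather than hitting the database once per element.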

@mmd-osm (Collaborator, Author) commented Jul 18, 2018

@tomhughes: Re: https://lists.openstreetmap.org/pipermail/talk/2018-July/081015.html

If the read-only request in question is really handled by cgimap, this pull request should also address the last remaining bit, where we still issue a large number of single selects. It's too short notice to include this during the current data center migration period, but from a coding point of view we should be good.

@tomhughes (Contributor) commented Jul 18, 2018

That's really what I was getting at. The current setup has two different potential issues, you see:

  • cgimap calls are using eddie, which is about 7.5 ms away rather than the normal 0.3 ms
  • rails calls are using katla, which is local (about 0.2 ms) but is an older, slower machine

Hence my attempt to figure out which sort of problem it is.

@tomhughes (Contributor) commented:

Confirmed that it was /api/0.6/relation/.../full, which is a cgimap call.

@mmd-osm (Collaborator, Author) commented Jul 18, 2018

Makes sense. This API call could trigger several hundred single selects to fetch details for distinct changeset ids. At least for the example relation/7449166/full mentioned on talk-ml, that's roughly 300 calls.

Once this pull request is in place, this should turn into one select to fetch all changeset details in one go.
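As a rough back-of-envelope, using the ~7.5 ms round trip to eddie quoted above and ignoring query execution time: 300 single selects cost about 300 × 7.5 ms ≈ 2.25 s in network latency alone, whereas a single batched select pays that round trip only once.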

@zerebubuth (Owner) commented:

Apologies, I've been holding this up for far too long. Since it seems like it might be rather more urgent now, and I have limited bandwidth (quite literally: EDGE), I've invited @mmd-osm to collaborate on this repo. If my understanding of GitHub permissions is correct, this should include the ability to merge PRs.

Sorry I didn't do this sooner & good luck!

@mmd-osm (Collaborator, Author) commented Jul 19, 2018

> If my understanding of GitHub permissions is correct, this should include the ability to merge PRs.

Yes, that seems correct: I have a new "Merge pull request" option now, which means that in theory I could merge this PR right away. Before doing so, I'd like to double-check what additional steps would be needed afterwards. Bumping the version from 0.6.0 to 0.6.1 is probably optional.

However, we would need to publish a new build to the PPA on the osmadmins Launchpad account. I don't have the proper authorization, and the latest ubuntu/bionic branch isn't published yet. Not sure if @tomhughes can step in here.

Overall, I don't have the impression that people are flooding various channels with complaints about poor performance. Some use cases have much higher runtimes now, which isn't too bad, as Roland has already given some hints on how to achieve the same via the Overpass API for the time being. My personal recommendation would be not to put even more load on the admin team right now, and to live with the current situation for 1-2 weeks. After all, many folks are on vacation anyway and will probably blame their poor EDGE connection for the bad response time :)

@tomhughes (Contributor) commented:

@mmd-osm I can give you access to the PPA if you tell me your Launchpad username.

@mmd-osm merged commit d843f79 into zerebubuth:master on Jul 20, 2018
@mmd-osm (Collaborator, Author) commented Jul 20, 2018

I have merged #142 and #147 into the master branch now and bumped the version to 0.6.1.

@tomhughes: I think we can also do it the other way around, so that I don't have to become a member of osmadmins: I will try to publish a new build to a PPA under my user account https://launchpad.net/~mmd-osm. Once that is in place, you can navigate to the respective PPA from your end (as the osmadmins user), click on "View package details", and then "Copy packages".

This way you fully control the whole copy process and can initiate a "Rebuild the copied sources".

@tomhughes (Contributor) commented:

Yes, we can do it that way as well.

@mmd-osm (Collaborator, Author) commented Jul 20, 2018

The build process for 0.6.1 has just finished, and the build results are now available under https://launchpad.net/~mmd-osm/+archive/ubuntu/cgimap-perf-hotfix-061

I have already deployed this new PPA build locally and verified that it correctly identifies itself as version 0.6.1.

@tomhughes (Contributor) commented:

So it turns out we've been running the new code live for about 14 hours now (I copied the package but didn't realise Chef was set to update it automatically), and so far nobody has complained.

@mmd-osm (Collaborator, Author) commented Jul 23, 2018

This looks like the first report of a side effect of the performance improvements: https://lists.openstreetmap.org/pipermail/dev/2018-July/030326.html

While response times used to be high enough that the rate limiting didn't kick in, that situation has changed quite a bit now that some calls are a factor of 3-10 faster. For clients that didn't care much about limiting their requests per second, this might be an issue now. IMHO, the client should be fixed here.

Revisiting and fine-tuning the rate limit config setting for cgimap in https://github.com/openstreetmap/chef/blob/master/cookbooks/web/recipes/cgimap.rb#L52 may also be needed.

@tomhughes (Contributor) commented:

Nothing has really changed, other than that you can now hit the limit more quickly. The limit is on the number of bytes downloaded, not the number of requests as Nick seems to think.

I don't see why we should increase the limits; they're fairly generous, I think, and hitting them is normally an indication that you're doing something silly.
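For anyone following along, a purely illustrative sketch of a byte-based limit (not cgimap's actual implementation; names and parameters are made up) shows why faster responses make a busy client run into it sooner: the budget is on bytes served over time, so downloading the same volume in less wall-clock time exhausts the allowance earlier.

```cpp
// Illustrative byte-budget limiter, not cgimap's actual code: each client has
// a token bucket measured in bytes that refills over time. Responses are
// charged against it; once the allowance is exhausted, further requests are
// refused until enough time has passed.
#include <algorithm>
#include <chrono>
#include <map>
#include <string>

class byte_rate_limiter {
  using clock = std::chrono::steady_clock;

  struct bucket {
    double allowance;       // bytes the client may still download (may go negative)
    clock::time_point last; // last time the allowance was refilled
  };

  double rate_;  // refill rate in bytes per second
  double burst_; // maximum accumulated allowance in bytes
  std::map<std::string, bucket> buckets_;

public:
  byte_rate_limiter(double bytes_per_second, double max_burst_bytes)
      : rate_(bytes_per_second), burst_(max_burst_bytes) {}

  // May this client issue another request right now?
  bool check(const std::string &client_key) {
    return refill(client_key).allowance > 0.0;
  }

  // Charge the size of a finished response against the client's allowance.
  void record(const std::string &client_key, double bytes_sent) {
    refill(client_key).allowance -= bytes_sent;
  }

private:
  bucket &refill(const std::string &client_key) {
    const auto now = clock::now();
    auto it = buckets_.find(client_key);
    if (it == buckets_.end())
      it = buckets_.emplace(client_key, bucket{burst_, now}).first;

    bucket &b = it->second;
    const double elapsed = std::chrono::duration<double>(now - b.last).count();
    b.allowance = std::min(burst_, b.allowance + elapsed * rate_);
    b.last = now;
    return b;
  }
};
```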

@mmd-osm (Collaborator, Author) commented Jul 23, 2018

I remembered the bit about the download volume, but hadn't really looked into the details. If the limits are OK the way they are right now, there's no need to change anything.

One thing I found a bit strange in this report: assuming it is a rate limiting issue, I'd expect an HTTP 509 Bandwidth Limit Exceeded error rather than "frequently hangs waiting for a response".

@tomhughes (Contributor) commented:

Yes, that's what he should get; a hang is not due to any limits.

@mmd-osm deleted the patch/cache_prefetch branch on September 9, 2018 10:36