Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upAdd support for diff uploads #140
Comments
zerebubuth
added
the
difficulty:hard
label
Dec 26, 2017
This comment has been minimized.
This comment has been minimized.
I use HTTP Basic Auth over HTTPS most of the time when I need to use the redacted history API.
I expect there's a fair number of editors using that. Some even over HTTP. Right now, for feature parity, we'd need to either keep HTTP Basic requests going to the rails port or support them in cgimap. I wouldn't be opposed to dropping basic auth support, and would support dropping basic auth without HTTPS, but this issue tracker isn't the right place to make that policy change. |
This comment has been minimized.
This comment has been minimized.
A quick update from my side. I completed a first proof of concept implementation, which covers creating/modifying/deleting nodes/ways/relations, including tags, way nodes and relation members. Also, copying from current_* tables to history tables is part of the PoC, as well as an update of the number of changes in the changeset. One of my test cases with 4341 nodes and some 180'000 tags in total that took around 15-20 minutes on rails now finishes in just 5 seconds. The code already survives a JOSM editing session as a drop in replacement for rails code. I see a number of follow up activities (in no particular order): Review code
Actions
SELECT "user_blocks".* FROM "user_blocks" WHERE "user_blocks"."user_id" = $1 AND (needs_view or ends_at > (now() at time zone 'utc')) LIMIT $2
Nice to have:
Open issues: https://github.com/mmd-osm/diffupload/issues |
This comment has been minimized.
This comment has been minimized.
@pnorman : as GSoC 2018 is coming up soon, and the API topic is on the proposed projects page again, I think it would make sense to review the "Full cgimap write support" proposal, given that we probably want to incorporate the upload redesign in this issue in some way. My other question would be if you (and possibly @Komzpa) could spend some time mentoring the upload redesign even outside of some GSoC framework, i.e. by reviewing code, bringing in ideas for improvement, providing guidance on how to add POST support to cgimap, which we could then route to my implementation. There's still lots of work to be done to get test cases up and running, and any support here would be highly welcomed as well. |
This comment has been minimized.
This comment has been minimized.
@zerebubuth : I started integrating my coding into cgimap now, please see https://github.com/mmd-osm/openstreetmap-cgimap/tree/feature/bulk_upload The biggest challenge at the moment is to include https://github.com/mmd-osm/diffupload/blob/master/src/diffuploader.cpp - as there's really no framework available in cgimap yet for HTTP POST request semantics. It doesn't really fit into the existing concept of a data_selection, and I would need the OsmChange message payload available (preferably as a file), along with the current UID and changeset number, and of course a database connection. The router needs some concept to distinguish HTTP GET / HEAD / OPTIONS / POST, and restrict the path to certain request methods. |
This comment has been minimized.
This comment has been minimized.
Komzpa
commented
Feb 4, 2018
@mmd-osm it would be cool if you can get in touch with @hesidoryn who is sketching golang version in https://github.com/hesidoryn/fastmap-wrapper - it may be much easier to port it all to golang as is, since it already has rich HTTP API framework embedded. |
This comment has been minimized.
This comment has been minimized.
Yes clearly this is the time to start over in a different language. Can we like finish this rewrite before starting on a new one please? |
This comment has been minimized.
This comment has been minimized.
Komzpa
commented
Feb 4, 2018
@thughes given current cgimap incarnation suffers from the same kind of architectural issues as railsport, it does not really matter which technology to use to start over as you'll have to start from scratch anyway. |
This comment has been minimized.
This comment has been minimized.
What "architectural issues" are you talking about? The whole point of cgimap is to avoid the architectural problems of using rails, and as far as I'm aware it does so. That's not so say it can't be improved of course but the basic architecture is good. Of course there is a "finish" as there are a finite number of API methods to implement and once they are all done the port away from rails is finished. Obviously more API methods might be added in future but that's the same regardless of technology. The point is that given limited engineering resource it's insane to keep forking it rather than concentrating on getting one thing done. Unfortunately it's more exciting to start your own new thing in the latest fancy language than to contribute to an existing project. |
This comment has been minimized.
This comment has been minimized.
Komzpa
commented
Feb 4, 2018
@tomhughes I'm talking about issues described in detail in openstreetmap/operations#135, that currently most of time of the api calls is lost in database roundtrips, on etiher rails or cgimap APIs. This behavior is dictated by "let's issue only selects without joins and sorts" concept in cgimap and the whole object model layout. There just isn't an idea that you may get all data you need (from all the tables you need) in a single database query in cgimap. There is also no "engineering resource" at all. You're not paying anybody here to be in position to dehumanize them and call them "resource". People are volunteers and may do whatever they want. Only result you get from telling someone "stop doing it" is them possibly stop doing it, not switching the thing you may want them to. If you think I shouldn't be getting people working on the similar topic in touch @tomhughes you should really reconsider what you're doing here. |
This comment has been minimized.
This comment has been minimized.
That's undoubtedly true of the diff upload API call, and possibly several others. I hope it's no longer true of anything that goes through cgimap, as we had some awesome contributions from @jronak in #130 and #136 which meant cgimap could make a single database call for extracting nodes, one for extracting ways and another for extracting relations. No doubt it would be possible to get that down to a single call, but the extra saving of a handful of round-trips didn't seem to be worth the extra effort. According to #122 (comment), each round-trip adds about 0.2ms, so a few 10s of round-trips isn't too bad. We had issues when cgimap was making 1000s of round-trips for individual objects.
It's not an explicit goal of cgimap to only issue selects without joins or sorts. In fact, cgimap now does many joins, some of which require sorting. I've tried to avoid sorting on the whole query because I'm worried about the extra load on the database. I'd love to be proved wrong about that, and we could make a few simple changes to enable sorted output.
I agree totally, and I hope to see some awesome stuff from fastmap-wrapper in the future. Having multiple projects means we can experiment with better ideas. If I were designing cgimap from scratch today then it would look totally different, and there's a number of mistakes in the design which have either required a lot of effort fix, or continue to require effort to work with. For example, the backend "independence" isn't something cgimap really uses, and leads to some ugly code to try and continue supporting. The data selection interface was intended to abstract over the data model, allowing handlers to be written in a style which concentrated on what was being queried, rather than how - ideally I wanted something which was close to a graph query like Gremlin, but unfortunately it didn't really work out that way. For sure, there's a lot of code in cgimap which ought to be provided by some HTTP standard library, as it would be in Go and any other sensible modern language for writing web services! Unfortunately, the origins of cgimap date back to 2009, so it carries some baggage because of assumptions which were true but are no longer, and the languages and libraries which were available at the time. @mmd-osm: Thanks so much for the work that you've been doing. I'm sorry I haven't had a chance to review it yet, but I'll try to get to that as soon as I can. |
This comment has been minimized.
This comment has been minimized.
@mmd-osm I had a quick look through and it looks good so far, thanks! Your code style is a little different from the repo style, but that's not worth worrying about now. The biggest issues seem to be plugging it into the request flow (i.e: not I've started trying to add code so we can support POST requests. Will PR that into your branch as soon as it's ready. |
This comment has been minimized.
This comment has been minimized.
Now that the OSM API rewrite in golang is an official GSoC 2018 project, the question about priorities is getting a bit more interesting now. |
This comment has been minimized.
This comment has been minimized.
WTF |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
No mentor? But seriously, who was responsible for deciding that without any consultation? |
This comment has been minimized.
This comment has been minimized.
This is a cgimap issue thread. It's not an appropriate place to be discussing the approval process for GSoC applications. While on a technical level I feel it's a shame to be duplicating work, it's clear that there's no great enthusiasm for the current cgimap implementation (even from me - my apologies for all the outstanding stuff blocked on me). If a new implementation is able to attract more sustained effort and maintenance, then that's a good thing for OSM. Even if the motivation for a new implementation is language chauvinism or other non-technical reasons. |
This comment has been minimized.
This comment has been minimized.
If I understood the application right, the scope of the GSoc project will cover the whole of OSM API 0.6, not just the parts that are currently implemented in cgimap. I don't want to go into the architectural implications on the Rails port (or projects like cgimap-ruby), but I feel that the scope is pretty huge for a 3 months project (which starts mid of May 2018 btw). Which brings me back to this issue: I think it's pretty clear that porting the current Rails based diff upload with its many single inserts and selects is nothing we want to advocate for a future golang based implementation. On the other hand redesigning this thing took me several weeks to figure out how the Rails port works, and then redesign and implement it as bulk insert. The results are pretty amazing with more than factor 100 speedup compared to the current Rails based implementation.
My motivation is pretty clear: instead of making the job for the student project even harder, let's provide a rock solid basis for a C++ --> Golang port (including test cases) and try to do as much of the heavy lifting upfront as possible. I'm sure this will be a clear win win. Let's jump right into it! |
This comment has been minimized.
This comment has been minimized.
No, it's just what's implemented in cgimap. If there is time new stuff might get worked on, but the priority is the ability to switch the deployment from cgimap to the go port. |
This comment has been minimized.
This comment has been minimized.
@zerebubuth : I think we've already achieved some extremely good progress with your refactoring work, so thanks for that. I'm continuing now in my branch: feature/bulk_upload...mmd-osm:feature/bulk_upload and created #143 to merge my changes back into your branch. List of changes
Current statusHere's a screenshot taken in wireshark for the very first successful changeset upload via cgimap, and JOSM was also happy with the result I did some local edits using Potlatch, Potlatch2 and iD as well. For testing, I can highly recommend to apply a few changes to the lighttpd config file: https://github.com/mmd-osm/openstreetmap-cgimap/wiki/lighttpd-proxy It acts as a reverse proxy for both CGImap and the Rails port. All requests to http://localhost:31337 get automatically dispatched to correct server, even OAuth works flawlessly. ~~~Open topics~~~ --> solved
|
May 9, 2018
This was referenced
This comment has been minimized.
This comment has been minimized.
@tomhughes : at some point in the future we may want to deploy the changeset upload redesign on the dev instance to get some wider community exposure and identify bugs we couldn’t come up with in our own unit tests. Has cgimap ever been deployed on the dev instance before, alongside a working rails port installation, and some Apache / lighttpd to route requests to either one? How would you roughly assess the time and effort needed? TIA. |
This comment has been minimized.
This comment has been minimized.
No, that has never been done. It shouldn't be too hard, if we do it for all the dev apis. Doing it for just one will be much harder. |
This comment has been minimized.
This comment has been minimized.
@tomhughes : I have another short question on how to deploy cgimap on the dev instance. Some older commits in the chef repo indicate that the dev instance is also managed by chef and some PPA for cgimap would be needed for deployment. I don't seem to find any information describing how this should actually work. Do I have to go that route or would compiling the sources on the dev instance also be feasible? The new changeset upload doesn't introduce any new dependencies except for a C++11 compiler. So whatever worked in the past to deploy cgimap should basically still be valid. I have merged all 3 open PRs in this repo into https://github.com/mmd-osm/openstreetmap-cgimap/tree/demo - this could be used for testing. |
This comment has been minimized.
This comment has been minimized.
Ah I forgot cgimap was coming from a PPA now - it used to be build from source. I don't really want to reintroduce building from source, so a PPA is probably best but it will need to be separate to our main one or the package will need to have a different name. |
This comment has been minimized.
This comment has been minimized.
Hmmm. ok, this is going to be a bit of a challenge then, as I can't find the PPA packaging instructions for CGImap anywhere. The As the openstreetmap-cgimap - 0.6.0-1~xenial1 package installs both shared libs and the cgimap binary, it doesn't make too much sense recreating those settings from scratch. Edit: It looks like the relevant details are available here: http://ppa.launchpad.net/zerebubuth/openstreetmap-cgimap/ubuntu/pool/main/o/openstreetmap-cgimap/openstreetmap-cgimap_0.6.0-1~xenial1.debian.tar.xz I believe I'd still need some support from @zerebubuth on how to create a ppa which could be used for testing on the dev instance. |
This comment has been minimized.
This comment has been minimized.
Yup, apologies - that's not documented anywhere. I think I was so exhausted by the time I finally got it working that I didn't write it down anywhere. The distro-specific packaging is kept on distro-specific branches, of which there's currently The process of building is something like (apologies if this doesn't work - mostly from memory):
Based on whether the package is formatted and signed correctly, Launchpad may or may not send you an email. If you don't get an email and the package doesn't show up in the build queue, it's worth checking the |
This comment has been minimized.
This comment has been minimized.
Ooops, looks like I had a whole bunch of local changes too. Pushed those to |
This comment has been minimized.
This comment has been minimized.
Many thanks for your very detailed description of the build process. I totally agree it's really mind-boggling. As I'm working with a fork of your repo, I also tried to follow https://unix.stackexchange.com/questions/324680/how-to-apply-a-patch-in-a-debian-package - The idea here is to use your package as a baseline and apply my changes as patches to it. This however got rejected with:
After a bit of further research there seems to be yet another way:
(instead of using tar directly, some guides recommend I'm not totally happy with this process yet, as it skips the local build. That's a huge pain on its own as any issues will only show up on the launchpad build process log half an hour later. NB Later I found out that there's some way to run the build process locally via pbuilder detailspbuilder :: initial setup
pbuilder :: build binary package
After a few iterations, launchpad finally produced some binary files for a testing version named Build results
I added the xenial PPA to my local box, installed openstreetmap-cgimap and ran a few test cases, which seemed ok. For better visibility into failing test cases, I'd also propose the following change to debian/rules:
|
This comment has been minimized.
This comment has been minimized.
@tomhughes : now that we have PPAs for both xenial and bionic, I wonder what the next step would be. Open an issue on the operations tracker, maybe? |
This comment has been minimized.
This comment has been minimized.
We're likely to be extremely busy in the near future, and it's very much non-trivial to do, but open a ticket and I'll get to it when I can. |
This comment has been minimized.
This comment has been minimized.
Thanks, will do soon. Meanwhile, I also set up a mini test drive on http://static.64.206.46.78.clients.your-server.de:31337/user/mmd2 (= http://78.46.206.64:31337/user/mmd2) so people can already start playing around with the new upload without impacting the dev instance. To try it out create a new user, wait up to 5 min. for auto confirm process to run in background (no email confirm needed), and go ahead editing in Potlatch 2, iD or JOSM (see instructions on the user page on how to set up OAuth for JOSM). There's also a blog post out there now: https://www.openstreetmap.org/user/mmd/diary/44318 |
This comment has been minimized.
This comment has been minimized.
simonpoole
commented
Jul 5, 2018
•
@mmd-osm not getting very far with testing uploads right now (getting a 400 without any further messaging), maybe a short interactive session on irc would be the easiest to debug this and avoid spamming this issue (downloads work and OAuth handshake seemed to run smoothly too). |
This comment has been minimized.
This comment has been minimized.
@simonpoole : many thanks for your feedback. Could you somehow extract the osmChange message? Which editor did you use for testing? We could arrange an irc session maybe later tonight. |
This comment has been minimized.
This comment has been minimized.
simonpoole
commented
Jul 5, 2018
•
@mmd-osm The current diff upload mechanism in Vespucci uses OkHttp which uses Transfer-encoding chunked, could this be causing the issue? Capture from wireshark enclosed |
This comment has been minimized.
This comment has been minimized.
That's interesting, I guess you don't encounter this issue when testing against dev.openstreetmap.org? |
This comment has been minimized.
This comment has been minimized.
simonpoole
commented
Jul 5, 2018
•
Works ok vs dev instance etc.. It is more than slightly painful to get OkHttp to change its ways so I'm not sure if I can do A-B testing today, but I intend to asap. |
This comment has been minimized.
This comment has been minimized.
@simonpoole : I feel like chasing chunked transfer encoding issues on lighttpd might not be worth it, so I decided to set up an Apache instance on port 80 now, following the Chef cookbook. It serves as a reverse proxy for fcgi (like in the cookbook), but also forwards requests to the Rails port running locally on port 3000. Unfortunately I cannot follow all steps in the cookbook, as I'm not really familiar with Passenger and don't know how to set that up properly. At least for JOSM and iD this doesn't seem to impact the upload: http://static.64.206.46.78.clients.your-server.de/changeset/2000000054 Could you please give this alternative URL a try: http://static.64.206.46.78.clients.your-server.de I'm getting the following in the logs now (reponse= HTTP/500)
Looks like you've created your first successful changeset in Vespucci: http://static.64.206.46.78.clients.your-server.de/changeset/2000000055 |
This comment has been minimized.
This comment has been minimized.
simonpoole
commented
Jul 5, 2018
•
Works now, thanks. A simple diff upload completes now and seems to do the "right" thing..... now on to test if you meticulously copied all the error messages :-) (The error message you saw was because I was trying to convince OkHttp not to use chunking, which involves turning off gzipping and a lot of other stuff). |
This comment has been minimized.
This comment has been minimized.
simonpoole
commented
Jul 5, 2018
•
Sigh the celebrations were a bit premature, with my standard code it now fails in the actual diff upload with the error Http header 'Content-Length' missing (which is true :-)). So likely you need to handle the chunked transfer in whatever you are using for cgi-map as a server too. |
This comment has been minimized.
This comment has been minimized.
Pretty good that we're discovering this in early testing, I was expecting some 'i don't know what i don't know' issues, that's why I set up this demo. None of the other editors I tested uses chunked encoding. Right now CGImap is running as daemon ( There's a check for the Content-Length HTTP Header in my code, and if it's missing, it will return an error message. I think I need to take a closer look on how to deal with this. Somewhat relevant maybe: https://bz.apache.org/bugzilla/show_bug.cgi?id=53332, https://bz.apache.org/bugzilla/show_bug.cgi?id=57087 Does this stop you from testing other things now, or can you go back to the same mode as for cs 2000000055? |
This comment has been minimized.
This comment has been minimized.
simonpoole
commented
Jul 5, 2018
There's no problem in testing with the "working" code, but the problem still has to be resolved: on the one hand OkHttp is "fairly" popular (as in the likely the most used HTTP library globally), on the other hand supporting HTTP 1.1 doesn't exactly fall in the newfangled stuff category. |
This comment has been minimized.
This comment has been minimized.
Strictly speaking, CGImap doesn't even talk HTTP but FastCGI protocol and Apache and its modules do all the translation between HTTP/* and FastCGI. Now the question boils down to how chunked encoding is getting mapped onto the FastCGI protocol. I wonder if you could make your Vespucci APK file available for testing somehow. After installing the released version from Github, I figured out that I'm missing proper endpoint information for OAuth, presumably this has to be added to https://github.com/MarcusWolschon/osmeditor4android/blob/master/src/main/res/values/apis.xml. |
This comment has been minimized.
This comment has been minimized.
simonpoole
commented
Jul 6, 2018
•
@mmd-osm sure I can make an apk available for download, I assume this should be with chunked transfers (the OAuth key config is currently static and requires a recompile, which I've naturally done). APK https://drive.google.com/file/d/1aXcvPJ2DEDk43pqwoaPWE-mPLM7G96J3/view?usp=sharing Note this was built from code that is undergoing some refactoring wrt presets and still has a couple of remaining issues, so it shouldn't be used for anything else than testing. You will need to add an API entry (in the Advanced preferences) for the test server with the URL http://static.64.206.46.78.clients.your-server.de/api/0.6/ and then set that as the current API instance to use. |
This comment has been minimized.
This comment has been minimized.
@simonpoole : thanks a lot. I could install the apk now and already reproduce the "Content-length missing" error. |
This comment has been minimized.
This comment has been minimized.
@simonpoole : I have adjusted the logic to extract the POST payload a bit and could now successfully upload a change via Vespucci: http://static.64.206.46.78.clients.your-server.de/changeset/2000000073
|
This comment has been minimized.
This comment has been minimized.
@zerebubuth : now that GSoC is over (and we're still on cgimap), I'm wondering a bit what might be the best way to move forward. Deploying cgimap on dev is already triggered in a separate operations issue, so that more people will have a chance for the second round of tests. Then there's a quite large pull request, including lots test cases for a good overall code coverage. Do you still plan to review this code in some way, or should we just merge it? I think it would be incredibly useful to have some more feedback on the code, although this will for sure be quite time consuming to go through. Thanks! |
This comment has been minimized.
This comment has been minimized.
Thanks to @tomhughes tireless work on the Chef repo, the changeset upload is now ready for general testing on the dev instance: https://upload.apis.dev.openstreetmap.org Settings for JOSM It supports both Basic auth and OAuth, as well as uncompressed and compressed messages for osmChange upload. Examples: https://upload.apis.dev.openstreetmap.org/user/mmd2/history Patch for JOSM to test compressed uploadIndex: src/org/openstreetmap/josm/io/OsmApi.java
===================================================================
--- src/org/openstreetmap/josm/io/OsmApi.java (Revision 14258)
+++ src/org/openstreetmap/josm/io/OsmApi.java (Arbeitskopie)
@@ -4,6 +4,9 @@
import static org.openstreetmap.josm.tools.I18n.tr;
import static org.openstreetmap.josm.tools.I18n.trn;
+import java.io.BufferedReader;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringReader;
@@ -21,6 +24,8 @@
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;
+import java.util.zip.GZIPInputStream;
+import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.ParserConfigurationException;
@@ -642,6 +647,17 @@
return sendRequest(requestMethod, urlSuffix, requestBody, monitor, true, false);
}
+
+ public static byte[] compress(String data) throws IOException {
+ ByteArrayOutputStream bos = new ByteArrayOutputStream(data.length());
+ GZIPOutputStream gzip = new GZIPOutputStream(bos);
+ gzip.write(data.getBytes(StandardCharsets.UTF_8));
+ gzip.close();
+ byte[] compressed = bos.toByteArray();
+ bos.close();
+ return compressed;
+ }
+
/**
* Generic method for sending requests to the OSM API.
*
@@ -682,7 +698,10 @@
}
if ("PUT".equals(requestMethod) || "POST".equals(requestMethod) || "DELETE".equals(requestMethod)) {
+ System.err.println(urlSuffix);
client.setHeader("Content-Type", "text/xml");
+ if (urlSuffix.endsWith("/upload"))
+ client.setHeader("Content-Encoding", "gzip"); // TEST ONLY
// It seems that certain bits of the Ruby API are very unhappy upon
// receipt of a PUT/POST message without a Content-length header,
// even if the request has no payload.
@@ -689,7 +708,14 @@
// Since Java will not generate a Content-length header unless
// we use the output stream, we create an output stream for PUT/POST
// even if there is no payload.
- client.setRequestBody((requestBody != null ? requestBody : "").getBytes(StandardCharsets.UTF_8));
+
+ if (urlSuffix.endsWith("/upload")) {
+
+ byte[] compressedRequestBody = compress(requestBody != null ? requestBody : ""); // TEST ONLY
+ client.setRequestBody(compressedRequestBody); // TEST ONLY
+ }
+ else
+ client.setRequestBody((requestBody != null ? requestBody : "").getBytes(StandardCharsets.UTF_8));
}
final HttpClient.Response response = client.connect();
|
zerebubuth commentedDec 26, 2017
Now that we have support for OAuth, we can add support for diff (i.e:
osmChange
) uploads. It seems like there's a considerable improvement to be made by batching queries to the database.Considerations: