New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for diff uploads #140

Closed
zerebubuth opened this Issue Dec 26, 2017 · 46 comments

Comments

Projects
None yet
6 participants
@zerebubuth
Copy link
Owner

zerebubuth commented Dec 26, 2017

Now that we have support for OAuth, we can add support for diff (i.e: osmChange) uploads. It seems like there's a considerable improvement to be made by batching queries to the database.

Considerations:

  • This will be cgimap's first write operation, so we should make sure we've got excellent test coverage and the code has been thoroughly reviewed to avoid corrupting the database.
  • Although cgimap supports OAuth for viewing redacted versions of historical elements, it doesn't support HTTP Basic Auth. Only moderators use the redacted-history API, and they can probably live without HTTP Basic Auth. But there might be other people using Basic for the regular user API. Do we want to support that too? Or work with something like cgimap-ruby to allow the auth to be handled at a higher level?
@pnorman

This comment has been minimized.

Copy link
Contributor

pnorman commented Dec 30, 2017

Only moderators use the redacted-history API, and they can probably live without HTTP Basic Auth.

I use HTTP Basic Auth over HTTPS most of the time when I need to use the redacted history API.

But there might be other people using Basic for the regular user API. Do we want to support that too?

I expect there's a fair number of editors using that. Some even over HTTP.

Right now, for feature parity, we'd need to either keep HTTP Basic requests going to the rails port or support them in cgimap.

I wouldn't be opposed to dropping basic auth support, and would support dropping basic auth without HTTPS, but this issue tracker isn't the right place to make that policy change.

@mmd-osm

This comment has been minimized.

Copy link
Collaborator

mmd-osm commented Jan 2, 2018

A quick update from my side. I completed a first proof of concept implementation, which covers creating/modifying/deleting nodes/ways/relations, including tags, way nodes and relation members. Also, copying from current_* tables to history tables is part of the PoC, as well as an update of the number of changes in the changeset.

One of my test cases with 4341 nodes and some 180'000 tags in total that took around 15-20 minutes on rails now finishes in just 5 seconds. The code already survives a JOSM editing session as a drop in replacement for rails code.

peek 2018-01-05 18-47

I see a number of follow up activities (in no particular order):


Review code

  • Changes to locking concept

  • Check object timestamps - due to mass updates, many objects share the same timestamp. Check that this is ok for downstream apps (it shouldn't really cause issues)

  • Prepared statements: Double check escaping for tag keys, values and rel member roles


Actions

  • Adding proper validations, error messages and http::* exceptions everywhere

  • Prepare cgimap handler for http post calls

  • Fix Updating Changeset Bounding box for relations

  • Check if user is blocked

SELECT  "user_blocks".* FROM "user_blocks" WHERE "user_blocks"."user_id" = $1 AND (needs_view or ends_at > (now() at time zone 'utc')) LIMIT $2
  • XML Parser operator() - chunking is absolutely necessary to avoid XML_MAX_LOOKUP_LIMIT due to excessive memory consumption.

  • XML Parser: additional NULL check missing

  • Deleting objects, which are still referenced by another object, and if-used flags is set -> should be reflected correctly in diffResult.

  • Adding testcases
    Examples:

    1. openstreetmap/openstreetmap-website#1465 (comment) & openstreetmap/iD#3208 (comment) → already works with the current code as-is.

Nice to have:

  • Review every relevant exception raised in current Rails code, and check how it applies here
  • Extra care with creating/modifying/deleting relations: processing may not be split up in multiple parts due to possible dependencies of master/child relations.

Open issues: https://github.com/mmd-osm/diffupload/issues

@mmd-osm

This comment has been minimized.

Copy link
Collaborator

mmd-osm commented Jan 17, 2018

@pnorman : as GSoC 2018 is coming up soon, and the API topic is on the proposed projects page again, I think it would make sense to review the "Full cgimap write support" proposal, given that we probably want to incorporate the upload redesign in this issue in some way.

My other question would be if you (and possibly @Komzpa) could spend some time mentoring the upload redesign even outside of some GSoC framework, i.e. by reviewing code, bringing in ideas for improvement, providing guidance on how to add POST support to cgimap, which we could then route to my implementation.

There's still lots of work to be done to get test cases up and running, and any support here would be highly welcomed as well.

@mmd-osm

This comment has been minimized.

Copy link
Collaborator

mmd-osm commented Jan 27, 2018

@zerebubuth : I started integrating my coding into cgimap now, please see https://github.com/mmd-osm/openstreetmap-cgimap/tree/feature/bulk_upload

The biggest challenge at the moment is to include https://github.com/mmd-osm/diffupload/blob/master/src/diffuploader.cpp - as there's really no framework available in cgimap yet for HTTP POST request semantics. It doesn't really fit into the existing concept of a data_selection, and I would need the OsmChange message payload available (preferably as a file), along with the current UID and changeset number, and of course a database connection.

The router needs some concept to distinguish HTTP GET / HEAD / OPTIONS / POST, and restrict the path to certain request methods.

@Komzpa

This comment has been minimized.

Copy link

Komzpa commented Feb 4, 2018

@mmd-osm it would be cool if you can get in touch with @hesidoryn who is sketching golang version in https://github.com/hesidoryn/fastmap-wrapper - it may be much easier to port it all to golang as is, since it already has rich HTTP API framework embedded.

@tomhughes

This comment has been minimized.

Copy link
Contributor

tomhughes commented Feb 4, 2018

Yes clearly this is the time to start over in a different language. Can we like finish this rewrite before starting on a new one please?

@Komzpa

This comment has been minimized.

Copy link

Komzpa commented Feb 4, 2018

@thughes given current cgimap incarnation suffers from the same kind of architectural issues as railsport, it does not really matter which technology to use to start over as you'll have to start from scratch anyway.
There is no "finish" either. :)

@tomhughes

This comment has been minimized.

Copy link
Contributor

tomhughes commented Feb 4, 2018

What "architectural issues" are you talking about? The whole point of cgimap is to avoid the architectural problems of using rails, and as far as I'm aware it does so. That's not so say it can't be improved of course but the basic architecture is good.

Of course there is a "finish" as there are a finite number of API methods to implement and once they are all done the port away from rails is finished. Obviously more API methods might be added in future but that's the same regardless of technology.

The point is that given limited engineering resource it's insane to keep forking it rather than concentrating on getting one thing done. Unfortunately it's more exciting to start your own new thing in the latest fancy language than to contribute to an existing project.

@Komzpa

This comment has been minimized.

Copy link

Komzpa commented Feb 4, 2018

@tomhughes I'm talking about issues described in detail in openstreetmap/operations#135, that currently most of time of the api calls is lost in database roundtrips, on etiher rails or cgimap APIs. This behavior is dictated by "let's issue only selects without joins and sorts" concept in cgimap and the whole object model layout. There just isn't an idea that you may get all data you need (from all the tables you need) in a single database query in cgimap.

There is also no "engineering resource" at all. You're not paying anybody here to be in position to dehumanize them and call them "resource". People are volunteers and may do whatever they want. Only result you get from telling someone "stop doing it" is them possibly stop doing it, not switching the thing you may want them to.

If you think I shouldn't be getting people working on the similar topic in touch @tomhughes you should really reconsider what you're doing here.

@zerebubuth

This comment has been minimized.

Copy link
Owner Author

zerebubuth commented Feb 4, 2018

currently most of time of the api calls is lost in database roundtrips, on etiher rails or cgimap APIs.

That's undoubtedly true of the diff upload API call, and possibly several others. I hope it's no longer true of anything that goes through cgimap, as we had some awesome contributions from @jronak in #130 and #136 which meant cgimap could make a single database call for extracting nodes, one for extracting ways and another for extracting relations.

No doubt it would be possible to get that down to a single call, but the extra saving of a handful of round-trips didn't seem to be worth the extra effort. According to #122 (comment), each round-trip adds about 0.2ms, so a few 10s of round-trips isn't too bad. We had issues when cgimap was making 1000s of round-trips for individual objects.

This behavior is dictated by "let's issue only selects without joins and sorts" concept in cgimap and the whole object model layout.

It's not an explicit goal of cgimap to only issue selects without joins or sorts. In fact, cgimap now does many joins, some of which require sorting. I've tried to avoid sorting on the whole query because I'm worried about the extra load on the database. I'd love to be proved wrong about that, and we could make a few simple changes to enable sorted output.

People are volunteers and may do whatever they want.

I agree totally, and I hope to see some awesome stuff from fastmap-wrapper in the future. Having multiple projects means we can experiment with better ideas. If I were designing cgimap from scratch today then it would look totally different, and there's a number of mistakes in the design which have either required a lot of effort fix, or continue to require effort to work with.

For example, the backend "independence" isn't something cgimap really uses, and leads to some ugly code to try and continue supporting.

The data selection interface was intended to abstract over the data model, allowing handlers to be written in a style which concentrated on what was being queried, rather than how - ideally I wanted something which was close to a graph query like Gremlin, but unfortunately it didn't really work out that way.

For sure, there's a lot of code in cgimap which ought to be provided by some HTTP standard library, as it would be in Go and any other sensible modern language for writing web services! Unfortunately, the origins of cgimap date back to 2009, so it carries some baggage because of assumptions which were true but are no longer, and the languages and libraries which were available at the time.

@mmd-osm: Thanks so much for the work that you've been doing. I'm sorry I haven't had a chance to review it yet, but I'll try to get to that as soon as I can.

@zerebubuth

This comment has been minimized.

Copy link
Owner Author

zerebubuth commented Feb 5, 2018

@mmd-osm I had a quick look through and it looks good so far, thanks! Your code style is a little different from the repo style, but that's not worth worrying about now. The biggest issues seem to be plugging it into the request flow (i.e: not demo()), and the lack of tests.

I've started trying to add code so we can support POST requests. Will PR that into your branch as soon as it's ready.

@mmd-osm

This comment has been minimized.

Copy link
Collaborator

mmd-osm commented Apr 23, 2018

Yes clearly this is the time to start over in a different language. Can we like finish this rewrite before starting on a new one please?

Now that the OSM API rewrite in golang is an official GSoC 2018 project, the question about priorities is getting a bit more interesting now.

@tomhughes

This comment has been minimized.

Copy link
Contributor

tomhughes commented Apr 23, 2018

WTF

@mmd-osm

This comment has been minimized.

@tomhughes

This comment has been minimized.

Copy link
Contributor

tomhughes commented Apr 23, 2018

No mentor? But seriously, who was responsible for deciding that without any consultation?

@zerebubuth

This comment has been minimized.

Copy link
Owner Author

zerebubuth commented Apr 24, 2018

This is a cgimap issue thread. It's not an appropriate place to be discussing the approval process for GSoC applications.

While on a technical level I feel it's a shame to be duplicating work, it's clear that there's no great enthusiasm for the current cgimap implementation (even from me - my apologies for all the outstanding stuff blocked on me). If a new implementation is able to attract more sustained effort and maintenance, then that's a good thing for OSM. Even if the motivation for a new implementation is language chauvinism or other non-technical reasons.

@mmd-osm

This comment has been minimized.

Copy link
Collaborator

mmd-osm commented Apr 25, 2018

If I understood the application right, the scope of the GSoc project will cover the whole of OSM API 0.6, not just the parts that are currently implemented in cgimap. I don't want to go into the architectural implications on the Rails port (or projects like cgimap-ruby), but I feel that the scope is pretty huge for a 3 months project (which starts mid of May 2018 btw).

Which brings me back to this issue: I think it's pretty clear that porting the current Rails based diff upload with its many single inserts and selects is nothing we want to advocate for a future golang based implementation. On the other hand redesigning this thing took me several weeks to figure out how the Rails port works, and then redesign and implement it as bulk insert. The results are pretty amazing with more than factor 100 speedup compared to the current Rails based implementation.

I couldn't really figure out how to integrate the standalone piece into cgimap, that's where I heavily depend on @zerebubuth's insights. (almost done, see below)... I already have a number of test cases waiting here to be integrated, which I extracted from the current Rails port. It's just a matter of @zerebubuth setting up the environment to have 1 test case and I could continue with the rest, or we could find even more people working on the test cases.

My motivation is pretty clear: instead of making the job for the student project even harder, let's provide a rock solid basis for a C++ --> Golang port (including test cases) and try to do as much of the heavy lifting upfront as possible. I'm sure this will be a clear win win.

Let's jump right into it!

@pnorman

This comment has been minimized.

Copy link
Contributor

pnorman commented Apr 30, 2018

If I understood the application right, the scope of the GSoc project will cover the whole of OSM API 0.6, not just the parts that are currently implemented in cgimap. I don't want to go into the architectural implications on the Rails port (or projects like cgimap-ruby), but I feel that the scope is pretty huge for a 3 months project (which starts mid of May 2018 btw).

No, it's just what's implemented in cgimap. If there is time new stuff might get worked on, but the priority is the ability to switch the deployment from cgimap to the go port.

@mmd-osm

This comment has been minimized.

Copy link
Collaborator

mmd-osm commented May 5, 2018

@zerebubuth : I think we've already achieved some extremely good progress with your refactoring work, so thanks for that. I'm continuing now in my branch: feature/bulk_upload...mmd-osm:feature/bulk_upload and created #143 to merge my changes back into your branch.

List of changes

  • Moved temporary table creation to changeset_upload_responder for the time being
  • Moved most parts of the "demo" code to changeset_upload_responder
  • ApiDB_Node_Updater : Node_Updater inheritance also applied to Ways, Relations and Changesets
  • XML parser: added chunk size, fixed bug with empty attributes
  • xml_formatter: added two methods for diffResults
  • Changed inheritance to new osm_diffresult_responder class, as osmchange_responder didn't fit semantically (the document returned is a diffResult, not an osmChange document). osm_diffresult_responder also writes out a diffResult message now.
  • Extract payload from HTTP POST request, respecting hard coded 50MB size limit
  • Added payload_enabled_responder, which inherits from responder. This is used to propagate payload and user id to our changeset upload responder. Design wise this is not at all beautiful, I'm open for suggestions here.

Current status

Here's a screenshot taken in wireshark for the very first successful changeset upload via cgimap, and JOSM was also happy with the result 💯 🥇

bildschirmfoto vom 2018-05-06 20-52-13

I did some local edits using Potlatch, Potlatch2 and iD as well.

For testing, I can highly recommend to apply a few changes to the lighttpd config file: https://github.com/mmd-osm/openstreetmap-cgimap/wiki/lighttpd-proxy

It acts as a reverse proxy for both CGImap and the Rails port. All requests to http://localhost:31337 get automatically dispatched to correct server, even OAuth works flawlessly.

~~~Open topics~~~ --> solved

  • changeset_upload depends on Transaction_Manager, but it's not clear, how it should get this from the framework. That's also the question if we want a separate DB connection, like in the case of OAuth. writeable_pgsql_selection is set to read-only mode after temporary tables have been created, so this doesn't seem like a perfect fit.

  • changeset_upload_responder still uses ApiDB_*_Updater instance, which doesn't make a whole lot of sense in 0.6 code.
    For the data selection, this is already dealt with in main.cpp with by a factory class boost::shared_ptr<data_selection::factory> factory = create_backend(options);.
    It's not clear to me, if we want something like a factory for database updates, or more specifically for changeset uploads, or something entirely different.

@mmd-osm

This comment has been minimized.

Copy link
Collaborator

mmd-osm commented Jun 3, 2018

@tomhughes : at some point in the future we may want to deploy the changeset upload redesign on the dev instance to get some wider community exposure and identify bugs we couldn’t come up with in our own unit tests. Has cgimap ever been deployed on the dev instance before, alongside a working rails port installation, and some Apache / lighttpd to route requests to either one? How would you roughly assess the time and effort needed? TIA.

@tomhughes

This comment has been minimized.

Copy link
Contributor

tomhughes commented Jun 3, 2018

No, that has never been done.

It shouldn't be too hard, if we do it for all the dev apis. Doing it for just one will be much harder.

@mmd-osm

This comment has been minimized.

Copy link
Collaborator

mmd-osm commented Jun 9, 2018

@tomhughes : I have another short question on how to deploy cgimap on the dev instance.

Some older commits in the chef repo indicate that the dev instance is also managed by chef and some PPA for cgimap would be needed for deployment. I don't seem to find any information describing how this should actually work. Do I have to go that route or would compiling the sources on the dev instance also be feasible?

The new changeset upload doesn't introduce any new dependencies except for a C++11 compiler. So whatever worked in the past to deploy cgimap should basically still be valid.

I have merged all 3 open PRs in this repo into https://github.com/mmd-osm/openstreetmap-cgimap/tree/demo - this could be used for testing.

@tomhughes

This comment has been minimized.

Copy link
Contributor

tomhughes commented Jun 9, 2018

Ah I forgot cgimap was coming from a PPA now - it used to be build from source.

I don't really want to reintroduce building from source, so a PPA is probably best but it will need to be separate to our main one or the package will need to have a different name.

@mmd-osm

This comment has been minimized.

Copy link
Collaborator

mmd-osm commented Jun 9, 2018

Hmmm. ok, this is going to be a bit of a challenge then, as I can't find the PPA packaging instructions for CGImap anywhere. The /debian directory which is usually the place for this sort of information isn't part of this repo - (like in the following example: https://github.com/valhalla/packaging).

As the openstreetmap-cgimap - 0.6.0-1~xenial1 package installs both shared libs and the cgimap binary, it doesn't make too much sense recreating those settings from scratch.

Edit: It looks like the relevant details are available here: http://ppa.launchpad.net/zerebubuth/openstreetmap-cgimap/ubuntu/pool/main/o/openstreetmap-cgimap/openstreetmap-cgimap_0.6.0-1~xenial1.debian.tar.xz

I believe I'd still need some support from @zerebubuth on how to create a ppa which could be used for testing on the dev instance.

@zerebubuth

This comment has been minimized.

Copy link
Owner Author

zerebubuth commented Jun 11, 2018

Yup, apologies - that's not documented anywhere. I think I was so exhausted by the time I finally got it working that I didn't write it down anywhere.

The distro-specific packaging is kept on distro-specific branches, of which there's currently ubuntu/wily and ubuntu/xenial. I think we can probably ignore the ubuntu/wily one now. I used git-buildpackage to manage the build process, as that seemed to be the least awful way of doing it.

The process of building is something like (apologies if this doesn't work - mostly from memory):

  1. Tag release version on master.
  2. Merge release version into ubuntu/xenial.
  3. Update debian/changelog in the branch to have a release message. I haven't been particularly good about making these very meaningful, mostly out of frustration with having to keep trying again and again before it works. I don't understand the version numbering system (-1~ubuntu1) and sometimes I have to fiddle with it to get a usable build. Worth pointing out that the formatting of that file is very sensitive and brittle. If you get it wrong, then either the Debian build tools or Launchpad will reject the package without a useful error message.
  4. Commit the changes.
  5. mkdir -p ../build-area
  6. DEBSIGN_KEYID=XXXXXXXX DEB_SIGN_KEYID=XXXXXXXX DEBSIGN_MAINT="Your Name <you@example.com>" gbp buildpackage --git-tag --git-export-dir=../build-area --git-export=ubuntu/xenial --git-debian-branch=ubuntu/xenial (If you don't have gbp then sudo apt install git-buildpackage dh-autoreconf). Note that the package version is taken from debian/changelog, so unless you change and commit that file, you might find gbp does nothing or errors out. Use whichever key you've uploaded to Launchpad. I had real trouble getting it to use the right key, which is why I've probably overspecced the environment variables.
  7. cd ../build-area and use dput to upload the .changes file.

Based on whether the package is formatted and signed correctly, Launchpad may or may not send you an email. If you don't get an email and the package doesn't show up in the build queue, it's worth checking the debian/changelog file.

@zerebubuth

This comment has been minimized.

Copy link
Owner Author

zerebubuth commented Jun 11, 2018

Ooops, looks like I had a whole bunch of local changes too. Pushed those to ubuntu/xenial now, so it should be up-to-date with master.

@mmd-osm

This comment has been minimized.

Copy link
Collaborator

mmd-osm commented Jun 15, 2018

Many thanks for your very detailed description of the build process. I totally agree it's really mind-boggling.

As I'm working with a fork of your repo, gbp buildpackage always wanted to reference your upstream repo and then bailed out with gbp:error: upstream/0.6.1 is not a valid treeish. Still trying to figure out a sensible way on how to use gbp buildpackage in this scenario.

I also tried to follow https://unix.stackexchange.com/questions/324680/how-to-apply-a-patch-in-a-debian-package - The idea here is to use your package as a baseline and apply my changes as patches to it. This however got rejected with:

Unable to find openstreetmap-cgimap_0.6.0.orig.tar.gz in upload or distribution.
Files specified in DSC are broken or missing, skipping package unpack verification.
Source/binary (i.e. mixed) uploads are not allowed.

After a bit of further research there seems to be yet another way:

tar cvfj openstreetmap-cgimap_0.7.0.orig.tar.bz2 --exclude .git openstreetmap-cgimap
debuild -kXXX_my_signing_keyXXX -S -sa
dput -f ppa:mmd-osm/cgimap-test-002 openstreetmap-cgimap_0.6.1-1~xenial1_source.changes

(instead of using tar directly, some guides recommend git archive --format=tar.gz HEAD > ../openstreetmap-cgimap_0.6.1.orig.tar.gz instead, optionally suppressing files via .gitattributes, see http://www.lpenz.org/articles/debgit/index.html)

I'm not totally happy with this process yet, as it skips the local build. That's a huge pain on its own as any issues will only show up on the launchpad build process log half an hour later.

NB Later I found out that there's some way to run the build process locally via pbuilder. Using gbp buildpackage right from the start seems much more convenient, though.

pbuilder details

pbuilder :: initial setup

sudo pbuilder --create --distribution xenial --architecture amd64 --basetgz /var/cache/pbuilder/xenial-amd64-base.tgz

sudo pbuilder update --distribution xenial --architecture amd64 --basetgz /var/cache/pbuilder/xenial-amd64-base.tgz  --othermirror "deb http://archive.ubuntu.com/ubuntu xenial main restricted universe multiverse" --override-config

pbuilder :: build binary package

sudo pbuilder --build --distribution xenial --architecture amd64 --basetgz /var/cache/pbuilder/xenial-amd64-base.tgz openstreetmap-cgimap_0.6.1-1~xenial1.dsc

After a few iterations, launchpad finally produced some binary files for a testing version named openstreetmap-cgimap 0.6.1-1~xenial1 :

Build results

I added the xenial PPA to my local box, installed openstreetmap-cgimap and ran a few test cases, which seemed ok.

For better visibility into failing test cases, I'd also propose the following change to debian/rules:

override_dh_auto_test:
        VERBOSE=1 dh_auto_test
@mmd-osm

This comment has been minimized.

Copy link
Collaborator

mmd-osm commented Jun 21, 2018

@tomhughes : now that we have PPAs for both xenial and bionic, I wonder what the next step would be. Open an issue on the operations tracker, maybe?

@tomhughes

This comment has been minimized.

Copy link
Contributor

tomhughes commented Jun 21, 2018

We're likely to be extremely busy in the near future, and it's very much non-trivial to do, but open a ticket and I'll get to it when I can.

@mmd-osm

This comment has been minimized.

Copy link
Collaborator

mmd-osm commented Jul 3, 2018

open a ticket and I'll get to it when I can.

Thanks, will do soon.

Meanwhile, I also set up a mini test drive on http://static.64.206.46.78.clients.your-server.de:31337/user/mmd2 (= http://78.46.206.64:31337/user/mmd2) so people can already start playing around with the new upload without impacting the dev instance.

To try it out create a new user, wait up to 5 min. for auto confirm process to run in background (no email confirm needed), and go ahead editing in Potlatch 2, iD or JOSM (see instructions on the user page on how to set up OAuth for JOSM).

There's also a blog post out there now: https://www.openstreetmap.org/user/mmd/diary/44318

@simonpoole

This comment has been minimized.

Copy link

simonpoole commented Jul 5, 2018

@mmd-osm not getting very far with testing uploads right now (getting a 400 without any further messaging), maybe a short interactive session on irc would be the easiest to debug this and avoid spamming this issue (downloads work and OAuth handshake seemed to run smoothly too).

@mmd-osm

This comment has been minimized.

Copy link
Collaborator

mmd-osm commented Jul 5, 2018

@simonpoole : many thanks for your feedback. Could you somehow extract the osmChange message? Which editor did you use for testing? We could arrange an irc session maybe later tonight.

@simonpoole

This comment has been minimized.

Copy link

simonpoole commented Jul 5, 2018

@mmd-osm hmm seems to be some emulator induced brokenness that I've seen before, so likely false alarm for now.

The current diff upload mechanism in Vespucci uses OkHttp which uses Transfer-encoding chunked, could this be causing the issue?

Capture from wireshark enclosed
borkedupload.txt

@mmd-osm

This comment has been minimized.

Copy link
Collaborator

mmd-osm commented Jul 5, 2018

That's interesting, PUT /api/0.6/changeset/create is still being forwarded to the Rails port by lighttpd (which acts as a reverse proxy), and is not part of the changeset upload reimplementation. So this could be some kind of some lighttpd misconfiguration. A quick search yields open issues like https://redmine.lighttpd.net/issues/2156 - will check this later today in more detail.

I guess you don't encounter this issue when testing against dev.openstreetmap.org?

@simonpoole

This comment has been minimized.

Copy link

simonpoole commented Jul 5, 2018

Works ok vs dev instance etc..

It is more than slightly painful to get OkHttp to change its ways so I'm not sure if I can do A-B testing today, but I intend to asap.

@mmd-osm

This comment has been minimized.

Copy link
Collaborator

mmd-osm commented Jul 5, 2018

@simonpoole : I feel like chasing chunked transfer encoding issues on lighttpd might not be worth it, so I decided to set up an Apache instance on port 80 now, following the Chef cookbook. It serves as a reverse proxy for fcgi (like in the cookbook), but also forwards requests to the Rails port running locally on port 3000.

Unfortunately I cannot follow all steps in the cookbook, as I'm not really familiar with Passenger and don't know how to set that up properly. At least for JOSM and iD this doesn't seem to impact the upload: http://static.64.206.46.78.clients.your-server.de/changeset/2000000054

Could you please give this alternative URL a try: http://static.64.206.46.78.clients.your-server.de

I'm getting the following in the logs now (reponse= HTTP/500)

Started PUT "/api/0.6/changeset/create" for xxxx at 2018-07-05 20:07:44 +0200
  
Zlib::GzipFile::Error (not in gzip format):
  
config/initializers/compressed_requests.rb:35:in `initialize'
config/initializers/compressed_requests.rb:35:in `new'
config/initializers/compressed_requests.rb:35:in `decode'
config/initializers/compressed_requests.rb:17:in `call'

Looks like you've created your first successful changeset in Vespucci: http://static.64.206.46.78.clients.your-server.de/changeset/2000000055 🏆

@simonpoole

This comment has been minimized.

Copy link

simonpoole commented Jul 5, 2018

Works now, thanks. A simple diff upload completes now and seems to do the "right" thing.....

now on to test if you meticulously copied all the error messages :-)

(The error message you saw was because I was trying to convince OkHttp not to use chunking, which involves turning off gzipping and a lot of other stuff).

@simonpoole

This comment has been minimized.

Copy link

simonpoole commented Jul 5, 2018

Sigh the celebrations were a bit premature, with my standard code it now fails in the actual diff upload with the error Http header 'Content-Length' missing (which is true :-)).
borkedupload2.txt

So likely you need to handle the chunked transfer in whatever you are using for cgi-map as a server too.

@mmd-osm

This comment has been minimized.

Copy link
Collaborator

mmd-osm commented Jul 5, 2018

it now fails in the actual diff upload with the error Http header 'Content-Length' missing

Pretty good that we're discovering this in early testing, I was expecting some 'i don't know what i don't know' issues, that's why I set up this demo. None of the other editors I tested uses chunked encoding.

Right now CGImap is running as daemon (/usr/bin/openstreetmap-cgimap --dbname openstreetmap --username osm --socket 127.0.0.1:8000 --logfile log2 --daemon) and Apache simply points to port 8000 like in the cookbook.

There's a check for the Content-Length HTTP Header in my code, and if it's missing, it will return an error message. I think I need to take a closer look on how to deal with this. Somewhat relevant maybe: https://bz.apache.org/bugzilla/show_bug.cgi?id=53332, https://bz.apache.org/bugzilla/show_bug.cgi?id=57087

Does this stop you from testing other things now, or can you go back to the same mode as for cs 2000000055?

@simonpoole

This comment has been minimized.

Copy link

simonpoole commented Jul 5, 2018

There's no problem in testing with the "working" code, but the problem still has to be resolved: on the one hand OkHttp is "fairly" popular (as in the likely the most used HTTP library globally), on the other hand supporting HTTP 1.1 doesn't exactly fall in the newfangled stuff category.

@mmd-osm

This comment has been minimized.

Copy link
Collaborator

mmd-osm commented Jul 6, 2018

Strictly speaking, CGImap doesn't even talk HTTP but FastCGI protocol and Apache and its modules do all the translation between HTTP/* and FastCGI. Now the question boils down to how chunked encoding is getting mapped onto the FastCGI protocol.

I wonder if you could make your Vespucci APK file available for testing somehow. After installing the released version from Github, I figured out that I'm missing proper endpoint information for OAuth, presumably this has to be added to https://github.com/MarcusWolschon/osmeditor4android/blob/master/src/main/res/values/apis.xml.

@simonpoole

This comment has been minimized.

Copy link

simonpoole commented Jul 6, 2018

@mmd-osm sure I can make an apk available for download, I assume this should be with chunked transfers (the OAuth key config is currently static and requires a recompile, which I've naturally done).

APK https://drive.google.com/file/d/1aXcvPJ2DEDk43pqwoaPWE-mPLM7G96J3/view?usp=sharing

Note this was built from code that is undergoing some refactoring wrt presets and still has a couple of remaining issues, so it shouldn't be used for anything else than testing. You will need to add an API entry (in the Advanced preferences) for the test server with the URL http://static.64.206.46.78.clients.your-server.de/api/0.6/ and then set that as the current API instance to use.

@mmd-osm

This comment has been minimized.

Copy link
Collaborator

mmd-osm commented Jul 6, 2018

@simonpoole : thanks a lot. I could install the apk now and already reproduce the "Content-length missing" error.

@mmd-osm

This comment has been minimized.

Copy link
Collaborator

mmd-osm commented Jul 6, 2018

@simonpoole : I have adjusted the logic to extract the POST payload a bit and could now successfully upload a change via Vespucci: http://static.64.206.46.78.clients.your-server.de/changeset/2000000073

At this time, this fix is only available on a local build running on port 80. I will update the PPA once there are no further issues with this change. PPA is updated now: https://launchpad.net/~mmd-osm/+archive/ubuntu/cgimap-test-003

@mmd-osm

This comment has been minimized.

Copy link
Collaborator

mmd-osm commented Aug 23, 2018

@zerebubuth : now that GSoC is over (and we're still on cgimap), I'm wondering a bit what might be the best way to move forward. Deploying cgimap on dev is already triggered in a separate operations issue, so that more people will have a chance for the second round of tests.

Then there's a quite large pull request, including lots test cases for a good overall code coverage. Do you still plan to review this code in some way, or should we just merge it? I think it would be incredibly useful to have some more feedback on the code, although this will for sure be quite time consuming to go through. Thanks!

@mmd-osm

This comment has been minimized.

Copy link
Collaborator

mmd-osm commented Sep 17, 2018

Thanks to @tomhughes tireless work on the Chef repo, the changeset upload is now ready for general testing on the dev instance: https://upload.apis.dev.openstreetmap.org 🥇

Settings for JOSM
API endpoint: https://upload.apis.dev.openstreetmap.org/api
Access token key: YaTwrUFPGMlskcqc0eLmyVUxHpGabM9UP94VjcNG
Access token secret: KzpJY8IyPfI3tl9YgWdfxSmdlasSUZQIn5BJVjPI

It supports both Basic auth and OAuth, as well as uncompressed and compressed messages for osmChange upload.

Examples: https://upload.apis.dev.openstreetmap.org/user/mmd2/history

Patch for JOSM to test compressed upload
Index: src/org/openstreetmap/josm/io/OsmApi.java
===================================================================
--- src/org/openstreetmap/josm/io/OsmApi.java	(Revision 14258)
+++ src/org/openstreetmap/josm/io/OsmApi.java	(Arbeitskopie)
@@ -4,6 +4,9 @@
 import static org.openstreetmap.josm.tools.I18n.tr;
 import static org.openstreetmap.josm.tools.I18n.trn;
 
+import java.io.BufferedReader;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
 import java.io.IOException;
 import java.io.PrintWriter;
 import java.io.StringReader;
@@ -21,6 +24,8 @@
 import java.util.Map;
 import java.util.function.Consumer;
 import java.util.function.Function;
+import java.util.zip.GZIPInputStream;
+import java.util.zip.GZIPOutputStream;
 
 import javax.xml.parsers.ParserConfigurationException;
 
@@ -642,6 +647,17 @@
         return sendRequest(requestMethod, urlSuffix, requestBody, monitor, true, false);
     }
 
+
+    public static byte[] compress(String data) throws IOException {
+	ByteArrayOutputStream bos = new ByteArrayOutputStream(data.length());
+	GZIPOutputStream gzip = new GZIPOutputStream(bos);
+	gzip.write(data.getBytes(StandardCharsets.UTF_8));
+	gzip.close();
+	byte[] compressed = bos.toByteArray();
+	bos.close();
+	return compressed;
+    }
+
     /**
      * Generic method for sending requests to the OSM API.
      *
@@ -682,7 +698,10 @@
                 }
 
                 if ("PUT".equals(requestMethod) || "POST".equals(requestMethod) || "DELETE".equals(requestMethod)) {
+                    System.err.println(urlSuffix);
                     client.setHeader("Content-Type", "text/xml");
+                    if (urlSuffix.endsWith("/upload"))
+                      client.setHeader("Content-Encoding", "gzip");   // TEST ONLY
                     // It seems that certain bits of the Ruby API are very unhappy upon
                     // receipt of a PUT/POST message without a Content-length header,
                     // even if the request has no payload.
@@ -689,7 +708,14 @@
                     // Since Java will not generate a Content-length header unless
                     // we use the output stream, we create an output stream for PUT/POST
                     // even if there is no payload.
-                    client.setRequestBody((requestBody != null ? requestBody : "").getBytes(StandardCharsets.UTF_8));
+
+                    if (urlSuffix.endsWith("/upload")) {
+
+                      byte[] compressedRequestBody = compress(requestBody != null ? requestBody : "");         // TEST ONLY
+                      client.setRequestBody(compressedRequestBody);                                            // TEST ONLY
+                    }
+                    else
+                      client.setRequestBody((requestBody != null ? requestBody : "").getBytes(StandardCharsets.UTF_8));
                 }
 
                 final HttpClient.Response response = client.connect();
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment