Planet state file #6

Open
zerebubuth opened this issue Feb 10, 2015 · 4 comments

Comments

@zerebubuth
Owner

It's a pain to figure out which replication state.txt file corresponds to a given planet. Now that the current and history dumps, in both XML and PBF, all correspond to the same state, it would be easier if the dump process or script worked out the state from which replication could continue.

The dump already tracks the last timestamp in the file, and this can be used to find a state file. However, there might be transactions in progress at that point, so it will be necessary to walk backwards through the state files to one from before all of those transactions started.
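A minimal bash sketch of that walk-back, assuming a local mirror of the replication state files in the usual `AAA/BBB/CCC.state.txt` layout. The mirror directory and the function names are hypothetical, and the fixed safety margin (the caller passes the dump's last timestamp minus that margin as the cutoff) is a simplification: it approximates "before all in-progress transactions started" and does not look at the transactions themselves.

```bash
#!/usr/bin/env bash
# Sketch: pick the replication state to resume from after a dump.
# Hypothetical helpers; assumes state files are mirrored locally under a
# directory using the AAA/BBB/CCC.state.txt layout.

# Map a sequence number to its relative path, e.g. 1234567 -> 001/234/567.state.txt
state_path() {
  local seq
  seq=$(printf '%09d' "$1")
  printf '%s/%s/%s.state.txt\n' "${seq:0:3}" "${seq:3:3}" "${seq:6:3}"
}

# Print the timestamp from a state file, dropping the "\:" escapes Osmosis writes.
state_timestamp() {
  sed -n 's/^timestamp=//p' "$1" | tr -d '\\'
}

# find_resume_seq STATE_DIR LATEST_SEQ CUTOFF
# Walk backwards from LATEST_SEQ and print the first sequence number whose
# timestamp is strictly earlier than CUTOFF. Returns 1 if none is found.
find_resume_seq() {
  local dir=$1 seq=$2 cutoff=$3 file ts
  while (( seq >= 0 )); do
    file="$dir/$(state_path "$seq")"
    [[ -f $file ]] || return 1   # the mirror does not go back far enough
    ts=$(state_timestamp "$file")
    # Compare YYYYMMDDhhmmss as integers so the result does not depend on locale.
    if [[ -n $ts ]] && (( ${ts//[!0-9]/} < ${cutoff//[!0-9]/} )); then
      echo "$seq"
      return 0
    fi
    seq=$(( seq - 1 ))
  done
  return 1
}
```

For example, `find_resume_seq ./minute 1234567 2015-02-10T11:00:00Z` would print the newest mirrored sequence number at or below 1234567 whose timestamp is before 11:00 UTC.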

@ssipos90

Hi,
Sorry for hijacking your thread, but it's kind of related.

For the life of me I can't figure this out.

As background, to explain my approach: we're building a private OSM server and a tile server as part of a bigger app.
To keep the database small and new clients as up to date as possible with the main OSM server, we decided to import only each client's area, not the whole country.
When a new client joins, we re-download the country.pbf, slice out their area and import it without impacting our other clients (I think).
Clients will edit the map using iD, so I've set up a tile server that is kept in sync.
Replication using osmdbt is done, and I'm currently working on importing the changes using imposm or osm2pgsql. Here is the tricky bit.

I can't manage to generate the correct state.txt. Running osmium fileinfo on the generated PBF shows the latest change's timestamp, but no sequenceNumber.

To summarise, when a new client joins:

  • download "country.pbf", slice out their area using a .poly file and import it using osmosis
  • dump the updated database to a PBF using your tool (kudos, nice work)
  • drop the PostGIS data and Mapnik tiles and re-import with imposm or osm2pgsql

I'd appreciate some help, thanks :)

@zerebubuth
Owner Author

The planet-dump-ng software only sets the current time in the PBF header, not the sequence number. This is because the planet dump is an independent process from the replication diffs and neither depends on the other. Also, there are minutely, hourly and daily replication streams and each has a different (independent) sequence number.

There are tools to synchronise a planet dump with a chosen replication stream, for example pyosmium's up-to-date tool. This works by looking at the timestamp of the planet file, rewinding a bit and replaying the diffs covering that period.

The general reason why these streams are all independent is that it previously wasn't easy to identify a linear point in time in Postgres, hence all the stuff in Osmosis' state file about txnActiveList and the xid column index in the database. More recently, Postgres made it easier to get access to the internals of the replication log, which made more robust tools like osmdbt possible and allows talking about a specific linear point in the log.
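To make "a specific linear point in the log" concrete: Postgres prints write-ahead log positions (LSNs) as two hexadecimal halves, for example `16/B374D848`, and those positions are totally ordered. A minimal bash sketch, not taken from osmdbt, that orders two such positions by comparing the high halves first and then the low halves; the function name `lsn_lt` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: compare two Postgres LSNs written as "HI/LO" in hexadecimal.
# lsn_lt A B succeeds (returns 0) if position A is strictly before position B.
lsn_lt() {
  # Split on "/" and parse each half as base 16 using bash's base#number syntax.
  local a_hi=$(( 16#${1%/*} )) a_lo=$(( 16#${1#*/} ))
  local b_hi=$(( 16#${2%/*} )) b_lo=$(( 16#${2#*/} ))
  (( a_hi < b_hi || (a_hi == b_hi && a_lo < b_lo) ))
}
```

Comparing the halves separately, rather than combining them into one integer, keeps each value within 32 bits, so the comparison cannot overflow.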

In summary: planet-dump-ng won't write the sequence number header in PBF files, so you'll have to use something else (e.g. pyosmium-up-to-date) to merge replication stream info into the planet file.

Hope that helps!

@ssipos90

ssipos90 commented Feb 23, 2021

It does, thanks for the explanation.

Edit: technically, I'd rather reset the sequence numbers every time.
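If the sequence numbers are reset for each fresh import, the starting state file could be written by hand. A minimal bash sketch, assuming the consumer reads an Osmosis-style `state.txt` with `sequenceNumber` and `timestamp` keys and colons escaped as `\:`; the helper name and the example values are hypothetical, so check what imposm or osm2pgsql actually expect before relying on it.

```bash
#!/usr/bin/env bash
# Sketch: write an Osmosis-style replication state file.
# Usage (hypothetical helper): write_state FILE SEQUENCE_NUMBER ISO_TIMESTAMP
write_state() {
  local file=$1 seq=$2 ts=$3
  {
    printf 'sequenceNumber=%s\n' "$seq"
    # Java properties syntax: colons in the value are written as "\:"
    printf 'timestamp=%s\n' "$ts" | sed 's/:/\\:/g'
  } > "$file"
}
```

For example, `write_state state.txt 0 2021-02-23T10:00:00Z` writes `sequenceNumber=0` followed by `timestamp=2021-02-23T10\:00\:00Z`.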

@ssipos90

ssipos90 commented Mar 3, 2021

Hi,

I wanted to contribute to your project, so I'm pasting our Dockerfiles here in case you find them useful.
It should be possible to bump the Postgres version to 12 without any hiccups; we're just not there yet and haven't tested it.

I removed a line, so it might hiccup on permissions on the volume, but I don't think so.

Dockerfile:

FROM debian:buster-slim

ARG PLANET_DUMP_URL=https://github.com/zerebubuth/planet-dump-ng/archive/v1.2.0.tar.gz

RUN set -eu; \
    apt-get update; \
    apt-get install -y --no-install-recommends \
      build-essential \
      autoconf \
      automake \
      ca-certificates \
      curl \
      libboost-date-time-dev \
      libboost-dev \
      libboost-filesystem-dev \
      libboost-iostreams-dev \
      libboost-program-options-dev \
      libboost-thread-dev \
      libosmpbf-dev \
      libprotobuf-dev \
      libxml2-dev \
      osmpbf-bin \
      pkg-config \
      postgresql-client-11; \
    useradd -u 999 -r planetdump; \
    mkdir /opt/build; \
    curl -sL "$PLANET_DUMP_URL" | tar xz -C /opt/build --strip-components=1; \
    cd /opt/build; \
    ./autogen.sh; \
    ./configure; \
    make -j $(nproc); \
    make install; \
    cd /; \
    rm -rf /opt/build; \
    mkdir /dumps; \
    chown planetdump:planetdump /dumps

COPY entrypoint /usr/local/bin/entrypoint
VOLUME /dumps
USER planetdump
WORKDIR /dumps
ENTRYPOINT ["/usr/local/bin/entrypoint"]
CMD ["bash"]

entrypoint (chmod +x)

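#!/bin/sh
set -eu

# Output file name; override with -e PBF_FILE=... when running the container.
PBF_FILE=${PBF_FILE:-latest.pbf}

# ${1:-} keeps "set -u" from aborting when the container is started without arguments.
case "${1:-}" in
  dump)
    cd /dumps
    # Remove intermediate files left over from a previous run.
    rm -rf users changeset* node* way* relation*
    echo "dumping OSM db"
    # pg_dump takes its connection settings from the PG* environment variables.
    DUMP_FILE=$(mktemp)
    trap 'rm -f "$DUMP_FILE"' EXIT
    pg_dump -F custom > "$DUMP_FILE"
    echo "creating PBF"
    planet-dump-ng -f "$DUMP_FILE" -p "$PBF_FILE"
    # Clean up the intermediate files again once the PBF is written.
    rm -rf users changeset* node* way* relation*
  ;;
  *) exec "$@";;
esac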

Edit: added missing file name, removed osmium.
