Latest Europe continent database indexing taking too much time #124

Open
akki401 opened this issue Apr 1, 2024 · 11 comments
Comments

akki401 commented Apr 1, 2024

Hello Team,
I am using the Overpass Docker image (https://hub.docker.com/r/wiktorn/overpass-api) to run an Overpass service for the Europe continent. The planet URL I am using is https://download.geofabrik.de/europe-latest.osm.bz2, which is the latest available extract.
Indexing has been running for more than 48 hours and has still not finished. The docker command used to start the container and the EC2 hardware configuration are given below.
I have set a diff URL in the environment variables, with updates scheduled every 30 days.
My question: I expected the container to start with just the latest Europe database, but according to the logs it is also fetching the updates and indexing them (even though the update interval is set to 30 days). Here is a screenshot of the last few lines of the logs:
image

Docker command:
cmd = " ".join(
    [
        "docker",
        "run",
        "--restart=always",  # starts docker after system reboot
        "--log-driver json-file",
        "--log-opt max-size=10m",
        "--log-opt max-file=3",
        "-e",
        "OVERPASS_META=yes",
        "-e",
        "OVERPASS_MODE=init",
        "-e",
        f"OVERPASS_PLANET_URL=file:///db/{self.region}-latest.osm.bz2",
        "-e",
        "OVERPASS_RULES_LOAD=10",  # infinite areas update process. Ex: 0-always run, 5-run 5% of the time...
        "-e",
        f"OVERPASS_DIFF_URL={self._updates_url}",  # https://download.geofabrik.de/europe-updates
        "-e",
        f"OVERPASS_UPDATE_SLEEP={3600 * 24 * 30}",  # update every 30 days
        "-e",
        "OVERPASS_STOP_AFTER_INIT=false",
        "-v",
        f"{self._dir_overpass}/:/db",
        "-p",
        f"{self._overpass_port}:80",
        "-i",
        "-t",
        "-d",
        "--name",
        self._overpass_container_name,
        "wiktorn/overpass-api",
    ]
)
EC2 config:
Instance Size: i4i.2xlarge
vCPU: 8
Memory (GiB): 64
Instance Storage (GB): 1 x 1,875 AWS Nitro SSD
Network Bandwidth (Gbps): Up to 12
EBS Bandwidth (Gbps): Up to 10

After a very long time, the status of the container is:
image

I tried multiple times but got the same result.

Note: As the diff URL for updating the instance I am using https://download.geofabrik.de/europe-updates instead of https://planet.openstreetmap.org/replication/minute/. Could that cause any issue?
Other continents, such as north-america and asia, work fine with the diff URLs https://download.geofabrik.de/north-americal-updates and https://download.geofabrik.de/asia-updates.
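
For comparing regions, it may help to build the command as an argument list, so that the exact environment variables passed to the container are easy to inspect. The following is only a sketch based on the command above: `build_overpass_cmd` and the example path, port and container name are hypothetical, not part of the image or of the original script.

```python
import shlex


def build_overpass_cmd(region, updates_url, db_dir, port, name,
                       update_sleep_s=3600 * 24 * 30):
    """Return the `docker run` argument list for one region (hypothetical helper)."""
    env = {
        "OVERPASS_META": "yes",
        "OVERPASS_MODE": "init",
        "OVERPASS_PLANET_URL": f"file:///db/{region}-latest.osm.bz2",
        "OVERPASS_RULES_LOAD": "10",
        "OVERPASS_DIFF_URL": updates_url,
        "OVERPASS_UPDATE_SLEEP": str(update_sleep_s),
        "OVERPASS_STOP_AFTER_INIT": "false",
    }
    cmd = ["docker", "run", "--restart=always",
           "--log-driver", "json-file",
           "--log-opt", "max-size=10m",
           "--log-opt", "max-file=3"]
    # One "-e KEY=VALUE" pair per environment variable.
    for key, value in env.items():
        cmd += ["-e", f"{key}={value}"]
    cmd += ["-v", f"{db_dir}/:/db", "-p", f"{port}:80",
            "-i", "-t", "-d", "--name", name, "wiktorn/overpass-api"]
    return cmd


# Example values below are assumptions for illustration only.
cmd = build_overpass_cmd("europe", "https://download.geofabrik.de/europe-updates",
                         "/data/overpass", 8080, "overpass-europe")
print(shlex.join(cmd))
```

The list can be passed directly to `subprocess.run(cmd)`, and printing it with `shlex.join` shows whether `OVERPASS_DIFF_URL` is actually part of the command for a given region.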

Owner

wiktorn commented Apr 1, 2024

Hi,

I have trouble understanding what the issue is. From the logs you have shared, I don't see any problem with using https://download.geofabrik.de/europe-updates, and it makes applying updates easier, as you have fewer files to download.

Regarding updating every 30 days: restarting the container resets the timer, so the interval is not very precise. The script also starts with an update and only then sleeps, so I don't see anything suspicious in what you have reported above.
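
The "update first, then sleep" order described above can be pictured with a small loop. This is only an illustration of that order, not the image's actual update script; `update_loop` and its parameters are made up for the example.

```python
import time


def update_loop(apply_updates, sleep_seconds, sleep=time.sleep, max_cycles=None):
    """Update first, then sleep, and repeat (illustrative only)."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        apply_updates()       # runs immediately on every (re)start
        sleep(sleep_seconds)  # the timer only starts counting after the update
        cycles += 1
    return cycles


# Record the order of events instead of really updating or sleeping.
events = []
update_loop(lambda: events.append("update"),
            sleep_seconds=3600 * 24 * 30,
            sleep=lambda s: events.append(f"sleep {s}"),
            max_cycles=2)
print(events)
```

With a loop of this shape, updates show up in the logs right after every container start, regardless of how long the sleep interval is.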

Author

akki401 commented Apr 1, 2024

Thanks for the reply @wiktorn.
As you mentioned, I don't see any issue in the logs either, but the docker status becomes unhealthy after a long time. I ran the container multiple times and the status ends up unhealthy each time.
Interestingly, the same docker run command works fine for north-america and asia (with the update URLs https://download.geofabrik.de/north-americal-updates and https://download.geofabrik.de/asia-updates).

Quite confused.

Owner

wiktorn commented Apr 1, 2024

What is the error reported by the healthcheck?

Can you post the first 100 lines of logs after the container restart, and maybe the last 100 lines after a few minutes of running, excluding lines containing compute_geometry?
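
One possible way to produce such an excerpt from a saved log (for example the output of `docker logs <container>` redirected to a file) is sketched below. `head_and_tail` is a hypothetical helper and the sample lines are invented placeholders, not real Overpass output.

```python
def head_and_tail(lines, n=100, exclude="compute_geometry"):
    """Drop lines containing `exclude`, then return the first and last n lines."""
    kept = [line for line in lines if exclude not in line]
    return kept[:n], kept[-n:]


# Placeholder log lines for demonstration only.
lines = [f"line {i}" for i in range(250)]
lines[3] = "compute_geometry placeholder"

head, tail = head_and_tail(lines)
print(len(head), head[0], head[3])
print(len(tail), tail[-1])
```

For a real log, `lines` could come from `open("overpass.log").read().splitlines()`, where the file name is again just an example.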

Author

akki401 commented Apr 5, 2024

I see the log statements below repeated in the logs:
image

image

image

image

image

Owner

wiktorn commented Apr 5, 2024

If you sort your logs by time, it just looks like it is applying updates one by one, so nothing looks unusual.

Why do you think these logs indicate an issue?

Author

akki401 commented Apr 5, 2024

Because I didn't set the "OVERPASS_DIFF_URL" environment variable when running the container.

Owner

wiktorn commented Apr 5, 2024

So how does the container know that it needs to fetch updates from https://download.geofabrik.de/europe-updates? That URL is nowhere in the defaults.

Author

akki401 commented Apr 5, 2024

I assumed that if I don't pass OVERPASS_DIFF_URL, the container does not look for updates. Am I wrong?
What if I set OVERPASS_DIFF_URL to an empty value (OVERPASS_DIFF_URL = "")? Would it then not look for updates?
What would the default action be?

Author

akki401 commented Apr 5, 2024

It has taken more than 2 days and the database is still being indexed.
Is that usual or unusual?
image

In the end, the docker status is "unhealthy":
image

and the last logs are:
image

Owner

wiktorn commented Apr 5, 2024

I'm not sure where your update process was started from: whether it is still part of the initial update during startup, or part of the update loop running alongside the daemon itself.

If it is the latter, then the container should be healthy. If it is still updating as a part of initial update, then it should be in the "starting" state.

The other thing is that Europe should not be taking that long to update. I do not have enough hardware to test it, but I'd expect that most of the time is spent on processing the planet file and that the updates are applied pretty quickly.

From the logs you have shared it looks like it takes 30 minutes to process 30MB of updates. If that's the case, that's very slow.

The other thing that could be happening here is that, for some strange reason, the updates are stuck in an infinite loop, but it's hard to tell whether this is the case or not.
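
To tell the "starting", "healthy" and "unhealthy" states apart and to see what the last health probe printed, the `State.Health` section of `docker inspect` can be read. Below is a minimal sketch, assuming the container defines a healthcheck; `parse_health` and `container_health` are hypothetical helper names and the sample JSON is invented for the demonstration.

```python
import json
import subprocess


def parse_health(inspect_json):
    """Extract health status and last probe result from `docker inspect` JSON."""
    state = json.loads(inspect_json)[0]["State"]
    health = state.get("Health") or {}  # absent when no healthcheck is defined
    log = health.get("Log") or []
    last = log[-1] if log else {}
    return {
        "status": health.get("Status"),
        "failing_streak": health.get("FailingStreak"),
        "last_exit_code": last.get("ExitCode"),
        "last_output": last.get("Output"),
    }


def container_health(name):
    """Run `docker inspect <name>` and parse its health section."""
    out = subprocess.run(["docker", "inspect", name],
                         capture_output=True, text=True, check=True).stdout
    return parse_health(out)


# Invented sample in the shape `docker inspect` returns, for demonstration only.
sample = json.dumps([{"State": {"Health": {
    "Status": "starting", "FailingStreak": 0, "Log": []}}}])
print(parse_health(sample))
```

Calling `container_health("<container name>")` on the host would then show the current status together with the output of the most recent probe, which is the error message the healthcheck reports.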

Author

akki401 commented Apr 10, 2024

I was surprised to see that the database is still being indexed and updates are still being applied even after the docker health status became unhealthy.
The docker status became unhealthy on 7th April 2024, but updates are still showing up in the db and the log file is still being written (as of today, 10th April 2024).
What does this mean?
Here are the screenshots:
image

image

image
