Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Teslamate fails with "Bad Gateway" after 1-4 days of normal operation while Grafana shows "404 page not found" #3793

Closed
1 task done
SemoTech opened this issue Mar 29, 2024 · 19 comments
Labels
closed:invalid This doesn't seem right note:more information needed The reporter has to provide more information undetermined Not sure if this is considered a real bug

Comments

@SemoTech
Copy link

SemoTech commented Mar 29, 2024

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

SOLVED 4/4/2024: Issue was lack of disk space due to unused docker images that needed to be purged. I also had a mistyped TZ variable.


For the last 2 updates (currently on v1.28.5) and after a few days of normal operation, Teslamate becomes unreachable and shows just "Bad Gateway". At the same time, trying to access Grafana I get "404 page not found"

I saw there was another thread with this "Bad Gateway" issue where there were reports that restarting Docker fixed it, but in my case neither restarting Docker or rebooting my server helps! Both Teslamate and Grafana are broken!

My only resort has been to revert from a snapshot taken 2 days before (while loosing all data gathered since) and if I miss to do this the next automated snapshot will be of the broken Teslamate, and I will be stuck permanently until there is a proper & permanent fix.

For reference, my OS is CentOS 7, Docker version is v26.0.0, build 2ae903e, Docker Compose is version v2.25.0 and here is my yaml file:

[root@teslamate ~]# cat docker-compose.yml 
version: '3'

services:
  teslamate:
    image: teslamate/teslamate:latest
    restart: always
    depends_on:
      - database
    environment:
      - ENCRYPTION_KEY=${TM_ENCRYPTION_KEY} 
      - DATABASE_USER=${TM_DB_USER}
      - DATABASE_PASS=${TM_DB_PASS}
      - DATABASE_NAME=${TM_DB_NAME}
      - DATABASE_HOST=database
      - MQTT_HOST=mosquitto
      - VIRTUAL_HOST=${FQDN_TM}
      - CHECK_ORIGIN=true
      - TZ={$TM_TZ}
    volumes:
      - ./import:/opt/app/import
    labels:
      - "traefik.enable=true"
      - "traefik.port=4000"
      - "traefik.http.middlewares.redirect.redirectscheme.scheme=https"
      - "traefik.http.middlewares.auth.basicauth.usersfile=/auth/.htpasswd"
      - "traefik.http.routers.teslamate-insecure.rule=Host(`${FQDN_TM}`)"
      - "traefik.http.routers.teslamate-insecure.middlewares=redirect"
      - "traefik.http.routers.teslamate-ws.rule=Host(`${FQDN_TM}`) && Path(`/live/websocket`)"
      - "traefik.http.routers.teslamate-ws.entrypoints=websecure"
      - "traefik.http.routers.teslamate-ws.tls"
      - "traefik.http.routers.teslamate.rule=Host(`${FQDN_TM}`)"
      - "traefik.http.routers.teslamate.middlewares=auth"
      - "traefik.http.routers.teslamate.entrypoints=websecure"
      - "traefik.http.routers.teslamate.tls.certresolver=tmhttpchallenge"
    cap_drop:
      - all

  database:
    image: postgres:12
    restart: always
    environment:
      - POSTGRES_USER=${TM_DB_USER}
      - POSTGRES_PASSWORD=${TM_DB_PASS}
      - POSTGRES_DB=${TM_DB_NAME}
    volumes:
      - teslamate-db:/var/lib/postgresql/data

  grafana:
    image: teslamate/grafana:latest
    restart: always
    environment:
      - DATABASE_USER=${TM_DB_USER}
      - DATABASE_PASS=${TM_DB_PASS}
      - DATABASE_NAME=${TM_DB_NAME}
      - DATABASE_HOST=database
      - GRAFANA_PASSWD=${GRAFANA_PW}
      - GF_SECURITY_ADMIN_USER=${GRAFANA_USER}
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PW}
      - GF_AUTH_BASIC_ENABLED=true
      - GF_AUTH_ANONYMOUS_ENABLED=false
      - GF_SERVER_ROOT_URL=https://${FQDN_GRAFANA}
      - GF_SMTP_ENABLED=true
      - GF_SMTP_FROM_NAME=${GF_SMTPNAME}
      - GF_SMTP_FROM_ADDRESS=${GF_SMTPSENDER}
      - GF_SMTP_HOST=${GF_SMTPHOST}                      
      - GF_SMTP_USER=${GF_SMTPUSER}                   
      - GF_SMTP_PASSWORD=${GF_SMTPPASS}
      - GF_SMTP_SKIP_VERIFY=false
    volumes:
      - teslamate-grafana-data:/var/lib/grafana
    labels:
      - 'traefik.enable=true'
      - 'traefik.port=3000'
      - "traefik.http.middlewares.redirect.redirectscheme.scheme=https"
      - "traefik.http.routers.grafana-insecure.rule=Host(`${FQDN_GRAFANA}`)"
      - "traefik.http.routers.grafana-insecure.middlewares=redirect"
      - "traefik.http.routers.grafana.rule=Host(`${FQDN_GRAFANA}`)"
      - "traefik.http.routers.grafana.entrypoints=websecure"
      - "traefik.http.routers.grafana.tls.certresolver=tmhttpchallenge"

  mosquitto:
    image: eclipse-mosquitto:1.6
    restart: always
    ports:
      - 127.0.0.1:1883:1883
    volumes:
      - mosquitto-conf:/mosquitto/config
      - mosquitto-data:/mosquitto/data

  proxy:
    image: traefik:v2.1
    restart: always
    command:
      - "--global.sendAnonymousUsage=false"
      - "--providers.docker"
      - "--providers.docker.exposedByDefault=false"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--certificatesresolvers.tmhttpchallenge.acme.httpchallenge=true"
      - "--certificatesresolvers.tmhttpchallenge.acme.httpchallenge.entrypoint=web"
      - "--certificatesresolvers.tmhttpchallenge.acme.email=${LETSENCRYPT_EMAIL}"
      - "--certificatesresolvers.tmhttpchallenge.acme.storage=/etc/acme/acme.json"
    ports:
      - 80:80
      - 443:443
    volumes:
      - ./.htpasswd:/auth/.htpasswd
      - ./acme/:/etc/acme/
      - /var/run/docker.sock:/var/run/docker.sock:ro

volumes:
    teslamate-db:
    teslamate-grafana-data:
    mosquitto-conf:
    mosquitto-data:

[root@teslamate ~]# 

Expected Behavior

Normal operaiton

Steps To Reproduce

Deploy latest teslamate, wait 1-4 days...

Relevant log output

Sanitized Output:

[root@teslamate ~]# docker compose logs teslamate
WARN[0000] /root/docker-compose.yml: `version` is obsolete 
root_teslamate_1  | 2024-03-25 20:55:28.555 [info] Migrations already up
root_teslamate_1  | 2024-03-25 20:55:32.384 [info] No tzdata release files found in custom data dir. Copying release file from tzdata priv dir.
root_teslamate_1  | 2024-03-25 20:55:32.505 [info] System Info: Erlang/OTP 26 (jit)
root_teslamate_1  | 2024-03-25 20:55:32.505 [info] Version: 1.28.5
root_teslamate_1  | 2024-03-25 20:55:32.872 [info] POST https://auth.tesla.com/oauth2/v3/token -> 200 (288.909 ms)
root_teslamate_1  | 2024-03-25 20:55:32.872 [info] Refreshed api tokens
root_teslamate_1  | 2024-03-25 20:55:32.878 [info] Scheduling token refresh in 6 h
root_teslamate_1  | 2024-03-25 20:55:32.885 [info] Running TeslaMateWeb.Endpoint with cowboy 2.10.0 at :::4000 (http)
root_teslamate_1  | 2024-03-25 20:55:32.888 [info] Access TeslaMateWeb.Endpoint at http://teslamate.semotech.com
root_teslamate_1  | 2024-03-25 20:55:33.317 [info] Starting logger for 'CAR1'
root_teslamate_1  | 2024-03-25 20:55:33.467 [info] Starting logger for 'CAR2'
root_teslamate_1  | 2024-03-25 20:55:33.490 [info] Starting logger for 'CAR3'
root_teslamate_1  | 2024-03-25 20:55:33.748 [info] MQTT connection has been established
root_teslamate_1  | 2024-03-25 20:55:34.567 car_id=2 [info] Start / :offline
root_teslamate_1  | 2024-03-25 20:55:34.567 car_id=1 [info] Start / :asleep
root_teslamate_1  | 2024-03-25 20:55:35.024 car_id=3 [info] Start / :asleep
root_teslamate_1  | 2024-03-25 20:55:35.361 [info] GET /
root_teslamate_1  | 2024-03-25 20:55:35.387 [info] Sent 200 in 25ms
root_teslamate_1  | 2024-03-25 20:55:35.748 [info] tzdata release in place is from a file last modified Fri, 22 Oct 2021 02:20:47 GMT. Release file on server was last modified Thu, 01 Feb 2024 18:40:48 GMT.
root_teslamate_1  | 2024-03-25 20:55:37.946 [info] Tzdata has updated the release from 2021e to 2024a
root_teslamate_1  | 2024-03-25 20:55:38.812 [info] GET /settings
root_teslamate_1  | 2024-03-25 20:55:39.915 [info] Sent 200 in 1103ms
root_teslamate_1  | 2024-03-25 20:55:44.687 [info] GET https://owner-api.teslamotors.com/api/1/vehicles/101010101010101010 -> 429 (97.325 ms)
root_teslamate_1  | 2024-03-25 20:55:44.687 [warning] TeslaApi.Error / "Retry later\n"
root_teslamate_1  | 2024-03-25 20:55:44.687 car_id=2 [error] Error / :unknown
root_teslamate_1  | 2024-03-25 20:55:45.131 [info] GET https://owner-api.teslamotors.com/api/1/vehicles/010101010101010101 -> 429 (87.187 ms)
root_teslamate_1  | 2024-03-25 20:55:45.131 [warning] TeslaApi.Error / "Retry later\n"
root_teslamate_1  | 2024-03-25 20:55:45.131 car_id=3 [error] Error / :unknown
root_teslamate_1  | 2024-03-25 20:55:49.696 [info] GET /
root_teslamate_1  | 2024-03-25 20:55:49.708 [info] Sent 200 in 11ms
root_teslamate_1  | 2024-03-25 20:58:16.047 [info] GET /settings
root_teslamate_1  | 2024-03-25 20:58:16.057 [info] Sent 200 in 10ms
root_teslamate_1  | 2024-03-25 20:58:21.087 [info] GET /
root_teslamate_1  | 2024-03-25 20:58:21.093 [info] Sent 200 in 5ms
root_teslamate_1  | 2024-03-25 21:18:29.864 [info] GET /
root_teslamate_1  | 2024-03-25 21:18:29.876 [info] Sent 200 in 11ms
root_teslamate_1  | 2024-03-25 21:36:16.325 [info] GET /
root_teslamate_1  | 2024-03-25 21:36:16.331 [info] Sent 200 in 6ms
root_teslamate_1  | 2024-03-25 21:51:18.340 car_id=3 [info] Start / :online
root_teslamate_1  | 2024-03-25 21:51:18.379 car_id=3 [info] Connecting ...
root_teslamate_1  | 2024-03-25 21:54:20.832 car_id=3 [info] Suspending logging
root_teslamate_1  | 2024-03-25 22:04:30.668 car_id=3 [info] Fetching vehicle state ...
root_teslamate_1  | 2024-03-25 22:04:30.811 car_id=3 [info] Start / :asleep
root_teslamate_1  | 2024-03-25 22:04:30.853 car_id=3 [info] Disconnecting ...
root_teslamate_1  | 2024-03-25 22:18:54.073 [warning] GET https://owner-api.teslamotors.com/api/1/vehicles/11001100110011 -> 500 (5154.185 ms)
root_teslamate_1  | 2024-03-25 22:18:54.074 [warning] TeslaApi.Error / "<!DOCTYPE html>\n<html>\n<head>\n  <title>We're sorry, but something went wrong (500)</title>\n  <style type=\"text/css\">\n    body { background-color: #fff; color: #666; text-align: center; font-family: arial, sans-serif; }\n    div.dialog {\n      width: 25em;\n      padding: 0 4em;\n      margin: 4em auto 0 auto;\n      border: 1px solid #ccc;\n      border-right-color: #999;\n      border-bottom-color: #999;\n    }\n    h1 { font-size: 100%; color: #f00; line-height: 1.5em; }\n  </style>\n</head>\n\n<body>\n  <!-- This file lives in public/500.html -->\n  <div class=\"dialog\">\n    <h1>We're sorry, but something went wrong.</h1>\n  </div>\n</body>\n</html>\n"
root_teslamate_1  | 2024-03-25 22:18:54.074 car_id=3 [error] Error / :unknown
root_teslamate_1  | 2024-03-25 22:34:04.204 car_id=3 [info] Start / :online
root_teslamate_1  | 2024-03-25 22:34:04.249 car_id=3 [info] Connecting ...
root_teslamate_1  | 2024-03-25 22:37:07.583 car_id=3 [info] Suspending logging
root_teslamate_1  | 2024-03-25 23:06:38.778 car_id=3 [info] Fetching vehicle state ...
root_teslamate_1  | 2024-03-25 23:06:38.890 car_id=3 [info] Start / :asleep
root_teslamate_1  | 2024-03-25 23:06:38.898 car_id=3 [info] Disconnecting ...
root_teslamate_1  | 2024-03-25 23:41:34.260 car_id=3 [info] Start / :online
root_teslamate_1  | 2024-03-25 23:41:34.277 car_id=3 [info] Connecting ...
...
root_teslamate_1  | 2024-03-27 21:04:54.580 [error] GenServer #PID<0.90374.0> terminating
root_teslamate_1  | ** (ArgumentError) time_zone_not_found
root_teslamate_1  |     (timex 3.7.11) lib/format/datetime/formatter.ex:55: Timex.Format.DateTime.Formatter.lformat!/4
root_teslamate_1  |     (teslamate 1.28.5) lib/teslamate_web/live/car_live/summary.html.heex:221: anonymous fn/2 in TeslaMateWeb.CarLive.Summary.render/1
root_teslamate_1  |     (phoenix_live_view 0.18.18) lib/phoenix_live_view/diff.ex:398: Phoenix.LiveView.Diff.traverse/7
root_teslamate_1  |     (phoenix_live_view 0.18.18) lib/phoenix_live_view/diff.ex:544: anonymous fn/4 in Phoenix.LiveView.Diff.traverse_dynamic/7
root_teslamate_1  |     (elixir 1.16.1) lib/enum.ex:2528: Enum."-reduce/3-lists^foldl/2-0-"/3
root_teslamate_1  |     (phoenix_live_view 0.18.18) lib/phoenix_live_view/diff.ex:373: Phoenix.LiveView.Diff.traverse/7
root_teslamate_1  |     (phoenix_live_view 0.18.18) lib/phoenix_live_view/diff.ex:139: Phoenix.LiveView.Diff.render/3
root_teslamate_1  |     (phoenix_live_view 0.18.18) lib/phoenix_live_view/channel.ex:833: Phoenix.LiveView.Channel.render_diff/3
...
root_teslamate_1  | 2024-03-29 06:22:40.574 [warning] Too many disconnects from streaming API
root_teslamate_1  | 2024-03-29 06:22:40.574 car_id=2 [info] Creating new connection … 
root_teslamate_1  | 2024-03-29 06:22:40.574 car_id=2 [info] Disconnecting ...
root_teslamate_1  | 2024-03-29 06:22:40.580 car_id=2 [info] Connecting ...
root_teslamate_1  | 2024-03-29 06:29:14.346 [warning] Too many disconnects from streaming API
root_teslamate_1  | 2024-03-29 06:29:14.346 car_id=2 [info] Creating new connection … 
root_teslamate_1  | 2024-03-29 06:29:14.346 car_id=2 [info] Disconnecting ...
root_teslamate_1  | 2024-03-29 06:29:14.352 car_id=2 [info] Connecting ...
root_teslamate_1  | 2024-03-29 06:35:45.213 [warning] Too many disconnects from streaming API
root_teslamate_1  | 2024-03-29 06:35:45.213 car_id=2 [info] Creating new connection … 
root_teslamate_1  | 2024-03-29 06:35:45.213 car_id=2 [info] Disconnecting ...
root_teslamate_1  | 2024-03-29 06:35:45.220 car_id=2 [info] Connecting ...
root_teslamate_1  | 2024-03-29 06:42:18.318 [warning] Too many disconnects from streaming API
root_teslamate_1  | 2024-03-29 06:42:18.318 car_id=2 [info] Creating new connection … 
root_teslamate_1  | 2024-03-29 06:42:18.318 car_id=2 [info] Disconnecting ...
root_teslamate_1  | 2024-03-29 06:42:18.324 car_id=2 [info] Connecting ...
root_teslamate_1  | 2024-03-29 06:48:50.653 [warning] Too many disconnects from streaming API
root_teslamate_1  | 2024-03-29 06:48:50.653 car_id=2 [info] Creating new connection … 
root_teslamate_1  | 2024-03-29 06:48:50.653 car_id=2 [info] Disconnecting ...
root_teslamate_1  | 2024-03-29 06:48:50.659 car_id=2 [info] Connecting ...
root_teslamate_1  | 2024-03-29 06:55:21.497 [warning] Too many disconnects from streaming API
root_teslamate_1  | 2024-03-29 06:55:21.498 car_id=2 [info] Creating new connection … 
root_teslamate_1  | 2024-03-29 06:55:21.498 car_id=2 [info] Disconnecting ...
root_teslamate_1  | 2024-03-29 06:55:21.504 car_id=2 [info] Connecting ...
root_teslamate_1  | 2024-03-29 07:01:54.508 [warning] Too many disconnects from streaming API
root_teslamate_1  | 2024-03-29 07:01:54.508 car_id=2 [info] Creating new connection … 
root_teslamate_1  | 2024-03-29 07:01:54.508 car_id=2 [info] Disconnecting ...
root_teslamate_1  | 2024-03-29 07:01:54.516 car_id=2 [info] Connecting ...
root_teslamate_1  | 2024-03-29 07:04:20.741 [error] Postgrex.Protocol (#PID<0.2452.0>) disconnected: ** (DBConnection.ConnectionError) tcp recv (idle): closed
root_teslamate_1  | 2024-03-29 07:04:20.741 [error] Postgrex.Protocol (#PID<0.2451.0>) disconnected: ** (DBConnection.ConnectionError) tcp recv (idle): closed
root_teslamate_1  | 2024-03-29 07:04:20.741 [error] Postgrex.Protocol (#PID<0.2455.0>) disconnected: ** (DBConnection.ConnectionError) tcp recv (idle): closed
root_teslamate_1  | 2024-03-29 07:04:20.741 [error] Postgrex.Protocol (#PID<0.2453.0>) disconnected: ** (DBConnection.ConnectionError) tcp recv (idle): closed
root_teslamate_1  | 2024-03-29 07:04:20.741 [error] Postgrex.Protocol (#PID<0.2449.0>) disconnected: ** (DBConnection.ConnectionError) tcp recv (idle): closed
root_teslamate_1  | 2024-03-29 07:04:20.741 [error] Postgrex.Protocol (#PID<0.2454.0>) disconnected: ** (DBConnection.ConnectionError) tcp recv (idle): closed
root_teslamate_1  | 2024-03-29 07:04:20.741 [error] Postgrex.Protocol (#PID<0.2445.0>) disconnected: ** (DBConnection.ConnectionError) tcp recv (idle): closed
root_teslamate_1  | 2024-03-29 07:04:20.741 [error] Postgrex.Protocol (#PID<0.2450.0>) disconnected: ** (DBConnection.ConnectionError) tcp recv (idle): closed
root_teslamate_1  | 2024-03-29 07:04:20.741 [error] Postgrex.Protocol (#PID<0.2448.0>) disconnected: ** (DBConnection.ConnectionError) tcp recv (idle): closed
root_teslamate_1  | 2024-03-29 07:04:20.741 [error] Postgrex.Protocol (#PID<0.2444.0>) disconnected: ** (DBConnection.ConnectionError) tcp recv (idle): closed
root_teslamate_1  | 2024-03-29 07:04:20.764 [error] Postgrex.Protocol #PID<0.2451.0> could not cancel backend: tcp connect: connection refused - :econnrefused
root_teslamate_1  | 2024-03-29 07:04:20.764 [error] Postgrex.Protocol #PID<0.2445.0> could not cancel backend: tcp connect: connection refused - :econnrefused
root_teslamate_1  | 2024-03-29 07:04:20.764 [error] Postgrex.Protocol #PID<0.2455.0> could not cancel backend: tcp connect: connection refused - :econnrefused
root_teslamate_1  | 2024-03-29 07:04:20.764 [error] Postgrex.Protocol #PID<0.2453.0> could not cancel backend: tcp connect: connection refused - :econnrefused
root_teslamate_1  | 2024-03-29 07:04:20.764 [error] Postgrex.Protocol #PID<0.2449.0> could not cancel backend: tcp connect: connection refused - :econnrefused
root_teslamate_1  | 2024-03-29 07:04:20.764 [error] Postgrex.Protocol #PID<0.2452.0> could not cancel backend: tcp connect: connection refused - :econnrefused
root_teslamate_1  | 2024-03-29 07:04:20.764 [error] Postgrex.Protocol #PID<0.2454.0> could not cancel backend: tcp connect: connection refused - :econnrefused
root_teslamate_1  | 2024-03-29 07:04:20.764 [error] Postgrex.Protocol #PID<0.2450.0> could not cancel backend: tcp connect: connection refused - :econnrefused
root_teslamate_1  | 2024-03-29 07:04:20.764 [error] Postgrex.Protocol #PID<0.2448.0> could not cancel backend: tcp connect: connection refused - :econnrefused
root_teslamate_1  | 2024-03-29 07:04:20.764 [error] Postgrex.Protocol #PID<0.2444.0> could not cancel backend: tcp connect: connection refused - :econnrefused
root_teslamate_1  | 2024-03-29 07:04:20.777 [error] Postgrex.Protocol (#PID<0.2451.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (database:5432): connection refused - :econnrefused
root_teslamate_1  | 2024-03-29 07:04:20.777 [error] Postgrex.Protocol (#PID<0.2445.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (database:5432): connection refused - :econnrefused
root_teslamate_1  | 2024-03-29 07:04:20.777 [error] Postgrex.Protocol (#PID<0.2453.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (database:5432): connection refused - :econnrefused
root_teslamate_1  | 2024-03-29 07:04:20.777 [error] Postgrex.Protocol (#PID<0.2450.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (database:5432): connection refused - :econnrefused
root_teslamate_1  | 2024-03-29 07:04:20.777 [error] Postgrex.Protocol (#PID<0.2455.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (database:5432): connection refused - :econnrefused
root_teslamate_1  | 2024-03-29 07:04:20.777 [error] Postgrex.Protocol (#PID<0.2449.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (database:5432): connection refused - :econnrefused
root_teslamate_1  | 2024-03-29 07:04:20.777 [error] Postgrex.Protocol (#PID<0.2452.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (database:5432): connection refused - :econnrefused
root_teslamate_1  | 2024-03-29 07:04:20.777 [error] Postgrex.Protocol (#PID<0.2454.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (database:5432): connection refused - :econnrefused
root_teslamate_1  | 2024-03-29 07:04:20.777 [error] Postgrex.Protocol (#PID<0.2448.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (database:5432): connection refused - :econnrefused
root_teslamate_1  | 2024-03-29 07:04:20.777 [error] Postgrex.Protocol (#PID<0.2444.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (database:5432): connection refused - :econnrefused
root_teslamate_1  | 2024-03-29 07:04:21.810 [error] Postgrex.Protocol (#PID<0.2445.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (database:5432): non-existing domain - :nxdomain
...
root_teslamate_1  | 2024-03-29 07:55:33.603 [info] GET https://owner-api.teslamotors.com/api/1/products -> 429 (96.745 ms)
root_teslamate_1  | 2024-03-29 07:55:33.604 [warning] TeslaApi.Error / "Retry later\n"
root_teslamate_1  | 2024-03-29 07:55:33.604 [warning] Could not get vehicles: :unknown
root_teslamate_1  | 2024-03-29 07:55:35.506 [error] GenServer TeslaMate.Repair terminating
root_teslamate_1  | ** (DBConnection.ConnectionError) connection not available and request was dropped from queue after 1770ms. This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:
root_teslamate_1  | 
root_teslamate_1  |   1. Ensuring your database is available and that you can connect to it
root_teslamate_1  |   2. Tracking down slow queries and making sure they are running fast enough
root_teslamate_1  |   3. Increasing the pool_size (although this increases resource consumption)
root_teslamate_1  |   4. Allowing requests to wait longer by increasing :queue_target and :queue_interval
root_teslamate_1  | 
root_teslamate_1  | See DBConnection.start_link/2 for more information
root_teslamate_1  | 
root_teslamate_1  |     (ecto_sql 3.10.2) lib/ecto/adapters/sql.ex:1047: Ecto.Adapters.SQL.raise_sql_call_error/1
root_teslamate_1  |     (ecto_sql 3.10.2) lib/ecto/adapters/sql.ex:945: Ecto.Adapters.SQL.execute/6
root_teslamate_1  |     (ecto 3.10.3) lib/ecto/repo/queryable.ex:229: Ecto.Repo.Queryable.execute/4
root_teslamate_1  |     (ecto 3.10.3) lib/ecto/repo/queryable.ex:19: Ecto.Repo.Queryable.all/3
root_teslamate_1  |     (teslamate 1.28.5) lib/teslamate/repair.ex:58: TeslaMate.Repair.handle_cast/2
root_teslamate_1  |     (stdlib 5.2) gen_server.erl:1121: :gen_server.try_handle_cast/3
root_teslamate_1  |     (stdlib 5.2) gen_server.erl:1183: :gen_server.handle_msg/6
root_teslamate_1  |     (stdlib 5.2) proc_lib.erl:241: :proc_lib.init_p_do_apply/3
root_teslamate_1  | Last message: {:"$gen_cast", :repair}
root_teslamate_1  | 2024-03-29 07:55:35.509 [info] Shutting down 8 sockets in 1 rounds of 2000ms
root_teslamate_1  | 2024-03-29 07:55:35.520 [notice] Application teslamate exited: shutdown
root_teslamate_1  | Kernel pid terminated (application_controller) ("{application_terminated,teslamate,shutdown}")
root_teslamate_1  | 
root_teslamate_1  | Crash dump is being written to: erl_crash.dump...done
root_teslamate_1  | waiting for postgres at database:5432
root_teslamate_1  | waiting for postgres at database:5432
root_teslamate_1  | waiting for postgres at database:5432
root_teslamate_1  | waiting for postgres at database:5432
root_teslamate_1  | waiting for postgres at database:5432
root_teslamate_1  | waiting for postgres at database:5432
root_teslamate_1  | waiting for postgres at database:5432
root_teslamate_1  | waiting for postgres at database:5432
...

Screenshots

No response

Additional data

No response

Type of installation

Docker

Version

v1.28.5

@JakobLichterfeld
Copy link
Collaborator

JakobLichterfeld commented Mar 30, 2024

Thanks for reporting.

root_teslamate_1  | ** (ArgumentError) time_zone_not_found

Your TZ Environment variable is not valid.

Even if unreleated: In addition your postgres version is very outdated. Please do a full backup and follow https://docs.teslamate.org/docs/maintenance/upgrading_postgres

Check your trafik config, as the unreachable issue is related to your config imo

@JakobLichterfeld JakobLichterfeld added closed:invalid This doesn't seem right undetermined Not sure if this is considered a real bug note:more information needed The reporter has to provide more information labels Mar 30, 2024
@SemoTech
Copy link
Author

SemoTech commented Mar 30, 2024

@JakobLichterfeld thank you kindly for the reply and feedback as well as flagging that I am running an older Postgres version. What version # should I use?

Now, while will be sure to backup the DB and update Postgres, can you please explain how it is possible that Teslamate, trafik, the "TZ Environment" and "old" Postgres work together just fine for 2-4 days when I restore from last backup, and then just days later I get the "Bad Gateway" error?

Here is a redacted image of my .env file:
image
If the issue was with my components or configuration, the latest Teslamate would not work at all, right?

Something is happening after it runs just fine for a few days and this all started about 2 Teslamate version updates ago, right before Docker needed to be updated to v20+

Thank you.

@JakobLichterfeld
Copy link
Collaborator

JakobLichterfeld commented Mar 30, 2024

What version # should I use?
Actually 15, see https://docs.teslamate.org/docs/installation/docker/

The correct name of the Time Zone Environemt variable is TZ, see:
https://docs.teslamate.org/docs/configuration/environment_variables

TZ was not really used in old versions, #3646 changed that since 1.28.3

In #3678 people had same issue with incorrect TZ variable name used

@SemoTech
Copy link
Author

SemoTech commented Mar 30, 2024

Thanks @JakobLichterfeld but I am using the correct timezone variable. Here is an excerpt of my YML config file:
image
The "TM_TZ" entry variable reference is from the .env file as a variable name for "America/New_York". So TeslaMate is reading the "TZ" value, which is set from the "TM_TZ" value. Is that not correct?

Now again, the mystery is why is it working fine for a few days, after a restore from backup?

I actually just new had to restore from backup again, since it was showing "Bad Gateway". Now it again works fine with TM v1.28.5 but noticed something new: it is possible the core issue may be that it is running out of disk space!

I issued a "df -h" and got this:
image
Since there are 25 GB of overall space, something is eating it all up!

Again, even with 0 free space it is working now, but I bet it will crash again in a day or two unless I figure out how to free up some space...

To solve this I 1st ran "curl -Ls http://bit.ly/clean-centos-disk-space | sudo bash" but it only cleaned up about 500MB

Digging deeper, here is the output from "docker system df":
image
So looks like a lot of orphaned images from 3 years of docker-compose pull updates...
I therefore ran "docker image prune -f" and got: Total reclaimed space: 16.42GB
image
Now the system has 18GB of free disk space and still runs fine.
image

If the core issue was indeed with lack of space, the current running restored backup of TM v1.28.5 should now run without fail for more than a few days. Let's see...

@JakobLichterfeld
Copy link
Collaborator

Yeah sorry, your TZ seams to be right, but log shows different.

Yeah a docker system prune or similar to delete old images is needed from time to time

Anyways, as I first said, not a TeslaMate issue

@SemoTech
Copy link
Author

SemoTech commented Mar 30, 2024

Thanks @JakobLichterfeld Interesting that you confirm TZ is setup correctly, yet the log still shows an issue.
Is that a TM issue, and how do we fix that if impactful?

Also, I attempted the Postgres version update from v12 to v15 (as per here), made a DB backup and DB restore as per here), but it failed to come back up!
So I had to restore from my last good system backup image (after docker cleanup to free up space) again...

Let's see if it stays active now for more than a few days...

@JakobLichterfeld
Copy link
Collaborator

As you do not use official Installation, nothing is confirmed 😀
Name of TZ is correct.

If disk is full weired symtopms may apply, do not se an TeslaMate issue here, as wrote before.

@SemoTech
Copy link
Author

SemoTech commented Mar 30, 2024

As you do not use official Installation, nothing is confirmed 😀 Name of TZ is correct.

If disk is full weired symtopms may apply, do not se an TeslaMate issue here, as wrote before.

Sorry @JakobLichterfeld but could you please elaborate on what do you mean that I am not using the "official installation"?

I am using the "Advanced installation with Traefik, Let's Encrypt & HTTP Basic Auth" (as per HERE), and have been using it for 3 years!

@brianmay
Copy link
Collaborator

There are numerous possible ways you can configure Teslamate to start. As much as we would like to, we are not able to provide support for every single user. Which doesn't mean there is anything wrong with what you are doing, just that our ability to help you is limited.

Having said that I notice the line above is:

- TZ={$TM_TZ}

Shouldn't that be:

- TZ=${TM_TZ}

?

@SemoTech
Copy link
Author

SemoTech commented Mar 30, 2024

There are numerous possible ways you can configure Teslamate to start. As much as we would like to, we are not able to provide support for every single user. Which doesn't mean there is anything wrong with what you are doing, just that our ability to help you is limited.

Having said that I notice the line above is:

- TZ={$TM_TZ}

Shouldn't that be:

- TZ=${TM_TZ}

?

Good catch @brianmay thank you. I have fixed
- TZ={$TM_TZ}
to be the correct
- TZ=${TM_TZ}

As for my configuration, it is "stock" and the advanced config in the manual.

Only thing that changed in the last 3Y was the version of Docker and the version of Postgres DB. Nothing specific to me.

@ogezue
Copy link

ogezue commented Apr 3, 2024

Same here. Disk space is fine, Grafana is working but TeslaMate gives me a 404.

@cwanja
Copy link
Collaborator

cwanja commented Apr 3, 2024

What version Docker are those impacted running? See #3754 for more details. There was also a comment about adding a line to your YML file in the following comment.

@SemoTech
Copy link
Author

SemoTech commented Apr 3, 2024

What version Docker are those impacted running? See #3754 for more details. There was also a comment about adding a line to your YML file in the following comment.

Hi @cwanja I believe my issue was old Docker 19 followed by lack of disk space, since after the last full system image restore on 3/30/2024 the system is still up. Today is the 4th day still up, so if it works tomorrow as well then lack of space was the issue. Purging stale images in Docker did the trick for me.

I also updated Docker to v26 (see here) but failed to update Postgres DB from 12 to 14 so still running PG DB 12. Hopefully that does not become an issue as well.

@cwanja
Copy link
Collaborator

cwanja commented Apr 3, 2024

Thanks @SemoTech. Please close the issue whenever you feel it's completed.

@SemoTech
Copy link
Author

SemoTech commented Apr 3, 2024

Thanks @SemoTech. Please close the issue whenever you feel it's completed.

Of course @cwanja, will do.
Just FYI, I did not have to modify my config YML file, I assume others did since they are not running Docker 26...

@JakobLichterfeld
Copy link
Collaborator

JakobLichterfeld commented Apr 4, 2024

Same here. Disk space is fine, Grafana is working but TeslaMate gives me a 404.

Thanks for reporting, if your disk space is fine, than it is not the same issue, please open a seperate one (if you are using docker-compose v1, plase migrate to docker compose v2 and rebuid your docker-compose stack).

@SemoTech
Copy link
Author

SemoTech commented Apr 4, 2024

Thanks @SemoTech. Please close the issue whenever you feel it's completed.

My Teslamate and Grafana still work fine today so I believe the space issue was the core cause. I am closing this case.

Thanks again for all the help in troubleshooting and keep up the great work in Teslamate!

@SemoTech SemoTech closed this as completed Apr 4, 2024
@ogezue
Copy link

ogezue commented Apr 4, 2024

@SemoTech Can you provide instructions how you upgraded your system? The official guide for Debian is not working for me, I'm hosting TM on Google cloud.

@SemoTech
Copy link
Author

SemoTech commented Apr 4, 2024

@SemoTech Can you provide instructions how you upgraded your system? The official guide for Debian is not working for me, I'm hosting TM on Google cloud.

My issue was running out of space. I ran the docker command (docker system prune) to remove orphaned images and freed up over 18GB pf space.
As for Docker, I made sure I had a backup and then followed [this tutorial](this tutorial) for my CentOS 7 system, your mileage may vary but hope this helps you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed:invalid This doesn't seem right note:more information needed The reporter has to provide more information undetermined Not sure if this is considered a real bug
Projects
None yet
Development

No branches or pull requests

5 participants