
Grafana errors after importing teslamate postgres database #3976

Closed

SemoTech opened this issue Jun 19, 2024 · 14 comments
Labels: note:more information needed (The reporter has to provide more information), undetermined (Not sure if this is considered a real bug)

Comments

@SemoTech commented Jun 19, 2024

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

SOLVED! - SEE HERE: #3976 (comment)

Failed attempt to "move" (backup and restore) a full teslamate database between two different hosts (servers)!

  • The Source host was using CentOS 7 and an older postgres:12 database and was configured with separate FQDNs:
    teslamate.example.com & grafana.example.com

  • The Target host was set up with Ubuntu Server 22.04 and the newest postgres:16 database, as well as one FQDN with a subpath:
    tm.example.com & tm.example.com/grafana

  • Both hosts were confirmed running the latest Docker (v26.0.0 / v26.1.4) and Docker Compose (v2.25.0 / v2.27.1) versions.

These are the steps taken, with great care; no errors occurred in the import process, only in Grafana when accessing the imported data:

A - Ensured teslamate on Source host was running properly and was recording data from the Tesla API.
B - Upgraded Source host teslamate from v1.29.1 to the latest current version, v1.29.2
C - Exported Source host teslamate postgres:12 database into a file without any errors using:
docker compose exec -T database pg_dump -U teslamate teslamate > teslamate.bck
D - Deployed a fresh teslamate v1.29.2 environment on the Target host, using postgres:16 and single FQDN for Grafana as tm.example.com/grafana
E - Prepared Target host for database import, as follows:
E1 - Stopped the new teslamate container
E2 - Accessed new postgres:16 DB and issued cleanup commands as per HERE (with no errors) using:
docker exec -i teslamate-postgres /bin/bash -c "PGPASSWORD=<TM_DB_PASS> psql --username teslamate teslamate" << .
Result was:

> drop schema public cascade;
> create schema public;
> create extension cube;
> create extension earthdistance;
> CREATE OR REPLACE FUNCTION public.ll_to_earth(float8, float8)
> RETURNS public.earth
> LANGUAGE SQL
> IMMUTABLE STRICT
> PARALLEL SAFE
> AS 'SELECT public.cube(public.cube(public.cube(public.earth()*cos(radians(\$1))*cos(radians(\$2))),public.earth()*cos(radians(\$1))*sin(radians(\$2))),public.earth()*sin(radians(\$1)))::public.earth';
> .
NOTICE:  drop cascades to 25 other objects
DETAIL:  drop cascades to extension cube
drop cascades to extension earthdistance
drop cascades to type billing_type
drop cascades to type range
drop cascades to type states_status
drop cascades to type unit_of_length
drop cascades to type unit_of_pressure
drop cascades to type unit_of_temperature
drop cascades to function convert_celsius(numeric,text)
drop cascades to function convert_km(numeric,text)
drop cascades to function convert_m(double precision,text)
drop cascades to function convert_tire_pressure(numeric,character varying)
drop cascades to table addresses
drop cascades to table car_settings
drop cascades to table cars
drop cascades to table charges
drop cascades to table charging_processes
drop cascades to table drives
drop cascades to table geofences
drop cascades to table positions
drop cascades to table schema_migrations
drop cascades to table settings
drop cascades to table states
drop cascades to table tokens
drop cascades to table updates
DROP SCHEMA
CREATE SCHEMA
CREATE EXTENSION
CREATE EXTENSION
CREATE FUNCTION

F - Imported the database backup file (copied to the Target host) with no errors, using:
docker exec -i teslamate-postgres /bin/bash -c "PGPASSWORD=<TM_DB_PASS> psql --username teslamate teslamate" < teslamate.bck
Results can be seen HERE
G - Started new teslamate container successfully
H - Accessed tm.example.com with no Tesla token prompt shown, which means the original token imported correctly!
I - Manually changed the access paths under teslamate's Settings Page for the cars
From:
(screenshot)

To:
(screenshot)

J - Attempted to access the Dashboards, which opened Grafana correctly using the new path of tm.example.com/grafana, but no actual data is shown, only errors like this:

(screenshot)

Followed moments later by a Grafana "Failed to Fetch" Error notice:
(screenshot)

K - Checked all containers' log files; the only errors were in the grafana container logs, see HERE

L - As a last resort, used the pgAdmin manager tool to access the teslamate postgres:16 database and confirmed the data imported correctly (the cars were there, and so was the old data), yet something is still broken! (A quick command-line sanity check is also sketched after these steps.)

(screenshot)
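
For reference, a quick command-line sanity check of the imported data (a sketch only, using the same teslamate-postgres container name, <TM_DB_PASS> placeholder and table names that appear in the steps above) looks like this:

# List the restored tables
docker exec -i teslamate-postgres /bin/bash -c "PGPASSWORD=<TM_DB_PASS> psql --username teslamate teslamate -c '\dt'"
# Count rows in the largest table (positions) to confirm the historical data arrived
docker exec -i teslamate-postgres /bin/bash -c "PGPASSWORD=<TM_DB_PASS> psql --username teslamate teslamate -c 'SELECT count(*) FROM positions;'"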

QUESTIONS:

1 - Since the Target host had a fresh / newly deployed postgres:16 database, are the specific cleanup restore instructions outlined HERE to "Drop existing data and reinitialize" as per previous step "E2" still needed on a fresh setup before a backup file import is performed?

2 - Any other adjustments needed on the backup data file prior to importing, due to differences in the postgres:12 to postgres:16 versions?

3 - Before and After the database import, are there any other changes or database commands needed to properly display the historical data in Grafana without errors?

ANY IDEAS HOW TO FIX?

Thank you.

Expected Behavior

Grafana should render the imported data

Steps To Reproduce

See steps in description

Relevant log output

See logs links in description

Screenshots

see screenshots inline in description

Additional data

No response

Type of installation

Docker

Version

v1.29.2

@JakobLichterfeld (Collaborator)

1 - Since the Target host had a fresh / newly deployed postgres:16 database, are the specific cleanup restore instructions outlined HERE to "Drop existing data and reinitialize" as per previous step "E2" still needed on a fresh setup before a backup file import is performed?

Yes, that's why they are stated in the docs.

2 - Any other adjustments needed on the backup data file prior to importing, due to differences in the postgres:12 to postgres:16 versions?

As Postgres 12 has been outdated for years, we cannot support such big version jumps, but there have been successful migrations from other users in the past, so it should be possible. Best is to do the migration on the old system, then import into the new system.

3 - Before and After the database import, are there any other changes or database commands needed to properly display the historical data in Grafana without errors?

https://docs.teslamate.org/docs/maintenance/upgrading_postgres
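
The guide essentially boils down to a dump on the old cluster and a restore into a freshly initialized postgres:16 cluster; a rough sketch only (using the database service name and teslamate.bck file name from this issue; the exact steps and the schema re-initialization statements are in the linked docs):

# 1. Stop teslamate so no new data is written during the dump
docker compose stop teslamate
# 2. Dump the old postgres:12 database
docker compose exec -T database pg_dump -U teslamate teslamate > teslamate.bck
# 3. Stop the stack, switch the database image to postgres:16 in docker-compose.yml,
#    and remove the old database volume so postgres:16 initializes a fresh cluster
docker compose down
# 4. Bring the stack back up, re-run the "drop existing data and reinitialize" statements, then restore
docker compose up -d
docker compose exec -T database psql -U teslamate teslamate < teslamate.bck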

@JakobLichterfeld added the "undetermined" (Not sure if this is considered a real bug) and "note:more information needed" (The reporter has to provide more information) labels on Jun 19, 2024
@SemoTech (Author) commented Jun 19, 2024

Thank you for the prompt reply @JakobLichterfeld.

If it's not too much trouble, could you please be so kind as to keep this ticket Open until there is some resolution? I spent a lot of hours troubleshooting and putting this ticket together, with all the details and logs, and it is very disconcerting to see it closed in a minute and without any resolution.

Since teslamate is about 5 years old, other users here may have run into this or similar issues with DB upgrades & migrations, so someone may have solutions or ideas they can share; this is why I provide so much detail. This in turn will help the teslamate community, provided the ticket is not closed before an actual resolution...

Now, just to recap:

1 - You said that the "Drop existing data and reinitialize" process is STILL needed even when importing a backup into a fresh/newly spun-up database (which is what I did), but it is good to know, since that was not clear in the docs, as they only cover an in-place upgrade.

2 - Appreciate your great idea of first doing an in-place upgrade directly on the Source/old host from postgres:12 to postgres:16, which I will do, and I will report back here whether it worked.

3 - If the Source upgrade-in-place is successful, I'll export/backup the upgraded Source teslamate postgres:16 database, and again import it into the freshly rebuilt Target host with clean postgres:16, after re-doing the "Drop existing data and reinitialize" process.

Thank you.

P.S. @DrMichael, also thank you for clarifying that my last ticket #3968 (prematurely closed) was not related to Portainer, and for the suggestion to use the service name with docker compose exec. I actually managed to just use docker exec -i with the password, and detailed the commands above for reference. This seems to have worked and imported the database. Fingers crossed the in-place postgres:16 upgrade and re-import are more successful at allowing this migration to succeed.
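
For completeness, the docker compose equivalent of that restore (no explicit password needed) would be roughly the following, assuming the compose service is named database as in the dump command from step C; the -T flag disables pseudo-TTY allocation so the stdin redirect works:

# Restore the dump into the database service via docker compose
docker compose exec -T database psql -U teslamate teslamate < teslamate.bck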

@SemoTech (Author) commented Jun 19, 2024

OK, @JakobLichterfeld and @DrMichael, I have partial good news; we are closer to a solution:

1 - I managed to do an in-place postgres:12 to postgres:16 database upgrade on the Source host!
Grafana was then accessible (on its own FQDN of grafana.example.com) and all data seems to have survived the postgres upgrade process.
All steps are detailed HERE with all outputs.

2 - I re-exported the now-upgraded postgres:16 teslamate database from the Source host and imported it into a fresh build of the Target host, after its own postgres:16 database first received the "Drop existing data and reinitialize" process.
See all steps and detailed output HERE

Sadly, while the Source upgrade was successful and that host is now running postgres:16, the imported data on the Target host is again NOT being rendered correctly by Grafana!

This is what I get in Grafana on the Target host tm.example.com/grafana after the import of postgres:16 data:

(screenshot)

Grafana logs on the Target host continue to include the dial tcp: lookup **database** on 127.0.0.11:53: server misbehaving entries, as per below, once anything is accessed:

logger=context userId=1 orgId=1 uname=grafana-admin t=2024-06-19T21:05:22.513347603Z level=info msg="Request Completed" method=POST path=/api/ds/query status=400 remote_addr=192.168.1.199 time_ms=21 duration=21.093359ms size=327 referer="https://tm.example.com/grafana/d/Y8upc6ZRk/drives?orgId=1" handler=/api/ds/query status_source=downstream

logger=context userId=1 orgId=1 uname=grafana-admin t=2024-06-19T21:05:22.513360376Z level=info msg="Request Completed" method=POST path=/api/ds/query status=400 remote_addr=192.168.1.199 time_ms=21 duration=21.09387ms size=339 referer="https://tm.example.com/grafana/d/Y8upc6ZRk/drives?orgId=1" handler=/api/ds/query status_source=downstream

logger=tsdb.postgres endpoint=queryData pluginId=grafana-postgresql-datasource dsName=TeslaMate dsUID=PC98CB2W6A21E2V80 uname=grafana-admin t=2024-06-19T21:05:22.513622123Z level=error msg="Query error" err="dial tcp: lookup database on 127.0.0.11:53: server misbehaving"

Some references online point to the Grafana container not having DNS access, but mine does:

16d129e94819:/usr/share/grafana$ ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1): 56 data bytes
64 bytes from 1.1.1.1: seq=0 ttl=42 time=9.819 ms
64 bytes from 1.1.1.1: seq=1 ttl=42 time=11.242 ms
^C
--- 1.1.1.1 ping statistics ---
3 packets transmitted, 2 packets received, 33% packet loss
round-trip min/avg/max = 9.819/10.530/11.242 ms
16d129e94819:/usr/share/grafana$ ping www.yahoo.com
PING www.yahoo.com (209.73.190.12): 56 data bytes
64 bytes from 209.73.190.12: seq=0 ttl=42 time=9.692 ms
64 bytes from 209.73.190.12: seq=1 ttl=42 time=15.956 ms
64 bytes from 209.73.190.12: seq=2 ttl=42 time=15.940 ms
^C
--- www.yahoo.com ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 9.692/13.862/15.956 ms
16d129e94819:/usr/share/grafana$ 

AND nslookup works successfully...

7c7ae9b9209e:/usr/share/grafana$ nslookup www.yahoo.com
Server:         127.0.0.11
Address:        127.0.0.11:53

Non-authoritative answer:
www.yahoo.com   canonical name = me-ycpi-cf-www.g06.yahoodns.net
Name:   me-ycpi-cf-www.g06.yahoodns.net
Address: 2001:4998:28:800::4000
Name:   me-ycpi-cf-www.g06.yahoodns.net
Address: 2001:4998:28:800::4001

Non-authoritative answer:
www.yahoo.com   canonical name = me-ycpi-cf-www.g06.yahoodns.net
Name:   me-ycpi-cf-www.g06.yahoodns.net
Address: 209.73.190.12
Name:   me-ycpi-cf-www.g06.yahoodns.net
Address: 209.73.190.11

However, it is unclear why Grafana is trying to resolve "database" via 127.0.0.11:53 and how this can be fixed. Could this be some internal hardcoding in the postgres DB of the expected docker compose database service name? If so, where can it be found and changed?
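
Presumably the hostname Grafana tries to resolve comes from the datasource provisioned via the grafana service's DATABASE_HOST environment variable, so something like the following should show what the container actually sees (the grep pattern is only illustrative):

# Which database host was the grafana container started with?
docker compose exec grafana env | grep DATABASE
# Can the grafana container resolve that name via Docker's embedded DNS (127.0.0.11)?
docker compose exec grafana nslookup database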

Now, as this is no longer an old postgres version issue, it is clearly something with postgres or Grafana, possibly related to the paths, or maybe the docker compose service names, so I am hoping there is more I can try to get this fixed given we are so close...

Maybe something in the Grafana config or something I can check or change in the backup file before importing it?

Your patience and any additional help would be appreciated...

Thank you.

@JakobLichterfeld (Collaborator) commented Jun 20, 2024

Grafana was then accessible (on its own FQDN of grafana.example.com) and all data seems to have survived the postgres upgrade process.

As expected, it is not an issue with TeslaMate but with your new "target" system. Please provide your docker-compose.yml without credentials, and make sure to delete your postgres docker volume completely before the restore to get a clean starting state (as you have messed around with it).

@SemoTech (Author) commented Jun 20, 2024

Hello @JakobLichterfeld,

This does indeed prove that an in-place teslamate postgres:12 to postgres:16 version upgrade works, which is great news for anyone with an older setup.

However, as described from the start, the Target system uses one FQDN instead of the two used by the Source, so it is different. Adjusting teslamate to migrate from one configuration to the other should be trivial, especially since the manual details setups for both instances.

Also, as described in detail above and per the provided logs, the postgres:16 docker volume on the Target was deleted in full, re-created, and given the "Drop existing data and reinitialize" process, BEFORE the DB restore. (A rough sketch of that reset is below.)
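
For anyone repeating this, resetting the database volume amounts to roughly the following (a sketch; the actual volume name depends on the compose project, so check the docker volume ls output first):

# Stop and remove the containers
docker compose down
# Find the database volume and remove it so postgres:16 re-initializes from scratch
docker volume ls
docker volume rm <compose_project>_teslamate-db   # hypothetical name, use the one shown by docker volume ls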

HERE is the Source docker-compose.yml
HERE is the Target docker-compose.yml

If any other details are needed, please just ask.

I am hoping someone has a suggestion on how to fix this and allow the data imported into the Target from the Source to be made accessible by Grafana on the Target...

Thank you.

P.S. If I start the Target host with a fresh postgres:16 so as not to lose new Tesla API events, and once this issue is resolved I attempt to import the old/original data from the Source server (with the same postgres:16 version DB), can I do so without losing the new up-to-date data in the database? Basically, can one add teslamate backup data "on top" of existing data, rather than replace everything with the older backup?

@SemoTech (Author)

***** SOLVED, GOT IT WORKING!!!! *****

Based on the Grafana log error that stated dial tcp: lookup **database** on 127.0.0.11:53: server misbehaving, I initially thought this was a DNS resolution problem or a lack of Internet access, which I proved it was not. Then the keyword database made me think something was hardcoded in the postgres DB, so I did a search, but alas found no entries matching it.

Finally, I recalled that in the Source host docker-compose.yml both the teslamate: service and the grafana: service referenced the postgres database using DATABASE_HOST=database. However, in my Target docker-compose.yml I had changed that name just for the teslamate: service (to teslamate-postgres) while the grafana: service entry still had the name database, so they did not match, causing Grafana to error out.

Recommend everyone use a variable in the .env file not just for security-related entries, but also for any value referenced by more than one service, to ensure no typos or mismatches! (A minimal sketch is below.)
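
A minimal sketch of what I mean (the DB_HOST variable name and service layout are just examples, not the full TeslaMate compose file):

# .env
DB_HOST=teslamate-postgres

# docker-compose.yml (excerpt)
services:
  teslamate-postgres:
    image: postgres:16
  teslamate:
    environment:
      - DATABASE_HOST=${DB_HOST}
  grafana:
    environment:
      - DATABASE_HOST=${DB_HOST}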

P.S. - I'd still like to know if one can add teslamate backup data "on top" of existing data, rather than replace everything with the older backup?

P.P.S. - I continue to get a Grafana Failed to fetch popup error about 10-15 seconds after the Grafana dashboard data is loaded, but this is also happening on the Source host (and there is nothing relevant in the Grafana logs), so maybe it is a side effect of the postgres:12 to postgres:16 upgrade, or possibly an issue with polling the Tesla API?
(screenshot)

@cwanja (Collaborator) commented Jun 20, 2024

P.S. - I'd still like to know if one can add teslamate backup data "on top" of existing data, rather than replace everything with the older backup?

No. I would suggest you take your Source system (which is hopefully complete), load it into your Target system, and then shut down the Source.

P.P.S. - I continue to get a Grafana Failed to fetch popup error about 10-15 seconds after the Grafana dashboard data is loaded, but this is also happening on the Source host (and there is nothing relevant in the Grafana logs), so maybe it is a side effect of the postgres:12 to postgres:16 upgrade, or possibly an issue with polling the Tesla API?

See #3982

@SemoTech (Author) commented Jun 20, 2024

Thanks @cwanja !

So based on the link you referenced, the last issue others and I are seeing, the Grafana Failed to fetch message, is unrelated to the DB upgrade or the Tesla API; it is internal to the current Grafana version, which makes an extra call that gets blocked, and the devs are working on a fix for the next Grafana version.

@cwanja (Collaborator) commented Jun 20, 2024

Thanks @cwanja !

So based on the link you referenced, the last issue others and I are seeing, the Grafana Failed to fetch message, is unrelated to the DB upgrade or the Tesla API; it is internal to the current Grafana version, which makes an extra call that gets blocked, and the devs are working on a fix for the next Grafana version.

There are a lot of assumptions in that block of text. Anything Grafana focused is outside my purview.

Can confirm TeslaMate with non-uBlock ad blockers running is not facing Grafana pop-ups.

@SemoTech (Author) commented Jun 20, 2024

Can confirm TeslaMate with non-uBlock ad blockers running is not facing Grafana pop-ups.

Correct @cwanja, I further tested using a Private Safari window and saw no Grafana error.

Though I did notice discrepancies between the Source teslamate host and the Target host, as can be seen below.

Any idea why the plugged-in and locked status, as well as the Charge Limit info, would not be displayed for the same vehicle on the Target host? Both hosts run the same teslamate/postgres versions and the same data, since the Source was just exported and imported to the Target host!

@cwanja (Collaborator) commented Jun 20, 2024

Wake the car up.

@SemoTech (Author)

Wake the car up.

Good call @cwanja Thank you.

@JakobLichterfeld (Collaborator)

However, in my Target docker-compose.yml I had changed that name just for the teslamate: service (to teslamate-postgres) while the grafana: service entry still had the name database, so they did not match, causing Grafana to error out.

Glad you found the error in your docker-compose on your own. As correctly assumed, not a TeslaMate issue at all.

@SemoTech (Author)

Indeed @JakobLichterfeld, thank you for pointing me in the right direction and very glad this was just an internal configuration mismatch.

Recommend having the database entry/name for the teslamate: and postgres: services made a ${} variable in the standard configuration, in order to avoid this. Hopefully this ticket helps others as well.

My thanks again for everyone's help.
