Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: Positions table index on Odometer and Car_ID to improve response time of some queries #3801

Closed

Conversation

jheredianet
Copy link
Contributor

I've realized that some queries from Grafana dashboard that shows information from the odometer are very slow specially on databases with too much data logged and for those who has more than one car.

This response speed is greatly improved with an index on the "positions" table based on the "odometer" and "car_id" fields.

Copy link

netlify bot commented Apr 1, 2024

Deploy Preview for teslamate ready!

Name Link
🔨 Latest commit b9f340b
🔍 Latest deploy log https://app.netlify.com/sites/teslamate/deploys/660add5a40231b0008a5b178
😎 Deploy Preview https://deploy-preview-3801--teslamate.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@JakobLichterfeld JakobLichterfeld added enhancement New feature or request area:dashboard Related to a Grafana dashboard area:teslamate Related to TeslaMate core and removed area:dashboard Related to a Grafana dashboard labels Apr 2, 2024
@JakobLichterfeld JakobLichterfeld changed the title Positions table index on Odometer and Car_ID perf: Positions table index on Odometer and Car_ID Apr 2, 2024
@JakobLichterfeld JakobLichterfeld changed the title perf: Positions table index on Odometer and Car_ID perf: Positions table index on Odometer and Car_ID to improve response time of queries Apr 2, 2024
@JakobLichterfeld
Copy link
Collaborator

Thanks for the change. I appreciate the performance improvements as well.

I feel like we should change it in the first place and add your suggested migration in addition. @brianmay What do you think?

@Dulanic
Copy link
Collaborator

Dulanic commented Apr 2, 2024

I had thought about this before, but adding index in positions means the index needs to be updated many times a second, this could possibly cause write issues for RPI etc... Not a problem for most, but I know we have a lot of RPI users too. What dashboard specifically are you noticing this wtih? Im trying to think of use cases where odometer is a useful index, but I don't have all the SQL up to look.

Also there is already a car_id index.

@brianmay
Copy link
Collaborator

brianmay commented Apr 2, 2024

adding index in positions means the index needs to be updated many times a second, this could possibly cause write issues for RPI

Some random thoughts:

  • "updated many times a second"? I don't think we get that many updates? I think most I have observed is about one update a second.
  • The rate of updates might go down even more if can't continue to use the streaming API.
  • We should be able to get some metrics that show just how much it will slow down on RPi. I think we can't make any performance based judgements without concrete before and after metrics (at least I have been to many conference talks that make this point...)

@Dulanic
Copy link
Collaborator

Dulanic commented Apr 2, 2024

adding index in positions means the index needs to be updated many times a second, this could possibly cause write issues for RPI

Some random thoughts:

  • "updated many times a second"? I don't think we get that many updates? I think most I have observed is about one update a second.
  • The rate of updates might go down even more if can't continue to use the streaming API.
  • We should be able to get some metrics that show just how much it will slow down on RPi. I think we can't make any performance based judgements without concrete before and after metrics (at least I have been to many conference talks that make this point...)

Agree, I mean you're probably right, but I'd love to test the code some. So Ill await what dashboard is that slow first.

@jheredianet
Copy link
Contributor Author

Agree, I mean you're probably right, but I'd love to test the code some. So, I'll await what dashboard is that slow first.

Hi, the odometer is used on several dashboards, but it goes unnoticed because they are usually time filtered (30 days, 90 days, etc). But in the Battery Health dashboard there is a panel for the trips to get the distance logged in TeslaMate and it's extremely slow, specially this query:

SELECT max(odometer) - min(odometer) as "Mileage" from positions where car_id = 1;

I have a database with more than 16 millions of records in the table "positions" (of 3 cars as shown in the following screenshot):

image

That query without this suggested index takes around 26 secs to run and with the index just 55ms (tested on a cloud server with 4 cores and 8GB of RAM).

image

You said in the previous comment:

Also there is already a car_id index.

Yes, it is, and it's necessary (since all dashboards have to filter by car), but at the same time there is another composite index based on "date, drive_id" which is also used a lot (even having already an index based on date).

image

That is why I'm proposing this composite index based on "odometer, car_id", which will be very useful especially for those who have more than one car in the database. If a simple index is created based only on the odometer field, it would work, but it would not be enough when there are more cars registered.

@Dulanic
Copy link
Collaborator

Dulanic commented Apr 3, 2024

Sorry to push back, I don't think you understand indexing. Indexing is used to FIND records. I doubt any of the dashboards have odometer in a join or in the where statement. Indexes are not used to provide data *with a few exceptions, B-tree indexes allow for efficient searching, insertion, and deletion of data.

In your case, since you have such a large data set, if you want to add an index, go for it. It doesn't hurt anything most likely for you. But I really don't see a benefit to adding it to everyone.

For my above comment of B-Tree indexing not helping most dashboards, it's prob accurate. But for a MIN/MAX situation it likely helps because it can just check the MIN/MAX of the index. But for returning 1:1 values and most use cases, it would not benefit anything.

Also, most of the dashboards should be using the drive data NOT the positions data for min/max odometer readings. If they are not, they should be reviewed and possibly enhanced. Also, you can show the explain analyze for more indexing details and how the query optimizer works.

EX: select min(start_km), max(end_km)
from drives

https://explain-postgresql.com/archive/explain/b4250b3cd7cbef4f2d81032ac1fcef85:0:2024-04-03

using positions:
https://explain-postgresql.com/archive/explain/3550d3af35c9c4e9e8eeab7f49edc7d7:0:2024-04-03

@Dulanic Dulanic self-assigned this Apr 3, 2024
@JakobLichterfeld JakobLichterfeld added WIP Work in progress and removed enhancement New feature or request labels Apr 3, 2024
@jheredianet
Copy link
Contributor Author

Sorry to push back, I don't think you understand indexing. Indexing is used to FIND records. I doubt any of the dashboards have odometer in a join or in the where statement. Indexes are not used to provide data *with a few exceptions, B-tree indexes allow for efficient searching, insertion, and deletion of data.

In your case, since you have such a large data set, if you want to add an index, go for it. It doesn't hurt anything most likely for you. But I really don't see a benefit to adding it to everyone.

For my above comment of B-Tree indexing not helping most dashboards, it's prob accurate. But for a MIN/MAX situation it likely helps because it can just check the MIN/MAX of the index. But for returning 1:1 value and most use cases, it would not benefit anything.

Also, most of the dashboards should be using the drive data NOT the positions data for min/max odometer readings. If they are not, they should be reviewed and possibly enhanced. Also, you can show the explain analyze for more indexing details and how the query optimizer works.

EX: select min(start_km), max(end_km) from drives

https://explain-postgresql.com/archive/explain/b4250b3cd7cbef4f2d81032ac1fcef85:0:2024-04-03

using positions: https://explain-postgresql.com/archive/explain/3550d3af35c9c4e9e8eeab7f49edc7d7:0:2024-04-03

Hi, I understand what is an Index for and agree with you, but this is the use case:

In the battery health dashboard, we have this information:

image

Where:

"Logged" is the distance traveled that is saved on Teslamate database, and "Mileage" is the distance the car has traveled since using Teslamate.

These are the queries:

select ROUND(convert_km(sum(distance)::numeric, 'km'),0)|| ' km' as "Logged"
from drives 
where car_id = 1;

Any problem here, fast and clean.

SELECT ROUND(convert_km(max(odometer)::numeric, 'km'),0) || ' km' as "Odometer"
from positions where car_id =  1;

Without the index takes more than 20 seconds. With the index just milliseconds.
We could use this other query to get the same information (as in Overview dashboard), but it's still very slow if we don't filter the time:

SELECT ROUND(convert_km(odometer::numeric, 'km'),0) || ' km' as "Odometer" 
from positions WHERE car_id=1 ORDER BY "date" DESC LIMIT 1;

And the special one is this:

SELECT ROUND(convert_km((max(odometer) - min(odometer))::numeric, 'km'),0)|| ' km' as "Mileage"
from positions where car_id = 1;

Without the index takes almost a minute. With the index just milliseconds.

In both last queries we cannot use the drives table (of course it will be so fast), but, what about if there some unclosed drives? or if Teslamate was offline while driving. The real information about the odometer is only in the positions table, so we have to use it.

The case here is to show the user, about how much mileages have been not saved in the database, when for some reason a drive hasn't been fully recorded, for example due to a bug or an unexpected restart and that Teslamate has not been able to record, either due to lack of connection, areas without signal, or that it has been out of service.

I am open to hearing any improvements you can come up with to improve query performance for this data without the need of creating this proposed index.

@brianmay
Copy link
Collaborator

brianmay commented Apr 3, 2024

Explain is your friend here :-). Without the new index:

teslamate=# explain select ROUND(convert_km(sum(distance)::numeric, 'km'),0)|| ' km' as "Logged"
from drives
where car_id = 1;
                           QUERY PLAN
-----------------------------------------------------------------
 Aggregate  (cost=299.62..299.64 rows=1 width=32)
   ->  Seq Scan on drives  (cost=0.00..278.51 rows=8441 width=8)
         Filter: (car_id = 1)
(3 rows)

teslamate=# explain SELECT ROUND(convert_km(max(odometer)::numeric, 'km'),0) || ' km' as "Odometer"
from positions where car_id =  1;
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Aggregate  (cost=305056.46..305056.48 rows=1 width=32)
   ->  Index Scan using positions_car_id_index on positions  (cost=0.44..304829.46 rows=90801 width=8)
         Index Cond: (car_id = 1)
(3 rows)

teslamate=# explain SELECT ROUND(convert_km(odometer::numeric, 'km'),0) || ' km' as "Odometer"
from positions WHERE car_id=1 ORDER BY "date" DESC LIMIT 1;
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.44..16.19 rows=1 width=40)
   ->  Index Scan Backward using positions_date_index on positions  (cost=0.44..1430761.48 rows=90801 width=40)
         Filter: (car_id = 1)
(3 rows)

teslamate=# explain SELECT ROUND(convert_km((max(odometer) - min(odometer))::numeric, 'km'),0)|| ' km' as "Mileage"
from positions where car_id = 1;
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Aggregate  (cost=305283.46..305283.49 rows=1 width=32)
   ->  Index Scan using positions_car_id_index on positions  (cost=0.44..304829.46 rows=90801 width=8)
         Index Cond: (car_id = 1)
(3 rows)

Adding the new index:

$ mix ecto.migrate
Compiling 2 files (.ex)

08:45:44.916 [info] == Running 202404161637000 TeslaMate.Repo.Migrations.AddCompositeIndexToPositionsOnOdometer.change/0 forward

08:45:44.917 [info] create index positions_car_id_odometer_index

08:45:50.516 [info] == Migrated 202404161637000 in 5.5s

Running the above again:

teslamate=# explain select ROUND(convert_km(sum(distance)::numeric, 'km'),0)|| ' km' as "Logged"
from drives
where car_id = 1;
                           QUERY PLAN
-----------------------------------------------------------------
 Aggregate  (cost=299.62..299.64 rows=1 width=32)
   ->  Seq Scan on drives  (cost=0.00..278.51 rows=8441 width=8)
         Filter: (car_id = 1)
(3 rows)

teslamate=# explain SELECT ROUND(convert_km(max(odometer)::numeric, 'km'),0) || ' km' as "Odometer"
from positions where car_id =  1;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.60..0.62 rows=1 width=32)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.56..0.60 rows=1 width=8)
           ->  Index Only Scan Backward using positions_car_id_odometer_index on positions  (cost=0.56..643282.88 rows=18160116 width=8)
                 Index Cond: ((car_id = 1) AND (odometer IS NOT NULL))
(5 rows)

teslamate=# explain SELECT ROUND(convert_km(odometer::numeric, 'km'),0) || ' km' as "Odometer"
from positions WHERE car_id=1 ORDER BY "date" DESC LIMIT 1;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.44..0.49 rows=1 width=40)
   ->  Index Scan Backward using positions_date_index on positions  (cost=0.44..972265.94 rows=18160116 width=40)
         Filter: (car_id = 1)
(3 rows)

teslamate=# explain SELECT ROUND(convert_km((max(odometer) - min(odometer))::numeric, 'km'),0)|| ' km' as "Mileage"
from positions where car_id = 1;
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=1.20..1.22 rows=1 width=32)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.56..0.60 rows=1 width=8)
           ->  Index Only Scan Backward using positions_car_id_odometer_index on positions  (cost=0.56..643282.88 rows=18160116 width=8)
                 Index Cond: ((car_id = 1) AND (odometer IS NOT NULL))
   InitPlan 2 (returns $1)
     ->  Limit  (cost=0.56..0.60 rows=1 width=8)
           ->  Index Only Scan using positions_car_id_odometer_index on positions positions_1  (cost=0.56..643282.88 rows=18160116 width=8)
                 Index Cond: ((car_id = 1) AND (odometer IS NOT NULL))
(9 rows)

The 1st is the exact same. It only touches the drives table. Hardly surprising.

The 2nd one has changed the "index scan" to a "index only scan". I think this means we get get the max odometer value straight from the index, instead of having the search every car_id=1 entry. Cheap.

The third query we are looking for every odometer entry, finding one with the latest date. The new index isn't helping here. Instead we do a search on the date index for the latest date, and then scan all of these entries for car_id=1. Which is going to be expensive if you have a lot of cars, but cheap if you only have 1 car. To me this looks like it should be improved if we had an index on (car_id, date) or (date, car_id).

The 4th case I think instead of scanning each any every entry sequentially, we can get the max and min values straight from the index.

@Dulanic
Copy link
Collaborator

Dulanic commented Apr 3, 2024

So I am going to go back to my previous statement, wouldn't it be far more efficient just to use the drives for the query instead? A quick PR with a simple query change and that long query time go away. It made little difference for me, but bet it will be for you.

The only difference may be the dashboard WHILE you are driving. But honestly, really who does that? I think thats a edge case that really shouldnt matter.

One thing I dont mind is the query that orders by date :) That's a good use of indexing as the date field is indexed.

SELECT ROUND(convert_km((max(odometer) - min(odometer))::numeric, 'mi'),0)|| ' mi' as "Logged"
from positions where car_id = 3;

SELECT ROUND(convert_km((max(end_km) - min(start_km))::numeric, 'mi'),0)|| ' mi' as "Logged"
from drives where car_id = 3;

And why can't we use the other one? It seems a weird way to get the MAX record basically. I'm sure it was created a while ago, but that doesn't mean it's the best way to do it. Same results, much faster /w the 2nd.

SELECT ROUND(convert_km(odometer::numeric, 'km'),0) || ' km' as "Odometer" 
from positions WHERE car_id=3 ORDER BY "date" DESC LIMIT 1;

SELECT ROUND(convert_km(max(end_km)::numeric, 'km'),0) || ' km' as "Odometer" 
from drives WHERE car_id=3 ;

And the other can easily be done as:

SELECT ROUND(convert_km((max(odometer) - min(odometer))::numeric, 'km'),0)|| ' km' as "Mileage"
from positions where car_id = 3;


SELECT ROUND(convert_km((max(end_km) - min(start_km))::numeric, 'km'),0)|| ' km' as "Mileage"
from drives where car_id = 3;

Working with existing indexed can also make some of these queries MUCH faster.

EX: Changing

SELECT ROUND(convert_km((max(odometer) - min(odometer))::numeric, 'km'),0)|| ' km' as "Mileage"
    
from positions where car_id = 3;

to:

SELECT
    ROUND(convert_km(maxodo::numeric - minodo::numeric, 'km'),0)|| ' km' as "Mileage"
FROM
    (SELECT odometer AS maxodo  FROM  positions
        WHERE car_id=3
        ORDER BY "date" DESC
        LIMIT 1
    ) a
    CROSS JOIN (
        SELECT odometer AS minodo FROM positions
        WHERE car_id=3
        ORDER BY  "date"
        LIMIT 1
    ) b

made that query a lot faster. I threw together the code, so it's poorly written, but it gives an example of working with the existing indexing can make much of our existing queries MUCH fast.

@JakobLichterfeld JakobLichterfeld changed the title perf: Positions table index on Odometer and Car_ID to improve response time of queries perf: Positions table index on Odometer and Car_ID to improve response time of some queries Apr 4, 2024
@jheredianet
Copy link
Contributor Author

I threw together the code, so it's poorly written, but it gives an example of working with the existing indexing can make much of our existing queries MUCH fast.

Thank you. You're right, it only makes a little difference of data using only the "drives" Table instead of using positions.
I'll make a PR for the Battery Health dashboard. Then we can close and deny this proposed index.

@jheredianet
Copy link
Contributor Author

Ok, I've managed to include these query changes on this PR #3806

@jheredianet jheredianet closed this Apr 5, 2024
@jheredianet jheredianet deleted the positions-table-indexes branch April 5, 2024 09:21
@cwanja
Copy link
Collaborator

cwanja commented Apr 5, 2024

This is an interesting dialog considering the issue that was raised here #3727

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:teslamate Related to TeslaMate core WIP Work in progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants