Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add index on stream_entries table #5793

Merged

Conversation

Projects
None yet
6 participants
@takayamaki
Copy link
Contributor

commented Nov 23, 2017

This pull request is adding index for below statement .
SELECT "stream_entries".* FROM "stream_entries" WHERE "stream_entries"."activity_type" = ? AND "stream_entries"."account_id" = ? AND "stream_entries"."hidden" = ? ORDER BY "stream_entries"."id" DESC LIMIT ?

This statement executed 67,112 times.

before

                                                                               QUERY PLAN                                                                               
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..754.44 rows=20 width=48) (actual time=54.970..167.629 rows=20 loops=1)
   ->  Index Scan Backward using index_stream_entries_on_id on stream_entries  (cost=0.43..450939.73 rows=11961 width=48) (actual time=54.968..167.617 rows=20 loops=1)
         Filter: ((NOT hidden) AND ((activity_type)::text = 'Status'::text) AND (account_id = 11109))
         Rows Removed by Filter: 298624
 Planning time: 0.109 ms
 Execution time: 167.658 ms
(6 rows)

after

                                                                   QUERY PLAN                                                                   
------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..167.31 rows=20 width=48) (actual time=1.718..5.579 rows=20 loops=1)
   ->  Index Scan Backward using test3 on stream_entries  (cost=0.43..103891.09 rows=12451 width=48) (actual time=1.717..5.572 rows=20 loops=1)
         Index Cond: ((account_id = 11109) AND ((activity_type)::text = 'Status'::text))
         Filter: (NOT hidden)
 Planning time: 0.126 ms
 Execution time: 5.603 ms
(6 rows)
@kaniini
Copy link
Contributor

left a comment

this sped up timeline queries significantly on my instance

@ykzts ykzts added the performance label Nov 24, 2017

@takayamaki takayamaki force-pushed the takayamaki:add-index-on-stream_entries-table branch from f063861 to d00e15d Nov 24, 2017

@Gargron

This comment has been minimized.

Copy link
Member

commented Nov 24, 2017

Isn't a query like this used only on the Atom feed and only when you're paginating it? How could this have improved timelines @kaniini?

@kaniini

This comment has been minimized.

Copy link
Contributor

commented Nov 24, 2017

Sorry, I meant when loading some specific account's timeline.

class AddIndexOnStreamEntries < ActiveRecord::Migration[5.1]
def change
commit_db_transaction
add_index :stream_entries, [:id, :account_id, :activity_type], algorithm: :concurrently

This comment has been minimized.

Copy link
@akihikodaki

akihikodaki Nov 25, 2017

Collaborator

I think you should put id at the last for sorting. By putting account_id at the very beginning you can also cover index_stream_entries_on_account_id and remove it.

@akihikodaki akihikodaki requested a review from abcang Nov 25, 2017

@akihikodaki

This comment has been minimized.

Copy link
Collaborator

commented Nov 25, 2017

Requesting a review from @abcang; he has suggested a similar change for Pawoo, but it is also concerning the idea I suggested in #5793 (review).

@Gargron

This comment has been minimized.

Copy link
Member

commented Nov 25, 2017

I meant when loading some specific account's timeline.

I still don't think this could have any effect on that? As far as I remember account timelines use statuses directly without querying through stream_entries first - only the OStatus Atom feed does that, and it's never accessed programmatically from Mastodon code.

@abcang

This comment has been minimized.

Copy link
Collaborator

commented Nov 25, 2017

I also recognized that stream_entries is currently used only for atom feed. However, instances that do not correspond to ActivityPub are accessing atom feed, so I think that it will be effective for instances connected to such instances. Also, I think that it is more effective for instances that have many stream_entries like Pawoo.

@takayamaki takayamaki force-pushed the takayamaki:add-index-on-stream_entries-table branch 2 times, most recently from f063861 to b6355bb Nov 29, 2017

@takayamaki

This comment has been minimized.

Copy link
Contributor Author

commented Nov 29, 2017

Thank you for review.
It is more faster.

mastodon=> explain analyze SELECT "stream_entries".* FROM "stream_entries" WHERE "stream_entries"."activity_type" = 'Status' AND "stream_entries"."account_id" = ***** AND "stream_entries"."hidden" = 'f' ORDER BY "stream_entries"."id" DESC L
IMIT 20;
                                                                  QUERY PLAN                                                                   
-----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..78.54 rows=20 width=48) (actual time=0.037..0.078 rows=20 loops=1)
   ->  Index Scan Backward using test6 on stream_entries  (cost=0.43..50676.88 rows=12975 width=48) (actual time=0.035..0.067 rows=20 loops=1)
         Index Cond: ((account_id = 1) AND ((activity_type)::text = 'Status'::text))
         Filter: (NOT hidden)
 Planning time: 0.178 ms
 Execution time: 0.100 ms
(6 rows)

mastodon=>
@takayamaki

This comment has been minimized.

Copy link
Contributor Author

commented Nov 29, 2017

This statement was executed 45974 times in the last 6 days at imastodon.net.
If average execution time is 100 ms, it takes 12.77 minutes per day.

I think this is still necessary.

@abcang

abcang approved these changes Nov 30, 2017

Addressed

@Gargron Gargron merged commit dc1ebd4 into tootsuite:master Nov 30, 2017

2 checks passed

codeclimate All good!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@takayamaki takayamaki deleted the takayamaki:add-index-on-stream_entries-table branch Nov 30, 2017

cobodo pushed a commit to cobodo/mastodon that referenced this pull request Dec 6, 2017

abcang added a commit to pixiv/mastodon that referenced this pull request Dec 18, 2017

abcang added a commit to pixiv/mastodon that referenced this pull request Dec 18, 2017

kaniini added a commit to kaniini/mastodon that referenced this pull request Mar 9, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.