Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move status counters to separate table, count replies #8104

Merged
merged 3 commits into from
Aug 14, 2018

Conversation

Gargron
Copy link
Member

@Gargron Gargron commented Jul 31, 2018

See #7908 for separate table justification

Also, Rails 5.2 had a breaking change in cache_key behaviour, which removed the updated_at timestamp from the key! Therefore, the cache was always stale. cache_version = false restores old behaviour we were expecting...

The data from statuses table is copied in a migration using UPSERT mechanics introduced in Postgres 9.5 to avoid conflicts.

Instructions:

  1. Run SKIP_POST_DEPLOYMENT_MIGRATIONS=true rails db:migrate first
  2. Restart Mastodon
  3. Run rails db:migrate

See also: Showing replies_count in web UI (#8181)

@Gargron Gargron force-pushed the feature-status-stats-table branch from a5b256b to 75fae7b Compare July 31, 2018 01:01
else
Status.where(id: status_id).update_all('favourites_count = COALESCE(favourites_count, 0) + 1')
StatusStat.where(status_id: status_id).update_all('favourites_count = COALESCE(favourites_count, 0) + 1')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Crap, just realized the record may not exist when this is executed. Hmm.

@Gargron Gargron force-pushed the feature-status-stats-table branch 3 times, most recently from 87a1fbc to a508cb5 Compare August 12, 2018 17:30
@Gargron
Copy link
Member Author

Gargron commented Aug 12, 2018

@unarist @abcang This is a big change (in terms of data, not size of PR), so could you review it? In terms of performance implications and design...

@Gargron Gargron force-pushed the feature-status-stats-table branch 2 times, most recently from 3e3c145 to 85e5448 Compare August 12, 2018 18:33
@aschmitz
Copy link
Contributor

aschmitz commented Aug 12, 2018

I don't believe the CopyStatusStats migration will lock (or block) here, so that's fine as far as it goes. (Postgres' MVCC handling will keep old rows from being vacuumed while it's running, but that's not a big deal, it's not like it should be running for days or anything.)

I'd suggest adding a trigger to update these stats, though, along the lines of https://github.com/tootsuite/mastodon/blob/master/lib/mastodon/migration_helpers.rb#L698 . (Otherwise, any changes between this migration being started and the code being rolled out will be omitted.) Even with that trigger, the ON CONFLICT DO UPDATE will overwrite any values that change during the migration, although I'm not sure there's a much better way to do it. (You could take the old value if the overwriting value would be zero, which is probably more right more of the time, but there's no good way without making the migration significantly more complicated. A sketch on how to do that is below if you want it.)

Other than that race, this looks fine. I'm not sure if you really need timestamps on StatusStat, but other than doubling the amount of space used for status timestamps, it's not really hurting anything (and could possibly be useful, since the update timestamp on Status won't catch these changing any longer).

You'll need another migration to backfill reply counts, but I assume you know that.

If you really want to not lose updated counts during that migration, one option would be to:

  1. Create the table with nullable counts and a default value of null.
  2. Add triggers to update status_stats rows when reblogs_ or favourites_count is updated. (They may need to create the row in the first place, but that's doable. Don't forget to set timestamps if you leave them in.)
  3. Run essentially the same INSERTyou have now, but only overwrite columns if they're null in the status_stats row you'd be overwriting. (Otherwise they changed during the migration.)
  4. Run change_column_default(:status_stats, :reblogs_count, 0), use MigrationHelpers' update_column_in_batches(:status_stats, :reblogs_count, 0), and then change_column_null(:status_stats, :reblogs_count, false) (and the same for favourites_count).

(In theory, you'll want to do the same when filling in the replies_count, but that query will probably take a while, and you'll want the trigger to not just increment/decrement the count, but to calculate it correctly each time, because there's no "correct" value to reference.)

@Gargron
Copy link
Member Author

Gargron commented Aug 12, 2018

@aschmitz Thank you for the assessment! I feel more confident about this update now. As for counter accuracy, I think it is negligible (some super old posts have negative counters due to the first ever migration close to two years ago). For example, remote posts only count local favourites/reblogs anyway, so we're not losing correctness because we don't have any.

@Gargron Gargron added the performance Runtime performance label Aug 12, 2018
class CreateStatusStats < ActiveRecord::Migration[5.2]
def change
create_table :status_stats do |t|
t.belongs_to :status, foreign_key: { on_delete: :cascade }, index: { unique: true }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it is better to set null: false. Also, it seems that foreign_key is not reflected in db/schema.rb.

@@ -26,6 +24,8 @@
#

class Status < ApplicationRecord
self.cache_versioning = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be better to stop changing cache_versioning and stop using cache_key directly.

def change
safety_assured do
remove_column :statuses, :reblogs_count, :integer, default: 0, null: false
remove_column :statuses, :favourites_count, :integer, default: 0, null: false
Copy link
Contributor

@aschmitz aschmitz Aug 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These will cause any existing uses of these columns to throw errors until the running code is also reloaded: I might suggest waiting on this for a later release, so most people won't have old code running when these columns are removed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aschmitz It's a two-step migration, see OP

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right-o. Didn't realize that had changed. (Or that this was in post_migrate.) Carry on, then!

@Gargron
Copy link
Member Author

Gargron commented Aug 14, 2018

@abcang Fixed

@Gargron Gargron merged commit 8e111b7 into master Aug 14, 2018
@Gargron Gargron deleted the feature-status-stats-table branch August 14, 2018 17:19
kyori19 pushed a commit to kyori19/mastodon that referenced this pull request Sep 20, 2018
* Move status counters to separate table, count replies

* Migration to remove old counter columns from statuses table

* Fix schema file
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Runtime performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants