User counters using PostgreSQL JSON #5373

Merged (10 commits) on Feb 3, 2020
Conversation

@rhymes (Contributor) commented on Jan 6, 2020

What type of PR is this? (check all applicable)

  • Refactor
  • Feature
  • Bug Fix
  • Optimization
  • Documentation Update

Description

This PR is related to a discussion we had in #5310 about creating a proof of concept of how counters could work in a JSONB column in PostgreSQL.

A few details:

  • users has a 1:1 relationship with user_counters
  • counters can be accessed as either user.counters.data[:comments_these_7_days] or user.counters.comments_these_7_days, for both reads and writes
  • there's a generic index on the JSONB column; indexes can be adjusted for specific paths if needed, but for now it should be enough, since different index types behave differently on JSONB columns

Example of the column content (after running the included task):

PracticalDeveloper_development=# select id, user_id, data from user_counters;
 id | user_id |                           data
----+---------+----------------------------------------------------------
  1 |       1 | {"comments_prior_7_days": 0, "comments_these_7_days": 4}
  2 |       2 | {"comments_prior_7_days": 0, "comments_these_7_days": 3}
  3 |       3 | {"comments_prior_7_days": 0, "comments_these_7_days": 3}
  4 |       4 | {"comments_prior_7_days": 0, "comments_these_7_days": 2}
  5 |       5 | {"comments_prior_7_days": 0, "comments_these_7_days": 2}
  6 |       6 | {"comments_prior_7_days": 0, "comments_these_7_days": 3}
  7 |       7 | {"comments_prior_7_days": 0, "comments_these_7_days": 1}
  8 |       8 | {"comments_prior_7_days": 0, "comments_these_7_days": 1}
  9 |       9 | {"comments_prior_7_days": 0, "comments_these_7_days": 6}
 10 |      10 | {"comments_prior_7_days": 0, "comments_these_7_days": 5}

Related Tickets & Documents

#5310

@rhymes rhymes requested a review from a team January 6, 2020 12:31
@pr-triage bot added the PR: unreviewed label on Jan 6, 2020
@rhymes (Contributor, Author) left a comment

Left a few comments :)

@@ -148,6 +149,16 @@ class User < ApplicationRecord

  scope :dev_account, -> { find_by(id: ApplicationConfig["DEVTO_USER_ID"]) }

  scope :with_this_week_comments, lambda { |number|
@rhymes (Contributor, Author):

I included a few scopes as an example of how to use JSONB operators for queries.
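
For illustration, here is a minimal sketch of how the truncated scope above could continue; the lambda body is an assumption (it is not shown in the diff) and uses PostgreSQL's ->> operator to read a key out of the JSONB column:

# Hypothetical body for the scope shown above; ->> returns the value as text,
# so it is cast to an integer before comparing.
scope :with_this_week_comments, lambda { |number|
  joins(:counters).where(
    "(user_counters.data ->> 'comments_these_7_days')::integer >= ?", number
  )
}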

Contributor:

From a file length perspective, how do you feel about maybe moving all scopes to a separate file one way or another? If we lean in here I could see this getting long.

Contributor:

Oh I personally like the idea of at least counter scopes and even related methods being separate! We did something similar at Kenna with a model that had extensive status options and it worked out really well. Very easy to track down anything status-related because it was all contained in a single file.

@rhymes (Contributor, Author):

I'm also down for organizing files in a sane way; some of our models are way too big!

class UserCounter < ApplicationRecord
  belongs_to :user

  serialize :data, HashSerializer
@rhymes (Contributor, Author):

This takes care of the conversion between hash and JSON.
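
For context, such a serializer is typically a small class with dump/load methods; a minimal sketch (assumed, not necessarily the exact file in this PR) could look like:

# Assumed implementation of a serializer for a jsonb column.
class HashSerializer
  def self.dump(hash)
    hash
  end

  def self.load(hash)
    (hash || {}).with_indifferent_access
  end
end

Returning a hash with indifferent access from load is what lets callers use symbol keys such as data[:comments_these_7_days].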

  belongs_to :user

  serialize :data, HashSerializer
  store_accessor :data, :comments_these_7_days, :comments_prior_7_days
@rhymes (Contributor, Author):

This creates the accessor that allows us to do:

user.counters.comments_these_7_days

in addition to

user.counters.data[:comments_these_7_days]
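
For example (illustrative values, assuming a persisted counters row):

user.counters.comments_these_7_days          # => 4
user.counters.comments_these_7_days = 5      # writes into the JSONB data hash
user.counters.data[:comments_these_7_days]   # => 5
user.counters.save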

Comment on lines +9 to +10
  validates :comments_these_7_days, numericality: { only_integer: true }
  validates :comments_prior_7_days, numericality: { only_integer: true }
@rhymes (Contributor, Author):

We can add basic validators to the JSON fields.

class CreateUserCounters < ActiveRecord::Migration[5.2]
  def change
    create_table :user_counters do |t|
      t.references :user, index: { unique: true }
@rhymes (Contributor, Author):

The index is unique to make sure that we only have one row per user.
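
Putting the details from the description together, the rest of the migration presumably adds the jsonb column and the generic index on it. A sketch under those assumptions (the default value, null constraint, and GIN index type are guesses, not taken from the PR):

class CreateUserCounters < ActiveRecord::Migration[5.2]
  def change
    create_table :user_counters do |t|
      t.references :user, index: { unique: true }
      t.jsonb :data, null: false, default: {}

      t.timestamps
    end

    # A generic GIN index covers most JSONB operators; expression indexes on
    # specific paths can be added later if particular queries need them.
    add_index :user_counters, :data, using: :gin
  end
end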

@rhymes changed the title from "[POC] User counters using in PostgreSQL json" to "[WIP] [POC] User counters using in PostgreSQL json" on Jan 6, 2020
@pr-triage bot removed the PR: unreviewed label on Jan 6, 2020
@mstruve (Contributor) left a comment

This is definitely the kind of data I would store in Elasticsearch, but I love this postgres approach. We will see how it scales, and once ES is up and running we can do performance tests and comparisons if we want to.

task update_users: :environment do
  ActiveRecord::Base.transaction do
    # NOTE: we could bypass Rails by using SQL directly to compute the counts and update these columns
    User.includes(:counters).find_each do |user|
Contributor:

We should log the duration of this in Datadog so that we can track how long it takes and if it starts taking a really long time, then we might want to skip the Rails interface in an attempt to speed it up.

Contributor:

How often do you plan on running this?

@rhymes (Contributor, Author):

I don't know yet, I copied the idea of the task from #5310 :D

It really is a POC for now

Contributor:

I think this can be run daily, or maybe split it up into sections and run it more often.

I see this as only having to be as accurate as we need it to be so we can play around with this.

It immediately enables us to ask the question "how many people had more comments this week than last?" and I think we can flexibly tie the functionality to the questions we want to ask and how often we want to ask them.

As a proof of concept this all looks good to me @rhymes.

@mstruve when we get up and running with Elastic we can start defining the useful borders of where to use each, and maybe we won't need this at all.

@rhymes (Contributor, Author):

> It immediately enables us to ask the question "how many people had more comments this week than last?" and I think we can flexibly tie the functionality to the questions we want to ask and how often we want to ask them.

This can probably be answered with a single SQL query :) We should probably start thinking about storing these questions in dedicated query objects outside the perimeter of the model, since they will most likely be used only for reports or dashboards.
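
As a hypothetical illustration of such a query object (the class name and structure are assumptions, not part of this PR), the question above comes down to a single JSONB comparison:

# Hypothetical query object: "how many people had more comments this week than last?"
# ->> returns text, so both counters are cast to integers before comparing.
class UsersWithMoreCommentsThisWeekQuery
  def self.call
    UserCounter.where(
      "(data ->> 'comments_these_7_days')::integer > (data ->> 'comments_prior_7_days')::integer"
    ).count
  end
end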

@rhymes (Contributor, Author):

@mstruve how do we log the duration? :-) I'm not familiar with the DD library

Contributor:

Currently, we do not log the duration of the rake tasks because we have turned off Honeycomb and Datadog for the schedulers. If you want to track the duration of this task, I would put it in a worker and kick off the worker from the scheduler.

Contributor:

Let's move this to a Sidekiq worker when we have a chance so that we can track how long it takes and, if we want, send some of the count data to Datadog.
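
A rough sketch of what that could look like (the worker name, queue, and timing approach are assumptions, not code from this PR):

# Hypothetical Sidekiq worker wrapping the rake task's logic so its duration
# can be measured and reported.
class UserCountersUpdateWorker
  include Sidekiq::Worker

  sidekiq_options queue: :low_priority, retry: 5

  def perform
    started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)

    ActiveRecord::Base.transaction do
      User.includes(:counters).find_each do |user|
        # counter computation goes here, as in the rake task
      end
    end

    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started_at
    Rails.logger.info("user_counters.update took #{elapsed.round(2)}s")
    # From here the duration (and selected counts) could also be forwarded to Datadog.
  end
end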

@rhymes (Contributor, Author) left a comment

Let me know what we want to do with this PR. It was meant as a POC, but if we like this approach we can get it merged and then add the fields to it as we review the user counters code around the codebase.

cc @mstruve @benhalpern


@mstruve (Contributor) left a comment

I'm game to give this a shot! Want to pull the WIP off the title?


@rhymes changed the title from "[WIP] [POC] User counters using in PostgreSQL json" to "User counters using in PostgreSQL json" on Feb 1, 2020
@pr-triage bot added the PR: reviewed-approved label on Feb 1, 2020
@rhymes changed the title from "User counters using in PostgreSQL json" to "User counters using PostgreSQL JSON" on Feb 1, 2020
@mstruve merged commit a1835ae into forem:master on Feb 3, 2020
@pr-triage bot added the PR: merged label and removed the PR: reviewed-approved label on Feb 3, 2020
@rhymes deleted the rhymes/users_counters_jsonb branch on February 3, 2020 15:36