Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve API controllers efficiency by selecting only columns needed for serialization #5805

Merged
merged 18 commits into from
Feb 6, 2020
Merged

Improve API controllers efficiency by selecting only columns needed for serialization #5805

merged 18 commits into from
Feb 6, 2020

Conversation

rhymes
Copy link
Contributor

@rhymes rhymes commented Jan 29, 2020

What type of PR is this? (check all applicable)

  • Refactor
  • Feature
  • Bug Fix
  • Optimization
  • Documentation Update

Description

The common thread here is avoiding SELECT * as much as possible. Especially with tables like users and articles that have respectively 150 and 90 columns:

PracticalDeveloper_development> select table_name, count(*) from information_schema.columns group by table_name having table_name in ('articles', 'users', 'organizations', 'podcast_episodes', 'podcasts', 'tags', 'tweets') order by count d
 esc;
+------------------+---------+
| table_name       | count   |
|------------------+---------|
| users            | 150     |
| articles         | 90      |
| organizations    | 42      |
| tweets           | 30      |
| podcast_episodes | 29      |
| tags             | 25      |
| podcasts         | 21      |
+------------------+---------+

thus I went through the code and did the following:

  • add .select(attributes) for all queries I could add it for
  • add some specs

This should have us some memory

@rhymes rhymes requested a review from a team January 29, 2020 08:18
@pr-triage pr-triage bot added the PR: unreviewed bot applied label for PR's with no review label Jan 29, 2020
@rhymes rhymes removed the request for review from a team January 29, 2020 08:18
app/controllers/api/v0/articles_controller.rb Outdated Show resolved Hide resolved
@@ -70,6 +75,25 @@ def me

private

INDEX_ATTRIBUTES_FOR_SERIALIZATION = %i[
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you'll notice this pattern in the entire PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something I really like about https://github.com/Netflix/fast_jsonapi, it forces you to be more explicit about the attributes that get serialized:

https://github.com/Netflix/fast_jsonapi#serializer-definition

We already have the gem installed, but only use it for a handful of serializers (see app/serializers/, they are all related to the Stackbit webhooks).

Anyway, your effort here seems to also be about reducing memory consumption when instantiating objects pre-serialization, not only about reducing payload size over the wire.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fast_jsonapi is in the back of my mind for API v1, as are JSON Schema and OpenAPI with things like https://github.com/interagent/committee

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fast JSON has been something I have been dying to use since working with Elasticsearch requires so much serialization I think it could really speed things up when we hit higher volumes.

existing_user_repos = Set.new(existing_user_repos)

existing_user_repos = current_user.github_repos.where(featured: true).
distinct.pluck(:github_id_code)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.distinct.pluck tells the DB to select only unique values, so we don't have to use Set.new or .to_set in memory

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL! 🔥

unless fetched_repo
render json: "error: Could not find Github repo", status: :not_found
return
end

@repo = GithubRepo.find_or_create(fetched_repo_params(fetched_repo))
repo = GithubRepo.find_or_create(fetched_repo_params(fetched_repo))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to store it in an instance variable, as it's only used in this method

Comment on lines +19 to +21
client = create_octokit_client

fetched_repo = fetch_repo(client)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

passed arguments over instance variables is a way of life :D

# this is fully documented in the API docs
# see <https://github.com/thepracticaldev/dev.to/issues/4206> for more details
json.tag_list article.cached_tag_list_array
json.tags article.cached_tag_list
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added this note because I had to keep these two attributes out of the partial, some of us are familiar with this discrepancy but it's still a good idea to document it

Comment on lines +3 to +8
# /api/articles and /api/articles/:id have opposite representations
# of `tag_list` and `tags and we can't align them without breaking the API,
# this is fully documented in the API docs
# see <https://github.com/thepracticaldev/dev.to/issues/4206> for more details
json.tag_list @article.cached_tag_list
json.tags @article.cached_tag_list_array
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added this note because I had to keep these two attributes out of the partial, some of us are familiar with this discrepancy but it's still a good idea to document it

@@ -8,4 +8,3 @@ json.tags listing.tag_list
json.category listing.category
json.processed_html listing.processed_html
json.published listing.published
json.slug listing.slug
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed this as json.slug is already serialized at the top of the file

@@ -0,0 +1,5 @@
json.array! follows do |follow|
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shared partial

get :users
get :tags
get :organizations
get :podcasts
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this have been changed into namespaces because we are using none of the CRUD operations created by resources. We don't need to generate all those routes

Copy link
Contributor

@mstruve mstruve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks FANTASTIC! There is a lot here and I only was able to give it a quick pass as its time for me to make my way to the airport for Mohonk but once I give it another thorough look I think this is going to pay off big time.

error_not_found unless @chat_channel.has_member?(current_user)
end

SHOW_ATTRIBUTES_FOR_SERIALIZATION = %i[id description channel_name].freeze
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOVE these constants!

existing_user_repos = Set.new(existing_user_repos)

existing_user_repos = current_user.github_repos.where(featured: true).
distinct.pluck(:github_id_code)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL! 🔥

@@ -36,7 +42,7 @@ def valid_user
user
end

def organization_article?(reaction)
def org_article?(reaction)
reaction.reactable_type == "Article" && reaction.reactable.organization_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we update this to check for the actual organization rather than just the id in the event that the organization is removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't know if we want to do it in the same PR but yes!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Holding you too that so I dont see it in HB in the future 😉

json.published_at @article.published_at&.utc&.iso8601
json.last_comment_at @article.last_comment_at&.utc&.iso8601
json.published_timestamp @article.published_timestamp
json.partial! "article", article: @article
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super awesome refactor with definitely save us from getting out of sync at some point

spec/requests/api/v0/chat_channels_spec.rb Outdated Show resolved Hide resolved
expect(json_response.first["slug"]).to eq(ClassifiedListing.last.slug)
expect(json_response.first["user"]).to include("username")
expect(json_response.first["user"]["username"]).not_to be_empty
expect(response.parsed_body.size).to eq(7)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parsed_body is seriously one of my new favorite shortcuts!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahah learned it from you :D

app/controllers/api/v0/articles_controller.rb Outdated Show resolved Hide resolved
@@ -70,6 +75,25 @@ def me

private

INDEX_ATTRIBUTES_FOR_SERIALIZATION = %i[
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something I really like about https://github.com/Netflix/fast_jsonapi, it forces you to be more explicit about the attributes that get serialized:

https://github.com/Netflix/fast_jsonapi#serializer-definition

We already have the gem installed, but only use it for a handful of serializers (see app/serializers/, they are all related to the Stackbit webhooks).

Anyway, your effort here seems to also be about reducing memory consumption when instantiating objects pre-serialization, not only about reducing payload size over the wire.

app/controllers/api/v0/articles_controller.rb Outdated Show resolved Hide resolved
app/controllers/api/v0/followings_controller.rb Outdated Show resolved Hide resolved
positive_reactions_count created_at edited_at last_comment_at published
updated_at
].freeze
private_constant :INDEX_ATTRIBUTES_FOR_SERIALIZATION
Copy link
Contributor

@citizen428 citizen428 Jan 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite the singular name, private_constant does take a list of constants to make private. Maybe this makes the repetition a bit more bearable.

private_constant :INDEX_ATTRIBUTES_FOR_SERIALIZATION,
                 :SHOW_ATTRIBUTES_FOR_SERIALIZATION,
                 :ME_ATTRIBUTES_FOR_SERIALIZATION

Copy link
Contributor

@citizen428 citizen428 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is an awesome PR, thanks for this @rhymes 🎉

In general — and we talked about this before — I think we do a bit too much AR directly in the controller, whereas in the past I tended towards query objects as outlined in this post: https://codeclimate.com/blog/7-ways-to-decompose-fat-activerecord-models/

But that's a wider discussion since it introduces a whole new pattern.

@pr-triage pr-triage bot added PR: reviewed-approved bot applied label for PR's where reviewer approves changes and removed PR: unreviewed bot applied label for PR's with no review labels Jan 31, 2020
Copy link
Contributor

@mstruve mstruve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

@@ -70,6 +75,25 @@ def me

private

INDEX_ATTRIBUTES_FOR_SERIALIZATION = %i[
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fast JSON has been something I have been dying to use since working with Elasticsearch requires so much serialization I think it could really speed things up when we hit higher volumes.

@@ -36,7 +42,7 @@ def valid_user
user
end

def organization_article?(reaction)
def org_article?(reaction)
reaction.reactable_type == "Article" && reaction.reactable.organization_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Holding you too that so I dont see it in HB in the future 😉

@rhymes
Copy link
Contributor Author

rhymes commented Feb 1, 2020

fast JSON has been something I have been dying to use since working with Elasticsearch requires so much serialization I think it could really speed things up when we hit higher volumes.

One thing we could already do is to use Oj as a serializer (which we're using for "fast json" already) for ActiveSupport::JSON::Encoding. I have this enabled in another project who serializes A LOT and it was very helpful when we switched. I haven't proposed this yet because I have no idea if our API endpoints are fast or slow TBH

In that project I have a config/initializers/json_encoding.rb like this:

# see https://precompile.com/2015/07/25/rails-activesupport-json.html
module ActiveSupport
  module JSON
    module Encoding
      # Use Oj as JSON encoder
      class OjEncoder < JSONGemEncoder
        def encode(value)
          ::Oj.dump(value.as_json, mode: :compat)
        end
      end
    end
  end
end

ActiveSupport.json_encoder = ActiveSupport::JSON::Encoding::OjEncoder

Holding you too that so I dont see it in HB in the future

as soon as we merge this I'll send a PR for that

@mstruve mstruve merged commit fd5643b into forem:master Feb 6, 2020
@pr-triage pr-triage bot added PR: merged bot applied label for PR's that are merged and removed PR: reviewed-approved bot applied label for PR's where reviewer approves changes labels Feb 6, 2020
@rhymes rhymes deleted the rhymes/improve-api-controllers-efficiency branch February 6, 2020 12:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR: merged bot applied label for PR's that are merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants