Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand Oban usage for logging, rerror reporting, and periodic jobs #378

Merged

Conversation

sorentwo
Copy link
Contributor

Hello!

Changelog switched to Oban a little while ago, and there were a few missing integrations and potential improvements I saw:

  • Upgrade Oban to the recently released 2.8 to make use of the time unit syntax
  • Unlock a variety of unused packages that aren't used anymore
  • Use Oban's telemetry events for logging and error reporting to sentry
  • Swap out Quantum for Oban to run periodic jobs (along with some refactoring to isolate individual runs)
  • Make use of additions to the Date module rather than Timex and Timex.Interval

Thanks for all your news and podcasts; they're a definite favorite 💛

@jerodsanto
Copy link
Member

This is AMAZING!

I plan on diving in and checking everything out this afternoon or tomorrow, but just the fact that this PR exists gives me so many warm fuzzies. 💚💚💚

What prompted this effort @sorentwo?

This converts the stats module into an oban worker for cron execution.
When the cron job executes it inserts individual jobs for all podcasts
over the past few days. That adds resiliency and retryability, without
having to re-process the entire batch.
This adds a new worker rather than modifying `NewsQueue` because there
are other manual callers for the `publish` function.
@sorentwo sorentwo force-pushed the chore/move-to-oban-for-cron branch from 9fb28e1 to 37be30f Compare August 2, 2021 16:57
Oban.Plugins.Stager,
{Oban.Plugins.Cron,
timezone: "US/Central",
crons: [
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically these would go in the primary config, but to mimic the "prod only tasks" that were already set up I've only defined them in prod.exs.

@@ -22,6 +22,5 @@ config :changelog, Changelog.Repo,
username: System.get_env("DB_USER", "postgres")

config :changelog, Oban,
crontab: false,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hasn't been necessary for a while now. The top-level crontab option is deprecated, as CRON is implemented as a plugin now.

Supervisor.start_link(children, opts)
# Only attach the telemetry logger when we aren't in an IEx shell
unless Code.ensure_loaded?(IEx) && IEx.started?() do
Oban.Telemetry.attach_default_logger(:info)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gets start/stop/error type span logging for all jobs. Rather than sprinkling custom logging into the workers you can debug and get timing automatically.

unless Code.ensure_loaded?(IEx) && IEx.started?() do
Oban.Telemetry.attach_default_logger(:info)

Changelog.ObanReporter.attach()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gets native sentry errors when a job encounters an error. Without this, the errors are silent and only visible in the errors column stored on the job record in the DB.

@@ -22,7 +20,7 @@ defmodule Changelog.ObanWorkers.CommentNotifier do
"""
def schedule_notification(%NewsItemComment{id: id}) do
%{comment_id: id}
|> __MODULE__.new(schedule_in: @five_mins)
|> new(schedule_in: {5, :minutes})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I've switched to the brand new time unit syntax.

end

results = Oban.drain_queue(queue: :scheduled, with_recursion: true, with_safety: false)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The drain_queue/1 function will execute all of the jobs in this process. The with_recursion option means that it will keep executing new jobs after the previous one finishes, which is required to process the sub-jobs generated by the default case. The with_safety flag disables "safe execution", any errors that are encountered will bubble up to the caller so you can see them from the CLI.

@@ -27,13 +27,11 @@
"ex_aws_s3": {:hex, :ex_aws_s3, "2.3.0", "5dfe50116bad048240bae7cd9418bfe23296542ff72a01b9138113a1cd31451c", [:mix], [{:ex_aws, "~> 2.0", [hex: :ex_aws, repo: "hexpm", optional: false]}, {:sweet_xml, ">= 0.0.0", [hex: :sweet_xml, repo: "hexpm", optional: true]}], "hexpm", "0b13b11478825d62d2f6e57ae763695331be06f2216468f31bb304316758b096"},
"ex_machina": {:hex, :ex_machina, "2.7.0", "b792cc3127fd0680fecdb6299235b4727a4944a09ff0fa904cc639272cd92dc7", [:mix], [{:ecto, "~> 2.2 or ~> 3.0", [hex: :ecto, repo: "hexpm", optional: true]}, {:ecto_sql, "~> 3.0", [hex: :ecto_sql, repo: "hexpm", optional: true]}], "hexpm", "419aa7a39bde11894c87a615c4ecaa52d8f107bbdd81d810465186f783245bf8"},
"excoveralls": {:hex, :excoveralls, "0.14.2", "f9f5fd0004d7bbeaa28ea9606251bb643c313c3d60710bad1f5809c845b748f0", [:mix], [{:hackney, "~> 1.16", [hex: :hackney, repo: "hexpm", optional: false]}, {:jason, "~> 1.0", [hex: :jason, repo: "hexpm", optional: false]}], "hexpm", "ca6fd358621cb4d29311b29d4732c4d47dac70e622850979bc54ed9a3e50f3e1"},
"exjsx": {:hex, :exjsx, "4.0.0", "60548841e0212df401e38e63c0078ec57b33e7ea49b032c796ccad8cde794b5c", [:mix], [{:jsx, "~> 2.8.0", [hex: :jsx, repo: "hexpm", optional: false]}], "hexpm", "32e95820a97cffea67830e91514a2ad53b888850442d6d395f53a1ac60c82e07"},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the packages that are removed here came from running mix deps.unlock --unused.

use Changelog.SchemaCase
use Oban.Testing, repo: Changelog.Repo
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there were more Oban tests, this could go right in Changelog.SchemaCase. However, since this is the only module with Oban-specific tests, I only stuck it in here.

podcast1 = insert(:podcast)
podcast2 = insert(:podcast)

{:ok, jobs} = perform_job(StatsProcessor, %{})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The perform_job/2,3 testing helper verifies the arguments and return values.

)
end

def handle_event([:oban, :job, _], measure, meta, _) do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bulk of this was pulled directly from the Oban README.

The slack importer now uses one call for each update, rather than
fetching and separately writing for each member.
@sorentwo sorentwo force-pushed the chore/move-to-oban-for-cron branch from 37be30f to 9b57862 Compare August 2, 2021 17:16
@sorentwo
Copy link
Contributor Author

sorentwo commented Aug 2, 2021

What prompted this effort @sorentwo?

@jerodsanto After the recent "Kaizen" episode where you discussed stats, I thought I'd take a look to see how stats are imported. The task's implementation struck me as a great place to expand on Oban use. After I dove into that, I noticed that there were a few gaps (logging, error reporting, lack of CRON plugin), and I got a little carried away 😄

One other important factor is that The Changelog is a prominent OSS Elixir/Phoenix project. So if people are perusing the source, I'd like to have some idiomatic Oban examples sprinkled around.

To make the changes more digestible, I've done a self-review that explains my thought process and a few functions.

@adamstac
Copy link
Member

adamstac commented Aug 2, 2021

Super cool @sorentwo 💚

Love this thinking too...

One other important factor is that The Changelog is a prominent OSS Elixir/Phoenix project. So if people are perusing the source, I'd like to have some idiomatic Oban examples sprinkled around.

@jerodsanto jerodsanto merged commit aa47bbc into thechangelog:master Aug 2, 2021
@jerodsanto
Copy link
Member

@all-contributors please add @sorentwo for code

@allcontributors
Copy link
Contributor

@jerodsanto

I've put up a pull request to add @sorentwo! 🎉

@jerodsanto
Copy link
Member

Merged and deployed! I'll be monitoring it over the next couple days to make sure all of the background jobs run as expected. Thanks again, so cool! 💚💚💚

@sorentwo sorentwo deleted the chore/move-to-oban-for-cron branch August 2, 2021 20:57
@sorentwo
Copy link
Contributor Author

sorentwo commented Aug 2, 2021

Awesome! That was a quick turnaround. Let me know if you run into any issues 👍

@gerhard
Copy link
Member

gerhard commented Aug 2, 2021

This was great to read, thank you @sorentwo for this amazing contribution!

It also had the unintended side-effect of showing us where we need to improve code rollouts:

image

image

This is what happened:

image

And this is what we need to do next:

If a new app version cannot boot in prod, it should not be taken live (a.k.a. placed in the service). I suspect that this is related to the app being configured with latest, rather than a specific SHA. Double-check that this is the case, and address it 🚒

Thanks for highlighting this @sorentwo, and nice catch @jerodsanto 😉


Not to worry, changelog.com was not affected, just the latency was slightly higher than normal:

image
👆🏻 @tomwilkie

👇🏻 @sjwhitworth @evnsio @petehamilton
image

@sorentwo
Copy link
Contributor Author

sorentwo commented Aug 2, 2021

Doh! Sorry for the trouble. That’s what I get for the prod only config. Glad you were resilient @gerhard and thanks for the fix @jerodsanto!

@gerhard
Copy link
Member

gerhard commented Aug 2, 2021

On the contrary! This is my best reward for episode 10, thank you Parker 😉

We learn more from failure than success. @pfiaux mentioned that a few weeks back, and this is exactly that situation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants