hn trends
toddwschneider committed May 12, 2019
0 parents commit bc1ba6d
Showing 93 changed files with 2,022 additions and 0 deletions.
27 changes: 27 additions & 0 deletions .gitignore
@@ -0,0 +1,27 @@
# See https://help.github.com/articles/ignoring-files for more about ignoring files.
#
# If you find yourself ignoring temporary files generated by your text editor
# or operating system, you probably want to add a global ignore instead:
# git config --global core.excludesfile '~/.gitignore_global'

# Ignore bundler config.
/.bundle

# Ignore all logfiles and tempfiles.
/log/*
/tmp/*
!/log/.keep
!/tmp/.keep

# Ignore uploaded files in development
/storage/*
!/storage/.keep

/node_modules
/yarn-error.log

/public/assets
.byebug_history

# Ignore master key for decrypting credentials and more.
/config/master.key
1 change: 1 addition & 0 deletions .ruby-version
@@ -0,0 +1 @@
2.6.3
39 changes: 39 additions & 0 deletions Gemfile
@@ -0,0 +1,39 @@
source 'https://rubygems.org'
git_source(:github) { |repo| "https://github.com/#{repo}.git" }

ruby '2.6.3'

gem 'rails', '~> 5.2.3'

gem 'addressable', '~> 2.6'
gem 'bootsnap', '>= 1.1.0', require: false
gem 'clockwork', '~> 2.0'
gem 'delayed_job_active_record', '~> 4.1'
gem 'foreman', '~> 0.85'
gem 'hashie', '~> 3.6'
gem 'httparty', '~> 0.17'
gem 'nokogiri', '~> 1.10'
gem 'pg', '>= 0.18', '< 2.0'
gem 'puma', '~> 3.11'
gem 'sass-rails', '~> 5.0'
gem 'typhoeus', '~> 1.3'
gem 'uglifier', '>= 1.3.0'

group :development, :test do
  gem 'byebug', platforms: [:mri, :mingw, :x64_mingw]
end

group :development do
  gem 'web-console', '>= 3.3.0'
  gem 'listen', '>= 3.0.5', '< 3.2'
  gem 'spring'
  gem 'spring-watcher-listen', '~> 2.0.0'
end

group :test do
  gem 'capybara', '>= 2.15'
  gem 'selenium-webdriver'
  gem 'chromedriver-helper'
end

gem 'tzinfo-data', platforms: [:mingw, :mswin, :x64_mingw, :jruby]
232 changes: 232 additions & 0 deletions Gemfile.lock
@@ -0,0 +1,232 @@
GEM
remote: https://rubygems.org/
specs:
actioncable (5.2.3)
actionpack (= 5.2.3)
nio4r (~> 2.0)
websocket-driver (>= 0.6.1)
actionmailer (5.2.3)
actionpack (= 5.2.3)
actionview (= 5.2.3)
activejob (= 5.2.3)
mail (~> 2.5, >= 2.5.4)
rails-dom-testing (~> 2.0)
actionpack (5.2.3)
actionview (= 5.2.3)
activesupport (= 5.2.3)
rack (~> 2.0)
rack-test (>= 0.6.3)
rails-dom-testing (~> 2.0)
rails-html-sanitizer (~> 1.0, >= 1.0.2)
actionview (5.2.3)
activesupport (= 5.2.3)
builder (~> 3.1)
erubi (~> 1.4)
rails-dom-testing (~> 2.0)
rails-html-sanitizer (~> 1.0, >= 1.0.3)
activejob (5.2.3)
activesupport (= 5.2.3)
globalid (>= 0.3.6)
activemodel (5.2.3)
activesupport (= 5.2.3)
activerecord (5.2.3)
activemodel (= 5.2.3)
activesupport (= 5.2.3)
arel (>= 9.0)
activestorage (5.2.3)
actionpack (= 5.2.3)
activerecord (= 5.2.3)
marcel (~> 0.3.1)
activesupport (5.2.3)
concurrent-ruby (~> 1.0, >= 1.0.2)
i18n (>= 0.7, < 2)
minitest (~> 5.1)
tzinfo (~> 1.1)
addressable (2.6.0)
public_suffix (>= 2.0.2, < 4.0)
archive-zip (0.12.0)
io-like (~> 0.3.0)
arel (9.0.0)
bindex (0.7.0)
bootsnap (1.4.4)
msgpack (~> 1.0)
builder (3.2.3)
byebug (11.0.1)
capybara (3.19.0)
addressable
mini_mime (>= 0.1.3)
nokogiri (~> 1.8)
rack (>= 1.6.0)
rack-test (>= 0.6.3)
regexp_parser (~> 1.2)
xpath (~> 3.2)
childprocess (1.0.1)
rake (< 13.0)
chromedriver-helper (2.1.1)
archive-zip (~> 0.10)
nokogiri (~> 1.8)
clockwork (2.0.3)
tzinfo
concurrent-ruby (1.1.5)
crass (1.0.4)
delayed_job (4.1.5)
activesupport (>= 3.0, < 5.3)
delayed_job_active_record (4.1.3)
activerecord (>= 3.0, < 5.3)
delayed_job (>= 3.0, < 5)
erubi (1.8.0)
ethon (0.12.0)
ffi (>= 1.3.0)
execjs (2.7.0)
ffi (1.10.0)
foreman (0.85.0)
thor (~> 0.19.1)
globalid (0.4.2)
activesupport (>= 4.2.0)
hashie (3.6.0)
httparty (0.17.0)
mime-types (~> 3.0)
multi_xml (>= 0.5.2)
i18n (1.6.0)
concurrent-ruby (~> 1.0)
io-like (0.3.0)
listen (3.1.5)
rb-fsevent (~> 0.9, >= 0.9.4)
rb-inotify (~> 0.9, >= 0.9.7)
ruby_dep (~> 1.2)
loofah (2.2.3)
crass (~> 1.0.2)
nokogiri (>= 1.5.9)
mail (2.7.1)
mini_mime (>= 0.1.1)
marcel (0.3.3)
mimemagic (~> 0.3.2)
method_source (0.9.2)
mime-types (3.2.2)
mime-types-data (~> 3.2015)
mime-types-data (3.2019.0331)
mimemagic (0.3.3)
mini_mime (1.0.1)
mini_portile2 (2.4.0)
minitest (5.11.3)
msgpack (1.2.10)
multi_xml (0.6.0)
nio4r (2.3.1)
nokogiri (1.10.3)
mini_portile2 (~> 2.4.0)
pg (1.1.4)
public_suffix (3.0.3)
puma (3.12.1)
rack (2.0.7)
rack-test (1.1.0)
rack (>= 1.0, < 3)
rails (5.2.3)
actioncable (= 5.2.3)
actionmailer (= 5.2.3)
actionpack (= 5.2.3)
actionview (= 5.2.3)
activejob (= 5.2.3)
activemodel (= 5.2.3)
activerecord (= 5.2.3)
activestorage (= 5.2.3)
activesupport (= 5.2.3)
bundler (>= 1.3.0)
railties (= 5.2.3)
sprockets-rails (>= 2.0.0)
rails-dom-testing (2.0.3)
activesupport (>= 4.2.0)
nokogiri (>= 1.6)
rails-html-sanitizer (1.0.4)
loofah (~> 2.2, >= 2.2.2)
railties (5.2.3)
actionpack (= 5.2.3)
activesupport (= 5.2.3)
method_source
rake (>= 0.8.7)
thor (>= 0.19.0, < 2.0)
rake (12.3.2)
rb-fsevent (0.10.3)
rb-inotify (0.10.0)
ffi (~> 1.0)
regexp_parser (1.4.0)
ruby_dep (1.5.0)
rubyzip (1.2.2)
sass (3.7.4)
sass-listen (~> 4.0.0)
sass-listen (4.0.0)
rb-fsevent (~> 0.9, >= 0.9.4)
rb-inotify (~> 0.9, >= 0.9.7)
sass-rails (5.0.7)
railties (>= 4.0.0, < 6)
sass (~> 3.1)
sprockets (>= 2.8, < 4.0)
sprockets-rails (>= 2.0, < 4.0)
tilt (>= 1.1, < 3)
selenium-webdriver (3.142.2)
childprocess (>= 0.5, < 2.0)
rubyzip (~> 1.2, >= 1.2.2)
spring (2.0.2)
activesupport (>= 4.2)
spring-watcher-listen (2.0.1)
listen (>= 2.7, < 4.0)
spring (>= 1.2, < 3.0)
sprockets (3.7.2)
concurrent-ruby (~> 1.0)
rack (> 1, < 3)
sprockets-rails (3.2.1)
actionpack (>= 4.0)
activesupport (>= 4.0)
sprockets (>= 3.0.0)
thor (0.19.4)
thread_safe (0.3.6)
tilt (2.0.9)
typhoeus (1.3.1)
ethon (>= 0.9.0)
tzinfo (1.2.5)
thread_safe (~> 0.1)
uglifier (4.1.20)
execjs (>= 0.3.0, < 3)
web-console (3.7.0)
actionview (>= 5.0)
activemodel (>= 5.0)
bindex (>= 0.4.0)
railties (>= 5.0)
websocket-driver (0.7.0)
websocket-extensions (>= 0.1.0)
websocket-extensions (0.1.3)
xpath (3.2.0)
nokogiri (~> 1.8)

PLATFORMS
ruby

DEPENDENCIES
addressable (~> 2.6)
bootsnap (>= 1.1.0)
byebug
capybara (>= 2.15)
chromedriver-helper
clockwork (~> 2.0)
delayed_job_active_record (~> 4.1)
foreman (~> 0.85)
hashie (~> 3.6)
httparty (~> 0.17)
listen (>= 3.0.5, < 3.2)
nokogiri (~> 1.10)
pg (>= 0.18, < 2.0)
puma (~> 3.11)
rails (~> 5.2.3)
sass-rails (~> 5.0)
selenium-webdriver
spring
spring-watcher-listen (~> 2.0.0)
typhoeus (~> 1.3)
tzinfo-data
uglifier (>= 1.3.0)
web-console (>= 3.3.0)

RUBY VERSION
ruby 2.6.3p62

BUNDLED WITH
1.17.2
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2019 Todd Schneider

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
4 changes: 4 additions & 0 deletions Procfile
@@ -0,0 +1,4 @@
web: bundle exec puma -C config/puma.rb
worker: bundle exec rake jobs:work
clock: bundle exec clockwork clock.rb
clockplusworker: bundle exec foreman start -f Procfile.clockplusworker
2 changes: 2 additions & 0 deletions Procfile.clockplusworker
@@ -0,0 +1,2 @@
clock: bundle exec clockwork clock.rb
djworker: bundle exec rake jobs:work
41 changes: 41 additions & 0 deletions README.md
@@ -0,0 +1,41 @@
# Hacker News Front Page Trends

A Ruby on Rails app that stores [Hacker News](https://news.ycombinator.com) items that have appeared on the front page, and exposes a few JSON API endpoints that let users search for terms, domains, and users to see how popular they have been on the HN front page over time.

[Click here for a live dashboard that uses this API](https://toddwschneider.com/dashboards/hacker-news-trends/)

## Screenshot

[![screenshot](https://user-images.githubusercontent.com/70271/57560906-f10c2f00-7356-11e9-81ba-0271c4241262.png)](https://toddwschneider.com/dashboards/hacker-news-trends/?q=statistics%2C+"machine+learning"+or+ML%2C+"artificial+intelligence"+or+AI&f=title&s=text&m=frac_items&t=year)

## Caveat

HN only provides the exact list of front page items for dates since 11/11/2014, so anything before then is an estimate. For earlier dates, I used a heuristic of sorting by score and taking the top 115 items on weekdays and the top 80 on weekends, subject to a minimum of 3 points. This definitely isn’t perfect. For example:

- it excludes job posts before 11/11/2014 since they always have 1 point
- items with high scores don’t always get to the front page
- it’s possible that HN has changed its algorithm over time to promote faster or slower front page turnover

But it should be a decent approximation, and the code could also be modified to use other heuristics. It would probably also be an improvement to fetch all job posts from before 11/11/2014 via the [HN API](https://github.com/HackerNews/API).
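
As a rough illustration of the heuristic described above, here is a minimal sketch in plain Ruby. The `Item` struct, the constant names, and the `estimated_front_page` method are hypothetical and are not taken from the app's code; the real implementation may differ.

```ruby
require 'date'

# Hypothetical item struct; the real app's schema may differ.
Item = Struct.new(:id, :score, keyword_init: true)

WEEKDAY_LIMIT = 115
WEEKEND_LIMIT = 80
MIN_SCORE = 3

# Estimates which items reached the front page on a given date by
# keeping items with at least MIN_SCORE points, sorting by score
# (highest first), and taking the top N for that day of the week.
def estimated_front_page(items, date)
  limit = (date.saturday? || date.sunday?) ? WEEKEND_LIMIT : WEEKDAY_LIMIT

  items
    .select { |item| item.score >= MIN_SCORE }
    .sort_by { |item| -item.score }
    .first(limit)
end
```

Swapping in a different heuristic would mostly mean changing the limits or the sort key in this one method.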

## Structure

There are 3 files of interest:

1. `app/lib/hn_client.rb` - code to collect front page data via the HN website and [API](https://github.com/HackerNews/API)
2. `app/models/hn_item.rb` - code that uses the `HnClient` to store the appropriate records in a PostgreSQL database
3. `app/lib/hn_trends_calculator.rb` - code to calculate trends over time and top items for given search terms. The trends endpoint returns 4 metrics for each term/date:
    1. Fraction of all front page items
    2. Number of all front page items
    3. Fraction of total front page score, i.e. the total score of items matching the search term divided by the total score of all front page items
    4. Front page score
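
The four metrics can be sketched in plain Ruby as follows. This is only an illustration under assumptions: the `FrontPageItem` struct, the `trend_metrics` method, and the hash keys other than `frac_items` (which appears in the dashboard URL above) are hypothetical names, and the app itself presumably computes these values in SQL.

```ruby
# Hypothetical item struct; field names are illustrative only.
FrontPageItem = Struct.new(:title, :score, keyword_init: true)

# Computes the four metrics for one term on one date, given all
# front page items for that date and the subset matching the term.
def trend_metrics(all_items, matching_items)
  total_items = all_items.size
  total_score = all_items.sum(&:score)
  matching_score = matching_items.sum(&:score)

  {
    # Guard against division by zero on dates with no items.
    frac_items: total_items.zero? ? 0.0 : matching_items.size.to_f / total_items,
    num_items: matching_items.size,
    frac_score: total_score.zero? ? 0.0 : matching_score.to_f / total_score,
    score: matching_score
  }
end
```

For example, if 2 of 4 front page items match a term and those 2 items hold 40 of the day's 100 total points, the sketch yields a fraction of items of 0.5 and a fraction of score of 0.4.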

The trends calculator supports searching titles, domains (with or without subdomains), and usernames. When searching by title, there are 3 different search styles:

1. Web search uses PostgreSQL [full text search](https://www.postgresql.org/docs/11/textsearch.html), in particular the [websearch_to_tsquery()](https://www.postgresql.org/docs/11/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES) function and [GIN indexes](https://www.postgresql.org/docs/11/textsearch-tables.html). By default, the `tsv` column uses the `simple` text search configuration
2. Case-insensitive exact title match uses the `~*` PostgreSQL [regular expression](https://www.postgresql.org/docs/11/functions-matching.html#FUNCTIONS-POSIX-REGEXP) operator, combined with a [trigram index](https://www.postgresql.org/docs/11/pgtrgm.html#id-1.11.7.40.7)
3. Case-sensitive exact title match is the same as #2, but uses the `~` regex operator instead of `~*`
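
To make the three title search styles concrete, here is a minimal sketch that maps each style to a SQL condition with a single bind placeholder. The style keys, the `title_search_condition` method, and the `title` column name are assumptions for illustration; only the `tsv` column, the `simple` configuration, and the operators come from the description above.

```ruby
# Maps each title search style to a SQL condition with one bind
# placeholder. The `title` column name and the style keys are assumptions.
TITLE_SEARCH_CONDITIONS = {
  web: "tsv @@ websearch_to_tsquery('simple', ?)",
  case_insensitive: "title ~* ?",
  case_sensitive: "title ~ ?"
}.freeze

# Returns [sql_condition, bind_value] for the given style, raising
# ArgumentError for a style that is not one of the three above.
def title_search_condition(style, query)
  sql = TITLE_SEARCH_CONDITIONS.fetch(style) do
    raise ArgumentError, "unknown search style: #{style.inspect}"
  end
  [sql, query]
end
```

In a Rails app, a pair like this could be passed to ActiveRecord as `where(sql, query)`. Note that with `~` and `~*` the bound value is interpreted as a regular expression, so a literal search term containing regex metacharacters would need escaping first.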

## Requirements

Requires PostgreSQL 11+, since `websearch_to_tsquery()` was added in version 11.
6 changes: 6 additions & 0 deletions Rakefile
@@ -0,0 +1,6 @@
# Add your own tasks in files placed in lib/tasks ending in .rake,
# for example lib/tasks/capistrano.rake, and they will automatically be available to Rake.

require_relative 'config/application'

Rails.application.load_tasks
3 changes: 3 additions & 0 deletions app/assets/config/manifest.js
@@ -0,0 +1,3 @@
//= link_tree ../images
//= link_directory ../javascripts .js
//= link_directory ../stylesheets .css
Empty file added app/assets/images/.keep
Empty file.
