Rails app - news aggregator that powers http://hrfilter.de and http://fahrrad-filter.de

News aggregator app

This is a Ruby on Rails app for running (German) news aggregator websites. Today, it powers http://hrfilter.de and http://fahrrad-filter.de.

Reasoning

I want to follow news in those two areas but struggle with RSS, as it is too much for me to process - I want to see the most "relevant" sources at once, without investing too much time. Other sources, like Twitter and Reddit, I found too noisy to follow.

This is why I created this app.

News fetching + scoring algorithm

The admin of the app curates a list of trusted sources. These are checked regularly for new content. The following news sources are supported:

  • RSS/Atom feeds (FeedSource)
  • Podcasts via RSS/Atom (similar to FeedSource, but displayed differently)
  • Twitter streams
  • (planned) RedditSources - subscribe to whole subreddits (/r/...)

In a similar fashion, the app checks how popular each news item is on social networks, that is:

  • Facebook like count (as reported by the Facebook Like Button)
  • Twitter retweets + favorites (as reported by the Twitter API)
  • XING + LinkedIn shares (as reported by the respective widgets)
  • Reddit total score summed over all subreddits (if the link was posted there)

Each of those networks is configured with a different weight (e.g. Facebook likes are more common, so one like is worth less than one XING share).
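The weighting of the social networks can be pictured as a weighted sum. The following is only a minimal sketch: the constant `NETWORK_WEIGHTS`, the method `weighted_popularity`, and all numeric weights are hypothetical and are not taken from the app's code or configuration.

```ruby
# Hypothetical per-network weights; the real values are configured in the app
# and are not stated in this README.
NETWORK_WEIGHTS = {
  facebook: 0.5, # likes are common, so each one counts less
  twitter:  1.0,
  xing:     3.0, # shares are rarer, so each one counts more
  linkedin: 3.0,
  reddit:   1.0
}.freeze

# Sums the raw counts per network, each scaled by its weight.
# Networks that are missing from the weights table contribute nothing.
def weighted_popularity(counts, weights = NETWORK_WEIGHTS)
  counts.sum { |network, count| weights.fetch(network, 0.0) * count }
end

# 40 Facebook likes and 2 XING shares: 40 * 0.5 + 2 * 3.0
puts weighted_popularity({ facebook: 40, xing: 2 })
```

With weights like these, two XING shares move the result about as much as a dozen Facebook likes, which is the leveling effect described above.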

The admin of the site can give each Source an individual:

  • Base factor (that is, how many "likes" any link of that website is worth up front; it can also be negative to remove noise from some sources)
  • Multiplier, e.g. 0.2, 1.0, 2.0 - each like is multiplied by that number. Some sources have a much higher reach and can be leveled out this way, so that the news is more diverse

Altogether, the score is recalculated regularly for fresh links. For display on the homepage, freshness also matters - the older the link, the more its score is reduced.
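To illustrate how base factor, multiplier, and freshness could interact, here is a rough sketch. It is not the app's actual formula: the method names `raw_score` and `display_score`, the additive use of the base factor, and the exponential decay with a 24-hour half-life are all assumptions made for this example.

```ruby
# Hypothetical half-life; the README only says that older links score lower.
HALF_LIFE_HOURS = 24.0

# base_factor: flat amount of "likes" credited to every link of a source
#              (may be negative to push a noisy source down)
# multiplier:  scales each like counted for a link of that source
def raw_score(likes, base_factor:, multiplier:)
  base_factor + likes * multiplier
end

# Assumed decay: the score is halved for every `half_life` hours of age.
def display_score(score, age_hours, half_life: HALF_LIFE_HOURS)
  score * 0.5**(age_hours / half_life)
end

score = raw_score(100, base_factor: 10, multiplier: 0.5) # 10 + 100 * 0.5
puts display_score(score, 0)  # fresh link keeps its full score
puts display_score(score, 24) # one half-life later the score is halved
```

Any monotonically decreasing function of age would fit the description equally well; the half-life form is used here only because it is easy to reason about.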

Topics

Topic matching is very simple - just keyword lists. That means the categorization is far from perfect, or even good. It might be an area of further development :)
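A keyword-list matcher of this kind can be sketched in a few lines. The topic names, the keywords, and the method `topics_for` below are made up for illustration; the app's real lists and matching code may look different.

```ruby
# Hypothetical topic => keyword lists; not taken from the app's data.
TOPIC_KEYWORDS = {
  "Recruiting" => %w[recruiting bewerbung],
  "Leadership" => %w[leadership management]
}.freeze

# Returns every topic whose keyword list shares at least one word with the title.
def topics_for(title, topics = TOPIC_KEYWORDS)
  words = title.downcase.scan(/[[:alpha:]]+/)
  topics.select { |_topic, keywords| (keywords & words).any? }.keys
end

p topics_for("Neue Trends im Recruiting")
p topics_for("Wetterbericht")
```

Because only whole words are compared, compounds and inflected forms are missed, which is one reason why this kind of categorization stays rough.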

Newsletter

It is possible to subscribe via email. Then, once a week on Sunday, you will receive an email with news from the selected topics.

Development

As it is a fully functioning Rails app, you can try to run it yourself. First make sure that Ruby (at least 2.0; the repository's .ruby-version was last set to 2.5.3) and Bundler are installed, then:

git clone ...
cd ...
bundle install
rake environment db:create
rake db:migrate
rails server

Before running the rake commands, you might have to create a config/application.yml (see config/application.hrfilter.yml for an example) and adjust config/database.yml and config/secrets.yml to your needs.

If you'd like, you can try to import some of the HRfilter sources for an initial seed:

rails r 'Setting.read_yaml'
rake db:seed

If you have trouble getting the data with db:seed, you can also try:

rails runner 'Source.cronjob'
rails runner 'NewsItem.cronjob'

The tasks that need to run regularly are listed in config/schedule.rb.