Carebot is an effort to think about alternative ways to look at analytics for journalism: both the measures and indicators used to understand story impact, and the way analytics data is used in the newsroom.
Carebot takes your metrics and posts them to your team's Slack channels.
To get Carebot up and running you will need to:
- Add the Carebot Tracker to your content.
- Set up the Carebot Slackbot to report data through notifications.
Note: You will need accounts with Google Analytics and Slack.
Here's how to work with Carebot once you've invited it to your channels.
Help!

```
@carebot help
```
To find out details about a story:

```
@carebot slug story-slug-here
@carebot What's going on with slug story-slug-here?
@carebot Give me the scoop on slug another-slug-here please!
```
To see an overview of the last week:

```
@carebot hello!
```
Carebot will automatically track stories (see "Running Carebot", below). Sometimes you want to manually add stories to the queue:

```
@carebot track story-slug-here http://example.com/story
```
The following things are assumed to be true in this documentation.
- You are running OSX.
- You are using Python 2.7.
- You have virtualenv and virtualenvwrapper installed and working.
- You have an Amazon Web Services account, and know how to create an S3 bucket and associated access keys.
- You are using Google Analytics.
- You are using Slack.
If you need assistance setting up a development environment to work with this for the first time, we recommend NPR Visuals' Setup.
Clone this repository and create a virtual environment:

```
git clone https://github.com/thecarebot/carebot.git
cd carebot
mkvirtualenv carebot --no-site-packages
workon carebot
pip install -r requirements.txt
```
The bot requires several API credentials, available as environment variables. An easy way to add them is to use a `.env` file. Copy `sample.env` to `dev.env`, fill in the values (see detailed instructions below), then run `source dev.env` to make them available to the app.
Copy `sample.env` to `staging.env` and `production.env` to store keys for those environments. The appropriate `.env` file is copied to the server on deploy.
Create an S3 bucket and add the bucket name, access key, and secret to your `.env` file. You might want to set up a user that only has read-write access to that bucket. No other Amazon services are needed.
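As a sketch, the S3 settings can live alongside your other keys in `dev.env`. The variable names below are illustrative; check `sample.env` for the exact names Carebot expects.

```shell
# Hypothetical .env entries -- match the variable names to sample.env.
export S3_BUCKET="my-carebot-bucket"
export AWS_ACCESS_KEY_ID="your-access-key-here"
export AWS_SECRET_ACCESS_KEY="your-secret-key-here"
```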
You'll need to add a slackbot integration to get a `SLACKBOT_API_TOKEN`:
- Browse to http://yourteam.slack.com/apps/manage/custom-integrations
- Navigate to "Bots"
- Select "Add integration"
- Fill in the username with "@carebot"
- Select "Add integration"
- Copy the API token to your `.env` file
After adding the bot, invite it to your Slack channels. You might want to set up a private channel for testing. (Note that you can't make a private channel public later.)
The first time you run Carebot, you'll need to authorize the app with your Google account.
First, you'll need to create a Google API project via the Google developer console. There's a more detailed guide on the NPR Apps blog.
Then, enable the Drive and Analytics APIs for your project.
Finally, create a "web application" client ID. For the redirect URIs use:

```
http://localhost:8000/authenticate/
http://127.0.0.1:8000/authenticate
http://localhost:8888/authenticate/
http://127.0.0.1:8888/authenticate
```
For the JavaScript origins use:

```
http://localhost:8000
http://127.0.0.1:8000
http://localhost:8888
http://127.0.0.1:8888
```
Copy the Client ID and Client Secret from the project's Credentials tab to `GOOGLE_OAUTH_CLIENT_ID` and `GOOGLE_OAUTH_CONSUMER_SECRET` in your `.env` file.
After you have set these up, from the project root, run:

```
source dev.env
fab app
```
And follow the on-screen instructions. If you have already set up an app using the NPR Apps template, you may not have to do this.
If you're running this internally at NPR, you'll want a key for the NPR API. This will be used to pull more details about articles, like primary image and a more accurate "published at" date. You can register for one here.
- Double check the Google Analytics organization IDs you set in `app_config`.
- Test your analytics query in the [Query Explorer](https://ga-dev-tools.appspot.com/query-explorer/) and make sure you get the results you expect there.
- Review the step-by-step oauth instructions on the NPR Visuals Team blog.
Scroll depth stats use an optional screenshot tool to take snaps of your pages. The URL to our service is hardcoded, but you can run your own instance if you prefer; it runs well on Heroku or any other place you can run a Node app. The tool is entirely optional, and the bot will function just fine without it.
```
python carebot.py
```
After starting the bot, make sure to invite it to the channel you set in `.env`.
Configure Carebot to pull stories from various sources by customizing `TEAMS` and `SOURCES` in `app_config.py`.
Under `TEAMS`, define team names and the channel that messages for each team should post to. Make sure you have a `default` team (it can have the same properties as another team).
`ga_org_id` should be the ID of the team's Google Analytics account. You can find the ID you need using the Analytics API explorer: select the account, property, and view, then copy the number after `ga:` in the `ids` field.
Under `SOURCES`, define where content should be pulled from and what team it belongs to. There are currently two supported sources:

- `spreadsheet` pulls from a Google doc using its `doc_key`
- `rss` pulls from an RSS feed via a `url`. It recognizes many typical feeds.
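As a sketch, `SOURCES` entries might look like the following. The key names `doc_key`, `url`, and `team` follow the description above, but the overall shape (a list of dicts with a `type` field) is an assumption; check the `app_config.py` that ships with Carebot for the exact structure.

```python
# Illustrative SOURCES entries -- verify the field names against
# the app_config.py shipped with Carebot before relying on them.
SOURCES = [
    {
        'type': 'spreadsheet',
        'doc_key': 'your-google-doc-key-here',  # hypothetical key
        'team': 'default',
    },
    {
        'type': 'rss',
        'url': 'http://example.com/feed.xml',
        'team': 'default',
    },
]
```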
This is usually run via a cronjob, but you can fire it manually to test it out:

```
fab carebot.load_new_stories
```
This is usually run via a cronjob after `load_new_stories`, but you can fire it manually:

```
fab carebot.get_story_stats
```
You'll need a couple of packages:

- SQLite: `sudo apt-get install sqlite3 libsqlite3-dev`
- `sudo apt-get install libjpeg-dev` for PIL
- `sudo apt-get install python-dev` for pycrypto
- `sudo apt-get install libpng-dev` for matplotlib
- `sudo apt-get install libfreetype6-dev libxft-dev` for matplotlib
- Any other basics: python, pip.
If you have trouble running carebot with matplotlib, this command may help:

```
echo "backend: TkAgg" > ~/.matplotlib/matplotlibrc
```
Make sure you have a `.env` file with your production keys (see "Add credentials" above). Also, ensure you have an `analytics.dat` file (see "Add Google Analytics Credentials" above).
Then, run:

```
fab production setup
```
To deploy carebot to production:

```
fab production deploy
```
You may occasionally need to stop and start a remote carebot instance:

```
fab staging stop_service:bot
fab staging start_service:bot
```
We use `upstart` to keep carebot running on deploys. Sometimes carebot doesn't start. The debug process is fairly annoying, but here are some things that might help.
- Logs should be available in `/var/log/upstart/carebot.log`
- Not all errors go there :-/. You might want to try outputting each command in `confs/bot.conf`, e.g. `...command >> {{ SERVER_PROJECT_PATH }}/log.txt`
- The cron file is `/etc/cron.d/carebot`
- Cron errors might end up in `/var/log/syslog`; check `tail -100 /var/log/syslog`
- By default, cron errors will be in `/var/log/carebot/`
Are you collecting stats not built into Carebot? Great! You can write plugins to fetch and share the stats.
Plugins have two functions:

- Pull stats regularly and announce them to a channel
- Respond to inquiries ("@carebot, help!")
Here's how to write a new plugin:
The `CarebotPlugin` base class is in `/plugins/base.py`. Extend `CarebotPlugin` to create a new plugin. You can also copy one of the existing plugins to a new file.
If you want your plugin to listen for questions sent to Carebot (for example, "@carebot, what should I have for lunch?"), implement `get_listeners` on your class. The optional `get_listeners` function should return a list of regular expressions and corresponding handler functions that will match and respond to incoming Slack messages. For a simple example, see the `help` plugin.
Carebot will regularly check every story in the database. If a story has not been checked recently, the optional `get_update_message` function of each plugin will be called. You can return a message and attachments, for example with the latest stat for the article. Check the `linger` or `scrolldepth` plugins for example code.
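Putting those two hooks together, a minimal plugin might look like the sketch below. The `CarebotPlugin` stub here stands in for the real base class in `/plugins/base.py` so the sketch is self-contained, and the listener tuple and return shapes are assumptions; mirror the `help` and `linger` plugins for the exact signatures.

```python
import re

# Stand-in for plugins.base.CarebotPlugin so this sketch runs on its
# own; in Carebot, subclass the real base class instead.
class CarebotPlugin(object):
    pass

class LunchPlugin(CarebotPlugin):
    """Hypothetical plugin: answers a question and reports a stat."""

    LUNCH_REGEX = re.compile(r'what should I have for lunch', re.IGNORECASE)

    def get_listeners(self):
        # Pair each regex with the handler that should answer it.
        return [
            ('lunch', self.LUNCH_REGEX, self.handle_lunch_question),
        ]

    def handle_lunch_question(self, message):
        return {'text': 'Tacos, obviously.'}

    def get_update_message(self, story):
        # Called when a story is due for a check; return a message
        # (and optionally attachments) with the latest stat.
        return {'text': 'No new stats for %s yet.' % story['slug']}
```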
If you're pulling stats from Google Analytics, the Query Explorer is a helpful way to test your parameters before coding them into Carebot.
To enable a plugin, list it in two places in `app_config.py`:

- Add the full path to the list of `CAREBOT_PLUGINS`
- Add the full path of the plugin to the `plugins` array of each team that should report that stat
Here's an example:

```python
CAREBOT_PLUGINS = (
    'plugins.npr.help.NPRHelp',
    'plugins.npr.linger.NPRLingerRate',
    'plugins.npr.scrolldepth.NPRScrollDepth',
)

TEAMS = {
    'default': {
        'channel': 'visuals-graphics',
        'ga_org_id': '100693289',
        'plugins': [
            'plugins.npr.linger.NPRLingerRate',
            'plugins.npr.scrolldepth.NPRScrollDepth',
        ],
    },
}
```
Carebot regularly runs scrapers to load new stories into its database. You'll probably have to write or customize a scraper to get Carebot to learn about your stories.
Scraper code is located in `/scrapers`. We've included an RSS scraper and a scraper that pulls articles from a Google Spreadsheet. You can copy or customize these to pull in your stories, or write a new scraper from scratch.
Scrapers are instantiated with the `SOURCES` pulled from `app_config`. All scrapers should implement a `scrape_and_load` method that returns a list of stories.
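A scraper, then, only needs a constructor that accepts its `SOURCES` entry and a `scrape_and_load` method. This sketch fakes the fetch step and returns plain dicts to show the contract; a real scraper would fetch and parse the feed or spreadsheet and create actual story records, so the field names here are illustrative, not Carebot's real story model.

```python
# Hypothetical scraper sketch; the story fields below are
# illustrative, not the actual model Carebot stores.
class ExampleScraper(object):
    def __init__(self, source):
        # source is one entry from app_config.SOURCES
        self.url = source.get('url')
        self.team = source.get('team', 'default')

    def scrape_and_load(self):
        # A real implementation would fetch and parse self.url;
        # here we return placeholder stories to show the contract.
        return [
            {
                'slug': 'example-story',
                'url': 'http://example.com/story',
                'team': self.team,
            },
        ]
```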
Once you've got a scraper, register it in `/fabfile/carebot.py` in the `load_new_stories` method.
To run tests:

```
nosetests
```
These tests help you make sure that scrapers, plugins, and analytics tools are running as expected. We highly recommend writing simple tests for your plugins to make sure they respond correctly.
Carebot loads new stories and checks for stats regularly by running the fabfile via cron jobs. These are set for staggered 15-minute runs. You can configure them by editing the `crontab` file.
If you make changes to existing models in `models.py`, you will need to write a migration. For now, migrations are not automatically run on deploy and may need to be run manually. Inside the environment, run `python MIGRATION_NAME.py`.