log data storage

wcld

wc -l (daemon)

Wcld is a process that listens on TCP $PORT for incoming syslog data. Wcld parses the CRLF-separated data looking for key=value substrings. When a key=value substring is found, wcld writes the keys and values to an hstore column in a PostgreSQL database.
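
To make the parsing step concrete, here is a minimal sketch of pulling key=value pairs out of a line-delimited stream. It is not wcld's actual parser: the function names `parseKV` and `parseStream` are hypothetical, tokens are assumed to be whitespace-separated, and quoted values containing spaces are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// parseKV returns every key=value token found in one log line.
// A token counts only if it has a non-empty key before the first '='.
// Assumption: tokens are separated by whitespace; quoted values are not supported.
func parseKV(line string) map[string]string {
	pairs := make(map[string]string)
	for _, field := range strings.Fields(line) {
		i := strings.IndexByte(field, '=')
		if i <= 0 {
			continue // no '=' at all, or an empty key
		}
		pairs[field[:i]] = field[i+1:]
	}
	return pairs
}

// parseStream splits a buffer on CRLF and parses each non-empty line.
func parseStream(data string) []map[string]string {
	var out []map[string]string
	for _, line := range strings.Split(data, "\r\n") {
		if line == "" {
			continue
		}
		out = append(out, parseKV(line))
	}
	return out
}

func main() {
	data := "heroku[router]: PUT /x dyno=web.3 service=89ms status=201\r\n"
	for _, pairs := range parseStream(data) {
		fmt.Println(pairs)
	}
}
```

Each resulting map corresponds to what would end up in one row's hstore column.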

Usage

Once your applications are draining their logs into a wcld process, you can begin reporting on your log data.

On a typical web process, the Heroku router will emit the following log message:

2012-02-16T06:06:16+00:00 heroku[router]: PUT shushu.herokuapp.com/resources/328408/billable_events/41143162 dyno=web.3 queue=0 wait=0ms service=89ms status=201 bytes=235

Notice how the log message contains the service time. This represents the time it took our web process to respond to the request. We can quickly compute our app's average response time, grouped by hour:

$ heroku pg:psql

Avg

SELECT
  date_trunc('hour', time) AS time_group,
  avg((data -> 'service')::interval)
FROM
  log_data
WHERE
  data ? 'service'
  GROUP BY time_group
  ORDER BY time_group
;
       time_group       |       avg
------------------------+-----------------
 2012-02-13 20:00:00+00 | 00:00:00.074848
 2012-02-13 21:00:00+00 | 00:00:00.076898
 2012-02-13 22:00:00+00 | 00:00:00.073627
 2012-02-13 23:00:00+00 | 00:00:00.075232
 2012-02-14 00:00:00+00 | 00:00:00.075852
 2012-02-14 01:00:00+00 | 00:00:00.073475
 2012-02-14 02:00:00+00 | 00:00:00.072609
 2012-02-14 03:00:00+00 | 00:00:00.073081

Percentile

SELECT
  perctile,
  avg(elapsed_time::interval)
FROM (
  SELECT
    data -> 'elapsed_time' as elapsed_time,
    ntile(100) over (order by (data -> 'elapsed_time')::interval) as perctile
  FROM
    log_data
  WHERE
    data -> 'action' = 'find_prev_rec'
    and
    time > now() - '9 minutes'::interval
    and
    expired = false
) x
WHERE
  perctile = 95
GROUP BY perctile
;
 perctile |       avg
----------+-----------------
       95 | 00:00:00.008944
(1 row)

Indexes

One possible indexing strategy:

ALTER TABLE log_data ADD COLUMN expired boolean default false;
UPDATE log_data SET expired = 't' where time <= now() - '3 days'::interval;
CREATE INDEX recent_events on log_data (time) where expired = false;
-- use cron to REINDEX each day ??

Deploy to Heroku

  • Create app with Go buildpack
  • Attach database to app
  • Attach route to app
  • Point emitter apps at the new wcld app

Create App

$ git clone git://github.com/ryandotsmith/wcld.git
$ cd wcld
$ heroku create -s cedar --buildpack=git@github.com:kr/heroku-buildpack-go.git#rc
$ echo "wcld/wcld" >.godir
$ echo "wcld: bin/wcld -f=\"kv\"" > Procfile #or -f="json"
$ git add . ; git commit -am "init"
$ git push heroku master

Attach Database

$ heroku addons:add heroku-postgresql:ika
$ heroku pg:wait
$ heroku pg:promote HEROKU_POSTGRESQL_<COLOR>
$ heroku pg:psql
psql- create extension hstore;
psql- create table log_data (id bigserial, time timestamptz, data hstore);
psql- create index index_log_data_by_time on log_data (time);
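
With the table in place, parsed pairs can be stored as an hstore value. The sketch below shows one way to render a Go map in hstore's text input format (`"key"=>"value"`, with backslashes and double quotes escaped). It is an illustration, not wcld's own code: `hstoreLiteral` and the `insertSQL` statement are assumptions about how a row could be written to the `log_data` table defined above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// insertSQL is a hypothetical statement for the log_data table; the second
// parameter is expected to be the text produced by hstoreLiteral.
const insertSQL = `INSERT INTO log_data (time, data) VALUES ($1, $2::hstore)`

// hstoreLiteral renders pairs in hstore text input format.
// Keys are sorted only to make the output deterministic.
func hstoreLiteral(pairs map[string]string) string {
	keys := make([]string, 0, len(pairs))
	for k := range pairs {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	// Backslash and double quote must be escaped inside quoted keys and values.
	esc := strings.NewReplacer(`\`, `\\`, `"`, `\"`)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, `"`+esc.Replace(k)+`"=>"`+esc.Replace(pairs[k])+`"`)
	}
	return strings.Join(parts, ", ")
}

func main() {
	pairs := map[string]string{"dyno": "web.3", "service": "89ms", "status": "201"}
	fmt.Println(insertSQL)
	fmt.Println(hstoreLiteral(pairs))
}
```

Passing the literal as a bound parameter, rather than concatenating it into the SQL string, keeps the statement safe from injection through log content.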

Attach Route

$ heroku routes:create
$ heroku routes:attach tcp://... wcld

Start WCLD Process

$ heroku scale wcld=2 #can use multiple processes

Use it to drain an emitter app:

$ heroku drains:add syslog://... -a other-app

Build

$ cd $GOROOT
$ hg update weekly
$ cd src; ./all.bash
$ cd $GOPATH/src
$ git clone git://github.com/ryandotsmith/wcld.git
$ cd wcld
$ go build .

Test

$ cd $GOPATH/src/wcld
$ go test .