Log2Graphite

An access.log parser. The application parses logs and uploads metrics to a Graphite server in real time. It also supports non-realtime operation (reparsing old logs).

What is it

Sometimes you want to parse logs (maybe in real time), extract some custom metrics, and end up with graphs. Logstash can do the log parsing, and it has a plugin for uploading metrics to Graphite. Such a configuration works fine for most applications.

But not for me. I found that parsing a 100G access.log takes too much time, and that parsing access.log in real time takes too many CPU cores at QPS rates of 5K and more.

Log2Graphite is a pure Java application, tested with Java 1.7. It is multithreaded and built for high performance.

Features

Log2Graphite supports:

realtime metrics upload with 1 minute granularity
custom access.log formats, configurable in Apache / nginx style
realtime log parsing
.gz format for non-realtime parsing
Amazon S3 storage for non-realtime parsing
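
The 1 minute granularity can be pictured as truncating each request timestamp to the start of its minute and counting per bucket. The following is a minimal sketch, assuming the nginx `$time_local` layout (for example `01/Jan/2014:12:34:56 +0000`). The class and method names are hypothetical and are not taken from the Log2Graphite source.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Log2Graphite: groups requests into one-minute buckets.
final class MinuteBuckets {
    // Request count per bucket, keyed by the epoch second at which the minute starts.
    private final Map<Long, Long> counts = new TreeMap<Long, Long>();

    // Parses an nginx $time_local value and returns the epoch second of the start of its minute.
    static long minuteStart(String timeLocal) {
        // SimpleDateFormat is not thread-safe, so a fresh instance is created per call.
        SimpleDateFormat fmt = new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);
        try {
            long epochSeconds = fmt.parse(timeLocal).getTime() / 1000L;
            return epochSeconds - (epochSeconds % 60L);
        } catch (ParseException e) {
            throw new IllegalArgumentException("unexpected $time_local value: " + timeLocal, e);
        }
    }

    // Adds one request to the bucket its timestamp falls into.
    void add(String timeLocal) {
        long bucket = minuteStart(timeLocal);
        Long current = counts.get(bucket);
        counts.put(bucket, current == null ? 1L : current + 1L);
    }

    Map<Long, Long> counts() {
        return counts;
    }
}
```

Keying buckets by the timestamp found in the log line, rather than by the wall clock, is what would let the same aggregation serve both tailing and reparsing of old logs.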

Metrics supported

total requests counter
response and upstream time (min / max / avg / stdev / 99th percentile)
response code
data size
GET/POST/OTHER counters
metrics specific to my application, which probably nobody else needs
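
To make the timing metrics concrete, here is a minimal sketch of computing min / max / avg / stdev / 99th percentile over the samples of one aggregation window. The README does not say which definitions Log2Graphite uses, so the population standard deviation and the nearest-rank percentile below are assumptions, and `TimingStats` is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Log2Graphite: summary statistics for one aggregation window.
final class TimingStats {
    final double min;
    final double max;
    final double avg;
    final double stdev;
    final double p99;

    TimingStats(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        // Sort a copy so the caller's array is left untouched.
        double[] sorted = samples.clone();
        Arrays.sort(sorted);

        double sum = 0.0;
        for (double v : sorted) {
            sum += v;
        }
        double mean = sum / sorted.length;

        double squares = 0.0;
        for (double v : sorted) {
            squares += (v - mean) * (v - mean);
        }

        min = sorted[0];
        max = sorted[sorted.length - 1];
        avg = mean;
        // Assumption: population standard deviation (divide by n, not n - 1).
        stdev = Math.sqrt(squares / sorted.length);
        // Assumption: nearest-rank percentile, rank = ceil(0.99 * n), in integer arithmetic.
        int rank = (99 * sorted.length + 99) / 100;
        p99 = sorted[rank - 1];
    }
}
```

Sorting every window costs O(n log n). At high request rates a histogram or a streaming quantile estimator may be a better fit, at the price of approximate percentiles.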

Usage

usage: Log2Graphite -f <filepath> [ options ]
 -atime <aggregate_time>   aggregate metric timeout in seconds. default is 60
 -c <config>               path to config file (for custom access.log format)
 -f <filepath>             path to log file (local or S3)
 -h <host>                 Graphite host IP. default is not upload metrics to graphite
 -p <port>                 Graphite port. default is 2003
 -notail                   parse single file from start without tail (non-realtime parsing)
 -start                    for realtime parsing only: tail log from start. default is to start parsing from the end
 -t <arg>                  number of parsers. default is 1 parser
 -key <AWS access key>     S3 access key
 -secret <AWS secret key>  S3 secret key

Config file format

cat config.properties 
# log_format similar to nginx.conf
#
# escape line endings with a backslash to support a multi-line format

# default value
#log_format='$remote_addr - $remote_user [$time_local] "$request" ' \
#         '$status $body_bytes_sent "$http_referer" ' \
#           '"$http_user_agent" "$http_x_forwarded_for" "$request_time" "$upstream_response_time" "$pipe"';

#log_format='$remote_addr - $remote_user [$time_local] "$request" '
#           '$status $body_bytes_sent "$http_referer" '
#           '"$http_user_agent" "$http_x_forwarded_for" "$request_time" "$upstream_response_time" "$pipe"';
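
The backslash at the end of a line matches how `java.util.Properties` handles continuation lines: the backslash and the line break are dropped, and leading whitespace on the next line is skipped. A minimal sketch of reading such a value follows. It assumes the file is loaded with `Properties`, and that the quoted segments are concatenated as nginx does. Neither is confirmed by this README, and `LogFormatConfig` is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Log2Graphite: reads log_format from properties text.
final class LogFormatConfig {
    // Returns the raw log_format value, or null when the key is absent or commented out.
    static String readLogFormat(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalStateException("cannot read properties text", e);
        }
        return props.getProperty("log_format");
    }

    // Concatenates the contents of all single-quoted segments, dropping the quotes and the trailing ';'.
    static String joinQuoted(String rawValue) {
        StringBuilder joined = new StringBuilder();
        Matcher m = Pattern.compile("'([^']*)'").matcher(rawValue);
        while (m.find()) {
            joined.append(m.group(1));
        }
        return joined.toString();
    }
}
```

Without the trailing backslashes, `Properties` would treat each following line as a separate entry, which is presumably why the sample file asks for them.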

Default access.log format

    '$remote_addr - $remote_user [$time_local] "$request" '
    '$status $body_bytes_sent "$request_body" '
    '"$connection_requests" "$http_connection" '
    '"$http_user_agent" "$http_x_forwarded_for" "$request_time" "$upstream_response_time" "$pipe"';
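
One way to turn an nginx-style format like the one above into a line parser is to translate it into a regular expression: literal text is matched exactly and each `$variable` becomes a capture group. The sketch below illustrates that idea only. It is not the parser Log2Graphite actually uses, and `LogFormatParser` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the Log2Graphite parser: log_format string to regex.
final class LogFormatParser {
    private static final Pattern VARIABLE = Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    private final List<String> names = new ArrayList<String>();
    private final Pattern linePattern;

    LogFormatParser(String format) {
        StringBuilder regex = new StringBuilder("^");
        Matcher m = VARIABLE.matcher(format);
        int last = 0;
        while (m.find()) {
            // Literal text between variables must match exactly.
            regex.append(Pattern.quote(format.substring(last, m.start())));
            // Java group names cannot contain underscores, so groups stay numbered
            // and the variable names are remembered in a parallel list.
            regex.append("(.*?)");
            names.add(m.group(1));
            last = m.end();
        }
        regex.append(Pattern.quote(format.substring(last))).append("$");
        linePattern = Pattern.compile(regex.toString());
    }

    // Returns variable name -> value, or null when the line does not match the format.
    Map<String, String> parse(String line) {
        Matcher m = linePattern.matcher(line);
        if (!m.matches()) {
            return null;
        }
        Map<String, String> fields = new LinkedHashMap<String, String>();
        for (int i = 0; i < names.size(); i++) {
            fields.put(names.get(i), m.group(i + 1));
        }
        return fields;
    }
}
```

The lazy `(.*?)` groups keep the sketch short, but they rely on the literal text after each variable not occurring inside the value itself. A production parser would need stricter per-field patterns and would likely avoid regular expressions on the hot path for speed.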

Logging customization

Logging uses Log4j. By default, log messages go to stdout:

# Root logger option
log4j.rootLogger=INFO, stdout

# Direct log messages to stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d [%t] %-5p %c - %m%n

To override these settings, put a custom log4j.properties on the classpath.

Usage examples

Tail access.log in real time from the beginning and upload metrics to Graphite:

java -classpath <path to config file>:<path to jar file> com.company.log2graphite.Log2Graphite -f <path to access.log> -t 4 -h <IP> -start

Parse an archived access.log and upload metrics to Graphite:

java -classpath <path to config file>:<path to jar file> com.company.log2graphite.Log2Graphite -f <path to access.log-20140101.gz> -t 10 -h <IP> -notail

Parse an archived access.log from S3 and upload metrics to Graphite:

java -classpath <path to config file>:<path to jar file> com.company.log2graphite.Log2Graphite -f s3://bucket/path/to/access.log.gz -t 10 -h <IP> -notail -key <S3 access key> -secret <S3 secret key>
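
Archived .gz logs like those in the two examples above can be read as a stream, without unpacking them to disk first. A minimal sketch using the standard `java.util.zip.GZIPInputStream` follows. `GzLineCounter` is a hypothetical name, and this is not necessarily how Log2Graphite reads its input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of Log2Graphite: streams lines out of gzip-compressed input.
final class GzLineCounter {
    // Counts the lines in a gzip-compressed stream; closes the stream when done.
    static long countLines(InputStream compressed) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(compressed), StandardCharsets.UTF_8))) {
            long lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }
}
```

Because the method takes any `InputStream`, the same code would work on a local `FileInputStream` or on a stream obtained from an S3 client.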

Performance

Tested on a c3.8xlarge AWS instance, with Log2Graphite started with 10 parser threads:

10G archived access.log.gz: ~9 minutes

60G raw access.log: ~4 minutes

6G S3 archived access.log.gz: ~17 minutes
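
To illustrate what running several parser threads can look like, here is a minimal sketch that splits a batch of lines across a fixed thread pool and sums the per-thread results. It shows the general pattern only. The batching scheme, the GET-counting task, and the name `ParallelCounter` are assumptions, not the actual Log2Graphite design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not the Log2Graphite threading model.
final class ParallelCounter {
    // Counts lines containing "\"GET " using the given number of parser threads.
    static long countGets(List<String> lines, int parsers) {
        ExecutorService pool = Executors.newFixedThreadPool(parsers);
        try {
            List<Future<Long>> parts = new ArrayList<Future<Long>>();
            // Ceiling division so that every line lands in exactly one slice.
            int chunk = Math.max(1, (lines.size() + parsers - 1) / parsers);
            for (int from = 0; from < lines.size(); from += chunk) {
                final List<String> slice = lines.subList(from, Math.min(lines.size(), from + chunk));
                parts.add(pool.submit(new Callable<Long>() {
                    @Override
                    public Long call() {
                        long n = 0;
                        for (String line : slice) {
                            if (line.contains("\"GET ")) {
                                n++;
                            }
                        }
                        return n;
                    }
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Each worker keeps its own local counter and the results are merged once at the end, which avoids contention on shared state while parsing.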
