Change logging of WebZMachine #360

Closed
mworrell opened this Issue Jun 18, 2012 · 10 comments

@mworrell
Member

WebZMachine writes a new HTTP access log file every hour in the priv/logs directory.

Neither WebZMachine nor Zotonic cleans up those logs automatically, so they accumulate.

As most of us never look into those log files anyway, I propose that we check whether we can change the logging.

More interesting things to log could be:

  • host
  • request
  • peer ip
  • request time
  • request duration
  • request size (headers + body)
  • request type (include websockets?)
  • request referrer
  • reply http code
  • reply body size
  • internal request id
  • optional user id
  • optional session id
  • optional page id
  • language
  • user agent
  • user agent device classification
  • user agent 'is_bot' flag (future addition)

We can use the internal request id to match access log entries against logs from other subsystems.
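
As an illustration only, the fields listed above could be carried in one structured record per request. This is a language-neutral sketch in Python, not Zotonic code; the class name `AccessLogEntry`, the field names, and the JSON line format are all assumptions made for the example.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class AccessLogEntry:
    """One access log record (hypothetical names, mirroring the list above)."""
    host: str
    request: str
    peer_ip: str
    request_time: float           # Unix timestamp of the request start
    duration_ms: float
    request_size: int             # headers + body, in bytes
    request_type: str             # e.g. "http" or "websocket"
    referrer: Optional[str]
    status: int                   # reply HTTP code
    reply_body_size: int
    # Internal request id, used to correlate with logs from other subsystems.
    request_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    user_id: Optional[int] = None
    session_id: Optional[str] = None
    page_id: Optional[str] = None
    language: Optional[str] = None
    user_agent: Optional[str] = None
    ua_class: Optional[str] = None
    is_bot: Optional[bool] = None

    def to_line(self) -> str:
        """Serialize the entry as a single JSON line."""
        return json.dumps(asdict(self), sort_keys=True)
```

Any other subsystem that logs the same `request_id` can then be joined against these lines.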

Question: where to store this information?

Options:

  • send it over UDP to some listener
  • make a process that collects this information (good for real time insights)
  • a combination
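
A minimal sketch of the first option, assuming each entry is sent as one JSON datagram. It is written in Python for illustration; the function name `send_log_entry` and the payload format are hypothetical and not part of WebZMachine.

```python
import json
import socket


def send_log_entry(entry: dict, address: tuple) -> None:
    """Fire-and-forget: send one log entry as a single UDP datagram.

    UDP gives no delivery guarantee, so entries can be lost when the
    listener is down; the sender never blocks waiting for the listener.
    """
    payload = json.dumps(entry).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, address)
```

The listener on the other end could be the collecting process from the second option, which covers the "combination" case.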

@kaos
Member
kaos commented Jun 18, 2012

+1 for fixing the request logging. The first step should be to purge old files.

Use lager?
Arjan started some work with lager for logging. Could we use that?

@mworrell
Member

Lager might be a good start.

Though we would need to modify some things, such as having a log per host.

@mmzeeman
Member

We could use gen_event event notification here. Then we can add and remove different handlers when needed.
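
The handler idea can be sketched as follows. This is a simplified Python model of the pattern, not Erlang's gen_event itself; the class `AccessLogEvents` and its method names are made up for the example.

```python
class AccessLogEvents:
    """Tiny event manager: handlers can be added and removed at runtime."""

    def __init__(self):
        self._handlers = {}

    def add_handler(self, name, handler):
        self._handlers[name] = handler

    def delete_handler(self, name):
        self._handlers.pop(name, None)

    def handler_names(self):
        return list(self._handlers)

    def notify(self, entry):
        # Iterate over a copy so that handlers can be removed while notifying.
        for name, handler in list(self._handlers.items()):
            try:
                handler(entry)
            except Exception:
                # Like gen_event, drop a handler that fails instead of
                # letting it take down the other handlers.
                self._handlers.pop(name, None)
```

A file logger, a syslog forwarder, and a real-time statistics collector could then each be one handler, installed only where needed.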

Maas

@kaos
Member
kaos commented Jun 18, 2012

Sounds feasible :)

@mmzeeman
Member
mmzeeman commented Jun 5, 2014

Our servers are becoming busier and now generate access log files of 22 MB per hour. The lack of automatic cleanup and of more flexible configuration options is becoming a problem for us. This system runs on a simple VPS without much disk space.

Logging should not take a lot of time for the processes handling the requests. Currently, writing the access log can also become a bottleneck when things get busy.

Therefore I was thinking about sending the access log to syslog. Then you can use all the nice tools available on the system: automatic compression, cleanup, forwarding to a centralized log server...

I was thinking about implementing it like this:

  • Request processes write log entries to an ets table (optimized for parallel writing)
  • A new server will become responsible for sending the access log entries to syslog once every second.
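
The two steps above can be sketched like this. It is a Python model of the design, not the proposed Erlang implementation: a thread-safe deque stands in for the ets table, a background thread stands in for the new server, and `sink` is a hypothetical callable that would forward a batch to syslog.

```python
import threading
from collections import deque


class BufferedAccessLog:
    """Request handlers only append; a flusher drains the buffer periodically."""

    def __init__(self, sink, interval=1.0):
        self._buffer = deque()        # append/popleft are thread-safe
        self._sink = sink             # called with a list of buffered lines
        self._interval = interval     # seconds between flushes
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def log(self, line):
        """Cheap call for the request process: no I/O happens here."""
        self._buffer.append(line)

    def flush(self):
        """Drain everything buffered so far and hand it to the sink."""
        batch = []
        while True:
            try:
                batch.append(self._buffer.popleft())
            except IndexError:
                break
        if batch:
            self._sink(batch)
        return len(batch)

    def _run(self):
        # wait() returns False on timeout, True once stop() was called.
        while not self._stop.wait(self._interval):
            self.flush()

    def stop(self):
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()
        self.flush()                  # do not lose the last entries
```

The point of the split is that the request path only pays for an in-memory append, while the slower write to syslog happens once per interval in a single place.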

An Erlang syslog application is available here: https://github.com/Vagabond/erlang-syslog

What do you think?

@mworrell
Member
mworrell commented Jun 5, 2014

I would like to be able to intercept and interpret the requests as well.

Then we could more easily add tracing on certain kinds of requests, IP addresses, users, etc.
And (using the request id) we can combine them with other logs from the system.

For this we could (slowly) add smarter logging, using data from the RequestData and Context to tag logging messages. Similar to the property lists that can be added to lager (as a first argument).
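
To illustrate the tagging idea, here is a sketch using Python's standard `logging` module rather than lager. The helper `request_logger` and the `TraceFilter` class are hypothetical names; the metadata keys are assumptions based on the fields discussed in this issue.

```python
import logging


def request_logger(base, request_id, user_id=None, peer_ip=None):
    """Return a logger that tags every message with request metadata."""
    metadata = {"request_id": request_id, "user_id": user_id, "peer_ip": peer_ip}
    return logging.LoggerAdapter(base, metadata)


class TraceFilter(logging.Filter):
    """Pass only records tagged with one of the traced peer IPs."""

    def __init__(self, traced_ips):
        super().__init__()
        self.traced_ips = set(traced_ips)

    def filter(self, record):
        return getattr(record, "peer_ip", None) in self.traced_ips
```

Because every record carries the request id, a trace on one IP address can later be joined with log lines written elsewhere during the same request.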

@mworrell
Member
mworrell commented Jun 5, 2014

(Continued)

The access log can then just be a diversion from the normal logging processes, even adding extra information (like timing per dispatch rule, host, user, IP, etc.).

@mmzeeman
Member
mmzeeman commented Jun 5, 2014

Currently the logging is done in z_stats, which takes useful information from the requests, sends it to folsom, and then calls the standard webmachine access logger.

We should make it pluggable then.

Maybe simply with a call to z_notifier:notify? That way we can also separate the logs for different sites easily.

@mmzeeman
Member
mmzeeman commented Jun 5, 2014

What we have right now is also OK, except that the configuration of the logger is hard-coded inside zotonic_sup. We could make it pluggable right there.

A simple middleware-type module that calls multiple other configured loggers will do the trick just fine.
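
A sketch of such a middleware module, in Python for illustration. The names `LoggerChain`, `DEFAULT_LOGGERS`, and the `registry` mapping are hypothetical; in Zotonic the list of loggers would come from configuration instead of being hard-coded.

```python
# Assumed default: only the existing file-based access logger.
DEFAULT_LOGGERS = ["file"]


class LoggerChain:
    """Forward each access log entry to every configured logger."""

    def __init__(self, registry, configured=None):
        # registry maps a logger name to a callable taking one entry.
        names = DEFAULT_LOGGERS if configured is None else configured
        self._loggers = [registry[name] for name in names]

    def log_access(self, entry):
        for logger in self._loggers:
            logger(entry)
```

With no configuration given, the chain behaves like the current single-logger setup, which matches the idea of keeping the existing behaviour as the default.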

The current setup can be the default configuration.

@mworrell
Member

This sho

@mworrell mworrell closed this Apr 22, 2015