Permalink
Switch branches/tags
Nothing to show
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
412 lines (382 sloc) 16.5 KB
---
# Copyright 2017 Yahoo Holdings. Licensed under the terms of the Apache 2.0 license. See LICENSE in the project root.
title: "Log Files"
---
<p>
All Vespa components use a common log module for logging. These log
messages are added to a local log file in <em>$VESPA_HOME/logs/vespa/</em>
and filtered, then forwarded, to the log server on the administration node.
The log archive and rotation is explained in <a href="#log-server">log server</a>.
</p>
<h2 id="log-file-fields">Log file fields</h2>
<p>
Log files are in a machine-readable log format, made more human-readable by
<a href="vespa-cmdline-tools.html#vespa-logfmt">vespa-logfmt</a> -
it can filter out log messages from specific programs, only show certain log levels,
print the time in a more directly understandable format and so on.
Each line that is logged consists of the following fields, in order, separated by a TAB (ASCII 9) character:
</p>
<pre>
time host pid service component level message
</pre>
<table class="table">
<thead>
<tr><th>Log field</th><th>Description</th></tr>
</thead><tbody>
<tr><td><em><tt>time</tt></em></td><td>
Time in seconds since 1970-01-01 UTC, with optional fractional seconds after. E.g. 1102675319.726342
</td></tr>
<tr><td><em><tt>host</tt></em></td><td>
The hostname of the machine that produced this log entry
</td></tr>
<tr><td><em><tt>pid</tt></em></td><td>
The process id, and an optional thread-id of the process/thread that logged the message
</td></tr>
<tr><td><em><tt>service</tt></em></td><td>
The Vespa service name of the logger
</td></tr>
<tr><td><em><tt>component</tt></em></td><td>
The component name that logged.
An application may have many different sub-components with their own component names,
usually starts with the name of the binary
</td></tr>
<tr><td><em><tt>level</tt></em></td><td>
One of fatal, error, warning, info, config, event, debug, or spam
</td></tr>
<tr><td><em><tt>message</tt></em></td><td>
The log message itself. All dangerous characters are escaped
(CR, NL, TAB, \, ASCII &lt; 32 and ASCII 128..159)
</td></tr>
</tbody>
</table>
<table class="table">
<thead>
<tr><th>Log level</th><th>Description</th></tr>
</thead><tbody>
<tr><td><em><tt>fatal</tt></em></td><td>
Fatal error messages. The application must exit immediately,
and restarting it will not help
</td></tr>
<tr><td><em><tt>error</tt></em></td><td>
Error messages. These are serious, the application cannot function correctly
</td></tr>
<tr><td><em><tt>warning</tt></em></td><td>
Warnings - the application may be able to continue, but the situation should be looked into
</td></tr>
<tr><td><em><tt>info</tt></em></td><td>
Informational messages that are not reporting error conditions,
but should still be useful to the operator
</td></tr>
<tr><td><em><tt>config</tt></em></td><td>
Configuration settings
</td></tr>
<tr><td><em><tt>event</tt></em></td><td>
<a href="#log-events">Machine-readable events</a>.
May contain information about processes starting and stopping, and various metrics
</td></tr>
<tr><td><em><tt>debug</tt></em></td><td>
Debug messages - normally suppressed
</td></tr>
<tr><td><em><tt>spam</tt></em></td><td>
Very low-level debug messages, normally suppressed.
Generates massive amounts of logs when enabled
</td></tr>
</tbody>
</table>
<h2 id="writing-logs">Writing logs</h2>
<p>
Code can have many <em>DEBUG</em>-level log lines -
include a simple check to speed up execution:
<pre>
if(log.isLoggable(LogLevel.DEBUG)) then log ...
</pre>
</p>
<h2 id="controlling-log-levels">Controlling log levels</h2>
<p>
Vespa programs running as services can have their active log levels modified at
run-time by the vespa-logctl program. Standalone programs will
instead read the environment variable <em>VESPA_LOG_LEVEL</em> on startup to
determine which log levels should be active.
</p><p>
The default setting for <em>VESPA_LOG_LEVEL</em> is <em>"all -debug -spam"</em>,
which enables all log levels except debug and spam.
</p><p>
<em>vespa-logctl</em> can alter the active log levels for a program, or for
parts of a program, while it is running. This is especially useful for
enabling debug output in parts of a live system for diagnosing problems. It
can also be used to silence programs that are too verbose in their
logging.
</p><p>
<em>vespa-logctl -h</em> shows how to use the program.
Programs that can be controlled by vespa-logctl put log control files in
<em>$VESPA_HOME/var/db/vespa/logcontrol/service.logcontrol</em>.
If this file exists on program startup, it will be used to set the logging levels.
This means that log level modifications done with <em>vespa-logctl</em> are sticky,
and can also be performed even if the program is not running.
</p>
<h2 id="log-events">Log events</h2>
<p>
Event messages are log messages of the <em>event</em> type.
Events contain a well-defined payload which makes them suitable
for automated processing of various kinds, like alerting.
An event is emitted by a component when something of interest happens to it,
or when it has some metric data it wants to share with the world.
As all other log messages, events are collected to the admin nodes by the logserver component,
where they may be found in the Vespa log or intercepted programmatically by a logserver plugin.
</p><p>
Metrics are used to report on internal variables detailing the
processing performed by a particular component.
<em>VALUES</em> are numbers with momentarily significance,
such as queue lengths and latencies.
<em>COUNTER</em> are numbers increasing monotonically with each processing step,
such as number of documents processes, or number of queries.
Refer to the <a href="metrics-health-format.html">metrics API</a> and
<a href="../jdisc/metrics.html">container metrics</a>.
</p><p>
Each event has an event <em>type</em>, a <em>version</em> and an optional <em>payload</em>.
In the log format, event types are expressed as a single word, versions as a simple integer,
and the payload as a set of <em>key=value</em> pairs.
The event payload is backslash-quoted just like log messages are in general.
This means that events may be double-quoted during transport.
Double-quote delimiters are not supported.
</p>
<table class="table">
<thead></thead><tbody>
<tr>
<th>starting</th>
<td>
Payload: <em>name=&lt;name&gt;</em><br/>
This event is sent by processes when they are about to start another
process. Typical for, but not limited to, shell scripts. This event
is not required to track processes, but is useful in cases where a
sub-process may fail during startup. Example:
<pre>starting qrserver for default/container.0</pre>
</td>
</tr>
<tr>
<th>started</th>
<td>
Payload: <em>name=&lt;name&gt;</em><br/>
The <em>started</em> event is sent by a service that just started up. Example:
<pre>started/1 name="vespa-proton"</pre>
</td>
</tr>
<tr>
<th>stopping</th>
<td>
Payload: <em>name=&lt;name&gt; why=&lt;why&gt;</em><br/>
The <em>stopping</em> event is sent by a process that is about to exit. Example:
<pre>stopping/1 name="vespa-proton" why="clean shutdown"</pre>
</td>
</tr>
<tr>
<th>stopped</th>
<td>
Payload: <em>name=&lt;name&gt; pid=&lt;pid&gt; exitcode=&lt;exitcode&gt;</em><br/>
This event is sent by a process monitoring when a sub-process exits. Example:
<pre>stopped/1 name="vespa-proton" pid=14523 exitcode=0</pre>
</td>
</tr>
<tr>
<th>crash</th>
<td>
Payload: <em>name=&lt;name&gt; pit=&lt;pid&gt; signal=&lt;signal&gt;</em><br/>
Submitted by a process monitoring a sub-process when the sub-process
crashes (dumps core etc.). Example:
<pre>crash/1 name="vespa-proton" pid=12345 signal=11</pre>
</td>
</tr>
<tr>
<th>count</th>
<td>
Payload: <em>name=&lt;name&gt; value=&lt;value&gt;</em><br/>
General event for counts - for tracking any type of counter metric.
The <em>name</em> is specific to each library/application. Counters are assumed
to increase with time, counting the number of events since the program was started. Example:
<pre>count/1 name="queries" value=10</pre>
</td>
</tr>
<tr>
<th>value</th>
<td>
Payload: <em>name=&lt;name&gt; value=&lt;value&gt;</em><br/>
General event for values - for tracking any type of value metric.
<em>Value is for values that cannot be counts</em>.
Typical values are queue lengths, transaction frequencies and so on. Example:
<pre>value/1 name="peak_qps" value=200</pre>
</td>
</tr>
<tr>
<th>state</th>
<td>
Payload: <em>name=&lt;name&gt; value=&lt;value&gt;</em><br/>
General event for components in a process.
<em>value</em> contains a string with more detailed information on what has happened.
Note that the format and content of such strings can change between releases. Example:
<pre>state/1 name="transactionlog.replay.start" value="{"domain":"test","serialnum":{"first":1,"last":1000}}"</pre>
</td>
</tr>
</tbody>
</table>
<h2 id="logd">Logd</h2>
<p>
Details here on how local log rotation and forward to log server is done by logd
<!-- ToDo -->
</p>
<h2 id="log-server">Log server</h2>
<p>
On the log server on the administration node, the <em>Archiver</em>
plugin will write the log messages from each
node to a log archive. These messages are written to the
log file based on the message timestamp. The log files are located
in the <code>$VESPA_HOME/logs/vespa/logarchive</code> directory.
The catalog structure is like:
<pre>
logarchive/&lt;year&gt;/&lt;month&gt;/&lt;day&gt;/&lt;hour&gt;-&lt;serial&gt;
</pre>
For instance will a message logged at 2016-07-22 08:05:00 be found in:
<pre>
logarchive/2016/07/22/08-0
</pre>
All dates and times are in UTC. If the log file exceeds 20 Mb,
the file will be rotated and the serial number will increase. These
files are the same that are available through the web interface.
</p><p>
The log archive might grow large - to disable storing of metrics in log
archive, set the <code>vespa_log_server.archive_metric</code> environment setting to off.
If you later need to debug using metric data, aggregate these by hand
from the individual nodes in the system.
</p><p>
Events are normally filtered before sending to the log archive.
To forward, add to <em>services.xml</em>:
<pre>
&lt;services&gt;
...
&lt;config name="cloud.config.log.logd"&gt;
&lt;loglevel&gt;
&lt;event&gt;
&lt;forward&gt;true&lt;/forward&gt;
&lt;/event&gt;
&lt;/loglevel&gt;
&lt;/config&gt;
...
</pre>
</p>
<h3 id="real-time-log-monitoring">Real-time log monitoring</h3>
<p>
One of the standard plugins in the log server is the <em>Replicator</em> plugin.
This plugin is able to replicate log messages to other interested services or parties.
To use this plugin, telnet to port 19083 on the log server.
Use the <a href="vespa-cmdline-tools.html#vespa-model-inspect">vespa-model-inspect</a>
to verify that the logserver has a port tagged with 'telnet logtp' at 19083.
<pre>
$ telnet host 19083
</pre>
After connecting, you can apply a message filter. To list the
message filters available, simply write <code>list</code>. To
apply a message filter, use <code>use &lt;filter&gt;</code>, like:
<pre>
Connected to host.
Escape character is '^]'.
list
# 203 filter list
# 204 system.all - Match all log messages
# 204 system.metricsevents - Match all Count and Value messages
# 204 system.allevents - Match specific log levels
# 204 system.nometricsevents - Matches all log messages except Count and Value events
# 204 system.mute - Matches no messages. Mute.
# 205 end filter list
use system.all
# 200 using 'com.yahoo.logserver.filter.NullFilter@62974e'
</pre>
</p>
<h2 id="access-log-file-content">Access log file content</h2>
<p>
The Search Container logs each query in its access log.
The log files are named QueryAccessLog.&lt;timestamp&gt;
and is found in <em>$VESPA_HOME/logs/vespa/qrs/</em>.
In the default 'vespa' format, each line that is logged consists of the following fields,
in order, separated by a space character:
<pre>
SourceIP - - [Date] "HTTPRequest" HTTPReturnCode ByteCount "Referrer" "UserAgent" ProcessingTime TotalHitCount 0 HitCount
</pre>
<table class="table">
<thead>
<tr><th>Log field</th><th>Description</th></tr>
</thead><tbody>
<tr><td><em><tt>SourceIP </tt></em></td><td>IP address of client </td></tr>
<tr><td><em><tt>- </tt></em></td><td>Nothing </td></tr>
<tr><td><em><tt>- </tt></em></td><td>Nothing </td></tr>
<tr><td><em><tt>Date </tt></em></td><td>Timestamp (localtime) when query was executed with offset from GMT </td></tr>
<tr><td><em><tt>HTTPRequest </tt></em></td><td>HTTP request string </td></tr>
<tr><td><em><tt>HTTPReturnCode </tt></em></td><td>HTTP return
code for the query
<ul>
<li>200 for ok</li>
</ul>
</td></tr>
<tr><td><em><tt>ByteCount </tt></em></td><td>Number of bytes sent to the client </td></tr>
<tr><td><em><tt>Referrer </tt></em></td><td>HTTP referrer string </td></tr>
<tr><td><em><tt>UserAgent </tt></em></td><td>Client user agent </td></tr>
<tr><td><em><tt>ProcessingTime </tt></em></td><td>Time used to process the query in seconds. Including query parsing, back end execution. Does not include query reception from client and result set transfer time to client.</td></tr>
<tr><td><em><tt>TotalHitCount </tt></em></td><td>The total number of hits for the result set. </td></tr>
<tr><td><em><tt>0 </tt></em></td><td>Reserved for later expansion - always 0 </td></tr>
<tr><td><em><tt>HitCount </tt></em></td><td>Number of hits in the query result page. Note that the number of grouping results (if there are any) is included in this number </td></tr>
</tbody>
</table>
Example log entry (timestamp offset -0800 is equal to PST):
<pre>
127.0.0.1 - - [24/Jan/2008:05:29:04 -0800] "GET search/?query=yahoo HTTP/1.1" 200 270 ""
"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7" 2.792 1 0 1
</pre>
If more detailed logging is needed, configure access logging in JSON.
This format also allows logging of custom fields - see
<a href="../logging-in-json-format.html">access logging in JSON</a>.
</p>
<h3 id="time-values-in-the-access-log-compared-to-metrics-and-log-events">
Time values in the access log compared to metrics and log events</h3>
<p>
The timing in the access log will in general be slightly off compared
to the timing values in the vespa.log.
The reason is the “probes” into the system are placed at slightly different levels of abstraction.
The explanations here are directed at experienced users and troubleshooting.
</p>
<h4>Definition of processing time in the access log</h4>
<p>
Processing time in the access log starts when the execution is first
invoked from the search handler. The end is dependent on whether
the response is asynchronous or not. For a synchronous response, the
end is after the renderer has been invoked, but before the rendering
buffer is flushed. For an asynchronous response, e.g. a normal
search response, the end is defined as when the completion handler
is created. That means after control flow has returned from the
search chains, but before any network traffic or rendering has been
done.
</p>
<h4>Definition of processing time in the vespa.log</h4>
<p>
StatisticsSearcher defines the metric <em>query_latency</em> and the log event
<em>mean_query_latency</em>. The data fed into these is identical. The start of
the interval is defined as when the control flow enters
StatisticsSearcher, the end as when the next searcher after
StatisticsSearcher returns from search(). This has the side
effect of <em>not</em> including fill time if the result was not already
filled when passed on from StatisticsSearcher. This may happen if the
SearchHandler has to invoke fill() itself, e.g. if no searchers need to
access hit fields.
</p>
<h4>Timing summary</h4>
<p>
The access log includes everything happening before rendering, but
will exclude expensive rendering logic and slow networks. The query
latency event and metrics only covers what happens inside the search
chain where StatisticsSearcher is placed, and may exclude summary
fetching.
</p>
<h3 id="zookeeper-log">ZooKeeper Log</h3>
<p>
The ZooKeeper log file is normally not necessary to monitor on a regular basis,
but is mentioned here as a possible source of information in case you
should ever need to debug the Vespa configuration system.
Is is located at <code>$VESPA_HOME/logs/vespa/zookeeper.&lt;servicename&gt;.log</code>.
</p>