# LOG FILES

To manage a web server effectively, you need feedback about the activity and performance of the server,
as well as any problems that may be occurring. The Apache HTTP Server provides comprehensive and flexible logging capabilities.
This document describes how to configure those capabilities and how to understand what the logs contain.

# error logs

#The server error log, whose name and location are set by the ErrorLog directive,
#is the most important log file. This is where Apache httpd sends diagnostic information and records
#any errors that it encounters while processing requests.
#It is the first place to look when a problem occurs with starting or operating the server,
#since it often contains details of what went wrong and how to fix it.

# format of the error log

[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test

#The first item in the log entry is the date and time of the message.
#The second item lists the severity of the error being reported.
#The LogLevel directive controls which types of errors are sent to the error log by restricting the severity level.
#The third item gives the IP address of the client that generated the error.
#Beyond that is the message itself, which in this case indicates that the server has been configured to deny the client access.
#The server reports the file-system path (as opposed to the web path) of the requested document.
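
The fields described above can be pulled apart with a regular expression. The following is a minimal sketch that assumes every entry follows the layout of the sample line (timestamp, severity, client, message); other httpd configurations may emit different or additional fields. The names `ERROR_LOG_RE` and `parse_error_line` are illustrative and are not part of Apache or of the Python standard library.

```python
import re
from datetime import datetime

# Sample entry copied from the text above.
line = ("[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] "
        "client denied by server configuration: "
        "/export/home/live/ap/htdocs/test")

# Hypothetical pattern for this layout: [timestamp] [level] [client ip] message
ERROR_LOG_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] "
    r"\[(?P<level>[^\]]+)\] "
    r"\[client (?P<client>[^\]]+)\] "
    r"(?P<message>.*)$"
)

def parse_error_line(text):
    """Return a dict of fields, or None if the line does not match."""
    match = ERROR_LOG_RE.match(text)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the timestamp text into a datetime object
    # (assumes English day and month abbreviations).
    fields["timestamp"] = datetime.strptime(
        fields["timestamp"], "%a %b %d %H:%M:%S %Y"
    )
    return fields

entry = parse_error_line(line)
print(entry["level"], entry["client"])
print(entry["message"])
```

Returning `None` for lines that do not match lets a caller skip or count unexpected entries instead of crashing on them.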

# ACCESS LOGS

#The server access log records all requests processed by the server.

# common log format
A typical configuration for the access log might look as follows.

LogFormat "%h %l %u %t \"%r\" %>s %b" common
CustomLog logs/access_log common

The format string consists of percent directives, each of which tells the server to log a particular piece of information. Literal characters may also be placed in the format string and will be copied directly into the log output. The quote character (") must be escaped by placing a backslash before it to prevent it from being interpreted as the end of the format string. The format string may also contain the special control characters "\n" for new-line and "\t" for tab.

# The log file entries produced in CLF will look something like this:

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326

# each part of the log entry is described below

In [8]:
#1.   127.0.0.1 (%h)

#This is the IP address of the client (remote host) which made the request to the server.
#If HostnameLookups is set to On, then the server will try to determine the hostname and log it in place of the IP address.
#However, this configuration is not recommended since it can significantly slow the server.
#Instead, it is best to use a log post-processor such as logresolve to determine the hostnames. 
#The IP address reported here is not necessarily the address of the machine at which the user is sitting. 
#If a proxy server exists between the user and the server, this address will be the address of the proxy,
#rather than the originating machine.

In [9]:
#2.   (%l)
#The "hyphen" in the output indicates that the requested piece of information is not available. 
#In this case, the information that is not available is the RFC 1413 identity of the client determined by identd on the client's machine.
#This information is highly unreliable and should almost never be used except on tightly controlled internal networks. 
#Apache httpd will not even attempt to determine this information unless IdentityCheck is set to On.

In [11]:
#3. frank (%u)
#This is the userid of the person requesting the document as determined by HTTP authentication. The same value is typically provided to CGI scripts in the REMOTE_USER environment variable. If the status code for the request (see below) is 401, then this value should not be trusted because the user is not yet authenticated. If the document is not password protected, this entry will be "-" just like the previous one.

#4.  [10/Oct/2000:13:55:36 -0700] (%t)
#The time that the server finished processing the request. The format is:
#[day/month/year:hour:minute:second zone]
#day = 2*digit
#month = 3*letter
#year = 4*digit
#hour = 2*digit
#minute = 2*digit
#second = 2*digit
#zone = (`+' | `-') 4*digit
#It is possible to have the time displayed in another format by specifying %{format}t in the log format string, where format is as in strftime(3) from the C standard library.

#5.  "GET /apache_pb.gif HTTP/1.0" (\"%r\")
#The request line from the client is given in double quotes. The request line contains a great deal of useful information. First, the method used by the client is GET. Second, the client requested the resource /apache_pb.gif, and third, the client used the protocol HTTP/1.0. It is also possible to log one or more parts of the request line independently. For example, the format string "%m %U%q %H" will log the method, path, query-string, and protocol, resulting in exactly the same output as "%r".


#6.  200 (%>s)
#This is the status code that the server sends back to the client. This information is very valuable, because it reveals whether the request resulted in a successful response (codes beginning in 2), a redirection (codes beginning in 3), an error caused by the client (codes beginning in 4), or an error in the server (codes beginning in 5). The full list of possible status codes can be found in the HTTP specification (RFC2616 section 10).

#7.  2326 (%b)
#The last entry indicates the size of the object returned to the client, not including the response headers. If no content was returned to the client, this value will be "-". To log "0" for no content, use %B instead.
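
The seven fields described above map naturally onto named groups in a regular expression. The sketch below assumes the exact `common` format shown earlier and a request line that contains no double quotes; `CLF_RE`, `STATUS_CLASSES`, and `parse_clf` are hypothetical names chosen for illustration. The status class labels follow the grouping by first digit given in item 6.

```python
import re
from datetime import datetime, timedelta

# Sample entry copied from the text above.
line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326')

# One named group per field: %h %l %u %t "%r" %>s %b
CLF_RE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)

# First digit of the status code, as described in item 6.
STATUS_CLASSES = {2: "success", 3: "redirection",
                  4: "client error", 5: "server error"}

def parse_clf(text):
    """Return a dict of typed fields, or None if the line does not match."""
    match = CLF_RE.match(text)
    if match is None:
        return None
    fields = match.groupdict()
    # [day/month/year:hour:minute:second zone] becomes an aware datetime.
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    fields["status_class"] = STATUS_CLASSES.get(fields["status"] // 100, "other")
    # %b logs "-" when no content was returned; treat that as 0 bytes.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields

record = parse_clf(line)
print(record["host"], record["user"], record["status"], record["size"])
print(record["request"].split())
```

Converting `status` and `size` to integers at parse time makes later filtering or summing straightforward, for example totalling the bytes served for all requests in the "success" class.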

# combined log format

In [None]:
Another commonly used format string is called the Combined Log Format. It can be used as follows.

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
CustomLog log/access_log combined
This format is exactly the same as the Common Log Format, with the addition of two more fields. Each of the additional fields uses the percent-directive %{header}i, where header can be any HTTP request header. The access log under this format will look like:

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
The additional fields are:

"http://www.example.com/start.html" (\"%{Referer}i\")
The "Referer" (sic) HTTP request header. This gives the site that the client reports having been referred from. (This should be the page that links to or includes /apache_pb.gif).
"Mozilla/4.08 [en] (Win98; I ;Nav)" (\"%{User-agent}i\")
The User-Agent HTTP request header. This is the identifying information that the client browser reports about itself.
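
Because the Combined Log Format only appends two quoted fields, a pattern for it can add two more quoted groups to the end of a Common Log Format pattern. This is a sketch under the assumption that neither header value contains a double quote; `COMBINED_RE` is a hypothetical name used only for this example.

```python
import re

# Sample entry copied from the text above.
line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
        '"http://www.example.com/start.html" '
        '"Mozilla/4.08 [en] (Win98; I ;Nav)"')

# Common Log Format fields followed by the quoted Referer and User-agent.
COMBINED_RE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

match = COMBINED_RE.match(line)
if match:
    print("referer:", match.group("referer"))
    print("agent:  ", match.group("agent"))
else:
    print("line is not in the Combined Log Format")
```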

In [4]:
# Check whether the string starts with "The" and ends with "spain"
import re
txt = "The rain in spain"
x = re.search("^The.*spain$", txt)

In [5]:
x

<re.Match object; span=(0, 17), match='The rain in spain'>

In [6]:
m = re.findall("ai", txt)  # returns a list containing all matches
m

['ai', 'ai']

In [7]:
g = re.findall("portugal", txt)  # returns an empty list if no match was found

In [8]:
g

[]

In [10]:
k = re.search(r"\s", txt)  # raw string, so that \s is not treated as an invalid string escape

In [15]:
print("The first white-space character is located in position:", k.start())

The first white-space character is located in position: 3


In [16]:
k

<re.Match object; span=(3, 4), match=' '>

In [17]:
q = re.search(r"\S", txt)  # first non-whitespace character

In [18]:
q

<re.Match object; span=(0, 1), match='T'>

In [19]:
v = re.split(r"\s", txt)  # split at every whitespace character

In [20]:
v

['The', 'rain', 'in', 'spain']

In [40]:
vv = re.split(r"\s", txt, maxsplit=1)  # split at most once, at the first whitespace character

In [41]:
vv

['The', 'rain in spain']

In [48]:
qq = re.sub(r"\s", "_", txt, count=3)  # replace up to 3 whitespace characters with "_"

In [49]:
qq

'The_rain_in_spain'

In [54]:
import re
txt = "The rain in Spain"
pp= re.search(r"\bS\w+", txt)

In [56]:
pp.string

'The rain in Spain'

In [57]:
txt = "The rain in Spain"
ff= re.search(r"\bS\w+", txt)
ff.span()


(12, 17)

In [58]:
txt = "The rain in Spain"
hh= re.search(r"\bS\w+", txt)
hh.group()


'Spain'

In [59]:
s = 'foo123bar'

In [60]:
'123' in s

True

In [61]:
s.find('123')

3

In [62]:
s.index('123')

3

In [63]:
if re.search('123',s):
    print("found a match")
else:
    print("no match")
    


found a match


In [None]:
#For re.search('123', s), the match object is <re.Match object; span=(3, 6), match='123'>. Here span=(3, 6) is the portion of the string where the match was found, which is s[3:6] in slice notation.
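
The relationship between a match's span and slice notation can be checked directly. This is a small sketch that reuses the string `s` from the cells above; the variable names `match`, `start`, and `end` are illustrative.

```python
import re

s = 'foo123bar'
match = re.search('123', s)

# span() returns (start, end); end is exclusive, just like a slice.
start, end = match.span()
print(start, end)
print(s[start:end] == match.group())
```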