## Introduction

Imagine your company uses a server that runs a service called ticky, an internal ticketing system. The service logs events to syslog, both when it runs successfully and when it encounters errors.

The service's developers need your help getting some information from those logs so that they can better understand how their software is used and how to improve it. So, for this lab, you'll write some automation scripts that will process the system log and generate reports based on the information extracted from the log files.

What you'll do:

- Use regex to parse a log file
- Append and modify values in a dictionary
- Write to a file in CSV format
- Move files to the appropriate directory for use with the CSV->HTML converter

Connect to the lab machine over SSH using the provided key file:

```bash
# ip       : 34.82.23.153
# username : student-03-00b41e060a5d
# key file : qwiklabs-L83148684.pem

chmod 600 qwiklabs-L83148684.pem

ssh -i qwiklabs-L83148684.pem student-03-00b41e060a5d@34.82.23.153
```

## Exercise - 1
We'll be working with a log file named syslog.log, which contains logs related to ticky.

You can view this file using:
```bash
cat syslog.log
```

When the service runs correctly, it logs an INFO message to syslog. It then states what it did and states the username and ticket number related to the event. If the service encounters a problem, it logs an ERROR message to syslog. This error message indicates what was wrong and states the username that triggered the action that caused the problem.

In this section, we'll search for and view different types of error messages. Each ticky error message describes the problem and carries a timestamp for when it occurred.

These are a few of the errors that appear in the log:

- Timeout while retrieving information
- The ticket was modified while updating
- Connection to DB failed
- Tried to add information to a closed ticket
- Permission denied while closing ticket
- Ticket doesn't exist

To grep all the logs from ticky, use the following command:

```bash
grep ticky syslog.log
```

To search for all the ERROR logs, use the following command:
```bash
grep ERROR syslog.log
```

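To list all the ERROR messages of a specific kind, use the syntax below. Quote the whole search string so that grep treats it as a single pattern; otherwise the message is interpreted as a file name.

Syntax: grep "ERROR [message]" [file-name]

```bash
grep "ERROR Connection to DB failed" syslog.log

grep "ERROR Tried to add information to closed ticket" syslog.log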
```
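The same kind of filtering can be sketched in Python. This is only an illustration: the `grep_lines` helper and the sample log lines are made up for the example, and the real contents of syslog.log may differ.

```python
import re

# Hypothetical sample lines, written in the log format described above
sample_log = [
    "Jan 31 00:09:39 ubuntu.local ticky: INFO Created ticket [#4217] (mdouglas)",
    "Jan 31 00:16:25 ubuntu.local ticky: ERROR Connection to DB failed (oren)",
    "Jan 31 00:21:30 ubuntu.local ticky: ERROR The ticket was modified while updating (breee)",
    "Jan 31 00:44:34 ubuntu.local ticky: ERROR Connection to DB failed (jackowens)",
]

def grep_lines(pattern, lines):
    """Return the lines that match the regular expression, like `grep pattern file`."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

# Keep only the lines for one specific kind of error
db_errors = grep_lines(r"ERROR Connection to DB failed", sample_log)
for line in db_errors:
    print(line)
```

To run this against the real file, you could pass an open file object instead of the sample list, since iterating over a file yields its lines.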

We can also search for the ERROR/INFO messages in a Pythonic way using regular expressions. Let's now write a few regular expressions using a python3 interpreter.

Open Python shell using the command below:

```bash
python3
```

First, extract the username from a log line:

```python
import re
line = "May 27 11:45:40 ubuntu.local ticky: INFO: Created ticket [#1234] (username)"

# Extracting the username
user_pattern = r"\((\w+)\)$"
user = re.search(user_pattern, line)
print(user[1])
# username
```

Next, match the message of an INFO entry:

```python
re.search(r"ticky: INFO: ([\w ]*) ", line)
# <re.Match object; span=(29, 57), match='ticky: INFO: Created ticket '>
```

The same approach works for an ERROR entry:

```python
line = "May 27 11:45:40 ubuntu.local ticky: ERROR: Error creating ticket [#1234] (username)"
re.search(r"ticky: ERROR: ([\w ]*) ", line)
# <re.Match object; span=(29, 65), match='ticky: ERROR: Error creating ticket '>
```
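The three searches above can also be combined into one pattern with named groups. The sketch below is an assumption about the line layout rather than a definitive parser: `LOG_PATTERN` and `parse_line` are names invented for this example, and the colon after INFO/ERROR is treated as optional because it may or may not be present in a given log.

```python
import re

# Assumed layout: "<timestamp> <host> ticky: <TYPE>[:] <message> (<username>)"
LOG_PATTERN = re.compile(
    r"ticky: (?P<type>INFO|ERROR):? (?P<message>.*) \((?P<user>[^)]+)\)\s*$"
)

def parse_line(line):
    """Return a dict with the type, message and user, or None if the line doesn't match."""
    match = LOG_PATTERN.search(line)
    return match.groupdict() if match else None

line = "May 27 11:45:40 ubuntu.local ticky: INFO: Created ticket [#1234] (username)"
print(parse_line(line))
# {'type': 'INFO', 'message': 'Created ticket [#1234]', 'user': 'username'}
```

Returning None for lines that don't match lets the caller skip unrelated syslog entries instead of failing on them.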

## Exercise - 2
Now, use the Python interactive shell to create a dictionary and sort its items. By default, sorted() orders the (key, value) tuples by key.

```python
fruit = {"oranges": 3, "apples": 5, "bananas": 7, "pears": 2}

sorted(fruit.items())
# [('apples', 5), ('bananas', 7), ('oranges', 3), ('pears', 2)]
```

We'll now sort the dictionary explicitly by key. For this, use the operator module.

Pass the function itemgetter() as the key argument to the sorted() function. Since the first element of each tuple (the key) is what we want to sort by, pass the argument 0 to the itemgetter function of the operator module.

```python
import operator

sorted(fruit.items(), key=operator.itemgetter(0))
# [('apples', 5), ('bananas', 7), ('oranges', 3), ('pears', 2)]
```

To sort a dictionary based on its values, pass the argument 1 to the itemgetter function of the operator module.

```python
sorted(fruit.items(), key=operator.itemgetter(1))
# [('pears', 2), ('oranges', 3), ('apples', 5), ('bananas', 7)]
```

Finally, you can also reverse the order of the sort using the reverse parameter. This parameter takes in a boolean argument.

To sort the fruit object from most to least occurrence, we pass the argument reverse=True.

```python
sorted(fruit.items(), key=operator.itemgetter(1), reverse=True)
# [('bananas', 7), ('apples', 5), ('oranges', 3), ('pears', 2)]
```

You can see the fruit object is now sorted from the most to the least number of occurrences.
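When several entries share the same count, sorted() is stable, so they keep their original relative order. If a predictable order is preferred, one option is a key that sorts by count first and by name second. The `counts` dictionary below is made up for illustration and is not part of the lab.

```python
# Hypothetical counts with a tie between "apples" and "kiwis"
counts = {"pears": 2, "kiwis": 5, "apples": 5}

# Negating the count sorts from high to low; the name breaks ties alphabetically
ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
print(ranked)
# [('apples', 5), ('kiwis', 5), ('pears', 2)]
```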

## Exercise - 3
We'll now work with a file named csv_to_html.py. This file converts the data in a CSV file into an HTML file that contains a table with the data. Let's practice this with an example file.

Create a new CSV file named user_emails.csv and paste in the data shown below:
```bash
nano user_emails.csv
```

```csv
Full Name, Email Address
Blossom Gill, blossom@abc.edu
Hayes Delgado, nonummy@utnisia.com
Petra Jones, ac@abc.edu
Oleg Noel, noel@liberomauris.ca
Ahmed Miller, ahmed.miller@nequenonquam.co.uk
Macaulay Douglas, mdouglas@abc.edu
Aurora Grant, enim.non@abc.edu
Madison Mcintosh, mcintosh@nisiaenean.net
Montana Powell, montanap@semmagna.org
Rogan Robinson, rr.robinson@abc.edu
Simon Rivera, sri@abc.edu
Benedict Pacheco, bpacheco@abc.edu
Maisie Hendrix, mai.hendrix@abc.edu
Xaviera Gould, xlg@utnisia.net
Oren Rollins, oren@semmagna.com
Flavia Santiago, flavia@utnisia.net
Jackson Owens, jackowens@abc.edu
Britanni Humphrey, britanni@ut.net
Kirk Nixon, kirknixon@abc.edu
Bree Campbell, breee@utnisia.net
```
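Note that this file has a space after each comma. As a small sketch of how Python's csv module treats that, the example below reads two of the rows from an in-memory string: by default the space stays at the start of the second field, while `skipinitialspace=True` drops it. The variable names are chosen only for this example.

```python
import csv
import io

# Two rows copied from user_emails.csv, held in memory instead of a file
sample = "Full Name, Email Address\nBlossom Gill, blossom@abc.edu\n"

# Default behavior: whitespace after the delimiter is kept in the field
default_rows = list(csv.reader(io.StringIO(sample)))

# skipinitialspace=True ignores whitespace immediately after the delimiter
trimmed_rows = list(csv.reader(io.StringIO(sample), skipinitialspace=True))

print(default_rows[1])  # ['Blossom Gill', ' blossom@abc.edu']
print(trimmed_rows[1])  # ['Blossom Gill', 'blossom@abc.edu']
```

For an HTML table the extra space is harmless, because browsers collapse leading whitespace inside a cell.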

Give executable permission to the script file csv_to_html.py.
```bash
chmod +x csv_to_html.py
```

For reference, this is the content of csv_to_html.py:

```python
#!/usr/bin/env python3
import sys
import csv
import os

def process_csv(csv_file):
    """Turn the contents of the CSV file into a list of lists"""
    print("Processing {}".format(csv_file))
    with open(csv_file, "r") as datafile:
        data = list(csv.reader(datafile))
    return data

def data_to_html(title, data):
    """Turns a list of lists into an HTML table"""

    # HTML Headers
    html_content = """
<html>
<head>
<style>
table {
  width: 25%;
  font-family: arial, sans-serif;
  border-collapse: collapse;
}

tr:nth-child(odd) {
  background-color: #dddddd;
}

td, th {
  border: 1px solid #dddddd;
  text-align: left;
  padding: 8px;
}
</style>
</head>
<body>
"""

    # Add the header part with the given title
    html_content += "<h2>{}</h2><table>".format(title)

    # Add each row in data as a row in the table
    # The first line is special and gets treated separately
    for i, row in enumerate(data):
        html_content += "<tr>"
        for column in row:
            if i == 0:
                html_content += "<th>{}</th>".format(column)
            else:
                html_content += "<td>{}</td>".format(column)
        html_content += "</tr>"

    # Every row is already closed inside the loop, so only close the outer tags
    html_content += """</table></body></html>"""
    return html_content


def write_html_file(html_string, html_file):

    # Making a note of whether the html file we're writing exists or not
    if os.path.exists(html_file):
        print("{} already exists. Overwriting...".format(html_file))

    with open(html_file, "w") as htmlfile:
        htmlfile.write(html_string)
    print("Table successfully written to {}".format(html_file))

def main():
    """Verifies the arguments and then calls the processing function"""
    # Check that command-line arguments are included
    if len(sys.argv) < 3:
        print("ERROR: Missing command-line argument!")
        print("Exiting program...")
        sys.exit(1)

    # Open the files
    csv_file = sys.argv[1]
    html_file = sys.argv[2]

    # Check that file extensions are included
    if ".csv" not in csv_file:
        print('Missing ".csv" file extension from first command-line argument!')
        print("Exiting program...")
        sys.exit(1)

    if ".html" not in html_file:
        print('Missing ".html" file extension from second command-line argument!')
        print("Exiting program...")
        sys.exit(1)

    # Check that the csv file exists
    if not os.path.exists(csv_file):
        print("{} does not exist".format(csv_file))
        print("Exiting program...")
        sys.exit(1)

    # Process the data and turn it into an HTML
    data = process_csv(csv_file)
    title = os.path.splitext(os.path.basename(csv_file))[0].replace("_", " ").title()
    html_string = data_to_html(title, data)
    write_html_file(html_string, html_file)

if __name__ == "__main__":
    main()
```
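The heading shown above the generated table comes from the CSV file name: main() strips the directory and extension, replaces underscores with spaces, and title-cases the result. The sketch below breaks that one-line expression into steps; the helper name `title_from_path` is introduced only for this illustration and does not exist in the script.

```python
import os

def title_from_path(csv_file):
    """Mirror the title expression used in main(), one step at a time."""
    base_name = os.path.basename(csv_file)   # drop the directory part
    stem = os.path.splitext(base_name)[0]    # drop the ".csv" extension
    return stem.replace("_", " ").title()    # underscores to spaces, then title case

print(title_from_path("user_emails.csv"))         # User Emails
print(title_from_path("/tmp/error_message.csv"))  # Error Message
```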

To visualize the data in the user_emails.csv file, you have to generate a webpage that'll be served by the webserver running on the machine.

The script csv_to_html.py takes two arguments: the CSV file and the path where the generated HTML page will be written. Give write permission to the directory that will host that HTML page:

```bash
sudo chmod o+w /var/www/html
```

Next, run the csv_to_html.py script, passing two arguments: the user_emails.csv file and the path /var/www/html/ followed by the name you want the generated file to have, with an .html extension.

```bash
# Syntax: ./csv_to_html.py user_emails.csv /var/www/html/<html-filename>.html
./csv_to_html.py user_emails.csv /var/www/html/emails.html
```

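If the command fails with a permission error such as `-bash: ./csv_to_html.py: Permission denied`, rerun it with sudo, then view the generated file: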

```bash
sudo ./csv_to_html.py user_emails.csv /var/www/html/emails.html

cat /var/www/html/emails.html
```

Navigate to the /var/www/html directory. Here, you'll find an HTML file created with the filename you passed to the above script.
    
```bash
ls /var/www/html
cd /var/www/html
```

### Generate reports

Now, we're going to practice creating a script named ticky_check.py that generates two different reports from the internal ticketing system's log file, syslog.log. This script will create the following reports:

- **The ranking of errors** generated by the system: A list of all the error messages logged and how many times each error was found, sorted by the most common error to the least common error. This report doesn't take into account the users involved.

- **The user usage statistics** for the service: A list of all users that have used the system, including how many info messages and how many error messages they've generated. This report is sorted by username.

To create these reports, write a Python script named ticky_check.py. Use the nano editor for this.

```bash
nano ticky_check.py
```

Here's your challenge: Write a script to generate two different reports based on the ranking of errors generated by the system and the user usage statistics for the service. You'll write the script on your own, but we'll guide you throughout.

First, import all the Python modules that you'll use in this Python script. After importing the necessary modules, initialize two dictionaries: one for the number of different error messages and another to count the number of entries for each user (splitting between INFO and ERROR).

Now, parse through each log entry in the syslog.log file by iterating over the file.

For each log entry, you'll have to first check if it matches the INFO or ERROR message formats. You should use regular expressions for this. When you get a successful match, add one to the corresponding value in the per_user dictionary. If you get an ERROR message, also add one to the entry for that error message in the error dictionary.
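One possible shape for the two dictionaries is sketched below: per_user maps each username to a small dictionary of INFO and ERROR counts, and error maps each message to a count. The `entries` list is hypothetical, already-parsed data used only to show the counting step; in the real script these values come from the regular expression matches.

```python
# Hypothetical, already-parsed entries: (log type, message, username)
entries = [
    ("INFO", "Created ticket [#4217]", "mdouglas"),
    ("ERROR", "Connection to DB failed", "oren"),
    ("ERROR", "Connection to DB failed", "mdouglas"),
    ("INFO", "Closed ticket [#1754]", "oren"),
]

per_user = {}  # username -> {"INFO": count, "ERROR": count}
error = {}     # error message -> count

for log_type, message, username in entries:
    # setdefault creates the counters the first time a user is seen
    counts = per_user.setdefault(username, {"INFO": 0, "ERROR": 0})
    counts[log_type] += 1
    if log_type == "ERROR":
        # get() supplies 0 for messages that haven't been counted yet
        error[message] = error.get(message, 0) + 1

print(per_user)
print(error)
```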

After you've processed the log entries from the syslog.log file, you need to sort both the per_user and error dictionary before creating CSV report files.

Keep in mind that:

- The error dictionary should be sorted by the number of errors, from most common to least common.
- The user dictionary should be sorted by username.

Sorting returns a list of tuples. Insert the column names ("Error", "Count") at index zero of the sorted error list, and the column names ("Username", "INFO", "ERROR") at index zero of the sorted per_user list.
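The sorting and header steps can be sketched as follows, assuming per_user holds a nested dictionary of counts for each user. In that case each user entry has to be flattened into a three-element row so that it lines up with the three column names. The sample data and the `to_csv_text` helper are invented for this example, and the CSV is written to an in-memory buffer rather than to a file.

```python
import csv
import io
import operator

# Hypothetical counts, in the shape assumed above
error = {"Connection to DB failed": 2, "Ticket doesn't exist": 5}
per_user = {"zoe": {"INFO": 1, "ERROR": 0}, "adam": {"INFO": 2, "ERROR": 3}}

# Most common error first, then the column names at index zero
error_rows = sorted(error.items(), key=operator.itemgetter(1), reverse=True)
error_rows.insert(0, ("Error", "Count"))

# Sort by username and flatten each nested dict into (name, info, error)
user_rows = [(name, c["INFO"], c["ERROR"]) for name, c in sorted(per_user.items())]
user_rows.insert(0, ("Username", "INFO", "ERROR"))

def to_csv_text(rows):
    """Write the rows with csv.writer and return the resulting text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()

print(to_csv_text(error_rows))
print(to_csv_text(user_rows))
```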

```python
#!/usr/bin/env python3

import re
import operator
import csv

per_user = {}  # username -> {"INFO": count, "ERROR": count}
error = {}     # error message -> count

with open("syslog.log") as file:
    for log in file:
        user_match = re.search(r"\((.*)\)", log)
        type_match = re.search(r"(INFO|ERROR)", log)
        # Skip lines that don't carry both a log type and a username
        if user_match is None or type_match is None:
            continue
        username = user_match.group(1)
        log_type = type_match.group(1)
        if username not in per_user:
            per_user[username] = {"INFO": 0, "ERROR": 0}
        per_user[username][log_type] += 1
        if log_type == "ERROR":
            # The message is the text between "ERROR " and the trailing " (username)"
            error_match = re.search(r"ERROR (.*) ", log)
            if error_match is not None:
                error_msg = error_match.group(1)
                error[error_msg] = error.get(error_msg, 0) + 1

# Users alphabetically; errors from most to least common
sorted_per_user = sorted(per_user.items(), key=operator.itemgetter(0))
sorted_error = sorted(error.items(), key=operator.itemgetter(1), reverse=True)

# newline="" lets the csv module control the line endings
with open("error_message.csv", "w", newline="") as error_file:
    writer = csv.writer(error_file)
    writer.writerow(["Error", "Count"])
    writer.writerows(sorted_error)

with open("user_statistics.csv", "w", newline="") as user_file:
    writer = csv.writer(user_file)
    writer.writerow(["Username", "INFO", "ERROR"])
    for username, counts in sorted_per_user:
        writer.writerow([username, counts["INFO"], counts["ERROR"]])
```

Make the script executable and run it:

```bash
chmod +x ticky_check.py
./ticky_check.py
```

Check the generated CSV files:

```bash
cat error_message.csv
cat user_statistics.csv
```

If you haven't already done so, give write permission to the directory that hosts the HTML pages:

```bash
sudo chmod o+w /var/www/html
```

Then convert both CSV reports to HTML and check the results:

```bash
python3 ./csv_to_html.py error_message.csv /var/www/html/error_message.html
python3 ./csv_to_html.py user_statistics.csv /var/www/html/user_statistics.html

cat /var/www/html/error_message.html
cat /var/www/html/user_statistics.html
```





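As an alternative to writing the header rows separately, you can insert the column names at index zero of the sorted lists, as the instructions describe:

```python
error = sorted(error.items(), key=operator.itemgetter(1), reverse=True)
per_user = sorted(per_user.items(), key=operator.itemgetter(0))

# Prepend the column names so they become the first row of each report
error.insert(0, ("Error", "Count"))
per_user.insert(0, ("Username", "INFO", "ERROR"))
```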
