- An internal reporting tool that will use information from the database to discover what kind of articles the site's readers like.
- The database contains newspaper articles, as well as the web server log for the site. The log has a database row for each time a reader loaded a web page.
- It won't take any input from the user. Instead, it will connect to that database, use SQL queries to analyze the log data, and print out the answers to some questions.
- Project made for Udacity NanoDegree.
-
Clone it by running command: git clone https://github.com/yash2code/log-analysis-ud.git
-
This project makes use of the Linux-based virtual machine (VM).
-
Download the Data
You will need to unzip this file after downloading it. The file inside is called newsdata.sql. Put this file into the vagrant directory, which is shared with your virtual machine.
-
To load the data, use the command
psql -d news -f newsdata.sql
-
Make Views
-
Run
python log-analysis.py
- Version 0.1
-
author_view
create view author_view as select authors.name, count(articles.author) as views from articles, log, authors where log.path = '/article/' || articles.slug and articles.author = authors.id group by authors.name order by views desc;
-
error_view
create view error_view as select Date,Total,Error, (Error::float*100)/Total::float as Percent from (select time::timestamp::date as Date, count(status) as Total, sum(case when status = '404 NOT FOUND' then 1 else 0 end) as Error from log group by time::timestamp::date) as output where (Error::float*100)/Total::float > 1.0 order by Percent desc;
- What are the most popular three articles of all time?
article | views
--------------------+--------
candidate-is-jerk | 338647
bears-love-berries | 253801
bad-things-gone | 170098
(3 rows)
- Who are the most popular article authors of all time?
name | views
------------------------+--------
Ursula La Multa | 507594
Rudolf von Treppenwitz | 423457
Anonymous Contributor | 170098
Markoff Chaney | 84557
(4 rows)
- On which days did more than 1% of requests lead to errors?
date | total | error | percent
------------+-------+-------+------------------
2016-07-17 | 55907 | 1265 | 2.26268624680273
(1 row)