The Udacity log analysis project builds an internal reporting tool that analyzes information from the newspaper database to discover what kind of articles the site's readers like. Source code can be found here
- Install Vagrant and VirtualBox
- Download the fullstack_nanodegree_repo zip or clone here
- Download the News Database or clone here, and move these files to the vagrant directory, which can be found in fullstack_nanodegree_repo.
- Navigate to the full-stack-nanodegree-vm/vagrant directory using the command prompt or terminal.
- Execute the `vagrant up` command.
- Execute the `vagrant ssh` command to log in to the virtual machine; you may need login details.
- After a successful `vagrant ssh`, change the directory with `cd /vagrant`.
- To load the News Database, use `psql -d news -f newsdata.sql`.
- Create the views listed below.
- After the views have been created, run `python log.py` to check the output.
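The source of log.py is not reproduced here, so the following is only a minimal sketch of how a reporting script of this kind might run a query and print the result. The names `run_report`, `format_rows`, `demo_connection`, and `TOP_ARTICLES` are hypothetical, and so are the sample slugs. An in-memory SQLite database stands in for the `news` database so the sketch runs without the VM; only the `slug`, `path`, `status`, `author`, and `id` columns that appear in the views are assumed.

```python
import sqlite3

# Hypothetical query: count page loads per article by matching log paths
# of the form '/article/<slug>' against the articles table.
TOP_ARTICLES = """
    SELECT articles.slug, COUNT(*) AS views
    FROM log JOIN articles ON log.path = '/article/' || articles.slug
    GROUP BY articles.slug
    ORDER BY views DESC
    LIMIT 3
"""


def run_report(conn, query):
    """Run one query on a DB-API connection and return all rows."""
    cur = conn.cursor()
    try:
        cur.execute(query)
        return cur.fetchall()
    finally:
        cur.close()


def format_rows(rows, unit):
    """Turn (name, value) rows into printable report lines."""
    return ["{} - {} {}".format(name, value, unit) for name, value in rows]


def demo_connection():
    """Build a tiny in-memory stand-in for the news database (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (id INTEGER, author INTEGER, slug TEXT)")
    conn.execute("CREATE TABLE log (id INTEGER, path TEXT, status TEXT)")
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?, ?)",
        [(1, 1, "first-post"), (2, 2, "second-post")],
    )
    paths = ["/article/first-post"] * 3 + ["/article/second-post"] * 2 + ["/"]
    conn.executemany(
        "INSERT INTO log (path, status) VALUES (?, ?)",
        [(p, "200 OK") for p in paths],
    )
    return conn


conn = demo_connection()
lines = format_rows(run_report(conn, TOP_ARTICLES), "views")
print("\n".join(lines))
```

Against the real database inside the VM, the connection would presumably come from a PostgreSQL driver instead, for example `psycopg2.connect(dbname="news")`, while `run_report` and `format_rows` stay the same because both drivers follow the Python DB-API.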
- What are the most popular three articles of all time?
create view new_log as select substring(path, 10) as newpath, status, id from log;
- Who are the most popular article authors of all time?
create view art_log as select new_log.newpath, articles.author,articles.slug, articles.id from new_log join articles on articles.slug=new_log.newpath;
- On which days did more than 1% of requests lead to errors?
create view error_log as select time::timestamp::date as date, count(*) as errors from log where status similar to '404%' group by date order by errors desc;
create view request_log as select time::timestamp::date as date, count(*) as requests from log group by date order by requests desc;
create view error_rate as select error_log.date as day, error_log.errors::float/request_log.requests*100 as error_ratio from error_log join request_log on error_log.date=request_log.date order by error_ratio desc limit 10;
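To make the logic of these views easier to follow, here is a rough pure-Python mirror of two of them, offered as an illustration rather than as part of the project. The function names `slug_from_path`, `error_rates`, and `days_over_threshold`, and all sample rows, are hypothetical. It assumes article paths have the form `/article/<slug>`, which is what `substring(path, 10)` implies.

```python
from collections import Counter


def slug_from_path(path):
    """Mirror of substring(path, 10) in the new_log view.

    SQL substring positions are 1-indexed, so position 10 corresponds to
    Python index 9; this drops the 9-character prefix '/article/'.
    """
    return path[9:]


def error_rates(rows):
    """Mirror of the error_log, request_log, and error_rate views.

    rows is an iterable of (day, status) pairs. Returns a dict mapping each
    day that has at least one 404 response to its error percentage. Days
    without errors are left out, as with the inner join in error_rate.
    """
    requests = Counter()
    errors = Counter()
    for day, status in rows:
        requests[day] += 1
        # status similar to '404%' matches statuses that start with '404'
        if status.startswith("404"):
            errors[day] += 1
    return {day: 100.0 * errors[day] / requests[day] for day in errors}


def days_over_threshold(rates, threshold=1.0):
    """Return the days whose error percentage is strictly above threshold."""
    return sorted(day for day, rate in rates.items() if rate > threshold)


# Made-up sample data: 3 errors in 200 requests, 1 in 200, and 0 in 50.
sample = (
    [("2020-01-01", "404 NOT FOUND")] * 3
    + [("2020-01-01", "200 OK")] * 197
    + [("2020-01-02", "404 NOT FOUND")] * 1
    + [("2020-01-02", "200 OK")] * 199
    + [("2020-01-03", "200 OK")] * 50
)
rates = error_rates(sample)
print(slug_from_path("/article/first-post"))
print(days_over_threshold(rates))
```

Note that the error_rate view itself only orders the days by ratio and keeps the top ten; the "more than 1%" filter shown in `days_over_threshold` would still have to be applied by whatever query or script reads the view.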
- SQL (PostgreSQL)
- Python programming
Please refer to the wiki to find out more about the project.
Welcome to the Log_Analysis project, which is part of the Full Stack Web Developer Nanodegree course.
The project creates an internal tool that generates meaningful reports from the newspaper database, which has articles, authors, and log tables, using PostgreSQL and a Python script. The database contains newspaper articles as well as the web server log for the site. The log has a database row for each time a reader loaded a web page.
The tool analyzes data from the logs of a web service, using advanced SQL queries, to answer questions such as:
- What are the most popular three articles of all time?
- Who are the most popular article authors of all time?
- On which days did more than 1% of requests lead to errors?
Building an informative summary from logs is a real task that comes up very often in software engineering.
- Vagrant
- VirtualBox
- Git Bash
- SQL
- Python Script