# Hacker News Analysis
In this project I will be analyzing posts from the site Hacker News.
There are 2 types of posts relevant to our analysis: Ask HN and Show HN. Ask HN posts are submitted by the Hacker News community to ask a specific question. Show HN posts show the community something interesting, such as a new project or product.

We will be analyzing the data set to answer 2 specific questions:
1. Which type of post receives more comments on average? 
2. Do posts created at a certain time receive more comments on average?

I'll begin by importing the Hacker News Posts data set:

In [3]:
from csv import reader 

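# Open the file in a with-block so it is closed automatically.
# The utf-8 encoding is an assumption about how the CSV was saved.
path = '../input/hacker-news-posts/HN_posts_year_to_Sep_26_2016.csv'
with open(path, encoding='utf-8', newline='') as opened_file:
    read_file = reader(opened_file)
    hn = list(read_file)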
print(hn[:5])

## Data Processing

I will move the column header to its own variable ```headers```. 

Looking at the header row we have the following attributes included in the data set:
0. **id**: the unique identifier of the post
1. **title**: title of the post (self explanatory)
2. **url**: the url of the item being linked to
3. **num_points**: the number of upvotes the post received
4. **num_comments**: the number of comments the post received
5. **author**: the name of the account that made the post
6. **created_at**: the date and time the post was made (the time zone is Eastern Time in the US)
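The positions in the list above are what the later cells rely on when they index into a row (for example, index 4 for the comment count). As a minimal sketch, assuming the header row matches that list, the positions can be looked up by name instead of being memorized. The `example_headers` and `example_row` values are made up for illustration and are not taken from the data set.

```python
# Hypothetical header row, assumed to match the column list above.
example_headers = ['id', 'title', 'url', 'num_points',
                   'num_comments', 'author', 'created_at']

# Map each column name to its position in a row.
column_index = {name: position for position, name in enumerate(example_headers)}

# Made-up row, used only to demonstrate the lookup.
example_row = ['1', 'Ask HN: Example question?', '', '3', '7',
               'example_user', '9/26/2016 2:53']

title = example_row[column_index['title']]
num_comments = int(example_row[column_index['num_comments']])
print(title, num_comments)
```

Looking columns up by name makes it easier to see which attribute a cell is reading, at the cost of one extra dictionary.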

In [4]:
headers = hn[0]
hn = hn[1:]
print(headers, '\n')
print(hn[:5])

We're primarily interested in 'Ask HN' and 'Show HN' posts. I'll create 3 lists and loop through each row, looking at index 1, which is the ```title``` attribute. If the title starts with 'Ask HN' or 'Show HN' (ignoring case), I'll add the row to its respective list; otherwise I'll add it to an ```other_posts``` list.

In [5]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    if row[1].lower().startswith('ask hn'):
        ask_posts.append(row)
    elif row[1].lower().startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
print('Ask HN:',len(ask_posts))
print('Show HN:',len(show_posts))
print('Other:',len(other_posts))

## Analysis of post comments by post type
One of the questions we are seeking to answer with this project is which type of Hacker News post receives more user engagement. We are measuring engagement by the average number of comments per post. Below I will use the ```num_comments``` attribute to find which type of post receives more comments on average.

In [6]:
total_ask_comments = 0
avg_ask_comments = 0

for post in ask_posts:
    comments = post[4]
    comments = int(comments)
    total_ask_comments += comments
print("Total Ask HN comments:", total_ask_comments)
avg_ask_comments = round(total_ask_comments / len(ask_posts))
print("Average comments per Ask HN post:", avg_ask_comments)

total_show_comments = 0
avg_show_comments = 0

for post in show_posts:
    comments = post[4]
    comments = int(comments)
    total_show_comments += comments
print("Total Show HN comments:", total_show_comments)
avg_show_comments = round(total_show_comments/len(show_posts))
print("Average comments per Show HN post:", avg_show_comments)



Based on the data set, Ask HN posts receive about twice as many comments on average as Show HN posts: roughly 10 comments per Ask HN post versus roughly 5 per Show HN post, after rounding to whole numbers.
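The two loops above repeat the same steps for each list, so the calculation could also be written once as a small helper. This is only a sketch: `average_comments` and the `example_posts` rows are hypothetical names and made-up data, and the default index of 4 assumes the column order described earlier.

```python
def average_comments(posts, comments_index=4):
    """Return the mean comment count for `posts`, or 0.0 for an empty list."""
    if not posts:
        return 0.0
    total = sum(int(post[comments_index]) for post in posts)
    return total / len(posts)


# Made-up rows in the same column order as the data set.
example_posts = [
    ['1', 'Ask HN: First example?', '', '2', '4', 'user_a', '1/1/2016 9:00'],
    ['2', 'Ask HN: Second example?', '', '5', '10', 'user_b', '1/2/2016 15:30'],
]

example_average = average_comments(example_posts)
print('{:.2f}'.format(example_average))
```

In the notebook, the same helper could be called as `average_comments(ask_posts)` and `average_comments(show_posts)`. It returns the unrounded mean, which keeps more detail than `round()` when comparing the two post types.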

## Analysis of post times and comments
Since Ask HN posts receive more comments on average than Show HN posts, I will use them for the rest of the analysis. Next, I want to see whether posts created at a certain time of day receive more comments on average. This takes two steps:
1. Calculate the number of posts created during each hour of the day.
2. Calculate the average number of comments per post for each hour.
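Before running these two steps on the full data set, here is a minimal sketch of them on a few made-up `(created_at, num_comments)` pairs. The sample values and the `example_` names are assumptions for illustration only. It uses `collections.defaultdict` so that a missing hour starts at 0 without an explicit `if` check.

```python
import datetime as dt
from collections import defaultdict

# Made-up (created_at, num_comments) pairs in the data set's date format.
example_results = [
    ('8/16/2016 9:55', 6),
    ('11/22/2015 13:43', 29),
    ('5/2/2016 9:14', 2),
]

example_counts = defaultdict(int)    # step 1: posts per hour
example_comments = defaultdict(int)  # total comments per hour

for created_at, num_comments in example_results:
    hour = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M').hour
    example_counts[hour] += 1
    example_comments[hour] += num_comments

# Step 2: average comments per post for each hour that has posts.
example_avg_by_hour = {
    hour: example_comments[hour] / example_counts[hour]
    for hour in example_counts
}
print(example_avg_by_hour)
```

Here the two 9 o'clock posts are averaged together, while the single 13 o'clock post keeps its own count.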

In [14]:
import datetime as dt

result_list = []
for post in ask_posts:
    created_at = post[6]
    num_comments = int(post[4])
    result_list.append([created_at, num_comments])

counts_by_hour = {}
comments_by_hour = {}

for post in result_list:
    created_at = dt.datetime.strptime(post[0],"%m/%d/%Y %H:%M")
    hour = created_at.hour
    num_comments = post[1]
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = num_comments
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += num_comments

avg_by_hour = []

for hour in counts_by_hour:
    avg_by_hour.append([comments_by_hour[hour]/counts_by_hour[hour], hour])
    
avg_by_hour = sorted(avg_by_hour, reverse=True)

print("Top 5 Hours for Ask HN Post Comments:")

for row in avg_by_hour[:5]:
    hour_formatted = dt.datetime.strptime(str(row[1]),'%H')
    hour_formatted = hour_formatted.strftime('%H:%M')
    print('{}: {:.2f} average comments per post'.format(hour_formatted, row[0]))
    
    

## Conclusion
The top 5 times to post on Ask HN, ranked by average comments per post, are listed below (US Eastern Time, the time zone of the ```created_at``` attribute):
1. 3:00 PM
2. 1:00 PM
3. 12:00 PM
4. 2:00 AM
5. 10:00 AM
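The analysis cell prints hours on a 24-hour clock, while the list above uses 12-hour labels. A small sketch of that conversion follows. `to_12_hour` is a hypothetical helper, not part of the original notebook, and the hours passed to it are simply the ones from the list.

```python
def to_12_hour(hour):
    """Format an hour from 0 to 23 as a 12-hour label, e.g. 15 -> '3:00 PM'."""
    if not 0 <= hour <= 23:
        raise ValueError('hour must be between 0 and 23')
    suffix = 'AM' if hour < 12 else 'PM'
    display_hour = hour % 12 or 12  # hours 0 and 12 both display as 12
    return '{}:00 {}'.format(display_hour, suffix)


# The 24-hour equivalents of the five hours listed above.
for hour in [15, 13, 12, 2, 10]:
    print(to_12_hour(hour))
```

Building the label by hand avoids depending on the locale-specific output of `strftime('%p')`.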