# Ultimate Engagement At Hacker News 

We will explore which type of post (Ask HN or Show HN) gets the most traction by analyzing the number of comments each post receives. Then we will figure out at what hour of the day these posts get the most comments. The dataset contains approximately 20K posts, with information about each post such as the title, the time it was posted, and the number of comments.

Let's start by opening the CSV file and exploring a few rows of the dataset.

In [1]:
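from csv import reader

# Read the whole CSV into a list of rows; the with-block closes the file.
with open('hacker_news.csv') as opened_file:
    hn = list(reader(opened_file))

headers = hn[:1]  # header row, kept as a one-element list of lists
hn = hn[1:]       # data rows only
print(headers, "\n")
print(hn[:3])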

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']] 

[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']]
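
Indexing columns by position (`row[1]`, `row[4]`) works, but it can be hard to read later on. As a minimal sketch, assuming the same column names as the header printed above, `csv.DictReader` lets each row be addressed by column name instead. The in-memory sample below only stands in for `hacker_news.csv` so that the snippet runs on its own.

```python
import csv
import io

# One sample row under the dataset's header; it stands in for the real file.
sample_csv = io.StringIO(
    "id,title,url,num_points,num_comments,author,created_at\n"
    "12224879,Interactive Dynamic Video,http://www.interactivedynamicvideo.com/,"
    "386,52,ne0phyte,8/4/2016 11:52\n"
)

# Each row becomes a dict keyed by the header names.
rows = list(csv.DictReader(sample_csv))
first = rows[0]

# Values are still strings, so numeric columns need an explicit conversion.
print(first["title"], int(first["num_comments"]))
```

The rest of this notebook keeps the positional indexing, so this is only an alternative way to read the same data.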


Since we are only interested in posts whose titles start with Ask HN or Show HN, we will split the data into separate lists of lists based on the title.

In [2]:
"""Split the dataset into three separate lists depending 
on how the title of the post starts."""

ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    if title.lower().startswith('ask hn'):
        ask_posts.append(row)
    elif title.lower().startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
        
print("Number of Ask HN posts: ", len(ask_posts))
print("Number of Show HN posts: ", len(show_posts))
print("Number of Other posts: ", len(other_posts)) 

Number of Ask HN posts:  1744
Number of Show HN posts:  1162
Number of Other posts:  17194
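
The prefix check above can also be pulled into a small function, which makes it easy to try on individual titles. This is only a sketch: `classify_title` is a hypothetical helper name, and the example titles are made up for illustration.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on a case-insensitive title prefix."""
    lowered = title.lower()
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"


# Made-up titles, not taken from the dataset.
examples = [
    "Ask HN: How do you learn?",
    "SHOW HN: My side project",
    "Interactive Dynamic Video",
]
labels = [classify_title(title) for title in examples]
print(labels)
```

Like the loop above, this matches on the prefix only, so a title that merely mentions "Ask HN" in the middle is classified as "other".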


Now that we have the Ask HN and Show HN posts, let's find out which type receives more comments on average.

In [3]:
"""Compute the average number of comments for Ask HN and Show HN posts."""

total_ask_comments = 0
for row in ask_posts:
    num_comments = int(row[4])
    total_ask_comments += num_comments
avg_ask_comments = total_ask_comments / len(ask_posts)

print("Average amount of comments in Ask HN posts: ", 
      round(avg_ask_comments, 1))

total_show_comments = 0
for row in show_posts:
    num_comments = int(row[4])
    total_show_comments += num_comments
avg_show_comments = total_show_comments / len(show_posts)

print("Average amount of comments in Show HN posts: ", 
      round(avg_show_comments, 1))

Average amount of comments in Ask HN posts:  14.0
Average amount of comments in Show HN posts:  10.3
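
The two loops above do the same work on different lists, so the calculation could be shared. The sketch below assumes the dataset's column order (comment count at index 4); `average_comments` is a hypothetical helper and the rows are made up for illustration.

```python
def average_comments(posts, comments_index=4):
    """Average of the comment-count column; returns 0.0 for an empty list."""
    if not posts:
        return 0.0
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)


# Made-up rows in the dataset's column order.
toy_posts = [
    ["1", "Ask HN: A", "", "5", "10", "alice", "8/4/2016 11:52"],
    ["2", "Ask HN: B", "", "3", "4", "bob", "8/4/2016 12:10"],
]
toy_avg = average_comments(toy_posts)
print(round(toy_avg, 1))
```

Guarding against an empty list avoids a `ZeroDivisionError` if a category ever has no posts.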


We can see that, on average, Ask HN posts get more comments than Show HN posts (14.0 versus 10.3). Since Ask HN posts are more likely to attract engagement, we will focus on them from now on. Next, we will investigate whether Ask HN posts created at a certain hour tend to receive more comments.

In [4]:
"""Create two dictionaries: one with the number of posts created 
per hour and one with the number of comments per hour. Then use both
to calculate the average number of comments per post for each hour."""

import datetime as dt

result_list = []
counts_by_hour = {}
comments_by_hour = {}

for row in ask_posts:
    created_at_num_comments = [row[6], int(row[4])]
    result_list.append(created_at_num_comments)
    
for row in result_list:
    date = row[0]
    date = dt.datetime.strptime(date, "%m/%d/%Y %H:%M")
    hour = date.strftime("%H")
    
    if hour in counts_by_hour:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += row[1]
    else:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = row[1] 
        
avg_by_hour = []
for h in counts_by_hour:
    avg_comments_per_post_per_hour = comments_by_hour[h] / counts_by_hour[h]
    avg_by_hour.append([h, avg_comments_per_post_per_hour])

# Swap to [average, hour] so that sorting orders by average first.
swap_avg_by_hour = []
for item in avg_by_hour:
    swap_avg_by_hour.append([item[1], item[0]])

sorted_swap = sorted(swap_avg_by_hour, reverse=True)
print("Top 5 Hours for Ask Posts Comments:\n")
for item in sorted_swap[:5]:
    formatted_avg = "{hour}:00: {avg:.2f} average comments per post.".format(hour=item[1], avg=item[0])
    print(formatted_avg)

Top 5 Hours for Ask Posts Comments:

15:00: 38.59 average comments per post.
02:00: 23.81 average comments per post.
20:00: 21.52 average comments per post.
16:00: 16.80 average comments per post.
21:00: 16.01 average comments per post.
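
The same per-hour averages can be computed a little more compactly. The sketch below is one possible variant, not the method used above: it relies on `collections.defaultdict` to skip the `if hour in ...` check, and on the `key` argument of `sorted` so the pairs do not need to be swapped. The function name `average_comments_by_hour` and the rows are made up for illustration; the column positions and date format are the ones used in this notebook.

```python
import datetime as dt
from collections import defaultdict


def average_comments_by_hour(posts):
    """Map each two-digit hour string to the average comments per post."""
    counts = defaultdict(int)
    comments = defaultdict(int)
    for row in posts:
        created_at = dt.datetime.strptime(row[6], "%m/%d/%Y %H:%M")
        hour = created_at.strftime("%H")
        counts[hour] += 1
        comments[hour] += int(row[4])
    return {hour: comments[hour] / counts[hour] for hour in counts}


# Made-up rows in the dataset's column order.
toy_posts = [
    ["1", "Ask HN: A", "", "5", "10", "alice", "8/4/2016 15:52"],
    ["2", "Ask HN: B", "", "3", "4", "bob", "8/5/2016 15:10"],
    ["3", "Ask HN: C", "", "1", "3", "carol", "8/5/2016 2:30"],
]
toy_averages = average_comments_by_hour(toy_posts)

# Sort (hour, average) pairs by the average, highest first.
ranked = sorted(toy_averages.items(), key=lambda pair: pair[1], reverse=True)
for hour, avg in ranked:
    print("{hour}:00: {avg:.2f} average comments per post.".format(hour=hour, avg=avg))
```

Sorting with a key keeps each pair in its natural (hour, average) order, which makes the final formatting step easier to follow.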


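We conclude that, for the best chance of receiving comments on an Ask HN post, we should create the post between 3 pm and 4 pm (15:00), which has by far the highest average at 38.59 comments per post. The next best hours are 2 am (02:00) and 8 pm (20:00), followed by 4 pm (16:00) and 9 pm (21:00). These figures are averages over the posts in this dataset, so they indicate a tendency rather than a guarantee.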