# Exploring Hacker News Posts

In this project, we'll work with a data set of submissions to the popular technology site Hacker News.

We're specifically interested in posts whose titles begin with either Ask HN or Show HN. Users submit Ask HN posts to ask the Hacker News community a specific question. Likewise, users submit Show HN posts to show the community a project, a product, or simply something interesting.

We'll compare these two types of posts to determine the following:

- Do Ask HN or Show HN posts receive more comments on average?
- Do posts created at a certain time receive more comments on average?

In [1]:
from csv import reader

In [15]:
# Use a context manager so the file is closed once it has been read.
with open("hacker_news.csv") as opened_file:
    read_file = reader(opened_file)
    hn = list(read_file)

In [17]:
headers = hn[0]

In [19]:
hn = hn[1:]

In [21]:
headers

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
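The cells that follow refer to columns by position (1 for the title, 4 for the comment count, 6 for the creation time). As a minimal sketch, those positions can be looked up from the header row instead of being hard-coded. The constant names `TITLE_COL`, `COMMENTS_COL`, and `CREATED_COL` are hypothetical and are not used elsewhere in this notebook.

```python
# Header row as printed above.
headers = ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

# Hypothetical constants: look each position up once by column name.
TITLE_COL = headers.index('title')
COMMENTS_COL = headers.index('num_comments')
CREATED_COL = headers.index('created_at')

print(TITLE_COL, COMMENTS_COL, CREATED_COL)  # 1 4 6
```

With these in place, `row[TITLE_COL]` reads the same value as `row[1]`, and the code keeps working if the column order ever changes.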

In [23]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    if title.lower().startswith("ask hn"):
        ask_posts.append(row)
    elif title.lower().startswith("show hn"):
        show_posts.append(row)
    else:
        other_posts.append(row)

In [24]:
print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))

1744
1162
17194


Average number of comments per Ask HN post

In [31]:
total_ask_comments = 0

for post in ask_posts:
    n_comments = int(post[4])
    total_ask_comments += n_comments
print(total_ask_comments)
avg_ask_comments = total_ask_comments/len(ask_posts)
print(avg_ask_comments)

24483
14.038417431192661


Average number of comments per Show HN post

In [32]:
total_show_comments = 0

for post in show_posts:
    n_comments = int(post[4])
    total_show_comments += n_comments
print(total_show_comments)
avg_show_comments = total_show_comments/len(show_posts)
print(avg_show_comments)

11988
10.31669535283993
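The two averaging cells above have the same shape, so the calculation could be wrapped in one small function. This is only a sketch: the name `average_comments` is hypothetical, and the sample rows are made up for illustration rather than taken from `hacker_news.csv`.

```python
def average_comments(posts, comments_col=4):
    """Return the mean number of comments across `posts`.

    `posts` is a list of rows and `comments_col` is the index of the
    comment-count column (4 in this data set).
    """
    if not posts:
        return 0.0  # avoid dividing by zero on an empty list
    total = sum(int(row[comments_col]) for row in posts)
    return total / len(posts)


# Made-up rows in the same column layout as the data set.
sample_posts = [
    ['1', 'Ask HN: example one', '', '5', '10', 'alice', '8/16/2016 9:55'],
    ['2', 'Ask HN: example two', '', '3', '20', 'bob', '11/22/2015 13:43'],
]
print(average_comments(sample_posts))  # 15.0
```

Calling `average_comments(ask_posts)` and `average_comments(show_posts)` should give the same two averages as the loops above.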


Next, we'll determine if ask posts created at a certain time are more likely to attract comments. We'll use the following steps to perform this analysis:

1. Calculate the number of ask posts created in each hour of the day, along with the number of comments received.
2. Calculate the average number of comments ask posts receive by hour created.
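The two steps above can be sketched as a single function. This is a compact illustration rather than the approach taken in the cells below: the name `average_comments_by_hour` is hypothetical, the timestamps are assumed to look like `8/16/2016 9:55` (month/day/year hour:minute), and the sample pairs are for illustration only.

```python
import datetime as dt
from collections import defaultdict


def average_comments_by_hour(pairs, fmt="%m/%d/%Y %H:%M"):
    """Map each hour of the day to the average comments per post.

    `pairs` is an iterable of [created_at_string, n_comments] items.
    """
    counts = defaultdict(int)    # step 1a: posts created in each hour
    comments = defaultdict(int)  # step 1b: comments received in each hour
    for created_at, n_comments in pairs:
        hour = dt.datetime.strptime(created_at, fmt).hour
        counts[hour] += 1
        comments[hour] += n_comments
    # step 2: average number of comments per post for each hour
    return {hour: comments[hour] / counts[hour] for hour in counts}


# Illustrative sample, not the full data set.
sample = [['8/16/2016 9:55', 6], ['5/2/2016 9:14', 2], ['11/22/2015 13:43', 29]]
result = average_comments_by_hour(sample)
print(result)  # {9: 4.0, 13: 29.0}
```

Using `defaultdict(int)` removes the need to check whether an hour has been seen before adding to it.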

In [33]:
import datetime as dt

In [35]:
result_list = []

for post in ask_posts:
    post_created = post[6]
    n_comments = int(post[4])
    result_list.append([post_created, n_comments])

In [37]:
print(result_list[:3])

[['8/16/2016 9:55', 6], ['11/22/2015 13:43', 29], ['5/2/2016 10:14', 1]]
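Each creation time is a string with the month, day, and year separated by slashes, followed by the hour and minute. As a short sketch of how one such value can be parsed, the format string below uses the standard `strptime` directives `%m`, `%d`, `%Y`, `%H`, and `%M`; the variable names are for illustration only.

```python
import datetime as dt

created_at = "8/16/2016 9:55"  # first value in the output above

# %m = month, %d = day, %Y = four-digit year, %H = hour (24-hour), %M = minute
parsed = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M")

print(parsed.hour)               # 9
print(parsed.strftime("%H:%M"))  # 09:55
```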


In [47]:
counts_by_hour = {}
comments_by_hour = {}

for data in result_list:
    date_time_string = data[0]
    comments = data[1]
    date_time_object = dt.datetime.strptime(date_time_string, "%m/%d/%Y %H:%M")
    time_hour = date_time_object.hour
    
    if time_hour in counts_by_hour:
        comments_by_hour[time_hour] += comments
        counts_by_hour[time_hour] += 1
    else:
        comments_by_hour[time_hour] = comments
        counts_by_hour[time_hour] = 1
print(counts_by_hour)
print("\n")
print(comments_by_hour)

    

{0: 55, 1: 60, 2: 58, 3: 54, 4: 47, 5: 46, 6: 44, 7: 34, 8: 48, 9: 45, 10: 59, 11: 58, 12: 73, 13: 85, 14: 107, 15: 116, 16: 108, 17: 100, 18: 109, 19: 110, 20: 80, 21: 109, 22: 71, 23: 68}


{0: 447, 1: 683, 2: 1381, 3: 421, 4: 337, 5: 464, 6: 397, 7: 267, 8: 492, 9: 251, 10: 793, 11: 641, 12: 687, 13: 1253, 14: 1416, 15: 4477, 16: 1814, 17: 1146, 18: 1439, 19: 1188, 20: 1722, 21: 1745, 22: 479, 23: 543}


In [60]:
avg_by_hour = []

# Divide total comments by number of posts for each hour.
for hour in comments_by_hour:
    avg_by_hour.append([hour, comments_by_hour[hour]/counts_by_hour[hour]])
print(avg_by_hour)

[[0, 8.127272727272727], [1, 11.383333333333333], [2, 23.810344827586206], [3, 7.796296296296297], [4, 7.170212765957447], [5, 10.08695652173913], [6, 9.022727272727273], [7, 7.852941176470588], [8, 10.25], [9, 5.5777777777777775], [10, 13.440677966101696], [11, 11.051724137931034], [12, 9.41095890410959], [13, 14.741176470588234], [14, 13.233644859813085], [15, 38.5948275862069], [16, 16.796296296296298], [17, 11.46], [18, 13.20183486238532], [19, 10.8], [20, 21.525], [21, 16.009174311926607], [22, 6.746478873239437], [23, 7.985294117647059]]


In [61]:
swap_avg_by_hour = []

for row in avg_by_hour:
    hr = row[0]
    avg = row[1]
    swap_avg_by_hour.append([avg, hr])
print(swap_avg_by_hour)

[[8.127272727272727, 0], [11.383333333333333, 1], [23.810344827586206, 2], [7.796296296296297, 3], [7.170212765957447, 4], [10.08695652173913, 5], [9.022727272727273, 6], [7.852941176470588, 7], [10.25, 8], [5.5777777777777775, 9], [13.440677966101696, 10], [11.051724137931034, 11], [9.41095890410959, 12], [14.741176470588234, 13], [13.233644859813085, 14], [38.5948275862069, 15], [16.796296296296298, 16], [11.46, 17], [13.20183486238532, 18], [10.8, 19], [21.525, 20], [16.009174311926607, 21], [6.746478873239437, 22], [7.985294117647059, 23]]


In [65]:
sorted_swap = sorted(swap_avg_by_hour, reverse = True)

In [66]:
sorted_swap

[[38.5948275862069, 15],
 [23.810344827586206, 2],
 [21.525, 20],
 [16.796296296296298, 16],
 [16.009174311926607, 21],
 [14.741176470588234, 13],
 [13.440677966101696, 10],
 [13.233644859813085, 14],
 [13.20183486238532, 18],
 [11.46, 17],
 [11.383333333333333, 1],
 [11.051724137931034, 11],
 [10.8, 19],
 [10.25, 8],
 [10.08695652173913, 5],
 [9.41095890410959, 12],
 [9.022727272727273, 6],
 [8.127272727272727, 0],
 [7.985294117647059, 23],
 [7.852941176470588, 7],
 [7.796296296296297, 3],
 [7.170212765957447, 4],
 [6.746478873239437, 22],
 [5.5777777777777775, 9]]
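Swapping the columns makes `sorted` compare the averages first. As an alternative sketch, the original `[hour, average]` pairs can be sorted directly by passing a `key` function, which avoids the intermediate list. The rows below are a small subset copied from the output above, and the name `top_hours` is hypothetical.

```python
# A few [hour, average] pairs copied from avg_by_hour above.
avg_by_hour = [
    [2, 23.810344827586206],
    [9, 5.5777777777777775],
    [15, 38.5948275862069],
    [20, 21.525],
]

# Sort by the second element (the average), highest first.
top_hours = sorted(avg_by_hour, key=lambda row: row[1], reverse=True)

print([row[0] for row in top_hours])  # [15, 2, 20, 9]
```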

In [67]:
print("Top 5 Hours for Ask Comments")

Top 5 Hours for Ask Comments


In [72]:
for row in sorted_swap[:5]:
    hour = str(row[1])
    new_format = dt.datetime.strptime(hour, "%H").strftime("%H:%M")
    avgs = row[0]

    template = "{}: {:.2f} average comments per post"
    print(template.format(new_format, avgs))

15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post


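Ask HN posts receive more comments on average than Show HN posts (about 14.04 versus 10.32). Among Ask HN posts, those created in the 15:00 hour attract the most comments, with an average of 38.59 comments per post.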