# Analysis of Hacker News Posts

In this project, we analyze posts from the Hacker News website to determine which kinds of posts gain the most traction, measured by the number of comments they receive.

We look at two factors: the kind of post and the time at which it was created. For the kind of post, we focus on titles starting with "Ask HN" and titles starting with "Show HN".

## 1. Read and Display Data

In [1]:
from csv import reader
# Close the file once all rows are read.
with open("./hacker_news.csv") as f:
    hn = list(reader(f))

In [2]:
for record in hn[:5]:
    print(record)
    print("\n")

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']


['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30']


['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']


['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']




## 2. Remove Header

In [3]:
headers = hn[0]
hn = hn[1:]

In [4]:
headers

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

In [5]:
for record in hn[:5]:
    print(record)
    print("\n")

['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']


['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30']


['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']


['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']


['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']
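
As an alternative to slicing the header off by hand, the standard library's `csv.DictReader` uses the first row as field names. The sketch below is not the approach used in this notebook; it reads from an in-memory string (`sample_csv`, a stand-in for `hacker_news.csv`) so that it runs without the data file.

```python
import csv
import io

# Header plus one data row copied from the preview above.
# The in-memory string is a stand-in for hacker_news.csv.
sample_csv = (
    "id,title,url,num_points,num_comments,author,created_at\n"
    "12224879,Interactive Dynamic Video,http://www.interactivedynamicvideo.com/,"
    "386,52,ne0phyte,8/4/2016 11:52\n"
)

# DictReader consumes the header row and yields one dict per data row.
rows = list(csv.DictReader(io.StringIO(sample_csv)))

print(rows[0]["title"])
print(int(rows[0]["num_comments"]))
```

With this form, columns are addressed by name (`row["num_comments"]`) rather than by position (`record[4]`).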




## 3. Extract "Ask HN" and "Show HN" Posts

In [6]:
ask_posts = []
show_posts = []
other_posts = []

for record in hn:
    title = record[1].lower()
    if title.startswith('ask hn'):
        ask_posts.append(record)
    elif title.startswith('show hn'):
        show_posts.append(record)
    else:
        other_posts.append(record)

In [7]:
print("ask posts: ", len(ask_posts))
print("show posts: ", len(show_posts))
print("other posts: ", len(other_posts))

ask posts:  1744
show posts:  1162
other posts:  17194
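
The prefix test used above can also be wrapped in a small function, which makes the case-insensitive matching easy to check in isolation. This is a minimal sketch: `classify_title` is a hypothetical helper, and the sample titles are made up apart from the last one, which comes from the dataset preview.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on the title's prefix."""
    lowered = title.lower()
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"


# Illustrative titles used only to exercise the helper.
sample_titles = [
    "Ask HN: How do you learn a new language?",
    "SHOW HN: A tiny CSV viewer",
    "Interactive Dynamic Video",
]

groups = {"ask": [], "show": [], "other": []}
for title in sample_titles:
    groups[classify_title(title)].append(title)

for label, titles in groups.items():
    print(label, len(titles))
```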


## 4. Calculate Average No. of Comments

In [8]:
def get_total_comments(post_list, comment_column=4):
    total_comments = 0
    for post in post_list:
        total_comments += int(post[comment_column])
    
    return total_comments

In [9]:
total_ask_comments = get_total_comments(ask_posts)
avg_ask_comments = round(total_ask_comments/len(ask_posts), 2)
total_show_comments = get_total_comments(show_posts)
avg_show_comments = round(total_show_comments/len(show_posts), 2)

On average, posts starting with "Ask HN" attracted more comments than posts starting with "Show HN": about 14 comments per post, compared with about 10.

In [10]:
print('total "ask hn" comments: ', total_ask_comments)
print('average "ask hn" comments: ', avg_ask_comments)
print('total "show hn" comments: ', total_show_comments)
print('average "show hn" comments: ', avg_show_comments)

total "ask hn" comments:  24483
average "ask hn" comments:  14.04
total "show hn" comments:  11988
average "show hn" comments:  10.32
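
The averages above divide a total by `len(...)`, which would raise `ZeroDivisionError` if a category happened to be empty. The sketch below shows one way to guard against that; `average_comments` is a hypothetical helper, and returning `0.0` for an empty list is an assumption, not something the notebook specifies.

```python
def average_comments(post_list, comment_column=4):
    """Mean of the comment column, rounded to 2 places; 0.0 for an empty list."""
    if not post_list:
        return 0.0
    total = sum(int(post[comment_column]) for post in post_list)
    return round(total / len(post_list), 2)


# Two rows copied from the dataset preview (52 and 1 comments).
sample_posts = [
    ['12224879', 'Interactive Dynamic Video',
     'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'],
    ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke",
     'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/',
     '2', '1', 'vezycash', '6/23/2016 22:20'],
]

print(average_comments(sample_posts))  # (52 + 1) / 2
print(average_comments([]))            # guarded empty case
```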


## 5. Count Ask HN Posts and Comments by Hour

In [24]:
import datetime as dt

# Collect [created_at, comment_count] pairs for each "Ask HN" post.
result_list = []
for post in ask_posts:
    created_at = post[6]
    comment_count = int(post[4])
    result_list.append([created_at, comment_count])

# Tally the number of posts and the number of comments per hour of day (0-23).
counts_by_hour = {}
comments_by_hour = {}
for created_at, comment_count in result_list:
    hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").hour
    counts_by_hour[hour] = counts_by_hour.get(hour, 0) + 1
    comments_by_hour[hour] = comments_by_hour.get(hour, 0) + comment_count

In [28]:
print('"ask hn" posts by hour')
print("\n")
print(counts_by_hour)
print("\n")
print("\n")
print('"ask hn" post comments by hour')
print("\n")
print(comments_by_hour)

"ask hn" posts by hour


{9: 45, 13: 85, 10: 59, 14: 107, 16: 108, 23: 68, 12: 73, 17: 100, 15: 116, 21: 109, 20: 80, 2: 58, 18: 109, 3: 54, 5: 46, 19: 110, 1: 60, 22: 71, 8: 48, 4: 47, 0: 55, 6: 44, 7: 34, 11: 58}




"ask hn" post comments by hour


{9: 251, 13: 1253, 10: 793, 14: 1416, 16: 1814, 23: 543, 12: 687, 17: 1146, 15: 4477, 21: 1745, 20: 1722, 2: 1381, 18: 1439, 3: 421, 5: 464, 19: 1188, 1: 683, 22: 479, 8: 492, 4: 337, 0: 447, 6: 397, 7: 267, 11: 641}
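
The same per-hour tally can be written with `collections.defaultdict`, which starts every missing key at zero. The sketch below is an alternative to the cell above, not a replacement for it. The timestamps and comment counts in `sample_results` are taken from the dataset preview rows (which are not "Ask HN" posts) purely to show the mechanics.

```python
from collections import defaultdict
import datetime as dt

# [created_at, comment_count] pairs taken from the dataset preview rows.
sample_results = [
    ["8/4/2016 11:52", 52],
    ["1/26/2016 19:30", 10],
    ["6/23/2016 22:20", 1],
    ["6/17/2016 0:01", 1],
    ["9/30/2015 4:12", 2],
]

counts = defaultdict(int)    # posts per hour
comments = defaultdict(int)  # comments per hour

for created_at, comment_count in sample_results:
    hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").hour
    counts[hour] += 1
    comments[hour] += comment_count

print(dict(counts))
print(dict(comments))
```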


## 6. Average Ask HN Comments by Hour

In [32]:
avg_by_hour = []
for hour in counts_by_hour:
    no_of_posts = counts_by_hour[hour]
    no_of_comments = comments_by_hour[hour]
    
    avg_by_hour.append([hour, no_of_comments/no_of_posts])
    
print(avg_by_hour)

[[9, 5.5777777777777775], [13, 14.741176470588234], [10, 13.440677966101696], [14, 13.233644859813085], [16, 16.796296296296298], [23, 7.985294117647059], [12, 9.41095890410959], [17, 11.46], [15, 38.5948275862069], [21, 16.009174311926607], [20, 21.525], [2, 23.810344827586206], [18, 13.20183486238532], [3, 7.796296296296297], [5, 10.08695652173913], [19, 10.8], [1, 11.383333333333333], [22, 6.746478873239437], [8, 10.25], [4, 7.170212765957447], [0, 8.127272727272727], [6, 9.022727272727273], [7, 7.852941176470588], [11, 11.051724137931034]]


In [38]:
# Build [average, hour] pairs in a new list so that avg_by_hour is left unchanged.
swap_avg_by_hour = [[average, hour] for hour, average in avg_by_hour]

sorted_swap = sorted(swap_avg_by_hour, reverse=True)

### Top 5 Hours for Ask HN Post Comments

In [54]:
for element in sorted_swap[:5]:
    message = "{}:00: {:.2f} average comments per post".format(element[1], element[0])
    print(message)

15:00: 38.59 average comments per post
2:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post
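
The ranking can also be produced without building swapped pairs, by passing a `key` function to `sorted`. The sketch below uses a handful of `[hour, average]` pairs copied from the `avg_by_hour` output above; `sample_avg_by_hour` and `top_hours` are illustrative names.

```python
# A few [hour, average] pairs copied from the avg_by_hour output.
sample_avg_by_hour = [
    [9, 5.5777777777777775],
    [15, 38.5948275862069],
    [20, 21.525],
    [2, 23.810344827586206],
    [16, 16.796296296296298],
]

# Sort by the average (second element), highest first; the input list is not modified.
top_hours = sorted(sample_avg_by_hour, key=lambda pair: pair[1], reverse=True)[:3]

for hour, average in top_hours:
    print("{}:00: {:.2f} average comments per post".format(hour, average))
```

Sorting on the second element keeps each pair in `[hour, average]` order, so the original list can still be reused afterwards.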
