# Hacker News Post Analysis
## Objective
+ Do Ask HN or Show HN posts receive more comments on average?
+ Do posts created at a certain time of day receive more comments on average?

In [36]:
# Importing the relevant libraries

from csv import reader
from datetime import datetime

In [2]:
# Opening the file and converting its rows to a list of lists
# (the with block closes the file once it has been read)

with open("hacker_news.csv") as open_file:
    hn = list(reader(open_file))

In [3]:
# Displaying the first 5 rows (the header row plus 4 data rows)

print(hn[:5])

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'], ['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']]


In [4]:
# Removing the header row from the list

headers = hn[0]
hn = hn[1:]

In [5]:
# Confirming the header row has been removed

print(hn[:5])

[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]


Now, we separate posts whose titles start with "Ask HN" or "Show HN" from the rest.

In [6]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    if title.lower().startswith('ask hn'):
        ask_posts.append(row)
    elif title.lower().startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
        

The next step is to determine whether ask posts or show posts receive more comments on average.

In [7]:
# Determining the average no. of comments for the ask posts

total_ask_comments = 0
for row in ask_posts:
    total_ask_comments += int(row[4])

avg_ask_comments = total_ask_comments / len(ask_posts)

print(avg_ask_comments)

# Determining the average no. of comments for the show posts
total_show_comments = 0
for row in show_posts:
    total_show_comments += int(row[4])

avg_show_comments = total_show_comments / len(show_posts)

print(avg_show_comments)

14.038417431192661
10.31669535283993


* From the above, Ask HN posts receive about 14.04 comments per post on average, while Show HN posts receive about 10.32.

* We can conclude that, on average, Ask HN posts receive more comments than Show HN posts.
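
The two averages above are computed with the same loop written twice. As a minimal sketch, the calculation could be wrapped in a helper; the function name `average_comments`, its `comments_index` parameter, and the sample rows are hypothetical and not part of the original notebook.

```python
def average_comments(posts, comments_index=4):
    """Return the mean comment count of `posts`, or 0.0 for an empty list."""
    if not posts:
        return 0.0
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)


# Made-up sample rows that follow the dataset's column order
# (id, title, url, num_points, num_comments, author, created_at)
sample_posts = [
    ['1', 'Ask HN: example A', '', '5', '10', 'user_a', '1/1/2016 10:00'],
    ['2', 'Ask HN: example B', '', '3', '20', 'user_b', '1/1/2016 11:00'],
]

print(average_comments(sample_posts))  # 15.0
```

With the real data, `average_comments(ask_posts)` and `average_comments(show_posts)` would be expected to reproduce the two values printed above.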

Since ask posts receive more comments on average, we'll focus the remaining analysis on these posts.

Next, we will:
1. Calculate the number of ask posts created in each hour of the day, along with the number of comments received.
2. Calculate the average number of comments ask posts receive by hour created.
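
Both steps depend on extracting the hour from the `created_at` column. As a small illustration, the snippet below parses one timestamp taken from the first data row shown earlier; the variable names `created_at` and `parsed` are chosen only for this example.

```python
from datetime import datetime

# Timestamp from the first data row of the dataset
created_at = "8/4/2016 11:52"

# %m/%d/%Y %H:%M matches month/day/year followed by a 24-hour time
parsed = datetime.strptime(created_at, "%m/%d/%Y %H:%M")

print(parsed.hour)  # 11
```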

In [23]:
result_list = []

for row in ask_posts:
    temp = [row[6], int(row[4])]
    result_list.append(temp)

counts_by_hour = {}
comments_by_hour = {}

for row in result_list:
    hour = datetime.strptime(row[0], "%m/%d/%Y %H:%M").hour
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = row[1]
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += row[1]    

In [24]:
# Display the number of ask posts created each hour of the day
print(counts_by_hour)

{9: 45, 13: 85, 10: 59, 14: 107, 16: 108, 23: 68, 12: 73, 17: 100, 15: 116, 21: 109, 20: 80, 2: 58, 18: 109, 3: 54, 5: 46, 19: 110, 1: 60, 22: 71, 8: 48, 4: 47, 0: 55, 6: 44, 7: 34, 11: 58}


In [25]:
# Display the total number of comments received by ask posts created in each hour
print(comments_by_hour)

{9: 251, 13: 1253, 10: 793, 14: 1416, 16: 1814, 23: 543, 12: 687, 17: 1146, 15: 4477, 21: 1745, 20: 1722, 2: 1381, 18: 1439, 3: 421, 5: 464, 19: 1188, 1: 683, 22: 479, 8: 492, 4: 337, 0: 447, 6: 397, 7: 267, 11: 641}


In [29]:
# Creating a list of lists, each holding an hour and its average number of comments per post

avg_by_hour = []

for hour in counts_by_hour:
    avg = round(comments_by_hour[hour] / counts_by_hour[hour], 2)
    avg_by_hour.append([hour, avg])

print(avg_by_hour)

[[9, 5.58], [13, 14.74], [10, 13.44], [14, 13.23], [16, 16.8], [23, 7.99], [12, 9.41], [17, 11.46], [15, 38.59], [21, 16.01], [20, 21.52], [2, 23.81], [18, 13.2], [3, 7.8], [5, 10.09], [19, 10.8], [1, 11.38], [22, 6.75], [8, 10.25], [4, 7.17], [0, 8.13], [6, 9.02], [7, 7.85], [11, 11.05]]


In [40]:
# Sorting the list showing the hour with highest average at the top

swap_avg_by_hour = []

for item in avg_by_hour:
    temp = []
    temp.append(item[1])
    temp.append(item[0])
    swap_avg_by_hour.append(temp)

sorted_swap = sorted(swap_avg_by_hour, reverse=True)
print("Top 5 Hours for Ask Posts Comments")
print("------------------------------------")
text = "{}: {} average comments per post"
for item in sorted_swap[:5]:
    hour = datetime.strptime(str(item[1]), "%H")
    hour = datetime.strftime(hour, "%H:00")
    avg = item[0]
    print(text.format(hour,avg))
    

    

Top 5 Hours for Ask Posts Comments
------------------------------------
15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.8 average comments per post
21:00: 16.01 average comments per post
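
The swap step exists only so that `sorted` compares the averages first. As an alternative sketch, the `key` argument of `sorted` can sort on the second element directly. The list `avg_by_hour_sample` is a hypothetical subset of the averages printed above, and `lines` is a name introduced only for this example.

```python
# Subset of [hour, average] pairs taken from the output above, in arbitrary order
avg_by_hour_sample = [[9, 5.58], [16, 16.8], [15, 38.59], [20, 21.52], [2, 23.81]]

# Sort by the average (index 1), highest first, and keep the top 3
top = sorted(avg_by_hour_sample, key=lambda pair: pair[1], reverse=True)[:3]

lines = [
    "{:02d}:00: {:.2f} average comments per post".format(hour, avg)
    for hour, avg in top
]
for line in lines:
    print(line)
```

This avoids building a second list and formats the hour without a round trip through `datetime`.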


## Conclusion
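Based on our findings, an Ask HN post has the best chance of receiving comments if it is created at 15:00 (3 p.m.), the hour with the highest average in this dataset (38.59 comments per post). Hours are taken as recorded in the dataset's `created_at` column.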