# Playing around with HN posts
We'll compare two types of Hacker News posts, Ask HN and Show HN, to determine the following:

1. Do Ask HN or Show HN posts receive more comments on average?
2. Do posts created at a certain time receive more comments on average?

## Import data

In [1]:
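from csv import reader
# Read every row into a list of lists; the with-block closes the file afterwards.
with open("hacker_news.csv", newline="") as f:
    hn = list(reader(f))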

First 5 rows:

In [3]:
for row in hn[:5]:
    print(row)

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']
['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30']
['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']
['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']


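Separate headers and data. This cell reassigns `hn`, so it should be run only once; the output below appears to come from a second run, which is why the first data row is printed under "Headers:" instead of the header row: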

In [6]:
headers = hn[0]
hn = hn[1:]
print("Headers:")
print(headers)
print("First 5 data rows:")
for row in hn[:5]:
    print(row)

Headers:
['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']
First 5 data rows:
['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30']
['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']
['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']
['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']
['10482257', 'Title II kills investment? Comcast and other ISPs are now

## Extracting Ask HN, Show HN, and Other Posts

In [10]:
ask_posts, show_posts, other_posts = [],[],[]
for row in hn:
    title = row[1]
    title = title.lower()
    if title.startswith('ask hn'):
        ask_posts.append(row)
    elif title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
print('Numbers of Ask HN, Show HN and other respectively')
print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))

Numbers of Ask HN, Show HN and other respectively
1744
1162
17193
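The prefix test above can also be factored into a small helper. The following is only a sketch: the function name `categorize` and the sample titles are illustrative assumptions, not part of the original notebook.

```python
def categorize(title):
    """Return 'ask', 'show', or 'other' based on the title prefix (hypothetical helper)."""
    lowered = title.lower()  # case-insensitive match, as in the loop above
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"


# Made-up sample titles, used only to exercise the helper.
sample_titles = [
    "Ask HN: How do you learn a new language?",
    "SHOW HN: My weekend project",
    "Interactive Dynamic Video",
]
categories = [categorize(t) for t in sample_titles]
print(categories)
```

Returning a label instead of appending to three lists keeps the classification rule in one place, which makes it easier to test on its own.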


## Calculating the Average Number of Comments for Ask HN and Show HN Posts

In [11]:
# Average number of comments on Ask HN posts
total_ask_comments = 0
for row in ask_posts:
    cmt = int(row[4])
    total_ask_comments += cmt
avg_ask_comments = total_ask_comments / len(ask_posts)
# Average number of comments on Show HN posts
total_show_comments = 0
for row in show_posts:
    cmt = int(row[4])
    total_show_comments += cmt
avg_show_comments = total_show_comments / len(show_posts)

print('Average Ask HN cmt = {}'.format(avg_ask_comments))
print('Average Show HN cmt = {}'.format(avg_show_comments))

Average Ask HN cmt = 14.038417431192661
Average Show HN cmt = 10.31669535283993


Hence, Ask HN posts receive more comments on average than Show HN posts (about 14.04 versus 10.32).
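Since the same averaging loop is written twice above, it could be sketched as one reusable function. The name `average_comments`, the `comments_index` parameter, and the sample rows are assumptions made for illustration; only the column position (index 4 for `num_comments`) comes from the dataset header.

```python
def average_comments(posts, comments_index=4):
    """Mean of the num_comments column; returns 0.0 for an empty list (hypothetical helper)."""
    if not posts:
        return 0.0  # avoid dividing by zero when a category has no posts
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)


# Made-up rows in the same column order as the dataset.
sample_posts = [
    ["1", "Ask HN: A", "", "5", "10", "alice", "8/4/2016 11:52"],
    ["2", "Ask HN: B", "", "3", "4", "bob", "8/4/2016 12:10"],
]
sample_avg = average_comments(sample_posts)
print(sample_avg)
```

With such a helper, the two averages would be `average_comments(ask_posts)` and `average_comments(show_posts)`.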

## Finding the Number of Ask Posts and Comments by Hour Created

1. Calculate the number of ask posts created in each hour of the day, along with the number of comments received.
2. Calculate the average number of comments ask posts receive by hour created.

In [17]:
import datetime as dt
result_list = []
for post in ask_posts:
    result_list.append([post[6],int(post[4])]) # (created_at,num_comments)

counts_by_hour, comments_by_hour = {},{}
for post in result_list:
    hour = post[0]
    hour = dt.datetime.strptime(hour, "%m/%d/%Y %H:%M")
    hour = hour.strftime('%H')
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 0
        comments_by_hour[hour] = 0
    counts_by_hour[hour] += 1
    comments_by_hour[hour] += post[1]

print(counts_by_hour)
print(comments_by_hour)


{'09': 45, '13': 85, '10': 59, '14': 107, '16': 108, '23': 68, '12': 73, '17': 100, '15': 116, '21': 109, '20': 80, '02': 58, '18': 109, '03': 54, '05': 46, '19': 110, '01': 60, '22': 71, '08': 48, '04': 47, '00': 55, '06': 44, '07': 34, '11': 58}
{'09': 251, '13': 1253, '10': 793, '14': 1416, '16': 1814, '23': 543, '12': 687, '17': 1146, '15': 4477, '21': 1745, '20': 1722, '02': 1381, '18': 1439, '03': 421, '05': 464, '19': 1188, '01': 683, '22': 479, '08': 492, '04': 337, '00': 447, '06': 397, '07': 267, '11': 641}
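The tallying step can also be written with `collections.defaultdict`, which removes the need for the explicit `if hour not in ...` check. This is a sketch under assumptions: the function name `tally_by_hour` and the sample pairs are made up for illustration, and the date format string is the one used in the cell above.

```python
import datetime as dt
from collections import defaultdict


def tally_by_hour(pairs, fmt="%m/%d/%Y %H:%M"):
    """Count posts and sum comments per hour from (created_at, num_comments) pairs."""
    counts = defaultdict(int)    # missing hours start at 0
    comments = defaultdict(int)
    for created_at, num_comments in pairs:
        hour = dt.datetime.strptime(created_at, fmt).strftime("%H")
        counts[hour] += 1
        comments[hour] += num_comments
    return dict(counts), dict(comments)


# Made-up (created_at, num_comments) pairs, used only to exercise the function.
sample_pairs = [("8/4/2016 11:52", 52), ("1/26/2016 19:30", 10), ("6/23/2016 11:05", 1)]
sample_counts, sample_comments = tally_by_hour(sample_pairs)
print(sample_counts)
print(sample_comments)
```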


Average number of comments per post for each hour:

In [18]:
avg_by_hour = []
for hour in counts_by_hour:
    avg_by_hour.append([hour,comments_by_hour[hour]/
                       counts_by_hour[hour]])
print(avg_by_hour)

[['09', 5.5777777777777775], ['13', 14.741176470588234], ['10', 13.440677966101696], ['14', 13.233644859813085], ['16', 16.796296296296298], ['23', 7.985294117647059], ['12', 9.41095890410959], ['17', 11.46], ['15', 38.5948275862069], ['21', 16.009174311926607], ['20', 21.525], ['02', 23.810344827586206], ['18', 13.20183486238532], ['03', 7.796296296296297], ['05', 10.08695652173913], ['19', 10.8], ['01', 11.383333333333333], ['22', 6.746478873239437], ['08', 10.25], ['04', 7.170212765957447], ['00', 8.127272727272727], ['06', 9.022727272727273], ['07', 7.852941176470588], ['11', 11.051724137931034]]


Sorting the hourly averages in descending order:

In [21]:
sorted_avg_by_hour = sorted(avg_by_hour, key = lambda x: x[1], reverse = True)

print("Top 5 Hours for Ask Posts Comments")
for hour in sorted_avg_by_hour[:5]:
    print("{}:00: {:.2f} average comments per post".format(hour[0],hour[1]))

Top 5 Hours for Ask Posts Comments
15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post
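The averaging and sorting steps can be combined into one function. The sketch below assumes a hypothetical helper named `top_hours`; the three sample entries are copied from the dictionaries printed earlier, so the ranking can be checked against the output above.

```python
def top_hours(counts, comments, n=5):
    """Return the n hours with the highest average comments per post (hypothetical helper)."""
    averages = {hour: comments[hour] / counts[hour] for hour in counts}
    # Sort (hour, average) pairs by average, highest first, and keep the first n.
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:n]


# Three entries taken from counts_by_hour and comments_by_hour above.
counts_sample = {"15": 116, "02": 58, "09": 45}
comments_sample = {"15": 4477, "02": 1381, "09": 251}
top_two = top_hours(counts_sample, comments_sample, n=2)
for hour, avg in top_two:
    print("{}:00: {:.2f} average comments per post".format(hour, avg))
```

Using a dictionary for the averages avoids building a list of two-element lists, and `items()` yields the pairs that `sorted` needs.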


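Ask HN posts created in the 15:00 hour average the most comments by a wide margin (38.59 per post), followed by the 02:00, 20:00, 16:00, and 21:00 hours. We should therefore post around 3-4pm EST or 8-9pm EST for the best chance of a high comment count.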