# Working with Hacker News Data

---
We will be analyzing posts from the Hacker News website, which are included in the `hacker_news.csv` data set.

---

We will begin by reading in `hacker_news.csv` using the `csv` module and removing the headers. We will store the Hacker News data set in a variable named `hn`.

In [1]:
import csv

# Read every row of the CSV into a list of lists; the context manager closes the file.
with open('hacker_news.csv', encoding='utf-8') as opened_file:
    read_file = csv.reader(opened_file)
    hn = list(read_file)

hn[:5]

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'],
 ['12579008',
  'You have two days to comment if you want stem cells to be classified as your own',
  'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018',
  '1',
  '0',
  'altstar',
  '9/26/2016 3:26'],
 ['12579005',
  'SQLAR  the SQLite Archiver',
  'https://www.sqlite.org/sqlar/doc/trunk/README.md',
  '1',
  '0',
  'blacksqr',
  '9/26/2016 3:24'],
 ['12578997',
  'What if we just printed a flatscreen television on the side of our boxes?',
  'https://medium.com/vanmoof/our-secrets-out-f21c1f03fdc8#.ietxmez43',
  '1',
  '0',
  'pavel_lishin',
  '9/26/2016 3:19'],
 ['12578989',
  'algorithmic music',
  'http://cacm.acm.org/magazines/2011/7/109891-algorithmic-composition/fulltext',
  '1',
  '0',
  'poindontcare',
  '9/26/2016 3:16']]

Next, we will separate the header row from the rest of the data set.

In [2]:
headers = hn[0]
headers

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

In [3]:
hn = hn[1:]
hn[:5]

[['12579008',
  'You have two days to comment if you want stem cells to be classified as your own',
  'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018',
  '1',
  '0',
  'altstar',
  '9/26/2016 3:26'],
 ['12579005',
  'SQLAR  the SQLite Archiver',
  'https://www.sqlite.org/sqlar/doc/trunk/README.md',
  '1',
  '0',
  'blacksqr',
  '9/26/2016 3:24'],
 ['12578997',
  'What if we just printed a flatscreen television on the side of our boxes?',
  'https://medium.com/vanmoof/our-secrets-out-f21c1f03fdc8#.ietxmez43',
  '1',
  '0',
  'pavel_lishin',
  '9/26/2016 3:19'],
 ['12578989',
  'algorithmic music',
  'http://cacm.acm.org/magazines/2011/7/109891-algorithmic-composition/fulltext',
  '1',
  '0',
  'poindontcare',
  '9/26/2016 3:16'],
 ['12578979',
  'How the Data Vault Enables the Next-Gen Data Warehouse and Data Lake',
  'https://www.talend.com/blog/2016/05/12/talend-and-Â\x93the-data-vaultÂ\x94',
  '1',
  '0',
  'markgainor1',
  '9/26/2016 3:14']]
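
The header split above can be tried without the CSV file. The following is a minimal sketch that assumes the same column layout as `hacker_news.csv`; the two sample rows and the names `sample_csv`, `sample_headers`, and `sample_posts` are hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical two-row sample in the same column layout as hacker_news.csv.
sample_csv = io.StringIO(
    "id,title,url,num_points,num_comments,author,created_at\n"
    "1,Ask HN: Example question?,,4,7,alice,9/26/2016 2:53\n"
    "2,Show HN: Example project,http://example.com,2,0,bob,9/26/2016 0:36\n"
)

rows = list(csv.reader(sample_csv))
sample_headers = rows[0]   # the first row holds the column names
sample_posts = rows[1:]    # every row after it is data

print(sample_headers)
print(len(sample_posts))
```

Note that every value comes back as a string, which is why the comment counts are converted with `int()` later on.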

The Hacker News platform uses labels such as **Ask HN** and **Show HN** as tags for its posts. **Ask HN** posts ask the community a question, while **Show HN** posts put a member's work on display for the community to respond to. We are only interested in posts with either of these labels, so we will filter the data for the two labels and create a separate list for each.

## Filtering the Data Set

---

Let's begin by filtering the data for **Ask HN** posts using the string method `.startswith()`.

In [4]:
for row in hn:
    title = row[1]
    if title.startswith('Ask HN'):
        print(title)

Ask HN: What TLD do you use for local development?
Ask HN: How do you pass on your work when you die?
Ask HN: How a DNS problem can be limited to a geographic region?
Ask HN: Why join a fund when you can be an angel?
Ask HN: Someone uses stock trading as passive income?
Ask HN: How hard would it be to make a cheap, hackable phone?
Ask HN: What is that one deciding factor that makes a website successful?
Ask HN: Is the world really short of software developers?
Ask HN: Geolocalized public API?
Ask HN: How to sell and idea?
Ask HN: Doesn't matter what p. say about U, as long as  Do You Agree?
Ask HN: What React charting lib. do you use?
Ask HN: Is cloud storage a solved problem?
Ask HN: Can a marketer become a tech entrepreneur and start a startup?
Ask HN: Why would government security and hacking be any good?
Ask HN: Are Americans really ready to give up their cars?
Ask HN: Have you ever visited example.com?
Ask HN: What are the best practises for using SSH keys?
Ask HN: Know how transf

Ask HN: Engineer working with rLoop needs place to crash in Bay area
Ask HN: How do you manage your backups?
Ask HN: How to easily check if two words are too close
Ask HN: Whats a modern CSS book?
Ask HN: How can I get true information about snowfall for skiing?
Ask HN: How do you deal with haters?
Ask HN: Developers  How did you learn to say NO?
Ask HN: Any nice quality JavaScript code one can learn from using Ramda.js?
Ask HN: Resume opinion needed
Ask HN: How did you recover from failure?
Ask HN: Does your employer let you choose your gear?
Ask HN: Will more women study computer science after smartphones and tablets?
Ask HN: How to start a successful side business
Ask HN: Georgia Tech MOOC MS/CS Experience
Ask HN: What are the proven techniques to build Twitter following?
Ask HN: RSS feeds you subscribe to?
Ask HN: Grokking concurrent applications?
Ask HN: Experience with professional mobile development courses?
Ask HN: Talented kid: what to do?
Ask HN: Should You be coding a client

Ask HN: An engineer's perspective on healthcare
Ask HN: How do you use (Linux) containers?
Ask HN: What's your choices if you write '7 network frameworks in 7 weeks'?
Ask HN: Need Advice with UK's Tier 1 Exceptional Talent Visa
Ask HN: How can I get iOS projects
Ask HN: Getting started with AI today?
Ask HN: iOS/XCode designing UI layouts tutorials?
Ask HN: Geizeer  eco friendly ice cooling
Ask HN: Recommendation for firewall software for Windows PCs
Ask HN: Starting my own customer analytics consultancy?
Ask HN: What has machine learning done for YOU?
Ask HN: Talking to an investor for the first time. Any advice?
Ask HN: How to get into clinical trials (how they work/operate etc)?
Ask HN: Best way to make quick, relatively simple HTML5 games today?
Ask HN: What to do if you're not good at your new job?
Ask HN: How to handle staging environments?
Ask HN: How should an Entrepreneur meetup be?
Ask HN: UK Entrepreneurs, how does a brexit/remain vote affect your start up?
Ask HN: Any HW st

Ask HN: How to avoid JS/CSS bloat?
Ask HN: Beating the Averages in 2016?
Ask HN: What widget toolkit does Chrome OS use?
Ask HN: What will it take for you pay for podcasts?
Ask HN: Anyone worked on Darpa software projects?
Ask HN: What does each part of an IP Address represent?
Ask HN: Can someone solve this mystery for me about a saying at JFK's funeral?
Ask HN: Open Amazon Echo Device?
Ask HN: Building an iOS app in golang
Ask HN: What's worse, noise or glare?
Ask HN: Other than HN, which websites do you check regularly?
Ask HN: What font do you use to print sensitive passwords?
Ask HN: Which analytics do you use on apps?
Ask HN: How well does the Windows version of node.js work?
Ask HN: Which framework for a CRUD app in 2016?
Ask HN: Best Cross-platform development tool that you've come across?
Ask HN: How to get PR for your website?
Ask HN: How does your company organize its knowledge?
Ask HN: What does a star next to a profile picture on ProductHunt means?
Ask HN: Should DEV and P

Ask HN: What's the Different Between a Senior and a Regular Developer?
Ask HN: What Router Do You Use?
Ask HN: how to effectively open source remote pet treat dispenser
Ask HN: Can anyone review my resume and provide feedback?
Ask HN: Those making $1,000+/month on side projects  what did you make?
Ask HN: Long held tech taboos being broken?
Ask HN: What is the average salary for a Head of Product in the UK?
Ask HN: How to quickly remove pages from Googles Index?
Ask HN: Who is looking for co-founders?
Ask HN: Research topics at the intersection of topology and computer science
Ask HN: What do you use as a web based BOM management tool?
Ask HN: New startup, what social account names should I secure early?
Ask HN: Any open source Scala project to contribute? (Besides libraries)
Ask HN: Which Programming Language Is the Best for Front End Development?
Ask HN: CMS for Saas Product
Ask HN: Shopping in Japan
Ask HN: Do you allow your kids to own an iPad?
Ask HN: Was Mark Zuckerberg wrong to 

Ask HN: Why is Azure so expensive?
Ask HN: GitHub thinks I'm a bot; what about my projects?
Ask HN: How can I transition to be a systems programmer?
Ask HN: Is there a webcast for the SpaceX Jason-3 launch tomorrow?
Ask HN: A felon has offered to invest in my startup, what are the implications?
Ask HN: How to call functions and do basic math in Clojure and core.logic?
Ask HN: How much does code-commenting cost your organization?
Ask HN: What are the most interesting roles for a non-technical person?
Ask HN: How to evolve the levels of a game while maintaining progress integrity?
Ask HN: How can I improve my beta subscribe page
Ask HN: Would a Canadian startup sponsor my work visa?
Ask HN: Did You Get Invited to YC Open Office Hours?
Ask HN: What book changed your life in 2015?
Ask HN: Must-read blogs for Startup CTOs?
Ask HN: Who has sold an algorithm and how?
Ask HN: Are we heading to a new Black Monday?
Ask HN: The Art of Computer Programming, Overrated?
Ask HN: Why are people offend

Ask HN: How to value small side projects? Can you sell them?
Ask HN: Your browser-to-server (XMLHttpRequest/WebSocket) free wishlist?
Ask HN: Is it common to run anti-malware on production linux boxes?
Ask HN: Examples of patterns that start well, but break
Ask HN: Where do I find a partner for a side project?
Ask HN: Why does Uber need ID to DELETE account?
Ask HN: Help me become a coder
Ask HN: Which tech companies have ball pits for employees to play in?
Ask HN: What does the rising value of the US dollar mean for me?
Ask HN: Explaining front-end frameworks to a designer
Ask HN: List of most payed programming languages: Why is ABAP not on the list?
Ask HN: What's going on with the Apple Genius bar?
Ask HN: Do you employ or have worked with a growth hacker?
Ask HN: How should I get start developing mac app without learning Obj-c
Ask HN: I have 130$/month of free Azure credit. How would you use it?
Ask HN: Should I use my old mailing list?
Ask HN: If Donald Knuth woke up 10,000 years 

Ask HN: Discontd UBNT AirRouter still best buy for sm home/office/OpenWRT bgnr?
Ask HN: Anyone doing Transcendental Meditation?
Ask HN: Start date is set, but I got another offer. What to do?
Ask HN: How would you sell open source software?
Ask HN: Do you deploy apps using deb/rpm packages?
Ask HN: Do you feel guilty when you quit a job?
Ask HN: Can this way we support publishers and readers, and also get rid of ads?
Ask HN: How to find remote freelance work for a graphic designer?
Ask HN: Will adblockers kill js widget products/startups?
Ask HN: Zoho mail custom domain and alternatives
Ask HN: Looking for suggestions on fairly basic IT Asset Management webapps
Ask HN: What do you think of the movie Office Space?
Ask HN: Which diplomas/certificates to make me able to get a TN visa?
Ask HN: Could sites let their ad provider serve content to thwart ad blockers?
Ask HN: I have an idea to fix the DNS change problem
Ask HN: Revision tips for a struggling high school student?
Ask HN: What do

We can see that there are many posts whose titles start with **Ask HN**. To catch every capitalization of "Ask HN", we convert each title to lowercase before checking its prefix. We then sort the rows into three lists: one for ask posts, one for show posts, and one for everything else.

In [5]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    title = title.lower()
    if title.startswith('ask hn'):
        ask_posts.append(row)
    elif title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
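
The prefix check in the loop above can also be isolated into a small function, which makes it easy to try on a few titles. This is only a sketch: `classify_title` and the sample titles are hypothetical and are not part of the original notebook.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on the title's prefix."""
    lowered = title.lower()  # makes the prefix check case-insensitive
    if lowered.startswith('ask hn'):
        return 'ask'
    if lowered.startswith('show hn'):
        return 'show'
    return 'other'


# Hypothetical titles covering different capitalizations.
sample_titles = [
    'Ask HN: What TLD do you use for local development?',
    'ASK HN: is this matched too?',
    'show hn: a lowercase example',
    'algorithmic music',
]

for sample_title in sample_titles:
    print(classify_title(sample_title), '|', sample_title)
```

A title that only mentions "Ask HN" somewhere in the middle is classified as `'other'`, because `.startswith()` looks only at the beginning of the string.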

Now we can check the length of each list to see how many of each post there are.

In [6]:
print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))

9139
10158
273822


There are slightly more show posts (10158) than ask posts (9139) in the data set. Let's take a closer look at the first few rows of each list.

In [7]:
print(ask_posts[:5])
print(show_posts[:5])

[['12578908', 'Ask HN: What TLD do you use for local development?', '', '4', '7', 'Sevrene', '9/26/2016 2:53'], ['12578522', 'Ask HN: How do you pass on your work when you die?', '', '6', '3', 'PascLeRasc', '9/26/2016 1:17'], ['12577908', 'Ask HN: How a DNS problem can be limited to a geographic region?', '', '1', '0', 'kuon', '9/25/2016 22:57'], ['12577870', 'Ask HN: Why join a fund when you can be an angel?', '', '1', '3', 'anthony_james', '9/25/2016 22:48'], ['12577647', 'Ask HN: Someone uses stock trading as passive income?', '', '5', '2', '00taffe', '9/25/2016 21:50']]
[['12578335', 'Show HN: Finding puns computationally', 'http://puns.samueltaylor.org/', '2', '0', 'saamm', '9/26/2016 0:36'], ['12578182', 'Show HN: A simple library for complicated animations', 'https://christinecha.github.io/choreographer-js/', '1', '0', 'christinecha', '9/26/2016 0:01'], ['12578098', 'Show HN: WebGL visualization of DNA sequences', 'http://grondilu.github.io/dna.html', '1', '0', 'grondilu', '9/25

## Analyzing the Comments on Posts

---

Now that we have separate lists for ask posts and show posts, we will find the total and average number of comments for each, starting with ask posts.

In [8]:
total_ask_comments = 0

for post in ask_posts:
    total_ask_comments += int(post[4])
    
avg_ask_comments = total_ask_comments / len(ask_posts)
   
print('The total number of comments belonging to ask posts is:', total_ask_comments)    
print('The total number of "Ask HN" posts is:', len(ask_posts))
print('The average number of comments for ask posts is:', avg_ask_comments)

The total number of comments belonging to ask posts is: 94986
The total number of "Ask HN" posts is: 9139
The average number of comments for ask posts is: 10.393478498741656


In [9]:
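total_show_comments = 0

# Iterate with the same name that is indexed inside the loop, so that each
# show post's own comment count is added.
for post in show_posts:
    total_show_comments += int(post[4])

avg_show_comments = total_show_comments / len(show_posts)

print('The total number of comments belonging to show posts is:', total_show_comments)
print('The total number of "Show HN" posts is:', len(show_posts))
print('The average number of comments for show posts is:', avg_show_comments)

The output originally recorded for this cell (a total of 203160 and an average of exactly 20.0) came from a loop that iterated with `row` but read `post[4]`, so the comment count left over from the last ask post was added 10158 times. Those figures do not describe show posts and are omitted here; rerun the corrected cell to obtain the real total and average.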


We can see that although there are more **Show HN** posts, they attract fewer comments. This is somewhat expected given the nature of these posts: commenters seem to engage more with posts that ask a specific question and less with posts that simply show something a member wants to share.
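
The two averaging cells above repeat the same loop, so the calculation can be wrapped in one helper. The following is a minimal sketch, assuming rows keep the `num_comments` value at index 4 as in `hn`. The function `average_comments` and the sample rows are hypothetical and are used only for illustration.

```python
def average_comments(posts, comments_index=4):
    """Return the mean comment count for a list of post rows."""
    if not posts:
        return 0.0  # avoid dividing by zero on an empty list
    total = sum(int(post[comments_index]) for post in posts)
    return total / len(posts)


# Hypothetical rows in the same column order as the data set.
ask_sample = [
    ['1', 'Ask HN: first example?', '', '4', '7', 'alice', '9/26/2016 2:53'],
    ['2', 'Ask HN: second example?', '', '6', '3', 'bob', '9/26/2016 1:17'],
]
show_sample = [
    ['3', 'Show HN: first example', 'http://example.com', '2', '0', 'carol', '9/26/2016 0:36'],
    ['4', 'Show HN: second example', 'http://example.org', '1', '2', 'dave', '9/26/2016 0:01'],
    ['5', 'Show HN: third example', 'http://example.net', '1', '4', 'erin', '9/25/2016 23:44'],
]

print(average_comments(ask_sample))   # 5.0
print(average_comments(show_sample))  # 2.0
```

Because the generator expression introduces its own loop variable, the function cannot pick up a value left over from an earlier loop.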

## Time data of Ask HN Posts

---

Now that we know ask posts attract more comments, let's determine if time of day affects the number of comments received. We will do this by calculating the number of ask posts created and the number of comments received for each hour of the day. We will then calculate the average number of comments for each hour.

In [10]:
import datetime as dt

result_list = []

for post in ask_posts:
    result_list.append(
        [post[6], int(post[4])])
    
print(result_list[:5])

[['9/26/2016 2:53', 7], ['9/26/2016 1:17', 3], ['9/25/2016 22:57', 0], ['9/25/2016 22:48', 3], ['9/25/2016 21:50', 2]]
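
Each `created_at` string still has to be reduced to its hour before the posts can be grouped. The following is a small sketch of that step using `datetime.strptime` and `strftime`. The helper name `hour_of` is hypothetical, and the format string assumes timestamps shaped like the ones printed above.

```python
import datetime as dt


def hour_of(created_at, date_format='%m/%d/%Y %H:%M'):
    """Parse a timestamp string and return its hour as a two-digit string."""
    parsed = dt.datetime.strptime(created_at, date_format)
    return parsed.strftime('%H')


print(hour_of('9/26/2016 2:53'))   # 02
print(hour_of('9/25/2016 22:57'))  # 22
```

Returning a zero-padded string keeps the hours in a consistent format when they are used as dictionary keys.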


## Finding the Number of Comments by Hour for Ask Posts

---

In [11]:
counts_by_hour = {}
comments_by_hour = {}
date_format = '%m/%d/%Y %H:%M'

for row in result_list:
    date = row[0]
    comment = row[1]
    time = dt.datetime.strptime(date, date_format).strftime("%H")
    if time not in counts_by_hour:
        counts_by_hour[time] = 1
        comments_by_hour[time] = comment
    else:
        counts_by_hour[time] += 1
        comments_by_hour[time] += comment
        
comments_by_hour

{'02': 2996,
 '01': 2089,
 '22': 3372,
 '21': 4500,
 '19': 3954,
 '17': 5547,
 '15': 18525,
 '14': 4972,
 '13': 7245,
 '11': 2797,
 '10': 3013,
 '09': 1477,
 '07': 1585,
 '03': 2154,
 '23': 2297,
 '20': 4462,
 '16': 4466,
 '08': 2362,
 '00': 2277,
 '18': 4877,
 '12': 4234,
 '04': 2360,
 '06': 1587,
 '05': 1838}
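
The same tally can be written without the `if`/`else` branch by using `collections.defaultdict`, which starts every missing key at zero. This is an alternative sketch, not the approach used in the notebook. In the sample list, the first and third entries are taken from the output shown earlier, and the second entry is hypothetical.

```python
import datetime as dt
from collections import defaultdict

# [created_at, num_comments] pairs; the middle entry is made up for illustration.
sample_results = [
    ['9/26/2016 2:53', 7],
    ['9/26/2016 2:10', 3],
    ['9/25/2016 22:57', 0],
]

sample_counts = defaultdict(int)    # posts created in each hour
sample_comments = defaultdict(int)  # comments received by those posts

for created_at, n_comments in sample_results:
    hour = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M').strftime('%H')
    sample_counts[hour] += 1
    sample_comments[hour] += n_comments

print(dict(sample_counts))
print(dict(sample_comments))
```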

## Calculating the Average Number of Comments by Hour

---

In [12]:
avg_by_hour = []

for hr in comments_by_hour:
    avg_by_hour.append([hr, comments_by_hour[hr] / counts_by_hour[hr]])
    
avg_by_hour

[['02', 11.137546468401487],
 ['01', 7.407801418439717],
 ['22', 8.804177545691905],
 ['21', 8.687258687258687],
 ['19', 7.163043478260869],
 ['17', 9.449744463373083],
 ['15', 28.676470588235293],
 ['14', 9.692007797270955],
 ['13', 16.31756756756757],
 ['11', 8.96474358974359],
 ['10', 10.684397163120567],
 ['09', 6.653153153153153],
 ['07', 7.013274336283186],
 ['03', 7.948339483394834],
 ['23', 6.696793002915452],
 ['20', 8.749019607843136],
 ['16', 7.713298791018998],
 ['08', 9.190661478599221],
 ['00', 7.5647840531561465],
 ['18', 7.94299674267101],
 ['12', 12.380116959064328],
 ['04', 9.7119341563786],
 ['06', 6.782051282051282],
 ['05', 8.794258373205741]]
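
The per-hour average is simply the comment total for an hour divided by the number of posts in that hour. The following is a compact sketch of that division using a dictionary comprehension; the two small input dictionaries are hypothetical values chosen so the arithmetic is easy to check by hand.

```python
# Hypothetical tallies: hour -> number of posts, hour -> total comments.
sample_counts = {'02': 2, '22': 1}
sample_comments = {'02': 10, '22': 0}

# Divide each hour's comment total by that hour's post count.
sample_avg = {
    hour: sample_comments[hour] / sample_counts[hour]
    for hour in sample_comments
}

print(sample_avg)
```

Keeping the result in a dictionary makes it easy to look up a single hour, while the list of `[hour, average]` pairs used in the notebook is convenient for sorting.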

## Swapping and Sorting Values in a List

---

In [15]:
swap_avg_by_hour = []

for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])
    
swap_avg_by_hour

[[11.137546468401487, '02'],
 [7.407801418439717, '01'],
 [8.804177545691905, '22'],
 [8.687258687258687, '21'],
 [7.163043478260869, '19'],
 [9.449744463373083, '17'],
 [28.676470588235293, '15'],
 [9.692007797270955, '14'],
 [16.31756756756757, '13'],
 [8.96474358974359, '11'],
 [10.684397163120567, '10'],
 [6.653153153153153, '09'],
 [7.013274336283186, '07'],
 [7.948339483394834, '03'],
 [6.696793002915452, '23'],
 [8.749019607843136, '20'],
 [7.713298791018998, '16'],
 [9.190661478599221, '08'],
 [7.5647840531561465, '00'],
 [7.94299674267101, '18'],
 [12.380116959064328, '12'],
 [9.7119341563786, '04'],
 [6.782051282051282, '06'],
 [8.794258373205741, '05']]

In [18]:
sorted_swap = sorted(swap_avg_by_hour, reverse=True)
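
Swapping the columns is one way to sort by average. Another is to pass a `key` function to `sorted()`, which leaves each `[hour, average]` pair in its original order. The following sketch uses three pairs from the output above, rounded to two decimals. The names `sample_avg_by_hour` and `top_hours` are hypothetical.

```python
# Three [hour, average] pairs, rounded from the averages computed earlier.
sample_avg_by_hour = [['02', 11.14], ['15', 28.68], ['13', 16.32]]

# Sort by the second element (the average), largest first.
top_hours = sorted(sample_avg_by_hour, key=lambda pair: pair[1], reverse=True)

for hour, avg in top_hours:
    print("{}:00: {:.2f} average comments per post".format(hour, avg))
```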

In [20]:
print("The 5 Busiest Hours for 'Ask HN' Comments")
for avg, hr in sorted_swap[:5]:
    print(
        "{}: There are {:.2f} average comments per post".format(
            dt.datetime.strptime(hr, "%H").strftime("%H:%M"),avg
        )
    )

The 5 Busiest Hours for 'Ask HN' Comments
15:00: There are 28.68 average comments per post
13:00: There are 16.32 average comments per post
12:00: There are 12.38 average comments per post
02:00: There are 11.14 average comments per post
10:00: There are 10.68 average comments per post


## For Further Analysis:

---

Here are some next steps for you to consider:

* Determine if show or ask posts receive more points on average.
* Determine if posts created at a certain time are more likely to receive more points.
* Compare your results to the average number of comments and points other posts receive.
* Use Dataquest's data science project style guide to format your project.